Reg Issue 1 : All messages don't get consumed in RabbitMQ #1

Description

@ArjunSharma98

Below is a detailed explanation of this issue, including how it affects the smooth functioning of our system and how I was able to reproduce it.

Firstly, let me explain with an example how and where we use this RabbitMQ consumer workflow.
For every alert device integrated with our system, we have a scheduled workflow that runs every 10 minutes.

Through an API call to the alert source, it retrieves the alerts that were triggered in the last 10 minutes. This API can return a variable number of alerts: usually somewhere between 10 and 100, and in rare cases up to 2000.

The workflow then processes each item (alert) one by one and posts it into our system.
The post, in turn, triggers a message to our RabbitMQ. The message contains the details of the alert.

This message is then consumed by the RabbitMQ consumer workflow, which consists of a series of actions that enrich the alert. For example, if an alert contains an IP address, we retrieve IP enrichment data from reputation sources such as AbuseIP or AlienVault and update the alert with this information to enable better analysis.
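To make the enrichment step concrete, here is a minimal Python sketch of the idea, not the actual workflow definition. The `ip` field name, the `enrichment` key, and the shape of the lookup functions are all assumptions made for illustration; the real reputation-source calls are not shown in this issue.

```python
import ipaddress


def extract_ip(alert):
    """Return the alert's IP address as a string, or None if absent or invalid."""
    value = alert.get("ip")  # hypothetical field name
    if not value:
        return None
    try:
        return str(ipaddress.ip_address(value))
    except ValueError:
        return None


def enrich_alert(alert, lookups):
    """Return a copy of the alert with reputation data attached.

    `lookups` maps a source name to a callable taking an IP string.
    The callables stand in for the real reputation-source clients.
    """
    ip = extract_ip(alert)
    if ip is None:
        return alert  # nothing to enrich
    enrichment = {name: lookup(ip) for name, lookup in lookups.items()}
    return {**alert, "enrichment": enrichment}
```

If a message is never consumed, or its run never reaches this step, the alert simply stays without the enrichment data, which is the symptom described next.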

This is where we first noticed the issue. The enrichment was not happening for all alerts, which means that not all alerts were successfully processed through the post-processing workflow.

I can confirm that all messages are being successfully pushed to RabbitMQ, and there is no issue on our side, as we have customers currently using this system with our previous workflow engine without any problems.

This was an explanation of how and why we use this workflow, and why it is very critical for our system.

Next, I will explain how I replicated the issue, along with a detailed explanation of what I observed in the logs.

I created a simple Python script to loop through an array of exactly 100 objects (each object representing an alert) and post them into our system. I ensured that each item was short, simple, and identical.
Each item, after being posted, triggered a RabbitMQ message. I then created a workflow to consume these messages.
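For reference, a reproduction script of this kind could look roughly like the sketch below. It is not the exact script that was used. The collection URL, the payload fields other than `priorityUid`, and the token handling are assumptions; only the host `si-alerts:5002` and the `priorityUid` values appear in the logs.

```python
import json
import urllib.request

# Assumed endpoint: the logs only show the per-alert PUT URL.
ALERTS_URL = "http://si-alerts:5002/api/alerts"


def build_alerts(count=100):
    """Return `count` short, identical alert payloads."""
    # "title" is a hypothetical field; priorityUid is taken from the logs.
    return [{"title": "test alert", "priorityUid": "MEDIUM"} for _ in range(count)]


def post_alert(alert, token, url=ALERTS_URL):
    """POST one alert as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(alert).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def post_all(alerts, send):
    """Post the alerts one by one and collect the result of each call."""
    return [send(alert) for alert in alerts]
```

A run would then be something like `post_all(build_alerts(), lambda a: post_alert(a, token))`, with `token` obtained however the application normally issues it. Passing `send` as a parameter keeps the loop testable without a live server.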

The actions in this workflow are very simple:

  1. Consume the message
  2. Get the application token
  3. Wait for 5 seconds (mimicking the time it usually takes to run the entire post-processing)
  4. Perform a simple update activity on the alert to change the priority of the alert from MEDIUM to LOW (mimicking the post-processing enrichment steps that normally occur)
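The expected behaviour of one run can be written down as a short Python sketch. This is only an equivalent of the sequence listed above, not the workflow engine's own definition (that is in the attached YAML files). The `uid` field in the message body and the `get_token` / `update_alert` callables are hypothetical stand-ins.

```python
import json
import time


def handle_message(body, get_token, update_alert, sleep=time.sleep, delay=5):
    """Process one consumed message through the steps listed above."""
    alert = json.loads(body)   # 1. consume the message (assumed to be JSON)
    token = get_token()        # 2. get the application token
    sleep(delay)               # 3. stand-in for the post-processing time
    # 4. change the priority from MEDIUM to LOW
    return update_alert(alert["uid"], {"priorityUid": "LOW"}, token)
```

Under this model every message that is consumed should reach step 4 roughly `delay` seconds after step 2, and each run should finish independently of the others.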

Now let me explain what I noticed, using log examples.
Note: The log structure has been slightly reformatted here, as the original log structure was absolutely unreadable.

When the first item is consumed, it fetches the application token and then starts the sleep timer. Meanwhile, the execution of the second item begins, and it looks like the first item never finishes processing:
[08:04:07] STEP Wait 5 seconds before posting
[08:04:08] START Workflow: jogflow.alerts.message.handler.update.alert
[08:04:08] STEP setvars

The same thing repeats for the first 6 items: the workflow stops after the sleep step and never wakes up.

What I noticed from the 7th item onward is that the sleep action does not even work.

[08:04:12] STEP Wait 5 seconds before posting
[08:04:12] STEP update alert
url = http://si-alerts:5002/api/alerts/AF-DO-2512
PUT http://si-alerts:5002/api/alerts/AF-DO-2512
Authorization = Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJVaWQiOiJhcGl1c2VyIiwiUm9sZVVpZCI6IkFQSUFDQ0VTUyIsIm5iZiI6MTc4MjQ2MTA1MiwiZXhwIjoxNzgyNTQ3NDUyLCJpc3MiOiJhbGVydGZ1c2lvbi5jb20iLCJhdWQiOiJhbGVydGZ1c2lvbi5jb20ifQ.RlmUKtjsWq0OS7-q0e0BNfeAdjoYM0Q058JWFV2B9Oo
priorityUid = LOW
store = result
[08:04:12] END Workflow
[08:04:13] START Workflow: jogflow.alerts.message.handler.update.alert

The next action runs immediately, as we can see from the timestamps.
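This timing check can be automated over the whole log. The sketch below assumes the simplified `[HH:MM:SS] KIND text` layout used in the excerpts here (the original log layout differs) and that the wait step's text starts with `Wait`. It reports every wait step that is followed by another log line in less than the expected delay, together with that next line, so a skipped sleep (next line is a `STEP`) can be told apart from another run starting in between (next line is a `START`).

```python
import re
from datetime import datetime

LINE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\s+(START|STEP|END)\s*(.*)$")


def parse(lines):
    """Yield (time, kind, text) for each timestamped log line."""
    for line in lines:
        match = LINE.match(line.strip())
        if match:
            stamp, kind, text = match.groups()
            yield datetime.strptime(stamp, "%H:%M:%S"), kind, text


def waits_followed_early(lines, expected=5):
    """Return (timestamp, elapsed_seconds, next_line) for each wait step
    whose following log line arrives sooner than `expected` seconds."""
    events = list(parse(lines))
    found = []
    for (t1, kind1, text1), (t2, kind2, text2) in zip(events, events[1:]):
        if kind1 == "STEP" and text1.startswith("Wait"):
            elapsed = (t2 - t1).total_seconds()
            if elapsed < expected:
                found.append((t1.strftime("%H:%M:%S"), elapsed, f"{kind2} {text2}"))
    return found
```

Lines without a timestamp (such as `url = ...`) are ignored, and a log that crosses midnight is not handled.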

Image

As we can see in the image, the first 6 items still have priority MEDIUM, because the update action that sets it to LOW never ran for them.

Every other item (with one exception) has been updated to LOW (without sleeping for 5 seconds).

Image

One more item, the 96th, also did not get updated. Its UID, AF-DO-2601, cannot even be found in the log file.
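A check like this can be scripted so that every posted UID is compared against the log. The sketch below assumes UIDs follow the `AF-DO-<number>` form seen in the two examples in this issue, and that the list of expected UIDs comes from whatever the posting script recorded.

```python
import re

# Assumed UID format, based on AF-DO-2512 and AF-DO-2601.
UID_PATTERN = re.compile(r"AF-DO-\d+")


def missing_uids(expected_uids, log_text):
    """Return the expected UIDs that never appear in the log, in input order."""
    seen = set(UID_PATTERN.findall(log_text))
    return [uid for uid in expected_uids if uid not in seen]
```

A UID that is missing from the log entirely points at a message that was never picked up by the consumer, which is a different failure from a run that started and stalled at the wait step.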

jogflow.alerts.message.handler.update.alert.yaml
startup.jogflow.rabbitmq.alert.receive.yaml
logs_worker_20260626T080349411_api_startup.jogflow.rabbitmq.alert.receive.txt
