Note: the code for this post can be found at this repo.
There are two flavors of SQS queues: Standard and FIFO (First-in-first-out).
For a lambda with an SQS standard queue as its event source, AWS will run as many concurrent lambda executions as the user-defined lambda concurrency allows (assuming there are enough messages in the SQS queue).
Because as many lambda functions as possible run at once, ordering is lost.
By this, I mean that if event 1 through event 20 were sent to the SQS queue in order (event 1 before event 2 before event 3 and so on), they may not be processed in the same order (maybe event 13 is processed before event 7, for instance).
Suppose 2 lambdas were executed as a result of the messages in the SQS queue.
The first lambda polled events 1 through 10; the second, events 11 through 20.
Since the polling process is relatively quick, the second lambda may well receive events 11 through 20 before lambda 1 even got around to processing event 2.
Hence, event 13, for example, may be processed before event 7.
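The two-lambda scenario above can be mimicked with a small, deterministic simulation. This is only a sketch: it assumes each lambda handles exactly one message per time step, which real lambdas do not guarantee, and the function name `simulate_two_workers` is made up for illustration.

```python
from itertools import zip_longest


def simulate_two_workers(batch_a, batch_b):
    """Interleave two batches as if two lambdas each handled one message per step."""
    processed = []
    for a, b in zip_longest(batch_a, batch_b):
        # Within a step, worker A finishes its message first, then worker B.
        if a is not None:
            processed.append(a)
        if b is not None:
            processed.append(b)
    return processed


# Lambda 1 polled events 1..10, lambda 2 polled events 11..20.
order = simulate_two_workers(list(range(1, 11)), list(range(11, 21)))
print(order)
# Event 13 ends up earlier in the processing order than event 7.
print(order.index(13) < order.index(7))
```

Every event is still processed exactly once; only the global order changes.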
SQS FIFO queues, however, are advertised to follow a first-in, first-out model (hence the name FIFO queue).
At first glance, to me at least, that sounds like all messages in a FIFO queue are ordered, but that is not the case.
Message ordering is only preserved for messages in the same message group (that is, messages with the same messageGroupId).
As such, FIFO queues can be thought of as being similar to Kinesis Data Streams or Kafka, with message groups playing the role of partitions.
The difference between the two technologies is that you can create arbitrarily many message groups, whereas the number of partitions (although not fixed) is something to be decided upon with care.
Reading about FIFO queues made me wonder: how can I confirm this behavior (that messages in a FIFO queue are ordered within each message group)? Naturally, I eventually thought of a way, and this post is about the solution I came up with.
Also, the code for this test is available and can be found in my GitLab repo.
It comes with two deployment options: SAM (Serverless Application Model) and CDK (Cloud Development Kit).
The question is this: given a sequence of messages in a queue (standard or FIFO), will those messages be processed in the same order in which they were written into the queue?
Instead of writing out actual timestamps, let me represent them using t_i, with t_i being the timestamp of message i.
In addition, let our lambda function take in a timestamp t_i (denoting when the message was written into the queue) and output an ordered pair (t_i, p_i), where p_i denotes when the message was processed and written into a database table.
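To make the pair (t_i, p_i) concrete, here is a rough sketch of what such a lambda function might look like. It is not the code from the repo: the JSON message body and its `sent_at` field are assumptions made for illustration, and the database write is left as a comment.

```python
import json
import time


def process_record(record, clock=time.time):
    """Return the pair (t_i, p_i) for one SQS record."""
    body = json.loads(record["body"])  # assumption: the body is JSON
    t_i = body["sent_at"]              # hypothetical field: when the message was sent
    p_i = clock()                      # when the message is being processed
    return (t_i, p_i)


def handler(event, context):
    # Lambda receives SQS messages as a batch under event["Records"].
    pairs = [process_record(record) for record in event["Records"]]
    # A real function would write each pair to the database table here.
    return pairs
```

The `clock` parameter only exists so the processing time can be fixed when trying the function out locally.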
If ordering is preserved, then for any two events, say (t_1, p_1) and (t_2, p_2), if t_1 < t_2 then p_1 < p_2.
All this is saying is that the order in which messages are processed is the same as the order in which messages were written to the queue.
Suppose you have received a bunch of messages (each with a timestamp denoting its place in the ordering) and have written the data to a database table.
Then, to check if the ordering is preserved, you can query for all of the data in the database table, sort the data by the first field (t_i in our scenario) and check if the second field (p_i) is ordered as well.
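The check just described fits in a few lines of Python. This is a minimal sketch with a made-up function name, and it tolerates equal processing timestamps (an assumption; the condition stated earlier uses a strict inequality).

```python
def is_order_preserved(pairs):
    """pairs: iterable of (t_i, p_i). True if sorting by t_i leaves the p_i non-decreasing."""
    by_sent = sorted(pairs, key=lambda pair: pair[0])
    processed = [p for _, p in by_sent]
    # Compare each processing time with the next one.
    return all(a <= b for a, b in zip(processed, processed[1:]))


print(is_order_preserved([(1, 10), (2, 20), (3, 30)]))  # processed in sending order
print(is_order_preserved([(1, 10), (2, 30), (3, 20)]))  # message 3 overtook message 2
```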
As a final modification to this basic idea, I add a group number to each message.
For FIFO queues, the messageGroupId is set to this group number.
To check the ordering, queried messages are first grouped by messageGroupId, and then we check whether their p_i's are ordered after sorting by the t_i's.
For standard queues, there is no messageGroupId, but, for consistency, messages are also given a group number.
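As an illustration of how the group number could be attached when sending, the sketch below builds one entry for the SQS `send_message_batch` call. The helper name, the JSON body layout, and the deduplication id format are assumptions of mine, not necessarily what the repo does.

```python
import json
import time


def build_entry(index, group_number, fifo, sent_at=None):
    """Build one entry for the Entries list of SQS send_message_batch."""
    sent_at = time.time() if sent_at is None else sent_at
    prefix = "fifo" if fifo else "std"
    entry = {
        "Id": str(index),
        # Hypothetical body layout: the group label plus the sending timestamp t_i.
        "MessageBody": json.dumps({"group": f"{prefix}#{group_number}", "sent_at": sent_at}),
    }
    if fifo:
        # Only FIFO queues accept a message group id.
        entry["MessageGroupId"] = str(group_number)
        # Needed unless content-based deduplication is enabled on the queue.
        entry["MessageDeduplicationId"] = f"{group_number}-{index}"
    return entry
```

A list of such entries (at most 10 per call) would then be passed as `Entries` to boto3's `send_message_batch`, together with the `QueueUrl`.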
As such, we end up with the following:
Before testing the FIFO queue, it's probably best to first confirm that the test behaves as expected for something familiar: SQS standard queues. The expectation here is simply that the ordering is not preserved for each "group." Running the test and the confirmation script, we find out that SQS standard queues do not preserve ordering (as expected).
The above picture in the section Table Schema contains a few rows of data regarding the standard SQS queue.
As you can see, the data from the first two rows already demonstrate that order is not preserved.
The actual test script queries all data with a given prefix (either "std#" or "fifo#"), splits the data off into groups (for example, std#1, std#2, and so on), then checks if the data in both the sort key column sk and the attribute column timestamp are sorted, and prints the results.
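The grouping and checking step can be sketched as follows. This is not the actual confirm.py: it assumes the items have already been fetched, that the group label lives in a key named `pk` (a hypothetical name), and that items arrive in the order the query returned them.

```python
from collections import defaultdict


def is_sorted(values):
    """True if values are in non-decreasing order."""
    return all(a <= b for a, b in zip(values, values[1:]))


def check_groups(items):
    """items: dicts with 'pk' (e.g. 'std#1'), 'sk', and 'timestamp' keys, in query order."""
    groups = defaultdict(list)
    for item in items:
        groups[item["pk"]].append(item)
    return {
        pk: {
            "sk": is_sorted([row["sk"] for row in rows]),
            "ts": is_sorted([row["timestamp"] for row in rows]),
        }
        for pk, rows in sorted(groups.items())
    }
```

The returned dictionary has one entry per group, each with an `sk` and a `ts` flag.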
For me, running python confirm.py (confirm.py is the name of my test script) returns the following:
----------
Testing SQS std queue:
Number of items: 100
----------
{'std#0': {'sk': True, 'ts': False},
'std#1': {'sk': True, 'ts': False},
'std#2': {'sk': True, 'ts': False},
'std#3': {'sk': True, 'ts': False},
'std#4': {'sk': True, 'ts': False},
'std#5': {'sk': True, 'ts': False},
'std#6': {'sk': True, 'ts': False}}
----------
Testing SQS fifo queue:
Number of items: 100
----------
{'fifo#0': {'sk': True, 'ts': True},
'fifo#1': {'sk': True, 'ts': True},
'fifo#2': {'sk': True, 'ts': True},
'fifo#3': {'sk': True, 'ts': True},
'fifo#4': {'sk': True, 'ts': True},
'fifo#5': {'sk': True, 'ts': True},
'fifo#6': {'sk': True, 'ts': True}}
For the SQS standard queue, for each message group, we see that the timestamp column (ts) is not sorted (False) even though the sort key column (sk) is.
This confirms for us that the SQS standard queue failed to preserve order.
Conversely, we can also see that the SQS FIFO queue does preserve order within each message group.
As a caveat, I should add that there is a situation where the SQS standard queue does "preserve" the message order: when only one agent is ever polling messages from said queue. For example, in the context of AWS Lambda functions, you can force this by limiting the lambda concurrency to 1 and adding an AWS Resource Policy for the SQS queue that prevents anyone or anything else from using said SQS queue. (By the way, this is essentially how a FIFO queue would behave if all of its messages have the same messageGroupId.)
Please note that this "solution" would also remove the parallelism that using lambda functions in conjunction with SQS queues provides. However, depending on the context, perhaps that's okay.