Testing SQS FIFO queues | DoingCloudStuff


author: Vincent Chan


Note: the code for this post can be found at this repo.

There are two flavors of SQS queues: Standard and FIFO (First-in-first-out).

For lambda functions with an SQS standard queue as their event source, AWS will run as many concurrent executions as the user-defined lambda concurrency allows (assuming, of course, that there are enough messages in the queue).

Because messages are spread across as many lambda executions as possible, ordering can be lost. By this, I mean that if events 1 through 20 were sent to the SQS queue in order (event 1 before event 2 before event 3, and so on), they may not be processed in that same order (event 13 might be processed before event 7, for instance).

Suppose two lambdas are executed as a result of the messages in the SQS queue. The first lambda polls events 1 through 10; the second, events 11 through 20. Since polling is relatively quick, the second lambda may well receive events 11 through 20 before the first lambda has even gotten around to processing event 2. Hence event 13, for example, may be processed before event 7.
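To make the two-lambda scenario concrete, here is a toy simulation in Python (no real SQS or Lambda involved). It assumes each of the two hypothetical consumers receives a contiguous batch of 10 events and that both work at the same speed, so their processing interleaves one event at a time:

```python
from itertools import zip_longest


def simulate_two_consumers(n_events: int = 20, batch_size: int = 10) -> list[int]:
    """Toy model: consumers process their own batches concurrently."""
    events = list(range(1, n_events + 1))
    # Each consumer receives a contiguous batch, as in the example above.
    batches = [events[i:i + batch_size] for i in range(0, n_events, batch_size)]
    processed = []
    # Simplification: equal speed, so work interleaves one event per step.
    for step in zip_longest(*batches):
        processed.extend(e for e in step if e is not None)
    return processed


order = simulate_two_consumers()
print(order[:6])                         # [1, 11, 2, 12, 3, 13]
print(order.index(13) < order.index(7))  # True: event 13 beats event 7
```

Each consumer handles its own batch in order, yet the global processing order no longer matches the send order.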

SQS FIFO queues, however, are advertised as following a first-in, first-out model (hence the name FIFO queue). Although at first glance (to me, at least) this sounds like all messages in a FIFO queue are ordered, that is not the case. Message ordering is only preserved for messages in the same message group (that is, messages with the same messageGroupId). As such, FIFO queues can be thought of as similar to Kinesis Data Streams or Kafka, with message groups playing the role of shards or partitions. The difference is that you can create arbitrarily many message groups, whereas the number of shards or partitions (although not fixed) is something to be decided upon with care.
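For reference, here is roughly how test messages could be sent to either queue type with boto3's `send_message`. This is a minimal sketch, not the code from the repo: the function name `send_test_messages`, the JSON body with `group` and `sent_at` fields, and the round-robin group assignment are all assumptions. `MessageGroupId` is only passed for FIFO queues, and FIFO queues also need a `MessageDeduplicationId` unless content-based deduplication is enabled:

```python
import json
import time
import uuid


def send_test_messages(sqs_client, queue_url: str, n_messages: int,
                       n_groups: int, fifo: bool) -> None:
    """Send n_messages, assigning message i to group i % n_groups (hypothetical helper)."""
    for i in range(n_messages):
        group = i % n_groups
        # sent_at plays the role of t_i; an int so DynamoDB can store it directly.
        body = json.dumps({"group": group, "sent_at": time.time_ns()})
        kwargs = {"QueueUrl": queue_url, "MessageBody": body}
        if fifo:
            # Ordering is only guaranteed among messages sharing a group id.
            kwargs["MessageGroupId"] = str(group)
            # Required unless content-based deduplication is enabled on the queue.
            kwargs["MessageDeduplicationId"] = uuid.uuid4().hex
        sqs_client.send_message(**kwargs)
```

With a real client this would be called as `send_test_messages(boto3.client("sqs"), queue_url, 100, 7, fifo=True)`, where `queue_url` stands in for your queue's URL. Passing the client in keeps the function testable with a stub.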

Reading about FIFO queues made me wonder: how can I confirm this behavior (that messages in a FIFO queue are ordered within each message group)? I eventually came up with a test, and this post is about that solution.

The code for this test is available in my GitLab repo. It comes with two deployment options: SAM (Serverless Application Model) and CDK (Cloud Development Kit).

The idea

The question is: given a sequence of messages in a queue (standard or FIFO), will those messages be processed in the same order in which they were written into the queue?

Instead of writing out actual timestamps, let me represent them as t_i, where t_i is the timestamp of message i.

In addition, let our lambda function take in a timestamp t_i (denoting when the message was written into the queue) and output an ordered pair (t_i, p_i), where p_i denotes when the message was processed and written into a database table.
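A lambda function along these lines could record the (t_i, p_i) pairs. This is a hypothetical sketch: the table name `ordering-test`, the message body format, and the mapping of t_i to `sk` and p_i to `timestamp` are assumptions (the table schema section only names the columns). It tells FIFO records apart by their `eventSourceARN`, since FIFO queue names must end in `.fifo`, and it uses integer nanoseconds because boto3's DynamoDB resource rejects Python floats:

```python
import json
import time


def handler(event, context, table=None):
    """Write one (t_i, p_i) item per SQS record in the batch (hypothetical)."""
    if table is None:
        import boto3  # deferred so the handler can be tested with a stub table
        table = boto3.resource("dynamodb").Table("ordering-test")  # hypothetical name
    for record in event["Records"]:
        body = json.loads(record["body"])  # assumed body: {"group": ..., "sent_at": ...}
        # FIFO queue names (and hence their ARNs) must end in ".fifo".
        prefix = "fifo" if record["eventSourceARN"].endswith(".fifo") else "std"
        table.put_item(Item={
            "pk": f"{prefix}#{body['group']}",
            "sk": body["sent_at"],        # t_i: when the message was written (assumed)
            "timestamp": time.time_ns(),  # p_i: when it was processed; int, not float
        })
```

Lambda only passes `event` and `context`, so in production `table` stays `None`; the extra parameter exists purely for testing.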

If ordering is preserved, then for any two events, say (t_1, p_1) and (t_2, p_2), if t_1 < t_2 then p_1 < p_2. All this says is that the order in which messages are processed is the same as the order in which they were written to the queue.
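Written out symbolically, the condition is the implication below. Assuming the t_i are distinct and the messages are indexed in send order (t_1 < t_2 < ... < t_n), transitivity makes it equivalent to the p_i forming an increasing chain, which is why checking neighbouring pairs is enough:

```latex
\forall\, i, j:\ t_i < t_j \implies p_i < p_j
\quad\Longleftrightarrow\quad
p_1 < p_2 < \cdots < p_n
\qquad (\text{given } t_1 < t_2 < \cdots < t_n)
```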

Suppose you have received a bunch of messages (with timestamps denoting their ordering) and have written the data to a database table. Then, to check whether the ordering is preserved, you can query all of the data in the table, sort it by the first field (t_i in our scenario), and check whether the second field (p_i) is sorted as well.
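The sort-and-check procedure just described might look like this in Python. `is_order_preserved` is a hypothetical helper operating on (t_i, p_i) tuples; it uses strict comparison on the assumption that no two processing timestamps coincide:

```python
def is_order_preserved(pairs: list[tuple[int, int]]) -> bool:
    """pairs are (t_i, p_i); True if sorting by t_i leaves p_i strictly increasing."""
    by_send_time = sorted(pairs, key=lambda pair: pair[0])
    processed = [p for _, p in by_send_time]
    # Comparing neighbours suffices because "<" is transitive.
    return all(a < b for a, b in zip(processed, processed[1:]))


print(is_order_preserved([(3, 15), (1, 10), (2, 11)]))  # True
print(is_order_preserved([(1, 10), (3, 15), (2, 16)]))  # False
```

In the second call, message 2 was sent before message 3 but processed after it, so the check fails.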

Table schema

As a final modification to this basic idea, I add a group number to each message. For FIFO queues, the messageGroupId is set to this group number. To check the ordering, queried messages are first grouped by messageGroupId; then, within each group, we sort by t_i and check whether the p_i's are sorted as well. Standard queues have no messageGroupId but, for consistency, their messages are also given a group number.

As such, we end up with the following:

  • partition key: "pk" (e.g. "std#2" for message group #2 from SQS standard queue and "fifo#6" for message group #6 from SQS FIFO queue)
  • sort key: "sk"
  • attribute: "timestamp"
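The repo deploys the table with SAM or CDK; purely for illustration, here is an equivalent definition expressed as keyword arguments for boto3's DynamoDB client `create_table`. The helper name `table_definition`, treating `sk` as a number (type `N`), and on-demand billing are assumptions. Only key attributes are declared, since DynamoDB does not need definitions for non-key attributes such as `timestamp`:

```python
def table_definition(table_name: str) -> dict:
    """Keyword arguments for boto3's DynamoDB client create_table (hypothetical)."""
    return {
        "TableName": table_name,
        "KeySchema": [
            {"AttributeName": "pk", "KeyType": "HASH"},   # e.g. "std#2" or "fifo#6"
            {"AttributeName": "sk", "KeyType": "RANGE"},  # assumed numeric
        ],
        # Non-key attributes such as "timestamp" are not declared here.
        "AttributeDefinitions": [
            {"AttributeName": "pk", "AttributeType": "S"},
            {"AttributeName": "sk", "AttributeType": "N"},
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }
```

Usage would be `boto3.client("dynamodb").create_table(**table_definition("ordering-test"))`, with `ordering-test` as a placeholder name.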

Example table data

Control

Before testing the FIFO queue, it's probably best to first confirm that the test behaves as expected on something familiar: SQS standard queues. The expectation here is simply that ordering is not preserved within each "group." Running the test and the confirmation script, we find that SQS standard queues do not preserve ordering (as expected).

The above picture in the section Table Schema contains a few rows of data regarding the standard SQS queue. As you can see, the data from the first two rows already demonstrate that order is not preserved.

Standard SQS queues do not preserve order

The actual test

The actual test script queries all data with a given prefix (either "std#" or "fifo#"), splits it into groups (for example, std#1, std#2, and so on), checks whether the data in both the sort key column sk and the attribute column timestamp are sorted, and prints the results.
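I haven't reproduced confirm.py here, but a script producing the same shape of report could look like the sketch below (the function names are hypothetical). It scans the table page by page, following `LastEvaluatedKey`, filters on the `pk` prefix client-side, groups items by `pk`, and sorts each group by `sk`. Because a DynamoDB `Query` on one partition key returns items in sort key order, the `sk` check comes out True essentially by construction, consistent with the output below; the informative column is `ts`:

```python
from collections import defaultdict


def scan_prefix(table, prefix: str) -> list[dict]:
    """Scan every page of the table, keeping items whose pk starts with prefix."""
    items, kwargs = [], {}
    while True:
        page = table.scan(**kwargs)
        # Client-side filter for simplicity; Attr("pk").begins_with(prefix) from
        # boto3.dynamodb.conditions could move this into a FilterExpression.
        items += [item for item in page["Items"] if item["pk"].startswith(prefix)]
        if "LastEvaluatedKey" not in page:
            return items
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]


def is_sorted(values: list) -> bool:
    return all(a <= b for a, b in zip(values, values[1:]))


def check_ordering(items: list[dict]) -> dict:
    """Group by pk, order each group by sk, then test both columns."""
    groups = defaultdict(list)
    for item in items:
        groups[item["pk"]].append(item)
    report = {}
    for pk, group in sorted(groups.items()):
        group.sort(key=lambda item: item["sk"])  # the order a per-pk Query returns
        report[pk] = {
            "sk": is_sorted([item["sk"] for item in group]),
            "ts": is_sorted([item["timestamp"] for item in group]),
        }
    return report
```

With a real table, `pprint(check_ordering(scan_prefix(boto3.resource("dynamodb").Table("ordering-test"), "std#")))` would print a dictionary in the same format as the results below (the table name is again a placeholder).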

For me, running python confirm.py (confirm.py is the name of my test script) returns the following:

----------
Testing SQS std queue:
Number of items: 100
----------
{'std#0': {'sk': True, 'ts': False},
 'std#1': {'sk': True, 'ts': False},
 'std#2': {'sk': True, 'ts': False},
 'std#3': {'sk': True, 'ts': False},
 'std#4': {'sk': True, 'ts': False},
 'std#5': {'sk': True, 'ts': False},
 'std#6': {'sk': True, 'ts': False}}

----------
Testing SQS fifo queue:
Number of items: 100
----------
{'fifo#0': {'sk': True, 'ts': True},
 'fifo#1': {'sk': True, 'ts': True},
 'fifo#2': {'sk': True, 'ts': True},
 'fifo#3': {'sk': True, 'ts': True},
 'fifo#4': {'sk': True, 'ts': True},
 'fifo#5': {'sk': True, 'ts': True},
 'fifo#6': {'sk': True, 'ts': True}}

For the SQS standard queue, for each message group, we see that the timestamp column (ts) is not sorted (False) even though the sort key column (sk) is. This confirms for us that the SQS standard queue failed to preserve order. Conversely, we can also see that the SQS FIFO queue does preserve order within each message group.

Final thoughts: can we force the SQS standard queue to preserve order?

As a caveat, I should add that there is a situation where the SQS standard queue does "preserve" the message order: when only one consumer ever polls messages from the queue. (Even then, standard queues only promise best-effort ordering, so occasional out-of-order or duplicate deliveries remain possible.) For example, in the context of AWS Lambda functions, you can force this by

  • setting the lambda concurrency to 1 and
  • attaching a resource-based policy to the SQS queue that prevents anyone or anything else from consuming its messages.
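With boto3, the first bullet could be done via the Lambda client's `put_function_concurrency`, which reserves (and caps) concurrency for a function. This is a minimal sketch; `limit_to_single_consumer` and the function name are placeholders. The event source mapping's own `MaximumConcurrency` setting cannot go below 2, so reserved concurrency is the setting that gets you to 1. Be aware that the SQS poller may still be throttled against this limit, and throttled messages become visible again only after the visibility timeout, which could itself reshuffle the order:

```python
def limit_to_single_consumer(lambda_client, function_name: str) -> None:
    """Reserve, and thereby cap, exactly one concurrent execution (hypothetical helper)."""
    lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=1,
    )
```

Usage: `limit_to_single_consumer(boto3.client("lambda"), "my-sqs-consumer")`, where `my-sqs-consumer` is a placeholder.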

(By the way, this is essentially how a FIFO queue would behave if all of its messages have the same messageGroupId.)

Please note that this "solution" also removes the parallelism that using lambda functions in conjunction with SQS queues provides. Depending on the context, though, perhaps that's okay.