AWS SQS Dead-letter queue configuration and re-drive ( reprocess of failed messages )
This post is a continuation of the AWS SNS -> SQS integration that we saw in the previous posts:
- Integrate AWS SNS with Spring Boot using spring-cloud messaging for HTTP callback hooks
- Integration of messages from AWS SNS -> SQS -> Spring boot cloud messaging for high-traffic events
In the above posts, we were able to set up a full integration of AWS SNS -> AWS SQS -> Spring boot client. For a production setup, we cannot assume that all the messages will be processed by the client. There could be messages rejected by the client ( due to differences in the format used by the publisher or for any other reason ). These rejected messages must be handled properly to avoid issues in the queue.
There are multiple ways to handle this. The crude way is to ignore the message when not able to process it. But this is not a good design as we lose messages that could be important from a distributed system standpoint.
The recommended way is to set up a dead-letter queue to capture all the rejected messages ( after a certain threshold of retries ) and put them into a different queue. From this queue, we can inspect the messages and either delete them ( if not relevant ) or re-drive them to the original queue for re-processing after fixing the client or maybe forward them to a new queue altogether.
Use case and concepts
We are going to design a queue message recovery using the dead-letter queue in Amazon SQS. Following are the basic requirements.
Use-case
- We should be able to configure a dead-letter queue to a primary queue.
- Any messages that are not deliverable after a certain threshold of retries should be redirected to the dead-letter queue.
- We should be able to inspect the messages in the dead-letter queue and decide whether to:
  - Delete them
  - Redrive them back to the original source queue
  - Redrive them to a different queue
Dead-letter flow
- The message is received by the source-queue
- The source-queue tries to deliver it to the receiver
- The receiver does not receive ( or is not able to process ) the message after N retries
- SQS moves the message to the configured dead-letter queue
- Users can check the messages in the dead-letter queue and see whether:
  - The message format is wrong → discard the message
  - The message format is fine, but the receiver does not have handling for it → In this case, update the receiver configuration and then redrive the message back to the source queue ( or a different queue )
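The flow above can be sketched as a small in-memory simulation. This is only an illustration of the retry threshold and is not how SQS is implemented internally; the function name `deliver_with_dlq` and the `handler` callback are hypothetical names introduced for this example.

```python
from collections import deque


def deliver_with_dlq(messages, handler, max_receive_count):
    """Simulate a source queue with a dead-letter queue, entirely in memory.

    handler(body) returns True when the receiver processed the message.
    A message that has already been received max_receive_count times
    without success is moved to the dead-letter list on the next attempt.
    """
    source = deque({"body": body, "receive_count": 0} for body in messages)
    processed, dead_letter = [], []

    while source:
        message = source.popleft()
        if message["receive_count"] >= max_receive_count:
            # Threshold reached: move to the dead-letter queue instead of delivering.
            dead_letter.append(message["body"])
            continue

        message["receive_count"] += 1
        if handler(message["body"]):
            # Successful processing: the receiver deletes the message.
            processed.append(message["body"])
        else:
            # Failed processing: the message goes back to the source queue.
            source.append(message)

    return processed, dead_letter
```

For example, with a handler that only accepts bodies starting with `{` and a threshold of 3, a malformed message is attempted three times and then ends up in the dead-letter list, while well-formed messages are processed normally.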
Redrive
Redrive is the process of moving the messages from a dead-letter queue to the source queue or a different queue for re-processing.
AWS Setup
We are going to assume that there is an existing SQS queue ( You can refer to the previous blog post on how to create an SQS queue ).
Define a dead-letter queue
Let’s start by defining an independent queue that will act as the dead-letter queue.
Go to AWS -> SQS -> Create Queue
- Create a new queue with a different name (We need to ensure that the type of the queue is selected in accordance with the type of the source queue. You can only use standard dead-letter queues for standard queues and FIFO dead-letter for FIFO queues )
- Leave the configuration to default values
- Disable the encryption ( You can enable this, but ensure that the KMS key access is set up properly for both the source and dead-letter queues )
- Leave the Access Policy to default.
- Configure the redrive allow policy to Enabled and specify the queues that can use the current queue as a dead-letter queue ( By default, it will be allowed for any queue under the same account and region ).
- Save the configuration
There are no additional access configurations to be done in the DLQ or the source queue as long as both queues are owned by the same account.
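The same DLQ can also be created from code. The following is a minimal sketch, assuming `sqs` is a boto3 SQS client ( for example `boto3.client("sqs")` ); the helper name `create_dead_letter_queue` is hypothetical. It creates a standard queue with default settings and sets the `RedriveAllowPolicy` attribute, which corresponds to the redrive allow policy section in the console.

```python
import json


def create_dead_letter_queue(sqs, name, source_queue_arns=None):
    """Create a standard queue meant to be used as a DLQ; return (url, arn).

    sqs is assumed to be a boto3 SQS client. When source_queue_arns is given,
    only those queues may use this queue as their dead-letter queue;
    otherwise any queue in the same account and region is allowed.
    """
    if source_queue_arns:
        allow_policy = {
            "redrivePermission": "byQueue",
            "sourceQueueArns": list(source_queue_arns),
        }
    else:
        allow_policy = {"redrivePermission": "allowAll"}

    queue_url = sqs.create_queue(
        QueueName=name,
        Attributes={"RedriveAllowPolicy": json.dumps(allow_policy)},
    )["QueueUrl"]

    # The ARN is needed later when mapping the DLQ in the source queue.
    attributes = sqs.get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=["QueueArn"]
    )["Attributes"]
    return queue_url, attributes["QueueArn"]
```

A call such as `create_dead_letter_queue(sqs, "dead-letter-queue")` would create the queue used in the rest of this post. For a FIFO source queue, the DLQ would also need to be a FIFO queue, which this sketch does not cover.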
Mapping DLQ in the source queue
Now, we need to map the DLQ in the source queue. This can be done by editing the source queue and going to the “Dead letter queue” section
- Choose enabled
- Select the DLQ defined
- Set the maximum receives ( count of tries ) after which the message is to be moved to the DLQ
That’s it. Now we have set up dead-letter-queue as the dead-letter queue for test-queue. Any messages that are not processed after 10 receives will be pushed to dead-letter-queue.
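The mapping done in the console corresponds to the `RedrivePolicy` attribute of the source queue. Below is a minimal sketch, assuming `sqs` is a boto3 SQS client; the helper names `build_redrive_policy` and `attach_dead_letter_queue` are hypothetical and only used for illustration.

```python
import json


def build_redrive_policy(dlq_arn, max_receive_count):
    """Return the JSON string expected by the RedrivePolicy queue attribute."""
    return json.dumps(
        {
            "deadLetterTargetArn": dlq_arn,
            "maxReceiveCount": str(max_receive_count),
        }
    )


def attach_dead_letter_queue(sqs, source_queue_url, dlq_arn, max_receive_count=10):
    """Map the DLQ ( by ARN ) in the source queue.

    sqs is assumed to be a boto3 SQS client. After max_receive_count
    receives without a delete, SQS moves the message to the DLQ.
    """
    sqs.set_queue_attributes(
        QueueUrl=source_queue_url,
        Attributes={"RedrivePolicy": build_redrive_policy(dlq_arn, max_receive_count)},
    )
```

Here `source_queue_url` would be the URL of test-queue and `dlq_arn` the ARN of dead-letter-queue. The default of 10 matches the maximum receives used in this post.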
Testing DLQ
For testing the DLQ, you can make the receiver reject the message. If you don’t have access to the receiver, do the following from the AWS console.
- Go to the source queue
- Edit the dead-letter queue settings and set “Maximum receives” to 2
- Post a message to the source queue
- Go to the source queue in AWS → Send and receive messages → Poll for messages
- Poll for messages again
- Poll for messages once more
- Check the receive count and it should show 3
- Go to the DLQ and check the messages ( Send and receive messages → Poll for messages )
- This particular message should be available there and removed from the source queue
NOTE: Polling for messages from the AWS console counts as a receive, just like a regular consumer. Hence, when a message has been received more than 2 times, it will be sent to the DLQ. Make sure to set the maximum receives back to the required number after testing.
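The console test can also be approximated from code by receiving a message repeatedly without ever deleting it. This is a rough sketch, assuming `sqs` is a boto3 SQS client and that the source queue holds a single test message; the helper name `receive_without_deleting` is hypothetical.

```python
def receive_without_deleting(sqs, queue_url, attempts):
    """Receive from the queue `attempts` times and never delete the message.

    sqs is assumed to be a boto3 SQS client. Returns the receive counts
    reported by SQS for every message seen. Once the count exceeds the
    maximum receives of the queue, the message is no longer returned by
    the source queue and should show up in the DLQ.
    """
    seen_counts = []
    for _ in range(attempts):
        response = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=1,
            WaitTimeSeconds=5,
            # Make the message visible again right away for the next attempt.
            VisibilityTimeout=0,
            AttributeNames=["ApproximateReceiveCount"],
        )
        for message in response.get("Messages", []):
            seen_counts.append(int(message["Attributes"]["ApproximateReceiveCount"]))
    return seen_counts
```

With the maximum receives set to 2, three attempts against the source queue would be expected to return the message twice and then nothing, after which the message can be looked up in the DLQ.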
Redrive from DLQ ( Reprocess back to source queue )
Once the messages are in the DLQ, there are different ways to handle the un-processed messages.
Delete the messages
Check if the messages are relevant and if not, delete them.
- Go to SQS -> Queues -> dead-letter-queue -> Send and receive messages
- Click on Poll for messages
- Check the individual messages and delete
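When there are many messages, the inspect-and-delete step can be scripted. The sketch below assumes `sqs` is a boto3 SQS client; the helper name `delete_irrelevant_messages` and the `is_relevant` callback are hypothetical, and the rule for what counts as relevant is up to the application.

```python
def delete_irrelevant_messages(sqs, dlq_url, is_relevant, max_polls=10):
    """Poll the DLQ and delete every message whose body is not relevant.

    sqs is assumed to be a boto3 SQS client and is_relevant(body) a
    caller-supplied check. Relevant messages are left in the DLQ so that
    they can be redriven later. Returns the number of deleted messages.
    """
    deleted = 0
    for _ in range(max_polls):
        response = sqs.receive_message(
            QueueUrl=dlq_url,
            MaxNumberOfMessages=10,
            WaitTimeSeconds=1,
        )
        messages = response.get("Messages", [])
        if not messages:
            break  # Nothing visible in the DLQ right now.

        for message in messages:
            if not is_relevant(message["Body"]):
                sqs.delete_message(
                    QueueUrl=dlq_url,
                    ReceiptHandle=message["ReceiptHandle"],
                )
                deleted += 1
    return deleted
```

Messages that are kept stay hidden until the visibility timeout of the DLQ expires, so it is safer to wait for that period before starting a redrive.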
Re-process the message ( Re-drive ) to the original or different queue
Suppose the messages were not understood by the client ( maybe due to a publisher’s new format ). Once the fix has been deployed, we can have the messages reprocessed.
NOTE: There is no option to manually select messages and re-drive. We need to delete the ones not needed and re-drive the remaining.
Go to the dead-letter-queue → “Start DLQ Redrive” button
This will show the options to:
- Send back to the original source queue
- Send them to a different destination queue ( You can choose an existing queue from the list )
You can also view the message and delete them from DLQ if needed.
Once done, choose the “DLQ Redrive” option to start the redrive. If you chose to redrive to the source queue, the messages are removed from the dead-letter queue and moved to the source queue. You may go back to the source queue and check for the messages.
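The redrive can also be started through the SQS API using a message move task. This is a minimal sketch, assuming `sqs` is a boto3 SQS client from a boto3 version that includes `start_message_move_task`; the helper name `start_redrive` is hypothetical.

```python
def start_redrive(sqs, dlq_arn, destination_arn=None, max_messages_per_second=None):
    """Start moving messages out of the DLQ and return the task handle.

    sqs is assumed to be a boto3 SQS client. When destination_arn is not
    given, SQS sends the messages back to their original source queue;
    otherwise they are moved to the given destination queue.
    """
    params = {"SourceArn": dlq_arn}
    if destination_arn is not None:
        params["DestinationArn"] = destination_arn
    if max_messages_per_second is not None:
        # Optional throttle so that the receiver is not flooded.
        params["MaxNumberOfMessagesPerSecond"] = max_messages_per_second

    return sqs.start_message_move_task(**params)["TaskHandle"]
```

As in the console, the task moves all the messages currently in the DLQ, so anything that should not be reprocessed needs to be deleted before calling this.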
Footnotes
Dead-letter queues are an effective way to handle undeliverable messages. This ensures that our system does not miss any messages due to the inability to process them.