How to autoincrement in DynamoDB if you really need to

Now and then a business process requires generating sequential numbers. For instance, you might need sequential, gapless numbers for some sort of audit trail, such as invoice numbers. At the same time, you want to utilize the power of the cloud and its elasticity to scale with growing demand, so the setup needs to be resilient to clients concurrently requesting new numbers. DynamoDB's data model does not offer the out-of-the-box autoincrement functionality of a relational database system. However, the task still needs to be solved, and such a requirement cannot reasonably justify switching to a relational database system and crossing DynamoDB off your list.

Let's see how we can address our requirement. To get started, we create a dedicated table that holds our counter and its current value. The following CloudFormation template generates a table called counters with a partition key named counterName.

AWSTemplateFormatVersion: "2010-09-09"
Resources:
  CountersTable:
    Type: 'AWS::DynamoDB::Table'
    Properties:
      TableName: counters
      AttributeDefinitions:
        - AttributeName: counterName
          AttributeType: S
      KeySchema:
        - AttributeName: counterName
          KeyType: HASH
      ProvisionedThroughput:
        ReadCapacityUnits: 1
        WriteCapacityUnits: 1

Let's initialize our table with a counter called importantCounter and set it to zero via the AWS CLI. To avoid getting lost in braces and overly long CLI commands, we put the actual data into a file called counter.json and pass it to the AWS CLI.

{
  "counterName": {
    "S": "importantCounter"
  },
  "currentValue": {
    "N": "0"
  }
}

The --item option of the put-item command takes a reference to our file. We also request the capacity consumed by this operation.

aws dynamodb put-item --table-name counters \
 --item file://counter.json \
 --return-consumed-capacity TOTAL

The result should look like this:

{
    "ConsumedCapacity": {
        "TableName": "counters",
        "CapacityUnits": 1.0
    }
}

With the initialization done, let's see how we get a new counter value in this setting. The obvious approach is to fetch the item holding our counter value (in this case importantCounter). Once we have the value, we increment it in our application and write the new value back to the table with a conditional write. The conditional write is necessary to guard against using the same number twice and accidentally overwriting a newer value. The approach requires two requests to DynamoDB and could look like this:

const AWS = require('aws-sdk');

const docClient = new AWS.DynamoDB.DocumentClient();

// Read the current counter value with a strongly consistent read.
docClient.get({
  "TableName": "counters",
  "Key": {
    "counterName": "importantCounter"
  },
  "ConsistentRead": true
}, function (err, data) {
  if (err) console.log(err);
  else {
    const currentValue = data.Item.currentValue;
    const newValue = currentValue + 1;
    // Write the new value back, but only if the counter still holds
    // the value we just read (optimistic locking).
    docClient.update({
      "TableName": "counters",
      "ReturnValues": "UPDATED_NEW",
      "ExpressionAttributeValues": {
        ":current": currentValue,
        ":newValue": newValue
      },
      "ExpressionAttributeNames": {
        "#currentValue": "currentValue"
      },
      "ConditionExpression": "#currentValue = :current",
      "UpdateExpression": "SET #currentValue = :newValue",
      "Key": {
        "counterName": "importantCounter"
      }
    }, function (err, data) {
      if (err) console.log(err);
      else console.log(data);
    });
  }
});

If two processes try to update the counter based on the same value, the second one will fail because its ConditionExpression no longer holds. To handle this type of situation, we need additional logic in our application code that restarts the entire process (reading the current counter value and attempting the conditional update) whenever a ConditionalCheckFailedException occurs. This is not the end of the world and doesn't blow the complexity of our code out of proportion, but it isn't pretty either. Let's briefly discuss the RCUs and WCUs necessary for the process described above. The get operation needs one RCU (we are using a strongly consistent read), and the update operation consumes one WCU regardless of whether it succeeds or fails.
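The retry logic can be isolated in a small helper. The following sketch (the helper name and shape are my own, not part of the AWS SDK) retries an operation whenever it rejects with a ConditionalCheckFailedException, up to a maximum number of attempts:

```javascript
// Hypothetical helper: retries fn() as long as it rejects with a
// ConditionalCheckFailedException, up to maxAttempts tries in total.
// Any other error, or exhausting the attempts, is passed on to the caller.
async function withConditionRetry(fn, maxAttempts = 5) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const retryable = err.code === 'ConditionalCheckFailedException';
      if (!retryable || attempt === maxAttempts) throw err;
      // Another writer won the race; loop again for a fresh attempt.
    }
  }
}
```

In the read-then-conditional-write approach, fn would perform both the get and the conditional update, so every retry starts from a freshly read counter value.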

Update Expression

DynamoDB's update functionality can manipulate item attributes in place without fetching the item first. We have seen above that the UpdateExpression includes an assignment of the new value. For numeric attributes, we can also add a number to the existing value right within the UpdateExpression, which is essentially what we did above: read the value and add one. With a properly formulated UpdateExpression, we can skip the read entirely and let DynamoDB return the updated value.

The new code looks like this:

const AWS = require('aws-sdk');

const docClient = new AWS.DynamoDB.DocumentClient();

// Atomically add :a to currentValue; DynamoDB applies the addition
// server-side, so no prior read is required.
docClient.update({
  "TableName": "counters",
  "ReturnValues": "UPDATED_NEW",
  "ExpressionAttributeValues": {
    ":a": 1
  },
  "ExpressionAttributeNames": {
    "#v": "currentValue"
  },
  "UpdateExpression": "SET #v = #v + :a",
  "Key": {
    "counterName": "importantCounter"
  }
}, function (err, data) {
  if (err) console.log(err);
  else console.log(data);
});

and returns the following data if the counter's value was 6 before running the update command:

{ Attributes: { currentValue: 7 } }

It adds one to our counter item and returns the new value. This approach frees us from retrying failed updates and eliminates the need to read the counter's value first, so the operation consumes no RCUs at all: DynamoDB delivers the counter's new value for free with the update call, as long as we ask for the updated item back ("ReturnValues": "UPDATED_NEW"). An additional advantage is that the operation is strongly consistent by design.

If we compare both approaches, it becomes evident that the second one is a more cost-effective, scalable, and cloud-native way to satisfy our requirement of providing numbers sequentially and in a gapless fashion.
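The atomic increment generalizes nicely to multiple counters and arbitrary step sizes. The sketch below (the helper name is my own invention, not part of the AWS SDK) builds the DocumentClient update parameters for any counter in the table, so one code path can serve invoice numbers, order numbers, and so on:

```javascript
// Hypothetical helper: builds DocumentClient.update() parameters that
// atomically increment the named counter by `step` and return the new value.
function buildIncrementParams(tableName, counterName, step = 1) {
  return {
    TableName: tableName,
    Key: { counterName: counterName },
    UpdateExpression: 'SET #v = #v + :step',
    ExpressionAttributeNames: { '#v': 'currentValue' },
    ExpressionAttributeValues: { ':step': step },
    ReturnValues: 'UPDATED_NEW'
  };
}
```

With this helper, the update call from above shrinks to `docClient.update(buildIncrementParams('counters', 'importantCounter'), callback)`.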

Regardless of which approach is chosen, the implementation has to tolerate or mitigate failure cases, for example a process crashing before it can persist its data. Such a situation can leave an orphaned number behind and ultimately produce gaps in the sequence.

The complete sample code is available here. It includes the CloudFormation template, sample data, and the two counter approaches. A Node.js environment, as well as an AWS account, are required to run it.
