Choosing the right identity

When working with databases, a problem we all face, is how to uniquely identify each new record that gets introduced into the system. Most of the time we want some kind of generated sequential number. In particular we’ll look at how an ORM can be configured to have these identifiers generated for you in a central place like the database. An ORM will hide most of the dirty details and therefore is often considered as a kind of magic box. That’s why people tend to stick to the defaults, even though this is not always the best approach and can be the cause of many headaches. This post will explain some of the different options available and their benefits, so you can at least make an informed choice instead of blindly using the defaults.

Opening the magic box

The default identity generation strategy of the magic box (sometimes known as ORM, JPA or Hibernate) has at least two major problems:

1. There are two alternatives to the default one and you’ll find that each of these has their own pros and cons. It is important to know what you’re choosing as this impacts how the system performs in production. What works for 10 records of test data on your pc, will simply not behave the same for 1 million records in a distributed production environment.

2. The magic box can provide a nice level of abstraction, gets you started faster, lets you type less code,… But it should not impede you from understanding how it works. Never forget that a tool is not the same as a black box. If production crashes or performs badly: it is YOU who will debug it (including all the nice frameworks and tools). YOU should know the choices you make when designing your software and why.

The options

For the purpose of this post, we’ll stick to the standardized id generation strategies in the JPA specification: Identity, Sequence and Table.

Identity

Identity is what most people use by default. It delegates the id generation of new records to what is basically a counter in the database system. After each insert it will query the database for the generated id value, so the id of the created record can be used when defining relationships with other records.

Code example:

SQL for inserting 5 records:

As you can see, identity_val_local is called after every insert statement to get the generated id, resulting in a total of 10 statements.

Pros:

  • Very easy to use and understand
  • Little special configuration on the database (portable and happy DB admins)

Cons:

  • Performance: the system doesn’t know which id the database generated, so it needs to query the database for it after each insert. This results in general in more round trips from the application to the database.
  • Scalability: batch insertion of data from multiple servers can become problematic due to increased contention on the database.

Sequence

Sequence is also a commonly used alternative. A sequence generator is defined in the database and atomically increases its number every time a request comes in.

The system will request a block of ids using the sequence generator. For example:

  1. server1 requests a block and receives 1 – 100,
  2. server2 requests a block and receives 101 – 200,
  3. server1 has exhausted its block, requests a new range of ids and receives 201 – 300

The configuration has two additional parameters that are of special interest:

  • initialValue : the initial value of the sequence which will be the lowest generated id.
  • allocationSize : size of reserved block for an application. This should be large enough to reduce contention on the database sequence. However if it is too big, there will be large gaps in the generated ids when a transaction was rolled back or the server was restarted.

Code example:

SQL for inserting 5 records:

As you can see Hibernate will send requests to reserve its block of identifiers and use these in the subsequent insert statements. That’s why the system can reduce the number of statements it needs to send to the database.

Pros:

  • Good performance. The application chooses the ids from its reserved block. It doesn’t need to contact the database to find out the generated ids, so it can optimize and group the SQL statements it generates.
  • Good scalability. By using a sufficiently large block size, the contention on the database sequence can be reduced.

Cons:

  • Not all databases support sequences (but all the major ones do).
  • Special configuration of the sequence in the database.
  • A little more complex to configure in the code.

Table

The table generation strategy basically works the same as the sequence strategy, but uses a simple table to store its value instead of special database feature.

Code example:

SQL for inserting 5 records:

This example is very similar to the example above, but it uses a row in the hibernate_sequences table to keep track of the value of the sequence. Note the “for update” in the sql statements: this is the lock that Hibernate requests on the sequence row to prevent concurrency issues.

Acquiring the next block of ids is a little bit more work compared to using the sequences. This effect can be ignored when the blocks are sufficiently large as it only happens every time 100 ids have been used (in the example above).

Pros:

  • Same as the sequence strategy: good performance and scalability.
  • Very portable due to usage of a simple table instead of a special database feature.

Cons:

  • A little more complex to configure in the code.
  • In general a little slower than sequence generators.

Conclusion

The important lesson for me is that there are multiple options. I only presented the three standardized ones for this post. Depending on your choice of technology, more may be available. It is very important to remember what you are using and the impact that your choice has on your system.

For simple cases where scalability and performance is not an issue (like for example master data tables that don’t change often), using Identity is perfectly fine. For tables that will receive a lot of concurrent inserts, using a sequence generator makes more sense.

In general everything comes back to one principle: open the magic box and have a look inside!

All code is available at http://github.com/petermylemans/demos/tree/master/jpa-id-strategy

Leave a Reply

Your email address will not be published. Required fields are marked *