Unique Identifier Challenges

I came across an interesting security article by penetration tester Daniel Thatcher discussing a proposed attack against older versions of UUIDs (Daniel also created a neat challenge at the end). This reminded me that it wasn’t until Java 5 that version 4 UUIDs were supported at all.
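For reference, generating a version 4 UUID on any JDK since Java 5 takes a single call to the standard java.util.UUID class. The sketch below is only an illustration; the class name RandomUuidExample and the newId helper are made up for this post.

```java
import java.util.UUID;

final class RandomUuidExample {
    // UUID.randomUUID() returns a version 4 (random) UUID and is documented
    // as using a cryptographically strong pseudo random number generator.
    static UUID newId() {
        return UUID.randomUUID();
    }

    public static void main(String[] args) {
        UUID id = newId();
        // version() reports the UUID version; variant() reports the layout variant.
        System.out.println(id + " version=" + id.version() + " variant=" + id.variant());
    }
}
```

The version() check is a handy way to confirm what kind of UUID an upstream system is actually handing you.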

Earlier in my career, whilst working in defence, we ended up calling Windows GUID generation over COM from Java 1.4 because no library available to us could produce cryptographically strong random numbers (Java would not access the hardware or other sources of entropy in the OS, such as the number of currently running threads, to use as a seed). At another employer, we had hardware crypto cards that provided this for us.

These experiences shaped the design of identifiers in Skyve. Using something universally unique and not derived from a data store means the chance of a clash is negligible (not before the heat death of the sun, anyway). Using database sequences, however, means going to the trouble of defining separate ranges for different systems.

Earlier systems hit another pitfall when using database sequences as identifiers: in poorly secured applications, an attacker could guess neighbouring values and navigate to data they should not have access to by manipulating URLs or API requests.

Skyve uses a document number generator to create human-readable sequential identifiers, but these always sit alongside, and never replace, the unique identifier Skyve uses internally to identify the record.
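The pairing of an internal identifier with a human-readable number can be sketched roughly as follows. This is not Skyve's API: DocumentNumberGenerator and Invoice are hypothetical names, and the in-memory counter merely stands in for whatever actually issues the numbers.

```java
import java.util.Locale;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a document number generator; not Skyve's API.
final class DocumentNumberGenerator {
    private final AtomicLong counter = new AtomicLong(1);

    // Returns the next human-readable number for the given prefix.
    String next(String prefix) {
        return String.format(Locale.ROOT, "%s-%06d", prefix, counter.getAndIncrement());
    }
}

// Hypothetical record type carrying both kinds of identifier.
final class Invoice {
    private final String id = UUID.randomUUID().toString(); // internal identity
    private final String documentNumber;                     // shown to people

    Invoice(String documentNumber) {
        this.documentNumber = documentNumber;
    }

    String getId() {
        return id;
    }

    String getDocumentNumber() {
        return documentNumber;
    }

    public static void main(String[] args) {
        DocumentNumberGenerator numbers = new DocumentNumberGenerator();
        Invoice invoice = new Invoice(numbers.next("INV"));
        System.out.println(invoice.getDocumentNumber() + " -> " + invoice.getId());
    }
}
```

People quote the document number on the phone or in an email; lookups, relationships and URLs use the internal identifier, so the sequential number never becomes something an attacker can walk.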

In distributed, service-oriented systems, you need unique identifiers all the time. It is also useful to be able to identify records that have not been persisted yet, transient state such as sessions and conversations, and short-lived data such as correlation IDs for messages.
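As a small illustration of the correlation ID case, here is a minimal sketch in which a request is given an identifier the moment it is created, and the reply carries the same value so the two can be matched. The Message class and its methods are hypothetical and exist only for this example.

```java
import java.util.UUID;

// Hypothetical message type for illustration only.
final class Message {
    private final String correlationId;
    private final String payload;

    private Message(String correlationId, String payload) {
        this.correlationId = correlationId;
        this.payload = payload;
    }

    // Starts a new exchange with a fresh correlation ID; no data store is involved.
    static Message request(String payload) {
        return new Message(UUID.randomUUID().toString(), payload);
    }

    // A reply reuses the request's correlation ID so the two can be matched up.
    Message reply(String payload) {
        return new Message(correlationId, payload);
    }

    String getCorrelationId() {
        return correlationId;
    }

    String getPayload() {
        return payload;
    }

    public static void main(String[] args) {
        Message ping = Message.request("ping");
        Message pong = ping.reply("pong");
        System.out.println(ping.getCorrelationId().equals(pong.getCorrelationId()));
    }
}
```

Nothing here is ever written to a database, yet every exchange is still uniquely identifiable across services.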

Another benefit of UUIDs is that generating one is faster than fetching a sequence number, because it needs no round trip to the database. It also makes the architecture less hub and spoke, with fewer dependencies. The downside is that randomly generated UUIDs are not ascending, so they fragment database B-Tree indexes.
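The ordering problem is easy to see with a rough experiment: count how often a newly generated key sorts lower than the key generated just before it. The sketch below compares sequence-style numbers with the string form of random UUIDs. The class and method names are invented for this example, and how much fragmentation really results depends on the database and column type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

final class InsertionOrderDemo {
    // Counts keys that sort lower than the key generated immediately before them.
    static <T extends Comparable<T>> int outOfOrderCount(List<T> keys) {
        int count = 0;
        for (int i = 1; i < keys.size(); i++) {
            if (keys.get(i).compareTo(keys.get(i - 1)) < 0) {
                count++;
            }
        }
        return count;
    }

    // Random UUIDs in their string form, in the order they were generated.
    static List<String> randomKeys(int n) {
        List<String> keys = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            keys.add(UUID.randomUUID().toString());
        }
        return keys;
    }

    // Stand-in for values handed out by a database sequence: 1, 2, 3, ...
    static List<Long> sequenceKeys(int n) {
        List<Long> keys = new ArrayList<>(n);
        for (long i = 1; i <= n; i++) {
            keys.add(i);
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println("sequence out of order: " + outOfOrderCount(sequenceKeys(1000)));
        System.out.println("uuid out of order:     " + outOfOrderCount(randomKeys(1000)));
    }
}
```

The sequence count is always zero, so every new key lands at the right-hand edge of the index. The UUID count is typically around half of the keys, which means inserts land all over the tree and pages split as they fill.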

Large systems should reindex during periods of low activity if possible. Some of our larger production applications rebuild their indexes every Sunday morning, for example.