The Problem With UUIDs


I've wanted to make a video about UUIDs for a while now, and I'm thrilled to finally have the opportunity, thanks to my sponsor, PlanetScale. Although PlanetScale is a sponsor, they have given me no input on the content of this video. I just wanted to take the time to get it exactly right.

The problem with using a UUID as a primary key in MySQL is that it can hurt database performance. UUIDs are designed to be unique across all systems, and many developers are inclined to use them as primary keys for their records. However, there are several trade-offs to doing so when compared to an auto-incrementing integer.

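When a new record is inserted into a MySQL table, the index on the primary key must be updated as well. In InnoDB, MySQL's default storage engine, indexes are B+ trees: multi-level structures that let queries quickly locate the data they need, and the rows themselves live in the leaf pages of the primary key index. With an auto-incrementing key, every new value is larger than the last, so inserts always land at the right edge of the tree. With random UUIDs, each insert lands at an arbitrary position, so MySQL has to locate, load, and often split pages scattered throughout the tree. On a high-volume database this extra work can significantly slow down writes, which can ultimately hurt the user experience.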
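To make the difference concrete, here is a minimal Python sketch that models the index as a single sorted list and counts how many inserts land at its right edge, the cheap append case. It is a simplification: a real B+ tree is paged and multi-level, and comparing uuid.uuid4().int values is assumed to match the byte order a BINARY(16) column would use. The helper name right_edge_inserts is hypothetical.

```python
import bisect
import uuid


def right_edge_inserts(keys):
    """Count inserts that land at the end of a sorted index (the append case).

    Toy model: the whole index is one sorted Python list, not a paged B+ tree.
    """
    index = []
    at_end = 0
    for key in keys:
        pos = bisect.bisect_right(index, key)  # where the key belongs in sort order
        if pos == len(index):
            at_end += 1  # larger than every existing key: append at the right edge
        index.insert(pos, key)
    return at_end


n = 2000
sequential_ids = range(1, n + 1)                   # auto-increment style keys
random_ids = [uuid.uuid4().int for _ in range(n)]  # random UUIDv4 values as 128-bit ints

print(right_edge_inserts(sequential_ids))  # 2000: every insert is an append
print(right_edge_inserts(random_ids))      # small: most inserts land mid-index
```

For random keys the count grows only about as fast as ln(n), so almost every insert touches some arbitrary point in the middle of the index instead of its right edge.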

Additionally, the primary key is always indexed in MySQL, and UUIDs consume more storage than auto-incrementing integers. Stored in a compact binary format, a single UUID takes 128 bits, compared to 32 bits for an INT. If you instead choose the more human-readable string representation, each UUID is stored as a CHAR(36), which takes at least 36 bytes (288 bits), and multibyte character sets such as utf8mb4 can reserve even more. That is at least nine times the data of the 32-bit integer it replaces. In InnoDB the cost compounds, because every secondary index entry also stores a copy of the primary key.
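The storage arithmetic can be checked with Python's standard uuid module. This is a rough sketch that assumes a single-byte character set for the CHAR(36) case and ignores per-record overhead; the table size and number of secondary indexes in the estimate are hypothetical.

```python
import uuid

u = uuid.uuid4()

INT_BITS = 32                      # MySQL INT
BINARY_BITS = len(u.bytes) * 8     # BINARY(16): 16 bytes -> 128 bits
CHAR36_BITS = len(str(u)) * 8      # CHAR(36), one byte per ASCII character -> 288 bits

print(BINARY_BITS, CHAR36_BITS)    # 128 288
print(BINARY_BITS / INT_BITS)      # 4.0
print(CHAR36_BITS / INT_BITS)      # 9.0

# InnoDB stores the primary key once in the clustered index and again in every
# secondary index entry, so the extra bytes per key are paid repeatedly.
# Hypothetical table: 10 million rows with 3 secondary indexes.
rows, secondary_indexes = 10_000_000, 3
extra_bytes_per_key = (CHAR36_BITS - INT_BITS) // 8  # 32 extra bytes per key
extra_total = rows * (1 + secondary_indexes) * extra_bytes_per_key
print(round(extra_total / 1024**2), "MiB of extra key data")
```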

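Finally, page splitting can hurt both storage utilization and performance when UUIDs are used as primary keys. InnoDB stores index entries in fixed-size pages (16 KB by default). With sequential keys, a full page is simply followed by a new one, so pages end up nearly full. With random keys, a new value often belongs in the middle of a page that is already full, so InnoDB splits that page in two, leaving each half only about 50% utilized. As a result, the index can occupy far more pages than the data actually requires.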
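Page splitting can be illustrated with a toy model of the B+ tree's leaf level. This Python sketch assumes a hypothetical capacity of 16 keys per page (real InnoDB pages hold a variable number of entries), splits an overflowing page in half on mid-page inserts, and starts a fresh page when a key is appended past the right edge, roughly mirroring how InnoDB treats sequential inserts. It ignores internal nodes, deletes, and InnoDB's reserved free space; simulate_leaf_pages is a hypothetical name.

```python
import bisect
import uuid

PAGE_CAPACITY = 16  # hypothetical keys per leaf page in this toy model


def simulate_leaf_pages(keys, capacity=PAGE_CAPACITY):
    """Insert keys into a toy leaf level; return (pages, average fill factor)."""
    pages = []   # each page is a sorted list of keys; pages are in key order
    firsts = []  # first key of each page, kept in sync with pages
    for key in keys:
        if not pages:
            pages.append([key])
            firsts.append(key)
            continue
        # Find the page whose key range covers this key.
        i = max(bisect.bisect_right(firsts, key) - 1, 0)
        page = pages[i]
        bisect.insort(page, key)
        firsts[i] = page[0]
        if len(page) <= capacity:
            continue
        if i == len(pages) - 1 and page[-1] == key:
            # Append past the right edge: leave the full page alone, start a new one.
            new_page = [page.pop()]
        else:
            # Mid-page insert into a full page: split it into two halves.
            mid = len(page) // 2
            new_page = page[mid:]
            del page[mid:]
        pages.insert(i + 1, new_page)
        firsts.insert(i + 1, new_page[0])
    fill = sum(len(p) for p in pages) / (len(pages) * capacity)
    return pages, fill


n = 5000
_, seq_fill = simulate_leaf_pages(range(n))
_, rand_fill = simulate_leaf_pages(uuid.uuid4().int for _ in range(n))
print(f"sequential: {seq_fill:.0%}")  # 100% (rounded)
print(f"random:     {rand_fill:.0%}")
```

In this model sequential keys leave pages essentially full, while random keys leave a mix of freshly split half-full pages and fuller ones, so the same number of keys needs noticeably more pages.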

To minimize the negative side effects of using UUIDs as primary keys in MySQL, there are a few best practices you can follow. First, store UUIDs in their native binary format using the BINARY(16) data type, which reduces the storage requirement to 16 bytes; MySQL 8.0 provides UUID_TO_BIN() and BIN_TO_UUID() to convert between the string and binary forms. Additionally, using an ordered UUID variant, such as a time-ordered UUID version 7, can mitigate some of the performance and storage impact by making generated values more sequential, which avoids some of the page splitting described earlier.
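As a rough illustration of an ordered variant, here is a Python sketch that builds a time-ordered UUID following the version 7 bit layout from RFC 9562: a 48-bit Unix timestamp in milliseconds, then the version and variant bits, with the remaining bits random. The helper make_uuid7 is hypothetical and hand-rolled, not taken from any particular library. It shows that the 16-byte form, the value you would store in a BINARY(16) column, sorts in creation order. Within a single millisecond this sketch orders IDs randomly; RFC 9562 describes optional counter methods for stricter monotonicity, which are omitted here.

```python
import os
import time
import uuid
from typing import Optional


def make_uuid7(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Hypothetical helper: build a UUID using the RFC 9562 version 7 bit layout."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000       # current Unix time in milliseconds
    rand = int.from_bytes(os.urandom(10), "big")    # 80 random bits (74 are used)
    rand_a = (rand >> 62) & 0xFFF                   # 12 random bits after the version
    rand_b = rand & ((1 << 62) - 1)                 # 62 random bits after the variant
    value = (unix_ms & ((1 << 48) - 1)) << 80       # bits 127..80: timestamp
    value |= 0x7 << 76                              # bits 79..76: version 7
    value |= rand_a << 64                           # bits 75..64: rand_a
    value |= 0b10 << 62                             # bits 63..62: RFC variant
    value |= rand_b                                 # bits 61..0: rand_b
    return uuid.UUID(int=value)


earlier = make_uuid7(1_700_000_000_000)
later = make_uuid7(1_700_000_000_001)

stored = later.bytes                         # 16 bytes, suitable for BINARY(16)
print(len(stored))                           # 16
print(uuid.UUID(bytes=stored) == later)      # True: lossless round trip
print(earlier.bytes < later.bytes)           # True: byte order follows creation time
print(later.version)                         # 7
```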

Another option is to use an alternative ID type. UUIDs were first created in 1987, and there has been plenty of time for others to propose different formats, such as Snowflake IDs, ULIDs, and NanoIDs, which is what PlanetScale uses. NanoIDs are particularly interesting because they are more compact (21 characters by default, versus 36 for a UUID string) and faster to generate than UUIDs.
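For a feel of two of these formats, here is a Python sketch of a NanoID-style generator and a Snowflake-style ID packer. Both are assumptions for illustration, not the official nanoid package or Twitter's implementation: the generator mirrors NanoID's default 64-character URL-safe alphabet and 21-character length (21 × 6 = 126 random bits, close to the 122 random bits of a UUIDv4), and the Snowflake layout uses the commonly described 41-bit timestamp, 10-bit machine ID, and 12-bit sequence with a hypothetical custom epoch.

```python
import secrets
import string

# Assumption: NanoID-like defaults (64-symbol URL-safe alphabet, 21 characters).
NANOID_ALPHABET = string.ascii_letters + string.digits + "_-"
NANOID_SIZE = 21

# Assumption: Snowflake-style layout with a hypothetical epoch for this example.
CUSTOM_EPOCH_MS = 1_600_000_000_000


def nanoid_like(size: int = NANOID_SIZE) -> str:
    """Random ID drawn uniformly from a 64-character URL-safe alphabet."""
    return "".join(secrets.choice(NANOID_ALPHABET) for _ in range(size))


def snowflake_like(unix_ms: int, machine_id: int, sequence: int) -> int:
    """Pack timestamp, machine ID and sequence into one 63-bit integer (sign bit 0)."""
    elapsed = unix_ms - CUSTOM_EPOCH_MS
    if not 0 <= elapsed < 1 << 41:
        raise ValueError("timestamp outside the 41-bit window")
    if not 0 <= machine_id < 1 << 10:
        raise ValueError("machine_id must fit in 10 bits")
    if not 0 <= sequence < 1 << 12:
        raise ValueError("sequence must fit in 12 bits")
    return (elapsed << 22) | (machine_id << 12) | sequence


print(len(nanoid_like()))  # 21
a = snowflake_like(1_700_000_000_000, machine_id=1, sequence=0)
b = snowflake_like(1_700_000_000_001, machine_id=1, sequence=0)
print(a < b, a.bit_length() <= 63)  # True True
```

Note the design difference: NanoIDs are random, so like UUIDv4 they scatter inserts across the index, whereas Snowflake-style IDs are time-ordered and fit in a BIGINT column.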

In conclusion, while UUIDs are often used as primary keys in MySQL, doing so comes with real trade-offs in write performance and storage. By storing them in binary form, preferring ordered variants, and considering alternative ID types, you can minimize those costs.