So recently, I was on a live stream with @itsaydrian on Twitch where I demonstrated how to migrate a live serverless application (Next.js and Vercel) from a PostgreSQL database to CockroachDB Serverless with zero application downtime. If you're interested in that, please do check out the video.
In the stream, we talked about a lot of the decisions I'd made in the code, and one of them was the choice between a CUID and a UUID. At the time of the stream, I honestly wasn't sure what the difference between the two was, so I just kind of used whichever.
After the stream, I got curious about the difference myself, so I did some research. Here's what I found.
We'll be looking at UUID and CUID, along with NanoID, as it has started to gain traction lately.
UUID, or universally unique identifier, is a 128-bit label used to identify information in computer systems. The probability of generating a duplicate isn't zero, but it's so close to zero that when we generate a UUID, the chance of that identifier being used ANYWHERE else is practically nil.
A UUID follows the format xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx, where M is the version digit and N encodes the variant, for example 123e4567-e89b-12d3-a456-426614174000.
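To make the format concrete, here's a small TypeScript sketch. It assumes Node.js, where `randomUUID` from `node:crypto` generates random (version 4) UUIDs, and it checks a string against the 8-4-4-4-12 layout before pulling out the M (version) and N (variant) digits. `describeUuid` is a hypothetical helper name used for illustration, not part of any library.

```typescript
import { randomUUID } from "node:crypto";

// 8-4-4-4-12 hex digits. Group 1 captures M (first digit of the third block),
// group 2 captures N (first digit of the fourth block).
const UUID_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-([0-9a-f])[0-9a-f]{3}-([0-9a-f])[0-9a-f]{3}-[0-9a-f]{12}$/i;

// Hypothetical helper: returns the version and the variant bits, or null if the
// string doesn't have the UUID shape.
function describeUuid(id: string): { version: number; variantBits: string } | null {
  const match = UUID_PATTERN.exec(id);
  if (!match) return null;
  const version = parseInt(match[1], 16);
  // The leading bits of N identify the variant; RFC 4122 UUIDs use binary 10xx.
  const variantBits = parseInt(match[2], 16).toString(2).padStart(4, "0");
  return { version, variantBits };
}

const id = randomUUID(); // random version 4 UUID, 36 characters including hyphens
console.log(id, id.length, describeUuid(id));
console.log(describeUuid("123e4567-e89b-12d3-a456-426614174000"));
// → { version: 1, variantBits: '1010' }
```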
While researching, I came across this blog post by PlanetScale, a MySQL-compatible serverless database platform and one of my favorite databases to use.
The post is titled "Why we chose NanoIDs for PlanetScale's API", and it explains that while UUIDs are great because it's nearly impossible to generate a duplicate, they have one big problem: they're long and take up a lot of space in a URL.
Don't we all love clean, short URLs? Nobody likes seeing a jumble of digits and symbols in their address bar. To quote their post:
They take up a lot of space in a URL: api.planetscale.com/v1/deploy-requests/7cb776c5-8c12-4b1a-84aa-9941b815d873
Try double clicking on that ID to select and copy it. You can't. The browser interprets it as 5 different words.
It may seem minor, but to build a product that developers love to use, we need to care about details like these.
That's true. If we look into the history of UUIDs, we find that they were never really designed with web applications in mind. For a website such as, say, HasteBin, which depends on generating short IDs that are easy to share with people, a 36-character UUID causes a problem.
CUID aims to solve exactly the problem with UUIDs that we discussed above. To quote CUID's GitHub repository:
Modern web applications have different requirements than applications from just a few years ago. Our modern unique identifiers have a stricter list of requirements that cannot all be satisfied by any existing version of the GUID/UUID specifications
CUID focuses on horizontal scalability (today's applications rarely run on a single machine), performance, size, security, and portability. The GitHub repository gives a more detailed breakdown of the problems it aims to solve.
CUIDs are shorter, but what about collisions? I found an online benchmark, run in a GitHub gist, showing that the chance of a collision is still extremely slim: it found no collisions across 100 million iterations.
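A collision benchmark like that boils down to generating lots of IDs and checking each one against everything generated so far. The sketch below is not the gist's code: `countCollisions` is a hypothetical helper that accepts any generator function, and it's demonstrated with Node's built-in `randomUUID` because CUID itself lives in a separate npm package.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical helper: generate `iterations` IDs and count how many repeat an
// ID that was already produced. Memory grows linearly with `iterations`.
function countCollisions(generate: () => string, iterations: number): number {
  const seen = new Set<string>();
  let collisions = 0;
  for (let i = 0; i < iterations; i++) {
    const id = generate();
    if (seen.has(id)) {
      collisions++;
    } else {
      seen.add(id);
    }
  }
  return collisions;
}

console.log(countCollisions(() => randomUUID(), 100_000));

// A deliberately bad generator (only 10 possible IDs) shows the counter working.
let counter = 0;
console.log(countCollisions(() => String(counter++ % 10), 100));
```

Scaling this to 100 million iterations needs a lot of memory for the `Set`, which is one reason such benchmarks are usually run on a beefier machine.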
As a bonus, we'll also go over NanoID. It bills itself as tiny (only 130 bytes minified and gzipped), fast (2x faster than UUID), safe, short, and portable.
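According to NanoID's README, its default IDs are 21 characters long and drawn from a 64-symbol URL-safe alphabet (`A-Za-z0-9_-`). The sketch below illustrates that idea in TypeScript on Node.js; it is a simplified stand-in, not the `nanoid` package's actual source, and the symbol order in `URL_ALPHABET` and the name `nanoStyleId` are assumptions made for illustration.

```typescript
import { randomBytes } from "node:crypto";

// URL-safe alphabet of exactly 64 symbols (order chosen arbitrarily here).
const URL_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_-";

// NanoID-style generator: one cryptographically random byte per character.
// Because 64 = 2^6, masking a byte with 63 keeps its low 6 bits and maps every
// byte value onto the alphabet with no bias.
function nanoStyleId(size = 21): string {
  const bytes = randomBytes(size);
  let id = "";
  for (let i = 0; i < size; i++) {
    id += URL_ALPHABET[bytes[i] & 63];
  }
  return id;
}

console.log(nanoStyleId()); // a 21-character ID, different on every run
```

With 21 symbols of 6 bits each, that's 126 random bits, compared with 122 random bits in a version 4 UUID, in a string that's 15 characters shorter and has no hyphens to trip up double-click selection.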
NanoID's website has a cool collision calculator that estimates the chance of a collision. With an ID length of 15 characters (nice and short) and 1,000 IDs generated every hour, it would take a mind-boggling 569 thousand years to reach a 1% probability of even a single collision. If we bump that up to 1,000 IDs every single second, it would still take around 158 years to reach a 1% probability of at least one collision. That's longer than you or I will probably live.
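Both figures can be roughly reproduced with the birthday-paradox approximation: 569 thousand years corresponds to 1,000 IDs per hour, and about 158 years to 1,000 IDs per second. This sketch assumes NanoID's default 64-symbol alphabet and a 365.25-day year; the calculator itself may use a slightly different formula or rounding, and the function names are hypothetical.

```typescript
// Birthday-bound approximation: after n IDs drawn uniformly from N = alphabetSize^length
// possibilities, P(at least one collision) ≈ 1 - exp(-n^2 / (2N)).
// Solving for n at a target probability p gives n ≈ sqrt(-2N · ln(1 - p)).
function idsUntilCollisionProbability(alphabetSize: number, length: number, p: number): number {
  const space = Math.pow(alphabetSize, length); // 64^15 = 2^90, still fine as a double
  return Math.sqrt(-2 * space * Math.log1p(-p));
}

const SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60;

// Years of continuous generation at `idsPerSecond` before reaching probability p.
function yearsUntil(alphabetSize: number, length: number, p: number, idsPerSecond: number): number {
  return idsUntilCollisionProbability(alphabetSize, length, p) / idsPerSecond / SECONDS_PER_YEAR;
}

// 15-character IDs, 64-symbol alphabet, 1% collision probability:
console.log(yearsUntil(64, 15, 0.01, 1000 / 3600)); // 1,000 IDs per hour: ≈ 569 thousand years
console.log(yearsUntil(64, 15, 0.01, 1000)); // 1,000 IDs per second: ≈ 158 years
```

The key takeaway from the formula is that the safe number of IDs grows with the square root of the ID space, so every extra character multiplies it by 8 (the square root of 64).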
To put that in perspective:
- Human lifespan: 79 years.
- Life on Earth will be impossible in ~1.1 billion years.
- Age of the Earth: ~4.543 billion years.
- Age of the Universe: ~13.799 billion years.
NanoID also lets you customise the alphabet it uses, so you can remove characters you don't want or supply a completely custom alphabet.
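The `nanoid` package exposes this through `customAlphabet(alphabet, size)`. As a rough illustration of what a custom-alphabet generator has to get right, here is a hypothetical `customAlphabetId` sketch: when the alphabet size isn't a power of two, a plain `byte % alphabet.length` would make some symbols more likely than others, so it masks each random byte and rejects out-of-range values instead. This is a simplified take on that technique, assuming Node.js and plain ASCII alphabets, not the package's actual implementation.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical helper: builds an ID of `size` symbols from `alphabet`.
// Assumes 2..256 single-code-unit symbols (e.g. plain ASCII).
function customAlphabetId(alphabet: string, size: number): string {
  if (alphabet.length < 2 || alphabet.length > 256) {
    throw new RangeError("alphabet must contain between 2 and 256 symbols");
  }
  // Smallest mask of the form 2^k - 1 that can index every symbol.
  let mask = 1;
  while (mask < alphabet.length - 1) {
    mask = (mask << 1) | 1;
  }
  let id = "";
  while (id.length < size) {
    // Rejection sampling: masked values past the end of the alphabet are
    // discarded rather than wrapped around, so every symbol is equally likely.
    for (const byte of randomBytes(size)) {
      const index = byte & mask;
      if (index < alphabet.length) {
        id += alphabet[index];
        if (id.length === size) break;
      }
    }
  }
  return id;
}

// e.g. a 10-character lowercase hex ID
console.log(customAlphabetId("0123456789abcdef", 10));
```

With the real package, the equivalent would be creating a generator via `customAlphabet("0123456789abcdef", 10)` and calling the function it returns.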
Anyway, I think you get the point: just because an ID is short doesn't mean it's insecure. But let's move on to another metric: performance.
With that said, I think I've given a fair comparison of these ID generation methods, along with their pros and cons. For most cases, I believe NanoID is the best option because it's highly customisable while still being performant, and judging by this npm trends comparison of the three, it's slowly but surely overtaking uuidv4. In the context of Prisma, though, the built-in CUID support is perhaps the best choice.