The release of Percona Server for MySQL 8.4.0 includes the new UUID_VX component, which implements UUID versions 1, 3, 4, 5, 6, and 7 according to recently published RFC 9562.

UUIDs (Universally Unique Identifiers) are unique identifiers that can be generated independently without a central authority or coordination with other parties.

Unlike sequential integer identifiers, which need coordination to guarantee global uniqueness in a distributed database, UUIDs eliminate the necessity of coordination, making them preferable in sharded database environments. Guarantees are statistical by nature, so collisions are possible theoretically but practically are almost zero (½^120).

The UUID is an identifier type widely used for various entities in distributed systems. It is standardized in several ISO/IEC, ITU-T, and IETF RFC documents. For more information, please see this Wikipedia article. UUID v4 is the most used version. It is meant to generate UUIDs from truly random or pseudorandom numbers.

Performance issues limit using UUID version four as a table’s primary key. Due to the random nature of the generated values, it is impossible to cluster them in a database index. So, the data is inserted at random positions, thus negatively affecting the performance of common index data structures such as B-tree and similar data structures.

UUID version one is supported in MySQL with built-in functions. It is time-based and works better for database indexes than version four, but it is not as good as it should be. Please see below.

As explained in the recent article “Goodbye to sequential integers, hello UUIDv7!”:

UUID Version 7 (UUIDv7) is a time-ordered UUID which encodes a Unix timestamp with millisecond precision in the most significant 48 bits. As with all UUID formats, 6 bits are used to indicate the UUID version and variant. The remaining 74 bits are randomly generated. As UUIDv7 is time-ordered, values generated are practically sequential and therefore eliminates the index locality problem.

Therefore, implementing time-biased UUID version seven support for MySQL will solve performance problems in distributed environments and help with better data sharding.

UUID versions one through eight formats are defined in RFC 9562, published in May 2024.

Any version’s UUID is represented as a 128-bit value and has common properties. Thus, one common implementation of all UUID versions is possible and desirable.

UUID version two is for DCE Security UUIDs and is outside the scope of RFC 9562. Version eight is defined by the RFC as vendor-specific and experimental, so this document does not include how to implement versions two and eight. 

UUIDs can be divided into three kinds:

  • Timestamp-based UUIDs are versions one, six, and seven. Implementations SHOULD utilize UUID version seven instead of UUID one and UUID six if possible, as noted in RFC 9562.
  • Random number-based is UUID version four. It is the most widely used type of UUID.
  • String hash-based versions three and five. Version three uses MD5 hash, and version five uses SHA1 hash. Both hash functions are considered insecure in cryptography, but there is no serious danger for name-based UUIDs.

MySQL built-in UUID functions

Built-in functions work only with UUID version one. As noted above, UUID version one is time-based and also contains the “node” part.

As can be seen from the UUID version one data structure above, the timestamp is split into parts by the version field, going from lowest to highest. Then, we have the “clock_seq” field, which contains a sequential number. So, we have a strange order of bytes in the timestamp parts, and the database’s indexing engines will be confused when processing such indexes. 

For example, if you have two UUIDs generated at exactly the same time, the “clock_seq” field differs, and it goes after “time_high” to the higher digit of the number. If you have two UUIDs generated at nearly the same time, the “time_low” field will be changed to some lower middle digits of the number. So, for the indexing engine, it is impossible to place such IDs near each other as required for the time-stamp ordering of indexes.

By the way, the time stamp is not your old good “unix time.” It is a count of 100-nanosecond intervals since 00:00:00.00, 15 October 1582 (the date of Gregorian reform to the Christian calendar).

The second part of the data structure of UUID version one is a “node” part, which is based on information from the network identity of the system (MAC address of the Ethernet card or something like that) and depends on implementations. In all implementations, it is not random and could be changed with the change of host network identity, for example, the MAC address of the virtual network interface of the Docker container. In that case, the ordering will also change, and indexes get messed up. That’s why the built-in implementation has a “swap flag” argument in the UUID_TO_BIN(string_uuid, swap_flag) function. 

So, the actual “sequential” part of this UUID works somewhat, but there’s no randomness, and UUID version one is strictly bound to the host.

Which UUID version is best for a primary key?

There’s no good answer. It really depends on the use case. But if you are going to use UUID as a table primary key, then the answer is version seven. And here’s why.

First, let’s look at the structure of UUID version seven.

The first six octets in the structure are timestamp bytes from the highest to the lowest in the big-endian (network) order. Database records that were created, consequently, will be naturally ordered by the time of creation. A special mechanism in the UUID allows it to have different timestamp values even if the clock resolution is lower than the speed of record creation. Besides that, the random part makes each UUID unique even within maximal resolution.

UUIDs of versions one and six are not so good because of the “node” part in the data structure. This part is calculated from the host address and is constant. So, we’re at the risk of collision within the clock resolution.

UUIDs of versions three and five are bad as primary keys also because they are based on hash functions of the supplied string. So, If you have the same string, you’ll have the same hash value. On the other hand, the good hash function tries to have an even distribution of its values, and we’ll have the same problem with key clustering as in the case of fully random UUID version four. 

UUID functions in UUID_VX_UDF component

Common function for all UUID versions

In Percona Server for MySQL 8.4.0-1, we introduced a new component, ‘uuid_vx_udf‘, that provides various functions for working with UUIDs of different versions. The name of the component’s shared library is ‘component_uuid_vx_udf.so’, and it is located in the plugins directory.

Below is the full list of functions provided by the component. The UUID_VX stands for  UUID version X and means that the component implements and can work with all UUID versions described in the RFC 9562 document. The names of functions are marked below as uppercase bold text.

UUID_VX_VERSION() This function works with UUIDs of all versions. It takes one string argument that represents UUID in the standard or hexadecimal form. Returns version of UUID (one through eight) or error if the argument is not UUID. Returns NULL if the argument is NULL.

UUID_VX_VARIANT() Takes one string argument that represents UUID in the standard or hexadecimal form. Returns string describing a variant of UUID according to RFC 9562 or error if the argument is not UUID.

IS_UUID_VX() Takes one string argument that represents UUID in the standard or hexadecimal form. Returns true if the argument string represents UUID. In case a NULL argument returns NULL. In all other cases, it returns false.

NIL_UUID_VX() Creates NIL UUID. The NIL UUID is a special form of UUID that is specified to have all 128 bits set to zero: 00000000-0000-0000-0000-000000000000. It takes no arguments.

MAX_UUID_VX() Cretes MAX UUID. The Max UUID is a special form of UUID that is specified to have all 128 bits set to 1: FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF. This UUID can be thought of as the inverse of the NIL UUID.

IS_NIL_UUID_VX() Takes one string argument that represents UUID in the standard or hexadecimal form. Returns true if the argument string represents UUID and it is NIL UUID. In the case of the NULL argument, the function returns NULL. The function throws an error if the argument is NOT and can not be parsed as UUID.

IS_MAX_UUID_VX() Takes one string argument that represents UUID in the standard or hexadecimal form. Returns true if the argument string represents UUID and it is MAX UUID. In all other cases, it returns false. In the case of the NULL argument, the function returns NULL. The function throws an error if the argument is not NOT and can not be parsed as UUID.

UUID_VX_TO_BIN() Takes one string argument, which should be a formatted or hexadecimal representation of UUID. This function converts the string representation of UUID into a binary representation.

UUID generator functions

UUID_V1() generates a UUID of version one (timestamp-based). The function takes no arguments and matches the built-in UUID() function. If possible, please use UUID_V7().

UUID_V3() generates a UUID of version three (name-based). The function takes one or two arguments.

The first argument is the string that will be hashed with the MD5 function and placed in the corresponding fields of the UUID. The second argument is optional; it defines a namespace for the UUID. The argument type is integer. Values: DNS: 0, URL: 1, OID: 2, X.500: 3. The default is 1 or URL. The function does not perform any checks of the string format of the first argument.

Note that the MD5 algorithm is obsolete and considered insecure in cryptography. Please use this with care and do not expose this data.

UUID_V4() generates a UUID of version four (random number-based). The function takes no arguments. UUID_V5() generates a UUID of version five (name-based). The function takes one or two arguments.

The first argument is the string that will be hashed with the SHA1 function and placed in the corresponding fields of the UUID. The second argument is optional; it defines a namespace for the UUID. The argument type is integer. Values: DNS: 0, URL: 1, OID: 2, X.500: 3. The default is 1 or URL. The function does not perform any checks of the string format of the first argument.

Note that the SHA1 algorithm is better than MD5, but this algorithm is obsolete and considered insecure in cryptography. Please use this with care and do not expose this data.

UUID_V6() generates a UUID of version six (timestamp-based). The function takes no arguments.

If it is possible, please use UUID_V7().

UUID_V7() generates a UUID of version seven (timestamp-based). The function takes no arguments or one integer argument. The argument is a number of milliseconds to shift the timestamp of UUID forth or back if the argument is negative. If there is no argument, timestamp shift is not performed. Timestamp shift may be used to hide the actual time of the record creation.

Specific functions for timestamp-based UUIDs

UUIDs of versions one, six, and seven are timestamp-based.

UUID_VX_TO_TIMESTAMP() takes one string argument and returns a timestamp string like “2024-05-29 18:04:14.201 ”. If the argument is not parsable as UUID one, six, or seven, the function throws an error.

The function always uses UTC time, regardless of system settings or time zone settings in MySQL.

UUID_VX_TO_TIMESTAMP_TZ() takes one string argument and returns a timestamp string with the time zone like “Wed May 29 18:05:07 2024 GMT”. If the argument is not parsable as UUID versions one, six, and seven, the function throws an error. The function always uses UTC time (GMT time zone) regardless of system settings or time zone settings in MySQL.

UUID_VX_TO_UNIXTIME() takes one string argument and returns a number of milliseconds since the Epoch. If the argument is not parsable as UUID versions one, six, and seven, the function throws an error.

Examples

UUID_VX component initialization and deinitialization

Initialize UUID_VX component:

Deinitialize  UUID_VX component:

UUID generators

UUID version one:

UUID version three, one argument, default UUID namespace is “URL”.

UUID version three, one argument, explicit UUID namespace is “URL”.

UUID version three, one argument, explicit UUID namespace is “DNS”.

Actually, this function and the function uuid_v5() do not check whether the string argument is a URL, valid DNS name, or valid OID. They just hash the string and place the hash in a UUID with a defined namespace. So, it is up to the user which strings to use.

UUID version four:

UUID version five:

The UUID_V5() function is similar to the UUID_V3(), except it uses the SHA1 algorithm instead of MD5. It processes arguments exactly the same way as for version three.

UUID version six:

UUID version seven generation:

UUID version seven with timestamp offset in 84000 seconds in the future

UUID version seven with timestamp offset in 84000 seconds in the past

The time offset helps to mangle record creation time in the case of primary keys.

UUID checks

Timestamp extraction

These functions work for timestamp-based UUID only; the versions are one, six, and seven.

Database table primary key

UUID of any version could be used as a primary key, though version four is not a good choice because it is totally random, and table records can not be clustered efficiently. So, if you need just a good random primary key with good behavior, then use version seven.

As you can see, the first few bytes of the primary key are the same, so the records will be clustered well. Also, uuid-s are different because of the random part, and it is possible to insert duplicating records with different keys.  

Implementation details

The UUID_VX component is implemented as a standard MySQL component. The source code is available at the (URL!!!). But, well, it is not as straightforward as it may appear. The implementation uses the latest version of the Boost UUID library (1.86), which was not officially released at the time of this publication. It is a header-only library, and its source is placed in include/boost_1_84_0/patches/boost/.

So, if you want to compile the UUID_VX component for the original MySQL or MariaDB, you have to repeat the trick: go to the Boost UUID library source at https://github.com/boostorg/uuid/tree/develop/include/boost and grab the stuff.

Of course, it is always better to use the Percona Server for MySQL with everything up and running 🙂.

Conclusion

UUIDs are extremely good identifiers for distributed environments. And the environment that is local today may become distributed very quickly.  Imagine the situation: you have identical systems in two or three different towns. And you want to merge data into a bigger regional system. In the case of some sequential primary keys, you are in trouble because keys will be duplicated in each system. In the case of UUID version seven primary keys, you can just merge data without any hassle. This feature comes in 8.0.39 and 8.4.0 releases of Percona Server for MySQL.


Percona Distribution for MySQL is a complete, stable, scalable, and secure, open source MySQL solution, delivering enterprise-grade database environments for your most critical business applications. Deploy anywhere and implement easily with one-to-one compatibility with MySQL Community Edition.

 

Try Percona Distribution for MySQL today!

Subscribe
Notify of
guest

1 Comment
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Oleksiy Lukin

I am sorry, I missed to put the source URL somehow during to publishing process. Gere it is:
https://github.com/percona/percona-server/tree/8.0/components/uuid_vx_udf