Generating Numeric Sequences in MySQL

Generating Numeric Sequences in MySQLWhat is the easiest way to generate a sequence of integers in MySQL? In other words, which “SELECT <something>” statement should I write to get 0, 1, 2, … N – 1?

This is the question I have been struggling with for years and it looks like I have finally got the answer (although I must admit I had to put some development efforts and add a few hundred lines to the server code).  Percona Server for MySQL 8.0.20-11 includes a new feature dedicated to solving exactly this problem.

However, before revealing all the secrets, let us first consider existing solutions. So, we want to get the following:

What are our options for doing so?

The Old School Way

Let us start with the most straightforward solutions.

UNION to the Rescue

It may sound a bit primitive but the simplest solution would be to combine the result of multiple SELECT statements into a single result set with UNION.

Although this may work for smaller numbers, this solution is not very extensible.

Existing Table With a Unique Column

Let’s say that we already have a table with a unique column of any data type. For instance:

Now, we can join this table with a copy of itself and select the number of records for which id in the copy is less than or equal to the id in the original table.

The main drawback of this solution is its quadratic complexity on N that may cause significant resource utilization when N is big.

Session Variable Increment Within a SELECT

Provided that we already have a table t1 as in the previous example (although unique column constraint is not required here), we can join it with a single value SELECT that assigns initial value to a session variable. At the same time, for each record of the existing table, it will increment the value of this session variable.

This one is not bad: it’s extensible, linear complexity on N does not introduce unnecessary overhead, and the only drawback is a requirement to have an existing table.

Joining Multiple Views

We can always join several tables (or views) that contain more than one record to multiply the total number of records in the result set.

Using the same approach, by intersecting K instances of binary_v we can generate a sequence of 2^K values. Similarly, we can create a view for digits and as the result get 10^K values.

Although this seems to be pretty easy to understand, the execution plan of such a query is definitely far from being perfect.

Classicism

Stored Procedures

Basically, before selecting, we can create a temporary table and fill it with the required numbers utilizing a pre-created stored procedure.

The stored procedure itself can be defined as follows:

The execution plan for this approach is almost perfect, the only drawback is the necessity to call generate_seq() before using the sequence.

Prepared Statements

Let us try to automate the UNION-based solution a bit. Instead of manually repeating the UNION clause we can generate this statement dynamically.

Where generate_seq_stmt() can be defined as follows:

This seems to work, but the main disadvantage of this solution is that it cannot be embedded directly into more complex queries (unless the latter are also converted into prepared statements).

Sequence Storage Engine (MariaDB)

MariaDB, on the other hand, took a completely different approach. Instead of extending SQL syntax and inventing new constructs, in version 10.0, they implemented Sequence Storage Engine that creates completely virtual, ephemeral tables automatically when users need them. All you have to do is execute the following:

This seems nice and clear, however, the idea of reserving an almost infinite number of table names (seq_<int1>_<int2> and seq_<int1>_<int2>_<int3>) for each database is not very appealing. For instance, you cannot do:

This approach requires a lot of unnecessary error handling and additional branching in the server code. Their documentation also mentions a couple of tricks (like ALTER TABLE seq_1_to_100 ENGINE = BLACKHOLE) you can apply to get around certain problems, but in general, they just add more complexity and corner cases.
Personally, I like the syntax SELECT * FROM <virtual_sequence_generator>, but implementing this construct as a new storage engine was not the best design decision.

Modern Way

Recursive Common Table Expressions (CTE)

In MySQL Server 8.0.1, Oracle introduced Common Table Expressions (CTE), both non-recursive and recursive.

Although this solution can be used for a pretty wide range of upper bounds N and seems to be exactly what we need, I doubt it is readable/easy to understand.

Starting from MySQL Server 8.0.19, you can simplify this query a bit using LIMIT instead of WHERE.

However, both of those solutions have a limitation. By default, upper bound N cannot be very high.

Increasing cte_max_recursion_depth can shift this limitation though.

VALUES ROW(…), ROW(…) …

If you are a lucky one who has already upgraded to MySQL Server 8.0.19, you can use a VALUES Statement (a table value constructor that also functions as a standalone SQL statement).

This one is a bit easier than UNION-based but still lacks extensibility.

JSON_TABLE()

In MySQL Server 8.0.4 Oracle introduced a new JSON_TABLE() function that can extract data from a JSON document and return it as a relational table having the specified columns. But you may ask how can this even potentially be related to generating numerical sequences. Let’s consider the following example.

Here, we pass a simple JSON document to JSON_TABLE() function. That JSON-document is an array that consists of a series of objects with the predefined key “a”. We iterate over all elements of the array that match “$[*]” JSON path expression and extract the value of the “$.a” key into a column of type BIGINT UNSIGNED with the name value.
Although at this point the syntactic overhead is just overwhelming, I am starting to see the light at the end of the tunnel.

We can do better and improve our first JSON_TABLE() example.

Here, we have an array of empty objects and use a special construct FOR ORDINALITY that is equivalent to specifying a column as AUTO_INCREMENT in a CREATE TABLE statement. Although I should note that we still have a predefined number of empty JSON objects in our array. That’s not enough – we have to go deeper.

We are almost there!

Post-Modern Way

Let us first summarize what we managed to achieve in the previous example.

  • Clear syntactic construct SELECT … FROM JSON_TABLE(…) AS tt (although function arguments are still non-trivial)
  • We made this construct generate a different number of rows depending on the value of the @upper_bound variable.
  • This construct can be used anywhere where any other derived table statement is allowed.
  • Not only may this construct be dependent on the value of a session variable, in case of joined tables, it may generate a different number of rows depending on the value from another table’s column.

SEQUENCE_TABLE()

In an ideal word it would be really great to have a function, say SEQUENCE_TABLE() that would behave identically to our JSON_TABLE(CONCAT(‘[{}’, REPEAT(‘,{}’, @upper_bound – 1), ‘]’), “$[*]” COLUMNS(rowid FOR ORDINALITY).
Unfortunately, MySQL Server 8.0.19 does not have such a function.

I must admit it would be really terrible from my side to finish this blog post at this point by saying “JSON_TABLE()-based solution is the best we’ve got so far. That’s all we can do, thanks for reading“, so I am not going to do so.
Instead, I am going to announce that in Percona Server 8.0.20-11 we implemented a new Percona-specific feature called ‘SEQUENCE_TABLE()’.

Long story short, now you can just write the following.

And “yes”, it is as easy and straightforward as you see it. Now, let us consider a bit more complicated examples.
What if we want to generate a sequence from 4 to 7 inclusive?

Alternatively, you can write:

Another example, even numbers from 0 to 6 inclusive:

Alternatively:

Yet another example, numbers from 0 to 3 inclusive in reverse order:

Alternatively:

SEQUENCE_TABLE() can also be used to generate a set of random numbers:

Please notice that usage patterns for SEQUENCE_TABLE() are not limited to numbers only. We can, for instance, generate a list of predefined string literals (convert a row into a column, if you wish).

Or the same but with repeating values:

Finally, this table function may also help with generating pseudo-random string values:

This construct can be used to fill an existing table:

Or even to create a new one with pre-filled values:

I am pretty sure there are a lot of other use cases (say, generating Fibonacci numbers or printing all prime numbers in a given range) and you will definitely be able to find a lot of your own.

Conclusion

In this blog post, I tried to show that SQL is a pretty powerful language and allows us to do a lot of exotic things. However, it’s very frustrating that in some very simple cases the only way to do what you want is using heavy artillery. The example with SEQUENCE_TABLE() that I demonstrated shows that if adding new functionality by extending server code is the only option remaining, don’t be scared – it is OK to do this, especially when you know what to do.


Our solution brief “Get Up and Running with Percona Server for MySQL” outlines setting up a MySQL® database on-premises using Percona Server for MySQL. It includes failover and basic business continuity components.

Download PDF

Share this post

Comments (2)

  • Justin Swanhart Reply

    It would be nice if it was sequence_value(end, start, increment) where start defaults to zero and increment defaults to 1. I assume the values are generated by a for loop, and starting a sequence at a high value (like I need 10000 – 10010) requires filtering out a lot of rows with the where clause. I put end first so that it would be compatible with your current function given a single input parameter.

    Just a thought.

    August 9, 2020 at 8:29 pm
    • Yura Sorokin Reply

      Justin, thanks for the feedback. To be honest we also considered 2- and 3-argument versions of this function during the design stage which was definitely not that hard to implement. The main problem then was in inconsistency between unary version and binary/ternary. As a developer and/or a DBA with some programming skills I would expect SEQUENCE_TABLE(N) to return a zero-based sequence of numbers with values less then N – 0 to N not inclusive, in other words (like in a number of programming languages when you specify an upper bound for a generator). In contrast, when I specify SEQUENCE_TABLE(N, M) I would subconsciously expect this function to return a sequence from N to M inclusive. So, the decision was made to implement only the simplest (1-argument) version to avoid confusion with this inclusive/non-inclusive rules.
      But who knows, there is always a chance that we will change this behavior in future versions of the Percona Server.
      BTW, for the example you mentioned there is a much simpler solution that does not require filtering
      SELECT value + 10000 FROM SEQUENCE_TABLE(11) AS tt

      August 11, 2020 at 3:55 pm

Leave a Reply