Generating test data from the mysql> prompt

There are a lot of tools that generate test data.  Many of them have complex XML scripts or GUI interfaces that let you identify characteristics about the data. For testing query performance and many other applications, however, a simple quick and dirty data generator which can be constructed at the MySQL command line is useful.

First, let’s talk about what kind of data you can easily create with MySQL function calls:

You can generate a decimal number between zero and another number using the MySQL RAND() function like the following query (here between 0 and 10000):

Similarly, you can generate a random integer by adding the FLOOR() function:

You can generate a random string of 32 characters using MD5():

You can return a random integer between 500 and 1000 with the following:

You can return a random string from a list of strings by using a table to hold the list. A subselect can select a random name from the list of names.

Now we can generate a “fact” table with many rows using fairly simple SQL statements.

First create a table to generate data into:

Seed the table with one initial row:

Now grow the table by selecting from the table but providing new random values for the inserted rows:

As you repeat the INSERT … SELECT, the table will grow exponentially. You may want to add a LIMIT clause to the INSERT … SELECT to reduce the amount of data generated as the table grows.

You will create a table with an even data distribution for each column. You can then add some queries to add skew, either using INSERT … SELECT or UPDATE, for example:

That will skew the values by creating many rows with the same data as our initial row.

Using these simple tools, you can generate a data set that is great for testing purposes. For example, dim1 might be a customer_id and dim2 a product_id, and you would populate those tables with 10000 and 500 rows, respectively.

Share this post