About 2 weeks ago Oracle published the MySQL 5.7.7-labs-json version which includes a very interesting feature called “Generated columns” (also know as Virtual or Computed columns). MariaDB has a similar feature as well: Virtual (Computed) Columns.
The idea is very simple: if we store a column
|
1 |
`FlightDate` date |
in our table we may want to filter or group by year(FlightDate), month(FlightDate) or even dayofweek(FlightDate). The “brute-force” approach: use the above Date and Time MySQL functions in the query; however it will prevent MySQL from using an index (see below). Generated columns will allow you to declare a “Virtual”, non-stored column which is computed based on the existing field; you can then add index on that virtual column, so the query will use that index.
Here is the original example:
|
1 |
CREATE TABLE `ontime` (<br> `id` int(11) NOT NULL AUTO_INCREMENT,<br> `FlightDate` date DEFAULT NULL,<br> `Carrier` char(2) DEFAULT NULL,<br> `OriginAirportID` int(11) DEFAULT NULL,<br> `OriginCityName` varchar(100) DEFAULT NULL,<br> `OriginState` char(2) DEFAULT NULL,<br> `DestAirportID` int(11) DEFAULT NULL,<br> `DestCityName` varchar(100) DEFAULT NULL,<br> `DestState` char(2) DEFAULT NULL,<br> `DepDelayMinutes` int(11) DEFAULT NULL,<br> `ArrDelayMinutes` int(11) DEFAULT NULL,<br> `Cancelled` tinyint(4) DEFAULT NULL,<br> `CancellationCode` char(1) DEFAULT NULL,<br> `Diverted` tinyint(4) DEFAULT NULL,<br> PRIMARY KEY (`id`),<br> KEY `FlightDate` (`FlightDate`)<br>) ENGINE=InnoDB <br> |
Now I want to find all flights on Sundays (in 2013) and group by airline.
|
1 |
mysql> EXPLAIN SELECT carrier, count(*) FROM ontime_sm <br> WHERE dayofweek(FlightDate) = 7 group by carrier<br>*************************** 1. row ***************************<br> id: 1<br> select_type: SIMPLE<br> table: ontime_sm<br> type: ALL<br>possible_keys: NULL<br> key: NULL<br> key_len: NULL<br> ref: NULL<br> rows: 151253427<br> Extra: Using where; Using temporary; Using filesort<br><br>Results:<br>32 rows in set (1 min 57.93 sec)<br> |
The problem here is: MySQL will not be able to use index when you use a function which will “extract” something from the column. The standard approach is to “materialize” the column:
|
1 |
ALTER TABLE ontime_sm ADD Flight_dayofweek tinyint NOT NULL; |
Then we will need to load data into that by running “UPDATE ontime_sm SET Flight_dayofweek = dayofweek(flight_date)”. After that we will also need to change the application to support that additional column or use a trigger to update the column. Here is the trigger example:
|
1 |
CREATE DEFINER = CURRENT_USER <br>TRIGGER ontime_insert<br>BEFORE INSERT ON ontime_sm_triggers <br>FOR EACH ROW<br>SET <br>NEW.Flight_dayofweek = dayofweek(NEW.FlightDate);<br> |
One problem with the trigger is that it is slow. In my simple example it took almost 2x slower to “copy” the table using “insert into ontime_sm_copy select * from ontime_sm” when the trigger was on.
The Generated Columns from MySQL 5.7.7-labs-json version (only this version supports it on the time of writing) solves this problem. Here is the example which demonstrate its use:
|
1 |
CREATE TABLE `ontime_sm_virtual` (<br> `id` int(11) NOT NULL AUTO_INCREMENT,<br> `FlightDate` date DEFAULT NULL,<br> `Carrier` char(2) DEFAULT NULL,<br> `OriginAirportID` int(11) DEFAULT NULL,<br> `OriginCityName` varchar(100) DEFAULT NULL,<br> `OriginState` char(2) DEFAULT NULL,<br> `DestAirportID` int(11) DEFAULT NULL,<br> `DestCityName` varchar(100) DEFAULT NULL,<br> `DestState` char(2) DEFAULT NULL,<br> `DepDelayMinutes` int(11) DEFAULT NULL,<br> `ArrDelayMinutes` int(11) DEFAULT NULL,<br> `Cancelled` tinyint(4) DEFAULT NULL,<br> `CancellationCode` char(1) DEFAULT NULL,<br> `Diverted` tinyint(4) DEFAULT NULL,<br> `CRSElapsedTime` int(11) DEFAULT NULL,<br> `ActualElapsedTime` int(11) DEFAULT NULL,<br> `AirTime` int(11) DEFAULT NULL,<br> `Flights` int(11) DEFAULT NULL,<br> `Distance` int(11) DEFAULT NULL,<br> `Flight_dayofweek` tinyint(4) GENERATED ALWAYS AS (dayofweek(FlightDate)) VIRTUAL,<br> PRIMARY KEY (`id`),<br> KEY `Flight_dayofweek` (`Flight_dayofweek`),<br>) ENGINE=InnoDB |
Here we add Flight_dayofweek tinyint(4) GENERATED ALWAYS AS (dayofweek(FlightDate)) VIRTUAL column and index it.
Now MySQL can use this index:
|
1 |
mysql> EXPLAIN SELECT carrier, count(*) FROM ontime_sm_virtual WHERE Flight_dayofweek = 7 group by carrier<br>*************************** 1. row ***************************<br> id: 1<br> select_type: SIMPLE<br> table: ontime_sm_virtual<br> partitions: NULL<br> type: ref<br>possible_keys: Flight_dayofweek<br> key: Flight_dayofweek<br> key_len: 2<br> ref: const<br> rows: 165409<br> filtered: 100.00<br> Extra: Using where; Using temporary; Using filesort<br> |
To further increase performance of this query we want to add a combined index on (Flight_dayofweek, carrier) so MySQL will avoid creating temporary table. However it is not currently supported:
|
1 |
mysql> alter table ontime_sm_virtual <br> add key comb(Flight_dayofweek, carrier);<br>ERROR 3105 (HY000): 'Virtual generated column combines with other columns to be indexed together' is not supported for generated columns.<br> |
We can add an index on 2 generated columns thou, which is good. So a trick here will be to create a “dummy” virtual column on “carrier” and index 2 of those columns:
|
1 |
mysql> alter table ontime_sm_virtual add Carrier_virtual char(2) GENERATED ALWAYS AS (Carrier) VIRTUAL;<br>Query OK, 0 rows affected (0.43 sec)<br>Records: 0 Duplicates: 0 Warnings: 0<br><br>mysql> alter table ontime_sm_virtual add key comb(Flight_dayofweek, Carrier_virtual);<br>Query OK, 999999 rows affected (36.79 sec)<br>Records: 999999 Duplicates: 0 Warnings: 0<br><br>mysql> EXPLAIN SELECT Carrier_virtual, count(*) FROM ontime_sm_virtual WHERE Flight_dayofweek = 7 group by Carrier_virtual<br>*************************** 1. row ***************************<br> id: 1<br> select_type: SIMPLE<br> table: ontime_sm_virtual<br> partitions: NULL<br> type: ref<br>possible_keys: Flight_dayofweek,comb<br> key: comb<br> key_len: 2<br> ref: const<br> rows: 141223<br> filtered: 100.00<br> Extra: Using where; Using index |
Now MySQL will use an index and completely avoid the filesort.
The last, but not the least: loading data to the table with generated columns is significantly faster compared to loading it into the same table with triggers:
|
1 |
mysql> insert into ontime_sm_triggers (id, YearD, FlightDate, Carrier, OriginAirportID, OriginCityName, OriginState, DestAirportID, DestCityName, DestState, DepDelayMinutes, ArrDelayMinutes, Cancelled, CancellationCode,Diverted, CRSElapsedTime, ActualElapsedTime, AirTime, Flights, Distance) select * from ontime_sm;<br>Query OK, 999999 rows affected (27.86 sec)<br>Records: 999999 Duplicates: 0 Warnings: 0<br><br>mysql> insert into ontime_sm_virtual (id, YearD, FlightDate, Carrier, OriginAirportID, OriginCityName, OriginState, DestAirportID, DestCityName, DestState, DepDelayMinutes, ArrDelayMinutes, Cancelled, CancellationCode,Diverted, CRSElapsedTime, ActualElapsedTime, AirTime, Flights, Distance) select * from ontime_sm; <br>Query OK, 999999 rows affected (16.29 sec)<br>Records: 999999 Duplicates: 0 Warnings: 0<br> |
Now the big disappointment: all operations with generated columns are not online right now.
|
1 |
mysql> alter table ontime_sm_virtual add Flight_year year GENERATED ALWAYS AS (year(FlightDate)) VIRTUAL, add key (Flight_year), lock=NONE;<br>ERROR 1846 (0A000): LOCK=NONE is not supported. Reason: '%s' is not supported for generated columns.. Try LOCK=SHARED.<br><br>mysql> alter table ontime_sm_virtual add key (Flight_year), lock=NONE;<br>ERROR 1846 (0A000): LOCK=NONE is not supported. Reason: '%s' is not supported for generated columns.. Try LOCK=SHARED.<br> |
I hope it will be fixed in the future releases.
Conclusion
Generated columns feature is very useful. Imagine an ability to add a column + index for any “logical” piece of data without actually duplicating the data. And this can be any function: date/time/calendar, text (extract(), reverse(), metaphone()) or anything else. I hope this feature will be available in MySQL 5.7 GA. Finally, I wish adding a generated column and index can be online (it is not right now).
More information: