In my first post in this series, I described materialized views (MVs). An MV is essentially a cached result set at one point in time. The contents of the MV will become incorrect (out of sync) when the underlying data changes. This loss of synchronization is sometimes called drift. This is conceptually similar to a replication slave that is behind. Until it catches up, the view of the data on the slave is “behind” the changes on the master. An important difference is that each MV could have drifted by a different length of time.
A view which has drifted out of sync must be refreshed. Since an MV drifts over time from the “base tables” (those tables on which the view was built) there must be a process to bring them up-to-date. Flexviews includes two different refresh methods. Each method is named after the way in which the contents of the view are updated. The two methods are complete refresh and incremental refresh.  The view type/refresh method are interchangeable since a refresh method is selected at the time of view creation and may not be changed. If I talk about a ‘complete MV’, I mean a ‘MV for which the complete refresh method was selected at creation’. Also, when I say ‘view’ I mean MV.
I will fully describe each method below, but the important thing to know right now is that there are two significant differences between the methods.
The incremental method is somewhat conceptually similar to using mysql binary logs for point-in-time recovery after a restoring a backup. Since the backup is a essentially a ‘snapshot’ of the data at the time of the backup, upon restoration the backup may be ‘out of date’ since changes to the database may have happened since the backup was taken. Replaying the binary logs brings the database up-to-date by replaying the changes.
An MV is also like a snapshot. The incremental refresh method uses the changelogs collected by FlexCDC to update the snapshot to reflect the changes that have happened in the database, instead of replaying binary logs directly.
The refresh method you select is determined by the following two major factors:
The complete method completely rebuilds an MV from scratch at each refresh. This means that the minimum amount of drift for this type of MV is the amount of time it takes to execute the query on which the view is based. An incrementally refreshable view, on the other hand, can usually be refreshed very quickly, depending on the amount of data which has changed since the last refresh. This is because the incremental refresh method must examine only those rows related to the ones which changed since the last time the view was refreshed.
The incremental refresh method can be many orders of magnitude faster than the original query execution time. It may be possible, therefore, to refresh an MV based on a query that takes 45 minutes to execute with a greater frequency than 45 minutes, even as frequently as every minute, perhaps.
The refresh method you select effectively controls the minimum amount of time it takes to refresh the view, and therefore the minimum amount of drift it will encounter. Keep in mind that you can build complete refresh views on top of incrementally refreshable view, which can give you a lot of flexibility to get the results you want, but more quickly. More on this later.
Incremental refresh doesn’t support all of the SQL features available to SELECT statements.
| Refresh type | Aggregation | Outer join | All SQL functions | Built using SQL | Requires FlexCDC |
|---|---|---|---|---|---|
| COMPLETE | Y | Y | Y | Y | N |
| INCREMENTAL | Y* | N | N** | N*** | Y |
*All aggregate functions supported, except GROUP_CONCAT and AVG(distinct).
**non-deterministic functions like RAND(),NOW(), etc, are not supported.
***There is a script to convert SQL to the Flexviews API (see convert.php below)
The incremental refresh method uses table changelogs which are created by FlexCDC. The refresh algorithm computes changes in multiple transactions. After each transaction it must wait for FlexCDC to process changes from that transaction. In practice, this means that it takes a minimum of a few seconds to incrementally refresh a view, even when the number of changes is very small.
Unlike views which use the complete refresh method, incrementally refreshable views are not built directly from SQL. Instead, Flexviews includes a MySQL stored procedure API, called the SQL_API which is used to define the view. These stored procedures maintain the Flexviews data dictionary. The data dictionary is used by the incremental refresh algorithm to build and maintain the view. You should always use the stored procedures to modify the dictionary. Do not modify it directly.
convert.php
You are probably unfamiliar with the SQL_API. In order to make working with the Flexviews easier, it includes a script called ‘convert.php’. This script makes it easy to create incrementally refreshable views from SQL statements. It reads one or more “CREATE TABLE db.schema … AS SELECT” and/or “INSERT INTO db.schema … AS SELECT” statements from standard input, and outputs the SQL_API statements representing the original SQL statements. Each MV name will be taken from the table name specified in each statement. You should try each statement for correctness before you attempt to convert it. You can easily check that it parses by running the SELECT portion, adding LIMIT 0 to the SELECT part of the query.
convert.php
This script takes as one argument the default database. Notice in the following example, the schema is not specified.
The MV will be placed into whatever schema is specified as the first argument on the command line:
|
1 |
$ cat sales.sql<br>create table dashboard_customer_sales AS<br>select customer_id,<br>customer_name,<br>DATE_FORMAT(order_date,'%Y%m') as sale_when,<br>sum(quantity) as total_items,<br>sum(price * quantity) as total_price,<br>count(*) as total_lines<br>from orders o<br>join customers c using(customer_id)<br>join order_lines ol using(order_id)<br>group by customer_id,<br>customer_name, sale_when;<br><br>$ php convert.php demo < sales.sql<br>CALL flexviews.create('demo', 'dashboard_customer_sales', 'INCREMENTAL');<br>SET @mvid := LAST_INSERT_ID();<br>CALL flexviews.add_expr(@mvid,'GROUP','customer_id','customer_id');<br>CALL flexviews.add_expr(@mvid,'GROUP','customer_name','customer_name');<br>CALL flexviews.add_expr(@mvid,'GROUP','DATE_FORMAT(order_date,"%Y%m")','sale_when');<br>CALL flexviews.add_expr(@mvid,'SUM','quantity','total_items');<br>CALL flexviews.add_expr(@mvid,'SUM','price * quantity','total_price');<br>CALL flexviews.add_expr(@mvid,'COUNT','*','total_lines');<br><br>CALL flexviews.add_table(@mvid,'demo','orders','o',NULL);<br>CALL flexviews.add_table(@mvid,'demo','customers','c','USING (customer_id) ');<br>CALL flexviews.add_table(@mvid,'demo','order_lines','ol','USING (order_id) ');<br>CALL flexviews.enable(@mvid); |
The script (convert.php) supports basic queries which use select/group by/join/where. You may not use sub-queries or any non-deterministic functions like NOW() or RAND(). HAVING clauses, ORDER BY clauses, etc, are not supported. You can work around many of these limitations. I will discuss one of the workarounds below, in the ‘complete refresh’ section. I’ll discuss others in future blog posts.
The SQL_API call flexviews.enable() is used to actually build the contents of the view, making it available for querying:
|
1 |
mysql> call flexviews.enable(<br>-> flexviews.get_id('demo','dashboard_customer_sales'));<br>Query OK, 0 rows affected (41 min 52.04 sec) |
Here is a quick example of the list of tables used by the above view, as stored in the data dictionary:
|
1 |
mysql> select *<br>from flexviews.mview_table<br>where mview_id=<br>flexviews.get_id('demo','dashboard_customer_sales')G<br>*************************** 1. row ***************************<br> mview_table_id: 27<br> mview_id: 21<br> mview_table_name: orders<br> mview_table_schema: demo<br> mview_table_alias: o<br>mview_join_condition: NULL<br> mview_join_order: 999<br>*************************** 2. row ***************************<br> mview_table_id: 28<br> mview_id: 21<br> mview_table_name: customers<br> mview_table_schema: demo<br> mview_table_alias: c<br>mview_join_condition: USING (customer_id)<br> mview_join_order: 999<br>*************************** 3. row ***************************<br> mview_table_id: 29<br> mview_id: 21<br> mview_table_name: order_lines<br> mview_table_schema: demo<br> mview_table_alias: ol<br>mview_join_condition: USING (order_id)<br> mview_join_order: 999<br>3 rows in set (0.00 sec) |
The following SQL is dynamically generated by the flexviews.get_sql() function call. This function reads the meta-data stored in the Flexviews data dictionary. It returns the SQL which represents the data stored in the view. This is much more convenient than trying to read from the dictionary directly to determine the contents of the view.
|
1 |
mysql> select flexviews.get_sql(<br>-> flexviews.get_id('demo','dashboard_customer_sales'))<br>-> as 'SQL' G<br>*************************** 1. row ***************************<br>SQL:<br>SELECT NULL as mview$pk,<br>(customer_id) as `customer_id`,<br>(customer_name) as `customer_name`,<br>(DATE_FORMAT(order_date,'%Y%m')) as `sale_when`,<br>SUM(price * quantity) as `total_price`,<br>COUNT(*) as `total_lines`<br>FROM demo.orders as o<br>JOIN demo.customers as c USING (customer_id)<br>JOIN demo.order_lines as ol USING (order_id)<br>GROUP BY (customer_id),<br> (customer_name),<br>(DATE_FORMAT(order_date,'%Y%m'))<br>1 row in set, 1 warning (0.00 sec) |
This method is actually quite simple in its operation. It will completely rebuild a view based on the SQL which defines it, each time the view is refreshed.
This is not much different from building a table with CREATE TABLE .. AS SELECT. In fact, this is actually part of what the COMPLETE refresh method does during its operation. It also takes care of atomically* replacing the new contents of the view with the old contents (using RENAME TABLE) which means that the view remains available for querying, even during refresh.
A complete refresh type view can use non-deterministic functions like NOW(). In fact any valid SELECT statement can be materialized. The definition of the MV is stored internally, and represents the actual SQL that defines the view. A simple API call is made to associate the SQL definition with the view, but the definition itself is a SQL statement, similar to a regular view. If you choose to use only the complete refresh method, then FlexCDC is not required to use Flexviews.
In the following example, the total sales amounts are calculated (for all time) from a monthly summary. The monthly summary is actually another MV. This saves a significant amount of time. If this MV did not access the other MV, then the SQL necessary to compute this list would take over 40 minutes to execute, because that is how long it takes to create the monthly summary from scratch.
Here is an example of a complete refresh MV which computes the total sales for all customers. It computes this from the dashboard_top_customers MV, ordering them in descending order:
|
1 |
mysql> call flexviews.create(<br>-> 'demo','dashboard_top_customers','COMPLETE');<br>Query OK, 1 row affected (0.00 sec)<br><br>mysql> call flexviews.set_definition(<br>->flexviews.get_id('demo','dashboard_top_customers'),<br> -> 'select customer_id,<br> '> sum(total_price) total_price,<br> '> sum(total_lines) total_lines<br> '> from demo.dashboard_customer_sales dsc<br> '> group by customer_id<br> '> order by total_price desc;<br> '> ');<br>Query OK, 1 row affected (0.00 sec)<br> |
When the view is ‘enabled’, the contents are created:
|
1 |
mysql> call flexviews.enable(<br>-> flexviews.get_id('demo','dashboard_top_customers'));<br>Query OK, 0 rows affected (5.73 sec) |
It only took about six seconds to calculate the total sales for all customers from the dashboard_customer_sales table. If you were paying attention to the incremental example, the dashboard_customer_sales table just happens to be the view we created above.
I picked this example, because it is important to understand that you can build complete refresh views on top of incremental ones. This will allow you to create complete refresh views that refresh much more quickly.
If this had to actually access the orders, order_lines and customers tables directly (the tables on which dashboard_customer_sales is built), then the query would take significantly longer (40+ minutes):
|
1 |
mysql> SELECT <br>-> customer_id as `customer_id`,<br>-> SUM(price * quantity) as `total_price`,<br>-> COUNT(*) as `total_lines`<br>-> FROM demo.orders as o<br>-> JOIN demo.customers as c USING (customer_id)<br>-> JOIN demo.order_lines as ol USING (order_id)<br>-> GROUP BY (customer_id)<br>-> ORDER BY total_price desc<br>-> LIMIT 10;<br>+-------------+-------------+-------------+<br>| customer_id | total_price | total_lines |<br>+-------------+-------------+-------------+<br>| 689 | 770793 | 3811 |<br>| 6543 | 754138 | 3740 |<br>| 5337 | 742034 | 3674 |<br>| 5825 | 738420 | 3593 |<br>| 5803 | 733495 | 3670 |<br>| 1579 | 732507 | 3666 |<br>| 9316 | 731091 | 3610 |<br>| 2046 | 722631 | 3531 |<br>| 6319 | 720100 | 3572 |<br>| 6019 | 718031 | 3475 |<br>+-------------+-------------+-------------+<br><br>10 rows in set (43 min 10.11 sec) |
As an added benefit, if you build more than one complete refresh view from an incrementally refreshable one, then you can keep the complete views transactionally consistent with each other, as long as you refresh all of the complete views after the incremental one.
We can use the new MV (dashboard_top_customers) to list the top 10 customers nearly instantly. Note that every MV gets a special column mview$pk which is an auto_increment BIGINT surrogate key for the table. For complete refresh views which are ordered with ORDER BY, such as this one, this creates a ranking function automatically. In cases where this is not useful, simply ignore this column. It is used to prevent wide innodb primary keys on the MV.
|
1 |
mysql> select mview$pk as rank,<br> customer_id,<br> total_price,<br> total_lines<br> from demo.dashboard_top_customers<br>limit 10;<br>+------+-------------+-------------+-------------+<br>| rank | customer_id | total_price | total_lines |<br>+------+-------------+-------------+-------------+<br>| 1 | 689 | 770793 | 3811 |<br>| 2 | 6543 | 754138 | 3740 |<br>| 3 | 5337 | 742034 | 3674 |<br>| 4 | 5825 | 738420 | 3593 |<br>| 5 | 5803 | 733495 | 3670 |<br>| 6 | 1579 | 732507 | 3666 |<br>| 7 | 9316 | 731091 | 3610 |<br>| 8 | 2046 | 722631 | 3531 |<br>| 9 | 6319 | 720100 | 3572 |<br>| 10 | 6019 | 718031 | 3475 |<br>+------+-------------+-------------+-------------+<br>10 rows in set (0.00 sec) |
You can also use this MV to display the total sales amount for any particular customer.
Notice how you can use this summary to calculate the number of lines in the order_lines table, three orders of magnitude more quickly than COUNT(*).
|
1 |
mysql> select count(*) cnt from order_linesG<br>*************************** 1. row ***************************<br>cnt: 155187034<br>1 row in set (32.03 sec)<br><br>mysql> select sum(total_lines) cnt from dashboard_top_customersG<br>*************************** 1. row ***************************<br>cnt: 155187034<br>1 row in set (0.03 sec) |
Notice the difference in response time.
For demonstration purposes, I did the following:
The view type does not usually* affect the time it takes to build the MVs the first time. Both of these views build in about the same about of time:
|
1 |
mysql> call flexviews.enable(<br>-> flexviews.get_id('demo','complete_example2'));<br>Query OK, 0 rows affected (42 min 42.14 sec)<br><br>mysql> call flexviews.enable(<br>-> flexviews.get_id('demo','dashboard_customer_sales'));<br>Query OK, 0 rows affected (41 min 52.04 sec) |
*If you use MIN/MAX/COUNT_DISTINCT, a secondary view will be built in the flexviews schema, transparently, to manage the distinct values for each group in the view. This will increases the time it takes to build, and the refresh will be more expensive. This is an optimization which is required to efficiently refresh MV which use those aggregate functions.
Now I delete some line items from orders:
|
1 |
mysql> delete<br>-> from order_lines<br>-> where order_id<br>-> between 1 and 100<br>-> limit 500;<br>Query OK, 484 rows affected (0.27 sec) |
Since the base table has changed, the views now exhibit drift. There should be 484 less lines reflected in the count in our MVs but they are out-of-date and must be refreshed:
|
1 |
mysql> select sum(total_lines) from complete_example2;<br>+------------------+<br>| sum(total_lines) |<br>+------------------+<br>| 155187034 | <-- too high<br>+------------------+<br>1 row in set (0.68 sec)<br><br>mysql> select sum(total_lines) from dashboard_customer_sales ;<br>+------------------+<br>| sum(total_lines) |<br>+------------------+<br>| 155187034 | <-- too high<br>+------------------+<br>1 row in set (0.61 sec) |
The view which uses the complete refresh method takes a very long time to refresh:
|
1 |
mysql> call flexviews.refresh( <br>-> flexviews.get_id('demo','complete_example2'),'BOTH',NULL); <br>Query OK, 0 rows affected (42 min 42.14 sec) |
But the incrementally refreshable version does not take long to refresh:
|
1 |
mysql> call flexviews.refresh( <br>-> flexviews.get_id('demo', <br>-> 'dashboard_customer_sales'),'BOTH',NULL); <br>Query OK, 0 rows affected (7.01 sec) |
The second one is 365x faster because it examines only the rows that changed. This exceptional ability to look at only what changed, even for MVs with aggregation and joins, is the value proposition for Flexviews.
The first parameter to flexviews.refresh() is the materialized view id. Each view has an identifier which can be obtained with flexviews.get_id(‘schema’,’table’).
The second parameter reflects the refresh type. This parameter can take the options: ‘BOTH’,’COMPUTE’,’APPLY’, or ‘COMPLETE’. I specified ‘BOTH’ for the second parameter in both examples. ‘BOTH’ means compute the changes to the view, and also apply them. This is a combination of the ‘COMPUTE’ and ‘APPLY’ options. When you specify ‘COMPUTE’ then Flexviews will compute the row changes for the view, but not apply them. If their are changes which have not been applied, then you can apply those changes by passing the ‘APPLY’ option.
In both examples, I pass NULL as the last parameter. Materialized views which based on the complete refresh method will always take NULL for this parameter. For incrementally refreshable views, the last parameter is a transaction id number, which is usually obtained with flexviews.get_uow_id_from_datetime() function. If you pass a NULL value, then it refreshes the view up to the latest changes which have been collected. You can use this to refresh multiple incremental refresh views to the same transactional point in time.
Also note that you can ‘COMPUTE’ or ‘APPLY’ changes to a particular transaction id. Logically, you can not apply changes past the transaction id to which you have computed them.
|
1 |
mysql> select sum(total_lines) from dashboard_customer_sales ; <br>+------------------+ <br>| sum(total_lines) | <br>+------------------+ <br>| 155186550 | <br>+------------------+ <br>1 row in set (0.64 sec) <br><br>mysql> select sum(total_lines) from complete_example2 ; <br>+------------------+ <br>| sum(total_lines) |<br>+------------------+ <br>| 155186550 | <br>+------------------+ <br>1 row in set (0.68 sec) |
Flexviews supports two refresh methods, the complete method and the incremental method. I think you will agree that the incremental method has significant advantages over the complete method. In this example the former method was over 350x faster than the latter.
I also showed how you can combine both types of views together. The complete method examples show how to create a complete refresh view which reads from an incrementally refreshable one. This allows the use of SQL features not available with the incremental method, like ORDER BY or use of NOW(), it but still provides improved performance during refresh by accessing summarized data.
I hope this helps you understand how Flexviews can help you ensure fast response times in your application by making access to summary data efficient.
Resources
RELATED POSTS