If you know me, then you probably have heard of Flexviews. If not, then it might not be familiar to you. I’m giving a talk on it at the MySQL 2011 CE, and I figured I should blog about it before then. For those unfamiliar, Flexviews enables you to create and maintain incrementally refreshable materialized views.
You might be asking yourself “what is an incrementally refreshable materialized view?”. If so, then keep reading. This is the first in a multi-part series describing Flexviews.
The output of any SQL query is logically an in-memory table
The output of a SQL query, and what we normally think of as a result set is really a virtual table. It has columns and rows, just like a database table, but it is temporary in nature, usually existing only in memory for a short time. This concept extends from relational algebra, upon which SQL is built. All SQL can be broken down into relational algebra, which is convenient because that means there are lots of transformations that can be done to it, all without changing the meaning or the output. The output of all relational algebraic statements is a table, so conceptually so too is the output of a SQL statement.
MySQL even includes a SQL statement that makes this perfectly clear: CREATE TABLE .. AS SELECT (CTAS). The results of the SELECT portion of the statement are stored in a table. Storing the results of a SQL statement into a table (even a temporary table) is called materializing the results.
Views are similar to regular SQL statements.
A view is a SQL statement which acts like a table. When you execute a query on a view, the result set is ephemeral, generated at run-time for consumption and then immediately discarded.
Views are not generally considered to be a performance optimization because:
- The contents of the result set are computed each time the view is accessed.
- If multiple statements access the same view repeatedly or concurrently, then this computation is likely to be expensive
- If the view accesses large quantities of data, then the computation is likely expensive.
- Views containing grouping, aggregation, sorting, distinct or other conditions must be fully computed and stored in a temporary table before they can be accessed, which is very expensive.
What is a materialized view (MV)?
A materialized view is similar to a regular view, in that it represents the result set of a query, but the contents are stored (materialized!) as a real table.Â So a MV is similar to a table created with the CTAS command described above. This similarity is fairly superficial though. While both store results in a table, the MV represents the results of a SQL statement at a specific point in time, essentially a consistent snapshot of the query result. It is not possible to create multiple different tables via CTAS and have them all be transactionally consistent with one another, unless you stop all database write activity.
Materialized views can be used to enhance performance by acting as a cache. Further, the cost of a cache miss is lower because incremental refresh is faster than recomputing the contents from scratch.
- The contents of the result set are updated periodically, not each time the view is accessed.
- If multiple statements access the same view repeatedly or concurrently it is not likely to be very expensive.
- If the view is large, accessing the table will be considerably cheaper
- You can add indexes to the MV.
- Since the data is already joined together and pre-aggregated, CPU and memory usage may be reduced compared to computing the results.
- You can often generate multiple different blocks of content from the summarized data in one or more views
Materialized views must be refreshed.
Refreshing a MV brings it up to date to reflect the changes in the database since the view was either first created or last refreshed, whichever is more recent. More importantly a MV can can be refreshed to a specific point in time, not just “now”. This means that you can maintain multiple MVs and keep them synced to the same point in time.Â Â There are two different methods by which a MV can be refreshed.
The first is the complete refresh method, which rebuilds the entire contents of the view from scratch. This is the less desirable method:
- During a complete refresh, the view contents are completely recalculated, which could be very expensive
- On some databases, the contents of the view are not available during complete refresh (not true of Flexviews)
- During refresh the view may occupy twice as much space (similar to ALTER TABLE)
- Supports all SQL syntax (like outer join) but can’t be refreshed to a specific point in time.
- Performs no better than CTAS, but gives a convenient method of creating and refreshing the materialized results
The second is refresh method is incremental refresh. This method updates the view. It usually examines only the rows which have changed since the view was last refreshed.
Incremental refresh has obvious benefits which include:
- Refreshing large views is orders of magnitude faster than complete refresh.
- When updating the view, only a subset of the database rows must be examined
- The rows examined are related to the rows which have changed since the last refresh
- The view can be refreshed forward to to a specific transactional point in time
- Multiple views can be rolled forward to the exact same consistent point in time, with respect to committed transactions
- The processing can be done on a slave dedicated to such work.
And some drawbacks:
- Not all SQL syntax supported (no outer join), no non-deterministic functions, etc.
- There is overhead for change-data capture.
- Some extra storage is used for the changes and deltas.
- Creating an MV is not as simple as I’d like.
Incremental refresh capabilities imply that Flexviews has some way to capture the changes that happen in the database and then apply those changes to the views.
These capabilities break down into three main categories which will be blogged about in subsequent posts:
- Change Data Capture – How Flexviews figures out what changed in the database
- Delta computation – How it uses those changes to compute the differences between the old results and the new results
- Delta application – How Flexviews uses the deltas to actually make the changes
- Finally, the last blog post will describe how to use Flexviews for online schema change, or as an ELT* tool and it will cover how to create the materialized views which Flexviews manages.
*ELT – extract, load and transform, a variant of the data integration Extract-Transform-Load process (ETL).
You can find part 2 of the series here: http://www.percona.com/blog/2011/03/25/using-flexviews-part-two-change-data-capture/