One of the major features coming to MySQL 8.0 is the support of Window functions. The detailed documentation is already available here Window functions. I wanted to take a quick look at the cases where window functions help.
Probably one the most frequent limitations in MySQL SQL syntax was analyzing a dataset. I tried to find the answer to the following question: “Find the Top N entries for each group in a grouped result.”
To give an example, I will refer to this request on Stackoverflow. While there is a solution, it is hardly intuitive and portable.
This is a popular problem, so databases without window support try to solve it in different ways. For example, ClickHouse introduced a special extension for LIMIT. You can use LIMIT n BY m to find “m” entries per group.
This is a case where window functions come in handy.
As an example, I will take the IMDB database and find the TOP 10 movies per century (well, the previous 20th and the current 21st).To download the IMDB dataset, you need to have to have an AWS account and download data from S3 storage (the details are provided on IMDB page).
I will use the following query with MySQL 8.0.3:SELECT primaryTitle,century*100,rating,genres,rn as `Rank` FROM (SELECT primaryTitle,startYear div 100 as century,rating,genres, RANK() OVER (PARTITION BY startYear div 100 ORDER BY rating desc) rn FROM title,ratings WHERE title.tconst=ratings.tconst AND titleType='movie' AND numVotes>100000) t1 WHERE rn<=10 ORDER BY century,rating desc
The main part of this query is RANK() OVER (PARTITION BY startYear div 100 ORDER BY rating desc), which is the mentioned window function. PARTITION BY divides rows into groups, ORDER BY specifies the order and RANK() calculates the rank using the order in the specific group.
The result is:+---------------------------------------------------+-------------+--------+----------------------------+------+ | primaryTitle | century*100 | rating | genres | Rank | +---------------------------------------------------+-------------+--------+----------------------------+------+ | The Shawshank Redemption | 1900 | 9.3 | Crime,Drama | 1 | | The Godfather | 1900 | 9.2 | Crime,Drama | 2 | | The Godfather: Part II | 1900 | 9 | Crime,Drama | 3 | | 12 Angry Men | 1900 | 8.9 | Crime,Drama | 4 | | The Good, the Bad and the Ugly | 1900 | 8.9 | Western | 4 | | Schindler's List | 1900 | 8.9 | Biography,Drama,History | 4 | | Pulp Fiction | 1900 | 8.9 | Crime,Drama | 4 | | Star Wars: Episode V - The Empire Strikes Back | 1900 | 8.8 | Action,Adventure,Fantasy | 8 | | Forrest Gump | 1900 | 8.8 | Comedy,Drama,Romance | 8 | | Fight Club | 1900 | 8.8 | Drama | 8 | | The Dark Knight | 2000 | 9 | Action,Crime,Drama | 1 | | The Lord of the Rings: The Return of the King | 2000 | 8.9 | Adventure,Drama,Fantasy | 2 | | The Lord of the Rings: The Fellowship of the Ring | 2000 | 8.8 | Adventure,Drama,Fantasy | 3 | | Inception | 2000 | 8.8 | Action,Adventure,Sci-Fi | 3 | | The Lord of the Rings: The Two Towers | 2000 | 8.7 | Action,Adventure,Drama | 5 | | City of God | 2000 | 8.7 | Crime,Drama | 5 | | Spirited Away | 2000 | 8.6 | Adventure,Animation,Family | 7 | | Interstellar | 2000 | 8.6 | Adventure,Drama,Sci-Fi | 7 | | The Intouchables | 2000 | 8.6 | Biography,Comedy,Drama | 7 | | Gladiator | 2000 | 8.5 | Action,Adventure,Drama | 10 | | Memento | 2000 | 8.5 | Mystery,Thriller | 10 | | The Pianist | 2000 | 8.5 | Biography,Drama,Music | 10 | | The Lives of Others | 2000 | 8.5 | Drama,Thriller | 10 | | The Departed | 2000 | 8.5 | Crime,Drama,Thriller | 10 | | The Prestige | 2000 | 8.5 | Drama,Mystery,Sci-Fi | 10 | | Like Stars on Earth | 2000 | 8.5 | Drama,Family | 10 | | Whiplash | 2000 | 8.5 | Drama,Music | 10 | +---------------------------------------------------+-------------+--------+----------------------------+------+ 27 rows in set (0.19 sec)
The previous century was dominated by “The Godfather” and the current one by “The Lord of the Rings”. While we may or may not agree with the results, this is what the IMDB rating tells us.
If we look at the result set, we can see that there are actually more than ten movies per century, but this is how function RANK() works. It gives the same RANK for rows with an identical rating. And if there are multiple rows with the same rating, all of them will be included in the result set.
I welcome the addition of window functions into MySQL 8.0. This definitely simplifies some complex analytical queries. Unfortunately, complex queries still will be single-threaded — this is a performance limiting factor. Hopefully, we can see multi-threaded query execution in future MySQL releases.
Join Percona’s, CTO, Vadim Tkachenko as he presents Gain a MongoDB Advantage with the Percona Memory Engine on Wednesday, December 6, 2017, at 11:00 am PST / 2:00 pm EST (UTC-8).
Experience: Entry Level to Intermediate
Tags: Developer, DBAs, Operations
Looking for the performance of Redis or Memcache, the expressiveness of the MongoDB query language and simple high availability and sharding? Percona Memory Engine, available as part of Percona Server for MongoDB, has it all!
In this webinar, Vadim explains the architecture of the MongoDB In-Memory storage engine. He’ll also show some benchmarks compared to disk-based storage engines and other in-memory technologies.
Vadim will share specific use cases where Percona Memory Engine for MongoDB excels, such as:
Vadim Tkachenko co-founded Percona in 2006 and serves as its Chief Technology Officer. Vadim leads Percona Labs, which focuses on technology research and performance evaluations of Percona’s and third-party products. Percona Labs designs no-gimmick tests of hardware, filesystems, storage engines, and databases that surpass the standard performance and functionality scenario benchmarks. Vadim’s expertise in LAMP performance and multi-threaded programming help optimize MySQL and InnoDB internals to take full advantage of modern hardware. Oracle Corporation and its predecessors have incorporated Vadim’s source code patches into the mainstream MySQL and InnoDB products. He also co-authored the book High-Performance MySQL: Optimization, Backups, and Replication 3rd Edition. Previously, he founded a web development company in his native Ukraine and spent two years in the High-Performance Group within the official MySQL support team. Vadim received a BS in Economics and an MS in computer science from the National Technical University of Ukraine.
In this blog post, I investigate a case of spiking InnoDB Rows inserted in the absence of a write query, and find internal temporary tables to be the culprit.
Recently I was investigating an interesting case for a customer. We could see the regular spikes on a graph depicting “InnoDB rows inserted” metric (jumping from 1K/sec to 6K/sec), however we were not able to correlate those spikes with other activity. The innodb_row_inserted graph (picture from PMM demo) looked similar to this (but on a much larger scale):
Other graphs (Com_*, Handler_*) did not show any spikes like that. I’ve examined the logs (we were not able to enable general log or change the threshold of the slow log), performance_schema, triggers, stored procedures, prepared statements and even reviewed the binary logs. However, I was not able to find any single write query which could have caused the spike to 6K rows inserted.
Finally, I figured out that I was focusing on the wrong queries. I was trying to correlate the spikes on the InnoDB Rows inserted graph to the DML queries (writes). However, the spike was caused by SELECT queries! But why would SELECT queries cause the massive InnoDB insert operation? How is this even possible?
It turned out that this is related to temporary tables on disk. In MySQL 5.7 the default setting for internal_tmp_disk_storage_engine is set for InnoDB. That means that if the SELECT needs to create a temporary table on disk (e.g., for GROUP BY) it will use the InnoDB storage engine.
Is that bad? Not necessarily. Krunal Bauskar published a blog post originally about the InnoDB Intrinsic Tables performance in MySQL 5.7. The InnoDB internal temporary tables are not redo/undo logged. So in general performance is better. However, here is what we need to watch out for:
Beware of the new change in MySQL 5.7, the internal temporary tables (those that are created for selects when a temporary table is needed) are stored in InnoDB ibtmp file. In most cases this is faster. However, it can change the original behavior. If needed, you can switch the creation of internal temp tables back to MyISAM: set global internal_tmp_disk_storage_engine=MYISAM
In this post, we’ll examine how we’ve improved the GUI layout for Percona Monitoring and Management 1.5 by moving the Query Analytics (QAN) functions into the Grafana interface.
For Percona Monitoring and Management users, you might notice that QAN appears a little differently in our 1.5 release. We’ve taken steps to unify the PMM interface so that it feels more natural to move from reviewing historical trends in Metrics Monitor to examining slow queries in QAN. Most significantly:
Starting from the PMM landing page, you still see two buttons – one for Metrics Monitor and another for Query Analytics (this hasn’t changed):
Once you select Query Analytics on the left, you see the new Metrics Monitor dashboard page for PMM Query Analytics. It is now hosted as a Metrics Monitor dashboard, and notice the URL is no longer /qan:
Another advantage of the Metrics Monitor dashboard integration is that the QAN inherits the host selector from Grafana, which supports partial string matching. This makes it simpler to find the host you’re searching for if you have more than a handful of instances:
The last feature enhancement worth mentioning is the native Grafana time selector, which lets you select down to the minute resolution time frames. This was a frequent source of feature requests — previously PMM limited you to our pre-defined default ranges. Keep in mind that QAN has an internal archiving job that caps QAN history at eight days.
We hope you enjoy the new interface, and we look forward to your feedback on these improvements!
Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.
The CFP for Percona Live Santa Clara 2018 closes December 22, 2017: please consider submitting as soon as possible. We want to make an early announcement of talks, so we’ll definitely do a first pass even before the CFP date closes. Keep in mind the expanded view of what we are after: it’s more than just MySQL and MongoDB. And don’t forget that with one day less, there will be intense competition to fit all the content in.
A new book on MySQL Cluster is out: Pro MySQL NDB Cluster by Jesper Wisborg Krogh and Mikiya Okuno. At 690 pages, it is a weighty tome, and something I fully plan on reading, considering I haven’t played with NDBCLUSTER for quite some time.
Did you know that since MySQL 5.7.17, connection control plugins are included? They help DBAs introduce an increasing delay in server response to clients after a certain number of consecutive failed connection attempts. Read more at the connection control plugins.
While there are a tonne of announcements coming out from the Amazon re:Invent 2017 event, I highly recommend also reading Some data of interest as AWS reinvent 2017 ramps up by James Governor. Telemetry data from sumologic’s 1,500 largest customers suggest that NoSQL database usage has overtaken relational database workloads! Read The State of Modern Applications in the Cloud. Page 8 tells us that MySQL is the #1 database on AWS (I don’t see MariaDB Server being mentioned which is odd; did they lump it in together?), and MySQL, Redis & MongoDB account for 40% of database adoption on AWS. In other news, Andy Jassy also mentions that less than 1.5 months after hitting 40,000 database migrations, they’ve gone past 45,000 over the Thanksgiving holiday last week. Have you started using AWS Database Migration Service?Releases
Percona Server for MongoDB is an enhanced, open source, fully compatible, highly-scalable, zero-maintenance downtime database supporting the MongoDB v3.4 protocol and drivers. It extends MongoDB with Percona Memory Engine and MongoRocks storage engine, as well as several enterprise-grade features:
Percona Server for MongoDB requires no changes to MongoDB applications or code.
This release is based on MongoDB 3.4.10 and includes the following additional change:
Percona announces the release of Percona Monitoring and Management 1.5.1. This release contains fixes for bugs found after Percona Monitoring and Management 1.5.0 was released.Bug fixes
Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, it drives down backup costs while providing unique features for MySQL backups.
This release is the current GA (Generally Available) stable release in the 2.3 series.New Features
Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, it drives down backup costs while providing unique features for MySQL backups.New features:
For general inquiries, please send us your question and someone will contact you.