Latest MySQL Performance Blog posts

Percona's MySQL & InnoDB performance and scalability blog

Free MySQL QA & Bash/Linux Training Series

March 17, 2015 - 6:00am

Welcome to the MySQL QA Training Series!

If you have not read our introductory blog post on pquery yet, I’d recommend reading that one first for a bit of background. The community is enthusiastic about pquery, and today I am happy to announce a full training series on pquery and more. Whether you are a Linux or MySQL newbie or a seasoned QA engineer, there is something here for you. From Bash scripting (see Episode 1 below) to every aspect of the new pquery framework, it is my hope that you enjoy this series. If you do, please leave us a comment!

Database quality assurance is not as straightforward as it may seem. It’s not a matter of point-and-click, but rather of many intertwined tools and scripts. Beyond that, due to the complexity of the underlying product, it’s about having an overall plan or vision on how to adequately test the product in every aspect.

Take for example the SELECT statement: it allows specifying around 30 different clauses or modifiers (GROUP BY, WHERE, ORDER BY, LIMIT, HAVING, …). Then think further about what one could do inside these clauses, or inside subselects, etc. The number of possible combinations (exhaustive testing) of all commands (and all formats and variations thereof), plus all mysqld options (nearly 500 of them), is for all intents and purposes infinite, and thus seemingly impossible to test.

In Episode 13, an approach is proposed which, in our view, adequately solves this seemingly infinite test-coverage problem through the use of random spread coverage testing and SQL interleaving. Knowing your Bash/Linux/GNU scripting well is also practically a prerequisite to getting started with mysqld QA. Episode 1 in this series offers over 3.5 hours of in-depth (from easy to advanced) training on many often-used Bash commands and topics: from ls to sed, from cp to xargs, and from variables to arrays. Enjoy!
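
To give a taste of the Episode 1 material, here is a minimal Bash sketch combining a few of those topics (the paths and file names are hypothetical):

#!/bin/bash
# Variables and arrays
BUILDS=(mysql-5.6 mysql-5.7 percona-server-5.6)
for build in "${BUILDS[@]}"; do
  echo "Would test: ${build}"
done
# xargs: run a command once per input item (here: compress each error log)
ls /var/log/mysql/*.err 2>/dev/null | xargs -I{} gzip {}
# sed: strip comment lines from a config file
sed '/^#/d' /etc/my.cnf > /tmp/my.cnf.stripped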

Without further ado, here are the planned upcoming episodes:

MySQL QA Episode 1: Bash/GNU Tools & Linux Upskill & Scripting Fun
MySQL QA Episode 2: Build a MySQL Server: Git, Compiling, Build Tools
MySQL QA Episode 3: Debugging: GDB, Backtraces, Frames, Library Dependencies
MySQL QA Episode 4: QA Framework Setup Time! percona-qa, pquery, reducer & more
MySQL QA Episode 5: Preparing Your QA Run: mtr_to_sql.sh and pquery-run.sh
MySQL QA Episode 6: Analyzing & Filtering: pquery-prep-red.sh, -clean-known.sh & pquery-results.sh
MySQL QA Episode 7: Reducing Testcases for Beginners: single-threaded reducer.sh
MySQL QA Episode 8: Reducing Testcases for Engineers: tuning reducer.sh
MySQL QA Episode 9: Reducing Testcases for Experts: multi-threaded reducer.sh
MySQL QA Episode 10: Reproducing and Simplifying: How to get it Right
MySQL QA Episode 11: Valgrind Testing: Pro’s, Con’s, Why and How
MySQL QA Episode 12: Multi-node Cluster Testing Using Docker
MySQL QA Episode 13: A Better Approach to all MySQL Regression, Stress & Feature Testing: Random Coverage Testing & SQL Interleaving

As episodes are finished, the series titles above will be linked (each with a short introduction), so it’s easy to check this page for updates.

Enjoy!


Deep dive into MySQL’s innochecksum tool

March 16, 2015 - 12:00am

One of our Percona Support customers recently reported that Percona XtraBackup failed with a page corruption error on an InnoDB table. The customer suspected a problem or bug in the Percona XtraBackup tool. After investigation we found that an InnoDB page was indeed corrupted and that Percona XtraBackup caught the error as expected, hence the backup job failed.

I thought this would be an interesting topic and worthy of a blog post. In this article I will describe the innochecksum tool, when and how to use it, and the possible fixes if an InnoDB table suffers from page corruption.

The innochecksum tool is an offline tool that prints checksums for InnoDB files. It reads an InnoDB tablespace file, calculates the checksum for each page and reports mismatches, if any. A checksum mismatch is an indication of corrupt pages. Being an offline tool, innochecksum can’t be used on a tablespace file that a MySQL server is currently using, so you need to shut down the server before running it. If you run innochecksum against a running MySQL server, there is a chance that it crashes or reports a bad checksum for a good page, producing false positives: pages in a tablespace opened by mysqld may be dirty and not yet checksummed by the InnoDB storage engine itself.

The point: don’t run innochecksum against a running server.

InnoDB corruption can be caused by many factors (e.g. power loss, faulty hardware, bugs). While reading pages from a tablespace on disk, the InnoDB storage engine compares the calculated checksum with the checksum stored in the page. If InnoDB finds a page checksum mismatch, it will force the MySQL server to shut down.

Let me show you a page corruption error identified by Percona XtraBackup during a backup run, which caused the backup to fail:

[01] xtrabackup: Database page corruption detected at page 25413, retrying...
[01] xtrabackup: Database page corruption detected at page 25413, retrying...
[01] xtrabackup: Database page corruption detected at page 25413, retrying...

First, we need to identify whether the tablespace for that particular table is really corrupted. I do that with the help of the innochecksum utility, as shown below. As mentioned earlier, make sure to shut down MySQL before using the innochecksum tool.

$ innochecksum -p 25413 /path/to/datadir/database_name/table_name.ibd

I passed the -p (page) flag so that innochecksum only checks the specific page that was reported corrupt by Percona XtraBackup. Without any options, the innochecksum tool checks the entire tablespace for corruption, which requires additional server downtime. The innochecksum tool also supports the -d (debug) option to print the checksum for each page and the -v (verbose) option to print a progress indicator. You can find more details in the manual. If the tool reports page corruption, then the database table really is corrupted, as below.

page 25413 invalid (fails log sequence number check)
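
For comparison, a full-tablespace scan (again with mysqld shut down) is simply the same command without -p; adding -v prints the progress indicator mentioned above:

$ innochecksum -v /path/to/datadir/database_name/table_name.ibd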

In order to fix this issue, the first thing you should try is to mysqldump the corrupted table. If mysqldump succeeds, the corruption lives in the secondary indexes of that tablespace: the mysqldump utility doesn’t touch indexes, as indexes are created after all rows are inserted. In that case, I would suggest the following options to fix the corruption.
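
A minimal sketch of that dump-and-reload test, with placeholder database and table names:

# Dump only the affected table; if this succeeds, the row data is intact
$ mysqldump database_name table_name > /tmp/table_name.sql
# The dump file recreates the table (and its indexes) from scratch on reload
$ mysql -e "DROP TABLE database_name.table_name"
$ mysql database_name < /tmp/table_name.sql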

— Execute OPTIMIZE TABLE on that table, which rebuilds all indexes. The table will be locked during the operation prior to MySQL 5.6.17; since MySQL 5.6.17, OPTIMIZE TABLE is an online operation.
— Rebuild the table with the pt-online-schema-change tool from Percona Toolkit. This gives the same result as OPTIMIZE TABLE in a non-blocking way, as pt-online-schema-change is an online schema change tool.
— Drop all secondary indexes and then recreate them. The table will be locked for writes only during that operation. Again, you can use the pt-online-schema-change tool for this purpose without sacrificing read/write ability on the table while the indexes are dropped and recreated.
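
For illustration, the first two options might look like the following (placeholder names again; the ENGINE=InnoDB "null alter" is the usual way to get a full table rebuild out of pt-online-schema-change):

mysql> OPTIMIZE TABLE database_name.table_name;
$ pt-online-schema-change --alter "ENGINE=InnoDB" D=database_name,t=table_name --execute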

Finally, I would suggest re-running the innochecksum tool to verify the table’s integrity, as this will confirm there is no more page corruption. In this case we found that the table was indeed corrupted; fixing the corruption through a dump and reload of the table solved the problem, and Percona XtraBackup ran fine during the next run.

It is possible that mysqldump crashes the MySQL server on a corrupted table. Fortunately, Percona Server contains the innodb_corrupt_table_action variable, which you can enable. The variable is dynamic, which means enabling it doesn’t require a MySQL server restart. Prior to Percona Server 5.6, innodb_corrupt_table_action was known as innodb_pass_corrupt_table. Once you enable this option, you can try mysqldump again. If you are using Oracle MySQL, I would suggest trying innodb_force_recovery in case mysqldump fails to dump the table contents.
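
A rough sketch of both approaches; I am quoting the option value from memory, so double-check it against the documentation for your server version:

# Percona Server: dynamic, no restart needed
mysql> SET GLOBAL innodb_corrupt_table_action = 'salvage';
# Oracle MySQL: set innodb_force_recovery in my.cnf (start at 1, increase only if needed) and restart
[mysqld]
innodb_force_recovery = 1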

As a side note, if your backup completes without any errors when performing a backup with Percona XtraBackup, your InnoDB tables don’t have any page checksum mismatches or corruption. Percona XtraBackup validates page checksums and, in case of errors, logs the error and exits, as I mentioned above.

There is also a modified version of innochecksum, made available by Facebook’s Mark Callaghan, which can be found in this bug report and provides extra stats on tablespace undo blocks. There is another tool, made by Google’s Jeremy Cole, known as the InnoDB Ruby Tool, for examining the internals of InnoDB.

LIMITATIONS:

  • Innochecksum is an offline InnoDB checksum tool, which means you must stop the MySQL server; otherwise (since MySQL 5.7) it produces an “Unable to lock file” error.
  • Old versions of innochecksum only support files up to 2GB in size. Since MySQL 5.6, however, innochecksum supports files greater than 2GB.
  • The Percona Server variable innodb_corrupt_table_action is only supported for tables that exist in their own tablespace (i.e. innodb_file_per_table).
  • If you are using compressed tables (ROW_FORMAT=COMPRESSED), you must use innochecksum from MySQL 5.7.2 or greater, as earlier versions of innochecksum don’t support compressed tables. Check this bug for details.

New Features for the innochecksum tool from MySQL 5.7:

  • As mentioned above, since MySQL 5.7 innochecksum supports file sizes greater than 2GB.
  • Since MySQL 5.7 you can log the output with the --log option.
  • The --page-type-summary option was added for page type summaries.
  • MySQL 5.7 also includes another nice option, --page-type-dump, which dumps the details of each page to standard output (STDOUT) or standard error (STDERR).
  • Since MySQL 5.7 innochecksum can be executed on multiple user-defined tablespace files.
  • Since MySQL 5.7 innochecksum can be executed on multiple system tablespace files.

You can read more about this in the MySQL 5.7 manual page for innochecksum.

Conclusion:
In this post, we identified InnoDB page corruption using the logs generated by Percona XtraBackup and fixed it with the help of the mysqldump utility. Unfortunately, Percona XtraBackup will not always fail in the same way when it finds corruption, so in some cases it’s not easy to tell whether Percona XtraBackup has failed due to a bad checksum or due to a bug of its own. In most cases, though, page corruption is the culprit if Percona XtraBackup fails to complete.

To summarize, I would say that Percona XtraBackup is a good way of verifying whether or not InnoDB pages are corrupted – and you can also verify the same thing via the mysqldump utility.


Q&A: Multi-threaded Replication in MySQL 5.6 and MySQL 5.7

March 13, 2015 - 8:45am

My webinar “Multi-threaded Replication in MySQL 5.6 and 5.7” on February 25 generated several excellent questions following the presentation (available here for playback along with the slides). I didn’t have time to answer many of the questions during the session, so in this post I answer all of them. Thanks to everyone who attended!

Q: What do you expect from MTS with logical clock? Do you think performance would be as good as with per-database parallelization?
A: MTS in 5.6 is not usable if you have a single database, and I do not have numbers, but single-database setups are quite frequent. With 5.7 everyone should be able to benefit from multi-threaded replication.

Q: When MySQL 5.6 was released, the performance of MTS was lower than in 5.5, for example. Is this addressed now?
A: I am not sure which specific issue or bug you are referring to, but if your data is spread across several databases, MTS in 5.6 should perform better than single-threaded replication.

Q: How does Percona XtraBackup work with MTS? What are the changes in mysqldump?
A: As long as you are using GTIDs, you can safely take a backup from a slave using multi-threaded replication: with XtraBackup, add the --slave-info option as usual when taking a backup from a slave and with mysqldump, use --master-data instead of --dump-slave.
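
A minimal sketch of both commands (the backup destination and file names are placeholders):

$ innobackupex --slave-info /data/backups
$ mysqldump --all-databases --single-transaction --master-data=2 > /data/backups/dump.sql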

Q: For the checkpoint position, what if an MTS thread applies an INSERT before the table it is inserting into has been created? How will the MTS checkpoint keep track of these transactions applied by different threads on the slave?
A: The worker threads track all execution gaps to make sure that out-of-order execution is safe and to be able to replay all events without forgetting any of them. So it is not possible that a worker thread will insert data in a table that has not been created yet.

Q: Can you use MTS with all binlog_format options?
A: Yes

Q: Is there any way to have the threads work so that no database contention happens?
A: The short answer is no: the goal of the worker threads is to execute the incoming transactions as fast as possible. If that results in database contention, you should probably decrease the number of worker threads.

Q: Why doesn’t multi-threaded replication perform well on a single DB?
A: With 5.6, parallelization is based on isolating the transactions going to each database. If you only have a single DB, no parallelization is possible. You should look at 5.7 and the logical clock algorithm.

Q: Are there any implications with regards to GTIDs and Multi-Threaded replication when running a Master-to-Master setup?
A: I cannot think of any, however I am not sure master-master replication is still very relevant when using GTIDs.

Q: Is there any inconvenience with memory or cache when using more workers than the number of databases?
A: If the number of workers is just a bit higher than the number of databases (like 5 workers for 3 databases), there should not be any issue. However with ridiculously high numbers (500 workers for 2 databases), there might be performance degradation. I have not tested such cases, so I cannot give a good answer there. However, the idea is that the number of workers should be close to the number of databases and should not exceed the number of cores on the server.

Q: Is there multi-threaded replication in MySQL 5.7?
A: Yes, multi-threaded replication is available in MySQL 5.7 and offers improvements compared to MySQL 5.6 (mainly the parallelization with logical clock).
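
As a rough sketch, enabling multi-threaded replication looks like this (the slave SQL thread must be stopped while changing these settings):

mysql> STOP SLAVE SQL_THREAD;
mysql> SET GLOBAL slave_parallel_workers = 4;
mysql> SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK'; -- MySQL 5.7 only
mysql> START SLAVE SQL_THREAD;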

Q: Have you used DIM_STAT to create load and measure slave lag? Any interesting takeaways from that effort?
A: I used sysbench to generate load and Seconds_Behind_Master from SHOW SLAVE STATUS to measure slave lag. That mainly shows that if your workload is a good fit for MTS (multiple databases and load shared evenly across them), performance benefits can be significant.

Q: Does multi-threaded replication also work with Percona XtraDB Cluster/Percona Server?
A: Percona Server 5.6 is based on MySQL 5.6, so you can use multi-threaded replication exactly as you would use it on MySQL 5.6.

On Percona XtraDB Cluster, it is a bit different: replication inside the cluster uses Galera replication, which has nothing to do with MySQL replication. Note that Galera has offered parallel replication from the beginning (parallel applying of the replicated writesets to be accurate). However if you are using asynchronous replicas, these replicas can benefit from multi-threaded replication if they are running on MySQL/Percona Server 5.6.

Q: What happens to cross db transactions? Do they not replicate?
A: These transactions will replicate, but they will have to wait until all preceding transactions have been executed. Stated differently, cross db transactions introduce serialization, so you should avoid them as much as possible if you want to benefit from parallel applying.

To be accurate, if you have db1, db2 and db3 and if you execute a transaction involving db1 and db2, transactions on db3 can still be applied in parallel. So if you have many databases, cross db transactions may not be that bad.

Q: When using MTS without GTIDs, is “Seconds_Behind_Master” from SHOW SLAVE STATUS valid?
A: Seconds_Behind_Master is based on Exec_Master_Log_Pos. And with MTS, Exec_Master_Log_Pos is not reliable as it indicates the position of the latest checkpoint and not the position of the latest executed transaction. However in practice, checkpoints will happen at least every 300ms by default, so Seconds_Behind_Master is still a good indication of the replication lag. Of course you should keep in mind the usual limitations, such as with multi-tiered replication (if the setup is A->B->C, C will report its lag against B, not against A) or when there is a replication error (then Seconds_Behind_Master is NULL).

Q: How can all the servers be realistically restarted at the same time? There could be intervals of a few seconds if you have multiple servers. [That was when I explained how to enable GTID replication.]
A: With MySQL 5.6, the requirements are pretty strict when it comes to enabling GTIDs: all servers must be restarted at the same point in time in the replication stream. As you mention, it is difficult if you have several servers, so the only viable solution is: stop the writes on the master, wait until replication has caught up on all slaves, stop all servers, change the configuration, restart all servers.

What it means is that there is a time range when all servers are down. This is a showstopper for many people, and that’s why Percona Server 5.6 now includes a patch from Facebook that allows an online migration to GTIDs and that’s why MySQL 5.7.6 also offers this option.

* * *

Thanks for all of the great questions – and I hope to see you next month at OpenStack Live and the Percona Live MySQL Conference and Expo 2015 (April 13-16) – both at the Santa Clara conference center in sunny Silicon Valley. Get more info here.


MySQL and geospatial programming: An introduction to GIS

March 13, 2015 - 3:00am

Geographic information systems (GIS) are used by application developers to incorporate geographic information into their datasets to create apps with cool features, such as finding the address of the best steakhouse in town or the locations of local points of historical interest (the list is endless). In a nutshell, GIS captures, stores, checks and displays data related to positions on the Earth’s surface.

Next Wednesday I’m going to provide an introduction to GIS functionality in MySQL along with an overview of recent changes. I invite you to register now for this free webinar, titled “MySQL and Geospatial Programming.” It starts at 10 a.m. Pacific time on March 18.

In this talk I’ll discuss:

  • The various tasks that deal with geocoding
  • How MySQL can solve problems such as points within a radius (e.g., “Find the 10 closest coffee shops”), among others (see the sketch just after this list)
  • Highlights from some of the new features bundled in the upcoming 5.7 release, and what benefits they can bring to your applications
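
To give a flavor of what’s possible, here is a minimal sketch of a nearest-points query; the coffee_shops table, its POINT column location and the coordinates are hypothetical, and keep in mind that ST_Distance computes planar (Cartesian) distance, not meters:

select name,
       ST_Distance(location, ST_GeomFromText('POINT(-121.97 37.39)')) as dist
from coffee_shops
order by dist
limit 10;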

I’m happy to field questions in advance in the comments section below. This webinar, like all Percona webinars, will be recorded. In addition to the video, my slides will also be available for download.

Register now and I’ll talk to you next week!


Easy query metrics with MySQL Performance Schema

March 12, 2015 - 12:00am

The MySQL Performance Schema exposes so much data that it’s not trivial to learn, configure, and use. With the recently released Percona Agent 1.0.11 you can get query metrics – like min, max, and average query execution time – with a few clicks in the Percona Cloud Tools web interface:

Click “Apply” and about two minutes later you’ll have query metrics from Performance Schema, collected and sent every minute.

Percona Cloud Tools (PCT) and Percona Agent handle all the details. You’ll need MySQL (or Percona Server) 5.6 and Percona Agent 1.0.11 or newer. One caveat at the moment: it only works for local MySQL instances (so not Amazon RDS). This limitation should be fixed soon; we’re already working on it.

Why use Performance Schema? We prefer Query Analytics with MySQL slow logs (especially Percona Server slow logs, which expose the most metrics of any source), but sometimes the slow log just isn’t an option. In that case Performance Schema is the next best choice, though the choice means tradeoffs. For example, Performance Schema does not expose actual query examples (just fingerprints), so EXPLAIN does not work.
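
Percona Agent handles the collection details, but the underlying data is visible to anyone: here is a minimal sketch of querying the statement digest table directly (timer columns are in picoseconds, hence the division):

select DIGEST_TEXT,
       COUNT_STAR,
       MIN_TIMER_WAIT/1e12 as min_sec,
       AVG_TIMER_WAIT/1e12 as avg_sec,
       MAX_TIMER_WAIT/1e12 as max_sec
from performance_schema.events_statements_summary_by_digest
order by SUM_TIMER_WAIT desc
limit 10;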

For those who’ve been following PCT development, you know that Percona Agent 1.0.5 first introduced support for Performance Schema. What’s new in 1.0.11 is everything – we completely rewrote this part of the agent. It’s so much better that it’s now the required minimum version for using Query Analytics with Performance Schema. Upgrading is really easy: just run the single command line you used to install the agent and it will auto-update.

MySQL Performance Schema exposes a lot of data and insights into the server, but query metrics are perhaps the most important because the primary job of your MySQL database is to execute queries. That’s why Percona Cloud Tools makes Query Analytics with Performance Schema (and slow logs) so easy: to help you focus on the essential and leave the details to the tools.

Percona Cloud Tools is in beta, so it’s still free to sign up and free to use all the tools and features.


Advanced JSON for MySQL

March 10, 2015 - 3:00am
What is JSON

JSON is a text-based, human-readable format for transmitting data between systems, for serializing objects, and for storing data in document stores, where each document can have different attributes/schema. Popular document store databases use JSON (and the related BSON) for storing and transmitting data.

Problems with JSON in MySQL

It is difficult to inter-operate between MySQL and MongoDB (or other document databases) because JSON has traditionally been very difficult to work with in MySQL. Up until recently, JSON was just a TEXT document. I said up until recently, so what has changed? The biggest thing is that there are new JSON UDFs by Sveta Smirnova, which are part of the MySQL 5.7 Labs releases. Currently the JSON UDFs are at version 0.0.4. While these new UDFs are a welcome addition to the MySQL database, they don’t solve the really tough JSON problems we face.

Searching

The JSON UDFs provide a number of functions that make working with JSON easier, including the ability to extract portions of a document or to search a document for a particular key. That being said, you can’t usefully use JSON_EXTRACT() or JSON_SEARCH() in the WHERE clause, because doing so initiates a dreaded full table scan (what MongoDB would call a full collection scan). This is a big problem, and common wisdom is that JSON can’t be indexed for efficient WHERE clauses, especially sub-documents like arrays or objects within the JSON.
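
To make the problem concrete, here is a sketch of the kind of query that can’t use an index (it assumes the json2 table created later in this post, and the exact UDF signature may vary between Labs releases):

-- Every row must be read and its JSON parsed: a full table scan
select doc from json2 where json_extract(doc, 'last_name') = 'Vembu';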

Actually, however, I’ve come up with a technique to effectively index JSON data in MySQL (to any depth). The key lies in transforming the JSON from a format that is not easily indexed into one that is. Now, when you think index, you think B-TREE or HASH indexes (or bitmap indexes), but MySQL also supports FULLTEXT indexes.

A fulltext index is an inverted index where words (tokens) point to documents. While fulltext indexes are great, they aren’t normally usable for JSON: MySQL splits words on whitespace and non-alphanumeric characters, so the name of a field (the key) can’t be associated with its value. But what if we transform the JSON? You can “flatten” the JSON down into key/value pairs and use a fulltext index to associate the key/value pairs with the document. I created a UDF called RAPID_FLATTEN_JSON using the C++ RapidJSON library. The UDF flattens JSON documents down into key/value pairs for the specific purpose of indexing.

Here is an example JSON document:

{ "id": "0001", "type": "donut", "name": "Cake", "ppu": 0.55, "batters": { "batter": [ { "id": "1001", "type": "Regular" }, { "id": "1002", "type": "Chocolate" }, { "id": "1003", "type": "Blueberry" }, { "id": "1004", "type": "Devil's Food" } ] }, "topping": [ { "id": "5001", "type": "None" }, { "id": "5002", "type": "Glazed" }, { "id": "5005", "type": "Sugar" }, { "id": "5007", "type": "Powdered Sugar" }, { "id": "5006", "type": "Chocolate with Sprinkles" }, { "id": "5003", "type": "Chocolate" }, { "id": "5004", "type": "Maple" } ] }

Flattened:

mysql> select RAPID_FLATTEN_JSON(load_file('/tmp/doc.json'))\G
*************************** 1. row ***************************
RAPID_FLATTEN_JSON(load_file('/tmp/doc.json')): id=0001 type=donut name=Cake ppu=0.55 id=1001 type=Regular id=1002 type=Chocolate id=1003 type=Blueberry id=1004 type=Devil's Food type=Devil's type=Food id=5001 type=None id=5002 type=Glazed id=5005 type=Sugar id=5007 type=Powdered Sugar type=Powdered type=Sugar id=5006 type=Chocolate with Sprinkles type=Chocolate type=with type=Sprinkles id=5003 type=Chocolate id=5004 type=Maple
1 row in set (0.00 sec)

Obviously this is useful, because our keys are now attached to our values in an easily searchable way. All you need to do is store the flattened version of the JSON in another field (or another table) and index it with a FULLTEXT index to make it searchable. But wait, there is one more big problem: MySQL will split words on the equals sign. We don’t want this, as it removes the locality of the keyword and the value. To fix this problem you’ll have to undertake the (actually quite easy) step of adding a new collation to MySQL (I called mine ft_kvpair_ci). I added equals (=) to the list of lower-case characters as described in the manual. You just have to change two text files, no need to recompile the server or anything, and, as I said, it is pretty easy. Let me know if you get stuck on this step and I can show you the 5.6.22 files I modified.

By the way, I used a UDF, because MySQL FULLTEXT indexes don’t support pluggable parsers for InnoDB until 5.7. This will be much cleaner in 5.7 with a parser plugin and there will be no need to maintain an extra column.

Using the solution:
Given a table full of complex json:

create table json2(id int auto_increment primary key, doc mediumtext);

Add a column for the index data and FULLTEXT index it:

alter table json2 add flat mediumtext character set latin1 collate ft_kvpair_ci, FULLTEXT(flat);

Then populate the index. Note that you can create a trigger to keep the second column in sync (I leave that as an exercise for the reader, though a minimal sketch follows the populate step below), or you can use Flexviews to maintain a copy in a second table automatically.

mysql> update json2 set flat=RAPID_FLATTEN_JSON(doc);
Query OK, 18801 rows affected (26.34 sec)
Rows matched: 18801  Changed: 18801  Warnings: 0
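
As promised, a minimal sketch of the sync triggers (assuming the json2 table above and the RAPID_FLATTEN_JSON UDF are in place):

-- Re-flatten the document whenever a row is inserted or updated
create trigger json2_flat_bi before insert on json2
for each row set NEW.flat = RAPID_FLATTEN_JSON(NEW.doc);

create trigger json2_flat_bu before update on json2
for each row set NEW.flat = RAPID_FLATTEN_JSON(NEW.doc);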

Using the index:

mysql> select count(*) from json2 where match(flat) against ('last_name=Vembu');
+----------+
| count(*) |
+----------+
|        3 |
+----------+
1 row in set (0.00 sec)

The documents I searched in that example are very complex and highly nested. Check out the full matching documents for the query here.

If you want to only index a subportion of the document, use the MySQL UDF JSON_EXTRACT to extract the portion you want to index, and only flatten that.

Aggregating

JSON documents may contain sub-documents, as mentioned a moment ago. JSON_EXTRACT can extract a portion of a document, but the result is still a text document. There is no function that can extract ALL values of a particular key (like invoice_price) and aggregate the results. So, if you have a document called orders which contains a varying number of items and their prices, it is very difficult (if not impossible) to use the JSON UDFs to aggregate a “total sales” figure from all the order documents.

To solve this problem, I created another UDF called RAPID_EXTRACT_ALL(json, ‘key’). This UDF extracts all the values for the given key. For example, if there are 10 line items with invoice_id: 30, it will extract the value (30 in this case) for each item. The UDF returns the items separated by newlines. I also created a few stored routines called jsum, jmin, jmax, jcount, and javg, which can process the output of RAPID_EXTRACT_ALL and aggregate it. If you want to run RAPID_EXTRACT_ALL on only a portion of a document, extract that portion with the MySQL UDF JSON_EXTRACT first, then process it with RAPID_EXTRACT_ALL.

For example:

mysql> select json_extract_all(doc,'id') ids, jsum(json_extract_all(doc,'id')) from json2 limit 1\G
*************************** 1. row ***************************
ids: 888
889
2312
5869
8702
jsum(json_extract_all(doc,'id')): 18660.00000
1 row in set (0.01 sec)

Aggregating all of the id values in the entire collection:

mysql> select sum( jsum(json_extract_all(doc,'id')) ) from json2;
+-----------------------------------------+
| sum( jsum(json_extract_all(doc,'id')) ) |
+-----------------------------------------+
|                         296615411.00000 |
+-----------------------------------------+
1 row in set (2.90 sec)

Of course you could extract other fields and sort and group on them.

Where to get the tools:
You can find the UDFs in the swanhart-tools GitHub repo. I think you will find these tools very useful for working with JSON documents in MySQL.

(This post originally appeared on my personal blog, swanhart.livejournal.com, but is reposted here for wider distribution.)

