Booking.com, one of the world’s leading e-commerce companies, helps travelers book nearly 1 million rooms per night. Established in 1996, Booking.com B.V. guarantees the best prices for any type of property, from small, family-run bed and breakfasts to executive apartments and five-star luxury suites.
The travel website is also a dedicated contributor to the MySQL and Perl communities. Other open source technologies it uses include CentOS Linux, Nginx, Python, Puppet, Git and more.
Booking.com is a Diamond sponsor of Percona Live Amsterdam, Sept. 21-23, and you can meet the people who power Booking.com at booth 205. Enter promo code “BlogInterview” at registration to save €20!
In the meantime, meet Jean-François Gagné, a system engineer at Booking.com. He’ll be presenting a couple of talks: “Riding the Binlog: an in Deep Dissection of the Replication Stream” and “Binlog Servers at Booking.com.”
Tom: Hi Jean-François, in your session, “Riding the Binlog: an in Deep Dissection of the Replication Stream”, you talk about how we can think of the binary logs as a transport for a “Stream of Transactions”. What will be the top 3 things attendees will come away with following this 50-minute talk?
Jean-François: Hi Tom, thanks for this opportunity to give a sneak peek of my talk. The most important subject that will be discussed is that the binary logs evolve: through the use of “log-slave-updates”, the stream can grow, shrink or morph. Said another way: the binary logs of a slave can be very different from the binary logs of the master, and this should be taken into account when relying on them (including when replicating through an intermediate master and when promoting a slave as a new master using GTIDs). We will also explore how the binary logs can be decomposed into sub-streams, or viewed as the multiplexing of many streams. Finally, we will look at de-multiplexing functions and the new possibilities they open up.
Tom: Percona Live, starting with this conference, has a new venue and a broader theme – now encompassing, in addition to MySQL, MongoDB, NoSQL and data in the cloud. Your thoughts? And what do you think is missing – what would you change (if anything)?
Jean-François: I think you forget the best of all changes: going from a 2-day conference last year in London to a 3-day conference this year. This will allow better knowledge exchange and I am very happy about that. I think this event will be a success, with a good balance of sessions focused on technologies and presentations about specific use cases of those technologies. If I had one wish, I would like to see more sessions about specific use cases of NoSQL technologies, with an in-depth discussion of why they are a better choice than more traditional solutions. Maybe more of those sessions will be submitted next year.
Tom: Which other session(s) are you most looking forward to besides your own?
Jean-François: I will definitely attend the Facebook session about Semi-Synchronous Replication: it is very close to my interests, especially as Booking.com is thinking about using loss-less semi-sync replication in the future, and I look forward to hearing war stories about this feature. All sessions dissecting internals of a technology (InnoDB, TokuDB, RocksDB, …) will also have my attention. Finally, it is always interesting to hear about how large companies are using databases, so I plan to attend the MySQL@Wikimedia session.
Tom: As a resident of Amsterdam, what are some of the must-do activities/sightseeing for those visiting for Percona Live from out of town?
Jean-François: Seeing the city from a high point is impressive, and you will have the opportunity to enjoy that view from the Booking.com office at the Community Dinner. Also, I recommend finding a bike and discovering the city by pedaling (there are many rental shops, just ask Google). From the conference venue, you can do a 70-minute ride crossing three nice parks: the Westerpark, the Rembrandtpark and the Vondelpark – https://goo.gl/P13Mc7 – and you can discover the first or third park in a shorter ride (45 minutes). If you feel a little more adventurous, I recommend a 90-minute ride south following the Amstel: once out of Amsterdam, you will have the water on one side at the level of the road, and the fields (Polder) 3 meters below on the other side (https://goo.gl/OPDv5z). This will allow you to see for yourself why this place is called the “Low Countries”.
The post Booking dot yeah! Booking.com’s Jean-François Gagné on Percona Live Amsterdam appeared first on MySQL Performance Blog.
There can be a lot of confusion, and a lack of planning, in Percona XtraDB Cluster deployments with regard to nodes becoming desynchronized for various reasons. This can happen in a few ways:
When I say “desynchronized” I mean a node that is permitted to build up a potentially large wsrep_local_recv_queue while some operation is happening. For example, a node taking a backup would set wsrep_desync=ON during the backup and potentially fall behind in replication by some amount.
Some of these operations may completely block Galera from applying transactions, while others may simply increase load on the server enough that it falls behind and applies at a reduced rate.
In all the cases above, flow control is NOT used while the node cannot apply transactions, but it MAY be used while the node is recovering from the operation. For an example of this, see my last blog about IST.
If a cluster is fairly busy, then the flow control that CAN happen when the above operations catch up MAY be detrimental to performance.
Example setup
Let us take my typical 3-node cluster with workload on node1. We are taking a blocking backup of some kind on node3, so we are executing the following steps:
This includes up through step 3 above. My node1 is unaffected by the backup on node3; I can see it averaging 5-6k writesets (transactions) per second, which it did before we began:
node2 is also unaffected:
but node3 is not applying and its queue is building up:
Let’s examine briefly what happens when node3 is permitted to start applying, but wsrep_desync stays enabled:
node1’s performance is pretty much the same, and node3 is not using flow control yet. However, there is a problem:
It’s hard to notice, but node3 is NOT catching up; instead it is falling further behind! We have potentially created a situation where node3 may never catch up.
The PXC nodes were close enough to the red-line of performance that node3 can only apply just about as fast (and somewhat slower until it heats up a bit) as new transactions are coming into node1.
This represents a serious concern in PXC capacity planning:
Nodes do not only need to be fast enough to handle normal workload, but also to catch up after maintenance operations or failures cause them to fall behind.
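To make that planning rule concrete, here is a minimal back-of-the-envelope sketch. It assumes a constant apply rate and a constant incoming rate, which a real cluster will not have, and the function name and all numbers are hypothetical, chosen only for illustration.

```python
import math


def catch_up_seconds(backlog_writesets, apply_rate, incoming_rate):
    """Estimate how long a lagging node needs to drain its receive queue.

    Rates are in writesets per second. Returns math.inf when the node
    cannot apply faster than new writesets arrive, i.e. it never catches up.
    """
    headroom = apply_rate - incoming_rate
    if headroom <= 0:
        return math.inf
    return backlog_writesets / headroom


# Hypothetical numbers: a 16 million writeset backlog, 5,500 writesets/s
# arriving, and a node that can apply 6,500 writesets/s.
hours = catch_up_seconds(16_000_000, 6_500, 5_500) / 3600
print(f"estimated catch-up time: {hours:.1f} hours")

# With no headroom at all, the estimate is infinite.
print(catch_up_seconds(16_000_000, 5_500, 5_500))
```

The point of the sketch is the denominator: it is the headroom between apply rate and incoming rate, not the raw apply rate, that decides how long recovery takes.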
Experienced MySQL DBAs will realize this isn’t all that different from Master/Slave replication.
Flow Control as a way to recovery
So here’s the trick: if we turn off wsrep_desync on node3 now, node3 will use flow control if and only if the incoming replication exceeds node3’s apply rate. This gives node3 a good chance of catching up, but the tradeoff is reducing write throughput of the cluster. Let’s see what this looks like in context with all of our steps. wsrep_desync is turned off at the peak of the replication queue size on node3, around 12:20PM:
So at the moment node3 starts utilizing flow control to prevent falling further behind, our write throughput (in this specific environment and workload) is reduced by approximately 1/3rd (YMMV). The cluster will remain in this state until node3 catches up and returns to the ‘Synced’ state. This catchup is still happening as I write this post, almost 4 hours after it started and will likely take another hour or two to complete.
I can see a more realtime representation of this by using myq_status on node1, summarizing every minute:
[root@node1 ~]# myq_status -i 1m wsrep
mycluster / node1 (idx: 1) / Galera 3.11(ra0189ab)
         Cluster  Node       Outbound      Inbound       FlowC     Conflct Gcache    Appl
    time P cnf  # stat laten msgs data que msgs data que pause snt lcf bfa   ist idx %ef
19:58:47 P   5  3 Sync 0.9ms 3128 2.0M   0   27 213b   0 25.4s   0   0   0 3003k 16k 62%
19:59:47 P   5  3 Sync 1.1ms 3200 2.1M   0   31 248b   0 18.8s   0   0   0 3003k 16k 62%
20:00:47 P   5  3 Sync 0.9ms 3378 2.2M  32   27 217b   0 26.0s   0   0   0 3003k 16k 62%
20:01:47 P   5  3 Sync 0.9ms 3662 2.4M  32   33 266b   0 18.9s   0   0   0 3003k 16k 62%
20:02:47 P   5  3 Sync 0.9ms 3340 2.2M  32   27 215b   0 27.2s   0   0   0 3003k 16k 62%
20:03:47 P   5  3 Sync 0.9ms 3193 2.1M   0   27 215b   0 25.6s   0   0   0 3003k 16k 62%
20:04:47 P   5  3 Sync 0.9ms 3009 1.9M  12   28 224b   0 22.8s   0   0   0 3003k 16k 62%
20:05:47 P   5  3 Sync 0.9ms 3437 2.2M   0   27 218b   0 23.9s   0   0   0 3003k 16k 62%
20:06:47 P   5  3 Sync 0.9ms 3319 2.1M   7   28 220b   0 24.2s   0   0   0 3003k 16k 62%
20:07:47 P   5  3 Sync 1.0ms 3388 2.2M  16   31 251b   0 22.6s   0   0   0 3003k 16k 62%
20:08:47 P   5  3 Sync 1.1ms 3695 2.4M  19   39 312b   0 13.9s   0   0   0 3003k 16k 62%
20:09:47 P   5  3 Sync 0.9ms 3293 2.1M   0   26 211b   0 26.2s   0   0   0 3003k 16k 62%
This reports around 20-25 seconds of flow control every minute, which is consistent with that ~1/3rd of performance reduction we see in the graphs above.
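As a quick sanity check on that estimate, the pause values from the node1 output above can be averaged. This is only a rough sketch: it assumes each “FlowC pause” value is the number of seconds writes were paused within its 60-second sample, and the function name is my own.

```python
# Per-minute flow control pause times (seconds), copied from the
# "FlowC pause" column of the node1 myq_status output above.
pause_seconds = [25.4, 18.8, 26.0, 18.9, 27.2, 25.6,
                 22.8, 23.9, 24.2, 22.6, 13.9, 26.2]

# Length of each sample, matching "myq_status -i 1m".
INTERVAL_SECONDS = 60.0


def paused_fraction(pauses, interval):
    """Return the share of wall-clock time spent paused by flow control."""
    return sum(pauses) / (len(pauses) * interval)


fraction = paused_fraction(pause_seconds, INTERVAL_SECONDS)
print(f"average pause per minute: {fraction * INTERVAL_SECONDS:.1f}s")
print(f"share of time paused: {fraction:.0%}")
```

Over these twelve samples the paused share comes out at roughly 38%, which is in the same ballpark as the ~1/3rd drop in write throughput.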
Watching node3 the same way proves it is sending the flow control (FlowC snt):
mycluster / node3 (idx: 2) / Galera 3.11(ra0189ab)
         Cluster  Node       Outbound      Inbound       FlowC     Conflct Gcache    Appl
    time P cnf  # stat laten msgs data que msgs data que pause snt lcf bfa   ist idx %ef
17:38:09 P   5  3 Dono 0.8ms    0   0b   0 4434 2.8M 16m 25.2s  31   0   0 18634 16k 80%
17:39:09 P   5  3 Dono 1.3ms    0   0b   1 5040 3.2M 16m 22.1s  29   0   0 37497 16k 80%
17:40:09 P   5  3 Dono 1.4ms    0   0b   0 4506 2.9M 16m 21.0s  31   0   0 16674 16k 80%
17:41:09 P   5  3 Dono 0.9ms    0   0b   0 5274 3.4M 16m 16.4s  27   0   0 22134 16k 80%
17:42:09 P   5  3 Dono 0.9ms    0   0b   0 4826 3.1M 16m 19.8s  26   0   0 16386 16k 80%
17:43:09 P   5  3 Jned 0.9ms    0   0b   0 4957 3.2M 16m 18.7s  28   0   0 83677 16k 80%
17:44:09 P   5  3 Jned 0.9ms    0   0b   0 3693 2.4M 16m 27.2s  30   0   0  131k 16k 80%
17:45:09 P   5  3 Jned 0.9ms    0   0b   0 4151 2.7M 16m 26.3s  34   0   0  185k 16k 80%
17:46:09 P   5  3 Jned 1.5ms    0   0b   0 4420 2.8M 16m 25.0s  30   0   0  245k 16k 80%
17:47:09 P   5  3 Jned 1.3ms    0   0b   1 4806 3.1M 16m 21.0s  27   0   0  310k 16k 80%
There are a lot of flow control messages (around 30) per minute. This is a lot of ON/OFF toggles of flow control where writes are briefly delayed rather than a steady “you can’t write” for 20 seconds straight.
It also interestingly spends a long time in the Donor/Desynced state (even though wsrep_desync was turned OFF hours before) and then moves to the Joined state (this has the same meaning as during an IST).
Does it matter?
As always, it depends.
If these are web requests and suddenly the database can only handle ~66% of the traffic, that’s likely a problem, but maybe it just slows down the website somewhat. I want to emphasize that WRITES are what is affected here. Reads on any and all nodes should be normal (though you probably don’t want to read from node3 since it is so far behind).
If this were some queue processing that had reduced throughput, I’d expect it to possibly catch up later.
This can only be answered for your application, but the takeaways for me are:
Graphs in this post courtesy of VividCortex.
The post High-load clusters and desynchronized nodes on Percona XtraDB Cluster appeared first on MySQL Performance Blog.
Percona is pleased to announce the availability of Percona Toolkit 2.2.15, released August 28, 2015. Percona Toolkit is a collection of advanced command-line tools to perform a variety of MySQL server and system tasks that are too difficult or complex for DBAs to perform manually. Percona Toolkit, like all Percona software, is free and open source.
This release is the current GA (Generally Available) stable release in the 2.2 series. It includes multiple bug fixes as well as continued preparation for MySQL 5.7 compatibility. Full details are below. Downloads are available here and from the Percona Software Repositories.
Say hello to David Murphy, lead DBA and MongoDB Master at ObjectRocket (a Rackspace company). David works on sharding, tool building, very large-scale issues and high-performance MongoDB architecture. Prior to ObjectRocket he was a MySQL/NoSQL architect at Electronic Arts. David enjoys large-scale operational tool building and high-performance OS and database tuning. He is also a core code contributor to MongoDB. He’ll be speaking next month at Percona Live Amsterdam, which runs Sept. 21-23. Enter promo code “BlogInterview” at registration to save €20!
Tom: David, your 3-hour tutorial is titled “Mongo Sharding from the trench: A Veterans field guide.” Did your experience in working with vast amounts of data at Rackspace give you a unique perspective that, in your view, now puts you in a position to help people just getting started? Can you give a couple of examples?
David: I think this is something I grew into organically, from the days of supporting cPanel-type MySQL instances to today. I have worked in a few verticals, from hosting to advertising to gaming, before finally entering the platform service space. The others give me a host of knowledge about how customers need systems to work, and the number and range of workloads we see at Rackspace reinforces this.
Many times the unique perspective comes with scale, such as someone scaling up a single node to the multi-terabyte range. When they go to “shard”, they can find that a process that is normally very light and unnoticeable in most Mongo sharding can severely lock the metadata for an extended time. In other cases, the “balancer” might not be able to keep up with the amount of work being asked of it.
Toward the smaller end of the spectrum, having seen so many workloads from big to small, I can see similar thought processes and trends. When this happens, having worked with so many of these workloads, and honestly having learned along with the evolution of Mongo, helps me explain to clients the good, the bad, and the hairy. Many times discussions come down to people not using connection pooling, or using non-indexed sorting or complex operators such as $in, $nin, and more. In these cases, I can talk to people about the balance of using these concepts and when they will become bigger issues for them. My goal is to give them enough knowledge to help determine when it is correct to use development resources to fix an issue, and when it’s manageable and that development time could be better spent elsewhere.
Tom: The title of your tutorial also sounds like the perfect title of a book. Do you have any plans for one?
David: What an excellent question! I have thought about this, and I think a book is the goal, if I can find the time to do it. A working title might be “Mongo from the trenches: Surviving the minefield to get ahead”. I think the book might be broken into three sections: “When should you use or not use Mongo”, “Schema and Operators in the NoSQL world”, and “Sharding”. I would do it this way because each could be a great mini book on its own, and the community really could use a level of depth similar to the MySQL 5.0 certification guides. I liked those books because they helped you understand all the bits of what to consider in your schema design and how it affects the application as much as the database hosts. Then the second half, which was more administration-geared, took those same schema and design choices and helped you manage them with confidence.
In the end, Mongo is a good product that works well for most people. As it matures, we need more and more discussion on topics such as what you should monitor, how you should predict issues, and how valuable regular audits are, especially in an ecosystem where it’s easy to spin something up, launch it, and move on to the next project.
Tom: When and why would you recommend using MongoDB instead of MySQL?
David: I am glad I mentioned this is worthy of a book already, as it is such a complex topic and one that gets me very excited.
I feel there is a bit of misinformation on both sides of this field. Many experts in the MySQL camp know that when someone says they can’t get more than 1000 TPS out of MySQL, 9 times out of 10 it is a design issue, not a technology issue. The Mongo crowd loves this, and due to the inherent sharding nature of Mongo they can sidestep these types of issues. Conversely, in the Mongo camp you will hear how bad the SQL standard is; however, omitting transactions for a moment, the same types of operations exist in MySQL and Mongo. There are some interesting powers in the Mongo aggregation. However, SQL is more powerful and just as complex as some map reduce jobs and aggregations I have written.
Another area is simply looking at the history of Mongo and MySQL. Until WiredTiger and RocksDB, Mongo was very similar to MyISAM from a locking behavior and support perspective. With the advent of the new storage engines, we will see major leaps forward in the types of flows you will want in Mongo. With the writer lock issue gone, locking in the two systems is becoming more and more similar, making it much harder to decide which to use.
The two are not identical, however. Subdocument and array support in Mongo is amazing; there are so many things I can do in Mongo that I could not do even with bitwise SET/ENUM operators. So if you need that type of system, or you want to create a semi-denormalized form of a view in the database, Mongo can do this with ease and on the fly. MySQL, on the other hand, would take careful planning and need whole tables updated. In this regard I feel more people could use Mongo and its ability to have a versioned document schema, allowing more incremental changes to documents. With new code releases, the application can read old versions and “upgrade” them to the latest form, removing a whole flurry of maintenance-related pains that RDBMSs have, to the frustration of developers who just want to launch the new product.
The last thing I would want to say here is that you need not choose; why not use both? Mongo can be very powerful for keeping a semi-denormalized version of the data that is nimble enough to allow fast application or system updates and features, leaving MySQL for very specific workloads that need the precision, are simple, or are not expected to have schema changes. I am a huge fan of keeping the transactional portions in MySQL and the rest in Mongo. That allows you to scale the bulk of your data needs up and down quickly, and to more slowly change the parts that need to be 100% consistent all of the time, with no room for eventual consistency.
Tom: Which other session(s) are you most looking forward to besides your own at Percona Live Amsterdam?
David: There are a few that are near and dear to me.
“Turtles all the way down: tuning Linux for database workloads” looks like a great one. It reflects a view I have always had: DBAs should be DBAs, SysAdmins, and storage people rolled into one. That way they can understand the impact of the application all the way down to the blocks the database reads.
“TokuDB internals” is another one. I have used TokuDB in MySQL and Mongo to some degree, but it has never had in-depth documentation. A talk like that is a great way to fill any gaps for experienced and new people alike.
“Database Reliability Engineering” looks like a great talk from a great speaker.
As an InnoDB geek, I like the idea around “Understanding InnoDB locks: case studies.”
I see a huge amount of potential for MaxScale. If anyone else is curious, “Anatomy of a Proxy Server: MaxScale Internals” should be good for R/W splits and split-writing type cases.
Finally, one of my favorite people is Charity as she always is so energetic and can get to the heart of the matter. If you are not going to “Upgrade your database: without losing your data, your perf or your mind” you are missing out!
Tom: Thanks for speaking with me, David! Is there anything else you’d like to add: either about Rackspace or Percona Live Amsterdam?
David: With regard to Rackspace, I urge everyone to check out the Data Services group. We handle everything from Redis to Hadoop, with a goal of augmenting your groups or providing experts to help keep your uptime as high as possible. With options ranging from dedicated hosts to platform-type services, there is something that helps everyone. Rackspace is not just a cloud company but a real support company that provides amazing hardware to use, or support for hardware in other locations, which is growing rapidly.
As for Percona Live Amsterdam, everyone should come: the group of speakers is simply amazing. I for one am excited by so many topics because they are all so compelling. Outside of that, you will find it hard to find another gathering of database experts with multiple technologies under their belts who truly believe in the move to picking the right technology for the right use case.
The post ObjectRocket’s David Murphy talks about MongoDB, Percona Live Amsterdam appeared first on MySQL Performance Blog.
Thank you for attending my July 22 webinar titled “Advanced Query Tuning in MySQL 5.6 and 5.7” (my slides and a replay are available here). As promised, here is the list of questions and my answers (thank you for your great questions).
Q: Here is the explain example:
mysql> explain extended select id, site_id from test_index_id where site_id=1
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: test_index_id
         type: ref
possible_keys: key_site_id
          key: key_site_id
      key_len: 5
          ref: const
         rows: 1
     filtered: 100.00
        Extra: Using where; Using index
Why is site_id a covered index for the query, given the fact that a) we are selecting “id”, b) key_site_id only contains site_id?
As the table is InnoDB, all secondary keys will always contain the primary key (“id”); in this case the secondary index contains all the information needed to satisfy the above query, so key_site_id will be a “covered index”.
Q: Applications change over time. Do you suggest doing a periodic analysis of indexes that are being used and drop the ones that are not? If yes, any suggestions as to tackle that?
Yes, that is a good idea. Usually it can be done easily with Percona Toolkit or performance_schema in MySQL 5.6.
Q: If a duplicate index is found on 5.6/5.7, will that cause a performance impact on the database while querying?
Duplicate keys can have a negative impact on selects:
Q: What is the suggested method to measure performance on queries (other than the slow query log) so as to know where to create indexes?
Q: I’m not sure if this was covered in the webinar but… are there any best-practices for fulltext indexes?
That was not covered in this webinar, however, I’ve done a number of presentations regarding Full Text Indexes. For example: Creating Geo Enabled Applications with MySQL 5.6
Q: What would be the limit on index size or number of indexes you can define per table?
There are no limits on index size on disk; however, it is good (performance-wise) to have active indexes fit in RAM.
In InnoDB there are a number of index limitations, e.g. a table can contain a maximum of 64 secondary indexes.
Q: If a table has two columns you would like to sum, can you have that sum indexed as a calculated index? To add to that, can that calculated index have “case when”?
Just to clarify, this is only a feature of MySQL 5.7 (not released yet).
Yes, it is documented now:
CREATE TABLE triangle (
  sidea DOUBLE,
  sideb DOUBLE,
  sidec DOUBLE AS (SQRT(sidea * sidea + sideb * sideb))
);
Q: I have noticed that you created indexes on columns like DayOfTheWeek with very low cardinality. Shouldn’t that be a bad practice normally?
Yes, you are right! Unless you are doing queries like “select count(*) from … where DayOfTheWeek = 7”, those indexes may not be very useful.
Q: I saw an article saying that if you don’t specify a primary key upfront, MySQL / InnoDB creates one in the background (hidden). Is it different from a primary key itself if most of the fields used in the WHERE clause are not in the primary / semi-primary key? And is there a way to identify the tables with the hidden primary key indexes?
The “hidden” primary key will be 6 bytes, which will also be appended (duplicated) to all secondary keys. You can create an auto_increment INT primary key instead, which will be smaller (if you do not plan to store more than 4 billion rows). In addition, you will not be able to use the hidden primary key in your queries.
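To get a feel for the size difference, here is a rough sketch of the arithmetic. It counts only the primary key bytes copied into each secondary index entry and ignores record headers, page fill factor and everything else, so treat the result as an order-of-magnitude estimate. The row and index counts are hypothetical, and the function name is my own.

```python
HIDDEN_ROW_ID_BYTES = 6           # size of the hidden key, per the answer above
INT_BYTES = 4                     # size of a MySQL INT column
INT_UNSIGNED_MAX = 4_294_967_295  # the "4 billion rows" ceiling for INT UNSIGNED


def pk_bytes_in_secondary_indexes(rows, secondary_indexes, pk_bytes):
    """Bytes used by primary key copies across all secondary index entries."""
    return rows * secondary_indexes * pk_bytes


# Hypothetical table: 100 million rows and 5 secondary indexes.
rows, secondary_indexes = 100_000_000, 5
assert rows <= INT_UNSIGNED_MAX, "an INT UNSIGNED key would overflow"

hidden = pk_bytes_in_secondary_indexes(rows, secondary_indexes, HIDDEN_ROW_ID_BYTES)
explicit = pk_bytes_in_secondary_indexes(rows, secondary_indexes, INT_BYTES)
saved = hidden - explicit
print(f"hidden 6-byte key: {hidden / 1e9:.1f} GB in secondary indexes")
print(f"explicit INT key:  {explicit / 1e9:.1f} GB in secondary indexes")
print(f"difference:        {saved / 1e9:.1f} GB")
```

The saving scales with both the row count and the number of secondary indexes, which is why it matters most on wide, heavily indexed tables.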
The following query (against information_schema) can be used to find all tables without declared primary key (with “hidden” primary key):
SELECT tables.table_schema, tables.table_name, tables.table_rows
FROM information_schema.tables
LEFT JOIN (
  SELECT table_schema, table_name
  FROM information_schema.statistics
  GROUP BY table_schema, table_name, index_name
  HAVING SUM(
    CASE WHEN non_unique = 0 AND nullable != 'YES' THEN 1 ELSE 0 END
  ) = COUNT(*)
) puks
  ON tables.table_schema = puks.table_schema
  AND tables.table_name = puks.table_name
WHERE puks.table_name IS NULL
  AND tables.table_type = 'BASE TABLE'
  AND engine='InnoDB'
You may also use mysql.innodb_index_stats table to find rows with the hidden primary key:
Example:
mysql> select * from mysql.innodb_index_stats;
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name      | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| test          | t1         | GEN_CLUST_INDEX | 2015-08-08 20:48:23 | n_diff_pfx01 |         96 |           1 | DB_ROW_ID                         |
| test          | t1         | GEN_CLUST_INDEX | 2015-08-08 20:48:23 | n_leaf_pages |          1 |        NULL | Number of leaf pages in the index |
| test          | t1         | GEN_CLUST_INDEX | 2015-08-08 20:48:23 | size         |          1 |        NULL | Number of pages in the index      |
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
Q: You are using ALTER TABLE to create the index, but how does MySQL sort the data when creating the index? Doesn’t it use a temp table for that?
That is a very good question: the behavior of the “alter table … add index” has changed over time. As documented in Overview of Online DDL:
Historically, many DDL operations on InnoDB tables were expensive. Many ALTER TABLE operations worked by creating a new, empty table defined with the requested table options and indexes, then copying the existing rows to the new table one-by-one, updating the indexes as the rows were inserted. After all rows from the original table were copied, the old table was dropped and the copy was renamed with the name of the original table.
MySQL 5.5, and MySQL 5.1 with the InnoDB Plugin, optimized CREATE INDEX and DROP INDEX to avoid the table-copying behavior. That feature was known as Fast Index Creation
When MySQL uses the “Fast Index Creation” operation, it will create a set of temporary files in MySQL’s tmpdir:
To add a secondary index to an existing table, InnoDB scans the table, and sorts the rows using memory buffers and temporary files in order by the values of the secondary index key columns. The B-tree is then built in key-value order, which is more efficient than inserting rows into an index in random order.
Q: How good is InnoDB deadlock handling in 5.7 compared to 5.6? Is that based on parameter setup?
A discussion of InnoDB deadlocks is outside the scope of this presentation. Valerii Kravchuk and Nilnandan Joshi gave an excellent talk at Percona Live 2015 (slides available): Understanding Innodb Locks and Deadlocks
Q: What is the performance impact of generating a virtual column for a table with 66 million records and then generating the index? And how would you go about it? Do you have any suggestions on how to reorganize indexes on the physical disk?
As MySQL 5.7 is not released yet, the behavior of virtual columns may change. The main question here is whether it will be an online operation to a) add a virtual column (as this is only a metadata change, it should be a very light operation anyway) and b) add an index on that virtual column. In the labs release it was not online; however, this can change.
Thank you again for attending.
The post Advanced Query Tuning in MySQL 5.6 and MySQL 5.7 Webinar: Q&A appeared first on MySQL Performance Blog.