Latest MySQL Performance Blog posts
Founded in 2004 to help people find great local businesses, Yelp has some 135 million monthly unique visitors. With those traffic volumes Yelp’s 300+ engineers are constantly working to keep things moving smoothly – and when you move that fast you learn many things.
Fortunately for the global MySQL community, three Yelp DBAs will be sharing what they’ve learned at the annual Percona Live MySQL Conference and Expo this April 13-16 in Santa Clara, California.
Say “hello” to Susanne Lehmann, Jenni Snyder and Josh Snyder! I chatted with them over email about their presentations, on how MySQL is used at Yelp, and about the shortage of women in MySQL.
Tom: Jenni, you and Josh will be co-presenting “Next generation monitoring: moving beyond Nagios ” on April 14.
You mentioned that Yelp’s databases scale dynamically, and so does your monitoring of those databases. And to minimize human intervention, you’ve created a Puppet and Sensu monitoring ensemble… because “if it’s not monitored, it’s not in production.” Talk to me more about Yelp’s philosophy of “opt-out monitoring.” What does that entail? How does that help Yelp?
Jenni: Before we moved to Sensu, our Nagios dashboards were a sea of red, muted, acknowledged, or disabled service checks. In fact, we even had a cluster check to make sure that we never accidentally put a host into use that was muted or marked for downtime. It was possible for a well-meaning operator to acknowledge checks on a host and forget about it, and I certainly perpetrated a couple of instances of disks filling up after acknowledging a 3am “warning” page that I’d rather forget about. With Sensu, hosts and services come out of the downtime/acknowledgement state automatically after a number of days, ensuring that we’re kept honest and stay on top of issues that need to be addressed.
Also, monitoring is deployed with a node, not separate monitoring configuration. Outside of a grace period we employ when a host is first provisioned or rebooted, if a host is up, it’s being monitored and alerting. Also, alerting doesn’t always mean paging. We also use IRC and file tickets directly into our tracking system when we don’t need eyes on a problem right away.
Tom: Susanne, in your presentation, titled “insert cassandra into prod where use_case=?;” you’ll discuss the situations you’ve encountered where MySQL just wasn’t the right tool for the job.
What led up to that discovery and how did you come up with finding the right tools (and what were they) to run alongside and support MySQL?
Susanne: Our main force behind exploring other datastores alongside MySQL was that Yelp is growing outside the US market a lot. Therefore we wanted the data to be nearer to the customer and needed multi-master writes.
Also, we saw use cases where our application data was organized very key-value like and not relational, which made them a better fit for a NoSQL solution.
We decided to use Cassandra as a datastore and I plan to go more into detail why during my talk. Now we offer developers more choices on how to store our application data, but we also believe in the “right tool for the job” philosophy and might add more solutions to the mix in the future.
Tom: Jenni, you’ll also be presenting “Schema changes multiple times a day? OK!” I know that you and your fellow MySQL DBAs are always improving and also finding better ways of supporting new and existing features for Yelp users like me. Delivering on such a scale must entail some unique processes and tools. Does this involve a particular mindset among your fellow DBAs? Also, what are some of those key tools – and processes and how are they used?
Jenni: Yelp prizes the productivity of our developers and our ability to iterate and develop new features quickly. In order to do that, we need to be able to not only create new database tables, but also modify existing ones, many of which are larger than MySQL can alter without causing considerable replication delay. The first step is to foster a culture of automated testing, monitoring, code reviews, and partnership between developers and DBAs to ensure that we can quickly & safely roll out schema changes. In my talk, I’ll be describing tools that we’ve talked about before, like our Gross Query Checker, as well as the way the DBA team works with developers while still getting the rest of our work done. The second, easy part is using a tool like pt-online-schema-change to run schema changes online without causing replication delay or degrading performance
Tom: Josh, you’ll also be speaking on “Bootstrapping databases in a single command: elastic provisioning for the win.” What is “elastic provisioning” and how are you using it for Yelp’s tooling?
Josh: When I say that we use elastic provisioning, I mean that we can reliably and consistently build a database server from scratch, with minimal human involvement. The goal is to encompass every aspect of the provisioning task, including configuration, monitoring, and even load balancing, in a single thoroughly automated process. With this process in place, we’ve found ourselves able to quickly allocate and reallocate resources, both in our datacenters and in the cloud. Our tools for implementing the above goals give us greater confidence in our infrastructure, while avoiding single-points of failure and achieving the maximum possible level of performance. We had a lot of fun building this system, and we think that many of the components involved are relevant to others in the field.
Tom: Susanne and Jenni, last year at Percona Live there was a BoF session titled “MySQL and Women (or where are all the women?).” The idea was to discuss why there are “just not enough women working on the technology side of tech.” In a nutshell, the conversation focused on why there are not more women in MySQL and why so relatively few attend MySQL conferences like Percona Live.
The relative scarcity of women in technical roles was also the subject of an article published in the August 2014 issue of Forbes, citing a recent industry report.
Why, in your (respective) views, do you (or don’t) think that there are so few women in MySQL? And how can this trend be reversed?
Susanne: I think there are few women in MySQL and the reasons are manifold. Of course there is the pipeline problem. Then there is the problem, widely discussed right now, that women who are entering STEM jobs are less likely staying in there. These are reasons not specific for MySQL jobs, but rather for STEM in general. What is more specific for database/MySQL jobs is, in my opinion, that often times DBAs need to be on call, they need to stay in the office if things go sideways. Database problems tend often to be problems that can’t wait till the next morning. That makes it more demanding when you have a family for example (which is true for men as well of course, but seems still to be more of a problem for women).
As for how to reverse the trend, I liked this Guardian article because it covers a lot of important points. There is no easy solution.
I like that more industry leaders and technology companies are discussing what they can do to improve diversity these days. In general, it really helps to have a great professional (female) support system. At Yelp, we have AWE, the Awesome Women in Engineering group, in which Jenni and I are both active. We participate in welcoming women to Yelp engineering, speaking at external events and workshops to help other women present their work, mentoring, and a book club.
Jenni: I’m sorry that I missed Percona Live and this BoF last year; I was out on maternity leave. I believe that tech/startup culture is a huge reason that fewer women are entering and staying these days, but a quick web search will lead you to any number of articles debating the subject. I run into quite a few women working with MySQL; it’s large, open community and generally collaborative and supportive nature is very welcoming. As the article you linked to suggests, MySQL has a broad audience. It’s easy to get started with and pull into any project, and as a result, most software professionals have worked with it at some time or another.
On another note, I’m happy to see that Percona Live has a Code of Conduct. I hope that Percona and/or MySQL will consider adopting a Community Code of Conduct like Python, Puppet, and Ubuntu. Doing so raises the bar for all participants, without hampering collaboration and creativity!
* * *
Thanks very much, Susanne, Jenni and Josh! I look forward to seeing you next month at the conference. And readers, if you’d like to attend Percona Live, use the promo code Yelp15 for 15% off! Just enter that during registration. If you’re already attending, be sure to tweet about your favorite sessions using the hashtag #PerconaLive. And if you need to find a great place to eat while attending Percona Live, click here for excellent Yelp recommendations.
The post Yelp IT! A talk with 3 Yelp MySQL DBAs on Percona Live & more appeared first on MySQL Performance Blog.
Percona Toolkit is one of our most mature open source applications. Derived from Maatkit and Aspersa, Percona Toolkit has evolved significantly over the years. The software now contains 32 tools, over 4,000 tests, and has been downloaded over 250,000 times. Anyone who manages a database – from DBAs to system administrators to even software developers – benefits from Percona Toolkit’s ability to perform a variety of MySQL server and system tasks that are too difficult or complex to perform manually.
We continue to make Percona Toolkit better each month. Over the last 9 months alone Percona has had 6 releases and resolved nearly 50 issues.
While Percona team members in Support, Consulting, and Managed Services are big drivers of identifying bugs and new features (driven mostly by Percona customer needs), the community of Percona Toolkit users plays a significant role in making the open source software what it is today.
We’d like to learn how we can make Percona Toolkit even better for your needs. Please take a brief survey so we can learn how you actually use the software. As a thank you for taking the survey, we are randomly giving away five $50 Amazon.com gift cards to participants. It’s a small token but one that we hope you’ll appreciate.
Recent additions to Percona Toolkit have included better Percona XtraDB Cluster support as well as multiple fixes and improvements to pt-online-schema-change, pt-kill, pt-query-digest, pt-stalk, and preparation for the MySQL 5.7 GA. Help us continue to improve Percona Toolkit by taking part in our survey. If you use Percona Toolkit and are attending Percona Live next month, please keep a look out for me. I’d like to hear about your experiences.
The post Calling all MySQL DBAs: How do you use Percona Toolkit? appeared first on MySQL Performance Blog.
MongoDB 3.0 was recently released. Instead of focusing on what’s new – that is so easy to find, let’s rather talk about something that has not changed a lot since the early MongoDB days. This topic is sharding and most specifically: how to choose a good sharding key. Note that most of the discussion will also apply to MySQL, so if you are more interested in sharding than in MongoDB, it could still be worth reading.When do you want to shard?
In general sharding is recommended with MongoDB as soon as any of these conditions is met:
- #1: A single server can no longer handle the write workload.
- #2: The working set no longer fits in memory.
- #3: The dataset is too large to easily fit in a single server.
Note that #1 and #2 are by far the most common reason why people need sharding. Also note that in the MySQL world, #2 does not imply that you need sharding.What are the properties of a good sharding key?
The starting point is that a cross-shard query is very expensive in a sharded environment. It is easy to understand why: the query has to be executed independently on several shards, and then results have to be merged.
With MongoDB, mongos will transparently route queries to the right shards and will automatically merge the results: this is very handy but the hidden complexity can also make you forget that you have executed a very poorly optimized query.
This is where the choice of a sharding key is so critical: choose the right key and most queries will be simple and efficient, choose a wrong one and you’ll have ugly and slow queries.
Actually a good sharding does not need to have tens of properties, but only two:
- Insertions should be as much balanced as possible across all shards.
- Each query should be able to be executed by retrieving data from as little shards as possible (ideally 1 shard).
Sounds quite easy, right? However depending on your use case, it may be quite difficult to find a good sharding key. Let’s look at a few examples.Social Network
Say we have users who can be connected to other users, who can read or write posts and who have their own wall.
All 3 collections can become very large, so sharding all will be necessary over time.
For the user and wall collection the user_id field is an obvious choice and it actually meets both criteria for a good sharding key.
For the post collection, user_id also looks like an obvious choice, but if you think about how the collection is accessed for reads, you will realize that you will need to fetch it using its post_id, not its user_id (simply because a user can have multiple posts). If you shard by user_id, any read to a single post will be broadcast to all shards: this is clearly not a good option.
So using post_id is a better choice. However it only meets criteria #2: most posts are never updated, so all the writes are insertions that will go to a single shard. However the traffic on the post collection is strongly in favor of reads, so being able to speed up reads while not slowing down writes is probably an acceptable tradeoff.Access Logs
The workload here is very specific: write-intensive and append-only.
Sharding by ObjectId is definitely a bad idea: while data can be easily spread across all shards, all writes will only go to one shard, so you will have no benefit compared to a non-sharded setup when it comes to scale the writes.
A better solution is to use a hash of the ObjectId: that way data AND writes will be spread across all shards.
Another good option would be to use another field in your documents that you know is evenly distributed across the whole dataset. Such field may not exist though, that’s why hashing the ObjectId is a more generic solution.Ecommerce
MongoDB can be a good option to store a product catalog: being schemaless, it can easily store products with very different attributes.
To be usable such a catalog must be searchable. This means that many indexes need to be added, and the working set will probably grow very quickly. In this case your main concern is probably not to scale the writes, but to make reads as efficient as possible.
Sharding can be an option because if done properly, each shard will act as a coarse-grained index. Now the issue is to find which field(s) will evenly distribute the dataset. Most likely a single field will not be enough, you will have to use a compound sharding key.
Here I would say that there is no generic solution, but if the products are for instance targeted at either kid, woman or man and if you have several categories of products, a potential sharding key would be (target, category, sku).
Note that in this case, reading from secondaries may be enough to ensure good scalability.Conclusion
As you can see, choosing a correct sharding key is not always easy: do not assume that because some apps are sharded by some fields, you should do the same. You may need a totally different strategy or sharding may even not be the right solution for your issues.
If you are interested in learning more about MongoDB, I will be presenting a free webinar on March 25 at 10 a.m. Pacific time. It will be an introduction to MongoDB for MySQL DBAs and developers. Register here if you are interested.
The post Choosing a good sharding key in MongoDB (and MySQL) appeared first on MySQL Performance Blog.
For the last year or so I have been involved with OpenStack and more specifically the Trove (DBaaS) project as sort of an ambassador for Percona, contributing bits of knowledge, help and debugging wherever I could and thought I would share some of my experience with others that wanted to get involved with OpenStack development, documentation, testing, etc. Getting started with OpenStack contributions is also the idea behind my talk next month at Percona OpenStack Live 2015. (Percona Live attendees have access to OpenStack Live)
Back at the last OpenStack Conference and Design Summit in Paris last November, I had the amazing opportunity to attend the two-day OpenStack Upstream Training hosted by Stefano Maffulli, Loic Dachary and several other very kind and generous folks. If you ever find yourself in a position to attend one of these training sessions, I strongly suggest that you take advantage of the opportunity, you will not be disappointed.
Using some of the material from the OpenStack Foundation and a little personal experience, I’m going to go through some of the basics of what you’ll need to know if you want to contribute. There are several steps but they are mostly painless:
– It all starts with a little bit of legal work such as signing a either an individual or corporate contributor agreement.
– You will need to decide on a project or projects that you want to contribute to. Chances are that you already have one in mind.
– Find the various places where other contributors to that project hang out, usually there is a mailing list and IRC channel. Logon, introduce yourself, make some friends and sit and listen to what they are working on. Find the PTL (Project Team Lead) and remember his/her name. Let him/her know who you are, who you work for, what you are interested in, etc. Sit in on their meetings, ask questions but don’t be a pest. Observe a little etiquette, be polite and humble and you will reap many rewards later on.
– Eventually you will need to find and get the code and install whatever tools are necessary for that project, build it, stand up a test/working environment, play with it and understand what the various moving parts are. Ask more questions, etc.
– Do you think you are ready to do some coding and submit a patch? Talk to the PTL and get a lightweight bug or maybe a documentation task to work on.
– In order to submit a patch you will need to understand the workflow use the OpenStack gerrit review system which takes a little bit of time to understand if you have never used gerrit before. You’ll need to find and install git-review. Here is where making friends above really helps out. In every project there are usually going to be a few folks around with the time and patience to help you work through your first review.
– Find a bit of a mentor to help you with the mechanics in case you run into trouble, could just be the PTL if he/she has the time, make your patch, send it in and work through the review process.
– As with most peer review situations, you’ll need to remember never to take things personally. A negative review comment is not an insult to you and your family! Eventually your patch will either be accepted and merged upstream (yay!) or rejected and possibly abandoned in favor of some alternative (boo!). If rejected, fret not! Talk to the PTL and your new friends to try and understand the reason why if the review comments were unclear and simply try again.
It is that easy!
Come join me on Tuesday, April 14th in Santa Clara, California and we’ll chat about how you can begin contributing to OpenStack.
MySQL QA Episode #1: Bash/GNU Tools & Linux Upskill & Scripting Fun
This episode consists of 13 parts, and an introduction. See videos below
In HD quality (set your player to 720p!)
Part 1: echo, ls, cp, rm, vi, cat, df, du, tee, cd, clear, uname, date, time, cat, mkdir
Part 2: find, wc, sort, shuf, tr, mkdir, man, more
Part 3: Redirection, tee, stdout, stderr, /dev/null, cat
Part 4: Vars, ‘ vs “, $0, $$, $!, screen, chmod, chown, export, set, whoami, sleep, kill, sh, grep, sudo, su, pwd
Part 5: grep, regex (regular expressions), tr
Part 6: sed, regex (regular expressions)
Part 7: awk
Part 8: xargs
Part 9: subshells, shells, sh
Part 10: if, for, while, seq, head, grep & grep -q, sleep, tee, read & more
Part 11: Arrays, lynx, grep, egrep, awk, redirection, variable, printf, while, wget, read
Part 12: Production scripting examples
Part 13: Gnuwin32, Gnuwin32 escaping & path name/binary selection gotcha’s, untar, unzip, gzip for Windows
If you enjoyed these video’s leave us a comment below!
The post MySQL QA Episode 1: Bash/GNU Tools & Linux Upskill & Scripting Fun appeared first on MySQL Performance Blog.
Welcome to the MySQL QA Training Series!
If you have not read our introductory blog post on pquery yet, I’d recommend reading that one first to get a bit of background. The community is enthuastic about pquery, and today I am happy to announce a full training series on pquery and more. Whether you are a Linux or MySQL newbie or a seasoned QA engineer, there is something here for you. From Bash scripting (see episode 1 below), to every aspect of the new pquery framework, it is my hope that you enjoy this series. If you do, please leave us a comment
Database quality assurance is not as straightforward as it may seem. It’s not a matter of point-and-click, but rather of many intertwined tools and scripts. Beyond that, due to the complexity of the underlying product, it’s about having an overall plan or vision on how to adequately test the product in every aspect.
Take for example the SELECT statement; it allows specifying about 30 different clauses or modifiers (GROUP BY, WHERE, ORDER, LIMIT, HAVING, …). Then, think further about what one could do inside these clauses, or inside subselects etc. The number of possible combinations (exhaustive testing) of all commands (and all formats and variations thereof) plus all mysqld options (nearly 500 of them) is for all intents and purposes infinite, and thus seemingly impossible to test.
In Episode 13, an approach is proposed which, in our view, adequately solves this test infinite-possibility coverage problem through the use of random spread coverage testing and sql interleaving.Knowing your Bash/Linux/Gnu scripting well is also an almost definite prerequisite to getting started with mysqld QA. Episode 1 in this series is over 3.5 hours of in-depth (from easy to advanced) training on many often-used Bash commands and topics. From ls to sed, from cp to xargs and from variables to arrays. Enjoy!
Without further ado, here are the planned upcoming episodes:
MySQL QA Episode 1: Bash/GNU Tools & Linux Upskill & Scripting Fun
MySQL QA Episode 2: Build a MySQL Server: Git, Compiling, Build Tools
MySQL QA Episode 3: Debugging: GDB, Backtraces, Frames, Library Dependencies
MySQL QA Episode 4: QA Framework Setup Time! percona-qa, pquery, reducer & more
MySQL QA Episode 5: Preparing Your QA Run: mtr_to_sql.sh and pquery-run.sh
MySQL QA Episode 6: Analyzing & Filtering: pquery-prep-red.sh, -clean-known.sh & pquery-results.sh
MySQL QA Episode 7: Reducing Testcases for Beginners: single-threaded reducer.sh
MySQL QA Episode 8: Reducing Testcases for Engineers: tuning reducer.sh
MySQL QA Episode 9: Reducing Testcases for Experts: multi-threaded reducer.sh
MySQL QA Episode 10: Reproducing and Simplifying: How to get it Right
MySQL QA Episode 11: Valgrind Testing: Pro’s, Con’s, Why and How
MySQL QA Episode 12: Multi-node Cluster Testing Using Docker
MySQL QA Episode 13: A Better Approach to all MySQL Regression, Stress & Feature Testing: Random Coverage Testing & SQL Interleaving
A short introduction on each episode:
As episodes are finished, the series titles above will be linked so it’s easy to check this page for updates.