Say hello to David Murphy, lead DBA and MongoDB Master at ObjectRocket (a Rackspace company). David works on sharding, tool building, very large-scale issues and high-performance MongoDB architecture. Prior to ObjectRocket he was a MySQL/NoSQL architect at Electronic Arts. David enjoys large-scale operational tool building, high performance OS and database tuning. He is also a core code contributor to MongoDB. He’ll be speaking next month at Percona Live Amsterdam, which runs Sept. 21-13. Enter promo code “BlogInterview” at registration to save €20!
Tom: David, your 3-hour tutorial is titled “Mongo Sharding from the trench: A Veterans field guide.” Did your experience in working with vast amounts of data at Rackspace give you a unique perspective, in view, that now puts you into a position to help people just getting started? Can you give a couple examples?
David: I think this has been something organically I grew into from the days of supporting Cpanel type MySQL instances to today. I have worked for a few verticals from hosts to advertising to gaming, finally entering into the platform service. The others give me a host of knowledge around how customer need systems to work, and then the number and range of workloads we see at Rackspace re-enforces this.
Many times the unique perspective comes with the scale such as someone calling up a single node to the multi-terabyte range. When they go to “shard” they can find the process that is normally very light and unnoticeable to most Mongo sharding can severally lock the metadata for an extended time. In other cases, the “balancer” might not be able to keep up with the amount of working being asked of it.
Toward the smaller end of the spectrum, having seen so many workloads from big to small. I can see similar thought processes and trends. When this happens having worked with some many of these workloads, and honestly having learned along the evolution of mongo helps me explain to clients the good, bad, and the hairy. Many times discussions come down to people not using connection pooling, non-indexed sorting, or complex operators such as $in, $nin, and more. In these cases, I can talk to people about the balance of using these concepts and when they will become bigger issues for them. My goal is to give them the enough knowledge to help determine when it is correct to use development resource to fix and issue, and when it’s manageable and that development could be better spent elsewhere.
Tom: The title of your tutorial also sounds like the perfect title of a book. Do you have any for one?
David: What an excellent question! I have thought about this. However, I think the goal of a book if I can find the time to do it. A working title might be “Mongo from the trenches: Surviving the minefield to get ahead”. I think the book might be broken into three sections: “When should you use or not user Mongo”, “Schema and Operatorators in the NoSQL world”, “Sharding”. I would do this as this could be a great mini book on its own the community really could use a level of depth similar to the MySQL 5.0 certification guides. I liked these books as it helped someone understand all the bits of what to consider with your schema design and how it affects the application as much as the database hosts. Then in the second half more administration geared it took those same schema and design choices to help you manage them with confidence.
In the end, Mongo is a good product that works well for most people as it matures we need more and discussion. On topics such as what should you monitor, how you should predict issues, and how valuable are regular audits. Especially in an ecosystem where it’s easy to spin something up, launch it, and move on to the next project.
Tom: When and why would you recommend using MongoDB instead of MySQL?
David: I am glad I mentioned this is worthy of a book already, as it is such a complex topic and one that gets me very excited.
I feel there is a bit or misinformation on both sides of this field. Many in the MySQL camp of experts know when someone says they can’t get more than 1000 TPS via MySQL. 9 out of 10 times and design, not a technology issue, the Mongo crowd love this and due to inherit sharding nature of Mongo they can sidestep these types of issues. Conversely in the Mongo camp you will hear how bad the SQL standard is, however, omitting transactions for a moment, the same types of operations exist in MySQL and Mongo. There are some interesting powers in the Mongo aggregation. However, SQL is more powerful and just as complex as some map reduce jobs and aggregations I have written.
Another area is simply looking at the history of Mongo and MySQL. Mongo until WiredTiger and RocksDB were very similar to MyISAM from a locking behavior and support perspective. With the advent of the new storage system, we will-will see major leaps forward in types of flows you will want in Mongo. With the writer lock issue is gone, and locking between the systems is becoming more and more similar making deciding which much harder.
The news is not all use. However, subdocuments and array support in Mongo is amazing there are so many things I can do in Mongo that even in bitwise SET/ENUM operators I could not do. So if you need that type of system, or you want to create a semi denormalize for of a view in the database. Mongo can do this with ease and on the fly. MySQL, on the other hand, would take careful planning and need whole tables updated. In this regard I feel more people could use Mongo and is ability to have a versioned document schema allowing more incremental changes to documents. With new code releases, allowing the application to read old version and “upgrade” them to the latest form. Removing a whole flurry of maintenance related pains that RDBMs have to the frustration of developers who just want to launch the new product.
The last thing I would want to say here is you need not choose, why not use both. Mongo can be very powerful for keeping a semi denormalized version of the data that is nimble to allow fast application or system updates and features. Leaving MySQL for a very specific workload that need the precision are simple are not expected to have schema changes. I am a huge fan of keeping the transactional portions in MySQL, and the rest in Mongo. Allowing you to scale quickly up and down the build of your data needs, and more slowly change the parts that need to be 100% consistent all of the time with no room for eventual consistency.
Tom: What another session(s) are you most looking forward to besides your own at Percona Live Amsterdam?
David: There are a few that are near and dear to me.
“Turtles all the way down: tuning Linux for database workloads” looks like a great one. It is one view I have always had, and DBA’s should be DBA’s, SysAdmins, and Storage people rolled into one. That way they can understand the impacts of the application down to the blocks the database reads.
“TokuDB internals” is another one. I have used TokuDB in MySQL and Mongo to some degree but as it has never had in-depth documentation. A topic like that is a great way to fill any gaps for experienced and new people alike.
“Database Reliability Engineering” looks like a great talk from a great speaker.
As an InnoDB geek, I like the idea around “Understanding InnoDB locks: case studies.”
I see a huge amount of potential for MaxScale if anyone else is curious, “Anatomy of a Proxy Server: MaxScale Internals” should be good for R/W splits and split writing type cases.
Finally, one of my favorite people is Charity as she always is so energetic and can get to the heart of the matter. If you are not going to “Upgrade your database: without losing your data, your perf or your mind” you are missing out!
Tom: Thanks for speaking with me, David! Is there anything else you’d like to add: either about Rackspace or Percona Live Amsterdam?
David: In regards to Rackspace, I urge everyone to check out the Data Services group. We handle everything from Redis to Hadoop with a goal of augmenting your groups or providing experts to help keep your uptime as high as possible. With options for dedicated hosts to platform type services, there is something that helps everyone. Rackspace is not just a cloud company but a real support company that provides amazing hardware to use, or support for other hardware location that is growing rapidly.
With Percona Amsterdam, everyone should come the group of speakers is simply amazing, I for one am excited by so many topics because they are all so compelling. Outside of that you will it hard find another a gathering of database experts with multiple technologies under their belt and who truly believe in the move to picking the right technology for the right use case.