Search Results for: data set

MySQL & OpenStack: How to overcome issues as your dataset grows

MySQL is the database of choice for most OpenStack components (Ceilometer is a notable exception). If you start with a small deployment, it will probably run like a charm. But as soon as the dataset grows, you will suddenly face several challenges. We will write a series of blog posts explaining the issues you may […]

Sample datasets for benchmarking and testing

Sometimes you just need some data to test and stress things. But randomly generated data is awful — it doesn’t have realistic distributions, and it isn’t easy to understand whether your results are meaningful and correct. Real or quasi-real data is best. Whether you’re looking for a couple of megabytes or many terabytes, the following […]

Storing time-series data with MongoDB and TokuMX

Storing time-series data is a frequent pattern for databases – be it for logs or for any kind of monitoring. Such data has the following properties: records are inserted but never updated, the insertion rate can be high, and records are likely to expire after some time. MongoDB and TokuMX are both good fits […]

Keep your MySQL data in sync when using Tungsten Replicator

MySQL replication isn’t perfect and sometimes our data gets out of sync, either by a failure in replication or human intervention. We are all familiar with Percona Toolkit’s pt-table-checksum and pt-table-sync to help us check and fix data inconsistencies – but imagine the following scenario where we mix regular replication with the Tungsten Replicator: We […]

The cost of not properly managing your databases

Every day hundreds of millions of dollars are wasted by allowing improperly tuned or misconfigured systems, misunderstood infrastructure, and inefficient IT operations to live and thrive in data centers around the globe. There are both direct and indirect costs associated with allowing these unhealthy systems to continue to exist. Let’s look at some. The setup: […]

How to set up a PXC cluster with GTIDs (and have async slaves replicating from it!)

This past week was marked by a series of personal findings related to the use of Global Transaction IDs (GTIDs) on Galera-based clusters such as Percona XtraDB Cluster (PXC). The main one is that transactions touching MyISAM tables (and FLUSH PRIVILEGES!) issued on a given node of the cluster are recorded on a GTID set […]

‘Indexing’ JSON documents for efficient MySQL queries over JSON data

MySQL meets NoSQL with JSON UDF: I recently got back from FOSDEM in Brussels, Belgium. While I was there I got to see a great talk by Sveta Smirnova about her MySQL 5.7 Labs release JSON UDF functions. It is important to note that while the UDFs come in a 5.7 release it is absolutely […]