November 22, 2014

What is the largest amount of data you store in MySQL?

My previous poll got a great response, so I thought I should continue this series.
The question of the day is: how much data do you store in your largest MySQL instance? I'm interested in a single instance, not the total amount of data you have in MySQL across your sharded, replicated environment. Feel free to share details in the comments – are you using MyISAM or InnoDB? Is it working out well for you, or are there challenges you're seeing?

What is the largest data size for a single MySQL instance you use in production?

  • 100GB to 1TB (34%, 298 Votes)
  • 10GB to 100GB (25%, 226 Votes)
  • 1GB to 10GB (16%, 145 Votes)
  • 1TB to 10TB (15%, 134 Votes)
  • less than 1GB (7%, 60 Votes)
  • more than 10TB (3%, 25 Votes)

Total Voters: 888


Thank you for your participation!

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. brad voth says:

    15tb… painful to manage

  2. Brad,

    Thanks for the feedback. What is the biggest pain? MyISAM or InnoDB?

  3. AD says:

    We've got 16TB, and the hardest part is backups. XtraBackup works well, but incrementals do not work with compressed XtraBackup backups. We have a bug open for this (we have Percona Gold support).

    So, if we could get (a) compressed XtraBackup backups, (b) compressed incrementals, and (c) parallel recovery (or, to put it more simply, a way to parallelize the restore such that the entire I/O capacity of the disk subsystem is utilized), we'd be much happier.

    Thanks,
    AD

  4. William says:

    I’m sort of bummed out that the last category is from 100 GB to 1 TB. I’d like to see a more granular breakdown of that segment.

  5. AD says:

    A few other pain points with large MySQL DBs (16TB):

    – Large DBs imply large tables, so DDL is too slow. We are using Percona 5.1.6x builds, so online index additions would be ideal. Not sure if 5.5.x or 5.6.x already addresses this (see the sketch below).

    – Large DBs also come with large buffer pools, and that implies very slow shutdown and startup (we have fast shutdown disabled; all that seems to do is shift the slowness from shutdown to startup). I am thinking there is no easy solution here.

    – A simpler way to distribute some DBs to a different mount point. We have to use symlinks to point some DBs to other places. Instead, if we could do something like "create database new_db using db_file_space '/new_mount_point/mysql_data_dir';", that would be nice.

    – Backport crash-safe replication to 5.1.6x :) Yes, we want to get to 5.5, but now that 5.6.x is around the corner, we have put the 5.5.x migration on hold and are planning to go straight to 5.6.x.

    Thanks,
    AD
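
    For what it's worth, MySQL 5.6 (and Percona Server 5.6, which is based on it) goes some way toward the first and third wishes: secondary indexes can be added online, and an individual InnoDB table can be placed on another mount point at CREATE time when innodb_file_per_table is enabled. A minimal sketch, with hypothetical table, column, and path names:

        -- MySQL 5.6 online DDL: add a secondary index without blocking writes.
        ALTER TABLE orders
          ADD INDEX idx_created_at (created_at),
          ALGORITHM=INPLACE, LOCK=NONE;

        -- 5.6 can also place a single InnoDB table on a different mount point
        -- (requires innodb_file_per_table). Not the per-database "db_file_space"
        -- wished for above, but it avoids hand-managed symlinks.
        CREATE TABLE archive_events (
          id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
          payload BLOB
        ) ENGINE=InnoDB
          DATA DIRECTORY='/new_mount_point/mysql_data_dir';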

  6. AD,

    Thanks for the feedback. A number of compressed backup issues are fixed in XtraBackup 2.0; do you have a specific unresolved issue? Please feel free to open a support request, or let me know if there is an unresolved support issue and I will investigate it.

  7. Brad voth says:

    Our main pain point is that we have a single table that is ~12+ TB. This is a rolling 90 days' worth of data, write-mostly, but it needs to be available for immediate queries if the need presents itself. This is an InnoDB table. The main issue is that with a table this large, any table changes are out of the question, as are backups, etc. We write at a rate of 30k+ rows/sec into this table.

    If DDL statements could be run in a multithreaded fashion through some sort of divide-and-conquer methodology, that would help immensely. It's nearly impossible to run any sort of DDL on a table with 180B rows when each row needs to be processed sequentially.
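
    One common way to make a rolling-window, write-mostly table like this more manageable is to range-partition it by day, so that expiring old data becomes a near-instant DROP PARTITION rather than row-by-row deletes. A rough sketch, with hypothetical table, column, and partition names:

        -- Rolling-window table partitioned by day. The partitioning column must be
        -- part of every unique key, hence the composite primary key.
        CREATE TABLE events (
          id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
          created_at DATETIME NOT NULL,
          payload    VARBINARY(255),
          PRIMARY KEY (id, created_at)
        ) ENGINE=InnoDB
        PARTITION BY RANGE (TO_DAYS(created_at)) (
          PARTITION p20141101 VALUES LESS THAN (TO_DAYS('2014-11-02')),
          PARTITION p20141102 VALUES LESS THAN (TO_DAYS('2014-11-03')),
          PARTITION pmax      VALUES LESS THAN MAXVALUE
        );

        -- Expiring the oldest day is a metadata operation, not 180B row visits:
        ALTER TABLE events DROP PARTITION p20141101;

    This does not make an ALTER of the whole table cheap, but it keeps the day-to-day retention work off the row level.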

  8. Laurynas Biveinis says:

    AD –

    Percona Server 5.1 has the crash-resistant replication feature: http://www.percona.com/doc/percona-server/5.1/reliability/innodb_recovery_update_relay_log.html. Could you clarify, does it fail for you under some scenarios?

    Thanks,
    Laurynas

  9. Domas says:

    I once had five minutes of stress when I couldn't boot up a ~30TB MySQL instance; the errors were way too cryptic ;-)
    I don’t think we’re using that configuration any more, with regular ~2-3TB machines serving the workload nowadays.

    With MySQL one must shard/partition/whatnot large tables because of index locking, so I don't even see people running a single huge table ;-)

  10. This info and the resulting thread are very interesting.

    I’d be curious to see this survey continue with two more questions:

    How big is the largest single table, and what % of the schema does it represent?

    What is the largest number of tables / database artifacts you house in a MySQL instance?

  11. Tim,

    I have 20 different poll ideas already, which I will be posting every so often so you people do not get bored :)

  12. – CentOS release 5.6 (Final)
    – ProLiant DL380 G7, Processors: 24, Memory: 94.41 GB
    – MySQL Community Server (GPL) 5.5.16-log (mysql_55)

    Filesystem             Type  Size  Used  Avail  Use%  Mounted on
    /dev/mapper/vg00-log   xfs   10G   255M  9.8G   3%    /mysql/logvault/log
    /dev/mapper/vg00-data  xfs   40T   39T   1.2T   98%   /mysql/logvault/data

    The inert part of this is compressed MyISAM. The data size is clearly too large to manage. Also, the log data structure is not well suited for SQL. This is about to become a Hadoop/Hive cluster.

  13. Kristian,

    Impressive it is….

  14. Krishna says:

    2TB for a production report server and still growing. (InnoDB)

  15. It would be, if it worked properly. Hive and Hadoop have their share of problems, but in general they handle this amount of data orders of magnitude better.

  16. Krishna says:

    Making the application intelligent also helps in managing terabytes of data. For example, when manipulating more than 3 months of data (900M records), the app should automatically switch to 'schedule by email'.

  17. eRadical says:

    Currently we have a new 300G internal-analytics instance that is growing by 3G daily.
    This is the entry instance for raw data from ETL scripts.

    The machine has 16GB of RAM, a quad-core Xeon with HT, and 2 x 3TB HDDs (software RAID 1).
    It's on CentOS 6.3 and I'm using MariaDB Server 5.5.27.

    On the same machine there are also:
    – an instance replicating our website prod database (too small to count – also MariaDB),
    – a TokuDB instance that is computing data from the ETL instance (under 2G of permanent data)

    The oddest thing is the size of computed data: 250MB… so far.

  18. AD says:

    Laurynas,
    Thanks – I must have misread the documentation. For some reason I had thought that crash-resistant replication was in 5.5.x only.

    Peter,
    Re: the compressed incremental backups with the latest XtraBackup, I'll double-check if it's working now and report back.

    Domas,
    Re: your point about sharding: well, yes, true. We are right on the cusp of whether to shard or not. Sharding will greatly increase application complexity, and given the rate at which high-IOPS storage costs (FusionIO, Virident, even commodity OCZ) are plummeting, I suspect our needs are going to be met by a largish single-node MySQL for quite a few years more, allowing us to postpone the sharding decision.

    Thanks,
    AD

  19. Jouni "rautamiekka" Järvinen says:

    Our MySQL 5.5.28-1~dotdeb.0-log on 64-bit Debian 6 for our Minecraft server is running a 2.9GB database of 71 tables, where most tables are InnoDB. It works fine as far as I've configured it, and I'm not awesome at that.

  20. Neda says:

    Hi Peter,
    I have designed and replicated a MySQL instance as a slave for archiving our daily transactions for 6 months; we are using this server for some reporting activities as well. It has grown to around 700G so far.

    Expired data is deleted every night after working hours. We started with 5 million transactions per day, and now I'm upgrading it to store around 9 million transactions every day (the extra transactions are fetched via ETL from raw data sent by our partners).

    I've used the InnoDB engine and I have partitioned the big tables; they are around 300G now, and each table contains several data files.
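
    Since the big tables are already partitioned, both the nightly expiry and the reporting can lean on that partitioning; a small sketch, with hypothetical table, column, and partition names:

        -- Nightly expiry: dropping a whole partition avoids a large DELETE
        -- and the purge work that follows it.
        ALTER TABLE transactions_archive DROP PARTITION p20141121;

        -- Check that a report only touches the partitions it needs
        -- (EXPLAIN PARTITIONS is available since MySQL 5.1):
        EXPLAIN PARTITIONS
        SELECT COUNT(*), SUM(amount)
        FROM transactions_archive
        WHERE created_at >= '2014-11-01' AND created_at < '2014-11-08';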
