Tuning server for very large (250GB) data sets

  • Tuning server for very large (250GB) data sets

    Hey guys. I've got some large datasets at work and I'm constantly struggling with performance issues. I wonder if you guys can help me analyze my MySQL configuration and offer any suggestions.

    Here's my setup: We store firewall log data for each of our clients. We split the data across four tables in a one-to-one mapping -- one table (named 'event') stores the information that is common to every event, and the other three tables ('event_detail', 'event_ip_detail', and 'event_icmp_detail') hold the data that applies only to some of the events. The decision to split the data up was made before I got here, to save hard disk space for the records that don't need those extra columns, and now we have to deal with joining the data back up when we want to use it. Eventually I want to redesign the schema and put it all in one table, but that's not an option for now.
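
    Just so the shape of the problem is clear, getting a complete event back means a join across all four tables, something like this sketch (LEFT JOINs, since only some events have rows in the detail tables):

    -- Sketch: reassembling one event from the four tables
    SELECT E.*, D.*, IP.*, ICMP.*
    FROM event E
    LEFT JOIN event_detail D ON E.event_id = D.event_id
    LEFT JOIN event_ip_detail IP ON E.event_id = IP.event_id
    LEFT JOIN event_icmp_detail ICMP ON E.event_id = ICMP.event_id
    WHERE E.event_id = 12345;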

    We've got seven database servers total, each with two Xeon processors (ranging between 2.0 and 3.2 GHz on different servers, all with HyperThreading enabled), 2.5GB RAM, and two 146GB 10k RPM SCSI disks joined together with LVM. All servers are running CentOS 4 (Linux). The version of MySQL on every server is 4.1.20-standard-log.

    We store data for multiple clients, and our data is partitioned on a client level. Each client's data is stored in a separate database. Each database has the same four tables inside, using InnoDB file_per_table. The larger clients (200-300 GB each) get a server to themselves, and the smaller ones (10-70 GB each) share one server.
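
    (Since we use file_per_table, each client database is just its own directory of per-table .ibd files under the data directory, roughly like this -- the client name and path are hypothetical:)

    /var/lib/mysql/client_acme/event.ibd
    /var/lib/mysql/client_acme/event_detail.ibd
    /var/lib/mysql/client_acme/event_ip_detail.ibd
    /var/lib/mysql/client_acme/event_icmp_detail.ibd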

    Each of the tables in the larger databases has on the order of 300 million rows.

    As far as read vs. write usage (SELECTs vs. INSERTs), our workload isn't clearly biased one way or the other like it is in some cases (like webservers doing mostly SELECTs). I suppose we do a little more inserting than selecting -- we're pretty much *always* inserting new events. We then have a series of scripts that run once an hour and once a day to calculate various statistics based on the data. Most of the rest of our usage is based on those calculated statistics, so that's about the last time the data itself is accessed until we delete it 45 days later.
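
    To give a flavor of those jobs, an hourly rollup is conceptually something like the following (the stats table and its columns here are made-up placeholders, not our real schema):

    -- Hypothetical hourly rollup, just to show the access pattern
    INSERT INTO hourly_severity_counts (hour_start, severity, event_count)
    SELECT '2007-06-01 12:00:00', severity, COUNT(*)
    FROM event
    WHERE event_time >= '2007-06-01 12:00:00'
      AND event_time <  '2007-06-01 13:00:00'
    GROUP BY severity;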

    Here are the relevant portions of my MySQL config for one of my servers. This one has 2GB of RAM and dual 3.2 GHz Xeons with 2MB cache.


    skip-name-resolve
    skip-locking
    set-variable = max_relay_log_size=256M
    set-variable = key_buffer=384M
    set-variable = max_allowed_packet=25M
    set-variable = table_cache=128
    set-variable = sort_buffer=4M
    set-variable = read_buffer_size=5M
    set-variable = myisam_sort_buffer_size=16M
    set-variable = tmp_table_size=768M
    innodb_file_per_table
    innodb_data_file_path = ibdata1:100M:autoextend
    set-variable = innodb_log_files_in_group=3
    set-variable = innodb_log_file_size=250M
    set-variable = innodb_log_buffer_size=16M
    set-variable = innodb_buffer_pool_size=1536M
    set-variable = innodb_additional_mem_pool_size=20M
    set-variable = innodb_lock_wait_timeout=1000
    set-variable = innodb_flush_log_at_trx_commit=1


    Any ideas? If there are any other stats that would be useful, let me know, and I can post them. Thanks a lot.

    Dan

  • #2
    In what area are you struggling with performance - inserts, selects, updates? I assume you are saying that selects are a problem.

    Can you give a sample query pseudo code that is a problem?

    • #3
      Usually, our SELECTs are relatively fast (considering the size of the tables we're dealing with), but INSERTs and especially DELETEs are slow. Lately I've been struggling with two servers that have become an order of magnitude slower in both SELECTs AND DELETEs, and I have no idea why, but that's probably a separate issue.

      Basically, it's DELETEs that cause me the most problems. We have an archive process that runs daily for each client -- it SELECTs the oldest 24 hours' worth of data from the database (in smaller blocks, of course, not all at once) and writes it to an XML file, then it DELETEs those oldest 24 hours' worth of data from the database (also in blocks of 50,000 rows at a time). For the larger databases, where we have about 4-5 GB of data per 24 hours, the process can take 18-20 hours total (usually 4-6 hrs for the reading portion, and then 10-15 hours for deleting). So it's usually just barely completing before the next day's archive process starts running.

      We've got a relatively large number of indexes on each table, since it was the only way to achieve decent performance on our SELECT queries, and I'm sure it's those indexes that cause the DELETEs to take forever, but it's not really an option to get rid of any of them, since that would just transfer the slow performance to the SELECT queries.
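
      (If you want to see the maintenance burden for yourself, SHOW INDEX lists everything InnoDB has to keep up to date on each deleted row, e.g.:)

      SHOW INDEX FROM event_ip_detail;
      -- primary key plus four secondary indexes,
      -- each of which must be updated for every row deleted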

      Here are the table definitions for my four tables:


      CREATE TABLE `event` (
        `event_id` bigint(20) unsigned NOT NULL auto_increment,
        `event_time` datetime NOT NULL default '0000-00-00 00:00:00',
        `host_address` varchar(15) NOT NULL default '',
        `facility` varchar(20) NOT NULL default '',
        `severity` tinyint(3) unsigned NOT NULL default '0',
        `message_id` varchar(20) default NULL,
        `vendor_id` tinyint(3) unsigned NOT NULL default '0',
        `message` varchar(255) NOT NULL default '',
        `insert_ts` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
        PRIMARY KEY (`event_id`),
        KEY `idx__event__event_time__host_address__severity__event_id` (`event_time`,`host_address`,`severity`,`event_id`),
        KEY `idx__event__facility` (`facility`)
      ) ENGINE=InnoDB DEFAULT CHARSET=latin1;

      CREATE TABLE `event_detail` (
        `event_id` bigint(20) unsigned NOT NULL default '0',
        `interface` varchar(15) default NULL,
        `protocol_id` smallint(5) unsigned default NULL,
        `traffic_type_id` tinyint(3) unsigned default NULL,
        `traffic_action` tinyint(3) unsigned default NULL,
        `traffic_dir` tinyint(3) unsigned default NULL,
        `policy_id` varchar(15) default NULL,
        `zone` enum('UNKNOWN','SELF','TRUST','UNTRUST') default NULL,
        `event_type` enum('UNKNOWN','TRAFFIC','DEVICE','CTRL','ALARM','WEB') default NULL,
        PRIMARY KEY (`event_id`),
        KEY `idx__event_detail__event_type__zone__traffic_action` (`event_type`,`zone`,`traffic_action`)
      ) ENGINE=InnoDB DEFAULT CHARSET=latin1;

      CREATE TABLE `event_ip_detail` (
        `event_id` bigint(20) unsigned NOT NULL default '0',
        `src_ip` varchar(15) default NULL,
        `dst_ip` varchar(15) default NULL,
        `src_ip_desc` varchar(25) default NULL,
        `dst_ip_desc` varchar(25) default NULL,
        `src_port` smallint(5) unsigned default NULL,
        `dst_port` smallint(5) unsigned default NULL,
        `src_port_desc` varchar(25) default NULL,
        `dst_port_desc` varchar(25) default NULL,
        PRIMARY KEY (`event_id`),
        KEY `idx__event_ip_detail__src_ip__src_ip_desc` (`src_ip`,`src_ip_desc`),
        KEY `idx__event_ip_detail__dst_ip__dst_ip_desc` (`dst_ip`,`dst_ip_desc`),
        KEY `idx__event_ip_detail__dst_ip__src_ip` (`dst_ip`,`src_ip`),
        KEY `idx__event_ip_detail__src_ip__dst_port` (`src_ip`,`dst_port`)
      ) ENGINE=InnoDB DEFAULT CHARSET=latin1;

      CREATE TABLE `event_icmp_detail` (
        `event_id` bigint(20) unsigned NOT NULL default '0',
        `icmp_type` tinyint(3) unsigned default NULL,
        `icmp_code` tinyint(3) unsigned default NULL,
        PRIMARY KEY (`event_id`),
        KEY `idx__event_icmp_detail__icmp_type__icmp_code` (`icmp_type`,`icmp_code`)
      ) ENGINE=InnoDB DEFAULT CHARSET=latin1;


      (It also wasn't my idea to store IP addresses as VARCHAR(15)'s instead of INTEGER UNSIGNED. That's something else I'd like to change, but there's a lot of code that will have to be modified when I do that.)
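
      (When I do get to it, MySQL can at least do the conversion natively with INET_ATON()/INET_NTOA(); a rough, untested sketch of the schema side, with the new column names made up:)

      ALTER TABLE event_ip_detail
        ADD COLUMN src_ip_num int(10) unsigned default NULL,
        ADD COLUMN dst_ip_num int(10) unsigned default NULL;
      UPDATE event_ip_detail
        SET src_ip_num = INET_ATON(src_ip),
            dst_ip_num = INET_ATON(dst_ip);
      -- reads then become: SELECT INET_NTOA(src_ip_num), ...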

      As for a sample query, here's an example of how we might obtain the list of most common destination hosts:


      SELECT IP.dst_ip, COUNT(IP.event_id) AS dst_count
      FROM event E USE INDEX (idx__event__event_time__host_address__severity__event_id)
      INNER JOIN event_ip_detail IP ON E.event_id = IP.event_id
      INNER JOIN event_detail D ON E.event_id = D.event_id
      WHERE E.event_time BETWEEN 20070601000000 AND 20070602000000
        AND D.traffic_action = 0
        AND D.event_type = 'TRAFFIC'
        AND D.zone = 'TRUST'
        AND IP.dst_ip_desc IS NOT NULL
      GROUP BY IP.dst_ip
      ORDER BY dst_count DESC
      LIMIT 10;


      As I said, though, it's really mostly the deletes that are slow, and those are pretty uninteresting. Originally I was doing something like:


      DELETE IP
      FROM event E
      INNER JOIN event_ip_detail IP ON E.event_id = IP.event_id
      WHERE E.event_time BETWEEN 20050101000000 AND 20070502000000
      LIMIT 50000;

      (repeat for event_icmp_detail and event_detail, then do a straightforward delete from event)
      (repeat until affected_rows == 0)


      But after some testing I determined that it was actually faster to avoid those three joins and just do this:


      SELECT event_id
      FROM event
      WHERE event_time BETWEEN 20070501000000 AND 20070502000000
      LIMIT 50000;

      DELETE FROM event_icmp_detail WHERE event_id IN ($event_ids);
      DELETE FROM event_ip_detail WHERE event_id IN ($event_ids);
      DELETE FROM event_detail WHERE event_id IN ($event_ids);
      DELETE FROM event WHERE event_id IN ($event_ids);

      (repeat until the SELECT returns no more event_ids)


      I had to raise my max query length to let me send a query with 50,000 event_id's in it, of course, but it seems to work pretty well.
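
      (For reference, "max query length" here means the max_allowed_packet setting; the 25M in my config above is the raised value, up from the old 1M default:)

      set-variable = max_allowed_packet=25M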

      Does that help?
