HP DL385 Performance and Memory giving me grey hair :(


  • #1

    Hmmm ok so I'm not quite sure where to start here....

    A bit of background to our setup...

    2 Datacentres, 15 servers in each, only one datacentre active.

    In the Active DC, the Master DB is an HP DL360: 2 x 6-core 4800MHz CPUs, 64GB RAM. All other servers in the DC are slaves to this.
    12 of the slaves are also DL360s with exactly the same config (RAM, CPU); the other 2 servers are DL385s: 64GB RAM, 2 x 16-core 3500MHz CPUs.

    In the Standby DC we have exactly the same: 13 DL360s and 2 DL385s, all the same spec.
    One of the DL360s is a slave to the Master in the Active DC; all the other servers are slaved from that.

    Everything is good so far...

    We're using our standby DC to try to bottom out some performance issues: specifically, the two DL385s are underperforming by orders of magnitude compared to the DL360s.

    In addition, on one of the DL385s, if I raise innodb_buffer_pool_size to 35GB, MySQL won't start, yet on the other DL385 it's fine.

    In terms of my.cnf parameters, apart from the obvious (bin logging enabled on the masters), everything is the same and controlled by Puppet.

    I'm kind of lost as to:
    a) why MySQL won't start with a 35GB or larger buffer pool on one server, when an identical one is fine
    b) why the DL385s are performing so badly
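    Just to sanity-check myself, some back-of-the-envelope arithmetic (the ~5% InnoDB overhead figure here is only a rule of thumb, not something I've measured) says a 35GB pool should fit comfortably in 64GB on both boxes:

```shell
# Rough check: buffer pool plus typical InnoDB overhead vs. physical RAM.
# The ~5% overhead figure is a rule of thumb, not from our config.
pool_gb=35
ram_gb=64
need_mb=$(( pool_gb * 1024 * 105 / 100 ))   # pool plus ~5% bookkeeping
have_mb=$(( ram_gb * 1024 ))
echo "need ~${need_mb} MB of ${have_mb} MB"
# Also worth diffing between the two 'identical' boxes:
#   free -m; grep CommitLimit /proc/meminfo; cat /proc/sys/vm/overcommit_memory
```

    So on paper both DL385s should start fine with 35GB, which makes the one that doesn't all the more suspicious.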

    I know the information I've provided is probably only a fraction of what's needed for a more detailed investigation, but as a top-level guess, can anyone think of anything I'm missing?

    We're using 5.5.30-rel30.2.500 (Percona Server) on all boxes.

    Key my.cnf params as follows (this is from a server that starts fine with a 35GB buffer pool):

    This is the error we get when we increase the buffer pool to 35GB or more on one of the DL385s:
    130711 13:50:55 mysqld_safe mysqld from pid file /var/lib/mysql/mysqld.pid ended
    130711 13:50:56 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql/data
    130711 13:50:56 [Note] Plugin 'FEDERATED' is disabled.
    130711 13:50:56 InnoDB: The InnoDB memory heap is disabled
    130711 13:50:56 InnoDB: Mutexes and rw_locks use GCC atomic builtins
    130711 13:50:56 InnoDB: Compressed tables use zlib 1.2.3
    130711 13:50:56 InnoDB: Using Linux native AIO
    130711 13:50:56 InnoDB: Error: Linux Native AIO is not supported on tmpdir.
    InnoDB: You can either move tmpdir to a file system that supports native AIO
    InnoDB: or you can set innodb_use_native_aio to FALSE to avoid this message.
    130711 13:50:56 InnoDB: Error: Linux Native AIO check on tmpdir returned error[22]
    130711 13:50:56 InnoDB: Warning: Linux Native AIO disabled.
    130711 13:50:56 InnoDB: Initializing buffer pool, size = 35.0G
    130711 13:50:58  InnoDB: Assertion failure in thread 47165255037984 in file ut0mem.c line 103 
    InnoDB: Failing assertion: ret || !assert_on_error 
    InnoDB: We intentionally generate a memory trap.
    InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
    InnoDB: If you get repeated assertion failures or crashes, even
    InnoDB: immediately after the mysqld startup, there may be
    InnoDB: corruption in the InnoDB tablespace. Please refer to
    InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
    InnoDB: about forcing recovery.
    12:50:58 UTC - mysqld got signal 6 ;
    This could be because you hit a bug. It is also possible that this binary
    or one of the libraries it was linked against is corrupt, improperly built,
    or misconfigured. This error can also be caused by malfunctioning hardware.
    We will try our best to scrape up some info that will hopefully help
    diagnose the problem, but since we have already crashed, 
    something is definitely wrong and this may fail.
    Please help us make Percona Server better by reporting any
    bugs at http://bugs.percona.com/
    It is possible that mysqld could use up to 
    key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 6577353 K  bytes of memory
    Hope that's ok; if not, decrease some variables in the equation.
    Thread pointer: 0x0
    Attempting backtrace. You can use the following information to find out
    where mysqld died. If you see no messages after this, something went
    terribly wrong...
    stack_bottom = 0 thread_stack 0x40000
    You may download the Percona Server operations manual by visiting
    http://www.percona.com/software/percona-server/. You may find information
    in the manual which will help you identify the cause of the crash.
    130711 13:50:58 mysqld_safe mysqld from pid file /var/lib/mysql/mysqld.pid ended

  • #2
    If InnoDB attempts to allocate more memory for its buffer pool than is available, it doesn't crash like in your case; it just gives you a nice message and refuses to initialize, like this:

    130717 12:04:10 InnoDB: Initializing buffer pool, size = 100.0G
    InnoDB: mmap(110310195200 bytes) failed; errno 12
    130717 12:04:10 InnoDB: Completed initialization of buffer pool
    130717 12:04:10 InnoDB: Fatal error: cannot allocate memory for the buffer pool
    130717 12:04:10 [ERROR] Plugin 'InnoDB' init function returned error.
    130717 12:04:10 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
    130717 12:04:10 [ERROR] Unknown/unsupported storage engine: InnoDB
    130717 12:04:10 [ERROR] Aborting
    But in your case, MySQL crashed in this function:

    /**********************************************************************//**
    Allocates memory.
    @return own: allocated memory */
    UNIV_INTERN
    void*
    ut_malloc_low(
    /*==========*/
            ulint   n,              /*!< in: number of bytes to allocate */
            ibool   assert_on_error)/*!< in: if TRUE, we crash mysqld if the
                                    memory cannot be allocated */
    {
    #ifndef UNIV_HOTBACKUP
            ulint   retry_count;
            void*   ret;

            if (UNIV_LIKELY(srv_use_sys_malloc)) {
                    ret = malloc(n);
                    ut_a(ret || !assert_on_error);
    Which could mean the memory area you are trying to use is corrupted. That could also explain the overall performance problems. Can you see anything related in the dmesg log?
    I would suggest performing deep health checks on those DL385s, starting with a full memtest.
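    For the dmesg part, a filter along these lines (the pattern list is my own guess at the usual suspects, not exhaustive) will surface most memory/MCE complaints:

```shell
# scan_log: filter a kernel log stream for common hardware/memory error
# signatures (pattern list is a guess at the usual suspects, not exhaustive).
scan_log() {
  grep -Ei 'mce|ecc|edac|hardware error|out of memory|oom' || true
}

# On a live box you would run:  dmesg | scan_log
# Quick self-check against a synthetic EDAC correctable-error line:
echo "EDAC MC0: CE page 0x12345, offset 0x0, grain 8" | scan_log
```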

    Also, on servers where you can set a buffer pool big enough to fit most of the hot data in memory, performance can indeed be orders of magnitude better than on servers where the buffer pool is too small to hold that hot data. It's disk speed vs. memory speed.
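    To check whether the hot data actually fits, you can compute the buffer pool hit ratio from the standard SHOW GLOBAL STATUS counters; the numbers below are made-up samples, just to show the arithmetic:

```shell
# Buffer pool hit ratio from SHOW GLOBAL STATUS counters.
# On a live server:  mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%'"
# The two values below are made-up samples, not from the original post.
disk_reads=1200          # Innodb_buffer_pool_reads (logical reads that had to hit disk)
read_requests=45000000   # Innodb_buffer_pool_read_requests (all logical reads)
hit=$(awk -v r="$disk_reads" -v q="$read_requests" \
      'BEGIN { printf "%.6f", 1 - r / q }')
echo "buffer pool hit ratio: $hit"
```

    A ratio that sits well below ~0.99 under load means a large share of reads are going to disk, which would line up with the DL385s being orders of magnitude slower.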