MySQL distribution change e.g. MySQL community edition to Percona Server

Latest Forum Posts - August 26, 2015 - 4:43pm
I would like to know the process/steps to change the following:
5.1.34sp1-enterprise-gpl-advanced-log MySQL Enterprise Server - Advanced Edition (GPL)
to Percona Server 5.1.X ( 5.1 latest release)

Are these different MySQL distributions compatible with each other (I understand that some versions might have some extra features, performance differences, etc.)? What is the process to convert the above to a Percona release, or a Percona release to the Community release, etc.?

Thanks a lot.

ObjectRocket’s David Murphy talks about MongoDB, Percona Live Amsterdam

Latest MySQL Performance Blog posts - August 26, 2015 - 1:07pm

Say hello to David Murphy, lead DBA and MongoDB Master at ObjectRocket (a Rackspace company). David works on sharding, tool building, very large-scale issues and high-performance MongoDB architecture. Prior to ObjectRocket he was a MySQL/NoSQL architect at Electronic Arts. David enjoys large-scale operational tool building and high-performance OS and database tuning. He is also a core code contributor to MongoDB. He’ll be speaking next month at Percona Live Amsterdam, which runs Sept. 21-23. Enter promo code “BlogInterview” at registration to save €20!

Tom: David, your 3-hour tutorial is titled “Mongo Sharding from the trench: A Veterans field guide.” Did your experience working with vast amounts of data at Rackspace give you a unique perspective that now puts you in a position to help people just getting started? Can you give a couple of examples?

David: I think this is something I grew into organically, from the days of supporting cPanel-type MySQL instances to today. I have worked in a few verticals, from hosting to advertising to gaming, finally entering into platform services. The earlier ones gave me a host of knowledge about how customers need systems to work, and the number and range of workloads we see at Rackspace reinforces this.

ObjectRocket’s David Murphy

Many times the unique perspective comes with the scale, such as someone scaling a single node up to the multi-terabyte range. When they go to “shard” they can find that the process, which is normally very light and unnoticeable in most Mongo sharding, can severely lock the metadata for an extended time. In other cases, the “balancer” might not be able to keep up with the amount of work being asked of it.

Toward the smaller end of the spectrum, having seen so many workloads from big to small, I can see similar thought processes and trends. Having worked with so many of these workloads, and honestly having learned along the evolution of Mongo, helps me explain to clients the good, the bad, and the hairy. Many times discussions come down to people not using connection pooling, non-indexed sorting, or complex operators such as $in, $nin, and more. In these cases, I can talk to people about the balance of using these concepts and when they will become bigger issues for them. My goal is to give them enough knowledge to help determine when it is correct to use development resources to fix an issue, and when it’s manageable and development time could be better spent elsewhere.


Tom: The title of your tutorial also sounds like the perfect title of a book. Do you have any plans for one?

David: What an excellent question! I have thought about this, and a book is a goal if I can find the time to do it. A working title might be “Mongo from the trenches: Surviving the minefield to get ahead”. I think the book might be broken into three sections: “When should you use or not use Mongo”, “Schema and Operators in the NoSQL world”, and “Sharding”. Each could be a great mini-book on its own, and the community really could use a level of depth similar to the MySQL 5.0 certification guides. I liked those books because they helped you understand all the bits of what to consider in your schema design and how it affects the application as much as the database hosts. Then the second half, more administration-geared, took those same schema and design choices and helped you manage them with confidence.

In the end, Mongo is a good product that works well for most people, and as it matures we need more discussion on topics such as what you should monitor, how you should predict issues, and how valuable regular audits are. Especially in an ecosystem where it’s easy to spin something up, launch it, and move on to the next project.


Tom: When and why would you recommend using MongoDB instead of MySQL?

David: I am glad I mentioned this is worthy of a book already, as it is such a complex topic and one that gets me very excited.

I feel there is a bit of misinformation on both sides of this field. Many experts in the MySQL camp know that when someone says they can’t get more than 1000 TPS via MySQL, 9 out of 10 times it’s a design issue, not a technology issue. The Mongo crowd loves this, and due to the inherent sharding nature of Mongo they can sidestep these types of issues. Conversely, in the Mongo camp you will hear how bad the SQL standard is; however, omitting transactions for a moment, the same types of operations exist in MySQL and Mongo. There are some interesting powers in Mongo aggregation. However, SQL is more powerful than, and just as complex as, some map-reduce jobs and aggregations I have written.

As to your question, MySQL will always win in regards to repeatable reads to the database in a transaction. There is some talk of limited transactions in Mongo; however, these will likely not become global and cluster-wide anytime soon, if ever. I don’t trust floats in Mongo for financials; it’s not that Mongo doesn’t do them, but rather that JavaScript-type floats are what is present. Sometimes you need to store data as a 64-bit integer and do math in the app to make it a high-precision float. MySQL, on the other hand, has excellent support for precision.

Another area is simply looking at the history of Mongo and MySQL. Mongo, until WiredTiger and RocksDB, was very similar to MyISAM from a locking behavior and support perspective. With the advent of the new storage systems, we will see major leaps forward in the types of flows you will want in Mongo. With the writer lock issue gone, locking between the systems is becoming more and more similar, making the decision between them much harder.

The news is not all one-sided, however. Subdocument and array support in Mongo is amazing; there are so many things I can do in Mongo that I could not do even with bitwise SET/ENUM operators. So if you need that type of system, or you want to create a semi-denormalized form of a view in the database, Mongo can do this with ease and on the fly. MySQL, on the other hand, would take careful planning and need whole tables updated. In this regard I feel more people could use Mongo and its ability to have a versioned document schema, allowing more incremental changes to documents. With new code releases, the application can read old versions and “upgrade” them to the latest form, removing a whole flurry of maintenance-related pains that RDBMSs have, to the frustration of developers who just want to launch the new product.

The last thing I would want to say here is you need not choose: why not use both? Mongo can be very powerful for keeping a semi-denormalized version of the data that is nimble enough to allow fast application or system updates and features, leaving MySQL for the very specific workloads that need precision and are not expected to have schema changes. I am a huge fan of keeping the transactional portions in MySQL and the rest in Mongo, allowing you to quickly scale the bulk of your data needs up and down, and more slowly change the parts that need to be 100% consistent all of the time with no room for eventual consistency.


Tom: What other session(s) are you most looking forward to besides your own at Percona Live Amsterdam?

David: There are a few that are near and dear to me.

“Turtles all the way down: tuning Linux for database workloads” looks like a great one. It reflects a view I have always had: DBAs should be DBAs, SysAdmins, and storage people rolled into one. That way they can understand the impacts of the application down to the blocks the database reads.

“TokuDB internals” is another one. I have used TokuDB in MySQL and Mongo to some degree, but it has never had in-depth documentation. A topic like that is a great way to fill any gaps for experienced and new people alike.

“Database Reliability Engineering” looks like a great talk from a great speaker.

As an InnoDB geek, I like the idea around “Understanding InnoDB locks: case studies.”

I see a huge amount of potential for MaxScale if anyone else is curious, “Anatomy of a Proxy Server: MaxScale Internals” should be good for R/W splits and split writing type cases.

Finally, one of my favorite people is Charity, as she is always so energetic and can get to the heart of the matter. If you are not going to “Upgrade your database: without losing your data, your perf or your mind” you are missing out!


Tom: Thanks for speaking with me, David! Is there anything else you’d like to add: either about Rackspace or Percona Live Amsterdam?

David: In regards to Rackspace, I urge everyone to check out the Data Services group. We handle everything from Redis to Hadoop, with a goal of augmenting your teams or providing experts to help keep your uptime as high as possible. With options from dedicated hosts to platform-type services, there is something that helps everyone. Rackspace is not just a cloud company but a real support company that provides amazing hardware to use, or support for hardware in other locations, and it is growing rapidly.

With Percona Amsterdam, everyone should come: the group of speakers is simply amazing, and I for one am excited by so many topics because they are all so compelling. Outside of that, you will find it hard to find another gathering of database experts with multiple technologies under their belt who truly believe in picking the right technology for the right use case.

The post ObjectRocket’s David Murphy talks about MongoDB, Percona Live Amsterdam appeared first on MySQL Performance Blog.

MyISAM performance drop after upgrade from 5.5 to 5.6

Latest Forum Posts - August 26, 2015 - 5:09am

We recently upgraded our three-server cluster from Percona MySQL 5.5 to 5.6 and have run into problems with read performance on a number of MyISAM tables.

We started on CentOS 6.6 (Percona-Server-server-55-5.5.42-rel37.1.el6.x86_64) and upgraded to Percona MySQL 5.6 on Gentoo (percona-server-

The upgrade was done in-place, however the configuration was to some extent rebuilt as part of the upgrade.

An example table we're having problems with would look like this:

mysql> describe content.placement;
| Field | Type | Null | Key | Default | Extra |
| storyid | int(11) | NO | PRI | 0 | |
| revision | int(11) | NO | PRI | 0 | |
| placementid | int(11) | NO | PRI | 0 | |
| platform | int(11) | NO | MUL | 0 | |
| cat_id | int(11) | NO | MUL | 0 | |
| layout | int(11) | NO | | 0 | |
| priority | int(11) | NO | MUL | 0 | |
| published_date | datetime | NO | MUL | 0000-00-00 00:00:00 | |
| expire_date | datetime | NO | MUL | 0000-00-00 00:00:00 | |
| topstory | tinyint(3) unsigned | NO | MUL | 0 | |
| status | tinyint(4) | NO | MUL | 0 | |
| revisionstatus | tinyint(4) | NO | MUL | 0 | |
| sort | tinyint(4) | NO | MUL | 0 | |
| topsort | tinyint(4) | NO | | 0 | |
| channeltop | tinyint(4) | NO | | 0 | |
| infinity | tinyint(4) | NO | | 0 | |

The query that is giving us grief is this one, a relatively simple one:

mysql> explain SELECT SQL_NO_CACHE storyid FROM placement WHERE cat_id IN (88,248,51,250,807,1534,1855,1860,2247,2457,4451) ORDER by published_date DESC LIMIT 50;
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | placement | index | cat_id,catpub | published_date | 8 | NULL | 2246 | Using where |
1 row in set (0.00 sec)

Our problem is that this query has a run-time seemingly randomly distributed between 0.05 and 3++ seconds. In 5.5, this query was consistently fast, i.e. less than 0.1s. Another confusing issue is that the chosen key seems to vary from server to server, but also from query execution to query execution.

We believe to have eliminated the underlying system as a source of problems. What we do however see is that the query seems to get stuck in various phases, and we don't quite understand why:

mysql> show profile for query 35;
| Status | Duration |
| starting | 0.000054 |
| checking permissions | 0.000004 |
| Opening tables | 0.000023 |
| init | 0.000021 |
| System lock | 0.000006 |
| optimizing | 0.000004 |
| statistics | 0.191296 |
| preparing | 0.000052 |
| Sorting result | 0.000016 |
| executing | 0.000006 |
| Sending data | 1.426275 |
| end | 0.000000 |
| query end | 0.000013 |
| closing tables | 0.000004 |
| freeing items | 0.000031 |
| cleaning up | 0.000026 |
16 rows in set, 1 warning (0,00 sec)


mysql> show profile for query 16;
| Status | Duration |
| starting | 0.000033 |
| checking permissions | 0.000004 |
| Opening tables | 0.000040 |
| init | 0.000066 |
| System lock | 0.000009 |
| optimizing | 0.000004 |
| statistics | 0.000035 |
| preparing | 0.000010 |
| Sorting result | 0.000003 |
| executing | 0.000002 |
| Sending data | 0.000004 |
| Creating sort index | 0.014366 |
| end | 0.000005 |
| query end | 0.000004 |
| closing tables | 0.000006 |
| freeing items | 0.000232 |
| cleaning up | 0.000109 |
17 rows in set, 1 warning (0.00 sec)

Does anyone have any ideas where we should dig further? Our extensive attempts at research and debugging are bringing us nowhere at the moment - thank you in advance for any input.

Best regards

mysqldump import status mysql < filename

Latest Forum Posts - August 25, 2015 - 4:44pm
-bash-3.2$ mysql -u root -pmypassword < Data.sql

ERROR 1142 (42000) at line 35275: SELECT,LOCK TABL command denied to user 'root'@'localhost' for table 'cond_instances'

I see some errors while importing. What happens after the first error: does the import keep going, or does it abort? How do I know whether it has imported all databases/rows, etc.?

Thanks a lot.

Advanced Query Tuning in MySQL 5.6 and MySQL 5.7 Webinar: Q&A

Latest MySQL Performance Blog posts - August 24, 2015 - 7:16am

Thank you for attending my July 22 webinar titled “Advanced Query Tuning in MySQL 5.6 and 5.7” (my slides and a replay are available here). As promised, here is the list of questions and my answers (thank you for your great questions).

Q: Here is the explain example:

mysql> explain extended select id, site_id from test_index_id where site_id=1
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: test_index_id
         type: ref
possible_keys: key_site_id
          key: key_site_id
      key_len: 5
          ref: const
         rows: 1
     filtered: 100.00
        Extra: Using where; Using index

why is site_id a covered index for the query, given the fact that a) we are selecting “id”, b) key_site_id only contains site_id?

As the table is InnoDB, all secondary keys will always contain the primary key (“id”); in this case the secondary index contains all the information needed to satisfy the above query, so key_site_id is a “covering index.”

Q: Applications change over time. Do you suggest doing a periodic analysis of indexes that are being used and drop the ones that are not? If yes, any suggestions as to tackle that?

Yes, that is a good idea. Usually it can be done easily with Percona Toolkit or Performance Schema in MySQL 5.6:

  1. Enable the slow query log and log every query, then use the pt-index-usage tool
  2. Or use the following query (as suggested by FromDual blog post):

SELECT object_schema, object_name, index_name
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE index_name IS NOT NULL
  AND count_star = 0
ORDER BY object_schema, object_name;

Q: If a duplicate index is found on 5.6/5.7, will that cause a performance impact on the db while querying?

Duplicate keys can have negative impact on selects:

  1. MySQL can get confused and choose a wrong index
  2. Total index size can grow, which can cause MySQL to run out of RAM
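As a rough sketch of what a duplicate-key check does (tools like Percona Toolkit's pt-duplicate-key-checker automate this), the logic below flags any index whose column list is a leading prefix of another index on the same table. The index names and column lists are hypothetical:

```python
# Hypothetical index definitions: index name -> ordered column list.
indexes = {
    "key_site_id": ["site_id"],
    "key_site_date": ["site_id", "created_at"],
    "key_status": ["status"],
}

def redundant_indexes(indexes):
    """Return names of indexes that are a leading prefix of a longer index."""
    redundant = []
    for name, cols in indexes.items():
        for other, other_cols in indexes.items():
            # If another index starts with exactly these columns,
            # this index is redundant (the longer one can serve its queries).
            if other != name and other_cols[:len(cols)] == cols:
                redundant.append(name)
                break
    return redundant

print(redundant_indexes(indexes))  # ['key_site_id']
```

Here key_site_id is covered by key_site_dates leading column, so dropping it would shrink total index size without hurting lookups on site_id.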

Q: What is the suggested method to measure performance on queries (other than the slow query log) so as to know where to create indexes?

The slow query log is the most common method. In MySQL 5.6 you can also use Performance Schema and the events_statements_summary_by_digest table.

Q: I’m not sure if this was covered in the webinar but… are there any best-practices for fulltext indexes?

That was not covered in this webinar, however, I’ve done a number of presentations regarding Full Text Indexes. For example: Creating Geo Enabled Applications with MySQL 5.6

Q: What would be the limit on index size or the number of indexes you can define per table?

There are no limits on index size on disk; however, it is good (performance-wise) to have the active indexes fit in RAM.

In InnoDB there are a number of index limitations, i.e. a table can contain a maximum of 64 secondary indexes.

Q:  If a table has two columns you would like to sum, can you have that sum indexed as a calculated index? To add to that, can that calculated index have “case when”?

Just to clarify, this is only a feature of MySQL 5.7 (not released yet).

Yes, it is documented now:

CREATE TABLE triangle (
  sidea DOUBLE,
  sideb DOUBLE,
  sidec DOUBLE AS (SQRT(sidea * sidea + sideb * sideb))
);

Q: I have noticed that you created indexes on columns like DayOfTheWeek with very low cardinality. Shouldn’t that be a bad practice normally?

Yes, you are right! Unless you are doing queries like “select count(*) from … where DayOfTheWeek = 7”, those indexes may not be very useful.
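The rule of thumb behind this answer can be sketched numerically: index selectivity is the ratio of distinct values to total rows, and a low-cardinality column like DayOfTheWeek scores very poorly. The row counts below are hypothetical:

```python
def selectivity(distinct_values, total_rows):
    """Index selectivity: fraction of rows a single key value narrows down to.
    Values near 1.0 mean the index is highly selective; near 0, nearly useless."""
    return distinct_values / total_rows

# A 7-value column (day of week) in a hypothetical 1,000,000-row table:
print(selectivity(7, 1_000_000))        # 7e-06: very low, poor index candidate
# A nearly-unique column in the same table:
print(selectivity(950_000, 1_000_000))  # 0.95: highly selective
```

With selectivity that low, a range scan on the index touches roughly 1/7 of the table, and the optimizer will often prefer a full table scan anyway.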

Q: I saw an article that if you don’t specify a primary key upfront, MySQL/InnoDB creates one in the background (hidden). Is it different from a primary key itself, if most of the WHERE fields used are not in the primary/semi-primary key? And is there a way to identify the tables with the hidden primary key indexes?

The “hidden” primary key will be 6 bytes, which will also be appended (duplicated) to all secondary keys. You can create an INT primary key auto_increment, which will be smaller (if you do not plan to store more than 4 billion rows). In addition, you will not be able to use the hidden primary key in your queries.
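A back-of-envelope sketch of the size difference described above: since the primary key is duplicated into every secondary index entry, a 6-byte hidden key costs more than a 4-byte INT key. The row and index counts below are hypothetical:

```python
def secondary_pk_overhead(pk_bytes, rows, secondary_indexes):
    """Bytes spent duplicating the primary key into all secondary index entries."""
    return pk_bytes * rows * secondary_indexes

# Hypothetical table: 10M rows, 4 secondary indexes.
rows, n_idx = 10_000_000, 4
hidden = secondary_pk_overhead(6, rows, n_idx)    # 6-byte hidden PK
explicit = secondary_pk_overhead(4, rows, n_idx)  # explicit 4-byte INT PK
print(hidden - explicit)  # 80000000 bytes (~76 MiB) saved by declaring an INT PK
```

The savings are on top of the other advantage mentioned: an explicit key can actually be used in your queries, while the hidden one cannot.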

The following query (against information_schema) can be used to find all tables without declared primary key (with “hidden” primary key):

SELECT tables.table_schema, tables.table_name, tables.table_rows
FROM information_schema.tables
LEFT JOIN (
  SELECT table_schema, table_name
  FROM information_schema.statistics
  GROUP BY table_schema, table_name, index_name
  HAVING SUM(
    CASE WHEN non_unique = 0 AND nullable != 'YES' THEN 1 ELSE 0 END
  ) = COUNT(*)
) puks
  ON tables.table_schema = puks.table_schema
 AND tables.table_name = puks.table_name
WHERE puks.table_name IS NULL
  AND tables.table_type = 'BASE TABLE'
  AND engine = 'InnoDB'

You may also use the mysql.innodb_index_stats table to find tables with the hidden primary key:


mysql> select * from mysql.innodb_index_stats;
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name      | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| test          | t1         | GEN_CLUST_INDEX | 2015-08-08 20:48:23 | n_diff_pfx01 |         96 |           1 | DB_ROW_ID                         |
| test          | t1         | GEN_CLUST_INDEX | 2015-08-08 20:48:23 | n_leaf_pages |          1 |        NULL | Number of leaf pages in the index |
| test          | t1         | GEN_CLUST_INDEX | 2015-08-08 20:48:23 | size         |          1 |        NULL | Number of pages in the index      |
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+

Q: You are using alter table to create the index, but how does MySQL sort the data for creating the index? Doesn’t it use a temp table for that?

That is a very good question: the behavior of the “alter table … add index” has changed over time. As documented in Overview of Online DDL:

Historically, many DDL operations on InnoDB tables were expensive. Many ALTER TABLE operations worked by creating a new, empty table defined with the requested table options and indexes, then copying the existing rows to the new table one-by-one, updating the indexes as the rows were inserted. After all rows from the original table were copied, the old table was dropped and the copy was renamed with the name of the original table.

MySQL 5.5, and MySQL 5.1 with the InnoDB Plugin, optimized CREATE INDEX and DROP INDEX to avoid the table-copying behavior. That feature was known as Fast Index Creation

When MySQL uses the “Fast Index Creation” operation, it will create a set of temporary files in MySQL’s tmpdir:

To add a secondary index to an existing table, InnoDB scans the table, and sorts the rows using memory buffers and temporary files in order by the values of the secondary index key columns. The B-tree is then built in key-value order, which is more efficient than inserting rows into an index in random order.
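The sort-then-build idea from the quote above can be sketched as follows. The table contents are hypothetical, and a real server uses an external merge sort on temporary files rather than an in-memory list:

```python
# Hypothetical table rows: (primary_key_value, indexed_column_value).
table = [(1, "banana"), (2, "apple"), (3, "cherry"), (4, "apple")]

def build_secondary_index(rows, key_col=1):
    """Scan the table, then sort (key, row_id) pairs so the B-tree
    can be filled sequentially instead of by random insertion."""
    entries = [(row[key_col], row_id) for row_id, row in enumerate(rows)]
    entries.sort()  # InnoDB does this with memory buffers + temp files
    return entries

print(build_secondary_index(table))
# [('apple', 1), ('apple', 3), ('banana', 0), ('cherry', 2)]
```

Because the entries arrive in key order, each new entry lands at the rightmost leaf of the B-tree, which is far cheaper than descending the tree for every row inserted in random order.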

Q: How good is InnoDB with deadlocks on 5.7 compared to 5.6? Is that based on parameter setup?

A discussion of InnoDB deadlocks is outside the scope of this presentation. Valerii Kravchuk and Nilnandan Joshi did an excellent talk at Percona Live 2015 (slides available): Understanding Innodb Locks and Deadlocks

Q: What is the performance impact of generating a virtual column for a table having 66 million records and generating the index? And how would you go about it? Do you have any suggestions on how to reorganize indexes on the physical disk?

As MySQL 5.7 is not released yet, the behavior of virtual columns may change. The main question here is whether these will be online operations: a) adding a virtual column (as this is only a metadata change it should be a very light operation anyway) and b) adding an index on that virtual column. In the labs release it was not online; however, this can change.

Thank you again for attending.

The post Advanced Query Tuning in MySQL 5.6 and MySQL 5.7 Webinar: Q&A appeared first on MySQL Performance Blog.

In which situations is it suitable to set "tokudb_directio=1"?

Latest Forum Posts - August 22, 2015 - 2:02am
As the title says, I couldn't find in the documentation which situations call for setting "tokudb_directio=1". Does anyone know?

Fresh Install: ERROR! The server quit without updating PID file

Latest Forum Posts - August 21, 2015 - 9:38am
Starting MySQL (Percona Server).. ERROR! The server quit without updating PID file (/var/lib/mysql/mysql.pid).

Fresh install on new server. Created config file using the configuration wizard.

Server will not start. Tried permissions, removing ib_logfile's.

Server is amazon small instance. Any help would be great.

Log file:

150821 16:26:30 mysqld_safe mysqld from pid file /var/lib/mysql/mysql.pid ended
150821 16:27:24 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql/
2015-08-21 16:27:25 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2015-08-21 16:27:25 0 [Warning] 'ERROR_FOR_DIVISION_BY_ZERO' is deprecated and will be removed in a future release.
2015-08-21 16:27:25 0 [Note] /usr/sbin/mysqld (mysqld 5.6.25-73.1-log) starting as process 5547 ...
2015-08-21 16:27:25 5547 [Warning] Using unique option prefix myisam-recover instead of myisam-recover-options is deprecated and will be removed in a future release. Please use the full name instead.
2015-08-21 16:27:25 5547 [Note] Plugin 'FEDERATED' is disabled.
2015-08-21 16:27:25 5547 [Note] InnoDB: Using atomics to ref count buffer pool pages
2015-08-21 16:27:25 5547 [Note] InnoDB: The InnoDB memory heap is disabled
2015-08-21 16:27:25 5547 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2015-08-21 16:27:25 5547 [Note] InnoDB: Memory barrier is not used
2015-08-21 16:27:25 5547 [Note] InnoDB: Compressed tables use zlib 1.2.3
2015-08-21 16:27:25 5547 [Note] InnoDB: Using Linux native AIO
2015-08-21 16:27:25 5547 [Note] InnoDB: Using CPU crc32 instructions
2015-08-21 16:27:25 5547 [Note] InnoDB: Initializing buffer pool, size = 1.4G
2015-08-21 16:27:25 5547 [Note] InnoDB: Completed initialization of buffer pool
2015-08-21 16:27:25 5547 [Note] InnoDB: Restoring page 0 of tablespace 0
2015-08-21 16:27:25 5547 [Warning] InnoDB: Doublewrite does not have page_no=0 of space: 0
2015-08-21 16:27:25 5547 [ERROR] InnoDB: space header page consists of zero bytes in data file ./ibdata1
2015-08-21 16:27:25 5547 [ERROR] InnoDB: Could not open or create the system tablespace. If you tried to add new data files to the system tablespace, and it failed here, you should now edit innodb_data_file_path in my.cnf back to what it was,
and remove the new ibdata files InnoDB created in this failed attempt. InnoDB only wrote those files full of zeros, but did not yet use them in any way. But be careful: do not remove old data files which contain your precious data!
2015-08-21 16:27:25 5547 [ERROR] Plugin 'InnoDB' init function returned error.
2015-08-21 16:27:25 5547 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
2015-08-21 16:27:25 5547 [ERROR] Unknown/unsupported storage engine: InnoDB
2015-08-21 16:27:25 5547 [ERROR] Aborting

2015-08-21 16:27:25 5547 [Note] Binlog end
2015-08-21 16:27:25 5547 [Note] Shutting down plugin 'partition'
2015-08-21 16:27:25 5547 [Note] Shutting down plugin 'ARCHIVE'
2015-08-21 16:27:25 5547 [Note] Shutting down plugin 'BLACKHOLE'
2015-08-21 16:27:25 5547 [Note] Shutting down plugin 'PERFORMANCE_SCHEMA'
2015-08-21 16:27:25 5547 [Note] Shutting down plugin 'INNODB_CHANGED_PAGES'
2015-08-21 16:27:25 5547 [Note] Shutting down plugin 'INNODB_SYS_DATAFILES'
2015-08-21 16:27:25 5547 [Note] Shutting down plugin 'INNODB_SYS_TABLESPACES'
2015-08-21 16:27:25 5547 [Note] Shutting down plugin 'INNODB_SYS_FOREIGN_COLS'
2015-08-21 16:27:25 5547 [Note] Shutting down plugin 'INNODB_SYS_FOREIGN'
2015-08-21 16:27:25 5547 [Note] Shutting down plugin 'INNODB_SYS_FIELDS'
2015-08-21 16:27:25 5547 [Note] Shutting down plugin 'INNODB_SYS_COLUMNS'
2015-08-21 16:27:25 5547 [Note] Shutting down plugin 'INNODB_SYS_INDEXES'
2015-08-21 16:27:25 5547 [Note] Shutting down plugin 'INNODB_SYS_TABLESTATS'
2015-08-21 16:27:25 5547 [Note] Shutting down plugin 'INNODB_SYS_TABLES'
2015-08-21 16:27:25 5547 [Note] Shutting down plugin 'INNODB_FT_INDEX_TABLE'
2015-08-21 16:27:25 5547 [Note] Shutting down plugin 'INNODB_FT_INDEX_CACHE'
2015-08-21 16:27:25 5547 [Note] Shutting down plugin 'INNODB_FT_CONFIG'
2015-08-21 16:27:25 5547 [Note] Shutting down plugin 'INNODB_FT_BEING_DELETED'
2015-08-21 16:27:25 5547 [Note] Shutting down plugin 'INNODB_FT_DELETED'
2015-08-21 16:27:25 5547 [Note] Shutting down plugin 'INNODB_FT_DEFAULT_STOPWORD'
2015-08-21 16:27:25 5547 [Note] Shutting down plugin 'INNODB_METRICS'
2015-08-21 16:27:25 5547 [Note] Shutting down plugin 'INNODB_BUFFER_POOL_STATS'

Find unused indexes on MongoDB and TokuMX

Latest MySQL Performance Blog posts - August 21, 2015 - 8:33am

Finding and removing unused indexes is a pretty common technique to improve the overall performance of relational databases. Fewer indexes mean faster inserts and updates, but also less disk space used. The usual way to do it is to log all queries’ execution plans and then get a list of those indexes that are not used. The same theory applies to MongoDB and TokuMX, so in this blog post I’m going to explain how to find those.

Profiling in MongoDB

To understand what profiling is, you only need to think of MySQL’s slow query log: it is basically the same idea. It can be enabled with the following command:

db.setProfilingLevel(level, slowms)

There are three different levels:

0: No profiling enabled.
1: Only those queries slower than “slowms” are profiled.
2: All queries are profiled, similar to long_query_time=0.

Once it is enabled you can use db.system.profile.find().pretty() to read it. You would need to scan through all profiles and find those indexes that are never used. To make things easier there is a JavaScript program that will find the unused indexes after reading all the profile information. Unfortunately, it only works with MongoDB 2.x.

The JavaScript is hosted in this GitHub project: https://github.com/wfreeman/indexalizer. You just need to start the mongo shell with indexStats.js and run the db.indexStats() command. This is a sample output:

scanning profile {ns:"test.col"} with 2 records... this could take a while.
{
    "query" : { "b" : 1 },
    "count" : 1,
    "index" : "",
    "cursor" : "BtreeCursor b_1",
    "millis" : 0,
    "nscanned" : 1,
    "n" : 1,
    "scanAndOrder" : false
}
{
    "query" : { "b" : 2 },
    "count" : 1,
    "index" : "",
    "cursor" : "BtreeCursor b_1",
    "millis" : 0,
    "nscanned" : 1,
    "n" : 1,
    "scanAndOrder" : false
}
checking for unused indexes in: col
this index is not being used: "_id_"
this index is not being used: "a_1"


So “a_1” is not used and could be dropped. We can ignore “_id_” because that one is needed by MongoDB itself.

There is a problem with profiling: it will affect performance, so you should run it only for a few hours, and usually during off-peak times. That means there is a possibility that not all the queries your application can issue will be executed during that maintenance window. What alternative does TokuMX provide?

Finding unused indexes in TokuMX

Good news for all of us. TokuMX doesn’t require you to enable profiling. Index usage statistics are stored as part of every query execution and you can access them with a simple db.collection.stats() command. Let me show you an example:

> db.col.stats()
[...]
{
    "name" : "a_1",
    "count" : 5,
    "size" : 140,
    "avgObjSize" : 28,
    "storageSize" : 16896,
    "pageSize" : 4194304,
    "readPageSize" : 65536,
    "fanout" : 16,
    "compression" : "zlib",
    "queries" : 0,
    "nscanned" : 0,
    "nscannedObjects" : 0,
    "inserts" : 0,
    "deletes" : 0
},
{
    "name" : "b_1",
    "count" : 5,
    "size" : 140,
    "avgObjSize" : 28,
    "storageSize" : 16896,
    "pageSize" : 4194304,
    "readPageSize" : 65536,
    "fanout" : 16,
    "compression" : "zlib",
    "queries" : 2,
    "nscanned" : 2,
    "nscannedObjects" : 2,
    "inserts" : 0,
    "deletes" : 0
}
],
"ok" : 1
}


There are our statistics, without profiling enabled. “queries” means the number of times that index has been used in a query execution. b_1 has been used twice and a_1 has never been used. You can use this small piece of JavaScript code I’ve written to scan all collections inside the current database:

db.forEachCollectionName(function (cname) {
  output = db.runCommand({ collstats: cname });
  print("Checking " + output.ns + "...");
  output.indexDetails.forEach(function (findUnused) {
    if (findUnused.queries == 0) {
      print("Unused index: " + findUnused.name);
    }
  });
});


An example using the same data:

> db.forEachCollectionName(function (cname) {
...   output = db.runCommand({ collstats: cname });
...   print("Checking " + output.ns + "...");
...   output.indexDetails.forEach(function (findUnused) {
...     if (findUnused.queries == 0) { print("Unused index: " + findUnused.name); }
...   });
... });
Checking test.system.indexes...
Checking test.col...
Unused index: a_1
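For readers more comfortable outside the mongo shell, the same "queries == 0" filter can be sketched in Python over a collstats-style document. The field names follow the stats output above; the document itself is hypothetical:

```python
# A trimmed-down, hypothetical collstats document mirroring the output above.
stats = {
    "ns": "test.col",
    "indexDetails": [
        {"name": "a_1", "queries": 0},
        {"name": "b_1", "queries": 2},
    ],
}

def unused_indexes(collstats):
    """Return names of indexes whose usage counter never moved."""
    return [idx["name"] for idx in collstats["indexDetails"]
            if idx["queries"] == 0]

print(unused_indexes(stats))  # ['a_1']
```

In practice you would fetch the stats document per collection (as the shell script does) and feed each one through this filter before deciding which indexes to drop.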



Finding unused indexes is a regular task that every DBA should do. In MongoDB you have to use profiling while in TokuMX nothing needs to be enabled because it will gather information by default without impacting service performance.

The post Find unused indexes on MongoDB and TokuMX appeared first on MySQL Performance Blog.

Replication setup

Latest Forum Posts - August 21, 2015 - 12:16am
I have a master-slave replication setup on Windows Server with MySQL 5.6.22, without GTID, and with the below parameters in my.ini:

masterA> my.ini


master status: mysql-bin-0002.log POS: 19548

slaveB> my.ini

relay_log_index= mysql-relay-bin.index

On slave: mysql-bin-0002.log POS: 19548 (as I have log-bin and log-slave-updates enabled)

I need to promote the slave to master and the master to slave, but I want the new slave to continue from mysql-bin-0002.log POS: 19548. What I need to know is whether the steps below are correct. Is it necessary to issue a RESET SLAVE command on the slave?

1. Stop tomcat so that there are no connections to mysql.
2. Verify whether master and slave are in sync; if yes, proceed further.
3. On SLAVE> stop slave; Change master to master_host=''; ###I will keep it blank.
4. On SLAVE> comment read_only parameter
5. On MASTER> Stop Mysql service. Add below parameters in my.ini
relay_log_index= mysql-relay-bin.index
6. On MASTER> start mysql service. Issue change master to command.
7. Start slave.
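For step 2, the sync check boils down to comparing the coordinates from SHOW MASTER STATUS on the master with the Relay_Master_Log_File / Exec_Master_Log_Pos fields from SHOW SLAVE STATUS on the slave. A rough Python sketch of that comparison (the field names are the real ones MySQL reports; fetching them from a live server is left out):

```python
def replicas_in_sync(master_status, slave_status):
    """True when the slave has executed everything the master has written."""
    return (master_status["File"] == slave_status["Relay_Master_Log_File"]
            and master_status["Position"] == slave_status["Exec_Master_Log_Pos"])

# Values taken from the coordinates quoted in this post.
master = {"File": "mysql-bin-0002.log", "Position": 19548}
slave = {"Relay_Master_Log_File": "mysql-bin-0002.log",
         "Exec_Master_Log_Pos": 19548}

print(replicas_in_sync(master, slave))  # True
```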

Optimizing PXC Xtrabackup State Snapshot Transfer

Latest MySQL Performance Blog posts - August 20, 2015 - 8:38am
State Snapshot Transfer (SST) at a glance

PXC uses a protocol called State Snapshot Transfer to provision a node joining an existing cluster with all the data it needs to synchronize.  This is analogous to cloning a slave in asynchronous replication:  you take a full backup of one node and copy it to the new one, while tracking the replication position of the backup.

PXC automates this process using scriptable SST methods.  The most common of these is the xtrabackup-v2 method, which is the default in PXC 5.6.  Xtrabackup is generally favored over other SST methods because it is non-blocking on the Donor node (the node contributing the backup).

The basic flow of this method is:

  • The Joiner:
    • joins the cluster
    • Learns it needs a full SST and clobbers its local datadir (the SST will replace it)
    • prepares for a state transfer by opening a socat on port 4444 (by default)
    • The socat pipes the incoming files into the datadir/.sst directory
  • The Donor:
    • is picked by the cluster (could be configured or be based on WAN segments)
    • starts a streaming Xtrabackup and pipes the output of that via socat to the Joiner on port 4444.
    • Upon finishing its backup, sends the Joiner an indication of this along with the final Galera GTID of the backup
  • The Joiner:
    • Records all changes from the Donor’s backup’s GTID forward in its gcache (and overflow pages, this is limited by available disk space)
    • runs the --apply-log phase of Xtrabackup on the Donor's backup
    • Moves the datadir/.sst directory contents into the datadir
    • Starts mysqld
    • Applies all the transactions it needs (Joining and Joined states just like IST does it)
    • Moves to the ‘Synced’ state and is done.

There are a lot of moving pieces here, and nothing is really tuned by default.  On larger clusters, SST can be quite scary because it may take hours or even days.  Any failure can mean starting over from the beginning.

This blog will concentrate on some ways to make a good dent in the time SST can take.  Many of these methods are trade-offs and may not apply to your situation.  Further, there may be other ways I haven’t thought of to speed things up, so please share what you’ve found that works!

The Environment

I am testing SST on a PXC 5.6.24 cluster in AWS.  The nodes are c3.4xlarge and the datadirs are RAID-0 over the two ephemeral SSD drives in that instance type.  These instances are all in the same region.

My simulated application is using only node1 in the cluster and is sysbench OLTP with 200 tables with 1M rows each.  This comes out to just under 50G of data.  The test application runs on a separate server with 32 threads.

The PXC cluster itself is tuned to best practices for Innodb and Galera performance


In my first test the cluster is a single member (receiving workload) and I am  joining node2.  This configuration is untuned for SST.  I measured the time from when mysqld started on node2 until it entered the Synced state (i.e., fully caught up).  In the log, it looked like this:

150724 15:59:24 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
... lots of other output ...
2015-07-24 16:48:39 31084 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 4647341)

Doing some math on the above (15:59:24 to 16:48:39), we find that the SST took just over 49 minutes to complete.
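The arithmetic is simply the difference between the two log timestamps; a quick Python check:

```python
from datetime import datetime

# Timestamps from the mysqld log excerpt above.
start = datetime.strptime("2015-07-24 15:59:24", "%Y-%m-%d %H:%M:%S")
synced = datetime.strptime("2015-07-24 16:48:39", "%Y-%m-%d %H:%M:%S")

print(synced - start)  # 0:49:15
```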


One of the first things I noticed was that the --apply-log step on the Joiner was very slow.  Anyone who uses Xtrabackup a lot will know that --apply-log will be a lot faster if you give it some extra RAM to use while making the backup consistent via the --use-memory option.  We can set this in our my.cnf like this:

[sst]
inno-apply-opts="--use-memory=20G"

The [sst] section is a special one understood only by the xtrabackup-v2 script.  inno-apply-opts allows me to specify arguments to innobackupex when it runs.

Note that this change only applies to the Joiner (i.e., you don’t have to put it on all your nodes and restart them to take advantage of it).

This change immediately makes a huge improvement to our above scenario (node2 joining node1 under load) and the SST now takes just over 30 minutes.


Another slow part of getting to Synced is how long it takes to apply transactions up to real time after the backup is restored and in place on the Joiner.  We can improve this throughput by increasing the number of apply threads on the Joiner to make better use of the CPU.  Prior to this, wsrep_slave_threads was set to 1, but if I increase this to 32 (there are 16 cores on this instance type), my SST now takes 25m 32s.
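For reference, a my.cnf sketch of that change; the value 32 matches this particular 16-core instance and is an assumption you should size to your own hardware:

```ini
[mysqld]
# Apply threads on the Joiner; 2x core count worked well on this c3.4xlarge
wsrep_slave_threads = 32
```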


xtrabackup-v2 supports adding a compression process into the datastream.  On the Donor it compresses and on the Joiner it decompresses.  This allows you to trade CPU for transfer speed.  If your bottleneck turns out to be network transport and you have spare CPU, this can help a lot.

Further, I can use pigz instead of gzip to get parallel compression, but theoretically any compression utility can work as long as it can compress and decompress standard input to standard output.  I install the ‘pigz’ package on all my nodes and change my my.cnf like this:

[sst]
inno-apply-opts="--use-memory=20G"
compressor="pigz"
decompressor="pigz -d"

Both the Joiner and the Donor must have the respective decompressor and compressor settings or the SST will fail with a vague error message (not actually having pigz installed will do the same thing).

By adding compression, my SST is down to 21 minutes, but there’s a catch.  My application performance starts to take a serious nose-dive during this test.  Pigz is consuming most of the CPU on my Donor, which is also my primary application node.  This may or may not hurt your application workload in the same way, but it emphasizes the importance of understanding (and measuring) the performance impact SST has on your Donor nodes.

Dedicated donor

To alleviate the problem with the application, I now leave node2 up and spin up node3.  Since I’m expecting node2 to normally not be receiving application traffic directly, I can configure node3 to prefer node2 as its donor like this:

[mysqld]
...
wsrep_sst_donor = node2,

When node3 starts, this setting instructs the cluster that node3 is the preferred donor, but if that’s not available, pick something else (that’s what the trailing comma means).

Donor nodes are permitted to fall behind in replication apply as needed without sending flow control.  Sending application traffic to such a node may see an increase in the amount of stale data as well as certification failures for writes (not to mention the performance issues we saw above with node1).  Since node2 is not getting application traffic, moving into the Donor state and doing an expensive SST with pigz compression should be relatively safe for the rest of the cluster (in this case, node1).

Even if you don’t have a dedicated donor, if you use a load balancer of some kind in front of your cluster, you may elect to consider Donor nodes as failing their health checks so application traffic is diverted during any state transfer.

When I brought up node3, with node2 as the donor, the SST time dropped to 18m 33s.


Each of these tunings helped the SST speed, though the later adjustments perhaps had less of a direct impact.  Depending on your workload, database size, network and available CPU, your mileage may of course vary, and your tunings should vary accordingly.  Also realize you may actually want to limit (and not increase) the speed of state transfers in some cases to avoid other problems.  For example, I’ve seen several clusters get unstable during SST, and the only explanation was that the network bandwidth consumed by the state transfer was preventing the actual Galera communication between the nodes.  Be sure to consider the overall state of production when tuning your SSTs.

The post Optimizing PXC Xtrabackup State Snapshot Transfer appeared first on MySQL Performance Blog.

When will Percona Server 5.7 become available for download from the Percona repositories?

Latest Forum Posts - August 20, 2015 - 7:21am
When will Percona Server 5.7 ready-to-use packages become available for download for Ubuntu from the Percona repositories?

pt-online-schema-change for RDS instance with Read Replica behind VPC

Latest Forum Posts - August 20, 2015 - 5:54am

I require running pt-online-schema-change for a MySQL 5.6 DB instance under AWS RDS.
The same instance is behind a VPC, and it has a Read Replica. AWS manages its VPC read replicas under a different subnet than the original instance.

How would you go about executing this tool in this case?
When I tried doing it using --recursion-method processlist, I received an error that it couldn't connect to the Read Replica - Cannot connect to D=<schema>,P=3306,h=<replica-ip>,p=...,u=<user>
No slaves found. See --recursion-method if host <db-instance-host> has slaves.
Not checking slave lag because no slaves were found and --check-slave-lag was not specified.
Can't use an undefined value as an ARRAY reference at /usr/bin/pt-online-schema-change line 7135, <STDIN> line 1.

Is it possible that the replication under the VPC works with a different IP than the "public IP" of the Replica, and that's why Percona is unable to discover the replica?

Thanks for your assistance,

load 9GB sql

Latest Forum Posts - August 20, 2015 - 4:56am
Hello. I need to move a database from one server (MySQL 5.1.73) to another running Percona Server 5.6.25-73.1.
I made a dump
mysqldump --opt db > db.sql
copy db.sql
and trying to load it to mysql
mysql db < db.sql

db.sql - 9GB
DB engine - Innodb.
CentOS 7.1 64 Software raid 1 2xSata disk tps 150-250

It has already been loading for 6 hours, and it seems only about half of the data has loaded. Can I somehow speed up this process?

Number of queries with response time slower than 3ms using pt-query-digest

Latest Forum Posts - August 19, 2015 - 11:47pm
I want to know how many queries have response time more than 3 milliseconds. How can I know that using pt-query-digest ?
I'm using '--type tcpdump' and everything works fine but really don't know how to create a rule to get both "total number of queries" and "total number of queries with response time more than 3ms".
Any help would be appreciated.

Upgrade/Patch MySQL 5.5.42-37.1 Percona Server to 37.3

Latest Forum Posts - August 19, 2015 - 2:20pm
I am new to the MySQL world. We have MySQL 5.5.42-37.1 Percona Server. Does Percona release Critical Patch Updates (security-related), very similar to Oracle Database's Critical Patch Update?
If not, what is the process to go from MySQL 5.5.42-37.1 to 37.3? Does the process include installing new binaries, running some database patch script, etc.? Can someone point me to the correct documentation?

How much could you benefit from MySQL 5.6 parallel replication?

Latest MySQL Performance Blog posts - August 19, 2015 - 3:00am

I have heard this question quite often: “At busy times, our replicas start lagging quite frequently. We are using N schemas, so what performance boost could we expect from MySQL 5.6 parallel replication?” Here is a quick way to give you a rough estimate of the potential benefit.

General idea

In MySQL 5.6, parallelism is added at the schema level. So in theory, if you have N schemas and if you use N parallel threads, replication could be up to N times faster. This assumes at least 2 things:

  • Replication throughput scales linearly with the number of parallel threads.
  • Writes are evenly distributed across schemas.

Neither assumption is, of course, realistic. But it is easy to know the distribution of writes, and that can already give you an idea about how much you could benefit from parallel replication.

Writes are stored in binary logs but it is much easier to work with the slow query log, so we can enable full slow query logging for some time with long_query_time = 0 and then use pt-query-digest to analyze the resulting log file.

An example

I have a test server with 3 schemas, and I’ve run some sysbench load on it to get a decent slow query log file. Once done, I can run this command:

pt-query-digest --filter '$event->{arg} !~ m/^select|^set|^commit|^show|^admin|^rollback|^begin/i' --group-by db --report-format profile slow_query.log > digest.out

and here is the result I get:

# Profile
# Rank Query ID Response time  Calls  R/Call V/M  Item
# ==== ======== ============== ====== ====== ==== ====
#    1 0x       791.6195 52.1% 100028 0.0079 0.70 db3
#    2 0x       525.1231 34.5% 100022 0.0053 0.68 db1
#    3 0x       203.4649 13.4% 100000 0.0020 0.64 db2

In a perfect world, with 3 parallel threads and if each schema would handle 33% of the total write workload, I could expect a 3x performance improvement.

However here we can see in the report that the 3 replication threads will only work simultaneously 25% of the time in the best case (13.4/52.1 = 0.25). We can also expect 2 replication threads to work simultaneously for some part of the workload, but let’s ignore that for clarity.

It means that instead of the theoretical 200% performance improvement (3 parallel threads 100% of the time), we can hardly expect more than a 50% performance improvement (3 parallel threads 25% of the time). And the reality is that the benefit will be much lower than that.
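The estimate can be reproduced with a few lines of Python, using the per-schema response-time shares from the digest profile above (the post rounds 13.4/52.1 down to 25%):

```python
# Back-of-envelope bound on 5.6 schema-level parallelism.
# Shares are the % of write response time per schema from pt-query-digest.
shares = {"db3": 52.1, "db1": 34.5, "db2": 13.4}

# All three applier threads can only be busy together while the
# least-written schema still has work, relative to the busiest schema.
overlap = min(shares.values()) / max(shares.values())
print(round(overlap, 2))  # fraction of time all 3 threads work simultaneously
```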


Parallel replication in MySQL 5.6 is a great step forward, however don’t expect too much if your writes are not evenly distributed across all your schemas. The pt-query-digest trick I shared can give you a rough idea whether your workload is a good fit for multi-threaded slaves in 5.6.

I’m expecting much better results for 5.7, partly because parallelism is handled differently, but also because you can tune how efficient parallel replication will be by adjusting the binlog group commit settings.

The post How much could you benefit from MySQL 5.6 parallel replication? appeared first on MySQL Performance Blog.

Featured Talk: The Future of Replication is Today: New Features in Practice

Latest MySQL Performance Blog posts - August 18, 2015 - 4:17pm

In the past years, MySQL 5.6, MySQL 5.7 and MariaDB 10 have all been successful in implementing new features. For many DBAs, the “old way” of replicating data is comfortable, so taking the action to implement these new features seems like a momentous leap rather than a simple step. But perhaps it isn’t that complicated…

Giuseppe Maxia, a Quality Assurance Architect at VMware and loyal member of the Percona Live Conference Committee, will be presenting “The Future of Replication is Today: New Features in Practice” at the Percona Live Data Performance Conference this September in Amsterdam.
Percona’s Community Manager, Tom Diederich, had an opportunity to catch up with Giuseppe last week and get an in-depth look at some of the items Giuseppe will be covering in his talk, in addition to getting his take on some of the hot sessions to hit while at the conference.  This is how it went:

(Hint: Read to the end to find a special discount code) 


Tom: Your talk is titled, “The Future of Replication is today: new features in practice.” What are the top 3 areas in which replication options have improved in MySQL 5.6, MySQL 5.7, and MariaDB 10?

Giuseppe: Replication has been stagnant for over 10 years. Before MySQL 5.6, the only important change in the technology was the introduction of row-based replication in 2008. After that, we had to wait till 2013 to see global transaction identifiers in MySQL 5.6, followed by the same feature, with a different implementation, in 2014 with MariaDB 10. GTID has been complemented, in both flavors, with crash-safe replication tables, a feature that guarantees a reliable resume of replication after a server failure. There is also the parallel applier, a minor feature that has been implemented in both MySQL 5.6 and MariaDB, and improved in the latest versions, although it seems to lack proper support for monitoring. The last feature that was introduced in MySQL 5.6 and MariaDB 10 is multi-source replication, i.e. the ability to replicate from multiple masters to a single slave. In both editions, the implementation is quite simple, and not so different from what DBAs are used to doing for regular replication.

Tom: For DBAs, how difficult will it be to make the change from the “old way” of replicating data — to stop using the same comfortable features that have been around for several years — and put into practice some of the latest features?

Giuseppe: The adoption of new features can be deceptively simple. For example, GTID in MariaDB comes out of the box, and its adoption could be as easy as running a backup followed by a restore, but it can produce unpleasant results if you try to combine this feature with multi-source replication without planning ahead. That said, the transition could be simpler than its counterpart in MySQL. MySQL 5.6 and 5.7 require some reconfiguration to run GTID, and users can face unpleasant failures due to the complexity of the rules applying to this feature. They will need to read the manual thoroughly and test the deployment extensively before trusting an upgrade in production. For multi-source replication, the difficulties are, in my experience, hidden in the users’ expectations. When speaking about multi-source (or multi-masters, as it is commonly referred to), many users have the mistaken expectation that they can easily insert anything into multiple masters as if they were doing it on a single server. However, the nature of asynchronous replication and the current implementation of multi-source topologies do not handle conflicts, and this fact will probably surprise and anger the early adopters.

Tom: What is still missing in replication technology? How can MySQL improve?

Giuseppe: There are two areas where the current implementation is lacking. The first one is monitoring data: while new features have been adding up in replication, not enough effort has been made to cover the monitoring needs. The current way of monitoring replication is hard-wired around the original replication feature, and little has been done to give users a deeper view of what is going on. With the latest releases at our disposal, we can run parallel replication using multiple masters, and yet we have very little visibility into what goes on inside the dozens of threads that the new features can unchain inside a single slave. It’s like flying an F1 racing car with the dashboard of a Ford Model T. MySQL 5.7 has moved a few steps in that direction, with the new replication tables in performance_schema, but it is still a drop in the ocean compared to what we need. The second area where replication is still too much tied to its past is heterogeneous replication. While relational databases are still dominating the front-end of the web economy, the back-end is largely being run by different structures, such as Hadoop, MongoDB and Cassandra. Moving data back and forth between relational storage and its growing siblings has become an urgent need. There have been a few sparks of change in this direction, but nothing that qualifies as a promising change.

Tom: Which other session(s) are you most looking forward to besides your own?

Giuseppe: I am always interested in the sessions that explain and discuss new features. I am most interested in the talks by Oracle engineers, who have been piling up many features in the latest years, and I am sure they have something more up their sleeve that will appear at the conference. I also eagerly attend sessions about complementary tools, which are usually highly educational and often give me more ideas.

Want to read more on the topic? Visit Giuseppe’s blog:

 MySQL Replication Monitoring 101

The Percona Live Data Performance Conference is the premier event for the rich and diverse MySQL, NoSQL and data in the cloud ecosystems in Europe. It is the place to be for the open source community as well as businesses that thrive in the MySQL, NoSQL, cloud, big data and IoT (Internet of Things) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

This year’s conference will feature one day of tutorials and two days of keynote talks and breakout sessions related to MySQL, NoSQL and Data in the Cloud. Attendees will get briefed on the hottest topics, learn about building and maintaining high-performing deployments and hear from top industry leaders.

The Percona Live Europe Data Performance Conference will be September 21-23 at the Mövenpick Hotel Amsterdam City Centre.

Register using code “FeaturedTalk” and save 20 euros off of registration!

Hope to see you in Amsterdam!

The post Featured Talk: The Future of Replication is Today: New Features in Practice appeared first on MySQL Performance Blog.

innobackupex support for MySQL 5.7.8

Latest Forum Posts - August 18, 2015 - 3:40pm
Is there a way to overcome this issue? I'm just testing out the tool.

150818 22:25:14 innobackupex: Connecting to MySQL server with DSN 'dbi:mysql:;mysql_read_default_group=xtrabackup;mysql_socket=/tmp/mysql.sock' as 'root' (using password: YES).
150818 22:25:14 innobackupex: Connected to MySQL server
innobackupex: Error: Unsupported server version: '5.7.8-rc' (InnoDB version: '5.7.8'). Please report a bug at https://bugs.launchpad.net/percona-xtrabackup

Percona Cluster - Single node crashed and could not join the cluster back again

Latest Forum Posts - August 18, 2015 - 7:56am
I started a percona cluster with 3 nodes in docker containers. I use version 5.6.24. The cluster worked fine. After this I had the below observations.
i) When I uninstalled all the nodes, and installed them back again, I could see them reforming the cluster without any issues.
ii) When I brought down a single node from the cluster, the remaining 2 nodes formed the quorum and continued to provide service. Down the line, when I brought this third node up again, it was not able to join the existing cluster it had previously been part of. I verified that I start MySQL with the correct gcomm:// list of the two existing peers in the cluster. But in spite of that, this third node could not join back the cluster. I could see the below error in the MySQL logs.
[ERROR] WSREP: gcs/src/gcs_group.cpp:void group_post_state_exchange(gcs_group_t*)():319: Reversing History: 0 -> 0, this member has applied 140082893529968 more events than primary component. Data loss is possible. Aborting.
[Note] WSREP: mysqld: Terminated
iii) Now, when I manually removed the grastate.dat file from the third node's host-mounted MySQL data directory, it could rejoin the cluster properly. Is this the expected way to make it work, or am I missing something? What is the method for rejoining a crashed node back into an existing cluster, apart from specifying the proper gcomm:// startup peer info?
Seeking your expert advice on this. Thanks in advance.

ibd file very big when converting to InnoDB engine

Latest Forum Posts - August 18, 2015 - 5:35am
I have a special question: an InnoDB table becomes very big when I convert it from MyISAM to InnoDB. The table looks like this:
The original table uses the MyISAM engine; its data and index file sizes are:
-rw-r----- 1 mysql mysql 8.7K Aug 17 13:34 udb_home_follow.frm
-rw-r----- 1 mysql mysql 1.9G Aug 17 13:35 udb_home_follow.MYD
-rw-r----- 1 mysql mysql 1.5G Aug 17 13:35 udb_home_follow.MYI

CREATE TABLE `udb_home_follow` (
  `uid` int(11) unsigned NOT NULL DEFAULT '0',
  `username` varchar(255) NOT NULL DEFAULT '',
  `followuid` int(11) unsigned NOT NULL DEFAULT '0',
  `fusername` varchar(255) NOT NULL DEFAULT '',
  `bkname` varchar(255) NOT NULL DEFAULT '',
  `status` tinyint(1) NOT NULL DEFAULT '0',
  `mutual` tinyint(1) NOT NULL DEFAULT '0',
  `dateline` int(10) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`uid`,`followuid`),
  KEY `idx_fuid` (`followuid`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8

There are about 60 million records in the udb_home_follow table; most of the records look like this:
+----------+-----------------------+-----------+--------------------+--------+--------+--------+------------+
| uid      | username              | followuid | fusername          | bkname | status | mutual | dateline   |
+----------+-----------------------+-----------+--------------------+--------+--------+--------+------------+
|  6235195 | curapikt87            |  14644915 | coolmayi           |        |      0 |      0 | 1347433303 |
| 17425562 | 1442453108xia         |  14919966 | zczczczc0          |        |      0 |      0 | 1347433673 |
+----------+-----------------------+-----------+--------------------+--------+--------+--------+------------+

           Name: udb_home_follow
         Engine: MyISAM
        Version: 10
     Row_format: Dynamic
           Rows: 62833131
 Avg_row_length: 31
    Data_length: 1948544532
Max_data_length: 281474976710655
   Index_length: 1563675648
      Data_free: 0

After I execute either of the following SQL queries, the new InnoDB table becomes very huge:
alter table udb_home_follow engine = innodb;
insert into udb_innodb select * from udb_home_follow;

For the InnoDB table, the reported number of rows seems to have increased tenfold, and the ibd file has grown to 267G:
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 641272410
 Avg_row_length: 73
    Data_length: 237746100224
Max_data_length: 0
   Index_length: 33729784832
      Data_free: 3306556672

-rw-rw---- 1 mysql mysql 8.7K Aug 18 17:45 udb_innodb.frm
-rw-rw---- 1 mysql mysql 267G Aug 18 20:16 udb_innodb.ibd

My system environment:
DELL PowerEdge R610, RAM 32G
System: CentOS release 6.4 (Final)
Percona-Server-5.5.33-rel31.1-566.Linux.x86_64 or Percona-Server-5.5.23-rel25.3-240.Linux.x86_64

I have no idea about this behavior; thanks a lot to everyone for your help.