Tag - disk seek

Indexing Big Data – NSF Workshop on Research Directions in the Principles of Parallel Computation

I attended the NSF Workshop on Research Directions in the Principles of Parallel Computation in Pittsburgh on 6/27/12. The workshop brought together researchers from academia and industry to explore visions for the future of parallel computing. I was one of 17 invited speakers. We were asked to address the question: “what are three big research […]

Read more

OldSQL Tricks or NewSQL Treats

Why do B-trees need “Tricks” to work?
Marko Mäkelä recently posted a couple of “tips and tricks” you can use to improve InnoDB performance. Tips and tricks. A general purpose relational database like MySQL shouldn’t need “tips and tricks” to perform well, and I lay the blame on design choices that were made in […]

Read more

On “Replace Into”, “Insert Ignore”, Triggers, and Row Based Replication

In posts on June 30 and July 6, I explained how implementing the commands “replace into” and “insert ignore” with TokuDB’s fractal trees data structures can be two orders of magnitude faster than implementing them with B-trees. Towards the end of each post, I hinted at that there are some caveats that complicate the […]

Read more

On “Replace Into”, “Insert Ignore”, and Secondary Keys

In posts on June 30 and July 6, I explained how implementing the commands “replace into” and “insert ignore” with TokuDB’s fractal trees data structures can be two orders of magnitude faster than implementing them with B-trees. Towards the end of each post, I hinted at that there are some caveats that complicate the […]

Read more

Why “insert … on duplicate key update” May Be Slow, by Incurring Disk Seeks

In my post on June 18th, I explained why the semantics of normal ad-hoc insertions with a primary key are expensive because they require disk seeks on large data sets. I previously explained why it would be better to use “replace into” or to use “insert ignore” over normal inserts. In this post, I […]

Read more

Making “Insert Ignore” Fast, by Avoiding Disk Seeks

In my post from three weeks ago, I explained why the semantics of normal ad-hoc insertions with a primary key are expensive because they require disk seeks on large data sets. Towards the end of the post, I claimed that it would be better to use “replace into” or “insert ignore” over normal inserts, […]

Read more

Making “Replace Into” Fast, by Avoiding Disk Seeks

In this post two weeks ago, I explained why the semantics of normal ad-hoc insertions with a primary key are expensive because they require disk seeks on large data sets. Towards the end of the post, I claimed that it would be better to use “replace into” or “insert ignore” over normal inserts, because […]

Read more

Making Updates Fast, by Avoiding Disk Seeks

The analysis that shows how to make deletions really fast by using clustering keys and TokuDB’s fractal tree based engine also applies to make updates really fast. (I left it out of the last post to keep the story simple). As a quick example, let’s look at the following statement:

Shell

update foo set price=price+1 where […]

Read more

Disk seeks are evil, so let’s avoid them, pt. 4

Continuing in the theme from previous posts, I’d like to examine another case where we can eliminate all disk seeks from a MySQL operation and therefore get two orders-of-magnitude speedup. The general outline of these posts is:

B-trees do insertion disk seeks. While they’re at it, they piggyback some other work on the […]

Read more

Making Deletions Fast, by Avoiding Disk Seeks

In my last post, I discussed how fractal tree data structures can be up to two orders of magnitude faster on deletions over B-trees. I focused on the deletions where the row entry is known (the storage engine API handler::delete_row), but I did not fully analyze how MySQL delete statements can be fast. In […]

Read more