So far, I’ve analyzed point and range queries. Now it’s time to talk about insertions and deletions. We’ll call the combination updates. Updates come in two flavors, and today we’ll cover both.
Depending on the exact settings of your database, the updates give a varying amount of feedback. For example, when a key is deleted, all rows with that key are deleted (assuming the database allows duplicate keys). The normal behavior is to return the number of rows deleted. The normal behavior when deleting a key that has no corresponding rows in the database is to return an error message. On insertion, one can allow duplicate or not. In the latter case, the storage engine returns an error message if a duplication insertion is attempted.
We’ll see that the details of error messages have a profound influence on the lower-bound arguments I’ve been making (and we’ll see a bit later that they have a profound influence on how quickly you can actually implement these operations). Here, we define two classes of update operations:
These are the insertions and deletions that come by default in MySQL, and many other databases.
- Insertions let you know if there’s already a row with that key in the database.
- Unique-key indices: Returns an error on the insertion of a repeated key.
- Deleting a key will tell you how many rows had that key.
- Deletion of a non-existing key will return an error.
As I say, these are the types of updates that we are all used to.
The alternative to strict updates is lenient updates. In a sense, they are equivalent, in that there’s nothing you can do with one that you can’t do with the other. It’s just going to turn out that lenient updates can be made to run much faster.
- Insertions do not let you know if there’s already a row with that key in the database.
- Unique-key indices: Overwrites with new value on the insertion of a repeated key. Does not return an error.
- Deleting a key will not tell you how many rows had that key.
- Deletion of a non-existing key will not return an error.
What’s the upshot? I’ve been presenting operations in terms of lower bounds that are dominated by disk seeks (slow!) versus those dominated by disk bandwidth (much faster). See if you can use this analysis to understand the behavior of these two types of updates.