Tagged with: [ innodb ] [ MySQL ] [ transaction ]
When asking what THE advantage of InnoDB over other MySQL engines like MyISAM is, then 9 out of 10 times the answer will be that InnoDB supports transactions. And it’s true. But there is more about transactions than meets the eye. Let’s explore one of the most difficult area’s: isolation levels.
First, some small theory on RDBMSes. InnoDB is ACID compliant system. This means that InnoDB is:
A transaction or group of statements are either all executed successfully, or all will fail. It is not possible that a query succeeds, while the next one in the transaction fails. This is an important aspect when you are in the process of transferring money from one account number to another (you don’t want to loose the money in case the database server fails).
The database is consistent before and after a transaction. This means that for instance foreign key relations are valid before and after a transaction. Allthough the database CAN be in a inconsistent state during the transaction, it’s ok as long as at the end of the transaction the database is made consistent again.
Transaction A cannot interfere with transaction B. This means all the data we update in transaction A is not visible in transaction B until transaction A actually commits the data. This I believe is the hardest property for databases because we need to find a balance between (true) isolation and performance. We talk about this during this post.
Data in the database should be permanently stored. Or, better said: when the database tells the client (you) the data has been stored or committed, the data should actually have been stored. This however, is not the same as the data being stored on the disk, but for this blog post you can consider it as being saved to disk.
We will talk in the post about the 3rd property: isolation. It looks like a simple one: I do something, somebody else does another thing and we both are completely seperated from eachother. Unfortunally, this is not the case in databases. We share the same data, we might update the same data at the same time and this can result in conflicts, just like in a code repository. We’ll discuss what can happen, what the catches are and how InnoDB (and ultimally you) can avoid it..
InnoDB and isolation
Let’s talk a bit in detail about the isolation-property of InnoDB. As said: transactions are completely isolated from each other. Suppose you have a database with 2 concurrent connections serving transaction A (TX A) and transaction B (TX B), and field_a in table_a holds the value ‘10’:
TX A: start transaction TX B: start transaction TX A: update table_a set field_a=field_a+10; TX B: select field_a from table_a;
Now, what kind of data should transaction B see? Should it see field_a with the 10-values, the 20-values, or should transaction B block until transaction A is done? Or even something else? The correct answer is that it depends on what you need, and MySQL lets you decide for yourself. This is called the “transaction isolation level” which you can set in MySQL on a global, session or only for the current transaction.
To see the current transacton levels issue the following sql query:
SELECT @@GLOBAL.tx_isolation, @@tx_isolation;
It will display the global transaction level and the current transaction level that is used. You can only set the global transaction level when you have SUPER privileges.
To set the level:
set [global|session] transaction isolation level [level];
In MySQL/InnoDB there are 4 kind of isolation levels ranging from the lowest to the highest: read uncommitted, read committed, repeatable reads and serializable. The higher the level, the more performance loss you will get, but the more “isolated” the transactions will be. If your application has no problem in seeing data from other transactions in some cases, it can be perfectly allright to lower the isolation levels so you gain a bit of performance and speed. Let’s discuss all levels in detail:
Transaction B will see all the mutations that transaction A makes. You might say that there is no real isolation at all and this comes with all kind of problems that are solved by the other levels:
First of all, even though you see the mutations from transaction A, it is still uncomitted data. This means that when transaction A does a rollback, all data will be restored to what it was just like before starting transaction A. So in transaction B you can see those records change data during a transaction. That is called a “dirty read”.
Secondly, when transaction A changes data and commits that data, transaction B still sees that data as changed during it transaction. Basically: it gets different results with the same query, which is called a non-repeatable read and allthough this is similar as a dirty read, we categorize it differently.
And finally, it is possible that transaction A inserts new rows in its transaction. Those records are called phantom rows and off course can dissappear when the transaction does a rollback.
Basically this level will take away possibility of dirty reads. It is not possible to see the data from transaction A until it is commited. Note that transaction B won’t see the data until it has committed it’s own data. Non-repeatable reads and phantoms are still possible since this level does not address those problems.
We take away the non-repeatable reads so from this level on we don’t see ANY data from other transactions until we are actually done with our transaction. Now, depending on the database system you use, phantom reads could still possible, but this is not the case for InnoDB.
So, since the “repeatable read” level takes care of all three problems, why do we have a fourth level? Basically this mode can be described as a “paranoid” mode since it will lock all records that a transaction select for all the other transactions. It means that when a transaction updates a record, it becomes impossible for others to select that record. It’s a great way to make absolutely sure that no transaction overwrites the actions of other transactions, but it comes with the costs of lots of record locking.
Some SQL examples
Let’s try some examples with 2 transactions and see how they react depending on the transaction levels. Assume that ‘test’ is a table with the following data:
+----+-----+ | id | val | +----+-----+ | 1 | 8 | | 2 | 8 | +----+-----+
TX A: start transaction; TX B: set session transaction isolation level read uncommitted; TX B: start transaction; TX A: select * from test; -- val = 8 TX B: select * from test; -- val = 8 TX A: update test set val = val + 1; -- val = 9 TX B: select * from test; -- val = 9, dirty read TX A: rollback; TX B: select * from test; -- val = 8 TX B: commit;
As you can see it’s possible for TX B to see the data that was modified by TX A. However, after the rollback of TX A, the data is reverted back.
TX A: start transaction; TX B: set session transaction isolation level read committed; TX B: start transaction; TX A: select * from test; -- val = 8 TX B: select * from test; -- val = 8 TX A: update test set val = val + 1; -- val = 9 TX B: select * from test; -- val = 8, No dirty read! TX A: commit TX B: select * from test; -- val = 9, commited read
This level shows the dirty read is not possible, but after the commit of TX A, the data is available to TX B.
TX A: start transaction; TX B: set session transaction isolation level repeatable read; TX B: start transaction; TX A: select * from test; -- val = 8 TX B: select * from test; -- val = 8 TX A: update test set val = val + 1; -- val = 9 TX B: select * from test; -- val = 8 TX A: commit TX B: select * from test; -- val = 8, repeatable read! TX B: commit; TX B: select * from test; -- val = 9 (from tx A)
After the commit of TX A we still see the unchanged data. Only after a commit (or rollback) from TX B we see that the data has changed.
TX A: start transaction; TX B: set session transaction isolation level serializable; TX B: start transaction; TX A: select * from test; -- val = 8 TX A: update test set val = val + 1; -- val = 9 TX B: select * from test; -- LOCKED, NO OUTPUT TX A: commit; -- Unlocked TX B TX B: select * from test; -- val = 8 (repeatable read!) TX B: commit; TX B: select * from test; -- val = 9 (now we see TX A)
You see that after we have done an update we cannot select the data in the other transaction anymore. We have to wait until TX A commits or rollbacks the transaction. After TX A has committed it’s data, TX B still can’t see it because otherwise it would be a non-repeatable read. We have to commit our own transaction before we can see the actual results.
Isolation levels are an aspect that for normal engineers or administrators is not really that important. The standard “repeatable read” level that InnoDB uses is good enough for maybe 99% of the databases outthere. However, loosening the levels can increase your database performance but you can loose some safety. It’s up to you to decide if repeatable-reads or phantoms are an issue or not…