SQL Transactions and ACID

2016-11-27 6 min read Development Alexey Bogdanov

Transaction is a sequence of one or more SQL operations treated as a logical unit. Transaction could be completed either entirely successfully or not executed at all. Transactions and locking are important aspects of database for supporting multiple users. Into this tutorial we’ll take a look atTransaction, ACID, Isolation Levels.

ACID

Transaction must have the following properties called ACID, which means the following:

Atomicity: means that transaction must happen as a single unit and exactly once. SQL statements contained in a transaction make up a single unit of work. And if one part of the transaction fails, then the entire transaction fails, and the database state is left unchanged.
E.g. when we transfer money from one account to another it’s not acceptable to debit one account and do not credit the second account because of any errors. If any error occurs, money should be returned to the first(credit) account.
Consistency: means that at the end of transaction the database should be left in consistent state.
E.g. after transferring $10 from one account to another the funds in the first account should be decreased by $10 and funds in the second account should be increased by $10;
Isolation: means that each transaction must be independent of all other transaction, no matter how many other transactions are there in the system. In other words, DB should guarantee that result of execution parallel transactions should be the same as if transactions were executed serially, one after another. Isolation is typically defined at database level as a property that describes how/when the changes made by one operation become visible to other. Relational Databases support the following isolation levels:
Read uncommitted
Read committed
Repeatable read
Serializable
Durability: means that once a transaction has been committed, the database changes are preserved, even if the machine on which the database software is running crashes later. This is achieved by using a transaction log file.
E.g. Once money has been successfully transferred between accounts, it must stay transferred, even if the power fails and the machine with the database has an uncontrolled power down.

How to start transaction

We took a look at such transaction properties as ACID. Now let’s take a look at how to create, commit and rollback transactions in PostgreSQL.

To start transaction, BEGIN key word should be used. COMMIT is used for saving the result of one’s work. ROLLBACK is used for aborting transaction without saving the result of one’s work. Here’s a snippet of code with statement which is wrapped into transaction:

BEGIN
CREATE TABLE vehicle
(
    vehicle_id SERIAL       NOT NULL,
    make       VARCHAR(100) NOT NULL,
    model      VARCHAR(100) NOT NULL,
    price      NUMERIC(5)   NOT NULL
);

INSERT INTO vehicle (make, model, price)
VALUES ('Toyota', 'Camry', 25560);
COMMIT;

Here’s an example with rolled back transaction:

BEGIN
INSERT INTO vehicle (make, model, price)
VALUES ('Audi', 'A4', 42825);
INSERT INTO vehicle (make, model, price)
VALUES ('Acura', 'ILX', 30840);
INSERT INTO vehicle (make, model, price)
VALUES ('BMW', '5 Series ', 67470);
ROLLBACK;

If we select the data from the table we’ll see that records for Audi, Acura and MBW were not inserted into the vehicle table.

Isolation levels

Isolation determines how transaction integrity is visible to other users and systems. As was mentioned early, there 4 isolation levels defined in SQL standard:

Weak isolation levels

Read committed
Read uncommitted
Repeatable Read

Have lower overhead and consistency guarantee but higher concurrency.

Strict isolation levels

Serializable

Has a lot of overhead and low concurrency.

PostgreSQL supports only Read Committed and Serializable isolation levels because

this is the only sensible way to map the standard isolation levels to the multiversion concurrency control architecture.

Read uncommitted Isolation Level

Read uncommitted isolation level is the lowest isolation level. It guarantees only the absence of lost updates. In this level transactions can see changes to data made by other transactions that are not yet committed. During SELECT query execution we can retrieve data which actually doesn’t exists because transaction was rolled back.

Read Committed isolation level

Read Committed isolation level is used by default in many relational databases. Into this isolation level SELECT query may see only the data committed before the SELECT query begun. If any uncommitted data or committed changes occur during query execution, SELECT statement will never see it. It happens so because SELECT query sees a snapshot of the database as of the instant the query begins to run. But SELECT query may see the changes which were made within its own transaction, even though they are not yet committed. UPDATE and DELETE queries behave into the same way as SELECT: they see only the data committed before the UPDATE and DELETE queries executed. Because of the such rule it’s possible to get inconsistent data during snapshot.

Repeatable read Isolation Level

Repeatable read means that a transaction re-reads data it has previously read and finds that data has been modified by another transaction.
E.g. When a transaction (e.g. named T1) reads any data this data will be blocked from modification by other transactions until T1 will be finished.

Serializable Isolation Level

Serializable isolation level is the strictest transaction isolation level. Into this isolation level transactions are completely isolated from each other and each transaction executes one after another, serially, rather than concurrently (into the way as if other transactions are not exists). In case when application uses Serializable isolation level must be ready for retrying transactions due to serialization failures.

Incorrectly chosen level of isolation may impact significantly on database performance or may lead to the inconsistency of data during selection or update. To decrease the impact of isolation level on performance SQL standard defines isolation levels in terms of undesirable phenomena which may happen in multiuser database. Here’s a list of major problems:

Dirty read: transaction can read data which was written by another transaction even if another transaction is not committed yet,
Dirty read permits other transactions to see data that has not yet been committed to the database. This means they can see changes that are later discarded, because of the ROLLBACK command.
Non-repeatable read: is very similar to the Dirty Read but is more restrictively defined. When rereading data, a transaction can find that the data has been modified by another transaction that has just committed. The same query executed twice can return different values for the same rows.
Phantom read: is similar to non-repeatable read, but it is related to new data, created by another transaction. The same query executed twice can return different numbers of records.

Isolation levels vs READ phenomena

Change Isolation level

READ COMMITTED isolation level is set by default in PostgreSQL. If need it could be changed to Serializable isolation level:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

Additional reading resources

Transaction Isolation (official PostgreSQL documentation)
Isolation (database systems)
Introduction to Databases Videos from MOOC available here