SQL Query explain plan and Optimization

Query optimization is determination by relational database the most efficient way to execute a given query by considering the possible query plans. Mostly query optimizer is used by databases for optimizing the query in the best possible way. User doesn't work directly with query optimizer. But some queries can be optimized by exploring its execution plan.

Prepare test data

Before we dive into queries optimizations and explain plan, let's prepare a table with test data. In our example we'll be using vehicle table which stores make, model, year and price. The vehicle table looks into the following way:

CREATE TABLE vehicle (
	vehicle_id SERIAL       NOT NULL PRIMARY KEY,
	make       VARCHAR(100) NOT NULL,
	model      VARCHAR(100) NOT NULL,
	price      NUMERIC(5)   NOT NULL,
	year       NUMERIC(4)
);

For filling in vehicle table we'll be using fillInVehicle(rowsToInsert INTEGER) function which inserts randomly generated data into vehicle table according to specified rowsToInsert parameter.

Source code for creating table and fillInVehicle function could be found here:vehicle_table_generator.sql
How to use generator described into README file.

Let's insert 10 000 record to the vehicle table:

SELECT fillInVehicle(10000);

Explain command

EXPLAIN is a command that shows how tables are scanned and cost of the query. It helps to figure out why a certain query is slow. E.g. EXPLAIN can show how changing IN statement to JOIN will affect on query performance.
The EXPLAIN command shows the execution plan of a statement, how tables are scanned, how tables are joined, estimated number of rows and which JOIN method is used.

The basic syntax for EXPLAIN command is:

EXPLAIN sql_statement

where sql_statement is any SQL query for which we want to get execution plan.

Explain example for simple select

Let's select all records form the vehicle table and take a look at explain plan:

EXPLAIN SELECT * FROM vehicle;

The output will look like this:

Seq Scan on vehicle  (cost=0.00..184.00 rows=10000 width=32)

rows

rows=10000 is number of selected rows

cost

cost is information on the estimated startup and total cost of each plan node, as well as the estimated number of rows and the estimated width of each row. Generally speaking it's an effort required to execute the query.
Cost for retrieving the first row is 0.00 and cost for retrieving all rows is 184.00. Cost for retrieving all rows calculates into the following way:

seq _ page _ cost * number_of_relation_pages) * cpu _ tuple _ cost * number_of_rows)

where
page_cost and cpu_tuple_cost are planner-related configuration settings.
seq _ page _ cost and number_of_relation_pages could be found into pg_class table.

We can retrieve the same results into the following way:

SELECT relpages*current_setting('seq_page_cost')::numeric +
reltuples*current_setting('cpu_tuple_cost')::numeric as query_cost
FROM pg_class WHERE relname='vehicle';

The output will look like:

query_cost
   184

width

width is the average width of the tuple in bytes. This information is stored into pg_stats table and could be retrieved into the following way:

SELECT sum(avg_width) FROM pg_stats WHERE tablename = 'vehicle';

Average width for each column could be retrieved into the following way:

SELECT attname, avg_width FROM pg_stats WHERE tablename = 'vehicle';

The output will look into the following way:

attname avg_width
vehicle_id 4
make 8
model 9
price 6
year 5

Explain with Analyze

Analyze collects statistics about the contents of tables in the database, and stores the results in the pg_statistic system catalog. Subsequently, the query planner uses these statistics to help determine the most efficient execution plans for queries.
It should be kept in mind that EXPLAIN ANALYZE shows the actual runtime of the query and can take a long time for large tables and complex queries.
Full syntax for Analyze looks into the following way:

ANALYZE [ VERBOSE ] [ table_name [ ( column_name [, ...] ) ] ]

Detailed explanation of parameters could be found in official documentation: ANALYZE

Now let's analyze the following query:

EXPLAIN ANALYZE SELECT * FROM vehicle;

The output will look into the following way:

Seq Scan on vehicle  (cost=0.00..184.00 rows=10000 width=32) (actual time=0.013..1.059 rows=10000 loops=1)
Planning time: 0.658 ms
Execution time: 1.310 ms

In the output we can the the additional information such as Planning and Execution times are present.
Planning time is how much time potentially it will take to run the query.
Execution time how much time actually required for running the query.

How to optimize queries

Here is a list of useful materials related to optimization queries on different relational databases:

  1. Series of SQL Server Query Tuning Best Practices
  2. Query Plan Analysis
  3. PostgreSQL Performance Optimization
  4. Using EXPLAIN PLAN in Oracle DB
  5. Real-World Performance Videos (Oracle)

Alexey Bogdanov

Read more posts by this author.

Subscribe to Development notes

Get the latest posts delivered right to your inbox.

or subscribe via RSS with Feedly!