SQL Query explain plan and Optimization
Query optimization
is determination by relational database the most efficient way to execute a given query by considering the possible query plans. Mostly query optimizer
is used by databases for optimizing the query in the best possible way. User doesn’t work directly with query optimizer. But some queries can be optimized by exploring its execution plan.
Prepare test data
Before we dive into queries optimizations and explain plan, let’s prepare a table with test data. In our example we’ll be using vehicle
table which stores make
, model
, year
and price
. The vehicle
table looks into the following way:
CREATE TABLE vehicle (
vehicle_id SERIAL NOT NULL PRIMARY KEY,
make VARCHAR(100) NOT NULL,
model VARCHAR(100) NOT NULL,
price NUMERIC(5) NOT NULL,
year NUMERIC(4)
);
For filling in vehicle
table we’ll be using fillInVehicle(rowsToInsert INTEGER)
function which inserts randomly generated data into vehicle
table according to specified rowsToInsert
parameter.
Source code for creating table and fillInVehicle
function could be found here:vehicle_table_generator.sql
How to use generator described into README file.
Let’s insert 10 000
record to the vehicle
table:
SELECT fillInVehicle(10000);
Explain command
EXPLAIN
is a command that shows how tables are scanned and cost of the query. It helps to figure out why a certain query is slow. E.g. EXPLAIN
can show how changing IN
statement to JOIN
will affect on query performance.
The EXPLAIN
command shows the execution plan of a statement, how tables are scanned, how tables are joined, estimated number of rows and which JOIN
method is used.
The basic syntax for EXPLAIN
command is:
EXPLAIN sql_statement
where sql_statement
is any SQL query for which we want to get execution plan.
Explain example for simple select
Let’s select all records form the vehicle
table and take a look at explain plan:
EXPLAIN SELECT * FROM vehicle;
The output will look like this:
Seq Scan on vehicle (cost=0.00..184.00 rows=10000 width=32)
rows
rows=10000
is number of selected rows
cost
cost
is information on the estimated startup and total cost of each plan node, as well as the estimated number of rows and the estimated width of each row. Generally speaking it’s an effort required to execute the query.
Cost for retrieving the first row is 0.00
and cost for retrieving all rows is 184.00
. Cost for retrieving all rows calculates into the following way:
seq _ page _ cost * number_of_relation_pages) * cpu _ tuple _ cost * number_of_rows)
where
page_cost
and cpu_tuple_cost
are planner-related configuration settings.
seq _ page _ cost
and number_of_relation_pages
could be found into pg_class
table.
We can retrieve the same results into the following way:
SELECT relpages*current_setting('seq_page_cost')::numeric +
reltuples*current_setting('cpu_tuple_cost')::numeric as query_cost
FROM pg_class WHERE relname='vehicle';
The output will look like:
query_cost
184
width
width
is the average width of the tuple in bytes. This information is stored into pg_stats
table and could be retrieved into the following way:
SELECT sum(avg_width) FROM pg_stats WHERE tablename = 'vehicle';
Average width for each column could be retrieved into the following way:
SELECT attname, avg_width FROM pg_stats WHERE tablename = 'vehicle';
The output will look into the following way:
Explain with Analyze
Analyze
collects statistics about the contents of tables in the database, and stores the results in the pg_statistic
system catalog. Subsequently, the query planner uses these statistics to help determine the most efficient execution plans for queries.
It should be kept in mind that EXPLAIN ANALYZE
shows the actual runtime of the query and can take a long time for large tables and complex queries.
Full syntax for Analyze
looks into the following way:
ANALYZE [ VERBOSE ] [ table_name [ ( column_name [, ...] ) ] ]
Detailed explanation of parameters could be found in official documentation: ANALYZE
Now let’s analyze the following query:
EXPLAIN ANALYZE SELECT * FROM vehicle;
The output will look into the following way:
Seq Scan on vehicle (cost=0.00..184.00 rows=10000 width=32) (actual time=0.013..1.059 rows=10000 loops=1)
Planning time: 0.658 ms
Execution time: 1.310 ms
In the output we can the the additional information such as Planning
and Execution
times are present.
Planning time
is how much time potentially it will take to run the query.
Execution time
how much time actually required for running the query.
How to optimize queries
Here is a list of useful materials related to optimization queries on different relational databases: