How to Debug Slow PostgreSQL Queries: A Practical Walkthrough
The query was taking 14 seconds. It should have taken under 100 milliseconds.
The setup
The application was a multi-tenant SaaS. The slow query was a dashboard summary: total orders, revenue, and average order value per customer for the last 90 days. The tables:
- orders — 4.1 million rows, with customer_id, created_at, total_amount, status
- customers — 312,000 rows, with id, account_id, created_at
- order_items — 18 million rows, with order_id, product_id, quantity, unit_price
SELECT
c.id AS customer_id,
c.email,
COUNT(o.id) AS order_count,
SUM(o.total_amount) AS revenue,
AVG(o.total_amount) AS avg_order_value
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
AND o.created_at > NOW() - INTERVAL '90 days'
AND o.status != 'cancelled'
WHERE c.account_id = $1
GROUP BY c.id, c.email
ORDER BY revenue DESC NULLS LAST;
Looked reasonable. Ran it. 14.2 seconds.
Step 1 — EXPLAIN ANALYZE
First thing, always:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT ...; -- the full query
The important parts from the output:
Hash Join (cost=12483.21..89432.18 rows=89201 width=72)
(actual time=312.441..13891.223 rows=89201 loops=1)
Hash Cond: (o.customer_id = c.id)
Buffers: shared hit=4821 read=128432
-> Seq Scan on orders (cost=0.00..71234.18 rows=1823401 width=28)
(actual time=0.019..8234.112 rows=1823401 loops=1)
Filter: ((status <> 'cancelled') AND (created_at > ...))
Rows Removed by Filter: 2276599
Buffers: shared hit=891 read=124882
-> Seq Scan on customers (cost=0.00..8234.11 rows=18321 width=44)
(actual time=0.012..231.443 rows=18321 loops=1)
Filter: (account_id = '550e8400...'::uuid)
Rows Removed by Filter: 293679
Two sequential scans. The orders table scan was reading 124,882 blocks from disk (read=124882) and taking 8 seconds on its own. How to read these numbers:
- cost=X..Y — planner estimate. The first number is startup cost, the second is total cost (in arbitrary planner units, not milliseconds).
- actual time=X..Y — measured wall-clock milliseconds, startup and total, per loop (multiply by loops for the true total).
- Buffers: shared hit=N read=N — hit means the block was already in Postgres's buffer cache; read means it had to be fetched from outside it (disk or the OS cache). High read values mean I/O is the bottleneck.
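When high read counts show up, it helps to see how much wall-clock time the I/O itself cost. One way — a sketch, assuming you have permission to change the setting for your session — is track_io_timing, which adds an I/O timing line to the BUFFERS output:

```sql
-- Enable per-query I/O timing for this session, then re-run the plan.
-- EXPLAIN (ANALYZE, BUFFERS) will now report time spent waiting on
-- block reads and writes, separating "slow disk" from "slow CPU".
SET track_io_timing = on;

EXPLAIN (ANALYZE, BUFFERS)
SELECT ...;  -- the full query
```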
Problem clear: no index on orders.customer_id or orders.created_at.
Step 2 — Finding the missing index
Before adding indexes blindly, check what's already there:
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'orders';
Result: a primary key on id, an index on status. Nothing on customer_id or created_at. Also check sequential scan frequency:
SELECT
relname AS table,
seq_scan,
seq_tup_read,
idx_scan
FROM pg_stat_user_tables
WHERE relname = 'orders';
seq_scan was 847. This table had been sequentially scanned almost a thousand times since the last stats reset. Time to fix that:
CREATE INDEX CONCURRENTLY idx_orders_customer_created
ON orders (customer_id, created_at DESC)
WHERE status != 'cancelled';
The CONCURRENTLY flag builds the index without blocking writes — essential on a production table, though it takes longer and can't run inside a transaction. The partial WHERE condition makes the index smaller and faster to scan, and the planner can use it because the query's status != 'cancelled' filter matches the index predicate exactly.
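One caveat worth checking afterwards: if a CREATE INDEX CONCURRENTLY build fails or is cancelled, it leaves an INVALID index behind that adds write overhead without ever being used. A quick check against the system catalog:

```sql
-- Indexes left INVALID by a failed CONCURRENTLY build.
-- Drop any that appear here and rebuild them.
SELECT indexrelid::regclass AS index_name
FROM pg_index
WHERE NOT indisvalid;
```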
After the index:
Index Scan using idx_orders_customer_created on orders
(cost=0.56..8234.21 rows=89201 width=28)
(actual time=0.148..412.221 rows=89201 loops=1)
Buffers: shared hit=92341 read=1823
Disk reads dropped from 124,882 to 1,823. Time dropped to 412ms. Better — but still too slow for a page-load query.
Step 3 — The customers table scan
The EXPLAIN output showed the customers table scan was filtering 293,679 rows down to 18,321 — scanning the whole table to find 6% of it.
CREATE INDEX CONCURRENTLY idx_customers_account
ON customers (account_id);
After that index, the customers scan became a direct index scan. Full query: 38ms.
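To confirm the working set now stays in memory, the cumulative block-level stats are useful. A sketch using pg_statio_user_tables (these counters accumulate since the last stats reset, so compare readings before and after a change):

```sql
-- Heap cache hit ratio per table. "read" here means fetched from outside
-- shared_buffers (disk or OS cache) — same meaning as in BUFFERS output.
SELECT relname,
       heap_blks_hit,
       heap_blks_read,
       round(heap_blks_hit::numeric
             / nullif(heap_blks_hit + heap_blks_read, 0) * 100, 1) AS hit_pct
FROM pg_statio_user_tables
WHERE relname IN ('orders', 'customers');
```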
Step 4 — Table bloat
While in there, I checked table bloat:
SELECT
relname AS table,
n_dead_tup,
n_live_tup,
round(n_dead_tup::numeric / nullif(n_live_tup + n_dead_tup, 0) * 100, 2) AS dead_pct,
last_vacuum,
last_autovacuum
FROM pg_stat_user_tables
WHERE relname IN ('orders', 'customers')
ORDER BY n_dead_tup DESC;
The orders table had 890,000 dead tuples — about 22% dead. Last autovacuum was 9 days ago. Not the primary cause of the slowness, but contributing to index bloat and making buffer reads less efficient.
VACUUM ANALYZE orders;
This marks the space held by dead tuples as reusable (it doesn't shrink the file on disk) and refreshes the planner's statistics. After running it, the planner's row estimates got more accurate and the query dropped another 8ms.
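A manual VACUUM fixes the symptom once. To keep a hot multi-million-row table from drifting back to 22% dead, it's often worth making autovacuum trigger earlier for that table specifically. The thresholds below are illustrative, not tuned for this workload:

```sql
-- Per-table autovacuum settings: fire when ~5% of rows are dead
-- instead of the default 20%, with a floor of 10,000 dead tuples.
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.05,
    autovacuum_vacuum_threshold = 10000
);
```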
Final state
- Before: 14.2 seconds
- After index on orders: 412ms
- After index on customers: 38ms
- After VACUUM: 30ms
Total debugging time: about 35 minutes. The fix itself took 8 minutes to run on production with no downtime.
The lesson isn't "always index everything." It's: read the EXPLAIN output, look at actual vs estimated rows, check buffer hits vs reads, and look at pg_stat_user_tables before assuming you need to rewrite the query. Usually the query is fine — the data access path isn't.
The pg_stat views documented on pgref.dev are worth knowing. pg_stat_user_tables, pg_stat_statements, and pg_indexes are the three I open first on any performance problem.
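For finding which queries deserve this treatment in the first place, pg_stat_statements is the usual starting point. A sketch, assuming the extension is installed and preloaded (column names shown are for PostgreSQL 13+; older versions use total_time and mean_time):

```sql
-- Top queries by cumulative execution time since the last stats reset.
SELECT round(total_exec_time::numeric, 1) AS total_ms,
       calls,
       round(mean_exec_time::numeric, 1) AS mean_ms,
       left(query, 80) AS query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```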