Window Functions Explained Without the Headache
Window functions clicked for me the day I stopped reading tutorials and started with a problem: I needed the second-highest salary in each department. Not the highest — the second. A subquery approach worked but required two passes through the data. A window function did it in one. That's where we'll start.
The dataset
Every example in this post uses the same table. No switching datasets mid-article.
CREATE TABLE employees (
id INT PRIMARY KEY,
name TEXT,
department TEXT,
salary NUMERIC(10,2),
hire_date DATE
);
INSERT INTO employees VALUES
(1, 'Alice Chen', 'Engineering', 95000, '2019-03-15'),
(2, 'Bob Kowalski', 'Engineering', 88000, '2020-07-22'),
(3, 'Carmen Diaz', 'Engineering', 95000, '2018-11-01'),
(4, 'David Park', 'Engineering', 72000, '2022-01-10'),
(5, 'Emma Fischer', 'Marketing', 68000, '2020-04-30'),
(6, 'Frank Osei', 'Marketing', 71000, '2019-09-12'),
(7, 'Grace Liu', 'Marketing', 68000, '2021-06-15'),
(8, 'Henry Moore', 'Sales', 82000, '2018-02-28'),
(9, 'Isla MacLeod', 'Sales', 79000, '2019-10-05'),
(10, 'James Okonkwo', 'Sales', 82000, '2020-12-01');
ROW_NUMBER — the gateway
The problem: get the highest-paid employee in each department.
SELECT * FROM (
SELECT
*,
row_number() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees
) ranked
WHERE rn = 1;
How to read row_number() OVER (PARTITION BY department ORDER BY salary DESC):
OVERtells PostgreSQL this is a window function, not a regular aggregatePARTITION BY departmentmeans "restart the counter for each department" — like sorting papers into piles, one pile per departmentORDER BY salary DESCmeans "within each pile, sort by salary descending, then number the rows"
Notice Alice gets rank 1 in Engineering even though Carmen has the same salary. With row_number(), ties are broken arbitrarily unless you add a tiebreaker: ORDER BY salary DESC, id ASC.
ROW_NUMBER vs RANK vs DENSE_RANK
SELECT
name, department, salary,
row_number() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num,
rank() OVER (PARTITION BY department ORDER BY salary DESC) AS rnk,
dense_rank() OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rnk
FROM employees
WHERE department = 'Engineering'
ORDER BY salary DESC;
Result:
name | salary | row_num | rnk | dense_rnk
--------------+---------+---------+-----+-----------
Alice Chen | 95000 | 1 | 1 | 1
Carmen Diaz | 95000 | 2 | 1 | 1
Bob Kowalski | 88000 | 3 | 3 | 2
David Park | 72000 | 4 | 4 | 3
row_number(): always unique — Alice is 1, Carmen is 2, even though they're tiedrank(): tied rows get the same rank and the next rank skips — there's no rank 2dense_rank(): tied rows get the same rank, but the next rank is consecutive
For deduplication, always use row_number(). For "second highest salary" use dense_rank() — two people tying for first should still show a second place.
The frame clause — where most tutorials lose people
Every window function has a frame: the set of rows it operates over within its partition. The default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
Running total of salary by hire date:
SELECT
name, hire_date, salary,
SUM(salary) OVER (ORDER BY hire_date) AS running_total
FROM employees
ORDER BY hire_date;
For a 3-row moving average using ROWS (physical rows, not value ranges):
SELECT
name, hire_date, salary,
AVG(salary) OVER (
ORDER BY hire_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS moving_avg_3
FROM employees
ORDER BY hire_date;
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW: for each row, look at this row and the two rows before it by position. Use ROWS when you care about physical row position. Use RANGE when you care about value equality. For most practical uses (rolling averages, running totals), ROWS is what you want.
LAG and LEAD — looking sideways
LAG(col, n) looks backwards n rows. LEAD(col, n) looks forwards n rows. Month-over-month salary changes:
SELECT
name, hire_date, salary,
LAG(salary) OVER (ORDER BY hire_date) AS prev_salary,
salary - LAG(salary) OVER (ORDER BY hire_date) AS change
FROM employees
ORDER BY hire_date;
The first row has NULL for prev_salary. Provide a default with LAG(salary, 1, 0) OVER (...) — the third argument is what to return when there's no preceding row.
The equivalent self-join is ugly: two table scans, a correlated subquery. Window functions exist partly to avoid this.
The pattern I use most
ROW_NUMBER with a CTE for deduplication. When a table has duplicate rows and I want to keep the most recent one:
WITH ranked AS (
SELECT
*,
row_number() OVER (
PARTITION BY customer_id
ORDER BY created_at DESC
) AS rn
FROM orders
)
DELETE FROM orders
WHERE id IN (
SELECT id FROM ranked WHERE rn > 1
);
This keeps one row per customer_id (the most recent), deletes the rest. Edge case: if two rows have the exact same created_at, which one survives is arbitrary. Add a tiebreaker (ORDER BY created_at DESC, id ASC) to make it deterministic.
Bookmark the pgref.dev window functions page. You'll need it once a week, and you will forget the frame syntax every time. I still look it up.