The PostgreSQL Functions I Actually Use Every Day

Developer & creator of pgref.dev · 9 min read

functionscoalescenullif

After building pgref.dev and cataloguing over 400 PostgreSQL functions, most days I use about 12 of them. The Pareto principle applies brutally to SQL. The rest exist for edge cases, for niche domains, for the one person at a financial institution who genuinely needs width_bucket(). The functions below are the ones I reach for without thinking.

coalesce()

I reach for coalesce() every time I have a nullable column that should display something sensible. That's most nullable columns.

SELECT
  order_id,
  coalesce(shipping_name, billing_name, 'Unknown Customer') AS display_name
FROM orders;

For years I wrote CASE WHEN shipping_name IS NULL THEN billing_name ELSE shipping_name END. That's nine tokens to do what coalesce does in one. The moment I made the switch I never went back.

The gotcha: coalesce evaluates all its arguments left to right and stops at the first non-null. So if you write coalesce(expensive_function(), 'default'), that function runs whenever the first argument is null. It's not lazy evaluation. I learned this the hard way when a coalesce branch that "never runs" started running under load and took a query from 10ms to 800ms.

nullif()

The most underrated function in PostgreSQL. I use nullif(a, b) constantly for avoiding divide-by-zero errors and normalising empty strings to NULL.

-- Avoid divide-by-zero
SELECT
  total_revenue / nullif(total_orders, 0) AS avg_order_value
FROM monthly_summary;

-- Treat empty string the same as NULL
SELECT nullif(trim(notes), '') AS clean_notes FROM tickets;

nullif(a, b) returns NULL if a = b, otherwise returns a. It solves an annoying class of problems cleanly.

concat_ws()

The || concatenation operator burned me twice before I switched to concat_ws(). The issue: || propagates NULL. If any operand is NULL, the whole expression is NULL.

-- This returns NULL if middle_name is NULL
SELECT first_name || ' ' || middle_name || ' ' || last_name FROM users;

-- This skips NULL components gracefully
SELECT concat_ws(' ', first_name, middle_name, last_name) FROM users;

concat_ws takes a separator as the first argument, then any number of values, and skips NULLs entirely. For name assembly, address formatting, tag concatenation — anything where components are optional — it's exactly what you want.

date_trunc()

date_trunc() is how I group time-series data. You pass a precision string and a timestamp; it zeroes out everything below that precision.

SELECT
  date_trunc('month', created_at) AS month,
  count(*) AS signups
FROM users
GROUP BY 1
ORDER BY 1;

The trap I hit: date_trunc respects timezones only if you tell it which one. On a server set to UTC, date_trunc('day', created_at) gives you UTC days. If your users are in Sydney, that's wrong by ten hours. The fix:

date_trunc('day', created_at AT TIME ZONE 'Australia/Sydney')

I've debugged two separate "the daily report is off by a day" incidents that turned out to be this exact problem.

generate_series()

generate_series() generates rows. That sounds trivial until you need to fill gaps in date-grouped data.

-- Revenue by day, with zeros for days with no orders
SELECT
  d::date AS day,
  coalesce(sum(o.total), 0) AS revenue
FROM generate_series(
  '2024-01-01'::timestamp,
  '2024-01-31'::timestamp,
  '1 day'
) AS d
LEFT JOIN orders o ON o.created_at::date = d::date
GROUP BY d
ORDER BY d;

Without generate_series, a GROUP BY on the orders table skips days with no orders entirely. The reporting client sees the chart with missing dates and files a bug. I've done this LEFT JOIN dance on every reporting project I've worked on.

row_number() vs rank()

I still get these wrong occasionally. row_number() gives every row a unique sequential integer regardless of ties. rank() gives tied rows the same number and then skips.

-- Get the single most recent order per customer
SELECT * FROM (
  SELECT
    *,
    row_number() OVER (PARTITION BY customer_id ORDER BY created_at DESC) AS rn
  FROM orders
) sub
WHERE rn = 1;

If I use rank() here instead of row_number(), and a customer has two orders at the exact same timestamp, both get rank = 1 and I get two rows back. For deduplication, always use row_number().

string_agg()

string_agg() concatenates values across rows. The part that took me a while to find: you can ORDER BY inside the aggregate.

SELECT
  project_id,
  string_agg(tag_name, ', ' ORDER BY tag_name) AS tags
FROM project_tags
GROUP BY project_id;

Without the ORDER BY, the order is arbitrary and changes between queries. On a project where a test was checking the exact output of a tags column, this caused intermittent failures for a week before I found it.

regexp_replace()

regexp_replace() is how I clean data from external sources — phone numbers, reference codes, user input.

-- Strip non-digit characters from phone numbers
SELECT regexp_replace(phone, '[^0-9]', '', 'g') FROM contacts;

The 'g' flag means "replace all matches." Without it, only the first match is replaced. Other regex implementations default to global replacement. PostgreSQL doesn't. I forget this about twice a year.

jsonb_build_object() and jsonb_set()

I reach for jsonb_build_object() whenever I'm assembling API responses or audit log entries inside a query.

SELECT
  jsonb_build_object(
    'user_id', u.id,
    'email', u.email,
    'plan', s.plan_name,
    'expires_at', s.expires_at
  ) AS user_summary
FROM users u
JOIN subscriptions s ON s.user_id = u.id;

The thing I stopped doing: using -> and ->> in WHERE clauses on large tables without a GIN index. I had a query filtering on metadata->>'source' = 'api' over 2 million rows that was doing a sequential scan. Adding a generated column fixed it in five minutes.

left() and right()

left(str, n) and right(str, n) are small but I use them constantly for truncation and prefix extraction.

-- Truncate long descriptions for display
SELECT left(description, 150) || '…' AS preview FROM articles;

-- Extract file extension
SELECT right(filename, 4) AS ext FROM uploads;

The functions I thought I'd use constantly but almost never do

When I started building pgref.dev I assumed geometric functions would come up regularly — point, circle, polygon, the distance operators. They haven't. Not once in a production query. Same with the XML functions. array_agg I expected to use daily. I use it maybe once a month. Most of the time string_agg or a JSON approach fits better.

If I lost every page on pgref.dev except one, I'd keep the Conditional Functions page. coalesce and nullif together solve more problems than any other pair of functions I know.