How an attempt to introduce a new optimisation ran into the problem of preserving ERROR-freedom of PostgreSQL functions.

TL;DR

A fundamental principle of query optimisation is that the result must not depend on the plan. PostgreSQL enforces this in part by tracking function volatility: volatile expressions restrict where the optimiser may push predicates and sort keys. But volatility isn't the only way an expression can be plan-sensitive - a stable function can also throw a runtime error on inputs that a different plan would have filtered out.

In this post, we explore the issue through the lens of a concrete optimisation - outer join sort pushdown - which enables pushing Sort nodes below Joins to benefit ORDER BY ... LIMIT queries. The problem it exposed, however, is more general and already latent in the core planner. We outline a solution by extending PostgreSQL's existing planner support function (prosupport) machinery with a new request type that allows functions to declare themselves unsafe for early evaluation. We'd welcome feedback and criticism from the community on this approach.

Plan Independence and Function Volatility

A query's result must not depend on how the planner chooses to execute it. PostgreSQL's primary tool for guaranteeing this is function volatility. Every function is classified as IMMUTABLE, STABLE, or VOLATILE:

Category	Guarantee
---------- -----------------------------------------------------
IMMUTABLE	Same result forever for the same inputs
STABLE	    Same result within one statement
VOLATILE	May return different results on each call

Volatility controls what the optimiser is allowed to do. For VOLATILE functions, the key rule is: the optimiser must not move the expression to a place that changes how many times it gets evaluated. A call to random() must run once per row - collapsing it into a single pre-evaluated constant or feeding it into an index condition would change the result. STABLE and IMMUTABLE expressions don't have this restriction: since their value doesn't change (within a statement or ever), the optimiser can safely evaluate them once and reuse the result, push them into index scans, or hoist them out of loops.

A quick example makes the difference visible. Say we have a table with a timestamp index:

CREATE TABLE events (ts timestamptz, data text);
CREATE INDEX ON events (ts);

A filter using now() (a STABLE function) can be evaluated once and fed straight to the index - the optimiser knows it won't change mid-statement:

EXPLAIN (COSTS OFF)
SELECT * FROM events WHERE ts > now() - interval '1 hour';

Index Scan using events_ts_idx on events
Index Cond: (ts > (now() - '01:00:00'::interval))

But throw in random() (a VOLATILE function) and the picture changes. Now the expression is different for every row, so the index is useless - the optimiser has to scan everything and check the filter at runtime:

EXPLAIN (COSTS OFF)
SELECT * FROM events WHERE ts > now() - random() * interval '1 hour';

Seq Scan on events
Filter: (ts > (now() - (random() * '01:00:00'::interval)))

This works well for what it's designed to do: stop the optimiser from reusing values that change between calls. But it has a blind spot.

The Blind Spot: Stable Functions that ERROR

Take a type cast like CAST(val AS integer). It's IMMUTABLE - the conversion from text to integer is deterministic, with no dependencies on the environment. Not volatile, not set-returning, parallel-safe. Every optimiser safety gate says "go ahead."

But this cast is what we'll call a partial function (borrowing from mathematics): it throws a runtime error when the input isn't a valid integer (e.g., 42.1). Volatility says nothing about this. The optimiser treats the cast as freely movable - and can push it somewhere it finds inputs the original plan would have filtered out.

This isn't a theoretical worry. It already happens in vanilla PostgreSQL via plain-old predicate pushdown. Here's an example:

CREATE TABLE raw_data (id integer PRIMARY KEY, val text);
CREATE TABLE numbers (id integer references raw_data(id));
INSERT INTO raw_data VALUES (1, '42'), (2, '42.1');
INSERT INTO numbers VALUES (1);

The subquery joins raw_data with numbers, producing only id=1 (val='42'). The outer query applies CAST(val AS integer) > 0 to the join result. Reading the query as written, you'd expect the cast to never see val='42.1' - the join should filter it out first. (The SQL standard doesn't actually guarantee this - evaluation order is implementation-defined - but the query text strongly suggests it.)

SELECT q.val FROM
  (SELECT val FROM raw_data JOIN numbers USING (id)) AS q(val)
WHERE CAST(val AS integer) > 0;

But the optimiser pushes the predicate into the raw_data scan:

Hash Join
   Hash Cond: (numbers.id = raw_data.id)
   ->  Seq Scan on numbers
   ->  Hash
         ->  Seq Scan on raw_data
               Filter: ((val)::integer > 0)

That filter hits every raw_data row - including id=2 where val='42.1'. Boom:

ERROR:  invalid input syntax for type integer: "42.1"

This is the classical relational algebra equivalence σ_p(R ⋈ S) = σ_p(R) ⋈ S. It works when p only touches columns of R and is a "total" predicate - defined for all inputs. When p is "partial" - when it can error on some inputs - the equivalence breaks: the left side errors on rows that the right side would have quietly filtered away.

The optimiser has no way to know this. The cast is IMMUTABLE, the predicate only references raw_data columns, and the pushdown is perfectly valid under standard relational algebra. The trouble is that the algebra assumes all expressions are "total".

This isn't just a PostgreSQL quirk. The SQL Server team documented the same problem back in 2006 in their blog post "Predicate ordering is not guaranteed": a view that converts varchar to int only for certain row categories blows up when the optimiser evaluates the conversion before the category filter. Their recommended workaround - use CASE expressions to guard the conversion - is essentially a manual way for users to enforce "totality" at the query level. The problem is well known across database engines; what's missing is a systematic solution.

A New Optimisation that Widened the Gap

We ran into this while implementing and testing a new optimiser feature (branch: https://github.com/danolivo/pgdev/tree/enforce-presorted-scan-on-query-pathkeys) that expands the plan space for joins. It targets queries with LIMIT <N> and ORDER BY on the outer side of a join: sometimes it's cheaper to pre-sort the outer side and let the join produce at most N rows. This pattern is reportedly common in queries generated by ERP or CRM platforms, and SQL Server already plans it efficiently.

The idea: Perform the sort before the join.

Using the same tables, consider an ORDER BY ... LIMIT over a join:

SELECT * FROM raw_data JOIN numbers USING (id)
ORDER BY val
LIMIT 10;

Normally, PostgreSQL sorts the entire result, then grabs 10 rows:

Limit
   ->  Sort
         Sort Key: raw_data.val
         ->  Hash Join
               Hash Cond: (raw_data.id = numbers.id)
               ->  Seq Scan on raw_data
               ->  Hash
                     ->  Seq Scan on numbers

Our optimisation pushes the Sort below the Join. If the inner side is small or efficiently indexed, a NestLoop with a pre-sorted outer wins - and the LIMIT can propagate down, enabling a top-N heapsort:

Limit
   ->  Nested Loop
         ->  Sort
               Sort Key: raw_data.val
               ->  Seq Scan on raw_data
         ->  Seq Scan on numbers
               Filter: (id = raw_data.id)

When ORDER BY uses a plain column like val, this is perfectly safe - the Sort just reorders rows the Scan was already producing. Faster query, same semantics. Everybody wins.

Where it Breaks

But what happens when ORDER BY uses a "partial" expression?

SELECT * FROM raw_data JOIN numbers USING (id)
ORDER BY CAST(val AS integer)
LIMIT 10;

Without the optimisation, this works fine: the INNER JOIN drops id=2 (no match in numbers), and the Sort only sees val='42'. With the optimisation, the Sort gets pushed below the Join - now it evaluates CAST(val AS integer) on every raw_data row, including id=2, before the Join gets a chance to filter it out:

ERROR:  invalid input syntax for type integer: "42.1"

Same "partial"-function problem as with predicate pushdown, just triggered by a different transformation. The original plan would have eliminated the bad row before the cast ran, but the new optimization broke that by evaluating it earlier.

Why this Matters in Practice

These examples might look contrived, but they point to a real risk for production systems. A query that works today can start throwing errors after a PostgreSQL upgrade - not because anyone changed the query or the data, but because the optimiser got smarter.

Every major release brings new optimisations: join reordering heuristics, incremental sort, memoise, and partitionwise joins. Each one gives the optimiser more freedom to rearrange the plan tree. A query that survived for years because the old planner always put the Sort above the Join might blow up under a newer planner that's clever enough to push it down. The user sees a stable query break on upgrade, with zero schema or data changes - and no obvious explanation.

That's not a great look for a DBMS. Whether a query succeeds or fails shouldn't depend on which optimisations the planner uses. When it does, users lose trust in upgrades and start pinning plans or disabling features, which defeats the whole point of having an optimiser.

To be fair, the SQL standard explicitly permits implementations to differ on whether a query errors or succeeds, depending on evaluation order. That's a remarkably honest admission - but from a user's perspective, it's cold comfort. The gap is real enough that even major DBMSes have to publish workaround guides to help users protect themselves from optimiser-induced errors.

A Missing Dimension in Function Classification

The predicate pushdown and sort pushdown examples are really the same problem in different clothes. The optimiser applies transformations that are valid under standard relational algebra, which assumes expressions are "total" - but breaks when expressions can error.

PostgreSQL classifies functions along one axis: volatility. But there's a second, independent axis: "totality".

In mathematics, a "total" function is defined for every element of its domain. A "partial" function may be undefined - i.e., it errors - for some inputs:

FunctionVolatilityTotality
abs(x)IMMUTABLE"total"
x + yIMMUTABLE"partial" (overflow)
CAST(x AS int)IMMUTABLE"partial" (invalid format)
x / yIMMUTABLE"partial" (division by zero)
x::regclassSTABLE"partial" (invalid name)
random()VOLATILE"total"

You might wonder: isn't "partial" just "more volatile than VOLATILE" - a fourth step on the volatility ladder? It's not. Volatility is about value stability: IMMUTABLE means "never changes," STABLE means "constant within a statement," VOLATILE means "might differ on every call." Each step tells the optimiser when it may evaluate the expression - how aggressively it can cache and reuse.

"Totality" asks a different question: can the evaluation blow up? And these two questions are genuinely independent:

  • random() is maximally volatile but never fails. The optimiser has to re-evaluate it every time, but that's always safe.

  • 1/x is immutable but can error. The optimiser can cache it and push it into an index, and a single bad input can kill the query.

If "partial" were just "super-volatile," the optimiser would refuse to touch these expressions, treating CAST(val AS integer) like random(). But that's way too conservative. The cast is deterministic; the optimiser should cache it, use it in index conditions, and pre-evaluate it with constants. We just don't want it moved somewhere it sees unfiltered data.

Bottom line: volatility controls when to evaluate (once, per row, or per call?). "Totality" controls where to evaluate (above which filters?). They're orthogonal dimensions, not points on the same scale.

What a Solution Might Look Like

PostgreSQL already has a mechanism designed for exactly this kind of problem: planner support functions (prosupport). Every pg_proc entry can optionally point to a support function that the optimiser calls at planning time to ask function-specific questions. Today, support functions handle tasks such as custom selectivity estimates, cost overrides, row-count estimates for set-returning functions, and simplifying function calls.

The infrastructure is already there. A new request type, SupportRequestSafeEarlyEval, would let a function tell the optimiser: "don't evaluate me on unfiltered data."

typedef struct SupportRequestSafeEarlyEval
{
    NodeTag     type;
    /* the function call being considered */
    FuncExpr   *funcexpr;
    /* support function sets this */
    bool        safe;       /* true = OK to push down */
} SupportRequestSafeEarlyEval;

The natural place to call it is relation_can_be_sorted_early() in equivclass.c - the function that already serves as the central gatekeeper for "can this expression be evaluated at a lower plan level?" It already checks for volatility, set-returning functions, and parallel safety. A prosupport call would slot in as a fourth check:

/* existing checks */
if (ec->ec_has_volatile)         return false;
if (expression_returns_set(...)) continue;
if (!is_parallel_safe(...))      continue;

/* new: ask the function itself */
if (!expression_safe_for_early_eval((Node *) em->em_expr))
    continue;

What makes this elegant is that relation_can_be_sorted_early() is already called from all the right places:

- Sort pushdown (consider_enforce_ordered_scan) - our optimisation

- Gather Merge paths (generate_useful_gather_paths) - parallel plans

- FDW/custom-scan paths (get_useful_pathkeys_for_relation)

So a single check in one function automatically protects every code path. And the same expression_safe_for_early_eval() walker could be called from distribute_qual_to_rels() to protect predicate pushdown too - closing the gap we demonstrated with the vanilla PostgreSQL example.

The prosupport approach has several advantages over a static catalogue column:

  • No schema change - no new pg_proc column, no catalogue version bump.

  • Incremental adoption - you add support functions to known "partial" functions one at a time. Functions without one keep the current behaviour.

  • Extensibility - extensions can register support functions for their own "partial" functions.

Conclusion

The SQL standard is deliberately vague about expression evaluation order, and database engines have inherited that vagueness as a practical problem: queries can break when the optimiser rearranges expressions that look safe but aren't. As optimisers grow more sophisticated and explore larger plan spaces, these incidents become more frequent - not less.

We believe PostgreSQL's existing prosupport machinery is the right place to put the problem under at least partial control of DBAs. A new SupportRequestSafeEarlyEval request type, checked in relation_can_be_sorted_early(), would protect sort pushdown, parallel paths, and FDW paths in one shot - and the same walker could extend to predicate pushdown. The approach requires no changes to the catalogue, supports incremental adoption, and is open to extensions.

We'd love to hear what the community thinks - especially about edge cases we might have missed. Feedback is welcome on the pgsql-hackers mailing list.