pgEdge ColdFront FAQs

General

What is pgEdge ColdFront?

pgEdge ColdFront is a transparent data tiering and partition lifecycle management system for PostgreSQL. It lets a single table span two storage tiers, with recent data in native PostgreSQL partitions and older data in Apache Iceberg on any S3-compatible object store, while applications continue using the same table name and the same SQL. ColdFront also manages partition lifecycle independently (pre-creating future partitions, retiring old ones) on stock PostgreSQL with no cold tier required.

How is ColdFront related to pgEdge Enterprise Postgres?

ColdFront is a component of pgEdge Enterprise Postgres, alongside the Agentic AI Toolkit, AI DBA Workbench, Spock, and other capabilities in the Enterprise bundle. It is not a separate product. Customers adopt ColdFront independently of whether they use Spock, the AI Toolkit, or any other Enterprise capability.

Is ColdFront open source?

Yes. The coldfront extension and archiver are fully open source under the PostgreSQL License. The source code is publicly available on GitHub. Enterprise support for ColdFront is included with a pgEdge Enterprise Postgres subscription.

Does ColdFront require Spock?

No. ColdFront works on a single PostgreSQL node with no other pgEdge components required. When Spock is present, ColdFront gains distributed cold writes via the bakery protocol, but Spock is not a prerequisite.

Which PostgreSQL versions does ColdFront support?

ColdFront supports PostgreSQL 17 and 18, with PostgreSQL 16 support in progress. It runs on stock upstream PostgreSQL from the standard community distribution packages. No proprietary fork or vendor-specific distribution is required.

Is ColdFront available in pgEdge Cloud?

Not today. pgEdge Cloud and pgEdge Enterprise Postgres are separate product lines. If ColdFront comes to Cloud in the future, it would be a Cloud-managed feature. This is a separate roadmap item.

How It Works

What are the operating modes?

ColdFront supports three modes, configurable per table, coexisting in the same database:

Tiered (hot + cold): Recent data stays in PostgreSQL heap partitions. A lightweight archiver moves older partitions to Iceberg on a configurable schedule and expires cold data past its retention period. Best for OLTP-heavy workloads with a strong recency pattern.

Decoupled (Iceberg-only): The entire table lives in Iceberg from row one. PostgreSQL becomes a stateless compute front-end. Best for analytic and append-heavy workloads where storage/compute separation matters.

Partition-only (no cold tier): ColdFront manages a partitioned table's lifecycle on stock PostgreSQL with no Iceberg and no extension required. Pre-creates future partitions, retires old ones with DETACH CONCURRENTLY. A table can start here and be upgraded to tiered later.

Do I need to change my application code?

No. ColdFront intercepts SQL at the extension layer. Applications continue using SELECT, INSERT, UPDATE, and DELETE against the same table name. No refactoring, no ORM changes, no new data access patterns. The tiering is a property of the deployment, not the application.

What happens to my existing queries?

They work unchanged. Hot-only queries benefit from PostgreSQL partition pruning and never touch Iceberg. Cold-only queries use DuckDB's Iceberg pruning and Parquet row group skipping. Cross-tier queries span both tiers transparently via a unified view.

Can I UPDATE or DELETE cold (archived) data?

Yes. ColdFront's cold tier is fully writable through the same table name. UPDATE and DELETE work on archived rows without rehydration, special paths, or application-level awareness that a tier boundary exists. This is important for compliance workflows like GDPR right-to-delete, where you need to remove data from every storage tier.

Operators who prefer read-only cold can enforce it by setting coldfront.allow_mixed_writes = off.

What is the archiver?

The archiver is a small static Go binary (~9 MB, no CGO, no runtime dependencies) that runs via cron. It moves expired PostgreSQL partitions to Iceberg, updates the watermark, and manages the full partition lifecycle (pre-create, tier, expire). If it crashes, the next run picks up where it left off. The watermark is the single source of truth for crash recovery.

What is the bakery protocol?

The bakery protocol is ColdFront's distributed mutual exclusion system for serializing Iceberg commits across multiple PostgreSQL nodes. It is based on Lamport's 1978 algorithm with the Ricart-Agrawala deferred-reply optimization, riding pgEdge Spock's per-origin FIFO apply ordering. Safety properties are formally verified in TLA+.

In practice: multiple PostgreSQL nodes can write to the same Iceberg table concurrently with no 409 catalog conflicts and no application-level retry logic. Validated at 756,000 rows per second across a three-node cluster with 90 million rows.

What is "id mode" for primary keys?

PostgreSQL forces a partitioned table's unique key to include the partition column, so a time-partitioned table can't have a clean single-column PRIMARY KEY on just id. ColdFront's id mode partitions by RANGE on a time-ordered identifier (a pgEdge snowflake ID or RFC 9562 UUIDv7), so the partition key is the primary key: a genuine single-column PRIMARY KEY, time-aligned partitions, zero application change.

Does ColdFront support multi-tenant or multi-region sub-partitioning?

Yes. Two-level tables (LIST by tenant, location, or region, then RANGE by time) are managed and tiered as first-class citizens. Each region's time partitions are pre-created and tiered independently, and the hot/cold boundary advances a whole time period at a time across all regions together, so cross-tenant reads never see a gap mid-archive.

Infrastructure and Dependencies

What does the full stack look like?

PostgreSQL 17 or 18 (stock upstream), the pg_duckdb extension (DuckDB running in-process, stock upstream, no fork), the coldfront extension (small C extension for DML rewrite), Lakekeeper (Apache Iceberg REST catalog, single Rust binary), and any S3-compatible object store (AWS S3, MinIO, SeaweedFS, GCS, Azure Blob). The archiver/partitioner is a small Go binary that runs via cron.

Does ColdFront require a separate analytics engine or daemon?

No. All Iceberg I/O goes through DuckDB running as a library call inside the PostgreSQL process via the pg_duckdb extension. There is no separate daemon, no Arrow Flight RPC, no sidecar process. One process, one address space.

What is Lakekeeper?

Lakekeeper is an Apache-licensed Rust binary that implements the standard Iceberg REST catalog protocol. It manages Iceberg metadata, snapshots, and commit concurrency. It runs on its own dedicated PostgreSQL database (not co-located on a ColdFront data node) and is the only external service ColdFront requires beyond object storage.

What S3-compatible stores are supported?

Any. AWS S3, MinIO, SeaweedFS, Google Cloud Storage, Azure Blob Storage, and any other S3-compatible endpoint. Lakekeeper and DuckDB communicate with the object store using standard S3 API calls.

Can I run ColdFront on a single node?

Yes. A single PostgreSQL instance with the two extensions, one Lakekeeper binary, and one S3 bucket is the complete stack. No cluster manager, no consensus group, no minimum-node requirement.

How does ColdFront scale?

Add PostgreSQL nodes for more write throughput and read capacity on the hot tier. Cold reads scale with S3 bandwidth, not PostgreSQL. Physical standbys on any node serve cross-tier read-only traffic. In decoupled mode, new compute nodes spin up in seconds against the same Lakekeeper and S3 with no data sync.

Data Format and Portability

What format is cold data stored in?

Apache Iceberg (Parquet data files plus Iceberg metadata) on S3-compatible object storage. Iceberg is an open specification governed by the Apache Software Foundation. The catalog speaks the standard Iceberg REST protocol.

Can I read cold data without ColdFront?

Yes. Cold data is standard Apache Iceberg, readable by any Iceberg-capable engine or data platform (Spark, Trino, DuckDB standalone, and others) with no ColdFront dependency. Nothing on disk, hot or cold, requires ColdFront's tooling to read.

Does pg_dump still work?

Yes. The hot tier dumps and restores normally with standard pg_dump. The cold tier is standard Iceberg on S3, portable independently. ColdFront's catalog metadata (the tiered_views registry, watermark, and runtime_config) is included in a pg_dump and survives a restore.

What column types are supported?

bigint, integer, smallint, real, double precision, boolean, timestamptz, timestamp, date, time, uuid, text, varchar(N), char(N), bytea, oid, numeric(P,S) with P up to 38, jsonb, json, and interval. Types outside this list are rejected at table-creation time rather than silently downgraded.

jsonb and json are stored as varchar in Iceberg (Iceberg has no JSON primitive) and view-cast back to the rich PostgreSQL type on read. Standard JSON access operators (->>, ->, #>) work unchanged. jsonb-only operators (?, @>) need an explicit data::jsonb cast.

AI and Agentic Workloads

How does ColdFront support AI workloads?

AI agents and ML pipelines need access to the full depth of an organization's data, not just the recent working set. ColdFront keeps the complete dataset queryable through standard PostgreSQL at analytical speed. In decoupled mode, PostgreSQL becomes a stateless compute front-end over Iceberg, so AI workloads can scale compute independently of storage without moving data to a separate system.

Does ColdFront work with the pgEdge MCP Server?

Yes. The pgEdge Postgres MCP Server connects AI assistants and code generators to PostgreSQL through standard database access. ColdFront makes the full data history (hot and cold) available through the same PostgreSQL interface the MCP Server already uses. No special configuration is required.

How does ColdFront help with AI governance and provenance?

Agent governance platforms generate massive volumes of audit logs, decision traces, and policy evaluation records. ColdFront tiers these automatically to Iceberg and keeps them queryable for compliance investigations. When a provenance trace leads back to a stale or erroneous source record in the cold tier, ColdFront's writable cold tier lets you correct the data at source with a standard UPDATE or DELETE, without a restore-archive cycle.

Can ColdFront support real-time context assembly for AI agents?

Yes. Context assembly tools query across recent data (at OLTP speed) and deep historical data (at analytical speed) in the same operation to compile context windows for agent invocations. ColdFront's unified view serves both tiers through one table name and one SQL query. DuckDB in-process gives columnar speed on the historical portion. Hot-tier partition pruning gives index speed on the recent portion. The context assembler gets both in one round-trip.

Competitive Positioning

How is ColdFront different from EDB PGAA (Analytics Accelerator)?

Several key differences:

ColdFront runs on stock upstream PostgreSQL. EDB PGAA requires the EDB proprietary PostgreSQL distribution plus a minimum 3-node PGD cluster for tiered tables.

ColdFront's cold tier is fully writable (UPDATE and DELETE on archived rows through the same table name). EDB PGAA's cold tier is read-only. To write to archived data, you must restore it back to the hot PostgreSQL heap first.

ColdFront runs DuckDB in-process as a library call. EDB PGAA runs Seafowl as a separate daemon communicating via Arrow Flight RPC.

ColdFront supports distributed cold writes across a Spock mesh via the bakery protocol. EDB PGD replicates hot writes but the cold tier is read-only on every node.

ColdFront's cold data is standard Apache Iceberg, readable by any Iceberg-capable tool. EDB PGAA uses a proprietary table access method.

How is ColdFront different from Databricks Lakebase?

Lakebase is a Databricks-managed Neon fork. It is not self-hostable and does not have user-visible data tiering. Cold pages go to S3 in a proprietary chunked format, and open-format access is a one-way Delta/Iceberg mirror queried by Photon on the Databricks side. ColdFront runs on stock PostgreSQL on your infrastructure, stores cold data in standard Apache Iceberg, and is fully self-hostable.

How is ColdFront different from Snowflake pg_lake?

pg_lake supports writable Iceberg tables, but through a separately-named table. The application must know which table to address, and moving data between hot and cold is the application's job. ColdFront provides transparent tiering under one table name with automatic lifecycle management. pg_lake also requires approximately 15 extensions and runs DuckDB as a separate daemon, while ColdFront requires 2 extensions with DuckDB in-process.

What are ColdFront's honest limitations?

Cross-tier atomicity is not crash-safe in permissive mode (a crash mid-commit can orphan S3 objects, which Iceberg housekeeping reclaims). jsonb surfaces as json through the unified view (DuckDB has no native jsonb). Single-node query execution (scaling reads means adding replicas). Iceberg only, no Delta Lake support. Targeted-column INSERT in decoupled mode must supply all columns (upstream duckdb-iceberg limitation).

Getting Started

How do I get started with ColdFront?

ColdFront beta is available on GitHub. The quickstart requires one PostgreSQL instance (17 or 18), the pg_duckdb and coldfront extensions, one Lakekeeper binary, and one S3-compatible bucket. A docker-compose file brings up the full stack for evaluation. Documentation and reference architectures are available at docs.pgedge.com.

What does beta mean?

Both operating modes (tiered and decoupled) work end-to-end. The PG 18 CI matrix is fully green across vanilla and mesh topologies, including physical standby reads. Remaining beta work includes PG 16/17 matrix cells, operational hardening (privilege model, graceful degradation under catalog outage), and production-readiness for physical standby deployments.

Is enterprise support available?

Yes. Enterprise support for ColdFront is included with a pgEdge Enterprise Postgres subscription at no additional cost. This includes 24x7x365 support from pgEdge's PostgreSQL experts with defined SLAs.