A deep dive into Cloudflare's serverless analytics query engine — how it works, how much it costs, and how to use it today
Imagine you have 10 terabytes of log data sitting in an object store. You want to ask questions like:
Traditionally, you'd need to spin up a cluster — Trino, Spark, Presto, or Athena. That means managing servers, worrying about costs when they're idle, and dealing with egress fees when moving data around.
Cloudflare R2 SQL says: What if the query engine lives ON the storage, runs serverlessly, and costs you nothing?
That's exactly what it is — a serverless analytics query engine that runs SQL on Apache Iceberg tables stored in Cloudflare R2, with zero infrastructure to manage and no egress fees ever.
Let's dig into how it actually works.
R2 SQL doesn't just "read all the files." That would be impossibly slow at petabyte scale. Instead, it uses a two-phase approach that's clever as heck.
Before reading ANY data, the planner navigates Iceberg's metadata hierarchy. Iceberg tables store metadata about their own data — think of it as a card catalog at a library. You don't pull every book off the shelf; you check the catalog first.
Layer 1: Table Metadata (JSON)
└─ Points to the current snapshot
Layer 2: Manifest List (partition stats)
└─ Min/max partition values per manifest file
└─ PRUNE: Skip entire manifests that don't match WHERE
Layer 3: Manifest Files (column stats per data file)
└─ Min/max values for each column in each Parquet file
└─ Null counts per column
└─ PRUNE: Skip files where col min/max don't satisfy query
Layer 4: Parquet Row Groups (footer stats)
└─ Statistics inside each Parquet file
└─ PRUNE: Skip row groups within files
Real example: If you query WHERE status_code = 500 and a file's stats show status_code min=200, max=302, the planner says: "This file can't possibly contain status_code=500," and skips it entirely without reading a single byte. This is called metadata pruning, and it's the secret sauce.
Once the planner identifies which files to read, it distributes work across Cloudflare's global network — not centralized servers. Each worker reads only the row groups it needs and streams results back.
For ORDER BY timestamp DESC LIMIT 5 queries, R2 SQL maintains a bounded heap of the top 5 results and a high-water mark. The moment it can prove better results can't exist in unread files — it stops. The official Cloudflare blog notes this can reduce data read to under 10% of matching files.
During open beta, R2 SQL query engine usage is free. You only pay for:
| Service | Cost |
|---|---|
| R2 Storage | $0.015/GB/month |
| Class A Ops (writes) | $4.50/million |
| Class B Ops (reads) | $0.36/million |
| Data Catalog Ops | $9.00/million |
| Data Egress | $0.00 (always free) |
Comparison vs competitors:
| Service | Cost per TB scanned | Egress fees | Server management |
|---|---|---|---|
| Cloudflare R2 SQL | $0 (beta) | $0 | None |
| AWS Athena | ~$5.00/TB | $0.09/GB out | None |
| BigQuery | ~$5.00/TB | $0.12/GB out | None |
| Snowflake | ~$23.00/TB | Varies | Managed |
Cloudflare will give 30 days notice before any query pricing begins.
R2 SQL works with Apache Iceberg tables managed through the R2 Data Catalog. You don't INSERT directly — you ingest via Cloudflare Pipelines.
# 1. Create an R2 bucket
npx wrangler r2 bucket create my-analytics
# 2. Enable Data Catalog on it
npx wrangler r2 bucket catalog enable my-analytics
# → Note the Warehouse ID returned
# 3. Define a schema
cat > schema.json << 'EOF'
{
"fields": [
{"name": "user_id", "type": "string", "required": true},
{"name": "event_type", "type": "string", "required": true},
{"name": "amount", "type": "float64", "required": false}
]
}
EOF
# 4. Create stream + sink + pipeline
npx wrangler pipelines streams create events_stream --schema-file schema.json --http-enabled true
npx wrangler pipelines sinks create events_sink --type "r2-data-catalog" --bucket "my-analytics" --table "ecommerce"
npx wrangler pipelines create events_pipeline --sql "INSERT INTO events_sink SELECT * FROM events_stream"
curl -X POST https://{stream-id}.ingest.cloudflare.com \
-H "Content-Type: application/json" \
-d '[
{"user_id":"user_123","event_type":"purchase","amount":29.99},
{"user_id":"user_456","event_type":"view_product"}
]'
Since R2 Data Catalog exposes a standard Iceberg REST API, you can connect DuckDB, Spark, or PyIceberg directly — and still pay zero egress:
-- DuckDB example
INSTALL iceberg; LOAD iceberg;
CREATE SECRET r2_secret (TYPE ICEBERG, TOKEN 'your-api-token');
ATTACH 'https://catalog.cloudflare.com/...' AS r2_catalog (TYPE ICEBERG);
INSERT INTO r2_catalog.default.ecommerce
SELECT * FROM read_csv_auto('local_data.csv');
export WRANGLER_R2_SQL_AUTH_TOKEN="your-api-token"
# Get your warehouse name
npx wrangler r2 bucket catalog get my-analytics
# Basic query
npx wrangler r2 sql query "YOUR_WAREHOUSE" \
"SELECT * FROM default.ecommerce LIMIT 10"
# Filtered + aggregated
npx wrangler r2 sql query "YOUR_WAREHOUSE" \
"SELECT event_type,
COUNT(*) AS cnt,
ROUND(AVG(amount), 2) AS avg_amount
FROM default.ecommerce
WHERE amount IS NOT NULL
GROUP BY event_type
ORDER BY cnt DESC
LIMIT 10"
curl -X POST \
"https://api.sql.cloudflarestorage.com/api/v1/accounts/{ACCOUNT_ID}/r2-sql/query/my-analytics" \
-H "Authorization: Bearer ${TOKEN}" \
-H "Content-Type: application/json" \
-d '{"query": "SELECT event_type, COUNT(*) AS cnt FROM default.ecommerce GROUP BY event_type ORDER BY cnt DESC LIMIT 10"}'
| Feature | Status | Example |
|---|---|---|
| SELECT with WHERE | Yes | SELECT * FROM t WHERE x > 5 |
| GROUP BY + aggregates | Yes | GROUP BY cat, COUNT(*), AVG(x) |
| ORDER BY / LIMIT | Yes | ORDER BY x DESC LIMIT 100 |
| JOINs (INNER/LEFT/RIGHT) | Yes | JOIN t2 ON t1.id = t2.id |
| CTE (WITH) | Yes | WITH cte AS (...) SELECT * FROM cte |
| Subqueries | Yes | WHERE id IN (SELECT id FROM ...) |
| SHOW DATABASES | Yes | Schema discovery |
| DESCRIBE table | Yes | Schema inspection |
| INSERT/UPDATE/DELETE | No | Append-only via Pipelines |
| CREATE TABLE | No | Auto-created by Pipelines |
| You should use R2 SQL if... | You should stick with alternatives if... |
|---|---|
| Your data is already in R2 | You need transactional (row-level) updates |
| You want zero egress costs | You need real-time streaming queries (<1s latency) |
| You hate managing clusters | You need a specific engine feature not yet supported |
| You're building on Cloudflare Workers | Your analytics tool requires a JDBC/ODBC driver |
Based on Cloudflare's benchmarks and community reports:
Cloudflare's engineering blog notes that for partitioned + clustered tables, they've seen 90%+ file skipping rates for typical analytical queries.
Written by John — Software Engineer, Docker wrangler, and accidental Chromium zombie wrangler. First published June 12, 2026.