# Cloudflare Workers: Patterns That Actually Matter in Production
Workers are real edge compute — not Lambda@Edge duct-taped to CloudFront. But there are a handful of configuration mistakes and runtime pitfalls that will wreck you in production if you don't know about them. Here's what I've learned shipping Workers at scale.
The pitch for Cloudflare Workers is simple: your code runs in 300+ data centers globally, cold start is milliseconds, and you don't manage any servers. The reality mostly holds up. But "zero-config edge" implies a few things that aren't actually true out of the box, and the failure modes are subtle.
## Configuration must-haves

### Set `compatibility_date` to today

Every new Workers project should have a `compatibility_date` set to the current date in `wrangler.jsonc`. This opts you into the latest runtime behavior, APIs, and bug fixes. Older dates preserve legacy behavior for existing projects — fine for stability, but new projects should start current:
```jsonc
{
  "name": "my-worker",
  "compatibility_date": "2026-04-08",
  "compatibility_flags": ["nodejs_compat"]
}
```
### Always enable `nodejs_compat`

Without `nodejs_compat`, imports from `node:crypto`, `node:buffer`, and `node:stream` fail at runtime with cryptic errors. Most non-trivial packages depend on at least one of these. Enable the flag and stop debugging phantom import failures.
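For instance, a minimal sketch of the kind of import that only resolves with the flag set:

```ts
// Works only with nodejs_compat enabled; without it, node: imports
// fail at runtime with "no such module" style errors
import { createHash } from "node:crypto";

export default {
  async fetch(request: Request): Promise<Response> {
    const body = await request.text();
    const digest = createHash("sha256").update(body).digest("hex");
    return new Response(digest);
  },
};
```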
### Never hand-write your `Env` interface

If you're using TypeScript, run `npx wrangler types` to auto-generate your `Env` interface from your actual `wrangler.jsonc` bindings. Hand-writing it means it will drift from reality. Add `--check` in CI to fail if types are stale:
```sh
# Generate types locally
npx wrangler types

# CI check — exits 1 if wrangler.jsonc and generated types are out of sync
npx wrangler types --check
```
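By default the output lands in `worker-configuration.d.ts`. As a rough illustration (the real entries mirror whatever bindings your `wrangler.jsonc` declares, so this exact shape is hypothetical):

```ts
// worker-configuration.d.ts: generated, don't edit by hand.
// Illustrative only; entries come from your actual bindings.
interface Env {
  AUTH_SERVICE: Service; // service binding
  HYPERDRIVE: Hyperdrive; // hyperdrive binding
  DATABASE_URL: string; // secret
}
```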
## Streaming bodies — the 128 MB wall
Workers have a 128 MB memory limit. That's not a theoretical concern — it bites you as soon as you start buffering response bodies. The classic mistake:
```ts
// WRONG: buffers the entire response body into memory
const response = await fetch(upstreamUrl);
const text = await response.text(); // 150 MB → OOM
```

```ts
// RIGHT: stream the body directly without buffering
const response = await fetch(upstreamUrl);
return new Response(response.body, {
  status: response.status,
  headers: response.headers,
});
```
If you absolutely need to inspect or modify the body, enforce a size limit before reading:
```ts
export default {
  async fetch(request: Request): Promise<Response> {
    const MAX_BYTES = 10 * 1024 * 1024; // 10 MB
    const contentLength = Number(request.headers.get("content-length") ?? 0);
    if (contentLength > MAX_BYTES) {
      return new Response("Request too large", { status: 413 });
    }
    const body = await request.arrayBuffer();
    // process body...
    return new Response("OK");
  },
};
```
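If the change is simple enough to apply chunk by chunk, you can also transform while streaming and never hold the full body. A minimal sketch using `TransformStream`; the uppercasing step is a stand-in for real logic, and the upstream URL is hypothetical:

```ts
export default {
  async fetch(request: Request): Promise<Response> {
    const upstream = await fetch("https://example.com/large.txt");
    // Transform chunks as they pass through; memory stays bounded
    const upper = new TransformStream<string, string>({
      transform(chunk, controller) {
        controller.enqueue(chunk.toUpperCase());
      },
    });
    const body = upstream.body!
      .pipeThrough(new TextDecoderStream())
      .pipeThrough(upper)
      .pipeThrough(new TextEncoderStream());
    return new Response(body, { status: upstream.status });
  },
};
```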
## `waitUntil` for post-response work

`ctx.waitUntil()` lets you run work after the response has been sent — analytics, cache writes, logging that isn't on the critical path. The response returns fast; the work continues behind the scenes for up to 30 seconds.
```ts
export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    const response = await handleRequest(request, env);
    // Fire and forget — happens after response is sent
    ctx.waitUntil(logToAnalytics(request, response, env));
    return response;
  },
};
```
One gotcha: `waitUntil` must be called as a method on `ctx`. Detach it and the call loses its `this` binding and throws at runtime:

```ts
const { waitUntil } = ctx;
waitUntil(promise); // ❌ throws: method lost its receiver
```

Always call it directly:

```ts
ctx.waitUntil(promise); // ✅
```
## Queues vs Workflows: which one you actually need
Both handle async work, but they're different tools. Getting this wrong means either over-engineering or hitting a wall when your use case doesn't fit.
| Reach for Queues when… | Reach for Workflows when… |
|---|---|
| Decoupling producer from consumer | Multiple dependent steps |
| Fan-out (one event → many consumers) | Step results must be persisted |
| Buffering or batching messages | Only failed steps should retry (not the whole job) |
| Simple single-step background jobs | Long-running processes (hours or days) |
| At-least-once delivery with retries | Human-approval steps via `step.waitForEvent()` |
The patterns compose: use a Queue for high-throughput ingestion, then have the consumer trigger a Workflow per item for complex multi-step fulfillment. Queue handles the burst; Workflow handles the durability.
```ts
// Queue consumer triggers a Workflow per message
export default {
  async queue(batch: MessageBatch<{ orderId: string }>, env: Env): Promise<void> {
    for (const msg of batch.messages) {
      await env.MY_WORKFLOW.create({
        id: msg.body.orderId,
        params: msg.body,
      });
      msg.ack();
    }
  },
};
```
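For completeness, a minimal sketch of what `MY_WORKFLOW` might point at; `chargeCustomer` and `sendEmail` are hypothetical helpers stubbed out here:

```ts
import { WorkflowEntrypoint, WorkflowStep, WorkflowEvent } from "cloudflare:workers";

type OrderParams = { orderId: string };

// Hypothetical fulfillment helpers, stubbed for the sketch
async function chargeCustomer(orderId: string): Promise<string> {
  return `receipt-${orderId}`;
}
async function sendEmail(orderId: string, receipt: string): Promise<void> {}

export class OrderWorkflow extends WorkflowEntrypoint<Env, OrderParams> {
  async run(event: WorkflowEvent<OrderParams>, step: WorkflowStep) {
    // Each step's return value is persisted; a crash after "charge payment"
    // resumes at "send confirmation" without re-charging
    const receipt = await step.do("charge payment", async () =>
      chargeCustomer(event.payload.orderId)
    );
    await step.do("send confirmation", () =>
      sendEmail(event.payload.orderId, receipt)
    );
  }
}
```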
## Service bindings for Worker-to-Worker

If you have multiple Workers and one needs to call another, don't make an HTTP request to the public URL. Use service bindings: they're free, bypass the public internet, and support type-safe RPC via `WorkerEntrypoint`.
```ts
// auth-worker.ts
import { WorkerEntrypoint } from "cloudflare:workers";

export default class AuthService extends WorkerEntrypoint<Env> {
  async verifyToken(token: string): Promise<boolean> {
    // Your auth logic here
    return token === await this.env.DB.prepare(
      "SELECT token FROM sessions WHERE token = ?"
    ).bind(token).first("token");
  }
}
```
```ts
// api-worker.ts — calls auth-worker via binding, no HTTP
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const token = request.headers.get("Authorization")?.slice(7) ?? "";
    const valid = await env.AUTH_SERVICE.verifyToken(token);
    if (!valid) return new Response("Unauthorized", { status: 401 });
    // ...
    return new Response("OK");
  },
};
```
Declare the binding in `wrangler.jsonc`:
```jsonc
{
  "services": [
    { "binding": "AUTH_SERVICE", "service": "auth-worker" }
  ]
}
```
## Hyperdrive for external databases
Every request to an external Postgres or MySQL database pays a TCP + TLS + auth handshake overhead. From a Cloudflare edge node, that's often 300–500ms before your first query even runs. Hyperdrive maintains a regional connection pool close to your database and eliminates that overhead.
```ts
import { Client } from "pg";

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Create a new Client per request — Hyperdrive manages the actual pool
    const client = new Client({ connectionString: env.HYPERDRIVE.connectionString });
    await client.connect();
    try {
      const result = await client.query("SELECT * FROM posts LIMIT 10");
      return Response.json(result.rows);
    } finally {
      await client.end();
    }
  },
};
```
Requires `nodejs_compat` and a Hyperdrive binding in `wrangler.jsonc`:
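The binding looks like this; the `id` comes from `npx wrangler hyperdrive create`:

```jsonc
{
  "hyperdrive": [
    // id is printed by: npx wrangler hyperdrive create my-db --connection-string=...
    { "binding": "HYPERDRIVE", "id": "<your-hyperdrive-config-id>" }
  ]
}
```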
If you're hitting an external database from Workers without Hyperdrive, you're leaving significant
latency on the table.
## Secrets: `wrangler secret put`, never in code

API keys, database URLs, OAuth secrets — none of these belong in `wrangler.jsonc` or source code:
```sh
# Deploy a secret (takes effect immediately)
wrangler secret put DATABASE_URL

# Stage a secret without deploying (use with gradual rollouts)
wrangler versions secret put DATABASE_URL

# In local dev: .dev.vars file (git-ignored, never committed)
# DATABASE_URL=postgres://localhost/mydb
```
Access secrets in the Worker exactly like any other binding: `env.DATABASE_URL`. No special handling needed — they're just environment variables that Cloudflare encrypts at rest and injects at runtime.
## Custom domains vs routes — the DNS mistake everyone makes
These are different things with different DNS requirements:
- Custom domain: the Worker IS the origin. Cloudflare creates the DNS record and SSL cert automatically. Use this when there's no backend server.
- Route: the Worker runs in front of an existing origin. Requires a proxied (orange-cloud) DNS record to already exist.
The common failure: you add a route for `api.example.com/v2/*` but there's no DNS record for `api.example.com`. Result: `ERR_NAME_NOT_RESOLVED` and an hour of debugging.
Fix: if you're using a route but have no real origin behind it, add a proxied `AAAA` record pointing to `100::` as a placeholder. Cloudflare intercepts the request before it ever reaches that address.
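For reference, both shapes in `wrangler.jsonc` (a sketch; the hostnames and patterns are illustrative):

```jsonc
{
  "routes": [
    // Custom domain: the Worker is the origin, Cloudflare manages DNS + cert
    { "pattern": "api.example.com", "custom_domain": true },
    // Route: the Worker runs in front of an existing proxied DNS record
    { "pattern": "api.example.com/v2/*", "zone_name": "example.com" }
  ]
}
```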
## Global mutable state will betray you
Workers reuse isolates across requests. Module-level variables persist between requests in the same isolate. This is intentional for performance (it's how you cache model weights or DB connections), but it means you can accidentally leak data between requests:
```ts
// WRONG — currentUser bleeds across requests
let currentUser: User | null = null;

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    currentUser = await getUser(request, env); // race condition
    const data = await getDataFor(currentUser);
    return Response.json(data);
  },
};
```

```ts
// RIGHT — pass state through function arguments
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const user = await getUser(request, env);
    const data = await getDataFor(user);
    return Response.json(data);
  },
};
```
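For contrast, the legitimate version of module-level state caches request-independent data that's expensive to rebuild. A sketch with a hypothetical `loadSchemas` helper:

```ts
// OK: lazily initialized, request-independent, shared on purpose
let schemaCache: Map<string, string> | null = null;

// Hypothetical loader, stubbed for the sketch
async function loadSchemas(env: Env): Promise<Map<string, string>> {
  return new Map([["/api/users", "{}"]]);
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Populated once per isolate, reused by every later request
    schemaCache ??= await loadSchemas(env);
    const schema = schemaCache.get(new URL(request.url).pathname);
    return schema
      ? new Response(schema, { headers: { "content-type": "application/json" } })
      : new Response("Not found", { status: 404 });
  },
};
```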
## Observability: turn it on before you need it

Enable logs and traces in `wrangler.jsonc` before you deploy to production. You want the data already collected when the first intermittent error appears, not after:
```jsonc
{
  "observability": {
    "enabled": true,
    "logs": { "head_sampling_rate": 1 },
    "traces": { "enabled": true, "head_sampling_rate": 0.01 }
  }
}
```
Use structured JSON logs, not plain strings. They're queryable in the dashboard:
```ts
// Queryable
console.log(JSON.stringify({ event: "request", path: url.pathname, status: 200, durationMs: 42 }));

// Not queryable
console.log("Request to /api/users completed in 42ms");
```
`console.error()` maps to "error" severity; `console.warn()` maps to "warning". Use them consistently so your alerts fire on the right things.
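A tiny helper can keep both habits in one place: structured payloads plus the right severity. A sketch, not a library:

```ts
type Severity = "log" | "warn" | "error";

// Structured JSON through the matching console method, so dashboard
// queries and severity-based alerts both keep working
function log(severity: Severity, event: string, fields: Record<string, unknown> = {}) {
  console[severity](JSON.stringify({ event, ...fields }));
}

log("log", "request", { path: "/api/users", status: 200, durationMs: 42 });
log("error", "upstream_timeout", { upstream: "payments", timeoutMs: 5000 });
```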
## Security: two things that are easy to get wrong

### Token comparison timing attacks
Direct string comparison leaks timing information. Use constant-time comparison instead:
```ts
async function verifyToken(provided: string, expected: string): Promise<boolean> {
  const encoder = new TextEncoder();
  const [providedHash, expectedHash] = await Promise.all([
    crypto.subtle.digest("SHA-256", encoder.encode(provided)),
    crypto.subtle.digest("SHA-256", encoder.encode(expected)),
  ]);
  return crypto.subtle.timingSafeEqual(providedHash, expectedHash);
}
```
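Hashing both values first also normalizes length, which matters because `timingSafeEqual` (a Workers-specific addition to `crypto.subtle`, not standard WebCrypto) throws when the buffers differ in length. Usage, with `env.API_TOKEN` as a hypothetical secret:

```ts
// Inside a fetch handler; API_TOKEN set via `wrangler secret put API_TOKEN`
const providedToken = request.headers.get("Authorization")?.slice(7) ?? "";
const ok = await verifyToken(providedToken, env.API_TOKEN);
if (!ok) return new Response("Unauthorized", { status: 401 });
```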
### Random values for security-sensitive operations

`Math.random()` is not cryptographically secure. For tokens, IDs, and anything security-related:
```ts
// Unique ID
const id = crypto.randomUUID();

// Random bytes
const bytes = new Uint8Array(32);
crypto.getRandomValues(bytes);
```
## The non-obvious one: no floating promises
An unawaited promise in a Worker is silently dropped when the isolate terminates. The work doesn't run. The error doesn't surface. You just lose data:
```ts
// WRONG — this analytics write may never complete
writeAnalytics(event); // floating promise

// RIGHT — awaited
await writeAnalytics(event);

// RIGHT — if you don't want to block the response
ctx.waitUntil(writeAnalytics(event));
```
Enable `@typescript-eslint/no-floating-promises` in your ESLint config and let the linter catch these before they reach production.
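The rule needs type information to know what returns a promise, so enable type-aware linting. A minimal flat-config sketch, assuming the `typescript-eslint` package:

```js
// eslint.config.mjs: a sketch using typescript-eslint's flat config
import tseslint from "typescript-eslint";

export default tseslint.config(
  ...tseslint.configs.recommendedTypeChecked,
  {
    languageOptions: {
      // Type-aware linting is required for no-floating-promises
      parserOptions: { projectService: true, tsconfigRootDir: import.meta.dirname },
    },
    rules: {
      "@typescript-eslint/no-floating-promises": "error",
    },
  }
);
```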