Sergen Tanguc

MyPdfBoy

Pixel-perfect PDFs from headless Chrome, every time, every environment

TypeScript service for reliable PDF generation — headless Chrome rendering, template management, and predictable output across environments.

TypeScript PDF Puppeteer Node.js Service

Why I Built This

PDF generation has a reputation for being easy. It isn’t. The first time I wired up a Puppeteer script to render an invoice, it worked on my machine, produced garbage on the staging server (different font rendering), and then hung indefinitely under load because nobody told Chromium it had a deadline.

I went through a few iterations of “just fix this one issue” before I accepted that the problem wasn’t the rendering — Puppeteer handles that fine. The problem was everything around it: warm instances, clean shutdown, predictable font behavior across environments, and a way to reproduce a document exactly as it looked when it was first generated.

MyPdfBoy is what I landed on after all those fixes accumulated into something worth packaging as a standalone service.

The Core Contract

The API is deliberately simple. You POST a template name and a JSON data payload, and you get back a PDF binary. That’s it.

POST /render
Content-Type: application/json

{
  "template": "invoice",
  "version": "2.1.0",
  "data": {
    "invoiceNumber": "INV-2024-0047",
    "issuedAt": "2024-11-15",
    "client": {
      "name": "Acme Corp",
      "address": "742 Evergreen Terrace, Springfield"
    },
    "lineItems": [
      { "description": "Backend development", "qty": 10, "unitPrice": 750 },
      { "description": "Code review", "qty": 3, "unitPrice": 350 }
    ],
    "vatRate": 0.20
  }
}

HTTP/1.1 200 OK
Content-Type: application/pdf
Content-Disposition: attachment; filename="INV-2024-0047.pdf"

<binary PDF content>

The version field is optional — omitting it uses the latest published template. Including it pins the render to a specific template version, which matters when you need to reproduce a document months later exactly as it was generated.
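The pin-or-latest rule is easy to sketch. The names below (RenderRequest, compareSemver, resolveTemplateVersion) are hypothetical and not taken from the service, and the sketch assumes plain MAJOR.MINOR.PATCH versions with no pre-release tags.

```typescript
// Hypothetical request shape, mirroring the JSON body shown above.
interface RenderRequest {
  template: string;
  version?: string; // optional: omit to use the latest published version
  data: Record<string, unknown>;
}

// Compare two "MAJOR.MINOR.PATCH" strings numerically (assumes no pre-release tags).
function compareSemver(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < 3; i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// Return the pinned version when one is requested, otherwise the highest published one.
function resolveTemplateVersion(req: RenderRequest, published: string[]): string {
  if (published.length === 0) {
    throw new Error(`No published versions for template "${req.template}"`);
  }
  if (req.version !== undefined) {
    if (!published.includes(req.version)) {
      throw new Error(`Template "${req.template}" has no version ${req.version}`);
    }
    return req.version;
  }
  return published.reduce((best, v) => (compareSemver(v, best) > 0 ? v : best));
}
```

Comparing the parts numerically matters: a plain string sort would rank 2.10.0 below 2.9.0.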

Browser Pool and Cold Starts

The biggest latency problem with headless Chrome is launch time. A cold Chromium start takes anywhere from 800ms to 2.5 seconds depending on the host, which is catastrophic if every render request has to pay that cost.

The service maintains a small pool of warm browser instances — typically two or three, configurable by environment. When a render request comes in, it grabs an available instance from the pool, creates a new page (cheap), renders, closes the page, and returns the instance. Pool acquisition uses a simple queue with a configurable wait timeout: if no instance becomes available within that window, the request fails fast rather than piling up behind a backed-up queue.
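A minimal sketch of that acquire/release queue follows. It is generic over the instance type so it stays self-contained; in the service the instances would be Puppeteer browsers. The class name InstancePool and its method names are assumptions for illustration, not the service's actual code.

```typescript
// One queued caller waiting for an instance, plus the timer that fails it fast.
interface Waiter<T> {
  resolve: (item: T) => void;
  timer: ReturnType<typeof setTimeout>;
}

// Hypothetical pool; T stands in for a warm browser instance.
class InstancePool<T> {
  private readonly idle: T[];
  private waiters: Waiter<T>[] = [];

  constructor(instances: T[], private readonly waitTimeoutMs: number) {
    this.idle = [...instances];
  }

  // Resolve immediately if an instance is idle; otherwise queue with a deadline.
  acquire(): Promise<T> {
    const item = this.idle.pop();
    if (item !== undefined) return Promise.resolve(item);

    return new Promise<T>((resolve, reject) => {
      const waiter: Waiter<T> = {
        resolve,
        timer: setTimeout(() => {
          // Fail fast: drop this waiter from the queue and reject.
          this.waiters = this.waiters.filter((w) => w !== waiter);
          reject(new Error(`No instance available within ${this.waitTimeoutMs}ms`));
        }, this.waitTimeoutMs),
      };
      this.waiters.push(waiter);
    });
  }

  // Hand the instance to the oldest waiter, or park it as idle.
  release(item: T): void {
    const next = this.waiters.shift();
    if (next) {
      clearTimeout(next.timer);
      next.resolve(item);
    } else {
      this.idle.push(item);
    }
  }

  // Run a job with an instance and always return the instance afterwards.
  async withInstance<R>(job: (item: T) => Promise<R>): Promise<R> {
    const item = await this.acquire();
    try {
      return await job(item);
    } finally {
      this.release(item);
    }
  }
}
```

Wrapping the render in withInstance means a thrown error still returns the instance, so a failed render cannot leak capacity.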

On startup, the pool spins up asynchronously in the background. The first few requests may hit a cold instance if they arrive before the pool is warm, but steady-state p99 latency drops substantially compared to launch-per-render. In practice I see roughly 120ms p50 for a typical invoice template once the pool is warm, versus 1.8–2.2s cold.

Timeout Enforcement and Process Recycling

A hung render is one of the nastier failure modes in practice. It happens when a template references an external resource (a font CDN, a logo URL) that times out, or when the page layout triggers an infinite CSS recalculation. Without a hard deadline, the render job sits there consuming a browser instance indefinitely, starving out the pool.

Every render runs inside a Promise.race between the actual render and a timeout. Promise.race does not cancel the losing promise, so when the timeout wins, the service hard-kills the page and returns the instance to the pool. The timeout is set per request, with a hard cap enforced server-side so callers cannot request unbounded render time.
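Here is a rough sketch of the deadline race and the server-side cap. The 30-second HARD_CAP_MS is an assumed value, and effectiveTimeout, renderWithDeadline and the onTimeout hook are hypothetical names. In the service the hook would be where the page gets killed.

```typescript
const HARD_CAP_MS = 30_000; // assumed ceiling for illustration, not the service's real value

// Clamp the caller's requested timeout to the server-side cap.
function effectiveTimeout(requestedMs: number | undefined, capMs: number = HARD_CAP_MS): number {
  const wanted = requestedMs ?? capMs;
  return Math.min(Math.max(wanted, 1), capMs);
}

// Race the render against a deadline. onTimeout runs only when the deadline wins.
async function renderWithDeadline<T>(
  render: () => Promise<T>,
  timeoutMs: number,
  onTimeout: () => Promise<void> | void,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  let timedOut = false;

  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      timedOut = true;
      reject(new Error(`Render exceeded ${timeoutMs}ms`));
    }, timeoutMs);
  });

  try {
    return await Promise.race([render(), deadline]);
  } catch (err) {
    // Promise.race does not stop the losing render, so clean up explicitly.
    if (timedOut) await onTimeout();
    throw err;
  } finally {
    clearTimeout(timer); // avoid a dangling timer once either side settles
  }
}
```

A call might look like renderWithDeadline(() => renderPage(), effectiveTimeout(requested), () => killPage()), where renderPage and killPage are placeholders for the real work.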

Process recycling goes a step further: each browser instance tracks how many renders it has served. After a configured render count (default: 200), the instance is gracefully drained (no new pages are opened, and any in-flight page is allowed to finish), then shut down and replaced with a fresh one. This bounds the memory growth that accumulates quietly in long-running Chromium processes and eventually destabilizes renders.
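The drain-then-replace bookkeeping might look something like this. InstanceLifecycle and its methods are hypothetical names. The sketch tracks counts only and leaves the actual browser shutdown and relaunch to the caller.

```typescript
// Hypothetical per-instance tracker; maxRenders mirrors the default of 200 described above.
class InstanceLifecycle {
  private served = 0;
  private inFlight = 0;
  private draining = false;
  private onDrained: (() => void) | null = null;

  constructor(private readonly maxRenders: number = 200) {}

  // Returns false once the instance is draining: no new pages are accepted.
  tryBeginRender(): boolean {
    if (this.draining) return false;
    this.inFlight++;
    this.served++;
    if (this.served >= this.maxRenders) this.draining = true;
    return true;
  }

  // Call when a page has been closed, whether the render succeeded or failed.
  endRender(): void {
    this.inFlight--;
    if (this.draining && this.inFlight === 0 && this.onDrained) {
      this.onDrained();
      this.onDrained = null;
    }
  }

  get shouldRecycle(): boolean {
    return this.draining;
  }

  // Resolves once the render limit is reached and no page is in flight.
  // This sketch supports a single waiter, which is all a pool supervisor needs.
  whenDrained(): Promise<void> {
    if (this.draining && this.inFlight === 0) return Promise.resolve();
    return new Promise<void>((resolve) => {
      this.onDrained = resolve;
    });
  }
}
```

A supervisor would await whenDrained(), close the old browser, and launch a replacement before adding it back to the pool.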

Template Versioning

Templates are HTML files with a Handlebars-style data injection layer. Each template directory is versioned with a semantic version string:

templates/
  invoice/
    2.0.0/
      index.html
      styles.css      <- inlined at publish time
      manifest.json
    2.1.0/
      index.html
      styles.css
      manifest.json
  contract/
    1.0.0/
      ...

Styles are inlined into the HTML at template publish time, not at render time. This means each versioned template is a self-contained snapshot: no external stylesheets, no font CDN calls, no runtime dependencies. The browser renders exactly what the template contains, which is why font and layout behavior is consistent across environments.

The manifest.json per version contains metadata about the template — required data fields, optional fields with defaults, the schema version used for validation. The service validates incoming data against the manifest before passing it to the renderer, so type mismatches fail early with a descriptive error rather than silently producing a malformed PDF.
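As an illustration of the early-failure behavior, here is a small validator. The manifest shape (TemplateManifest with required and optional maps) is an assumption made for this sketch; the real manifest.json layout is not shown in this write-up.

```typescript
type FieldType = "string" | "number" | "boolean" | "object" | "array";

// Hypothetical manifest shape, invented for this example.
interface TemplateManifest {
  required: Record<string, FieldType>;
  optional: Record<string, { type: FieldType; default: unknown }>;
}

// Like typeof, but distinguishes arrays and null from plain objects.
function typeOf(value: unknown): FieldType | "null" | "other" {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  const t = typeof value;
  return t === "string" || t === "number" || t === "boolean" || t === "object" ? t : "other";
}

// Returns the data with defaults applied, or throws listing every problem found.
function validateAgainstManifest(
  manifest: TemplateManifest,
  data: Record<string, unknown>,
): Record<string, unknown> {
  const errors: string[] = [];
  const result: Record<string, unknown> = { ...data };

  for (const [field, type] of Object.entries(manifest.required)) {
    if (!(field in data)) {
      errors.push(`missing required field "${field}"`);
    } else if (typeOf(data[field]) !== type) {
      errors.push(`field "${field}" should be ${type}, got ${typeOf(data[field])}`);
    }
  }

  for (const [field, spec] of Object.entries(manifest.optional)) {
    if (!(field in data)) {
      result[field] = spec.default; // fill in the declared default
    } else if (typeOf(data[field]) !== spec.type) {
      errors.push(`field "${field}" should be ${spec.type}, got ${typeOf(data[field])}`);
    }
  }

  if (errors.length > 0) {
    throw new Error(`Invalid template data: ${errors.join("; ")}`);
  }
  return result;
}
```

Collecting every error before throwing gives the caller one complete message instead of a fix-and-retry loop.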

Output Validation

“Output validation” is an easy thing to hand-wave over, but the failure modes it catches are real. The service runs three checks on every rendered PDF before returning it to the caller:

  1. Minimum size check — a PDF below a threshold (configurable, default 5KB) is almost certainly a rendering failure. Chromium will happily produce a structurally valid but empty PDF if the template failed to render for any reason. This catches it.

  2. PDF header check — the output is read and its first bytes are checked for the %PDF- magic header. This catches the case where the render pipeline returned a non-PDF response, which has happened during Puppeteer version mismatches.

  3. Page count check — for templates that declare an expected page range in their manifest, the rendered PDF is introspected to confirm the page count falls within bounds. A five-page contract template that renders as one page usually means data injection failed silently.

Failures at any of these stages return a 422 with a structured error body and log the raw rendered output for debugging. They do not return a broken PDF to the caller.
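The three checks can be sketched as a single function. OutputRules, ValidationResult and validatePdfOutput are hypothetical names, and the page count is assumed to come from a separate PDF introspection step that is not shown here.

```typescript
interface OutputRules {
  minBytes: number; // the text above describes a 5KB default
  pageRange?: { min: number; max: number }; // from the template manifest, if declared
}

type ValidationResult =
  | { ok: true }
  | { ok: false; check: "size" | "header" | "pageCount"; message: string };

const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d]; // the bytes of "%PDF-"

// pageCount is supplied by the caller; how it is extracted is outside this sketch.
function validatePdfOutput(pdf: Uint8Array, pageCount: number, rules: OutputRules): ValidationResult {
  // 1. Minimum size: a tiny PDF is almost certainly an empty render.
  if (pdf.length < rules.minBytes) {
    return { ok: false, check: "size", message: `PDF is ${pdf.length} bytes, expected at least ${rules.minBytes}` };
  }

  // 2. Magic header: the first five bytes must spell "%PDF-".
  const hasMagic = PDF_MAGIC.every((byte, i) => pdf[i] === byte);
  if (!hasMagic) {
    return { ok: false, check: "header", message: "Output does not start with %PDF-" };
  }

  // 3. Page count: only enforced when the manifest declares a range.
  const range = rules.pageRange;
  if (range && (pageCount < range.min || pageCount > range.max)) {
    return {
      ok: false,
      check: "pageCount",
      message: `Expected ${range.min} to ${range.max} pages, got ${pageCount}`,
    };
  }

  return { ok: true };
}
```

Returning a tagged result rather than throwing makes it straightforward to map a failure onto the structured 422 body.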

What It Handles in Practice

Invoice generation, contract templates, report exports — anything where the layout is defined ahead of time and the data arrives at runtime. The service runs as a sidecar to the applications that need it rather than being embedded in them. That boundary matters: it keeps the PDF concern isolated, independently deployable, and easy to version separately from the apps consuming it.

Status

Private repository. The service is in production use for internal projects.