Pretext vs DOM Reflow: Real Benchmarks for Streaming AI Interfaces

Every token your AI streams raises a question: how tall is this text now? Your UI needs the answer for auto-scroll, layout, and overflow detection. The traditional answer is getBoundingClientRect(). The cost of that answer is a forced synchronous reflow whenever the DOM has changed since the last layout.

Cheng Lou's Pretext (@chenglou/pretext on npm) sidesteps the DOM entirely. Pure JavaScript text measurement using canvas font metrics and arithmetic. No reflow. No layout thrashing. ~15KB gzipped, zero dependencies.

I benchmarked both approaches across 5 scenarios, 1000 iterations each, measuring prepare() + layout() against getBoundingClientRect() on a real element.

Results

Scenario	Chars	DOM Reflow	Pretext	Speedup
Short (1 token)	4	0.024ms	0.011ms	2.2x
Word (5 tokens)	22	0.026ms	0.023ms	1.1x
Sentence	75	0.032ms	0.030ms	1.1x
Paragraph	235	0.089ms	0.052ms	1.7x
Streaming (growing)	356	0.122ms	0.070ms	1.7x

The layout() function alone, called on an already-prepared handle, averaged 0.00052ms. That is the cached hot path.

Why This Matters for Streaming

Modern AI interfaces stream responses token by token. Gemma 4, GPT-5, Claude -- they all support SSE streaming. A typical response is 100-300 tokens arriving at roughly 20 tokens per second.

Each token changes the text content. Something needs to measure the new height to decide: should I auto-scroll? Did the bubble grow past the viewport?

At 20 tokens/sec with the longest benchmark text (356 chars):

Method	Cost per second of streaming	As % of one 16.67ms frame budget
DOM reflow	2.45ms	14.7%
Pretext (prepare + layout)	1.40ms	8.4%
Pretext layout() only, rAF throttled	0.06ms	0.36%
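
The first two rows can be rechecked with a few lines of arithmetic. This is a minimal sketch, assuming a 60Hz display and using the rounded per-call timings from the Results table; the function names are mine. With those rounded inputs the DOM row comes out at 2.44ms and about 14.6%, so the 2.45ms and 14.7% above presumably come from unrounded timings.

```javascript
// Per-call costs (ms) are copied from the streaming row of the Results table.
const FRAME_BUDGET_MS = 1000 / 60; // ~16.67ms, assuming a 60Hz display
const TOKENS_PER_SEC = 20;

// Total measurement cost accumulated over one second of streaming.
function costPerSecond(perCallMs, callsPerSec) {
  return perCallMs * callsPerSec;
}

// Express a cost in ms as a percentage of a single frame's budget.
function shareOfFrame(ms) {
  return (ms / FRAME_BUDGET_MS) * 100;
}

const domMs = costPerSecond(0.122, TOKENS_PER_SEC);
const pretextMs = costPerSecond(0.070, TOKENS_PER_SEC);

console.log('DOM reflow:', domMs.toFixed(2), 'ms/sec,', shareOfFrame(domMs).toFixed(1), '% of one frame');
console.log('Pretext:', pretextMs.toFixed(2), 'ms/sec,', shareOfFrame(pretextMs).toFixed(1), '% of one frame');
```

Note that the percentage compares a full second of measurement work against the budget of a single frame, which is how the table is built.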

Measurement work equal to 14.7% of one frame's budget, spread across each second of streaming, is not catastrophic on Chrome. Safari is slower at DOM reflow, and mobile Safari slower still, so on a device like the 2020 iPhone SE expect reflow to cost more than these desktop numbers.

The optimal strategy: call prepare() when the text content changes (a new token arrives), but throttle layout() calls to requestAnimationFrame. Since layout() operates on the cached prepared handle, each per-frame height check costs about 0.00052ms. Summed over a second, the table above puts that at 0.06ms, or 0.36% of one frame's budget. Effectively free.

The Streaming Pattern

import { prepare, layout } from '@chenglou/pretext'

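let prepared = null
let buffer = ''

// bubbleMaxWidth, containerHeight, and scrollToBottom() come from the surrounding app

function onToken(token) {
  buffer += token
  // prepare() on content change
  prepared = prepare(buffer, '14px Inter')
}

function checkHeight() {
  // Reschedule first so the loop keeps running before the first token arrives
  requestAnimationFrame(checkHeight)
  if (!prepared) return
  // 20 is the line height in px
  const { height } = layout(prepared, bubbleMaxWidth, 20)
  if (height > containerHeight) scrollToBottom()
}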

requestAnimationFrame(checkHeight)

prepare() runs per token. layout() runs per frame. The two concerns are decoupled. Text segmentation and canvas measurement happen when content changes. Height calculation happens when the browser is ready to paint.

What Pretext Is Not

Pretext is not a rendering engine. It does not draw text. It does not replace CSS. It answers one question: given this text, this font, and this container width, how many lines and what height?

That question comes up more often than most developers realize:

Streaming AI responses (auto-scroll)
Virtual scrolling (height prediction without mounting DOM nodes)
Chat bubble shrink-wrapping (tightest width that preserves line count)
Adaptive font sizing (binary search for largest size that fits bounds)
Layout shift prevention (pre-calculate height before inserting content)
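
Two of these, adaptive font sizing and shrink-wrapping, reduce to a binary search over a measurement that only moves in one direction. Here is a minimal sketch of both. The callbacks measureHeight and countLines are hypothetical and injected, so the code runs without a browser. The comments show how they might wrap Pretext, assuming layout() also reports a line count and assuming a 1.4 line-height ratio.

```javascript
// Largest integer font size (px) in [minPx, maxPx] whose measured height fits maxHeight.
// `measureHeight(px)` is a hypothetical callback, assumed non-decreasing in px.
// With Pretext it might be:
//   (px) => layout(prepare(text, `${px}px Inter`), width, px * 1.4).height
function largestFittingFontSize(measureHeight, maxHeight, minPx, maxPx) {
  let lo = minPx;
  let hi = maxPx;
  let best = null; // stays null if even minPx overflows
  while (lo <= hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (measureHeight(mid) <= maxHeight) {
      best = mid; // fits, try a larger size
      lo = mid + 1;
    } else {
      hi = mid - 1; // overflows, try a smaller size
    }
  }
  return best;
}

// Narrowest integer width (px) in [minWidth, maxWidth] that keeps the line count
// measured at maxWidth. `countLines(width)` is a hypothetical callback, assumed
// non-increasing in width. With Pretext it might be:
//   (w) => layout(prepared, w, 20).lineCount
function tightestWidth(countLines, minWidth, maxWidth) {
  const target = countLines(maxWidth);
  let lo = minWidth;
  let hi = maxWidth;
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (countLines(mid) <= target) hi = mid; // still fits, try narrower
    else lo = mid + 1; // wrapped onto an extra line, go wider
  }
  return lo;
}
```

Because the shrink-wrap search reuses one prepared handle and only calls layout(), every probe stays on the cached hot path. The font-size search changes the font on each probe, so it pays for prepare() every time.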

Bundle Cost

~15KB gzipped, zero dependencies. For context, Oat UI (the CSS framework) is ~8KB. Adding Pretext takes that UI layer from ~8KB to ~23KB, nearly tripling its weight. Whether that trade-off makes sense depends on how often you measure text.

For a static landing page: skip it. CSS handles everything.

For a streaming AI chat interface running 20 measurements per second: the 15KB pays for itself on the first streamed response.

Early Days Disclaimer

Pretext (github.com/chenglou/pretext) is weeks old as of writing. v0.0.5 on npm. 43k GitHub stars and massive momentum, but this is pre-1.0 software. No formal security audit. No stability guarantees. The API surface could change. Cheng Lou is iterating fast, shipping breaking changes in patches.

For production use: pin your version, read the changelog before upgrading, and test against your specific fonts and text patterns. The library is pure computation with zero network access, so the risk surface is small. But treat it as you would any pre-1.0 dependency.

Practical Notes

Use named fonts. system-ui produces inaccurate measurements on macOS due to font substitution behavior. Specify "Inter", "Helvetica Neue", or whatever you load.

Cache prepared handles. For completed messages that will not change, call prepare() once and store the handle. Only call it again if content changes.
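
A minimal sketch of such a cache, keyed by message id. The factory name and the injected prepareFn are my own. prepareFn stands in for Pretext's prepare(text, font) so the sketch runs without a browser.

```javascript
// Cache of prepared handles keyed by message id.
// `prepareFn` is injected; in an app it would be Pretext's prepare(text, font).
function createPreparedCache(prepareFn) {
  const cache = new Map(); // id -> { text, font, handle }

  return {
    // Return the cached handle, re-preparing only if text or font changed.
    get(id, text, font) {
      const hit = cache.get(id);
      if (hit && hit.text === text && hit.font === font) return hit.handle;
      const handle = prepareFn(text, font);
      cache.set(id, { text, font, handle });
      return handle;
    },
    // Drop the entry when a message leaves the conversation.
    delete(id) {
      cache.delete(id);
    },
    get size() {
      return cache.size;
    },
  };
}
```

Keying by id rather than by text keeps one entry per message while it is still streaming, instead of one entry per intermediate string.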

Throttle layout() to rAF. Decoupling measurement from token arrival avoids redundant work when multiple tokens arrive within a single frame.
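
To see the saving, here is a small simulation: three tokens land inside one frame, and only one height check runs. It is a sketch under stated assumptions. prepareFn and layoutFn are injected stand-ins for Pretext's prepare() and layout(), and the 7px-per-character width is a toy model, not a real measurement.

```javascript
// prepareFn/layoutFn stand in for Pretext's prepare() and layout(); injected for testing.
function createStreamMeasurer({ prepareFn, layoutFn, font, maxWidth, lineHeight }) {
  let buffer = '';
  let prepared = null;

  return {
    // Call once per streamed token.
    onToken(token) {
      buffer += token;
      prepared = prepareFn(buffer, font);
    },
    // Call once per animation frame; returns null until the first token arrives.
    onFrame() {
      if (!prepared) return null;
      return layoutFn(prepared, maxWidth, lineHeight).height;
    },
  };
}

// Simulated frame: three tokens arrive, then a single height check runs.
const calls = { prepare: 0, layout: 0 };
const measurer = createStreamMeasurer({
  prepareFn: (text) => {
    calls.prepare += 1;
    return { text };
  },
  layoutFn: (handle, width, lineHeight) => {
    calls.layout += 1;
    // Toy stand-in: pretend every character is 7px wide.
    const lines = Math.max(1, Math.ceil((handle.text.length * 7) / width));
    return { height: lines * lineHeight };
  },
  font: '14px Inter',
  maxWidth: 280,
  lineHeight: 20,
});

['Hola', ', ', 'mundo'].forEach((token) => measurer.onToken(token));
const height = measurer.onFrame();
console.log(calls, height);
```

In a browser, onFrame() would be called from inside a requestAnimationFrame loop.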

Pretext uses canvas internally. The first prepare() call creates an offscreen canvas context. Subsequent calls reuse it. There is no visible canvas element.

Reproducing These Benchmarks

The test measures prepare() + layout() vs creating a DOM element, setting textContent, appending to document, calling getBoundingClientRect(), and removing the element. 1000 iterations per scenario, 5 scenarios, median values reported.

The streaming scenario simulates growing text by appending characters incrementally and measuring at each step, which mirrors real token-by-token SSE behavior.
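
For anyone rebuilding the test, a harness along these lines should be close. It is a sketch, not my exact benchmark code: the function names are illustrative, and the DOM element carries no width or font styling because the description above does not specify any.

```javascript
// Median of a numeric array (does not mutate the input).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Time `fn(text)` over `iterations` runs and return the median in ms.
function bench(fn, text, iterations = 1000) {
  const samples = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    fn(text);
    samples.push(performance.now() - start);
  }
  return median(samples);
}

// Streaming scenario: measure after every appended character.
function benchGrowing(fn, fullText) {
  const samples = [];
  for (let end = 1; end <= fullText.length; end++) {
    const partial = fullText.slice(0, end);
    const start = performance.now();
    fn(partial);
    samples.push(performance.now() - start);
  }
  return median(samples);
}

// DOM side, following the steps described above (browser only).
function measureWithDom(text) {
  const el = document.createElement('div');
  el.textContent = text;
  document.body.appendChild(el);
  const { height } = el.getBoundingClientRect(); // forces synchronous layout
  el.remove();
  return height;
}
```

The Pretext side would pass something like (text) => layout(prepare(text, '14px Inter'), width, 20) as fn. Browsers coarsen performance.now() resolution, so sub-microsecond figures such as the layout()-only number are better taken by timing a batch of calls and dividing.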

Hardware and browser matter. Chrome is faster at DOM reflow than Safari. Mobile Safari is slower than desktop Safari. The speedup I measured, at most about 2x, is a conservative baseline from a desktop browser. Mobile devices with slower layout engines should see larger gaps.

The benchmark code and data are available on request. I ran these while building a Spanish learning app with streaming AI conversation practice. Pretext handles auto-scroll during token streaming, adaptive flashcard font sizing, and chat bubble width calculation -- all without touching the DOM.

Tools Used in This Research

NanoGPT -- The streaming benchmarks used Gemma 4 31B via NanoGPT's OpenAI-compatible API.

MetaEnd