# Pretext vs DOM Reflow: Real Benchmarks for Streaming AI Interfaces

> Revolutionizing User Experience: The Impact of Pretext and DOM Reflow on Streaming AI Interfaces

**Published by:** [MetaEnd](https://paragraph.com/@metaend/)
**Published on:** 2026-04-15
**Categories:** pretext, dom, research, llm, frontend
**URL:** https://paragraph.com/@metaend/pretext-vs-dom-reflow-streaming-benchmarks

## Content

Every token your AI streams triggers a question: how tall is this text now? The browser needs to know for auto-scroll, layout, overflow detection. The traditional answer is `getBoundingClientRect()`. The cost of that answer is a forced synchronous reflow.

Cheng Lou's Pretext (`@chenglou/pretext` on npm) sidesteps the DOM entirely. Pure JavaScript text measurement using canvas font metrics and arithmetic. No reflow. No layout thrashing. ~15KB gzipped, zero dependencies.

I benchmarked both approaches across 5 scenarios, 1000 iterations each, measuring `prepare()` + `layout()` against `getBoundingClientRect()` on a real element.

### Results

| Scenario | Chars | DOM Reflow | Pretext | Speedup |
| --- | --- | --- | --- | --- |
| Short (1 token) | 4 | 0.024ms | 0.011ms | 2x |
| Word (5 tokens) | 22 | 0.026ms | 0.023ms | 1x |
| Sentence | 75 | 0.032ms | 0.030ms | 1x |
| Paragraph | 235 | 0.089ms | 0.052ms | 2x |
| Streaming (growing) | 356 | 0.122ms | 0.070ms | 2x |

The `layout()` function alone, called on an already-prepared handle, averaged 0.00052ms. That is the cached hot path.

### Why This Matters for Streaming

Modern AI interfaces stream responses token by token. Gemma 4, GPT-5, Claude -- they all support SSE streaming. A typical response is 100-300 tokens arriving at roughly 20 tokens per second. Each token changes the text content. Something needs to measure the new height to decide: should I auto-scroll? Did the bubble grow past the viewport?

At 20 tokens/sec with the longest benchmark text (356 chars):

| Method | Cost/sec | % of 16.67ms frame budget |
| --- | --- | --- |
| DOM reflow | 2.45ms | 14.7% |
| Pretext (prepare + layout) | 1.40ms | 8.4% |
| Pretext `layout()` only, rAF throttled | 0.06ms | 0.36% |

14.7% of your frame budget spent on text measurement is not catastrophic on Chrome. On Safari it gets worse. On a 2020 iPhone SE running Safari, DOM reflow costs scale significantly higher.

The optimal strategy: call `prepare()` when the text content changes (new token arrives), but throttle `layout()` calls to `requestAnimationFrame`. Since `layout()` operates on the cached prepared handle, you only pay 0.00052ms per frame for height checks. That is 0.36% of frame budget.
Effectively free.

### The Streaming Pattern

```js
import { prepare, layout } from '@chenglou/pretext'

// bubbleMaxWidth, containerHeight, and scrollToBottom() are supplied by your app.
let prepared = null
let buffer = ''

function onToken(token) {
  buffer += token
  // prepare() on content change
  prepared = prepare(buffer, '14px Inter')
}

function checkHeight() {
  // Skip measurement until the first token arrives, but keep the loop running.
  if (prepared) {
    const { height } = layout(prepared, bubbleMaxWidth, 20)
    if (height > containerHeight) scrollToBottom()
  }
  requestAnimationFrame(checkHeight)
}

requestAnimationFrame(checkHeight)
```

`prepare()` runs per token. `layout()` runs per frame. The two concerns are decoupled. Text segmentation and canvas measurement happen when content changes. Height calculation happens when the browser is ready to paint.

### What Pretext Is Not

Pretext is not a rendering engine. It does not draw text. It does not replace CSS. It answers one question: given this text, this font, and this container width, how many lines and what height?

That question comes up more often than most developers realize:

- Streaming AI responses (auto-scroll)
- Virtual scrolling (height prediction without mounting DOM nodes)
- Chat bubble shrink-wrapping (tightest width that preserves line count)
- Adaptive font sizing (binary search for largest size that fits bounds)
- Layout shift prevention (pre-calculate height before inserting content)

### Bundle Cost

~15KB gzipped, zero dependencies. For context, Oat UI (the CSS framework) is ~8KB. Adding Pretext roughly doubles your UI layer weight. Whether that trade-off makes sense depends on how often you measure text.

For a static landing page: skip it. CSS handles everything. For a streaming AI chat interface running 20 measurements per second: the 15KB pays for itself on the first streamed response.

### Early Days Disclaimer

Pretext (github.com/chenglou/pretext) is weeks old as of writing. v0.0.5 on npm. 43k GitHub stars and massive momentum, but this is pre-1.0 software. No formal security audit. No stability guarantees. The API surface could change. Cheng Lou is iterating fast, shipping breaking changes in patches.
For production use: pin your version, read the changelog before upgrading, and test against your specific fonts and text patterns. The library is pure computation with zero network access, so the risk surface is small. But treat it as you would any pre-1.0 dependency.

### Practical Notes

- **Use named fonts.** `system-ui` produces inaccurate measurements on macOS due to font substitution behavior. Specify "Inter", "Helvetica Neue", or whatever you load.
- **Cache prepared handles.** For completed messages that will not change, call `prepare()` once and store the handle. Only call it again if content changes.
- **Throttle `layout()` to rAF.** Decoupling measurement from token arrival avoids redundant work when multiple tokens arrive within a single frame.
- **Pretext uses canvas internally.** The first `prepare()` call creates an offscreen canvas context. Subsequent calls reuse it. There is no visible canvas element.

### Reproducing These Benchmarks

The test measures `prepare()` + `layout()` vs creating a DOM element, setting `textContent`, appending to document, calling `getBoundingClientRect()`, and removing the element. 1000 iterations per scenario, 5 scenarios, median values reported. The streaming scenario simulates growing text by appending characters incrementally and measuring at each step, which mirrors real token-by-token SSE behavior.

Hardware and browser matter. Chrome is faster at DOM reflow than Safari. Mobile Safari is slower than desktop Safari. The up to 2x speedup I measured is a conservative baseline on a desktop browser. Mobile devices with slower layout engines will see larger gaps.

The benchmark code and data are available on request. I ran these while building a Spanish learning app with streaming AI conversation practice. Pretext handles auto-scroll during token streaming, adaptive flashcard font sizing, and chat bubble width calculation -- all without touching the DOM.

### Tools Used in This Research

**WasmBox** -- If you care about sandboxed tool execution for AI agents, WasmBox is what I am building.
WASI sandbox runtime with SHA-256 content-addressed verification. Nine tools shipped, more coming. MIT core, FSL-1.1-Apache-2.0 compliance layer. Repository at tangled.org/metaend.eth.xyz/wasmbox-cli.

**NanoGPT** -- The streaming benchmarks used Gemma 4 31B via NanoGPT's OpenAI-compatible API. $8/month flat for access to all major open source models (Qwen, Kimi, DeepSeek, Gemma, GLM) with full API and CLI usage. No per-token billing. Crypto and card accepted. The link above gives you 5% off.
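
If you would rather not wait for the benchmark code, the method described earlier (1000 iterations, median reported, DOM baseline of create, set `textContent`, append, `getBoundingClientRect()`, remove) can be approximated with a small harness. This is a minimal sketch and not the code behind the numbers in this post; `median`, `medianTimeMs`, and `measureWithDom` are hypothetical names chosen for illustration, and the sample string is arbitrary.

```javascript
// Hypothetical timing harness, not the original benchmark code.

// Returns the median of a list of numbers without mutating the input.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b)
  const mid = Math.floor(sorted.length / 2)
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2
}

// Runs fn `iterations` times and returns the median duration in milliseconds.
function medianTimeMs(fn, iterations = 1000) {
  const samples = []
  for (let i = 0; i < iterations; i++) {
    const start = performance.now()
    fn()
    samples.push(performance.now() - start)
  }
  return median(samples)
}

// DOM baseline following the steps described in the text. Browser-only.
function measureWithDom(text) {
  const el = document.createElement('div')
  el.textContent = text
  document.body.appendChild(el)
  const { height } = el.getBoundingClientRect() // forces a synchronous layout
  el.remove()
  return height
}

// Only run the DOM baseline where a document exists (i.e. in a browser).
if (typeof document !== 'undefined') {
  const sample = 'The quick brown fox jumps over the lazy dog.'
  console.log('DOM median ms:', medianTimeMs(() => measureWithDom(sample)))
}
```

To time the Pretext side, pass a function that calls `prepare()` and `layout()` with your own font, width, and line height to `medianTimeMs`. Browsers deliberately coarsen `performance.now()`, so for calls this short it may be more reliable to time a batch of calls per sample and divide by the batch size.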

## Publication Information

- [MetaEnd](https://paragraph.com/@metaend/): Publication homepage
- [All Posts](https://paragraph.com/@metaend/): More posts from this publication
- [RSS Feed](https://api.paragraph.com/blogs/rss/@metaend): Subscribe to updates
- [Twitter](https://twitter.com/ngmisl): Follow on Twitter