# How a 570-Line JSON File Saves 85% of AI Token Waste (and the CO2 That Goes With It)

> Navigating AI Token Waste: How a JSON Index Revolutionizes Code Exploration Efficiency

**Published by:** [MetaEnd](https://paragraph.com/@metaend/)
**Published on:** 2026-01-28
**Categories:** ai, golang, eco, claude, agents, llm
**URL:** https://paragraph.com/@metaend/codebase-indexer-saving-85-percent-ai-tokens

## Content

Every time you ask an AI coding assistant to "find where `useAppStore` is defined," it launches a multi-step expedition: glob for files, grep for patterns, read candidates one by one. For a modest 30-file React project, that's easily 5–10 tool calls and 15,000+ tokens of file content ingested — just to answer a question that has a three-word answer: `src/stores/appStore.ts:23`.

We built a dead-simple fix: a static JSON index of your codebase that the AI reads once instead of exploring repeatedly. Here's what happened when we iterated it down to the essentials.

## The Problem: AI Assistants Explore Like Tourists

AI coding tools (Claude Code, Cursor, Copilot, etc.) don't have a mental model of your project. Every session starts from zero. When they need context, they:

1. Glob for file patterns (`**/*.ts`, `**/*.tsx`)
2. Grep for symbol names across matches
3. Read each candidate file to confirm
4. Repeat for every follow-up question

For our palm reading app (28 source files, React + TypeScript + Zustand), a typical "understand the codebase" exploration consumed ~50,000 input tokens across 15–20 tool calls. That's before the assistant writes a single line of code.

## The Fix: One JSON File, Generated Once

We built Codebase Indexer, a lightweight Go tool that uses tree-sitter ASTs to extract three things:

- **dependencyGraph** — who imports whom
- **symbols** — every exported function, type, and variable with file + line number
- **files** — file listing with language and export metadata

The AI reads this once at conversation start, then knows exactly where everything is.

```bash
codebase-indexer -root . -out .context/relationships.json
```
## The Iteration: From 2,970 Lines to 570

The first version indexed everything: every local variable, every function call, every destructured binding. It worked, but it was bloated. We went through five rounds of pruning:

| Version | Lines | What Changed |
|---|---|---|
| v1: Everything | 2,970 | Calls, locals, destructurings, duplicates — all indexed |
| v2: Remove calls + dedup | 2,255 | Dropped `calls` arrays (noisiest, least useful field) |
| v3: Filter locals | 314 | Only exported symbols, stripped all local defines |
| v4: Fix type exports | 470 | Re-added defines for files with actual exports |
| v5: Full export detection | 570 | `export const`/`function` now detected alongside types |

The key question at each stage: what does the AI actually need to navigate? Not local variables. Not call expressions. Not destructured `const [foo, setFoo]` bindings. It needs:

- Where is this symbol defined? → `symbols`
- What depends on what? → `dependencyGraph`
- What does this file export? → `files[x].exports`

Everything else is noise that costs tokens without aiding navigation.

## The Numbers

### Token savings per session

| Scenario | Without Index | With Index | Savings |
|---|---|---|---|
| "Find where X is defined" | ~15,000 tokens (5–10 tool calls) | ~2,000 tokens (1 read) | 87% |
| "Understand the codebase" | ~50,000 tokens (15–20 tool calls) | ~4,000 tokens (1 read) | 92% |
| Typical feature implementation | ~80,000 tokens exploration overhead | ~12,000 tokens | 85% |
| Impact analysis ("what breaks if I change X?") | ~25,000 tokens | ~2,000 tokens | 92% |

The index file itself costs ~4,000 tokens to ingest. It pays for itself on the first question.

### Across a workday

A developer working with an AI assistant typically runs 20–40 conversations per day.
Conservatively:

- Without index: 30 sessions × 50,000 token overhead = 1,500,000 tokens/day on exploration
- With index: 30 sessions × 6,000 token overhead = 180,000 tokens/day
- Daily savings: ~1,320,000 tokens → ~1.3M tokens/day

### Monthly (solo developer)

- Saved: ~29M tokens/month
- At typical API pricing (~$3/M input tokens for Claude Sonnet): ~$87/month saved
- For Opus-class models (~$15/M input tokens): ~$435/month saved

### For a team of 10

- Saved: ~290M tokens/month
- Sonnet pricing: ~$870/month
- Opus pricing: ~$4,350/month

## The CO2 Impact

AI inference has a real carbon footprint. The exact numbers vary by provider, hardware, and energy mix, but published research gives us reasonable estimates.

### Per-token emissions

Based on Luccioni et al. (2023) and infrastructure reports from major cloud providers:

- Large language model inference: ~0.3–0.5 g CO2 per 1,000 tokens (varies by model size and datacenter)
- Using a midpoint of 0.4 g CO2 per 1,000 tokens

### Solo developer savings

- 29M tokens saved/month × 0.4 g/1,000 = 11.6 kg CO2/month
- 139 kg CO2/year — equivalent to driving ~560 km in an average car

### Team of 10

- 290M tokens saved/month × 0.4 g/1,000 = 116 kg CO2/month
- 1,392 kg CO2/year — equivalent to a one-way transatlantic flight (NYC → London)

### At scale (1,000 developers)

- 11,600 kg CO2/month = 139 metric tons CO2/year
- Equivalent to ~30 cars driven for a year

These are conservative estimates. They don't account for output tokens (which cost more energy), retries from failed explorations, or the compound effect of the AI making better decisions faster with good context.

## Why This Works Better Than RAG or Embeddings

You might wonder: why not use vector embeddings or retrieval-augmented generation?

1. **Zero infrastructure.** A Go binary and a JSON file. No vector database, no embedding model, no chunking strategy to tune.
2. **Deterministic.** The same codebase always produces the same index. No relevance scoring to debug.
3. **Complete.** Every export is indexed.
Embedding-based retrieval can miss symbols that don't appear in semantically similar contexts.
4. **Fast.** The indexer runs in <1 second for most projects. It can run on every commit via a git hook.
5. **Readable.** The AI (and you) can read the JSON directly. Try that with a vector database.

## How to Use It

### 1. Install

```bash
git clone https://tangled.org/metaend.eth.xyz/codebase-indexer/
cd codebase-indexer
go build -o ~/go/bin/codebase-indexer .
```

Make sure `~/go/bin` is in your `PATH`.

### 2. Generate

```bash
codebase-indexer -root . -out .context/relationships.json
```

### 3. Tell your AI assistant

Add to `CLAUDE.md` or `AGENTS.md`:

```markdown
Before exploring unfamiliar parts of the codebase, consult `.context/relationships.json`

This file contains:
- **symbols**: Global symbol-to-location index (find any function/var instantly)
- **dependencyGraph**: Which files depend on which
- **files**: Per-file exports and defines
```

### 4. Auto-update (optional)

```bash
# Git hook
cp post-commit .git/hooks/ && chmod +x .git/hooks/post-commit

# Or watch mode
./watch-index.sh
```

## The Takeaway

The most effective optimization isn't a smarter model or a bigger context window. It's not sending tokens you don't need to send.

A 570-line JSON file, generated in under a second, eliminates 85% of the exploration overhead that AI coding assistants burn through on every session. That's real money saved, real latency reduced, and real carbon not emitted.

The best part: it took five iterations to get from "index everything" to "index only what matters." The same principle applies to the index itself — less is more, as long as it's the right less.

*Codebase Indexer is open source. Supports JavaScript, TypeScript, Go, and Python via tree-sitter. Adding a new language is ~50 lines of Go.*