# Notes on Lit3 — Part 13: HNP Technical Specification

> Implementation Details for HNP-1 and HNP-2

**Published by:** [Lokapal](https://paragraph.com/@lokapal/)
**Published on:** 2026-01-02
**Categories:** literature, book, read, canon, lit3, permanence, specs, technical
**URL:** https://paragraph.com/@lokapal/notes-on-lit3-part-13-hnp-technical-specification

## Content

### Prerequisites

**Note:** This article provides the complete technical specification for both HNP-1 (format-agnostic) and HNP-2 (format-aware) normalization protocols. It is intended for developers implementing HNP verification tools, creators who want to understand the normalization process in detail, and anyone auditing the protocols for correctness.

### HNP-1 Technical Specification

#### Design Philosophy

HNP-1 treats all formatting as presentation-layer metadata that can be safely discarded. Its goal is to normalize the linguistic content of a text while eliminating variations caused by:

- Different operating systems (line ending conventions)
- Different text editors (tab handling, trailing whitespace)
- Different encoding practices (BOM, Unicode normalization forms)
- Accidental whitespace artifacts (leading/trailing blank lines)

#### Normalization Rules

HNP-1 applies ten sequential rules to the input text: nine transformations followed by a final hashing step.

##### Rule 1: BOM Stripping

**Purpose:** Remove the Byte Order Mark (U+FEFF) if present at the beginning of the file.

**Rationale:** The BOM is a zero-width character used to signal byte order in UTF-16/UTF-32 encodings. In UTF-8 it is optional and often added by Windows text editors. Its presence or absence doesn't affect meaning, but it changes the byte sequence.

**Implementation:**

```javascript
// Strip a leading U+FEFF from the decoded string
if (content.charCodeAt(0) === 0xFEFF) {
  content = content.slice(1);
}
```

**Example:**

- Input: `\uFEFFThe story begins.` → Output: `The story begins.`

##### Rule 2: Unicode Normalization (NFC)

**Purpose:** Convert all Unicode characters to Normalization Form C (Composed).

**Rationale:** Unicode allows multiple code point sequences (and therefore byte sequences) to represent the same visual character. For example, the character "é" can be encoded as:

- A single code point: U+00E9 (é)
- Two code points: U+0065 (e) + U+0301 (combining acute accent)

Both render identically but produce different hashes. NFC normalization ensures consistency by preferring composed forms.
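The difference is easy to see in Node.js. This short demonstration is illustrative only, not part of the specification; it uses the built-in `crypto` module and `String.prototype.normalize`:

```javascript
const crypto = require('crypto');

// Hex SHA-256 digest of a string's UTF-8 bytes
const sha256Hex = (s) => crypto.createHash('sha256').update(s, 'utf8').digest('hex');

const composed = 'caf\u00E9';    // "é" as one precomposed code point
const decomposed = 'cafe\u0301'; // "e" followed by a combining acute accent

console.log(composed.length, decomposed.length);                              // 4 5
console.log(sha256Hex(composed) === sha256Hex(decomposed));                   // false
console.log(decomposed.normalize('NFC') === composed);                        // true
console.log(sha256Hex(decomposed.normalize('NFC')) === sha256Hex(composed));  // true
```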
**Implementation:**

```javascript
content = content.normalize('NFC');
```

**Example:**

- Input: `cafe\u0301` (café written as e + combining accent) → Output: `café` (café written with the single composed character U+00E9)

##### Rule 3: Line Ending Conversion

**Purpose:** Convert all line endings to Unix-style LF (`\n`).

**Rationale:** Different operating systems use different line ending conventions:

- Unix/Linux/macOS: LF (`\n`)
- Windows: CRLF (`\r\n`)
- Classic Mac OS (before OS X): CR (`\r`)

These variations are invisible to readers but change byte sequences.

**Implementation:**

```javascript
// Replace CRLF first so each Windows line ending becomes one LF, not two
content = content.replace(/\r\n/g, '\n').replace(/\r/g, '\n');
```

**Example:**

- Input: `Line one\r\nLine two\r\nLine three` → Output: `Line one\nLine two\nLine three`

##### Rule 4: Trailing Whitespace Removal

**Purpose:** Remove all whitespace from the end of each line. This covers spaces and tabs, as well as any other character matched by JavaScript's `\s` (for example, U+00A0 no-break space).

**Rationale:** Trailing whitespace is invisible, accidental, and has no semantic meaning in prose. Text editors often add or remove it automatically.

**Implementation:**

```javascript
// Rules 4-8 operate on an array of lines
let lines = content.split('\n');
lines = lines.map(line => line.replace(/\s+$/, ''));
```

**Example:**

- Input: `The story begins.   \nChapter One.  \n` → Output: `The story begins.\nChapter One.\n`

##### Rule 5: Tab Expansion

**Purpose:** Convert every tab character (`\t`) to four space characters.

**Rationale:** Tabs render differently depending on editor settings (2, 4, or 8 spaces). Replacing each tab with a fixed four spaces, regardless of its column position, ensures consistency.

**Implementation:**

```javascript
lines = lines.map(line => line.replace(/\t/g, '    '));
```

**Example:**

- Input: `\tIndented paragraph.` → Output: `    Indented paragraph.` (four leading spaces)

##### Rule 6: Leading Blank Line Removal

**Purpose:** Remove all blank lines from the beginning of the file.

**Rationale:** Files often have accidental blank lines at the start due to editor behavior or copy-paste operations. These don't affect the text's meaning.

**Implementation:**

```javascript
while (lines.length > 0 && lines[0].trim() === '') {
  lines.shift();
}
```

**Example:**

- Input: `\n\nThe story begins.` → Output: `The story begins.`

##### Rule 7: Trailing Blank Line Removal

**Purpose:** Remove all blank lines from the end of the file.

**Rationale:** As with Rule 6, accidental blank lines at the end of a file are common and meaningless.
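Rules 3 to 6 depend on their order. A minimal sketch that applies them to a deliberately messy input makes two consequences visible: a Windows line ending becomes a single `\n` because `\r\n` is handled before lone `\r`, and a trailing tab is deleted by Rule 4 before Rule 5 could expand it. The helper name `linesAfterRules3to6` is hypothetical and not part of the specification.

```javascript
// Hypothetical helper: applies HNP-1 Rules 3-6 and returns the line array
// that Rule 7 would receive.
function linesAfterRules3to6(content) {
  // Rule 3: replace \r\n before lone \r, so a CRLF yields one \n rather than two
  const unix = content.replace(/\r\n/g, '\n').replace(/\r/g, '\n');
  let lines = unix.split('\n');
  // Rule 4 runs before Rule 5, so a trailing tab is deleted rather than expanded
  lines = lines.map((line) => line.replace(/\s+$/, ''));
  // Rule 5: each tab becomes exactly four spaces
  lines = lines.map((line) => line.replace(/\t/g, '    '));
  // Rule 6: drop blank lines at the start
  while (lines.length > 0 && lines[0].trim() === '') {
    lines.shift();
  }
  return lines;
}

console.log(linesAfterRules3to6(' \r\n\tA \t\r\nB\r'));
// [ '    A', 'B', '' ]  (the final empty line is left for Rule 7)

// Reversing Rule 3's two replacements would double the line break:
console.log('a\r\nb'.replace(/\r/g, '\n').replace(/\r\n/g, '\n') === 'a\n\nb'); // true
```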
**Implementation:**

```javascript
while (lines.length > 0 && lines[lines.length - 1].trim() === '') {
  lines.pop();
}
```

**Example:**

- Input: `The story ends.\n\n\n` → Output: `The story ends.`

##### Rule 8: Blank Line Compression

**Purpose:** Collapse sequences of multiple consecutive blank lines into a single blank line.

**Rationale:** Authors use blank lines to separate paragraphs or sections. Whether two paragraphs are separated by one, two, or five blank lines is usually accidental. HNP-1 preserves the presence of separation but normalizes the amount.

**Implementation:**

```javascript
const normalizedLines = [];
let lastWasBlank = false;
for (const line of lines) {
  const isBlank = line.trim() === '';
  if (isBlank) {
    if (!lastWasBlank) {
      normalizedLines.push(line);
      lastWasBlank = true;
    }
    // Skip additional consecutive blank lines
  } else {
    normalizedLines.push(line);
    lastWasBlank = false;
  }
}
```

**Example:**

- Input: `First paragraph.\n\n\n\nSecond paragraph.` → Output: `First paragraph.\n\nSecond paragraph.`

##### Rule 9: File End Normalization

**Purpose:** Ensure the file ends with exactly one newline character.

**Rationale:** POSIX defines a line as a sequence of characters terminated by a newline, so a conforming text file ends with one. Some editors add it automatically; others don't. Normalizing to exactly one newline ensures consistency.

**Implementation:**

```javascript
// Rejoin the lines from Rule 8, then replace any run of trailing newlines with one
let normalized = normalizedLines.join('\n');
normalized = normalized.replace(/\n*$/, '\n');
```

**Examples:**

- Input: `The story ends.` → Output: `The story ends.\n`
- Input: `The story ends.\n\n\n` → Output: `The story ends.\n`

##### Rule 10: Cryptographic Hashing

**Purpose:** Compute the SHA-256 hash of the normalized UTF-8 byte sequence.

**Rationale:** SHA-256 is:

- Collision-resistant (no practical method is known for finding two inputs with the same hash)
- Deterministic (the same input always produces the same hash)
- Widely supported (standard in cryptographic libraries)
- Available on the EVM (Solidity's built-in `sha256()` is backed by a precompiled contract; note that Ethereum's native hash function is Keccak-256)

**Implementation:**

```javascript
const crypto = require('crypto');

const hash = crypto.createHash('sha256')
  .update(normalized, 'utf8')
  .digest('hex');
const solidityHash = '0x' + hash;
```

**Output Format:** A 66-character string (`0x` prefix + 64 hex characters) representing 32 bytes.
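Putting Rules 1 to 10 together gives a compact end-to-end sketch in Node.js. The names `hnp1Normalize` and `hnp1Hash` are hypothetical; the body simply chains the per-rule snippets above (Rules 4 and 5 are combined into one pass per line, which gives the same result) and has not been checked against any reference HNP implementation or test vectors.

```javascript
const crypto = require('crypto');

// Rules 1-9: return the canonical text
function hnp1Normalize(raw) {
  let content = raw;
  // Rule 1: strip a leading BOM
  if (content.charCodeAt(0) === 0xFEFF) content = content.slice(1);
  // Rule 2: Unicode NFC
  content = content.normalize('NFC');
  // Rule 3: CRLF and lone CR become LF
  content = content.replace(/\r\n/g, '\n').replace(/\r/g, '\n');
  // Rules 4-5: strip trailing whitespace, then expand tabs to four spaces
  const lines = content.split('\n').map((line) => line.replace(/\s+$/, '').replace(/\t/g, '    '));
  // Rules 6-7: drop blank lines at both ends
  while (lines.length > 0 && lines[0].trim() === '') lines.shift();
  while (lines.length > 0 && lines[lines.length - 1].trim() === '') lines.pop();
  // Rule 8: keep only the first blank line of each run
  const compressed = [];
  let lastWasBlank = false;
  for (const line of lines) {
    const isBlank = line.trim() === '';
    if (!(isBlank && lastWasBlank)) compressed.push(line);
    lastWasBlank = isBlank;
  }
  // Rule 9: exactly one trailing newline
  return compressed.join('\n').replace(/\n*$/, '\n');
}

// Rule 10: 0x-prefixed SHA-256 of the normalized text's UTF-8 bytes
function hnp1Hash(raw) {
  const hex = crypto.createHash('sha256').update(hnp1Normalize(raw), 'utf8').digest('hex');
  return '0x' + hex;
}

const messy = '\uFEFF\r\nThe story begins.  \r\n\r\n\r\n\tChapter One.\t\r\n';
const clean = 'The story begins.\n\n    Chapter One.\n';
console.log(hnp1Normalize(messy) === clean);      // true
console.log(hnp1Hash(messy) === hnp1Hash(clean)); // true
console.log(hnp1Hash(clean).length);              // 66
```

Because every variation in `messy` (BOM, CRLF, trailing spaces and tabs, extra blank lines) is removed by Rules 1 to 9, both inputs produce the same 66-character hash.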
**Example:**

- Input: `The story begins.\n` → Output: `0x8f3a4b2c1d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b1c2d3e4f5a6b7c8d9e0f1a` (an illustrative placeholder showing the output format, not the actual SHA-256 digest of this input)

#### Complete HNP-1 Pipeline

```text
Raw File
    ↓
[Rule 1: Strip BOM]
    ↓
[Rule 2: Unicode NFC]
    ↓
[Rule 3: Line endings → \n]
    ↓
[Rule 4: Remove trailing whitespace per line]
    ↓
[Rule 5: Tabs → 4 spaces]
    ↓
[Rule 6: Remove leading blank lines]
    ↓
[Rule 7: Remove trailing blank lines]
    ↓
[Rule 8: Compress multiple blank lines]
    ↓
[Rule 9: Ensure single trailing \n]
    ↓
[Rule 10: SHA-256 hash]
    ↓
0x-prefixed canonical hash
```

### HNP-2 Technical Specification

#### Design Philosophy

HNP-2 recognizes that Markdown syntax is not merely presentational: it encodes structural semantics. A heading (`# Title`) is semantically different from body text, even if both render as "Title" when stripped of formatting.

HNP-2 extends HNP-1 by adding a Markdown normalization phase before applying text normalization. This phase reduces Markdown to a canonical form while preserving its structural meaning.

#### Two-Phase Processing

HNP-2 operates in two distinct phases:

**Phase 1: Markdown Normalization**

- Input: Raw Markdown file
- Process: Normalize Markdown syntax to canonical forms
- Output: Canonicalized Markdown text

**Phase 2: Text Normalization**

- Input: Canonicalized Markdown from Phase 1
- Process: Apply all HNP-1 rules (Rules 1-10)
- Output: 0x-prefixed canonical hash

This design means HNP-2 inherits all of HNP-1's guarantees (deterministic whitespace handling, Unicode normalization, etc.) while adding format awareness.

#### Phase 1: Markdown Normalization Rules

##### Rule M1: ATX Heading Normalization

**Purpose:** Standardize heading syntax to use exactly one space between the hash marks and the text.

**Rationale:** Authors write headings with variable spacing:

- `#Heading` (no space)
- `# Heading` (one space)
- `#  Heading` (two spaces)

Many Markdown renderers display all three as the same heading, although strict CommonMark requires at least one space after the hash marks and renders `#Heading` as an ordinary paragraph. HNP-2 normalizes all three to the single-space format.
**Implementation:**

```javascript
// `processed` holds a single line, so ^ and $ anchor to that line
// Fix headings with content but incorrect spacing
processed = processed.replace(/^(#{1,6})\s+(.+)$/, '$1 $2');
// Fix headings with no space
processed = processed.replace(/^(#{1,6})([^\s#].*)$/, '$1 $2');
```

**Examples:**

- Input: `#Shard 1` → Output: `# Shard 1`
- Input: `##   Chapter One` → Output: `## Chapter One`
- Input: `### Section Title` → Output: `### Section Title` (unchanged, already normalized)

**Edge Cases:**

- Preserves the heading level (number of `#` characters)
- Only matches lines that begin with 1-6 hash marks; because the no-space pattern accepts any non-space character after the hashes, prose lines such as `#hashtag` are also rewritten as headings
- Doesn't affect `#` characters in the middle of lines

##### Rule M2: Emphasis Normalization

**Purpose:** Standardize bold and italic markers to a single syntax style.

**Rationale:** Markdown allows two syntax styles for emphasis:

- Bold: `**text**` or `__text__`
- Italic: `*text*` or `_text_`

HNP-2 normalizes to asterisk-based syntax (`**bold**` and `*italic*`).

**Implementation:**

```javascript
// Bold: convert __ to **
processed = processed.replace(/__(.*?)__/g, '**$1**');
// Italic: convert _ to * at word boundaries only (avoids mid-word underscores);
// the (?<!\\) lookbehinds skip backslash-escaped underscores such as \_
processed = processed.replace(/(?<!\\)\b_(.*?)(?<!\\)_\b/g, '*$1*');
```

**Examples:**

- Input: `__Bold text__ and _italic text_` → Output: `**Bold text** and *italic text*`
- Input: `**Already bold** and *already italic*` → Output: unchanged
- Input: `snake_case_variable` (no word boundaries around the underscores) → Output: unchanged (not treated as emphasis)

**Edge Cases:**

- Nested emphasis: `**bold *and italic***` remains unchanged (valid Markdown)
- Escaped markers: `\_not italic\_` remains unchanged (backslash-escaped)
- Mid-word underscores: `snake_case` is not treated as emphasis

##### Rule M3: Horizontal Rule Normalization

**Purpose:** Standardize all horizontal rule syntax to triple dash (`---`).

**Rationale:** Markdown allows multiple syntaxes for horizontal rules, including `***`, `---`, `___`, `----`, `* * *`, and `- - -`. All render identically. HNP-2 normalizes them to `---`.
**Implementation:**

```javascript
// Match lines consisting only of 3+ identical HR markers (*, -, _),
// with optional whitespace before, between, and after them
if (/^\s*([*\-_])(?:\s*\1){2,}\s*$/.test(processed)) {
  processed = '---';
}
```

**Examples:**

- Input: `***` → Output: `---`
- Input: `___` → Output: `---`
- Input: `----` → Output: `---`
- Input: `* * *` → Output: `---`
- Input: `---` → Output: `---` (unchanged, already normalized)

**Edge Cases:**

- Requires at least 3 marker characters, all of the same kind
- Allows leading/trailing whitespace and spaces between markers (so `- - - -` is also matched)
- Doesn't affect text containing these characters (e.g., `Use --- for rules`)

##### Rule M4: Unordered List Normalization

**Purpose:** Standardize all unordered list markers to hyphen (`-`).

**Rationale:** Markdown allows three list marker characters: `* Item`, `- Item`, and `+ Item`. HNP-2 normalizes to hyphen-based lists.

**Implementation:**

```javascript
// Convert * or + at start of line (with optional indent) to -
processed = processed.replace(/^(\s*)[\*\+]\s+/, '$1- ');
```

**Examples:**

- Input: `* First item` and `+ Second item` → Output: `- First item` and `- Second item`
- Input: `- Already normalized` → Output: unchanged
- Input: `  * Indented item` → Output: `  - Indented item`

**Edge Cases:**

- Preserves indentation (for nested lists)
- Ensures a single space after rewritten `*` and `+` markers (lines already using `-` are not touched)
- Only affects line-start markers (doesn't change `*` in text)

##### Rule M5: Ordered List Normalization

**Purpose:** Ensure ordered list items have exactly one space after the period.

**Rationale:** Variable spacing after list numbers is common: `1.Item` (no space), `1.  Item` (two spaces). HNP-2 normalizes to a single space.

**Implementation:**

```javascript
// Collapse whitespace after "N." to one space, or insert one when the text
// follows the period directly; the lookahead leaves decimals like 3.14 alone
processed = processed.replace(/^(\s*)(\d+)\.(?:\s+|(?=[^\s\d]))/, '$1$2. ');
```

**Examples:**

- Input: `1.First item` → Output: `1. First item`
- Input: `2.   Second item` → Output: `2. Second item`
- Input: `3. Already normalized` → Output: unchanged

**Edge Cases:**

- Preserves indentation
- Preserves the actual number (doesn't renumber lists)
- Only affects line-start patterns

##### Rule M6: Link and Image Spacing Normalization

**Purpose:** Remove spaces between link/image brackets and parentheses.
**Rationale:** Authors sometimes put a space between the closing bracket and the opening parenthesis, writing `[text] (url)` instead of `[text](url)`. The spacing is inconsistent, and strict CommonMark renderers do not treat the spaced form as a link at all. HNP-2 normalizes to the no-space format.

**Implementation:**

```javascript
// Links: [text] (url) → [text](url)
processed = processed.replace(/\[([^\]]+)\]\s+\(([^\)]+)\)/g, '[$1]($2)');
// Images: ![alt] (url) → ![alt](url)
processed = processed.replace(/!\[([^\]]*)\]\s+\(([^\)]+)\)/g, '![$1]($2)');
```

**Examples:**

- Input: `[Link text] (https://example.com)` → Output: `[Link text](https://example.com)`
- Input: `![Image alt] (https://example.com/img.png)` → Output: `![Image alt](https://example.com/img.png)`
- Input: `[Already normalized](https://example.com)` → Output: unchanged

**Edge Cases:**

- Rewrites any `[...]` followed by whitespace and `(...)`, whether or not a link was intended, so prose such as `[sic] (twice)` is joined as well
- Preserves URL content exactly
- Handles image syntax separately from link syntax

##### Rule M7: Code Block Fence Normalization

**Purpose:** Standardize code block fences to triple backticks.

**Rationale:** Markdown allows two fence styles: three backticks or three tildes (`~~~`). HNP-2 normalizes to backtick-based fences.
**Implementation:**

```javascript
// Replace the whole run of leading tildes, so ~~~~ becomes ``` rather than ```~
processed = processed.replace(/^~{3,}/, '```');
```

**Examples:**

Input:

````markdown
~~~javascript
code
~~~
````

Output:

````markdown
```javascript
code
```
````

A block that already uses backtick fences (for example, one opened with `` ```python ``) is unchanged.

**Edge Cases:**

- Preserves language identifiers (e.g., `javascript`, `python`)
- Applies to any line that begins with three or more tildes, so opening and closing fences are both converted
- Doesn't affect inline code (`` `code` ``)

#### Markdown Normalization Edge Cases and Limitations

##### What HNP-2 Does NOT Normalize

Preserved elements:

- Block quotes (`>`): preserved exactly as written
- Inline code (`` `code` ``): preserved exactly
- Tables: preserved exactly (structure is semantic)
- Footnotes: preserved exactly
- Definition lists: preserved exactly
- Content within code blocks: never modified

**Rationale:** These elements either:

- Have a single, unambiguous Markdown syntax (blockquotes, inline code)
- Contain user content that shouldn't be modified (code blocks)
- Are complex enough that normalization could break them (tables)

##### Nested and Complex Markdown

**Nested emphasis:**

- Input: `**Bold with *italic* inside**` → Output: unchanged

Nested emphasis that already uses asterisks passes through unchanged. M2 still applies inside it, however, so `**Bold with _italic_ inside**` becomes `**Bold with *italic* inside**`.

**Escaped characters:**

- Input: `\*Not italic\*` → Output: unchanged

Backslash-escaped Markdown is left alone.

**Malformed Markdown:** HNP-2 normalizes valid Markdown syntax. If input is malformed (e.g., unclosed emphasis markers), it is passed through unchanged rather than repaired.

#### Phase 2: Text Normalization

After Markdown normalization, HNP-2 applies all HNP-1 rules (Rules 1-10) to the canonicalized Markdown text.
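Composed end to end, the two phases can be sketched in Node.js as follows. This is not a reference implementation, and several details are assumptions rather than anything this specification states: Phase 1 is applied line by line (splitting on any line-ending style); fenced blocks are found with a simplified check that pairs a closing fence with the opening fence's character; and M3 is tested before the other rules on each line, because running the M2 underscore pattern first would rewrite `___` into `*_*` before M3 sees it. The M2, M3, M5, and M7 patterns are written to produce the behavior the rule descriptions call for, and all function names are hypothetical.

```javascript
const crypto = require('crypto');

// Phase 2: HNP-1 Rules 1-9 in compact form
function hnp1Normalize(text) {
  let content = text.charCodeAt(0) === 0xFEFF ? text.slice(1) : text;            // Rule 1
  content = content.normalize('NFC').replace(/\r\n/g, '\n').replace(/\r/g, '\n'); // Rules 2-3
  let lines = content.split('\n')
    .map((l) => l.replace(/\s+$/, '').replace(/\t/g, '    '));                   // Rules 4-5
  while (lines.length > 0 && lines[0] === '') lines.shift();                      // Rule 6
  while (lines.length > 0 && lines[lines.length - 1] === '') lines.pop();         // Rule 7
  lines = lines.filter((l, i) => l !== '' || lines[i - 1] !== '');                // Rule 8
  return lines.join('\n').replace(/\n*$/, '\n');                                  // Rule 9
}

// Phase 1 rules for one line outside fenced code (M3 checked first, see above)
function normalizeMarkdownLine(line) {
  if (/^\s*([*\-_])(?:\s*\1){2,}\s*$/.test(line)) return '---';                  // M3
  let p = line;
  p = p.replace(/^(#{1,6})\s+(.+)$/, '$1 $2');                                    // M1
  p = p.replace(/^(#{1,6})([^\s#].*)$/, '$1 $2');                                 // M1
  p = p.replace(/__(.*?)__/g, '**$1**');                                          // M2 bold
  p = p.replace(/(?<!\\)\b_(.*?)(?<!\\)_\b/g, '*$1*');                            // M2 italic
  p = p.replace(/^(\s*)[\*\+]\s+/, '$1- ');                                       // M4
  p = p.replace(/^(\s*)(\d+)\.(?:\s+|(?=[^\s\d]))/, '$1$2. ');                    // M5
  p = p.replace(/\[([^\]]+)\]\s+\(([^\)]+)\)/g, '[$1]($2)');                      // M6 links
  p = p.replace(/!\[([^\]]*)\]\s+\(([^\)]+)\)/g, '![$1]($2)');                    // M6 images
  return p;
}

// Phase 1: fence lines get M7; lines inside a fenced block are left untouched
function normalizeMarkdown(raw) {
  let openFence = null; // '`' or '~' while inside a fenced block
  return raw.split(/\r\n|\r|\n/).map((line) => {
    const fence = line.match(/^ {0,3}(`{3,}|~{3,})/);
    if (fence && (openFence === null || fence[1][0] === openFence)) {
      openFence = openFence === null ? fence[1][0] : null;
      return line.replace(/^( {0,3})~{3,}/, '$1```');                             // M7
    }
    return openFence === null ? normalizeMarkdownLine(line) : line;
  }).join('\n');
}

function hnp2Hash(raw) {
  const canonical = hnp1Normalize(normalizeMarkdown(raw));
  return '0x' + crypto.createHash('sha256').update(canonical, 'utf8').digest('hex');
}

const a = '#Shard 1\r\n\r\n__Bold__ and _italic_\r\n***\r\n* item\r\n1.First\r\n';
const b = '# Shard 1\n\n**Bold** and *italic*\n---\n- item\n1. First\n';
console.log(hnp2Hash(a) === hnp2Hash(b)); // true
```

The two inputs differ in heading spacing, emphasis markers, rule style, list markers, ordered-list spacing, and line endings, yet reduce to the same canonical Markdown and therefore the same hash.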
Running HNP-1 after Phase 1 means the final hash benefits from:

- Markdown structure normalization (Phase 1)
- Text normalization: BOM, Unicode, and whitespace handling (HNP-1 Rules 1-9)
- Cryptographic hashing (HNP-1 Rule 10)

#### Complete HNP-2 Pipeline

```text
Raw Markdown File
    ↓
[Phase 1: Markdown Normalization]
    ├─ [M1: Normalize headings]
    ├─ [M2: Normalize emphasis]
    ├─ [M3: Normalize horizontal rules]
    ├─ [M4: Normalize unordered lists]
    ├─ [M5: Normalize ordered lists]
    ├─ [M6: Normalize link/image spacing]
    └─ [M7: Normalize code block fences]
    ↓
Canonicalized Markdown
    ↓
[Phase 2: HNP-1 Text Normalization]
    ├─ [Rule 1: Strip BOM]
    ├─ [Rule 2: Unicode NFC]
    ├─ [Rule 3: Line endings → \n]
    ├─ [Rule 4: Remove trailing whitespace]
    ├─ [Rule 5: Tabs → 4 spaces]
    ├─ [Rule 6: Remove leading blank lines]
    ├─ [Rule 7: Remove trailing blank lines]
    ├─ [Rule 8: Compress blank lines]
    └─ [Rule 9: Ensure single trailing \n]
    ↓
[Rule 10: SHA-256 hash]
    ↓
0x-prefixed canonical hash
```

### Final Thoughts: Precision Enables Trust

The HNP protocols are technical specifications, but their purpose is human: enabling readers to trust that the words they're reading are the words the author wrote. In an age of AI-generated text, deepfakes, and platform manipulation, this guarantee matters.

When you verify a hash against the Lit3 Ledger, you're not just checking bytes; you're confirming authorship, authenticity, and intent. The technical precision documented in this specification exists to make that trust possible.

## Publication Information

- [Lokapal](https://paragraph.com/@lokapal/): Publication homepage
- [All Posts](https://paragraph.com/@lokapal/): More posts from this publication
- [RSS Feed](https://api.paragraph.com/blogs/rss/@lokapal): Subscribe to updates
- [Twitter](https://twitter.com/lokapalxyz): Follow on Twitter
- [Farcaster](https://farcaster.xyz/lokapal): Follow on Farcaster