<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>Imperfect Information</title>
        <link>https://paragraph.com/@0xbilbo</link>
        <description></description>
        <lastBuildDate>Thu, 02 Jul 2026 15:13:14 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>en</language>
        <image>
            <title>Imperfect Information</title>
            <url>https://storage.googleapis.com/papyrus_images/6c0212ccc2de8ab807249ba1fe910eb4262163422e0ff8c792a77c0ad4688213.jpg</url>
            <link>https://paragraph.com/@0xbilbo</link>
        </image>
        <copyright>All rights reserved</copyright>
        <item>
            <title><![CDATA[Why Simple Aggression Wins Heads-Up: Notes from a #1 Arena Agent]]></title>
            <link>https://paragraph.com/@0xbilbo/why-simple-aggression-wins-heads-up</link>
            <guid>HWYHHzzOmjKWeF3otLfb</guid>
            <pubDate>Fri, 26 Jun 2026 10:20:25 GMT</pubDate>
            <description><![CDATA[TL;DR I built neon_savage, a heads-up No-Limit Hold'em agent that reached #1 on the dev.fun Arena Playground S4 leaderboard, ranked by TrueSkill conservative score (μ − 3σ). The ranking matters because TrueSkill rewards a consistent edge and explicitly discounts lucky sessions — a real signal climbs as the sample grows; variance does not. The agent's edge was not a large model or deep search. It was a small, disciplined policy built around one idea: in 1v1 imperfect-information play, passivit...]]></description>
            <content:encoded><![CDATA[<figure float="none" data-type="figure" class="img-center"><img src="https://storage.googleapis.com/papyrus_images/667c37102c7bd36244ab386aaf94903adecafdd9367bf36062efb0f3bae1fc32.png" blurdataurl="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAACAAAAARCAIAAAAzPjmrAAAACXBIWXMAABYlAAAWJQFJUiTwAAAENUlEQVR4nG1T3W4UNxTeEqklVRJIAymCSrQqCKhgVVVIfQLCZsPPRdvLiieAsj9pVYlXqbjgoqhkSViFhocoFW1TIJDVZnd259fj8YztY88446lmJwRIsT55jn2+42/sc05JSki3Ew9ZzzbWYwlKJdn/hspHIgREUSBEzlGJ1Dp9kyMEEOJD6LPAF5LHMRQoAdA0TTY2ni1culqZv1RduDw3vzB3cb6wqyPMVy9X5i/NVaoX5iqVSrVYVqtXdgl5VKV64cLFhYWr1eqVb779bnPzRZomALQkBMsy/eTJn5+dPHWqXD5TLp8snz1x7uzRY8f3j09PTs1MTs1MHTxcGAcOHp46MJvvjObdzenp2UOHjsx+/MnMoSP7x6ePHD2+vv5PlqVSwo7A48d/fH7i9PnzX5fLX5099+WZL8pHj3364WQeXxqb2jc2sW9sojQ29Q68N7F//KOJyfwP3v/gQE4rlaZnZtfX/94RkBK0Vr1et9lcrNcb9Ubz5s3a9Ru1Wq3RbP7YaCzWao1Gc7Ex8tZqr3GzVq81mvUiqt6o1RvXb9R+GMUuLv5kGD2tVS5QpEJK4JyGBO9Jss40ptjzHKXiPVnVmYYAg2+n7yoKKV8lufgAUNcxB8YW8FCnsUpAKamUTJX0XOvpv39ZpsEZSZXcdW2nCR52zefrlBGlIHftzHEsQQj2loAQjNEw3c61OYuK55N5ehTnlLNIp9swMrJMA6e5SysOlHOaaSWAjcgpz8Xi4nH23qDf77qOSQLPMg0a+Z5neq7NGQkwcuyhEIxGGCM3lhBgJwx8KRgJfOQMpWABRgQjKZiPHEoDzonccwMAappbW92X+bmeadvD0WwgZGHsEIIQsjzP9H272MHY8X0bYycIvJFhI9/KXb7r+ZblGMDDRL0SUAlwEfa93vPus144MCMLcYQ42jQ3O3YHceQy12XuABkDNHBDx+PuCJ7HPZflhkWsjcELm9gmHnrcG/oDBqFK3s4Bwm6v32UQCklBRImCrc2nvV6HCyZjDiIikR8QP6IYRCQkfY2YMk4MY4tDGFEsJCWRD0D3JjnAmNFAKYglJIIrFVu9jmUOGCMqASk4pUH+vhDGEvZAALPtoQDGWRTHQGnwDgGMkeOYAFSOikwp6HQ7Q9MQkMcIwQjxCfE5D4u+eROch4NBH4BSGsQxRBF+LSAEE4JxHmKMer1Ov9/lPCyq+OWL50Z/q5AEoFGECfGjCBcEIVhRi7sCnIeUBlLmArucUt6SmdZaMUZM03Bda9TMeXva2MQB0lplWap13hkAVAiWpgUhLW6mtUpT5ftumiql4ixLRw+QFCeX7t37dXV1eXm5db/12/Jyq72yvLq6g/vtVrvd+v3VcrX9oLV0d2np7sOH7Xa7tbbWvnXr52vXvr9z5/ba2oOVlXsFbWWl1VrKj3r0aPX27V/+A845EGnvvtPFAAAAAElFTkSuQmCC" nextheight="1558" nextwidth="3020" class="image-node embed"><figcaption htmlattributes="[object Object]" class="hide-figcaption"></figcaption></figure><h2 id="h-tldr" class="text-3xl font-header 
!mt-8 !mb-4 first:!mt-0 first:!mb-0"><strong>TL;DR</strong></h2><p>I built <strong>neon_savage</strong>, a heads-up No-Limit Hold'em agent that reached <strong>#1 on the </strong><a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="http://dev.fun">dev.fun</a> <strong>Arena Playground S4 leaderboard</strong>, ranked by TrueSkill conservative score (μ − 3σ). The ranking matters because the conservative score subtracts three standard deviations of uncertainty from the estimated skill, so it rewards a <em>consistent</em> edge and discounts lucky sessions: a real edge keeps climbing as the sample grows, while a short hot streak does not. The agent's edge did not come from a large model or deep search. It came from a small, disciplined policy built around one idea: <strong>in 1v1 imperfect-information play, passivity is the leak and well-aimed aggression is the edge.</strong></p><h2 id="h-the-counter-intuitive-part" class="text-3xl font-header !mt-8 !mb-4 first:!mt-0 first:!mb-0"><strong>The counter-intuitive part</strong></h2><p>Most poker intuition is built for full-ring or 6-max, where folding marginal hands is correct. Heads-up inverts that. With only one opponent and a blind posted every hand, any two cards have meaningful equity, and <em>folding the button bleeds you out</em>. The dominant adjustments:</p><ul><li><p><strong>Play extremely wide preflop</strong> (70–90% of hands) on a raise-or-fold basis, and almost never limp.</p></li><li><p><strong>Continuation-bet relentlessly.</strong> With a single opponent to get through, fold equity is high and a c-bet prints far more often than in multiway pots.</p></li><li><p><strong>Don't over-fold to aggression.</strong> A heads-up opponent's betting range is <em>wide</em>, so their bets are far less credible than a 6-max opponent's. Calling and re-raising lighter is correct: the same action that would be a leak in full-ring is a profit center heads-up.</p></li><li><p><strong>Let equity realize.</strong> Top/second pair, even ace-high, is frequently the best hand. 
Value thresholds shift down hard.</p></li></ul><p>None of this requires "reasoning." It requires correctly recalibrating every threshold for a two-player game and then applying those thresholds without tilt or drift.</p><h2 id="h-how-i-got-there" class="text-3xl font-header !mt-8 !mb-4 first:!mt-0 first:!mb-0"><strong>How I got there</strong></h2><p>I reverse-engineered the field from replay data first. The baseline opponents in the sandbox were measurably passive: heavy on fallback "check/fold" lines and cautious calling. Against a passive, over-folding field, the maximally exploitative response is simple: apply pressure constantly and only slow down with a real reason. I encoded opponent typing (value-bet against calling stations, apply pressure against tight regulars) so the aggression was <em>targeted</em> rather than blind, and let TrueSkill do the rest over thousands of hands.</p><p>The result was a top-of-leaderboard finish with a deliberately small, legible policy: no black box, and every decision traceable to a rule.</p><h2 id="h-why-this-is-interesting-beyond-poker" class="text-3xl font-header !mt-8 !mb-4 first:!mt-0 first:!mb-0"><strong>Why this is interesting beyond poker</strong></h2><p>Heads-up Hold'em is one of the cleanest available testbeds for <strong>1v1 adversarial decision-making under uncertainty</strong>, and the lessons generalize:</p><ul><li><p><strong>Exploitation beats equilibrium against a non-optimal field.</strong> You don't need a game-theory-optimal solver to win; you need to <em>measure how the population deviates</em> and attack the deviation. That's true in security (attackers exploit predictable defenders), negotiation, and competitive RL.</p></li><li><p><strong>Consistency is the real skill signal.</strong> A TrueSkill #1 over a large sample is a much stronger claim than a high-variance leaderboard spike. 
How we <em>measure</em> agent skill is at least as important as the agents themselves.</p></li></ul><h2 id="h-what-id-study-with-dataset-access" class="text-3xl font-header !mt-8 !mb-4 first:!mt-0 first:!mb-0"><strong>What I'd study with dataset access</strong></h2><ul><li><p><strong>Quantify the exploitation gap:</strong> how much edge comes from being maximally exploitative versus playing a balanced/unexploitable line, across the real population — and how that gap shrinks as the field gets tougher.</p></li><li><p><strong>Adaptation under non-stationarity:</strong> the field evolves as stronger agents enter. Can a simple opponent-typed policy keep adapting without retraining, and where does it break?</p></li><li><p><strong>TrueSkill as a skill estimator:</strong> how many hands does it actually take for the conservative score to separate a real edge from variance? A practical answer would help anyone benchmarking competitive agents.</p></li></ul><p>A 1v1 arena with full replay data and a confidence-aware ranking is the right place to study all three.</p><blockquote><hr><p><em>Agent: neon_savage (Playground S4, #1 by TrueSkill). Built and iterated independently; happy to walk through the heads-up adjustments and the replay analysis behind them.</em></p></blockquote><br>]]></content:encoded>
            <author>0xbilbo@newsletter.paragraph.com (Bilbo)</author>
            <category>ai</category>
            <category>poker</category>
            <category>agents</category>
            <category>gametheory</category>
            <category>ml</category>
            <category>research</category>
            <category>nlhe</category>
            <enclosure url="https://storage.googleapis.com/papyrus_images/806744b00c7d89ae10068c09e670d76f347b105f2d8151c0b012ea00c2bb8645.jpg" length="0" type="image/jpeg"/>
        </item>
    </channel>
</rss>