<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>Palet</title>
        <link>https://paragraph.com/@palet</link>
        <description>Notes from the Palet team on AI trends and on building an open protocol for portable context.</description>
        <lastBuildDate>Mon, 27 Apr 2026 07:27:09 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>en</language>
        <image>
            <title>Palet</title>
            <url>https://storage.googleapis.com/papyrus_images/a5bf3b9110f7fedbd27912e4696191c1d592646b0118e2e4eb848b5224134b9d.png</url>
            <link>https://paragraph.com/@palet</link>
        </image>
        <copyright>All rights reserved</copyright>
        <item>
            <title><![CDATA[Thoughts, Trends, and Questions]]></title>
            <link>https://paragraph.com/@palet/thoughts-trends-and-questions</link>
            <guid>osTyXN6hUucfqDoa5XUF</guid>
            <pubDate>Fri, 31 May 2024 19:14:11 GMT</pubDate>
            <description><![CDATA[The cost of inference is trending towards zero. Token throughput is trending towards infinity. Context window sizes are getting larger. Companies are spending more on training despite improvements in compute and cost efficiency. Models are quickly becoming commoditized. Compute is quickly becoming commoditized. We’re sharing our notes on trends that we wrote about back in December 2023 (and updated in February 2024). This document has been sitting in our team Notion workspace for almost half ...]]></description>
            <content:encoded><![CDATA[<ul><li><p><strong>The cost of inference is trending towards zero</strong></p></li><li><p><strong>Token throughput is trending towards infinity</strong></p></li><li><p><strong>Context window sizes are getting larger</strong></p></li><li><p><strong>Companies are spending more on training despite improvements in compute and cost efficiency</strong></p></li><li><p><strong>Models are quickly becoming commoditized</strong></p></li><li><p><strong>Compute is quickly becoming commoditized</strong></p></li></ul><hr><p><em>We’re sharing our notes on trends that we wrote about back in December 2023 (and updated in February 2024). This document has been sitting in our team Notion workspace for almost half a year now, so we figured we may as well put it out there rather than letting it collect dust. While some of the observations are dated, others are holding up pretty well. And that’s pretty exciting, because at the time we were just having fun speculating about the near future. Note that there is no particular structure to this document since it was just something we threw together. We hope you find it entertaining!</em></p><hr><h2 id="h-thoughts-on-current-trends" class="text-3xl font-header !mt-8 !mb-4 first:!mt-0 first:!mb-0">Thoughts on Current Trends</h2><p>Note that these trends are focused on transformer models.</p><p><strong>Moore&apos;s Law</strong></p><p>The doubling of transistor density every two years will lead to faster and more cost-effective computing performance, enhancing the efficiency of model training and inference over time. Cost-effective computing performance is improving at a rate of 1.35x per year.</p><p><strong>Jevons’ Paradox</strong></p><p>When the cost of using a resource decreases due to increased efficiency, it becomes more attractive for consumers and industries to utilize it. That is why, when the internal combustion engine became more efficient, fuel consumption, and as a consequence greenhouse gas emissions, increased. In software development the same phenomenon is described by Wirth’s Law: devs always figure out how to bloat software faster than hardware can keep up. Or, said simply, we have more resources so we do more things.</p><p><strong>Price Competition</strong></p><p>In addition to Moore&apos;s Law, competitive pricing among compute providers is further driving down the cost of processing and generating tokens. Cheaper inference increases accessibility, governed by Jevons&apos; Paradox, where increased efficiency leads to higher overall consumption. This results in unlocks such as increasing context window sizes, more sophisticated planning (agent) workflows, and (arguably) excessive inferencing for things like <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://vercel.com/blog/ai-sdk-3-generative-ui">generative web components</a> (see Wirth’s Law). Maybe ‘generative everything’ is what leads us to the <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://en.wikipedia.org/wiki/Dead_Internet_theory">Dead Internet</a>, e.g. <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://huggingface.co/spaces/jbilcke-hf/ai-tube">AITube</a>. To really drive it home, you would expect demand for token consumption to increase proportionally as the cost of inference decreases.</p><p>But here&apos;s a surprising fact: while inference costs are dropping by a factor of 15x each year, the demand for processing and generating more tokens is increasing significantly faster. We can use context window size as a proxy for estimating just how much, especially since it is the most significant driver of token processing consumption. The answer? Context windows have grown 1,250x each year since 2022.</p>
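<p><em>To make these rates concrete, here’s a minimal Python sketch of the arithmetic. The 2,500x total and the Gemini launch estimates come from the chart below; reading 2022’s typical context as roughly 4k tokens, and treating the 1,250x-per-year figure as a linear annualization, are our assumptions.</em></p><pre><code># The growth rates above, composed. Endpoints: ~4k tokens (2022, our
# assumption) to ~10M tokens (2024 Gemini launch estimates, per the chart).
ctx_2022 = 4_000
ctx_2024 = 10_000_000
years = 2

total = ctx_2024 / ctx_2022      # 2,500x between 2022 and 2024
linear = total / years           # 1,250x per year, annualized linearly
compound = total ** (1 / years)  # ~50x per year, compounded

cost_drop = 15  # inference costs fall ~15x per year, per the text above
print(f"total context growth: {total:,.0f}x")
print(f"linear per year:      {linear:,.0f}x")
print(f"compound per year:    {compound:,.0f}x")
print(f"linear growth outpaces the cost decline by {linear / cost_drop:,.0f}x")
</code></pre>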
<figure float="none" data-type="figure" class="img-center"><img src="https://storage.googleapis.com/papyrus_images/e9646df54916a194ca6fd214e7176b00c9d11b78e77d6f03cda7a307594b9a51.png" alt="Figure 1" class="image-node embed"><figcaption class="">1. We’ll continue to see costs fall as more specialized ASICs and maybe even models implemented in hardware (physically burned to a chip) offer better inference economics. Source: https://artificialanalysis.ai/models/mixtral-8x7b-instruct</figcaption></figure><figure float="none" data-type="figure" class="img-center"><img src="https://storage.googleapis.com/papyrus_images/ddb7ed491f3b9b3ce3674309e5234276f6fa1cf81649b7ac082dd1e889d72407.png" alt="Figure 2" class="image-node embed"><figcaption class="">2. GPT-3 Curie is a discontinued OpenAI model that has 6.7B parameters. It scored something like 25 on the MMLU. Similar 7B-parameter models today, like Llama-2 7B, score 45. But that&apos;s another, separate trend: smarter models, same parameter count. For clarity, inferencing Curie and Llama-2 7B (or any 7B model) generally costs the same without going into transformer inference math.</figcaption></figure><figure float="none" data-type="figure" class="img-center"><img src="https://storage.googleapis.com/papyrus_images/94bf44b28892c54c293423d2502a5bcb2eb455bc2d830784193e7ee0da2811fe.png" alt="Figure 3" class="image-node embed"><figcaption class="">3. The general trend in state-of-the-art (SOTA) model context window sizes, illustrating two growth patterns: between 2020 and 2022, context windows doubled in length, whereas between 2022 and 2024, they’ve increased 2,500x. Updated in February 2024 with launch estimates for Gemini models.</figcaption></figure><p><strong>More Inference</strong></p><p>We’re finding more places to run models too. For example, Georgi Gerganov’s llama.cpp offloads token processing and generation to the CPU, so now any server or consumer device can serve a model using CPU clock cycles as opposed to GPU only. And there seems to be a lot of work being done getting around memory constraints so that even memory-bound devices can run inference on larger models. Quantization is the obvious one here, but there are also techniques like offloading and distributed inferencing (see Petals), just to run the gamut. WebAssembly might also play a role because it enables inferencing from the browser, meaning that smaller models (which are also cheaper to inference) can be used as a sort of ‘worker’ for low-IQ tasks (e.g. reasoning assists) without running up the cloud bill.</p>
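<p><em>For the curious, here’s roughly what CPU-only inference looks like through the llama-cpp-python bindings to llama.cpp. The bindings and the model file are our choice of example, not something from our notes; the path is a placeholder for any quantized GGUF checkpoint.</em></p><pre><code># A rough sketch of CPU-only inference via llama-cpp-python
# (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="./models/mistral-7b-instruct.Q4_K_M.gguf",  # hypothetical path
    n_ctx=2048,      # context window to allocate
    n_gpu_layers=0,  # 0 = keep every layer on the CPU
)

out = llm("Q: What is quantization, in one sentence? A:", max_tokens=48)
print(out["choices"][0]["text"])
</code></pre>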
<p><strong>Wirth’s Law for Training</strong></p><p>Algorithmic optimizations result in a 3x-per-year decline in the physical compute required to run a training cycle. Yet these efficiencies are offset by a 3.1x increase in the USD cost of the most expensive training run for every year since 2009, another example of Jevons’ Paradox (a.k.a. Wirth’s Law).</p><figure float="none" data-type="figure" class="img-center"><img src="https://storage.googleapis.com/papyrus_images/812d4e6f08e9f836da9ad491ef7ebe5e6a00e426c23e405d230e43d498890824.png" alt="Figure 4" class="image-node embed"><figcaption class="">4. Despite algorithmic optimizations that result in a decline in the physical compute requirements to run a training cycle, despite Moore’s Law, and despite price competition between compute providers, co&apos;s are spending more and more every year on training runs. Note: training makes up just 10% of the lifetime costs of a model. It would be interesting to see how much more co&apos;s are spending on inference every year (models get bigger faster than Moore&apos;s Law can keep up). That&apos;s probably going to trend up as more compute is thrown at inference. See Monte Carlo Tree Search, Q</figcaption></figure><p><strong>GPU &gt; CPU</strong></p><p>The general trend is that hyperscalers are running the Apple playbook and vertically integrating, from bare metal to the web interface, going from compute aggregators to end-to-end clock cycle providers. Let’s assume for a moment, given all of the trends, that every clock cycle in the near future will go towards some form of token generation: site rendering and site copy, porn, video games, ads, etc.</p><p>By that measure, the future of the compute market will be defined by the metric of serving floating point operations per second (FLOP/s).</p>
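<p><em>A back-of-the-envelope sketch of why FLOP/s maps onto tokens. It assumes the common rule of thumb that a forward pass costs about 2 FLOPs per parameter per token; that figure is our assumption, and real systems are usually memory-bandwidth bound, so treat the outputs as ceilings, not measurements.</em></p><pre><code># How many output tokens per second does a given FLOP/s budget buy?
def tokens_per_second(flops: float, n_params: float) -> float:
    # ~2 FLOPs per parameter per generated token (assumed rule of thumb)
    return flops / (2 * n_params)

accelerator_flops = 300e12  # a hypothetical 300 TFLOP/s accelerator
for n_params in (7e9, 70e9, 700e9):
    tps = tokens_per_second(accelerator_flops, n_params)
    print(f"{n_params / 1e9:>5.0f}B params -> {tps:>9,.0f} tokens/s ceiling")
</code></pre>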
<p>The demand for cost-effective, high-performance compute will skyrocket (commoditizing hardware) and, naturally, everyone is going to want to go after NVIDIA’s market share.</p><ul><li><p>Groq&apos;s Tensor Streaming Processor and Language Processing Unit (LPU)</p></li><li><p>Bitmain&apos;s custom Tensor Processing Units (TPUs)</p></li><li><p>Google&apos;s TPUs</p></li><li><p>AWS Trainium and Inferentia silicon</p></li><li><p>Apple’s M-Series chips (pray they make enterprise versions)</p></li></ul><p>Some believe this will ultimately lead to a decline in the enterprise values of chip designers and manufacturers, similar to what <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.ft.com/content/81a03045-86f7-4e57-afbd-5ff83679615f">Cisco experienced in the early 2000s.</a></p><p><strong>Kurzweil’s Law</strong></p><blockquote><p>Evolution applies positive feedback in that the more capable methods resulting from one stage of evolutionary progress are used to create the next stage. Each epoch of evolution has progressed more rapidly by building on the products of the previous stage.</p></blockquote><p>It’s likely that once last-gen models get good enough they will be able to aid in the development, one way or another, of the next-gen model. A straightforward example is how data labeling becomes more efficient as processing and token generation costs go down, and as last-gen models get better. This cost reduction also makes it ever more viable to continue integrating modalities into tokens as a unified representation of information, which expands data labeling from just language, to image, to the next modality, and so on. This makes sense since token representations all share the same form as language tokens anyway. See Meta’s ImageBind.</p><p>It’s also likely that multi-modal models will outperform specialist models because they just have more knowledge to work with. And they can think and ‘reason’ across a broader spectrum. Something like what Feynman said about John Tukey, who could keep time by picturing a clock whereas Feynman had to ‘hear’ himself count in his head.</p><p><strong>Open Source</strong></p><p>Open models are lagging behind proprietary models but are improving at a faster rate. This is likely due to the sheer frequency of iteration available to open research and development. All of this is explained much better in the (Google) memo titled ‘<strong>We have no moat, and neither does OpenAI’.</strong></p><figure float="none" data-type="figure" class="img-center"><img src="https://storage.googleapis.com/papyrus_images/0635bbd73f32f7609d3974bc40a34021c301b8aae7c6d347b5c8b5c876cad4d9.png" alt="Figure 5" class="image-node embed"><figcaption class="">5. Well-funded co&apos;s releasing open models seem to be catching up to well-funded co&apos;s releasing closed models. Sadly, we haven&apos;t seen any underground or grassroots labs release a SOTA model contender yet. Note: the MMLU is just one of many benchmarks for measuring how &apos;smart&apos; a model is.</figcaption></figure>
<blockquote><p>Research institutions all over the world are building on each other’s work, exploring the solution space in a breadth-first way that far outstrips [Google’s] own capacity. We can try to hold tightly to our secrets while outside innovation dilutes their value, or we can try to learn from each other.</p></blockquote><figure float="none" data-type="figure" class="img-center"><img src="https://storage.googleapis.com/papyrus_images/3f6ef69c22189c6a87843e736991b8d61651dfa75b240fe2e6532c99d65e5d06.png" alt="Figure 6" class="image-node embed"><figcaption class="">6. Anyone can contribute to open research. This is classic Cathedral v. the Bazaar. The only difference this time is that the open source community is lacking one key resource: compute.</figcaption></figure><hr><h2 id="h-questions-about-the-next-decade-or-two" class="text-3xl font-header !mt-8 !mb-4 first:!mt-0 first:!mb-0"><strong>Questions About the Next Decade (or Two)</strong></h2><p><strong>Energy</strong></p><p>It’s obvious that this will just boil down to an energy game (it always has been, but now more than ever). That leaves us with a few questions.</p><ul><li><p>Where do solar, coal, gas, nuclear, lithium, and fusion stand? For example, gas plants can be ramped up and down almost on demand, whereas coal plants can’t because of thermal inertia. What other factors need to be taken into consideration?</p></li><li><p>With that said, what are the geopolitical implications? There’s a paper titled <strong>Effects of Energy Consumption on GDP: New Evidence of 24 Countries on Their Natural Resources and Production of Electricity</strong> that supports the idea that energy consumption drives GDP. But it also suggests a ‘complex relationship.’ Doesn’t the relationship become more straightforward? More energy → more compute → more intelligence → more innovation. And it’s no longer about reproduction.</p></li><li><p>Does the energy demand for AI training and inference undermine that of crypto?</p></li><li><p>How fast are we making improvements in performance (FLOP/s) per watt? What is the physical limit?</p></li></ul><figure float="none" data-type="figure" class="img-center"><img src="https://storage.googleapis.com/papyrus_images/486eef35190cb22f87a32ae163d29d4afeb1c1c0c93a24dedd680bf1953f7df9.png" alt="Figure 7" class="image-node embed"><figcaption class="">7. Based on the Green500. This is also known as Koomey&apos;s Law.</figcaption></figure><ul><li><p>How does this trend compare to the growing energy demands for training and inferencing bigger (and better) models? Does it outpace it? By how many orders of magnitude per year? (See the sketch below.)</p></li></ul>
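<p><em>Here is one way to frame that comparison in Python. The 4x-per-year training compute figure comes from the chart below; the efficiency rate is our assumption, a Koomey-style doubling of FLOP/s per watt every ~2.5 years.</em></p><pre><code># Training compute growth vs. efficiency growth, compounded.
compute_growth = 4.0                # x per year thrown at training (chart below)
efficiency_growth = 2 ** (1 / 2.5)  # ~1.32x per year, assumed Koomey-style rate

net = compute_growth / efficiency_growth
print(f"net energy demand for training grows ~{net:.1f}x per year")
print(f"compounded over a decade: ~{net ** 10:,.0f}x")
</code></pre>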
<figure float="none" data-type="figure" class="img-center"><img src="https://storage.googleapis.com/papyrus_images/b9bbb3a57bbae118f656d117a672f7a61996a90be6bc299454908351cd677258.png" alt="Figure 8" class="image-node embed"><figcaption class="">8. Companies are throwing 3-4x more compute at training models every year. At what point does the energy demand of a data center reach the energy caps set by public utility companies? One solution could be to network data centers across states as &apos;superclusters.&apos; That way you can overcome local energy caps by arbitraging power consumption across states. Source: https://epochai.org/blog/compute-trends</figcaption></figure><p><strong>Data Centers and Supply Chain</strong></p><p>We’ll assume that the current trends hold for the next decade or so, and that this doesn’t end up being like the dot-com bubble.</p><ul><li><p>What is being overlooked? Who makes the uninterruptible power supply systems? Flywheel backups? Battery backups (like saltwater batteries)? The transfer switches?</p></li><li><p>What companies maintain the HVAC systems to cool down these centers? What is the ideal climate to build a data center in? As centers upgrade to liquid-cooled systems, who supplies/manufactures/maintains those components? Do cities progressively reorganize around data centers instead of ports and waterways?</p></li><li><p>What does the power profile of a data center look like? Who is contracted to build out the utility substations? What company names (suppliers) pop up as you move your finger along the electrical schematic(s) of a data center?</p></li><li><p>Across the entire data center supply chain, which components are hardest to scale up?</p></li><li><p>Some data centers are located in remote locations. Who services the employees that work there? What about security detail? The White House AI Executive Order requires that training runs using over 1e26 FLOPs of compute be reported to the U.S. government. Who handles the reporting? The order also emphasizes the importance of both the AI systems (including models) and the infrastructure supporting them (such as data centers) in terms of national security, economic security, and public health and safety. Do these get nationalized? Public-Private Partnership’ed?</p></li><li><p>What happens to these companies? ↓</p></li></ul>
<figure float="none" data-type="figure" class="img-center"><img src="https://storage.googleapis.com/papyrus_images/38dba7bbc1b60d9be37a00266fb945402a8ea409ed95ed6baa97002e1000ae39.png" alt="Figure 9" class="image-node embed"><figcaption class="">9. Considering historical precedents where the US has intervened to protect &apos;national and economic interests,&apos; such as the intervention in Kuwait in the 90s and the involvement in Chile in the 70s, it&apos;s not crazy to imagine that the entire semiconductor supply chain, from raw materials to data centers, becomes of national interest (and a potential future cause of conflict).</figcaption></figure><p><strong>Education</strong></p><ul><li><p>What degrees or fields of study are susceptible to becoming inference tokens?</p></li><li><p>When can we expect models to work alongside (and eventually replace) humans doing research?</p></li><li><p>Is there a rapidly closing window of opportunity for certain STEM degrees, where the skills and knowledge taught today will no longer be economically viable for humans by the time X cohort of freshmen graduate? And if so, what fields of study are most likely to fall outside the ‘Overton window’ of viable career paths first?</p></li><li><p>This all feels like what happened to the mechanical watch industry when Seiko introduced the quartz watch. A lot of Swiss brands died, but a few, namely Rolex, Omega, and others, pivoted to luxury. People buy mechanical watches because they are beautiful. What skills or professions become Rolex?</p></li><li><p>Does the government prop up ‘bullshit jobs’ like it subsidizes corn, soy, and wheat?</p></li></ul><figure float="none" data-type="figure" class="img-center"><img src="https://storage.googleapis.com/papyrus_images/cb9617138720a85fd8b6e37ec9a624b6688901a69b930ee029774e8a20b73556.png" alt="Figure 10" class="image-node embed"><figcaption class="">10. The Philippine call center and business process outsourcing (BPO) market is something like $100B. Yet it&apos;s not hard to imagine that it will get automated away in the next decade. See switchboard operators.</figcaption></figure><p><strong>Real Estate</strong></p><p>Let’s assume models keep getting better and better, to the point where they become economically viable as substitutes for humans in keyboard-and-mouse jobs. This means that knowledge capital can be deployed and scaled anywhere in the world.</p><ul><li><p>Why would companies base their headquarters in places that anchor them to local taxes and jobs when they are free to chase the lowest costs (taxes, climate, real estate, etc.)? Will co’s overcome the tyranny of place?
Or will there be some sort of exit tax on knowledge capital?</p></li></ul><blockquote><p>Because information technology transcends the tyranny of place, it will automatically expose jurisdictions everywhere to de facto global competition on the basis of quality and price… Leading nation-states, with their predatory, redistributive tax regimes and heavy-handed regulations, will no longer be jurisdictions of choice. Seen dispassionately, they offer poor-quality protection and diminished economic opportunity at monopoly prices… The leading welfare states will lose their most talented citizens through desertion.</p></blockquote><ul><li><p>Let’s continue rolling with these assumptions. Will we see a mass exodus from major cities? Will the value of prime real estate in tech hubs like SF and NYC plummet?</p></li></ul><figure float="none" data-type="figure" class="img-center"><img src="https://storage.googleapis.com/papyrus_images/df90862fd489aa683f344176e5cca5abdf6bfe9991cb05b2bf615cb89ed1f1fc.png" alt="Figure 11" class="image-node embed"><figcaption class="">11. Human mouse clicks and keystrokes will be replaced by GPUs and ASICs streaming output tokens.</figcaption></figure><p><strong>Ethics</strong></p><ul><li><p>At what point do these models become sentient? It likely doesn’t even matter whether they are conscious or sentient as long as the average person thinks they are or feels a certain way about them. For example, environmentalists care about the earth even though it is not sentient. So when does that happen?</p></li><li><p>People don’t even have to care. Maybe it becomes a form of virtue signaling?</p></li></ul><figure float="none" data-type="figure" class="img-center"><img src="https://storage.googleapis.com/papyrus_images/03828008fc79c0d6244b008e5a42430a57ac51a29c452dc65e83c074fcd9a550.png" alt="Figure 12" class="image-node embed"><figcaption class="">12. Long Term Bet: High-speed, large-scale matrix multiplication will simulate sentient behavior so convincingly that it becomes indistinguishable from actual sentience.</figcaption></figure>]]></content:encoded>
            <author>palet@newsletter.paragraph.com (Palet)</author>
            <enclosure url="https://storage.googleapis.com/papyrus_images/38c4200cc83fa4745df780a6c1e2e469e2fa1d5df148bd54e5b951abf225f062.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[What We're Working On]]></title>
            <link>https://paragraph.com/@palet/what-we-re-working-on</link>
            <guid>M81Mz5ecTMylVnkDDbgZ</guid>
            <pubDate>Thu, 18 Apr 2024 04:25:33 GMT</pubDate>
            <description><![CDATA[Just two years ago AI could only retain about 8 pages of information. Now, it can memorize the equivalent of 10 King James Bibles. Context window size determines how much models can remember, and it is growing at a staggering rate—1,250x each year. Everything you do, see, or hear now fits as memories in an AI’s context window. Combined with increasingly smarter models, this will be the ultimate competitive edge. But there is also a risk of lock-in with platforms that monopolize your context. Similar to h...]]></description>
            <content:encoded><![CDATA[<ul><li><p><strong>Just two years ago AI could only retain about 8 pages of information</strong></p></li><li><p><strong>Now, it can memorize the equivalent of 10 King James Bibles</strong></p></li><li><p><strong>Context window size determines how much models can remember, and it is growing at a staggering rate—1,250x each year</strong></p></li><li><p><strong>Everything you do, see, or hear now fits as memories in an AI’s context window</strong></p></li><li><p><strong>Combined with increasingly smarter models, this will be the ultimate competitive edge</strong></p></li><li><p><strong>But there is also a risk of lock-in with platforms that monopolize your context</strong></p></li><li><p><strong>Similar to how social media locks you in and prevents you from taking your friends and feed elsewhere</strong></p></li><li><p><strong>An open protocol for portable context lets you move freely between AI apps without having to start over on memory</strong></p></li></ul><hr><p>We started working on Palet with the mission to drive the adoption of open and decentralized technologies for contextualizing intelligence. The motivation to pursue this mission comes from a deep-seated concern for how the future of AI will turn out. We recognize that beyond the pursuit of smarter, faster, and cheaper models, the most significant differentiation will come from which providers can fully integrate your entire life&apos;s context into their platform. And as we’ve seen with social media, this always leads to an ecosystem where the winners dominate by locking you in and keeping you tethered. That is why we set out to develop an open protocol for building context-aware and personalized AI apps. Such a protocol guarantees that users can switch between apps while keeping their data across any service that utilizes it. And it also ensures that developers can build without being disadvantaged by monopolized context.</p><p>Among other things, we also aim to design a protocol with value streams that incentivize everyone to contribute resources, as that is the only way to ensure that we can maintain an open ecosystem that is also decentralized and durable.</p><p>Last winter, we started building our own client app along with the protocol. We haven’t yet settled on a name for the latter, but we’ve been calling the client Palet. It’s a browser that uses AI to capture everything you see, hear, and search for, and lets you easily retrieve information. We think the browser is the ideal starting point for building a great product around context, especially because so much of the information we generate and consume originates from surfing the web. Something can be said about our browsing habits too, and how they reflect personal beliefs — and perhaps how, as models get smarter, we can build personalized agents from them, incorporating your entire browsing context to form intelligence with similar beliefs. That’s the general direction we’re moving towards with Palet anyway.</p><p>But we also want to demonstrate that companies can build a business by offering services on this open protocol. Since context is stored on a separate, personal data repository synced across the network, apps that build intelligence on it benefit from each other. This means there are emergent, novel AI primitives waiting to be discovered.</p>
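<p><em>To make this less abstract, here is a purely hypothetical Python sketch of what one entry in such a personal context repository might look like. The protocol is unnamed and unfinished, so every field below is invented for illustration, not taken from a spec.</em></p><pre><code># Purely hypothetical: one entry in a user-owned context repository.
from dataclasses import dataclass, field

@dataclass
class ContextEntry:
    source: str       # e.g. "browser", "microphone", "search"
    content: str      # raw text, transcript, or page excerpt
    timestamp: float  # unix epoch seconds
    tags: list[str] = field(default_factory=list)

# Any app speaking the protocol could read or append entries, so memory
# built up in one AI app carries over to the next.
entry = ContextEntry(
    source="browser",
    content="Read an essay on the Semantic Web and RDF.",
    timestamp=1713400000.0,
    tags=["reading", "semantic-web"],
)
print(entry.source, entry.tags)
</code></pre>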
<p>Ultimately, though, our vision of an open commons for contextual intelligence is not unique and is borrowed from ideas of the Semantic Web. The biggest difference is that the vision of the Semantic Web called for manually adding special tags to pages to make them readable by machine intelligence. By contrast, a Contextual Web can draw meaning and utility from data provided by the activity of the individual user, since, as it turns out, AI (the machine) can understand things as we do. So there is no need for RDF, OWL, and other knowledge representations.</p><p>Anyway, we’ll be making our plans more transparent and sharing updates in the coming weeks. Not to mention, experimenting with different services to see what provides real value. If you’re interested in learning more or want to help out because you understand this problem space, feel free to reach out to us via Twitter, at @get_palet.</p>]]></content:encoded>
            <author>palet@newsletter.paragraph.com (Palet)</author>
            <enclosure url="https://storage.googleapis.com/papyrus_images/fef12fc5e97da4d927bdb3845bceed31e0f29d76c37464de886c7ba6fa480892.png" length="0" type="image/png"/>
        </item>
    </channel>
</rss>