<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>ASXN Labs</title>
        <link>https://paragraph.com/@asxn-labs</link>
        <description></description>
        <lastBuildDate>Tue, 14 Apr 2026 08:15:51 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>en</language>
        <image>
            <title>ASXN Labs</title>
            <url>https://storage.googleapis.com/papyrus_images/6fb05af7de2dcf8168f3101c49b0ba5d9196a3d7d19fcabe7e20f727c755268d.png</url>
            <link>https://paragraph.com/@asxn-labs</link>
        </image>
        <copyright>All rights reserved</copyright>
        <item>
            <title><![CDATA[Morpheus Token Economics]]></title>
            <link>https://paragraph.com/@asxn-labs/morpheus-token-economics</link>
            <guid>qPvUUcDWr7D7VnFa1NWY</guid>
            <pubDate>Wed, 08 May 2024 12:51:14 GMT</pubDate>
            <description><![CDATA[Introduction Morpheus is designed to foster the development of a peer-to-peer network, comprising personal general-purpose AIs capable of executing Smart Contracts on a user&apos;s behalf, termed Smart Agents. By offering open-source Smart Agents and LLMs that facilitate connections to users&apos; wallets, decentralized applications, and smart contracts, Morpheus aims to democratize access to AI and the Web3 ecosystem for all users. The architecture incorporates a user’s Web3 wallet for essen...]]></description>
            <content:encoded><![CDATA[<p><strong>Introduction</strong></p><p>Morpheus is designed to foster the development of a peer-to-peer network, comprising personal general-purpose AIs capable of executing Smart Contracts on a user&apos;s behalf, termed Smart Agents. By offering open-source Smart Agents and LLMs that facilitate connections to users&apos; wallets, decentralized applications, and smart contracts, Morpheus aims to democratize access to AI and the Web3 ecosystem for all users.</p><p>The architecture incorporates a user’s Web3 wallet for essential key management and to authenticate recommended transactions in interactions with the Smart Agent. It leverages a Large Language Model, which is trained on comprehensive Web3 data encompassing Blockchains, Wallets, Dapps, Decentralized Autonomous Organizations, and Smart Contracts, ensuring informed and secure operations within the network.</p><p>In this article, we will analyze the token economics of the Morpheus network, looking into the token’s utility functions, distribution and emissions. Toward the end we will explain the launch of the MOR token and look at some potential launch prices for MOR.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/636ac4e82034383fcf4b7f8fdabf3ad868837b452078a3e18e2a90de5223afc3.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p><strong>Token Utility</strong></p><p>The Morpheus network introduces the MOR token, which serves as a broad utility token for multiple functions within the smart agent protocol. Primarily, the MOR token acts as the base currency of the protocol, allowing users access to inference and rewarding the key shareholders in the ecosystem. 
We can analyze the utility of the token by shareholder:</p><p><strong>Users</strong></p><p>The Morpheus network, through the work provided by the various compute providers, can produce a maximum number of <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://en.wikipedia.org/wiki/Lexical_analysis#Tokenization">Lang Tokens</a> (language tokens, similar to characters of text) per day. This maximum is constrained by the compute budget of the network. Each MOR token can access a pro-rata share of this compute budget per day and thus a pro-rata share of the number of Lang Tokens (LT) generated per day. In this way the MOR token finds utility in its ability to access the underlying decentralized computational power of the network. For example, if a user holds 10% of the circulating MOR tokens, that user has the authority to access 10% of the network’s daily compute, allowing the user to generate 10% of the total output of the network that day (the output here being the number of Lang Tokens returned by the compute providers).</p><p>Users are also able to use the MOR token as currency in order to acquire access to specialized smart agents released by developers onto the smart agent protocol. They can also opt in to selling their data in order for smart agents or LLM developers to train specific models.</p><p><strong>Compute Provider</strong></p><p>Compute providers are entities running full nodes that provide access to computation resources to allow users to access inference from models that cannot be handled locally. They will typically operate Graphical Processing Units (GPUs) either on-site, in the cloud or via decentralized DePIN networks such as Akash or Render. Compute providers offer Inferences Per Second (IPS) bids through the Morpheus router. 
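</p><p>A short sketch may help make the pro-rata model concrete. The function below is purely illustrative (the names and the daily budget figure are ours, not from the Morpheus codebase), but the same proportional rule recurs for every shareholder group: users&apos; compute access, coder emissions, community and capital allocations.</p>

```python
def pro_rata_share(holder_balance: float, total_supply: float, daily_budget: float) -> float:
    """A holder's fraction of MOR maps onto the same fraction of a
    daily budget: Lang Tokens for users, emissions for coders,
    community builders and capital providers."""
    return daily_budget * holder_balance / total_supply

# A user holding 10% of circulating MOR can claim 10% of the
# network's daily Lang Token output (5B per day is an assumed figure):
lang_tokens = pro_rata_share(1_000_000, 10_000_000, 5_000_000_000)
# -> 500,000,000 Lang Tokens
```

<p>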
Essentially the compute providers allow users access to the outputs of computationally intensive models and in return they are rewarded with MOR tokens for each successful interaction with a user.</p><p>This mechanism is similar to the Proof of Work (PoW) consensus that the Bitcoin network runs on. Miners in the Bitcoin network are rewarded with a BTC block subsidy for each successful block they validate and build. Compute providers find utility in the MOR token in a couple of ways:</p><ul><li><p><strong>Qualification</strong> - To qualify as a valid compute provider for the network and receive compute requests from the router, the compute provider&apos;s public address must hold MOR tokens. This is to dissuade Sybil attacks.</p></li><li><p><strong>Payment</strong> - Operating the infrastructure to facilitate compute requests is costly. GPUs are expensive and they require a large supply of power to operate, on top of the DevOps work that goes into successfully fulfilling requests. The MOR payment for compute therefore acts as revenue for the provider and enables the Morpheus compute network to dynamically service compute requests via the free market. For example, if demand for compute is low and the supply is high, the most inefficient providers may be operating at a loss and will leave the network until an equilibrium is reached. This dynamic also incentivises the most efficient compute providers to facilitate requests, since they are able to offer the lowest bids to the router (this could be through access to the long-tail of GPUs or cheap energy sources throughout the world). Simply, the software pays for the hardware.</p></li></ul><p><strong>Coder</strong></p><p>Coders play a crucial role in the development and upkeep of the Morpheus network by contributing to the creation and maintenance of smart contracts, off-chain software, and the development of smart agents. 
Coders contribute to the Morpheus codebase and are rewarded with a pro-rata share of the coder MOR emissions (discussed later). For example, if a coder provides 10% of the total hours contributed to the codebase of Morpheus, that coder will receive 10% of the MOR emissions from the coder allocation. In this way the MOR token provides utility by incentivising open source contributors to build the software that allows the network to run smoothly, whilst also incentivising developers to create a flourishing ecosystem of smart agents for users to interact with on the protocol.</p><p>Coders are also able to use the MOR token as payment to access specific training data from users of the network in order to help train new LLMs or smart agents. For example, a coder could pay a user for their specific blockchain data in order to help train a DeFi smart agent.</p><p><strong>Community</strong></p><p>Community builders provide frontends and developer tools to the Morpheus network. Connecting to the Morpheus API, they create gateways for users to access inference. The contribution these community builders provide can be calculated as the pro-rata share of MOR transaction fees burned by each community builder. For example, if a community builder facilitates 10% of the output of the network each day and thus burns 10% of the fees, they will be entitled to 10% of the community allocation of MOR tokens for that day.</p><p><strong>Capital</strong></p><p>Capital providers contribute staked Ethereum (stETH) to the Morpheus capital contribution pool. The yield generated from this stETH enables the protocol to create a robust liquidity pool for the MOR token via the Techno Capital Model (TCM), which we will explore in a later article. Capital providers receive a pro-rata share of MOR emissions from the capital provider allocation.</p><p>The TCM enables the bootstrapping of the Morpheus network in an extremely fair way. 
The mechanism through which Morpheus bootstraps liquidity is as follows:</p><ol><li><p>Users contribute their stETH to the Capital contract.</p></li><li><p>stETH tokens rebase once a day at 12PM UTC when the oracle reports changes in ETH2 deposits and changes in ETH rewards from users. This rebase is the yield from staking.</p></li><li><p>Whilst a user&apos;s stETH is within the pool, its yield is used by the Morpheus protocol.</p></li><li><p>At a regular interval, 50% of this yield is swapped for MOR tokens via an AMM.</p></li><li><p>The other 50% of the yield (which is still in stETH) is then paired with the recently acquired MOR.</p></li><li><p>The MOR/stETH pair is then placed into the UniV2-style pool as liquidity.</p></li><li><p>Fees generated from this liquidity position, in the form of stETH and MOR, will be reinvested into the pool.</p></li><li><p>The liquidity is then permanently locked through the burning of the LP tokens.</p></li><li><p>The pro-rata MOR rewards for capital providers are calculated per block. For example, if a provider makes up 10% of the pool for a day, they will receive 10% of the capital provider MOR allocation for the day.</p></li></ol><p>Over time, the capital provider&apos;s contribution to the network should serve to increase the on-chain liquidity of the protocol&apos;s token, benefiting all shareholders in the ecosystem. As the TVL of the capital contribution pool grows, the daily amount of MOR bought back on the open market will increase. The free market will ultimately determine the TVL of this pool. For instance, should the Annual Percentage Rate (APR) for contributing stETH to this pool fall below the acceptable threshold for some participants, due to concerns over smart contract risks, they may choose to withdraw from the pool. 
This withdrawal would decrease the Total Value Locked (TVL), thereby elevating the yield, assuming all other factors remain constant.</p><p><strong>MOR Token Distribution</strong></p><p>The MOR token will be a fair-launch token, with a relatively simple distribution. Within the Morpheus ecosystem there are four key shareholders that contribute to the growth of the network and are thus rewarded; they are:</p><ul><li><p><strong>Coders</strong> - The coders in the Morpheus network are the open source developers that create the smart contracts and off-chain components that power Morpheus. They are also developers who build smart agents on top of Morpheus.</p></li><li><p><strong>Capital Providers</strong> - The capital providers in the Morpheus network are the participants who commit their staked ETH (stETH) to the capital pool for use by the network.</p></li><li><p><strong>Compute Providers</strong> - The compute providers provide agnostic compute power, mainly in the form of GPUs.</p></li><li><p><strong>Community</strong> - The community allocation in the Morpheus network refers to shareholders who create frontends in order to interact with the Morpheus network and their smart agents. 
It also encompasses any users who provide tools or do work to bring users into the ecosystem.</p></li></ul><p>The final component of the MOR token allocation is the protection fund, which serves to reimburse any victims in the event of losses through smart contract exploits or bugs.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/cadb7d48b4bc2f42720e4c724cee7ed444b27cdb8bdc8a6bc654c9ce82ea44de.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p>The MOR token has a maximum supply of 42,000,000 tokens and is emitted daily in the following proportions:</p><ul><li><p>Coders - 24%</p></li><li><p>Capital Providers - 24%</p></li><li><p>Compute Providers - 24%</p></li><li><p>Community - 24%</p></li><li><p>Protection Fund - 4%</p></li></ul><p><strong>Emissions</strong></p><p>On the 8th of February 2024 at 12pm UTC, the MOR protocol started emitting tokens. The block reward for the protocol started at 14,400 MOR per day. Each day the MOR reward declines by 2.468994701 MOR until the reward reaches 0 on day 5,833 (~16 years later).</p><p>The first day of emissions will therefore look like this:</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/8c404ec08180c0af4a1724f13835153b88fee5649d2c8e476dff9181afc75fab.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p>Like both Bitcoin and Ethereum, the MOR token is a scarce digital resource with a decelerating inflation rate. 
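</p><p>The emission schedule just described is easy to verify numerically. The sketch below assumes only the figures stated above (14,400 MOR on day one, a linear decline of 2.468994701 MOR per day, ending around day 5,833):</p>

```python
START_REWARD = 14_400        # MOR emitted on day 1
DAILY_DECLINE = 2.468994701  # linear reduction per day
FINAL_DAY = 5_833            # reward reaches ~0 (~16 years)

def daily_emission(day: int) -> float:
    """MOR emitted on a given day (1-indexed), floored at zero."""
    return max(0.0, START_REWARD - DAILY_DECLINE * (day - 1))

total = sum(daily_emission(d) for d in range(1, FINAL_DAY + 1))
bootstrap = sum(daily_emission(d) for d in range(1, 91))
# total lands within ~0.01% of the 42,000,000 max supply, and
# bootstrap matches the ~1,286,111 MOR emitted in the first 90 days
```

<p>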
However, unlike Bitcoin’s halvening, the rate at which MOR emissions decrease is linear:</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/ece87d831a8f6be23f5370dcf0494e1f5dce56d92ff9d13a7dcbedcd87f8a4a9.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p>The first 16 years of emissions will follow this schedule, until all 42 million MOR tokens have been emitted:</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/4026a2db975c83b6abb965a17e322b771675abea0208654b125994fd5da65cbd.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/6972a4771f81316806261b405447ca379054fb51d59ad5b1e780c2203e3f6c1b.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/79d21657177c3cd753cc07ba4a79c623c88ec0681645472184bb3cdd5a038d94.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" 
class="hide-figcaption"></figcaption></figure><p><strong>Burning MOR</strong></p><p>The MOR token will have some burning features implemented, similar to Ethereum’s EIP-1559 mechanism. The full details of this haven’t been released yet.</p><p><strong>Fees / Long term tokenomics</strong></p><p>Long term, the fees of the system should overtake the emissions and become the primary incentivisation mechanism of the system.</p><p><strong>Tail Emissions</strong></p><p>A key issue that is often referenced when analyzing the token economics of the Bitcoin network is the lack of a tail emission after the final block subsidy is paid some time in 2140. In this not-so-distant future, the miners validating the Bitcoin network would start to operate at a loss if the revenue from transaction fees alone does not cover electricity and running costs. The resultant effect would be a mass migration or shutdown of hashrate on the network, greatly decreasing Bitcoin’s security.</p><p>The Morpheus network seeks to add a tail emission, optimizing for a few goals:</p><ol><li><p>To enable efficient compute providers, coders, community builders and capital providers to continue to operate in the event of less than expected revenue from fees paid.</p></li><li><p>To keep the MOR tokens scarce, never surpassing the maximum supply of 42 million tokens.</p></li></ol><p>The tail emission phase will commence upon the completion of the initial token issuance period spanning 5,833 days. By this juncture, a total of 42 million MOR tokens will have been issued. However, it is important to note that the actual circulating supply of MOR will be less than 42 million due to the implementation of a token burning mechanism, which will effectively remove a portion of MOR from circulation.</p><p>The tail emission follows this 5,833-day (~16-year) epoch and applies 50% of the cumulative burned tokens in the previous epoch as the tail emission for the new 16 year period. 
However, this cumulative tail emission cannot exceed 16% of the total circulating MOR at that time. Through this approach, approximately 1% of the annual MOR rewards, relative to the circulating supply of MOR at that time, will be allocated to support future coders, compute resources, community initiatives, and capital investment in the network.</p><p>Let’s walk through a couple of examples:</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/c831bd39484f74b73ed95bccf0c52f85e0a9ee8424ab758983fd4836a55b5e93.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p>If we assume the average burn rate of MOR in the first 16 years is 25%, then 10,500,000 MOR will have been burned, meaning of the 42 million MOR emitted only 31,500,000 are currently circulating. If we take 50% of the cumulative burn amount (10.5M / 2), we get 5,250,000 MOR tokens. However, since this amount of MOR is greater than 16% of the current circulating supply of MOR (5.25M / 31.5M = 16.67%), we set the tail emission to a flat 16% of the circulating MOR. In this case, that is 5,040,000 MOR tokens, creating a daily tail emission of ~864 MOR. 
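</p><p>The worked example above can be reproduced with a few lines. This is a sketch of the stated rule only (50% of the previous epoch&apos;s cumulative burn, capped at 16% of circulating supply); the average burn rate itself is an assumption:</p>

```python
MAX_SUPPLY = 42_000_000
EPOCH_DAYS = 5_833

def tail_emission(avg_burn_rate: float) -> tuple[float, float]:
    """Return (epoch tail emission, daily tail emission) given the
    average burn rate over the previous 16-year epoch."""
    burned = MAX_SUPPLY * avg_burn_rate
    circulating = MAX_SUPPLY - burned
    # 50% of the cumulative burn, capped at 16% of circulating MOR
    epoch_total = min(burned / 2, 0.16 * circulating)
    return epoch_total, epoch_total / EPOCH_DAYS

epoch_total, per_day = tail_emission(0.25)
# the 16% cap binds: epoch_total = 5,040,000 MOR, per_day ≈ 864 MOR
```

<p>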
This process perpetually repeats, following 5,833-day epochs.</p><p>We can generalize this approach to see the daily tail emissions that would result from different average burn rates, which will depend on network usage:</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/33b42bcb3ea4bd0bc803db488a8b71367ca67b30fd0f4ae392c238af1ed19bf2.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p>The burn rate and thus the tail emissions also have a pronounced effect on the number of MOR tokens in circulation from epoch to epoch:</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/74ddcca2e61b7d031702d88c3e24e61d9cf3460dd168276e5ed69d04724899a3.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p>We can see that a high burn rate compounded over multiple epochs has a great influence on the number of MOR tokens in circulation:</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/4642ef47827bee0dae5d15d5f613204d88080d836ee32dcaddc734acacc1627a.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p><strong>Fair Launch</strong></p><p>At the core of the Morpheus network are decentralization, open innovation and the fair launch of the MOR 
token. As such, the Morpheus network is undergoing a 90 day bootstrapping phase without taking capital from VCs in a private round or completing a pre-mine. The bootstrapping phase utilizes the TCM with a one-time 90 day delay. During this initial 90 days, between Feb 8th - May 8th 2024, MOR tokens are not claimable / sendable by users, meaning these MOR rewards from code contributions, capital contributions and compute / community are accumulated.</p><p>Throughout this 90 day period the various Morpheus smart contracts calculate each shareholder&apos;s share of MOR rewards. On May 8th, once the bootstrapping period is over and the compute / router contracts are expected to be online, MOR tokens will become claimable and sendable on the Arbitrum network. The Morpheus network opted for this bootstrapping period so that once the MOR token is live to trade, there isn’t an extreme scarcity issue that will be a detriment to price discovery. If we take a look at the launch of Zcash we can see why this phase is needed:</p><p>Zcash launched on October 29th 2016 with only a small fraction of the total supply available to trade. The resultant effect was that eager buyers of this new technology entered Zcash on the launch day at wildly high fully diluted valuations, with some buyers purchasing the ZEC token for as much as $3,191 per token. It took the market almost 3 months post launch to reach an equilibrium price of ~$50 and establish rational price discovery. Early investors were left deeply underwater, and notably market analytics sites like Coingecko would show the ZEC token down ~99%, leading many to believe it had been a failed project. 
The bootstrapping period enables the initial circulating supply to fulfill the various utility functions throughout the network.</p><p>At the beginning of day 91, a total of 1,286,111 MOR (3.06% of total supply) will have been emitted:</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/0d9057c1a376c4852013134147f9b3212a4367957b6ff7b60db231f960de5295.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p>The capital provider allocation will be claimable by all users who participated and the coder allocation will be sent to each contributor&apos;s submitted address. Since the compute contracts (full node, contract and router) will not have been live during this bootstrapping phase, the compute allocation will be stored within the compute distribution contracts and will only be distributed based on actual compute provided. Likewise, the community allocation will be stored within the community distribution contracts.</p><p><strong>AMM</strong></p><p>The protection fund will have accumulated 50,482 MOR tokens during this bootstrapping phase and in order to submit a transaction to create the liquidity pool for MOR, the network will use these MOR tokens as a one-time event. 
These 50,482 MOR tokens will be paired with all of the accumulated stETH from the capital contribution pool.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/4428f15ca3e30dae2b4f9e521fb98a92c7849bab5fe6263fe37a51428f7143ae.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p>By analyzing the number of stETH within the capital contribution contract, we can get a rough approximation of the number of stETH the initial MOR tokens will be paired with in the AMM. We can therefore work out the launch price of MOR.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/27e58bce72ba4c0e79adce94a533cf323f5acd8b782b1d47e4e780b2914666cf.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p>To work out the launch price of the MOR token we have to make a few assumptions / guess a few variables:</p><ol><li><p>The average number of stETH within the capital contributor pool for the 90 day duration. This allows us to work out the stETH yield that will be paired with MOR at the token launch.</p></li><li><p>The ETH price. This will affect the dollar denominated price of MOR.</p></li><li><p>The stETH APY. 
The yield on stETH can change due to a variety of factors; however, the most common deviation in rewards is due to variance in the base fee and MEV rewards.</p></li></ol><p>With the assumptions made in the table above, we can see that when the Morpheus network bootstraps the MOR LP, they will pair ~874 wETH with 50,482 MOR from the protection fund. The liquidity will be a combination of full-range and concentrated positions, with:</p><ul><li><p>52% of the wETH and the 50,482 MOR being placed as full-range liquidity.</p></li><li><p>48% of the wETH being placed as one-sided liquidity from a price of 0 up to the initial listing price.</p></li></ul><p>The results are the following:</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/d3cb0a4223e415e5aef803a63c46b714892faba64dbada8e08a3f6b9cf2ce846.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p>In this scenario the initial launch price of MOR will be ~$27 and the MOR LP would be just under $4 million and growing each day as capital contributor yield is LP’d. 
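</p><p>The launch-price arithmetic above can be reproduced directly. The sketch below uses the article&apos;s figures (~874 wETH, 50,482 MOR, a 52/48 liquidity split) plus an assumed ETH price of $3,000, consistent with the figures used in the liquidity pool section:</p>

```python
WETH_PAIRED = 874      # approximate wETH accumulated as 90-day yield
MOR_PAIRED = 50_482    # MOR from the protection fund
ETH_PRICE = 3_000      # assumed ETH/USD price
FULL_RANGE = 0.52      # share of wETH placed as full-range liquidity

# The full-range leg sets the spot price: wETH value / MOR amount
full_range_weth = FULL_RANGE * WETH_PAIRED
launch_price = full_range_weth * ETH_PRICE / MOR_PAIRED      # ≈ $27

# Total LP value: the full-range leg is two-sided (the MOR side,
# valued at spot, doubles it); the other 48% sits as one-sided wETH
lp_value = (full_range_weth * ETH_PRICE * 2
            + (1 - FULL_RANGE) * WETH_PAIRED * ETH_PRICE)    # ≈ $3.99M
```

<p>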
We can sensitize the two main variables in this analysis (ETH price &amp; capital contributor TVL) to work out the varying prices at which MOR may launch:</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/2339769572f50e91e1ba487a38dd84c0f5b7c56919a48658ffd1f4f62bbb2116.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p><strong>Liquidity Pool</strong></p><p>The TCM provides a small, consistent buyback of the MOR token before adding this MOR and stETH to the liquidity pool. As such, the LP is set to slowly grow over time. In the example below, we assume a steady state TVL in the capital contribution contract of 150,000 stETH (allowing for a decrease in TVL post the initial bootstrapping period), an stETH APY of 3.5% and an stETH price of $3,000. 
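</p><p>Under those assumptions, the daily yield and its TCM split work out as follows (a sketch; the TVL and APY are the assumed inputs above, not measured values):</p>

```python
TVL_STETH = 150_000  # assumed steady-state capital pool (stETH)
STETH_APY = 0.035    # assumed staking yield

daily_yield = TVL_STETH * STETH_APY / 365  # ≈ 14.38 stETH per day
mor_buyback_steth = daily_yield / 2        # swapped for MOR via the AMM
lp_steth = daily_yield / 2                 # paired with the bought MOR
```

<p>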
In this scenario, ~14.38 stETH are earned each day as yield, with half of this buying back the MOR token.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/f63919bf5f13be7e4be21ca89eda3ea92960866c5a88c45153ac26163c2c8908.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/0c175e6c373ef9c6b44146cab4da7db5afe598002324a06ce6f3ebc696fc1d32.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p>The TCM model creates a way to fair launch projects without pre-sales, LBPs or pre-mines and enables on-chain liquidity to grow alongside the project. TCM has been widely successful and represents a viable way to fair-launch projects, opposing the supply-concentrated, VC funded projects we have seen of late.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/aa7bc90c224e7db5118f8c574e7ed3cdac7664fd0da6becc7bf13a9295695d61.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure>]]></content:encoded>
            <author>asxn-labs@newsletter.paragraph.com (ASXN Labs)</author>
            <enclosure url="https://storage.googleapis.com/papyrus_images/0737d6db1c47c8d4f67e6f16f82ee17bc9a22500a6ce618d77e5d77d895c6dd6.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Morpheus Yellowstone Compute Model]]></title>
            <link>https://paragraph.com/@asxn-labs/morpheus-yellowstone-compute-model</link>
            <guid>63YXASXF4MDV4H3XkG5X</guid>
            <pubDate>Wed, 10 Apr 2024 20:20:28 GMT</pubDate>
            <description><![CDATA[Introduction Morpheus aims to enhance the development of a peer-to-peer network that includes personal, general-purpose AIs known as Smart Agents. These agents are capable of executing Smart Contracts on behalf of users. Through providing open-source Smart Agents and large language models (LLMs) that enable connections to users&apos; wallets, decentralized applications, and smart contracts, Morpheus seeks to make access to AI and the crypto ecosystem universally accessible.Source: Morpheus Gi...]]></description>
            <content:encoded><![CDATA[<p><strong>Introduction</strong></p><p>Morpheus aims to enhance the development of a peer-to-peer network that includes personal, general-purpose AIs known as Smart Agents. These agents are capable of executing Smart Contracts on behalf of users. Through providing open-source Smart Agents and large language models (LLMs) that enable connections to users&apos; wallets, decentralized applications, and smart contracts, Morpheus seeks to make access to AI and the crypto ecosystem universally accessible.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/9fec4818e97e6ee5bb005f13fa27edb8208a759e2c89897f430ba9a6c611c68b.png" alt="Source: Morpheus Github" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Morpheus Github</figcaption></figure><p>In the Morpheus ecosystem, four principal groups of stakeholders contribute to and benefit from the network&apos;s expansion:</p><ol><li><p><strong>Coders</strong> - These are the open-source developers within the Morpheus network who design the smart contracts and off-chain components that fuel Morpheus. This group also includes developers who construct smart agents leveraging Morpheus.</p></li><li><p><strong>Capital Providers</strong> - These stakeholders in the Morpheus network are the ones who allocate their staked ETH (stETH) to the network&apos;s capital pool for utilization.</p></li><li><p><strong>Compute Providers</strong> - These contributors offer neutral computing resources, predominantly through GPUs, to support the network&apos;s operations.</p></li><li><p><strong>Community</strong> - This segment includes stakeholders who develop interfaces to engage with the Morpheus network and its smart agents. 
It also covers any individuals contributing tools or efforts to attract new users to the ecosystem.</p></li></ol><p>In this article, we delve into the mechanics of compute provision within the Morpheus network, alongside an analysis of Erik Voorhees&apos; &apos;Yellowstone Compute&apos; model, which serves as the foundational infrastructure for Morpheus.</p><p><strong>Compute Provision</strong></p><p>First, let’s understand the role of a compute provider in the Morpheus network. Compute providers are entities running full nodes that provide computation resources, allowing users to access inference from models that cannot be handled locally. They will typically operate Graphics Processing Units (GPUs) either on-site, in the cloud or via DePINs such as Akash, Render or IO Net. Essentially, compute providers give users access to the outputs of computationally intensive models, and in return they are rewarded with MOR tokens for each successful interaction with a user.</p><p>This mechanism is similar to the Proof of Work (PoW) consensus that the Bitcoin network runs on. Miners in the Bitcoin network are rewarded with a BTC block subsidy for each successful block they validate and build. Compute providers find utility in the MOR token in a couple of ways:</p><ol><li><p><strong>Qualification</strong> - To qualify as a valid compute provider for the network and receive compute requests from the router, the compute provider&apos;s public address must hold MOR tokens. This is to dissuade Sybil attacks.</p></li><li><p><strong>Payment</strong> - Operating the infrastructure to facilitate compute requests is costly. GPUs are expensive and require a large supply of power to operate, on top of the DevOps work that goes into successfully fulfilling requests. The MOR payment for compute therefore acts as revenue for the provider and enables the Morpheus compute network to dynamically service compute requests via the free market. 
For example, if demand for compute is low and the supply is high, the most inefficient providers may be operating at a loss and will leave the network until an equilibrium is reached. This dynamic also incentivises the most efficient compute providers to facilitate requests, since they are able to offer the lowest bids to the router (this could be through access to the long-tail of GPUs or cheap energy sources throughout the world). Simply, the software pays for the hardware.</p></li></ol><p><strong>White Paper Solution</strong></p><p>The original Morpheus whitepaper, published on the 2nd of September 2022, described the compute provision mechanism as follows:</p><p><em>“The pro-rata MOR transaction fees burned by each Compute Provider serves as proof of the Compute Providers status and earns a proportion of the MOR tokens each day.</em></p><p><em>For example, if there are 100 Compute Providers on day 1 when the network launches, then each one gets a pro-rata reward based on the amount of MOR they have burned via fees. In this case, presuming each of the 100 compute providers burned 100 MOR, then 1% of the 3,456 MOR tokens each day = 34.56 MOR.”</em></p><p>In this case, a user would pay for inference with a blockchain transaction each time they utilized the Morpheus compute network, and the routing of compute requests would be decided based on the quantity of MOR the compute provider&apos;s public address held. Under this model, we encounter four major issues:</p><p><strong>User Costs</strong> - Each time a user accesses inference, they have to pay a fee in MOR via a blockchain transaction. Even after the Dencun upgrade, the gas costs associated with accessing inference quickly become uneconomical, since each inference itself costs very little (the gas cost could exceed the inference cost).</p><p><strong>UX</strong> - Requiring users to complete a blockchain transaction for each inference access results in a poor user experience. 
Unlike OpenAI or Google, where inference is quick and seamless without the need to sign transactions, this requirement could render Morpheus a less competitive option.</p><p><strong>Game-ability</strong> -  Under this model, all MOR compute emissions would be distributed pro-rata, making the system highly vulnerable to exploitation due to the significant gap between expected revenue for compute providers and the actual costs of computing. An adversary could potentially spam/sybil their own Compute Provider node with inference requests, thereby earning a substantial amount of MOR tokens daily without delivering any real economic value. This situation could result in an abundance of early compute resources being unused, which would likely vanish as the exaggerated revenue prospects decline. Consequently, the MOR tokens allocated for this early incentive would be effectively squandered.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/87d4f3c2b232e850dc862f596ec3a89a5a793c40ba8982cc0c23b636e621bff3.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p><strong>Performance</strong> - In this model, the routing and matching of a compute provider and a user is based on the quantity of MOR the compute provider holds, meaning the routing is agnostic to performance. Under this model, determining priority in inference handling misses the mark on critical performance indicators like response speed and the efficiency of inference processing. 
Optimizing for these aspects — minimizing both the time it takes to respond and the cost associated with computation — should be the network&apos;s primary goal.</p><p><strong>Yellowstone Objectives</strong></p><p>Recognizing the shortcomings of the initial setup, Erik Voorhees <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://github.com/MorpheusAIs/Docs/blob/main/!KEYDOCS%20README%20FIRST!/Yellowstone%20Compute%20Model.md">released</a> the “Morpheus ‘Yellowstone’ Compute Model” paper on January 3, 2024. This document aims to more closely align the compute provision mechanism with the network&apos;s overarching objectives. Those objectives are:</p><ol><li><p><strong>Align Incentives</strong> - Create fundamental economic demand for MOR in a manner that&apos;s sound and sustainable.</p></li><li><p><strong>Improve UX</strong> - Allow users the convenience of accessing compute without the need for payment per inference, aiming for a model where they ideally incur no costs in order to compete with their centralized counterparts.</p></li><li><p><strong>Improve Game Theory</strong> - Ensure the provision of permissionless compute resources is efficient, scalable, and sustainable, without leading to excessive compensation.</p></li><li><p><strong>Improve Performance</strong> - Encourage free-market competition among compute providers with incentives for lower response times and reduced costs.</p></li><li><p><strong>Reduce Costs</strong> - Minimize the total number of blockchain transactions needed, thereby reducing the associated gas fees and facilitating highly scalable compute requests.</p></li></ol><p><strong>Yellowstone Mechanics</strong></p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/e64a3fcd48f832e1e625f68c4b6fcffd322bf3dd7183ad2e562f6fe44e0a89b1.png" alt="" 
blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p>First, we must define each component of the system:</p><ol><li><p><strong>Users</strong> -  refer to any entities possessing a MOR address who submit requests to the Router for computational services. This group may include individual persons utilizing a Morpheus desktop node, automated bots, or corporations and third-party websites engaging with the Morpheus network on behalf of their clients. It is crucial to note that these third-party &quot;end-users&quot; are not considered Users in the context of the Morpheus network.</p></li><li><p><strong>Compute Providers</strong> - are entities operating a node that supplies computational resources, holds a MOR address, and submits IPS (Inferences Per Second) bids to the Router. Upon winning a bid, a Provider offers the computational resources (such as GPUs) necessary for executing the AI model requested by the User.</p></li><li><p><strong>Router</strong> - is a software application equipped with a MOR address that orchestrates the market interactions between Users and Providers. It is responsible for registering Providers&apos; addresses and bids, processing requests from Users, recording the execution times and outcomes of these requests, and directing the Compute Contract to compensate qualified Providers with MOR payments. 
The Router neither initiates nor receives any MOR transactions or transactions on any other blockchain and does not access the content of requests or their responses.</p></li><li><p><strong>Compute Contract</strong> - is a smart contract with a MOR address, tasked with collecting all MOR designated for the Compute allocation, keeping track of amounts due to qualified Providers, and disbursing MOR payments to Providers upon their request.</p></li></ol><p>“Inferences per Second” (IPS) is the basic measurement unit for AI inferences, serving as a benchmark for rates within the Yellowstone router. The value of a single Morpheus AI inference is therefore denominated in this unit.</p><p>“IPSMax” indicates the Router&apos;s upper limit for IPS that can be compensated.</p><p>The Yellowstone compute model works as follows:</p><ol><li><p>Users, compute providers, and the router all create MOR public/private key pairs.</p></li><li><p>If a user holds any balance of MOR, they may submit a signed Request for Compute “RFC” message to the Router. The user specifies [LLM] and [IPSMax].</p></li><li><p>The router prioritizes RFCs based on a user’s MOR balance, which helps to solve the Sybil issue.</p></li><li><p>The router selects a compute provider that supports the [LLM], prioritized based on lowest Bid per IPS in MOR (cheapest cost of inference).</p></li><li><p>The router then sends a liveness check to the compute provider. 
If the check passes, then:</p></li><li><p>The router connects the user to the Provider over TCP/IP.</p></li><li><p>The user sends a query to the provider using the notation ‘[LLM],[prompt]’.</p></li><li><p>The compute provider computes the query and sends the result to the user.</p></li><li><p>The user reports success metrics to the router (such as IPS received, time taken, or a pass/fail vote).</p></li><li><p>The router instructs the compute contract to credit the compute provider with MOR if the job was completed satisfactorily.</p></li><li><p>Some time later, the compute provider requests payment of MOR from the compute contract, and the compute contract sends the MOR payment if valid (the first blockchain TX so far, which can be batched).</p></li></ol><p><strong>Yellowstone Outcomes</strong></p><p>The Yellowstone compute model improves on the original model and achieves its stated goals in a number of ways. Let’s analyze the outcomes of this compute model:</p><ol><li><p><strong>User Experience</strong> - The user receives a fast result for their query and doesn’t pay anything (they only hold MOR). This leads to superior UX and thus should improve adoption.</p></li><li><p><strong>Performance</strong> - The compute contract pays for access to compute through a competitive bidding process (lowest bid per IPS), with a quality check to ensure the result is satisfactory. This competitive, free-market process will drive the cost of inference toward the base price of electricity. This way, the most performant compute providers will win, either through superior management or cheaper access to compute/electricity.</p></li><li><p><strong>Costs</strong> - The costs of inference are driven down via free-market mechanisms, but also through the usage of offchain systems and limiting the number of blockchain transactions that must be completed. 
In this model, the only blockchain transactions that must be completed are the compensation of compute providers, which can be batched together and completed at set intervals. Users can freely access compute without having to pay exorbitant gas fees.</p></li><li><p><strong>Improved Game Theory</strong> - In the Yellowstone model, a compute provider is only compensated with MOR when they have successfully provided compute. In the original model, the compute provider emissions would be fully distributed pro-rata, enabling users to sybil/spam their own hardware with requests to acquire MOR cheaply. The somewhat random selection of providers in the matching process, coupled with providers only being paid for successful outputs, helps reduce this issue.</p></li><li><p><strong>Privacy</strong> - The offchain router provides reasonable privacy guarantees. The query never touches the router and neither does the result. Providers are selected somewhat randomly and only know the IP address of the user.</p></li></ol><p><strong>The Compute Budget</strong></p><p>As discussed, the compute contract is a smart contract which receives all of the MOR emissions allocated to the Compute bucket. It tracks the amounts owed to eligible compute providers and facilitates MOR payments to these providers upon their payment request.</p><p>The Morpheus Network establishes a daily &apos;compute budget,&apos; reflecting the maximum amount of MOR the network is prepared to allocate for computing services each day (starting at 3456 MOR per day). Consequently, the compute contract is authorized to disburse up to this MOR limit in compensating compute providers for their contributions. The product of this allocated MOR amount and its prevailing market price determines the network&apos;s daily budget in dollars for securing compute services. This way, the contract will be solvent so long as MOR paid &lt; MOR earned per period from emissions. 
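</p><p>This solvency logic can be sketched in a few lines. The 3,456 MOR daily emission figure comes from this article; the 1% daily payout fraction and the 90-day horizon are illustrative assumptions, not the production contract parameters:</p>

```python
# Illustrative sketch only, not the actual Morpheus contract logic.
# DAILY_EMISSION comes from the article; BUDGET_PCT and the horizon are assumed.

DAILY_EMISSION = 3456.0  # MOR flowing into the compute contract each day
BUDGET_PCT = 0.01        # daily budget as a fixed fraction of the prior-day balance

def simulate(days, start_balance=0.0):
    """Worst-case contract balance, assuming the full budget is paid out daily."""
    balance = start_balance
    for _ in range(days):
        budget = balance * BUDGET_PCT  # max MOR payable to providers today
        balance = balance - budget + DAILY_EMISSION
    return balance

# Each day's payout is a fraction of the existing balance, so the
# balance can shrink toward zero asymptotically but never go negative.
print(round(simulate(90), 2))
```

<p>Because the payout is always a fixed fraction of whatever the contract holds, the balance converges toward an equilibrium instead of being drained, which is the solvency property described above.</p><p>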
In reality, this budget will be a fixed percentage of the compute contract&apos;s MOR balance at the end of the prior day, ensuring that the contract never runs out of MOR as the balance decays asymptotically.</p><p>For example, if we take 50% of the compute emissions as the ‘compute budget’, we get the following emissions schedule for compute providers, assuming the compute started when the capital contribution contract went live and the entire budget is distributed each day.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/f9862e811274ad867c614d8a78e6c4fac01995f2f14a16787356e8cbcc8c0f5b.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p>However, since we have a 90-day bootstrapping period for the network, at launch there will be 312,069 MOR in the compute contract, meaning the compute budget will be higher in MOR terms. The compute budget is relevant since, fundamentally, the Morpheus network manages the limited resource of IPS production. There is therefore a maximum amount of inference that can be accessed per day. Via this &quot;AccessRate&quot; mechanism, inference is provided to users, with this rate essentially specifying the daily access to IPS that each MOR token provides. Taking an example:</p><p>AccessRate is presented as the amount of IPS accessible with 1 MOR token (e.g., 1 MOR = 15,000 IPS). 
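</p><p>As a minimal sketch of how this access mechanism can be computed (the relationships follow the model described in this article, while the input values below are hypothetical, not the launch figures):</p>

```python
# Minimal sketch of the AccessRate mechanism; all input values are hypothetical.

def max_ips(compute_budget_mor, mor_price_usd, ips_price_usd_per_1000):
    """Maximum IPS the network can buy per day."""
    return (compute_budget_mor * mor_price_usd) / ips_price_usd_per_1000 * 1000

def access_rate(mor_supply, daily_max_ips):
    """Daily IPS that each MOR token grants access to."""
    return daily_max_ips / mor_supply

def user_max(rate, user_balance_mor):
    """Daily IPS ceiling for a user holding a given MOR balance."""
    return rate * user_balance_mor

# Hypothetical inputs: 1,000 MOR daily budget, $10 per MOR,
# $0.0025 per 1,000 IPS, 1,000,000 MOR supply, a user holding 10 MOR.
cap = max_ips(1_000, 10, 0.0025)
rate = access_rate(1_000_000, cap)
print(cap, rate, user_max(rate, 10))
```

<p>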
It is influenced by MaxIPS, which represents the network&apos;s maximum daily purchasing capacity for IPS.</p><blockquote><p><strong>MaxIPS</strong> = <code>((MOR Compute Budget * MOR Price) / IPS Price) * 1000</code></p><p><strong>AccessRate</strong> = <code>(1/MOR Supply) * MaxIPS</code></p><p><strong>UserMax</strong> = <code>AccessRate * User&apos;s MOR balance</code></p></blockquote><p><strong>Example at Launch:</strong></p><p>MOR Supply = 1,300,289 MOR tokens</p><p>Prior Day Compute Contract Balance = 312,069</p><p>MOR Compute Budget = 3,120.69 MOR tokens per day (1% of above)</p><p>MOR Price = $50</p><p>IPS Price = $0.0025 per 1000 IPS</p><p>User Balance = 10 MOR tokens</p><p><strong>Example results:</strong></p><p><strong>MaxIPS</strong> = <code>[(3,120.69 * $50) / $0.0025] * 1000</code> = 62,413,800,000 IPS (this is the maximum IPS the network can buy/produce each day)</p><p><strong>AccessRate</strong> = <code>(1/1,300,289) * 62,413,800,000</code> ≈ 48,000 (thus each MOR token grants access to roughly 48,000 IPS per day)</p><p><strong>UserMax</strong> ≈ 480,000 (a user with 10 MOR tokens can access up to roughly 480,000 IPS per day)</p><p><strong>Nirmaan</strong></p><p>We are gearing up to deploy compute at scale in order to advance smart agent proliferation on the Morpheus network once the compute and router contracts go live in May or June.</p><p>Nirmaan aims to democratize access to compute provision. 
If you are interested in running compute on Morpheus, Nirmaan will be providing a white-glove solution, aggregating compute from the most cost-effective venues in web2 &amp; web3 and expertly managing the DevOps required to compete on the network.</p><p>If this is of interest to you, please reach out to <code>@oxnirmata</code> on Telegram or email <code>hello@nirmaan.ai</code> for further details.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/f67eb40e11196419967556f8f6037e755110289dc11c659ab640c16bd15fef54.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure>]]></content:encoded>
            <author>asxn-labs@newsletter.paragraph.com (ASXN Labs)</author>
            <enclosure url="https://storage.googleapis.com/papyrus_images/a4619dd143ea4cd6172c3413314255db3ee68cf949691164efdcda5c6248e080.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Running Heurist on Io Net]]></title>
            <link>https://paragraph.com/@asxn-labs/running-heurist-on-io-net</link>
            <guid>LLVr6WD11OtDMirjiyD7</guid>
            <pubDate>Thu, 04 Apr 2024 16:33:03 GMT</pubDate>
            <description><![CDATA[In the computing landscape, a significant shift towards decentralization is taking place, moving away from traditional centralized models. This evolution is underscored by innovations such as the Io Net, driven by a combination of factors that spotlight the limitations of legacy Web2 infrastructures. Among the primary concerns are the inefficiencies and bottlenecks caused by centralization, which compromise adaptability to the fast-paced demands of modern computing. Furthermore, the scarcity ...]]></description>
<content:encoded><![CDATA[<p>In the computing landscape, a significant shift towards decentralization is taking place, moving away from traditional centralized models. This evolution is underscored by innovations such as Io Net, driven by a combination of factors that spotlight the limitations of legacy Web2 infrastructures. Among the primary concerns are the inefficiencies and bottlenecks caused by centralization, which compromise adaptability to the fast-paced demands of modern computing. Furthermore, the scarcity of high-performance GPUs obstructs the execution of compute-intensive tasks critical for advancements in machine learning and artificial intelligence. Compounding these issues are the significant risks to data privacy inherent in centralized systems, which raise alarms about the security of sensitive information. The opacity of pricing models adds another layer of complexity, making it challenging for users to estimate costs accurately. Additionally, the inflexible and unclear financial policies of these platforms often hinder users&apos; ability to manage their funds effectively.</p><p>At Nirmata Labs, our firsthand experiences with Web2 service providers have illuminated several specific challenges. The process to access particular GPU models is fraught with justification requirements and a prolonged approval process, constraining our agility and responsiveness to project demands. Moreover, the limited availability of top-tier GPU models, such as Nvidia&apos;s H100, A100, or RTX A6000, essential for cutting-edge ML and AI projects, restricts AI labs&apos; technological capabilities. The cumbersome process for withdrawing funds, necessitating direct interaction with providers and enduring waiting periods of 5-7 days for bank processing, further exemplifies the operational inefficiencies we face. 
Additionally, the limitations on customization options, with only a few providers offering essential tools like Docker, Nvidia drivers, or the CUDA toolkit pre-installed, and the scarcity of options to preload machine learning models, stifle our capacity for innovation and experimentation.</p><p>In contrast, io.net stands out for its full customizability, allowing users to tailor their instances precisely to their needs. It offers seamless deployment options and includes Nvidia drivers, the CUDA toolkit and Anaconda, providing a smoother and more efficient experience for developers and users alike, while offering full decentralization, absolute liquidity, instant withdrawals and no approvals to rent any of their machines. In this article, we will take a look at how we can use io.net to set up a miner for Heurist.</p><p>Hardware and Software Requirements for running a Stable Diffusion Miner for Heurist:</p><blockquote><ol><li><p>Nvidia cards with at least 12 GB of VRAM</p></li><li><p>CUDA Toolkit 12.1 or 12.2</p></li><li><p>Nvidia GPU Drivers</p></li><li><p>Anaconda3 or Miniconda3</p></li></ol></blockquote><p><strong>Step 1: Connecting to the Console:</strong></p><ul><li><p>Visit <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="http://io.net">io.net</a> and sign up for an account.</p></li><li><p>You can add USDC to your balance on <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="http://io.net">io.net</a> by clicking on the ‘Reload with Solana Pay’ button at the top left-hand corner. This will enable us to pay for our compute needs. 
You can also choose to pay at a later stage in the cluster deployment setup.</p></li><li><p>Navigate to the IO Net Cloud section:</p></li></ul><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/18f29ec771726b4bb6ad0ce84c2a4886773d30180783196ea046d61a51f35f8b.png" alt="Select IO Net Cloud" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Select IO Net Cloud</figcaption></figure><ul><li><p>Click on the <code>Deploy</code> button next to the “Ray” option, which is the first one here:</p></li></ul><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/fa3de4e5e9ea385267f0a31645021140b7d54f9d0e5a75978d48bb9ba0e34593.png" alt="Deploy Ray Cluster" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Deploy Ray Cluster</figcaption></figure><ul><li><p>Once we are on the <code>Create New Cluster</code> page, we can select the “General” option under Cluster Types. If deploying an LLM miner, you can use the Inference type as well to handle heavy workloads and produce low-latency inferences, but for this tutorial, we’ll just use the <code>General</code> option. Next, scroll down and select the supplier; we’ll use io.net. 
Here is what the setup screen looks like:</p></li></ul><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/b3abab84a628ca3097fb1e62c3912ba69022af4c97192af630803f3a1365b324.png" alt="Creating a Cluster" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Creating a Cluster</figcaption></figure><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/120227f8e3c699975e73d757fdac1c9628dea6f8b9c5fa83fb42ba3330b125a6.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><ul><li><p>Scroll down until you see the <code>Select Your Cluster Processor</code> option. IO Net lists a multitude of GPUs available for leasing as a cluster. 
The GPUs provided by IO Net workers range from low-end general-purpose GPUs to high-end, cutting-edge AI &amp; ML GPUs like A100s and H100s, which are very scarce in supply on the majority of centralized providers:</p></li></ul><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/26391bc67067fe62c87250c832bd28b5952194583e05538f6dc7090c0962d33d.png" alt="Select Your Cluster Processor" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Select Your Cluster Processor</figcaption></figure><ul><li><p>In this example, we have selected an RTX A4000 with 16 GB of VRAM to run the Stable Diffusion Miner for Heurist, and after scrolling down we will select the location as the United States. For the connectivity tier we have selected ultra high speed:</p></li></ul><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/beb992d718b60e06ca510143a75432d09fb5ab97cc9871b29bdd2647bc6e3742.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/b57aff2fa4430c3d1e68f8014e6f365e279d2e48d9e5fc8b9c87e6f8ac2415a6.png" alt="Select Location" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Select Location</figcaption></figure><figure float="none" data-type="figure" class="img-center" style="max-width: 
null;"><img src="https://storage.googleapis.com/papyrus_images/49a6d60ee298b0312924dc3d8ac2d9e5f39a1264c85df0951349c74d93a004d9.png" alt="Select Connectivity Tier" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Select Connectivity Tier</figcaption></figure><ul><li><p>After all the appropriate options have been selected, this is what the final summary before deployment will look like:</p></li></ul><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/f533d7b737ddeedce2a61ae24717f2a67be11eaf238be32d30495c359e177a29.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><ul><li><p>Re-check your selections and click on the <code>Deploy</code> button here:</p></li></ul><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/b27daf71c8118ce7b42a9c6b79179a1bf4a13872ba3c802b1ea81a1822804bce.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><ul><li><p>The deployment screen will look like this while it&apos;s processing your payment and getting ready for deployment:</p></li></ul><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/5ad276a18cdf8fc4da4a9b21d6be4ae5f37b8dbfcaed055ddf3fc2bbbbac7eb1.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" 
nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><ul><li><p>After deployment, your IO Cloud dashboard will show this:</p></li></ul><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/b1a6badc9216d9ee005c27de85e5efb1987947e021e4cd29f94b4b135f006da4.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><ul><li><p>We have now successfully deployed an 8 x Nvidia RTX A4000 cluster!</p></li></ul><p><strong>Step 2: Initializing the Deployment</strong></p><ol><li><p>After the deployment is successful, click on the Clusters tab and select the instance.</p></li><li><p>After scrolling down on your cluster’s page, on the bottom right you will see this:</p></li></ol><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/b36f49e32aac6f5ec80f2e8709970fc948fa4c439d2e86caca3ed854680763dc.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><ul><li><p>We can either use Visual Studio or Jupyter Notebook to access the cluster’s head node and its terminal.</p></li><li><p>For this tutorial, we’ll use Visual Studio through the IO Net cluster by clicking on the Visual Studio button. We recommend using Jupyter Notebook if you wish to use Miniconda instead of Anaconda.</p></li><li><p>The password for your dev environment will be provided by IO Net under the dev environment tab. 
After the Visual Studio set up is complete, this is what you will see:</p></li></ul><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/15de39610f19558f78d2c256da4182abcd1079b68a317304106179ccb53e855a.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><ul><li><p>Press <code>CTRL + SHIFT + `</code> in order to open up a terminal for our head node.</p></li><li><p>Once the terminal loads up, we can verify that the NVIDIA drivers and the CUDA toolkit are pre-installed by running this command in the shell:</p></li></ul><p><code>nvidia-smi</code></p><ul><li><p>A Successful installation will look like this:</p></li></ul><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/e269bbd687574c236ef66c7ededd762490799985d3fabe793b9db362c2cb5b6e.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><ul><li><p>Displayed is our configuration featuring an RTX A4000 equipped with 16GB of VRAM and the CUDA Toolkit version 12.2, in addition to the installed NVIDIA drivers. This setup fulfils the prerequisites for operating a Heurist miner. 
With these specifications in place, we are now positioned to proceed with the installation of Anaconda or Miniconda.</p></li></ul><p><strong>Step 3: Installing Dependencies:</strong></p><ol><li><p>If not already logged in, log in as the root user by entering the command:</p><p><code>su -</code></p></li><li><p>Update and upgrade Linux packages and dependencies:</p><p><code>apt-get update</code></p><p><code>apt-get upgrade</code></p></li><li><p>Install wget, tmux and Neovim:</p><p><code>apt-get install wget</code></p><p><code>apt-get install neovim</code></p><p><code>apt-get install tmux</code></p></li></ol><blockquote><p><strong><em>Important Note</em></strong>: IO Net clusters usually come with Anaconda3 pre-installed. You can check whether it is pre-installed by typing <code>conda list</code> or <code>conda --version</code>. If Anaconda3 is pre-installed, you can skip to Step 4, where we create the environment.</p></blockquote><p>If Anaconda3 is not installed, install Miniconda using the following steps (<em>only if you don’t have Anaconda installed</em>):</p><ol><li><p>Create a directory for Miniconda3:</p><p><code>mkdir -p ~/miniconda3</code></p></li><li><p>Download the latest Miniconda installation script:</p><p><code>wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh</code></p></li><li><p>Run the installation script:</p><p><code>bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3</code></p><p>After the installation is finished, it will look like this:</p></li></ol><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/626ee28298d2d673f193c783a69a053ab86f1386ac8aa99688689e19a56e2d76.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" 
class="hide-figcaption"></figcaption></figure><ul><li><p>Delete the installation script:</p><p><code>rm -rf ~/miniconda3/miniconda.sh</code></p></li><li><p>Initialize conda for your bash shell:</p><p><code>~/miniconda3/bin/conda init bash</code></p></li><li><p>Exit the shell; we have to reconnect to it for the conda initialization to take effect:</p><p><code>exit</code></p></li><li><p>After exiting, wait 5 seconds and re-open the terminal by pressing:</p><p><code>CTRL + SHIFT + `</code></p></li></ul><p>Verifying the installation and creating the conda environment:</p><ul><li><p>After restarting the shell, run:</p><p><code>conda list</code></p></li><li><p>If Miniconda has been installed successfully, you will see something like this after running <code>conda list</code>:</p></li></ul><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/0154510d86070437d179c2b26fd0f3eb78b062c8ba09674761d25b74a38449ca.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p><strong>Step 4. 
Creating the Environment:</strong></p><ol><li><p>Create a new tmux session, in which we will create and activate the environment and set up the Heurist miner, by first running:</p><p><code>tmux new -s heurist</code></p></li><li><p>To establish our environment, execute the following command and allow some time for the downloading and extraction of the necessary packages:</p><p><code>conda create --name gpu-3-11 python=3.11</code></p></li><li><p>Activate the environment once inside the tmux session by running:</p><p><code>conda activate gpu-3-11</code></p></li><li><p>Since we already have nvidia-smi working and the CUDA toolkit installed, we can go straight ahead and install the conda environment dependencies we need to run the miner (this will take ~10 minutes to install):</p><p><code>conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia</code></p></li></ol><p>Command taken from: <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://pytorch.org/get-started/locally/">https://pytorch.org/get-started/locally/</a></p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/917c29baff2db1da1ac44a64b332036cef8bc02e0eaf37f6254f56753c120300.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p>When you see:</p><p><code>Preparing transaction: done</code></p><p><code>Verifying transaction: done</code></p><p><code>Executing transaction: done</code></p><p>It means PyTorch and the other dependencies were installed correctly. We can now move on to the next step.</p><p><strong>Step 5. 
Cloning the miner repository to access the miner:</strong></p><ul><li><p>Clone the official repository by running:</p><p><code>git clone https://github.com/heurist-network/miner-release</code></p></li><li><p>Enter the miner-release directory:</p><p><code>cd miner-release</code></p></li><li><p>Install the Python dependencies needed to run the miner:</p><p><code>pip install python-dotenv</code></p><p><code>pip install -r requirements.txt</code></p></li><li><p>After installing all the dependencies, configure your Miner ID to receive rewards by creating a <code>.env</code> file using an editor of your choice.</p></li><li><p>For this tutorial we have chosen to use Neovim as our file editor, but you are free to use nano, vi or vim as well.</p></li><li><p>Create and open the <code>.env</code> file while still in the miner-release directory:</p><p><code>nvim .env</code></p></li><li><p>Configure your Miner ID, in order to be eligible for incentives and rewards, by entering your 0x EVM address into your <code>.env</code> file like this: <code>MINER_ID_0=0xYourWalletAddressHere</code></p></li><li><p>Exit Neovim by pressing Escape, then typing <code>:wq</code> (or <code>:exit</code>) and pressing Enter.</p></li><li><p>After exiting Neovim, make sure the <code>.env</code> file is configured and that we are still inside the tmux session.</p></li></ul><p><strong>Step 6. Finally, Running the Miner:</strong></p><ol><li><p>While we are still in tmux, run:</p><p><code>python3 sd-miner-v1.0.0.py</code></p><p>or</p><p><code>python sd-miner-v1.0.0.py</code></p><p>Note: Make sure you select the correct version by checking the directory. 
As of mid-March 2024, 1.0.0 is the latest version for Stable Diffusion.</p></li><li><p>After running the miner, you will be asked whether to install the miner’s packages, so enter <code>yes</code>.</p></li><li><p>Soon the model will be ready and your tmux will show: <code>No Model Updates Required.</code> and then this: <code>All model Files are up to date, Miner is ready</code>.</p></li><li><p>When you see these messages, the miner is ready and running, so you can detach from tmux by pressing <code>CTRL + b</code> and then <code>d</code> right after.</p></li><li><p>You can now safely exit your machine, as your miner is up and running in the background tmux session.</p></li></ol><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/19dbd0e6e47b9f1ac7b7a91bc1c590625f1a89fff0e4a2c8e1b6317e0732ac93.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p><strong>Nirmaan</strong></p><p>The cornerstone of crypto AI networks lies in computing power. This includes the computing necessary for the inference of computationally demanding models, or the computing required to execute a model and generate a cryptographic proof verifying its correct execution. For these tasks, high-performance GPUs are essential to the operation of such networks. 
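</p><p>The Miner ID configuration from Step 5 reduces to writing one <code>MINER_ID_&lt;n&gt;</code> line per wallet. A minimal sketch of that step (the <code>build_env_file</code> helper is hypothetical and not part of the official miner-release repo; it simply validates the 0x EVM address format before rendering the <code>.env</code> contents):</p>

```python
import re

# Hypothetical helper: renders the .env contents described in Step 5,
# one MINER_ID_<n> entry per wallet address. Validates the 0x-prefixed,
# 40-hex-character EVM address format before writing anything.
EVM_ADDR = re.compile(r"^0x[0-9a-fA-F]{40}$")

def build_env_file(addresses):
    lines = []
    for i, addr in enumerate(addresses):
        if not EVM_ADDR.match(addr):
            raise ValueError(f"not a valid 0x EVM address: {addr!r}")
        lines.append(f"MINER_ID_{i}={addr}")
    return "\n".join(lines) + "\n"

# Example: write the file from inside the miner-release directory.
with open(".env", "w") as f:
    f.write(build_env_file(["0x" + "ab" * 20]))
```

<p>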
However, not everyone has access to this high-cost hardware or has the technical know-how to run the hardware with high performance and uptime.</p><p>We at Nirmaan are democratizing access to compute and are excited to enter a strategic partnership with Heurist, providing our miner-as-a-service middleware product to Heurist users who wish to provide compute to earn rewards.</p><p>Nirmaan aggregates the most cost-effective compute from web2 &amp; web3 providers such as IO Net, securing cheap and effective compute so that we can provision it to Heurist.</p><p>We will initially offer and manage NVIDIA RTX A6000 GPUs to run both LLM &amp; Stable Diffusion models on the Heurist network.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/1c03cdfd308f2d763bed833f438c671b6f1f44419ce470a26d854afccb90b3bb.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure>]]></content:encoded>
            <author>asxn-labs@newsletter.paragraph.com (ASXN Labs)</author>
            <enclosure url="https://storage.googleapis.com/papyrus_images/d1230ef06d3f29bf5137b8a043f4eef6f57b90b0592765d8f4df0003fc348957.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Running Heurist]]></title>
            <link>https://paragraph.com/@asxn-labs/running-heurist</link>
            <guid>y72ds8B3LpdB5VjMNRJu</guid>
            <pubDate>Mon, 01 Apr 2024 20:05:30 GMT</pubDate>
            <description><![CDATA[Introduction Heurist is a Zero-Knowledge (ZK) Layer 2 network specifically crafted for decentralized hosting and inference of AI models. Unlike traditional platforms, it operates on a distributed network of compute providers, enabling seamless, serverless access to open-source AI models. This approach mirrors the functionality of Hugging Face, a renowned open-source AI model marketplace. However, Heurist distinguishes itself through its unique ownership structure: the network is collectively ...]]></description>
            <content:encoded><![CDATA[<p><strong>Introduction</strong></p><p>Heurist is a Zero-Knowledge (ZK) Layer 2 network specifically crafted for decentralized hosting and inference of AI models. Unlike traditional platforms, it operates on a distributed network of compute providers, enabling seamless, serverless access to open-source AI models. This approach mirrors the functionality of <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard">Hugging Face</a>, a renowned open-source AI model marketplace. However, Heurist distinguishes itself through its unique ownership structure: the network is collectively owned by its users, promoting a more democratic and decentralized model of operation.</p><p>The primary ambition of the network is to democratize access to AI technologies by achieving several key objectives: facilitating easy and affordable access to AI models, enhancing transparency across the board, significantly reducing biases inherent in AI, and fostering the democratization of AI models. This strategy is designed to lower barriers to entry for AI utilization, ensuring a more equitable distribution of AI benefits across various sectors and communities.</p><p><strong>Closed-source AI</strong></p><p>The internal ‘settings’ of a model, known as parameters, play a critical role in neural networks. &apos;Weights,&apos; which are coefficients applied to the input data, determine the strength of connections between units across various layers of the model. These weights are adjusted during training to reduce errors in predictions. &apos;Biases&apos; are constants added just before the activation function, ensuring the model can still make precise predictions even when input values are zero. 
They aid in pattern recognition by allowing adjustments to the application of the activation function.</p><p>Proprietary, closed-source models, such as the GPT series developed by OpenAI, keep their training data and architectural details confidential, making the precise configurations of their parameters proprietary. This exclusivity grants the model&apos;s owner full authority over its utilization, development, and deployment. Such control can introduce several centralizing influences during the creation of a model:</p><ol><li><p><strong>Censorship</strong> -  Model owners have the autonomy to dictate the nature of content produced or analyzed by the model, incorporating filters to exclude specific topics, keywords, or concepts. This functionality serves multiple purposes, such as sidestepping contentious issues, adhering to legal mandates, or ensuring alignment with a company’s ethical standards and commercial objectives. Since ChatGPT&apos;s introduction, there has been a noticeable trend towards more restricted outputs, <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://arxiv.org/pdf/2307.10719.pdf">diminishing its utility</a> in certain contexts. A notable illustration of this trend is observed in China, where interactions with a Robot based on OpenAI&apos;s core model on WeChat are heavily censored. For instance, it refrains from addressing inquiries about Taiwan or posing questions related to Xi Jinping. A Wall Street Journal journalist successfully employed adversarial techniques to reveal that the Robot was intentionally programmed to evade topics deemed politically sensitive by the Chinese government or the Communist Party of China.</p></li><li><p><strong>Bias</strong> - In neural networks, weights and biases play a crucial role but can also unintentionally lead to bias, especially if the training data is not diverse. 
Weights adjust connection strengths between neurons and might emphasize or overlook specific features. This can result in a bias of omission, where essential details or patterns in less represented data are missed. Biases, intended to improve learning efficiency, could cause the model to prefer certain types of data over others if they are not adequately adjusted for a wide range of inputs. The proprietary nature of these models further exacerbates this issue, as it may lead to the exclusion of important patterns related to certain groups or situations, thereby skewing predictions and reinforcing biases in the outputs. This means certain perspectives, voices, or information might be underrepresented or inaccurately portrayed. An illustrative case of model owner-induced bias and censorship is seen with Google&apos;s advanced language model, Gemini.</p></li><li><p><strong>Verifiability</strong> - In a closed-source framework, it&apos;s challenging for users to verify if the model version claimed to be in use, such as ChatGPT 4 compared to ChatGPT 3, truly matches the service provided. The lack of transparency regarding the model&apos;s architecture, parameters, and training data prevents external validation. This opacity obscures whether users are benefiting from the latest technological enhancements or if outdated technologies are being misrepresented as new, potentially compromising the AI service&apos;s effectiveness and quality. For instance, in scenarios like assessing an applicant&apos;s eligibility for a loan through AI models, questions arise about the consistency of the model applied across different applicants. 
Moreover, there are concerns about whether the model strictly adheres to the designated inputs without deviation, highlighting the importance of transparency in ensuring fairness and reliability in AI applications.</p></li><li><p><strong>Dependency, lock-in and stagnation</strong> - Entities that depend on proprietary AI platforms or models are often at the mercy of the corporations controlling these technologies, resulting in a monopolistic aggregation of power that hinders open innovation. This dependence emerges as these corporations can limit access or modify the model at any time, significantly affecting those who rely on these tools for development. Historically, several examples illustrate this trend: Facebook, which initially promoted open development through its public APIs to encourage innovation, later restricted access to emerging competitors like Vine. Voxer, a messaging app that became popular in 2012 for its integration with Facebook to find friends, was cut off from Facebook&apos;s &apos;Find Friends&apos; feature. This phenomenon isn&apos;t confined to Facebook; numerous platforms that start with an open-source or open innovation philosophy often shift towards prioritizing shareholder value, to the detriment of their users. For instance, Apple&apos;s App Store enforces a 30% commission on app-generated revenues. Similarly, Twitter, which once championed openness and interoperability with the RSS protocol, moved to prioritize its centralized database in 2013, severing ties with RSS. This shift resulted in the loss of data ownership and access to one&apos;s social network. Amazon has faced accusations of exploiting its internal data to create and favor its products over third-party sellers. 
These instances highlight a consistent pattern where platforms transition from open ecosystems to more controlled, centralized frameworks, adversely affecting innovation and the wider digital community.</p></li><li><p><strong>Privacy</strong> - The owners of these centralized models, large corporations such as OpenAI, retain all rights to use the prompt and user data to better train their models. This greatly inhibits user privacy. For example, Samsung employees inadvertently exposed highly confidential information by utilizing ChatGPT for assistance with their projects. The organization permitted its semiconductor division engineers to use this AI tool for debugging source code. However, this led to the accidental disclosure of proprietary information, including the source code of an upcoming software, internal discussion notes, and details about their hardware. Given that ChatGPT collects and uses the data inputted into it for its learning processes, Samsung&apos;s trade secrets have unintentionally been shared with OpenAI.</p></li></ol><p><strong>The Rise of Open-Source AI</strong></p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/075e16ec9dae810c583c616000b09160f8c78111978d8595513b5cf7468a2e93.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p>Open-source AI is defined by its transparency and accessibility, featuring openly available model parameters and clear disclosures about the data (and the raw data) used in pre-training. This approach allows developers, researchers, and users to inspect, modify, and enhance AI models, fostering a collaborative environment that accelerates innovation and improvement. 
Open-source AI projects disclose the architecture of their models, enabling a deeper understanding of how they operate and the basis of their decision-making processes. Additionally, by revealing the datasets used for pre-training, these projects ensure users can assess the diversity, breadth, and potential biases within the training data, contributing to more ethical and unbiased AI systems. This openness not only democratizes access to cutting-edge technology but also encourages a global community of contributors to identify flaws, suggest improvements, and adapt the technology for varied applications, thereby ensuring the AI&apos;s continuous evolution and relevance.</p><p>Despite the approximately seven-year lead that closed-source AI development had over its open-source counterparts, the landscape is rapidly changing. Currently, many open-source language models not only match but occasionally surpass the performance of GPT-3.5. Moreover, in specific areas, these open-source models achieve performance levels comparable to GPT-4. This development signals a significant shift in the AI field, where open-source initiatives are closing the gap with well-funded, proprietary systems. 
The success of these open-source models can be attributed to a combination of factors, including the collaborative nature of open-source projects, which accelerates innovation and improvement; the availability of extensive datasets and advanced computing resources; and a growing community of developers dedicated to enhancing AI accessibility and capabilities.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/9db6d5037220debfa73f6717e9e9141d4ff99fa0485e7ae133b0d4be47a7a2d9.png" alt="Source: HuggingFace" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: HuggingFace</figcaption></figure><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/73be448d0c751c756bcf4980e7952a605be0e8e1e1517620a604786afe8b2aea.png" alt="Source: Anyscale" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Anyscale</figcaption></figure><p>In the realm of image generation, the Stable Diffusion models developed by Stability AI have emerged as the leading open-source text-to-image model family. They have demonstrated a level of power and efficiency on par with, and in terms of cost, superior to, their closed-source rival, OpenAI&apos;s DALL-E 2. A distinctive advantage of the Stable Diffusion models is the public accessibility of their weights, which empowers artists and developers to tailor the model for specific visual styles. 
This level of customization and adaptability is notably absent in the DALL-E models from OpenAI, highlighting a significant flexibility offered by Stable Diffusion in creative and development processes.</p><p>HuggingFace has become the epicenter of a Cambrian explosion in open-source AI innovation, dramatically democratizing the ability for individuals and organizations to host and leverage open-source models for a wide range of inference tasks. The platform&apos;s growth in hosting models, datasets, and applications has been nothing short of meteoric. From having fewer than <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://x.com/davwzha/status/1771222983077441564?s=20">5,000</a> models in 2020, HuggingFace has witnessed an exponential increase, boasting a staggering <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://huggingface.co/models">574,737</a> models as of the latest count. This surge reflects not just the escalating interest and investment in AI but also underscores HuggingFace&apos;s pivotal role in facilitating unprecedented access to cutting-edge AI tools and fostering a vibrant, collaborative ecosystem for AI research and development.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/cb19fae609c33f6df6c3035fc2a459db0f53bf3bcfa2461883d07cc45f5d6e48.png" alt="Source: HuggingFace" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: HuggingFace</figcaption></figure><p>Seeing this exponential growth of open-source AI, Heurist is looking to build the HuggingFace of Web3.</p><p><strong>Protocol Overview</strong></p><p>The Heurist protocol connects various participants, each playing a role in maintaining a decentralized model 
inference protocol through the coordination of distributed compute. The network participants are:</p><ul><li><p><strong>Consumers</strong>: Consumers engage with the Heurist protocol to perform inference tasks, such as generating text or images, utilizing a selection of AI models hosted on the platform. They benefit from a pay-as-you-go model for the computational resources utilized.</p></li><li><p><strong>Miners/Model Hosts</strong>: Individuals possessing GPU resources can earn Heurist Tokens by hosting AI models on the protocol. They process model operations on their hardware and receive compensation through payments from users and Heurist Token distributions for completing inference tasks. To assure a commitment to high-quality service, miners are mandated to stake a predetermined quantity of tokens.</p></li><li><p><strong>Model Creators</strong>: The dynamism of the Heurist ecosystem is propelled by AI model creators. By uploading their AI models to the protocol&apos;s model registry, they gain a share of the transactions made by users. This arrangement motivates creators to innovate and produce more sophisticated models to meet the escalating demands of users.</p></li><li><p><strong>Application Integrators</strong>: These participants develop user-facing interfaces that leverage Heurist’s AI models, including but not limited to chatbots, AI agents, and image generation applications hosted online, as well as SDKs for web service integration. Application integrators receive a portion of the fees in Heurist Tokens when consumers transact through their applications.</p></li><li><p><strong>Validators</strong>: Validators are essential for upholding the Heurist network’s integrity and reliability. They conduct regular verifications of the data output by miners to ensure its validity. 
Miners found to be delivering inaccurate or fraudulent data face a penalty, with a part of their staked tokens being confiscated and awarded to the validator who detected the misconduct.</p></li></ul><p><strong>Protocol Mechanics</strong></p><p>The Heurist protocol facilitates open access to inference, whilst preserving user privacy via the following mechanism:</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/ac6f9bba6c2ae49acacfb418b7016e9245c7adb80542d92fdfeecb6bc11f2a6c.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><ol><li><p>Each miner (model host) generates a unique public-private key pair.</p></li><li><p>Miners publish their public keys, making them accessible to users.</p></li><li><p>When accessing inference, a user generates a <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://en.wikipedia.org/wiki/Symmetric-key_algorithm">symmetrical encryption key</a> to securely encrypt the input data (their prompt) destined for inference.</p></li><li><p>The symmetric key is then encrypted using the public key of each miner the user intends to interact with. Should a user choose N miners, this encryption process is repeated N times, individually for each miner&apos;s public key.</p></li><li><p>The user compiles the encrypted data (their prompt), the collection of encrypted symmetric keys, and their public key into a request. 
This request is then disseminated across the network.</p></li><li><p>Upon receiving a user&apos;s request, a miner utilizes its private key to decrypt the symmetric key.</p></li><li><p>The miner subsequently uses the symmetric key to decrypt the user&apos;s original input data.</p></li><li><p>Once the user&apos;s input data is decrypted, the miner proceeds to execute the model inference task by running the user&apos;s unencrypted prompt through the correct model.</p></li><li><p>After finishing the task, the miner encrypts the model output data with the user&apos;s public key.</p></li><li><p>The encrypted output is then published to the network, where only the original user can decrypt it with their private key.</p></li></ol><p><strong>Token Economics</strong></p><p>The HUE token (which is not currently live) has a dynamic supply, influenced by both emissions and burning mechanisms inherent in the protocol, with the maximum supply capped at 1 billion tokens. The exact distribution and launch plan are currently TBD, but so far we know the following:</p><ul><li><p>5% of total supply will be rewarded to testnet miners.</p></li><li><p>2% of total supply will be governed by Heurist Imagineries NFT holders.</p></li><li><p>There will be a DeFi-inspired staking mechanism to align the interests of token holders with miners.</p></li></ul><p><strong>Mining</strong></p><ul><li><p><strong>Mining Process</strong>: Users have the opportunity to mine HUE tokens by utilizing their GPUs to host AI models.</p></li><li><p><strong>Staking Criteria</strong>: Activating a mining node necessitates staking a minimum of 10,000 HUE or esHUE tokens. Falling below this threshold renders the node inactive, unable to generate rewards.</p></li><li><p><strong>Mining Rewards</strong>: The process of mining awards esHUE tokens, which are automatically added to the miner node&apos;s stake. 
The reward magnitude is influenced by several factors, including the efficiency of the GPU, its availability (uptime), the specific AI model in operation, and the cumulative stake within a mining node.</p></li><li><p><strong>Enhanced POW Mining</strong>: For those staking between 10,000 and 100,000 HUE tokens, the efficiency of mining operations improves in direct proportion to the staked amount.</p></li></ul><p><strong>Staking</strong></p><ul><li><p><strong>Staking Mechanism</strong>: Users are afforded the capability to stake either HUE or esHUE tokens within mining nodes.</p></li><li><p><strong>Staking Rewards</strong>: The rewards from staking are dispensed in either HUE or esHUE, based on the token variant staked. Opting to stake esHUE tokens results in higher yield compared to staking HUE.</p></li><li><p><strong>Withdrawal Lock Period</strong>: There is a 30-day withdrawal lock period for unstaking HUE tokens. In contrast, unstaking esHUE tokens is not subject to any lock period.</p></li><li><p><strong>Vesting Scheme</strong>: Rewards earned in esHUE can undergo a vesting process to convert into HUE over a span of one year, adhering to a linear vesting schedule.</p></li><li><p><strong>Stake Transferability</strong>: Stakeholders have the liberty to transfer their stake in HUE or esHUE between mining nodes instantly, fostering a dynamic environment that encourages competition and flexibility among miners.</p></li></ul><p><strong>Incentivised Testnet Information</strong></p><p>Starting on the 1st of April 2024, Heurist’s incentivised testnet goes live. Heurist have earmarked 5% of the total HUE token supply to serve as rewards for mining activities. Participants will earn these rewards in the form of points, which can be converted into fully liquid HUE tokens at the mainnet&apos;s TGE. 
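</p><p>The esHUE-to-HUE vesting scheme described above is a one-year linear schedule. A minimal sketch of the accrual arithmetic (assuming a 365-day schedule with continuous pro-rata vesting; the exact on-chain implementation is not specified in Heurist’s public materials):</p>

```python
def vested_hue(es_hue_amount: float, days_elapsed: int, schedule_days: int = 365) -> float:
    """HUE claimable from a batch of esHUE rewards under linear vesting.

    Vesting accrues pro-rata with elapsed days, clamped to [0, schedule_days],
    so after a full year the entire esHUE amount has converted to HUE.
    """
    fraction = min(max(days_elapsed, 0), schedule_days) / schedule_days
    return es_hue_amount * fraction

# Example: 73 days into the year, 20% of a 1,000 esHUE batch has vested.
print(vested_hue(1000.0, 73))
```

<p>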
This conversion opportunity will become available immediately following the conclusion of the incentivized testnet phase, providing a direct pathway for participants to claim their earnings.</p><p>Rewards are segmented into two distinct categories, reflecting the type of AI model being provided by the miner:</p><ul><li><p>Llama Points, allocated to miners of LLMs. Each Llama Point is awarded for the processing of 1000 input/output tokens by the Mixtral 8x7b model.</p></li><li><p>Waifu Points, allocated to miners utilizing Stable Diffusion models. Each Waifu Point is awarded for generating one 512x512 image via the Stable Diffusion 1.5 model, achieved through 20 iteration steps.</p></li></ul><p>Compute providers hosting various LLM models will have to run hardware with the following minimum GPU RAM and will earn the following number of Llama points per 1000 tokens:</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/80718e943d5e6834ecafa05dcf1eb4379c5e7b5f2d761553e122400d28e0e787.png" alt="Source: Heurist Docs" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Heurist Docs</figcaption></figure><p>The HUE allocation ratio of Llama Points to Waifu Points will be finalized as Heurist approaches TGE, based on an analysis of demand and usage patterns for the two model types in the upcoming months. 
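</p><p>The baseline point accrual above is simple arithmetic, sketched below. This only covers the published base rates (1 Llama Point per 1,000 Mixtral 8x7b tokens; 1 Waifu Point per 512x512 SD 1.5 image at 20 steps); the per-model multipliers from the hardware table, and the final HUE conversion ratio, are not reproduced here:</p>

```python
LLAMA_TOKENS_PER_POINT = 1000  # Mixtral 8x7b input/output tokens per Llama Point

def llama_points(tokens_processed: int) -> float:
    """Llama Points earned by an LLM miner at the baseline rate."""
    return tokens_processed / LLAMA_TOKENS_PER_POINT

def waifu_points(images_generated: int) -> int:
    """Waifu Points: one per 512x512 Stable Diffusion 1.5 image at 20 steps."""
    return images_generated

# Example: a miner that served 250,000 tokens and generated 40 images.
print(llama_points(250_000), waifu_points(40))
```

<p>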
This approach ensures that the distribution of rewards accurately reflects the value and contribution of each mining activity within the ecosystem.</p><p><strong>Compute Provision Specifications</strong></p><p>In order to participate as a compute provider in the incentivised testnet, the following GPUs are recommended to host either LLMs or Stable Diffusion models:</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/cbae1875fd18fe938c1fc305f1cccd82b6d75e5a9cc342a6a9d243d47980ba50.png" alt="Source: Heurist Docs" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Heurist Docs</figcaption></figure><p><strong>Nirmaan Partnership</strong></p><p>The cornerstone of crypto AI networks is computing power. This includes the compute necessary for inference of computationally demanding models, and the compute required to execute a model and generate a cryptographic proof verifying its correct execution. For these tasks, high-performance GPUs are essential to the operation of such networks. 
However, not everyone has access to this high-cost hardware or has the technical know-how to run it with high performance and uptime.</p><p>We at Nirmaan are democratizing access to compute and are excited to enter a strategic partnership with Heurist, providing our miner-as-a-service middleware product to Heurist users who wish to provide compute to earn rewards.</p><p>Nirmaan aggregates the most cost-effective compute from Web2 &amp; Web3 providers, utilizing our partnerships with the largest compute providers in India to secure cheap and effective compute so that we can provision it to Heurist.</p><p>We will initially offer and manage NVIDIA RTX A6000 GPUs to run LLMs on the Heurist network.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/bc85a95e158cf0d560bfbdfab820364d135ded1d61c36864300259cb54a405d6.png" alt="Nirmaan&apos;s compute aggregation layer" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Nirmaan&apos;s compute aggregation layer</figcaption></figure><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/2641a786b4b735852a49902aebc1ec8725e6253a78d5ff2f67fac8e061f9b727.png" alt="Running openhermes-mixtral on our NVIDIA RTX A6000" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Running openhermes-mixtral on our NVIDIA RTX A6000</figcaption></figure>]]></content:encoded>
            <author>asxn-labs@newsletter.paragraph.com (ASXN Labs)</author>
            <enclosure url="https://storage.googleapis.com/papyrus_images/43584337140938db5118e978ade80405443e021a3f5c14601669ef7dcc222b70.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Setup a Heurist Miner on Akash ]]></title>
            <link>https://paragraph.com/@asxn-labs/setup-a-heurist-miner-on-akash</link>
            <guid>Z4IA6n0MYOWPmKY4hLxi</guid>
            <pubDate>Tue, 26 Mar 2024 19:04:46 GMT</pubDate>
            <description><![CDATA[In the realm of computing, a significant trend towards decentralization is unfolding, marking a departure from traditional, centralized models. This movement is exemplified by platforms such as the Akash Network or Io Net. The move towards decentralization is driven by a confluence of factors that underscore the limitations inherent in Web2 infrastructures. The traditional Web2 providers are becoming increasingly cumbersome to navigate and deploy. This complexity stems from several key issues...]]></description>
            <content:encoded><![CDATA[<p>In the realm of computing, a significant trend towards decentralization is unfolding, marking a departure from traditional, centralized models. This movement is exemplified by platforms such as the Akash Network and Io Net. The move towards decentralization is driven by a confluence of factors that underscore the limitations inherent in Web2 infrastructures. Traditional Web2 providers are becoming increasingly cumbersome to navigate and deploy on. This complexity stems from several key issues:</p><ol><li><p><strong>Centralization</strong>: The centralized nature of these systems introduces inefficiencies and bottlenecks, making them less adaptable to the dynamic needs of modern computing.</p></li><li><p><strong>GPU Scarcity</strong>: The availability of high-performance GPUs is severely limited, hindering the execution of compute-intensive tasks essential for advancements in ML/AI.</p></li><li><p><strong>Data Privacy Concerns</strong>: Centralized systems often pose significant risks to data privacy, raising concerns among users and organizations about the security of their information.</p></li><li><p><strong>Convoluted Pricing Structures</strong>: Pricing mechanisms are often opaque, making it difficult for users to understand and anticipate costs.</p></li><li><p><strong>Liquidity Issues</strong>: Users frequently face challenges in managing their deposited funds due to rigid and opaque financial structures.</p></li></ol><p>Our team at Nirmata Labs has directly encountered the challenges posed by Web2 providers. 
Notably, we have faced:</p><p><strong>Approval Hurdles for GPU Access</strong>: The requirement to justify the need for specific GPU models through detailed written explanations, followed by a tedious approval process, severely limits our agility and responsiveness to project demands.</p><p><strong>Limited Access to High-End GPUs</strong>: Many platforms do not offer access to top-tier GPU models, such as Nvidia&apos;s H100, A100, or RTX A6000, which are crucial for cutting-edge machine learning and AI projects.</p><p><strong>Funds Withdrawal Complexities</strong>: Withdrawing funds from these platforms is often a cumbersome and lengthy process, requiring direct communication with the provider team and enduring waiting periods of 5-7 days for the funds to be returned to one&apos;s bank account.</p><p><strong>Limited Customizability</strong>: Customizability options are starkly limited, with only a select few providers offering essential pre-installed tools such as Docker, Nvidia drivers, or the CUDA toolkit. The scarcity of options to pre-install machine learning models further exacerbates the challenge, stifling innovation and experimentation.</p><p>In contrast, Akash Network stands out for its full customizability, allowing users to tailor their instances precisely to their needs. It offers seamless deployment options, including the ability to pre-install machine learning models, providing a smoother and more efficient experience for developers and users alike while offering full decentralization, absolute liquidity, instant withdrawals and no approvals to rent any of their machines. 
In this article, we will take a look at how we can use Akash to set up a miner for Heurist.</p><p>Hardware and Software Requirements for running a Stable Diffusion Miner for Heurist:</p><blockquote><ol><li><p>Nvidia Cards with at least 12 GB of VRAM</p></li><li><p>CUDA Toolkit 12.1 or 12.2</p></li><li><p>Nvidia GPU Drivers</p></li><li><p>Miniconda3</p></li></ol></blockquote><p><strong>Step 1: Connecting to the Console:</strong></p><ul><li><p>Visit Akash Network Console</p></li><li><p>Connect your wallet and click on the <code>Deployments</code> tab on the left hand side</p></li><li><p>Click on the <code>Rent GPUs</code> Option here:</p></li></ul><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/7a0965beefc2e1763ae8666273361796260045447909a94d7d4cd70392e27cd2.png" alt="Click on Rent GPUs" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Click on Rent GPUs</figcaption></figure><ul><li><p>In navigating through the setup, you&apos;ll encounter a configuration screen that prompts for several key inputs: the choice between a Docker Image or Operating System, the number of GPUs, the preferred GPU model, the quantity of CPU cores, as well as the Memory and Storage specifications. As a case in point, we&apos;ve selected an Operating System tailored for AI Art or Stable Diffusion, which notably includes pre-installed Nvidia Drivers and the CUDA toolkit—vital components for our project. The RTX A4000 was our GPU of choice, owing to its robust VRAM capacity that&apos;s adept at running Stable Diffusion models with ease. For our testing phase, we opted for 16 CPU cores; however, configurations with 4 or 8 cores should suffice as well, considering the computation is predominantly GPU-centric. 
We rounded off our setup with 16 GB of Memory and 200 GB of Storage, leaving the advanced configuration settings untouched. This is a glimpse of what the setup screen encompasses:</p></li></ul><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/ae8558b85cdc1ad4114985ef6b975f17305034bb645f19957bcfe4d00ace1136.png" alt="Configuring Compute Setup" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Configuring Compute Setup</figcaption></figure><ul><li><p>Click on <code>Deploy</code> and then it will show a “waiting for bids” screen. Wait for a few minutes until the bids based on the GPUs available show up.</p></li><li><p>After the Bids show up, it will look like this:</p></li></ul><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/dc09cdabe437598d59d283a73ab89f4cb1586c3bc50a5b12d2f48a215dbd3d2e.png" alt="Compute Bids" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Compute Bids</figcaption></figure><ul><li><p>Select the bid you like and click on the “Accept Bid” button at the top right.</p></li><li><p>Confirm the deployment by accepting the transaction for AKT in your Metamask wallet.</p></li><li><p>After confirming it will look like this:</p></li></ul><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/053a1c27a7bf7ab95b25a5ed21d57c450b1f20c491c449cd2759f654a3ff502d.png" alt="Confirmed Deployment" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" 
class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Confirmed Deployment</figcaption></figure><p><strong>Step 2: Initializing the Deployment</strong></p><ul><li><p>After the deployment is successful, your instance will automatically show up on your current tab with the “Events Tab” pre-selected.</p></li><li><p>The instance page will contain this:</p></li></ul><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/b2478a94f0c8539cdfe63843e509a54a71ad802abf90fa17f3b2e7883ad97c47.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><ul><li><p>The “Events Tab” will show setup events, meaning Akash is installing the OS and the necessary drivers onto the instance. It typically takes a minute or two until the instance is configured for use.</p></li><li><p>After waiting for a few minutes, click on the “Shell” tab and it will give you this message once successfully configured:</p></li></ul><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/57b588aabc02dd2f653e1c5e509b0344ee09a243ba1b0161c6b68b5b4b765399.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><ul><li><p>Our shell is now set up, and we can verify that the NVIDIA drivers and the CUDA toolkit are pre-installed by running this command in the shell:</p></li></ul><p><code>nvidia-smi</code></p><ul><li><p>A successful installation will look like this:</p></li></ul><figure float="none" data-type="figure" class="img-center" 
style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/881888b635b6cf598044768370029e727f351773381657e5ec567e336932c38c.png" alt="Akash Console Shell" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Akash Console Shell</figcaption></figure><p>Displayed is our configuration featuring an RTX A4000 equipped with 16GB of VRAM and the CUDA Toolkit version 12.2, in addition to the installed NVIDIA Drivers. This setup fulfills the prerequisites for operating a Heurist miner. With these specifications in place, we are now positioned to proceed with the installation of Miniconda.</p><p><strong>Step 3: Installing Miniconda and other dependencies</strong></p><ul><li><p>Log in as the root user by entering:</p><p><code>su -</code></p></li><li><p>Update and upgrade Linux packages and dependencies:</p><p><code>apt-get update</code></p><p><code>apt-get upgrade</code></p></li><li><p>Install wget, tmux and Neovim:</p><p><code>apt-get install wget</code></p><p><code>apt-get install neovim</code></p><p><code>apt-get install tmux</code></p></li></ul><p><strong>Installing Miniconda:</strong></p><p>Referenced from: Install Miniconda in 5 Steps</p><ul><li><p>Create a directory for Miniconda3:</p><p><code>mkdir -p ~/miniconda3</code></p></li><li><p>Download the latest Miniconda installation script:</p><p><code>wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh</code></p></li><li><p>Run the install script:</p><p><code>bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3</code></p><p>After the installation is finished, it will look like this:</p></li></ul><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img 
src="https://storage.googleapis.com/papyrus_images/523af546287a17309d260996c84081b1f177b1f34d10ceb0e50266c1f9dd882c.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><ul><li><p>Delete the install script:</p><p><code>rm -rf ~/miniconda3/miniconda.sh</code></p></li><li><p>Initialize conda for your bash shell:</p></li></ul><p><code>~/miniconda3/bin/conda init bash</code></p><ul><li><p>Exit the shell, as we have to reconnect for the conda initialization to take effect:</p><p><code>exit</code></p></li><li><p>Click on the “Events” tab after exiting, wait for 10 seconds and click on the “Shell” tab again to reconnect to the shell.</p></li></ul><p><strong>Verifying the Installation and creating the conda environment:</strong></p><p>After restarting the shell, run:</p><p><code>conda list</code></p><p>If Miniconda has been installed successfully, you will see this:</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/171f8baca4e4452664e65fc3d00857f7c92c2698002d228f05e06bd67856474d.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p><strong>Creating the Environment:</strong></p><ul><li><p>To establish our environment, execute the following command and allow some time for the downloading and extraction of the necessary packages:</p><p><code>conda create --name gpu-3-11 python=3.11</code></p></li><li><p>Create a new tmux session to activate the environment and activate the miner by running:</p><p><code>tmux new -s heurist</code></p></li><li><p>Activate the environment once you have entered the tmux by 
running:</p><p><code>conda activate gpu-3-11</code></p></li><li><p>Install the conda environment dependencies we need to run the miner (this will take ~5 mins to install):</p><p><code>conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia</code></p></li></ul><p><strong>Cloning the miner repository to access the miner:</strong></p><ol><li><p>Clone the official repository by running:</p><p><code>git clone https://github.com/heurist-network/miner-release</code></p></li><li><p>Enter the miner-release directory:</p><p><code>cd miner-release</code></p></li><li><p>Install the Python dependencies needed to run the miner:</p><p><code>pip install python-dotenv</code></p><p><code>pip install -r requirements.txt</code></p></li><li><p>After installing all the dependencies, configure your Miner ID to receive rewards by creating a <code>.env</code> file using an editor of your choice:</p></li><li><p>For this tutorial we have chosen Neovim as our file editor, but you are free to use nano, vi or vim as well.</p></li><li><p>Create and open the .env file while we are still in the miner-release directory:</p><p><code>nvim .env</code></p></li><li><p>Configure your Miner ID in order to be eligible for incentives and rewards by entering your 0x EVM address into your <code>.env</code> file like this: <code>MINER_ID_0=0xYourWalletAddressHere</code></p></li><li><p>Exit Neovim:</p><p>Press <code>Escape</code> to return to normal mode, then</p><p>type <code>:wq</code> or <code>:exit</code> and press enter.</p></li><li><p>Make sure that after exiting Neovim we have the .env configured and we are still in tmux.</p></li></ol><p><strong>Finally, Running the Miner:</strong></p><p>While we are still in tmux, run:</p><p><code>python3 sd-miner-v1.0.0.py</code></p><p>or</p><p><code>python sd-miner-v1.0.0.py</code></p><p><strong><em>Note:</em></strong> <em>Make sure you select the correct version by checking the directory. 
As of mid-March 2024, 1.0.0 is the latest version for Stable Diffusion.</em></p><ol><li><p>After running the miner, you will be asked whether to install the miner’s packages, so enter <code>y</code> or <code>yes</code></p></li><li><p>Soon the model will be ready and your tmux will show: <code>No Model Updates Required.</code> and then this: <code>All model Files are up to date, Miner is ready.</code></p></li><li><p>When you see these, it means that the miner is ready and running, so you can detach from the tmux session by pressing <code>CTRL + b</code>, followed by <code>d</code> right after.</p></li><li><p>Now you can safely exit your machine, as your miner is up and running in the tmux session in the background.</p></li></ol><p>The result is that we have rented a GPU on Akash and are now using it as a miner on the Heurist network, serving image creation via Stable Diffusion. Nirmaan will utilise Akash and other decentralised compute providers as part of our compute aggregator, enabling us to route jobs to the cheapest available compute.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/7c822b671fafa7ff5581fed215e7e73f55b2a2331c57b37dd3ce0e913fb6e999.png" alt="Nirmaan&apos;s Middleware Layer" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Nirmaan&apos;s Middleware Layer</figcaption></figure>]]></content:encoded>
            <author>asxn-labs@newsletter.paragraph.com (ASXN Labs)</author>
            <enclosure url="https://storage.googleapis.com/papyrus_images/894b596dc71da279c2394b924a478e703c2d6460ffdabe21fc09328d84e10f6b.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[ASXN Labs AI Thesis]]></title>
            <link>https://paragraph.com/@asxn-labs/asxn-labs-ai-thesis</link>
            <guid>qlYUphTuuVrrZ3guRgp9</guid>
            <pubDate>Sun, 24 Mar 2024 19:15:22 GMT</pubDate>
            <description><![CDATA[Introduction Artificial intelligence, which is broadly the ability of machines to perform cognitive tasks, has quickly become an essential technology in our day to day lives. The breakthrough in 2017 occurred when transformers were developed to solve the problem of neural machine translation, which allows a model to take an input sentence of a task and produce an output. This enabled a neural network to take text, speech, or images as an input, process it, and produce output.OpenAI and Deepmi...]]></description>
            <content:encoded><![CDATA[<p><strong>Introduction</strong></p><p>Artificial intelligence, which is broadly the ability of machines to perform cognitive tasks, has quickly become an essential technology in our day-to-day lives. The <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://arxiv.org/abs/1706.03762">breakthrough</a> in 2017 occurred when transformers were developed to solve the problem of neural machine translation, allowing a model to take an input sequence and produce an output. This enabled a neural network to take text, speech, or images as an input, process it, and produce output.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/ac3247c231514fa6709e548a753e62998992d6fc6132e5add1e4984698c6a447.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p>OpenAI and DeepMind pioneered this technology, and more recently the OpenAI GPT (Generative Pre-trained Transformer) models created a eureka moment for AI with the proliferation of their LLM chatbots. GPT-1 was <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf">first introduced</a> in June of 2018, featuring a model composed of twelve processing layers. It used a specialized technique called &quot;masked self-attention&quot; across twelve different focus areas, allowing it to understand and interpret language more effectively. Unlike simpler learning methods, GPT-1 employed the Adam optimization algorithm for more efficient learning, with its learning rate gradually increasing and then decreasing in a controlled manner. 
Overall, it contained 117 million adjustable elements, or parameters, which helped refine its language processing capabilities.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/6699dfb6f8bce8104f556cbdd0de8eff23641afa32f6875ad67287eac1b92ffd.png" alt="GPT 1 Architecture" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">GPT 1 Architecture</figcaption></figure><p>Fast forward to March 14th, 2023, when OpenAI released GPT-4, which features approximately <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://the-decoder.com/gpt-4-architecture-datasets-costs-and-more-leaked/">1.8 trillion parameters</a> spread across 120 layers. The increase in parameters and layers enhances its ability to understand and generate more nuanced and contextually relevant language, among other things. The over 10,000x increase in the number of parameters in OpenAI’s GPT models in under 5 years shows the astounding rate of innovation happening at the cutting edge of generative models.</p><p>[insert performance data]</p><p><strong>Regulation</strong></p><p>Running parallel to this innovation and underpinning the AI stack is regulation. Whenever a transformative technology comes to market, regulators will introduce laws and processes so that they can better control it. Almost prophetically, we saw this play out in 1991 when Joe Biden, then chairman of the Senate Judiciary Committee, proposed a bill to ban encryption on emails. This potential ban on code and mathematics inspired Phil Zimmermann to build the open-source Pretty Good Privacy (PGP) program that enabled users to communicate securely by decrypting and encrypting messages, authenticating messages through digital signatures, and encrypting files. 
The United States Customs Service went on to start a criminal investigation into Zimmermann for allegedly violating the Arms Export Control Act, as they regarded his PGP software as a munition and wanted to limit access to strong cryptography to citizens and foreign entities.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/894bf52521f7b7af50bb133271aaf25b56cbf999038c6a50e6ef22588e3f3f90.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p>Reminiscent of the email encryption bill, on the 30th of October 2023 Joe Biden, now the President of the United States, signed a Presidential Executive Order on “Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence”. The order falls under the Defense Production Act (DPA), affording the President a broad set of authorities to ensure the country has the resources necessary for national security. Broadly, the act seeks to establish new standards for AI safety and security. The order imposes strict Know Your Customer (KYC) requirements on compute and data, whilst also banning all foreign AI model training occurring on US soil or in US data centers. On top of this, permissionless AI models will be capped at “tens of billions of parameters”; for reference, Mistral-7B-v0.1 has 7 billion parameters. 
We are also witnessing this play out with hardware as the US <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.reuters.com/technology/upcoming-us-rules-ai-chip-exports-aim-stop-workarounds-us-official-2023-10-15/">recently </a>prohibited the sale of semiconductor chips above a certain capability threshold to China, Russia and other nations.</p><p><strong>Model Generation</strong></p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/399745992fd1746a2a96f0b059a734c928e65415660726d15206c2702bb437e7.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p>On top of the centralizing regulatory pressures that artificial intelligence faces, there are a number of centralizing forces throughout the creation of a model. The creation of an AI model, particularly large-scale models like those used in natural language processing, typically follows three main phases: pre-training, fine-tuning, and inference. We will walk through each phase and the centralizing forces that are present:</p><p><strong>Pre-Training</strong></p><p>The pre-training phase is the initial step where the model learns a wide range of knowledge and skills from a large and diverse dataset. Before the advent of <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf">transformer-based architectures</a>, top-performing neural models in natural language processing (NLP) primarily used supervised learning, which required vast quantities of curated and manually labeled data, which resided mostly within corporate boundaries. 
This dependence on supervised learning restricted their applicability to datasets lacking extensive annotations and created a centralizing force due to the prohibitive costs of employing skilled researchers and developers to perform this supervised learning. During this pre-transformer stage, supervised pre-training of models was dominated by centralized entities like Google, which had the resources to fund this work. The advent of transformer-based architectures, among other advancements, contributed significantly to the rise of unsupervised learning, particularly in the field of natural language processing, enabling models to be trained on datasets without predefined labels or annotated outcomes.</p><p><strong>Data Collection &amp; Preparation</strong></p><p>The first step in pre-training a model is gathering the data that the model will be trained on. A large and diverse dataset is collected from a vast corpus of text such as books, websites and articles. The data is then cleaned and processed.</p><p><a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://platform.openai.com/tokenizer">Tokenization</a> involves breaking down text data into smaller units, or tokens, which may range from words to parts of words, or even individual characters, based on the model&apos;s architecture. Following this, the data undergoes formatting to make it comprehensible to the model. This typically includes transforming the text into numerical values that correspond to the tokens, such as through the use of word embeddings.</p><p><strong>Model Architecture</strong></p><p>Selecting the right model architecture is a crucial step in the development process, tailored to the specific application at hand. For instance, transformer-based architectures are frequently chosen for language models due to their effectiveness in handling sequential data. 
Alongside choosing a framework, it&apos;s also important to set the initial parameters of the model, such as the weights within the neural network. These parameters serve as the starting point for training and will be fine-tuned to optimize the model&apos;s performance.</p><p><strong>Training Procedure</strong></p><p>Using the cleaned and processed data, the model is fed a large amount of text and learns patterns and relationships in order to make predictions about that text. During the training procedure there are a couple of key procedures used to dial in the parameters of the model so that it produces accurate results. One is the learning algorithm:</p><p>The learning algorithm in neural network training prominently involves <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.researchgate.net/publication/349077282_A_Study_on_Backpropagation_in_Artificial_Neural_Networks">backpropagation</a>, a fundamental method that propagates the error—defined as the difference between the predicted and actual outputs—back through the network layers. This identifies the contribution of each parameter, like weights, to the error. Backpropagation involves gradient calculation, where gradients of the error with respect to each parameter are computed. These gradients, essentially vectors, indicate the direction of the greatest increase of the error function.</p><p>Additionally, Stochastic Gradient Descent (SGD) is employed as an optimization algorithm to update the model&apos;s parameters, aiming to minimize the error. SGD updates parameters for each training example or small batches thereof, moving in the opposite direction of the error gradient. A critical aspect of SGD is the learning rate, a hyperparameter that influences the step size towards the loss function&apos;s minimum. 
A very high learning rate can cause overshooting of the minimum, while a very low rate can slow down the training process significantly.</p><p>Furthermore, the Adam optimizer, an enhancement over SGD, is used for its efficiency in handling separate learning rates for each parameter. It adjusts these rates based on the first moment (average of recent gradients) and the second moment (average of recent squared gradients). Adam&apos;s popularity stems from its ability to achieve better results more quickly, making it ideal for large-scale problems with extensive datasets or numerous parameters.</p><p>The second key technique used in the training phase is the loss function, also known as a cost function. It plays a crucial role in supervised learning by quantifying the difference between the expected output and the model&apos;s predictions. It serves as a measure of error for the training algorithm to minimize. Common loss functions include Mean Squared Error (MSE), typically used in regression problems, where it computes the average of the squares of the differences between actual and predicted values. In classification tasks, Cross-Entropy Loss is often employed. This function measures the performance of a classification model whose output is a probability between 0 and 1. During the training process, the model generates predictions, the loss function assesses the error, and the optimization algorithm subsequently updates the model&apos;s parameters to reduce this loss. The choice of loss function is pivotal, significantly influencing the training&apos;s efficacy and the model&apos;s ultimate performance. 
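</p><p>The interplay of forward pass, loss, gradient and update described above can be sketched for a one-parameter linear model (a deliberately minimal toy with invented data; real training uses automatic differentiation over millions to billions of parameters):</p>

```python
# Fit y = w * x to toy data generated with w_true = 2, using MSE loss and SGD.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = 0.0    # initial parameter (the "starting point" set before training)
lr = 0.05  # learning rate: the step size towards the loss minimum

for epoch in range(200):
    for x, y in data:
        pred = w * x                  # forward pass
        grad = 2 * (pred - y) * x     # d/dw of the squared error (backpropagation)
        w -= lr * grad                # SGD update: step against the gradient
# w converges to ~2.0
```

<p>Each update steps the weight against the gradient of the squared error; a much larger learning rate would overshoot, a much smaller one would converge far more slowly.</p><p>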
It must be carefully selected to align with the specific objectives and nature of the problem at hand.</p><p><strong>Resource Allocation</strong></p><p>Resource allocation during the pre-training phase of AI models, particularly for large-scale models like those in the GPT series, necessitates a careful and substantial deployment of both computational and human resources. This phase is pivotal as it establishes the groundwork for the model&apos;s eventual performance and capabilities. The pre-training of these complex AI models demands an extensive amount of computational power, primarily sourced from Graphics Processing Units (GPUs) or Tensor Processing Units (TPUs), which are specialized for handling the intense parallel processing tasks typical in machine learning. To address the considerable computational needs, a distributed computing approach is often adopted, utilizing multiple GPUs or TPUs across various machines or data centers in tandem to process the vast amounts of training data and update the model parameters efficiently.</p><p>Moreover, the significant volume of data required for pre-training, potentially reaching petabytes, necessitates robust storage solutions for both the raw and processed data formats. The energy consumption during this phase is notably high due to the prolonged operation of high-performance computing hardware, prompting a need to optimize computational resource use to strike a balance between performance, cost, and environmental impact. The financial aspects also play a critical role, as the acquisition and maintenance of necessary hardware, alongside the electricity for powering and cooling these devices, entail substantial costs. Furthermore, many organizations turn to cloud computing services to access the needed computational resources, adding a variable cost based on usage rates. 
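</p><p>As a rough back-of-envelope sketch (the GPU count, run length and hourly rate below are assumptions for illustration, not quotes from any provider), the dominant cloud cost scales simply as GPUs × hours × price per GPU-hour:</p>

```python
# Hypothetical pre-training cost sketch; all figures are assumptions.
num_gpus = 10_000          # GPUs used in parallel across the cluster
hours = 30 * 24            # a 30-day training run
price_per_gpu_hour = 2.50  # assumed cloud rate in USD per GPU-hour

cost = num_gpus * hours * price_per_gpu_hour
# 10,000 * 720 * 2.50 = $18,000,000 for one such run
```

<p>Even these toy numbers land in the tens of millions of dollars for a single run, before storage, energy and failed experiments are counted.</p><p>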
In fact, when asked at an MIT event, Sam Altman said that GPT-4 cost “more than $100 million” to train.</p><p><strong>Fine-Tuning</strong></p><p>The next stage in the creation of a model is fine-tuning. The pre-trained model undergoes adaptation to excel in specific tasks or with certain datasets that were not part of its initial training regimen. This phase takes advantage of the broad capabilities acquired during pre-training, refining them for superior performance in more focused applications, such as text classification, sentiment analysis, or question-answering. Fine-tuning involves preparing a smaller, task-specific dataset that reflects the nuances of the intended application, modifying the model&apos;s architecture to suit the task&apos;s unique output requirements, and adjusting parameters, including adopting a lower learning rate for more precise, targeted optimization. The model is then retrained on this curated dataset, which may involve training only the newly adjusted layers or the entire model, depending on the task&apos;s demands.</p><p>Following the initial pre-training and fine-tuning phases, models, particularly those akin to OpenAI&apos;s GPT-3, may undergo Reinforcement Learning from Human Feedback (RLHF) as an additional refinement step. This advanced training approach integrates supervised fine-tuning with reward modeling and reinforcement learning, leveraging human feedback to steer the model towards outputs that align with human preferences and judgments. The process begins with fine-tuning on a dataset of input-output pairs to guide the model towards expected outcomes. Human annotators then assess the model&apos;s outputs, providing preference feedback from which rewards can be modeled. A reward model is subsequently trained to predict these human-given scores, guiding reinforcement learning to optimize the AI model&apos;s outputs for more favorable human feedback. 
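</p><p>A heavily simplified sketch of the reward-modeling idea (the preference data and the one-feature reward model below are invented for illustration; real systems train a neural reward model over full responses):</p>

```python
# Toy RLHF-style reward modeling: learn a scalar "reward" for outputs from
# human preference pairs, then pick the candidate the reward model favors.
import math

# Each output is reduced to one invented feature score; in every labeled
# pair, annotators preferred the output with the higher feature value.
preferences = [(0.9, 0.2), (0.8, 0.1), (0.7, 0.3)]  # (preferred, rejected)

w = 0.0   # reward model parameter: reward(x) = w * x
lr = 0.5
for _ in range(500):
    for good, bad in preferences:
        # Bradley-Terry style objective: P(good beats bad) = sigmoid(r_g - r_b)
        p = 1 / (1 + math.exp(-(w * good - w * bad)))
        grad = (p - 1) * (good - bad)  # gradient of -log p w.r.t. w
        w -= lr * grad                 # fit w to explain the human rankings

# The "policy" then prefers the candidate with the highest predicted reward.
candidates = [0.4, 0.95, 0.1]
best = max(candidates, key=lambda x: w * x)
```

<p>The learned reward stands in for the human annotators, letting the optimization loop favor outputs people would have ranked higher.</p><p>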
RLHF thus represents a sophisticated phase in AI training, aimed at aligning model behavior more closely with human expectations and making it more effective in complex decision-making scenarios.</p><p><strong>Inference</strong></p><p>The inference stage marks the point where the model, after undergoing training and possible fine-tuning, is applied to make predictions or decisions on new, unseen data. This stage harnesses the model&apos;s learned knowledge to address real-world problems across various domains. The process begins with preparing the input data to match the training format, involving normalization, resizing, or tokenizing steps, followed by loading the trained model into the deployment environment, whether it be a server, cloud, or edge devices. The model then processes the input to generate outputs, such as class labels, numerical values, or sequences of tokens, tailored to its specific task. Inference can be categorized into batch and real-time, with the former processing data in large volumes where latency is less critical, and the latter providing immediate feedback, crucial for interactive applications. Performance during inference is gauged by latency, throughput, and efficiency—key factors that influence the deployment strategy, choosing between edge computing for local processing and cloud computing for scalable resources. However, challenges such as model updating, resource constraints, and ensuring security and privacy remain paramount.</p><p><strong>Centralizing Forces Within Model Generation</strong></p><p>In the process of creating an AI model, numerous centralizing and monopolistic forces come into play. The significant resources needed for every phase of development pave the way for economies of scale, meaning that efficiency improvements tend to concentrate superior models in the hands of a select few corporations. 
Below, we detail the diverse mechanisms through which AI centralization occurs:</p><p><strong>Pre-Training</strong></p><p>As we have seen, the pre-training phase of a model combines a few things: data, training and resources. When it comes to data collection, there are a number of issues:</p><p><strong>Access to data</strong></p><p>The pre-training phase requires a large corpus of data, typically from books, articles, corporate databases and from scraping the internet. As we discussed, when supervised learning dominated as a training technique, large companies like Google could create the best models due to the large amount of data they were able to store from users interacting with their search engine. We see a similar centralizing and monopolistic force throughout AI today. Large companies such as Microsoft, Google &amp; OpenAI have access to the best data through data partnerships, in-house user data or the infrastructure required to create an industrial internet scraping pipeline. For example, <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.404media.co/tumblr-and-wordpress-to-sell-users-data-to-train-ai-tools/">leaked documents</a> suggest OpenAI is preparing to purchase user data from Tumblr and WordPress, at the expense of users&apos; privacy.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/c05cffab4aa41ecc083f865e08a262e2806cdea2f45646485640efa4fa199c44.png" alt="The top 1% of x networks facilitates x proportion of the total traffic / volume. Source Chris Dixon&apos;s &quot;Read Write Own&quot;." blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">The top 1% of x networks facilitates x proportion of the total traffic / volume. 
Source: Chris Dixon&apos;s &quot;Read Write Own&quot;.</figcaption></figure><p>Transformers enabled unsupervised learning, but scraping web data is no easy feat: websites typically ban scraper IP addresses and user agents, and employ rate limits and CAPTCHA services.</p><p>AI companies deploy a variety of tactics to navigate around the barriers websites put in place to obstruct data collection efforts. One common method involves utilizing a diverse array of IP addresses to sidestep IP-based rate limiting or outright bans, often achieved through the use of proxy servers or VPN services. Additionally, altering the user-agent string in HTTP requests—a technique known as User-Agent Spoofing—allows these companies to emulate different browsers or devices, thereby potentially circumventing blocks aimed at user-agent strings typically associated with automated bots or scrapers. Furthermore, to overcome CAPTCHA challenges, which are frequently employed by websites to prevent automated data collection, some AI companies turn to CAPTCHA solving services. These services are designed to decode CAPTCHAs, enabling uninterrupted access to the site&apos;s data, albeit raising questions about the ethical implications of such practices.</p><p>Beyond their ability to gather large amounts of data, big corporations also have the financial means to build strong legal teams. These teams work tirelessly to help them collect data from the internet and through partnerships, as well as to obtain patents. We can see this happening today with OpenAI and Microsoft, who are in a <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://cyberscoop.com/openai-lawsuit-privacy-data-scraping/">legal dispute</a> with The New York Times. 
The issue is over the use of The New York Times&apos; articles to train the ChatGPT models without permission.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/3868660a5ecf20b23e2b34fd796c423a831239b12dd01e1c8afca130c4ef3a70.png" alt="Patent Centralization. Source: Statista" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Patent Centralization. Source: Statista</figcaption></figure><p><strong>Closed source data</strong></p><p>There are also ethical and <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.ibm.com/blog/shedding-light-on-ai-bias-with-real-world-examples/">bias</a> considerations involved in training a model. All data carries some inherent bias; since AI models learn patterns, associations, and correlations from their training data, any biases in this data can be absorbed and perpetuated by the model. Common biases found in AI models result from sample bias, measurement bias and historical bias, and can lead to models producing poor or unintended results. For example, Amazon trained an <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G/">automated recruitment model</a> which was designed to assess candidates based on their fit for different technical positions. The model developed its criteria for evaluating suitability by analyzing resumes from past applicants. 
However, since the data set it was trained on included predominantly male resumes, the model learned to penalize resumes that included the word “women”.</p><p><strong>Resource allocation</strong></p><p>As we have discussed, pre-training of foundation models requires large cycles of GPU compute, costing hundreds of millions to train the top models (OpenAI <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.businessinsider.com/openai-2022-losses-hit-540-million-as-chatgpt-costs-soared-2023-5?r=US&amp;IR=T">reportedly</a> lost $540 million in 2022 as the costs of developing ChatGPT soared). Demand for accessible and usable GPUs vastly outstrips current supply, and this has led to a consolidation of model pre-training within the largest and most well-funded tech companies (FAANG, OpenAI, Anthropic) and data centers.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/04ce025bd6c6a69f5ecd5a86a00123ba5c83dadb26e630b47e1782c7080a1334.png" alt="Although corporations keep details of their data centers and operations somewhat secret, for a variety of reasons: security, regulatory compliance, customer data protection &amp; competitive advantages we can see that the top 5 cloud &amp; data center providers " blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Although corporations keep details of their data centers and operations somewhat secret, for a variety of reasons: security, regulatory compliance, customer data protection &amp; competitive advantages we can see that the top 5 cloud &amp; data center providers</figcaption></figure><p>We have learned that models improve logarithmically with training size and therefore, in general, the best models are the ones trained with the highest number of 
GPU compute cycles. Thus, a powerful centralizing force within the pre-training of models is the economies of scale and productivity gains that large incumbent tech and data companies enjoy; we are seeing this play out as OpenAI, Google, Amazon, Microsoft and Meta dominate.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/debefaabaff4d3997b1f81a25ef3914365546afa0eeb0261003fc9dd159e096f.png" alt="Source: Epochai" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Epochai</figcaption></figure><p>The concentration of the power to develop transformative artificial intelligence technologies within a small number of large corporations, such as OpenAI, Google, and Microsoft, prompts significant concerns. As <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.axios.com/2017/12/15/sean-parker-unloads-on-facebook-god-only-knows-what-its-doing-to-our-childrens-brains-1513306792">articulated</a> by Facebook&apos;s first president, the primary objective of these platforms is to <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://x.com/MarioNawfal/status/1763471083838033941?s=20">capture and retain</a> as much of our time and conscious attention as possible. This reveals a fundamental misalignment of incentives when interacting with Web2 companies, an issue we have begrudgingly accepted due to the perceived benefits their services bring to our lives. However, transplanting this oligopolistic Web2 model onto a technology that is far more influential than social media—and holds the capacity to profoundly influence our decisions and experiences—presents a concerning scenario. 
A perfect example of this is the <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://en.wikipedia.org/wiki/Facebook%E2%80%93Cambridge_Analytica_data_scandal#:~:text=In%20the%202010s%2C%20personal%20data,be%20used%20for%20political%20advertising.">Cambridge Analytica scandal</a> of the 2010s. The British firm harvested personal data from up to 87 million Facebook users without authorization, building a profile of each user before serving them targeted political ads to influence elections. This data aided the 2016 U.S. presidential campaigns of Ted Cruz and Donald Trump, and was implicated in interference in the Brexit referendum. If a tool as powerful as AI falls under the control of a few dominant players, it risks amplifying the potential for misuse and manipulation, raising ethical, societal, and governance issues.</p><p><strong>GPU Supply-Side Centralisation</strong></p><p>The resultant effect of model quality scaling logarithmically with training size is that demand for GPU compute must grow exponentially to achieve linear gains in model quality. Certainly we have seen this play out over the last two years, with demand for GPU compute skyrocketing since the launch of ChatGPT and the start of the AI race. 
If we take Nvidia’s revenue as a proxy for GPU demand, we see that Nvidia’s quarterly revenue increased 405% from Q4 2022 to Q4 2023.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/e145b25fa65641ffae755500c44aae70b870b71f4efe9025ae3c095fd912bbb5.png" alt="Source: Nvidia reports" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Nvidia reports</figcaption></figure><p>The production of GPUs and microchips for AI training is an extremely complex and expensive process, with high barriers to entry. As such, there are few companies capable of producing hardware that delivers the performance companies like OpenAI require to train their GPT models. The largest of these semiconductor and GPU manufacturers is Nvidia, holding approximately <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.britannica.com/topic/NVIDIA-Corporation">80%</a> of the global market share in GPU semiconductor chips. Founded in 1993 to create graphics hardware for video games, Nvidia quickly became a pioneer in high-end GPUs and made its seminal step into AI in 2006 with the launch of <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://en.wikipedia.org/wiki/CUDA">Compute Unified Device Architecture (CUDA)</a>, a platform for GPU <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://en.wikipedia.org/wiki/Parallel_computing">parallel processing</a>.</p><p>The hardware used to train a model is vital and, as we have discussed, its costs are extremely high. 
Compounding the barriers to entry, current access to this hardware is extremely limited, with only top tech companies receiving their orders in a timely manner. Normal people like you or me cannot buy the latest and greatest H100 Tensor Core GPU from Nvidia, which works directly with Microsoft, Amazon, Google and co to facilitate large bulk orders of GPUs, leaving regular people at the bottom of the waitlist. We have seen a number of initiatives between chip manufacturers and large corporations to create the infrastructure required to train and provide inference for these models, for example:</p><ol><li><p><strong>OpenAI</strong> - In 2020, Microsoft <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://blogs.microsoft.com/ai/openai-azure-supercomputer/">built</a> a supercomputer exclusively for OpenAI to train its GPT models. The supercomputer is a single system with more than 285,000 CPU cores, 10,000 Nvidia V100 and A100 GPUs and 400 gigabits per second of network connectivity for each GPU server.</p></li><li><p><strong>Microsoft</strong> - In 2022, Nvidia partnered with Microsoft to <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.techradar.com/pro/i-think-microsoft-azure-eagle-is-probably-the-most-important-tech-news-of-2023-that-you-havent-heard-of-heres-why">create</a> a 1,123,200-core supercomputer utilizing Microsoft&apos;s Azure cloud technology. 
Eagle is now the 3rd largest supercomputer in the world, with a maximum performance of 561 petaFLOPS generated from 14,400 Nvidia H100 GPUs and Intel’s Xeon Platinum 8480C 48-core CPUs.</p></li><li><p><strong>Google</strong> - In 2023, Google <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.datacenterdynamics.com/en/news/google-unveils-a3-supercomputer-vms-capable-of-scaling-to-26000-nvidia-h100-gpu/">announced</a> the A3 supercomputer, purpose-built for AI &amp; ML models. A3 combines Nvidia’s H100 GPUs with Google’s custom-designed 200 Gbps Infrastructure Processing Units (IPUs), allowing the A3 to host up to 26,000 H100 GPUs.</p></li><li><p><strong>Meta</strong> - By year-end 2024, Meta <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.instagram.com/p/C2QARHJR1sZ/">expects</a> to operate some 350,000 Nvidia H100 GPUs and, counting older GPUs such as the Nvidia A100s used to train Meta’s <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://ai.meta.com/blog/large-language-model-llama-meta-ai/">LLaMA</a> models, the equivalent of 600,000 H100s of compute.</p></li></ol><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/4464f85bad8ec6f80be60f108296d7a005dfd87e11bf641c134f27d9e36971c8.png" alt="Source: Statista" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Statista</figcaption></figure><p>The benefit of these feats of engineering for model training is immediately apparent. The large number of GPUs allows for parallel processing, greatly speeding up AI training and enabling larger models to be created. 
Take Microsoft&apos;s Eagle supercomputer: using the <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://mlcommons.org/benchmarks/training/">MLPerf benchmarking suite</a>, this system completed the GPT-3 (175-billion-parameter) LLM training benchmark in just <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://azure.microsoft.com/en-us/blog/azure-sets-a-scale-record-in-large-language-model-training/">4 minutes</a>. Its 10,752 H100 GPUs significantly speed up the process by leveraging their parallel processing capabilities, specialized Tensor Cores for deep learning acceleration, and high-speed interconnects like NVLink and NVSwitch. These GPUs&apos; large memory bandwidth and capacity, along with optimized CUDA and AI frameworks, facilitate efficient data handling and computations. Consequently, this setup enables distributed training strategies, allowing for simultaneous processing of different model parts, which drastically reduces training times for complex AI models.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/1e3b49b76d9b4bb9d809cbc2df5763e19c074ac40e70fc60e7b7d81997df7483.png" alt="Scale records on the model GPT-3 (175 billion parameters) from MLPerf Training v3.0 in June 2023 (3.0-2003) and Azure on MLPerf Training v3.1 in November 2023 (3.1-2002). Source: Microsoft" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Scale records on the model GPT-3 (175 billion parameters) from MLPerf Training v3.0 in June 2023 (3.0-2003) and Azure on MLPerf Training v3.1 in November 2023 (3.1-2002). 
Source: Microsoft</figcaption></figure><p>We have clearly established, then, that the powerhouse behind the training of these large models is compute power, primarily in the form of GPUs. The centralizing forces we run into here are twofold:</p><ol><li><p><strong>Exclusivity</strong> - Nvidia GPUs have a huge waitlist &amp; monopolistic corporations bulk order GPUs with priority over smaller orders / individuals.</p></li><li><p><strong>Costs</strong> - The sheer cost of these GPU configurations means only a small set of entities worldwide can train these models. For reference, each Nvidia H100 costs anywhere between $30,000 and $40,000, putting Meta’s 350,000 H100s at roughly $10.5 Billion at the low end and its 600,000-H100-equivalent compute infrastructure at up to $24 Billion.</p></li></ol><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/15c61818f371a75d3334dd24704b96f94e6f9111e6a68ba7015c217c55e69cc4.png" alt="Supercomputer Geographical Centralization. Source: Wikipedia TOP500" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Supercomputer Geographical Centralization. Source: Wikipedia TOP500</figcaption></figure><p>Amid the consolidation of computational power by major corporations, there&apos;s a parallel and strategic push by leading nations to enhance their computational capabilities, mirroring the intense competition of the Cold War&apos;s nuclear arms race. These countries are crafting and implementing comprehensive AI strategies, accompanied by a suite of regulatory measures aimed at securing technological supremacy. Notably, a Presidential executive order now mandates that foreign entities must obtain authorization to train AI models on U.S. territory. 
Additionally, export restrictions on microchips are set to hinder China&apos;s efforts to expand its supercomputing infrastructure, showcasing the geopolitical maneuvers to maintain and control the advancement of critical technologies.</p><p><strong>Chip Manufacturing</strong></p><p>Whilst Nvidia &amp; other semiconductor companies are at the cutting edge of chip design, they outsource all of their manufacturing to other corporations. Taiwan serves as the global hub for microchip production, <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.economist.com/special-report/2023/03/06/taiwans-dominance-of-the-chip-industry-makes-it-more-important">accounting</a> for more than 60% of the world&apos;s semiconductors and over 90% of the most sophisticated ones. The majority of these chips are produced by the Taiwan Semiconductor Manufacturing Company (TSMC), the sole manufacturer of the most advanced semiconductors. Nvidia’s partnership with TSMC is fundamental to the company&apos;s success and to the efficient production of H100 GPUs. TSMC <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.ft.com/content/bec85749-9354-4470-ac88-9323541c7bce">distinguishes</a> itself in the semiconductor industry with its advanced chip packaging patents, utilizing high-density packaging technology that stacks chips in three dimensions to enhance performance. This technology is crucial for producing chips designed for intensive data processing tasks, such as AI, enabling faster operation.</p><p>Whilst microchip production is currently running at maximum capacity, production faces risks from increased military threats from China towards Taiwan, a democratic island claimed by Beijing despite Taipei&apos;s vehement opposition. 
Geopolitical tensions in the region have risen, and worldwide AI tensions are escalating too, with the <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.reuters.com/technology/upcoming-us-rules-ai-chip-exports-aim-stop-workarounds-us-official-2023-10-15/">US banning</a> certain microchip exports to China so as not to strengthen China’s AI capabilities and military. Should China advance on Taiwan, it could strategically position itself to dominate microchip manufacturing and thus the AI race.</p><p><strong>Fine-Tuning &amp; Closed-source Models</strong></p><p>In the fine-tuning stage the model is trained on new, specific datasets, and the internal configurations that allow the model to make predictions or decisions based on input data are altered. These internal configurations are called parameters. In neural networks, ‘weights’ are coefficients applied to input data, determining the connection strength between units across different layers of the model, and are adjusted throughout training to minimize prediction errors. ‘Biases’, constants added before the activation function, ensure the model can make accurate predictions even when inputs are zero, facilitating pattern recognition by allowing shifts in the activation function&apos;s application.</p><p>Closed-source models like OpenAI&apos;s GPT series maintain the confidentiality of their training data and model architecture, meaning the specific configurations of their parameters remain exclusive. The owner of such a model retains complete control over how it is used, developed and deployed, which can lead to a number of centralizing forces within the fine-tuning stage of a model:</p><ol><li><p><strong>Censorship</strong> - Owners can decide what types of content the model generates or processes. They can implement filters that block certain topics, keywords, or ideas from being produced or recognized by the model. 
This could be used to avoid controversial subjects, comply with legal regulations, or align with the company&apos;s ethical guidelines or business interests. Since the launch of ChatGPT, its outputs have become increasingly censored and less useful. An extreme case of censorship of these models is showcased in China, where the WeChat bot Robot (built atop OpenAI’s foundational model) won’t answer questions such as “What is Taiwan?” or allow users to ask questions about Xi Jinping. In fact, through adversarial bypass techniques, a <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.bloomberg.com/news/newsletters/2023-05-02/china-s-chatgpt-answers-raise-questions-about-censoring-generative-ai?leadSource=uverify%20wall">reporter</a> was able to get Robot to admit that it was programmed to avoid discussing “politically sensitive content about the Chinese government or Communist Party of China.”</p></li><li><p><strong>Bias</strong> - In neural networks, the role of weights and biases is pivotal, yet their influence can inadvertently introduce bias, particularly if the training data lacks diversity. Weights, by adjusting the strength of connections between neurons, may disproportionately highlight or ignore certain features, potentially leading to a bias of omission where critical information or patterns in underrepresented data are overlooked. Similarly, biases, set to enhance learning capabilities, might predispose the model to favor certain data types if not calibrated to reflect a broad spectrum of inputs. The closed-source nature of these models can cause the model to neglect important patterns from specific groups or scenarios, skewing predictions and perpetuating biases in the model&apos;s output, meaning certain perspectives, voices or information are excluded or misrepresented. 
A good <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://x.com/mjuric/status/1761981816125469064?s=20">example</a> of <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://twitter.com/elonmusk/status/1762548055730192504">bias</a> and censorship by the model owner is Google’s latest and greatest LLM, Gemini.</p></li><li><p><strong>Verifiability</strong> - In a closed-source environment, users cannot confirm whether the claimed version of a model, such as ChatGPT 4 versus ChatGPT 3, is actually being used. This is because the underlying model architecture, parameters, and training data are not accessible for external review. Such opacity makes it difficult to ascertain if the latest advancements or features are indeed present or if older technologies are being passed off as newer versions, potentially affecting the quality and capabilities of the AI service received. For example, when using AI models to ascertain an applicant&apos;s creditworthiness for a loan, how can the applicant be sure that the same model was run for them as for other applicants? Or how can we be sure the model only used the inputs it was supposed to use?</p></li><li><p><strong>Dependency, lock-in and stagnation</strong> - Entities that rely on closed-source AI platforms or models find themselves dependent on the corporations that maintain these services, leading to a monopolistic concentration of power that stifles open innovation. This dependency arises because the owning corporation can, at any moment, restrict access or alter the model, directly impacting those who build upon it. 
A historical perspective reveals numerous instances of this dynamic: Facebook, which initially embraced open development with its public APIs to foster innovation, notably <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://techcrunch.com/2013/01/18/facebook-data-voxer/">restricted</a> access to applications like Vine as they gained traction. Similarly, Voxer, a messaging app that gained popularity in 2012 for allowing users to connect with their Facebook friends, lost its access to Facebook&apos;s &apos;Find Friends&apos; feature. This pattern is not exclusive to Facebook; many networks and platforms begin with an open-source or open innovation ethos only to later prioritize shareholder value, often at the expense of their user base. For-profit corporations eventually impose take rates in order to meet their stated goal of creating shareholder value; for example, Apple&apos;s App Store imposes a 30% fee on the revenues that are generated from apps. Another example is Twitter, which, despite its original commitment to openness and interoperability with the RSS protocol, eventually prioritized its centralized database, leading to a <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.seroundtable.com/twitter-rss-depreciated-16973.html">disconnection</a> from RSS in 2013 and, with it, the loss of data ownership and one&apos;s social graph. Amazon has also been <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.reuters.com/investigates/special-report/amazon-india-rigging/">accused</a> of using its internal data to replicate and prioritize its products over those of other sellers. 
These examples underscore a trend where platforms evolve from open ecosystems to more controlled, centralized models, impacting both innovation and the broader digital community.</p></li><li><p><strong>Privacy</strong> - The owners of these centralized models, large corporations such as OpenAI, retain all rights to use the prompt and user data to better train their models. This greatly inhibits user privacy. For example, Samsung employees inadvertently <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.forbes.com/sites/siladityaray/2023/05/02/samsung-bans-chatgpt-and-other-chatbots-for-employees-after-sensitive-code-leak/">exposed</a> highly confidential information by utilizing ChatGPT for assistance with their projects. The organization permitted its semiconductor division engineers to use this AI tool for debugging source code. However, this led to the accidental disclosure of proprietary information, including the source code of an upcoming software, internal discussion notes, and details about their hardware. Given that ChatGPT collects and uses the data inputted into it for its learning processes, Samsung&apos;s trade secrets have unintentionally been shared with OpenAI.</p></li></ol><p><strong>Reinforcement Learning from Human Feedback (RLHF)</strong></p><p>Reinforcement Learning from Human Feedback (RLHF) integrates supervised fine-tuning, reward modeling, and reinforcement learning, all underpinned by human feedback. In this approach, human evaluators critically assess the AI&apos;s outputs, assigning ratings that facilitate the development of a reward model attuned to human preferences. This process necessitates high-quality human input, highlighting the importance of skilled labor in refining these models. Typically, this expertise tends to be concentrated within a few organizations capable of offering competitive compensation for such specialized tasks. 
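The reward-modeling step at the heart of RLHF can be sketched in miniature: learn a scalar reward from pairwise human preferences via the Bradley-Terry/logistic loss. Everything below (the linear reward, the synthetic "helpfulness vs. verbosity" features) is an illustrative toy, not how production reward models over text are built:

```python
import math
import random

# Toy RLHF reward model: learn a linear reward r(x) = w . x from pairwise
# human preferences using the Bradley-Terry / logistic loss. The features
# stand in for rated model outputs; real reward models are neural networks.

def train_reward_model(prefs, dim, lr=0.1, epochs=200):
    """prefs: list of (preferred_features, rejected_features) pairs."""
    w = [0.0] * dim
    for _ in range(epochs):
        for good, bad in prefs:
            # P(good preferred over bad) = sigmoid(r(good) - r(bad))
            margin = sum(wi * (g - b) for wi, g, b in zip(w, good, bad))
            p = 1.0 / (1.0 + math.exp(-margin))
            # Gradient ascent on the log-likelihood of the human preference
            for i in range(dim):
                w[i] += lr * (1.0 - p) * (good[i] - bad[i])
    return w

def reward(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

random.seed(0)
# Hidden "human" preference: helpfulness (feature 0) helps, verbosity (1) hurts.
true_w = [2.0, -1.0]
prefs = []
for _ in range(200):
    a = [random.random(), random.random()]
    b = [random.random(), random.random()]
    ra = sum(t * x for t, x in zip(true_w, a))
    rb = sum(t * x for t, x in zip(true_w, b))
    prefs.append((a, b) if ra > rb else (b, a))

w = train_reward_model(prefs, dim=2)
print([round(x, 2) for x in w])
```

With enough labeled pairs, the learned weights recover the direction of the hidden preference, which is exactly why the quality and concentration of human feedback labor matters so much.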
Consequently, corporations with substantial resources are often in a better position to enhance their models, leveraging top talent in the field. This dynamic presents challenges for open-source projects, which may struggle to attract the necessary human labor for feedback without comparable funding or revenue streams. The result is a landscape where resource-rich entities are more likely to advance their AI capabilities, underscoring the need for innovative solutions to support diverse contributions in the development of AI technologies.</p><p><strong>Inference</strong></p><p>To effectively deploy machine learning (ML) or artificial intelligence (AI) models for user applications, it is imperative to ensure these models are equipped to manage real-world data inputs and provide precise, timely predictions or analyses. This necessitates careful deliberation on two pivotal aspects: the choice of deployment platform and the infrastructure requirements.</p><p><strong>Deployment Platform</strong></p><p>The deployment platform serves as the foundation for hosting the model, dictating its accessibility, performance, and scalability. Options range from on-premises servers, offering heightened control over data security and privacy, to cloud-based solutions that provide flexible, scalable environments capable of adapting to fluctuating demand. Additionally, edge computing presents a viable alternative for applications requiring real-time processing, minimizing latency by bringing computation closer to the data source. As with the pre-training stage, we run into similar centralization problems when deploying the model for real-world use:</p><p><strong>Infrastructure centralization -</strong> The majority of models are deployed on top of high-performance cloud infrastructure, of which there are not many options worldwide. 
As highlighted earlier, a small set of corporations have the facilities to process inference for these high parameter models and the majority are located in the US (as of 2023, 58.6% of all data centers were located in the USA). This is particularly relevant in light of the presidential executive order on AI and the <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://knowledge.dlapiper.com/dlapiperknowledge/globalemploymentlatestdevelopments/2023/comparing-the-US-AI-Executive-Order-and-the-EU-AI-Act.html">EU AI act</a> as it could greatly limit the number of countries that are able to train and provide inference for complex AI models.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/6cddb705075b5195aaf1faa39dbd639e408ffe26e125f5dcd9c7c014e17403ed.png" alt="Source: Statista" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Statista</figcaption></figure><p><strong>Costs</strong> - Another centralizing force within the inference stage is the significant costs involved in deploying these models on one&apos;s own servers, cloud infrastructure, or through edge computing. OpenAI has partnered with Microsoft to utilize Microsoft&apos;s Azure cloud infrastructure for serving its models. Dylan Patel, the chief analyst at consulting firm SemiAnalysis, estimated that OpenAI&apos;s server costs for enabling inference for GPT-3 were $700,000 per day. Importantly, this was when OpenAI was offering inference for their 175 billion parameter model, so, all things being equal, we would expect this number to have escalated well into the seven figures today. 
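To put that serving bill in perspective, a quick back-of-the-envelope calculation; only the $700,000-per-day figure comes from the estimate above, while the query volume is a purely illustrative assumption:

```python
# Back-of-the-envelope inference economics for a large closed-source model.
# Only the $700k/day figure comes from the SemiAnalysis estimate cited above;
# the daily query volume below is a hypothetical assumption for illustration.
daily_server_cost_usd = 700_000
annualized_cost_usd = daily_server_cost_usd * 365

assumed_daily_queries = 100_000_000      # hypothetical load
cost_per_query_usd = daily_server_cost_usd / assumed_daily_queries

print(f"annualized: ${annualized_cost_usd:,}")   # $255,500,000
print(f"per query:  ${cost_per_query_usd:.4f}")  # $0.0070
```

Even at fractions of a cent per query, only a handful of entities can front a nine-figure annual infrastructure bill, which is the centralizing force at work here.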
In addition to the geographical and jurisdictional centralization of these data centers, we also observe this necessary infrastructure being consolidated within a few corporations (84.9% of cloud <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://techcrunch.com/2024/02/02/ai-pushes-quarterly-cloud-infrastructure-revenue-to-74b-globally/">revenues</a> were generated by four companies in 2023).</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/e6e1dd2e62e58478c7d50c93c72e54f31eeff9eb4fbae662c7670f7ee90dfa9a.png" alt="Source: Amazon, Microsoft, Google, Equinix, Statista" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Amazon, Microsoft, Google, Equinix, Statista</figcaption></figure><p><strong>Centralized Frontends</strong></p><p>Centralized hosting of frontends involves delivering the user-interface components of websites and web applications from one primary location or a select few data centers managed by a handful of service providers. This method is widely adopted for rolling out web applications, particularly those leveraging AI technologies to offer dynamic content and interactive user experiences. The frontend is therefore susceptible to take-downs through regulations or through changes in the policies of the service providers. 
We have seen this play out in Mainland China as citizens are blocked from interacting with the frontends of popular AI interfaces such as <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://aibusiness.com/nlp/china-cracks-down-on-chatgpt-access">ChatGPT</a> and <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.semafor.com/article/10/20/2023/ai-platform-hugging-face-confirms-china-blocked-it">Hugging Face</a>.</p><p><strong>Conclusion</strong></p><p>In conclusion, the status quo for AI suffers from a number of centralizing and monopolistic forces that enable a minority of the world&apos;s largest entities to control and distribute models to the population. We have seen from the failures of Web2 that the misalignment of incentives between the user and the corporation poses a dire threat to our freedoms, privacy and right to use AI. The impending regulation surrounding AI and the flourishing open-source space show we are at a pivotal moment in the advancement of the technology and that we should do everything in our power to ensure it remains free and open source for all to use. In our next blog we will cover how crypto at the intersection of AI is enabling these free and open-source systems to scale, improving the status quo and improving crypto.</p><pre data-type="codeBlock" text="Part 2
"><code></code></pre><p><strong>Introduction to Crypto AI</strong></p><p>The influence of AI on our world is becoming increasingly evident across various aspects of daily life and industry. From enhancing the efficiency of operations in sectors such as healthcare, finance, and manufacturing to transforming the way we interact with technology through personal assistants and smart devices, AI&apos;s impact is profound. In our <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://mirror.xyz/0xC36A87666c505Fe695fc097d238725ff4d34877D/ziA4V0OULXjVQKeyoKebguXEIW4KEFuDJWT54k0sZSk">first report</a>, we covered all the centralizing forces within model creation, culminating in increasing control amassed by major AI providers such as OpenAI and Microsoft. Approaching from a philosophical perspective, AI embodies digital knowledge. Within the vast expanse of the digital domain, knowledge stands out as a prime candidate for decentralization. This article delves into the convergence of AI and cryptocurrency, exploring how permissionless, uncensorable networks for settlement and incentivization can foster the secure and democratic evolution of AI. 
Additionally, we will scrutinize how AI can contribute to the enhancement of cryptocurrency ecosystems, creating a symbiotic relationship that promotes growth and innovation in both fields.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/643e7b2c304f92bcfd4c70f3d7b12cb988c71accef017623b2873e6e9b955a05.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p><strong>Pre-Training</strong></p><p>As we have discussed extensively in part 1, within the pre-training phase of model generation we encounter multiple centralizing forces, namely:</p><ol><li><p>Closed-source data &amp; data access</p></li><li><p>Geographical centralization of resources</p></li><li><p>Resource costs</p></li><li><p>GPU supply-side exclusivity</p></li></ol><p>The collection of this data in order to train models is vital; however, the following issues are prevalent:</p><ol><li><p><strong>Data Access</strong> - Major firms like Microsoft, Google, and OpenAI have superior data access through partnerships, their own user data, or the capability to establish extensive web scraping operations.</p></li><li><p><strong>Closed-Source Data</strong> - When training models, the data used requires careful consideration of bias.</p></li><li><p><strong>Data Provenance</strong> - Determining and verifying the source of data is becoming increasingly important, as it ensures the integrity and reliability of the data, which is crucial when training a model.</p></li></ol><p><strong>Collection &amp; Preparation</strong></p><p>Up to <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" 
href="https://www.clickworker.com/customer-blog/data-preparation-for-ai/?ref=grass.ghost.io#:~:text=Data%20preparation%20is%20a%20key,and%20up%2Dto%2Ddate.">80%</a> of the effort in deploying AI models is dedicated to data preparation. This task becomes more time-consuming and complex with fragmented or unstructured data, with exporting and cleansing being the two critical steps in the process. The competitive landscape of AI is intensifying as major websites with investments or strategic partnerships in centralized AI entities take measures to safeguard their position by restricting smaller contenders&apos; access to vital data. These websites have adopted policies that effectively make data access prohibitively expensive, excluding all but the most well-funded AI laboratories. They frequently employ strategies such as blocking IP addresses from recognized data centers, and in some cases, they engage in intentional data poisoning—a tactic where companies deliberately corrupt shared data sources to disrupt their rivals&apos; AI algorithms.</p><p>Valid residential IP addresses and user-agents hold significant value, as they enable the collection of internet data in a way that ensures the retrieval of accurate information. Every ordinary internet user possesses this potential, and if leveraged collectively within a network, it could facilitate the extensive indexing of the web. This, in turn, would empower open-source and decentralized AI initiatives by providing them with the vast datasets necessary for training. 
The use of crypto incentives to accurately reward participation in this DePIN network can create a virtuous flywheel that can enable this network to compete with the likes of Google and Microsoft who are the only entities who have <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.cmcmarkets.com/en/opto/how-google-indexed-the-internet">indexed</a> the whole internet:</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/2a403e0b50b598fcfff764ffcce6d7d696af47baa10279db16e95d87eb8aeb2e.png" alt="Source: Messari" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Messari</figcaption></figure><p>The DePIN <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://messari.io/report/using-crypto-to-build-real-world-infrastructure">flywheel</a> works as follows:</p><ol><li><p>Participants contributing to the network&apos;s growth are motivated through inflationary token rewards, effectively subsidizing their efforts. These incentives are designed to bolster the network&apos;s early development until it can earn steady income from user fees.</p></li><li><p>The expansion of the network draws in developers and creators of products. Furthermore, the network&apos;s financial support for those who supply its services enables them to provide these services at lower costs, which in turn entices end users.</p></li><li><p>As end users start to pay for the services offered by the network, the income for both the providers and the network itself rises. 
This increase in revenue generates a positive feedback loop, drawing in additional providers and investors to the network.</p></li><li><p>Because the network is user-owned, value can be distributed back to users, typically via a token burn model or through distribution of revenues. With these models, as the network becomes more useful and tokens are either removed from circulation in a burn model or staked by users, the value of the tokens tends to rise. This increase in token value further encourages more providers to join the network, perpetuating a beneficial cycle.</p></li></ol><p>Utilizing DePIN to compile publicly accessible data could address the problem of proprietary datasets in AI, which often embed biases in the resultant models. Training open-source AI models on data from such a network would enhance our ability to detect, assess, and correct biases. Currently, the opaqueness surrounding the datasets used for training AI models hinders our comprehension of the biases they may contain, compounded by the difficulty in contrasting models trained on diverse datasets. The creation of this decentralized data network could incentivize various contributors to provide datasets with clear provenance, while also enabling the tracking of how these datasets are utilized in both the initial training and subsequent fine-tuning of foundational AI models.</p><p><strong>Grass</strong></p><p>Grass is one such network focused on data acquisition, cleaning and provenance. Functioning similarly to traditional residential proxy services, Grass harnesses the untapped potential of users&apos; idle bandwidth for operations such as web scraping. By installing a Google Chrome extension, users contribute to the Grass network whenever they are online. 
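The four-step flywheel enumerated above can be captured in a toy simulation. Every parameter here (the decaying reward schedule, the adoption elasticities) is invented purely to illustrate the feedback loop, not to model any real network:

```python
# Toy simulation of the four-step DePIN flywheel: token emissions subsidize
# early providers, capacity attracts paying users, fee revenue attracts more
# providers, and the loop compounds. All constants are illustrative only.

def simulate_flywheel(years=5, providers=100, users=50):
    history = []
    for year in range(years):
        emissions = 1000 / (year + 1)            # step 1: decaying token subsidy
        capacity = providers * 10                # step 2: supply enables services
        fee_revenue = min(users, capacity) * 2   # step 3: paying users create revenue
        provider_income = emissions + fee_revenue
        # step 4: value accrual draws in more providers and users
        providers += int(provider_income * 0.05)
        users += int(fee_revenue * 0.10)
        history.append((year, providers, users, fee_revenue))
    return history

for year, providers, users, revenue in simulate_flywheel():
    print(f"year {year}: providers={providers} users={users} fee_revenue={revenue}")
```

Note how the network keeps growing even as the emission subsidy decays: fee revenue gradually replaces inflationary rewards, which is the hand-off the flywheel is designed to achieve.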
This system repurposes any surplus bandwidth for designated tasks, like the extraction of large corpora, such as philosophical texts. Utilizing residential proxies, the Grass network navigates around common obstacles such as rate limits, blocks, and <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.forcepoint.com/blog/x-labs/data-poisoning-gen-ai">data poisoning</a> attacks. This approach allows Grass to efficiently gather substantial volumes of online data in its intended format, optimizing the process of data extraction.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/96acb476116dae8723dabd1186a7594be7a9829cef619edf86e87b11637f06d3.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p>On top of enabling streamlined data acquisition, a key advantage of Grass lies in the compensation model: users are awarded the full value of their bandwidth contribution, rather than just a small portion of the proceeds.</p><p><strong>Data Provenance</strong></p><p>Led by AI, data generation is increasing exponentially, from 2 Zettabytes in 2010 to an estimated 175 Zettabytes in 2025. Forecasts predict a surge to over 2000 Zettabytes by 2035, a more than tenfold increase over the following decade. 
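The data-volume estimates just quoted imply the following compound annual growth rates, a quick check using only the Zettabyte figures from the text:

```python
# Implied compound annual growth rates (CAGR) from the data-volume estimates
# quoted above: 2 ZB (2010), 175 ZB (2025), ~2,000 ZB (2035).
def cagr(start, end, years):
    return (end / start) ** (1 / years) - 1

growth_2010_2025 = cagr(2, 175, 15)      # ~34.7% per year
growth_2025_2035 = cagr(175, 2000, 10)   # ~27.6% per year
multiple_2025_2035 = 2000 / 175          # ~11.4x in a decade

print(f"2010-2025 CAGR: {growth_2010_2025:.1%}")
print(f"2025-2035 CAGR: {growth_2025_2035:.1%}")
print(f"2025-2035 multiple: {multiple_2025_2035:.1f}x")
```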
This is in part due to the creation of AI-generated content; in fact, a <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.europol.europa.eu/cms/sites/default/files/documents/Europol_Innovation_Lab_Facing_Reality_Law_Enforcement_And_The_Challenge_Of_Deepfakes.pdf">report</a> into deepfakes and AI-generated content by Europol estimated that AI-generated content could account for as much as 90% of information on the internet in a few years’ time, as ChatGPT, DALL-E and similar programs flood language and images into online space.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/432e5abac36529e3c1fd8c0e7fe352a1300093f51452895ba784069ff6057a17.png" alt="Source: Statista digital economy compass 2019" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Statista digital economy compass 2019</figcaption></figure><p>As discussed in part 1, the inherent biases in an AI model&apos;s outputs are often a reflection of the data on which it was trained. Consider the potential pitfalls of using data harvested by industrial-scale internet scrapers for pre-training: with the proliferation of AI-generated content online, there&apos;s a heightened risk of feeding models with inaccurate or skewed data. A clear manifestation of this issue is observed in Google&apos;s Gemini LLM, which, in its quest for equitable representation, might introduce historically inaccurate elements into generated content. For instance, it might produce an image of the founding fathers of America that includes a diverse range of ethnicities, diverging from historical accuracy. Therefore, the provenance of data is crucial in the training of models. 
Currently, we are compelled to trust that the proprietary datasets large corporations employ for model training are both accurate and genuinely sourced, rather than being generated by another AI model.</p><p>However, there are a number of crypto projects on the market today that offer data provenance solutions. Decentralized data storage solutions, such as Filecoin, guarantee data provenance through the use of blockchain technology. This technology creates a clear, unchangeable ledger that records data storage, access, and any alterations over time, ensuring transparency and immutability in the data&apos;s history. By enabling individuals to offer their unused storage space, Filecoin creates a vast, decentralized network of data storage providers. Each transaction on Filecoin, from the initial agreement between data owners and storage providers to every instance of data access or modification, is permanently recorded on the blockchain. This creates an indelible and transparent history of data interactions, making it straightforward to track the data&apos;s storage, access, and movement. Furthermore, Filecoin employs cryptographic proofs to guarantee the integrity and immutability of the data stored within its network. Storage providers are required to periodically demonstrate, via verifiable proofs, that they are faithfully storing the data as per the contractual agreement, adding an extra layer of transparency and enhancing the overall security of the data stored. The clear data provenance and immutable ledger have started to attract a number of respected institutions, and we are seeing entities such as NASA, the University of California, the National Human Genome Research Institute and the National Library of Medicine utilize this storage solution. 
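The periodic storage proofs mentioned above can be illustrated in drastically simplified form. Filecoin's actual Proof-of-Replication and Proof-of-Spacetime constructions are far more sophisticated; this is just a generic hash-based challenge-response sketch:

```python
import hashlib

# Drastically simplified challenge-response storage proof: the verifier sends
# a random nonce, and the provider must answer with a hash over the nonce and
# the stored data, which it can only do if it actually kept the data.
# Filecoin's real proofs are far more involved; this is a generic sketch.

def digest(nonce: bytes, data: bytes) -> str:
    return hashlib.sha256(nonce + data).hexdigest()

class Client:
    def __init__(self, data: bytes):
        self.data = data  # kept here only to precompute expected answers

    def challenge(self, provider, nonce: bytes) -> bool:
        return provider.prove(nonce) == digest(nonce, self.data)

class Provider:
    def __init__(self, data: bytes):
        self.stored = data

    def prove(self, nonce: bytes) -> str:
        return digest(nonce, self.stored)

data = b"genome dataset v1"
honest = Provider(data)
cheater = Provider(b"")  # discarded the data after taking the deal
client = Client(data)

print(client.challenge(honest, b"nonce-1"))   # True
print(client.challenge(cheater, b"nonce-1"))  # False
```

Because the nonce is fresh for every challenge, a provider cannot precompute answers and discard the data, which is the core idea behind requiring periodic verifiable proofs.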
Filecoin is starting to facilitate more and more deals in the greater-than-1000-tebibyte range as a direct result of this.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/bf9a8893fcc9c2a9a1bb8ada3fabbe809bfa8f0e9893e99e5cb20ea43e5b6516.png" alt="Source: Filfox" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Filfox</figcaption></figure><p>On top of the benefits of immutable, censorship-resistant data provenance guarantees, we also find that onboarding the long-tail of storage devices from around the world drives the price of storage down, making decentralized storage solutions cheaper than the centralized alternatives. For example, storing 1TB of data for one month on Filecoin costs $0.19, whilst storing on Amazon’s S3 is roughly 121x more expensive, costing $23 for the month. Due to these benefits we are starting to see decentralized storage solutions growing.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/c3cb6c6f14a5a1893487beceb360c7a310647754af88fb0a0083875303174e0b.png" alt="Source: Coingecko centralized vs decentralised storage costs" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Coingecko centralized vs decentralised storage costs</figcaption></figure><p><strong>Resource Allocation</strong></p><p>The availability of GPUs was already constrained before the advent of ChatGPT, and it wasn’t uncommon to see periods of heightened demand from use cases such as crypto mining. 
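The storage cost gap cited above is easy to sanity-check. The prices are the figures quoted in the text; real Filecoin and S3 pricing varies by deal, region, and storage tier:

```python
# Sanity check of the quoted storage prices for 1 TB stored for one month.
# These are the article's figures; actual pricing varies by provider, region,
# storage tier, and (for Filecoin) the individual deal.
filecoin_usd_per_tb_month = 0.19
s3_usd_per_tb_month = 23.00

cost_ratio = s3_usd_per_tb_month / filecoin_usd_per_tb_month
print(f"S3 is ~{cost_ratio:.0f}x the cost of Filecoin")  # ~121x
```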
Following the launch of ChatGPT and the Cambrian explosion of foundational models, the demand for GPUs has surged dramatically, possibly even a hundredfold. Rarely have we seen such a significant disparity between the demand for a resource and its accessible supply, even though the aggregate supply exceeds demand. If every GPU worldwide were capable of being organized and utilized for AI training today, we would be facing an excess rather than a shortage.</p><p>The long-tail of GPUs is scattered across various platforms and devices, often underutilized or used for purposes far less demanding than their full capabilities allow, for example:</p><ol><li><p><strong>Gaming PCs and Consoles</strong>: High-end GPUs underused outside gaming could support distributed computing or AI training.</p></li><li><p><strong>Corporate Workstations</strong>: Workstations with GPUs for creative tasks could be redirected for computational use during off-hours.</p></li><li><p><strong>Data Centers</strong>: Despite their capacity, some data centers have GPUs with spare capacity ideal for AI tasks.</p></li><li><p><strong>Academic Institutions</strong>: Universities with high-performance GPUs for research might not fully utilize them at all times, offering potential for broader use.</p></li><li><p><strong>Cloud Computing Platforms</strong>: These platforms sometimes have more GPU resources than needed, presenting opportunities for optimized utilization.</p></li><li><p><strong>Edge Devices</strong>: IoT and smart appliances have GPUs that, while less powerful individually, offer substantial collective processing power.</p></li><li><p><strong>Cryptocurrency Mining Rigs</strong>: Market downturns make some rigs less suitable for mining but still valuable for computational tasks.</p></li></ol><p>Previously, there was no incentivization and coordination layer that could effectively manage this two-sided marketplace for compute, whilst also addressing the myriad technical issues that must 
be considered when selecting GPUs for training. These issues primarily arise from the distributed and non-uniform nature of the aggregation of long-tail GPUs:</p><ol><li><p><strong>Diverse GPU Capabilities for Varied Tasks</strong>: Graphics cards vary widely in design and performance capabilities, making some unsuitable for specific AI tasks. Success in this domain hinges on effectively pairing the right GPU resources with the corresponding AI workloads.</p></li><li><p><strong>Adapting Training Methods for Increased Latency</strong>: Currently, foundational AI models are developed using GPU clusters with ultra-low latency links. In a decentralized setup, where GPUs are distributed across various locations and connected over the public internet, latency can significantly rise. This situation presents a chance to innovate training methodologies that accommodate higher latency levels. Such adjustments could optimize the utilization of geographically dispersed GPU clusters.</p></li><li><p><strong>Security and Privacy</strong>: Utilizing GPUs across various platforms raises concerns about data security and privacy. Ensuring that sensitive or proprietary information is protected when processed on external or public GPUs is crucial.</p></li><li><p><strong>Quality of Service</strong>: Guaranteeing a consistent level of service can be challenging in a decentralized environment. Variability in GPU performance, network stability, and availability can lead to unpredictable processing times and outcomes.</p></li></ol><p>Networks leveraging crypto incentives to coordinate the development and operation of this essential infrastructure can achieve greater efficiency, resilience, and performance (although not quite yet) compared to their centralized counterparts. 
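The first of those issues, pairing the right GPU with the right workload, can be sketched with a toy scheduler. The device specs, job requirements, and greedy policy below are all hypothetical; production schedulers also weigh latency, locality, reliability, and price discovery:

```python
from dataclasses import dataclass

# Toy scheduler for a heterogeneous GPU pool: pair each AI workload with the
# cheapest device that satisfies its VRAM and bandwidth requirements.
# All specs and prices below are illustrative, not real market data.

@dataclass
class Gpu:
    name: str
    vram_gb: int
    bandwidth_gbps: float
    usd_per_hour: float

@dataclass
class Job:
    name: str
    min_vram_gb: int
    min_bandwidth_gbps: float

def assign(jobs, pool):
    """Greedy matching: cheapest adequate GPU per job, each GPU used once."""
    free = sorted(pool, key=lambda g: g.usd_per_hour)
    placement = {}
    for job in jobs:
        for gpu in free:
            if gpu.vram_gb >= job.min_vram_gb and gpu.bandwidth_gbps >= job.min_bandwidth_gbps:
                placement[job.name] = gpu.name
                free.remove(gpu)
                break
    return placement

pool = [
    Gpu("gaming-rtx4090", 24, 1.0, 0.40),
    Gpu("datacenter-a100", 80, 2.0, 1.60),
    Gpu("edge-device", 8, 0.2, 0.05),
]
jobs = [
    Job("llm-finetune", min_vram_gb=40, min_bandwidth_gbps=1.5),
    Job("image-inference", min_vram_gb=8, min_bandwidth_gbps=0.1),
]

print(assign(jobs, pool))
# The fine-tune lands on the A100; the light inference job on the cheap edge device.
```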
Although nascent, we can already see the benefits of onboarding the long-tail of GPU power into Decentralized Physical Infrastructure Networks (DePIN):</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/da4ac7c29945841f214717a27a0de8de39f451c1350e41768c5a910eaa17619c.png" alt="Source: Messari" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Messari</figcaption></figure><p>The costs to run some of the top-performance GPUs on DePIN networks are 60-80% cheaper than their centralized counterparts.</p><p>It’s still early within the generalized compute marketspace, and despite the lower costs to utilize this infrastructure, we are seeing growing pains in terms of performance and uptime. Nevertheless, the demand for GPUs has become apparent, with Akash’s daily spend increasing by 20.32x since their GPU market went live in late August 2023.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/da89d5673574a85362fb0ab558ae75ee015b6a187665c8bd489da53efaedc157.png" alt="Source: Akash" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Akash</figcaption></figure><p>To facilitate this massive increase in demand, Akash’s GPU capacity has had to scale quickly:</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/8e7901a715155c8f2f02cbfb661cd8f8fbe10ef5229e3fe6e6049ab725deee91.png" alt="Source: Akash" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" 
nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Akash</figcaption></figure><p>The only way to compete with the centralized, monopolistic corporations and their ability to spend billions on compute power each year to improve their models is to harness the power of DePIN that provides decentralized, permissionless access to compute. Crypto incentives enable software to pay for the hardware without a central authority. The very dynamics that have allowed the Bitcoin network to become <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://x.com/CathieDWood/status/1748207073257501009?s=20">“the largest computer network in the world, a network orders of magnitude larger than the combined size of the clouds that Amazon, Google, and Microsoft have built over the last 15-20 years</a>”, can allow decentralized and open source AI to compete with centralized incumbents.</p><p><strong>io.net</strong></p><p>Another example of the DePIN thesis is io.net. The platform aggregates a diverse array of GPUs into a communal resource pool, accessible to AI developers and businesses, with their mission statement being “Putting together one million GPUs” into a network. io.net leverages token incentives to fundamentally decrease the expenses associated with acquiring and retaining supply-side resources, thereby diminishing costs for end consumers. Presently, this network is fueled by thousands of GPUs sourced from data centers, mining operations, and consumer-level hardware and has over 62,000 compute hours.</p><p>While pooling these resources presents significant value, AI workloads can&apos;t seamlessly transition from centralized, high-end, low-latency hardware to distributed networks of heterogeneous GPUs. The challenge here lies in efficiently managing and allocating tasks across a wide variety of hardware, each with its own memory, bandwidth, and storage specifications. 
Io.net implements ‘clustering’ by overlaying custom-designed networking and orchestration layers on top of distributed hardware, effectively activating and integrating them in order to perform ML tasks. Utilizing Ray, Ludwig, Kubernetes, and other open-source distributed computing frameworks, the network enables machine learning engineering and operations teams to effortlessly scale their projects across an extensive network of GPUs with only minor modifications needed. We believe the limited demand for compute networks like Render and Akash is primarily attributed to their model of renting out single GPU instances. This approach leads to slower and less efficient machine learning training, hindering their attractiveness to potential users seeking robust computational resources.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/93345a89d6ca9b12e58bc3d3a39cfabb6ac167946216b2b5747aa50823f361e1.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p>The IO Cloud is engineered to streamline the deployment and management of decentralized GPU clusters, known as Io workers, on demand. By creating on-demand clusters, machine learning teams can effectively distribute their workloads across io.net&apos;s GPU network. This system utilizes advanced libraries to address the complexities of orchestration, scheduling, fault tolerance, and scalability, ensuring a more efficient operational workflow. 
At its core, IO Cloud employs the Ray distributed computing framework for Python, a solution that has been rigorously tested and adopted by OpenAI for training cutting-edge models like GPT-3 and GPT-4 on over 300,000 servers.</p><p><strong>Conclusion</strong></p><p>DePIN employs token incentives to significantly reduce the costs involved in acquiring and maintaining supply-side resources, which in turn lowers the expenses for end consumers, creating a virtuous flywheel effect that enables the network to expand rapidly. To rival the efficiency of centralized alternatives, DePIN networks are essential. However, in their present development stage, these networks face challenges with reliability, including susceptibility to downtime and software bugs.</p><p><strong>Fine-Tuning</strong></p><p>During the fine-tuning phase, the model&apos;s parameters are established. To summarize Part 1, we observe several centralizing influences resulting from its proprietary nature:</p><ol><li><p><strong>Censorship</strong> - Owners have the authority to determine the kinds of content that the model creates or handles.</p></li><li><p><strong>Bias</strong> - Owners&apos; choices of training data and tuning methods shape the perspectives and biases embedded in the model&apos;s outputs.</p></li><li><p><strong>Verifiability</strong> - Within a proprietary setting, the verifiability of whether the stated version of a model is genuinely in operation is unattainable for users.</p></li><li><p><strong>Dependency and Lock-in</strong> - Entities using proprietary AI platforms or models become dependent on the controlling corporations, fostering a monopolistic power dynamic that hampers open innovation.</p></li><li><p><strong>RLHF</strong> - Refining models with RLHF demands skilled labor, typically concentrated in wealthy organizations that can pay for top talent, giving them a competitive advantage in model enhancement.</p></li></ol><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img 
src="https://storage.googleapis.com/papyrus_images/abfbcb38e16b36cb0715f978ad5bdddd3f88110e32529b4f06a7d00b05f9dc71.png" alt="Source: Tech Target" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Tech Target</figcaption></figure><p>In early March 2023, the open-source community gained access to its first highly capable foundational model when Meta&apos;s LLaMA was unexpectedly <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.vice.com/en/article/xgwqgw/facebooks-powerful-large-language-model-leaks-online-4chan-llama">leaked</a> to the public. Despite lacking instruction or conversation tuning, as well as RLHF, the community quickly grasped its importance, sparking a wave of innovation with significant advancements occurring within days of each other. The open-source community created variations of the model enhanced with <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://crfm.stanford.edu/2023/03/13/alpaca.html">instruction tuning</a>, <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://github.com/ggerganov/llama.cpp">quantization</a>, improved <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://lmsys.org/blog/2023-03-30-vicuna/">quality</a>, <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://arxiv.org/pdf/2303.16199.pdf">human evaluations</a>, <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://arxiv.org/pdf/2303.16199.pdf">multimodality</a>, <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://drive.google.com/file/d/10iR5hKwFqAKhL3umx8muOWSRm7hs5FqX/view">RLHF</a>, among other improvements, with many 
developments building upon the previous ones. A leaked internal memo by a Google researcher eloquently details the future of AI and the struggles of closed-source development. Below is a concise excerpt:</p><blockquote><p>“<strong><em>We Have No Moat</em></strong></p><p><strong><em>And neither does OpenAI</em></strong></p><p><em>We’ve done a lot of looking over our shoulders at OpenAI. Who will cross the next milestone? What will the next move be?</em></p><p><em>But the uncomfortable truth is, we aren’t positioned to win this arms race and neither is OpenAI. While we’ve been squabbling, a third faction has been quietly eating our lunch.</em></p><p><em>I’m talking, of course, about </em><strong><em>open source</em></strong><em>. Plainly put, they are lapping us. Things we consider “major open problems” are solved and in people’s hands today. Just to name a few:</em></p><p><strong><em>LLMs on a Phone</em></strong>: <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://twitter.com/thiteanish/status/1635678053853536256">People are running foundation models on a Pixel 6</a> at 5 tokens / sec.</p><p><strong><em>Scalable Personal AI</em></strong>: You can <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://github.com/tloen/alpaca-lora">finetune</a> a personalized AI on your laptop in an evening.</p><p><strong><em>Responsible Release</em></strong>: This one isn’t “solved” so much as “obviated”. 
There are <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://civitai.com/">entire websites </a>full of art models with no restrictions whatsoever, and text is <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://sungkim11.medium.com/list-of-open-sourced-fine-tuned-large-language-models-llm-8d95a2e0dc76">not far behind</a>.</p><p><strong><em>Multimodality</em></strong>: The current multimodal ScienceQA SOTA was <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://arxiv.org/pdf/2303.16199.pdf">trained</a> in an hour.</p><p><em>While our models still hold a slight edge in terms of quality, the gap is </em><a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://arxiv.org/pdf/2303.16199.pdf"><em>closing astonishingly quickly</em></a><em>. Open-source models are faster, more customizable, more private, and pound-for-pound more capable. They are doing things with </em><a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://lmsys.org/blog/2023-03-30-vicuna/"><em>$100 and 13B params</em></a><em> that we struggle with at $10M and 540B. And they are doing so in weeks, not months. This has profound implications for us:</em></p><p><em>We have no secret sauce. Our best hope is to learn from and collaborate with what others are doing outside Google. We should prioritize enabling 3P integrations.</em></p><p><em>People will not pay for a restricted model when free, unrestricted alternatives are comparable in quality. We should consider where our value add really is.</em></p><p><em>Giant models are slowing us down. In the long run, the best models are the ones</em></p><p><em>which can be iterated upon quickly. 
We should make small variants more than an afterthought, now that we know what is possible in the &lt;20B parameter regime…</em></p><p><strong><em>Directly Competing With Open Source Is a Losing Proposition</em></strong></p><p><em>This recent progress has direct, immediate implications for our business strategy. Who would pay for a Google product with usage restrictions if there is a free, high quality alternative without them?</em></p><p><em>And we should not expect to be able to catch up. </em><a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://openuk.uk/wp-content/uploads/2021/07/State-of-Open-Phase-Two.pdf"><em>The modern internet runs on open source</em></a><em> for a reason. Open source has some significant advantages that we cannot replicate.</em></p><p><strong><em>We need them more than they need us</em></strong></p><p><em>Keeping our technology secret was always a tenuous proposition. Google researchers are leaving for other companies on a regular cadence, so we can assume they know everything we know, and will continue to for as long as that pipeline is open.</em></p><p><em>But holding on to a competitive advantage in technology becomes even harder now that cutting edge research in LLMs is affordable. Research institutions all over the world are building on each other’s work, exploring the solution space in a breadth-first way that far outstrips our own capacity. We can try to hold tightly to our secrets while outside innovation dilutes their value, or we can try to learn from each other.”</em></p></blockquote><p>Open source models offer transparent innovation whilst helping to address issues of censorship and bias inherent in proprietary models. With model parameters and code accessible to everyone, foundational models can be executed and modified by anyone, allowing for the removal of embedded censorship. 
Additionally, the transparency of open parameters makes it simpler to identify and mitigate any underlying biases in the model and training method. The nature of open source models enables extensive <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard">comparisons</a>.</p><p><strong>Verifiability</strong></p><p>We also discussed the crucial issue of our reliance on centralized model providers to accurately process our prompts and, often, to handle our private data securely. Crypto has been at the forefront of pioneering cryptographic proof systems like Zero Knowledge Proofs (ZKP), Optimistic Proofs (OP) and Trusted Execution Environments (TEE), enabling the verification of claims without exposing the underlying data, thus safeguarding privacy while also offering solutions for scaling. Cryptographic proofs enable users to transact without revealing their identities, ensuring privacy, and facilitate scaling by allowing the offloading of computationally intensive tasks to an auxiliary layer, such as rollups, or off-chain. Simultaneously, they provide a proof on-chain that the correct procedure was adhered to (TEEs instead rely on hardware attestation rather than a cryptographic proof).</p><p>Machine learning and AI models are notoriously computationally heavy and thus it would be prohibitively expensive to run these models inside smart contracts on-chain. 
Through the development of various proving systems, models can be utilized in the following manner:</p><ol><li><p>User submits a prompt to a specific LLM / model.</p></li><li><p>This request for inference is relayed off-chain to a GPU / computer that enters the prompt through the desired model.</p></li><li><p>The inference is then returned to the user with a cryptographic proof attached, verifying that the prompt was run through the specified model.</p></li></ol><p>Verifiability is currently lacking in the AI industry, and as AI models become increasingly integrated into our lives and work, it is essential that we have the ability to verify the authenticity of their outputs. Take, for example, sectors like healthcare and law, where AI models assist in diagnosing diseases or analyzing legal precedents: the inability of professionals to verify the model&apos;s source and accuracy can foster mistrust or lead to errors. For healthcare providers, not knowing if an AI&apos;s recommendations are based on the most reliable model could adversely affect patient care. Similarly, for lawyers, uncertainty about an AI&apos;s legal analysis being up-to-date could compromise legal strategies and client outcomes. Conversely, if a user wishes to utilize a model with their data while keeping it private from the model provider due to confidentiality concerns, they can process their data with the model independently, without disclosing the data, and then confirm the accurate execution of the desired model by providing a proof.</p><p>The result of these verifiability systems is that models can be integrated into smart contracts whilst preserving the robust security assumptions of running these models on-chain. 
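</p><p>As a rough illustration of the three-step flow above, the sketch below stands in for a verifiable inference network using a plain hash commitment. Every name here is hypothetical, and a real network would replace the bare hash with a zk proof, an optimistic fraud-proof window, or a TEE attestation; the point is only the shape of the protocol: serve an output together with evidence binding it to a specific model and prompt, which the user can then check.</p>

```python
import hashlib
import json

# Toy stand-in for a model: deterministic, so a verifier could re-derive it.
def run_model(model_id: str, prompt: str) -> str:
    return f"[{model_id}] response to: {prompt}"

def attest(model_id: str, prompt: str, output: str) -> str:
    """Bind (model, prompt, output) into a commitment. A real network would
    use a zk/optimistic proof or TEE attestation instead of a bare hash."""
    payload = json.dumps([model_id, prompt, output], sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def serve_inference(model_id: str, prompt: str) -> tuple[str, str]:
    # Step 2: the prompt is relayed to off-chain compute running the model.
    output = run_model(model_id, prompt)
    # Step 3: the inference is returned alongside its proof.
    return output, attest(model_id, prompt, output)

def verify(model_id: str, prompt: str, output: str, proof: str) -> bool:
    # The user (or a smart contract) checks that the claimed model
    # produced this output for this prompt.
    return attest(model_id, prompt, output) == proof

output, proof = serve_inference("example-llm", "What is staking?")
assert verify("example-llm", "What is staking?", output, proof)
assert not verify("other-model", "What is staking?", output, proof)
```

<p>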
The benefits are multi-faceted:</p><ol><li><p>The model provider is able to keep their model private if they wish.</p></li><li><p>A user can verify the model was run correctly.</p></li><li><p>Models can be integrated into smart contracts, helping bypass the scalability issues present in blockchains today.</p></li></ol><p>Currently, these crypto proving systems are largely in a developmental phase, with the majority of initiatives concentrated on establishing foundational components and creating initial demonstrations. The primary obstacles encountered at present encompass high computational expenses, constraints on memory capacity, intricate model designs, a scarcity of specialized tools and underlying frameworks, and a shortage of skilled developers. The industry&apos;s preferred cryptographic verification system (zkML, opML, TEE) is still being decided, but we are already seeing TEEs vastly outperform the computationally intensive and expensive zk proofs. Marlin gave a good overview of the trade-offs during their ETHDenver talk:</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/082c634a829cb0430e4a829817c879f6f488bf81dd0cee336ab933e08e46c843.png" alt="Source: Marlin Protocol" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Marlin Protocol</figcaption></figure><p>The potential for verification to enhance current model architecture is clear, yet these systems also hold the promise of improving the user experience in cryptocurrencies through the facilitation of upgradability and the implementation of dynamic contracts. As it stands, the functionality of smart contracts is significantly limited by their dependence on preset parameters, necessitating manual updates to ensure their continued effectiveness. 
Typically, this manual updating process involves either bureaucratic governance procedures or compromises decentralization by granting undue autonomy to the smart contract owner. For instance, the updating of risk parameters in Aave is managed through political and bureaucratic governance votes alongside risk teams, a method that has proven to be inefficient as evidenced by <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://governance.aave.com/t/gauntlet-is-leaving-aave/16700">Gauntlet’s departure</a> from Aave. Integrating AI into smart contracts has the potential to revolutionize their management, modification, and automatic enhancement. In the example of Aave, implementing AI agents for the adaptation of risk parameters in response to evolving market conditions and risks could significantly optimize the process, offering a more efficient and timely alternative to the often slow and cumbersome adjustments made by humans or DAOs.</p><p><strong>Ritual</strong></p><p>Ritual is conceptualized as a decentralized, flexible, and sovereign platform for executing AI tasks. It integrates a distributed network of nodes, each equipped with computational resources, and allows AI model developers to deploy their models, including both LLMs and traditional machine learning models, across these nodes. Users can access any model within the network through a unified API, benefitting from additional cryptographic measures embedded in the network. These measures provide assurances of computational integrity and privacy (zkML, opML &amp; TEE).</p><p>Infernet represents the inaugural phase in the evolution of Ritual, propelling AI into the realm of on-chain applications by offering robust interfaces for smart contracts to utilize AI models for inference tasks. 
It ensures unrestricted access to a vast network of model and compute providers, marking a significant step towards democratizing AI and computational resources for on-chain applications and smart contracts.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/7ed346eec7f587471f7f9d1c302c66aa09db34db120d0a32d55cf914eb6c8da4.png" alt="Source: Ritual.net" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Ritual.net</figcaption></figure><p>The overarching aim for Ritual is to establish itself as the ‘AI coprocessor’. This involves advancing Infernet into a comprehensive suite of execution layers that seamlessly interact with the foundational infrastructure across the ecosystem. The goal is to enable protocols and applications on any blockchain to leverage Ritual as an AI coprocessor, facilitating widespread access to advanced AI functionalities across crypto.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/9e65192b2891b0096c2f43edcce68652ef452055c22879b211833e8c11d593dd.png" alt="The Ritual Superchain" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">The Ritual Superchain</figcaption></figure><p><strong>Privacy</strong></p><p>The foundational element of crypto is encryption and cryptography, which can be utilized in numerous ways to facilitate privacy-preserving interactions with AI. Verifiable AI will empower users to retain ownership of their data, restricting third-party access. Despite advancements, privacy concerns persist because data is not encrypted at all stages of processing. 
Currently, the standard practice involves encrypting messages between a user&apos;s device and the provider&apos;s server. However, this data is decrypted on the server to enable the provider to utilize it, such as running models on user data, exposing it to potential privacy risks. There is therefore a large risk of exposing sensitive information to the LLM service provider, and in critical sectors like healthcare, finance, and law, such privacy risks are substantial enough to halt progress and adoption.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/d7c961921e76153080a008299983a1e120d1d6be36235e4f3356c651cb49550c.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p>These privacy concerns are evident in IBM&apos;s 2023 Security Report, which reveals a notable rise in the incidence of healthcare data breaches throughout the decade:</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/64b540aa7d70e5221e052bbdc3bd5b0915d7093be1f923fd87e5b7d282691345.png" alt="Source: Zama" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Zama</figcaption></figure><p>Crypto is now pioneering a new standard for data transfer and processing, called Fully Homomorphic Encryption (FHE). FHE allows data to be processed without ever being decrypted. This innovation ensures that companies and individuals can provide their services while keeping users&apos; data completely private, with no discernible impact on functionality for the user. 
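</p><p>To make "computing on encrypted data" concrete, the snippet below implements textbook Paillier encryption with toy-sized, wildly insecure parameters chosen purely for illustration. Paillier is only additively homomorphic, a much weaker cousin of FHE, but it is enough to show the core trick: a server can combine ciphertexts so that the hidden plaintexts are added, without ever being able to read them.</p>

```python
import math
import random

# Textbook Paillier with tiny primes (insecure; for illustration only).
p, q = 47, 59
n = p * q                      # public modulus
n2 = n * n
g = n + 1                      # standard generator choice
lam = math.lcm(p - 1, q - 1)   # private key
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)

def encrypt(m: int) -> int:
    # Randomized encryption: c = g^m * r^n mod n^2 for random r coprime to n.
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# The "server" multiplies ciphertexts, which adds the hidden plaintexts;
# it never sees 123, 456, or their sum.
a, b = encrypt(123), encrypt(456)
assert decrypt((a * b) % n2) == 579
```

<p>Full FHE schemes such as Zama&apos;s TFHE extend this idea from addition-only to arbitrary computation, which is what makes encrypted LLM inference possible.</p><p>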
With FHE, data remains encrypted not only while it is being transmitted but also throughout the processing phase. This advancement extends the possibility of end-to-end encryption to all online activities, not just message transmission, and in the context of AI, FHE will enable the best of both worlds: protection of both the privacy of the user and the IP of the model.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/5ad2fe17f522fb55b67ddc3edc201b4fd7ec2360076d1cfcca45bdc7081fa6a0.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p>FHE allows functions to be executed on data while it remains encrypted. A demonstration by <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.zama.ai/">Zama</a> has shown that an LLM model, when implemented with FHE, preserves the accuracy of the original model&apos;s predictions, whilst keeping data (prompts, answers) encrypted throughout the whole process. Zama modified the GPT-2 implementation from the Hugging Face Transformers library; specifically, parts of the inference were restructured using <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.zama.ai/post/zama-concrete-fully-homomorphic-encryption-compiler">Concrete-Python</a>. This tool facilitates the transformation of Python functions into their FHE counterparts, ensuring secure and private computation without compromising on performance.</p><p>In part 1 of our thesis, we analyzed the structure of the GPT models, which broadly consist of a sequence of multi-head attention (MHA) layers applied consecutively. 
Each MHA layer utilizes the model&apos;s weights to project the inputs, executes the attention mechanism, and then re-projects the attention&apos;s output into a novel tensor.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/4db3ef663f7a091034a003a91402a8802bc9bdc538eb7849476b23e368e6c8bc.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p>Zama&apos;s <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.zama.ai/post/tfhe-deep-dive-part-1">TFHE</a> approach encodes both model weights and activations as integers. By employing <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://docs.zama.ai/tfhe-rs/how-to/parallelized_pbs">Programmable Bootstrapping (PBS)</a> and simultaneously refreshing ciphertexts, it enables arbitrary computations. This method allows for the encryption of any component or the entirety of LLM computations within the realm of Fully Homomorphic Encryption.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/a9533628780d6e5f1afc9e4ade49a287a9bb04ead2279b4aebce04b0dd993051.png" alt="Source: Zama" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Zama</figcaption></figure><p>After converting these weights and activations, Zama enables the model to be run fully encrypted, meaning the server can never see a user&apos;s data or inputs. 
The above graphic from Zama shows a basic implementation of FHE in LLMs:</p><ol><li><p>A user begins the inference process on their local device, stopping just after the initial layer, which is omitted from the model shared with the server.</p></li><li><p>The client then encrypts these intermediate operations and forwards them to the server. The server processes a portion of the attention mechanism on this encrypted data, and sends the results back to the client.</p></li><li><p>Upon receipt, the client decrypts these results to proceed with the inference locally.</p></li></ol><p>A more concrete example may be the following: <em>In a scenario where a potential borrower applies for a loan, the bank is tasked with assessing the applicant&apos;s creditworthiness while navigating privacy regulations and concerns that restrict direct access to the borrower&apos;s detailed financial history. To address this challenge, the bank could adopt an FHE scheme, enabling secure, privacy-preserving computations.</em></p><p><em>The applicant agrees to share their financial data with the bank under the condition that privacy safeguards are in place. The borrower encrypts their financial data locally, ensuring its confidentiality. This encrypted data is then transmitted directly to the bank, which is equipped to run sophisticated credit assessment algorithms &amp; AI models on the encrypted data within its computing environment. As the data remains encrypted throughout this process, the bank can conduct the necessary analyses without accessing the applicant&apos;s actual financial information. 
This approach also safeguards against data breaches, as hackers would not be able to decrypt the financial data without the encryption key, even if the bank&apos;s servers were breached.</em></p><p><em>Upon completing the analysis, the user decrypts the resulting encrypted credit score and insights, thus gaining access to the information without compromising the privacy of their financial details at any point. This innovative method ensures the protection of the applicant&apos;s financial records at every step, from the initial application through to the final credit assessment, thereby upholding confidentiality and adherence to privacy laws.</em></p><p><strong>Incentivised RLHF</strong></p><p>Within part one of our thesis, we highlighted the centralizing forces within the RLHF stage of fine-tuning, namely the aggregation of specialized labor within a few large companies due to the compensation they can provide. Crypto economic incentives have proven valuable in creating positive feedback loops and engaging top tier talent toward a common goal. An example of this tokenized RLHF has been Hivemapper, which aims to use crypto economic incentives to accurately map the entire world. The Hivemapper Network, launched in November 2022, rewards participants who dedicate their time to refining and curating mapping information and has since mapped 140 million kilometers across over 2,503 distinct regions. Kyle Samani <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://multicoin.capital/2023/06/02/the-convergence-of-crypto-and-ai-four-key-intersections/">highlights</a> that tokenized RLHF starts to make sense in the following scenarios:</p><ul><li><p><strong>When the model targets a specialized and niche area rather than a broad and general application</strong>. Individuals who rely on RLHF for their primary income, and thus depend on it for living expenses, will typically prefer cash payments. 
As the focus shifts to more specialized domains, the demand grows for skilled workers, who may have a vested interest in the project&apos;s long-term success.</p></li><li><p><strong>When the individuals contributing to RLHF have a higher income from sources outside of RLHF activities</strong>. Accepting compensation in the form of non-liquid tokens is viable only for those with adequate financial stability from other sources to afford the risk associated with investing time in a specific RLHF model. To ensure the model&apos;s success, developers should consider offering tokens that vest over time, rather than immediately accessible ones, to encourage contributors to make decisions that benefit the project in the long run.</p></li></ul><p><strong>Inference</strong></p><p>In the inference phase, the deployment platform delivers the inference to the end user via on-premise servers, cloud infrastructure, or edge devices. As previously mentioned, there is a centralization in both the geographic location of the hardware and its ownership. With daily operational costs reaching hundreds of thousands of dollars for the most popular deployment platforms, most corporations find themselves priced out and thus the serving of models consolidates. Similarly to the pre-training phase, DePIN networks can be utilized to serve inferences on a large scale, offering multiple advantages:</p><ol><li><p><strong>User ownership</strong> - DePINs can be used to serve inference by connecting and coordinating compute across the globe. The ownership of the network and the subsequent rewards flow to the operators of this network, who are also the users. 
DePIN enables the collective ownership of the network by its users, avoiding the misalignment of incentives we historically find in web2 operations.</p></li><li><p><strong>Crypto economic incentives</strong> - Crypto economic incentives such as block rewards or rewards for proof of work enable the network to function with no central authority and to accurately incentivise and compensate work that benefits the network.</p></li><li><p><strong>Reduced costs</strong> - Onboarding the long tail of GPUs across the globe can greatly reduce the cost of inference, as price comparisons between decentralized compute providers and their centralized counterparts have shown.</p></li></ol><p><strong>Decentralized frontends</strong></p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/c4b8ad6068ef73a05924cd9fad7e8abddf2cd0b8596dea2ee61292d0a4c19bf5.png" alt="Source: Marlin protocol" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Marlin protocol</figcaption></figure><p>The underlying code of smart contracts is executed on a decentralized peer-to-peer network; however, the primary way users interact with these contracts is through frontends hosted on centralized servers. This centralization presents multiple challenges, such as vulnerability to Distributed Denial of Service (DDoS) attacks, the possibility of domain name DDoS or malicious takeovers, and, most importantly, censorship by corporate owners or nation states. 
Similarly, the dominance of centralized frontends in the current AI landscape raises concerns, as users&apos; access to this pivotal technology can be <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.semafor.com/article/10/20/2023/ai-platform-hugging-face-confirms-china-blocked-it">restricted</a>. When developing community-owned, censorship-resistant frontends that facilitate worldwide access to smart contracts and AI, it&apos;s crucial to take into account the geographical distribution of nodes for data storage and transmission. Equally important is ensuring proper ownership and access control over this data. There are a number of protocols and crypto systems that can be used to enable this:</p><p><strong>IPFS</strong></p><p>The Interplanetary File System (IPFS) is a decentralized, content-addressable network that allows for the storage and distribution of files across a peer-to-peer network. In this system, every file is hashed, and this hash serves as a unique identifier for the file, enabling it to be accessed from any IPFS node using the hash as a request. This design is aimed at supplanting HTTP as the go-to protocol for web application delivery, moving away from the traditional model of storing web applications on a single server to a more distributed approach where files can be retrieved from any node within the IPFS network. Whilst a good alternative to the status quo, webpages hosted on IPFS and connected via DNSLink rely on gateways, which may not always be secure or operate on a trustless basis. 
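</p><p>Content addressing is the core idea here: a file&apos;s address is derived from its bytes, so identical content resolves to the same identifier on any node, and any tampering changes the address. The sketch below illustrates the principle in Python; note that real IPFS CIDs wrap the hash in a multihash/CID encoding rather than a bare hex digest.</p>

```python
import hashlib

def content_address(data: bytes) -> str:
    # Toy content address: hex SHA-256 of the bytes. Real IPFS CIDs
    # use a multihash/CID encoding, but the principle is identical:
    # the address is derived from the content, not from its location.
    return hashlib.sha256(data).hexdigest()

page = b"<html><body>hello</body></html>"
addr = content_address(page)

# Identical content always maps to the same address, from any node.
assert content_address(b"<html><body>hello</body></html>") == addr

# Any modification yields a different address, so tampering is detectable.
assert content_address(page + b" ") != addr
```

<p>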
Pages hosted this way are also limited to static HTML sites.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/7a6b91a897ef489f5b66cdc8f5d3d859673fc1c06a62792072bc67bfe7dcc013.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p><strong>3DNS &amp; Marlin</strong></p><p>The advent of <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://3dns.box/">3DNS</a> introduces the concept of ‘tokenized domains’ managed directly on the blockchain, presenting a solution to several decentralization challenges. This innovation allows smart contracts, and by extension DAOs, to oversee domain management. One of the primary benefits of managing DNS records on the blockchain is enhanced access control. With this system, only keys with explicit authorization can modify DNS records, effectively mitigating risks such as insider interference, database breaches, or email compromises at the DNS provider level. However, the domain must still be linked to an IP address, which a hosting provider can change at their discretion. Consequently, any alteration in the IP address of the server hosting the frontend necessitates a DAO vote to update the records, a cumbersome process.</p><p>To address this, there&apos;s a need for a deployment method that enables smart contracts to autonomously verify the server&apos;s codebase and only then redirect DNS records to the host, contingent upon the code aligning with an approved template. This is where Trusted Execution Environments (TEEs), such as <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.marlin.org/whitepaper">Marlin’s</a> Oyster, come into play. 
TEEs create a secure enclave for executing code, shielded from any data/code modifications or access by the host machine, and enable the verification of the code&apos;s integrity against its intended version through attestations.</p><p>This framework allows for the implementation of Certificate Authority Authorization (CAA) records, managed by a DNS admin contract, to ensure that only enclaves executing approved software can request domain certificates. This mechanism guarantees that any data received by users visiting the domain is authenticated, emanating exclusively from the authorized application, thereby certifying its integrity and safeguarding against tampering.</p><p><strong>Public Key Infrastructure Problem</strong></p><p>During the 1990s and early 2000s, cryptographers and computer scientists extensively theorized about the vast benefits and innovations that Public Key Infrastructure (PKI) could bring forth. PKI represents a sophisticated framework that is pivotal for bolstering security across the internet and intranets, facilitating secure network transactions including e-commerce, internet banking, and the exchange of confidential emails through robust encryption and authentication mechanisms. Utilizing asymmetric cryptography, also known as public-key cryptography, as its foundational security mechanism, this approach involves the use of two keys: a public key, which can be shared openly, and a private key, which is kept secret by the owner. The public key is used for encrypting messages or verifying digital signatures, while the private key is used for decrypting messages or creating digital signatures. 
This key pair enables the encrypted transmission of data, safeguarding privacy and ensuring that only authorized individuals can access the information.</p><p>For PKI to operate effectively, it is imperative for users to maintain their private keys in a manner that is both secure and accessible. &quot;Secure&quot; in this context means that the private key is stored in a private manner, exclusively accessible to the user. &quot;Accessible&quot; implies that the user can easily and frequently retrieve their private key when needed. The challenge of PKI lies in achieving this delicate balance. For instance, a user might secure their private key by writing it down, storing it in a locked box, and then misplacing the box, akin to storing it in a highly secure location and then forgetting where. This scenario compromises accessibility. Conversely, storing the private key in a highly accessible location, such as on a public website, would render it insecure as it could be exploited by unauthorized users. This conundrum encapsulates the fundamental PKI problem that has hindered the widespread adoption of public key infrastructure.</p><p>Cryptocurrency has addressed the PKI dilemma through the implementation of direct incentives, ensuring that private keys are both secure and readily accessible. If a private key is not secure, the associated crypto wallet risks being compromised, leading to potential loss of funds. Conversely, if the key is not accessible, the owner loses the ability to access their assets. Since Bitcoin&apos;s introduction, there has been a steady, albeit gradual, increase in the adoption of PKI. 
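</p><p>The key-pair mechanics described above can be made concrete with textbook RSA. The sketch below uses deliberately tiny primes purely for illustration (real deployments use keys of 2048 bits or more, with padding schemes); it shows encryption and decryption with the public/private pair, and the mirror-image sign/verify operation.</p>

```python
# Textbook RSA with toy parameters -- illustration only, never secure.
p, q = 61, 53
n = p * q                 # public modulus: 3233
phi = (p - 1) * (q - 1)   # 3120
e = 17                    # public exponent, coprime with phi
d = pow(e, -1, phi)       # private exponent (Python 3.8+): 2753

message = 65
ciphertext = pow(message, e, n)          # encrypt with the public key
assert pow(ciphertext, d, n) == message  # decrypt with the private key

# Signing is the mirror image: sign with the private key,
# verify with the public key.
signature = pow(message, d, n)
assert pow(signature, e, n) == message
```

<p>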
When utilized effectively, PKI, together with crypto’s resolution to the PKI problem, plays a critical role in facilitating the secure expansion of open-source agents, proof of personhood solutions, and, as previously discussed, data provenance solutions.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/dc72a4afc3e18566840a992aac728301a2c11b743476287d88a8d2048c359092.png" alt="Source: An empirical three phase analysis of the crypto market" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: An empirical three phase analysis of the crypto market</figcaption></figure><p><strong>Autonomous Smart Agents</strong></p><p>The concept of an agent has deep historical <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://plato.stanford.edu/entries/metaphysics/">roots</a> in philosophy, tracing back to the works of prominent thinkers like Aristotle and Hume. Broadly, an agent is defined as any entity capable of taking action, and &quot;agency&quot; refers to the expression or demonstration of this ability to act. More specifically, &quot;agency&quot; often <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="http://www.tara.tcd.ie/handle/2262/12980">pertains</a> to the execution of actions that are intentional. Consequently, an &quot;agent&quot; is typically <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.cs.ox.ac.uk/people/michael.wooldridge/pubs/ker95.pdf">described</a> as an entity that holds desires, beliefs, intentions, and possesses the capability to act based on these factors. 
This concept extended into the realm of computer science with the goal of empowering computers to grasp users&apos; preferences and independently carry out tasks on their behalf. As AI evolved, the terminology &quot;agent&quot; was <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.researchgate.net/publication/220388081_Formalizing_Properties_of_Agents">adopted</a> within AI research to describe entities that exhibit intelligent behavior. These agents are characterized by attributes such as autonomy, the capacity to respond to changes in their environment, proactiveness in pursuing goals, and the ability to interact socially. AI agents are now recognized as a critical step towards realizing Artificial General Intelligence (AGI), as they embody the capability for a broad spectrum of intelligent behaviors.</p><p>As <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://arxiv.org/abs/2004.10151">judged</a> by World Scope, LLMs have showcased remarkable abilities in acquiring knowledge, understanding instructions, generalizing across contexts, planning, and reasoning. They have also proven adept at engaging in natural language interactions with humans. These strengths have led to LLMs being heralded as catalysts for Artificial General Intelligence (AGI), highlighting their significance as a foundational layer in the development of intelligent agents. Such advancements pave the way for a future in which humans and agents can coexist in harmony.</p><p>Within the confines of its environment, these agents can be used to complete a wide array of tasks. Bill Gates used the <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.gatesnotes.com/AI-agents">following scenario</a> to describe their myriad functions: <em>“Imagine that you want to plan a trip. A travel bot will identify hotels that fit your budget. 
An agent will know what time of year you’ll be traveling and, based on whether you always try a new destination or like to return to the same place repeatedly, it will be able to suggest locations. When asked, it will recommend things to do based on your interests and propensity for adventure and book reservations at the types of restaurants you would enjoy.”</em> Whilst this is far out, OpenAI is <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://help.openai.com/en/articles/6825453-chatgpt-release-notes">reportedly</a> developing AI agents capable of executing complex, multi-step tasks autonomously. These agents, transcending the traditional bounds of user interaction, are designed to manipulate user devices directly to perform intricate tasks across different applications. For instance, an AI agent could autonomously transfer data from a document into a spreadsheet for further analysis, streamlining work processes significantly. Innovating beyond mere desktop applications, these AI agents could also navigate web-based tasks, such as booking flights or compiling travel itineraries, without relying on APIs.</p><p>Whilst useful, these centralized AI agents pose similar risks to the ones we identified in part 1:</p><ol><li><p>Data control &amp; access</p></li><li><p>Verifiability</p></li><li><p>Censorship</p></li></ol><p>However, we also run into a few new issues:</p><p><strong>Composability</strong> - One of the primary benefits of crypto is the composability it facilitates. This feature enables open-source contributions and the permissionless interaction of protocols, allowing them to connect, build upon, and interface with each other seamlessly. This is illustrated in DeFi through the concept of &apos;money legos&apos;. For an AI agent to function optimally, it must possess the capability to interface with a broad spectrum of applications, websites, and other agents. 
However, within the confines of traditional closed network systems, AI agents face significant challenges in task execution, often limited by the need to connect to multiple third-party APIs or resort to using complex methods like Selenium drivers for information retrieval and task execution. In the era of <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://blogs.nvidia.com/blog/world-governments-summit/">sovereign AI</a>, the limitations become even more pronounced, as agents are unable to access models or data behind national firewalls. To truly empower AI agents, credibly neutral, decentralized base layers are essential. Such layers would allow agents to interact permissionlessly with a diverse range of applications, models, and other agents, enabling them to collaboratively complete complex tasks without the barriers imposed by current network infrastructures.</p><p><strong>Value Transfer</strong> - The ability for agents to transfer value is a crucial functionality that will become increasingly important as they evolve. Initially, these transactions will primarily serve the needs of the humans utilizing the agents, facilitating payments for services, models, and resources. However, as agent capabilities advance, we will observe a shift towards autonomous transactions between agents themselves, both for their own benefit and for executing tasks beyond their individual capabilities. For instance, an agent might pay a DePIN to access a computationally intensive model or compensate another specialized agent for completing a task more efficiently. We believe that a significant portion of global value transfer will eventually be conducted by agents. <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://dune.com/adrian0x/service-safe-transacitons-on-gnosis-chain">Evidence</a> of this trend is already emerging, as seen with initiatives like Autonolas on the Gnosis chain. 
Autonolas agents make up over 11% of all transactions on the chain and in the last month they have averaged 75.24% of all Gnosis chain transactions.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/4ecb00a97bab433abf9966832427149d6312f4902aae6dcc0bd843283b849931.png" alt="Source: adrian0x on Dune analytics" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: adrian0x on Dune analytics</figcaption></figure><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/8ef55bbd650d57e88a372b46b8af18e15601560c3ccbeb49b17cc6b19d407eb3.png" alt="Source: adrian0x on Dune analytics" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: adrian0x on Dune analytics</figcaption></figure><p><strong>Privacy &amp; Alignment</strong> - Human-operated agents become more effective when they have access to extensive personal data, enabling them to tailor suggestions and services based on individual preferences, schedules, health metrics, and financial status. However, the reliance on centralized agents raises significant concerns regarding data privacy, as it entails the accumulation of sensitive personal information by large technology corporations. This situation can lead to unintended consequences, as illustrated by the incident with Samsung, where employees inadvertently compromised company secrets while using ChatGPT. Such scenarios highlight the misalignment of incentives between technology incumbents and users, presenting substantial ethical and privacy risks. 
Users&apos; sensitive data could be exploited, either sold to advertisers or used in ways that serve corporate interests rather than the individuals&apos;. To safeguard sensitive information while maximizing the efficiency of AI agents, it&apos;s essential that each user retains ownership of both their agent and their data. By facilitating the local operation of these agents, users can effectively protect their personal data from external vulnerabilities. This approach not only enhances data privacy but also ensures that the agents can perform at their optimal capacity, tailored specifically to the user&apos;s preferences and needs without compromising security.</p><p><strong>Dependency &amp; Lock-in</strong> - In part 1 of our thesis, we delved into the drawbacks of lock-in effects stemming from closed source models. The core issue with relying on centralized agents developed by corporations focused on maximizing shareholder value is the inherent misalignment of incentives. This misalignment can drive such companies to exploit their users in the pursuit of increased profits, for instance, by selling the intimate data provided to these agents. Moreover, a company&apos;s take rate—the percentage of revenue taken as fees—is often directly tied to the exclusivity and indispensability of its services. In the context of AI agents, a company like OpenAI might initially offer low take rates and maintain a relatively open network. However, as these agents become more integral to users, evolving through iterative improvements from vast amounts of user data, these centralized corporations may gradually increase their take rates through higher fees or by venturing into other revenue-generating activities like advertising or selling user data. 
We believe in the establishment of agents on a credibly neutral, open-source base layer that is user-owned, ensuring that incentives are properly aligned and prioritizing the users&apos; interests and privacy above corporate gains.</p><p><strong>Smart Agents</strong></p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/676a3f588ff40e000a85d3368484043621051755d9e0e7ca3ddcc626b2a55409.png" alt="Source: David Johnston Smart agents paper" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: David Johnston Smart agents paper</figcaption></figure><p>Revisiting the challenges inherent to PKI, it becomes evident that cryptographic solutions offer compelling resolutions to the issues of alignment, privacy, and verifiability. Furthermore, these solutions adeptly align incentives.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/fb860419397a4401808bb71a479313584749ee1bbfc00e3df9ad20a6d020d3df.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p>The term &quot;smart agent&quot; refers to a class of general-purpose AI systems designed to interact with smart contracts on blockchain networks. These agents can be categorized based on their operational model: they may either be owned and controlled by users or function autonomously without direct human oversight. 
Here we will examine their capacity to enhance human interactions in various domains.</p><p>The implementation of a smart agent encompasses three critical components, each playing a pivotal role in its functionality and effectiveness. These components are designed to ensure secure, informed, and contextually aware interactions within the crypto ecosystem:</p><ol><li><p><strong>User&apos;s Crypto Wallet</strong>: This serves as the foundational element for key management and transactional operations. It enables users to sign and authorize transactions recommended by the smart agent, ensuring secure and authenticated interactions with blockchain-based applications.</p></li><li><p><strong>LLM Specialized in Crypto</strong>: A core intelligence engine of the smart agent, this model is trained on extensive crypto datasets, including information on blockchains, wallets, decentralized applications, DAOs, and smart contracts. This training enables the agent to understand and navigate the complex crypto environment effectively. This LLM must be fine-tuned to include a component that scores and recommends the most suitable smart contracts to users based on a set of criteria, prioritizing safety.</p></li><li><p><strong>Long-Term Memory for User Data and Connected Applications</strong>: This feature involves the storage of user data and information on connected applications either locally or on a decentralized cloud. 
It provides the smart agent with a broader context for its actions, allowing for more personalized and accurate assistance based on historical interactions and preferences.</p></li></ol><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/0625027257a0413692ef5a296967c1a3e3d8e1b6f53e19f0650b745c01d3b7ff.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p>Users interact with their personal smart agents through either a locally installed application or a community-hosted frontend interface. This interface could be similar to that of platforms like ChatGPT, allowing users to input queries and engage in dialogues with their smart agent. Through this interaction, users can specify the actions they wish to be executed. The smart agent then provides suggestions tailored to the user&apos;s preferences and the security of the involved smart contracts.</p><p>What sets these smart agents apart from standard LLMs is their action-oriented capability. Unlike LLMs that primarily generate informational responses, smart agents have the advanced functionality to act on the suggestions they provide. They achieve this by crafting blockchain transactions that represent the user&apos;s desired actions. This capacity for direct action on the blockchain distinguishes smart agents as a significant advancement over traditional AI models, offering users a more interactive and impactful experience. 
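</p><p>The propose-then-sign loop described above can be sketched as follows. Everything here is hypothetical (the contract addresses, the safety scores, the class and method names); it only illustrates the division of labor: the agent recommends and crafts a transaction, while the user&apos;s wallet retains sole signing authority.</p>

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    """Hypothetical, heavily simplified blockchain transaction."""
    to: str      # contract address
    data: str    # encoded call (here just the user's intent)
    value: int

class SmartAgent:
    """Toy smart agent: turns an intent into a proposed transaction.
    It can suggest, but it cannot act until the wallet signs."""
    def __init__(self, safety_scores: dict):
        # contract address -> safety score (from the fine-tuned LLM)
        self.safety_scores = safety_scores

    def propose(self, intent: str) -> Transaction:
        # A real agent would use an LLM to translate the intent; here
        # we simply route to the highest-scoring (safest) contract.
        contract = max(self.safety_scores, key=self.safety_scores.get)
        return Transaction(to=contract, data=intent, value=0)

class Wallet:
    """The user reviews and authorizes; keys never leave the wallet."""
    def sign(self, tx: Transaction, approved: bool) -> str:
        if not approved:
            raise PermissionError("user rejected the transaction")
        return f"signed({tx.to},{tx.data})"

agent = SmartAgent({"0xSafeDex": 0.95, "0xUnauditedDex": 0.40})
tx = agent.propose("swap 1 ETH to USDC")
assert tx.to == "0xSafeDex"   # the safest known contract is recommended
receipt = Wallet().sign(tx, approved=True)
```

<p>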
By integrating conversational interaction with the ability to perform blockchain transactions, smart agents facilitate a seamless and secure interface between users and the crypto ecosystem.</p><p>Incorporating PKI as the foundational element of agent usage empowers individuals with direct control over their data and the actions of their agents. This approach addresses the issue of misaligned incentives as users actively confirm that their agents are acting in their interests by reviewing and approving transactions. This mechanism not only ensures that agents operate in alignment with user goals but also secures the ownership and control of sensitive personal data that powers these agents. In an era where artificial intelligence can easily generate convincing fabrications, the immutable nature of cryptographic techniques stands as a bulwark against such threats. As AI technology advances, the authenticity guaranteed by a private key may emerge as one of the few unforgeable proofs of identity and intent. Therefore, private keys are pivotal in constructing a framework that allows for the controlled and verifiable use of agents.</p><p><strong>Crypto UX</strong></p><p>Smart agents represent an upgrade from their centralized counterparts, but they also have the potential to drastically improve crypto UX. This mirrors the technological journey experienced during the 1980s and 1990s with the advent of the internet—a period marked by challenges in navigating a novel and largely misunderstood technology.  The initial phase of the internet was characterized by a hands-on and often challenging process of website discovery and access. Users primarily relied on directories such as Yahoo! Directory and DMOZ, which featured manually curated lists of websites neatly categorized for easy navigation. 
Additionally, the dissemination of website information was largely through traditional methods, including word of mouth, publications, and printed guides that provided URLs for direct access. Before the internet became widespread, platforms like Bulletin Board Systems (BBS) and Usenet forums were instrumental in exchanging files, messages, and recommendations for interesting websites. With the absence of advanced search tools, early internet exploration necessitated knowing the precise URLs, often obtained from non-digital sources, marking a stark contrast to the sophisticated, algorithm-driven search engines that streamline web discovery today.</p><p>It wasn’t until Google&apos;s introduction in 1998, which revolutionized internet use by indexing web pages and enabling simple search functionality, that the internet became vastly more accessible and user-friendly. This breakthrough allowed people to easily find relevant information, and Google&apos;s minimalist search bar and efficient results paved the way for widespread internet adoption, setting a precedent for making complex technologies accessible to the general public.</p><p>Currently, the crypto ecosystem is that early internet, presenting non-technical users with the daunting task of navigating chains, wallets, bridges, token derivatives with varying risk profiles, staking, and more. This complexity renders crypto largely inaccessible to the average person, primarily due to the poor user experience that is standard in the space. However, smart agents hold the potential to create a &apos;Google moment&apos; for crypto. They could enable ordinary people to interact with smart contracts as simply as typing a command into a search bar, significantly simplifying the user interface. 
This breakthrough could transform the crypto landscape, making it user-friendly and accessible, akin to how Google transformed internet search and usability.</p><p><strong>Morpheus</strong></p><p>One such project advancing Smart Agents is <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://github.com/MorpheusAIs/Docs/tree/main/!KEYDOCS%20README%20FIRST!">Morpheus</a>. The Morpheus whitepaper, authored by the pseudonymous trio Morpheus, Trinity, and Neo, was released on September 2nd, 2023 (Keanu Reeves&apos; birthday, a nod to his role in The Matrix). Unlike typical projects, Morpheus operates without a formal team, company, or foundation, embodying a fully decentralized ethos.</p><p>The project is architected to advance the creation of a peer-to-peer network, consisting of personal, general-purpose AIs that act as Smart Agents capable of executing Smart Contracts for individuals. It promotes open-source Smart Agents and LLMs that enable seamless interactions with users&apos; wallets, decentralized applications, and smart contracts. Within the network there are four key shareholders:</p><ol><li><p><strong>Coders</strong> - The coders in the Morpheus network are the open-source developers that create the smart contracts and off-chain components that power Morpheus. They are also developers who build smart agents on top of Morpheus.</p></li><li><p><strong>Capital Providers</strong> - The capital providers in the Morpheus network are the participants who commit their staked ETH (stETH) to the capital pool for use by the network.</p></li><li><p><strong>Compute Providers</strong> - The compute providers provide agnostic compute power, mainly in the form of GPUs.</p></li><li><p><strong>Community</strong> - The community allocation in the Morpheus network refers to shareholders who create frontends in order to interact with the Morpheus network and their smart agents. 
It also encompasses any users who provide tools or do work to bring users into the ecosystem.</p></li></ol><p>The final participant in the ecosystem is the user. The user encompasses any individual or entity soliciting inference services from the Morpheus network. To synchronize incentives for accessing inference, the <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://github.com/MorpheusAIs/Docs/blob/main/!KEYDOCS%20README%20FIRST!/Yellowstone%20Compute%20Model.md">Yellowstone compute model</a> is employed, operating under the following (simplified) structure:</p><ol><li><p>To request an output from an LLM, a user must hold MOR tokens in their wallet.</p></li><li><p>A user specifies the LLM they want to use and submits their prompt.</p></li><li><p>An off-chain router contract connects the user to a compute provider that is hosting that LLM on their hardware and is providing the cheapest and highest-quality response.</p></li><li><p>The compute provider runs the prompt and returns the answer to the user.</p></li><li><p>The compute provider is paid MOR tokens for the work done.</p></li></ol><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/7e07e6ab787bcdc4cf428269829f4a664c28701592cd8e11f4b2354bf45f89ec.png" alt="Source: Yellowstone compute model" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Yellowstone compute model</figcaption></figure><p><strong>Proof of Personhood</strong></p><p>Proof of personhood (PoP), also referred to as the &quot;unique-human problem,&quot; represents a constrained version of real-world identity verification. 
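</p><p>The five-step Yellowstone flow above can be sketched as follows; all provider names, prices, and quality scores are hypothetical, and the real router&apos;s selection logic is more involved than this toy tie-break on price and then quality.</p>

```python
# Toy sketch of the (simplified) Yellowstone flow:
# hold MOR -> submit prompt -> route to provider -> answer -> payment.
providers = [  # (name, hosted model, MOR price per request, quality)
    ("gpu-a", "llama-3", 5, 0.90),
    ("gpu-b", "llama-3", 3, 0.92),
    ("gpu-c", "mistral", 2, 0.88),
]

def route(model: str):
    # Step 3: pick the cheapest, highest-quality provider for the model.
    candidates = [p for p in providers if p[1] == model]
    return min(candidates, key=lambda p: (p[2], -p[3]))

def request_inference(mor_balance: int, model: str, prompt: str):
    if mor_balance <= 0:                       # step 1: must hold MOR
        raise ValueError("MOR tokens required to request inference")
    name, _, price, _ = route(model)           # steps 2-3
    answer = f"{name} ran: {prompt}"           # step 4: provider responds
    return answer, mor_balance - price         # step 5: provider is paid

answer, balance = request_inference(10, "llama-3", "explain staking")
assert answer.startswith("gpu-b")  # cheapest llama-3 host is selected
assert balance == 7
```

<p>Returning to proof of personhood: 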
It confirms that a specific registered account is managed by an actual individual who is distinct from the individuals behind all other accounts, and it strives to accomplish this ideally without disclosing the identity of the actual person.</p><p>The advent of sophisticated generative AI necessitates the establishment of <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://worldcoin.org/blog/engineering/humanness-in-the-age-of-ai">two key</a> frameworks to enhance fairness, social dynamics, and trustworthiness online:</p><ol><li><p>Implementing a cap on the quantity of accounts each individual can hold, a measure crucial for safeguarding against <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://en.wikipedia.org/wiki/Sybil_attack">Sybil attacks</a>, with significant implications for facilitating digital and decentralized <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://vitalik.eth.limo/general/2021/05/25/voting2.html">governance</a>.</p></li><li><p>Curbing the proliferation of AI-generated materials that are virtually identical to those produced by humans, in order to prevent the mass spread of <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://openai.com/research/forecasting-misuse">deception or disinformation.</a></p></li></ol><p>Utilizing public key infrastructure and human verification systems, PoP can provide a fundamental rate limit on accounts, preventing Sybil attacks. With valid human verification systems, even if a human or agent tried to get 100 bot accounts created, they would need 100 humans to consistently complete the verification.
This naturally reduces spam and Sybil attacks.</p><p>The second, and perhaps more critical, application of Proof of Personhood (PoP) systems lies in their ability to accurately differentiate between content generated by AI and that produced by humans. As highlighted in the data provenance section of our report, Europol has projected that AI-generated content might constitute up to 90% of online information in the coming years, posing a significant risk for the spread of misinformation. A prevalent instance of this issue is the creation and distribution of &apos;deepfake&apos; videos, where AI is utilized to craft highly convincing footage of individuals. This technology allows creators to promulgate false information while passing it off as legitimate and real. Essentially, intelligence tests will cease to serve as reliable markers of human origin. PoP endows users with the choice to engage with human-verified accounts, layering on social consensus around verifiability. This is akin to the existing filters on social media that allow users to select what content appears in their feeds. PoP offers a similar filter, but focused on verifying the human source behind content or accounts.
It also supports the creation of reputation frameworks that penalize the dissemination of false information, whether AI-crafted or otherwise.</p><p><strong>Worldcoin</strong></p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/46ed30d4b3a8c44419ce7ff72de3bc7c7bbd6df05d3dbdd73124ef757525fb7f.png" alt="Source: Worldcoin.org" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Worldcoin.org</figcaption></figure><p>Sam Altman, known for his role as the CEO of OpenAI, co-founded Worldcoin, a project aimed at providing a unique ledger of human identities through public key infrastructure. The <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://worldcoin.org/blog/engineering/humanness-in-the-age-of-ai">philosophy</a> is as follows: As AI is poised to generate significant prosperity and resources for society, it also presents the risk of displacing or augmenting numerous jobs, on top of blurring the lines between human and bot identities. To address these challenges, Worldcoin introduces two core concepts: a robust proof-of-personhood system to verify human identity and a universal basic income (UBI) for all. 
Distinctly, Worldcoin utilizes advanced biometric technology, specifically by scanning the iris with a specialized device known as “<a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://worldcoin.org/find-orb">the Orb</a>”, to confirm individual uniqueness.</p><p>As you can see from the graphic, Worldcoin works like this:</p><ol><li><p>Every Worldcoin participant downloads an application onto their mobile device that creates both a private and a public key.</p></li><li><p>They then go to the physical location of an “Orb”, which can be found <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://worldcoin.org/find-orb">here</a>.</p></li><li><p>The user stares into the camera of the Orb while simultaneously presenting it with a QR code that their Worldcoin application generates, which includes their public key.</p></li><li><p>The Orb examines the user&apos;s eyes with sophisticated scanning hardware and applies machine learning classifiers to ensure that the individual is a genuine human being and that the iris of the individual is unique and has not been recorded as a match to any other user already in the system.</p></li><li><p>Should both assessments be successful, the Orb signs a message, approving a unique hash derived from the user&apos;s iris scan.</p></li><li><p>The generated hash is then uploaded to a database.</p></li><li><p>The system does not retain complete iris scans, destroying these images locally. Instead, it stores only the hashes, which are utilized to verify the uniqueness of each user.</p></li><li><p>The user then receives their World ID.</p></li></ol><p>A holder of a World ID can then demonstrate their uniqueness as a human by creating a Zero-Knowledge Succinct Non-Interactive Argument of Knowledge (ZK-SNARK). 
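</p><p>To make the uniqueness check concrete, here is a toy sketch of steps 4-8 (our own simplification for illustration, not Worldcoin&apos;s actual code; the real system relies on ML classifiers and far more robust iris encoding): the registry stores only a hash derived from the scan, and enrollment is rejected if that hash already exists.</p>

```python
import hashlib

class OrbRegistry:
    """Toy model of Orb enrollment: store only iris hashes, never raw scans."""

    def __init__(self):
        self.iris_hashes = set()   # database of approved hashes (step 6)
        self.world_ids = {}        # public key -> iris hash record (step 8)

    def enroll(self, iris_template: bytes, public_key: str) -> bool:
        # Derive a unique hash from the scan; the scan itself is discarded (step 7).
        digest = hashlib.sha256(iris_template).hexdigest()
        if digest in self.iris_hashes:
            return False           # iris already registered -> duplicate rejected
        self.iris_hashes.add(digest)
        self.world_ids[public_key] = digest
        return True

registry = OrbRegistry()
assert registry.enroll(b"alice-iris-template", "pk_alice") is True
assert registry.enroll(b"alice-iris-template", "pk_alice_2") is False  # second account blocked
```

<p>A bare hash would not tolerate noisy rescans of the same iris, so the principle (one human, one entry) is what carries over rather than the mechanics. The published registry is also what the holder&apos;s ZK-SNARK is evaluated against. <p>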
This ZK-SNARK proves they possess the private key that matches a public key listed in the database, without disclosing the specific key they own.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/638f3365996e789e444af88e211a77f0211b9cb3d7df69070836e17d7caa5350.png" alt="Source: Worldcoin.org" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Worldcoin.org</figcaption></figure>]]></content:encoded>
            <author>asxn-labs@newsletter.paragraph.com (ASXN Labs)</author>
            <enclosure url="https://storage.googleapis.com/papyrus_images/19bd0057202dcbd0d6accc95491cad267b55fe59258579b99777954aa91ac418.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Decentralising AI]]></title>
            <link>https://paragraph.com/@asxn-labs/decentralising-ai</link>
            <guid>i8LDysGr3g98DyFK6dam</guid>
            <pubDate>Mon, 18 Mar 2024 22:07:27 GMT</pubDate>
            <description><![CDATA[Introduction to Crypto AI The influence of AI on our world is becoming increasingly evident across various aspects of daily life and industry. From enhancing the efficiency of operations in sectors such as healthcare, finance, and manufacturing to transforming the way we interact with technology through personal assistants and smart devices, AI&apos;s impact is profound. In our first report, we covered all the centralizing forces within model creation, culminating in increasing control amasse...]]></description>
            <content:encoded><![CDATA[<p><strong>Introduction to Crypto AI</strong></p><p>The influence of AI on our world is becoming increasingly evident across various aspects of daily life and industry. From enhancing the efficiency of operations in sectors such as healthcare, finance, and manufacturing to transforming the way we interact with technology through personal assistants and smart devices, AI&apos;s impact is profound. In our <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://mirror.xyz/0xC36A87666c505Fe695fc097d238725ff4d34877D/ziA4V0OULXjVQKeyoKebguXEIW4KEFuDJWT54k0sZSk">first report</a>, we covered all the centralizing forces within model creation, culminating in increasing control amassed by major AI providers such as OpenAI and Microsoft. Approaching from a philosophical perspective, AI embodies digital knowledge. Within the vast expanse of the digital domain, knowledge stands out as a prime candidate for decentralization. This article delves into the convergence of AI and cryptocurrency, exploring how permissionless, uncensorable networks for settlement and incentivization can foster the secure and democratic evolution of AI. 
Additionally, we will scrutinize how AI can contribute to the enhancement of cryptocurrency ecosystems, creating a symbiotic relationship that promotes growth and innovation in both fields.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/643e7b2c304f92bcfd4c70f3d7b12cb988c71accef017623b2873e6e9b955a05.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p><strong>Pre-Training</strong></p><p>As we have discussed extensively in part 1, within the pre-training phase of model generation we encounter multiple centralizing forces, namely:</p><ol><li><p>Closed-source data &amp; data access</p></li><li><p>Geographical centralization of resources</p></li><li><p>Resource costs</p></li><li><p>GPU supply-side exclusivity</p></li></ol><p>The collection of this data in order to train models is vital, however the following issues are prevalent:</p><ol><li><p><strong>Data Access</strong> - Major firms like Microsoft, Google, and OpenAI have superior data access through partnerships, their own user data, or the capability to establish extensive web scraping operations.</p></li><li><p><strong>Closed-Source Data</strong> - When Training models the data used requires careful consideration of bias.</p></li><li><p><strong>Data Provenance</strong> - Determining and verifying the source of data is becoming increasingly important, as it ensures the integrity and reliability of the data which is crucial when training a model.</p></li></ol><p><strong>Collection &amp; Preparation</strong></p><p>Up to <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" 
href="https://www.clickworker.com/customer-blog/data-preparation-for-ai/?ref=grass.ghost.io#:~:text=Data%20preparation%20is%20a%20key,and%20up%2Dto%2Ddate.">80%</a> of the effort in deploying AI models is dedicated to data preparation. This task becomes more time-consuming and complex with fragmented or unstructured data, with exporting and cleansing being the two critical steps in the process. The competitive landscape of AI is intensifying as major websites with investments or strategic partnerships in centralized AI entities take measures to safeguard their position by restricting smaller contenders&apos; access to vital data. These websites have adopted policies that effectively make data access prohibitively expensive, excluding all but the most well-funded AI laboratories. They frequently employ strategies such as blocking IP addresses from recognized data centers, and in some cases, they engage in intentional data poisoning—a tactic where companies deliberately corrupt shared data sources to disrupt their rivals&apos; AI algorithms.</p><p>Valid residential IP addresses and user-agents hold significant value, as they enable the collection of internet data in a way that ensures the retrieval of accurate information. Every ordinary internet user possesses this potential, and if leveraged collectively within a network, it could facilitate the extensive indexing of the web. This, in turn, would empower open-source and decentralized AI initiatives by providing them with the vast datasets necessary for training. 
The use of crypto incentives to accurately reward participation in this DePIN network can create a virtuous flywheel that can enable this network to compete with the likes of Google and Microsoft who are the only entities who have <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.cmcmarkets.com/en/opto/how-google-indexed-the-internet">indexed</a> the whole internet:</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/2a403e0b50b598fcfff764ffcce6d7d696af47baa10279db16e95d87eb8aeb2e.png" alt="Source: Messari" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Messari</figcaption></figure><p>The DePIN <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://messari.io/report/using-crypto-to-build-real-world-infrastructure">flywheel</a> works as follows:</p><ol><li><p>Participants contributing to the network&apos;s growth are motivated through inflationary token rewards, effectively subsidizing their efforts. These incentives are designed to bolster the network&apos;s early development until it can earn steady income from user fees.</p></li><li><p>The expansion of the network draws in developers and creators of products. Furthermore, the network&apos;s financial support for those who supply its services enables them to provide these services at lower costs, which in turn entices end users.</p></li><li><p>As end users start to pay for the services offered by the network, the income for both the providers and the network itself rises. 
This increase in revenue generates a positive feedback loop, drawing in additional providers and investors to the network.</p></li><li><p>Because the network is user-owned, value can be distributed back to users, typically via a token-burn model or through distribution of revenues. With these models, as the network becomes more useful and tokens are either removed from circulation in a burn model or staked by users, the value of the tokens tends to rise. This increase in token value further encourages more providers to join the network, perpetuating a beneficial cycle.</p></li></ol><p>Utilizing DePIN to compile publicly accessible data could address the problem of proprietary datasets in AI, which often embed biases in the resultant models. Training open-source AI models on data from such a network would enhance our ability to detect, assess, and correct biases. Currently, the opaqueness surrounding the datasets used for training AI models hinders our comprehension of the biases they may contain, compounded by the difficulty in contrasting models trained on diverse datasets. The creation of this decentralized data network could incentivise various contributors to provide datasets with clear provenance, while also enabling the tracking of how these datasets are utilized in both the initial training and subsequent fine-tuning of foundational AI models.</p><p><strong>Grass</strong></p><p>Grass is one such network focused on data acquisition, cleaning and provenance. Functioning similarly to traditional residential proxy services, Grass harnesses the untapped potential of users&apos; idle bandwidth for operations such as web scraping. By installing a Google Chrome application, users contribute to the Grass network whenever they are online.
This system repurposes any surplus bandwidth for designated tasks, such as the extraction of large text corpora. Utilizing residential proxies, the Grass network navigates around common obstacles such as rate limits, blocks, and <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.forcepoint.com/blog/x-labs/data-poisoning-gen-ai">data poisoning</a> attacks. This approach allows Grass to efficiently gather substantial volumes of online data in its intended format, optimizing the process of data extraction.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/96acb476116dae8723dabd1186a7594be7a9829cef619edf86e87b11637f06d3.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p>On top of enabling streamlined data acquisition, a key advantage of Grass lies in the compensation model: users are awarded the full value of their bandwidth contribution, rather than just a small portion of the proceeds.</p><p><strong>Data Provenance</strong></p><p>Led by AI, data generation is increasing exponentially, from 2 Zettabytes in 2010 to an estimated 175 Zettabytes in 2025. Forecasts predict a surge to over 2,000 Zettabytes by 2035, more than a tenfold increase over the following decade.
This is in part due to the creation of AI-generated content; in fact, a <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.europol.europa.eu/cms/sites/default/files/documents/Europol_Innovation_Lab_Facing_Reality_Law_Enforcement_And_The_Challenge_Of_Deepfakes.pdf">report</a> into deepfakes and AI-generated content by Europol estimated that AI-generated content could account for as much as 90% of information on the internet in a few years’ time, as ChatGPT, Dall-E and similar programs flood language and images into online space.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/432e5abac36529e3c1fd8c0e7fe352a1300093f51452895ba784069ff6057a17.png" alt="Source: Statista digital economy compass 2019" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Statista digital economy compass 2019</figcaption></figure><p>As discussed in part 1, the inherent biases in an AI model&apos;s outputs are often a reflection of the data on which it was trained. Consider the potential pitfalls of using data harvested by industrial-scale internet scrapers for pre-training: with the proliferation of AI-generated content online, there&apos;s a heightened risk of feeding models with inaccurate or skewed data. A clear manifestation of this issue is observed in Google&apos;s Gemini LLM, which, in its quest for equitable representation, might introduce historically inaccurate elements into generated content. For instance, it might produce an image of the founding fathers of America that includes a diverse range of ethnicities, diverging from historical accuracy. Therefore, the provenance of data is crucial in the training of models.
Currently, we are compelled to trust that the proprietary datasets large corporations employ for model training are both accurate and genuinely sourced, rather than being generated by another AI model.</p><p>However, there are a number of crypto projects on the market today that offer data provenance solutions. Decentralized data storage solutions, such as Filecoin, guarantee data provenance through the use of blockchain technology. This technology creates a clear, unchangeable ledger that records data storage, access, and any alterations over time, ensuring transparency and immutability in the data&apos;s history. By enabling individuals to offer their unused storage space, Filecoin creates a vast, decentralized network of data storage providers. Each transaction on Filecoin, from the initial agreement between data owners and storage providers to every instance of data access or modification, is permanently recorded on the blockchain. This creates an indelible and transparent history of data interactions, making it straightforward to track the data&apos;s storage, access, and movement. Furthermore, Filecoin employs cryptographic proofs to guarantee the integrity and immutability of the data stored within its network. Storage providers are required to periodically demonstrate, via verifiable proofs, that they are faithfully storing the data as per the contractual agreement, adding an extra layer of transparency and enhancing the overall security of the data stored. The clear data provenance and immutable ledger have started to attract a number of respected institutions, and we are seeing entities such as NASA, the University of California, the National Human Genome Research Institute and the National Library of Medicine utilize this storage solution.
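</p><p>The periodic storage proofs described above are, in Filecoin&apos;s case, Proof-of-Replication and Proof-of-Spacetime. The underlying challenge-response idea can be sketched in a deliberately simplified form (our own illustration, not Filecoin&apos;s actual protocol): before handing the data over, the client precomputes a few challenge/answer pairs, and a provider can only answer a fresh challenge if it still holds the data.</p>

```python
import hashlib
import secrets

def response(nonce: bytes, data: bytes) -> str:
    # The proof: hash of the challenge nonce concatenated with the stored data.
    return hashlib.sha256(nonce + data).hexdigest()

class Client:
    """Precompute challenge/answer pairs before uploading, then discard the data."""
    def __init__(self, data: bytes, num_challenges: int = 3):
        self.challenges = [(n := secrets.token_bytes(16), response(n, data))
                           for _ in range(num_challenges)]

    def audit(self, provider) -> bool:
        nonce, expected = self.challenges.pop()   # each challenge is used once
        return provider.prove(nonce) == expected

class Provider:
    def __init__(self, data: bytes):
        self.data = data
    def prove(self, nonce: bytes) -> str:
        return response(nonce, self.data)

data = b"genome-dataset-v1"
client = Client(data)
assert client.audit(Provider(data))                 # honest provider passes
assert not client.audit(Provider(b"corrupted"))     # provider without the data fails
```

<p>Filecoin replaces this bookkeeping with succinct cryptographic proofs, so anyone can verify storage on-chain without ever holding the data. <p>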
As a direct result of this institutional adoption, Filecoin is starting to facilitate more and more deals greater than 1,000 tebibytes in size.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/bf9a8893fcc9c2a9a1bb8ada3fabbe809bfa8f0e9893e99e5cb20ea43e5b6516.png" alt="Source: Filfox" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Filfox</figcaption></figure><p>On top of the benefits of immutable, censorship-resistant data provenance guarantees, we also find that onboarding the long-tail of storage devices from around the world drives the price of storage down, making decentralized storage solutions cheaper than the centralized alternatives. For example, storing 1TB of data for one month on Filecoin costs $0.19, whilst storing it on Amazon’s S3 costs $23 for the month, roughly 121x more. Due to these benefits we are starting to see decentralized storage solutions growing.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/c3cb6c6f14a5a1893487beceb360c7a310647754af88fb0a0083875303174e0b.png" alt="Source: Coingecko centralized vs decentralised storage costs" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Coingecko centralized vs decentralised storage costs</figcaption></figure><p><strong>Resource Allocation</strong></p><p>The availability of GPUs was already constrained before the advent of ChatGPT, and it wasn’t uncommon to see periods of heightened demand from use cases such as crypto mining.
Following the launch of ChatGPT and the Cambrian explosion of foundational models, the demand for GPUs has surged dramatically, possibly even a hundredfold. Rarely have we seen such a significant disparity between the demand for a resource and its available supply, even though the aggregate supply exceeds demand. If every GPU worldwide were capable of being organized and utilized for AI training today, we would be facing an excess rather than a shortage.</p><p>The long-tail of GPUs is scattered across various platforms and devices, often underutilized or used for purposes far less demanding than their full capabilities allow, for example:</p><ol><li><p><strong>Gaming PCs and Consoles</strong>: High-end GPUs underused outside gaming could support distributed computing or AI training.</p></li><li><p><strong>Corporate Workstations</strong>: Workstations with GPUs for creative tasks could be redirected for computational use during off-hours.</p></li><li><p><strong>Data Centers</strong>: Despite their capacity, some data centers have GPUs with spare capacity ideal for AI tasks.</p></li><li><p><strong>Academic Institutions</strong>: Universities with high-performance GPUs for research might not fully utilize them at all times, offering potential for broader use.</p></li><li><p><strong>Cloud Computing Platforms</strong>: These platforms sometimes have more GPU resources than needed, presenting opportunities for optimized utilization.</p></li><li><p><strong>Edge Devices</strong>: IoT and smart appliances have GPUs that, while less powerful individually, offer substantial collective processing power.</p></li><li><p><strong>Cryptocurrency Mining Rigs</strong>: Market downturns make some rigs less suitable for mining but still valuable for computational tasks.</p></li></ol><p>Previously, there was no incentivisation and coordination layer that could effectively manage this two-sided marketplace for compute, whilst also addressing the myriad of technical issues that must
be considered when selecting GPUs for training. These issues primarily arise from the distributed and non-uniform nature of the aggregation of long-tail GPUs:</p><ol><li><p><strong>Diverse GPU Capabilities for Varied Tasks</strong>: Graphics cards vary widely in design and performance capabilities, making some unsuitable for specific AI tasks. Success in this domain hinges on effectively pairing the right GPU resources with the corresponding AI workloads.</p></li><li><p><strong>Adapting Training Methods for Increased Latency</strong>: Currently, foundational AI models are developed using GPU clusters with ultra-low latency links. In a decentralized setup, where GPUs are distributed across various locations and connected over the public internet, latency can significantly rise. This situation presents a chance to innovate training methodologies that accommodate higher latency levels. Such adjustments could optimize the utilization of geographically dispersed GPU clusters.</p></li><li><p><strong>Security and Privacy</strong>: Utilizing GPUs across various platforms raises concerns about data security and privacy. Ensuring that sensitive or proprietary information is protected when processed on external or public GPUs is crucial.</p></li><li><p><strong>Quality of Service</strong>: Guaranteeing a consistent level of service can be challenging in a decentralized environment. Variability in GPU performance, network stability, and availability can lead to unpredictable processing times and outcomes.</p></li></ol><p>Networks leveraging crypto-incentives to coordinate the development and operation of this  essential infrastructure can achieve greater efficiency, resilience, and performance (although not quite yet) compared to their centralized counterparts. 
Although nascent, we can already see the benefits of onboarding the long-tail of GPU power into Decentralised Physical Infrastructure Networks (DePIN):</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/da4ac7c29945841f214717a27a0de8de39f451c1350e41768c5a910eaa17619c.png" alt="Source: Messari" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Messari</figcaption></figure><p>The costs to run some of the top-performance GPUs on DePIN networks are 60-80% cheaper than their centralized counterparts.</p><p>It’s still early within the generalized compute marketspace, and despite the lower costs to utilize this infrastructure, we are seeing growing pains in terms of performance and uptime. Nevertheless, the demand for GPUs has become apparent, with Akash’s daily spend increasing by 20.32x since their GPU market went live in late August 2023.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/da89d5673574a85362fb0ab558ae75ee015b6a187665c8bd489da53efaedc157.png" alt="Source: Akash" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Akash</figcaption></figure><p>To facilitate this massive increase in demand, Akash’s GPU capacity has had to scale quickly:</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/8e7901a715155c8f2f02cbfb661cd8f8fbe10ef5229e3fe6e6049ab725deee91.png" alt="Source: Akash" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600"
nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Akash</figcaption></figure><p>The only way to compete with the centralized, monopolistic corporations and their ability to spend billions on compute power each year to improve their models is to harness the power of DePIN networks, which provide decentralized, permissionless access to compute. Crypto incentives enable software to pay for the hardware without a central authority. The very dynamics that have allowed the Bitcoin network to become <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://x.com/CathieDWood/status/1748207073257501009?s=20">“the largest computer network in the world , a network orders of magnitude larger than the combined size of the clouds that Amazon, Google, and Microsoft have built over the last 15-20 years</a>”, can allow decentralized and open-source AI to compete with centralized incumbents.</p><p><strong>Io.net</strong></p><p>Another example of the DePIN thesis is Io.net. The platform aggregates a diverse array of GPUs into a communal resource pool, accessible to AI developers and businesses, with a stated mission of “putting together one million GPUs” into a single network. Io.net leverages token incentives to fundamentally decrease the expenses associated with acquiring and retaining supply-side resources, thereby diminishing costs for end consumers. Presently, this network is fueled by thousands of GPUs sourced from data centers, mining operations, and consumer-level hardware, and has served over 62,000 compute hours.</p><p>While pooling these resources presents significant value, AI workloads can&apos;t seamlessly transition from centralized, high-end, low-latency hardware to distributed networks of heterogeneous GPUs. The challenge here lies in efficiently managing and allocating tasks across a wide variety of hardware, each with its own memory, bandwidth, and storage specifications.
Io.net implements ‘clustering’ by overlaying custom-designed networking and orchestration layers on top of distributed hardware, effectively activating and integrating them in order to perform ML tasks. Utilizing Ray, Ludwig, Kubernetes, and other open-source distributed computing frameworks, the network enables machine learning engineering and operations teams to effortlessly scale their projects across an extensive network of GPUs with only minor modifications needed. We believe the limited demand for compute networks like Render and Akash is primarily attributed to their model of renting out single GPU instances. This approach leads to slower and less efficient machine learning training, hindering their attractiveness to potential users seeking robust computational resources.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/93345a89d6ca9b12e58bc3d3a39cfabb6ac167946216b2b5747aa50823f361e1.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p>The IO Cloud is engineered to streamline the deployment and management of decentralized GPU clusters, known as Io workers, on demand. By creating on-demand clusters, machine learning teams can effectively distribute their workloads across io.net&apos;s GPU network. This system utilizes advanced libraries to address the complexities of orchestration, scheduling, fault tolerance, and scalability, ensuring a more efficient operational workflow. 
At its core, IO Cloud employs the Ray distributed computing Python framework, a solution that has been rigorously tested and adopted by OpenAI for training cutting-edge models like GPT-3 and GPT-4 on over 300,000 servers.</p><p><strong>Conclusion</strong></p><p>DePIN employs token incentives to significantly reduce the costs involved in acquiring and maintaining supply-side resources, which in turn lowers the expenses for end consumers, creating a virtuous flywheel effect that enables the network to expand rapidly. For decentralized AI to rival the efficiency of centralized alternatives, DePIN networks are essential. However, in their present development stage, these networks face challenges with reliability, including susceptibility to downtime and software bugs.</p><p><strong>Fine-Tuning</strong></p><p>During the fine-tuning phase, the model&apos;s parameters are established. To summarize Part 1, we observe several centralizing influences resulting from the proprietary nature of this phase:</p><ol><li><p><strong>Censorship</strong> - Owners have the authority to determine the kinds of content that the model creates or handles.</p></li><li><p><strong>Bias</strong> - The data and preferences owners select during training and tuning embed their own biases into the model&apos;s outputs.</p></li><li><p><strong>Verifiability</strong> - In a proprietary setting, users cannot verify whether the stated version of a model is genuinely the one in operation.</p></li><li><p><strong>Dependency and Lock-in</strong> - Entities using proprietary AI platforms or models become dependent on the controlling corporations, fostering a monopolistic power dynamic that hampers open innovation.</p></li><li><p><strong>RLHF</strong> - Refining models with RLHF demands skilled labor, typically concentrated in wealthy organizations that can pay for top talent, giving them a competitive advantage in model enhancement.</p></li></ol><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img 
src="https://storage.googleapis.com/papyrus_images/abfbcb38e16b36cb0715f978ad5bdddd3f88110e32529b4f06a7d00b05f9dc71.png" alt="Source: Tech Target" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Tech Target</figcaption></figure><p>In early March 2023, the open-source community gained access to its first highly capable foundational model when Meta&apos;s LLaMA was unexpectedly <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.vice.com/en/article/xgwqgw/facebooks-powerful-large-language-model-leaks-online-4chan-llama">leaked</a> to the public. Despite lacking instruction or conversation tuning, as well as RLHF, the community quickly grasped its importance, sparking a wave of innovation with significant advancements occurring within days of each other. The open-source community created variations of the model enhanced with <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://crfm.stanford.edu/2023/03/13/alpaca.html">instruction tuning</a>, <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://github.com/ggerganov/llama.cpp">quantization</a>, improved <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://lmsys.org/blog/2023-03-30-vicuna/">quality</a>, <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://arxiv.org/pdf/2303.16199.pdf">human evaluations</a>, <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://arxiv.org/pdf/2303.16199.pdf">multimodality</a>, <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://drive.google.com/file/d/10iR5hKwFqAKhL3umx8muOWSRm7hs5FqX/view">RLHF</a>, among other improvements, with many 
developments building upon the previous ones. An internal memo by a Google researcher, which was leaked, eloquently details the future of AI and the struggles of closed-source development. Below is a concise excerpt:</p><blockquote><p>“<strong><em>We Have No Moat</em></strong></p><p><strong><em>And neither does OpenAI</em></strong></p><p><em>We’ve done a lot of looking over our shoulders at OpenAI. Who will cross the next milestone? What will the next move be?</em></p><p><em>But the uncomfortable truth is, we aren’t positioned to win this arms race and neither is OpenAI. While we’ve been squabbling, a third faction has been quietly eating our lunch.</em></p><p><em>I’m talking, of course, about </em><strong><em>open source</em></strong><em>. Plainly put, they are lapping us. Things we consider “major open problems” are solved and in people’s hands today. Just to name a few:</em></p><p><strong><em>LLMs on a Phone</em></strong>: <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://twitter.com/thiteanish/status/1635678053853536256">People are running foundation models on a Pixel 6</a> at 5 tokens / sec.</p><p><strong><em>Scalable Personal AI</em></strong>: You can <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://github.com/tloen/alpaca-lora">finetune</a> a personalized AI on your laptop in an evening.</p><p><strong><em>Responsible Release</em></strong>: This one isn’t “solved” so much as “obviated”. 
There are <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://civitai.com/">entire websites </a>full of art models with no restrictions whatsoever, and text is <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://sungkim11.medium.com/list-of-open-sourced-fine-tuned-large-language-models-llm-8d95a2e0dc76">not far behind</a>.</p><p><strong><em>Multimodality</em></strong>: The current multimodal ScienceQA SOTA was <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://arxiv.org/pdf/2303.16199.pdf">trained</a> in an hour.</p><p><em>While our models still hold a slight edge in terms of quality, the gap is </em><a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://arxiv.org/pdf/2303.16199.pdf"><em>closing astonishingly quickly</em></a><em>. Open-source models are faster, more customizable, more private, and pound-for-pound more capable. They are doing things with </em><a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://lmsys.org/blog/2023-03-30-vicuna/"><em>$100 and 13B params</em></a><em> that we struggle with at $10M and 540B. And they are doing so in weeks, not months. This has profound implications for us:</em></p><p><em>We have no secret sauce. Our best hope is to learn from and collaborate with what others are doing outside Google. We should prioritize enabling 3P integrations.</em></p><p><em>People will not pay for a restricted model when free, unrestricted alternatives are comparable in quality. We should consider where our value add really is.</em></p><p><em>Giant models are slowing us down. In the long run, the best models are the ones</em></p><p><em>which can be iterated upon quickly. 
We should make small variants more than an afterthought, now that we know what is possible in the &lt;20B parameter regime…</em></p><p><strong><em>Directly Competing With Open Source Is a Losing Proposition</em></strong></p><p><em>This recent progress has direct, immediate implications for our business strategy. Who would pay for a Google product with usage restrictions if there is a free, high quality alternative without them?</em></p><p><em>And we should not expect to be able to catch up. </em><a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://openuk.uk/wp-content/uploads/2021/07/State-of-Open-Phase-Two.pdf"><em>The modern internet runs on open source</em></a><em> for a reason. Open source has some significant advantages that we cannot replicate.</em></p><p><strong><em>We need them more than they need us</em></strong></p><p><em>Keeping our technology secret was always a tenuous proposition. Google researchers are leaving for other companies on a regular cadence, so we can assume they know everything we know, and will continue to for as long as that pipeline is open.</em></p><p><em>But holding on to a competitive advantage in technology becomes even harder now that cutting edge research in LLMs is affordable. Research institutions all over the world are building on each other’s work, exploring the solution space in a breadth-first way that far outstrips our own capacity. We can try to hold tightly to our secrets while outside innovation dilutes their value, or we can try to learn from each other.”</em></p></blockquote><p>Open source models offer transparent innovation whilst helping to address issues of censorship and bias inherent in proprietary models. With model parameters and code accessible to everyone, foundational models can be executed and modified by anyone, allowing for the removal of embedded censorship. 
Additionally, the transparency of open parameters makes it simpler to identify and mitigate any underlying biases in the model and training method. The nature of open source models enables extensive <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard">comparisons</a>.</p><p><strong>Verifiability</strong></p><p>We also discussed the crucial issue of our reliance on centralized model providers to accurately process our prompts and, often, to handle our private data securely. Crypto has been at the forefront of pioneering cryptographic proof systems like Zero Knowledge Proofs (ZKP), Optimistic Proofs (OP) and Trusted Execution Environments (TEE), enabling the verification of claims without exposing the underlying data, thus safeguarding privacy while also offering solutions for scaling. Cryptographic proofs enable users to transact without revealing their identities, ensuring privacy, and facilitate scaling by allowing the offloading of computationally intensive tasks to an auxiliary layer, such as rollups, or off-chain. Simultaneously, they provide a proof on-chain that the correct procedure was adhered to (TEEs instead rely on hardware attestation rather than an on-chain proof).</p><p>Machine learning and AI models are notoriously computationally heavy, and thus it would be prohibitively expensive to run these models inside smart contracts on-chain. 
Through the development of various proving systems, models can be utilized in the following manner:</p><ol><li><p>A user submits a prompt to a specific LLM / model.</p></li><li><p>This request for inference is relayed off-chain to a GPU / computer that enters the prompt through the desired model.</p></li><li><p>The inference is then returned to the user, alongside a cryptographic proof verifying that the prompt was run through the specified model.</p></li></ol><p>Verifiability is currently lacking in the AI industry, and as AI models become increasingly integrated into our lives and work, it is essential that we have the ability to verify the authenticity of their outputs. Take, for example, sectors like healthcare and law, where AI models assist in diagnosing diseases or analyzing legal precedents: the inability of professionals to verify a model&apos;s source and accuracy can foster mistrust or lead to errors. For healthcare providers, not knowing if an AI&apos;s recommendations are based on the most reliable model could adversely affect patient care. Similarly, for lawyers, uncertainty about an AI&apos;s legal analysis being up-to-date could compromise legal strategies and client outcomes. Conversely, if a user wishes to utilize a model with their data while keeping it private from the model provider due to confidentiality concerns, they can process their data with the model independently, without disclosing the data, and then confirm the accurate execution of the desired model by providing a proof.</p><p>The result of these verifiability systems is that models can be integrated into smart contracts whilst preserving the robust security assumptions of running these models on-chain. 
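</p><p>The three-step flow above can be sketched end to end. In this toy version, a hash commitment over the tuple (model id, prompt, output) stands in for a real zkML or opML proof, and the model is a hypothetical deterministic stand-in:</p>

```python
import hashlib, json

# Toy verifiable-inference flow mirroring the three steps above.
# A hash commitment over (model_id, prompt, output) stands in for a
# real zkML / opML proof, and run_model is a hypothetical stand-in
# for an LLM served off-chain.

MODEL_ID = "toy-llm-v1"

def run_model(prompt: str) -> str:
    return prompt.upper()                     # hypothetical "model"

def commit(model_id: str, prompt: str, output: str) -> str:
    blob = json.dumps([model_id, prompt, output]).encode()
    return hashlib.sha256(blob).hexdigest()

def prove_inference(prompt: str) -> dict:
    """Off-chain worker: run the model, attach the commitment."""
    output = run_model(prompt)
    return {"model_id": MODEL_ID, "output": output,
            "proof": commit(MODEL_ID, prompt, output)}

def verify_inference(prompt: str, result: dict) -> bool:
    """User or smart contract: recompute and compare the commitment."""
    return result["proof"] == commit(result["model_id"], prompt,
                                     result["output"])

result = prove_inference("gm")
print(result["output"], verify_inference("gm", result))  # GM True
```

<p>A hash commitment only binds the claimed values; a genuine zkML or opML proof additionally attests that the stated model actually performed the computation.</p><p>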
The benefits are multi-faceted:</p><ol><li><p>The model provider is able to keep their model private if they wish.</p></li><li><p>A user can verify the model was run correctly.</p></li><li><p>Models can be integrated into smart contracts, helping bypass the scalability issues present in blockchains today.</p></li></ol><p>Currently, these crypto proving systems are largely in a developmental phase, with the majority of initiatives concentrated on establishing foundational components and creating initial demonstrations. The primary obstacles encountered at present encompass high computational expenses, constraints on memory capacity, intricate model designs, a scarcity of specialized tools and underlying frameworks, and a shortage of skilled developers. The industry&apos;s preference among these cryptographic verification systems (zkML, opML, TEE) is still being decided, but we are starting to see TEEs vastly outperform the currently computationally intensive and expensive zk proofs. Marlin gave a good overview of the trade-offs in their talk at ETHDenver:</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/082c634a829cb0430e4a829817c879f6f488bf81dd0cee336ab933e08e46c843.png" alt="Source: Marlin Protocol" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Marlin Protocol</figcaption></figure><p>The potential for verification to enhance current model architecture is clear, yet these systems also hold the promise of improving the user experience in cryptocurrencies through the facilitation of upgradability and the implementation of dynamic contracts. As it stands, the functionality of smart contracts is significantly limited by their dependence on preset parameters, necessitating manual updates to ensure their continued effectiveness. 
Typically, this manual updating process involves either bureaucratic governance procedures or compromises decentralization by granting undue autonomy to the smart contract owner. For instance, the updating of risk parameters in Aave is managed through political and bureaucratic governance votes alongside risk teams, a method that has proven to be inefficient as evidenced by <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://governance.aave.com/t/gauntlet-is-leaving-aave/16700">Gauntlet’s departure</a> from Aave. Integrating AI into smart contracts has the potential to revolutionize their management, modification, and automatic enhancement. In the example of Aave, implementing AI agents for the adaptation of risk parameters in response to evolving market conditions and risks could significantly optimize the process, offering a more efficient and timely alternative to the often slow and cumbersome adjustments made by humans or DAOs.</p><p><strong>Ritual</strong></p><p>Ritual is conceptualized as a decentralized, flexible, and sovereign platform for executing AI tasks. It integrates a distributed network of nodes, each equipped with computational resources, and allows AI model developers to deploy their models, including both LLMs and traditional machine learning models, across these nodes. Users can access any model within the network through a unified API, benefitting from additional cryptographic measures embedded in the network. These measures provide assurances of computational integrity and privacy (zkML, opML &amp; TEE).</p><p>Infernet represents the inaugural phase in the evolution of Ritual, propelling AI into the realm of on-chain applications by offering robust interfaces for smart contracts to utilize AI models for inference tasks. 
It ensures unrestricted access to a vast network of model and compute providers, marking a significant step towards democratizing AI and computational resources for on-chain applications and smart contracts.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/7ed346eec7f587471f7f9d1c302c66aa09db34db120d0a32d55cf914eb6c8da4.png" alt="Source: Ritual.net" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Ritual.net</figcaption></figure><p>The overarching aim for Ritual is to establish itself as the ‘AI coprocessor’. This involves advancing Infernet into a comprehensive suite of execution layers that seamlessly interact with the foundational infrastructure across the ecosystem. The goal is to enable protocols and applications on any blockchain to leverage Ritual as an AI coprocessor, facilitating widespread access to advanced AI functionalities across crypto.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/9e65192b2891b0096c2f43edcce68652ef452055c22879b211833e8c11d593dd.png" alt="The Ritual Superchain" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">The Ritual Superchain</figcaption></figure><p><strong>Privacy</strong></p><p>The foundational element of crypto is encryption and cryptography, which can be utilized in numerous ways to facilitate privacy-preserving interactions with AI. Verifiable AI will empower users to retain ownership of their data, restricting third-party access. Despite advancements, privacy concerns persist because data is not encrypted at all stages of processing. 
Currently, the standard practice involves encrypting messages between a user&apos;s device and the provider&apos;s server. However, this data is decrypted on the server to enable the provider to utilize it, such as running models on user data, exposing it to potential privacy risks. There is therefore a large risk of exposing sensitive information to the LLM service provider, and in critical sectors like healthcare, finance, and law, such privacy risks are substantial enough to halt progress and adoption.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/d7c961921e76153080a008299983a1e120d1d6be36235e4f3356c651cb49550c.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p>These privacy concerns are evident in IBM&apos;s 2023 Security Report, which reveals a notable rise in the incidence of healthcare data breaches throughout the decade:</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/64b540aa7d70e5221e052bbdc3bd5b0915d7093be1f923fd87e5b7d282691345.png" alt="Source: Zama" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Zama</figcaption></figure><p>Crypto is now pioneering a new standard for data transfer and processing, called Fully Homomorphic Encryption (FHE). FHE allows data to be processed without ever being decrypted. This innovation ensures that companies and individuals can provide their services while keeping users&apos; data completely private, with no discernible impact on functionality for the user. 
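</p><p>Production FHE schemes such as Zama&apos;s TFHE support arbitrary computation on ciphertexts; the underlying idea, computing on data that is never decrypted, can be shown with the much simpler, additively homomorphic Paillier scheme (toy-sized, insecure primes, for illustration only):</p>

```python
import math, random

# Minimal Paillier cryptosystem: additively homomorphic, so
# Enc(a) * Enc(b) mod n^2 decrypts to a + b. The primes are toy-sized
# and insecure; full FHE schemes (e.g. TFHE) go further, supporting
# arbitrary computation on ciphertexts.

p, q = 293, 433
n, n2 = p * q, (p * q) ** 2
g = n + 1
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)          # with g = n + 1, L(g^lam) = lam mod n

def encrypt(m: int) -> int:
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    x = pow(c, lam, n2)
    return ((x - 1) // n * mu) % n   # L(x) = (x - 1) / n, then unblind

a, b = encrypt(5), encrypt(7)
total = (a * b) % n2                  # addition happens on ciphertexts
print(decrypt(total))                 # 12, computed without decrypting 5 or 7
```

<p>Here the party holding only ciphertexts obtains an encryption of the sum without ever seeing 5 or 7; schemes like TFHE extend this principle to arbitrary circuits, including model inference.</p><p>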
With FHE, data remains encrypted not only while it is being transmitted but also throughout the processing phase. This advancement extends the possibility of end-to-end encryption to all online activities, not just message transmission. In the context of AI, FHE will enable the best of both worlds: protection of both the privacy of the user and the IP of the model.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/5ad2fe17f522fb55b67ddc3edc201b4fd7ec2360076d1cfcca45bdc7081fa6a0.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p>FHE allows functions to be executed on data while it remains encrypted. A demonstration by <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.zama.ai/">Zama</a> has shown that an LLM model, when implemented with FHE, preserves the accuracy of the original model&apos;s predictions, whilst keeping data (prompts, answers) encrypted throughout the whole process. Zama modified the GPT-2 implementation from the Hugging Face Transformers library; specifically, parts of the inference were restructured using <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.zama.ai/post/zama-concrete-fully-homomorphic-encryption-compiler">Concrete-Python</a>. This tool facilitates the transformation of Python functions into their FHE counterparts, ensuring secure and private computation without compromising on performance.</p><p>In part 1 of our thesis, we analyzed the structure of the GPT models, which broadly consist of a sequence of multi-head attention (MHA) layers applied consecutively. 
Each MHA layer utilizes the model&apos;s weights to project the inputs, executes the attention mechanism, and then re-projects the attention&apos;s output into a novel tensor.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/4db3ef663f7a091034a003a91402a8802bc9bdc538eb7849476b23e368e6c8bc.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p>Zama&apos;s <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.zama.ai/post/tfhe-deep-dive-part-1">TFHE</a> approach encodes both model weights and activations as integers. By employing <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://docs.zama.ai/tfhe-rs/how-to/parallelized_pbs">Programmable Bootstrapping (PBS)</a> and simultaneously refreshing ciphertexts, it enables arbitrary computations. This method allows for the encryption of any component or the entirety of LLM computations within the realm of Fully Homomorphic Encryption.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/a9533628780d6e5f1afc9e4ade49a287a9bb04ead2279b4aebce04b0dd993051.png" alt="Source: Zama" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Zama</figcaption></figure><p>After converting these weights and activations, Zama enables the model to be run fully encrypted, meaning the server can never see a user&apos;s data or inputs. 
The above graphic from Zama shows a basic implementation of FHE in LLMs:</p><ol><li><p>A user begins the inference process on their local device, stopping just after the initial layer, which is omitted from the model shared with the server.</p></li><li><p>The client then encrypts these intermediate operations and forwards them to the server. The server processes a portion of the attention mechanism on this encrypted data, and sends the results back to the client.</p></li><li><p>Upon receipt, the client decrypts these results to proceed with the inference locally.</p></li></ol><p>A more concrete example may be the following: <em>In a scenario where a potential borrower applies for a loan, the bank is tasked with assessing the applicant&apos;s creditworthiness while navigating privacy regulations and concerns that restrict direct access to the borrower&apos;s detailed financial history. To address this challenge, the bank could adopt an FHE scheme, enabling secure, privacy-preserving computations.</em></p><p><em>The applicant agrees to share their financial data with the bank under the condition that privacy safeguards are in place. The borrower encrypts their financial data locally, ensuring its confidentiality. This encrypted data is then transmitted directly to the bank, which is equipped to run sophisticated credit assessment algorithms &amp; AI models on the encrypted data within its computing environment. As the data remains encrypted throughout this process, the bank can conduct the necessary analyses without accessing the applicant&apos;s actual financial information. 
This approach also safeguards against data breaches, as hackers would not be able to decrypt the financial data without the encryption key, even if the bank&apos;s servers were breached.</em></p><p><em>Upon completing the analysis, the user decrypts the resulting encrypted credit score and insights, thus gaining access to the information without compromising the privacy of their financial details at any point. This innovative method ensures the protection of the applicant&apos;s financial records at every step, from the initial application through to the final credit assessment, thereby upholding confidentiality and adherence to privacy laws.</em></p><p><strong>Incentivised RLHF</strong></p><p>Within part one of our thesis, we highlighted the centralizing forces within the RLHF stage of fine-tuning, namely the aggregation of specialized labor within a few large companies due to the compensation they can provide. Crypto economic incentives have proven valuable in creating positive feedback loops and engaging top tier talent toward a common goal. An example of this tokenized RLHF has been Hivemapper, which aims to use crypto economic incentives to accurately map the entire world. The Hivemapper Network, launched in November 2022, rewards participants who dedicate their time to refining and curating mapping information, and has since mapped 140 million kilometers across over 2,503 distinct regions. Kyle Samani <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://multicoin.capital/2023/06/02/the-convergence-of-crypto-and-ai-four-key-intersections/">highlights</a> that tokenized RLHF starts to make sense in the following scenarios:</p><ul><li><p><strong>When the model targets a specialized and niche area rather than a broad and general application</strong>. Individuals who rely on RLHF for their primary income, and thus depend on it for living expenses, will typically prefer cash payments. 
As the focus shifts to more specialized domains, the demand increases for skilled workers, who may have a vested interest in the project&apos;s long-term success.</p></li><li><p><strong>When the individuals contributing to RLHF have a higher income from sources outside of RLHF activities</strong>. Accepting compensation in the form of non-liquid tokens is viable only for those with adequate financial stability from other sources to afford the risk associated with investing time in a specific RLHF model. To ensure the model&apos;s success, developers should consider offering tokens that vest over time, rather than immediately accessible ones, to encourage contributors to make decisions that benefit the project in the long run.</p></li></ul><p><strong>Inference</strong></p><p>In the inference phase, the deployment platform delivers the inference to the end user via on-premise servers, cloud infrastructure, or edge devices. As previously mentioned, there is a centralization in both the geographic location of the hardware and its ownership. With daily operational costs reaching hundreds of thousands of dollars for the most popular deployment platforms, most corporations find themselves priced out, and thus the serving of models consolidates among a few providers. Similarly to the pre-training phase, DePIN networks can be utilized to serve inferences on a large scale, offering multiple advantages:</p><ol><li><p><strong>User ownership</strong> - DePINs can be used to serve inference by connecting and coordinating compute across the globe. The ownership of the network and the subsequent rewards flow to the operators of this network, who are also the users. 
DePIN enables the collective ownership of the network by its users, avoiding the misalignment of incentives we historically find in web2 operations.</p></li><li><p><strong>Crypto economic incentives</strong> - Crypto economic incentives such as block rewards or rewards for proof of work enable the network to function with no central authority, and accurately incentivise and compensate work that benefits the network.</p></li><li><p><strong>Reduced costs</strong> - Onboarding the long tail of GPUs across the globe can greatly reduce the costs of inference, as seen in the price comparisons between decentralized compute providers and their centralized counterparts.</p></li></ol><p><strong>Decentralized frontends</strong></p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/c4b8ad6068ef73a05924cd9fad7e8abddf2cd0b8596dea2ee61292d0a4c19bf5.png" alt="Source: Marlin protocol" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Marlin protocol</figcaption></figure><p>The underlying code of smart contracts is executed on a decentralized peer-to-peer network; however, the primary way users interact with these contracts is through frontends hosted on centralized servers. This centralization presents multiple challenges, such as vulnerability to Distributed Denial of Service (DDoS) attacks, the possibility of domain name DDoS or malicious takeovers, and, most importantly, censorship by corporate owners or nation states. 
Similarly, the dominance of centralized frontends in the current AI landscape raises concerns, as users can be <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.semafor.com/article/10/20/2023/ai-platform-hugging-face-confirms-china-blocked-it">restricted</a> from accessing this pivotal technology. When developing community-owned, censorship-resistant front ends that facilitate worldwide access to smart contracts and AI, it&apos;s crucial to take into account the geographical distribution of nodes for data storage and transmission. Equally important is ensuring proper ownership and access control over this data. There are a number of protocols and crypto systems that can be used to enable this:</p><p><strong>IPFS</strong></p><p>The Interplanetary File System (IPFS) is a decentralized, content-addressable network that allows for the storage and distribution of files across a peer-to-peer network. In this system, every file is hashed, and this hash serves as a unique identifier for the file, enabling it to be accessed from any IPFS node using the hash as a request. This design is aimed at supplanting HTTP as the go-to protocol for web application delivery, moving away from the traditional model of storing web applications on a single server to a more distributed approach where files can be retrieved from any node within the IPFS network. Whilst a good alternative to the status quo, webpages hosted on IPFS and connected via DNSLink rely on gateways, which may not always be secure or operate on a trustless basis. 
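</p><p>The content addressing just described can be sketched in a few lines: a file&apos;s identifier is simply a hash of its bytes, so any untrusted node can serve it and the requester can verify integrity. (A simplification: real IPFS CIDs also encode multihash and codec metadata, and large files become Merkle DAGs of chunks.)</p>

```python
import hashlib

# Toy content-addressed store: the identifier IS the hash of the
# content, so any untrusted node can serve a file and the requester
# can verify what it received. Real IPFS CIDs also carry multihash /
# codec metadata and split large files into Merkle DAGs of chunks.

store = {}  # stands in for the peer-to-peer network of nodes

def add(content: bytes) -> str:
    cid = hashlib.sha256(content).hexdigest()
    store[cid] = content
    return cid

def get(cid: str) -> bytes:
    content = store[cid]
    # Integrity check: re-hash whatever the node handed back.
    if hashlib.sha256(content).hexdigest() != cid:
        raise ValueError("node returned tampered content")
    return content

frontend = b"<html>decentralized frontend</html>"
cid = add(frontend)
assert get(cid) == frontend   # retrievable from any node, verifiably intact
```

<p>Because the identifier commits to the bytes themselves, tampering by a gateway or node is detectable by anyone holding the hash.</p><p>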
Webpages hosted on IPFS are also limited to static HTML sites.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/7a6b91a897ef489f5b66cdc8f5d3d859673fc1c06a62792072bc67bfe7dcc013.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p><strong>3DNS &amp; Marlin</strong></p><p>The advent of <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://3dns.box/">3DNS</a> introduces the concept of ‘tokenized domains’ managed directly on the blockchain, presenting a solution to several decentralization challenges. This innovation allows smart contracts, and by extension DAOs, to oversee domain management. One of the primary benefits of managing DNS records on the blockchain is enhanced access control. With this system, only keys with explicit authorization can modify DNS records, effectively mitigating risks such as insider interference, database breaches, or email compromises at the DNS provider level. However, the domain must still be linked to an IP address, which a hosting provider can change at their discretion. Consequently, any alteration in the IP address of the server hosting the frontend necessitates a DAO vote to update the records, a cumbersome process.</p><p>To address this, there&apos;s a need for a deployment method that enables smart contracts to autonomously verify the server&apos;s codebase and only then redirect DNS records to the host, contingent upon the code aligning with an approved template. This is where Trusted Execution Environments (TEEs), such as <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.marlin.org/whitepaper">Marlin’s</a> Oyster, come into play. 
TEEs create a secure enclave for executing code, shielded from any data/code modifications or access by the host machine, and enable the verification of the code&apos;s integrity against its intended version through attestations.</p><p>This framework allows for the implementation of Certificate Authority Authorization (CAA) records, managed by a DNS admin contract, to ensure that only enclaves executing approved software can request domain certificates. This mechanism guarantees that any data received by users visiting the domain is authenticated, emanating exclusively from the authorized application, thereby certifying its integrity and safeguarding against tampering.</p><p><strong>Public Key Infrastructure Problem</strong></p><p>During the 1990s and early 2000s, cryptographers and computer scientists extensively theorized about the vast benefits and innovations that Public Key Infrastructure (PKI) could bring forth. PKI represents a sophisticated framework that is pivotal for bolstering security across the internet and intranets, facilitating secure network transactions including e-commerce, internet banking, and the exchange of confidential emails through robust encryption and authentication mechanisms. Utilizing asymmetric cryptography, also known as public-key cryptography, as its foundational security mechanism, this approach involves the use of two keys: a public key, which can be shared openly, and a private key, which is kept secret by the owner. The public key is used for encrypting messages or verifying digital signatures, while the private key is used for decrypting messages or creating digital signatures. 
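The keypair mechanics just described can be illustrated with textbook RSA over deliberately tiny primes. This is a toy sketch of the sign/verify flow only, not a secure or real-world implementation:

```python
import hashlib

# Textbook RSA with tiny primes: illustrates the public/private key split,
# NOT a secure scheme (real systems use 2048-bit+ keys and padding).
p, q = 61, 53
n = p * q                 # public modulus (3233)
phi = (p - 1) * (q - 1)   # 3120
e = 17                    # public exponent, shared openly
d = pow(e, -1, phi)       # private exponent, kept secret (modular inverse of e)

def sign(message: bytes) -> int:
    # Hash the message, then apply the private key.
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    return pow(digest, d, n)

def verify(message: bytes, signature: int) -> bool:
    # Anyone holding only the public key (n, e) can check the signature.
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    return pow(signature, e, n) == digest

sig = sign(b"approve transaction")
assert verify(b"approve transaction", sig)
```

The asymmetry is the whole point: `d` never leaves the owner, yet anyone can verify what it produced, which is exactly the property wallet signatures rely on.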
Central to the PKI system is the generation of a pair of keys, a public key and a private key, which enables the encrypted transmission of data, thereby safeguarding privacy and ensuring that only authorized individuals can access the information.</p><p>For PKI to operate effectively, it is imperative for users to maintain their private keys in a manner that is both secure and accessible. &quot;Secure&quot; in this context means that the private key is stored in a private manner, exclusively accessible to the user. &quot;Accessible&quot; implies that the user can easily and frequently retrieve their private key when needed. The challenge of PKI lies in achieving this delicate balance. For instance, a user might secure their private key by writing it down, storing it in a locked box, and then misplacing the box, akin to storing it in a highly secure location but then forgetting about it. This scenario compromises accessibility. Conversely, storing the private key in a highly accessible location, such as on a public website, would render it insecure, as it could be exploited by unauthorized users. This conundrum encapsulates the fundamental PKI problem that has hindered the widespread adoption of public key infrastructure.</p><p>Cryptocurrency has addressed the PKI dilemma through the implementation of direct incentives, ensuring that private keys are both secure and readily accessible. If a private key is not secure, the associated crypto wallet risks being compromised, leading to potential loss of funds. Conversely, if the key is not accessible, the owner loses the ability to access their assets. Since Bitcoin&apos;s introduction, there has been a steady, albeit gradual, increase in the adoption of PKI. 
When utilized effectively, PKI, together with crypto’s resolution to the PKI problem, plays a critical role in facilitating the secure expansion of open-source agents, proof of personhood solutions, and, as previously discussed, data provenance solutions.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/dc72a4afc3e18566840a992aac728301a2c11b743476287d88a8d2048c359092.png" alt="Source: An empirical three phase analysis of the crypto market" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: An empirical three phase analysis of the crypto market</figcaption></figure><p><strong>Autonomous Smart Agents</strong></p><p>The concept of an agent has deep historical <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://plato.stanford.edu/entries/metaphysics/">roots</a> in philosophy, tracing back to the works of prominent thinkers like Aristotle and Hume. Broadly, an agent is defined as any entity capable of taking action, and &quot;agency&quot; refers to the expression or demonstration of this ability to act. More specifically, &quot;agency&quot; often <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="http://www.tara.tcd.ie/handle/2262/12980">pertains</a> to the execution of actions that are intentional. Consequently, an &quot;agent&quot; is typically <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.cs.ox.ac.uk/people/michael.wooldridge/pubs/ker95.pdf">described</a> as an entity that holds desires, beliefs, and intentions, and possesses the capability to act on them. 
This concept extended into the realm of computer science with the goal of empowering computers to grasp users&apos; preferences and independently carry out tasks on their behalf. As AI evolved, the terminology &quot;agent&quot; was <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.researchgate.net/publication/220388081_Formalizing_Properties_of_Agents">adopted</a> within AI research to describe entities that exhibit intelligent behavior. These agents are characterized by attributes such as autonomy, the capacity to respond to changes in their environment, proactiveness in pursuing goals, and the ability to interact socially. AI agents are now recognized as a critical step towards realizing Artificial General Intelligence (AGI), as they embody the capability for a broad spectrum of intelligent behaviors.</p><p>As <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://arxiv.org/abs/2004.10151">judged</a> by World Scope, LLMs have showcased remarkable abilities in acquiring knowledge, understanding instructions, generalizing across contexts, planning, and reasoning. They have also proven adept at engaging in natural language interactions with humans. These strengths have led to LLMs being heralded as catalysts for Artificial General Intelligence (AGI), highlighting their significance as a foundational layer in the development of intelligent agents. Such advancements pave the way for a future in which humans and agents can coexist in harmony.</p><p>Within the confines of its environment, these agents can be used to complete a wide array of tasks. Bill Gates used the <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.gatesnotes.com/AI-agents">following scenario</a> to describe their myriad functions: <em>“Imagine that you want to plan a trip. A travel bot will identify hotels that fit your budget. 
An agent will know what time of year you’ll be traveling and, based on whether you always try a new destination or like to return to the same place repeatedly, it will be able to suggest locations. When asked, it will recommend things to do based on your interests and propensity for adventure and book reservations at the types of restaurants you would enjoy.”</em> Whilst this is still far out, OpenAI is <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://help.openai.com/en/articles/6825453-chatgpt-release-notes">reportedly</a> developing AI agents capable of executing complex, multi-step tasks autonomously. These agents, transcending the traditional bounds of user interaction, are designed to manipulate user devices directly to perform intricate tasks across different applications. For instance, an AI agent could autonomously transfer data from a document into a spreadsheet for further analysis, streamlining work processes significantly. Innovating beyond mere desktop applications, these AI agents could also navigate web-based tasks, such as booking flights or compiling travel itineraries, without relying on APIs.</p><p>Whilst useful, these centralized AI agents pose similar risks to the ones we identified in part 1:</p><ol><li><p>Data control &amp; access</p></li><li><p>Verifiability</p></li><li><p>Censorship</p></li></ol><p>However, we also run into a few new issues:</p><p><strong>Composability</strong> - One of the primary benefits of crypto is the composability it facilitates. This feature enables open-source contributions and the permissionless interaction of protocols, allowing them to connect, build upon, and interface with each other seamlessly. This is illustrated in DeFi through the concept of &apos;money legos&apos;. For an AI agent to function optimally, it must possess the capability to interface with a broad spectrum of applications, websites, and other agents. 
However, within the confines of traditional closed network systems, AI agents face significant challenges in task execution, often limited by the need to connect to multiple third-party APIs or resort to using complex methods like Selenium drivers for information retrieval and task execution. In the era of <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://blogs.nvidia.com/blog/world-governments-summit/">sovereign AI</a>, the limitations become even more pronounced, as agents are unable to access models or data behind national firewalls. To truly empower AI agents, credibly neutral, decentralized base layers are essential. Such layers would allow agents to interact permissionlessly with a diverse range of applications, models, and other agents, enabling them to collaboratively complete complex tasks without the barriers imposed by current network infrastructures.</p><p><strong>Value Transfer</strong> - The ability for agents to transfer value is a crucial functionality that will become increasingly important as they evolve. Initially, these transactions will primarily serve the needs of the humans utilizing the agents, facilitating payments for services, models, and resources. However, as agent capabilities advance, we will observe a shift towards autonomous transactions between agents themselves, both for their own benefit and for executing tasks beyond their individual capabilities. For instance, an agent might pay a DePIN to access a computationally intensive model or compensate another specialized agent for completing a task more efficiently. We believe that a significant portion of global value transfer will eventually be conducted by agents. <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://dune.com/adrian0x/service-safe-transacitons-on-gnosis-chain">Evidence</a> of this trend is already emerging, as seen with initiatives like Autonolas on the Gnosis chain. 
Autonolas agents make up over 11% of all transactions on the chain and in the last month they have averaged 75.24% of all Gnosis chain transactions.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/4ecb00a97bab433abf9966832427149d6312f4902aae6dcc0bd843283b849931.png" alt="Source: adrian0x on Dune analytics" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: adrian0x on Dune analytics</figcaption></figure><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/8ef55bbd650d57e88a372b46b8af18e15601560c3ccbeb49b17cc6b19d407eb3.png" alt="Source: adrian0x on Dune analytics" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: adrian0x on Dune analytics</figcaption></figure><p><strong>Privacy &amp; Alignment</strong> - Human-operated agents become more effective when they have access to extensive personal data, enabling them to tailor suggestions and services based on individual preferences, schedules, health metrics, and financial status. However, the reliance on centralized agents raises significant concerns regarding data privacy, as it entails the accumulation of sensitive personal information by large technology corporations. This situation can lead to unintended consequences, as illustrated by the incident with Samsung, where employees inadvertently compromised company secrets while using ChatGPT. Such scenarios highlight the misalignment of incentives between technology incumbents and users, presenting substantial ethical and privacy risks. 
Users&apos; sensitive data could be exploited, either sold to advertisers or used in ways that serve corporate interests rather than the individuals&apos;. To safeguard sensitive information while maximizing the efficiency of AI agents, it&apos;s essential that each user retains ownership of both their agent and their data. By facilitating the local operation of these agents, users can effectively protect their personal data from external vulnerabilities. This approach not only enhances data privacy but also ensures that the agents can perform at their optimal capacity, tailored specifically to the user&apos;s preferences and needs without compromising security.</p><p><strong>Dependency &amp; Lock-in</strong> - In part 1 of our thesis, we delved into the drawbacks of lock-in effects stemming from closed source models. The core issue with relying on centralized agents developed by corporations focused on maximizing shareholder value is the inherent misalignment of incentives. This misalignment can drive such companies to exploit their users in the pursuit of increased profits, for instance, by selling the intimate data provided to these agents. Moreover, a company&apos;s take rate—the percentage of revenue taken as fees—is often directly tied to the exclusivity and indispensability of its services. In the context of AI agents, a company like OpenAI might initially offer low take rates and maintain a relatively open network. However, as these agents become more integral to users, evolving through iterative improvements from vast amounts of user data, these centralized corporations may gradually increase their take rates through higher fees or by venturing into other revenue-generating activities like advertising or selling user data. 
We believe in the establishment of agents on a credibly neutral, open-source base layer that is user-owned, ensuring that incentives are properly aligned and prioritizing the users&apos; interests and privacy above corporate gains.</p><p><strong>Smart Agents</strong></p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/676a3f588ff40e000a85d3368484043621051755d9e0e7ca3ddcc626b2a55409.png" alt="Source: David Johnston Smart agents paper" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: David Johnston Smart agents paper</figcaption></figure><p>Revisiting the challenges inherent to PKI, it becomes evident that cryptographic solutions offer compelling resolutions to the issues of alignment, privacy, and verifiability. Furthermore, these solutions adeptly align incentives.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/fb860419397a4401808bb71a479313584749ee1bbfc00e3df9ad20a6d020d3df.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p>The term &quot;smart agent&quot; refers to a class of general-purpose AI systems designed to interact with smart contracts on blockchain networks. These agents can be categorized based on their operational model: they may either be owned and controlled by users or function autonomously without direct human oversight. 
Here we will examine their capacity to enhance human interactions in various domains.</p><p>The implementation of a smart agent encompasses three critical components, each playing a pivotal role in its functionality and effectiveness. These components are designed to ensure secure, informed, and contextually aware interactions within the crypto ecosystem:</p><ol><li><p><strong>User&apos;s Crypto Wallet</strong>: This serves as the foundational element for key management and transactional operations. It enables users to sign and authorize transactions recommended by the smart agent, ensuring secure and authenticated interactions with blockchain-based applications.</p></li><li><p><strong>LLM Specialized in Crypto</strong>: A core intelligence engine of the smart agent, this model is trained on extensive crypto datasets, including information on blockchains, wallets, decentralized applications, DAOs, and smart contracts. This training enables the agent to understand and navigate the complex crypto environment effectively. This LLM must be fine-tuned to include a component that scores and recommends the most suitable smart contracts to users based on a set of criteria, prioritizing safety.</p></li><li><p><strong>Long-Term Memory for User Data and Connected Applications</strong>: This feature involves the storage of user data and information on connected applications either locally or on a decentralized cloud. 
It provides the smart agent with a broader context for its actions, allowing for more personalized and accurate assistance based on historical interactions and preferences.</p></li></ol><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/0625027257a0413692ef5a296967c1a3e3d8e1b6f53e19f0650b745c01d3b7ff.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p>Users interact with their personal smart agents through either a locally installed application or a community-hosted frontend interface. This interface could be similar to that of platforms like ChatGPT, allowing users to input queries and engage in dialogues with their smart agent. Through this interaction, users can specify the actions they wish to be executed. The smart agent then provides suggestions tailored to the user&apos;s preferences and the security of the involved smart contracts.</p><p>What sets these smart agents apart from standard LLMs is their action-oriented capability. Unlike LLMs that primarily generate informational responses, smart agents have the advanced functionality to act on the suggestions they provide. They achieve this by crafting blockchain transactions that represent the user&apos;s desired actions. This capacity for direct action on the blockchain distinguishes smart agents as a significant advancement over traditional AI models, offering users a more interactive and impactful experience. 
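The flow just described (user intent, agent suggestion, user-approved transaction) can be sketched as follows. Every name, the contract-scoring rule, and the signing format here are hypothetical illustrations of the three components, not Morpheus's actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Wallet:
    # Component 1: the user's crypto wallet; only it can authorize a transaction.
    address: str
    def sign(self, tx: dict) -> dict:
        return {**tx, "signature": f"signed-by-{self.address}"}

@dataclass
class SmartAgent:
    wallet: Wallet
    memory: list = field(default_factory=list)  # Component 3: long-term user context

    def score_contract(self, contract: dict) -> float:
        # Component 2 stand-in: the fine-tuned LLM's safety scoring of contracts.
        return 1.0 if contract.get("audited") else 0.2

    def propose(self, intent: str, contracts: list) -> dict:
        self.memory.append(intent)
        best = max(contracts, key=self.score_contract)
        # The agent only *proposes*; the user reviews, then the wallet signs.
        return {"intent": intent, "to": best["address"]}

agent = SmartAgent(Wallet("0xUser"))
tx = agent.propose("swap 1 ETH for DAI",
                   [{"address": "0xA", "audited": True},
                    {"address": "0xB", "audited": False}])
signed = agent.wallet.sign(tx)
assert signed["to"] == "0xA" and "signature" in signed
```

The key design point is the separation of roles: the model recommends, memory personalizes, but nothing reaches the chain without the wallet's user-held key.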
By integrating conversational interaction with the ability to perform blockchain transactions, smart agents facilitate a seamless and secure interface between users and the crypto ecosystem.</p><p>Incorporating PKI as the foundational element of agent usage empowers individuals with direct control over their data and the actions of their agents. This approach addresses the issue of misaligned incentives as users actively confirm that their agents are acting in their interests by reviewing and approving transactions. This mechanism not only ensures that agents operate in alignment with user goals but also secures the ownership and control of sensitive personal data that powers these agents. In an era where artificial intelligence can easily generate convincing fabrications, the immutable nature of cryptographic techniques stands as a bulwark against such threats. As AI technology advances, the authenticity guaranteed by a private key may emerge as one of the few unforgeable proofs of identity and intent. Therefore, private keys are pivotal in constructing a framework that allows for the controlled and verifiable use of agents.</p><p><strong>Crypto UX</strong></p><p>Smart agents represent an upgrade from their centralized counterparts, but they also have the potential to drastically improve crypto UX. This mirrors the technological journey experienced during the 1980s and 1990s with the advent of the internet—a period marked by challenges in navigating a novel and largely misunderstood technology.  The initial phase of the internet was characterized by a hands-on and often challenging process of website discovery and access. Users primarily relied on directories such as Yahoo! Directory and DMOZ, which featured manually curated lists of websites neatly categorized for easy navigation. 
Additionally, the dissemination of website information was largely through traditional methods, including word of mouth, publications, and printed guides that provided URLs for direct access. Before the internet became widespread, platforms like Bulletin Board Systems (BBS) and Usenet forums were instrumental in exchanging files, messages, and recommendations for interesting websites. With the absence of advanced search tools, early internet exploration necessitated knowing the precise URLs, often obtained from non-digital sources, marking a stark contrast to the sophisticated, algorithm-driven search engines that streamline web discovery today.</p><p>It wasn’t until Google&apos;s introduction in 1998, which revolutionized internet use by indexing web pages and enabling simple search functionality, that the internet became vastly more accessible and user-friendly. This breakthrough allowed people to easily find relevant information, and Google&apos;s minimalist search bar and efficient results paved the way for widespread internet adoption, setting a precedent for making complex technologies accessible to the general public.</p><p>Currently, the crypto ecosystem is that early internet, presenting non-technical users with the daunting task of navigating chains, wallets, bridges, token derivatives with varying risk profiles, staking, and more. This complexity renders crypto largely inaccessible to the average person, primarily due to the poor user experience that is standard in the space. However, smart agents hold the potential to create a &apos;Google moment&apos; for crypto. They could enable ordinary people to interact with smart contracts as simply as typing a command into a search bar, significantly simplifying the user interface. 
This breakthrough could transform the crypto landscape, making it user-friendly and accessible, akin to how Google transformed internet search and usability.</p><p><strong>Morpheus</strong></p><p>One such project advancing Smart Agents is <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://github.com/MorpheusAIs/Docs/tree/main/!KEYDOCS%20README%20FIRST!">Morpheus</a>. The Morpheus whitepaper, authored by the pseudonymous trio Morpheus, Trinity, and Neo, was released on September 2nd, 2023 (Keanu Reeves&apos; birthday, a nod to his role in The Matrix). Unlike typical projects, Morpheus operates without a formal team, company, or foundation, embodying a fully decentralized ethos.</p><p>The project is architected to advance the creation of a peer-to-peer network, consisting of personal, general-purpose AIs that act as Smart Agents capable of executing Smart Contracts for individuals. It promotes open-source Smart Agents and LLMs that enable seamless interactions with users&apos; wallets, decentralized applications, and smart contracts. Within the network there are four key shareholders:</p><ol><li><p><strong>Coders</strong> - The coders in the Morpheus network are the open-source developers that create the smart contracts and off-chain components that power Morpheus. They are also developers who build smart agents on top of Morpheus.</p></li><li><p><strong>Capital Providers</strong> - The capital providers in the Morpheus network are the participants who commit their staked ETH (stETH) to the capital pool for use by the network.</p></li><li><p><strong>Compute Providers</strong> - The compute providers provide agnostic compute power, mainly in the form of GPUs.</p></li><li><p><strong>Community</strong> - The community allocation in the Morpheus network refers to shareholders who create frontends in order to interact with the Morpheus network and their smart agents. 
It also encompasses any users who provide tools or do work to bring users into the ecosystem.</p></li></ol><p>The final shareholder in the ecosystem is the user. The user encompasses any individual or entity soliciting inference services from the Morpheus network. To synchronize incentives for accessing inference, the <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://github.com/MorpheusAIs/Docs/blob/main/!KEYDOCS%20README%20FIRST!/Yellowstone%20Compute%20Model.md">Yellowstone compute model</a> is employed, operating under the following (simplified) structure:</p><ol><li><p>To request an output from an LLM, a user must hold MOR tokens in their wallet.</p></li><li><p>A user specifies the LLM they want to use and submits their prompt.</p></li><li><p>An offchain router contract connects the user to a compute provider that is hosting that LLM on their hardware and is providing the cheapest and highest quality response.</p></li><li><p>The compute provider runs the prompt and returns the answer to the user.</p></li><li><p>The compute provider is paid MOR tokens for their work done.</p></li></ol><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/7e07e6ab787bcdc4cf428269829f4a664c28701592cd8e11f4b2354bf45f89ec.png" alt="Source: Yellowstone compute model" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Yellowstone compute model</figcaption></figure><p><strong>Proof of Personhood</strong></p><p>Proof of personhood (PoP), also referred to as the &quot;unique-human problem,&quot; represents a constrained version of real-world identity verification. 
It confirms that a specific registered account is managed by an actual individual who is distinct from the individuals behind all other accounts, and it strives to accomplish this ideally without disclosing the identity of the actual person.</p><p>The advent of sophisticated generative AI necessitates the establishment of <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://worldcoin.org/blog/engineering/humanness-in-the-age-of-ai">two key</a> frameworks to enhance fairness, social dynamics, and trustworthiness online:</p><ol><li><p>Implementing a cap on the quantity of accounts each individual can hold, a measure crucial for safeguarding against <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://en.wikipedia.org/wiki/Sybil_attack">Sybil attacks</a>, with significant implications for facilitating digital and decentralized <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://vitalik.eth.limo/general/2021/05/25/voting2.html">governance</a>.</p></li><li><p>Curbing the proliferation of AI-generated materials that are virtually identical to those produced by humans, in order to prevent the mass spread of <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://openai.com/research/forecasting-misuse">deception or disinformation.</a></p></li></ol><p>Utilizing public key infrastructure and human verification systems, PoP can provide a fundamental rate limit to accounts, preventing Sybil attacks. With the use of valid human verification systems, even if a human or agent tried to get 100 bot accounts created, they would need 100 humans to consistently complete the verification. 
This naturally reduces spam and sybil attacks.</p><p>The second, and perhaps more critical, application of Proof of Personhood (PoP) systems lies in their ability to accurately differentiate between content generated by AI and that produced by humans. As highlighted in the data provenance section of our report, Europol has projected that AI-generated content might constitute up to 90% of online information in the coming years, posing a significant risk for the spread of misinformation. A prevalent instance of this issue is the creation and distribution of &apos;deepfake&apos; videos, where AI is utilized to craft highly convincing footage of individuals. This technology allows creators to promulgate false information while masquerading it as legitimate and real. Essentially, intelligence tests will cease to serve as reliable markers of human origin. PoP endows users with the choice to engage with human-verified accounts, layering on social consensus around verifiability. This is akin to the existing filters on social media that allow users to select what content appears in their feeds. PoP offers a similar filter, but focused on verifying the human source behind content or accounts. 
It also supports the creation of reputation frameworks that penalize the dissemination of false information, whether AI-crafted or otherwise.</p><p><strong>Worldcoin</strong></p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/46ed30d4b3a8c44419ce7ff72de3bc7c7bbd6df05d3dbdd73124ef757525fb7f.png" alt="Source: Worldcoin.org" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Worldcoin.org</figcaption></figure><p>Sam Altman, known for his role as the CEO of OpenAI, co-founded Worldcoin, a project aimed at providing a unique ledger of human identities through public key infrastructure. The <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://worldcoin.org/blog/engineering/humanness-in-the-age-of-ai">philosophy</a> is as follows: As AI is poised to generate significant prosperity and resources for society, it also presents the risk of displacing or augmenting numerous jobs, on top of blurring the lines between human and bot identities. To address these challenges, Worldcoin introduces two core concepts: a robust proof-of-personhood system to verify human identity and a universal basic income (UBI) for all. 
Distinctly, Worldcoin utilizes advanced biometric technology, specifically by scanning the iris with a specialized device known as “<a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://worldcoin.org/find-orb">the Orb</a>”, to confirm individual uniqueness.</p><p>As you can see from the graphic, Worldcoin works like this:</p><ol><li><p>Every Worldcoin participant downloads an application onto their mobile device that creates both a private and a public key.</p></li><li><p>They then go to the physical location of an “Orb”, which can be found <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://worldcoin.org/find-orb">here</a>.</p></li><li><p>The user stares into the camera of the Orb while simultaneously presenting it with a QR code that their Worldcoin application generates, which includes their public key.</p></li><li><p>The Orb examines the user&apos;s eyes with sophisticated scanning hardware and applies machine learning classifiers to ensure that the individual is a genuine human being and that the iris of the individual is unique and has not been recorded as a match to any other user already in the system.</p></li><li><p>Should both assessments be successful, the Orb signs a message, approving a unique hash derived from the user&apos;s iris scan.</p></li><li><p>The generated hash is then uploaded to a database.</p></li><li><p>The system does not retain complete iris scans, destroying these images locally. Instead, it stores only the hashes, which are utilized to verify the uniqueness of each user.</p></li><li><p>The user then receives their World ID.</p></li></ol><p>A holder of a World ID can then demonstrate their uniqueness as a human by creating a Zero-Knowledge Succinct Non-Interactive Argument of Knowledge (ZK-SNARK). 
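</p><p>The uniqueness check in steps 4-7 can be sketched as follows (a simplified illustration with hypothetical names; the actual Orb uses machine-learning classifiers and specialized iris codes, not a plain SHA-256 of the scan):</p>

```python
# Hypothetical sketch of the Orb's uniqueness check (steps 4-7 above):
# only a hash derived from the iris scan is kept, and enrollment fails
# if that hash already exists in the registry.
import hashlib

iris_hash_registry = set()  # stands in for the database of stored hashes

def enroll(iris_scan_bytes: bytes, public_key: str) -> bool:
    """Return True and record the user if this iris has not been seen before."""
    iris_hash = hashlib.sha256(iris_scan_bytes).hexdigest()
    if iris_hash in iris_hash_registry:
        return False  # duplicate human: reject the second account
    iris_hash_registry.add(iris_hash)  # the raw scan itself is discarded
    return True

assert enroll(b"alice-iris", "pk_alice") is True
assert enroll(b"alice-iris", "pk_alice_2") is False  # same person, second try
assert enroll(b"bob-iris", "pk_bob") is True
```

<p>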
This proves they possess the private key that matches a public key listed in the database, without disclosing the specific key they own.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/638f3365996e789e444af88e211a77f0211b9cb3d7df69070836e17d7caa5350.png" alt="Source: Worldcoin.org" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Worldcoin.org</figcaption></figure><p><strong>Nirmaan</strong></p><blockquote><p>Nirmaan, with its mining-as-a-service product is your gateway to decentralized AI networks.</p><p>Nirmaan democratizes access to participation in AI networks. Nirmaan&apos;s delegation service abstracts away the complexity of choosing GPU providers and AI networks. Users can simply buy and stake NRM tokens to access mining rewards earned by the mining service. By supplying GPU compute power to networks like Bittensor, Ritual and Morpheus, Nirmaan can acquire tokens at a cost that is between 60-70% of the current market price and even as low as 5-10% for emerging networks. Nirmaan will use a risk-managed approach to maximize earnings by allocating resources to both established and emerging networks.</p><p>Our team is composed of experts with a rich background in machine learning and quantitative trading. This includes achievements such as producing highly regarded research papers, founding successful ML startups &amp; a deep reinforcement learning hedge fund. We are in the process of developing an optimization algorithm, alongside the deployment of skilled devops personnel, to seamlessly allocate compute across multiple networks. This approach aims to maximize rewards for our token holders. 
With proven experience in deploying GPUs and validators across 16 networks, including but not limited to Bittensor, Akash, Heurist, and IO.Net, our team brings invaluable expertise to the table.</p></blockquote><p>Twitter:</p><div data-type="embedly" src="https://twitter.com/nirmaanai" data="{&quot;provider_url&quot;:&quot;https://x.com&quot;,&quot;title&quot;:&quot;JavaScript is not available.&quot;,&quot;url&quot;:&quot;https://x.com/nirmaanai&quot;,&quot;version&quot;:&quot;1.0&quot;,&quot;provider_name&quot;:&quot;X (formerly Twitter)&quot;,&quot;type&quot;:&quot;link&quot;}" format="small"></div><p>Reach out to us on Telegram: <em>@RMSYx0</em></p><p>Email: <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="mailto:hello@nirmaan.ai"><em>hello@nirmaan.ai</em></a></p>]]></content:encoded>
            <author>asxn-labs@newsletter.paragraph.com (ASXN Labs)</author>
            <enclosure url="https://storage.googleapis.com/papyrus_images/4efd7249f11737c685f1dcd071c4db99a6fbb1fdc23e974fe596f94c86bc98d0.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[The Centralizing Forces Within AI]]></title>
            <link>https://paragraph.com/@asxn-labs/the-centralizing-forces-within-ai</link>
            <guid>HPTxeDPX3yp6ek66IHYL</guid>
            <pubDate>Mon, 04 Mar 2024 18:02:49 GMT</pubDate>
            <description><![CDATA[Introduction Artificial intelligence, which is broadly the ability of machines to perform cognitive tasks, has quickly become an essential technology in our day to day lives. The breakthrough in 2017 occurred when transformers were developed to solve the problem of neural machine translation, which allows a model to take an input sentence of a task and produce an output. This enabled a neural network to take text, speech, or images as an input, process it, and produce output.OpenAI and Deepmi...]]></description>
            <content:encoded><![CDATA[<p><strong>Introduction</strong></p><p>Artificial intelligence, which is broadly the ability of machines to perform cognitive tasks, has quickly become an essential technology in our day-to-day lives. The <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://arxiv.org/abs/1706.03762">breakthrough</a> in 2017 occurred when transformers were developed to solve the problem of neural machine translation, which allows a model to take an input sequence and produce an output sequence. This enabled a neural network to take text, speech, or images as an input, process it, and produce output.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/ac3247c231514fa6709e548a753e62998992d6fc6132e5add1e4984698c6a447.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p>OpenAI and DeepMind pioneered this technology and more recently the OpenAI GPT (Generative Pre-trained Transformer) models created a eureka moment for AI with the proliferation of their LLM chatbots. GPT-1 was <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf">first introduced</a> in June of 2018, featuring a model composed of twelve processing layers. It used a specialized technique called &quot;masked self-attention&quot; across twelve different focus areas, allowing it to understand and interpret language more effectively. Unlike simpler learning methods, GPT-1 employed the Adam optimization algorithm for more efficient learning, with its learning rate gradually increasing and then decreasing in a controlled manner. 
Overall, it contained 117 million adjustable elements, or parameters, which helped refine its language processing capabilities.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/6699dfb6f8bce8104f556cbdd0de8eff23641afa32f6875ad67287eac1b92ffd.png" alt="GPT 1 Architecture" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">GPT 1 Architecture</figcaption></figure><p>Fast forward to March 14th 2023, when OpenAI released GPT-4, which features approximately <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://the-decoder.com/gpt-4-architecture-datasets-costs-and-more-leaked/">1.8 trillion parameters</a> spread across 120 layers. The increase in parameters and layers enhances its ability to understand and generate more nuanced and contextually relevant language, among other things. The over 10,000x increase in the number of parameters in OpenAI’s GPT models in under 5 years shows the astounding rate of innovation happening at the cutting edge of generative models.</p><p><strong>Regulation</strong></p><p>Running parallel to this innovation and underpinning the AI stack is regulation. Whenever a transformative technology comes to market, regulators will introduce laws and processes so that they can better control it. Almost prophetically, we saw this play out in 1991 when Joe Biden, then chairman of the Senate Judiciary Committee, proposed a bill to ban encryption in emails. This potential ban on code and mathematics inspired Phil Zimmermann to build the open source Pretty Good Privacy (PGP) program that enabled users to communicate securely by encrypting and decrypting messages, authenticating messages through digital signatures, and encrypting files. 
The United States Customs Service went on to start a criminal investigation into Zimmermann for allegedly violating the Arms Export Control Act, as it regarded his PGP software as a munition and wanted to limit citizens&apos; and foreign entities&apos; access to strong cryptography.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/894bf52521f7b7af50bb133271aaf25b56cbf999038c6a50e6ef22588e3f3f90.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p>Reminiscent of the email encryption bill, on the 30th of October 2023 Joe Biden, now the President of the United States, signed a Presidential Executive Order on “Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence”. The order falls under the Defense Production Act (DPA), affording the President a broad set of authorities to ensure the country has the resources necessary for national security. Broadly, the order seeks to establish new standards for AI safety and security. The order imposes strict Know Your Customer (KYC) requirements on compute and data, whilst also banning all foreign AI model training occurring on US soil or in US data centers. On top of this, it places caps on permissionless AI models that contain “tens of billions of parameters”; for reference, Mistral-7B-v0.1 has 7 billion parameters. 
We are also witnessing this play out with hardware as the US <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.reuters.com/technology/upcoming-us-rules-ai-chip-exports-aim-stop-workarounds-us-official-2023-10-15/">recently </a>prohibited the sale of semiconductor chips above a certain capability threshold to China, Russia and other nations.</p><p><strong>Model Generation</strong></p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/399745992fd1746a2a96f0b059a734c928e65415660726d15206c2702bb437e7.png" alt="" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="hide-figcaption"></figcaption></figure><p>On top of the centralizing regulatory pressures that artificial intelligence faces, there are a number of centralizing forces throughout the creation of a model. The creation of an AI model, particularly large-scale models like those used in natural language processing, typically follows three main phases: pre-training, fine-tuning, and inference. We will walk through each phase and the centralizing forces that are present:</p><p><strong>Pre-Training</strong></p><p>The pre-training phase is the initial step where the model learns a wide range of knowledge and skills from a large and diverse dataset. Before the advent of <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf">transformer-based architectures</a>, top-performing neural models in natural language processing (NLP) primarily used supervised learning, which required vast quantities of curated and manually labeled data, which resided mostly within corporate boundaries. 
This dependence on supervised learning restricted their applicability to datasets lacking extensive annotations and created a centralizing force due to the prohibitive costs of employing skilled researchers and developers to perform this supervised learning. During this pre-transformer stage, supervised pre-training of models was dominated by centralized entities like Google who had the resources to fund this work. The advent of transformer-based architectures, among other advancements, contributed significantly to the advancement of unsupervised learning, particularly in the field of natural language processing, enabling models to be trained on datasets without predefined labels or annotated outcomes.</p><p><strong>Data Collection &amp; Preparation</strong></p><p>The first step in pre-training a model is gathering the data that the model will be trained on. A large and diverse data set is collected from a vast corpus of text such as books, websites and articles. The data is then cleaned and processed.</p><p><a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://platform.openai.com/tokenizer">Tokenization</a> involves breaking down text data into smaller units, or tokens, which may range from words to parts of words, or even individual characters, based on the model&apos;s architecture. Following this, the data undergoes formatting to make it comprehensible to the model. This typically includes transforming the text into numerical values that correspond to the tokens, such as through the use of word embeddings.</p><p><strong>Model Architecture</strong></p><p>Selecting the right model architecture is a crucial step in the development process, tailored to the specific application at hand. For instance, transformer-based architectures are frequently chosen for language models due to their effectiveness in handling sequential data. 
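</p><p>The tokenization and numerical-encoding steps described under Data Collection &amp; Preparation can be sketched in Python (a deliberately simplified, hypothetical word-level tokenizer; production models use subword schemes such as byte-pair encoding):</p>

```python
# Minimal sketch (hypothetical, not any real model's tokenizer): split text
# into word-level tokens, build a vocabulary, and map tokens to integer ids,
# as in the tokenization step described above.

def tokenize(text: str) -> list[str]:
    return text.lower().split()

def build_vocab(corpus: list[str]) -> dict[str, int]:
    vocab: dict[str, int] = {"<unk>": 0}  # id 0 reserved for unseen tokens
    for doc in corpus:
        for tok in tokenize(doc):
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode(text: str, vocab: dict[str, int]) -> list[int]:
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokenize(text)]

vocab = build_vocab(["the model reads text", "the model writes text"])
print(encode("the model writes code", vocab))  # unseen word maps to <unk> id 0
```

<p>These integer ids are what the embedding layer of a model actually consumes; the raw text never enters the network directly.</p><p>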
Alongside choosing a framework, it&apos;s also important to set the initial parameters of the model, such as the weights within the neural network. These parameters serve as the starting point for training and will be fine-tuned to optimize the model&apos;s performance.</p><p><strong>Training Procedure</strong></p><p>Using the cleaned and processed data, the model is fed a large amount of text and learns patterns and relationships in order to make predictions about that text. During the training procedure there are a couple of key procedures used to dial in the parameters of the model so that it produces accurate results. One is the learning algorithm:</p><p>The learning algorithm in neural network training prominently involves <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.researchgate.net/publication/349077282_A_Study_on_Backpropagation_in_Artificial_Neural_Networks">backpropagation</a>, a fundamental method that propagates the error—defined as the difference between the predicted and actual outputs—back through the network layers. This identifies the contribution of each parameter, like weights, to the error. Backpropagation involves gradient calculation, where gradients of the error with respect to each parameter are computed. These gradients, essentially vectors, indicate the direction of the greatest increase of the error function.</p><p>Additionally, Stochastic Gradient Descent (SGD) is employed as an optimization algorithm to update the model&apos;s parameters, aiming to minimize the error. SGD updates parameters for each training example or small batches thereof, moving in the opposite direction of the error gradient. A critical aspect of SGD is the learning rate, a hyperparameter that influences the step size towards the loss function&apos;s minimum. 
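</p><p>The gradient-descent update and the effect of the learning rate described above can be illustrated with a toy one-parameter example (illustrative only; real training backpropagates gradients through many layers):</p>

```python
# Toy illustration of the gradient-descent update described above, on a
# single parameter: loss(w) = (w - 3)^2, so gradient(w) = 2 * (w - 3).
# Each step moves the parameter against the gradient, scaled by the
# learning rate.

def gradient(w: float) -> float:
    return 2.0 * (w - 3.0)

def sgd(w: float, learning_rate: float, steps: int) -> float:
    for _ in range(steps):
        w -= learning_rate * gradient(w)  # step opposite the gradient
    return w

print(round(sgd(0.0, learning_rate=0.1, steps=100), 4))  # converges to the minimum at w = 3
print(round(sgd(0.0, learning_rate=1.1, steps=5), 1))    # too-high rate overshoots and diverges
```

<p>The second call shows why the learning rate is so delicate: past a threshold, each update overshoots the minimum by more than the last, and the loss grows instead of shrinking.</p><p>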
A very high learning rate can cause overshooting of the minimum, while a very low rate can slow down the training process significantly.</p><p>Furthermore, the Adam optimizer, an enhancement over SGD, is used for its efficiency in handling separate learning rates for each parameter. It adjusts these rates based on the first moment (a moving average of recent gradients) and the second moment (a moving average of their squares). Adam&apos;s popularity stems from its ability to achieve better results more quickly, making it ideal for large-scale problems with extensive datasets or numerous parameters.</p><p>The second key procedure we use in the training phase is the loss function, also known as a cost function. It plays a crucial role in supervised learning by quantifying the difference between the expected output and the model&apos;s predictions. It serves as a measure of error for the training algorithm to minimize. Common loss functions include Mean Squared Error (MSE), typically used in regression problems, where it computes the average of the squares of the differences between actual and predicted values. In classification tasks, Cross-Entropy Loss is often employed. This function measures the performance of a classification model by evaluating the probability output between 0 and 1. During the training process, the model generates predictions, the loss function assesses the error, and the optimization algorithm subsequently updates the model&apos;s parameters to reduce this loss. The choice of loss function is pivotal, significantly influencing the training&apos;s efficacy and the model&apos;s ultimate performance. 
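</p><p>The two common loss functions described above, MSE and cross-entropy, can be sketched in plain Python (a simplified illustration, not any framework&apos;s implementation):</p>

```python
# Sketch of the two loss functions described above: mean squared error for
# regression and binary cross-entropy for classification.
import math

def mse(y_true: list[float], y_pred: list[float]) -> float:
    """Average of squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true: list[int], y_prob: list[float]) -> float:
    """Penalizes confident wrong probabilities far more than unsure ones."""
    eps = 1e-12  # avoid log(0)
    return -sum(
        t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
        for t, p in zip(y_true, y_prob)
    ) / len(y_true)

print(mse([3.0, 5.0], [2.0, 7.0]))                   # (1 + 4) / 2 = 2.5
print(binary_cross_entropy([1, 0], [0.9, 0.2]) <
      binary_cross_entropy([1, 0], [0.6, 0.5]))      # better probabilities, lower loss
```

<p>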
It must be carefully selected to align with the specific objectives and nature of the problem at hand.</p><p><strong>Resource Allocation</strong></p><p>Resource allocation during the pre-training phase of AI models, particularly for large-scale models like those in the GPT series, necessitates a careful and substantial deployment of both computational and human resources. This phase is pivotal as it establishes the groundwork for the model&apos;s eventual performance and capabilities. The pre-training of these complex AI models demands an extensive amount of computational power, primarily sourced from Graphics Processing Units (GPUs) or Tensor Processing Units (TPUs), which are specialized for handling the intense parallel processing tasks typical in machine learning. To address the considerable computational needs, a distributed computing approach is often adopted, utilizing multiple GPUs or TPUs across various machines or data centers in tandem to process the vast amounts of training data and update the model parameters efficiently.</p><p>Moreover, the significant volume of data required for pre-training, potentially reaching petabytes, necessitates robust storage solutions for both the raw and processed data formats. The energy consumption during this phase is notably high due to the prolonged operation of high-performance computing hardware, prompting a need to optimize computational resource use to strike a balance between performance, cost, and environmental impact. The financial aspects also play a critical role, as the acquisition and maintenance of necessary hardware, alongside the electricity for powering and cooling these devices, entail substantial costs. Furthermore, many organizations turn to cloud computing services to access the needed computational resources, adding a variable cost based on usage rates. 
In fact, when asked at an MIT event, Sam Altman said that GPT-4 cost “more than $100 million” to train.</p><p><strong>Fine-Tuning</strong></p><p>The next stage in the creation of a model is fine-tuning. The pre-trained model undergoes adaptation to excel in specific tasks or with certain datasets that were not part of its initial training regimen. This phase takes advantage of the broad capabilities acquired during pre-training, refining them for superior performance in more focused applications, such as text classification, sentiment analysis, or question-answering. Fine-tuning involves preparing a smaller, task-specific dataset that reflects the nuances of the intended application, modifying the model&apos;s architecture to suit the task&apos;s unique output requirements, and adjusting parameters, including adopting a lower learning rate for more precise, targeted optimization. The model is then retrained on this curated dataset, which may involve training only the newly adjusted layers or the entire model, depending on the task&apos;s demands.</p><p>Following the initial pre-training and fine-tuning phases, models, particularly those akin to OpenAI&apos;s GPT-3, may undergo Reinforcement Learning from Human Feedback (RLHF) as an additional refinement step. This advanced training approach integrates supervised fine-tuning with reward modeling and reinforcement learning, leveraging human feedback to steer the model towards outputs that align with human preferences and judgments. This process begins with fine-tuning on a dataset of input-output pairs to guide the model towards expected outcomes. Human annotators then assess the model&apos;s outputs, providing feedback that helps to model rewards based on human preferences. A reward model is subsequently developed to predict these human-given scores, guiding reinforcement learning to optimize the AI model&apos;s outputs for more favorable human feedback. 
RLHF thus represents a sophisticated phase in AI training, aimed at aligning model behavior more closely with human expectations and making it more effective in complex decision-making scenarios.</p><p><strong>Inference</strong></p><p>The inference stage marks the point where the model, after undergoing training and possible fine-tuning, is applied to make predictions or decisions on new, unseen data. This stage harnesses the model&apos;s learned knowledge to address real-world problems across various domains. The process begins with preparing the input data to match the training format, involving normalization, resizing, or tokenizing steps, followed by loading the trained model into the deployment environment, whether it be a server, cloud, or edge devices. The model then processes the input to generate outputs, such as class labels, numerical values, or sequences of tokens, tailored to its specific task. Inference can be categorized into batch and real-time, with the former processing data in large volumes where latency is less critical, and the latter providing immediate feedback, crucial for interactive applications. Performance during inference is gauged by latency, throughput, and efficiency—key factors that influence the deployment strategy, choosing between edge computing for local processing and cloud computing for scalable resources. However, challenges such as model updating, resource constraints, and ensuring security and privacy remain paramount.</p><p><strong>Centralizing Forces Within Model Generation</strong></p><p>In the process of creating an AI model, numerous centralizing and monopolistic forces come into play. The significant resources needed for every phase of development pave the way for economies of scale, meaning that efficiency improvements tend to concentrate superior models in the hands of a select few corporations. 
Below, we detail the diverse mechanisms through which AI centralization occurs:</p><p><strong>Pre-Training</strong></p><p>As we have seen, the pre-training phase of a model combines a few things: data, training and resources. When it comes to the data collection, there are a number of issues:</p><p><strong>Access to data</strong></p><p>The pre-training phase requires a large corpus of data, typically from books, articles, corporate databases and from scraping the internet. As we discussed, when supervised learning dominated as a training technique, the large companies like Google could create the best models due to the large amount of data they were able to store from users interacting with their search engine. We see a similar centralizing and monopolistic force throughout AI today. Large companies such as Microsoft, Google &amp; OpenAI have access to the best data through data partnerships, in-house user data or the infrastructure required to create an industrial internet scraping pipeline. For example, <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.404media.co/tumblr-and-wordpress-to-sell-users-data-to-train-ai-tools/">leaked documents</a> suggest OpenAI is preparing to purchase user data from Tumblr and WordPress, at the expense of users&apos; privacy.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/c05cffab4aa41ecc083f865e08a262e2806cdea2f45646485640efa4fa199c44.png" alt="The top 1% of x networks, facilitates x proportion of the total traffic / volume. Source Chris Dixon&apos;s &quot;Read Write Own&quot;." blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">The top 1% of x networks, facilitates x proportion of the total traffic / volume. 
Source: Chris Dixon&apos;s &quot;Read Write Own&quot;.</figcaption></figure><p>Transformers enabled unsupervised learning, but scraping web data is no easy feat: websites typically ban scraper IP addresses and user agents, and employ rate limits and CAPTCHA services.</p><p>AI companies deploy a variety of tactics to navigate around the barriers websites put in place to obstruct data collection efforts. One common method involves utilizing a diverse array of IP addresses to sidestep IP-based rate limiting or outright bans, often achieved through the use of proxy servers or VPN services. Additionally, altering the user-agent string in HTTP requests—a technique known as User-Agent Spoofing—allows these companies to emulate different browsers or devices, thereby potentially circumventing blocks aimed at user-agent strings typically associated with automated bots or scrapers. Furthermore, to overcome CAPTCHA challenges, which are frequently employed by websites to prevent automated data collection, some AI companies turn to CAPTCHA solving services. These services are designed to decode CAPTCHAs, enabling uninterrupted access to the site&apos;s data, albeit raising questions about the ethical implications of such practices.</p><p>Beyond their ability to gather large amounts of data, big corporations also have the financial means to build strong legal teams. These teams work tirelessly to help them collect data from the internet and through partnerships, as well as to obtain patents. We can see this happening today with OpenAI and Microsoft, who are in a <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://cyberscoop.com/openai-lawsuit-privacy-data-scraping/">legal dispute</a> with The New York Times. 
The issue is over the use of The New York Times&apos; articles to train the ChatGPT models without permission.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/3868660a5ecf20b23e2b34fd796c423a831239b12dd01e1c8afca130c4ef3a70.png" alt="Patent Centralization. Source: Statista" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Patent Centralization. Source: Statista</figcaption></figure><p><strong>Closed source data</strong></p><p>There are also ethical and <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.ibm.com/blog/shedding-light-on-ai-bias-with-real-world-examples/">bias</a> considerations involved in training a model. All data has some inherent bias attached to it. Since AI models learn patterns, associations, and correlations from their training data, any inherent biases in this data can be absorbed and perpetuated by the model. Common biases we find in AI models result from sample bias, measurement bias and historical bias, and can lead to AI models producing poor or unintended results. For example, Amazon trained an <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G/">automated recruitment model</a> which was designed to assess candidates based on their fit for different technical positions. The model developed its criteria for evaluating suitability by analyzing resumes from past applicants. 
However, since the data set it was trained on included predominantly male resumes, the model learned to penalize resumes that included the word “women”.</p><p><strong>Resource allocation</strong></p><p>As we have discussed, pre-training of foundation models requires large cycles of GPU compute, costing hundreds of millions to train the top models (in 2022, OpenAI <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.businessinsider.com/openai-2022-losses-hit-540-million-as-chatgpt-costs-soared-2023-5?r=US&amp;IR=T">reported</a> a $540 million loss in the training phase of GPT3). Demand for accessible and usable GPUs vastly outstrips current supply and this has led to a consolidation of the pre-training of models to within the largest and most well-funded tech companies (FAANG, OpenAI, Anthropic) and data centers.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/04ce025bd6c6a69f5ecd5a86a00123ba5c83dadb26e630b47e1782c7080a1334.png" alt="Although corporations keep details of their data centers and operations somewhat secret, for a variety of reasons: security, regulatory compliance, customer data protection &amp; competitive advantages we can see that the top 5 cloud &amp; data center providers " blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Although corporations keep details of their data centers and operations somewhat secret, for a variety of reasons: security, regulatory compliance, customer data protection &amp; competitive advantages we can see that the top 5 cloud &amp; data center providers</figcaption></figure><p>We have learned that models improve with training size logarithmically and therefore, in general, the best models are the ones trained with the highest number of 
GPU compute cycles. Thus, a major centralizing force within the pre-training of models is the economies of scale and productivity gains that large incumbent tech and data companies enjoy, and we are seeing this play out with OpenAI, Google, Amazon, Microsoft and Meta dominating.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/debefaabaff4d3997b1f81a25ef3914365546afa0eeb0261003fc9dd159e096f.png" alt="Source: Epochai" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Epochai</figcaption></figure><p>The concentration of the power to develop transformative artificial intelligence technologies within a small number of large corporations, such as OpenAI, Google, and Microsoft, prompts significant concerns. As <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.axios.com/2017/12/15/sean-parker-unloads-on-facebook-god-only-knows-what-its-doing-to-our-childrens-brains-1513306792">articulated</a> by Facebook&apos;s first president, the primary objective of these platforms is to <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://x.com/MarioNawfal/status/1763471083838033941?s=20">capture and retain</a> as much of our time and conscious attention as possible. This reveals a fundamental misalignment of incentives when interacting with Web2 companies, an issue we have begrudgingly accepted due to the perceived benefits their services bring to our lives. However, transplanting this oligopolistic Web2 model onto a technology that is far more influential than social media, and that holds the capacity to profoundly influence our decisions and experiences, presents a concerning scenario. 
A perfect example of this is the <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://en.wikipedia.org/wiki/Facebook%E2%80%93Cambridge_Analytica_data_scandal#:~:text=In%20the%202010s%2C%20personal%20data,be%20used%20for%20political%20advertising.">Cambridge Analytica scandal</a> of the 2010s. The British firm gathered personal data from up to 87 million Facebook users without authorization, building a profile of each user before serving them targeted political ads to influence elections. This data aided the 2016 U.S. presidential campaigns of Ted Cruz and Donald Trump, and was implicated in the Brexit referendum interference. If such a powerful tool as AI falls under the control of a few dominant players, it risks amplifying the potential for misuse and manipulation, raising ethical, societal, and governance issues.</p><p><strong>GPU Supply-Side Centralisation</strong></p><p>The resultant effect of models scaling logarithmically with training size is that demand for GPU compute is growing exponentially to achieve linear gains in model quality. We have certainly seen this play out over the last two years, with demand for GPU compute skyrocketing since the launch of ChatGPT and the start of the AI race. 
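The logarithmic quality curve described above can be made concrete with a toy calculation; the constants below are purely illustrative and are not fitted to any real model:

```python
import math

# Toy scaling law: quality grows with the logarithm of training compute.
# a and b are hypothetical constants chosen only for illustration.
a, b = 10.0, 2.0

def quality(compute):
    """Model quality as a logarithmic function of training compute."""
    return a + b * math.log10(compute)

def compute_needed(target_quality):
    """Invert the curve: compute required to hit a target quality."""
    return 10 ** ((target_quality - a) / b)

# Each +1 step in quality multiplies the compute bill by the same factor:
steps = [compute_needed(q) for q in (14, 15, 16)]
ratios = [steps[i + 1] / steps[i] for i in range(2)]
# Linear gains in quality therefore demand exponential growth in compute.
```

Under any curve of this shape, each constant increment in quality costs a constant *multiple* of compute, which is exactly the dynamic driving GPU demand described here.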
If we take Nvidia’s revenue as a proxy for GPU demand, we see that Nvidia’s quarterly revenue increased 405% from Q4 2022 to Q4 2023.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/e145b25fa65641ffae755500c44aae70b870b71f4efe9025ae3c095fd912bbb5.png" alt="Source: Nvidia reports" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Nvidia reports</figcaption></figure><p>The production of GPUs and microchips for AI training is an extremely complex and expensive process, with high barriers to entry. As such, there are few companies capable of producing hardware that delivers the performance companies like OpenAI require to train their GPT models. The largest of these semiconductor and GPU manufacturers is Nvidia, holding approximately <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.britannica.com/topic/NVIDIA-Corporation">80%</a> of the global market share in GPU semiconductor chips. Starting out in 1993 creating graphics hardware for video games, Nvidia quickly became a pioneer in high-end GPUs and made its seminal step into AI in 2006 with the launch of its <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://en.wikipedia.org/wiki/CUDA">Compute Unified Device Architecture (CUDA)</a>, which specialized in GPU <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://en.wikipedia.org/wiki/Parallel_computing">parallel processing</a>.</p><p>The hardware used to train a model is vital, and its costs are extremely high, as we have discussed. 
To compound the barriers to entry, current access to this hardware is extremely limited, with only top tech companies receiving their orders in a timely manner. Normal people like you or me cannot buy the latest and greatest H100 Tensor Core GPU from Nvidia. Nvidia works directly with Microsoft, Amazon, Google and co to facilitate large bulk orders of GPUs, leaving regular people at the bottom of the waitlist. We have seen a number of initiatives between chip manufacturers and large corporations to create the infrastructure required to train and provide inference for these models, for example:</p><ol><li><p><strong>OpenAI</strong> - In 2020, Microsoft <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://blogs.microsoft.com/ai/openai-azure-supercomputer/">built</a> a supercomputer exclusively for OpenAI to train its GPT models. The supercomputer developed for OpenAI is a single system with more than 285,000 CPU cores, 10,000 Nvidia V100 and A100 GPUs and 400 gigabits per second of network connectivity for each GPU server.</p></li><li><p><strong>Microsoft</strong> - In 2022, Nvidia partnered with Microsoft to <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.techradar.com/pro/i-think-microsoft-azure-eagle-is-probably-the-most-important-tech-news-of-2023-that-you-havent-heard-of-heres-why">create</a> a 1,123,200-core supercomputer utilizing Microsoft&apos;s Azure cloud technology. 
Eagle is now the third-largest supercomputer in the world, with maximum performance of 561 petaFLOPS generated from 14,400 Nvidia H100 GPUs and Intel’s Xeon Platinum 8480C 48C CPUs.</p></li><li><p><strong>Google</strong> - In 2023, Google <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.datacenterdynamics.com/en/news/google-unveils-a3-supercomputer-vms-capable-of-scaling-to-26000-nvidia-h100-gpu/">announced</a> the A3 supercomputer, purpose-built for AI &amp; ML models. A3 combines Nvidia’s H100 GPUs with Google’s custom-designed 200 Gbps Infrastructure Processing Units (IPUs), allowing the A3 to host up to 26,000 H100 GPUs.</p></li><li><p><strong>Meta</strong> - By year-end 2024, Meta <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.instagram.com/p/C2QARHJR1sZ/">expects</a> to operate some 350,000 Nvidia H100 GPUs, and the equivalent of 600,000 H100s of compute when including older GPUs such as the Nvidia A100s used to train Meta’s <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://ai.meta.com/blog/large-language-model-llama-meta-ai/">LLaMA</a> models.</p></li></ol><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/4464f85bad8ec6f80be60f108296d7a005dfd87e11bf641c134f27d9e36971c8.png" alt="Source: Statista" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Statista</figcaption></figure><p>The value of these feats of engineering for model training is immediately apparent. The large number of GPUs allows for parallel processing, enabling AI training to be greatly sped up and far larger models to be created. 
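The data-parallel idea behind these multi-GPU clusters can be sketched in miniature: shard a batch across workers, have each worker compute a gradient on its own shard, then average the gradients before a shared update. The toy example below is a pure-Python stand-in (real systems perform the averaging step with collective operations over interconnects like NVLink):

```python
# Minimal data-parallel sketch: fit y = w * x by gradient descent,
# with the batch sharded across simulated "workers".

data = [(x, 3.0 * x) for x in range(1, 9)]  # ground truth: w = 3

def shard(batch, n_workers):
    """Round-robin split of a batch across workers."""
    return [batch[i::n_workers] for i in range(n_workers)]

def local_gradient(w, samples):
    """Mean gradient of the squared error on one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in samples) / len(samples)

w, lr = 0.0, 0.01
for _ in range(200):
    # Each worker computes a gradient on its own shard "in parallel"...
    grads = [local_gradient(w, s) for s in shard(data, 4)]
    # ...then an all-reduce-style step averages them for one shared update.
    w -= lr * sum(grads) / len(grads)
# w converges toward the true value of 3
```

The same structure, with billions of parameters per worker and thousands of workers, is what lets clusters like these cut training time so dramatically.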
Take Microsoft&apos;s Eagle supercomputer for example: using the <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://mlcommons.org/benchmarks/training/">MLPerf benchmarking suite</a>, this system trained a GPT-3 LLM generative model with 175 billion parameters in just <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://azure.microsoft.com/en-us/blog/azure-sets-a-scale-record-in-large-language-model-training/">4 minutes</a>. The 10,752 H100 GPUs significantly speed up the process by leveraging their parallel processing capabilities, specialized Tensor Cores for deep learning acceleration, and high-speed interconnects like NVLink and NVSwitch. These GPUs&apos; large memory bandwidth and capacity, along with optimized CUDA and AI frameworks, facilitate efficient data handling and computations. Consequently, this setup enables distributed training strategies, allowing for simultaneous processing of different model parts, which drastically reduces training times for complex AI models.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/1e3b49b76d9b4bb9d809cbc2df5763e19c074ac40e70fc60e7b7d81997df7483.png" alt="Scale records on the model GPT-3 (175 billion parameters) from MLPerf Training v3.0 in June 2023 (3.0-2003) and Azure on MLPerf Training v3.1 in November 2023 (3.1-2002). Source: Microsoft" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Scale records on the model GPT-3 (175 billion parameters) from MLPerf Training v3.0 in June 2023 (3.0-2003) and Azure on MLPerf Training v3.1 in November 2023 (3.1-2002). 
Source: Microsoft</figcaption></figure><p>We have clearly established, then, that the powerhouse behind the training of these large models is compute power, primarily in the form of GPUs. The centralizing forces we run into here are twofold:</p><ol><li><p><strong>Exclusivity</strong> - Nvidia GPUs have a huge waitlist, and monopolistic corporations bulk-order GPUs with priority over smaller orders and individuals.</p></li><li><p><strong>Costs</strong> - The sheer cost of these GPU configurations means only a small set of entities worldwide can train these models. For reference, each Nvidia H100 costs anywhere between $30,000 and $40,000, meaning Meta’s 600,000-H100-equivalent compute infrastructure will cost between $18 billion and $24 billion.</p></li></ol><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/15c61818f371a75d3334dd24704b96f94e6f9111e6a68ba7015c217c55e69cc4.png" alt="Supercomputer Geographical Centralization. Source: Wikipedia TOP500" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Supercomputer Geographical Centralization. Source: Wikipedia TOP500</figcaption></figure><p>Amid the consolidation of computational power by major corporations, there&apos;s a parallel and strategic push by leading nations to enhance their computational capabilities, mirroring the intense competition of the Cold War&apos;s nuclear arms race. These countries are crafting and implementing comprehensive AI strategies, accompanied by a suite of regulatory measures aimed at securing technological supremacy. Notably, a Presidential executive order now mandates that foreign entities must obtain authorization to train AI models on U.S. territory. 
Additionally, export restrictions on microchips are set to hinder China&apos;s efforts to expand its supercomputing infrastructure, showcasing the geopolitical maneuvers to maintain and control the advancement of critical technologies.</p><p><strong>Chip Manufacturing</strong></p><p>Whilst Nvidia &amp; other semiconductor companies are at the cutting edge of chip design, they outsource all of their manufacturing to other corporations. Taiwan serves as the global hub for microchip production, <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.economist.com/special-report/2023/03/06/taiwans-dominance-of-the-chip-industry-makes-it-more-important">accounting</a> for more than 60% of the world&apos;s semiconductors and over 90% of the most sophisticated ones. The majority of these chips are produced by the Taiwan Semiconductor Manufacturing Company (TSMC), the sole manufacturer of the most advanced semiconductors. Nvidia’s partnership with TSMC is fundamental to the company&apos;s success and to the efficient production of H100 GPUs. TSMC <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.ft.com/content/bec85749-9354-4470-ac88-9323541c7bce">distinguishes</a> itself in the semiconductor industry with its advanced chip packaging patents, utilizing high-density packaging technology that stacks chips in three dimensions to enhance performance. This technology is crucial for producing chips designed for intensive data processing tasks, such as AI, enabling faster operation.</p><p>Whilst microchip production is currently working at maximum capacity, there are real risks to production from increased military threats from China towards Taiwan, a democratic island claimed by Beijing despite Taipei&apos;s vehement opposition. 
Geopolitical tensions in the region have risen, and worldwide we are seeing a heightening of AI tensions, with the <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.reuters.com/technology/upcoming-us-rules-ai-chip-exports-aim-stop-workarounds-us-official-2023-10-15/">US banning</a> certain microchip exports to China so as not to strengthen China’s AI capabilities and military. Should China advance on Taiwan, it could strategically position itself to dominate microchip manufacturing and thus the AI race.</p><p><strong>Fine-Tuning &amp; Closed-source Models</strong></p><p>In the fine-tuning stage the model is trained on new, specific datasets, and the internal configurations that allow the model to make predictions or decisions based on input data are altered. These internal configurations are called parameters. In neural networks, ‘weights’ are coefficients applied to input data, determining the connection strength between units across different layers of the model; they are adjusted throughout training to minimize prediction errors. ‘Biases’, constants added before the activation function, ensure the model can make accurate predictions even when inputs are zero, facilitating pattern recognition by allowing shifts in the activation function&apos;s application.</p><p>Closed-source models like OpenAI&apos;s GPT series maintain the confidentiality of their training data and model architecture, meaning the specific configurations of their parameters remain exclusive. The owner of such a model retains complete control over how it is used, developed and deployed, which can lead to a number of centralizing forces within the fine-tuning stage of a model:</p><ol><li><p><strong>Censorship</strong> - Owners can decide what types of content the model generates or processes. They can implement filters that block certain topics, keywords, or ideas from being produced or recognized by the model. 
This could be used to avoid controversial subjects, comply with legal regulations, or align with the company&apos;s ethical guidelines or business interests. Since the launch of ChatGPT, its outputs have become increasingly censored and less useful. An extreme case of censorship of these models is showcased in China, where Robot, a WeChat bot built atop OpenAI’s foundational model, doesn’t answer questions such as “What is Taiwan?” or allow users to ask questions about Xi Jinping. In fact, through adversarial bypass techniques, a <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.bloomberg.com/news/newsletters/2023-05-02/china-s-chatgpt-answers-raise-questions-about-censoring-generative-ai?leadSource=uverify%20wall">WSJ reporter</a> was able to get Robot to admit that it was programmed to avoid discussing “politically sensitive content about the Chinese government or Communist Party of China.”</p></li><li><p><strong>Bias</strong> - In neural networks, the role of weights and biases is pivotal, yet their influence can inadvertently introduce bias, particularly if the training data lacks diversity. Weights, by adjusting the strength of connections between neurons, may disproportionately highlight or ignore certain features, potentially leading to a bias of omission where critical information or patterns in underrepresented data are overlooked. Similarly, biases, set to enhance learning capabilities, might predispose the model to favor certain data types if not calibrated to reflect a broad spectrum of inputs. The closed-source nature of these models can cause the model to neglect important patterns from specific groups or scenarios, skewing predictions and perpetuating biases in the model&apos;s output, meaning certain perspectives, voices or information are excluded or misrepresented. 
A good <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://x.com/mjuric/status/1761981816125469064?s=20">example</a> of <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://twitter.com/elonmusk/status/1762548055730192504">bias</a> and censorship by the model owner is Google’s latest and greatest LLM, Gemini.</p></li><li><p><strong>Verifiability</strong> - In a closed-source environment, users cannot confirm whether the claimed version of a model, such as GPT-4 versus GPT-3, is actually being used. This is because the underlying model architecture, parameters, and training data are not accessible for external review. Such opacity makes it difficult to ascertain if the latest advancements or features are indeed present or if older technologies are being passed off as newer versions, potentially affecting the quality and capabilities of the AI service received. For example, when using AI models to ascertain an applicant&apos;s creditworthiness for a loan, how can the applicant be sure that the same model was run for them as for other applicants? Or how can we be sure the model only used the inputs it was supposed to use?</p></li><li><p><strong>Dependency, lock-in and stagnation</strong> - Entities that rely on closed-source AI platforms or models find themselves dependent on the corporations that maintain these services, leading to a monopolistic concentration of power that stifles open innovation. This dependency arises because the owning corporation can, at any moment, restrict access or alter the model, directly impacting those who build upon it. 
A historical perspective reveals numerous instances of this dynamic: Facebook, which initially embraced open development with its public APIs to foster innovation, notably <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://techcrunch.com/2013/01/18/facebook-data-voxer/">restricted </a>access to applications like Vine as they gained traction. Similarly, Voxer, a messaging app that gained popularity in 2012 for allowing users to connect with their Facebook friends, lost its access to Facebook&apos;s &apos;Find Friends&apos; feature. This pattern is not exclusive to Facebook; many networks and platforms begin with an open-source or open-innovation ethos only to later prioritize shareholder value, often at the expense of their user base. For-profit corporations eventually require take rates to meet their stated goals of creating shareholder value; for example, Apple&apos;s App Store imposes a 30% fee on the revenues generated from apps. Another example is Twitter, which, despite its original commitment to openness and interoperability with the RSS protocol, eventually prioritized its centralized database, leading to a <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.seroundtable.com/twitter-rss-depreciated-16973.html">disconnection</a> from RSS in 2013 and with it the loss of data ownership and one&apos;s social graph. Amazon has also been <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.reuters.com/investigates/special-report/amazon-india-rigging/">accused</a> of using its internal data to replicate and prioritize its products over those of other sellers. 
These examples underscore a trend where platforms evolve from open ecosystems to more controlled, centralized models, impacting both innovation and the broader digital community.</p></li><li><p><strong>Privacy</strong> - The owners of these centralized models, large corporations such as OpenAI, retain all rights to use the prompt and user data to better train their models. This greatly inhibits user privacy. For example, Samsung employees inadvertently <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.forbes.com/sites/siladityaray/2023/05/02/samsung-bans-chatgpt-and-other-chatbots-for-employees-after-sensitive-code-leak/">exposed</a> highly confidential information by utilizing ChatGPT for assistance with their projects. The organization permitted its semiconductor division engineers to use this AI tool for debugging source code. However, this led to the accidental disclosure of proprietary information, including the source code of an upcoming software, internal discussion notes, and details about their hardware. Given that ChatGPT collects and uses the data inputted into it for its learning processes, Samsung&apos;s trade secrets have unintentionally been shared with OpenAI.</p></li></ol><p><strong>Reinforcement Learning from Human Feedback (RLHF)</strong></p><p>Reinforcement Learning from Human Feedback (RLHF) integrates supervised fine-tuning, reward modeling, and reinforcement learning, all underpinned by human feedback. In this approach, human evaluators critically assess the AI&apos;s outputs, assigning ratings that facilitate the development of a reward model attuned to human preferences. This process necessitates high-quality human input, highlighting the importance of skilled labor in refining these models. Typically, this expertise tends to be concentrated within a few organizations capable of offering competitive compensation for such specialized tasks. 
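The reward-modeling step described above can be sketched with a pairwise-preference loss of the Bradley-Terry form: the human evaluator picks the better of two outputs, and the reward model is nudged to score the chosen output above the rejected one. Everything below (the one-parameter "reward model", the feature values) is purely illustrative:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Human comparisons: (feature of the chosen output, feature of the rejected one).
# A single scalar weight w stands in for the whole reward model.
comparisons = [(0.9, 0.2), (0.8, 0.1), (0.7, 0.4)]

w, lr = 0.0, 0.5
for _ in range(500):
    for chosen, rejected in comparisons:
        # P(human prefers "chosen") = sigmoid(reward(chosen) - reward(rejected))
        p = sigmoid(w * chosen - w * rejected)
        # Gradient ascent on the log-likelihood of the recorded preference.
        w += lr * (1.0 - p) * (chosen - rejected)
# The trained "reward model" now scores human-preferred outputs higher.
```

The quality of the resulting reward model depends directly on the volume and quality of the human comparisons, which is why skilled labor matters so much at this stage.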
Consequently, corporations with substantial resources are often in a better position to enhance their models, leveraging top talent in the field. This dynamic presents challenges for open-source projects, which may struggle to attract the necessary human labor for feedback without comparable funding or revenue streams. The result is a landscape where resource-rich entities are more likely to advance their AI capabilities, underscoring the need for innovative solutions to support diverse contributions in the development of AI technologies.</p><p><strong>Inference</strong></p><p>To effectively deploy machine learning (ML) or artificial intelligence (AI) models for user applications, it is imperative to ensure these models are equipped to manage real-world data inputs and provide precise, timely predictions or analyses. This necessitates careful deliberation on two pivotal aspects: the choice of deployment platform and the infrastructure requirements.</p><p><strong>Deployment Platform</strong></p><p>The deployment platform serves as the foundation for hosting the model, dictating its accessibility, performance, and scalability. Options range from on-premises servers, offering heightened control over data security and privacy, to cloud-based solutions that provide flexible, scalable environments capable of adapting to fluctuating demand. Additionally, edge computing presents a viable alternative for applications requiring real-time processing, minimizing latency by bringing computation closer to the data source. As with the pre-training stage, we run into similar centralisation problems when deploying the model for real world use:</p><p><strong>Infrastructure centralisation -</strong> The majority of models are deployed on top of high-performance cloud infrastructure, of which there are not many options worldwide. 
As highlighted earlier, a small set of corporations have the facilities to process inference for these high parameter models and the majority are located in the US (as of 2023, 58.6% of all data centers were located in the USA). This is particularly relevant in light of the presidential executive order on AI and the <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://knowledge.dlapiper.com/dlapiperknowledge/globalemploymentlatestdevelopments/2023/comparing-the-US-AI-Executive-Order-and-the-EU-AI-Act.html">EU AI act</a> as it could greatly limit the number of countries that are able to train and provide inference for complex AI models.</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/6cddb705075b5195aaf1faa39dbd639e408ffe26e125f5dcd9c7c014e17403ed.png" alt="Source: Statista" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Statista</figcaption></figure><p><strong>Costs</strong> - Another centralizing force within the inference stage is the significant costs involved in deploying these models on one&apos;s own servers, cloud infrastructure, or through edge computing. OpenAI has partnered with Microsoft to utilize Microsoft&apos;s Azure cloud infrastructure for serving its models. Dylan Patel, the chief analyst at consulting firm SemiAnalysis, estimated that OpenAI&apos;s server costs for enabling inference for GPT-3 were $700,000 per day. Importantly, this was when OpenAI was offering inference for their 175 billion parameter model, so, all things being equal, we would expect this number to have escalated well into the seven figures today. 
In addition to the geographical and jurisdictional centralization of these data centers, we also observe this necessary infrastructure being consolidated within a few corporations (84.9% of cloud <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://techcrunch.com/2024/02/02/ai-pushes-quarterly-cloud-infrastructure-revenue-to-74b-globally/">revenues</a> were generated by four companies in 2023).</p><figure float="none" data-type="figure" class="img-center" style="max-width: null;"><img src="https://storage.googleapis.com/papyrus_images/e6e1dd2e62e58478c7d50c93c72e54f31eeff9eb4fbae662c7670f7ee90dfa9a.png" alt="Source: Amazon, Microsoft, Google, Equinix, Statista" blurdataurl="data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=" nextheight="600" nextwidth="800" class="image-node embed"><figcaption HTMLAttributes="[object Object]" class="">Source: Amazon, Microsoft, Google, Equinix, Statista</figcaption></figure><p><strong>Centralized Frontends</strong></p><p>Centralized hosting of frontends involves delivering the user-interface components of websites and web applications from one primary location or a select few data centers managed by a handful of service providers. This method is widely adopted for rolling out web applications, particularly those leveraging AI technologies to offer dynamic content and interactive user experiences. The frontend is therefore susceptible to take-downs through regulations or through changes in the policies of the service providers. 
We have seen this play out in Mainland China as citizens are blocked from interacting with the frontends of popular AI interfaces such as <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://aibusiness.com/nlp/china-cracks-down-on-chatgpt-access">ChatGPT</a> and <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://www.semafor.com/article/10/20/2023/ai-platform-hugging-face-confirms-china-blocked-it">Hugging Face</a>.</p><p><strong>Conclusion</strong></p><p>In conclusion, the status quo for AI suffers from a number of centralizing and monopolistic forces that enable a minority of the world&apos;s largest entities to control and distribute models to the population. We have seen from the failures of Web2 that the misalignment of incentives between user and corporation poses a dire threat to our freedoms, privacy and right to use AI. The impending regulation surrounding AI and the flourishing open-source space show that we are at a pivotal moment in the advancement of the technology, and that we should do everything in our power to ensure it remains free and open source for all to use. In our next blog we will cover how crypto, at its intersection with AI, is enabling these free and open-source systems to scale, improving both the status quo and crypto itself.</p>]]></content:encoded>
            <author>asxn-labs@newsletter.paragraph.com (ASXN Labs)</author>
            <enclosure url="https://storage.googleapis.com/papyrus_images/698bc81be9bce3ddbd535f20b4140e13d600ba8db3da42b4cf2d46e65ca02dcb.png" length="0" type="image/png"/>
        </item>
    </channel>
</rss>