Article Highlights
To date, attention has focused primarily on the bottom layers of the AI stack: prominent AI labs such as OpenAI and Anthropic, and hardware manufacturers like Nvidia.
This concentration of focus and capital on the bottom layers has overshadowed the potential brewing in the application layer.
In the coming months, as experiments in the application layer grow, developers integrating AI models into applications may find that using relatively smaller models (potentially with highly specific or specialized functionalities) can lead to more manageable and flexible AI systems.
The use of small AI models positively impacts numerous areas within the AI and crypto stack, including decentralized training, local inference, and dataset collection and creation.
In late 2022, the world first experienced the seemingly magical capabilities of OpenAI's ChatGPT. Initial experiments followed the typical pattern for emerging but transformative technologies: the tool was used, and understood, mostly as an intriguing toy.
Fast forward to today: the spark ignited by ChatGPT has grown into a full-scale arms race to amass the vast funds needed to develop the first Artificial General Intelligence (AGI). To that end, all eyes are on the large AI labs (like OpenAI and Anthropic) and hardware manufacturers (such as Nvidia) to deliver the next cutting-edge trillion-parameter model.
AI labs and hardware companies represent the bottom layers of the AI stack. Together, these layers form the foundation for AI agents, applications, and systems. The focus and capital investment on these bottom layers have obscured the potential brewing in the application layer. As demonstrated by the relatively simple ChatGPT application, the true magic of these models unfolds when they are integrated with other software systems into a coherent product.
More broadly, combining pure AI models with tools, orchestration software, business logic, and even other AI models can form an AI system or composite AI system. As pointed out by the Berkeley AI Research Group, these systems can achieve astonishing results that a single AI model could not achieve alone.
As more developers attempt to integrate AI models into their applications, smaller models (potentially with highly specific or specialized functionalities) may lead to more manageable and flexible AI systems. Cost savings alone are a compelling reason to explore smaller models: using OpenAI's larger GPT-4o model costs roughly 30 times more than its GPT-4o mini model.
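A quick back-of-envelope calculation illustrates the cost gap. The per-token prices below are illustrative figures from around the models' launch (assumptions that will drift over time, not current pricing), but the ratio is in line with the roughly 30x difference cited above:

```python
# Back-of-envelope cost comparison between a frontier model and a small model.
# Prices are assumed, launch-era figures in USD per 1M input tokens -- check
# the provider's current pricing before relying on these numbers.
GPT_4O_INPUT = 5.00        # assumed $/1M input tokens
GPT_4O_MINI_INPUT = 0.15   # assumed $/1M input tokens

def monthly_cost(price_per_million: float, tokens_per_month: int) -> float:
    """Cost of processing a given monthly token volume at a flat per-token rate."""
    return price_per_million * tokens_per_month / 1_000_000

# A hypothetical app handling 500M input tokens per month:
large = monthly_cost(GPT_4O_INPUT, 500_000_000)
small = monthly_cost(GPT_4O_MINI_INPUT, 500_000_000)
print(f"large: ${large:,.2f}, small: ${small:,.2f}, ratio: {large / small:.0f}x")
# prints: large: $2,500.00, small: $75.00, ratio: 33x
```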
In a world where the use of small models is increasing, positive second-order effects may arise for decentralized model training, localized inference, and data collection incentives—areas of focus for many teams within the AI and crypto stack.
AI Systems and Small Models
The transition towards these AI systems is a natural evolution in the use of standalone AI models. Generally, AI models themselves are not the end product users want; it is the entire software system (i.e., the AI application) that creates value.
As in typical application development, packaging an AI model (or several) with business logic and the necessary tools requires thoughtful design and may take multiple rounds of testing, iteration, and deployment. From an architectural perspective, smaller, more specialized models can offer advantages.
Advantages of Small AI Models
To date, the scaling laws for Large Language Models (LLMs) continue to hold: increasing overall model size, along with the compute budget and the size of the training dataset, typically yields better-performing, or more "intelligent," models. However, the performance gains of these ever-larger models come with trade-offs relative to smaller models.
Trade-offs in Model Training and Inference
Smaller models can be trained faster and with a smaller compute budget than cutting-edge LLMs. At the far end of the scaling spectrum, for example, Meta's Llama 3.1 model family was trained on a vast dataset (more than 15 trillion tokens) using clusters of Nvidia H100 GPUs; training the largest model likely took weeks or even months on Meta's cluster of some 16,000 H100s. Smaller models, by contrast, can be trained on less data, in less time, with fewer GPUs. For an application developer focused on shipping and iterating on product designs quickly, smaller models are a compelling choice: their narrower role within an application removes the need for broad general intelligence.
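The scale difference can be made concrete with the common C ≈ 6·N·D approximation from the scaling-law literature (total training FLOPs ≈ 6 × parameters × training tokens). The cluster sizes, per-GPU throughput, and utilization figure below are rough assumptions for illustration, not measured values:

```python
# Rough training-compute sketch using the C ~= 6 * N * D heuristic.
# All hardware numbers below are illustrative assumptions.

def training_flops(params: float, tokens: float) -> float:
    """Approximate total training FLOPs: 6 x parameters x tokens."""
    return 6 * params * tokens

def training_days(flops: float, n_gpus: int, flops_per_gpu: float, mfu: float) -> float:
    """Wall-clock days at a given cluster size and model FLOPs utilization (MFU)."""
    effective_throughput = n_gpus * flops_per_gpu * mfu
    return flops / effective_throughput / 86_400  # seconds per day

# Frontier scale: ~405B params on ~15T tokens (Llama 3.1 405B ballpark)
big = training_flops(405e9, 15e12)    # ~3.6e25 FLOPs
# Small model: 3B params on 1T tokens
small = training_flops(3e9, 1e12)     # ~1.8e22 FLOPs

H100_PEAK = 1e15  # assumed ~1 PFLOP/s per GPU, a rough round number
print(f"big:   {training_days(big, 16_000, H100_PEAK, 0.4):.0f} days on 16,000 GPUs")
print(f"small: {training_days(small, 64, H100_PEAK, 0.4):.1f} days on 64 GPUs")
```

Even with generous assumptions, the frontier run occupies a 16,000-GPU cluster for about two months, while the small model finishes in roughly a week on 64 GPUs.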
Inference is the act of querying a model after it has been trained and integrated into an application. The model's response time, known as latency, is a critical metric when an AI application serves real user requests in production; lower latency generally means a better user experience. Imagine how poor the experience would be if ChatGPT took tens of seconds, or minutes, to answer each query. Because smaller models require less computation per request, they can serve inference requests faster than larger models.
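The relationship between model size and latency can be sketched with a toy simulation. The `fake_forward_pass` function below is a hypothetical stand-in, not a real model; it simply shows that latency grows with the amount of computation per request:

```python
import time

def fake_forward_pass(n_ops: int) -> float:
    """Stand-in for model inference: work (and latency) scales with n_ops."""
    acc = 0.0
    for i in range(n_ops):
        acc += i * 0.5
    return acc

def measure_latency_ms(n_ops: int, runs: int = 5) -> float:
    """Average wall-clock latency over several runs, in milliseconds."""
    start = time.perf_counter()
    for _ in range(runs):
        fake_forward_pass(n_ops)
    return (time.perf_counter() - start) / runs * 1000

small_ms = measure_latency_ms(100_000)    # "small model": less compute per query
large_ms = measure_latency_ms(2_000_000)  # "large model": 20x the compute
print(f"small: {small_ms:.1f} ms, large: {large_ms:.1f} ms")
```

Real inference latency also depends on batching, hardware, and memory bandwidth, but the basic intuition holds: fewer operations per request means faster responses.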
Overall, these design decisions shape the trade-off space that application developers must navigate. How small can an AI model be while still providing performance (in terms of latency and cost) that meets the application's requirements?
Historically, a developer seeking a significant performance improvement often had little choice but to wait for the next generation of large, cutting-edge models. These ever-larger models do bring new capabilities, but also higher compute costs and latency. For many, relying on massive models seemed the only viable path to the desired user experience. From an efficiency standpoint, however, this is over-engineering: not every application needs the full capability of an oversized model, and the result is a mismatch between model capability and application need.
Today, the landscape is changing. Smaller AI models have become capable enough to deploy in real production environments. Combined with business logic and techniques such as tool use, function calling, Retrieval-Augmented Generation (RAG), fine-tuning, and even other small models, these AI systems can match or even surpass the results of systems built on larger models.
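A minimal sketch shows what such a composite system looks like in code. Everything here is a hypothetical placeholder (the keyword retriever stands in for a vector database, and `small_model` stands in for a real small-LLM API call); the point is the shape of the orchestration, not the implementation:

```python
# Minimal "composite AI system" sketch: a retrieval step narrows the context
# before a (stubbed) small model answers. All names here are illustrative.

CORPUS = {
    "refunds": "Refunds are processed within 5 business days.",
    "shipping": "Standard shipping takes 3-7 business days.",
    "returns": "Items may be returned within 30 days of delivery.",
}

def retrieve(query: str) -> str:
    """Naive keyword retrieval standing in for a vector-database lookup."""
    for topic, passage in CORPUS.items():
        if topic in query.lower():
            return passage
    return ""

def small_model(context: str) -> str:
    """Stub for a narrowly scoped small-LLM call."""
    return f"Answer based on policy: {context}" if context else "I don't know."

def answer(query: str) -> str:
    # Orchestration and business logic wrapping the model -- the "system" part.
    context = retrieve(query)
    return small_model(context)

print(answer("How long do refunds take?"))
# prints: Answer based on policy: Refunds are processed within 5 business days.
```

Because the model only ever sees a narrow, pre-filtered context, a small model can suffice where a general-purpose frontier model would otherwise be used.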
Impact on AI and Crypto
Introducing smaller AI models into AI systems and applications positively impacts multiple verticals within the AI and crypto stack, including decentralized training, local inference, and incentivized data collection.
Decentralized and Distributed Training
Recent breakthroughs in decentralized and distributed training have pushed the concept into the mainstream AI spotlight. Prime Intellect and Nous Research have each demonstrated, using different techniques, that AI models can be trained across geographically distributed compute clusters. Before these results, decentralized training was widely considered impractical and uneconomical.
Although the initial results are impressive, more research and engineering work is needed to scale these methods to models comparable in sheer size (e.g., trillion-parameter models) to those produced by labs like OpenAI and Anthropic.
Teams building with smaller models (Prime Intellect and Nous have experimented with models in the billions of parameters), however, will be able to use decentralized training protocols such as Gensyn and Prime Intellect once these new distributed training methods are integrated into those systems.
Local Inference
Most users interact with the latest generative AI models, whether for text, images, or video, through hosted services. Much like traditional cloud architectures, OpenAI hosts and runs the ChatGPT application and provides developers with API endpoints for integrating its models.
The convenience hosted services provide is hard to overstate. They do, however, have drawbacks:
Black Box Models – The model you use today may not be the model you use tomorrow. From the user's perspective it is a black box: the service provider can swap models without the user's knowledge, which can lead to unexpected behavior, or to a high-quality model being silently replaced by a lower-quality one even though users are paying for the better model.
Privacy – The entity running the service can see all data passed through the model, depriving users of the ability to keep their queries private.