I. Introduction | The Model Layer Leap in Crypto AI
Data, models, and computing power are the three core elements of AI infrastructure, analogous to fuel (data), an engine (models), and energy (computing power); none is dispensable. Mirroring the evolution of traditional AI infrastructure, the Crypto AI field has passed through comparable stages. In early 2024, the market was dominated by decentralized GPU projects (Akash, Render, io.net, and others), which generally emphasized a crude growth logic of "competing on raw compute." After entering 2025, however, industry focus shifted toward the model and data layers, marking Crypto AI's transition from competition over underlying resources to more sustainable, application-oriented middle-layer construction.
General Large Models (LLMs) vs. Specialized Models (SLMs)
Traditional large language models (LLMs) rely heavily on large-scale datasets and complex distributed architectures for training; parameter counts commonly range from 70B to 500B, and a single training run can cost millions of dollars. In contrast, SLMs (Specialized Language Models) are a lightweight fine-tuning paradigm built on reusable foundation models. Typically based on open-source models such as LLaMA, Mistral, and DeepSeek, SLMs combine a small amount of high-quality domain data with techniques such as LoRA to quickly produce expert models with specific domain knowledge, significantly reducing training costs and technical barriers.
It is worth noting that an SLM is not merged into the LLM's weights. Instead, the two cooperate through Agent-architecture calls, dynamic routing in plugin systems, hot-swappable LoRA modules, and RAG (Retrieval-Augmented Generation). This architecture retains the LLM's broad coverage while boosting domain performance through fine-tuned modules, forming a highly flexible, composable intelligent system.
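To make this division of labor concrete, here is a minimal Python sketch of the routing pattern: a query is dispatched either to a general-purpose LLM or to a domain-specific SLM. The model functions and the keyword-based rule are illustrative placeholders, not any project's actual API.

```python
# Toy router: send domain queries to a specialist SLM, everything else to the LLM.
# All names here are illustrative stand-ins, not a real API.
from typing import Callable, Dict

def general_llm(prompt: str) -> str:
    return f"[general LLM] {prompt}"

def legal_slm(prompt: str) -> str:
    return f"[legal LoRA adapter] {prompt}"

def medical_slm(prompt: str) -> str:
    return f"[medical LoRA adapter] {prompt}"

# Route table: domain keyword -> specialized model (e.g. a hot-swapped LoRA module).
ROUTES: Dict[str, Callable[[str], str]] = {
    "contract": legal_slm,
    "diagnosis": medical_slm,
}

def route(prompt: str) -> str:
    """Dispatch to a specialist if a domain keyword matches; otherwise fall back."""
    for keyword, specialist in ROUTES.items():
        if keyword in prompt.lower():
            return specialist(prompt)
    return general_llm(prompt)

print(route("Review this contract clause for risks."))  # -> legal SLM
print(route("What is the capital of France?"))          # -> general LLM
```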
The Value and Boundaries of Crypto AI in the Model Layer
Crypto AI projects essentially struggle to directly enhance the core capabilities of large language models (LLMs), primarily due to:
High Technical Barriers: The data scale, computing resources, and engineering capability required to train a foundation model are enormous; currently, only technology giants in the United States (such as OpenAI) and China (such as DeepSeek) possess them.
Limited Open-Source Ecosystem: Although mainstream foundation models such as LLaMA and Mixtral have been open-sourced, the key drivers of model breakthroughs remain concentrated in research institutions and closed-source engineering systems, leaving on-chain projects limited room to participate at the core model layer.
However, on top of open-source foundation models, Crypto AI projects can still extend value by fine-tuning specialized language models (SLMs) and combining Web3's verifiability and incentive mechanisms. As the "peripheral interface layer" of the AI industry chain, this value manifests in two core directions:
Trustworthy Verification Layer: Recording the model generation path, data contributions, and usage on-chain enhances the verifiability and tamper-resistance of AI outputs.
Incentive Mechanism: Native tokens incentivize data uploads, model calls, and Agent execution, creating a positive feedback loop between model training and model services.
Classification of AI Model Types and Blockchain Applicability Analysis
In practice, the feasible focus of model-oriented Crypto AI projects lies mainly in lightweight fine-tuning of small SLMs, on-chain data access and verification in RAG architectures, and local deployment and incentivization of edge models. Combined with blockchain's verifiability and token mechanisms, Crypto can provide unique value in these mid-to-low-resource model scenarios, forming differentiated value at the AI "interface layer."
Blockchain-based AI chains can clearly and immutably record the provenance of every piece of data and every model, significantly enhancing data credibility and the traceability of model training. Through smart contract mechanisms, rewards are triggered automatically whenever data or models are called, turning AI activity into measurable, tradable tokenized value and building a sustainable incentive system. In addition, community users can evaluate model performance and participate in rule-making and iteration through token voting, improving the decentralized governance structure.
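As a rough illustration of that pattern, the Python sketch below simulates contract-style logic: every data or model call is appended to a log and automatically credits the contributor. The flat per-call reward is an assumed placeholder; in practice this logic would live in on-chain smart contracts.

```python
# Simulated incentive ledger: log each call, credit the contributor automatically.
# Plain Python standing in for smart-contract logic; not OpenLedger code.
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class IncentiveLedger:
    reward_per_call: float = 1.0  # hypothetical flat reward per call
    balances: dict = field(default_factory=lambda: defaultdict(float))
    call_log: list = field(default_factory=list)

    def record_call(self, asset_id: str, contributor: str) -> None:
        """Append the call to the log and credit the contributor in one step."""
        self.call_log.append((asset_id, contributor))
        self.balances[contributor] += self.reward_per_call

ledger = IncentiveLedger()
ledger.record_call("datanet:medical-qa", "alice")
ledger.record_call("model:legal-lora-v1", "bob")
print(dict(ledger.balances))  # {'alice': 1.0, 'bob': 1.0}
```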
OpenLedger is one of the few blockchain AI projects in the market that focuses on data and model incentive mechanisms. It was the first to propose the concept of "Payable AI," aiming to build a fair, transparent, and composable AI operating environment that incentivizes data contributors, model developers, and AI application builders to collaborate on the same platform and earn on-chain rewards based on their actual contributions.
OpenLedger provides a full-chain closed loop from "data provision" to "model deployment" and then to "call revenue sharing." Its core modules include:
Model Factory: A no-code platform for fine-tuning and deploying custom models on top of open-source LLMs using LoRA.
OpenLoRA: Supports the coexistence of thousands of models with dynamic loading on demand, significantly reducing deployment costs.
PoA (Proof of Attribution): Measures contributions and allocates rewards through on-chain call records.
Datanets: Structured data networks for vertical scenarios, collaboratively built and verified by the community.
Model Proposal Platform: A composable, callable, and payable on-chain model marketplace.
Through these modules, OpenLedger builds a data-driven, model-composable "agent economy infrastructure," promoting the on-chainization of the AI value chain.
In terms of blockchain architecture, OpenLedger builds on OP Stack + EigenDA to create a high-performance, low-cost, verifiable runtime environment for the data and contracts behind AI models.
Built on the OP Stack: Leveraging the Optimism technology stack for high throughput and low-cost execution.
Settled on the Ethereum Mainnet: Ensuring transaction security and asset integrity.
EVM Compatible: Facilitating quick deployment and scalability for developers using Solidity.
EigenDA for Data Availability: Significantly reducing storage costs while ensuring data verifiability.
Compared to general-purpose AI chains like NEAR, which focus more on data sovereignty and the "AI Agents on BOS" architecture, OpenLedger is dedicated to building an AI-specific chain for data and model incentives. It aims to make model development and calls on-chain traceable, composable, and sustainable. As a model incentive infrastructure in the Web3 world, it combines HuggingFace-style model hosting, Stripe-style usage billing, and Infura-style on-chain composable interfaces, promoting the path to "models as assets."
ModelFactory is a large language model (LLM) fine-tuning platform within the OpenLedger ecosystem. Unlike traditional fine-tuning frameworks, ModelFactory offers a purely graphical interface without the need for command-line tools or API integration. Users can fine-tune models based on datasets that have been authorized and reviewed on OpenLedger. It integrates data authorization, model training, and deployment into a unified workflow, with the core processes including:
Data Access Control: Users submit data requests, providers review and approve, and data is automatically connected to the model training interface.
Model Selection and Configuration: Supports mainstream LLMs (such as LLaMA, Mistral) with GUI-based hyperparameter configuration.
Lightweight Fine-Tuning: Built-in LoRA/QLoRA engine with real-time training progress display (a minimal code sketch follows this list).
Model Evaluation and Deployment: Built-in evaluation tools support export for deployment or ecosystem sharing.
Interactive Verification Interface: Chat-style interface for easy testing of model Q&A capabilities.
RAG Generation Traceability: Answers come with source citations to enhance trust and auditability.
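For orientation, the kind of LoRA fine-tuning that ModelFactory wraps behind its GUI looks roughly like the sketch below, written against the open-source Hugging Face transformers and peft libraries. The base model and hyperparameters are illustrative assumptions, not ModelFactory's actual configuration.

```python
# Minimal LoRA fine-tuning setup with Hugging Face transformers + peft.
# Model name and hyperparameters are illustrative, not ModelFactory defaults.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id)

# LoRA: freeze the base weights; train only small low-rank adapter matrices.
config = LoraConfig(
    r=8,                                  # rank of the low-rank update
    lora_alpha=16,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of base parameters
# From here, a standard transformers Trainer run fine-tunes only the adapters.
```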
The ModelFactory system architecture consists of six modules, spanning identity authentication, data permissions, model fine-tuning, evaluation and deployment, and RAG traceability, forming a secure, controllable, interactive, and sustainably monetizable model-service platform.
The current capabilities of the large language models supported by ModelFactory are briefly summarized as follows:
LLaMA Series: Widely adopted in the ecosystem, active community, strong general performance, and one of the most mainstream open-source foundation models.
Mistral: Highly efficient architecture, excellent inference performance, suitable for flexible deployment in resource-limited scenarios.
Qwen: Developed by Alibaba, excellent performance on Chinese tasks, strong all-around capabilities, and a preferred choice for developers in China.
ChatGLM: Outstanding Chinese dialogue effects, suitable for vertical customer service and localization scenarios.
DeepSeek: Superior in code generation and mathematical reasoning, suitable for intelligent development-assistant tools.
Gemma: A lightweight model launched by Google, clear structure, easy to get started and experiment with.
Falcon: Once a performance benchmark, suitable for basic research or comparison tests, but community activity has declined.
BLOOM: Strong multi-language support, but weak inference performance, suitable for language coverage research.
GPT-2: A classic early model, only suitable for teaching and verification purposes, not recommended for actual deployment.
Although OpenLedger's model combination does not include the latest high-performance MoE models or multimodal models, its strategy is not outdated. Instead, it is a "utility-first" configuration based on the practical constraints of on-chain deployment (inference cost, RAG adaptation, LoRA compatibility, EVM environment).
As a no-code toolchain, Model Factory has built-in contribution proof mechanisms for all models to ensure the rights of data contributors and model developers. It offers the advantages of low barriers to entry, monetization, and composability. Compared to traditional model development tools:
For Developers: Provides a complete path for model incubation, distribution, and revenue.
For the Platform: Forms a model asset circulation and composability ecosystem.
For Users: Enables the combination and use of models or Agents like calling an API.
LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method that inserts trainable "low-rank matrices" into a pre-trained large model to learn new tasks without modifying the original model weights, dramatically reducing training cost and storage requirements. Traditional large language models (such as LLaMA and GPT-3) typically have billions to hundreds of billions of parameters, and adapting them to specific tasks (such as legal Q&A or medical consultation) requires fine-tuning. LoRA's core strategy is to freeze the original model's parameters and train only the newly inserted low-rank matrices. With its parameter efficiency, fast training, and flexible deployment, LoRA is currently the mainstream fine-tuning method best suited to Web3 model deployment and composition.
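A toy numerical example of the idea, assuming a single 1024x1024 linear layer and rank r = 8: the frozen weight W is augmented by the product of two small trainable matrices A and B, so only about 1.6% of the layer's parameters are trained.

```python
# Numeric illustration of the low-rank update behind LoRA: y = x(W + AB).
import numpy as np

d_in, d_out, r = 1024, 1024, 8
rng = np.random.default_rng(0)

W = rng.normal(size=(d_in, d_out))     # frozen pretrained weight
A = rng.normal(size=(d_in, r)) * 0.01  # trainable low-rank factor
B = np.zeros((r, d_out))               # zero-initialized: W' == W at the start

x = rng.normal(size=(1, d_in))
y = x @ W + x @ A @ B                  # forward pass with the LoRA branch added

full_params = d_in * d_out
lora_params = r * (d_in + d_out)
print(f"trainable: {lora_params:,} vs full fine-tune: {full_params:,} "
      f"({lora_params / full_params:.2%})")
# trainable: 16,384 vs full fine-tune: 1,048,576 (1.56%)
```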
OpenLoRA is a lightweight inference framework designed by OpenLedger for multi-model deployment and resource sharing. Its core goal is to address common issues in AI model deployment, such as high costs, low reusability, and GPU resource waste, and to promote the implementation of "Payable AI."
Built on a modular design, the core components of the OpenLoRA system architecture cover model storage, inference execution, and request routing, enabling efficient, low-cost multi-model deployment and calling:
LoRA Adapter Storage Module: Fine-tuned LoRA adapters are hosted on OpenLedger, allowing on-demand loading to avoid preloading all models into GPU memory and saving resources.
Model Hosting and Adapter Merging Layer: All fine-tuned models share a base model, with LoRA adapters dynamically merged during inference. This supports ensemble inference with multiple adapters, enhancing performance.
Inference Engine: Integrated with multiple CUDA optimization technologies such as Flash-Attention, Paged-Attention, and SGMV.
Request Router and Token Streaming Module: Dynamically routes each request to the adapter for the required model, and streams generated tokens at the kernel level.
OpenLoRA's inference flow follows a mature, general-purpose model-serving pattern (a minimal sketch follows the list):
Base Model Loading: The system preloads base models such as LLaMA 3 and Mistral into GPU memory.
LoRA Dynamic Retrieval: After receiving a request, dynamically loads the specified LoRA adapter from Hugging Face, Predibase, or a local directory.
Adapter Merging and Activation: Optimized kernels merge the adapter with the base model in real-time, supporting combined inference with multiple adapters.
Inference Execution and Streamed Output: The merged model begins to generate responses, using token-level streaming output to reduce latency while ensuring efficiency and accuracy through quantization.
Inference Completion and Resource Release: After inference completes, the adapter is automatically unloaded to free GPU memory, allowing a single GPU to rotate and serve thousands of fine-tuned models efficiently.
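A simplified version of this load-merge-serve-unload cycle can be expressed with the open-source peft library, as sketched below. The adapter ID is hypothetical, and OpenLoRA's actual engine adds the kernel-level optimizations (paged attention, SGMV, token streaming) described in this section.

```python
# Sketch of dynamic adapter serving on a shared base model (transformers + peft).
# The adapter repo ID is hypothetical; OpenLoRA's engine is more heavily optimized.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)

def serve(adapter_id: str, prompt: str) -> str:
    # 1) Dynamically attach the requested LoRA adapter to the shared base model.
    model = PeftModel.from_pretrained(base, adapter_id)
    # 2) Run inference through the base + adapter combination.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=64)
    # 3) Detach the adapter so the next request can load a different one.
    model.unload()
    return tokenizer.decode(output[0], skip_special_tokens=True)

print(serve("your-org/legal-qa-lora", "Summarize the indemnity clause."))
```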
Through a series of underlying optimizations, OpenLoRA significantly improves the efficiency of multi-model deployment and inference. These include:
Dynamic LoRA Adapter Loading (JIT Loading): Effectively reduces GPU memory usage.
Tensor Parallelism and Paged Attention: Enables high concurrency and long-text processing.
Multi-Adapter Merging: Supports ensemble inference with multiple adapters.
Flash Attention, Precompiled CUDA Kernels, and FP8/INT8 Quantization: Further raises inference speed and lowers latency through low-level CUDA optimization and quantization support (a toy quantization example follows).
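As a toy illustration of the quantization idea only (FP8 additionally requires hardware and kernel support), the sketch below stores a weight matrix as INT8 values plus one symmetric scale, cutting memory fourfold at a small accuracy cost.

```python
# Toy symmetric INT8 weight quantization: int8 values + one float scale.
import numpy as np

def quantize_int8(w: np.ndarray):
    scale = np.abs(w).max() / 127.0                              # per-tensor scale
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=(256, 256)).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(w - dequantize(q, s)).max()
print(f"memory: {w.nbytes} B -> {q.nbytes} B, max abs error {err:.4f}")
```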
These optimizations allow OpenLoRA to efficiently serve thousands of fine-tuned models in a single-card environment, balancing performance, scalability, and resource utilization.
OpenLoRA is not only an efficient LoRA inference framework but also deeply integrates model inference with Web3 incentive mechanisms. Its goal is to turn LoRA models into callable, composable, and revenue-sharing Web3 assets.
Model as Asset: OpenLoRA does not just deploy models but assigns each fine-tuned model an on-chain identity (Model ID) and binds its calling behavior to economic incentives, achieving "call-based revenue sharing."
Dynamic Multi-LoRA Merging + Revenue Attribution: Supports dynamically combining calls across multiple LoRA adapters, allowing different model combinations to form new Agent services, while the PoA (Proof of Attribution) mechanism allocates revenue to each adapter precisely according to call volume (see the sketch after this list).
Multi-Tenant Shared Inference for Long-Tail Models: Through dynamic loading and memory release mechanisms, OpenLoRA can serve thousands of LoRA models in a single-card environment, making it particularly suitable for high-reuse, low-frequency calling scenarios in Web3, such as niche models and personalized AI assistants.
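A minimal sketch of call-volume-based revenue sharing follows, assuming a flat per-call fee and a pro-rata split; the actual PoA attribution formula is not specified here, so these numbers and rules are purely illustrative.

```python
# Pro-rata revenue split across LoRA adapters by call volume (illustrative only).
from collections import Counter

FEE_PER_CALL = 0.10  # hypothetical fee, in tokens

call_counts = Counter({   # adapter_id -> calls recorded on-chain this epoch
    "legal-lora-v1": 700,
    "medical-lora-v2": 250,
    "tax-lora-v1": 50,
})

pool = FEE_PER_CALL * sum(call_counts.values())   # total fees collected
total_calls = sum(call_counts.values())
payouts = {aid: pool * n / total_calls for aid, n in call_counts.items()}
print(payouts)  # {'legal-lora-v1': 70.0, 'medical-lora-v2': 25.0, 'tax-lora-v1': 5.0}
```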
High-quality, domain-specific data is a key element in building high-performance models. Datanets are OpenLedger's "data as asset" infrastructure: a decentralized network for aggregating, verifying, and distributing domain-specific data to provide high-quality sources for AI model training and fine-tuning. Each Datanet acts like a structured data warehouse to which contributors upload data, with on-chain attribution mechanisms ensuring traceability and trustworthiness. Through incentive mechanisms and transparent permission controls, Datanets enable community-built, trustworthy use of data for model training.
Compared to projects like Vana that focus on data sovereignty, OpenLedger goes beyond "data collection." Through Datanets (collaborative labeling and attribution datasets), Model Factory (no-code fine-tuning model training tool), and OpenLoRA (traceable and composable model adapters), OpenLedger extends the value of data to model training and on-chain calls, creating a complete "data-to-intelligence" closed loop. While Vana emphasizes "who owns the data," OpenLedger focuses on "how data is trained, called, and rewarded," occupying key positions in data sovereignty protection and data monetization paths in the Web3 AI ecosystem.
Proof of Attribution (PoA) is OpenLedger's core mechanism for data attribution and incentive distribution. Through on-chain cryptographic records, it establishes a verifiable link between each training data point and model output, ensuring contributors receive the rewards they are due when models are called. The data attribution and incentive process runs as follows (a simplified numeric sketch follows the list):
Data Submission: Users upload structured, domain-specific datasets and establish on-chain ownership.
Impact Assessment: At each inference, the system evaluates the value of the data involved based on its feature influence and the contributor's reputation.
Training Verification: Training logs record the actual usage of each data point to ensure contributions are verifiable.
Incentive Distribution: Token rewards are distributed to contributors based on data impact, tied to effectiveness.
Quality Governance: Low-quality, redundant, or malicious data is penalized to ensure model training quality.
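To make the flow concrete, the sketch below distributes a hypothetical epoch reward in proportion to each data point's measured influence weighted by its contributor's reputation, excluding flagged data entirely; the weighting formula is an assumption for illustration, not OpenLedger's published algorithm.

```python
# Simplified PoA-style reward step: influence x reputation weighting, with
# flagged (malicious/low-quality) data excluded. Formula is illustrative.
EPOCH_REWARD = 1000.0  # hypothetical token budget for one epoch

contributions = [
    # (contributor, influence from inference logs, reputation, flagged)
    ("alice", 0.60, 1.0, False),
    ("bob",   0.35, 0.8, False),
    ("carol", 0.05, 0.5, True),   # flagged data: excluded from the payout
]

weights = {c: inf * rep for c, inf, rep, flagged in contributions if not flagged}
total = sum(weights.values())
rewards = {c: EPOCH_REWARD * w / total for c, w in weights.items()}
print({c: round(r, 1) for c, r in rewards.items()})  # {'alice': 681.8, 'bob': 318.2}
```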
Compared with general-purpose blockchain incentive networks such as Bittensor, with its subnet architecture and scoring mechanisms, OpenLedger focuses on value capture and revenue sharing at the model level. PoA is not just an incentive-distribution tool but a framework for transparency, source tracking, and multi-stage attribution. It records the entire process of data upload, model calling, and Agent execution on-chain, creating an end-to-end verifiable value path. This mechanism ensures that every model call can be traced back to data contributors and model developers, achieving genuine "value consensus" and "reward availability" in on-chain AI systems.
RAG (Retrieval-Augmented Generation) is an AI architecture that combines retrieval systems with generative models. It addresses the "knowledge isolation" and hallucination problems of traditional language models by introducing external knowledge bases, making outputs more truthful, interpretable, and verifiable. RAG Attribution is the data attribution and incentive mechanism OpenLedger applies in the retrieval-augmented generation setting: it ensures that model outputs are traceable and verifiable, that contributors are incentivized, and ultimately that generation is credible and data transparent. The process includes the following steps (a minimal sketch follows the list):
User Query → Data Retrieval: After receiving a question, the AI retrieves relevant content from the OpenLedger data index.
Data Call and Response Generation: The retrieved content is used to generate model responses, and the call behavior is recorded on-chain.
Contributor Reward: After the data is used, contributors receive incentives calculated from usage volume and relevance.
Response with Citation: Model outputs include links to the original data sources, enabling transparent Q&A and verifiable content.
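The sketch below mirrors this query-retrieve-generate-cite loop with a toy in-memory index standing in for OpenLedger's on-chain data index; the document IDs, contributor fields, and scoring rule are all illustrative.

```python
# Toy RAG loop with citations: retrieve, ground the answer, cite the sources.
from dataclasses import dataclass

@dataclass
class Doc:
    doc_id: str       # stand-in for the on-chain identifier of contributed data
    contributor: str
    text: str

INDEX = [
    Doc("datanet:med/001", "alice",
        "Metformin is a first-line treatment for type 2 diabetes."),
    Doc("datanet:med/002", "bob",
        "Regular exercise improves insulin sensitivity."),
]

def retrieve(query: str, k: int = 1) -> list:
    """Toy relevance score: number of shared lowercase words."""
    words = set(query.lower().split())
    score = lambda d: len(words & set(d.text.lower().split()))
    return sorted(INDEX, key=score, reverse=True)[:k]

def answer(query: str) -> str:
    docs = retrieve(query)
    context = " ".join(d.text for d in docs)
    citations = ", ".join(d.doc_id for d in docs)  # output carries its sources
    # A real system would also log this call on-chain and credit each contributor.
    return f"(grounded in: {context}) [sources: {citations}]"

print(answer("What is a first-line treatment for type 2 diabetes?"))
```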