How to Implement GPU-Based LLM Inference in AO


The demand for running complex computations, especially AI inference, directly on decentralized platforms like the AO Computer is growing. Apus Network is tackling this challenge by integrating GPU acceleration into HyperBeam, providing a powerful solution for AO. Critically, this integration ensures deterministic execution, a vital requirement for both the verifiable nature of the AO ecosystem and the predictable outcomes needed for reliable AI applications. This post explores the architecture and technology stack enabling this deterministic GPU capability within HyperBeam.
At the heart of HyperBeam's GPU solution lies an integration built around llama.cpp. The framework connects the high-level HyperBEAM environment to the low-level GPU hardware through several layers:
HyperBEAM
dev_wasi_nn.erl: The entry point within the HyperBEAM Erlang environment.
dev_wasi_nn_nif.erl: An Erlang Native Implemented Function (NIF) that acts as a bridge to C code.
WASM Module (aos-wasi-nn.wasm): A WebAssembly module responsible for executing the inference tasks. It interacts with a specific WASI-NN backend.
WASI-NN Backend (lib_wasinn_llamacpp.c): This C library implements the WASI-NN standard, translating the calls into operations that llama.cpp can understand. It handles loading models, initializing execution contexts, setting inputs, computing, and getting outputs.
Llama.cpp: A key C++ library optimized for running LLM inference efficiently, including support for GPU acceleration via CUDA, while guaranteeing deterministic output.
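To make the backend's role concrete, here is an illustrative Python mock of the five-step WASI-NN call sequence that lib_wasinn_llamacpp.c implements on top of llama.cpp. The class and its return values are hypothetical stand-ins, not the real C API; only the sequence of calls (load, init execution context, set input, compute, get output) mirrors the WASI-NN standard described above.

```python
# Illustrative mock (NOT the real C backend) of the WASI-NN call sequence
# that lib_wasinn_llamacpp.c implements by forwarding to llama.cpp.

class MockWasiNN:
    """Each method mirrors one WASI-NN entry point; bodies are stubs."""

    def load(self, model_path):
        # Real backend: parse the model file and upload weights to the GPU.
        self.model = model_path
        return "graph-0"                      # opaque graph handle

    def init_execution_context(self, graph):
        # Real backend: allocate a llama.cpp context for this graph.
        return "ctx-0"                        # opaque execution context

    def set_input(self, ctx, prompt):
        self.prompt = prompt

    def compute(self, ctx):
        # Real backend: llama.cpp runs the forward pass via CUDA.
        self.result = f"completion for: {self.prompt}"

    def get_output(self, ctx):
        return self.result

backend = MockWasiNN()
graph = backend.load("qwen2.5-7b-q2_k.gguf")
ctx = backend.init_execution_context(graph)
backend.set_input(ctx, "who are you")
backend.compute(ctx)
output = backend.get_output(ctx)
print(output)
```

The key design point is that the WASM module only ever sees these five abstract calls; everything GPU-specific stays behind the backend boundary.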

How does a request travel through this system to reach the GPU? The layer path diagram provides a clear picture:
A request might originate from a C or Lua environment, potentially arriving as an HTTP message handled by the aos console.
This request eventually triggers the aos-wasi-nn WASM module, which is Lua code compiled to a WASM file.
Execution then flows from dev_wasm (which handles WASM execution) to dev_wasi_nn (the Erlang WASI-NN interface) via import_resolver, the mechanism HyperBeam uses to call a device.
dev_wasi_nn calls the NIF (dev_wasi_nn_nif), bridging the gap between Erlang and C.
The NIF, in turn, invokes functions in the wasi_nn_llamacpp dynamic library, a C library that drives inference through the llama.cpp API.
Finally, llama.cpp leverages CUDA to perform the computation on the GPU and returns the output back to HyperBeam.
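The layer path above can be sketched as a chain of function calls. This is an illustrative Python mock, not the actual Erlang/C code: the function names mirror the real modules, but the bodies are placeholders that only demonstrate how a request descends through the layers and the answer bubbles back up.

```python
# Hypothetical end-to-end sketch of the layer path. Names mirror the
# real modules; the bodies are mocks, not the actual implementation.

def llama_cpp_cuda(prompt):
    # Bottom layer: llama.cpp runs deterministic inference on the GPU.
    return f"[gpu] answer to '{prompt}'"

def wasi_nn_llamacpp_backend(prompt):
    # C dynamic library implementing WASI-NN on top of llama.cpp.
    return llama_cpp_cuda(prompt)

def dev_wasi_nn_nif(prompt):
    # Erlang NIF bridging the BEAM to the C backend.
    return wasi_nn_llamacpp_backend(prompt)

def dev_wasi_nn(prompt):
    # Erlang WASI-NN device, reached from dev_wasm via import_resolver.
    return dev_wasi_nn_nif(prompt)

def dev_wasm(prompt):
    # Executes the aos-wasi-nn WASM module; its WASI-NN imports are
    # resolved to the dev_wasi_nn device.
    return dev_wasi_nn(prompt)

response = dev_wasm("what is arweave")
print(response)
```

Each layer only knows about the one directly beneath it, which is what lets the GPU backend be swapped or upgraded without touching the WASM module or the Erlang devices.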

This solution leverages several important technologies:
WASI-NN: A standard way for WebAssembly (WASM) programs to perform machine learning (ML) inference (running pre-trained models) using the host system's capabilities.
Erlang NIFs: Allow efficient interoperability between the BEAM (Erlang VM) and native code (C/C++).
Llama.cpp: Offers high-performance inference for large language models, crucially with GPU support.
The core components are available in the following repositories:
aos-wasi-nn: Integrates the aos system with WASI-NN. (Link: https://github.com/apuslabs/aos-wasi-nn)
HyperBEAM (dev_wasi_nn branch): Contains the GPU extension part, specifically the dev_wasi_nn Erlang module. (Link: https://github.com/apuslabs/HyperBEAM/tree/feat/dev_wasi_nn)
wasi-nn-backend: Implements the NIF layer that calls llama.cpp. (Link: https://github.com/apuslabs/wasi_nn_backend)
Background: HyperBeam Basics
If you are not yet familiar with the HyperBeam project or its device concepts, please refer to our previous tutorial article: https://mirror.xyz/0xE84A501212d68Ec386CAdAA91AF70D8dAF795C72/iLNw4j49tiB7XUE6EUayuFXUvdXnG-USTUBjK0HUH9c
Run the unit test in dev_wasi_nn_nif
wasi_nn_exec_test() ->
    Init = generate_wasi_nn_stack("test/wasi-nn.wasm", <<"lib_main">>, []),
    Instance = hb_private:get(<<"wasm/instance">>, Init, #{}),
    {ok, StateRes} = hb_converge:resolve(Init, <<"compute">>, #{}),
    [Ptr] = hb_converge:get(<<"results/wasm/output">>, StateRes),
    {ok, Output} = hb_beamr_io:read_string(Instance, Ptr),
    ?event({wasm_output, Output}),
    ?assertNotEqual(<<"">>, Output).
Running the test returns a result from the llama.cpp backend.
Let’s Test It - The Simple Way!
We are going to interact with the Qwen 2.5 7B (q2_k) model using the wasi-nn device on HyperBEAM!
1. Check out our HyperBEAM node:
https://8bb814239d968378-10000.jp-tyo-1.gpu-instance.novita.ai/
2. Run inference!
3. Change the prompt at the end of the URL “prompt=who%20are%20you”
Examples:
prompt=how%20do%20you%20work
prompt=what%20is%20arweave

4. Enjoy deterministic AI inference on HyperBEAM!
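Note that the prompt in the URL must be percent-encoded (spaces become %20, as in the examples above). Python's standard library can produce the encoded form for any prompt you want to try:

```python
# URL-encode a prompt for the ?prompt= query parameter.
from urllib.parse import quote

prompt = "who are you"
encoded = quote(prompt)
print(f"prompt={encoded}")  # prompt=who%20are%20you
```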
Apus Network's HyperBeam GPU device solution presents a well-architected approach to bringing GPU-accelerated AI inference to the AO ecosystem. By leveraging WebAssembly and efficient native-code integration via Erlang NIFs, they have created a pathway to harness the power of llama.cpp and CUDA within the HyperBeam environment. This opens up exciting possibilities for running demanding AI workloads directly within HyperBeam on AO.