
Running LLMs on Android

Hello World

I wanted to have an LLM on my phone, because why not?

It’s like a mini-library on your phone that you can talk to, which is very useful when you’re offline somewhere in the woods.

To start, I installed Termux on my phone.

Termux

Note that the version on the Google Play Store is outdated, so you will need to get it from the superior open-source store that is F-Droid, or you can download the Termux APK directly from the official GitHub repo.

Termux was how I learnt Linux commands. I didn’t even realize they were Linux commands, as I had never used Linux before and didn’t have a laptop when I started screwing around on my phone (I know Android is technically Linux, but it’s not the same).

Termux with light ricing

The first screenshot is what you will see when you first open it. I installed zsh and Oh My Zsh with the Powerlevel10k theme, plus other features such as auto-suggestions and auto-completion. Feel free to ask me for a tutorial on that and I’d drop one.
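In the meantime, a rough sketch of the basics, using Oh My Zsh’s official install script (Powerlevel10k and the plugins deserve their own post):

pkg install zsh curl

sh -c "$(curl -fsSL https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh)"

chsh -s zsh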

https://github.com/ggerganov/llama.cpp.git

I’m gonna run the LLMs using llama.cpp, considering how lightweight and easy it is to set up. Models are also far less complicated to use: the GGUF format combines all the model files into one package, and pretty much every model you can think of has pre-quantized GGUF files ready on Hugging Face.

We have to install some packages on Termux to get going.

For that, we first need to set up the repos.

Run:

termux-change-repo


I selected the rotating mirror group and all mirrors, so that any package I’m after is likely to be in at least one of those repos.

pkg update && pkg upgrade

Run this to update all the package lists and upgrade the existing packages.

Then install the packages we need:

pkg install wget git clang cmake

wget: to download model files

git: to work with the llama.cpp repo

clang: to compile the code into binaries

cmake: an additional build tool used in the compile process
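If you want to sanity-check that everything landed, ask each tool for its version:

clang --version && cmake --version && git --version && wget --version | head -1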

Voilà! Your Termux setup is ready.

Clone the llama.cpp repo

git clone https://github.com/ggerganov/llama.cpp.git

Now change your directory into the folder:

cd llama.cpp

and compile the binaries

make -j $(nproc)
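A heads-up: newer llama.cpp checkouts have dropped the Makefile in favor of CMake, so if make complains, try the CMake route instead (the binaries then land in build/bin/ rather than the repo root):

cmake -B build

cmake --build build --config Release -j $(nproc)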

Your binaries are ready. Now we need to get the model; I’m using the Qwen2 1.5B parameter model:

https://huggingface.co/Qwen/Qwen2-1.5B-Instruct-GGUF/tree/main

Make a folder for it inside the models folder:

cd models

mkdir qwen2

cd qwen2

Copy the link address from the download button of the quantized version you want and run wget on it.

Make sure to remove the ?download=true from the end, or it will save the file as your-abc-model.gguf?download=true; it should have .gguf as the extension.

wget https://huggingface.co/Qwen/Qwen2-1.5B-Instruct-GGUF/resolve/main/qwen2-1_5b-instruct-q4_k_m.gguf
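If you’d rather not edit the URL by hand, wget’s -O flag lets you name the output file yourself, ?download=true and all:

wget "https://huggingface.co/Qwen/Qwen2-1.5B-Instruct-GGUF/resolve/main/qwen2-1_5b-instruct-q4_k_m.gguf?download=true" -O qwen2-1_5b-instruct-q4_k_m.gguf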

You can get the lower-precision models, which are smaller in size, but you will trade output quality for performance; anything below Q4 is not recommended.

Now cd back to the root of the repo:

cd ../..

and run the following command

./llama-cli -m ./models/qwen2/qwen2-1_5b-instruct-q4_k_m.gguf -n 256 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt
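On recent builds, llama-cli also has a built-in conversation mode (-cnv) that skips the prompt file entirely, if you just want to chat:

./llama-cli -m ./models/qwen2/qwen2-1_5b-instruct-q4_k_m.gguf -cnv --color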

Questioning it about Rome

Performance Summary

There you go! You have your own ChatGPT/Claude/Gemini that works completely offline!
