# Running LLMs on Android

By [wings](https://paragraph.com/@wings-3) · 2024-07-02

---

Hello World

I wanted to have an LLM on my phone, cause why not

it’s like a mini-library on your phone that you can talk to, which is very useful when you’re offline somewhere in the woods

To start I installed [Termux](https://termux.dev/en/) on my phone

![Termux](https://storage.googleapis.com/papyrus_images/4666f0e4bec2bf3863c7579a17636cfa57f5565d6751b7d6b689965964350462.png)

Termux

Note that the version on the Google Play Store is outdated, so you will need to get it from the superior open-source store that is [F-Droid](https://f-droid.org/en/), or you can download the Termux APK directly from the official [GitHub repo](https://github.com/termux/termux-app/releases/)

Termux was how I learnt Linux commands. I didn’t even realize they were Linux commands, since I had never used Linux before and didn’t have a laptop when I started screwing around on my phone (I know Android is technically Linux, but it’s not the same)

![Termux with light ricing](https://storage.googleapis.com/papyrus_images/b077233543dee7cb1e81adbc2edd05d4662c355dd6bdc68c7de79a037b5afd6d.png)

Termux with light ricing

The first screenshot is what you will see when you first open it. I installed zsh and Oh My Zsh with the Powerlevel10k theme, plus extras like auto-suggestions and auto-completion. Feel free to ask me for a tutorial on that and I’ll drop one


I’m gonna run the LLMs using [llama.cpp](https://github.com/ggerganov/llama.cpp), considering how lightweight and easy it is to set up. Models are also much less complicated to use: the [gguf](https://huggingface.co/docs/hub/en/gguf) format combines all the model files into one package, and pretty much every model you can think of has pre-quantized GGUF files ready on Hugging Face

we have to install some packages on Termux to be able to get going

for that we need to first setup the repos

run

`termux-change-repo`

![](https://storage.googleapis.com/papyrus_images/0c26e3e5b1c4b623c2794427d66dc030dff698e0c50d113fdb4c708f76168560.png)

and I selected `group rotate` and `all mirrors` so that any package I need would be available in at least one of the mirrors

`pkg update && pkg upgrade`

ran this to update all the package lists and upgrade existing packages

then installed the packages we need

`pkg install wget git clang cmake`

_wget: to download model files_

_git: to work with the llama.cpp repo_

_clang: to compile the code into binaries_

_cmake: additional tool to be used in the compile process_
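As a quick sanity check (just a sketch I like to run, not something the setup strictly requires), you can confirm each of those tools actually landed on your PATH before moving on:

```shell
# Check that each build tool from `pkg install` is on PATH
have() { command -v "$1" >/dev/null 2>&1 && echo "ok" || echo "missing"; }

for tool in wget git clang cmake; do
  echo "$tool: $(have "$tool")"
done
```

If anything comes back as missing, just re-run the `pkg install` line above.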

voila! your Termux setup is ready

Clone the llama.cpp repo

`git clone https://github.com/ggerganov/llama.cpp.git`

now change your directory into the folder

`cd llama.cpp`

and compile the binaries

`make -j $(nproc)`
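Heads up: newer llama.cpp checkouts have been moving from the Makefile to CMake, so if plain `make` ever fails for you, something along these lines should work instead (a sketch, not tested on every device; note the binaries end up in a subfolder):

```shell
# Alternative build with CMake for newer llama.cpp versions
cmake -B build
cmake --build build --config Release -j "$(nproc)"
# binaries land under build/bin/ instead of the repo root
```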

your bins are ready. Now we need to get the model; I’m using the Qwen2 1.5B-parameter model

[https://huggingface.co/Qwen/Qwen2-1.5B-Instruct-GGUF/tree/main](https://huggingface.co/Qwen/Qwen2-1.5B-Instruct-GGUF/tree/main)

make a folder for it inside the models folder

`cd models`

`mkdir qwen2`

`cd qwen2`

copy the link address from the download button of the quantized version you want and run

_make sure to remove the_ `?download=true` _from the end or it will save the file as_ `your-abc-model.gguf?download=true` _; it should have .gguf as the extension_

`wget https://huggingface.co/Qwen/Qwen2-1.5B-Instruct-GGUF/resolve/main/qwen2-1_5b-instruct-q4_k_m.gguf`
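If you’re pasting the link on the phone itself and don’t want to edit it by hand, a tiny sketch like this (just plain shell parameter expansion, nothing llama.cpp-specific) strips the suffix for you:

```shell
# Strip the "?download=true" suffix from a copied Hugging Face link
url='https://huggingface.co/Qwen/Qwen2-1.5B-Instruct-GGUF/resolve/main/qwen2-1_5b-instruct-q4_k_m.gguf?download=true'
clean_url="${url%%\?*}"   # drop everything from the first '?' onwards
echo "$clean_url"
# wget "$clean_url"       # then download as usual
```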

you can get the lower-precision models, which are smaller in size and faster to run, but you trade away output quality; anything below q4 is not recommended
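To get a feel for the size trade-off, here’s a rough back-of-envelope I use (my own approximation; real GGUF files also carry some overhead, so actual sizes run a bit higher): file size ≈ parameter count × bits per weight ÷ 8.

```shell
# Rough size estimate for a 1.5B-parameter model at different quant levels
params=1500000000
for bits in 4 8 16; do
  mb=$(( params * bits / 8 / 1000000 ))
  echo "${bits}-bit: ~${mb} MB"
done
```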

now cd back to the root of the repo

`cd ../..`

and run the following command

`./llama-cli -m ./models/qwen2/qwen2-1_5b-instruct-q4_k_m.gguf -n 256 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt`

![Questioning it about Rome](https://storage.googleapis.com/papyrus_images/07a2b941f644c0633cdbaef590e4c193770666526760928c0650502448c74a37.png)

Questioning it about Rome

![Performance Summary](https://storage.googleapis.com/papyrus_images/e82fdeb5bc829a8aebdd18d34e466953f3d4f852597f83284eb253802ce3b13c.png)

Performance Summary

There you go, you have your own ChatGPT/Claude/Gemini that works completely offline!
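And if you’d rather chat through a web page than the terminal, the same build also produces `llama-server` (a sketch; flags can differ between llama.cpp versions):

```shell
# Serve the model over HTTP with llama.cpp's built-in web UI
./llama-server -m ./models/qwen2/qwen2-1_5b-instruct-q4_k_m.gguf --port 8080
# then open http://127.0.0.1:8080 in your phone's browser
```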

![https://shkspr.mobi/blog/2015/02/overlapping-animated-gifs/](https://storage.googleapis.com/papyrus_images/bdff84f8d50d7e2e341fc1aff8c3198906bf72a0fa125f0fd2f6009ebbeeda27.gif)

https://shkspr.mobi/blog/2015/02/overlapping-animated-gifs/

---

*Originally published on [wings](https://paragraph.com/@wings-3/running-llms-on-android)*
