- Android smartphone
- A phone with a Snapdragon 8-series or newer chipset is recommended.
Follow the instructions in Deploying llama.cpp on PC to acquire and quantize the model file ggml-model-Q4_K_M.gguf.
Transfer the quantized model file to the /sdcard/Download directory on your phone. Here, we provide a method using ADB (Android Debug Bridge), although other methods can also be used:
```bash
adb push ggml-model-Q4_K_M.gguf /sdcard/Download
```
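Optionally, confirm that the transfer completed before moving on; this sanity check uses only standard ADB commands:
```bash
adb shell ls -l /sdcard/Download/ggml-model-Q4_K_M.gguf
```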
Download and install an appropriate build of Termux on your phone; v0.118.1 is recommended.
Open the Termux application and run the following command to grant storage permissions to Termux:
```bash
termux-setup-storage
```
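The compile step below needs git and a C/C++ toolchain inside Termux. A minimal sketch of installing them with Termux's pkg package manager (exact package names may vary across Termux releases):
```bash
# update the package index, then install the build prerequisites
pkg update
pkg install clang git make
```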
Fetch the llama.cpp source code and compile it within Termux:
```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make llama-cli   # builds the llama-cli binary used below (older revisions built a binary named 'main' via 'make main')
```
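If the build succeeded, the binary sits in the repository root. A quick sanity check, assuming the llama-cli target above (recent llama.cpp builds print version and build information with --version):
```bash
./llama-cli --version
```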
Use the compiled llama-cli tool to perform inference:
```bash
./llama-cli -m /sdcard/Download/ggml-model-Q4_K_M.gguf --prompt "<User>Do you know openbmb?<AI>"
```
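Generation speed on a phone depends heavily on thread count and output length. A hedged variant of the same command, using the standard llama.cpp flags -t (threads) and -n (maximum tokens to generate); tune -t to your phone's performance-core count:
```bash
# example values only: 4 threads, cap the reply at 128 tokens
./llama-cli -m /sdcard/Download/ggml-model-Q4_K_M.gguf -t 4 -n 128 --prompt "<User>Do you know openbmb?<AI>"
```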
Now you can start performing inference using the MiniCPM model on your Android device!
Note that some of the commands above may need to be adjusted for your specific environment, such as the Termux version or other details.