v1.5.0 · System prompts + 28 languages

Run large language models
on your phone.

No cloud, no internet, no tracking.

MIT — free & open source · 25+ supported models · 0 network calls at inference
Gemma 3n E2B
3.41 GB
What is machine learning?
Thinking...

Machine learning is a field of computer science where systems are taught to learn patterns and make decisions from data, without being explicitly programmed for every specific task.

The goal is to enable computers to predict outcomes or classify new, unseen data. For example, if you show it thousands of pictures labeled "cat" and "dog," the machine learns the features that define a cat versus a dog.

120 tokens · 1.5s · 80 tok/s
Message Send
Runs open-weights from
Qwen Gemma Meta Llama Nemotron Phi-4 Granite Mistral DeepSeek Liquid AI
Why on-device

Private by construction. Fast by design.

The model weights live on your phone. Prompts, drafts and answers never touch a server — there isn't one to touch.

01

Fully offline

Airplane mode? Subway? No signal at all? No difference. Inference happens on your SoC, never on someone else's server.

02

Your roles, saved

Save reusable system prompts once and pick the right one for any model. Keep tone, role and output format consistent across every session.

03

Tune every knob

Temperature, Top-P, Top-K, Min-P, repetition penalty, seed, context size — each model remembers its own settings.
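For the curious, the first two of those knobs can be sketched in a few lines. This is a hypothetical illustration of what temperature and Top-K do to a token distribution — the function and names are not the app's actual internals:

```kotlin
import kotlin.math.exp

// Illustrative sketch: how temperature and Top-K reshape a token
// distribution before sampling. Hypothetical helper, not app code.
fun sampleDistribution(logits: DoubleArray, temperature: Double, topK: Int): DoubleArray {
    // Temperature divides the logits before softmax:
    // values < 1 sharpen the distribution, values > 1 flatten it.
    val scaled = logits.map { it / temperature }

    // Top-K keeps only the K most likely tokens and masks out the rest.
    val cutoff = scaled.sortedDescending()[minOf(topK, scaled.size) - 1]
    val masked = scaled.map { if (it >= cutoff) it else Double.NEGATIVE_INFINITY }

    // Softmax over the survivors (max-subtraction for numerical stability).
    val maxLogit = masked.maxOrNull()!!
    val exps = masked.map { exp(it - maxLogit) }
    val total = exps.sum()
    return exps.map { it / total }.toDoubleArray()
}
```

With `topK = 2`, a three-token vocabulary collapses to two candidates; the token actually emitted is then drawn from this distribution using the configured seed.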

04

Save anywhere

Models are multi-gigabyte. Download them to any folder — internal storage, SD card, or an external drive. Move them between locations any time.

Model library · 25+

Pick a brain. Tap download. Chat.

Browse community-made models optimized for mobile — compact enough to fit on your phone, powerful enough to be useful.

0 pkts/s
Zero network traffic during inference. After the model is downloaded, you can pull the SIM, turn off Wi-Fi, and keep chatting. There is no telemetry, no analytics, no "helpful" background sync.
[Chart: bytes sent to server stays at zero as tokens generated grows from 100 to 10M.]
How it works

Three taps from app open to first token.

STEP 01

Pick a model

Browse the curated list or paste your own model URL. Sizes range from 267 MB to 5.4 GB.

Qwen 3 1.7B
Gemma 3 1B · 806 MB
Llama 3.2 3B · 2.02 GB
Phi-4 mini · 2.49 GB
DeepSeek R1 1.5B · 1.12 GB
STEP 02

Download and resume

Reliable background downloads with progress notifications — speed, ETA, and automatic resume when the connection drops.

Qwen 3 1.7B · 73%
12.4 MB/s · ETA 0:28
Wi-Fi · Home · Resuming...
→ /storage/emulated/0/LMPlayground/
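Resume-on-reconnect comes down to one HTTP header: ask the server only for the bytes you don't have yet. A minimal sketch with hypothetical helper names (not the app's actual downloader), assuming the server supports range requests:

```kotlin
import java.io.File
import java.io.FileOutputStream
import java.net.HttpURLConnection
import java.net.URL

// Build the Range header for a partial download: if we already have N bytes
// on disk, ask the server for everything from byte N onward.
fun rangeHeader(bytesOnDisk: Long): String? =
    if (bytesOnDisk > 0) "bytes=$bytesOnDisk-" else null

// Hypothetical resumable download — appends to the partial file instead of
// starting over. Real code would also verify the 206 Partial Content status.
fun downloadWithResume(url: String, dest: File) {
    val conn = URL(url).openConnection() as HttpURLConnection
    rangeHeader(dest.length())?.let { conn.setRequestProperty("Range", it) }
    conn.inputStream.use { input ->
        FileOutputStream(dest, /* append = */ true).use { input.copyTo(it) }
    }
}
```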
STEP 03

Chat locally

Your conversation history stays on your device. Reasoning is shown inline. No accounts, no API keys.

Summarize this email.
The sender is moving Thursday's sync to Friday 10am PT — confirm or propose another time.
● on-device · 38 tok/s
Languages · 28

Speaks your language.
Right out of the box.

The whole app is translated into 28 languages. On first launch, we'll pick a model that speaks yours.

28 locales
Spotted a wrong translation? Please open a PR on GitHub.
Under the hood

Engineered for phones,
not data centers.

Built on llama.cpp with GGUF quantized models. Native C++ inference with ARM-optimized kernels means less heat, more tokens per second, and no unexpected background drain.

  • Inference engine: llama.cpp
  • Model format: GGUF · Q4_K_M
  • Kernels: KleidiAI + OpenMP
  • Min Android: API 30 (Android 11)
  • Architecture: arm64-v8a
  • UI: Jetpack Compose · Material 3
  • License: MIT
// LlamaCpp.kt — Kotlin ↔ C++ bridge to llama.cpp
package com.druk.llamacpp

class LlamaCpp {
    companion object {
        init {
            System.loadLibrary("llamacpp")
        }
    }

    external fun init(): Int
    external fun systemInfo(): String
    external fun loadModel(
        path: String,
        progressCallback: LlamaProgressCallback,
    ): LlamaModel
    external fun probeModelMetadata(path: String): Array<String>?
}
// no network, no telemetry, no keys.
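As a taste of what sits behind that bridge: every GGUF file opens with the ASCII magic `GGUF` followed by a little-endian version number, so a quick sanity check is possible before handing a path to native code. A hypothetical helper (the app's real metadata parsing happens in C++):

```kotlin
import java.io.File
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Check the 8-byte GGUF header: ASCII magic "GGUF" + little-endian u32 version.
// Hypothetical pre-flight check; real parsing is done in native code.
fun looksLikeGguf(file: File): Boolean {
    if (file.length() < 8) return false
    val header = ByteArray(8)
    file.inputStream().use { if (it.read(header) != 8) return false }
    val magic = String(header, 0, 4, Charsets.US_ASCII)
    val version = ByteBuffer.wrap(header, 4, 4).order(ByteOrder.LITTLE_ENDIAN).int
    return magic == "GGUF" && version >= 1
}
```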
FAQ

Questions you had
before you asked.

Still stuck? Open an issue on GitHub or reach out directly.

What is LM Playground?
LM Playground lets you run large language models directly on your Android device. All processing happens locally — no cloud servers needed.
What models are supported?
The app supports models in GGUF format with Q4_K_M quantization, including Qwen 3, Gemma 3, Llama 3.2, Phi‑4 mini, and DeepSeek R1 Distill in various sizes.
How much storage space do I need?
Model sizes vary from a few hundred MB (small models like Qwen 3 0.6B) to several GB (larger models like DeepSeek R1 7B). Make sure you have enough free space before downloading.
Does the app require an internet connection?
Internet is only needed to download models. Once downloaded, models run completely offline on your device.
Is my data private?
Yes. All conversations are processed locally on your device. No data is sent to external servers.
Why is model loading slow?
Larger models take more time to load into memory. Loading times depend on your device's hardware. Once loaded, the model stays in memory until you unload it.
Which devices work best?
Devices with more RAM can run larger models. For best performance, use a device with at least 6 GB of RAM for small models and 8+ GB for larger ones.
Can I load a custom GGUF model?
Yes. Place your .gguf file in the storage folder selected in Settings → Models (the same folder used for downloads). The app will pick it up automatically and show it in the model selector alongside the built-in catalog. Chat template and tokenizer settings are read from the GGUF metadata. If a specific model doesn't work, please open an issue on GitHub.
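The pickup described above amounts to a directory scan. A sketch with a hypothetical function name (the real app also reads each file's metadata):

```kotlin
import java.io.File

// List model names for every .gguf file in the chosen models folder,
// matching the extension case-insensitively and sorting for stable order.
// Hypothetical helper mirroring the behavior described in the FAQ.
fun discoverLocalModels(modelsDir: File): List<String> =
    modelsDir.listFiles { f -> f.isFile && f.extension.equals("gguf", ignoreCase = true) }
        ?.map { it.nameWithoutExtension }
        ?.sorted()
        ?: emptyList()
```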
Can I change where models are stored?
Yes. Go to Settings, then Models, and use the "Change Folder" option to select a different storage location.
How do I delete a model?
Go to Settings, then Models. In the "Downloaded" section, tap the delete icon next to the model you want to remove.

Any model in your pocket.

Seconds to install. Minutes to download a model. Then you're done with the cloud.
