Building an AI app on Android with TFLite and MediaPipe in 2026
A practical walkthrough of the Android on-device AI stack: Gemma, TFLite, MediaPipe LLM Inference, and how to make it feel as fast as the iOS version.

iOS gets most of the on-device AI press, but the Android story in 2026 is genuinely good — and in some ways more open. You have real choices about which runtime, which model, and how deep to go into NNAPI vs. vendor delegates.
The stack I landed on: Gemma 2B (4-bit) as the default model, MediaPipe's LLM Inference API as the high-level wrapper, TFLite with the GPU + NNAPI delegates underneath, and a Jetpack Compose UI that streams tokens through a Kotlin `Flow`.
Why MediaPipe over raw TFLite. MediaPipe gives you a sane streaming interface, KV-cache management, and tokenizer integration out of the box. Going straight to TFLite is fine if you need a non-LLM model (vision, audio), but for a chat loop it's a lot of plumbing for no real win.
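To make that concrete, here is a minimal sketch of the wrapper setup, assuming the MediaPipe `tasks-genai` artifact and a model file already on disk; the option values and `modelPath` are illustrative placeholders, not the app's actual config:

```kotlin
import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Sketch: MediaPipe handles tokenization, KV cache, and streaming;
// you just wire the result listener into your UI layer.
fun createEngine(context: Context, modelPath: String): LlmInference {
    val options = LlmInference.LlmInferenceOptions.builder()
        .setModelPath(modelPath)      // e.g. a 4-bit Gemma 2B file in filesDir
        .setMaxTokens(1024)           // shared prompt + response budget
        .setTopK(40)
        .setTemperature(0.8f)
        .setResultListener { partial, done ->
            // invoked repeatedly as tokens decode; forward into a Flow here
        }
        .build()
    return LlmInference.createFromOptions(context, options)
}

// Kick off generation; the listener above receives the stream:
// engine.generateResponseAsync(prompt)
```

The synchronous `generateResponse` variant exists too, but for a chat UI the async listener is the whole point.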
Delegate selection is device-specific. On Pixel 8/9, the Google Tensor TPU path wins. On flagship Snapdragon, Qualcomm's QNN (AI Engine Direct) delegate targeting the Hexagon NPU is dramatically faster than NNAPI — which is deprecated as of Android 15 anyway, so treat it as a fallback rather than a target. On older mid-range hardware, the GPU delegate is usually your best bet. I ship a small benchmark on first launch and pick the winner.
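The first-launch benchmark can be as simple as timing a fixed workload under each candidate and keeping the fastest. A sketch of the selection logic — the `Delegate` enum is a hypothetical label; in the real app each entry would wrap a TFLite `Interpreter` built with the corresponding delegate attached:

```kotlin
import kotlin.system.measureNanoTime

// Hypothetical labels standing in for interpreters configured
// with each candidate delegate.
enum class Delegate { GPU, NNAPI, QNN }

// Run each candidate once untimed to absorb delegate init and kernel
// compilation, then average a few timed runs and keep the fastest.
fun pickFastest(candidates: Map<Delegate, () -> Unit>, runs: Int = 5): Delegate =
    candidates.minByOrNull { (_, infer) ->
        infer() // warm-up
        (1..runs).sumOf { measureNanoTime { infer() } } / runs
    }!!.key
```

Persist the winner after the first run — re-benchmarking on every launch wastes battery for an answer that only changes on an OS or driver update.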
The packaging problem. A 1.5GB model in your APK is a non-starter on Play Store install caps. The fix is Play Asset Delivery with `on-demand` mode — the app installs at ~30MB and pulls the model on first run, with a clean progress UI.
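A sketch of the first-run fetch, assuming the Play Asset Delivery `assetpacks` library; the pack name `gemma_pack` and the callbacks are placeholders:

```kotlin
import android.content.Context
import com.google.android.play.core.assetpacks.AssetPackManagerFactory
import com.google.android.play.core.assetpacks.model.AssetPackStatus

fun fetchModel(context: Context, onProgress: (Int) -> Unit, onReady: (String) -> Unit) {
    val manager = AssetPackManagerFactory.getInstance(context)
    manager.registerListener { state ->
        when (state.status()) {
            AssetPackStatus.DOWNLOADING ->
                onProgress(percent(state.bytesDownloaded(), state.totalBytesToDownload()))
            AssetPackStatus.COMPLETED ->
                manager.getPackLocation("gemma_pack")?.assetsPath()?.let(onReady)
        }
    }
    manager.fetch(listOf("gemma_pack"))
}

// Pure helper for the progress UI: avoid divide-by-zero before the
// total size arrives, and clamp to 0..100.
fun percent(downloaded: Long, total: Long): Int =
    if (total <= 0) 0 else ((downloaded * 100) / total).toInt().coerceIn(0, 100)
```

For packs over the cellular-download threshold you also need to handle `WAITING_FOR_WIFI` and prompt the user; that state arrives through the same listener.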
Compose makes streaming easy. A `StateFlow<List<Token>>` from the inference module, collected in a `LazyColumn` whose items use `Modifier.animateItem()` (the successor to the deprecated `animateItemPlacement`), and you get smooth token-by-token rendering in maybe 30 lines of UI code. This is one of those moments where the modern Android stack genuinely earns its keep.
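The inference-side half of that pipeline is only a few lines. A sketch, assuming `kotlinx.coroutines` on the classpath; `ChatViewModel` and `onToken` are hypothetical names for wherever the result listener lands:

```kotlin
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow

class ChatViewModel {
    private val _tokens = MutableStateFlow<List<String>>(emptyList())
    val tokens: StateFlow<List<String>> = _tokens

    // Called from the inference result listener for each partial result.
    // Emitting a fresh list (rather than mutating one in place) is what
    // makes the StateFlow emit and the collecting UI recompose.
    fun onToken(token: String) {
        _tokens.value = _tokens.value + token
    }
}
```

On the Compose side, `vm.tokens.collectAsState()` plus an `items(tokens)` block inside the `LazyColumn` closes the loop.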
Don't skip the foreground service. If you let the user background the app mid-generation, Android will happily kill your inference process. Wrapping the long-running call in a foreground service with a 'thinking…' notification is mandatory, not optional.
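A sketch of the service shell, assuming API 34+ where a foreground service type must be declared in both the manifest and the `startForeground` call; the channel id, notification text, and choice of `SPECIAL_USE` type (which requires a Play policy declaration) are placeholders:

```kotlin
import android.app.Notification
import android.app.Service
import android.content.Intent
import android.content.pm.ServiceInfo
import android.os.IBinder
import androidx.core.app.NotificationCompat
import androidx.core.app.ServiceCompat

class InferenceService : Service() {
    override fun onStartCommand(intent: Intent?, flags: Int, startId: Int): Int {
        val notification: Notification = NotificationCompat.Builder(this, "inference")
            .setContentTitle("thinking…")
            .setSmallIcon(android.R.drawable.stat_notify_sync)
            .setOngoing(true)
            .build()
        // Promote to foreground BEFORE starting generation, so the process
        // survives the user backgrounding the activity mid-response.
        ServiceCompat.startForeground(
            this, 1, notification,
            ServiceInfo.FOREGROUND_SERVICE_TYPE_SPECIAL_USE
        )
        // launch generation here; call stopSelf() when the response completes
        return START_NOT_STICKY
    }

    override fun onBind(intent: Intent?): IBinder? = null
}
```

Stop the service as soon as generation finishes; a foreground notification that lingers after the answer has rendered reads as a bug to users.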