Liquid AI has released a compact MoE model designed for consumer devices
LFM2.5-8B-A1B is a Mixture-of-Experts language model with 8 billion total and 1 billion active parameters, optimized for on-device inference on laptops and smartphones and for reasoning tasks.
This release expands the LFM2 series: the context window is now 128k tokens, the pre-training corpus has reached 38 trillion tokens, and large-scale RL has been applied. The tokenizer vocabulary has doubled to 128k entries, significantly improving tokenization efficiency for non-Latin scripts.
Liquid AI emphasizes speed and tool use: the model reaches up to 253 tokens/sec on an Apple M5 Max (using under 6 GB of RAM) and ~30 tokens/sec on a smartphone. In benchmarks, it performs comparably to models such as Gemma-4-26B.
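To put the "under 6 GB RAM" figure in context, here is a rough back-of-the-envelope sketch of weight memory at different precisions. It assumes that all 8 billion parameters must be resident in memory (as is typical for MoE models, where only compute scales with the active parameters) and that weights dominate the footprint. The bit widths are illustrative assumptions, not formats confirmed by Liquid AI, and the helper name is hypothetical.

```python
def weight_memory_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes).

    Ignores KV cache, activations, and runtime overhead.
    """
    return total_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 8e9  # from the announcement: 8 billion total parameters

# Illustrative precisions (assumed, not taken from the release notes).
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gb(TOTAL_PARAMS, bits):.1f} GB")
```

Under these assumptions, 16-bit weights would need about 16 GB and 8-bit weights about 8 GB, so a footprint below 6 GB suggests the reported figure was measured with a quantized build of the model at well under 8 bits per weight.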
Supported runtimes: llama.cpp, MLX, vLLM, SGLang, and ONNX Runtime.
📌 Licensing: LFM Open License
🟡 Blog Post
🟡 Documentation
🟡 Weights
