
Welcome to Veyra AI
Tiny English language models built for fast local inference.
Veyra AI focuses on compact, CPU-friendly language models that are easy to run, fine-tune, and experiment with. Our work centers on small English models and the techniques around them: function calling, Python-oriented variants, distillation, RLVR (reinforcement learning with verifiable rewards), tool use, and local AI.
The goal is simple: make capable small models that are practical for local workflows, research, and lightweight deployment.
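As a back-of-envelope illustration of why models at this scale suit local, CPU-only workflows, the weight footprint can be estimated from parameter count and numeric precision. This is a rough sketch only: it covers weight storage and ignores activations, KV cache, and runtime overhead, and the byte sizes shown are generic fp16/int8 figures, not a statement about how Veyra models are actually packaged.

```python
def approx_memory_mb(n_params: int, bytes_per_param: float) -> float:
    """Rough weight-storage estimate: parameter count times bytes per parameter."""
    return n_params * bytes_per_param / (1024 ** 2)

# Parameter counts from the model lineup above; precisions are illustrative.
for n in (15_000_000, 30_000_000):
    for label, nbytes in (("fp16", 2), ("int8", 1)):
        print(f"{n // 1_000_000}M @ {label}: ~{approx_memory_mb(n, nbytes):.0f} MB")
# e.g. a 15M-parameter model at fp16 is roughly 29 MB of weights,
# and a 30M-parameter model at int8 is roughly the same size.
```

At these sizes the entire model fits comfortably in CPU cache-adjacent RAM, which is what makes low-latency inference on edge devices plausible.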
Current Model Families:
- Veyra2 30M — Next-generation 30M-parameter model optimized for low-latency inference and on-device deployment, delivering excellent speed and efficiency.
- Veyra2 15M — Ultra-lightweight 15M-parameter model built for highly resource-constrained environments. Ideal for edge devices and ultra-fast inference.
- Veyra 30M (Legacy) — Proven 30M-parameter base model with strong instruction-following and balanced general capabilities. Still reliable for many use cases.
Planned Model Families:
- Veyra2 80M — Fast and efficient 80M-parameter model that brings significantly improved reasoning, coherence, and instruction adherence while maintaining excellent speed and low resource usage.
- Veyra SmolLM2 135M — Compact yet highly capable 135M-parameter model. A custom instruction-tuned version of SmolLM2 135M, offering strong performance in a small footprint.
- Kairo 30M/50M — Experimental architecture designed to validate next-generation design choices for the entire Veyra lineup. Initial Kairo models are already available.