Speech and language AI for the Horn of Africa.
Tabotec Languages is a department of Tabotec dedicated to building fine-tuned AI for under-served languages. We launched in the Horn of Africa because the gap is enormous and because it's home — but the same pipeline is designed to travel.
Tigvoice
Tigrigna speech recognition and synthesis. Trained on a curated WhatsApp-and-broadcast corpus, tuned for everyday conversational speech and Eritrean/Tigrayan accents.
- Word-level timestamps and confidence
- Streaming and batch transcription
- Configurable voice IDs for TTS
Tabotec Tigrigna
An instruction-tuned LLM for Tigrigna. Reads and writes Ge'ez script natively, follows Tigrigna instructions, and degrades gracefully to English when needed.
- Open-weight base (Llama / Qwen)
- Self-hostable on a single GPU
- API and fine-tune-on-your-data options
Why this matters
The languages of ~150 million people are missing from frontier AI.
Major language initiatives (the Gates Foundation-backed African Next Voices, Mozilla's East Africa fund, Google WAXAL) collectively cover dozens of African languages. None of them ship Tigrigna. Most ship none of the long-tail Ethiopian languages at all.
That gap is where this program builds. We start with Tigrigna because it's the most conspicuously underserved Horn-of-Africa language, then move down the list.
Coverage roadmap
Languages in our pipeline
[Chart: languages in our pipeline. Filled = in active training; outlined = planned for the next 12 months.]
How the program works
A repeatable pipeline behind every language.
Each language we add goes through the same four steps. The boring parts — corpus curation, eval design, deployment — are where the actual quality lives.
1. Curate the data
Clean, deduplicated training manifests with confidence scoring and human review. No "throw the internet at it" — every row is accountable, with provenance.
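To show what "accountable" curation can look like in practice, here is a minimal Python sketch under stated assumptions. The `ManifestRow` fields, the 0.8 confidence and 0.9 script-share thresholds, and the rule that duplicates are exact matches after Unicode NFC and whitespace normalization are all hypothetical, not the program's actual schema. Rows that fail a check go to a human-review list instead of being silently dropped, and every row keeps its `source` tag for provenance.

```python
import hashlib
import unicodedata
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ManifestRow:
    audio_path: str
    text: str
    source: str        # provenance tag, e.g. "broadcast" or "whatsapp"
    confidence: float  # hypothetical 0..1 score from an upstream aligner


def normalize(text: str) -> str:
    # Unicode NFC plus whitespace collapsing, so trivially different copies match
    return unicodedata.normalize("NFC", " ".join(text.split()))


def ethiopic_ratio(text: str) -> float:
    # Share of letters that fall in the Ethiopic block (U+1200..U+137F)
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    return sum("\u1200" <= c <= "\u137f" for c in letters) / len(letters)


def curate(rows, min_conf=0.8, min_script=0.9):
    """Split rows into (kept, needs_review); exact duplicates are dropped."""
    kept, review, seen = [], [], set()
    # Visit highest-confidence rows first so the best copy of a duplicate wins
    for row in sorted(rows, key=lambda r: r.confidence, reverse=True):
        text = normalize(row.text)
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key in seen:
            continue
        seen.add(key)
        cleaned = replace(row, text=text)
        if row.confidence >= min_conf and ethiopic_ratio(text) >= min_script:
            kept.append(cleaned)
        else:
            review.append(cleaned)  # route to a human reviewer, not the bin
    return kept, review


rows = [
    ManifestRow("a.wav", "ሰላም  ከመይ ኣለኻ", "broadcast", 0.95),
    ManifestRow("b.wav", "ሰላም ከመይ ኣለኻ", "whatsapp", 0.70),  # duplicate of a.wav
    ManifestRow("c.wav", "hello there", "whatsapp", 0.99),    # wrong script
    ManifestRow("d.wav", "ጽቡቕ ምሸት", "broadcast", 0.60),      # low confidence
]
kept, review = curate(rows)
print([r.audio_path for r in kept], [r.audio_path for r in review])
```

Exact-hash deduplication only catches verbatim repeats; near-duplicate detection would be a natural extension but is outside this sketch.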
2. Fine-tune the right base
ASR on Whisper, TTS on Piper / VibeVoice, language models on Llama, Qwen, or your existing checkpoint. We pick what fits, not what's fashionable.
3. Evaluate honestly
Real-world eval suites in the target language — broadcast transcripts, WhatsApp speech, code-switched text. We publish numbers we can defend.
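Word error rate (WER) is the usual headline metric for ASR eval suites like these. The sketch below shows the standard definition (word-level edit distance divided by reference word count, pooled over the whole corpus rather than averaged per utterance) plus a character-level variant, which is often more telling for Ge'ez script because each glyph fuses a consonant and a vowel. It is a generic illustration, not Tabotec's evaluation harness: the function names are hypothetical, and stripping whitespace before computing CER is one convention among several.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution, or match at cost 0
            )
        prev = curr
    return prev[-1]


def error_rate(pairs, tokenize):
    """Total edits divided by total reference tokens over (ref, hyp) pairs."""
    edits = total = 0
    for ref, hyp in pairs:
        ref_toks = tokenize(ref)
        edits += edit_distance(ref_toks, tokenize(hyp))
        total += len(ref_toks)
    if total == 0:
        raise ValueError("references are empty")
    return edits / total


def corpus_wer(pairs):
    return error_rate(pairs, str.split)


def corpus_cer(pairs):
    # Characters with whitespace removed (an assumed convention)
    return error_rate(pairs, lambda s: list("".join(s.split())))


pairs = [("ሰላም ከመይ ኣለኻ", "ሰላም ከመይ ኣለኺ")]  # one wrong vowel in the last word
print(corpus_wer(pairs), corpus_cer(pairs))
```

Here a single wrong vowel costs a full word (1 of 3) in WER but only one glyph (1 of 9) in CER, which is why reporting both gives a fairer picture for Ge'ez-script languages.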
Pilot access
Be first to use Tigvoice and Tabotec Tigrigna.
We are onboarding pilot users from the Horn of Africa diaspora, as well as NGOs, broadcasters, and translation teams. Leave your email and we'll reach out when access opens.
