
Tabotec Languages

Speech and language AI for the Horn of Africa.

Tabotec Languages is a department of Tabotec dedicated to building fine-tuned AI for under-served languages. We launched in the Horn of Africa because the gap is enormous and because it's home — but the same pipeline is designed to travel.

Tigrigna ASR & TTS in pilot · Llama / Whisper / Piper based · Self-hostable on your hardware

Why this matters

The languages of ~150 million people are missing from frontier AI.

Major language initiatives — Gates' African Next Voices, Mozilla's East Africa fund, Google WAXAL — collectively cover dozens of African languages. None of them ship Tigrigna. Most ship none of the long-tail Ethiopian languages at all.

That gap is where this program builds. We start with Tigrigna because it's the most conspicuously underserved Horn-of-Africa language, then move down the list.

Coverage roadmap

Languages in our pipeline

Tigrigna · Amharic · Afaan Oromoo · Somali · Afar · Sidaama · Wolaytta · Hadiyya

Each language on the list is either in active training or planned for the next 12 months.

How the program works

A repeatable pipeline behind every language.

Each language we add goes through the same pipeline. The boring parts (corpus curation, eval design, deployment) are where the actual quality lives.

1. Curate the data

Clean, deduplicated training manifests with confidence scoring and human review. No "throw the internet at it" — every row is accountable, with provenance.
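To make this step concrete, here is a small manifest filter. It is a minimal sketch, not Tabotec's actual pipeline: the `ManifestRow` schema, the `curate` helper, and the 0.8 `min_confidence` threshold are all hypothetical. Deduplication here is exact matching on normalized transcripts, which is a simplification of what a production curation step would do.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class ManifestRow:
    # Hypothetical schema: one utterance per row.
    audio_path: str
    text: str
    source: str        # provenance tag; empty means "unknown origin"
    confidence: float  # 0..1 score from an upstream scorer (assumed to exist)


def normalize(text):
    """Dedup key: Unicode NFC plus collapsed whitespace."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def curate(rows, min_confidence=0.8):
    """Split rows into (accepted, needs_review, rejected) lists."""
    accepted, needs_review, rejected = [], [], []
    seen = set()
    for row in rows:
        key = normalize(row.text)
        if not row.source or not key:
            # No provenance or empty transcript: not accountable, drop it.
            rejected.append(row)
        elif key in seen:
            # Exact duplicate after normalization: keep only the first copy.
            rejected.append(row)
        else:
            seen.add(key)
            if row.confidence >= min_confidence:
                accepted.append(row)
            else:
                # Low confidence goes to a human, not straight into training.
                needs_review.append(row)
    return accepted, needs_review, rejected


rows = [
    ManifestRow("a.wav", "ሰላም ዓለም", "radio", 0.95),
    ManifestRow("b.wav", "ሰላም  ዓለም", "radio", 0.91),  # duplicate once whitespace is collapsed
    ManifestRow("c.wav", "ከመይ ኣለኻ", "whatsapp", 0.55),  # low confidence
    ManifestRow("d.wav", "ጽቡቕ", "", 0.99),  # no provenance
]
accepted, needs_review, rejected = curate(rows)
print(len(accepted), len(needs_review), len(rejected))  # 1 1 2
```

Sending low-confidence rows to a review queue instead of silently discarding them keeps human review in the loop while the training manifest stays clean.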

2. Fine-tune the right base

ASR on Whisper, TTS on Piper / VibeVoice, and language models on Llama, Qwen, or your existing checkpoint. We pick what fits, not what's fashionable.

3. Evaluate honestly

Real-world eval suites in the target language — broadcast transcripts, WhatsApp speech, code-switched text. We publish numbers we can defend.
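For ASR, one common way to turn an eval suite into defensible numbers is word error rate (WER): the word-level edit distance between reference and hypothesis, divided by the number of reference words, computed separately for each suite. The page does not say which metrics Tabotec reports, so treat this as a hedged sketch. The suite names and the toy Tigrigna strings are illustrative only.

```python
def word_errors(reference, hypothesis):
    """Word-level Levenshtein distance: substitutions + deletions + insertions."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,         # insertion: extra hypothesis word
                prev[j - 1] + (r != h),  # substitution, or a free match
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs):
    """Corpus WER = total word errors / total reference words."""
    errors = sum(word_errors(ref, hyp) for ref, hyp in pairs)
    words = sum(len(ref.split()) for ref, _ in pairs)
    if words == 0:
        raise ValueError("eval suite has no reference words")
    return errors / words


# Hypothetical suites: (reference transcript, model output) pairs.
suites = {
    "broadcast": [("ሰላም ዓለም", "ሰላም ዓለም")],
    "whatsapp": [("ከመይ ኣለኻ ሎሚ", "ከመይ ኣለኺ")],  # one substitution, one deletion
}
report = {name: corpus_wer(pairs) for name, pairs in suites.items()}
for name, wer in report.items():
    print(name, round(wer, 3))  # broadcast 0.0, then whatsapp 0.667
```

Reporting WER per suite, rather than as one pooled figure, keeps a strong broadcast score from hiding weak performance on noisier WhatsApp or code-switched speech.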

Pilot access

Be among the first to use Tigvoice and Tabotec Tigrigna.

We are onboarding pilot users in the Horn of Africa diaspora, NGOs, broadcasters, and translation teams. Leave your email and we'll reach out when access opens.