Launch of Kaydka Af Soomaaliga (Somali Corpus)

The Somali Corpus is a grammatically annotated electronic corpus of Somali language and literature. It contains almost 3 million tagged Somali words; 1,100 parsed and indexed documents; over 52,000 headwords collected from major monolingual dictionaries; almost 6 million inflected forms generated by a rule-based morphological parser; and over 10,000 translations between Somali and four other languages (English, Italian, French and Swedish). This makes the Somali Corpus one of the largest corpora of African languages available as of today.

