Volume 58, No. 2

Corpora and Corpus Tools for Indigenous African Languages: The Case of an IsiZulu-English Spoken Word Code-Switching Corpus

Gugulethu Mazibuko and Hloniphani Ndebele


This study reports on the analysis of a specialized isiZulu-English code-switching corpus collected from isiZulu speakers at Inanda, Ntuzuma and KwaMashu in Durban. We argue that corpus development and analysis with corpus tools is vital to the development of indigenous African languages in South Africa. We further argue that spoken word corpora are significant because they provide a clear picture of language use in its natural setting. Information and Communication Technology localization, of which corpus development and analysis is a part, is an important initiative that promotes the development and intellectualization of the previously marginalized indigenous African languages. Many corpus tools have been designed specifically to analyze corpora of European languages such as English, while little or no effort has been invested in indigenous African languages. As a result, some of these tools have been adopted to analyze indigenous African languages, though with some challenges. The study employed a mixed-method approach. The WordList function of WordSmith Tools version 6 was used to analyze the corpus.
