Meet the Winners of the Swahili News Classification Challenge

The first Zindi Africa NLP virtual hackathon focused on African languages.

Davis David
Jul 8, 2020

“With Zindi there is either winning or learning, there is no losing” — Celina Lee

This article was updated on 23 Dec 2020.

A few weeks ago, I organized the Swahili Machine Learning Virtual Hackathon, a virtual hackathon hosted and supported by Zindi Africa and open only to data scientists in Tanzania.

If you don’t know Zindi Africa, in short, it is Africa’s largest data science competition platform, focused on solving complex challenges using artificial intelligence (AI) and machine learning (ML). The platform allows data scientists across the African continent to compete to solve challenges in areas such as transport, health, social impact, agriculture, African languages, electricity, and economics, to name a few.

Zindi Africa Photos

This hackathon was the first virtual hackathon focused on African languages. In just 60 hours, it attracted 30 data scientists from different cities and communities across Tanzania, with 14 placing on the leaderboard.

Zindi Virtual Hackathon

Why the Swahili Language?

Swahili (also known as Kiswahili) is one of the most widely spoken languages in Africa. It is spoken by 100–150 million people across East Africa. In Tanzania, it is one of two national languages (the other is English) and it is the official language of instruction in all schools.

Top language in Africa

“A language that needs no introduction, Swahili is an African language of global presence. With speakers of Swahili in countries outside of the African continent such as the USA and Saudi Arabia, Swahili has both native speakers and second language users in large numbers. In Africa, Swahili (also known as Kiswahili) is the official language of Tanzania and Kenya and also spoken by countries such as Uganda, Rwanda, and Burundi amongst others. Swahili is popularly used as a second language by people across the African continent and taught in schools and universities. Swahili has been influenced by Arabic and even had an Arabic script during its early years. Given its presence within the continent and outside, learning Swahili is a popular choice for many language enthusiasts.” — Cudoo

The objective of the virtual hackathon was to develop a multi-class classification model that assigns each piece of news content to its category. Swahili online news platforms can use such a model to group articles automatically by category and help readers find the specific news they want to read. In addition, the model will contribute to a body of work ensuring that Swahili is represented in apps and other online products in the future.

News in Swahili is an important part of the media sphere in Tanzania. News contributes to education, technology, and the economic growth of a country, and news in local languages plays an important cultural role in many African countries. In the modern age, African languages in news and other spheres are at risk of being lost as English becomes the dominant language in online spaces.

The winners of this challenge are Waziri Shebogholo from Dodoma in 1st place, Michael Samweli Mollel from Arusha in 2nd place, and Daudi Nkanda from Dar es Salaam in 3rd place.

A special thank you to the 1st, 2nd, and 3rd place winners for sharing some insights into how they succeeded in this challenge so we can learn from them.

1st Winner: Waziri Shebogholo

First Winner

Tell us a bit about yourself.

I am the CTO at Belltro, a startup working to transform the state of conversational AI in Africa. We work specifically on natural language and speech processing, developing technologies that make it easier for digital assistants to understand human language, mostly in Africa. I have been involved in several NLP projects, ranging from sentiment analysis, chatbots, and neural machine translation (NMT) to identifying adverse drug reactions (ADR) in electronic health records (EHR).

Tell us about the approach you took.

I used the bert-base-multilingual-cased model from Hugging Face Transformers, a library that provides pretrained natural language processing (NLP) architectures. I started by building a baseline model with TF-IDF features and a Naive Bayes classifier. It didn’t perform well, so I switched to transfer learning with the Hugging Face Transformers library.
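To make the baseline idea concrete, here is a minimal sketch of TF-IDF features feeding a multinomial Naive Bayes classifier, written in standard-library Python. It is not Waziri’s code, which was not shared. The class name TfidfNaiveBayes, the whitespace tokenizer, the smoothing value, and the toy sentences and labels are all assumptions made for illustration. In practice you would more likely reach for scikit-learn’s TfidfVectorizer and MultinomialNB.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on whitespace; real news text needs more cleaning.
    return text.lower().split()


class TfidfNaiveBayes:
    """Hypothetical baseline: multinomial Naive Bayes on TF-IDF weighted counts."""

    def fit(self, texts, labels, alpha=1.0):
        docs = [tokenize(t) for t in texts]
        n_docs = len(docs)
        # Document frequency: in how many documents each term appears.
        df = Counter(term for doc in docs for term in set(doc))
        self.vocab = sorted(df)
        # Smoothed inverse document frequency.
        self.idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in self.vocab}

        # Accumulate the TF-IDF mass of every term within each class.
        class_term = defaultdict(Counter)
        class_docs = Counter(labels)
        for doc, label in zip(docs, labels):
            for term, tf in Counter(doc).items():
                class_term[label][term] += tf * self.idf[term]

        self.classes = sorted(class_docs)
        self.log_prior = {c: math.log(class_docs[c] / n_docs) for c in self.classes}
        self.log_likelihood = {}
        for c in self.classes:
            # Additive (Laplace-style) smoothing with strength alpha.
            total = sum(class_term[c].values()) + alpha * len(self.vocab)
            self.log_likelihood[c] = {
                t: math.log((class_term[c][t] + alpha) / total) for t in self.vocab
            }
        return self

    def predict(self, text):
        # Words never seen during training are ignored.
        weights = {
            term: tf * self.idf[term]
            for term, tf in Counter(tokenize(text)).items()
            if term in self.idf
        }
        scores = {
            c: self.log_prior[c]
            + sum(w * self.log_likelihood[c][t] for t, w in weights.items())
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Invented toy data: "michezo" (sports) and "uchumi" (economy).
train_texts = [
    "timu ya simba imeshinda mechi",
    "yanga imefunga bao mechi",
    "benki kuu imepunguza riba",
    "bei ya mafuta imepanda sokoni",
]
train_labels = ["michezo", "michezo", "uchumi", "uchumi"]

model = TfidfNaiveBayes().fit(train_texts, train_labels)
print(model.predict("simba na yanga mechi"))
print(model.predict("riba ya benki"))
```

A baseline like this is cheap to train, so it gives a reference score to beat before investing in a pretrained transformer.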

What were the things that made the difference for you that you think others can learn from?

Of all the techniques I tried, I think using cross-validation made the biggest difference in this competition. I tried making a submission without any cross-validation strategy, and it did not place as high on the leaderboard.
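The answer does not say which cross-validation scheme was used, so the sketch below assumes stratified k-fold, a common choice for multi-class text classification because every fold keeps roughly the same class mix. The function name stratified_kfold, the number of splits, and the toy labels are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, n_splits=5, seed=42):
    """Yield (train_idx, valid_idx) pairs with similar class proportions per fold."""
    rng = random.Random(seed)

    # Group the row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_splits)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)

    for k in range(n_splits):
        valid_idx = sorted(folds[k])
        train_idx = sorted(
            i for j, fold in enumerate(folds) if j != k for i in fold
        )
        yield train_idx, valid_idx


# Invented toy labels: six "michezo" rows and three "uchumi" rows.
labels = ["michezo"] * 6 + ["uchumi"] * 3
splits = list(stratified_kfold(labels, n_splits=3))

for fold_number, (train_idx, valid_idx) in enumerate(splits, start=1):
    print(fold_number, len(train_idx), [labels[i] for i in valid_idx])
```

Training one model per fold and averaging the validation scores gives a steadier estimate of leaderboard performance than a single train/validation split.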

What are the biggest areas of opportunity you see in AI in Tanzania over the next few years?

Given the current state of AI, AI education will grow, leading to more practitioners. Chatbots will become a way to interact with companies, from banks to telecom companies, and probably government institutions too. AI will keep attracting more attention in agriculture and healthcare.

What are you looking forward to most about the Zindi community?

I am so happy that Zindi provides a platform for data enthusiasts to learn and practice data science skills. The ever-growing Zindi community suggests that we will have many practitioners in data science and machine learning in Africa.

What are the biggest areas of opportunity you see in AI in Africa over the next few years?

Online presence is increasing in Africa, which gives tech companies opportunities to connect people even more, regardless of the continent’s huge diversity. Translating online content and using speech technologies to help with manual operations could be a very big advantage in Africa. Agriculture and medical diagnosis in healthcare are other areas that AI could affect.

You can follow Waziri on Twitter here.

2nd Winner: Michael Samwel Mollel

Second Winner

Tell us a bit about yourself.

I am a PhD student at the Nelson Mandela African Institution of Science and Technology located in Arusha, Tanzania. My research area focuses on future mobile systems, and currently, I am working on the handover management problem caused by the use of millimetre waves.

Tell us about the approach you took.

To solve the Swahili language problem, I used ktrain, a Keras wrapper library that gives easy access to many deep learning models. Since I realized the dataset was too small for a model to generalize well, I used the pretrained model known as ‘Bert-multilingual-uncased’. I then searched over different hyperparameters to optimize the model.

What were the things that made the difference for you that you think others can learn from?

The use of a pretrained model such as BERT.

What are the biggest areas of opportunity you see in AI in Tanzania over the next few years?

I can see AI deeply penetrating the agriculture sector. The Tanzanian economy depends heavily on agriculture, so the use of AI will bring benefits to farmers. AI will also be used for various applications in medicine, technology, and other sectors in the country.

What are you looking forward to most about the Zindi community?

From my observation, few people turn out for competitions because of a lack of local events like this one. I think Zindi should organize and promote more local competitions, as they will have a positive impact on many students within a given country.

What are the biggest areas of opportunity you see in AI in Africa over the next few years?

Different sectors will soon shift to using AI, from automated machines to self-driving cars. Africa is growing at an unexpected pace, and soon AI will be a driving factor in the African economy. Hence I can see that AI is needed from every angle. Like the computer, AI will become a necessary tool for everyday life in Africa.

You can access the second-place winner’s source code at the following link.

You can follow Michael on Twitter here.

3rd Winner: Daudi Nkanda

Third Winner

Tell us a bit about yourself.

I’m an electrical engineering undergraduate student. I am also a data science enthusiast.

Tell us about the approach you took.

Besides the basic data cleaning and some tweaks against class imbalance, it was all down to the ML model to do the heavy lifting. It turns out CatBoost has a built-in way to handle “text” features directly, which proved really useful and gave a stable score!
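Daudi does not describe his “tweaks against class imbalance”, so the sketch below only illustrates one common option: weighting each class inversely to its frequency. The function name balanced_class_weights and the toy category names are hypothetical, and this is not necessarily what was done in the competition.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Invented, deliberately imbalanced toy labels.
labels = ["kitaifa"] * 6 + ["michezo"] * 3 + ["afya"] * 1

weights = balanced_class_weights(labels)
# One weight per training row, for models that accept per-sample weights.
sample_weights = [weights[label] for label in labels]

for label in sorted(weights, key=weights.get):
    print(label, round(weights[label], 3))
```

Rare classes get larger weights, so mistakes on them cost more during training. Many classifiers can take such weights per class or per sample; CatBoost’s classifier, for example, exposes a class_weights parameter.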

What were the things that made the difference for you that you think others can learn from?

As so many have pointed out in other competitions, basic steps like data cleaning and at least understanding what you’re trying to do will surely pay off.

What are the biggest areas of opportunity you see in AI in Tanzania over the next few years?

I think telecommunications companies and mass media here could potentially revolutionize their services if they tap into AI technologies.

What are you looking forward to most about the Zindi community?

I hope to learn so much more. There is nothing like competitive learning to keep you on your toes.

What are the biggest areas of opportunity you see in AI in Africa over the next few years?

For Africa, it comes down mainly to the agriculture, education, and health sectors, where there is so much room to improve through AI.

I hope you have learned some techniques you can apply in your NLP projects or in competitions you are currently participating in. I also received a lot of emails and social media messages from people outside Tanzania who wanted to participate in the hackathon. The good news is that this hackathon will be re-opened as a knowledge competition in the near future. You can follow Zindi Africa on Twitter to get updates.

Resources

Here is a resource for you if you want to learn NLP techniques and start doing NLP projects.

Community

If you are interested in being part of a community that supports NLP for African languages, I recommend joining Masakhane. Masakhane is a grassroots NLP community for Africa, by Africans.

NOTE: The hackathon is now open as a knowledge competition on the Zindi platform. You can access it here.

Before you leave

Please share this article so that others can see it. Feel free to leave a comment too. Till then, see you in the next post! I can also be reached on Twitter @Davis_McDavid.

One last thing: Read more articles like this at the following links.
