Meet The Top Winners of East African Virtual ML Hackathon 2021

The first virtual hackathon for East African countries.

Davis David
7 min read · Mar 18, 2021

A few weeks ago, my fellow Zindi ambassadors from East Africa and I organized the first East African virtual machine learning hackathon, the AI4D Swahili News Classification Challenge. It was a private hackathon open to participants from East African countries (Tanzania, Kenya, Malawi, Uganda, and Rwanda).

If you don’t know Zindi, it is Africa’s largest data science competition platform, solving complex challenges using artificial intelligence (AI) and machine learning (ML). The platform allows data scientists across the African continent to learn data science, have fun, and compete to solve challenges that focus on transport, health, social impacts, agriculture, African languages, electricity, or economics, to name a few.

The objective of the virtual hackathon was to develop a multi-class classification model to classify news articles according to their specific categories. The model can be used by Swahili online news platforms to automatically group news according to their categories and help readers find the specific news they want to read. In addition, the model will contribute to a body of work ensuring that Swahili is represented in apps and other online products in the future.

In just 60 hours, the East African virtual hackathon attracted 77 data scientists from selected EA countries, with 54 placing on the leaderboard. The participants competed for three days to find the best solution for the challenge. The top-ranked participant from each country received a prize from Zindi.

Here is the list of winners from each country:

If you remember well, in 2020, I organized the same hackathon for Tanzania only (you can see it here). The main difference with this hackathon is that it has more data and categories than the previous hackathon. The dataset for this hackathon contains more than 30,000 news articles and 6 different categories (news topics).

AI4D and its partners supported the creation of this Swahili dataset through the Language Dataset Fellowships program. AI4D-Africa is a network of excellence in AI in sub-Saharan Africa. It aims to strengthen and develop community, scientific, and technological excellence in a range of AI-related areas.

It is composed of African AI researchers, practitioners, and policymakers. This program supports the creation of more than 9 datasets from different African languages. To learn more about AI4D, click here.

A special thank you to the 1st, 2nd, 3rd, and 5th place winners (according to the Leaderboard) for sharing some insights into how they succeeded in this challenge so we can learn from them.

1st Winner: Daudi Nkanda from Tanzania.

Daudi Nkanda

Tell us a bit about yourself?

I’m a recent electrical engineering graduate. I am also a data science enthusiast.

Tell us a bit about your solution and the approach you took for this hackathon?

I was inspired to go with the ULMFiT approach: instead of jumping directly to the classifier, you first fine-tune a pretrained language model on the entire corpus and then use it as the base for a classifier.

What were the things that made the difference for you that you think others can learn from?

It's all about keeping up to date with state-of-the-art techniques.

2nd Winner: Darius Moruri from Kenya.

Darius Moruri

Tell us a bit about yourself?

I am an experienced Data Scientist passionate about creating and using artificial intelligence to improve the livelihoods of people in our society. I hold a Bachelor's degree in Economics from The University of Nairobi and a Data Science Nano Degree from Moringa School.

Tell us about the approach you took?

My overall approach involved:
1. Performing an extended EDA to get a feel of how the data was distributed across the various article classes.
2. Cleaning the data by removing trailing spaces, new lines, and tab spaces.
3. Trying out different types of pre-trained models with different architectures.
4. Trying out different cross-validation techniques to avoid overfitting.
5. Trying out different model parameters to get optimal parameters with the best score.

Tell us a bit about your solution and the approach you took for this hackathon?

My general approach was to use a multilingual deep learning model pre-trained on many languages, including Swahili. I used the free GPU resources provided by Google Colab.

The only preprocessing I did on the data involved removing trailing spaces, new lines, and tab spaces.
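
The cleaning step described above can be sketched in a few lines. This is a minimal illustration and not Darius's actual code: the function name `clean_text` and the sample sentence are made up for the example.

```python
import re

def clean_text(text: str) -> str:
    """Replace newlines and tabs with spaces, then strip the ends."""
    # Turn every run of newline, carriage-return, or tab characters into one space.
    no_breaks = re.sub(r"[\r\n\t]+", " ", text)
    # Drop leading and trailing whitespace.
    return no_breaks.strip()

# Made-up sample text, not taken from the competition dataset.
sample = "  Rais atembelea\tmkoa wa Dodoma \n\n"
print(clean_text(sample))
```

Keeping preprocessing this light is a reasonable fit for pre-trained transformer tokenizers, which are generally trained on raw text.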

I used ktrain, a TensorFlow wrapper, to train the pre-trained xlm-roberta-base model from Hugging Face with five-fold stratified cross-validation.
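
To show what "stratified" means here, the sketch below splits a labelled dataset into folds that each keep roughly the same class mix as the whole. It is a plain-Python illustration under assumptions: the helper `stratified_kfold` and the category names are hypothetical, and in practice a library routine such as scikit-learn's `StratifiedKFold` would normally be used instead.

```python
import random
from collections import defaultdict

def stratified_kfold(labels, n_splits=5, seed=42):
    """Yield (train_idx, val_idx) pairs whose class mix mirrors the full dataset."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin so every fold gets a fair share.
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)

    for k in range(n_splits):
        val_idx = sorted(folds[k])
        train_idx = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train_idx, val_idx

# Hypothetical category names for a tiny, imbalanced toy dataset.
labels = ["kitaifa"] * 10 + ["michezo"] * 5
splits = list(stratified_kfold(labels, n_splits=5))
for fold_no, (train_idx, val_idx) in enumerate(splits):
    print(fold_no, len(train_idx), len(val_idx))
```

A model would then be trained once per fold on `train_idx` and scored on `val_idx`, with the fold scores averaged.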

As the hackathon lasted only three days, instead of using a Bayesian approach to tune my model, which takes a while to run, I opted to research the parameters recommended by the authors of the model. That’s how I settled on max_len 256, batch size 16, 3 folds, 2 epochs, and a learning rate of 3e-5.

What were the things that made the difference for you that you think others can learn from?

Taking time to understand the problem statement and doing focused research yields better results than guessing or randomly trying out different ideas.

You can access Darius’s notebook for this challenge here https://github.com/DariusTheGeek/Swahili-News-Classification-Challenge-Zindi

You can also reach Darius on Twitter here.

3rd Winner: Michael Samwel Mollel from Tanzania.

Michael Samwel Mollel

Tell us a bit about yourself?

A Ph.D. student and researcher in mobility management for mm-wave communication.

Tell us about the approach you took?

The approach I chose was to use a BERT transformer.

Tell us a bit about your solution and the approach you took for this hackathon?

Because the dataset was imbalanced, I first oversampled part of the dataset and then used K-fold cross-validation to reduce the effect of the imbalance.
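
The interview does not say which oversampling method was used. As one possible reading, the sketch below uses simple random oversampling, which duplicates minority-class rows until every class is as large as the biggest one. The function name `random_oversample` and the toy data are assumptions made for illustration.

```python
import random
from collections import Counter

def random_oversample(texts, labels, seed=0):
    """Duplicate minority-class rows at random until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for label, count in counts.items():
        # All original rows that belong to this class.
        pool = [t for t, l in zip(texts, labels) if l == label]
        # Draw (with replacement) as many extra rows as the class is missing.
        extra = [rng.choice(pool) for _ in range(target - count)]
        out_texts.extend(extra)
        out_labels.extend([label] * len(extra))
    return out_texts, out_labels

# Toy data: class "A" has four rows, class "B" only one.
texts = ["a1", "a2", "a3", "a4", "b1"]
labels = ["A", "A", "A", "A", "B"]
bal_texts, bal_labels = random_oversample(texts, labels)
print(Counter(bal_labels))
```

When oversampling is combined with K-fold, a common precaution is to oversample only the training part of each fold, so that copies of a validation row do not also appear in the training data.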

What were the things that made the difference for you that you think others can learn from?

Using stop words helped me a lot to boost my CV (cross-validation) score.

5th Winner: Yakobo Kyombo from Tanzania.

Yakobo Kyombo

Tell us a bit about yourself?

I am a fourth-year telecommunications engineering student at the University of Dodoma in Dodoma City, Tanzania, with a passion for data science and machine learning. My home region is Mbeya City, Tanzania.

Tell us about the approach you took?

For my model, I used the pre-trained RoBERTa-large model from the Hugging Face Transformers library.

Tell us a bit about your solution and the approach you took for this hackathon?

The major challenge in preprocessing was removing stopwords, because there are no packages for Swahili stopwords like there are for English. I had to create my own list, which was short but helpful. I then used the pre-trained RoBERTa-large model from Hugging Face Transformers with default parameters.
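
A hand-built stopword list like the one described can be applied with a simple filter. The sketch below is illustrative only: the eight words are common Swahili function words chosen for the example and are not Yakobo's actual list, and `remove_stopwords` is a hypothetical helper name.

```python
# Illustrative list of common Swahili function words (an assumption, not the
# list used in the competition).
SWAHILI_STOPWORDS = {"na", "ya", "wa", "kwa", "ni", "za", "la", "katika"}

def remove_stopwords(text, stopwords=SWAHILI_STOPWORDS):
    """Lowercase the text, split on whitespace, and drop tokens in the stopword set."""
    tokens = text.lower().split()
    return " ".join(tok for tok in tokens if tok not in stopwords)

# Made-up sample sentence.
print(remove_stopwords("Timu ya taifa imeshinda katika mchezo wa leo"))
```

A set is used so that each membership check is fast, which matters when the filter runs over tens of thousands of articles.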

What were the things that made the difference for you that you think others can learn from?

In text classification, and in NLP in general, transformers play a major role in producing better predictions.

You can access Yakobo’s notebook for this challenge here https://github.com/Yakobo-ky/Swahili-hackthon.

You can also reach Yakobo on Twitter here.

Conclusion

Photo by Giftpundits.com from Pexels

I want to say thank you to my fellow ambassadors from East Africa, the Zindi Team, and AI4D for making this first East African virtual machine learning hackathon successful. 🙏 🙏

Are you still interested in testing your machine learning skills in NLP? Zindi is currently running more than 5 NLP challenges focusing on African languages with prizes of USD 2000. Here is the list:

Before you leave

Please share it so that others can see it. Feel free to leave a comment too. Till then, see you in the next post! I can also be reached on Twitter @Davis_McDavid.

One last thing: Read more articles like this in the following links.
