Good African language datasets for numerous NLP tasks.

Africa has over 2,000 languages; however, these languages are not well represented in the existing natural language processing (NLP) ecosystem. One of the challenges is the lack of useful African language datasets that can be used to solve different social and economic problems.

In this article, I have compiled a list of African language datasets from across the web. These datasets can be used in numerous NLP tasks such as text classification, named entity recognition, machine translation, sentiment analysis, speech recognition, and topic modeling.

This collection of datasets has been made public to give you an opportunity to use your…


Scikit-learn remains one of the most popular open-source and free machine learning libraries for Python. The scikit-learn library contains a lot of efficient tools for machine learning and statistical modeling including classification, regression, clustering, and dimensionality reduction.

Many data scientists, machine learning engineers, and researchers rely on this library for their projects. I personally love using the scikit-learn library because it offers a ton of flexibility and its documentation is easy to understand, with a lot of examples.

In this article, I’m happy to share with you the 5 best new features in scikit-learn 0.24.

Install the Latest Version of the Scikit-Learn Library

Firstly, make sure…


A simple trick to Improve Model Performance.


When you are training a machine learning model, some features in your dataset may represent categorical values. Categorical features are types of data that can be divided into groups.

There are three common types of categorical data:

  1. Ordinal: the values have a natural order. Example: rating happiness on a scale of 1–10
  2. Binary: there are only two possible values. Example: Male or Female
  3. Nominal: the values have no natural order. Example: Countries

Most machine learning algorithms require numerical input and output variables. Therefore you will have to transform categorical features in your dataset…
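As a rough illustration of that transformation, here is a minimal sketch in plain Python that encodes one feature of each type. The records, column names, and category levels are made up for the example and do not come from any real dataset; in practice a library such as scikit-learn or pandas would usually do this work.

```python
# Toy records; the column names and values are invented for illustration.
rows = [
    {"happiness": "low", "subscribed": "yes", "country": "Tanzania"},
    {"happiness": "high", "subscribed": "no", "country": "Kenya"},
    {"happiness": "medium", "subscribed": "yes", "country": "Uganda"},
]

# Ordinal: map each level to an integer that preserves the order.
HAPPINESS_ORDER = {"low": 0, "medium": 1, "high": 2}

# Binary: map the two possible values to 0 and 1.
SUBSCRIBED = {"no": 0, "yes": 1}

# Nominal: one-hot encode, one column per category, with no implied order.
countries = sorted({row["country"] for row in rows})


def encode(row):
    """Return a flat list of numbers for one record."""
    one_hot = [1 if row["country"] == c else 0 for c in countries]
    return [HAPPINESS_ORDER[row["happiness"]], SUBSCRIBED[row["subscribed"]], *one_hot]


encoded = [encode(row) for row in rows]
for original, numeric in zip(rows, encoded):
    print(original, "->", numeric)
```

The nominal feature gets one column per category so that the model does not read an order into arbitrary integer codes.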


They say data is the new oil, but we don’t use oil directly from its source. It has to be processed and cleaned before we use it for different purposes.
The same applies to data: we don’t use it directly from its source. It also has to be processed.


The first virtual hackathon for East African countries.

A few weeks ago, fellow Zindi ambassadors from East Africa and I organized the first East African virtual machine learning hackathon. It was a private hackathon open to participants from East African countries (Tanzania, Kenya, Malawi, Uganda, and Rwanda).

If you don’t know Zindi, it is Africa’s largest data science competition platform, solving complex challenges using artificial intelligence (AI) and machine learning (ML). …



I remember the first time I created a simple machine learning model. It was a model that could predict your salary according to your years of experience. And after making it, I was curious about how I could deploy it into production.

If you have been learning machine learning, you might have seen this challenge in online tutorials or books. You can find if you are interested.

It was really difficult for me to figure out where I could deploy my model. …


Convert a Trained ML Model into the Programming Language of Your Choice.


Most trained machine learning models are saved as pickle files. This file type is the standard way of serializing and deserializing objects in Python.

In order to make predictions, you need to load the saved trained model and then perform predictions from the inputs provided.
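The save, load, and predict cycle can be sketched with Python's standard `pickle` module, assuming that is the serialization format in use. The "model" here is a hypothetical stand-in (a dict holding the parameters of a straight line) and `predict` is a helper written for this example, not part of any library.

```python
import os
import pickle
import tempfile

# Hypothetical "trained model": parameters of y = intercept + coef * x.
model = {"intercept": 1.0, "coef": 2.0}


def predict(model, xs):
    """Apply the stored linear parameters to each input value."""
    return [model["intercept"] + model["coef"] * x for x in xs]


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")

    # Save (serialize) the trained model to disk.
    with open(path, "wb") as f:
        pickle.dump(model, f)

    # Later, load (deserialize) it back before predicting.
    with open(path, "rb") as f:
        loaded = pickle.load(f)

predictions = predict(loaded, [0.0, 1.5])
print(predictions)
```

Only unpickle files you trust, because loading a pickle can execute arbitrary code.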

In this article, you will learn how to use the m2cgen Python library to convert the trained machine learning model into native code (for example Python, PHP, or JavaScript) with zero dependencies. Then you’ll make predictions based on it.
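To make the idea of "native code with zero dependencies" concrete, here is a toy sketch of what a model-to-code generator does for a linear model. This is not m2cgen's API or its actual output: `export_linear_to_python` and the coefficients are hypothetical, written only to show the concept of turning learned parameters into standalone source code.

```python
def export_linear_to_python(intercept, coefs):
    """Return Python source for score(features) implementing a linear model."""
    parts = [repr(intercept)]
    parts += [f"({c!r}) * features[{i}]" for i, c in enumerate(coefs)]
    body = " + ".join(parts)
    return f"def score(features):\n    return {body}\n"


# Hypothetical learned parameters, chosen only for the example.
source = export_linear_to_python(0.5, [2.0, -1.0])
print(source)

# The generated source needs no imports, so it can run anywhere Python runs.
namespace = {}
exec(source, namespace)
result = namespace["score"]([3.0, 4.0])
print(result)
```

The generated function contains only arithmetic on its inputs, which is why code produced this way can be pasted into an application without shipping the training library alongside it.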

What is the m2cgen Python Library?

m2cgen (Model 2 Code Generator) is a simple Python library that converts a trained machine learning…


From Text Data in Multiple Languages to a Single Language


Happy New Year to you! 2021 is here and you did it 💪. 2020 is now behind us, and even though it was a tough and strange year for many people around the world, there’s still a lot to celebrate. In 2020, I learned that all we need is the love and support of our loved ones, family members, and friends.

“In the face of adversity, we have a choice. We can be bitter, or we can be better. Those words are my North Star.” - Caryn Sullivan

This will be my first article for 2021, and I will talk…


Build an ML Model to Predict Tourist Expenditure in Tanzania

Last week I organized a Tourism Machine Learning Hackathon called hosted & supported by and during the Second Pycon conference here in Tanzania.

If you don’t know Zindi Africa, it is Africa’s largest data science competition platform, solving complex challenges using artificial intelligence (AI) and machine learning (ML). The platform allows data scientists across the African continent to compete to solve challenges that focus on transport, health, social impact, agriculture, African languages, electricity, or economics, to name a few.


A simple, step-by-step way to deploy an NLP model to a serverless production environment.


Did you know that 90% of machine learning models never actually make it into production?

This means that the topic of machine learning deployment is rarely discussed when people learn machine learning. As a result, many AI practitioners know how to create useful ML models, but they find it difficult to deploy them into production.

Needless to say, machine learning deployment is one of the more important skills you should have if you’re going to work with ML models.

Davis David

Data Scientist | AI Practitioner | Software Developer. Giving talks, teaching, writing. Contact me to collaborate
