Africa has over 2,000 languages; however, these languages are not well represented in the existing natural language processing (NLP) ecosystem. One of the challenges is the lack of useful African language datasets that can be used to solve different social and economic problems.
In this article, I have compiled a list of African language datasets from across the web. These datasets can be used in numerous NLP tasks such as text classification, named entity recognition, machine translation, sentiment analysis, speech recognition, and topic modeling.
This collection of datasets has been made public to give you an opportunity to use your…
Scikit-learn remains one of the most popular open-source and free machine learning libraries for Python. The scikit-learn library contains a lot of efficient tools for machine learning and statistical modeling including classification, regression, clustering, and dimensionality reduction.
Many data scientists, machine learning engineers, and researchers rely on this library for their machine learning projects. I personally love using the scikit-learn library because it offers a ton of flexibility, and its documentation is easy to understand, with a lot of examples.
In this article, I’m happy to share with you the 5 best new features in scikit-learn 0.24.
Firstly, make sure…
When you are training a machine learning model, you can have some features in your dataset that represent categorical values. Categorical features are types of data that may be divided into groups.
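As a rough illustration of "data that may be divided into groups", here is a minimal sketch in plain Python. The `payment_method` column and its values are hypothetical and not taken from any dataset in the article; the point is only that a categorical feature takes a small set of distinct labels, and each row falls into exactly one of them.

```python
from collections import defaultdict

# Hypothetical toy column: each entry is a label, not a measured quantity.
payment_method = ["cash", "card", "mobile", "card", "cash", "card"]

# The distinct labels are the groups (categories) of this feature.
categories = sorted(set(payment_method))

# Map each category to the indices of the rows that belong to it.
groups = defaultdict(list)
for row_index, value in enumerate(payment_method):
    groups[value].append(row_index)

print(categories)
print(dict(groups))
```

Every row index ends up in exactly one group, which is what distinguishes a categorical feature from a continuous one.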
There are three common categorical data types which are:
They say data is the new oil, but we don’t use the oil directly from its source. It has to be processed and cleaned before we use it for different purposes.
The same applies to data: we don’t use it directly from its source. It also has to be processed.
A few weeks ago, my fellow Zindi ambassadors from East Africa and I organized the first East African virtual machine learning hackathon, called the AI4D Swahili News Classification Challenge. It was a private hackathon open to participants from East African countries (Tanzania, Kenya, Malawi, Uganda, and Rwanda).
If you don’t know Zindi, it is Africa’s largest data science competition platform, solving complex challenges using artificial intelligence (AI) and machine learning (ML). …
I remember the first time I created a simple machine learning model. It was a model that could predict your salary according to your years of experience. And after making it, I was curious about how I could deploy it into production.
If you have been learning machine learning, you might have seen this challenge in online tutorials or books. You can find the source code here if you are interested.
It was really difficult for me to figure out where I could deploy my model. …
Most trained machine learning models are saved as pickle files. This file type is the standard way of serializing and de-serializing objects in Python.
In order to make predictions, you need to load the saved trained model and then perform predictions from the inputs provided.
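The save, load, and predict cycle described above can be sketched with Python's built-in `pickle` module. This is a minimal sketch under stated assumptions: the "trained model" is a hypothetical dict of linear-model coefficients standing in for a real estimator, and `predict` and `model.pkl` are illustrative names, not part of any library.

```python
import os
import pickle
import tempfile

# Hypothetical "trained model": weights and bias of a small linear model.
# A real project would pickle a fitted estimator object instead.
trained_model = {"weights": [2.0, -1.0], "bias": 0.5}


def predict(model, rows):
    """Return one linear prediction (dot product plus bias) per input row."""
    return [
        sum(w * x for w, x in zip(model["weights"], row)) + model["bias"]
        for row in rows
    ]


with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "model.pkl")

    # Serialize the trained model to a pickle file.
    with open(model_path, "wb") as f:
        pickle.dump(trained_model, f)

    # Later (typically in another process): load the saved model back.
    with open(model_path, "rb") as f:
        loaded_model = pickle.load(f)

# Perform predictions from the inputs provided.
inputs = [[1.0, 2.0], [3.0, 0.0]]
predictions = predict(loaded_model, inputs)
print(predictions)  # [0.5, 6.5]
```

Note that `pickle.load` can execute arbitrary code, so only load pickle files from sources you trust.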
m2cgen (Model 2 Code Generator) is a simple Python library that converts a trained machine learning…
Happy New Year to you! 2021 is here and you did it 💪. 2020 is now behind us, and even though it was a tough and strange year for many people around the world, there’s still a lot to celebrate. In 2020, I learned that all we need is the love and support of our loved ones, family members, and friends.
“In the face of adversity, we have a choice. We can be bitter, or we can be better. Those words are my North Star.” - Caryn Sullivan
This will be my first article for 2021, and I will talk…
Last week I organized a tourism machine learning hackathon called the Tanzania Tourism Prediction Challenge, hosted and supported by Zindi Africa and the Tanzania Pycon Community during the second Pycon conference here in Tanzania.
If you don’t know Zindi Africa, in short, Zindi Africa is Africa’s largest data science competition platform, solving complex challenges using artificial intelligence (AI) and machine learning (ML). The platform allows data scientists across the African continent to compete to solve challenges that focus on transport, health, social impacts, agriculture, African languages, electricity, or economics, to name a few.
Did you know that 90% of machine learning models never actually make it into production?
This means that the topic of machine learning deployment is rarely discussed when people learn machine learning. As a result, many AI practitioners know how to create useful ML models, but they find it difficult to deploy them into production.
Needless to say, machine learning deployment is one of the more important skills you should have if you’re going to work with ML models.