Meet The Winners of Tourism Expenditure Prediction Challenge

Build an ML Model to Predict Tourist Expenditure in Tanzania

Davis David
8 min read · Dec 21, 2020

Last week I organized a tourism machine learning hackathon, the Tanzania Tourism Prediction Challenge, hosted and supported by Zindi Africa and the Tanzania Pycon Community during the second Pycon conference here in Tanzania.

If you don’t know Zindi Africa, in short, Zindi Africa is Africa’s largest data science competition platform, solving complex challenges using artificial intelligence (AI) and machine learning (ML). The platform allows data scientists across the African continent to compete to solve challenges that focus on transport, health, social impacts, agriculture, African languages, electricity, or economics, to name a few.

Zindi Africa Photos

The Tanzania Pycon Conference is an annual gathering of Python programming language users in Tanzania, including web developers, software developers, data scientists, data analysts, ethical hackers, researchers, IoT engineers, and techies from various organizations. The conference is organized by members of the Python Tanzania Users Group, a community dedicated to advancing the use of the Python language and technology in Tanzania. This annual gathering involves workshops, technical presentations and talks, lightning talks, hackathons, research showcases, community collaboration, and networking.

Pycon Conference 2020

Why the Tanzanian Tourism Sector?

The Tanzanian tourism sector plays a significant role in the Tanzanian economy, contributing about 17% of the country’s GDP and 25% of all foreign exchange revenues. The sector, which provides direct employment for more than 600,000 people and up to 2 million people indirectly, generated approximately $2.4 billion in 2018 according to government statistics. Tanzania received a record 1.1 million international visitor arrivals in 2014, mostly from Europe, the US, and Africa.

Tanzania is the only country in the world that has allocated more than 25% of its total area for wildlife, national parks, and protected areas. There are 16 national parks in Tanzania, 28 game reserves, 44 game-controlled areas, two marine parks, and one conservation area.

Tanzania’s tourist attractions include the Serengeti plains, which host the largest terrestrial mammal migration in the world; the Ngorongoro Crater, the world’s largest intact volcanic caldera and home to the highest density of big game in Africa; Kilimanjaro, Africa’s highest mountain; and the Mafia Island marine park; among many others. The scenery, topography, rich culture, and very friendly people provide for excellent cultural tourism, beach holidays, honeymooning, game hunting, historical and archaeological ventures, and certainly the best wildlife photography safaris in the world.

The objective of this hackathon is to develop a machine learning model to predict what a tourist will spend when visiting Tanzania. The model can be used by different tour operators and the Tanzania Tourism Board to automatically help tourists across the world estimate their expenditure before visiting Tanzania.

The hackathon ran for 3 days. The top 3 winners of this challenge are Daudi Nkanda from the University of Dar es Salaam in 1st place, Anthony Mipawa from the University of Dodoma in 2nd place, and Frank E. Anderson from Sokoine University of Agriculture in 3rd place.

A special thank you to the 1st, 2nd, and 3rd place winners for sharing some insights into how they succeeded in this challenge so we can learn from them.

1st Winner: Daudi Nkanda

Daudi Nkanda

Tell us a bit about yourself?

I’m a recent electrical engineering graduate. I am also a data science enthusiast.

Tell us about the approach you took?

For the basic pipeline, I started with some data exploration and data preprocessing. The only feature engineering that mattered for me was mapping the “country” feature to the “region” and “sub region” of each country (it made sense to me based on domain knowledge). I only considered CatBoost and LightGBM for model selection, and in the end LightGBM proved superior for this case.
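To illustrate the kind of feature described here (this is not Daudi's actual code), here is a minimal sketch in pandas. The lookup table, the `country` column name, and the "Other" fallback are all assumptions made for the example; a real version would need an entry for every country in the data.

```python
import pandas as pd

# Hypothetical lookup table: the winner's exact grouping is not given in the
# article, and the real data contains many more countries than this.
COUNTRY_TO_REGION = {
    "KENYA": ("Africa", "Eastern Africa"),
    "SOUTH AFRICA": ("Africa", "Southern Africa"),
    "ITALY": ("Europe", "Southern Europe"),
    "UNITED KINGDOM": ("Europe", "Northern Europe"),
    "UNITED STATES OF AMERICA": ("Americas", "Northern America"),
}
UNKNOWN = ("Other", "Other")  # fallback for countries missing from the table


def add_region_features(df: pd.DataFrame, country_col: str = "country") -> pd.DataFrame:
    """Return a copy of df with 'region' and 'sub_region' derived from the country."""
    out = df.copy()
    # Normalize spelling so "Kenya", " kenya " and "KENYA" hit the same key.
    keys = out[country_col].str.strip().str.upper()
    out["region"] = keys.map(lambda c: COUNTRY_TO_REGION.get(c, UNKNOWN)[0])
    out["sub_region"] = keys.map(lambda c: COUNTRY_TO_REGION.get(c, UNKNOWN)[1])
    return out


tourists = pd.DataFrame({"country": ["Kenya", "ITALY", "Atlantis"]})
tourists = add_region_features(tourists)
print(tourists)
```

Grouping countries this way gives a tree-based model a coarser signal to split on, which can help when many countries appear only a handful of times in a small dataset.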

What were the things that made the difference for you that you think others can learn from?

For this specific challenge, model selection, hyperparameter tuning, and good local validation really provided an edge. Feature engineering based on domain knowledge also provided some extra boost.
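"Good local validation" usually means scoring a model on data it has not seen before trusting the leaderboard. The sketch below shows one common way to do that, K-fold cross-validation, written with NumPy only. It is an illustration, not the winner's setup: the article does not say which metric or split was used, so the mean absolute error (MAE), the five folds, the median baseline, and the made-up data are all assumptions.

```python
import numpy as np


def kfold_mae(fit_predict, X, y, n_splits=5, seed=0):
    """Return the mean absolute error of each validation fold.

    fit_predict(X_train, y_train, X_valid) must return predictions for X_valid.
    """
    X, y = np.asarray(X), np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    # Shuffle the row indices once, then cut them into n_splits folds.
    folds = np.array_split(rng.permutation(len(y)), n_splits)
    scores = []
    for i, valid_idx in enumerate(folds):
        # Train on every fold except the one held out for validation.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        preds = fit_predict(X[train_idx], y[train_idx], X[valid_idx])
        scores.append(float(np.mean(np.abs(y[valid_idx] - preds))))
    return scores


# Hypothetical baseline: always predict the median spend seen in training.
def median_baseline(X_train, y_train, X_valid):
    return np.full(len(X_valid), np.median(y_train))


X_demo = np.arange(20).reshape(-1, 1)
y_demo = np.linspace(100.0, 2000.0, 20)  # made-up expenditures
fold_scores = kfold_mae(median_baseline, X_demo, y_demo, n_splits=5)
print("MAE per fold:", np.round(fold_scores, 1), "mean:", round(float(np.mean(fold_scores)), 1))
```

Any model can be plugged in by wrapping its fit and predict calls in a function with the same three arguments, which makes it easy to compare candidates on identical folds.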

What are the biggest areas of opportunity you see in AI in Tanzania over the next few years?

Specifically, telecommunications companies and mass media here could potentially revolutionize their services if they tap into AI technologies. This can extend to other sectors such as agriculture and medicine.

What are you looking forward to most about the Zindi community?

I hope to learn so much more. Nothing keeps you on your toes like competitive learning.

What are the biggest areas of opportunity you see in AI in Africa over the next few years?

For Africa, it comes down mainly to the agricultural, education, and health sectors, where there is so much room to improve through AI.

2nd Winner: Anthony Mipawa

Anthony Mipawa

Tell us a bit about yourself?

I’m a fourth-year software engineering student and Community Manager of the UDOMAI Community at the University of Dodoma. I am passionate about the data science field, with one year of experience in Python, machine learning predictive models, natural language processing, and computer vision. I believe that data tells us more than just numbers. My drive is to support African students and the young generation interested in data science, so that we can raise our productivity and solve real-life problems.

Tell us about the approach you took?

I built my model with an ensemble algorithm, Extreme Gradient Boosting (XGBoost). Most of my effort went into preprocessing categorical variables and generating new features. I tried several techniques for converting categorical variables. One of them, frequency encoding, didn’t perform well, so I decided to apply a label encoder to variables with only two categories and get_dummies to the other categorical variables. I also generated new features such as the total number of people and the total number of nights. Lastly, I tried several algorithms to create the model, but XGBoost performed best, and from that baseline I decided to improve by tuning different parameters.
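The encoding and feature steps described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than Anthony's notebook (which is linked below): the rows are made up, the column names are hypothetical, and pandas category codes stand in for scikit-learn's `LabelEncoder`, which assigns the same sorted integer codes.

```python
import pandas as pd

# Made-up rows; the column names are illustrative, not taken from the article.
df = pd.DataFrame({
    "first_trip": ["Yes", "No", "Yes"],
    "purpose": ["Leisure", "Business", "Volunteering"],
    "total_female": [1, 0, 2],
    "total_male": [1, 1, 0],
    "night_mainland": [7, 3, 10],
    "night_zanzibar": [0, 2, 4],
})

# New features: party size and total length of stay.
df["total_people"] = df["total_female"] + df["total_male"]
df["total_nights"] = df["night_mainland"] + df["night_zanzibar"]

# Split categorical columns by how many distinct values they hold.
categorical = ["first_trip", "purpose"]
binary_cols = [c for c in categorical if df[c].nunique() == 2]
multi_cols = [c for c in categorical if c not in binary_cols]

encoded = df.copy()
for col in binary_cols:
    # Label encoding: categories are sorted, so "No" -> 0 and "Yes" -> 1.
    encoded[col] = encoded[col].astype("category").cat.codes

# One-hot encode the remaining categorical columns.
encoded = pd.get_dummies(encoded, columns=multi_cols)
print(encoded.columns.tolist())
```

Using a single integer column for two-category variables avoids creating a redundant pair of dummy columns, while one-hot encoding keeps multi-category variables from being read as ordered numbers.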

What were the things that made the difference for you that you think others can learn from?

I think spending a lot of time on feature engineering and trying more than one algorithm when building the baseline model are the things that set me apart from others.

What are the biggest areas of opportunity you see in AI in Tanzania over the next few years?

The health sector and agriculture.

What are you looking forward to most about the Zindi community?

I look forward to working with fellow Zindians to solve more real-life problems using AI algorithms.

What are the biggest areas of opportunity you see in AI in Africa over the next few years?

Agriculture, the industrial sector, and the health sector.

Second Winner

You can access Anthony’s notebook for this challenge here: 👇 https://github.com/Tonyloyt/TanzaniaTourism-Hackathon-2020-Second-place-Winning-Solution

You can follow Anthony on Twitter here 👉 https://twitter.com/loyttony to learn more from him.

3rd Winner: Frank E. Anderson

Frank E Anderson

Tell us a bit about yourself?

I am a fresh graduate from Sokoine University of Agriculture with a Bachelor of Science in Environmental Sciences and Management. I like playing around with my notebook, and I am very passionate about IoT and data science but most skilled in GIS and remote sensing.

Tell us about the approach you took?

I used an ensemble-based technique: a weighted average of three different models trained on the same dataset. The models were XGBoost (xgb), LightGBM (lgbm), and CatBoost regressors. Preprocessing included dropping missing values selectively, to avoid pruning too much of the data (the dataset was small). I used one-hot encoding to create value-based features, and I dropped the columns for keywords that I had replaced with NaN values.
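A weighted average of model predictions is simple to express in NumPy. The sketch below is only an illustration of the idea: the prediction values and the weights are invented for the example, and the article does not say which weights Frank actually used.

```python
import numpy as np


def weighted_average(predictions, weights):
    """Blend per-model prediction arrays; weights are normalized by their sum."""
    preds = np.asarray(predictions, dtype=float)  # shape: (n_models, n_samples)
    w = np.asarray(weights, dtype=float)
    if preds.shape[0] != w.shape[0]:
        raise ValueError("need exactly one weight per model")
    return np.average(preds, axis=0, weights=w)


# Made-up predictions from three already-trained regressors for two tourists.
xgb_pred = [100.0, 200.0]
lgbm_pred = [120.0, 180.0]
cat_pred = [110.0, 220.0]

# Hypothetical weights; in practice they are chosen on a validation set.
blended = weighted_average([xgb_pred, lgbm_pred, cat_pred], weights=[0.5, 0.3, 0.2])
print(blended)
```

Because the three models make somewhat different errors, blending them tends to smooth out individual mistakes, and the weights let the strongest model contribute the most.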

What were the things that made the difference for you that you think others can learn from?

It was model hyperparameter tuning. In this challenge the data was fairly easy to preprocess, but the models were very hard to tune. I spent a lot of time tuning my models manually, because randomized parameter search didn’t work and grid search ran for a long time. So learning to tune models manually can put you a point ahead of your opponents.

What are the biggest areas of opportunity you see in AI in Tanzania over the next few years?

In agriculture, fintech and energy automation.

What are you looking forward to most about the Zindi community?

I’m the first student from Sokoine University to join Zindi, and I look forward to spreading this powerful platform at my university, because many students are passionate but don’t know where to start applying their theoretical skills to real-world problems.

What are the biggest areas of opportunity you see in AI in Africa over the next few years?

Agriculture, communication, the environment, and fintech.

Third Winner

You can access Frank’s notebook for this challenge here: https://gitlab.com/dashboard/projects

Resources for Tourism in Tanzania

Do you want to visit Tanzania, spend quality time at its national parks and historical sites, and meet beautiful people? Here are good resources to help you learn more and know where to start!

Wrapping Up

Davis David (Zindi Ambassador)

I want to thank Zindi Africa and the Pycon Community for making this machine learning hackathon a success.

I would like to wish you a happy holiday season and a wonderful New Year 2021. Take advantage of this time to reconnect with your family & friends 👪. May 2021 bring you health, happiness, and success!

Before you leave

Please share it so that others can see it. Feel free to leave a comment too. Till then, see you in the next post! I can also be reached on Twitter @Davis_McDavid.

One last thing: Read more articles like this in the following links.

Written by Davis David

Data Scientist 📊 | Software Developer | Technical Writer 📝 | ML Course Author 👨🏽‍💻 | Giving talks. Check my new ML course: https://bit.ly/OptimizeMLModels