Mathematical Sciences Graduate | Data Analysis • Machine Learning • Time Series • SQL | The Data Laboratory

The Data Laboratory | Katlego Mathebula

Junior data scientist skilled in machine learning, time series, PostgreSQL, Python and R. @katlegoTheScientist


Forecasting Tourist Accommodation Trends in South Africa

Built a project that demonstrates data cleaning and analysis skills in R, focusing on South Africa’s tourist accommodation sector (2007–2024) using Statistics SA data. I performed time-series forecasting, comparing ARIMA and SARIMA models. SARIMA outperformed ARIMA, achieving lower RMSE and MAPE, and provided reliable post-COVID recovery forecasts. The results reveal uneven recovery across accommodation types, with hotels showing the strongest resilience, offering insights to support data-driven tourism policy and planning.
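
The ARIMA versus SARIMA comparison rests on two error metrics computed over a hold-out period. The sketch below shows how RMSE and MAPE are calculated. It assumes Python rather than the R used in the project, and the series and forecasts are made-up numbers, not Statistics SA figures.

```python
import math

def rmse(actual, forecast):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))

def mape(actual, forecast):
    """Mean absolute percentage error in percent (actual values must be non-zero)."""
    pct = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100 * sum(pct) / len(pct)

# Hypothetical hold-out values and two competing forecasts (illustrative only).
actual = [100.0, 120.0, 90.0, 110.0]
arima_fc = [95.0, 110.0, 100.0, 105.0]
sarima_fc = [98.0, 118.0, 93.0, 108.0]

for name, fc in [("ARIMA", arima_fc), ("SARIMA", sarima_fc)]:
    print(f"{name}: RMSE={rmse(actual, fc):.2f}, MAPE={mape(actual, fc):.2f}%")
```

The model with the lower value on both metrics fits the hold-out period more closely. RMSE penalises large misses more heavily. MAPE is scale-free but undefined when an actual value is zero.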

Loan Approval Using a Machine Learning Decision Tree

Built an interpretable decision tree model in R to predict loan approval outcomes, achieving 98% accuracy and strong performance on imbalanced data. Skills demonstrated: R, decision trees, classification modeling, class imbalance handling, stratified sampling, model evaluation, business interpretation, ethical ML awareness.
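
Stratified sampling keeps the minority class (for example, rejected loans) at the same proportion in the training and test sets, so the evaluation is not skewed by an unlucky split. The following is a minimal stdlib sketch of that idea in Python. The 20% test fraction, the seed and the 90/10 label mix are hypothetical choices, not taken from the project.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_frac=0.2, seed=42):
    """Return (train_idx, test_idx) with each class split in the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)                        # randomise within the class
        n_test = round(len(idx) * test_frac)    # per-class test quota
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical imbalanced labels: 90 approved (1), 10 rejected (0).
labels = [1] * 90 + [0] * 10
train_idx, test_idx = stratified_split(labels)
print(sum(labels[i] == 0 for i in test_idx), "rejected cases in a test set of", len(test_idx))
```

Splitting inside each class first, rather than sampling from the whole dataset, guarantees that the rare class appears in both sets.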

Stock Volatility Forecasting with Time-Series Regression

Built a time-series regression model to forecast stock market volatility using historical price data. I used Apple stock data from Yahoo Finance, engineered rolling volatility as the target variable, and trained a linear regression model in Python to predict future volatility based on return behavior.
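
The pipeline has three steps: compute returns, turn them into a rolling-volatility target, and regress that target on a predictor. The stdlib sketch below walks through those steps under several assumptions that do not come from the project. It uses a synthetic price series instead of Apple data, a hypothetical 5-day window, and today's rolling volatility as the single predictor of the next value.

```python
import math
import random
import statistics

def log_returns(prices):
    """Log returns r_t = ln(P_t / P_{t-1})."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

def rolling_std(values, window):
    """Sample standard deviation over each trailing window (the volatility target)."""
    return [statistics.stdev(values[i - window:i]) for i in range(window, len(values) + 1)]

def fit_ols(x, y):
    """Least-squares line y = a + b*x; returns (a, b)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

# Synthetic prices: a random walk in log space (stand-in for real market data).
rng = random.Random(0)
prices = [100.0]
for _ in range(60):
    prices.append(prices[-1] * math.exp(rng.gauss(0, 0.01)))

vol = rolling_std(log_returns(prices), window=5)  # hypothetical window length
a, b = fit_ols(vol[:-1], vol[1:])                 # predict vol[t+1] from vol[t]
next_vol = a + b * vol[-1]
print(f"intercept={a:.5f}, slope={b:.3f}, next-step volatility={next_vol:.5f}")
```

In practice the fit would be evaluated on a later, held-out stretch of the series rather than on the data it was trained on.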

Numerical Solutions of Differential Equations and Oscillations

Built a project that uses numerical modeling in Python to solve second-order differential equations describing oscillatory systems such as springs and electrical circuits. Euler’s and RK2 methods were implemented and evaluated for accuracy, stability, and error against exact solutions. Results showed that RK2 is more accurate and reliable, demonstrating a data-driven approach to modeling real-world dynamic systems.
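
The Euler versus RK2 comparison can be reproduced on the simplest oscillator, x'' = -ω²x with x(0) = 1 and v(0) = 0, whose exact solution is x(t) = cos(ωt). The sketch below makes three assumptions not stated in the project: it uses Heun's method as the RK2 variant, and it sets ω = 1 and h = 0.01.

```python
import math

def oscillator(state, omega):
    """x'' = -omega^2 x written as the first-order system (x, v)."""
    x, v = state
    return (v, -omega ** 2 * x)

def euler_step(state, h, omega):
    """Forward Euler: advance with the slope at the start of the step."""
    dx, dv = oscillator(state, omega)
    return (state[0] + h * dx, state[1] + h * dv)

def rk2_step(state, h, omega):
    """Heun's method (one RK2 variant): average the slopes at both ends of the step."""
    k1 = oscillator(state, omega)
    pred = (state[0] + h * k1[0], state[1] + h * k1[1])
    k2 = oscillator(pred, omega)
    return (state[0] + h / 2 * (k1[0] + k2[0]), state[1] + h / 2 * (k1[1] + k2[1]))

def max_error(step, h=0.01, t_end=10.0, omega=1.0):
    """Largest |x_numeric(t) - cos(omega t)| over the run."""
    state, worst = (1.0, 0.0), 0.0
    for i in range(1, round(t_end / h) + 1):
        state = step(state, h, omega)
        worst = max(worst, abs(state[0] - math.cos(omega * i * h)))
    return worst

print("Euler max error:", max_error(euler_step))
print("RK2   max error:", max_error(rk2_step))
```

Euler's method steadily inflates the oscillation's energy, so its error grows with time. Heun's method is second-order accurate and stays far closer to the exact curve for the same step size.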

Telco Customer Churn Prediction: Random Forest Classification

Built a Random Forest classification model to predict customer churn using an 80/20 train-test split. Removed sources of data leakage and addressed class imbalance using ROSE resampling. Evaluated model performance using accuracy and ROC-AUC. Identified key churn drivers, including contract type, tenure and monthly charges, to provide actionable business insights.
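
ROC-AUC is useful on imbalanced churn data because it does not depend on a single classification threshold. It equals the probability that a randomly chosen churner is scored higher than a randomly chosen non-churner. The Python sketch below computes it directly from that definition. It does not reproduce the random forest or the ROSE step, and the labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC-AUC as P(score of a random positive > score of a random negative); ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical churn labels (1 = churned) and predicted churn probabilities.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75
```

The pairwise loop is O(P·N), which is fine for illustration. Library implementations use a rank-based formula that scales to large test sets.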

Olist E-commerce Using PostgreSQL

Built an end-to-end e-commerce SQL analytics project featuring database schema design, data cleaning, constraints, indexing, and advanced business intelligence queries. Includes revenue analysis, customer lifetime value, cohort retention, ranking functions, and performance optimization using PostgreSQL.
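
Several of the listed techniques fit in a single small example: a constrained, indexed table, a customer-revenue aggregate (a simple lifetime-value proxy) and a `RANK()` window function. The sketch below assumes a hypothetical, heavily simplified `orders` table rather than the real Olist schema. It runs the SQL through Python's built-in `sqlite3` as a stand-in for PostgreSQL, which requires SQLite 3.25 or later for window functions. The CTE and `RANK() OVER (...)` syntax shown is also valid PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical simplified schema: one row per order with its total payment.
conn.executescript("""
CREATE TABLE orders (
    order_id    TEXT PRIMARY KEY,
    customer_id TEXT NOT NULL,
    payment     REAL NOT NULL CHECK (payment >= 0)
);
CREATE INDEX idx_orders_customer ON orders (customer_id);
INSERT INTO orders VALUES
    ('o1', 'c1', 120.0), ('o2', 'c2', 80.0), ('o3', 'c1', 30.0),
    ('o4', 'c3', 200.0), ('o5', 'c2', 50.0);
""")

# Aggregate revenue per customer, then rank customers by it.
query = """
WITH clv AS (
    SELECT customer_id, SUM(payment) AS lifetime_revenue
    FROM orders
    GROUP BY customer_id
)
SELECT customer_id,
       lifetime_revenue,
       RANK() OVER (ORDER BY lifetime_revenue DESC) AS revenue_rank
FROM clv
ORDER BY revenue_rank;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Computing the aggregate in a CTE first keeps the window function simple and reads the same in either database.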