Built a data cleaning and analysis project in R on South Africa’s tourist accommodation sector (2007–2024), using Statistics South Africa (Stats SA) data.
I compared ARIMA and SARIMA models for time-series forecasting. SARIMA outperformed ARIMA, achieving lower RMSE and MAPE,
and provided reliable post-COVID recovery forecasts. The results reveal uneven recovery across accommodation types, with hotels showing the strongest resilience,
and offer insights to support data-driven tourism policy and planning.
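
As a rough illustration of the two error metrics used in the ARIMA versus SARIMA comparison, here is a minimal Python sketch (the project itself was done in R). The function names and all numbers are hypothetical and are not taken from the Stats SA data or the fitted models.

```python
import math

def rmse(actual, forecast):
    """Root mean squared error: large misses are penalized more heavily."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Assumes no actual value is zero."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

# Hypothetical hold-out values and two competing forecasts (illustration only).
actual = [100.0, 120.0, 90.0, 110.0]
forecast_a = [90.0, 110.0, 100.0, 100.0]   # misses every point by 10
forecast_b = [98.0, 123.0, 88.0, 111.0]    # tracks the series more closely

for name, fc in [("model A", forecast_a), ("model B", forecast_b)]:
    print(name, "RMSE:", round(rmse(actual, fc), 2), "MAPE %:", round(mape(actual, fc), 2))
```

The model with the lower RMSE and MAPE on held-out data is the one preferred, which is the same criterion used to pick SARIMA over ARIMA.
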
Built an interpretable decision tree model in R to predict loan approval outcomes, achieving 98% accuracy and strong performance on imbalanced data.
Skills demonstrated:
R, decision trees, classification modeling, class imbalance handling, stratified sampling, model evaluation, business interpretation, ethical ML awareness.
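
Stratified sampling matters on imbalanced data because a plain random split can leave the minority class almost absent from the test set. The sketch below shows the idea in Python rather than R; the function name, class labels, and the 90/10 class ratio are assumptions made for illustration, not details of the loan dataset.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with each class split at roughly the same ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)                       # shuffle within each class
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)

# Hypothetical imbalanced labels: 90 approved, 10 rejected.
labels = ["approved"] * 90 + ["rejected"] * 10
train_idx, test_idx = stratified_split(labels)
rejected_in_test = sum(labels[i] == "rejected" for i in test_idx)
print(len(train_idx), len(test_idx), rejected_in_test)
```

Both splits keep the original class ratio, so evaluation metrics reflect the minority class as well as the majority class.
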
Built a time-series regression model in Python to forecast stock market volatility from historical price data.
I used Apple stock data from Yahoo Finance, engineered rolling volatility as the target variable,
and trained a linear regression model to predict future volatility from past return behavior.
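
One common way to engineer rolling volatility is the standard deviation of log returns over a trailing window. The sketch below assumes that definition, a window of 3, and made-up prices; the project's actual window length and return definition are not stated here.

```python
import math
import statistics

def log_returns(prices):
    """Log return between each pair of consecutive prices."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

def rolling_volatility(returns, window):
    """Sample standard deviation of returns over each trailing window (window >= 2)."""
    return [
        statistics.stdev(returns[i - window + 1 : i + 1])
        for i in range(window - 1, len(returns))
    ]

# Hypothetical closing prices (illustration only, not Apple data).
prices = [100.0, 101.0, 99.5, 102.0, 103.5, 101.0, 104.0]
window = 3
returns = log_returns(prices)
vol = rolling_volatility(returns, window)
print([round(v, 4) for v in vol])
```

Each volatility value uses only returns up to that point, so it can be shifted forward to serve as a future-volatility target without looking ahead.
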
Built a numerical modeling project in Python that solves the second-order differential equations describing oscillatory systems such as springs and electrical circuits. Euler’s method and the second-order Runge-Kutta (RK2) method were implemented
and evaluated for accuracy, stability, and error against exact solutions. Results showed that RK2 is more accurate and reliable than Euler’s method, demonstrating a numerical approach to modeling real-world dynamic systems.
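
The Euler versus RK2 comparison can be sketched on the simplest oscillator, x'' = -omega^2 x with x(0) = 1 and x'(0) = 0, whose exact solution is cos(omega t). This is a minimal sketch that assumes the midpoint variant of RK2, a step size of 0.01, and omega = 1; the project may have used a different variant and different parameters.

```python
import math

def deriv(state, omega):
    """Right-hand side of the first-order system: x' = v, v' = -omega^2 * x."""
    x, v = state
    return (v, -omega ** 2 * x)

def euler_step(state, dt, omega):
    dx, dv = deriv(state, omega)
    return (state[0] + dt * dx, state[1] + dt * dv)

def rk2_step(state, dt, omega):
    # Midpoint method: evaluate the slope halfway through the step.
    dx1, dv1 = deriv(state, omega)
    mid = (state[0] + 0.5 * dt * dx1, state[1] + 0.5 * dt * dv1)
    dx2, dv2 = deriv(mid, omega)
    return (state[0] + dt * dx2, state[1] + dt * dv2)

def max_error(step, dt=0.01, t_end=10.0, omega=1.0):
    """Largest absolute position error against the exact solution cos(omega * t)."""
    state = (1.0, 0.0)
    worst = 0.0
    for k in range(1, round(t_end / dt) + 1):
        state = step(state, dt, omega)
        worst = max(worst, abs(state[0] - math.cos(omega * k * dt)))
    return worst

euler_err = max_error(euler_step)
rk2_err = max_error(rk2_step)
print("Euler max error:", euler_err)
print("RK2 max error:  ", rk2_err)
```

With the same step size, the Euler solution slowly gains amplitude while the RK2 error stays much smaller, which matches the accuracy and stability result described above.
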
Built a Random Forest classification model to predict customer churn using an 80/20 train-test split.
Removed sources of data leakage and addressed class imbalance using ROSE resampling.
Evaluated model performance using accuracy and ROC-AUC. Identified key churn drivers including
contract type, tenure, and monthly charges to provide actionable business insights.
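
Accuracy and ROC-AUC answer different questions on imbalanced data, which is why both were reported. The Python sketch below computes AUC as the probability that a randomly chosen churner is scored above a randomly chosen non-churner; the labels and scores are hypothetical and do not come from the churn model.

```python
def roc_auc(labels, scores):
    """AUC as the chance a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(labels, scores, threshold=0.5):
    """Share of correct predictions when scores at or above the threshold count as churn."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Hypothetical imbalanced sample: 2 churners (label 1) out of 10 customers.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.4, 0.3, 0.35, 0.45]

acc = accuracy(labels, scores)
auc = roc_auc(labels, scores)
print("accuracy:", acc, "ROC-AUC:", auc)
```

In this toy sample the 0.5 threshold flags no churners at all, yet accuracy is still 0.8, while the AUC shows that the scores rank churners well. Reading the two metrics together guards against being misled by the majority class.
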
Built an end-to-end e-commerce SQL analytics project in PostgreSQL covering database schema design, data cleaning, constraints,
indexing, and advanced business intelligence queries. The queries cover revenue analysis, customer lifetime value, cohort retention,
and ranking functions, alongside performance optimization.