DATA SCIENCE • AI • INNOVATION

💫 About Me

Kipkemoi Vincent

Hi there, I'm Vincent, a passionate Data Scientist with over 5 years of experience turning messy data into meaningful solutions across climate, fintech, healthcare, agri-tech, and capital markets. I specialize in building and automating machine learning systems that not only work but also scale, integrate, and deliver real business value.

Whether it's deploying models with Docker and Kubernetes, managing pipelines with MLflow, or optimizing predictive performance, I enjoy the challenge of bringing ideas to life — from notebook to production.

When I'm not deep in code or fine-tuning a model, you'll probably find me exploring the outdoors, reading about behavioral science, or mentoring upcoming data talent. I believe great data science is not just technical — it's creative, curious, and deeply human.

I'm always open to remote opportunities or roles based in Nairobi, where I can contribute immediately, grow with a team, and help shape the future with data.

🎓 Education

The Cyprus Institute

MPhil. in Environmental Sciences

The Cyprus Institute, Nicosia, Cyprus
Scholarship: Cyprus Institute Merit Scholarship
Grade: 84%

AIMS South Africa

MSc. in Mathematical Sciences (Data Science)

University of the Western Cape / AIMS, South Africa
Scholarship: Mastercard Foundation Scholarship
Grade: Good Pass [70–84%]

University of Nairobi

BSc. in Mathematics (Statistics & Operations Research)

University of Nairobi, Kenya
Scholarship: Finlays Undergraduate Scholarship
Grade: First Class Honors

👨‍💻 Skills & Certifications

🛠️ Skills

  • Stack: Python, R, SQL, LaTeX, Git, Jupyter, VS Code
  • Data platforms: PostgreSQL, MongoDB, DBeaver, Preset
  • Visualization: Matplotlib, Seaborn, Plotly, Power BI
  • GenAI & LLMs: GPT, Mistral, LLaMA; OpenAI API, LangChain
  • Dashboards & Apps: Streamlit, Tableau, Django
  • ML models: LR, RF, ANN, SVR, XGBoost, CatBoost, LightGBM, LSTM
  • Frameworks: Scikit-learn, TensorFlow, PyTorch
  • Anomaly Detection: Isolation Forest, LOF, PyOD models
  • MLOps: MLflow, Docker, FastAPI, Kubernetes
  • Cloud & Infra: GCP, AWS, Render, BigQuery

📜 Key Certifications

  • AWS Machine Learning Learning Plan – Amazon Web Services (AWS)
  • MLOps with Databricks – LinkedIn Learning
  • AWS Cloud Practitioner Essentials – Amazon Web Services (AWS)
  • Applied Data Science – Prospect 33 (Global Data Lab)
  • Data Science Application in Financial Services – Prospect 33 (Global Data Lab)
  • AWS: Introduction to Generative AI – Amazon Web Services (AWS)
  • Data & Databases – Prospect 33 (Global Data Lab)
  • ETL & ELT – Prospect 33 (Global Data Lab)
  • Data Quality Framework – Prospect 33 (Global Data Lab)

💼 Recent Roles

Lead - Data Science/AI Innovation

CCI Global, Nairobi, Kenya
Jul 2025 – Present

  • Leading a team of AI developers building scalable solutions in ML/AI, NLP, GenAI, and automation, and bridging strategy and technology to deliver secure, impactful AI products.
  • Leading the design and deployment of AI tools, including sentiment analysis systems and QA automation platforms built with Python, LLMs, and RAG frameworks, to streamline business processes and reduce manual overhead.
  • Building a strong AI team from the ground up: mentoring junior developers, instilling a culture of innovation, and aligning technical delivery with strategic business goals across multiple departments.
  • Institutionalizing AI best practices across security, compliance, and model governance while identifying internal tools with commercialization potential, positioning the team as a strategic innovation hub.

Data Scientist - Global Data Lab

Prospect 33, New York, USA (Remote – Nairobi, Kenya)
Feb 2025 – Jun 2025

  • Developed and deployed active learning–based anomaly detection solutions and collaborated with stakeholders to design integration strategies for them across business systems.
  • Implemented an active-learning Isolation Forest pipeline incorporating domain feedback to detect fraud anomalies while improving recall and significantly reducing false positives.
  • Engineered robust anomaly detection platforms (DIVA and LEAP) for capital markets, integrating explainability (SHAP, LIME, gradients) and scalable backend/frontend stacks (Django, DRF, Python) to support real-time analysis.
  • Acted as a bridge between technical teams and business leaders, aligning model development with enterprise needs and supporting regulatory transparency across financial risk analytics initiatives.

Data Scientist - Fraud and Credit Risk

Twino (Moneza Kenya), Westlands, Nairobi, Kenya
Jul 2022 – Jan 2025

  • Developed and managed credit risk models for PDL and BNPL products, collaborating with stakeholders to deploy them effectively—supporting risk mitigation, enhancing credit decisions, and improving overall portfolio performance.
  • Created production-ready credit scoring and fraud models using XGBoost and CatBoost with SEON and TransUnion data sources, boosting approval accuracy while reducing fraudulent applications.
  • Built real-time dashboards and automated reports using PostgreSQL and Preset to track defaults, payments, and collections—enabling data-driven decisions and faster credit risk assessments.
  • Led a team of analysts while collaborating across business and compliance units, integrating model outputs into decision pipelines and reinforcing governance in credit and fraud operations.

💻 Featured Projects

🤖 Lending Automation - ML for Credit Scoring

This project delivers an automated loan approval system using ML models like Random Forest, XGBoost, and LightGBM. It replaces manual decisions with fast, scalable processes. Key tasks included cleaning data, feature engineering, and evaluating model fairness—enabling personalized lending, dynamic pricing, and more accurate, data-driven credit scoring.

🛡️ Anomaly and Fraud Detection in Finance

This project uses PyOD and FLAML AutoML to detect anomalies in imbalanced credit card datasets. Models like Isolation Forest and Autoencoders are evaluated using precision, recall, and ROC-AUC. Sampling techniques (SMOTE, oversampling) enhance model performance, producing a high-precision fraud detection pipeline for robust financial risk management.

🌍 Air Quality Monitoring in Nicosia, Cyprus

This project calibrates low-cost air sensors using ML models like XGBoost and ANN to match reference-grade accuracy. It analyzes six months of data to study sampling strategies, calibration frequency, and environmental interference. Results support affordable, large-scale urban air quality monitoring that meets EU and EPA standards.

🔮 Vincent Chatbot – Personalized AI Assistant

This project builds an AI chatbot powered by large language models to answer questions about Vincent’s professional profile. Using vector embeddings and FastAPI, it enables contextual responses based on uploaded documents. Deployed on Render.com, it demonstrates skills in NLP, API development, and cloud deployment for personalized AI applications.

🏥 Healthcare Accessibility in Nairobi, Kenya

This project evaluates healthcare access in Nairobi using demographic and facility data. Anchored in SDG 3, it identifies service gaps by analyzing population coverage, operational hours, and accessibility. Insights support policy interventions aimed at strengthening healthcare delivery and promoting well-being for all city residents.

🌐 My Personal Portfolio Website

This project is my personal website, which hosts a portfolio of my work and highlights key projects in data science and AI. Built with HTML, CSS, and JavaScript, it focuses on responsive, user-friendly layouts. Working on it developed my skills in web design, layout structuring, and interactive interface development, all aimed at presenting information clearly.

🩺 COVID-19 Detection using Deep Learning

This project uses CNNs (ResNet50, DenseNet169, MobileNetV2) to detect COVID-19 from chest CT scans. Through preprocessing, augmentation, and ensemble modeling, it achieves high accuracy. Transfer learning boosts performance on real-world data, providing fast and reliable diagnosis support for clinical decision-making in pandemic response.

📉 Customer Churn Prediction

This project predicts customer churn using survival analysis and ML models, enabling telecoms to target at-risk users. It analyzes historical behavior to estimate churn risk and customer lifetime value. An interactive tool supports data-driven retention strategies, helping reduce attrition and improve loyalty through personalized interventions.

📊 Sales Forecasting

This project analyzes sales data using ARIMA, SARIMA, and Prophet to forecast trends and seasonality. The workflow includes preprocessing, stationarity testing, model tuning, and evaluation. Insights help optimize inventory, resource planning, and decision-making—reducing forecast errors and improving operational efficiency through accurate, data-driven sales forecasting.

📞 Get In Touch

Let's discuss your next data science project

📰 Weekly Newsletter

Stay updated with data science insights