Customer Churn Prediction

An end-to-end machine learning project that predicts which customers are likely to churn and highlights the most influential factors behind the risk.

What was done

Data processing

Cleaned the customer records, removed customer identifiers and any fields that leak the churn outcome, encoded categorical fields, and prepared the dataset for modeling.
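The cleaning steps above can be sketched in plain Python as follows. This is a minimal illustration, not the project's code: the column names (`customerID`, `ChurnReason`, `Contract`, `InternetService`, `tenure`, `Churn`) are hypothetical, and `ChurnReason` stands in for a leakage field that is only filled in after a customer has already churned.

```python
# Hypothetical schema: column names below are assumptions, not the project's.
ID_COLUMNS = {"customerID"}
LEAKAGE_COLUMNS = {"ChurnReason"}  # only filled in after a customer has churned
CATEGORICAL_COLUMNS = ["Contract", "InternetService"]
TARGET = "Churn"


def preprocess(records):
    """Drop identifiers and leakage, one-hot encode categoricals, split off the label."""
    # Category levels observed for each categorical column, in a stable order.
    levels = {col: sorted({r[col] for r in records}) for col in CATEGORICAL_COLUMNS}
    features, labels = [], []
    for r in records:
        row = {}
        for col, value in r.items():
            if col in ID_COLUMNS or col in LEAKAGE_COLUMNS or col == TARGET:
                continue  # no predictive signal, or would leak the answer
            if col in levels:
                # One 0/1 indicator column per observed level.
                for level in levels[col]:
                    row[f"{col}={level}"] = 1.0 if value == level else 0.0
            else:
                row[col] = float(value)  # remaining columns assumed numeric
        features.append(row)
        labels.append(1 if r[TARGET] == "Yes" else 0)
    return features, labels


records = [
    {"customerID": "0001", "tenure": "2", "Contract": "Month-to-month",
     "InternetService": "Fiber optic", "ChurnReason": "Competitor offer", "Churn": "Yes"},
    {"customerID": "0002", "tenure": "40", "Contract": "Two year",
     "InternetService": "DSL", "ChurnReason": "", "Churn": "No"},
]
X, y = preprocess(records)
print(X[0])
print(y)
```

Dropping a leakage column matters because a field like this one perfectly reveals the label during training but is unavailable when scoring current customers.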

Training

Trained an interpretable logistic regression classifier and validated it with standard classification metrics to confirm that its churn predictions are reliable.
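For readers unfamiliar with the model, here is a from-scratch sketch of logistic regression trained by full-batch gradient descent, plus accuracy, precision, recall, and F1. It is an illustration only: the project likely used a library implementation, the exact metrics it reported are not stated (the four here are an assumption), and the toy single-feature data is made up. A real validation would compute these metrics on a held-out split rather than on the training data.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.1, epochs=1000, l2=0.0):
    """Fit weights w and bias b by full-batch gradient descent on mean log-loss.

    l2 adds an optional (l2 / 2) * ||w||^2 penalty to the loss.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of log-loss w.r.t. the logit is (predicted - actual).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, X):
    """Churn probability for each row of X."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, recall and F1 at a fixed probability threshold."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = len(y_true) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}


# Toy, linearly separable data (made up for illustration).
X_train = [[0.0], [0.2], [0.8], [1.0]]
y_train = [0, 0, 1, 1]
w, b = train_logistic_regression(X_train, y_train, lr=0.5, epochs=2000)
metrics = classification_metrics(y_train, predict_proba(w, b, X_train))
print(w, b, metrics)
```

Recall is usually the metric to watch for churn, since a missed churner costs more than an unnecessary retention contact.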

Dashboard and reporting

Exported the model's results for use in a Power BI dashboard and produced a clear report for stakeholders.
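One way to hand scores to a dashboard tool is a flat CSV with one row per customer. The sketch below is an assumption about the export format, not the project's actual file: the column names `customer_id` and `churn_probability` are hypothetical, and rows are sorted so the highest-risk customers appear first.

```python
import csv
import io


def export_scores(customer_ids, probabilities, stream):
    """Write one row per customer to a CSV stream, highest churn probability first."""
    writer = csv.DictWriter(stream, fieldnames=["customer_id", "churn_probability"])
    writer.writeheader()
    ranked = sorted(zip(customer_ids, probabilities), key=lambda pair: pair[1], reverse=True)
    for cid, p in ranked:
        writer.writerow({"customer_id": cid, "churn_probability": round(p, 4)})


# In practice the stream would be a file, e.g. open("churn_scores.csv", "w", newline="").
buffer = io.StringIO()
export_scores(["0001", "0002", "0003"], [0.12, 0.87, 0.45], buffer)
print(buffer.getvalue())
```

Writing to any text stream keeps the function easy to test; opening the file with `newline=""` prevents the `csv` module from producing blank lines on Windows.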

Insights

Explained the model's churn-risk predictions using feature importance and SHAP visualizations.
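For a logistic regression, SHAP values have a simple closed form if features are treated as independent: in log-odds units, feature j contributes w_j * (x_j - mean_j), and the contributions plus a base value add up exactly to the model's logit. The sketch below shows that formula and a global importance ranking by mean absolute SHAP value. The project presumably computed its values with the `shap` package; the weights, feature names, and data here are hypothetical.

```python
def feature_means(X):
    """Column means of the background data."""
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]


def linear_shap_values(w, means, x):
    """SHAP values (log-odds units) for a linear model, assuming independent features."""
    return [wj * (xj - mj) for wj, xj, mj in zip(w, x, means)]


def mean_abs_shap(w, X):
    """Global importance: average magnitude of each feature's SHAP value over X."""
    means = feature_means(X)
    rows = [linear_shap_values(w, means, x) for x in X]
    return [sum(abs(r[j]) for r in rows) / len(rows) for j in range(len(w))]


# Hypothetical fitted model and data: [tenure_months, month_to_month_contract].
names = ["tenure_months", "month_to_month"]
w, b = [-0.05, 1.2], -1.0
X = [[2.0, 1.0], [40.0, 0.0], [12.0, 1.0], [60.0, 0.0]]

means = feature_means(X)
phi = linear_shap_values(w, means, X[0])
base = b + sum(wj * mj for wj, mj in zip(w, means))  # expected logit over X
logit = b + sum(wj * xj for wj, xj in zip(w, X[0]))
print(phi, base + sum(phi), logit)  # last two agree up to rounding (additivity)

imp = mean_abs_shap(w, X)
ranking = sorted(zip(names, imp), key=lambda t: t[1], reverse=True)
print(ranking)
```

The additivity check is what makes SHAP useful for explaining individual customers: each feature's value is an exact share of how far that customer's risk sits from the average.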

Results

The most important project outputs are linked below: the Power BI dashboard (PDF), the raw churn dataset, and the SHAP values image.

Power BI dashboard (PDF)
Raw churn dataset (XLSX)
SHAP values image

See the GitHub repo for full code and notes:

github.com/whitemustang47/Customer-Churn-Prediction

Key takeaways

1. Churn is concentrated in a small segment

Most customers are stable, but a small High-risk group has a very high churn probability. This lets Customer Success teams focus their efforts where they matter most.

2. Risk segmentation is actionable

Customers are divided into Low, Medium, and High risk tiers. This supports targeted retention strategies instead of broad, inefficient campaigns.

3. High-risk customers drive nearly all churn

Almost all churn comes from the High-risk segment, so reducing churn in this group has the biggest impact on overall churn.

4. Clear understanding of churn drivers

SHAP analysis highlights the main factors influencing churn (e.g., contract type, tenure, internet service). This gives product, pricing, and customer success teams concrete levers to reduce churn.

5. Customer-level insights enable direct action

The dashboard identifies which individual customers are at risk, enabling personalized outreach and data-driven retention decisions.
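Takeaways 1 to 3 rest on two steps: mapping each predicted probability to a risk tier, then comparing churn across tiers. The sketch below shows both. The cut-offs (0.3 and 0.6) are hypothetical because the project's thresholds are not stated, and the example data is made up.

```python
from collections import Counter

# Hypothetical cut-offs, checked from highest to lowest.
TIER_CUTOFFS = [(0.6, "High"), (0.3, "Medium"), (0.0, "Low")]


def risk_tier(probability):
    """Map a churn probability to a Low / Medium / High tier."""
    for cutoff, tier in TIER_CUTOFFS:
        if probability >= cutoff:
            return tier
    return "Low"


def tier_summary(probabilities, churned):
    """Per tier: customer count, churn rate, and share of all churners."""
    tiers = [risk_tier(p) for p in probabilities]
    size = Counter(tiers)
    churners = Counter(t for t, c in zip(tiers, churned) if c)
    total_churners = sum(churners.values())
    summary = {}
    for _, tier in TIER_CUTOFFS:
        summary[tier] = {
            "customers": size[tier],
            "churn_rate": churners[tier] / size[tier] if size[tier] else 0.0,
            "share_of_churn": churners[tier] / total_churners if total_churners else 0.0,
        }
    return summary


# Made-up scores and observed outcomes (1 = churned).
probabilities = [0.05, 0.1, 0.2, 0.35, 0.5, 0.7, 0.8, 0.9]
churned = [0, 0, 0, 0, 1, 1, 1, 1]
for tier, stats in tier_summary(probabilities, churned).items():
    print(tier, stats)
```

A High tier with a small `customers` count but a large `share_of_churn` is the pattern the takeaways describe: retention effort spent there covers most of the churn.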