📊 Customer Churn Prediction using Machine Learning

An end-to-end Machine Learning project that predicts whether a telecom customer is likely to Churn or Stay using customer demographics, subscription details, billing information, and service usage patterns.

The project follows the complete Machine Learning lifecycle, including:

Data Cleaning
Exploratory Data Analysis (EDA)
Feature Engineering
Feature Scaling
Model Training
Model Comparison
Model Evaluation
Model Deployment using Streamlit

A total of 8 Machine Learning classification algorithms were trained and evaluated. Logistic Regression achieved the highest accuracy (80.38%) and F1-score (60.91%) and was selected as the final model for deployment.

🌟 Project Highlights

🤖 Compared 8 Machine Learning Models
📈 Achieved 80.38% Accuracy
📊 Interactive Streamlit Dashboard
⚡ Real-Time Customer Prediction
📉 Customer Risk Assessment
💾 Model Serialization using Joblib
🧠 End-to-End Machine Learning Pipeline

📑 Table of Contents

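Business Problem
Features
Project Structure
Dataset
Machine Learning Workflow
Technologies Used
Machine Learning Models Evaluated
Best Model
Results
Application Screenshots
Exploratory Data Analysis (EDA)
Installation Guide
How to Use
Example Predictions
Skills Demonstrated
Learning Outcomes
Business Insights
Future Improvements
Project Roadmap
Contributing
Acknowledgements
License
Developer
Support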

🎯 Business Problem

Customer churn is one of the biggest challenges for telecom companies and subscription-based businesses.

Retaining an existing customer is often much more cost-effective than acquiring a new one. By accurately predicting customers who are likely to leave, businesses can take proactive actions such as:

Offering personalized discounts
Improving customer support
Providing loyalty rewards
Enhancing customer satisfaction
Reducing revenue loss

This project demonstrates how Machine Learning can assist businesses in identifying customers at high risk of churning and support data-driven decision-making.

🚀 Features

📊 Customer Churn Prediction
🤖 Comparison of 8 Machine Learning Algorithms
🧹 Data Cleaning & Preprocessing
🔄 Missing Value Handling
🔢 One-Hot Encoding
📏 Feature Scaling using StandardScaler
📈 Exploratory Data Analysis (EDA)
📉 Correlation Heatmap
📊 Model Performance Comparison
🎯 Customer Risk Assessment
📈 Prediction Probability
🌐 Interactive Streamlit Dashboard
💾 Model Serialization using Joblib
⚡ Real-Time Customer Prediction

📂 Project Structure

Customer_Churn_Prediction/
│
├── app/
│   └── app.py
│
├── dataset/
│   └── WA_Fn-UseC_-Telco-Customer-Churn.csv
│
├── model/
│   ├── churn_model.pkl
│   ├── scaler.pkl
│   └── model_columns.pkl
│
├── notebook/
│   └── customer_churn_prediction.ipynb
│
├── output/
│   ├── app_home.png
│   ├── stay_prediction.png
│   ├── churn_prediction.png
│   ├── churn_distribution.png
│   ├── contract_distribution.png
│   ├── internet_service_distribution.png
│   ├── correlation_heatmap.png
│   ├── model_comparison.png
│   └── model_comparison.csv
│
├── README.md
├── requirements.txt
├── LICENSE
└── .gitignore
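The three files in model/ indicate that the trained classifier, the fitted scaler, and the list of one-hot encoded column names are stored separately. A minimal sketch of saving and reloading such artifacts with joblib.dump and joblib.load, assuming they are a scikit-learn estimator, a StandardScaler, and a plain Python list; the helper names save_artifacts and load_artifacts and the four toy rows are hypothetical:

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# File names taken from the model/ folder above.
ARTIFACT_FILES = ("churn_model.pkl", "scaler.pkl", "model_columns.pkl")


def save_artifacts(model, scaler, columns, model_dir="model"):
    """Write the classifier, the fitted scaler and the column list with joblib."""
    out = Path(model_dir)
    out.mkdir(parents=True, exist_ok=True)
    for obj, filename in zip((model, scaler, list(columns)), ARTIFACT_FILES):
        joblib.dump(obj, out / filename)


def load_artifacts(model_dir="model"):
    """Load the three artifacts back in the same order they were saved."""
    base = Path(model_dir)
    return tuple(joblib.load(base / filename) for filename in ARTIFACT_FILES)


# Toy demo: fit on four made-up rows, save to a temporary folder, reload.
X = np.array([[1.0, 70.0], [60.0, 55.0], [2.0, 105.0], [48.0, 40.0]])
y = np.array([1, 0, 1, 0])
scaler = StandardScaler().fit(X)
model = LogisticRegression().fit(scaler.transform(X), y)

with tempfile.TemporaryDirectory() as tmp:
    save_artifacts(model, scaler, ["tenure", "MonthlyCharges"], tmp)
    loaded_model, loaded_scaler, loaded_columns = load_artifacts(tmp)

print(loaded_columns)
```

Keeping the column list alongside the model matters because a single customer entered in the app produces only a subset of the dummy columns, and the saved list is what lets the input be realigned to the training layout.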

📊 Dataset

📌 Dataset Name

IBM Telco Customer Churn Dataset

The dataset contains customer demographic information, subscribed services, billing details, contract information, payment methods, and whether the customer churned.

📈 Dataset Summary

Attribute	Value
Total Customers	7,043
Columns	21 (customerID, 19 input features, and the Churn target)
Target Variable	Churn
Classes	Yes / No
Missing Values	Handled during preprocessing
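In the public version of this dataset, TotalCharges is stored as text and contains blank strings for a small number of new customers (tenure 0), so it has to be converted before modelling. The README does not say exactly how those rows are handled; a minimal sketch assuming they are dropped (filling them with 0 is an equally common choice), using the dataset's original column names and a few made-up rows:

```python
import pandas as pd


def clean_telco(df: pd.DataFrame) -> pd.DataFrame:
    """Minimal cleaning sketch for the IBM Telco churn CSV."""
    df = df.copy()
    # TotalCharges is read as text because some rows hold a blank string;
    # coerce to numeric so the blanks become NaN.
    df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
    # Assumption: drop rows whose TotalCharges could not be parsed.
    df = df.dropna(subset=["TotalCharges"])
    # customerID is an identifier, not a predictive feature.
    df = df.drop(columns=["customerID"])
    # Encode the target as 1 (churned) / 0 (stayed).
    df["Churn"] = df["Churn"].map({"Yes": 1, "No": 0})
    return df


raw = pd.DataFrame({
    "customerID": ["0001-A", "0002-B", "0003-C"],
    "tenure": [0, 12, 60],
    "MonthlyCharges": [50.0, 70.5, 55.0],
    "TotalCharges": [" ", "846.0", "3300.0"],
    "Churn": ["No", "Yes", "No"],
})
clean = clean_telco(raw)
print(clean)
```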

📋 Dataset Features

Some important features include:

Gender
Senior Citizen
Partner
Dependents
Tenure
Phone Service
Multiple Lines
Internet Service
Online Security
Online Backup
Device Protection
Tech Support
Streaming TV
Streaming Movies
Contract Type
Paperless Billing
Payment Method
Monthly Charges
Total Charges

🔄 Machine Learning Workflow

The project follows an end-to-end Machine Learning pipeline.

Dataset
    │
    ▼
Data Cleaning
    │
    ▼
Exploratory Data Analysis (EDA)
    │
    ▼
Feature Engineering
    │
    ▼
One-Hot Encoding
    │
    ▼
Feature Scaling
    │
    ▼
Train-Test Split
    │
    ▼
Model Training
    │
    ▼
Model Comparison
    │
    ▼
Best Model Selection
    │
    ▼
Model Serialization
    │
    ▼
Streamlit Deployment
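The workflow above can be outlined in a few lines of pandas and scikit-learn. This is an illustrative sketch on a small synthetic DataFrame, not the notebook's code; the column subset, drop_first=True, and the split parameters are assumptions. One deliberate ordering choice: the sketch splits the data before fitting StandardScaler, so the scaler only sees training rows and no test-set statistics leak into preprocessing.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Small synthetic stand-in for the cleaned Telco data (hypothetical values).
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "tenure": rng.integers(0, 72, n),
    "MonthlyCharges": rng.uniform(20, 120, n).round(2),
    "Contract": rng.choice(["Month-to-month", "One year", "Two year"], n),
})
# Synthetic label: short-tenure month-to-month customers churn.
df["Churn"] = ((df["Contract"] == "Month-to-month") & (df["tenure"] < 24)).astype(int)

# One-hot encode categorical columns; drop_first removes one redundant dummy.
X = pd.get_dummies(df.drop(columns="Churn"), drop_first=True, dtype=int)
y = df["Churn"]
model_columns = list(X.columns)

# Split first, then fit the scaler on the training portion only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
scaler = StandardScaler().fit(X_train)
model = LogisticRegression(max_iter=1000)
model.fit(scaler.transform(X_train), y_train)

accuracy = accuracy_score(y_test, model.predict(scaler.transform(X_test)))
print(f"Test accuracy: {accuracy:.2%}")
```

Here model_columns is the kind of list that would be saved as model_columns.pkl; the accuracy printed on this toy data says nothing about the real dataset.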

🛠 Technologies Used

Category	Technologies
Programming Language	Python
Data Manipulation	Pandas, NumPy
Data Visualization	Matplotlib, Seaborn
Machine Learning	Scikit-learn, XGBoost
Model Serialization	Joblib
Web Framework	Streamlit
IDE	Visual Studio Code, Jupyter Notebook
Version Control	Git & GitHub

🤖 Machine Learning Models Evaluated

The following supervised classification algorithms were trained and evaluated, ranked by accuracy. Precision, Recall, and F1-Score refer to the Churn (positive) class.

Rank	Model	Accuracy	Precision	Recall	F1-Score
🥇	Logistic Regression	80.38%	64.76%	57.49%	60.91%
🥈	Gradient Boosting	79.53%	63.78%	53.21%	58.02%
🥉	Random Forest	78.82%	62.34%	51.34%	56.31%
4	Support Vector Machine	78.68%	62.59%	49.20%	55.09%
5	Extra Trees	77.40%	59.21%	48.13%	53.10%
6	XGBoost	77.11%	57.60%	52.67%	55.03%
7	K-Nearest Neighbors	75.34%	53.62%	53.48%	53.55%
8	Decision Tree	71.64%	46.61%	45.99%	46.30%
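A comparison like the one above can be produced with a loop over scikit-learn estimators. A sketch on synthetic data from make_classification (about 27% positive class, loosely mimicking the churn imbalance) rather than the Telco CSV; all models use default hyperparameters, which are not necessarily those used in the notebook, and XGBoost is left out so the sketch only needs scikit-learn (xgboost.XGBClassifier could be added to the dictionary):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced binary data standing in for the encoded Telco features.
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.73],
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Support Vector Machine": SVC(random_state=42),
    "Extra Trees": ExtraTreesClassifier(random_state=42),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}

rows = []
for name, clf in models.items():
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    # Precision, recall and F1 are computed for the positive (churn) class.
    rows.append({
        "Model": name,
        "Accuracy": accuracy_score(y_test, pred),
        "Precision": precision_score(y_test, pred, zero_division=0),
        "Recall": recall_score(y_test, pred, zero_division=0),
        "F1-Score": f1_score(y_test, pred, zero_division=0),
    })

# Rank by accuracy, as in the table above.
results = pd.DataFrame(rows).sort_values("Accuracy", ascending=False)
results = results.reset_index(drop=True)
print(results.round(4))
```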

🏆 Best Model

After comparing all eight models, Logistic Regression achieved the highest accuracy and F1-score and was selected as the final model for deployment.

Metric	Score
Accuracy	80.38%
Precision	64.76%
Recall	57.49%
F1-Score	60.91%
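As a quick consistency check, each F1-Score reported above should equal the harmonic mean of the corresponding Precision and Recall, F1 = 2PR / (P + R). A small standard-library sketch that recomputes it from the reported percentages; because Precision and Recall are themselves rounded to two decimals, the recomputed values agree with the table to within rounding rather than exactly:

```python
# Precision, recall and F1-score (percent) copied from the comparison table.
REPORTED = {
    "Logistic Regression": (64.76, 57.49, 60.91),
    "Gradient Boosting": (63.78, 53.21, 58.02),
    "Random Forest": (62.34, 51.34, 56.31),
    "Support Vector Machine": (62.59, 49.20, 55.09),
    "Extra Trees": (59.21, 48.13, 53.10),
    "XGBoost": (57.60, 52.67, 55.03),
    "K-Nearest Neighbors": (53.62, 53.48, 53.55),
    "Decision Tree": (46.61, 45.99, 46.30),
}


def f1_from(precision: float, recall: float) -> float:
    """F1-score is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for name, (p, r, reported_f1) in REPORTED.items():
    recomputed = f1_from(p, r)
    # Small differences come from precision and recall being rounded to 2 d.p.
    print(f"{name:<24} reported {reported_f1:6.2f}  recomputed {recomputed:6.2f}")
```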

Why Logistic Regression?

Highest Accuracy, Precision, Recall, and F1-Score among all eight evaluated models.
Reasonable balance between Precision (64.76%) and Recall (57.49%).
Computationally efficient and easy to interpret.
Performs well on structured tabular datasets.
Suitable for real-time prediction in a Streamlit application.

📈 Model Performance Visualization

The notebook also generates a chart comparing the performance of all evaluated algorithms.

📷 Chart saved to:

output/model_comparison.png

🎯 Results

The project successfully predicts whether a telecom customer is likely to Churn or Stay by leveraging customer demographic information, subscribed services, billing details, and account history.

Key Achievements

✅ Built an end-to-end Machine Learning pipeline
✅ Compared 8 Machine Learning algorithms
✅ Achieved 80.38% accuracy using Logistic Regression
✅ Developed an interactive Streamlit dashboard
✅ Implemented customer risk assessment
✅ Displayed prediction confidence scores
✅ Generated real-time customer churn predictions

📷 Application Screenshots

🏠 Home Page

📊 Model Comparison

✅ Customer Stay Prediction

⚠️ Customer Churn Prediction

📈 Exploratory Data Analysis (EDA)

The following visualizations were generated during data analysis.

Customer Churn Distribution

Contract Type Distribution

Internet Service Distribution

Correlation Heatmap

💻 Installation Guide

Clone the Repository

git clone https://github.com/harshvardhan4096/Customer_Churn_Prediction.git

Navigate to the Project Directory

cd Customer_Churn_Prediction

Create a Virtual Environment (Optional)

Windows

python -m venv venv
venv\Scripts\activate

Linux / macOS

python3 -m venv venv
source venv/bin/activate

Install Required Packages

pip install -r requirements.txt

Launch the Streamlit Application

streamlit run app/app.py

Streamlit serves the app at http://localhost:8501 by default and usually opens it in your browser automatically.

🚀 How to Use

1. Launch the Streamlit application.
2. Enter the customer's information.
3. Select customer services and billing details.
4. Click the 🚀 Predict Customer Churn button.
5. View:
   - Prediction Result
   - Stay/Churn Probability
   - Customer Risk Level
   - Prediction Summary

🧪 Example Predictions

Example 1

Customer Details

Feature	Value
Contract	Month-to-month
Internet Service	Fiber Optic
Payment Method	Electronic Check
Tenure	2 Months
Monthly Charges	$105

Prediction

⚠️ Customer Will Churn

Risk Level: 🔴 HIGH

Example 2

Customer Details

Feature	Value
Contract	Two Year
Internet Service	DSL
Payment Method	Credit Card (Automatic)
Tenure	60 Months
Monthly Charges	$55

Prediction

✅ Customer Will Stay

Risk Level: 🟢 LOW
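Turning form inputs like these into a model input requires encoding a single row the same way as the training data. The README does not show app.py, so the following is only a sketch of one common approach, assuming model_columns.pkl holds the one-hot encoded column names: encode the row with pd.get_dummies, then reindex it to the training columns so categories absent from this row become 0. The column list and the 0.5 risk threshold are hypothetical. Note that the raw dataset spells category values such as "Fiber optic" and "Two year", so form values must match those strings exactly for the dummy columns to line up.

```python
import pandas as pd


def encode_customer(customer: dict, model_columns: list) -> pd.DataFrame:
    """One-hot encode one customer and align it to the training columns."""
    row = pd.get_dummies(pd.DataFrame([customer]), dtype=int)
    # Dummy columns not produced by this single row are added as 0;
    # columns the model never saw (e.g. Contract_Month-to-month) are dropped.
    return row.reindex(columns=model_columns, fill_value=0)


def risk_level(churn_probability: float, threshold: float = 0.5) -> str:
    """Hypothetical mapping; the app's actual cut-off is not documented."""
    return "HIGH" if churn_probability >= threshold else "LOW"


# Hypothetical training columns in the style of model_columns.pkl.
columns = ["tenure", "MonthlyCharges",
           "Contract_One year", "Contract_Two year",
           "InternetService_Fiber optic", "InternetService_No"]
customer = {"tenure": 2, "MonthlyCharges": 105.0,
            "Contract": "Month-to-month", "InternetService": "Fiber optic"}

encoded = encode_customer(customer, columns)
print(encoded)

# With the real artifacts loaded, the remaining steps would be roughly:
#   proba = model.predict_proba(scaler.transform(encoded))[0, 1]
#   print(risk_level(proba))
```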

📊 Skills Demonstrated

This project demonstrates practical knowledge of:

Machine Learning

Supervised Learning
Binary Classification
Model Evaluation
Model Comparison

Data Science

Data Cleaning
Exploratory Data Analysis (EDA)
Feature Engineering
Feature Scaling

Python Libraries

Pandas
NumPy
Matplotlib
Seaborn
Scikit-learn
XGBoost
Joblib

Deployment

Streamlit
Git
GitHub

📚 Learning Outcomes

Through this project, I gained practical experience in:

Building an end-to-end Machine Learning pipeline
Comparing multiple Machine Learning algorithms
Selecting the best-performing model
Feature engineering and preprocessing
Developing interactive Streamlit dashboards
Deploying Machine Learning applications
Version control using Git & GitHub

📈 Business Insights

Based on the analysis and model predictions, the following factors were observed to be strongly associated with customer churn:

📉 Customers with Month-to-Month contracts are more likely to churn.
💳 Customers using Electronic Check as their payment method tend to have a higher churn rate.
📅 Customers with short tenure are more likely to leave the service.
🌐 Customers using Fiber Optic Internet show comparatively higher churn.
💰 Higher Monthly Charges are associated with increased churn probability.
🤝 Customers with long-term contracts and automatic payment methods are generally less likely to churn.

These insights can help telecom companies develop effective customer retention strategies, improve customer satisfaction, and reduce revenue loss.
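Observations like these can be checked by comparing churn rates across the categories of a feature. A minimal pandas sketch, assuming Churn has already been encoded as 1/0; the helper name churn_rate_by is hypothetical and the rows are toy values, not results from the dataset:

```python
import pandas as pd


def churn_rate_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Share of churned customers (Churn == 1) within each category."""
    return df.groupby(column)["Churn"].mean().sort_values(ascending=False)


# Toy rows; on the real data, load and clean the CSV first.
df = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "Month-to-month",
                 "One year", "Two year", "Two year"],
    "Churn": [1, 1, 0, 0, 0, 0],
})
rates = churn_rate_by(df, "Contract")
print(rates)
```

The same call with "PaymentMethod" or "InternetService" covers the other categorical insights; for numeric features such as tenure, binning first (for example with pd.cut) gives a comparable view.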

🚀 Future Improvements

This project can be further enhanced by implementing the following features:

🔍 Hyperparameter Tuning using GridSearchCV or RandomizedSearchCV
🧠 Explainable AI using SHAP or LIME
🤖 Deep Learning models (ANN, LSTM)
🌍 Cloud Deployment (AWS, Azure, Google Cloud)
🐳 Docker Containerization
⚡ REST API using FastAPI or Flask
📱 Mobile-Friendly User Interface
📈 Interactive Business Dashboard using Plotly
🔄 Automated Model Retraining Pipeline
☁️ CI/CD Pipeline using GitHub Actions
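For the first planned improvement, a sketch of tuning the regularisation strength C of a Logistic Regression model with GridSearchCV, scored on F1 because churners are the minority class. The synthetic data and the grid values are illustrative assumptions; placing StandardScaler inside the pipeline means it is refit on each cross-validation training fold:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced data standing in for the encoded Telco features.
X, y = make_classification(n_samples=500, n_features=8, weights=[0.73],
                           random_state=0)

# make_pipeline names each step after its class in lowercase.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 4))
```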

🛣️ Project Roadmap

✅ Completed

Dataset Collection
Data Cleaning
Exploratory Data Analysis
Feature Engineering
One-Hot Encoding
Feature Scaling
Model Training
Comparison of 8 Machine Learning Models
Best Model Selection
Streamlit Dashboard
Model Deployment
GitHub Repository

🔄 Planned Improvements

Hyperparameter Optimization
Explainable AI
REST API Development
Docker Support
Cloud Deployment
Automated CI/CD Pipeline

🤝 Contributing

Contributions, suggestions, and feature requests are always welcome.

If you'd like to contribute:

Fork the repository
Create a new feature branch

git checkout -b feature-name

Commit your changes

git commit -m "Add new feature"

Push your branch

git push origin feature-name

Open a Pull Request

🙏 Acknowledgements

Special thanks to the following resources:

IBM for providing the Telco Customer Churn Dataset
Scikit-learn Documentation
Streamlit
Pandas
NumPy
Matplotlib
Seaborn
XGBoost
Open Source Community

📜 License

This project is licensed under the MIT License.

Feel free to use, modify, and distribute this project in accordance with the license terms.

👨‍💻 Developer

Harsh Vardhan Chaudhary

B.Tech – Computer Science & Engineering

SRM Institute of Science & Technology

Connect with Me

💻 GitHub: https://github.com/harshvardhan4096
💼 LinkedIn: https://www.linkedin.com/in/connect-harsh-vardhan/

⭐ Support

If you found this project useful, please consider:

⭐ Star this repository

🍴 Fork the repository

📝 Share your feedback

Your support motivates me to continue building Machine Learning and Artificial Intelligence projects.
