harshvardhan4096/Customer_Churn_Prediction

📊 Customer Churn Prediction using Machine Learning


An end-to-end Machine Learning project that predicts whether a telecom customer is likely to Churn or Stay using customer demographics, subscription details, billing information, and service usage patterns.

The project follows the complete Machine Learning lifecycle, including:

  • Data Cleaning
  • Exploratory Data Analysis (EDA)
  • Feature Engineering
  • Feature Scaling
  • Model Training
  • Model Comparison
  • Model Evaluation
  • Model Deployment using Streamlit

Eight classification algorithms were trained and evaluated. Logistic Regression scored highest on all four reported metrics (Accuracy, Precision, Recall, and F1-Score) and was selected as the final model for deployment.


🌟 Project Highlights

  • 🤖 Compared 8 Machine Learning Models
  • 📈 Achieved 80.38% Accuracy
  • 📊 Interactive Streamlit Dashboard
  • ⚡ Real-Time Customer Prediction
  • 📉 Customer Risk Assessment
  • 💾 Model Serialization using Joblib
  • 🧠 End-to-End Machine Learning Pipeline

📑 Table of Contents

  • Business Problem
  • Features
  • Project Structure
  • Dataset
  • Machine Learning Workflow
  • Technologies Used
  • Model Comparison
  • Results
  • Application Screenshots
  • Installation
  • Example Prediction
  • Learning Outcomes
  • Future Improvements
  • Contributing
  • License
  • Developer
  • Support

🎯 Business Problem

Customer churn is one of the biggest challenges for telecom companies and subscription-based businesses.

Retaining an existing customer is often much more cost-effective than acquiring a new one. By accurately predicting customers who are likely to leave, businesses can take proactive actions such as:

  • Offering personalized discounts
  • Improving customer support
  • Providing loyalty rewards
  • Enhancing customer satisfaction
  • Reducing revenue loss

This project demonstrates how Machine Learning can assist businesses in identifying customers at high risk of churning and support data-driven decision-making.


🚀 Features

  • 📊 Customer Churn Prediction
  • 🤖 Comparison of 8 Machine Learning Algorithms
  • 🧹 Data Cleaning & Preprocessing
  • 🔄 Missing Value Handling
  • 🔢 One-Hot Encoding
  • 📏 Feature Scaling using StandardScaler
  • 📈 Exploratory Data Analysis (EDA)
  • 📉 Correlation Heatmap
  • 📊 Model Performance Comparison
  • 🎯 Customer Risk Assessment
  • 📈 Prediction Probability
  • 🌐 Interactive Streamlit Dashboard
  • 💾 Model Serialization using Joblib
  • ⚡ Real-Time Customer Prediction

📂 Project Structure

Customer_Churn_Prediction/
│
├── app/
│   └── app.py
│
├── dataset/
│   └── WA_Fn-UseC_-Telco-Customer-Churn.csv
│
├── model/
│   ├── churn_model.pkl
│   ├── scaler.pkl
│   └── model_columns.pkl
│
├── notebook/
│   └── customer_churn_prediction.ipynb
│
├── output/
│   ├── app_home.png
│   ├── stay_prediction.png
│   ├── churn_prediction.png
│   ├── churn_distribution.png
│   ├── contract_distribution.png
│   ├── internet_service_distribution.png
│   ├── correlation_heatmap.png
│   ├── model_comparison.png
│   └── model_comparison.csv
│
├── README.md
├── requirements.txt
├── LICENSE
└── .gitignore
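
The three files under model/ suggest that the app loads a fitted model, a fitted scaler, and the list of training columns at prediction time. The following is a minimal sketch of how such artifacts are commonly written and read back with Joblib. The toy rows, the temporary directory, and the `predict_one` helper are assumptions made for illustration; the actual notebook and app.py may differ.

```python
import tempfile
from pathlib import Path

import joblib
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Toy training frame (illustrative values only, not rows from the real dataset).
train = pd.DataFrame({
    "tenure": [1, 2, 60, 48, 3, 72],
    "MonthlyCharges": [95.0, 105.0, 55.0, 60.0, 99.0, 45.0],
    "Contract": ["Month-to-month", "Month-to-month", "Two year",
                 "One year", "Month-to-month", "Two year"],
})
y = [1, 1, 0, 0, 1, 0]

# One-hot encode, remember the resulting column order, then scale and fit.
X = pd.get_dummies(train, columns=["Contract"], dtype=float)
model_columns = list(X.columns)
scaler = StandardScaler().fit(X)
model = LogisticRegression().fit(scaler.transform(X), y)

# Persist with the file names from the tree above (written to a temp dir here).
model_dir = Path(tempfile.mkdtemp())
joblib.dump(model, model_dir / "churn_model.pkl")
joblib.dump(scaler, model_dir / "scaler.pkl")
joblib.dump(model_columns, model_dir / "model_columns.pkl")


def predict_one(customer: dict, model_dir: Path) -> float:
    """Return the churn probability for one customer (hypothetical helper)."""
    model = joblib.load(model_dir / "churn_model.pkl")
    scaler = joblib.load(model_dir / "scaler.pkl")
    columns = joblib.load(model_dir / "model_columns.pkl")
    row = pd.get_dummies(pd.DataFrame([customer]), dtype=float)
    # A single row only produces the dummy columns for its own categories,
    # so add the missing ones as 0 and restore the training column order.
    row = row.reindex(columns=columns, fill_value=0.0)
    return float(model.predict_proba(scaler.transform(row))[0, 1])


proba = predict_one(
    {"tenure": 2, "MonthlyCharges": 105.0, "Contract": "Month-to-month"},
    model_dir,
)
```

Storing the column list alongside the model is what keeps a one-row input aligned with the features seen during training.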

📊 Dataset

📌 Dataset Name

IBM Telco Customer Churn Dataset

The dataset contains customer demographic information, subscribed services, billing details, contract information, payment methods, and whether the customer churned.


📈 Dataset Summary

| Attribute | Value |
|---|---|
| Total Customers | 7,043 |
| Features | 20+ |
| Target Variable | Churn |
| Classes | Yes / No |
| Missing Values | Handled during preprocessing |
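
The summary does not say how missing values were handled. One commonly reported quirk of the public Telco file is that Total Charges can load as text with a few blank entries, so a typical first step is to coerce the column to numeric. The sketch below shows that step on toy rows; the column names follow the public CSV, and the fill-with-zero policy is an assumption rather than the notebook's confirmed approach.

```python
import pandas as pd

# Toy rows mimicking the issue (illustrative values, not from the real file).
df = pd.DataFrame({
    "tenure": [0, 12, 24],
    "MonthlyCharges": [50.0, 70.0, 80.0],
    "TotalCharges": [" ", "840.5", "1925.0"],
})

# Blank strings cannot be parsed as numbers, so coerce them to NaN.
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
n_missing = int(df["TotalCharges"].isna().sum())

# One possible policy (an assumption): treat a missing total as 0.0.
df["TotalCharges"] = df["TotalCharges"].fillna(0.0)
```

Dropping the affected rows or imputing a median are equally reasonable alternatives; the choice should be checked against the notebook.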

📋 Dataset Features

Some important features include:

  • Gender
  • Senior Citizen
  • Partner
  • Dependents
  • Tenure
  • Phone Service
  • Multiple Lines
  • Internet Service
  • Online Security
  • Online Backup
  • Device Protection
  • Tech Support
  • Streaming TV
  • Streaming Movies
  • Contract Type
  • Paperless Billing
  • Payment Method
  • Monthly Charges
  • Total Charges

🔄 Machine Learning Workflow

The project follows an end-to-end Machine Learning pipeline.

Dataset
    │
    ▼
Data Cleaning
    │
    ▼
Exploratory Data Analysis (EDA)
    │
    ▼
Feature Engineering
    │
    ▼
One-Hot Encoding
    │
    ▼
Feature Scaling
    │
    ▼
Train-Test Split
    │
    ▼
Model Training
    │
    ▼
Model Comparison
    │
    ▼
Best Model Selection
    │
    ▼
Model Serialization
    │
    ▼
Streamlit Deployment
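
The encoding, scaling, splitting, and training steps above can be sketched with a scikit-learn pipeline. This is an illustration under stated assumptions, not the notebook's code: the data is synthetic, the three columns are a small subset chosen for brevity, and the 80/20 stratified split is a guess. The sketch also splits before fitting the scaler, which differs from the order in the diagram; fitting preprocessing on the training split only is a common way to keep test-set statistics out of training.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data (NOT the Telco dataset): short tenure raises churn odds.
rng = np.random.default_rng(42)
n = 400
df = pd.DataFrame({
    "tenure": rng.integers(0, 73, size=n),
    "MonthlyCharges": rng.uniform(20, 120, size=n),
    "Contract": rng.choice(["Month-to-month", "One year", "Two year"], size=n),
})
logit = 1.0 - 0.06 * df["tenure"] + 0.01 * df["MonthlyCharges"]
df["Churn"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Split first so that preprocessing is fitted on training rows only.
X, y = df.drop(columns="Churn"), df["Churn"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scale numeric columns and one-hot encode the categorical column.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["tenure", "MonthlyCharges"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Contract"]),
])
clf = Pipeline([
    ("preprocess", preprocess),
    ("model", LogisticRegression(max_iter=1000)),
])
clf.fit(X_train, y_train)
accuracy = accuracy_score(y_test, clf.predict(X_test))
```

Bundling preprocessing and the estimator in one `Pipeline` also means a single object could be serialized instead of separate model and scaler files.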

🛠 Technologies Used

| Category | Technologies |
|---|---|
| Programming Language | Python |
| Data Manipulation | Pandas, NumPy |
| Data Visualization | Matplotlib, Seaborn |
| Machine Learning | Scikit-learn, XGBoost |
| Model Serialization | Joblib |
| Web Framework | Streamlit |
| IDE | Visual Studio Code, Jupyter Notebook |
| Version Control | Git & GitHub |

🤖 Machine Learning Models Evaluated

The following supervised Machine Learning algorithms were trained and evaluated.

| Rank | Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|---|
| 🥇 | Logistic Regression | 80.38% | 64.76% | 57.49% | 60.91% |
| 🥈 | Gradient Boosting | 79.53% | 63.78% | 53.21% | 58.02% |
| 🥉 | Random Forest | 78.82% | 62.34% | 51.34% | 56.31% |
| 4 | Support Vector Machine | 78.68% | 62.59% | 49.20% | 55.09% |
| 5 | Extra Trees | 77.40% | 59.21% | 48.13% | 53.10% |
| 6 | XGBoost | 77.11% | 57.60% | 52.67% | 55.03% |
| 7 | K-Nearest Neighbors | 75.34% | 53.62% | 53.48% | 53.55% |
| 8 | Decision Tree | 71.64% | 46.61% | 45.99% | 46.30% |
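
As a quick consistency check on the table, the F1-Score is the harmonic mean of Precision and Recall, so it can be recomputed from the other two columns. The sketch below does that for every row; the function and variable names are illustrative. Because the inputs are already rounded to two decimals, a recomputed value may differ from the reported one by about 0.01 percentage points.

```python
# (precision %, recall %, reported F1 %) copied from the table above.
reported = {
    "Logistic Regression": (64.76, 57.49, 60.91),
    "Gradient Boosting": (63.78, 53.21, 58.02),
    "Random Forest": (62.34, 51.34, 56.31),
    "Support Vector Machine": (62.59, 49.20, 55.09),
    "Extra Trees": (59.21, 48.13, 53.10),
    "XGBoost": (57.60, 52.67, 55.03),
    "K-Nearest Neighbors": (53.62, 53.48, 53.55),
    "Decision Tree": (46.61, 45.99, 46.30),
}


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Recompute F1 for each model and measure the largest gap to the reported value.
recomputed = {
    name: round(f1_from(p, r), 2) for name, (p, r, _) in reported.items()
}
max_gap = max(
    abs(recomputed[name] - f1) for name, (_, _, f1) in reported.items()
)
```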

🏆 Best Model

Across all eight models, Logistic Regression scored highest on every reported metric and was selected as the final model for deployment.

| Metric | Score |
|---|---|
| Accuracy | 80.38% |
| Precision | 64.76% |
| Recall | 57.49% |
| F1-Score | 60.91% |

Why Logistic Regression?

  • Highest accuracy (80.38%) among the evaluated models.
  • Highest Precision, Recall, and F1-Score as well, so the ranking does not depend on which metric is used.
  • Computationally efficient and easy to interpret.
  • Performs well on structured tabular datasets.
  • Suitable for real-time prediction in a Streamlit application.

📈 Model Performance Visualization

The notebook also generates a Model Comparison chart illustrating the performance of all evaluated algorithms.

📷 Screenshot available in:

output/model_comparison.png

🎯 Results

The project successfully predicts whether a telecom customer is likely to Churn or Stay by leveraging customer demographic information, subscribed services, billing details, and account history.

Key Achievements

  • ✅ Built an end-to-end Machine Learning pipeline
  • ✅ Compared 8 Machine Learning algorithms
  • ✅ Achieved 80.38% accuracy using Logistic Regression
  • ✅ Developed an interactive Streamlit dashboard
  • ✅ Implemented customer risk assessment
  • ✅ Displayed prediction confidence scores
  • ✅ Generated real-time customer churn predictions

📷 Application Screenshots

🏠 Home Page

Screenshot: output/app_home.png


📊 Model Comparison

Screenshot: output/model_comparison.png


✅ Customer Stay Prediction

Screenshot: output/stay_prediction.png


⚠️ Customer Churn Prediction

Screenshot: output/churn_prediction.png

📈 Exploratory Data Analysis (EDA)

The following visualizations were generated during data analysis.

Customer Churn Distribution

Screenshot: output/churn_distribution.png


Contract Type Distribution

Screenshot: output/contract_distribution.png


Internet Service Distribution

Screenshot: output/internet_service_distribution.png


Correlation Heatmap

Screenshot: output/correlation_heatmap.png


💻 Installation Guide

Clone the Repository

git clone https://github.com/harshvardhan4096/Customer_Churn_Prediction.git

Navigate to the Project Directory

cd Customer_Churn_Prediction

Create a Virtual Environment (Optional)

Windows

python -m venv venv
venv\Scripts\activate

Linux / macOS

python3 -m venv venv
source venv/bin/activate

Install Required Packages

pip install -r requirements.txt

Launch the Streamlit Application

streamlit run app/app.py

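The application should open automatically in your default browser. If it does not, visit the local URL printed in the terminal (http://localhost:8501 by default).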


🚀 How to Use

  1. Launch the Streamlit application.
  2. Enter the customer's information.
  3. Select customer services and billing details.
  4. Click the 🚀 Predict Customer Churn button.
  5. View:
    • Prediction Result
    • Stay/Churn Probability
    • Customer Risk Level
    • Prediction Summary
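
The README does not state how the Customer Risk Level is derived from the Stay/Churn Probability. A simple way to do it is to bucket the predicted churn probability, as sketched below. The cut-offs (0.4 and 0.7) and the MEDIUM tier are hypothetical; only the HIGH and LOW labels appear in this document.

```python
def risk_level(churn_probability: float) -> str:
    """Map a churn probability to a coarse risk label (hypothetical cut-offs)."""
    if not 0.0 <= churn_probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if churn_probability >= 0.7:  # assumed threshold for HIGH
        return "HIGH"
    if churn_probability >= 0.4:  # assumed threshold for MEDIUM
        return "MEDIUM"
    return "LOW"


# Example: label a few predicted probabilities.
labels = [risk_level(p) for p in (0.85, 0.55, 0.10)]
```

In practice the thresholds would be tuned to the cost of a missed churner versus an unnecessary retention offer.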

🧪 Example Predictions

Example 1

Customer Details

| Feature | Value |
|---|---|
| Contract | Month-to-month |
| Internet Service | Fiber Optic |
| Payment Method | Electronic Check |
| Tenure | 2 Months |
| Monthly Charges | $105 |

Prediction

⚠️ Customer Will Churn

Risk Level: 🔴 HIGH


Example 2

Customer Details

| Feature | Value |
|---|---|
| Contract | Two Year |
| Internet Service | DSL |
| Payment Method | Credit Card (Automatic) |
| Tenure | 60 Months |
| Monthly Charges | $55 |

Prediction

✅ Customer Will Stay

Risk Level: 🟢 LOW


📊 Skills Demonstrated

This project demonstrates practical knowledge of:

Machine Learning

  • Supervised Learning
  • Binary Classification
  • Model Evaluation
  • Model Comparison

Data Science

  • Data Cleaning
  • Exploratory Data Analysis (EDA)
  • Feature Engineering
  • Feature Scaling

Python Libraries

  • Pandas
  • NumPy
  • Matplotlib
  • Seaborn
  • Scikit-learn
  • XGBoost
  • Joblib

Deployment

  • Streamlit
  • Git
  • GitHub

📚 Learning Outcomes

Through this project, I gained practical experience in:

  • Building an end-to-end Machine Learning pipeline
  • Comparing multiple Machine Learning algorithms
  • Selecting the best-performing model
  • Feature engineering and preprocessing
  • Developing interactive Streamlit dashboards
  • Deploying Machine Learning applications
  • Version control using Git & GitHub

📈 Business Insights

Based on the analysis and model predictions, the following factors were observed to be strongly associated with customer churn:

  • 📉 Customers with Month-to-Month contracts are more likely to churn.
  • 💳 Customers using Electronic Check as their payment method tend to have a higher churn rate.
  • 📅 Customers with short tenure are more likely to leave the service.
  • 🌐 Customers using Fiber Optic Internet show comparatively higher churn.
  • 💰 Higher Monthly Charges are associated with increased churn probability.
  • 🤝 Customers with long-term contracts and automatic payment methods are generally less likely to churn.

These insights can help telecom companies develop effective customer retention strategies, improve customer satisfaction, and reduce revenue loss.
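
Observations like these are typically backed by a per-group churn rate. The sketch below shows the calculation with pandas on a handful of made-up rows; the values are illustrative only and say nothing about the real rates, which must be computed from the dataset itself. The column names follow the public CSV.

```python
import pandas as pd

# Made-up rows for illustration only; real rates must come from the dataset.
df = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "Month-to-month",
                 "One year", "One year", "Two year", "Two year", "Two year"],
    "Churn": ["Yes", "Yes", "No", "No", "Yes", "No", "No", "No"],
})

# Share of churned customers within each contract type, highest first.
churn_rate = (
    df.assign(churned=df["Churn"].eq("Yes"))
      .groupby("Contract")["churned"]
      .mean()
      .sort_values(ascending=False)
)
```

The same pattern works for Payment Method or Internet Service by changing the `groupby` column.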


🚀 Future Improvements

This project can be further enhanced by implementing the following features:

  • 🔍 Hyperparameter Tuning using GridSearchCV or RandomizedSearchCV
  • 🧠 Explainable AI using SHAP or LIME
  • 🤖 Deep Learning models (ANN, LSTM)
  • 🌍 Cloud Deployment (AWS, Azure, Google Cloud)
  • 🐳 Docker Containerization
  • ⚡ REST API using FastAPI or Flask
  • 📱 Mobile-Friendly User Interface
  • 📈 Interactive Business Dashboard using Plotly
  • 🔄 Automated Model Retraining Pipeline
  • ☁️ CI/CD Pipeline using GitHub Actions
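
As a starting point for the hyperparameter tuning item, the sketch below runs `GridSearchCV` over the regularization strength `C` of a Logistic Regression pipeline. It uses synthetic data from `make_classification` as a stand-in for the prepared churn features, and the grid values and the F1 scoring choice are assumptions, not settings taken from this project.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data as a stand-in for the prepared churn features.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Scaling lives inside the pipeline so each CV fold fits its own scaler.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}

# 5-fold cross-validated search, scored by F1 on the positive class.
search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=5)
search.fit(X, y)
best_c = search.best_params_["logisticregression__C"]
```

Scoring on F1 or recall rather than accuracy may suit churn data better, since missed churners are usually the costlier error.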

🛣️ Project Roadmap

✅ Completed

  • Dataset Collection
  • Data Cleaning
  • Exploratory Data Analysis
  • Feature Engineering
  • One-Hot Encoding
  • Feature Scaling
  • Model Training
  • Comparison of 8 Machine Learning Models
  • Best Model Selection
  • Streamlit Dashboard
  • Model Deployment
  • GitHub Repository

🔄 Planned Improvements

  • Hyperparameter Optimization
  • Explainable AI
  • REST API Development
  • Docker Support
  • Cloud Deployment
  • Automated CI/CD Pipeline

🤝 Contributing

Contributions, suggestions, and feature requests are always welcome.

If you'd like to contribute:

  1. Fork the repository
  2. Create a new feature branch
     git checkout -b feature-name
  3. Commit your changes
     git commit -m "Add new feature"
  4. Push your branch
     git push origin feature-name
  5. Open a Pull Request

🙏 Acknowledgements

Special thanks to the following resources:

  • IBM for providing the Telco Customer Churn Dataset
  • Scikit-learn Documentation
  • Streamlit
  • Pandas
  • NumPy
  • Matplotlib
  • Seaborn
  • XGBoost
  • Open Source Community

📜 License

This project is licensed under the MIT License.

Feel free to use, modify, and distribute this project in accordance with the license terms.


👨‍💻 Developer

Harsh Vardhan Chaudhary

B.Tech – Computer Science & Engineering

SRM Institute of Science & Technology

Connect with Me


⭐ Support

If you found this project useful, please consider:

⭐ Star this repository

🍴 Fork the repository

📝 Share your feedback

Your support motivates me to continue building Machine Learning and Artificial Intelligence projects.

