An end-to-end Machine Learning project that predicts whether a telecom customer is likely to Churn or Stay using customer demographics, subscription details, billing information, and service usage patterns.
The project follows the complete Machine Learning lifecycle, including:
- Data Cleaning
- Exploratory Data Analysis (EDA)
- Feature Engineering
- Feature Scaling
- Model Training
- Model Comparison
- Model Evaluation
- Model Deployment using Streamlit
Eight Machine Learning classification algorithms were trained and evaluated. Logistic Regression scored highest on all four reported metrics (accuracy, precision, recall, and F1-score) and was selected as the final model for deployment.
- Compared 8 Machine Learning Models
- Achieved 80.38% Accuracy
- Interactive Streamlit Dashboard
- Real-Time Customer Prediction
- Customer Risk Assessment
- Model Serialization using Joblib
- End-to-End Machine Learning Pipeline
- Business Problem
- Features
- Project Structure
- Dataset
- Machine Learning Workflow
- Technologies Used
- Model Comparison
- Results
- Application Screenshots
- Installation
- Example Prediction
- Learning Outcomes
- Future Improvements
- Contributing
- License
- Developer
- Support
Customer churn is one of the biggest challenges for telecom companies and subscription-based businesses.
Retaining an existing customer is often much more cost-effective than acquiring a new one. By accurately predicting customers who are likely to leave, businesses can take proactive actions such as:
- Offering personalized discounts
- Improving customer support
- Providing loyalty rewards
- Enhancing customer satisfaction
- Reducing revenue loss
This project demonstrates how Machine Learning can assist businesses in identifying customers at high risk of churning and support data-driven decision-making.
- Customer Churn Prediction
- Comparison of 8 Machine Learning Algorithms
- Data Cleaning & Preprocessing
- Missing Value Handling
- One-Hot Encoding
- Feature Scaling using StandardScaler
- Exploratory Data Analysis (EDA)
- Correlation Heatmap
- Model Performance Comparison
- Customer Risk Assessment
- Prediction Probability
- Interactive Streamlit Dashboard
- Model Serialization using Joblib
- Real-Time Customer Prediction
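The preprocessing steps in this list (missing-value handling, one-hot encoding, and scaling with StandardScaler) could look roughly like the sketch below. The small inline DataFrame and the `preprocess` helper are hypothetical stand-ins rather than the notebook's actual code, and the median fill is only one possible choice for missing values.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

NUMERIC_COLS = ["tenure", "MonthlyCharges", "TotalCharges"]


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Clean, encode, and scale a Telco-style frame (illustrative only)."""
    out = df.copy()
    # Blank strings in TotalCharges become NaN, then are filled with the median.
    out["TotalCharges"] = pd.to_numeric(out["TotalCharges"], errors="coerce")
    out["TotalCharges"] = out["TotalCharges"].fillna(out["TotalCharges"].median())
    # One-hot encode categorical columns; drop_first removes one redundant dummy each.
    out = pd.get_dummies(out, columns=["Contract", "InternetService"], drop_first=True)
    # Standardize numeric columns to zero mean and unit variance.
    out[NUMERIC_COLS] = StandardScaler().fit_transform(out[NUMERIC_COLS])
    return out


# Made-up rows; the third one has a blank TotalCharges value.
toy = pd.DataFrame({
    "tenure": [2, 60, 0, 24],
    "MonthlyCharges": [105.0, 55.0, 20.0, 80.0],
    "TotalCharges": ["210.0", "3300.0", " ", "1920.0"],
    "Contract": ["Month-to-month", "Two year", "One year", "Month-to-month"],
    "InternetService": ["Fiber optic", "DSL", "No", "Fiber optic"],
})
features = preprocess(toy)
print(features.columns.tolist())
```

In a real run the scaler would normally be fitted on the training split only and then reused for new data.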
Customer_Churn_Prediction/
│
├── app/
│   └── app.py
│
├── dataset/
│   └── WA_Fn-UseC_-Telco-Customer-Churn.csv
│
├── model/
│   ├── churn_model.pkl
│   ├── scaler.pkl
│   └── model_columns.pkl
│
├── notebook/
│   └── customer_churn_prediction.ipynb
│
├── output/
│   ├── app_home.png
│   ├── stay_prediction.png
│   ├── churn_prediction.png
│   ├── churn_distribution.png
│   ├── contract_distribution.png
│   ├── internet_service_distribution.png
│   ├── correlation_heatmap.png
│   ├── model_comparison.png
│   └── model_comparison.csv
│
├── README.md
├── requirements.txt
├── LICENSE
└── .gitignore
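The three files under model/ suggest that the app restores a trained model, a fitted scaler, and the training column order at startup. A minimal sketch of how such artifacts might be saved and reloaded with Joblib follows; the toy data, the temporary directory, and the `align_columns` helper are assumptions made for illustration, not the project's actual code.

```python
import tempfile
from pathlib import Path

import joblib
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Toy training data standing in for the encoded Telco features (assumption).
X = pd.DataFrame({
    "tenure": [2, 60, 5, 48, 1, 72],
    "MonthlyCharges": [105.0, 55.0, 95.0, 60.0, 99.0, 45.0],
    "Contract_Two year": [0, 1, 0, 1, 0, 1],
})
y = [1, 0, 1, 0, 1, 0]

scaler = StandardScaler().fit(X)
model = LogisticRegression().fit(scaler.transform(X), y)


def align_columns(row: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Reorder a new row to the training columns, filling absent dummies with 0."""
    return row.reindex(columns=columns, fill_value=0)


with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp)
    # Save the three artifacts under the same file names as in model/.
    joblib.dump(model, model_dir / "churn_model.pkl")
    joblib.dump(scaler, model_dir / "scaler.pkl")
    joblib.dump(list(X.columns), model_dir / "model_columns.pkl")

    loaded_model = joblib.load(model_dir / "churn_model.pkl")
    loaded_scaler = joblib.load(model_dir / "scaler.pkl")
    loaded_columns = joblib.load(model_dir / "model_columns.pkl")

# A new customer row that lacks the "Contract_Two year" dummy column.
new_row = pd.DataFrame({"MonthlyCharges": [100.0], "tenure": [3]})
aligned = align_columns(new_row, loaded_columns)
scaled = loaded_scaler.transform(aligned)
churn_probability = loaded_model.predict_proba(scaled)[0, 1]
print(f"Churn probability: {churn_probability:.2f}")
```

Storing the column list alongside the model is what lets a single user-entered row be expanded to the same one-hot layout the model was trained on.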
IBM Telco Customer Churn Dataset
The dataset contains customer demographic information, subscribed services, billing details, contract information, payment methods, and whether the customer churned.
| Attribute | Value |
|---|---|
| Total Customers | 7,043 |
| Features | 20+ |
| Target Variable | Churn |
| Classes | Yes / No |
| Missing Values | Handled during preprocessing |
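A quick way to check figures like those in the table is to compute them from the CSV itself. The sketch below uses a few made-up rows in an in-memory buffer so that it runs on its own; the `summarize` helper is hypothetical, and the assumption that missing values appear as blank `TotalCharges` strings should be checked against the actual file.

```python
import io

import pandas as pd

# A few made-up rows in the dataset's column layout (not real customer records).
SAMPLE_CSV = """customerID,tenure,MonthlyCharges,TotalCharges,Churn
0001-AAAA,1,29.85,29.85,No
0002-BBBB,34,56.95,1889.5,No
0003-CCCC,2,53.85,108.15,Yes
0004-DDDD,0,52.55, ,No
"""


def summarize(df: pd.DataFrame) -> dict:
    """Return row count, churn share, and number of non-numeric TotalCharges values."""
    total_charges = pd.to_numeric(df["TotalCharges"], errors="coerce")
    return {
        "rows": len(df),
        "churn_rate": (df["Churn"] == "Yes").mean(),
        "blank_total_charges": int(total_charges.isna().sum()),
    }


df = pd.read_csv(io.StringIO(SAMPLE_CSV))
summary = summarize(df)
print(summary)
```

To run it on the real data, replace the `io.StringIO(SAMPLE_CSV)` argument with the path `dataset/WA_Fn-UseC_-Telco-Customer-Churn.csv`.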
Some important features include:
- Gender
- Senior Citizen
- Partner
- Dependents
- Tenure
- Phone Service
- Multiple Lines
- Internet Service
- Online Security
- Online Backup
- Device Protection
- Tech Support
- Streaming TV
- Streaming Movies
- Contract Type
- Paperless Billing
- Payment Method
- Monthly Charges
- Total Charges
The project follows an end-to-end Machine Learning pipeline.
Dataset
│
▼
Data Cleaning
│
▼
Exploratory Data Analysis (EDA)
│
▼
Feature Engineering
│
▼
One-Hot Encoding
│
▼
Feature Scaling
│
▼
Train-Test Split
│
▼
Model Training
│
▼
Model Comparison
│
▼
Best Model Selection
│
▼
Model Serialization
│
▼
Streamlit Deployment
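The scaling, splitting, and training stages of this workflow can be sketched with scikit-learn as shown below. Synthetic data stands in for the encoded Telco features, and all parameter values are assumptions. Note that the sketch places the scaler inside a pipeline fitted after the split, a common way to keep test data out of the scaler's statistics; the notebook may order these steps differently.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced binary data standing in for the encoded Telco features.
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.73, 0.27], random_state=42
)

# Stratify so both splits keep roughly the same share of the positive class.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# The pipeline fits the scaler on the training split only, then the classifier.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipeline.fit(X_train, y_train)

test_accuracy = pipeline.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.3f}")
```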
| Category | Technologies |
|---|---|
| Programming Language | Python |
| Data Manipulation | Pandas, NumPy |
| Data Visualization | Matplotlib, Seaborn |
| Machine Learning | Scikit-learn, XGBoost |
| Model Serialization | Joblib |
| Web Framework | Streamlit |
| IDE | Visual Studio Code, Jupyter Notebook |
| Version Control | Git & GitHub |
The following supervised Machine Learning algorithms were trained and evaluated.
| Rank | Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|---|
| 1 | Logistic Regression | 80.38% | 64.76% | 57.49% | 60.91% |
| 2 | Gradient Boosting | 79.53% | 63.78% | 53.21% | 58.02% |
| 3 | Random Forest | 78.82% | 62.34% | 51.34% | 56.31% |
| 4 | Support Vector Machine | 78.68% | 62.59% | 49.20% | 55.09% |
| 5 | Extra Trees | 77.40% | 59.21% | 48.13% | 53.10% |
| 6 | XGBoost | 77.11% | 57.60% | 52.67% | 55.03% |
| 7 | K-Nearest Neighbors | 75.34% | 53.62% | 53.48% | 53.55% |
| 8 | Decision Tree | 71.64% | 46.61% | 45.99% | 46.30% |
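A comparison table of this kind can be produced by fitting each model on the same split and collecting the four metrics. The sketch below does this for three of the eight model families on synthetic data; the default hyperparameters and the data are assumptions, so the numbers it prints will not match the table.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Three of the eight model families, with default settings (an assumption).
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

rows = []
for name, model in models.items():
    predictions = model.fit(X_train, y_train).predict(X_test)
    rows.append({
        "Model": name,
        "Accuracy": accuracy_score(y_test, predictions),
        "Precision": precision_score(y_test, predictions),
        "Recall": recall_score(y_test, predictions),
        "F1-Score": f1_score(y_test, predictions),
    })

# Rank models by accuracy, highest first.
comparison = pd.DataFrame(rows).sort_values("Accuracy", ascending=False)
print(comparison.to_string(index=False))
```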
After comparing all eight models, Logistic Regression ranked first on accuracy, precision, recall, and F1-score and was selected as the final model for deployment.
| Metric | Score |
|---|---|
| Accuracy | 80.38% |
| Precision | 64.76% |
| Recall | 57.49% |
| F1-Score | 60.91% |
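The F1-score is the harmonic mean of precision and recall, so the reported value can be checked from the other two rows. The helper name `f1_score_from` is made up for this short sketch.

```python
def f1_score_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (inputs and output in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for Logistic Regression.
f1 = f1_score_from(64.76, 57.49)
print(f"F1-score: {f1:.2f}%")  # F1-score: 60.91%
```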
- Highest overall accuracy among evaluated models.
- Good balance between Precision and Recall.
- Computationally efficient and easy to interpret.
- Performs well on structured tabular datasets.
- Suitable for real-time prediction in a Streamlit application.
The notebook also generates a Model Comparison chart illustrating the performance of all evaluated algorithms.
Screenshot available in: output/model_comparison.png
The project successfully predicts whether a telecom customer is likely to Churn or Stay by leveraging customer demographic information, subscribed services, billing details, and account history.
- Built an end-to-end Machine Learning pipeline
- Compared 8 Machine Learning algorithms
- Achieved 80.38% accuracy using Logistic Regression
- Developed an interactive Streamlit dashboard
- Implemented customer risk assessment
- Displayed prediction confidence scores
- Generated real-time customer churn predictions
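The following visualizations were generated during data analysis and are saved in the output/ directory:
- Churn distribution (output/churn_distribution.png)
- Contract distribution (output/contract_distribution.png)
- Internet service distribution (output/internet_service_distribution.png)
- Correlation heatmap (output/correlation_heatmap.png)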
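Clone the repository and enter the project directory:
git clone https://github.com/harshvardhan4096/Customer_Churn_Prediction.git
cd Customer_Churn_Prediction
Create and activate a virtual environment (Windows):
python -m venv venv
venv\Scripts\activate
Create and activate a virtual environment (macOS/Linux):
python3 -m venv venv
source venv/bin/activate
Install the dependencies:
pip install -r requirements.txt
Run the application:
streamlit run app/app.py
The application will automatically open in your browser.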
- Launch the Streamlit application.
- Enter the customer's information.
- Select customer services and billing details.
- Click the Predict Customer Churn button.
- View:
  - Prediction Result
  - Stay/Churn Probability
  - Customer Risk Level
  - Prediction Summary
| Feature | Value |
|---|---|
| Contract | Month-to-month |
| Internet Service | Fiber Optic |
| Payment Method | Electronic Check |
| Tenure | 2 Months |
| Monthly Charges | $105 |
Customer Will Churn
Risk Level: HIGH
| Feature | Value |
|---|---|
| Contract | Two Year |
| Internet Service | DSL |
| Payment Method | Credit Card (Automatic) |
| Tenure | 60 Months |
| Monthly Charges | $55 |
Customer Will Stay
Risk Level: LOW
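One simple way to turn a predicted churn probability into a risk level like those in the examples above is a pair of thresholds. The sketch below is an assumption, not the app's actual logic: the source only shows HIGH and LOW, so the MEDIUM band and the 0.4 and 0.7 cut-offs are illustrative choices.

```python
def risk_level(churn_probability: float) -> str:
    """Map a churn probability in [0, 1] to a coarse risk label.

    The 0.4 and 0.7 cut-offs are illustrative assumptions, not the app's values.
    """
    if not 0.0 <= churn_probability <= 1.0:
        raise ValueError("churn_probability must be between 0 and 1")
    if churn_probability >= 0.7:
        return "HIGH"
    if churn_probability >= 0.4:
        return "MEDIUM"
    return "LOW"


for p in (0.85, 0.55, 0.10):
    print(f"{p:.2f} -> {risk_level(p)}")
```

In practice the cut-offs would be chosen from the cost of a missed churner versus the cost of an unnecessary retention offer.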
This project demonstrates practical knowledge of:
- Supervised Learning
- Binary Classification
- Model Evaluation
- Model Comparison
- Data Cleaning
- Exploratory Data Analysis (EDA)
- Feature Engineering
- Feature Scaling
- Pandas
- NumPy
- Matplotlib
- Seaborn
- Scikit-learn
- XGBoost
- Joblib
- Streamlit
- Git
- GitHub
Through this project, I gained practical experience in:
- Building an end-to-end Machine Learning pipeline
- Comparing multiple Machine Learning algorithms
- Selecting the best-performing model
- Feature engineering and preprocessing
- Developing interactive Streamlit dashboards
- Deploying Machine Learning applications
- Version control using Git & GitHub
Based on the analysis and model predictions, the following factors were observed to have a strong influence on customer churn:
- Customers with Month-to-Month contracts are more likely to churn.
- Customers using Electronic Check as their payment method tend to have a higher churn rate.
- Customers with short tenure are more likely to leave the service.
- Customers using Fiber Optic Internet show comparatively higher churn.
- Higher Monthly Charges are associated with increased churn probability.
- Customers with long-term contracts and automatic payment methods are generally less likely to churn.
These insights can help telecom companies develop effective customer retention strategies, improve customer satisfaction, and reduce revenue loss.
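Observations such as the contract-type effect can be checked by computing the churn rate per category. The sketch below uses a handful of made-up rows, so its output only illustrates the calculation and says nothing about the real dataset.

```python
import pandas as pd

# Made-up rows for illustration; real rates come from the full dataset.
df = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "Month-to-month",
                 "One year", "One year", "Two year", "Two year", "Two year"],
    "Churn": ["Yes", "Yes", "No", "No", "Yes", "No", "No", "No"],
})

# Share of churned customers within each contract type, highest first.
churn_rate = (
    df.assign(churned=df["Churn"].eq("Yes"))
      .groupby("Contract")["churned"]
      .mean()
      .sort_values(ascending=False)
)
print(churn_rate)
```

The same pattern applies to the other categorical columns, for example by grouping on `PaymentMethod` or `InternetService` instead of `Contract`.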
This project can be further enhanced by implementing the following features:
- Hyperparameter Tuning using GridSearchCV or RandomizedSearchCV
- Explainable AI using SHAP or LIME
- Deep Learning models (ANN, LSTM)
- Cloud Deployment (AWS, Azure, Google Cloud)
- Docker Containerization
- REST API using FastAPI or Flask
- Mobile-Friendly User Interface
- Interactive Business Dashboard using Plotly
- Automated Model Retraining Pipeline
- CI/CD Pipeline using GitHub Actions
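As an illustration of the first item, hyperparameter tuning with GridSearchCV might start from something like the sketch below. The synthetic data, the parameter grid, and the choice of F1 as the scoring metric are all assumptions rather than part of the current project.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=400, n_features=8, random_state=1)

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Candidate regularization strengths; the grid itself is an assumption.
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}

# Score on F1 so that both precision and recall on the positive class count.
search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=5)
search.fit(X, y)

print("Best C:", search.best_params_["clf__C"])
print(f"Best cross-validated F1: {search.best_score_:.3f}")
```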
Completed:
- Dataset Collection
- Data Cleaning
- Exploratory Data Analysis
- Feature Engineering
- One-Hot Encoding
- Feature Scaling
- Model Training
- Comparison of 8 Machine Learning Models
- Best Model Selection
- Streamlit Dashboard
- Model Deployment
- GitHub Repository
Planned:
- Hyperparameter Optimization
- Explainable AI
- REST API Development
- Docker Support
- Cloud Deployment
- Automated CI/CD Pipeline
Contributions, suggestions, and feature requests are always welcome.
If you'd like to contribute:
- Fork the repository
- Create a new feature branch
git checkout -b feature-name
- Commit your changes
git commit -m "Add new feature"
- Push your branch
git push origin feature-name
- Open a Pull Request
Special thanks to the following resources:
- IBM for providing the Telco Customer Churn Dataset
- Scikit-learn Documentation
- Streamlit
- Pandas
- NumPy
- Matplotlib
- Seaborn
- XGBoost
- Open Source Community
This project is licensed under the MIT License.
Feel free to use, modify, and distribute this project in accordance with the license terms.
B.Tech - Computer Science & Engineering
SRM Institute of Science & Technology
- GitHub: https://github.com/harshvardhan4096
- LinkedIn: https://www.linkedin.com/in/connect-harsh-vardhan/
If you found this project useful, please consider:
- Star this repository
- Fork the repository
- Share your feedback
Your support motivates me to continue building Machine Learning and Artificial Intelligence projects.







