- Overview
- Process
- Research Question
- Flowchart
- Class Imbalance
- Sentiment Analysis Integration
- Model Performance Metrics
- Feature Importance Analysis
- Installation
- Results
- Challenges and Limitations
- Contributing
- License
- Citation
- Technologies Used
- Acknowledgments
- Next Steps
- Support
This study examines how sentiment analysis affects customer churn prediction in the e-commerce industry by comparing several predictive models. It also identifies which model predicts customer churn most accurately.
The whole study is divided into two phases:
- In the first phase, the variables expected to drive customer churn are selected and the predictive models are developed. No customer feedback is used in this phase.
- In the second phase, customer feedback is used alongside the relevant variables identified during EDA: sentiment scores are extracted from the feedback and added to the data frame, and the churn prediction models are developed again on this data.
Finally, the metrics from both phases, with and without sentiment analysis, are compared to understand whether including sentiment analysis helps the organization better predict and understand why customers stop transacting with it.
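The two-phase comparison described above can be sketched as follows. This is a minimal illustration on synthetic data, not the project's actual pipeline: the column names (`tenure_months`, `orders_last_year`, `sentiment_score`, `churn`), the data-generating rule, and the choice of F1 as the comparison metric are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the customer data (hypothetical columns).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "tenure_months": rng.integers(1, 60, n),
    "orders_last_year": rng.integers(0, 30, n),
    "sentiment_score": rng.uniform(-1, 1, n),
})

# Assumed rule: negative sentiment and few orders raise the churn probability.
logit = -1.5 * df["sentiment_score"] - 0.05 * df["orders_last_year"]
df["churn"] = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)


def mean_f1(features):
    """Return the 5-fold cross-validated F1 score for the given feature set."""
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    return cross_val_score(model, df[features], df["churn"], cv=5, scoring="f1").mean()


base_features = ["tenure_months", "orders_last_year"]
phase1_f1 = mean_f1(base_features)                        # Phase 1: no feedback
phase2_f1 = mean_f1(base_features + ["sentiment_score"])  # Phase 2: with sentiment

print(f"Phase 1 F1 (without sentiment): {phase1_f1:.3f}")
print(f"Phase 2 F1 (with sentiment):    {phase2_f1:.3f}")
```

Keeping the model, folds, and random seed identical across both phases means any difference in the score can be attributed to the added sentiment feature.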
"How does sentiment analysis impact predicting the customer churn of an organization?"
- 1 - Represents the churned customer
- 0 - Represents the non-churned customer
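Given the 1/0 labels above, one way to inspect the class imbalance and derive balancing weights is sketched below. The column name `Churn` and the 80/20 split are assumptions for illustration only; the weighting shown is scikit-learn's `"balanced"` heuristic, which may differ from how this project handled imbalance.

```python
import numpy as np
import pandas as pd
from sklearn.utils.class_weight import compute_class_weight

# Toy labels: 8 non-churned (0) and 2 churned (1) customers (illustrative only).
df = pd.DataFrame({"Churn": [0] * 8 + [1] * 2})

# Share of each class in the data.
class_share = df["Churn"].value_counts(normalize=True).sort_index()
print(class_share)

# "balanced" weights: n_samples / (n_classes * count_of_class).
weights = compute_class_weight(
    class_weight="balanced",
    classes=np.array([0, 1]),
    y=df["Churn"].to_numpy(),
)
class_weight = dict(zip([0, 1], weights))
print(class_weight)
```

The resulting dictionary can be passed as the `class_weight` argument of estimators such as `RandomForestClassifier`, so that errors on the minority (churned) class count more during training.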
DistilBERT is used for performing the sentiment analysis. DistilBERT is a smaller, faster, cheaper and lighter version of BERT that still retains a lot of BERT's language understanding capabilities.
```python
from huggingface_hub import notebook_login
notebook_login()

from datasets import load_dataset
imdb = load_dataset("imdb")
```
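As a rough sketch of how sentiment scores could be extracted and added to the data frame, the helpers below turn classifier output into a signed score per feedback text. The function names, the `feedback` column, and the signed-score convention are hypothetical. A stub classifier stands in for the real model so the example runs offline; it mimics the `[{"label": ..., "score": ...}]` output format of a Hugging Face sentiment pipeline.

```python
import pandas as pd


def to_signed_score(result):
    """Map {'label': 'POSITIVE' | 'NEGATIVE', 'score': p} to a value in [-1, 1]."""
    sign = 1.0 if result["label"].upper() == "POSITIVE" else -1.0
    return sign * float(result["score"])


def add_sentiment_scores(df, classifier, text_col="feedback"):
    """Return a copy of df with a 'sentiment_score' column from the classifier."""
    results = classifier(df[text_col].tolist())
    out = df.copy()
    out["sentiment_score"] = [to_signed_score(r) for r in results]
    return out


def stub_classifier(texts):
    """Offline stand-in that imitates the pipeline's output format."""
    return [
        {"label": "NEGATIVE" if "late" in t.lower() else "POSITIVE", "score": 0.9}
        for t in texts
    ]


# To use a real DistilBERT model instead (assumed checkpoint, needs a download):
#   from transformers import pipeline
#   classifier = pipeline(
#       "sentiment-analysis",
#       model="distilbert-base-uncased-finetuned-sst-2-english",
#   )
reviews = pd.DataFrame({"feedback": ["Great service", "Delivery was late again"]})
scored = add_sentiment_scores(reviews, stub_classifier)
print(scored)
```

Passing the classifier in as an argument keeps the scoring logic independent of the specific model, so a fine-tuned checkpoint can be swapped in without touching the rest of the code.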
- Cross-Validation Metrics
We evaluate our models using the following metrics:
- Accuracy
- Recall
- Precision
- F1 Score
- ROC_AUC
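For reference, the five metrics listed above can be computed with scikit-learn as in this small example. The labels and scores are made up for illustration, and the 0.5 decision threshold is an assumption.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Toy ground truth (1 = churned) and predicted churn probabilities.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3]

# Hard predictions at an assumed 0.5 threshold.
y_pred = [int(s >= 0.5) for s in y_score]

metrics = {
    "Accuracy": accuracy_score(y_true, y_pred),
    "Recall": recall_score(y_true, y_pred),
    "Precision": precision_score(y_true, y_pred),
    "F1": f1_score(y_true, y_pred),
    # ROC AUC is computed from the scores, not the thresholded predictions.
    "ROC_AUC": roc_auc_score(y_true, y_score),
}

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Here 3 of the 4 churned customers are caught with 1 false alarm, so accuracy, recall, precision, and F1 all equal 0.75, while ROC AUC is 0.9375 because 15 of the 16 churned/non-churned pairs are ranked correctly.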
- Random Forest Model Metrics:
```python
import pandas as pd

# Dataframe setup
rf_output = pd.DataFrame({
    'Training': [train_cv_acc_rf, train_cv_recall_rf, train_cv_precision_rf,
                 train_cv_f1_rf, roc_auc_train_rf],
    'Testing': [test_cv_acc_rf, test_cv_recall_rf, test_cv_precision_rf,
                test_cv_f1_rf, roc_auc_test_rf]
}, index=['Accuracy', 'Recall', 'Precision', 'F1', 'ROC_AUC'])

print(rf_output)
```
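The snippet above relies on metric variables computed earlier in the project. One possible way to obtain comparable training and testing values is scikit-learn's `cross_validate`, sketched here on synthetic data. The dataset, fold count, and hyperparameters are assumptions and may not match the project's setup.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Synthetic, imbalanced stand-in for the churn data (about 20% positives).
X, y = make_classification(
    n_samples=300, n_features=8, weights=[0.8, 0.2], random_state=0
)

# Display name -> scikit-learn scorer name.
scoring = {
    "Accuracy": "accuracy",
    "Recall": "recall",
    "Precision": "precision",
    "F1": "f1",
    "ROC_AUC": "roc_auc",
}

cv_results = cross_validate(
    RandomForestClassifier(n_estimators=50, random_state=0),
    X,
    y,
    cv=5,
    scoring=scoring,
    return_train_score=True,  # also report scores on the training folds
)

# Average each metric over the folds and arrange it like rf_output.
rf_output = pd.DataFrame(
    {
        "Training": [cv_results[f"train_{name}"].mean() for name in scoring],
        "Testing": [cv_results[f"test_{name}"].mean() for name in scoring],
    },
    index=list(scoring),
)
print(rf_output.round(3))
```

A large gap between the Training and Testing columns is a common sign of overfitting, which is why both are reported side by side.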
We analyze the importance of different features in our models to understand which factors contribute most to customer churn.
- Variable Importances for Random Forest and LightGBM:
```python
import matplotlib.pyplot as plt

plt.barh(sorted_feature_names, sorted_feature_importances)
plt.xlabel('Feature Importance')
plt.title('Variable Importance for Random Forest')
plt.show()
```
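The plotting code expects `sorted_feature_names` and `sorted_feature_importances` to exist already. A minimal sketch of how they might be derived from a fitted Random Forest is shown below. The synthetic data and the `feature_i` names are placeholders, not the project's real features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Placeholder feature names and synthetic data (illustrative only).
feature_names = [f"feature_{i}" for i in range(6)]
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=3, random_state=0
)

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Sort ascending so the most important feature ends up at the top of a barh plot.
order = np.argsort(model.feature_importances_)
sorted_feature_names = [feature_names[i] for i in order]
sorted_feature_importances = model.feature_importances_[order]

for name, importance in zip(sorted_feature_names, sorted_feature_importances):
    print(f"{name}: {importance:.3f}")
```

LightGBM's scikit-learn wrapper also exposes a `feature_importances_` attribute, so the same sorting step should apply there, though its default importances are split counts rather than normalized values.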
To set up the project environment:
- Clone the repository:
```bash
git clone https://github.com/GaneshKotaSLU/Customer-Churn-Prediction.git
```
- Navigate to the Project Directory:
```bash
cd Customer-Churn-Prediction
```
Our analysis shows that incorporating sentiment analysis into churn prediction models can improve their accuracy. Key findings include:
- Models with sentiment analysis outperformed the models without it by at most 3%.
- Customer feedback sentiment was found to be a strong predictor of churn.
- The SVM and Random Forest models showed the best overall performance.
- Data quality and completeness varied across different customer segments.
- The sentiment analysis model may not capture all nuances in customer feedback.
- The current approach doesn't account for time-series aspects of customer behavior.
Contributions to this project are welcome. Please follow these steps:
- Create a new branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE.md file for details.
If you use this work in your research, please cite:
Kota, G. (2023). Customer Churn Prediction: A Comparative Analysis of Models with and without Sentiment Analysis. GitHub repository, https://github.com/GaneshKotaSLU/Customer-Churn-Prediction
The following are some of the technologies used in this project:
- Python 3.8+
- Pandas
- Scikit-learn
- TensorFlow
- Hugging Face Transformers
- Matplotlib
- LightGBM
Thanks to the Hugging Face team for their excellent NLP tools and models. This project was inspired by recent advancements in sentiment analysis and its applications in business intelligence.
This prototype needs to be integrated with real-time data to predict customer churn behavior. Future work includes:
- Exploring additional machine learning models, such as neural networks.
- Conducting A/B testing in a production environment.
- Investigating the impact of external factors (e.g., market trends, competitor actions) on churn rates.
Support our work by starring our GitHub repository. For any questions or suggestions, please open an issue in the repository.