This project focuses on predicting the purchase amount of customers during Black Friday sales using a dataset of sales transactions from a retail store. The analysis aims to help retailers understand customer purchasing behavior and optimize sales strategies. This regression problem involves building machine learning models to predict purchase amounts based on customer and product attributes.
The dataset consists of 550,069 rows and 12 columns, providing a rich opportunity to apply feature engineering techniques and explore various machine learning algorithms.
The dataset captures sales transaction details from Black Friday shopping at a retail store. Each row describes one transaction through customer and product attributes, and the target to predict is the purchase amount of that transaction.
Problem: Predict Purchase Amount
Dataset Download Link: https://www.kaggle.com/kkartik93/black-friday-sales-prediction
This project makes use of several Python libraries to process data, visualize trends, and build machine learning models.
pandas: For data manipulation and analysis.
matplotlib: For visualizing trends and patterns in the dataset.
seaborn: For advanced statistical data visualizations.
scikit-learn: For building and evaluating machine learning models.
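As a rough illustration of how pandas fits into this workflow, the sketch below one-hot encodes a few categorical columns and separates the target. The column names (Gender, Age, City_Category, Purchase) are assumptions about the Kaggle file, and the four rows are made-up values used only to keep the example self-contained; they are not taken from the dataset.

```python
import pandas as pd

# Hypothetical miniature sample. Column names are assumed, values are invented.
sample = pd.DataFrame(
    {
        "Gender": ["F", "M", "M", "F"],
        "Age": ["0-17", "26-35", "26-35", "55+"],
        "City_Category": ["A", "B", "C", "A"],
        "Purchase": [8370, 15200, 1422, 7969],
    }
)

# One-hot encode the categorical attributes so that models can consume them.
features = pd.get_dummies(sample.drop(columns="Purchase"), dtype=int)

# The regression target is the purchase amount of each transaction.
target = sample["Purchase"]

print(features.columns.tolist())
print(target.describe())
```

On the real file, the DataFrame would come from `pd.read_csv` on the downloaded CSV instead of being built by hand.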
Several machine learning algorithms were applied to predict the purchase amount:
Linear Regression: A simple algorithm that fits a linear model to predict continuous values.
Decision Tree: A tree-based algorithm that splits data based on decision rules to make predictions.
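Random Forest: An ensemble method that trains many decision trees on bootstrap samples of the data and averages their predictions to reduce overfitting.
Extra Trees: Another ensemble method that builds many trees but picks split thresholds at random, which adds further randomization and can enhance model performance.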
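The four algorithms listed above can be compared with a few lines of scikit-learn. This is a minimal sketch, not the project's actual training code: it uses synthetic data from `make_regression` as a stand-in for the encoded features, and it assumes RMSE as the evaluation metric, which the project description does not specify.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the encoded Black Friday features (assumption).
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "Extra Trees": ExtraTreesRegressor(n_estimators=50, random_state=0),
}

# Fit each model and record its root mean squared error on the held-out split.
rmse_by_model = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    rmse_by_model[name] = float(np.sqrt(mean_squared_error(y_test, predictions)))

for name, rmse in rmse_by_model.items():
    print(f"{name}: RMSE = {rmse:.2f}")
```

Scores on this synthetic data say nothing about how the models rank on the real dataset; the point is only the shared fit, predict, and score pattern.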
Hyperparameter Tuning: Applying techniques such as grid search or random search to fine-tune the parameters of the models for optimal performance.
Experimenting with Different Models: Trying other machine learning algorithms like Gradient Boosting, XGBoost, or Neural Networks to see if they provide better results.
Creation of New Attributes: Engineering new features from the existing data to enhance predictive power.
Normalization: Applying normalization techniques to improve model accuracy by standardizing data values.
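The grid search idea mentioned under hyperparameter tuning could look roughly like the sketch below. The parameter grid, the choice of Random Forest as the model to tune, the 3-fold cross-validation, and the synthetic data are all illustrative assumptions rather than settings taken from this project.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the encoded Black Friday features (assumption).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

# Small illustrative grid; a real search would likely cover more values.
param_grid = {
    "n_estimators": [20, 50],
    "max_depth": [3, None],
}

search = GridSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_grid=param_grid,
    scoring="neg_root_mean_squared_error",  # higher (closer to 0) is better
    cv=3,
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated RMSE:", -search.best_score_)
```

Swapping `GridSearchCV` for `RandomizedSearchCV` gives the random search variant, which samples a fixed number of combinations instead of trying all of them.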
This project demonstrates how machine learning models can be applied to real-world retail data to predict purchase amounts, helping businesses better understand customer behavior and optimize sales strategies.