This project aims to forecast daily sales for the next 28 days at Walmart stores using hierarchical sales data. The data covers stores in three US states (California, Texas, and Wisconsin) and is organized by item, department, product category, and store. In addition, it contains explanatory variables such as price, promotions, day of the week, and special events. The primary goal is to improve forecast accuracy using machine learning techniques.
Forecasting sales at the item-store level can significantly impact inventory management, reducing overstock and stockouts, and thereby improving business efficiency. In this competition, you are challenged to use both traditional forecasting methods and machine learning to achieve this goal.
The model's performance is evaluated using the Weighted Root Mean Squared Scaled Error (WRMSSE). The primary objective is to minimize this error metric.
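For reference, the per-series Root Mean Squared Scaled Error used in the M5 competition is

$$\mathrm{RMSSE} = \sqrt{\dfrac{\frac{1}{h}\sum_{t=n+1}^{n+h}\left(y_t - \hat{y}_t\right)^2}{\frac{1}{n-1}\sum_{t=2}^{n}\left(y_t - y_{t-1}\right)^2}}$$

where $n$ is the length of the training history, $h = 28$ is the forecast horizon, $y_t$ is the observed unit sales, and $\hat{y}_t$ is the forecast. The leaderboard metric (WRMSSE) is a weighted average of these per-series values, with weights based on each series' recent dollar sales.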
• calendar.csv: Contains information about the dates on which the products are sold.
• sales_train_validation.csv: Contains the historical daily unit sales data per product and store [d_1 - d_1913].
• sales_train_evaluation.csv: Includes sales [d_1 - d_1941] (labels used for the Public leaderboard).
• sell_prices.csv: Contains information about the price of the products sold per store and date.
• sample_submission.csv: The correct format for submissions.
The data preprocessing steps are defined in the src/data_preparation.py module. This includes loading the data, handling missing values, and performing initial transformations.
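As a rough sketch of what this step involves (the helper below is hypothetical; the authoritative logic lives in src/data_preparation.py), the wide daily-sales table is reshaped to long format and joined with the calendar and price files:

```python
import pandas as pd

def load_and_melt(raw_dir: str = "data/raw") -> pd.DataFrame:
    """Hypothetical sketch of the preprocessing step."""
    calendar = pd.read_csv(f"{raw_dir}/calendar.csv")
    prices = pd.read_csv(f"{raw_dir}/sell_prices.csv")
    sales = pd.read_csv(f"{raw_dir}/sales_train_validation.csv")

    # Wide -> long: one row per item, store, and day (d_1 ... d_1913).
    id_cols = ["id", "item_id", "dept_id", "cat_id", "store_id", "state_id"]
    long_df = sales.melt(id_vars=id_cols, var_name="d", value_name="sales")

    # Attach calendar dates/events, then weekly sell prices.
    long_df = long_df.merge(calendar, on="d", how="left")
    long_df = long_df.merge(prices, on=["store_id", "item_id", "wm_yr_wk"], how="left")

    # A missing price means the item was not yet on sale that week;
    # dropping those rows is one simple way to handle the gap.
    return long_df.dropna(subset=["sell_price"])
```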
Feature engineering steps are implemented in the src/feature_engineering.py module. This involves creating new features that are essential for improving the forecasting model.
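The exact feature set is defined in src/feature_engineering.py; the sketch below only illustrates the kinds of features commonly derived from this data (lags, rolling means, calendar flags, and relative price), using the raw M5 column names:

```python
import pandas as pd

def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative feature examples; not the project's exact feature set."""
    df = df.sort_values(["id", "date"]).copy()

    # Lagged sales and rolling means, shifted past the 28-day horizon
    # so they are still available at prediction time.
    df["lag_28"] = df.groupby("id")["sales"].shift(28)
    df["rolling_mean_7"] = df.groupby("id")["sales"].transform(
        lambda s: s.shift(28).rolling(7).mean()
    )

    # Calendar features: day of week, month, and a binary special-event flag.
    df["date"] = pd.to_datetime(df["date"])
    df["dayofweek"] = df["date"].dt.dayofweek
    df["month"] = df["date"].dt.month
    df["is_event"] = df["event_name_1"].notna().astype(int)

    # Relative price: current price vs. the item's average price in that store.
    df["price_ratio"] = df["sell_price"] / df.groupby(
        ["store_id", "item_id"]
    )["sell_price"].transform("mean")
    return df
```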
The model training script is in the src/model_training.py module. Here, the Prophet model is trained using the preprocessed and engineered features.
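In broad strokes, fitting Prophet to a single item-store series looks like the sketch below. This is a minimal illustration rather than the project's exact training code; the synthetic input frame and the sell_price regressor are assumptions:

```python
import joblib
import pandas as pd
from prophet import Prophet

# Hypothetical input: one item-store series with Prophet's expected columns
# `ds` (date) and `y` (unit sales), plus the weekly sell price as a regressor.
history = pd.DataFrame({
    "ds": pd.date_range("2015-01-01", periods=365, freq="D"),
    "y": [3.0 + (i % 7) for i in range(365)],  # placeholder weekly pattern
    "sell_price": 1.99,                        # placeholder price
})

model = Prophet(weekly_seasonality=True, yearly_seasonality=True)
model.add_regressor("sell_price")  # assumed extra regressor built from sell_prices.csv
model.fit(history)

# Forecast the 28-day competition horizon; the regressor must be supplied
# for every row of the future frame (here simply the last observed price).
future = model.make_future_dataframe(periods=28)
future["sell_price"] = history["sell_price"].iloc[-1]
forecast = model.predict(future)[["ds", "yhat"]]

joblib.dump(model, "models/prophet_model.joblib")
```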
Model evaluation is performed in the src/evaluation.py module. This includes calculating the RMSSE and other relevant metrics to assess the model's performance.
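For context, here is a self-contained sketch of the two metrics reported in this README; the actual implementation in src/evaluation.py may differ in details such as series weighting:

```python
import numpy as np

def rmsse(y_train, y_true, y_pred) -> float:
    """Root Mean Squared Scaled Error for a single series (M5-style).

    y_train: historical sales used to scale the error;
    y_true / y_pred: actuals and forecasts over the 28-day horizon.
    """
    naive_mse = np.mean(np.diff(np.asarray(y_train, dtype=float)) ** 2)
    forecast_mse = np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)
    return float(np.sqrt(forecast_mse / naive_mse))

def mae(y_true, y_pred) -> float:
    """Mean Absolute Error over the forecast horizon."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))
```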
An example Jupyter notebook is provided in the notebooks/ directory (walmart_unit_sales_forecasting.ipynb). It demonstrates the entire workflow: data loading, preprocessing, feature engineering, model training, and evaluation.
walmart-unit-sales-forecast/
├── config/
│ └── config.yaml
│
├── data/
│ ├── raw/
│ │ ├── calendar.csv
│ │ ├── sales_train_validation.csv
│ │ ├── sample_submission.csv
│ │ ├── sell_prices.csv
│ │ └── sales_train_evaluation.csv
│ ├── processed/
│ │ ├── val.pkl
│ │ ├── eval.pkl
│ │ └── future.pkl
│
├── logs/
│ └── app.log
│
├── notebooks/
│ └── walmart_unit_sales_forecasting.ipynb
│
├── models/
│ └── prophet_model.joblib
│
├── submissions/
│ └── submission.csv
│
├── src/
│ ├── __init__.py
│ ├── data_preparation.py
│ ├── feature_engineering.py
│ ├── model_training.py
│ ├── evaluation.py
│ ├── submission.py
│ ├── config_loader.py
│ └── logging_config.py
│
├── main.py
├── environment.yml
├── README.md
├── LICENSE
├── setup.py
└── requirements.txt (optional; only needed for pip-installed packages)
• data/raw/: Contains the raw input files (sales data, product prices, etc.).
• data/processed/: Contains processed data files (these are generated during execution).
• models/: Contains saved model files.
• notebooks/: Contains Jupyter notebooks for exploratory data analysis and model development.
• src/: Contains the source code for data preparation, feature engineering, model training, and evaluation.
• submissions/: Contains the submission files for the competition.
• Python 3.10.12
• Anaconda installed
• Git installed
- Clone the Repository
git clone https://github.com/abhinandansamal/walmart-unit-sales-forecast.git
cd walmart-unit-sales-forecast
- Download the Data
• Download the required raw data files from this Google Drive link and place them in the data/raw/ directory. Most of the files are larger than 50 MB, so they are not included in the repository.
• Alternatively, you can download the complete dataset from Kaggle here.
- Create the Conda Environment
conda env create -f environment.yml
- Activate the Environment
conda activate walmart_sales_forecasting
- Run the Project
python main.py
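For orientation, main.py chains the stages described above roughly as follows. The function names here are purely illustrative; see the modules under src/ for the actual entry points:

```python
# Illustrative pipeline wiring only; the real main.py may differ.
from src import data_preparation, feature_engineering, model_training, evaluation, submission

def main():
    data = data_preparation.run()           # load raw CSVs, write data/processed/*.pkl
    features = feature_engineering.run(data)
    model = model_training.run(features)    # fit Prophet, save models/prophet_model.joblib
    evaluation.run(model, features)         # RMSSE / MAE on the validation window
    submission.run(model)                   # write submissions/submission.csv

if __name__ == "__main__":
    main()
```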
• The processed data files (eval.pkl, future.pkl, val.pkl) are generated during the execution of main.py. They are required for the model to make predictions and will be stored in the data/processed/ directory. Because the files are larger than 50 MB, they are not included in the repository.
• Due to the high RAM requirements of this project, it is advisable to run the Jupyter notebook (walmart_unit_sales_forecasting.ipynb, located in the notebooks/ folder) on a cloud platform or Google Colab with the high-RAM option enabled. Alternatively, use a local system with sufficient memory.
• If running the Jupyter notebook in Google Colab, you can upload the raw data files to the Colab environment and adjust the file paths accordingly.
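For example, one way to make the raw files visible to the notebook in Colab is to mount Google Drive and point the data path at the uploaded copies; the directory below is a placeholder:

```python
# Inside a Colab cell: mount Google Drive and redirect the raw-data path.
from google.colab import drive
drive.mount("/content/drive")

# Placeholder location; adjust to wherever the raw CSVs were uploaded.
RAW_DIR = "/content/drive/MyDrive/walmart-unit-sales-forecast/data/raw"
```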
• Model: Implemented Facebook Prophet for time series forecasting.
• Error Metrics: Calculated Mean Absolute Error (MAE) and RMSSE to evaluate model performance.
• Visualizations: Generated visualizations to show sales trends and forecast accuracy.
• Savings: Estimated potential savings of around $250.44 by reducing overstock and stockouts, based on average item price and forecast accuracy.
• Feature Integration: Plan to integrate additional features such as marketing campaigns, weather data, and competitor pricing.
• Algorithm Experimentation: Explore different machine learning algorithms and ensemble methods to improve accuracy.
• Hyperparameter Tuning: Perform hyperparameter tuning to optimize model performance.
• Extended Forecasting: Extend the model to predict sales for multiple items and stores.
• Model Accuracy: The model achieved an RMSSE of 0.6054 on the validation set.
• Business Impact: By implementing this forecasting model, potential savings of approximately $250.44 can be realized through better inventory management.
This notebook demonstrates the process of forecasting sales using historical data and the Prophet model. The approach outlined here can be applied to other items and stores to enhance inventory management and business planning.
• This project is based on data provided by the University of Nicosia and available on Kaggle.
• Special thanks to the developers of Prophet and the Python community for their support and contributions to the libraries used in this project.
This project is licensed under the MIT License. See the LICENSE file for details.