Walmart Sales Prediction in R

Introduction

The Walmart Sales Prediction project analyzes and predicts weekly sales for Walmart stores across various regions in the United States. It uses a dataset from Kaggle that includes sales data from multiple stores, alongside other variables such as temperature, fuel price, CPI (Consumer Price Index), and unemployment rate. The primary objective is to understand the factors affecting weekly sales and to build a predictive model using the R programming language. The rio package is used for importing data, dplyr and tidyr for data manipulation, and ggplot2 and plotly for data visualization.
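As a rough illustration of the import step, the sketch below writes a small toy table to a temporary CSV file and reads it back with rio. The column names (`Store`, `Date`, `Weekly_Sales`, `Holiday_Flag`), the day-month-year date format, and all values are assumptions made for this example; they are not taken from the project's code.

```r
library(rio)    # data import/export
library(dplyr)  # data manipulation

# Toy stand-in for the Kaggle file; all values are made up for illustration.
toy <- data.frame(
  Store        = c(1, 1, 2, 2),
  Date         = c("05-02-2010", "12-02-2010", "05-02-2010", "12-02-2010"),
  Weekly_Sales = c(1500000, 1600000, 2000000, 2100000),
  Holiday_Flag = c(0, 1, 0, 1)
)

# Hypothetical path; in practice this would point at the downloaded dataset.
csv_path <- tempfile(fileext = ".csv")
export(toy, csv_path)

# import() picks a reader based on the file extension.
walmart <- import(csv_path)

# Assumes dates are stored as day-month-year strings.
walmart <- walmart %>%
  mutate(Date = as.Date(Date, format = "%d-%m-%Y"))

str(walmart)
```

Converting `Date` to a proper date type early makes later grouping by week, month, or holiday period simpler.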

Data Exploration and Visualization

  • Dataset Overview: The dataset contains 6435 rows and 8 columns. The columns are Store ID, Date, Weekly Sales, Holiday Flag, Temperature, Fuel Price, CPI, and Unemployment rate. The head function is used to display the first 10 rows of the dataset.
  • Summary Statistics: Initial exploratory data analysis (EDA) showed that weekly sales vary significantly across different stores, with some stores consistently outperforming others. The dataset also highlighted a notable spike in sales during the holiday season, particularly between Thanksgiving and Christmas.
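The overview and summary steps described above can be sketched roughly as follows. This is a minimal example on made-up toy data, assuming column names `Store`, `Holiday_Flag`, and `Weekly_Sales`; it is not the project's actual code, and the numbers do not come from the real dataset.

```r
library(dplyr)

# Toy data with made-up values; the real dataset has 6435 rows and 8 columns.
walmart <- data.frame(
  Store        = rep(1:2, each = 4),
  Holiday_Flag = rep(c(0, 0, 1, 0), times = 2),
  Weekly_Sales = c(100, 110, 150, 105, 200, 210, 280, 205)
)

# Dataset overview: dimensions and the first rows (up to 10).
dim(walmart)
head(walmart, 10)

# Per-store summary, sorted so the strongest stores come first.
store_summary <- walmart %>%
  group_by(Store) %>%
  summarise(
    mean_sales = mean(Weekly_Sales),
    sd_sales   = sd(Weekly_Sales)
  ) %>%
  arrange(desc(mean_sales))

# Holiday versus non-holiday weeks.
holiday_summary <- walmart %>%
  group_by(Holiday_Flag) %>%
  summarise(
    mean_sales = mean(Weekly_Sales),
    weeks      = n()
  )

print(store_summary)
print(holiday_summary)
```

Comparing the two summaries side by side is one simple way to check whether differences between stores or between holiday and non-holiday weeks are large relative to the week-to-week variation.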
Data Preprocessing

Data Visualization

Machine Learning Modeling

Conclusions

The project provides a comprehensive analysis of Walmart's weekly sales data, revealing key trends and patterns that can inform business decisions. The insights gained from the exploratory data analysis suggest that holidays and store location play a crucial role in driving sales.

The complete code for this R project can be found here.