SlideShare a Scribd company logo
1 of 36
Walmart Sales Prediction Using Rapidminer
Prepared by : Nagarjun Singharavelu
I. Introduction:
Wal-Mart Stores, Inc is an American Multinational retail
corporation that
operates a chain of discount department stores and Warehouse
Stores. Headquartered in
Bentonville, Arkansas, United States, the company was founded
by Sam Walton in 1962 and
incorporated on October 31, 1969. It has over 11,000 stores in
27 countries, under a total 71
banners. Walmart is the world's largest company by revenue,
according to the Fortune Global
500 list in 2014, as well as the biggest private employer in the
world with 2.2 million employees.
Walmart is a family-owned business, as the company is
controlled by the Walton family. Sam
Walton's heirs own over 50 percent of Walmart through their
holding company, Walton
Enterprises, and through their individual holdings. The company
was listed on the New York
Stock Exchange in 1972. In the late 1980s and early 1990s, the
company rose from a regional to
a national giant. By 1988, Walmart was the most profitable
retailer in the U.S. Walmart helps
individuals round the world economize and live better.
The main aim of our project is to identify the impact on sales
throughout
numerous strategic selections taken by the corporate. The
analysis is performed on historical
sales data across 45 Walmart stores located in different regions.
The foremost necessary is
Walmart runs many promotional markdown events throughout
the year and we have to check
the impact it creates on sales during that particular period. The
markdowns precede prominent
holidays, the four largest of which are the Labor Day,
Thanksgiving and Christmas. During these
weeks it is noted that there is a tremendous amount of change in
the day-to-day sales. Hence
we tend to apply different algorithms which we learnt in class
over this dataset to identify the
effect of markdowns on these holiday weeks.
II. Information about dataset:
We had taken four different datasets of Walmart from
Kaggle.com
containing the information about the stores, departments,
average temperature in that
particular region, CPI, day of the week, sales and mainly
indicating if that week was a
holiday. Let us explain each dataset in detail.
Stores:
we assume it to be
available in the particular
Train:
-02-
05 to 2012-11-
les in USD for the given department in the given
-
ce
they were not considered
Test:
ict the sales for each triplet of store and
Features:
average temperature in
that region which is measured which ranges from -
articular region is available in the
dataset from which we can
identify that if the fuel is higher in that particular region,
whether it affects the sales as
o measure the
changes in price level in that
-
promotional markdowns
which Walmart is running. There are 5 markdowns in that
particular duration in the
Among these 5 files, Features and train files contain every
necessary elements to do the
predication. So we are going to take advantage of them.
III. Data transformation & Preprocessing:
Our aim is to find Walmart’s weekly sales. We have all the
necessary data for
predicting the weekly sales in Feature data. But the previous
weekly sales report is in Train data
set. So we decided to combine all the data sets into a single data
set. After combining, we now
have the sales data of 45 stores with up to 99 departments which
consists of totally 421,000
records. Now our final data set has the following attributes.
-
Now we have 13 attributes, in which attributes such as
department number (Dept), Date and
Type are not required for our prediction, since it doesn’t have
any impact on its weekly sales.
Hence we are removing these attributes from our final data set.
After removing all the unwanted
data and attributes in our dataset we now have 6,435 records on
total.With our huge dataset, it is
difficult to get exact result about future sales from models
because of the uncertainty in so many
disturbing elements. So we decided to group the data and then
implementing with classifiers or
regression models. By using Discretize by Size operators, we
grouped the data based on
Weekly_Sales into 5 categories and then exported the output as
CSV file to desktop.
After processing, now check the data of weekly sales, we found
that the ranges are as
follows;
range1 [-? - 9144.495]
range2 [9144.495 - 14213.040]
range3 [14213.040 - 22267.010]
range4 [22267.010 - 39499.755]
range5 [39499.755 - ?]
Now we manually label the whole 6,435 records into 5 groups
each contain 1,287 records.
Groups Mark as Description
Group 1 A range1 [-? - 9144.495]
Group 2 B range2 [9144.495 - 14213.040]
Group 3 C range3 [14213.040 - 22267.010]
Group 4 D range4 [22267.010 - 39499.755]
Group 5 E range5 [39499.755 - ?]
IV. Mining models & Performance:
Our purpose is to build a model to predict Walmart’s weekly
sales in future, so we would
like to select the most suitable model for predication. Look into
the table, we find out the data is
structured. However, the parameters are complex and do not
show its rules obviously. Even
more, we have no idea about whether it is linear or not.
According to problems above, we determined to use Neural
Network Model with our project
because it has the characteristics below:
•It is for complicated prediction problems.
•Visualization or understanding of the rules are not needed.
•Accuracy is very important.
The design for implementing the Neural Network model in rapid
miner is as follows:
The output we obtained is as follows,
Accuracy = 77.61%, when Learning Rate = 2000 & Training
Cycles = 0.05
true A
true B
true C
true D
true E
class
precision
pred. A 939 49 83 36 95 78.12%
pred. B 17 1114 58 75 101 83.61%
pred. C 160 102 923 148 29 68.77%
pred. D 60 95 93 780 187 65.20%
pred. E 23 276 43 161 788 63.04%
67.67%
class recall 78.32% 68.09% 77.92% 65.00%
We tried to test our result by changing learning rate and training
cycle values and noted down
the changes too. The result obtained in this process is as
follows;
Accuracy = 72.69%, when Learning Rate = 1000 & Training
Cycles = 0.05
true A
true B
true C
true D
true E
class
precision
pred. A 1012 62 96 42 102 77.62%
pred. B 23 1193 58 76 120 84.61%
pred. C
184
114
1025
181
28
68.91%
pred. D 69 124 104 842 209 64.89%
pred. E 34 306 37 179 861 61.89%
class recall 76.72% 69.31% 78.65% 63.79% 67.23%
Accuracy = 75.35%, when Learning Rate = 1000 & Training
Cycles = 0.3
class
true A
true B
true C
true D
true E
precision
pred. A 908 86 114 56 108 74.38%
pred. B 25 1033 50 64 83 86.31%
pred. C 164 123 895 176 40 64.02%
pred. D 73 117 94 732 189 62.75%
pred. E 29 277 47 172 780 63.07%
class recall 77.03% 65.24% 77.28% 64.42% 67.02%
Since weekly sales is the core for any industry. So it is
necessary to get the more accurate
results, since it have direct impact on companies revenue. To
achieve better performance and
accuracy results, we have now increased the hidden layer size to
10.
We now combined all the hidden layer results obtained from
each node in a single table to check
where the weekly sales have a greater impact.
Attributes Node 1 Node 2
Store -30.224 27.949
Temperature -0.192 0.68
Fuel_Price 1.054 1.459
MarkDown1 0.114 -2.679
MarkDown2 0.88 0.476
MarkDown3 9.209 -0.635
Node 3 Node 4 Node 5 Node 6 Node 7 Node 8 Node 9 Node 10
-20.9 -31.401 5.072 -17.01 40.862 27.088 18.266 16.821
-0.965 -0.601 0.038 -0.847 -0.436 -1.513 -1.094 0.126
-0.46 0.632 -1.242 -1.713 0.698 1.303 2.589 -0.007
-0.24 -0.636 -4.109 -1.322 1.004 0.648 -1.147 -2.402
-0.03 2.65 -0.387 -1.306 0.328 0.633 1.983 -0.821
0.631 7.522 -6.247 -2.194 -6.743 -0.075 1.157 6.285
MarkDown4 -0.613 -0.554 0.131 1.257 -2.967 -0.445 0.753
0.691 -0.115 0.599
MarkDown5 0.41 -3.728 -3.697 1.786 -14.643 -2.111 0.341 -
0.999 -7.048 -0.908
CPI -6.316 29.004 -12.179 -12.975 0.568 11.217 -27.856 -
15.442 -5.764 1.045
Unemployment -12.906 -17.876 4.064 -1.446 8.895 3.908 7.056
12.174 -31.068 1.142
IsHoliday 0.056 0 -0.191 -0.484 -0.068 -0.244 -0.151 -0.27 -
0.036 -0.025
Bias -13.162 -10.766 -9.052 -8.572 -25.596 9.488 -10.875
20.103 -2.328 -9.733
The hidden layer network for this dataset will look like:
Other Models:
We tried to analyze data with more models other than Neural
Network. But we find out that
Neural Network model is the most suitable one for Walmart
dataset. Other models cannot
work on this example.
Naïve Bayes:
Design:
Output:
Accuracy = 23.42%
true A true B true C true D true D
class
precision
pred. A 8190 0 0 0 0 100.00%
pred. B 19795 23905 23910 23905 23909 20.71%
pred. C 15895 19128 19124 19128 19126 20.70%
pred. D 0 0 0 0 0 0.00%
pred. E 3917 4782 4781 4782 4780 20.74%
class
recall
17.13% 49.99% 40.00% 0.00% 10.00%
K-NN:
Design:
Output:
When K = 1, Accuracy = 28.76%
true A true B true C true D true E
class
precision
pred. A 385 207 373 180 199 28.65%
pred. B 190 619 259 311 301 36.85%
pred. C 352 271 187 225 188 15.29%
pred. D 113 267 124 180 210 20.13%
pred. E 159 272 257 304 302 23.34%
class
32.11% 37.84% 15.58% 15.00% 25.17%
recall
When K = 10, Accuracy = 31.06%
true A true B true C true D true E
class
precision
pred. A 429 234 403 168 206 29.79%
pred. B 272 834 288 421 428 37.18%
pred. C 219 123 186 97 213 24.20%
pred. D 195 266 162 347 281 28.64%
pred. E 84 179 161 167 72 10.86%
class
35.78% 50.98% 15.50% 28.92% 6.00%
recall
Decision Tree:
No matter what the parameters are, the Accuracy is 25.42% and
never change.
true D true S true C true B true A
class
precision
pred. D 0 0 0 0 0 0.00%
pred. S 1199 1636 1200 1200 1200 25.42%
pred. C 0 0 0 0 0 0.00%
pred. B 0 0 0 0 0 0.00%
pred. A 0 0 0 0 0 0.00%
class
0.00% 100.00% 0.00% 0.00% 0.00%
recall
As we show above, Naïve Bayes Model products the lowest
Accuracy
(18.63%). Abstractly, the Naïve Bayes model is a conditional
model like this:
P (Sales | Store, Data, Temperature … & IsHoliday)
=
P(Sales)*P(Store, Data, Temperature, … & IsHoliday | Sales)
P(Store, Data, Temperature, … & IsHoliday)
Because variable Store, Data to IsHoliday are independent on
each other, so:
P (Store, Date, Temperature … & IsHoliday) =P (Store)*P
(Date)*…..*P (IsHoliday)
Due to so many numbers in columns Store, Date … IsHoliday
that do not repeat, the
probability of each Variables is too small. So P (Store)*P
(Date)*…..*P (IsHoliday) will be far
lower than 1/6435. This means the probability of sales basing
such a model is infeasible.
V. Interpretation:
As a result, we find out that how each parameter affects the
weekly sales. MarkDown 1 to 5
has the highest weight as 9 to -16 which mean it really makes an
enormous impact on the sales.
Promotion will increase weekly sales remarkably. Fuel price
and temperature also makes a
positive impact, higher price and temperature makes higher
sales. CPI and Unemployment rate
having a heavy negative impact on the prospects of sales. The
higher CPI and unemployment
rate, the less weekly sales. Holidays affect weekly sales
slightly. We think customers don’t care
whether today is holiday or not, the only reason they buy items
is promotion. Therefore,
Promotion drives sales and revenue to increase.
VI. Conclusion:
The main objective was to model the effects of markdown
events throughout the year and it
was successfully measured using various algorithms. From the
analysis it is noted that MarkDown 1
to 5 has the highest weight as 16, which means it really makes
an enormous impact on the sales.
Promotion will increase weekly sales remarkably. Fuel price
and temperature also makes a positive
impact, higher price makes higher sales. CPI and
Unemployment rate has a heavy negative impact
on the prospects of sales. The higher CPI and unemployment
rate, the less is the weekly sales.
Holidays affect weekly sales slightly. We felt customers didn’t
care whether it was a holiday or not
because the only reason they buy items is due to promotion. It is
notable that they are attracted to
various deals in departmental stores all over the world.
References:
1. https://www.kaggle.com/c/walmart-recruiting-store-sales-
forecasting/data
2. http://www.walmart.com/
3. http://en.wikipedia.org/wiki/Walmart
https://www.kaggle.com/c/walmart-recruiting-store-sales-
forecasting/data
http://www.walmart.com/
http://en.wikipedia.org/wiki/Walmart
Allstate Purchase Prediction Challenge
Predict a purchased policy based on transaction history
As a customer shops an insurance policy, he/she will receive a
number of quotes with different coverage options before
purchasing a plan. This is represented in this challenge as a
series of rows that include a customer ID, information about the
customer, information about the quoted policy, and the cost.
Your task is to predict the purchased coverage options using a
limited subset of the total interaction history. If the eventual
purchase can be predicted sooner in the shopping window, the
quoting process is shortened and the issuer is less likely to lose
the customer's business.
Using a customer’s shopping history, can you predict what
policy they will end up choosing?
Files
test_v2.csv.zip
train.csv.zip
Data Introduction
The training and test sets contain transaction history for
customers that ended up purchasing a policy. For each
customer_ID, you are given their quote history. In the training
set you have the entire quote history, the last row of which
contains the coverage options they purchased. In the test set,
you have only a partial history of the quotes and do not have the
purchased coverage options. These are truncated to certain
lengths to simulate making predictions with less history (higher
uncertainty) or more history (lower uncertainty).
For each customer_ID in the test set, you must predict the seven
coverage options they end up purchasing.
What is a customer?
Each customer has many shopping points, where a shopping
point is defined by a customer with certain characteristics
viewing a product and its associated cost at a particular time.
· Some customer characteristics may change over time (e.g. as
the customer changes or provides new information), and the cost
depends on both the product and the customer characteristics.
· A customer may represent a collection of people, as policies
can cover more than one person.
· A customer may purchase a product that was not viewed!
Product Options
Each product has 7 customizable options selected by customers,
each with 2, 3, or 4 ordinal values possible:
A product is simply a vector with length 7 whose values are
chosen from each of the options listed above. The cost of a
product is a function of both the product options and customer
characteristics.
Variable Descriptions
customer_ID - A unique identifier for the customer
shopping_pt - Unique identifier for the shopping point of a
given customer
record_type - 0=shopping point, 1=purchase point
day - Day of the week (0-6, 0=Monday)
time - Time of day (HH:MM)
state - State where shopping point occurred
location - Location ID where shopping point occurred
group_size - How many people will be covered under the policy
(1, 2, 3 or 4)
homeowner - Whether the customer owns a home or not (0=no,
1=yes)
car_age - Age of the customer’s car
car_value - How valuable was the customer’s car when new
risk_factor - An ordinal assessment of how risky the customer is
(1, 2, 3, 4)
age_oldest - Age of the oldest person in customer's group
age_youngest - Age of the youngest person in customer’s group
married_couple - Does the customer group contain a married
couple (0=no, 1=yes)
C_previous - What the customer formerly had or currently has
for product option C (0=nothing, 1, 2, 3,4)
duration_previous - how long (in years) the customer was
covered by their previous issuer
A,B,C,D,E,F,G - the coverage options
cost - cost of the quoted coverage options
Data Mining Steps
Problem Definition
Market Analysis
Customer Profiling, Identifying Customer Requirements, Cross
Market Analysis, Target Marketing, Determining Customer
purchasing pattern
Corporate Analysis and Risk Management
Finance Planning and Asset Evaluation, Resource Planning,
Competition
Fraud Detection
Customer Retention
Production Control
Science Exploration
> Data Preparation
Data preparation is about constructing a dataset from one or
more data sources to be used for exploration and modeling. It is
a solid practice to start with an initial dataset to get familiar
with the data, to discover first insights into the data and have a
good understanding of any possible data quality issues. The
Datasets you are provided in these projects were obtained from
kaggle.com.
Variable selection and description
Numerical – Ratio, Interval
Categorical – Ordinal, Nominal
Simplifying variables: From continuous to discrete
Formatting the data
Basic data integrity checks: missing data, outliers
> Data Exploration
Data Exploration is about describing the data by means of
statistical and visualization techniques.
· Data Visualization:
· Univariate analysis explores variables (attributes) one by one.
Variables could be either categorical or numerical.
Univariate Analysis - Categorical
Statistics
Visualization
Description
Count
Bar Chart
The number of values of the specified variable.
Count%
Pie Chart
The percentage of values of the specified variable
Univariate Analysis - Numerical
Statistics
Visualization
Equation
Description
Count
Histogram
N
The number of values (observations) of the variable.
Minimum
Box Plot
Min
The smallest value of the variable.
Maximum
Box Plot
Max
The largest value of the variable.
Mean
Box Plot
The sum of the values divided by the count.
Median
Box Plot
The middle value. Below and above median lies an equal
number of values.
Mode
Histogram
The most frequent value. There can be more than one mode.
Quantile
Box Plot
A set of 'cut points' that divide a set of data into groups
containing equal numbers of values (Quartile, Quintile,
Percentile, ...).
Range
Box Plot
Max-Min
The difference between maximum and minimum.
Variance
Histogram
A measure of data dispersion.
Standard Deviation
Histogram
The square root of variance.
Coefficient of Deviation
Histogram
A measure of data dispersion divided by mean.
Skewness
Histogram
A measure of symmetry or asymmetry in the distribution of
data.
Kurtosis
Histogram
A measure of whether the data are peaked or flat relative to a
normal distribution.
Note: There are two types of numerical variables, interval and
ratio. An interval variable has values whose differences are
interpretable, but it does not have a true zero. A good example
is temperature in Centigrade degrees. Data on an interval scale
can be added and subtracted but cannot be meaningfully
multiplied or divided. For example, we cannot say that one day
is twice as hot as another day. In contrast, a ratio variable has
values with a true zero and can be added, subtracted, multiplied
or divided (e.g., weight).
· Bivariate analysis is the simultaneous analysis of two
variables (attributes). It explores the concept of relationship
between two variables, whether there exists an association and
the strength of this association.
There are three types of bivariate analysis.
1.Numerical & Numerical
Scatter Plot, Linear Correlation …
2.Categorical & Categorical
Stacked Column Chart, Combination Chart, Chi-square Test
3.Numerical & Categorical
Line Chart with Error Bars, Combination Chart, Z-test and t-test
> Modeling
· Predictive modeling is the process by which a model is created
to predict an outcome
· If the outcome is categorical it is called classification and if
the outcome is numerical it is called regression.
· Descriptive modeling or clustering is the assignment of
observations into clusters so that observations in the same
cluster are similar.
· Finally, association rules can find interesting associations
amongst observations.
Classification algorithms:
1. Frequency Table
· ZeroR, OneR, Naive Bayesian, Decision Tree
2. Covariance Matrix
· Linear Discriminant Analysis, Logistic Regression
3. Similarity Functions
· K Nearest Neighbors
4. Others
· Artificial Neural Network, Support Vector Machine
Regression
1. Frequency Table
· Decision Tree
2. Covariance Matrix
· Multiple Linear Regression
3. Similarity Function
· K Nearest Neighbors
4. Others
· Artificial Neural Network, Support Vector Machine
Clustering algorithms are:
1. Hierarchical
· Agglomerative, Divisive
2. Partitive
· K Means, Self-Organizing Map
> Evaluation
· helps to find the best model that represents our data and how
well the chosen model will work in the future. Hold-Out and
Cross-Validation
> Deployment
The concept of deployment in predictive data mining refers to
the application of a model for prediction to new data.

More Related Content

Similar to Walmart Sales Prediction Using Rapidminer Prepared by Naga.docx

Company segmentation - an approach with R
Company segmentation - an approach with RCompany segmentation - an approach with R
Company segmentation - an approach with RCasper Crause
 
ASMD 2022 for class.pptx
ASMD 2022 for class.pptxASMD 2022 for class.pptx
ASMD 2022 for class.pptxMahekSinghania2
 
Get a Richer View of Customers with Wave, the Salesforce Analytics Cloud
Get a Richer View of Customers with Wave, the Salesforce Analytics CloudGet a Richer View of Customers with Wave, the Salesforce Analytics Cloud
Get a Richer View of Customers with Wave, the Salesforce Analytics CloudSalesforce Marketing Cloud
 
Case Study on Data Analytics with given Dataset (Biswadeep Ghosh Hazra) - [Ha...
Case Study on Data Analytics with given Dataset (Biswadeep Ghosh Hazra) - [Ha...Case Study on Data Analytics with given Dataset (Biswadeep Ghosh Hazra) - [Ha...
Case Study on Data Analytics with given Dataset (Biswadeep Ghosh Hazra) - [Ha...Biswadeep Ghosh Hazra
 
How to Realize an Additional 270% ROI on Snowflake
How to Realize an Additional 270% ROI on SnowflakeHow to Realize an Additional 270% ROI on Snowflake
How to Realize an Additional 270% ROI on SnowflakeAtScale
 
Week 2 Individual Assignment 2 Quantitative Analysis of Credit - .docx
Week 2 Individual Assignment 2 Quantitative Analysis of Credit - .docxWeek 2 Individual Assignment 2 Quantitative Analysis of Credit - .docx
Week 2 Individual Assignment 2 Quantitative Analysis of Credit - .docxjessiehampson
 
Gross Profit Bidding for Ecommerce | SMX Virtual 2021
Gross Profit Bidding for Ecommerce | SMX Virtual 2021Gross Profit Bidding for Ecommerce | SMX Virtual 2021
Gross Profit Bidding for Ecommerce | SMX Virtual 2021Christopher Gutknecht
 
IRJET- Finding Optimal Skyline Product Combinations Under Price Promotion
IRJET- Finding Optimal Skyline Product Combinations Under Price PromotionIRJET- Finding Optimal Skyline Product Combinations Under Price Promotion
IRJET- Finding Optimal Skyline Product Combinations Under Price PromotionIRJET Journal
 
IRJET- Retail Chain Sales Analysis and Forecasting
IRJET-  	  Retail Chain Sales Analysis and ForecastingIRJET-  	  Retail Chain Sales Analysis and Forecasting
IRJET- Retail Chain Sales Analysis and ForecastingIRJET Journal
 
SmartHelmet Business Plan Fall 2014
SmartHelmet Business Plan Fall 2014SmartHelmet Business Plan Fall 2014
SmartHelmet Business Plan Fall 2014Emily Tillo
 
Business and Data Analytics Collaborative April Meetup
Business and Data Analytics Collaborative April MeetupBusiness and Data Analytics Collaborative April Meetup
Business and Data Analytics Collaborative April MeetupKen Tucker
 
Essbase Calculations A Visual Approach KScope 2010
Essbase Calculations A Visual Approach KScope 2010Essbase Calculations A Visual Approach KScope 2010
Essbase Calculations A Visual Approach KScope 2010Ron Moore
 
Transforming big data into supply chain analytics
Transforming big data into supply chain analyticsTransforming big data into supply chain analytics
Transforming big data into supply chain analyticsTristan Wiggill
 
Fundamentals Of Data Mining 2010
Fundamentals Of Data Mining 2010Fundamentals Of Data Mining 2010
Fundamentals Of Data Mining 2010Jim Stafford
 
Fundamentals Of Data Mining 2010
Fundamentals Of Data Mining 2010Fundamentals Of Data Mining 2010
Fundamentals Of Data Mining 2010Jim Stafford
 
Fundamentals Of Data Mining 2010
Fundamentals Of Data Mining 2010Fundamentals Of Data Mining 2010
Fundamentals Of Data Mining 2010Jim Stafford
 
Fundamentals Of Data Mining 2010
Fundamentals Of Data Mining 2010Fundamentals Of Data Mining 2010
Fundamentals Of Data Mining 2010Jim Stafford
 
Statistics project2
Statistics project2Statistics project2
Statistics project2shri1984
 
Final Presentation from Chester Group Rev 0
Final Presentation from Chester Group Rev 0Final Presentation from Chester Group Rev 0
Final Presentation from Chester Group Rev 0Steven Quenzel
 

Similar to Walmart Sales Prediction Using Rapidminer Prepared by Naga.docx (20)

Company segmentation - an approach with R
Company segmentation - an approach with RCompany segmentation - an approach with R
Company segmentation - an approach with R
 
ASMD 2022 for class.pptx
ASMD 2022 for class.pptxASMD 2022 for class.pptx
ASMD 2022 for class.pptx
 
Get a Richer View of Customers with Wave, the Salesforce Analytics Cloud
Get a Richer View of Customers with Wave, the Salesforce Analytics CloudGet a Richer View of Customers with Wave, the Salesforce Analytics Cloud
Get a Richer View of Customers with Wave, the Salesforce Analytics Cloud
 
Statistics homework help
Statistics homework helpStatistics homework help
Statistics homework help
 
Case Study on Data Analytics with given Dataset (Biswadeep Ghosh Hazra) - [Ha...
Case Study on Data Analytics with given Dataset (Biswadeep Ghosh Hazra) - [Ha...Case Study on Data Analytics with given Dataset (Biswadeep Ghosh Hazra) - [Ha...
Case Study on Data Analytics with given Dataset (Biswadeep Ghosh Hazra) - [Ha...
 
How to Realize an Additional 270% ROI on Snowflake
How to Realize an Additional 270% ROI on SnowflakeHow to Realize an Additional 270% ROI on Snowflake
How to Realize an Additional 270% ROI on Snowflake
 
Week 2 Individual Assignment 2 Quantitative Analysis of Credit - .docx
Week 2 Individual Assignment 2 Quantitative Analysis of Credit - .docxWeek 2 Individual Assignment 2 Quantitative Analysis of Credit - .docx
Week 2 Individual Assignment 2 Quantitative Analysis of Credit - .docx
 
Gross Profit Bidding for Ecommerce | SMX Virtual 2021
Gross Profit Bidding for Ecommerce | SMX Virtual 2021Gross Profit Bidding for Ecommerce | SMX Virtual 2021
Gross Profit Bidding for Ecommerce | SMX Virtual 2021
 
IRJET- Finding Optimal Skyline Product Combinations Under Price Promotion
IRJET- Finding Optimal Skyline Product Combinations Under Price PromotionIRJET- Finding Optimal Skyline Product Combinations Under Price Promotion
IRJET- Finding Optimal Skyline Product Combinations Under Price Promotion
 
IRJET- Retail Chain Sales Analysis and Forecasting
IRJET-  	  Retail Chain Sales Analysis and ForecastingIRJET-  	  Retail Chain Sales Analysis and Forecasting
IRJET- Retail Chain Sales Analysis and Forecasting
 
SmartHelmet Business Plan Fall 2014
SmartHelmet Business Plan Fall 2014SmartHelmet Business Plan Fall 2014
SmartHelmet Business Plan Fall 2014
 
Business and Data Analytics Collaborative April Meetup
Business and Data Analytics Collaborative April MeetupBusiness and Data Analytics Collaborative April Meetup
Business and Data Analytics Collaborative April Meetup
 
Essbase Calculations A Visual Approach KScope 2010
Essbase Calculations A Visual Approach KScope 2010Essbase Calculations A Visual Approach KScope 2010
Essbase Calculations A Visual Approach KScope 2010
 
Transforming big data into supply chain analytics
Transforming big data into supply chain analyticsTransforming big data into supply chain analytics
Transforming big data into supply chain analytics
 
Fundamentals Of Data Mining 2010
Fundamentals Of Data Mining 2010Fundamentals Of Data Mining 2010
Fundamentals Of Data Mining 2010
 
Fundamentals Of Data Mining 2010
Fundamentals Of Data Mining 2010Fundamentals Of Data Mining 2010
Fundamentals Of Data Mining 2010
 
Fundamentals Of Data Mining 2010
Fundamentals Of Data Mining 2010Fundamentals Of Data Mining 2010
Fundamentals Of Data Mining 2010
 
Fundamentals Of Data Mining 2010
Fundamentals Of Data Mining 2010Fundamentals Of Data Mining 2010
Fundamentals Of Data Mining 2010
 
Statistics project2
Statistics project2Statistics project2
Statistics project2
 
Final Presentation from Chester Group Rev 0
Final Presentation from Chester Group Rev 0Final Presentation from Chester Group Rev 0
Final Presentation from Chester Group Rev 0
 

More from celenarouzie

Attaining ExpertiseYou are tr.docx
Attaining ExpertiseYou are tr.docxAttaining ExpertiseYou are tr.docx
Attaining ExpertiseYou are tr.docxcelenarouzie
 
attachment Chloe” is a example of the whole packet. Please follow t.docx
attachment Chloe” is a example of the whole packet. Please follow t.docxattachment Chloe” is a example of the whole packet. Please follow t.docx
attachment Chloe” is a example of the whole packet. Please follow t.docxcelenarouzie
 
AttachmentFor this discussionUse Ericksons theoretic.docx
AttachmentFor this discussionUse Ericksons theoretic.docxAttachmentFor this discussionUse Ericksons theoretic.docx
AttachmentFor this discussionUse Ericksons theoretic.docxcelenarouzie
 
Attachment Programs and Families Working Together Learn.docx
Attachment Programs and Families Working Together Learn.docxAttachment Programs and Families Working Together Learn.docx
Attachment Programs and Families Working Together Learn.docxcelenarouzie
 
Attachment and Emotional Development in InfancyThe purpose o.docx
Attachment and Emotional Development in InfancyThe purpose o.docxAttachment and Emotional Development in InfancyThe purpose o.docx
Attachment and Emotional Development in InfancyThe purpose o.docxcelenarouzie
 
ATTACHEMENT from 7.1 and 7.2 Go back to the Powerpoint for thi.docx
ATTACHEMENT from 7.1 and 7.2 Go back to the Powerpoint for thi.docxATTACHEMENT from 7.1 and 7.2 Go back to the Powerpoint for thi.docx
ATTACHEMENT from 7.1 and 7.2 Go back to the Powerpoint for thi.docxcelenarouzie
 
Attached the dataset Kaggle has hosted a data science competitio.docx
Attached the dataset Kaggle has hosted a data science competitio.docxAttached the dataset Kaggle has hosted a data science competitio.docx
Attached the dataset Kaggle has hosted a data science competitio.docxcelenarouzie
 
Attached you will find all of the questions.These are just like th.docx
Attached you will find all of the questions.These are just like th.docxAttached you will find all of the questions.These are just like th.docx
Attached you will find all of the questions.These are just like th.docxcelenarouzie
 
Attached the dataset Kaggle has hosted a data science compet.docx
Attached the dataset Kaggle has hosted a data science compet.docxAttached the dataset Kaggle has hosted a data science compet.docx
Attached the dataset Kaggle has hosted a data science compet.docxcelenarouzie
 
B. Answer Learning Exercises  Matching words parts 1, 2, 3,.docx
B. Answer Learning Exercises  Matching words parts 1, 2, 3,.docxB. Answer Learning Exercises  Matching words parts 1, 2, 3,.docx
B. Answer Learning Exercises  Matching words parts 1, 2, 3,.docxcelenarouzie
 
B)What is Joe waiting for in order to forgive Missy May in The Gild.docx
B)What is Joe waiting for in order to forgive Missy May in The Gild.docxB)What is Joe waiting for in order to forgive Missy May in The Gild.docx
B)What is Joe waiting for in order to forgive Missy May in The Gild.docxcelenarouzie
 
B)Blanche and Stella both view Stanley very differently – how do the.docx
B)Blanche and Stella both view Stanley very differently – how do the.docxB)Blanche and Stella both view Stanley very differently – how do the.docx
B)Blanche and Stella both view Stanley very differently – how do the.docxcelenarouzie
 
b) What is the largest value that can be represented by 3 digits usi.docx
b) What is the largest value that can be represented by 3 digits usi.docxb) What is the largest value that can be represented by 3 digits usi.docx
b) What is the largest value that can be represented by 3 digits usi.docxcelenarouzie
 
b$ E=EE#s{gEgE lEgEHEFs ig=ii 5i= l; i € 3 r i.Er1 b €€.docx
b$ E=EE#s{gEgE lEgEHEFs ig=ii 5i= l; i € 3 r  i.Er1 b €€.docxb$ E=EE#s{gEgE lEgEHEFs ig=ii 5i= l; i € 3 r  i.Er1 b €€.docx
b$ E=EE#s{gEgE lEgEHEFs ig=ii 5i= l; i € 3 r i.Er1 b €€.docxcelenarouzie
 
B A S I C L O G I C M O D E L D E V E L O P M E N T Pr.docx
B A S I C  L O G I C  M O D E L  D E V E L O P M E N T  Pr.docxB A S I C  L O G I C  M O D E L  D E V E L O P M E N T  Pr.docx
B A S I C L O G I C M O D E L D E V E L O P M E N T Pr.docxcelenarouzie
 
B H1. The first issue that jumped out to me is that the presiden.docx
B H1. The first issue that jumped out to me is that the presiden.docxB H1. The first issue that jumped out to me is that the presiden.docx
B H1. The first issue that jumped out to me is that the presiden.docxcelenarouzie
 
b l u e p r i n t i CONSUMER PERCEPTIONSHQW DQPerception.docx
b l u e p r i n t i CONSUMER PERCEPTIONSHQW DQPerception.docxb l u e p r i n t i CONSUMER PERCEPTIONSHQW DQPerception.docx
b l u e p r i n t i CONSUMER PERCEPTIONSHQW DQPerception.docxcelenarouzie
 
B R O O K I N G SM E T R O P O L I TA N P O L I CY .docx
B R O O K I N G SM E T R O P O L I TA N P O L I CY .docxB R O O K I N G SM E T R O P O L I TA N P O L I CY .docx
B R O O K I N G SM E T R O P O L I TA N P O L I CY .docxcelenarouzie
 
B L O C K C H A I N & S U P P LY C H A I N SS U N I L.docx
B L O C K C H A I N  &  S U P P LY  C H A I N SS U N I L.docxB L O C K C H A I N  &  S U P P LY  C H A I N SS U N I L.docx
B L O C K C H A I N & S U P P LY C H A I N SS U N I L.docxcelenarouzie
 
Año 15, núm. 43 enero – abril de 2012. Análisis 97 Orien.docx
Año 15, núm. 43  enero – abril de 2012. Análisis 97 Orien.docxAño 15, núm. 43  enero – abril de 2012. Análisis 97 Orien.docx
Año 15, núm. 43 enero – abril de 2012. Análisis 97 Orien.docxcelenarouzie
 

More from celenarouzie (20)

Attaining ExpertiseYou are tr.docx
Attaining ExpertiseYou are tr.docxAttaining ExpertiseYou are tr.docx
Attaining ExpertiseYou are tr.docx
 
attachment Chloe” is a example of the whole packet. Please follow t.docx
attachment Chloe” is a example of the whole packet. Please follow t.docxattachment Chloe” is a example of the whole packet. Please follow t.docx
attachment Chloe” is a example of the whole packet. Please follow t.docx
 
AttachmentFor this discussionUse Ericksons theoretic.docx
AttachmentFor this discussionUse Ericksons theoretic.docxAttachmentFor this discussionUse Ericksons theoretic.docx
AttachmentFor this discussionUse Ericksons theoretic.docx
 
Attachment Programs and Families Working Together Learn.docx
Attachment Programs and Families Working Together Learn.docxAttachment Programs and Families Working Together Learn.docx
Attachment Programs and Families Working Together Learn.docx
 
Attachment and Emotional Development in InfancyThe purpose o.docx
Attachment and Emotional Development in InfancyThe purpose o.docxAttachment and Emotional Development in InfancyThe purpose o.docx
Attachment and Emotional Development in InfancyThe purpose o.docx
 
ATTACHEMENT from 7.1 and 7.2 Go back to the Powerpoint for thi.docx
ATTACHEMENT from 7.1 and 7.2 Go back to the Powerpoint for thi.docxATTACHEMENT from 7.1 and 7.2 Go back to the Powerpoint for thi.docx
ATTACHEMENT from 7.1 and 7.2 Go back to the Powerpoint for thi.docx
 
Attached the dataset Kaggle has hosted a data science competitio.docx
Attached the dataset Kaggle has hosted a data science competitio.docxAttached the dataset Kaggle has hosted a data science competitio.docx
Attached the dataset Kaggle has hosted a data science competitio.docx
 
Attached you will find all of the questions.These are just like th.docx
Attached you will find all of the questions.These are just like th.docxAttached you will find all of the questions.These are just like th.docx
Attached you will find all of the questions.These are just like th.docx
 
Attached the dataset Kaggle has hosted a data science compet.docx
Attached the dataset Kaggle has hosted a data science compet.docxAttached the dataset Kaggle has hosted a data science compet.docx
Attached the dataset Kaggle has hosted a data science compet.docx
 
B. Answer Learning Exercises  Matching words parts 1, 2, 3,.docx
B. Answer Learning Exercises  Matching words parts 1, 2, 3,.docxB. Answer Learning Exercises  Matching words parts 1, 2, 3,.docx
B. Answer Learning Exercises  Matching words parts 1, 2, 3,.docx
 
B)What is Joe waiting for in order to forgive Missy May in The Gild.docx
B)What is Joe waiting for in order to forgive Missy May in The Gild.docxB)What is Joe waiting for in order to forgive Missy May in The Gild.docx
B)What is Joe waiting for in order to forgive Missy May in The Gild.docx
 
B)Blanche and Stella both view Stanley very differently – how do the.docx
B)Blanche and Stella both view Stanley very differently – how do the.docxB)Blanche and Stella both view Stanley very differently – how do the.docx
B)Blanche and Stella both view Stanley very differently – how do the.docx
 
b) What is the largest value that can be represented by 3 digits usi.docx
b) What is the largest value that can be represented by 3 digits usi.docxb) What is the largest value that can be represented by 3 digits usi.docx
b) What is the largest value that can be represented by 3 digits usi.docx
 
b$ E=EE#s{gEgE lEgEHEFs ig=ii 5i= l; i € 3 r i.Er1 b €€.docx
b$ E=EE#s{gEgE lEgEHEFs ig=ii 5i= l; i € 3 r  i.Er1 b €€.docxb$ E=EE#s{gEgE lEgEHEFs ig=ii 5i= l; i € 3 r  i.Er1 b €€.docx
b$ E=EE#s{gEgE lEgEHEFs ig=ii 5i= l; i € 3 r i.Er1 b €€.docx
 
B A S I C L O G I C M O D E L D E V E L O P M E N T Pr.docx
B A S I C  L O G I C  M O D E L  D E V E L O P M E N T  Pr.docxB A S I C  L O G I C  M O D E L  D E V E L O P M E N T  Pr.docx
B A S I C L O G I C M O D E L D E V E L O P M E N T Pr.docx
 
B H1. The first issue that jumped out to me is that the presiden.docx
B H1. The first issue that jumped out to me is that the presiden.docxB H1. The first issue that jumped out to me is that the presiden.docx
B H1. The first issue that jumped out to me is that the presiden.docx
 
b l u e p r i n t i CONSUMER PERCEPTIONSHQW DQPerception.docx
b l u e p r i n t i CONSUMER PERCEPTIONSHQW DQPerception.docxb l u e p r i n t i CONSUMER PERCEPTIONSHQW DQPerception.docx
b l u e p r i n t i CONSUMER PERCEPTIONSHQW DQPerception.docx
 
B R O O K I N G SM E T R O P O L I TA N P O L I CY .docx
B R O O K I N G SM E T R O P O L I TA N P O L I CY .docxB R O O K I N G SM E T R O P O L I TA N P O L I CY .docx
B R O O K I N G SM E T R O P O L I TA N P O L I CY .docx
 
B L O C K C H A I N & S U P P LY C H A I N SS U N I L.docx
B L O C K C H A I N  &  S U P P LY  C H A I N SS U N I L.docxB L O C K C H A I N  &  S U P P LY  C H A I N SS U N I L.docx
B L O C K C H A I N & S U P P LY C H A I N SS U N I L.docx
 
Año 15, núm. 43 enero – abril de 2012. Análisis 97 Orien.docx
Año 15, núm. 43  enero – abril de 2012. Análisis 97 Orien.docxAño 15, núm. 43  enero – abril de 2012. Análisis 97 Orien.docx
Año 15, núm. 43 enero – abril de 2012. Análisis 97 Orien.docx
 

Recently uploaded

Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 
Biting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfBiting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfadityarao40181
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxAvyJaneVismanos
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...jaredbarbolino94
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 

Recently uploaded (20)

Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
 
Biting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfBiting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdf
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 

Walmart Sales Prediction Using Rapidminer Prepared by Naga.docx

  • 1. Walmart Sales Prediction Using Rapidminer Prepared by : Nagarjun Singharavelu I. Introduction: Wal-Mart Stores, Inc is an American Multinational retail corporation that operates a chain of discount department stores and Warehouse Stores. Headquartered in Bentonville, Arkansas, United States, the company was founded by Sam Walton in 1962 and incorporated on October 31, 1969. It has over 11,000 stores in 27 countries, under a total 71 banners. Walmart is the world's largest company by revenue, according to the Fortune Global 500 list in 2014, as well as the biggest private employer in the world with 2.2 million employees. Walmart is a family-owned business, as the company is controlled by the Walton family. Sam Walton's heirs own over 50 percent of Walmart through their holding company, Walton
  • 2. Enterprises, and through their individual holdings. The company was listed on the New York Stock Exchange in 1972. In the late 1980s and early 1990s, the company rose from a regional to a national giant. By 1988, Walmart was the most profitable retailer in the U.S. Walmart helps individuals round the world economize and live better. The main aim of our project is to identify the impact on sales throughout numerous strategic selections taken by the corporate. The analysis is performed on historical sales data across 45 Walmart stores located in different regions. The foremost necessary is Walmart runs many promotional markdown events throughout the year and we have to check the impact it creates on sales during that particular period. The markdowns precede prominent holidays, the four largest of which are the Labor Day, Thanksgiving and Christmas. During these weeks it is noted that there is a tremendous amount of change in the day-to-day sales. Hence we tend to apply different algorithms which we learnt in class
  • 3. over this dataset to identify the effect of markdowns on these holiday weeks. II. Information about dataset: We had taken four different datasets of Walmart from Kaggle.com containing the information about the stores, departments, average temperature in that particular region, CPI, day of the week, sales and mainly indicating if that week was a holiday. Let us explain each dataset in detail. Stores:
  • 4. we assume it to be available in the particular Train: -02- 05 to 2012-11- les in USD for the given department in the given
  • 5. - ce they were not considered Test: ict the sales for each triplet of store and Features: average temperature in that region which is measured which ranges from - articular region is available in the dataset from which we can
  • 6. identify that if the fuel is higher in that particular region, whether it affects the sales as o measure the changes in price level in that - promotional markdowns which Walmart is running. There are 5 markdowns in that particular duration in the Among these 5 files, Features and train files contain every necessary elements to do the
  • 7. predication. So we are going to take advantage of them. III. Data transformation & Preprocessing: Our aim is to find Walmart’s weekly sales. We have all the necessary data for predicting the weekly sales in Feature data. But the previous weekly sales report is in Train data set. So we decided to combine all the data sets into a single data set. After combining, we now have the sales data of 45 stores with up to 99 departments which consists of totally 421,000 records. Now our final data set has the following attributes. -
  • 8. Now we have 13 attributes, in which attributes such as department number (Dept), Date and Type are not required for our prediction, since it doesn’t have any impact on its weekly sales. Hence we are removing these attributes from our final data set. After removing all the unwanted data and attributes in our dataset we now have 6,435 records on total.With our huge dataset, it is difficult to get exact result about future sales from models because of the uncertainty in so many disturbing elements. So we decided to group the data and then implementing with classifiers or regression models. By using Discretize by Size operators, we grouped the data based on Weekly_Sales into 5 categories and then exported the output as CSV file to desktop. After processing, now check the data of weekly sales, we found that the ranges are as follows;
  • 9. range1 [-? - 9144.495] range2 [9144.495 - 14213.040] range3 [14213.040 - 22267.010] range4 [22267.010 - 39499.755] range5 [39499.755 - ?] Now we manually label the whole 6,435 records into 5 groups each contain 1,287 records. Groups Mark as Description Group 1 A range1 [-? - 9144.495] Group 2 B range2 [9144.495 - 14213.040] Group 3 C range3 [14213.040 - 22267.010] Group 4 D range4 [22267.010 - 39499.755] Group 5 E range5 [39499.755 - ?]
  • 10. IV. Mining models & Performance: Our purpose is to build a model to predict Walmart’s weekly sales in future, so we would like to select the most suitable model for predication. Look into the table, we find out the data is structured. However, the parameters are complex and do not show its rules obviously. Even more, we have no idea about whether it is linear or not. According to problems above, we determined to use Neural Network Model with our project because it has the characteristics below: •It is for complicated prediction problems. •Visualization or understanding of the rules are not needed. •Accuracy is very important. The design for implementing the Neural Network model in rapid miner is as follows:
  • 11. The output we obtained is as follows, Accuracy = 77.61%, when Learning Rate = 2000 & Training Cycles = 0.05 true A true B true C true D true E class precision
  • 12. pred. A 939 49 83 36 95 78.12% pred. B 17 1114 58 75 101 83.61% pred. C 160 102 923 148 29 68.77% pred. D 60 95 93 780 187 65.20% pred. E 23 276 43 161 788 63.04% 67.67% class recall 78.32% 68.09% 77.92% 65.00% We tried to test our result by changing learning rate and training cycle values and noted down the changes too. The result obtained in this process is as follows;
  • 13. Accuracy = 72.69%, when Learning Rate = 1000 & Training Cycles = 0.05 true A true B true C true D true E class precision pred. A 1012 62 96 42 102 77.62%
  • 14. pred. B 23 1193 58 76 120 84.61% pred. C 184 114 1025 181 28 68.91% pred. D 69 124 104 842 209 64.89% pred. E 34 306 37 179 861 61.89%
  • 15. class recall 76.72% 69.31% 78.65% 63.79% 67.23% Accuracy = 75.35%, when Learning Rate = 1000 & Training Cycles = 0.3 class true A true B true C true D true E
  • 16. precision pred. A 908 86 114 56 108 74.38% pred. B 25 1033 50 64 83 86.31% pred. C 164 123 895 176 40 64.02% pred. D 73 117 94 732 189 62.75% pred. E 29 277 47 172 780 63.07% class recall 77.03% 65.24% 77.28% 64.42% 67.02%
  • 17. Since weekly sales is the core for any industry. So it is necessary to get the more accurate results, since it have direct impact on companies revenue. To achieve better performance and accuracy results, we have now increased the hidden layer size to 10. We now combined all the hidden layer results obtained from each node in a single table to check where the weekly sales have a greater impact. Attributes Node 1 Node 2 Store -30.224 27.949 Temperature -0.192 0.68 Fuel_Price 1.054 1.459 MarkDown1 0.114 -2.679 MarkDown2 0.88 0.476 MarkDown3 9.209 -0.635
  • 18. Node 3 Node 4 Node 5 Node 6 Node 7 Node 8 Node 9 Node 10 -20.9 -31.401 5.072 -17.01 40.862 27.088 18.266 16.821 -0.965 -0.601 0.038 -0.847 -0.436 -1.513 -1.094 0.126 -0.46 0.632 -1.242 -1.713 0.698 1.303 2.589 -0.007 -0.24 -0.636 -4.109 -1.322 1.004 0.648 -1.147 -2.402 -0.03 2.65 -0.387 -1.306 0.328 0.633 1.983 -0.821 0.631 7.522 -6.247 -2.194 -6.743 -0.075 1.157 6.285 MarkDown4 -0.613 -0.554 0.131 1.257 -2.967 -0.445 0.753 0.691 -0.115 0.599 MarkDown5 0.41 -3.728 -3.697 1.786 -14.643 -2.111 0.341 - 0.999 -7.048 -0.908 CPI -6.316 29.004 -12.179 -12.975 0.568 11.217 -27.856 - 15.442 -5.764 1.045 Unemployment -12.906 -17.876 4.064 -1.446 8.895 3.908 7.056 12.174 -31.068 1.142 IsHoliday 0.056 0 -0.191 -0.484 -0.068 -0.244 -0.151 -0.27 - 0.036 -0.025 Bias -13.162 -10.766 -9.052 -8.572 -25.596 9.488 -10.875 20.103 -2.328 -9.733
  • 19. The hidden layer network for this dataset will look like: Other Models: We tried to analyze data with more models other than Neural Network. But we find out that Neural Network model is the most suitable one for Walmart dataset. Other models cannot work on this example. Naïve Bayes:
  • 20. Design: Output: Accuracy = 23.42% true A true B true C true D true D class precision pred. A 8190 0 0 0 0 100.00% pred. B 19795 23905 23910 23905 23909 20.71% pred. C 15895 19128 19124 19128 19126 20.70%
  • 21. pred. D 0 0 0 0 0 0.00% pred. E 3917 4782 4781 4782 4780 20.74% class recall 17.13% 49.99% 40.00% 0.00% 10.00% K-NN: Design: Output:
  • 22. When K = 1, Accuracy = 28.76% true A true B true C true D true E class precision pred. A 385 207 373 180 199 28.65% pred. B 190 619 259 311 301 36.85% pred. C 352 271 187 225 188 15.29% pred. D 113 267 124 180 210 20.13% pred. E 159 272 257 304 302 23.34% class
  • 23. 32.11% 37.84% 15.58% 15.00% 25.17% recall When K = 10, Accuracy = 31.06% true A true B true C true D true E class precision pred. A 429 234 403 168 206 29.79% pred. B 272 834 288 421 428 37.18% pred. C 219 123 186 97 213 24.20%
  • 24. pred. D 195 266 162 347 281 28.64% pred. E 84 179 161 167 72 10.86% class 35.78% 50.98% 15.50% 28.92% 6.00% recall Decision Tree: No matter what the parameters are, the Accuracy is 25.42% and never change. true D true S true C true B true A
  • 25. class precision pred. D 0 0 0 0 0 0.00% pred. S 1199 1636 1200 1200 1200 25.42% pred. C 0 0 0 0 0 0.00% pred. B 0 0 0 0 0 0.00% pred. A 0 0 0 0 0 0.00% class 0.00% 100.00% 0.00% 0.00% 0.00% recall
  • 26. As we show above, Naïve Bayes Model products the lowest Accuracy (18.63%). Abstractly, the Naïve Bayes model is a conditional model like this: P (Sales | Store, Data, Temperature … & IsHoliday) = P(Sales)*P(Store, Data, Temperature, … & IsHoliday | Sales) P(Store, Data, Temperature, … & IsHoliday) Because variable Store, Data to IsHoliday are independent on each other, so: P (Store, Date, Temperature … & IsHoliday) =P (Store)*P (Date)*…..*P (IsHoliday) Due to so many numbers in columns Store, Date … IsHoliday
  • 27. that do not repeat, the probability of each Variables is too small. So P (Store)*P (Date)*…..*P (IsHoliday) will be far lower than 1/6435. This means the probability of sales basing such a model is infeasible. V. Interpretation: As a result, we find out that how each parameter affects the weekly sales. MarkDown 1 to 5 has the highest weight as 9 to -16 which mean it really makes an enormous impact on the sales. Promotion will increase weekly sales remarkably. Fuel price and temperature also makes a positive impact, higher price and temperature makes higher sales. CPI and Unemployment rate having a heavy negative impact on the prospects of sales. The higher CPI and unemployment rate, the less weekly sales. Holidays affect weekly sales slightly. We think customers don’t care whether today is holiday or not, the only reason they buy items is promotion. Therefore, Promotion drives sales and revenue to increase.
  • 28. VI. Conclusion: The main objective was to model the effects of markdown events throughout the year and it was successfully measured using various algorithms. From the analysis it is noted that MarkDown 1 to 5 has the highest weight as 16, which means it really makes an enormous impact on the sales. Promotion will increase weekly sales remarkably. Fuel price and temperature also makes a positive impact, higher price makes higher sales. CPI and Unemployment rate has a heavy negative impact on the prospects of sales. The higher CPI and unemployment rate, the less is the weekly sales. Holidays affect weekly sales slightly. We felt customers didn’t care whether it was a holiday or not because the only reason they buy items is due to promotion. It is notable that they are attracted to various deals in departmental stores all over the world. References: 1. https://www.kaggle.com/c/walmart-recruiting-store-sales- forecasting/data 2. http://www.walmart.com/ 3. http://en.wikipedia.org/wiki/Walmart https://www.kaggle.com/c/walmart-recruiting-store-sales- forecasting/data http://www.walmart.com/
  • 29. http://en.wikipedia.org/wiki/Walmart Allstate Purchase Prediction Challenge Predict a purchased policy based on transaction history As a customer shops an insurance policy, he/she will receive a number of quotes with different coverage options before purchasing a plan. This is represented in this challenge as a series of rows that include a customer ID, information about the customer, information about the quoted policy, and the cost. Your task is to predict the purchased coverage options using a limited subset of the total interaction history. If the eventual purchase can be predicted sooner in the shopping window, the quoting process is shortened and the issuer is less likely to lose the customer's business. Using a customer’s shopping history, can you predict what policy they will end up choosing? Files test_v2.csv.zip train.csv.zip Data Introduction The training and test sets contain transaction history for customers that ended up purchasing a policy. For each customer_ID, you are given their quote history. In the training set you have the entire quote history, the last row of which contains the coverage options they purchased. In the test set, you have only a partial history of the quotes and do not have the purchased coverage options. These are truncated to certain lengths to simulate making predictions with less history (higher uncertainty) or more history (lower uncertainty). For each customer_ID in the test set, you must predict the seven coverage options they end up purchasing. What is a customer? Each customer has many shopping points, where a shopping point is defined by a customer with certain characteristics viewing a product and its associated cost at a particular time.
  • 30. · Some customer characteristics may change over time (e.g. as the customer changes or provides new information), and the cost depends on both the product and the customer characteristics. · A customer may represent a collection of people, as policies can cover more than one person. · A customer may purchase a product that was not viewed! Product Options Each product has 7 customizable options selected by customers, each with 2, 3, or 4 ordinal values possible: A product is simply a vector with length 7 whose values are chosen from each of the options listed above. The cost of a product is a function of both the product options and customer characteristics. Variable Descriptions customer_ID - A unique identifier for the customer shopping_pt - Unique identifier for the shopping point of a given customer record_type - 0=shopping point, 1=purchase point day - Day of the week (0-6, 0=Monday) time - Time of day (HH:MM) state - State where shopping point occurred location - Location ID where shopping point occurred group_size - How many people will be covered under the policy (1, 2, 3 or 4) homeowner - Whether the customer owns a home or not (0=no, 1=yes) car_age - Age of the customer’s car car_value - How valuable was the customer’s car when new risk_factor - An ordinal assessment of how risky the customer is (1, 2, 3, 4) age_oldest - Age of the oldest person in customer's group age_youngest - Age of the youngest person in customer’s group married_couple - Does the customer group contain a married couple (0=no, 1=yes) C_previous - What the customer formerly had or currently has
  • 31. for product option C (0=nothing, 1, 2, 3,4) duration_previous - how long (in years) the customer was covered by their previous issuer A,B,C,D,E,F,G - the coverage options cost - cost of the quoted coverage options Data Mining Steps Problem Definition Market Analysis Customer Profiling, Identifying Customer Requirements, Cross Market Analysis, Target Marketing, Determining Customer purchasing pattern Corporate Analysis and Risk Management Finance Planning and Asset Evaluation, Resource Planning, Competition Fraud Detection Customer Retention Production Control Science Exploration > Data Preparation Data preparation is about constructing a dataset from one or more data sources to be used for exploration and modeling. It is a solid practice to start with an initial dataset to get familiar with the data, to discover first insights into the data and have a
  • 32. good understanding of any possible data quality issues. The Datasets you are provided in these projects were obtained from kaggle.com. Variable selection and description Numerical – Ratio, Interval Categorical – Ordinal, Nominal Simplifying variables: From continuous to discrete Formatting the data Basic data integrity checks: missing data, outliers > Data Exploration Data Exploration is about describing the data by means of statistical and visualization techniques. · Data Visualization: · Univariate analysis explores variables (attributes) one by one. Variables could be either categorical or numerical. Univariate Analysis - Categorical Statistics Visualization Description Count Bar Chart The number of values of the specified variable. Count% Pie Chart The percentage of values of the specified variable Univariate Analysis - Numerical Statistics Visualization Equation Description Count Histogram N The number of values (observations) of the variable.
  • 33. Minimum Box Plot Min The smallest value of the variable. Maximum Box Plot Max The largest value of the variable. Mean Box Plot The sum of the values divided by the count. Median Box Plot The middle value. Below and above median lies an equal number of values. Mode Histogram The most frequent value. There can be more than one mode. Quantile Box Plot A set of 'cut points' that divide a set of data into groups containing equal numbers of values (Quartile, Quintile, Percentile, ...). Range Box Plot Max-Min The difference between maximum and minimum. Variance Histogram A measure of data dispersion. Standard Deviation
  • 34. Histogram The square root of variance. Coefficient of Deviation Histogram A measure of data dispersion divided by mean. Skewness Histogram A measure of symmetry or asymmetry in the distribution of data. Kurtosis Histogram A measure of whether the data are peaked or flat relative to a normal distribution. Note: There are two types of numerical variables, interval and ratio. An interval variable has values whose differences are interpretable, but it does not have a true zero. A good example is temperature in Centigrade degrees. Data on an interval scale can be added and subtracted but cannot be meaningfully multiplied or divided. For example, we cannot say that one day is twice as hot as another day. In contrast, a ratio variable has values with a true zero and can be added, subtracted, multiplied or divided (e.g., weight). · Bivariate analysis is the simultaneous analysis of two variables (attributes). It explores the concept of relationship between two variables, whether there exists an association and the strength of this association. There are three types of bivariate analysis. 1.Numerical & Numerical Scatter Plot, Linear Correlation … 2.Categorical & Categorical
  • 35. Stacked Column Chart, Combination Chart, Chi-square Test 3.Numerical & Categorical Line Chart with Error Bars, Combination Chart, Z-test and t-test > Modeling · Predictive modeling is the process by which a model is created to predict an outcome · If the outcome is categorical it is called classification and if the outcome is numerical it is called regression. · Descriptive modeling or clustering is the assignment of observations into clusters so that observations in the same cluster are similar. · Finally, association rules can find interesting associations amongst observations. Classification algorithms: 1. Frequency Table · ZeroR, OneR, Naive Bayesian, Decision Tree 2. Covariance Matrix · Linear Discriminant Analysis, Logistic Regression 3. Similarity Functions · K Nearest Neighbors 4. Others · Artificial Neural Network, Support Vector Machine Regression 1. Frequency Table · Decision Tree 2. Covariance Matrix · Multiple Linear Regression 3. Similarity Function · K Nearest Neighbors 4. Others · Artificial Neural Network, Support Vector Machine Clustering algorithms are: 1. Hierarchical · Agglomerative, Divisive 2. Partitive
  • 36. · K Means, Self-Organizing Map > Evaluation · helps to find the best model that represents our data and how well the chosen model will work in the future. Hold-Out and Cross-Validation > Deployment The concept of deployment in predictive data mining refers to the application of a model for prediction to new data.