Historical Data on Avocado prices and Sales volume in US Markets
2. Overview:
• Data Resource
• Problem Definition
• Visualization
• Prediction – Tools to support data analysis
• Presenting findings
• Solving the problem framed in the beginning
3. Dataset: Avocado
Historical Data on Avocado prices and Sales volume in US Markets
• The retail sales data used for this analysis are based on scanner data collected and provided by the Hass Avocado Board.
• The data include total weekly retail sales in value and volume for fresh Hass avocados (aggregated across all relevant PLU codes) in 45 distinct local market areas and eight regions (53 cross-sectional observations in total) for the years 2015–2018.
• These data represent an aggregation of retail outlets that includes the following channels: grocery, mass merchandisers, club stores, drugstores, dollar outlets, and military commissaries.
• An average price, or unit value, is computed in each market and each week by dividing sales value by the number of fresh Hass avocados sold.
4. Columns:
• Date – the date of the observation
• AveragePrice – the average price of a single avocado
• Total Volume – total number of avocados sold
• 4046 – total number of avocados with PLU 4046 sold
• 4225 – total number of avocados with PLU 4225 sold
• 4770 – total number of avocados with PLU 4770 sold
• Total Bags
• Small Bags
• Large Bags
• XLarge Bags
• Type – conventional or organic
• Year – the year of the observation
• Region – the city or region of the observation
6. Our Problem and Roadmap, and WHERE we are!
Problem: Whether to import avocados for 2020 or not?
[Roadmap diagram. Descriptive Analytics: Data Visualization, Outliers, Text Mining, Clustering. Predictive Analytics: Regression. Prescriptive Analytics: Utility Theory, Optimization, Decision Analysis.]
7. Snapshot of our dataset after cleaning:
Shape: (18249, 14)
Null values: none
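A minimal pandas sketch of this cleaning step. The file name avocado.csv and the leftover unnamed index column are assumptions; only the final shape and the absence of nulls come from the slide.

```python
import pandas as pd

# Load the Hass Avocado Board retail scan data (file name assumed).
df = pd.read_csv("avocado.csv")

# Drop a leftover unnamed index column, if the export carried one over (assumed).
df = df.drop(columns=[c for c in df.columns if c.startswith("Unnamed")])

# Remove rows with missing values, then confirm the cleaned shape.
df = df.dropna()
print(df.shape)                 # slide reports (18249, 14)
print(df.isnull().sum().sum())  # slide reports no null values, so 0
```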
8. Snapshot of our dataset after mining the Date column:
• Converted the Date column to a datetime type and split it into Month and Day.
• Converted Type (organic or conventional) to a dummy variable.
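A sketch of the date mining and dummy encoding, continuing from the cleaned df above; column names follow slide 4, and the 1 = organic / 0 = conventional mapping is an assumption, since the slides don't show it.

```python
import pandas as pd

# Convert Date to a datetime type and split it into Month and Day.
df["Date"] = pd.to_datetime(df["Date"])
df["Month"] = df["Date"].dt.month
df["Day"] = df["Date"].dt.day

# Dummy-encode Type: 1 = organic, 0 = conventional (mapping assumed).
df["Type"] = (df["Type"] == "organic").astype(int)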
9. Our Problem and Roadmap, and WHERE we are!
Problem: Whether to import avocados for 2020 or not?
[Roadmap diagram repeated to mark progress; see slide 6.]
10. Which type of avocado is more in demand (conventional/non-organic vs. organic)?
• Organic vs. conventional: the main difference between organic and conventional food products is the chemicals involved during production and processing. Interest in organic food products has been rising steadily in recent years as new health "superfruits" emerge.
11. Which type of avocado is more in demand (conventional/non-organic vs. organic, aggregated by Total Volume)?
[Pie chart of Total Volume by Type]
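One way the pie chart could be produced with pandas/matplotlib; mapping the labels back from the assumed 0/1 dummy encoding is also an assumption.

```python
import matplotlib.pyplot as plt

# Total Volume by type, as in the slide's pie chart. After the dummy
# encoding above, Type is 0/1, so labels are mapped back for display.
vol = df.groupby("Type")["Total Volume"].sum()
vol.index = vol.index.map({0: "conventional", 1: "organic"})
vol.plot.pie(autopct="%1.1f%%", ylabel="")
plt.title("Share of Total Volume: Conventional vs. Organic")
plt.show()
```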
12. Now, let's look at the average price distribution.
In which range does the average price lie?
[Distribution plot of AveragePrice]
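A seaborn sketch of the distribution plot; the KDE overlay is an assumption about how the slide's plot was drawn.

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Distribution of AveragePrice across all observations.
sns.histplot(df["AveragePrice"], kde=True)
plt.title("Distribution of Average Avocado Price")
plt.xlabel("AveragePrice (USD)")
plt.show()
```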
13. How is the average price distributed over the months for conventional and organic types?
[Line plot of monthly AveragePrice by Type]
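One way to build the line plot, using the Month column created on slide 8; the column renaming again assumes the 0/1 dummy mapping.

```python
import matplotlib.pyplot as plt

# Mean AveragePrice per month, split by type.
monthly = df.groupby(["Month", "Type"])["AveragePrice"].mean().unstack("Type")
monthly.columns = ["conventional", "organic"]  # assumed 0/1 mapping
monthly.plot(marker="o")
plt.title("Average Price by Month: Conventional vs. Organic")
plt.xlabel("Month")
plt.ylabel("AveragePrice (USD)")
plt.show()
```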
14. Now let's see the average price distribution by region.
What are the top 5 regions where the average price is highest?
[Bar chart of mean AveragePrice by region]
15. What are the top 5 regions where the average price is highest?
These are the regions where the price is highest:
HartfordSpringfield
SanFrancisco
NewYork
Philadelphia
Sacramento
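A sketch that reproduces the top-5 ranking behind this bar chart; the column name Region follows the list on slide 4.

```python
import matplotlib.pyplot as plt

# Top 5 regions by mean AveragePrice.
top_price = df.groupby("Region")["AveragePrice"].mean().nlargest(5)
print(top_price)
top_price.plot.bar()
plt.title("Top 5 Regions by Average Price")
plt.ylabel("Mean AveragePrice (USD)")
plt.show()
```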
16. What are the top 5 regions where average consumption is highest?
[Bar chart of mean Total Volume by region]
17. What are the top 5 regions where average consumption is highest?
These are the regions where consumption is highest (note that these are the larger aggregate regions from slide 3 rather than individual city markets):
West
California
SouthCentral
Northeast
Southeast
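The same ranking for consumption, using mean Total Volume as the consumption measure (an assumption; the slides don't state the aggregation used).

```python
# Top 5 regions by mean Total Volume. The winners come out as the larger
# aggregate regions (West, California, ...) rather than single-city markets.
top_volume = df.groupby("Region")["Total Volume"].mean().nlargest(5)
print(top_volume)
```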
18. How are the dataset features correlated with each other?
[Correlation heatmap]
As the heatmap shows, the features are largely uncorrelated with the AveragePrice column; instead, most of them are strongly correlated with each other. This is a concern, because weak predictors will make it harder to build a good model. Let's try and see.
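A seaborn sketch of the heatmap over the numeric columns; the exact styling is assumed.

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Correlation matrix of the numeric columns, drawn as a heatmap.
corr = df.select_dtypes(include="number").corr()
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Feature Correlations")
plt.show()
```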
19. Our Problem and Roadmap, and WHERE we are!
Problem: Whether to import avocados for 2020 or not?
[Roadmap diagram repeated to mark progress; see slide 6.]
20. Model selection/predictions
To study fluctuations in the US avocado market, several machine learning techniques were evaluated to estimate the average price of a unit (in dollars) of this agricultural product. For this purpose, we used the dataset described earlier and three algorithms from scikit-learn:
21. Linear regression: a technique for determining the relationship of a variable y with one or more other variables x1, ..., xk. In a machine learning setting, it searches over functions that model the relationship between the variables and selects the one that most closely fits the given data.
Decision tree: builds regression or classification models in the form of a tree structure. It breaks the dataset down into smaller and smaller subsets while an associated decision tree is incrementally developed; the final result is a tree with decision nodes and leaf nodes.
Random forest: a meta-estimator that fits a number of decision trees on various sub-samples of the dataset and uses averaging to improve predictive accuracy and control overfitting.
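A hedged training sketch for the three scikit-learn models named above. The predictor set, split ratio, and hyperparameters are assumptions; the slides only name the algorithms.

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

# Non-numeric columns are dropped for simplicity (an assumption).
X = df.drop(columns=["AveragePrice", "Date", "Region"])
y = df["AveragePrice"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

models = {
    "LinearRegression": LinearRegression(),
    "DecisionTreeRegressor": DecisionTreeRegressor(random_state=42),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=100,
                                                   random_state=42),
}
for name, model in models.items():
    model.fit(X_train, y_train)
```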
22. Comparison of tools: performance metrics

Metric                      LinearRegression    DecisionTreeRegressor   RandomForestRegressor
                            (baseline to beat)
R-squared                   0.43                0.94                    0.95
MAE (mean absolute error)   0.23                0.13                    0.10
MSE (mean squared error)    0.09                0.04                    0.025
RMSE (square root of MSE)   0.30                0.21                    0.15
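The table's four metrics can be computed on a held-out set like this (a sketch; the actual evaluation split is not shown in the slides):

```python
import numpy as np
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

# Evaluate each fitted model on the held-out test set.
for name, model in models.items():
    pred = model.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    print(f"{name}: R2 = {r2_score(y_test, pred):.2f}, "
          f"MAE = {mean_absolute_error(y_test, pred):.2f}, "
          f"MSE = {mse:.3f}, RMSE = {np.sqrt(mse):.2f}")
```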
24. Model selection/predictions
The random forest's RMSE is lower than that of the two previous models, so the RandomForestRegressor is the best model in this case.
[Prediction plots: Linear Regression, Decision Tree Regression, RandomForest Regression]
25. Model selection/predictions
Residual = observed value - predicted value: e = y - ŷ. (For an ordinary least-squares fit with an intercept, both the sum and the mean of the residuals are exactly zero.)
Here our residuals look approximately normally distributed, which is a good sign: it suggests the model was an appropriate choice for the data.
[Residual distribution plot: RandomForest Regressor]
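A sketch of the residual check for the random forest, under the same assumed split as the training sketch above:

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Residuals e = y - y_hat for the random forest on the test set.
rf = models["RandomForestRegressor"]
residuals = y_test - rf.predict(X_test)
sns.histplot(residuals, kde=True)
plt.title("Residual Distribution: RandomForest Regressor")
plt.xlabel("Residual (observed - predicted)")
plt.show()
```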
27. Our Problem and Roadmap, and WHERE we are!
Problem: Whether to import avocados for 2020 or not?
[Roadmap diagram repeated to mark progress; see slide 6.]
28. Problem Map:
• Retailer chain in Dallas and Houston with around 1,000 stores
• Goal: increase profit
• Options: procure from the local market, or import directly from Mexico
• Constraint: no/very low risk
• Prefers importing from Mexico due to an existing partnership
• May incur an import duty of 10%, with probability 5%
• Must meet consumption demand
30. Case 1: Procure from the local wholesale market. Case 2: Direct import from Mexico.
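A toy sketch of the expected-cost logic behind comparing the two cases. Only the 10% duty and its 5% probability come from the slides; every cost figure below is a hypothetical placeholder, not the deck's numbers.

```python
# Expected per-unit cost of direct import, accounting for the chance of duty.
p_duty, duty_rate = 0.05, 0.10    # from slide 28
cost_local = 1.00                 # hypothetical $/unit, local wholesale market
cost_import = 0.80                # hypothetical $/unit, direct import from Mexico

expected_import = (1 - p_duty) * cost_import \
                + p_duty * cost_import * (1 + duty_rate)
print(f"Local: ${cost_local:.3f}/unit   "
      f"Import (expected): ${expected_import:.3f}/unit")
```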
31. Options that can be executed:
• No import – continue the existing model
• Direct import from Mexico – the preferred model, with the potential to increase revenue by $8,246,156.40
• Test direct import from Mexico for one week; based on the test results, decide the next step