QMST5336: ANALYTICS 1
Overview:
• Data Resource
• Problem Definition
• Visualization
• Prediction – Tools to support data analysis
• Presenting findings
• Solving the problem framed in the beginning
DATASET: AVOCADO
Historical Data on Avocado Prices and Sales Volume in US Markets
• The retail sales data used for this analysis are based on scanner data collected and provided by the Hass Avocado Board.
• The data include total weekly retail sales in value and volume for fresh Hass avocados (aggregated across all relevant PLU codes) in 45 distinct local market areas and eight regions (53 cross-sectional observations in total) for the years 2015 to 2018.
• These data represent an aggregation of retail outlets across the following channels: grocery, mass merchandisers, club stores, drugstores, dollar outlets, and military commissaries.
• An average price, or unit value, is computed for each market and each week by dividing sales value by the number of fresh Hass avocados sold.
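The unit-value calculation described on this slide can be sketched in a few lines of plain Python. This is only an illustration: the function name `unit_value` and the weekly figures are hypothetical and are not taken from the Hass Avocado Board data.

```python
def unit_value(sales_value, units_sold):
    """Average price per avocado: sales value divided by units sold."""
    if units_sold <= 0:
        raise ValueError("units_sold must be positive")
    return sales_value / units_sold


# Hypothetical weekly figures for one market (not real data).
weekly_sales_value = 90_000.0  # dollars
weekly_units_sold = 60_000     # fresh Hass avocados

average_price = unit_value(weekly_sales_value, weekly_units_sold)
print(f"Average price: ${average_price:.2f}")
```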
Columns:
• Date: the date of the observation
• AveragePrice: the average price of a single avocado
• Total Volume: total number of avocados sold
• 4046: total number of avocados with PLU 4046 sold
• 4225: total number of avocados with PLU 4225 sold
• 4770: total number of avocados with PLU 4770 sold
• Total Bags
• Small Bags
• Large Bags
• XLarge Bags
• Type: conventional or organic
• Year: the year of the observation
• Region: the city or region of the observation
Should we import avocados for 2020 or not?
Our problem, our roadmap, and where we are
Should we import avocados for 2020 or not?
[Roadmap diagram. Stages: Descriptive Analytics, Predictive Analytics, Prescriptive Analytics. Topics: Data Visualization, Outliers, Text Mining, Clustering, Regression, Utility Theory, Optimization, Decision Analysis.]
Snapshot of our dataset after cleaning:
• Shape: (18249, 14)
• Null values: none
Snapshot of our dataset after mining the Date column:
• Converted the Date column to a datetime type and split it into Month and Day
• Converted Type (organic or conventional) to a dummy variable
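The two preparation steps on this slide (splitting the date and encoding the type) can be sketched as follows. This is a plain-Python illustration rather than the code used for the deck: the `%Y-%m-%d` date format, the helper `prepare_row`, the output column name `type_organic`, and the sample rows are all assumptions.

```python
from datetime import datetime


def prepare_row(row):
    """Split Date into Month and Day and encode Type as a 0/1 dummy."""
    date = datetime.strptime(row["Date"], "%Y-%m-%d")  # assumed date format
    return {
        **row,
        "Month": date.month,
        "Day": date.day,
        # Hypothetical dummy column: 1 for organic, 0 for conventional.
        "type_organic": 1 if row["Type"] == "organic" else 0,
    }


# Hypothetical sample rows (not real observations).
rows = [
    {"Date": "2015-12-27", "Type": "conventional"},
    {"Date": "2018-03-25", "Type": "organic"},
]
prepared = [prepare_row(row) for row in rows]
print(prepared[1]["Month"], prepared[1]["Day"], prepared[1]["type_organic"])
```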
Which type of avocado is more in demand: conventional (non-organic) or organic?
• Organic vs. conventional: the main difference between organic and conventional food products is the chemicals involved during production and processing. Interest in organic food products has been rising steadily in recent years, with new health "superfruits" emerging.
Which type of avocado is more in demand (conventional vs. organic, aggregated by 'Total Volume')?
• A pie chart
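The aggregation behind such a pie chart is a sum of 'Total Volume' per type, expressed as a share of the overall total. A minimal sketch in plain Python, assuming rows keyed by the column names listed earlier; the function name and the sample volumes are hypothetical.

```python
from collections import defaultdict


def volume_share_by_type(rows):
    """Sum Total Volume per Type and return each type's share of the total."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["Type"]] += row["Total Volume"]
    grand_total = sum(totals.values())
    return {kind: volume / grand_total for kind, volume in totals.items()}


# Hypothetical volumes (not real data).
rows = [
    {"Type": "conventional", "Total Volume": 600.0},
    {"Type": "conventional", "Total Volume": 300.0},
    {"Type": "organic", "Total Volume": 100.0},
]
shares = volume_share_by_type(rows)
for kind, share in shares.items():
    print(f"{kind}: {share:.0%}")
```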
Now, let's look at the average price distribution.
In which range does the average price lie?
• A distribution plot
How is the average price distributed over the months for the conventional and organic types?
• A line plot
Now let's see the average price distribution by region.
What are the top 5 regions where the average price is highest?
• A bar chart
These are the regions where the price is highest:
• HartfordSpringfield
• SanFrancisco
• NewYork
• Philadelphia
• Sacramento
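A ranking like this comes from grouping observations by region, averaging a column, and keeping the largest values. The sketch below shows that logic in plain Python; the helper `top_regions_by_mean`, the placeholder region names, and the prices are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def top_regions_by_mean(rows, column, n=5):
    """Return the n regions with the highest mean of `column`, highest first."""
    by_region = defaultdict(list)
    for row in rows:
        by_region[row["Region"]].append(row[column])
    means = {region: mean(values) for region, values in by_region.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical observations with placeholder region names (not real data).
rows = [
    {"Region": "RegionA", "AveragePrice": 1.0},
    {"Region": "RegionA", "AveragePrice": 1.2},
    {"Region": "RegionB", "AveragePrice": 1.8},
    {"Region": "RegionB", "AveragePrice": 2.0},
    {"Region": "RegionC", "AveragePrice": 1.5},
]
top_two = top_regions_by_mean(rows, "AveragePrice", n=2)
print([region for region, _ in top_two])
```

The same helper can rank regions by consumption by passing a volume column instead of the price column.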
What are the top 5 regions where average consumption is highest?
• A bar chart
These are the regions where consumption is highest:
• West
• California
• SouthCentral
• Northeast
• Southeast
How are the dataset features correlated with each other?
• As the heatmap shows, none of the features is strongly correlated with the AveragePrice column; instead, most of them are correlated with each other. This is a concern, because it makes a good model harder to obtain. Let's try and see.
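Each cell of a correlation heatmap is typically a Pearson correlation coefficient between two columns. The sketch below computes it by hand in plain Python, assuming that is the coefficient the heatmap used; the toy numbers are hypothetical and only mimic the pattern described (features correlated with each other, weakly correlated with price).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical toy columns (not real data).
total_volume = [10.0, 20.0, 30.0, 40.0]
total_bags = [1.0, 2.0, 3.0, 4.0]
average_price = [1.5, 1.2, 1.6, 1.3]

print(round(pearson(total_volume, total_bags), 3))     # feature vs. feature
print(round(pearson(total_volume, average_price), 3))  # feature vs. price
```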
Model selection/predictions
• Aiming to observe fluctuations in the United States avocado market based on weather conditions, we evaluated several machine learning techniques for estimating the average unit price (in dollars) of this agricultural product. For this purpose, we used the dataset described earlier and three algorithms from scikit-learn (sklearn):
• Linear regression: a technique for determining the relationship between a y variable and one or more x1, ..., xk variables. In a machine learning approach, it searches among candidate functions that model the relationship between the variables and selects the one that most closely fits the given data.
• Decision tree: builds regression or classification models in the form of a tree structure. It breaks a dataset down into smaller and smaller subsets while incrementally developing the associated decision tree. The final result is a tree with decision nodes and leaf nodes.
• Random forest: a meta-estimator that fits a number of decision tree regressors on various sub-samples of the dataset and uses averaging to improve predictive accuracy and control over-fitting.
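To make the tree and forest descriptions concrete, here is a deliberately simplified sketch in plain Python. It is not scikit-learn's implementation: each "tree" is a single-split stump on one feature, the "forest" averages stumps fitted on bootstrap samples, and the function names and toy data are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """One-split regression tree: choose the threshold with the lowest squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keeps both sides of the split non-empty
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # only one distinct x value: fall back to the overall mean
        overall = mean(ys)
        return lambda x: overall
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_forest(xs, ys, n_trees=25, seed=0):
    """Average the predictions of stumps fitted on bootstrap samples."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        stumps.append(fit_stump([xs[i] for i in sample], [ys[i] for i in sample]))
    return lambda x: mean(stump(x) for stump in stumps)


# Hypothetical toy data: price drops once volume passes a threshold.
volumes = [1, 2, 3, 4, 5, 6, 7, 8]
prices = [1.8, 1.7, 1.9, 1.8, 1.1, 1.0, 1.2, 1.1]

stump = fit_stump(volumes, prices)
forest = fit_forest(volumes, prices)
print(round(stump(2), 2), round(stump(7), 2))
print(round(forest(2), 2), round(forest(7), 2))
```

A single stump commits to one split of the data it saw; averaging many stumps fitted on resampled data smooths out that dependence, which is the intuition behind using averaging to control over-fitting.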
Comparison of tools:
Performance metric           LinearRegression   DecisionTreeRegressor   RandomForestRegressor
R-squared                    0.43               0.94                    0.95
MAE (mean absolute error)    0.23               0.13                    0.10
MSE (mean squared error)     0.09               0.04                    0.025
RMSE (square root of MSE)    0.30               0.21                    0.15
LinearRegression is the baseline model that we aimed to exceed.
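For reference, the four metrics in the comparison follow the standard definitions and can be computed by hand as sketched below. The function name and the observed and predicted values are hypothetical; they do not reproduce the figures reported in the table.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return R-squared, MAE, MSE, and RMSE for paired observations and predictions."""
    n = len(y_true)
    errors = [true - pred for true, pred in zip(y_true, y_pred)]
    mae = sum(abs(error) for error in errors) / n
    mse = sum(error ** 2 for error in errors) / n
    mean_true = sum(y_true) / n
    total_sum_of_squares = sum((true - mean_true) ** 2 for true in y_true)
    r_squared = 1 - (mse * n) / total_sum_of_squares
    return {"R2": r_squared, "MAE": mae, "MSE": mse, "RMSE": sqrt(mse)}


# Hypothetical observed and predicted average prices (not real results).
observed = [1.0, 1.5, 2.0, 2.5]
predicted = [1.1, 1.4, 2.1, 2.4]

metrics = regression_metrics(observed, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```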
Output Summary of RandomForestRegressor:
Model selection/predictions
• The RandomForestRegressor has a lower RMSE than the two previous models, so it is the best model in this case.
[Plots: Linear Regression, Decision Tree Regression, RandomForest Regression]
Model selection/predictions
• Residual = observed value - predicted value: e = y - ŷ. For a least-squares fit with an intercept, both the sum and the mean of the residuals are equal to zero.
• Our residuals look approximately normally distributed, which is a good sign that the model is a reasonable choice for the data.
[Residual plot: RandomForest Regressor]
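The zero-sum property of residuals is easiest to see with an ordinary least-squares line that includes an intercept; a random forest does not guarantee it in general. The sketch below fits such a line by hand in plain Python and checks the residuals. The helper `fit_line` and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: price falling as volume rises (not real data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.9, 1.6, 1.7, 1.3, 1.2]

slope, intercept = fit_line(xs, ys)
# Residual e = y - y_hat for each observation.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
print(f"sum of residuals: {sum(residuals):.6f}")
```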
Problem Map:
• Retailer chain in Dallas and Houston with around 1,000 stores
• Goal: increase profit
• Procure from the local market or import directly from Mexico
• No or very low risk
• Prefer importing from Mexico due to partnership
• May incur an import duty of 10% (probability 5%)
• Must meet consumption demand
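The duty risk in the problem map can be folded into an expected cost. The sketch below uses the 10% duty and 5% probability stated on the slide; the function name, the base cost, and the assumption that the duty applies to the full import cost are hypothetical.

```python
def expected_import_cost(base_cost, duty_rate=0.10, duty_probability=0.05):
    """Probability-weighted import cost when a duty may or may not be applied."""
    cost_with_duty = base_cost * (1 + duty_rate)
    return duty_probability * cost_with_duty + (1 - duty_probability) * base_cost


# Hypothetical base cost of importing from Mexico, in dollars (not real data).
base_cost = 1_000_000.0
expected_cost = expected_import_cost(base_cost)
print(f"Expected import cost: ${expected_cost:,.2f}")
```

Under these assumptions the duty raises the expected cost by 0.5% (5% of 10%), which can then be compared against the cost of procuring locally.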
Historical data:
Case 1: Procure from local wholesale market
Case 2: Direct import from Mexico
Options that can be executed:
• No import: continue the existing model
• Direct import from Mexico (preferred model): potential to increase revenue by $8,246,156.40
• Test direct import from Mexico for 1 week
• Based on the test results, decide the next step
DECISION TREE ANALYSIS:
Sources:
• http://www.hassavocadoboard.com/retail/volume-and-price-data
• https://www.tridge.com/intelligences/avocado/MX/wiki
• https://www.foodcoop.com/produce/
• https://www.statista.com/statistics/591766/mexico-wholesale-prices-avocado-by-month/
• https://www.latimes.com/food/la-fo-trump-tariff-fruits-import-export-fruit-20190531-story.html
• https://www.ams.usda.gov/rules-regulations/section8e/avocados
ADITI SHAH, HARMEET SINGH, POOJA BASAVARAJU
