SlideShare a Scribd company logo
1 of 33
Download to read offline
Big Data Hubris
Limitations in Aggregating Uber and Google Data
Thomas J. Weinandy
Ph.D. Candidate in Applied Economics
Western Michigan University
March 17, 2020
Data Science and Analytics West Michigan
West Michigan R Users Group
Thomas J. Weinandy Big Data Hubris March 17, 2020 1 / 33
Overview
Slides available at https://www.slideshare.net/TomWeinandy
R code available at https://github.com/tomweinandy
Part I: The Parable of Google Flu Trends
Part II: The impact of fuel prices on Uber and taxi trips
Thomas J. Weinandy Big Data Hubris March 17, 2020 2 / 33
The Gartner Hype Cycle
Where is Big Data at?
https://en.wikipedia.org/wiki/Hype_cycle
Thomas J. Weinandy Big Data Hubris March 17, 2020 3 / 33
From Big Data to Data Science
August 23, 2019
Thomas J. Weinandy Big Data Hubris March 17, 2020 4 / 33
Stage 1: Technology Trigger
The Problem
CDC collects national health data on seasonal influenza cases.
Publishes data at the weekly level with a 1-2 week reporting lag.
The Solution
Google Flu Trends was launched in 2009 based on search engine
data.
It was modeled with 45 queries selected to have the best fit with
CDC reports.
Provided daily estimates for the US with only a one-day lag
(Ginsberg et al. 2009)
Thomas J. Weinandy Big Data Hubris March 17, 2020 5 / 33
Stage 2: Peak of Inflated Expectations
“Harnessing the collective intelligence of millions of users, Google
web search logs can provide one of the most timely, broad-reaching
influenza monitoring systems available today.” (Ginsberg et al.
2009, Nature p. 1014)
It became the quintessential example of how Big Data can solve
problems.
It was new, relatable, intuitive and brand-name.
By 2013, Google Flu Trends was rolled out to 29 countries and
expanded to include dengue fever (Butler 2013).
Thomas J. Weinandy Big Data Hubris March 17, 2020 6 / 33
Stage 3: Trough of Disillusionment
Lazer et al. (2014)
The 0 line is the number of influenza-like illnesses from the CDC.
Each line is the percent error from the baseline for three models.
Not robust to anomalous flu seasons.
Thomas J. Weinandy Big Data Hubris March 17, 2020 7 / 33
Stage 3: Trough of Disillusionment
Lazer et al. (2014)
Google Flu Trends (orange) does not fully correct for seasonality,
shown by the wave pattern of the error.
The error is temporally related (autocorrelation).
Google Flu Trends from 8/21/11 to 9/1/13 systematically
overestimated cases, sometime by more than 100%.
Thomas J. Weinandy Big Data Hubris March 17, 2020 8 / 33
Stage 4: Slope of Enlightenment
What went wrong?
1. Complexity & dynamism
Google Search is a complicated, always-changing algorithm
2. Overfitting big data to small data
Google data: 50 million search terms at the daily level
CDC data: 1152 influenza-like illness reports at the weekly level
Thomas J. Weinandy Big Data Hubris March 17, 2020 9 / 33
Stage 5: Plateau of Productivity
Lazer et al. (2014)
Out-of-sample mean absolute error:
0.463 for Google Flu model
0.311 for Lagged CDC model
0.232 for combined Google Flu + Lagged CDC model
Thomas J. Weinandy Big Data Hubris March 17, 2020 10 / 33
The Parable of Google Flu Trends
Big data hubris is the belief that big data sets are a substitute for,
rather than a supplement to, traditional data collection and
analysis.
Big data is not a silver bullet.
Aggregating big data to match small data is inherently destructive.
Thomas J. Weinandy Big Data Hubris March 17, 2020 11 / 33
How this relates to my dissertation
Intended Research Questions
How does fuel price impact the quantity of car service trips?
Will taxi and Uber drivers react differently?
Unintended Research Questions
Is data still “big” once I aggregate it?
Thomas J. Weinandy Big Data Hubris March 17, 2020 12 / 33
Types of Car Services in New York City (NYC)
Thomas J. Weinandy Big Data Hubris March 17, 2020 13 / 33
Taxi Background
Taxis are regulated by the New York City Taxi and Limousine
Commission (TLC) that sets the price schedule and determines a
fixed supply through medallions.
The medallion system makes licensed taxis very expensive, so most
cab drivers must lease a taxi for regularly scheduled 12-hour shifts
(Bagchi 2018).
Drivers decide when to end their shift and tend to work longer as
expected income goes up (Farber 2015, Chen et al. 2017).
Taxi drivers keep all revenue but pay for their lease and their own
gasoline.
Thomas J. Weinandy Big Data Hubris March 17, 2020 14 / 33
TNC Background
TNCs are not subject to taxi regulation since they pick up
customers through an app, not from a street hail.
TNCs have a flexible supply, flexible price model that adjusts in
real time according to relative supply and demand.
84% of surveyed Uber drivers chose the job “to have more
flexibility in my schedule and balance my work with my life and
family” (Hall & Krueger 2018, p713).
TNC drivers pay about 25% of their revenue to the platform while
also paying for all vehicle costs, including gasoline.
Thomas J. Weinandy Big Data Hubris March 17, 2020 15 / 33
Estimated Model
quantity of car service trips = f(fuel price,
previous supply & demand, economic activity,
weather, traffic flow, supply & demand patterns)
ln(tripst) = β0 + β1ln(fuel pricet) + β2ln(tripst-1)
+ β3ln(stock activityt) + β4temperaturet + β5raint
+ β7ln(collisionst) + β8ln(collisionst) ∗ raint
+ β9ln(collisionst) ∗ snowt + β10holidayt + β11trend
+ γ day-of-weekt + φ week-of-yeart + t
Thomas J. Weinandy Big Data Hubris March 17, 2020 16 / 33
Explanation of Variables
Thomas J. Weinandy Big Data Hubris March 17, 2020 17 / 33
Big Data
Collected over 726 million ride-level observations in NYC from
https://opendata.cityofnewyork.us
However, fuel prices are at the day level with 1002 observations.
Thomas J. Weinandy Big Data Hubris March 17, 2020 18 / 33
Car Service Trips
Daily Taxi and TNC Trips
Thomas J. Weinandy Big Data Hubris March 17, 2020 19 / 33
Fuel Prices
Price of New York Harbor Gasoline
Thomas J. Weinandy Big Data Hubris March 17, 2020 20 / 33
Estimation Method
Estimate separately for TNCs and taxis
Include multiple specifications for robustness
With and without a lagged dependent variable
With and without a measure of economic activity
Evaluate using OLS regression and SUR estimation
I do not include the competing car service in the model since that
would introduce endogeneity.
Seemingly Unrelated Regression (SUR) estimation corrects for the
contemporaneous correlation of residuals between the TNCs and
taxis by estimating them together (Zellner 1962).
Thomas J. Weinandy Big Data Hubris March 17, 2020 21 / 33
OLS Results
Thomas J. Weinandy Big Data Hubris March 17, 2020 22 / 33
OLS Results
Thomas J. Weinandy Big Data Hubris March 17, 2020 23 / 33
OLS Results with Lagged DV
Thomas J. Weinandy Big Data Hubris March 17, 2020 24 / 33
SUR Results
Thomas J. Weinandy Big Data Hubris March 17, 2020 25 / 33
Marginal Effects: Methodology
Consider the following parsimonious equation
yt = αxt + βyt-1 + γ Γt
where yt is the number of car service trips, xt is fuel price, and Γt is a
vector of all other variables.
Due to the lagged dependent variable yt-1, the impact of fuel prices on
the number of trips will persist for all future periods.
Thus, the cumulative marginal effect of a one-day change in gas prices
is
∞
i=0
∂yt+i
∂xt
=
α
1 − β
Thomas J. Weinandy Big Data Hubris March 17, 2020 26 / 33
Marginal Effects: Methodology
∞
i=0
∂yt+i
∂xt
=
α
1 − β
To calculate the standard error around the above marginal effect, I
follow De Boef & Keele (2008) to estimate the variance using
V ar
a
b
=
a2
b2
V ar (a)
a2
+
V ar (b)
b2
−
2Cov (a, b)
ab
where a = α and b = 1 − β. This gives me the estimator and its
statistical significance.
Thomas J. Weinandy Big Data Hubris March 17, 2020 27 / 33
Marginal Effects: Results
Thomas J. Weinandy Big Data Hubris March 17, 2020 28 / 33
Marginal Effects: Results
Thomas J. Weinandy Big Data Hubris March 17, 2020 29 / 33
Summary of Results
1 A one-day 1% increase in fuel prices is associated with a 0.367%
decrease in TNC trips and a 0.088% increase in taxi trips.
2 On a typical day when fuel prices change 1.729%, there will be an
average change of 2,190 TNC trips and 576 taxi trips in New York
City alone.
3 TNC drivers are very flexible and can reduce their supply when an
input cost like gasoline increases.
4 Due to heavy taxi regulation, cab drivers are stuck paying the
higher fuel price but benefit when competing TNC drivers exit the
market.
Thomas J. Weinandy Big Data Hubris March 17, 2020 30 / 33
Conclusion
Where is Big Data at?
Thomas J. Weinandy Big Data Hubris March 17, 2020 31 / 33
Works Cited
Bagchi, Sutirtha. “A Tale of Two Cities: An Examination of Medallion Prices in New York and
Chicago.” Review of Industrial Organization 53.2 (2018): 295-319.
Butler, Declan. ”When Google got flu wrong: US outbreak foxes a leading web-based method for
tracking seasonal flu.” Nature 494.7436 (2013): 155-157.
Chen, Minzhang, Judith A. Chevalier, Peter E. Rossi and Emily Oehlsen. “The Value of Flexible
Work: Evidence from Uber Drivers.” NBER Working Paper No. 23296 (March 2017).
De Boef, Suzanna, and Luke Keele. ”Taking time seriously.” American Journal of Political Science
52.1 (2008): 184-200.
Farber, Henry S. “Why you Can’t Find a Taxi in the Rain and Other Labor Supply Lessons from
Cab Drivers,” The Quarterly Journal of Economics, Volume 130, Issue 4, (November 1, 2015).
1975–2026.
Ginsberg, Jeremy, et al. ”Detecting influenza epidemics using search engine query data.” Nature
457.7232 (2009): 1012-1014.
Hall, Jonathan V., and Alan B. Krueger. “An analysis of the labor market for Uber’s
driver-partners in the United States.” ILR Review 71.3 (2018). 705-732.
Lazer, David, et al. ”The parable of Google Flu: traps in big data analysis.” Science 343.6176
(2014): 1203-1205.
Zellner, Arnold. “An Efficient Method of Estimating Seemingly Unrelated Regression Equations
and Tests of Aggregation Bias,” Journal of the American Statistical Association, 57. (1962). 500-509.
Thomas J. Weinandy Big Data Hubris March 17, 2020 32 / 33
thank you
@tomweinandy
thomas.j.weinandy@wmich.edu
Thomas J. Weinandy Big Data Hubris March 17, 2020 33 / 33

More Related Content

Similar to Big Data Hubris: Limitations in Aggregating Uber and Google Data

Do rising rents lead to longer commutes? A gravity model of commuting flows i...
Do rising rents lead to longer commutes? A gravity model of commuting flows i...Do rising rents lead to longer commutes? A gravity model of commuting flows i...
Do rising rents lead to longer commutes? A gravity model of commuting flows i...Economic and Social Research Institute
 
Rethink x+humanity+report
Rethink x+humanity+reportRethink x+humanity+report
Rethink x+humanity+reportAmérico Roque
 
What is Big Data?
What is Big Data?What is Big Data?
What is Big Data?iPresso
 
Data-intensive decision making in the era of big data and artificial intellig...
Data-intensive decision making in the era of big data and artificial intellig...Data-intensive decision making in the era of big data and artificial intellig...
Data-intensive decision making in the era of big data and artificial intellig...samossummit
 
Big Data & Sensors: Blowing Up Transportation
Big Data & Sensors: Blowing Up TransportationBig Data & Sensors: Blowing Up Transportation
Big Data & Sensors: Blowing Up Transportation1776
 
Internet Trends 2014 - Redesigned
Internet Trends 2014 - RedesignedInternet Trends 2014 - Redesigned
Internet Trends 2014 - RedesignedEmiland
 
How audiences use technology and its impact on their lives.
How audiences use technology and its impact on their lives.How audiences use technology and its impact on their lives.
How audiences use technology and its impact on their lives.Sarah Frame
 
Electric vehicles and electric utilities – a clear opportunity with many shapes
Electric vehicles and electric utilities – a clear opportunity with many shapesElectric vehicles and electric utilities – a clear opportunity with many shapes
Electric vehicles and electric utilities – a clear opportunity with many shapesCarlo Stella
 
Smart City Event at the British Consulate - Denver
Smart City Event at the British Consulate - DenverSmart City Event at the British Consulate - Denver
Smart City Event at the British Consulate - DenverMatthew Bailey
 
An Interactive Data Visualization And Analytics Tool To Evaluate Mobility And...
An Interactive Data Visualization And Analytics Tool To Evaluate Mobility And...An Interactive Data Visualization And Analytics Tool To Evaluate Mobility And...
An Interactive Data Visualization And Analytics Tool To Evaluate Mobility And...Michele Thomas
 
Rocket City Keynote
Rocket City KeynoteRocket City Keynote
Rocket City KeynoteLearon Dalby
 
EC 330, Fall 2019 Your name _________________________ Uni.docx
EC 330, Fall 2019 Your name _________________________ Uni.docxEC 330, Fall 2019 Your name _________________________ Uni.docx
EC 330, Fall 2019 Your name _________________________ Uni.docxmadlynplamondon
 
Term Paper on Self Driving Cars Impact in Economy
Term Paper on Self Driving Cars Impact in EconomyTerm Paper on Self Driving Cars Impact in Economy
Term Paper on Self Driving Cars Impact in EconomyMehnaz Maharin
 
The Evolution of Digital Marketing - 2024
The Evolution of Digital Marketing - 2024The Evolution of Digital Marketing - 2024
The Evolution of Digital Marketing - 2024nmcmillan9103
 
Inst 760_Data_Visualization_Final_paper
Inst 760_Data_Visualization_Final_paperInst 760_Data_Visualization_Final_paper
Inst 760_Data_Visualization_Final_paperKaran Kashyap
 
Open Data: Its Value and Lessons Learned
Open Data: Its Value and Lessons LearnedOpen Data: Its Value and Lessons Learned
Open Data: Its Value and Lessons LearnedAndrew Stott
 
Internet+Trends+2017+Report.pdf
Internet+Trends+2017+Report.pdfInternet+Trends+2017+Report.pdf
Internet+Trends+2017+Report.pdfssuser1b6625
 

Similar to Big Data Hubris: Limitations in Aggregating Uber and Google Data (20)

Tushar Dalvi DWBI
Tushar Dalvi DWBITushar Dalvi DWBI
Tushar Dalvi DWBI
 
Do rising rents lead to longer commutes? A gravity model of commuting flows i...
Do rising rents lead to longer commutes? A gravity model of commuting flows i...Do rising rents lead to longer commutes? A gravity model of commuting flows i...
Do rising rents lead to longer commutes? A gravity model of commuting flows i...
 
Rethink x+humanity+report
Rethink x+humanity+reportRethink x+humanity+report
Rethink x+humanity+report
 
What is Big Data?
What is Big Data?What is Big Data?
What is Big Data?
 
Data-intensive decision making in the era of big data and artificial intellig...
Data-intensive decision making in the era of big data and artificial intellig...Data-intensive decision making in the era of big data and artificial intellig...
Data-intensive decision making in the era of big data and artificial intellig...
 
Big Data & Sensors: Blowing Up Transportation
Big Data & Sensors: Blowing Up TransportationBig Data & Sensors: Blowing Up Transportation
Big Data & Sensors: Blowing Up Transportation
 
Internet Trends 2014 - Redesigned
Internet Trends 2014 - RedesignedInternet Trends 2014 - Redesigned
Internet Trends 2014 - Redesigned
 
How audiences use technology and its impact on their lives.
How audiences use technology and its impact on their lives.How audiences use technology and its impact on their lives.
How audiences use technology and its impact on their lives.
 
Electric vehicles and electric utilities – a clear opportunity with many shapes
Electric vehicles and electric utilities – a clear opportunity with many shapesElectric vehicles and electric utilities – a clear opportunity with many shapes
Electric vehicles and electric utilities – a clear opportunity with many shapes
 
Smart City Event at the British Consulate - Denver
Smart City Event at the British Consulate - DenverSmart City Event at the British Consulate - Denver
Smart City Event at the British Consulate - Denver
 
An Interactive Data Visualization And Analytics Tool To Evaluate Mobility And...
An Interactive Data Visualization And Analytics Tool To Evaluate Mobility And...An Interactive Data Visualization And Analytics Tool To Evaluate Mobility And...
An Interactive Data Visualization And Analytics Tool To Evaluate Mobility And...
 
GDRR Opening Workshop - Transportation System Reliability: Challenges and Opp...
GDRR Opening Workshop - Transportation System Reliability: Challenges and Opp...GDRR Opening Workshop - Transportation System Reliability: Challenges and Opp...
GDRR Opening Workshop - Transportation System Reliability: Challenges and Opp...
 
Rocket City Keynote
Rocket City KeynoteRocket City Keynote
Rocket City Keynote
 
Who? What? Why we better care?
Who? What? Why we better care?Who? What? Why we better care?
Who? What? Why we better care?
 
EC 330, Fall 2019 Your name _________________________ Uni.docx
EC 330, Fall 2019 Your name _________________________ Uni.docxEC 330, Fall 2019 Your name _________________________ Uni.docx
EC 330, Fall 2019 Your name _________________________ Uni.docx
 
Term Paper on Self Driving Cars Impact in Economy
Term Paper on Self Driving Cars Impact in EconomyTerm Paper on Self Driving Cars Impact in Economy
Term Paper on Self Driving Cars Impact in Economy
 
The Evolution of Digital Marketing - 2024
The Evolution of Digital Marketing - 2024The Evolution of Digital Marketing - 2024
The Evolution of Digital Marketing - 2024
 
Inst 760_Data_Visualization_Final_paper
Inst 760_Data_Visualization_Final_paperInst 760_Data_Visualization_Final_paper
Inst 760_Data_Visualization_Final_paper
 
Open Data: Its Value and Lessons Learned
Open Data: Its Value and Lessons LearnedOpen Data: Its Value and Lessons Learned
Open Data: Its Value and Lessons Learned
 
Internet+Trends+2017+Report.pdf
Internet+Trends+2017+Report.pdfInternet+Trends+2017+Report.pdf
Internet+Trends+2017+Report.pdf
 

More from Tom Weinandy

Is a Recession Coming? The Good, the Bad and the Ugly of Economic Trends
Is a Recession Coming? The Good, the Bad and the Ugly of Economic TrendsIs a Recession Coming? The Good, the Bad and the Ugly of Economic Trends
Is a Recession Coming? The Good, the Bad and the Ugly of Economic TrendsTom Weinandy
 
ICCM 2013 Ignite Session 2
ICCM 2013 Ignite Session 2ICCM 2013 Ignite Session 2
ICCM 2013 Ignite Session 2Tom Weinandy
 
ICCM 2013 Panel 3: You Can't Get There from Here (Generating, Analyzing & Usi...
ICCM 2013 Panel 3: You Can't Get There from Here (Generating, Analyzing & Usi...ICCM 2013 Panel 3: You Can't Get There from Here (Generating, Analyzing & Usi...
ICCM 2013 Panel 3: You Can't Get There from Here (Generating, Analyzing & Usi...Tom Weinandy
 
ICCM 2013 Panel 2: Crisis Mapping for Conflict Management
ICCM 2013 Panel 2: Crisis Mapping for Conflict ManagementICCM 2013 Panel 2: Crisis Mapping for Conflict Management
ICCM 2013 Panel 2: Crisis Mapping for Conflict ManagementTom Weinandy
 
ICCM 2013 Panel 1: What's so Big about Big Data?
ICCM 2013 Panel 1: What's so Big about Big Data?ICCM 2013 Panel 1: What's so Big about Big Data?
ICCM 2013 Panel 1: What's so Big about Big Data?Tom Weinandy
 
ICCM 2013 Keynote: Andrej Verity
ICCM 2013 Keynote: Andrej VerityICCM 2013 Keynote: Andrej Verity
ICCM 2013 Keynote: Andrej VerityTom Weinandy
 
ICCM 2013 Westgate Panel Slides
ICCM 2013 Westgate Panel SlidesICCM 2013 Westgate Panel Slides
ICCM 2013 Westgate Panel SlidesTom Weinandy
 
ICCM 2013 Ignite Session 1
ICCM 2013 Ignite Session 1ICCM 2013 Ignite Session 1
ICCM 2013 Ignite Session 1Tom Weinandy
 
ICCM 2013 Opening Remarks
ICCM 2013 Opening RemarksICCM 2013 Opening Remarks
ICCM 2013 Opening RemarksTom Weinandy
 
Major in Social Entrepreneurship Proposal
Major in Social Entrepreneurship ProposalMajor in Social Entrepreneurship Proposal
Major in Social Entrepreneurship ProposalTom Weinandy
 
The Impact of Domestic and International Immersion Experiences on Students
The Impact of Domestic and International Immersion Experiences on StudentsThe Impact of Domestic and International Immersion Experiences on Students
The Impact of Domestic and International Immersion Experiences on StudentsTom Weinandy
 
A Major in Social Entrepreneurship
A Major in Social EntrepreneurshipA Major in Social Entrepreneurship
A Major in Social EntrepreneurshipTom Weinandy
 
How Do You View the World?
How Do You View the World?How Do You View the World?
How Do You View the World?Tom Weinandy
 
Social entrepreneurship
Social entrepreneurshipSocial entrepreneurship
Social entrepreneurshipTom Weinandy
 

More from Tom Weinandy (15)

Is a Recession Coming? The Good, the Bad and the Ugly of Economic Trends
Is a Recession Coming? The Good, the Bad and the Ugly of Economic TrendsIs a Recession Coming? The Good, the Bad and the Ugly of Economic Trends
Is a Recession Coming? The Good, the Bad and the Ugly of Economic Trends
 
ICCM 2013 Ignite Session 2
ICCM 2013 Ignite Session 2ICCM 2013 Ignite Session 2
ICCM 2013 Ignite Session 2
 
ICCM 2013 Panel 3: You Can't Get There from Here (Generating, Analyzing & Usi...
ICCM 2013 Panel 3: You Can't Get There from Here (Generating, Analyzing & Usi...ICCM 2013 Panel 3: You Can't Get There from Here (Generating, Analyzing & Usi...
ICCM 2013 Panel 3: You Can't Get There from Here (Generating, Analyzing & Usi...
 
ICCM 2013 Panel 2: Crisis Mapping for Conflict Management
ICCM 2013 Panel 2: Crisis Mapping for Conflict ManagementICCM 2013 Panel 2: Crisis Mapping for Conflict Management
ICCM 2013 Panel 2: Crisis Mapping for Conflict Management
 
ICCM 2013 Panel 1: What's so Big about Big Data?
ICCM 2013 Panel 1: What's so Big about Big Data?ICCM 2013 Panel 1: What's so Big about Big Data?
ICCM 2013 Panel 1: What's so Big about Big Data?
 
ICCM 2013 Keynote: Andrej Verity
ICCM 2013 Keynote: Andrej VerityICCM 2013 Keynote: Andrej Verity
ICCM 2013 Keynote: Andrej Verity
 
ICCM 2013 Westgate Panel Slides
ICCM 2013 Westgate Panel SlidesICCM 2013 Westgate Panel Slides
ICCM 2013 Westgate Panel Slides
 
ICCM 2013 Ignite Session 1
ICCM 2013 Ignite Session 1ICCM 2013 Ignite Session 1
ICCM 2013 Ignite Session 1
 
ICCM 2013 Opening Remarks
ICCM 2013 Opening RemarksICCM 2013 Opening Remarks
ICCM 2013 Opening Remarks
 
Major in Social Entrepreneurship Proposal
Major in Social Entrepreneurship ProposalMajor in Social Entrepreneurship Proposal
Major in Social Entrepreneurship Proposal
 
The Impact of Domestic and International Immersion Experiences on Students
The Impact of Domestic and International Immersion Experiences on StudentsThe Impact of Domestic and International Immersion Experiences on Students
The Impact of Domestic and International Immersion Experiences on Students
 
Crisis Mapping
Crisis MappingCrisis Mapping
Crisis Mapping
 
A Major in Social Entrepreneurship
A Major in Social EntrepreneurshipA Major in Social Entrepreneurship
A Major in Social Entrepreneurship
 
How Do You View the World?
How Do You View the World?How Do You View the World?
How Do You View the World?
 
Social entrepreneurship
Social entrepreneurshipSocial entrepreneurship
Social entrepreneurship
 

Recently uploaded

Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAbdelrhman abooda
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...ThinkInnovation
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxFurkanTasci3
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 

Recently uploaded (20)

Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptx
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 

Big Data Hubris: Limitations in Aggregating Uber and Google Data

  • 1. Big Data Hubris Limitations in Aggregating Uber and Google Data Thomas J. Weinandy Ph.D. Candidate in Applied Economics Western Michigan University March 17, 2020 Data Science and Analytics West Michigan West Michigan R Users Group Thomas J. Weinandy Big Data Hubris March 17, 2020 1 / 33
  • 2. Overview Slides available at https://www.slideshare.net/TomWeinandy R code available at https://github.com/tomweinandy Part I: The Parable of Google Flu Trends Part II: The impact of fuel prices on Uber and taxi trips Thomas J. Weinandy Big Data Hubris March 17, 2020 2 / 33
  • 3. The Gartner Hype Cycle Where is Big Data at? https://en.wikipedia.org/wiki/Hype_cycle Thomas J. Weinandy Big Data Hubris March 17, 2020 3 / 33
  • 4. From Big Data to Data Science August 23, 2019 Thomas J. Weinandy Big Data Hubris March 17, 2020 4 / 33
  • 5. Stage 1: Technology Trigger The Problem CDC collects national health data on seasonal influenza cases. Publishes data at the weekly level with a 1-2 week reporting lag. The Solution Google Flu Trends was launched in 2009 based on search engine data. It was modeled with 45 queries selected to have the best fit with CDC reports. Provided daily estimates for the US with only a one-day lag (Ginsberg et al. 2009) Thomas J. Weinandy Big Data Hubris March 17, 2020 5 / 33
  • 6. Stage 2: Peak of Inflated Expectations “Harnessing the collective intelligence of millions of users, Google web search logs can provide one of the most timely, broad-reaching influenza monitoring systems available today.” (Ginsberg et al. 2009, Nature p. 1014) It became the quintessential example of how Big Data can solve problems. It was new, relatable, intuitive and brand-name. By 2013, Google Flu Trends was rolled out to 29 countries and expanded to include dengue fever (Butler 2013). Thomas J. Weinandy Big Data Hubris March 17, 2020 6 / 33
  • 7. Stage 3: Trough of Disillusionment Lazer et al. (2014) The 0 line is the number of influenza-like illnesses from the CDC. Each line is the percent error from the baseline for three models. Not robust to anomalous flu seasons. Thomas J. Weinandy Big Data Hubris March 17, 2020 7 / 33
  • 8. Stage 3: Trough of Disillusionment Lazer et al. (2014) Google Flu Trends (orange) does not fully correct for seasonality, shown by the wave pattern of the error. The error is temporally related (autocorrelation). Google Flu Trends from 8/21/11 to 9/1/13 systematically overestimated cases, sometime by more than 100%. Thomas J. Weinandy Big Data Hubris March 17, 2020 8 / 33
  • 9. Stage 4: Slope of Enlightenment What went wrong? 1. Complexity & dynamism Google Search is a complicated, always-changing algorithm 2. Overfitting big data to small data Google data: 50 million search terms at the daily level CDC data: 1152 influenza-like illness reports at the weekly level Thomas J. Weinandy Big Data Hubris March 17, 2020 9 / 33
  • 10. Stage 5: Plateau of Productivity Lazer et al. (2014) Out-of-sample mean absolute error: 0.463 for Google Flu model 0.311 for Lagged CDC model 0.232 for combined Google Flu + Lagged CDC model Thomas J. Weinandy Big Data Hubris March 17, 2020 10 / 33
  • 11. The Parable of Google Flu Trends Big data hubris is the belief that big data sets are a substitute for, rather than a supplement to, traditional data collection and analysis. Big data is not a silver bullet. Aggregating big data to match small data is inherently destructive. Thomas J. Weinandy Big Data Hubris March 17, 2020 11 / 33
  • 12. How this relates to my dissertation Intended Research Questions How does fuel price impact the quantity of car service trips? Will taxi and Uber drivers react differently? Unintended Research Questions Is data still “big” once I aggregate it? Thomas J. Weinandy Big Data Hubris March 17, 2020 12 / 33
  • 13. Types of Car Services in New York City (NYC) Thomas J. Weinandy Big Data Hubris March 17, 2020 13 / 33
  • 14. Taxi Background Taxis are regulated by the New York City Taxi and Limousine Commission (TLC) that sets the price schedule and determines a fixed supply through medallions. The medallion system makes licensed taxis very expensive, so most cab drivers must lease a taxi for regularly scheduled 12-hour shifts (Bagchi 2018). Drivers decide when to end their shift and tend to work longer as expected income goes up (Farber 2015, Chen et al. 2017). Taxi drivers keep all revenue but pay for their lease and their own gasoline. Thomas J. Weinandy Big Data Hubris March 17, 2020 14 / 33
  • 15. TNC Background TNCs are not subject to taxi regulation since they pick up customers through an app, not from a street hail. TNCs have a flexible supply, flexible price model that adjusts in real time according to relative supply and demand. 84% of surveyed Uber drivers chose the job “to have more flexibility in my schedule and balance my work with my life and family” (Hall & Krueger 2018, p713). TNC drivers pay about 25% of their revenue to the platform while also paying for all vehicle costs, including gasoline. Thomas J. Weinandy Big Data Hubris March 17, 2020 15 / 33
  • 16. Estimated Model quantity of car service trips = f(fuel price, previous supply & demand, economic activity, weather, traffic flow, supply & demand patterns) ln(tripst) = β0 + β1ln(fuel pricet) + β2ln(tripst-1) + β3ln(stock activityt) + β4temperaturet + β5raint + β7ln(collisionst) + β8ln(collisionst) ∗ raint + β9ln(collisionst) ∗ snowt + β10holidayt + β11trend + γ day-of-weekt + φ week-of-yeart + t Thomas J. Weinandy Big Data Hubris March 17, 2020 16 / 33
  • 17. Explanation of Variables Thomas J. Weinandy Big Data Hubris March 17, 2020 17 / 33
  • 18. Big Data Collected over 726 million ride-level observations in NYC from https://opendata.cityofnewyork.us However, fuel prices are at the day level with 1002 observations. Thomas J. Weinandy Big Data Hubris March 17, 2020 18 / 33
  • 19. Car Service Trips Daily Taxi and TNC Trips Thomas J. Weinandy Big Data Hubris March 17, 2020 19 / 33
  • 20. Fuel Prices Price of New York Harbor Gasoline Thomas J. Weinandy Big Data Hubris March 17, 2020 20 / 33
  • 21. Estimation Method Estimate separately for TNCs and taxis Include multiple specifications for robustness With and without a lagged dependent variable With and without a measure of economic activity Evaluate using OLS regression and SUR estimation I do not include the competing car service in the model since that would introduce endogeneity. Seemingly Unrelated Regression (SUR) estimation corrects for the contemporaneous correlation of residuals between the TNCs and taxis by estimating them together (Zellner 1962). Thomas J. Weinandy Big Data Hubris March 17, 2020 21 / 33
  • 22. OLS Results Thomas J. Weinandy Big Data Hubris March 17, 2020 22 / 33
  • 23. OLS Results Thomas J. Weinandy Big Data Hubris March 17, 2020 23 / 33
  • 24. OLS Results with Lagged DV Thomas J. Weinandy Big Data Hubris March 17, 2020 24 / 33
  • 25. SUR Results Thomas J. Weinandy Big Data Hubris March 17, 2020 25 / 33
  • 26. Marginal Effects: Methodology Consider the following parsimonious equation yt = αxt + βyt-1 + γ Γt where yt is the number of car service trips, xt is fuel price, and Γt is a vector of all other variables. Due to the lagged dependent variable yt-1, the impact of fuel prices on the number of trips will persist for all future periods. Thus, the cumulative marginal effect of a one-day change in gas prices is ∞ i=0 ∂yt+i ∂xt = α 1 − β Thomas J. Weinandy Big Data Hubris March 17, 2020 26 / 33
  • 27. Marginal Effects: Methodology ∞ i=0 ∂yt+i ∂xt = α 1 − β To calculate the standard error around the above marginal effect, I follow De Boef & Keele (2008) to estimate the variance using V ar a b = a2 b2 V ar (a) a2 + V ar (b) b2 − 2Cov (a, b) ab where a = α and b = 1 − β. This gives me the estimator and its statistical significance. Thomas J. Weinandy Big Data Hubris March 17, 2020 27 / 33
  • 28. Marginal Effects: Results Thomas J. Weinandy Big Data Hubris March 17, 2020 28 / 33
  • 29. Marginal Effects: Results Thomas J. Weinandy Big Data Hubris March 17, 2020 29 / 33
  • 30. Summary of Results 1 A one-day 1% increase in fuel prices is associated with a 0.367% decrease in TNC trips and a 0.088% increase in taxi trips. 2 On a typical day when fuel prices change 1.729%, there will be an average change of 2,190 TNC trips and 576 taxi trips in New York City alone. 3 TNC drivers are very flexible and can reduce their supply when an input cost like gasoline increases. 4 Due to heavy taxi regulation, cab drivers are stuck paying the higher fuel price but benefit when competing TNC drivers exit the market. Thomas J. Weinandy Big Data Hubris March 17, 2020 30 / 33
  • 31. Conclusion Where is Big Data at? Thomas J. Weinandy Big Data Hubris March 17, 2020 31 / 33
  • 32. Works Cited Bagchi, Sutirtha. “A Tale of Two Cities: An Examination of Medallion Prices in New York and Chicago.” Review of Industrial Organization 53.2 (2018): 295-319. Butler, Declan. ”When Google got flu wrong: US outbreak foxes a leading web-based method for tracking seasonal flu.” Nature 494.7436 (2013): 155-157. Chen, Minzhang, Judith A. Chevalier, Peter E. Rossi and Emily Oehlsen. “The Value of Flexible Work: Evidence from Uber Drivers.” NBER Working Paper No. 23296 (March 2017). De Boef, Suzanna, and Luke Keele. ”Taking time seriously.” American Journal of Political Science 52.1 (2008): 184-200. Farber, Henry S. “Why you Can’t Find a Taxi in the Rain and Other Labor Supply Lessons from Cab Drivers,” The Quarterly Journal of Economics, Volume 130, Issue 4, (November 1, 2015). 1975–2026. Ginsberg, Jeremy, et al. ”Detecting influenza epidemics using search engine query data.” Nature 457.7232 (2009): 1012-1014. Hall, Jonathan V., and Alan B. Krueger. “An analysis of the labor market for Uber’s driver-partners in the United States.” ILR Review 71.3 (2018). 705-732. Lazer, David, et al. ”The parable of Google Flu: traps in big data analysis.” Science 343.6176 (2014): 1203-1205. Zellner, Arnold. “An Efficient Method of Estimating Seemingly Unrelated Regression Equations and Tests of Aggregation Bias,” Journal of the American Statistical Association, 57. (1962). 500-509. Thomas J. Weinandy Big Data Hubris March 17, 2020 32 / 33
  • 33. thank you @tomweinandy thomas.j.weinandy@wmich.edu Thomas J. Weinandy Big Data Hubris March 17, 2020 33 / 33