Identification and ugc

  1. Identification and UGC. IS Economics Research Seminar. By Beibei Li, May 11, 2012.
  2. What is Identification?
     Understanding the causal relationship behind empirical results.
     e.g., imagine variables Yt and Xt are correlated. There can be three reasons for this, which are not mutually exclusive:
     • Cause: Xt → Yt
     • Reverse cause: Yt → Xt
     • Correlated variable: Zt → both Xt and Yt
     Identification is essential for empirical research!
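The "correlated variable" case can be illustrated with a small simulation. This is a minimal sketch under assumed data: the seed, the sample size, and the unit effect of Zt on both Xt and Yt are illustrative choices, not taken from the slides. Xt has no causal effect on Yt here, yet the two are clearly correlated until Zt is partialled out.

```python
import random


def corr(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def residualize(v, z):
    """Residuals from a simple OLS regression of v on z (with intercept)."""
    n = len(v)
    mv, mz = sum(v) / n, sum(z) / n
    slope = sum((zi - mz) * (vi - mv) for zi, vi in zip(z, v)) / sum(
        (zi - mz) ** 2 for zi in z
    )
    return [vi - mv - slope * (zi - mz) for vi, zi in zip(v, z)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]   # common driver Zt
x = [zi + rng.gauss(0, 1) for zi in z]    # Xt depends on Zt only
y = [zi + rng.gauss(0, 1) for zi in z]    # Yt depends on Zt only, not on Xt

raw_corr = corr(x, y)                                          # sizeable
partial_corr = corr(residualize(x, z), residualize(y, z))      # near zero

print(f"corr(X, Y) = {raw_corr:.3f}")
print(f"corr(X, Y | Z) = {partial_corr:.3f}")
```

In practice Zt is usually unobserved, which is why the strategies on the following slides are needed instead of simply controlling for it.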
  3. Agenda
     • Major Research Questions
     • Why Is Identification Important for UGC Research
     • Overview of Econometric Identification Strategies
     • Examples (Archak et al. 2011; Ghose, Ipeirotis and Li 2012; Luca 2011)
     • Discussions
  4. Major Research Questions
     Economic Effect
     • Product sales, pricing power, new product adoptions
     User Behavior, Motivation, Social Dynamics
     • Dynamics of online reviews (e.g., how they evolve over time)
     • How do previous opinions affect subsequent behavior?
     • How are ratings influenced by public opinion? e.g., existing ratings, professional ratings
     Firm Perspective, Marketing Strategies, Managerial Implications
     • Social media vs. traditional marketing campaigns
     • What should firms do given the existence of social media? e.g., stimulate additional WOM, adapt pricing/ads to UGC.
     • Positive & negative publicity
  5. Why Identification? – Causality
     Economic Effect
     • Unobserved product heterogeneity, e.g., product quality
     • Publicity, advertising…
     User Behavior, Motivation, Social Dynamics
     • Online reviews may not convey true opinion, e.g., social influence (cascade/herding, differentiating)
     • Online reviews may not reveal true quality, e.g., early self-selection bias, review dynamics
     Firm Perspective, Marketing Strategies, Managerial Implications
     • Social media vs. traditional marketing campaigns
  6. Overview of Identification Strategies
     • Fixed Effect: Control for unobserved characteristics that are time-invariant (e.g., product-fixed effect, location-fixed effect). e.g., Ghose et al. 2007.
     • Diff-in-Diff: Difference out both time-invariant unobservables and time-variant unobservables that are common to the treated and control groups. e.g., Chevalier and Mayzlin 2006.
     • Regression Discontinuity: Examine the treatment effect by observing a “discontinuous jump” while controlling for the continuous score and other covariates. e.g., Luca 2011.
     • Natural Experiment: Treatments are not manipulable by the researchers (e.g., government interventions, policy changes). e.g., Chan and Ghose 2012.
     • Instrumental Variables: Variables that are correlated with the endogenous explanatory variables, but not correlated with the error. e.g., Ghose, Ipeirotis & Li 2012.
     • Propensity Score Matching: Match a treated sample with an untreated sample based on their predicted propensities to be treated (“would have been treated but was not”). e.g., Aral, Muchnik and Sundararajan 2009; Rhue and Sundararajan 2010.
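Two of these strategies are easy to show on toy numbers. The sketch below uses a hypothetical, noise-free panel (the product names, review counts, and sales figures are invented for illustration) in which each product has its own time-invariant effect. Pooled OLS gets the sign of the review effect wrong, while the fixed-effect (within) estimator recovers it. The last lines compute a 2x2 diff-in-diff from assumed group means.

```python
def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def demean(values):
    """Subtract the group mean (the 'within' transformation)."""
    m = sum(values) / len(values)
    return [v - m for v in values]


# Hypothetical panel: sales = product_effect + 2 * reviews, no noise.
panel = {
    "product_A": {"reviews": [1, 2, 3], "sales": [12, 14, 16]},  # effect = 10
    "product_B": {"reviews": [4, 5, 6], "sales": [8, 10, 12]},   # effect = 0
}

# Pooled OLS ignores the product effects and is biased.
pooled_x = [v for p in panel.values() for v in p["reviews"]]
pooled_y = [v for p in panel.values() for v in p["sales"]]
pooled_slope = ols_slope(pooled_x, pooled_y)

# Fixed effect: demean within each product, then regress.
within_x = [v for p in panel.values() for v in demean(p["reviews"])]
within_y = [v for p in panel.values() for v in demean(p["sales"])]
fe_slope = ols_slope(within_x, within_y)

# Diff-in-diff on assumed group means (treated vs. control, before vs. after).
means = {
    ("treated", "before"): 10.0, ("treated", "after"): 15.0,
    ("control", "before"): 8.0, ("control", "after"): 10.0,
}
did = (means["treated", "after"] - means["treated", "before"]) - (
    means["control", "after"] - means["control", "before"]
)

print(f"pooled slope = {pooled_slope:.3f}, fixed-effect slope = {fe_slope:.3f}")
print(f"diff-in-diff estimate = {did:.1f}")
```

The diff-in-diff number is only a causal estimate if the two groups would have followed parallel trends without treatment, which the toy means cannot verify.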
  7. Archak, Ghose & Ipeirotis (Mgt Sci 2011)
     Motivation: What is the economic impact of UGC on product sales?
     Using only numeric ratings has limitations:
     • Quality is not one-dimensional;
     • Reviewers and readers may have different tastes;
     • Ratings may not convey consumers’ true opinions (e.g., social influence);
     • Ratings may not capture true quality information (e.g., Li & Hitt 2008, early self-selection bias; Hu et al. 2008, bimodal distribution);
     • Ratings are discrete: “4” reviews may read like “3” or “5”.
  8. Archak, Ghose & Ipeirotis (Mgt Sci 2011)
     Research Questions:
     • What is the economic impact of UGC on product sales beyond the effect of numeric review ratings?
     • How can product reviews help us learn consumer preferences for different product attributes, and how consumers make trade-offs between those attributes?
  9. Archak, Ghose & Ipeirotis (Mgt Sci 2011)
     Main Idea:
     • Identify which product attributes (e.g., nouns/noun phrases) are most frequently discussed in product reviews. Fully automated (POS tagger) vs. crowdsourcing.
     • Extract opinions (e.g., adjectives that refer to those nouns) about these product attributes. Fully automated (syntactic dependency parser) vs. crowdsourcing.
     • Estimate the economic impact of the extracted opinions. Dynamic panel data model + System GMM.
  10. Archak, Ghose & Ipeirotis (Mgt Sci 2011)
      Data:
      • Sales rank, price and consumer reviews from Amazon.com
      • Two product categories (digital cameras and camcorders)
      • 15 months (2005/3-2006/5)
      Model:
  11. Archak, Ghose & Ipeirotis (Mgt Sci 2011)
      Identification:
      • Price endogeneity: IV, lagged price (Villas-Boas and Winer 1999)
      • UGC endogeneity: Google Trends product search volume as control (Luan & Neslin 2009)
      • Autocorrelation: lagged dependent variable as control
      First paper to bridge the qualitative nature of UGC and the quantitative nature of consumer choice.
  12. Ghose, Ipeirotis & Li (Mkt Sci 2012)
      Motivation:
      • Content beyond text? Images, geo-maps, social-geo tags…
      • Social media → product search engines: fail to efficiently leverage information created across multiple social media channels;
      • Ranking mechanisms cannot capture multidimensional preferences.
  13. Ghose, Ipeirotis & Li (Mkt Sci 2012)
      Research Questions:
      • What is consumers’ willingness-to-pay for different product attributes?
      • Is there a better method for product search engines to rank products?
      Consumers’ decision: “best value”. Search engines’ decision: “most relevant”.
  14. Ghose, Ipeirotis & Li (Mkt Sci 2012)
      Main Idea:
      1. Identify the important product characteristics that influence demand.
      2. Use a choice model to precisely estimate how these product characteristics influence demand.
      3. Impute the expected utility gain (surplus) from each product and propose a ranking framework based on surplus.
      Product “value-for-money”: price, characteristics.
  15. Ghose, Ipeirotis & Li (Mkt Sci 2012)
  16. Ghose, Ipeirotis & Li (Mkt Sci 2012)
      Transaction data: Travelocity.com, 1497 US hotels, 2008/11-2009/1
      Location Characteristics:
      • Social geo-tags: Geonames.org, “Public transportation”
      • GeoMapping search tools: Microsoft Virtual Earth SDK, “Restaurants”
      • Image classification: “Beach”, “Downtown”
      • On-demand survey: Amazon Mechanical Turk (AMT), “Highway”
      Service Characteristics:
      • JavaScript parsing engines: TripAdvisor & Travelocity, “# of Internal amenities”, “Reviewer Rating”, “# of online reviews”
      Additional Review Characteristics:
      • Text mining: review-based content from TripAdvisor & Travelocity, text features (e.g., “Breakfast”, “Staff”), “Subjectivity”, “Readability”, “Disclosure of Reviewer Identity”
  17. Ghose, Ipeirotis & Li (Mkt Sci 2012)
      A Structural Model for Demand Estimation:
      $u_{ijkt} = X_{jkt}\beta_i - \alpha_i P_{jkt} + \xi_{jkt} + \varepsilon_{ikt}$
      where $u_{ijkt}$ is the hotel utility, $\beta_i$ and $\alpha_i$ are consumer-specific random coefficients, and $\varepsilon_{ikt}$ is the error term (Type I EV).
      Random Coefficient Logit Model (Song 2011, PCM 2007, BLP 1995)
      How to capture consumer heterogeneity?
      • Each individual consumer has a different $\beta_i$, $\alpha_i$
      • Each individual consumer has a different error $\varepsilon_i$
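To make the role of consumer-specific coefficients concrete, here is a minimal sketch of logit choice probabilities averaged over heterogeneous consumers. It assumes utility of the form X·β_i − α_i·P + ξ with Type I EV errors, price entering negatively (the usual convention, assumed here), and an outside option with utility 0. The two hotels, the two consumer types, and all numbers are hypothetical; this is not the authors' estimation code.

```python
import math


def logit_shares(utilities):
    """Choice probabilities under Type I EV errors, outside option utility = 0."""
    exps = [math.exp(u) for u in utilities]
    denom = 1.0 + sum(exps)  # the 1.0 is exp(0) for the outside option
    return [e / denom for e in exps]


def deterministic_utility(x, price, beta_i, alpha_i, xi=0.0):
    """X * beta_i - alpha_i * P + xi for one consumer and one hotel."""
    return x * beta_i - alpha_i * price + xi


# Hypothetical hotels: (quality index X, price P).
hotels = [(4.0, 2.0), (3.0, 1.0)]

# Hypothetical consumer types: (beta_i, alpha_i).
consumers = [
    (1.0, 1.5),  # price-sensitive consumer
    (1.0, 0.5),  # less price-sensitive consumer
]

# Each consumer has their own choice probabilities...
per_consumer = [
    logit_shares([deterministic_utility(x, p, b, a) for x, p in hotels])
    for b, a in consumers
]

# ...and market shares average those probabilities over consumers.
market_shares = [
    sum(shares[j] for shares in per_consumer) / len(consumers)
    for j in range(len(hotels))
]

for (b, a), shares in zip(consumers, per_consumer):
    print(f"alpha_i={a}: P(hotel 1)={shares[0]:.3f}, P(hotel 2)={shares[1]:.3f}")
print("market shares:", [round(s, 3) for s in market_shares])
```

The price-sensitive type favors the cheaper hotel and the other type favors the higher-quality one, which a single common coefficient could not reproduce.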
  18. Ghose, Ipeirotis & Li (Mkt Sci 2012)
      Identification – Price Endogeneity:
      IV for price: variables that are correlated with price, but not with the error.
      [Diagram: Price and the error term $\varepsilon_i$ are both driven by advertising, publicity…; the IV (e.g., cost) affects Price only.]
      Stage 1: Regress Price on X and the IV;
      Stage 2: Predict ^Price based purely on X and the IV, and substitute Price with the predicted ^Price.
      → ^Price will not be correlated with the error!
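The two stages can be sketched on simulated data. Everything below is an assumption made for illustration: a single instrument (a cost shifter), no other covariates X, a true price effect of -1, and unobserved advertising that raises both price and demand. Plain OLS is pulled toward zero by the advertising channel, while the two-stage estimate lands near the true value.

```python
import random


def ols(x, y):
    """Return (intercept, slope) from a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return my - slope * mx, slope


rng = random.Random(42)  # fixed seed for reproducibility
n = 20000
true_price_effect = -1.0

cost, price, demand = [], [], []
for _ in range(n):
    z = rng.gauss(0, 1)               # cost shifter: the instrument
    u = rng.gauss(0, 1)               # unobserved advertising (in the error)
    p = z + u + rng.gauss(0, 0.5)     # price responds to cost AND advertising
    d = true_price_effect * p + 2.0 * u + rng.gauss(0, 0.5)
    cost.append(z)
    price.append(p)
    demand.append(d)

# Naive OLS of demand on price: biased because price is correlated with u.
_, ols_slope = ols(price, demand)

# Stage 1: regress price on the instrument and form predicted price.
a1, b1 = ols(cost, price)
price_hat = [a1 + b1 * z for z in cost]

# Stage 2: regress demand on predicted price.
_, tsls_slope = ols(price_hat, demand)

print(f"OLS slope  = {ols_slope:.3f}  (biased)")
print(f"2SLS slope = {tsls_slope:.3f}  (true effect is {true_price_effect})")
```

Running the second stage by hand gives the right point estimate but not the right standard errors, so a dedicated IV routine is preferable for inference.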
  19. Ghose, Ipeirotis & Li (Mkt Sci 2012)
      Identification – Price Endogeneity:
      IV for price: variables that are correlated with price, but not with the error. Candidate instruments:
      • Average price of the “same-star rating” hotels in the other markets as an instrument for price (Hausman et al. 1994).
      • BLP-style instruments: average characteristics of the same-star rating hotels in the other markets (BLP 1995).
      • Lagged prices as instruments, in conjunction with Google Trends data to control for correlated demand shocks (similar to Archak et al. 2011).
      • Region dummies as proxies for cost (e.g., the cost of transportation, labor, etc.) (Nevo 2001).
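The first instrument in the list is simple to construct. A minimal sketch, assuming one price per hotel and invented records (the field names, markets, and prices are hypothetical): for each hotel, average the prices of hotels with the same star rating located in other markets.

```python
# Hypothetical hotel records; field names and values are illustrative only.
hotels = [
    {"id": "h1", "market": "NYC", "stars": 4, "price": 300.0},
    {"id": "h2", "market": "BOS", "stars": 4, "price": 200.0},
    {"id": "h3", "market": "CHI", "stars": 4, "price": 100.0},
    {"id": "h4", "market": "BOS", "stars": 3, "price": 120.0},
    {"id": "h5", "market": "NYC", "stars": 3, "price": 180.0},
]


def hausman_instrument(hotel, all_hotels):
    """Average price of same-star hotels in OTHER markets, or None if no peers."""
    peers = [
        h["price"]
        for h in all_hotels
        if h["stars"] == hotel["stars"] and h["market"] != hotel["market"]
    ]
    return sum(peers) / len(peers) if peers else None


instruments = {h["id"]: hausman_instrument(h, hotels) for h in hotels}

for hotel_id, value in instruments.items():
    print(hotel_id, value)
```

The instrument is only valid if demand shocks are not correlated across markets, which is why the slide pairs lagged prices with Google Trends controls.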
  20. Ghose, Ipeirotis & Li (Mkt Sci 2012)
      Identification – UGC Endogeneity:
      [Diagram: the UGC rating and the error term $\varepsilon_i$ are both driven by advertising, publicity, unobserved quality… (both time-variant and time-invariant).]
      • Product-Fixed Effect
      • Diff-in-Diff
      • IV
      • Regression Discontinuity (Luca 2011)
  21. Ghose, Ipeirotis & Li (Mkt Sci 2012)
      Summary:
      1. Identify the important product characteristics that influence demand → machine learning for social media variables.
      2. Random coefficient logit model to estimate how these product characteristics influence demand. Identification: price/UGC endogeneity!
      3. Derive the expected utility gain (surplus) from each product and propose a ranking framework based on surplus.
      4. Randomized experiments for ranking validation.
  22. Luca (HBS Working Paper 2011)
      Research Question: How do online reviews affect product demand?
      Challenge: Causal relationship → UGC endogeneity
      Identification: Regression Discontinuity
      Data:
      • Reviews from Yelp.com, 3,582 Seattle restaurants;
      • Revenue from the Washington State Department of Revenue, 2003-2009.
  23. Luca (HBS Working Paper 2011)
      Identification:
      • Unobserved factors that are correlated with both Yelp rating and demand (e.g., restaurant quality).
      [Diagram: the UGC rating and the error term $\varepsilon_i$ are both driven by advertising, publicity, unobserved quality… (both time-variant and time-invariant).]
      Main Idea:
      • Rounding mechanism: ratings are rounded to the nearest half-star.
      • Seek discontinuous jumps in revenue that follow discontinuous changes in rating.
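The rounding mechanism is what creates the discontinuity: two restaurants with nearly identical average ratings can be displayed half a star apart. A minimal sketch, assuming ties are rounded up (the function name and the tie-breaking rule are assumptions, not details given on the slide):

```python
import math


def displayed_rating(average):
    """Round an average rating to the nearest half-star (ties rounded up, assumed)."""
    return math.floor(average * 2 + 0.5) / 2


# Two nearly identical restaurants end up on opposite sides of a threshold.
just_below = displayed_rating(3.24)
just_above = displayed_rating(3.26)

print(f"3.24 is displayed as {just_below} stars")
print(f"3.26 is displayed as {just_above} stars")
```

Python's built-in `round` is avoided here because it rounds ties to the nearest even number, which would treat a tie such as 3.25 differently from the assumed rule.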
  24. Luca (HBS Working Paper 2011)
      RD Design:
  25. Luca (HBS Working Paper 2011)
      Model: [equation image; its annotated terms are listed below]
      • Restaurant and quarter fixed effects
      • Continuous unrounded rating
      • Impact of moving from just below a discontinuity to just above a discontinuity, controlling for the continuous change in unrounded rating.
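The logic of the model can be sketched without the fixed effects. The example below is an assumption-laden toy, not Luca's specification: it uses a single threshold at 3.25, noise-free invented data with a linear trend in the unrounded rating plus a jump of 0.05 in log revenue, and omits restaurant and quarter fixed effects. The jump is estimated by partialling the continuous rating out of both the outcome and the above-threshold indicator (Frisch-Waugh), which is equivalent to regressing the outcome on both variables.

```python
def residualize(v, q):
    """Residuals from a simple OLS regression of v on q (with intercept)."""
    n = len(v)
    mv, mq = sum(v) / n, sum(q) / n
    slope = sum((qi - mq) * (vi - mv) for qi, vi in zip(q, v)) / sum(
        (qi - mq) ** 2 for qi in q
    )
    return [vi - mv - slope * (qi - mq) for vi, qi in zip(v, q)]


def rd_jump(outcome, running, threshold):
    """Coefficient on 1[running >= threshold], controlling linearly for running."""
    above = [1.0 if q >= threshold else 0.0 for q in running]
    y_res = residualize(outcome, running)
    t_res = residualize(above, running)
    return sum(t * y for t, y in zip(t_res, y_res)) / sum(t * t for t in t_res)


threshold = 3.25
# Unrounded ratings on a grid around the threshold: 3.06, 3.08, ..., 3.44.
running = [3.06 + 0.02 * i for i in range(20)]

# Hypothetical log revenue: smooth trend in the rating plus a 0.05 jump.
true_jump = 0.05
log_revenue = [
    10.0 + 0.10 * q + (true_jump if q >= threshold else 0.0) for q in running
]

jump = rd_jump(log_revenue, running, threshold)
print(f"estimated jump at the threshold = {jump:.4f}")
```

With real data one would restrict the sample to a bandwidth around each threshold and allow the slope to differ on either side.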
  26. Luca (HBS Working Paper 2011)
      Key Identification Assumption:
      - Restaurants become increasingly similar when approaching the threshold from either side.
      - Random assignment of restaurants to either side of the rounding threshold.
      McCrary density test for “gaming”:
      - Selection bias: the thresholds can also be seen by the restaurants, so restaurants may submit reviews themselves to pass the rounding threshold.
      - If so, one would expect to see a disproportionately large number of restaurants just above the rounding thresholds.
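The intuition behind the density test can be shown with a much cruder check. The sketch below is a simplified stand-in, not McCrary's local-linear density estimator: it counts observations in a narrow band on each side of a threshold and compares the split with the 50/50 expected without gaming, using a normal approximation. The function name, bandwidth, and sample scores are hypothetical.

```python
import math


def bunching_z(scores, threshold, bandwidth):
    """z-statistic for 'too many observations just above the threshold'.

    Counts scores in [threshold - bandwidth, threshold) and
    [threshold, threshold + bandwidth) and tests the split against 50/50.
    """
    below = sum(1 for s in scores if threshold - bandwidth <= s < threshold)
    above = sum(1 for s in scores if threshold <= s < threshold + bandwidth)
    n = below + above
    if n == 0:
        raise ValueError("no observations inside the bandwidth")
    return (above - n / 2) / math.sqrt(n / 4)


# Hypothetical samples of unrounded ratings near the 3.25 threshold.
no_gaming = [3.20] * 48 + [3.30] * 52
gaming = [3.20] * 30 + [3.30] * 70

z_clean = bunching_z(no_gaming, threshold=3.25, bandwidth=0.1)
z_gamed = bunching_z(gaming, threshold=3.25, bandwidth=0.1)

print(f"no gaming: z = {z_clean:.2f}")
print(f"gaming:    z = {z_gamed:.2f}")
```

A large positive value suggests bunching just above the threshold, which would undermine the random-assignment assumption.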
  27. Luca (HBS Working Paper 2011)
      Conclusion: A one-star increase in Yelp rating causes a 5-9% increase in revenue!
      Note: When using an RD design, one needs to seriously consider:
      • Cost of agents’ “gaming” behavior: RD is only valid when agents face a sufficiently high cost of selection, e.g., geographic/age thresholds.
      • Knowledge of agents: RD is valid when agents do not know the cutoff threshold, or their own score, or both (e.g., McCrary density test, Luca 2011).
  28. Discussions
      Aspects of social media content that are examined:
      - Online ratings (valence, volume, variance, helpfulness)
      - Review text (length, sentiments, readability and linguistic styles)
      - Reviewer information (identity disclosure)
      - Social-tags
      - Blogs (music blogs, enterprise blogs, microblogging)
      - Discussion forums
      - Mobile UGC
  29. Discussions
      Product categories that are examined:
      - Books
      - Electronics, digital cameras, etc.
      - Software
      - TV shows
      - Movie box office
      - Video games
      - Mobile phones
      - Hotels
      - Restaurants
      - Bath & home products
      - Stocks
  30. Discussions
      Identification strategies that are mostly used:
      - Fixed-Effect
      - Diff-in-Diff
      - Regression Discontinuity
      - Natural Experiment
      - Instrumental Variable
      - Propensity Score Matching
      - Randomized Experiment
  31. Discussions
      Data-Driven Identification?
      • Natural Experiment Setting
      Research Question-Driven Identification?
      • Regression Discontinuity Design
      • Diff-in-Diff
      • Instrumental Variable
      There is a range of approaches, but they all need some prior economic thought.