Research Using Behavioral Big Data
A Tour and Why Mechanical Engineers Should Care
Galit Shmueli
I’m not a mechanical engineer
PhD in Statistics, Technion IE&M
CMU Statistics Dept
U of Maryland Business School
Indian School of Business
National Tsing (“Ching”) Hua U, Inst. Service Science
Prof. Menachem Shmueli (of blessed memory)
1935-1980
Mechanical Engineering, Technion
Research in Data Analytics
‘Entrepreneurial’ statistical &
data mining modeling
(for today’s problems)
Interdisciplinary Research
Statistical Strategy
To Explain or To Predict?
Information Quality
Data Mining and Causality
What is Behavioral Big Data (BBD)
Special type of Big Data
Behavioral: people’s actions, interactions,
self-reported opinions, thoughts, feelings
Human and social aspects: Intentions, deception,
emotion, reciprocation, herding,…
When aware of data collection -> modify behavior
(legal risks, embarrassment, unwanted solicitation)
BBD vs. Medical Big Data
Medical Big Data:
• Physical measurements
• Data collection timing often set by medical system
• Clinical trials: awareness & vested interest
BBD:
• People’s daily actions, interactions, self-reported feelings, opinions, thoughts
• Data generation timing often chosen by user
• Experiments: users often unaware; goal not always in user’s interest
BBD on Citizens and Customers – old story
Governments
law enforcement,
security, traffic
(cameras, sensors)
Financial Institutions
fraud, loans
(IT systems, cameras)
Telecoms
fraud, infrastructure, marketing
(IT systems, mobile)
Retail Chains
marketing, operations, merchandising
(POS systems, video, social, mobile)
Insurance
Usage-based premiums
(telematics)
“Old”:
• Cameras
• Sensors
• IT systems
(POS, calls,…)
New:
• GPS
• Internet
• Mobile
• Social
• Things
BBD on Employees
Service Providers
quality control, employee performance
Electronic Performance Monitoring
(EPM) systems, web surfing, e-mails
sent and received, telephone use,
video, location (taxis)
BBD on Citizens, Customers, Employees: Internet!
• BBD now also available to small companies & organizations
• Online platforms have BBD (e-commerce, gaming, search,
social networks…)
• Voluntarily entered by users (UGC): personal details, photos,
comments, messages, search terms, bids in auctions, likes,
payment information, connections with “friends”
• Passive footprints: duration on the website, pages browsed,
sequence, referring website, Internet browser, operating
system, location, IP address.
• BBD now available to individuals: Quantified Self
1. Research Opportunity
2. Understand
3. Collaborate
How does your ME work relate to BBD?
To Data Analytics & Social Sci?
Engineering
Social Sciences
Data Analytics
Behavioral Big Data
From theory to practice
Two important points
More and more human and social activities are moving online
Most companies that have BBD were not created for the purpose of generating BBD
Why should mechanical engineers care about BBD?
Technology is advancing in two directions
Fully automated (algorithmic) solutions
Micro-level recording of human and social behavior
Because you are (and should be) involved in designing both!
the most crucial choices about the future of ordinary voters and their children are
probably made not by Brussels bureaucrats or Washington lobbyists but by
engineers, entrepreneurs, and scientists who are hardly aware of the
implications of their decisions, and who certainly don’t represent anyone.
Brief Tour of BBD Research
in the Land of Social Science & Business
Research using BBD
Duncan Watts, Microsoft Research (NY):
1. Social science problems are almost always more
difficult than they seem
2. The data required to address many problems of
interest to social scientists remain difficult to assemble
3. Thorough exploration of complex social problems
often requires the complementary application of
multiple research traditions
Academic Research Qs using BBD
Causal questions about
human and social behavior
examine new
phenomena
re-examine old phenomena
with better data
Research Methodologies Using BBD
Quasi
experiments
Randomized
experiments
Observational
studies
Survey
studies
Natural
experiments
Research Communities
Researchers with social science + technical backgrounds
Information
Systems
Marketing
Computational
Social Science
7 Examples of BBD Studies in Top Journals
Emotional Contagion in Social Networks
(Kramer et al. Proceedings of the National Academy of Sciences, 2014)
• Can emotional states be transferred to
others via emotional contagion?
• Old question, new data
• Large-scale experiment run by FB,
manipulating users’ exposure level to
emotional expressions in their
Facebook News Feed
Anonymous Browsing in Dating Websites (Bapna et al. Management Science, 2016)
• How does anonymous browsing affect outcomes on dating sites?
• New questions about human behavior due to new technologies
• Large-scale experiment on N American dating website
Identifying Influential and Susceptible
Members of Social Networks
(Aral and Walker, Science, 2012)
• How do individuals’ attributes
modulate peer influence?
• Old question in new context
• Experiment on social news
aggregation website where users
contribute news articles, discuss
them, and rate comments
Consumption in Virtual Worlds
(Hinz et al. Info Sys Research, 2015)
• Does conspicuous consumption increase social status?
• Age-old sociology question with new BBD data
• Observational BBD from 2 virtual world websites (gaming with social network)
Impact of Online Intermediaries on
HIV Transmission
(Ghose & Chan MIS Quarterly, 2015)
• Does entry of major online personals
ad website increase HIV prevalence?
• New context
• Natural experiment on Craigslist
Impact of Info Hiding on Crowdfunding
(Burtch et al. Management Science, 2016)
• Does peer influence drive information
hiding in crowdfunding campaigns, and how
does hiding affect contributions?
• New online social context
• Observational BBD from large online
crowdfunding platform
Forecasting Elections with Non-Representative Polls
(Wang et al. International Journal of Forecasting, 2014)
• Can elections be forecast using a non-representative
sample?
• Old question, new data
• Survey BBD from Xbox with built-in daily poll
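One common way to correct a non-representative sample is post-stratification: estimate the outcome within demographic cells, then reweight the cells by their share of the target population. The sketch below is a minimal illustration only. The cell names, estimates, counts, and population shares are made-up assumptions, and the code is not the model used in the cited study.

```python
def sample_mean(cell_estimates, sample_counts):
    """Raw estimate: each cell weighted by how often it appears in the sample."""
    total = sum(sample_counts.values())
    return sum(cell_estimates[c] * sample_counts[c] for c in sample_counts) / total


def poststratified_mean(cell_estimates, population_shares):
    """Adjusted estimate: each cell weighted by its share of the target population."""
    total = sum(population_shares.values())
    return sum(cell_estimates[c] * population_shares[c] for c in population_shares) / total


# Hypothetical numbers: the sample over-represents the "under_30" cell.
cell_estimates = {"under_30": 0.40, "30_plus": 0.60}     # support within each cell
sample_counts = {"under_30": 900, "30_plus": 100}        # respondents per cell
population_shares = {"under_30": 0.30, "30_plus": 0.70}  # assumed population mix

raw = sample_mean(cell_estimates, sample_counts)
adjusted = poststratified_mean(cell_estimates, population_shares)
print(f"raw sample estimate:      {raw:.2f}")
print(f"post-stratified estimate: {adjusted:.2f}")
```

The adjustment only helps if the cells capture the ways in which the sample differs from the population, which is a subject-matter judgment rather than a computation.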
ONE WAY MIRRORS IN
ONLINE DATING
A Randomized Field Experiment
Ravi Bapna, University of Minnesota
Jui Ramaprasad, McGill University
Galit Shmueli, National Tsing Hua
University
Akhmed Umyarov, University of Minnesota
Online Dating
of the single population in the US uses online dating to
find a partner (Gelles 2011)
%
Online Dating Website
Non-anonymous Browsing (Default)
Profile Visit -> Recent visitor: [visitor's profile shown]
Anonymous Browsing
Profile Visit -> Recent visitor: NONE
Research Question (in simple words)
How does
anonymous browsing
affect user behavior?
… and matching?
Formal Research Question
What is the relative causal effect of
social inhibitions on search
preferences vs. social inhibitions
of contact initiation in dating
markets?
Given known gender asymmetries,
how does this effect differ for men
vs. women?
Randomized Field Experiment
on Large Online Dating Website
50,000 users receive gift of
anonymous browsing
Results
Users treated with anonymity
become disinhibited
view more profiles, view more same-sex and interracial mates
get fewer matches
lose ability to leave a weak signal
- especially harmful for women!
Role of anonymity and
importance of
WEAK SIGNAL
in online platforms
BBD-based Research: Academia vs. Industry
In Academia
Purpose: Scientific inquiry
Causal Qs are most popular
• Determinants of social phenomena
• Impact studies
Predictive Qs (quite rare)
In Industry
Purpose: evaluate or improve products, service, operations, etc.
Mostly predictive, but also causal
• Netflix Prize: recommender system
• Yahoo!, LinkedIn, FB: personalized news content to increase user engagement/clicks
• Target: pregnancy prediction
• Amazon: pricing, logistics,...
• Government: campaign targeting
Getting BBD for Research
1. Open Data, Publicly Available Data
Data.gov
Twitter
Kaggle (UCI MR)
API and web scraping
2. Partnering with a Company
• Both parties interested in research question
• Data purchase
• Personal connections
• Partnership between school and organization
(CMU Living Analytics Research Lab)
3. Crowdsourcing
AMT (Amazon Mechanical Turk): replacing student subjects
• Experiment subjects
• Survey respondents
• Cleaning and tagging data
“easy access to a large, stable, and diverse
subject pool, the low cost of doing
experiments, and faster iteration between
developing theory and executing
experiments” [Mason and Suri, 2012]
Using BBD for Research: Human Subjects
Institutional Review Board (IRB)
“ethics committee”
University-level committee designated
to approve, monitor, and review
biomedical and behavioral research
involving humans.
• performs benefit-risk analysis for
proposed study
• guidelines: Beneficence, Justice, and
Respect for persons
• HHS proposes new IRB exemption criteria for publicly available data (or even purchased data)
• Council for Big Data, Ethics & Society’s letter: “these criteria for exclusion focus on the
status of the dataset… not the content of the dataset nor what will be done with the
dataset, which are more accurate criteria for determining the risk profile of the
proposed research”
Ethics: Beyond IRB
Facebook experiment [Kramer et al. 2014]:
IRB Exemption
“[The work] was consistent with Facebook’s Data Use Policy, to which
all users agree prior to creating an account on Facebook, constituting
informed consent for this research.”
• Expression of Concern by PNAS editor
• Varied response from public, academia,
press, ethicists, corporates [Adar 2015]
Big Behavioral Field Experiments:
5 Challenges
Big Behavioral Field Experiments: Challenges
1. Fast-Changing Environment
Users keep evolving
Technology changes fast (Netflix)
Parallel experiments run every day (Amazon)
2. Multidimensional Behavior, Context, Objectives
Comp. advertising & content recommendation: 3M’s [Agarwal & Chen 2016]
• Multi-response (clicks, shares, likes,…)
• Multi-context (mobile, email,...)
• Multiple objectives (engagement, revenue,...)
BB Field Experiments: More Challenges
3. Knowledge of Allocation; Gift Effect (≈ clinical trials)
• Allocation knowledge can affect outcome
• Blinding? placebo?
• Online users discover their allocation via online forums
• “Gift” or preferential treatment can affect outcome
4. Spillover Effects
Treatment can affect control group (social networks)
How to randomly assign on a social network?
Dependence among units (data analysis) [Fienberg, 2015]
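One way to limit spillover when assigning treatment on a social network is to randomize whole clusters of connected users rather than individuals, so that friends always land in the same arm. The sketch below is a minimal illustration under strong assumptions: it uses connected components as clusters, and the user names and edges are hypothetical. Real social networks usually have one giant component, so a practical design would need a graph-clustering step instead.

```python
import random
from collections import defaultdict


def connected_components(edges, nodes):
    """Group nodes into connected components using an iterative depth-first search."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


def assign_by_cluster(edges, nodes, seed=0):
    """Randomly assign each component, as a whole, to treatment or control."""
    rng = random.Random(seed)
    assignment = {}
    for component in connected_components(edges, nodes):
        arm = rng.choice(["treatment", "control"])
        for node in component:
            assignment[node] = arm
    return assignment


# Hypothetical friendship network with three clusters ("fay" has no friends).
edges = [("ann", "bob"), ("bob", "cat"), ("dan", "eve")]
nodes = ["ann", "bob", "cat", "dan", "eve", "fay"]
assignment = assign_by_cluster(edges, nodes, seed=42)
for user in nodes:
    print(user, assignment[user])
```

Because the unit of randomization is the cluster, the analysis should also treat clusters, not users, as the independent units.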
BB Field Experiments: Even More Challenges
5. Ethical and Moral Issues
Ease of running a large scale experiment quickly and at low cost
-> danger of harming many people quickly
small scale pilot study?
Experiment platforms: Fair treatment & payment
Big Behavioral Quasi-Experiments
& Observational Studies:
5 Methodological Issues
Quasi-Experiments and Observational BBD:
Methodological Challenges
1. Data Size & Dimension
Scaling of statistical inference: p-values, multiple testing
“Too Big to Fail: Large Samples and the p-Value Problem” (Lin, Lucas & Shmueli ISR 2013)
Data Dredging
Can detect lots of tiny & complex effects
Role of theory vs data discovery
Role of Prediction
“Predictive Analytics in Information Systems Research” (Shmueli & Koppius MISQ 2011)
2. Self-Selection Bias
Users choose treatment/control group
Scaling of stat/econ methods to big data
“A Tree-Based Approach for
Addressing Self-selection in Impact
Studies with Big Data” (Yahav,
Shmueli & Mani, MIS Quarterly 2016)
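The large-sample p-value issue in item 1 can be illustrated with a short calculation. The sketch below holds a tiny effect fixed and only increases the sample size; the effect size, standard deviation, and sample sizes are made-up assumptions, and a simple two-sample z-test with known, equal standard deviations stands in for whatever model a real study would use.

```python
import math


def two_sample_z_pvalue(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means, assuming the same known SD in both groups."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical effect: a difference of 0.01 standard deviations, practically negligible.
for n in (100, 10_000, 1_000_000):
    p = two_sample_z_pvalue(mean_diff=0.01, sd=1.0, n_per_group=n)
    print(f"n per group = {n:>9,}  p-value = {p:.4g}")
```

The effect never changes, yet the p-value shrinks toward zero as n grows, which is why effect sizes and their practical meaning matter more than statistical significance at this scale.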
More challenges (in search of causal explanations)
3. Simpson’s Paradox
Causal direction reverses when data
are disaggregated
Big data: lots of possible breakdowns
“The Forest or the Trees?
Tackling Simpson’s Paradox
with Classification Trees”
(Shmueli & Yahav, 2016)
Does a dataset display a paradox?
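A direct check for the reversal can be sketched by comparing success rates within each subgroup against the pooled rates. The counts below are patterned on the widely cited kidney-stone treatment example and should be treated as illustrative; the function names are hypothetical. This brute-force check over one given breakdown is not the tree-based approach of the cited paper.

```python
# (successes, trials) for arms A and B within each subgroup; illustrative counts.
data = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}


def rate(successes, trials):
    return successes / trials


def aggregate(data, arm):
    """Pool successes and trials for one arm across all subgroups."""
    successes = sum(group[arm][0] for group in data.values())
    trials = sum(group[arm][1] for group in data.values())
    return successes, trials


def displays_paradox(data):
    """True if the same arm wins in every subgroup but loses in the pooled data."""
    subgroup_winners = {
        "A" if rate(*group["A"]) > rate(*group["B"]) else "B"
        for group in data.values()
    }
    if len(subgroup_winners) != 1:
        return False  # subgroups disagree, so there is no single direction to reverse
    pooled_winner = "A" if rate(*aggregate(data, "A")) > rate(*aggregate(data, "B")) else "B"
    return pooled_winner not in subgroup_winners


for name, group in data.items():
    print(name, f"A={rate(*group['A']):.3f}", f"B={rate(*group['B']):.3f}")
print("pooled", f"A={rate(*aggregate(data, 'A')):.3f}", f"B={rate(*aggregate(data, 'B')):.3f}")
print("paradox:", displays_paradox(data))
```

With big data there are many candidate breakdowns, so a check like this would have to be repeated over each one, which is part of what makes the problem hard.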
And finally…
5. Data Contaminated by Experiments
+ some of the randomized experiments issues
(fast-changing environment, etc.)
Using Observational Data: Ethical & Moral Issues
1. Web data collection by researchers
2. Data protection, data sharing, and reproducible research
(Privacy - Netflix)
3. Data tagging by AMT – fair payment (+quality issues)
Large Scale Surveys
Data quality issues at large scale
• duplicate responses
• insincere responses
Online surveys: cheap, easy, fast
Large pool of available “workers”
Supplement experimental/observational studies
The promise of paradata
Data on how the survey was accessed/answered
(OECD Survey of Adult Skills)
• time stamps of opening invitation email, survey access,…
• duration for answering each question
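Paradata such as per-question durations can feed simple data-quality screens for the duplicate and insincere responses mentioned earlier. The sketch below is a minimal illustration; the field names `respondent_id` and `seconds_per_question` and the 2-second threshold are hypothetical assumptions, not taken from any particular survey platform.

```python
def flag_responses(responses, min_seconds_per_question=2.0):
    """Return {response index: list of reasons} for duplicate or implausibly fast responses."""
    seen_ids = set()
    flags = {}
    for index, response in enumerate(responses):
        reasons = []
        # A respondent ID that already appeared marks a repeat submission.
        if response["respondent_id"] in seen_ids:
            reasons.append("duplicate")
        seen_ids.add(response["respondent_id"])
        # A very low average time per question suggests the answers were not considered.
        durations = response["seconds_per_question"]
        if durations and sum(durations) / len(durations) < min_seconds_per_question:
            reasons.append("too_fast")
        flags[index] = reasons
    return flags


# Hypothetical responses with per-question durations in seconds.
responses = [
    {"respondent_id": "r1", "seconds_per_question": [8.0, 12.5, 6.0]},
    {"respondent_id": "r2", "seconds_per_question": [0.8, 1.1, 0.9]},
    {"respondent_id": "r1", "seconds_per_question": [7.0, 9.0, 10.0]},
]
flags = flag_responses(responses)
print(flags)
```

Flags like these are screening aids; whether a flagged response is actually dropped remains a judgment call for the researcher.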
The real gorilla in large scale surveys: Generalization
Sampling and non-sampling errors
“The central issue is whether conditional effects in the sample… may
be transported to desired target populations. Success depends on
compatibility of causal structures in study and target populations,
and will require subject matter considerations in each concrete
case.” - Keiding and Louis, JRSS 2016
Statistical generalization & scientific generalization
Who do the Turkers represent?
Information Quality: The Potential of Data & Analytics to Generate Knowledge, Kenett & Shmueli, Wiley 2016
“Clarifying the terminology that describes scientific reproducibility” (Kenett & Shmueli, Nature Methods 2015)
Summary
Technical Challenges
Data access
Analysis scalability
Quick-changing environment
BBD = lots of behavioral data
Who has it?
How is it analyzed?
For what purpose?
Methodological Challenges
Selection bias
Generalization
Data contaminated by other experiments
Spillover effects
Lack of methodical lifecycle
Legal, Ethical, Moral Challenges
Privacy violation (Netflix; networks)
Risks to human subjects
Company vs. Researcher Objectives
Gains of company at expense of
individuals, communities, societies, &
science
Why should mechanical engineers care about BBD?
Technology is advancing in two directions
Fully automated
(algorithmic) solutions
Micro-level recording of
human and social behavior
Going Forward…
Convergence of
Social Sciences & Engineering
Things now collect BBD
(intentionally or not)
Galit Shmueli 徐茉莉
Institute of Service Science

More Related Content

Similar to Research Using Behavioral Big Data: A Tour and Why Mechanical Engineers Should Care

Researcher Dilemmas using Behavioral Big Data in Healthcare (INFORMS DMDA Wo...
Researcher Dilemmas  using Behavioral Big Data in Healthcare (INFORMS DMDA Wo...Researcher Dilemmas  using Behavioral Big Data in Healthcare (INFORMS DMDA Wo...
Researcher Dilemmas using Behavioral Big Data in Healthcare (INFORMS DMDA Wo...Galit Shmueli
 
Behavioral Big Data & Healthcare Research
Behavioral Big Data & Healthcare ResearchBehavioral Big Data & Healthcare Research
Behavioral Big Data & Healthcare ResearchGalit Shmueli
 
AAPOR - comparing found data from social media and made data from surveys
AAPOR - comparing found data from social media and made data from surveysAAPOR - comparing found data from social media and made data from surveys
AAPOR - comparing found data from social media and made data from surveysCliff Lampe
 
Determinants of Internet Media Abuse In Workplace.
Determinants of Internet Media Abuse In Workplace.Determinants of Internet Media Abuse In Workplace.
Determinants of Internet Media Abuse In Workplace.Fairul Hisyam Mat
 
RESEARCH ARTICLEEXPECTING THE UNEXPECTED EFFECTS OF DATA.docx
RESEARCH ARTICLEEXPECTING THE UNEXPECTED  EFFECTS OF DATA.docxRESEARCH ARTICLEEXPECTING THE UNEXPECTED  EFFECTS OF DATA.docx
RESEARCH ARTICLEEXPECTING THE UNEXPECTED EFFECTS OF DATA.docxaudeleypearl
 
Scientific Reproducibility from an Informatics Perspective
Scientific Reproducibility from an Informatics PerspectiveScientific Reproducibility from an Informatics Perspective
Scientific Reproducibility from an Informatics PerspectiveMicah Altman
 
Reproducibility from an infomatics perspective
Reproducibility from an infomatics perspectiveReproducibility from an infomatics perspective
Reproducibility from an infomatics perspectiveMicah Altman
 
1. Data Science overview - part1.pptx
1. Data Science overview - part1.pptx1. Data Science overview - part1.pptx
1. Data Science overview - part1.pptxRahulTr22
 
Big Data Ethics Cjbe july 2021
Big Data Ethics Cjbe july 2021Big Data Ethics Cjbe july 2021
Big Data Ethics Cjbe july 2021andygustafson
 
BROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCES
BROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCESBROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCES
BROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCESMicah Altman
 
Ethical and Legal Issues in Computational Social Science - Lecture 7 in Intro...
Ethical and Legal Issues in Computational Social Science - Lecture 7 in Intro...Ethical and Legal Issues in Computational Social Science - Lecture 7 in Intro...
Ethical and Legal Issues in Computational Social Science - Lecture 7 in Intro...Lauri Eloranta
 
"Reproducibility from the Informatics Perspective"
"Reproducibility from the Informatics Perspective""Reproducibility from the Informatics Perspective"
"Reproducibility from the Informatics Perspective"Micah Altman
 
Breakout 3. AI for Sustainable Development and Human Rights: Inclusion, Diver...
Breakout 3. AI for Sustainable Development and Human Rights: Inclusion, Diver...Breakout 3. AI for Sustainable Development and Human Rights: Inclusion, Diver...
Breakout 3. AI for Sustainable Development and Human Rights: Inclusion, Diver...Saurabh Mishra
 
Blurring the Boundaries? Ethical challenges in using social media for social...
Blurring the Boundaries? Ethical challenges in using social media for social...Blurring the Boundaries? Ethical challenges in using social media for social...
Blurring the Boundaries? Ethical challenges in using social media for social...Kandy Woodfield
 
Big data divided (24 march2014)
Big data divided (24 march2014)Big data divided (24 march2014)
Big data divided (24 march2014)Han Woo PARK
 
Practical Applications for Social Network Analysis in Public Sector Marketing...
Practical Applications for Social Network Analysis in Public Sector Marketing...Practical Applications for Social Network Analysis in Public Sector Marketing...
Practical Applications for Social Network Analysis in Public Sector Marketing...Mike Kujawski
 
Working with Social Media Data: Ethics & good practice around collecting, usi...
Working with Social Media Data: Ethics & good practice around collecting, usi...Working with Social Media Data: Ethics & good practice around collecting, usi...
Working with Social Media Data: Ethics & good practice around collecting, usi...Nicola Osborne
 
A Case for Expectation Informed Design
A Case for Expectation Informed DesignA Case for Expectation Informed Design
A Case for Expectation Informed Designgloriakt
 

Similar to Research Using Behavioral Big Data: A Tour and Why Mechanical Engineers Should Care (20)

Researcher Dilemmas using Behavioral Big Data in Healthcare (INFORMS DMDA Wo...
Researcher Dilemmas  using Behavioral Big Data in Healthcare (INFORMS DMDA Wo...Researcher Dilemmas  using Behavioral Big Data in Healthcare (INFORMS DMDA Wo...
Researcher Dilemmas using Behavioral Big Data in Healthcare (INFORMS DMDA Wo...
 
Behavioral Big Data & Healthcare Research
Behavioral Big Data & Healthcare ResearchBehavioral Big Data & Healthcare Research
Behavioral Big Data & Healthcare Research
 
AAPOR - comparing found data from social media and made data from surveys
AAPOR - comparing found data from social media and made data from surveysAAPOR - comparing found data from social media and made data from surveys
AAPOR - comparing found data from social media and made data from surveys
 
Determinants of Internet Media Abuse In Workplace.
Determinants of Internet Media Abuse In Workplace.Determinants of Internet Media Abuse In Workplace.
Determinants of Internet Media Abuse In Workplace.
 
RESEARCH ARTICLEEXPECTING THE UNEXPECTED EFFECTS OF DATA.docx
RESEARCH ARTICLEEXPECTING THE UNEXPECTED  EFFECTS OF DATA.docxRESEARCH ARTICLEEXPECTING THE UNEXPECTED  EFFECTS OF DATA.docx
RESEARCH ARTICLEEXPECTING THE UNEXPECTED EFFECTS OF DATA.docx
 
Scientific Reproducibility from an Informatics Perspective
Scientific Reproducibility from an Informatics PerspectiveScientific Reproducibility from an Informatics Perspective
Scientific Reproducibility from an Informatics Perspective
 
Reproducibility from an infomatics perspective
Reproducibility from an infomatics perspectiveReproducibility from an infomatics perspective
Reproducibility from an infomatics perspective
 
1. Data Science overview - part1.pptx
1. Data Science overview - part1.pptx1. Data Science overview - part1.pptx
1. Data Science overview - part1.pptx
 
Big Data Ethics Cjbe july 2021
Big Data Ethics Cjbe july 2021Big Data Ethics Cjbe july 2021
Big Data Ethics Cjbe july 2021
 
BROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCES
BROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCESBROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCES
BROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCES
 
Ethical and Legal Issues in Computational Social Science - Lecture 7 in Intro...
Ethical and Legal Issues in Computational Social Science - Lecture 7 in Intro...Ethical and Legal Issues in Computational Social Science - Lecture 7 in Intro...
Ethical and Legal Issues in Computational Social Science - Lecture 7 in Intro...
 
Big data Analytics
Big data AnalyticsBig data Analytics
Big data Analytics
 
"Reproducibility from the Informatics Perspective"
"Reproducibility from the Informatics Perspective""Reproducibility from the Informatics Perspective"
"Reproducibility from the Informatics Perspective"
 
Breakout 3. AI for Sustainable Development and Human Rights: Inclusion, Diver...
Breakout 3. AI for Sustainable Development and Human Rights: Inclusion, Diver...Breakout 3. AI for Sustainable Development and Human Rights: Inclusion, Diver...
Breakout 3. AI for Sustainable Development and Human Rights: Inclusion, Diver...
 
Blurring the Boundaries? Ethical challenges in using social media for social...
Blurring the Boundaries? Ethical challenges in using social media for social...Blurring the Boundaries? Ethical challenges in using social media for social...
Blurring the Boundaries? Ethical challenges in using social media for social...
 
Big data divided (24 march2014)
Big data divided (24 march2014)Big data divided (24 march2014)
Big data divided (24 march2014)
 
Practical Applications for Social Network Analysis in Public Sector Marketing...
Practical Applications for Social Network Analysis in Public Sector Marketing...Practical Applications for Social Network Analysis in Public Sector Marketing...
Practical Applications for Social Network Analysis in Public Sector Marketing...
 
Working with Social Media Data: Ethics & good practice around collecting, usi...
Working with Social Media Data: Ethics & good practice around collecting, usi...Working with Social Media Data: Ethics & good practice around collecting, usi...
Working with Social Media Data: Ethics & good practice around collecting, usi...
 
A Case for Expectation Informed Design
A Case for Expectation Informed DesignA Case for Expectation Informed Design
A Case for Expectation Informed Design
 
Social Media
Social MediaSocial Media
Social Media
 

More from Galit Shmueli

“Improving” prediction of human behavior using behavior modification
“Improving” prediction of human behavior using behavior modification“Improving” prediction of human behavior using behavior modification
“Improving” prediction of human behavior using behavior modificationGalit Shmueli
 
Repurposing Classification & Regression Trees for Causal Research with High-D...
Repurposing Classification & Regression Trees for Causal Research with High-D...Repurposing Classification & Regression Trees for Causal Research with High-D...
Repurposing Classification & Regression Trees for Causal Research with High-D...Galit Shmueli
 
To Explain, To Predict, or To Describe?
To Explain, To Predict, or To Describe?To Explain, To Predict, or To Describe?
To Explain, To Predict, or To Describe?Galit Shmueli
 
Reinventing the Data Analytics Classroom
Reinventing the Data Analytics ClassroomReinventing the Data Analytics Classroom
Reinventing the Data Analytics ClassroomGalit Shmueli
 
Repurposing predictive tools for causal research
Repurposing predictive tools for causal researchRepurposing predictive tools for causal research
Repurposing predictive tools for causal researchGalit Shmueli
 
Statistical Modeling in 3D: Describing, Explaining and Predicting
Statistical Modeling in 3D: Describing, Explaining and PredictingStatistical Modeling in 3D: Describing, Explaining and Predicting
Statistical Modeling in 3D: Describing, Explaining and PredictingGalit Shmueli
 
Workshop on Information Quality
Workshop on Information QualityWorkshop on Information Quality
Workshop on Information QualityGalit Shmueli
 
Behavioral Big Data: Why Quality Engineers Should Care
Behavioral Big Data: Why Quality Engineers Should CareBehavioral Big Data: Why Quality Engineers Should Care
Behavioral Big Data: Why Quality Engineers Should CareGalit Shmueli
 
Statistical Modeling in 3D: Explaining, Predicting, Describing
Statistical Modeling in 3D: Explaining, Predicting, DescribingStatistical Modeling in 3D: Explaining, Predicting, Describing
Statistical Modeling in 3D: Explaining, Predicting, DescribingGalit Shmueli
 
Prediction-based Model Selection in PLS-PM
Prediction-based Model Selection in PLS-PMPrediction-based Model Selection in PLS-PM
Prediction-based Model Selection in PLS-PMGalit Shmueli
 
When Prediction Met PLS: What We learned in 3 Years of Marriage
When Prediction Met PLS: What We learned in 3 Years of MarriageWhen Prediction Met PLS: What We learned in 3 Years of Marriage
When Prediction Met PLS: What We learned in 3 Years of MarriageGalit Shmueli
 
A Tree-Based Approach for Addressing Self-selection in Impact Studies with B...
A Tree-Based Approach  for Addressing Self-selection in Impact Studies with B...A Tree-Based Approach  for Addressing Self-selection in Impact Studies with B...
A Tree-Based Approach for Addressing Self-selection in Impact Studies with B...Galit Shmueli
 
A Tree-Based Approach for Addressing Self-Selection in Impact Studies with Bi...
A Tree-Based Approach for Addressing Self-Selection in Impact Studies with Bi...A Tree-Based Approach for Addressing Self-Selection in Impact Studies with Bi...
A Tree-Based Approach for Addressing Self-Selection in Impact Studies with Bi...Galit Shmueli
 
Big Data - To Explain or To Predict? Talk at U Toronto's Rotman School of Ma...
Big Data - To Explain or To Predict?  Talk at U Toronto's Rotman School of Ma...Big Data - To Explain or To Predict?  Talk at U Toronto's Rotman School of Ma...
Big Data - To Explain or To Predict? Talk at U Toronto's Rotman School of Ma...Galit Shmueli
 
Information Quality: A Framework for Evaluating Empirical Studies
Information Quality: A Framework for Evaluating Empirical Studies Information Quality: A Framework for Evaluating Empirical Studies
Information Quality: A Framework for Evaluating Empirical Studies Galit Shmueli
 
E.SUN Academic Award presentation (Jan 2016)
E.SUN Academic Award presentation (Jan 2016)E.SUN Academic Award presentation (Jan 2016)
E.SUN Academic Award presentation (Jan 2016)Galit Shmueli
 
Big Data & Analytics in the Digital Creative Industries
Big Data & Analytics in the Digital Creative IndustriesBig Data & Analytics in the Digital Creative Industries
Big Data & Analytics in the Digital Creative IndustriesGalit Shmueli
 
On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)
On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)
On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)Galit Shmueli
 
Predictive Model Selection in PLS-PM (SCECR 2015)
Predictive Model Selection in PLS-PM (SCECR 2015)Predictive Model Selection in PLS-PM (SCECR 2015)
Predictive Model Selection in PLS-PM (SCECR 2015)Galit Shmueli
 
Predictive analytics in Information Systems Research (TSWIM 2015 keynote)
Predictive analytics in Information Systems Research (TSWIM 2015 keynote)Predictive analytics in Information Systems Research (TSWIM 2015 keynote)
Predictive analytics in Information Systems Research (TSWIM 2015 keynote)Galit Shmueli
 

More from Galit Shmueli (20)

“Improving” prediction of human behavior using behavior modification
“Improving” prediction of human behavior using behavior modification“Improving” prediction of human behavior using behavior modification
“Improving” prediction of human behavior using behavior modification
 
Repurposing Classification & Regression Trees for Causal Research with High-D...
Repurposing Classification & Regression Trees for Causal Research with High-D...Repurposing Classification & Regression Trees for Causal Research with High-D...
Repurposing Classification & Regression Trees for Causal Research with High-D...
 
To Explain, To Predict, or To Describe?
To Explain, To Predict, or To Describe?To Explain, To Predict, or To Describe?
To Explain, To Predict, or To Describe?
 
Reinventing the Data Analytics Classroom
Reinventing the Data Analytics ClassroomReinventing the Data Analytics Classroom
Reinventing the Data Analytics Classroom
 
Repurposing predictive tools for causal research
Repurposing predictive tools for causal researchRepurposing predictive tools for causal research
Repurposing predictive tools for causal research
 
Research Using Behavioral Big Data: A Tour and Why Mechanical Engineers Should Care

  • 1. Research Using Behavioral Big Data A Tour and Why Mechanical Engineers Should Care Galit Shmueli
  • 2. I’m not a mechanical engineer PhD in Statistics, Technion IE&M CMU Statistics Dept U of Maryland Business School Indian School of Business National Tsing (“Ching”) Hua U, Inst. Service Science Prof. Menachem Shmueli (of blessed memory), 1935-1980, Mechanical Engineering, Technion
  • 3. Research in Data Analytics ‘Entrepreneurial’ statistical & data mining modeling (for today’s problems) Interdisciplinary Research Statistical Strategy To Explain or To Predict? Information Quality Data Mining and Causality
  • 4. What is Behavioral Big Data (BBD) Special type of Big Data Behavioral: people’s actions, interactions, self-reported opinions, thoughts, feelings Human and social aspects: Intentions, deception, emotion, reciprocation, herding,… When aware of data collection -> modify behavior (legal risks, embarrassment, unwanted solicitation)
  • 5. BBD vs. Medical Big Data • Physical measurements • Data collection timing often set by medical system • Clinical trials: awareness & vested interest • People’s daily actions, interactions, self-reported feelings, opinions, thoughts • Data generation timing often chosen by user • Experiments: users often unaware; goal not always in user’s interest
  • 6. BBD on Citizens and Customers – old story Governments law enforcement, security, traffic (cameras, sensors) Financial Institutions fraud, loans (IT systems, cameras) Telecoms fraud, infrastructure, marketing (IT systems, mobile) Retail Chains marketing, operations, merchandising (POS systems, video, social, mobile) Insurance Usage-based premiums (telematics) “Old”: • Cameras • Sensors • IT systems (POS, calls,…) New: • GPS • Internet • Mobile • Social • Things
  • 7. BBD on Employees Service Providers quality control, employee performance Electronic Performance Monitoring (EPM) systems, web surfing, e-mails sent and received, telephone use, video, location (taxis)
  • 8. BBD on Citizens, Customers, Employees: Internet! • BBD now also available to small companies & organizations • Online platforms have BBD (e-commerce, gaming, search, social networks…) • Voluntarily entered by users (UGC): personal details, photos, comments, messages, search terms, bids in auctions, likes, payment information, connections with “friends” • Passive footprints: duration on the website, pages browsed, sequence, referring website, Internet browser, operating system, location, IP address. • BBD now available to individuals: Quantified Self
  • 9. 1. Research Opportunity 2. Understand 3. Collaborate How does your ME work relate to BBD? To Data Analytics & Social Sci? Engineering Social Sciences Data Analytics Behavioral Big Data
  • 14. From theory to practice
  • 15. Two important points: More and more human and social activities are moving online. Most companies that have BBD were not created for the purpose of generating BBD.
  • 16. Why should mechanical engineers care about BBD? Technology is advancing in two directions: fully automated (algorithmic) solutions, and micro-level recording of human and social behavior. Because you are (and should be) involved in designing both!
  • 17. 1. Research Opportunity 2. Understand 3. Collaborate How does your ME work relate to BBD? To Data Analytics & Social Sci? Engineering Social Sciences Data Analytics Behavioral Big Data
  • 19. the most crucial choices about the future of ordinary voters and their children are probably made not by Brussels bureaucrats or Washington lobbyists but by engineers, entrepreneurs, and scientists who are hardly aware of the implications of their decisions, and who certainly don’t represent anyone.
  • 20. Brief Tour of BBD Research in the Land of Social Science & Business
  • 21. Research using BBD Duncan Watts, Microsoft Research (NY): 1. Social science problems are almost always more difficult than they seem 2. The data required to address many problems of interest to social scientists remain difficult to assemble 3. Thorough exploration of complex social problems often requires the complementary application of multiple research traditions
  • 22. Academic Research Qs using BBD Causal questions about human and social behavior examine new phenomena re-examine old phenomena with better data
  • 23. Research Methodologies Using BBD Quasi experiments Randomized experiments Observational studies Survey studies Natural experiments
  • 24. Research Communities Researchers with social science + technical backgrounds Information Systems Marketing Computational Social Science
  • 25. 7 Examples of BBD Studies in Top Journals
  • 26. Emotional Contagion in Social Networks (Kramer et al., Proceedings of the National Academy of Sciences, 2014) • Can emotional states be transferred to others via emotional contagion? • Old question, new data • Large-scale experiment run by Facebook, manipulating users’ exposure level to emotional expressions in their Facebook News Feed Anonymous Browsing in Dating Websites (Bapna et al., Management Science, 2016) • How does anonymous browsing affect outcomes on dating sites? • New questions about human behavior due to new technologies • Large-scale experiment on a North American dating website Identifying Influential and Susceptible Members of Social Networks (Aral and Walker, Science, 2012) • How do individuals’ attributes modulate peer influence? • Old question in new context • Experiment on a social news aggregation website where users contribute news articles, discuss them, and rate comments
  • 27. Consumption in Virtual Worlds (Hinz et al., Info Sys Research, 2015) • Does conspicuous consumption increase social status? • Age-old sociology question with new BBD • Observational BBD from 2 virtual world websites (gaming with social network) Impact of Online Intermediaries on HIV Transmission (Ghose & Chan, MIS Quarterly, 2015) • Does entry of a major online personals ad website increase HIV prevalence? • New context • Natural experiment on Craigslist Impact of Info Hiding on Crowdfunding (Burtch et al., Management Science, 2016) • Does peer influence drive information hiding in crowdfunding campaigns, and what is the effect on contributions? • New online social context • Observational BBD from a large online crowdfunding platform
  • 28. Forecasting Elections with Non-Representative Polls (Wang et al., International Journal of Forecasting, 2014) • Can elections be forecast using a non-representative sample? • Old question, new data • Survey BBD from Xbox with built-in daily poll
  • 29. ONE WAY MIRRORS IN ONLINE DATING A Randomized Field Experiment Ravi Bapna, University of Minnesota Jui Ramaprasad, McGill University Galit Shmueli, National Tsing Hua University Akhmed Umyarov, University of Minnesota
  • 30. Online Dating: …% of the single population in the US uses online dating to find a partner (Gelles 2011)
  • 34. Research Question (in simple words) How does anonymous browsing affect user behavior? … and matching?
  • 35. Formal Research Question What is the relative causal effect of social inhibitions on search preferences vs. social inhibitions of contact initiation in dating markets? Given known gender asymmetries, how does this effect differ for men vs. women?
  • 36. Randomized Field Experiment on Large Online Dating Website 50,000 users receive gift of anonymous browsing
  • 37. Results Users treated with anonymity become disinhibited: they view more profiles and view more same-sex and interracial mates. They get fewer matches: they lose the ability to leave a weak signal, which is especially harmful for women!
  • 38. Role of anonymity and importance of WEAK SIGNAL in online platforms
  • 39. BBD-based Research: Academia vs. Industry In Academia Purpose: scientific inquiry Causal Qs are most popular • Determinants of social phenomena • Impact studies Predictive Qs (quite rare) In Industry Purpose: evaluate or improve products, service, operations, etc. Mostly predictive, but also causal • Netflix Prize: recommender system • Yahoo!, LinkedIn, FB: personalized news content to increase user engagement/clicks • Target: pregnancy prediction • Amazon: pricing, logistics,... • Government: campaign targeting
  • 40. Getting BBD for Research 1. Open Data, Publicly Available Data Data.gov Twitter Kaggle (UCI MR) API and web scraping 2. Partnering with a Company • Both parties interested in research question • Data purchase • Personal connections • Partnership between school and organization (CMU Living Analytics Research Lab)
  • 41. 3. Crowdsourcing: Amazon Mechanical Turk (AMT) Replacing student subjects • Experiment subjects • Survey respondents • Cleaning and tagging data “easy access to a large, stable, and diverse subject pool, the low cost of doing experiments, and faster iteration between developing theory and executing experiments” [Mason and Suri, 2012]
  • 42. Using BBD for Research: Human Subjects Institutional Review Board (IRB) “ethics committee” University-level committee designated to approve, monitor, and review biomedical and behavioral research involving humans. • performs benefit-risk analysis for proposed study • guidelines: Beneficence, Justice, and Respect for persons
  • 43. Ethics: Beyond IRB Facebook experiment [Kramer et al. 2014]: IRB Exemption “[The work] was consistent with Facebook’s Data Use Policy, to which all users agree prior to creating an account on Facebook, constituting informed consent for this research.” • Expression of Concern by PNAS editor • Varied response from public, academia, press, ethicists, corporates [Adar 2015] • HHS proposes new IRB exemption criteria for publicly available data (or even buying it) • Council for Big Data, Ethics & Society’s letter: “these criteria for exclusion focus on the status of the dataset… not the content of the dataset nor what will be done with the dataset, which are more accurate criteria for determining the risk profile of the proposed research”
  • 44. Big Behavioral Field Experiments: 5 Challenges
  • 45. Big Behavioral Field Experiments: Challenges 1. Fast-Changing Environment Users keep evolving Technology changes fast (Netflix) Parallel experiments run every day (Amazon) 2. Multidimensional Behavior, Context, Objectives Comp. advertising & content recommendation: 3M’s [Agarwal & Chen 2016] • Multi-response (clicks, shares, likes,…) • Multi-context (mobile, email,...) • Multiple objectives (engagement, revenue,...)
  • 46. BB Field Experiments: More Challenges 3. Knowledge of Allocation; Gift Effect (≈ clinical trials) • Allocation knowledge can affect outcome • Blinding? Placebo? • Online users discover their allocation via online forums • “Gift” or preferential treatment can affect outcome 4. Spillover Effects Treatment can affect control group (social networks) How to randomly assign on a social network? Dependence among units (data analysis) [Fienberg, 2015]
  • 47. BB Field Experiments: Even More Challenges 5. Ethical and Moral Issues Ease of running a large-scale experiment quickly and at low cost -> danger of harming many people quickly. Small-scale pilot study? Experiment platforms: fair treatment & payment
  • 49. Big Behavioral Quasi-Experiments & Observational Studies: 5 Methodological Issues
  • 50. Quasi-Experiments and Observational BBD: Methodological Challenges 1. Data Size & Dimension Scaling of statistical inference: p-values, multiple testing “Too Big to Fail: Large Samples and the p-Value Problem” (Lin, Lucas & Shmueli ISR 2013) Data Dredging Can detect lots of tiny & complex effects Role of theory vs data discovery Role of Prediction “Predictive Analytics in Information Systems Research” (Shmueli & Koppius MISQ 2011)
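The large-sample p-value issue named on slide 50 can be illustrated with a short sketch. It assumes a one-sample, two-sided z-test with known unit variance and a hypothetical, practically negligible observed effect of 0.01 standard deviations; none of these numbers come from the talk or from the cited paper. Holding the effect fixed, the p-value depends only on the sample size.

```python
import math

def two_sided_p_value(effect_in_sd: float, n: int) -> float:
    """Two-sided z-test p-value for an observed mean that lies
    `effect_in_sd` standard deviations from the null value (known variance)."""
    z = abs(effect_in_sd) * math.sqrt(n)
    # 2 * (1 - Phi(z)) equals erfc(z / sqrt(2)) for the standard normal CDF Phi
    return math.erfc(z / math.sqrt(2))

EFFECT = 0.01  # hypothetical effect size, chosen to be practically negligible

# Same observed effect, increasingly large samples
p_values = {n: two_sided_p_value(EFFECT, n) for n in (100, 10_000, 1_000_000)}

for n, p in p_values.items():
    print(f"n={n:>9,}  p={p:.3g}")
```

With a million observations the same tiny effect is overwhelmingly "significant", which is why effect sizes and practical relevance, rather than p-values alone, need to carry the argument at BBD scale.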
  • 51. 2. Self-Selection Bias Users choose treatment/control group Scaling of stat/econ methods to big data “A Tree-Based Approach for Addressing Self-selection in Impact Studies with Big Data” (Yahav, Shmueli & Mani, MIS Quarterly 2016)
  • 52. More challenges (in search of causal explanations) 3. Simpson’s Paradox Causal direction reverses when data are disaggregated Big data: lots of possible breakdowns “The Forest or the Trees? Tackling Simpson’s Paradox with Classification Trees” (Shmueli & Yahav, 2016) Does a dataset display a paradox?
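The question on slide 52, "Does a dataset display a paradox?", can be made concrete with a minimal sketch. The counts below are illustrative only (patterned on a commonly used textbook example, not data from the talk), and the check is a plain comparison of success rates, not the classification-tree approach of the cited paper. The names `counts`, `rate`, and `aggregate` are hypothetical helpers.

```python
# (subgroup, treatment) -> (successes, trials); illustrative counts only
counts = {
    ("small", "A"): (81, 87),
    ("small", "B"): (234, 270),
    ("large", "A"): (192, 263),
    ("large", "B"): (55, 80),
}

def rate(successes: int, trials: int) -> float:
    """Success rate for one cell."""
    return successes / trials

def aggregate(treatment: str) -> tuple[int, int]:
    """Pool successes and trials for one treatment across all subgroups."""
    cells = [v for (_, t), v in counts.items() if t == treatment]
    return sum(s for s, _ in cells), sum(n for _, n in cells)

# Within each subgroup: does treatment A have the higher success rate?
within = {
    g: rate(*counts[(g, "A")]) > rate(*counts[(g, "B")])
    for g in ("small", "large")
}

# After pooling the subgroups: does A still have the higher rate?
overall_a_better = rate(*aggregate("A")) > rate(*aggregate("B"))

# Paradox: A wins in every subgroup but loses in the pooled data
paradox = all(within.values()) and not overall_a_better
print(within, overall_a_better, paradox)
```

With many candidate breakdown variables, as is typical for BBD, this comparison would have to be repeated for every possible disaggregation, which is the scaling problem the slide points to.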
  • 53. And finally… 5. Data Contaminated by Experiments + some of the randomized experiments issues (fast-changing environment, etc.)
  • 54. Using Observational Data: Ethical & Moral Issues 1. Web data collection by researchers 2. Data protection, data sharing, and reproducible research (Privacy - Netflix) 3. Data tagging by AMT – fair payment (+quality issues)
  • 55. Large Scale Surveys Online surveys: cheap, easy, fast Large pool of available “workers” Supplement experimental/observational studies Data quality issues at large scale • duplicate responses • insincere responses The promise of paradata: data on how the survey was accessed/answered (OECD Survey of Adult Skills) • time stamps of opening invitation email, survey access,… • duration for answering each question
  • 56. The real gorilla in large scale surveys: Generalization Sampling and non-sampling errors “The central issue is whether conditional effects in the sample… may be transported to desired target populations. Success depends on compatibility of causal structures in study and target populations, and will require subject matter considerations in each concrete case.” - Keiding and Louis, JRSS 2016 Statistical generalization & scientific generalization Who do the Turkers represent? Information Quality: The Potential of Data & Analytics to Generate Knowledge, Kenett & Shmueli, Wiley 2016 “Clarifying the terminology that describes scientific reproducibility” (Kenett & Shmueli, Nature Methods 2015)
  • 57. Summary BBD = lots of behavioral data: Who has it? How is it analyzed? For what purpose? Technical Challenges: data access, analysis scalability, quick-changing environment. Methodological Challenges: selection bias, generalization, data contaminated by other experiments, spillover effects, lack of methodical lifecycle. Legal, Ethical, Moral Challenges: privacy violation (Netflix; networks), risks to human subjects, company vs. researcher objectives, gains of company at expense of individuals, communities, societies, & science
  • 58. Why should mechanical engineers care about BBD? Technology is advancing in two directions Fully automated (algorithmic) solutions Micro-level recording of human and social behavior
  • 59. Going Forward… Convergence of Social Sciences & Engineering Things now collect BBD (intentionally or not)
  • 60. 1. Research Opportunity 2. Understand 3. Collaborate How does your ME work relate to BBD? To Data Analytics & Social Sci? Engineering Social Sciences Data Analytics Behavioral Big Data
  • 61. Galit Shmueli 徐茉莉 Institute of Service Science