Keynote address by Galit Shmueli at 2016 Israeli Conference on Mechanical Engineering (ICME), Technion, Israel (Nov 23, 2016). http://icme2016.net.technion.ac.il/
2. I’m not a mechanical engineer
PhD in Statistics, Technion IE&M
CMU Statistics Dept
U of Maryland Business School
Indian School of Business
National Tsing (“Ching”) Hua U, Inst. Service Science
Prof. Menachem Shmueli (z"l)
1935-1980
Mechanical Engineering, Technion
3. Research in Data Analytics
‘Entrepreneurial’ statistical &
data mining modeling
(for today’s problems)
Interdisciplinary Research
Statistical Strategy
To Explain or To Predict?
Information Quality
Data Mining and Causality
4. What is Behavioral Big Data (BBD)
Special type of Big Data
Behavioral: people’s actions, interactions,
self-reported opinions, thoughts, feelings
Human and social aspects: Intentions, deception,
emotion, reciprocation, herding,…
When aware of data collection -> modify behavior
(legal risks, embarrassment, unwanted solicitation)
5. BBD vs. Medical Big Data
Medical Big Data
• Physical measurements
• Data collection timing often set by medical system
• Clinical trials: awareness & vested interest
BBD
• People’s daily actions, interactions, self-reported feelings, opinions, thoughts
• Data generation timing often chosen by user
• Experiments: users often unaware; goal not always in user’s interest
6. BBD on Citizens and Customers – old story
Governments: law enforcement, security, traffic (cameras, sensors)
Financial Institutions: fraud, loans (IT systems, cameras)
Telecoms: fraud, infrastructure, marketing (IT systems, mobile)
Retail Chains: marketing, operations, merchandising (POS systems, video, social, mobile)
Insurance: usage-based premiums (telematics)
“Old”:
• Cameras
• Sensors
• IT systems (POS, calls,…)
New:
• GPS
• Internet
• Mobile
• Social
• Things
7. BBD on Employees
Service Providers: quality control, employee performance
Electronic Performance Monitoring (EPM) systems, web surfing, e-mails sent and received, telephone use, video, location (taxis)
8. BBD on Citizens, Customers, Employees: Internet!
• BBD now also available to small companies & organizations
• Online platforms have BBD (e-commerce, gaming, search,
social networks…)
• Voluntarily entered by users (UGC): personal details, photos,
comments, messages, search terms, bids in auctions, likes,
payment information, connections with “friends”
• Passive footprints: duration on the website, pages browsed,
sequence, referring website, Internet browser, operating
system, location, IP address.
• BBD now available to individuals: Quantified Self
9. 1. Research Opportunity
2. Understand
3. Collaborate
How does your ME work relate to BBD? To Data Analytics & Social Sci?
(Diagram labels: Engineering, Social Sciences, Data Analytics, Behavioral Big Data)
15. Two important points
• More and more human and social activities are moving online
• Most companies that have BBD were not created for the purpose of generating BBD
16. Why should mechanical engineers care about BBD?
Technology is advancing in two directions:
• Fully automated (algorithmic) solutions
• Micro-level recording of human and social behavior
Because you are (and should be) involved in designing both!
19. the most crucial choices about the future of ordinary voters and their children are
probably made not by Brussels bureaucrats or Washington lobbyists but by
engineers, entrepreneurs, and scientists who are hardly aware of the
implications of their decisions, and who certainly don’t represent anyone.
20. Brief Tour of BBD Research
in the Land of Social Science & Business
21. Research using BBD
Duncan Watts, Microsoft Research (NY):
1. Social science problems are almost always more
difficult than they seem
2. The data required to address many problems of
interest to social scientists remain difficult to assemble
3. Thorough exploration of complex social problems
often requires the complementary application of
multiple research traditions
22. Academic Research Qs using BBD
Causal questions about human and social behavior:
• examine new phenomena
• re-examine old phenomena with better data
23. Research Methodologies Using BBD
• Quasi-experiments
• Randomized experiments
• Observational studies
• Survey studies
• Natural experiments
26. Emotional Contagion in Social Networks
(Kramer et al., Proc. of the National Academy of Sciences, 2014)
• Can emotional states be transferred to others via emotional contagion?
• Old question, new data
• Large-scale experiment run by FB, manipulating users’ exposure level to emotional expressions in their Facebook News Feed
Anonymous Browsing in Dating Websites
(Bapna et al., Management Science, 2016)
• How does anonymous browsing affect outcomes on dating sites?
• New questions about human behavior due to new technologies
• Large-scale experiment on a North American dating website
Identifying Influential and Susceptible Members of Social Networks
(Aral and Walker, Science, 2012)
• How do individuals’ attributes modulate peer influence?
• Old question in new context
• Experiment on a social news aggregation website where users contribute news articles, discuss them, and rate comments
27. Consumption in Virtual Worlds
(Hinz et al., Info Sys Research, 2015)
• Does conspicuous consumption increase social status?
• Age-old sociology question with new BBD
• Observational BBD from 2 virtual world websites (gaming with social network)
Impact of Online Intermediaries on HIV Transmission
(Ghose & Chan, MIS Quarterly, 2015)
• Does entry of a major online personals ad website increase HIV prevalence?
• New context
• Natural experiment on Craigslist
Impact of Info Hiding on Crowdfunding
(Burtch et al., Management Science, 2016)
• Does peer influence drive information hiding in crowdfunding campaigns, and how does hiding affect contributions?
• New online social context
• Observational BBD from a large online crowdfunding platform
28. Forecasting Elections with Non-Representative Polls
(Wang et al., Intl. Journal of Forecasting, 2014)
• Can elections be forecast using a non-representative sample?
• Old question, new data
• Survey BBD from Xbox with built-in daily poll
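A non-representative sample can sometimes still be used if respondents are reweighted to match the target population. The sketch below shows plain post-stratification on a single demographic variable. It is an illustrative simplification, not the method or data of the cited study; the cell names, sample sizes, support rates, and population shares are all hypothetical.

```python
# Hypothetical poll: respondents per age cell and share supporting candidate A.
sample = {
    "18-29": {"n": 700, "support": 0.40},
    "30-64": {"n": 250, "support": 0.50},
    "65+":   {"n": 50,  "support": 0.60},
}
# Hypothetical share of each cell in the target population (sums to 1).
population_share = {"18-29": 0.20, "30-64": 0.55, "65+": 0.25}


def raw_estimate(sample):
    """Unweighted support: every respondent counts equally."""
    total = sum(cell["n"] for cell in sample.values())
    return sum(cell["n"] * cell["support"] for cell in sample.values()) / total


def poststratified_estimate(sample, population_share):
    """Weight each cell's support by its population share, not its sample share."""
    return sum(population_share[name] * cell["support"]
               for name, cell in sample.items())


print(f"raw estimate:             {raw_estimate(sample):.3f}")
print(f"post-stratified estimate: {poststratified_estimate(sample, population_share):.3f}")
```

Post-stratification corrects only for the variables used to define the cells; any imbalance on other characteristics remains.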
29. ONE WAY MIRRORS IN ONLINE DATING
A Randomized Field Experiment
Ravi Bapna, University of Minnesota
Jui Ramaprasad, McGill University
Galit Shmueli, National Tsing Hua University
Akhmed Umyarov, University of Minnesota
30. Online Dating
…% of the single population in the US uses online dating to find a partner (Gelles 2011)
34. Research Question (in simple words)
How does anonymous browsing affect user behavior?
… and matching?
35. Formal Research Question
• What is the relative causal effect of social inhibitions on search preferences vs. social inhibitions of contact initiation in dating markets?
• Given known gender asymmetries, how does this effect differ for men vs. women?
37. Results
Users treated with anonymity
• become disinhibited: view more profiles, view more same-sex and interracial mates
• get fewer matches: lose ability to leave a weak signal
• loss of the weak signal is especially harmful for women!
39. BBD-based Research: Academia vs. Industry
In Academia
Purpose: scientific inquiry
Causal Qs are most popular
• Determinants of social phenomena
• Impact studies
Predictive Qs (quite rare)
In Industry
Purpose: evaluate or improve products, service, operations, etc.
Mostly predictive, but also causal
• Netflix Prize: recommender system
• Yahoo!, LinkedIn, FB: personalized news content to increase user engagement/clicks
• Target: pregnancy prediction
• Amazon: pricing, logistics,...
• Government: campaign targeting
40. Getting BBD for Research
1. Open Data, Publicly Available Data
Data.gov
Twitter
Kaggle (UCI MR)
API and web scraping
2. Partnering with a Company
• Both parties interested in research question
• Data purchase
• Personal connections
• Partnership between school and organization
(CMU Living Analytics Research Lab)
41. 3. Crowdsourcing
AMT (Amazon Mechanical Turk): replacing student subjects
• Experiment subjects
• Survey respondents
• Cleaning and tagging data
“easy access to a large, stable, and diverse subject pool, the low cost of doing experiments, and faster iteration between developing theory and executing experiments” [Mason and Suri, 2012]
42. Using BBD for Research: Human Subjects
Institutional Review Board (IRB), “ethics committee”
University-level committee designated to approve, monitor, and review biomedical and behavioral research involving humans.
• performs benefit-risk analysis for proposed study
• guidelines: Beneficence, Justice, and Respect for persons
43. Ethics: Beyond IRB
• HHS proposes new IRB exemption criteria for publicly available data (or even purchased data)
• Council for Big Data, Ethics & Society’s letter: “these criteria for exclusion focus on the status of the dataset… not the content of the dataset nor what will be done with the dataset, which are more accurate criteria for determining the risk profile of the proposed research”
Facebook experiment [Kramer et al. 2014]: IRB Exemption
“[The work] was consistent with Facebook’s Data Use Policy, to which all users agree prior to creating an account on Facebook, constituting informed consent for this research.”
• Expression of Concern by PNAS editor
• Varied response from public, academia, press, ethicists, corporates [Adar 2015]
45. Big Behavioral Field Experiments: Challenges
1. Fast-Changing Environment
Users keep evolving
Technology changes fast (Netflix)
Parallel experiments run every day (Amazon)
2. Multidimensional Behavior, Context, Objectives
Computational advertising & content recommendation: 3M’s [Agarwal & Chen 2016]
• Multi-response (clicks, shares, likes,…)
• Multi-context (mobile, email,...)
• Multiple objectives (engagement, revenue,...)
46. BB Field Experiments: More Challenges
3. Knowledge of Allocation; Gift Effect (≈ clinical trials)
• Allocation knowledge can affect outcome
• Blinding? placebo?
• Online users discover their allocation via online forums
• “Gift” or preferential treatment can affect outcome
4. Spillover Effects
• Treatment can affect control group (social networks)
• How to randomly assign on a social network?
• Dependence among units (data analysis) [Fienberg, 2015]
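One possible answer to the random-assignment question on this slide is to randomize whole clusters of connected users rather than individuals, so that friends always share an arm and treatment cannot leak across an edge into the control group. The sketch below assumes that clusters are simply the connected components of a small hypothetical friendship graph. Real networks are rarely this cleanly separated, and the function names are illustrative.

```python
import random
from collections import deque


def connected_components(nodes, edges):
    """Group nodes into connected components with a breadth-first search."""
    neighbors = {node: set() for node in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components


def assign_by_cluster(nodes, edges, seed=0):
    """Randomize at the cluster level: all members of a component share one arm."""
    rng = random.Random(seed)
    assignment = {}
    for component in connected_components(nodes, edges):
        arm = rng.choice(["treatment", "control"])
        for node in component:
            assignment[node] = arm
    return assignment


# Hypothetical toy network: three separate groups of friends.
nodes = ["a", "b", "c", "d", "e", "f", "g"]
edges = [("a", "b"), ("b", "c"), ("d", "e"), ("f", "g")]
assignment = assign_by_cluster(nodes, edges)
print(assignment)
```

The price of this design is fewer independent units (clusters instead of users), which ties back to the dependence-among-units point on the slide.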
47. BB Field Experiments: Even More Challenges
5. Ethical and Moral Issues
Ease of running a large-scale experiment quickly and at low cost -> danger of harming many people quickly
Small-scale pilot study?
Experiment platforms: fair treatment & payment
50. Quasi-Experiments and Observational BBD:
Methodological Challenges
1. Data Size & Dimension
Scaling of statistical inference: p-values, multiple testing
“Too Big to Fail: Large Samples and the p-Value Problem” (Lin, Lucas & Shmueli ISR 2013)
Data Dredging
Can detect lots of tiny & complex effects
Role of theory vs data discovery
Role of Prediction
“Predictive Analytics in Information Systems Research” (Shmueli & Koppius MISQ 2011)
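The large-sample p-value problem named on this slide can be illustrated with a short sketch. It assumes a two-sample z-test with known unit standard deviation and a fixed, practically negligible mean difference of 0.01 standard deviations. The function name and all numbers are illustrative and are not taken from the cited paper.

```python
import math


def two_sample_z_pvalue(diff, sd, n_per_group):
    """Two-sided p-value for a difference in means (z-test, equal group sizes)."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = diff / standard_error
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical effect of 0.01 standard deviations, held fixed while n grows.
for n in (100, 10_000, 1_000_000):
    p = two_sample_z_pvalue(diff=0.01, sd=1.0, n_per_group=n)
    print(f"n per group = {n:>9,}: p-value = {p:.4g}")
```

The effect size never changes, yet the p-value shrinks toward zero as the sample grows, so statistical significance alone says little about practical importance at BBD scale.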
51. 2. Self-Selection Bias
Users choose treatment/control group
Scaling of stat/econ methods to big data
“A Tree-Based Approach for Addressing Self-selection in Impact Studies with Big Data” (Yahav, Shmueli & Mani, MIS Quarterly 2016)
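Why self-selection matters can be shown with a small simulation. The sketch below is not the tree-based approach of the cited paper; it only illustrates the bias itself. It assumes a hypothetical latent trait ("engagement") that raises both the chance of opting into treatment and the outcome, and compares the naive treated-minus-control difference under self-selection with the same difference under random assignment.

```python
import random
from statistics import mean


def simulate(n, true_effect, self_selected, seed=0):
    """Return the naive difference in mean outcome between treated and control users."""
    rng = random.Random(seed)
    treated_outcomes, control_outcomes = [], []
    for _ in range(n):
        engagement = rng.gauss(0.0, 1.0)  # latent trait, unobserved by the analyst
        if self_selected:
            # More engaged users are more likely to opt into the treatment.
            treated = engagement + rng.gauss(0.0, 1.0) > 0.0
        else:
            # Randomized assignment ignores engagement.
            treated = rng.random() < 0.5
        outcome = 2.0 * engagement + true_effect * treated + rng.gauss(0.0, 1.0)
        (treated_outcomes if treated else control_outcomes).append(outcome)
    return mean(treated_outcomes) - mean(control_outcomes)


naive = simulate(n=20_000, true_effect=1.0, self_selected=True)
randomized = simulate(n=20_000, true_effect=1.0, self_selected=False)
print(f"true effect: 1.0, self-selected: {naive:.2f}, randomized: {randomized:.2f}")
```

Under self-selection the naive difference absorbs the engagement gap between the two groups, while the randomized comparison stays close to the true effect.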
52. More challenges (in search of causal explanations)
3. Simpson’s Paradox
Causal direction reverses when data are disaggregated
Big data: lots of possible breakdowns
Does a dataset display a paradox?
“The Forest or the Trees? Tackling Simpson’s Paradox with Classification Trees” (Shmueli & Yahav, 2016)
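A reversal of this kind is easy to reproduce with a few counts. The sketch below checks one given breakdown for Simpson's paradox; it is not the classification-tree method of the cited paper, and the segments and counts are hypothetical.

```python
# Hypothetical counts: (conversions, users) per segment and group.
data = {
    "new users":       {"treated": (200, 1000), "control": (15, 100)},
    "returning users": {"treated": (50, 100),   "control": (400, 1000)},
}


def rate(successes, trials):
    return successes / trials


def segment_differences(data):
    """Treated-minus-control conversion rate within each segment."""
    return {seg: rate(*groups["treated"]) - rate(*groups["control"])
            for seg, groups in data.items()}


def aggregate_difference(data):
    """Treated-minus-control conversion rate after pooling all segments."""
    pooled = {}
    for group in ("treated", "control"):
        successes = sum(groups[group][0] for groups in data.values())
        trials = sum(groups[group][1] for groups in data.values())
        pooled[group] = rate(successes, trials)
    return pooled["treated"] - pooled["control"]


def displays_paradox(data):
    """True if every segment points one way and the pooled data the other."""
    diffs = list(segment_differences(data).values())
    pooled = aggregate_difference(data)
    return ((all(d > 0 for d in diffs) and pooled < 0)
            or (all(d < 0 for d in diffs) and pooled > 0))


print(segment_differences(data))
print(aggregate_difference(data))
print(displays_paradox(data))
```

Here the treated group does better within each segment but worse overall, because most treated users fall in the low-converting segment. With big data the number of candidate breakdowns is large, which is why a single hand-picked split is rarely enough.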
53. And finally…
5. Data Contaminated by Experiments
+ some of the randomized experiments issues
(fast-changing environment, etc.)
54. Using Observational Data: Ethical & Moral Issues
1. Web data collection by researchers
2. Data protection, data sharing, and reproducible research
(Privacy - Netflix)
3. Data tagging by AMT – fair payment (+quality issues)
55. Large Scale Surveys
Data quality issues at large scale
• duplicate responses
• insincere responses
Online surveys: cheap, easy, fast
Large pool of available “workers”
Supplement experimental/observational studies
The promise of paradata
Data on how the survey was accessed/answered
(OECD Survey of Adult Skills)
• time stamps of opening invitation email, survey access,…
• duration for answering each question
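Paradata such as answer durations can help with the data quality issues listed on this slide. The sketch below flags duplicate respondents and "speeders" whose average time per question falls under a threshold. The record layout, field names, and the 2-second threshold are hypothetical choices for illustration, not part of any survey platform's API.

```python
from collections import Counter

# Hypothetical survey records with per-question answer durations (paradata).
responses = [
    {"respondent": "w1", "answers": (4, 5, 3), "seconds_per_question": (12.0, 9.5, 14.0)},
    {"respondent": "w2", "answers": (3, 3, 3), "seconds_per_question": (0.8, 0.6, 0.9)},
    {"respondent": "w1", "answers": (4, 5, 3), "seconds_per_question": (11.0, 10.0, 13.0)},
    {"respondent": "w3", "answers": (2, 4, 5), "seconds_per_question": (20.0, 15.0, 18.0)},
]


def duplicate_respondents(responses):
    """Respondent IDs that submitted more than one response."""
    counts = Counter(r["respondent"] for r in responses)
    return sorted(rid for rid, count in counts.items() if count > 1)


def speeders(responses, min_seconds=2.0):
    """Respondent IDs whose average time per question is implausibly short."""
    flagged = set()
    for r in responses:
        times = r["seconds_per_question"]
        if sum(times) / len(times) < min_seconds:
            flagged.add(r["respondent"])
    return sorted(flagged)


print("duplicates:", duplicate_respondents(responses))
print("speeders:  ", speeders(responses))
```

A flag is only a prompt for review: a fast answer is a hint of an insincere response, not proof of one.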
56. The real gorilla in large scale surveys: Generalization
Sampling and non-sampling errors
“The central issue is whether conditional effects in the sample… may
be transported to desired target populations. Success depends on
compatibility of causal structures in study and target populations,
and will require subject matter considerations in each concrete
case.” - Keiding and Louis, JRSS 2016
Statistical generalization & scientific generalization
Who do the Turkers represent?
Information Quality: The Potential of Data & Analytics to Generate Knowledge, Kenett & Shmueli, Wiley 2016
“Clarifying the terminology that describes scientific reproducibility” (Kenett & Shmueli, Nature Methods 2015)
57. Summary
BBD = lots of behavioral data
• Who has it?
• How is it analyzed?
• For what purpose?
Technical Challenges
• Data access
• Analysis scalability
• Quick-changing environment
Methodological Challenges
• Selection bias
• Generalization
• Data contaminated by other experiments
• Spillover effects
• Lack of methodical lifecycle
Legal, Ethical, Moral Challenges
• Privacy violation (Netflix; networks)
• Risks to human subjects
• Company vs. researcher objectives
• Gains of company at expense of individuals, communities, societies, & science
58. Why should mechanical engineers care about BBD?
Technology is advancing in two directions:
• Fully automated (algorithmic) solutions
• Micro-level recording of human and social behavior