Al Nevarez
Senior Manager, Business Analytics
LinkedIn
Sally Sadosky
Group Manager, Market Research
LinkedIn
Market Research Meets Big Data Analytics
for Business Transformation
The Market Research Conference
Orlando, FL
Nov 2-4, 2015
Agenda
1. LinkedIn’s Business
2. Market Research & Customer Feedback at LinkedIn
3. Market Research & Big Data
4. Big Data: talent, tools & process at LinkedIn for MR
5. Low cost per answer with modern ETL (Extract, Transform, Load)
6. The value is in the JOIN
7. Reporting
8. Analysis: Traditional & Modern techniques
9. The Big Picture
LinkedIn’s Business
Create economic opportunity
for every member of the global
workforce
Vision
SCHOOLS COMPANIES KNOWLEDGE SKILLS MEMBERS JOBS
THE ECONOMIC GRAPH
Value Proposition: Connect to Opportunity
B2C
Business to Consumer
B2B
Business to Business
Market Research & Analytics are key
to bridge the gap
With your professional
world
Through professional
news and knowledge
And build your career
Connect Stay Informed Get Hired
For our members
Power the majority
of the world’s hires
Identify & engage
professionals with
relevant content
Social selling.
Transform cold
calls into warm
prospects
Hire Market Sell
Share content,
find, contact, and
learn more about
people at your
company
@Work
For our clients
At LinkedIn, we believe in:
1. Delivers on a singular value proposition in a world class way
2. Simple, intuitive and anticipates needs
3. Exceed expectations
4. Emotionally resonate
5. Change the user’s life for the better
Opportunity
Identification and
Exploration
Idea Generation
Concept Definition
Product Definition
User Experience
and Usability
Go To Market
Product Launch
Post Launch
Tracking and
Evaluation
Member
Empathy
Research and Analytics
NPS as a Measure of Loyalty
Post Launch
Tracking and
Evaluation
Member
Empathy
How likely are you to
recommend LinkedIn to a
friend or a colleague?
NPS
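For reference, the usual Net Promoter Score arithmetic can be sketched in Python. The 0-10 scale, with promoters at 9-10 and detractors at 0-6, is the conventional NPS definition rather than something stated on the slide; the function name and sample ratings are hypothetical.

```python
def net_promoter_score(ratings):
    """Return NPS (-100..100) from 0-10 'likelihood to recommend' ratings."""
    if not ratings:
        raise ValueError("need at least one rating")
    promoters = sum(1 for r in ratings if r >= 9)   # ratings of 9 or 10
    detractors = sum(1 for r in ratings if r <= 6)  # ratings of 0 through 6
    # Passives (7-8) count only in the denominator.
    return 100.0 * (promoters - detractors) / len(ratings)


sample = [10, 9, 8, 7, 6, 3, 10, 9]  # hypothetical responses
print(net_promoter_score(sample))
```

The score is a difference of two percentages, so it ranges from -100 (all detractors) to +100 (all promoters).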
Area of Focus
Known to Self
Unknown to Others
Open
Hidden
Known to LinkedIn Unknown to LinkedIn
Known to Members
Unknown to Members
Discovery
Unknown
NPS captures both Heart and Mind
• 2000 completes per month per country
• Daily email sends
• Representative sample: # of visits per 90 days
• Members are kept anonymous
• Mobile ready
• In local language
• Results weighted by country
LinkedIn’s NPS and CSAT program
19
Top 9
Countries
Questionnaire Design
• Set a competitive context
• social networking, jobs sites, content
• NPS for each selected site
• Open-end about NPS rating
• CSAT product questions for LinkedIn
• Emotional driver questions for LinkedIn
• Open-end on what LinkedIn can do better
• Key demographics
• Re-contact permission ask
• Behavioral data appends (pre-pop)
Market Research & Big Data
364 mil 97 mil 34 bil
Market
Research
Big Data
Analysis Teams
Research Analysis Teams at LinkedIn
1. Market research analysts
2. Business Analytics Data Scientists (Al)
Talent
Solutions
Marketing
Solutions
100 team members support 9000+ employees
Sales
Solutions
Premium
Subscriptions
Consumer
Marketing
Business Analytics
Business Operations & Analytics
CFO
CEO
Where is Business Analytics
in LinkedIn’s organization?
Market
Research
Insights
What is the best
that could happen?
Intelligence
What will happen?
Information/Knowledge
Why did it happen?
Data
What happened?
Business ROI
Business analytics evolution: from data to transformation
Transformation & Change
Implement & monitor
Business models
Marketing, Sales, Recruiting
Targeting & Attribution
Customer experience
Communication/interpersonal skills
Statistics
Probability
Optimization
Modeling
Numerical analysis
Simulations
Analytics
A-B Testing
SQL, ETL, APIs,
relational database,
graph database,
software engineering,
tool building, web
applications, R, Python,
data visualization,
data mining,
Machine Learning
Hadoop,
Spark, Hive, Pig
The business analytics staff - Complete Data Scientists
Business
Knowledge
Outcome = Data products
which many staff can leverage
Big Data
Big Data Technical Themes
1. Efficient: Move the computation to the data
2. Shared foundation to build on with open source
3. Scalability (storage 1/10th the price of traditional)
4. Scalability (grow to multiple – thousands –
of processors with little cost)
5. Reliability (replicated data, failure survival)
6. Schema on read (save all data in raw form, NoSQL)
Components of Hadoop
3 areas
1. Data Storage HDFS: a network OS for the data, replication
2. MapReduce: Efficiently spreads the work
3. Hadoop libraries: Hive, HBase, Pig, ...
Big Data Query & Analysis Tools
Hadoop
Big Data Tools We Use Regularly at LinkedIn
Hadoop
Hive
Pig
Low cost storage
Unstructured data
Highly scalable processing
SQL-like query
Query Hadoop data
Massive result sets
Advanced processing
Advanced ETL
Data Flows
Map Reduce
Example: average a billion #s
Distribute to 1000 nodes > Get sum & count at each node >
Sum the sums and sum the counts > at end sum of sums / total counts
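The averaging example above can be sketched in plain Python, with list slices standing in for cluster nodes. This is a single-process illustration of the idea, not Hadoop code; the function names and the node count are hypothetical.

```python
from functools import reduce


def map_partial(chunk):
    """Map step: each node returns (sum, count) for its own chunk."""
    return (sum(chunk), len(chunk))


def reduce_partials(a, b):
    """Reduce step: add the sums and add the counts."""
    return (a[0] + b[0], a[1] + b[1])


def distributed_average(numbers, nodes=4):
    if not numbers:
        raise ValueError("need at least one number")
    # Deal the data out into one chunk per simulated node.
    chunks = [numbers[i::nodes] for i in range(nodes)]
    partials = [map_partial(chunk) for chunk in chunks]
    total, count = reduce(reduce_partials, partials, (0, 0))
    # Divide only once, at the very end.
    return total / count


print(distributed_average(list(range(1, 101)), nodes=10))
```

Each node reports a sum and a count rather than its own average because averaging the per-node averages gives the wrong answer whenever the chunks differ in size.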
Survey
Vendor
DATA
EXTRACTION
DATA
TRANSFORMATION
DATA
VISUALIZATION
Our NPS survey response ETL Process Overview
API
Big Data’s Value for LinkedIn
Low cost storage
+
Schema-less storage
+
Easy for Data
Warehouse team
= Lower cost per answer
Sampling from the Data Warehouse
Sampling Data Workflow for Survey Research
Members &
Clients use:
Flagship Desktop
Mobile Apps
Talent solutions
Marketing solutions
Sales solutions
Application
Data storage
(Engineering)
ETL to DWH
(Data Services)
400mil members
• Sign ins
• Profile edits
• Language setting
• Product registrations
• Searches
• Publishing
Profile summaries
Aggregated data
Usage & Engagement
levels (daily visits)
Member segments
Survey history
Survey pre-pop data
Sample for
non-survey studies
Sample for
survey studies
SQL processes
Automated, some manual
Global
Daily, monthly or quarterly
Sampling strategy adjustments
Survey pre-pop data
Snapshot tables
SQL
(Marketing Operations)
Survey vendor
Snapshot
Pass through or pre-pop
Some member data is anonymously passed (or obfuscated and
passed) to the survey vendor with the invitation list to support:
1. Survey branching
2. Survey quota management
3. Survey language
4. Light reporting on survey vendor’s reporting platform
Pass through or pre-pop
Field count: dozen or so
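One way to build such a pass-through list is sketched below. It assumes a keyed hash (HMAC-SHA256) for the obfuscation step, which is an illustrative choice and not necessarily what LinkedIn does; the key, field names, and sample member are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret; it stays internal and is never sent to the vendor.
SECRET_KEY = b"replace-with-a-real-secret"


def obfuscate_member_id(member_id: str) -> str:
    """Return a stable keyed hash so the vendor never sees the raw member id."""
    return hmac.new(SECRET_KEY, member_id.encode("utf-8"), hashlib.sha256).hexdigest()


def build_invitation_row(member: dict) -> dict:
    """Keep only the handful of pre-pop fields the survey needs."""
    return {
        "respondent_key": obfuscate_member_id(member["member_id"]),
        "language": member["language"],  # drives survey language
        "country": member["country"],    # drives quota management
        "segment": member["segment"],    # drives survey branching
    }


row = build_invitation_row(
    {"member_id": "abd123", "language": "en", "country": "US",
     "segment": "job_seeker", "email": "not passed through"}
)
print(sorted(row))
```

Because the hash is stable, returned survey responses can still be matched back to the internal member id by whoever holds the key.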
In addition to pre-pop data passed to the survey vendor,
internally we store “snapshot” values about each survey invitee.
1. Maintains a snapshot of the member’s full profile at the time
of survey
2. Private & internal to LinkedIn
3. Used for internal NPS (general BI) analysis & dashboards
4. Used for data mining & pattern discovery
5. Used by many departments to understand members/clients’
activity at time of survey
6. Slice and dice by anything that comes up
7. Key = member id
Snapshot Profile Data
Field count: Hundreds
ETL Process for Low Cost Per Answer
from your survey results
ETL Process Before Big Data
Survey Vendor Data
Survey program A
Survey program B
Survey program C
Survey program D
Survey program E
Multiple Relational
Database Tables
Survey Table A
Survey Table B
Survey Table C
Survey Table D
Survey Table E
What if Survey B adds 5 questions and drops 3 questions?
$ $ $ $
Schema A
Schema B
Schema C
Schema D
Schema E
ETL Process After Big Data
Survey Vendor
Survey program A
Survey program B
Survey program C
Survey program D
Survey program E
1 Simple relational
database table
… with just the data
we need for analysis and
dashboards
But ALL the data fully
available on Hadoop
for other studies
$
$
Schema
HDFS
Survey document
storage on HDFS
Record 1:
{
  "record" : 8695,
  "uuid" : "zzcxgtz2m0ahuzf2",
  "date" : 1434475680000,
  "start_date" : 1434475020000,
  "customer_id" : "abd123",
  "survey_fields" : {
    "Q1_NPS" : "10",
    "Q6_Driversr1" : "11",
    "Q6_Driversr2" : "7",
    "Q6_Driversr3" : "8",
    "Q7_Productsatr1" : "8",
    "Q7_Productsatr2" : "9",
    "Q7_Productsatr3" : "10",
    "wave" : 1,
    "country" : 1,
    "is_mobile" : 1,
    "mobileos" : 3,
    "verbatim1" : "Love Linkedin!",
    "status" : 3
  }
}
Schema
An example survey record (condensed)
Core key values are those that exist for
every survey record.
Under “survey_fields” we have the
survey-specific fields.
DWH team only stores this.
They may be very different between
survey programs, and may change
for a given survey program. DWH
team doesn’t care.
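A minimal Python sketch of reading such a record with schema on read follows. Treating the five top-level keys of the example record as the core keys is an assumption, and the function name and the requested question list are hypothetical.

```python
import json

# Assumed core keys: the top-level keys shown in the example record.
CORE_KEYS = ("record", "uuid", "date", "start_date", "customer_id")

raw = """
{"record": 8695, "uuid": "zzcxgtz2m0ahuzf2", "date": 1434475680000,
 "start_date": 1434475020000, "customer_id": "abd123",
 "survey_fields": {"Q1_NPS": "10", "Q6_Driversr1": "11", "wave": 1,
                   "verbatim1": "Love Linkedin!", "status": 3}}
"""


def read_survey_record(line, wanted_fields):
    """Apply a schema at read time: core keys plus only the fields asked for."""
    doc = json.loads(line)
    row = {key: doc[key] for key in CORE_KEYS}
    fields = doc.get("survey_fields", {})
    for name in wanted_fields:
        # A question that is absent becomes None, so added or dropped
        # survey questions do not break the reader.
        row[name] = fields.get(name)
    return row


row = read_survey_record(raw, ["Q1_NPS", "verbatim1", "Q99_new_question"])
print(row["Q1_NPS"], row["Q99_new_question"])
```

The stored document never changes shape for the reader's sake; each analysis decides at read time which survey-specific fields it needs.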
Example Pig script to read from HDFS
survey_raw = LOAD '/data/external/survey_vendor/survey_program1/
-- Keep only the records whose status field is 3
survey_step1 = FILTER survey_raw BY survey_fields#'status' == '3';
-- Cast the values we need and give each one a stable column name
survey_step2 = FOREACH survey_step1 GENERATE
(chararray) 'survey_program1' AS survey_program_id,
(chararray) uuid AS unique_response_id,
(chararray) customer_id AS member_id,
(int) survey_fields#'wave' AS wave_field,
(int) survey_fields#'Q1_NPS' AS nps_value,
(chararray) survey_fields#'verbatim1' AS reason,
(int) survey_fields#'Q6_Driversr1' AS driver1,
(int) survey_fields#'Q6_Driversr2' AS driver2,
(int) survey_fields#'Q6_Driversr3' AS driver3,
(int) survey_fields#'Q7_Productsatr1' AS product_csat1,
(int) survey_fields#'Q7_Productsatr2' AS product_csat2,
(int) survey_fields#'Q7_Productsatr3' AS product_csat3,
(int) survey_fields#'mobileos' AS mobileos;
-- Write tab-delimited output for the warehouse upload
STORE survey_step2 INTO 'survey_nps' USING PigStorage('\t');
Upload
To Teradata
Why is all this important? Because...
The Power is in the SQL JOIN
(and letting others join too)
select NPS_value, behavior1, behavior2
from nps_data a
inner join behavior1_data b
on a.customer_id = b.customer_id
inner join behavior2_data c
on a.customer_id = c.customer_id
NPS Data Behavior
1 Data
Behavior
2 Data
• What’s the NPS for each of our
member audience segments?
• What’s the NPS of members who
received our recent marketing
campaign and took action on it?
• What’s the NPS of software engineers
who have at least 5 skills, each with
more than 10 endorsements on their
profile?
Connect Stay Informed Get Hired
The JOIN allows us to answer questions in context of
business needs and customer experience
• What’s the satisfaction with our new
messaging tool for members who had
it enabled?
• What’s the NPS by region for
members who have purchased our
premium subscriptions?
• What’s the CRM record for B2B
customers who took our NPS survey?
• Which members scored highly on both
our member survey and our Talent
solutions survey?
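Questions like these reduce to a join followed by an aggregate. The sketch below uses Python's built-in sqlite3 module as a stand-in for the warehouse; the segment_data table, its rows, and the segment names are hypothetical, and the promoter (9-10) and detractor (0-6) cut-offs are the conventional NPS definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nps_data (customer_id TEXT, nps_value INTEGER);
    CREATE TABLE segment_data (customer_id TEXT, segment TEXT);
    INSERT INTO nps_data VALUES ('a', 10), ('b', 9), ('c', 3), ('d', 8), ('e', 6);
    INSERT INTO segment_data VALUES
        ('a', 'premium'), ('b', 'premium'), ('c', 'premium'),
        ('d', 'free'), ('e', 'free');
""")

# Join survey answers to a behavioral attribute, then compute NPS per group.
# In SQLite a comparison evaluates to 0 or 1, so SUM counts matching rows.
query = """
    SELECT s.segment,
           100.0 * (SUM(n.nps_value >= 9) - SUM(n.nps_value <= 6)) / COUNT(*) AS nps
    FROM nps_data n
    INNER JOIN segment_data s ON n.customer_id = s.customer_id
    GROUP BY s.segment
    ORDER BY s.segment
"""
nps_by_segment = dict(conn.execute(query).fetchall())
print(nps_by_segment)
```

Swapping segment_data for any other table keyed on the same customer id answers a different question without touching the survey data.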
Reports
Our NPS monitoring tool at LinkedIn
Analytics
Big Data Trends 2014
1. Uploadable, findable, shareable, real-time data
2. Sensor use rising rapidly
3. Processing costs falling rapidly, while cloud rises
4. Beautiful new user interfaces, aided by data-generating
consumers – helping make data usable/useful
5. Data mining / analytics tools improving & helping
find patterns
6. Early emergence of data/pattern driven problem
solving
Data Mining or
Machine Learning Outcomes
1. Rank or prioritize a customer or prospect list
2. Replace or move assets or resources
3. Classify or segment
4. Rank drivers of a key metric
5. Categorize text
6. Generate a lift for a key metric
Why not: NPS, Promoters, CSAT?
Data Mining Techniques
Commonly Used by the
Business Analytics Team on Market
Research & other Marketing data
• Decision Trees & Random Forest
• Generalized Boosted Models (GBM)
• Logistic Regression
• Stochastic Gradient Descent (SGD)
• Clustering
• Bayesian Networks
• Text Classification & Mining (LDA, NLP)
Quad Chart: Importance vs. Performance
Axes: Importance (Low to High), Performance (Low to High)
Quadrants: Invest & Improve, Monitor, Maintain & Leverage, Assess needs
Plotted items: Driver 1, Driver 2, Driver 3, Driver 4, Driver 5
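A quad chart of this kind amounts to two threshold tests per driver, sketched below in Python. The mapping of the four labels to quadrants is an assumption (the slide's layout did not survive extraction), and the cut-offs and driver scores are hypothetical.

```python
def quadrant(importance, performance, imp_cut=0.5, perf_cut=0.5):
    """Place a driver in one of four importance/performance quadrants.

    The label-to-quadrant mapping here is assumed, not taken from the slide.
    """
    high_imp = importance >= imp_cut
    high_perf = performance >= perf_cut
    if high_imp and not high_perf:
        return "Invest & Improve"      # matters a lot, scores poorly
    if high_imp and high_perf:
        return "Maintain & Leverage"   # matters a lot, scores well
    if high_perf:
        return "Assess needs"          # scores well, matters less
    return "Monitor"                   # low on both axes


# Hypothetical (importance, performance) scores on a 0-1 scale.
drivers = {
    "Driver 1": (0.8, 0.3),
    "Driver 2": (0.9, 0.7),
    "Driver 3": (0.2, 0.8),
    "Driver 4": (0.1, 0.2),
}
for name, (imp, perf) in drivers.items():
    print(name, quadrant(imp, perf))
```

In practice the importance score would come from a driver analysis and the performance score from the CSAT questions, with cut-offs chosen per study.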
Tools for Provoking & Taking Action
1. Always-available NPS and CSAT Dashboards for anyone,
for any product line
2. Drill down analysis
3. Emotional driver prioritization
4. Product driver prioritization
5. Open ends or verbatims
6. Composition & waterfall analysis for studying changes
7. Deep pattern analysis and focus
The Big Picture on Why Big Data Matters to Market Research
Business
Knowledge
Market
Research
The Big Picture on Why Big Data Matters to Market Research
Customers Product
Market Research
The Big Picture on Why Big Data Matters to Market Research
Moore’s Law
We are hiring!
LinkedIn Job Search on:
LinkedIn Business Analytics
Market Research
Transform yourself
Transform the company
Transform the world
Our vision is to create economic opportunity
for every member of the global workforce.
Thank you from
Al Nevarez
Sally Sadosky

Market Research Meets Big Data Analytics for Business Transformation

  • 1. Al Nevarez Senior Manager, Business Analytics LinkedIn Sally Sadosky Group Manager, Market Research LinkedIn Market Research Meets Big Data Analytics for Business Transformation The Market Research Conference Orlando, FL Nov 2-4, 2015
  • 2. Agenda 1. Linkedin’s Business 2. Market Research & Customer Feedback at LinkedIn 3. Market Research Big Data 4. Big Data: talent, tools & process at Linkedin for MR 5. Low cost per answer with modern ETL (Extract, Transform, Load) 6. The value is in the JOIN 7. Reporting 8. Analysis: Traditional & Modern techniques 9. The Big Picture
  • 4. Create economic opportunity for every member of the global workforce Vision
  • 5. SCHOOLSCOMPANIES KNOWLEDGESKILLSMEMBERS JOBS T H E E C O N O M I C G R A P H
  • 6.
  • 7. Value Proposition: Connect to Opportunity B2C Business to Consumer B2B Business to Business Market Research & Analytics are key to bridge the gap
  • 8. With your professional world Through professional news and knowledge And build your career Connect Stay Informed Get Hired For our members
  • 9. Power the majority of the world’s hires Identify & engage professionals with relevant content Social selling. Transform cold calls into warm prospects Hire Market Sell Share content, find, contact, and learn more about people at your company @Work For our clients
  • 10. At LinkedIn, we believe in: 1. Delivers on a singular value proposition in a world class way 2. Simple, intuitive and anticipates needs 3. Exceed expectations 4. Emotionally resonate 5. Change the user’s life for the better
  • 11. Opportunity Identification and Exploration Idea Generation Concept Definition Product Definition User Experience and Usability Go To Market Product Launch Post Launch Tracking and Evaluation Member Empathy Research and Analytics
  • 12. NPS as a Measure of Loyalty Post Launch Tracking and Evaluation Member Empathy Opportunity Identification and Exploration Idea Generation Concept Definition Product Definition User Experience and Usability Go To Market Product Launch Post Launch Tracking and Evaluation
  • 13. 13 How likely are you to recommend LinkedIn to a friend or a colleague? NPS
  • 14. 14 Area of Focus Known to Self Unknown to Others Open Hidden Known to Linkedin Unknown to Linkedin Known to Members Unknown to Members Discovery Unknown
  • 15. 15 NPS captures both Heart and Mind
  • 16. • 2000 completes per month per country • Daily email sends • Representative sample: # of visits per 90 days • Members are kept anonymous • Mobile ready • In local language • Results weighted by country 16 LinkedIn’s NPS and CSAT program 19 Top 9 Countries
  • 17. Questionnaire Design • Set a competitive context • social networking, jobs sites, content • NPS for each selected site • Open-end about NPS rating • CSAT product questions for LinkedIn • Emotional driver questions for LinkedIn • Open-end on what LinkedIn can do better • Key demographics • Re-contact permission ask • Behavioral data appends (pre-prop)
  • 18. Market Research & Big Data
  • 19.
  • 20. 364 mil 97 mil 34 bil
  • 23. Research Analysis Teams at Linkedin 1. Market research analysts 2. Business Analytics Data Scientists  Al
  • 24. Talent Solutions Marketing Solutions 100 team members support 9000+ employees Sales Solutions Premium Subscriptions Consumer Marketing Business Analytics Business Operations & Analytics CFO CEO Where is Business Analytics in Linkedin’s organization ? Market Research
  • 25.
  • 26. Insights What is the best that could happen? Intelligence What will happen? Information/Knowledge Why did it happen? Data What happened? Business ROI Business analytics evolution: from data to transformation Transformation & Change Implement & monitor
  • 27. Business models Marketing, Sales, Recruiting Targeting & Attribution Customer experience Communication/interpersonal skills Statistics Probability Optimization Modeling Numerical analysis Simulations Analytics A-B Testing SQL, ETL, APIs, relational database, graph database, software engineering, tool building, web applications, R, Python, Data visualization, data mining, Machine Learning Hadoop, Spark, Hive, Pig The business analytics staff - Complete Data Scientists Business Knowledge Outcome = Data products which many staff can leverage
  • 29. Big Data Technical Themes 1. Efficient: Move the computation to the data 2. Shared foundation to build on with open source 3. Scalability (storage 1/10th the price of traditional) 4. Scalability (grow to multiple – thousands – of processors with little cost) 5. Reliability (replicated data, failure survival) 6. Schema on read (save all data in raw form, NoSQL)
  • 30. Components of Hadoop 3 areas 1. Data Storage HDFS: a network OS for the data, replication 2. Map reduce: Efficiently spreads the work 3. Hadoop libraries: Hive, Hbase, Pig….
  • 31. Big Data Query & Analysis Tools Hadoop
  • 32. Big Data Tools We Use Regularly at Hadoop Hive Pig Low cost storage Unstructured data Highly scalable processing SQL-like query Query Hadoop data Massive result sets Advanced processing Advanced ETL Data Flows
  • 33. Map Reduce Example: average a billion #s Distribute to 1000 nodes > Get sum & count at each node > Sum the sums and sum the counts > at end sum of sums / total counts
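The averaging example on this slide can be sketched in a few lines of Python. This is only an illustration of the idea on a single machine, not how Hadoop is actually invoked: the names `map_partial`, `reduce_partials` and `distributed_mean` are hypothetical, and the "nodes" are simulated as list chunks.

```python
from functools import reduce


def map_partial(chunk):
    """Map step: one (simulated) node returns the sum and count of its chunk."""
    return (sum(chunk), len(chunk))


def reduce_partials(a, b):
    """Reduce step: add two (sum, count) pairs together."""
    return (a[0] + b[0], a[1] + b[1])


def distributed_mean(values, n_nodes):
    """Split values across n_nodes chunks, then combine the partial results."""
    if not values:
        raise ValueError("cannot average an empty list")
    chunk_size = -(-len(values) // n_nodes)  # ceiling division
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    partials = [map_partial(chunk) for chunk in chunks]  # would run in parallel
    total, count = reduce(reduce_partials, partials, (0, 0))
    return total / count  # sum of sums / sum of counts


numbers = list(range(1, 1001))
mean = distributed_mean(numbers, n_nodes=10)
print(mean)
```

Each node only has to send back two numbers, which is why the approach keeps working when the full list no longer fits on one machine.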
  • 35. Big Data’s Value for Linkedin Low cost storage + Schema-less storage + Easy for Data Warehouse team = Lower cost per answer
  • 36. Sampling from the Data Warehouse
  • 37. Sampling Data Workflow for Survey Research Members & Clients use: Flagship Desktop Mobile Apps Talent solutions Marketing solutions Sales solutions Application Data storage (Engineering) ETL to DWH (Data Services) 400mil members • Sign ins • Profile edits • Language setting • Product registrations • Searches • Publishing Profile summaries Aggregated data Usage & Engagement levels (daily visits) Member segments Survey history Survey pre-pop data Sample for non-survey studies Sample for survey studies SQL processes Automated, some manual Global Daily, monthly or quarterly Sampling strategy adjustments Survey pre-pop data Snapshot tables SQL (Marketing Operations) Survey vendor Snapshot
  • 38. Pass through or pre-pop
  • 39. Some member data is anonymously passed (or obfuscated and passed) to the survey vendor with the invitation list to support: 1. Survey branching 2. Survey quota management 3. Survey language 4. Light reporting on survey vendor’s reporting platform Pass through or pre-pop Field count: dozen or so
  • 40. In addition to pre-pop data passed to the survey vendor, internally we store “snapshot” values about each survey invitee. 1. Maintains a snapshot of the member’s full profile at the time of survey 2. Private & internal to Linkedin 3. Used for internal NPS (general BI) analysis & dashboards 4. Used for data mining & pattern discovery 5. Used by many departments to understand members/clients’ activity at time of survey 6. Slice and dice by anything that comes up 7. Key = member id Snapshot Profile Data Field count: Hundreds
  • 41. ETL Process for Low Cost Per Answer from your survey results
  • 42. ETL Process Before Big Data Survey Vendor Data Survey program A Survey program B Survey program C Survey program D Survey program E Multiple Relational Database Tables Survey Table A Survey Table B Survey Table C Survey Table D Survey Table E What if Survey B adds 5 questions and drops 3 questions ? $ $ $ $ Schema A Schema B Schema C Schema D Schema E
  • 43. ETL Process After Big Data Survey Vendor Survey program A Survey program B Survey program C Survey program D Survey program E 1 Simple relational database table … with just the data we need for analysis and dashboards But ALL the data fully available on Hadoop for other studies $ $ Schema HDFS
  • 44. Survey document storage on HDFS Record 1: { "record" : 8695, "uuid" : "zzcxgtz2m0ahuzf2", "date" : 1434475680000, "start_date" : 1434475020000, "customer_id" : "abd123", "survey_fields" : { "Q1_NPS" : "10", "Q6_Driversr1" : "11", "Q6_Driversr2" : "7", "Q6_Driversr3" : "8", "Q7_Productsatr1" : "8", "Q7_Productsatr2" : "9", "Q7_Productsatr3" : "10", "wave" : 1, "country" : 1, "is_mobile" : 1, "mobileos" : 3, "verbatim1" : "Love Linkedin!", "status" : 3 } } Schema An example survey record (condensed) Core key values are those that exist for every survey record. Under “survey_fields” we have the survey specific fields. DWH team only stores this. They may be very different between survey programs, and may change for a given survey program. DWH team doesn’t care.
  • 45. Example PIG script to read from HDFS survey_raw = LOAD '/data/external/survey_vendor/survey_program1/ survey_step1 = FILTER survey_raw BY survey_fields#'status' == '3'; survey_step2 = FOREACH survey_step1 GENERATE (chararray) 'survey_program1' AS survey_program_id, (chararray) uuid AS unique_response_id, (chararray) id AS member_id, (int) survey_fields#'vwave' AS wave_field, (int) survey_fields#'Q1_NPS' AS nps_value, (chararray) survey_fields#'verbatim1' AS reason, (int) survey_fields#'Q6_Drivers1', (int) survey_fields#'Q6_Drivers2', (int) survey_fields#'Q6_Drivers3', (int) survey_fields#'Q7_Product_csat1', (int) survey_fields#'Q7_Product_csat2', (int) survey_fields#'Q7_Product_csat3', (int) additionalinfo#'mobileos'; STORE survey_step2 INTO 'survey_nps' USING PigStorage('\t'); Upload To Teradata
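For readers who do not use Pig, the same "schema on read" flattening step can be approximated in Python. This is a rough sketch under assumptions, not the production process: the record is a shortened copy of the example survey record, `flatten` and `WANTED_FIELDS` are hypothetical names, `customer_id` is assumed to be the member key, and the meaning of status 3 is not stated in the slides (the filter simply mirrors the Pig script).

```python
import json

# Shortened copy of the example survey record from the slides.
RAW_RECORD = """
{"record": 8695, "uuid": "zzcxgtz2m0ahuzf2", "date": 1434475680000,
 "customer_id": "abd123",
 "survey_fields": {"Q1_NPS": "10", "Q6_Driversr1": "11", "wave": 1,
                   "verbatim1": "Love Linkedin!", "status": 3}}
"""

# Survey-specific fields we choose to keep, mapped to flat column names.
WANTED_FIELDS = {"Q1_NPS": "nps_value", "wave": "wave", "verbatim1": "reason"}


def flatten(record, program_id):
    """Return one flat row for the analysis table, or None if filtered out."""
    fields = record.get("survey_fields", {})
    if str(fields.get("status")) != "3":  # same status filter as the Pig script
        return None
    row = {
        "survey_program_id": program_id,
        "unique_response_id": record["uuid"],
        "member_id": record["customer_id"],  # assumed member key
    }
    for source_name, column_name in WANTED_FIELDS.items():
        row[column_name] = fields.get(source_name)  # missing fields become None
    if row["nps_value"] is not None:
        row["nps_value"] = int(row["nps_value"])
    return row


row = flatten(json.loads(RAW_RECORD), "survey_program1")
print(row)
```

Because unknown fields are simply ignored and missing ones become `None`, a survey that adds or drops questions does not break the step, which is the point the slides make about schema changes.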
  • 46. Why is all this important? Because.. The Power is in the SQL JOIN (and letting others join too) select NPS_value, behavior1, behavior2 from nps_data a inner join behavior1_data b on a.customer_id = b.customer_id inner join behavior2_data c on a.customer_id = c.customer_id NPS Data Behavior 1 Data Behavior 2 Data
  • 47. • What’s the NPS for each of our member audience segments? • What’s the NPS of members who received our recent marketing campaign and took action on it? • What’s the NPS of software engineers who have at least 5 skills, each with more than 10 endorsements on their profile? Connect Stay Informed Get Hired The JOIN allows us to answer questions in context of business needs and customer experience • What’s the satisfaction with our new messaging tool for members who had it enabled? • What’s the NPS by region for members who have purchased our premium subscriptions? • What’s the CRM record for B2B customers who took our NPS survey? • Which members scored highly on both our member survey and our Talent solutions survey?
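A question such as "what is the NPS for each member segment?" can be sketched with an in-memory SQLite database. Everything here is illustrative: the table `segment_data`, its rows and the scores are made up, and the NPS expression uses the usual definition (percent promoters scoring 9 or 10 minus percent detractors scoring 0 to 6).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nps_data (customer_id TEXT, nps_value INTEGER);
CREATE TABLE segment_data (customer_id TEXT, segment TEXT);
INSERT INTO nps_data VALUES
  ('a', 10), ('b', 9), ('c', 3), ('d', 8), ('e', 10), ('f', 6);
INSERT INTO segment_data VALUES
  ('a', 'student'), ('b', 'student'), ('c', 'student'),
  ('d', 'senior leader'), ('e', 'senior leader'), ('f', 'senior leader');
""")

# Join survey answers to behavioral/profile data, then aggregate per segment.
QUERY = """
SELECT s.segment,
       COUNT(*) AS responses,
       100.0 * (SUM(CASE WHEN n.nps_value >= 9 THEN 1 ELSE 0 END)
              - SUM(CASE WHEN n.nps_value <= 6 THEN 1 ELSE 0 END)) / COUNT(*) AS nps
FROM nps_data n
INNER JOIN segment_data s ON n.customer_id = s.customer_id
GROUP BY s.segment
ORDER BY s.segment
"""

nps_by_segment = {
    segment: round(nps, 1) for segment, _responses, nps in conn.execute(QUERY)
}
print(nps_by_segment)
```

Swapping `segment_data` for a campaign, CRM or product-usage table answers the other questions on the slide with the same pattern.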
  • 49. Our NPS monitoring tool at Linkedin
  • 51. Big Data Trends 2014 1. Uploadable, findable, shareable, real-time data 2. Sensors use rising rapidly. 3. Processing costs falling rapidly, while cloud rises 4. Beautiful new user interfaces, aided by data-generating consumers – helping make data usable/useful 5. Data mining / analytics tools improving & helping find patterns 6. Early emergence of data/pattern driven problem solving
  • 52. Data Mining or Machine Learning Outcomes 1. Rank or prioritize a customer or prospect list 2. Replace or move assets or resources 3. Classify or segment 4. Rank drivers of a key metric 5. Categorize text 6. Generate a lift for a key metric Why not: NPS, Promoters, CSAT ?
  • 53. Data Mining Techniques Commonly Used by the Business Analytics Team on Market Research & other Marketing data • Decision Trees & Random Forest • Generalized Boosted Models (GBM) • Logistic Regression • Stochastic Gradient Descent(SGD) • Clustering • Bayesian Networks • Text Classification & Mining (LDA, NLP)
  • 54. Quad Chart: Importance vs. Performance (axes: Importance, low to high; Performance, low to high) Quadrants: Invest & Improve, Monitor, Maintain & Leverage, Assess needs Plotted: Driver 1, Driver 2, Driver 3, Driver 4, Driver 5
  • 55.
  • 56. Tools for Provoking & Taking Action 1. Always-available NPS and CSAT Dashboards for anyone, for any product line 2. Drill down analysis 3. Emotional driver prioritization 4. Product driver prioritization 5. Open ends or verbatims 6. Composition & waterfall analysis for studying changes 7. Deep pattern analysis and focus
  • 57. The Big Picture on Why Big Data Matters to Market Research Business Knowledge Market Research
  • 58. The Big Picture on Why Big Data Matters to Market Research CustomersProduct Market Research
  • 59. The Big Picture on Why Big Data Matters to Market Research Moore’s Law
  • 60. We are hiring! Linkedin Job Search on: Linkedin Business Analytics Market Research Transform yourself Transform the company Transform the world Our vision is to create economic opportunity for every member of the global workforce. Thank you from Al Nevarez Sally Sadosky

Editor's Notes

  1. Facebook not only “regularly polls its members about their Facebook experience” but also has created a Facebook Feedback Panel to harvest the type of longitudinal research that many skeptics have already claimed as dead. Despite its massive amount of user data, clearly Facebook sees the value in directly surveying its members.
  2. May drop the vision slides and just go to the Mission slides to shorten this section and get to the Member more quickly.
  3. Mission: Connect the world’s professionals to make them more productive and successful
  4. HQ in Mountain View, CA, with offices in 30 cities around the world. Linkedin is available in 23 languages. Linkedin has more than 6,000 full time employees. Over $2bil revenue last year.
  5. A set of circumstances that makes it possible to do something. A chance for employment or promotion. Ecosystem: Members feed the business, the business feeds the Members. Market research and Analytics play a role in helping to build and strengthen those bridges. Need to ask Al what the graphic means with google amazon, etc
  6. LinkedIn is now the world’s biggest professional social network. 7 member segments: students, career starters, career builders, senior leaders, small business owners, - Not every member is our most active member. We look at people who are active. People who are not as active have as much to say as the most active. We can learn from everyone. LinkedIn's mission is to connect the world’s professionals to make them more productive and successful. The most important word in this mission is "professionals". How do we achieve this mission? We focus on the following 3 areas: professional identity, professional networks and knowledge. Professional identity: we want to have an up-to-date professional record to represent our experience, skills and, perhaps most importantly, our ambitions. Professional networks: this is about connecting all the world’s professionals. Our network connections help us to find career opportunities and business opportunities. We can keep in touch, or get back in touch, with our old classmates and co-workers. Knowledge: Members leverage LinkedIn to express and exchange knowledge, as a professional publishing platform.
  7. Hire, Market, Sell. For our enterprise customers, we focus on hire, market and sell. Hire: help enterprises find and attract great talent, and target the right person with the right job. Market: engage members with relevant and meaningful content at scale. Sell: find and engage buyers; use your company's connections to get warm introductions.
  8. Call out the 5 principles of building great products. Also say this is what drives research, particularly exceeding expectations and creating an experience that emotionally resonates with our Members.
  9. At LI we follow a very traditional approach to both Market Research and User Experience Research. Using both qualitative and quantitative techniques, we identify opportunities in the marketplace; we design and build a product or set of products that meets the members’ or customers’ needs; we develop a go-to-market strategy; and we measure our success through NPS and member follow-up research. One of the huge benefits at LI is that we have attitudinal data through my surveys as well as the behavioral data that we collect as members use the site. Having this additional massive set of behavior metrics has fundamentally changed how I think about research in many ways.
  10. But for today, I want to focus on just one area in this cycle: the Post Launch Tracking and Evaluation. Our CEO Jeff Weiner comes from Yahoo, where NPS was used as the metric to measure success and loyalty. Jeff is a huge fan of NPS and believes that the higher bar of 9 and 10 being promoters versus 0-6 being detractors helps to focus product teams on his 5 operating principles.
  11. Over the past 2 years of setting up the market research department at LI, the one key learning is that Loyalty and Satisfaction of LI has as much to do with the heart as it does with the mind or the product. This is where the analysis that Al and I have been working on really gets interesting. In marketing, we have the dual challenge of reaching the hearts and minds of our members so they take action, so they engage with Linkedin. This became evident when we started to collect NPS data and verbatims of why they gave us the Likelihood to Recommend rating that they did. You can think of the mind as the analytical, measuring side of our consciousness. (click to appear) We have some tools to help our staff measure our members’ engagement & loyalty (go/nps), to read about their interactions with our products (go/voices), and more tools to come this year from business analytics and market research to help us discover the patterns which cause these metrics to move. But unless you’re a product manager, or doing our marketing team’s work to understand our members, why should you care? How can we provoke the broader staff at Linkedin to know, to feel, and to do more for our members? How about if we broke free of the thinking side, and just went for the heart? We came up with one idea, called go/memberfeelin, which you can see scrolling by over there….
  12. AL: Besides possibly generating some empathy & awareness, why is this important? I like this tool called the Johari window for understanding personal relationships, and I think it can work for Linkedin in general. There are things that you know about yourself; some of those are known to others, some are unknown to others. (click) We might classify these situations as Open and Hidden. Let’s replace “Self” with “Linkedin”. (click) Same idea. (click) Now let’s consider the things that are Unknown to Linkedin. When they are unknown to us but known to our members, we can consider these situations where we (Linkedin) are Blind. Unknown to both is just unknown. (click) We don’t want to be blind (highlight box). Here is go/memberfeelin. Go over to the other screen (leave the Johari window up on screen #1). Demo: a user interface that highlights emotional and personal words stated by our members. These are from our daily Global NPS survey, answers to the “Why did you give Linkedin that NPS score” question. We found 19 terms which we felt would highlight very personal comments. It’s currently running in what we call automatic mode: it cycles through 5 comments for each word, then goes on to the next word. You can also click any particular word or term on the left, and we’ll focus on that. We think the feature of anchoring the verbatim on the emotional word helps to read and get to the heart of the member’s comment. On the right, we indicate the cm segment, their NPS rating level, and when they stated the comment. This goes back through all of 2014.
  13. Over the past 2 years of setting up the market research department at LI, the one key learning is that Loyalty and Satisfaction of LI has as much to do with the heart as it does with the mind or the product. This is where the analysis that Al and I have been working on really gets interesting. In marketing, we have the dual challenge of reaching the hearts and minds of our members so they take action, so they engage with Linkedin. This became evident when we started to collect NPS data and verbatims of why they gave us the Likelihood to Recommend rating that they did. We found product improvements for sure (every company does), but we were finding a really strong predictive model based on survey data or behavioral data. This is when Al and I really started to combine forces to understand loyalty based on behaviors as well as emotional impact.
  14. For LinkedIn’s member NPS (we have a similar but customized process for each paid product): We do the following…go through the slide
  15. Set up the questionnaire design. About 8 minutes to complete. 76% give us some input for the open ends, either one or both. Over 80% agree to be re-contacted. Follow ups: we will filter some to our customer service if they indicate a huge problem; we send regions their own data and have the marketing teams work in the local language; we have a member call program from the recontacts; we develop additional surveys as necessary.
  16. Market research and big data. What’s the big deal?
  17. Though we enjoy reading these interesting articles, we don’t believe any of them. At Linkedin, we recognize it’s crucial to work together.
  18. Why’s that? Well, at Linkedin, we have the privilege of having lots of data, coupled with high standards for the privacy of our member data. As a result, we build internal tools that can handle the scale, the social graph, and the economic graph with all its companies, employees, skills, jobs, education, and knowledge. So you can imagine that big data tools that combine this abundance of data are quite welcome. 400 million registered users, about 100 million of them visiting every quarter. Nearly 40 billion page views in the last quarter. Linkedin doesn’t make any physical products. Our value is derived purely from information.
  19. So I think market research and big data can work together. At a place like Linkedin, there’s a healthy cooperation regarding tools, resources and analytics horsepower. Why? All this data is a reflection of who we are. A new ecosystem has been created: storage, computing power, web frameworks, and social network. We all strive to understand our customers better. The opportunity has never been greater.
  20. So how do we gear ourselves up to do this?
  21. We live in a world where innovation, new companies, new projects are driven by customer needs. Besides the designers at Linkedin, we have a number of analysis roles to support this modern driver.
  22. At Linkedin, we’re serious about the business in business analytics. So much so, that we actually report to the CFO. My department supports all the major product lines at Linkedin, helping sales, marketing, and operations in each to be more efficient. To help us know more, and to help us know our members and clients more. But we don’t just do ad hoc, one-time analysis and deliver a deck…
  23. Why is our department critical? Back in the 70s there was a study to find the most efficient living thing in the world: a study of the energy needed to get from point A to point B. The condor won. It took the least amount of energy to get from A to B. Man didn't do so well, unimpressive at about 1/3 of the way down the list. But someone doing the research was insightful enough to test Man on a Bicycle. MwB won, twice as good as the condor. Tools. But even with tools... Still need the doctor
  24. Our philosophy and steps for going from data to transformation
  25. The yellow elephant has come to symbolize big data, arising from the Hadoop creator’s son’s toy elephant. Where does Big Data come in? Big data has many facets. It’s also a new spirit of data management and analysis. The yellow elephant has become a symbol of this movement. Is it hype? “There is a need to bring market research data to Hadoop and Hadoop to market research data.”
  26. Here are some technical themes regarding big data. Does this sound like hype? I don’t know. It seems quite useful to me. Map reduce is efficient: instead of moving the data to the computation, it moves the computation to the data. What are the themes of big data? Storage at 1/10th the price of traditional data storage: for the same price you can store 10x as much. Deploy to multiple processors with little cost. What is amazing about this is that it scales horizontally. If we double the number of machines, then (ignoring certain fixed costs of running a MapReduce system) our computation should run approximately twice as fast. Each mapper machine will only need to do half as much work, and (assuming there are enough distinct keys to further distribute the reducer work) the same is true for the reducer machines.
  27. 3 areas. The first 2 are the true magic here. Hadoop is a generic processing framework designed to execute queries and other batch read operations against massive datasets that can be tens or hundreds of terabytes and even petabytes in size. The data is loaded into or appended to the Hadoop Distributed File System (HDFS). Hadoop then performs brute force scans through the data to produce results that are output into other files. It enables applications to work with thousands of computationally independent computers and petabytes of data. ACID: atomic, consistent, isolated, durable. I found it interesting how Facebook blended the use of both traditional SQL data stores such as Oracle and MySQL and NoSQL solutions such as Hive as part of their overall solution. Hadoop is an entire ecosystem of integrated distributed computing tools, at the core of which are a file system (HDFS) and a programming framework (Map-Reduce).
  28. Big data comes with many new products. And they have fun names. Don’t get overwhelmed. Be thankful actually. It’s a scramble for innovation. A scramble to make it easier.
  29. We only use a few of these tools, Hadoop, Hive, Pig, and get tremendous value from them.
  30. Like a data-level operating system. Hadoop operates on massive datasets by horizontally scaling the processing across very large numbers of servers through an approach called MapReduce. Think about all the math in which you use a summation. What if all the things you were summing couldn’t fit on one machine? Or what if you could do chunks of the summation all in parallel, then bring the mini-results together later for a final result? That’s map reduce, running on the magic of HDFS. It brings the processing to the data, rather than the data to a single processor: hundreds or thousands of small, inexpensive, commodity servers all executing in parallel. Using the MapReduce approach, Hadoop splits up a problem, sends the sub-problems to different servers, and lets each server solve its sub-problem in parallel. It then merges all the sub-problem solutions together and writes out the solution into files which may in turn be used as inputs into additional MapReduce steps. It enables applications to work with thousands of computationally independent computers and petabytes of data. ACID: atomic, consistent, isolated, durable. Hadoop is an entire ecosystem of integrated distributed computing tools, at the core of which are a file system (HDFS) and a programming framework (Map-Reduce). Datanodes are the workhorses of the filesystem. They store and retrieve blocks when they are told to (by clients or the namenode), and they report back to the namenode periodically with lists of blocks that they are storing. Example: if a number uses 7 bytes, 1 billion numbers take 7,000,000,000 / 1024^3 ≈ 6.5 gigabytes (my laptop has 16 GB of RAM); 10 billion numbers take about 65 gigabytes.
  31. Extract, transform, load
  32. Here’s the bottom line on why this matters. HDFS, the Hadoop Distributed File System, is a distributed file system designed to hold very large amounts of data (terabytes or even petabytes), and provide high-throughput access to this information. Files are stored in a redundant fashion across multiple machines to ensure their durability to failure and high availability to very parallel applications. In a Hadoop cluster, data is distributed to all the nodes of the cluster as it is being loaded in. The Hadoop Distributed File System (HDFS) will split large data files into chunks which are managed by different nodes in the cluster. In addition to this, each chunk is replicated across several machines, so that a single machine failure does not result in any data being unavailable. An active monitoring system then re-replicates the data in response to system failures which can result in partial storage. Even though the file chunks are replicated and distributed across several machines, they form a single namespace, so their contents are universally accessible.
  33. Linkedin, like Facebook and others, blends the use of both traditional SQL data stores such as Oracle & Teradata and Hadoop based solutions such as Hive as part of its overall solution. The big data storage is essential for our business, which is logging sign ins, profile updates, searches, etc. These are all evidence of engagement, which we use in our sampling strategy.
  34. For example: If we want to plot or data mine NPS broken out by our member segments, we can join the snapshot data with NPS to have the actual segment at the time each member took the survey. Not the new segment the member may have moved into since.
  35. Data warehouse IT time is expensive. Setting up a new survey program requires creating new database tables and designing the schema for each table. Changes to the data structure
  36. From vendor to storage is now very low cost, including automation from your DWH team. Assuming the vendor has an API, passing data to Hadoop is quite inexpensive at this point. And if the survey design changes for any of A to E, no problem: Hadoop’s unstructured data storage nature handles it fine. No schema to redesign, and a one-time setup of a process to ETL from Hadoop to a database table. It’s generally immune to changes in survey design, e.g. new fields, etc. Your DWH team will love it. And you can do it within budget.
  37. The power is in the join
  38. The join is powerful. Here are some examples of the sort of questions we can answer easily because we have the data in our database, linked with marketing data, with our CRM data, etc. It allows me to ask questions like: I need a list of our power users/lovers for an upsell campaign. Who are the members who scored highly on our member NPS survey, and also scored highly on our Sales solutions survey, and have added 10+ connections in the past week, and have marked at least 10 leads in the past week?
  39. This is one example of a Tableau dashboard we created for helping any part of our business monitor their NPS score, trends, and verbatims. The beauty behind this is that despite the multiple survey programs and survey sources, we condensed all this data into 1 Teradata table, and made it quite easy for a tool like Tableau to handle.
  40. Each year, Mary Meeker, of the VC firm Kleiner Perkins Caufield Byers, publishes a comprehensive and always interesting 100+ page deck on Internet Trends. Meeker’s 2014 edition has a section on Big Data where she lists the following six trends: 1. Uploadable, findable, shareable, real-time data 2. Sensors use rising rapidly. 3. Processing costs falling rapidly, while cloud rises 4. Beautiful new user interfaces, aided by data-generating consumers – helping make data usable/useful 5. Data mining / analytics tools improving & helping find patterns 6. Early emergence of data/pattern driven problem solving
  41. Sure, I’ll have some of that.
  42. Traditional quad chart Importance calculated via: Correlation analysis, partial correlation analysis, bayesian networks with sensitivity analysis
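As a small illustration of the correlation option mentioned in this note, the sketch below scores each driver's importance as its Pearson correlation with the NPS rating and its performance as its mean rating, then assigns a quadrant. The ratings, the cut-off values and the placement of the four quadrant labels are all assumptions for the example; the slides do not specify them.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def quadrant(importance, performance, importance_cut, performance_cut):
    """Assumed layout: high importance + low performance means invest."""
    if importance >= importance_cut:
        if performance >= performance_cut:
            return "Maintain & Leverage"
        return "Invest & Improve"
    if performance >= performance_cut:
        return "Assess needs"
    return "Monitor"


# Made-up ratings from six respondents.
nps_ratings = [10, 9, 3, 8, 6, 10]
driver_ratings = {
    "Driver 1": [9, 9, 2, 7, 5, 10],
    "Driver 2": [8, 7, 8, 7, 8, 7],
}

chart = {}
for name, ratings in driver_ratings.items():
    importance = pearson(ratings, nps_ratings)
    performance = sum(ratings) / len(ratings)
    # Hypothetical cut-offs: correlation of 0.5 and a mean rating of 7.5.
    chart[name] = quadrant(importance, performance, 0.5, 7.5)

print(chart)
```

Partial correlation or Bayesian networks, the other options the note lists, would replace only the `pearson` step; the quadrant logic stays the same.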
  43. Analytics have little power until they inform a decision. To conclude: in a world where marketing is increasingly about listening to your customers and “meeting their needs, you need to find a way to, both, do that well and do it efficiently”.
  44. And to wrap up, on a high level as to why big data matters for market research: remember this Venn diagram I showed earlier regarding the ideal data scientist? Well, it’s the same image for the modern market researcher.
  45. Let’s keep our role and our skills in perspective. But that’s not all.
  46. I argue that Moore’s law is why all of this matters. We continue to do more and more with microprocessors. Big data technologies like Hadoop and its HDFS have created a step change in what’s possible for our products and our customers, and for our market research. Couple this with the rapidly falling price of storage. As market researchers, we can help our companies realize new opportunities, even new business models. Moore: the number of transistors in a dense integrated circuit has doubled approximately every two years. The period is often quoted as 18 months because of Intel executive David House, who predicted that chip performance would double every 18 months (being a combination of the effect of more transistors and their being faster).