Understanding Voice of
Members via Text Mining
– How LinkedIn built a text analytics platform at scale
Chi-Yi Kuan
Weidong Zhang
Tiger Zhang
Who are we?
Chi-Yi Kuan (www.linkedin.com/in/chiyikuan)
•  Director, Analytics at LinkedIn
•  Big data evangelist and practitioner
Weidong Zhang (www.linkedin.com/in/weidongzhang1)
•  Manager, Analytics Platform & Apps at LinkedIn
•  Builds big data and analytics products
Tiger Zhang (www.linkedin.com/in/tigerzhang)
•  Sr. Staff, Analytics at LinkedIn
•  Text mining scientist and big data enthusiast
Strata + Hadoop World, 12/8/2016
Members: 467M
Companies: 7M
Jobs: 6M
Skills: 3B (endorsements)
Schools: 27k
Knowledge: 200k (daily posts)
467M 2B Billions
LinkedIn Big Data
467+ million members = a lot of data
Voices: drive actionable intelligence from member voices…
•  What’s trending
•  Products: Home Page, Mobile, Inbox
•  Sentiments
•  Value Props: Hire, Market, Sell
•  Relevance filtering: identify content that is relevant to the LinkedIn brand and products/services
•  Classification: structure unstructured textual data into well-defined categories
•  Topic mining: find the most significant topics and stories in a certain time window
…creating impact across business metrics
▪  Developed game-changing solutions to drive Voice of Member impact
▪  Improved analytics efficiency with unstructured data by 20X
▪  Drove end-to-end technological integration on big data and embedding NLP solutions
▪  Piloting operational solutions to scale advanced analytics impact for broader organization
LinkedIn Hadoop Ecosystem
HDFS
Map-Reduce Tez Spark
Pig Hive Scalding
YARN
AZKABAN
Design Principles for Voices Platform
•  Scalability (Process Platform): Kafka, Hadoop, Spark, Gobblin
•  Availability (Data Systems): Elasticsearch, NoSQL, Phoenix
•  Easy to Use (Application Framework): Elasticsearch, Highcharts
E2E Voices Platform Architecture
Data Processing at Scale – with Generic ETL
Smart IDs – for Viral Mentions with Threading
High Availability – through Heterogeneous Data
Machine learning based analytic engine to surface insights
to everyday business users
Customized Feeds
Central navigation
Trending insights
Social analytics & topic
mining
Deep dives
Sentiment solutions
Text mining is a crowded space
Our solution targets unique use cases for LinkedIn
Member info
•  Identity
•  Behavior
•  Social
Social data
Customer feedback
•  Customer service
•  Group updates
•  Network updates
Survey results
•  What’s trending
•  Products: PYMK, Group, Home Page, Mobile, Inbox, Identity, Network
•  Sentiments
•  Value Propositions: Hire, Market, Sell
Relevance solution, topic mining, text classification
1) Focusing on relevant data
Relevant:
▪ Product insights, launches, and events
▪ Horizontal themes
▪ PR and marketing campaigns
▪ Brand and value
▪ LinkedIn’s strategy, financial performance, international, etc.
Non-relevant:
▪ Status updates, e.g. "I posted something on LinkedIn"
▪ Social mentions, e.g. "Please connect with me on LinkedIn" or "Follow me on LinkedIn"
▪ Self-promoting materials, e.g. "share on LinkedIn"
▪ Spam
Keyword-based approach
[Chart: relevance prediction power of keyword rules (whitelist and blacklist). Plotted values: 56%, 10%, 60%, 6%, 19%, 35%.]
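A keyword-based relevance filter of the kind this slide refers to can be sketched as follows. This is a minimal illustration only: the phrase lists, the function names, and the rule that a blacklist match overrides a whitelist match are all assumptions, since the slides do not publish the actual rules.

```python
import re

# Hypothetical phrase lists; the real whitelist and blacklist are not given in the slides.
WHITELIST = ["linkedin premium", "linkedin recruiter", "linkedin earnings"]
BLACKLIST = [
    "connect with me on linkedin",
    "follow me on linkedin",
    "share on linkedin",
]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so phrase matching is stable."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def keyword_relevance(text: str) -> str:
    """Return 'relevant', 'non-relevant', or 'unknown' for one post."""
    t = normalize(text)
    # Assumption: a blacklist match wins over a whitelist match.
    if any(phrase in t for phrase in BLACKLIST):
        return "non-relevant"
    if any(phrase in t for phrase in WHITELIST):
        return "relevant"
    # Neither list matched, so keyword rules alone cannot decide.
    return "unknown"
```

Posts that match neither list stay undecided, which is the coverage gap that rules alone leave open.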
2) Leveraging text classification engine
Generic text classification framework
▪  Feature generation
▪  Feature selection
▪  Machine learning algorithms:
–  Naïve Bayes (NB)
–  Logistic Regression (LR)
–  Support Vector Machines (SVM) (LibLinear)
▪  Cross-validation and evaluation
Applications
▪  LinkedIn relevance
▪  Sentiment analysis
▪  Product categorization
▪  Value proposition classification
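The framework steps listed on this slide (feature generation, a learning algorithm, cross-validation) can be sketched with a small from-scratch Naïve Bayes classifier. This is not the LibLinear SVM the talk reports results for; it is a standard-library stand-in chosen to keep the example self-contained. The class and function names and the unigram features are assumptions, and feature selection is omitted.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Feature generation: lowercase unigram tokens."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for w in tokenize(doc):
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def cross_validate(docs, labels, k=3):
    """k-fold cross-validation; returns mean accuracy across folds."""
    accuracies = []
    for fold in range(k):
        train = [i for i in range(len(docs)) if i % k != fold]
        test = [i for i in range(len(docs)) if i % k == fold]
        model = NaiveBayes().fit(
            [docs[i] for i in train], [labels[i] for i in train]
        )
        hits = sum(model.predict(docs[i]) == labels[i] for i in test)
        accuracies.append(hits / len(test))
    return sum(accuracies) / k


# Toy, made-up training data for illustration only.
DOCS = [
    "new linkedin product launch looks great",
    "please connect with me on linkedin",
    "linkedin earnings and strategy update",
    "follow me on linkedin",
    "linkedin launch event for new product",
    "connect and follow me please",
]
LABELS = ["relevant", "non-relevant"] * 3
```

The same fit/predict interface could serve the other applications on the slide (sentiment, product categorization, value proposition classification) by swapping the labels.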
Machine learning approach increases overall relevance by 40%
[Chart: relevance prediction power of keyword rules (whitelist and blacklist) compared with SVM. Plotted values: 56%, 6%, 19%, 40%, 100%, 35%.]
SVM: great gain in balancing precision and recall
3) Enabling topic mining
HIGH SPARK
Description (input text, e.g. "This is great.")
Part-of-speech (POS) tagging (Stanford CoreNLP)
POS pattern matching
Topic pruning
-  stemming
-  removing stop words
-  merging synonyms
-  clustering (optional)
Topic ranking: TF-IDF weighting and DF ranking
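The topic mining steps on this slide can be sketched roughly as below. Several things here are assumptions made for illustration: the input is already POS-tagged (the talk uses Stanford CoreNLP for that step), the pattern is "adjectives and nouns ending in a noun", the stemmer is a crude suffix stripper, the stop-word list is tiny, and topics are scored by TF-IDF summed over documents. Synonym merging and clustering are left out.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list (assumption, not the platform's list).
STOP_WORDS = {"the", "a", "an", "this", "is", "of"}


def stem(word):
    """Crude suffix stripping for illustration only (not a real stemmer)."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def candidate_topics(tagged):
    """POS pattern matching: maximal adjective/noun runs that end in a noun."""
    topics, run = [], []
    for word, tag in tagged + [("", "END")]:  # sentinel flushes the last run
        if tag.startswith("NN") or tag.startswith("JJ"):
            run.append((word.lower(), tag))
            continue
        # Trim trailing adjectives so each phrase ends in a noun.
        while run and not run[-1][1].startswith("NN"):
            run.pop()
        # Pruning: drop stop words, then stem what is left.
        words = [stem(w) for w, _ in run if w not in STOP_WORDS]
        if words:
            topics.append(" ".join(words))
        run = []
    return topics


def rank_topics(docs_tagged):
    """Rank topics by TF-IDF summed across documents, highest first."""
    per_doc = [Counter(candidate_topics(d)) for d in docs_tagged]
    df = Counter(topic for counts in per_doc for topic in counts)
    n = len(per_doc)
    scores = Counter()
    for counts in per_doc:
        for topic, tf in counts.items():
            idf = math.log((1 + n) / (1 + df[topic])) + 1.0  # smoothed IDF
            scores[topic] += tf * idf
    return scores.most_common()


# Made-up, pre-tagged example documents (Penn Treebank style tags).
DOC_A = [
    ("job", "NN"), ("search", "NN"), ("works", "VBZ"), ("and", "CC"),
    ("job", "NN"), ("search", "NN"), ("helps", "VBZ"), ("recruiters", "NNS"),
]
DOC_B = [("recruiters", "NNS"), ("agree", "VBP")]
```

Calling `rank_topics([DOC_A, DOC_B])` returns (topic, score) pairs; restricting the input to posts from one time window gives the "most significant topics in a certain time window" view described earlier in the deck.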
Trending Insights – identify organic trending topics
Didi and Kuaidi merger
Product release
LinkedIn’s customer support has evolved into an
intelligence platform…
Scaling to have a broader impact across LinkedIn
▪  GCO cases
▪  Issue resolution
▪  Support focused
▪  Internal data (GCO,
surveys, site
feedback)
▪  App review
▪  LI.com
▪  Social data
▪  Product insight
▪  Member insight
▪  Launch tracking
▪  Social sentiment
▪  Brand tracking
▪  Viral mentions
Reactive Support → Multi-channel Feedback → Intelligent Insights → Predictive Anticipation
This is what the future could look like
1. From the first time we pick up an isolated comment…
2. Machine determines if there is significant reach…
3. …and whether it is a trending topic…
4. …breaks down into sentiment and drivers…
5. …geographic locations…
6. (For LI data) deep dive into MLC segmentation…
7. …and audience segmentation…
8. …generates automatic reporting, alerts and escalations…
9. …and closes the feedback loop with support and PR solutions
Best customer experience starts
from understanding Voices of
members!
Thank You!
Engineering blogs for Voices
Part I. Voices: a Text Analytics Platform for Understanding Member Feedback
Part II. Technical Details for Topic Mining
References
1.  LibLinear: a library for large linear classification, available at
https://www.csie.ntu.edu.tw/~cjlin/liblinear/
2.  LingPipe: a Java-based toolkit for processing text using computational linguistics,
available at http://alias-i.com/lingpipe/
3.  NLTK: a leading platform for building Python programs to work with human language
data, available at http://www.nltk.org/
4.  Stanford CoreNLP: an open source project led by the Stanford NLP group, available at
http://nlp.stanford.edu/software/

Using PDB Relocation to Move a Single PDB to Another Existing CDBUsing PDB Relocation to Move a Single PDB to Another Existing CDB
Using PDB Relocation to Move a Single PDB to Another Existing CDB
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
Slip-and-fall Injuries: Top Workers' Comp Claims
Slip-and-fall Injuries: Top Workers' Comp ClaimsSlip-and-fall Injuries: Top Workers' Comp Claims
Slip-and-fall Injuries: Top Workers' Comp Claims
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPsWebinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
 

Understanding Voice of Members via Text Mining – How LinkedIn Built a Text Analytics Engine at Scale

  • 1. Understanding Voice of Members via Text Mining – How LinkedIn built a text analytics platform at scale. Chi-Yi Kuan, Weidong Zhang, Tiger Zhang
  • 2. Who are we? Chi-Yi Kuan (www.linkedin.com/in/chiyikuan), Weidong Zhang (www.linkedin.com/in/weidongzhang1), Tiger Zhang (www.linkedin.com/in/tigerzhang). Roles as listed on the slide: Director, Analytics at LinkedIn (big data evangelist and practitioner); Manager, Analytics Platform & Apps at LinkedIn (builds big data and analytics products); Sr. Staff, Analytics at LinkedIn (text mining scientist and big data enthusiast). Strata + Hadoop World, 12/8/2016
  • 4. Knowledge, Schools, Skills, Jobs, Companies, Members. Figures shown: 467M, 7M, 6M, 3B, 27k, 200k. Endorsements. Daily posts.
  • 5. LinkedIn Big Data. Figures shown: 467M, 2B, Billions.
  • 6. 467+ million members = a lot of data
  • 7. Voices: drive actionable intelligence from member voices… Areas covered: What’s trending, Products, Home Page, Mobile, Inbox, Sentiments, Value Props, Hire, Market, Sell. Relevance filtering: identify content that is relevant to the LinkedIn brand and products/services. Classification: structure unstructured textual data into well-defined categories. Topic mining: find the most significant topics and stories in a certain time window.
  • 8. …creating impact across business metrics. Developed game-changing solutions to drive Voice of Member impact; improved analytics efficiency with unstructured data by 20X; drove end-to-end technological integration on big data and embedding NLP solutions; piloting operational solutions to scale advanced analytics impact for the broader organization.
  • 9. LinkedIn Hadoop Ecosystem: HDFS, Map-Reduce, Tez, Spark, Pig, Hive, Scalding, YARN, Azkaban
  • 10. Design Principles for Voices Platform: Scalability, Availability, Easy to Use. Layers: Process Platform, Data Systems, Application Framework. Technologies listed (in slide order): Kafka, Hadoop, Spark, Gobblin, Elasticsearch, NoSQL, Phoenix, Elasticsearch, Highcharts.
  • 11. E2E Voices Platform Architecture
  • 12. Data Processing at Scale – with Generic ETL
  • 13. Smart IDs – for Viral Mentions with Threading
  • 14. High Availability – through Heterogeneous Data
  • 15. Machine-learning-based analytics engine to surface insights to everyday business users: customized feeds, central navigation, trending insights, social analytics & topic mining, deep dives, sentiment solutions
  • 16. Text mining is a crowded space
  • 17. Our solution targets unique use cases for LinkedIn. Data sources: Member info (Identity, Behavior, Social), Social data, Customer feedback, Customer service, Group updates, Network updates, Survey results. Outputs: What’s trending, Products, Sentiments, Value Propositions. Product and value areas: PYMK, Group, Home Page, Mobile, Inbox, Identity, Network, Hire, Market, Sell. Components: Relevance solution, Topic mining, Text Classification.
  • 18. 1) Focusing on relevant data. Relevant: product insights, launches, and events; horizontal themes; PR and marketing campaigns; brand and value; LinkedIn’s strategy, financial performance, international, etc. Non-relevant: status updates, e.g. "I posted something on Linkedin"; social mentions, e.g. "Please connect with me on Linkedin" or "Follow me on Linkedin"; self-promoting materials, e.g. “share on LinkedIn”; spam.
  • 20. 2) Leveraging text classification engine. Generic text classification framework: feature generation; feature selection; machine learning algorithms: Naïve Bayes (NB), Logistic Regression (LR), Support Vector Machines (SVM) (LibLinear); cross-validation and evaluation. Applications: LinkedIn relevance; sentiment analysis; product categorization; value proposition classification.
  • 21. Machine learning approach increases overall relevance by 40%. Chart: relevance prediction power (labels: Rules, Whitelist, Blacklist, SVM; values shown: 56%, 6%, 19%, 40%, 100%, 35%). SVM: great gain in balancing precision and recall.
  • 22. 3) Enabling topic mining. Steps shown (in slide order): POS pattern matching; part-of-speech (POS) tagging (Stanford CoreNLP), example sentence "This is great."; topic pruning (stemming, removing stop words, merging synonyms, optional clustering); topic ranking: TF-IDF weighting and DF ranking.
  • 23. Trending Insights – identify organic trending topics. Examples shown: Didi and Kuaidi merger; product release.
  • 24. LinkedIn’s customer support has evolved into an intelligence platform… Scaling to have a broader impact across LinkedIn. Stages: Reactive, Multi-channel, Intelligent, Predictive. Focus: Support, Feedback, Insights, Anticipation. Items listed (in slide order): GCO cases; issue resolution; support focused; internal data (GCO, surveys, site feedback); app review; LI.com; social data; product insight; member insight; launch tracking; social sentiment; brand tracking; viral mentions.
  • 25. This is what the future could look like. 1) From the first time we pick up an isolated comment… 2) Machine determines if there is significant reach… 3) …and whether it is a trending topic… 4) …breaks down into sentiment and drivers… 5) …geographic locations… 6) (For LI data) deep dive into MLC segmentation… 7) …and audience segmentation… 8) …generates automatic reporting, alerts and escalations… 9) …and closes the feedback loop with support and PR solutions.
  • 26. Best customer experience starts from understanding Voices of members! Thank You!
  • 27. Engineering blogs for Voices. Part I. Voices: a Text Analytics Platform for Understanding Member Feedback. Part II. Technical Details for Topic Mining.
  • 28. References. 1. LibLinear: a library for large linear classification, available at https://www.csie.ntu.edu.tw/~cjlin/liblinear/ 2. LingPipe: a Java-based toolkit for processing text using computational linguistics, available at http://alias-i.com/lingpipe/ 3. NLTK: a leading platform for building Python programs to work with human language data, available at http://www.nltk.org/ 4. Stanford CoreNLP: an open source project led by the Stanford NLP group, available at http://nlp.stanford.edu/software/
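
The relevance filtering and generic text classification framework described on slides 18 to 21 (feature generation, a Naïve Bayes classifier, precision/recall evaluation) can be illustrated with a minimal sketch. This is not the Voices implementation, which the slides say uses LibLinear among other algorithms. It is a pure-Python multinomial Naïve Bayes, and the training sentences, the label names `relevant` / `non_relevant`, and all function and class names are hypothetical, loosely modeled on the example mentions quoted on slide 18.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Feature generation: lowercase unigram bag-of-words tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokenize(doc):
                if tok in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def precision_recall(y_true, y_pred, positive="relevant"):
    """Evaluation: precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy training set (not data from the talk).
TRAIN = [
    ("LinkedIn launches a new mobile app redesign", "relevant"),
    ("LinkedIn reports strong quarterly financial performance", "relevant"),
    ("New LinkedIn inbox feature makes messaging faster", "relevant"),
    ("Please connect with me on LinkedIn", "non_relevant"),
    ("Follow me on LinkedIn", "non_relevant"),
    ("I posted something on LinkedIn", "non_relevant"),
]

model = NaiveBayes().fit([text for text, _ in TRAIN], [label for _, label in TRAIN])
print(model.predict("Connect with me on LinkedIn please"))
print(model.predict("LinkedIn launches new messaging feature"))
```

On a toy set this small the numbers mean little; the point is only the shape of the pipeline: tokens in, a per-class log score, and a precision/recall check of the kind slide 21 uses to compare rules against SVM.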
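
The topic pruning and ranking steps on slide 22 can be sketched in the same spirit. The slide names Stanford CoreNLP POS tagging and POS pattern matching for candidate extraction; the sketch below swaps that for plain stop-word filtering and a crude suffix stripper so that it stays self-contained, and it skips synonym merging and clustering. The stop-word list, the smoothed IDF formula, the tie-breaking by document frequency, and the example posts are all assumptions rather than details from the talk.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "and", "the", "is", "this", "of", "on", "to", "with", "in", "for"}


def stem(word):
    """Crude suffix stripping, a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def candidate_topics(text):
    """Topic pruning: tokenize, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def rank_topics(docs, top_n=3):
    """Rank terms by summed TF-IDF; ties go to higher DF, then alphabetical."""
    doc_terms = [Counter(candidate_topics(d)) for d in docs]
    df = Counter(term for terms in doc_terms for term in terms)
    n_docs = len(docs)
    scores = Counter()
    for terms in doc_terms:
        total = sum(terms.values())
        for term, count in terms.items():
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0  # smoothed IDF
            scores[term] += (count / total) * idf
    ranked = sorted(scores, key=lambda t: (-scores[t], -df[t], t))
    return ranked[:top_n]


# Hypothetical posts from one time window (not data from the talk).
POSTS = [
    "Didi and Kuaidi merger announced",
    "The Didi Kuaidi merger is huge",
    "Thoughts on the merger of Didi and Kuaidi",
    "New product release this week",
]

print(rank_topics(POSTS))
```

Summing TF-IDF across the window rewards terms that recur in many posts while still discounting ubiquitous ones, which is one plausible reading of "TF-IDF weighting and DF ranking"; the talk's second engineering blog post would be the place to check the actual scoring.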