This document summarizes how LinkedIn built a text analytics platform to understand member feedback at scale. It discusses how LinkedIn used techniques like text classification, topic modeling, and machine learning to analyze large amounts of unstructured social data from members in order to gain insights into trends, sentiments, and values. The platform was designed for scalability, availability, and ease of use. It provides centralized insights that have improved analytics efficiency and driven business impacts.
Understanding Voice of Members via Text Mining – How LinkedIn Built a Text Analytics Engine at Scale
1. Understanding Voice of Members via Text Mining – How LinkedIn built a text analytics platform at scale
Chi-Yi Kuan
Weidong Zhang
Tiger Zhang
2. Who are we?
Chi-Yi Kuan (www.linkedin.com/in/chiyikuan)
• Director, Analytics at LinkedIn
• Big data evangelist and practitioner
Weidong Zhang (www.linkedin.com/in/weidongzhang1)
• Manager, Analytics Platform & Apps at LinkedIn
• Build big data and analytics products
Tiger Zhang (www.linkedin.com/in/tigerzhang)
• Sr. Staff, Analytics at LinkedIn
• Text mining scientist and big data enthusiast
Strata + Hadoop World, 12/8/2016
6. 467+ million members = a lot of data
7. Voices: drive actionable intelligence from member voices…
What’s trending
Products: Home Page, Mobile, Inbox
Sentiments
Value Props: Hire, Market, Sell
Relevance filtering: Identify content that is relevant to the LinkedIn brand and products/services
Classification: Structure unstructured textual data into well-defined categories
Topic mining: Find the most significant topics and stories in a certain time window
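The deck does not say how topic mining is implemented (the Part II engineering blog is listed as the place for those details). As a rough illustration only, the sketch below ranks terms that appear in a larger share of posts inside a time window than outside it, using a smoothed log ratio. The function names, the sample posts, and the scoring rule are all assumptions made for this example, not LinkedIn's method.

```python
from collections import Counter
from datetime import date
import math
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def significant_terms(posts, start, end, top_n=3, min_count=2):
    """Rank terms over-represented in posts dated within [start, end]."""
    window, background = Counter(), Counter()
    n_window = 0
    for day, text in posts:
        in_window = start <= day <= end
        n_window += in_window
        target = window if in_window else background
        target.update(set(tokenize(text)))  # count each term once per post
    n_background = len(posts) - n_window

    scores = {}
    for term, count in window.items():
        if count < min_count:
            continue  # ignore terms seen in too few window posts
        # Smoothed share of posts containing the term, inside vs. outside.
        p_window = (count + 1) / (n_window + 2)
        p_background = (background[term] + 1) / (n_background + 2)
        scores[term] = math.log(p_window / p_background)
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


# Hypothetical sample data, invented for illustration.
posts = [
    (date(2016, 12, 1), "love the new mobile app"),
    (date(2016, 12, 1), "mobile inbox is slow"),
    (date(2016, 12, 5), "the new inbox redesign looks great"),
    (date(2016, 12, 6), "inbox redesign keeps crashing"),
    (date(2016, 12, 6), "redesign of the inbox is confusing"),
]
top = significant_terms(posts, date(2016, 12, 5), date(2016, 12, 6), top_n=2)
print(top)
```

A real system would also need stop-word handling and phrase detection; both are left out to keep the sketch short.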
8. …creating impact across business metrics
Developed game-changing solutions to drive Voice of Member impact
Improved analytics efficiency with unstructured data by 20X
Drove end-to-end technological integration on big data and embedding NLP solutions
Piloting operational solutions to scale advanced analytics impact for broader organization
12. Data Processing at Scale – with Generic ETL
13. Smart IDs – for Viral Mentions with Threading
14. High Availability – through Heterogeneous Data
15. Machine learning based analytic engine to surface insights to everyday business users
Customized Feeds
Central navigation
Trending insights
Social analytics & topic mining
Deep dives
Sentiment solutions
16. Text mining is a crowded space
17. Our solution targets unique use cases for LinkedIn
Member info
• Identity
• Behavior
• Social
Social data
Customer feedback
• Customer service
• Group updates
• Network updates
Survey results
What’s trending
Products
Sentiments
Value Propositions
PYMK, Group, Home Page, Mobile, Inbox
Identity, Network
Hire, Market, Sell
Relevance solution
Topic mining
Text Classification
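The slide names text classification without saying which model is used. To show what sorting feedback into product categories can look like, here is a minimal multinomial Naive Bayes sketch in plain Python. The class name, the training sentences, and the choice of Naive Bayes itself are assumptions for illustration; the category labels "Inbox" and "Mobile" are borrowed from the product names on the slide.

```python
from collections import Counter, defaultdict
import math
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.label_counts):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus log likelihood of each known word.
            score = math.log(self.label_counts[label] / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training examples, invented for illustration.
texts = [
    "inbox messages are not loading",
    "cannot send a message from my inbox",
    "the mobile app crashes on launch",
    "app update broke mobile login",
]
labels = ["Inbox", "Inbox", "Mobile", "Mobile"]
model = NaiveBayes().fit(texts, labels)
print(model.predict("my inbox will not load messages"))
```

With realistic volumes, a library classifier trained on labeled feedback would replace this toy model; the point here is only the shape of the input and output.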
18. 1) Focusing on relevant data
Relevant:
▪ Product insights, launches, and events
▪ Horizontal themes
▪ PR and marketing campaigns
▪ Brand and value
▪ LinkedIn’s strategy, financial performance, international etc.
Non-relevant:
▪ Status updates, e.g. "I posted something on Linkedin"
▪ Social mentions, e.g. "Please connect with me on Linkedin" or "Follow me on Linkedin"
▪ Self-promoting materials, e.g. “share on LinkedIn”
▪ Spam
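The relevant/non-relevant split above can be approximated with a few rules. The sketch below is a rule-based stand-in, assuming hand-written regular expressions for the non-relevant examples quoted on the slide; the deck does not state how the actual relevance solution works, and the pattern list and function name are hypothetical.

```python
import re

# Hypothetical patterns modeled on the non-relevant examples from the slide.
NON_RELEVANT_PATTERNS = [
    r"\bi posted\b.*\bon linkedin\b",             # status updates
    r"\b(connect with|follow) me on linkedin\b",  # social mentions
    r"\bshare on linkedin\b",                     # self-promotion
]
_NON_RELEVANT = [re.compile(p, re.IGNORECASE) for p in NON_RELEVANT_PATTERNS]


def is_relevant(text):
    """True if the text mentions LinkedIn and matches no non-relevant pattern."""
    if "linkedin" not in text.lower():
        return False
    return not any(pattern.search(text) for pattern in _NON_RELEVANT)


for sample in [
    "Please connect with me on Linkedin",
    "LinkedIn's new inbox redesign is much faster",
]:
    print(sample, "->", is_relevant(sample))
```

Rules like these are easy to audit but miss paraphrases, so they are better suited as a first-pass filter or as features for a trained classifier.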
24. LinkedIn’s customer support has evolved into an intelligence platform…
Scaling to have a broader impact across LinkedIn
Reactive Support
▪ GCO cases
▪ Issue resolution
▪ Support focused
Multi-channel Feedback
▪ Internal data (GCO, surveys, site feedback)
▪ App review
▪ LI.com
▪ Social data
Intelligent Insights
▪ Product insight
▪ Member insight
▪ Launch tracking
Predictive Anticipation
▪ Social sentiment
▪ Brand tracking
▪ Viral mentions
25. This is what the future could look like
1) From the first time we pick up an isolated comment…
2) Machine determines if there is significant reach…
3) …and whether it is a trending topic…
4) …breaks down into sentiment and drivers…
5) …geographic locations…
6) (For LI data) deep dive into MLC segmentation…
7) …and audience segmentation…
8) …generates automatic reporting, alerts and escalations…
9) …and close the feedback loop with support and PR solutions
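The "is it a trending topic" step could be approximated by comparing the latest daily mention count with recent history. The sketch below flags a day whose count sits several standard deviations above the prior days. The z-score rule, the thresholds, and the function name are assumptions chosen for illustration; the deck does not describe the detection logic.

```python
from statistics import mean, pstdev


def is_trending(daily_counts, threshold=3.0, min_mentions=10):
    """Flag the latest day when its count is an outlier versus prior days."""
    *history, today = daily_counts
    if len(history) < 2 or today < min_mentions:
        return False  # too little history or too few mentions to judge
    baseline, spread = mean(history), pstdev(history)
    if spread == 0:
        return today > baseline  # flat history: any increase stands out
    return (today - baseline) / spread >= threshold


# Hypothetical daily mention counts for one topic, oldest first.
print(is_trending([4, 5, 6, 5, 4, 40]))
print(is_trending([20, 22, 19, 21, 20, 23]))
```

The `min_mentions` floor keeps low-volume topics from triggering alerts on small absolute changes.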
27. Engineering blogs for Voices
Part I. Voices: a Text Analytics Platform for Understanding Member Feedback
Part II. Technical Details for Topic Mining
28. References
1. LibLinear: a library for large linear classification, available at https://www.csie.ntu.edu.tw/~cjlin/liblinear/
2. LingPipe: a Java-based toolkit for processing text using computational linguistics, available at http://alias-i.com/lingpipe/
3. NLTK: a leading platform for building Python programs to work with human language data, available at http://www.nltk.org/
4. Stanford CoreNLP: an open source project led by the Stanford NLP group, available at http://nlp.stanford.edu/software/