How Oracle Uses CrowdFlower For Sentiment Analysis

How Oracle Uses CrowdFlower's
Data Enrichment Platform For
Sentiment Analysis

Before we get started
#RichData
The housekeeping items:
• Webinar slides, recording, and Q&A will be
emailed
• Enter questions in chat on webinar panel
• Or ask your questions on Twitter: @CrowdFlower
- Use #RichData

Meet the Data Scientists
Randall Sparks
Principal Member of Technical Staff
Oracle Data Cloud — Social Platform Group
Pallika Kanani
Senior Research Staff Member
Oracle Labs
Lukas Biewald | @L2K
CEO and Founder
CrowdFlower

Overview
What will be covered today?
People-Powered Feedback
• Train and perfect your algorithms to build sentiment & other models
that classify text
Insights: Why CrowdFlower?
• Test Question Infrastructure
• Support for tracking contributor agreement and data quality
• Multiple language support
• World-wide contributor network
• Data enrichment capabilities
Use Cases
• Real examples of data collection, data modeling done by Oracle

Randall Sparks
• Oracle Data Cloud – Social Platform Group
• Use case: Social Media Analytics
• Data Collection, Data Modeling Process
• Use case: Multiple Languages

About Us
• Oracle Data Cloud — Social Platform Group
– Data Service supporting multiple applications
– Monitoring & Analysis of Social Media Streams & other text sources
• Categorization of social media streams to topics +
enrichments
– Keywords/phrases, Semantic vectors (LSA)
• Enrichments
– Themes within a topic, related terms appearing in messages
– Demographics, Location, Indicators of intent, etc.
– Sentiment
• Social Relationship Management
(SRM) Product

What We Do
• Collect, filter, & analyze a large volume of streaming social
media content from multiple content sources via multiple
suppliers/aggregators
• Multiple (30+) languages — big data collection challenge
• Process
– Collect content streamed from multiple suppliers/aggregators
– Text filtering, normalization, tokenization, chunking, etc. (NLP)
– “Categorize” messages (match snippets to “Topics”)
– Topics: combinations of keywords/phrases +
semantic filters: vector comparison of words & texts in
“semantic space” using Latent Semantic Analysis (LSA)
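
The topic-matching step above combines a keyword test with a vector comparison. A minimal sketch of that idea follows, assuming snippet and topic vectors have already been produced by some LSA projection (not shown). The function names, the 0.5 threshold, and the example vectors are all hypothetical and are not taken from Oracle's system.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def matches_topic(snippet, snippet_vec, topic_keywords, topic_vec, threshold=0.5):
    """Hypothetical rule: a snippet matches when it contains a topic keyword
    AND its vector is close enough to the topic vector in the semantic space."""
    text = snippet.lower()
    has_keyword = any(kw.lower() in text for kw in topic_keywords)
    return has_keyword and cosine_similarity(snippet_vec, topic_vec) >= threshold

# Made-up 2-dimensional vectors: axis 0 ~ "food", axis 1 ~ "transit".
restaurant_topic = [1.0, 0.0]
print(matches_topic("Lunch at Subway again", [0.9, 0.1], ["subway"], restaurant_topic))
print(matches_topic("The subway was delayed", [0.1, 0.9], ["subway"], restaurant_topic))
```

Both snippets contain the keyword, so only the semantic filter separates them. That is the role the slide assigns to the LSA comparison.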

Use Case: Social Media Analytics
Keywords/phrases + Semantic filters

Use Case: Social Media Analytics — Example View

Use Case: Social Media Analytics — Example View
• Media Types of matched “snippets”

Why Do We Need Sentiment Data?
• Train sentiment model (Machine Learning)
– Training data: 1000s of human-annotated items
– Features: words
• also: n-grams, phrases, known negation/intensification
patterns, etc.
• punctuation, emoticons, emoji, other metadata
– Various algorithms:
• Decision Trees, Logistic Regression,
Support Vector Machine (SVM), etc.
• Analyze model
– held-out test set
– accuracy, precision/recall, etc.
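
The feature types and evaluation measures listed above can be illustrated with a small standard-library sketch. It is not Oracle's pipeline: the negation and emoticon lists, the feature naming scheme, and both function names are assumptions made for illustration, and no learning algorithm is included.

```python
import re
from collections import Counter

NEGATIONS = {"not", "no", "never"}             # illustrative list, not from the slides
EMOTICONS = {":)", ":(", ":D", ":-)", ":-("}   # illustrative list, not from the slides

def extract_features(text):
    """Count word, bigram, negation, emoticon, and punctuation features for one message."""
    features = Counter()
    stripped = text
    for emoticon in EMOTICONS:
        if emoticon in stripped:
            features["emoticon=" + emoticon] += 1
            stripped = stripped.replace(emoticon, " ")  # keep emoticons out of the word tokens
    tokens = re.findall(r"[a-z']+", stripped.lower())
    for tok in tokens:
        features["word=" + tok] += 1
    for first, second in zip(tokens, tokens[1:]):
        features["bigram=" + first + "_" + second] += 1
        if first in NEGATIONS:
            features["negated=" + second] += 1          # simple negation pattern
    if "!" in text:
        features["has_exclamation"] += 1
    return features

def evaluate(gold, predicted, positive="positive"):
    """Accuracy, plus precision and recall for one class, on a held-out test set."""
    pairs = list(zip(gold, predicted))
    correct = sum(1 for g, p in pairs if g == p)
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

Feature dictionaries of this kind could then be fed to any of the algorithms named on the slide (decision trees, logistic regression, SVM). `evaluate` would be applied to that model's predictions on items it never saw during training.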

Data Collection & Modeling Process
• Generate “gold” test item data:
– Transform into (our) standard format for upload to
CrowdFlower
– Define CrowdFlower job to generate test questions &
upload data
– Run job & download results
– Select “gold” test items based on analysis of contributor
agreement
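
Selecting "gold" items by contributor agreement could look roughly like the sketch below. It uses plain, unweighted majority agreement over the downloaded judgments, and the platform's own agreement scoring may differ. The thresholds, function names, and data layout (item id mapped to a list of labels) are hypothetical.

```python
from collections import Counter

def agreement(judgments):
    """Return (majority_label, fraction of contributors who chose it)."""
    counts = Counter(judgments)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(judgments)

def select_gold(items, min_agreement=0.8, min_judgments=3):
    """Keep only items with enough judgments and high enough agreement.

    `items` maps an item id to the list of labels contributors gave it.
    Returns a dict of item id -> agreed label.
    """
    gold = {}
    for item_id, judgments in items.items():
        if len(judgments) < min_judgments:
            continue  # too few judgments to trust the majority
        label, score = agreement(judgments)
        if score >= min_agreement:
            gold[item_id] = label
    return gold

# Made-up judgments for three snippets.
results = {
    "snippet-1": ["positive", "positive", "positive"],
    "snippet-2": ["positive", "negative", "neutral"],
    "snippet-3": ["negative", "negative"],
}
print(select_gold(results))
```

The same filter, with a suitable threshold, would also serve the later step of picking high-agreement items for model training.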

Data Collection & Modeling Process (continued)
• Generate full training & test data sets:
– Define main CrowdFlower job, upload data & test items
– Launch & monitor job (remove problematic test questions)
– Download & analyze results
– Select (high-agreement) items for ML sentiment model
training
– Build sentiment model, test, & deploy

An Example Of How We Collect Data

12+ Languages. Target: 30

Pallika Kanani
• About Oracle Labs
• Power of human-annotated data
• Use case – Language understanding
• Use case – Wisdom of the crowd
• Use case – Data quality

Information Retrieval and Machine Learning Group
• Strong research program, publications
• Develop core Information Retrieval, Statistical Natural
Language Processing and Machine Learning
technologies
• Help solve complex and challenging business problems
across Oracle
• Utilize CrowdFlower platform for a wide variety of
relevance ranking and NLP problems

Data Annotation
• First step in building search
/ NLP / machine learning
application
• Many Machine Learning
techniques require some
human-annotated data
• Even for unsupervised
methods, need annotated
data for proper evaluation

Use Case: Language Understanding
• Goal: Get a better understanding of what our customers
are talking about
• Extract useful information from raw text
• Language is all about context: Disambiguating extracted
information is crucial, and people are good at
understanding context
– Are people talking about New York subway or
Subway, the restaurant?

CrowdFlower as a data enrichment platform
Before
• Data collection for Machine Learning used to be tedious
– Long iterations, typically lasting weeks or months
– Prohibitively high costs
– Difficult to innovate → overfitting to existing corpora
After
• Try out new tasks at previously unimaginable speed
• Designing a job for a new NLP task can take as little as a day;
getting results can be a matter of hours
• Rapid prototyping thanks to affordable costs for early trials
(and final data collection)

Rapid Feedback
• Rapid debugging of the data collection process
• Works like debugging software with humans in the loop

Wisdom of the Crowd
• Incorrect test questions
due to lack of
knowledge of pop
culture
• The crowd set me
straight
“’Say Something’ is the name of a
song. Please fix your test
question”

Data Quality
• Good quality data
even for tricky tasks
• Example: Ran a task
for finding relevant
URLs from Wikipedia,
and got excellent
results

twitter.com/CrowdFlower | info@crowdflower.com | crowdflower.com
Q & A

What’s next?
• Look out for a follow up email with a copy of these
slides, a recording of the webinar, Q&A recap, and
other fun stuff
• View and share this presentation on Slideshare
- Follow us for more such events
• Next webinar:
- CrowdFlower User Webinar: Graphical Editor and Visual
Reports
- September 10th 2015 – 10:00 AM PST
- Register at: http://www.crowdflower.com/events

Rich Data Summit
What is Rich Data Summit?
The leading conference for data scientists
focused on turning big data into rich,
meaningful data
• Data Scientists – 300+
• Sessions focused on Data Science – 5
• Hands-on Workshops – 9
Qualified webinar attendees will receive a 30%
discount coupon
Interested? Email us at
conference@crowdflower.com
www.richdatasummit.com
@RichDataSummit

twitter.com/CrowdFlower | info@crowdflower.com | crowdflower.com
Thank you.
