How Oracle Uses CrowdFlower's
Data Enrichment Platform For
Sentiment Analysis
Before we get started
#RichData
The housekeeping items:
• Webinar slides, recording, and Q&A will be
emailed
• Enter questions in chat on webinar panel
• Or ask your questions on Twitter -
@CrowdFlower
- Use #RichData
Meet the Data Scientists
Randall Sparks
Principal Member of Technical Staff
Oracle Data Cloud — Social Platform Group
Pallika Kanani
Senior Research Staff Member
Oracle Labs
Lukas Biewald | @L2K
CEO and Founder
CrowdFlower
Overview
What will be covered today?
Use Cases
• Real examples of data collection and data modeling done
by Oracle
Insights
• Train and perfect your algorithms to build sentiment &
other models that classify text
People-Powered Feedback
• Test question infrastructure
• Support for tracking contributor agreement and data quality
Why CrowdFlower?
• Multiple language support
• Worldwide contributor network
• Data enrichment capabilities
Randall Sparks
• Oracle Data Cloud – Social Platform Group
• Use case: Social Media Analytics
• Data Collection, Data Modeling Process
• Use case: Multiple Languages
About Us
• Oracle Data Cloud — Social Platform Group
– Data Service supporting multiple applications
– Monitoring & Analysis of Social Media Streams & other text sources
• Categorization of social media streams to topics +
enrichments
– Key words/phrases, Semantic vectors (LSA)
• Enrichments
– Themes within a topic, related terms appearing in messages
– Demographics, Location, Indicators of intent, etc.
– Sentiment
• Social Relationship Management
(SRM) Product
What We Do
• Collect, filter, & analyze a large volume of streaming social
media content from multiple content sources via multiple
suppliers/aggregators
• Multiple (30+) languages — big data collection challenge
• Process
– Collect content streamed from multiple suppliers/aggregators
– Text filtering, normalization, tokenization, chunking, etc. (NLP)
– “Categorize” messages (match snippets to “Topics”)
– Topics: combinations of keywords/phrases +
semantic filters: vector comparison of words & texts in
“semantic space” using Latent Semantic Analysis (LSA)
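The topic-matching step above pairs a keyword test with a vector comparison in semantic space. Below is a minimal Python sketch of that pairing. It assumes snippet and topic vectors have already been produced by an LSA model (not shown). The function names, the toy three-dimensional vectors, the 0.7 threshold, and the "keyword AND semantic filter" rule are all illustrative assumptions, not Oracle's implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)

def matches_topic(snippet_text, snippet_vec, topic, threshold=0.7):
    """Hypothetical rule: a keyword must appear AND the semantic filter must pass."""
    text = snippet_text.lower()
    keyword_hit = any(kw in text for kw in topic["keywords"])
    semantic_ok = cosine_similarity(snippet_vec, topic["vector"]) >= threshold
    return keyword_hit and semantic_ok

# Toy data: the vectors stand in for LSA output and are made up.
topic = {"keywords": ["subway"], "vector": [0.9, 0.1, 0.0]}  # transit sense
transit_snippet = ("The subway was delayed again", [0.8, 0.2, 0.1])
food_snippet = ("Grabbed lunch at Subway", [0.1, 0.1, 0.9])

transit_match = matches_topic(*transit_snippet, topic)
food_match = matches_topic(*food_snippet, topic)
```

Both snippets contain the keyword, so only the vector comparison separates the transit sense from the restaurant sense in this toy setup.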
Use Case: Social Media Analytics
Keywords/phrases + Semantic filters
Use Case: Social Media Analytics — Example View
• Media Types of matched “snippets”
Why Do We Need Sentiment Data?
• Train sentiment model (Machine Learning)
– Training data: 1000s of human-annotated items
– Features: words
• also: n-grams, phrases, known negation/intensification
patterns, etc.
• punctuation, emoticons, emoji, other metadata
– Various algorithms:
• Decision Trees, Logistic Regression,
Support Vector Machine (SVM), etc.
• Analyze model
– held-out test set
– accuracy, precision/recall, etc.
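The feature types and evaluation measures listed above can be sketched in plain Python. This is an illustration under assumptions: the negation word list, the emoticon pattern, the feature-name prefixes, and the toy gold/predicted labels are invented for the example, and a real pipeline would feed such features to one of the learning algorithms named above.

```python
import re

NEGATIONS = {"not", "no", "never"}          # assumed, deliberately tiny list
EMOTICON_RE = re.compile(r"[:;]-?[()DP]")   # assumed pattern, e.g. :) ;-( :D

def extract_features(text):
    """Return a set of word, bigram, negation, and emoticon features."""
    tokens = re.findall(r"[a-z']+", text.lower())
    features = set(tokens)                                            # words
    features.update(f"{a}_{b}" for a, b in zip(tokens, tokens[1:]))   # bigrams
    features.update(f"NOT_{b}" for a, b in zip(tokens, tokens[1:])
                    if a in NEGATIONS)                                # negation
    features.update(f"EMO_{e}" for e in EMOTICON_RE.findall(text))    # emoticons
    return features

def evaluate(gold, predicted, labels):
    """Accuracy plus per-label (precision, recall) on a held-out test set."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equal length")
    pairs = list(zip(gold, predicted))
    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    per_label = {}
    for label in labels:
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        per_label[label] = (precision, recall)
    return accuracy, per_label

# Toy held-out set: human-annotated labels vs. hypothetical model output.
gold = ["pos", "neg", "neu", "pos", "neg", "pos"]
predicted = ["pos", "neg", "pos", "pos", "neu", "neg"]
accuracy, per_label = evaluate(gold, predicted, ["pos", "neg", "neu"])
```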
Data Collection & Modeling Process
• Generate “gold” test item data:
– Transform into (our) standard format for upload to
CrowdFlower
– Define CrowdFlower job to generate test questions &
upload data
– Run job & download results
– Select “gold” test items based on analysis of contributor
agreement
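Selecting "gold" items by contributor agreement can be sketched as follows. This is a simplified illustration: it uses a plain unweighted majority share, which may differ from whatever aggregate score the platform itself reports, and the thresholds, item ids, and labels are assumptions made up for the example.

```python
from collections import Counter

def agreement(judgments):
    """Return (majority_label, share of judgments that chose it)."""
    label, votes = Counter(judgments).most_common(1)[0]
    return label, votes / len(judgments)

def select_gold(items, min_agreement=0.8, min_judgments=3):
    """Keep items whose majority label reaches the agreement threshold."""
    gold = {}
    for item_id, judgments in items.items():
        if len(judgments) < min_judgments:
            continue  # too few judgments to trust the majority
        label, share = agreement(judgments)
        if share >= min_agreement:
            gold[item_id] = label
    return gold

# Toy downloaded results: item id -> labels given by individual contributors.
results = {
    "t1": ["positive", "positive", "positive", "positive", "positive"],
    "t2": ["positive", "negative", "neutral", "negative", "positive"],
    "t3": ["negative", "negative", "negative", "negative", "neutral"],
    "t4": ["neutral", "neutral"],
}
gold_items = select_gold(results)
```

Here "t1" and "t3" pass the 0.8 threshold, "t2" is too contested, and "t4" has too few judgments.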
Data Collection & Modeling Process (continued)
• Generate full training & test data sets:
– Define main CrowdFlower job, upload data & test items
– Launch & monitor job (remove problematic test questions)
– Download & analyze results
– Select (high-agreement) items for ML sentiment model
training
– Build sentiment model, test, & deploy
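Monitoring a running job includes removing problematic test questions. One hedged way to find candidates for review is to flag test questions that most contributors miss. The data layout, the function name, and the thresholds below are assumptions for illustration, not a CrowdFlower feature or API.

```python
def flag_test_questions(responses, max_miss_rate=0.5, min_responses=5):
    """Flag test questions that contributors miss more often than max_miss_rate.

    responses maps a test-question id to a list of booleans, where True means
    the contributor gave the expected answer.
    """
    flagged = []
    for question_id, outcomes in responses.items():
        if len(outcomes) < min_responses:
            continue  # too little evidence either way
        miss_rate = outcomes.count(False) / len(outcomes)
        if miss_rate > max_miss_rate:
            flagged.append((question_id, miss_rate))
    # Worst offenders first, so a reviewer sees them at the top.
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)

# Toy monitoring snapshot.
monitoring = {
    "q1": [True, True, True, False, True, True],
    "q2": [False, False, True, False, False, False],
    "q3": [False, False],
}
to_review = flag_test_questions(monitoring)
```

A high miss rate does not prove the contributors are wrong; it may mean the expected answer itself needs fixing, so flagged questions are best reviewed by hand.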
An Example Of How We Collect Data
12+ Languages. Target: 30
Pallika Kanani
• About Oracle Labs
• Power of human-annotated data
• Use case – Language understanding
• Use case – Wisdom of the crowd
• Use case – Data quality
Information Retrieval and Machine Learning Group
• Strong research program, publications
• Develop core Information Retrieval, Statistical Natural
Language Processing and Machine Learning
technologies
• Help solve complex and challenging business problems
across Oracle
• Utilize CrowdFlower platform for a wide variety of
relevance ranking and NLP problems
Data Annotation
• First step in building search
/ NLP / machine learning
application
• Many Machine Learning
techniques require some
human-annotated data
• Even for unsupervised
methods, need annotated
data for proper evaluation
Use Case: Language Understanding
• Goal: Get a better understanding of what our customers
are talking about
• Extract useful information from raw text
• Language is all about context: Disambiguating extracted
information is crucial, and people are good at
understanding context
– Are people talking about New York subway or
Subway, the restaurant?
CrowdFlower as a data enrichment platform
Before
• Data collection for Machine Learning used to be tedious
– Long iterations, typically lasting weeks or months
– Prohibitively high costs
– Difficult to innovate, leading to overfitting to existing corpora
After
• Try out new tasks at previously unimaginable speed
• Designing a job for a new NLP task can take as little as a day;
getting results can be a matter of hours
• Rapid prototyping due to affordable cost for early trials
(and final data collection)
Rapid Feedback
• Rapid debugging of the data collection process
• Works like debugging software with humans in the loop
Wisdom of the Crowd
• Incorrect test questions
due to lack of
knowledge of pop
culture
• The crowd set me
straight
“’Say Something’ is the name of a
song. Please fix your test
question”
Data Quality
• Good quality data
even for tricky tasks
• Example: Ran a task
for finding relevant
URLs from Wikipedia,
and got excellent
results
TWITTER.COM/CrowdFlower | INFO@CROWDFLOWER.COM | CROWDFLOWER.COM
Q & A
What’s next?
• Look out for a follow-up email with a copy of these
slides, a recording of the webinar, a Q&A recap, and
other fun stuff
• View and share this presentation on SlideShare
- Follow us for more events like this
• Next webinar:
- CrowdFlower User Webinar: Graphical Editor and Visual
Reports
- September 10th 2015 – 10:00 AM PST
- Register at: http://www.crowdflower.com/events
Rich Data Summit
What is Rich Data Summit?
The leading conference for data scientists
focused on turning big data into rich,
meaningful data
• Data Scientists – 300+
• Sessions focused on Data Science – 5
• Hands-on Workshops – 9
Qualified webinar attendees will receive a 30%
discount coupon
Interested? Email us at
conference@crowdflower.com
www.richdatasummit.com
@RichDataSummit
Thank you.

More Related Content

What's hot

LavaCon 2013 Content Audit in Three Simple Steps
LavaCon 2013 Content Audit in Three Simple StepsLavaCon 2013 Content Audit in Three Simple Steps
LavaCon 2013 Content Audit in Three Simple Steps
Allison Joyce
 
O365Con19 - Office 365 Groups Surviving the Real World - Jasper Oosterveld
O365Con19 - Office 365 Groups Surviving the Real World - Jasper OosterveldO365Con19 - Office 365 Groups Surviving the Real World - Jasper Oosterveld
O365Con19 - Office 365 Groups Surviving the Real World - Jasper Oosterveld
NCCOMMS
 
LavaCon 2013 presentation: Building Content Collaboration at LSI Corporation ...
LavaCon 2013 presentation: Building Content Collaboration at LSI Corporation ...LavaCon 2013 presentation: Building Content Collaboration at LSI Corporation ...
LavaCon 2013 presentation: Building Content Collaboration at LSI Corporation ...
Vasont Systems
 
SpringIO 2016 - Spring Cloud MicroServices, a journey inside a financial entity
SpringIO 2016 - Spring Cloud MicroServices, a journey inside a financial entitySpringIO 2016 - Spring Cloud MicroServices, a journey inside a financial entity
SpringIO 2016 - Spring Cloud MicroServices, a journey inside a financial entity
jordigilnieto
 
Presentation at Bio IT World West: To AI or Not to AI, Presented by Simon Tay...
Presentation at Bio IT World West: To AI or Not to AI, Presented by Simon Tay...Presentation at Bio IT World West: To AI or Not to AI, Presented by Simon Tay...
Presentation at Bio IT World West: To AI or Not to AI, Presented by Simon Tay...
Lucidworks
 
Frank Bien Opening Keynote - Join 2016
Frank Bien Opening Keynote - Join 2016Frank Bien Opening Keynote - Join 2016
Frank Bien Opening Keynote - Join 2016
Looker
 
Advanced Analytics Implementations at EA scale
Advanced Analytics Implementations at EA scaleAdvanced Analytics Implementations at EA scale
Advanced Analytics Implementations at EA scale
Ani Lopez
 
H2O World - Quora: Machine Learning Algorithms to Grow the World's Knowledge ...
H2O World - Quora: Machine Learning Algorithms to Grow the World's Knowledge ...H2O World - Quora: Machine Learning Algorithms to Grow the World's Knowledge ...
H2O World - Quora: Machine Learning Algorithms to Grow the World's Knowledge ...
Sri Ambati
 
Measurement Roadmap
Measurement RoadmapMeasurement Roadmap
Measurement Roadmap
Ani Lopez
 
Graphs in Life Sciences
Graphs in Life SciencesGraphs in Life Sciences
Graphs in Life Sciences
Neo4j
 
The Next Generation of AI-Powered Search
The Next Generation of AI-Powered SearchThe Next Generation of AI-Powered Search
The Next Generation of AI-Powered Search
Lucidworks
 
Good Help is Hard to Find
Good Help is Hard to FindGood Help is Hard to Find
Good Help is Hard to Find
Elaine Meyer
 
H2O World - Machine Learning for non-data scientists
H2O World - Machine Learning for non-data scientistsH2O World - Machine Learning for non-data scientists
H2O World - Machine Learning for non-data scientists
Sri Ambati
 
Introduction to Neo4j for the Emirates & Bahrain
Introduction to Neo4j for the Emirates & BahrainIntroduction to Neo4j for the Emirates & Bahrain
Introduction to Neo4j for the Emirates & Bahrain
Neo4j
 
Prototyping like it is 2022
Prototyping like it is 2022 Prototyping like it is 2022
Prototyping like it is 2022
Michael Yagudaev
 
How to Make your Graph DB Project Successful with Neo4j Services
How to Make your Graph DB Project Successful with Neo4j ServicesHow to Make your Graph DB Project Successful with Neo4j Services
How to Make your Graph DB Project Successful with Neo4j Services
Neo4j
 
GraphTalk Helsinki - Introduction to Graphs and Neo4j
GraphTalk Helsinki - Introduction to Graphs and Neo4jGraphTalk Helsinki - Introduction to Graphs and Neo4j
GraphTalk Helsinki - Introduction to Graphs and Neo4j
Neo4j
 
Hadoop Meets Scrum
Hadoop Meets ScrumHadoop Meets Scrum
Hadoop Meets Scrum
Rommel Garcia
 
Neo4j 4 Overview
Neo4j 4 OverviewNeo4j 4 Overview
Neo4j 4 Overview
Neo4j
 
Tips in migrating to SharePoint 2016 or O365, to avoid a migration headache
Tips in migrating to SharePoint 2016 or O365, to avoid a migration headacheTips in migrating to SharePoint 2016 or O365, to avoid a migration headache
Tips in migrating to SharePoint 2016 or O365, to avoid a migration headache
Mike Maadarani
 

What's hot (20)

LavaCon 2013 Content Audit in Three Simple Steps
LavaCon 2013 Content Audit in Three Simple StepsLavaCon 2013 Content Audit in Three Simple Steps
LavaCon 2013 Content Audit in Three Simple Steps
 
O365Con19 - Office 365 Groups Surviving the Real World - Jasper Oosterveld
O365Con19 - Office 365 Groups Surviving the Real World - Jasper OosterveldO365Con19 - Office 365 Groups Surviving the Real World - Jasper Oosterveld
O365Con19 - Office 365 Groups Surviving the Real World - Jasper Oosterveld
 
LavaCon 2013 presentation: Building Content Collaboration at LSI Corporation ...
LavaCon 2013 presentation: Building Content Collaboration at LSI Corporation ...LavaCon 2013 presentation: Building Content Collaboration at LSI Corporation ...
LavaCon 2013 presentation: Building Content Collaboration at LSI Corporation ...
 
SpringIO 2016 - Spring Cloud MicroServices, a journey inside a financial entity
SpringIO 2016 - Spring Cloud MicroServices, a journey inside a financial entitySpringIO 2016 - Spring Cloud MicroServices, a journey inside a financial entity
SpringIO 2016 - Spring Cloud MicroServices, a journey inside a financial entity
 
Presentation at Bio IT World West: To AI or Not to AI, Presented by Simon Tay...
Presentation at Bio IT World West: To AI or Not to AI, Presented by Simon Tay...Presentation at Bio IT World West: To AI or Not to AI, Presented by Simon Tay...
Presentation at Bio IT World West: To AI or Not to AI, Presented by Simon Tay...
 
Frank Bien Opening Keynote - Join 2016
Frank Bien Opening Keynote - Join 2016Frank Bien Opening Keynote - Join 2016
Frank Bien Opening Keynote - Join 2016
 
Advanced Analytics Implementations at EA scale
Advanced Analytics Implementations at EA scaleAdvanced Analytics Implementations at EA scale
Advanced Analytics Implementations at EA scale
 
H2O World - Quora: Machine Learning Algorithms to Grow the World's Knowledge ...
H2O World - Quora: Machine Learning Algorithms to Grow the World's Knowledge ...H2O World - Quora: Machine Learning Algorithms to Grow the World's Knowledge ...
H2O World - Quora: Machine Learning Algorithms to Grow the World's Knowledge ...
 
Measurement Roadmap
Measurement RoadmapMeasurement Roadmap
Measurement Roadmap
 
Graphs in Life Sciences
Graphs in Life SciencesGraphs in Life Sciences
Graphs in Life Sciences
 
The Next Generation of AI-Powered Search
The Next Generation of AI-Powered SearchThe Next Generation of AI-Powered Search
The Next Generation of AI-Powered Search
 
Good Help is Hard to Find
Good Help is Hard to FindGood Help is Hard to Find
Good Help is Hard to Find
 
H2O World - Machine Learning for non-data scientists
H2O World - Machine Learning for non-data scientistsH2O World - Machine Learning for non-data scientists
H2O World - Machine Learning for non-data scientists
 
Introduction to Neo4j for the Emirates & Bahrain
Introduction to Neo4j for the Emirates & BahrainIntroduction to Neo4j for the Emirates & Bahrain
Introduction to Neo4j for the Emirates & Bahrain
 
Prototyping like it is 2022
Prototyping like it is 2022 Prototyping like it is 2022
Prototyping like it is 2022
 
How to Make your Graph DB Project Successful with Neo4j Services
How to Make your Graph DB Project Successful with Neo4j ServicesHow to Make your Graph DB Project Successful with Neo4j Services
How to Make your Graph DB Project Successful with Neo4j Services
 
GraphTalk Helsinki - Introduction to Graphs and Neo4j
GraphTalk Helsinki - Introduction to Graphs and Neo4jGraphTalk Helsinki - Introduction to Graphs and Neo4j
GraphTalk Helsinki - Introduction to Graphs and Neo4j
 
Hadoop Meets Scrum
Hadoop Meets ScrumHadoop Meets Scrum
Hadoop Meets Scrum
 
Neo4j 4 Overview
Neo4j 4 OverviewNeo4j 4 Overview
Neo4j 4 Overview
 
Tips in migrating to SharePoint 2016 or O365, to avoid a migration headache
Tips in migrating to SharePoint 2016 or O365, to avoid a migration headacheTips in migrating to SharePoint 2016 or O365, to avoid a migration headache
Tips in migrating to SharePoint 2016 or O365, to avoid a migration headache
 

Similar to How Oracle Uses CrowdFlower For Sentiment Analysis

Relevancy and Search Quality Analysis - Search Technologies
Relevancy and Search Quality Analysis - Search TechnologiesRelevancy and Search Quality Analysis - Search Technologies
Relevancy and Search Quality Analysis - Search Technologies
enterprisesearchmeetup
 
Democratizing Data Science in the Enterprise
Democratizing Data Science in the EnterpriseDemocratizing Data Science in the Enterprise
Democratizing Data Science in the Enterprise
Jesus Rodriguez
 
Citihub Open Source and Cloud approach to Social Media Listening
Citihub Open Source and Cloud approach to Social Media ListeningCitihub Open Source and Cloud approach to Social Media Listening
Citihub Open Source and Cloud approach to Social Media Listening
Chris Allison
 
From SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the SwitchFrom SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the Switch
Rachel Berryman
 
POWRR Tools: Lessons learned from an IMLS National Leadership Grant
POWRR Tools: Lessons learned from an IMLS National Leadership GrantPOWRR Tools: Lessons learned from an IMLS National Leadership Grant
POWRR Tools: Lessons learned from an IMLS National Leadership Grant
Lynne Thomas
 
Implimenting and Mitigating Change with all of this Newfangled Technology
Implimenting and Mitigating Change with all of this Newfangled TechnologyImplimenting and Mitigating Change with all of this Newfangled Technology
Implimenting and Mitigating Change with all of this Newfangled Technology
Indiana Online Users Group
 
Enterprise search Information
Enterprise search Information Enterprise search Information
Enterprise search Information
Netwoven Inc.
 
Tableau Conference 2014 Presentation
Tableau Conference 2014 PresentationTableau Conference 2014 Presentation
Tableau Conference 2014 Presentation
krystalstjulien
 
Incentivising the uptake of reusable metadata in the survey production process
Incentivising the uptake of reusable metadata in the survey production processIncentivising the uptake of reusable metadata in the survey production process
Incentivising the uptake of reusable metadata in the survey production process
Louise Corti
 
Webinar: Question Answering and Virtual Assistants with Deep Learning
Webinar: Question Answering and Virtual Assistants with Deep LearningWebinar: Question Answering and Virtual Assistants with Deep Learning
Webinar: Question Answering and Virtual Assistants with Deep Learning
Lucidworks
 
Bridging the Gap Between Data Science & Engineer: Building High-Performance T...
Bridging the Gap Between Data Science & Engineer: Building High-Performance T...Bridging the Gap Between Data Science & Engineer: Building High-Performance T...
Bridging the Gap Between Data Science & Engineer: Building High-Performance T...
ryanorban
 
Lean Analytics: How to get more out of your data science team
Lean Analytics: How to get more out of your data science teamLean Analytics: How to get more out of your data science team
Lean Analytics: How to get more out of your data science team
Digital Transformation EXPO Event Series
 
TOUG Big Data Challenge and Impact
TOUG Big Data Challenge and ImpactTOUG Big Data Challenge and Impact
TOUG Big Data Challenge and Impact
Toronto-Oracle-Users-Group
 
Deep learning for NLP
Deep learning for NLPDeep learning for NLP
Deep learning for NLP
Shishir Choudhary
 
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Sonya Liberman
 
The Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedThe Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They Need
Dunn Solutions Group
 
Techniques to build, engage and manage your intranet project
Techniques to build, engage and manage your intranet projectTechniques to build, engage and manage your intranet project
Techniques to build, engage and manage your intranet project
Rebecca Jackson
 
Building NLP solutions for Davidson ML Group
Building NLP solutions for Davidson ML GroupBuilding NLP solutions for Davidson ML Group
Building NLP solutions for Davidson ML Group
botsplash.com
 
Building successful data science teams
Building successful data science teamsBuilding successful data science teams
Building successful data science teams
Venkatesh Umaashankar
 
STC Information Topology
STC Information TopologySTC Information Topology
STC Information Topology
TyrinAvery1
 

Similar to How Oracle Uses CrowdFlower For Sentiment Analysis (20)

Relevancy and Search Quality Analysis - Search Technologies
Relevancy and Search Quality Analysis - Search TechnologiesRelevancy and Search Quality Analysis - Search Technologies
Relevancy and Search Quality Analysis - Search Technologies
 
Democratizing Data Science in the Enterprise
Democratizing Data Science in the EnterpriseDemocratizing Data Science in the Enterprise
Democratizing Data Science in the Enterprise
 
Citihub Open Source and Cloud approach to Social Media Listening
Citihub Open Source and Cloud approach to Social Media ListeningCitihub Open Source and Cloud approach to Social Media Listening
Citihub Open Source and Cloud approach to Social Media Listening
 
From SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the SwitchFrom SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the Switch
 
POWRR Tools: Lessons learned from an IMLS National Leadership Grant
POWRR Tools: Lessons learned from an IMLS National Leadership GrantPOWRR Tools: Lessons learned from an IMLS National Leadership Grant
POWRR Tools: Lessons learned from an IMLS National Leadership Grant
 
Implimenting and Mitigating Change with all of this Newfangled Technology
Implimenting and Mitigating Change with all of this Newfangled TechnologyImplimenting and Mitigating Change with all of this Newfangled Technology
Implimenting and Mitigating Change with all of this Newfangled Technology
 
Enterprise search Information
Enterprise search Information Enterprise search Information
Enterprise search Information
 
Tableau Conference 2014 Presentation
Tableau Conference 2014 PresentationTableau Conference 2014 Presentation
Tableau Conference 2014 Presentation
 
Incentivising the uptake of reusable metadata in the survey production process
Incentivising the uptake of reusable metadata in the survey production processIncentivising the uptake of reusable metadata in the survey production process
Incentivising the uptake of reusable metadata in the survey production process
 
Webinar: Question Answering and Virtual Assistants with Deep Learning
Webinar: Question Answering and Virtual Assistants with Deep LearningWebinar: Question Answering and Virtual Assistants with Deep Learning
Webinar: Question Answering and Virtual Assistants with Deep Learning
 
Bridging the Gap Between Data Science & Engineer: Building High-Performance T...
Bridging the Gap Between Data Science & Engineer: Building High-Performance T...Bridging the Gap Between Data Science & Engineer: Building High-Performance T...
Bridging the Gap Between Data Science & Engineer: Building High-Performance T...
 
Lean Analytics: How to get more out of your data science team
Lean Analytics: How to get more out of your data science teamLean Analytics: How to get more out of your data science team
Lean Analytics: How to get more out of your data science team
 
TOUG Big Data Challenge and Impact
TOUG Big Data Challenge and ImpactTOUG Big Data Challenge and Impact
TOUG Big Data Challenge and Impact
 
Deep learning for NLP
Deep learning for NLPDeep learning for NLP
Deep learning for NLP
 
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
 
The Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedThe Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They Need
 
Techniques to build, engage and manage your intranet project
Techniques to build, engage and manage your intranet projectTechniques to build, engage and manage your intranet project
Techniques to build, engage and manage your intranet project
 
Building NLP solutions for Davidson ML Group
Building NLP solutions for Davidson ML GroupBuilding NLP solutions for Davidson ML Group
Building NLP solutions for Davidson ML Group
 
Building successful data science teams
Building successful data science teamsBuilding successful data science teams
Building successful data science teams
 
STC Information Topology
STC Information TopologySTC Information Topology
STC Information Topology
 

More from CrowdFlower

7 Myths of AI
7 Myths of AI7 Myths of AI
7 Myths of AI
CrowdFlower
 
Active Learning and Human-in-the-Loop
Active Learning and Human-in-the-LoopActive Learning and Human-in-the-Loop
Active Learning and Human-in-the-Loop
CrowdFlower
 
Open Data Science Conference 2015
Open Data Science Conference 2015Open Data Science Conference 2015
Open Data Science Conference 2015
CrowdFlower
 
Productive Out-of-the-Box | Tooling with Yeoman to Rapidly Develop Ember.js A...
Productive Out-of-the-Box | Tooling with Yeoman to Rapidly Develop Ember.js A...Productive Out-of-the-Box | Tooling with Yeoman to Rapidly Develop Ember.js A...
Productive Out-of-the-Box | Tooling with Yeoman to Rapidly Develop Ember.js A...
CrowdFlower
 
Virtual Data Steward: Data Management 3.0
Virtual Data Steward: Data Management 3.0Virtual Data Steward: Data Management 3.0
Virtual Data Steward: Data Management 3.0
CrowdFlower
 
Expert Crowdsourcing with Flash Teams | CrowdConf 2013 poster
Expert Crowdsourcing with Flash Teams | CrowdConf 2013 posterExpert Crowdsourcing with Flash Teams | CrowdConf 2013 poster
Expert Crowdsourcing with Flash Teams | CrowdConf 2013 posterCrowdFlower
 
The State of Enterprise Crowdsourcing 2013
The State of Enterprise Crowdsourcing 2013The State of Enterprise Crowdsourcing 2013
The State of Enterprise Crowdsourcing 2013CrowdFlower
 

More from CrowdFlower (7)

7 Myths of AI
7 Myths of AI7 Myths of AI
7 Myths of AI
 
Active Learning and Human-in-the-Loop
Active Learning and Human-in-the-LoopActive Learning and Human-in-the-Loop
Active Learning and Human-in-the-Loop
 
Open Data Science Conference 2015
Open Data Science Conference 2015Open Data Science Conference 2015
Open Data Science Conference 2015
 
Productive Out-of-the-Box | Tooling with Yeoman to Rapidly Develop Ember.js A...
Productive Out-of-the-Box | Tooling with Yeoman to Rapidly Develop Ember.js A...Productive Out-of-the-Box | Tooling with Yeoman to Rapidly Develop Ember.js A...
Productive Out-of-the-Box | Tooling with Yeoman to Rapidly Develop Ember.js A...
 
Virtual Data Steward: Data Management 3.0
Virtual Data Steward: Data Management 3.0Virtual Data Steward: Data Management 3.0
Virtual Data Steward: Data Management 3.0
 
Expert Crowdsourcing with Flash Teams | CrowdConf 2013 poster
Expert Crowdsourcing with Flash Teams | CrowdConf 2013 posterExpert Crowdsourcing with Flash Teams | CrowdConf 2013 poster
Expert Crowdsourcing with Flash Teams | CrowdConf 2013 poster
 
The State of Enterprise Crowdsourcing 2013
The State of Enterprise Crowdsourcing 2013The State of Enterprise Crowdsourcing 2013
The State of Enterprise Crowdsourcing 2013
 

Recently uploaded

Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
2023240532
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 

How Oracle Uses CrowdFlower For Sentiment Analysis

  • 1. How Oracle Uses CrowdFlower's Data Enrichment Platform For Sentiment Analysis
  • 2. Before we get started #RichData The housekeeping items: • Webinar slides, recording, and Q&A will be emailed • Enter questions in chat on webinar panel • Or ask your questions on Twitter - @CrowdFlower - Use #RichData
  • 3. Meet the Data Scientists Randall Sparks Principal Member of Technical Staff Oracle Data Cloud – Social Platform Group Pallika Kanani Senior Research Staff Member Oracle Labs Lukas Biewald | @L2K CEO and Founder CrowdFlower #RichData
  • 4. • Test Question Infrastructure • Support for tracking contributor agreement and data quality People-Powered Feedback Overview What will be covered today? Train and perfect your algorithms to build sentiment & other models that classify text • Multiple language support • World-wide contributor network • Data enrichment capabilities Insights Why CrowdFlower? Real examples of data collection, data modeling done by Oracle Use Cases #RichData
  • 5. #RichData Randall Sparks • Oracle Data Cloud – Social Platform Group • Use case: Social Media Analytics • Data Collection, Data Modeling Process • Use case: Multiple Languages
  • 6. About Us • Oracle Data Cloud — Social Platform Group – Data Service supporting multiple applications – Monitoring & Analysis of Social Media Streams & other text sources • Categorization of social media streams to topics + enrichments – Key words/phrases, Semantic vectors (LSA) • Enrichments – Themes within a topic, related terms appearing in messages – Demographics, Location, Indicators of intent, etc. – Sentiment • Social Relationship Management (SRM) Product #RichData
  • 7. What We Do • Collect, filter, & analyze a large volume of streaming social media content from multiple content sources via multiple suppliers/aggregators • Multiple (30+) languages — big data collection challenge • Process – Collect content streamed from multiple suppliers/aggregators – Text filtering, normalization, tokenization, chunking, etc. (NLP) – “Categorize” messages (match snippets to “Topics”) – Topics: combinations of keywords/phrases + semantic filters: vector comparison of words & texts in “semantic space” using Latent Semantic Analysis (LSA) #RichData
  • 8. Use Case: Social Media Analytics Keywords/phrases + Semantic filters #RichData
  • 9. Use Case: Social Media Analytics — Example View #RichData
  • 10. Use Case: Social Media Analytics — Example View #RichData
  • 11. Use Case: Social Media Analytics — Example View #RichData
  • 12. Use Case: Social Media Analytics — Example View #RichData
  • 13. Use Case: Social Media Analytics — Example View • Media Types of matched “snippets” #RichData
  • 14. Why We Need Sentiment Data • Train sentiment model (Machine Learning) – Training data: 1000s of human-annotated items – Features: words • also: n-grams, phrases, known negation/intensification patterns, etc. • punctuation, emoticons, emoji, other metadata – Various algorithms: • Decision Trees, Logistic Regression, Support Vector Machine (SVM), etc. • Analyze model – held-out test set – accuracy, precision/recall, etc. #RichData
  • 15. Data Collection & Modeling Process • Generate “gold” test item data: – Transform into (our) standard format for upload to CrowdFlower – Define CrowdFlower job to generate test questions & upload data – Run job & download results – Select “gold” test items based on analysis of contributor agreement #RichData
  • 16. • Generate full training & test data sets: – Define main CrowdFlower job, upload data & test items – Launch & monitor job (remove problematic test questions) – Download & analyze results – Select (high-agreement) items for ML sentiment model training – Build sentiment model, test, & deploy Data Collection & Modeling Process (continued) #RichData
  • 17. An Example Of How We Collect Data #RichData
  • 18. 12+ Languages. Target: 30 #RichData
  • 19. #RichData Pallika Kanani • About Oracle Labs • Power of human-annotated data • Use case – Language understanding • Use case – Wisdom of the crowd • Use case – Data quality
  • 20. #RichData Information Retrieval and Machine Learning Group • Strong research program, publications • Develop core Information Retrieval, Statistical Natural Language Processing and Machine Learning technologies • Help solve complex and challenging business problems across Oracle • Utilize CrowdFlower platform for a wide variety of relevance ranking and NLP problems
  • 21. Data Annotation • First step in building search / NLP / machine learning application • Many Machine Learning techniques require some human-annotated data • Even for unsupervised methods, need annotated data for proper evaluation #RichData
  • 22. Use Case: Language Understanding • Goal: Get a better understanding of what our customers are talking about • Extract useful information from raw text • Language is all about context: Disambiguating extracted information is crucial, and people are good at understanding context – Are people talking about New York subway or Subway, the restaurant? #RichData
  • 23. CrowdFlower as a data enrichment platform • Data collection for Machine Learning used to be tedious – Long iterations typically lasting weeks and months – Prohibitively high costs – Difficult to innovate → overfitting to existing corpora • Try out new tasks at previously unimaginable speed • Designing a job for a new NLP task can take as little as a day; getting results can be a matter of hours • Rapid prototyping due to affordable cost for early trials (and final data collection) Before After #RichData
  • 24. Rapid Feedback • Rapid debugging of the data collection process • Works like debugging software with humans in the loop #RichData
  • 25. Wisdom of the Crowd • Incorrect test questions due to lack of knowledge of pop culture • The crowd set me straight “’Say Something’ is the name of a song. Please fix your test question” #RichData
  • 26. Data Quality • Good quality data even for tricky tasks • Example: Ran a task for finding relevant URLs from Wikipedia, and got excellent results #RichData
  • 28. What’s next? • Look out for a follow-up email with a copy of these slides, a recording of the webinar, Q&A recap, and other fun stuff • View and share this presentation on Slideshare - Follow us for more such events • Next webinar: - CrowdFlower User Webinar: Graphical Editor and Visual Reports - September 10th 2015 – 10:00 AM PST - Register at: http://www.crowdflower.com/events #RichData
  • 29. Rich Data Summit What is Rich Data Summit? The leading conference for data scientists focused on turning big data into rich, meaningful data • Data Scientists – 300+ • Sessions focused on Data Science – 5 • Hands-on Workshops – 9 Qualified webinar attendees will receive a 30% discount coupon Interested? Email us at conference@crowdflower.com www.richdatasummit.com @RichDataSummit #RichData
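
Slides 15 and 16 describe selecting "gold" test items and training items based on contributor agreement. The slides do not say how agreement was computed, so the following is only a minimal sketch under stated assumptions: agreement is taken as the plain, unweighted fraction of contributors who chose the majority label, and the data layout, the function names, and the 0.8 threshold are all hypothetical. It does not use the CrowdFlower API or reproduce the platform's own scoring.

```python
from collections import Counter


def majority_agreement(labels):
    """Return (most common label, fraction of judgments that chose it)."""
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(labels)


def select_high_agreement(judgments, threshold=0.8):
    """Keep items whose majority label reaches the agreement threshold.

    The threshold value is an assumption for illustration only.
    """
    selected = {}
    for item_id, labels in judgments.items():
        if not labels:
            continue  # no judgments were collected for this item
        label, agreement = majority_agreement(labels)
        if agreement >= threshold:
            selected[item_id] = label
    return selected


# Hypothetical judgments: item id -> labels given by different contributors.
judgments = {
    "msg-1": ["positive", "positive", "positive", "positive", "positive"],
    "msg-2": ["negative", "negative", "neutral", "negative", "negative"],
    "msg-3": ["positive", "neutral", "negative", "neutral", "positive"],
}

training_items = select_high_agreement(judgments, threshold=0.8)
print(training_items)
```

In this toy data, "msg-3" is dropped because no label wins a clear majority. Raising the threshold trades data volume for label reliability, which is the kind of choice the "select (high-agreement) items" step on slide 16 implies.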