How computers understand text content
a presentation for the Auckland content strategy meetup
by Anna Divoli
@annadivoli
Ph.D. in Biomedical Text Mining | Text Analytics Researcher | Head of R&D at Pingar
Who am I?
• 14 years in academia + 4 years in industry
• academically exposed to different disciplines:
biomedicine, bioinformatics,
computational linguistics, information retrieval,
information extraction, semantic technologies,
human-computer interaction, search user interface usability,
knowledge acquisition, visualizations
• lived in different countries:
Greece, UK, US, NZ
• learned English as a second language
(hint: I empathize with computer systems)
Anna Divoli Auckland content strategy meetup Aug 2015
Who are you?
• Marketing?
• Digital content?
• Information Architecture?
• Journalists?
• UX?
• Business Analysis?
• Software Development?
• CS research (incl. “text” people)?
• Other?
What is “text”? Where is it?
www.nailingit.com/images/websites.jpg
www.bu.edu/today/files/2012/10/t_journals1.jpg
web.clarku.edu/offices/its/images/filepile.jpg
www.flickr.com/photos/jlconfor/14191286471
Human – Text Content Interaction
Humans:
Slow, Inconsistent, Expensive
Text content:
Overwhelmingly fast growing,
Disseminated across multiple sources
NLP ∈ Artificial Intelligence
[Diagram labels: Machine Learning, NLP, Computational Linguistics, Applied Text Analytics]
[Diagram labels: Storage, Memory, Security, Friendly UIs, Visualizations]
So, what’s in the text?
• Entities
• Facts
• Relations
• Themes/topics
• Opinions & sentiment
• …
+ Time/Location dimensions:
• Trends & paradigm shifts
• Networks
• …
Named Entity Recognition
Find and classify names…
S. Arlington initiated partnership discussions during his visit to
Eureka’s Ltd offices last month.
John Smith went to Washington to see the Smithsonian and also
met up with Virginia for a coffee.
People
Locations
Organizations
Methods:
• lexicon-based (gazetteers)
• grammar-based (rule-based)
• ✓ statistical models (machine learning: algorithms + features)
• ✓ hybrids
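As a rough illustration of the lexicon-based approach only, here is a minimal Python sketch. The `GAZETTEER` lists and the `tag_entities` helper are hypothetical and hand-made for the example sentence; they are not taken from any real system.

```python
import re

# Hypothetical gazetteers: tiny hand-made name lists, for illustration only.
GAZETTEER = {
    "Person": ["John Smith", "S. Arlington", "Virginia"],
    "Location": ["Washington", "Virginia"],
    "Organization": ["Eureka", "Smithsonian"],
}

def tag_entities(text):
    """Return (start, surface, candidate_labels) for every gazetteer hit, in text order."""
    hits = {}
    for label, names in GAZETTEER.items():
        for name in names:
            # Match the name only when it is not embedded in a longer word.
            pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
            for m in re.finditer(pattern, text):
                hits.setdefault((m.start(), name), set()).add(label)
    return [(start, name, sorted(labels))
            for (start, name), labels in sorted(hits.items())]

sentence = ("John Smith went to Washington to see the Smithsonian "
            "and also met up with Virginia for a coffee.")
for start, name, labels in tag_entities(sentence):
    print(start, name, labels)
```

In this sketch "Virginia" comes back with two candidate labels (Location and Person), because a lookup alone cannot decide between them. That kind of gap is one reason to combine lexicons with rules or statistical models.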
Entity types: People, Locations, Organizations, Dates
Questions answered: Who? Where? When?
Disambiguation & Normalization:
Word Sense Disambiguation & Text Normalization
S. Arlington initiated partnership discussions during his visit to
Eureka’s Ltd offices last month.
John Smith went to Washington to see the Smithsonian and also
met up with Virginia for a coffee.
Word Sense Disambiguation: identifying which sense/meaning
of a word is used in a sentence, when the word has multiple
meanings. Synonyms & homonyms. Use context!!
Text normalization: transforming text into a single canonical
form that it might not have had before.
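Both ideas can be sketched in a few lines of Python. This is a toy under stated assumptions: the cue words in `SENSES`, the `ALIASES` table, and the functions `disambiguate` and `normalize` are all hypothetical names made up for the example. The scoring is a simple context-overlap count, not a production disambiguation method.

```python
import re

# Hypothetical cue words per sense; a real system would learn these
# from data or take them from a lexical resource.
SENSES = {
    "Virginia": {
        "Person": {"met", "with", "coffee", "said", "her"},
        "Location": {"went", "to", "in", "state", "visit"},
    },
}

# Hypothetical alias table mapping surface forms to one canonical form.
ALIASES = {
    "Washington": "Washington DC",
    "Smithsonian": "Smithsonian Institute",
}

def disambiguate(word, sentence, window=3):
    """Pick the sense whose cue words overlap most with the nearby context."""
    tokens = re.findall(r"\w+", sentence)
    i = tokens.index(word)
    neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
    context = {t.lower() for t in neighbours}
    scores = {sense: len(cues & context) for sense, cues in SENSES[word].items()}
    return max(scores, key=scores.get), scores

def normalize(text):
    """Rewrite each known alias to its canonical form (apply once per text)."""
    for alias, canonical in ALIASES.items():
        text = re.sub(r"\b" + re.escape(alias) + r"\b", canonical, text)
    return text

sentence = ("John Smith went to Washington to see the Smithsonian "
            "and also met up with Virginia for a coffee.")
print(disambiguate("Virginia", sentence))
print(normalize(sentence))
```

The words around "Virginia" ("met up with", "for a coffee") are what push the toy scorer toward the Person sense. This is the "use context" point in miniature.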
Word Sense Disambiguation
& Text Normalization
S. Arlington initiated partnership discussions during his visit to
Eureka’s Ltd offices last month.
Sam Arlington initiated partnership discussions during his visit to
Eureka offices in July.
John Smith went to Washington to see the Smithsonian and also
met up with Virginia for a coffee.
J. Smith went to Washington DC to see the Smithsonian Institute
and also met up with Virginia Peterson for a coffee.
Fact & Relationship extraction
S. Arlington initiated partnership discussions during his visit to
Eureka’s Ltd offices last month.
John Smith went to Washington to see the Smithsonian and also
met up with Virginia for a coffee.
What?
Deeper knowledge & Sentiment
S. Arlington initiated partnership discussions during his visit to
Eureka’s Ltd offices last month.
John Smith went to Washington to see the Smithsonian and also
met up with Virginia for a coffee.
How? Why? How do we feel about it?
S. Arlington visited the Eureka’s Ltd offices last month to initiate
partnership discussions.
John Smith was delighted to go to Washington to see the
Smithsonian and also met up with Virginia for a coffee.
Sentiment analysis & opinion mining
• Dictionary-based (e.g. LIWC)
• Statistical
• Hybrid
• Polarity & strength: Pos | Neu | Neg & score
• Feelings: Angry, sad…
• Mood: Happy, depressed…
• Aspects: Location, cleanliness…
• Who has this sentiment (source): Employees, customers…
• What is the target of the sentiment: Product, event, person…
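A dictionary-based scorer for polarity and strength might look like the Python sketch below. The `LEXICON` weights, the `NEGATORS` set and the `polarity` function are hypothetical and invented for illustration. This is not LIWC or any real lexicon.

```python
import re

# Hypothetical mini-lexicon: word -> polarity strength.
LEXICON = {"delighted": 2, "happy": 1, "love": 2,
           "sad": -1, "upset": -1, "angry": -2, "crap": -2}
NEGATORS = {"not", "no", "never"}

def polarity(text):
    """Return (label, score): Pos / Neu / Neg plus a summed strength."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            value = LEXICON[tok]
            # Flip the polarity when the previous token is a negator.
            if i > 0 and tokens[i - 1] in NEGATORS:
                value = -value
            score += value
    label = "Pos" if score > 0 else "Neg" if score < 0 else "Neu"
    return label, score

print(polarity("John Smith was delighted to go to Washington."))
print(polarity("I am not happy."))
print(polarity("George W Bush. Love him!"))
```

A word list like this scores "Love him!" as positive whatever the writer intended. Sarcasm is one of the cases where dictionary methods need help from statistical or hybrid approaches.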
So, what’s in the text?
• Entities
• Facts
• Relations
• Themes/topics → no training or ontologies needed!
can utilize web resources (e.g., Wikipedia)
• Opinions & sentiment
• …
+ Time/Location dimensions:
• Trends & paradigm shifts
• Networks
• …
So, what ELSE is in the text?
• Ambiguity: I want an apple.
• Metaphors: He drowned in a sea of grief.
• Sarcasm: George W Bush. Love him!
• Colloquialism/Slang: I slept like crap last night.
• Negation: I am not sure I want to go to NYC.
• Hedging: The results indicate this.
• Conditional statements: When it rains I feel sad.
• Inconsistencies/Bad grammar: I think your smart.
• Text speak: C u l8r @Jacks
• Anaphora: John met with Nick. He was upset.
• Humor: Did you take a bath today? No. Is one missing?
Consider: distributed information (dialogue), technical/scientific text,
legal text, creative/poetry…
Human language!
Eye drops off shelf.
Include your children when baking cookies.
Turn right here.
John saw the man on the mountain with a telescope.
He gave her cat food.
They are hunting dogs.
Examples: Biology…
Looking for: interactions between SAF and viral LTR elements
(SAF is a transcription factor, LTR stands for ‘long terminal repeat’)
(Also: SAF = single and free, LTR = long term relationship)
Gene names:
tinman, lilliputian, dreadlocks, lush, cheap date, methuselah, Van Gogh,
maggie, brainiac, grim, reaper, cleopatra, swiss cheese, fucK, out cold,
ken and barbie, kenny, lava lamp, hamlet, sonic hedgehog, werewolf,
half pint, drop dead, chardonnay, agnostic, I’m not dead yet…
Current State of NLP
• Rule-based systems for high-precision results
• Hybrid systems for more robust performance
(rules + dictionaries/ontologies + statistical models)
• Limitation: specialized systems perform better
(much like humans!)
• Workflows offer a workaround for more generic systems
e.g., check language → check category → choose model
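The workflow idea (check language, then category, then choose a model) can be sketched as a small router in Python. Everything here is a hypothetical placeholder: the cue-word detectors, the model names in `MODELS`, and the `choose_model` function stand in for real language identification, document classification and specialized models.

```python
# All detectors and model names below are hypothetical placeholders.

def detect_language(text):
    """Very rough check: call it English if a common English word appears."""
    english_cues = {"the", "and", "of", "to", "is"}
    words = set(text.lower().split())
    return "en" if words & english_cues else "unknown"

def detect_category(text):
    """Very rough check: call it biomedical if a domain cue word appears."""
    biomedical_cues = {"gene", "transcription", "viral", "protein"}
    words = set(text.lower().split())
    return "biomedical" if words & biomedical_cues else "general"

# One specialized model per (language, category) pair.
MODELS = {
    ("en", "biomedical"): "en-biomedical-model",
    ("en", "general"): "en-general-model",
}

def choose_model(text):
    """Check language, then category, then pick the matching specialized model."""
    key = (detect_language(text), detect_category(text))
    return MODELS.get(key, "fallback-model")

print(choose_model("interactions between SAF and viral LTR elements"))
print(choose_model("John Smith went to Washington to see the Smithsonian"))
```

The router itself stays generic. Each specialized model only has to perform well on its own slice of the content.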
Examples of applications
(some are very specialized!)
Content Enrichment
Content Inventory
Content Intelligence
pingar.com/discoveryone/
www.youtube.com/watch?v=i9FnMylGQxw
Take home messages
• Machines can do a lot of consistent, fast information extraction
• Specialization is needed in several fields but systems can have internal workflows
• Big data + statistics = magic!
• Always room for improvement
• Information management AND decisions AND predictions
Time for questions and discussion!
https://xkcd.com/1263/
@annadivoli

Editor's Notes

  1. We create and consume text!