SlideShare a Scribd company logo
1 of 35
Parsa Ghaffari, CEO & Founder
aylien.com / @_aylien
Using NLP to understand
textual content at scale
Introduction
ML Dublin - October 2017 @parsaghaffari |
AYLIEN is a platform
for aggregating large
volumes of textual
content and
analyzing and
understanding it
using NLP.
ML Dublin - October 2017 @parsaghaffari |
What is NLP?
ML Dublin - October 2017 @parsaghaffari |
What is NLP not?
Artificial Intelligence
applied to language
@parsaghaffari |
ML Dublin - October 2017 @parsaghaffari |
Why NLP?
Computers can help people accomplish things more
efficiently, if they understand language
Language is a proxy to people's thoughts, emotions
and feelings, and intentions
We can understand people by understanding
language
Wildly applicable technology capable of creating $B’s
in value
Challenges of NLP
Challenge #1
The inherent complexity of language
ML Dublin - October 2017 @parsaghaffari |
What does it mean when someone says:
“I made her duck”?
How many distinct interpretations can you count?
Challenge #1
The inherent complexity of language
ML Dublin - October 2017 @parsaghaffari |
“I made her duck” could be interpreted as:
• I cooked a duck for her
• I cooked a duck belonging to her
• I created a duck for her
• I created a duck that now belongs to her
• I caused her to lower her head
• I turned her into a duck (!)
"Literally ur facebook
message app is useless, you
only want it to increase
profit. Please fix yourself. Its
sad @facebook"
Tone:
Nega%ve,
Subjec%ve
OrganizaCon:
Facebook
Product:
Messenger
App
AdjecCves:
“useless”,
“sad”
Language:
English,
Informal
EmoCon:
Frustra%on
ML Dublin - October 2017 @parsaghaffari |
Complexity of Language
Challenge 1:
ML Dublin - October 2017 @parsaghaffari |
And it’s not just English!
• German: “Donaudampfschifffahrtsgesellschaftskapitän” (5 “words”)
• Chinese: 50,000 different characters (2-3k to read a newspaper)
• Japanese: 3 writing systems
• Thai: Ambiguous word boundaries and sentence concepts
• Slavic: Different word forms depending on gender, case, tense
Complexity of Language
Challenge 1:
ML Dublin - October 2017 @parsaghaffari |
Machine Learning is the most suitable toolbox available to us for dealing
with this complexity.
However, there are still challenges:
1. Variety of tasks
2. Diversity of domains
3. Data preparation
4.Model training
5. Model evaluation
6.Workflow issues
Complexity of Language
Challenge 1:
ML Dublin - October 2017 @parsaghaffari |
Variety of Tasks
Challenge 1.1:
"Literally ur facebook
message app is useless, you
only want it to increase
profit. Please fix yourself. Its
sad @facebook"
SenCment Analysis (nega%ve)
EmoCon DetecCon (frustra%on)
Language DetecCon (English)
EnCty ExtracCon/Linking (Facebook, Facebook
Messenger)
Machine TranslaCon (EN -> X)
SummarizaCon (abstracCve/extracCve)
QuesCon Answering
Topic modeling/Clustering
…
ML Dublin - October 2017 @parsaghaffari |
Diversity of Domains
Challenge 1.2:
Genres & sources:
•News arCcles
•Social media
updates
•Reviews
•…
Languages & dialects:
•English:
•BriCsh:
•Northern:
•Cheshire
•…
•…
•American:
•…
•…
•…
Other types of domains:
•Industries
•Users
•Time
•…
ML Dublin - October 2017 @parsaghaffari |
Tasks, Domains & Languages
Challenges 1.1-2:
Tasks
Languages
Dom
ains
Customer A
Customer B
Customer C
ML Dublin - October 2017 @parsaghaffari |
Data Preparation
Challenge 1.3:
• Gathering data for training/testing
• Defining labels/a taxonomy
• Cleaning up the data
• Getting the data in the right format
• Annotating the data
• Splitting the data
ML Dublin - October 2017 @parsaghaffari |
Model Training
Challenge 1.4:
• Selecting the right algorithm/NN architecture
• Picking the right representations (e.g. pre-trained
word embeddings)
• Hyper-parameter tuning
• Pre-training on other data (e.g. pre-training on
sentiment data before emotion detection)
• Domain adaptation/transfer learning
ML Dublin - October 2017 @parsaghaffari |
Model Evaluation
Challenge 1.5:
• Picking/defining the right metrics
• Creating a test set to evaluate models on
• Comparing models
• Qualitative evaluation
• Explaining predictions
ML Dublin - October 2017 @parsaghaffari |
Workflow-related Issues
Challenge 1.6:
• Continuously updating models and getting feedback
• Active learning
• Domain adaptation
• Dataset visualization
• Consuming trained models
Research from International Data Corporation
(IDC) shows that unstructured content
accounts for 90% of all digital information.
The scale and production rate of unstructured data
“More content was uploaded yesterday than
any one human could ever consume in their
entire life.” – Condé Nast
ML Dublin - October 2017
Challenge #2
@parsaghaffari |
Our Solution
ML Dublin - October 2017 @parsaghaffari |
Tasks, Domains & Languages
Challenges 1.1-2:
Tasks
Languages
Dom
ains
Customer A
Customer B
Customer C
ML Dublin - October 2017 @parsaghaffari |
Text Analysis Platform (TAP)
ß
An in-browser environment for building custom NLP models
Build Datasets
From CSV files, using our Knowledge Base or by forking other datasets
Build Datasets
From CSV files, using our Knowledge Base or by forking other datasets
Build Datasets
From CSV files, using our Knowledge Base or by forking other datasets
Train Models
Compare various models/parameters and pick the best one
Evaluate Models
Understand different aspects of each model
Evaluate Models
Explain predictions made by a model
Use Models
Deploy the best version of your model to production, and use it as an API
Leverage the Marketplace
Share your models and datasets with other users, or leverage theirs
Key takeaways
@parsaghaffari |ML Dublin - October 2017
• NLP brings us closer to people’s thoughts, emotions and
intentions => It has the potential to impact billions of people;
• We’re not doing a great job at modeling the processes of which
language is an output => We face the problem of too many
<task, domain, language>s => Infinite room for research in this
area;
• NLP(/ML) is not a one-size-fits-all problem => We need to rely
on good engineering and products to fill the gap;
Thank you!
Aaylien.com
@_aylien |
We’re hiring!
aylien.com/jobs

More Related Content

Similar to Using NLP to understand textual content at scale

Forum Tal 2014: Celi company presentation
Forum Tal 2014: Celi company presentationForum Tal 2014: Celi company presentation
Forum Tal 2014: Celi company presentationCELI
 
Sudipta_Mukherjee_Resume-Nov_2022.pdf
Sudipta_Mukherjee_Resume-Nov_2022.pdfSudipta_Mukherjee_Resume-Nov_2022.pdf
Sudipta_Mukherjee_Resume-Nov_2022.pdfSudipta Mukherjee
 
The Concept Of Abstract Data Types
The Concept Of Abstract Data TypesThe Concept Of Abstract Data Types
The Concept Of Abstract Data TypesKaty Allen
 
PennImmersive: Action Plan for Fall 2017
PennImmersive: Action Plan for Fall 2017PennImmersive: Action Plan for Fall 2017
PennImmersive: Action Plan for Fall 2017Kimberly Eke
 
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.Lifeng (Aaron) Han
 
Sudipta_Mukherjee_Resume_APR_2023.pdf
Sudipta_Mukherjee_Resume_APR_2023.pdfSudipta_Mukherjee_Resume_APR_2023.pdf
Sudipta_Mukherjee_Resume_APR_2023.pdfsudipto801
 
Build your own Language - Why and How?
Build your own Language - Why and How?Build your own Language - Why and How?
Build your own Language - Why and How?Markus Voelter
 
Bridging the gap between AI and UI - DSI Vienna - full version
Bridging the gap between AI and UI - DSI Vienna - full versionBridging the gap between AI and UI - DSI Vienna - full version
Bridging the gap between AI and UI - DSI Vienna - full versionLiad Magen
 
A Strong Object Recognition Using Lbp, Ltp And Rlbp
A Strong Object Recognition Using Lbp, Ltp And RlbpA Strong Object Recognition Using Lbp, Ltp And Rlbp
A Strong Object Recognition Using Lbp, Ltp And RlbpRikki Wright
 
ODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLPODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLPindico data
 
Nautral Langauge Processing - Basics / Non Technical
Nautral Langauge Processing - Basics / Non Technical Nautral Langauge Processing - Basics / Non Technical
Nautral Langauge Processing - Basics / Non Technical Dhruv Gohil
 
Ich Bin Ein Website - The impact of culture and language on internationalization
Ich Bin Ein Website - The impact of culture and language on internationalizationIch Bin Ein Website - The impact of culture and language on internationalization
Ich Bin Ein Website - The impact of culture and language on internationalizationMolecular Inc
 
Intelligent Microcontent: At the Point of Content Convergence | Rob Hanna
Intelligent Microcontent: At the Point of Content Convergence | Rob HannaIntelligent Microcontent: At the Point of Content Convergence | Rob Hanna
Intelligent Microcontent: At the Point of Content Convergence | Rob HannaLavaConConference
 
[DSC Europe 23] Slobodan Markovic - NLP for Serbian.pptx
[DSC Europe 23] Slobodan Markovic - NLP for Serbian.pptx[DSC Europe 23] Slobodan Markovic - NLP for Serbian.pptx
[DSC Europe 23] Slobodan Markovic - NLP for Serbian.pptxDataScienceConferenc1
 
Session 0.0 aussenac semanticsnl-pwebsem2017-v4
Session 0.0   aussenac semanticsnl-pwebsem2017-v4Session 0.0   aussenac semanticsnl-pwebsem2017-v4
Session 0.0 aussenac semanticsnl-pwebsem2017-v4semanticsconference
 
ELKL 5 Language documentation for linguistics and technology
ELKL 5 Language documentation for linguistics and technologyELKL 5 Language documentation for linguistics and technology
ELKL 5 Language documentation for linguistics and technologyDafydd Gibbon
 

Similar to Using NLP to understand textual content at scale (20)

Forum Tal 2014: Celi company presentation
Forum Tal 2014: Celi company presentationForum Tal 2014: Celi company presentation
Forum Tal 2014: Celi company presentation
 
Sudipta_Mukherjee_Resume-Nov_2022.pdf
Sudipta_Mukherjee_Resume-Nov_2022.pdfSudipta_Mukherjee_Resume-Nov_2022.pdf
Sudipta_Mukherjee_Resume-Nov_2022.pdf
 
The Concept Of Abstract Data Types
The Concept Of Abstract Data TypesThe Concept Of Abstract Data Types
The Concept Of Abstract Data Types
 
PennImmersive: Action Plan for Fall 2017
PennImmersive: Action Plan for Fall 2017PennImmersive: Action Plan for Fall 2017
PennImmersive: Action Plan for Fall 2017
 
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
 
Sudipta_Mukherjee_Resume_APR_2023.pdf
Sudipta_Mukherjee_Resume_APR_2023.pdfSudipta_Mukherjee_Resume_APR_2023.pdf
Sudipta_Mukherjee_Resume_APR_2023.pdf
 
Build your own Language - Why and How?
Build your own Language - Why and How?Build your own Language - Why and How?
Build your own Language - Why and How?
 
Bridging the gap between AI and UI - DSI Vienna - full version
Bridging the gap between AI and UI - DSI Vienna - full versionBridging the gap between AI and UI - DSI Vienna - full version
Bridging the gap between AI and UI - DSI Vienna - full version
 
A Strong Object Recognition Using Lbp, Ltp And Rlbp
A Strong Object Recognition Using Lbp, Ltp And RlbpA Strong Object Recognition Using Lbp, Ltp And Rlbp
A Strong Object Recognition Using Lbp, Ltp And Rlbp
 
Sandeep Resume
Sandeep ResumeSandeep Resume
Sandeep Resume
 
ODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLPODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLP
 
Nautral Langauge Processing - Basics / Non Technical
Nautral Langauge Processing - Basics / Non Technical Nautral Langauge Processing - Basics / Non Technical
Nautral Langauge Processing - Basics / Non Technical
 
Ich Bin Ein Website - The impact of culture and language on internationalization
Ich Bin Ein Website - The impact of culture and language on internationalizationIch Bin Ein Website - The impact of culture and language on internationalization
Ich Bin Ein Website - The impact of culture and language on internationalization
 
Intelligent Microcontent: At the Point of Content Convergence | Rob Hanna
Intelligent Microcontent: At the Point of Content Convergence | Rob HannaIntelligent Microcontent: At the Point of Content Convergence | Rob Hanna
Intelligent Microcontent: At the Point of Content Convergence | Rob Hanna
 
Siddhesh Dilip Rumde Resume
Siddhesh Dilip Rumde ResumeSiddhesh Dilip Rumde Resume
Siddhesh Dilip Rumde Resume
 
[DSC Europe 23] Slobodan Markovic - NLP for Serbian.pptx
[DSC Europe 23] Slobodan Markovic - NLP for Serbian.pptx[DSC Europe 23] Slobodan Markovic - NLP for Serbian.pptx
[DSC Europe 23] Slobodan Markovic - NLP for Serbian.pptx
 
NLP unit-VI.pptx
NLP unit-VI.pptxNLP unit-VI.pptx
NLP unit-VI.pptx
 
Session 0.0 aussenac semanticsnl-pwebsem2017-v4
Session 0.0   aussenac semanticsnl-pwebsem2017-v4Session 0.0   aussenac semanticsnl-pwebsem2017-v4
Session 0.0 aussenac semanticsnl-pwebsem2017-v4
 
Aussenac semanticsnl pwebsem2017-v4
Aussenac semanticsnl pwebsem2017-v4Aussenac semanticsnl pwebsem2017-v4
Aussenac semanticsnl pwebsem2017-v4
 
ELKL 5 Language documentation for linguistics and technology
ELKL 5 Language documentation for linguistics and technologyELKL 5 Language documentation for linguistics and technology
ELKL 5 Language documentation for linguistics and technology
 

Recently uploaded

04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 

Recently uploaded (20)

04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 

Using NLP to understand textual content at scale

  • 1. Parsa Ghaffari, CEO & Founder aylien.com / @_aylien Using NLP to understand textual content at scale
  • 3. ML Dublin - October 2017 @parsaghaffari | AYLIEN is a platform for aggregating large volumes of textual content and analyzing and understanding it using NLP.
  • 4. ML Dublin - October 2017 @parsaghaffari | What is NLP?
  • 5. ML Dublin - October 2017 @parsaghaffari | What is NLP not?
  • 6. Artificial Intelligence applied to language @parsaghaffari |
  • 7. ML Dublin - October 2017 @parsaghaffari | Why NLP? Computers can help people accomplish things more efficiently, if they understand language Language is a proxy to people's thoughts, emotions and feelings, and intentions We can understand people by understanding language Wildly applicable technology capable of creating $B’s in value
  • 9. Challenge #1 The inherent complexity of language ML Dublin - October 2017 @parsaghaffari | What does it mean when someone says: “I made her duck”? How many distinct interpretations can you count?
  • 10. Challenge #1 The inherent complexity of language ML Dublin - October 2017 @parsaghaffari | “I made her duck” could be interpreted as: • I cooked a duck for her • I cooked a duck belonging to her • I created a duck for her • I created a duck that now belongs to her • I caused her to lower her head • I turned her into a duck (!)
  • 11. "Literally ur facebook message app is useless, you only want it to increase profit. Please fix yourself. Its sad @facebook" Tone: Nega%ve, Subjec%ve OrganizaCon: Facebook Product: Messenger App AdjecCves: “useless”, “sad” Language: English, Informal EmoCon: Frustra%on ML Dublin - October 2017 @parsaghaffari | Complexity of Language Challenge 1:
  • 12. ML Dublin - October 2017 @parsaghaffari | And it’s not just English! • German: “Donaudampfschifffahrtsgesellschaftskapitän” (5 “words”) • Chinese: 50,000 different characters (2-3k to read a newspaper) • Japanese: 3 writing systems • Thai: Ambiguous word boundaries and sentence concepts • Slavic: Different word forms depending on gender, case, tense Complexity of Language Challenge 1:
  • 13. ML Dublin - October 2017 @parsaghaffari | Machine Learning is the most suitable toolbox available to us for dealing with this complexity. However, there are still challenges: 1. Variety of tasks 2. Diversity of domains 3. Data preparation 4.Model training 5. Model evaluation 6.Workflow issues Complexity of Language Challenge 1:
  • 14. ML Dublin - October 2017 @parsaghaffari | Variety of Tasks Challenge 1.1: "Literally ur facebook message app is useless, you only want it to increase profit. Please fix yourself. Its sad @facebook" SenCment Analysis (nega%ve) EmoCon DetecCon (frustra%on) Language DetecCon (English) EnCty ExtracCon/Linking (Facebook, Facebook Messenger) Machine TranslaCon (EN -> X) SummarizaCon (abstracCve/extracCve) QuesCon Answering Topic modeling/Clustering …
  • 15. ML Dublin - October 2017 @parsaghaffari | Diversity of Domains Challenge 1.2: Genres & sources: •News arCcles •Social media updates •Reviews •… Languages & dialects: •English: •BriCsh: •Northern: •Cheshire •… •… •American: •… •… •… Other types of domains: •Industries •Users •Time •…
  • 16. ML Dublin - October 2017 @parsaghaffari | Tasks, Domains & Languages Challenges 1.1-2: Tasks Languages Dom ains Customer A Customer B Customer C
  • 17. ML Dublin - October 2017 @parsaghaffari | Data Preparation Challenge 1.3: • Gathering data for training/testing • Defining labels/a taxonomy • Cleaning up the data • Getting the data in the right format • Annotating the data • Splitting the data
  • 18. ML Dublin - October 2017 @parsaghaffari | Model Training Challenge 1.4: • Selecting the right algorithm/NN architecture • Picking the right representations (e.g. pre-trained word embeddings) • Hyper-parameter tuning • Pre-training on other data (e.g. pre-training on sentiment data before emotion detection) • Domain adaptation/transfer learning
  • 19. ML Dublin - October 2017 @parsaghaffari | Model Evaluation Challenge 1.5: • Picking/defining the right metrics • Creating a test set to evaluate models on • Comparing models • Qualitative evaluation • Explaining predictions
  • 20. ML Dublin - October 2017 @parsaghaffari | Workflow-related Issues Challenge 1.6: • Continuously updating models and getting feedback • Active learning • Domain adaptation • Dataset visualization • Consuming trained models
  • 21. Research from International Data Corporation (IDC) shows that unstructured content accounts for 90% of all digital information. The scale and production rate of unstructured data “More content was uploaded yesterday than any one human could ever consume in their entire life.” – Condé Nast ML Dublin - October 2017 Challenge #2 @parsaghaffari |
  • 23. ML Dublin - October 2017 @parsaghaffari | Tasks, Domains & Languages Challenges 1.1-2: Tasks Languages Dom ains Customer A Customer B Customer C
  • 24. ML Dublin - October 2017 @parsaghaffari |
  • 25. Text Analysis Platform (TAP) ß An in-browser environment for building custom NLP models
  • 26. Build Datasets From CSV files, using our Knowledge Base or by forking other datasets
  • 27. Build Datasets From CSV files, using our Knowledge Base or by forking other datasets
  • 28. Build Datasets From CSV files, using our Knowledge Base or by forking other datasets
  • 29. Train Models Compare various models/parameters and pick the best one
  • 30. Evaluate Models Understand different aspects of each model
  • 32. Use Models Deploy the best version of your model to production, and use it as an API
  • 33. Leverage the Marketplace Share your models and datasets with other users, or leverage theirs
  • 34. Key takeaways @parsaghaffari |ML Dublin - October 2017 • NLP brings us closer to people’s thoughts, emotions and intentions => It has the potential to impact billions of people; • We’re not doing a great job at modeling the processes of which language is an output => We face the problem of too many <task, domain, language>s => Infinite room for research in this area; • NLP(/ML) is not a one-size-fits-all problem => We need to rely on good engineering and products to fill the gap;
  • 35. Thank you! Aaylien.com @_aylien | We’re hiring! aylien.com/jobs