SlideShare a Scribd company logo
Natural language 
processing with Python
Hi all! 
Iván Compañy 
@pyivanc
Index 
1. Introduction 
1. NLP 
2. MyTaste 
2. Theory 
1. Data processing 
1. Normalizing 
2. Tokenizing 
3. Tagging 
2. Data post-processing 
1. Classification 
2. Word count 
3. Practice
WTF Natural language processing (NLP) is a field of 
computer science, artificial intelligence, and linguistics 
concerned with the interactions between computers and 
human (natural) languages 
Wikipedia 
1.1. NLP 
Natural language processing (NLP) is all about human 
text processing 
Me
1.2. MyTaste 
http://www.mytaste.com/
1.2. MyTaste
1.2. MyTaste 
1. How do I know where is the important information? 
2. How do I know if the information is a cooking recipe?
2. Theory
2.1. Data processing 
yo soy programador python 
DET VRB N N 
Yo soy programador Python 
1. Normalize: Adapt the text depending of your needs 
2. Tokenize: Split text in identifiable linguistic units 
3. Tagging: Classify words into their parts-of-speech
Recipes 
I love the chocolate 
Everybody loves the chocolate 
I still love chocolate! 
Why you don’t love chocolate? 
My GF is chocolate 
Non Recipes 
Once upon a time.. 
Remember, remember… 
print(“hello Python”) 
Fly, you fools! 
Let there be rock 
2.2. Data post-processing 
Content selection 
1. Classification
2.2. Data post-processing 1. Classification 
Feature selection 
• “A recipe must contain the word ‘chocolate’” 
• “A recipe must contain at least one verb” 
What! 
Training 
MACHINE LEARNING ALGORITHM
2.2. Data post-processing 1. Classification 
Machine 
learning 
algorithm 
This is my recipe of chocolate 
This is the beginning of anything 
I love chocolate 
Why you don’t love chocolate? 
I’m becoming insane 
recipe 
non recipe 
recipe 
recipe 
non recipe 
This is my recipe of chocolate 
This is the beginning of anything 
I love chocolate 
Why you don’t love chocolate? 
I’m becoming insane 
… 
Test input
2.2. Data post-processing 2. Word count 
Frequency 
distribution 
algotihm 
This is my recipe of chocolate 
This is the beginning of anything 
I love chocolate 
Why you don’t love chocolate? 
I’m becoming insane 
… 
Input
NLTK 
http://www.nltk.org/
DownloadingIn csotarpllionrga:: pythpoinp -icn s“itmaplolr tn lntlktk;nltk.download()”
Data processing 
import nltk 
#Sample text 
text = “I am a Python programmer” 
#Normalize (the ‘I’ character in english must be in uppercase) 
text = text.capitalize() 
#Tokenize 
text_tokenized = nltk.word_tokenize(text) 
#We create the tagger 
from nltk.corpus import brown 
default_tag = nltk.DefaultTagger('UNK') 
tagger = nltk.UnigramTagger(brown.tagged_sents(), backoff=default_tag) 
#Tagging 
text_tagged = tagger.tag(text_tokenized) 
#Output 
#[('I', u'PPSS'), ('am', u'BEM'), ('a', u'AT'), ('python', u'NN'), ('programmer', u'NN')]
Data post-processing: Classification 
from nltk.corpus import brown 
with open(‘tagged_recipes.txt’, ‘rb’) as f: 
recipes = [] 
for recipe in f.readlines(): 
recipes.append(recipe, ‘recipe’) 
non_recipes = [] 
for line in brown.tagged_sents()[:1000]: 
non_recipes.append((line, ‘non_recipe’)) 
all_texts = (recipes + non_recipes) 
random.shuffle(all_texts)
Data post-processing: Classification 
def feature(text): 
has_chocolate = any(['chocolate' in word.lower() for (word, pos) in text]) 
is_american = any(['america' in word.lower() for (word, pos) in text]) 
return {'has_chocolate': has_chocolate, 'is_american': is_american} 
featuresets = [(feature(text), t) for (text, t) in all_sents] 
train_set, test_set = featuresets[:18000], featuresets[18000:] 
classifier = nltk.NaiveBayesClassifier.train(train_set)
Data post-processing: Classification 
print classifier.show_most_informative_features(5) 
“”” 
Most Informative Features 
has_chocolate = True recipe : non_re = 86.6 : 1.0 
is_american = True non_re : recipe = 70.4 : 1.0 
is_american = False recipe : non_re = 1.0 : 1.0 
has_chocolate = False non_re : recipe = 1.0 : 1.0 
“”” 
print nltk.classify.accuracy(classifier, test_set) 
“”” 
0.514 
“””
Data post-processing: Word count 
import nltk 
with open(“recipes.txt”) as f: 
tokenized_text = nltk.word_tokenize(f.read()) 
fdist = nltk.FreqDist(tokenized_text) 
fdist.plot(50)
Data post-processing: Classification
Examples of usage: 
• Text recognition 
• Tag detection in websites 
• Market analysis 
• Ads system 
• Translators 
• …
More things I didn’t explain: 
• Analyze group of texts (Chunk tagging) 
• Semantic analysis 
• How to create a corpus 
• …
More info:
Thank you! 
Iván Compañy 
@pyivanc

More Related Content

What's hot

Natural Language Processing (NLP) - Introduction
Natural Language Processing (NLP) - IntroductionNatural Language Processing (NLP) - Introduction
Natural Language Processing (NLP) - Introduction
Aritra Mukherjee
 
Nltk:a tool for_nlp - py_con-dhaka-2014
Nltk:a tool for_nlp - py_con-dhaka-2014Nltk:a tool for_nlp - py_con-dhaka-2014
Nltk:a tool for_nlp - py_con-dhaka-2014Fasihul Kabir
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
dhruv_chaudhari
 
Natural Language Processing in Practice
Natural Language Processing in PracticeNatural Language Processing in Practice
Natural Language Processing in PracticeVsevolod Dyomkin
 
NLP Project Full Cycle
NLP Project Full CycleNLP Project Full Cycle
NLP Project Full Cycle
Vsevolod Dyomkin
 
Using Stanza NLP and TensorFlow to create a summary of a book
Using Stanza NLP and TensorFlow to create a summary of a bookUsing Stanza NLP and TensorFlow to create a summary of a book
Using Stanza NLP and TensorFlow to create a summary of a book
Olusola Amusan
 
Python_in_Detail
Python_in_DetailPython_in_Detail
Python_in_Detail
MAHALAKSHMI P
 
Natural Language Processing: Comparing NLTK and OpenNLP
Natural Language Processing: Comparing NLTK and OpenNLPNatural Language Processing: Comparing NLTK and OpenNLP
Natural Language Processing: Comparing NLTK and OpenNLP
CodeOps Technologies LLP
 
Natural Language Processing: L02 words
Natural Language Processing: L02 wordsNatural Language Processing: L02 words
Natural Language Processing: L02 words
ananth
 
OpenNLP demo
OpenNLP demoOpenNLP demo
OpenNLP demo
Gagan Gowda
 
Let’s Learn Python An introduction to Python
Let’s Learn Python An introduction to Python Let’s Learn Python An introduction to Python
Let’s Learn Python An introduction to Python
Jaganadh Gopinadhan
 
You too can nlp - PyBay 2018 lightning talk
You too can nlp - PyBay 2018 lightning talkYou too can nlp - PyBay 2018 lightning talk
You too can nlp - PyBay 2018 lightning talk
Jacob Perkins
 
Anthiil Inside workshop on NLP
Anthiil Inside workshop on NLPAnthiil Inside workshop on NLP
Anthiil Inside workshop on NLP
Satyam Saxena
 
Python interview questions
Python interview questionsPython interview questions
Python interview questions
Pragati Singh
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
David Rostcheck
 
Python interview questions
Python interview questionsPython interview questions
Python interview questions
Pragati Singh
 
Python made easy
Python made easy Python made easy
Python made easy
Abhishek kumar
 
Feature Engineering for NLP
Feature Engineering for NLPFeature Engineering for NLP
Feature Engineering for NLP
Bill Liu
 
Chapter 5 - THREADING & REGULAR exp - MAULIK BORSANIYA
Chapter 5 - THREADING & REGULAR exp - MAULIK BORSANIYAChapter 5 - THREADING & REGULAR exp - MAULIK BORSANIYA
Chapter 5 - THREADING & REGULAR exp - MAULIK BORSANIYA
Maulik Borsaniya
 

What's hot (20)

Natural Language Processing (NLP) - Introduction
Natural Language Processing (NLP) - IntroductionNatural Language Processing (NLP) - Introduction
Natural Language Processing (NLP) - Introduction
 
Nltk:a tool for_nlp - py_con-dhaka-2014
Nltk:a tool for_nlp - py_con-dhaka-2014Nltk:a tool for_nlp - py_con-dhaka-2014
Nltk:a tool for_nlp - py_con-dhaka-2014
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
Natural Language Processing in Practice
Natural Language Processing in PracticeNatural Language Processing in Practice
Natural Language Processing in Practice
 
NLP Project Full Cycle
NLP Project Full CycleNLP Project Full Cycle
NLP Project Full Cycle
 
Using Stanza NLP and TensorFlow to create a summary of a book
Using Stanza NLP and TensorFlow to create a summary of a bookUsing Stanza NLP and TensorFlow to create a summary of a book
Using Stanza NLP and TensorFlow to create a summary of a book
 
Python_in_Detail
Python_in_DetailPython_in_Detail
Python_in_Detail
 
Natural Language Processing: Comparing NLTK and OpenNLP
Natural Language Processing: Comparing NLTK and OpenNLPNatural Language Processing: Comparing NLTK and OpenNLP
Natural Language Processing: Comparing NLTK and OpenNLP
 
Natural Language Processing: L02 words
Natural Language Processing: L02 wordsNatural Language Processing: L02 words
Natural Language Processing: L02 words
 
OpenNLP demo
OpenNLP demoOpenNLP demo
OpenNLP demo
 
Let’s Learn Python An introduction to Python
Let’s Learn Python An introduction to Python Let’s Learn Python An introduction to Python
Let’s Learn Python An introduction to Python
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
You too can nlp - PyBay 2018 lightning talk
You too can nlp - PyBay 2018 lightning talkYou too can nlp - PyBay 2018 lightning talk
You too can nlp - PyBay 2018 lightning talk
 
Anthiil Inside workshop on NLP
Anthiil Inside workshop on NLPAnthiil Inside workshop on NLP
Anthiil Inside workshop on NLP
 
Python interview questions
Python interview questionsPython interview questions
Python interview questions
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
Python interview questions
Python interview questionsPython interview questions
Python interview questions
 
Python made easy
Python made easy Python made easy
Python made easy
 
Feature Engineering for NLP
Feature Engineering for NLPFeature Engineering for NLP
Feature Engineering for NLP
 
Chapter 5 - THREADING & REGULAR exp - MAULIK BORSANIYA
Chapter 5 - THREADING & REGULAR exp - MAULIK BORSANIYAChapter 5 - THREADING & REGULAR exp - MAULIK BORSANIYA
Chapter 5 - THREADING & REGULAR exp - MAULIK BORSANIYA
 

Viewers also liked

Conspiracy Profile
Conspiracy ProfileConspiracy Profile
Conspiracy Profilecharlyheus
 
แรงดันในของเหลว
แรงดันในของเหลวแรงดันในของเหลว
แรงดันในของเหลวtewin2553
 
Edge Amsterdam profile
Edge Amsterdam profileEdge Amsterdam profile
Edge Amsterdam profilecharlyheus
 
Games To Explain Human Factors: Come, Participate, Learn & Have Fun!!! Photo ...
Games To Explain Human Factors: Come, Participate, Learn & Have Fun!!! Photo ...Games To Explain Human Factors: Come, Participate, Learn & Have Fun!!! Photo ...
Games To Explain Human Factors: Come, Participate, Learn & Have Fun!!! Photo ...
Ronald G. Shapiro
 
Neue aufregende Web Technologien, HTML5 + CSS3 anhand von Praxisbeispielen le...
Neue aufregende Web Technologien, HTML5 + CSS3 anhand von Praxisbeispielen le...Neue aufregende Web Technologien, HTML5 + CSS3 anhand von Praxisbeispielen le...
Neue aufregende Web Technologien, HTML5 + CSS3 anhand von Praxisbeispielen le...Eric Eggert
 
Conspiracy Profile
Conspiracy ProfileConspiracy Profile
Conspiracy Profilecharlyheus
 
subrat
 subrat subrat
subrat
ABA,BALASORE
 
SharePoint Exchange Forum - 10 Worst Mistakes in SharePoint Branding
SharePoint Exchange Forum - 10 Worst Mistakes in SharePoint BrandingSharePoint Exchange Forum - 10 Worst Mistakes in SharePoint Branding
SharePoint Exchange Forum - 10 Worst Mistakes in SharePoint BrandingMarcy Kellar
 
Automatic Language Identification
Automatic Language IdentificationAutomatic Language Identification
Automatic Language Identification
bigshum
 
Aids to creativity
Aids to creativityAids to creativity
Aids to creativity
preciouspresentation
 
Django & Drupal: A Tale of Two Cities
Django & Drupal: A Tale of Two CitiesDjango & Drupal: A Tale of Two Cities
Django & Drupal: A Tale of Two Cities
Donna Benjamin
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
Mike Long
 
SharePointFest Konferenz 2016 - Creating a Great User Experience in SharePoint
SharePointFest Konferenz 2016 - Creating a Great User Experience in SharePointSharePointFest Konferenz 2016 - Creating a Great User Experience in SharePoint
SharePointFest Konferenz 2016 - Creating a Great User Experience in SharePoint
Marc D Anderson
 
поездка на родину
поездка на родинупоездка на родину
поездка на родинуzizari
 
Examples of BT using SharePoint 2010
Examples of BT using SharePoint 2010Examples of BT using SharePoint 2010
Examples of BT using SharePoint 2010
Mark Morrell
 
Edge Amsterdam Profile
Edge Amsterdam ProfileEdge Amsterdam Profile
Edge Amsterdam Profilecharlyheus
 
Conversational Internet - Creating a natural language interface for web pages
Conversational Internet - Creating a natural language interface for web pagesConversational Internet - Creating a natural language interface for web pages
Conversational Internet - Creating a natural language interface for web pages
Dale Lane
 

Viewers also liked (20)

Conspiracy Profile
Conspiracy ProfileConspiracy Profile
Conspiracy Profile
 
แรงดันในของเหลว
แรงดันในของเหลวแรงดันในของเหลว
แรงดันในของเหลว
 
Edge Amsterdam profile
Edge Amsterdam profileEdge Amsterdam profile
Edge Amsterdam profile
 
Games To Explain Human Factors: Come, Participate, Learn & Have Fun!!! Photo ...
Games To Explain Human Factors: Come, Participate, Learn & Have Fun!!! Photo ...Games To Explain Human Factors: Come, Participate, Learn & Have Fun!!! Photo ...
Games To Explain Human Factors: Come, Participate, Learn & Have Fun!!! Photo ...
 
Verb to be
Verb to beVerb to be
Verb to be
 
Neue aufregende Web Technologien, HTML5 + CSS3 anhand von Praxisbeispielen le...
Neue aufregende Web Technologien, HTML5 + CSS3 anhand von Praxisbeispielen le...Neue aufregende Web Technologien, HTML5 + CSS3 anhand von Praxisbeispielen le...
Neue aufregende Web Technologien, HTML5 + CSS3 anhand von Praxisbeispielen le...
 
Conspiracy Profile
Conspiracy ProfileConspiracy Profile
Conspiracy Profile
 
subrat
 subrat subrat
subrat
 
SharePoint Exchange Forum - 10 Worst Mistakes in SharePoint Branding
SharePoint Exchange Forum - 10 Worst Mistakes in SharePoint BrandingSharePoint Exchange Forum - 10 Worst Mistakes in SharePoint Branding
SharePoint Exchange Forum - 10 Worst Mistakes in SharePoint Branding
 
Automatic Language Identification
Automatic Language IdentificationAutomatic Language Identification
Automatic Language Identification
 
Aids to creativity
Aids to creativityAids to creativity
Aids to creativity
 
Trabajo grupal
Trabajo grupalTrabajo grupal
Trabajo grupal
 
Django & Drupal: A Tale of Two Cities
Django & Drupal: A Tale of Two CitiesDjango & Drupal: A Tale of Two Cities
Django & Drupal: A Tale of Two Cities
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
SharePointFest Konferenz 2016 - Creating a Great User Experience in SharePoint
SharePointFest Konferenz 2016 - Creating a Great User Experience in SharePointSharePointFest Konferenz 2016 - Creating a Great User Experience in SharePoint
SharePointFest Konferenz 2016 - Creating a Great User Experience in SharePoint
 
TRABAJO GRUPAL
TRABAJO GRUPALTRABAJO GRUPAL
TRABAJO GRUPAL
 
поездка на родину
поездка на родинупоездка на родину
поездка на родину
 
Examples of BT using SharePoint 2010
Examples of BT using SharePoint 2010Examples of BT using SharePoint 2010
Examples of BT using SharePoint 2010
 
Edge Amsterdam Profile
Edge Amsterdam ProfileEdge Amsterdam Profile
Edge Amsterdam Profile
 
Conversational Internet - Creating a natural language interface for web pages
Conversational Internet - Creating a natural language interface for web pagesConversational Internet - Creating a natural language interface for web pages
Conversational Internet - Creating a natural language interface for web pages
 

Similar to Natural language processing

Mining the social web ch8 - 1
Mining the social web ch8 - 1Mining the social web ch8 - 1
Mining the social web ch8 - 1scor7910
 
Conf orm - explain
Conf orm - explainConf orm - explain
Conf orm - explain
Louise Grandjonc
 
Natural Language Processing sample code by Aiden
Natural Language Processing sample code by AidenNatural Language Processing sample code by Aiden
Natural Language Processing sample code by Aiden
Aiden Wu, FRM
 
Automating Tinder w/ Eigenfaces and StanfordNLP
Automating Tinder w/ Eigenfaces and StanfordNLPAutomating Tinder w/ Eigenfaces and StanfordNLP
Automating Tinder w/ Eigenfaces and StanfordNLP
Justin Long
 
Python: The Dynamic!
Python: The Dynamic!Python: The Dynamic!
Python: The Dynamic!
Omid Mogharian
 
Debugging and Testing ES Systems
Debugging and Testing ES SystemsDebugging and Testing ES Systems
Debugging and Testing ES SystemsChris Birchall
 
Data Science
Data Science Data Science
Data Science
University of Sindh
 
Lec1cgu13updated.ppt
Lec1cgu13updated.pptLec1cgu13updated.ppt
Lec1cgu13updated.ppt
RahulTr22
 
Data science programming .ppt
Data science programming .pptData science programming .ppt
Data science programming .ppt
Ganesh E
 
Lec1cgu13updated.ppt
Lec1cgu13updated.pptLec1cgu13updated.ppt
Lec1cgu13updated.ppt
kalai75
 
Lec1cgu13updated.ppt
Lec1cgu13updated.pptLec1cgu13updated.ppt
Lec1cgu13updated.ppt
Aravind Reddy
 
Beyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLPBeyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLP
MENGSAYLOEM1
 
Algorithms overview
Algorithms overviewAlgorithms overview
Algorithms overview
Deborah Akuoko
 
ARCHIVED: new version available. 2016 - METASPACE Training Course
ARCHIVED: new version available. 2016 - METASPACE Training CourseARCHIVED: new version available. 2016 - METASPACE Training Course
ARCHIVED: new version available. 2016 - METASPACE Training Course
METASPACE
 
Don't RTFM, WTFM - Open Source Documentation - German Perl Workshop 2010
Don't RTFM, WTFM - Open Source Documentation - German Perl Workshop 2010Don't RTFM, WTFM - Open Source Documentation - German Perl Workshop 2010
Don't RTFM, WTFM - Open Source Documentation - German Perl Workshop 2010singingfish
 
The Ring programming language version 1.5.1 book - Part 10 of 180
The Ring programming language version 1.5.1 book - Part 10 of 180The Ring programming language version 1.5.1 book - Part 10 of 180
The Ring programming language version 1.5.1 book - Part 10 of 180
Mahmoud Samir Fayed
 
Logic programming in python
Logic programming in pythonLogic programming in python
Logic programming in python
Pierre Carbonnelle
 
CoreData - there is an ORM you can like!
CoreData - there is an ORM you can like!CoreData - there is an ORM you can like!
CoreData - there is an ORM you can like!
Tomáš Jukin
 
Dutch PHP Conference 2013: Distilled
Dutch PHP Conference 2013: DistilledDutch PHP Conference 2013: Distilled
Dutch PHP Conference 2013: Distilled
Zumba Fitness - Technology Team
 

Similar to Natural language processing (20)

Mining the social web ch8 - 1
Mining the social web ch8 - 1Mining the social web ch8 - 1
Mining the social web ch8 - 1
 
Conf orm - explain
Conf orm - explainConf orm - explain
Conf orm - explain
 
Natural Language Processing sample code by Aiden
Natural Language Processing sample code by AidenNatural Language Processing sample code by Aiden
Natural Language Processing sample code by Aiden
 
Automating Tinder w/ Eigenfaces and StanfordNLP
Automating Tinder w/ Eigenfaces and StanfordNLPAutomating Tinder w/ Eigenfaces and StanfordNLP
Automating Tinder w/ Eigenfaces and StanfordNLP
 
Python: The Dynamic!
Python: The Dynamic!Python: The Dynamic!
Python: The Dynamic!
 
Debugging and Testing ES Systems
Debugging and Testing ES SystemsDebugging and Testing ES Systems
Debugging and Testing ES Systems
 
Data Science
Data Science Data Science
Data Science
 
Lec1cgu13updated.ppt
Lec1cgu13updated.pptLec1cgu13updated.ppt
Lec1cgu13updated.ppt
 
Data science programming .ppt
Data science programming .pptData science programming .ppt
Data science programming .ppt
 
Lec1cgu13updated.ppt
Lec1cgu13updated.pptLec1cgu13updated.ppt
Lec1cgu13updated.ppt
 
Lec1cgu13updated.ppt
Lec1cgu13updated.pptLec1cgu13updated.ppt
Lec1cgu13updated.ppt
 
Beyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLPBeyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLP
 
Algorithms overview
Algorithms overviewAlgorithms overview
Algorithms overview
 
ARCHIVED: new version available. 2016 - METASPACE Training Course
ARCHIVED: new version available. 2016 - METASPACE Training CourseARCHIVED: new version available. 2016 - METASPACE Training Course
ARCHIVED: new version available. 2016 - METASPACE Training Course
 
Don't RTFM, WTFM - Open Source Documentation - German Perl Workshop 2010
Don't RTFM, WTFM - Open Source Documentation - German Perl Workshop 2010Don't RTFM, WTFM - Open Source Documentation - German Perl Workshop 2010
Don't RTFM, WTFM - Open Source Documentation - German Perl Workshop 2010
 
The Ring programming language version 1.5.1 book - Part 10 of 180
The Ring programming language version 1.5.1 book - Part 10 of 180The Ring programming language version 1.5.1 book - Part 10 of 180
The Ring programming language version 1.5.1 book - Part 10 of 180
 
Magento code audit
Magento code auditMagento code audit
Magento code audit
 
Logic programming in python
Logic programming in pythonLogic programming in python
Logic programming in python
 
CoreData - there is an ORM you can like!
CoreData - there is an ORM you can like!CoreData - there is an ORM you can like!
CoreData - there is an ORM you can like!
 
Dutch PHP Conference 2013: Distilled
Dutch PHP Conference 2013: DistilledDutch PHP Conference 2013: Distilled
Dutch PHP Conference 2013: Distilled
 

Recently uploaded

MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
Osamah Alsalih
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Dr.Costas Sachpazis
 
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Sreedhar Chowdam
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
Pratik Pawar
 
Forklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella PartsForklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella Parts
Intella Parts
 
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
ssuser7dcef0
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
Massimo Talia
 
14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application
SyedAbiiAzazi1
 
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesHarnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Christina Lin
 
Building Electrical System Design & Installation
Building Electrical System Design & InstallationBuilding Electrical System Design & Installation
Building Electrical System Design & Installation
symbo111
 
Unbalanced Three Phase Systems and circuits.pptx
Unbalanced Three Phase Systems and circuits.pptxUnbalanced Three Phase Systems and circuits.pptx
Unbalanced Three Phase Systems and circuits.pptx
ChristineTorrepenida1
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
SamSarthak3
 
Investor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptxInvestor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptx
AmarGB2
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
Kamal Acharya
 
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
thanhdowork
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
bakpo1
 
DESIGN AND ANALYSIS OF A CAR SHOWROOM USING E TABS
DESIGN AND ANALYSIS OF A CAR SHOWROOM USING E TABSDESIGN AND ANALYSIS OF A CAR SHOWROOM USING E TABS
DESIGN AND ANALYSIS OF A CAR SHOWROOM USING E TABS
itech2017
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
Kamal Acharya
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
AJAYKUMARPUND1
 
Water billing management system project report.pdf
Water billing management system project report.pdfWater billing management system project report.pdf
Water billing management system project report.pdf
Kamal Acharya
 

Recently uploaded (20)

MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
 
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
 
Forklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella PartsForklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella Parts
 
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
 
14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application
 
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesHarnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
 
Building Electrical System Design & Installation
Building Electrical System Design & InstallationBuilding Electrical System Design & Installation
Building Electrical System Design & Installation
 
Unbalanced Three Phase Systems and circuits.pptx
Unbalanced Three Phase Systems and circuits.pptxUnbalanced Three Phase Systems and circuits.pptx
Unbalanced Three Phase Systems and circuits.pptx
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
 
Investor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptxInvestor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptx
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
 
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
 
DESIGN AND ANALYSIS OF A CAR SHOWROOM USING E TABS
DESIGN AND ANALYSIS OF A CAR SHOWROOM USING E TABSDESIGN AND ANALYSIS OF A CAR SHOWROOM USING E TABS
DESIGN AND ANALYSIS OF A CAR SHOWROOM USING E TABS
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
 
Water billing management system project report.pdf
Water billing management system project report.pdfWater billing management system project report.pdf
Water billing management system project report.pdf
 

Natural language processing

  • 2. Hi all! Iván Compañy @pyivanc
  • 3. Index 1. Introduction 1. NLP 2. MyTaste 2. Theory 1. Data processing 1. Normalizing 2. Tokenizing 3. Tagging 2. Data post-processing 1. Classification 2. Word count 3. Practice
  • 4. WTF Natural language processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages Wikipedia 1.1. NLP Natural language processing (NLP) is all about human text processing Me
  • 7. 1.2. MyTaste 1. How do I know where is the important information? 2. How do I know if the information is a cooking recipe?
  • 9. 2.1. Data processing yo soy programador python DET VRB N N Yo soy programador Python 1. Normalize: Adapt the text depending of your needs 2. Tokenize: Split text in identifiable linguistic units 3. Tagging: Classify words into their parts-of-speech
  • 10. Recipes I love the chocolate Everybody loves the chocolate I still love chocolate! Why you don’t love chocolate? My GF is chocolate Non Recipes Once upon a time.. Remember, remember… print(“hello Python”) Fly, you fools! Let there be rock 2.2. Data post-processing Content selection 1. Classification
  • 11. 2.2. Data post-processing 1. Classification Feature selection • “A recipe must contain the word ‘chocolate’” • “A recipe must contain at least one verb” What! Training MACHINE LEARNING ALGORITHM
  • 12. 2.2. Data post-processing 1. Classification Machine learning algorithm This is my recipe of chocolate This is the beginning of anything I love chocolate Why you don’t love chocolate? I’m becoming insane recipe non recipe recipe recipe non recipe This is my recipe of chocolate This is the beginning of anything I love chocolate Why you don’t love chocolate? I’m becoming insane … Test input
  • 13. 2.2. Data post-processing 2. Word count Frequency distribution algotihm This is my recipe of chocolate This is the beginning of anything I love chocolate Why you don’t love chocolate? I’m becoming insane … Input
  • 14.
  • 16. DownloadingIn csotarpllionrga:: pythpoinp -icn s“itmaplolr tn lntlktk;nltk.download()”
  • 17. Data processing import nltk #Sample text text = “I am a Python programmer” #Normalize (the ‘I’ character in english must be in uppercase) text = text.capitalize() #Tokenize text_tokenized = nltk.word_tokenize(text) #We create the tagger from nltk.corpus import brown default_tag = nltk.DefaultTagger('UNK') tagger = nltk.UnigramTagger(brown.tagged_sents(), backoff=default_tag) #Tagging text_tagged = tagger.tag(text_tokenized) #Output #[('I', u'PPSS'), ('am', u'BEM'), ('a', u'AT'), ('python', u'NN'), ('programmer', u'NN')]
  • 18. Data post-processing: Classification from nltk.corpus import brown with open(‘tagged_recipes.txt’, ‘rb’) as f: recipes = [] for recipe in f.readlines(): recipes.append(recipe, ‘recipe’) non_recipes = [] for line in brown.tagged_sents()[:1000]: non_recipes.append((line, ‘non_recipe’)) all_texts = (recipes + non_recipes) random.shuffle(all_texts)
  • 19. Data post-processing: Classification def feature(text): has_chocolate = any(['chocolate' in word.lower() for (word, pos) in text]) is_american = any(['america' in word.lower() for (word, pos) in text]) return {'has_chocolate': has_chocolate, 'is_american': is_american} featuresets = [(feature(text), t) for (text, t) in all_sents] train_set, test_set = featuresets[:18000], featuresets[18000:] classifier = nltk.NaiveBayesClassifier.train(train_set)
  • 20. Data post-processing: Classification print classifier.show_most_informative_features(5) “”” Most Informative Features has_chocolate = True recipe : non_re = 86.6 : 1.0 is_american = True non_re : recipe = 70.4 : 1.0 is_american = False recipe : non_re = 1.0 : 1.0 has_chocolate = False non_re : recipe = 1.0 : 1.0 “”” print nltk.classify.accuracy(classifier, test_set) “”” 0.514 “””
  • 21. Data post-processing: Word count import nltk with open(“recipes.txt”) as f: tokenized_text = nltk.word_tokenize(f.read()) fdist = nltk.FreqDist(tokenized_text) fdist.plot(50)
  • 23. Examples of usage: • Text recognition • Tag detection in websites • Market analysis • Ads system • Translators • …
  • 24. More things I didn’t explain: • Analyze group of texts (Chunk tagging) • Semantic analysis • How to create a corpus • …
  • 26. Thank you! Iván Compañy @pyivanc