SlideShare a Scribd company logo
1 of 22
Analyzing…
https://blogs.microsoft.com/blog/2018/01/17/future-
computed-artificial-intelligence-role-
society/?MC=DevOps&MC=MachLearn&MC=OfficeO
365&MC=MSAzure&MC=CloudPlat
BeautifulSoup: Web Scraping in Python
 Natural language processing (NLP) is a field of
computer science, artificial intelligence concerned
with the interactions between computers and human
(natural) languages, and, in particular, concerned with
programming computers to fruitfully process large
natural language data.
What???
 In 1950, Alan Turing published an article titled
"Computing Machinery and Intelligence“ which
proposed what is now called the Turing test as a
criterion of intelligence.
 Eliza 1964
ELIZA might provide a generic response, for example,
responding to "My head hurts" with "Why do you say
your head hurts?".
When???
How??
Where?
 Machine Translation
 Fighting Spam
 Mail Inbox or Spam
 Information Extraction
 Social media monitoring
 Summarization
 Question Answering
Tasks in OpenNLP
 The Apache OpenNLP library is a machine
learning based toolkit for the processing of natural
language text.
 It supports the most common NLP tasks, such
as language detection, tokenization, sentence
segmentation, part-of-speech tagging, named entity
extraction, chunking, parsing and co reference
resolution.
Tasks in OpenNLP
 Text data in the form of un-structured text, in the form
of comments, reviews or articles
 Extract meaning full information from them, done
with the help of set of tasks
 Tokenizing:
 Take a large piece of text and break it into smaller components
 Break it into sentences or individual words
Stop words removal
 Once Tokenized,
 next Stop words removal, i.e. differentiating words
which has specific meaning from the words which
adds to structure to the sentence.
 Eg:
N-Grams
 Once Stop words are removed,
 Commonly occurring words in a sentence, because these will be
most important words in the text.
https://en.wikipedia.org/wiki/N-gram#Examples
 If a word appears 2 times in a particular sentence, its called
bigrams.
 Eg:
 Code(s) Description
 M79.661 Pain in right lower leg
 M79.662 Pain in left lower leg
 M79.669 Pain in unspecified lower leg
Word Sense Disambiguation
 Eg:
 I am taking aspirin for my cold
 Let's go inside, I'm cold
 It's cold today, only 2 degrees
 It identifies the meaning of the word, based on the
context it is spoken.
Parts-of-Speech Tagging
 It can either occur as part of WSD, or as a independent
task.
 It helps in identifying parts of speech, whether Noun,
Verb, Adjective, etc.
Stemming
 Eg: Close, Closed, Closely, Closer
 Converting the word to its base form
Python NLTK and OpenNLP
 NLTK is one of the leading platforms for working with
human language data and Python, the module NLTK is
used for natural language processing.
 NLTK is literally an acronym for Natural Language Toolkit.
 The Apache OpenNLP library is a machine learning based
toolkit for the processing of natural language text.
 It supports the most common NLP tasks, such as
tokenization, sentence segmentation, part-of-speech
tagging, named entity extraction, chunking, parsing, and
coreference resolution.
Python NLTK
 Step 1: Collect all individual Sentences in an article, to
a list.
 Tokenization () from NLTK: ie from nltk.tokenize we
can import the functions sent_tokenize(breakdown into
sentences) and word_tokenize(breakdown into words)
 Import stopwords () from nltk.corpus module.
 punctuation from string module.
 Note : Sentence ends with a period symbol(.) and a space
after that.
Frequency distribution
 Construct a frequency distribution : words and no of
times each word occurs
 Functions Defined for NLTK's Frequency Distributions
Example Description
fdist = FreqDist(samples) create a frequency distribution containing the given samples
fdist[sample] += 1 increment the count for this sample
fdist['monstrous'] count of the number of times a given sample occurred
fdist.freq('monstrous') frequency of a given sample
fdist.N() total number of samples
fdist.most_common(n) the n most common samples and their frequencies
for sample in fdist: iterate over the samples
fdist.max() sample with the greatest count
fdist.tabulate() tabulate the frequency distribution
fdist.plot() graphical plot of the frequency distribution
fdist.plot(cumulative=True) cumulative plot of the frequency distribution
fdist1 |= fdist2 update fdist1 with counts from fdist2
fdist1 < fdist2 test if samples in fdist1 occur less frequently than in fdist2
Use Tokenizing: Sentence Detector
 Python Usage
 Step 1: Import NLTK
 Step2:
 text = "Mary had a little lamp. Her fleece was as white as snow"
 from nltk.tokenize import word_tokenize, sent_tokenize
 sents = sent_tokenize(text)
 print(sents)
 Java Usage
 Step 1:
 Step 2: Some Java code snippet
OpenNLP syntax
 OpenNLP components have similar APIs. Normally, to
execute a task, one should provide a model and an
input.
 A model is usually loaded by providing a
FileInputStream with a model to a constructor of the
model class
 try (InputStream modelIn = new
FileInputStream("lang-model-name.bin")) {
SomeModel model = new SomeModel(modelIn); }
Features
 Language detection
 https://www.apache.org/dist/opennlp/models/langdete
ct/1.8.3/README.txt
Breaking into Word
 Python
 words=[word_tokenize(sent) for sent in sents]
 print words
 Java
 InputStream is = new FileInputStream("en-token.bin");
 TokenizerModel model = new TokenizerModel(is);
 Tokenizer tokenizer = new TokenizerME(model);
 String tokens[] = tokenizer.tokenize("Hi. How are you? This
is Mike.");
 for (String a : tokens)
System.out.println(a);
 is.close();
POS - Tagging
https://www.winwaed.com/blog/2011/11/08/part-of-speech-tags/
I hope I made myself
understandable.
Thanks!

More Related Content

What's hot

Artificial intelligence
Artificial intelligenceArtificial intelligence
Artificial intelligenceVikram Kumar
 
Types Of Artificial Intelligence | Edureka
Types Of Artificial Intelligence | EdurekaTypes Of Artificial Intelligence | Edureka
Types Of Artificial Intelligence | EdurekaEdureka!
 
A Friendly Introduction to Machine Learning
A Friendly Introduction to Machine LearningA Friendly Introduction to Machine Learning
A Friendly Introduction to Machine LearningHaptik
 
2012/11/14 softlab_study 発表資料「SSDの基礎」
2012/11/14 softlab_study 発表資料「SSDの基礎」2012/11/14 softlab_study 発表資料「SSDの基礎」
2012/11/14 softlab_study 発表資料「SSDの基礎」Ryo Okubo
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learningGanesh Satpute
 
Future Trends in Artificial Intelligence
Future Trends in Artificial IntelligenceFuture Trends in Artificial Intelligence
Future Trends in Artificial IntelligenceDR.P.S.JAGADEESH KUMAR
 
Deep Learning Tutorial | Deep Learning Tutorial For Beginners | What Is Deep ...
Deep Learning Tutorial | Deep Learning Tutorial For Beginners | What Is Deep ...Deep Learning Tutorial | Deep Learning Tutorial For Beginners | What Is Deep ...
Deep Learning Tutorial | Deep Learning Tutorial For Beginners | What Is Deep ...Simplilearn
 
Artificial intelligence
Artificial intelligenceArtificial intelligence
Artificial intelligenceravijain90
 
Mis module v Artificial Intelligence
Mis module v Artificial IntelligenceMis module v Artificial Intelligence
Mis module v Artificial IntelligenceArnav Chowdhury
 
Computer vision (machine learning for developers)
Computer vision (machine learning for developers)Computer vision (machine learning for developers)
Computer vision (machine learning for developers)Rachhek Shrestha
 
Artificial Intelligence (A.I) and Its Application -Seminar
Artificial Intelligence (A.I) and Its Application -SeminarArtificial Intelligence (A.I) and Its Application -Seminar
Artificial Intelligence (A.I) and Its Application -SeminarBIJAY NAYAK
 
Information Extraction
Information ExtractionInformation Extraction
Information Extractionssbd6985
 
AI vs Machine Learning vs Deep Learning | Machine Learning Training with Pyth...
AI vs Machine Learning vs Deep Learning | Machine Learning Training with Pyth...AI vs Machine Learning vs Deep Learning | Machine Learning Training with Pyth...
AI vs Machine Learning vs Deep Learning | Machine Learning Training with Pyth...Edureka!
 
JavaOne 2012 - JVM JIT for Dummies
JavaOne 2012 - JVM JIT for DummiesJavaOne 2012 - JVM JIT for Dummies
JavaOne 2012 - JVM JIT for DummiesCharles Nutter
 
Knowledge representation in AI
Knowledge representation in AIKnowledge representation in AI
Knowledge representation in AIVishal Singh
 
ppt on machine learning to deep learning (1).pptx
ppt on machine learning to deep learning (1).pptxppt on machine learning to deep learning (1).pptx
ppt on machine learning to deep learning (1).pptxAnweshaGarima
 
Autoencoders
AutoencodersAutoencoders
AutoencodersCloudxLab
 
JVM JIT compilation overview by Vladimir Ivanov
JVM JIT compilation overview by Vladimir IvanovJVM JIT compilation overview by Vladimir Ivanov
JVM JIT compilation overview by Vladimir IvanovZeroTurnaround
 

What's hot (20)

Artificial intelligence
Artificial intelligenceArtificial intelligence
Artificial intelligence
 
AI
AIAI
AI
 
Types Of Artificial Intelligence | Edureka
Types Of Artificial Intelligence | EdurekaTypes Of Artificial Intelligence | Edureka
Types Of Artificial Intelligence | Edureka
 
A Friendly Introduction to Machine Learning
A Friendly Introduction to Machine LearningA Friendly Introduction to Machine Learning
A Friendly Introduction to Machine Learning
 
2012/11/14 softlab_study 発表資料「SSDの基礎」
2012/11/14 softlab_study 発表資料「SSDの基礎」2012/11/14 softlab_study 発表資料「SSDの基礎」
2012/11/14 softlab_study 発表資料「SSDの基礎」
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
 
Future Trends in Artificial Intelligence
Future Trends in Artificial IntelligenceFuture Trends in Artificial Intelligence
Future Trends in Artificial Intelligence
 
Deep Learning Tutorial | Deep Learning Tutorial For Beginners | What Is Deep ...
Deep Learning Tutorial | Deep Learning Tutorial For Beginners | What Is Deep ...Deep Learning Tutorial | Deep Learning Tutorial For Beginners | What Is Deep ...
Deep Learning Tutorial | Deep Learning Tutorial For Beginners | What Is Deep ...
 
Artificial intelligence
Artificial intelligenceArtificial intelligence
Artificial intelligence
 
Mis module v Artificial Intelligence
Mis module v Artificial IntelligenceMis module v Artificial Intelligence
Mis module v Artificial Intelligence
 
Computer vision (machine learning for developers)
Computer vision (machine learning for developers)Computer vision (machine learning for developers)
Computer vision (machine learning for developers)
 
AI.pptx
AI.pptxAI.pptx
AI.pptx
 
Artificial Intelligence (A.I) and Its Application -Seminar
Artificial Intelligence (A.I) and Its Application -SeminarArtificial Intelligence (A.I) and Its Application -Seminar
Artificial Intelligence (A.I) and Its Application -Seminar
 
Information Extraction
Information ExtractionInformation Extraction
Information Extraction
 
AI vs Machine Learning vs Deep Learning | Machine Learning Training with Pyth...
AI vs Machine Learning vs Deep Learning | Machine Learning Training with Pyth...AI vs Machine Learning vs Deep Learning | Machine Learning Training with Pyth...
AI vs Machine Learning vs Deep Learning | Machine Learning Training with Pyth...
 
JavaOne 2012 - JVM JIT for Dummies
JavaOne 2012 - JVM JIT for DummiesJavaOne 2012 - JVM JIT for Dummies
JavaOne 2012 - JVM JIT for Dummies
 
Knowledge representation in AI
Knowledge representation in AIKnowledge representation in AI
Knowledge representation in AI
 
ppt on machine learning to deep learning (1).pptx
ppt on machine learning to deep learning (1).pptxppt on machine learning to deep learning (1).pptx
ppt on machine learning to deep learning (1).pptx
 
Autoencoders
AutoencodersAutoencoders
Autoencoders
 
JVM JIT compilation overview by Vladimir Ivanov
JVM JIT compilation overview by Vladimir IvanovJVM JIT compilation overview by Vladimir Ivanov
JVM JIT compilation overview by Vladimir Ivanov
 

Similar to Natural Language Processing: Comparing NLTK and OpenNLP

Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingCloudxLab
 
Open nlp presentationss
Open nlp presentationssOpen nlp presentationss
Open nlp presentationssChandan Deb
 
DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf EremyanDataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyanrudolf eremyan
 
Nltk - Boston Text Analytics
Nltk - Boston Text AnalyticsNltk - Boston Text Analytics
Nltk - Boston Text Analyticsshanbady
 
NLTK - Natural Language Processing in Python
NLTK - Natural Language Processing in PythonNLTK - Natural Language Processing in Python
NLTK - Natural Language Processing in Pythonshanbady
 
AI UNIT 3 - SRCAS JOC.pptx enjoy this ppt
AI UNIT 3 - SRCAS JOC.pptx enjoy this pptAI UNIT 3 - SRCAS JOC.pptx enjoy this ppt
AI UNIT 3 - SRCAS JOC.pptx enjoy this pptpavankalyanadroittec
 
MACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSISMACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSISMassimo Schenone
 
Big Data Spain 2017 - Deriving Actionable Insights from High Volume Media St...
Big Data Spain 2017  - Deriving Actionable Insights from High Volume Media St...Big Data Spain 2017  - Deriving Actionable Insights from High Volume Media St...
Big Data Spain 2017 - Deriving Actionable Insights from High Volume Media St...Apache OpenNLP
 
NLTK: Natural Language Processing made easy
NLTK: Natural Language Processing made easyNLTK: Natural Language Processing made easy
NLTK: Natural Language Processing made easyoutsider2
 
Natural Language Processing made easy
Natural Language Processing made easyNatural Language Processing made easy
Natural Language Processing made easyGopi Krishnan Nambiar
 
Large Scale Text Processing
Large Scale Text ProcessingLarge Scale Text Processing
Large Scale Text ProcessingSuneel Marthi
 
Large Scale Processing of Unstructured Text
Large Scale Processing of Unstructured TextLarge Scale Processing of Unstructured Text
Large Scale Processing of Unstructured TextDataWorks Summit
 
Natural language processing: feature extraction
Natural language processing: feature extractionNatural language processing: feature extraction
Natural language processing: feature extractionGabriel Hamilton
 
Text Mining Infrastructure in R
Text Mining Infrastructure in RText Mining Infrastructure in R
Text Mining Infrastructure in RAshraf Uddin
 
Devoxx traitement automatique du langage sur du texte en 2019
Devoxx   traitement automatique du langage sur du texte en 2019 Devoxx   traitement automatique du langage sur du texte en 2019
Devoxx traitement automatique du langage sur du texte en 2019 Alexis Agahi
 
Natural language processing using python
Natural language processing using pythonNatural language processing using python
Natural language processing using pythonPrakash Anand
 

Similar to Natural Language Processing: Comparing NLTK and OpenNLP (20)

Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Open nlp presentationss
Open nlp presentationssOpen nlp presentationss
Open nlp presentationss
 
DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf EremyanDataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
 
Nltk - Boston Text Analytics
Nltk - Boston Text AnalyticsNltk - Boston Text Analytics
Nltk - Boston Text Analytics
 
Nltk
NltkNltk
Nltk
 
NLTK - Natural Language Processing in Python
NLTK - Natural Language Processing in PythonNLTK - Natural Language Processing in Python
NLTK - Natural Language Processing in Python
 
AI UNIT 3 - SRCAS JOC.pptx enjoy this ppt
AI UNIT 3 - SRCAS JOC.pptx enjoy this pptAI UNIT 3 - SRCAS JOC.pptx enjoy this ppt
AI UNIT 3 - SRCAS JOC.pptx enjoy this ppt
 
MACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSISMACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSIS
 
Big Data Spain 2017 - Deriving Actionable Insights from High Volume Media St...
Big Data Spain 2017  - Deriving Actionable Insights from High Volume Media St...Big Data Spain 2017  - Deriving Actionable Insights from High Volume Media St...
Big Data Spain 2017 - Deriving Actionable Insights from High Volume Media St...
 
NLP.pptx
NLP.pptxNLP.pptx
NLP.pptx
 
NLTK: Natural Language Processing made easy
NLTK: Natural Language Processing made easyNLTK: Natural Language Processing made easy
NLTK: Natural Language Processing made easy
 
Natural Language Processing made easy
Natural Language Processing made easyNatural Language Processing made easy
Natural Language Processing made easy
 
Large Scale Text Processing
Large Scale Text ProcessingLarge Scale Text Processing
Large Scale Text Processing
 
Large Scale Processing of Unstructured Text
Large Scale Processing of Unstructured TextLarge Scale Processing of Unstructured Text
Large Scale Processing of Unstructured Text
 
Natural language processing: feature extraction
Natural language processing: feature extractionNatural language processing: feature extraction
Natural language processing: feature extraction
 
Text Mining Infrastructure in R
Text Mining Infrastructure in RText Mining Infrastructure in R
Text Mining Infrastructure in R
 
Devoxx traitement automatique du langage sur du texte en 2019
Devoxx   traitement automatique du langage sur du texte en 2019 Devoxx   traitement automatique du langage sur du texte en 2019
Devoxx traitement automatique du langage sur du texte en 2019
 
Natural language processing using python
Natural language processing using pythonNatural language processing using python
Natural language processing using python
 
PYTHON PPT.pptx
PYTHON PPT.pptxPYTHON PPT.pptx
PYTHON PPT.pptx
 
Presentation1
Presentation1Presentation1
Presentation1
 

More from CodeOps Technologies LLP

AWS Serverless Event-driven Architecture - in lastminute.com meetup
AWS Serverless Event-driven Architecture - in lastminute.com meetupAWS Serverless Event-driven Architecture - in lastminute.com meetup
AWS Serverless Event-driven Architecture - in lastminute.com meetupCodeOps Technologies LLP
 
BUILDING SERVERLESS SOLUTIONS WITH AZURE FUNCTIONS
BUILDING SERVERLESS SOLUTIONS WITH AZURE FUNCTIONSBUILDING SERVERLESS SOLUTIONS WITH AZURE FUNCTIONS
BUILDING SERVERLESS SOLUTIONS WITH AZURE FUNCTIONSCodeOps Technologies LLP
 
APPLYING DEVOPS STRATEGIES ON SCALE USING AZURE DEVOPS SERVICES
APPLYING DEVOPS STRATEGIES ON SCALE USING AZURE DEVOPS SERVICESAPPLYING DEVOPS STRATEGIES ON SCALE USING AZURE DEVOPS SERVICES
APPLYING DEVOPS STRATEGIES ON SCALE USING AZURE DEVOPS SERVICESCodeOps Technologies LLP
 
BUILD, TEST & DEPLOY .NET CORE APPS IN AZURE DEVOPS
BUILD, TEST & DEPLOY .NET CORE APPS IN AZURE DEVOPSBUILD, TEST & DEPLOY .NET CORE APPS IN AZURE DEVOPS
BUILD, TEST & DEPLOY .NET CORE APPS IN AZURE DEVOPSCodeOps Technologies LLP
 
CREATE RELIABLE AND LOW-CODE APPLICATION IN SERVERLESS MANNER
CREATE RELIABLE AND LOW-CODE APPLICATION IN SERVERLESS MANNERCREATE RELIABLE AND LOW-CODE APPLICATION IN SERVERLESS MANNER
CREATE RELIABLE AND LOW-CODE APPLICATION IN SERVERLESS MANNERCodeOps Technologies LLP
 
CREATING REAL TIME DASHBOARD WITH BLAZOR, AZURE FUNCTION COSMOS DB AN AZURE S...
CREATING REAL TIME DASHBOARD WITH BLAZOR, AZURE FUNCTION COSMOS DB AN AZURE S...CREATING REAL TIME DASHBOARD WITH BLAZOR, AZURE FUNCTION COSMOS DB AN AZURE S...
CREATING REAL TIME DASHBOARD WITH BLAZOR, AZURE FUNCTION COSMOS DB AN AZURE S...CodeOps Technologies LLP
 
WRITE SCALABLE COMMUNICATION APPLICATION WITH POWER OF SERVERLESS
WRITE SCALABLE COMMUNICATION APPLICATION WITH POWER OF SERVERLESSWRITE SCALABLE COMMUNICATION APPLICATION WITH POWER OF SERVERLESS
WRITE SCALABLE COMMUNICATION APPLICATION WITH POWER OF SERVERLESSCodeOps Technologies LLP
 
Training And Serving ML Model Using Kubeflow by Jayesh Sharma
Training And Serving ML Model Using Kubeflow by Jayesh SharmaTraining And Serving ML Model Using Kubeflow by Jayesh Sharma
Training And Serving ML Model Using Kubeflow by Jayesh SharmaCodeOps Technologies LLP
 
Deploy Microservices To Kubernetes Without Secrets by Reenu Saluja
Deploy Microservices To Kubernetes Without Secrets by Reenu SalujaDeploy Microservices To Kubernetes Without Secrets by Reenu Saluja
Deploy Microservices To Kubernetes Without Secrets by Reenu SalujaCodeOps Technologies LLP
 
Leverage Azure Tech stack for any Kubernetes cluster via Azure Arc by Saiyam ...
Leverage Azure Tech stack for any Kubernetes cluster via Azure Arc by Saiyam ...Leverage Azure Tech stack for any Kubernetes cluster via Azure Arc by Saiyam ...
Leverage Azure Tech stack for any Kubernetes cluster via Azure Arc by Saiyam ...CodeOps Technologies LLP
 
YAML Tips For Kubernetes by Neependra Khare
YAML Tips For Kubernetes by Neependra KhareYAML Tips For Kubernetes by Neependra Khare
YAML Tips For Kubernetes by Neependra KhareCodeOps Technologies LLP
 
Must Know Azure Kubernetes Best Practices And Features For Better Resiliency ...
Must Know Azure Kubernetes Best Practices And Features For Better Resiliency ...Must Know Azure Kubernetes Best Practices And Features For Better Resiliency ...
Must Know Azure Kubernetes Best Practices And Features For Better Resiliency ...CodeOps Technologies LLP
 
Monitor Azure Kubernetes Cluster With Prometheus by Mamta Jha
Monitor Azure Kubernetes Cluster With Prometheus by Mamta JhaMonitor Azure Kubernetes Cluster With Prometheus by Mamta Jha
Monitor Azure Kubernetes Cluster With Prometheus by Mamta JhaCodeOps Technologies LLP
 
Functional Programming in Java 8 - Lambdas and Streams
Functional Programming in Java 8 - Lambdas and StreamsFunctional Programming in Java 8 - Lambdas and Streams
Functional Programming in Java 8 - Lambdas and StreamsCodeOps Technologies LLP
 
Distributed Tracing: New DevOps Foundation
Distributed Tracing: New DevOps FoundationDistributed Tracing: New DevOps Foundation
Distributed Tracing: New DevOps FoundationCodeOps Technologies LLP
 
"Distributed Tracing: New DevOps Foundation" by Jayesh Ahire
"Distributed Tracing: New DevOps Foundation" by Jayesh Ahire  "Distributed Tracing: New DevOps Foundation" by Jayesh Ahire
"Distributed Tracing: New DevOps Foundation" by Jayesh Ahire CodeOps Technologies LLP
 

More from CodeOps Technologies LLP (20)

AWS Serverless Event-driven Architecture - in lastminute.com meetup
AWS Serverless Event-driven Architecture - in lastminute.com meetupAWS Serverless Event-driven Architecture - in lastminute.com meetup
AWS Serverless Event-driven Architecture - in lastminute.com meetup
 
Understanding azure batch service
Understanding azure batch serviceUnderstanding azure batch service
Understanding azure batch service
 
DEVOPS AND MACHINE LEARNING
DEVOPS AND MACHINE LEARNINGDEVOPS AND MACHINE LEARNING
DEVOPS AND MACHINE LEARNING
 
SERVERLESS MIDDLEWARE IN AZURE FUNCTIONS
SERVERLESS MIDDLEWARE IN AZURE FUNCTIONSSERVERLESS MIDDLEWARE IN AZURE FUNCTIONS
SERVERLESS MIDDLEWARE IN AZURE FUNCTIONS
 
BUILDING SERVERLESS SOLUTIONS WITH AZURE FUNCTIONS
BUILDING SERVERLESS SOLUTIONS WITH AZURE FUNCTIONSBUILDING SERVERLESS SOLUTIONS WITH AZURE FUNCTIONS
BUILDING SERVERLESS SOLUTIONS WITH AZURE FUNCTIONS
 
APPLYING DEVOPS STRATEGIES ON SCALE USING AZURE DEVOPS SERVICES
APPLYING DEVOPS STRATEGIES ON SCALE USING AZURE DEVOPS SERVICESAPPLYING DEVOPS STRATEGIES ON SCALE USING AZURE DEVOPS SERVICES
APPLYING DEVOPS STRATEGIES ON SCALE USING AZURE DEVOPS SERVICES
 
BUILD, TEST & DEPLOY .NET CORE APPS IN AZURE DEVOPS
BUILD, TEST & DEPLOY .NET CORE APPS IN AZURE DEVOPSBUILD, TEST & DEPLOY .NET CORE APPS IN AZURE DEVOPS
BUILD, TEST & DEPLOY .NET CORE APPS IN AZURE DEVOPS
 
CREATE RELIABLE AND LOW-CODE APPLICATION IN SERVERLESS MANNER
CREATE RELIABLE AND LOW-CODE APPLICATION IN SERVERLESS MANNERCREATE RELIABLE AND LOW-CODE APPLICATION IN SERVERLESS MANNER
CREATE RELIABLE AND LOW-CODE APPLICATION IN SERVERLESS MANNER
 
CREATING REAL TIME DASHBOARD WITH BLAZOR, AZURE FUNCTION COSMOS DB AN AZURE S...
CREATING REAL TIME DASHBOARD WITH BLAZOR, AZURE FUNCTION COSMOS DB AN AZURE S...CREATING REAL TIME DASHBOARD WITH BLAZOR, AZURE FUNCTION COSMOS DB AN AZURE S...
CREATING REAL TIME DASHBOARD WITH BLAZOR, AZURE FUNCTION COSMOS DB AN AZURE S...
 
WRITE SCALABLE COMMUNICATION APPLICATION WITH POWER OF SERVERLESS
WRITE SCALABLE COMMUNICATION APPLICATION WITH POWER OF SERVERLESSWRITE SCALABLE COMMUNICATION APPLICATION WITH POWER OF SERVERLESS
WRITE SCALABLE COMMUNICATION APPLICATION WITH POWER OF SERVERLESS
 
Training And Serving ML Model Using Kubeflow by Jayesh Sharma
Training And Serving ML Model Using Kubeflow by Jayesh SharmaTraining And Serving ML Model Using Kubeflow by Jayesh Sharma
Training And Serving ML Model Using Kubeflow by Jayesh Sharma
 
Deploy Microservices To Kubernetes Without Secrets by Reenu Saluja
Deploy Microservices To Kubernetes Without Secrets by Reenu SalujaDeploy Microservices To Kubernetes Without Secrets by Reenu Saluja
Deploy Microservices To Kubernetes Without Secrets by Reenu Saluja
 
Leverage Azure Tech stack for any Kubernetes cluster via Azure Arc by Saiyam ...
Leverage Azure Tech stack for any Kubernetes cluster via Azure Arc by Saiyam ...Leverage Azure Tech stack for any Kubernetes cluster via Azure Arc by Saiyam ...
Leverage Azure Tech stack for any Kubernetes cluster via Azure Arc by Saiyam ...
 
YAML Tips For Kubernetes by Neependra Khare
YAML Tips For Kubernetes by Neependra KhareYAML Tips For Kubernetes by Neependra Khare
YAML Tips For Kubernetes by Neependra Khare
 
Must Know Azure Kubernetes Best Practices And Features For Better Resiliency ...
Must Know Azure Kubernetes Best Practices And Features For Better Resiliency ...Must Know Azure Kubernetes Best Practices And Features For Better Resiliency ...
Must Know Azure Kubernetes Best Practices And Features For Better Resiliency ...
 
Monitor Azure Kubernetes Cluster With Prometheus by Mamta Jha
Monitor Azure Kubernetes Cluster With Prometheus by Mamta JhaMonitor Azure Kubernetes Cluster With Prometheus by Mamta Jha
Monitor Azure Kubernetes Cluster With Prometheus by Mamta Jha
 
Jet brains space intro presentation
Jet brains space intro presentationJet brains space intro presentation
Jet brains space intro presentation
 
Functional Programming in Java 8 - Lambdas and Streams
Functional Programming in Java 8 - Lambdas and StreamsFunctional Programming in Java 8 - Lambdas and Streams
Functional Programming in Java 8 - Lambdas and Streams
 
Distributed Tracing: New DevOps Foundation
Distributed Tracing: New DevOps FoundationDistributed Tracing: New DevOps Foundation
Distributed Tracing: New DevOps Foundation
 
"Distributed Tracing: New DevOps Foundation" by Jayesh Ahire
"Distributed Tracing: New DevOps Foundation" by Jayesh Ahire  "Distributed Tracing: New DevOps Foundation" by Jayesh Ahire
"Distributed Tracing: New DevOps Foundation" by Jayesh Ahire
 

Recently uploaded

WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2
 
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2
 
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024VictoriaMetrics
 
WSO2Con2024 - Hello Choreo Presentation - Kanchana
WSO2Con2024 - Hello Choreo Presentation - KanchanaWSO2Con2024 - Hello Choreo Presentation - Kanchana
WSO2Con2024 - Hello Choreo Presentation - KanchanaWSO2
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...masabamasaba
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastPapp Krisztián
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...masabamasaba
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...chiefasafspells
 
WSO2Con2024 - From Blueprint to Brilliance: WSO2's Guide to API-First Enginee...
WSO2Con2024 - From Blueprint to Brilliance: WSO2's Guide to API-First Enginee...WSO2Con2024 - From Blueprint to Brilliance: WSO2's Guide to API-First Enginee...
WSO2Con2024 - From Blueprint to Brilliance: WSO2's Guide to API-First Enginee...WSO2
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrandmasabamasaba
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationJuha-Pekka Tolvanen
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfonteinmasabamasaba
 
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2
 
WSO2Con2024 - GitOps in Action: Navigating Application Deployment in the Plat...
WSO2Con2024 - GitOps in Action: Navigating Application Deployment in the Plat...WSO2Con2024 - GitOps in Action: Navigating Application Deployment in the Plat...
WSO2Con2024 - GitOps in Action: Navigating Application Deployment in the Plat...WSO2
 

Recently uploaded (20)

WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
Abortion Pill Prices Boksburg [(+27832195400*)] 🏥 Women's Abortion Clinic in ...
Abortion Pill Prices Boksburg [(+27832195400*)] 🏥 Women's Abortion Clinic in ...Abortion Pill Prices Boksburg [(+27832195400*)] 🏥 Women's Abortion Clinic in ...
Abortion Pill Prices Boksburg [(+27832195400*)] 🏥 Women's Abortion Clinic in ...
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
WSO2Con2024 - Hello Choreo Presentation - Kanchana
WSO2Con2024 - Hello Choreo Presentation - KanchanaWSO2Con2024 - Hello Choreo Presentation - Kanchana
WSO2Con2024 - Hello Choreo Presentation - Kanchana
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 
WSO2Con2024 - From Blueprint to Brilliance: WSO2's Guide to API-First Enginee...
WSO2Con2024 - From Blueprint to Brilliance: WSO2's Guide to API-First Enginee...WSO2Con2024 - From Blueprint to Brilliance: WSO2's Guide to API-First Enginee...
WSO2Con2024 - From Blueprint to Brilliance: WSO2's Guide to API-First Enginee...
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the Situation
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
 
WSO2Con2024 - GitOps in Action: Navigating Application Deployment in the Plat...
WSO2Con2024 - GitOps in Action: Navigating Application Deployment in the Plat...WSO2Con2024 - GitOps in Action: Navigating Application Deployment in the Plat...
WSO2Con2024 - GitOps in Action: Navigating Application Deployment in the Plat...
 

Natural Language Processing: Comparing NLTK and OpenNLP

  • 1.
  • 4.  Natural language processing (NLP) is a field of computer science, artificial intelligence concerned with the interactions between computers and human (natural) languages, and, in particular, concerned with programming computers to fruitfully process large natural language data. What???
  • 5.  In 1950, Alan Turing published an article titled "Computing Machinery and Intelligence“ which proposed what is now called the Turing test as a criterion of intelligence.  Eliza 1964 ELIZA might provide a generic response, for example, responding to "My head hurts" with "Why do you say your head hurts?". When???
  • 7. Where?  Machine Translation  Fighting Spam  Mail Inbox or Spam  Information Extraction  Social media monitoring  Summarization  Question Answering
  • 8. Tasks in OpenNLP  The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text.  It supports the most common NLP tasks, such as language detection, tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing and co reference resolution.
  • 9. Tasks in OpenNLP  Text data in the form of un-structured text, in the form of comments, reviews or articles  Extract meaning full information from them, done with the help of set of tasks  Tokenizing:  Take a large piece of text and break it into smaller components  Break it into sentences or individual words
  • 10. Stop words removal  Once Tokenized,  next Stop words removal, i.e. differentiating words which has specific meaning from the words which adds to structure to the sentence.  Eg:
  • 11. N-Grams  Once Stop words are removed,  Commonly occurring words in a sentence, because these will be most important words in the text. https://en.wikipedia.org/wiki/N-gram#Examples  If a word appears 2 times in a particular sentence, its called bigrams.  Eg:  Code(s) Description  M79.661 Pain in right lower leg  M79.662 Pain in left lower leg  M79.669 Pain in unspecified lower leg
  • 12. Word Sense Disambiguation  Eg:  I am taking aspirin for my cold  Let's go inside, I'm cold  It's cold today, only 2 degrees  It identifies the meaning of the word, based on the context it is spoken.
  • 13. Parts-of-Speech Tagging  It can either occur as part of WSD, or as a independent task.  It helps in identifying parts of speech, whether Noun, Verb, Adjective, etc. Stemming  Eg: Close, Closed, Closely, Closer  Converting the word to its base form
  • 14. Python NLTK and OpenNLP  NLTK is one of the leading platforms for working with human language data and Python, the module NLTK is used for natural language processing.  NLTK is literally an acronym for Natural Language Toolkit.  The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text.  It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution.
  • 15. Python NLTK  Step 1: Collect all individual Sentences in an article, to a list.  Tokenization () from NLTK: ie from nltk.tokenize we can import the functions sent_tokenize(breakdown into sentences) and word_tokenize(breakdown into words)  Import stopwords () from nltk.corpus module.  punctuation from string module.  Note : Sentence ends with a period symbol(.) and a space after that.
  • 16. Frequency distribution  Construct a frequency distribution : words and no of times each word occurs  Functions Defined for NLTK's Frequency Distributions Example Description fdist = FreqDist(samples) create a frequency distribution containing the given samples fdist[sample] += 1 increment the count for this sample fdist['monstrous'] count of the number of times a given sample occurred fdist.freq('monstrous') frequency of a given sample fdist.N() total number of samples fdist.most_common(n) the n most common samples and their frequencies for sample in fdist: iterate over the samples fdist.max() sample with the greatest count fdist.tabulate() tabulate the frequency distribution fdist.plot() graphical plot of the frequency distribution fdist.plot(cumulative=True) cumulative plot of the frequency distribution fdist1 |= fdist2 update fdist1 with counts from fdist2 fdist1 < fdist2 test if samples in fdist1 occur less frequently than in fdist2
  • 17. Use Tokenizing: Sentence Detector  Python Usage  Step 1: Import NLTK  Step2:  text = "Mary had a little lamp. Her fleece was as white as snow"  from nltk.tokenize import word_tokenize, sent_tokenize  sents = sent_tokenize(text)  print(sents)  Java Usage  Step 1:  Step 2: Some Java code snippet
  • 18. OpenNLP syntax  OpenNLP components have similar APIs. Normally, to execute a task, one should provide a model and an input.  A model is usually loaded by providing a FileInputStream with a model to a constructor of the model class  try (InputStream modelIn = new FileInputStream("lang-model-name.bin")) { SomeModel model = new SomeModel(modelIn); }
  • 19. Features  Language detection  https://www.apache.org/dist/opennlp/models/langdete ct/1.8.3/README.txt
  • 20. Breaking into Word  Python  words=[word_tokenize(sent) for sent in sents]  print words  Java  InputStream is = new FileInputStream("en-token.bin");  TokenizerModel model = new TokenizerModel(is);  Tokenizer tokenizer = new TokenizerME(model);  String tokens[] = tokenizer.tokenize("Hi. How are you? This is Mike.");  for (String a : tokens) System.out.println(a);  is.close();
  • 22. I hope I made myself understandable. Thanks!