Basic Natural Language Processing
using
Natural (JavaScript/Node) Library
Aniruddha Chakrabarti
AVP and Chief Architect, Digital, Mphasis
@anchakra | Linkedin.com/in/aniruddhac | slideshare.net/aniruddha.chakrabarti/
Agenda
• Emergence of Artificial Intelligence, AI First
• What is Natural Language Processing (NLP)
• Natural JavaScript/Node NLP Library
• Tokenization - Word Tokenizer
• Stemming and Lemmatization
• String Distance
• Inflectors
• Phonetics
• N-Grams
• Classifier
• tf-idf
• POS Tagger
• Spell Check
→ Turing Machine
→ Automating manual processes,
tabulating data
→ Reducing manual effort and time
→ IBM System/360 (S/360),
Mainframes, AS/400
→ Computing Power (Moore’s Law)
→ Systems need to be explicitly programmed using
explicit logic and rules. Pre programmed
→ Personal Computers (PCs), Communication
(Networked PCs, Client/Server, Internet, WWW)
→ Automating business processes
→ Mostly structured data
→ Systems that learn from historical data and can make predictions. Not
rule based system.
→ Uses Machine Learning, NLP to analyze unstructured data (text, image,
audio, video)
→ Predictive Analytics, Deep Learning, Neural Nets,
→ OCR, Speech recognition, Text to speech, Face recognition, Video
analysis, …
→ Cognitive Services (pay as you go model) – IBM Watson, Microsoft
Cognitive Services, …
→ Robotics, Internet of Things, Conversational Systems, Wearables, Blur of
physical & virtual
→ Still mostly Weak AI / Narrow AI
Third Era of Computing * - AI First/AI Everywhere (Cognitive Systems)
* From “The Computing Universe” by Tony Hey and Gyuri Pápay
→ Strong AI / Full AI
→ Artificial General
Intelligence (AGI)
Tabulating Machines
1960 – 1980
Programmable Systems
1980 - 2010
AI First/AI Everywhere
(Cognitive Systems)
2010 - Current
Real AI ?
?
AI Winter AI Summer
• Artificial Intelligence has emerged as the third era of computing, after tabulating machines and
programmable systems.
Gartner Hype Cycle … 2017
• AI technologies like Cognitive Computing, Virtual
Assistants/Chatbot, Conversational AI, Machine
Learning, Deep Learning and Autonomous Vehicles
appear at the peak in Gartner Hype Cycle of Emerging
Technologies, 2017.
• Reinforcement Learning and Artificial General
Intelligence (AGI) have appeared at the starting point of the
hype cycle – they are expected to peak in the coming years.
Emergence of “AI Everywhere”
Gartner reckons AI to be one of the
three mega trends. AI
technologies like
Conversational UI, Machine
Learning, Deep Learning and
Cognitive Computing
constitute “AI Everywhere”
What is Natural Language Processing?
• Field of computer science, artificial intelligence and computational linguistics concerned
with the interactions between computers and human (natural) languages, and, in particular,
concerned with programming computers to fruitfully process large natural language corpora –
Wikipedia
• Broadly categorized into two areas -
▪ Natural Language Understanding (NLU)
▪ Natural Language Generation (NLG)
Diagram: Natural Language Processing (NLP) branches into Natural Language Understanding (NLU) and Natural Language Generation (NLG)
Some applications of NLP
• Spell correction (MS Word/ any other editor)
• Search engines (Google, Bing, Yahoo, Wolfram Alpha)
• Speech engines (Siri, Google Voice, Cortana)
• Personal Voice Assistants (Amazon Alexa, Google Home, …)
• Spam classifiers (All e-mail services)
• News feeds (Google, Yahoo!, and so on)
• Machine translation (Google Translate, and so on)
• Chatbots, Intelligent Virtual Agent/IVA
• IBM Watson, Microsoft LUIS, Amazon Lex/Alexa
NLP Tools & Libraries
• GATE
• Mallet (Java)
• Open NLP – Apache (Java)
• UIMA
• CoreNLP - Stanford CoreNLP toolkit (Java)
• Gensim (Python)
• Natural Language Toolkit / NLTK (Python) – by far the most popular NLP library & tool
• spaCy (Python)
• TextBlob (Python) – built on top of NLTK
• Natural Library (JavaScript/Node)
What is Natural
• "Natural" is a general natural language processing library for nodejs.
• Supports basic NLP tasks like tokenizing, stemming, classification, phonetics, tf-idf, WordNet,
string similarity, inflections
• At the moment, most of the algorithms are English-specific
• Created by Chris Umbel
• Loosely based on NLTK (Python) NLP Library
• https://github.com/NaturalNode/natural
• http://www.chrisumbel.com/article/node_js_natural_language_porter_stemmer_lancaster_bayes_naive_metaphone_soundex
Natural library install and setup
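• Install using npm (the package manager for Node). Install locally in the project so that require can resolve the package; the -g switch performs a global installation, which require does not search by default
• Include the Natural package through require
npm install natural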
// include the natural library
let Natural = require('natural');
Tokenization
• A word (Token) is the minimal unit that a machine can understand and process.
• Tokenization is the process of splitting the raw string into meaningful tokens
• Raw text cannot be further processed without going through tokenization.
• Complexity of tokenization varies according to the need of the NLP application, and the
complexity of the language itself.
▪ In English it can be as simple as choosing only words and numbers through a regular
expression. But for Chinese and Japanese, it will be a very complex task.
• Two primary types of tokenizers:
▪ Word Tokenizer: Tokenizes raw text to words
▪ Sentence Tokenizer: Tokenizes raw text to sentences
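The regular-expression approach mentioned above for English can be sketched in plain JavaScript. This is only an illustration, not how Natural implements its tokenizers; the function names tokenizeWords and tokenizeSentences are hypothetical, and the sentence splitter is deliberately naive (abbreviations such as "Dr." would be split incorrectly).

```javascript
// Word tokenizer sketch: keep runs of letters and digits, drop everything else.
function tokenizeWords(text) {
  return text.match(/[A-Za-z0-9]+/g) || [];
}

// Sentence tokenizer sketch: a sentence is a run of characters up to and
// including its closing '.', '!' or '?' (naive: no handling of abbreviations).
function tokenizeSentences(text) {
  var parts = text.match(/[^.!?]+[.!?]*/g) || [];
  return parts
    .map(function (s) { return s.trim(); })
    .filter(function (s) { return s.length > 0; });
}

var text = "Hello, how are you? I don't know you!";
console.log(tokenizeWords(text));
// words: Hello, how, are, you, I, don, t, know, you
console.log(tokenizeSentences(text));
// sentences: "Hello, how are you?" and "I don't know you!"
```

Note how the apostrophe in "don't" splits the word in two; handling such cases is one reason real tokenizers are more elaborate.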
Word Tokenizer
• A word (Token) is the minimal unit that a machine can understand & process
• Tokenization is the process of splitting the raw string into meaningful tokens – Tokenizer
tokenizes or splits raw text into words
• Natural comes with multiple tokenizers -
▪ Word Tokenizer: a tokenizer that divides a text into sequences of alphabetic and
numeric characters. (Ignores punctuation)
▪ Word Punct Tokenizer: Word + punctuation tokenizer. A tokenizer that divides a text into
sequences of alphabetic and non-alphabetic characters.
▪ Treebank Word Tokenizer: uses regular expressions to tokenize text as in Penn
Treebank
▪ Regexp Tokenizer: Tokenizes text using regular expression patterns.
▪ Aggressive Tokenizer:
Word Tokenizer (Cont’d)
var sentence = "Hello, how are you? I don't know you!"
var wordTokenizer = new Natural.WordTokenizer();
var tokens = wordTokenizer.tokenize(sentence);
console.log(tokens);
// prints [ 'Hello', 'how', 'are', 'you', 'I', 'don', 't', 'know', 'you' ]
var tokenizer = new Natural.WordPunctTokenizer();
tokens = tokenizer.tokenize(sentence);
console.log(tokens);
// prints [ 'Hello', ',', 'how', 'are', 'you', '?', 'I', 'don', '\'',
// 't', 'know', 'you', '!' ]
tokenizer = new Natural.TreebankWordTokenizer();
tokens = tokenizer.tokenize(sentence);
console.log(tokens);
// prints [ 'Hello', ',', 'how', 'are', 'you', '?', 'I', 'do', 'n\'t',
// 'know', 'you', '!' ]
console.log(new Natural.AggressiveTokenizer().tokenize(sentence));
// prints [ 'Hello', 'how', 'are', 'you', 'I', 'don', 't', 'know', 'you' ]
Stemming
• Process of reducing inflected or derived words to their word stem, base or root form.
• Similar to cutting down the branches of a tree to its stem
• A crude, rule-based process that clubs together different variations of a token
• Removes -s/-es, -ing or -ed
eating, eats, eaten, eat -> eat
stopping, stopped, stops, stop -> stop
ate -> ate (wrong; should be eat)
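A minimal sketch of such a crude suffix-stripping stemmer is shown here. It is not the Porter or Lancaster algorithm and not Natural's code; the function name crudeStem, the minimum stem length of 3 and the consonant-undoubling step are assumptions made for illustration.

```javascript
// Crude stemmer sketch: strip one of a few known suffixes, longest first.
function crudeStem(word) {
  var w = word.toLowerCase();
  var suffixes = ['ing', 'ed', 'es', 's'];
  for (var i = 0; i < suffixes.length; i++) {
    var suffix = suffixes[i];
    // Only strip when at least 3 characters remain (assumed threshold).
    if (w.endsWith(suffix) && w.length - suffix.length >= 3) {
      w = w.slice(0, w.length - suffix.length);
      // After -ing / -ed, undo consonant doubling: "stopp" -> "stop".
      // l, s and z are left alone so that "fall" or "miss" keep their ending.
      if ((suffix === 'ing' || suffix === 'ed') && /([^aeiouylsz])\1$/.test(w)) {
        w = w.slice(0, -1);
      }
      break;
    }
  }
  return w;
}

console.log(crudeStem('eating'));   // eat
console.log(crudeStem('stopped'));  // stop
console.log(crudeStem('ate'));      // ate (irregular forms are not handled)
```

Because the rules only look at the end of the word, irregular forms such as "ate" pass through unchanged, which is the weakness noted above.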
Stemming (Cont’d)
• Different stemming algorithms -
▪ Lovins Stemmer - First published stemmer was written by Julie Beth Lovins in 1968.
Lovins Stemmer is not used currently.
▪ Porter Stemmer - Written by Martin Porter and published in July 1980. Very widely used; it
became the de facto standard algorithm for English stemming.
▪ Lancaster Stemmer - Paice/Husk stemmer developed at Lancaster University. The
stemmer, although remaining efficient and easily implemented, is known to be very
strong and aggressive. The stemmer utilizes a single table of rules, each of which may
specify the removal or replacement of an ending.
▪ Snowball Stemmer – Also called Porter2 stemmer, since this is an updated version of
original Porter Stemmer. Natural does not support Snowball Stemmer
• Lemmatization is a more robust and methodical way of combining grammatical variations to
the root of a word.
▪ Natural does not support any Lemmatization algorithm.
▪ NLTK and other mature NLP libraries support Lemmatization
Stemming – Porter Stemmer and Lancaster Stemmer
var porterStemmer = Natural.PorterStemmer;
console.log(porterStemmer.stem("ate")); // prints at
console.log(porterStemmer.stem("eating")); // prints eat
console.log(porterStemmer.stem("eats")); // prints eat
console.log(porterStemmer.stem("eat")); // prints eat
console.log(porterStemmer.stem("agreement")); // prints agreement
var lancasterStemmer = Natural.LancasterStemmer;
console.log(lancasterStemmer.stem("ate")); // prints at
console.log(lancasterStemmer.stem("eating")); // prints eat
console.log(lancasterStemmer.stem("eats")); // prints eat
console.log(lancasterStemmer.stem("eat")); // prints eat
console.log(lancasterStemmer.stem("agreement")); // prints agr
• Natural supports Porter Stemmer and Lancaster Stemmer only. It does not support Snowball
Stemmer.
• Both the stemmers provide a stem method
Stemming – Porter Stemmer (Non English languages)
• Natural supports Porter Stemmer in Non English languages also
• Following languages are supported -
▪ Farsi - PorterStemmerFa
▪ French - PorterStemmerFr
▪ Russian - PorterStemmerRu
▪ Spanish - PorterStemmerEs
▪ Italian - PorterStemmerIt
▪ Norwegian - PorterStemmerNo
▪ Swedish - PorterStemmerSv
▪ Portuguese - PorterStemmerPt
Lemmatization
• A more methodical way of converting all the grammatical/inflected forms of a word to its
root.
• Uses context and part of speech to determine the inflected form of the word and applies
different normalization rules for each part of speech to get the root word (lemma)
• Natural NLP library does not support Lemmatization.
Inflector
• Inflectors are used to pluralize or singularize words
• There are different types of Inflectors available in Natural Library
▪ Noun Inflector: pluralize or singularize nouns only
▪ Verb Inflector: pluralizes or singularizes verbs. Natural provides an inflector called
PresentVerbInflector, which works on present tense verbs only
▪ Both noun and verb inflectors provide singularize and pluralize methods
▪ Number or Count Inflector: forms ordinal numbers from ordinary numbers. It provides a
single method called nth, which returns the ordinal form of any number passed
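The rule behind an ordinal (count) inflector is small enough to sketch directly. The function below is a hypothetical stand-alone helper named ordinal, written for illustration under the usual English rule (11, 12 and 13 take "th"); it is not Natural's implementation of nth.

```javascript
// Ordinal sketch: append the English ordinal suffix to a non-negative integer.
function ordinal(n) {
  var lastTwo = n % 100;
  // 11th, 12th, 13th (and 111th, 212th, ...) are exceptions to the last-digit rule.
  if (lastTwo >= 11 && lastTwo <= 13) {
    return n + 'th';
  }
  switch (n % 10) {
    case 1: return n + 'st';
    case 2: return n + 'nd';
    case 3: return n + 'rd';
    default: return n + 'th';
  }
}

console.log(ordinal(1));   // 1st
console.log(ordinal(12));  // 12th
console.log(ordinal(23));  // 23rd
```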
Inflector (Cont’d)
// pluralize or singularize nouns only
var nounInflector = new Natural.NounInflector();
console.log(nounInflector.pluralize("Book")); // prints Books
console.log(nounInflector.pluralize("radius")); // prints radii
console.log(nounInflector.singularize("flies")); // prints fly
console.log(nounInflector.singularize("men")); // prints man
var countInflector = Natural.CountInflector;
console.log(countInflector.nth("1")); // prints 1st
console.log(countInflector.nth("2")); // prints 2nd
console.log(countInflector.nth("3")); // prints 3rd
console.log(countInflector.nth("4")); // prints 4th
console.log(countInflector.nth("10")); // prints 10th
var verbInflector = new Natural.PresentVerbInflector();
console.log(verbInflector.singularize("go")); // prints goes
console.log(verbInflector.singularize("run")); // prints runs
console.log(verbInflector.pluralize("becomes")); // prints become
console.log(verbInflector.pluralize("presents")); // prints present
N-Grams
• An n-gram is a contiguous sequence of n items from a given sample of text or speech.
• The items can be phonemes, syllables, letters, words or base pairs according to the
application. The n-grams typically are collected from a text or speech corpus.
• When the items are words, n-grams may also be called shingles
• An n-gram of size 1 is referred to as a "unigram"; size 2 is a "bigram"; size 3 is a "trigram".
• Larger sizes are sometimes referred to by the value of n, e.g., "four-gram",
"five-gram", and so on.
unigram: Hello how are you -> Hello | how | are | you
bigram: Hello how are you -> Hello how | how are | are you
trigram: Hello how are you -> Hello how are | how are you
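Word n-grams amount to a sliding window over the token list. The sketch below illustrates that idea with a hypothetical function named wordNgrams that splits on whitespace only; it is an assumption-laden illustration, not the code behind Natural's NGrams module.

```javascript
// N-gram sketch: slide a window of n words across the text.
function wordNgrams(text, n) {
  var words = text.split(/\s+/).filter(Boolean);
  var result = [];
  // The last window starts at index words.length - n.
  for (var i = 0; i + n <= words.length; i++) {
    result.push(words.slice(i, i + n));
  }
  return result;
}

console.log(wordNgrams('Hello how are you', 2));
// bigrams: [Hello, how], [how, are], [are, you]
console.log(wordNgrams('Hello how are you', 3));
// trigrams: [Hello, how, are], [how, are, you]
```

A text with fewer than n words yields an empty list, since no full window fits.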
N-Grams (Cont’d)
var sentence = "Hello how are you";
var ngrams = Natural.NGrams;
console.log(ngrams.bigrams(sentence));
// prints [ [ 'Hello', 'how' ], [ 'how', 'are' ], [ 'are', 'you' ] ]
console.log(ngrams.trigrams(sentence));
// prints [ [ 'Hello', 'how', 'are' ], [ 'how', 'are', 'you' ] ]
console.log(ngrams.ngrams(sentence, 1)); // unigram
//prints [ [ 'Hello' ], [ 'how' ], [ 'are' ], [ 'you' ] ]
sentence = "NLTK is a Natural Language Processing Library in Nodejs";
console.log(ngrams.ngrams(sentence, 4)); // four-gram
// prints [ [ 'NLTK', 'is', 'a', 'Natural' ],
// [ 'is', 'a', 'Natural', 'Language' ],
// [ 'a', 'Natural', 'Language', 'Processing' ],
// [ 'Natural', 'Language', 'Processing', 'Library' ],
// [ 'Language', 'Processing', 'Library', 'in' ],
// [ 'Processing', 'Library', 'in', 'Nodejs' ] ]
Phonetics
• A phonetic algorithm is an algorithm for indexing of words by their pronunciation.
• A phonetic matching algorithm is an algorithm that matches words by their pronunciation rather
than their spelling.
• Most phonetic algorithms were developed for use with the English language. Consequently,
applying the rules to words in other languages might not give a meaningful result.
• Some of the well known phonetics algorithms are –
▪ Soundex - Developed to encode surnames for use in censuses. Soundex codes are four-character
strings composed of a single letter followed by three digits.
▪ Daitch–Mokotoff Soundex - Refinement of Soundex designed to better match surnames of
Slavic & Germanic origin. Daitch–Mokotoff Soundex codes are strings composed of six
numeric digits.
▪ Cologne phonetics - Similar to Soundex, but more suitable for German words.
▪ Metaphone, Double Metaphone, and Metaphone 3 - Suitable for use with most English
words, not just names. Metaphone algorithms are basis for many popular spell checkers.
▪ New York State Identification and Intelligence System (NYSIIS) - Maps similar phonemes to
the same letter. The result is a string that can be pronounced by the reader without decoding.
▪ Match Rating Approach developed by Western Airlines in 1977 - this algorithm has an
encoding and range comparison technique.
▪ Caverphone, created to assist in data matching between late 19th century and early 20th
century electoral rolls, optimized for accents present in parts of New Zealand.
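To make the Soundex description concrete, here is a sketch of the classic American Soundex encoding (one letter followed by three digits). The function name soundexCode and the lookup table name are hypothetical, and this is an illustration of the published rules rather than Natural's SoundEx implementation.

```javascript
// Consonant groups of American Soundex; vowels, h, w and y have no digit.
var SOUNDEX_DIGITS = {
  b: '1', f: '1', p: '1', v: '1',
  c: '2', g: '2', j: '2', k: '2', q: '2', s: '2', x: '2', z: '2',
  d: '3', t: '3',
  l: '4',
  m: '5', n: '5',
  r: '6'
};

function soundexCode(name) {
  var letters = name.toLowerCase().replace(/[^a-z]/g, '');
  if (letters.length === 0) {
    return '';
  }
  var code = letters[0].toUpperCase();
  var previous = SOUNDEX_DIGITS[letters[0]] || '';
  for (var i = 1; i < letters.length && code.length < 4; i++) {
    var ch = letters[i];
    var digit = SOUNDEX_DIGITS[ch] || '';
    // Adjacent letters with the same digit are encoded once.
    if (digit && digit !== previous) {
      code += digit;
    }
    // h and w do not separate equal digits; vowels do.
    if (ch !== 'h' && ch !== 'w') {
      previous = digit;
    }
  }
  // Pad with zeros to one letter plus three digits.
  return (code + '000').slice(0, 4);
}

console.log(soundexCode('Robert'));     // R163
console.log(soundexCode('Rupert'));     // R163
console.log(soundexCode('nuremberg'));  // N651
```

Two names match phonetically under this scheme when their codes are equal, as "Robert" and "Rupert" do.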
Phonetics Matching (Cont’d)
• Natural supports Phonetic Matching using three algorithms –
▪ SoundEx
▪ Metaphone
▪ DoubleMetaphone
var metaphone = Natural.Metaphone;
var soundex = Natural.SoundEx;
var doubleMetaphone = Natural.DoubleMetaphone;
// using SoundEx for phonetic matching
console.log(soundex.compare("nuremberg", "nuremburg")); // returns true
console.log(soundex.compare("Paris", "Pari")); // returns false
// using Metaphone for phonetic matching
console.log(metaphone.compare("Fool", "Full")); // returns true
console.log(metaphone.compare("Fool", "Failed")); // returns false
// using Double Metaphone for phonetic matching
console.log(doubleMetaphone.compare("Bangalore", "Bengaluru")); // returns true
console.log(doubleMetaphone.compare("Mumbai", "Bombay")); // returns false
String Distance
• String Distance measures how closely two strings match.
• Natural provides JaroWinkler Distance and Levenshtein Distance algorithms for String
Distance match
JaroWinkler Distance
• Jaro-Winkler distance measures the similarity of two strings from the number of matching
characters and the number of transpositions among them, with extra weight given to a common prefix.
• It is a variant proposed in 1990 by William E. Winkler of the Jaro distance metric (1989,
Matthew A. Jaro).
• Returns a number between 0 and 1 which tells how closely the strings match (0 = no match,
1 = exact match)
// Using JaroWinkler Distance algorithm
console.log(Natural.JaroWinklerDistance("Hello", "Hello")); // returns 1: exact match
console.log(Natural.JaroWinklerDistance("Me", "You")); // returns 0: no match
console.log(Natural.JaroWinklerDistance("Bangalore", "Bengaluru")); // returns 0.72: partial match
console.log(Natural.JaroWinklerDistance("Mumbai", "Bombay")); // returns 0.66: partial match
String Distance - Levenshtein Distance
• Levenshtein Distance between two words is the minimum number of single-character edits
(insertions, deletions or substitutions) required to change one word into the other.
• Named after the Soviet mathematician Vladimir Levenshtein, who considered this distance
in 1965
• Also referred to as edit distance
// Using Levenshtein Distance algorithm
console.log(Natural.LevenshteinDistance("Hello", "Hello")); // 0
console.log(Natural.LevenshteinDistance("Bangalore", "Bengaluru")); // 3
console.log(Natural.LevenshteinDistance("Mumbai", "Bombay")); // 3
console.log(Natural.LevenshteinDistance("Chennai", "Madras")); // 6
console.log(Natural.LevenshteinDistance("Nuremberg", "Nuremburg")); // 1
Bangalore -> Bengaluru: 3 character changes
Nuremberg -> Nuremburg: 1 character change
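The edit distance defined above can be computed with the standard dynamic-programming recurrence. The sketch below is a stand-alone illustration with a hypothetical function name editDistance, assuming a cost of 1 for each insertion, deletion and substitution; it is not Natural's implementation, which may use different default costs.

```javascript
// Levenshtein sketch: row-by-row dynamic programming, unit cost per edit.
function editDistance(a, b) {
  var i, j;
  // previous[j] = distance between the empty prefix of a and the first j chars of b.
  var previous = [];
  for (j = 0; j <= b.length; j++) {
    previous[j] = j;
  }
  for (i = 1; i <= a.length; i++) {
    var current = [i]; // turning the first i chars of a into '' takes i deletions
    for (j = 1; j <= b.length; j++) {
      var substitution = previous[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1);
      var deletion = previous[j] + 1;
      var insertion = current[j - 1] + 1;
      current[j] = Math.min(substitution, deletion, insertion);
    }
    previous = current;
  }
  return previous[b.length];
}

console.log(editDistance('Bangalore', 'Bengaluru')); // 3
console.log(editDistance('Nuremberg', 'Nuremburg')); // 1
```

Only two rows of the table are kept at a time, so memory grows with the length of the second string rather than with the product of both lengths.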
tf-idf
• tf–idf or TFIDF is short for term frequency - inverse document frequency
• tf-idf determines how important a word (or words) is to a document relative to a corpus.
• Often used as weighting factor in searches of information retrieval, text mining & user modeling.
• The tf-idf value increases proportionally to the number of times a word appears in the
document and is offset by the frequency of the word in the corpus, which helps to adjust for
the fact that some words appear more frequently in general.
• tfidf method returns the measure of importance of a word
var tfidf = new Natural.TfIdf();
// Documents could be added to tf-idf. Here only a single doc is added, but more could be added
tfidf.addDocument("this document is about node. Its also about NLP. Node is used for it");
// Find out the tf-idf of different words in the document
console.log(tfidf.tfidf("node", 0)); // prints 0.61 as node appears multiple times in the doc
console.log(tfidf.tfidf("NLP", 0)); // prints 0.30 as NLP appears only single time
console.log(tfidf.tfidf("ruby", 0)); // prints 0 as ruby does not appear in the doc
console.log(tfidf.listTerms(0));
// prints [ { term: 'node', tfidf: 0.6137056388801094 },
// { term: 'document', tfidf: 0.3068528194400547 },
// { term: 'nlp', tfidf: 0.3068528194400547 },
// { term: 'used', tfidf: 0.3068528194400547 } ]
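The numbers above are consistent with a weighting of tf × (1 + ln(N / (1 + df))), where tf is the raw count of the term in the document, N the number of documents and df the number of documents containing the term. The sketch below applies that formula in plain JavaScript; the formula is inferred from the outputs shown in these slides and is an assumption about the weighting, and the function names simpleTokens and tfidfScore are hypothetical.

```javascript
// Lower-case the text and keep runs of letters and digits as tokens.
function simpleTokens(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// tf-idf sketch: raw term count times a smoothed inverse document frequency.
function tfidfScore(term, docIndex, documents) {
  var t = term.toLowerCase();
  var tokenized = documents.map(simpleTokens);
  // tf: how often the term occurs in the chosen document
  var tf = tokenized[docIndex].filter(function (w) { return w === t; }).length;
  // df: how many documents contain the term at least once
  var df = tokenized.filter(function (words) { return words.indexOf(t) !== -1; }).length;
  var idf = 1 + Math.log(documents.length / (1 + df));
  return tf * idf;
}

var docs = ['this document is about node. Its also about NLP. Node is used for it'];
console.log(tfidfScore('node', 0, docs)); // about 0.61 (two occurrences)
console.log(tfidfScore('NLP', 0, docs));  // about 0.31 (one occurrence)
console.log(tfidfScore('ruby', 0, docs)); // 0 (term not in the document)
```

With a single document the idf factor is the same for every term present, so the score simply scales with the term count; the offsetting effect of the corpus only shows once more documents are added.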
tf-idf (cont’d)
• Files on disk could also be added to tf-idf
• Multiple documents could be added to tf-idf
var tfidf = new Natural.TfIdf();
// Adding files from disc to tfidf
tfidf.addFileSync("C:/Data/Profile.txt");
console.log(tfidf.listTerms(0));
// Multiple documents added to tf-idf, which together form the entire corpus
// (a fresh instance is used so that document indexes start at 0 again)
tfidf = new Natural.TfIdf();
tfidf.addDocument('this document is about node. Its also about NLP. Node is used for it');
tfidf.addDocument('this document is about ruby.');
tfidf.addDocument('this document is about ruby and node.');
console.log(tfidf.tfidf("node", 0)); // prints 2
console.log(tfidf.tfidf("NLP", 0)); // prints 1.40
console.log(tfidf.tfidf("ruby", 0)); // prints 0
console.log(tfidf.tfidf("node", 1)); // prints 0 as node does not appear in 2nd doc
console.log(tfidf.tfidf("ruby", 1)); // prints 1 as ruby appears in 2nd doc
console.log(tfidf.tfidf("node", 2)); // prints 1 as node appears in 3rd doc
console.log(tfidf.tfidf("ruby", 2)); // prints 1 as ruby appears in 3rd doc
tf-idf (cont’d)
• tfidfs method returns the measure of importance of a word in each of the documents
• tfidfs method accepts the word and a callback
// Multiple documents added to tf-idf, which together form the entire corpus
tfidf.addDocument('this document is about node. Its also about NLP. Node is used for it');
tfidf.addDocument('this document is about ruby.');
tfidf.addDocument('this document is about ruby and node.');
// tfidfs method is used to find the importance of the word across multiple documents
tfidf.tfidfs('node', function(ctr, measure){
  console.log('tf-idf of node in document #' + ctr + ' is ' + measure);
});
POS (Part of Speech) Tagging
• Process of marking up a word in a text (corpus) as corresponding to a particular part of
speech, based on both its definition and its context—i.e., its relationship with adjacent and
related words in a phrase, sentence, or paragraph.
• Also called grammatical tagging or word-category disambiguation.
POS (Part of Speech) Tagging
• Current state-of-the-art POS tagging algorithms can predict the POS of a given word with
a high degree of accuracy (approximately 97%), but a lot of research is still going on
in the area of POS tagging.
No   Tag    Description
1.   CC     Coordinating conjunction
2.   CD     Cardinal number
3.   DT     Determiner
4.   EX     Existential there
5.   FW     Foreign word
6.   IN     Preposition or subordinating conjunction
7.   JJ     Adjective
8.   JJR    Adjective, comparative
9.   JJS    Adjective, superlative
10.  LS     List item marker
11.  MD     Modal
12.  NN     Noun, singular or mass
13.  NNS    Noun, plural
14.  NNP    Proper noun, singular
15.  NNPS   Proper noun, plural
16.  PDT    Predeterminer
17.  POS    Possessive ending
18.  PRP    Personal pronoun
19.  PRP$   Possessive pronoun
20.  RB     Adverb
21.  RBR    Adverb, comparative
22.  RBS    Adverb, superlative
23.  RP     Particle
24.  SYM    Symbol
25.  TO     to
26.  UH     Interjection
27.  VB     Verb, base form
28.  VBD    Verb, past tense
29.  VBG    Verb, gerund or present participle
30.  VBN    Verb, past participle
31.  VBP    Verb, non-3rd person singular present
32.  VBZ    Verb, 3rd person singular present
33.  WDT    Wh-determiner
34.  WP     Wh-pronoun
35.  WP$    Possessive wh-pronoun
36.  WRB    Wh-adverb
POS Tagging – Brill POS Tagger
• Natural supports POS tagging through Brill POS Tagger that implements Eric Brill's
transformational algorithm (transformation rules are specified in external files).
• Brill's tagger, one of the most widely used English POS taggers, employs rule-based algorithms.
// Node's path module is needed to build the file locations below
var path = require('path');
// Path where natural library is located
var baseFolder = path.join(path.dirname(require.resolve("natural")), "brill_pos_tagger");
// Rules file located in /data/<language> sub folder under natural library
var rulesFilename = baseFolder + "/data/English/tr_from_posjs.txt";
// Lexicon file located in /data/<language> sub folder under natural library
var lexiconFilename = baseFolder + "/data/English/lexicon_from_posjs.json";
var defaultCategory = 'N';
var lexicon = new Natural.Lexicon(lexiconFilename, defaultCategory);
var rules = new Natural.RuleSet(rulesFilename);
// Any tagger needs lexicon and rules for successful POS tagging of words
// Brill POS Tagger object is created passing lexicon file and rules file location
var tagger = new Natural.BrillPOSTagger(lexicon, rules);
var sentence = "I see the man with the telescope";
var tokenizer = new Natural.WordTokenizer();
// tokenize the sentence to tokens
var tokens = tokenizer.tokenize(sentence);
console.log(tagger.tag(tokens));
// prints [ [ 'I', 'NN' ],
// [ 'see', 'VB' ],
// [ 'the', 'DT' ],
// [ 'man', 'NN' ],
// [ 'with', 'IN' ],
// [ 'the', 'DT' ],
// [ 'telescope', 'NN' ] ]
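The two ingredients of a transformation-based tagger, a lexicon for the initial tags and rules that correct them from context, can be sketched without the library. Everything in the sketch below is hypothetical and for illustration only: the tiny lexicon, the single rule ("change NN to VB when the previous tag is TO"), the default tag NN and the function name tagTokens. It does not read Natural's lexicon or rule files.

```javascript
// Hypothetical mini lexicon: word -> most likely tag.
var miniLexicon = new Map([
  ['i', 'PRP'], ['want', 'VBP'], ['to', 'TO'],
  ['book', 'NN'], ['a', 'DT'], ['flight', 'NN']
]);

// Hypothetical transformation rule: retag `from` as `to` after `previousTag`.
var miniRules = [
  { from: 'NN', to: 'VB', previousTag: 'TO' }
];

function tagTokens(tokens) {
  // Step 1: initial tagging from the lexicon, unknown words default to NN.
  var tags = tokens.map(function (token) {
    return miniLexicon.get(token.toLowerCase()) || 'NN';
  });
  // Step 2: apply each transformation rule from left to right.
  miniRules.forEach(function (rule) {
    for (var i = 1; i < tags.length; i++) {
      if (tags[i] === rule.from && tags[i - 1] === rule.previousTag) {
        tags[i] = rule.to;
      }
    }
  });
  return tokens.map(function (token, i) {
    return [token, tags[i]];
  });
}

console.log(tagTokens(['I', 'want', 'to', 'book', 'a', 'flight']));
// 'book' starts as NN from the lexicon and is retagged VB because it follows 'to'
```

The lexicon alone would tag "book" as a noun in every sentence; the contextual rule is what lets the tagger tell "to book" apart from "a book".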

More Related Content

What's hot

Context free grammar
Context free grammar Context free grammar
Context free grammar
Mohammad Ilyas Malik
 
S/MIME
S/MIMES/MIME
S/MIME
maria azam
 
Natural language processing: feature extraction
Natural language processing: feature extractionNatural language processing: feature extraction
Natural language processing: feature extraction
Gabriel Hamilton
 
3.5 equivalence of pushdown automata and cfl
3.5 equivalence of pushdown automata and cfl3.5 equivalence of pushdown automata and cfl
3.5 equivalence of pushdown automata and cfl
Sampath Kumar S
 
Message authentication and hash function
Message authentication and hash functionMessage authentication and hash function
Message authentication and hash function
omarShiekh1
 
Symmetric Key Algorithm
Symmetric Key AlgorithmSymmetric Key Algorithm
Symmetric Key Algorithm
SHUBHA CHATURVEDI
 
Remote invocation
Remote invocationRemote invocation
Remote invocation
ishapadhy
 
MoM2010: Arabic natural language processing
MoM2010: Arabic natural language processingMoM2010: Arabic natural language processing
MoM2010: Arabic natural language processingHend Al-Khalifa
 
NLP_KASHK:Smoothing N-gram Models
NLP_KASHK:Smoothing N-gram ModelsNLP_KASHK:Smoothing N-gram Models
NLP_KASHK:Smoothing N-gram Models
Hemantha Kulathilake
 
Detection of phishing websites
Detection of phishing websitesDetection of phishing websites
Detection of phishing websites
m srikanth
 
x.509-Directory Authentication Service
x.509-Directory Authentication Servicex.509-Directory Authentication Service
x.509-Directory Authentication Service
Swathy T
 
RECURSIVE DESCENT PARSING
RECURSIVE DESCENT PARSINGRECURSIVE DESCENT PARSING
RECURSIVE DESCENT PARSING
Jothi Lakshmi
 
Emotion Speech Recognition - Convolutional Neural Network Capstone Project
Emotion Speech Recognition - Convolutional Neural Network Capstone ProjectEmotion Speech Recognition - Convolutional Neural Network Capstone Project
Emotion Speech Recognition - Convolutional Neural Network Capstone Project
Diego Rios
 
HTTP Request and Response Structure
HTTP Request and Response StructureHTTP Request and Response Structure
HTTP Request and Response Structure
BhagyashreeGajera1
 
Natural language processing (NLP) introduction
Natural language processing (NLP) introductionNatural language processing (NLP) introduction
Natural language processing (NLP) introduction
Robert Lujo
 
Lecture Notes-Finite State Automata for NLP.pdf
Lecture Notes-Finite State Automata for NLP.pdfLecture Notes-Finite State Automata for NLP.pdf
Lecture Notes-Finite State Automata for NLP.pdf
Deptii Chaudhari
 
Introduction to Google App Engine
Introduction to Google App EngineIntroduction to Google App Engine
Introduction to Google App Engine
rajdeep
 
Virus and its CounterMeasures -- Pruthvi Monarch
Virus and its CounterMeasures                         -- Pruthvi Monarch Virus and its CounterMeasures                         -- Pruthvi Monarch
Virus and its CounterMeasures -- Pruthvi Monarch
Pruthvi Monarch
 
Semi supervised approach for word sense disambiguation
Semi supervised approach for word sense disambiguationSemi supervised approach for word sense disambiguation
Semi supervised approach for word sense disambiguation
kokanechandrakant
 
Enterprise messaging with jms
Enterprise messaging with jmsEnterprise messaging with jms
Enterprise messaging with jms
Sridhar Reddy
 

What's hot (20)

Context free grammar
Context free grammar Context free grammar
Context free grammar
 
S/MIME
S/MIMES/MIME
S/MIME
 
Natural language processing: feature extraction
Natural language processing: feature extractionNatural language processing: feature extraction
Natural language processing: feature extraction
 
3.5 equivalence of pushdown automata and cfl
3.5 equivalence of pushdown automata and cfl3.5 equivalence of pushdown automata and cfl
3.5 equivalence of pushdown automata and cfl
 
Message authentication and hash function
Message authentication and hash functionMessage authentication and hash function
Message authentication and hash function
 
Symmetric Key Algorithm
Symmetric Key AlgorithmSymmetric Key Algorithm
Symmetric Key Algorithm
 
Remote invocation
Remote invocationRemote invocation
Remote invocation
 
MoM2010: Arabic natural language processing
MoM2010: Arabic natural language processingMoM2010: Arabic natural language processing
MoM2010: Arabic natural language processing
 
NLP_KASHK:Smoothing N-gram Models
NLP_KASHK:Smoothing N-gram ModelsNLP_KASHK:Smoothing N-gram Models
NLP_KASHK:Smoothing N-gram Models
 
Detection of phishing websites
Detection of phishing websitesDetection of phishing websites
Detection of phishing websites
 
x.509-Directory Authentication Service
x.509-Directory Authentication Servicex.509-Directory Authentication Service
x.509-Directory Authentication Service
 
RECURSIVE DESCENT PARSING
RECURSIVE DESCENT PARSINGRECURSIVE DESCENT PARSING
RECURSIVE DESCENT PARSING
 
Emotion Speech Recognition - Convolutional Neural Network Capstone Project
Emotion Speech Recognition - Convolutional Neural Network Capstone ProjectEmotion Speech Recognition - Convolutional Neural Network Capstone Project
Emotion Speech Recognition - Convolutional Neural Network Capstone Project
 
HTTP Request and Response Structure
HTTP Request and Response StructureHTTP Request and Response Structure
HTTP Request and Response Structure
 
Natural language processing (NLP) introduction
Natural language processing (NLP) introductionNatural language processing (NLP) introduction
Natural language processing (NLP) introduction
 
Lecture Notes-Finite State Automata for NLP.pdf
Lecture Notes-Finite State Automata for NLP.pdfLecture Notes-Finite State Automata for NLP.pdf
Lecture Notes-Finite State Automata for NLP.pdf
 
Introduction to Google App Engine
Introduction to Google App EngineIntroduction to Google App Engine
Introduction to Google App Engine
 
Virus and its CounterMeasures -- Pruthvi Monarch
Virus and its CounterMeasures                         -- Pruthvi Monarch Virus and its CounterMeasures                         -- Pruthvi Monarch
Virus and its CounterMeasures -- Pruthvi Monarch
 
Semi supervised approach for word sense disambiguation
Semi supervised approach for word sense disambiguationSemi supervised approach for word sense disambiguation
Semi supervised approach for word sense disambiguation
 
Enterprise messaging with jms
Enterprise messaging with jmsEnterprise messaging with jms
Enterprise messaging with jms
 

Similar to NLP using JavaScript Natural Library

Speech recognizers & generators
Speech recognizers & generatorsSpeech recognizers & generators
Speech recognizers & generators
Paul Kahoro
 
Natural Language Processing Tools for the Digital Humanities
Natural Language Processing Tools for the Digital HumanitiesNatural Language Processing Tools for the Digital Humanities
Natural Language Processing Tools for the Digital Humanities
Xiang Li
 
தமிழ்க்கணிமை கட்டமைப்பு
தமிழ்க்கணிமை கட்டமைப்புதமிழ்க்கணிமை கட்டமைப்பு
தமிழ்க்கணிமை கட்டமைப்பு
BalaSundaraRaman (Sundar)
 
The Holistic Programmer
NLP using JavaScript Natural Library

  • 1. Basic Natural Language Processing using Natural (JavaScript/Node) Library Aniruddha Chakrabarti AVP and Chief Architect, Digital, Mphasis @anchakra | Linkedin.com/in/aniruddhac | slideshare.net/aniruddha.chakrabarti/
  • 2. Agenda • Emergence of Artificial Intelligence, AI First • What is Natural Language Processing (NLP) • Natural JavaScript/Node NLP Library • Tokenization - Word Tokenizer • Stemming and Lemmatization • String Distance • Inflectors • Phonetics • N-Grams • Classifier • tf-idf • POS Tagger • Spell Check
  • 3. → Turing Machine → Automating manual processes, tabulating data → Reducing manual effort and time → IBM System/360 (S/360), Mainframes, AS/400 → Computing Power (Moore’s Law) → Systems need to be explicitly programmed using explicit logic and rules. Pre programmed → Personal Computers (PCs), Communication (Networked PCs, Client/Server, Internet, WWW) → Automating business processes → Mostly structured data → Systems that learn from historical data and can make predictions. Not rule based system. → Uses Machine Learning, NLP to analyze unstructured data (text, image, audio, video) → Predictive Analytics, Deep Learning, Neural Nets, → OCR, Speech recognition, Text to speech, Face recognition, Video analysis, … → Cognitive Services (pay as you go model) – IBM Watson, Microsoft Cognitive Services, … → Robotics, Internet of Things, Conversational Systems, Wearables, Blur of physical & virtual → Still mostly Weak AI / Narrow AI Third Era of Computing * - AI First/AI Everywhere (Cognitive Systems) * From “The Computing Universe” by Tony Hey and Gyuri Pápay → Strong AI / Full AI → Artificial General Intelligence (AGI) Tabulating Machines 1960 – 1980 Programmable Systems 1980 - 2010 AI First/AI Everywhere (Cognitive Systems) 2010 - Current Real AI ? ? AI Winter AI Summer • Artificial Intelligence has emerged as the third era of computing after tabulating machines and programmable systems.
  • 4. Gartner Hype Cycle … 2017 • AI technologies like Cognitive Computing, Virtual Assistants/Chatbot, Conversational AI, Machine Learning, Deep Learning and Autonomous Vehicles appear at the peak in Gartner Hype Cycle of Emerging Technologies, 2017. • Reinforcement Learning and Artificial General Intelligence (AGI) have appeared at the start of the hype cycle; they are expected to peak in coming years.
  • 5. Emergence of “AI Everywhere” Gartner reckons AI is one of the three mega trends. AI technologies like Conversational UI, Machine Learning, Deep Learning and Cognitive Computing constitute “AI Everywhere”
  • 6. What is Natural Language Processing? • Field of computer science, artificial intelligence and computational linguistics concerned with the interactions between computers and human (natural) languages, and, in particular, concerned with programming computers to fruitfully process large natural language corpora – Wikipedia • Broadly categorized into two areas - ▪ Natural Language Understanding (NLU) ▪ Natural Language Generation (NLG)
  • 7. Some applications of NLP • Spell correction (MS Word / any other editor) • Search engines (Google, Bing, Yahoo, Wolfram Alpha) • Speech engines (Siri, Google Voice, Cortana) • Personal Voice Assistants (Amazon Alexa, Google Home, …) • Spam classifiers (All e-mail services) • News feeds (Google, Yahoo!, and so on) • Machine translation (Google Translate, and so on) • Chatbots, Intelligent Virtual Agent/IVA • IBM Watson, Microsoft LUIS, Amazon Lex/Alexa
  • 8. NLP Tools & Libraries • GATE • Mallet (Java) • Apache OpenNLP (Java) • UIMA • CoreNLP - Stanford CoreNLP toolkit (Java) • Gensim (Python) • Natural Language Toolkit / NLTK (Python) – by far the most popular NLP library & tool • spaCy (Python) • TextBlob (Python) – built on top of NLTK • Natural Library (JavaScript/Node)
  • 9. What is Natural • "Natural" is a general natural language processing library for Node.js. • Supports basic NLP tasks like tokenizing, stemming, classification, phonetics, tf-idf, WordNet, string similarity, inflections • At the moment, most of the algorithms are English-specific • Created by Chris Umbel • Loosely based on NLTK (Python) NLP Library • https://github.com/NaturalNode/natural • http://www.chrisumbel.com/article/node_js_natural_language_porter_stemmer_lancaster_bayes_naive_metaphone_soundex
  • 10. Natural library install and setup • Install using npm (package manager for Node). Install locally in the project folder so that require can resolve the package; a global install (the -g switch) is not on require's default search path. • Include the Natural package through require
        npm install natural
        // include the natural library
        let Natural = require('natural');
  • 11. Tokenization • A word (Token) is the minimal unit that a machine can understand and process. • Tokenization is the process of splitting the raw string into meaningful tokens • Raw text cannot be further processed without going through tokenization. • Complexity of tokenization varies according to the need of the NLP application, and the complexity of the language itself. ▪ In English it can be as simple as choosing only words and numbers through a regular expression. But for Chinese and Japanese, it will be a very complex task. • Two primary types of tokenizers: ▪ Word Tokenizer: Tokenizes raw text to words ▪ Sentence Tokenizer: Tokenizes raw text to sentences
  • 12. Word Tokenizer • A word (Token) is the minimal unit that a machine can understand & process • Tokenization is the process of splitting the raw string into meaningful tokens – Tokenizer tokenizes or splits raw text into words • Natural comes with multiple tokenizers - ▪ Word Tokenizer: a tokenizer that divides a text into sequences of alphabetic and numeric characters. (Ignores punctuation) ▪ Word Punct Tokenizer: Word + punctuation tokenizer. A tokenizer that divides a text into sequences of alphabetic and non-alphabetic characters. ▪ Treebank Word Tokenizer: uses regular expressions to tokenize text as in Penn Treebank ▪ Regexp Tokenizer: Tokenizes text using regular expression patterns. ▪ Aggressive Tokenizer:
  • 13. Word Tokenizer (Cont’d)
        var sentence = "Hello, how are you? I don't know you!";

        // WordTokenizer keeps only alphabetic and numeric runs
        var wordTokenizer = new Natural.WordTokenizer();
        var tokens = wordTokenizer.tokenize(sentence);
        console.log(tokens);
        // prints [ 'Hello', 'how', 'are', 'you', 'I', 'don', 't', 'know', 'you' ]

        // WordPunctTokenizer also returns the punctuation
        var tokenizer = new Natural.WordPunctTokenizer();
        tokens = tokenizer.tokenize(sentence);
        console.log(tokens);
        // prints [ 'Hello', ', ', 'how', 'are', 'you', '? ', 'I', 'don', '\'',
        //          't', 'know', 'you', '!' ]

        // TreebankWordTokenizer follows Penn Treebank conventions and splits n't
        tokenizer = new Natural.TreebankWordTokenizer();
        tokens = tokenizer.tokenize(sentence);
        console.log(tokens);
        // prints [ 'Hello', ',', 'how', 'are', 'you', '?', 'I', 'do', 'n\'t',
        //          'know', 'you', '!' ]

        console.log(new Natural.AggressiveTokenizer().tokenize(sentence));
        // prints [ 'Hello', 'how', 'are', 'you', 'I', 'don', 't', 'know', 'you' ]
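The difference between a word tokenizer and a word-plus-punctuation tokenizer can be sketched with two regular expressions and no library at all. This is only a rough illustration of the idea, not Natural's implementation; the helper names `wordTokens` and `wordPunctTokens` are hypothetical.

```javascript
// Hypothetical helpers that approximate the behaviour described in the slides.
// They are not part of Natural and do not reproduce its exact output.

// Keep only runs of letters and digits, so punctuation is dropped.
function wordTokens(text) {
  return text.match(/[A-Za-z0-9]+/g) || [];
}

// Keep runs of letters/digits and emit every punctuation character as its own token.
function wordPunctTokens(text) {
  return text.match(/[A-Za-z0-9]+|[^\sA-Za-z0-9]/g) || [];
}

const sample = "Hello, how are you? I don't know you!";
console.log(wordTokens(sample));
// prints [ 'Hello', 'how', 'are', 'you', 'I', 'don', 't', 'know', 'you' ]
console.log(wordPunctTokens(sample).length); // prints 13
```

Both helpers are ASCII-only, which is enough for English text; accented letters or languages without whitespace between words would need a different approach.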
  • 14. Stemming • Process of reducing inflected or derived words to their word stem, base or root form. • Similar to cutting down the branches of a tree to its stem • A crude, rule-based process that groups different variations of a token together • Removes suffixes such as -s/-es, -ing or -ed • eating, eats, eaten, eat -> eat • stopping, stopped, stops, stop -> stop • ate -> ate (wrong, should be eat)
  • 15. Stemming (Cont’d) • Different stemming algorithms - ▪ Lovins Stemmer - First published stemmer, written by Julie Beth Lovins in 1968. Lovins Stemmer is not used currently. ▪ Porter Stemmer - Written by Martin Porter and published in July 1980. Very widely used; it became the de facto standard algorithm for English stemming. ▪ Lancaster Stemmer - Paice/Husk stemmer developed at Lancaster University. The stemmer, although remaining efficient and easily implemented, is known to be very strong and aggressive. The stemmer utilizes a single table of rules, each of which may specify the removal or replacement of an ending. ▪ Snowball Stemmer – Also called Porter2 stemmer, since this is an updated version of the original Porter Stemmer. Natural does not support Snowball Stemmer • Lemmatization is a more robust and methodical way of combining grammatical variations to the root of a word. ▪ Natural does not support any Lemmatization algorithm. ▪ NLTK and other mature NLP libraries support Lemmatization
  • 16. Stemming – Porter Stemmer and Lancaster Stemmer
        var porterStemmer = Natural.PorterStemmer;
        console.log(porterStemmer.stem("ate"));          // prints at
        console.log(porterStemmer.stem("eating"));       // prints eat
        console.log(porterStemmer.stem("eats"));         // prints eat
        console.log(porterStemmer.stem("eat"));          // prints eat
        console.log(porterStemmer.stem("agreement"));    // prints agreement

        var lancasterStemmer = Natural.LancasterStemmer;
        console.log(lancasterStemmer.stem("ate"));       // prints at
        console.log(lancasterStemmer.stem("eating"));    // prints eat
        console.log(lancasterStemmer.stem("eats"));      // prints eat
        console.log(lancasterStemmer.stem("eat"));       // prints eat
        console.log(lancasterStemmer.stem("agreement")); // prints agr
      • Natural supports Porter Stemmer and Lancaster Stemmer only. It does not support Snowball Stemmer. • Both the stemmers provide a stem method
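To see why stemming is called a crude rule-based process, here is a deliberately minimal suffix stripper. It is far simpler than the Porter or Lancaster algorithms and is not how Natural implements them; `crudeStem` and its suffix list are assumptions made purely for illustration.

```javascript
// A toy suffix stripper: hypothetical, and much cruder than Porter or Lancaster.
function crudeStem(word) {
  const w = word.toLowerCase();
  // Try longer suffixes first so that "es" is preferred over "s".
  for (const suffix of ["ing", "ed", "es", "s"]) {
    // Only strip when at least three characters would remain.
    if (w.endsWith(suffix) && w.length - suffix.length >= 3) {
      return w.slice(0, -suffix.length);
    }
  }
  return w;
}

console.log(crudeStem("eating"));   // prints eat
console.log(crudeStem("eats"));     // prints eat
console.log(crudeStem("ate"));      // prints ate
console.log(crudeStem("stopping")); // prints stopp
```

The last call shows the limits of such a sketch: without a rule for doubled consonants, "stopping" becomes "stopp". Real stemmers add many rules of that kind, and irregular forms such as "ate" are beyond what suffix rules can handle at all.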
  • 17. Stemming – Porter Stemmer (non-English languages) • Natural also supports the Porter Stemmer for non-English languages • Following languages are supported - ▪ Farsi - PorterStemmerFa ▪ French - PorterStemmerFr ▪ Russian - PorterStemmerRu ▪ Spanish - PorterStemmerEs ▪ Italian - PorterStemmerIt ▪ Norwegian - PorterStemmerNo ▪ Swedish - PorterStemmerSv ▪ Portuguese - PorterStemmerPt
  • 18. Lemmatization • More methodical way of converting all the grammatical/inflected forms of a word to its root. • Uses context and part of speech to identify the inflected form of the word, and applies different normalization rules for each part of speech to get the root word (lemma) • Natural NLP library does not support Lemmatization.
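Since Natural offers no lemmatizer, the idea can only be sketched by hand. The toy below looks a word up in a small exception table chosen by part of speech and falls back to a per-part-of-speech suffix rule. The table contents, the `lemmatize` helper and the "verb"/"noun" labels are all illustrative assumptions; a real lemmatizer relies on a full dictionary such as WordNet.

```javascript
// Toy lemmatizer: a tiny hand-written exception table keyed by part of speech.
// Every entry here is an illustrative assumption, not data from any library.
const IRREGULAR = {
  verb: { ate: "eat", eaten: "eat", went: "go", was: "be" },
  noun: { men: "man", mice: "mouse", feet: "foot" },
};

function lemmatize(word, pos) {
  const w = word.toLowerCase();
  const table = IRREGULAR[pos] || {};
  // Irregular forms are resolved by lookup, which a stemmer cannot do.
  if (Object.prototype.hasOwnProperty.call(table, w)) return table[w];
  // Otherwise apply a simple rule that depends on the part of speech.
  if (pos === "verb" && w.endsWith("ing") && w.length > 5) return w.slice(0, -3);
  if (pos === "noun" && w.endsWith("s") && !w.endsWith("ss")) return w.slice(0, -1);
  return w;
}

console.log(lemmatize("ate", "verb"));    // prints eat
console.log(lemmatize("eating", "verb")); // prints eat
console.log(lemmatize("men", "noun"));    // prints man
console.log(lemmatize("books", "noun"));  // prints book
```

Unlike the stemmers, this maps "ate" to "eat", but only because the form was listed explicitly; coverage depends entirely on the size of the dictionary.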
  • 19. Inflector • Inflectors are used to pluralize or singularize words • There are different types of Inflectors available in Natural Library ▪ Noun Inflector: pluralizes or singularizes nouns only ▪ Verb Inflector: Verbs can be pluralized/singularized with a Verb Inflector. Natural provides an inflector called PresentVerbInflector which works on present tense verbs only ▪ Both noun and verb inflectors provide singularize and pluralize methods ▪ Number or Count Inflector: forms the ordinal of a normal number ▪ Provides a single method called nth which returns the ordinal form of any number passed
  • 20. Inflector (Cont’d)
        // pluralize or singularize nouns only
        var nounInflector = new Natural.NounInflector();
        console.log(nounInflector.pluralize("Book"));      // prints Books
        console.log(nounInflector.pluralize("radius"));    // prints radii
        console.log(nounInflector.singularize("flies"));   // prints fly
        console.log(nounInflector.singularize("men"));     // prints man

        // ordinal form of a number
        var countInflector = Natural.CountInflector;
        console.log(countInflector.nth("1"));              // prints 1st
        console.log(countInflector.nth("2"));              // prints 2nd
        console.log(countInflector.nth("3"));              // prints 3rd
        console.log(countInflector.nth("4"));              // prints 4th
        console.log(countInflector.nth("10"));             // prints 10th

        // present tense verbs only
        var verbInflector = new Natural.PresentVerbInflector();
        console.log(verbInflector.singularize("go"));      // prints goes
        console.log(verbInflector.singularize("run"));     // prints runs
        console.log(verbInflector.pluralize("becomes"));   // prints become
        console.log(verbInflector.pluralize("presents"));  // prints present
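The rule behind a count inflector is short enough to write out. The sketch below is a stand-alone approximation of what an `nth` method has to do; it is a hypothetical helper, not Natural's source, and it assumes a non-negative integer as input.

```javascript
// Hypothetical stand-alone nth(): returns the English ordinal form of a number.
function nth(n) {
  const num = Number(n);
  const lastTwo = num % 100;
  // 11, 12 and 13 take "th" even though they end in 1, 2 and 3.
  if (lastTwo >= 11 && lastTwo <= 13) return num + "th";
  switch (num % 10) {
    case 1: return num + "st";
    case 2: return num + "nd";
    case 3: return num + "rd";
    default: return num + "th";
  }
}

console.log(nth("1"));  // prints 1st
console.log(nth(11));   // prints 11th
console.log(nth(22));   // prints 22nd
console.log(nth(113));  // prints 113th
```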
  • 21. N-Grams • An n-gram is a contiguous sequence of n items from a given sample of text or speech. • The items can be phonemes, syllables, letters, words or base pairs according to the application. The n-grams typically are collected from a text or speech corpus. • When the items are words, n-grams may also be called shingles • An n-gram of size 1 is referred to as a "unigram"; size 2 is a "bigram"; size 3 is a "trigram". • Larger sizes are sometimes referred to by the value of n in modern language, e.g., "four-gram", "five-gram", and so on. • Example for "Hello how are you": unigrams: Hello | how | are | you; bigrams: Hello how | how are | are you; trigrams: Hello how are | how are you
  • 22. N-Grams (Cont’d)
    var sentence = "Hello how are you";
    var ngrams = Natural.NGrams;

    console.log(ngrams.bigrams(sentence));
    // prints [ [ 'Hello', 'how' ], [ 'how', 'are' ], [ 'are', 'you' ] ]
    console.log(ngrams.trigrams(sentence));
    // prints [ [ 'Hello', 'how', 'are' ], [ 'how', 'are', 'you' ] ]
    console.log(ngrams.ngrams(sentence, 1)); // unigram
    // prints [ [ 'Hello' ], [ 'how' ], [ 'are' ], [ 'you' ] ]

    sentence = "NLTK is a Natural Language Processing Library in Nodejs";
    console.log(ngrams.ngrams(sentence, 4)); // four-gram
    // prints [ [ 'NLTK', 'is', 'a', 'Natural' ],
    //          [ 'is', 'a', 'Natural', 'Language' ],
    //          [ 'a', 'Natural', 'Language', 'Processing' ],
    //          [ 'Natural', 'Language', 'Processing', 'Library' ],
    //          [ 'Language', 'Processing', 'Library', 'in' ],
    //          [ 'Processing', 'Library', 'in', 'Nodejs' ] ]
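The sliding-window idea behind word n-grams can be sketched without the library. This is a minimal illustration that splits on whitespace only; Natural uses its own tokenizer, so results on text with punctuation may differ. The helper name `wordNGrams` is hypothetical.

```javascript
// Hypothetical helper (not part of Natural): word n-grams from whitespace-separated text.
function wordNGrams(text, n) {
  var tokens = text.split(/\s+/).filter(function (t) { return t.length > 0; });
  var result = [];
  // Slide a window of n tokens across the token list
  for (var i = 0; i + n <= tokens.length; i++) {
    result.push(tokens.slice(i, i + n));
  }
  return result;
}

var grams = wordNGrams("Hello how are you", 2);
console.log(grams.length);          // prints 3
console.log(grams[0].join(" "));    // prints Hello how
```

A text with k tokens yields k - n + 1 n-grams, and none when n is larger than k.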
  • 23. Phonetics • A phonetic algorithm is an algorithm for indexing words by their pronunciation. • A phonetic matching algorithm matches words by their pronunciation rather than their spelling. • Most phonetic algorithms were developed for use with the English language. Consequently, applying the rules to words in other languages might not give a meaningful result. • Some of the well-known phonetic algorithms are: ▪ Soundex - Developed to encode surnames for use in censuses. Soundex codes are four-character strings composed of a single letter followed by three numbers. ▪ Daitch–Mokotoff Soundex - Refinement of Soundex designed to better match surnames of Slavic and Germanic origin. Daitch–Mokotoff Soundex codes are strings composed of six numeric digits. ▪ Cologne phonetics - Similar to Soundex, but more suitable for German words. ▪ Metaphone, Double Metaphone, and Metaphone 3 - Suitable for use with most English words, not just names. Metaphone algorithms are the basis for many popular spell checkers. ▪ New York State Identification and Intelligence System (NYSIIS) - Maps similar phonemes to the same letter. The result is a string that can be pronounced by the reader without decoding. ▪ Match Rating Approach - Developed by Western Airlines in 1977; this algorithm has an encoding and range comparison technique. ▪ Caverphone - Created to assist in data matching between late 19th century and early 20th century electoral rolls; optimized for accents present in parts of New Zealand.
  • 24. Phonetics Matching (Cont’d) • Natural supports Phonetic Matching using three algorithms: ▪ SoundEx ▪ Metaphone ▪ DoubleMetaphone
    var metaphone = Natural.Metaphone;
    var soundex = Natural.SoundEx;
    var doubleMetaphone = Natural.DoubleMetaphone;

    // using SoundEx for phonetic matching
    console.log(soundex.compare("nuremberg", "nuremburg"));   // returns true
    console.log(soundex.compare("Paris", "Pari"));            // returns false

    // using Metaphone for phonetic matching
    console.log(metaphone.compare("Fool", "Full"));     // returns true
    console.log(metaphone.compare("Fool", "Failed"));   // returns false

    // using Double Metaphone for phonetic matching
    console.log(doubleMetaphone.compare("Bangalore", "Bengaluru"));   // returns true
    console.log(doubleMetaphone.compare("Mumbai", "Bombay"));         // returns false
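To make the "one letter followed by three numbers" description concrete, here is a minimal sketch of the classic American Soundex rules in plain JavaScript. It is an illustration only; Natural's SoundEx may differ in edge cases, and the names `SOUNDEX_CODES` and `soundexCode` are hypothetical.

```javascript
// Consonant groups of classic Soundex; vowels, y, h and w carry no code.
var SOUNDEX_CODES = {
  b: "1", f: "1", p: "1", v: "1",
  c: "2", g: "2", j: "2", k: "2", q: "2", s: "2", x: "2", z: "2",
  d: "3", t: "3",
  l: "4",
  m: "5", n: "5",
  r: "6"
};

// Hypothetical helper (not part of Natural): four-character Soundex code of a word.
function soundexCode(word) {
  var letters = word.toLowerCase().replace(/[^a-z]/g, "");
  if (letters.length === 0) return "";
  var code = letters[0].toUpperCase();          // the first letter is kept as-is
  var prev = SOUNDEX_CODES[letters[0]] || "";   // its code still suppresses a repeat
  for (var i = 1; i < letters.length && code.length < 4; i++) {
    var ch = letters[i];
    if (ch === "h" || ch === "w") continue;     // h and w do not separate equal codes
    var digit = SOUNDEX_CODES[ch] || "";        // vowels and y map to no code
    if (digit !== "" && digit !== prev) code += digit;
    prev = digit;                               // a vowel resets prev, so a repeat after it counts
  }
  return (code + "000").slice(0, 4);            // pad with zeros to four characters
}

console.log(soundexCode("Robert"));   // prints R163
console.log(soundexCode("nuremberg") === soundexCode("nuremburg"));   // prints true
console.log(soundexCode("Paris") === soundexCode("Pari"));            // prints false
```

Two words match phonetically under this scheme when their codes are equal, which is consistent with the true and false results shown for SoundEx on the slide.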
  • 25. String Distance • String Distance measures how closely two strings match. • Natural provides the JaroWinkler Distance and Levenshtein Distance algorithms for string distance matching JaroWinkler Distance • The Jaro metric scores two words by the number of matching characters and the number of transpositions among those matches. • Jaro-Winkler is a variant of the Jaro metric (1989, Matthew A. Jaro) proposed in 1990 by William E. Winkler; it gives extra weight to words that share a common prefix. • Returns a number between 0 and 1 which tells how closely the strings match (0 = no match, 1 = exact match)
    // Using JaroWinkler Distance algorithm
    console.log(Natural.JaroWinklerDistance("Hello", "Hello"));           // returns 1: exact match
    console.log(Natural.JaroWinklerDistance("Me", "You"));                // returns 0: no match
    console.log(Natural.JaroWinklerDistance("Bangalore", "Bengaluru"));   // returns 0.72: partial match
    console.log(Natural.JaroWinklerDistance("Mumbai", "Bombay"));         // returns 0.66: partial match
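How a Jaro-Winkler score between 0 and 1 comes about can be sketched in plain JavaScript. This is a textbook-style sketch, not Natural's code: it assumes the usual prefix scale of 0.1 and a maximum prefix length of 4, and the names `jaro` and `jaroWinkler` are hypothetical.

```javascript
// Hypothetical helper: Jaro similarity (matching characters and transpositions).
function jaro(s1, s2) {
  if (s1 === s2) return 1;
  if (s1.length === 0 || s2.length === 0) return 0;
  // Characters match only if they are equal and at most `range` positions apart
  var range = Math.max(0, Math.floor(Math.max(s1.length, s2.length) / 2) - 1);
  var used1 = new Array(s1.length).fill(false);
  var used2 = new Array(s2.length).fill(false);
  var matches = 0;
  for (var i = 0; i < s1.length; i++) {
    var lo = Math.max(0, i - range);
    var hi = Math.min(s2.length - 1, i + range);
    for (var j = lo; j <= hi; j++) {
      if (!used2[j] && s1[i] === s2[j]) {
        used1[i] = true;
        used2[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;
  // Matched characters that appear in a different order; half of them are transpositions
  var k = 0;
  var outOfOrder = 0;
  for (var p = 0; p < s1.length; p++) {
    if (!used1[p]) continue;
    while (!used2[k]) k++;
    if (s1[p] !== s2[k]) outOfOrder++;
    k++;
  }
  var t = outOfOrder / 2;
  return (matches / s1.length + matches / s2.length + (matches - t) / matches) / 3;
}

// Hypothetical helper: Jaro-Winkler adds a bonus for a shared prefix of up to 4 characters.
function jaroWinkler(s1, s2) {
  var score = jaro(s1, s2);
  var prefix = 0;
  var maxPrefix = Math.min(4, s1.length, s2.length);
  while (prefix < maxPrefix && s1[prefix] === s2[prefix]) prefix++;
  return score + prefix * 0.1 * (1 - score);
}

console.log(jaroWinkler("Hello", "Hello"));   // prints 1
console.log(jaroWinkler("Me", "You"));        // prints 0
console.log(jaroWinkler("Bangalore", "Bengaluru").toFixed(3));   // prints 0.725
```

The comparison is case-sensitive, so "M" and "m" do not count as a match.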
  • 26. String Distance - Levenshtein Distance • The Levenshtein Distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. • Named after the Soviet mathematician Vladimir Levenshtein, who considered this distance in 1965 • Also referred to as edit distance
    // Using Levenshtein Distance algorithm
    console.log(Natural.LevenshteinDistance("Hello", "Hello"));           // 0
    console.log(Natural.LevenshteinDistance("Bangalore", "Bengaluru"));   // 3
    console.log(Natural.LevenshteinDistance("Mumbai", "Bombay"));         // 3
    console.log(Natural.LevenshteinDistance("Chennai", "Madras"));        // 6
    console.log(Natural.LevenshteinDistance("Nuremberg", "Nuremburg"));   // 1
    Example: Bangalore → Bengaluru takes 3 character changes; Nuremberg → Nuremburg takes 1 character change
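The "minimum number of edits" can be computed with a small dynamic-programming table. The sketch below is a standard formulation that assumes a cost of 1 for each insertion, deletion and substitution; it is not Natural's code, and the name `levenshtein` is hypothetical.

```javascript
// Hypothetical helper (not part of Natural): edit distance with unit costs.
function levenshtein(a, b) {
  // prev[j] = distance between the first i-1 characters of a and the first j of b
  var prev = [];
  for (var j = 0; j <= b.length; j++) prev[j] = j;
  for (var i = 1; i <= a.length; i++) {
    var curr = [i];   // turning i characters of a into an empty string takes i deletions
    for (var c = 1; c <= b.length; c++) {
      var cost = a[i - 1] === b[c - 1] ? 0 : 1;
      curr[c] = Math.min(
        prev[c] + 1,        // delete a character from a
        curr[c - 1] + 1,    // insert a character into a
        prev[c - 1] + cost  // substitute, or keep when the characters are equal
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

console.log(levenshtein("Bangalore", "Bengaluru"));   // prints 3
console.log(levenshtein("Nuremberg", "Nuremburg"));   // prints 1
```

Only two rows of the table are kept at a time, so memory grows with the length of the second word rather than with the product of both lengths.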
  • 27. tf-idf • tf–idf or TFIDF is short for term frequency - inverse document frequency • tf-idf determines how important a word (or words) is to a document relative to a corpus. • Often used as a weighting factor in information retrieval, text mining and user modeling. • The tf-idf value increases proportionally to the number of times a word appears in the document and is offset by the frequency of the word in the corpus, which helps to adjust for the fact that some words appear more frequently in general. • The tfidf method returns the measure of importance of a word
    var tfidf = new Natural.TfIdf();
    // Documents can be added to tf-idf. Here only a single doc is added, but more could be added
    tfidf.addDocument("this document is about node. Its also about NLP. Node is used for it");

    // Find out the tf-idf of different words in the document
    console.log(tfidf.tfidf("node", 0));   // prints 0.61 as node appears twice in the doc
    console.log(tfidf.tfidf("NLP", 0));    // prints 0.30 as NLP appears only once
    console.log(tfidf.tfidf("ruby", 0));   // prints 0 as ruby does not appear in the doc

    console.log(tfidf.listTerms(0));
    // prints [ { term: 'node', tfidf: 0.6137056388801094 },
    //          { term: 'document', tfidf: 0.3068528194400547 },
    //          { term: 'nlp', tfidf: 0.3068528194400547 },
    //          { term: 'used', tfidf: 0.3068528194400547 } ]
  • 28. tf-idf (cont’d) • Files on disk can also be added to tf-idf • Multiple documents can be added to tf-idf
    var tfidf = new Natural.TfIdf();
    // Adding a file from disk to tfidf
    tfidf.addFileSync("C:/Data/Profile.txt");
    console.log(tfidf.listTerms(0));

    // Start with a fresh instance so that the documents below are indexed 0, 1 and 2
    tfidf = new Natural.TfIdf();
    // Multiple documents added to tfidf, which together form the entire corpus
    tfidf.addDocument('this document is about node. Its also about NLP. Node is used for it');
    tfidf.addDocument('this document is about ruby.');
    tfidf.addDocument('this document is about ruby and node.');

    console.log(tfidf.tfidf("node", 0));   // prints 2
    console.log(tfidf.tfidf("NLP", 0));    // prints 1.40
    console.log(tfidf.tfidf("ruby", 0));   // prints 0
    console.log(tfidf.tfidf("node", 1));   // prints 0 as node does not appear in 2nd doc
    console.log(tfidf.tfidf("ruby", 1));   // prints 1 as ruby appears in 2nd doc
    console.log(tfidf.tfidf("node", 2));   // prints 1 as node appears in 3rd doc
    console.log(tfidf.tfidf("ruby", 2));   // prints 1 as ruby appears in 3rd doc
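The numbers on these slides can be reproduced by hand. The sketch below assumes the weighting tf × (1 + ln(N / (1 + df))), where N is the number of documents and df the number of documents containing the term; this formula is chosen because it matches the printed values, and it should be read as an assumption about the library rather than a statement of its code. Documents are given as pre-tokenized, lowercased arrays with stop words already removed, and the names `termFrequency`, `inverseDocFrequency` and `tfidfScore` are hypothetical.

```javascript
// Hypothetical helpers (not part of Natural).
function termFrequency(term, tokens) {
  return tokens.filter(function (t) { return t === term; }).length;
}

function inverseDocFrequency(term, docs) {
  var df = docs.filter(function (d) { return d.indexOf(term) !== -1; }).length;
  // Assumed weighting: 1 + ln(N / (1 + df))
  return 1 + Math.log(docs.length / (1 + df));
}

function tfidfScore(term, docIndex, docs) {
  return termFrequency(term, docs[docIndex]) * inverseDocFrequency(term, docs);
}

// The three-document corpus from the slide, after tokenization and stop-word removal
var corpus = [
  ["document", "node", "nlp", "node", "used"],
  ["document", "ruby"],
  ["document", "ruby", "node"]
];

console.log(tfidfScore("node", 0, corpus));              // prints 2
console.log(tfidfScore("nlp", 0, corpus).toFixed(2));    // prints 1.41 (1.4054..., shown as 1.40 on the slide)
console.log(tfidfScore("ruby", 0, corpus));              // prints 0
```

With a single document the same formula gives 2 × (1 + ln(1/2)) ≈ 0.61 for "node", which agrees with the single-document example.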
  • 29. tf-idf (cont’d) • The tfidfs method returns the measure of importance of a word in each document • The tfidfs method accepts the word and a callback
    // Multiple documents added to tfidf, which together form the entire corpus
    tfidf.addDocument('this document is about node. Its also about NLP. Node is used for it');
    tfidf.addDocument('this document is about ruby.');
    tfidf.addDocument('this document is about ruby and node.');

    // tfidfs method is used to find the importance of the word across multiple documents
    tfidf.tfidfs('node', function(ctr, measure){
      console.log('tf-idf of node in document #' + ctr + ' is ' + measure);
    });
  • 30. POS (Part of Speech) Tagging • Process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context, i.e., its relationship with adjacent and related words in a phrase, sentence, or paragraph. • Also called grammatical tagging or word-category disambiguation.
  • 31. POS (Part of Speech) Tagging • Current state-of-the-art POS tagging algorithms can predict the POS of a given word with high accuracy (approximately 97%), but POS tagging is still an active area of research.
    No.  Tag    Description
    1.   CC     Coordinating conjunction
    2.   CD     Cardinal number
    3.   DT     Determiner
    4.   EX     Existential there
    5.   FW     Foreign word
    6.   IN     Preposition or subordinating conjunction
    7.   JJ     Adjective
    8.   JJR    Adjective, comparative
    9.   JJS    Adjective, superlative
    10.  LS     List item marker
    11.  MD     Modal
    12.  NN     Noun, singular or mass
    13.  NNS    Noun, plural
    14.  NNP    Proper noun, singular
    15.  NNPS   Proper noun, plural
    16.  PDT    Predeterminer
    17.  POS    Possessive ending
    18.  PRP    Personal pronoun
    19.  PRP$   Possessive pronoun
    20.  RB     Adverb
    21.  RBR    Adverb, comparative
    22.  RBS    Adverb, superlative
    23.  RP     Particle
    24.  SYM    Symbol
    25.  TO     to
    26.  UH     Interjection
    27.  VB     Verb, base form
    28.  VBD    Verb, past tense
    29.  VBG    Verb, gerund or present participle
    30.  VBN    Verb, past participle
    31.  VBP    Verb, non-3rd person singular present
    32.  VBZ    Verb, 3rd person singular present
    33.  WDT    Wh-determiner
    34.  WP     Wh-pronoun
    35.  WP$    Possessive wh-pronoun
    36.  WRB    Wh-adverb
  • 32. POS Tagging – Brill POS Tagger • Natural supports POS tagging through the Brill POS Tagger, which implements Eric Brill's transformational algorithm (transformation rules are specified in external files). • E. Brill's tagger, one of the most widely used English POS taggers, employs rule-based algorithms.
    var path = require("path");

    // Path where natural library is located
    var baseFolder = path.join(path.dirname(require.resolve("natural")), "brill_pos_tagger");
    // Rules file located in /data/<language> sub folder under natural library
    var rulesFilename = baseFolder + "/data/English/tr_from_posjs.txt";
    // Lexicon file located in /data/<language> sub folder under natural library
    var lexiconFilename = baseFolder + "/data/English/lexicon_from_posjs.json";
    // Tag assigned to words that are not found in the lexicon
    var defaultCategory = 'N';

    var lexicon = new Natural.Lexicon(lexiconFilename, defaultCategory);
    var rules = new Natural.RuleSet(rulesFilename);
    // Any tagger needs a lexicon and rules for successful POS tagging of words
    // The Brill POS Tagger object is created by passing the lexicon and the rule set
    var tagger = new Natural.BrillPOSTagger(lexicon, rules);

    var sentence = "I see the man with the telescope";
    var tokenizer = new Natural.WordTokenizer();
    // tokenize the sentence into tokens
    var tokens = tokenizer.tokenize(sentence);
    console.log(tagger.tag(tokens));
    // prints [ [ 'I', 'NN' ], [ 'see', 'VB' ], [ 'the', 'DT' ], [ 'man', 'NN' ],
    //          [ 'with', 'IN' ], [ 'the', 'DT' ], [ 'telescope', 'NN' ] ]
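The two ingredients named on the slide, a lexicon and transformation rules, can be illustrated with a toy tagger. Everything in this sketch is made up for illustration: the tiny lexicon, the single rule, and the names `TOY_LEXICON`, `TOY_RULES` and `toyTag` are hypothetical and far simpler than the lexicon and rule files shipped with Natural.

```javascript
// Toy lexicon: each known word maps to its most likely tag (hypothetical data).
var TOY_LEXICON = { i: "PRP", want: "VBP", to: "TO", run: "NN", the: "DT", race: "NN" };
var TOY_DEFAULT_TAG = "NN";   // tag for words missing from the lexicon

// Toy transformation rules: change tag `from` to `to` when the previous tag is `prevTag`.
var TOY_RULES = [
  { from: "NN", to: "VB", prevTag: "TO" }
];

function toyTag(tokens) {
  // Step 1: initial tagging from the lexicon, falling back to the default tag
  var tagged = tokens.map(function (token) {
    return [token, TOY_LEXICON[token.toLowerCase()] || TOY_DEFAULT_TAG];
  });
  // Step 2: apply each transformation rule from left to right
  TOY_RULES.forEach(function (rule) {
    for (var i = 1; i < tagged.length; i++) {
      if (tagged[i][1] === rule.from && tagged[i - 1][1] === rule.prevTag) {
        tagged[i][1] = rule.to;
      }
    }
  });
  return tagged;
}

var result = toyTag(["I", "want", "to", "run", "the", "race"]);
console.log(result[3]);   // prints [ 'run', 'VB' ]
console.log(result[5]);   // prints [ 'race', 'NN' ]
```

The word "run" starts as a noun from the lexicon and is corrected to a verb because it follows "to", while "race" keeps its noun tag because no rule applies after a determiner.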