SlideShare a Scribd company logo
1 of 24
Download to read offline
How to parse ‘go’
Natural Language Processing in Ruby
Tom Cartwright
@tomcartwrightuk
!

keepmebooked
giveaiddirect.com
Python, surely?
Yes. The NLTK is awesome.
But you have a Ruby-based app.
Extracting meaning from !
human input
Summarisation
Extracting entities
Tagging text
Sentiment analysis
Filtering text
document

sentence

From document level!
!
!
!
!

word

example

to word level
document

sentence

word

example

Chunking & segmenting
Breaking text into paragraphs, sentences and other zones
Start with a document/some text:
“The second nonabsolute number is the given time of
arrival, which is now known to be one of those most bizarre
of mathematical concepts, a recipriversexclusion, a number
whose existence can only be defined as being anything other
than itself…..”
document

sentence

word

Punkt sentence tokenizer to the rescue….

example
document

sentence

word

example

tokenizer = Punkt::SentenceTokenizer.new(!
"The second nonabsolute number is the given time
of arrival...")!
!

result = !
tokenizer.sentences_from_text(text,!
:output => :sentences_text)!
!
!
!
document

sentence

word

example

Training

trainer = Punkt::Trainer.new()!
trainer.train(bistromatic_text)
document

sentence

word

example

Tokenising
Breaking text into words, phrases and symbols.
“Time is an illusion. Lunchtime
doubly so.”.split(“ “)!
!

#=> !
!

[“Time", “is", “an", “illusion.”,
“Lunchtime", “doubly", “so.”]!
document

sentence

word

example

Tokenizer gem
Regexes and rules
class Tokenizer	
	
FS = Regexp.new(‘[[:blank:]]+')	
PAIR_PRE = ['(', '{', '[']	
SIMPLE_POST = ['!', '?', ',', ':', ';', '.']	
PAIR_POST = [')', '}', ']']	
PRE_N_POST = ['"', “'"]	
…
document

sentence

word

tokenizer = Tokenizer::Tokenizer.new
tokenizer.tokenize(“Time is an
illusion. Lunchtime doubly so.”)

#=>

[“Time", “is", “an", “illusion", “.”,
“Lunchtime", “doubly", “so", “.”]

example
document

sentence

word

example

Stemming
Jogging => Jog
“jogging”.gsub(/.ing/, “”) !
#=> “jog"!
!

“bring”.gsub(/.ing/, “”) !
#=> “b"
document

sentence

1. Ruby-Stemmer
2. Text

word

example

multi-language porter stemmer

porter stemmer

stemmer = Lingua::Stemmer.new(:language => "en")
stemmer.stem("programming") #=> program
stemmer.stem("vimming") #=> vim
document

sentence

word

example

Parts-of-speech tagging
CC

conjunction

DET

determiner

and, but
this, some

IN

preposition / conjunction

JJ

adjective

NNP

above, about

orange, tiny

proper noun

Camden Pale Ale
document

sentence

word

A couple of methods!
!

Regex tagger
/*.ing/
VBG
/*.ed/

VBD
!

Lookup on words
E.g.
calculating : { VBG: 6 }
orange: { JJ: 2, NN: 5 }

example
document

sentence

word

example

A tale of two taggers
EngTagger

rb-brill-tagger

Probabilistic (uses

•

Rule based

look up table prev.

•

•

C extensions

slide)
•

Brown corpus trained

•

Pure ruby
document

sentence

word

example

Treat gem
Bundles many of the gems shown
Wraps them in a DSL
s = sentence(“A really good sentence.”)
s.do(:chunk, :segment, :tokenize, :parse)

stemming; tokenising; chunking; serialising;
tagging; text extraction from pdfs and html;
LRUG Sentiments
A tag

{NN}

Pass in regex => /({JJ}|{JJS})({NNS}|{NNP})/
And some tagged tokens
#=> [(Word @tag="JJ", @text="jolly"),!
(Word @tag="NN", @text="face")]
Sentimental value
1.0
!
1.0
0.21875
0.21875
-1.0
-1.0

epic!
good!
chance!
brisk!
slanderous!
piteous
Results
!
!
!
•
•

•
•
•

Ruby!
Practical ObjectOriented Design in
Ruby!
Doctors!
Lrug!
recruiters (!)

•
•
•

dedicated servers!
pdfs!
Surrey

•

•
•
•
•

unsolicited phone
calls from
r********s!
clients!
Paypal!
XML!
geeks
Gems
Text - Paul Battley’s box of tricks
Treat
Tokenizer
Punkt segmenter
Chronic - for extracting dates
Other things you can do/I didn’t talk about
Calculate text edit distance
Extract entities using the Stanford
libraries via the RJB
!

Extract topic words (LDA)
!

Keyword extraction - TfIdf
!

Jruby
Thank you for processing.
Questions?
@tomcartwrightuk

Thanks to Tim Cowlishaw and the HT dev
team for specialised rubber duck support

More Related Content

What's hot

Ruby Introduction
Ruby IntroductionRuby Introduction
Ruby Introduction
Prabu D
 
Programming languages vienna
Programming languages viennaProgramming languages vienna
Programming languages vienna
greg_s
 

What's hot (17)

Ruby Introduction
Ruby IntroductionRuby Introduction
Ruby Introduction
 
Etymology Markup in TEI XML
Etymology Markup in TEI XMLEtymology Markup in TEI XML
Etymology Markup in TEI XML
 
Ruby Hell Yeah
Ruby Hell YeahRuby Hell Yeah
Ruby Hell Yeah
 
NLP new words
NLP new wordsNLP new words
NLP new words
 
Week2
Week2Week2
Week2
 
Semana Interop: Trabalhando com IronPython e com Ironruby
Semana Interop: Trabalhando com IronPython e com IronrubySemana Interop: Trabalhando com IronPython e com Ironruby
Semana Interop: Trabalhando com IronPython e com Ironruby
 
Programming languages vienna
Programming languages viennaProgramming languages vienna
Programming languages vienna
 
Ruby monsters
Ruby monstersRuby monsters
Ruby monsters
 
Python2 unicode-pt1
Python2 unicode-pt1Python2 unicode-pt1
Python2 unicode-pt1
 
Stemming And Lemmatization Tutorial | Natural Language Processing (NLP) With ...
Stemming And Lemmatization Tutorial | Natural Language Processing (NLP) With ...Stemming And Lemmatization Tutorial | Natural Language Processing (NLP) With ...
Stemming And Lemmatization Tutorial | Natural Language Processing (NLP) With ...
 
Natural Language Processing (NLP) & Text Mining Tutorial Using NLTK | NLP Tra...
Natural Language Processing (NLP) & Text Mining Tutorial Using NLTK | NLP Tra...Natural Language Processing (NLP) & Text Mining Tutorial Using NLTK | NLP Tra...
Natural Language Processing (NLP) & Text Mining Tutorial Using NLTK | NLP Tra...
 
Kotlin L → ∞
Kotlin L → ∞Kotlin L → ∞
Kotlin L → ∞
 
Ruby
RubyRuby
Ruby
 
Go programing language
Go programing languageGo programing language
Go programing language
 
Intro to NLP. Lecture 2
Intro to NLP.  Lecture 2Intro to NLP.  Lecture 2
Intro to NLP. Lecture 2
 
Ruby Presentation
Ruby Presentation Ruby Presentation
Ruby Presentation
 
A(n abridged) tour of the Rust compiler [PDX-Rust March 2014]
A(n abridged) tour of the Rust compiler [PDX-Rust March 2014]A(n abridged) tour of the Rust compiler [PDX-Rust March 2014]
A(n abridged) tour of the Rust compiler [PDX-Rust March 2014]
 

Viewers also liked

Viewers also liked (13)

Natural Language Processing and Python
Natural Language Processing and PythonNatural Language Processing and Python
Natural Language Processing and Python
 
Natural language processing with python and amharic syntax parse tree by dani...
Natural language processing with python and amharic syntax parse tree by dani...Natural language processing with python and amharic syntax parse tree by dani...
Natural language processing with python and amharic syntax parse tree by dani...
 
PG-Strom
PG-StromPG-Strom
PG-Strom
 
Google guava - almost everything you need to know
Google guava - almost everything you need to knowGoogle guava - almost everything you need to know
Google guava - almost everything you need to know
 
Patient matching in FHIR
Patient matching in FHIRPatient matching in FHIR
Patient matching in FHIR
 
Procesamiento de Lenguaje Natural, Python y NLTK
Procesamiento de Lenguaje Natural, Python y NLTKProcesamiento de Lenguaje Natural, Python y NLTK
Procesamiento de Lenguaje Natural, Python y NLTK
 
Evolution of Software Engineering in NCTR Projects
Evolution of Software Engineering in NCTR  Projects   Evolution of Software Engineering in NCTR  Projects
Evolution of Software Engineering in NCTR Projects
 
Codeception Testing Framework -- English #phpkansai
Codeception Testing Framework -- English #phpkansaiCodeception Testing Framework -- English #phpkansai
Codeception Testing Framework -- English #phpkansai
 
A Doctor’s Perspective on the Future Role of Pharmaceutical-Doctor Relationsh...
A Doctor’s Perspective on the Future Role of Pharmaceutical-Doctor Relationsh...A Doctor’s Perspective on the Future Role of Pharmaceutical-Doctor Relationsh...
A Doctor’s Perspective on the Future Role of Pharmaceutical-Doctor Relationsh...
 
Google guava overview
Google guava overviewGoogle guava overview
Google guava overview
 
NLTK: Natural Language Processing made easy
NLTK: Natural Language Processing made easyNLTK: Natural Language Processing made easy
NLTK: Natural Language Processing made easy
 
NLTK in 20 minutes
NLTK in 20 minutesNLTK in 20 minutes
NLTK in 20 minutes
 
Building a distributed search system with Hadoop and Lucene
Building a distributed search system with Hadoop and LuceneBuilding a distributed search system with Hadoop and Lucene
Building a distributed search system with Hadoop and Lucene
 

Similar to Natural Language Processing in Ruby

Javascriptbootcamp
JavascriptbootcampJavascriptbootcamp
Javascriptbootcamp
oscon2007
 
Natural Language Processing made easy
Natural Language Processing made easyNatural Language Processing made easy
Natural Language Processing made easy
Gopi Krishnan Nambiar
 
Os Keysholistic
Os KeysholisticOs Keysholistic
Os Keysholistic
oscon2007
 
Jsonsaga 100605143125-phpapp02
Jsonsaga 100605143125-phpapp02Jsonsaga 100605143125-phpapp02
Jsonsaga 100605143125-phpapp02
Ramamohan Chokkam
 
Ruby 1.9.3 Basic Introduction
Ruby 1.9.3 Basic IntroductionRuby 1.9.3 Basic Introduction
Ruby 1.9.3 Basic Introduction
Prabu D
 

Similar to Natural Language Processing in Ruby (20)

TechDays - IronRuby
TechDays - IronRubyTechDays - IronRuby
TechDays - IronRuby
 
Javascriptbootcamp
JavascriptbootcampJavascriptbootcamp
Javascriptbootcamp
 
Modern C++
Modern C++Modern C++
Modern C++
 
Natural Language Processing made easy
Natural Language Processing made easyNatural Language Processing made easy
Natural Language Processing made easy
 
Os Keysholistic
Os KeysholisticOs Keysholistic
Os Keysholistic
 
Jsonsaga 100605143125-phpapp02
Jsonsaga 100605143125-phpapp02Jsonsaga 100605143125-phpapp02
Jsonsaga 100605143125-phpapp02
 
CL-NLP
CL-NLPCL-NLP
CL-NLP
 
Embed--Basic PERL XS
Embed--Basic PERL XSEmbed--Basic PERL XS
Embed--Basic PERL XS
 
The Holistic Programmer
The Holistic ProgrammerThe Holistic Programmer
The Holistic Programmer
 
Streams of information - Chicago crystal language monthly meetup
Streams of information - Chicago crystal language monthly meetupStreams of information - Chicago crystal language monthly meetup
Streams of information - Chicago crystal language monthly meetup
 
Crystal internals (part 1)
Crystal internals (part 1)Crystal internals (part 1)
Crystal internals (part 1)
 
Crystal internals (part 1)
Crystal internals (part 1)Crystal internals (part 1)
Crystal internals (part 1)
 
Crystal internals (part 1)
Crystal internals (part 1)Crystal internals (part 1)
Crystal internals (part 1)
 
Understanding Names with Neural Networks - May 2020
Understanding Names with Neural Networks - May 2020Understanding Names with Neural Networks - May 2020
Understanding Names with Neural Networks - May 2020
 
Ruby 1.9.3 Basic Introduction
Ruby 1.9.3 Basic IntroductionRuby 1.9.3 Basic Introduction
Ruby 1.9.3 Basic Introduction
 
Words in Code
Words in CodeWords in Code
Words in Code
 
Go language presentation
Go language presentationGo language presentation
Go language presentation
 
Introduction to source{d} Engine and source{d} Lookout
Introduction to source{d} Engine and source{d} Lookout Introduction to source{d} Engine and source{d} Lookout
Introduction to source{d} Engine and source{d} Lookout
 
Beyond the Style Guides
Beyond the Style GuidesBeyond the Style Guides
Beyond the Style Guides
 
Build a compiler using C#, Irony and RunSharp.
Build a compiler using C#, Irony and RunSharp.Build a compiler using C#, Irony and RunSharp.
Build a compiler using C#, Irony and RunSharp.
 

Recently uploaded

Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
UXDXConf
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
panagenda
 
Breaking Down the Flutterwave Scandal What You Need to Know.pdf
Breaking Down the Flutterwave Scandal What You Need to Know.pdfBreaking Down the Flutterwave Scandal What You Need to Know.pdf
Breaking Down the Flutterwave Scandal What You Need to Know.pdf
UK Journal
 

Recently uploaded (20)

Microsoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireMicrosoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - Questionnaire
 
Oauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftOauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoft
 
Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024
 
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdfIntroduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
 
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
 
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
 
Breaking Down the Flutterwave Scandal What You Need to Know.pdf
Breaking Down the Flutterwave Scandal What You Need to Know.pdfBreaking Down the Flutterwave Scandal What You Need to Know.pdf
Breaking Down the Flutterwave Scandal What You Need to Know.pdf
 
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
 
Using IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & IrelandUsing IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & Ireland
 
Portal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russePortal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russe
 
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdfSimplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
 
Introduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptxIntroduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptx
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM Performance
 
Working together SRE & Platform Engineering
Working together SRE & Platform EngineeringWorking together SRE & Platform Engineering
Working together SRE & Platform Engineering
 
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
 
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfThe Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
 
Your enemies use GenAI too - staying ahead of fraud with Neo4j
Your enemies use GenAI too - staying ahead of fraud with Neo4jYour enemies use GenAI too - staying ahead of fraud with Neo4j
Your enemies use GenAI too - staying ahead of fraud with Neo4j
 
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
 

Natural Language Processing in Ruby