Natural Language Processing in Ruby
 


An introduction to performing natural language processing (NLP) tasks in Ruby. Video is here: https://skillsmatter.com/skillscasts/4883-how-to-parse-go#video


Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    Natural Language Processing in Ruby: Presentation Transcript

    • How to parse ‘go’: Natural Language Processing in Ruby. Tom Cartwright (@tomcartwrightuk), keepmebooked, giveaiddirect.com
    • Python, surely? Yes. The NLTK is awesome. But you have a Ruby-based app.
    • Extracting meaning from human input: summarisation, extracting entities, tagging text, sentiment analysis, filtering text.
    • From document level to word level (document, sentence, word).
    • Chunking & segmenting: breaking text into paragraphs, sentences and other zones. Start with a document (or just some text): “The second nonabsolute number is the given time of arrival, which is now known to be one of those most bizarre of mathematical concepts, a recipriversexclusion, a number whose existence can only be defined as being anything other than itself…”
    • The Punkt sentence tokenizer to the rescue…
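Before reaching for a library, it helps to see why segmentation is harder than it looks. The sketch below is a naive, regex-only segmenter written for illustration (it is not how any of the gems in this talk work): it splits paragraphs on blank lines and sentences after ".", "!" or "?". The example text is hypothetical, and its second paragraph shows the classic failure on an abbreviation such as "Mr.".

```ruby
# Naive chunking/segmenting with regexes (illustration only; not the
# approach used by the Punkt gem).

# Split a document into paragraphs on blank lines.
def paragraphs(text)
  text.split(/\n\s*\n/).map(&:strip).reject(&:empty?)
end

# Split a paragraph into sentences after ., ! or ? followed by whitespace.
def naive_sentences(paragraph)
  paragraph.split(/(?<=[.!?])\s+/)
end

# Hypothetical example document with two paragraphs.
doc = "Time is an illusion. Lunchtime doubly so.\n\nArthur met Mr. Prosser at dawn."
paras = paragraphs(doc)
p paras.length               # prints 2
p naive_sentences(paras[0])  # prints ["Time is an illusion.", "Lunchtime doubly so."]
p naive_sentences(paras[1])  # prints ["Arthur met Mr.", "Prosser at dawn."] (wrong split)
```

The wrong split after "Mr." is exactly the kind of case a trained segmenter is meant to handle.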
    • Punkt example (punkt-segmenter gem):
        require "punkt-segmenter"

        text = "The second nonabsolute number is the given time of arrival..."
        tokenizer = Punkt::SentenceTokenizer.new(text)
        result = tokenizer.sentences_from_text(text, :output => :sentences_text)
    • Training:
        # bistromatic_text is a String of training text
        trainer = Punkt::Trainer.new
        trainer.train(bistromatic_text)
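Training is what lets Punkt tell abbreviations apart from sentence ends. As a rough illustration of why that matters, the sketch below splits on sentence-final punctuation but skips tokens found in an abbreviation list. Both ABBREVIATIONS and the sentences method are hypothetical: the list is hard-coded here, whereas Punkt learns its abbreviation information from the training text, and none of this is the punkt-segmenter API.

```ruby
require "set"

# Hypothetical, hard-coded abbreviation list (lowercase, without the period).
# A trained Punkt model derives this kind of knowledge from text instead.
ABBREVIATIONS = Set.new(%w[mr mrs dr st])

# Split text into sentences at ., ! or ?, except after a known abbreviation.
def sentences(text, abbreviations = ABBREVIATIONS)
  result = []
  current = []
  text.split.each do |token|
    current << token
    next unless token.match?(/[.!?]\z/)                      # not sentence-final punctuation
    next if abbreviations.include?(token.chomp(".").downcase) # period belongs to an abbreviation
    result << current.join(" ")
    current = []
  end
  result << current.join(" ") unless current.empty?          # trailing text without punctuation
  result
end

p sentences("Arthur met Mr. Prosser at dawn. The bulldozers arrived.")
# prints ["Arthur met Mr. Prosser at dawn.", "The bulldozers arrived."]
```

Because it splits on whitespace and rejoins with single spaces, this sketch also normalises runs of whitespace.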
    • Tokenising: breaking text into words, phrases and symbols. Naively:
        "Time is an illusion. Lunchtime doubly so.".split(" ")
        #=> ["Time", "is", "an", "illusion.", "Lunchtime", "doubly", "so."]
    • Tokenizer gem: regexes and rules
        class Tokenizer
          FS = Regexp.new('[[:blank:]]+')
          PAIR_PRE = ['(', '{', '[']
          SIMPLE_POST = ['!', '?', ',', ':', ';', '.']
          PAIR_POST = [')', '}', ']']
          PRE_N_POST = ['"', "'"]
          …
    • Using it:
        tokenizer = Tokenizer::Tokenizer.new
        tokenizer.tokenize("Time is an illusion. Lunchtime doubly so.")
        #=> ["Time", "is", "an", "illusion", ".", "Lunchtime", "doubly", "so", "."]
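To show the kind of rules those constants can drive, here is a small pure-Ruby sketch: it splits on blanks (like FS), then peels opening brackets and quotes off the front of each chunk, and punctuation, closing brackets and quotes off the end. It reuses the constant names from the slide, but the tokenize method is a hypothetical illustration, not the Tokenizer gem's actual implementation.

```ruby
# Simplified rule-based tokenizer reusing the slide's constant names.
# This is NOT the Tokenizer gem's code, just the same idea in miniature.
PAIR_PRE    = ["(", "{", "["].freeze
SIMPLE_POST = ["!", "?", ",", ":", ";", "."].freeze
PAIR_POST   = [")", "}", "]"].freeze
PRE_N_POST  = ['"', "'"].freeze

PREFIXES = (PAIR_PRE + PRE_N_POST).freeze
SUFFIXES = (SIMPLE_POST + PAIR_POST + PRE_N_POST).freeze

def tokenize(text)
  text.split(/[[:blank:]]+/).reject(&:empty?).flat_map do |chunk|
    leading = []
    trailing = []
    # Peel opening brackets/quotes off the front (never empty a chunk).
    while chunk.length > 1 && PREFIXES.include?(chunk[0])
      leading << chunk[0]
      chunk = chunk[1..-1]
    end
    # Peel punctuation, closing brackets and quotes off the end.
    while chunk.length > 1 && SUFFIXES.include?(chunk[-1])
      trailing.unshift(chunk[-1])
      chunk = chunk[0...-1]
    end
    leading + [chunk] + trailing
  end
end

p tokenize("Time is an illusion. Lunchtime doubly so.")
# prints ["Time", "is", "an", "illusion", ".", "Lunchtime", "doubly", "so", "."]
```

Apostrophes inside a word, as in "Don't", are left alone because only the ends of each chunk are inspected.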
    • Stemming: jogging => jog. A naive regex approach:
        "jogging".gsub(/.ing/, "")  #=> "jog"
        "bring".gsub(/.ing/, "")    #=> "b"
    • Stemming gems: 1. Ruby-Stemmer (multi-language Porter stemmer) 2. Text (Porter stemmer)
        stemmer = Lingua::Stemmer.new(:language => "en")
        stemmer.stem("programming") #=> "program"
        stemmer.stem("vimming")     #=> "vim"
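Porter-style stemmers avoid the "bring" => "b" problem by attaching conditions to each suffix rule. The sketch below is a small, simplified fragment of Porter's step 1b written for illustration (not the full algorithm, and not the code inside Ruby-Stemmer or Text): it strips "ing" or "ed" only when the remaining stem contains a vowel, then undoes a doubled final consonant other than l, s or z.

```ruby
# Simplified fragment of Porter's step 1b (illustration only).
def strip_ing_ed(word)
  w = word.downcase
  suffix = %w[ing ed].find { |s| w.end_with?(s) }
  return w unless suffix

  stem = w[0...-suffix.length]
  # Porter's *v* condition: the stem must contain a vowel (y is ignored here).
  return w unless stem.match?(/[aeiou]/)

  # Undo a doubled final consonant, except l, s and z: "jogg" -> "jog".
  if stem.length >= 2 && stem[-1] == stem[-2] && stem[-1].match?(/[^aeiouylsz]/)
    stem = stem[0...-1]
  end
  stem
end

%w[jogging bring programming vimming].each do |w|
  puts "#{w} -> #{strip_ing_ed(w)}"
end
# jogging -> jog
# bring -> bring
# programming -> program
# vimming -> vim
```

The real algorithm has further rules (for example restoring a final "e" in words like "hoping"), which this fragment leaves out.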
    • Parts-of-speech tagging:
        CC    conjunction                 e.g. and, but
        DET   determiner                  e.g. this, some
        IN    preposition / conjunction   e.g. above, about
        JJ    adjective                   e.g. orange, tiny
        NNP   proper noun                 e.g. Camden Pale Ale
    • A couple of methods:
        Regex tagger: /.*ing$/ => VBG, /.*ed$/ => VBD
        Lookup on words (how often each tag was seen for that word), e.g.
          calculating: { VBG: 6 }
          orange: { JJ: 2, NN: 5 }
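The two methods combine naturally: look the word up first and take its most frequent tag, and fall back to suffix regexes for unknown words. This sketch uses the counts from the slide; the NN default for unmatched words, the tag method and the exact regexes are assumptions for illustration, not EngTagger's or any other gem's API.

```ruby
# Tag counts from the slide (lookup table).
LEXICON = {
  "calculating" => { "VBG" => 6 },
  "orange"      => { "JJ" => 2, "NN" => 5 }
}.freeze

# Suffix rules used when a word is not in the lookup table.
REGEX_RULES = [
  [/ing\z/, "VBG"],
  [/ed\z/,  "VBD"]
].freeze

# Hypothetical tagger: most frequent tag from the lookup table, else the
# first matching regex rule, else an assumed default of NN.
def tag(word, lexicon: LEXICON, rules: REGEX_RULES, default: "NN")
  counts = lexicon[word.downcase]
  return counts.max_by { |_tag, count| count }.first if counts

  rule = rules.find { |pattern, _tag| word.match?(pattern) }
  rule ? rule.last : default
end

%w[orange calculating jogging painted towel].each do |w|
  puts "#{w}/#{tag(w)}"
end
# orange/NN
# calculating/VBG
# jogging/VBG
# painted/VBD
# towel/NN
```

Taking the most frequent tag ignores context entirely, which is why "orange" always comes out as NN here.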
    • A tale of two taggers:
        EngTagger: probabilistic (uses a lookup table, as on the previous slide); pure Ruby
        rb-brill-tagger: rule based; C extensions; Brown corpus trained
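The rule-based approach starts from each word's most likely tag and then applies transformation rules that look at the surrounding context. As a sketch of that idea only (not rb-brill-tagger's code or rule format), the example applies one hard-coded rule, "change NN to VB when the previous tag is TO", the standard textbook example; a real Brill tagger learns an ordered list of such rules from a tagged corpus.

```ruby
# A contextual transformation rule: retag `from` as `to` when the
# previous token's tag is `prev_tag`.
Rule = Struct.new(:from, :to, :prev_tag)

# Hypothetical, hard-coded rule list (a Brill tagger would learn these).
RULES = [Rule.new("NN", "VB", "TO")].freeze

# tagged: array of [word, initial_tag] pairs.
def apply_rules(tagged, rules = RULES)
  tags = tagged.map(&:last)
  rules.each do |rule|
    # Check conditions against the tags as they were before this rule ran.
    snapshot = tags.dup
    snapshot.each_with_index do |tag, i|
      next if i.zero?
      tags[i] = rule.to if tag == rule.from && snapshot[i - 1] == rule.prev_tag
    end
  end
  tagged.map(&:first).zip(tags)
end

initial = [["want", "VBP"], ["to", "TO"], ["race", "NN"]]
p apply_rules(initial)
# prints [["want", "VBP"], ["to", "TO"], ["race", "VB"]]
```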
    • Treat gem: bundles many of the gems shown and wraps them in a DSL.
        s = sentence("A really good sentence.")
        s.do(:chunk, :segment, :tokenize, :parse)
      Covers stemming, tokenising, chunking, serialising, tagging, and text extraction from PDFs and HTML.
    • LRUG sentiments: a tag looks like {NN}. Pass in a regex:
        /({JJ}|{JJS})({NNS}|{NNP})/
      and some tagged tokens:
        #=> [(Word @tag="JJ", @text="jolly"), (Word @tag="NN", @text="face")]
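One way to run such a regex over tagged tokens is to encode the tag sequence as a string like "{JJ}{NNS}", scan it, and map each match back to the tokens it spans. The sketch assumes a hypothetical Word struct with text and tag fields and escapes the braces in the slide's pattern; it illustrates the technique rather than reproducing the talk's code. With this pattern an adjective followed by a singular NN does not match, so the example uses a plural noun.

```ruby
# Hypothetical token type standing in for whatever the tagger returns.
Word = Struct.new(:text, :tag)

# The slide's pattern with braces escaped: JJ or JJS followed by NNS or NNP.
PATTERN = /(\{JJ\}|\{JJS\})(\{NNS\}|\{NNP\})/

# Return each run of tokens whose tags match the pattern.
def chunks(words, pattern = PATTERN)
  tag_string = words.map { |w| "{#{w.tag}}" }.join
  tag_string.to_enum(:scan, pattern).map do
    match = Regexp.last_match
    first = tag_string[0...match.begin(0)].count("{")  # tokens before the match
    length = match[0].count("{")                       # tokens inside the match
    words[first, length]
  end
end

tagged = [
  Word.new("some", "DET"), Word.new("jolly", "JJ"), Word.new("faces", "NNS"),
  Word.new("at", "IN"), Word.new("LRUG", "NNP")
]
chunks(tagged).each { |chunk| puts chunk.map(&:text).join(" ") }
# jolly faces
```

Wrapping every tag in braces keeps matches aligned to token boundaries, so "{JJ}" can never match the start of "{JJS}".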
    • Sentimental value:
        epic          1.0
        good          1.0
        chance        0.21875
        brisk         0.21875
        slanderous   -1.0
        piteous      -1.0
    • Results:
        Ruby, Practical Object-Oriented Design in Ruby, doctors, LRUG, recruiters (!)
        dedicated servers, PDFs, Surrey
        unsolicited phone calls from r********s, clients, PayPal, XML, geeks
    • Gems: Text (Paul Battley’s box of tricks), Treat, Tokenizer, Punkt segmenter, Chronic (for extracting dates).
    • Other things you can do that I didn’t talk about: calculate text edit distance; extract entities using the Stanford libraries via RJB; extract topic words (LDA); keyword extraction (TF-IDF); JRuby.
    • Thank you for processing. Questions? @tomcartwrightuk. Thanks to Tim Cowlishaw and the HT dev team for specialised rubber duck support.
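A minimal way to turn per-word values like these into a score for a phrase is to average the values of the words that appear in the lexicon. The sketch uses only the six values from the slide; their source is not stated, and averaging (and the sentiment method itself) is an assumption for illustration, not necessarily how the talk combined them.

```ruby
# Word values from the slide; any other word is treated as unknown.
SENTIMENT = {
  "epic" => 1.0, "good" => 1.0,
  "chance" => 0.21875, "brisk" => 0.21875,
  "slanderous" => -1.0, "piteous" => -1.0
}.freeze

# Average the values of known words; nil when no word is in the lexicon.
def sentiment(text, lexicon = SENTIMENT)
  scores = text.downcase.scan(/[a-z]+/).map { |word| lexicon[word] }.compact
  return nil if scores.empty?

  scores.sum / scores.length
end

p sentiment("An epic, good talk")         # prints 1.0
p sentiment("A brisk but piteous reply")  # prints -0.390625
p sentiment("Nothing to see here")        # prints nil
```

Returning nil rather than 0.0 for text with no known words keeps "neutral" separate from "no evidence".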
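Edit distance is the simplest of these to try. The sketch below is the standard dynamic-programming Levenshtein distance (insertions, deletions and substitutions each cost 1), keeping only two rows of the table. Paul Battley's Text gem also ships a Levenshtein implementation, which may be preferable in an app; the levenshtein method here is a standalone illustration.

```ruby
# Levenshtein edit distance: the minimum number of single-character
# insertions, deletions and substitutions needed to turn a into b.
def levenshtein(a, b)
  previous = (0..b.length).to_a   # distances from "" to each prefix of b
  a.each_char.with_index(1) do |char_a, i|
    current = [i]                 # distance from a[0, i] to ""
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [
        previous[j] + 1,          # deletion
        current[j - 1] + 1,       # insertion
        previous[j - 1] + cost    # substitution (or match)
      ].min
    end
    previous = current
  end
  previous.last
end

p levenshtein("kitten", "sitting")  # prints 3
p levenshtein("jogging", "jog")     # prints 4
p levenshtein("", "ruby")           # prints 4
```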