Natural Language Processing with Perl




G Jaganadh
C-DAC Thiruvananthapuram


           FossConf 2008 Chennai
Talk Overview
Introduction

Natural Language Processing

Perl

Perl Lingua Modules

Some examples

Towards the future



Introduction


• Objectives of the talk
    Introducing NLP techniques to language researchers




Natural Language Processing



Introduction to NLP

Sub fields in NLP




Perl


• Practical Extraction and Report Language

 Free and Open Source

 Easy to Learn

 Powerful regular expressions for text searching
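
To make the point about regular expressions concrete, here is a minimal sketch of text searching in Perl. The helper name `capitalised_words` and the sample sentence are illustrative assumptions, not part of the talk.

```perl
use strict;
use warnings;

# Hypothetical helper: return every capitalised word found in a string.
sub capitalised_words {
    my ($text) = @_;
    # \b is a word boundary; [A-Z][a-z]+ is one capital letter
    # followed by one or more lowercase letters.
    # In list context, a /g match returns all captured groups.
    return $text =~ /\b([A-Z][a-z]+)\b/g;
}

my @found = capitalised_words("Perl was created by Larry Wall in 1987.");
print join(", ", @found), "\n";   # prints: Perl, Larry, Wall
```

The same pattern syntax works in `split`, `s///`, and `grep`, which is why most small text-processing tasks need no extra modules.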




Perl Lingua Modules


Perl Modules for Linguistic Processing

Almost all modules are for English, Dutch, and other European languages

Powerful implementations of various NLP algorithms




Some Examples


Counting words in a text

Pattern Matching

Use of Lingua::EN::Sentence

Use of Lingua::EN::NamedEntity




Counting words
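#!/usr/bin/perl
use strict;
use warnings;

# Read the whole input (files named on the command line, or STDIN).
my $text = '';
while (my $line = <>) {
    $text .= $line;
}

# Replace every run of non-letter characters with a single newline,
# so that each word ends up on its own line.
# Only ASCII letters are treated as word characters here;
# extend the class for accented letters.
$text =~ tr/a-zA-Z/\n/cs;
my @words = grep { length } split(/\n/, $text);

# Count how often each word occurs.
my %frequency;
$frequency{$_}++ for @words;

# Print the counts in alphabetical order of the words.
foreach my $word (sort keys %frequency) {
    print "$frequency{$word} $word\n";
}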
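
A frequency list is often more useful when the most common words come first. The sketch below is one possible variant, not the talk's own code: the helper `word_frequencies` is a hypothetical name, and it assumes that words are runs of ASCII letters and that case should be ignored.

```perl
use strict;
use warnings;

# Hypothetical helper: count lowercased ASCII words in a string.
sub word_frequencies {
    my ($text) = @_;
    my %frequency;
    # In scalar context, a /g match finds one word per iteration.
    $frequency{lc $1}++ while $text =~ /([A-Za-z]+)/g;
    return %frequency;
}

my %frequency = word_frequencies("the cat and the dog and the bird");

# Sort by descending count, breaking ties alphabetically.
for my $word (sort { $frequency{$b} <=> $frequency{$a} || $a cmp $b } keys %frequency) {
    print "$frequency{$word} $word\n";
}
```

For the sample sentence this prints `3 the`, `2 and`, then `1 bird`, `1 cat`, and `1 dog`.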


Lingua::EN::Sentence

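#!/usr/local/bin/perl -w
use strict;
use Lingua::EN::Sentence qw( get_sentences add_acronyms );

# Register extra abbreviations so that the full stop after them
# is not taken as the end of a sentence.
add_acronyms('lt', 'gen');

# Read the input one paragraph at a time (paragraphs end with a blank line).
$/ = "\n\n";

while (my $paragraph = <>) {
    # get_sentences returns a reference to an array of sentences.
    my $sentences = get_sentences($paragraph);
    next unless $sentences;
    foreach my $s (@$sentences) {
        print "<s> $s </s>\n";
    }
}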



Lingua::EN::NamedEntity

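#!/usr/bin/perl
use strict;
use warnings;
use Lingua::EN::NamedEntity;

# Slurp the whole input; each line read by <> already ends with a newline.
my $str = join '', <>;

# extract_entities returns a list of hash references, one per entity found.
my @entities = extract_entities($str);
foreach my $entity (@entities) {
    print $entity->{entity}, "\n";
}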




Pattern Matching

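# Print every input line that matches the pattern.
# Replace _____ with the regular expression to search for.
while (my $line = <>) {
    if ($line =~ m/_____/) {
        print $line;
    }
}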
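
As a filled-in instance of the loop above, the sketch below keeps only the lines that contain a word ending in "ing". The helper `matching_lines`, the sample lines, and the pattern are illustrative assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the lines that match a compiled pattern.
sub matching_lines {
    my ($pattern, @lines) = @_;
    return grep { $_ =~ $pattern } @lines;
}

my @lines = (
    "Parsing is a sub field of NLP\n",
    "Perl is easy to learn\n",
    "Counting words in a text\n",
);

# qr// compiles the pattern once; /i makes the match case-insensitive.
print matching_lines(qr/\b\w+ing\b/i, @lines);
```

This prints the first and third sample lines. Passing the pattern as a `qr//` object keeps the search logic separate from the pattern, so the same helper can be reused with different expressions.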




Towards the future

 Lingua Modules for Indian Languages

 Useful Stuff
• http://search.cpan.org/search?query=Lingua&mode=all

 http://wiki.christophchamp.com/index.php/Perl/Modules/Lingu




Questions?




Thanks
jaganadhg@gmail.com



