CLTL: Description of web services and software. Nijmegen 2013

Published in: Technology, Education

  1. CLTL Software and Web Services (Rubén Izquierdo Beviá)
  2. About me (Rubén Izquierdo Beviá)
       • 5-year degree in Computer Science (University of Alicante, Alicante, Spain)
       • National NLP projects and 1 European project (QALLME) (University of Alicante, Alicante, Spain)
       • Thesis on NLP and Word Sense Disambiguation (University of Alicante, Alicante, Spain, Sept 2010)
       • Postdoc position at the DutchSemCor project (University of Tilburg, Tilburg, Sept 2011 to Sept 2012)
       • Postdoc position at the OpeNER project (Vrije Universiteit, Amsterdam, since Sept 2012)
  3. CLTL software
       • In general, a common input/output format
           • KAF
           • NAF, as an extension of KAF
       • Single components performing single tasks
       • Integration of existing modules
           • Adaptation of input/output formats
       • Development of new ones
  4. KAF: Kyoto Annotation Format
       • Stand-off, layered, XML-based representation format
       • Different types of information are stored in different layers
       • Layers are linked by means of references
       • Suitable for creating pipelines based on this format
       • Layers:
           • Text: tokens
           • Term: lemmas, part-of-speech, term sentiment, word senses
           • Entities, chunks, opinions…
  5. KAF: Kyoto Annotation Format
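
The idea of layers linked by references can be illustrated with a minimal sketch. The sample document below is a simplified, KAF-like fragment: its element and attribute names (`wf`, `wid`, `term`, `tid`, `span`, `target`) are assumptions chosen for illustration and should not be read as the official schema. The hypothetical helper `resolve_terms` follows each term's references back to the token layer.

```python
import xml.etree.ElementTree as ET

# Simplified, KAF-like sample. Element and attribute names are illustrative
# assumptions, not a copy of the official KAF schema.
KAF_SAMPLE = """<KAF>
  <text>
    <wf wid="w1">The</wf>
    <wf wid="w2">rooms</wf>
    <wf wid="w3">were</wf>
    <wf wid="w4">clean</wf>
  </text>
  <terms>
    <term tid="t1" lemma="room">
      <span><target id="w2"/></span>
    </term>
    <term tid="t2" lemma="clean">
      <span><target id="w4"/></span>
    </term>
  </terms>
</KAF>"""


def resolve_terms(kaf_xml):
    """Return (term id, lemma, surface text) by following term -> token references."""
    root = ET.fromstring(kaf_xml)
    # Token layer: map each token id to its surface form.
    tokens = {wf.get("wid"): wf.text for wf in root.iter("wf")}
    resolved = []
    # Term layer: each term points at one or more tokens through its span.
    for term in root.iter("term"):
        token_ids = [target.get("id") for target in term.iter("target")]
        surface = " ".join(tokens[token_id] for token_id in token_ids)
        resolved.append((term.get("tid"), term.get("lemma"), surface))
    return resolved


if __name__ == "__main__":
    for tid, lemma, surface in resolve_terms(KAF_SAMPLE):
        print(tid, lemma, surface)
```

Because the term layer stores only identifiers, a later module can add a new layer (for example entities pointing at terms) without touching the layers below it.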
  6. NAF: NewsReader Annotation Format
       • Extension of KAF
       • Allows cross-document processing
           • Event coreference
       • IDs are converted into valid URIs
       • Stores the same type of information provided by different tools
           • For example, the results of two different POS taggers
  7. How the software is provided I
       • All modules are publicly available on GitHub
           • CLTL GitHub: http://github.com/cltl
           • NewsReader GitHub: http://github.com/newsreader
           • OpeNER GitHub: http://github.com/opener-project/
  8. How the software is provided II
       • Some modules are available as web services
       • Exposed as REST web services
           • Accept an input stream (KAF/NAF)
           • Generate an output stream (KAF/NAF)
       • Easy to call from the command line with cURL
       • Easy to create module pipelines in the same way you create a pipeline of Linux commands
       • http://wordpress.let.vupr.nl/web-services/
  9. How the software is provided II
  10. How the software is provided II
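
Chaining services that each read and write KAF/NAF can be sketched as follows. This is a minimal illustration, not the project's client code: the helper names `rest_stage` and `run_pipeline` are hypothetical, the endpoint URLs in the usage note are placeholders, and the sketch assumes each service accepts the document in the body of an HTTP POST request and returns the annotated document in the response body.

```python
from functools import reduce
from urllib import request


def rest_stage(url, timeout=30):
    """Wrap a REST endpoint as a stage: document bytes in, document bytes out.

    Assumption: the service takes the KAF/NAF document as the POST body.
    """
    def call(document):
        req = request.Request(
            url,
            data=document,
            headers={"Content-Type": "application/xml"},
            method="POST",
        )
        with request.urlopen(req, timeout=timeout) as response:
            return response.read()
    return call


def run_pipeline(document, stages):
    """Feed the output of each stage into the next, like a shell pipe."""
    return reduce(lambda doc, stage: stage(doc), stages, document)
```

A pipeline would then be a list of stages, for example `run_pipeline(raw_kaf, [rest_stage("http://example.org/tokenizer"), rest_stage("http://example.org/pos-tagger")])`, where both URLs are placeholders. Since every stage shares the same format, stages can be reordered or swapped for local functions without changing the caller.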
  11. Our software I
       • General modules (integrated)
           • Tokenizers: whitespace based, open-nlp trained...
           • Sentence splitters: based on rules, open-nlp
           • POS taggers: treetagger, open-nlp POS taggers
           • Chunker: trained on Alpino data with open-nlp
           • Parsers: Alpino (nl), Stanford (en)
  12. Our software II
       • General modules (developed by us)
           • WordNet tools
               • Functions to use a WordNet in LMF format
           • Word Sense Disambiguation systems
               • UKB: unsupervised
               • SVM: supervised (for nl, derived from DutchSemCor)
           • Multiword tagger
               • Tags multiword sequences of terms according to WordNet
           • OntoTagger
               • Inserts (semantic) labels into the KAF representation on the basis of lemma or WordNet synset representations of the text
  13. Our software III
       • General modules (developed by us)
           • Named Entity Recognizer
               • Detects dates and locations using specific resources + GeoNames
           • KyBot
               • Extracts tuples and relations according to a set of profiles formulated using semantic and structural properties
  14. Our software IV
       • OpeNER related (developed by us)
           • Hotel property tagger
               • Detects aspects related to cleanliness, staff, breakfast, rooms…
           • Term polarity tagger
               • Positive/negative terms, intensifiers, negators…
           • Opinion miner
               • Detects opinions: target + holder + expression
               • 2 rule-based versions and 1 machine-learning version
  15. Our software V
       • NewsReader related (developed by us)
           • Discourse Module
               • Splits incoming texts into headers and paragraphs
           • Factuality Classifier
               • Classifies whether a statement is factual, probable, possible, or not
           • Event Coreference
               • Compares descriptions of events within and across documents to decide whether they refer to the same events
  16. CLTL Software and Web Services (Rubén Izquierdo Beviá)