Using Suffix Arrays for Efficient Recognition of Named Entities in Large Scale

  1. Using Suffix Arrays for Efficient Recognition of Named Entities in Large Scale. Benjamin Adrian, Sven Schwarz. http://www.dfki.de/~lastname
  2. A huge Web of Data. The Semantic Web offers techniques for ...
     ● representing,
     ● formalizing,
     ● and reasoning about information
     ... on the WWW in order to make information ...
     ● transferable,
     ● portable,
     ● and interpretable
     ... for machine consumption.
     ∑ 9,363,625 distinct literal values
  3. Wouldn't it be great to link entity references in text to referents in RDF graphs? Example: “Benjamin works at DFKI, Kaiserslautern.” Goal: enrich natural language text with formal facts.
  4. How to recognize entity references? Diagram labels: natural language text, efficient representation, RDF source. Example: “Benjamin works at DFKI, Kaiserslautern.” → application of relational databases and suffix arrays
  5. Entity Recognition Process. Stages: text, noun-phrase, suffix array, hashes, database, RDF graph. Steps: chunking, prefix hashing, query, candidates with matching prefixes, exact match, exact matches.
  6. RDF statements.
     symbols:
       <#19810211> <rdfs:label> “Benjamin Adrian”
       <#67478302> <rdfs:label> “DFKI”
     relation:
       <#19810211> <#employedAt> <#67478302>
  7. Represent RDF data: separate storage of symbols and relations.
     SYMBOLS: SUBJECT, PREDICATE, OBJECT
     RELATIONS: SUBJECT, PREDICATE, OBJECT
     dictionaries:
       RESOURCE INDEX: URI, INDEX
       LITERAL INDEX: INDEX, LITERAL, HASH
  8. Suffix Array.
     Text: “Benjamin Adrian works in DFKI, Kaiserslautern”
     Suffix array (sorted list of suffixes):
       Adrian works in DFKI, Kaiserslautern
       Benjamin Adrian works in DFKI, Kaiserslautern
       DFKI, Kaiserslautern
       in DFKI, Kaiserslautern
       Kaiserslautern
       works in DFKI, Kaiserslautern
  9. Suffix Array (same text and suffix array as slide 8).
     Phrases in text:
       Benjamin Adrian
       DFKI
       Kaiserslautern
     Reduced suffix array:
       Adrian works in DFKI, Kaiserslautern
       Benjamin Adrian works in DFKI, Kaiserslautern
       DFKI, Kaiserslautern
       Kaiserslautern
  10. Noun phrases in natural language text
  11. Hashing prefixes.
     Suffix array (hashed prefix size = 4):
       Adrian works in DFKI, Kaiserslautern
       Benjamin Adrian works in DFKI, Kaiserslautern
       DFKI, Kaiserslautern
       Kaiserslautern
     LITERAL INDEX: INDEX, LITERAL, HASH
  12. Select candidates from database
  13. Response time
  14. Summary. Stages: text, noun-phrase, suffix array, hashes, database, RDF graph. Steps: chunking, prefix hashing, query, candidates with matching prefixes, exact match, exact matches.
  15. Thank you. Questions? Benjamin Adrian, Sven Schwarz
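
The process on slides 5 to 11 (chunking, reduced suffix array, prefix hashing, candidate query, exact match) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, an in-memory dict stands in for the LITERAL INDEX database table, CRC32 is an arbitrary choice of hash, the noun phrases are passed in by hand instead of coming from a chunker, and the prefix size of 4 is assumed to be counted in characters (the slides do not say).

```python
import re
import zlib

# Slide 11 states "hashed prefix size = 4"; counting in characters is an assumption.
PREFIX_SIZE = 4


def word_suffixes(text):
    """Return all suffixes of `text` that start at a word, sorted case-insensitively."""
    starts = [m.start() for m in re.finditer(r"\S+", text)]
    return sorted((text[i:] for i in starts), key=str.casefold)


def reduced_suffixes(text, phrases):
    """Keep only suffixes that start at a word inside one of the noun phrases.

    Only the first occurrence of each phrase is considered (sketch simplification).
    """
    keep = set()
    for phrase in phrases:
        start = text.find(phrase)
        if start < 0:
            continue
        for m in re.finditer(r"\S+", phrase):
            keep.add(start + m.start())
    return sorted((text[i:] for i in keep), key=str.casefold)


def prefix_hash(s):
    """Hash the first PREFIX_SIZE characters (CRC32 is a stand-in hash function)."""
    return zlib.crc32(s[:PREFIX_SIZE].encode("utf-8"))


def build_literal_index(labels):
    """Map prefix hash -> [(resource, literal)]; stands in for the LITERAL INDEX table."""
    index = {}
    for resource, literal in labels.items():
        index.setdefault(prefix_hash(literal), []).append((resource, literal))
    return index


def recognize(text, phrases, labels):
    """Return (literal, resource) pairs whose label exactly begins a reduced suffix."""
    index = build_literal_index(labels)
    matches = []
    for suffix in reduced_suffixes(text, phrases):
        # Query: candidates whose hashed prefix matches the suffix's hashed prefix.
        for resource, literal in index.get(prefix_hash(suffix), []):
            # Exact match filters out hash collisions and partial matches.
            if suffix.startswith(literal):
                matches.append((literal, resource))
    return matches


# Example data taken from slides 6, 8 and 9.
text = "Benjamin Adrian works in DFKI, Kaiserslautern"
phrases = ["Benjamin Adrian", "DFKI", "Kaiserslautern"]
labels = {"#19810211": "Benjamin Adrian", "#67478302": "DFKI"}
print(recognize(text, phrases, labels))
```

With the example data, the suffixes starting at "Benjamin" and "DFKI" find candidates through their hashed prefixes and pass the exact match, while "Adrian ..." and "Kaiserslautern" have no label in the index. The hash lookup only narrows the candidate set; correctness rests on the final `startswith` check.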
