Using Suffix Arrays for Efficient Recognition of Named Entities in Large Scale

Published in: Technology, Education

  1. 1. Using Suffix Arrays for Efficient Recognition of Named Entities in Large Scale. Benjamin Adrian, Sven Schwarz. http://www.dfki.de/~lastname
  2. 2. A huge Web of Data. The Semantic Web offers techniques for representing, formalizing, and reasoning about information on the WWW in order to make information transferable, portable, and interpretable for machine consumption. ∑ 9,363,625 distinct literal values.
  3. 3. Wouldn't it be great to link entity references in text to referents in RDF graphs? Example: “Benjamin works at DFKI, Kaiserslautern.” Goal: enrich natural language text with formal facts.
  4. 4. How to recognize entity references? Diagram: natural language text (“Benjamin works at DFKI, Kaiserslautern.”), efficient representation, RDF source. → Application of relational databases and suffix arrays.
  5. 5. Entity Recognition Process. Diagram: text → (chunking) → noun-phrase suffix array → (prefix hashing) → hashes → (query) → database ← RDF graph; candidates with matching prefixes → (exact match) → exact matches.
  6. 6. RDF statements. Symbols: <#19810211> <rdfs:label> “Benjamin Adrian”; <#67478302> <rdfs:label> “DFKI”. Relation: <#19810211> <#employedAt> <#67478302>.
  7. 7. Represent RDF data: separate storage of symbols and relations. Tables: SYMBOLS (SUBJECT, PREDICATE, OBJECT); RELATIONS (SUBJECT, PREDICATE, OBJECT). Dictionaries: RESOURCE INDEX (URI, INDEX); LITERAL INDEX (INDEX, LITERAL, HASH).
  8. 8. Suffix Array. Text: “Benjamin Adrian works in DFKI, Kaiserslautern”. Suffix array (sorted list of suffixes): “Adrian works in DFKI, Kaiserslautern”; “Benjamin Adrian works in DFKI, Kaiserslautern”; “DFKI, Kaiserslautern”; “in DFKI, Kaiserslautern”; “Kaiserslautern”; “works in DFKI, Kaiserslautern”.
  9. 9. Suffix Array. Text: “Benjamin Adrian works in DFKI, Kaiserslautern”. Suffix array (sorted list of suffixes): “Adrian works in DFKI, Kaiserslautern”; “Benjamin Adrian works in DFKI, Kaiserslautern”; “DFKI, Kaiserslautern”; “in DFKI, Kaiserslautern”; “Kaiserslautern”; “works in DFKI, Kaiserslautern”. Phrases in text: “Benjamin Adrian”; “DFKI”; “Kaiserslautern”. Reduced suffix array: “Adrian works in DFKI, Kaiserslautern”; “Benjamin Adrian works in DFKI, Kaiserslautern”; “DFKI, Kaiserslautern”; “Kaiserslautern”.
  10. 10. Noun phrases in natural language text
  11. 11. Hashing prefixes. Suffix array (hashed prefix size = 4): “Adrian works in DFKI, Kaiserslautern”; “Benjamin Adrian works in DFKI, Kaiserslautern”; “DFKI, Kaiserslautern”; “Kaiserslautern”. LITERAL INDEX table: INDEX, LITERAL, HASH.
  12. 12. Select candidates from database
  13. 13. Response time
  14. 14. Summary. Diagram: text → (chunking) → noun-phrase suffix array → (prefix hashing) → hashes → (query) → database ← RDF graph; candidates with matching prefixes → (exact match) → exact matches.
  15. 15. Thank you. Questions? Benjamin Adrian, Sven Schwarz
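
The process outlined on slides 5 and 14 can be illustrated with a minimal Python sketch that reuses the example text and labels from slides 6, 8, 9, and 11. Several parts are assumptions that the slides do not specify: the prefix hash is taken over the first 4 characters and uses CRC32 as a stand-in, the noun phrases are passed in by hand in place of a real chunker, and the SQLite table `literals` with its column names is hypothetical.

```python
import re
import sqlite3
import zlib

PREFIX_SIZE = 4  # "hashed prefix size = 4" on slide 11


def prefix_hash(s):
    # Assumption: hash the first PREFIX_SIZE characters; CRC32 is a stand-in hash.
    return zlib.crc32(s[:PREFIX_SIZE].encode("utf-8"))


def suffix_array(text):
    # Word-level suffixes, sorted case-insensitively to match the order on slide 8.
    starts = [m.start() for m in re.finditer(r"\S+", text)]
    return sorted((text[s:] for s in starts), key=str.casefold)


def reduced_suffix_array(text, phrases):
    # Keep only suffixes that start at a word inside a noun phrase (slide 9).
    # `phrases` stands in for the output of a noun-phrase chunker.
    spans = []
    for phrase in phrases:
        i = text.find(phrase)
        if i >= 0:
            spans.append((i, i + len(phrase)))
    starts = [m.start() for m in re.finditer(r"\S+", text)
              if any(a <= m.start() < b for a, b in spans)]
    return sorted((text[s:] for s in starts), key=str.casefold)


def build_literal_index(labels):
    # labels: (resource, literal) pairs, e.g. taken from rdfs:label statements.
    # Hypothetical schema; literals shorter than PREFIX_SIZE are not handled here.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE literals (resource TEXT, literal TEXT, hash INTEGER)")
    db.execute("CREATE INDEX literals_hash ON literals (hash)")
    db.executemany("INSERT INTO literals VALUES (?, ?, ?)",
                   [(res, lit, prefix_hash(lit)) for res, lit in labels])
    return db


def recognize(text, phrases, db):
    matches = []
    for suffix in reduced_suffix_array(text, phrases):
        # Query: candidates whose literal shares the hashed prefix of the suffix.
        rows = db.execute("SELECT resource, literal FROM literals WHERE hash = ?",
                          (prefix_hash(suffix),))
        for resource, literal in rows:
            end = len(literal)
            # Exact match: the suffix starts with the literal at a word boundary.
            if suffix.startswith(literal) and (
                    end == len(suffix) or not suffix[end].isalnum()):
                matches.append((literal, resource))
    return matches


text = "Benjamin Adrian works in DFKI, Kaiserslautern"
phrases = ["Benjamin Adrian", "DFKI", "Kaiserslautern"]
db = build_literal_index([("#19810211", "Benjamin Adrian"),
                          ("#67478302", "DFKI")])
result = recognize(text, phrases, db)
print(result)
```

In this sketch the hash lookup only narrows the search to candidates with a matching prefix; the final `startswith` check decides whether a candidate literal actually occurs in the text, mirroring the two-stage "candidates with matching prefixes" and "exact matches" steps on the slides.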
