Thesis presentation

The presentation I gave about the topic of my th

Published in: Education, Technology

  1. A Transparent Formalization of Text for Machines
     • Sebastian Hellmann, AKSW, Universität Leipzig
     • BIS – 2012/03/01 Leipzig
     • Start: Jan 2009; tentative end: Summer 2012
     • http://nlp2rdf.org
     • http://lod2.eu (LOD2: Creating Knowledge out of Interlinked Data)
  2. Overview
     • Introduction to the areas touched
     • Scientific core
     • Evaluation
     • Plan
  3. The Semantic Gap
  4. The Semantic Gap
     • Most problems occurred at the bottom.
     • Data integration is difficult if the pivots are not well defined.
     • Questions (in order):
         • What structure to use?
         • What URIs to use?
         • What is a String?
         • How can we teach machines to understand Strings (Knowledge Representation)?
  5. Main question
     How can we formalize text in a way that is:
     • Transparent for machines
     • Efficient for NLP use cases
     • Consistent with the Web architecture
  6. Areas
  7. Preliminary definition
     The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations.
     This definition is still limited to RDF and NLP and targets software integration via a common exchange format.
  8. Scientific core
  9. Scientific core
  10. Scientific core: not transparent for machines
  11. Scientific core
     • The universe of discourse is defined as the words over the alphabet of Unicode characters (Unicode Normal Form C), often called Σ*.
     • URI: http://example.org/sample#offset_0_42
       String: “The city Berlin is the capital of Germany.”
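
Why the normal form matters for this universe of discourse can be sketched in a few lines of Python: the same visible text can be stored as different code point sequences, so character offsets only agree once both sides normalize to NFC. The sample word is an assumption chosen for illustration; `unicodedata.normalize` is part of the Python standard library.

```python
import unicodedata

# "Universität" written with a combining diaeresis (decomposed form) ...
decomposed = "Universita\u0308t"
# ... and the same word after normalization to Unicode Normal Form C.
composed = unicodedata.normalize("NFC", decomposed)

# Both strings render identically but differ in length, so any
# character offset after the "ä" would shift by one.
print(len(decomposed), len(composed))
print(composed == "Universit\u00e4t")
```
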
  12. Scientific core
     • The universe of discourse is defined as the words over the alphabet of Unicode characters (Unicode Normal Form C), often called Σ*.
     • Context: http://example.org/sample#offset_0_42
       isString “The city Berlin is the capital of Germany.”
     • Substring: http://example.org/sample#offset_34_41
       isString “Germany”
       referenceContext http://example.org/sample#offset_0_42
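
The offset-based addressing on this slide can be illustrated with a minimal Python sketch. It assumes that the two numbers in `#offset_B_E` are a zero-based begin index and an exclusive end index, which is consistent with the slide's example. The helper names `offset_uri` and `describe` and the plain-tuple statements are hypothetical and not part of NIF.

```python
BASE = "http://example.org/sample"  # document URI from the slide
TEXT = "The city Berlin is the capital of Germany."

def offset_uri(base: str, begin: int, end: int) -> str:
    """Build a fragment URI of the form base#offset_begin_end (assumed scheme)."""
    return f"{base}#offset_{begin}_{end}"

def describe(base: str, text: str, substring: str) -> list[tuple[str, str, str]]:
    """Return (subject, property, object) tuples for a context and one substring."""
    begin = text.index(substring)  # first occurrence only
    end = begin + len(substring)
    context = offset_uri(base, 0, len(text))
    part = offset_uri(base, begin, end)
    return [
        (context, "isString", text),
        (part, "isString", text[begin:end]),
        (part, "referenceContext", context),
    ]

triples = describe(BASE, TEXT, "Germany")
for triple in triples:
    print(triple)
```

Run on the slide's sentence, the sketch yields `#offset_0_42` for the context and `#offset_34_41` for “Germany”.
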
  13. Scientific core
     Define the notion of “Context” and formalize it in OWL:
     • Context is similar to the German word “Betrachtungshorizont”.
     • In English, perhaps “inside context”, i.e. the text itself, which serves as a reference context for all included substrings.
     • Definitely disjoint with groupings such as “Document”, because a “wider context” is needed for those.
     Example following...
  14. Scientific core
  15. Scientific core
     Define the notion of “Context” and formalize it in OWL:
     • Context is similar to the German word “Betrachtungshorizont”.
     • In English, perhaps “inside context”, i.e. the text itself, which serves as a reference context for all included substrings.
     • Definitely disjoint with groupings such as “Document”, because a “wider context” is needed for those.
  16. Scientific core
     The goal is to research some of the implications, although I might not be able to finish this completely.
     In scope:
     • The property “contextString” is inverse-functional, which means that machines can automatically infer that the same context occurs in different documents.
     • Show consistency with ambiguity
     • Define metrics that compare contexts
     • Formalize the interpretation function
     • Show interoperability with the internal models of all major NLP frameworks
     • (Partial) compatibility with the WWW and the GGG
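
What inverse-functionality buys can be sketched as follows: if two context resources carry the same “contextString” value, a reasoner may conclude that they denote the same context. The Python below imitates that single inference over hand-written tuples. The document URIs, the sample texts and the function name `same_contexts` are made up for illustration, and no OWL reasoner is involved.

```python
from collections import defaultdict

# Hypothetical (subject, property, value) statements from three documents.
statements = [
    ("http://example.org/docA#offset_0_42", "contextString",
     "The city Berlin is the capital of Germany."),
    ("http://example.org/docB#offset_0_42", "contextString",
     "The city Berlin is the capital of Germany."),
    ("http://example.org/docC#offset_0_15", "contextString",
     "Leipzig is big."),
]

def same_contexts(stmts):
    """Group subjects that share a contextString value (inverse-functional rule)."""
    by_value = defaultdict(list)
    for subject, prop, value in stmts:
        if prop == "contextString":
            by_value[value].append(subject)
    # Only groups with more than one subject yield an identity conclusion.
    return [sorted(group) for group in by_value.values() if len(group) > 1]

print(same_contexts(statements))
```

Here the contexts from docA and docB end up in one group, while docC stays on its own and is not reported.
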
  17. Scientific core
     Out of scope:
     • Transition between contexts: do statements from a smaller context hold in a broader context?
     • Incorporating all layers of NLP (stack); limited to POS tags and entity recognition
     • Filling all the question marks in the Venn diagram
  18. Areas
  19. Linguistic Linked Open Data Cloud
  20. Developers study
  21. Areas
  22. Evaluation
     • Compare to other models in NLP: size (RDF vs. XML), performance, expressivity.
     • Is NIF easy to understand and implement? Developers study: the release of the specification had quite an impact; people started to create extensions and use the format. 50 people on the mailing list.
     • How to evaluate Web Service integration or consistency with the Web architecture?
     • If the way strings are represented is transparent and formalized, do I need to do experimental evaluation to show benefits?
  23. Q&A
     Thank you for your attention
     Standing on the shoulders of giants
