Relationship between the Semantic Web and NLP

Published in: Education, Technology


Transcript

  • 1. Relationship between the Semantic Web and NLP
      Rajendra Akerkar, Technomathematics Research Foundation, Kolhapur, India
      March 17, 2009. Akerkar: Sogndal Lecture
  • 2. Structure of this talk
      • Relationship between NLP and SW
      • Inspiration: QA system and Haystack
      • RDF Schema & NL Annotations
      • Information Access Schemata
      • Information Planning Schemata
      • Integration
      • Conclusion
  • 3. The sense of the relationship
      • Could the Semantic Web enhance the technical level of NLP technologies?
      • Could NLP technologies help in delivering and using a better Semantic Web?
  • 4. Purpose of the Semantic Web
      To help users:
      • locate,
      • organize, and
      • process information.
      Belief:
      • It should be grounded in the information access method humans are comfortable with: natural language.
  • 5. Why natural language?
      It is intuitive, easy to use, rapidly deployable, and requires no specialized training.
  • 6. Vision
      The Semantic Web equally accessible by computers using specialized languages and interchange formats, and by humans using natural language.
      Ask a computer:
      • "When was the king of Norway born?"
      • "What's the cheapest flight to Mumbai this month?"
      Retrieve "exact information".
  • 7. What synergistic opportunities exist between natural language technology and the Semantic Web?
      State-of-the-art NL systems are capable of providing users intuitive access to a wealth of textual data using ordinary language.
      However, such systems are often hampered by:
      • the knowledge engineering bottleneck (writing wrappers, integrating new data sources),
      • knowledge integration (from multiple sources), and
      • being time-consuming to build.
      Here the Semantic Web comes in …
  • 8. Semantic Web research
      Constructing, integrating, packaging, and exporting segments of knowledge to be usable by the entire world.
      NL technology can tap into this knowledge framework.
      • In return, it provides natural language information access for the Semantic Web.
  • 9. SW: What is missing?
      • Where in the loop is the human?
      • How will we communicate with our software agents?
      • How will we access information on the Semantic Web?
      Obviously, we cannot expect ordinary Semantic Web users to manually manipulate ontologies, query with formal logic expressions, etc.
      We would like to communicate with software agents in natural language…
      • What is the role of natural language in the Semantic Web?
  • 10. Mechanism for integrating NL into RDF
      • Augmenting RDF property definitions
      • Creating Information Access Schemata
          • To bridge the gap between NL & RDF
      • Extension to mirror human question answering behaviour in the form of NL query plans.
  • 11. Inspiration
      • Question-Answering (QA) System
      • Haystack System
          • End-user semantic web platform
          • aggregates all of a user's information into a unified repository.
  • 12. QA system
      The use of metadata is a common technique for rendering information fragments more amenable to processing by computer systems.
      Our approach:
      • natural language itself as metadata
      • numerous advantages and opportunities
      • preserves human readability and
      • encourages non-expert users to engage in metadata creation.
  • 13. QA system
      Natural language annotations:
      • machine-parsable sentences and phrases that describe the content of various information segments.
      • annotations serve as metadata
      • describe the kinds of questions a particular piece of knowledge is capable of answering.
      The QA system contains natural language annotation technology.
  • 14. QA system
      "For pioneering contributions to the theory and practice of optimizing compiler techniques that laid the foundation for modern optimizing compilers and automatic parallel execution."
      Frances E. Allen was selected for the Turing award 2006.
      Annotation:
      • Frances E Allen is selected for Turing award in 2006.
      • 2006 Turing award
  • 15. QA system
      The annotations allow the system to answer:
      • What award did Allen receive in 2006?
      • Who was selected for the Turing award in 2006?
      • To whom was the Turing award given in 2006?
  • 16. QA system
      Feature of natural language annotations:
      • any information segment can be annotated:
      • not only text, but also images, multimedia …
      To provide uniform access to semi-structured resources on the Web:
      • a virtual database system
      • integrates Web sources under a single query interface.
  • 17. Haystack
      Aggregates a user's information into a unified repository:
      • e-mail, documents, calendar, and web pages.
      It is presented using RDF:
      • makes it easy for agents to access, filter, and process this information in an automated fashion.
  • 18. Haystack
      "Present Tim the letter from the secretary I met with last Tuesday from TMRF."
      • Current IT allows us to store all the information needed to answer the query
      • But it is scattered amongst multiple systems
      • An agent needs to communicate with:
          • Email client
          • Calendar
          • File system
          • Directory server
  • 19. Haystack
      Reduce the protocol barriers to information by standardizing on RDF as a common model for information:
      • agents are free to mine the semantics of a user's various data sources
      End-user application for managing information:
      • serves as a powerful platform for experimenting with various information retrieval and user interface research problems
  • 20. QA System & Haystack
      By incorporating natural language search capabilities into Haystack, demonstrate:
      • the usefulness of natural language search
      • its applicability to the Semantic Web
  • 21. To endow Haystack with the ability to answer
      • What is the state bird of India?
      • Tell me what the vision statement of TMRF is.
      • Do you know Sogndal's population?
      Easy on the Web. But for this data to be usable by any Semantic Web system, it must be restructured in terms of the RDF model.
  • 22. Adenine
      Haystack's programming language, designed to facilitate frequent manipulation of RDF data.
      • Features of Lisp, Python, and Notation3.
      • Basic data unit is the RDF triple.
  • 23. Adenine: the :State class and the :bird property
        @prefix dc: <http://purl.org/dc/elements/1.1/>
        @prefix : <www.tourindia.com/data#>

        add { :State
            rdf:type rdfs:Class ;
            rdfs:label "State"
        }

        add { :bird
            rdf:type rdf:Property ;
            rdfs:label "State bird" ;
            rdfs:domain :State
        }

        # ... more property declarations

        add { :india
            rdf:type :State ;
            dc:title "India" ;
            :bird "Peacock" ;
            :flower "Lotus" ;
            :population "1,147,995,904"
            # ... more information about India and its states
        }
      • Triples are enclosed in curly braces { } and expressed in subject-predicate-object order.
      • A semicolon denotes that the predicate-object pair is to assume the last used subject.
      • RDF literals are written as strings in double quotes.
  • 24. Adenine unique feature
      Every Adenine instruction is encoded as a node in the RDF graph, and a sequence of instructions is expressed by adenine:next arcs between these instruction nodes.
      As a result, data and procedures can be embedded within the same RDF graph and can be distributed together.
  • 25. The connection between the RDF schema and the NL annotations in a natural language schema
        @prefix nl: <http://www.tmrfindia.org/sw/projects/enlight#>

        add { :stateAttribute
            rdf:type nl:NaturalLanguageSchema ;
            # This annotation handles cases like "[state bird] of [India]"
            # and "[population] of [India]".
            nl:annotation @( :attribute "of" :state ) ;
            # Code to run to resolve state attribute
            nl:code :stateAttributeCode
        }

        add { :attribute
            rdf:type nl:Parameter ;
            nl:domain rdf:Property ;
            nl:descriptionProperty rdfs:label
        }

        add { :state
            rdf:type nl:Parameter ;
            nl:domain :State ;
            nl:descriptionProperty dc:title
        }

        # The identifier [state] will be bound to the value of the named
        # parameter :state. The identifier [attribute] will be bound to the
        # value of the named parameter :attribute.
        method :stateAttributeCode :state = state :attribute = attribute
            # Ask the system what the [attribute] property of [state] is
            return (ask %{ attribute state ?x })
      • The definition of :attribute restricts the resource representing the attribute to be queried to have type rdf:Property.
      • The rdfs:label property is used to resolve the actual literal, e.g., "State bird" or "population".
      • :state restricts the resource to have type :State and to have the resolver dc:title.
  • 26. Question Answering
      What is the state bird of India?
      • The system parses the question and determines that :stateAttribute is the relevant natural language schema to invoke.
      • The system extracts the natural language bindings of :attribute and :state, which are "state bird" and "India", respectively. These are further resolved into the RDF resources :bird and :india.
      • As a response to the question, the method :stateAttributeCode is invoked with named parameter :attribute bound to :bird and named parameter :state bound to :india.
      • The invoked method performs a query into Haystack's RDF store, which returns "Peacock", the state bird of India.
  • 27. User query is parsed by the QA System
      So, a single natural language annotation is capable of answering a question.
      The QA system is capable of normalizing different methods for requesting the same information:
      • imperative ("Tell me..."),
      • interrogative ("What is...").
  • 28. Natural language schema
        add { :stateAttribute
            rdf:type nl:NaturalLanguageSchema ;
            nl:annotation @( :state " has the largest " :comparisonAttribute ) ;
            nl:code :maxComparisonAttributeCode
        }

        method :maxComparisonAttributeCode :comparisonAttribute = attribute
            return (ask %{
                rdf:type ?x :State ,
                adenine:argMax ?x ?y 1 xsd:int %{
                    :attribute ?x ?y
                }
            } @(?x))
      • The method invoked by the NLS queries the RDF store for the resource of type :State that contains the maximal integer value for the property given by :comparisonAttribute.
      • This allows our system to answer the following questions:
          • Which state has the largest population?
          • Do you know what state has the largest area?
  • 29. QA System
      Built a prototype implementing the natural language schemata.
      Limited in the types of questions that it can answer and in its domain.
      However, it is a proof of concept that demonstrates a method of marrying natural language with the Semantic Web.
  • 30. Further integrating natural language technology with the Semantic Web
      RDF triples ≈ the system's ternary expression representation of NL.
      Clipping natural language annotations directly into rdf:Property definitions.
      Consider a piece of an ontology modeling an address book entry in Haystack:
  • 31. A natural language-aware software agent could answer questions…
        add { :Person
            rdf:type rdfs:Class
        }

        add { :homeAddress
            rdf:type rdf:Property ;
            rdfs:domain :Person ;
            rdfs:range xsd:string ;
            nl:annotation @( nl:subject " lives at " nl:object ) ;
            nl:annotation @( nl:subject "’s home address is " nl:object ) ;
            nl:annotation @( nl:subject "’s bungalow" ) ;
            nl:generation @( nl:subject "’s home address is " nl:object )
        }
      • :homeAddress is a property specifying a user's home address.
      • The annotation expresses this connection concretely in natural language, via the nl:annotation property.
      • The phrase "nl:subject lives at nl:object" is linked to every RDF statement involving the :homeAddress property, where nl:subject is shorthand for the subject (domain) of the relation, and nl:object is shorthand for the object (range) of the relation.
  • 32. ‘Make sense’ with minimal cost!
      The nl:generation property specifies a natural language version of the knowledge.
      • allows software agents to present meaningful, natural responses to users.
      • Question: Where does Ram live?
      • Reply: Ram's home address is Tellefsens gate 5.
  • 33. Information Access Schemata
      Despite the simplicity of adding NL annotations to RDF properties:
      • Significant restriction: only one RDF statement can be queried at once.
      • Solution: create schemata that capture similar patterns of information access.
  • 34. An information access schema is a quadruple
      • Annotations: NL sentences (either declarative or interrogative) or phrases that describe the types of user questions the schema can answer.
      • Pattern: a declarative pattern of RDF triples that references a pre-existing ontology.
      • Action: a set of operators to further process variables bound during the pattern matching process.
      • Mapping: mechanism for handling disjunction between lexical and ontological terms.
  • 35. Example: "family" of questions
      • What is the country in Asia with the largest area?
      • Tell me what Asian country has the highest population density.
      • What country in Europe has the lowest infant mortality rate?
      • What is the most populated American country?
  • 36. Capture the "pattern" of information requests in an information access schema
        <nl:InformationAccessSchema>
          <nl:ann>what country in $region has the largest $attribute</nl:ann>
          <nl:pattern>?x a :Country</nl:pattern>
          <nl:pattern>?x map($attribute) ?val</nl:pattern>
          <nl:pattern>?x :location $region</nl:pattern>
          <nl:action>display(boundto(?x, max(?val)))</nl:action>
          <nl:mapping>
            <nl:hash variable="$attribute">
              <nl:map value="population">:population</nl:map>
              <nl:map value="area">:area</nl:map>
              ...
            </nl:hash>
          </nl:mapping>
        </nl:InformationAccessSchema>
      • Natural language annotations are employed to describe a pattern of RDF statements.
      • Because annotations would be processed by linguistically sophisticated systems, different adjectives such as "highest" and "largest" could be uniformly mapped onto the maximum operation.
      • The schema answers questions that involve region-specific superlative comparison of countries.
  • 37. How the schema works
      • The pattern binds to the value of the particular attribute for countries within the queried geographic region, and the action specifies an aggregate operation (maximum) over the values bound within the pattern.
      • The country corresponding to that maximum value is returned as the answer.
      • The mapping provides a translation from language attributes to RDF properties.
      • Information access schemata are written with respect to a particular pre-existing ontology.
      • In this example, we assume that an appropriate ontology has been established (i.e., :Country is defined as a class, and :location is defined as a property).
      In this vision of the Semantic Web, information access schemata grounded in natural language would co-exist alongside RDF metadata.
  • 38. Further extension: Query Plan
      Question: What is the distance from India to Norway?
      Solution plan: compute the distance between their respective capitals.
      Could humans "teach" such plans to a computer directly?
  • 39. Information Planning Schemata
      An extension of Information Access Schemata.
      Simplifies the task of knowledge engineering.
      Example:
      • Instead of writing RDF patterns,
          • which would require knowledge of domain-specific ontologies,
      • use natural language itself to describe the process of answering a question.
      • The answer plan (nl:plan) reflects the user's thought process expressed in natural language: first find the capitals of the countries, and then find the distance between those cities.
  • 40. An information planning schema
        <nl:InformationPlanningSchema>
          <nl:ann>distance between $country1 and $country2</nl:ann>
          <nl:plan>
            <rdf:Seq>
              <rdf:li>what is the capital of $country1 := ?capital1</rdf:li>
              <rdf:li>what is the capital of $country2 := ?capital2</rdf:li>
              <rdf:li>what is the distance between ?capital1 and ?capital2 := ?distance</rdf:li>
            </rdf:Seq>
          </nl:plan>
          <nl:action>display(?distance)</nl:action>
        </nl:InformationPlanningSchema>
  • 41. Integrating the methods
      The three proposed methods for integrating natural language and RDF can be used together to afford greater flexibility.
      • Annotating RDF properties is a low-cost (from a knowledge engineering perspective) way of providing natural language access to RDF statements.
      • Information access schemata, while being more complex and requiring knowledge of domain-specific ontologies, give experienced knowledge engineers fine-grained tools for manipulating RDF and controlling the output.
      • Information planning schemata allow users to describe, in natural language itself, how they would go about answering a particular class of questions.
      These three methods can combine to provide the foundation for question answering on the Semantic Web.
  • 42. Thank You!
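
Editor's sketch for slides 25 and 26. The question-answering steps described there (match the "[attribute] of [state]" annotation, resolve each binding to an RDF resource through its description property, then query the store) can be imitated in a few lines of Python. This is not Adenine and not Haystack code: the in-memory triple set, the regular expression, and the function names `resolve`, `state_attribute_code` and `answer` are all assumptions made for illustration, and the `:population` label is invented so that a second question has something to resolve.

```python
import re

# Toy triple store (subject, predicate, object) loosely mirroring slide 23.
# The ":population" property declaration and its label are assumptions.
TRIPLES = {
    (":State", "rdf:type", "rdfs:Class"),
    (":bird", "rdf:type", "rdf:Property"),
    (":bird", "rdfs:label", "State bird"),
    (":population", "rdf:type", "rdf:Property"),
    (":population", "rdfs:label", "Population"),
    (":india", "rdf:type", ":State"),
    (":india", "dc:title", "India"),
    (":india", ":bird", "Peacock"),
    (":india", ":population", "1,147,995,904"),
}


def resolve(text, rdf_type, description_property):
    """Return the resource of rdf_type whose description literal equals text."""
    wanted = text.strip().lower()
    for subject, predicate, obj in TRIPLES:
        if predicate == description_property and obj.lower() == wanted:
            if (subject, "rdf:type", rdf_type) in TRIPLES:
                return subject
    return None


def state_attribute_code(state, attribute):
    """Rough analogue of :stateAttributeCode: value of attribute for state."""
    for subject, predicate, obj in TRIPLES:
        if subject == state and predicate == attribute:
            return obj
    return None


# "[attribute] of [state]"; the optional prefixes crudely normalize the
# interrogative and imperative phrasings mentioned on slide 27.
QUESTION = re.compile(
    r"^(?:what is |tell me |do you know )?(?:the )?"
    r"(?P<attribute>.+?) of (?P<state>.+?)[?.]?$",
    re.IGNORECASE,
)


def answer(question):
    """Parse the question, resolve both bindings, then query the store."""
    match = QUESTION.match(question.strip())
    if match is None:
        return None
    attribute = resolve(match.group("attribute"), "rdf:Property", "rdfs:label")
    state = resolve(match.group("state"), ":State", "dc:title")
    if attribute is None or state is None:
        return None
    return state_attribute_code(state, attribute)
```

With this toy data, `answer("What is the state bird of India?")` returns `"Peacock"`, and a question about an unknown attribute or state returns `None`. A real system would use a linguistic parser rather than a regular expression; the regex only stands in for that step.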
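
Editor's sketch for slides 34 to 37. The four parts of an information access schema (annotation, pattern, action, mapping) can be approximated in Python as follows. Everything here is hypothetical: the resources `:a`, `:b`, `:c` and their numbers are placeholder data rather than real statistics, and `annotation_to_regex`, `objects` and `answer` are illustrative helpers, not part of any system described in the talk. The annotation is matched literally, so the linguistic normalization of "highest" versus "largest" mentioned on slide 36 is not attempted.

```python
import re

# Placeholder facts; the numbers are invented for the example.
TRIPLES = [
    (":a", "rdf:type", ":Country"), (":a", ":location", "asia"),
    (":a", ":population", 50), (":a", ":area", 10),
    (":b", "rdf:type", ":Country"), (":b", ":location", "asia"),
    (":b", ":population", 30), (":b", ":area", 90),
    (":c", "rdf:type", ":Country"), (":c", ":location", "europe"),
    (":c", ":population", 70), (":c", ":area", 20),
]

# Annotation and mapping taken from slide 36; pattern and action are
# hard-coded in answer() below.
SCHEMA = {
    "annotation": "what country in $region has the largest $attribute",
    "mapping": {"population": ":population", "area": ":area"},
}


def annotation_to_regex(annotation):
    """Turn each '$name' placeholder into a named capture group."""
    # re.split with one capture group alternates literal text and names.
    parts = re.split(r"\$(\w+)", annotation)
    pattern = ""
    for index, part in enumerate(parts):
        pattern += f"(?P<{part}>.+?)" if index % 2 else re.escape(part)
    return re.compile("^" + pattern + r"\??$", re.IGNORECASE)


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the store."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def answer(question, schema=SCHEMA):
    match = annotation_to_regex(schema["annotation"]).match(question.strip())
    if match is None:
        return None
    region = match.group("region").lower()
    # Mapping: lexical attribute -> RDF property.
    prop = schema["mapping"].get(match.group("attribute").lower())
    if prop is None:
        return None
    # Pattern: ?x a :Country ; ?x :location $region ; ?x map($attribute) ?val
    bindings = [
        (value, s)
        for s, p, o in TRIPLES
        if p == "rdf:type" and o == ":Country"
        and region in objects(s, ":location")
        for value in objects(s, prop)
    ]
    # Action: return the ?x bound to max(?val).
    return max(bindings)[1] if bindings else None
```

On the placeholder data, `answer("What country in Asia has the largest area?")` returns `":b"`. Keeping the mapping as a plain dictionary reflects the role the slides give it, a lookup from language attributes to RDF properties.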
