Relationship between the
      Semantic Web and NLP

                 Rajendra Akerkar
                 Technomathematics Research Foundation,
                 Kolhapur, India




March 17, 2009                      Akerkar: Sogndal Lecture   1
Structure of this talk
         Relationship between NLP and SW
         Inspiration: QA system and Haystack
         RDF Schema & NL Annotations
         Information Access Schemata
         Information Planning Schemata
         Integration
         Conclusion




The sense of the relationship

    Could the Semantic Web enhance the
     technical level of NLP technologies?

    Could NLP technologies help in delivering
     and using a better Semantic Web?




Purpose of the Semantic Web

    to help users
         locate,
         organize, and
         process information.


    belief:
         It should be grounded in the information access
          method humans are comfortable with
          — natural language.

Why natural language?

    It is intuitive,
    easy to use and rapidly deployable, and
    requires no specialized training.




Vision
    The Semantic Web equally accessible by
     computers using specialized languages and
     interchange formats, and humans using
     natural language.
         Ask a computer: “When was the king of Norway
          born?”
         “What’s the cheapest flight to Mumbai this
          month?”
    Retrieve “exact information”.


What synergistic opportunities exist between natural
language technology and the Semantic Web?
    State-of-the-art NL systems are capable of
     providing users intuitive access to a wealth of
     textual data using ordinary language.
    However, such systems are often hampered
     by
         the knowledge engineering bottleneck (writing wrappers
          to integrate each new data source),
         knowledge integration (across multiple sources), and
         the time these tasks consume.
    This is where the Semantic Web comes in …

Semantic Web research

    Constructing, integrating, packaging, and
     exporting segments of knowledge to be
     usable by the entire world.
    NL technology can tap into this knowledge
     framework
         In return, it provides natural language information
          access for the Semantic Web.




SW: What is missing?

         Where in the loop is the human?
         How will we communicate with our software agents?
         How will we access information on the Semantic Web?

          Obviously, we cannot expect ordinary Semantic Web users to
          manually manipulate ontologies, query with formal logic
          expressions, etc.
          We would like to communicate with software agents in natural
          language…

         What is the role of natural language in the Semantic
          Web?


Mechanism for integrating NL into the
RDF
    Augmenting RDF property definitions
    Creating Information Access Schemata
         To bridge gap between NL & RDF
    Extension to mirror human question
     answering behaviour in the form of NL query
     plans.




Inspiration

    Question-Answering (QA) System

    Haystack System
         End-user Semantic Web platform
         aggregates all of a user’s information into a unified
          repository.




QA system

    The use of metadata is a common technique
     for rendering information fragments more
     amenable to processing by computer systems.

    Our approach
         natural language itself as metadata
         numerous advantages and opportunities:
                preserves human readability and
                encourages non-expert users to engage in metadata
                 creation.


QA system

    Natural language annotations
         machine-parsable sentences and phrases that
          describe the content of various information
          segments.
         annotations serve as metadata
                they describe the kinds of questions a particular piece of
                 knowledge is capable of answering.
    The QA system contains natural language annotation
     technology

QA system

    “For pioneering contributions to the theory and
     practice of optimizing compiler techniques that laid
     the foundation for modern optimizing compilers and
     automatic parallel execution.” Frances E Allen was
     selected for the 2006 Turing award.

    Annotation:
         Frances E Allen is selected for Turing award in 2006.
         2006 Turing award



QA system

    The annotations allow the system to answer:

         What award did Allen receive in 2006?
         Who was selected for the Turing award in 2006?
         To whom was the Turing award given in 2006?




QA system

    Feature of natural language annotations
         any information segment can be annotated:
                not only text, but also images, multimedia …
    To provide uniform access to semi-structured
     resources on the Web
         a virtual database system
                integrates Web sources under a single query interface.




Haystack

    Aggregates a user’s information into a unified
     repository.
         e-mail, documents, calendar, and web pages.
    It is represented using RDF
         makes it easy for agents to access, filter, and
          process this information in an automated fashion.




Haystack

    “Present Tim the letter from the secretary I
     met with last Tuesday from TMRF.”
         Current IT can store all the information needed to answer the
          query
                but it is scattered amongst multiple systems
                An agent needs to communicate with
                    Email client
                    Calendar
                    File system
                    Directory server


Haystack

    Reduces the protocol barriers to information by
     standardizing on RDF as a common model
     for information:
         agents are free to mine the semantics of a user’s
          various data sources
    End-user application for managing
     information
         serves as a powerful platform for experimenting
          with various information retrieval and user
          interface research problems

QA System & Haystack

    By incorporating natural language search
     capabilities into Haystack
    Demonstrate
         the usefulness of natural language search
         show its applicability to the Semantic Web




To endow Haystack with the ability to
answer
         What is the state bird of India?
         Tell me what the vision statement of TMRF is.
         Do you know Sogndal’s population?


    Easy on the Web
    But, for this data to be usable by any
     Semantic Web system, it must be
     restructured in terms of the RDF model.

Adenine

    Haystack’s programming language, designed to
     facilitate frequent manipulation of RDF data.
         Features of Lisp, Python, and Notation3.
         Basic data unit is the RDF triple.




Adenine :State class and the :bird property
     @prefix dc: <http://purl.org/dc/elements/1.1/>
     @prefix : <www.tourindia.com/data#>

     # Triples are enclosed in curly braces { } and expressed in
     # subject-predicate-object order.
     add { :State
          rdf:type rdfs:Class ;
          rdfs:label "State"
     }
     # A semicolon denotes that the next predicate-object pair
     # assumes the last used subject.
     add { :bird
          rdf:type rdf:Property ;
          rdfs:label "State bird" ;
          rdfs:domain :State
     }
     # ... more property declarations

     # RDF literals are written as strings in double quotes.
     add { :india
          rdf:type :State ;
          dc:title "India" ;
          :bird "Peacock" ;
          :flower "Lotus" ;
          :population "1,147,995,904"
     # ... more information about India and its states
     }
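
For readers without access to Adenine, the same data can be pictured as a plain set of subject-predicate-object tuples. The Python sketch below is only an illustration under that assumption: the names TRIPLES, match, and objects are hypothetical and are not part of Haystack or Adenine.

```python
# Hypothetical in-memory stand-in for an RDF store; not Haystack's API.
# Each triple is (subject, predicate, object), mirroring the Adenine data.
TRIPLES = {
    (":State", "rdf:type", "rdfs:Class"),
    (":State", "rdfs:label", "State"),
    (":bird", "rdf:type", "rdf:Property"),
    (":bird", "rdfs:label", "State bird"),
    (":bird", "rdfs:domain", ":State"),
    (":india", "rdf:type", ":State"),
    (":india", "dc:title", "India"),
    (":india", ":bird", "Peacock"),
    (":india", ":flower", "Lotus"),
    (":india", ":population", "1,147,995,904"),
}


def match(subject=None, predicate=None, obj=None, triples=TRIPLES):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    )


def objects(subject, predicate, triples=TRIPLES):
    """Return the object values stored for a given subject and predicate."""
    return [o for (_, _, o) in match(subject, predicate, triples=triples)]
```

Calling objects(":india", ":bird") looks up the state bird in the same way a triple pattern with one unbound position would.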

Adenine unique feature

    Every Adenine instruction is encoded as a
     node in the RDF graph, and a sequence of
     instructions is expressed by adenine:next
     arcs between these instruction nodes.

    As a result, data and procedures can be
     embedded within the same RDF graph and
     can be distributed together.


The connection between the RDF schema and the
NL annotations in natural language schema
@prefix nl: <http://www.tmrfindia.org/sw/projects/enlight#>
add { :stateAttribute
    rdf:type nl:NaturalLanguageSchema ;
    # This annotation handles cases like "[state bird] of [India]"
    # and "[population] of [India]".
    nl:annotation @( :attribute "of" :state ) ;
    # Code to run to resolve state attribute
    nl:code :stateAttributeCode
}
# The definition of :attribute restricts the resource representing
# the attribute to be queried to have type rdf:Property. The
# rdfs:label property is used to resolve the actual literal,
# e.g., "State bird" or "population".
add { :attribute
    rdf:type nl:Parameter ;
    nl:domain rdf:Property ;
    nl:descriptionProperty rdfs:label
}
# :state restricts the resource to have type :State and to have
# the resolver dc:title.
add { :state
    rdf:type nl:Parameter ;
    nl:domain :State ;
    nl:descriptionProperty dc:title
}
# The identifier [state] will be bound to the value of the named
# parameter :state. The identifier [attribute] will be bound to the
# value of the named parameter :attribute.
method :stateAttributeCode :state = state :attribute = attribute
    # Ask the system what the [attribute] property of [state] is
    return (ask %{ attribute state ?x })



Question Answering
    What is the state bird of India?

         The system parses the question and determines that
          :stateAttribute is the relevant natural language schema to
          invoke.
         The system extracts the natural language bindings of :attribute
          and :state, which are “state bird” and “India”, respectively. These
          are further resolved into the RDF resources :bird and :india.
         As a response to the question, the method
          :stateAttributeCode is invoked with named parameter
          :attribute bound to :bird and named parameter :state
          bound to :india.
         The invoked method performs a query into Haystack’s RDF store,
          which returns “Peacock”, the state bird of India.
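
The four steps above can be approximated in a few lines of Python. This is a rough sketch under stated assumptions, not the QA system's parser: a regular expression stands in for real linguistic analysis, and TRIPLES, LEAD_IN, ANNOTATION, resolve, and answer are all hypothetical names introduced here for illustration.

```python
import re

# Hypothetical miniature store using the resources named on the slides.
TRIPLES = [
    (":bird", "rdf:type", "rdf:Property"),
    (":bird", "rdfs:label", "State bird"),
    (":population", "rdf:type", "rdf:Property"),
    (":population", "rdfs:label", "population"),
    (":india", "rdf:type", ":State"),
    (":india", "dc:title", "India"),
    (":india", ":bird", "Peacock"),
    (":india", ":population", "1,147,995,904"),
]

# Strip an imperative or interrogative lead-in, then match "[attribute] of [state]".
LEAD_IN = re.compile(r"^(what is|tell me|do you know)\s+(the\s+)?", re.IGNORECASE)
ANNOTATION = re.compile(r"^(?P<attribute>.+?)\s+of\s+(?P<state>.+?)\??$", re.IGNORECASE)


def resolve(text, rdf_type, description_property):
    """Find the resource of rdf_type whose description literal equals text."""
    for s, p, o in TRIPLES:
        if p == description_property and o.lower() == text.lower():
            if (s, "rdf:type", rdf_type) in TRIPLES:
                return s
    return None


def answer(question):
    """Bind :attribute and :state from the question, then query the store."""
    m = ANNOTATION.match(LEAD_IN.sub("", question.strip()))
    if m is None:
        return None
    attribute = resolve(m.group("attribute"), "rdf:Property", "rdfs:label")
    state = resolve(m.group("state"), ":State", "dc:title")
    if attribute is None or state is None:
        return None
    # Plays the role of: ask %{ attribute state ?x }
    for s, p, o in TRIPLES:
        if s == state and p == attribute:
            return o
    return None
```

With this toy data, answer("What is the state bird of India?") resolves "state bird" to :bird and "India" to :india before looking up the value.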


User query is parsed by QA System

    So, a single natural language annotation is
     capable of answering a question.

    The QA system is capable of normalizing different
     ways of requesting the same information:
         imperative (“Tell me...”),
         interrogative (“What is...”).



Natural language schema
add { :stateAttribute
    rdf:type nl:NaturalLanguageSchema ;
    nl:annotation @( :state " has the largest " :comparisonAttribute ) ;
    nl:code :maxComparisonAttributeCode
}

# The method invoked by the NLS queries the RDF store for the
# resource of type :State that contains the maximal integer value
# for the property given by :comparisonAttribute.
method :maxComparisonAttributeCode :comparisonAttribute = attribute
    return (ask %{
        rdf:type ?x :State ,
        adenine:argMax ?x ?y 1 xsd:int %{
            :attribute ?x ?y
        }
    } @(?x))

This allows our system to answer the following questions:
    • Which state has the largest population?
    • Do you know what state has the largest area?
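
The argMax query can be read as "among resources of type :State, return the one whose value for the given property is largest". A minimal Python sketch of that reading follows; the resources :stateA, :stateB, :cityC and their numbers are made-up placeholders, and arg_max is a hypothetical helper rather than Adenine's adenine:argMax.

```python
# Hypothetical data; values are stored as integers so they can be compared.
TRIPLES = [
    (":stateA", "rdf:type", ":State"),
    (":stateA", ":population", 1200),
    (":stateA", ":area", 50),
    (":stateB", "rdf:type", ":State"),
    (":stateB", ":population", 900),
    (":stateB", ":area", 75),
    (":cityC", "rdf:type", ":City"),
    (":cityC", ":population", 5000),
]


def arg_max(attribute, rdf_type=":State", triples=TRIPLES):
    """Return the resource of rdf_type with the largest value of attribute."""
    # Restrict candidates to resources declared with the requested type.
    typed = {s for (s, p, o) in triples if p == "rdf:type" and o == rdf_type}
    candidates = [(o, s) for (s, p, o) in triples if p == attribute and s in typed]
    if not candidates:
        return None
    # Tuples compare by value first, so max() picks the largest value.
    return max(candidates)[1]
```

Note that :cityC has the largest population overall but is excluded because it is not of type :State, which mirrors the rdf:type constraint in the query.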


QA System

    Built a prototype implementing the natural
     language schemata.
    Limited in the types of questions that it can
     answer and the domain.

    However, proof of concept that demonstrates
     a method of marrying natural language with
     the Semantic Web.

Further integrating natural language
technology with the Semantic Web
    RDF triples ≈ System’s ternary expression
                   representation of NL.

    Clipping natural language annotations directly
     into rdf:Property definitions.

    Consider a piece of an ontology modeling an
     address book entry in Haystack:

A natural language-aware software agent could answer questions…

add { :Person
    rdf:type        rdfs:Class
}
# :homeAddress is a property specifying a user’s home address.
# The annotations express this connection concretely in natural
# language, via the nl:annotation property.
add { :homeAddress
    rdf:type        rdf:Property ;
    rdfs:domain     :Person ;
    rdfs:range      xsd:string ;
    nl:annotation   @( nl:subject " lives at " nl:object ) ;
    nl:annotation   @( nl:subject "’s home address is " nl:object ) ;
    nl:annotation   @( nl:subject "’s bungalow" ) ;
    nl:generation   @( nl:subject "’s home address is " nl:object )
}


                 The phrase “nl:subject lives at nl:object” is linked to every RDF
                 statement involving the :homeAddress property, where nl:subject is
                 shorthand for indicating the subject (domain) of the relation, and nl:object
                 is shorthand for the object (range) of the relation.
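
To make the role of nl:annotation and nl:generation concrete, here is a small Python sketch. It assumes, for illustration only, that the declarative annotations have already been turned into question patterns by some linguistic component; the hand-written regular expressions, the ANNOTATIONS table, and the answer function are hypothetical stand-ins for that step.

```python
import re

# Hypothetical question patterns derived from the :homeAddress annotations;
# {s} and {o} in the generation template play the role of nl:subject and nl:object.
ANNOTATIONS = {
    ":homeAddress": {
        "questions": [
            r"where does (?P<s>.+?) live\??$",
            r"what is (?P<s>.+?)'s home address\??$",
        ],
        "generation": "{s}'s home address is {o}.",
    }
}

TRIPLES = [
    (":ram", "rdfs:label", "Ram"),
    (":ram", ":homeAddress", "Tellefsens gate 5"),
]


def answer(question):
    """Match a question to an annotated property and generate a reply."""
    for prop, spec in ANNOTATIONS.items():
        for pattern in spec["questions"]:
            m = re.match(pattern, question.strip(), re.IGNORECASE)
            if m is None:
                continue
            name = m.group("s")
            # Resolve the person by label, then read the annotated property.
            for s, p, label in TRIPLES:
                if p == "rdfs:label" and label.lower() == name.lower():
                    for s2, p2, value in TRIPLES:
                        if s2 == s and p2 == prop:
                            return spec["generation"].format(s=label, o=value)
    return None
```

Several question forms map to the same single RDF statement, and the generation template turns the stored value back into a sentence.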

‘Make sense’ with minimal cost!

    The nl:generation property specifies a
     natural language version of the knowledge.
         allows software agents to present meaningful,
          natural responses to users.

         Question: Where does Ram live?
         Reply: Ram’s home address is Tellefsens gate 5.




Information Access Schemata

    Despite the simplicity of adding NL
     annotations to RDF properties
         Significant restriction: only one RDF statement
          can be queried at once.
         Solution: create schemata that capture similar
          patterns of information access.




An information access schema is a
quadruple
    Annotations: NL sentences (either declarative or
     interrogative) or phrases that describe the types of
     user questions the schema can answer.
    Pattern: a declarative pattern of RDF triples that
     references a pre-existing ontology.
    Action: a set of operators to further process variables
     bound during the pattern matching process.
    Mapping: mechanism for handling disjunction
     between lexical and ontological terms.


Example: “family” of questions

         What is the country in Asia with the largest area?
         Tell me what Asian country has the highest
          population density.
         What country in Europe has the lowest infant
          mortality rate?
         What is the most populated American country?




Capture the “pattern” of information
requests in an information access schema
     <nl:InformationAccessSchema>
       <nl:ann>what country in $region has the largest $attribute</nl:ann>
       <nl:pattern>?x a :Country</nl:pattern>
       <nl:pattern>?x map($attribute) ?val</nl:pattern>
       <nl:pattern>?x :location $region</nl:pattern>
       <nl:action>display(boundto(?x, max(?val)))</nl:action>
       <nl:mapping>
         <nl:hash variable="$attribute">
           <nl:map value="population">:population</nl:map>
           <nl:map value="area">:area</nl:map>
           ...
         </nl:hash>
       </nl:mapping>
     </nl:InformationAccessSchema>

    Natural language annotations are employed to describe a pattern of
     RDF statements.
    Because annotations would be processed by linguistically
     sophisticated systems, different adjectives such as “highest” and
     “largest” could be uniformly mapped onto the maximum operation.
    The schema answers questions that involve region-specific superlative
     comparison of countries.
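
The four parts of the schema (annotation, pattern, action, mapping) can be mirrored in a short Python sketch. Everything here is an assumption made for illustration: the country records and their numbers are placeholders, a regular expression stands in for the annotation matcher, and ANN, MAPPING, and answer are hypothetical names.

```python
import re

# Hypothetical records standing in for RDF statements about countries.
COUNTRIES = [
    {"id": ":countryA", "location": "Asia", ":population": 300, ":area": 40},
    {"id": ":countryB", "location": "Asia", ":population": 120, ":area": 95},
    {"id": ":countryC", "location": "Europe", ":population": 500, ":area": 10},
]

# Mapping: lexical attribute names to ontology properties.
MAPPING = {"population": ":population", "area": ":area"}

# Annotation: "largest" and "highest" both map onto the maximum operation.
ANN = re.compile(
    r"what country in (?P<region>\w+) has the (?:largest|highest) (?P<attribute>\w+)\??$",
    re.IGNORECASE,
)


def answer(question):
    """Apply the pattern (type, attribute, location) and the max action."""
    m = ANN.match(question.strip())
    if m is None:
        return None
    prop = MAPPING.get(m.group("attribute").lower())
    if prop is None:
        return None
    region = m.group("region").lower()
    # Pattern: countries located in the region that have a value for prop.
    rows = [c for c in COUNTRIES if c["location"].lower() == region and prop in c]
    if not rows:
        return None
    # Action: return the country bound to the maximum value.
    return max(rows, key=lambda c: c[prop])["id"]
```

One annotation therefore covers a whole family of questions that differ only in region, attribute, and choice of superlative.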


    The pattern binds to the value of the particular attribute for
          countries within the queried geographic region, and the action
          specifies an aggregate operation (maximum) over the values
          bound within the pattern.
                The country corresponding to that maximum value is returned as the
                 answer.

         The mapping provides a translation from language attributes to
          RDF properties.
                Information access schemata are written with respect to a particular
                 pre-existing ontology.
                In this example, we assume that an appropriate ontology has been
                 established (i.e., :Country is defined as a class, and :location is
                 defined as a property).


    In this vision of the Semantic Web, information access schemata
     grounded in natural language would co-exist alongside RDF
     metadata.


Further extension: Query Plan

    Question: What is the distance from India to
     Norway?
    Solution Plan: Compute the distance
     between their respective capitals.


Could humans “teach” such plans to a computer directly?

Information Planning Schemata

    An extension of Information Access Schemata.
    Simplifies the task of knowledge engineering.

    Example:
                Instead of writing RDF patterns,
                    which would require knowledge of domain-specific ontologies,
                Use natural language itself to describe the process of
                 answering a question.
                    The answer plan (nl:plan) reflects the user’s thought process
                     expressed in natural language: first find the capitals of the
                     countries, and then find the distance between those cities



An information planning schema
   <nl:InformationPlanningSchema>
     <nl:ann>distance between $country1 and $country2</nl:ann>
     <nl:plan>
       <rdf:Seq>
         <rdf:li>what is the capital of $country1 := ?capital1</rdf:li>
         <rdf:li>what is the capital of $country2 := ?capital2</rdf:li>
         <rdf:li>what is the distance between ?capital1 and ?capital2 := ?distance</rdf:li>
       </rdf:Seq>
     </nl:plan>
     <nl:action>display(?distance)</nl:action>
   </nl:InformationPlanningSchema>
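
One way to picture how such a plan might be executed is to treat each step as a sub-question whose answer is bound to a variable and substituted into later steps. The Python sketch below assumes exactly that; PLAN, FACTS, substitute, run_plan, and ask are hypothetical names, and the distance value is a labeled placeholder rather than a real figure.

```python
# Each step is (sub-question template, variable that receives the answer).
PLAN = [
    ("what is the capital of $country1", "?capital1"),
    ("what is the capital of $country2", "?capital2"),
    ("what is the distance between ?capital1 and ?capital2", "?distance"),
]

# Hypothetical lookup table standing in for a full question answering system.
FACTS = {
    "what is the capital of india": "New Delhi",
    "what is the capital of norway": "Oslo",
    "what is the distance between new delhi and oslo": "<distance placeholder>",
}


def ask(question):
    """Answer a sub-question from the lookup table."""
    return FACTS[question.lower()]


def substitute(template, bindings):
    """Replace every bound name in the template with its value."""
    # Longer names go first so a name is never clobbered by a shorter prefix.
    for name in sorted(bindings, key=len, reverse=True):
        template = template.replace(name, bindings[name])
    return template


def run_plan(country1, country2, ask_fn=ask):
    """Execute the plan steps in order, binding each answer to its variable."""
    bindings = {"$country1": country1, "$country2": country2}
    for template, variable in PLAN:
        bindings[variable] = ask_fn(substitute(template, bindings))
    return bindings["?distance"]
```

The plan itself contains no RDF patterns; each step is phrased in natural language and delegated to whatever component can answer it.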




Integrating the methods
    The three proposed methods for integrating natural language and
     RDF can be used together to afford greater flexibility.
                Annotating RDF properties is a low-cost (from a knowledge
                 engineering perspective) way of providing natural language access to
                 RDF statements.
                Information access schemata, while being more complex and
                 requiring knowledge of domain-specific ontologies, give experienced
                 knowledge engineers fine-grained tools for manipulating RDF and
                 controlling the output.
                Information planning schemata allow users to describe, in natural
                 language itself, how they would go about answering a particular class
                 of questions.


    These three methods can combine to provide the foundation for
     question answering on the Semantic Web.



Thank You !





Relationship between the Semantic Web and NLP

  • 1.
    Relationship between the Semantic Web and NLP Rajendra Akerkar Technomathematics Research Foundation, Kolhapur, India March 17, 2009 Akerkar: Sogndal Lecture 1
  • 2.
    Structure of thistalk  Relationship between NLP and SW  Inspiration: QA system and H I i ti t d Haystack t k  RDF Schema & NL Annotations  Information Access Schemata  Information Planning Schemata  Integration  Conclusion March 17, 2009 Akerkar: Sogndal Lecture 2
  • 3.
    The sense ofthe relationship  Could the Semantic Web enhance the technical level of NLP technologies?  Could NLP technologies help in delivering and using a better Semantic Web? g March 17, 2009 Akerkar: Sogndal Lecture 3
  • 4.
    Purpose of theSemantic Web  to help users  locate,  organize, organize and  process information.  belief:  It should be grounded in the information access method humans are comfortable with — natural language. March 17, 2009 Akerkar: Sogndal Lecture 4
  • 5.
    Why natural language?  It is intuitive intuitive,  easy to use and rapidly deployable, and  no specialized training training. March 17, 2009 Akerkar: Sogndal Lecture 5
  • 6.
    Vision  The Semantic Web equally accessible by computers using specialized languages and interchange formats, and humans using natural l t l language.  Ask a computer: “when was the king of Norway born? born?”  “What’s the cheapest flight to the Mumbai this month? month?”  Retrieve “exact information”. March 17, 2009 Akerkar: Sogndal Lecture 6
  • 7.
    What synergistic opportunitiesexist between natural language technology and the S l h l d h Semantic W b? i Web?  State of the art State-of-the-art NL systems are capable of providing users intuitive access to a wealth of textual data using ordinary language.  However, such systems are often hampered by  the knowledge engineering bottleneck (wrappers, integrate new data source),  knowledge integration (from multi. Sources), and g g )  time consuming.  Here Semantic Web comes in … March 17, 2009 Akerkar: Sogndal Lecture 7
  • 8.
    Semantic Web research  Constructing, integrating, packaging, Constructing integrating packaging and exporting segments of knowledge to be usable by the entire world. y  NL technology can tap into this knowledge framework  In return provides natural language information access for the Semantic Web. March 17, 2009 Akerkar: Sogndal Lecture 8
  • 9.
    SW: What ismissing?  Where in the loop is the human?  How will we communicate with our software agents?  How will we access information on the Semantic Web? Obviously, we cannot expect ordinary Semantic Web users to manually manipulate ontologies, query with formal logic expressions, etc. i t We would like to communicate with software agents in natural language…  What is the role of natural language in the Semantic Web? March 17, 2009 Akerkar: Sogndal Lecture 9
  • 10.
    Mechanism for integratingNL into the RDF  Augmenting RDF property definitions  Creating Information Access Schemata  To bridge gap between NL & RDF  Extension to mirror human question answering behaviour in the form of NL query plans. March 17, 2009 Akerkar: Sogndal Lecture 10
  • 11.
    Inspiration  Question Answering Question-Answering (QA) System  Haystack System  End user semantic web platform  aggregates all user s information into a unified user’s repository. March 17, 2009 Akerkar: Sogndal Lecture 11
  • 12.
    QA system  The use of metadata is a common technique for rendering information fragments more tenable to processing by computer systems.  Our approach  natural language itself as metadata  numerous advantages and opportunities.  preserves h human readability and d bilit d  encourages non-expert users to engage in metadata creation. March 17, 2009 Akerkar: Sogndal Lecture 12
  • 13.
    QA system  Natural language annotations  machine-parsable sentences and phrases that describe the content of various i f d ib th t t f i information ti segments.  annotations serve as metadata  describe the kinds of questions a particular piece of knowledge is capable of answering.  Contains natural language annotation technology March 17, 2009 Akerkar: Sogndal Lecture 13
  • 14.
    QA system  “For pioneering contributions to the theory and For practice of optimizing compiler techniques that laid the foundation for modern optimizing compilers and automatic parallel execution.” F t ti ll l ti ” Frances E All was Allen selected for Turing award 2006.  Annotation:  Frances E Allen is selected for Turing award in 2006.  2006 Turing award March 17, 2009 Akerkar: Sogndal Lecture 14
  • 15.
    QA system  The annotations allow system to answer:  What award did Allan receive in 2006?  Who was selected for the Turing award in 2006?  To whom was the Turing award given in 2006? March 17, 2009 Akerkar: Sogndal Lecture 15
  • 16.
    QA system  Feature of natural language annotations  any information segment can be annotated:  not only text, but also images, multimedia … y , g ,  To provide uniform access to semi-structured resources on the Web  a virtual database system  integrates Web sources under a single query interface. March 17, 2009 Akerkar: Sogndal Lecture 16
  • 17.
    Haystack  Aggregates a user’s information into a unified user s repository.  e mail, e-mail, documents, calendar, and web pages.  It is presented using RDF  makes it easy for agents to access filter and access, filter, process this information in an automated fashion. March 17, 2009 Akerkar: Sogndal Lecture 17
  • 18.
    Haystack
      “Present Tim the letter from the secretary I met with last Tuesday from TMRF.”
      Current information technology can store all the information needed to answer this query,
        but the information is scattered among multiple systems.
      An agent needs to communicate with:
        the e-mail client
        the calendar
        the file system
        the directory server
  • 19.
    Haystack
      Reduces the protocol barriers to information by standardizing on RDF as a common model for information:
        agents are free to mine the semantics of a user's various data sources.
      An end-user application for managing information:
        it serves as a powerful platform for experimenting with various information retrieval and user interface research problems.
  • 20.
    QA System & Haystack
      By incorporating natural language search capabilities into Haystack, we aim to:
        demonstrate the usefulness of natural language search, and
        show its applicability to the Semantic Web.
  • 21.
    To endow Haystack with the ability to answer
      What is the state bird of India?
      Tell me what the vision statement of TMRF is.
      Do you know Sogndal's population?
      This information is easy to find on the Web.
      But for this data to be usable by any Semantic Web system, it must be restructured in terms of the RDF model.
  • 22.
    Adenine
      Haystack's programming language, designed to facilitate frequent manipulation of RDF data.
      Combines features of Lisp, Python, and Notation3.
      The basic data unit is the RDF triple.
  • 23.
    Adenine: the :State class and the :bird property

      @prefix dc: <http://purl.org/dc/elements/1.1/>
      @prefix : <www.tourindia.com/data#>

      add { :State
        rdf:type rdfs:Class ;
        rdfs:label "State"
      }

      add { :bird
        rdf:type rdf:Property ;
        rdfs:label "State bird" ;
        rdfs:domain :State
      }

      # ... more property declarations

      add { :india
        rdf:type :State ;
        dc:title "India" ;
        :bird "Peacock" ;
        :flower "Lotus" ;
        :population "1,147,995,904"
        # ... more information about India and its states
      }

      Triples are enclosed in curly braces { } and expressed in subject-predicate-object order.
      A semicolon denotes that the following predicate-object pair takes the last used subject.
      RDF literals are written as strings in double quotes.
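For readers unfamiliar with Adenine, the same declarations can be sketched in Python as a set of (subject, predicate, object) tuples. This is only an illustration of the triple model, not Haystack's actual store: it does not distinguish resources from literals, and the `match` helper is a hypothetical name.

```python
# Triples as (subject, predicate, object) tuples, mirroring the Adenine `add` blocks.
TRIPLES = {
    (":State", "rdf:type", "rdfs:Class"),
    (":State", "rdfs:label", "State"),
    (":bird", "rdf:type", "rdf:Property"),
    (":bird", "rdfs:label", "State bird"),
    (":bird", "rdfs:domain", ":State"),
    (":india", "rdf:type", ":State"),
    (":india", "dc:title", "India"),
    (":india", ":bird", "Peacock"),
    (":india", ":flower", "Lotus"),
    (":india", ":population", "1,147,995,904"),
}


def match(s=None, p=None, o=None):
    """Return all triples matching the pattern, sorted; None acts as a wildcard."""
    return sorted(
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )
```

For example, `match(":india", ":bird")` returns the single triple whose object is `"Peacock"`, and `match(p="rdf:type")` lists the three type declarations.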
  • 24.
    Adenine: a unique feature
      Every Adenine instruction is encoded as a node in the RDF graph, and a sequence of instructions is expressed by adenine:next arcs between these instruction nodes.
      As a result, data and procedures can be embedded within the same RDF graph and can be distributed together.
  • 25.
    The connection between the RDF schema and the NL annotations in a natural language schema

      @prefix nl: <http://www.tmrfindia.org/sw/projects/enlight#>

      add { :stateAttribute
        rdf:type nl:NaturalLanguageSchema ;
        # This annotation handles cases like "[state bird] of [India]"
        # and "[population] of [India]".
        nl:annotation @( :attribute "of" :state ) ;
        # Code to run to resolve state attribute
        nl:code :stateAttributeCode
      }

      add { :attribute
        rdf:type nl:Parameter ;
        nl:domain rdf:Property ;
        nl:descriptionProperty rdfs:label
      }

      add { :state
        rdf:type nl:Parameter ;
        nl:domain :State ;
        nl:descriptionProperty dc:title
      }

      # The identifier [state] will be bound to the value of the named
      # parameter :state. The identifier [attribute] will be bound to the
      # value of the named parameter :attribute.
      method :stateAttributeCode :state = state :attribute = attribute
        # Ask the system what the [attribute] property of [state] is
        return (ask %{ attribute state ?x })

      The definition of :attribute restricts the resource representing the attribute to be queried to have type rdf:Property.
      The rdfs:label property is used to resolve the actual literal, e.g., “State bird” or “population”.
      The definition of :state restricts the resource to have type :State and uses dc:title as the resolver.
  • 26.
    Question Answering
      What is the state bird of India?
      The system parses the question and determines that :stateAttribute is the relevant natural language schema to invoke.
      The system extracts the natural language bindings of :attribute and :state, which are “state bird” and “India”, respectively. These are further resolved into the RDF resources :bird and :india.
      As a response to the question, the method :stateAttributeCode is invoked with named parameter :attribute bound to :bird and named parameter :state bound to :india.
      The invoked method performs a query into Haystack's RDF store, which returns “Peacock”, the state bird of India.
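The four steps above can be sketched end to end in Python. This is a hedged illustration, not the prototype's code: a regular expression stands in for real parsing of the `@( :attribute "of" :state )` annotation, the `:population` label is an assumed value (the slides only give the label for `:bird`), and the function names are hypothetical.

```python
import re

TRIPLES = {
    (":bird", "rdf:type", "rdf:Property"),
    (":bird", "rdfs:label", "State bird"),
    (":population", "rdf:type", "rdf:Property"),
    (":population", "rdfs:label", "Population"),  # assumed label, not given on the slides
    (":india", "rdf:type", ":State"),
    (":india", "dc:title", "India"),
    (":india", ":bird", "Peacock"),
    (":india", ":population", "1,147,995,904"),
}

# Stand-in for the annotation @( :attribute "of" :state ).
ANNOTATION = re.compile(
    r"^(?:what is\s+)?(?:the\s+)?(?P<attribute>.+?)\s+of\s+(?P<state>.+?)\??$",
    re.IGNORECASE,
)


def resolve(text, rdf_type, description_property):
    """Find the resource of rdf_type whose description literal equals text (case-insensitive)."""
    for s, p, o in TRIPLES:
        if p == description_property and o.lower() == text.lower():
            if (s, "rdf:type", rdf_type) in TRIPLES:
                return s
    return None


def state_attribute_code(state, attribute):
    """Counterpart of :stateAttributeCode: look up the attribute value of the state."""
    for s, p, o in TRIPLES:
        if s == state and p == attribute:
            return o
    return None


def answer(question):
    """Match the annotation, resolve both bindings to resources, then query the store."""
    m = ANNOTATION.match(question.strip())
    if m is None:
        return None
    attribute = resolve(m.group("attribute"), "rdf:Property", "rdfs:label")
    state = resolve(m.group("state"), ":State", "dc:title")
    if attribute is None or state is None:
        return None
    return state_attribute_code(state, attribute)
```

Keeping resolution (`resolve`) separate from the query (`state_attribute_code`) mirrors the split between the `nl:Parameter` definitions and the `nl:code` method in the schema.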
  • 27.
    User query is parsed by the QA System
      So, a single natural language annotation is capable of answering a question.
      The QA system is capable of normalizing different methods for requesting the same information:
        imperative (“Tell me...”),
        interrogative (“What is...”).
  • 28.
    Natural language schema

      add { :stateAttribute
        rdf:type nl:NaturalLanguageSchema ;
        nl:annotation @( :state " has the largest " :comparisonAttribute ) ;
        nl:code :maxComparisonAttributeCode
      }

      method :maxComparisonAttributeCode :comparisonAttribute = attribute
        return (ask %{
          rdf:type ?x :State ,
          adenine:argMax ?x ?y 1 xsd:int %{
            :attribute ?x ?y
          }
        } @(?x))

      The method invoked by the natural language schema queries the RDF store for the resource of type :State that contains the maximal integer value for the property given by :comparisonAttribute.
      This allows our system to answer the following questions:
        Which state has the largest population?
        Do you know what state has the largest area?
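A rough Python counterpart of `:maxComparisonAttributeCode`, showing the arg-max step in isolation. The resources `:stateA`, `:stateB`, and `:cityC` and their numeric values are made-up placeholders for illustration only, not real data, and the function name is hypothetical.

```python
# Placeholder data: the names and numbers below are invented for illustration.
TRIPLES = {
    (":stateA", "rdf:type", ":State"), (":stateA", ":population", 10), (":stateA", ":area", 7),
    (":stateB", "rdf:type", ":State"), (":stateB", ":population", 20), (":stateB", ":area", 3),
    (":cityC", "rdf:type", ":City"), (":cityC", ":population", 99),
}


def max_comparison_attribute(attribute):
    """Return the resource of type :State with the largest value for the given property."""
    states = {s for s, p, o in TRIPLES if p == "rdf:type" and o == ":State"}
    candidates = [(s, o) for s, p, o in TRIPLES if s in states and p == attribute]
    if not candidates:
        return None
    # Arg-max: pick the subject whose bound value is largest.
    return max(candidates, key=lambda pair: pair[1])[0]
```

Note that `:cityC` has the largest population overall but is ignored, because the type restriction to `:State` is applied before the maximum is taken.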
  • 29.
    QA System
      We built a prototype implementing the natural language schemata.
      It is limited in the types of questions it can answer and in its domain.
      However, it is a proof of concept that demonstrates a method of marrying natural language with the Semantic Web.
  • 30.
    Further integrating natural language technology with the Semantic Web
      RDF triples ≈ the system's ternary expression representation of natural language.
      Attach natural language annotations directly to rdf:Property definitions.
      Consider a piece of an ontology modeling an address book entry in Haystack:
  • 31.
    A natural language-aware software agent could answer questions…

      add { :Person
        rdf:type rdfs:Class
      }

      add { :homeAddress
        rdf:type rdf:Property ;
        rdfs:domain :Person ;
        rdfs:range xsd:string ;
        nl:annotation @( nl:subject " lives at " nl:object ) ;
        nl:annotation @( nl:subject "'s home address is " nl:object ) ;
        nl:annotation @( nl:subject "'s bungalow" ) ;
        nl:generation @( nl:subject "'s home address is " nl:object )
      }

      :homeAddress is a property specifying a user's home address.
      The annotation expresses this connection concretely in natural language, via the nl:annotation property.
      The phrase “nl:subject lives at nl:object” is linked to every RDF statement involving the :homeAddress property, where nl:subject is shorthand for the subject (domain) of the relation, and nl:object is shorthand for the object (range) of the relation.
  • 32.
    ‘Make sense’ with minimal cost!
      The nl:generation property specifies a natural language version of the knowledge.
        It allows software agents to present meaningful, natural responses to users.
      Question: Where does Ram live?
      Reply: Ram's home address is Tellefsens gate 5.
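As a small sketch of how an agent might use the `nl:generation` template attached to `:homeAddress`: the Python below fills the template from a stored statement. The template syntax with `{subject}` and `{object}` and the names `GENERATION`, `STATEMENTS`, and `describe` are assumptions made for this example; only the template wording and the Ram statement come from the slides.

```python
# Generation templates keyed by property; {subject} and {object} play the roles
# of nl:subject and nl:object.
GENERATION = {
    ":homeAddress": "{subject}'s home address is {object}",
}

# One RDF-like statement from the slide's example.
STATEMENTS = [
    ("Ram", ":homeAddress", "Tellefsens gate 5"),
]


def describe(subject, prop):
    """Render the statement about (subject, prop) using the property's generation template."""
    for s, p, o in STATEMENTS:
        if s == subject and p == prop:
            return GENERATION[p].format(subject=s, object=o) + "."
    return None
```

Calling `describe("Ram", ":homeAddress")` produces the reply sentence; a subject with no matching statement yields `None`, leaving it to the agent to say that it does not know.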
  • 33.
    Information Access Schemata
      Adding NL annotations to RDF properties is simple, but it has a significant restriction:
        only one RDF statement can be queried at once.
      Solution: create schemata that capture similar patterns of information access.
  • 34.
    An information access schema is a quadruple
      Annotations: NL sentences (either declarative or interrogative) or phrases that describe the types of user questions the schema can answer.
      Pattern: a declarative pattern of RDF triples that references a pre-existing ontology.
      Action: a set of operators to further process variables bound during the pattern matching process.
      Mapping: a mechanism for handling disjunction between lexical and ontological terms.
  • 35.
    Example: a “family” of questions
      What is the country in Asia with the largest area?
      Tell me what Asian country has the highest population density.
      What country in Europe has the lowest infant mortality rate?
      What is the most populated American country?
  • 36.
    Capture the “pattern” of information requests in an information access schema

      <nl:InformationAccessSchema>
        <nl:ann>what country in $region has the largest $attribute</nl:ann>
        <nl:pattern>?x a :Country</nl:pattern>
        <nl:pattern>?x map($attribute) ?val</nl:pattern>
        <nl:pattern>?x :location $region</nl:pattern>
        <nl:action>display(boundto(?x, max(?val)))</nl:action>
        <nl:mapping>
          <nl:hash variable="$attribute">
            <nl:map value="population">:population</nl:map>
            <nl:map value="area">:area</nl:map>
            ...
          </nl:hash>
        </nl:mapping>
      </nl:InformationAccessSchema>

      Natural language annotations are employed to describe a pattern of RDF statements.
      Because annotations would be processed by linguistically sophisticated systems, different adjectives such as “highest” and “largest” could be uniformly mapped onto the maximum operation.
      The schema answers questions that involve region-specific superlative comparison of countries.
  • 37.
    The pattern binds to the value of the particular attribute for countries within the queried geographic region, and the action specifies an aggregate operation (maximum) over the values bound within the pattern.
      The country corresponding to that maximum value is returned as the answer.
      The mapping provides a translation from language attributes to RDF properties.
      Information access schemata are written with respect to a particular pre-existing ontology.
        In this example, we assume that an appropriate ontology has been established (i.e., :Country is defined as a class, and :location is defined as a property).
      In this vision of the Semantic Web, information access schemata grounded in natural language would co-exist alongside RDF metadata.
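To make the four parts of the schema concrete, here is a hedged Python sketch in which the annotation is a regular expression, the mapping is a dictionary, the pattern is a filter over triples, and the action is a maximum. The countries `:countryA` to `:countryC` and their values are invented placeholders, not real statistics, and the way "largest" and "highest" are both accepted is a simplification of the linguistic normalization the slides describe.

```python
import re

# Placeholder ontology instance data: names and numbers are invented.
TRIPLES = {
    (":countryA", "a", ":Country"), (":countryA", ":location", "Asia"),
    (":countryA", ":area", 300), (":countryA", ":population", 50),
    (":countryB", "a", ":Country"), (":countryB", ":location", "Asia"),
    (":countryB", ":area", 100), (":countryB", ":population", 80),
    (":countryC", "a", ":Country"), (":countryC", ":location", "Europe"),
    (":countryC", ":area", 900), (":countryC", ":population", 10),
}

SCHEMA = {
    # Annotation: "what country in $region has the largest $attribute"
    "ann": re.compile(
        r"^what country in (?P<region>\w+) has the (?:largest|highest) (?P<attribute>\w+)\??$",
        re.IGNORECASE,
    ),
    # Mapping: lexical attribute -> RDF property
    "mapping": {"population": ":population", "area": ":area"},
}


def answer(question):
    """Apply annotation, mapping, pattern, and action in turn."""
    m = SCHEMA["ann"].match(question.strip())
    if m is None:
        return None
    prop = SCHEMA["mapping"].get(m.group("attribute").lower())
    if prop is None:
        return None
    region = m.group("region")
    # Pattern: ?x a :Country ; ?x :location $region ; ?x map($attribute) ?val
    countries = {s for s, p, o in TRIPLES if p == "a" and o == ":Country"}
    in_region = {s for s, p, o in TRIPLES if p == ":location" and o == region}
    bound = [(s, o) for s, p, o in TRIPLES if p == prop and s in countries & in_region]
    if not bound:
        return None
    # Action: the ?x bound to max(?val)
    return max(bound, key=lambda pair: pair[1])[0]
```

The region restriction matters: `:countryC` has the largest area overall, yet a question about Asia returns `:countryA`.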
  • 38.
    Further extension: Query Plan
      Question: What is the distance from India to Norway?
      Solution plan: compute the distance between their respective capitals.
      Could humans “teach” such plans to a computer directly?
  • 39.
    Information Planning Schemata
      An extension of Information Access Schemata.
      Simplifies the task of knowledge engineering.
      Example:
        Instead of writing RDF patterns,
          which would require knowledge of domain-specific ontologies,
        use natural language itself to describe the process of answering a question.
      The answer plan (nl:plan) reflects the user's thought process expressed in natural language: first find the capitals of the countries, and then find the distance between those cities.
  • 40.
    An information planning schema

      <nl:InformationPlanningSchema>
        <nl:ann>distance between $country1 and $country2</nl:ann>
        <nl:plan>
          <rdf:Seq>
            <rdf:li>what is the capital of $country1 := ?capital1</rdf:li>
            <rdf:li>what is the capital of $country2 := ?capital2</rdf:li>
            <rdf:li>what is the distance between ?capital1 and ?capital2 := ?distance</rdf:li>
          </rdf:Seq>
        </nl:plan>
        <nl:action>display(?distance)</nl:action>
      </nl:InformationPlanningSchema>
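One way to picture how such a plan could be executed is sketched below in Python: each step is a sub-question whose answer is bound to a variable that later steps may reference. The `ask` function is a hypothetical stand-in for the underlying QA system and handles only the two sub-question forms used here; the coordinates are approximate, and the great-circle formula is an assumption about what "distance" means in this example.

```python
import math
import re

CAPITALS = {"India": "New Delhi", "Norway": "Oslo"}
# Approximate (latitude, longitude) in degrees.
COORDS = {"New Delhi": (28.61, 77.21), "Oslo": (59.91, 10.75)}

# The plan from the schema: (sub-question template, variable to bind the answer to).
PLAN = [
    ("what is the capital of $country1", "?capital1"),
    ("what is the capital of $country2", "?capital2"),
    ("what is the distance between ?capital1 and ?capital2", "?distance"),
]


def great_circle_km(a, b):
    """Haversine distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def ask(question):
    """Hypothetical stand-in for the QA system; answers two sub-question forms only."""
    m = re.fullmatch(r"what is the capital of (.+)", question)
    if m:
        return CAPITALS[m.group(1)]
    m = re.fullmatch(r"what is the distance between (.+) and (.+)", question)
    if m:
        return great_circle_km(COORDS[m.group(1)], COORDS[m.group(2)])
    raise ValueError("cannot answer: " + question)


def run_plan(country1, country2):
    """Execute the plan step by step, substituting earlier bindings into later steps."""
    bindings = {"$country1": country1, "$country2": country2}
    for template, target in PLAN:
        question = template
        for name, value in bindings.items():
            question = question.replace(name, str(value))
        bindings[target] = ask(question)
    return bindings["?distance"]
```

`run_plan("India", "Norway")` first binds `?capital1` and `?capital2` to the two capitals and then asks for the distance between them, which is the order of reasoning the plan spells out.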
  • 41.
    Integrating the methods
      The three proposed methods for integrating natural language and RDF can be used together to afford greater flexibility.
        Annotating RDF properties is a low-cost (from a knowledge engineering perspective) way of providing natural language access to RDF statements.
        Information access schemata, while being more complex and requiring knowledge of domain-specific ontologies, give experienced knowledge engineers fine-grained tools for manipulating RDF and controlling the output.
        Information planning schemata allow users to describe, in natural language itself, how they would go about answering a particular class of questions.
      These three methods can combine to provide the foundation for question answering on the Semantic Web.
  • 42.
    Thank you!