Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis

Presentation at the KCAP 2011 conference of the paper: http://data.open.ac.uk/applications/kcap2011.pdf

Transcript

  • 1. Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis
    Mathieu d’Aquin and Enrico Motta
    Knowledge Media Institute
    The Open University, Milton Keynes, UK
  • 2. Hey, Data!
    I Love Data!
  • 3. Me?
    My name is “one great dataset” and my namespace http://datasets.com/greatone/
    Let’s see… You there, what are you about?
    One great dataset
  • 4. 1,254,245 triples.
    I also have a SPARQL endpoint!
    OK, but what’s there?
    One great dataset
  • 5. Er… I have a VoID description… with links and all…
    Can you be more explicit?
    One great dataset
  • 6. You mean you want to see… my ontology?
    Hmm… I mean, what are these triples saying?
    One great dataset
  • 7. That would help… but can you tell me what I can ask you?
    Like example SPARQL queries?
    One great dataset
  • 8. Yeah… but I don’t know SPARQL, and how do you choose your examples anyway?
    One great dataset
  • 9. Well… figure it out by yourself then!
    One great dataset
  • 10. Summarizing an RDF dataset with questions
    We would like to be able to give an entry point to a dataset by showing questions it is good at answering
    In a way that can be navigated
    Example:
    Who are the people Tom knows?
    Tom Heath’s FOAF profile
  • 11. A question
    A list of characteristics of objects (clauses) based on the relationships between objects
    Things that are people, i.e. instances of <Person>
    Related to <tom> through the relation <knows>
    For which the answer is a set of objects
    All the objects that satisfy the clauses of the question
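As an illustration of this model, here is a minimal sketch of the example question rendered as a SPARQL query, wrapped in Python (the language used for all sketches in this transcript). The FOAF namespace is standard, but the <tom> resource URI is a placeholder, not the one from Tom Heath's actual profile:

```python
# A question = a conjunction of clauses; its answers = the objects satisfying them.
# The tom URI below is a placeholder, not Tom Heath's real FOAF URI.
QUESTION = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?answer WHERE {
  ?answer a foaf:Person .                        # clause 1: instances of <Person>
  <http://example.org/tom> foaf:knows ?answer .  # clause 2: <tom> knows ?answer
}
"""
print(QUESTION)
```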
  • 12. Formal concept analysis
    Lattice of concepts: set of objects (extension) with common properties (intension)
    Formal context: objects with binary attributes
    Example from: http://en.wikipedia.org/wiki/Formal_concept_analysis
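For readers new to FCA, a minimal sketch of concept enumeration over a toy formal context. The context below is invented for illustration, and the brute-force closure is only suitable for toy sizes; real pipelines use dedicated tools such as CORON:

```python
from itertools import combinations

# Toy formal context: objects mapped to their binary attributes.
context = {
    "tom":    {"Class:-Person", "knows:-enrico"},
    "enrico": {"Class:-Person"},
    "jeff":   {"Class:-Person", "knows:-tom"},
}

def intent(objects):
    """Attributes shared by every object in the set (all attributes if empty)."""
    sets = [context[o] for o in objects]
    return set.intersection(*sets) if sets else {a for s in context.values() for a in s}

def extent(attributes):
    """Objects that carry all the given attributes."""
    return {o for o, attrs in context.items() if attributes <= attrs}

# Enumerate concepts by closing every subset of objects (extension, intension).
concepts = set()
objs = list(context)
for r in range(len(objs) + 1):
    for combo in combinations(objs, r):
        b = intent(set(combo))  # common properties = intension
        a = extent(b)           # all objects having them = extension
        concepts.add((frozenset(a), frozenset(b)))

for a, b in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(a), "<->", sorted(b))
```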
  • 13. RDF instances as individuals in a formal context
    Present relations of objects as binary attributes:
    RDF: tom a Person. tom knows enrico. jeff knows tom.
    FCA: tom: {Class:-Person, knows:-enrico, jeff-:knows}
    Include implicit information based on the ontology
    tom: {Class:-Person, Class:-Agent, Class:-Thing, knows:-enrico, knows:-Person, knows:-Agent, knows:-Thing, jeff-:knows, Person-:knows, Agent-:knows, Thing-:knows}
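A sketch of this conversion, using the slides' notation ("p:-o" for an outgoing relation, "s-:p" for an incoming one). The class hierarchy and the types of enrico and jeff are assumptions made for the example; the full conversion also generalizes incoming relations by class, omitted here for brevity:

```python
# Toy data mirroring the slide; the hierarchy Person < Agent < Thing is assumed.
triples = [("tom", "a", "Person"), ("tom", "knows", "enrico"), ("jeff", "knows", "tom")]
superclasses = {"Person": ["Agent", "Thing"], "Agent": ["Thing"]}
types = {"enrico": "Person", "jeff": "Person"}  # assumed types of the other objects

def attributes_for(subject):
    attrs = set()
    for s, p, o in triples:
        if s == subject and p == "a":
            # explicit class, plus everything the hierarchy makes implicit
            attrs.add(f"Class:-{o}")
            attrs.update(f"Class:-{c}" for c in superclasses.get(o, []))
        elif s == subject:
            # outgoing relation: to the object and, implicitly, to each of its classes
            attrs.add(f"{p}:-{o}")
            for c in [types.get(o)] + superclasses.get(types.get(o, ""), []):
                if c:
                    attrs.add(f"{p}:-{c}")
        elif o == subject:
            # incoming relation
            attrs.add(f"{s}-:{p}")
    return attrs

print(attributes_for("tom"))
```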
  • 14. Example lattice: Tom’s FOAF Profile
  • 15. Eliminating redundancies
    Who are the people Tom knows?
  • 16. A concept in the lattice is a question
    Intension = clauses of the question
    Extension = answers
    All the objects of the extension satisfy the clauses of the question
    Different areas of the lattice focus on different topics
    Questions are organized in a hierarchy (see the sketch after this slide)
    {Class:-Person, tom-:knows}
    What are the (Person) that (tom knows)?
    What are tom’s current projects?
    What are the people?
    What are the people that tom knows?
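The hierarchy follows the standard FCA ordering: a concept (question) sits below another when its extension is included in the other's. A short sketch, with invented example concepts as (extent, intent) pairs:

```python
def more_specific(c1, c2):
    """True if question c1 specializes c2: every answer to c1 also answers c2."""
    return c1[0] < c2[0]  # strict inclusion of extents

people = (frozenset({"tom", "enrico", "jeff"}), frozenset({"Class:-Person"}))
people_tom_knows = (frozenset({"enrico"}), frozenset({"Class:-Person", "tom-:knows"}))
print(more_specific(people_tom_knows, people))  # True: a child question in the hierarchy
```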
  • 17. But…
    The RDFFormal Context process can generate a lot of attributes and so a lot of questions
    Ranging from things uninterestingly general
    What are the Things?
    To the ones that might be interesting only in very specific cases
    What are the Indian restaurants located in San Diego that have been rated OK and are called “Chez Bob”?
    Need to extract a list of questions as an entry point
  • 18. How to measure the interestingness of a question - metrics
    Inspired by ontology summarization:
    Coverage: if providing a list of questions, the questions should cover the entire lattice (i.e., at least one question per branch)
    Level: Too general or too specific questions are not useful
    Density: The number of clauses can have an impact (avoid too complex questions as well as too simple ones)
    Inspired by FCA:
    Support: the cardinality of the extent – i.e. the number of answers
    Intensional Stability: How much a concept depends on particular elements of the extension
    Extensional Stability: How much a concept depends on particular elements of the intension
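A sketch of support and of Kuznetsov-style stability on a toy context. The exponential enumeration of subsets is for illustration only, and the normalizations used in the actual system may differ:

```python
from itertools import chain, combinations

context = {
    "tom":    {"Class:-Person", "knows:-enrico"},
    "enrico": {"Class:-Person"},
    "jeff":   {"Class:-Person", "knows:-tom"},
}

def derive_intent(objects):
    return set.intersection(*(context[o] for o in objects)) if objects \
        else {a for s in context.values() for a in s}

def derive_extent(attrs):
    return {o for o, a in context.items() if attrs <= a}

def powerset(xs):
    xs = list(xs)
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

def support(ext):
    return len(ext)  # the number of answers

def intensional_stability(ext, inten):
    # Share of extent subsets whose common attributes are still exactly the
    # intent: how little the concept depends on individual answers.
    hits = sum(1 for s in powerset(ext) if derive_intent(set(s)) == inten)
    return hits / 2 ** len(ext)

def extensional_stability(ext, inten):
    # The dual measure, over subsets of the intent.
    hits = sum(1 for s in powerset(inten) if derive_extent(set(s)) == ext)
    return hits / 2 ** len(inten)

ext, inten = {"tom", "enrico", "jeff"}, {"Class:-Person"}
print(support(ext), intensional_stability(ext, inten), extensional_stability(ext, inten))
```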
  • 19. Experiment: finding the relevant metrics
    4 datasets in different domains
    12 evaluators providing questions of interest for these datasets
    Obtained 44 questions, out of which 27 are valid (no overlap)
    Some are too complicated for our model (they include disjunction, negation, or aggregation functions)
    “What is the highest point in Florida?”
    Many do not comply with the initial instructions: questions should be self-contained and answered by a list of objects
    “How high is mountain x?”
    “What are the restaurant in a given city?”
  • 20. Results
    Level: Questions sit between levels 3 and 7; the average is 4.46.
    • Interesting questions located around the center of the lattice
    Density: Questions have between 1 and 3 clauses
    • Simple questions are preferred
    Support: Very large variations amongst the obtained questions
    Intensional Stability: Very large variations amongst the obtained questions
    Extensional Stability: High values (between 0.75 and 1.0), especially compared to the average (0.4)
    Conclusion:
    In order to establish a list of questions most likely to be of interest, a combination of level, density, and extensional stability, together with coverage, should be used
  • 21. Evaluation
    Algorithm to generate a set of questions from the lattice of an RDF dataset that
    Cover the entire lattice
    Are believed to be interesting according to a given measure
    Datasets from data.open.ac.uk
    614 course descriptions
    1706 Video podcasts
    Using the metrics: random, closeness to the middle level, density close to 2, support, extensional stability, and
    Aggregated = 1/3 level + 1/3 density + 1/3 stability (see the sketch after this slide)
    6 users scored the resulting sets of questions (6 metrics on 2 datasets: 12 sets in total) for interestingness
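A hedged sketch of the aggregated score: the equal weighting is from the slide, but the normalizations of level and density into [0, 1] are assumptions, not the paper's definitions:

```python
def aggregated_score(level, density, ext_stability, mid_level=5, max_level=10):
    level_score = 1 - abs(level - mid_level) / max_level  # closeness to the middle level
    density_score = 1 - min(abs(density - 2), 2) / 2      # density close to 2 clauses
    return (level_score + density_score + ext_stability) / 3

print(aggregated_score(level=4, density=2, ext_stability=0.9))
```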
  • 22. Results
  • 23. Implementation: the whatoask interface
    Offline: Dataset with SPARQL endpoint → SPARQL2RCF → Formal Context → CORON → Lattice
    Online: Lattice → Lattice Parser → Interface Generation (using metrics) → Interface with navigation in Browser → User
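A sketch of the offline step that feeds this pipeline, using the SPARQLWrapper library. The endpoint URL is illustrative, and CORON's actual RCF file format is not reproduced; the binary context is simply kept in memory:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://data.open.ac.uk/sparql")  # assumed endpoint URL
sparql.setQuery("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 1000")
sparql.setReturnFormat(JSON)
bindings = sparql.query().convert()["results"]["bindings"]

# Build the binary context in the slides' notation (outgoing and incoming relations).
context = {}
for b in bindings:
    s, p, o = (b[v]["value"] for v in ("s", "p", "o"))
    context.setdefault(s, set()).add(f"{p}:-{o}")      # outgoing relation
    if b["o"]["type"] == "uri":                        # literals get no inverse
        context.setdefault(o, set()).add(f"{s}-:{p}")  # incoming relation
```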
  • 24. Example: Open educational material (OpenLearn)
  • 25. Example: Database of reading experiences (Arts History project)
  • 26. Example: Open University Buildings
  • 27. Conclusion
    The technique presented provides both a summary and an exploration mechanism over RDF data, using the underlying ontology and formal concept analysis
    It provides an interface for documenting the dataset by examples rather than by specification
    It favors serendipity in the exploration of the dataset, without the need for prior, specialized knowledge
    The current interface in beta is available in an online demo
    Need to improve the question generation and navigation mechanisms
    Ongoing experiments include information gathered through the links to external datasets, to generate unanticipated questions
    Use-cases in research projects in Arts and Humanities
  • 28. Thank you!
    More info
    Demo: http://lucero-project.info/lb/2011/06/what-to-ask-linked-data/
    data.open.ac.uk (for some of the datasets used)
    @mdaquin – m.daquin@open.ac.uk