Table Recognition
Transcript

  • 1. The DIADEM Ontology (DIADEM 1.0). Yiyang Bao (2), Xiaonan Guo (2), Giorgio Orsi (1,2), Christian Schallhart (2), Cheng Wang (2). (1) Institute for the Future of Computing, University of Oxford; (2) Department of Computer Science, University of Oxford
  • 2. The languages of the web
    • HTML objects provide the data model of a web-page.
    • CSS boxes and properties provide the layout.
    • JavaScript provides web dynamics.
    HTML: <html> <head> <title> </title> </head> <body> <div> … </div> </body> </html>
    RDF: ox:Property xsd:string ox:address
    JavaScript: this.value.toLowerCase();
    (Figure labels: Real World; Web)
    • … ?
    • RDF annotations provide the conceptualization of the domain.
  • 3. Why ontology?
    • Ontologies provide a conceptualization of a domain of interest (Gruber '93).
    ox:Property xsd:string ox:address ox:minPrice ox:partOf ox:priceSegment
    • But… we do not only want to model the application domain.
    • We must also model the domain of its web representations, i.e., its phenomenology.
    • In the end, that is also an ontology.
  • 4. Why ontology?
    • Can be used to complete an incomplete model.
    • Can be used to verify a model.
    • Must tolerate uncertainty and inconsistency.
  • 5. A logical model for web extraction
    • Logical model for web entities
      • input and refinement forms
      • result pages
      • page blocks (e.g., ads)
    • Phenomenological model
      • How logical entities are concretely represented
  • 6. The building blocks
    • HTML entities
      • labels
      • fields (including links)
      • text nodes and text attributes
    <form>
      <label for="male">Male</label> <input type="radio" name="sex" id="male" />
      <label for="female">Female</label> <input type="radio" name="sex" id="female" />
    </form>
    <div> <span> Price: </span> <span> £ 250 </span> </div>
    Rendered text: Price: £ 250
    • Logical entities
      • constructs of our data model
    • Rules
      • describe the phenomenology
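As a minimal sketch of how HTML labels and fields might be paired before any rules are applied, the snippet below uses Python's standard `html.parser` on the form from this slide. The class and function names (`LabelFieldParser`, `pair_labels_with_fields`) are hypothetical and not part of DIADEM; the pairing relies only on the `for`/`id` convention visible in the markup.

```python
from html.parser import HTMLParser


class LabelFieldParser(HTMLParser):
    """Collects <label for=...> texts and <input id=...> fields (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.labels = {}          # field id -> label text
        self.fields = {}          # field id -> attribute dict
        self._current_for = None  # id targeted by the label being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "label":
            self._current_for = attrs.get("for")
        elif tag == "input" and "id" in attrs:
            self.fields[attrs["id"]] = attrs

    def handle_data(self, data):
        # Text inside an open <label> becomes that label's text.
        if self._current_for is not None and data.strip():
            self.labels[self._current_for] = data.strip()

    def handle_endtag(self, tag):
        if tag == "label":
            self._current_for = None


def pair_labels_with_fields(html):
    """Return a mapping field id -> label text (None if no label targets it)."""
    parser = LabelFieldParser()
    parser.feed(html)
    parser.close()
    return {fid: parser.labels.get(fid) for fid in parser.fields}


FORM = """<form>
<label for="male">Male</label> <input type="radio" name="sex" id="male" />
<label for="female">Female</label> <input type="radio" name="sex" id="female" />
</form>"""

pairs = pair_labels_with_fields(FORM)
print(pairs)
```

Real pages often omit the `for` attribute, which is where visual heuristics would have to take over; this sketch does not attempt that.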
  • 7. The form model
    • Goal: model web form phenomenology
  • 8. The form model
    • Areas:
      • button
      • location
      • price
      • room
      • type
      • buy/rent
      • order-by
      • display
    • Root entity:
      • RealEstateForm
    • Properties:
      • partOf → hierarchical structures
  • 9. The form model: elements
      • price
          • type = {min, max}
          • purpose = {buy, rent}
      • currency
      • room
        • category = {bathroom, bedroom, …}
        • type = {min, max}
  • 10. The form model: elements
    • display
    • per page
    • add-in-time
      • property type
    • button
      • submit
      • reset
      • map search
      • advance submit
      • link button
    • order-by
    • buy
    • rent
    • buy/rent
    • new/resale
    • SSTC
    • other
  • 11. The form model: phenomenology
    • Based on linguistic annotations and (visual) heuristics.
    buyElement(X,F) :- visibleField(X),
        hasAnnotationFeature(X,"majorType","reform.label"),
        hasAnnotationFeature(X,"minorType","buy"),
        not hasAnnotationFeature(X,"minorType","rent"),
        not hasAnnotationFeature(X,"minorType","includeSSTC"),
        group(Ns,_,_,F), #member(X,Ns).
    radiusElement(X,F) :- visibleField(X),
        hasAnnotationFeature(X,"majorType","reform.label"),
        hasAnnotationFeature(X,"minorType","radius"),
        group(Ns,_,_,F), #member(X,Ns).
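To make the reading of the `buyElement` rule concrete, here is a rough Python rendering over in-memory facts. The fact encoding (a set of visible nodes, a set of `(node, feature, value)` triples, and a list of `(members, form)` groups) and all sample identifiers are assumptions made for illustration; the slides do not show how DIADEM stores these relations.

```python
def has_feature(annotations, node, feature, value):
    """True if the annotation triple (node, feature, value) is among the facts."""
    return (node, feature, value) in annotations


def buy_elements(visible_fields, annotations, groups):
    """Mirror of buyElement(X,F): visible, labelled 'buy', not 'rent' or 'includeSSTC'."""
    result = set()
    for members, form in groups:          # group(Ns,_,_,F)
        for x in members:                 # #member(X,Ns)
            if x not in visible_fields:
                continue
            if not has_feature(annotations, x, "majorType", "reform.label"):
                continue
            if not has_feature(annotations, x, "minorType", "buy"):
                continue
            # Negated literals: reject nodes that also carry these annotations.
            if has_feature(annotations, x, "minorType", "rent"):
                continue
            if has_feature(annotations, x, "minorType", "includeSSTC"):
                continue
            result.add((x, form))
    return result


# Hypothetical facts: f1 is a plain buy field, f2 is buy/rent, f3 is hidden.
visible = {"f1", "f2"}
annotations = {
    ("f1", "majorType", "reform.label"), ("f1", "minorType", "buy"),
    ("f2", "majorType", "reform.label"), ("f2", "minorType", "buy"),
    ("f2", "minorType", "rent"),
    ("f3", "majorType", "reform.label"), ("f3", "minorType", "buy"),
}
groups = [(["f1", "f2", "f3"], "form1")]

print(buy_elements(visible, annotations, groups))
```

Only `f1` survives: `f2` is excluded by the negated `rent` literal and `f3` by the visibility check.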
  • 12. The form model: segments
    • A segment is:
      • a single element
      • a group of elements
      • a group of segments
      • a pair <segment, label>
    • Segments
      • buttons
      • geographic
      • price
      • room
      • property type
      • buy/rent
      • order-by
      • display
      • per page
      • add in time
      • new/resale
      • SSTC
    • Form
      • real-estate
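The recursive definition of a segment above can be sketched as a small Python data type. The class names (`Element`, `Group`, `Labelled`) and the sample form are hypothetical; the sketch only shows that the four cases compose into a tree with the form at the root.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Element:
    """A single form element, e.g. a min-price field."""
    kind: str


@dataclass
class Group:
    """A group of elements or of nested segments."""
    members: List["Segment"]


@dataclass
class Labelled:
    """A pair <segment, label>."""
    segment: "Segment"
    label: str


Segment = Union[Element, Group, Labelled]


def elements_of(segment):
    """Flatten a segment tree into the kinds of its leaf elements, left to right."""
    if isinstance(segment, Element):
        return [segment.kind]
    if isinstance(segment, Labelled):
        return elements_of(segment.segment)
    return [kind for member in segment.members for kind in elements_of(member)]


# Hypothetical real-estate form: a labelled price segment plus a submit button.
price = Labelled(Group([Element("min price"), Element("max price")]), "Price")
form = Group([price, Element("submit")])

print(elements_of(form))
```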
  • 13. The result-page model
    • Goal: model result-page phenomenology
  • 14. The result-page model
    • Attributes and values
      • e.g., <price, £ 250,000>
    • Record
      • groups of pairs <attribute, value>
    • Data area
      • groups of records
    • Mandatory attribute(s)
      • must be present in a record
      • used as a sanity check on extracted records
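The role of mandatory attributes can be illustrated with a short Python sketch, assuming a record is represented as a dictionary from attribute to value and a data area as a list of records. The function name `valid_records` and the sample `location` attribute are hypothetical.

```python
def valid_records(data_area, mandatory):
    """Keep only records in which every mandatory attribute has a non-empty value."""
    return [
        record
        for record in data_area
        if all(record.get(attribute) for attribute in mandatory)
    ]


# Hypothetical data area: the second candidate record lacks a price.
data_area = [
    {"price": "£ 250,000", "location": "Oxford"},
    {"location": "Oxford"},
]

checked = valid_records(data_area, mandatory={"price"})
print(checked)
```

A candidate record without a price is more likely a page block such as an ad than a genuine result, which is the kind of case the sanity check is meant to filter out.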
  • 15. A Conceptual Model for Data Extraction
    • Conceptual Modelling on the Web
      • Software modelling, e.g., UML and stereotypes
      • Ad hoc languages, e.g., WebML
  • 16. Linking the domain ontology: OntoX
  • 17. DIADEM Ontology: discussion
    • Expressive power
      • safe nr-datalog with stratified negation and aggregation
      • pros: easy to compute
      • cons: not robust to uncertainty and inconsistencies
    • Adaptability
      • The result-page model is largely domain-independent
      • The form model is domain-dependent (entity types)
        • The number of entities is limited
  • 18. Uncertainty, Vagueness and Inconsistencies
  • 19. Uncertainty, Vagueness and Inconsistencies
    • Origin
      • annotations are noisy
      • entity types are uncertain
    • Multiple models
      • probabilistic models
        • Markov Logic Networks (Lukasiewicz and Simari)
        • C-tables, Bayesian Networks (Olteanu)
      • ASP
        • disjunctive models
        • weak constraints
  • 20. Thank you!