
Diadem 1.0

Transcript of "Diadem 1.0"

  1. DIADEM: Domain-centric, Intelligent, Automated Data Extraction
      Tim Furche, Georg Gottlob, Giorgio Orsi
      May 11th, 2011 @ Oxford University Computing Laboratories
      joint work with Giovanni Grasso, Omer Gunes, Xiaonan Guo, Andrey Kravchenko, Thomas Lukasiewicz, Christian Schallhart, Andrew Sellers, Gerardo Simari, Cheng Wang
  3. Section 1: Web Data Extraction
  4. Data on the Web
      • there is more of it than we can use
      • the issue is no longer availability, but finding, integrating, analysing, …
  5. Surface vs. Deep Web
      • deep web: estimated 500 × the surface web
      • estimated 400000 deep web databases
      • What?
          • Products (stores)
          • Directories (yellow pages)
          • Catalogs (libraries)
          • Public DBs (publications, census, data.gov, …)
          • Public services (weather, location, …)
  6. And it’s not just one haystack …
  11. 7 bedrooms; 5 bedrooms
  12. The Web is more than HTML
  13. Overview
      • Introducing Web Data Extraction
          • Scenarios
          • Why now?
      • Supervised Web Data Extraction
      • Unsupervised Web Data Extraction
      • DIADEM
          • OPAL
          • AMBER
          • OXPath
          • IVLIA
          • Datalog±
  14. 1.1 Web Data Extraction: Scenarios
  15. The Need for Web Data Extraction
      • information
          • drives business (decision making, trend analysis, …)
          • is available in troves on the internet
          • but: as HTML made for humans, not as structured data
      • companies need
          • product specifications
          • pricing information
          • market trends
          • regulatory information
  16. keyword search fails (example due to Fabian Suchanek)
  17. keyword search fails
  18. Scenario ➀: Electronics retailer
      • electronics retailer: online market intelligence
          • comprehensive overview of the market
          • daily information on price, shipping costs, trends, product mix
          • by product, geographical region, or competitor
          • thousands of products
          • hundreds of competitors
      • nowadays: specialised companies
          • mostly manual, interpolation
          • large cost
  19. Scenario ➁: Supermarket chain
      • competitors’ product prices
      • special offers or promotions (time sensitive)
      • new products, product formats & packaging
  20. Scenario ➂: Hotel Agency
      • online travel agency
          • best price guarantee
          • prices of competing agencies
          • average market price
  21. Scenario ➃: Hedge Fund
      • house price index
          • published at regular intervals by the national statistics agency
          • affects share values of various industries
      • hedge fund
          • online market intelligence to predict the house price index
  22. And a lot more …
      • monitor blogs and forums
          • market intelligence, e.g., complaints, common problems
          • customer opinions
          • ranking and analysing product reviews
      • financial analysts
          • monitor trends and stats for products of a certain company / category
          • interest rates from financial institutions
          • press releases and financial reports
      • patent search & analysis
      • …
  24. 1.1 Web Data Extraction: Why Now?
  25. Scale
  26. Applications
  27. How to book a flight?
  28. How to find a history book?
  29. How to find a paper?
  30. How to find a flat?
  31. Structured Data
  33. Why Web Data Extraction Now? Trends
      • Trend ➊: scale: every business is online
          • automation at scale
      • Trend ➋: web applications rather than web documents
          • automated form filling (deep web navigation)
      • Trend ➌: structured, common-sense data available
          • allows more sophisticated automated analysis
          • also a tool for improved data extraction?
  34. Section 2: Web Data Extraction: Supervised
  35. • manual (e.g., Web Harvest): user writes the wrapper, sometimes using wrapping libraries
      • supervised (e.g., Lixto): user provides examples and refines the wrapper
      • semi-supervised: user provides examples (per site), wrapper is automatically learned
      • unsupervised (e.g., DIADEM): entirely automated
          • some systems omit examples and run analysis directly on all pages
          • some systems automatically guess examples
  36. Supervised Web Data Extraction
      • User interaction, rather than manually writing in a programming language, is needed to
          • record interaction sequences (such as form fillings)
          • visually select examples for data
      • Current gold standard for high-accuracy extraction
      • Examples:
          • Lixto
          • Automation Anywhere
          • Web Harvest
          • …
  40. Lixto: Extraction & Analysis
      • Lixto: sophisticated, visual semi-automated extraction tool
          • visually select, automatically derives patterns, verification
          • highly scalable extraction and processing with Lixto server
      • but also: data integration & business analytics suite
          • data cleaning
          • data flow scenarios: merge & filter from different web sites
          • market intelligence & analytics
  43. Section 3: Web Data Extraction: Unsupervised
  44. 17000 real estate sites in the UK alone
  45. Why Automate Data Extraction? Too many fish in the pond
      • > 17000 UK real estate sites
      • similar for restaurants, travel, airlines, pharmacies, retail shops, …
      • aggregators cover only a fraction
          • updated slowly
      • per-site manual work is infeasible
          • wrapper construction too expensive
          • tracking changes
          • excludes manual & (semi-)supervised approaches
  46. Why Automate Data Extraction? All the fish are different
      • large, modern aggregators (>100000)
      • nation-wide agencies (>10000)
      • agencies for a single quarter (< 15)
      • no single unsupervised wrapper can do this today
  47. … and we really need it!
      • search engine providers (Google, Microsoft, Yahoo!) all work on information and data extraction for
          • “vertical”, “object” and “semantic” search
          • turning search engines into knowledge bases for decision support
  48. • “no one really has done this successfully at scale yet” (Raghu Ramakrishnan, Yahoo!, March 2009)
      • “Current technologies are not good enough yet to provide what search engines really need. [...] Any successful approach would probably need a combination of knowledge and learning.” (Alon Halevy, Google, Feb. 2009)
  49. Unsupervised: The Story so Far
      • Key observation: “database” web sites are generated using templates
          • wrapper generators need to identify templates automatically
      • Two major approaches
          • machine learning from a few hand-labeled examples
              • similar to semi-supervised, but only one set of examples for an entire domain
              • high precision only for simple domains (single entity type, few attributes)
          • fully automatically exploit the repeated structure of result pages
              • good precision needs a lot of data (many records per page, many pages)
              • doesn’t work for forms (no repetition)
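The second approach on slide 49, exploiting the repeated structure of result pages, can be illustrated with a minimal sketch. It is not the method of any system named in the talk: it simply counts, for every element, how often each (tag, class) signature occurs among its direct children and reports the most repeated one. The names `SignatureCollector`, `most_repeated_block`, and the sample `PAGE` are assumptions made for illustration.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"br", "img", "input", "hr", "meta", "link"}


class SignatureCollector(HTMLParser):
    """Counts the (tag, class) signatures of the direct children of every element."""

    def __init__(self):
        super().__init__()
        self.paths = []      # paths of the currently open elements
        self.open_tags = []  # tag names of the currently open elements
        self.children = {}   # parent path -> Counter of child signatures

    def handle_starttag(self, tag, attrs):
        signature = (tag, dict(attrs).get("class") or "")
        parent = self.paths[-1] if self.paths else "/"
        counts = self.children.setdefault(parent, Counter())
        counts[signature] += 1
        if tag not in VOID_TAGS:
            # Index by child position so sibling containers get distinct paths.
            self.paths.append(f"{parent}{tag}[{sum(counts.values())}]/")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Only well-nested markup is handled; stray end tags are ignored.
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
            self.paths.pop()


def most_repeated_block(html):
    """Return (parent path, child signature, count) for the most repeated child."""
    collector = SignatureCollector()
    collector.feed(html)
    best = None
    for parent, counts in collector.children.items():
        signature, n = counts.most_common(1)[0]
        if best is None or n > best[2]:
            best = (parent, signature, n)
    return best


PAGE = """
<html><body>
<div class="nav"><a href="/">home</a><a href="/buy">buy</a></div>
<ul class="results">
  <li class="property"><span class="price">250000</span> 5 bedrooms</li>
  <li class="property"><span class="price">410000</span> 7 bedrooms</li>
  <li class="property"><span class="price">199950</span> 2 bedrooms</li>
</ul>
</body></html>
"""

parent, signature, n = most_repeated_block(PAGE)
print(parent, signature, n)
```

On a search form the same heuristic finds nothing useful, because form fields are not repeated, which is the limitation the slide points out.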
  51. ?
  52. Section 4: DIADEM
  53. Domain-Centric Data Extraction
      • Blackbox analyser that turns any of the thousands of websites of a domain into structured data
  54. host of domain-specific annotators
  55. domain ontology & phenomenology
  56. + everything the others are doing
      • template discovery
      • machine learning for classification
  59. DIADEM: Overview. DIADEM combines
      • a host of domain-specific annotators
          • gives us a first “guess” to automatically generate examples
      • with a high-level ontology about domain entities and their phenomenology on web sites of the domain
          • allows us to verify & refine examples
      • + advances in existing techniques for
          • repeated structure analysis
          • page & block classification
      • bottom-up understanding & top-down reasoning
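The combination on slide 59 (annotators produce a first guess, domain knowledge verifies it) can be pictured with a toy sketch. Nothing below is DIADEM code: the two regular-expression annotators, the cardinality table standing in for an ontology, and all names (`ANNOTATORS`, `PROPERTY_CONSTRAINTS`, `is_plausible_property`) are assumptions chosen for the real estate example used in the talk.

```python
import re

# Hypothetical annotators: each one marks spans of a given attribute type.
ANNOTATORS = {
    "price": re.compile(r"£\s?\d{1,3}(?:,\d{3})*"),
    "bedrooms": re.compile(r"\b\d+\s+bedrooms?\b", re.IGNORECASE),
}

# Toy stand-in for the ontology: attribute -> (min, max) occurrences
# allowed in a single property record.
PROPERTY_CONSTRAINTS = {"price": (1, 1), "bedrooms": (0, 1)}


def annotate(text):
    """Bottom-up step: run every annotator and collect its matches."""
    return {name: pattern.findall(text) for name, pattern in ANNOTATORS.items()}


def is_plausible_property(annotations):
    """Top-down step: keep a candidate only if it satisfies the constraints."""
    return all(
        low <= len(annotations.get(attribute, [])) <= high
        for attribute, (low, high) in PROPERTY_CONSTRAINTS.items()
    )


candidates = [
    "7 bedrooms detached house, £410,000",       # one price, one bedroom count
    "Contact us about our 5 bedrooms showcase",  # no price
    "£250,000 or £260,000 for 5 bedrooms",       # two prices
]
accepted = [c for c in candidates if is_plausible_property(annotate(c))]
print(accepted)
```

Only the first candidate survives: the second lacks the mandatory price and the third has two, so the constraints reject both guesses.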
  60. 4.1 DEMO
  62. DIADEM 0.1: First prototype
  64. 7 bedrooms; 5 bedrooms
  65. Form successfully filled; Next step
  66. Achievements in Numbers
      • 15k-150k facts (5-50 MB) generated per web page
      • time: usually 30-60 sec, at most a few minutes
      • 300-400 predicates
      • Some numbers on the prototype:
          • Java files: 293 with 44993 lines of code
          • DLV rules: over 500 rules, over 200 predicates
          • Gazetteers: 111 gazetteers with 48000 entries
          • JAPE rules: 23 rules files with 30 rules
  70. 4.2 OPAL: Ontologies for Form Analysis
  72. Diversity
  74. OPAL: Overview
      • Three step process:
          • browser extraction and annotation
          • labelling & segmentation
          • classification (phenomenological mapping)
      • Model-based, knowledge driven
          • latter two steps are model transformations
          • thin layer of domain-dependent concepts
              • field types and labels
              • triggers for field & form creation
  79. ICQ Data Set: Application to Other Domains
  80. 4.3 AMBER: Ontologies for Record Extraction
  81. 7 bedrooms; 5 bedrooms
  82. just the opposite of OPAL
  83. AMBER: Overview
      • Three step process like OPAL
          • browser extraction and annotation
          • classification (phenomenological mapping)
          • record segmentation (much harder than in OPAL)
      • Model-based, knowledge driven
          • latter two steps are model transformations
          • thin layer of domain-dependent concepts
              • record and attribute types
              • triggers for record & attribute creation
  86. Repeating
  87. Similarity
  89. 4.4 OXPath: Scalable, Memory-Efficient Web Extraction
  90. How to book a flight?
  91. How to find a history book?
  92. How to find a flat?
  93. How to find a paper? (Scenarios)
  94. How to find a flat with OXPath
      • Start at rightmove.co.uk: doc("rightmove.co.uk")
      • Fill “oxford” into the first visible field: /descendant::field()[1]/{"oxford"}
      • Click on the second next button: /following::field()[2]/{click /}
      • On the refinement form just continue by clicking on the last field: /descendant::field()[last()]/{click /}
      • Grab all the prices: //p.price
  95. State of Web Extraction
      • No interaction with rich, scripted interfaces
          • no actions other than form filling and submission
      • ➀ Imperative extraction scripts
          • explicit variable assignments, flow control, etc.
          • either proprietary selection language or mix of XPath & external flow control
      • ➁ Focus on automation and visual interfaces
          • no or very limited extraction language, only ad-hoc extractions
          • no multiway navigation, no optimization
  96. Why OXPath?
      • scalability
      • familiarity
      • there is no XPath for data extraction
      • simplicity
      • web applications
  98. Summary of Complexity (diagram labels, in slide order)
      • Combined: PTime-hard; PTime-hard
      • Data: NLogSpace; LogSpace
      • Extraction marker = n-ary, nested queries
      • Actions = multiple pages
      • O(n^4 ⋅ q^2); O(n^3 ⋅ q^2)
      • Contextual actions (action-free prefix)
      • Buffer bounded by page depth
  99. Constant Memory
  100. browser bound
  101. … for many pages
  102. … for many results
  103. memory
  104. faster
  105. even faster
  106. 4.5 IVLIA: Ontologies for PDF Extraction
  108. PDF Analysis
  109. Semantic Analysis and Annotation
  110. 4.6 Datalog±: Ontological Reasoning at Web Scale
  111. Datalog and ontological reasoning: much is possible with Datalog
       Axiom type               | DL axiom           | Datalog rule
       Concept inclusion        | employee ⊑ person  | employee(X) -> person(X)
       (Inverse) role inclusion | reports⁻ ⊑ manager | reports(X,Y) -> manager(Y,X)
       Role transitivity        | trans(manager)     | manager(X,Y), manager(Y,Z) -> manager(X,Z)
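The three rules on slide 111 are plain Datalog, so they can be evaluated bottom-up until a fixpoint is reached. The sketch below hard-codes exactly those rules; the function name `saturate`, the tuple encoding of facts, and the sample facts are assumptions for illustration, not part of the talk.

```python
def saturate(facts):
    """Naive bottom-up evaluation of the three rules from slide 111.

    Facts are tuples such as ("employee", "ann") or ("reports", "ann", "bob").
    """
    facts = set(facts)
    while True:
        derived = set()
        for fact in facts:
            if fact[0] == "employee":
                # employee(X) -> person(X)
                derived.add(("person", fact[1]))
            elif fact[0] == "reports":
                # reports(X,Y) -> manager(Y,X)
                derived.add(("manager", fact[2], fact[1]))
        managers = [f for f in facts if f[0] == "manager"]
        for (_, x, y) in managers:
            for (_, y2, z) in managers:
                if y == y2:
                    # manager(X,Y), manager(Y,Z) -> manager(X,Z)
                    derived.add(("manager", x, z))
        if derived <= facts:
            return facts  # fixpoint: nothing new can be derived
        facts |= derived


FACTS = {
    ("employee", "ann"),
    ("employee", "bob"),
    ("employee", "cat"),
    ("reports", "ann", "bob"),
    ("reports", "bob", "cat"),
}
result = saturate(FACTS)
print(sorted(f for f in result if f[0] == "manager"))
```

Because no rule invents new values, the loop always terminates; that guarantee is what the existential rules on the following slides give up.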
  112. Datalog and ontological reasoning: but it’s not enough …
       Axiom type    | DL axiom             | Datalog(?) rule
       Participation | employee ⊑ ∃report   | employee(X) -> ∃Y report(X,Y)
       Disjointness  | employee ⊑ ¬customer | employee(X), customer(X) -> ⊥
       Functionality | funct(reports)       | reports(X,Y), reports(X,Z) -> Y = Z
  113. Ontological Databases: E/R Schema, Object Relational Schema, Relational Schema
       • person(ssn, name, birthdate)
       • employee(ssn, empID, name, birthdate, department)
       • department(depName, building)
       • project(projID, startDate, duration)
       • supervision(supervisor, supervised)
       • assignment(employee, project)
  114. Ontological Constraints
       • Taxonomy Definitions
           • employee(X,Y,Z,W) -> ∃V person(V,Y,Z)
           • project(X,Y,Z) -> activity(X,Y,Z)
       • Concept Definitions
           • employee(X1,Y1,Z1,W1,U1), supervision(Y1,Y2), employee(X2,Y2,Z2,W2,U2) -> supervisor(X1,Y1,Z1,W1,U1)
             An employee who supervises another employee is a supervisor
           • generalManager(X1,Y1,Z1,W1,U1) -> supervision(Y1,Y1)
             A general manager supervises him/herself
  115. Big Picture: expressiveness and efficiency, in KR and in DB
  116. Big Picture
  117. Our goal …
       • Unifying Framework: DB technology + constraints, Datalog, DLs (DL-Lite, EL, F-Logic Lite)
       • while keeping query answering tractable in data complexity!
  118. Datalog±
       • Extend Datalog by allowing in the head:
           • existential (∃) variables → Tuple-generating dependencies (TGDs)
             employee(X), inProject(X,Y) -> ∃Z employee(Z), supervises(Z,X)
           • equality (=) → Equality-generating dependencies (EGDs)
             reports(X,Y), reports(Z,X) -> Y = Z
           • constant false (⊥) → Negative constraints (NCs)
             employee(X), customer(X) -> ⊥
       • What we get is Datalog[∃,=,⊥] → Datalog+
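To make the TGD and the negative constraint from slide 118 concrete, here is a minimal chase-style sketch. It is an illustration under stated assumptions, not the reasoning procedure of any system in the talk: the function names, the `_:z1` labelled-null notation, and the sample facts are invented for the example, and the EGD is left out to keep the sketch short. A single pass is enough here only because the invented supervisors have no `inProject` facts and so never trigger the rule again.

```python
from itertools import count


def has_employee_supervisor(facts, x):
    """True if some employee Z with supervises(Z, x) already exists."""
    return any(
        f[0] == "supervises" and f[2] == x and ("employee", f[1]) in facts
        for f in facts
    )


def chase(facts):
    """Apply the negative constraint and the TGD from slide 118 once."""
    facts = set(facts)
    fresh = count(1)
    employees = {f[1] for f in facts if f[0] == "employee"}
    customers = {f[1] for f in facts if f[0] == "customer"}

    # NC: employee(X), customer(X) -> false
    clash = employees & customers
    if clash:
        raise ValueError(f"inconsistent: employee and customer overlap on {sorted(clash)}")

    # TGD: employee(X), inProject(X,Y) -> exists Z. employee(Z), supervises(Z,X)
    in_project = {f[1] for f in facts if f[0] == "inProject"}
    for x in sorted(employees & in_project):
        if not has_employee_supervisor(facts, x):
            z = f"_:z{next(fresh)}"  # labelled null standing for the unknown Z
            facts |= {("employee", z), ("supervises", z, x)}
    return facts


FACTS = {
    ("employee", "ann"),
    ("employee", "bob"),
    ("inProject", "ann", "diadem"),
    ("inProject", "bob", "diadem"),
    ("supervises", "ann", "bob"),
}
result = chase(FACTS)
print(sorted(result - FACTS))
```

Bob already has a supervising employee, so only Ann receives an invented supervisor; adding `("customer", "ann")` to the input would make `chase` raise instead.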
  119. Datalog±: Overview (diagram labels, in slide order): Linear, DL-Lite, Sticky-join, FO-rewritable, Guarded, EL, PTIME
  120. Datalog±: In practice (experiments)
       • Comparison with existing semantic data management solutions
           • IBM IODT [Ma et al. SIGMOD ‘08]
           • Ontotext BigOWLIM [Kiryakov WWW ‘06]
           • Requiem [Horrocks et al. ISWC ‘09]
       • Prototype implementation: Nyaya (http://mais.dia.uniroma3.it/Nyaya/Home.html)
           • implements guarded, weakly-acyclic, linear and sticky Datalog±
           • couples a Datalog± engine with an efficient storage mechanism
  121. Datalog±: In practice (experiments)
       • Paper: Semantic Data Markets: Store, Reason and Query, by R. De Virgilio, G. Orsi, L. Tanca and R. Torlone (submitted)
       • Findings:
           • commercial systems do not identify FO-rewritable fragments
           • they could answer queries much faster than they do now
           • testing FO-rewritability conditions is easy
  122. Datalog±: Updates
       • If the language of Σ is FO-rewritable
           • fact updates reduce to updates in an RDBMS
           • predicate updates reduce to re-computing the rewriting
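The point of slide 122 can be seen on the simplest FO-rewritable case, concept inclusions. In the sketch below an atomic query is rewritten into a union over every predicate that implies it; new facts then need no reasoning at all, while a new inclusion only requires calling `rewrite` again. The function names, the `(body, head)` encoding of inclusions, and the SQL table and column names are assumptions for illustration.

```python
def rewrite(target, inclusions):
    """Predicates whose facts answer the atomic query target(X).

    `inclusions` is a list of (body, head) pairs, each read as body(X) -> head(X).
    """
    answers, frontier = {target}, [target]
    while frontier:
        head = frontier.pop()
        for body, implied in inclusions:
            if implied == head and body not in answers:
                answers.add(body)
                frontier.append(body)
    return sorted(answers)


def to_sql(predicates):
    """Render the rewriting as a union query (one table per predicate, column `id`)."""
    return " UNION ".join(f"SELECT id FROM {p}" for p in predicates)


INCLUSIONS = [("employee", "person"), ("manager", "employee")]
query = to_sql(rewrite("person", INCLUSIONS))
print(query)
```

Inserting a row into `manager` changes the answers without touching `query`, whereas adding an inclusion such as `("customer", "person")` changes the rewriting itself.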
  124. Q&A: diadem-project.info