Reading Group: From Database to Dataspaces

From Databases to Dataspaces*
Wearing the Linked Data goggles
DERI reading group presentation, 23.02.2011, PhD J. Umbrich
* M. Franklin, A. Halevy, D. Maier, in ACM SIGMOD Record, Dec. 2005

Background of the paper (1 / 24)
  Motivation of the paper in 2005
    • The development of relational database management systems showed spectacular results
    • BUT: “data everywhere” and use cases that rely on large amounts of diverse, interrelated data sources pose new challenges for data management
  The authors
    • M. Franklin: UC Berkeley, large-scale data management
    • A. Halevy: Google Inc., usage of structured data in web search
    • D. Maier: Portland State University, coined the term Datalog, data stream processing

Topic of the paper (2 / 24)
  • Dataspaces and their support systems as a new agenda for data management

The Problem: Data Management (3 / 24)
  • Loosely connected data sources
  • Information is available in various formats
  • Not always control over the data
  • Low-level data management challenges across heterogeneous collections
  • Search & querying
  • Tracking lineage
  • Availability & recovery
  • Enforcing rules
  • Integrity constraints
  • Access control
  • Naming conventions
  • (Meta)data evolution

The Solution (4 / 24)
  • Define a space of data
  • Identifiable scope and control across the data and the underlying systems
  • DataSpace Support Platforms (DSSPs): offer a suite of interrelated services and guarantees over self-managed data sources (no complete control over the data)
  • Pay-as-you-go
  • Keyword search is the bare minimum
  • More function and increased consistency as you add work

DataSpaces: System

DataSpaces: Logical Components (5 / 24)
  • Data co-existence approach (not data integration)
  • Contains all information relevant to a particular organisation, regardless of format and location
  • Models a rich collection of relationships between data repositories
  Participants
    • Individual data sources
    • RDBs, XML, text, services
    • Stored or streamed
    • Different query support
    • Support updates, or read-only
  Relations
    • Any kind of relationship
    • A is a replica of B
    • C is a mapping for A and B
    • Broader set of relations: E and F were created independently but cover the same physical system

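The participant and relation lists above can be read as a small data structure. The following is a minimal sketch under stated assumptions, not something taken from the paper or the slides: the class names, the field names, and the relation labels `replica_of` and `mapping_for` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Participant:
    """One data source in the dataspace (all fields are illustrative)."""
    name: str
    kind: str                 # e.g. "rdb", "xml", "text", "service"
    streamed: bool = False    # stored (False) or streamed (True)
    read_only: bool = True    # whether the source accepts updates


@dataclass
class Dataspace:
    participants: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)  # (subject, label, object)

    def add(self, p: Participant) -> None:
        self.participants[p.name] = p

    def relate(self, subject: str, label: str, obj: str) -> None:
        # Both ends must already be registered participants.
        for name in (subject, obj):
            if name not in self.participants:
                raise KeyError(f"unknown participant: {name}")
        self.relations.append((subject, label, obj))

    def related_to(self, name: str) -> list:
        """Return every relation that mentions the given participant."""
        return [r for r in self.relations if name in (r[0], r[2])]


space = Dataspace()
for p in (Participant("A", "rdb", read_only=False),
          Participant("B", "rdb"),
          Participant("C", "xml")):
    space.add(p)

space.relate("A", "replica_of", "B")
space.relate("C", "mapping_for", "A")
space.relate("C", "mapping_for", "B")
```

Relations are kept as free-form labelled triples here because the slide stresses that any kind of relationship may hold between participants.
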
DataSpaces: Services (6 / 24)
  • Content heterogeneity requires multiple styles of data access
  • Cataloging data resources (source, name, size, creation date, location)
  • Search as the primary mechanism for dealing with large collections and unfamiliar data (similarity search, ranking)
  • Search applies to all content of the dataspace regardless of data format (including metadata)
  • Updates (major research area)
  • Monitoring, event detection, support for complex workflows

DataSpaces: System (7 / 24)
  Source: Franklin et al., From Databases to Dataspaces, SIGMOD Rec. 2005

DSSP: Catalog (8 / 24)
  • Contains information about all the participants
  • E.g. rate of change, query answering, statistics, ownership, access and privacy policies, relationships
  • Basic inventory
  • Identifier, type, creation date
  • Answers questions about the presence or absence of a data element
  • Model management environment on top of the catalog

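The basic inventory described above (identifier, type, creation date, plus presence/absence checks) can be sketched as follows. This is an illustration only: `CatalogEntry`, `Catalog`, and the sample identifiers are assumptions and do not come from the paper.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CatalogEntry:
    identifier: str
    type: str
    created: date
    source: str  # hypothetical: name of the participant holding the element


class Catalog:
    """Minimal inventory of data elements across all participants."""

    def __init__(self):
        self._entries = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.identifier] = entry

    def contains(self, identifier: str) -> bool:
        """Answer presence or absence of a data element."""
        return identifier in self._entries

    def by_type(self, type_: str) -> list:
        """Identifiers of all elements of the given type, sorted."""
        return sorted(e.identifier for e in self._entries.values()
                      if e.type == type_)


catalog = Catalog()
catalog.register(CatalogEntry("crm:customers", "table", date(2010, 5, 1), "crm_db"))
catalog.register(CatalogEntry("wiki:page-9", "document", date(2011, 1, 12), "wiki"))
catalog.register(CatalogEntry("crm:orders", "table", date(2010, 6, 3), "crm_db"))
```

A real catalog would also carry the richer metadata listed on the slide (rate of change, ownership, policies); those are left out to keep the sketch short.
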
DSSP: Search & Query (9 / 24)
  • Query everything
  • Query data items regardless of format
  • Keyword search
  • Structured query
  • Common interfaces (mediated schema)
  • Over a specific source
  • Peer data management systems
  • Various query formats with mappings
  • Metadata queries
  • Result sources, timestamps, uncertainty
  • Source location and similarity queries
  • Monitoring
  • Stateless or stateful

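To make "query data items regardless of format" concrete, here is a hedged sketch of a keyword index over heterogeneous items that also reports the source of each hit. The flattening rule, the scoring (count of matching query tokens), and the sample sources are assumptions chosen for brevity, not the approach of any particular DSSP.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> set:
    """Lower-case bag of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def flatten(value) -> str:
    """Turn a str, dict, or list item into one searchable string."""
    if isinstance(value, dict):
        # Index both keys and values of record-like items.
        return " ".join(flatten(v) for pair in value.items() for v in pair)
    if isinstance(value, (list, tuple)):
        return " ".join(flatten(v) for v in value)
    return str(value)


def build_index(items):
    """items: iterable of (source, item_id, content) triples."""
    index = defaultdict(set)
    for source, item_id, content in items:
        for token in tokenize(flatten(content)):
            index[token].add((source, item_id))
    return index


def keyword_search(index, query: str) -> list:
    """Rank hits by number of matching query tokens, ties by (source, id)."""
    scores = defaultdict(int)
    for token in tokenize(query):
        for hit in index.get(token, ()):
            scores[hit] += 1
    return sorted(scores, key=lambda hit: (-scores[hit], hit))


items = [
    ("crm_db", "row-17", {"name": "Ada Lovelace", "city": "London"}),
    ("mail_archive", "msg-3", "Meeting with Ada in London next week"),
    ("wiki", "page-9", ["Analytical Engine", "notes by Lovelace"]),
]
index = build_index(items)
hits = keyword_search(index, "ada lovelace")
```

Each hit carries its source name, which is the simplest form of the "result sources" metadata mentioned on the slide.
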
DSSP: Local Store and Index (10 / 24)
  • Create efficiently queryable associations between participants
  • Improve access to data sources with limited access patterns
  • Data replication
  • Support for high availability and recovery
  • Highly adaptive to heterogeneous data
  • Identifies information across participants
  • Robust for multiple real-world objects

DSSP: Discovery (11 / 24)
  • Locating participants
  • Creation of relationships
  • Semi-automatically
  • Monitoring/learning

DSSP: Enhancement (12 / 24)
  • Imbue participants with additional capabilities
  • Schema
  • Keyword search
  • Update monitoring

Research Challenges (13 / 24)
  • Data models and querying
  • Dataspace discovery
  • Reusing human attention
  • Dataspace storage and indexing
  • Correctness guarantees
  • Theoretical foundations

Data Models and Querying (14 / 24)
  • Heterogeneous data models and query languages
  • Query reformulation (complex -> simple, and vice versa)
  • Hierarchy of query languages (pay-as-you-go)
    • File system-like queries
    • Keyword queries (bag-of-words)
    • Path/containment queries (semi-structured)
    • Structured queries (XML, RDF, OWL)

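As an illustration of the path/containment level in the hierarchy above, the following sketch searches a nested, semi-structured document for a key at any depth and returns the path to each occurrence. The function name `find_paths` and the sample document are hypothetical.

```python
def find_paths(node, key, prefix=()):
    """Yield (path, value) for every occurrence of `key` at any depth."""
    if isinstance(node, dict):
        for k, v in node.items():
            path = prefix + (k,)
            if k == key:
                yield path, v
            # Keep descending: the key may occur again further down.
            yield from find_paths(v, key, path)
    elif isinstance(node, list):
        for i, v in enumerate(node):
            yield from find_paths(v, key, prefix + (i,))


doc = {
    "project": {
        "title": "Dataspaces",
        "members": [
            {"name": "Ann", "title": "PhD"},
            {"name": "Bob"},
        ],
    }
}
results = list(find_paths(doc, "title"))
```

This sits between a bag-of-words keyword query, which ignores structure entirely, and a fully structured query, which needs a schema.
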
DataSpace Discovery (15 / 24)
  • Locate participants
  • Semi-automatic tools for clustering and finding relationships between data sources
  • Creation of more precise relationships

Reusing Human Attention (16 / 24)
  • Semantic integration evolves over time
  • Humans are the scarcest resource
  • Machine learning

Storage & Indexing (17 / 24)
  • Heterogeneity of the index (different data formats)
  • Ideally, uniform indexing of all data items
  • Dealing with multiple identifiers for the same real-world thing
  • Updates
  • Automated tuning: which data items to cache, which indexes to build?

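One common way to deal with multiple identifiers for the same real-world thing is to group them into equivalence classes with a union-find structure. The sketch below assumes that equivalences arrive as pairs of identifiers and that the lexicographically smallest identifier serves as the canonical one. Both choices, like the class name and the sample identifiers, are illustrative rather than taken from the paper.

```python
class IdentifierSets:
    """Union-find over identifiers that denote the same real-world thing."""

    def __init__(self):
        self._parent = {}

    def find(self, x):
        # Unknown identifiers start out as their own representative.
        self._parent.setdefault(x, x)
        while self._parent[x] != x:
            # Path halving keeps later lookups short.
            self._parent[x] = self._parent[self._parent[x]]
            x = self._parent[x]
        return x

    def same_as(self, a, b) -> None:
        """Record that identifiers a and b refer to the same thing."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # The smaller representative becomes the canonical identifier.
            self._parent[max(ra, rb)] = min(ra, rb)

    def canonical(self, x):
        return self.find(x)


ids = IdentifierSets()
ids.same_as("crm:17", "mail:ada@example.org")
ids.same_as("mail:ada@example.org", "wiki:Ada_Lovelace")
```

An index can then store entries under `ids.canonical(identifier)` so that lookups by any alias reach the same record.
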
Correctness Guarantees (18 / 24)
  • Quality of answers when accessing disparate data sources
  • Involving updates
  • Define levels of service guarantees
  • Rethinking fundamental data management principles
  • Inherent tradeoffs in terms of quality, performance and control

Theoretical Foundations (19 / 24)
  • Formal understanding of different data models
  • What queries are expressible over a dataspace?
  • Detection of semantically equivalent but syntactically different query languages?

Linked Data … (20 / 24)
  • … as a major step towards a concrete implementation of a dataspace support platform?
  • Use and reuse of HTTP URIs for real-world things
  • Provide useful (self-descriptive) content in RDF

Data Models and Querying (21 / 24)
  • Unified data model (RDF)
  • URIs as identifiers for real-world things
  • Linkage as relationships between sources and entities
  • Data co-exists (everyone can say everything about everybody)
  • Keyword query (bag-of-words)
  • SPARQL

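The unified data model can be illustrated with plain tuples: each RDF statement is a (subject, predicate, object) triple, and a structured query joins triple patterns over shared variables, roughly in the spirit of a SPARQL basic graph pattern. The sketch below is a toy matcher written in Python, not a SPARQL implementation. The `http://example.org/` URIs are made up, while the FOAF namespace and its `name` and `knows` properties are real vocabulary terms.

```python
def match(triples, pattern):
    """Yield variable bindings for one pattern; terms starting with '?' are variables."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


def query(triples, patterns):
    """Join several triple patterns on their shared variables."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for sol in solutions:
            # Substitute variables that earlier patterns already bound.
            bound = tuple(sol.get(term, term) for term in pattern)
            for b in match(triples, bound):
                next_solutions.append({**sol, **b})
        solutions = next_solutions
    return solutions


EX = "http://example.org/"          # hypothetical namespace
FOAF = "http://xmlns.com/foaf/0.1/"

triples = [
    (EX + "alice", FOAF + "name", "Alice"),
    (EX + "alice", FOAF + "knows", EX + "bob"),
    (EX + "bob", FOAF + "name", "Bob"),
]

# "Whom does alice know, and what is that person's name?"
result = query(triples, [
    (EX + "alice", FOAF + "knows", "?who"),
    ("?who", FOAF + "name", "?name"),
])
```

Because every source contributes triples over shared URIs, data from different participants can co-exist in one list and still be joined.
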
Remaining Challenges (22 / 24)
  • Querying
  • Metadata queries
  • Discovery
  • Link traversal, link creation
  • Reasoning, graph mining
  • Storage & indexing
  • Consolidation
  • Correctness guarantees
  • Reuse of human attention
  • Updates / monitoring
  • Data access / privacy

LinkedData: DataSpaces (23 / 24)
  Source: Franklin et al., From Databases to Dataspaces, SIGMOD Rec. 2005

DSSP Examples (24 / 24)
  • Search engines (SWSE, Sindice, FalconS, …): keyword search, ranking, SPARQL
  • Data access: RDB2RDF, RDFizers
  • Discovery: SILK
  • All-in-one: Structured Dynamics LLC

Questions? Opinions?
