Reading Group: From Database to Dataspaces

1. From Databases to Dataspaces*: Wearing the Linked Data goggles. DERI reading group presentation, 23.02.2011, PhD J. Umbrich. * M. Franklin, A. Halevy, D. Maier, in ACM SIGMOD Record, Dec. 2005

3. Development of relational database management systems showed spectacular results

4. BUT: “data everywhere” and use cases relying on large amounts of diverse, interrelated data sources pose new challenges for data management

5. M. Franklin: UC Berkeley, large scale data management

6. A. Halevy: Google Inc., usage of structured data in web search

7. D. Maier: Portland State University, coined the term Datalog, data stream processing

8. Topic of the paper: Dataspaces and their support systems as a new agenda for data management

9. The Problem: Data Management. Loosely connected data sources; information is available in various formats; not always control over the data. Low-level data management challenges across heterogeneous collections: search and querying; tracking lineage; availability and recovery; enforcing rules; integrity constraints; access control; naming conventions; (meta)data evolution.

10. The Solution. Define a space of data: identifiable scope and control across the data and underlying systems. DataSpace Support Platforms (DSSPs): offer a suite of interrelated services and guarantees over self-managed data sources (no complete data control). Pay-as-you-go: keyword search is the bare minimum; more function and increased consistency as you add work.

11. DataSpaces: System

13. RDBs, XML, text, services

14. Stored or streamed

15. Different query support

16. Support updates, read only

17. Any kind of relationship

18. A replica of B

20. DataSpaces: Services. Content heterogeneity requires multiple styles of data access. Cataloging data resources (source, name, size, creation date, location). Search as a primary mechanism to deal with large collections and unfamiliar data (similarity search, ranking). Search is applicable to all content of the dataspace regardless of data format (including metadata). Updates (major research). Monitoring, event detection, support for complex workflows.

21. DataSpaces: System. Source: Franklin et al., From Databases to Dataspaces, SIGMOD Rec. 2005

23. Like (Rate of change, query answering, statistics, ownership, access, privacy policies, relationships)

24. Basic inventory

25. Identifier, type, creation date

26. Answering presence, absence of data element

27. Model Management environment on top of the catalog
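The catalog slide above lists a basic inventory (identifier, type, creation date) and the ability to answer whether a data element is present. A minimal sketch of that idea, assuming a simple in-memory design: the class names `CatalogEntry` and `Catalog`, the methods `register`, `contains` and `by_kind`, and the sample entries are all illustrative assumptions, not something the paper or the slides specify.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class CatalogEntry:
    """Basic inventory record for one data element (field names are assumed)."""
    identifier: str
    kind: str                                     # the element's type, e.g. "text file"
    creation_date: date
    source: str = ""                              # participant that holds the element
    metadata: dict = field(default_factory=dict)  # e.g. ownership, rate of change


class Catalog:
    """Hypothetical in-memory catalog of the elements in a dataspace."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # A later registration with the same identifier overwrites the earlier one.
        self._entries[entry.identifier] = entry

    def contains(self, identifier):
        # Answers presence or absence of a data element.
        return identifier in self._entries

    def by_kind(self, kind):
        # Basic inventory lookup: all entries of a given type.
        return [e for e in self._entries.values() if e.kind == kind]


# Made-up sample entries for illustration only.
catalog = Catalog()
catalog.register(CatalogEntry("db1.customers", "relational table", date(2011, 2, 23), source="db1"))
catalog.register(CatalogEntry("notes.txt", "text file", date(2011, 1, 10), source="fileserver"))
```

Keeping richer metadata in a free-form dictionary is one way to reflect the pay-as-you-go idea: an entry can start with the bare inventory fields and gain detail later.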

29. Query data item regardless of format

30. Keyword search

31. Structured Query

32. common interfaces (mediated schema)

33. Over specific source

34. Peer-data management systems

35. Various query formats with mappings

36. Meta-data queries

37. Result sources, timestamps, uncertainty

38. Source location and similarity queries

39. Monitoring

40. Stateless or stateful

42. Improve access to data sources with limited access patterns

43. Data replication

44. Support of high availability and recovery

45. Highly adaptive to heterogeneous data

46. Identifies information across participants

47. Robust for multiple real-world objects

49. Creation of relationships

50. Semi automatically

51. Monitoring/Learning

53. Schema

54. Keyword search

55. Update monitoring

56. Research Challenges: data models and querying; dataspace discovery; reusing human attention; dataspace storage and indexing; correctness guarantees; theoretical foundations.

58. DataSpace Discovery. Locate participants. Semi-automatic tool for clustering and finding relationships between data sources. Creation of more precise relationships.

59. Reusing Human Attention. Semantic integration evolves over time. Humans are the scarcest resource. Machine learning.

60. Storage & Indexing. Heterogeneity of the index (different data formats). Ideally, uniform indexing of all data items. Dealing with multiple identifiers for the same real-world thing. Updates. Automated tuning: which data items to cache, which indexes to build?
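One common way to deal with multiple identifiers for the same real-world thing is to group them into equivalence classes. The sketch below uses a union-find structure for that; this is a swapped-in textbook technique, not a method the paper prescribes, and the class name `IdentifierIndex` as well as the sample identifiers are hypothetical.

```python
class IdentifierIndex:
    """Union-find over identifiers; each set stands for one real-world thing."""

    def __init__(self):
        self._parent = {}

    def find(self, ident):
        # Unknown identifiers start out as their own representative.
        self._parent.setdefault(ident, ident)
        root = ident
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the path directly at the root.
        while self._parent[ident] != root:
            next_ident = self._parent[ident]
            self._parent[ident] = root
            ident = next_ident
        return root

    def same_as(self, a, b):
        # Record that identifiers a and b denote the same thing.
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def equivalent(self, a, b):
        return self.find(a) == self.find(b)


# Made-up identifiers from three imagined participants.
index = IdentifierIndex()
index.same_as("http://example.org/people/alice", "mailto:alice@example.org")
index.same_as("mailto:alice@example.org", "crm:customer/42")
```

An index built on top of such a structure could store postings under the representative returned by `find`, so that a lookup by any of the known identifiers reaches the same entry.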

61. Correctness Guarantees. Quality of answers when accessing disparate data sources. Involving updates. Define levels of service guarantees. Rethinking of fundamental data management principles. Inherent tradeoffs in terms of quality, performance and control.

62. Theoretical Foundations. Formal understanding of different data models. What queries are expressible over a dataspace? Detection of semantically equivalent but syntactically different query languages?

63. Linked Data … … as a major step towards a concrete implementation of a dataspace support platform? Use and reuse of HTTP URIs for real-world things. Provide useful (self-descriptive) content in RDF.

64. Data Models and Querying. Unified data model (RDF). URIs as identifiers for real-world things. Linkage as relationships between sources and entities. Data co-exists (everyone can say everything about everybody). Keyword query (bag-of-words). SPARQL.
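To make the two query styles on this slide concrete, the sketch below stores RDF-like statements as plain (subject, predicate, object) tuples and offers a bag-of-words keyword search next to a simple triple-pattern match. It is a toy illustration in plain Python, not SPARQL and not an RDF library: the functions `match` and `keyword_search` are hypothetical, the `example.org` URIs and values are made up, and treating any object that starts with `http://` as a URI rather than a literal is a simplifying assumption.

```python
import re

# Each statement is a (subject, predicate, object) triple.
# The example.org URIs and the literal values are made-up sample data.
TRIPLES = [
    ("http://example.org/paper/dataspaces",
     "http://purl.org/dc/terms/title",
     "From Databases to Dataspaces"),
    ("http://example.org/paper/dataspaces",
     "http://purl.org/dc/terms/creator",
     "http://example.org/person/halevy"),
    ("http://example.org/person/halevy",
     "http://xmlns.com/foaf/0.1/name",
     "A. Halevy"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def keyword_search(triples, query):
    """Bag-of-words search: subjects with a literal containing every query word."""
    words = set(re.findall(r"\w+", query.lower()))
    hits = []
    for s, _, o in triples:
        if o.startswith("http://"):
            continue  # assumption: such objects are URIs, not literals
        tokens = set(re.findall(r"\w+", o.lower()))
        if words and words <= tokens and s not in hits:
            hits.append(s)
    return hits
```

A keyword query such as `keyword_search(TRIPLES, "dataspaces databases")` needs no knowledge of the vocabulary, while `match(TRIPLES, p="http://xmlns.com/foaf/0.1/name")` requires knowing the predicate URI; that contrast is the pay-as-you-go trade-off between the two access styles.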

65. Remaining Challenges. Querying: metadata queries. Discovery: link traversal, link creation; reasoning, graph mining. Storage & Indexing: consolidation. Correctness guarantees. Reuse human attention. Updates / monitoring. Data access / privacy.

66. LinkedData: DataSpaces. Source: Franklin et al., From Databases to Dataspaces, SIGMOD Rec. 2005

67. DSSPs Examples. Search engines (SWSE, Sindice, FalconS,…): keyword search, ranking, SPARQL. Data access: RDB2RDF, RDFizers. Discovery: SILK. All-in-one: Structured Dynamics LLC.

68. Questions? Opinions?
