Reading Group: From Databases to Dataspaces
From Databases to Dataspaces*: Wearing the Linked Data Goggles
DERI reading group presentation, 23.02.2011, PhD J. Umbrich
* M. Franklin, A. Halevy, D. Maier, in ACM SIGMOD Record, Dec. 2005
D. Maier: Portland State University; coined the term Datalog; data stream processing.
Topic of the Paper
Dataspaces and their support systems as a new agenda for data management.
The Problem: Data Management
Data sources are only loosely connected, information is available in various formats, and there is not always control over the data.
This creates low-level data management challenges across heterogeneous collections: search and querying, tracking lineage, availability and recovery, enforcing rules, integrity constraints, access control, naming conventions, and (meta)data evolution.
The Solution
Define a space of data: an identifiable scope and control across the data and the underlying systems.
DataSpace Support Platforms (DSSPs) offer a suite of interrelated services and guarantees over self-managed data sources (without complete control over the data).
Pay-as-you-go: keyword search is the bare minimum; more functionality and increased consistency come as you add work.
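The pay-as-you-go idea can be illustrated with a minimal sketch, assuming a toy dataspace whose sources are plain Python dicts. Keyword search works over every source from the start; structured lookups only become possible for sources that someone has aligned by registering an attribute mapping. The class and its methods (`PayAsYouGoDataspace`, `add_mapping`, `structured_query`) are hypothetical names for illustration, not the API of any real DSSP.

```python
from collections import defaultdict


class PayAsYouGoDataspace:
    """Toy dataspace: keyword search from day one, structured queries once mappings are added."""

    def __init__(self):
        self.sources = {}                   # source name -> list of records (dicts)
        self.mappings = defaultdict(dict)   # global attribute -> {source name: local attribute}

    def add_source(self, name, records):
        # Sources join as-is; no up-front schema integration is required.
        self.sources[name] = records

    def keyword_search(self, term):
        # Baseline service: match the term against every value of every record.
        term = term.lower()
        return [
            (name, rec)
            for name, records in self.sources.items()
            for rec in records
            if any(term in str(v).lower() for v in rec.values())
        ]

    def add_mapping(self, global_attr, source, local_attr):
        # Invested effort: align one source's local attribute with a global attribute.
        self.mappings[global_attr][source] = local_attr

    def structured_query(self, global_attr, value):
        # Only mapped sources can answer, so coverage grows with the work invested.
        return [
            (source, rec)
            for source, local_attr in self.mappings[global_attr].items()
            for rec in self.sources.get(source, [])
            if rec.get(local_attr) == value
        ]


ds = PayAsYouGoDataspace()
ds.add_source("hr", [{"emp_name": "Ada", "city": "Galway"}])
ds.add_source("crm", [{"contact": "Ada", "town": "Dublin"}])
print(len(ds.keyword_search("ada")))              # 2: keyword search works immediately
print(ds.structured_query("person", "Ada"))       # []: nothing mapped yet
ds.add_mapping("person", "hr", "emp_name")
print(len(ds.structured_query("person", "Ada")))  # 1: only the mapped source answers
```

Mapping the `crm` source as well would let the structured query reach both records. That incremental step is the "more function as you add work" part of the slide.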
DataSpaces: Services
Content heterogeneity requires multiple styles of data access.
Cataloging data resources (source, name, size, creation date, location).
Search as the primary mechanism for dealing with large collections and unfamiliar data (similarity search, ranking). Search applies to all content of the dataspace regardless of data format, including metadata.
Updates (a major research topic).
Monitoring, event detection, and support for complex workflows.
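The cataloging and search services can be sketched together, assuming each catalog entry carries the fields listed on the slide (source, name, size, creation date, location) plus any extracted text. Search treats textual metadata and content as one bag of terms, so an item is findable even when its format yields no content, as with the image below. The `CatalogEntry` type and the term-count ranking are hypothetical simplifications, not a DSSP interface.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class CatalogEntry:
    source: str
    name: str
    size: int            # bytes
    created: date
    location: str
    content: str = ""    # extracted text; empty for opaque formats such as images

    def terms(self):
        # Textual metadata and content are tokenized together into one bag of terms.
        text = " ".join([self.source, self.name, self.location, self.content])
        return re.findall(r"[a-z0-9]+", text.lower())


def search(catalog, query):
    """Rank entries by total occurrences of the query terms (a deliberately crude ranking)."""
    query_terms = re.findall(r"[a-z0-9]+", query.lower())
    scored = []
    for entry in catalog:
        terms = entry.terms()
        score = sum(terms.count(t) for t in query_terms)
        if score > 0:
            scored.append((score, entry.name))
    # Highest score first; ties broken alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


catalog = [
    CatalogEntry("mail", "budget.eml", 2048, date(2011, 2, 1), "/inbox",
                 "budget draft for dataspace project"),
    CatalogEntry("fs", "notes.txt", 512, date(2011, 2, 20), "/home/dataspace",
                 "reading group notes"),
    CatalogEntry("fs", "photo.jpg", 90000, date(2010, 7, 4), "/pictures"),
]
print(search(catalog, "dataspace budget"))  # ['budget.eml', 'notes.txt']
```

Here `notes.txt` matches only through its location metadata, and `photo.jpg` can still be found by searching for "pictures". A real system would use similarity search and proper ranking rather than raw term counts.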
Research Challenges
Data models and querying; dataspace discovery; reusing human attention; dataspace storage and indexing; correctness guarantees; theoretical foundations.
Dataspace Discovery
Locating participants.
Semi-automatic tools for clustering and finding relationships between data sources.
Creating more precise relationships.
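One simple way to make semi-automatic discovery concrete is to compare sources by the overlap of their attribute names (Jaccard similarity) and to cluster those whose similarity reaches a threshold. The proposed pairs are then handed to a human to confirm or refine. The threshold of 0.3 and the schemas below are hypothetical, and real discovery tools draw on far richer evidence than attribute names.

```python
from itertools import combinations


def jaccard(a, b):
    # Similarity of two attribute sets: |intersection| / |union|.
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster_sources(schemas, threshold=0.3):
    """Single-link clustering: sources join one cluster if a chain of similar pairs connects them."""
    parent = {name: name for name in schemas}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    proposals = []
    for s, t in combinations(sorted(schemas), 2):
        sim = jaccard(schemas[s], schemas[t])
        if sim >= threshold:
            proposals.append((s, t, round(sim, 2)))  # candidate relationship for human review
            parent[find(s)] = find(t)

    groups = {}
    for name in schemas:
        groups.setdefault(find(name), []).append(name)
    clusters = sorted(sorted(g) for g in groups.values())
    return clusters, proposals


schemas = {
    "hr_db": {"name", "email", "dept"},
    "crm": {"name", "email", "company"},
    "wiki": {"title", "body", "author"},
}
clusters, proposals = cluster_sources(schemas)
print(clusters)   # [['crm', 'hr_db'], ['wiki']]
print(proposals)  # [('crm', 'hr_db', 0.5)]
```

The tool only proposes; turning a proposal such as `('crm', 'hr_db', 0.5)` into a precise relationship is left to a person.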
Reusing Human Attention
Semantic integration evolves over time.
Human attention is the scarcest resource.
Machine learning.
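A small sketch of reusing human attention, assuming every confirmation a user gives ("attribute A means the same as attribute B") is recorded once and reused afterwards. Grouping confirmations transitively with union-find lets the system answer equivalence questions that nobody was explicitly asked. The attribute names are hypothetical, and a real system would weigh such feedback, for example with machine learning, rather than trust it absolutely.

```python
class ConfirmedCorrespondences:
    """Union-find over attribute names; each union records one piece of human feedback."""

    def __init__(self):
        self.parent = {}
        self.feedback_count = 0

    def _find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def confirm(self, a, b):
        # A user states that attributes a and b are equivalent.
        self.feedback_count += 1
        self.parent[self._find(a)] = self._find(b)

    def equivalent(self, a, b):
        # Derived answer: no new human input is needed if a chain of confirmations exists.
        return self._find(a) == self._find(b)


kb = ConfirmedCorrespondences()
kb.confirm("hr.emp_name", "crm.contact")
kb.confirm("crm.contact", "wiki.author")
print(kb.equivalent("hr.emp_name", "wiki.author"))  # True, inferred from two confirmations
print(kb.feedback_count)                             # 2
```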
Storage & Indexing
Heterogeneity of the index (different data formats); ideally, all data items are indexed uniformly.
Dealing with multiple identifiers for the same real-world thing.
Updates.
Automated tuning: which data items to cache, and which indexes to build?
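The multiple-identifier problem can be sketched with an inverted index that canonicalizes identifiers before indexing. This assumes a hypothetical equivalence table, in the spirit of `owl:sameAs` links, that maps alternative identifiers for the same real-world thing to one canonical key. Items from different formats are assumed to be flattened to text first so that they can share one uniform index. All identifiers below are made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical equivalence table: alternative identifiers -> canonical identifier.
SAME_AS = {
    "mailto:jdoe@example.org": "person:jane-doe",
    "http://example.org/staff/42": "person:jane-doe",
}


def canonical(identifier):
    return SAME_AS.get(identifier, identifier)


def build_index(items):
    """items: (identifier, text) pairs from heterogeneous sources, already flattened to text."""
    index = defaultdict(set)
    for identifier, text in items:
        key = canonical(identifier)  # consolidate before indexing
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(key)
    return index


items = [
    ("mailto:jdoe@example.org", "Re: dataspaces reading group"),    # e-mail
    ("http://example.org/staff/42", "Jane Doe, data integration"),  # web page
    ("file:///tmp/slides.pdf", "From Databases to Dataspaces"),     # document
]
index = build_index(items)
print(sorted(index["dataspaces"]))  # ['file:///tmp/slides.pdf', 'person:jane-doe']
print(sorted(index["integration"]))  # ['person:jane-doe']
```

Because the e-mail and the web page resolve to the same canonical key, a lookup returns one entity rather than two unrelated hits.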
Correctness Guarantees
Quality of answers when accessing disparate data sources, including in the presence of updates.
Defining levels of service guarantees.
Rethinking fundamental data management principles.
Inherent tradeoffs between quality, performance, and control.
Theoretical Foundations
Formal understanding of different data models.
What queries are expressible over a dataspace?
How can semantically equivalent but syntactically different query languages be detected?
Linked Data …
… as a major step towards a concrete implementation of a dataspace support platform?
Use and reuse HTTP URIs for real-world things.
Provide useful (self-descriptive) content in RDF.
Data Models and Querying
Unified data model (RDF): URIs as identifiers for real-world things; links as relationships between sources and entities; data co-exists (everyone can say anything about anybody).
Querying: keyword queries (bag-of-words) and SPARQL.
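The two query styles on this slide can be contrasted over a handful of RDF-like triples held as Python tuples. A bag-of-words keyword query matches literals anywhere. A SPARQL-style basic graph pattern joins triple patterns on shared variables. This is a toy matcher written for illustration, not a SPARQL engine. The `ex:` URIs are hypothetical, and literals are marked with a small `str` subclass so that keyword search can skip URIs.

```python
class Literal(str):
    """Marks an RDF literal; plain str values are treated as URIs."""


triples = [
    ("ex:alice", "foaf:name", Literal("Alice Smith")),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", Literal("Bob Jones")),
    ("ex:carol", "foaf:knows", "ex:bob"),  # another subject also links to ex:bob
]


def keyword_query(triples, words):
    # Bag-of-words: subjects that have a literal containing every query word.
    words = [w.lower() for w in words]
    return {
        s
        for s, _, o in triples
        if isinstance(o, Literal) and all(w in o.lower() for w in words)
    }


def match(pattern, triple, binding):
    # Extend binding so that pattern matches triple; return None on any conflict.
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):  # variable: bind it, or check the existing binding
            if new.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:  # constant must match exactly
            return None
    return new


def bgp_query(triples, patterns):
    # Evaluate triple patterns left to right, joining on shared variables.
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for b in bindings:
            for t in triples:
                extended = match(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


print(keyword_query(triples, ["bob"]))  # {'ex:bob'}
people = bgp_query(triples, [("?x", "foaf:knows", "?y"), ("?y", "foaf:name", "Bob Jones")])
print(sorted(b["?x"] for b in people))  # ['ex:alice', 'ex:carol']
```

The keyword query needs no knowledge of the vocabulary. The graph pattern needs the user to know `foaf:knows` and `foaf:name` but answers a relational question ("who knows someone named Bob Jones?") that keywords alone cannot express.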
Remaining Challenges
Querying: metadata queries
Discovery: link traversal, link creation
Reasoning, graph mining
Storage & indexing: consolidation
Correctness guarantees
Reusing human attention
Updates / monitoring
Data access / privacy
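Link traversal as a discovery mechanism can be sketched as a breadth-first crawl. It dereferences a seed URI, collects the triples in the returned document, and follows any object that is itself an HTTP URI, up to a depth limit. Here dereferencing is simulated by a dictionary lookup on a hypothetical `WEB`. A real implementation would issue HTTP GET requests, using content negotiation to ask for RDF.

```python
from collections import deque

# Hypothetical stand-in for the Web: URI -> triples returned when the URI is dereferenced.
WEB = {
    "http://a.example/alice": [
        ("http://a.example/alice", "foaf:knows", "http://b.example/bob"),
    ],
    "http://b.example/bob": [
        ("http://b.example/bob", "foaf:name", "Bob"),
        ("http://b.example/bob", "foaf:based_near", "http://c.example/galway"),
    ],
    "http://c.example/galway": [
        ("http://c.example/galway", "rdfs:label", "Galway"),
    ],
}


def dereference(uri):
    # Simulated HTTP lookup; unknown URIs yield no data, like a failed request.
    return WEB.get(uri, [])


def traverse(seed, max_depth=2):
    """Breadth-first link traversal; returns all triples found within max_depth hops of seed."""
    seen = {seed}
    queue = deque([(seed, 0)])
    collected = []
    while queue:
        uri, depth = queue.popleft()
        for s, p, o in dereference(uri):
            collected.append((s, p, o))
            # Follow objects that look like HTTP URIs, if the depth budget allows.
            if o.startswith("http://") and o not in seen and depth < max_depth:
                seen.add(o)
                queue.append((o, depth + 1))
    return collected


print(len(traverse("http://a.example/alice", max_depth=1)))  # 3
print(len(traverse("http://a.example/alice", max_depth=2)))  # 4
```

The depth limit is the simplest way to bound the crawl. The number of triples found, and therefore the completeness of any answer, depends directly on how far the traversal is allowed to go.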