From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent Systems

The Real-time Linked Dataspace (RLD) is an enabling platform for data management for intelligent systems within smart environments that combines the pay-as-you-go paradigm of dataspaces, linked data, and knowledge graphs with entity-centric real-time query capabilities.

The RLD contains all the relevant information within a data ecosystem including things, sensors, and data sources and has the responsibility for managing the relationships among these participants.

It manages sources without presuming a pre-existing semantic integration among them, using specialised dataspace support services for loose administrative proximity and semantic integration for event and stream systems. The support services leverage approximate and best-effort techniques and operate under a 5-star model for “pay-as-you-go” incremental data management.
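
As a rough illustration of the pay-as-you-go idea, the following Python sketch models a catalog in which a source joins at the lowest tier and is upgraded one star at a time, only when needed. The class and method names (`Source`, `Dataspace`, `register`, `upgrade`, `at_least`) are hypothetical and are not taken from the RLD. The sketch assumes only that tiers are numbered 1 to 5 and says nothing about what each tier requires.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """A participant in the dataspace (hypothetical model)."""
    name: str
    stars: int = 1  # integration tier, 1 (lowest) to 5 (highest)


@dataclass
class Dataspace:
    """Toy catalog that tracks how well each source is integrated."""
    sources: dict = field(default_factory=dict)

    def register(self, name, stars=1):
        # Joining is cheap: a source may enter at the lowest tier.
        if not 1 <= stars <= 5:
            raise ValueError("stars must be between 1 and 5")
        self.sources[name] = Source(name, stars)
        return self.sources[name]

    def upgrade(self, name):
        # Integration improves incrementally, one tier at a time.
        src = self.sources[name]
        src.stars = min(5, src.stars + 1)
        return src.stars

    def at_least(self, stars):
        # A service that needs richer integration only sees sources
        # that have reached the tier it requires.
        return sorted(s.name for s in self.sources.values() if s.stars >= stars)


ds = Dataspace()
ds.register("water-meter")            # joins with minimal effort
ds.register("building-db", stars=3)   # already better integrated
print(ds.at_least(3))
ds.upgrade("water-meter")
print(ds.sources["water-meter"].stars)
```

The design choice being illustrated is that every source is usable from the moment it registers, while richer services become available to it only as further integration effort is invested.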

  1. 1. From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent Systems. Edward Curry, Insight @ NUI Galway
  2. 2. Open Access Book. Contents: Part I: Fundamentals and Concepts; Part II: Data Support Services; Part III: Stream and Event Processing Services; Part IV: Intelligent Systems and Applications; Part V: Future Directions. Team
  3. 3. Part I: Fundamentals and Concepts
  4. 4. Data Driven Innovations • Digital Twins: A digital replica of physical assets (car), processes (value-chain), systems, or physical environments (building). The digital representation (i.e. simulation modelling or data-driven models) provided by the digital twin can be analysed to optimise the operation of the “physical twin”. • Physical-Cyber-Social (PCS): A computing paradigm that supports a richer human experience with a holistic data-rich view of the smart environment that integrates, correlates, interprets, and provides contextually relevant abstractions to humans. • Mass Personalisation: More human-centric thinking in the design of systems where users have growing expectations for highly personalised digital services for the “Market of One”. • Data Network Effects: As more systems/users join and contribute data to the smart environment, a “network effect” can take place, resulting in the overall data available becoming more valuable.
  5. 5. Digital Twins [Figure labels: Real World, Digital World; Physical Twin (Asset-centric), Digital Twin (System-centric); Sensors, Actuators; Observe, Orient, Decide, Act]
  6. 6. Connected Intelligent Systems
  7. 7. Value Chains in Data Ecosystems
  8. 8. Data Management Challenges • Pay-as-you-go Data Integration, Accessibility, and Sharing – Standard data syntax, semantics, and linkage: Facilitate integration and sharing, ideally with open standards and non-proprietary approaches. – Single-point data discoverability and accessibility: Allow the organisation of, and access to, datasets and metadata through a single location. – Incremental data management: Enable a low barrier to entry and a pay-as-you-go paradigm to minimise costs. • Secure Access Control: Support data access rights to preserve the security of data and privacy of users in the smart environment. • Real-time Data Processing and Historical Querying – Real-time data processing: Including ingestion, aggregation, and pattern detection within event streams originating from sensors and things in the smart environment. – Unified querying of real-time data and historical data: Provide applications and end-users with a holistic queryable state of the smart environment at a latency suitable for user interaction. • Entity-centric Data Views – Entity management: The storage, linkage, curation, and retrieval of entity data, such as users, zones, and locations. – Event enrichment: Enhancement of sensor/things streams with contextual data (e.g. entities) to make the stream data more encapsulated and useful in downstream processing.
  9. 9. The “gold mining” metaphor applied to data processing
  10. 10. Traditional Approaches to Data Integration [Figure labels: Frequency of use (Low to High); Popularity/Use; Number of data sources, entities, attributes; Cost of administration & semantic integration using traditional approaches]
  11. 11. Data is Key to AI… Data Platforms will Fuel AI Decisions [Figure labels: Data Generation and Analysis (including IoT); Data Platforms (Access and Portability); AI and Decision Platforms]
  12. 12. IoT-Enablement: A Data Sharing Layer is needed… • Layer 1 - Communication and Sensing: IPv6, Wi-Fi, RFID, CoAP, AVB, etc. • Layer 2 - Middleware: Peer-to-Peer, Events, Pub/Sub, SOA, SDN, etc. • Layer 3 - Data: Schema, Entities, Catalog, Sharing, Access/Control, etc. • Layer 4 - Intelligent Apps, Analytics, and Users: Predictive Analytics, Situation Awareness, Decision Support, Digital Twin, Machine Learning, Users • Other figure labels: Datasets; Things / Sensors; Contextual Data Sources (including legacy systems). Adapted from: L. Atzori, A. Iera, and G. Morabito, “The Internet of Things: A survey,” Comput. Networks, vol. 54, no. 15, pp. 2787–2805, Oct. 2010.
  13. 13. Cost of Data Management Solutions • Administrative Proximity: – With close control, many assumptions can hold concerning guarantees such as data quality and consistency. – Far control refers to a loosely coupled environment and a lack of coordination among the data sources. • Semantic Integration: – Degree to which data schemas are matched up (types, attributes, and names). – Ranges from all data conforming to an agreed-upon schema to no schema information. This dimension is relevant to how much semantically rich querying can be done. Source: Halevy, A., Franklin, M. and Maier, D. 2006. Principles of dataspace systems. 25th ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems - PODS ’06 (New York, New York, USA, 2006), 1–9.
  14. 14. (Real-time Linked) Dataspace Principles (adapted from Halevy et al.): • Must deal with many different formats of streams and events. • Does not subsume the stream and event processing engines; they still provide individual access via their native interfaces. • Queries are provided on a best-effort and approximate basis. • Must provide pathways to improve the integration among the data sources, including streams and events, in a pay-as-you-go fashion. Dataspace: “Dataspaces are not a data integration approach; rather, they are more of a data co-existence approach. The goal of dataspace support is to provide base functionality over all data sources, regardless of how integrated they are.” (Halevy, A., Franklin, M. and Maier, D. 2006.) Real-time Linked Dataspace (RLD): Enabling platform for data management for intelligent systems within smart environments that combines the pay-as-you-go paradigm of dataspaces, linked data, and knowledge graphs with entity-centric real-time query capabilities.
  15. 15. Approximate and Best Effort Approaches [Figure labels: Frequency of use (Low to High); Popularity/Use; Number of data sources, entities, attributes; Cost of administration & semantic integration using traditional approaches; Approximate & best-effort approaches]
  16. 16. Architecture of the Real-time Linked Dataspace • Support Platform: Responsible for providing the functionalities and services essential for managing the dataspace. • Things / Sensors: Produce real-time data streams that need to be processed & managed. • Data Sources: Available in a wide variety of formats and accessible through different system interfaces. • Managed Entities: Actively managed entities, including their relationships to participating things, data sources, and other entities. • Intelligent Applications, Analytics, & Users: Leverage the RLD's data and services to provide data analytics, decision support tools, user interfaces, and data visualisations.
  17. 17. Pay-as-you-Go Tiered Data Model • Provides flexibility by reducing the initial cost and barriers to joining the dataspace. • Specialisation of the 5 star scheme defined by Tim Berners-Lee. • Over time the level of integration with the support services can be improved in an incremental manner on an as-needed basis. • The more investment made in integrating with the support services, the better the integration achievable in the dataspace.
  18. 18. Service Tiers for Support Services
  19. 19. Part II: Data Support Services
  20. 20. Part III: Stream and Event Processing Services
  21. 21. Data Self-Management. Techniques for: • Self-Configuration • Self-Healing • Self-Optimizing. Automatic Source Selection: • Source Selection • Source Replacement • Model Selection • Model Training • Parameterization
  22. 22. Entity Data Management and Humans in the Loop. Enables users in the smart environment to participate in data management tasks: • Collection & Enrichment • Mapping & Matching • Operator Evaluation • Feedback & Refinement • Citizen Actuation. Key HIL Challenges: • Task Specification (simplicity) • Interaction Mechanism • Task Assignment (Geospatial, expertise)
  23. 23. Semantic Approximation Matching of Streams. Challenges: • Heterogeneity in event semantics (thousands of schemas) • Heterogeneity in processing rules (thousands of rules tied to schemas). Approximate Semantic Event Matcher: • Sub-symbolic distributional event semantics • Enables pay-as-you-go event matching for data streams • Replaced 48,000 exact rules with 100 approximate rules with around 85% accuracy
  24. 24. Part IV: Intelligent Systems and Applications. Smart Water and Energy Management Pilots. PILOT TYPES: Airport, Office, Home, Mixed Use, School. LOCATION: Linate Airport, Milan, Italy; Insight, Galway, Ireland; Houses, Thermi, Greece; Engineering, NUI Galway; Coláiste na Coiribe, Ireland. TARGET USERS • Corporate users • ~9.5 million passengers • Utilities management • Maintenance staff • Environmental managers • 130 staff • Office consumers • Operations managers • Utility providers • Building managers • Domestic consumers (adults, young adults and children) • Utility providers • Mixed/Public consumers • Building managers • 100 staff • 1000 students (ages 18 to 24) • Mixed/Public consumers • School management • Maintenance staff • 500 students (ages 12 to 18) • 40 teachers. INFRASTRUCTURE • Safety critical • 10 km water network • Multiple buildings • Water meters • Energy meters • Legacy systems • 2190 m² space • 22 offices + 160 open plan spaces • Conference room • 4 meeting rooms • 3 kitchens • Data centre • 30 person café • Energy meters • 10 households • Typical variety of domestic settings including kitchen, showers, baths, living room, bedrooms, and garden • Water meters • Water meters • Energy meters • Rainwater harvesting • Café • Weather station • Wet labs • Showers • Water meters • Energy meters • Rainwater harvesting
  25. 25. Need to target different Target Users [Figure labels: Pilots: Smart School (CnaC School in Galway, Ireland); Mixed Use (Galway, Ireland); Smart Airport (Milan Linate, Italy); Smart Homes (Municipality of Thermi, Greece); Smart Office (Galway, Ireland). Users: Building Manager, University Students, Corporate Staff, Passengers, Families, Operational Staff, Researchers, Application Developers, Teaching Staff, School Students, Data Scientist]
  26. 26. IoT-enabled Digital Twins and Intelligent Applications [Figure labels: Real-time Linked Dataspace; Data: Datasets, Things / Sensors; Services: Entity Management Service, Catalog & Access Control Service, Search & Query Service, Entity-Centric Real-Time Query Service, Complex Event Processing Service, Human Task Service; Digital Twin; Applications: Personal Dashboard, Public Dashboards, Decision Analytics and Machine Learning, Notifications, Apps, Alerts; “OODA” Loop: Observe, Orient, Decide, Act]
  27. 27. Example Applications: Interactive Public Displays, Alerts and Notifications, Personalised Dashboards
  28. 28. Experiences and Lessons Learnt from Dataspaces • Developers need education in stream processing and approximate results • Incremental data management can support agile software development • Build the business case for data-driven innovation • Integration with legacy data is a significant cost in smart environments • The 5 star pay-as-you-go model simplified communication with non-technical users • A secure canonical source for entity data simplifies application development • Data quality with things and sensors is challenging in an operational environment • Working with three pipelines adds overhead (Lambda + Entity Layer)
  29. 29. Part V: Future Directions. Large-scale Decentralised Support Services: • Enhanced Support Services • Scaling Entity Management • Maintenance and Operation Cost. Multimedia/Knowledge-Intensive Event Processing: • Support Services for Multimedia Data • Placement of Multimedia Data and Workloads • Adaptive Training of Classifiers • Complex Multimedia Event Processing. Trusted Data Sharing: • Trusted Platforms • Usage Control • Personal/Industrial Dataspaces. Ecosystem Governance and Economic Models: • Decentralised Data Governance • Economic Models. Incremental Intelligent Systems Engineering: Cognitive Adaptability: • Pay-as-you-go Systems • Cognitive Adaptability. Towards Human-centric Systems: • Explainable Artificial Intelligence and Data Provenance • Human-in-the-loop
  30. 30. Some final thoughts on Impacts, Influence, and Future Funding
  31. 31. Data Sharing Spaces – Position Paper. Key Recommendations: • Create the conditions for the development of a trusted European data sharing framework. • Incorporate data sharing at the core of the data lifecycle to enable greater access to data. • Provide supportive measures for European businesses to safely embrace new technologies, practices and policies. • Assemble a European-wide digital skills strategy to equip the workforce for the new data economy.
  32. 32. A European Strategy for Data. BDVA Meeting, 26 February 2020. Yvo Volman, Head of Unit G1 - Data Policy and Innovation, DG CNECT, European Commission
  33. 33. European Strategy for Data: A common European data space, a single market for data • Data can flow within the EU and across sectors • European rules and values are fully respected • Rules for access and use of data are fair, practical and clear, and clear data governance mechanisms are in place • Availability of high quality data to create and innovate
  34. 34. Common European data spaces [Figure labels: Rich pool of data (varying degree of accessibility); Free flow of data across sectors and countries; Full respect of GDPR. Sectors: Health, Industrial & Manufacturing, Agriculture, Finance, Mobility, Green Deal, Energy, Public Administration, Skills. Supporting elements: − Technical tools for data pooling and sharing − Standards & interoperability (technical, semantic) − Sectoral Data Governance (contracts, licenses, access rights, usage rights) − IT capacity, including cloud storage, processing and services. Horizontal framework for data governance and data access]
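
To make the approximate semantic event matching outlined on slide 23 more concrete, here is a minimal Python sketch of the general idea: a rule term matches an event term when their vectors are close, so one approximate rule can cover several differently named schemas. Everything specific here is an assumption for illustration: the term vectors are hand-made toy values, the 0.9 threshold is arbitrary, and cosine similarity over a lookup table stands in for a real distributional model. It is not the matcher described in the slides.

```python
import math

# Hypothetical, hand-made term vectors. A real system would derive them
# from a distributional semantics model trained on a large corpus.
TERM_VECTORS = {
    "energy": (0.9, 0.1, 0.0),
    "power": (0.8, 0.2, 0.1),
    "water": (0.1, 0.9, 0.0),
    "flow": (0.2, 0.8, 0.1),
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def approx_match(rule_term, event_term, threshold=0.9):
    """Return True when the two terms are identical or close in vector space."""
    if rule_term == event_term:
        return True
    va = TERM_VECTORS.get(rule_term)
    vb = TERM_VECTORS.get(event_term)
    if va is None or vb is None:
        # Best effort: unknown terms simply do not match.
        return False
    return cosine(va, vb) >= threshold


# One approximate rule on "energy" also accepts events labelled "power".
events = ["power", "water", "energy", "flow"]
print([e for e in events if approx_match("energy", e)])
```

The trade-off is the one the slide reports: far fewer rules to maintain, in exchange for matches that are approximate rather than exact.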