
From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent Systems


Digital transformation is driving a new wave of large-scale datafication in every aspect of our world. Today our society creates data ecosystems where data moves among actors within complex information supply chains that can form around an organization, community, sector, or smart environment. These ecosystems of data can be exploited to transform our world and present new challenges and opportunities in the design of intelligent systems. This talk presents my recent work on using the dataspace paradigm as a best-effort approach to data management within data ecosystems. The talk explores the theoretical foundations and principles of dataspaces and details a set of specialized best-effort techniques and models to enable loose administrative proximity and semantic integration of heterogeneous data sources. Finally, I share my perspectives on future dataspace research challenges, including multimedia data, data governance and the role of dataspaces to enable large-scale data sharing within Europe to power data-driven AI.


  1. From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent Systems. Edward Curry, Insight SFI Research Centre for Data Analytics, edward.curry@nuigalway.ie. LDAC2021 - 9th Linked Data in Architecture and Construction Workshop (11 - 13 October 2021)
  2. Overview
     • Part I: Data Ecosystems for Intelligent Systems
     • Part II: Real-time Linked Dataspaces
     • Part III: Final Thoughts on Research Directions and Data Policy
  3. A Team Effort: Open Access Book (Web: http://dataspaces.info). Contents:
     • Part I: Fundamentals and Concepts
     • Part II: Data Support Services
     • Part III: Stream and Event Processing Services
     • Part IV: Intelligent Systems and Applications
     • Part V: Future Directions
  4. Part I: Data Ecosystems for Intelligent Systems
  5. First LDAC Meeting 2012
  6. Emerging Smart Environments…
  7. Digital Twins. Figure labels: Real World, Digital World; Sensors, Actuators; Observe, Orient, Decide, Act; Physical Twin (Asset-centric), Digital Twin (System-centric)
  8. Connected Intelligent Systems. Data-driven intelligence will be driven by industrial, personal and open data
  9. Distributed and Decentralised Data Ecosystems. Key Barrier: Interoperability – Protocols and Semantics. Curry, E. and Sheth, A. (2018) ‘Next-Generation Smart Environments: From System of Systems to Data Ecosystems’, IEEE Intelligent Systems, 33(3), pp. 69–76. doi: 10.1109/MIS.2018.033001418.
  10. Ecosystem: community of organisms and their environment interacting as a system. Tansley (1935), Lindeman (1942), …
  11. Data Ecosystem: socio-technical system extracting value from data value chains by interacting organisations and individuals
     • oriented to business and societal purposes
     • marketplace, competition, collaboration
     Curry, E. (2016) ‘The Big Data Value Chain: Definitions, Concepts, and Theoretical Approaches’, in Cavanillas, J. M., Curry, E., and Wahlster, W. (eds) New Horizons for a Data-Driven Economy.
  12. [Figure slide; no transcribed text]
  13. The “gold mining” metaphor applied to data processing. Transforming Transport has made use of a total of 164 terabytes of data from 160 different data sources
  14. Maturity stages of data assets and related “sieves”
  15. Traditional Approaches to Data Integration. Figure: The Long Tail of Data. Axis labels: Popularity / Use (frequency of use, low to high); number of data sources, entities, attributes. Curve label: cost of administration & semantic integration using traditional approaches
  16. Content Space: From Rigid Schemas to Schema-less… and Fundamental Decentralisation
     • Heterogeneous, complex and large-scale data
     • Very large and dynamic “schemas”
     • Open environments: distributed, decentralised, decoupled data sources, anonymous users, multi-domain, lack of global order of information flow
     • Multiple perspectives (conceptualisations) of reality
     • Ambiguity, vagueness, inconsistency
  17. The Red Queen Hypothesis. “It takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!” Lewis Carroll's Through the Looking-Glass
  18. Part II: Real-time Linked Dataspaces
  19. Data Platforms will Fuel AI-Driven Decision-Making
     • Data Generation and Analysis (including IoT)
     • Data Platforms (Access and Portability)
     • AI and Decision Platforms
  20. IoT-Enablement: A Data Sharing Layer is needed…
     • Layer 1 - Communication and Sensing: IPv6, Wi-Fi, RFID, CoAP, AVB, etc.
     • Layer 2 - Middleware: Peer-to-Peer, Events, Pub/Sub, SOA, SDN, etc.
     • Layer 3 - Data: Schema, Entities, Catalog, Sharing, Access/Control, etc.
     • Layer 4 – Intelligent Apps, Analytics, and Users
     Other figure labels: Datasets; Things / Sensors; Contextual Data Sources (including legacy systems); Predictive Analytics; Situation Awareness; Decision Support; Digital Twin; Machine Learning; Users
     Adapted from: L. Atzori, A. Iera, and G. Morabito, “The Internet of Things: A survey,” Comput. Networks, vol. 54, no. 15, pp. 2787–2805, Oct. 2010.
  21. Human Interactivity: Web Search. From Structure to Knowledge Graph to Search
     • ~1995: ~100K websites; exact results; human curated
     • ~1998: ~2.4M websites; approximate results; computed
     • ~2012: ~700M websites; approximate + exact results; computed + crowd
  22. Cost of Data Management Solutions
     • Administrative Proximity: close vs. loose coordination; assumptions concerning guarantees such as data, access, quality, and consistency
     • Semantic Integration: degree to which data schemas are matched up (types, attributes, and names)
     Halevy, A., Franklin, M. and Maier, D. 2006. Principles of dataspace systems. 25th ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems - PODS ’06 (New York, New York, USA, 2006), 1–9.
  23. Approximate and Best Effort Approaches. Figure: The Long Tail of Data. Axis labels: Popularity / Use (frequency of use, low to high); number of data sources, entities, attributes. Curve labels: cost of administration & semantic integration using traditional approaches; approximate & best-effort approaches
  24. Dataspace. “Dataspaces are not a data integration approach; rather, they are more of a data co-existence approach. The goal of dataspace support is to provide base functionality over all data sources, regardless of how integrated they are.” (Halevy, A., Franklin, M. and Maier, D. 2006.)
  25. Real-time Linked Dataspaces. Enabling platform for data management for intelligent systems within smart environments. Combines the pay-as-you-go paradigm of dataspaces, linked data, and knowledge graphs with entity-centric real-time queries. Principles (adapted from Halevy et al.):
     • Must deal with many different formats of streams and events.
     • Does not subsume the stream and event processing engines; they still provide individual access via their native interfaces.
     • Queries are provided on a best-effort and approximate basis.
     • Must provide pathways to improve the integration among the data sources, including streams and events, in a pay-as-you-go fashion.
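
A minimal sketch of the best-effort principle above, assuming three hypothetical sources (`meter_stream`, `legacy_system`, `catalog`) that each keep their own record format. None of these names come from the talk; they only illustrate answering a query with whatever the reachable sources can provide, without a global schema.

```python
from typing import Callable, Dict, List, Tuple

Record = dict

def meter_stream(entity: str) -> List[Record]:
    """Hypothetical stream source exposing energy readings."""
    readings = {"room-101": [{"sensor": "energy", "kWh": 1.2}]}
    return readings.get(entity, [])

def legacy_system(entity: str) -> List[Record]:
    """Hypothetical legacy source that is currently unreachable."""
    raise ConnectionError("legacy system offline")

def catalog(entity: str) -> List[Record]:
    """Hypothetical catalog that only knows about rooms."""
    return [{"label": entity, "type": "Room"}] if entity.startswith("room") else []

SOURCES: Dict[str, Callable[[str], List[Record]]] = {
    "meter_stream": meter_stream,
    "legacy_system": legacy_system,
    "catalog": catalog,
}

def best_effort_query(
    entity: str, sources: Dict[str, Callable[[str], List[Record]]]
) -> Tuple[List[Record], List[str]]:
    """Collect whatever each source returns; note failing sources instead of aborting."""
    answers: List[Record] = []
    skipped: List[str] = []
    for name, fetch in sources.items():
        try:
            # Records keep their native fields; only the source name is added.
            answers.extend({"source": name, **record} for record in fetch(entity))
        except Exception:
            skipped.append(name)
    return answers, skipped

answers, skipped = best_effort_query("room-101", SOURCES)
```

The caller receives partial answers plus the list of skipped sources, so integration with a failing or poorly described source can be improved later, in a pay-as-you-go manner.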
  26. Key Challenge: Investigate techniques to enable approximate and best-effort support services for loose administrative proximity and semantic integration. Incremental support services:
     • Catalog
     • Entity management
     • Query and search
     • Data discovery
     • Human tasks
     • Quality of service
     • Complex event processing
     • Streams dissemination
     • Approximate semantic event matching
  27. Figure labels: Formal World; Real World. Sources: Sahlgren, 2013; Baroni et al. 2013
  28. Distributional Semantic Model
     • Distributional hypothesis: the context surrounding a given word in a text provides relevant information about its meaning.
       – "a word is characterized by the company it keeps" was popularized by Firth in the 1950s
     • Simplified semantic model: associational and quantitative.
     Example text: A wife is a female partner in a marriage. The term "wife" seems to be a close term to bride, the latter is a female participant in a wedding ceremony, while a wife is a married woman during her marriage. ...
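
The distributional hypothesis can be illustrated by counting which words occur near a target word. The sketch below uses the slide's example text; the stop-word list, the window size and the function name `context_counts` are assumptions made for illustration, not part of the talk.

```python
import re
from collections import Counter

# Example text from the slide (quotation marks dropped).
TEXT = (
    "A wife is a female partner in a marriage. The term wife seems to be "
    "a close term to bride, the latter is a female participant in a wedding "
    "ceremony, while a wife is a married woman during her marriage."
)

# Assumed stop-word list; a real model would use a much larger one.
STOPWORDS = {"a", "is", "in", "the", "to", "be", "her", "while", "during"}

def context_counts(text: str, target: str, window: int = 3) -> Counter:
    """Count content words within `window` positions of each occurrence of `target`."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    counts: Counter = Counter()
    for i, token in enumerate(tokens):
        if token == target:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            counts.update(left + right)
    return counts

wife_context = context_counts(TEXT, "wife")
```

Here `wife_context` acts as a crude context vector for "wife": words such as "marriage" and "partner" receive the highest counts, which is the associational, quantitative signal the slide describes.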
  29. Distributional Semantic Model. Semantic statistical knowledge extracted from large Web corpora; works as a semantic ranking function. E.g. esa(room, building) = 0.099; esa(room, car) = 0.009. Figure labels: context dimensions c1, c2, …, cn; words child, husband, spouse; function (number of times that the words occur in c1); example values 0.7, 0.5; angle θ. Gabrilovich, E.; Markovitch, S. (2007). Computing semantic relatedness using Wikipedia-based Explicit Semantic Analysis. Proc. 20th Int'l Joint Conf. on Artificial Intelligence (IJCAI).
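
One common way to turn context vectors into a ranking function is the cosine of the angle between them. The sketch below is not Explicit Semantic Analysis and does not reproduce the esa scores quoted on the slide; the vectors in `VECTORS` are invented counts used only to show the ranking step.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]

# Invented co-occurrence counts, for illustration only.
VECTORS: Dict[str, Vector] = {
    "room": {"wall": 4, "door": 3, "floor": 3, "seat": 1},
    "building": {"wall": 5, "door": 2, "floor": 4, "roof": 3},
    "car": {"seat": 4, "door": 2, "engine": 5, "wheel": 4},
}

def cosine(u: Vector, v: Vector) -> float:
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * v.get(dim, 0.0) for dim, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(term: str, candidates: List[str]) -> List[str]:
    """Order candidates from most to least related to `term`."""
    return sorted(candidates, key=lambda c: cosine(VECTORS[term], VECTORS[c]), reverse=True)

ranking = rank("room", ["car", "building"])
```

With these toy vectors "building" ranks above "car" for the term "room", matching the ordering of the two esa scores on the slide even though the absolute values differ.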
  30. Schema-Agnostic Natural Language Queries: Semantic Gap. Information Need: Who are the children of Marie Curie married to? Possible Data Representations (A, B, C). Figure labels:
     • Entities: Marie Curie; Ève Curie; Irène Joliot-Curie; Frédéric Joliot-Curie; Henry R. Labouisse
     • Properties: :type; :numberOfKids; :motherOf; :wifeOf; :Spouse; :Child
     • Other values: NobelPrizeWinner; Scientist; 2
     Freitas, A. and Curry, E. (2014) ‘Natural Language Queries over Heterogeneous Linked Data Graphs: A Distributional-Compositional Semantics Approach’, in 18th International Conference on Intelligent User Interfaces (IUI’14): ACM
  31. Distributional Semantic Search. Information Need: Who are the children of Marie Curie married to?
     • Query: Marie Curie; children; married to
     • Linked Data: :Marie Curie :motherOf :Ève Curie; :Ève Curie :wifeOf :Henry R. Labouisse
     • Other figure label: Person
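
The Marie Curie example can be sketched as a graph walk in which each query term is mapped to the most related predicate available at the current node. The scores in `TOY_RELATEDNESS` are invented stand-ins for a distributional model, and the function names and the `min_score` threshold are assumptions; this is not the published algorithm.

```python
from typing import Dict, List, Tuple

# Triples following the slide's example vocabulary (:motherOf, :wifeOf, :type).
TRIPLES: List[Tuple[str, str, str]] = [
    ("Marie Curie", "type", "Scientist"),
    ("Marie Curie", "motherOf", "Ève Curie"),
    ("Marie Curie", "motherOf", "Irène Joliot-Curie"),
    ("Ève Curie", "wifeOf", "Henry R. Labouisse"),
    ("Irène Joliot-Curie", "wifeOf", "Frédéric Joliot-Curie"),
]

# Invented relatedness scores between query terms and predicates.
TOY_RELATEDNESS: Dict[Tuple[str, str], float] = {
    ("children", "motherOf"): 0.8,
    ("children", "wifeOf"): 0.2,
    ("married to", "wifeOf"): 0.9,
    ("married to", "motherOf"): 0.2,
}

def relatedness(term: str, predicate: str) -> float:
    return TOY_RELATEDNESS.get((term, predicate), 0.0)

def answer(pivot: str, terms: List[str], min_score: float = 0.3) -> List[str]:
    """Follow, for each term, the best-matching predicate from every current node."""
    frontier = [pivot]
    for term in terms:
        next_frontier: List[str] = []
        for node in frontier:
            predicates = sorted({p for s, p, _ in TRIPLES if s == node})
            if not predicates:
                continue
            best = max(predicates, key=lambda p: relatedness(term, p))
            if relatedness(term, best) < min_score:
                continue  # no predicate is close enough; drop this path
            next_frontier.extend(o for s, p, o in TRIPLES if s == node and p == best)
        frontier = next_frontier
    return frontier

spouses = answer("Marie Curie", ["children", "married to"])
```

The query never mentions `motherOf` or `wifeOf`; the vocabulary gap is bridged by the relatedness scores, which is the role the distributional model plays in the talk.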
  32. Treo: Question Answering over Linked Data. Figure labels: Query; Query Analysis; Query Features; Query Planner; Query Plan; Core semantic approximation & composition operations; Ƭ-Space; Distributional semantics; Commonsense knowledge; Large-scale unstructured data; Database
  33. Approximate Semantic Matching of Streams
     Challenges:
     • Heterogeneity in event semantics (000s of schemas)
     • Heterogeneity in processing rules (000s of rules tied to schemas)
     • Manually implemented
     Approximate Semantic Event Matcher:
     • Distributional event semantics
     • Enables pay-as-you-go event matching for data streams
     • Replaced 48,000 exact rules with 100 approximate rules with around 85% accuracy
     Hasan, S. and Curry, E. (2014) ‘Approximate Semantic Matching of Events for the Internet of Things’, ACM Transactions on Internet Technology, 14(1).
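
Replacing many exact rules with a few approximate ones can be pictured as scoring how closely an event's terms match a subscription and accepting it above a threshold. The sketch below is a simplified illustration, not the matcher from the cited paper: the scores in `TOY_RELATEDNESS`, the attribute names and the 0.6 threshold are all assumptions, and attribute names are still matched exactly.

```python
from typing import Dict

# Invented term relatedness scores standing in for a distributional model.
TOY_RELATEDNESS: Dict[frozenset, float] = {
    frozenset(["consumption", "usage"]): 0.8,
    frozenset(["laptop", "computer"]): 0.7,
    frozenset(["laptop", "kettle"]): 0.1,
}

def relatedness(a: str, b: str) -> float:
    """1.0 for identical terms, a toy score for known pairs, else 0.0."""
    if a == b:
        return 1.0
    return TOY_RELATEDNESS.get(frozenset([a, b]), 0.0)

def match_score(subscription: Dict[str, str], event: Dict[str, str]) -> float:
    """Average relatedness between subscribed values and the event's values."""
    scores = [relatedness(wanted, event.get(attr, "")) for attr, wanted in subscription.items()]
    return sum(scores) / len(scores)

def matches(subscription: Dict[str, str], event: Dict[str, str], threshold: float = 0.6) -> bool:
    return match_score(subscription, event) >= threshold

subscription = {"measure": "consumption", "device": "laptop"}
computer_event = {"measure": "usage", "device": "computer"}
kettle_event = {"measure": "usage", "device": "kettle"}
```

One approximate subscription accepts `computer_event` without a dedicated rule for the "usage" and "computer" vocabulary, while `kettle_event` falls below the threshold. The threshold is the knob that trades coverage against accuracy.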
  34. Intelligent Systems and Applications: Smart Water and Energy Management Pilots
     • Airport (Linate Airport, Milan, Italy)
       – Target users: corporate users; ~9.5 million passengers; utilities management; maintenance staff; environmental managers
       – Infrastructure: safety critical; 10 km water network; multiple buildings; water meters; energy meters; legacy systems
     • Office (Insight, Galway, Ireland)
       – Target users: 130 staff; office consumers; operations managers; utility providers; building managers
       – Infrastructure: 2190 m² space; 22 offices + 160 open plan spaces; conference room; 4 meeting rooms; 3 kitchens; data centre; 30 person café; energy meters
     • Home (Houses, Thermi, Greece)
       – Target users: domestic consumers (adults, young adults and children); utility providers
       – Infrastructure: 10 households; typical variety of domestic settings including kitchen, showers, baths, living room, bedrooms, and garden; water meters
     • Mixed Use (Engineering, NUI Galway)
       – Target users: mixed/public consumers; building managers; 100 staff; 1000 students (ages 18 to 24)
       – Infrastructure: water meters; energy meters; rainwater harvesting; café; weather station; wet labs; showers
     • School (Coláiste na Coiribe, Ireland)
       – Target users: mixed/public consumers; school management; maintenance staff; 500 students (ages 12 to 18); 40 teachers
       – Infrastructure: water meters; energy meters; rainwater harvesting
  35. Need to target different Target Users
     • Pilots: Smart Airport (Milan Linate, Italy); Smart Office (Galway, Ireland); Smart Homes (Municipality of Thermi, Greece); Mixed Use (Galway, Ireland); Smart School (CnaC School in Galway, Ireland)
     • User groups: Corporate Staff; Passengers; Operational Staff; Researchers; Application Developers; Data Scientist; Families; Building Manager; University Students; Teaching Staff; School Students
  36. IoT-enabled Digital Twins and Intelligent Applications. Figure labels:
     • “OODA” Loop: Observe, Orient, Decide, Act
     • Real-time Linked Dataspace: Entity Management Service; Catalog & Access Control Service; Search & Query Service; Entity-Centric Real-Time Query Service; Complex Event Processing Service; Human Task Service
     • Other labels: Datasets; Things / Sensors; Digital Twin; Decision Analytics and Machine Learning; Personal Dashboard; Public Dashboards; Notifications; Apps; Alerts
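
The “OODA” loop in this architecture can be sketched as a small control loop over sensor readings. Everything in the sketch is an assumption for illustration: the `DigitalTwin` class, the three-reading rolling average and the alert threshold are not taken from the talk.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class DigitalTwin:
    """Hypothetical twin that keeps recent readings for one metered asset."""
    history: List[float] = field(default_factory=list)

    def orient(self, reading: float) -> float:
        """Update the twin's state and return the mean of the last three readings."""
        self.history.append(reading)
        recent = self.history[-3:]
        return sum(recent) / len(recent)

def decide(average: float, limit: float) -> Optional[str]:
    """Return an action name when the rolling average exceeds the limit."""
    return "alert" if average > limit else None

def run_loop(readings: List[float], limit: float) -> List[Tuple[float, str]]:
    twin = DigitalTwin()
    actions: List[Tuple[float, str]] = []
    for reading in readings:                 # Observe: a new sensor value arrives
        average = twin.orient(reading)       # Orient: update the twin's view
        action = decide(average, limit)      # Decide: compare against the limit
        if action is not None:
            actions.append((reading, action))  # Act: e.g. raise a notification
    return actions

actions = run_loop([10.0, 12.0, 30.0, 40.0], limit=20.0)
```

In the pilots the "Act" step would correspond to dashboards, alerts or notifications rather than a returned list.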
  37. Example Applications: Interactive Public Displays; Alerts and Notifications; Personalised Dashboards
  38. Pilot Impacts
  39. Experiences and Lessons Learnt from Dataspaces
     • Developer education is needed for stream processing and approximate results
     • Incremental data management can support agile software development
     • Build the business case for data-driven innovation
     • Integration with legacy data is a significant cost in smart environments
     • The 5 star pay-as-you-go model simplified communication with non-technical users
     • A secure canonical source for entity data simplifies application development
     • Data quality with things and sensors is challenging in an operational environment
     • Working with three pipelines adds overhead (LAMBDA + Entity Layer)
  40. Part III: Final Thoughts on Research Directions and Data Policy
  41. Future Research Directions
     • Large-scale Decentralised Support Services: Enhanced Support Services; Scaling Entity Management; Maintenance and Operation Cost
     • Multimedia/Knowledge-Intensive Event Processing: Support Services for Multimedia Data; Placement of Multimedia Data and Workloads; Adaptive Training of Classifiers; Complex Multimedia Event Processing
     • Trusted Data Sharing: Trusted Platforms; Usage Control; Personal/Industrial Dataspaces
     • Ecosystem Governance and Economic Models: Decentralised Data Governance; Economic Models
     • Incremental Intelligent Systems Engineering, Cognitive Adaptability: Pay-as-you-go Systems; Cognitive Adaptability
     • Towards Human-centric Systems: Explainable Artificial Intelligence and Data Provenance; Human-in-the-loop
  42. Internet of Multimedia Things (IoMT)
  43. Overview: Multimodal Event Processing. Multimodal Data is a game changer for Smart Environments…
     • Shift from Structured to Unstructured
     • Enabling Intelligent Systems with Real-time Multimodal Data
     • Multimodal Data Streams: Structured; Video; Audio
     • Rich-Content Processing: Larger data volumes; Larger Content-space; Content Extraction Costs
     • Edge and Resources: Computationally Intensive; Network Intensive
  44. Real-time Health and Safety Monitoring. Figure labels: Structured Sensor Streams (Temp, Wind Speed, Lux); Unstructured Sensor Streams (Person, Vest, Hat; Left/right); Site; relations occupant, wearing, has. Queries:
     • Is everyone wearing PPE/hardhat?
     • Are there any visitors?
     • Is it a safe working temperature?
     • Is smoke detected?
     • Is the wind speed safe?
     • Is there any unsafe behaviour?
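
Two of the monitoring queries above can be sketched over a small site model that combines detections from video (who is wearing what) with structured sensor readings. The data classes, the required PPE set and the wind limit are assumptions for illustration; the talk does not specify them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

REQUIRED_PPE: Set[str] = {"hat", "vest"}   # assumed PPE policy
MAX_SAFE_WIND: float = 38.0                # assumed limit, in km/h

@dataclass
class Person:
    """An occupant detected in a video stream, with the items they are wearing."""
    name: str
    wearing: Set[str] = field(default_factory=set)

@dataclass
class Site:
    occupants: List[Person]
    readings: Dict[str, float]             # structured sensor values

def everyone_wearing_ppe(site: Site) -> bool:
    """Query: is everyone wearing PPE/hardhat?"""
    return all(REQUIRED_PPE <= person.wearing for person in site.occupants)

def missing_ppe(site: Site) -> Dict[str, List[str]]:
    """List, per occupant, the required items that were not detected."""
    return {
        person.name: sorted(REQUIRED_PPE - person.wearing)
        for person in site.occupants
        if not REQUIRED_PPE <= person.wearing
    }

def wind_speed_safe(site: Site) -> bool:
    """Query: is the wind speed safe?"""
    return site.readings["wind_speed"] <= MAX_SAFE_WIND

site = Site(
    occupants=[Person("left", {"hat", "vest"}), Person("right", {"vest"})],
    readings={"temp": 18.0, "wind_speed": 12.0, "lux": 900.0},
)
```

The first query depends on content extracted from unstructured video, the second on a structured sensor stream; answering both against one site model is the point of combining the two kinds of stream.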
  45. Gnosis: Neuro-Symbolic Event Processing. Figure labels: IoMT Sources (Camera, Sensor, Sound, …); Multimedia Flows; Structured Flows; Single Event Matcher; Complex Event Matcher; History; Rules; IoMT Applications (Query 1, Query 2, Query 3)
  46. Multimodal Event Processing Language. Yadav, P. et al. (2021) ‘Query-Driven Video Event Processing for the Internet of Multimedia Things (Demo)’, Proceedings of the VLDB Endowment, 14(12), pp. 2847–2850.
  47. Data Policy
  48. “The future is already here – it’s just not evenly distributed.” William Gibson
  49. (Open) Data is Key to AI
     • “The world’s most valuable resource is no longer oil, but data. The data economy demands a new approach to antitrust rules” - The Economist
     • “…startups and established firms that are just beginning to use AI need access to data in order to train their AI systems. Difficulty in accessing the necessary data can create a barrier to entry, potentially reducing competition and innovation.” - Forbes
  50. From Open Data to … Public Digital Infrastructures. Forward-thinking societies will see the provision of digital infrastructure (including data platforms) as a shared societal service in the same way as water, sanitation, and healthcare.
  51. Over 100 million
  52. A European strategy for data
  53. European Strategy for Data: A common European data space, a single market for data
     • Data can flow within the EU and across sectors
     • European rules and values are fully respected
     • Rules for access and use of data are fair, practical and clear & clear data governance mechanisms are in place
     • Availability of high quality data to create and innovate
  54. Cloud Federation, common European data spaces and AI
     • Sectors: Health; Industrial & Manufacturing; Agriculture; Culture; Mobility; Green Deal; Security; Public Administration; Media
     • Driven by stakeholders
     • Rich pool of data of varying degree of openness
     • Sectoral data governance (contracts, licenses, access rights, usage rights)
     • Technical tools for data pooling and sharing
     • High Value Datasets from public sector
     • AI Testing and Experimentation Facilities; AI on demand platform
     • Federation of Cloud & HPC Infrastructure & Services: Edge Infrastructure & Services; High-Performance Computing
     • IaaS (Infrastructure as a Service): servers, computing, OS, storage, network
     • PaaS (Platforms as a Service): Smart Interoperability Middleware
     • SaaS (Software as a Service): software, ERP, CRM, data analytics
     • Cloud stack management and multi-cloud / hybrid cloud, cloud governance
     • Marketplace for Cloud to Edge based Services
     • Cloud services meeting high requirements for data protection, security, portability, interoperability, energy efficiency
  55. Boosting the Adoption of AI in Europe
  56. Towards a European-Governed Data Sharing Space
  57. [Figure slide; no transcribed text]
  58. The future is already here – it’s just not … WE need to evenly distribute it
