Digital transformation is driving a new wave of large-scale datafication in every aspect of our world. Today our society creates data ecosystems where data moves among actors within complex information supply chains that can form around an organization, community, sector, or smart environment. These ecosystems of data can be exploited to transform our world and present new challenges and opportunities in the design of intelligent systems. This talk presents my recent work on using the dataspace paradigm as a best-effort approach to data management within data ecosystems. The talk explores the theoretical foundations and principles of dataspaces and details a set of specialized best-effort techniques and models to enable loose administrative proximity and semantic integration of heterogeneous data sources. Finally, I share my perspectives on future dataspace research challenges, including multimedia data, data governance and the role of dataspaces to enable large-scale data sharing within Europe to power data-driven AI.
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ... - Edward Curry
Cyber-Physical Energy Systems (CPES) exploit the potential of information technology to boost energy efficiency while minimising environmental impacts. CPES can help manage energy more efficiently by providing a functional view of the entire energy system so that energy activities can be understood, changed, and reinvented to better support sustainable practices. CPES can be applied at different scales from Smart Grids and Smart Cities to Smart Enterprises and Smart Buildings. Significant technical challenges exist in terms of information management, leveraging real-time sensor data, and coordination of the various stakeholders to optimize energy usage.
In this talk I describe an approach to overcome these challenges by reusing Web standards to quickly connect the required systems within a CPES. The resulting lightweight architecture leverages Web technologies including Linked Data, the Web of Things, and Social Media. The paper describes the fundamentals of the approach and demonstrates it within an Enterprise Energy Management scenario in a smart building.
Crowdsourcing Approaches to Big Data Curation - Rio Big Data Meetup - Edward Curry
Data management efforts such as Master Data Management and Data Curation are a popular approach for high quality enterprise data. However, Data Curation can be heavily centralised and labour intensive, where the cost and effort can become prohibitively high. The concentration of data management and stewardship onto a few highly skilled individuals, like developers and data experts, can be a significant bottleneck. This talk explores how to effectively involve a wider community of users within big data management activities. The bottom-up approach of involving crowds in the creation and management of data has been demonstrated by projects like Freebase, Wikipedia, and DBpedia. The talk discusses how crowdsourcing data management techniques can be applied within an enterprise context.
Topics covered include:
- Data Quality And Data Curation
- Crowdsourcing
- Case Studies on Crowdsourced Data Curation
- Setting up a Crowdsourced Data Curation Process
- Linked Open Data Example
- Future Research Challenges
ISWC 2016 Tutorial: Semantic Web of Things M3 framework & FIESTA-IoT EU project - FIESTA-IoT
Amelie Gyrard presents a tutorial on SWOT - the Semantic Web of Things.
For further information about this work, please visit:
http://semantic-web-of-things.appspot.com
Extending Memory on the Web via Human-Centric Knowledge Exchange Network. Presented at W3C Workshop on Social Standards: The Future of Business, 7-8 August 2013, San Francisco, USA
The NIH Data Commons - BD2K All Hands Meeting 2015 - Vivien Bonazzi
Presentation given at the BD2K All Hands meeting in Bethesda, MD, USA in November 2015
https://datascience.nih.gov/bd2k/events/NOV2015-AllHands
Video cast of this presentation:
http://videocast.nih.gov/summary.asp?Live=17480&bhcp=1
Talk starts at 2hrs 40min (it's about 55mins long) - includes video!
Document describing the Commons : https://datascience.nih.gov/commons
Crowdsourcing Approaches for Smart City Open Data Management - Edward Curry
A wide-scale bottom-up approach to the creation and management of open data has been demonstrated by projects like Freebase, Wikipedia, and DBpedia. This talk explores how to involve a wide community of users in collaborative management of open data activities within a Smart City. The talk discusses how crowdsourcing techniques can be applied within a Smart City context using crowdsourcing and human computation platforms such as Amazon Mechanical Turk, Mobile Works, and Crowd Flower.
presented at the 2011 SemTech
Open government data and related services/applications are quickly growing on the Web. Although most agree that the government data has great potential in solving real world problems, there are still many challenges that must be addressed. This talk will describe several representative domain applications and provide concrete examples of evolving technical challenges remaining. We will show solution paths that have proven useful and make recommendations on the corresponding Semantic Web best practices.
• Scalability. How can we handle (e.g. search and cleanse) the 3,000+ raw/tool datasets, and the additional 300,000+ geo datasets from data.gov?
• Interoperability. Multi-scale open government data came from city governments, state governments, and national governments. How can one compare the GDP of the US and China, and later link to state-level financial data? Open government data covers many domains. How can one associate open government data with domain knowledge to build a cancer prevention application?
• Provenance and quality. How should provenance be leveraged to facilitate high-quality data management interactions (e.g. reuse, mash-up and feedback) between the government and the public?
SLUA: Towards Semantic Linking of Users with Actions in Crowdsourcing - Edward Curry
Recent advances in web technologies allow people to help solve complex problems by performing online tasks in return for money, learning, or fun. At present, human contribution is limited to the tasks defined on individual crowdsourcing platforms. Furthermore, there is a lack of tools and technologies that support matching of tasks with appropriate users across multiple systems. A more explicit capture of the semantics of crowdsourcing tasks could enable the design and development of matchmaking services between users and tasks. The paper presents the SLUA ontology, which aims to model users and tasks in crowdsourcing systems in terms of the relevant actions, capabilities, and rewards. This model describes different types of human tasks that help in solving complex problems using crowds. The paper provides examples of describing users and tasks in some real-world systems with the SLUA ontology.
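As a rough illustration of the matchmaking idea in this abstract, the sketch below pairs users with tasks by comparing actions, capabilities, and rewards. The class names, fields, and matching rule are hypothetical and are not taken from the SLUA ontology vocabulary itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    # Hypothetical fields; not the actual SLUA terms.
    name: str
    actions: frozenset       # actions the task asks a worker to perform
    capabilities: frozenset  # capabilities the task requires
    reward: str              # what the task offers, e.g. "money"


@dataclass(frozen=True)
class User:
    name: str
    actions: frozenset       # actions the user is willing to perform
    capabilities: frozenset  # capabilities the user has
    rewards: frozenset       # rewards the user accepts


def matches(user, task):
    """A user fits a task if they cover its actions and capabilities
    and accept the reward it offers."""
    return (
        task.actions <= user.actions
        and task.capabilities <= user.capabilities
        and task.reward in user.rewards
    )


def match_tasks(users, tasks):
    """Map each task name to the names of the users who fit it."""
    return {t.name: [u.name for u in users if matches(u, t)] for t in tasks}
```

Because users and tasks are described with the same three dimensions, the same rule could in principle be applied to task descriptions drawn from more than one platform.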
Querying Heterogeneous Datasets on the Linked Data Web - Edward Curry
The growing number of datasets published on the Web as linked data brings both opportunities for high data availability and challenges inherent to querying data in a semantically heterogeneous and distributed environment. Approaches used for querying siloed databases fail at Web-scale because users don't have an a priori understanding of all the available datasets. This article investigates the main challenges in constructing a query and search solution for linked data and analyzes existing approaches and trends.
EDF2013: Invited Talk Julie Marguerite: Big data: a new world of opportunitie... - European Data Forum
Invited Talk Julie Marguerite, THALES, at the European Data Forum 2013, 9 April 2013 in Dublin, Ireland: Big data: a new world of opportunities for software services
Sirris innovate2011 - Smart Products with smart data - introduction, Dr. Elen... - Sirris
This lecture highlights current trends, challenges and opportunities related to the emergence of large amounts of data. It also presents Sirris’s recent research activities in this domain.
Key Technology Trends for Big Data in Europe - Edward Curry
In this presentation we will discuss some of the results of the BIG project including analysis of foundational Big Data research technologies, technology and strategy roadmaps to enable business to understand the potential of Big Data technologies across different sectors, and the necessary collaboration and dissemination infrastructure to link technology suppliers, integrators and leading user organizations.
Edward Curry is leading the Technical Working Group of the BIG Project with over 30 committed experts along the big data value chain (Acquisition, Analysis, Curation, Storage, Usage). With the help of the other technical leads, he will elaborate on the key technology trends identified in the BIG Project and how they bring data-driven value to industrial sectors.
Linked Water Data For Water Information Management - Edward Curry
The management of water consumption is hindered by low general awareness and absence of precise historical and contextual information. Effective and efficient management of water resources requires a holistic approach considering all the stages of water usage. A decision support tool for water management services requires access to a number of different data domains and different data providers. The design of next-generation water information management systems poses significant technical challenges in terms of information management, integration of heterogeneous data, and real-time processing of dynamic data. Linked Data is a set of web technologies that enables integration of different data sources. This work investigates the usage of Linked Data technologies in the Water Management domain, describes the fundamental concepts of the approach, details an architecture, and discusses possible water management applications.
Within the operational phase buildings are now producing more data than ever before, from energy usage, utility information, occupancy patterns, weather data, etc. In order to manage a building holistically it is important to use knowledge from across these information sources. However, many barriers exist to their interoperability and there is little interaction between these islands of information. As part of moving building data to the cloud there is a critical need to reflect on the design of cloud-based data services and how they are designed from an interoperability perspective. If new cloud data services are designed in the same manner as traditional building management systems they will suffer from the data interoperability problems. Linked data technology leverages the existing open protocols and W3C standards of the Web architecture for sharing structured data on the web. In this paper we propose the use of linked data as an enabling technology for cloud-based building data services. The objective of linking building data in the cloud is to create an integrated well-connected graph of relevant information for managing a building. This paper describes the fundamentals of the approach and demonstrates the concept within a Small Medium sized Enterprise (SME) with an owner-occupied office building.
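The idea of linking building data into one connected graph can be sketched in a few lines. The example below is only an illustration under stated assumptions: the URIs, the `ex:` predicates, and the sample values are invented, and plain tuples stand in for RDF triples rather than any real linked data library.

```python
from collections import defaultdict

# Hypothetical triples (subject URI, predicate, value) from two separate systems.
ENERGY = [
    ("http://example.org/building/1", "ex:energyUseKWh", 420),
]
OCCUPANCY = [
    ("http://example.org/building/1", "ex:occupants", 35),
    ("http://example.org/building/2", "ex:occupants", 12),
]


def merge_graphs(*sources):
    """Combine triples from several sources into one graph keyed by subject URI.

    Sources that use the same URI for the same building end up describing a
    single node, which is the integration step the abstract refers to.
    """
    graph = defaultdict(dict)
    for triples in sources:
        for subject, predicate, value in triples:
            graph[subject][predicate] = value
    return dict(graph)


merged = merge_graphs(ENERGY, OCCUPANCY)
```

After merging, `merged["http://example.org/building/1"]` holds both the energy and the occupancy facts, because both sources identified the building with the same URI.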
EDF2013: Invited talk Florian Bauer: Unleashing climate and energy knowledge ... - European Data Forum
Invited talk Florian Bauer, Operations & IT Director REEEP, at the European Data Forum 2013, 10 April 2013 in Dublin, Ireland: Unleashing climate and energy knowledge with Linked Open Data and consistent terminology
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Student Achievement Review (initially presented during Inauguration Function of the Ohio Center of Excellence in Knowledge-Enabled Computing at Wright State (Kno.e.sis)) - updated since
Center overview: http://bit.ly/coe-k
Invitation: http://bit.ly/COE-invite
GENI Engineering Conference -- Ian Foster
I was invited to talk at the 18th GENI Engineering Conference (http://groups.geni.net/geni/wiki/GEC18Agenda) on experiences in the Grid community with creating and operating large shared infrastructures. I chose to focus on our experiences using Software as a Service (SaaS: aka Cloud) to reduce barriers to the use of the capabilities required to create and operate virtual organizations.
Briefing on US EPA Open Data Strategy using a Linked Data Approach - 3 Round Stones
An overview presented by Ms. Bernadette Hyland on 18-Nov 2014 on the US EPA Open Data strategy, focusing on the Resource Conservation & Recovery Act (RCRA) dataset to be published as linked data. This work is in support of Presidential Memorandum M13-13 - Open Data Policy and Managing Information as an Asset.
From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S... - Edward Curry
The Real-time Linked Dataspace (RLD) is an enabling platform for data management for intelligent systems within smart environments that combines the pay-as-you-go paradigm of dataspaces, linked data, and knowledge graphs with entity-centric real-time query capabilities.
The RLD contains all the relevant information within a data ecosystem including things, sensors, and data sources and has the responsibility for managing the relationships among these participants.
It manages sources without presuming a pre-existing semantic integration among them using specialised dataspace support services for loose administrative proximity and semantic integration for event and stream systems. Support services leverage approximate and best-effort techniques and operate under a 5 star model for “pay-as-you-go” incremental data management.
Presentation slide for this:
Kei Kurakawa, Toward universal information access on the digital object cloud, In book of abstracts of International Workshop on Data Science - Present & Future of Open Data & Open Science -, p.57-59, November 12-15, 2018, Mishima Citizens Cultural Hall & Joint Support-Center for Data Science Research, Mishima, Shizuoka, Japan
This is a version of a series of talks given at NCSA-UIUC's director seminar, IBM Almaden, HP Labs, DERI-Galway, City Univ of Dublin, and KMI-Open University during Aug-Oct 2010 (replaces earlier keynote version). It deals with a couple of items of the vision outlined at http://bit.ly/4ynB7A
A video of this presentation: http://www.ncsa.illinois.edu/News/Video/2010/sheth.html
Link to this talk as http://bit.ly/CHE-talk
Relationship Web: Trailblazing, Analytics and Computing for Human Experience - Amit Sheth
Amit Sheth, "Relationship Web: Trailblazing, Analytics and Computing for Human Experience," Keynote talk at 27th International Conference on Conceptual Modeling (ER 2008) Barcelona, October 20-23 2008.
See associated discussion at:
http://knoesis.org/amit/publications/index.php?page=9
http://knoesis.org/library/resource.php?id=00190
Smarter Cities pillars: Internet of Things, Web of Data, Crowdsourcing
Interdependence analysis: Society ageing and Societal urbanisation
Enablement of Smarter Inclusive Cities
Managing Metadata for Science and Technology Studies: the RISIS case - Rinke Hoekstra
Presentation of our paper at the WHISE workshop at ESWC 2016 on requirements for metadata over non-public datasets for the science & technology studies field.
The best way to publish and share research data is with a research data repository. A repository is an online database that allows research data to be preserved across time and helps others find it.
Thoughts on Knowledge Graphs & Deeper Provenance - Paul Groth
Thinking about the need for deeper provenance for knowledge graphs but also using knowledge graphs to enrich provenance. Presented at https://seminariomirianandres.unirioja.es/sw19/
CHRISTINA NGUYEN, University of Toronto Mississauga Library
In the world of digital literacies, liaison and instructional librarians are increasingly coming to terms with a new term: algorithmic literacy. No matter the liaison or instruction subjects – computer science, sociology, language and literature, chemistry, physics, economics, or other – students are grappling with assignments that demand a critical understanding, or even use, of algorithms. Over the course of this session, we’ll discuss the term ‘algorithmic literacies,’ explore how it fits into other digital literacies, and see why it as a curriculum might belong at your library. We’ll also look at some examples of practical pedagogical methods you can implement right away, depending on what types of AL lessons you want to teach, and who your patrons are. Lastly, we’ll discuss how librarians should view themselves as co-learners when working with AL skills. This session seeks to bring together participants from across the different libraries, with diverse missions/vision/mandates, to explore ways we can all benefit from teaching AL. If time permits, we may discuss how text and data librarians (functional specialists) can support the development of this curriculum.
Similar to From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent Systems (20)
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake - Walaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
The Building Blocks of QuestDB, a Time Series Databasejavier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
Learn SQL from basic queries to Advance queriesmanishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
Global Situational Awareness of A.I. and where its headedvikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be un-leashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs.This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
Analysis insight about a Flyball dog competition team's performanceroli9797
Insight of my analysis about a Flyball dog competition team's last year performance. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
Analysis insight about a Flyball dog competition team's performance
From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent Systems
1. From Data Platforms to Dataspaces:
Enabling Data Ecosystems for Intelligent Systems
Edward Curry,
Insight SFI Research Centre for Data Analytics
edward.curry@nuigalway.ie
LDAC2021 - 9th Linked Data in Architecture and Construction Workshop (11 - 13 October 2021)
2. Overview
• Part I: Data Ecosystems for Intelligent Systems
• Part II: Real-time Linked Dataspaces
• Part III: Final Thoughts on Research Directions and Data Policy
3. Contents
Part I: Fundamentals and Concepts
Part II: Data Support Services
Part III: Stream and Event Processing Services
Part IV: Intelligent Systems and Applications
Part V: Future Directions
A Team Effort: Open Access Book
Web: http://dataspaces.info
10. Digital Twins
[Figure: the Real World (Physical Twin, asset-centric, with Sensors and Actuators) connected to the Digital World (Digital Twin, system-centric) through an Observe, Orient, Decide, Act loop.]
12. Distributed and Decentralised Data Ecosystems
Key Barrier: Interoperability – Protocols and Semantics
Curry, E. and Sheth, A. (2018) ‘Next-Generation Smart Environments: From System of Systems to Data Ecosystems’,
IEEE Intelligent Systems, 33(3), pp. 69–76. doi: 10.1109/MIS.2018.033001418.
14. Data Ecosystem
• socio-technical system
• extracting value from data value chains by interacting organisations and individuals
• oriented to business and societal purposes
• marketplace, competition, collaboration
Curry, E. (2016) ‘The Big Data Value Chain: Definitions, Concepts, and Theoretical Approaches’, in Cavanillas, J. M., Curry, E., and Wahlster, W. (eds) New Horizons for a Data-Driven Economy.
16. The “gold mining” metaphor applied to data processing
Transforming Transport has made use of a total of 164 terabytes of data from 160 different data sources
19. Traditional Approaches to Data Integration
[Figure: The Long Tail of Data. Horizontal axis: number of data sources, entities, attributes. Vertical axis: popularity / frequency of use (low to high). Overlaid: cost of administration & semantic integration using traditional approaches.]
20. Content Space: From Rigid Schemas to Schema-less... and Fundamental Decentralisation
• Heterogeneous, complex and large-scale data
• Very large and dynamic “schemas”
• Open environments: distributed, decentralised, decoupled data sources, anonymous users, multi-domain, lack of global order of information flow
• Multiple perspectives (conceptualisations) of reality
• Ambiguity, vagueness, inconsistency
21. The Red Queen Hypothesis
“It takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!”
Lewis Carroll's Through the Looking-Glass
23. Data Platforms will Fuel AI-Driven Decision-Making
Data Generation and Analysis
(including IoT)
Data Platforms
(Access and Portability)
AI and Decision Platforms
24. IoT-Enablement: A Data Sharing Layer is needed
Layer 4 - Intelligent Apps, Analytics, and Users: Predictive Analytics, Situation Awareness, Decision Support, Digital Twin, Machine Learning, Users
Layer 3 - Data: Schema, Entities, Catalog, Sharing, Access/Control, etc.
Layer 2 - Middleware: Peer-to-Peer, Events, Pub/Sub, SOA, SDN, etc.
Layer 1 - Communication and Sensing: IPv6, Wi-Fi, RFID, CoAP, AVB, etc.
Data sources: Datasets, Things / Sensors, Contextual Data Sources (including legacy systems)
Adapted from: L. Atzori, A. Iera, and G. Morabito, “The Internet of Things: A survey,” Comput. Networks, vol. 54, no. 15, pp. 2787–2805, Oct. 2010.
25. Human Interactivity: Web Search
From Structure to Knowledge Graph to Search
• ~1995: ~100K websites; exact results; human curated
• ~1998: ~2.4M websites; approximate results; computed
• ~2012: ~700M websites; approximate results + exact; computed + crowd
26. Cost of Data Management Solutions
Administrative Proximity
– Close vs. loose coordination
– Assumptions concerning guarantees such as data, access, quality, and consistency
Semantic Integration
– Degree to which data schemas are matched up (types, attributes, and names)
Halevy, A., Franklin, M. and Maier, D. (2006) ‘Principles of dataspace systems’, 25th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS ’06), New York, New York, USA, pp. 1–9.
27. Approximate and Best Effort Approaches
[Figure: The Long Tail of Data. Horizontal axis: number of data sources, entities, attributes. Vertical axis: popularity / frequency of use (low to high). Overlaid: cost of administration & semantic integration using traditional approaches, with approximate & best-effort approaches marked on the plot.]
28. Dataspace
“Dataspaces are not a data integration approach; rather, they are
more of a data co-existence approach. The goal of dataspace
support is to provide base functionality over all data sources,
regardless of how integrated they are.”
(Halevy, A., Franklin, M. and Maier, D. 2006.)
29. Real-time Linked Dataspaces
An enabling platform for data management for intelligent systems within smart environments. It combines the pay-as-you-go paradigm of dataspaces, linked data, and knowledge graphs with entity-centric real-time queries.
Principles (adapted from Halevy et al.):
• Must deal with many different formats of streams and events.
• Does not subsume the stream and event processing engines; they still provide individual access via their native interfaces.
• Queries are provided on a best-effort and approximate basis.
• Must provide pathways to improve the integration among the data sources, including streams and events, in a pay-as-you-go fashion.
30. Key Challenge
Investigate techniques to enable approximate and best-effort support services for loose administrative proximity and semantic integration
Incremental support services
• Catalog
• Entity management
• Query and search
• Data discovery
• Human tasks
• Quality of service
• Complex event processing
• Stream dissemination
• Approximate semantic event matching
32. Distributional Semantic Model
• Distributional hypothesis: the context surrounding a given word in a text provides relevant information about its meaning.
– "A word is characterized by the company it keeps", a view popularized by Firth in the 1950s
• Simplified semantic model: associational and quantitative.
Example context: A wife is a female partner in a marriage. The term "wife" seems to be a close term to bride, the latter is a female participant in a wedding ceremony, while a wife is a married woman during her marriage. ...
33. Distributional Semantic Model
• Semantic statistical knowledge extracted from large Web corpora
• Works as a semantic ranking function
– E.g. esa(room, building) = 0.099
– E.g. esa(room, car) = 0.009
[Figure: words such as child, husband, and spouse drawn as vectors over context dimensions c1, c2, ..., cn. Each coordinate is a function of the number of times the word occurs in that context (example values 0.7 and 0.5), and θ is the angle between two word vectors.]
Gabrilovich, E. and Markovitch, S. (2007) ‘Computing semantic relatedness using Wikipedia-based Explicit Semantic Analysis’, Proc. 20th Int'l Joint Conf. on Artificial Intelligence (IJCAI).
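The vector picture on this slide can be made concrete with a small sketch. The following is a minimal illustration, not the ESA implementation cited above: it assumes a three-sentence toy corpus in place of large Web corpora, raw occurrence counts in place of ESA's weighted Wikipedia concept vectors, and hypothetical helper names (context_vector, cosine, relatedness). Relatedness is the cosine of the angle θ between two word vectors.

```python
import math
from collections import Counter

def context_vector(word, contexts):
    """Count how often `word` occurs in each context (toy stand-ins for c1..cn)."""
    return [Counter(ctx.lower().split())[word] for ctx in contexts]

def cosine(u, v):
    """Cosine of the angle between two count vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def relatedness(w1, w2, contexts):
    """Semantic relatedness as the cosine between the two words' context vectors."""
    return cosine(context_vector(w1, contexts), context_vector(w2, contexts))

# Assumed toy corpus: each string plays the role of one context dimension.
contexts = [
    "a wife is a female partner in a marriage",
    "a husband is a male partner in a marriage and a spouse",
    "the car has an engine and four wheels",
]

if __name__ == "__main__":
    print(relatedness("wife", "marriage", contexts))
    print(relatedness("wife", "car", contexts))
```

Words that share contexts score higher than words that never co-occur, which is what lets such a function act as a semantic ranking function.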
34. Schema-Agnostic Natural Language Queries
Information Need: Who are the children of Marie Curie married to?
Semantic Gap: Possible Data Representations
[Figure: three possible graph representations (A, B, C) of the same facts about Marie Curie, her daughters Ève Curie and Irène Joliot-Curie, and their husbands Henry R. Labouisse and Frédéric Joliot-Curie. The representations use differing vocabularies, including :type (NobelPrizeWinner, Scientist), :motherOf, :wifeOf, :Child, :Spouse, and :numberOfKids (2).]
Freitas, A. and Curry, E. (2014) ‘Natural Language Queries over Heterogeneous Linked Data Graphs: A Distributional-Compositional Semantics Approach’, in 18th International Conference on Intelligent User Interfaces (IUI’14): ACM
35. Distributional Semantic Search
Information Need: Who are the children of Marie Curie married to?
Query: Marie Curie children married to Person
Linked Data: :Marie Curie -> :motherOf -> :Ève Curie -> :wifeOf -> :Henry R. Labouisse
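The alignment of query terms to graph predicates on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the approach of Freitas and Curry: the relatedness scores in TOY_SCORES are invented for the example (a real system would compute them from a distributional model), the graph is a hand-built fragment of the slide's data, and best_predicate, answer, and the 0.3 threshold are hypothetical.

```python
# Hypothetical relatedness scores between query terms and predicates.
TOY_SCORES = {
    ("children", "motherOf"): 0.6,
    ("children", "type"): 0.1,
    ("married to", "wifeOf"): 0.7,
    ("married to", "motherOf"): 0.2,
}

def toy_relatedness(term, predicate):
    """Look up an assumed relatedness score; unknown pairs score 0.0."""
    return TOY_SCORES.get((term, predicate), 0.0)

# Toy linked-data fragment: subject -> list of (predicate, object).
GRAPH = {
    "Marie Curie": [
        ("type", "Scientist"),
        ("motherOf", "Ève Curie"),
        ("motherOf", "Irène Joliot-Curie"),
    ],
    "Ève Curie": [("wifeOf", "Henry R. Labouisse")],
    "Irène Joliot-Curie": [("wifeOf", "Frédéric Joliot-Curie")],
}

def best_predicate(term, entity, graph, relatedness, threshold=0.3):
    """Pick the entity's predicate most related to the query term, if any clears the threshold."""
    candidates = {p for p, _ in graph.get(entity, [])}
    scored = sorted(((relatedness(term, p), p) for p in candidates), reverse=True)
    if scored and scored[0][0] >= threshold:
        return scored[0][1]
    return None

def answer(pivot, terms, graph, relatedness):
    """Walk the graph from the pivot entity, resolving one query term per hop."""
    frontier = [pivot]
    for term in terms:
        next_frontier = []
        for entity in frontier:
            predicate = best_predicate(term, entity, graph, relatedness)
            if predicate is None:
                continue
            next_frontier.extend(o for p, o in graph[entity] if p == predicate)
        frontier = next_frontier
    return frontier

if __name__ == "__main__":
    print(answer("Marie Curie", ["children", "married to"], GRAPH, toy_relatedness))
```

The query never mentions :motherOf or :wifeOf; the ranking function bridges the vocabulary gap, so the same query would still work if the graph used :Child and :Spouse and the score table covered those predicates.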
37. Approximate Semantic Matching of Streams
Challenges
• Heterogeneity in event semantics (000s of schemas)
• Heterogeneity in processing rules (000s of rules tied to schemas)
• Manually implemented
Approximate Semantic Event Matcher
• Distributional event semantics
• Enables pay-as-you-go event matching for data streams
• Replaced 48,000 exact rules with 100 approximate rules with around 85% accuracy
Hasan, S. and Curry, E. (2014) ‘Approximate Semantic Matching of Events for the Internet of Things’, ACM Transactions on Internet Technology, 14(1).
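To illustrate how one approximate rule can stand in for many exact ones, here is a minimal sketch of threshold-based matching between a subscription and an event. It is a simplification under stated assumptions, not the matcher described by Hasan and Curry: the relatedness table is invented, and the scoring scheme (best attribute-value pairing, averaged, then compared to a 0.5 threshold) and all names are hypothetical.

```python
# Hypothetical relatedness scores between terms (symmetric, so keyed by frozenset).
TOY_RELATEDNESS = {
    frozenset(("energy consumption", "power usage")): 0.8,
    frozenset(("location", "place")): 0.9,
    frozenset(("room", "office")): 0.6,
}

def relatedness(a, b):
    """Identical terms score 1.0; otherwise use the assumed table, defaulting to 0.0."""
    if a == b:
        return 1.0
    return TOY_RELATEDNESS.get(frozenset((a, b)), 0.0)

def match_score(subscription, event):
    """Average, over subscription pairs, of the best-matching event attribute-value pair."""
    if not subscription:
        return 0.0
    total = 0.0
    for s_attr, s_val in subscription.items():
        total += max(
            (relatedness(s_attr, e_attr) * relatedness(s_val, e_val)
             for e_attr, e_val in event.items()),
            default=0.0,
        )
    return total / len(subscription)

def matches(subscription, event, threshold=0.5):
    """Approximate match: accept the event when the score clears the threshold."""
    return match_score(subscription, event) >= threshold

subscription = {"type": "energy consumption", "location": "room"}
related_event = {"type": "power usage", "place": "office"}
unrelated_event = {"type": "door opened", "place": "garage"}

if __name__ == "__main__":
    print(match_score(subscription, related_event), matches(subscription, related_event))
    print(match_score(subscription, unrelated_event), matches(subscription, unrelated_event))
```

An exact matcher would need a separate rule for every schema that names these attributes differently; here the threshold trades some accuracy for that coverage.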
38. Intelligent Systems and Applications: Smart Water and Energy Management Pilots
Airport
• Location: Linate Airport, Milan, Italy
• Target users: corporate users; ~9.5 million passengers; utilities management; maintenance staff; environmental managers
• Infrastructure: safety critical; 10 km water network; multiple buildings; water meters; energy meters; legacy systems
Office
• Location: Insight, Galway, Ireland
• Target users: 130 staff; office consumers; operations managers; utility providers; building managers
• Infrastructure: 2190 m2 space; 22 offices + 160 open plan spaces; conference room; 4 meeting rooms; 3 kitchens; data centre; 30 person café; energy meters
Home
• Location: Houses, Thermi, Greece
• Target users: domestic consumers (adults, young adults and children); utility providers
• Infrastructure: 10 households; typical variety of domestic settings including kitchen, showers, baths, living room, bedrooms, and garden; water meters
Mixed Use
• Location: Engineering, NUI Galway
• Target users: mixed/public consumers; building managers; 100 staff; 1000 students (ages 18 to 24)
• Infrastructure: water meters; energy meters; rainwater harvesting; café; weather station; wet labs; showers
School
• Location: Coláiste na Coiribe, Ireland
• Target users: mixed/public consumers; school management; maintenance staff; 500 students (ages 12 to 18); 40 teachers
• Infrastructure: water meters; energy meters; rainwater harvesting
39. Need to target different Target Users
Pilots
• Smart Airport: Milan Linate, Italy
• Smart Office: Galway, Ireland
• Smart Homes: Municipality of Thermi, Greece
• Mixed Use: Galway, Ireland
• Smart School: CnaC School in Galway, Ireland
Target users: Building Manager, University Students, Corporate Staff, Passengers, Families, Operational Staff, Researchers, Application Developers, Teaching Staff, School Students, Data Scientist
40. IoT-enabled Digital Twins and Intelligent Applications
[Figure: the Real-time Linked Dataspace arranged around the “OODA” loop (Observe, Orient, Decide, Act).]
Data sources: Datasets, Things / Sensors
Support services: Catalog & Access Control Service, Entity Management Service, Search & Query Service, Entity-Centric Real-Time Query Service, Complex Event Processing Service, Human Task Service
Applications: Digital Twin, Personal Dashboard, Public Dashboards, Decision Analytics and Machine Learning, Notifications, Apps, Alerts
43. Experiences and Lessons Learnt from Dataspaces
• Developers need education in stream processing and approximate results
• Incremental data management can support agile software development
• Build the business case for data-driven innovation
• Integration with legacy data is a significant cost in smart environments
• The 5 star pay-as-you-go model simplified communication with non-technical users
• A secure canonical source for entity data simplifies application development
• Data quality with things and sensors is challenging in an operational environment
• Working with three pipelines adds overhead (LAMBDA + Entity Layer)
44. Part III: Final Thoughts on Research Directions and Data Policy
45. Future Research Directions
Large-scale Decentralised Support Services
• Enhanced Support Services
• Scaling Entity Management
• Maintenance and Operation Cost
Multimedia/Knowledge-Intensive Event Processing
• Support Services for Multimedia Data
• Placement of Multimedia Data and Workloads
• Adaptive Training of Classifiers
• Complex Multimedia Event Processing
Trusted Data Sharing
• Trusted Platforms
• Usage Control
• Personal/Industrial Dataspaces
Ecosystem Governance and Economic Models
• Decentralised Data Governance
• Economic Models
Incremental Intelligent Systems Engineering: Cognitive Adaptability
• Pay-as-you-go Systems
• Cognitive Adaptability
Towards Human-centric Systems
• Explainable Artificial Intelligence and Data Provenance
• Human-in-the-loop
47. Overview: Multimodal Event Processing
• Shift from Structured to Unstructured
• Enabling Intelligent Systems with Real-time Multimodal Data
Multimodal Data is a game changer for Smart Environments...
• Multimodal Data Streams
– Structured
– Video
– Audio
• Rich-Content Processing
– Larger data volumes
– Larger content-space
– Content extraction costs
• Edge and Resources
– Computationally intensive
– Network intensive
48. Real-time Health and Safety Monitoring
[Figure: a graph of a Site that combines structured sensor streams (Temp, Wind Speed, Lux) with unstructured sensor streams (detected Person, Vest, and Hat entities), linked by relations such as has, occupant, wearing, and left/right.]
Queries
• Is everyone wearing PPE/hardhat?
• Are there any visitors?
• Is it a safe working temperature?
• Is smoke detected?
• Is the wind speed safe?
• Is there any unsafe behaviour?
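Two of the queries above can be sketched over a single snapshot of the site, one drawing on the unstructured stream and one on the structured stream. This is a minimal illustration, not the system presented in the talk: it assumes an upstream detector has already turned video frames into person records with a set of worn items, and the class name, field names, and the 5 to 30 temperature range are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set

@dataclass
class DetectedPerson:
    """A person reported by an assumed upstream video detector."""
    person_id: str
    wearing: Set[str] = field(default_factory=set)

def everyone_wearing(people: Iterable[DetectedPerson], required: Set[str]) -> bool:
    """True when every detected person wears all required items (vacuously true if nobody is present)."""
    return all(required <= p.wearing for p in people)

def within_range(readings: Dict[str, float], name: str, low: float, high: float) -> bool:
    """True when the named structured sensor reading lies inside [low, high]."""
    return low <= readings[name] <= high

# Assumed snapshot of the site: detections from video plus structured sensor values.
people = [
    DetectedPerson("p1", {"hat", "vest"}),
    DetectedPerson("p2", {"vest"}),
]
readings = {"temp": 21.5, "wind_speed": 12.0, "lux": 300.0}

if __name__ == "__main__":
    # "Is everyone wearing a hardhat?" uses the unstructured (video) stream.
    print(everyone_wearing(people, {"hat"}))
    # "Is it a safe working temperature?" uses the structured stream;
    # the 5 to 30 range is a placeholder, not a figure from the talk.
    print(within_range(readings, "temp", 5.0, 30.0))
```

Once both kinds of stream are lifted into the same entity-centric representation, a single query can combine them.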
49. Gnosis: Neuro-Symbolic Event Processing
[Figure: IoMT sources (cameras, sensors, sound) send multimedia flows and structured flows to a Single Event Matcher and a Complex Event Matcher, supported by History and Rules, which serve the queries (Query 1, Query 2, Query 3) of IoMT applications.]
50. Multimodal Event Processing Language
Yadav, P. et al. (2021) ‘Query-Driven Video Event Processing for
the Internet of Multimedia Things (Demo)’, Proceedings of the
VLDB Endowment, 14(12), pp. 2847–2850.
52. “The future is already here –
it’s just not evenly distributed.” William Gibson
53. (Open) Data is Key to AI
“The world’s most valuable resource is no longer oil, but data. The data economy demands a new approach to antitrust rules” (The Economist)
“…startups and established firms that are just beginning to use AI need access to data in order to train their AI systems. Difficulty in accessing the necessary data can create a barrier to entry, potentially reducing competition and innovation.” (Forbes)
54. From Open Data to ... Public Digital Infrastructures
Forward-thinking societies will see the provision of digital infrastructure (including data platforms) as a shared societal service in the same way as water, sanitation, and healthcare.
57. European Strategy for Data
A common European data space, a single market for data
• Data can flow within the EU and across sectors
• European rules and values are fully respected
• Rules for access and use of data are fair, practical and clear & clear data governance mechanisms are in place
• Availability of high quality data to create and innovate
58. Cloud Federation, common European data spaces and AI
Common European data spaces: Health, Industrial & Manufacturing, Agriculture, Culture, Mobility, Green Deal, Security, Public Administration, Media
• Driven by stakeholders
• Rich pool of data of varying degree of openness
• Sectoral data governance (contracts, licenses, access rights, usage rights)
• Technical tools for data pooling and sharing
High Value Datasets from public sector
AI Testing and Experimentation Facilities
AI on demand platform
Federation of Cloud & HPC Infrastructure & Services
• SaaS (Software as a Service): Software, ERP, CRM, data analytics
• PaaS (Platforms as a Service): Smart Interoperability Middleware
• IaaS (Infrastructure as a Service): Servers, computing, OS, storage, network
• Edge Infrastructure & Services
• High-Performance Computing
• Cloud stack management and multi-cloud / hybrid cloud, cloud governance
• Marketplace for Cloud to Edge based Services
• Cloud services meeting high requirements for data protection, security, portability, interoperability, energy efficiency