This presentation addresses the main issues of Linked Data and scalability. In particular, it gives details on approaches and technologies for clustering, distributing, sharing, and caching data. Furthermore, it addresses the means for publishing data through cloud deployment and the relationship between Big Data and Linked Data, exploring how some Big Data solutions can be transferred to the context of Linked Data.
This presentation looks in detail at SPARQL (SPARQL Protocol and RDF Query Language) and introduces approaches for querying and updating semantic data. It covers the SPARQL algebra, the SPARQL protocol, and provides examples for reasoning over Linked Data. We use examples from the music domain, which can be directly tried out and run over the MusicBrainz dataset. This includes gaining some familiarity with the RDFS and OWL languages, which allow developers to formulate generic and conceptual knowledge that can be exploited by automatic reasoning services in order to enhance the power of querying.
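A query of the kind described above can be sketched as follows. The prefixes follow the Music Ontology used by the LinkedBrainz mapping of MusicBrainz, but the exact class and property names depend on the dataset version, so treat them as illustrative:

```sparql
PREFIX mo:   <http://purl.org/ontology/mo/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc:   <http://purl.org/dc/elements/1.1/>

# Find up to ten records by The Beatles, together with their titles.
SELECT ?album ?title
WHERE {
  ?artist a mo:MusicArtist ;
          foaf:name "The Beatles" .
  ?album  a mo:Record ;
          foaf:maker ?artist ;
          dc:title ?title .
}
LIMIT 10
```

Such a query can be pasted into any SPARQL endpoint exposing the dataset; only the vocabulary terms would need adjusting to match the actual modelling.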
This presentation covers the whole spectrum of Linked Data production and exposure. After a grounding in the Linked Data principles and best practices, with special emphasis on the VoID vocabulary, we cover R2RML (operating on relational databases), OpenRefine (operating on spreadsheets), and GATECloud (operating on natural language). Finally, we describe the means to increase interlinkage between datasets, especially the use of tools like Silk.
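As a flavour of what an R2RML mapping looks like, the following sketch maps rows of a hypothetical relational `artist` table to RDF. The table and column names are invented for illustration; the `rr:` terms come from the W3C R2RML vocabulary:

```turtle
@prefix rr:   <http://www.w3.org/ns/r2rml#> .
@prefix mo:   <http://purl.org/ontology/mo/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# Each row of the "artist" table becomes one mo:MusicArtist resource,
# with its "name" column exposed as foaf:name.
<#ArtistMap>
    rr:logicalTable [ rr:tableName "artist" ] ;
    rr:subjectMap [
        rr:template "http://example.org/artist/{id}" ;
        rr:class mo:MusicArtist
    ] ;
    rr:predicateObjectMap [
        rr:predicate foaf:name ;
        rr:objectMap [ rr:column "name" ]
    ] .
```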
This presentation gives details on technologies and approaches towards exploiting Linked Data by building LD applications. In particular, it gives an overview of popular existing applications and introduces the main technologies that support implementation and development. Furthermore, it illustrates how data exposed through common Web APIs can be integrated with Linked Data in order to create mashups.
This presentation focuses on providing means for exploring Linked Data. In particular, it gives an overview of current visualization tools and techniques, looking at semantic browsers and applications for presenting the data to the end user. We also describe existing search options, including faceted search, concept-based search and hybrid search, based on a mix of semantic information and text processing. Finally, we conclude with approaches for Linked Data analysis, describing how available data can be synthesized and processed in order to draw conclusions.
Usage of Linked Data: Introduction and Application Scenarios - EUCLID project
This presentation introduces the main principles of Linked Data, the underlying technologies and background standards. It provides basic knowledge of how data can be published over the Web, how it can be queried, and what the possible use cases and benefits are. As an example, we use the development of a music portal (based on the MusicBrainz dataset), which facilitates access to a wide range of information and multimedia resources relating to music.
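To make the principles concrete, here is a minimal sketch in Turtle of how such a music portal might describe an artist as Linked Data. The `example.org` URI is hypothetical; the `owl:sameAs` link to DBpedia illustrates the fourth principle (link to other things):

```turtle
@prefix mo:   <http://purl.org/ontology/mo/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .

# An artist identified by an HTTP URI (principles 1 and 2),
# described in RDF (principle 3), and linked to another dataset (principle 4).
<http://example.org/artist/the-beatles>
    a mo:MusicArtist ;
    foaf:name "The Beatles" ;
    owl:sameAs <http://dbpedia.org/resource/The_Beatles> .
```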
Big Linked Data - Creating Training Curricula - EUCLID project
This presentation includes an overview of the basic rules to follow when developing training and education curricula for Linked Data and Big Linked Data.
As described in the April NISO/DCMI webinar by Dan Brickley, schema.org is a search-engine initiative aimed at helping webmasters use structured data markup to improve the discovery and display of search results. Drupal 7 makes it easy to mark up HTML pages with schema.org terms, allowing users to quickly build websites with structured data that can be understood by Google and displayed as Rich Snippets.
Improved search results are only part of the story, however. Data-bearing documents become machine-processable once you find them. The subject matter, important facts, calendar events, authorship, licensing, and whatever else you might like to share are there for the taking. Sales reports, RSS feeds, industry analysis, maps, diagrams and process artifacts can now connect back to other data sets to provide linkage to context and related content. The key to this is the adoption of standards for both the data model (RDF) and the means of weaving it into documents (RDFa). Drupal 7 has become the leading content platform to adopt these standards.
This webinar will describe how RDFa and Drupal 7 can improve how organizations publish information and data on the Web for both internal and external consumption. It will discuss what is required to use these features and how they impact publication workflow. The talk will focus on high-level and accessible demonstrations of what is possible. Technical people should learn how to proceed while non-technical people will learn what is possible.
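As a flavour of the markup involved, here is a minimal sketch of an HTML fragment annotated with schema.org terms in RDFa, similar in spirit to what Drupal 7 emits. The event itself is invented for illustration:

```html
<!-- A hypothetical event marked up with schema.org terms via RDFa attributes
     (vocab, typeof, property); search engines can extract this as structured data. -->
<div vocab="http://schema.org/" typeof="MusicEvent">
  <h2 property="name">Example Jazz Night</h2>
  <time property="startDate" datetime="2012-06-01T20:00">June 1, 8:00pm</time>
  <span property="location" typeof="Place">
    <span property="name">Example Hall</span>
  </span>
</div>
```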
The slideset used to conduct an introduction/tutorial on DBpedia use cases, concepts and implementation aspects, held during the DBpedia community meeting in Dublin on the 9th of February 2015.
(slide creators: M. Ackermann, M. Freudenberg
additional presenter: Ali Ismayilov)
Existing data management approaches assume control over schema, data and data generation, which is not the case in open, de-centralised environments such as the Web. The lack of control means that there are social processes necessary to generate 'ordo ab chao' and hence a new life cycle model is necessary.
Based on our experience in Linked Data publishing and consumption over the past years, we have identified the involved parties and fundamental phases, which provide for a multitude of so-called Linked Data life cycles.
If you want to hear me speak to the slides, you might want to check out the following videos on YouTube:
Part 1: http://www.youtube.com/watch?v=AFJSMKv5s3s
Part 2: http://www.youtube.com/watch?v=G6YJSZdXOsc
Part 3: http://www.youtube.com/watch?v=OagzNpDEPJg
Build Narratives, Connect Artifacts: Linked Open Data for Cultural Heritage - Ontotext
Scholars, book researchers, and museum directors who try to find the underlying connections between resources face many issues. Scholars in particular continuously emphasize the role of digital humanities and the value of linked data in cultural heritage information systems.
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud - Ontotext
This webinar will remove the roadblocks that prevent many from reaping the benefits of heavyweight Semantic Technology in small-scale projects. We will show you how to build Semantic Search & Analytics proofs of concept by using managed services in the Cloud.
The W3C Linked Data Platform (LDP) specification describes a set of best practices and a simple approach for a read-write Linked Data architecture, based on HTTP access to web resources that describe their state using the RDF data model. This presentation provides a set of simple examples that illustrate how an LDP client can interact with an LDP server in the context of a read-write Linked Data application, i.e. how to use the LDP protocol for retrieving, updating, creating and deleting Linked Data resources.
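The four interactions mentioned above map directly onto HTTP methods. The following sketch shows hypothetical request/response pairs against an imaginary `example.org` LDP server, with headers trimmed to the essentials:

```http
# Create: POST a Turtle representation to an LDP container
POST /container/ HTTP/1.1
Host: example.org
Content-Type: text/turtle

<> a <http://example.org/vocab#Document> .

HTTP/1.1 201 Created
Location: http://example.org/container/doc1

# Retrieve: GET the newly created resource
GET /container/doc1 HTTP/1.1
Host: example.org
Accept: text/turtle

# Update: PUT a modified representation back
PUT /container/doc1 HTTP/1.1
Host: example.org
Content-Type: text/turtle

# Delete: remove the resource
DELETE /container/doc1 HTTP/1.1
Host: example.org
```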
Talk delivered at YOW! Developer Conferences in Melbourne, Brisbane and Sydney Australia on 1-9 December 2016.
Abstract: Governments collect a lot of data. Data on air quality, toxic chemicals, laws and regulations, public health, and the census are intended to be widely distributed. Some data is not for public consumption. This talk focuses on open government data: the information that is meant to be made available for the benefit of policy makers, researchers, scientists, industry, community organisers, journalists and members of civil society.
We'll cover the evolution of Linked Data, which is now being used by Google, Apple, IBM Watson, federal governments worldwide, non-profits including CSIRO and OpenPHACTS, and thousands of others worldwide.
Next we'll delve into the evolution of the U.S. Environmental Protection Agency's Open Data service, which we implemented using Linked Data and an Open Source Data Platform. Highlights include how we connected to hundreds of billions of open data facts in PubChem, the world's largest open chemical molecules database, and DBpedia.
WHO SHOULD ATTEND
Data scientists, software engineers, data analysts, DBAs, technical leaders and anyone interested in utilising linked data and open government data.
A set of slides providing a high-level overview of the W3C Linked Data Platform specification, presented at the 4th Linked Data in Architecture and Construction Workshop.
For a more detailed and technical version of the presentation, please refer to
http://www.slideshare.net/nandana/learning-w3c-linked-data-platform-with-examples
LDAC 2016 programme
http://smartcity.linkeddata.es/LDAC2016/#programme
December 2013 webinar from the EUCLID project on managing large volumes of Linked Data.
webinar recording at https://vimeo.com/84126769 and https://vimeo.com/84126770
more info on EUCLID: http://euclid-project.eu/
Linked Data from a Digital Object Management System - Uldis Bojars
Lightning talk about generating Linked Data from a digital object management system at the National Library of Latvia. Conference: http://swib.org/swib12/programme.php
Talk given at Open Knowledge Foundation 'Opening Up Metadata: Challenges, Standards and Tools' Workshop, Queen Mary University of London, 13th June 2012.
Info on the event at http://openglam.org/2012/05/31/last-places-left-for-opening-up-metadata-challenges-standards-and-tools/
Linked Data for the Masses: The approach and the Software - IMC Technologies
Title: Linked Data for the Masses: The approach and the Software
@ EELLAK (GFOSS) Conference 2010
Athens, Greece
15/05/2010
Creator: George Anadiotis (R&D Director)
This paper surveys the landscape of linked open data projects in cultural heritage, examining the work of groups from around the world. Traditionally, linked open data has been ranked using the five-star method proposed by Tim Berners-Lee. We found this ranking to be lacking when evaluating how cultural heritage groups not merely develop linked open datasets, but also find ways to use linked data to augment the user experience. Building on the five-star method, we developed a six-stage life cycle describing both dataset development and dataset usage. We use this framework to describe and evaluate fifteen linked open data projects in the realm of cultural heritage.
(http://lod2.eu/BlogPost/webinar-series) In this webinar Michael Martin presents CubeViz, a faceted browser for statistical data utilizing the RDF Data Cube vocabulary, which is the state of the art in representing statistical data in RDF. This vocabulary is compatible with SDMX and is increasingly being adopted. Based on the vocabulary and the encoded Data Cube, CubeViz generates a faceted browsing widget that can be used to interactively filter the observations to be visualized in charts. Based on the selected structure, CubeViz offers suitable chart types and options which can be selected by users.
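For readers unfamiliar with the RDF Data Cube vocabulary, a single observation might be encoded as follows. The dataset, dimension and measure terms (`ex:`) are invented for illustration; the `qb:` terms come from the W3C vocabulary:

```turtle
@prefix qb:  <http://purl.org/linked-data/cube#> .
@prefix ex:  <http://example.org/stats#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# One observation in a hypothetical dataset:
# the population of a region in 2012.
ex:obs1 a qb:Observation ;
    qb:dataSet    ex:populationDataset ;
    ex:refArea    ex:RegionA ;
    ex:refPeriod  "2012"^^xsd:gYear ;
    ex:population 1234567 .
```

It is observations of exactly this shape that CubeViz filters and plots.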
If you are interested in Linked (Open) Data principles and mechanisms, LOD tools & services and concrete use cases that can be realised using LOD then join us in the free LOD2 webinar series!
An introductory deck on the Web of Data for my team, covering Semantic Web and Linked Open Data basics, and then DBpedia, the Linked Data Integration Framework (LDIF), the Common Crawl Database, and Web Data Commons.
About the Webinar
The library and cultural institution communities have generally accepted the vision of moving to a Linked Data environment that will align and integrate their resources with those of the greater Semantic Web. But moving from vision to implementation is not easy or well-understood. A number of institutions have begun the needed infrastructure and tools development with pilot projects to provide structured data in support of discovery and navigation services for their collections and resources.
Join NISO for this webinar where speakers will highlight actual Linked Data projects within their institutions, from envisioning the model to implementation and lessons learned, and present their thoughts on how linked data benefits research, scholarly communications, and publishing.
Speakers:
Jon Voss - Strategic Partnerships Director, We Are What We Do
LODLAM + Historypin: A Collaborative Global Community
Matt Miller - Front End Developer, NYPL Labs at the New York Public Library
The Linked Jazz Project: Revealing the Relationships of the Jazz Community
Cory Lampert - Head, Digital Collections , UNLV University Libraries
Silvia Southwick - Digital Collections Metadata Librarian, UNLV University Libraries
Linked Data Demystified: The UNLV Linked Data Project
From the Feb 19 2014 NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations
The Web of Data - Ralph Swick, Domain Lead of the Information and Knowledge Domain at W3C
Dev Dives: Train smarter, not harder - active learning and UiPath LLMs for do... - UiPathCommunity
Speed, accuracy, and scaling: discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing, with little to no training required
Get an exclusive demo of the new family of UiPath LLMs: GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
Andras Palfi, Senior Product Manager, UiPath
Lenka Dulovicova, Product Program Manager, UiPath
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... - Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Epistemic Interaction - tuning interfaces to provide information for AI support - Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
JMeter webinar - integration with InfluxDB and Grafana - RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Essentials of Automations: Optimizing FME Workflows with Parameters - Safe Software
Are you looking to streamline your workflows and boost your projects' efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you're in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part "Essentials of Automation" series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here's what you'll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We'll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don't miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Transcript: Selling digital books in 2024: Insights from industry leaders - T... - BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What's changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
Connector Corner: Automate dynamic content and events by pushing a button - DianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there's more:
In a second workflow supporting the same use case, you'll see:
Your campaign sent to target colleagues for approval
If the "Approve" button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But if the "Reject" button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality - Inflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
UiPath Test Automation using UiPath Test Suite series, part 4 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
2. EUCLID Objective
[Architecture diagram: musical content, metadata and other content are acquired from streaming providers and downloads via LD wrappers and a physical wrapper; cleansing and R2R transformation produce an integrated LD dataset; the dataset is accessed through a SPARQL endpoint, RDF/XML and RDFa; publishing, vocabulary mapping and interlinking connect it to other data; analysis & mining and visualization modules support applications on top.]
EUCLID - Scaling up Linked Data
3. Motivation: Music!
• Our aim: build a music-based portal using Linked Data technologies
• So far, we have studied different mechanisms for:
  • Linked Data management via SPARQL queries
  • Reasoning over Linked Data
  • Linked Data access (RDF dumps, endpoints, RDFa)
  • Linked Data storage in repositories
• In this chapter, we will study current research and technologies to scale up to very large volumes of Linked Data
4. Agenda
1. Introduction to Big (Linked) Data
2. NoSQL databases for Linked Data
3. Hadoop for Linked Data
4. Stream processing for Linked Data
5. ⌠and more
6. Introduction to Big Data
Big Data: management of data which is "too complex" to be processed with traditional solutions
• Big does not stand primarily for size, but as an analogy for "overwhelming"
• Big can mean "high variety", "high volume" or "high velocity"
7. The 3 Vs of Big Data
• Variety – different forms of data
• Volume – petabytes of data
• Velocity – real-time data streams
8. The 3 Vs of Big Data
Variety
  – Data characteristic: structured, semi-structured and unstructured data
  – Challenge: data integration
  – Solution: semantic technologies are a good fit
Volume
  – Data characteristic: large volumes of data
  – Challenge: reasoning and querying
  – Solution: distributed storage & processing, parallel processing
Velocity
  – Data characteristic: streams, sensors, near real-time data, IoT
  – Challenge: reasoning & querying
  – Solution: stream reasoning & querying
9. The Extended Vs of Big Data
Beyond Variety, Volume and Velocity:
• Veracity: uncertainty of the data
• Variability: variation in meaning in different contexts
• Value: turning data into information into insight
• Not easy to measure
• Depend on context and intended use
• Linked Data & semantic technologies can help
11. Beyond Big Data (2)
Semantic Technologies
"Semantic technologies extract meaning from data, ranging from quantitative data and text, to video, voice and images. Many of these techniques have existed for years and are based on advanced statistics, data mining, machine learning and knowledge management. One reason they are garnering more interest is the renewed business requirement for monetizing information as a strategic asset. Even more pressing is the technical need. Increasing volumes, variety and velocity – big data – in IM and business operations, requires semantic technology that makes sense out of data for humans, or automates decisions."
Source: Gartner Inc., "Gartner Identifies Top Technology Trends Impacting Information Infrastructure in 2013"
12. Towards Big Linked Data
Variety – the characteristic most inherent to Linked Data
• Agile data model
• Different vocabularies
Volume – growth of the Web of Data (2007–2011)
Velocity
• RDF streams
• Semantic sensors
14. Big Linked Data & Linked Big Data
Big Linked Data
• Exponential growth of Linked Data in the last five years
• Big Data approach adopted by the Linked Data community, especially to handle Volume and Velocity
Linked Big Data
• Linked Data approach adopted by the Big Data community
• RDF data model for Variety
• Enrich Big Data with metadata and semantics
• Interlink Big Data sets & reduce duplication
• Simplify data access, discovery & integration
Source: M. Dimitrov, "Semantic Technologies for Big Data"
16. RDF Databases
• Native or RDBMS-based RDF databases
  – OWLIM (http://www.ontotext.com/owlim)
  – Virtuoso Universal Server (http://virtuoso.openlinksw.com/)
  – Stardog (http://stardog.com)
  – AllegroGraph (http://www.franz.com/agraph/allegrograph/)
  – Systap Bigdata (http://www.systap.com/)
  – Jena TDB (http://jena.apache.org/documentation/tdb/)
  – Oracle, DB2
17. RDF Database Advantages
• RDF (graph) based data model
  – Global identifiers for resources/entities
  – Agile schema
• Inference of implicit facts
  – Forward, backward, or hybrid reasoning strategies
• Expressive query language (SPARQL)
• Compliance with standards
18. NoSQL Databases
• "Not Only SQL"
• A group of database technologies that don't follow the relational data model
• Typical requirements
  – Distributed
  – High availability
  – Handle big data & query volumes (scalability)
  – Hierarchical or graph data structures
  – Flexible schema
19. NoSQL Taxonomy
Conceptual structures:
• Key/value stores – each key associated with a value (DHT)
• Wide-column stores – each key is associated with many attributes; columns are stored together (e.g., an Artist row with Album and Song columns)
• Document databases – each key associated with a complex data structure (structured document)
• Graph databases – data is represented as nodes and edges
20. Key/Value Stores
• Efficient key/value lookups
• Schema-less
• Simpler read/write operations
  – Low latency & high throughput
• Examples
  – DynamoDB, Azure Table Storage, Riak, Redis, MemcacheDB, Voldemort
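The key/value model above can be sketched in a few lines of Python. This is an illustrative in-memory stand-in for stores such as Redis or Riak, not their actual APIs; the class and method names are made up for the example:

```python
# Minimal sketch of the key/value model: the store knows nothing about
# the structure of its values, so reads and writes are single-key lookups.
class KeyValueStore:
    def __init__(self):
        # In a real system this would be a distributed hash table (DHT).
        self._data = {}

    def put(self, key, value):
        self._data[key] = value  # overwrite-on-write, no schema validation

    def get(self, key, default=None):
        return self._data.get(key, default)

store = KeyValueStore()
# Values are opaque to the store: a string, a serialized document, a blob...
store.put("artist:1", "The Beatles")
store.put("artist:1:albums", ["Let It Be", "Help!"])
print(store.get("artist:1"))         # The Beatles
print(store.get("artist:2", "n/a"))  # n/a
```

The simplicity of this interface is exactly what enables the low latency and high throughput mentioned above: every operation touches a single key.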
21. Wide-Column Stores
• A key is associated with several attributes
• Data in the same column is stored together
• Efficient for complex aggregations over data
• Schema-less / dynamic schema
• Easy to add new columns
• Columns can be grouped together (column family)
• Examples:
  – HBase (http://hbase.apache.org)
  – Cassandra (http://cassandra.apache.org)

  Artist      | Album     | Song
  The Beatles | Let it be | Get back
  Queen       | Jazz      | Fun it
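The wide-column layout can be illustrated with the Artist/Album/Song table above. This is a plain-Python sketch of the storage idea, not the HBase or Cassandra API; the pivot into per-column maps mimics how storing a column's values together makes per-column scans cheap:

```python
from collections import defaultdict

# Each row key maps to a dynamic set of columns (schema-less rows).
rows = {
    "The Beatles": {"album": "Let it be", "song": "Get back"},
    "Queen":       {"album": "Jazz",      "song": "Fun it"},
}

# Column-oriented view: pivot the rows so each column's values sit together,
# which is what makes per-column scans and aggregations efficient.
columns = defaultdict(dict)
for row_key, cols in rows.items():
    for col, value in cols.items():
        columns[col][row_key] = value

# Dynamic schema: a new column is added to one row without any schema change.
rows["Queen"]["formed"] = 1970

print(sorted(columns["album"].values()))  # ['Jazz', 'Let it be']
```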
22. HBase
• Open source column-oriented store
• Based on Google's BigTable
• Built on top of HDFS and Hadoop
• Horizontally scalable, automatic sharding
• High availability / automatic failover
• Strongly consistent reads/writes
• Java/REST API
23. Document Databases
• Each key associated with a complex data structure (document)
• Documents can contain key/value pairs, key/array pairs, or even nested structures
• Schema-less / dynamic schema
  – New fields can be easily added to the document structure
• Typical document formats
  – JSON, XML
• Examples:
  – Couchbase (http://www.couchbase.com)
  – MongoDB (http://www.mongodb.org)
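A minimal sketch of the document model, using the music example from this chapter. The keys and document structure are invented for illustration; this is not the MongoDB or Couchbase API:

```python
import json

# Each key maps to a structured JSON-like document that may nest
# key/value pairs, key/array pairs, and sub-documents.
documents = {}
documents["artist:beatles"] = {
    "name": "The Beatles",
    "origin": "Liverpool",
    "albums": [                                  # key/array pair
        {"title": "Let it be", "year": 1970},    # nested structure
        {"title": "Help!", "year": 1965},
    ],
}

# Dynamic schema: a new field is simply added to the document in place.
documents["artist:beatles"]["homepage"] = "thebeatles.com"

print(json.dumps(documents["artist:beatles"], indent=2))
```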
25. Couchbase
• Document-oriented database
  – Documents are stored as JSON
• Flexible schema
  – Document structure easy to change
• Optimised to run in-memory and on several nodes
  – Ejection and eventual persistence
• Incremental views & indexes
• Scalability, rebalancing, replication, failover
• RESTful API
26. Graph Databases
Motivation: graphs are a natural representation of highly connected data.
Examples: a network of friends in a high school; relationships among artists in Last.fm (http://sixdegrees.hu/last.fm/); a fragment of Facebook; relationships between tweets.
27. Graph Databases
• Based on the property graph model
• Support for query languages and core graph-based tasks
  – reachability, traversal, adjacency and pattern matching
• Examples
  – Neo4j (http://neo4j.org)
  – Dex (http://sparsity-technologies.com/dex.php)
  – HyperGraphDB (http://www.hypergraphdb.org)
28. Graph Databases
Example: Property Graph Model
• Nodes and edges may have properties
• Properties: key-value pairs
[Diagram: "The Beatles" (Homepage: thebeatles.com; Origin: Liverpool) created the albums "Let it be" (Year: 1970; Duration: 35:16), "Help!" (Year: 1965) and "Revolver" (Year: 1966; Duration: 35:01); "Elvis Presley" (Fullname: Elvis Aaron Presley; Homepage: elvis.com; Origin: Memphis) created another album (Year: 1961; Duration: 32:02).]
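The property graph in the diagram can be sketched as plain Python data (illustrative only, not Neo4j or another graph database API): nodes and edges are records carrying arbitrary key/value properties, and traversal follows edges by relationship type.

```python
# Nodes with key/value properties, keyed by an internal id.
nodes = {
    "beatles":   {"label": "The Beatles", "homepage": "thebeatles.com",
                  "origin": "Liverpool"},
    "let_it_be": {"label": "Let it be", "year": 1970, "duration": "35:16"},
    "help":      {"label": "Help!", "year": 1965},
}
# Edges: (source, relationship type, target, edge properties).
edges = [
    ("beatles", "created", "let_it_be", {}),
    ("beatles", "created", "help", {}),
]

def neighbours(node_id, rel):
    """Adjacency: follow all outgoing edges of a given relationship type."""
    return [t for s, r, t, _ in edges if s == node_id and r == rel]

albums = [nodes[n]["label"] for n in neighbours("beatles", "created")]
print(albums)  # ['Let it be', 'Help!']
```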
29. Neo4j
• Graph database
  – Nodes, relationships, properties, paths
  – Indexes over properties
• Flexible schema
• Cypher graph query language
• ACID transactions
• High availability, distributed clusters
• RESTful and Java APIs
30. Rya
• RDF store based on Accumulo
  – Column-store, HDFS
  – Sesame query parser, SAIL implementation
• 3-table index
  – SPO, POS, OSP
  – Sufficient for all triple patterns
  – All triple parts (S, P, O) encoded in the RowID
  – Clustered index
Source: R. Punnoose, A. Crainiceanu, D. Rapp, "Rya: A Scalable RDF Triple Store for the Clouds"
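The three-index idea can be sketched as follows. This is an illustrative simplification, not Rya's actual encoding: each triple is written three times with its parts concatenated into a sorted row key, so any triple pattern with a bound prefix becomes a range scan on one of the tables.

```python
SEP = "\x00"  # separator that sorts before any real character

def index_triple(s, p, o):
    """Encode one triple as a row key in each of the three index tables."""
    return {
        "SPO": SEP.join([s, p, o]),
        "POS": SEP.join([p, o, s]),
        "OSP": SEP.join([o, s, p]),
    }

triples = [("ex:Beatles", "ex:created", "ex:LetItBe"),
           ("ex:Beatles", "ex:origin", "ex:Liverpool")]
tables = {"SPO": [], "POS": [], "OSP": []}
for t in triples:
    for table, key in index_triple(*t).items():
        tables[table].append(key)
for rows in tables.values():
    rows.sort()  # a clustered index keeps row keys sorted

def scan(table, *bound_terms):
    """Prefix range scan: return keys starting with the bound terms."""
    prefix = SEP.join(bound_terms)
    return [k for k in tables[table] if k.startswith(prefix)]

# Pattern (?s ex:created ?o): only P is bound, so use the POS index.
print(len(scan("POS", "ex:created")))   # 1
# Pattern (ex:Beatles ?p ?o): S is bound, so use the SPO index.
print(len(scan("SPO", "ex:Beatles")))   # 2
```

Because the three permutations start with S, P and O respectively, every combination of bound terms maps to a prefix of exactly one table, which is why three tables suffice for all triple patterns.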
31. Rya (2)
• Query processing
  – Sesame (SPARQL) query plan translated to Accumulo range scans & lookups
  – Parallel scans for joins (10–20x speedup)
  – Batch scans (Accumulo) to reduce the number of range scans
  – Statistics for triple-pattern selectivity, query re-ordering
• Performance evaluation (LUBM)
  – No significant degradation when data grows by 2–3 orders of magnitude
Source: R. Punnoose, A. Crainiceanu, D. Rapp, "Rya: A Scalable RDF Triple Store for the Clouds"
32. "NoSQL Databases for RDF: An Empirical Evaluation"
• Goal
  – Store RDF data in HBase, Couchbase, Hive & Cassandra
  – Benchmark query performance against a native distributed RDF database (4store)
• HBase prototype
  – Jena for SPARQL queries
  – 3 index tables (SPO, POS, OSP)
  – Row key encodes S+P+O, cells are empty
  – Jena query plan translated to HBase filters & lookups
Source: Cudre-Mauroux et al., "NoSQL Databases for RDF: An Empirical Evaluation"
33. "NoSQL Databases for RDF: An Empirical Evaluation" (2)
• Hive+HBase prototype
  – SPARQL to HiveQL translation
  – Property table
    • Row key is S
    • A column for each P
    • Cell value stores O
    • Multi-valued attributes have different timestamps
Source: Cudre-Mauroux et al., "NoSQL Databases for RDF: An Empirical Evaluation"
34. "NoSQL Databases for RDF: An Empirical Evaluation" (3)
• CumulusRDF prototype
  – Sesame for SPARQL queries, Cassandra for data management
  – 3 index tables (SPO, POS, OSP)
  – Sesame query plan translated to Cassandra index lookups
• Couchbase prototype
  – Map RDF into JSON documents
    • All triples with the same S stored in the same document (molecule)
    • 2 JSON arrays for Ps and Os
  – Jena as a SPARQL query engine
  – 3 indexes (Couchbase views): SPO, POS, OSP
Source: Cudre-Mauroux et al., "NoSQL Databases for RDF: An Empirical Evaluation"
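The molecule mapping used by the Couchbase prototype can be sketched in a few lines (an illustrative reconstruction of the idea described in the paper, not the prototype's actual code): all triples sharing a subject end up in one document holding two parallel arrays of predicates and objects.

```python
from collections import defaultdict

triples = [
    ("ex:Beatles", "ex:created", "ex:LetItBe"),
    ("ex:Beatles", "ex:origin", "Liverpool"),
    ("ex:LetItBe", "ex:year", "1970"),
]

# One JSON-like document ("molecule") per subject, with parallel
# arrays for predicates ("p") and objects ("o").
molecules = defaultdict(lambda: {"p": [], "o": []})
for s, p, o in triples:
    molecules[s]["p"].append(p)
    molecules[s]["o"].append(o)

# A single lookup by subject returns the whole molecule:
doc = molecules["ex:Beatles"]
print(list(zip(doc["p"], doc["o"])))
```

The trade-off is visible even in this sketch: subject-bound patterns are a single document fetch, while predicate- or object-bound patterns need the extra SPO/POS/OSP views mentioned above.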
35. "NoSQL Databases for RDF: An Empirical Evaluation" (4)
• Benchmarks
  – BSBM with 10M, 100M and 1B triples
  – 1, 2, 4, 8 and 16 node clusters
  – AWS cost & query execution time measured
Source: Cudre-Mauroux et al., "NoSQL Databases for RDF: An Empirical Evaluation"
36. "NoSQL Databases for RDF: An Empirical Evaluation" (5)
• Results
  – Simple SPARQL queries can be executed more efficiently on a NoSQL datastore
  – Data loading time for some NoSQL datastores is comparable to or better than the native RDF store
  – Complex SPARQL queries perform significantly slower on NoSQL systems
    • Query optimisations are required
  – MapReduce operations (Hive & Couchbase) introduce high latency for view maintenance / query execution
Source: Cudre-Mauroux et al., "NoSQL Databases for RDF: An Empirical Evaluation"
38. Working with Distributed Data
• Apache Hadoop (http://hadoop.apache.org) is an open source implementation of MapReduce
• MapReduce
  – Distributed batch processing
  – The Map phase partitions the input set (K/V pairs); the Reduce phase performs aggregated processing over the partitions in parallel
  – Intermediate results are shuffled from Map nodes to Reduce nodes
• Allows for the processing of large data sets distributed across clusters of computers
  – On a distributed file system (HDFS)
  – Scales up to thousands of nodes, each offering local processing power and storage
39. "Scalable Distributed Reasoning with MapReduce"
• Goal
  – Utilise Hadoop for large-scale reasoning
• Approach
  – Implement each RDFS rule (join) via a Map & Reduce function
  – Map outputs the original triple as value, and the join term as key
  – The Reducer receives all the triples needed to perform the join
Source: Urbani et al., "Scalable Distributed Reasoning with MapReduce"
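The approach above can be sketched for one RDFS rule. The example below implements rdfs9, (x rdf:type C) + (C rdfs:subClassOf D) ⇒ (x rdf:type D), as plain Python map/reduce functions with an explicit shuffle phase; it is a local stand-in for the Hadoop jobs the paper describes, not the authors' code.

```python
from collections import defaultdict

TYPE, SUB = "rdf:type", "rdfs:subClassOf"

def map_fn(triple):
    """Emit (join term, original triple): the shared class C is the key."""
    s, p, o = triple
    if p == TYPE:
        yield (o, triple)   # (x rdf:type C) keyed by C
    elif p == SUB:
        yield (s, triple)   # (C rdfs:subClassOf D) keyed by C

def reduce_fn(key, triples):
    """The reducer sees every triple needed for the join on this key."""
    instances = [s for s, p, o in triples if p == TYPE]
    supers = [o for s, p, o in triples if p == SUB]
    for x in instances:
        for d in supers:
            yield (x, TYPE, d)  # derived triple

data = [("ex:LetItBe", TYPE, "mo:Album"),
        ("mo:Album", SUB, "mo:MusicalWork")]

# Shuffle phase: group map output by key, then reduce each group.
groups = defaultdict(list)
for t in data:
    for k, v in map_fn(t):
        groups[k].append(v)
derived = [t for k, ts in groups.items() for t in reduce_fn(k, ts)]
print(derived)  # [('ex:LetItBe', 'rdf:type', 'mo:MusicalWork')]
```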
40. "Scalable Distributed Reasoning with MapReduce" (2)
Source: Urbani et al., "Scalable Distributed Reasoning with MapReduce"
41. "Scalable Distributed Reasoning with MapReduce" (3)
• Challenge
  – Too many duplicates (unique-to-derived triple ratio of 1:50)
• Optimisations
  – Replicate schema triples on each node (in memory)
    • Needed for each join; usually a small set
  – Rule re-ordering
    • Which rule may be triggered by another rule?
    • Reduces the number of required iterations
Source: Urbani et al., "Scalable Distributed Reasoning with MapReduce"
42. "Scalable Distributed Reasoning with MapReduce" (4)
• Results
  – Throughput of 4.5M triples/sec on a 16-node cluster
  – More than 16 nodes do not improve performance significantly
Source: Urbani et al., "Scalable Distributed Reasoning with MapReduce"
43. Lessons Learned from Large-scale Reasoning (J. Urbani)
• 1st Law: Treat schema triples differently
  – Replicate on all nodes to minimise subsequent data transfer
• 2nd Law: Data skew dominates data distribution
  – No universal partitioning scheme for input data
  – Computation tasks are moved to the nodes storing the data (data locality)
• 3rd Law: Certain problems only appear at a very large scale
  – Proof-of-concept prototypes are often not representative
Source: Jacopo Urbani, "Three Laws Learned from Web-scale Reasoning"
45. Streaming Data
• A large amount of new data is constantly being created, or data is being updated at a rapid rate
  – Traffic data, sensor networks, social networks, financial markets
• Many data sources create a constant "stream of information"
  – Not always practical to store all data and then query it
  – Continuous queries over transient data
• More recent data is more important
  – It describes the current state of a dynamic system
46. Stream Processing
• Streams are observed through windows
• Continuous queries can be registered over the stream
• Continuous queries are iteratively evaluated over the data in the current window
  – Can leverage static background knowledge (e.g., schema information)
• Generates a stream of answers
[Diagram: a window slides over the stream in time; the continuous query combines the window contents with background knowledge and emits a stream of answers]
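The window-and-continuous-query model above can be sketched in Python. This is an illustrative toy, not a real stream engine; the toll-gate data anticipates the C-SPARQL example later in this chapter, and all names are invented:

```python
# A stream of timestamped triples (timestamps in seconds).
stream = [
    (("car:1", "t:registers", "toll:A"), 5),
    (("car:2", "t:registers", "toll:B"), 12),
    (("car:3", "t:registers", "toll:A"), 41),
]
# Static background knowledge: which district each toll gate belongs to.
background = {"toll:A": "district:1", "toll:B": "district:2"}

def evaluate(window_start, window_end):
    """One evaluation of the continuous query over the current window."""
    answers = []
    for (s, p, o), ts in stream:
        if window_start <= ts < window_end and p == "t:registers":
            answers.append((s, background[o]))  # join with background knowledge
    return answers

# A 40-second window sliding by 10 seconds: each evaluation over the
# current window contributes one batch to the stream of answers.
answer_stream = [evaluate(t, t + 40) for t in range(0, 21, 10)]
print(answer_stream[0])  # [('car:1', 'district:1'), ('car:2', 'district:2')]
```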
47. Linked Stream Data
• A representation of sensor/stream data following the Linked Data principles
  – Sensor data can be enriched with semantics
  – Facilitates data discovery and integration of heterogeneous data sources
• Challenges
  – RDF triples must be annotated with timestamps
  – Extensions to the SPARQL language: windows, continuous queries, streaming operators
  – Continuous semantics
  – Scalability (Volume)
  – High throughput and low latency (Velocity)
  – Approximate reasoning
48. Querying Streams with SPARQL Extensions
• The mechanism to evaluate queries over streaming data is the specification of continuous queries
• The results of a continuous query are updated as new data arrives
• Several SPARQL extensions with streaming operators based on CQL (Continuous Query Language)
  – C-SPARQL
  – SPARQLStream
  – EP-SPARQL, CQELS, INSTANS
49. C-SPARQL (1)
C-SPARQL is an extension of SPARQL 1.1
1. RDF Streams: sequences of RDF triples annotated with timestamps: <(s,p,o), timestamp>
2. FROM STREAM extension for stream sources and windows

FromStrClause  → 'FROM' ['NAMED'] 'STREAM' StreamIRI '[ RANGE' Window ']'
Window         → LogicalWindow | PhysicalWindow
LogicalWindow  → Number TimeUnit WindowOverlap
TimeUnit       → 'MSEC' | 'SEC' | 'MIN' | 'HOUR' | 'DAY'
WindowOverlap  → 'STEP' Number TimeUnit | 'TUMBLING'
PhysicalWindow → 'TRIPLES' Number
50. C-SPARQL (2)
3. Registration
• Creates a continuous query over the data source
• The query output is a set of variable bindings, an RDF graph, or a new stream

Registration → 'REGISTER' ('QUERY'|'STREAM') QName 'AS' Query
51. C-SPARQL (3)
Example query: retrieve the cars and districts where a car was registered at a toll gate.

REGISTER QUERY CarsEnteringInDistricts AS
SELECT DISTINCT ?district ?car
FROM STREAM <www.uc.eu/tollgates.trdf> [RANGE 40 SEC STEP 10 SEC]
WHERE {
  ?toll t:registers ?car .
  ?toll c:placedIn ?street .
  ?district c:contains ?street .
}

Source: Barbieri, Davide Francesco, et al. "Querying RDF streams with C-SPARQL." ACM SIGMOD Record 39.1 (2010): 20-26.
52. C-SPARQL (4)
Source: M. Balduini et al., "Tutorial on Stream Reasoning for Linked Data" (ISWC 2013)
53. SPARQLStream (1)
• Utilizes the same definition of RDF streams as C-SPARQL: <(s,p,o), timestamp>
• The language is defined as follows:

NamedStream → 'FROM' ['NAMED'] 'STREAM' StreamIRI '[' Window ']'
Window      → 'NOW-' Integer TimeUnit [UpperBound] [Slide]
UpperBound  → 'TO NOW-' Integer TimeUnit
Slide       → 'SLIDE' Integer TimeUnit
TimeUnit    → 'MS' | 'S' | 'MINUTES' | 'HOURS' | 'DAY'
Select      → 'SELECT' [XStream] [DISTINCT | REDUCED] …
XStream     → 'ISTREAM' | 'DSTREAM' | 'RSTREAM'

Source: Jean-Paul Calbimonte and Oscar Corcho, "SPARQLStream: Ontology-based access to data streams." Tutorial at ISWC 2013
54. SPARQLStream (2)
Example query: retrieve an RSTREAM with the observations captured by all sensors in the last 10 minutes.

PREFIX ssn: <http://purl.oclc.org/NET/ssnx/ssn>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT RSTREAM ?sensor ?observation
FROM STREAM <www.semsorgrid4env.eu/SensorReadings.srdf>
[FROM NOW - 10 MINUTES TO NOW STEP 1 MINUTE]
WHERE {
  ?observation a ssn:Observation ;
    ssn:observedBy ?sensor .
}
56. W3C Semantic Sensor Networks
• SSN Ontology
  – http://www.w3.org/2005/Incubator/ssn/ssnx/ssn
  – OWL DL ontology
  – Used to semantically describe sensors, sensor networks & sensor data
  – Recommendations for applying the ontology to Linked Sensor Data
57. W3C Semantic Sensor Networks (2)
• Different perspectives
  – Sensor, data/observation, system
59. A Trillion RDF Triples
• Use case
  – Use RDF and Linked Data for the customer management database of a big telecom
  – Franz Inc / AllegroGraph
60. uRiKA Appliance
• YarcData
• Big Data appliance for graph analytics
  – 8K processors, 1TB RAM
  – In-memory RDF database
  – SPARQL 1.1 support
61. RDFS Reasoning on GPUs
• Similar approach to Urbani et al. for large-scale reasoning with Hadoop
  – Handles rules with 2 antecedents
  – Rule reordering
  – Dictionary encoding
• Shared-memory architecture
  – Efficient GPU algorithm implementation is challenging
Source: Norman Heino & Jeff Z. Pan, "RDFS Reasoning on Massively Parallel Hardware", ISWC 2012
62. RDFS Reasoning on GPUs (2)
• Data parallelism
  – Apply one rule (thread) to one instance triple, joining to a schema triple if possible
  – Hundreds / thousands of threads working in parallel
• Challenge
  – Duplicate removal
• Benchmark
  – 5x speedup of the computation
  – But… memory transfer overhead is significant
Source: Norman Heino & Jeff Z. Pan, "RDFS Reasoning on Massively Parallel Hardware", ISWC 2012
63. Benchmarks
• BSBM v3.1 (April 2013)
  – http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/results/V7/
  – Includes benchmarks with up to 150 billion triples
  – 750x scale increase since the last BSBM result (200M triples)
• LDBC
  – Industry-neutral, non-profit organisation
  – Benchmarks for RDF and graph databases, similar to TPC
  – Big data volumes, complex queries
65. Summary
• Linked Data is a good fit for the Variety challenge of Big Data
• Linked Data can simplify data discovery, data access, and data integration challenges for Big Data
• Exponential growth of Linked Data
• Linked Data benchmarks target bigger workloads
66. Summary (2)
• Ongoing R&D towards scaling up Linked Data for high data Volume and Velocity
  – NoSQL datastores for RDF data management
  – Hadoop for scalable RDF reasoning
  – GPUs for scalable RDF reasoning
• Adapting Linked Data & SPARQL for streaming data scenarios
67. For exercises, quiz and further material visit our website:
http://www.euclid-project.eu
Course eBook
Other channels: @euclid_project, euclidproject