This document discusses Linked Open Data and how to publish open government data. It explains that publishing data in open, machine-readable formats and linking it to other external data sources increases its value. It provides examples of published open government data and outlines best practices for making data open through licensing, standard formats like CSV and XML, using URIs as identifiers, and linking to related external data. The key benefits outlined are empowering others to build upon the data and improving transparency, competition and innovation.
4. Linked Open Data
• foundation for all IT functionality
• no data – no service
• published
• license allowing reuse
• machine-readable
• connected to other data
• self-documenting
6. US Government Works
• US copyright law states that "a work prepared by an officer or employee of the U.S. government as part of that person's official duties" is not entitled to domestic copyright protection
http://en.wikipedia.org/wiki/Copyright_status_of_work_by_the_U.S._government
12. Why publish open data?
• To empower other people to do things
– you can’t afford
– you don’t have the time to
– you don’t want to
– you couldn’t imagine
• Again: no data, no service
13. Why is this so popular now?
• Because the advent of
– ubiquitous computing,
– cheap hardware and disk space,
– cheap and fast networks
• has dramatically reduced the cost of distributing data
– and of building applications,
– while dramatically increasing the value from such applications
• Also, fashion
14. Why is access to data important?
• It’s necessary for democracy to function properly
– voters and activists cannot make the right decisions if they don’t have the necessary information
• It’s important for economic growth
– in a post-modern society, information is the life-blood of the economy
– many companies make their living simply by selling repackaged information
15. Economic effects
• Making information more easily available
– levels the playing field
– benefits smaller companies
– improves competition
• Large companies have
– advertising budgets,
– money to extract data,
– ...
• Smaller entities don’t
16. Two kinds of availability (1)
• Available to humans
– this makes the raw data available to humans
– only humans can then digest and process it, and ultimately pass it on to other humans
17. Two kinds of availability (2)
• Available to machines
– that is, make the data available in machine-processable form
– people can then build many different kinds of services based on the data
– allows many different kinds of access to the data
32. Cleaning customer data
• I did work deduplicating a customer database
• We wanted to look at cleaning the data automatically
– i.e. picking which of the duplicate records to use
• Decided at the time it couldn’t be done automatically
– because the company data was not available online
• Now it is available...
33. http://dbpedia.org/About
• Not really an application
– but a very important data set
• Basically Wikipedia as Linked Open Data
– Wikipedia fact boxes etc. extracted as RDF
– 400 million statements about 3.77 million things
36. Ten Principles for Open Gov’t Data
1. Completeness
2. Primary
3. Timeliness
4. Ease of access
5. Machine readable
6. Non-discrimination
7. Use open standards
8. Licensing
9. Permanence
10. Usage cost
http://sunlightfoundation.com/policy/documents/ten-open-data-principles/
37. 5-star model
★ Available, open license
★★ Machine-readable format
★★★ Non-proprietary format
★★★★ URIs as identifiers
★★★★★ Linked to other data
http://5stardata.info/
38. Data licenses
• Necessary so users know what they are allowed/not allowed to do with the data
• Open Data Commons has licenses you can reuse:
– http://opendatacommons.org/licenses/
• Such as
– public domain (anything goes)
– attribution license (must give credit)
– attribution share-alike
• Norwegian license
– http://data.norge.no/nlod/en
– Norwegian Licence for Open Government Data
39. Machine-readable
• Readable: CSV, XML, Microsoft Excel, RDF, JSON
• Not readable: Microsoft Word, PDF, HTML, Flash

>>> import csv
>>> r = csv.reader(open('countries-mondial.csv'))
>>> next(r)
['id', 'country', 'capital', 'area']
>>> next(r)
['4202', 'Malta', 'Valletta', '320']
>>> int(next(r)[3]) * 247.105   # area of the next country in acres
3330975.4000000004
40. Non-proprietary
• Microsoft Excel is a proprietary format
– it’s owned and controlled by Microsoft
– it’s also very complicated to read
• An open alternative is CSV
• Open, standardized alternatives are
– XML, RDF, JSON
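The same record can be written in several of these open formats with the Python standard library alone. A minimal sketch, with an invented record for illustration:

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

record = {"id": "4202", "country": "Malta", "capital": "Valletta", "area": "320"}

# CSV: simplest, but carries no type or schema information
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=record.keys())
writer.writeheader()
writer.writerow(record)
as_csv = buf.getvalue()

# JSON: widely supported, preserves basic structure
as_json = json.dumps(record)

# XML: verbose, but standardized and schema-friendly
root = ET.Element("country", id=record["id"])
for field in ("country", "capital", "area"):
    ET.SubElement(root, field).text = record[field]
as_xml = ET.tostring(root, encoding="unicode")
```

The CSV version is the cheapest to produce and consume, but only the XML and JSON versions keep the field structure explicit in the data itself.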
41. Ways to publish
• Download
– this is the easiest
– just put up static data files for download
– vastly better than nothing
• API
– build an API people can use for interacting with the
data
– fashionable, but not really necessary
• Stream
– publish streams of changes, for easy syncing
– using SDshare, for example
– very useful, but can be costly
42. How hard is it to publish data?
• Depends on how you do it
• Dumping CSV from a relational database is
trivial
– can literally be as simple as just a few lines of code
• The more ambitious you are, the more
work it becomes
– filtering out sensitive records
– using better formats than CSV
– linking to other data
– documenting
– adding streaming
– ...
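The "few lines of code" claim is easy to substantiate. A hedged sketch, using Python's built-in sqlite3 as a stand-in for the real relational database (the table name and columns are invented):

```python
import csv
import io
import sqlite3

# A toy database standing in for the real relational database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (id INTEGER, name TEXT, capital TEXT, area INTEGER)")
conn.execute("INSERT INTO country VALUES (4202, 'Malta', 'Valletta', 320)")

# The actual dump: one query, one header row, one loop
out = io.StringIO()
writer = csv.writer(out)
cursor = conn.execute("SELECT id, name, capital, area FROM country")
writer.writerow([col[0] for col in cursor.description])  # column names as header
writer.writerows(cursor)                                 # one CSV row per table row
dump = out.getvalue()
```

In a real deployment `out` would be a file handed to the web server; everything else stays the same.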
43. URIs as identifiers
• URIs are globally unique
– thanks to their use of domain names
– anyone with a domain name can make URIs
• Benefits
– identifiers that can be reused anywhere, and still
remain unique
– can be resolved to an explanation of what they
identify
44. Linking to other data (1)
• Data becomes more valuable when it is
connected to other data
– because this reduces the cost of reusing and
processing the data
• Imagine reusing the data below
– how do you connect with country data from other
sources?
ID COUNTRY CAPITAL AREA
4202 Malta Valletta 320
19654 Moldova Chisinau 25333
8715 Kazakstan Almaty 2717300
Data from Mondial
45. Linking to other data (2)
• By using common URIs for your concepts
you can make data reuse much easier
ID COUNTRY CAPITAL AREA
http://dbpedia.org/resource/Malta Malta Valletta 316
http://dbpedia.org/resource/Moldova Moldova Chişinău 33846
http://dbpedia.org/resource/Kazakhstan Kazakhstan Almaty 2717300
Data from DBpedia
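With DBpedia URIs as keys, combining the country table with someone else's data set becomes a plain dictionary lookup, with no name matching or ID translation. A minimal sketch (the population figures are invented for illustration):

```python
# Our data, keyed by DBpedia URIs instead of local IDs
areas = {
    "http://dbpedia.org/resource/Malta": 316,
    "http://dbpedia.org/resource/Moldova": 33846,
}

# An external data set published with the same URIs
# (population numbers invented for illustration)
populations = {
    "http://dbpedia.org/resource/Malta": 519000,
    "http://dbpedia.org/resource/Moldova": 2600000,
}

# Joining the two is a set intersection on the shared URIs
combined = {
    uri: {"area": areas[uri], "population": populations[uri]}
    for uri in areas.keys() & populations.keys()
}
```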
46. Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
47. Linked Data principles
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
4. Include links to other URIs, so that they can discover more things
http://www.w3.org/DesignIssues/LinkedData
48. An example
<result>
  <heading>VEGMELDINGER - Viktigste</heading>
  <messages>
    <message>
      <heading>Oslo (Ulven) - Karihaugen, ved [41] Furuset</heading>
      <messagenumber>64952</messagenumber>
      <version>1</version>
      <ingress>To felt stengt på grunn av vedlikeholdsarbeid i periodene: Onsdag og torsdag fra 22:00 til 05:30 (neste dag). Fare for kø.</ingress>
      <messageType>Redusert framkommelighet</messageType>
      <urgency>X</urgency>
      <roadType>Ev</roadType>
      <roadNumber>6</roadNumber>
      <validFrom>2012-10-24 22:00:00.0 CEST</validFrom>
      <validTo>2012-10-26 05:30:00.999 CEST</validTo>
      <actualCounties>
        <string>Oslo</string>
      </actualCounties>
      <coordinates>
        <crs>EPSG:4326</crs>
        <startPoint>
          <xCoord>10.889964</xCoord>
          <yCoord>59.937266</yCoord>
        </startPoint>
      </coordinates>
Are these helpful ways to refer to route E6 and the county of Oslo?
49. An alternative
<result>
  <heading>VEGMELDINGER - Viktigste</heading>
  <messages>
    <message>
      <heading>Oslo (Ulven) - Karihaugen, ved [41] Furuset</heading>
      <messagenumber>64952</messagenumber>
      <version>1</version>
      <ingress>To felt stengt på grunn av vedlikeholdsarbeid i periodene: Onsdag og torsdag fra 22:00 til 05:30 (neste dag). Fare for kø.</ingress>
      <messageType>Redusert framkommelighet</messageType>
      <urgency>X</urgency>
      <road>http://dbpedia.org/resource/European_route_E06</road>
      <validFrom>2012-10-24 22:00:00.0 CEST</validFrom>
      <validTo>2012-10-26 05:30:00.999 CEST</validTo>
      <actualCounties>
        <county>http://dbpedia.org/resource/Oslo</county>
      </actualCounties>
      <coordinates>
        <crs>EPSG:4326</crs>
        <startPoint>
          <xCoord>10.889964</xCoord>
          <yCoord>59.937266</yCoord>
        </startPoint>
      </coordinates>
Now we're using URIs as names for these concepts. And, what's more, the names resolve to more information.
50. HTML
This is the green type of DBpedia web page you've already seen for Rælingen.
[larsga@Lars-Marius-Garshols-MacBook-Pro-6 ~]$ telnet dbpedia.org 80
Trying 194.109.129.58...
Connected to dbpedia.org (194.109.129.58).
Escape character is '^]'.
GET /page/European_route_E06 HTTP/1.0

HTTP/1.1 200 OK
Date: Wed, 24 Oct 2012 07:16:21 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 44484
Connection: close
Vary: Accept-Encoding
Server: Virtuoso/06.04.3132 (Linux) x86_64-generic-linux-glibc212-64 VDB
Accept-Ranges: bytes
Expires: Wed, 31 Oct 2012 07:16:20 GMT
Link: <http://dbpedia.org/data/European_route_E06.rdf>; rel="alternate"; type="application/rdf+xml" ...

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
  "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dbpprop="http://dbpedia.org/property/"
      xmlns:foaf="http://xmlns.com/foaf/0.1/"
      version="XHTML+RDFa 1.0" xml:lang="en">
53. Weaknesses in traditional formats
• Cannot be imported directly
– must always be translated/interpreted somehow
• Cannot be automatically merged
– no concept of identity
• Schema information is not linked
– types and properties not connected to other types
and properties
• Linked Data solves all of these problems
– Linked Data ≈ RDF
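The merging point can be made concrete: statements that use shared URIs merge by plain set union, and identical statements deduplicate themselves. A sketch, with invented ex: properties:

```python
# Two independently published sets of (subject, property, object) triples
source_a = {
    ("http://dbpedia.org/resource/Malta", "ex:capital", "Valletta"),
    ("http://dbpedia.org/resource/Malta", "ex:area", "316"),
}
source_b = {
    ("http://dbpedia.org/resource/Malta", "ex:capital", "Valletta"),  # same fact
    ("http://dbpedia.org/resource/Malta", "ex:currency", "Euro"),
}

# Because both sources name Malta with the same URI, merging is set union;
# the duplicated capital statement collapses into one
merged = source_a | source_b
```

With CSV or XML, the same merge would require knowing that two local IDs or element layouts refer to the same country.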
54. How RDF works
relational table ‘PERSON’
ID NAME EMAIL
1 Stian Danenbarger stian.danenbarger@
2 Lars Marius Garshol larsga@bouvet.no
3 Axel Borge axel.borge@bouvet
RDF-ized data
SUBJECT PROPERTY OBJECT
http://example.com/person/1 rdf:type ex:Person
http://example.com/person/1 ex:name Stian Danenbarger
http://example.com/person/1 ex:email stian.danenbarger@
http://example.com/person/2 rdf:type ex:Person
http://example.com/person/2 ex:name Lars Marius Garshol
... ... ...
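The table-to-triples conversion above can be sketched in a few lines. The example.com URIs follow the slide; the code itself is illustrative, not a real RDF library:

```python
rows = [
    {"id": 1, "name": "Stian Danenbarger"},
    {"id": 2, "name": "Lars Marius Garshol"},
]

def rdfize(row):
    """Turn one relational row into RDF-style (subject, property, object) triples."""
    subject = "http://example.com/person/%d" % row["id"]
    triples = [(subject, "rdf:type", "ex:Person")]
    for column, value in row.items():
        if column != "id":  # the ID is already encoded in the subject URI
            triples.append((subject, "ex:" + column, value))
    return triples

triples = [t for row in rows for t in rdfize(row)]
```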
56. Hafslund SESAM
• An archive system, really
• Automatically enriches metadata on documents
when archived
• To do this, must collect data from enterprise
systems
http://www.slideshare.net/larsga/hafslund-sesam-semantic-integration-in-practice
61. A simple example
• We were building an intranet service for DSS, for use in the ministries
– based on RDF
• We were looking for open data to which internal data at the ministries could be connected
– all for display on the intranet
63. RDF description of the data set, using Dublin Core and VoID
http://sws.ifi.uio.no/npd/page/LinkedOpenNPDFactPages
68. How hard would it be to use?
• We loaded the data locally in a few minutes
• We already have the ability to display
arbitrary data from RDF
• All that's missing is connections from existing data
– organization numbers are one way
– statistical analysis is another
69. Linking data without common ids
• It's possible, using statistical techniques
– in many cases it's not even that hard
– http://code.google.com/p/duke/
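The core idea behind such tools can be sketched with stdlib string similarity alone. Duke itself is far more capable, and the 0.8 threshold here is an arbitrary choice for illustration:

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Crude string similarity in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Country names as they appear in two different data sets
ours = ["Kazakstan", "Moldova"]
theirs = ["Kazakhstan", "Republic of Moldova", "Malta"]

links = []
for name in ours:
    best = max(theirs, key=lambda other: similarity(name, other))
    if similarity(name, best) > 0.8:  # arbitrary cut-off
        links.append((name, best))
```

Real tools combine several such measures per field, weight them, and tune the threshold; the principle is the same.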
70. Are there any tools out there?
(Diagram of the RDF tool landscape: modelling tools, reasoners such as Pellet, libraries such as Redland, RDF APIs, and triple stores.)
71. Linked Schemas
• Objects are identified by URI
– but so are properties and classes
• Means you can describe your properties
and classes in terms of others
– you can build on existing schemas
• RDF provides very powerful ways to do this
72. RDFS and OWL
• Schema languages for RDF
– used to describe classes and properties
– in many ways like XML Schema or a database schema
• Represented in RDF
– just like the data
– means you can say anything you want about the data
• However, it doesn’t work like you expect
– based on Open World Assumption
– based on logical reasoning
73. Open World Assumption
• That nobody has said it doesn't mean it's not true
– that we don't have the date of death doesn't mean the person is alive
– that we have two different rows in the PERSON table doesn't mean we have two different people
• In other words, data may be connected in
unexpected ways
– this usually doesn’t apply in single systems
– but when you’re on the open web...
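The distinction can be shown in miniature: under an open-world reading, an absent fact is unknown, not false. A sketch with invented field names:

```python
person = {"name": "Ada Lovelace"}  # no date_of_death recorded

# Closed-world reading: no recorded death date means the person is alive
alive_closed_world = "date_of_death" not in person

# Open-world reading: no recorded death date means we simply don't know
def alive_open_world(record):
    if "date_of_death" in record:
        return False
    return None  # unknown, not True
```

The closed-world answer here is wrong, which is exactly the slide's point about missing dates of death.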
74. Open World Assumption
Rule: the value of dc:creator must be a person
(Diagram: dc:creator is declared with rdfs:range person, and person is owl:disjointWith sheep; when some dc:creator value turns out to be rdf:type sheep, a reasoner does not reject the data. Instead it infers that the sheep is also a person, and only the disjointness axiom turns this into a contradiction.)
75. Is Kyoto in Asia?
(Diagram: located-in is an instance of owl:TransitiveProperty; Kyoto is located-in Japan, and Japan is located-in Asia.)
select ?c where { Kyoto located-in ?c . }
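What the transitive property buys can be sketched as a transitive closure over located-in facts; a reasoner does essentially this (plus much more):

```python
located_in = {
    ("Kyoto", "Japan"),
    ("Japan", "Asia"),
}

def closure(pairs):
    """Transitive closure of a set of (x, y) pairs."""
    result = set(pairs)
    while True:
        # Compose every pair (x, y) with every pair (y, z) to get (x, z)
        extra = {(x, z) for (x, y1) in result for (y2, z) in result if y1 == y2}
        if extra <= result:  # nothing new inferred: done
            return result
        result |= extra

inferred = closure(located_in)
# The query "select ?c where { Kyoto located-in ?c . }" now matches both
answers = {c for (place, c) in inferred if place == "Kyoto"}
```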
76. One model can extend another
(Diagram: a photo ontology with classes ph:Person, ph:Photo, ph:Event, ph:Category and ph:Place, and properties such as ph:depiction-of, ph:taken-by, ph:depicted-in and ph:contained-in, builds on FOAF (friend of a friend): ph:depiction-of is declared owl:inverseOf foaf:depiction, tying the photo classes to foaf:Person and foaf:Image, while categories and places connect to SKOS and DBpedia. The photo ontology itself is not developed yet.)
77. Serious logic
• Bus drivers are people who drive buses
• Drivers are people who drive vehicles
• Therefore bus drivers are drivers!
(Diagram: in OWL, ex:BusDriver is defined as the owl:intersectionOf ex:Person and a restriction on ex:drives with owl:someValuesFrom ex:Bus; ex:Driver is the intersection of ex:Person and a restriction on ex:drives with owl:someValuesFrom ex:Vehicle. Since ex:Bus is an owl:subClassOf ex:Vehicle, a reasoner concludes that ex:BusDriver is a subclass of ex:Driver.)
http://owl.man.ac.uk/2003/why/latest/
78. But ... you have to speak logic
• A serious challenge for
most people
• Must be very precise
about what you say
• People mostly use just
small fractions of OWL
80. Sensitive data
• Some data is sensitive due to
– privacy concerns
– public security concerns
• These concerns are real, and must be
addressed
– data may need to be filtered, or
– in the worst case, not published at all
• A rule of thumb
– if it’s available on paper or in human-readable form,
machine-readable should be OK, too
81. Will people understand the data?
• People can and will misunderstand
anything
– this is not your responsibility
– but you can help
• How to avoid it
– document the data
– use self-describing linked data
82. Capacity issues
• What if the data becomes too popular? Could there be overload issues?
– yes, this happens
• What to do
– scale up (more hardware)
– implement download restrictions, and charge for
use above limit
– let a data hotel host the data
• http://data.norge.no/datahotellet
83. Data quality
• What if the data is not 100% correct?
• No data set is perfect
– there are always problems with data
– in the worst case you can add a disclaimer
– all use is at own risk, in any case
• Rules of thumb
– if you take the trouble to use and maintain the data, it’s good
enough for the public, too
– if the data is too poor to publish, you should probably delete it
84. Media outrage
• What if someone analyses the data and
finds evidence of a scandal?
• Does that mean it was wrong to publish
the data?
• Discuss!
86. Conclusion
• Open data is important
– for democracy, and for the economy
• Open data is Norwegian gov’t policy
– (per letter from FAD in 2010)
• There are different kinds of open
– human-readable is good
– machine-readable is easier, and often better
– linked data is best (but not necessary)
87. Where to learn more
• These slides
– http://slideshare.net/larsga
• FAD's re-use guide (viderebruksveileder)
– http://no.wikibooks.org/wiki/Viderebruksveileder
• Free ebook on Linked Open Data
– http://linkeddatabook.com/editions/1.0/