Presentation on the Data Cube vocabulary to support Linked Data publication of statistics and measurement data sets. Given at SemTech 2011, San Francisco.
Creating a Modern Data Architecture for Digital Transformation – MongoDB
By managing Data in Motion, Data at Rest, and Data in Use differently, modern Information Management Solutions are enabling a whole range of architecture and design patterns that allow enterprises to fully harness the value in data flowing through their systems. In this session we explored some of the patterns (e.g. operational data lakes, CQRS, microservices and containerisation) that enable CIOs, CDOs and senior architects to tame the data challenge, and start to use data as a cross-enterprise asset.
Linking Services and Linked Data: Keynote for AIMSA 2012 – John Domingue
An overview of the approach, principles and technologies supporting how services and Linked Data can be combined to support the creation of applications.
iPRES 2011: The Costs and Economics of Preservation – neilgrindley
To introduce and describe some of the work that has been done to help institutions and research groups understand both the costs and the economics of preservation
To describe ongoing phases of JISC-funded work that are attempting to further advance understanding and implement approaches in this area
To give some indication of where collective international effort may be of universal benefit.
Adoption of Cloud Computing in Scientific Research – Yehia El-khatib
Some might say the scientific research community is somewhat behind the curve of adopting the cloud. In this talk, I present a few examples of adopting the cloud from the wider research community. I also highlight some of the aspects by which cloud computing could affect scientific research in the near future and the associated challenges.
On Data Quality Assurance and Conflation Entanglement in Crowdsourcing for En... – Greenapps&web
Volunteered geographical information (VGI), whether in the context of citizen science, active crowdsourcing, or even passive crowdsourcing, has proven useful in various societal domains such as natural hazards, health status, disease epidemics and biological monitoring. Nonetheless, the variable or unknown quality arising from the crowdsourcing setting is still an obstacle to fully integrating these data sources into environmental studies and, potentially, policy making. The data curation process, within which quality assurance (QA) is needed, is often driven by the direct usability of the collected data within a data conflation or data fusion (DCDF) process that combines the crowdsourced data into one view, potentially using other data sources as well. Using two examples, namely land cover validation and inundation extent estimation, this paper discusses the close links between QA and DCDF in order to determine whether disentangling them can benefit our understanding of the data curation process and its methodology with respect to crowdsourced data. Far from rejecting the usability quality criterion, the paper advocates decoupling the QA process from the DCDF step as much as possible, while still integrating them within an approach analogous to a Bayesian paradigm.
ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume II-3/W5, 2015
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem – Shirshanka Das
Shirshanka Das and Yael Garten describe how LinkedIn redesigned its data analytics ecosystem in the face of a significant product rewrite, covering the infrastructure changes that enable LinkedIn to roll out future product innovations with minimal downstream impact. Shirshanka and Yael explore the motivations and the building blocks for this reimagined data analytics ecosystem, the technical details of LinkedIn’s new client-side tracking infrastructure, its unified reporting platform, and its data virtualization layer on top of Hadoop and share lessons learned from data producers and consumers that are participating in this governance model. Along the way, they offer some anecdotal evidence during the rollout that validated some of their decisions and are also shaping the future roadmap of these efforts.
Architecting for change: LinkedIn's new data ecosystem – Yael Garten
2016 Strata + Hadoop World NYC conference talk.
http://conferences.oreilly.com/strata/hadoop-big-data-ny/public/schedule/detail/52182
Abstract:
Last year, LinkedIn embarked on an ambitious mission to completely revamp the mobile experience for its members. This would mean a completely new mobile application, reimagined user experiences, and new interaction concepts. As the team evaluated the impact of this big rewrite on the data analytics ecosystem, they observed a few problems.
Over the past few years, LinkedIn has become extremely good at incrementally changing the site one mini-feature at a time, often in conjunction with hundreds of other incremental changes. LinkedIn’s experimentation platform ensures that it is always monitoring a wide gamut of impacted metrics with every change before rolling fully forward. However, when it comes to rolling out a big change like this, different challenges crop up. You have to roll out the entire application all at once; the new experience means that you have no baseline on new metrics; and existing metrics may see double-digit changes just because of the new experience or because the metric’s logic is no longer accurate; the challenge is figuring out which is which.
Shirshanka Das and Yael Garten describe how LinkedIn redesigned its data analytics ecosystem in the face of a significant product rewrite, covering the infrastructure changes that enable LinkedIn to roll out future product innovations with minimal downstream impact. Shirshanka and Yael explore the motivations and the building blocks for this reimagined data analytics ecosystem, the technical details of LinkedIn’s new client-side tracking infrastructure, its unified reporting platform, and its data virtualization layer on top of Hadoop and share lessons learned from data producers and consumers that are participating in this governance model. Along the way, they offer some anecdotal evidence during the rollout that validated some of their decisions and are also shaping the future roadmap of these efforts.
The NIH Data Commons - BD2K All Hands Meeting 2015 – Vivien Bonazzi
Presentation given at the BD2K All Hands meeting in Bethesda, MD, USA in November 2015
https://datascience.nih.gov/bd2k/events/NOV2015-AllHands
Video cast of this presentation:
http://videocast.nih.gov/summary.asp?Live=17480&bhcp=1
The talk starts at 2 hrs 40 min (it's about 55 minutes long) and includes video!
Document describing the Commons: https://datascience.nih.gov/commons
COBWEB A quality assurance workflow authoring tool for citizen science and cr... – COBWEB Project
Presented by Didier Leibovici, Julian Rosser, Mike Jackson (Nottingham Geospatial Institute, University of Nottingham) at the COBWEB Summit, a side event of the Open Geospatial Consortium's (OGC) 99th Technical & Planning Committee (TC/PC) Meeting held at University College Dublin, 2016.
Models Done Better... - UDG2018 - Intertek and DHI – Stephen Flood
Use of integrator systems (operational data and model management platforms) to enhance model performance and value.
Presented at the CIWEM Urban Drainage Group Annual Conference 2018
Richard Dannatt - Intertek
Steve Flood - DHI
Big Data Paris - A Modern Enterprise Architecture – MongoDB
Since the 1980s, the volume of data produced, and the risk attached to that data, have literally exploded. 90% of the data in existence today was created in the last two years, and 80% of it is unstructured. With more users and a need for permanent availability, the risks are much higher.
Which database parameters should a decision-maker take into account in order to deploy innovative applications?
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024 – Tobias Schneck
As AI pushes into IT, I asked myself, as an "infrastructure container Kubernetes guy", how this fancy AI technology gets managed from an infrastructure operations point of view. Is it possible to apply our beloved cloud-native principles to it as well? What benefits could the two technologies bring to each other?
Let me take these questions and walk you through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premises strategy we may need to apply AI to our own infrastructure and make it work from an enterprise perspective. I give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I already have working in practice.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf – 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... – DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
UiPath Test Automation using UiPath Test Suite series, part 4 – DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimizing testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... – Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. Fostering a culture of innovation, however, takes real work: it takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at every stage.
The Art of the Pitch: WordPress Relationships and Sales – Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips and strategies for successful relationship building that leads to closing the deal.
Key Trends Shaping the Future of Infrastructure.pdf – Cheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open source: exploring how these areas are likely to mature and develop over the short and long term, and how organisations can position themselves to adapt and thrive.
Neuro-symbolic is not enough, we need neuro-*semantic* – Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply doing machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These gains will only be realized when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
GraphRAG is All You need? LLM & Knowledge Graph – Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Transcript: Selling digital books in 2024: Insights from industry leaders - T... – BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Elevating Tactical DDD Patterns Through Object Calisthenics – Dorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
State of ICS and IoT Cyber Threat Landscape Report 2024 preview – Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Epistemic Interaction - tuning interfaces to provide information for AI support – Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
3. Linked Data journey ...
explore
what is linked data?
what use is it for us?
4. Linked Data journey ...
self-describing: carries semantics with it; annotate and explain; data in context
integration: comparable; slice and dice; web API
5. Linked Data journey ...
what’s involved?
7. Linked Data journey ... explore → pilot → routine?
Great pilot, but ...
can we reduce the time and cost?
how do we handle changes and updates?
how can we make the published data easier to use?
How do we make Linked Data “business as usual”?
8. Example case study: Environment Agency
monitoring of bathing water quality
static pilot: historic annual assessments
live pilot: weekly assessments
operational system: additional data feeds; live update; integrated API; data explorer
9. From pilot to practice
reduce modelling costs: patterns; reuse (dive 1)
handling change and update: patterns
publication process: automation; conversion; publication
embed in the business process: use internally as well as externally; publish once, use many; data platform
10. Reduce costs - modelling
1. Don’t do it
map source data into isomorphic RDF, synthesize URIs
loses some of the value proposition
2. Reuse existing ontologies intact or mix-and-match
best solution when available
W3C GLD work on vocabularies – people, organizations, datasets ...
3. Reusable vocabulary patterns
example: Data cube plus reference URI sets
adaptable to broad range of data – environmental, statistical, financial ...
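Option 1 above ("don't do it": map source rows into isomorphic RDF, synthesizing URIs) can be sketched in a few lines. The base URI and the column names below are hypothetical illustrations, not taken from the deck:

```python
# Sketch of option 1: map a tabular row isomorphically into
# (subject, predicate, object) triples, synthesizing the subject URI
# from the row's key column. BASE and the columns are hypothetical.
BASE = "http://example.org/id/bathing-water/"

def row_to_triples(row: dict, key: str = "id") -> list:
    """Turn one CSV-like row into simple string triples."""
    subject = BASE + str(row[key])
    return [(subject, BASE + "def/" + col, str(val))
            for col, val in row.items() if col != key]

triples = row_to_triples({"id": "ukk1202-36000",
                          "name": "Clevedon Beach",
                          "classification": "Higher"})
```

The result is structurally faithful to the source table, which is exactly why, as the slide notes, it loses some of the value proposition: no shared semantics are added.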
11. Reusable patterns: Data cube
Much public sector data has regularities
set of measures: observations, forecasts, budgets, assessments, statistics ...
[slide figure: sample measure values such as “>0.1”, “34”, “27”, “good”, “excellent”, “poor”, “125”]
12. Reusable patterns: Data cube
Much public sector data has regularities
sets of measures: observations, forecasts, budgets, assessments, estimates ...
organized along some dimensions: region, agency, time, category, cost centre ...
[slide figure: a “spend” measure laid out along time, cost centre and objective code dimensions, with values 12, 15, 25 / 8, 9, 11 / 120, 130, 180]
13. Reusable patterns: Data cube
Much public sector data has regularities
sets of measures: observations, forecasts, budgets, assessments, estimates ...
organized along some dimensions: region, agency, time, category, cost centre ...
interpreted according to attributes: units, multipliers, status
[slide figure: the same spend cube with attribute-qualified values – provisional: $12k, $15k, $25k and $8k, $9k, $11k; final: $120k, $130k, $180k]
15. Data cube pattern
Pattern, not a fixed ontology
customize by selecting measures, dimensions and attributes
originated in publishing of statistics
applied to environment measurements, weather forecasts, budgets and spend, quality assessments, regional demographics ...
Supports reuse
widely reusable URI sets – geography, time periods, agencies, units
organization-wide sets
modelling often only requires small increments on top of core pattern and reusable components
opens door for reusable visualization tools
standardization through W3C GLD
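As a minimal sketch of the pattern (not the actual W3C vocabulary), an observation can be modelled as one measure value, located by dimension values and qualified by attributes; all field names and values here are illustrative:

```python
# Minimal sketch of the Data Cube pattern: an observation = one
# measure value + the dimensions that locate it + the attributes
# that say how to interpret it. Names/values are illustrative.
from dataclasses import dataclass

@dataclass(frozen=True)
class Observation:
    measure: str        # e.g. "spend"
    value: float
    dimensions: tuple   # e.g. (("time", "2011"), ("costCentre", "cc1"))
    attributes: tuple = ()  # e.g. (("unit", "GBP"), ("status", "provisional"))

obs = Observation(
    measure="spend", value=12_000.0,
    dimensions=(("time", "2011"), ("costCentre", "cc1")),
    attributes=(("unit", "GBP"), ("status", "provisional")),
)
```

Customizing the pattern then amounts to choosing which measures, dimensions and attributes a given cube uses, as the slide describes.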
16. Application to case study
Data Cubes for water quality measurement
in-season weekly assessments
end-of-season annual assessments
dimensions:
time intervals – UK reference time service
location – reference URI set for bathing waters and sample points
cubes can reuse these dimensions; just need to define specific measures
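The reuse argument on this slide can be sketched as follows. The two dimension URI patterns follow the example URIs used elsewhere in the deck; the cube and measure names are hypothetical:

```python
# Sketch of dimension reuse: two cubes share the same reference
# dimension URI sets and differ only in their measures.
# Cube/measure names are hypothetical illustrations.
SHARED_DIMENSIONS = {
    "sampleYear": "http://reference.data.gov.uk/id/year/{year}",
    "bathingWater": "http://environment.data.gov.uk/id/bathing-water/{code}",
}

def make_cube(name: str, measures: list) -> dict:
    """A cube definition = shared reference dimensions + its own measures."""
    return {"name": name, "dimensions": SHARED_DIMENSIONS, "measures": measures}

weekly = make_cube("in-season weekly assessment", ["classification"])
annual = make_cube("end-of-season annual assessment", ["classification"])
```

Because both cubes point at the same dimension definitions, modelling a new cube is a small increment: declare its measures and reuse everything else.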
17. From pilot to practice
reduce modelling costs: patterns; reuse
handling change and update: patterns (dive 2)
publication process: automation; conversion; publication
embed in the business process: use internally as well as externally; publish once, use many; data platform
18. Handling change
critical challenge
most initial pilots choose a snapshot dataset – and go stale, fast
understanding the nature of data updates and how to handle them is critical to successfully scaling to business as usual
types of change:
new data related to a different time period
corrections to data
entities change: properties; identity
19. Modelling change
1. Individual data items relate to a new time period
Pattern: n-ary relation
observation resource relates value to time period and other context
use Data Cube dimensions for this
[slide figure: the bathing water http://environment.data.gov.uk/id/bathing-water/ukk1202-36000 (“Clevedon Beach”) linked to three observations – bwq:sampleYear http://reference.data.gov.uk/id/year/2009 with bwq:classification Higher; year 2010 with classification Minimum; year 2011 with classification Higher]
History or latest?
latest is non-monotonic but helpful for many practical uses
materialize (SPARQL Update), implement in query, implement in API
choice whether to keep history as well
water quality vs. weather forecasts
21. Modelling change
3. Mutation
Infrequent change of properties, essential identity remains
e.g. renaming a school, adding another building
routine accesses see property value, not function of time
patterns
in place update
named graphs
current graph + graphs for each previous state + meta-graph
explicit versioning with open periods
22. Modelling change
3. Mutation
explicit versioning with open periods
[diagram: an endurant bathing-water resource with dct:hasVersion links to two
versions: “Clevedon Beach”, whose dct:valid interval time:intervalStarts 2003
and time:intervalFinishes 2011, and “Clevedon Sands”, whose dct:valid interval
time:intervalStarts 2011 and remains open]
find right version by query on validity interval
simplify use through
non-monotonic “latest value” link
API to implement query filters automatically
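A hedged Turtle sketch of open-period versioning (version URIs are hypothetical; the dct and OWL-Time properties are those named on the slide):

```turtle
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix time: <http://www.w3.org/2006/time#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# the endurant keeps a stable identity across renamings
<http://example.org/id/bathing-water/ukk1202-36000>
    dct:hasVersion <http://example.org/id/bathing-water/ukk1202-36000/v1> ,
                   <http://example.org/id/bathing-water/ukk1202-36000/v2> .

<http://example.org/id/bathing-water/ukk1202-36000/v1>
    rdfs:label "Clevedon Beach" ;
    dct:valid [ time:intervalStarts   <http://reference.data.gov.uk/id/year/2003> ;
                time:intervalFinishes <http://reference.data.gov.uk/id/year/2011> ] .

<http://example.org/id/bathing-water/ukk1202-36000/v2>
    rdfs:label "Clevedon Sands" ;
    # open period: no intervalFinishes yet, so this is the current version
    dct:valid [ time:intervalStarts <http://reference.data.gov.uk/id/year/2011> ] .
```

Queries then find the right version by filtering on the validity interval, or follow the non-monotonic “latest value” link maintained separately.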
23. Application to case study
weekly and annual samples
use Data Cube pattern (n-ary relation)
withdrawn samples
replacement pattern (no explicit change event)
Data Cube slice for “latest valid assessment”
generated by a SPARQL Update query
API gives easy access to the latest valid values
linked data following or raw SPARQL query allows drilling into changes
changes to bathing water profile
versioning pattern
bathing water entity points to latest profile (SPARQL Update again)
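Materializing a “latest” view with SPARQL Update might look roughly like this (a sketch only: bwq:latestAssessment is a hypothetical property, and the string comparison relies on the year URIs sorting chronologically):

```sparql
PREFIX bwq: <http://environment.data.gov.uk/def/bathing-water-quality/>

# drop any stale "latest" link, then point each bathing water at its
# most recent assessment
DELETE { ?bw bwq:latestAssessment ?old }
INSERT { ?bw bwq:latestAssessment ?obs }
WHERE {
  { SELECT ?bw (MAX(STR(?y)) AS ?latest)
    WHERE { ?o bwq:bathingWater ?bw ; bwq:sampleYear ?y . }
    GROUP BY ?bw }
  ?obs bwq:bathingWater ?bw ; bwq:sampleYear ?year .
  FILTER ( STR(?year) = ?latest )
  OPTIONAL { ?bw bwq:latestAssessment ?old }
}
```

Re-running the script after each load leaves the link correct, which is what makes the non-monotonic “latest” view cheap to maintain.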
24. From pilot to practice
reduce modelling costs
patterns
reuse
handling change and update
patterns
publication process
automation
conversion (dive 3)
publication
embed in the business process
use internally as well as externally
publish once, use many
data platform
25. Automation
Transform and publish data feed increments
transformation engine service
reusable mappings, low cost to adapt to new feeds
linking to reference data
publication service that supports non-monotonic changes
[diagram: data increments (csv) feed a transform service driven by reusable
xform specs, with reconciliation against reference data; transformed output
goes to a publication service feeding replicated servers]
26. Transformation service
declarative specification of transform
single service support range of transformations
easy to adapt transformation to new feeds and modelling
changes
R2RML – RDB to RDF Mapping Language
specify mapping from database tables to RDF triples
W3C candidate recommendation
D2RML
R2RML extension to treat CSV feed as a database table
28. Using patterns
raw mappings are verbose, which increases reuse costs
extend to support modelling patterns
Data Cube
specify mapping to observation with measures and dimensions
engine generates Data Set and Data Structure Definition
automatically
29. D2RML cube map example
:dataCubeMap a dr:DataCubeMap ;
    rr:logicalTable "dataSource" ;
    # instances will automatically link to the base Data Set
    dr:datasetIRI "http://example.org/datacube1"^^xsd:anyURI ;
    dr:dsdIRI "http://example.org/myDsd"^^xsd:anyURI ;
    dr:observationMap [
        rr:subjectMap [
            rr:termType rr:IRI ;
            rr:template "http://example.org/observation/{PLACE}/{DATE}" ] ;
        # implies an entry in the auto-generated Data Structure Definition
        rr:componentMap [
            dr:componentType qb:measure ;
            rr:predicate aq:concentration ;
            # defines how the measure value is to be represented
            rr:objectMap [ rr:column "NO2" ; rr:datatype xsd:decimal ; ]
        ] ;
    ] ;
    ...
30. But what about linking?
connect observations to reference data
a core value of linked data
R2RML has Term Maps to create values
constants and templates
extend to allow maps based on other data sources
Lookup map
lookup resource in a store, fetch predicate
Reconcile
specify lookup in a remote service
use Google Refine reconciliation API
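A lookup map might look roughly like this — note that the dr:lookup* properties and the dr namespace below are purely illustrative names, not actual D2RML syntax:

```turtle
@prefix rr:   <http://www.w3.org/ns/r2rml#> .
@prefix dr:   <http://example.org/def/d2rml#> .   # hypothetical namespace
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/def/> .

# illustrative only: a lookup term map that swaps a raw CSV string
# for the matching reference-data URI
[] rr:predicateObjectMap [
    rr:predicate ex:samplingPoint ;
    rr:objectMap [
        dr:lookupStore    <http://reference.example.org/sparql> ; # store holding the reference set
        dr:lookupProperty rdfs:label ;   # match the CSV value against this property
        rr:column         "SITE_NAME"    # CSV column supplying the lookup key
    ]
] .
```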
31. Automation
Transform and publish data feed increments
transformation engine service
reusable mappings, low cost to adapt to new feeds
linking to reference data
publication service that supports non-monotonic changes
32. Publication service
goals
cope with non-monotonic effects of change representation
so replication is robust and cheap (=> make it idempotent)
solution
SPARQL Update
publish transformed increment as a simple DATA INSERT
then run SPARQL Update script for non-monotonic links
dct:isReplacedBy links
latest value slices
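The two steps can be sketched as a SPARQL Update request (resource and property names are hypothetical, apart from dct:isReplacedBy):

```sparql
PREFIX bwq: <http://environment.data.gov.uk/def/bathing-water-quality/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# step 1: publish the transformed increment verbatim -- monotonic and
# idempotent, so a failed replication can simply be re-run
INSERT DATA {
  <http://example.org/data/sample/36000-2011-06-01b> a bwq:Sample ;
      bwq:samplePoint <http://example.org/id/sample-point/36000> ;
      bwq:sampleDate  "2011-06-01"^^xsd:date ;
      bwq:issued      "2011-06-03T09:00:00Z"^^xsd:dateTime .
} ;

# step 2: recompute the non-monotonic links, e.g. a resubmitted sample
# for the same point and date supersedes the earlier one
INSERT { ?old dct:isReplacedBy ?new }
WHERE {
  ?old a bwq:Sample ; bwq:samplePoint ?pt ; bwq:sampleDate ?d ; bwq:issued ?t1 .
  ?new a bwq:Sample ; bwq:samplePoint ?pt ; bwq:sampleDate ?d ; bwq:issued ?t2 .
  FILTER ( ?t2 > ?t1 )
}
```

Inserting triples that already exist is a no-op, so step 2 is also idempotent and safe to replay across the replicated servers.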
34. Automation
Transform and publish data feed increments
transformation engine service
reusable mappings, low cost to adapt to new feeds
linking to reference data
publication service that supports non-monotonic changes
35. Application to case study
Update server
transforms based on scripts (earlier scripting utility)
linking to reference data
distributed publication via
SPARQL Update
extensible range of data sets
annual assessments
in-season assessments
bathing water profile
features (e.g. pollution sources)
reference data
36. From pilot to practice
reduce modelling costs
patterns
reuse
handling change and update
patterns
publication process
automation
conversion
publication
embed in the business process (dive 4)
use internally as well as externally
publish once, use many
data platform
37. Embed in business process
embedding is critical to ensure data kept up to date
in turn needs usage
=> lower barrier to use
[diagram: two feedback loops – data not used → data goes stale → investment
hard to justify; external use + internal use → invest → rich, up-to-date data]
38. Lowering barrier to use
simple REST APIs
use Linked Data API specification
rich query without learning SPARQL
easy consumption as JSON, XML
gets developers used to data and data model
[diagram: the LD API layered over the publication service, alongside the
transform service]
39. Application to case study
embedded in process for weekly/daily updates
infrastructure to automate conversion and publishing
API plus extensive developer documentation
third party and in-house applications built over API
publish once, use many
information products as applications over a data platform,
usable externally as well as internally
40. The next stage
grow range of data publications and uses
range of reference data and sets brings new challenges
discover reference terms and models to reuse
discover datasets to use for application
discover models and links between sets
needs a coordination or registry service
story for another day ...
41. Conclusions
illustrated how public sector users of linked data are moving
from static pilots to operational systems
keys are:
reduce modelling costs through patterns and reuse
design for continuous update
automation of publication using declarative mappings and
SPARQL Update
lower barrier to use through API design and documentation
embed in organization’s process so the data is used and useful
Acknowledgements
Only possible thanks to many smart colleagues: Stuart
Williams, Andy Seaborne, Ian Dickinson, Brian McBride,
Chris Dollin
plus Alex Coley and team from the Environment Agency