This document discusses the importance of provenance and trust in linked data. It begins by explaining that provenance, which records the origin and history of data, is key to assessing data reliability and supporting reproducibility. Maintaining provenance allows users to evaluate the trustworthiness of data sources and contents. When applied to linked data, provenance can help address issues of bias, establish attribution, and enable automatic reasoning across interlinked datasets. The document provides examples of how provenance could improve trust in personal cloud services and scientific knowledge preservation on the semantic web.
"At the toolbar (menu, whatever) associated with a document there is a button marked "Oh, yeah?". You press it when you lose that feeling of trust. It says to the Web, 'so how do I know I can trust this information?'. The software then goes directly or indirectly back to metainformation about the document, which suggests a number of reasons."
Tim Berners-Lee, W3C Chair, Web Design Issues, September 1997
Provenance focuses on describing and understanding where and how data is produced, the actors involved in its production, and the processes by which the data was manipulated and transformed until it arrived at the collection from which it is being accessed. Provenance aims to provide the ability to trace the sources of data, enabling the exploration not just of the relationships between datasets, but also of their authors and affiliations, with the goal of preserving data ownership and establishing a notion of trust based on authenticity and reliability.
The Future Internet poses important challenges for provenance, derived from complex and rich scenarios characterized by large amounts of data stemming from heterogeneous sources like user communities, services, and things. Such challenges span technical as well as socioeconomic dimensions. The former includes aspects like vocabularies for representing provenance, interoperability and scalability issues, and means to produce, acquire, and reason with provenance in order to provide measures of trust and information quality. However, it is probably in the socioeconomic dimension where more significant effort is needed, to address issues like the role of provenance in the overall picture of the Future Internet, entry barriers preventing the generation of provenance-aware internet content, means required to incentivize the production of such content, and ways to prevent provenance forgery.
In this talk, we provide an overview of provenance and the above-mentioned challenges and introduce ongoing work to address trust issues from the provenance perspective in the Future Internet. We also link provenance to other aspects relevant to trust discussed in the session, like security, legal frameworks, and economics.
Movie Recommendation with DBpedia
Roberto Mirizzi, Tommaso Di Noia, Azzurra Ragone, Vito Claudio Ostuni, Eugenio Di Sciascio
3rd Italian Information Retrieval Workshop (IIR 2012) - Bari
January 26, 2012
In this paper we present MORE (acronym of MORE than MOvie REcommendation), a Facebook application that semantically recommends movies to the user leveraging the knowledge within Linked Data and the information elicited from her profile. MORE exploits the power of social knowledge bases (e.g. DBpedia) to detect semantic similarities among movies. These similarities are computed by a semantic version of the classical Vector Space Model (sVSM), applied to semantic datasets. Precision and recall experiments prove the validity of our approach for movie recommendation. MORE is freely available as a Facebook application.
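The abstract does not spell out the sVSM computation, but the intuition can be sketched: each movie becomes a feature vector over the DBpedia resources it links to, and similarity is the cosine between those vectors. Below is a minimal, hypothetical Python sketch; the movie titles, properties, and binary weighting are illustrative assumptions, not details from the paper.

```python
from math import sqrt

# Hypothetical feature sets: each movie is described by the set of DBpedia
# resources it links to (director, starring actors, subject categories, ...).
movies = {
    "Pulp_Fiction": {"dbo:director/Quentin_Tarantino",
                     "dbo:starring/Uma_Thurman",
                     "dct:subject/Crime_films"},
    "Kill_Bill": {"dbo:director/Quentin_Tarantino",
                  "dbo:starring/Uma_Thurman",
                  "dct:subject/Martial_arts_films"},
    "Toy_Story": {"dbo:director/John_Lasseter",
                  "dct:subject/Animated_films"},
}

def cosine(a, b):
    """Cosine similarity of two binary feature vectors, given as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / (sqrt(len(a)) * sqrt(len(b)))

def recommend(seed, k=2):
    """Rank the other movies by semantic similarity to the seed movie."""
    scores = {title: cosine(movies[seed], features)
              for title, features in movies.items() if title != seed}
    return sorted(scores.items(), key=lambda item: -item[1])[:k]

print(recommend("Pulp_Fiction"))
# [('Kill_Bill', 0.666...), ('Toy_Story', 0.0)]
```

The real system weights features by their discriminative power rather than treating all links equally; the binary vectors above are only the simplest instance of the idea.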
"At the toolbar (menu, whatever) associated with a document there is a button marked "Oh, yeah?". You press it when you lose that feeling of trust. It says to the Web, 'so how do I know I can trust this information?'. The software then goes directly or indirectly back to metainformation about the document, which suggests a number of reasons."
Tim Berners-Lee, W3C Chair, Web Design Issues, September 1997
Provenance is focused on the description and understanding of where and how data is produced, the actors involved in the production of such data, and the processes by which the data was manipulated and transformed until it arrived to the collection from which it is being accessed. Provenance aims at providing the ability to trace the sources of data, enabling the exploration not just of the relationships between datasets, but also of their authors and affiliations, with the goal of preserving data ownership and establishing a notion of trust based on authenticity and reliability.
The Future Internet poses important challenges for provenance, derived from complex and rich scenarios characterized by the presence of large amounts of data stemming from heterogeneous sources like user communities, services, and things. Such challenges span across technical but also socioeconomic dimensions. The former includes aspects like vocabularies for representing provenance, interoperability and scalability issues, and means to produce, acquire, and reason with provenance in order to provide measures of trust and information quality. However, it is probably in the socieconomic dimension where more significant efforts need to be made as to addressing issues like the role of provenance in the overall picture of the Future Internet, entry barriers preventing the generation of provenance-aware internet content, means required to incentivate the production of such content, and ways to prevent provenance forgery.
In this talk, we provide and overview on provenance and the above mentioned challenges and introduce ongoing work in order to address trust issues from the provenance perspective in the Future Internet. We also link provenance to other relevant aspects for trust discussed in the session, like security, legal frameworks, and economics.
Movie Recommendation with DBpedia
Roberto Mirizzi, Tommaso Di Noia, Azzurra Ragone, Vito Claudio Ostuni, Eugenio Di Sciascio
3rd Italian Information Retrieval Workshop (IIR 2012) - Bari
January 26, 2012
In this paper we present MORE (acronym of MORE than MOvie REcommendation), a Facebook application that semantically recommends movies to the user leveraging the knowledge within Linked Data and the information elicited from her profile. MORE exploits the power of social knowledge bases (e.g. DBpedia) to detect semantic sim- ilarities among movies. These similarities are computed by a Semantic version of the classical Vector Space Model (sVSM), applied to semantic datasets. Precision and recall experiments prove the validity of our ap- proach for movie recommendation. MORE is freely available as a Facebook application.
This talk introduces Linked Data and the Semantic Web using two examples: a population sciences grid and SemantAqua, a semantically enabled environmental monitoring system. It shows a few tools and the semantic methodology, and opens a discussion on LOD and team science.
Semantics empowered Physical-Cyber-Social Systems for EarthCube - Amit Sheth
Presentation at the EarthCube Face-to-Face Workshop of the Semantics & Ontologies Workgroup: April 30-May 1, 2012, Ballston, VA.
Workshop site: http://earthcube.ning.com/group/semantics-and-ontologies/page/workshops
For more recent material on this topic, see: http://wiki.knoesis.org/index.php/PCS
Want to get involved in our big data/big content efforts? Direct Tweet me at jmancini77 -- I also did a blog post on this topic -- http://www.digitallandfill.org/2012/03/big-data-and-big-content-just-hype-or-a-real-opportunity.html
Linked Data Warehouses: A new breed of Business Intelligence - 3 Round Stones
Using a Linked Data approach for publication & consumption of data on the Web is significantly reducing the costs and complexity of reaching many more consumers of your content. This presentation highlights how Best Buy, BBC, US EPA and Sentara Healthcare are leveraging a Linked Data approach. Session delivered at Enterprise Data World 2012 in Atlanta GA, USA on 2-May-2012.
We present an approach to the acquisition of process knowledge for the natural sciences. The work has been conducted within Project Halo, which is creating advanced knowledge authoring and question answering systems for the natural sciences. An analysis of AP®-level questions for Biology, Chemistry and Physics uncovered that process knowledge is the single most frequent type of knowledge required. Thus, we developed means to acquire process knowledge, to formally represent it, and to reason about it in order to answer novel questions about the domains.
All these tasks are supported by an abstract process meta-model. It provides the terminology for user-tailored process diagrams, which are automatically translated into executable FLogic code. The meta-model and the code generation are based on the notion of Problem Solving Methods (PSMs), which represent an abstract formalization of the reasoning strategies needed for processes.
This talk is about process knowledge and how users without any kind of IT skills can be enabled to i) model processes and ii) analyze the provenance of process executions, without the intervention of software or knowledge engineers. Jose Manuel proposes the use of Problem Solving Methods (PSMs) as key enablers of these objectives and demonstrates the solutions developed, evaluated in the contexts of Project Halo and the Provenance Challenge, respectively. Jose Manuel concludes the talk with a process-centric overview of the challenges raised by the new web-driven computing paradigm, where large amounts of data are contributed and exploited by users on the web, requiring scalable, non-monotonic reasoning techniques as well as stimulating collaboration while preserving trust.
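Neither abstract shows the meta-model or the generated FLogic code, but the idea of "executable process knowledge" can be given a rough shape. The following Python sketch is a hypothetical, much-simplified stand-in: processes are lists of steps with inputs and outputs, and a naive simulator derives the resulting state. All names and the toy biology example are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    """One process step: consumes some entities, produces others."""
    name: str
    consumes: set
    produces: set

@dataclass
class Process:
    name: str
    steps: list = field(default_factory=list)

    def simulate(self, state):
        """Fire each step whose inputs are available in the current state."""
        for step in self.steps:
            if step.consumes <= state:
                state = (state - step.consumes) | step.produces
        return state

# Toy example (invented, not from Project Halo): first step of glycolysis.
glycolysis = Process("glycolysis", [
    Step("phosphorylation", {"glucose", "ATP"},
         {"glucose-6-phosphate", "ADP"}),
])
print(glycolysis.simulate({"glucose", "ATP"}))
# {'glucose-6-phosphate', 'ADP'}
```

A reasoner over such a representation can answer novel "what happens if" questions by simulation rather than by lookup, which is the point of formalizing process knowledge.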
Prov-O-Viz is a visualisation service for provenance graphs expressed using the W3C PROV vocabulary. It uses the Sankey-style visualisation from D3js.
See http://provoviz.org
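To make the input format concrete: a PROV graph of the kind Prov-O-Viz consumes can be assembled from a handful of triples. Here is a minimal sketch using Python's rdflib and the standard W3C PROV-O namespace; the example.org resources and the cleaning scenario are invented.

```python
from rdflib import Graph, Namespace, RDF

PROV = Namespace("http://www.w3.org/ns/prov#")
EX = Namespace("http://example.org/")  # invented namespace for the example

g = Graph()
g.bind("prov", PROV)
g.bind("ex", EX)

# A tiny derivation: a cleaning activity reads a raw dataset and
# generates a cleaned one.
g.add((EX.raw_dataset, RDF.type, PROV.Entity))
g.add((EX.clean_dataset, RDF.type, PROV.Entity))
g.add((EX.cleaning_run, RDF.type, PROV.Activity))
g.add((EX.cleaning_run, PROV.used, EX.raw_dataset))
g.add((EX.clean_dataset, PROV.wasGeneratedBy, EX.cleaning_run))
g.add((EX.clean_dataset, PROV.wasDerivedFrom, EX.raw_dataset))

print(g.serialize(format="turtle"))
```

Activities and the entities they use and generate are exactly what the Sankey layout renders: flows of data through processes.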
These are the slides for Robert H. McDonald's Future Trends Panel presentation at the Inter-institutional Approaches to Supporting Scholarly Communication Symposium held on August 16, 2012 at the Georgia Institute of Technology.
Open Data, by definition, provides the chance to re-shape and publish heterogeneous pieces and fragments of information that are open, meaning anyone is free to use, reuse, and redistribute them. In order for users to fully benefit from this idea, the Open Data systems of tomorrow must provide high quality data, relying on real-time and ubiquitous services, along with deep integration with mobile and smart devices and infrastructures.
In this session, we present a synthesis of the Whitehall proposal, which addresses this vision: building Open Data on a fully-fledged Big Data infrastructure, realized using graph-based and NoSQL technologies. The idea is shaped in a cultural heritage scenario, where data is envisaged as valorizing one of Italy's main assets: its cultural heritage.
Linked Data and Semantic Technologies can support a next generation of science. This talk shows examples of discovery, access, integration, and analysis, and points towards prediction and vision.
Data Mining on Web URL Using Base 64 Encoding to Generate Secure URN - IJMTST Journal
The current Web has no general mechanisms to make digital artifacts such as datasets, code, texts, and images verifiable and permanent. For digital artifacts that are supposed to be immutable, there is moreover no commonly accepted method to enforce this immutability. These shortcomings have a serious negative impact on the ability to reproduce the results of processes that rely on Web resources, which in turn heavily impacts areas such as science where reproducibility is important. To solve this problem, we propose trusty URIs containing cryptographic hash values. We show how trusty URIs can be used for the verification of digital artifacts, in a manner that is independent of the serialization format in the case of structured data files such as nanopublications. We demonstrate how the contents of these files become immutable, including dependencies on external digital artifacts, thereby extending the range of verifiability to the entire reference tree. Our approach sticks to the core principles of the Web, namely openness and decentralized architecture, and is fully compatible with existing standards and protocols.
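The paper defines the exact hash placement and encoding; as a rough illustration of the underlying idea only, the sketch below appends an unpadded URL-safe base64 SHA-256 digest of an artifact's bytes to its URI, so any change to the content breaks verification. This deliberately ignores the module and serialization-independence machinery of actual trusty URIs.

```python
import base64
import hashlib

def simplified_trusty_uri(base_uri, content):
    """Append an unpadded URL-safe base64 SHA-256 digest to the URI.

    Simplified illustration only: the real trusty URI specification defines
    precise modules and encodings that this sketch does not reproduce.
    """
    digest = hashlib.sha256(content).digest()
    code = base64.urlsafe_b64encode(digest).decode().rstrip("=")
    return base_uri + code

def verify(uri, content):
    """The artifact is unmodified iff its hash still matches the URI suffix."""
    base = uri[:-43]  # unpadded base64 of a 32-byte digest is 43 characters
    return simplified_trusty_uri(base, content) == uri

uri = simplified_trusty_uri("http://example.org/artifact.", b"immutable bytes")
print(uri)
print(verify(uri, b"immutable bytes"))  # True
print(verify(uri, b"tampered bytes"))   # False
```

Because the identifier itself commits to the content, any copy of the artifact can be verified without trusting the server it was fetched from.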
Where is the opportunity for libraries in the collaborative data infrastructure? - LIBER Europe
Presentation by Susan Reilly at Bibsys2013 on the opportunities for libraries and their role in the collaborative data infrastructure. It looks at data sharing, authentication, preservation and advocacy.
Cloud Security: Trust and Transformation - Peter Coffee
Common concerns regarding cloud security are increasingly being recognized as speculative, compared to the reality of how IT governance often fails in traditional on-premise environments: failure modes that the cloud model greatly offsets.
Presented at Cloud, 5 Feb 2011, an open-source meetup about content analytics and market research. Argues that online data may be more usefully interpreted as if it were purely behavioural or contextual rather than as content. This is the earliest presentation where I talk about the failure of research to properly address context, something I write about regularly in 2012.
The amount of data in our world today is vast. Many of the personal and non-personal aspects of our day-to-day activities are aggregated and stored as data by both businesses and governments. The growing volume of data captured through multimedia, social media, and the Internet is a phenomenon that needs to be properly examined. In this article, we explore this topic and analyse the term "data ownership". We aim to raise awareness and trigger a debate among policy makers with regard to data ownership and the need to improve existing data protection and privacy laws and legislation at both national and international levels.
1. iSOCO
Trust and Linked Data
José Manuel Gómez Pérez
Linked Data in the Future Internet
Future Internet Assembly
Ghent, 16th December 2010
2. Scope of the talk
Why/how establishing a measure of trust is key to realize the Linked Data vision
(Diagram: Provenance → Trust → Linked Data)
How can Linked Data contribute to support trust in the Future Internet Architecture
3. Provenance is key in many domains
For the Web Architecture
- "At the toolbar (menu, whatever) associated with a document there is a button marked "Oh, yeah?". You press it when you lose that feeling of trust. It says to the Web, 'so how do I know I can trust this information?'. The software then goes directly or indirectly back to metainformation about the document, which suggests a number of reasons." -- Tim Berners-Lee, W3C Chair, Web Design Issues, September 1997
For Linked Data
- "Provenance is the number one issue we face when publishing government data as linked data for data.gov.uk" -- John Sheridan, UK National Archives, data.gov.uk, February 2010
For Science
- "We need a paradigm that makes it simple [...] to perform and publish reproducible computational research. [...] A Reproducible Research Environment (RRE) [...] provides computational tools together with the ability to automatically track the provenance of data, analyses, and results and to package them (or pointers to persistent versions of them) for redistribution."
4. Provenance is…
Records of:
- Sources of information, including entities and processes, involved in producing or delivering data
- History of subsequent owners (chain of custody)
Motivations to maintain provenance records:
1. Assessing data reliability and quality
2. Providing a justification of the state of a data product
3. Supporting process reproducibility
4. Determining ownership for the data derivation
5. Provenance is…
Valuable and objective
Necessary to assign credit… and blame
i.e. fundamental to establish Trust
6. Trust and provenance
Who created the data (author/attribution)?
Were the data ever manipulated, and if so, by what processes/entities?
Who is providing access to the data (repository)?
Can any of the answers to these questions be verified?
7. Trust and provenance (II)
Association
- Source is NYT, source cites NYT
- Source is cited in Wikipedia
Bias, e.g. source is an oil company
Distrust, e.g. source is a blog
Trust measures derived from provenance information
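The slide stops at the principle; one way to picture a trust measure derived from provenance is a simple weighted score over the source types in a record. The following Python sketch is purely illustrative: the categories and weights are invented, echoing the slide's examples, and are not part of the talk.

```python
# Invented source categories and weights, echoing the slide's examples.
SOURCE_WEIGHTS = {
    "newspaper_of_record": 0.9,  # e.g. source is, or cites, the NYT
    "encyclopedia": 0.7,         # e.g. source is cited in Wikipedia
    "corporate_interest": 0.3,   # bias: e.g. an oil company on emissions
    "anonymous_blog": 0.2,       # distrust by default
}

def trust_score(provenance):
    """Average the weights of the source types found in a provenance record."""
    if not provenance:
        return 0.0  # no provenance, no basis for trust
    return sum(SOURCE_WEIGHTS.get(s, 0.5) for s in provenance) / len(provenance)

print(trust_score(["newspaper_of_record", "encyclopedia"]))  # 0.8
print(trust_score(["anonymous_blog"]))                       # 0.2
```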
8. Trust and Provenance: A real-life example
Reusing web data without the means that allow contrasting its provenance can be harmful, especially in sensitive domains.
The hoax involved:
• Two fake web sites
• A fake Wikipedia entry
• Fake California public safety phone numbers
• A fake local TV station
The hoax caused a 1000-word tome in the Frankfurter Allgemeine Zeitung… and public apologies from DPA. Trust in Wikipedia misled DPA.
In a provenance-aware world, DPA would have had means based on data provenance to evaluate trust:
- Bluewater did not exist
- The Berlin Boys do not exist
9. Why Provenance is key in Linked Data
"The goal of the W3C SWEO Linking Open Data community project is to extend the Web with a data commons by publishing various open data sets as RDF on the Web and by setting RDF links between data items from different data sources"
Web data comes from diverse data sources
- Varying quality
- Different scope
- Different assumptions
Often derived from replication, query processing, modification, merging…
- Poor quality data can propagate quickly through the interlinked data cloud
Important to keep track of who (agent) created a particular piece of data and how (process)
Eventually, for the computation of quality measures like timeliness and trustworthiness
10. Why Linked Data is Key in Provenance and Trust
Linked Data Design Issues (Tim Berners-Lee, 2006):
1. Use URIs to identify things
- Anything, not just documents
2. Use HTTP URIs for people to look up such names
- Globally unique names
- Distributed ownership
3. Provide useful information in RDF upon URI resolution
4. Include RDF links to other URIs
- Enable discovery of related information
Provenance metadata can be:
- Published on the Web following the Linked Data Principles
- Represented in (hopefully) standard RDFS/OWL provenance vocabularies (OPM, PML, PRV…) - W3C Provenance Incubator group
- Stored and secured in scalable semantic repositories
- Accessible through SPARQL endpoints for provenance-aware applications
- Available for automatic reasoning
- Interlinked, so that provenance information can be enriched across different provenance datasets and contrasted between different sources
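As a concrete illustration of the "accessible through SPARQL endpoints" point above, the sketch below queries a hypothetical provenance-aware endpoint for the agent and activity behind a data item. It uses the later-standardized W3C PROV vocabulary for concreteness (the slide predates it, citing OPM, PML and PRV); the endpoint URL and resource URI are invented.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Invented endpoint and resource URIs; the query asks which agent was
# associated with the activity that generated a given data item.
sparql = SPARQLWrapper("http://example.org/sparql")
sparql.setQuery("""
    PREFIX prov: <http://www.w3.org/ns/prov#>
    SELECT ?agent ?activity WHERE {
        <http://example.org/dataset/42> prov:wasGeneratedBy ?activity .
        ?activity prov:wasAssociatedWith ?agent .
    }
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

for row in results["results"]["bindings"]:
    print(row["agent"]["value"], "via", row["activity"]["value"])
```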
11. Example I: Personal cloud services
Scenario inspired by one of the most popular cloud services: Gmail
Virtualization may hinder the transparency required to inspect what is done with personal data:
- Are my data being used for the intended purpose and not others?
- Are my data being shared with their intended recipients and not others?
Personalized advertisements based on email data (alternately useful and annoying; privacy issues)
Data management upon contract termination (storage traceability and personal info in third-party inboxes)
Auditing capabilities are required that address the entire data chain (acquisition, dissemination, storage, and usage)
Complete scenario available at the Provenance Challenge: http://twiki.ipaw.info/bin/view/Main/TrustAndPrivacyManagementInCloudPlatforms
12. Example II: Preservation of scientific knowledge
Preservation of scientific workflows and their associated research objects
Key for Integrity & Authenticity maintenance of research objects
Integrity
- Condition of being whole, complete and unaltered
- Crucial for ensuring the quality of preserved data in research objects
Authenticity
- Proof of the origin of data, genuineness, trustworthiness and realness
- Ensuring an entity, e.g. a person or other kind of actor, is genuine and has the right credentials
13. Dr. José Manuel Gómez-Pérez
R&D Director
jmgomez@isoco.com
T +34 609 077 103
Thanks for your attention!
iSOCO offices:
- Barcelona: Edificio Testa A, C/ Alcalde Barnils, 64-68, St. Cugat del Vallès, 08174 Barcelona. Tel +34 935 677 200
- Madrid: Av. del Partenón, 16-18, 1º7ª, Campo de las Naciones, 28042 Madrid. Tel +34 913 349 797
- Pamplona: Parque Tomás Caballero, 2, 6º4ª, 31006 Pamplona. Tel +34 948 102 408
- Valencia: Oficina 107, C/ Prof. Beltrán Báguena, 4, 46009 Valencia. Tel +34 963 467 143
Information on Provenance standardization activities and the outcome of the W3C Provenance Incubator Group is available at: http://www.w3.org/2005/Incubator/prov/wiki