Provenance is broadly defined as the origin or source from which something comes, together with the history of its subsequent owners. In the context of data-, process-, and computation-intensive disciplines, provenance focuses on describing and understanding where and how data is produced, the actors involved in its production, and the processes applied to it. Provenance has been a hot topic in recent years across scientific disciplines, with a strong emphasis in eScience, where technologies and means for representing provenance have been proposed, ranging across different degrees of expressivity. As the amount of data involved has grown across domains, provenance models have evolved into semantic overlays, which describe provenance at different levels of granularity and facilitate user understanding. Nowadays, the need for provenance analysis has expanded beyond scientific domains into the Web of Data arena. The abundance of data is encouraging organizations and governments to publish and expose their data so that it can be made available to the public and reused for a number of purposes through the Linked Data initiative. However, while a significant number of large, interlinked data sets, such as those of the UK government and the BBC, are now becoming publicly available, important challenges still need to be addressed before this vision can be achieved. Among them, provenance is one of the most outstanding issues for guaranteeing data quality, trustworthiness, and reliability in the Web of Data. In this talk, we provide insight into provenance, from eScience to the Web of Data, describing old problems and new challenges that need to be addressed in the upcoming years.
In this presentation we explore different applications of AI, mainly Natural Language Processing but also some innovative uses of Computer Vision, to deal with two significant challenges: the analysis of scientific publications, involving not only text but also scientific figures and diagrams, and the identification of potentially radical aspects in text.
The digital universe is booming, especially metadata and user-generated data. This raises strong challenges: identifying the portions of data relevant to a particular problem and dealing with the lifecycle of data. Finer-grained problems include data evolution and the potential impact of change on the applications relying on the data, causing decay. The management of scientific data is especially sensitive to this. We present the Research Objects concept as the means to identify and structure relevant data in scientific domains, treating data as first-class citizens. We also identify and formally represent the main reasons for decay in this domain and propose methods and tools for their diagnosis and repair, based on provenance information. Finally, we discuss the application of these concepts to the broader domain of the Web of Data: Data with a Purpose.
We present an approach towards the acquisition of process knowledge for the natural sciences. The work has been conducted within Project Halo, which is creating advanced knowledge authoring and question answering systems for the natural sciences. An analysis of AP®-level questions for Biology, Chemistry, and Physics uncovered that process knowledge is the single most frequent type of knowledge required. Thus, we developed means to acquire process knowledge, to formally represent it, and to reason about it in order to answer novel questions about the domains.
All these tasks are supported by an abstract process meta-model. It provides the terminology for user-tailored process diagrams, which are automatically translated into executable FLogic code. The meta-model and the code generation are based on the notion of Problem Solving Methods (PSMs), which represent an abstract formalization of the reasoning strategies needed for processes.
This talk is about process knowledge and how to enable users without any IT skills to i) model processes and ii) analyze the provenance of process executions, without the intervention of software or knowledge engineers. Jose Manuel proposes the utilization of Problem Solving Methods (PSMs) as key enablers for accomplishing such objectives and demonstrates the solutions developed, evaluated in the contexts of Project Halo and the Provenance Challenge, respectively. Jose Manuel concludes the talk with a process-centric overview of the challenges raised by the new web-driven computing paradigm, where large amounts of data are contributed and exploited by users on the web, requiring scalable, non-monotonic reasoning techniques as well as stimulating collaboration while preserving trust.
5th Workshop on Automation and Industrial Informatics: "Information and control technologies in healthcare".
In this talk we discuss why semantic technology matters in eHealth, focusing largely on the relationship between SNOMED and OWL and on how semantic technologies can streamline the post-coordination process. We also discuss how semantic technologies can help solve the interoperability problem between terminologies, and we review some of the semantic resources available today. We go over the projects in which we have addressed these topics and present an overview of what is currently being done at the European level.
"At the toolbar (menu, whatever) associated with a document there is a button marked "Oh, yeah?". You press it when you lose that feeling of trust. It says to the Web, 'so how do I know I can trust this information?'. The software then goes directly or indirectly back to metainformation about the document, which suggests a number of reasons."
Tim Berners-Lee, W3C Director, Web Design Issues, September 1997
Provenance focuses on the description and understanding of where and how data is produced, the actors involved in its production, and the processes by which the data was manipulated and transformed until it arrived in the collection from which it is being accessed. Provenance aims to provide the ability to trace the sources of data, enabling the exploration not just of the relationships between datasets, but also of their authors and affiliations, with the goal of preserving data ownership and establishing a notion of trust based on authenticity and reliability.
The Future Internet poses important challenges for provenance, derived from complex and rich scenarios characterized by large amounts of data stemming from heterogeneous sources such as user communities, services, and things. Such challenges span technical as well as socioeconomic dimensions. The former includes aspects like vocabularies for representing provenance, interoperability and scalability issues, and means to produce, acquire, and reason with provenance in order to provide measures of trust and information quality. However, it is probably in the socioeconomic dimension where the most significant efforts need to be made, addressing issues like the role of provenance in the overall picture of the Future Internet, entry barriers preventing the generation of provenance-aware internet content, the means required to incentivize the production of such content, and ways to prevent provenance forgery.
In this talk, we provide an overview of provenance and the above-mentioned challenges and introduce ongoing work to address trust issues from the provenance perspective in the Future Internet. We also link provenance to other aspects relevant to trust discussed in the session, such as security, legal frameworks, and economics.
3. Provenance is…
Records of
- the origin or source from which something comes
- the history of subsequent owners (change of custody)
Adapted from James Cheney’s Principles of Provenance
4. Provenance is…
- Evidence of authenticity, integrity, and quality
- Certifies products of good process
Adapted from James Cheney’s Principles of Provenance
5. Provenance is…
- Valuable
- Hard to collect and verify
- Necessary to assign credit… and blame
i.e. to establish Trust
Adapted from James Cheney’s Principles of Provenance
6. Why provenance of electronic data is difficult
Paper data
- The creation process leaves a paper trail
- Modification and forgery are easier to detect
- Usually, one can judge a book by the cover
Electronic data
- Often, there is no bits trail
- Easy to forge, copy, plagiarize, and modify data
- There is no cover to judge by
Addressing this requires explicitly representing the provenance of data, storing it, keeping it secure, and reasoning with it.
Adapted from James Cheney’s Principles of Provenance
7. Provenance in eScience
One of the most active fields in provenance development
Curated scientific biological databases
- Ensure database quality
- Need provenance for data quality control and accountability
- Currently done manually by curators
Scientific workflows – grid computing
- Abstract away process execution complexity
- Need provenance for process reproducibility and efficiency
- Currently supported by ad-hoc systems
11. Semantic overlays for provenance analysis
Objective: to support domain experts in understanding process executions
How: Problem Solving Methods (PSMs) (McDermott 1988)
- Provide reusable guidelines to formulate process knowledge
- Support reasoning
- Describe the main rationale behind a process
What: semantic overlays
Whom: subject-matter experts (SMEs), working over provenance
12. PSM perspectives
Interaction (black-box perspective)
- The PSM establishes and controls the sequence of actions required to perform a task
- Defines the knowledge required at each task step
Task-method decomposition
- Hierarchically defines how tasks decompose into simpler (sub)tasks
- Describes tasks at several levels of detail
- Provides alternative ways to achieve a task
Knowledge flow
- Knowledge transformation within the PSM
(Diagram legend: Task, Method, Role)
13. Towards knowledge provenance
PSMs as semantic overlays on top of existing process documentation
- Task: WHAT is going to be achieved by executing a process
- PSM: HOW
Provenance, from a knowledge perspective
- How recorded provenance relates to the execution of a process
- Simpler process analysis by proposing decompositions into simpler subprocesses
- Visualize provenance at different levels of detail
Supporting domain experts in two main ways
- Validation of process executions
- Identification of reasoning patterns in process executions
(Image source: myGrid)
14. The twig join function
Based on XML pattern matching algorithms on Directed Acyclic Graphs (Bruno et al., 2002)
twig_join detects the occurrence of a pattern in an XML DAG
Given
- P, a process
- T, a task potentially describing P
- M, a PSM providing a strategy on how to achieve T
- i(T), the set of input roles of T
- o(T), the set of output roles of T
- D, the DAG resulting from documenting the execution of P
twig_join(D, i(T), o(T)) checks whether a twig exists for M that connects i(T) with o(T) in D
In this case, PSM M is the pattern to be identified in the process documentation DAG D
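To make the idea concrete, here is a minimal, hypothetical Python sketch. It is not the holistic twig join of Bruno et al.; simple reachability over a dict-of-lists DAG stands in for pattern detection, and the example process names (sequence, blast_run, alignment, report) are invented for illustration.

```python
from collections import deque

def reachable(dag, start):
    """Return the set of nodes reachable from `start` in `dag` (dict of lists)."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for succ in dag.get(node, []):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen

def twig_join(dag, input_roles, output_roles):
    """Simplified stand-in for twig_join(D, i(T), o(T)): succeed if every
    output role of T is reachable in D from some input role of T,
    i.e. a twig connects i(T) with o(T)."""
    covered = set()
    for role in input_roles:
        covered |= reachable(dag, role)
    return all(out in covered for out in output_roles)

# Hypothetical process documentation DAG: artifacts and their derivations.
D = {"sequence": ["blast_run"], "blast_run": ["alignment"], "alignment": ["report"]}
print(twig_join(D, {"sequence"}, {"report"}))  # True: a twig connects i(T) and o(T)
```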
15. A twig join example in provenance analysis
(Figure: domain entities are mapped through bridges onto PSM entities, and the twig join is detected in the provenance graph.)
16. The matching algorithm
- twig_join is recursively applied at each decomposition level
- Each task is decomposed by one or several PSMs (task-method decomposition view)
- The knowledge flow defines the sequence of evaluation
- Backtracking is possible at the PSM and role levels
(Figure: twig_join(Ti, D) triggers decompose(Ti), which applies twig_join(T11, D) through twig_join(T14, D) along the knowledge flow, combining the task-method decomposition and interaction perspectives.)
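The recursion can be sketched as follows, building on the twig_join stand-in above; the Task tuple and the decompose table are hypothetical, and backtracking over role bindings is omitted for brevity.

```python
from collections import namedtuple

# A task with its input and output roles; hypothetical, for illustration.
Task = namedtuple("Task", "name inputs outputs")

def match_task(task, dag, decompose):
    """Recursively check a task against the documentation DAG.

    `decompose` maps a task name to its candidate PSMs; each PSM is an
    ordered list of subtasks (the knowledge flow). A leaf task matches
    if twig_join finds its twig; a composite task matches if any one of
    its PSMs has all subtasks matching (backtracking over alternatives).
    """
    methods = decompose.get(task.name, [])
    if not methods:  # leaf task: look for its twig directly
        return twig_join(dag, task.inputs, task.outputs)
    for psm in methods:  # try alternative PSMs, backtracking on failure
        if all(match_task(sub, dag, decompose) for sub in psm):
            return True
    return False
```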
20. KOPE evaluation (II)
Focus on precision and recall metrics, identified at three different layered contexts
- Method
- Task
- Decomposition-level
Goal 1: identify the main rationale behind process executions by detecting occurrences of semantic overlays in their logs
Goal 2: exploit the structure of semantic overlays to describe process executions at different levels of detail
(Chart: precision and recall across Level1 to Level4, with results classified as perfect match, partial match, or no match.)
23. While the economy contracts, the digital universe expands…
In 2006, the size of the digital universe was estimated at 161 exabytes
- 3 million times the information in all books ever written
By 2010, it is expected to reach 988 exabytes
…and all this data is potentially exposed online
(Source: IDC)
25. The Linked Data paradigm
How can we exploit all the available data?
- Data reuse and remix
- Common, flexible, and usable APIs
- Standard vocabularies to describe interlinked datasets
- Tools
- Realize the Semantic Web vision
Tim Berners-Lee, 2006 (Design Issues):
1. Use URIs to identify things
- Anything, not just documents
2. Use HTTP URIs so that people can look up those names
- Globally unique names
- Distributed ownership
3. Provide useful information in RDF upon URI resolution
4. Include RDF links to other URIs
- Enable discovery of related information
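As a small illustration of the four principles in practice, the following Python sketch uses the rdflib library to dereference an HTTP URI and follow the RDF links it exposes; the DBpedia URL is just an example endpoint and is assumed to still serve Turtle at that address.

```python
from rdflib import Graph, URIRef

# Principles 1 and 2: an HTTP URI names a thing and can be looked up.
resource = URIRef("http://dbpedia.org/resource/Berlin")

# Principle 3: resolving the URI yields useful information in RDF.
g = Graph()
g.parse("http://dbpedia.org/data/Berlin.ttl", format="turtle")

# Principle 4: the returned RDF links to other URIs we can follow next.
for _, predicate, obj in g.triples((resource, None, None)):
    if isinstance(obj, URIRef):
        print(predicate, "->", obj)
```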
31. The Web of Data
- Apply the Linked Data principles to expose open datasets in RDF
- Define RDF links between data items from different datasets
- Over 7.5 billion triples, 5 million links (as of November 2009)
34. A real-life example
Linking and exploiting distributed data sets without the means to contrast their provenance can be harmful, especially in sensitive domains.
The hoax comprised
- Two fake web sites
- A fake Wikipedia entry
- Fake California public safety phone numbers
The hoax caused a 1000-word tome in the Frankfurter Allgemeine Zeitung… and public apologies from DPA. Trust in Wikipedia misled DPA.
In a provenance-aware world, DPA would have had means based on data provenance to automatically check that
- The town did not exist
- The Berlin Boys do not exist
- The reporting local TV station does not exist
35. The Linked Data flow
(Diagram: legacy resources such as databases and XML repositories, web documents, and multimedia are published as Linked Data using RDF, HTTP, and URIs; Linked Data applications then exploit the data through SPARQL EPRs. Provenance concerns, namely data trustworthiness, data quality, and data lineage, accompany both the publishing and the exploitation steps.)
36. Provenance and Linked Data
Linked Data is largely about reuse. However, reusing data from 3rd parties requires knowing its provenance! Is the data reliable? Is the quality of the data good?
Provenance shall provide the ability to
- Trace the sources of data
- Enable the exploration of relationships between datasets, their authors, and affiliations
Provenance analysis shall provide insight into how data is produced and exploited
Provenance should create a notion of information quality
- Is a certain dataset consistent and up to date?
- Is the connection between two interlinked datasets meaningful?
- Is a given dataset relevant for a particular domain?
Provenance to establish information trustworthiness
Provenance to provide data views following some criteria
37. Provenance challenges in the Web of Data
Provenance information needs to be
- Represented
- Captured and recorded
- Stored and secured, queried, and reasoned about
- Visualized and browsed
38. A Provenance architecture for the Web of Data
Authoritative agencies are required to certify data provenance and keep it secure!
39. Semantics in support of provenance in the Web of Data
The Semantic Web stack is well established; the provenance stack we still need to define! Its layers include:
- Information quality inference
- Trust inference
- Reasoning with provenance
- Provenance querying
- Provenance capture
- Provenance access policy definition
- Provenance encryption
40. Towards a model of Web Data provenance
Adapted from Olaf Hartig’s Provenance Information in the Web of Data
Provenance represented as a graph
- Nodes: provenance elements (pieces of provenance information)
- Edges: relate provenance elements to each other
- Subgraphs for related data items possible
Provenance models define
- Types of provenance elements (roles)
- Relationships between them
(Diagram: example element types Actor, Execution, and Artifact.)
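A minimal sketch of such a model in Python, assuming the three element types shown in the diagram; the class and field names (used, generated, controlled_by) are invented for illustration and do not correspond to any specific provenance vocabulary.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Artifact:
    """A piece of data; a node in the provenance graph."""
    uri: str

@dataclass
class Actor:
    """Who or what controlled an execution."""
    uri: str

@dataclass
class Execution:
    """A process execution; its fields are the edges of the graph."""
    uri: str
    used: List[Artifact] = field(default_factory=list)       # inputs
    generated: List[Artifact] = field(default_factory=list)  # outputs
    controlled_by: Optional[Actor] = None

# A tiny provenance subgraph for one derived data item.
raw = Artifact("http://example.org/data/raw")
clean = Artifact("http://example.org/data/clean")
curator = Actor("http://example.org/agents/curation-bot")
run = Execution("http://example.org/runs/42",
                used=[raw], generated=[clean], controlled_by=curator)
```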
41. Provenance-related vocabularies
- DC – Dublin Core Metadata Terms
- FOAF – Friend of a Friend
- SIOC – Semantically-Interlinked Online Communities
- SWP – Semantic Web Publishing vocabulary
- WOT – Web Of Trust schema
- VOiD – VOcabulary of Interlinked Datasets
However, there is a general lack of provenance-related metadata on the Web of Data!
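As a hedged illustration of how these vocabularies combine, the following snippet attaches basic provenance metadata to a dataset description, assuming a recent rdflib that ships the Dublin Core, FOAF, and VOiD namespaces; the example URIs and the author name are hypothetical.

```python
from rdflib import Graph, URIRef, Literal
from rdflib.namespace import DCTERMS, FOAF, VOID, RDF

g = Graph()
dataset = URIRef("http://example.org/datasets/experiments")
author = URIRef("http://example.org/people/author1")

g.add((dataset, RDF.type, VOID.Dataset))     # VOiD: an interlinked dataset
g.add((dataset, DCTERMS.creator, author))    # DC: who created it
g.add((dataset, DCTERMS.source,              # DC: where it comes from
       URIRef("http://example.org/datasets/raw")))
g.add((author, RDF.type, FOAF.Person))       # FOAF: describe the author
g.add((author, FOAF.name, Literal("Jane Doe")))

print(g.serialize(format="turtle"))
```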
42. Action points
Provenance vocabularies
- W3C Provenance IG
- Extend emerging Linked Data vocabularies and standards (VOiD again)
Awareness of data providers
- Generation of provenance metadata
- Provenance authoritative agencies
Tools for data providers
- Represent and reason with trust and information quality
- Provenance visualization
45. José Manuel Gómez-Pérez
R&D Director
Thanks for your attention!
T +34 913349778
M +34 609077103
jmgomez@isoco.com
iSOCO
For more information on how iSOCO can help your company optimize its digital business and deliver an innovative solution, contact us at www.isoco.com
Barcelona: Edificio Testa A, C/ Alcalde Barnils 64-68, St. Cugat del Vallès, 08174 Barcelona, Tel +34 93 5677200
Madrid: C/ Pedro de Valdivia, 10, 28006 Madrid, Tel +34 91 3349797
Valencia: Oficina 107, C/ Prof. Beltrán Báguena 4, 46009 Valencia, Tel +34 96 3467143