Prov-O-Viz: Interactive Provenance Visualization

Rinke Hoekstra and Paul Groth
VU University Amsterdam/University of Amsterdam
rinke.hoekstra@vu.nl
Data2Semantics: From Data to Semantics for Scientific Data Publishers
Many slides courtesy of Paul Groth

Provenance
by Jennifer Compton, http://stillcraic.blogspot.nl/2014/01/tuesday-poem-provenance-by-jennifer.html

Definition
(Oxford English Dictionary)
• The fact of coming from some particular source or quarter; origin, derivation;
• the history or pedigree of a work of art, manuscript, rare book, etc.;
• concretely, a record of the passage of an item through its various owners.

Provenance
Making trust judgements on the Web

Provenance
Compliance and auditing of business processes

Provenance
Licensing and attribution of combined information

Provenance
Liability, trust and privacy in open government data
Safeguarding quality, reproducibility and integrity of the scientific process

“Web Design Issues”
“At the toolbar (menu, whatever) associated with a document there is a button marked ‘Oh, yeah?’. You press it when you lose that feeling of trust. It says to the Web, ‘so how do I know I can trust this information?’. The software then goes directly or indirectly back to metainformation about the document, which suggests a number of reasons.”
Tim Berners-Lee, Web Design Issues, September 1997

Provenance in Web Documents
Standards for ethical aggregation?
Curator’s code for attributing discovery?

Provenance in Open Government
Need provenance for data integration and reuse:
• diversity of data sources
• varying quality
• different scope
• different assumptions
“Provenance is the number one issue that we face when publishing government data in data.gov.uk”
John Sheridan, UK National Archives, data.gov.uk

Provenance in Science
“We need a paradigm that makes it simple […] to perform and publish reproducible computational research. […] a Reproducible Research Environment (RRE) […] provides computational tools together with the ability to automatically track the provenance of data, analysis, and results and to package them (or pointers to persistent versions of them) for redistribution.”
Jill Mesirov, Chief Informatics Officer of the MIT/Harvard Broad Institute, in Science, January 2010
Need provenance for reproducibility and verification of processes

W3C Working Group
Provenance is a record that describes the people, institutions, entities, and activities, involved in producing, influencing, or delivering a piece of data or a thing.
http://www.w3.org/TR/prov-overview
Luc Moreau & Paul Groth
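To make the definition concrete, a provenance record can be held as a set of entities, activities and agents plus statements that use PROV relation names (`used`, `wasGeneratedBy`, `wasAssociatedWith`, `wasDerivedFrom`). The following is a minimal sketch under assumptions, not the normative PROV data model or any library's API: the `ProvRecord` class, its `sources_of` helper and the example file and agent names are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory provenance record; only the relation names come from PROV.
@dataclass
class ProvRecord:
    entities: set = field(default_factory=set)
    activities: set = field(default_factory=set)
    agents: set = field(default_factory=set)
    # (subject, relation, object) statements, e.g. ("report", "wasGeneratedBy", "analysis")
    statements: list = field(default_factory=list)

    def add(self, subject, relation, obj):
        self.statements.append((subject, relation, obj))

    def sources_of(self, entity):
        """Return every entity that `entity` was (transitively) derived from."""
        found, stack = set(), [entity]
        while stack:
            current = stack.pop()
            for s, r, o in self.statements:
                if s == current and r == "wasDerivedFrom" and o not in found:
                    found.add(o)
                    stack.append(o)
        return found

rec = ProvRecord(entities={"raw.csv", "clean.csv", "chart.png"},
                 activities={"cleaning", "plotting"}, agents={"alice"})
rec.add("cleaning", "used", "raw.csv")
rec.add("cleaning", "wasAssociatedWith", "alice")
rec.add("clean.csv", "wasGeneratedBy", "cleaning")
rec.add("clean.csv", "wasDerivedFrom", "raw.csv")
rec.add("chart.png", "wasDerivedFrom", "clean.csv")
print(sorted(rec.sources_of("chart.png")))  # ['clean.csv', 'raw.csv']
```

Following `wasDerivedFrom` backwards answers the "where did this come from?" question; a real application would store PROV-O terms in an RDF store rather than plain strings.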

Provenance?
• Provenance = Metadata?
Provenance can be seen as metadata, but not all metadata is provenance.
• Provenance = Trust?
Provenance provides a substrate for deriving different trust metrics.
• Provenance = Authentication?
Provenance records can be used for verification and authentication among users.

Three Dimensions
• Content
Capturing and representing provenance information
(recording, annotating, workflow systems)
• Management
Storing, querying, and accessing provenance information
(scalability, interoperability)
• Use
Interpreting and understanding provenance in practice
(trust, accountability, compliance, explanation, debugging)

Warning: provenance is about history!

Naive Approaches
InProv: Visualizing Provenance Graphs with Radial Layouts and Time-Based Hierarchical Grouping 
Madelaine D. Boyd - http://www.seas.harvard.edu/sites/default/files/files/archived/Boyd.pdf
Orbiter has several limitations. It does not have capabilities for query subgraph highlighting, regular expression filters, process grouping, annotations, or programmable views[16]. Furthermore, the structure of each summary node, where child nodes are grouped within parents and are hidden until the parent is expanded, benefits queries earlier in the dependency chain. Initial overviews often correspond with system bootup, and appear very similar across different traces (time slices of system activity).
Figure 10: In these screenshots of Orbiter, the presence of edges overwhelms the visibility of nodes. By relying on a node-link graph layout and using spatial location to encode object relationships, Orbiter’s graph layout algorithm must draw many long edges to communicate node connections. Without edge bundling or opacity variation, the meanings of these relationships are obscured.
Another one of Orbiter’s weaknesses is its node-link diagram layout. As a result, each
node’s position in the X-Y plane and the length and angle of connecting lines are wasted
attributes. The chosen graph layout algorithm (dot by default) arranges nodes to minimize
Figure 11: (Top): A screenshot of the portion of the graph generated by GraphViz for a
trace of the third provenance challenge. (Bottom): A zoomed-in view of the same graph.
The horizontal black bars across the images are dense collections of edges.
Effective large graph visualizations present the user with a summary view that can be
explored, filtered, and expanded interactively.
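The summary-node idea described above (child nodes grouped within a parent and hidden until the parent is expanded) amounts to contracting each group of nodes into a single node. A minimal sketch under assumptions, not Orbiter's or InProv's implementation: the `collapse` function, the group mapping and the edge list are hypothetical. The process names init.sh and mount are borrowed from the InProv screenshots, while fstab and log are invented for the example.

```python
def collapse(edges, groups, expanded=frozenset()):
    """Replace the members of every collapsed group by one summary node.

    `groups` maps a node to the name of its (hypothetical) summary group; groups
    listed in `expanded` keep their members visible. Edges inside a collapsed group
    disappear, and parallel edges are merged into one edge with a count."""
    def visible(node):
        group = groups.get(node)
        return node if group is None or group in expanded else group

    counts = {}
    for a, b in edges:
        va, vb = visible(a), visible(b)
        if va != vb:  # drop self-loops created by collapsing a group
            counts[(va, vb)] = counts.get((va, vb), 0) + 1
    return counts

edges = [("init.sh", "mount"), ("mount", "fstab"), ("init.sh", "log"), ("mount", "log")]
groups = {"init.sh": "boot", "mount": "boot"}
print(collapse(edges, groups))  # {('boot', 'fstab'): 1, ('boot', 'log'): 2}
```

Passing `expanded={"boot"}` restores the four original edges, which is the "expand on demand" step; the edge counts could drive edge width or opacity in the collapsed view.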
2.5 Tree Visualization
While trees are a subcategory of graphs, because of their hierarchical composition, tree visualization forms its own subfield of research. A survey of over two-hundred tree visualizations is given at Hans-Jörg Schulz’s treevis.net. Visitors can narrow down by dimensionality (2D, 3D, or mixed), representation (explicit node-link diagram, implicit treemap, or combination), alignment (XY plot, radial layout, or free diagram)[55]. These categories are shown
Figure 12: Left: Pajek uses various summary node-link and matrix-based representations
depending on the structure of the supplied data set. Pictured is a main core subgraph
extracted from routing data on the Internet. Right: TopoLayout optimizes the choice of
visualization display depending on the underlying graph structure. The right column is
TopoLayout’s output, while the left and middle columns are the outputs of the GRIP and
FM graph layout algorithms.
Figure 13: treevis.net defines different categories for tree maps. Tree maps can be categorized by dimensionality (2D, 3D, or mixed), representation (explicit, implicit, or mixed), or alignment (XY, radial, or spring).
Tree visualizations are either explicit or implicit. Explicit representations resemble node-
link diagrams. An example of an implicit representation is a tree map, a diagram where the
entire tree is inscribed in a rectangle representing the root node. This root is subdivided
hierarchically into more rectangles, which represent child nodes, and each child node is
subdivided into more child nodes. Treemaps are excellent for displaying hierarchical or
categorical data[57]. One famous example, shown in Figure 14, is the “Map of the Market”
from SmartMoney.com, which displays in red and green the changes in market value of
publicly-traded companies, grouped by market sector, with cell size proportional to market
capitalization[64].
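The recursive subdivision described above can be illustrated with the classic slice-and-dice treemap layout, where each child receives a share of its parent's rectangle proportional to its size and the cutting direction alternates with depth. This is a minimal sketch assuming positive sizes; the tuple tree format, the function names and the market example are hypothetical, and the "Map of the Market" is not claimed to have used this exact algorithm.

```python
# Hypothetical tree format: (name, size) for a leaf, (name, [children]) for an inner node.
def total(node):
    """Size of a node: its own size for a leaf, the sum of its children otherwise."""
    _, payload = node
    if isinstance(payload, list):
        return sum(total(child) for child in payload)
    return payload

def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Map every node name to a rectangle (x, y, width, height) inside its parent.

    Children split the parent's rectangle in proportion to their total size,
    cutting along the x axis at even depths and along the y axis at odd depths."""
    if out is None:
        out = {}
    name, payload = node
    out[name] = (x, y, w, h)
    if isinstance(payload, list):
        parent_size = total(node)  # assumed > 0
        offset = 0.0  # fraction of the parent already given to earlier children
        for child in payload:
            frac = total(child) / parent_size
            if depth % 2 == 0:
                slice_and_dice(child, x + offset * w, y, frac * w, h, depth + 1, out)
            else:
                slice_and_dice(child, x, y + offset * h, w, frac * h, depth + 1, out)
            offset += frac
    return out

# Sectors contain companies; each company's area is proportional to its value.
market = ("market", [("tech", [("A", 30), ("B", 10)]), ("energy", [("C", 60)])])
rects = slice_and_dice(market, 0, 0, 100, 100)
```

Here company A gets a 40 by 75 rectangle, i.e. 30% of the 100 by 100 root, matching its share of the total. Slice-and-dice is the simplest layout to explain; it tends to produce thin slivers, which later layouts try to avoid.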
TreePlus is an example of a tree-inspired graph visualization tool (Figure 15). It uses
the guiding metaphor of “plant a seed to watch it grow” to summarize navigation of its tree-

InProv
6 Final Design
Figure 30: A view of a cluster of system activity. This particular timeslice shows the activity
of the init.sh and mount processes.
This visualization was designed with the Visual Information-Seeking Mantra in mind: “overview first, zoom and filter, then details-on-demand”[56].

D3.js
Visualize the magnitude of flow between nodes in a network

PROV-O-Viz http://provoviz.org
Insert any PROV-O RDF
Or connect to a SPARQL endpoint
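Whether the PROV-O RDF is pasted in or fetched from a SPARQL endpoint, it has to become a directed graph before it can be drawn. A minimal sketch, assuming the RDF has already been parsed (for instance by an RDF library) into plain (subject, predicate, object) tuples of IRI strings; `flow_edges`, `FLOW_RELATIONS` and the example.org IRIs are hypothetical and are not PROV-O-Viz's code. The PROV namespace and relation names are real PROV-O terms.

```python
PROV = "http://www.w3.org/ns/prov#"

# PROV relations point from the later thing back to its source (an activity *used*
# an entity, an entity *wasGeneratedBy* an activity), so information flows object -> subject.
FLOW_RELATIONS = {
    PROV + "used",
    PROV + "wasGeneratedBy",
    PROV + "wasDerivedFrom",
    PROV + "wasInformedBy",
}

# One way to fetch the `used` pairs from a SPARQL endpoint instead (shown as text only).
EXAMPLE_QUERY = """
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT ?activity ?entity WHERE { ?activity prov:used ?entity }
"""

def flow_edges(triples):
    """Return (source, target) edges oriented in the direction information flows."""
    return [(o, s) for s, p, o in triples if p in FLOW_RELATIONS]

EX = "http://example.org/"
triples = [
    (EX + "analysis", PROV + "used", EX + "dataset"),
    (EX + "report", PROV + "wasGeneratedBy", EX + "analysis"),
    (EX + "analysis", PROV + "wasAssociatedWith", EX + "alice"),  # agent link: not a flow
]
edges = flow_edges(triples)  # dataset -> analysis -> report
```

Agent relations such as `wasAssociatedWith` are simply skipped here to keep the flow graph to entities and activities; how PROV-O-Viz itself treats agents may differ.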

Width of activities and entities is based on information flow
Activities and entities are extracted from an egograph
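The two ideas on this slide, extracting an ego graph around a node of interest and sizing each node by how much information passes through it, can be sketched as follows. This is a hypothetical illustration, not PROV-O-Viz's actual algorithm: the hop radius, the "one unit per node, split evenly" flow measure and the names `ego_graph` and `flow_widths` are assumptions. Only the last step, sizing a node by the larger of its total inflow and total outflow, mirrors how Sankey layouts such as d3-sankey typically compute node values.

```python
from collections import deque
from graphlib import TopologicalSorter  # Python 3.9+

def ego_graph(edges, focus, radius=2):
    """Keep the edges whose endpoints both lie within `radius` hops of `focus`,
    ignoring edge direction."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    dist = {focus: 0}
    queue = deque([focus])
    while queue:  # breadth-first search outwards from the focus node
        node = queue.popleft()
        if dist[node] == radius:
            continue
        for nxt in neighbours.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return [(a, b) for a, b in edges if a in dist and b in dist]

def flow_widths(edges):
    """Hypothetical flow measure on an acyclic graph: every node with outgoing edges
    adds one unit of its own and splits (1 + inflow) evenly over those edges.
    As in a Sankey layout, a node's width is the larger of its inflow and outflow."""
    preds, succs = {}, {}
    for a, b in edges:
        preds.setdefault(a, [])
        preds.setdefault(b, []).append(a)
        succs.setdefault(a, []).append(b)
    inflow = {node: 0.0 for node in preds}
    width = {}
    # static_order() yields each node after all of its predecessors (raises on cycles).
    for node in TopologicalSorter(preds).static_order():
        targets = succs.get(node, [])
        outflow = 1 + inflow[node] if targets else 0.0
        for nxt in targets:
            inflow[nxt] += outflow / len(targets)
        width[node] = max(inflow[node], outflow)
    return width

# Edges point in the direction of information flow (entity -> activity that used it, etc.).
edges = [("dataset", "analysis"), ("config", "analysis"),
         ("analysis", "report"), ("analysis", "log"), ("report", "review")]
ego = ego_graph(edges, "analysis", radius=1)  # drops the ("report", "review") edge
widths = flow_widths(ego)
```

With this measure the "analysis" activity, which combines two inputs and feeds two outputs, ends up three times as wide as either input, so busy hubs stand out in the diagram.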

Move activities and entities around
Hover over interesting dependencies

Embed graph into your own webpage

Tom de Nies (Ghent University)
Sara Magliacane (VU University Amsterdam)

Discussion
• Provenance is vital in many areas 
government, science, industry, …
• PROV is the W3C standard for expressing provenance
• Provenance graphs can be overwhelming and complex
• PROV-O-Viz builds intuitive Sankey-style visualizations
• … for any provenance trace expressed using PROV
http://semweb.cs.vu.nl/provoviz
Thanks to: Paul Groth, Provenance XG, WG, Luc Moreau, James Cheney, Paolo Missier, Olaf Hartig, Satya Sahoo
