Provenance and Uncertainty in Human
Terrain Visual Analytics
Kai Xu
Middlesex University, UK
Background: DIVA Project
• DIVA: Data Intensive Visual Analytics
• EPSRC (UK Research Council) and DSTL (Defence
Science and Technology Lab)
• Uncertainty in Human Terrain Analysis
– Help ground troops understand local social structure
– Working with large and heterogeneous data sets

• Approach
– Visual Analytics
– Provenance
Provenance
• “The place of origin or earliest
known history of something”
(Oxford Dictionary)
• “The sources of information,
such as entities and processes,
involved in producing an
artefact” (W3C).
Different Types of Provenance
• Data provenance:
– Data source and collection
– Data changes & quality issues

• Computation provenance:
– Workflow
– Parameters & results

• Visual exploration
provenance:
– User interactions
– Insights

• Reasoning/sensemaking
provenance:
– Reasoning artefact: evidence,
hypothesis, etc.

[Diagram: Analytic Provenance, spanning Data Collection, Transformation and Analysis, Visualisation and Interaction, Knowledge and insights, and Conclusions / Decisions]
Why Provenance?
• Provide the ‘context’ of
– Data and analysis
– Reasoning and decision

• Reproducibility
– Trace the source
– Automatic update

• Help others understand
the process
– Collaboration
– Reporting

• Data/analysis quality
– Missing data, errors,
and uncertainty
– Computational analysis
artefacts
– Human reasoning bias

• Trust
– Understanding of data,
analysis, and reasoning
helps build trust
DIVA Project - Details
• Process for this project (participatory design)
• Schema for data and provenance (ProveML)
• Prototype system for HTVA
• Constructing narratives
• Demo/Video
Workshops
Requirements: Data Characteristics
• Semi-structured
• Clear language
• Different perspectives
• Synthesized or derived data
Requirements: Uncertainty Types
• Source uncertainty
• Collection bias
• Spoofing or astroturfing
• Automated extraction of information
ProveML and Facets
• ProveML: Provenance XML
• Facets: document, author, place, time, and
theme
• Review as ‘document’
[Diagram: a Document node linked to the Author (via ‘Write’), Place, Time, and Theme facets]
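A minimal sketch of how the five facets could be held in a small graph. This is not the ProveML schema itself, which the slides do not spell out: the `Node` and `Graph` classes, the relation names other than "write", and the reviewer name and date are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# The five facets named on the slide.
FACETS = {"document", "author", "place", "time", "theme"}


@dataclass(frozen=True)
class Node:
    """A graph node that belongs to exactly one facet (hypothetical class)."""
    facet: str
    label: str

    def __post_init__(self):
        if self.facet not in FACETS:
            raise ValueError(f"unknown facet: {self.facet}")


@dataclass
class Graph:
    """A set of (source, relation, target) edges (hypothetical class)."""
    edges: set = field(default_factory=set)

    def link(self, source, relation, target):
        self.edges.add((source, relation, target))

    def targets(self, source, relation):
        # All nodes reachable from `source` over edges named `relation`.
        return {t for s, r, t in self.edges if s == source and r == relation}


# A review treated as a 'document'. The reviewer name and the date are
# invented for this example; only the place and theme appear in the slides.
author = Node("author", "Example Reviewer")
review = Node("document", "Review of Paco's Bar and Grill")

graph = Graph()
graph.link(author, "write", review)
graph.link(review, "place", Node("place", "Paco's Bar and Grill"))
graph.link(review, "theme", Node("theme", "Mexican"))
graph.link(review, "time", Node("time", "2011-01-01"))

print(graph.targets(author, "write"))
```

Keeping every node inside one of the five facets means a review and, later, an analyst's insight can share the same "document" facet without changing the graph structure.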
Insight as ‘Document’
[Diagram: ProveML graph. Places: Mariachi Tequila Shack, Pancho Villa's Quesadilla, Paco's Bar and Grill. Themes: Mexican, Restaurant. Insight (a Document written by A. N. Analyst): “Mexican food is becoming more popular”. Facet labels: Place, Author, Write, Time, Theme, Document.]
Reviews & insights ↔ A ProveML graph
[Diagram: the insight “Mexican food is becoming more popular” by A. N. Analyst, shown twice: once alongside the places (Mariachi Tequila Shack, Pancho Villa's Quesadilla, Paco's Bar and Grill) and themes (Mexican, Restaurant), and once alongside a single Collection: all places tagged with both Mexican and Restaurant]
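The collection in this diagram ("all places tagged with both Mexican and Restaurant") can be read as a tag query whose result the insight points to. The sketch below illustrates that reading only. The tag assignments, the extra "Example Cafe" entry, the `collect` helper, and the dictionary layout are assumptions, not part of ProveML.

```python
# Tags per place. The three restaurant names come from the slide; which
# tags each one carries is assumed here, and "Example Cafe" is a made-up
# entry added only to show that the query filters something out.
place_tags = {
    "Mariachi Tequila Shack": {"Mexican", "Restaurant"},
    "Pancho Villa's Quesadilla": {"Mexican", "Restaurant"},
    "Paco's Bar and Grill": {"Mexican", "Restaurant"},
    "Example Cafe": {"Restaurant"},
}


def collect(tags_by_place, required):
    """Return, in sorted order, the places that carry every required tag."""
    return sorted(
        place for place, tags in tags_by_place.items() if required <= tags
    )


query = {"Mexican", "Restaurant"}
collection = collect(place_tags, query)

# The insight is stored like any other document, with the analyst as its
# author and the collection (query plus members) as its provenance.
insight = {
    "facet": "document",
    "kind": "insight",
    "text": "Mexican food is becoming more popular",
    "author": "A. N. Analyst",
    "derived_from": {"query": sorted(query), "members": collection},
}

print(insight["derived_from"]["members"])
```

Recording the query as well as its members lets a later reader see both what the analyst looked at and how that set was selected.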
Fig. 4: Summary graphics showing the distribution of values for each
Visual Exploration in ProveML
[Diagram: a State node linked to a Collection, a <visual encoding>, and A. N. Analyst]
Link to the Rest of ProveML Graph
[Diagram: a Bookmark node linked to a State (with its Collection), A. N. Analyst, and a comment about why this is important]
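One way to picture the State and Bookmark nodes from the last two slides is as small records. The following is a sketch under assumptions: the class and field names and the encoding string are hypothetical, and the slides do not say how ProveML actually serialises them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """A snapshot of visual exploration (hypothetical record layout)."""
    analyst: str
    collection: tuple      # the data items currently in view
    visual_encoding: str   # how those items are drawn


@dataclass(frozen=True)
class Bookmark:
    """Ties a State to the rest of the graph via a comment (hypothetical)."""
    state: State
    analyst: str
    comment: str


state = State(
    analyst="A. N. Analyst",
    collection=(
        "Mariachi Tequila Shack",
        "Pancho Villa's Quesadilla",
        "Paco's Bar and Grill",
    ),
    visual_encoding="map, coloured by theme",  # made-up encoding description
)

bookmark = Bookmark(
    state=state,
    analyst="A. N. Analyst",
    comment="A comment about why this is important",
)

print(bookmark.comment, "->", len(bookmark.state.collection), "places")
```

Because the bookmark holds the whole state, the comment stays attached to the exact collection and encoding the analyst was looking at when it was written.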
Visual Summary of a State
A Series of States
Spatial Uncertainty
Constructing Narrative
Social Media: VAST Challenge 2011
Conclusions and Future Work
• Framework for provenance and uncertainty in
Human Terrain Analysis
• Some confidence that our work is relevant and
directly related to Dstl requirements
• Try ProveML with other data sets
• Semantically-rich provenance in the future:
infer analyst intent from actions
The Team
• City University (London): Jason Dykes, Jo Wood, Aidan Slingsby
• Loughborough University, UK: Derek Stephens
• Middlesex University (London): William Wong, Rick Walker, Phong Nguyen, Yongjun Zheng
Visit Us @ Middlesex University
• North West London: Google Map
• Interaction Design Centre
• Lots of Visual Analytics Research
– UK Visual Analytics Consortium: Oxford, Imperial,
UCL, and Bangor
– Visual Analytics Summer School and MSc program
– MoD, EPSRC, and EU projects

• Always looking for collaboration
