OeRC Seminar

Who am I?

Sean Bechhofer

University of Manchester

sean.bechhofer@manchester.ac.uk

@seanbechhofer

http://humblyreport.wordpress.com

Research Objects: Towards
Exchange and Reuse of Digital
Knowledge

Sean Bechhofer

Publication

•  Argumentation: Convince the reader of the
validity of a position [Mesirov]

–  Reproducible Results System: facilitates enactment
and publication of reproducible research.

J. Mesirov. Accessible Reproducible Research. Science 327(5964), pp. 415-416, 2010.
http://dx.doi.org/10.1126/science.1179653

•  Results are reinforced by reproducibility [De Roure]

–  Explicit representation of method.

D. De Roure and C. Goble. Anchors in Shifting Sand: the Primacy of Method in the Web of Data. Web Science Conference 2010, Raleigh NC, 2010. http://eprints.ecs.soton.ac.uk/20817/

•  Verifiability as a key factor in scientific discovery.

Stodden et al. Reproducible Research: Addressing the Need for Data and Code Sharing in Computational Science. Computing in Science and Engineering 12(5), pp. 8-13, 2010. http://dx.doi.org/10.1109/MCSE.2010.113

Publication

•  Nano-publications. Explicit representation at the statement
level.

Groth et al. The Anatomy of a Nano-publication. Information Services and Use 30(1), pp. 51-56, 2010. http://iospress.metapress.com/index/FTKH21Q50T521WM2.pdf

•  Executable Papers

–  Collage

–  SHARE

–  Verifiable Computational Results

Nowakowski et al. The Collage Authoring Environment. ICCS 2011, 2011. http://dx.doi.org/10.1016/j.procs.2011.04.064

Van Gorp et al. SHARE: a web portal for creating and sharing executable research papers. ICCS 2011, 2011. http://dx.doi.org/10.1016/j.procs.2011.04.062

Gavish et al. A Universal Identifier for Computational Results. ICCS 2011, 2011. http://dx.doi.org/10.1016/j.procs.2011.04.067

Knowledge Burying in paper publication

[Figure: publication/mining cycle with the labels Experiment, Knowledge, Publication, Paper and Text Mining]

•  Publishing/mining cycle results in loss of knowledge

–  ≥ 40% of information lost

•  RIP – Rest in Paper

•  Need for mechanisms for publication of knowledge, preserving
information about the process.

B. Mons. Which Gene Did You Mean? BMC Bioinformatics 6, p. 142, 2005.
http://dx.doi.org/10.1186/1471-2105-6-142

The Problem

•  Moving to digital environments

–  Workﬂows, protocols, algorithms

–  Consuming and producing data

–  Electronic publication methods

•  From (linear) paper publications to…

???

•  Need for frameworks for facilitating reuse and
exchange of digital knowledge


Workflows

A Scientific Workflow can be seen as the combination of data and processes into a configurable, structured set of steps that implement semi-automated computational solutions in scientific problem-solving.

•  Central in experimental science

•  Enable automation

•  Make science repeatable (and sometimes reproducible)

•  Encourage best practices

•  Scientist-friendly

•  Aimed at (some types of) scientists, possibly
even without strong computational skills

•  Communities: Need for scientific data
preservation

•  Enhance scientific development by building on,
sharing, and extending previous results within
scientific communities

•  However, workflow preservation is
especially complex

•  Workflows not only specified statically at
design time but also interpreted through their
execution

[Figure: BioAID_DiseaseDiscovery v3 workflow]

•  Complex models are required to describe
workflows and related resources, including
documents, data and services

•  Resources often beyond control of scientists

myExperiment

•  A repository of research methods

•  A community social network of people and things

•  A Social Virtual Research Environment

•  Web 2.0 “boutique” site

•  A probe into researcher behaviour

•  Open source (BSD) Ruby on Rails app

•  REST and SPARQL interfaces, supports Linked Data

•  Part of product family including BioCatalogue, MethodBox and SysmoDB

5550 members, 300 groups, 2300 workflows, 220 packs

Motivating Projects

•  myExperiment

–  Workflow sharing

•  Sysmo-DB

–  Assets catalogue supporting exchange of data,
models, SOPs

•  Obesity e-Lab/MethodBox

–  Sharing survey data/analysis scripts

•  myExperiment packs

–  Packs supporting (simple)
aggregations.

–  Links not just references

–  Packs as nascent ROs


Wf4Ever

…technological infrastructure for the preservation and
efficient retrieval and reuse of scientific workflows in a range
of disciplines.

•  Architecture/implementation for workflow preservation,
sharing and reuse

•  Research Object models

•  Workflow Decay, Integrity and Authenticity

•  Workflow Evolution and Recommendation

•  Provenance

•  Driven by Use Cases

FP7 Digital Libraries and Digital Preservation

iSOCO, University of Manchester, Universidad Politécnica de
Madrid, University of Oxford, Poznan Supercomputing and
Networking Centre, Instituto de Astrofísica de Andalucía,
Leiden University Medical Centre


Research Objects

Semantically rich aggregations of resources,
supporting a research objective

Linking


Astronomers' Questions

When accessing a workflow

•  Can I use it for my purposes (in my words)?

•  If I can expect it to run, when was it last run, by whom?

•  What it does quickly, by one of

–  example input / output (and trying it)

–  a description

–  ‘reading’ its key parts

–  what it was used for

–  related workflows its creator

•  If my workflow may have issues

–  What the system or other users think it does

–  contacting the creator or last user

•  How it relates to other things

When sharing a workflow

•  What rights do others have?

•  What a good workflow is to get a good score?

–  Make my workflow findable, reusable, and ready for review

–  Instructions to authors

–  Two types of contributions: serious science, preliminary/playing around

•  How I need to cite the author and workflow?

•  Share freely or anonymously upon request?

http://www.flickr.com/photos/-bast-/349497988/

User Requirements

[Figure: user roles: Reader, Re-User, Trainee, Contributor, Finder/Searcher, Creator, Publisher, Comparator, Curator, Evaluator/Reviewer]

•  Study of user scenarios

•  Isolation of User Requirements

•  User review

•  Project Technical requirements

•  Classify Technical Requirements

As a Creator of ROs, I want to aggregate existing resources so that I can conveniently access related resources from a single place.

As a Reader of ROs, I want to compare an RO with others so that I can determine whether the investigation is novel.

As a Comparator of ROs, I want to follow the steps taken so that I can understand the investigative process or method.

User Roles

Creator. Collecting together resources as an RO for reuse or
repurpose. May be for personal use.

Contributor. Providing materials to be used within an RO

Collaborator. Providing materials to be used without
necessarily being aware of the RO

Reader. Looking for related works, state of the art.

Comparator. Looking for similar or previous work to a task in
hand

Re-User. Understands the underlying methods encapsulated
(e.g. workflow) and how to extract/replace components.

Publisher. Disseminating results or methods. Upload to
repository, publish via myExp, embed in blog post.

Evaluator/Reviewer. Evaluating/validating or reviewing content.
Confirmation of results or validation of process.


Workflow Reproducibility
Stability, Completeness, Integrity, Authenticity, Quality

Workflow Decay
•  Component level
–  flux/decay/unavailability
•  Data level
–  formats/ids/standards
•  Infrastructure level
–  platform/resources

Experiment Decay
•  Methodological changes
•  New technologies
•  New resources/components
•  New data
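Component-level decay is the most mechanical of these to detect. The Python sketch below is an illustration only: it assumes a workflow's external components can be listed with one endpoint each, and that availability can be checked by some probe (an HTTP request, a monitoring feed). The `Component` record, the `component_decay_report` function and the example.org endpoints are hypothetical, not part of the Wf4Ever tooling.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Component:
    """Hypothetical record of one external component a workflow calls."""
    name: str
    endpoint: str  # illustrative URL only


def component_decay_report(
    components: List[Component], probe: Callable[[str], bool]
) -> Dict[str, List[str]]:
    """Split components into available and unavailable using `probe`.

    The probe is passed in so it can be an HTTP request, a cached
    monitoring result, or (as below) a stub.
    """
    report: Dict[str, List[str]] = {"available": [], "unavailable": []}
    for component in components:
        try:
            ok = bool(probe(component.endpoint))
        except Exception:
            # Any probe failure (timeout, DNS error, ...) counts as unavailable.
            ok = False
        report["available" if ok else "unavailable"].append(component.name)
    return report


if __name__ == "__main__":
    components = [
        Component("blast", "http://example.org/services/blast"),
        Component("uniprot_lookup", "http://example.org/services/uniprot"),
    ]
    offline = {"http://example.org/services/uniprot"}
    print(component_decay_report(components, lambda url: url not in offline))
    # {'available': ['blast'], 'unavailable': ['uniprot_lookup']}
```

Injecting the probe keeps the decay logic separate from how availability is measured; data-level and infrastructure-level decay would need different checks.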


Wf4Ever functionalities

Access Usage Functionalities: Edit, Use, Annotate, …

Data Management Analysis Functionalities: Stability Evaluation, Completeness Evaluation, Recommendations, Visualization, Collaboration, …

Storage Functionalities: Storage, Retrieval, Maintenance, …

Lifecycle Functionalities: Execution, Publication, Archival, …

Wf4Ever Reference Implementation
(By the end of 1st Year)
Access Usage Clients: Dropbox Client, RO Manager Tool, RO Portal, ROBox

Data Management Analysis Services: Stability Evaluation, Completeness Evaluation, Recommender

Storage Services / Lifecycle Services: Taverna Workflow Mgmt System, RO Digital Library

Linked Data

•  A set of best practices for publishing
and connecting data on the Web

1.  Use URIs to name things

2.  Use dereferenceable HTTP URIs

3.  Provide useful content on lookup using standards

4.  Include links to other stuff
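The first three principles meet at dereferencing: an HTTP URI names the thing, and looking it up with content negotiation returns a machine-readable description. A minimal Python sketch using only the standard library follows; the preferred formats in `RDF_FORMATS` and the helper names are assumptions for illustration, and `dereference` needs network access.

```python
import urllib.request
from typing import Tuple

# Assumed preference order: Turtle first, RDF/XML as a fallback.
RDF_FORMATS = "text/turtle, application/rdf+xml;q=0.9"


def linked_data_request(uri: str) -> urllib.request.Request:
    """Build a content-negotiated request for an RDF description of `uri`."""
    if not uri.startswith(("http://", "https://")):
        raise ValueError(f"not an HTTP URI: {uri}")
    return urllib.request.Request(uri, headers={"Accept": RDF_FORMATS})


def dereference(uri: str, timeout: float = 10.0) -> Tuple[str, str]:
    """Look up `uri` and return (content type, body). Needs network access.

    urlopen follows redirects, including the 303 See Other responses often
    used to point from a thing's URI to a document describing it.
    """
    with urllib.request.urlopen(linked_data_request(uri), timeout=timeout) as resp:
        content_type = resp.headers.get("Content-Type", "")
        return content_type, resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    req = linked_data_request("http://dbpedia.org/resource/Manchester")
    print(req.full_url, req.get_header("Accept"))
```

Principle 4 then amounts to the returned description containing further HTTP URIs that can be dereferenced the same way.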


Linked Data is not Enough!

•  A set of best practices for publishing and connecting data on the Web

1.  Use URIs to name things

2.  Use dereferenceable HTTP URIs

3.  Provide useful content on lookup using standards

4.  Include links to other stuff

•  All very nice, lots of publishing going on, but no common models for lifecycle, aggregation, ownership, etc.

•  A platform for sharing and publishing, but more is needed

Note: The answer is not not Linked Data!*

*Logician joke

Bechhofer et al. Linked Data is not Enough for Scientists. Future Generation Computer Systems, 2011. http://dx.doi.org/10.1016/j.future.2011.08.004

ROs and Linked Data

•  Linked Data: Collection of best practices for publishing
and connecting structured data on the web.

•  ROs should be independent of mechanisms for
representation and delivery

•  ROs as non-information resources

–  “Named Graphs” for LD

[Figure: labels "LD Cloud" and "RO"]

WP2 - Workflow Lifecycle Management
Research Object Model

»  Research Object Model

›  Focus of work in M6-12

»  Version 0.1 released to project in November 2011

»  Use within developed RO services (RODL)

»  A suite of linked ontologies

›  Research Object Core - ro (aggregation and annotation)

›  Workflow Description - wfdesc (content)

›  Workflow Provenance - wfprov (provenance)

[Figure: Container Structure (Research Object); Emphasis on Workflow-centric Research Objects (Abstract workflow); Minimal place holder (Workflow provenance)]

Research Object Core (ro)

»  Aggregation (OAI-ORE)

›  Use of OAI-ORE to support the description of collections of
resources.

›  Established vocabulary

›  Usage in existing work (myExperiment)

›  Fit with Linked Data publication

»  Annotation (AO)

›  Survey of existing annotation vocabularies: Annotation Ontology (Clark et al.) and Open
Annotation Collaboration (Van de Sompel et al.).

›  Liaison and discussion with both groups

•  Little to choose in technical terms

•  A catalyst and focus for collaboration between AO and OAC

›  Choice of AO

•  Existing collaboration/relationship (UNIMAN and AO)

»  Formation of W3C Open Annotation Community Group

›  Participation from Wf4Ever staff

›  Potential for impact/collaborations

»  Defines the core data model used by the RO Digital Library service and the
Command Line Tool developed in WP1.
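The combination of ORE aggregation and AO annotation can be sketched as the RDF triples an RO might contain. This Python sketch emits N-Triples by hand, with no RDF library. The ORE terms are the published ones, but the `ro:` namespace, the `ao:body` property and the choice to aggregate annotations alongside resources are assumptions to check against the v0.1 ontologies; the `ro_triples` helper and the example.org URIs are made up.

```python
from typing import Iterable, List, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ORE = "http://www.openarchives.org/ore/terms/"
AO = "http://purl.org/ao/"            # assumed Annotation Ontology namespace
RO = "http://purl.org/wf4ever/ro#"    # assumed RO core namespace

Triple = Tuple[str, str, str]


def ro_triples(ro: str, resources: Iterable[str],
               annotations: Iterable[Tuple[str, str, str]]) -> List[Triple]:
    """Triples for an RO; annotations are (annotation, target, body) URIs."""
    triples: List[Triple] = [
        (ro, RDF_TYPE, RO + "ResearchObject"),
        (ro, RDF_TYPE, ORE + "Aggregation"),
    ]
    for resource in resources:
        triples.append((ro, ORE + "aggregates", resource))
    for ann, target, body in annotations:
        # Assumption: annotations are aggregated so they travel with the RO.
        triples.append((ro, ORE + "aggregates", ann))
        triples.append((ann, RDF_TYPE, AO + "Annotation"))
        triples.append((ann, AO + "annotatesResource", target))
        triples.append((ann, AO + "body", body))
    return triples


def to_ntriples(triples: Iterable[Triple]) -> str:
    # Every term here is a URI, so each is simply wrapped in angle brackets.
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


if __name__ == "__main__":
    base = "http://example.org/ro/"
    print(to_ntriples(ro_triples(
        base,
        [base + "workflow.t2flow", base + "results.csv"],
        [(base + "ann/1", base + "workflow.t2flow", base + "ann/1.ttl")],
    )))
```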


Workflow Description (wfdesc)

»  Model providing initial descriptions of workflows

›  Process instances

›  Linked via input/output/parameters.

›  Support for the tasks of workflow abstraction, indexing, classification, and general
workflow analysis.

›  Generic technologies, adaptable to different domains using specific catalogues, e.g. SADI
framework.

›  Reflects explicit focus on workflow-centric ROs

»  Evolved from the OPMW ontology developed by Daniel Garijo (Wf4Ever staff member) and
Yolanda Gil.

»  Tooling generating wfdesc descriptions from aggregated Taverna workflows has
been developed.

›  Descriptions already used by the Workflow Recommendation Service for inspecting
workflow structures and service interconnections (WP3).
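The core wfdesc idea, processes whose outputs are linked to other processes' inputs, is already enough to support simple workflow analysis. This hypothetical Python model (not the wfdesc ontology itself; the class names and example processes are invented) records data links between ports and derives one valid execution order with the standard library's `graphlib` (Python 3.9+).

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Dict, List, Tuple


@dataclass
class Process:
    name: str
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


@dataclass
class Workflow:
    processes: Dict[str, Process] = field(default_factory=dict)
    # Data links: (source process, output port, sink process, input port).
    links: List[Tuple[str, str, str, str]] = field(default_factory=list)

    def add(self, process: Process) -> None:
        self.processes[process.name] = process

    def link(self, src: str, out_port: str, sink: str, in_port: str) -> None:
        # Reject links to ports the processes do not declare.
        if out_port not in self.processes[src].outputs:
            raise ValueError(f"{src} has no output {out_port!r}")
        if in_port not in self.processes[sink].inputs:
            raise ValueError(f"{sink} has no input {in_port!r}")
        self.links.append((src, out_port, sink, in_port))

    def execution_order(self) -> List[str]:
        """One order in which the processes can run; raises on cycles."""
        graph = {name: set() for name in self.processes}
        for src, _, sink, _ in self.links:
            graph[sink].add(src)  # the sink depends on the source
        return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    wf = Workflow()
    wf.add(Process("fetch", outputs=["seqs"]))
    wf.add(Process("blast", inputs=["query"], outputs=["hits"]))
    wf.add(Process("summarise", inputs=["hits"]))
    wf.link("fetch", "seqs", "blast", "query")
    wf.link("blast", "hits", "summarise", "hits")
    print(wf.execution_order())  # ['fetch', 'blast', 'summarise']
```

The same link structure could feed other analyses mentioned above, such as indexing which services a workflow connects.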


Workflow Provenance (wfprov)

»  A provenance convergence layer

›  Potential for links to OPM-V or PROV-O.

›  Mappings to OPM-V and PROV-O are under development

›  A placeholder for the v0.1 ontology suite

»  A Taverna plugin exporting Taverna provenance in PROV-O format has been
developed in WP4.

»  A prototype conversion agent that generates wfprov descriptions from PROV-O has been
developed (WP4). wfprov data will primarily be used by Integrity and Authenticity in
order to inspect workflow executions.

»  More extended modeling and descriptions of provenance information will be
reported in WP4.
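Since wfprov is described as a convergence layer over PROV-O, here is a sketch of the PROV-O statements a single workflow run might produce. It uses real PROV-O terms (`prov:Activity`, `prov:Entity`, `prov:used`, `prov:wasGeneratedBy`, `prov:startedAtTime`, `prov:endedAtTime`) but does not reproduce the wfprov mapping, which is still under development; the `run_provenance` helper and all URIs are hypothetical.

```python
from datetime import datetime, timezone
from typing import List

PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def iri(value: str) -> str:
    return f"<{value}>"


def timestamp(value: datetime) -> str:
    # Timezone-aware datetimes give xsd:dateTime strings with an offset.
    return f'"{value.isoformat()}"^^{iri(XSD_DATETIME)}'


def run_provenance(run: str, used: List[str], generated: List[str],
                   start: datetime, end: datetime) -> List[str]:
    """N-Triples lines describing one workflow run as a prov:Activity."""
    lines = [
        f"{iri(run)} {iri(RDF_TYPE)} {iri(PROV + 'Activity')} .",
        f"{iri(run)} {iri(PROV + 'startedAtTime')} {timestamp(start)} .",
        f"{iri(run)} {iri(PROV + 'endedAtTime')} {timestamp(end)} .",
    ]
    for data in used:
        lines.append(f"{iri(data)} {iri(RDF_TYPE)} {iri(PROV + 'Entity')} .")
        lines.append(f"{iri(run)} {iri(PROV + 'used')} {iri(data)} .")
    for data in generated:
        lines.append(f"{iri(data)} {iri(RDF_TYPE)} {iri(PROV + 'Entity')} .")
        lines.append(f"{iri(data)} {iri(PROV + 'wasGeneratedBy')} {iri(run)} .")
    return lines


if __name__ == "__main__":
    start = datetime(2011, 11, 30, 10, 0, tzinfo=timezone.utc)
    end = datetime(2011, 11, 30, 10, 5, tzinfo=timezone.utc)
    for line in run_provenance("http://example.org/run/1",
                               ["http://example.org/data/input.fasta"],
                               ["http://example.org/data/hits.csv"],
                               start, end):
        print(line)
```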


ROs are Technical and Social

•  An artefact to support preservation of the method, data
etc.

•  Technical details of platform, services etc.

•  A record of an investigation or experiment

•  A mechanism for communication, packaging, sharing,
publishing, finding

•  An object that connects people together

De Roure et al. Social Scientific Objects. 1st International Workshop on Social
Object Networks, Boston, 2011. http://users.ox.ac.uk/~oerc0033/preprints/myExpSocialObjects.pdf

Where Next/Challenges

•  Prototype development

•  Models for Research Objects

–  Vocabularies

•  Refinement of lifecycle states

–  Versioning and Evolution

•  Provenance

–  RO components

–  The RO itself

•  Trust

http://www.flickr.com/photos/marsdd/2986989396/


Music

•  Music IR and Linked Data

–  Publication of collections

   ◦  eTree

   ◦  Million Song Dataset

   ◦  Benefits?

•  Music IR and ROs

–  What are the Research
Objects of Music IR?

–  Intermediate results/feature
sets

•  Ontologies and vocabularies for describing results/feature
sets


Thanks!

•  Manchester Information Management Group

–  http://img.cs.manchester.ac.uk

•  myGrid Team

–  http://www.mygrid.org.uk/

•  Wf4Ever Team

–  http://www.wf4ever-project.org/
