The document introduces Sean Bechhofer of the University of Manchester and gives his contact details (email address, Twitter handle and blog). It then surveys publications and projects related to reproducible and open research, including myExperiment and Research Objects, which aim to facilitate the exchange and reuse of digital knowledge. A key challenge discussed is how to move beyond linear paper publications to frameworks that better support reuse of digital assets such as workflows and datasets.
Presentation given at CERN Workshop on Innovations in Scholarly Communication (OAI7) on 22nd June 2011
http://indico.cern.ch/conferenceDisplay.py?confId=103325
1. Who am I?
Sean Bechhofer
University of Manchester
sean.bechhofer@manchester.ac.uk
@seanbechhofer
http://humblyreport.wordpress.com
10. Research Objects: Towards Exchange and Reuse of Digital Knowledge
Sean Bechhofer, University of Manchester
11. Publication
• Argumentation: Convince the reader of the validity of a position [Mesirov]
– Reproducible Results System: facilitates enactment and publication of reproducible research.
J. Mesirov, Accessible Reproducible Research, Science 327(5964), p.415-416, 2010 http://dx.doi.org/10.1126/science.1179653
• Results are reinforced by reproducibility [De Roure]
– Explicit representation of method.
D. De Roure and C. Goble, Anchors in Shifting Sand: the Primacy of Method in the Web of Data, Web Science Conference 2010, Raleigh NC, 2010 http://eprints.ecs.soton.ac.uk/20817/
• Verifiability as a key factor in scientific discovery.
Stodden et al., Reproducible Research: Addressing the Need for Data and Code Sharing in Computational Science, Computing in Science and Engineering 12(5), p.8-13, 2010 http://dx.doi.org/10.1109/MCSE.2010.113
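To make "explicit representation of method" concrete, the sketch below publishes a result together with a record of the step and parameters that produced it, so that a reader can re-enact the method and check the outcome. It is an illustration under stated assumptions, not the Reproducible Results System or any cited tool: the `STEPS` registry, the `MethodRecord` schema and the SHA-256-over-JSON fingerprint are all hypothetical choices made for this example.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict


def digest(value: Any) -> str:
    """SHA-256 fingerprint of a JSON-serialisable result (keys sorted for stability)."""
    encoded = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


# Hypothetical registry of named, re-runnable computational steps.
STEPS: Dict[str, Callable[..., Any]] = {
    "mean": lambda values: sum(values) / len(values),
}


@dataclass(frozen=True)
class MethodRecord:
    """Explicit record of how a published result was produced (illustrative schema)."""
    step: str
    params: Dict[str, Any]
    result_digest: str


def publish(step: str, **params: Any) -> MethodRecord:
    """Run a step and publish its method alongside a fingerprint of the result."""
    result = STEPS[step](**params)
    return MethodRecord(step, params, digest(result))


def reproduces(record: MethodRecord) -> bool:
    """Re-enact the recorded method and check the result matches what was published."""
    return digest(STEPS[record.step](**record.params)) == record.result_digest


record = publish("mean", values=[1, 2, 3, 6])
print(reproduces(record))  # True: re-running the recorded method gives the same result
```

Exact fingerprint comparison is deliberately strict; results involving floating-point noise or randomness would need a tolerance or fixed seeds before such a check is meaningful.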
12. Publication
• Nano-publications: explicit representation at the statement level.
Groth et al., The Anatomy of a Nano-publication, Information Services and Use 30(1), p.51-56, 2010 http://iospress.metapress.com/index/FTKH21Q50T521WM2.pdf
• Executable Papers
– Collage
– SHARE
– Verifiable Computational Results
Nowakowski et al., The Collage Authoring Environment, ICCS 2011, 2011 http://dx.doi.org/10.1016/j.procs.2011.04.064
Van Gorp et al., SHARE: a web portal for creating and sharing executable research papers, ICCS 2011, 2011 http://dx.doi.org/10.1016/j.procs.2011.04.062
Gavish et al., A Universal Identifier for Computational Results, ICCS 2011, 2011 http://dx.doi.org/10.1016/j.procs.2011.04.067
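Statement-level publication can be sketched as a single assertion packaged with statements about where it came from and who published it. The example below assumes the assertion / provenance / publication-info split used in later nanopublication guidelines, serialises each part into its own named graph as N-Quads, and restricts objects to IRIs for brevity. All `example.org` URIs and the `associatedWith` predicate are hypothetical; `prov:wasDerivedFrom` and `dcterms:creator` are real W3C PROV-O and Dublin Core terms.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject IRI, predicate IRI, object IRI)


@dataclass
class Nanopublication:
    """One assertion plus statements about it, each part in its own named graph."""
    uri: str
    assertion: List[Triple]
    provenance: List[Triple] = field(default_factory=list)
    publication_info: List[Triple] = field(default_factory=list)

    def to_nquads(self) -> List[str]:
        """Serialise every part as N-Quads lines, using the part's graph IRI as fourth term."""
        graphs = {
            f"{self.uri}#assertion": self.assertion,
            f"{self.uri}#provenance": self.provenance,
            f"{self.uri}#pubinfo": self.publication_info,
        }
        lines = []
        for graph, triples in graphs.items():
            for s, p, o in triples:
                lines.append(f"<{s}> <{p}> <{o}> <{graph}> .")
        return lines


nanopub = Nanopublication(
    uri="http://example.org/np1",
    assertion=[("http://example.org/geneA", "http://example.org/associatedWith",
                "http://example.org/diseaseB")],
    provenance=[("http://example.org/np1#assertion", "http://www.w3.org/ns/prov#wasDerivedFrom",
                 "http://example.org/experiment42")],
    publication_info=[("http://example.org/np1", "http://purl.org/dc/terms/creator",
                       "http://example.org/alice")],
)
for line in nanopub.to_nquads():
    print(line)
```

Keeping provenance in a separate graph lets a consumer trust, filter or cite the assertion without confusing it with the metadata about it.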
13. Knowledge Burying in paper publication
[Diagram: Experiment, Knowledge, Publication, Paper, Text Mining]
• The publishing/mining cycle results in loss of knowledge
– ≥ 40% of information lost
• RIP – Rest in Paper
• Need for mechanisms for publishing knowledge that preserve information about the process.
B. Mons, Which Gene Did You Mean? BMC Bioinformatics 6, p.142, 2005 http://dx.doi.org/10.1186/1471-2105-6-142
14. The Problem
• Moving to digital environments
– Workflows, protocols, algorithms
– Consuming and producing data
– Electronic publication methods
• From (linear) paper publications to… ???
• Need for frameworks that facilitate reuse and exchange of digital knowledge
15. Workflows
A Scientific Workflow can be seen as the combination of data and processes into a configurable, structured set of steps that implement semi-automated computational solutions in scientific problem-solving.
• Central in experimental science
• Enable automation
• Make science repeatable (and sometimes reproducible)
• Encourage best practices
• Scientist-friendly
• Aimed at (some types of) scientists, possibly even without strong computational skills
• Communities: Need for scientific data preservation
• Enhance scientific development by building on, sharing, and extending previous results within scientific communities
• However, workflow preservation is especially complex
• Workflows are not only specified statically at design time but also interpreted through their execution
• Complex models are required to describe workflows and related resources, including documents, data and services
• Resources often beyond control of scientists
[Figure: BioAID_DiseaseDiscovery v3]
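The definition above (data and processes combined into a configurable, structured set of steps) can be sketched as named steps wired together by the data each one consumes, executed in dependency order. This is a minimal illustration assuming Python 3.9+ for the standard-library `graphlib` module; it is not Taverna's or any other workflow system's model, and the `Workflow` class and step names are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Sequence


class Workflow:
    """Named steps wired together by the data each step consumes."""

    def __init__(self) -> None:
        self.steps: Dict[str, Callable[..., Any]] = {}
        self.inputs: Dict[str, List[str]] = {}

    def add_step(self, name: str, fn: Callable[..., Any], inputs: Sequence[str] = ()) -> None:
        """Register a step; `inputs` names the steps or data sources it reads."""
        self.steps[name] = fn
        self.inputs[name] = list(inputs)

    def run(self, **sources: Any) -> Dict[str, Any]:
        """Execute every step in dependency order, seeded with external `sources`."""
        values: Dict[str, Any] = dict(sources)
        # TopologicalSorter expects a mapping of node -> set of predecessors.
        graph = {name: set(deps) for name, deps in self.inputs.items()}
        for name in TopologicalSorter(graph).static_order():
            if name in values:
                continue  # externally supplied data, not a step to execute
            if name not in self.steps:
                raise KeyError(f"no value supplied for input {name!r}")
            args = [values[dep] for dep in self.inputs[name]]
            values[name] = self.steps[name](*args)
        return values


wf = Workflow()
wf.add_step("cleaned", lambda seqs: [s.strip().upper() for s in seqs], ["raw"])
wf.add_step("lengths", lambda seqs: [len(s) for s in seqs], ["cleaned"])
wf.add_step("total", sum, ["lengths"])
result = wf.run(raw=[" acgt", "gg "])
```

Because the structure is declared separately from execution, the same workflow can be rerun on new inputs or reconfigured by swapping a single step, which is part of what makes workflows shareable; it is also why the static description alone does not capture everything about a run.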
16. myExperiment
• A repository of research methods
• A probe into researcher behaviour
• A community social network of people and things
• A Social Virtual Research Environment
• Web 2.0 “boutique” site
• Open source (BSD) Ruby on Rails app
• REST and SPARQL interfaces, supports Linked Data
• Part of product family including BioCatalogue, MethodBox and SysmoDB
• 5550 members, 300 groups, 2300 workflows, 220 packs
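myExperiment is listed as offering REST and SPARQL interfaces. As a minimal sketch of the SPARQL side, assuming the SPARQL 1.1 Protocol's "query via GET" form, a client could build its request as below. The endpoint URL and the `Workflow` class IRI are hypothetical placeholders, not myExperiment's actual endpoint or vocabulary, and the request is only constructed, never sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Hypothetical endpoint: replace with the SPARQL endpoint you actually want to query.
ENDPOINT = "http://example.org/sparql"

# Placeholder query; the class IRI is illustrative, not a real myExperiment term.
QUERY = """SELECT ?workflow WHERE {
  ?workflow a <http://example.org/vocab#Workflow> .
} LIMIT 10"""


def sparql_get_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol 'query via GET' request."""
    # The query text travels URL-encoded in the 'query' parameter.
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for results in the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = sparql_get_request(ENDPOINT, QUERY)
# Decode the parameter again to confirm the query survives URL encoding intact.
sent_query = parse_qs(urlsplit(req.full_url).query)["query"][0]
print(req.full_url)
```

Actually sending it, e.g. with `urllib.request.urlopen(req)`, needs network access and a live endpoint.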
18. Wf4Ever
…technological infrastructure for the preservation and
efficient retrieval and reuse of scientific workflows in a range
of disciplines.
• Architecture/implementation for workflow preservation,
sharing and reuse
• Research Object models
• Workflow Decay, Integrity and Authenticity
• Workflow Evolution and Recommendation
• Provenance
• Driven by Use Cases
FP7 Digital Libraries and Digital Preservation
iSOCO, University of Manchester, Universidad Politécnica de
Madrid, University of Oxford, Poznan Supercomputing and
Networking Centre, Instituto de Astrofísica de Andalucía,
Leiden University Medical Centre
19. Research Objects
Semantically rich aggregations of resources,
supporting a research objective
Linking
22. Astronomers' Questions
When accessing a workflow
• Can I use it for my purposes (in my words)?
• If I can expect it to run: when was it last run, and by whom?
• What it does, quickly, by one of
– example input / output (and trying it)
– a description
– ‘reading’ its key parts
– what it was used for
– related workflows by its creator
– contacting the creator or last user
• How it relates to other things
• How I need to cite the author and workflow?
When sharing a workflow
• What rights do others have?
• What is a good workflow, to get a good score?
– Make my workflow findable, reusable, and ready for review
– Instructions to authors
– Two types of contributions: serious science, preliminary/playing around
• If my workflow may have issues
– What the system or other users think it does
• Share freely or anonymously upon request?
http://www.flickr.com/photos/-bast-/349497988/
23. User Requirements
[Figure: RO user roles: Reader, Re-User, Trainee, Contributor, Finder/Searcher, Creator, Publisher, Comparator, Curator, Evaluator/Reviewer]
• Study of user scenarios
• Isolation of User Requirements
• User review
• Project Technical requirements
• Classify Technical Requirements
As a Creator of ROs, I want to aggregate existing resources so that I can conveniently access related resources from a single place.
As a Reader of ROs, I want to compare an RO with others so that I can determine whether the investigation is novel.
As a Comparator of ROs, I want to follow the steps taken so that I can understand the investigative process or method.
24. User Roles
Creator. Collecting together resources as an RO for reuse or
repurposing. May be for personal use.
Contributor. Providing materials to be used within an RO
Collaborator. Providing materials to be used without
necessarily being aware of the RO
Reader. Looking for related works, state of the art.
Comparator. Looking for similar or previous work to a task in
hand
Re-User. Understands the underlying methods encapsulated
(e.g. workflow) and how to extract/replace components.
Publisher. Disseminating results or methods. Upload to
repository, publish via myExp, embed in blog post.
Evaluator/Reviewer. Evaluating/validating or reviewing content.
Confirmation of results or validation of process.
25. Workflow Reproducibility
Stability, Completeness, Integrity, Authenticity, Quality
Workflow Decay
• Component level
• flux/decay/unavailability
• Data level
• formats/ids/standards
• Infrastructure level
• platform/resources
Experiment Decay
• Methodological changes
• New technologies
• New resources/components
• New data
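The three workflow-decay levels above suggest a simple bookkeeping structure. The following is a hypothetical sketch, not Wf4Ever's stability or completeness evaluation: it assumes each workflow dependency has already been checked by some external probe, then groups the failures by the slide's levels and reports the fraction still available. The dependency names are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

# Decay levels taken from the slide; everything else here is illustrative.
LEVELS = ("component", "data", "infrastructure")


@dataclass
class Dependency:
    name: str
    level: str        # one of LEVELS
    available: bool   # result of an external check, assumed to have run already


def decay_report(deps):
    """Group unavailable dependencies by decay level and compute an availability ratio."""
    failures = defaultdict(list)
    for dep in deps:
        if dep.level not in LEVELS:
            raise ValueError(f"unknown decay level: {dep.level}")
        if not dep.available:
            failures[dep.level].append(dep.name)
    # Fraction of dependencies that are still available (1.0 for an empty list).
    ratio = sum(d.available for d in deps) / len(deps) if deps else 1.0
    return dict(failures), ratio


deps = [
    Dependency("sequence-search-service", "component", False),
    Dependency("accession-id-format", "data", True),
    Dependency("workflow-engine", "infrastructure", True),
    Dependency("id-mapping-table", "data", False),
]
failures, ratio = decay_report(deps)
print(failures, ratio)  # {'component': ['sequence-search-service'], 'data': ['id-mapping-table']} 0.5
```

Experiment decay (methodological change, new data) is not captured this way; it concerns whether a still-runnable workflow remains the right method.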
27. Wf4Ever Reference Implementation
(By the end of 1st Year)
[Figure: reference architecture with Access/Usage (Clients), Data Management and Analysis Services layers]
• Clients: Dropbox Client, RO Manager Tool, RO Portal, ROBox
• Analysis Services: Stability Evaluation, Completeness Evaluation, Recommender
• Storage Services, Lifecycle Services
• Taverna Workflow Management System
• RO Digital Library
28. Linked Data
• A set of best practices for publishing
and connecting data on the Web
1. Use URIs to name things
2. Use dereferenceable HTTP URIs
3. Provide useful content on lookup using standards
4. Include links to other stuff
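Principles 2 and 3 come together when a client looks up an HTTP URI and asks, via content negotiation, for a standard RDF serialisation. A minimal sketch using only the standard library, assuming a hypothetical URI under `example.org`; it prepares the lookup request without sending it.

```python
from urllib.request import Request

# Hypothetical HTTP URI naming a thing (principles 1 and 2).
THING_URI = "http://example.org/id/workflow/42"

# Preferred RDF serialisations first, HTML as a fallback for human readers (principle 3).
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9, text/html;q=0.5"


def lookup_request(uri: str, accept: str = RDF_ACCEPT) -> Request:
    """Prepare (but do not send) a GET that dereferences `uri` with content negotiation."""
    if not uri.startswith(("http://", "https://")):
        raise ValueError("Linked Data names should be HTTP(S) URIs")
    return Request(uri, headers={"Accept": accept})


req = lookup_request(THING_URI)
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

The document returned on lookup would, per principle 4, contain further URIs that a client can dereference the same way.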
30. Linked Data is not Enough!
• A set of best practices for publishing and connecting data on the Web
1. Use URIs to name things
2. Use dereferenceable HTTP URIs
3. Provide useful content on lookup using standards
4. Include links to other stuff
• All very nice, lots of publishing going on, but no common models for lifecycle, aggregation, ownership, etc.
• A platform for sharing and publishing, but more is needed
Note: The answer is not not Linked Data!*
*Logician joke
Bechhofer et al. Linked Data is not Enough for Scientists. Future Generation Computer Systems, 2011 http://dx.doi.org/10.1016/j.future.2011.08.004
31. ROs and Linked Data
• Linked Data: Collection of best practices for publishing
and connecting structured data on the web.
• ROs should be independent of mechanisms for
representation and delivery
• ROs as non-information resources
– “Named Graphs” for LD
[Figure: an RO in relation to the LD Cloud]
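Treating ROs as non-information resources suggests the common Linked Data convention (described in the W3C note "Cool URIs for the Semantic Web") of answering a request for the thing's own URI with `303 See Other`, pointing at a document that describes it. The sketch below assumes that convention. The `/ro/<id>/` URI layout and the `manifest.ttl`/`manifest.html` names are hypothetical, not a documented RO service interface.

```python
from typing import Dict, Tuple


def resolve(path: str, accept: str) -> Tuple[int, Dict[str, str]]:
    """Answer a GET on a (hypothetical) RO URI with 303 See Other.

    /ro/<id>/ names the Research Object itself, which is not a document, so the
    server redirects to a description of it instead of returning content.
    """
    parts = path.strip("/").split("/")
    if len(parts) == 2 and parts[0] == "ro" and parts[1]:
        # Pick the describing document's format from the client's Accept header.
        ext = "ttl" if "text/turtle" in accept else "html"
        return 303, {"Location": f"/ro/{parts[1]}/manifest.{ext}"}
    return 404, {}


print(resolve("/ro/abc/", "text/turtle"))  # (303, {'Location': '/ro/abc/manifest.ttl'})
```

Keeping the RO's URI distinct from its description lets statements about the RO (who created it, what it aggregates) stay separate from statements about the manifest file.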
32. WP2 - Workflow Lifecycle Management
Research Object Model
» Research Object Model
› Focus of work in M6-12
» Version 0.1 released to project in November 2011
» Use within developed RO services (RODL)
» A suite of linked ontologies
› Research Object Core - ro (aggregation and annotation)
› Workflow Description - wfdesc (content)
› Workflow Provenance - wfprov (provenance)
[Figure: ro provides the container structure (Research Object); wfdesc the abstract workflow, with emphasis on workflow-centric Research Objects; wfprov workflow provenance, a minimal place holder]
33. WP2 - Workflow Lifecycle Management
Research Object Core (ro)
» Aggregation (OAI-ORE)
› Use of OAI-ORE to support the description of collections of
resources.
› Established vocabulary
› Usage in existing work (myExperiment)
› Fit with Linked Data publication
» Annotation (AO)
› Survey of existing annotation vocabularies: Annotation Ontology (Clark et al.) and Open
Annotation Collaboration (Van de Sompel et al.).
› Liaison and discussion with both groups
• Little to choose between them in technical terms
• A catalyst and focus for collaboration between AO and OAC
› Choice of AO
• Existing collaboration/relationship (UNIMAN and AO)
» Formation of W3C Open Annotation Community Group
› Participation from Wf4Ever staff
› Potential for impact/collaborations
» Defines the core data model used by the RO Digital Library service and the
Command Line Tool developed in WP1.
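To make the OAI-ORE aggregation concrete: an RO can be described as an `ore:Aggregation` whose resource map (the manifest) lists each member with `ore:aggregates`. The sketch below emits such a description as N-Triples using plain strings. The ORE and RDF namespaces are the published ones; the `ro` namespace and `ro:ResearchObject` class are an assumption based on the Wf4Ever vocabulary and should be checked, and the example URIs are hypothetical. Annotation (AO) is left out to keep it short.

```python
# ORE and RDF are the published vocabularies; the ro namespace is an assumption
# based on the Wf4Ever vocabulary and should be checked before use.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ORE = "http://www.openarchives.org/ore/terms/"
RO = "http://purl.org/wf4ever/ro#"


def research_object_triples(ro_uri, manifest_uri, resources):
    """Describe an RO as an ORE Aggregation, described by a ResourceMap (the manifest)."""
    triples = [
        (manifest_uri, RDF_TYPE, ORE + "ResourceMap"),
        (manifest_uri, ORE + "describes", ro_uri),
        (ro_uri, RDF_TYPE, ORE + "Aggregation"),
        (ro_uri, RDF_TYPE, RO + "ResearchObject"),
    ]
    # One ore:aggregates statement per aggregated resource.
    triples += [(ro_uri, ORE + "aggregates", r) for r in resources]
    return triples


def to_ntriples(triples):
    """Serialise IRI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


ro = "http://example.org/ro/abc/"   # hypothetical RO URI
manifest = ro + "manifest.rdf"      # hypothetical manifest location
nt = to_ntriples(research_object_triples(ro, manifest, [ro + "workflow.t2flow", ro + "results.csv"]))
print(nt)
```

Reusing ORE rather than a bespoke container model is what gives the "fit with Linked Data publication" noted above: any ORE-aware client can enumerate an RO's members.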
34. WP2 - Workflow Lifecycle Management
Workflow Description (wfdesc)
» Model providing initial descriptions of workflows
› Process instances
› Linked via input/output/parameters.
› Support for the tasks of workflow abstraction, indexing, classifications, and general
workflow analysis.
› Generic technologies, adaptable to different domains using specific catalogues, e.g. SADI
framework.
› Reflects explicit focus on workflow-centric ROs
» Evolved from the OPMW ontology developed by Wf4Ever staff member Daniel Garijo and
Yolanda Gil.
» Tooling generating wfdesc descriptions from aggregated Taverna workflows has
been developed.
› Descriptions already used by the Workflow Recommendation Service for inspecting
workflow structures and service interconnections. WP3
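The wfdesc idea of processes "linked via input/output/parameters" can be illustrated with a small in-memory model. This is a hypothetical sketch loosely in the spirit of wfdesc's processes, parameters and data links, not the ontology itself: it validates each link against the declared parameters and derives an order in which producers precede consumers, the kind of structural inspection the recommendation service is described as doing. Process and parameter names are invented.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass
class Process:
    name: str
    inputs: list = field(default_factory=list)   # input parameter names
    outputs: list = field(default_factory=list)  # output parameter names


@dataclass(frozen=True)
class DataLink:
    source: tuple  # (process name, output parameter)
    sink: tuple    # (process name, input parameter)


def execution_order(processes, links):
    """Check each data link against declared parameters, then order processes so
    every producer precedes its consumers."""
    by_name = {p.name: p for p in processes}
    predecessors = {name: set() for name in by_name}
    for link in links:
        (src, out), (dst, inp) = link.source, link.sink
        if out not in by_name[src].outputs or inp not in by_name[dst].inputs:
            raise ValueError(f"link refers to an undeclared parameter: {link}")
        predecessors[dst].add(src)
    # TopologicalSorter expects a mapping from each node to its predecessors.
    return list(TopologicalSorter(predecessors).static_order())


processes = [
    Process("build_tree", inputs=["alignment"], outputs=["tree"]),
    Process("fetch_sequences", inputs=["ids"], outputs=["sequences"]),
    Process("align", inputs=["sequences"], outputs=["alignment"]),
]
links = [
    DataLink(("fetch_sequences", "sequences"), ("align", "sequences")),
    DataLink(("align", "alignment"), ("build_tree", "alignment")),
]
print(execution_order(processes, links))  # ['fetch_sequences', 'align', 'build_tree']
```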
35. WP2 - Workflow Lifecycle Management
Workflow Provenance (wfprov)
» A provenance convergence layer
› Potential for links to OPM-V or PROV-O.
› Mappings to OPM-V and PROV-O are under development
› A placeholder for the v0.1 ontology suite
» Taverna plugin has been developed exporting Taverna provenance in PROV-O
format in WP4
» A prototype conversion agent that generates wfprov descriptions from PROV-O has been
developed; wfprov data will primarily be used by the Integrity and Authenticity work to
inspect workflow executions. WP4
» More extended modeling and descriptions of provenance information will be
reported in WP4.
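A provenance layer such as wfprov records which process runs used and generated which artifacts; integrity checks can then trace where an output came from. A minimal sketch of that lineage query, with fields named after the PROV-O properties `prov:used` and `prov:wasGeneratedBy` for orientation only; the classes, run identifiers and file names are hypothetical, not the wfprov model.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessRun:
    run_id: str
    used: list = field(default_factory=list)       # artifacts read by the run (cf. prov:used)
    generated: list = field(default_factory=list)  # artifacts it produced (cf. prov:wasGeneratedBy)


def lineage(artifact, runs):
    """Return all artifacts that `artifact` transitively derives from.

    Assumes each artifact is generated by at most one run.
    """
    generated_by = {a: run for run in runs for a in run.generated}
    ancestors, stack = set(), [artifact]
    while stack:
        run = generated_by.get(stack.pop())
        if run is None:
            continue  # an input supplied from outside the workflow
        for source in run.used:
            if source not in ancestors:
                ancestors.add(source)
                stack.append(source)
    return ancestors


runs = [
    ProcessRun("run-1", used=["raw.fasta"], generated=["aligned.aln"]),
    ProcessRun("run-2", used=["aligned.aln", "params.txt"], generated=["tree.nwk"]),
]
print(sorted(lineage("tree.nwk", runs)))  # ['aligned.aln', 'params.txt', 'raw.fasta']
```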
36. ROs are Technical and Social
• An artefact to support preservation of the method, data
etc.
• Technical details of platform, services etc.
• A record of an investigation or experiment
• A mechanism for communication, packaging, sharing,
publishing, finding
• An object that connects people together
De Roure et al. Social Scientific Objects. 1st International Workshop on Social
Object Networks, Boston, 2011 http://users.ox.ac.uk/~oerc0033/preprints/myExpSocialObjects.pdf
37. Where Next/Challenges
• Prototype development
• Models for Research Objects
– Vocabularies
• Refinement of lifecycle states
– Versioning and Evolution
• Provenance
– RO components
– The RO itself
• Trust
http://www.flickr.com/photos/marsdd/2986989396/
39. Music
• Music IR and Linked Data
– Publication of collections, e.g. eTree, Million Song Dataset
– Benefits?
• Music IR and ROs
– What are the Research
Objects of Music IR?
– Intermediate results/feature
sets
• Ontologies and vocabularies for describing results/feature
sets
40. Thanks!
• Manchester Information Management Group
– http://img.cs.manchester.ac.uk
• myGrid Team
– http://www.mygrid.org.uk/
• Wf4Ever Team
– http://www.wf4ever-project.org/