Today I’m going to talk about the trajectory of e-Science – from its conception through examples of 3 generations, and I’ll reflect on how we are moving from generation 2 to generation 3. Different disciplines and especially communities may be in different stages of evolution.
First something about words. This definition of e-Science is important – it reminds us that it isn’t just about technology but about people working together and being empowered by technology – and the emphasis on “science” reminds us that ultimately success is measured by new scientific outcomes. At the turn of the decade this was a vision of the future. A programme was created called e-Science. The projects doing the innovation were labelled as “e-Science”. By the time we arrive, it’s just “science”. So “e-Science” has become the name of the journey rather than the destination. Note that the innovation that takes us to the destination isn’t solely in the custody of e-Science projects – there’s a lot of relevant work going on that doesn’t carry that label. Note also that when we say “e-Science” we actually mean “e-Research”! We sometimes forget to say that.
e-Science is often characterised as dealing with the data deluge – especially from new experimental techniques such as combinatorial chemistry, DNA microarrays, instruments, sensor networks, earth observation – even Facebook (which I see as a kind of large hadron collider – or large people collider – of social science) as well as digitisation programmes and release of existing data (e.g. open government data) or new modes of access to secure data. Researchers are working digitally. The data deluge is caused by, and needs to be handled by, automation. The trick with automation is getting the right balance of “human in the loop” so that researchers can do what they’re very good at while machines do what machines are very good at. BTW, note the cocktail on the form in this slide!
Scientific workflow systems are a key automation technique for systematically handling the data deluge and giving us the “workflow” as a new sharable artefact of digital science – to record, repeat, reproduce and repurpose an experiment. This is an iconic slide by Carole Goble which is much repeated, reproduced and repurposed!
As keen observers of the e-Research ecosystem (as I’m sure we all are!) it’s interesting to note just how many workflow systems there are. This isn’t bad – each one comes prepackaged to solve particular problems for particular research communities. This is a good thing – it’s about adoption, about doing the specific before the generic. It shows co-evolution in action – successful e-Research isn’t about technology impacting research, it’s about technology being harnessed by researchers. Note: computer scientists in the audience may feel an urge to build a generic workflow language so that these systems can inter-operate. As it happens, workflows by their very nature plug together pretty well anyway – calling each other as services, or piping data from one to another.
Some co-evolution in action. In CombeChem I didn’t get requirements and go away and design a system that nobody wanted. We empowered some chemists to harness the technology – in this case Semantic Web. We “went on the journey” with them. They have done cool stuff! Semantic lab books, publication at source (e-crystals then blogging the lab), semantically enhanced publications. And a neat units ontology.
This is a summary of the phase we have been describing. The text on my summary slides has evolved but was originally based on the work of the e-Laboratories group at Manchester University (cf collaboratory or Virtual Research Environment) – I believe this framework to be more generally applicable, as you’ll see in this talk.
What we didn’t see much in phase 1 was sharing and reuse, but this is essential to harnessing of the new technology. The story on this slide involves sharing in a corridor and we will go on to see how we do it digitally! But it’s an important motivation. It led to new science.
The problem with sharing is that scientists are selfish – not so much e-Science as “me-Science”!
Heard this one? :-)
So we created “myExperiment” to find out whether scientists do indeed share enough to enjoy the benefits. New Scientist called it “mySpace for Scientists” (and my daughter called it mySpace for Science homework) but alas mySpace was soon passé, so it rapidly became Facebook for scientists. But that was a deterrent to uptake, because it was perceived to imply no privacy. So it’s not Facebook! Incidentally our astronomy colleagues picked up the idea to create “Spacebook” :-)
How we actually describe myExperiment of course depends on our audience, and there are things of interest to many people. It’s like the blind monks and the elephant. Apologies to repositories colleagues in the audience for putting them at the tail end!
myExperiment in one slide! It’s a “boutique” Web site with the largest public collection of scientific workflows. For lots more information see the myExperiment wiki http://wiki.myexperiment.org/ BioCatalogue is a registry of Web Services in the life sciences and is directly based on the myExperiment experience. Sysmo and MethodBox grew from the myExperiment codebase – MethodBox is an e-Social Science e-Laboratory for sharing and analysing data, and Sysmo is customised to the systems biology domain. See http://www.biocatalogue.org/ http://www.methodbox.org/ http://www.sysmo-db.org/
My example screenshot page today isn’t a Taverna workflow but is another example of co-evolution. This is a Nimrod workflow, and it’s on the Australian instance of myExperiment. We don’t mandate how people use myExperiment, we empower and watch and learn! One of the distinctives is the yellow strip – the “social metadata”... Licenses, credits, attribution. Without this scientists wouldn’t use it.
Lots of people focus on data (after all, there is a deluge!). Another important distinctive of myExperiment is that we have focused on sharing workflows (specific first – we focus on workflows like movies on YouTube or photos on Flickr) – or more generally on methods (sharing “know-how”). If there is a data deluge then surely methods for handling and analysing it are just as important as the data?
This is reflected in a third distinctive – the pack. This is Paul Fisher’s pack from the Tryps example. Some packs contain example input and output data so workflows can be checked for “decay” (they don’t actually rot, but the world changes round them). While others are looking at semantically enhanced publication, we are asking “what is the shared artefact of future research?” We come at the same problem from the other side. We have it surrounded! Our approach relieves us of the paper mindset – so, for example, a Research Object could contain information for many audiences and purposes, with a commonly interpreted core (social scientists will recognise the idea of a “boundary object”).
None of this would be relevant if we weren’t seeing new science coming out – and we are. This example involves a microscope – back to our earlier instruments and automation theme – and a Kepler workflow which is shared on myexperiment.org.au and is in routine use.
This is pretty much where we are now!
Now we look at myExperiment as a probe into the future behaviour of researchers. For example, these workflows by Francois Belleau show what could be described as another level of working – building on the new tooling.
Here we see bioinformaticians assembling the resources they need to answer a research question – and also demonstrating what the methods section of the future paper needs to look like. They are using Linked Data. We see the power – ease of assembly. This could be where the new computer science challenges lie in e-Research.
To show it isn’t just bioinformaticians, here are Computational Musicologists doing a similar thing. Here the “signal” is digital music recordings, and the research question relates to country music!
That example comes from a Digging into Data project with the best project acronym ever. The project is conducting a massive structural analysis of music in the Internet Archive, to support musicologists. It illustrates many of the things we are now seeing in e-Research – crowdsourcing, annotation, community software development, high performance computation, data publication. This project involves UIUC, McGill and Oxford – and the supercomputer time is donated by NCSA.
We’ve seen digital humanities, let’s look briefly at e-social science – or rather, “Digital Social Research” (the name of the destination not the journey!) In social science we have more data than ever before but not collected for social science research per se – it’s fit for a different purpose. This brings a set of challenges, from statistics to ethics. We also have more capability than ever before, as illustrated in this talk. We believe the trick (again) is to focus on “methods” – the training and capacity building in the next generation of researchers. Social Science has another important angle – the social science study of e-Research itself. Many useful studies are now emerging.
Once the technologies are established and adopted we can realise the benefits of sharing – not just in “big science” but in everyday research. Collections like myExperiment enable new forms of analysis – of patterns of methods for example.
What we have seen throughout this talk is co-evolution or co-design in action. Or – more words – co-constitution. For computer scientists let’s just say co-* :-) A year ago I did a tour of the US with Malcolm Atkinson and we introduced two metaphors which have become “memes”. The first is intellectual access ramps, of which workflow systems and myExperiment are examples. They enable incremental engagement – rather than jumping straight into the fast lane! They are for scientists but also for developers and research technologists. The second is datascopes. These are the assemblies of tools that take us from signal to understanding. They are scientific instruments which equally support humanists. We hope they will change our understanding of our place in the universe.
Evolution of e-Research
The Evolution of e-Research David De Roure
Overview <ul><li>e-Science: The Destination and the Journey </li></ul><ul><li>Generation 1 – Early adopters </li></ul><ul><li>Generation 2 – Embedding </li></ul><ul><li>Generation 3 – Radical sharing </li></ul><ul><li>Reflections </li></ul>
e-Science <ul><li>e-Science was defined by John Taylor (Director General of the UK Research Councils) as </li></ul><ul><li>global collaboration in key areas of science and the next generation of infrastructure that will enable it </li></ul><ul><li>e-Science was the name of the destination </li></ul><ul><li>It became the name of the journey </li></ul><ul><li>When we arrive, the destination is just called science </li></ul>
<ul><li>Workflows are the new rock and roll </li></ul><ul><li>Machinery for coordinating the execution of (scientific) services and linking together (scientific) resources </li></ul><ul><li>The era of Service Oriented Applications </li></ul><ul><li>Repetitive and mundane boring stuff made easier </li></ul>Carole Goble E. Science laboris
<ul><li>Box of Chemists </li></ul>My Chemistry Experiment
1st Generation Summary Current practices of early adopters of tools. Characterised by researchers using tools within their particular problem area, with some re-use of tools, data and methods within the discipline. Traditional publishing is supplemented by publication of some digital artefacts like workflows and links to data. Science is accelerated and practice beginning to shift to emphasise in silico work.
<ul><li>Paul writes workflows for identifying biological pathways implicated in resistance to Trypanosomiasis in cattle </li></ul><ul><li>Paul meets Jo. Jo is investigating Whipworm in mouse. </li></ul><ul><li>Jo reuses one of Paul’s workflows without change. </li></ul><ul><li>Jo identifies the biological pathways involved in sex dependence in the mouse model, believed to be involved in the ability of mice to expel the parasite. </li></ul><ul><li>Previously a manual two year study by Jo had failed to do this. </li></ul>Reuse, Recycling, Repurposing
<ul><li>“A biologist would rather share their toothbrush than their gene name” </li></ul>Mike Ashburner and others Professor in Dept of Genetics, University of Cambridge, UK
“Data mining: my data’s mine and your data’s mine”
mySpace for scientists! Facebook for scientists! Not Facebook for scientists!
The experiment that is [Diagram labels: Web 2, Open Repositories, Researchers, Social Network, Developers, Social Scientists]
<ul><li>“ Facebook for Scientists” ...but different to Facebook! </li></ul><ul><li>A repository of research methods </li></ul><ul><li>A community social network of people and things </li></ul><ul><li>A Social Virtual Research Environment </li></ul><ul><li>A probe into researcher behaviour </li></ul><ul><li>Open source (BSD) Ruby on Rails app </li></ul><ul><li>REST and SPARQL interfaces, Linked Data compliant </li></ul><ul><li>Inspiration for: BioCatalogue, MethodBox and SysmoDB </li></ul>myExperiment currently has 4989 members, 234 groups, 1260 workflows, 345 files and 129 packs
Paul’s Pack / Paul’s Research Object [Diagram: Workflow 16, Workflow 13, QTL, Common pathways, Logs, Results, Metadata, Paper and Slides, linked by the relations “produces”, “feeds into”, “included in” and “published in”]
Biomedical Task Effect of antibody treatment on tumour blood vessels and stroma? <ul><li>Define tissue-containing area on slide-> nucleated area </li></ul><ul><li>Define relative stromal area (ratio: stroma/nuclei) </li></ul><ul><li>Define number of blood vessels </li></ul>David Abramson Antibody-treated untreated Nuclei Blood vessels stroma merged
2nd Generation Summary Projects delivering now. Some institutional embedding. Key characteristic is re-use - of the increasing pool of tools, data and methods across areas/disciplines. Contain some freestanding, recombinant, reproducible research objects. New scientific practices are established and opportunities arise for completely new scientific investigations. Some expert curation.
<ul><li>“… to discover proteins that interact with transmembrane proteins, particularly those that can be related to neuro-degenerative diseases in which amyloids play a significant role” </li></ul><ul><ul><li>Taverna provenance exposed as RDF </li></ul></ul><ul><ul><li>myExperiment RDF document for a protein discovery workflow </li></ul></ul><ul><ul><li>Mocked-up BioCatalogue document using myExperiment RDF data as example </li></ul></ul><ul><ul><li>Provisional RDF documents obtained from the ConceptWiki (conceptwiki.org) development server </li></ul></ul><ul><ul><li>An RDF document for an example protein, obtained from the RDF interface of the UniProt web site </li></ul></ul>A Bioinformatics Experiment Scott Marshall Marco Roos Taverna
Digital Music Collections Crowdsourced ground truth Community Software Linked Data Repository Supercomputer Structural Analysis of Large Amounts of Music Information
Digital Social Research Harnessing advances in digital technology and practice to achieve world-class social research with maximum impact <ul><li>Data. ‘Born-digital’ data sources including social transactional data. Ease of access to secure data </li></ul><ul><li>Capability. Increased capability in tools, infrastructure and services </li></ul><ul><li>Methods. Taking advantage of new data and capability, new forms of interpretative research </li></ul><ul><li>Studies. Study of e‑Science, understanding innovation pathways, and assessment of impact </li></ul>
3rd Generation Summary The solutions we'll be delivering in 5 years. Characterised by global reuse of tools, data and methods across any discipline, and surfacing the right levels of complexity for the researcher. Routine use. Key characteristic is radical sharing. Research is significantly data driven - plundering the backlog of data, results and methods. Increasing automation and decision-support for the researcher - the VRE becomes assistive. Curation is autonomic and social.
Reflections <ul><li>Co-Evolution, Co-Design, Co-Constitution </li></ul><ul><li>Intellectual Access Ramps </li></ul><ul><ul><li>Incremental engagement </li></ul></ul><ul><ul><li>Safe places to play </li></ul></ul><ul><li>Datascopes </li></ul><ul><ul><li>From signal to understanding </li></ul></ul><ul><ul><li>Medical images, music, ancient manuscripts </li></ul></ul><ul><ul><li>New born-digital data </li></ul></ul>
<ul><li>Thanks to: Jeremy Frey & CombeChem; Carole Goble & myGrid; Iain Buchan, Sean Bechhofer & e-Laboratories; myExperiment; David Abramson; Marco Roos; Stephen Downie & SALAMI; the e-Social Science Directorate; Malcolm Atkinson </li></ul><ul><li>[email_address] </li></ul><ul><li>Visit wiki.myexperiment.org </li></ul><ul><li>This presentation is in myExperiment Pack 141 and http://www.slideshare.net/dder/evolution-of-eresearch </li></ul>