The Evolution of e-Research: Machines, Methods and Music

David De Roure's Inaugural Lecture on 28th October at Oxford e-Research Centre, University of Oxford, UK

10 years ago we saw a few early adopters of e-Science technology; now we see acceleration of research through broader adoption and sharing of tools, techniques and artefacts, both for 'big science' and the 'long tail scientist'.

Will this incremental trend continue or are we seeing glimpses of a phase change ahead, where researchers harness these emerging digital capabilities to address research questions in ways that simply were not possible before?

This talk will describe three generations of e-Research, using the myExperiment social website as a lens to glimpse future research practice, and focusing on a web-scale computational musicology project as an illustration of 3rd generation thinking.

Also available from http://wiki.myexperiment.org/index.php/Presentations

Usage Rights

CC Attribution License

  • Today I’m going to talk about the trajectory of e-Science – from its conception through examples of 3 generations, and I’ll reflect on how we are moving from generation 2 to generation 3. Different disciplines and especially communities may be in different stages of evolution.
  • First something about words. This definition of e-Science is important – it reminds us that it isn’t just about technology but about people working together and being empowered by technology – and the emphasis on “science” reminds us that ultimately success is measured by new scientific outcome. At the turn of the decade this was a vision of the future. A programme was created called e-Science. The projects doing the innovation were labelled as “e-Science”. By the time we arrive, it’s just “science”. So “e-Science” has become the name of the journey rather than the destination. Note that the innovation that takes us to the destination isn’t solely in the custody of e-Science projects – there’s a lot of relevant work going on that doesn’t carry that label. Note also that when we say “e-Science” we actually mean “e-Research”! We sometimes forget to say that.
  • CERN teams up with Leaders in Information Technology to build giant Data Grid. Data accumulation rate: 10 Petabytes per year (equivalent to about 20 million CD-ROMs). http://public.web.cern.ch/press/pressreleases/Releases2001/PR11.01ECERNopenlab.html
  • Scientific workflow systems are a key automation technique for systematically handling the data deluge and giving us the “workflow” as a new sharable artefact of digital science – to record, repeat, reproduce and repurpose an experiment. This is an iconic slide by Carole Goble which is much repeated, reproduced and repurposed!
  • What we didn’t see much of in phase 1 was sharing and reuse, but this is essential to harnessing the new technology. The story on this slide involves sharing in a corridor and we will go on to see how we do it digitally! But it’s an important motivation. It led to new science.
  • myExperiment in one slide! It’s a “boutique” Web site with the largest public collection of scientific workflows. For lots more information see the myExperiment wiki http://wiki.myexperiment.org/ BioCatalogue is a registry of Web Services in the life sciences and is directly based on the myExperiment experience. SysMO and MethodBox grew from the myExperiment codebase – MethodBox is an e-Social Science e-Laboratory for sharing and analysing data, and SysMO is customised to the systems biology domain. See http://www.biocatalogue.org/ http://www.methodbox.org/ http://www.sysmo-db.org/
  • This is reflected in a third distinctive feature – the pack. This is Paul Fisher’s pack from the Tryps example. Some packs contain example input and output data so workflows can be checked for “decay” (they don’t actually rot, but the world changes round them). While others are looking at semantically enhanced publication, we are asking “what is the shared artefact of future research?” We come at the same problem from the other side. We have it surrounded! Our approach relieves us of the paper mindset – so, for example, a Research Object could contain information for many audiences and purposes, with a commonly interpreted core (social scientists will recognise the idea of a “boundary object”).
  • Now we look at myExperiment as a probe into the future behaviour of researchers. For example, these workflows by Francois Belleau show what could be described as another level of working – building on the new tooling.
  • Here we see bioinformaticians assembling the resources they need to answer a research question – and also demonstrating what the methods section of the future paper needs to look like. They are using Linked Data. We see the power – ease of assembly. This could be where the new computer science challenges lie in e-Research.
  • From The Galileo Project web site: http://galileo.rice.edu/sci/instruments/telescope.html - The earliest known illustration of a telescope. Giovanpattista della Porta included this sketch in a letter written in August 1609 (porta-sketch). Johannes Hevelius (Poland, 1611-1687) observing with one of his telescopes (Source: Selenographia, 1647). Hubble_earth_horz and hubble - from http://hubble.nasa.gov/. Very Large Array from http://images.nrao.edu/Telescopes. Copyright requirement - include "NRAO/AUI/NSF" on slide.
  • That example comes from a Digging into Data project with the best project acronym ever. The project is conducting a massive structural analysis of music in the Internet Archive, to support musicologists. It illustrates many of the things we are now seeing in e-Research – crowdsourcing, annotation, community software development, high performance computation, data publication. This project involves UIUC, McGill and Oxford – and the supercomputer time is donated by NCSA.

The Evolution of e-Research: Machines, Methods and Music Presentation Transcript

  • The Evolution of e-Research
    Machines, Methods and Music
    David De Roure
  • Programming
    1981
    Maths
    Electronics
    Physics
    Music
    Medical electronics
    Networks
    PhD in distributed declarative
    programming language design
    Temporal Media
    MIT
    AJGH
    PH
    WH
    Hypermedia
    QBH
    Large scale
    Distributed
    Systems
    Transputers
    Process Networks
    PEOPLE
    Agents
    Advanced Knowledge
    Technologies
    Devices
    Semantic Web
    Workflows
    Grid
    Statistics
    Amorphous Computing
    Equator
    Web 2
    Web
    Science
    Linked
    Data
    VREs
    Semantic
    Grid
    Environmental
    sensing
    e-Science
    Digital Social Research
    myExperiment
    Semantic Sensor Networks
    e-Laboratories
    Computational Musicology
    2010
  • Overview
    Generation 1: Early adopters
    Generation 2: Embedding
    Generation 3: Radical sharing
    SALAMI
    A case study in 3rd generation e-Research
  • e-Science
    e-Science was defined by John Taylor (Director General of the UK Research Councils) as
    global collaboration in key areas of science and the next generation of infrastructure that will enable it
    e-Science was the name of the destination
    It became the name of the journey
    When we arrive, the destination is just called science
  • e-Research
    “e-research extends e-Science and cyberinfrastructure to other disciplines, including the humanities and social sciences.”
    http://mitpress.mit.edu/catalog/item/default.asp?tid=12185&ttype=2
  • Generation 1
    2000 – 2005
  • ...the imminent flood of scientific data expected from the next generation of experiments, simulations, sensors and satellites
    Tony Hey and Anne Trefethen
    Source: CERN, CERN-EX-0712023, http://cdsweb.cern.ch/record/1203203
  • Jeremy Frey
    26/2/2007 | myExperiment | Slide 8
  • E. Science laboris
    • Workflows are the new rock and roll
    • Machinery for coordinating the execution of (scientific) services and linking together (scientific) resources
    • The era of Service Oriented Applications
    • Repetitive and mundane boring stuff made easier
    Carole Goble
  • Kepler
    Triana
    BPEL
    Trident
    Meandre
    Taverna
    Galaxy
  • co-shaping
    co-design
    co-evolution
    co-
    co-creation
    co-construction
    co-constitution
    co-realisation
  • http://webscience.org
  • My Chemistry Experiment
    Box of Chemists
    CombeChem
  • CombeChem
  • empower: to equip or supply with an ability; enable
    service: the performance of duties or the duties performed as or by a waiter or servant
  • Thanks to Iain Buchan
    and the chipmunks
    1st Generation Summary
    Early adopters of tools.
    Characterised by researchers using tools within their particular problem area, with some re-use of tools, data and methods within the discipline.
    Traditional publishing is supplemented by publication of some digital artefacts like workflows and links to data.
    Science is accelerated and practice is beginning to shift to emphasise in silico work.
  • Generation 2
    2005 – 2010
  • Reuse, Recycling, Repurposing
    • Paul writes workflows for identifying biological pathways implicated in resistance to Trypanosomiasis in cattle
    • Paul meets Jo. Jo is investigating Whipworm in mouse.
    • Jo reuses one of Paul’s workflows without change.
    • Jo identifies the biological pathways involved in sex dependence in the mouse model, believed to be involved in the ability of mice to expel the parasite.
    • Previously a manual two year study by Jo had failed to do this.
    Carole Goble
  • Carole Goble “e-Science is me-Science: What do Scientists want?”, EGEE 2006
    “There are these great collaboration tools that 12-year-olds are using. It’s all back to front.”
    Robert Stevens
  • “A biologist would rather share their toothbrush than their gene name”
    Mike Ashburner and others
    Professor in Dept of Genetics,
    University of Cambridge, UK
  • Data mining: my data’s mine and your data’s mine
  • photos
    movies
    slides
    workflows
  • too passé!
    Not
    mySpace for scientists!
    Facebook
    too open!
  • Researchers
    Social Scientists
    Open Repositories
    Social Networkers
    Developers
    • A probe into researcher behaviour
    • Open source (BSD) Ruby on Rails app
    • REST and SPARQL interfaces, supports Linked Data
    • Inspiration for: BioCatalogue, MethodBox and SysMO-SEEK
    • “Facebook for Scientists” ...but different to Facebook!
    • A repository of research methods
    • A community social network of people and things
    • A Social Virtual Research Environment
    myExperiment currently has 4400 members, 236 groups, 1336 workflows, 351 files and 141 packs
  • http://www.myexperiment.org/
  • Global collaboration in key areas of science and the next generation of infrastructure that will enable it
    Visits to www.myexperiment.org (Oct 2010)
    http://wiki.myexperiment.org
  • method
    data
  • Though this be madness, yet there is method in it
    *
    Data bonanza => Methods bonanza!
    ***
    Methods should be first class citizens
    Celebrate the flux! Let the data flow through the pipelines. Nail down the methods not the data!
    Towards “Linked Open Methods”
    **
    * Polonius in Hamlet ** Sean Bechhofer in Manchester *** Not the e-Science Envoy
  • It’s not just the data
    It’s what you do with it that counts
    And what other people do with it
    ...that you never thought of
  • Paul’s Pack
    Paul’s Research Object
    Workflow 16
    QTL
    Results
    produces
    Included in
    Published in
    Included in
    Feeds into
    Logs
    produces
    Included in
    Included in
    Metadata
    Slides
    Paper
    produces
    Published in
    Common pathways
    Workflow 13
    Results
  • The Six Rs of Research Object Behaviours
    Research Objects enable data-intensive research to be:
    Replayable – go back and see what happened
    Repeatable – run the experiment again
    Reproducible – independent expt to reproduce
    Reusable – use as part of new experiments
    Repurposeable – reuse the pieces in new expt
    Reliable – robust under automation
    Referenceable – citable and traceable
    http://blog.openwetware.org/deroure/?p=56
  • Semantically enhanced publication versus
    Shared digital Research Objects
    Challenging the mindset of paper-sized chunks
  • Documents under glass
  • 2nd Generation Summary
    Projects delivering now.
    Some institutional embedding.
    Key characteristic is re-use – of the increasing pool of tools, data and methods across areas/disciplines.
    Contain some freestanding, recombinant, reproducible research objects.
    New scientific practices are established and opportunities arise for completely new scientific investigations.
    Some expert curation.
  • Generation 3
    2010 – 2015
  • 4th Paradigm
    The Fourth Paradigm: Data-Intensive Scientific Discovery
    Presenting the first broad look at the rapidly emerging field of data-intensive science
    http://research.microsoft.com/en-us/collaboration/fourthparadigm/
  • http://blogs.nature.com/fourthparadigm/
  • Doug Kell
    BioEssays, 26(1):99–105, January 2004
  • Francois Belleau
  • Taverna
    A Bioinformatics Experiment
    Scott Marshall Marco Roos
    “…to discover proteins that interact with transmembrane proteins, particularly those that can be related to neuro-degenerative diseases in which amyloids play a significant role”
    Taverna provenance exposed as RDF
    myExperiment RDF document for a protein discovery workflow
    Mocked-up BioCatalogue document using myExperiment RDF data as example
    Provisional RDF documents obtained from the ConceptWiki (conceptwiki.org) development server
    An RDF document for an example protein, obtained from the RDF interface of the UniProt web site
  • LifeGuide
    http://www.lifeguideonline.org/
    Lucy Yardley
  • http://www.methodbox.org/
    MethodBox
    Enable cross disciplinary research into Major Public Health problems
    Ease handling data and sharing results and insights
  • http://www.galaxyzoo.org/
  • Arfon Smith
    http://www.zooniverse.org/
  • 3rd Generation Summary
    The solutions we'll be delivering in 5 years
    Characterised by global reuse of tools, data and methods across any discipline, and surfacing the right levels of complexity for the researcher.
    Routine use.
    Key characteristic is radical sharing.
    Research is significantly data driven – plundering the backlog of data, results and methods.
    Publishing by the social network
    Increasing automation and decision-support for the researcher – the VRE becomes assistive.
    Curation is autonomic and social.
  • Find a service & relax
    Intellectual ramps
    Easy and low risk to start
    Progress to advanced skills
    For researchers
    No obligation
    Go as far as you want
    Malcolm Atkinson
  • telescopes for the naked mind
    Datascopes
    Malcolm Atkinson
    NRAO/AUI/NSF
    From Signal to Understanding
  • Jeannette M. Wing COMMUNICATIONS OF THE ACM March 2006/Vol. 49, No. 3 Pages 33-35
  • Music and Linked Data
    2010 – 2011 and beyond
  • http://www.openarchives.org/ore/terms/aggregates
    http://eprints.ecs.soton.ac.uk/id/eprint/20817
  • It’s about enabling the join
    Ben Fields, 6th October 2010
  • SALAMI: Structural Analysis of Large Amounts of Music Information
    David De Roure
    J. Stephen Downie
    Ichiro Fujinaga
  • www.diggingintodata.org
  • The SALAMI collaboration
    DDeR (e-Research South), J. Stephen Downie (Illinois) and Ichiro Fujinaga (McGill)
    NCSA donating 250,000 supercomputer hours
    350,000 pieces of music (23,000 hours)
    Internet Archive, DRAM, IMIRSEL, McGill
    Feature analysis and structural analysis
    Music Ontology by Yves Raimond (BBC)
    Musicologists from McGill and Southampton
    Sharing of analyses
    http://salami.music.mcgill.ca
  • Digital Music Collections
    23,000 hours of recorded music
    Music Information Retrieval Community
    Community Software
    Crowdsourced ground truth
    Supercomputer
    250,000 hours NCSA Supercomputer time
    Linked Data Repositories
  • Ashley Burgoyne
    http://www.sonicvisualiser.org/
  • MIREX Overview
    Began in 2005
    Tasks defined by community debate
    Data sets collected and/or donated
    Participants submit code to IMIRSEL
    Code rarely works first try 
    Huge labour consumption getting programs to work
    Meet at ISMIR to discuss results
    Stephen Downie
    http://www.music-ir.org/mirex
  • Meandre
    seasr.org/meandre
  • It’s web-like!
    “Ground Truth”
    Community
    Digital Audio
    “Signal”
    Structural Analysis
    Q. If and when should community-generated content be assimilated into managed repositories?
  • How country is my country?
    Kevin Page and Ben Fields
    http://www.nema.ecs.soton.ac.uk/countrycountry/
  • Music and computational thinking
    Stephen Downie
  • “Again, it [the Analytical Engine] might act upon other things besides number, were objects found whose mutual fundamental relations could be expressed by those of the abstract science of operations, and which should be also susceptible of adaptations to the action of the operating notation and mechanism of the engine...”
  • “Supposing, for instance, that the fundamental relations of pitched sounds in the science of harmony and of musical composition were susceptible of such expression and adaptations, the engine might compose elaborate and scientific pieces of music of any degree of complexity or extent.”
    Ada, The Enchantress of Numbers: Poetical Science
    by Betty Alexandra Toole
    http://www.well.com/user/adatoole/
    Betty Alexandra Toole
  • I can write a workflow that creates workflows based on those of others, and automatically modify it – think genetic mutation and crossovers. Who owns it?
    I can register a query over an increasing number and diversity of “linked data” sources to ask new research questions.
    The computer can learn from the activities of 1,000,000 scientists – and be indistinguishable from them?
    What about the ethics of Citizen Social Science? Of citizens designing experiments?
    http://eresearch-ethics.org/
  • Co-*
    Methods
    Access ramps
    Research Objects
    Computational thinking
    Ethics of e-Research at scale
    Enjoy the Open Day!
  • david.deroure@oerc.ox.ac.uk
    Thanks to: Jeremy Frey & CombeChem; Carole Goble, myGrid and myExperiment; Iain Buchan & Obesity e-Lab; Sean Bechhofer; Doug Kell; Marco Roos; Lucy Yardley; Arfon Smith; Malcolm Atkinson; Stephen Downie, Kevin Page, Ben Fields, Ashley Burgoyne and NEMA/SALAMI; Betty Toole.
    http://www.myexperiment.org/packs/153
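
The "Paul's Pack" slide describes a pack as a bundle of resources (workflows, results, logs, a paper) joined by named relations such as "produces", "feeds into" and "published in". A minimal Python sketch of that idea follows. The `Pack` class, its method names and the specific edges are hypothetical illustrations and are not the myExperiment data model or API; the slide names the relation types, but the exact wiring of its diagram cannot be recovered from the transcript.

```python
from dataclasses import dataclass, field


@dataclass
class Pack:
    """Toy aggregation: named resources plus typed links between them."""
    title: str
    resources: set = field(default_factory=set)
    links: list = field(default_factory=list)  # (subject, relation, object)

    def add(self, name):
        self.resources.add(name)

    def relate(self, subject, relation, obj):
        # Both ends of a link must already be aggregated by the pack.
        for name in (subject, obj):
            if name not in self.resources:
                raise KeyError(f"{name!r} is not in pack {self.title!r}")
        self.links.append((subject, relation, obj))

    def objects(self, subject, relation):
        # Everything that `subject` points at through `relation`, in insertion order.
        return [o for s, r, o in self.links if s == subject and r == relation]


pack = Pack("Paul's Pack")
for name in ["Workflow 16", "QTL", "Results", "Logs", "Paper", "Slides"]:
    pack.add(name)

# Assumed edges, for illustration only.
pack.relate("QTL", "feeds into", "Workflow 16")
pack.relate("Workflow 16", "produces", "Results")
pack.relate("Workflow 16", "produces", "Logs")
pack.relate("Results", "published in", "Paper")

print(pack.objects("Workflow 16", "produces"))
```

Keeping the links as plain subject, relation, object triples mirrors the Linked Data style the talk refers to, so the same structure could in principle be exported as RDF, though that step is not shown here.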