The Evolution of e-Research: Machines, Methods and Music
David De Roure

David De Roure's Inaugural Lecture on 28th October at Oxford e-Research Centre, University of Oxford, UK

10 years ago we saw a few early adopters of e-Science technology; now we see acceleration of research through broader adoption and sharing of tools, techniques and artefacts, both for 'big science' and the 'long tail scientist'.

Will this incremental trend continue or are we seeing glimpses of a phase change ahead, where researchers harness these emerging digital capabilities to address research questions in ways that simply were not possible before?

This talk will describe three generations of e-Research, using the myExperiment social website as a lens to glimpse future research practice, and focusing on a web-scale computational musicology project as an illustration of 3rd generation thinking.

Also available from http://wiki.myexperiment.org/index.php/Presentations

Published in: Technology, Education

The Evolution of e-Research: Machines, Methods and Music

  1. 1. The Evolution of e-Research Machines, Methods and Music David De Roure
  2. 2. Maths Physics Medical electronics PhD in distributed declarative programming language design Hypermedia Large scale Distributed Systems Semantic Sensor Networks Web Science Devices Amorphous Computing Digital Social Research Equator e-Science Music Electronics Programming Transputers Temporal Media Computational Musicology Advanced Knowledge Technologies Semantic Web Process Networks myExperiment Web 2 Statistics Grid Linked Data 1981 2010 Environmental sensing Networks VREs MITAJGH PH WH PEOPLEOPLE Agents Semantic Grid e-Laboratories Workflows QBH
  3. 3. Overview Generation 1: Early adopters Generation 2: Embedding Generation 3: Radical sharing SALAMI A case study in 3rd generation e-Research
  4. 4. e-Science • e-Science was defined by John Taylor (Director General of the UK Research Councils) as global collaboration in key areas of science and the next generation of infrastructure that will enable it • e-Science was the name of the destination • It became the name of the journey • When we arrive, the destination is just called science
  5. 5. “e-research extends e-Science and cyberinfrastructure to other disciplines, including the humanities and social sciences.” e-Research http://mitpress.mit.edu/catalog/item/default.asp?tid=12185&ttype=2
  6. 6. 2000 – 2005 Generation 1
  7. 7. ...the imminent flood of scientific data expected from the next generation of experiments, simulations, sensors and satellites Tony Hey and Anne Trefethen Source: CERN, CERN-EX-0712023, http://cdsweb.cern.ch/record/1203203
  8. 8. 26/2/2007 | myExperiment | Slide 8 Jeremy Frey
  9. 9. • Workflows are the new rock and roll • Machinery for coordinating the execution of (scientific) services and linking together (scientific) resources • The era of Service Oriented Applications • Repetitive and mundane boring stuff made easier Carole Goble E. Science laboris
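The idea of a workflow as machinery that coordinates services and links their outputs together can be sketched in a few lines of Python. This is only an illustrative sketch: the three "services" below (`fetch_sequence`, `transcribe`, `count_bases`) are hypothetical local functions standing in for remote scientific services, and nothing here reflects how Taverna, Kepler or any other workflow system mentioned in the talk is actually implemented.

```python
from typing import Any, Callable, Dict, List


# Hypothetical stand-ins for remote scientific services (assumptions,
# not taken from the slides).
def fetch_sequence(gene_id: str) -> str:
    """Look up a DNA sequence for a made-up gene identifier."""
    return {"g1": "ATGGCC", "g2": "ATGTTT"}.get(gene_id, "")


def transcribe(dna: str) -> str:
    """Convert a DNA string to its RNA form."""
    return dna.replace("T", "U")


def count_bases(rna: str) -> Dict[str, int]:
    """Count how often each base occurs."""
    return {base: rna.count(base) for base in sorted(set(rna))}


def run_workflow(steps: List[Callable[[Any], Any]], data: Any) -> Any:
    """Feed the output of each step into the next step, in order."""
    for step in steps:
        data = step(data)
    return data


# The workflow itself is just data: an ordered list of steps that can be
# stored, shared and re-run on a different input.
workflow = [fetch_sequence, transcribe, count_bases]
result = run_workflow(workflow, "g1")
print(result)
```

Because the workflow is a plain list, the same method can be rerun unchanged on another input, which is the kind of reuse the later slides describe.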
  10. 10. Kepler Triana BPEL Taverna Trident Meandre Galaxy
  11. 11. co-shaping co-design co-creation co-constitution co-evolution co-construction co- co-realisation
  12. 12. http://webscience.org
  13. 13. Box of Chemists My Chemistry Experiment CombeChem
  14. 14. CombeChem
  15. 15. empower to equip or supply with an ability; enable service the performance of duties or the duties performed as or by a waiter or servant
  16. 16. Early adopters of tools. Characterised by researchers using tools within their particular problem area, with some re-use of tools, data and methods within the discipline. Traditional publishing is supplemented by publication of some digital artefacts like workflows and links to data. Science is accelerated and practice is beginning to shift to emphasise in silico work. 1st Generation Summary Thanks to Iain Buchan and the chipmunks
  17. 17. 2005 – 2010 Generation 2
  18. 18. • Paul writes workflows for identifying biological pathways implicated in resistance to Trypanosomiasis in cattle • Paul meets Jo. Jo is investigating Whipworm in mouse. • Jo reuses one of Paul’s workflows without change. • Jo identifies the biological pathways involved in sex dependence in the mouse model, believed to be involved in the ability of mice to expel the parasite. • Previously a manual two year study by Jo had failed to do this. Reuse, Recycling, Repurposing Carole Goble
  19. 19. Carole Goble “e-Science is me-Science: What do Scientists want?”, EGEE 2006 “There are these great collaboration tools that 12-year-olds are using. It’s all back to front.” Robert Stevens
  20. 20. “A biologist would rather share their toothbrush than their gene name” Mike Ashburner and others Professor in Dept of Genetics, University of Cambridge, UK
  21. 21. Data mining: my data’s mine and your data’s mine
  22. 22. workflows photos movies slides
  23. 23. mySpace for scientists! Facebook Not too open! too passé!
  24. 24. Open Repositories Researchers Social Networkers Developers Social Scientists
  25. 25. • “Facebook for Scientists” ...but different to Facebook! • A repository of research methods • A community social network of people and things • A Social Virtual Research Environment • A probe into researcher behaviour • Open source (BSD) Ruby on Rails app • REST and SPARQL interfaces, supports Linked Data • Inspiration for: BioCatalogue, MethodBox and SysMO-SEEK myExperiment currently has 4400 members, 236 groups, 1336 workflows, 351 files and 141 packs
  26. 26. http://www.myexperiment.org/
  27. 27. Visits to www.myexperiment.org (Oct 2010) Global collaboration in key areas of science and the next generation of infrastructure that will enable it http://wiki.myexperiment.org
  28. 28. data method
  29. 29. Methods should be first class citizens Celebrate the flux! Let the data flow through the pipelines. Nail down the methods not the data! Towards “Linked Open Methods” Though this be madness, yet there is method in it * Polonius in Hamlet ** Sean Bechhofer in Manchester *** Not the e-Science Envoy * *** ** Data bonanza => Methods bonanza!
  30. 30. It’s not just the data And what other people do with it ...that you never thought of It’s what you do with it that counts
  31. 31. [Diagram: Paul’s Pack / Paul’s Research Object. Items: QTL, Workflow 16, Workflow 13, Common pathways, Results, Logs, Metadata, Paper, Slides. Link labels: feeds into, produces, included in, published in.]
  32. 32. Research Objects enable data-intensive research to be: 1. Replayable – go back and see what happened 2. Repeatable – run the experiment again 3. Reproducible – independent experiment to reproduce 4. Reusable – use as part of new experiments 5. Repurposeable – reuse the pieces in new experiments 6. Reliable – robust under automation 7. Referenceable – citable and traceable The Six Rs of Research Object Behaviours http://blog.openwetware.org/deroure/?p=56
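One way to picture a Research Object is as a named bundle of resources together with labelled links between them, so that a result can be traced back to what produced it. The sketch below is a minimal illustration under that assumption. The class, its method names and the particular links between items are hypothetical; only the item names and the link labels ("feeds into", "produces", "included in") come from the slides, and this is not the myExperiment pack format.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Link labels treated as "upstream" when tracing where something came from
# (an assumption made for this sketch).
UPSTREAM = {"produces", "feeds into"}


@dataclass
class ResearchObject:
    """A named aggregation of resources plus labelled links between them."""
    name: str
    resources: Set[str] = field(default_factory=set)
    relations: List[Tuple[str, str, str]] = field(default_factory=list)

    def relate(self, subject: str, predicate: str, obj: str) -> None:
        """Record 'subject predicate obj' and aggregate both resources."""
        self.resources.update({subject, obj})
        self.relations.append((subject, predicate, obj))

    def trace(self, resource: str) -> List[str]:
        """Walk upstream links backwards to list what a resource depends on."""
        found: List[str] = []
        frontier = [resource]
        while frontier:
            current = frontier.pop()
            for subject, predicate, obj in self.relations:
                if predicate in UPSTREAM and obj == current and subject not in found:
                    found.append(subject)
                    frontier.append(subject)
        return found


# Illustrative links only; the real connections in the diagram are not
# recoverable from the transcript.
ro = ResearchObject("Paul's Pack")
ro.relate("QTL", "feeds into", "Workflow 16")
ro.relate("Workflow 16", "produces", "Results")
ro.relate("Results", "included in", "Paper")

print(ro.trace("Results"))
```

Keeping the links alongside the resources is what makes behaviours such as "Replayable" and "Referenceable" possible: the bundle records not just what exists but how it came to exist.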
  33. 33. Semantically enhanced publication versus Shared digital Research Objects Challenging the mindset of paper-sized chunks
  34. 34. Documents under glass
  35. 35. Projects delivering now. Some institutional embedding. Key characteristic is re-use – of the increasing pool of tools, data and methods across areas/disciplines. Contain some freestanding, recombinant, reproducible research objects. New scientific practices are established and opportunities arise for completely new scientific investigations. Some expert curation. 2nd Generation Summary
  36. 36. 2010 – 2015 Generation 3
  37. 37. 4th Paradigm The Fourth Paradigm: Data-Intensive Scientific Discovery Presenting the first broad look at the rapidly emerging field of data-intensive science http://research.microsoft.com/en-us/collaboration/fourthparadigm/
  38. 38. http://blogs.nature.com/fourthparadigm/
  39. 39. BioEssays, 26(1):99–105, January 2004 Doug Kell
  40. 40. Francois Belleau
  41. 41. “…to discover proteins that interact with transmembrane proteins, particularly those that can be related to neurodegenerative diseases in which amyloids play a significant role” 1) Taverna provenance exposed as RDF 2) myExperiment RDF document for a protein discovery workflow 3) Mocked-up BioCatalogue document using myExperiment RDF data as example 4) Provisional RDF documents obtained from the ConceptWiki (conceptwiki.org) development server 5) An RDF document for an example protein, obtained from the RDF interface of the UniProt web site A Bioinformatics Experiment Scott Marshall Marco Roos
  42. 42. LifeGuide http://www.lifeguideonline.org/ Lucy Yardley
  43. 43. MethodBox http://www.methodbox.org/ Enable cross disciplinary research into Major Public Health problems Ease handling data and sharing results and insights
  44. 44. http://www.galaxyzoo.org/
  45. 45. Arfon Smith http://www.zooniverse.org/
  46. 46. The solutions we'll be delivering in 5 years. Characterised by global reuse of tools, data and methods across any discipline, and surfacing the right levels of complexity for the researcher. Routine use. Key characteristic is radical sharing. Research is significantly data driven – plundering the backlog of data, results and methods. Publishing by the social network. Increasing automation and decision-support for the researcher – the VRE becomes assistive. Curation is autonomic and social. 3rd Generation Summary
  47. 47. Easy and low risk to start Progress to advanced skills For researchers No obligation Go as far as you want Find a service & relax Intellectual ramps Malcolm Atkinson
  48. 48. NRAO/AUI/NSF Datascopes: telescopes for the naked mind Malcolm Atkinson From Signal to Understanding
  49. 49. Jeannette M. Wing COMMUNICATIONS OF THE ACM March 2006/Vol. 49, No. 3 Pages 33-35
  50. 50. 2010 – 2011 and beyond Music and Linked Data
  51. 51. http://www.openarchives.org/ore/terms/aggregates http://eprints.ecs.soton.ac.uk/id/eprint/20817
  52. 52. It’s about enabling the join Ben Fields, 6th October 2010
  53. 53. SALAMI: Structural Analysis of Large Amounts of Music Information David De Roure J. Stephen Downie Ichiro Fujinaga
  54. 54. www.diggingintodata.org
  55. 55. The SALAMI collaboration • DDeR (e-Research South), J. Stephen Downie (Illinois) and Ichiro Fujinaga (McGill) • NCSA donating 250,000 supercomputer hours • 350,000 pieces of music (23,000 hours) – Internet Archive, DRAM, IMIRSEL, McGill • Feature analysis and structural analysis • Music Ontology by Yves Raimond (BBC) • Musicologists from McGill and Southampton • Sharing of analyses http://salami.music.mcgill.ca
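To get a feel for the scale quoted on this slide, the three figures can be combined into a few rough ratios. The sketch below uses only the numbers stated above (350,000 pieces, 23,000 hours of music, 250,000 donated supercomputer hours); the variable names are illustrative, and the ratios are back-of-envelope averages rather than anything reported by the project.

```python
# Figures as stated on the slide.
pieces = 350_000            # pieces of music in the collection
music_hours = 23_000        # total duration of recorded music
compute_hours = 250_000     # supercomputer hours donated by NCSA

# Average length of one piece, in minutes.
minutes_per_piece = music_hours * 60 / pieces

# Compute budget available per hour of audio, and per piece (in minutes).
compute_per_music_hour = compute_hours / music_hours
compute_minutes_per_piece = compute_hours * 60 / pieces

print(f"average piece length: {minutes_per_piece:.2f} minutes")
print(f"compute per hour of music: {compute_per_music_hour:.2f} hours")
print(f"compute per piece: {compute_minutes_per_piece:.1f} minutes")
```

On these figures an average piece is just under four minutes long, and the budget allows roughly eleven hours of computation for every hour of audio.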
  56. 56. Digital Music Collections Crowdsourced ground truth Community Software Linked Data Repositories Supercomputer 23,000 hours of recorded music 250,000 hours NCSA Supercomputer time Music Information Retrieval Community
  57. 57. Ashley Burgoyne http://www.sonicvisualiser.org/
  58. 58. MIREX Overview • Began in 2005 • Tasks defined by community debate • Data sets collected and/or donated • Participants submit code to IMIRSEL • Code rarely works first try • Huge labour consumption getting programs to work • Meet at ISMIR to discuss results Stephen Downie http://www.music-ir.org/mirex
  59. 59. MIREX TASKS Audio Artist Identification Audio Onset Detection Audio Beat Tracking Audio Tag Classification Audio Chord Detection Audio Tempo Extraction Audio Classical Composer ID Multiple F0 Estimation Audio Cover Song Identification Multiple F0 Note Detection Audio Drum Detection Query-by-Singing/Humming Audio Genre Classification Query-by-Tapping Audio Key Finding Score Following Audio Melody Extraction Symbolic Genre Classification Audio Mood Classification Symbolic Key Finding Audio Music Similarity Symbolic Melodic Similarity
  60. 60. Meandre seasr.org/meandre
  61. 61. “Signal” Digital Audio “Ground Truth” Community It’s web-like! Q. If and when should community-generated content be assimilated into managed repositories? Structural Analysis
  62. 62. How country is my country? Kevin Page and Ben Fields http://www.nema.ecs.soton.ac.uk/countrycountry/
  63. 63. Stephen Downie Music and computational thinking
  64. 64. “Again, it [the Analytical Engine] might act upon other things besides number, were objects found whose mutual fundamental relations could be expressed by those of the abstract science of operations, and which should be also susceptible of adaptations to the action of the operating notation and mechanism of the engine...”
  65. 65. “Supposing, for instance, that the fundamental relations of pitched sounds in the science of harmony and of musical composition were susceptible of such expression and adaptations, the engine might compose elaborate and scientific pieces of music of any degree of complexity or extent.” Ada, The Enchantress of Numbers: Poetical Science by Betty Alexandra Toole http://www.well.com/user/adatoole/ Betty Alexandra Toole
  66. 66. I can write a workflow that creates workflows based on those of others, and automatically modify it – think genetic mutation and crossovers. Who owns it? I can register a query over an increasing number and diversity of “linked data” sources to ask new research questions. http://eresearch-ethics.org/ The computer can learn from the activities of 1,000,000 scientists – and be indistinguishable from them? What about the ethics of Citizen Social Science? Of citizens designing experiments?
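The "genetic mutation and crossovers" remark can be made concrete with a small sketch. It assumes, purely for illustration, that a workflow is an ordered list of step names; the step names, the `crossover` and `mutate` helpers and the list of alternative steps are all hypothetical and do not correspond to any real workflow system.

```python
import random
from typing import List


def crossover(parent_a: List[str], parent_b: List[str], cut: int) -> List[str]:
    """Build a child from the start of one workflow and the end of another."""
    return parent_a[:cut] + parent_b[cut:]


def mutate(workflow: List[str], alternatives: List[str],
           rng: random.Random) -> List[str]:
    """Return a copy of the workflow with one randomly chosen step replaced."""
    child = list(workflow)
    position = rng.randrange(len(child))
    child[position] = rng.choice(alternatives)
    return child


# Two workflows written by different (imaginary) researchers.
a = ["fetch", "align", "cluster", "plot"]
b = ["fetch", "filter", "annotate", "report"]

child = crossover(a, b, cut=2)
mutated = mutate(child, ["normalise", "rank"], random.Random(0))

print(child)
print(mutated)
```

Even in this toy form the ownership question from the slide is visible: the child contains steps from both parents plus a machine-chosen change, so no single author wrote it.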
  67. 67. Co-* Methods Access ramps Research Objects Computational thinking Ethics of e-Research at scale
  68. 68. david.deroure@oerc.ox.ac.uk Thanks to: Jeremy Frey & CombeChem; Carole Goble, myGrid and myExperiment; Iain Buchan & Obesity e-Lab; Sean Bechhofer; Doug Kell; Marco Roos; Lucy Yardley; Arfon Smith; Malcolm Atkinson; Stephen Downie, Kevin Page, Ben Fields, Ashley Burgoyne and NEMA/SALAMI; Betty Toole. http://www.myexperiment.org/packs/153
