Codes, Clouds & Constellations: Open Science in the Data Decade


Presentation given at the CNI Meeting, Baltimore in April 2010.

Published in: Technology

  1. 1. Codes, Clouds & Constellations: Open Science in the Data Decade
      Dr Liz Lyon, Director, UKOLN, University of Bath, UK; Associate Director, UK Digital Curation Centre
      CNI Meeting, Baltimore, April 2010
      This work is licensed under a Creative Commons Licence Attribution-ShareAlike 2.0
      UKOLN is supported by:
  2. 2.
      • Scaling to Share
      • Publication and Attribution
      • Pathways to Participation
      • Institutions and Informatics
      • 2010 Perspectives
      • November 2009
      • Consultation
      • eResearch Australasia slides
      • Progress, Prospects?
  3. 3. Scaling to Share: Human Genome printed
  4. 4. From the Laboratory bench....
  5. 5. … to a national crystallography service....
  6. 6. Diamond Light Source
  7. 7.
      • “Bridging the chasm” between the local laboratory bench and large-scale facilities
      • Develop Integrated Information Model
      • Use cases and Inter-disciplinary Pilots
      • Cost-benefit analysis: before and after
  8. 8. Columns: Diamond Light Source | National Crystallography Service (NCS) | Local Earth Sciences Lab, University of Cambridge
      • Function: International service, multiple communities | UK service, multiple institutions; also uses Diamond | Lone researcher at institution; uses NCS and ISIS large-scale facility
      • Administration: Peer-reviewed proposal required | Paper-based records (experiments, safety ERA, instrument time) | Multiple proposals, multiple forms
      • Metadata: Core Scientific MetaData Model | eBank/eCrystals schema | ?
      • Identifiers: Beam-line number | DOI, InChI | ?
      • Workflow: Formulaic and bespoke | Formulaic, unrecorded | Complex, unrecorded
      • Software: In-house scripts | In-house scripts + open-source suite | In-house scripts + open-source suite
      • Raw data: In-house GDA store | ATLAS data-store | Laptop / local server
      • Derived data: Taken offsite on laptop / USB stick | eCrystals repository | Laptop / local server / USB stick
  9. 10. Technology race to market: $1000 genome in <15 minutes, 2013?
  10. 11. deluge challenges....
      • Large-scale data storage that is:
          • Cost-effective (rent on-demand)
          • Secure (privacy and IPR)
          • Robust and resilient
          • Low entry barrier / ease-of-use
          • Has data-handling / transfer / analysis capability
      • Move sequencing out of genome centres
      • “.... analyse an entire human genome in a single day sitting with a laptop at your local Starbucks.”
      services?
  11. 12. clouds in the media
  12. 13. Clients in the cloud
  13. 15. Post-genome decade. Human genomes: >24 published & almost 200 unpublished
  14. 16. “P4 medicine: predictive, personalised, preventive, participatory.” (Leroy Hood, Institute for Systems Biology)
      • Each patient’s genome sequenced
      • Your genome is the basis of your medical record
      • New predictive models of health and disease
      • Individualised treatments focusing on preventative therapies
      Image from Scientific American. Genome scale network biology. Genomic data as a commodity.
  15. 17. Stephen Friend
      • Sage Bionetworks: Integrative genomics
      • Develop predictive models of disease: liver / breast / colon cancer, diabetes, obesity
      • Open data in the Sage Commons
      • Human and mouse: clinical and genetics data
      • Congress San Francisco 23-24 April 2010
  16. 18. They have shared their data….
  17. 19. Heather Piwowar: … but many researchers don’t share … and are reluctant to re-use data …
  18. 20. Publication and Attribution
  19. 21. Calls for action, new metrics
  20. 22. Attribution granularity ... complexity challenges...
      • Journal
      • Article
      • Workflow
      • Data
      • Annotation
      • Concept
      Scale labels: Macro; Micro / Nano
  21. 23. Citing network models
      • Multiple data sources
      • Many standards
      • Workflow integration
      • User requirements
      • Service functionality?
  22. 24. Pathways to Participation
  23. 25. Continuum of Openness (diagram labels): Open access; Closed access; Participation; Lone scholar; Professional, experts; Volunteers, interested amateurs; Citizen science; “dark data”. Creative Commons Attribution-Non-Commercial-Share Alike 2.0
  24. 26. Data Informatics: Logistics dilemma (diagram labels): Professional scientist; Citizens; Capability; Capacity; Data scientists, LIS; Peer production; Volunteers, interested amateurs; Community curation; Observations; Audit; Preservation; Ontologies; Metadata schema; Annotation; Data management plans; Selection & Appraisal; Data cleansing; Training; Visualisation. Creative Commons Attribution-Non-Commercial-Share Alike 2.0
  25. 28. Peer Production
  26. 30. Using gaming to drive curation
  27. 31. Diagram labels: Professional Scientists; Enthusiastic amateurs; Training; Citizen scientist; Standards and ethics; Local: natural history, environ.; Peer-review; Global: astronomy; Organisational support; Self-supporting
  28. 32. Citizen science...
  29. 33. Privacy issues? … “participatory urbanism”?
  30. 34. “You have zero privacy anyway. Get over it” (Scott McNealy, CEO, Sun Microsystems, 1999)
  31. 36. Working with science professionals ...cultural challenges for faculty?
  32. 37. Institutions and Informatics: University of Edinburgh Informatics Forum
  33. 38. Open Science at Web-Scale Report 2009
  34. 39. Institutional response: High Throughput Biology
  35. 40.
      • North Carolina universities
      • Cyber-infrastructure project
      • Data cloud across three campuses
      • “regional”
      • Policy & practice
  36. 41. New data support structures
  37. 42. Facilitating team science
      • Future Chips
      • Biocomputation & Bioinformatics
      • Tetherless World
      • Integrative Systems Biology
      • Graphic designers?
      • Animators?
      • Social scientists?
      • Legal experts?
  38. 45. Embedding data informatics education ...for faculty & LIS...
  39. 46. Take homes
      • Data sharing requires pragmatic solutions
      • Attribution granularity & citation complexity
      • We need “the crowd”
      • Institutional strategies embrace informatics
      • The prospects are transformational ...
  40. 47. Slides will be available at :