Evolution or revolution? The changing data landscape



Presentation given by Liz Lyon (UKOLN) at the 1st DCC Regional Roadshow, Bath, UK in November 2010.

  • CATH database (http://www.cathdb.info/). It is about protein structure classification, so it is quite niche.
    SCOP (http://scop.mrc-lmb.cam.ac.uk/scop) is the other protein structure classification database. Its information is manually curated by Alexey Murzin and his group at the MRC Laboratory of Molecular Biology in Cambridge. Its web page still looks like it belongs in the late 90s.
    ChemSpider was originally a homemade project but was bought by the Royal Society of Chemistry when it realised what a valuable resource it was to chemists.
    Massive centralisation – clouds and curated core facilities.
    Massive decentralisation – sticks, spreadsheets, wikis.

    1. A centre of expertise in digital information management, www.ukoln.ac.uk
       UKOLN is supported by:
       Evolution or revolution? The changing data landscape
       Dr Liz Lyon, Director, UKOLN, University of Bath, UK; Associate Director, UK Digital Curation Centre
       1st DCC Regional Roadshow, Bath, November 2010
       This work is licensed under a Creative Commons Licence Attribution-ShareAlike 2.0
    2. “Data sets are becoming the new instruments of science” Dan Atkins, Univ Michigan
    3. Digital data as the new special collections? Sayeed Choudhury, Johns Hopkins
    4. Research data: institutional crown jewels? http://www.flickr.com/photos/lifes__too_short__to__drink__cheap__wine/4754234186/
    5. Perspectives
       • Environmental scan
         – Scale and complexity
         – Infrastructure
         – Open science
       • Policy
         – Funders
         – Institutions
         – Ethics & IP
       • Practice Challenges
         – Storage
         – Incentives
         – Costs & Sustainability
       http://www.flickr.com/photos/thegreenalbum/3997609142/
    6. Big science
    7. PDB, GenBank, UniProt, Pfam; Spreadsheets, Notebooks; Local, Lost; High-throughput experimental methods; Industrial scale; Commons-based production; Public data sets; Cherry-picked results; Preserved; CATH, SCOP (Protein Structure Classification); ChemSpider; Data collections. Slide: Carole Goble
    8. Structural Sciences Infrastructure
    9. Infrastructure Roadmap: Cross Organisations
    10. Infrastructure Roadmap: Cross Disciplines
    11. Infrastructure Roadmap: Open Science
    12. Open Laboratories
    13. Citizen as scientist
        • Faculty work with public
        • Smartphone apps facilitation
        • Societal benefits
    14. Validate results data
    15. Policy
    16. INCREMENTAL Project: Institutional perspective
        • Creating & organising data
        • Storage and access
        • Back-up
        • Preservation
        • Sharing and re-use
        The majority of people felt that some form of policy or guidance was needed...
    17. Jeff Haywood, RDMF V, October 2010 http://www.dcc.ac.uk/sites/default/files/documents/RDMF/RDMF5/Haywood.pdf
    18. “While many researchers are positive about sharing data in principle, they are almost universally reluctant in practice. ... using these data to publish results before anyone else is the primary way of gaining prestige in nearly all …”
        INCREMENTAL Project: “Data sharing was more readily discussed by early career researchers.”
    19. Data is headline news: “In our view, CRU should have been more open with its raw data…” JISC FoI FAQ
    20. P4 medicine: Predictive, Personalised, Preventive, Participatory. Leroy Hood, Institute for Systems Biology. Your genome is the basis for your medical record.
    21. Open data and ethics
        • Direct-to-Consumer kits
        • Informed consent?
        • Privacy?
        • UC Berkeley initiative
        • Implications for HE students & staff?
    22. Policy Gaps...
        • Is Policy disconnected from Practice?
          – Data Sharing
          – Data Licensing
          – Ethics and Privacy
          – Citizen Science & Public Engagement
          – Data Storage, Selection & Appraisal
          – Data Citation and Attribution
    23. “Departments don’t have guidelines or norms for personal back-up and researcher procedure, knowledge and diligence varies tremendously. Many have experienced moderate to catastrophic data loss” Incremental Project Report, June 2010 http://www.flickr.com/photos/mattimattila/3003324844/
    24. Data storage...
        The case for cloud computing in genome informatics. Lincoln D Stein, May 2010
          – Scalable
          – Cost-effective (rent on-demand)
          – Secure (privacy and IPR)
          – Robust and resilient
          – Low entry barrier / ease-of-use
          – Has data-handling / transfer / analysis capability
        • Cloud services?
    25. Your data in the cloud
    26. Incentivising data management
    27. Sustainability: Who owns? Who benefits? Who selects? Who preserves? Who pays?
    28. KRDS
    29. Chicago Mart Plaza, 6-8 December 2010. Thank you…