Data management: international challenges, national infrastructure, and institutional responses


Presentation delivered to UKOLN on April 1, 2011.

  • So, let’s look at the state of data in scholarly communication. Unfortunately, it’s inconvenient, imprisoned, invisible, inaccessible, and incomprehensible.
  • Need to retype.
  • Nearly impossible to liberate. Talk about the ChemXSeer example and the DataThief Java application.
  • Too transformed.
  • A discipline scientist may know how to get these data, but I don’t.
  • NOTE: Some of these arguments apply at the individual level, some at the national level, and some at the global level.
  • Efficiency for the researcher: don’t reinvent the wheel.
  • Validation: repeatability of research.
  • Integrity: of the scholarly record.
  • Value for money for the funder: public money funded it, so it should be available to the public (ClimateGate!).
  • Self-interest: sharing with a future self, greater visibility, more citations.
  • So, what are some good stories around data sharing?
  • A number of initiatives around the world are working to do a better job with data: NSF DataNet (Sayeed/Bill later in the conference), JISC Managing Research Data, and SURF/DANS in the Netherlands.
  • I’m going to take a programmatic view (because that explains how we are funding things), while recognising that the issues don’t necessarily fit neatly inside those boundaries.
  • And thank you for the opportunity to speak to you this afternoon.

    1. Data Management: International Challenges, National Infrastructure, and Institutional Responses (an Australian Perspective)
        Dr Andrew Treloar
        Director of Technology
        Australian National Data Service
    2. International Challenges
    3. Inconvenient data
        DOI: 10.1098/rsta.2005.1569
    4. Imprisoned data
        DOI: 10.1098/rsta.2006.1793
    5. Invisible data
        DOI: 10.1098/rsta.2006.1793
    6. Inaccessible data
    7. Incomprehensible data
    8. Summary
        Not a first-class object
        Unmanaged
        Disconnected
        Unfindable
        Unreusable
    9. Why re-use data?
        Efficiency
        Validation
        Integrity
        Value for money
        Self-interest
    10. Astronomy case study
        The Hubble Space Telescope (HST) has been operating since 1990
        Observations are proposed and, if accepted, the data are collected and made available to the proposers, who then write a research paper
        Each year around 1,000 proposals are reviewed and approximately 200 are selected, for a total of 20,000 individual observations
        The data are stored at the Space Telescope Science Institute and made available after an embargo period
        More research papers are now written from “second use” of the data than from the use originally proposed
    11. Source:
    12. Cancer microarray trial case study
        Piwowar et al., “Sharing Detailed Research Data Is Associated with Increased Citation Rate”
        Looked at the citation history of cancer microarray clinical trial publications
        Found that publicly available data was associated with a 69% increase in citations, independent of journal impact factor, date of publication, and author country of origin
    13. Alzheimer’s Disease Neuroimaging Initiative
        A collaborative effort to find brain biomarkers for Alzheimer’s disease
        Key: all brain scans and other data are freely available to the scientific community without embargo
        Over 3,000 full downloads and 1 million scan downloads by more than 400 investigators worldwide
        Over 100 publications
        Institut Douglas, CC BY-NC-ND
    14. National Infrastructure
    15. National approaches
        A number of countries: UK, US, DE, NL
        Different environments lead to different ecosystems, and so to some local trade-offs
        But some common themes are emerging:
            Do the things that only you can do
            Be the ‘voice for data’
            Prime the pump
    16. Australian National Data Service
        An initiative of the Australian Government, conducted as part of the National Collaborative Research Infrastructure Strategy ($A24M) and the Super Science Initiative ($A48M)
    17. A collaboration between Monash University, the Australian National University, and CSIRO
    18. Nearly 50 staff, funded to mid-2013
    19. More researchers re-using more data more often
    20. Data as a first-class object
    21. ANDS is enabling the transformation of data into collections:
        Unmanaged -> Managed
        Disconnected -> Connected
        Invisible -> Findable
        Single use -> Reusable
        so that Australian researchers can easily discover, access, and re-use data
    22. Defining characteristics of ANDS
        Building national services
        Engaging with institutions, not researchers (mostly)
        Working within funding constraints
        Use, not amount!
        Building the Australian Research Data Commons
    23.
    24. ANDS Programs
        Frameworks and Capability
        Seeding the Commons
        Data Capture
        Metadata Stores
        ARDC Core
        Public Sector Data
        Applications
    25. Spending profile
    26. Research Data Australia (RDA) demo
    27. Institutional Responses
    28. Driven by the Australian Code for the Responsible Conduct of Research
        The equivalent of UKRIO’s Code of Practice for Research: Promoting good practice and preventing misconduct
        Takes significant time to get accepted
        ANDS providing models of good practice
        Seeding the Commons
        U->M
        Data management policy and planning
    29. Retrospective data description
        Different selection mechanisms
        Seeding the Commons
        U->M
        Fixing the past
    30. Improving internal CRIS systems
        Better integration
        Moving beyond publications
        Better links to data collection descriptions
        Seeding the Commons, Metadata Stores
        D->C
    31. Facilitating easier and better capture of data and metadata from selected ‘instruments’
        Making the right thing easier
        Improving the quality of metadata
        Data Capture
        U->M
        S->R
        Fixing the future
    32. Describing institutions’ research data assets
        A series of Metadata Stores rollouts plus some ancillary activity
        Metadata Stores, Seeding the Commons, Data Capture
        D->C
        I->F
    33.
    34. Ongoing Issues
    35. Country-Institution-Discipline
        Who wins?
        Who should win?
    36. Sustainability, sustainability, sustainability…
        Institutional activity
        National services/resources
        Developed software
    37. Priming the pump, or continuing to pump?
        If institutions/researchers/disciplines don’t care, why should the funders?
        Role of Government
    38. Questions/Links
        @atreloar