1. Data, Librarians, and Services TICER 2010 Dr Andrew Treloar, Director of Technology, ANDS
2. Contents Data – the past and the future; Data and scholarly communications; Data problems in published literature; Why re-use data?; Data sharing services and librarians’ role
3. A historical perspective Data capture and management problems have been with us for a while… But for how long? And what are some of the basic operations?
9. Doomed data http://www.learningcurve.gov.uk/focuson/domesday/take-a-closer-look/ In the vill in which St. Peter’s Church is situated [Westminster] the abbot of the same place holds 13½ hides. There is land for 11 ploughs. To the demesne belongs 9 hides and 1 virgate, and there are 4 ploughs. The villeins have 6 ploughs, and there could be 1 plough more. There are 9 villeins each on 1 virgate and 1 villein on 1 hide, and 9 villeins on each half a virgate and 1 cottar on 5 acres, and 41 cottars who pay 40 shillings a year for their gardens. [There is] Meadow for 11 ploughs, pasture for the livestock of the vill, woodland for 100 pigs, and 25 houses of the abbot’s knights and other men who pay 8 shillings a year. In all it is worth £10; when received, the same; TRE £12. This manor belonged and belongs to the demesne of St. Peter’s Church
11. “A Correct Tide-Table, Shewing the True Times of the High-Waters at London-Bridge, to Every Day in the Year 1683. By Mr. Flamstead” Philosophical Transactions, Vol. 13, (1683), pp. 10-15
12. “An Observation of the Beginning of the Lunar Eclipse which Hapned Aug. 19. 1681. in the Morning, Made on the Island of St. Lawrence or Madagascar, by Mr. Tho. Heathcot, and Communicated by Mr. Flamstead” Philosophical Transactions, Vol. 13, (1683), p. 15
20. Why Data? Why Now? We are in an era of increasingly data-intensive research. Almost all data is now born digital. An increasing amount of data is generated (semi-)automatically. “Consequently, increasing effort and therefore funding will necessarily be diverted to data and data management over time” (Towards the Australian Data Commons (TADC), p. 4)
21. Need for standardisation Software and hardware keep getting cheaper; wetware (human labour) keeps getting more expensive. Fixing data management problems is enormously labour-intensive and costly. “Consequently, standardisation within forms of data and simplification in the frameworks around retention, storage, access and use of data, and the elimination of differences whose resolution requires labour, must be made, if the on-going keeping and reuse of data is to remain affordable” (TADC, p. 5)
22. Bringing data together With more data online, more can be done. It is now possible to answer questions unrelated to the reasons why the data was originally collected. There is an increasing focus on cross-disciplinary science. “Consequently greater clarity is needed over control and access to community-funded data, and the means of aggregating, federating and accessing such data are increasingly important” (TADC, p. 5)
23. Why re-use data? Efficiency; Validation; Integrity; Value for money; Self-interest ands.org.au
24. Astronomy case study The Hubble Space Telescope (HST) has been operating since 1990. Observations are proposed and, if accepted, data is collected and made available to the proposers, who then write a research paper. Each year around 1,000 proposals are reviewed and approximately 200 are selected, for a total of 20,000 individual observations. Data is stored at the Space Telescope Science Institute and made available after an embargo period. There are now more research papers written from “second use” of the research data than from the use initially proposed.
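The proposal figures on this slide imply two simple ratios. A minimal sketch of that arithmetic, using only the approximate numbers quoted above and assuming the 20,000 observations belong to the same yearly batch of roughly 200 selected proposals (the variable names are illustrative, not from the talk):

```python
# Approximate annual figures quoted on the slide (rounded, not exact counts).
proposals_reviewed = 1_000
proposals_selected = 200
observations_total = 20_000  # assumed to correspond to the selected proposals

# Share of reviewed proposals that are selected.
acceptance_rate = proposals_selected / proposals_reviewed

# Average number of individual observations per selected proposal.
observations_per_proposal = observations_total / proposals_selected

print(f"Acceptance rate: {acceptance_rate:.0%}")
print(f"Observations per selected proposal: {observations_per_proposal:.0f}")
```

On these rounded figures, about one proposal in five is selected, and each selected proposal accounts for around 100 observations on average.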
26. Cancer micro-array trial case study Piwowar et al., “Sharing Detailed Research Data Is Associated with Increased Citation Rate” http://www.plosone.org/article/info:doi/10.1371/journal.pone.0000308 The study looked at the citation history of cancer microarray clinical trial publications. It found that publicly available data was associated with a 69% increase in citations, independently of journal impact factor, date of publication, and author country of origin.
27. Climate proxy data case study The southern limit of whaling is constrained by sea ice, and since 1931 whaling records have been collected for every whale caught. Analysis indicates that the Antarctic summer sea-ice edge moved southwards by 2.8° of latitude between the mid-1950s and early 1970s. This suggests a decline in the area covered by sea ice of some 25%. (Nature, Vol 389, 4 Sep 1997, pp. 57-60)
33. Key differentiators Nationally co-ordinated approach. Institutionally-focussed engagement: “helping them meet their research data ambitions”. But also engaging with large nationally-funded discipline investments. Bulk of funds spent outside ANDS. All disciplines covered. Focus on data re-use.
34. What does ANDS provide? Resourcing community infrastructure projects institutional change funding for data activities, infrastructure development Online Services ANDS infrastructure to support data registration, identification, publication, classification, etc Expertise and information consultancy recommendations, policy, and advice capability building information sharing, sharing experience Policy advocacy
38. Plan Planning is important to get research institutions, managers and researchers to think about issues early, and to make sure steps aren’t missed. Librarians’ role: provide leadership and advice; develop policies, procedures, planning guidelines; get these adapted, implemented and used in institutions; advocate for, and promote, best practice examples.
39. Create (or Capture) Adding metadata is done most cheaply and effectively as close as possible to the point of creation/capture (Treloar/Wilkinson, DOI 10.1109/eScience.2008.41). Librarians’ role: advise researchers/technologists on appropriate standards and metadata schemas; assist with metadata content quality.
40. Store Needs to be done well, on institutionally-supported system/s. Librarians’ role: work with researchers to identify appropriate solutions; partner with IT to ensure availability of solutions; use metadata expertise to ameliorate poor metadata management in data store solutions; raise awareness of the risks of inappropriate solutions.
41. Describe Four kinds of information needed for re-use: information for discovery; information for determination of value; information for access; information for re-use. Librarians’ role: use metadata expertise to assist researchers and technologists with metadata standards.
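The four kinds of information can be pictured as four slots in a description record. The following is only an illustrative sketch: the class name, field names, and sample values are hypothetical and do not come from any ANDS schema or metadata standard.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ReuseDescription:
    """One slot per kind of information on the slide (hypothetical field names)."""
    discovery: Optional[str] = None  # helps someone find the dataset
    value: Optional[str] = None      # helps them judge whether it is worth using
    access: Optional[str] = None     # tells them how to obtain it
    reuse: Optional[str] = None      # tells them how to interpret and re-use it


def missing_kinds(description: ReuseDescription) -> List[str]:
    """Return the names of the kinds of information that are still empty."""
    return [f.name for f in fields(description) if not getattr(description, f.name)]


# A made-up record that covers discovery and access only.
record = ReuseDescription(
    discovery="Title, creator, subject keywords",
    access="Contact the data store manager",
)
print(missing_kinds(record))
```

A check like this could flag descriptions that are findable but not yet re-usable, which is where metadata expertise is most needed.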
42. Identify Persistent identifiers for data provide a level of indirection to assist with long-term access. The DataCite consortium was formed in 2009 to assign DOIs to data objects. Librarians’ role: advise researchers on how to cite data (and make it available for citing); lobby authors of style guides and bibliography management systems.
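The "level of indirection" can be sketched as a lookup table that sits between a stable identifier and the current location of the data. This is a toy, in-memory illustration only: the `IdentifierRegistry` class, the example identifier, and the example.org URLs are all hypothetical, and real DOI resolution is handled by the resolver at https://doi.org/.

```python
DOI_RESOLVER = "https://doi.org/"


def doi_url(doi: str) -> str:
    """Build a resolver link for a DOI (assumes no characters needing URL-encoding)."""
    return DOI_RESOLVER + doi


class IdentifierRegistry:
    """Toy registry: the identifier stays fixed while its target location may change."""

    def __init__(self) -> None:
        self._locations = {}

    def register(self, identifier: str, location: str) -> None:
        # Registering again under the same identifier updates the location.
        self._locations[identifier] = location

    def resolve(self, identifier: str) -> str:
        return self._locations[identifier]


registry = IdentifierRegistry()
registry.register("10.9999/example-dataset", "https://old-store.example.org/data/42")

# The data moves to a new store; citations keep using the same identifier.
registry.register("10.9999/example-dataset", "https://new-store.example.org/datasets/42")

print(registry.resolve("10.9999/example-dataset"))
print(doi_url("10.1109/eScience.2008.41"))  # DOI cited on the Create slide
```

Because citations point at the identifier rather than the location, only the registry entry has to change when data is migrated.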
43. Register Metadata about data can be made available to registries for discovery and re-use. OAI-PMH may be available; DC (Dublin Core) is probably less useful for data than for documents. Librarians’ role: help data infrastructure folks to identify and feed appropriate registries; investigate use of IRs (institutional repositories) for data and maintain feeds to registries.
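Where a data store does expose OAI-PMH, a registry harvests it with plain HTTP requests. A minimal sketch of building a `ListRecords` request follows; the `verb`, `metadataPrefix`, and `set` parameters and the `oai_dc` prefix are defined by the OAI-PMH protocol, while the base URL and the function name are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optional selective harvesting by set, if the repository supports sets.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint, used only for illustration.
print(list_records_url("https://repository.example.org/oai"))
```

The default `oai_dc` prefix requests unqualified Dublin Core, the format every OAI-PMH repository must support; a data registry would likely ask for a richer metadata format where one is offered, for the reason given on the slide.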
44. Discover Range of discovery options (web search engines, metadata aggregators, discipline-specific). Librarians’ role: help data store managers to identify the right discovery systems to feed.
45. Access Multiple options: direct link to open-access data; link to data store with its own access controls; register/login only for open access (DANS); register/login for restricted access; contact information for how to get data. Librarians’ role: advise researchers on IP and rights issues; ensure data is managed appropriately in data store.
46. Exploit Focused on the kinds of things that can be built on top of data once it is re-usable: mashups; data fusion; cross-disciplinary discovery and visualisation. Librarians’ role: assist researchers to locate relevant data; provide advice about 3rd party copyright/IP issues and licensing for data.
47. Preserve Could have chosen Curate (but this is bigger) or Migrate (but this is a means not an end). Will require engagement with storage service providers. Librarians’ role: provide expertise in long-term preservation and curation of objects; partner with technologists and archivists to combine all relevant expertise.
49. Conclusion Data is becoming steadily more important for research. Research results need to be communicated. Data is the next great challenge for scholarly communication. And so it should be the next great challenge for libraries. Over to you!
50. Acknowledgements and Links Thanks to Cathrine Harboe-Ree (University Librarian) and Sam Searle (Data Management Coordinator), Monash University. ANDS Web Site: http://ands.org.au/ ANDS Services site: http://services.ands.org.au/ Me: andrew.treloar.net
Editor's Notes
7,000 BCE
Bullae – 4,000 BCE
Tablets – 2900 BCE
Note: sign like 7 is a full stop. Numbers are in roman numerals
1665 + 350 = 2015
18 years after journal founded
Actual scientific observations
Illustrated from the journal I showed the cover of: Philosophical Transactions of the Royal Society A
Need to retype
Near impossible to liberate. Talk about ChemXSeer example if time
Too transformed
Scientist may know how to get these data but I don’t
Only journal like this I know. Anecdotal evidence that it is hard to get negative papers published
Efficiency: don’t reinvent the wheel. Validation: repeatability of research. Integrity: of the scholarly record. Value for Money: public money funded it, so it should be available to the public (ClimateGate!). Self-interest: sharing with a future self.