The Past, Present and Future of data

Opening keynote for NZ eResearch Symposium 2010. http://www.eresearch.org.nz/. Discusses the past, present and future of data.

Usage Rights

CC Attribution-NonCommercial License

  • Start with a question. What is the difference between these?
  • And these?
  • Thanks to machines like these, we now know that at the genetic level
  • It’s only 1% of this. But that’s just genetics. What about culture?
  • We now know that a range of species (including crows!) are tool users, and they pass on particular techniques. Think of this as a chimpanzee tutorial…
  • But this sort of transmission of culture doesn’t transcend either time or space. You need to be in the same time and place to learn.
  • For our species one of the big breakthroughs was the development of language. This now allowed for easier transmission than show and tell, but still didn’t address the time and space problem. So, where am I going with this? To data of course…
  • These are data from 7,000 BCE. Each token is a particular value. Initially they were used on their own (a bit like coins today).
  • Then around 4,000 BCE we see the emergence of these: bullae. Explain: seal (identify), signs for what is traded, contents as tokens. Essentially the first written contracts.
  • To avoid having to literally break the contract to see what numbers it contained, the next step was to provide a representation on the outside. Then in 2900 BCE some genius made the crucial conceptual leap: if we have the numbers in symbolic form on the outside, do we need them in physical form on the inside? Answer: No, and so we get clay tablets. And those strange marks next to the numbers? The very beginnings of pictographic writing…
  • And then very quickly, the first libraries. I don’t have time to cover the entire history of writing, but just want to make the point that writing came from the need to capture and manage data. Or to put it another way, much of what we regard as civilisation started with accounting. Any accounting graduates in the audience?
  • So, let’s fast forward about 45 centuries to the present and look at the state of data in scholarly communication. Unfortunately, it’s inconvenient, imprisoned, invisible, inaccessible, and ignored.
  • Need to retype
  • Near impossible to liberate. Talk about ChemXSeer example and DataThief Java application
  • Too transformed
  • Discipline scientist may know how to get these data but I don’t
  • Only journal like this I know. Anecdotal evidence that it is hard to get negative papers published. All of the above problems are really about difficulties in getting to the data so it can be re-used. But why would you want to re-use data?
  • NOTE: Some of these arguments are at individual, national, global level.
    Efficiency: don’t reinvent the wheel.
    Validation: repeatability of research.
    Integrity: of the scholarly record.
    Value for money: public money funded it, so it should be available to the public (ClimateGate!).
    Self-interest: sharing with a future self, greater visibility.
    So, what are some good stories around data sharing?
  • Hubble Space Telescope (HST) operating since 1990.
    Observations are proposed, and if accepted, data is collected and made available to the proposers, who then write a research paper.
    Each year around 1,000 proposals are reviewed and approximately 200 are selected, for a total of 20,000 individual observations.
    Data is stored at the Space Telescope Science Institute and made available after an embargo period.
  • GO = General Observation program; AR = Archival Reuse
  • From Wikipedia: “A DNA microarray is a multiplex technology used in molecular biology. It consists of an arrayed series of thousands of microscopic spots of DNA oligonucleotides, called features, each containing picomoles (10⁻¹² moles) of a specific DNA sequence, known as probes (or reporters). These can be a short section of a gene or other DNA element that are used to hybridize a cDNA or cRNA sample (called target) under high-stringency conditions. Probe-target hybridization is usually detected and quantified by detection of fluorophore-, silver-, or chemiluminescence-labeled targets to determine relative abundance of nucleic acid sequences in the target. Since an array can contain tens of thousands of probes, a microarray experiment can accomplish many genetic tests in parallel. Therefore arrays have dramatically accelerated many types of investigation.”
  • Heather Piwowar looked at the citation history of cancer microarray clinical trial publications. Found that publicly available data was associated with a 69% increase in citations, independently of journal impact factor, date of publication, and author country of origin.
  • Climate researchers need to be able to run their models forward (forecasting) and backwards (backcasting) to check they are correct. The southern limit of whaling is constrained by sea ice, and since 1931 whaling records have been collected for every whale caught. This paper took these records and used them as a proxy for the position of the sea-ice edge. His analysis indicates that the Antarctic summer sea-ice edge moved southwards by 2.8° of latitude between the mid 1950s and early 1970s. This suggests a decline in the area covered by sea ice of some 25%.
  • Number of initiatives around the world working to do a better job on data: NSF DataNet (Bill at end of conference), JISC Managing Research Data, NL SURF/DANS
  • I want to talk about one from New Zealand’s West Island…
  • So, how are we doing this? We’ve got a whole series of programs of activity, but one way to visualise the infrastructure that is needed is to distinguish…
  • The current picture for Australian (and other) research data. From…
  • The components that ANDS is adding to produce the ARDC
  • So, if that is a partial view of the present (Bill will tell you more tomorrow, I’m sure), what about the future?
  • Talk about how ANDS was a founding member of DataCite. TIB in Germany was another and is providing the data DOIs for this example.
  • So, to conclude:
    The need to manage data is not just a modern problem: it drove crucial developments in Western civilisation nearly 9,000 years ago.
    For most of the last two hundred years, data has largely been the neglected stepdaughter in scholarly communication, eclipsed by its more glamorous sister the journal article. And I’ve reviewed some of the attendant problems arising from this.
    Two things are driving a change in this approach: the shift to more data-intensive research and growth in information systems that can better manage and make available the underlying data.
    I showed you some of the bits of the future that are starting to appear: forerunners of the way the research world might look for many disciplines in the next 10-20 years.
    Or to put it another way, data is what helped to make it possible to go from this <click>.
  • Thanks to all those who made their images available under CC licensing for re-use. [click]
  • And thank you for the opportunity to speak to you this morning.

The Past, Present and Future of data: Presentation Transcript

  • The Past, Present and Future of Data
    Dr Andrew Treloar
    Director of Technology
    Australian National Data Service
  • The Past
  • The Present
  • Inconvenient data
    DOI: 10.1098/rsta.2005.1569
  • Imprisoned data
    DOI 10.1098/rsta.2006.1793
  • Invisible data
    DOI 10.1098/rsta.2006.1793
  • Inaccessible data
  • Ignored negative data
  • Why re-use data?
    Efficiency
    Validation
    Integrity
    Value for money
    Self-interest
  • Cancer Micro-array trials
  • Piwowar et al., “Sharing Detailed Research Data Is Associated with Increased Citation Rate”
    http://www.plosone.org/article/info:doi/10.1371/journal.pone.0000308
  • Climate Archæology
    de la Mare, William K., 1997, "Abrupt mid-twentieth-century decline in Antarctic sea-ice extent from whaling records", Nature, vol.389, pp 87-90, 4 Sept 97
  • ANDS
  • Australian National Data Service (ANDS)
    • An initiative of the Australian Government being conducted as part of the National Collaborative Research Infrastructure Strategy ($A24M) and the Super Science Initiative ($A48M)
    • A collaboration between Monash University, the Australian National University and CSIRO
    • Nearly 50 staff, funded to mid 2013
    • More researchers re-using more data more often
    • Data as a first-class object
    ands.org.au
  • Unmanaged
  • Managed
  • Disconnected
  • Connected
  • Invisible
  • Findable
  • Single use
  • Reusable
  • The Future
  • “The future is already here – it’s just not very evenly distributed”
    William Gibson
  • Create: Open Notebook Science
    http://usefulchem.wikispaces.com/malaria
    http://www.ourexperiment.org/racemic_pzq
    http://www.infotoday.com/it/sep10/Poynder.shtml
  • Describe, Store: PIC Cloud Demo
    http://www.polarcommons.org/
    http://piccloud.arcs.org.au/piccloud/
  • Discover, Access: RDA Demo
    http://www.google.com/
    http://services.ands.org.au/pages/
  • Identify: Journal Demo
    http://dx.doi.org/10.1016/j.yqres.2010.04.004
    “Elsevier and PANGAEA (Publishing Network for Geoscientific & Environmental Data) announced their next step in interconnecting the diverse elements of scientific research. Elsevier articles at ScienceDirect are now enriched with graphical information linking to associated research data sets that are deposited at PANGAEA. This enrichment functionality offers a blueprint of how Elsevier would like to work with data set repositories all over the world [emphasis added].”
    http://newsbreaks.infotoday.com/Digest/Elsevier-Enriches-Articles-With-Research-Data-Sets-69148.asp
  • Conclusion
  • 2001
    http://www.youtube.com/watch?annotation_id=annotation_701469&v=TSW69UwxKbU&feature=iv
    5:04 through 6:00
  • Acknowledgements
    http://www.flickr.com/photos/shashwat/1215492062
    http://www.flickr.com/photos/carbonnyc/3160378286
    http://jpkc.fimmu.com/sfzx/new/Upload/20091024163545503.JPG
    NASA/courtesy of nasaimages.org
    http://www.pri.kyoto-u.ac.jp/press/20090716/bossou_chimpanzee_stone-tool_use.jpg
    http://www.flickr.com/photos/13238706@N00/136830103/
    http://www.flickr.com/photos/mplemmon/215790292/
    http://www.utexas.edu/features/archive/2003/vase.html
    http://www.flickr.com/photos/steveharris/84026155/
    Clip of 2001 shown in accordance with section 47(2) of Copyright Act 1994 No 143 (as at 07 July 2010)
    http://legislation.govt.nz/act/public/1994/0143/latest/DLM345972.html#DLM345972
  • Questions/Links
    ands.org.au
    services.ands.org.au
    andrew.treloar@ands.org.au
    andrew.treloar.net