The ANU Data Commons

    enabling e-research
What does it mean?


● Building a solution to store research data so that it can be
   ○ looked after
   ○ found again and shared
   ○ searched for
   ○ reused

● In short: a research data management solution, also known as a data repository
But don't we have a repository already?


We do.

The digital collections site allows researchers to deposit copies of journal
papers. The current project aims to do the same for research data.

Each dataset will have a unique and persistent identifier as an aid to
citation.
Yes, but why?


Researchers overseas are increasingly being required to make their data
available in order to support their mainstream publications.

In the US, the NSF now requires researchers to submit a data
management plan as part of their grant submission.

In the UK, JISC is funding a range of initiatives to build data archives
both within institutions and across disciplines.
The Galathea story ...
 A research institute in Denmark mounted three oceanographic
 surveys: in 1845, 1950 and 2007.

 We still have all the 1845 and 1950 data, as it was written in
 paper lab books.

 Most of the 2007 data has been lost because it was stored on
 people's laptops rather than in a central archive...
The ANU Data Commons


The aim is to put in place the building blocks of a data management
strategy for the ANU:

 ● A repository
    ○ based on Fedora Commons
    ○ uses standard technologies
 ● A supporting policy framework
The ANU Data Commons


● Built on the back of two ANDS-funded projects
   ○ Seeding the Commons
        ■ Identify existing datasets, including orphan and legacy datasets
          and publish descriptions of them in Research Data Australia
        ■ Descriptions are effectively electronic catalogue cards
   ○ Data Capture
        ■ Build workflows and mechanisms in Earth Sciences, Optical
          Astronomy, Phenomics, and Digital Humanities to capture
          research data as it is generated and publish it
Seeding the Commons


Aim: to be as self-service as possible, built around the concept of self-deposit:

 ● user identifies themselves (logs in)
 ● creates a project description as free text, plus other information
 ● uploads data (aim is to be as simple as YouTube or Flickr)
 ● record is published to Research Data Australia
 ● data can be searched for and found again

Dataset owners can modify objects they have created, for example to add a
second results file from a re-run of a particular experiment.
Data Capture Deposit model


Aim: to be as self-service as possible, by setting up automated capture
mechanisms for instruments while leveraging the Seeding the Commons
architecture

 ● User creates a project
 ● Enters information about the project including links to related
   documents
 ● Uploads data
 ● Publishes data

Dataset owners can modify objects they have created, for example to add a
second results file from a re-run of a particular experiment.
Data Capture


Basically more of the same:

 ● data goes to short term store
 ● is processed
 ● is uploaded

and it can all be automated via
a quasi drop-box solution!
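As an illustration only, the short-term store, processing and upload steps can be sketched as a small Python routine that empties a drop directory. Everything here is an assumption rather than the ANU implementation: the function names `process_drop_box` and `sha256_of` are hypothetical, "processing" is reduced to computing a SHA-256 checksum, and "uploading" is simulated by moving each file into a local repository directory.

```python
"""Minimal sketch of a 'quasi drop box' capture step.

Hypothetical assumptions (not the ANU Data Commons code):
- instruments write files into a local drop directory (the short-term store);
- "processing" is just computing a SHA-256 checksum for each file;
- "uploading" is simulated by moving the file into a repository directory.
"""
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def process_drop_box(drop_dir: Path, repo_dir: Path) -> dict[str, str]:
    """Move every file from drop_dir into repo_dir.

    Returns a manifest mapping each file name to its checksum.
    """
    repo_dir.mkdir(parents=True, exist_ok=True)
    manifest: dict[str, str] = {}
    for item in sorted(drop_dir.iterdir()):
        if not item.is_file():
            continue  # this simple sketch ignores subdirectories
        checksum = sha256_of(item)                    # "is processed"
        shutil.move(str(item), repo_dir / item.name)  # "is uploaded"
        manifest[item.name] = checksum
    return manifest
```

In practice a routine like this would be run on a schedule (for example from cron), so that instruments only ever need to write files into the drop directory.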
Data Capture - not just data
Data needs context.

Context is metadata, and includes things like the instrument
configuration, settings and so on.

Some formats, e.g. WAV for audio and FITS for astronomy, contain a lot of
this information in the file by default: embedded metadata.
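To show what embedded metadata looks like in practice, the sketch below uses Python's standard-library `wave` module to read the technical details a WAV file carries in its own header. The function name `wav_metadata` and the output field names are hypothetical choices for illustration; FITS headers would need a separate reader such as `astropy.io.fits`, which is not shown here.

```python
import io
import wave


def wav_metadata(source) -> dict:
    """Read the technical metadata embedded in a WAV file's header.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "frames": frames,
            "duration_s": frames / rate,
        }


if __name__ == "__main__":
    # Build one second of mono, 16-bit silence in memory, then read it back.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(8000)
        out.writeframes(b"\x00\x00" * 8000)
    buf.seek(0)
    print(wav_metadata(buf))
```

No separate description file is needed for these fields: they travel with the data, which is exactly why such formats are convenient for deposit.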
What does it mean for me?


 ● Able to deposit existing research datasets
    ○ know that they are stored securely
    ○ can be read again, despite changes in file formats and storage media
    ○ most important - can be reused

Access can be:
 ● Open - anyone can download and access it
 ● Embargoed - access is restricted until a particular date
 ● Restricted - people have to ask for access to the data. Valid reasons
   for restricting access include cultural, ethical and commercial issues
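The three access levels can be expressed as one simple rule. This is a hypothetical sketch, not the Data Commons access-control code: the `Access`, `Dataset` and `can_download` names are invented for illustration, and it assumes that an embargo lifts on the stated date and that a restricted dataset becomes available to a requester only once the owner approves their request.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Access(Enum):
    OPEN = "open"
    EMBARGOED = "embargoed"
    RESTRICTED = "restricted"


@dataclass
class Dataset:
    title: str
    access: Access
    embargo_until: Optional[date] = None  # only used for EMBARGOED


def can_download(ds: Dataset, today: date, approved: bool = False) -> bool:
    """Decide whether a requester may download a dataset.

    `approved` says whether the owner has granted this requester's
    access request; it is only consulted for RESTRICTED datasets.
    """
    if ds.access is Access.OPEN:
        return True
    if ds.access is Access.EMBARGOED:
        # Assumed: the embargo lifts on the stated date; no date keeps it closed.
        return ds.embargo_until is not None and today >= ds.embargo_until
    # RESTRICTED: the requester must ask and be approved.
    return approved
```

Keeping the decision in a single pure function makes the policy easy to test on its own, independently of how requests and approvals are actually recorded.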
OK I like this - can I use it?


Not yet...

Today we have an internal alpha (soon to be beta) solution that can:

 ● create object records
 ● import objects into the repository
 ● search for objects in the repository
 ● create collection records in RDA (Research Data Australia)

The aim is to have a public beta in mid-2012.

We also aim to publish collection records to other discipline-specific
repositories where appropriate.
I have some data ...


If you have some data you would be interested in depositing, please email
me; we'd be interested to hear from you.


                        doug.moncur@anu.edu.au

(We can also talk about legacy format conversion if that helps)
These projects are supported by the Australian National Data Service
                              (ANDS)




 ANDS is supported by the Australian Government through the National
Collaborative Research Infrastructure Strategy Program and the Education
              Investment Fund (EIF) Super Science Initiative
Thank you!
