UpSkills: Research Data Management for the Sciences

A 2 hour introductory session presented to PhD students at the University of Melbourne, 13 September 2012.

Given by Steve Bennett (VeRSI) and Jeff Christiansen (ANDS).

Published in: Technology
  • This bit is pretty easy for most people. We’ll do a quick summary.
  • Trivia: some disciplines actually don’t. Philosophy, theology, law.
  • But maybe your data volumes are easy to manage. Who is towards the left? Who is in the middle? Who is towards the right? The middle can be the most awkward: too big to store online, too small to get the attention of “big data” initiatives.
  • If you’re organised – and lucky – you can deposit your data in a repository for your discipline, in the University’s Research Data Registry, or in the national register: Research Data Australia. This helps increase your profile and helps potential collaborators find you.
  • Soil samples left by one research group at the Burnley Campus. With only basic labelling, how will future research groups make any sense of it?
  • Now we talk about the “who”: who owns the data, who controls it, as well as restrictions on it: privacy, confidentiality, ethics, requirements to share (or not to share).
  • Now that you’ve thought about what data you’ve got, made some decisions about where to put it, and considered the thorny issues of IP, it’s time to put that knowledge together systematically: a data management system.
  • Show of hands: who has a data management plan? Who has heard of the Policy on the Management of Research Data and Records?
  • After investigating a number of different research data life cycles, I believe this to be the simplest approach to research data record keeping that might integrate with a broad range of research practice.
  • Once you know what information you’re going to keep (your archive), you can start putting a Data Management System into place. Apply it, where practical, to all data/records you collect. Check: does everyone know what metadata is?

    1. Research Data Planning ...for the Sciences. MSGR UpSkills Program. Jeff Christiansen & Steve Bennett, 13 September 2012. [Slide footer: 17/09/2012]
    2. Why data management; What data; Where you store it; Who owns it; How you manage it. Bonus: start work on a data management plan!
    3. Intro – who we are
        • Dr Jeff Christiansen (jeff.christiansen@ands.org.au): Australian National Data Service; previously researcher in molecular genetics
        • Steve Bennett (steve.bennett@versi.edu.au): Victorian e-Research Strategic Initiative; helps researchers with systems for digital data
    4. Why data management; What data; Where you store it; Who owns it; How you manage it
    5. Becoming aware of data management in research: BSc (Hons). Experiment 1 ? Experiment 2
    6. Becoming aware of data management in research: PhD
    7. Becoming aware of data management in research: PhD. CCACGCGTCCGGTGTGAGCTCTCCTTCAGCTGCTGCAGGCATTACACTCAGCTCTGCTGT CCAAGCTGCTCATGTGATTGCCCTCTAATCCATTCAGGCAAAGTGAGCTAGACTTGTTTA AGCTGCAGGTCTTATTTTGATTGTAGCAGGCTAGTGAACAGTCACAGAAGTGGTTCAAGT ATTGTGCCCCTTGGAGCTGTTATCTTTGAAAATGTGGCCGTGGCTGGAAAAGGATGCATC TGCACCAATGGCACAGTGACCAGCCAGTTGCTTAGGGGCTTAGCTGGTGGATTTGGACCT GTCTTCTGCAACCTGGGGAAAGCATAATCTACTGTGTTATTTGATAATGGAAGCGCCGTG ATCAGATCCATCCCTCTGCTTTGAATTTTCAAACAAATAATCAAGAATTTGGCTCGTGTT AAAAAAAAAAAAAAAA
    8. Becoming aware of data management in research: PhD
    9. Becoming aware of data management in research: PhD (same sequence as slide 7)
    10. Becoming aware of data management in research: PhD (same sequence as slide 7)
    11. Becoming aware of data management in research: PhD (same sequence as slide 7)
    12. Becoming aware of data management in research: PhD (same sequence as slide 7)
    13. Becoming aware of data management in research: Postdoc
    14. Becoming aware of data management in research: EMAGE Database Project Manager
    15. Becoming aware of data management in research: EMAGE Database Project Manager
    16. Becoming aware of data management in research: EMAGE Database Project Manager
    17. Becoming aware of data management in research: EMAGE Database Project Manager
    18. Becoming aware of data management in research: EMAGE Database Project Manager. Cross-database queries need to use appropriate descriptors, not just free text, e.g. gene name identifiers.
    19. Becoming aware of data management in research
        • Being organised, having systems in place and adopting community standards are all helpful in data management.
        • Think about what you will be required to do when publishing. There are obligations for having data available for others post publication.
        • It’s useful to have your data organised so you can collaborate with others easily.
        • What will happen to your data when you leave the lab? Your supervisor would like to know what’s what/where.
    20. Data Planning & Managing: Motivators
        • #1 Meet your obligations
            • legal, ethical, funding requirements; uni, department, group policies
            • Find out now – avoid hassle later (ask research-data@unimelb.edu.au)
        • #2 Make your life easier
            • a data management system to make your research work
            • a data management plan to save time
            • keeping data, finding stuff again, labelling, security
            • sharing & collaborating
        • #3 Helping your career
            • being a professional researcher
            • data – your assets and records – finding, understanding data in years to come
            • contributing to global research community
            • manage your data now, help your future self.
    21. Why data management; What data; Where you store it; Who owns it; How you manage it. Ask: research-data@unimelb.edu.au
    22. What is data?
        • Observational data: sensor readings, telemetry (non-reproducible)
        • Experimental data: gene sequences, chromatograms (reproducible, but expensive)
        • Simulation data: climate models (model the most important thing)
        • Derived/compiled data: compiled database (reproducible but expensive)
    23. What else is data?
        • Social sciences: surveys, statistical data
        • Humanities: cultural artefacts (video, photos, sound…)
        • Physical samples: soil, biological, water, archaeological…
        • Does anyone here not have data?
    24. The University’s definitions
        • Research Data: laboratory notebooks; field notebooks; primary research data (hardcopy or in computer); questionnaires; audiotapes; videotapes; models; photographs; films; test responses; slides; artefacts; specimens; samples
        • Research Records: includes correspondence (electronic mail and paper-based correspondence); project files; grant applications; ethics applications; technical reports; research reports; master lists; signed consent forms; and information sheets for research participants
        • Administrative Records (Research Office, Central Records): includes contracts and agreements, patents, licences, grants, intellectual property and trademarks, policies, ethics, research project files, reports, publications
        • What is often included as “Research Data” = data + records + copies (physical & digital) = stuff you used and/or created
    25. Group activity (15 mins)
        • Form groups of similar discipline
            • Earth sciences/forestry/botany/agriculture
            • Health/medical biology/physio/social work
            • Engineering/computer science/linguistics
        • Discuss:
            • What kind of data do you collect?
            • How do you get it?
        • Your data management checklist: Section 1.1
    26. Why data management; What data; Where you store it; Who owns it; How you manage it
    27. Research trends
        • Research Data is increasing in size
            • Protein crystallography: 100 GB/experiment
            • Gene sequencing: 1,000 GB/day
            • High-energy physics: 10,000,000s GB/year
            • Astronomy (SKA): 1,000,000,000 GB/day
        • Research Collaborations are increasing
            • Human Genome project (1990-2003): 113 people, 20 orgs
            • Belle collaboration (1994-..): ~370 people, 60 inst., 14 countries
            • ATLAS collaboration @ LHC CERN (1994-2020+): ~2500 people, 169 inst., 37 countries
        • Research Data is increasingly digital
            • Wonderful opportunities for reuse, sharing, collaboration, analysis
            • Data science (4th paradigm)
            • “eResearch”!
    28. Research trends
        • Large scale data intensive science
            • “A totally new way of doing research”
            • New research methods, new skills, therefore new training needed
        • New skills...
            • Specialists – in both technology and research
            • Informatics – dealing with data from collection through analysis
            • Data Management and Planning – collecting, maintaining, sharing data
        • Everyone!
    29. How big?
        • Scale, left to right: 1 MB (spreadsheets); 10 GB (numerical, video); 1 TB (simulations, synchrotron); 1 PB
        • Labels along the scale, left to right: Easy!; Awkward; Easy? (Probably already solved)
        • Also marked on the scale: limit of Google Drive, DropBox…
    30. Where to keep it? Possibilities:
        • Research group storage: Ask!
        • Local computer: Backups crucial. Sharing hard. Disaster looms.
        • Cloud (Dropbox, Google Drive): Check security, legals. How to archive?
        • Ask research-data@unimelb.edu.au
    31. Sharing
    32. (no text)
    33. Group activity #2 (15 mins)
        • Discuss
            • How much data will you have?
            • Where will you store it?
            • What data formats?
        • Data management checklist
            • Complete section 2.3 & 2.4
            • If non-digital: 2.1, 2.2
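For the "How much data will you have?" question, a back-of-the-envelope calculation is often enough. The sketch below is only an illustration: the function names are hypothetical, decimal units (1 TB = 1,000 GB) are assumed, and the example figures (50 experiments, 3 copies) are made up. Only the 100 GB/experiment rate comes from the research trends slide.

```python
def estimate_storage_gb(gb_per_unit, units, copies=1):
    """Total storage in GB for `units` acquisitions, keeping `copies` copies of each."""
    if gb_per_unit < 0 or units < 0 or copies < 1:
        raise ValueError("sizes and counts must be non-negative, copies at least 1")
    return gb_per_unit * units * copies


def human_readable(gb):
    """Format a size given in GB, using decimal units (assumption: 1 TB = 1,000 GB)."""
    for unit, size in (("PB", 1_000_000), ("TB", 1_000)):
        if gb >= size:
            return f"{gb / size:.1f} {unit}"
    return f"{gb:.1f} GB"


# Hypothetical project: 50 crystallography experiments at 100 GB each,
# stored as one working copy plus two backups.
total = estimate_storage_gb(100, 50, copies=3)
print(human_readable(total))  # prints 15.0 TB
```

Counting backup copies in the estimate matters: a project that looks like "5 TB" can easily need three times that once offsite and onsite copies are included.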
    34. Why data management; What data; Where you store it; Who owns it; How you manage it
    35. In collaborations, get IP right early. Find out:
        • Does the University own your data?
        • Can you still share it?
        • Restrictions?
        • Licences?
    36. Who owns it
        • IP – who claims to own it
        • Copyright – who has legal backing (not all data can be copyrighted)
        • Ethics – more rules you agreed to
            • Must you keep the data private?
            • Must you share it?
        • Privacy – can you de-identify the data?
    37. Group activity #3 (15 mins)
        • Discuss
            • Who owns your data?
            • What data can you share? With whom?
            • How will you protect confidential information?
        • Data management checklist
            • Complete section 1.3
    38. Why data management; What data; Where you store it; Who owns it; How you manage it
    39. University Code of Conduct for Research
    40. University Policy on Management of Research Data and Records
    41. Starting your system
        • Consider your goals – what do you want to get out of managing your data?
        • Figure out your criteria for keeping data
        • Picture your data three years from now
        • Consider the metadata you want to collect to document your datasets
    42. Benefits
        • Find your data 3 years from now
        • Get more papers out of your data
        • Save time and stress – get organised
        • Share with collaborators
        • Some journals require data submission
    43. Being more professional...
        • Not rocket science!
            • Stop and think about what data you have, what you’re doing, what you should be doing
        • Some scary facts:
            • Microfilm, non-acidic paper last 100+ years
            • magnetic media lasts 10+ years
            • optical media lasts 20+ years
            • 2-10% of hard drives fail every year
            • software & hardware can become outdated quickly
        • Scary stories:
            • US study of 100s of “research misconduct” charges: 40% could have been avoided by better data management!
            • UniMelb: ~20 cases of research misconduct in 2008. Most involved students. All needed good records!
            • Climategate scandal, UK – FOI
        • Proper Planning & Management is needed!!!
        • [Image caption: Burroughs 1977 – B 9495 Magnetic Tape Subsystem]
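To put the "2-10% of hard drives fail every year" figure in the context of a three-year PhD, here is a rough calculation. It is a sketch under simplifying assumptions that are not from the slides: a constant annual failure rate, independent failures, and no replacement of failed drives. The function names are hypothetical.

```python
def prob_failure_within(annual_rate, years):
    """Chance that one drive fails at least once within `years`,
    assuming a constant, independent annual failure rate."""
    if not 0.0 <= annual_rate <= 1.0:
        raise ValueError("annual_rate must be between 0 and 1")
    return 1.0 - (1.0 - annual_rate) ** years


def prob_all_copies_lost(annual_rate, years, copies):
    """Chance that every one of `copies` independent drives fails within `years`.
    Crude upper-level view: assumes failed drives are never replaced."""
    return prob_failure_within(annual_rate, years) ** copies


for rate in (0.02, 0.10):
    single = prob_failure_within(rate, 3)
    double = prob_all_copies_lost(rate, 3, copies=2)
    print(f"annual rate {rate:.0%}: one drive {single:.1%}, two copies {double:.2%}")
```

At the pessimistic end (10% per year), a single drive has roughly a 27% chance of failing within three years, which is why a second copy on separate hardware is worth the effort.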
    44. High level view. Your data management system needs to cover:
        • Create, Capture, Describe
        • Store, Secure, Preserve
        • (Use, Transform, Update)
        • Keep, Transfer, Destroy
        • (National Archives)
    45. A simple Data Management System
        • Identify key data in your context, important stuff to keep (your Data Assets)
        • Find secure places to keep physical & digital Records + Data (filing cabinet, department shared drive) – backups are essential
        • Where and when should there be checks on your data (sanity checks, quality control, standards)?
        • File your data and records into logical divisions, say activities, projects, or pieces of work
            • e.g. folders /DeptShare/johnsmith/Records/ProteinABC Investigation
            • Don’t break things down too much, it makes things harder to find!
        • Have a consistent file naming convention:
            • perhaps: ActivityOrContents-LocationOrPerson-CreateDate-Id-Description.ext
            • e.g. “ProteinABC-LJW-20100409-0001 Raw data from instrument.dat”
        • Keep good metadata (notes, records) on how you captured your data, particularly for physical records
            • Descriptions of collections or files – structured text files are good enough
            • e.g. FileOrCollectionName-metadata.txt
            • On other things, entities that are not files – structured text files or spreadsheets
            • Have a good labelling/ID/coding system
            • Perhaps keep a registry (spreadsheet will do; IDs, names, location, basic metadata)
        • Find the right balance in digitising physical stuff (easy and quick)
            • Digital is easy to keep/transfer/search if stored properly. However, digitising/scanning everything can be time consuming and without good descriptions may not be useful.
            • Link digital notes/metadata to physical stuff (IDs, names, labels, codes, location)
            • Have some basic digital representations or notes of important physical stuff
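A naming convention is easier to keep consistent if a small helper builds the names. The sketch below follows the slide's example (hyphens between the first four fields, then a space before the description); the function names and the validation rule are hypothetical choices for illustration, not part of the original material.

```python
from datetime import date


def build_filename(activity, who, created, seq, description, ext):
    """Build a name like 'ProteinABC-LJW-20100409-0001 Raw data from instrument.dat'."""
    for part in (activity, who):
        # Hyphens separate fields, so they must not appear inside a field.
        if not part or "-" in part:
            raise ValueError(f"field must be non-empty and contain no hyphen: {part!r}")
    return f"{activity}-{who}-{created:%Y%m%d}-{seq:04d} {description}.{ext}"


def metadata_sidecar_name(filename):
    """Name of the companion metadata text file (FileOrCollectionName-metadata.txt)."""
    stem = filename.rsplit(".", 1)[0]
    return f"{stem}-metadata.txt"


name = build_filename("ProteinABC", "LJW", date(2010, 4, 9), 1,
                      "Raw data from instrument", "dat")
print(name)
print(metadata_sidecar_name(name))
```

Putting the date in YYYYMMDD form and zero-padding the ID means that a plain alphabetical file listing is also in chronological order.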
    46. Free Tools
        • jEdit – text file editor (private notes, metadata and records)
        • local disk + file share + Cobian Backup (private project records, data)
        • Google Desktop (file and email search)
        • Zotero (reference material) (EndNote is Uni default)
        • EVO & Skype & Google chat (video/tele/chat communication): http://evo.arcs.org.au/
        • Sakai@Melbourne (project workspace): https://sakai.unimelb.edu.au/
        • Google docs + Sites (collaborative editing)
        • Google groups (email list)
        • research data storage, a tricky one…
            • use local storage in preference, ask around
            • DropBox, Google Drive, Microsoft SkyDrive, box.com… too many others to list, heaps on the web…
            • See Digital Research Tools (DiRT) wiki for a huge list: http://digitalresearchtools.pbworks.com/
            • Check with your supervisor,
        • [Side note: see Info Skills classes on EndNote, UpSkills 29 June on VC]
    47. Data Security
        • 2 aspects to security
            • Safety from damage or loss: How important is the data to you?
            • Safety from incorrect use: What are the possible consequences?
        • Safety from damage or loss (unintended and intentional)…
            • What’s acceptable loss (safety can cost, use up time)
            • Backups (data, software, system)
                • How often (hourly, daily, weekly, monthly, manually, automated)?
                • How many and where (onsite, offsite, both, multiple)?
                • Departmental storage? Probably backed up already!
            • Disaster Recovery
                • Quality hardware, multiple/spare servers, spare disk drives
                • Operating System and Applications image backups
                • (talk with someone technical, your local IT guys)
    48. Data Security
        • Safety from damage or loss (continued)…
            • Make sure Backup is occurring
                • Essential data and records... “Your Archive”
                • Frequency should depend on how often your data changes
                • Incremental backups are essential. Replication IS NOT SAFE!!!
                • Keep some copies (one?) offsite.
                • Database backups should use database tools (mysqldump, pg_dump etc.)
            • Departmental storage is best... probably backed up already!
            • Worst case... DIY, use external hard drives or remote storage
            • Seek advice on software
                • for Windows I use... Cobian Backup, DriveImage XML
                • for Linux I use... rsync (see http://rsync.samba.org/examples.html )
                • for Mac there is... Time Machine
                • (talk with someone technical, your local IT guys)
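The difference between incremental backup and plain replication can be made concrete with a toy script. This is a minimal sketch, not a replacement for the tools named above (Cobian Backup, rsync, Time Machine): the function names, the `manifest.json` file, and the snapshot-folder layout are all hypothetical. Each run copies only new or changed files into a fresh snapshot folder and never touches earlier snapshots, so an accidental overwrite in the source does not destroy the older copy, which is what replication would do.

```python
import hashlib
import json
import shutil
from pathlib import Path


def sha256_of(path):
    """Checksum of a file's contents, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def incremental_backup(source, backup_root, label):
    """Copy files that are new or changed since the last run into backup_root/label.

    Returns the relative paths that were copied. Earlier snapshots are never
    modified or deleted. Assumes backup_root is NOT inside source.
    """
    backup_root.mkdir(parents=True, exist_ok=True)
    manifest_path = backup_root / "manifest.json"
    # The manifest remembers the checksum of each file as last backed up.
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    snapshot = backup_root / label
    copied = []
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(source).as_posix()
        checksum = sha256_of(path)
        if manifest.get(rel) != checksum:
            target = snapshot / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 also preserves timestamps
            manifest[rel] = checksum
            copied.append(rel)
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return copied
```

A call might look like `incremental_backup(Path("data"), Path("/mnt/backup/project"), "2012-09-13")`, where both paths are placeholders. For databases, the slide's advice still applies: dump with the database's own tools first, then back up the dump file.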
    49. Data Security
        • Safety from incorrect use (unintended and malicious)…
            • PCI DSS – a recommendation (Payment Card Industry Data Security Standard)
                • e.g. google for: “nacubo.org payment card data security”
                • 12 requirements that are good practice (first 10 are the basics)
            • 10 IT basics…
                • Firewall servers
                • Do not use default usernames/passwords
                • Physically protect stored data (lock up servers, disk, tape, source material)
                • Use encrypted transmission over internet (VPN, SSL, SSH, GridFTP, S/MIME email)
                • Update antivirus/antimalware software regularly
                • Use secure and trusted applications
                • Restrict access to sensitive data (tighter control, or put it somewhere else)
                • Assign unique IDs for each user
                • Record and monitor all access to data
            • Plus some good practice…
                • Don’t retain sensitive data
                • Or encrypt sensitive information
    50. Read up!
        • Google: research data toolkit
        • http://researchdata.unimelb.edu.au
        • ANDS guides
        • To consider: identifiers, DOIs, archival, security, licensing, metadata formats, ontologies, controlled vocabularies, definition of “collection”, data reuse, metadata stores…!
    51. Group activity #4 (15 mins)
        • Data management checklist
            • Complete section 3.1
    52. Questions? research-data@unimelb.edu.au, researchdata.unimelb.edu.au. Copyright (c) 2012, VeRSI Consortium, Lyle Winton, Steve Bennett, Jeff Christiansen