Phenocams hanigan-20140309

ACEAS_Hanigan_phenocams

Published in: Education

  1. Addressing the Key Challenges of Storage, Discoverability, Accessibility and Analysis. Ivan Hanigan and Marco Fahmi, Australian SuperSite Network (ASN) and Long Term Ecological Research Network (LTERN). ACEAS Phenocam Workshop, 2014-03-11.
  2. Topic: 1 Introduction; 2 What I want out of this workshop; 3 Storage hosting of the data; 4 Discoverability; 5 Accessibility; 6 Analysis; 7 Conclusion.
  3. Introduction. Four key challenges to working with large data collections: Storage (Big Data, resilience to disasters, future-proofing); Discoverability (exposing metadata, indexing, standard schemas); Accessibility (who is accessing what? Is it collaborative?); Analysis (workflow and provenance).
  4. Phenocams. Managing phenocam data is an exemplar of these issues.
  6. What I want out of this workshop. My work as a Data Manager / Data Analyst at ASN and LTERN. Working toward a better set of descriptions of the business requirements for each of these goals. Building systems that address these challenges.
  8. The Data Deluge. “The next five years will produce more research data than has been produced in all of previous human history.” (The great data explosion, April 29, 2009, http://www.theaustralian.news.com.au/story/0,25197,25400306-12332,00.html)
  9. Australian Research Cloud. The IT infrastructure available is unprecedented, and it is often cheap or free.
  10. Storage hosting of the data. Storing data, as well as uploading and downloading it, poses technical challenges. Sustaining and future-proofing the storage is a logistical challenge. Questions arise: should your store be the only location of the data, or one of several mirrors? Is “indefinite” storage really possible?
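One way to make the "several mirrors" question concrete is to check that a mirror still matches the primary store. The following is a minimal sketch, assuming both copies are reachable as local directory trees; the function names (`sha256_of`, `build_manifest`, `compare_mirrors`) are illustrative and not part of any ASN or LTERN system.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large images are never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file path (relative to root) to its SHA-256 checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_mirrors(primary: Path, mirror: Path) -> Dict[str, List[str]]:
    """Report files that are missing, unexpected, or different in the mirror."""
    a, b = build_manifest(primary), build_manifest(mirror)
    return {
        "missing_in_mirror": sorted(set(a) - set(b)),
        "extra_in_mirror": sorted(set(b) - set(a)),
        "mismatched": sorted(k for k in set(a) & set(b) if a[k] != b[k]),
    }
```

Running such a comparison on a schedule would not answer whether storage can be "indefinite", but it would at least detect silent divergence between copies.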
  12. Data and metadata standards. With the gathered expertise, it will be useful to advocate: Convention over Configuration; appropriate syntax and semantics for phenocam data, with well-considered conceptual frameworks for grouping datasets; appropriate compatibility and compliance with other standards.
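To illustrate what "Convention over Configuration" could mean for phenocam images, here is a small sketch in which metadata is read straight from the file name, so no per-site configuration file is needed. The naming pattern `<site>_<camera>_<YYYYMMDD>T<HHMMSS>.jpg` is a hypothetical convention invented for this example, not a standard proposed in the talk.

```python
import re
from datetime import datetime
from typing import NamedTuple

# Hypothetical convention: <site>_<camera>_<YYYYMMDD>T<HHMMSS>.jpg
FILENAME_PATTERN = re.compile(
    r"^(?P<site>[a-z0-9]+)_(?P<camera>[a-z0-9]+)_(?P<stamp>\d{8}T\d{6})\.jpg$"
)


class ImageRecord(NamedTuple):
    site: str
    camera: str
    captured_at: datetime


def parse_image_name(filename: str) -> ImageRecord:
    """Extract site, camera and capture time from a conforming file name."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"{filename!r} does not follow the naming convention")
    # strptime also rejects impossible dates such as month 13.
    captured_at = datetime.strptime(match["stamp"], "%Y%m%dT%H%M%S")
    return ImageRecord(match["site"], match["camera"], captured_at)
```

Files that break the convention fail loudly at ingest, which is the point of agreeing on syntax before the collection grows.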
  14. Ownership, Sharing and Anonymous Re-use. There will be some contractual obligations about sharing and publishing data (or not!), as well as the group's general inclination about what to share and when. An appropriate licensing scheme is also needed to govern this, including embargoes and controlled access.
  15. Ethics and Trust. Data providers need trust to allay their concerns over the re-use of data. Collaborative and respectful use should be expected of data users.
  17. Analysis. Workflow management is needed because users will need tools or technical savvy to do something interesting and useful with the data. The paradigm of “Bringing the Code to the Data” rather than “Taking the Data to the Code” uses remote supercomputers with very large storage and compute capacity. However, it often feels as though one needs to be as skillful as a “Super Scientist” to access and use a supercomputer. How can we support ordinary users wanting “Super” analyses? Provenance tracking of analysis outputs is needed to ensure reproducibility.
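Provenance tracking can start very simply. The sketch below, which assumes an analysis is one script reading some input files and writing one output file, stores checksums of all three in a JSON sidecar. The record layout and the function names are assumptions made for illustration; they do not follow any particular provenance standard.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, Iterable


def file_sha256(path: Path) -> str:
    """Checksum that identifies the exact bytes of a file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def provenance_record(script: Path, inputs: Iterable[Path], output: Path) -> Dict:
    """Describe which code and data produced an analysis output."""
    return {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "script": {"path": str(script), "sha256": file_sha256(script)},
        "inputs": [{"path": str(p), "sha256": file_sha256(p)} for p in inputs],
        "output": {"path": str(output), "sha256": file_sha256(output)},
    }


def write_provenance(record: Dict, sidecar: Path) -> None:
    """Save the record next to the output as human-readable JSON."""
    sidecar.write_text(json.dumps(record, indent=2, sort_keys=True))
```

If a later run produces a different output checksum, the sidecar shows whether the script, the inputs, or the Python version changed.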
  18. Appropriate analysis. ‘Big data’ advocates implicitly believe that answers to difficult environmental questions can be found through sharing data. But ecology is inherently about understanding local patterns and processes, and hard-won, field-based understanding is often essential to help interpret the results of data analyses. Study designs may need support from those familiar with the ecosystem.
  19. Security against malicious misuse. A data analysis server is geared to executing software code. Analyses may require custom code to be written, or third-party software from unknown developers to be installed. There is a risk that such a Virtual Lab could be the victim of a malicious attack.
  21. Conclusion. These challenges are not trivial. We suspect the answers to many of them will rely on outsourcing as much of the hardware and software as possible, shifting the responsibility for upkeep and sustainability onto someone else’s shoulders and letting the scientists focus on their science.
