Data Management for Grown Ups

Terrell Russell
RENCI, iRODS Consortium - Senior Data Scientist
Tuesday, Oct 20th
2:40 pm - Case Study/Demo

  1. Data Management for Grown Ups. Terrell Russell, Ph.D. (@terrellrussell), Senior Data Scientist, iRODS Consortium, Renaissance Computing Institute (RENCI), UNC-Chapel Hill
  2. The iRODS Consortium: The iRODS Consortium was created to ensure the sustainability of iRODS and to further its adoption and continued evolution. To this end, the Consortium works to standardize the definition, development, and release of iRODS-based data middleware technologies, evangelize iRODS among potential users, promote new advances in iRODS, and expand the adoption of iRODS-based data middleware technologies through the development, release, and support of an open-source, mission-critical, production-level distribution of iRODS. Current members: RENCI, DICE, Seagate, DDN, Novartis, IBM, Complete Genomics, Wellcome Trust Sanger Institute, UCL, Cleversafe, EMC, and the NASA Atmospheric Science Data Center.
  3. Data Management: multiple pieces, multiple meanings, multiple goals
  4. Data Management: Access (authentication, authorization, revocation)
  5. Data Management: Access, Description (standards for discovery, compliance)
  6. Data Management: Access, Description, Integrity (confidence that nothing has changed)
  7. Data Management: Access, Description, Integrity, Replication (multiple copies, multiple locations)
  8. Data Management: Access, Description, Integrity, Replication, Availability (if things are down, nothing else matters)
  9. Data Management: Access, Description, Integrity, Replication, Availability, Migration (hardware changes, format changes)
  10. Data Management: Access, Description, Integrity, Replication, Availability, Migration, Recovery (robust plans for when things go wrong)
  11. Data Management: Access, Description, Integrity, Replication, Availability, Migration, Recovery, Provenance (full record of all related activity)
  12. Data Management: Access, Description, Integrity, Replication, Availability, Migration, Recovery, Provenance, Retention (deleting data on a defined schedule)
  13. Through the Years: people with keys + notes/reports; passwords + folders + scripts (maybe); credentials + metadata + automation; policy enforcement
  14. Data Management: Fraught with People
  15. Four Verticals → Four Case Studies: Health Care & Life Science; Oil & Gas; Media & Entertainment; Archives & Records Management
  16. Health Care & Life Science: Genomics. Use case: data begins as a series of images from a sequencer, which are converted to bases (ATCG), fragmented, aligned, annotated for variants, filtered, and analyzed. Extensive data pipelines; saved state; diverse data products; share results.
  17. Health Care & Life Science priorities: reproducibility, multi-institutional collaboration
  18. Oil & Gas: Ingest. Use case: as existing storage fills up, there are two complementary strategies: 1) migrate data from active storage to a slower, cheaper archive, and 2) add more active storage. Traditional HSM (hierarchical storage management) has limited flexibility (access date, physical location, etc.), and additional namespaces just add more complexity. Diverse data sources; spread geographically; computationally intense.
  19. Oil & Gas priorities: unified namespace, automated analytics
  20. Media & Entertainment: Born Digital. Use case: new, valuable creative content (movie assets, original musical tracks) requires large, robust, long-term, flexible, accessible infrastructure. Popular content; unique; largely video and games.
  21. Media & Entertainment priorities: access control, backups, integrity
  22. Archives & Records Management: Provenance. Use case: libraries, museums, and other cultural institutions take a 100+ year view of their digital assets. They must maintain both archival and dissemination copies, with lots of metadata. Cultural heritage; original and derivative copies; quality search and browse.
  23. Archives & Records Management priorities: provenance, integrity, migration, metadata, replication
  24. Four Verticals → Four Case Studies: Health Care & Life Science; Oil & Gas; Media & Entertainment; Archives & Records Management
  25. The Four Pillars
  26. Open Source Data Management Middleware: iRODS enables data discovery using a metadata catalog that describes every file, every directory, and every storage resource in the data grid. iRODS automates data workflows with a rule engine that permits any action to be initiated by any trigger on any server or client in the grid. iRODS enables secure collaboration, so users only need to log in to their home grid to access data hosted on a remote grid. iRODS implements data virtualization, allowing access to distributed storage assets under a unified namespace and freeing organizations from being locked into single-vendor storage solutions.
  27. Questions? irods.org, github.com/irods, @irods. Creative Commons images used: https://www.flickr.com/photos/addieplum/116062198/, https://www.flickr.com/photos/ajmexico/3281139507/, https://www.flickr.com/photos/future15/2037742362/
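The Integrity and Replication items in the deck (confidence that nothing has changed; multiple copies in multiple locations) are usually made concrete with checksums. The following is a minimal sketch, assuming plain SHA-256 over local files: a checksum is recorded once at ingest, and each replica is later compared against it. The function names `sha256_of` and `verify_replicas` and the file names are hypothetical illustrations; this is not the iRODS checksum API.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # iter() with a b"" sentinel stops cleanly at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_replicas(recorded: str, replicas: list[Path]) -> dict[Path, bool]:
    """Map each replica to True if it still matches the checksum recorded at ingest."""
    return {replica: sha256_of(replica) == recorded for replica in replicas}


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file names, for illustration only.
        original = Path(tmp) / "sample.dat"
        original.write_bytes(b"ATCG" * 1000)
        recorded = sha256_of(original)  # checksum captured at ingest

        good = Path(tmp) / "replica_a.dat"
        good.write_bytes(original.read_bytes())
        bad = Path(tmp) / "replica_b.dat"
        bad.write_bytes(b"ATCG" * 999 + b"ATCC")  # one silently changed base

        for path, ok in verify_replicas(recorded, [good, bad]).items():
            print(path.name, "OK" if ok else "MISMATCH")
```

Run as a script, this prints `replica_a.dat OK` followed by `replica_b.dat MISMATCH`: a single altered base is enough to change the digest, which is why recording the checksum at ingest (rather than computing it later from a possibly damaged copy) matters.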

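Slide 26 describes a rule engine in which any action can be initiated by any trigger, and several items in the deck (Description, Migration from active to cheaper archive storage, Retention on a defined schedule) are exactly the kind of policies such an engine automates. The sketch below illustrates that event-to-rules pattern in plain Python. It is hypothetical and does not use iRODS's own rule language: the class `PolicyEngine`, the event names `put` and `nightly_sweep`, the 90-day idle threshold, and the roughly 7-year retention period are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable


@dataclass
class DataObject:
    """Minimal stand-in for a catalogued file (hypothetical fields)."""
    path: str
    created: datetime
    last_access: datetime
    tier: str = "active"
    metadata: dict[str, str] = field(default_factory=dict)


# A rule receives the object and the current time, and may update the object.
Rule = Callable[[DataObject, datetime], None]


class PolicyEngine:
    """Toy event -> rules dispatcher; not the iRODS rule engine."""

    def __init__(self) -> None:
        self._rules: dict[str, list[Rule]] = {}

    def on(self, event: str, rule: Rule) -> None:
        """Register a rule to run whenever `event` fires."""
        self._rules.setdefault(event, []).append(rule)

    def fire(self, event: str, obj: DataObject, now: datetime) -> None:
        """Run every rule registered for `event`, in registration order."""
        for rule in self._rules.get(event, []):
            rule(obj, now)


def tag_on_ingest(obj: DataObject, now: datetime) -> None:
    # Description: attach metadata at ingest time.
    obj.metadata["ingested"] = now.date().isoformat()


def tier_if_cold(obj: DataObject, now: datetime) -> None:
    # Migration: move data idle longer than 90 days (assumed threshold) to archive.
    if obj.tier == "active" and now - obj.last_access > timedelta(days=90):
        obj.tier = "archive"


def expire_if_past_retention(obj: DataObject, now: datetime) -> None:
    # Retention: mark data older than ~7 years (assumed period) as deleted.
    if now - obj.created > timedelta(days=365 * 7):
        obj.tier = "deleted"


if __name__ == "__main__":
    engine = PolicyEngine()
    engine.on("put", tag_on_ingest)
    engine.on("nightly_sweep", tier_if_cold)
    engine.on("nightly_sweep", expire_if_past_retention)

    t0 = datetime(2015, 10, 20)
    obj = DataObject("/hypothetical/survey/line42.segy", created=t0, last_access=t0)
    engine.fire("put", obj, t0)
    engine.fire("nightly_sweep", obj, t0 + timedelta(days=120))
    print(obj.tier, obj.metadata)  # archive {'ingested': '2015-10-20'}
```

Keeping each policy as a small function registered against an event means a new rule, such as a replication or checksum step on `put`, can be added without modifying the existing ones, which is the flexibility the Oil & Gas slide notes is missing from traditional HSM.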