
Advancing Research at London's Global University


In this video from the DDN User Group Meeting at ISC'13, Dr. Daniel Hanlon of University College London presents: Advancing Research at London's Global University.

"As UCL's storage demands grow, the university expects to build a storage foundation that will scale up to 100PB. Looking for a storage solution that was massively scalable yet simple to manage as part of the first phase of the infrastructure build out, UCL will use DDN object storage technology to store up to 600TB of research data. DDN object storage capabilities also will be able to empower UCL researchers to collaborate without having to worry about data reliability, compliance obligations or long-term retention of critical research assets."



  1. Advancing Research at London’s Global University
     Clare Gryce
     Daniel Hanlon
     Research IT Services, UCL
  2. London’s Global University
     •  Consistently ranked in the world’s top 10 universities
     •  Alumni include 20 Nobel prize winners
     •  Founded 1826; first to admit students without regard to race or religion, and women on equal terms with men
     •  8,000 staff, 25% from 84 countries outside the UK
     •  ~25,000 students (over 1/3 postgraduates)
     •  Annual turnover > £900 million; > £15m dependent on university HPC
     •  Highly multi-disciplinary
  3. Delivering a Culture of Wisdom
     “UCL is London’s research powerhouse, with a commitment to enhancing the lives of people in the capital, the UK and around the world. Our academics have breadth and depth of expertise across the entire range of academic disciplines. Individually, they expand our understanding of the world; collectively and collaboratively, they deliver analysis that addresses the major challenges facing humanity”
     Professor David Price, UCL Vice-Provost (Research)
     •  UCL Grand Challenges – Impact
     •  UCL Research Frontiers – Enquiry and ‘curiosity’
  4. Underpinning Research
     •  Research IT Services
        –  Established June 2012
        –  Support and enablement across the research lifecycle
     •  Services span IRIS, HPC, collaboration tools, RPS, UCL Discovery and Research Data
  5. Research IT Services
     •  Investment in people and expertise
        –  Research software development initiative
        –  Comprehensive training programme
     •  Collaboration and Innovation
        –  e-Infrastructure South
        –  UK Research Data community
        –  Vendor partnership
  6. UCL’s case for a Research Data Initiative
     •  Many departments and research groups with a long history of excellence
        –  Lost opportunity for new research building on old
     •  UCL is a highly multi-disciplinary institution
        –  Lost opportunity for cross-disciplinary re-use
     •  Unmanaged datasets are lost datasets
        –  Increasing burden on researchers
        –  Backup, failing USB HDDs, AuthN problems
  7. All carrots, no sticks
     •  Research Data is an offering
     •  Projects can and will opt out
     •  Remove the burden of managing storage
     •  Resiliency is better than backup
     •  Remove the burden of compliance with Research Council requirements
  8. All carrots, no (hardly any) sticks
     •  Research Data is an offering
     •  Projects can and will opt out
     •  Remove the burden of managing storage
     •  Resiliency is better than backup
     •  Remove the burden of compliance with Research Council requirements
  9. Requirements capture
     •  Researchers – We want…
        –  Everything! but more! faster! and shinier!
        –  NFS, CIFS, scp, GridFTP, Cloud (Dro***x)
     •  Institution – Solution must have…
        –  Low admin overhead
        –  Low cost per TB
        –  High density
  10. Architecture choices
     •  Simple
        –  Start with the tried and trusted
        –  Challenge the Big Data hysteria
     •  Strong abstractions
        –  Avoid lock-in to proprietary technology
     •  Hedge bets
        –  Build in migration between storage solutions
     •  Project-based
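The "strong abstractions" and "hedge bets" points above can be sketched in a few lines: hide each storage tier behind a common interface so that data can be migrated between solutions later. The class and method names here are illustrative assumptions, not UCL's actual code.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

class StorageBackend(ABC):
    """Minimal interface any tier (GPFS, WOS, tape...) would implement.
    Hypothetical names; the slide only calls for 'strong abstractions'."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

    @abstractmethod
    def keys(self) -> List[str]: ...

class InMemoryBackend(StorageBackend):
    """Stand-in backend so the sketch runs without real storage."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._store[key] = data

    def get(self, key: str) -> bytes:
        return self._store[key]

    def keys(self) -> List[str]:
        return list(self._store)

def migrate(src: StorageBackend, dst: StorageBackend) -> int:
    """Copy every object from one backend to another; returns the count.
    Because callers only see StorageBackend, swapping vendors later
    does not change this code -- the 'hedge bets' part of the slide."""
    count = 0
    for key in src.keys():
        dst.put(key, src.get(key))
        count += 1
    return count
```

Any concrete backend that honours the interface can be dropped in without touching the migration logic, which is the lock-in-avoidance argument the slide is making.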
  11. Separate Live and Archive
     •  Live
        –  Mutable
        –  Addresses current requirements
        –  Private
     •  Archive
        –  Immutable
        –  Exploitable
        –  Transfer of responsibility
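The Live/Archive split above can be modelled as a one-way state change: a live dataset is mutable and project-owned, and archiving freezes it and transfers responsibility. A minimal sketch, with field names that are my own assumptions rather than the deck's:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)  # frozen: archived records should not mutate
class Dataset:
    name: str
    phase: str = "live"           # "live" (mutable, private) or "archive"
    responsible: str = "project"  # archiving transfers responsibility

def archive(ds: Dataset) -> Dataset:
    """Move a dataset from the Live area to the Archive area.
    Returns a new immutable record; the transition is one-way."""
    if ds.phase != "live":
        raise ValueError("only a live dataset can be archived")
    return replace(ds, phase="archive", responsible="institution")
```

Using a frozen dataclass makes the "immutable" property structural: once archived, the record cannot be edited in place, mirroring the transfer-of-responsibility idea on the slide.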
  12. Metadata
     •  Registration
        –  Principal Investigator
        –  Project name
        –  Members
        –  Dates
        –  Funder
     •  Enrichment for Archive
        –  Domain specific
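The registration fields listed above map naturally onto a small record with some sanity checks. A sketch only; the validation rules are plausible assumptions, not UCL's actual registration policy:

```python
from dataclasses import dataclass
from datetime import date
from typing import List

@dataclass
class ProjectRegistration:
    """The registration metadata named on the slide:
    PI, project name, members, dates, funder."""
    principal_investigator: str
    project_name: str
    members: List[str]
    start: date
    end: date
    funder: str

    def problems(self) -> List[str]:
        """Return a list of validation problems (empty if the record is fine).
        These checks are illustrative assumptions."""
        found = []
        if not self.principal_investigator:
            found.append("principal investigator is required")
        if not self.members:
            found.append("at least one project member expected")
        if self.end < self.start:
            found.append("end date precedes start date")
        return found
```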
  13. iRODS
     •  Metadata store
        –  Possibility for enrichment during the live phase
     •  Bridge between different areas and storage types
        –  Conventional storage
        –  Object storage
        –  Tape
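iRODS attaches metadata to objects as attribute/value/unit (AVU) triples, so registration or enrichment records have to be flattened into that shape. A sketch of how that flattening might look (the helper name and the empty-unit convention are my assumptions):

```python
from typing import Dict, List, Tuple

def to_avus(record: Dict[str, object]) -> List[Tuple[str, str, str]]:
    """Flatten a metadata record into iRODS-style AVU triples.
    List-valued fields (e.g. project members) become one triple per
    element; the unit slot is left empty here for simplicity."""
    avus = []
    for attribute, value in record.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            avus.append((attribute, str(v), ""))
    return avus
```

In a real deployment these triples would be applied to a collection or data object through an iRODS client; the flattening step is the same either way.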
  14. WOS
     •  Simple object store
        –  REST API
        –  Policies
        –  Geographical redundancy
     •  Highly scalable: >10 PB deployments
     •  Low admin overhead
     •  Native iRODS connector
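To give a feel for what a policy-driven REST object store looks like from the client side, here is a sketch that builds (but does not send) an HTTP PUT carrying a replication-policy header. The URL path and header name are illustrative assumptions, not a statement of DDN WOS's actual API:

```python
import urllib.request

def build_put_request(host: str, policy: str,
                      payload: bytes) -> urllib.request.Request:
    """Construct an object-store ingest request in the REST style the
    slide describes. The '/cmd/put' path and 'x-ddn-policy' header are
    assumed for illustration; consult the vendor docs for the real API."""
    return urllib.request.Request(
        url=f"http://{host}/cmd/put",
        data=payload,
        method="PUT",
        headers={"x-ddn-policy": policy},  # policy chosen per object
    )
```

The point of the sketch is the shape of the interaction: one HTTP call, with the replication/redundancy behaviour selected declaratively by policy rather than managed by the client.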
  15. WOS Access
     •  NAS presentation
        –  Exposes WOS as CIFS or NFS mount points
        –  HA, and backed up in WOS
     •  Open architecture based on open-source projects
        –  Easy to build upon
  16. GPFS – IBM’s General Parallel File System
     •  The conventional choice
        –  POSIX filesystem
        –  High-performance, parallel transfers
     •  Connects with UCL’s existing HPC resources
        –  Multi-cluster around UCL
     •  Many options for exports
        –  Native GPFS, Samba, scp, cNFS
        –  Non-trivial to manage interactions/locking issues
  17. Cloud – “The storage that dare not speak its name”
     •  Very widely used in academia (unofficially)
        –  Need to satisfy a very real requirement
     •  Rapidly evolving
        –  Standards needed (not S3!)
     •  Oxygen Cloud
        –  Local storage
        –  Local AuthN
        –  Local files can be stubs or fully synchronised
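The stub-versus-fully-synchronised distinction in the last bullet can be sketched simply: a stub is a metadata-only placeholder that is "hydrated" into a full copy on demand. The data model below is my own illustration, not Oxygen Cloud's actual design:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class LocalEntry:
    """Client-side view of a remote file: either a lightweight stub
    (metadata only) or a fully synchronised copy."""
    name: str
    size: int
    content: Optional[bytes] = None  # None => stub, bytes => synchronised

    @property
    def is_stub(self) -> bool:
        return self.content is None

def hydrate(entry: LocalEntry,
            fetch: Callable[[str], bytes]) -> LocalEntry:
    """Turn a stub into a fully synchronised copy by fetching its
    content (fetch stands in for the cloud download call)."""
    if not entry.is_stub:
        return entry
    return LocalEntry(entry.name, entry.size, fetch(entry.name))
```

Stubs keep the local footprint small while still presenting the full remote namespace, which is why the approach suits research groups with large datasets and limited client storage.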
  18. Current state of play
     •  Number of projects: 11
     •  Number of users: 22
     •  Volume of data: 35TB
     •  Continuing to deploy and slowly expand users
  19. Phase I architecture (diagram)
     •  Project registration is a manual process: store high-level project metadata, create project group, e-mail campus and cloud access details
     •  Storage: DDN SFA12K and DDN WOS, with policy-driven replication, encryption and backup as required at project-level granularity; storage resiliency over a >10Gb network
     •  Metadata: iCAT metadata store
     •  Access: mounted via GPFS/CIFS/NFS/WebDAV; asynchronous via scp, sshd (gridftpd) and iRODS (via PAM); GPFS/NFS/Samba/ssh to the Legion CS cluster etc.
     •  Cloud infrastructure: client applications mirror local files into RD live storage; AuthN connector to AD/Moonshot infrastructure
     •  Smaller initial deployment, but to scale with phase II
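The manual registration steps named on the architecture slide can be lined up as a simple sequence. The function below just records what each step would do; it is a sketch of the workflow, not the real tooling:

```python
from typing import List

def register_project(name: str, pi: str, members: List[str]) -> List[str]:
    """Walk through the registration steps the slide names:
    store high-level metadata, create the project group, then
    e-mail access details to each member. Returns an action log."""
    steps = []
    steps.append(f"store high-level metadata for '{name}' "
                 f"(PI: {pi}) in the iCAT metadata store")
    steps.append(f"create project group '{name}'")
    for member in members:
        steps.append(f"e-mail campus and cloud access details to {member}")
    return steps
```

Writing the process down as code, even trivially, is one way to prepare for the phase II goal of scaling beyond a manual workflow.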
  20. Challenges
     •  Authentication
        –  Multiple access mechanisms with single AuthN
        –  Cross-university collaboration (Moonshot)
     •  Networking
        –  Central storage vs local NAS device
        –  Poor connectivity to some departments
     •  Multiple access mechanisms
        –  Multiple views…
  21. Clare Gryce, Daniel Hanlon