Speaker notes:
  • SDSC is an organized research unit on the campus of UC San Diego, funded by the National Science Foundation.
  • Describe the environment: complex, latency due to tape loads, small files hurt performance
  • Silo overview, pass-through ports
  • High-latency tape access and lower bandwidth mean users wait longer for their data. Many times one user can monopolize the staging queue. Because it’s difficult to share and access, most data tends to be write-once, read-never. Fine for things like backup, but not acceptable for valuable data sets
  • Where we’re at
  • New as of January 2011: an NSF data management plan is required for all new grant proposals
  • Talk about the basic architecture: scale out by adding more components (proxies, storage bricks). Love the shared-nothing architecture. Then the changes we made to improve it. Doug: merging auth, load balancing, Arista-switched Data Oasis. Currently running CentOS 5 with a plan to update to CentOS 6.2 within the next couple of months.
  • Launched the service in August 2011; modest usage so far. The Commvault backup service has yet to come online
  • Backups consume the most space, which is consistent with our tape archive usage as well, followed by file system emulation. Both backups and file system emulation are well understood and have familiar applications. Native client usage is a small percentage of use (swift python client, Cyberduck, Cloud Explorer). There is no robust, reliable (free) client available to our knowledge; we would like to see this get addressed. Speak to the challenges of each tool. We may be a little different in wanting to use Swift as the generic storage service vs. building an application.
  • Ron
  • Search, indexing, etc. is handled by the web application. Digital assets, text and other associated data are stored in Swift
  • Features: upload, download, permissions, drag-and-drop, change password. Not considered a large data movement tool; largely used for viewing objects, setting permissions, and viewing share URLs. Need to support large file uploads
  • Swift: nice to add copy, move, rename. In general, we’d like to see a robust command line client. We are interested in starting this effort if one doesn’t exist already, and in talking to others who have similar needs. Nearly all of our HPC users rely on command line based tools. Cyberduck is OK for light usage. Our hope is that once Cloud Explorer supports drag-and-drop uploads it will obsolete the tool, letting us focus on web and command line tool development and support.
  • And with that I’ll take questions
  • Ron
  • Ron. PI pertinent: everything you put in is immediately shareable to groups of users, the world, or no one if you want. This form of storage is fully integrated with your applications and web services. Near real-time collaboration from anywhere on the globe where you have web access.
  • Ron pass through
  • Ron
  • Ron
  • Ron Quote Richard

    1. From tape to cloud storage (4/19/2012)
       Steve Meier, https://cloud.sdsc.edu
       SAN DIEGO SUPERCOMPUTER CENTER at the UNIVERSITY OF CALIFORNIA, SAN DIEGO, Cyberinfrastructure Services Division
    2. Agenda
       • Where we were…
         • Tape archive overview
       • Where SDSC is at today
         • Current Data Services
         • Swift architecture overview
         • Access methods
           • Cloud Explorer Web interface
           • UCSD Libraries Collections
           • Others (Cyberduck, Command Line, s3backer)
       • Future Plans
    3. SAMQFS ARCHITECTURE
       [Diagram: metadata servers MDS1/MDS2; Oracle (RMAN); NFS server, NFS backups, Commvault; data login web (GridFTP, SFTP); Force 10 and Juniper 12000/T640 networking; SAM-QFS over a 1.2PB SAN disk cache; 16 STK 9940B FC and 32 IBM 3592 FC tape drives; 6 STK 9310 silos; 32PB capacity]
    4. SAMQFS ARCHITECTURE CONT’D…
       [Diagram: STK silo layout, LSM_0 through LSM_5 connected by pass-through channels, with LCUs and drive panels of 9940B, 9840C, 3590E, and J2A drives]
       DESCRIPTION
       1.) one existing 20 drive panel on LSM4’s panel 10 will hold 20 J2A
       2.) one existing 20 drive panel on LSM5’s panel 10 will hold 12 J2A
       3.) will install a 20 drive panel on LSM5’s panel 1 to hold 12 J2A
       4.) will install a 20 drive panel on LSM3’s panel 9 to hold 20 J2A
    5. …TAKE AWAY
       • Complex environment
         • Many dependencies (SAN, metadata, tape drives, silo)
       • Aging infrastructure
         • Puts pressure on all the dependencies
         • Tech refresh way overdue
       • Archival data is difficult to access: high latency, lower bandwidth, user interfaces
       • Difficult to share archival data with multiple users
       • All too often archived data, particularly HPC simulations, is “write-once-read-never”
    6. Where SDSC is at today: Data Services Overview
       Cloud Storage (OpenStack Swift)
       • Purpose: Storage of digital data for ubiquitous access and high durability
       • Access Mechanisms: Swift/S3 API, Cloud Explorer, clients, CLI
       Traditional File Server Storage (NFS/CIFS)
       • Purpose: Typical project / user storage needs
       • Access: NFS/CIFS/iSCSI
       High Performance Computing Storage (PFS)
       • Purpose: High-performance transient storage to support HPC
       • Access Mechanisms: Lustre on HPC systems (Gordon, Trestles, Triton)
    7. Goals for Cloud/Object Storage
       • Support NSF Data Management Plan
         • Required plan to describe how research results are shared
       • 99.5% system availability
       • Automated file replication
         • Default 2 copies, able to keep additional offsite replicas
       • Automated checksum verification and error correction
       • Scalable
         • Performance and capacity grow by incremental bricks
       • Multifaceted accessibility
         • Web, API, graphical and command line clients
       • Cost competitive
         • Operated as a recharge service
         • On par with current tape-based dual-copy costs of $0.0325/GB/mo
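The automated checksum verification in the goals above works roughly the way Swift's object auditor does: an object's MD5 is recorded as its ETag at write time, and a background process periodically re-reads the stored bytes and compares. A minimal sketch of that comparison (chunked hashing so large objects are streamed, not held in memory; the function names are illustrative, not Swift's actual API):

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # stream in chunks so a large object never sits in memory


def compute_etag(stream):
    """MD5 an object's bytes chunk by chunk, as Swift does at write time."""
    md5 = hashlib.md5()
    for chunk in iter(lambda: stream.read(CHUNK_SIZE), b""):
        md5.update(chunk)
    return md5.hexdigest()


def audit_object(stream, stored_etag):
    """Re-read the object and detect corruption by comparing to the stored ETag."""
    return compute_etag(stream) == stored_etag
```

On a mismatch the real auditor quarantines the bad copy and lets replication restore it from the surviving copy, which is what makes the dual-copy design self-correcting rather than merely redundant.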
    8. Why OpenStack?
       Industry Standard
       • More than 100 leading companies from over a dozen countries are participating in OpenStack, including Cisco, Citrix, Dell, Intel and Microsoft.
       Proven Software
       • The OpenStack cloud operating system is the same software that powers many large public and private clouds, including RackSpace Cloud Storage.
       Control & Flexibility
       • An open source platform means no lock-in to a proprietary vendor, and the modular design can integrate with legacy or 3rd-party technologies. The OpenStack project is provided under the Apache 2.0 license.
       Highly Compatible
       • Compatibility with public OpenStack clouds means it’s easy to migrate data and apps to public clouds when desired, based on security policies, economics, and other key business criteria.
    9. Design Highlights
       • 100% dual-copy disk storage
       • Initial 5.5PB (petabytes)
       • Dual 10Gb Arista connected, 8GB aggregate I/O performance
       • Off-site replication (UC Berkeley)
       • Continuous file integrity verification
       • Helps PIs meet NSF Data Management requirements
    10. OpenStack Cloud Storage Architecture
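Swift's shared-nothing scale-out comes from its hash ring: object paths hash onto a fixed set of partitions, and partitions are assigned to disks across failure zones, so the two copies of the dual-copy design never land in the same zone and adding a storage brick only remaps a fraction of partitions. A much-simplified sketch of the idea, assuming at least as many zones as replicas (real Swift rings are prebuilt, weighted files; all names here are illustrative):

```python
import hashlib

PART_POWER = 8   # 2**8 = 256 partitions; real rings use a much larger power
REPLICAS = 2     # mirrors the dual-copy policy described above


def partition(account, container, obj):
    """Hash the object path onto one of 2**PART_POWER partitions."""
    path = f"/{account}/{container}/{obj}".encode()
    digest = hashlib.md5(path).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - PART_POWER)


def build_ring(devices):
    """Assign each partition REPLICAS devices, never reusing a zone.

    devices: list of (name, zone) tuples, e.g. ("brick1", "zone-a").
    Rotating the device list per partition spreads load evenly.
    """
    ring = []
    for part in range(2 ** PART_POWER):
        start = part % len(devices)
        picks, zones_used = [], set()
        for name, zone in devices[start:] + devices[:start]:
            if zone not in zones_used:
                picks.append(name)
                zones_used.add(zone)
            if len(picks) == REPLICAS:
                break
        ring.append(picks)
    return ring
```

A proxy node needs only this mapping to route any request, so proxies and storage bricks can be added independently, which is the incremental-brick scaling the goals slide refers to.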
    11. Current Usage
    12. Usage Breakdown
       • Application integration
       • Native clients
       • File system emulation
       • Backups (s3backer, Panzura, Whitewater)
    13. Access methods
       Integrated
       • UCSD Library Collection management
       Client tools
       • SDSC Cloud Explorer
       • swift python client
       • Cyberduck
       • s3backer
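All of the client tools above speak the same two-step HTTP protocol: authenticate once to obtain a storage URL and token, then address objects as storage-url/container/object with the token in a header. A sketch of the requests involved in Swift's classic v1.0 auth (plain dicts rather than a real HTTP client; the endpoint paths and credentials below are illustrative, only cloud.sdsc.edu itself comes from the title slide):

```python
def auth_request(auth_url, account, user, key):
    """First step: GET the auth endpoint with credentials in headers.

    A Swift cluster answers with X-Storage-Url and X-Auth-Token headers,
    which every later request reuses.
    """
    return {
        "method": "GET",
        "url": auth_url,
        "headers": {"X-Auth-User": f"{account}:{user}", "X-Auth-Key": key},
    }


def object_request(storage_url, token, container, obj, method="GET"):
    """Second step: address an object under the storage URL with the token."""
    return {
        "method": method,
        "url": f"{storage_url}/{container}/{obj}",
        "headers": {"X-Auth-Token": token},
    }
```

Because the protocol is just HTTP plus two headers, anything from a web app to a batch script can act as a client, which is why the deck can list integrated applications, GUIs, and command line tools side by side.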
    14. UCSD Library Collections
    15. SDSC Swift Web Client (Cloud Explorer)
       Features:
       • Uploads/Downloads/Rename/Move
       • Permissions management
       • Change password
       • Display container share URL
    16. Others (Command Line, GUI, Filesystem)
       Swift Python Client
       • Batch processing
       • Large file upload support
       • Lacking in features and error logging/recovery
       Cyberduck
       • Drag-and-drop GUI for Macs and Windows
       • No large file upload support
       s3backer
       • Compatible with existing tools (e.g. rsync, SFTP)
       • File system
       • Familiar
       • File sharing challenges
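The "large file upload support" that separates these clients comes down to Swift's per-object size limit (5GB by default): a capable client splits the upload into numbered segment objects and writes a small manifest object that stitches them back together on download. A sketch of that segmentation using an in-memory dict as the object store (the swift python client does the same over HTTP; segment sizes and naming here are illustrative):

```python
SEGMENT_SIZE = 4  # bytes, tiny for illustration; real clients use ~1GB segments


def upload_segmented(store, container, name, data):
    """Write data as numbered segment objects plus a manifest listing them."""
    segments = []
    for i in range(0, len(data), SEGMENT_SIZE):
        seg_name = f"{name}/{i // SEGMENT_SIZE:08d}"
        store[(container + "_segments", seg_name)] = data[i:i + SEGMENT_SIZE]
        segments.append(seg_name)
    # The manifest is a small object that merely points at the segments.
    store[(container, name)] = {"manifest": segments}


def download(store, container, name):
    """Reassemble the object by concatenating its segments in manifest order."""
    manifest = store[(container, name)]["manifest"]
    return b"".join(store[(container + "_segments", s)] for s in manifest)
```

Zero-padded segment names keep the segments in upload order, so a reader that lists the segment container lexically gets the bytes back in sequence; a web client like Cloud Explorer has to implement the same scheme to gain the large-upload support listed as upcoming.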
    17. Upcoming Features
       • Active Directory authentication integration
       • Large file upload support (Cloud Explorer)
       • Server-side encryption for at-rest data
    18. Questions?
       Email SERVICES@SDSC.EDU for more info!
       http://www.sdsc.edu
       http://rci.ucsd.edu
