OpenStack and Ceph case study at the University of Alabama



The University of Alabama at Birmingham gives scientists and researchers a massive, on-demand, virtual storage cloud using OpenStack and Ceph for less than $0.41 per gigabyte. This session at the OpenStack Summit was given by Kamesh Pemmaraju of Dell and John-Paul Robinson of the University of Alabama at Birmingham. It details how the university IT staff deployed a private storage cloud infrastructure using the Dell OpenStack cloud solution with Dell servers, storage, and networking, together with OpenStack and Inktank Ceph. After assessing a number of traditional storage scenarios, the university partnered with Dell and Inktank to architect a centralized cloud storage platform that could scale seamlessly and rapidly, was cost-effective, and could leverage a single hardware infrastructure for the OpenStack compute and storage environment.

Published in: Technology


  1. Case Study: The University of Alabama at Birmingham: OpenStack, Ceph, Dell. Kamesh Pemmaraju, Dell; John-Paul Robinson, UAB. OpenStack Summit 2014, Atlanta, GA
  2. An overview • Dell – UAB backgrounder • What we were doing before • How the implementation went • What we’ve been doing since • Where we’re headed
  3. Dell – UAB background • 900 researchers working on cancer and genomic projects. • Their growing data sets challenged available resources – Research data distributed across laptops, USB drives, local servers, HPC clusters – Transferring datasets to HPC clusters took too much time and clogged shared networks – Distributed data management reduced researcher productivity and put data at risk • They therefore needed a centralized data repository for researchers in order to ensure compliance with data retention requirements. • They also wanted a scale-out, cost-effective solution and hardware that could be repurposed for compute and storage
  4. Dell – UAB background (contd.) • Potential solutions investigated – Traditional SAN – Public cloud storage – Hadoop • UAB chose Dell/Inktank to architect a platform that would be highly scalable, provide a low cost per GB, and offer the best of all worlds by providing compute and storage on the same hardware.
  5. A little background… • We didn’t get here overnight • 2000s-era High Performance Computing • ROCKS-based compute cluster • The Grid and proto-clouds • GridWay Meta-scheduler • OpenNebula, an early entrant that connected grids with this thing called the cloud • Virtualization through-and-through • DevOps is US
  6. Challenges and Drivers • Technology • Many hypervisors • Many clouds • We have the technology…can we rebuild it here? • Applications • Researchers started shouting “Data”! NextGen Sequencing, Research Data Repositories, Hadoop • Researchers kept on shouting “Compute”!
  7. Data Intensive Scientific Computing • We knew we needed storage and computing • We knew we wanted to tie it together with an HPC commodity scale-out philosophy • So in August 2012 we bought 10 Dell 720xd servers • 16-core • 96GB RAM • 36TB Disk • A 192-core, ~1TB RAM, 360TB expansion to our HPC fabric • Now to integrate it…
  8. December 2012 • Bob said: Hearing good things about OpenStack and Ceph this week at Dell World. Simon Anderson, CEO of DreamHost, spoke highly of Dell, OpenStack, and Ceph today. He is also chair of company that supports He also spoke highly of Dell Crowbar deployment tool. I
  9. December 2012 • Bob said: Hearing good things about OpenStack and Ceph this week at Dell World. Simon Anderson, CEO of DreamHost, spoke highly of Dell, OpenStack, and Ceph today. He is also chair of company that supports He also spoke highly of Dell Crowbar deployment tool. • I said: Good to hear. I've been thinking a lot about Dell in this picture too. We have the building blocks in place. Might be a good way to speed the construction.
  10. Lesson 1: Recognize when a partnership will help you achieve your goals.
  11. The 2013 Implementation • The Timeline • In January we started our discussions with Dell and Inktank • By March we had committed to the fabric • A week in April and we had our own cloud in place • The Experience • Vendors committed to their product • Direct engagement through open communities • Bright people who share your development ethic
  12. Next Step…Build Adoption • Defined a new storage product based on the commodity scale-out fabric • Able to focus on strengths of Ceph to aggregate storage across servers • Provision any-sized image to provide Flexible Block Storage • Promote cloud adoption within IT and across the research community • Demonstrate utility with applications
  13. Applications • CrashPlan Backup in the cloud • A couple of hours to provision the VM resources • An easy half-day deploy with the vendor because we controlled our resources, a.k.a. the firewall • Add storage containers on the fly as we grow…10TB in a few clicks • GitLab hosting • Start a VM spec’d according to project site • Work with Omnibus install. Hey, it uses Chef! • Research Storage • 1TB storage containers for cluster users • Uses Ceph RBD images and NFS • The storage infrastructure part was easy • Scaled provisioning: 100+ user containers (100TB) created in about 5 minutes • Add storage servers as existing ones fill
  14. Ceph Rebalances as Storage Grows :)
  15. Lesson 2: Use it! That’s what it’s for!
  16. Lesson 2: Use it! That’s what it’s for! The sooner you start using the cloud, the sooner you start thinking like the cloud.
  17. How PoC Decisions Age Over Time • Pick the environment you want when you are in operation…you’ll be there before you know it • Simple networking is good • But don’t go basic unless you are able to reinstall the fabric • Class B ranges to match the campus fabric • We chose a split admin range to coordinate with our HPC admin range • We chose a collapsed admin/storage network due to a single switch…probably would have been better to keep them separate and allow growth • It’s OK to add non-provisioned interfacing nodes…know your net • Avoid painting yourself into a corner • Don’t let the Paranoid Folk box in your deployment • An inaccessible fabric is an unusable fabric • Fixed IP range mismatch with “fake” reservations
  18. Lesson 3: The fabric is flexible. Let it help you solve your problems.
  19. Problems Will Arise • The release version of the ixgbe driver in the Ubuntu 12.04.1 kernel didn’t perform well with our 10Gbit cards • Open source has an upstream • Use it as part of your debug network • Upgrading the drivers was a simple fix • Sometimes when you fix something you break something else • There are still a lot of moving parts but each has a strong open source community • Work methodically • You will learn as you go • Recognize the stack is integrated and respect tool boundaries
  20. Sometimes a Problem is just a Problem • Code ex
  21. Lesson 4: The code *is* the documentation
  22. Lesson 4: The code *is* the documentation …and that’s a *good* thing
  23. Where we are today • OpenStack plus Ceph are here to stay for our Research Computing System • They give us the flexibility we need for an ever-expanding research applications portfolio • Move our UAB Galaxy NextGen Sequencing platform to our Cloud • Add Object Storage services • Put the cloud in the hands of researchers • The big question…
  24. …how far can we take it? • The goal of process automation is scale • Incompatible, non-repeatable, manual processes are a cost • Success is in dual-use • Satisfy your needs and customer demand • Automating process implies documenting process…great for compliance and repeatability • Recognize the latent talent in your staff: today’s system admins are tomorrow’s systems developers • Traditional infrastructure models are ripe for replacement
  25. Lesson 5? You can learn from research and engage as a partner
  26. Want to learn more about Dell + OpenStack + Ceph? Join the Session, 2:00 pm, Tuesday, Room #313 • Software Defined Storage, Big Data and Ceph - What Is all the Fuss About? • Neil Levine, Inktank & Kamesh Pemmaraju, Dell
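
As an illustration of the research storage provisioning described on the Applications slide (1TB Ceph RBD containers created in bulk for cluster users), here is a minimal Python sketch that builds one `rbd create` command line per user. It is not the UAB team's actual tooling: the pool name `research`, the `<user>-home` image naming scheme, and the user list are hypothetical, and the script only prints commands rather than running them. It assumes `rbd create --size` takes its value in megabytes, the CLI's default unit.

```python
import shlex

# 1 TB expressed in MB, the assumed default unit of `rbd create --size`.
TB_IN_MB = 1024 * 1024


def rbd_create_commands(users, pool="research", size_mb=TB_IN_MB):
    """Return one `rbd create` command string per user.

    Nothing is executed here; the pool name and the image naming
    scheme are hypothetical placeholders for illustration only.
    """
    commands = []
    for user in users:
        image = f"{user}-home"  # hypothetical image name
        argv = ["rbd", "create", image, "--size", str(size_mb), "--pool", pool]
        # Quote each argument so the line is safe to paste into a shell.
        commands.append(" ".join(shlex.quote(arg) for arg in argv))
    return commands


if __name__ == "__main__":
    # Hypothetical user list: user001 .. user100.
    users = [f"user{i:03d}" for i in range(1, 101)]
    cmds = rbd_create_commands(users)
    print(cmds[0])
    print(f"{len(cmds)} images of {TB_IN_MB} MB each")
```

RBD images are thin-provisioned, so creating many large images is mostly a metadata operation. That would be consistent with the roughly five minutes reported for 100+ containers, though the slides do not say how the provisioning was actually scripted.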