Case Study: The University of
Alabama at Birmingham
OpenStack, Ceph, Dell
Kamesh Pemmaraju, Dell
John-Paul Robinson, UAB
OpenStack Summit 2014
Atlanta, GA
An overview
• Dell – UAB backgrounder
• What we were doing before
• How the implementation went
• What we’ve been doing since
• Where we’re headed
Dell – UAB background
• 900 researchers working on Cancer and Genomic
Projects.
• Their growing data sets challenged available resources
– Research data distributed across laptops, USB drives, local
servers, HPC clusters
– Transferring datasets to HPC clusters took too much time
and clogged shared networks
– Distributed data management reduced researcher
productivity and put data at risk
• They therefore needed a centralized data repository for
researchers in order to ensure compliance with
data-retention requirements.
• They also wanted a cost-effective, scale-out solution
with hardware that could be re-purposed for compute &
storage
Dell – UAB background (cont'd)
• Potential solutions investigated
– Traditional SAN
– Public cloud storage
– Hadoop
UAB chose Dell/Inktank to architect a platform that
would be highly scalable, provide a low cost per GB,
and offer the best of both worlds: compute and storage
on the same hardware.
A little background…
• We didn’t get here overnight
• 2000s-era High Performance Computing
• ROCKS-based compute cluster
• The Grid and proto-clouds
• GridWay Meta-scheduler
• OpenNebula, an early entrant that connected
grids with this thing called the cloud
• Virtualization through-and-through
• DevOps is US
Challenges and Drivers
• Technology
• Many hypervisors
• Many clouds
• We have the technology…can we rebuild it here?
• Applications
• Researchers started shouting “Data!”
NextGen Sequencing
Research Data Repositories
Hadoop
• Researchers kept shouting “Compute!”
Data Intensive Scientific Computing
• We knew we needed storage and computing
• We knew we wanted to tie it together with an
HPC commodity scale-out philosophy
• So in August 2012 we bought 10 Dell R720xd servers
• 16-core
• 96GB RAM
• 36TB Disk
• A 192-core, ~1TB RAM, 360TB expansion to our
HPC fabric
• Now to integrate it…
December 2012
• Bob said:
Hearing good things about open stack and ceph at this week at dell world.
Simon anderson, CEO of dream host , spoke highly of
dell, open stack, and ceph today.
He is also chair of company that supports
He also spoke highly of dell crowbar deployment tool.
• I said:
Good to hear.
I've been thinking a lot about dell in this picture
too.
We have the building blocks in place. Might be a good
way to speed the construction.
Lesson 1:
Recognize when a partnership will help you
achieve your goals.
The 2013 Implementation
• The Timeline
• In January we started our discussions with Dell and
Inktank
• By March we had committed to the fabric
• A week in April and we had our own cloud in place
• The Experience
• Vendors committed to their product
• Direct engagement through open communities
• Bright people who share your development ethic
Next Step…Build Adoption
• Defined a new storage product based on the
commodity scale-out fabric
• Able to focus on strengths of Ceph to aggregate storage
across servers
• Provision any-sized images to provide Flexible Block
Storage (see the sketch after this list)
• Promote cloud adoption within IT and across
the research community
• Demonstrate utility with applications
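In our stack, "provision any-sized image" boils down to Cinder volumes backed by Ceph RBD. The sketch below is a minimal, hypothetical illustration of that workflow (not our production tooling); it assumes the era-appropriate cinder and nova CLIs are on the path, and the volume name, size, instance ID, and device path are placeholders.

```python
# Hypothetical sketch: provision an arbitrarily sized, RBD-backed Cinder volume
# and attach it to a running Nova instance. Names, sizes, and IDs are placeholders.
import subprocess


def create_volume(name, size_gb):
    # 'cinder create' carves out a new block device of the requested size (GB)
    subprocess.run(["cinder", "create", "--display-name", name, str(size_gb)],
                   check=True)


def attach_volume(instance_id, volume_id, device="/dev/vdb"):
    # attach the volume to an existing instance so it shows up as a local disk
    subprocess.run(["nova", "volume-attach", instance_id, volume_id, device],
                   check=True)


if __name__ == "__main__":
    create_volume("research-scratch-01", 500)   # a 500 GB flexible block device
    # attach_volume("<instance-id>", "<volume-id>") once the volume is available
```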
Applications
• Crashplan Backup in the cloud
• A couple hours to provision the VM resources
• An easy half-day deploy with the vendor because we controlled our
own resources (a.k.a. the firewall)
• Add storage containers on the fly as we grow…10TB in a few clicks
• Gitlab hosting
• Start a VM spec’d according to the project site
• Work with the Omnibus install. Hey, it uses Chef!
• Research Storage
• 1TB storage containers for cluster users
• Uses Ceph RBD images and NFS (see the sketch after this list)
• The storage infrastructure part was easy
• Scaled provisioning, 100+ user containers (100TB) created in about 5
minutes.
• Add storage servers as existing ones fill
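A minimal sketch of how 1TB per-user containers can be carved out of Ceph as RBD images and exported over NFS. This is an illustration under assumptions, not our actual provisioning scripts: the pool name, export root, client network, and user names are placeholders, and it assumes the rbd CLI and an NFS server are available on the gateway host.

```python
# Hypothetical sketch: provision a 1TB per-user "storage container" as a Ceph RBD
# image, put a filesystem on it, and export it over NFS.
# Pool, export path, network, and user names below are placeholders.
import os
import subprocess

POOL = "rbd"                       # assumed Ceph pool
EXPORT_ROOT = "/exports/research"  # assumed NFS export root on the gateway host


def run(*cmd):
    return subprocess.run(cmd, check=True, capture_output=True,
                          text=True).stdout.strip()


def provision_container(user, size_mb=1024 * 1024):
    image = f"{POOL}/home-{user}"
    run("rbd", "create", image, "--size", str(size_mb))  # 1 TB image (size in MB)
    device = run("rbd", "map", image)                    # 'rbd map' prints /dev/rbdN
    run("mkfs.ext4", "-q", device)
    mountpoint = f"{EXPORT_ROOT}/{user}"
    os.makedirs(mountpoint, exist_ok=True)
    run("mount", device, mountpoint)
    # export to the cluster network; the CIDR is a placeholder
    run("exportfs", "-o", "rw,no_root_squash", f"10.0.0.0/24:{mountpoint}")


if __name__ == "__main__":
    for user in ["alice", "bob"]:   # scaled provisioning is just a loop over users
        provision_container(user)
```

Scaled provisioning is then just a loop over the user list, which is how 100+ containers can appear in a matter of minutes.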
Ceph Rebalances as Storage Grows :)
Lesson 2:
Use it! That’s what it’s for!
The sooner you start using the cloud
the sooner you start thinking like the cloud.
How PoC Decisions Age Over Time
• Pick the environment you want when you are in
operation…you’ll be there before you know it
• Simple networking is good
• But don’t go basic unless you are able to reinstall the fabric
• Class B ranges to match the campus fabric
• We chose a split admin range to coordinate with our HPC admin range
• We chose a collapsed admin/storage network due to a single
switch…it probably would have been better to keep them
separate and allow for growth
• It’s OK to add non-provisioned interfacing nodes…know your net (see the sketch after this list)
• Avoid painting yourself into a corner
• Don’t let the Paranoid Folk box in your deployment
• An inaccessible fabric is an unusable fabric
• Fixed IP range mismatch with “fake” reservations
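“Know your net” is worth automating. Below is a small, hypothetical sanity check using Python’s standard ipaddress module that catches a fixed-IP range falling outside the campus Class B or colliding with the admin range; every CIDR block in it is an illustrative placeholder, not UAB’s real addressing.

```python
# Hypothetical sanity check for cloud network planning: verify that the fixed-IP
# range sits inside the campus Class B and does not overlap the admin range.
# All CIDR blocks below are illustrative placeholders.
import ipaddress

CAMPUS = ipaddress.ip_network("172.16.0.0/16")   # campus Class B (placeholder)
ADMIN = ipaddress.ip_network("172.16.10.0/24")   # split admin range (placeholder)
FIXED = ipaddress.ip_network("172.16.20.0/24")   # cloud fixed-IP range (placeholder)


def check_ranges():
    assert FIXED.subnet_of(CAMPUS), "fixed range falls outside the campus fabric"
    assert not FIXED.overlaps(ADMIN), "fixed range collides with the admin range"
    print(f"{FIXED} fits the campus fabric and leaves {ADMIN} untouched")


if __name__ == "__main__":
    check_ranges()
```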
Lesson 3:
The fabric is flexible. Let it help you solve your
problems
Problems will Arise
• The release version of the ixgbe driver in the Ubuntu
12.04.1 kernel didn’t perform well with our 10Gbit
cards
• Open source has an upstream
• Use it as part of your debug network
• Upgrading the drivers was a simple fix (see the sketch after this list)
• Sometimes when you fix something you break
something else
• There are still a lot of moving parts but each has a
strong open source community
• Work methodically
• You will learn as you go
• Recognize the stack is integrated and respect tool boundaries
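A minimal sketch of the kind of methodical check that pins down a driver problem like ours: report the NIC driver and version the kernel is actually running so it can be compared against the upstream release. It assumes ethtool is installed; the interface name is a placeholder.

```python
# Hypothetical sketch: report the NIC driver and version in use, so an
# underperforming in-kernel ixgbe can be compared against the upstream release.
# The interface name is a placeholder.
import subprocess


def driver_info(iface="eth2"):
    out = subprocess.run(["ethtool", "-i", iface], check=True,
                         capture_output=True, text=True).stdout
    # 'ethtool -i' prints lines like "driver: ixgbe" and "version: 3.x.y"
    return dict(line.split(": ", 1) for line in out.splitlines() if ": " in line)


if __name__ == "__main__":
    info = driver_info()
    print(f"running {info.get('driver')} version {info.get('version')}")
```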
Sometimes a Problem is just a Problem
• Code example
Lesson 4:
The code *is* the documentation
…and that’s a *good* thing
Where we are today
• OpenStack plus Ceph are here to stay for our
Research Computing System
• They give us the flexibility we need for an ever-expanding
research applications portfolio
• Move our UAB Galaxy NextGen Sequencing platform to
our Cloud
• Add Object Storage services
• Put the cloud in the hands of researchers
• The big question…
…how far can we take it?
• The goal of process automation is scale
• Incompatible, non-repeatable, manual processes
are a cost
• Success is in dual-use
• Satisfy your needs and customer demand
• Automating process implies documenting process…great for
compliance and repeatability
• Recognize the latent talent in your staff: today’s system
admins are tomorrow’s systems developers
• Traditional infrastructure models are ripe for
replacement
Lesson 5?
You can learn from research
and engage as a partner
Want to learn more about Dell +
OpenStack + Ceph?
Join the Session, 2:00 pm, Tuesday, Room #313
Software Defined Storage, Big Data and Ceph -
What Is all the Fuss About?
Neil Levine, Inktank &
Kamesh Pemmaraju, Dell
