The Jetstream cloud is a collaboration between Cyverse partners TACC and University of Arizona, University of Chicago, Johns Hopkins University, and Indiana University to bring the flexibility and ease-of-use of CyVerse Atmosphere to the entire community of science, at a much larger scale. Jetstream is a cloud resource operated as part of XSEDE, and built from two independent OpenStack clusters, each capable of supporting thousands of virtual machines and data volumes. The clusters are integrated via the user-friendly "Atmosphere" interface developed by CyVerse, with authentication enabled by Globus, and, unlike the CyVerse cloud also offer full access to Openstack web service APIs. Jetstream features a diverse catalog virtual machine templates. One can launch a personal Galaxy server, do advanced biostatistics, use Matlab, or experiment with new technologies like Docker, all on Jetstream. This talk highlights the unique capabilities of Jetstream and provides information about how researchers from all over can access it.
On-Demand Cloud Computing for Life Sciences Research and Education
1. funded by the National Science Foundation
Award #ACI-1445604
On-Demand Cloud Computing for Life
Sciences Research and Education
Matthew Vaughn(@mattdotvaughn)
ORCID 0000-0002-1384-4283
Director, Life Science Computing
Texas Advanced Computing Center
PI @ Jetstream | Cyverse | Araport | CODE@TACC
2. Overview
• What is Cloud Computing?
• Why do we need Cloud Computing?
• What is Jetstream?
• How is Jetstream different from CyVerse
Atmosphere?
• How can one get started using
Jetstream?
3. “…A model for enabling ubiquitous, convenient,
on-demand network access to a shared pool of
configurable computing resources (e.g.,
networks, servers, storage, applications and
services) that can be rapidly provisioned and
released with minimal management.” NIST
Definition of Cloud Computing
http://www.nist.gov/itl/csd/cloud-102511.cfm
What is Cloud Computing?
7. Why don’t we all use Cloud?
Cloud computing is complex to adopt
• It can require a high degree of technical
knowledge
• Users can incur unexpected costs
• Data management and movement can be
challenging
• There’s no systematic support for adopting
cloud technologies or services
8. Jetstream is a Managed Cloud tailored to science and
engineering
9. Funded by the National Science Foundation
Award #ACI-1445604
http://jetstream-cloud.org/
Supported by NSF ACI, operated by your friends
Construction
Application / Community LeadsManagement & Operations
Vendors
10. It’s built to address multiple user perspectives
User Accomplishments Role
• Learned how to use the shell and how to work with Linux
• Mastered using R to develop plots for his manuscript
• Published VM to IU Scholarworks to allow reproducible analysis
Laboratory
scientist
• Launches an instance and has full sudo access to customize
• Developed a software with R and Python library dependencies
• She updates it regularly by creating VM image snapshots
Informatics
specialist
• Linked several Jetstream instances with Apache Hadoop
• Worked with XSEDE ECSS to import existing Amazon image
• Built a simple science gateway to allow others to use his tools
Core facility staff
11. Funded by the National Science Foundation
Award #ACI-1445604
http://jetstream-cloud.org/
Leadership/national class machines
Institutional HPC, HTC systems
Jetstream serves the “Long Tail”
12. Funded by the National Science Foundation
Award #ACI-1445604
http://jetstream-cloud.org/
Jetstream serves the “Long Tail”
Researchers, developers, and scientists who:
– Need between 1 and a few hundred cores
• RIGHT NOW
• For the foreseeable future, but not forever
– Want to fully customize the OS and configuration for their
research computing environment
– Are working with cloud-native applications & workflows
– Use interactive mode for their computing & analytics
13. Funded by the National Science Foundation
Award #ACI-1445604
http://jetstream-cloud.org/
• Jetstream is also helpful for:
– Science gateway operators
• Run web applications, databases, and services (front-end)
• Use it for on-demand provisioned compute capacity
– STEM educators teaching a variety of subjects
• Create a reference VM appliance
• Provision an entire classroom
• Minimize need for local IT support
• It democratizes access to cloud-native technologies and approaches
– no credit card or PO needed
Jetstream serves the “Long Tail”
14. Jetstream is an innovative combination of powerful
hardware and sophisticated software
15. Funded by the National Science Foundation
Award #ACI-1445604
http://jetstream-cloud.org/
Hardware Systems Overview
22. GateOne Web Shell
• Zero install SSH client
• Allows tablets or
Windows to easily
use the system
• Supports screen-
casting and terminal
sharing
Can also use native SSH
client with SSH keys
24. Jetstream enables access modes uncommon in
academic research computing
Web Services
• OpenStack API
• Compatible with all clients + libraries
• Python, Java, Go + CLI
• Atmosphere API – Pre-release
• Preview docs @
http://docs.atmospherev2.apiary.io/
• CLI later in 2017
Finicky
Automation, Orchestration, and Workflow
• Marathon/MESOS
• Docker Compose, Machine, Swarm
• Kubernetes
• CloudMan & Elasticluster
• OpenStack Magnum
Configuration Management Tools
• Vagrant / Terraform
• Chef
• Ansible
• Puppet
In Dev
25. Annotate a Genome Using WQ-Maker
Data analysis in Rstudio
Run your own JupyterHub
Publish a VM with DOI
Operate a Galaxy server
Teach a Bioinformatics class
Experiment with Deep Learning
Master Docker or K8S
Mine data using ArcGIS
Develop web services
26. Jetstream is a little different from CyVerse Atmosphere
(but not too much)
27. Funded by the National Science Foundation
Award #ACI-1445604
http://jetstream-cloud.org/
Compared to CyVerse Atmosphere, Jetstream…
• Is allocated via XSEDE competitive process
• Gets its support from XSEDE-sponsored staff
• Uses XSEDE credentials and SSH keys for login
• Supports more APIs and automation
• Has more, bigger virtual machine hosts & better
networking
• Is not integrated with CyVerse Data Store
29. Funded by the National Science Foundation
Award #ACI-1445604
http://jetstream-cloud.org/
It’s pretty easy:
• Sign up for a free XSEDE account (portal.xsede.org)
• Get an allocation (startups take 1 business day to approve)
• Start using it use.jetstream-cloud.org
• Read more and get help at www.jetstream-cloud.org
• Cite Jetstream in your papers (instructions on web)
• Follow the project on Twitter @jetstream_cloud
33. Funded by the National Science Foundation
Award #ACI-1445604
http://jetstream-cloud.org/
What comes next?
• Passed acceptance (handily) in May
• OpenStack clouds, CEPH storage, and software components functional
– Substantial software integration and development required post-
acceptance. Terra incognita!
• Running in “Early operations mode”
– Extra maintenance days
– A few rough edges (especially for power users)
• Full production - Sep 1, 2016
• More features and capabilities will come
• Public roadmap soon
34. Free tier makes it really easy to get
started on public cloud$
Our idea: Let any user with active
XSEDE User Portal account use a
small (but functional) slice of
Jetstream
• Get an XSEDE account
• Sign in to XSEDE User Portal
• Click “Trial Jetstream Access”
button
• Get access to Jetstream in about
30 minutes
35. Funded by the National Science Foundation
Award #ACI-1445604
http://jetstream-cloud.org/
Jetstream in context
• HTC system
• GPGPUs
• Virtual Clusters
• Gateways
• Data Intensive
• Portal-based
configuration
• Services
• Data sets
• Large Memory
• Data Intensive
Computing
• Managed VM
• Self-service VM
• Gateways
• Minimal disk
• Extensible
COMET WRANGLER BRIDGES JETSTREAM
36. Funded by the National Science Foundation
Award #ACI-1445604
http://jetstream-cloud.org/
Jetstream in context (2)
Blue Waters & Stampede 1/2
Comet, Bridges, Regional HPC, Public Cloud$
37. Funded by the National Science Foundation
Award #ACI-1445604
http://jetstream-cloud.org/
Summary
Jetstream is a public, production cloud platform
• Offers on-demand interactive shell + VNC
• Supports configurable software environments
• Enables configurable computing resources
• Encourages work with cloud-native software
• Empowers novice, intermediate, & expert users
39. Funded by the National Science Foundation
Award #ACI-1445604
http://jetstream-cloud.org/
How can I use Jetstream?
• An XSEDE User Portal (XUP) account is required. They are free!
Get one at https://portal.xsede.org
• Read the Allocations Overview -
https://portal.xsede.org/allocations-overview
• Write a successful allocation request – start with a Startup or
Education request - https://portal.xsede.org/successful-requests
40. Funded by the National Science Foundation
Award #ACI-1445604
http://jetstream-cloud.org/
Where can I get help or learn more?
• Production:
– Web app: http://use-jetstream-cloud.org/
– User guides: https://portal.xsede.org/jetstream
– XSEDE KB: https://portal.xsede.org/knowledge-base
– Email: help@jetstream-cloud.org
– Campus Champions: https://www.xsede.org/campus-
champions
– Training Videos / Virtual Workshops (TBD)
41. Expanding NSF XD’s reach and impact
Around 299,000 researchers, educators, & learners received NSF support in
2012-2013
– Only 1.5% completed a computation, data analysis, or visualization
task on XSDEDE program resources
– Less than 3% had an XSEDE Portal account
– 70% of researchers surveyed* claimed to be resource constrained
Why don’t they use XSEDE systems?
– Activation energy is pretty high
– HPC resources are scarce and not well-matched to their needs
– They just don’t need that much capability
* https://www.xsede.org/xsede-nsf-release-cloud-survey-report
42. Funded by the National Science Foundation
Award #ACI-1445604
http://jetstream-cloud.org/
What is Jetstream?
A production cloud platform for NSF-sponsored researchers
• Provides on-demand interactive computing and analysis
• Enables configurable environments and architectures
• Supports computational reproducibility and sharing
• Democratizes access to cloud-native software
• Focused on ease of use for all adopters
Expands the community of users who benefit from NSF investment in
shared cyberinfrastructure
43. Flavor vCPUs RAM Storage Per Node
m.tiny 1 2 20 46
m.small 2 4 40 23
m.medium 6 16 130 7
m.large 10 30 230 4
m.xlarge 22 60 460 2
m.xxlarge 44 120 920 1
VM Host Configuration
• Dual Intel E-2680v3 “Haswell”
• 24 physical cores/node @ 2.5
GHz (Hyperthreading on)
• 128 GB RAM
• Dual 1 TB local disks
• 10GB dual uplink NIC
• Running Centos7+KVM
Hypervisor
Hardware Specifics
CEPH Storage
• 20x Dell 730xd per cloud
• 2x10Gbs bonded NIC per 730xd
• Running CEPH 0.94.5 Hammer
• Configured as OpenStack Storage
• Storage is XSEDE-allocated
• Implemented on backend as OpenStack Volumes
• Each user gets 10 volumes up to 500GB total storage
• Exploring object storage as well but that’s in the future
44. Key components: What is OpenStack?
• Free and open source software for creating
private and public clouds
• Started in 2010 by Rackspace and NASA,
now managed by OpenStack Foundation
• Widespread adoption across industry and
public sector
• Modular architecture with a 6-month update
cycle
https://www.openstack.org/
45. Key components: Everything else
Component Function
Atmosphere Web application + middleware for providing user-friendly
provider-agnostic IaaS
Globus Auth Provides powerful identity and access management functions
that are easily integrated into web and mobile applications
XSEDE Services Centralized account management, allocation, and reporting
CEPH Distributed object store and file system designed to provide
excellent performance, reliability and scalability
Editor's Notes
Key distinction: Cloud is someone else’s computer. Cloud computing in this context is some else’s computer that YOU can reconfigure and automate.
We could build clusters optimized for these variables…
But now we need flexibility! Jetstream doesn't’t solve hardware issues but is aimed at other challenging aspects.
Jetstream grows out of experience by the iPlant Collaborative running a public science cloud (since 2012)
Offers helpful functions and a growth path for every user
According to a recent survey, only a few percent of NSF-supported researchers/students have ever used NSF shared resources. Resources not matched to their needs, high activation energy, don’t need THAT much capacity. For them, def of HPC is similar to the def of Big Data.
It’s more than you’re used to or comfortable with
User friendly, task oriented interface around discovery, usage, sharing of VM appliances
Land in Dashboard, see community activity, your resources, news, maintenance notices, etc.
Search for images. Description, tags, author. Sometime in 2017, software installed via common package managers
Favorite your faves to come back to them. JS is young and has but a few, but we know this is critical because Cyverse catalog is 250+
Image detail: Tags, etc. Previous versions. Author.
List of all instances associated with a project.
Find out more, watch status of launches and other operations.
Note that my username on the system is my XSEDE username. This is a big pain point for novice users
Note also that I have superuser. This is critical as well.
Custom clusters, tailored to your workload.
One example feature: Easy button.
Close by putting JS in context of the 3 other NSF-deployed systems from last 18mo.
Each fills a role, and there are synergies.
Flexible, modest-scale, exploratory computing and analytics
Onramp to the diverse XSEDE ecosystem for research and education
Aside: Tracking the component upgrade cycle is hard. Imagine if Slurm, GPFS, LDAP, etc all changed function radically every 6 moths.
This is partially why it makes sense to run a public free cloud. We can concentrate effort on doing this well, and provide DOMAIN SUPPORT as well.