Economy File Project
Lessons Learned: Deploying Very Low Cost Cloud Storage
Technology in a Traditional Research HPC Environment
Dirk Petersen
Scientific Computing Manager at
Fred Hutchinson Cancer Research Center
(graphs by Robert McDermott, Solution Architecture, FHCRC)
2
Economy File Project at Fred Hutch, Seattle
What we’ll cover:
• Who we are and what we do
• Why did we do it – requirements for project
• What else we tried first
• What we deployed – architecture, software, hardware
• How do we manage it
• What did it cost
• How well does it work
• What was the timeline
• What we learned
3
Economy File Project – Who are we and what do we do
What is Fred Hutch?
• Cancer & HIV research
• 3 Nobel Laureates
• $430M budget / 85% NIH funding
• Seattle campus with 13 buildings, 15 acres, and 1.5+ million sq ft of facility space
Research at “The Hutch”
• 2,700 employees
• 220 Faculty, many with custom requirements
• 13 research programs
• 14 core facilities
• Conservative use of information technology
IT at “The Hutch”
• Multiple data centers with >1,000 kW capacity
• 100 staff in Center IT plus divisional IT
• Team of 3 Sysadmins to support storage
• IT funded by indirects (F&A)
• Storage Chargebacks starting July 2014
4
Economy File Project – Why did we do it?
Researchers concerned about…
• Significant storage costs ($40/TiB/month chargebacks, first 5 TB free) and declining grant funding
• “If you charge us please give us some cheap
storage for old and big files”
• (Mis)perception of storage value ("I can buy a hard drive at Best Buy")
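The chargeback model in the first bullet can be written as a short function. This is a minimal sketch, assuming the 5 TB free allowance is counted in the same TiB units as the $40/TiB/month rate and that billing is linear above it; `monthly_chargeback` is a hypothetical name, not part of any Fred Hutch tooling.

```python
def monthly_chargeback(used_tib: float, rate_per_tib: float = 40.0,
                       free_tib: float = 5.0) -> float:
    """Monthly chargeback in dollars (hypothetical linear model).

    Usage up to free_tib is free; every TiB above it is billed at rate_per_tib.
    """
    billable_tib = max(0.0, used_tib - free_tib)
    return billable_tib * rate_per_tib


# A lab storing 30 TiB would pay for 25 TiB: 25 * $40 = $1,000 per month.
print(monthly_chargeback(30))  # 1000.0
```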
5
Economy File Project – Why did we do it?
Finance concerned about…
• Cost predictability and scale
- Data growth drives storage costs of up to $1M per year
- Omics data grows at 40% per year and chargebacks don't cover all costs
- Expensive forklift upgrades every few years
• The public cloud (e.g. Amazon S3) set new transparent cost benchmarks
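At 40% per year, data roughly doubles every two years, which is why forklift upgrades keep recurring. A quick compound-growth sketch; the 1,000 TB starting point is a hypothetical value for illustration, not a Fred Hutch figure:

```python
import math


def projected_capacity(current_tb: float, annual_growth: float, years: int) -> float:
    """Capacity after `years` of compound growth (0.40 means 40% per year)."""
    return current_tb * (1.0 + annual_growth) ** years


def doubling_time_years(annual_growth: float) -> float:
    """Years until capacity doubles at a constant compound growth rate."""
    return math.log(2) / math.log(1.0 + annual_growth)


# Hypothetical starting point: 1,000 TB growing at 40% per year.
print(round(projected_capacity(1000, 0.40, 5)))  # 5378
print(round(doubling_time_years(0.40), 2))       # 2.06
```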
6
Economy File Project – Why did we do it?
Plus we had the complexities of data protection and archiving
• How can I store something for the really long term, but still access it quickly whenever I need it?
• Difficulties meeting backup recovery time and recovery point objectives (RTO/RPO)
7
Economy File Project – Why did we do it?
So we looked for a solution – but not necessarily object storage…
• Users need to work with a mountable filesystem
• Self-service access with standard Unix tools (no call to IT!!)
• Need to support larger data sets for some key users
• 85% were okay with the current free 5 TB allocation on NAS
• But 15% need access to older/bigger data sets
• Don't spend all resources on a few customers (equity)
8
Economy File Project – Why did we do it?
So we looked for a solution – but not necessarily object storage…
• Performance needs are not too high, but may grow
• Large datasets are often put on cheap storage, but over time users often want faster access
• Some concerns about disaster recovery and resiliency
• Earthquakes in the Seattle region have reached magnitude 9.0 (in 1700)
• Replicate to multiple buildings and still keep costs low?
• In Phase II, we want to replicate to a faraway place where it does not rain bricks
Cars in Seattle covered in bricks after the 2001 Nisqually earthquake (AP Photo)
9
Economy File Project – Key Characteristics
• Permanent, static data
• CIFS and NFS access
• POSIX permissions
• Optimized for economy
• Self-healing/self-protecting
• Snapshot/versioning
• Manual file migration
• Very low cost
10
Economy File Project – Detailed Priorities
Must Have
• Scalability
• Data resilience and self-protection
• Protection from accidental deletion
• Adequate performance
• CIFS and mountable file system with >16 groups per user
• POSIX-like file system
• Low cost (<$20/TB/mo)
• Long term storage
• Supportable
• Monitoring and logging
• Self-service access
• Active Directory integration
• 99.5% Availability
Should Have
• Growth covered by chargeback
• Ability to leverage the cloud
• Ability to create an off-site replica
• Role-based access for sysadmins
• Symbolic links
Nice to Have
• Capacity quotas
• Migration tools
• Broader storage strategy
• Write Once Read Many (WORM)
• Unified management interface
• High-availability
• Multi-tenant partitioning
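The ">16 groups per user" requirement most likely refers to the classic NFS limit: an AUTH_SYS (AUTH_UNIX) credential carries at most 16 supplementary group IDs (RFC 5531), so longer group lists get truncated by the client. A minimal check, assuming the process's group list approximates what the NFS client would send; `exceeds_auth_sys_limit` is a hypothetical helper name:

```python
import os

# RFC 5531 authsys_parms: "unsigned int gids<16>" (supplementary groups).
AUTH_SYS_MAX_GIDS = 16


def exceeds_auth_sys_limit(group_ids) -> bool:
    """True if the distinct group IDs cannot all fit in an AUTH_SYS credential.

    Approximate: the primary GID travels in a separate field, while
    os.getgroups() may include it. Groups beyond the 16th are not seen
    by the server's permission checks.
    """
    return len(set(group_ids)) > AUTH_SYS_MAX_GIDS


if __name__ == "__main__":
    mine = os.getgroups()  # groups of the current process (Unix only)
    print(f"{len(mine)} groups, over AUTH_SYS limit: {exceeds_auth_sys_limit(mine)}")
```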
11
Economy File Project – Why Object Storage?
We liked Object/Cloud Storage because:
• Scaling of capacities – no forced end of life because of lack of capacity
• Manageability – e.g. no RAID or lower level OS management involved
• Resiliency – built-in protection, no out of band process (e.g. tape)
• Consistency / predictability – no performance penalties because of data striping, RAID rebuilds
• Flexibility – integration and extensibility better than file-based
But object storage was not a core requirement.
We would have accepted a POSIX filesystem as well.
12
Economy File Project – Why Swift/SwiftStack?
We liked Swift/SwiftStack because:
- Proven – used by some big clouds: HP Cloud, Rackspace, IBM, Disney (>50PB)
- Manageability – SwiftStack makes Swift easier to install & maintain than any other storage system we currently support
- Low cost – open source base plus commodity hardware
- Longevity – an archive life of 20+ years is better supported by an open source project than by any corporate storage product line (3PAR -> HP)
- Performance – shared-nothing architecture has no bottlenecks
- Durability – redundant data remains accessible even when two drives, nodes, or data centers fail, thanks to as-unique-as-possible data distribution
- Cloudiness – scientific applications can start adopting the cloud storage paradigm (e.g. RESTful API), and data will be easy to migrate to the public cloud when ready (this will take many years!!)
- Future – young project/product, still adding features and growing
Swift met both current and potential future needs
13
Economy File Project – Conceptual View
(Diagram: files pass through a CIFS/NFS storage gateway to Swift object controllers in E2, J4 and M4)
14
Economy File Project – Swift Data Redundancy
Swift places 3+ replicas of all data as-unique-as-possible:
• Single node cluster: disks are "as-unique-as-possible"
• Small cluster: storage nodes are "as-unique-as-possible"
• Large cluster: storage racks are "as-unique-as-possible"
• Multi-region: distributed data centers are "as-unique-as-possible"
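The "as-unique-as-possible" idea can be illustrated with a toy greedy placer: each new replica goes to the device that shares the fewest leading failure-domain tiers (region, zone, node, disk) with the replicas already placed. This is only a sketch of the principle, assuming a flat device list; Swift's real ring builder works on partitions and device weights and is considerably more involved. The cluster layout below is hypothetical.

```python
from typing import List, NamedTuple


class Device(NamedTuple):
    region: int
    zone: int
    node: str
    disk: str


def shared_tiers(a: Device, b: Device) -> int:
    """Number of leading tiers (region, zone, node, disk) two devices share."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def place_replicas(devices: List[Device], replicas: int = 3) -> List[Device]:
    """Greedily pick devices that are as unique as possible from earlier picks."""
    chosen: List[Device] = []
    for _ in range(replicas):
        candidates = [d for d in devices if d not in chosen]
        # Compare the worst overlap first, then the next worst, and so on;
        # ties are broken by device order so the result is deterministic.
        best = min(
            candidates,
            key=lambda d: (sorted((shared_tiers(d, c) for c in chosen), reverse=True), d),
        )
        chosen.append(best)
    return chosen


# Hypothetical layout: 3 zones x 2 nodes x 2 disks.
cluster = [Device(1, z, f"z{z}n{n}", f"d{d}")
           for z in (1, 2, 3) for n in (1, 2) for d in (1, 2)]
print(sorted({dev.zone for dev in place_replicas(cluster)}))  # [1, 2, 3]
```

With a single node the same rule spreads replicas over distinct disks; with only two zones the third replica lands on a different node in one of them.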
FAQ: Can we run only 2 replicas? Yes, we can, but it makes maintenance and quorum harder.
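Why two replicas complicate maintenance can be seen from simple quorum arithmetic. This sketch assumes writes need a strict majority of replicas to succeed; Swift's exact quorum rules may differ from this simple model, so treat the numbers as illustrative.

```python
def majority_quorum(replicas: int) -> int:
    """Smallest strict majority of `replicas` (simplifying assumption)."""
    return replicas // 2 + 1


def tolerated_outages(replicas: int) -> int:
    """Replica locations that can be down while a majority write still succeeds."""
    return replicas - majority_quorum(replicas)


for n in (2, 3, 4, 5):
    print(n, majority_quorum(n), tolerated_outages(n))
# 2 2 0  -> under this model, one node down for maintenance blocks writes
# 3 2 1
# 4 3 1
# 5 3 2
```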
15
Economy File Project – Why does Swift look so familiar?
Common foundations meant we could leverage
• Ops staff expertise
• Existing applications and best practices
• Existing troubleshooting tools (e.g. Wireshark)
Swift object storage building blocks
• Linux, Python, & rsync foundation
• Swift open source storage (written in Python)
• Compatible filesystem gateway (Linux / FUSE mount)
• On commodity hardware with consumer drives
• Managed by an always-up-to-date SwiftStack Controller installed in the cloud
Hutch’s Scientific Computing / HPC building blocks
• Linux, Python, & rsync foundation
• Running on commodity hardware
16
NextGen Storage Architecture at the Hutch – Where Swift Fits
(Diagram: Swift tier at $3/TB/month alongside the NAS tier at $40/TB/month)
17
Economy File Project – What were the other options?
GlusterFS worked well for us as a scratch file system, but:
• Could not find any proof of scalability beyond 1 PB
• Replication features seem complex and not widely used (e.g. web pages like "GlusterFS replication do's and don'ts")
• Need to manage some kind of RAID on the underlying storage box
Looked at fully integrated and commercial HSM solutions
• Hierarchical Storage Management "file stubbing" can work, but…
• Peers warned of too much management and unpredictable performance at times
• Many require forklift upgrades at some point
Plus reviewed a number of commercial object storage vendors, but they didn't meet cost/access requirements
• Erasure coding is less effective if you sustain a data center drop-out
18
Economy File Project – How Swift compares to other OSS
                        Swift *)   Ceph      Gluster
Founded                 2010       2007      2005
Language                Python     C++       C
Lines of code           72,000     524,000   798,000
Contributors (12 mo)    87         126       73
Commits / month         100+       500+      100+
*) without CIFS/NFS gateway
Is a young and lean code base a benefit?
19
Economy File Project – Why not in the public cloud?
Still uncertainties with storing genomic data in the public cloud…
- NIH policies and fuzziness – what is really OK?
• Genomic data must be secure to protect research participants
• Desire to audit data access (which only AWS provides?)
- All our compute is still local
But prepared for future migration
- Swift has an S3 API (e.g. boto, Galaxy)
- Programmers can develop cloud-compatible apps today
- No big forklift upgrades prior to migration
20
Economy File Project – Why not in the public cloud?
Costs and perception are changing…
• Amazon and Google just dropped storage prices significantly
• Cheap compared to our high-performance NAS tier at $40-60/TB/month
• But you still need a fast POSIX fs
Swift storage is very low cost at only $13/TB/month
• half the cost/TB of Amazon S3 or Google Cloud Storage
But cloud will often be chosen because we need a "Switzerland"
Storage cost ($/TB/month):
NAS      40
Amazon   27
Google   25
Swift    13
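The "half the cost" claim can be checked against the chart values with a quick ratio calculation that uses only the numbers above:

```python
# $/TB/month values from the cost chart.
COST_PER_TB_MONTH = {"NAS": 40, "Amazon": 27, "Google": 25, "Swift": 13}


def relative_cost(option: str, baseline: str) -> float:
    """Cost of `option` as a fraction of `baseline`."""
    return COST_PER_TB_MONTH[option] / COST_PER_TB_MONTH[baseline]


for baseline in ("Amazon", "Google"):
    print(f"Swift vs {baseline}: {relative_cost('Swift', baseline):.0%}")
# Swift vs Amazon: 48%
# Swift vs Google: 52%
```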
21
Economy File Project – What we deployed
Set up three zones in Swift for durability
- One zone per data center / building for redundancy and durability
Swift Storage Nodes
- Two in each zone
Swift Proxy Nodes
- Two proxy nodes initially – will scale based on traffic
SwiftStack Management Node
- Management only, not in path of user’s storage requests
- Management node in the cloud, local install optional
Filesystem gateway in one zone
- Only one initially – currently the only single point of failure
Initial deployment easily scaled/reconfigured with SwiftStack
22
The Hutch Campus & storage resiliency from 3 zones
23
Economy File Project – Architecture View
24
Economy File Project – Swift Storage Nodes
Supermicro SC847 4U chassis – configured by Silicon Mechanics
- 144 TB raw capacity (~130 TiB)
• No RAID controllers & no storage lost to RAID
• 36 x 4 TB 3.5" Seagate SATA desktop drives (24 front, 12 back)
• 2 x 120GB SSDs (OS + metadata)
- 10Gb Base-T connectivity
- Two Intel Xeon E5 CPUs and 64 GB RAM
- $13,239 each + tax
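The "~130 TiB" figure is the 144 TB raw capacity expressed in binary units, before Swift's replication is taken into account. The conversion as a small sketch:

```python
TB = 10**12   # decimal terabyte, as drive vendors count
TiB = 2**40   # binary tebibyte, as most OS tools report


def tb_to_tib(tb: float) -> float:
    return tb * TB / TiB


raw_tb = 36 * 4  # 36 drives x 4 TB in each node
print(raw_tb, round(tb_to_tib(raw_tb), 1))  # 144 131.0
# With 3 replicas, roughly a third of that holds unique user data.
```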
25
Economy File Project – Swift Storage Nodes
What grade of drive?
• Use consumer vs. enterprise-grade drives to save cost?
• But how long will consumer-grade drives live?
• Open question, but Backblaze estimates a median lifespan of about 6 years, which isn't bad at all
• What's the server lifecycle?
And how to replace them?
• Est. 20 drives/month will fail (in a 3 PiB system), i.e. roughly one failure every 1 to 2 days
• Replace failed drives every day or just once a month?
• With a bigger drive?
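Turning the monthly failure estimate into a daily rate shows what the two replacement schedules mean in practice. A minimal sketch using only the 20 drives/month estimate from the slide:

```python
def failures_per_day(failures_per_month: float) -> float:
    """Convert a monthly failure estimate to a daily rate (365-day year)."""
    return failures_per_month * 12 / 365


rate = failures_per_day(20)
print(round(rate, 2))      # 0.66 failures per day
print(round(1 / rate, 1))  # 1.5 days between failures, on average
# A monthly replacement run would therefore swap about 20 drives at once.
```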
26
Economy File Project – Networking Components
NETGEAR PROSAFE XS712T
• 12 ports 10Gb Base-T (Cat-6)
• 2 ports 10Gb SFP+
• $1,430 each
• Used in each zone
27
Economy File Project – POC Architecture
28
Economy File Project – What did it cost?
Totals for storage
• 216 x 4 TB storage drives across six storage nodes
• 864 TB of raw storage
• 260 TB of usable capacity after Swift redundancy
Costs for hardware alone
• Total cost is $83,437 for all nodes and switches
• $320 per TB of usable capacity
• Expected cost over a 5-year life is $5.35/TB/month
(plus replacement of drives that are out of warranty)
Operating costs
• Mainly from 0.25 FTE and software licenses
• Power/cooling costs very low due to cheap Pacific Northwest hydro power
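The per-TB figures follow directly from the totals above. A short sketch that reproduces them, assuming hardware cost only, spread evenly over five years, with no drive replacements:

```python
hardware_total = 83_437    # dollars, all nodes and switches
raw_tb = 216 * 4           # 216 drives x 4 TB
usable_tb = 260            # usable capacity quoted after Swift redundancy
lifetime_months = 5 * 12   # expected 5-year life

per_tb = hardware_total / usable_tb
per_tb_month = per_tb / lifetime_months

print(raw_tb, raw_tb / 3)      # 864 288.0
print(round(per_tb, 1))        # 320.9
print(round(per_tb_month, 2))  # 5.35
```

Dividing raw capacity by three replicas gives 288 TB, somewhat above the quoted 260 TB usable; the slide does not break down the difference.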
29
Economy File Project – How do we manage it?
We needed a system that requires minimal management
• Production deployment took about 7 days
• Managed by Enterprise Storage Ops team of 3
Find a hardware vendor (e.g. Silicon Mechanics) who
• Has experience in Swift deployment
• Will offer consumer hard drives ready to go in drive carriers
Should take less than 0.25 FTE to operate
• Make sure you have a parts depot for low-cost items (e.g. switches, drives)
Also needed robust, mature management tools
• SwiftStack's automation made roll-out and management simple, and you never need to upgrade this cloud app
30
Economy File Project – Managed by SwiftStack
Open source Swift is functional, but…
• Lacking some enterprise features like delete protection, monitoring, user management and CIFS/NFS
• Deployment requires a lot of manual tasks
With the proprietary SwiftStack Controller
• Provides a cloud-based management console w/ runtime on Swift nodes
• Filesystem gateway, undelete feature, user web portal to Swift
• Competent Swift support 24x7 (a global company: staff in Taiwan responding at 2am PST)
(Diagram: SwiftStack Console, SwiftStack Runtime, SwiftStack FS Gateway)
31
Economy File Project – Management with SwiftStack
SwiftStack provides control & visibility
- Deployment automation
• Let us roll out Swift nodes in 10 minutes
• Upgrading Swift across clusters with 1 click
- Monitoring and stats at cluster, node, and drive levels
- Authentication & Authorization
- Capacity & Utilization Management via Quotas and Rate Limits
- Alerting & Diagnostics
32
Economy File Project – SwiftStack Architecture
SwiftStack Controller
• Deployment Automation
• Authentication Services
• Ring & Cluster Management
• Client Support
• Capacity & Utilization Mgmt.
• Monitoring, Alerting & Diagnostics
• Rolling Upgrades & 24x7 Support
SwiftStack Nodes (2 to 1000s)
• Integrations & Interfaces: end-user web UI, legacy interfaces, authentication, utilization API, etc.
• Swift Runtime: integrated storage engine with all node components
• OpenStack Swift: released and supported by SwiftStack, 100% open source
• Standard Linux Distribution: off-the-shelf Ubuntu, Red Hat, CentOS
• Standard Hardware: Dell, HP, Supermicro, Quanta, Lenovo…
33
Economy File Project – How well does it work?
Overall costs are very low – exceeded project goals
• Only $13/TB/month all inclusive
• Much lower than current $40/TB/mo chargeback
Simple to manage
• Management focuses on CIFS/NFS gateway
• SwiftStack provides one web console for entire cluster
(Diagram: files flow through the CIFS/NFS gateway to the Swift nodes; the SwiftStack Console manages the cluster)
34
Economy File Project – How well does it work?
Initial performance is adequate
• All requests initially routed through a single CIFS/NFS gateway (ca. 300 MB/s)
• Doesn't take full advantage of the Swift cluster's redundancy via proxies
• SwiftStack provides monitoring of load on Swift nodes
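To put ca. 300 MB/s in perspective, here is a transfer-time sketch. The 10 TB dataset and the 2 GB/s direct-HTTP rate are hypothetical values chosen for illustration, not measurements:

```python
def transfer_hours(size_tb: float, rate_mb_per_s: float) -> float:
    """Hours to move size_tb (decimal TB) at a sustained rate in MB/s."""
    seconds = size_tb * 1e12 / (rate_mb_per_s * 1e6)
    return seconds / 3600


print(round(transfer_hours(10, 300), 1))   # 9.3 hours through the single gateway
print(round(transfer_hours(10, 2000), 1))  # 1.4 hours at a hypothetical 2 GB/s
```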
Lots of headroom for future
• Swift's native HTTP API is very fast compared to the filesystem gateway (multiple GB/s)
• Direct HTTP access goes from client directly to Swift node(s)
• Other options like application integration and web portal
• Can use SwiftStack or other load balancers as front-end
• Performance scales up linearly as nodes/clusters added
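Direct HTTP access means talking to Swift's object API: each object lives at `<storage URL>/<container>/<object name>`, and requests carry an `X-Auth-Token` header. The following is a minimal standard-library sketch that builds such a request. The endpoint, account, container, object name and token are hypothetical placeholders; in practice, the storage URL and token come from the cluster's auth service.

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import Request


def swift_object_request(storage_url: str, token: str, container: str,
                         object_name: str, method: str = "GET",
                         data: Optional[bytes] = None) -> Request:
    """Build (but do not send) a request against Swift's object API.

    Slashes in object_name are kept, so "run1/sample.bam" behaves like a
    pseudo-directory path inside the container.
    """
    url = f"{storage_url.rstrip('/')}/{quote(container, safe='')}/{quote(object_name)}"
    return Request(url, data=data, method=method, headers={"X-Auth-Token": token})


req = swift_object_request("https://swift.example.org/v1/AUTH_lab", "TOKEN",
                           "archive", "run1/sample 1.bam")
print(req.get_method(), req.full_url)
# GET https://swift.example.org/v1/AUTH_lab/archive/run1/sample%201.bam
```

Passing the request to `urllib.request.urlopen(req)` would send it; with `method="PUT"` and `data=...` the same helper uploads an object.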
35
Economy File Project – How well does it work?
Limitations in the CIFS / NFS gateway
• No symlinks or hardlinks (it's on the roadmap)
• e.g. cannot check out a GitHub repo onto the filesystem
• Archiving a directory structure that contains symlinks will be incomplete
• Cannot rename directory structures
• Need to rename each file in the directory
• File renames are slow (copy-then-delete operations)
• Renaming a file hangs all other I/O in the affected directory tree
• du command does not report correct file sizes (we wrote a replacement)
Recently completed/fixed CIFS / NFS gateway issues
• Full support for POSIX permissions (chmod 2770 myfolder now activates the setgid bit)
• vi occasionally complained about inaccurate metadata
File <> Object, but they may converge after all
• SwiftStack CIFS/NFS Gateway is the only NAS gateway that stores files as native Swift objects (use CIFS or boto to access the same object)
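The slide only says a du replacement was written, not how. One plausible approach, assumed here, is to sum logical file sizes (`st_size`) rather than allocated blocks, which a filesystem gateway may not report faithfully. `apparent_size` is a hypothetical name:

```python
import os
import stat


def apparent_size(path: str) -> int:
    """Total st_size in bytes of all regular files below path.

    Symlinks are neither followed nor counted, and block allocation is ignored,
    similar in spirit to `du --apparent-size`. Hard links would be counted
    once per name (the gateway does not support them anyway).
    """
    total = 0
    for root, _dirs, files in os.walk(path):  # os.walk does not follow dir symlinks
        for name in files:
            st = os.lstat(os.path.join(root, name))
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total
```

For example, `apparent_size("/mnt/economy/lab_x")` would return the total in bytes (the mount path is hypothetical).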
36
Economy File Project – NFS/CIFS Benefits for Users
Researchers use familiar access method
• CIFS/NFS gateway gives them same mountable fs access
• No retraining needed, same practices work
• Performance almost as good as existing legacy storage
Cost savings for users
• Big savings over NAS/SAN storage passed on to users
• $3/TB/month instead of $40/TB/month chargeback (1/13th cost)
Plus lots of future potential
• Simple to grow capacity for continuing -omics data glut
• HTTP RESTful APIs allow direct integration with lab automation
• Can take on more uses/systems as other storage ages
• Can take the CIFS/NFS training wheels off in the future
37
Economy File Project – Timeline
Rapid timeline?
• 2 years from 1st contact to go-live in April 2014
• Initial evaluation of SwiftStack only took 2 days
Initial targets for Swift storage
• Archive data, large datasets, BAM files from external sources, sequencer output (2x HiSeq 2500)
• 15-20 labs in 1st rollout of Swift production storage
First deployed via familiar access path
• All users initially on SwiftStack CIFS/NFS gateway
• Provides familiar mountable file system & AD integration
• Migration to other Swift clients later
• Web portal, HTTP and other APIs
• Much faster and more flexible
38
Economy File Project – Lessons Learned
• Communication / testing with users & ops is key
• Try to make life easier for ops staff
• Assistance from HPC or Architecture staff during initial phase
• Incentives such as delegating drive replacements to data center staff
• Communicate ALL limitations / unexpected behavior to users prior to first test
• And be ultra-responsive during the test phase (even if the questions are not related to the system)
39
Economy File Project – Lessons Learned
Prioritize requirements
• Near-term success is critical
• But don’t ignore need for future potential/headroom
Confirm requirements
• (example: symlinks)
• “You really need them?” “Yes”
• “What for?”, “Why are you asking, symlinks are totally essential!”
• (discussion to resume at later time)
40
Economy File Project – Lessons Learned
Giving up Tape Backup is a pretty big step
• Have to be certain that bugs do not affect data integrity
• Want some really big shops to use the same technology to feel comfortable
• Risk management to include your staff
• Staff errors must not lead to data loss
• Restrict access to data centers and management console
• Compliance best practices
41
Economy File Project – Lessons Learned
Think hard before putting a POSIX gateway in front of an object store for long-term data / archives
• Most gateways put opaque blobs into the object store, and you cannot use other clients or the cloud API to access the data
• Always dependent on the gateway appliance vendor
• Using a gateway appliance & Glacier is like a medieval marriage… forever
• The SwiftStack CIFS/NFS gateway has limitations, but will allow for a more seamless migration to cloud storage, as you can continue to use other clients (convergence?!)
• POSIX file system gateways are still good for many other datasets
42
Economy File Project – What’s Next with Swift(Stack)
Evaluate Data protection alternatives
• Backup acceleration (e.g. Riverbed Whitewater)
• Endpoint backup backend (e.g. Druva)
• SQL / Exchange backup (e.g. Cloudberry Backup)
Other potential use cases
• Galaxy storage backend (via S3 compatibility layer)
• File sharing backend (e.g. owncloud)
• High performance POSIX file system backend: Avere
• Software-only POSIX fs backend: NuageLabs
• Cloud drive backend (e.g. Cloudberry Drive, Expandrive)
• High capacity block storage gateway
Digging deeper
• Auditing object access using Syslog and Splunk
• Do we need to have a copy outside the earthquake zone?
43
Questions & Answers
BIOIT14: Deploying very low cost cloud storage technology in a traditional research HPC environment

Gluent Extending Enterprise Applications with Hadoopgluent.
 

Similar to BIOIT14: Deploying very low cost cloud storage technology in a traditional research HPC environment (20)

How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...
 
Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS
 
Accelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud EraAccelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud Era
 
Alluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio+Presto: An Architecture for Fast SQL in the CloudAlluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio+Presto: An Architecture for Fast SQL in the Cloud
 
ECS/Cloud Object Storage - DevOps Day
ECS/Cloud Object Storage - DevOps DayECS/Cloud Object Storage - DevOps Day
ECS/Cloud Object Storage - DevOps Day
 
In Place Analytics For File and Object Data
In Place Analytics For File and Object DataIn Place Analytics For File and Object Data
In Place Analytics For File and Object Data
 
Spectrum scale object analytics
Spectrum scale object analyticsSpectrum scale object analytics
Spectrum scale object analytics
 
Accelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud EraAccelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud Era
 
Accelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud EraAccelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud Era
 
HDFCloud Workshop: HDF5 in the Cloud
HDFCloud Workshop: HDF5 in the CloudHDFCloud Workshop: HDF5 in the Cloud
HDFCloud Workshop: HDF5 in the Cloud
 
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your MindDeliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
 
Unified Big Data Analytics: Any Stack, Any Cloud
Unified Big Data Analytics: Any Stack, Any CloudUnified Big Data Analytics: Any Stack, Any Cloud
Unified Big Data Analytics: Any Stack, Any Cloud
 
Harness the Power of Hybrid Cloud with AWS and Avere
Harness the Power of Hybrid Cloud with AWS and AvereHarness the Power of Hybrid Cloud with AWS and Avere
Harness the Power of Hybrid Cloud with AWS and Avere
 
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
 
Building a Distributed File System for the Cloud-Native Era
Building a Distributed File System for the Cloud-Native EraBuilding a Distributed File System for the Cloud-Native Era
Building a Distributed File System for the Cloud-Native Era
 
EMC EC Overview
EMC EC OverviewEMC EC Overview
EMC EC Overview
 
Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5
Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5
Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5
 
Cloudian Webinar - 7 Key Reasons why Object Storage lowers Storage TCO
Cloudian Webinar - 7 Key Reasons why Object Storage lowers Storage TCOCloudian Webinar - 7 Key Reasons why Object Storage lowers Storage TCO
Cloudian Webinar - 7 Key Reasons why Object Storage lowers Storage TCO
 
Data Orchestration for the Hybrid Cloud Era
Data Orchestration for the Hybrid Cloud EraData Orchestration for the Hybrid Cloud Era
Data Orchestration for the Hybrid Cloud Era
 
Gluent Extending Enterprise Applications with Hadoop
Gluent Extending Enterprise Applications with HadoopGluent Extending Enterprise Applications with Hadoop
Gluent Extending Enterprise Applications with Hadoop
 

Recently uploaded

Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 

Recently uploaded (20)

Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 

BIOIT14: Deploying very low cost cloud storage technology in a traditional research HPC environment

  • 6. Economy File Project – Why did we do it?
    Plus we had the complexities of data protection and archiving:
    • How can I store something for the really long term,
    • but still access it quickly whenever I need it?
    • Difficulties meeting backup recovery time and recovery point objectives
  • 7. Economy File Project – Why did we do it?
    So we looked for a solution, but not necessarily object storage:
    • Users need to work with a mountable filesystem
    • Self-service access with standard Unix tools (no call to IT!!)
    • Need to support larger data sets for some key users
      - 85% were okay with the current free 5TB allocation on NAS
      - But 15% need access to older/bigger data sets
    • Don't devote all resources to a few customers (equity)
  • 8. Economy File Project – Why did we do it?
    So we looked for a solution, but not necessarily object storage:
    • Performance needs not too high, but may want to grow
      - Large datasets are often put on cheap storage, but over time users often want them faster
    • Some concerns about disaster recovery and resiliency
      - Earthquakes in the Seattle region have reached magnitude 9.0 (the Cascadia earthquake of 1700)
      - Can we replicate to multiple buildings and still keep costs low?
      - In Phase II we want to replicate to a faraway place that does not rain bricks
    [Photo: Cars in Seattle covered in bricks after the 2001 Nisqually earthquake (AP Photo)]
  • 9. Economy File Project – Key Characteristics
    • Permanent, static data
    • CIFS and NFS access
    • POSIX permissions
    • Optimized for economy
    • Self-healing/self-protecting
    • Snapshot/versioning
    • Manual file migration
    • Very low cost
  • 10. Economy File Project – Detailed Priorities
    Must Have
    • Scalability
    • Data resilience and self-protection
    • Protection from accidental deletion
    • Adequate performance
    • CIFS and mountable file system with >16 groups per user
    • POSIX-like file system
    • Low cost (<$20/TB/month)
    • Long-term storage
    • Supportable
    • Monitoring and logging
    • Self-service access
    • Active Directory integration
    • 99.5% availability
    Should Have
    • Growth covered by chargeback
    • Ability to leverage the cloud
    • Ability to create an off-site replica
    • Role-based access for sysadmins
    • Symbolic links
    Nice to Have
    • Capacity quotas
    • Migration tools
    • Broader storage strategy
    • Write Once Read Many (WORM)
    • Unified management interface
    • High availability
    • Multi-tenant partitioning
  • 11. Economy File Project – Why Object Storage?
    We liked object/cloud storage because of:
    • Scaling of capacities – no forced end of life because of lack of capacity
    • Manageability – e.g. no RAID or lower-level OS management involved
    • Resiliency – built-in protection, no out-of-band process (e.g. tape)
    • Consistency/predictability – no performance penalties from data striping or RAID rebuilds
    • Flexibility – better integration and extensibility than file-based storage
    But object storage was not a core requirement; we would have accepted a POSIX filesystem as well.
  • 12. Economy File Project – Why Swift/SwiftStack?
    We liked Swift/SwiftStack because of:
    - Proven – used by some big clouds: HP Cloud, Rackspace, IBM, Disney (>50PB)
    - Manageability – SwiftStack makes Swift easier to install & maintain than any other storage system we currently support
    - Low cost – open source base plus commodity hardware
    - Longevity – an archive life of 20+ years is better supported by an open source project than by any corporate storage line (3PAR -> HP)
    - Performance – shared-nothing architecture has no bottlenecks
    - Durability – redundant data remains accessible even when two drives, nodes or data centers fail, thanks to as-unique-as-possible data distribution
    - Cloudiness – scientific applications can start adopting the cloud storage paradigm (e.g. a RESTful API), and data will be easy to migrate to the public cloud when ready (this will take many years!!)
    - Future – young project/product, still adding features and growing
    Swift met both current and potential future needs.
  • 13. Economy File Project – Conceptual View
    [Figure: Files pass through a CIFS/NFS storage gateway to three object controllers labeled E2, J4 and M4]
  • 14. Economy File Project – Swift Data Redundancy
    Swift places 3+ replicas of all data as uniquely as possible:
    • Single node: disks are “as-unique-as-possible”
    • Small cluster: storage nodes are “as-unique-as-possible”
    • Large cluster: storage racks are “as-unique-as-possible”
    • Multi-region: distributed data centers are “as-unique-as-possible”
    FAQ: Can we run only 2 replicas? Yes, we can, but maintenance and quorum become a concern.
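A toy placement function can illustrate the “as-unique-as-possible” rule. This is a simplified greedy sketch and not Swift's actual ring builder, which also balances partitions by device weight. The `Device` class, the `place_replicas` function, and the region/zone/node/disk labels are all hypothetical. Each new replica goes to the device that shares the fewest regions, then zones, then nodes with the replicas already placed. As a result, one cluster spreads copies across disks, then nodes, then zones, depending on what is available.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """A disk, identified by its place in the failure-domain hierarchy (hypothetical model)."""
    region: str
    zone: str
    node: str
    disk: str


def place_replicas(devices: list[Device], replicas: int = 3) -> list[Device]:
    """Greedily choose `replicas` distinct devices, each time preferring the one that
    shares the fewest regions, then zones, then nodes with the devices already chosen.

    Simplified illustration only; Swift's ring builder also weighs devices and
    assigns whole partitions rather than individual objects.
    """
    if replicas > len(devices):
        raise ValueError("not enough devices for the requested replica count")
    chosen: list[Device] = []
    for _ in range(replicas):
        regions = Counter(d.region for d in chosen)
        zones = Counter((d.region, d.zone) for d in chosen)
        nodes = Counter((d.region, d.zone, d.node) for d in chosen)

        def overlap(d: Device) -> tuple[int, int, int]:
            # Lexicographic cost: sharing a region is worst, then a zone, then a node.
            return (regions[d.region],
                    zones[(d.region, d.zone)],
                    nodes[(d.region, d.zone, d.node)])

        # Never reuse a disk; among the rest, take the least-overlapping device.
        candidates = [d for d in devices if d not in chosen]
        chosen.append(min(candidates, key=overlap))
    return chosen


if __name__ == "__main__":
    # Three zones (e.g. one per building), each with two nodes of two disks.
    devs = [Device("seattle", z, f"{z}-node{n}", f"disk{k}")
            for z in ("zone1", "zone2", "zone3") for n in (1, 2) for k in (1, 2)]
    print([d.zone for d in place_replicas(devs)])
```

With three zones available, the three replicas land in three different zones. With a single node, they land on three different disks.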
  • 15. Economy File Project – Why does Swift look so familiar?
    Common foundations meant we could leverage:
    • Ops staff expertise
    • Existing applications and best practices
    • Existing troubleshooting tools (e.g. Wireshark)
    Swift object storage building blocks:
    • Linux, Python & rsync foundation
    • Swift open source storage (written in Python)
    • Compatible filesystem gateway (Linux/FUSE mount)
    • Commodity hardware with consumer drives
    • Managed by an always up-to-date SwiftStack Controller installed in the cloud
    Hutch’s Scientific Computing/HPC building blocks:
    • Linux, Python & rsync foundation
    • Running on commodity hardware
  • 16. NextGen Storage Architecture at the Hutch – Where Swift Fits
    [Figure: storage tiers at the Hutch, labeled $3/TB/month and $40/TB/month]
  • 17. Economy File Project – What were the other options?
    GlusterFS worked well for us as a scratch file system, but:
    • Could not find any proof of scalability beyond 1 PB
    • Replication features seem complex and not widely used (e.g. web sites like “GlusterFS replication do's and don'ts”)
    • Need to manage some kind of RAID on the underlying storage box
    Looked at fully integrated and commercial HSM solutions:
    • Hierarchical Storage Management “file stubbing” can work, but…
    • Peers warned of too much management and unpredictable performance at times
    • Many require forklift upgrades at some point
    Plus reviewed a number of commercial object storage vendors, but they didn't meet cost/access requirements:
    • Erasure coding is less effective if you sustain a datacenter drop-out
  • 18. Economy File Project – How Swift compares to other OSS
                             Swift *)   Ceph       Gluster
    Founded                  2010       2007       2005
    Language                 Python     C++        C
    Lines of code            72,000     524,000    798,000
    Contributors in 12 mo    87         126        73
    Commits/month            100+       500+       100+
    *) without CIFS/NFS gateway
    Is a young and lean code base a benefit?
  • 19. Economy File Project – Why not in the public cloud?
    Still uncertainties with storing genomic data in the public cloud:
    - NIH policies and fuzziness – what is really OK?
      • Genomic data must be secure to protect research participants
      • Desire to audit data access (which only AWS provides?)
    - All our compute is still local
    But prepared for future migration:
    - Swift has an S3-compatible API (e.g. boto, Galaxy)
    - Programmers can develop cloud-compatible apps today
    - No big forklift upgrades prior to migration
  • 20. Economy File Project – Why not in the public cloud?
    Costs and perception are changing:
    • Amazon and Google just dropped storage prices significantly
    • Cheap compared to the high-performance NAS tier at $40-60/TB/month
    • But you still need a fast POSIX fs
    Swift storage is very low cost at only $13/TB/month:
    • About half the cost/TB of Amazon S3 or Google Cloud Storage
    But the cloud will often be chosen because we need a “Switzerland”
    $/TB/month:
      NAS       40
      Amazon    27
      Google    25
      Swift     13
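The per-TB prices listed on this slide can be used to check the “about half” comparison. This is a minimal sketch. The dictionary simply restates the slide's figures, and `relative_cost` is a hypothetical helper.

```python
# $/TB/month figures as listed on slide 20.
COST_PER_TB_MONTH = {"NAS": 40, "Amazon": 27, "Google": 25, "Swift": 13}


def relative_cost(tier: str, baseline: str) -> float:
    """Cost of `tier` expressed as a fraction of the `baseline` tier's cost."""
    return COST_PER_TB_MONTH[tier] / COST_PER_TB_MONTH[baseline]


if __name__ == "__main__":
    for other in ("NAS", "Amazon", "Google"):
        print(f"Swift costs {relative_cost('Swift', other):.2f}x the {other} price")
```

Swift comes out at roughly 48% of the Amazon price and 52% of the Google price, i.e. about half.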
  • 21. Economy File Project – What we deployed
    Set up three zones in Swift for durability
    - One zone per datacenter/building for redundancy and durability
    Swift storage nodes
    - Two in each zone
    Swift proxy nodes
    - Two proxy nodes initially; will scale based on traffic
    SwiftStack management node
    - Management only, not in the path of users' storage requests
    - Management node in the cloud, local install optional
    Filesystem gateway in one zone
    - Only one initially; currently the only single point of failure
    Initial deployment easily scaled/reconfigured with SwiftStack
  • 22. The Hutch Campus & storage resiliency from 3 zones
  • 23. Economy File Project – Architecture View
  • 24. Economy File Project – Swift Storage Nodes
    Supermicro SC847 4U chassis, configured by Silicon Mechanics
    - 144TB raw capacity (~130TiB usable)
      • No RAID controllers & no storage lost to RAID
      • 36 x 4TB 3.5” Seagate SATA desktop drives (24 front, 12 back)
      • 2 x 120GB SSDs (OS + metadata)
    - 10GBase-T connectivity
    - Two Intel Xeon E5 CPUs and 64 GB RAM
    - $13,239 each + tax
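The ~130TiB figure appears to match a decimal-to-binary unit conversion of 144TB rather than storage overhead. Swift's replication is spread across nodes and zones, not within a single node. A minimal sketch of that conversion follows. The constants are the standard SI and IEC unit sizes, and `tb_to_tib` is a hypothetical helper.

```python
TB = 10**12   # decimal terabyte, as used on drive labels
TiB = 2**40   # binary tebibyte, as typically reported by OS tools


def tb_to_tib(terabytes: float) -> float:
    """Convert decimal terabytes to binary tebibytes."""
    return terabytes * TB / TiB


drives_per_node = 36
drive_size_tb = 4
raw_tb = drives_per_node * drive_size_tb  # 144 TB per node

if __name__ == "__main__":
    print(f"{raw_tb} TB raw = {tb_to_tib(raw_tb):.1f} TiB")
```

144TB works out to just under 131TiB, in line with the slide's ~130TiB.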
  • 25. Economy File Project – Swift Storage Nodes
    What grade of drive?
    • Use consumer instead of enterprise-grade drives to save cost?
    • But how long will consumer-grade drives live?
      • Open question, but Backblaze thinks a median of 6 years isn't bad at all
    • What's the server lifecycle? And how do we replace servers?
    • Est. 20 drives/month will fail (in a 3 PiB system), so roughly 1 fails per working day
    • Replace failed drives every day or just once a month?
    • With a bigger drive?
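Failure estimates like the one above come down to drive count × annualized failure rate (AFR). This is a sketch under stated assumptions. The AFR is a free parameter you supply, since the slide does not give one. “3 PiB” is read here as raw capacity on 4TB drives. The 20 working days per month are also an assumption. All function names are hypothetical.

```python
import math


def drive_count(capacity_bytes: float, drive_bytes: float) -> int:
    """Drives needed to provide `capacity_bytes` of raw capacity."""
    return math.ceil(capacity_bytes / drive_bytes)


def failures_per_month(n_drives: int, afr: float) -> float:
    """Expected failures per month for an annualized failure rate `afr` (0.05 = 5%)."""
    return n_drives * afr / 12


def failures_per_working_day(per_month: float, working_days: int = 20) -> float:
    """Spread a monthly failure count over an assumed number of working days."""
    return per_month / working_days


if __name__ == "__main__":
    PiB, TB = 2**50, 10**12
    n = drive_count(3 * PiB, 4 * TB)  # assumes 3 PiB of *raw* capacity on 4 TB drives
    print(n, "drives")
    print(failures_per_working_day(20), "failures per working day at 20/month")
```

Plugging your own AFR into `failures_per_month` shows how sensitive the replacement workload is to drive quality. That sensitivity is the crux of the consumer-versus-enterprise question.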
  • 26. Economy File Project – Networking Components
    NETGEAR PROSAFE XS712T
    • 12 ports 10GBase-T (Cat-6)
    • 2 ports 10Gb SFP+
    • $1,430 each
    • Used in each zone
  • 27. Economy File Project – POC Architecture
  • 28. Economy File Project – What did it cost?
    Totals for storage
    • 216 4TB storage drives across six storage nodes
    • 864TB of raw storage
    • 260TB of usable capacity after Swift redundancy
    Costs for hardware alone
    • Total cost is $83,437 for all nodes and switches
    • $320 per TB of usable capacity
    • Expected cost over a 5-year life is $5.35/TB/month (plus replacement of drives that are out of warranty)
    Operating costs
    • Mainly from 0.25 FTE and software licenses
    • Power/cooling costs very low due to cheap Pacific Northwest hydro power
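The hardware-only figures on this slide follow from a few divisions. A minimal sketch restating them is below. The inputs are the slide's own numbers, and the 3-replica count is the Swift default described earlier. Like the slide, it excludes drive replacements and operating costs.

```python
drives, drive_tb, replicas = 216, 4, 3
total_hw_cost = 83_437        # USD, all nodes and switches (slide 28)
usable_tb = 260               # usable capacity after Swift redundancy (slide 28)
lifetime_months = 5 * 12      # 5-year service life

raw_tb = drives * drive_tb                      # 864 TB raw
one_copy_tb = raw_tb / replicas                 # raw capacity divided by the replica count
cost_per_usable_tb = total_hw_cost / usable_tb
cost_per_tb_month = cost_per_usable_tb / lifetime_months

if __name__ == "__main__":
    print(f"raw {raw_tb} TB, one copy {one_copy_tb:.0f} TB")
    print(f"${cost_per_usable_tb:.2f}/TB usable, ${cost_per_tb_month:.2f}/TB/month")
```

This reproduces roughly $320 per usable TB and $5.35/TB/month over five years.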
  • 29. Economy File Project – How do we manage it?
    Needed the system to require minimal management
    • Production deployment took about 7 days
    • Managed by the Enterprise Storage Ops team of 3
    Find a hardware vendor (e.g. Silicon Mechanics) who
    • Has experience in Swift deployment
    • Will offer consumer hard drives ready to go in drive carriers
    Should take less than 0.25 FTE to operate
    • Make sure you have a parts depot for low-cost items (e.g. switches, drives)
    Also needed robust, mature management tools
    • SwiftStack's automation made roll-out and management simple, and you never need to upgrade this cloud app
  • 30. Economy File Project – Managed by SwiftStack
    Open source Swift is functional, but…
    • Lacks some enterprise features like delete protection, monitoring, user management and CIFS/NFS
    • Deployment requires a lot of manual tasks
    With the proprietary SwiftStack Controller
    • Provides a cloud-based management console with a runtime on the Swift nodes
    • Filesystem gateway, undelete feature, user web portal to Swift
    • Competent 24x7 Swift support (a global company: staff in Taiwan responding at 2am PST)
    [Figure: SwiftStack Runtime, SwiftStack FS Gateway, SwiftStack Console]
  • 31. Economy File Project – Management with SwiftStack
    SwiftStack provides control & visibility
    - Deployment automation
      • Let us roll out Swift nodes in 10 minutes
      • Upgrading Swift across clusters with 1 click
    - Monitoring and stats at cluster, node and drive levels
    - Authentication & authorization
    - Capacity & utilization management via quotas and rate limits
    - Alerting & diagnostics
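In open-source Swift, quotas are enforced by proxy middleware; SwiftStack's console may expose them differently. The sketch below assumes the `container_quotas` middleware is enabled in the proxy pipeline. Under that assumption, a container quota is set by POSTing metadata headers to the container. The storage URL, token and container name are placeholders, `container_quota_request` is a hypothetical helper, and the request is only built here, not sent.

```python
import urllib.parse
import urllib.request
from typing import Optional


def container_quota_request(storage_url: str, token: str, container: str,
                            max_bytes: int,
                            max_objects: Optional[int] = None) -> urllib.request.Request:
    """Build (but do not send) a POST that sets a Swift container quota.

    Relies on the container_quotas middleware, which enforces the
    X-Container-Meta-Quota-Bytes / X-Container-Meta-Quota-Count metadata on uploads.
    """
    headers = {"X-Auth-Token": token,
               "X-Container-Meta-Quota-Bytes": str(max_bytes)}
    if max_objects is not None:
        headers["X-Container-Meta-Quota-Count"] = str(max_objects)
    url = f"{storage_url.rstrip('/')}/{urllib.parse.quote(container, safe='')}"
    return urllib.request.Request(url, method="POST", headers=headers)


if __name__ == "__main__":
    req = container_quota_request("https://swift.example.org/v1/AUTH_lab",
                                  "PLACEHOLDER_TOKEN", "archive", 5 * 10**12)
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would send it to a reachable Swift proxy.
```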
  • 32. Economy File Project – SwiftStack Architecture
    SwiftStack Nodes (2 to 1000s)
    - Standard Linux distribution: off-the-shelf Ubuntu, Red Hat, CentOS
    - Standard hardware: Dell, HP, Supermicro, Quanta, Lenovo…
    - Swift Runtime: integrated storage engine with all node components
    - Integrations & interfaces: end-user web UI, legacy interfaces, authentication, utilization API, etc.
    - OpenStack Swift: 100% open source, released and supported by SwiftStack
    SwiftStack Controller
    - Rolling upgrades & 24x7 support
    - Monitoring, alerting & diagnostics
    - Capacity & utilization mgmt.
    - Client support
    - Ring & cluster management
    - Authentication services
    - Deployment automation
  • 33. Economy File Project – How well does it work?
    Overall costs are very low and exceeded project goals
    • Only $13/TB/month all inclusive
    • Much lower than the current $40/TB/month chargeback
    Simple to manage
    • Management focuses on the CIFS/NFS gateway
    • SwiftStack provides one web console for the entire cluster
    [Figure: Files, CIFS/NFS Gateway, SwiftStack Console, three Swift Nodes]
  • 34. Economy File Project – How well does it work?
    Initial performance is adequate
    • All requests are initially funneled through a single CIFS/NFS gateway (ca. 300MB/s)
    • This doesn't take full advantage of the Swift cluster's redundancy via proxies
    • SwiftStack provides monitoring of load on Swift nodes
    Lots of headroom for the future
    • Swift's native HTTP API is very fast compared to the fs gateway (multiple GB/s)
    • Direct HTTP access goes from the client directly to Swift node(s)
    • Other options like application integration and web portal
    • Can use SwiftStack or other load balancers as front-end
    • Performance scales up linearly as nodes/clusters are added
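An upload through Swift's native HTTP API bypasses the gateway entirely. Below is a minimal standard-library sketch. The storage URL and token are assumed to come from your Swift auth service (for example TempAuth or Keystone). The account, container and object names are placeholders, and `put_object_request` is a hypothetical helper that builds the request without sending it.

```python
import hashlib
import urllib.parse
import urllib.request


def put_object_request(storage_url: str, token: str, container: str,
                       obj_name: str, data: bytes) -> urllib.request.Request:
    """Build (but do not send) a Swift upload: PUT <storage_url>/<container>/<object>.

    The ETag header carries the MD5 of the body so Swift can verify the upload;
    "/" in the object name is kept, which gives pseudo-directories.
    """
    path = f"{urllib.parse.quote(container, safe='')}/{urllib.parse.quote(obj_name)}"
    headers = {"X-Auth-Token": token,
               "ETag": hashlib.md5(data, usedforsecurity=False).hexdigest()}
    return urllib.request.Request(f"{storage_url.rstrip('/')}/{path}",
                                  data=data, method="PUT", headers=headers)


if __name__ == "__main__":
    req = put_object_request("https://swift.example.org/v1/AUTH_lab",
                             "PLACEHOLDER_TOKEN", "sequencing",
                             "2014/run1/sample.bam", b"example bytes")
    print(req.get_method(), req.full_url)
    # Against a reachable proxy, urllib.request.urlopen(req) returns 201 Created on success.
```

Each client talks HTTP to the proxies (or a load balancer in front of them), so throughput is not capped by a single gateway.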
  • 35. Economy File Project – How well does it work?
    Limitations in the CIFS/NFS gateway
    • No symlinks or hardlinks (it's on the roadmap)
      - e.g. cannot check out a GitHub repository into the FS
      - Archiving a directory structure that contains symlinks will be incomplete
    • Cannot rename directory structures
      - Need to rename each file in the directory
    • File renames are slow (copy-then-delete operations)
    • Renaming a file hangs all other IO in the affected directory tree
    • The du command does not report the right file size (wrote a replacement)
    Recently completed/fixed CIFS/NFS gateway issues
    • Full support for POSIX permissions (chmod 2770 myfolder now activates the setgid bit)
    • vi occasionally complained about inaccurate metadata
    File <> Object, but they may converge after all
    • The SwiftStack CIFS/NFS gateway is the only NAS gateway passing through native Swift objects (use CIFS or boto to access the same object)
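The du replacement mentioned above is not shown. One plausible approach, sketched here as an assumption, is to sum apparent file sizes (`st_size`) instead of allocated blocks, on the guess that block reporting is where the gateway's numbers go wrong. GNU `du --apparent-size` does something similar and may be worth trying first. The script name `apparent_du.py` and the `apparent_size` function are hypothetical.

```python
import os
import stat
import sys


def apparent_size(root: str) -> int:
    """Sum st_size of regular files under `root` without following symlinks.

    Reports logical (apparent) bytes rather than allocated blocks.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):  # does not follow symlinked dirs
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # vanished or unreadable entry: skip it
            if stat.S_ISREG(st.st_mode):  # skip symlinks, sockets, etc.
                total += st.st_size
    return total


if __name__ == "__main__":
    # Usage: python apparent_du.py DIR [DIR ...]
    for root in sys.argv[1:]:
        print(f"{apparent_size(root)}\t{root}")
```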
  • 36. Economy File Project – NFS/CIFS Benefits for Users
    Researchers use a familiar access method
    • CIFS/NFS gateway gives them the same mountable fs access
    • No retraining needed; same practices work
    • Performance almost as good as existing legacy storage
    Cost savings for users
    • Big savings over NAS/SAN storage passed on to users
    • $3/TB/month instead of $40/TB/month chargeback (1/13th the cost)
    Plus lots of future potential
    • Simple to grow capacity for the continuing -omics data glut
    • HTTP RESTful APIs allow direct integration with lab automation
    • Can take on more uses/systems as other storage ages
    • Can take the CIFS/NFS training wheels off in the future
37
Economy File Project – Timeline
Rapid timeline?
• 2 years from first contact to go-live in April 2014
• Initial evaluation of SwiftStack took only 2 days
Initial targets for Swift storage
• Archive data, large datasets, BAM files from external sources, sequencer output (2x HiSeq 2500)
• 15-20 labs in the first rollout of Swift production storage
First deployed via a familiar access path
• All users initially on the SwiftStack CIFS/NFS gateway
- Provides a familiar mountable file system & AD integration
• Migration to other Swift clients later
- Web portal, HTTP and other APIs
- Much faster and more flexible
38
Economy File Project – Lessons Learned
• Communication/testing with users & ops is key
• Try to make life easier for ops staff
- Assistance from HPC or Architecture staff during the initial phase
- Incentives such as delegating drive replacements to data center staff
• Communicate ALL limitations/unexpected behavior to users prior to the first test
- And be ultra responsive during the test phase (even if the questions are not related to the system)
39
Economy File Project – Lessons Learned
Prioritize requirements
• Near-term success is critical
• But don't ignore the need for future potential/headroom
Confirm requirements (example: symlinks)
• "You really need them?" "Yes"
• "What for?" "Why are you asking? Symlinks are totally essential!"
• (discussion to resume at a later time)
40
Economy File Project – Lessons Learned
Giving up tape backup is a pretty big step
• Have to be certain that bugs do not affect data integrity
• Want some really big shops to use the same technology to feel comfortable
• Risk management to include your staff
- Staff errors must not lead to data loss
- Restrict access to data centers and management console
- Compliance best practices
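One way to build confidence in data integrity without tape is periodic fixity checking. For ordinary (non-segmented) objects, Swift returns an `ETag` header equal to the MD5 hex digest of the object content, so a local copy can be compared against it. The sketch below assumes the ETag has already been fetched (e.g. with a HEAD request) and does not apply to large segmented objects, whose manifest ETag is computed differently; the function names are hypothetical. On the write path, Swift also rejects a PUT whose supplied `ETag` does not match the uploaded data.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hypothetical helper: MD5 hex digest of a file, read in chunks to bound memory use."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_etag(path: str, etag: str) -> bool:
    """Compare a local file against a Swift ETag (assumes a non-segmented object).

    Some clients report the header value wrapped in double quotes, so strip them.
    """
    return md5_of_file(path) == etag.strip('"').lower()
```

A scheduled job could run this over a sample of archived files and alert on any mismatch.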
41
Economy File Project – Lessons Learned
Think hard before putting a POSIX gateway in front of an object store for long-term data/archives
• Most gateways put blobs into the object store, so you cannot use other clients or the cloud API to access the data
• Always dependent on the gateway appliance vendor
• Using a gateway appliance & Glacier is like a medieval marriage … forever
• The SwiftStack CIFS/NFS gateway has limitations but allows a more seamless migration to cloud storage, as you can continue to use other clients (convergence?!)
• POSIX file system gateways are still good for many other datasets
42
Economy File Project – What's Next with Swift(Stack)
Evaluate data protection alternatives
• Backup acceleration (e.g. Riverbed Whitewater)
• Endpoint backup backend (e.g. Druva)
• SQL/Exchange backup (e.g. CloudBerry Backup)
Other potential use cases
• Galaxy storage backend (via S3 compatibility layer)
• File sharing backend (e.g. ownCloud)
• High-performance POSIX file system backend: Avere
• Software-only POSIX file system backend: NuageLabs
• Cloud drive backend (e.g. CloudBerry Drive, ExpanDrive)
• High-capacity block storage gateway
Digging deeper
• Auditing object access using syslog and Splunk
• Do we need to have a copy outside the earthquake zone?
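Before indexing in Splunk, it helps to know which fields the Swift proxy access log exposes. The sketch below assumes the default proxy-logging field order (client IP, remote address, timestamp, method, path, protocol, status, ...) following the syslog `proxy-server: ` tag; verify the positions against your own logs before relying on them. It is an illustrative preprocessing step, not the Splunk integration itself, and the function names are hypothetical.

```python
import collections
import urllib.parse
from typing import Iterable, Optional, Tuple

MARKER = "proxy-server: "


def parse_proxy_line(line: str) -> Optional[Tuple[str, str, str, str, int]]:
    """Extract (method, account, container, object, status) from one access-log line.

    Assumes fields[3] = method, fields[4] = URL-quoted path, fields[6] = status.
    Returns None for lines that do not look like proxy access records.
    """
    idx = line.find(MARKER)
    if idx < 0:
        return None
    fields = line[idx + len(MARKER):].split()
    if len(fields) < 7 or not fields[6].isdigit():
        return None
    method, raw_path, status = fields[3], fields[4], int(fields[6])
    # Drop any query string, then undo the URL quoting used in the log.
    path = urllib.parse.unquote(urllib.parse.urlsplit(raw_path).path)
    parts = path.split("/", 4)  # ['', 'v1', account, container, object]
    if len(parts) < 3 or parts[1] != "v1":
        return None
    container = parts[3] if len(parts) > 3 else ""
    obj = parts[4] if len(parts) > 4 else ""
    return method, parts[2], container, obj, status


def count_by_container(lines: Iterable[str]) -> collections.Counter:
    """Count parsed requests per (account, container) pair."""
    counts = collections.Counter()
    for line in lines:
        rec = parse_proxy_line(line)
        if rec is not None:
            counts[(rec[1], rec[2])] += 1
    return counts
```

Per-container counts like these are one simple way to see which labs' data is actually being read once it has moved off the gateway.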