2012 Open Storage Summit keynote

Randy Bias, Co-Founder and CTO of Cloudscaling, speaks on open storage, fault tolerance and the concept of failure "blast radius" at the Open Storage Summit, hosted by Nexenta in May 2012.

Presentation Transcript

• Cloud Storage Futures (previously: Designing Private & Public Clouds)
  May 22nd, 2012 / Randy Bias, CTO & Co-founder / @randybias
  CCA-NoDerivs 3.0 Unported License: usage OK, no modifications, full attribution* (*all unlicensed or borrowed works retain their original licenses)
• Part 1: The Two Cloud Architectures
• A Story of Two Clouds: Scale-out and Enterprise
• ... Driven by Two App Types: New Elastic Apps and Existing Apps
• Cloud Computing ... Disrupts
  • Timeline: Mainframe Computing "Big Iron" (1960), Enterprise Computing "Client-Server" (1980), Cloud Computing "Web" (2000-2020)
  • Each transition is a disruption
• IT – Evolution of Computing Models (Mainframe "big-iron" / Enterprise "client/server" / Cloud "scale-out")
  • SLA: 99.999 / 99.9 / Always On
  • Scaling: Vertical / Horizontal / Horizontal
  • Hardware: Custom / Enterprise / Commodity
  • HA Type: Hardware / Software / Software
  • Software: Centralized / Decentralized / Distributed
  • Consumption: Centralized Service / Shared Service / Self-service
• Enterprise Computing (existing apps built in silos)
• Cloud Computing (new elastic apps)
• Scale-out apps require elastic infrastructure (diagram: traditional apps vs. elastic cloud-ready apps, across the APPS and INFRA layers)
• Scale-out Cloud Technology (same diagram: traditional apps vs. elastic cloud-ready apps, across the APPS and INFRA layers)
• Scale-out Principles
  • Small failure domains
  • Risk acceptance vs. risk mitigation
  • More boxes for throughput & redundancy
  • Assume the app manages complexity:
    • Data replication
    • Assumes infrastructure is unreliable:
      • Server & data redundancy
      • Geo-distribution
      • Auto-scaling
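A minimal sketch of the "app manages complexity" principle from the slide above, assuming a hypothetical cluster map and RPC helper (none of this is from the talk): the application itself copies each write into several failure domains and accepts the write once a majority of copies land.

```python
import random

# Hypothetical cluster map: servers grouped by failure domain (zone).
CLUSTER = {
    "zone-a": ["10.0.1.11", "10.0.1.12"],
    "zone-b": ["10.0.2.11", "10.0.2.12"],
    "zone-c": ["10.0.3.11", "10.0.3.12"],
}

def send_write(server, key, value):
    # Placeholder for a real RPC; pretend every write succeeds.
    return True

def put(key, value, replicas=3):
    """Write one copy into each of `replicas` distinct zones (failure domains)."""
    zones = random.sample(list(CLUSTER), replicas)
    targets = [random.choice(CLUSTER[z]) for z in zones]
    acks = sum(send_write(t, key, value) for t in targets)
    # The app accepts the write once a majority of copies land,
    # rather than trusting the infrastructure to be reliable.
    return acks > replicas // 2

print(put("user:42", b"profile-data"))  # True
```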
• What’s a failure domain?
  • “Blast radius” during a failure
  • What is impacted?
  • Public SAN failures:
    • FlexiScale SAN failure in 2007
    • UOL Brazil in 2011: http://goo.gl/8ct9n
    • There are many more
  • Enterprise HA ‘pairs’ typically support BIG failure domains
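To make "blast radius" concrete, the impact of a failure can be expressed as the fraction of the fleet sitting behind the failed component; the host counts below are illustrative, not figures from the talk.

```python
# Illustrative blast-radius arithmetic (hypothetical numbers).
# Blast radius = share of the service impacted by one component failure.

def blast_radius(hosts_behind_failed_component, total_hosts):
    return hosts_behind_failed_component / total_hosts

# One large HA SAN pair backing the whole fleet: everything is in the blast radius.
print(blast_radius(1000, 1000))   # 1.0 -> 100% impact

# Twenty small in-rack storage pods of 50 hosts each: losing one pod hurts far less.
print(blast_radius(50, 1000))     # 0.05 -> 5% impact
```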
• Two Diff Arches for Two Kinds of Apps
  • Elastic scale-out cloud: complexity lives in the app; primary scaling dimension is out
  • “Enterprise” cloud: complexity lives in the infrastructure; primary scaling dimension is up
• Part 2: Storage Architectures Are Changing
• Two Diff Storages for Two Kinds of Clouds
  • “Classic” Storage: uptime in infra; every part is redundant; data mgmt in infra; bigger SAN/NAS/DFS
  • “Scale-out” Storage: uptime in apps; minimal h/w redundancy; data mgmt in apps; smaller failure domains
• Difference in Tiers
  • Tier 1 ($$$$, Mission Critical): Classic = SAN then NAS, 10-15K RPM, SSD; Scale-out = on-demand SAN (EBS), DynamoDB (AWS), variable service levels
  • Tier 2 ($$, Important): Classic = NAS then SAN, 7.2K RPM; Scale-out = DAS, app / DFS to scale out
  • Tier 3 ($, Archive & Backups): Classic = tape, nearline 5.4K; Scale-out = object storage
• The Biggest Difference is in Where Data Management Resides
  • In scale-out systems, apps are managing the data:
    • Riak: scale-out distributed data store
    • Hadoop+HDFS: scale-out distributed computation system
    • Cassandra: scale-out distributed columnar database
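One consequence of apps managing the data in Riak- and Cassandra-style stores is that consistency also becomes an application choice; the usual rule of thumb is that a read quorum of R and a write quorum of W must overlap whenever R + W > N. A tiny check of that rule (parameter names assumed, not from the talk):

```python
# Quorum rule used by Riak/Cassandra-style stores: with N replicas,
# any write acknowledged by W nodes and any read that consults R nodes
# are guaranteed to intersect when R + W > N, so the read sees the write.

def quorum_overlap(n, r, w):
    """True when every read quorum must intersect every write quorum."""
    return r + w > n

print(quorum_overlap(n=3, r=2, w=2))  # True  -> read-your-writes consistency
print(quorum_overlap(n=3, r=1, w=1))  # False -> fast, but a read may miss the write
```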
• Cassandra / Netflix use case
  • 3x replication
  • Linearly scaling performance (50 - 300 nodes)
  • > 1M writes/second
  • When is this perfect?
    • Data size unknown
    • Growth unknown
    • Lots of elastic dynamism
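If write throughput really does scale linearly with node count, as the Netflix case suggests, capacity planning reduces to simple division; the per-node rate below is an assumed illustrative figure, not a number from the talk or the benchmark.

```python
# Linear scaling makes sizing a cluster simple arithmetic.
PER_NODE_WRITES_PER_SEC = 3_500   # assumed illustrative figure, not from the talk

def nodes_needed(target_writes_per_sec):
    # Ceiling division: round the node count up to cover the target rate.
    return -(-target_writes_per_sec // PER_NODE_WRITES_PER_SEC)

print(nodes_needed(1_000_000))  # 286 nodes at the assumed per-node rate
```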
• Cassandra / Netflix use case
  • DAS (‘ephemeral store’)
  • Per-node performance is constant: disk, CPU, network
  • Client write times constant
  • Nothing special here
• Cassandra / Netflix use case
  • On-demand & app-managed
  • Cost per GB/hr: $.006
  • Cost per GB/mo: $4.14
  • Includes: storage, DB, storage admin, network, network admin, etc.
• Part 3: Scale-out Storage ... Now & Future
• Only Change is Certain
• There are a few basic approaches being taken ...
• Scale-out SAN
  • In-rack SAN == faster, bigger DAS w/ better stat-muxing
  • Accept normal DAS failure rates
  • Assume app handles data replication
  • Like AWS ‘ephemeral storage’
  • KT architecture
  • Customers didn’t “get it”: “ephemeral SAN” not well understood
  • (Sidebar: dedicated storage SW, 9K jumbo frames, SSD caches (ZIL/L2ARC), no replication, max HW redundancy)
• AWS EBS - “Block-devices-as-a-Service”
  • Scale-out SAN (sort of)
  • Block scheduler
  • Async replication (some failure tolerance)
  • Scheduler: allocates customer block devices across many failure domains
  • Customers run RAID inside VMs to increase redundancy
  • (Diagram: API and cloud control system feeding the EBS scheduler, which places volumes on EBS clusters in different racks over the core network, with inter-rack and intra-rack cluster async replication)
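A toy version of the placement decision the slide attributes to the EBS scheduler, spreading a customer's block devices across failure domains; the cluster names and data structure are invented for illustration and are not AWS's implementation.

```python
# Invented data: storage clusters (failure domains) with current usage
# and the customers who already have volumes there.
clusters = {
    "rack-1": {"used_gb": 4000, "customers": {"cust-a"}},
    "rack-2": {"used_gb": 2500, "customers": set()},
    "rack-3": {"used_gb": 1000, "customers": {"cust-a"}},
}

def place_volume(customer):
    # Prefer clusters that do not already hold one of this customer's volumes,
    # so one cluster failure cannot take out all of the customer's block devices.
    candidates = [c for c, info in clusters.items() if customer not in info["customers"]]
    if not candidates:
        candidates = list(clusters)           # customer is everywhere; fall back
    target = min(candidates, key=lambda c: clusters[c]["used_gb"])
    clusters[target]["customers"].add(customer)
    return target

print(place_volume("cust-a"))  # "rack-2": the new volume lands in a fresh failure domain
```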
• DAS + Big Data (Storage + Compute + DFS)
  • Storage capability:
    • Replication
    • Disk & server failure handling
    • Data rebalancing
    • Data locality / rack awareness
    • Checksums (basic)
  • Also: built-in computation
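The rack awareness bullet is what keeps a whole-rack failure from destroying every replica; a simplified, HDFS-style placement policy (a sketch under assumed rack and node names, not actual Hadoop code) might look like this.

```python
import random

# Assumed rack map: racks of DAS-equipped servers.
RACKS = {
    "rack-a": ["a1", "a2", "a3"],
    "rack-b": ["b1", "b2", "b3"],
    "rack-c": ["c1", "c2", "c3"],
}

def place_block(writer_rack, replicas=3):
    """One replica on the writer's rack, the rest together on a different rack,
    so no single rack failure can destroy all copies of the block."""
    local = random.choice(RACKS[writer_rack])
    remote_rack = random.choice([r for r in RACKS if r != writer_rack])
    remotes = random.sample(RACKS[remote_rack], replicas - 1)
    return [local] + remotes

print(place_block("rack-a"))  # e.g. ['a2', 'b1', 'b3']
```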
• Distributed File Systems (DFS) over DAS
  • Storage capability:
    • Replication
    • Disk & server failure handling
    • Data rebalancing
    • Checksums (w/ btrfs)
    • Block devices
  • Also: no computation
  • (Shown: the Ceph architecture; looks familiar, doesn’t it?)
• Why is DFS at the Physical Layer Dangerous for Scale-out? (slide shows “DFS ==” followed by an image)
• DAS + Database Replication / Scaling
  • Storage capability:
    • Async/sync replication
    • Server failure handling
    • Checksums (sort of)
  • Also:
    • Standard RDBMS
    • SQL interface
    • Well understood
• Object Storage
  • Storage capability:
    • Replication
    • Disk & server failure handling
    • Data rebalancing
    • Checksums (sometimes)
  • Also:
    • Looks like a big web app
    • Uses a DHT/CHT to ‘index’ blobs
    • Very simple
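The "DHT/CHT index" bullet is typically realized as a consistent-hash ring: each blob name hashes to a point on the ring and is owned by the next node clockwise, so adding or removing a node only remaps a small slice of blobs. A compact sketch under those assumptions (illustrative only, not any particular object store's code):

```python
import bisect
import hashlib

def h(value):
    # Hash a string to an integer position on the ring.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes=64):
        # Each node gets many virtual points on the ring to smooth the distribution.
        self._ring = sorted((h(f"{node}-{i}"), node) for node in nodes for i in range(vnodes))
        self._hashes = [point for point, _ in self._ring]

    def owner(self, blob_name):
        # Walk clockwise: the first point at or after the blob's hash owns it.
        idx = bisect.bisect(self._hashes, h(blob_name)) % len(self._ring)
        return self._ring[idx][1]

ring = Ring(["store-1", "store-2", "store-3"])
print(ring.owner("photos/cat.jpg"))  # always maps the same blob to the same node
```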
• Where does OpenStorage Fit? (scale-out solution / tier / virtual or physical? / purpose & fit)
  • Scale-out SAN: Tier-1/2, physical, fit = in-rack SAN
  • EBS: Tier-1, physical, fit = EBS clusters (scale-out SAN)
  • DAS+BigData: Tier-2, virtual, fit = reliable, bit-rot resistant DAS
  • DAS+DFS: Tier-2, physical / virtual, fit = reliable, bit-rot resistant DAS (unproven)
  • DAS+DB: Tier-2, virtual, fit = in-VM reliable DAS
  • Object Storage: Tier-3, physical, fit = reliable, bit-rot resistant DAS
• Summarizing ZFS Value in Scale-out
  • Data integrity & bit rot are issues that few solve today
  • Most SAN/NAS solutions don’t ‘scale down’
  • Commodity x86 servers are winning
  • There are two scale-out places ZFS wins:
    • Small SAN clusters
    • Best DAS management
• Summary
• Conclusions / Speculations
  • Build the right cloud
  • Which means the right storage for *that* cloud
  • A single cloud might support both ...
  • Open storage can be used for both ...
    • ... WITH the appropriate design/forethought
• Q&A (@randybias)