
Above the Clouds: A View From Academia


The closing keynote by Armando Fox at the Eduserv Symposium 2011 - Virtualisation and the Cloud.

Published in: Technology, Education
  • You can watch a video of Armando Fox giving his presentation at the Eduserv Symposium 2011:
  • very good

  1. Above the Clouds: A View From Academia
     Armando Fox, UC Berkeley
     EDUSERV Symposium, 12 May 2011
     Presentation slides licensed under Creative Commons Attribution-ShareAlike 3.0 Unported License. Image: John Curley
  2. Who Am I?
       • Research: Internet-scale systems; productive parallel programming
       • Teaching: software engineering
       • Writing: co-author Above the Clouds tech report
       • Disclaimer 1: I don’t speak for UC
       • Disclaimer 2: Relationship with Amazon
  3. How We Got Into the Cloud: RAD Lab’s 5-year Mission
       • Enable 1 entrepreneur to prototype a great Web app over a 3-day weekend, then deploy at scale
       • Key technology: Statistical machine learning
       • Early critiques: “Demonstrate your ideas at scale!”
       • Moved from Sun Blackbox to EC2 in mid-2008
       • Feb. 2009: Above the Clouds tech report*
           • Over 50K downloads, influenced high-profile IT companies
     * Or CACM, April 2010
  4. Outline: Two Themes
       • Academic clouds: public or private?
           • Theme 1: save money or improve research?
           • Theme 2: cloud user or cloud provider?
       • Assumption: familiar with cloud basics
           • Public/pay-as-you-go
           • Private/closed/“condo”
       • Non-goal: regulatory thickets around cloudifying “sensitive” information
  5. Public Cloud: CS Research
       • Over $350,000 spent on AWS since 2008
           • PhD student ~ US$75k/year => cloud ~ 1/3 student/mo.
       • Experiments: 100-300 nodes common, 900 max
           • large-scale storage, cloud programming, MapReduce
           • results at scale now required for top-tier conferences
           • most experiments last 0-4 hours
           • “Small” experiments also in cloud for convenience
       • Comparison: Sun Blackbox at Berkeley
           • $200k acquire & install
           • $300k+ in hardware donations
           • staff: ≥0.5 FTE
  6. Public Cloud: CS Education
       • Great Ideas in Computer Architecture (reinvented Fall 2010): 190 students
       • Software Engineering for Software-as-a-Service: 70 students
       • Operating Systems: 70 students
       • Intro. Data Science: 30 students
       • Adv. topics in HCI: 20 students
       • Natural language processing: 20 students
       • Large-scale programming abstractions for the cloud: ~20 students (Fall 2011)
       • Administration, provisioning, sizing much easier on public cloud than on UC instructional computing
  7. Cloud Economics
       • “Private should be cheaper if you have stable utilization”
     [Figure: two plots of demand and capacity over time]
  8. $Private < $Public?
       • Capital: hardware, networking, power 5-7x cheaper at a scale of 100,000s of machines (Hamilton 2007)
       • Operations: heavy automation => 1000s of machines per FTE admin
       • R&D: cloud providers had to serve internal business need
       • Services: “Scale makes availability affordable”: wide-area disaster recovery facilities
       • Hidden/shared costs: power, cooling, staff, ....
  9. Hard to Compete on Cost
       • Zero-touch metering/billing infrastructure
       • Optimized for low margin
           • $0.08/hr: virtual CPU on EC2
           • $0.02-0.08/hr: as-available “spot instances”
           • Reserved (prepay 1-3 years, save if utilization > 25%)
           • $0.00/hr: 1 year free usage tier for all services
           • Private: ≥ $0.075/hr ($2000 private server amortized over 3 years with no indirect costs)
       • “Moving to EC2 would cost about a factor of 2”
           • Highly placed colleague at major social site
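The private-server figure on slide 9 ($2000 amortized over 3 years, no indirect costs) can be checked with simple arithmetic. The following is a minimal sketch, not part of the talk: the function name and the `utilization` parameter are illustrative assumptions, and a 365-day year with the server always powered on is assumed.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year, server runs continuously


def amortized_hourly_cost(purchase_price, years, utilization=1.0):
    """Hardware-only cost per useful hour; ignores power, cooling, and staff."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    useful_hours = HOURS_PER_YEAR * years * utilization
    return purchase_price / useful_hours


# The slide's example server, fully utilized and (hypothetically) half idle.
fully_used = amortized_hourly_cost(2000, 3)
half_used = amortized_hourly_cost(2000, 3, utilization=0.5)
print(f"100% utilized: ${fully_used:.3f}/hr")
print(f" 50% utilized: ${half_used:.3f}/hr")
```

Under these assumptions the fully utilized server comes out close to the slide's $0.075/hr lower bound, and any idle time raises the effective hourly cost proportionally.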
  10. Try to smooth out peaks?
       • Not waiting in queues accelerates research!
           • Run several experiments simultaneously, each using 100s of machines for 1-2 hours, without queueing up
           • Basic queueing theory: trade utilization vs service time
           • Better performance isolation than private cloud (!)
           • N.B. for long jobs, some queueing may be OK
       • Corollary 1: cost-associative billing encourages research spontaneity
       • Corollary 2: incentive to stop using is important!
       • Effective metering & billing is key to on-demand usage model
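The utilization-versus-wait trade-off mentioned on slide 10 can be illustrated with the textbook M/M/1 queue. The slide does not name a model, so M/M/1 (one server, random arrivals, exponential service times) is an assumption made here purely for illustration, and the function name is hypothetical. In that model the mean time a job waits before starting is S·ρ/(1−ρ), where S is the mean service time and ρ the utilization.

```python
def mm1_mean_wait(service_hours, utilization):
    """Mean queueing delay before a job starts, under an M/M/1 model."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_hours * utilization / (1 - utilization)


# A 2-hour experiment on a shared cluster at increasing utilization levels.
for rho in (0.5, 0.8, 0.9, 0.95):
    wait = mm1_mean_wait(2.0, rho)
    print(f"utilization {rho:.0%}: mean wait {wait:.1f} h")
```

In this simplified model, waiting time grows sharply as utilization approaches 100%, which is why a cluster run "hot" to save money tends to make short experiments wait far longer than they run.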
  11. Example: wait times on UC Berkeley “Mako” cluster
       • Mako has 272 dual-socket (quad-core per socket) nodes with 24 GB RAM each
       • Source: ShaRCS (Shared Research Computing Services), presentation by UC Office of the President at the UC Cloud Summit, April 2011
  12. On the other hand...Big Data
       • Challenge: Long-haul networking is most expensive cloud resource, and improving most slowly
       • Copy 8 TB to EC2 at ~20 Mbps*: ~35 days, ~$800
       • Ship four 2 TB drives to Amazon: 1 day, ~$150
       • Can private/shared networking resources be combined with public cloud to get best of both?
     Application                                Data generated per day
     DNA Sequencing (Illumina HiSeq machine)    1 TB
     Large Synoptic Survey Telescope            30 TB; 400 Mbps sustained data rate between Chile and NCSA
     Large Hadron Collider                      60 TB
     Source: Ed Lazowska, eScience 2010, Microsoft Cloud Futures Workshop
     * Simson L. Garfinkel, An Evaluation of Amazon’s Grid Computing Services: EC2, S3 and SQS, Technical Report TR-08-07, School of Engineering & Applied Sciences, Harvard University, 2008.
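The network-copy estimate on slide 12 follows from dividing data volume by sustained bandwidth. A minimal sketch, assuming decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits/s) and a perfectly sustained link with no protocol overhead; the function name is illustrative and not from the talk.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(terabytes, megabits_per_second):
    """Days needed to move a dataset over a link at a sustained rate."""
    if terabytes < 0 or megabits_per_second <= 0:
        raise ValueError("size must be >= 0 and rate must be > 0")
    bits = terabytes * 1e12 * 8                      # decimal terabytes to bits
    seconds = bits / (megabits_per_second * 1e6)     # Mbps to bits per second
    return seconds / SECONDS_PER_DAY


# The slide's case: 8 TB over a ~20 Mbps path to EC2.
print(f"8 TB at 20 Mbps: {transfer_days(8, 20):.0f} days")
```

With these assumptions the result lands within a couple of days of the slide's "~35 days", against one day for shipping physical drives.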
  13. On the other hand...Cloud Provider
       • Research that is hard to do on a public cloud:
           • scheduling/provisioning research
           • security: honeypots, malware containment, epidemic modeling
           • energy efficiency or other physical monitoring
           • experimenting with networking fabric, multicast, etc.
       • N.B., cloud provider research needs cloud users!
           • Example: Microsoft Research Silicon Valley “Sherwood” cluster (~240 nodes)
       • Demanding customers drove cloud research
  14. Nonprofit/Academic clouds
       • PlanetLab & Emulab
           • highly successful from their customers’ point of view
           • lots of great research, some of which might have been impossible on today’s public cloud
       • Academic/research clusters
           • Yahoo/IBM/M45 cluster, Google/IBM cluster, TeraGrid: primarily application-level research
           • OpenCirrus (HP/Intel/Yahoo/UIUC/IDA Singapore/Karlsruhe): bare-metal, federated, 1K+ cores/site
       • Access model: write proposal; closed community
       • Saving money is a non-goal (in fact, a subsidized investment by universities & industrial partners)
  15. OpenCirrus
       • Infrastructure costs increase with # sites
       • Claim: even at ~50% utilization, owning your infrastructure pays for itself in ~3 years
       • Source: R. Campbell et al., OpenCirrus..., Proc. 2011 Workshop on Hot Topics in Cloud Computing (HotCloud’09), June 2011 (to appear)
  16. Public & private clouds don’t see same benefits
     Benefit                                      Public    Private
     “Infinite” resources on-demand               Yes       No
     Instantaneous provisioning                   Yes       Varies
     Better hardware                              Yes       No
     Zero-commitment pay-as-you-go*               Yes       No
     Reduced costs from economy of scale          Yes       No
     Can do “cloud provider” research             No        Yes
     Can trust co-tenants                         No        Yes
     Better utilization through virtualization    Yes       Yes
     Quickly & inexpensively move big data        No        Yes
     Address data-custody regulatory issues       Varies    Yes
     * Implies ability to meter, and incentive to release idle resources
  17. So You Want to Build a Cloud...
       • Single point of failure?
       • Zero Touch?
       • Hidden costs?
  18. Single point of failure
       • 30+ hour EBS outage on 21 April 2011
           • triggered by human error (network config change)
       • Georedundant services (Netflix) largely unaffected
           • At least, georedundancy was an available option!
       • Non-redundant services had catastrophic outages
       • Question: would “more” operational expertise have resolved the outage faster?
  19. Metering and Billing
       • Billing is policy. Metering is mechanism.
           • Pay-as-you-go policy allows cost associativity
           • Any policy only as flexible as its mechanism
           • Amazon’s mechanism: “zero touch” metering
           • So, Virtual Private Cloud ≠ your private cloud
       • Which of these need human intervention:
           • Signing up? Provisioning? Deploying? Billing?
           • Academic/nonprofit clouds don’t even try this
  20. Hidden Costs
       • Does a single billing scheme capture all costs, or must some costs be billed/accounted separately?
           • shared expenses: power, networking
           • general employment benefits/overhead for staff
       • Cost of keeping up with innovation
           • On average, AWS has deployed 1 new service every 2 months since EC2 beta launch*
       • Competition from new providers will exacerbate this
           • Microsoft Azure, VMware CloudFoundry, ...
     * 21 Web service APIs as of April 2011
  21. Two themes
       • Academic clouds: public or private?
           • Theme 1: save money or improve research?
           • Theme 2: cloud user or cloud provider?
       • Capability
           • Cloud accelerates and enables new research
           • Scale that can’t be achieved any other way
       • Cost
           • Will private cloud cost less? Is that the main goal?
           • Have hidden costs been accounted for?
           • Cost-associativity allows bursty use, encourages spontaneity, but needs fine-grained metering
  22. Two themes
       • Academic clouds: public or private?
           • Theme 1: save money or improve research?
           • Theme 2: cloud user or cloud provider?
       • Cloud provider research may require private cloud
           • Security, energy, bare-metal, cloud provisioning, ...
           • But, still need cloud users (customers) to drive/validate
           • Need public-cloud-level APIs, service reliability
       • Cloud user
           • Big data may impede some public-cloud-ready apps
           • Exotic architectures (SSD, in-memory DB, ...)
           • Regulatory issues....
  23. Summary
       • Public cloud shows how to “move slider” between insourcing & outsourcing
       • Unlikely to compete on cost with very large scale public clouds
       • So, how much can/should you outsource...
           • ...for technical reasons (types of research possible)?
           • ...for regulatory reasons (data privacy, etc.)?
       • Remember the non-obvious costs
           • Metering & billing, esp. for shared overheads
           • Keeping up with the ecosystem
  24. Thanks!
       • UC Berkeley Reliable Adaptive Distributed Systems Lab & Affiliates
       • UC Cloud Computing Task Force
       • Andy Powell & Eduserv
     Image: RAD Lab Team in 2009