Cloud Computing
   Pete Perlegos
This is a Cloud Computing talk I gave at Xerox PARC.

Published in: Technology, Education

Pete Perlegos Cloud Computing

  1. Cloud Computing
     Pete Perlegos, pete@perlegos.com
     PARC, August 11, 2009
  2. Overview
     • Introduction
     • Need: Scale, Failure, Massive Data, Analysis
     • Component Based Approach
     • Virtualization
     • Datacenter Networking
     • Data Storage
     • Data Analysis
     • Cloud Interoperability
     • Seeding the Clouds
     • Cloud Economics
     • Distributed Languages
  3. Introduction
     • Cloud Computing premise
       - Internet delivery of software and services to distributed clients, from mobile devices to desktops.
       - We are now as dependent on these delivered services as we are on the telephone, water, and electrical utilities (e.g. Twitter).
     • Cloud infrastructure challenges
       - Microsoft, Amazon, Google, Yahoo, and others depend on an ever-expanding network of massive datacenters: hundreds of thousands of servers, many petabytes of data, hundreds of megawatts of power.
       - A per-CPU database licensing fee is not sustainable at this scale.
       - Need cloud management software for clusters of commodity PCs.
       - Enormous scale brings new design, deployment, and management challenges: energy efficiency, rapid deployment, resilience, geo-distribution, composability, and graceful recovery.
       - Cloud service infrastructure is an integrated system; we must optimize all aspects of hardware and software.
  4. Introduction
     • What is Cloud Computing?
       - Larry Ellison: "we've redefined Cloud Computing to include everything that we already do."
       - An overly broad definition risks diluting the field's innovation (Larry Ellison has a motivation to be dismissive).
     • Above the Clouds (UC Berkeley)
       - Cloud Computing refers to both the applications delivered as services over the Internet and the hardware and systems software in the datacenters that provide those services.
       - The services themselves have long been referred to as Software as a Service (SaaS).
       - The datacenter hardware and software is what we will call a Cloud.
  5. Motivation for Change
     • Scaling to millions of users
       - Datacenter infrastructure makes petascale systems seem small.
       - The CAP theorem requires a change in thinking.
     • Failures are common
       - Partial failures can result in complete unavailability of some data.
       - Hardware will fail: fail small.
       - A distributed UPS on every server board allows graceful shutdown and avoids concern over the failure of a single UPS.
       - Provide fault tolerance in software.
     • Massive amounts of data
       - Storage is effectively infinite: the current mentality is to store everything and attempt to extract some value.
     • Analyze to extract value
       - Massively parallel computation.
       - Complex analysis of large datasets requires new tools and languages to take advantage of parallelization.
  6. Component Based Approach
     • Build simple systems that do things well.
     • Layers of service built on top of each other: application layer, distributed storage, sub-services.

                     Google       Yahoo, FB     Microsoft
         Language    Sawzall      Pig, Hive     DryadLINQ
         Execution   MapReduce    Hadoop        Dryad
         Storage     BigTable     HBase         Cosmos, Azure
                     GFS          HDFS          SQLServer

     • Google advocates centralizing the control plane: minimizing the work of the control nodes can make the system sufficiently scalable.
  7. Virtualization
     • Virtualization software is the datacenter OS.
     • Strict separation between cloud clients is necessary to provide the illusion, and the security, of running each application in isolation.
     • Communication between VMs on the same machine should be treated like communication between machines.
     • Virtualization software should simply be CPU/machine-time/resource-sharing software, with a true wall between VMs.
     • A lightweight OS within a VM can manage the processes involved with an application.
  8. Virtualization
     • Communication among VMs running the same application should be the common case; this is necessary to achieve scalability.
  9. Datacenter Networking
     • The datacenter network is becoming the bottleneck.
     • Communication is increasingly intra-datacenter: 80% of packets stay inside the datacenter, and the trend is toward more internal communication.
     • The network core is under-provisioned due to cost and technical hurdles.
  10. Datacenter Networking
      • Flatten the network topology.
      • Use commodity-class routers to decrease cost and provide full bisection bandwidth in the datacenter.
        - Fat-Tree ("A Scalable, Commodity Data Center Network Architecture"), Monsoon (MSR).
        - Results in a non-trivial expansion of the interconnect.
        - Monsoon: the interconnection problem can be mitigated by using commodity 10 GigE routers.
        - Use encapsulation at the server, sending traffic to the top-of-rack switch, to keep the routing and forwarding tables at the switches simple.
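To make the fat-tree trade-off concrete, here is a small sketch (mine, not from the talk) of the standard capacity formulas for a fat-tree built from identical k-port switches, following the Fat-Tree paper cited above: k pods, each with k/2 edge and k/2 aggregation switches, plus (k/2)^2 core switches, supporting k^3/4 hosts at full bisection bandwidth.

```python
# Capacity of a k-port fat-tree (per "A Scalable, Commodity Data Center
# Network Architecture"). This is an illustrative calculation, not code
# from the talk.
def fat_tree(k):
    """Return host and switch counts for a fat-tree of k-port switches (k even)."""
    assert k % 2 == 0, "fat-tree requires an even port count"
    hosts = k ** 3 // 4          # k pods * (k/2 edge switches) * (k/2 hosts each)
    edge = agg = k * (k // 2)    # k pods, each with k/2 edge and k/2 aggregation switches
    core = (k // 2) ** 2
    return {"hosts": hosts, "edge": edge, "aggregation": agg, "core": core}

# Commodity 48-port GigE switches already reach tens of thousands of hosts:
print(fat_tree(48)["hosts"])  # 27648
```

This is why the slide calls the interconnect expansion "non-trivial": the host count grows as k^3 while the switch count grows roughly as k^2.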
  11. Data Storage
      • GFS: distributed file system
        - GFS has an interface that supports basic file commands: open, create, read, write, and close.
        - Google organized GFS into clusters of computers built from many inexpensive commodity components that often fail.
      • Master server: coordinator for the cluster
        - There is only one active master server per cluster at any time (though each cluster keeps multiple copies of the master in case of hardware failure). GFS avoids a bottleneck by minimizing the messages the master sends and receives.
      • Chunkserver: handles all file data
        - GFS files are usually in the multi-gigabyte range; GFS breaks files into manageable chunks of 64 megabytes (MB) each.
        - GFS copies every chunk multiple times and stores the copies on different chunkservers. Each copy is called a replica. The default is three replicas per chunk, but users can request more or fewer.
      • A reliable system built on unreliable components
        - Data is made reliably available to many clients through replication across many servers, each of which independently serves the data.
        - GFS components report status to the master through heartbeats/handshakes.
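The chunking and replication scheme above can be sketched in a few lines. This is my own toy illustration, not Google's code: it splits a file into 64 MB chunks and assigns each chunk three replicas on distinct chunkservers, using simple round-robin placement (the real system also weighs disk utilization and rack diversity).

```python
# Toy sketch of GFS-style chunk placement (illustrative, not Google's code).
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, as in GFS
REPLICAS = 3                   # GFS default replication factor

def place_chunks(file_size, chunkservers, replicas=REPLICAS):
    """Map each chunk index to the list of chunkservers holding a replica."""
    n_chunks = (file_size + CHUNK_SIZE - 1) // CHUNK_SIZE  # ceiling division
    placement = {}
    for i in range(n_chunks):
        # Round-robin placement over distinct servers, purely for illustration.
        placement[i] = [chunkservers[(i + r) % len(chunkservers)]
                        for r in range(replicas)]
    return placement

servers = ["cs0", "cs1", "cs2", "cs3", "cs4"]
layout = place_chunks(10 * 1024**3, servers)  # a 10 GB file
print(len(layout))   # 160 chunks
print(layout[0])     # ['cs0', 'cs1', 'cs2']
```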
  12. Data Storage
      • Bigtable
        - Bigtable is built on GFS and allows DB-like organization of GFS data.
        - Bigtable can scale across hundreds or thousands of commodity servers that collectively store petabytes of data.
        - Each table is a sparse, distributed, multidimensional, sorted map. A table consists of rows and columns, and each cell has a timestamp.
        - Bigtable splits tables at row boundaries and stores the pieces as tablets.
        - Each tablet is around 200 MB, and each server holds about 100 tablets.
        - Tablets from a single table can be spread among many machines, allowing fine-grained load balancing: a busy tablet can be moved to a less busy machine.
        - If a machine goes down, its tablets can be spread across many other machines so that the performance impact on any given machine is minimal.
      • Chubby lock service
        - Chubby is a lock service for distributed systems; among other things it is used to ensure that each user can only see their own files on Google's servers (the files are locked).
        - Based on the Paxos algorithm.
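The "sparse, sorted, multidimensional map" data model is easy to miss in prose, so here is a single-machine toy sketch of it (my own illustration; Bigtable's real API and storage engine are far richer): a map from (row, column, timestamp) to an uninterpreted value, kept in sorted order, with reads returning the newest version of a cell.

```python
# Toy sketch of Bigtable's data model (illustrative only):
# a sparse, sorted map from (row, column, timestamp) -> value.
import bisect

class ToyBigtable:
    def __init__(self):
        self._keys = []    # sorted list of (row, column, -timestamp)
        self._cells = {}   # (row, column, -timestamp) -> value

    def put(self, row, column, timestamp, value):
        key = (row, column, -timestamp)  # negate so the newest version sorts first
        if key not in self._cells:
            bisect.insort(self._keys, key)
        self._cells[key] = value

    def get(self, row, column):
        """Return the most recent value for a cell, or None if the cell is empty."""
        i = bisect.bisect_left(self._keys, (row, column, float("-inf")))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._cells[self._keys[i]]
        return None

t = ToyBigtable()
t.put("com.cnn.www", "contents", 1, "<html>v1")
t.put("com.cnn.www", "contents", 2, "<html>v2")
print(t.get("com.cnn.www", "contents"))  # <html>v2
```

Because the map is sorted by row key, contiguous row ranges can be split off as tablets, which is exactly the row-boundary splitting the slide describes.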
  13. Data Storage
      • eBay
        - Separation of application and datastore.
        - The application layer is stateless and can scale up in minutes.
        - Datastore: scaling up is on the order of hours; periodically seeding the cloud would decrease this time.
      • Amazon Dynamo
        - Amazon could not obtain sufficiently scalable systems from vendors.
        - Separation of application and datastore through an interface.
        - Uses Chord-style consistent hashing with full interconnection: any node in the system can be issued a put or get request for any key.
        - Dynamo is an eventually consistent storage system: if one node updates object A, the change must propagate to the other machines.
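The Chord-derived placement scheme can be sketched briefly. This is my own simplified illustration, not Amazon's code (Dynamo additionally uses virtual nodes and vector clocks): keys and nodes hash onto a ring, and a key's replicas live on the next N distinct nodes clockwise, so any node can compute where a key belongs and coordinate a put or get.

```python
# Sketch of the consistent-hashing ring Dynamo borrows from Chord
# (simplified illustration; real Dynamo adds virtual nodes, vector clocks, etc.).
import bisect
import hashlib

def ring_hash(s):
    """Hash a string to a point on the ring."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        self._points = sorted((ring_hash(n), n) for n in nodes)

    def preference_list(self, key):
        """The distinct nodes responsible for key: the next `replicas` clockwise."""
        i = bisect.bisect(self._points, (ring_hash(key), ""))
        nodes = []
        while len(nodes) < min(self.replicas, len(self._points)):
            _, node = self._points[i % len(self._points)]  # wrap around the ring
            if node not in nodes:
                nodes.append(node)
            i += 1
        return nodes

ring = Ring(["node-a", "node-b", "node-c", "node-d"])
# Any node can coordinate the request; the data lands on this preference list:
print(ring.preference_list("user:1234"))
```

Adding or removing a node only remaps the keys adjacent to it on the ring, which is what makes this scheme incrementally scalable.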
  14. Data Storage
      • Yahoo PNUTS
        - A database focused on ACID can be tuned for performance underneath, but such a database cannot scale to these workloads.
        - CAP: meet performance goals first. Scale, availability, and latency, then move toward consistency.
        - Spectrum: Consistency (ACID) at one end, Performance (CAP) at the other.
      • WheelFS
        - WheelFS is a wide-area distributed storage system intended to help multi-site applications share data and gain fault tolerance.
        - Risk of cascading failures. Greater isolation? Distributed storage within each datacenter, with updates pushed out to the other datacenters.
        - Applications have different needs from the file system.
        - Use cues from the application to the file system: placement, durability, consistency, and large reads.
  15. Data Analytics
      • MapReduce
        - Problem: large datasets that can only be processed effectively on large clusters of PCs.
        - The work must be broken into parts that can be processed in a reasonable time, and the results must then be reassembled.
        - Solution: MapReduce lets developers express the simple computations they are trying to perform while hiding the details of parallelization, fault tolerance, data distribution, and load balancing.
        - Code reduction: one phase of a computation dropped from approximately 3800 lines of C++ to approximately 700 lines when expressed using MapReduce.
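The programming model itself fits in a few lines. Below is the canonical word-count example as a single-machine sketch (mine, for illustration): the developer writes only the map and reduce functions, while the framework owns the shuffle/sort between them, which in the real system runs in parallel across the cluster with fault tolerance.

```python
# Single-machine sketch of the MapReduce model (the framework parts that
# hide parallelization are simulated here by a sort and a groupby).
from itertools import groupby
from operator import itemgetter

def map_fn(doc):
    for word in doc.split():
        yield (word, 1)            # emit an intermediate (key, value) pair

def reduce_fn(word, counts):
    yield (word, sum(counts))      # combine all values for one key

def mapreduce(docs, map_fn, reduce_fn):
    # "Shuffle/sort": group all intermediate pairs by key.
    intermediate = sorted(kv for doc in docs for kv in map_fn(doc))
    out = []
    for key, group in groupby(intermediate, key=itemgetter(0)):
        out.extend(reduce_fn(key, (v for _, v in group)))
    return dict(out)

print(mapreduce(["the cloud", "the datacenter"], map_fn, reduce_fn))
# {'cloud': 1, 'datacenter': 1, 'the': 2}
```

The code-reduction claim on the slide follows from this split: the 3100 lines that disappeared were largely the parallelization, distribution, and failure handling that the framework now provides.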
  16. Data Analytics
      • Hadoop
        - Open-source MapReduce and HBase (Bigtable).
        - Localizes computation to the data.
        - MapReduce has no pipelining: a job must wait for the previous job to finish.
      • Facebook
        - Hive: analyzing relationships between users and their response to ads, attempting to create more relevant ads.
      • More complex languages: Pig, DryadLINQ, Sawzall
        - Allow the developer to express more complex queries.
        - A great deal of processing occurs on a combination of large and small datasets.
        - Pig: functional, object-oriented programming style that allows reuse of parallel-processing functions.
        - Pig vs. Hadoop MapReduce: typically 1/20 the lines of code and 1/16 the development time, at a roughly 1.5x performance hit.
  17. Cloud Interoperability
      • Cloud: commodity or proprietary?
        - Cloud providers differ in their interfaces and the level of access they expose, so application code becomes dependent on a particular vendor.
        - Amazon is low level. CPU and storage: use the components you need.
        - Google and Salesforce are higher level: write code to their API and they scale it for you.
        - Lower-level access gives greater control and allows development of more powerful third-party management tools.
      • Industry moves toward interoperability
        - AppDrop: allows Google App Engine applications to run in Amazon EC2.
        - Eucalyptus: Amazon EC2 as an open-source industry standard.
        - Salesforce allows calls to other clouds.
      • Data inertia
        - The problem with storing everything is moving everything.
        - Ship the data, or move the data slowly over time.
  18. Seeding the Clouds
      • Facebook: scaling out
        - Migrated data over a few weeks via a low-latency fiber link.
        - All writes happen in the CA datacenter.
        - Updates for a given person come from across the country.
        - The interconnection graph is so dense that it cannot be effectively partitioned.
      • Cloud seeding
        - High-speed inter-datacenter links with modified transfer protocols (Aspera).
        - Transfer to storage in the datacenter and ship it.
        - Populate the new servers.
  19. Cloud Economics
      • External economics
        - Pay per use: flat rate or a bidding system.
        - AWS: prices depend on the computation and storage used.
        - There is an argument for a bidding system to mitigate supply-and-demand issues between cloud providers and customers; some customers may instead want the cost guarantees of a flat per-use rate.
      • Internal cloud for base use, external cloud for overflow
        - Uncorrelated applications can allow for oversubscription.
        - Exceptions: shopping sites at Christmas, news events.
  20. Cloud Economics
      • Internal economics
        - Monitoring benchmark: analyzes which resources would improve performance (CPU, memory, storage) (Andy, Matei, Berkeley RAD Lab).
        - Internal resource bidding system for component resources.
        - Predictability model over bid, QoS, and guarantee probability: give inputs for two factors and get a result for the third, based on present knowledge.
        - Inflation in the price of a resource indicates under-provisioning (Kevin Lai, HP Labs).
  21. Distributed Languages
      • Current languages
        - Erlang: CouchDB, Facebook, Twitter.
          "I think the problem with Erlang is that it's more than 20 years old and for most of this time haven't been exposed enough to developer community at large. It's like raising a child in a cellar for all its childhood and don't let it interact and learn from his/her peers." - Damien Katz
        - Ruby on Rails: separates the mapping of data from the underlying storage layer; different scalable storage will work better for different tasks.
      • Present and future needs
        - High-level distributed languages: lightweight processes, no shared memory, asynchronous message passing, and mechanisms to change code on the fly so that programs can evolve and change as they run in non-stop systems.
        - Ruby is an order of magnitude slower than Java; the interpreter does a lot at runtime.
        - Tools to analyze bottlenecks and optimize only the bottleneck; Python has some such tools.
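The Erlang-style model the slide asks for (lightweight processes, no shared memory, asynchronous message passing) can be sketched in Python. This is my own illustration of the concept, not Erlang itself: each "process" owns a private mailbox and communicates only by sending messages, never by touching another process's state.

```python
# Sketch (illustrative) of actor-style concurrency: isolated processes
# that share nothing and communicate only via asynchronous messages.
# Here each "process" is a thread with a private mailbox queue.
import queue
import threading

class Process:
    def __init__(self, behavior):
        self.mailbox = queue.Queue()   # the only way in: messages
        self._thread = threading.Thread(target=behavior, args=(self,))
        self._thread.start()

    def send(self, msg):
        """Asynchronous: the sender never blocks on the receiver."""
        self.mailbox.put(msg)

def counter(self):
    total = 0                          # private state; no other process can see it
    while True:
        msg = self.mailbox.get()       # receive
        if msg == "stop":
            print("counted", total)
            return
        total += msg

p = Process(counter)
for n in [1, 2, 3]:
    p.send(n)
p.send("stop")
p._thread.join()                       # prints: counted 6
```

Because no state is shared, a crashed process cannot corrupt its peers, which is what makes this model attractive for the non-stop systems the slide mentions.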
  22. Conclusion
      • Cloud Computing is an evolving field.
      • Formulating the problems, and ways to attack them, is important at this time.
      • General solutions are rare: IP, relational databases, Ethernet.
      • Depending on the nature of an event or failure, it may make more economic sense to focus on performance.
        - Performance: every user sees it, every time.
        - Consistency failure: happens less often; say sorry.
  24. Contacts
      http://www.perlegos.com
      http://infinitepolygon.blogspot.com/
      http://www.linkedin.com/in/peteperlegos
