UC Berkeley Cloud Computing: Past, Present, and Future  Professor Anthony D. Joseph*, UC Berkeley, Reliable Adaptive Distributed Systems Lab  RWTH Aachen, 22 March 2010  http://abovetheclouds.cs.berkeley.edu/  *Director, Intel Research Berkeley
RAD Lab 5-year Mission Enable 1 person to develop, deploy, operate next-generation Internet application Key enabling technology: Statistical machine learning debugging, monitoring, power mgmt, auto-configuration, perf prediction, ... Highly interdisciplinary faculty & students PIs: Patterson/Fox/Katz (systems/networks), Jordan (machine learning), Stoica (networks & P2P), Joseph (security), Shenker (networks), Franklin (DB) 2 postdocs, ~30 PhD students, ~6 undergrads Grad/Undergrad teaching integrated with research
Course Timeline Friday 10:00-12:00 History of Cloud Computing: Time-sharing, virtual machines, datacenter architectures, utility computing 12:00-13:30 Lunch 13:30-15:00 Modern Cloud Computing: economics, elasticity, failures 15:00-15:30 Break 15:30-17:00 Cloud Computing Infrastructure: networking, storage, computation models Monday 10:00-12:00 Cloud Computing research topics: scheduling, multiple datacenters, testbeds
Nexus: A common substrate for cluster computing Joint work with Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi,  Scott Shenker, and Ion Stoica
Recall: Hadoop on HDFS (figure): a job submission node runs the jobtracker, the namenode runs the namenode daemon, and each slave node runs a tasktracker and a datanode daemon over the local Linux file system. Adapted from slides by Jimmy Lin, Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed Computing Seminar, 2007 (licensed under Creative Commons Attribution 3.0 License)
Problem Rapid innovation in cluster computing frameworks No single framework optimal for all applications Energy efficiency means maximizing cluster utilization Want to run multiple frameworks in a single cluster
What do we want to run in the cluster? Pregel Apache Hama Dryad Pig
Why share the cluster between frameworks? Better utilization and efficiency (e.g., take advantage of diurnal patterns) Better data sharing across frameworks and applications
Solution Nexus is an “operating system” for the cluster over which diverse frameworks can run Nexus multiplexes resources between frameworks Frameworks control job execution
Goals Scalable Robust (i.e., simple enough to harden) Flexible enough for a variety of different cluster frameworks Extensible enough to encourage innovative future frameworks
Question 1: Granularity of Sharing Option: Coarse-grained sharing Give framework a (slice of) machine for its entire duration Data locality compromised if machine held for long time Hard to account for new frameworks and changing demands -> hurts utilization and interactivity (Figure: Hadoop 1, Hadoop 2, and Hadoop 3 each statically holding their own machines)
Question 1: Granularity of Sharing Nexus: Fine-grained sharing Support frameworks that use smaller tasks (in time and space) by multiplexing them across all available resources Frameworks can take turns accessing data on each node Can resize frameworks' shares to get utilization & interactivity (Figure: tasks from Hadoop 1, Hadoop 2, and Hadoop 3 interleaved across all nodes of the cluster)
Question 2: Resource Allocation Option: Global scheduler Frameworks express needs in a specification language, a global scheduler matches resources to frameworks Requires encoding a framework’s semantics using the language, which is complex and can lead to ambiguities Restricts frameworks if specification is unanticipated Designing a general-purpose global scheduler is hard
Question 2: Resource Allocation Nexus: Resource offers Offer free resources to frameworks, let frameworks pick which resources best suit their needs
Distributed decisions might not be optimal
Nexus Architecture
Overview (figure): Hadoop v19, Hadoop v20, and MPI jobs are submitted to their framework schedulers; the schedulers register with the Nexus master; each Nexus slave runs the frameworks' executors (Hadoop v19, Hadoop v20, MPI), which launch tasks.
Resource Offers (figure): the Nexus master picks a framework to offer to and sends it a resource offer; here MPI executors and tasks already run on the slaves while the Hadoop scheduler waits for an offer.
Resource Offers (figure): offer = list of {machine, free_resources}. Example: [ {node 1, <2 CPUs, 4 GB>}, {node 2, <2 CPUs, 4 GB>} ]. The Nexus master picks a framework to offer to and sends it the resource offer.
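A minimal sketch of how an offer of this shape could be represented and consumed, in Python; the field names and the helper are illustrative assumptions, not the actual Nexus API.

```python
# Illustrative only: a resource offer as a list of per-machine free resources,
# mirroring the example above (field names are assumptions, not the Nexus API).
offer = [
    {"machine": "node 1", "free": {"cpus": 2, "mem_gb": 4}},
    {"machine": "node 2", "free": {"cpus": 2, "mem_gb": 4}},
]

def machines_that_fit(offer, task_cpus, task_mem_gb):
    """Return the offered machines with enough free CPU and memory for one task."""
    return [o["machine"] for o in offer
            if o["free"]["cpus"] >= task_cpus and o["free"]["mem_gb"] >= task_mem_gb]

print(machines_that_fit(offer, task_cpus=1, task_mem_gb=2))  # ['node 1', 'node 2']
```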
Resource Offers (figure): the framework applies its own framework-specific scheduling to the offer, and the Nexus slaves launch & isolate the resulting executors (here a Hadoop executor starting alongside the MPI executors).
Resource Offer Details Min and max task sizes to control fragmentation Filters let framework restrict offers sent to it By machine list By quantity of resources Timeouts can be added to filters Frameworks can signal when to destroy filters, or when they want more offers
Using Offers for Data Locality We found that a simple policy called delay scheduling can give very high locality: Framework waits for offers on nodes that have its data If waited longer than a certain delay, starts launching non-local tasks
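A rough sketch of that policy, assuming a single fixed delay and a known set of data-local nodes; names and the delay value are illustrative, not the actual Nexus or Hadoop code.

```python
import time

LOCALITY_DELAY_S = 5.0  # assumed wait before giving up on locality

class DelayScheduler:
    def __init__(self, data_local_nodes):
        self.local = set(data_local_nodes)    # nodes holding this framework's data
        self.waiting_since = time.time()      # when we last launched (or began waiting)

    def should_launch(self, offered_node):
        if offered_node in self.local:
            self.waiting_since = time.time()  # local slot: launch and reset the clock
            return True
        # accept a non-local slot only after waiting longer than the delay
        return time.time() - self.waiting_since > LOCALITY_DELAY_S
```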
Framework Isolation Isolation mechanism is pluggable due to the inherent performance/isolation tradeoff Current implementation supports Solaris projects and Linux containers  Both isolate CPU, memory and network bandwidth Linux developers working on disk IO isolation Other options: VMs, Solaris zones, policing
Resource Allocation
Allocation Policies Nexus picks framework to offer resources to, and hence controls how many resources each framework can get (but not which) Allocation policies are pluggable to suit organization needs, through allocation modules
Example: Hierarchical Fairshare Policy (figure): the cluster share policy is a tree; Facebook.com's 100% is split 80%/20% between two organizations (Ads and Spam), each organization's share is further split between its users (e.g., 70%/30%), and users divide their shares among Jobs 1-4, yielding job-level shares such as 14% and 6% that change over time.
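A minimal sketch of how such a hierarchical policy could pick whom to offer to: walk the share tree and, at each level, descend into the child whose usage is furthest below its weighted share. The tree shape and weights below are illustrative assumptions, not the actual allocation module.

```python
class Node:
    def __init__(self, name, weight, children=None):
        self.name, self.weight = name, weight
        self.children = children or []
        self.usage = 0.0                 # fraction of the cluster used by this subtree

def next_to_offer(node):
    if not node.children:
        return node                      # a leaf is a job/framework to offer resources to
    neediest = min(node.children, key=lambda c: c.usage / c.weight)
    return next_to_offer(neediest)

# Assumed example tree, loosely following the slide
share_tree = Node("Facebook.com", 1.0, [
    Node("Ads",  0.8, [Node("User 1", 0.7), Node("User 2", 0.3)]),
    Node("Spam", 0.2, [Node("Job 1", 1.0), Node("Job 2", 1.0)]),
])
print(next_to_offer(share_tree).name)
```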
Revocation Killing tasks to make room for other users Not the normal case because fine-grained tasks enable quick reallocation of resources  Sometimes necessary: Long running tasks never relinquishing resources Buggy job running forever Greedy user who decides to make his tasks long
Revocation Mechanism Allocation policy defines a safe share for each user Users will get at least safe share within specified time Revoke only if a user is below its safe share and is interested in offers Revoke tasks from users farthest above their safe share Framework warned before its task is killed
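A sketch of that victim-selection rule under assumed per-user bookkeeping (usage, safe share, interest in offers); this illustrates the rule as stated, not the Nexus implementation.

```python
def pick_revocation_victim(users):
    """users: list of dicts with 'name', 'usage', 'safe_share', 'wants_offers'."""
    starved = [u for u in users if u["usage"] < u["safe_share"] and u["wants_offers"]]
    if not starved:
        return None           # nobody below their safe share wants offers: no revocation
    over = [u for u in users if u["usage"] > u["safe_share"]]
    if not over:
        return None
    # revoke from the user farthest above their safe share;
    # the owning framework is warned before any task is killed
    return max(over, key=lambda u: u["usage"] - u["safe_share"])
```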
How Do We Run MPI? Users always told their safe share Avoid revocation by staying below it Giving each user a small safe share may not be enough if jobs need many machines  Can run a traditional grid or HPC scheduler as a user with a larger safe share of the cluster, and have MPI jobs queue up on it E.g. Torque gets 40% of cluster
Example: Torque on Nexus (figure): Torque runs as a user of the Facebook.com cluster with a safe share of 40% and queues MPI jobs on it; the remaining share is split between the Ads and Spam organizations and their users' jobs.
Multi-Resource Fairness
What is Fair? Goal: define a fair allocation of resources in the cluster between multiple users Example: suppose we have:  30 CPUs and 30 GB RAM Two users with equal shares User 1 needs <1 CPU, 1 GB RAM> per task User 2 needs <1 CPU, 3 GB RAM> per task What is a fair allocation?
Definition 1: Asset Fairness Idea: give weights to resources (e.g. 1 CPU = 1 GB) and equalize value of resources given to each user Algorithm: when resources are free, offer to whoever has the least value Result: U1: 12 tasks: 12 CPUs, 12 GB ($24) U2: 6 tasks: 6 CPUs, 18 GB ($24) PROBLEM: User 1 has < 50% of both CPUs and RAM (Chart: each user's share of CPU and RAM)
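A quick check of the arithmetic above, under the stated weighting of 1 CPU = 1 GB (one "dollar" each):

```python
u1_tasks, u2_tasks = 12, 6
u1_value = u1_tasks * (1 + 1)    # <1 CPU, 1 GB> per task -> $24
u2_value = u2_tasks * (1 + 3)    # <1 CPU, 3 GB> per task -> $24
assert u1_value == u2_value == 24
# Yet User 1 holds only 12/30 = 40% of the CPUs and 12/30 = 40% of the RAM,
# less than the 50% a private half-size cluster would give.
```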
Lessons from Definition 1 “You shouldn’t do worse than if you ran a smaller, private cluster equal in size to your share” Thus, given N users, each user should get ≥ 1/N of his dominating resource (i.e., the resource that he consumes most of)
Def. 2: Dominant Resource Fairness Idea: give every user an equal share of her dominant resource (i.e., the resource she consumes most of)  Algorithm: when resources are free, offer to the user with the smallest dominant share (i.e., fractional share of her dominant resource) Result: U1: 15 tasks: 15 CPUs, 15 GB U2: 5 tasks: 5 CPUs, 15 GB (Chart: each user's share of CPU and RAM)
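A short sketch of the DRF rule on the slide's example (30 CPUs, 30 GB; <1 CPU, 1 GB> vs. <1 CPU, 3 GB> tasks): repeatedly hand the next task to the user with the smallest dominant share. This illustrates the stated algorithm, not the Nexus code.

```python
capacity = {"cpu": 30.0, "mem": 30.0}
demand   = {"U1": {"cpu": 1, "mem": 1}, "U2": {"cpu": 1, "mem": 3}}
alloc    = {u: {"cpu": 0.0, "mem": 0.0} for u in demand}
used     = {"cpu": 0.0, "mem": 0.0}

def dominant_share(u):
    return max(alloc[u][r] / capacity[r] for r in capacity)

def fits(u):
    return all(used[r] + demand[u][r] <= capacity[r] for r in capacity)

while any(fits(u) for u in demand):
    u = min((u for u in demand if fits(u)), key=dominant_share)  # smallest dominant share goes next
    for r in capacity:
        alloc[u][r] += demand[u][r]
        used[r] += demand[u][r]

print(alloc)  # ends at U1: 15 CPUs, 15 GB and U2: 5 CPUs, 15 GB, matching the slide
```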
Fairness Properties
Implementation
Implementation Stats 7000 lines of C++ APIs in C, C++, Java, Python, Ruby Executor isolation using Linux containers and Solaris projects
Frameworks Ported frameworks: Hadoop (900-line patch) MPI (160-line wrapper scripts) New frameworks: Spark, Scala framework for iterative jobs (1300 lines) Apache+haproxy, elastic web server farm (200 lines)
Results
Overhead Less than 4% seen in practice
Dynamic Resource Sharing
Multiple Hadoops Experiment Hadoop 1 Hadoop 2 Hadoop 3
Multiple Hadoops Experiment (figure): under fine-grained sharing, tasks from Hadoop 1, Hadoop 2, and Hadoop 3 are interleaved across the nodes of the cluster.
Results with 16 Hadoops
Web Server Farm Framework
Web Framework Experiment (figure): httperf generates HTTP requests against an elastic web server farm; an haproxy-based scheduler computes the load and, through Nexus resource offers and status updates, grows or shrinks the set of Apache web-server tasks, while a load-generation framework runs its own executors and tasks on the same Nexus slaves.
Web Framework Results
Future Work Experiment with parallel programming models Further explore low-latency services on Nexus (web applications, etc) Shared services (e.g. BigTable, GFS) Deploy to users and open source
Cloud Computing Testbeds
Open Cirrus™: Seizing the Open Source Cloud Stack OpportunityA joint initiative sponsored by HP, Intel, and Yahoo! http://opencirrus.org/
Proprietary Cloud Computing stacks (Applications are the publicly accessible layer)
GOOGLE: Applications; Application Frameworks: MapReduce, Sawzall, Google App Engine, Protocol Buffers; Software Infrastructure: VM Management, Job Scheduling (Borg), Storage Management (GFS, BigTable), Monitoring (Borg); Hardware Infrastructure (Borg)
AMAZON: Applications; Application Frameworks: EMR – Hadoop; Software Infrastructure: VM Management (EC2), Job Scheduling, Storage Management (S3, EBS), Monitoring; Hardware Infrastructure
MICROSOFT: Applications; Application Frameworks: .NET Services; Software Infrastructure: VM Management (Fabric Controller), Job Scheduling (Fabric Controller), Storage Management (SQL Services, blobs, tables, queues), Monitoring (Fabric Controller); Hardware Infrastructure (Fabric Controller)
Open Cloud Computing stacks: heavily fragmented today!
Applications
Application Frameworks: Pig, Hadoop, MPI, Sprout, Mahout
Software Infrastructure: VM Management (Eucalyptus, Enomalism, Tashi, Reservoir, Nimbus, oVirt), Job Scheduling (Maui/Torque), Storage Management (HDFS, KFS, Gluster, Lustre, PVFS, MooseFS, HBase, Hypertable), Monitoring (Ganglia, Nagios, Zenoss, MON, Moara)
Hardware Infrastructure: PRS, Emulab, Cobbler, xCat
Open Cirrus™ Cloud Computing Testbed Shared: research, applications, infrastructure (12K cores), data sets Global services: sign on, monitoring, store Open source stack (PRS, Tashi, Hadoop) Sponsored by HP, Intel, and Yahoo! (with additional support from NSF)
Open Cirrus Organization  Central Management Office, oversees Open Cirrus Currently owned by HP  Governance model Research team  Technical team  New site additions  Support (legal (export, privacy), IT, etc.)  Each site   Runs its own research and technical teams  Contributes individual technologies  Operates some of the global services  E.g.  HP site supports portal and PRS Intel site developing and supporting Tashi Yahoo! contributes to Hadoop
Intel BigData Open Cirrus Site (figure: rack and network layout) http://opencirrus.intel-research.net — blade racks of 40 nodes, 1U/2U racks of 15 nodes, a mobile rack of 8 1U nodes, and a 3U rack of 5 storage nodes with 12 1TB disks each; node configurations range from single-core Xeon (Irwindale) with 6GB DRAM and ~366GB disk up to dual quad-core Xeon E5520 (Nehalem-EP) with 16GB DRAM and 6 1TB disks; racks connect through 24-48 Gb/s switches with 1 Gb/s links, a 45 Mb/s T3 link to the Internet, and PDUs with per-port power monitoring and control.
Open Cirrus Sites (table of per-site resources; totals across sites: 1,029 / 4 PB / 12,074 / 1,746 / 26.3 TB)
Testbed Comparison
Open Cirrus Stack Compute + network +  storage resources  Management and  control subsystem Power + cooling  Physical Resource set (Zoni) service Credit: John Wilkes (HP)
Open Cirrus Stack Research Tashi NFS storage  service HDFS storage service PRS clients, each with their own "physical data center" Zoni service
Open Cirrus Stack Research Tashi NFS storage  service HDFS storage service Virtual cluster Virtual cluster Virtual clusters (e.g., Tashi) Zoni service
Open Cirrus Stack Research Tashi NFS storage  service HDFS storage service Virtual cluster Virtual cluster Application running On Hadoop On Tashi virtual cluster On a PRS On real hardware BigData App Hadoop Zoni service
Open Cirrus Stack Research Tashi NFS storage  service HDFS storage service Virtual cluster Virtual cluster Experiment/ save/restore BigData app Hadoop Zoni service
Open Cirrus Stack Research Tashi NFS storage  service HDFS storage service Virtual cluster Virtual cluster Experiment/ save/restore BigData App Hadoop Platform services Zoni service
Open Cirrus Stack Research Tashi NFS storage  service HDFS storage service Virtual cluster Virtual cluster User services Experiment/ save/restore BigData App Hadoop Platform services Zoni service
Open Cirrus Stack Research Tashi NFS storage  service HDFS storage service Virtual cluster Virtual cluster User services Experiment/ save/restore BigData App Hadoop Platform services Zoni
System Organization (figure): compute nodes are divided into dynamically-allocated, VLAN-isolated PRS subdomains; apps switch back and forth between virtual and physical; subdomains host open service research, apps running in a VM management infrastructure (e.g., Tashi), Tashi development, a production storage service, proprietary service research, and open workload monitoring and trace collection.
Open Cirrus stack - Zoni Zoni service goals Provide mini-datacenters to researchers Isolate experiments from each other Stable base for other research Zoni service approach Allocate sets of physical co-located nodes, isolated inside VLANs. Zoni code from HP being merged into Tashi Apache project and extended by Intel Running on HP site Being ported to Intel site Will eventually run on all sites
Open Cirrus Stack - Tashi  An open source Apache Software Foundation project sponsored by Intel (with CMU, Yahoo, HP) Infrastructure for cloud computing on Big Data   http://incubator.apache.org/projects/tashi  Research focus:  Location-aware co-scheduling of VMs, storage, and power. Seamless physical/virtual migration.   Joint with Greg Ganger (CMU), Mor Harchol-Balter (CMU), Milan Milenkovic (CTG)
Tashi High-Level Design (figure): services are instantiated through virtual machines; most decisions happen in the scheduler, which manages compute, storage, and power in concert; data location and power information is exposed to the scheduler and services; the storage service aggregates the capacity of the commodity nodes to house Big Data repositories; cluster nodes are assumed to be commodity machines; the Cluster Manager maintains databases and routes messages, with limited decision logic.
Location Matters (calculated)
Open Cirrus Stack - Hadoop   An open-source Apache Software Foundation project sponsored by Yahoo!  http://wiki.apache.org/hadoop/ProjectDescription  Provides a parallel programming model (MapReduce), a distributed file system (HDFS), and a parallel database (HBase)
What kinds of research projects are Open Cirrus sites looking for?  Open Cirrus is seeking research in the following areas (different centers will weight these differently): Datacenter federation Datacenter management Web services Data-intensive applications and systems  The following kinds of projects are generally not of interest: Traditional HPC application development Production applications that just need lots of cycles Closed source system development
How do users get access to Open Cirrus sites? Project PIs apply to each site separately.  Contact names, email addresses, and web links for applications to each site will be available on the Open Cirrus Web site (which goes live Q2 09) http://opencirrus.org Each Open Cirrus site decides which users and projects get access to its site. Developing a global sign on for all sites (Q2 09) Users will be able to login to each Open Cirrus site for which they are authorized using the same login and password.
Summary and Lessons  Intel is collaborating with HP and Yahoo! to provide a cloud computing testbed for the research community Using the cloud as an accelerator for interactive streaming/big data apps is an important usage model  Primary goals are to  Foster new systems research around cloud computing Catalyze open-source reference stack and APIs for the cloud Access model, Local and global services, Application frameworks Explore location-aware and power-aware workload scheduling Develop integrated physical/virtual allocations to combat cluster squatting Design cloud storage models GFS-style storage systems not mature, impact of SSDs unknown Investigate new application framework alternatives to map-reduce/Hadoop
Other Cloud Computing Research Topics: Isolation and DC Energy
Heterogeneity in Virtualized Environments VM technology isolates CPU and memory, but disk and network are shared Full bandwidth when no contention Equal shares when there is contention Up to a 2.5x performance difference measured across EC2 small instances
Isolation Research Need predictable variance over raw performance Some resources that people have run into problems with:  Power, disk space, disk I/O rate (drive, bus), memory space (user/kernel), memory bus, cache at all levels (TLB, etc), hyperthreading/etc, CPU rate, interrupts Network: NIC (Rx/Tx), Switch, cross-datacenter, cross-country OS resources: File descriptors, ports, sockets
Datacenter Energy EPA, 8/2007: 1.5% of total U.S. energy consumption Growing from 60 to 100 Billion kWh in 5 yrs 48% of typical IT budget spent on energy 75 MW new DC deployments in PG&E’s service area – that they know about! (expect another 2x) Microsoft: $500m new Chicago facility Three substations with a capacity of 198MW  200+ shipping containers w/ 2,000 servers each Overall  growth of 20,000/month
Power/Cooling Issues
First Milestone: DC Energy Conservation DCs limited by power For each dollar spent on servers, add $0.48 (2005)/$0.71 (2010) for power/cooling $26B spent to power and cool servers in 2005 grows to $45B in 2010 Within DC racks, network equipment often the “hottest” components in the hot spot
Thermal Image of Typical Cluster Rack Rack Switch M. K. Patterson, A. Pratt, P. Kumar, “From UPS to Silicon: an end-to-end evaluation of datacenter efficiency”, Intel Corporation
DC Networking and Power Selectively power down ports/portions of net elements Enhanced power-awareness in the network stack Power-aware routing and support for system virtualization Support for datacenter “slice” power down and restart Application and power-aware media access/control Dynamic selection of full/half duplex Directional asymmetry to save power, e.g., 10Gb/s send, 100Mb/s receive  Power-awareness in applications and protocols Hard state (proxying), soft state (caching), protocol/data “streamlining” for power as well as b/w reduction Power implications for topology design Tradeoffs in redundancy/high-availability vs. power consumption VLANs support for power-aware system virtualization
Summary Many areas for research into Cloud Computing! Datacenter design, languages, scheduling, isolation, energy efficiency (at all levels) Opportunities to try out research at scale! Amazon EC2, Open Cirrus, …
Thank you! adj@eecs.berkeley.edu http://abovetheclouds.cs.berkeley.edu/
Cloud Computing

More Related Content

What's hot

An Efficient Cloud based Approach for Service Crawling
An Efficient Cloud based Approach for Service CrawlingAn Efficient Cloud based Approach for Service Crawling
An Efficient Cloud based Approach for Service CrawlingIDES Editor
 
FIWARE Tech Summit - FIWARE Cygnus and STH-Comet
FIWARE Tech Summit - FIWARE Cygnus and STH-CometFIWARE Tech Summit - FIWARE Cygnus and STH-Comet
FIWARE Tech Summit - FIWARE Cygnus and STH-CometFIWARE
 
Are cloud based virtual labs cost effective? (CSEDU 2012)
Are cloud based virtual labs cost effective? (CSEDU 2012)Are cloud based virtual labs cost effective? (CSEDU 2012)
Are cloud based virtual labs cost effective? (CSEDU 2012)Nane Kratzke
 
Tensorflow 101 @ Machine Learning Innovation Summit SF June 6, 2017
Tensorflow 101 @ Machine Learning Innovation Summit SF June 6, 2017Tensorflow 101 @ Machine Learning Innovation Summit SF June 6, 2017
Tensorflow 101 @ Machine Learning Innovation Summit SF June 6, 2017Ashish Bansal
 
Load balancing in public cloud combining the concepts of data mining and netw...
Load balancing in public cloud combining the concepts of data mining and netw...Load balancing in public cloud combining the concepts of data mining and netw...
Load balancing in public cloud combining the concepts of data mining and netw...eSAT Publishing House
 
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...IJSRD
 
A Scaleable Implementation of Deep Learning on Spark -Alexander Ulanov
A Scaleable Implementation of Deep Learning on Spark -Alexander UlanovA Scaleable Implementation of Deep Learning on Spark -Alexander Ulanov
A Scaleable Implementation of Deep Learning on Spark -Alexander UlanovSpark Summit
 
F233842
F233842F233842
F233842irjes
 
Application of selective algorithm for effective resource provisioning in clo...
Application of selective algorithm for effective resource provisioning in clo...Application of selective algorithm for effective resource provisioning in clo...
Application of selective algorithm for effective resource provisioning in clo...ijccsa
 
Simulation of Heterogeneous Cloud Infrastructures
Simulation of Heterogeneous Cloud InfrastructuresSimulation of Heterogeneous Cloud Infrastructures
Simulation of Heterogeneous Cloud InfrastructuresCloudLightning
 
Task Scheduling in Grid Computing.
Task Scheduling in Grid Computing.Task Scheduling in Grid Computing.
Task Scheduling in Grid Computing.Ramandeep Kaur
 
Intro to TensorFlow and PyTorch Workshop at Tubular Labs
Intro to TensorFlow and PyTorch Workshop at Tubular LabsIntro to TensorFlow and PyTorch Workshop at Tubular Labs
Intro to TensorFlow and PyTorch Workshop at Tubular LabsKendall
 
Task scheduling Survey in Cloud Computing
Task scheduling Survey in Cloud ComputingTask scheduling Survey in Cloud Computing
Task scheduling Survey in Cloud ComputingRamandeep Kaur
 
Genetic Algorithm for task scheduling in Cloud Computing Environment
Genetic Algorithm for task scheduling in Cloud Computing EnvironmentGenetic Algorithm for task scheduling in Cloud Computing Environment
Genetic Algorithm for task scheduling in Cloud Computing EnvironmentSwapnil Shahade
 
Data Replication In Cloud Computing
Data Replication In Cloud ComputingData Replication In Cloud Computing
Data Replication In Cloud ComputingRahul Garg
 
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves MabialaDeep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves MabialaSpark Summit
 
QoS-Aware Data Replication for Data-Intensive Applications in Cloud Computing...
QoS-Aware Data Replication for Data-Intensive Applications in Cloud Computing...QoS-Aware Data Replication for Data-Intensive Applications in Cloud Computing...
QoS-Aware Data Replication for Data-Intensive Applications in Cloud Computing...Papitha Velumani
 

What's hot (20)

An Efficient Cloud based Approach for Service Crawling
An Efficient Cloud based Approach for Service CrawlingAn Efficient Cloud based Approach for Service Crawling
An Efficient Cloud based Approach for Service Crawling
 
FIWARE Tech Summit - FIWARE Cygnus and STH-Comet
FIWARE Tech Summit - FIWARE Cygnus and STH-CometFIWARE Tech Summit - FIWARE Cygnus and STH-Comet
FIWARE Tech Summit - FIWARE Cygnus and STH-Comet
 
Cloud Computing and PSo
Cloud Computing and PSoCloud Computing and PSo
Cloud Computing and PSo
 
Resource management
Resource managementResource management
Resource management
 
TensorFlow
TensorFlowTensorFlow
TensorFlow
 
Are cloud based virtual labs cost effective? (CSEDU 2012)
Are cloud based virtual labs cost effective? (CSEDU 2012)Are cloud based virtual labs cost effective? (CSEDU 2012)
Are cloud based virtual labs cost effective? (CSEDU 2012)
 
Tensorflow 101 @ Machine Learning Innovation Summit SF June 6, 2017
Tensorflow 101 @ Machine Learning Innovation Summit SF June 6, 2017Tensorflow 101 @ Machine Learning Innovation Summit SF June 6, 2017
Tensorflow 101 @ Machine Learning Innovation Summit SF June 6, 2017
 
Load balancing in public cloud combining the concepts of data mining and netw...
Load balancing in public cloud combining the concepts of data mining and netw...Load balancing in public cloud combining the concepts of data mining and netw...
Load balancing in public cloud combining the concepts of data mining and netw...
 
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...
 
A Scaleable Implementation of Deep Learning on Spark -Alexander Ulanov
A Scaleable Implementation of Deep Learning on Spark -Alexander UlanovA Scaleable Implementation of Deep Learning on Spark -Alexander Ulanov
A Scaleable Implementation of Deep Learning on Spark -Alexander Ulanov
 
F233842
F233842F233842
F233842
 
Application of selective algorithm for effective resource provisioning in clo...
Application of selective algorithm for effective resource provisioning in clo...Application of selective algorithm for effective resource provisioning in clo...
Application of selective algorithm for effective resource provisioning in clo...
 
Simulation of Heterogeneous Cloud Infrastructures
Simulation of Heterogeneous Cloud InfrastructuresSimulation of Heterogeneous Cloud Infrastructures
Simulation of Heterogeneous Cloud Infrastructures
 
Task Scheduling in Grid Computing.
Task Scheduling in Grid Computing.Task Scheduling in Grid Computing.
Task Scheduling in Grid Computing.
 
Intro to TensorFlow and PyTorch Workshop at Tubular Labs
Intro to TensorFlow and PyTorch Workshop at Tubular LabsIntro to TensorFlow and PyTorch Workshop at Tubular Labs
Intro to TensorFlow and PyTorch Workshop at Tubular Labs
 
Task scheduling Survey in Cloud Computing
Task scheduling Survey in Cloud ComputingTask scheduling Survey in Cloud Computing
Task scheduling Survey in Cloud Computing
 
Genetic Algorithm for task scheduling in Cloud Computing Environment
Genetic Algorithm for task scheduling in Cloud Computing EnvironmentGenetic Algorithm for task scheduling in Cloud Computing Environment
Genetic Algorithm for task scheduling in Cloud Computing Environment
 
Data Replication In Cloud Computing
Data Replication In Cloud ComputingData Replication In Cloud Computing
Data Replication In Cloud Computing
 
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves MabialaDeep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
 
QoS-Aware Data Replication for Data-Intensive Applications in Cloud Computing...
QoS-Aware Data Replication for Data-Intensive Applications in Cloud Computing...QoS-Aware Data Replication for Data-Intensive Applications in Cloud Computing...
QoS-Aware Data Replication for Data-Intensive Applications in Cloud Computing...
 

Viewers also liked

Rzevsky agent models of large systems
Rzevsky  agent models of large systemsRzevsky  agent models of large systems
Rzevsky agent models of large systemsMasha Rudnichenko
 
A Manifesto for 21st-Century IT
A Manifesto for 21st-Century ITA Manifesto for 21st-Century IT
A Manifesto for 21st-Century ITJeff Sussna
 
Dynamic resource allocation using virtual machines for cloud computing enviro...
Dynamic resource allocation using virtual machines for cloud computing enviro...Dynamic resource allocation using virtual machines for cloud computing enviro...
Dynamic resource allocation using virtual machines for cloud computing enviro...IEEEFINALYEARPROJECTS
 
"A 30min Introduction to Agent-Based Modelling" for GORS
"A 30min Introduction to Agent-Based Modelling" for GORS"A 30min Introduction to Agent-Based Modelling" for GORS
"A 30min Introduction to Agent-Based Modelling" for GORSBruce Edmonds
 
Chapter 6 complexity science and complex adaptive systems
Chapter 6 complexity science and complex adaptive systemsChapter 6 complexity science and complex adaptive systems
Chapter 6 complexity science and complex adaptive systemsstanbridge
 
A Survey on Resource Allocation & Monitoring in Cloud Computing
A Survey on Resource Allocation & Monitoring in Cloud ComputingA Survey on Resource Allocation & Monitoring in Cloud Computing
A Survey on Resource Allocation & Monitoring in Cloud ComputingMohd Hairey
 
Dynamic Resource Allocation Using Virtual Machines for Cloud Computing Enviro...
Dynamic Resource Allocation Using Virtual Machines for Cloud Computing Enviro...Dynamic Resource Allocation Using Virtual Machines for Cloud Computing Enviro...
Dynamic Resource Allocation Using Virtual Machines for Cloud Computing Enviro...SaikiranReddy Sama
 

Viewers also liked (7)

Rzevsky agent models of large systems
Rzevsky  agent models of large systemsRzevsky  agent models of large systems
Rzevsky agent models of large systems
 
A Manifesto for 21st-Century IT
A Manifesto for 21st-Century ITA Manifesto for 21st-Century IT
A Manifesto for 21st-Century IT
 
Dynamic resource allocation using virtual machines for cloud computing enviro...
Dynamic resource allocation using virtual machines for cloud computing enviro...Dynamic resource allocation using virtual machines for cloud computing enviro...
Dynamic resource allocation using virtual machines for cloud computing enviro...
 
"A 30min Introduction to Agent-Based Modelling" for GORS
"A 30min Introduction to Agent-Based Modelling" for GORS"A 30min Introduction to Agent-Based Modelling" for GORS
"A 30min Introduction to Agent-Based Modelling" for GORS
 
Chapter 6 complexity science and complex adaptive systems
Chapter 6 complexity science and complex adaptive systemsChapter 6 complexity science and complex adaptive systems
Chapter 6 complexity science and complex adaptive systems
 
A Survey on Resource Allocation & Monitoring in Cloud Computing
A Survey on Resource Allocation & Monitoring in Cloud ComputingA Survey on Resource Allocation & Monitoring in Cloud Computing
A Survey on Resource Allocation & Monitoring in Cloud Computing
 
Dynamic Resource Allocation Using Virtual Machines for Cloud Computing Enviro...
Dynamic Resource Allocation Using Virtual Machines for Cloud Computing Enviro...Dynamic Resource Allocation Using Virtual Machines for Cloud Computing Enviro...
Dynamic Resource Allocation Using Virtual Machines for Cloud Computing Enviro...
 

Similar to Cloud Computing

Schedulers optimization to handle multiple jobs in hadoop cluster
Schedulers optimization to handle multiple jobs in hadoop clusterSchedulers optimization to handle multiple jobs in hadoop cluster
Schedulers optimization to handle multiple jobs in hadoop clusterShivraj Raj
 
2015 04 bio it world
2015 04 bio it world2015 04 bio it world
2015 04 bio it worldChris Dwan
 
Performance evaluation of larger matrices over cluster of four nodes using mpi
Performance evaluation of larger matrices over cluster of four nodes using mpiPerformance evaluation of larger matrices over cluster of four nodes using mpi
Performance evaluation of larger matrices over cluster of four nodes using mpieSAT Journals
 
My Other Computer is a Data Center: The Sector Perspective on Big Data
My Other Computer is a Data Center: The Sector Perspective on Big DataMy Other Computer is a Data Center: The Sector Perspective on Big Data
My Other Computer is a Data Center: The Sector Perspective on Big DataRobert Grossman
 
BISSA: Empowering Web gadget Communication with Tuple Spaces
BISSA: Empowering Web gadget Communication with Tuple SpacesBISSA: Empowering Web gadget Communication with Tuple Spaces
BISSA: Empowering Web gadget Communication with Tuple SpacesSrinath Perera
 
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...OpenStack
 
Scaling Streaming - Concepts, Research, Goals
Scaling Streaming - Concepts, Research, GoalsScaling Streaming - Concepts, Research, Goals
Scaling Streaming - Concepts, Research, Goalskamaelian
 
Bridging Concepts and Practice in eScience via Simulation-driven Engineering
Bridging Concepts and Practice in eScience via Simulation-driven EngineeringBridging Concepts and Practice in eScience via Simulation-driven Engineering
Bridging Concepts and Practice in eScience via Simulation-driven EngineeringRafael Ferreira da Silva
 
CS8603_Notes_003-1_edubuzz360.pdf
CS8603_Notes_003-1_edubuzz360.pdfCS8603_Notes_003-1_edubuzz360.pdf
CS8603_Notes_003-1_edubuzz360.pdfKishaKiddo
 
PeerToPeerComputing (1)
PeerToPeerComputing (1)PeerToPeerComputing (1)
PeerToPeerComputing (1)MurtazaB
 
Hadoop for Scientific Workloads__HadoopSummit2010
Hadoop for Scientific Workloads__HadoopSummit2010Hadoop for Scientific Workloads__HadoopSummit2010
Hadoop for Scientific Workloads__HadoopSummit2010Yahoo Developer Network
 
Towards SLA-based Scheduling on YARN Clusters
Towards SLA-based Scheduling on YARN ClustersTowards SLA-based Scheduling on YARN Clusters
Towards SLA-based Scheduling on YARN ClustersDataWorks Summit
 
CrawlerLD - Distributed crawler for linked data
CrawlerLD - Distributed crawler for linked dataCrawlerLD - Distributed crawler for linked data
CrawlerLD - Distributed crawler for linked dataRaphael do Vale
 
Designing High-Performance and Scalable Middleware for HPC, AI and Data Science
Designing High-Performance and Scalable Middleware for HPC, AI and Data ScienceDesigning High-Performance and Scalable Middleware for HPC, AI and Data Science
Designing High-Performance and Scalable Middleware for HPC, AI and Data ScienceObject Automation
 
Distributed DNN training: Infrastructure, challenges, and lessons learned
Distributed DNN training: Infrastructure, challenges, and lessons learnedDistributed DNN training: Infrastructure, challenges, and lessons learned
Distributed DNN training: Infrastructure, challenges, and lessons learnedWee Hyong Tok
 
Fundamental question and answer in cloud computing quiz by animesh chaturvedi
Fundamental question and answer in cloud computing quiz by animesh chaturvediFundamental question and answer in cloud computing quiz by animesh chaturvedi
Fundamental question and answer in cloud computing quiz by animesh chaturvediAnimesh Chaturvedi
 
Operating Systems R20 Unit 2.pptx
Operating Systems R20 Unit 2.pptxOperating Systems R20 Unit 2.pptx
Operating Systems R20 Unit 2.pptxPrudhvi668506
 

Similar to Cloud Computing (20)

Schedulers optimization to handle multiple jobs in hadoop cluster
Schedulers optimization to handle multiple jobs in hadoop clusterSchedulers optimization to handle multiple jobs in hadoop cluster
Schedulers optimization to handle multiple jobs in hadoop cluster
 
2015 04 bio it world
2015 04 bio it world2015 04 bio it world
2015 04 bio it world
 
Cloud computing
Cloud computingCloud computing
Cloud computing
 
Performance evaluation of larger matrices over cluster of four nodes using mpi
Performance evaluation of larger matrices over cluster of four nodes using mpiPerformance evaluation of larger matrices over cluster of four nodes using mpi
Performance evaluation of larger matrices over cluster of four nodes using mpi
 
My Other Computer is a Data Center: The Sector Perspective on Big Data
My Other Computer is a Data Center: The Sector Perspective on Big DataMy Other Computer is a Data Center: The Sector Perspective on Big Data
My Other Computer is a Data Center: The Sector Perspective on Big Data
 
BISSA: Empowering Web gadget Communication with Tuple Spaces
BISSA: Empowering Web gadget Communication with Tuple SpacesBISSA: Empowering Web gadget Communication with Tuple Spaces
BISSA: Empowering Web gadget Communication with Tuple Spaces
 
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
 
Scaling Streaming - Concepts, Research, Goals
Scaling Streaming - Concepts, Research, GoalsScaling Streaming - Concepts, Research, Goals
Scaling Streaming - Concepts, Research, Goals
 
Bridging Concepts and Practice in eScience via Simulation-driven Engineering
Bridging Concepts and Practice in eScience via Simulation-driven EngineeringBridging Concepts and Practice in eScience via Simulation-driven Engineering
Bridging Concepts and Practice in eScience via Simulation-driven Engineering
 
Grid Computing
Grid ComputingGrid Computing
Grid Computing
 
CS8603_Notes_003-1_edubuzz360.pdf
CS8603_Notes_003-1_edubuzz360.pdfCS8603_Notes_003-1_edubuzz360.pdf
CS8603_Notes_003-1_edubuzz360.pdf
 
PeerToPeerComputing (1)
PeerToPeerComputing (1)PeerToPeerComputing (1)
PeerToPeerComputing (1)
 
Hadoop for Scientific Workloads__HadoopSummit2010
Hadoop for Scientific Workloads__HadoopSummit2010Hadoop for Scientific Workloads__HadoopSummit2010
Hadoop for Scientific Workloads__HadoopSummit2010
 
Towards SLA-based Scheduling on YARN Clusters
Towards SLA-based Scheduling on YARN ClustersTowards SLA-based Scheduling on YARN Clusters
Towards SLA-based Scheduling on YARN Clusters
 
CrawlerLD - Distributed crawler for linked data
CrawlerLD - Distributed crawler for linked dataCrawlerLD - Distributed crawler for linked data
CrawlerLD - Distributed crawler for linked data
 
Designing High-Performance and Scalable Middleware for HPC, AI and Data Science
Designing High-Performance and Scalable Middleware for HPC, AI and Data ScienceDesigning High-Performance and Scalable Middleware for HPC, AI and Data Science
Designing High-Performance and Scalable Middleware for HPC, AI and Data Science
 
Distributed DNN training: Infrastructure, challenges, and lessons learned
Distributed DNN training: Infrastructure, challenges, and lessons learnedDistributed DNN training: Infrastructure, challenges, and lessons learned
Distributed DNN training: Infrastructure, challenges, and lessons learned
 
Fundamental question and answer in cloud computing quiz by animesh chaturvedi
Fundamental question and answer in cloud computing quiz by animesh chaturvediFundamental question and answer in cloud computing quiz by animesh chaturvedi
Fundamental question and answer in cloud computing quiz by animesh chaturvedi
 
Painless Cache Allocation in Cloud
Painless Cache Allocation in CloudPainless Cache Allocation in Cloud
Painless Cache Allocation in Cloud
 
Operating Systems R20 Unit 2.pptx
Operating Systems R20 Unit 2.pptxOperating Systems R20 Unit 2.pptx
Operating Systems R20 Unit 2.pptx
 

More from butest

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEbutest
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jacksonbutest
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer IIbutest
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazzbutest
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.docbutest
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1butest
 
Facebook
Facebook Facebook
Facebook butest
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...butest
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...butest
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTbutest
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docbutest
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docbutest
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.docbutest
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!butest
 

More from butest (20)

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
 
PPT
PPTPPT
PPT
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
 
Facebook
Facebook Facebook
Facebook
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
 
hier
hierhier
hier
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
 

Cloud Computing

  • 1. 1 UC Berkeley Cloud Computing: Past, Present, and Future Professor Anthony D. Joseph*, UC BerkeleyReliable Adaptive Distributed Systems Lab RWTH Aachen 22 March 2010 http://abovetheclouds.cs.berkeley.edu/ *Director, Intel Research Berkeley
  • 2. RAD Lab 5-year Mission Enable 1 person to develop, deploy, operate next -generation Internet application Key enabling technology: Statistical machine learning debugging, monitoring, pwr mgmt, auto-configuration, perfprediction, ... Highly interdisciplinary faculty & students PI’s: Patterson/Fox/Katz (systems/networks), Jordan (machine learning), Stoica (networks & P2P), Joseph (security), Shenker (networks), Franklin (DB) 2 postdocs, ~30 PhD students, ~6 undergrads Grad/Undergrad teaching integrated with research
  • 3. Course Timeline Friday 10:00-12:00 History of Cloud Computing: Time-sharing, virtual machines, datacenter architectures, utility computing 12:00-13:30 Lunch 13:30-15:00 Modern Cloud Computing: economics, elasticity, failures 15:00-15:30 Break 15:30-17:00 Cloud Computing Infrastructure: networking, storage, computation models Monday 10:00-12:00 Cloud Computing research topics: scheduling, multiple datacenters, testbeds
  • 4. Nexus: A common substrate for cluster computing Joint work with Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Scott Shenker, and Ion Stoica
  • 5. Recall: Hadoop on HDFS namenode job submission node namenode daemon jobtracker tasktracker tasktracker tasktracker datanode daemon datanode daemon datanode daemon Linux file system Linux file system Linux file system … … … slave node slave node slave node Adapted from slides by Jimmy Lin, Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed Computing Seminar, 2007 (licensed under Creation Commons Attribution 3.0 License)
  • 6. Problem Rapid innovation in cluster computing frameworks No single framework optimal for all applications Energy efficiency means maximizing cluster utilization Want to run multiple frameworks in a single cluster
  • 7. What do we want to run in the cluster? Pregel Apache Hama Dryad Pig
  • 8. Why share the cluster between frameworks? Better utilization and efficiency (e.g., take advantage of diurnal patterns) Better data sharing across frameworks and applications
  • 9. Solution Nexus is an “operating system” for the cluster over which diverse frameworks can run Nexus multiplexes resources between frameworks Frameworks control job execution
  • 10. Goals Scalable Robust (i.e., simple enough to harden) Flexible enough for a variety of different cluster frameworks Extensible enough to encourage innovative future frameworks
  • 11. Question 1: Granularity of Sharing Option: Coarse-grained sharing Give framework a (slice of) machine for its entire duration Data locality compromised if machine held for long time Hard to account for new frameworks and changing demands -> hurts utilization and interactivity Hadoop 1 Hadoop 2 Hadoop 3
  • 12. Question 1: Granularity of Sharing Nexus: Fine-grained sharing Support frameworks that use smaller tasks (in time and space) by multiplexing them across all available resources Hadoop 3 Hadoop 3 Hadoop 2 Hadoop 1 Frameworks can take turns accessing data on each node Can resize frameworks shares to get utilization & interactivity Hadoop 1 Hadoop 2 Hadoop 2 Hadoop 1 Hadoop 3 Hadoop 1 Hadoop 2 Hadoop 3 Hadoop 3 Hadoop 2 Hadoop 2 Hadoop 3 Hadoop 2 Hadoop 1 Hadoop 3 Hadoop 1 Hadoop 2
  • 13. Question 2: Resource Allocation Option: Global scheduler Frameworks express needs in a specification language, a global scheduler matches resources to frameworks Requires encoding a framework’s semantics using the language, which is complex and can lead to ambiguities Restricts frameworks if specification is unanticipated Designing a general-purpose global scheduler is hard
  • 14.
  • 15.
  • 17. Hadoop job Hadoop job MPI job Hadoop v20 scheduler Hadoop v19 scheduler MPI scheduler Nexus master Nexus slave Nexus slave Nexus slave MPI executor MPI executor Hadoop v19 executor Hadoop v20 executor Hadoop v19 executor task task task task Overview task
  • 18. Hadoop job Hadoop scheduler Nexus master Nexus slave Nexus slave MPI executor task Resource Offers MPI job MPI scheduler Pick framework to offer to Resourceoffer MPI executor task
  • 19. Resource Offers MPI job Hadoop job MPI scheduler Hadoop scheduler offer = list of {machine, free_resources} Example: [ {node 1, <2 CPUs, 4 GB>}, {node 2, <2 CPUs, 4 GB>} ] Pick framework to offer to Resource offer Nexus master Nexus slave Nexus slave MPI executor MPI executor task task
  • 20. Hadoop job Hadoop scheduler Nexus master Nexus slave Nexus slave Hadoop executor MPI executor task Resource Offers MPI job MPI scheduler Framework-specific scheduling task Pick framework to offer to Resourceoffer Launches & isolates executors MPI executor task
  • 21. Resource Offer Details Min and max task sizes to control fragmentation Filters let framework restrict offers sent to it By machine list By quantity of resources Timeouts can be added to filters Frameworks can signal when to destroy filters, or when they want more offers
  • 22. Using Offers for Data Locality We found that a simple policy called delay scheduling can give very high locality: Framework waits for offers on nodes that have its data If waited longer than a certain delay, starts launching non-local tasks
  • 23. Framework Isolation Isolation mechanism is pluggable due to the inherent perfomance/isolation tradeoff Current implementation supports Solaris projects and Linux containers Both isolate CPU, memory and network bandwidth Linux developers working on disk IO isolation Other options: VMs, Solaris zones, policing
  • 25. Allocation Policies Nexus picks framework to offer resources to, and hence controls how many resources each framework can get (but not which) Allocation policies are pluggable to suit organization needs, through allocation modules
  • 26. Example: Hierarchical Fairshare Policy Cluster Share Policy Facebook.com 20% 100% 80% 0% Ads Spam User 2 User 1 14% 70% 30% 20% 100% 6% Job 4 Job 3 Job 2 Job 1 CurrTime CurrTime CurrTime
  • 27. Revocation Killing tasks to make room for other users Not the normal case because fine-grained tasks enable quick reallocation of resources Sometimes necessary: Long running tasks never relinquishing resources Buggy job running forever Greedy user who decides to makes his task long
  • 28. Revocation Mechanism Allocation policy defines a safe share for each user Users will get at least safe share within specified time Revoke only if a user is below its safe share and is interested in offers Revoke tasks from users farthest above their safe share Framework warned before its task is killed
  • 29. How Do We Run MPI? Users always told their safe share Avoid revocation by staying below it Giving each user a small safe share may not be enough if jobs need many machines Can run a traditional grid or HPC scheduler as a user with a larger safe share of the cluster, and have MPI jobs queue up on it E.g. Torque gets 40% of cluster
  • 30. Example: Torque on Nexus Facebook.com Safe share = 40% 40% 20% 40% Torque Ads Spam User 2 User 1 MPI Job MPI Job MPI Job MPI Job Job 4 Job 1 Job 2 Job 1
  • 32. What is Fair? Goal: define a fair allocation of resources in the cluster between multiple users Example: suppose we have: 30 CPUs and 30 GB RAM Two users with equal shares User 1 needs <1 CPU, 1 GB RAM> per task User 2 needs <1 CPU, 3 GB RAM> per task What is a fair allocation?
  • 33. Definition 1: Asset Fairness Idea: give weights to resources (e.g. 1 CPU = 1 GB) and equalize value of resources given to each user Algorithm: when resources are free, offer to whoever has the least value Result: U1: 12 tasks: 12 CPUs, 12 GB ($24) U2: 6 tasks: 6 CPUs, 18 GB ($24) PROBLEM User 1 has < 50% of both CPUs and RAM User 1 User 2 100% 50% 0% CPU RAM
  • 34. Lessons from Definition 1 “You shouldn’t do worse than if you ran a smaller, private cluster equal in size to your share” Thus, given N users, each user should get ≥ 1/N of his dominating resource (i.e., the resource that he consumes most of)
  • 35. Def. 2: Dominant Resource Fairness Idea: give every user an equal share of her dominant resource (i.e., resource it consumes most of) Algorithm: when resources are free, offer to the user with the smallest dominant share (i.e., fractional share of the her dominant resource) Result: U1: 15 tasks: 15 CPUs, 15 GB U2: 5 tasks: 5 CPUs, 15 GB User 1 User 2 100% 50% 0% CPU RAM
  • 38. Implementation Stats 7000 lines of C++ APIs in C, C++, Java, Python, Ruby Executor isolation using Linux containers and Solaris projects
  • 39. Frameworks Ported frameworks: Hadoop(900 line patch) MPI (160 line wrapper scripts) New frameworks: Spark, Scala framework for iterative jobs (1300 lines) Apache+haproxy, elastic web server farm (200 lines)
  • 41. Overhead Less than 4% seen in practice
  • 43. Multiple Hadoops Experiment Hadoop 1 Hadoop 2 Hadoop 3
  • 44. Multiple Hadoops Experiment Hadoop 3 Hadoop 3 Hadoop 2 Hadoop 1 Hadoop 1 Hadoop 1 Hadoop 2 Hadoop 2 Hadoop 1 Hadoop 3 Hadoop 1 Hadoop 2 Hadoop 2 Hadoop 3 Hadoop 3 Hadoop 2 Hadoop 2 Hadoop 3 Hadoop 2 Hadoop 3 Hadoop 1 Hadoop 1 Hadoop 3 Hadoop 2
  • 45. Results with 16 Hadoops
  • 46. Web Server Farm Framework
  • 47. Web Framework Experiment httperf HTTP request HTTP request HTTP request Load calculation Scheduler (haproxy) Load gen framework task resource offer Nexus master status update Nexus slave Nexus slave Nexus slave Web executor Load gen executor Web executor Load gen executor Load gen executor Web executor task (Apache) task task (Apache) task task task task (Apache)
  • 49. Future Work Experiment with parallel programming models Further explore low-latency services on Nexus (web applications, etc) Shared services (e.g. BigTable, GFS) Deploy to users and open source
  • 51. Open Cirrus™: Seizing the Open Source Cloud Stack Opportunity. A joint initiative sponsored by HP, Intel, and Yahoo! http://opencirrus.org/
  • 52. Proprietary Cloud Computing stacks (each vendor exposes only a publicly accessible slice of its stack):
Google: Applications / Application Frameworks (MapReduce, Sawzall, Google App Engine, Protocol Buffers) / Software Infrastructure: VM management, job scheduling, monitoring (Borg), storage management (GFS, BigTable) / Hardware Infrastructure (Borg)
Amazon: Applications / Application Frameworks (EMR, i.e., Hadoop) / Software Infrastructure: VM management (EC2), job scheduling, storage management (S3, EBS), monitoring / Hardware Infrastructure
Microsoft: Applications / Application Frameworks (.NET Services) / Software Infrastructure: VM management, job scheduling, monitoring (Fabric Controller), storage management (SQL Services, blobs, tables, queues) / Hardware Infrastructure (Fabric Controller)
  • 53. Open Cloud Computing stacks, heavily fragmented today!
Applications
Application Frameworks: Pig, Hadoop, MPI, Sprout, Mahout
Software Infrastructure: VM management (Eucalyptus, Enomalism, Tashi, Reservoir, Nimbus, oVirt), job scheduling (Maui/Torque), storage management (HDFS, KFS, Gluster, Lustre, PVFS, MooseFS, HBase, Hypertable), monitoring (Ganglia, Nagios, Zenoss, MON, Moara)
Hardware Infrastructure: PRS, Emulab, Cobbler, xCat
  • 55. Open Cirrus Organization Central Management Office oversees Open Cirrus; currently owned by HP Governance model: research team, technical team, new site additions, support (legal (export, privacy), IT, etc.) Each site runs its own research and technical teams, contributes individual technologies, and operates some of the global services, e.g., the HP site supports the portal and PRS, the Intel site is developing and supporting Tashi, and Yahoo! contributes to Hadoop
  • 56. Intel BigData Open Cirrus Site http://opencirrus.intel-research.net [Cluster diagram: mobile, blade, and 1U/2U/3U racks of Xeon nodes (from single-core Irwindale through quad-core Harpertown and Nehalem-EP, 4-16 GB DRAM, and from 2x75 GB up to 6x1 TB of disk per node), plus a 3U rack of 5 storage nodes with 12x1 TB disks each; racks are connected by 24-48 Gb/s switches with 1 Gb/s links to nodes, a 45 Mb/s T3 uplink to the Internet, and PDUs with per-port power monitoring and control]
  • 57. Open Cirrus Sites [Table of resources at each federated site; aggregate totals across sites: 1,029, 12,074, 1,746, 26.3 TB, and 4 PB]
  • 59. Open Cirrus Stack [Diagram: the physical layer of compute + network + storage resources, a management and control subsystem, and power + cooling, exposed through the Physical Resource Set (Zoni) service] Credit: John Wilkes (HP)
  • 60. Open Cirrus Stack [Diagram: the Zoni service carves the hardware into PRS clients, each with its own “physical data center”; research, Tashi, an NFS storage service, and an HDFS storage service each run in their own PRS domain]
  • 61. Open Cirrus Stack [Diagram: virtual clusters (e.g., Tashi) instantiated on top of the Zoni service]
  • 62. Open Cirrus Stack [Diagram: a BigData application running on Hadoop, on a Tashi virtual cluster, on a PRS, on real hardware]
  • 63. Open Cirrus Stack [Diagram: experiment save/restore added for the BigData app and Hadoop stack]
  • 64. Open Cirrus Stack [Diagram: platform services added to the stack]
  • 65. Open Cirrus Stack [Diagram: user services added on top of the platform services]
  • 66. Open Cirrus Stack [Diagram: the complete stack, from Zoni up through virtual clusters, platform services, and user services, with experiment save/restore]
  • 67. System Organization Compute nodes are divided into dynamically-allocated, VLAN-isolated PRS subdomains Apps switch back and forth between virtual and physical Open service research Apps running in a VM mgmt infrastructure (e.g., Tashi) Tashi development Production storage service Proprietary service research Open workload monitoring and trace collection
  • 68. Open Cirrus stack - Zoni Zoni service goals: Provide mini-datacenters to researchers Isolate experiments from each other Provide a stable base for other research Zoni service approach: Allocate sets of physically co-located nodes, isolated inside VLANs Zoni code from HP is being merged into the Tashi Apache project and extended by Intel Running on the HP site Being ported to the Intel site Will eventually run on all sites
  • 69. Open Cirrus Stack - Tashi An open source Apache Software Foundation project sponsored by Intel (with CMU, Yahoo, and HP) Infrastructure for cloud computing on Big Data http://incubator.apache.org/projects/tashi Research focus: location-aware co-scheduling of VMs, storage, and power; seamless physical/virtual migration Joint work with Greg Ganger (CMU), Mor Harchol-Balter (CMU), and Milan Milenkovic (CTG)
  • 70. Tashi High-Level Design [Diagram: a Cluster Manager coordinating the scheduler, virtualization service, and storage service across commodity cluster nodes] Services are instantiated through virtual machines Most decisions happen in the scheduler, which manages compute/storage/power in concert Data location and power information is exposed to the scheduler and services The storage service aggregates the capacity of the commodity nodes to house Big Data repositories Cluster nodes are assumed to be commodity machines The CM maintains databases and routes messages; decision logic is limited
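The following Python fragment sketches the kind of location- and power-aware placement decision the Tashi scheduler is meant to make; the scoring terms, weights, and data structures are assumptions for illustration, not Tashi's actual implementation.

```python
# Illustrative location- and power-aware VM placement (not Tashi's real code;
# the scoring terms and weights are assumptions).
def place_vm(vm, nodes, data_location, power_cost):
    """Pick a node for `vm`, preferring nodes that already hold the VM's
    dataset and that are cheap to power (already on, in a cool spot, ...)."""
    def score(node):
        fits = (node["free_cores"] >= vm["cores"] and
                node["free_ram_gb"] >= vm["ram_gb"])
        if not fits:
            return float("-inf")
        local = node["name"] in data_location.get(vm["dataset"], set())
        return (2.0 if local else 0.0) - power_cost[node["name"]]
    return max(nodes, key=score)

nodes = [{"name": "n1", "free_cores": 8, "free_ram_gb": 16},
         {"name": "n2", "free_cores": 8, "free_ram_gb": 16}]
vm = {"dataset": "crawl-2009", "cores": 4, "ram_gb": 8}
data_location = {"crawl-2009": {"n2"}}      # a replica of the dataset lives on n2
power_cost = {"n1": 0.5, "n2": 0.1}         # n2 is already powered on / cooler
print(place_vm(vm, nodes, data_location, power_cost)["name"])   # -> n2
```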
  • 72. Open Cirrus Stack - Hadoop An open-source Apache Software Foundation project sponsored by Yahoo! http://wiki.apache.org/hadoop/ProjectDescription Provides a parallel programming model (MapReduce), a distributed file system (HDFS), and a parallel database (HBase)
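As a reminder of the programming model Hadoop provides, here is the canonical MapReduce example, word count, written as plain Python functions. This is an in-process toy that mimics the map, shuffle, and reduce phases, not an actual Hadoop job.

```python
# Word count in the MapReduce style: an in-process toy, not a real Hadoop job.
from collections import defaultdict

def map_fn(line):
    for word in line.split():
        yield word.lower(), 1

def reduce_fn(word, counts):
    return word, sum(counts)

def mapreduce(lines):
    groups = defaultdict(list)
    for line in lines:                         # map phase
        for key, value in map_fn(line):
            groups[key].append(value)          # shuffle: group values by key
    return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))  # reduce phase

print(mapreduce(["the cloud", "the datacenter is the computer"]))
# -> {'cloud': 1, 'computer': 1, 'datacenter': 1, 'is': 1, 'the': 3}
```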
  • 73. What kinds of research projects are Open Cirrus sites looking for? Open Cirrus is seeking research in the following areas (different centers will weight these differently): Datacenter federation Datacenter management Web services Data-intensive applications and systems The following kinds of projects are generally not of interest: Traditional HPC application development Production applications that just need lots of cycles Closed source system development
  • 74. How do users get access to Open Cirrus sites? Project PIs apply to each site separately. Contact names, email addresses, and web links for applications to each site will be available on the Open Cirrus Web site (which goes live Q2 09) http://opencirrus.org Each Open Cirrus site decides which users and projects get access to its site. Developing a global sign-on for all sites (Q2 09): users will be able to log in to each Open Cirrus site for which they are authorized using the same login and password.
  • 75. Summary and Lessons Intel is collaborating with HP and Yahoo! to provide a cloud computing testbed for the research community Using the cloud as an accelerator for interactive streaming/big data apps is an important usage model Primary goals are to: foster new systems research around cloud computing; catalyze an open-source reference stack and APIs for the cloud (access model, local and global services, application frameworks); explore location-aware and power-aware workload scheduling; develop integrated physical/virtual allocations to combat cluster squatting; design cloud storage models (GFS-style storage systems are not mature, and the impact of SSDs is unknown); investigate new application framework alternatives to MapReduce/Hadoop
  • 76. Other Cloud Computing Research Topics: Isolation and DC Energy
  • 77. Heterogeneity in Virtualized Environments VM technology isolates CPU and memory, but disk and network are shared Full bandwidth when there is no contention Equal shares when there is contention Result: up to a 2.5x performance difference across EC2 small instances
  • 78. Isolation Research Predictable performance (low variance) matters more than raw performance Some resources that people have run into problems with: power, disk space, disk I/O rate (drive, bus), memory space (user/kernel), memory bus, caches at all levels (TLB, etc.), hyperthreading, CPU rate, interrupts Network: NIC (Rx/Tx), switch, cross-datacenter, cross-country OS resources: file descriptors, ports, sockets
  • 79. Datacenter Energy EPA, 8/2007: 1.5% of total U.S. electricity consumption Growing from 60 to 100 billion kWh in 5 yrs 48% of a typical IT budget spent on energy 75 MW of new DC deployments in PG&E’s service area – that they know about! (expect another 2x) Microsoft: $500M new Chicago facility Three substations with a capacity of 198 MW 200+ shipping containers w/ 2,000 servers each Overall growth of 20,000 servers/month
  • 81. First Milestone: DC Energy Conservation DCs are limited by power For each dollar spent on servers, add $0.48 (2005) / $0.71 (2010) for power and cooling The $26B spent to power and cool servers in 2005 grows to $45B in 2010 Within DC racks, network equipment is often the “hottest” component in the hot spot
  • 82. Thermal Image of Typical Cluster Rack [Thermal image of a cluster rack and its switch; the switch sits in the hot spot] M. K. Patterson, A. Pratt, P. Kumar, “From UPS to Silicon: an end-to-end evaluation of datacenter efficiency”, Intel Corporation
  • 83. DC Networking and Power Selectively power down ports/portions of network elements Enhanced power-awareness in the network stack Power-aware routing and support for system virtualization Support for datacenter “slice” power down and restart Application- and power-aware media access/control Dynamic selection of full/half duplex Directional asymmetry to save power, e.g., 10 Gb/s send, 100 Mb/s receive Power-awareness in applications and protocols Hard state (proxying), soft state (caching), protocol/data “streamlining” for power as well as bandwidth reduction Power implications for topology design Tradeoffs in redundancy/high-availability vs. power consumption VLAN support for power-aware system virtualization
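As a toy illustration of the directional-asymmetry idea above, the sketch below picks the cheapest link rate in each direction that still covers measured demand; the rate and power numbers are invented for illustration.

```python
# Toy model of direction-asymmetric link rates (rate/power numbers are invented).
RATES = [(100, 1.0), (1000, 2.0), (10000, 6.0)]   # (Mb/s, assumed watts)

def pick_rate(demand_mbps):
    """Cheapest rate that still covers the measured demand in one direction."""
    for rate, watts in RATES:
        if rate >= demand_mbps:
            return rate, watts
    return RATES[-1]

# A storage node that mostly sends: heavy transmit, light receive.
tx = pick_rate(8000)    # -> (10000, 6.0)
rx = pick_rate(40)      # -> (100, 1.0)
print(f"tx {tx[0]} Mb/s, rx {rx[0]} Mb/s, total {tx[1] + rx[1]:.1f} W (assumed)")
```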
  • 84. Summary Many areas for research into Cloud Computing! Datacenter design, languages, scheduling, isolation, energy efficiency (at all levels) Opportunities to try out research at scale! Amazon EC2, Open Cirrus, …
  • 85. Thank you! adj@eecs.berkeley.edu http://abovetheclouds.cs.berkeley.edu/

Editor's Notes

  1. Just mention briefly that there are things MR and Dryad can’t do, and that there are competing implementations; perhaps also note the need to share resources with other data center services here? The excitement surrounding cluster computing frameworks like Hadoop continues to accelerate (e.g., EC2 Hadoop and Dryad in Azure). Startups, enterprises, and researchers like us are bursting with ideas to improve these existing frameworks. More importantly, as we encounter the limitations of MR, we’re making a shopping list of what we want in next-generation frameworks: new abstractions, programming models, even new implementations of existing models (e.g., an Erlang MR called Disco). We believe that no single framework can best facilitate this innovation, but instead that people will want to run existing and new frameworks on the same physical clusters at the same time.
  2. Useful even if you only use one framework: run isolated framework instances (production vs. test), or run multiple versions of a framework together.
  3. A global scheduler needs to make guesses about a lot more (job running times, etc.). Talk about adaptive frameworks that may not know how many tasks they need in advance. Talk about irregular-parallelism jobs that don’t even know their DAG in advance. **We are exploring resource offers but don’t yet know the limits; they seem to work OK for jobs with data locality needs, though**
  5. …multiple frameworks to run concurrently! Here we see a new framework, Dryad being run side by side with Hadoop, and Nexus is multiplexing the slaves between both. Some are running Hadoop tasks, some Dryad, and some both.
  9. Waiting 1s gives 90% locality, 5s gives 95%
  10. Linux containers can actually be both “application” containers where an app shares the filesystem with the host (similar to Solaris projects), or “system” containers where each container has its own filesystem (similar to Solaris zones); both types also prevent processes in a container from seeing those outside it
  11. Transition to next slide: when you have policy == SLAs
  12. What to do with the rest of the resources?
  13. Mentioned shared HDFS!
  15. 16 Hadoop instances running a synthetic filter job; 100 nodes, 4 slots per node; delay scheduling improves performance by 1.7x