This document provides an introduction to cloud computing in five parts:
1. Defines clouds and key aspects like scale, ease of use, and pricing models.
2. Describes the cloud computing industry ecosystem including various cloud types, economic models, and applications.
3. Explains virtualization and how it increases flexibility and utilization.
4. Compares clouds for data versus supercomputers and databases, noting clouds trade functionality for scalability.
5. Discusses several standards efforts aimed at interoperability between cloud services.
The Open Science Data Cloud (OSDC) is a non-profit consortium that manages cloud computing infrastructure to support scientific research. It operates multiple clouds with thousands of nodes and petabytes of storage across four data centers. The OSDC supports various scientific projects involving astronomical, biological, and networking data as well as image processing. Its goals are to provide open, interoperable infrastructure at scale to enable large-scale scientific experiments and to preserve research data long-term like libraries preserve books. The OSDC is working on various technical challenges around data migration, virtual networking standards, and finding sustainable business models.
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial) (Robert Grossman)
This document provides an introduction to data intensive computing. It discusses how advances in instruments are producing massive amounts of data, creating new paradigms of "data intensive science" and computing. It also discusses how utility clouds like Amazon and data clouds are addressing this challenge by providing on-demand access to vast computing resources and data storage at large scale. The document outlines different models for responsibility between cloud service providers and customers.
This document provides an outline for a talk on cloud computing. It begins with an introduction to cloud concepts and technologies like virtualization and parallel computing models. It then discusses different cloud models including IaaS, PaaS and SaaS. The outline includes demonstrations of cloud capabilities with Amazon AWS and Microsoft Azure, as well as data and computing models using MapReduce. It concludes with a case study of a real business application of the cloud and a question and answer section.
This document discusses cloud computing and CloudStack. It begins with definitions of cloud computing and describes its basic layers and architecture. It then covers the evolution of cloud computing from earlier concepts like grid computing and utility computing. Different cloud solutions are presented, with CloudStack discussed in more depth including its architecture and components. The document concludes with sections on research areas related to cloud computing and references.
The Pandemic Changes Everything, the Need for Speed and Resiliency (Alluxio, Inc.)
This document discusses how the COVID-19 pandemic has accelerated the need for cloud computing and digital transformation. Some key points:
- By 2021, over 90% of organizations will rely on a mix of on-premises, private clouds, public clouds, and legacy systems to meet infrastructure needs.
- By 2023, an emerging cloud ecosystem for extending resource control and analytics will underlie all IT and business automation initiatives anywhere.
- Resilient business models and superior customer experience will be critical as organizations shift more operations and services to the cloud.
What we Learned About Application Resiliency When the Data Center Burned Down (ScyllaDB)
Is your data infrastructure architected to withstand anything, including a disaster suddenly taking down an entire data center? When a cloud provider’s data center burned to the ground, 3.6 million websites went dark. But one leading travel service kept running without a hitch, thanks to the design of their environment-aware distributed database. Learn how they architected their data infrastructure for extreme resiliency, how their strategy held up, and what lessons they learned.
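The abstract itself contains no code, but the placement idea behind an "environment-aware" distributed database can be sketched in a few lines. The snippet below is a hypothetical illustration, not ScyllaDB's implementation: the node names, the per-datacenter selection, and the function signature are all assumptions. The invariant it demonstrates is the one that matters here: no two replicas of a record share a failure domain, so losing a whole datacenter costs at most one copy.

```python
# Hypothetical sketch of datacenter-aware replica placement: spread the
# copies of each record across distinct datacenters so that the loss of
# one facility removes at most one replica.
def place_replicas(nodes, replication_factor):
    """nodes: list of (node_id, datacenter) pairs; returns replica nodes,
    at most one per datacenter."""
    by_dc = {}
    for node_id, dc in nodes:
        by_dc.setdefault(dc, []).append(node_id)
    if len(by_dc) < replication_factor:
        raise ValueError("not enough datacenters for the requested factor")
    # Take the first node from each datacenter, in a stable order.
    return [dc_nodes[0] for _, dc_nodes in sorted(by_dc.items())][:replication_factor]

nodes = [("n1", "dc-east"), ("n2", "dc-east"), ("n3", "dc-west"), ("n4", "dc-north")]
print(place_replicas(nodes, 3))  # ['n1', 'n4', 'n3']: one replica per datacenter
```

Real systems layer consistent hashing, rack awareness, and tunable consistency on top, but the failure-domain invariant is the same.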
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey (Alluxio, Inc.)
Data Orchestration Summit 2020 organized by Alluxio
https://www.alluxio.io/data-orchestration-summit-2020/
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
Sandipan Chakraborty, Director of Engineering (Rakuten)
About Alluxio: alluxio.io
Engage with the open source community on slack: alluxio.io/slack
Learn about TerraEchos Kairos on IBM PowerLinux servers. A world leader in stream computing harnesses the power of real-time associative analytics for extreme workflow optimization in the big data arena. For more information on Power Systems, visit http://ibm.co/Lx6hfc.
Datacenter and cloud architectures continue to evolve to address the needs of large-scale multi-tenant data centers and clouds. These needs are centered around dimensions such as scalability in computing, storage, and bandwidth, scalability in network services, efficiency in resource utilization, agility in service creation, cost efficiency, service reliability, and security. Data centers are interconnected across the wide area network via routing and transport technologies to provide a pool of resources, known as the cloud. High-speed optical interfaces and dense wavelength-division multiplexing optical transport are used to provide for high-capacity transport intra- and inter-datacenter. This presentation will provide some brief descriptions of the working principles of Cloud & Data Center Networks.
OCCIware: extensible and standard-based XaaS platform to manage everything in... (OCCIware)
This document discusses OCCIware, an open source platform for managing cloud resources using the Open Cloud Computing Interface (OCCI) standard. It introduces OCCIware Studio for designing, simulating, and developing cloud applications and services, and the OCCIware Runtime for deploying and managing those services. It then demonstrates OCCIware's capabilities for linked data analytics as a service using Docker Studio and a MongoDB cluster. Upcoming work on OCCIware includes improvements to Studio, integration with cloud management tools, and further use cases involving data centers, big data, and linked data.
CREODIAS is a leading cloud platform for processing Earth observation data. It offers cloud services, tools, and storage for processing and disseminating satellite data. CREODIAS integrates data from over 20 sources and allows users to process data using virtual machines, containers, and serverless functions. It provides tools for finding, accessing, and visualizing data as well as developing applications.
OCRE Workshop: Shaping the Earth Observation Services Market for Research. Session 3: Presentations from DIAS and eoMALL.
This workshop aims to bring EO service providers closer to the research community, capture their needs, and develop fit-for-purpose EO services.
The event will be the 4th OCRE Requirements Gathering Workshop. Researchers and Earth Observation Service Providers will be asked to provide inputs to help us shape OCRE's tender.
The OCRE project aims to provide the first end-to-end instance of organised, large-scale market pull for EO services in Europe. These services will be provided for free to EU researchers through the European Open Science Cloud. To ensure that the services meet the actual needs of the research community we invite both the demand and the supply side, to share their views and engage in a productive dialogue. Our aim is to capture the needs of EU researchers and inform the EO service providers so that they make available services that effectively address them. We will also explain how the OCRE process will work, how the different stakeholders should be involved and how to make the most of the foreseen benefits.
Accelerate Analytics and ML in the Hybrid Cloud Era (Alluxio, Inc.)
Alluxio Webinar
April 6, 2021
For more Alluxio events: https://www.alluxio.io/events/
Speakers:
Alex Ma, Alluxio
Peter Behrakis, Alluxio
Many companies we talk to have on-premises data lakes and use the cloud(s) to burst compute. Many are now establishing new object data lakes as well. As a result, analytics engines such as Hive, Spark, and Presto, as well as machine learning workloads, suffer sluggish response times when data and compute sit in multiple locations. We also know there is an immense and growing data management burden to support these workflows.
In this talk, we will walk through what Alluxio’s Data Orchestration for the hybrid cloud era is and how it solves the performance and data management challenges we see.
In this tech talk, we'll go over:
- What is Alluxio Data Orchestration?
- How does it work?
- Alluxio customer results
Mundi Presentation - A Space of New Opportunities (plan4all)
This document provides an overview of Mundi, an Atos DIAS (Data and Information Access Service) platform. It discusses how Mundi provides simple access to Copernicus and other satellite data through cloud and big data technologies. The document outlines Mundi's offerings, including Jupyter notebooks, APIs, data formats, historical data access, on-demand processing, and its growing marketplace. Users are invited to create an account, explore the marketplace and data, and try Mundi.
Different data types, operational efficiencies, and variable workloads are driving the convergence of data platforms. A converged data platform combines technologies like Hadoop, Spark, streaming, and databases on a single platform with centralized management. This reduces costs and improves reliability compared to separate data silos. Major vendors like MapR are offering converged data platforms that provide real-time processing, multi-model databases, and integration of streaming and batch workloads. Widespread adoption of converged data platforms is expected to continue as businesses seek improved data management and analytics capabilities.
Data Orchestration Summit 2020 organized by Alluxio
https://www.alluxio.io/data-orchestration-summit-2020/
The Future of Computing is Distributed
Professor Ion Stoica, UC Berkeley RISELab
About Alluxio: alluxio.io
Engage with the open source community on slack: alluxio.io/slack
Data Orchestration Summit
www.alluxio.io/data-orchestration-summit-2019
November 7, 2019
Orchestrate a Data Symphony
Speaker:
Haoyuan Li, Alluxio
For more Alluxio events: https://www.alluxio.io/events/
MAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISON (ijcsit)
MapReduce has gained remarkable significance as a prominent parallel data processing tool in the research community, academia, and industry amid the spurt in the volume of data to be analyzed. MapReduce is used in applications such as data mining and data analytics, where massive data analysis is required, yet it is still being explored along dimensions such as performance and efficiency. This survey explores large-scale data processing using MapReduce and its various implementations, to help the database community, researchers, and practitioners develop a technical understanding of the MapReduce framework. Different MapReduce implementations are explored and their inherent features are compared on different parameters. The survey also addresses the open issues and challenges raised by building a fully functional DBMS/data warehouse on MapReduce. The various implementations are compared against Hadoop, the most popular one, and against similar implementations on other platforms.
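For readers new to the programming model the survey examines, here is a minimal word-count sketch of the map and reduce phases in plain Python. It is a toy illustration of the model only; real implementations such as Hadoop add distributed partitioning, shuffling across machines, and fault tolerance.

```python
# Toy word count in the MapReduce style: map emits (key, value) pairs,
# the framework groups them by key, and reduce aggregates each group.
from collections import defaultdict

def map_phase(document):
    for word in document.split():
        yield word.lower(), 1          # emit ("word", 1) per occurrence

def reduce_phase(pairs):
    grouped = defaultdict(list)        # the "shuffle": group values by key
    for key, value in pairs:
        grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}

docs = ["MapReduce maps then reduces", "maps shuffle to reducers"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
print(reduce_phase(pairs))  # {'mapreduce': 1, 'maps': 2, 'then': 1, ...}
```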
Cloud computing represents a new approach to addressing scalability problems by providing reusable infrastructure components that organizations can use to build applications that rapidly scale to large volumes of data. The amount of data generated is growing exponentially from a variety of sources and far exceeds what a single computer can process. Frameworks like Hadoop provide a scalable and reliable way to process vast amounts of data across many computers working in parallel by distributing data and computation automatically. This allows organizations to efficiently gain insights from large datasets.
MapR Technologies Chief Marketing Officer Jack Norris talks about the advantages of Hadoop. He elaborates on multiple use cases and explains how MapR Technologies offers the best Hadoop distribution.
Cloud present, future and trajectory (Amazon Web Services) - Jisc Digifest 2016 (Jisc)
In Jisc's future of cloud computing horizon scan report, we identified three strategic areas where Jisc could support universities and colleges in moving to the cloud – cloud as a utility, app as a service, and working to build capability in cloud technologies.
Come along to this session to hear more about this work from Jisc futurist Martin Hamilton, and find out how you can get involved.
TierraCloud's HC2 open-source project aims to enable enterprise-class private cloud storage using standard x86 servers and sophisticated software. This allows 10x lower total cost of ownership compared to traditional storage solutions while providing scalability to billions of objects and petabytes of capacity. The current beta release supports object storage and retrieval with S3 and HTTP APIs, metadata storage and querying, and background data integrity checking across a minimum of 8 or 16 servers. The technology is based on Sun's Project Honeycomb and has received praise from analysts and universities for its potential to revolutionize data management and archival storage.
Big data analytics in the cloud allows companies to extract value from vast amounts of data. By leveraging cloud computing infrastructure, businesses can analyze customer behavior patterns, optimize operations, and gain insights faster at lower costs compared to on-premise data centers. The cloud provides massive scalability, advanced analytics tools, and pay-as-you-go pricing that enables organizations to efficiently process big data and make data-driven decisions.
The VINEYARD project aims to increase the performance and energy efficiency of data centers through the use of heterogeneous hardware accelerators like programmable dataflow engines and FPGA-accelerated servers. The project will develop these novel accelerators and integrate them into the data center infrastructure with an open programming framework and runtime scheduler. This will allow big data applications to leverage the accelerators while hiding the complexity from programmers. The goals are demonstrated through applications in computational neuroscience, finance, data analytics, and IoT.
Webinar: Learn How To Deploy High-Scale, Low-Latency Cost-Efficient Solutions... (BTI Systems)
In this webinar, Chandra Pandey, VP of Platform Solutions, and Joel Daly, Director of Solutions Marketing, will discuss how BTI™ Intelligent Packet Optical Solutions enable massive scalability with ultra-low latency and accelerate service delivery with high availability, all while reducing capital and operational costs.
HPC Cloud: Clouds on supercomputers for HPC (Ryousei Takano)
- HPC Cloud is a promising platform that can provide high performance, energy efficiency, scalability, and usability for HPC workloads. It utilizes technologies like VMM-bypass I/O, hybrid live migration, and virtual cluster migration to minimize performance overhead.
- The AIST has integrated these technologies into their HPC Cloud OS and Apache CloudStack to provide bare-metal-comparable I/O performance within a cloud environment. This allows HPC workloads and applications to efficiently utilize cloud infrastructures.
- The HPC Cloud federation concept allows VM images to be easily shared between different cloud systems. This achieves large-scale utilization of computing resources by leveraging supercomputers across
MS TechDays 2011 - Cloud Computing with the Windows Azure Platform (Spiffy)
This document provides an overview of the Windows Azure cloud computing platform. It discusses the types of cloud services including Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). It then describes several key Windows Azure services like Compute, Storage, Database, Content Delivery Network, Reporting, Virtual Machines, Service Bus, Access Control, Caching, Virtual Network, and Marketplace. The presentation encourages Singapore companies using Windows Azure to contact Microsoft to have their applications featured. It concludes by inviting questions from attendees.
Windows Azure David Chappell White Paper March 09 (guest120d945)
This document introduces Windows Azure, a platform for building and hosting scalable cloud applications and services. It provides an overview of the main components of Windows Azure, including the Compute service for running applications, the Storage service for persistent data, and the underlying Fabric for management. It then discusses scenarios for using Windows Azure, such as creating scalable web applications, parallel processing applications, and applications with background processing. Finally, it examines the components in more detail regarding development, the compute and storage services, and the fabric.
This document contains a presentation on cloud computing concepts and Microsoft Azure. It discusses what cloud computing is, examples of cloud architectures like processing pipelines and websites, benefits of cloud computing like reduced costs and scalability, and an overview of Microsoft Azure including its features and how to deploy applications to Azure.
The document provides an overview of Microsoft's Azure Services Platform, which includes four main components: Windows Azure, .NET Services, SQL Services, and Live Services. Windows Azure provides a platform for building and hosting applications in the cloud, .NET Services offers distributed infrastructure services, SQL Services provides data storage and services in the cloud, and Live Services allows accessing and synchronizing data from Microsoft's online applications.
2011.05.31 super mondays-servicebus-demo (daveingham)
Presentation by David Ingham demonstrating the messaging features (queues, topics) of Windows Azure AppFabric Service Bus. Given at SuperMondays, Gateshead, UK on May 31, 2011.
The document introduces warehouse-scale computers (WSCs), which are the computing platforms that power large Internet services like search engines and social networks. WSCs consist of thousands of computing nodes, networking equipment, storage systems, and extensive cooling and power infrastructure housed in large buildings. They differ from traditional datacenters in that they belong to a single organization, use homogeneous hardware and software platforms, and share resource management to run a small number of very large Internet services rather than many smaller applications. The document provides an overview of the key architectural components of WSCs and how their design is optimized for cost efficiency at massive scale.
Optimize your IT network infrastructure environment with the industry's leading business technology analytics platform from RISC Networks. IT HealthCheck is a turn-key, software-as-a-service (SaaS) business technology analytics platform that helps you significantly improve the efficiency, agility and resiliency of your IT network infrastructure.
RISC Networks CloudScape simplifies cloud migration planning through a process of discovery, analysis, and migration. It uses intelligent application grouping to understand complex application dependencies and segment workloads by location and function. CloudScape analyzes applications to identify migration drivers and issues. It optimizes cloud pricing across 15+ vendors and provisions resources while factoring in storage, network I/O, and true costs. Migration plans can then be exported and executed, including full network connectivity requirements.
Power Comparison of Cloud Data Center Architectures (Paolo Giaccone)
Power consumption is a primary concern for cloud computing data centers. Since the network is one of the non-negligible contributors to energy consumption in data centers, several architectures have been designed with the goals of improving network performance and being energy-efficient. In this paper we provide a comparison study of data center architectures, covering both the classical two- and three-tier designs and state-of-the-art ones such as Jupiter, recently disclosed by Google. Specifically, we analyze the combined effect on overall system performance of different power consumption profiles for the IT equipment and of different resource allocation policies. Our experiments, performed in small- and large-scale scenarios, unveil the ability of network-aware allocation policies to load the data center in an energy-proportional manner, and the robustness of the classical two- and three-tier designs under network-oblivious allocation strategies.
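The energy-proportionality point is easy to see with the commonly used linear server power model. The sketch below is illustrative only: the wattages and the two allocation policies are assumptions, not figures or algorithms from the paper.

```python
# Linear server power model: an idle server still draws a large fraction
# of peak power, so consolidating load onto fewer servers (and powering
# the rest down) tracks demand far more proportionally than spreading it.
P_IDLE, P_PEAK = 100.0, 250.0  # watts, hypothetical server

def server_power(utilization):
    return P_IDLE + (P_PEAK - P_IDLE) * utilization

def cluster_power(total_load, n_servers, consolidate):
    if consolidate:  # fill servers to 100% before waking another
        full, rest = int(total_load), total_load - int(total_load)
        return full * server_power(1.0) + (server_power(rest) if rest else 0.0)
    return n_servers * server_power(total_load / n_servers)  # spread evenly

# Four servers' worth of work on a ten-server cluster:
print(cluster_power(4.0, 10, consolidate=False))  # 1600.0 W
print(cluster_power(4.0, 10, consolidate=True))   # 1000.0 W
```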
An introduction to the Design of Warehouse-Scale Computers (Alessio Villardita)
A brief overview of the main factors involved in the design of Warehouse-Scale Computers (WSCs), from the hardware to the cooling system to overall plant energy efficiency, always keeping in mind the costs of such a large architecture.
Co-Author: Pietro Piscione (https://www.linkedin.com/pub/pietro-piscione/84/b37/926)
A work based on:
"The Datacenter as a Computer, An Introduction to the Design of Warehouse-Scale Machines, Second Edition"
by
Luiz André Barroso
Jimmy Clidaras
Urs Hölzle
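A recurring plant-level metric in that book is Power Usage Effectiveness (PUE): total facility power divided by the power delivered to the IT equipment. A minimal calculation with hypothetical numbers:

```python
# PUE = total facility power / IT equipment power; 1.0 would mean every
# watt reaches the servers. The figures below are hypothetical.
def pue(total_facility_kw, it_equipment_kw):
    return total_facility_kw / it_equipment_kw

print(pue(1500.0, 1200.0))  # 1.25: 25% overhead for cooling, power delivery, etc.
```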
This was presented at the 2009 Web World Conference.
The presentation analyzes trends in cloud computing and considers its future prospects.
The document provides an overview of several data center network architectures: Monsoon, VL2, SEATTLE, PortLand, and TRILL. Monsoon proposes a large layer 2 domain with a Clos topology and uses MAC-in-MAC encapsulation and load balancing to improve scalability. VL2 also uses a Clos topology with flat addressing, load balancing, and an end host directory for address resolution. SEATTLE employs flat addressing, automated host discovery, and hash-based address resolution. PortLand uses a tree topology with encoded switch positions and a fabric manager for address mapping. TRILL standardizes encapsulation and IS-IS routing between routing bridges.
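Several of these designs (PortLand explicitly, Monsoon and VL2 in spirit) build on fat-tree or folded-Clos topologies. As a rough sizing aid, the standard k-ary fat-tree formulas can be computed directly; the sketch below assumes the textbook construction from identical k-port switches, not any one paper's exact variant.

```python
# Standard k-ary fat-tree sizing: k pods of k switches each (half edge,
# half aggregation), (k/2)^2 core switches, and k^3/4 attachable hosts.
def fat_tree(k):
    assert k % 2 == 0, "port count k must be even"
    return {
        "pods": k,
        "core_switches": (k // 2) ** 2,
        "pod_switches": k * k,       # k per pod: k/2 edge + k/2 aggregation
        "hosts": (k ** 3) // 4,
    }

print(fat_tree(48))  # 48-port switches: 576 core switches, 27648 hosts
```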
This document provides information about Infrastructure as a Service (IaaS) offerings from various cloud providers. It includes a table comparing features of services from VMware vCloud Air, SoftLayer, Microsoft Azure, Ingram Micro VPS, Amazon Web Services, 365 Data Center, Intermedia Private Label Cloud, and dinCloud Business Provisioning. It also includes sections that provide more detailed information about IaaS offerings and contact information for inquiries.
4 Ways To Save Big Money in Your Data Center and Private Cloud (tervela)
The thirst for real-time access to rich content and big data is turning enterprise datacenters into private computing clouds. However, making exabyte-scale data available and responsive to a global application network gets expensive. Fortunately there are things you can do to save big money in these sophisticated new environments. In this presentation you will learn how to save money, avoid costs, and create significant efficiencies in your private cloud by: consolidating databases and data warehouses, slashing big data storage and storage-based data replication, replacing expensive middleware, and eliminating cold disaster recovery.
Open Cloud Consortium: An Update (04-23-10, v9) (Robert Grossman)
The Open Cloud Consortium (OCC) is a non-profit organization that supports the development of cloud computing standards and technologies. It manages several testbeds and working groups focused on areas like large data clouds, interoperability, and disaster relief applications. The document provides updates on the OCC's Intercloud Testbed, which aims to address gaps in cloud standards, as well as its Open Cloud Testbed which offers resources to members through a "condominium cloud" model.
Cloud Computing Standards and Use Cases (Robert Grossman) 09-v8p
This document provides an overview of cloud computing standards organizations and use cases. It begins with definitions of cloud computing and describes early use cases like migrating applications between clouds without changes. It then outlines several standards bodies and their focuses, such as the Distributed Management Task Force working on virtual machine portability and the Storage Networking Industry Association developing a cloud data management interface. Finally, it presents additional use cases such as moving large data applications between cloud storage and compute services and sharing information across clouds with security requirements.
Cloud computing and grid computing 360 degree compared (Md. Hasibur Rashid)
Cloud computing builds upon concepts from cluster and grid computing. Cluster computing links multiple computers to share workloads, while grid computing dynamically aggregates distributed resources for tasks. Cloud computing provides scalable resources and services over the internet. It extends concepts from grid computing by offering virtualized, dynamically provisioned resources on-demand. Key differences are that cloud computing has loose coupling between providers and consumers, supports scaling, and offers services under a pay-per-use business model. Common cloud services are SaaS, PaaS, and IaaS. Challenges include dynamic scalability, security, and standardization. Cloud computing shows promise for further research in areas like security, interoperability and dynamic pricing models.
The Open Cloud Consortium (OCC) is a non-profit organization that supports cloud computing standards and develops testbeds for interoperability. It has members from companies, universities, and government agencies. The OCC manages the Open Cloud Testbed, Intercloud Testbed, and Open Science Data Cloud. It also has working groups focused on large data clouds, applications, and cloud services. The Intercloud Testbed aims to address gaps in linking infrastructure and platform services. Benchmarks like Gray Sort and MalStone are used to evaluate large data cloud performance. The Open Cloud Testbed provides shared cloud resources through a "condominium cloud" model. The Open Science Data Cloud hosts scientific data sets for research.
Lecture #6 - ET-3010
Cloud Computing - Overview and Examples
Connected Services and Cloud Computing
School of Electrical Engineering and Informatics SEEI / STEI
Institut Teknologi Bandung ITB
Update April 2017
Data centers are large physical facilities that house computing infrastructure for enterprises. They provide utilities like power, cooling, security and shelter for servers and storage equipment. Modern data centers are designed with regions and availability zones for fault tolerance, with each zone consisting of one or more data centers within close network proximity. Key challenges for data centers include efficient cooling of equipment, improving energy proportionality as servers are often idle, optimizing resource utilization through virtualization and dynamic allocation, and managing the immense scale of infrastructure and traffic as cloud providers operate millions of servers globally.
Cyberinfrastructure and Applications Overview: Howard University June22 (marpierc)
1) Cyberinfrastructure refers to the combination of computing systems, data storage systems, advanced instruments and data repositories, visualization environments, and people that enable knowledge discovery through integrated multi-scale simulations and analyses.
2) Cloud computing, multicore processors, and Web 2.0 tools are changing the landscape of cyberinfrastructure by providing new approaches to distributed computing and data sharing that emphasize usability, collaboration, and accessibility.
3) Scientific applications are increasingly data-intensive, requiring high-performance computing resources to analyze large datasets from sources like gene sequencers, telescopes, sensors, and web crawlers.
Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert Grossman)
The document summarizes Sector, an open-source large data cloud computing platform, and compares it to Hadoop. Sector uses a file-based storage system instead of Hadoop's block-based HDFS, and features a more flexible UDF programming model compared to MapReduce. Benchmark results show Sector outperforming Hadoop on the Terasort and MalStone benchmarks, with speedups of up to 19x, due to its dataflow balancing, UDP-based transport, and other architectural advantages over Hadoop for data-intensive computing at scale. Lessons learned include the importance of data locality, load balancing, and fault tolerance in large-scale systems.
Spatial data infrastructure in the cloud, 2011 (Moullet)
The document discusses spatial data infrastructures (SDIs) and the potential advantages and disadvantages of using cloud computing for SDIs. Some key points:
- An SDI is a framework of spatial data, metadata, users and tools that allow for efficient and flexible use of spatial data.
- Cloud computing provides on-demand access to configurable computing resources over a network. There are different service and deployment models.
- Potential advantages of cloud computing for SDIs include scalability, pay-as-you-go costs, and not needing dedicated servers. However, organizations need to keep control of important components and ensure security and privacy.
- Switzerland's Federal Office of Topography uses an infrastructure
Large Scale On-Demand Image Processing For Disaster Relief (Robert Grossman)
This is a status update (as of Feb 22, 2010) of a new Open Cloud Consortium project that will provide on-demand, large scale image processing to assist with disaster relief efforts.
This document discusses cloud computing concepts and applications in a military context. It defines cloud computing and describes common cloud themes like scalability, on-demand access, and location independence. It outlines business benefits like automation, data intensive computing, and accessibility from any device. The document also discusses DISA's focus on infrastructure/platform capabilities and lists several of DISA's cloud-related efforts.
This document provides an overview of cloud computing. It begins with learning objectives and defines cloud computing according to NIST as a model for enabling network access to a shared pool of configurable computing resources that can be rapidly provisioned with minimal management effort. It describes the five essential cloud characteristics, three service models (SaaS, PaaS, IaaS), and four deployment models (private, public, hybrid, community). Examples are given for each along with issues and benefits of cloud computing. The document provides a comprehensive introduction to cloud computing concepts.
Cloud computing is the natural evolution of computing where resources are provided as a service over the internet. There are different deployment models and types of cloud services including infrastructure as a service, platform as a service, and software as a service. Popular cloud frameworks include Google AppEngine, PubNub, and Jclouds which provide development platforms and services for storage, databases, and notifications in the cloud.
Fundamental question and answer in cloud computing quiz by Animesh Chaturvedi
The document contains questions and answers related to a cloud computing exam. It includes 5 questions worth 5 marks each on topics like the 2013 ACM Turing Award winner and their contributions to distributed systems and cloud computing, different cloud computing models, data transfer methods, descriptions of Google File System and Hadoop Distributed File System, and architectures for Hadoop on Google Cloud Platform and web applications on Google App Engine. The answers to the questions are provided in slides within the linked website.
Cloud computing is a general term for services and infrastructure that are hosted remotely over the internet. It allows users to access computing resources and data storage on demand from any device. Key characteristics include pay-as-you-go pricing, ubiquitous network access, and elastic scalability. Cloud services can be categorized as infrastructure as a service (IaaS), platform as a service (PaaS), or software as a service (SaaS). Major advantages include lower costs, easier collaboration, automatic updates, and unlimited storage. Disadvantages include reliance on internet connectivity and potential security and performance issues.
Similar to My Other Computer is a Data Center (2010 v21)
Some Frameworks for Improving Analytic Operations at Your Company (Robert Grossman)
I review three frameworks for analytic operations that are designed to improve the value obtained when deploying analytic models into products, services and internal operations.
This is a talk that I gave at BioIT World West on March 12, 2019. The talk was called: A Gen3 Perspective of Disparate Data: From Pipelines in Data Commons to AI in Data Ecosystems.
Crossing the Analytics Chasm and Getting the Models You Developed Deployed (Robert Grossman)
There are two cultures in data science and analytics - those that develop analytic models and those that deploy analytic models into operational systems. In this talk, we review the life cycle of analytic models and provide an overview of some of the approaches that have been developed for managing analytic models and workflows and for deploying them, including using analytic engines and analytic containers. We give a quick overview of languages for analytic models (PMML) and analytic workflows (PFA). We also describe the emerging discipline of AnalyticOps that has borrowed some of the techniques of DevOps.
This is an overview of the Data Biosphere Project, its goals, its architecture, and the three core projects that form its foundation. We also discuss data commons.
What is Data Commons and How Can Your Organization Build One? (Robert Grossman)
1. Data commons co-locate large biomedical datasets with cloud computing infrastructure and analysis tools to create shared resources for the research community.
2. The NCI Genomic Data Commons is an example of a data commons that makes over 2.5 petabytes of cancer genomics data available through web portals, APIs, and harmonized analysis pipelines (a sketch of API access follows this list).
3. The Gen3 platform is an open source software stack for building data commons that can interoperate through common APIs and data models to support reproducible, collaborative research across projects.
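As a concrete illustration of the API access mentioned in item 2, here is a minimal query sketch against the GDC's public REST endpoint. The endpoint, filter syntax, and field names follow the public GDC API documentation as best understood here; treat them as assumptions and check the current docs before relying on them.

```python
# Hedged sketch: list a few open-access files from the NCI Genomic Data
# Commons via its public REST API.
import json
import requests

FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"

# Restrict to open-access files so no authentication token is needed.
filters = {"op": "=", "content": {"field": "access", "value": "open"}}
params = {
    "filters": json.dumps(filters),
    "fields": "file_id,file_name,file_size",
    "size": "5",      # just the first five hits
    "format": "json",
}

response = requests.get(FILES_ENDPOINT, params=params)
response.raise_for_status()
for hit in response.json()["data"]["hits"]:
    print(hit["file_id"], hit["file_name"], hit["file_size"])
```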
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Shared (Robert Grossman)
Data commons are emerging as a solution to challenges in analyzing and sharing large biomedical datasets. A data commons co-locates data with cloud computing infrastructure and software tools to create an interoperable resource for the research community. Examples include the NCI Genomic Data Commons and the Open Commons Consortium. The open source Gen3 platform supports building disease- or project-specific data commons to facilitate open data sharing while protecting patient privacy. Developing interoperable data commons can accelerate research through increased access to data.
This document discusses best practices for deploying analytic models from development environments into operational systems. It describes how modeling environments often use different languages than deployment environments, requiring significant effort to move models. The document outlines the life cycle of analytic models, from exploratory data analysis to model deployment and monitoring. It also discusses standards like PMML and PFA that can be used to export models between different applications and analytic engines that integrate models into operational workflows.
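As one illustration of the PMML export step described above: the document names no particular library, so the sketch below assumes the third-party sklearn2pmml package (whose converter requires a Java runtime). It is one possible route, not the author's prescribed method.

```python
# Hedged sketch: train a scikit-learn model and export it as PMML so a
# separate, compliant scoring engine can run it in production.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

X, y = load_iris(return_X_y=True)
pipeline = PMMLPipeline([("classifier", DecisionTreeClassifier(max_depth=3))])
pipeline.fit(X, y)

# The PMML file decouples the modeling environment from the deployment
# environment: any engine that reads PMML can score with this model.
sklearn2pmml(pipeline, "iris_tree.pmml")
```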
This document discusses big data and analytics, outlining five trends and five research challenges. It begins by defining big data in terms of volume, velocity, variety, veracity and value. It then discusses the origins and evolution of big data, from early statistics to modern data science. Analytics is defined as using data to make empirically-derived, statistically valid decisions. The document outlines how hardware choices led to scaling out data processing across clusters rather than scaling up on single machines. It also provides examples of fields that generate huge volumes of data from billion dollar instruments like CERN's Large Hadron Collider and genomic sequencing facilities.
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production... (Robert Grossman)
The document discusses lessons learned from moving machine learning algorithms to production environments, referred to as "AnalyticOps". It introduces AnalyticOps as establishing an environment where building, validating, deploying, and running analytic models happens rapidly, frequently, and reliably. A key challenge is deploying analytic models into operations, products, and services. The document discusses strategies for deploying models, including scoring engines that integrate analytic models into operational workflows using a model interchange format. It provides two case studies as examples.
Architectures for Data Commons (XLDB 15 Lightning Talk) (Robert Grossman)
These are the slides from a 5 minute Lightning Talk that I gave at XLDB 2015 on May 19, 2015 at Stanford. It is based in part on our experiences developing the NCI Genomic Data Commons (GDC).
Practical Methods for Identifying Anomalies That Matter in Large Datasets (Robert Grossman)
This document summarizes four approaches to identifying anomalies in large datasets: 1) statistical modeling of populations, 2) identifying clusters and distances of outliers from clusters, 3) examining neighborhoods and densities, and 4) ranking and packaging candidate anomalies for expert review. It also provides a case study on detecting active voxels in fMRI data from a salmon's brain during a mentalizing task. Several active voxels were found in a cluster in the brain, but the resolution was too coarse to identify specific brain regions.
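Approach 2 (clusters plus distances) is simple to prototype. The sketch below is an illustrative assumption, not the document's own code: it uses scikit-learn's KMeans on synthetic data, scores points by distance to their nearest cluster center, and packages the top-ranked points for expert review in the spirit of approach 4.

```python
# Cluster-distance anomaly scoring: fit k-means, score each point by its
# distance to the nearest center, and surface the most distant points.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(500, 2))   # background data
outliers = rng.uniform(low=-8.0, high=8.0, size=(5, 2))  # planted anomalies
X = np.vstack([normal, outliers])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
distance_to_nearest = kmeans.transform(X).min(axis=1)    # per-point score

# Rank and keep the top 1% as candidates for human review.
n_candidates = max(1, len(X) // 100)
candidate_idx = np.argsort(distance_to_nearest)[-n_candidates:]
print(sorted(candidate_idx))  # indices of the most cluster-distant points
```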
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014 (Robert Grossman)
This document discusses how biomedical discovery is being disrupted by big data. Large genomic, phenotype, and environmental datasets are needed to understand complex diseases that result from combinations of many rare variants. However, analyzing large biomedical data is costly and difficult given the standard model of local computing. The document proposes creating large "commons" of community data and computing as an instrument for big data discovery. Examples are given of the Cancer Genome Atlas project, which has petabytes of research data on thousands of cancer patients, and how tumors evolve over time. Overall, the document argues that new models of shared biomedical clouds and commons are needed to enable cost-effective analysis of big biomedical data.
Adversarial Analytics - 2013 Strata & Hadoop World Talk (Robert Grossman)
This is a talk I gave at the Strata Conference and Hadoop World in New York City on October 28, 2013. It describes predictive modeling in the context of modeling an adversary's behavior.
Driving Business Innovation: Latest Generative AI Advancements & Success Story (Safe Software)
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Generating privacy-protected synthetic data using Secludy and MilvusZilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
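The deck itself does not show Secludy's code, but a minimal version of "storing and searching embeddings in Milvus" might look like the sketch below. It assumes the pymilvus MilvusClient API with an embedded Milvus Lite database file; the collection name, dimensionality, and vectors are made up for illustration.

```python
# Minimal sketch of storing and searching embeddings in Milvus.
# Collection name, dimension, and vectors are illustrative; this is
# not Secludy's actual pipeline.
import numpy as np
from pymilvus import MilvusClient

client = MilvusClient("demo.db")  # embedded Milvus Lite; a server URI also works
client.create_collection(collection_name="synthetic_embeddings", dimension=384)

rng = np.random.default_rng(0)
rows = [{"id": i, "vector": rng.random(384).tolist()} for i in range(100)]
client.insert(collection_name="synthetic_embeddings", data=rows)

# Nearest-neighbor search for one query embedding.
hits = client.search(
    collection_name="synthetic_embeddings",
    data=[rng.random(384).tolist()],
    limit=3,
)
print(hits[0])
```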
Monitoring and Managing Anomaly Detection on OpenShift.pdfTosin Akinosho
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system (see the code sketch after this list).
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
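As a minimal illustration of item 8 above, the sketch below exposes two anomaly-detection metrics with the Python prometheus_client library. The metric names, threshold, and scoring stub are placeholders rather than anything from the tutorial.

```python
# Minimal sketch: exposing anomaly-detection metrics to Prometheus.
# Metric names, threshold, and the scoring stub are illustrative.
import random
import time

from prometheus_client import Counter, Gauge, start_http_server

anomaly_score = Gauge("anomaly_score", "Latest anomaly score from the model")
anomalies = Counter("anomalies_detected", "Count of detected anomalies")

def score_next_reading() -> float:
    return random.random()  # stand-in for the real model's score

if __name__ == "__main__":
    start_http_server(8000)  # Prometheus scrapes http://localhost:8000/metrics
    while True:
        score = score_next_reading()
        anomaly_score.set(score)
        if score > 0.95:     # illustrative alerting threshold
            anomalies.inc()
        time.sleep(1)
```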
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
5. Ideas and approaches to help build your organization's AI strategy.
Main news related to the CCS TSI 2023 (2023/1695)Jakub Marek
An English 🇬🇧 translation of the presentation accompanying the talk I gave about the main changes introduced by CCS TSI 2023 at the biggest Czech conference on railway communications and signalling systems, held at the Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). It was attended by around 500 participants and 200 online followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
Fueling AI with Great Data with Airbyte WebinarZilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframePrecisely
Inconsistent user experience and siloed data, high costs, and changing customer expectations – Citizens Bank was experiencing these challenges while it was attempting to deliver a superior digital banking experience for its clients. Its core banking applications run on the mainframe and Citizens was using legacy utilities to get the critical mainframe data to feed customer-facing channels, like call centers, web, and mobile. Ultimately, this led to higher operating costs (MIPS), delayed response times, and longer time to market.
Ever-changing customer expectations demand more modern digital experiences, and the bank needed to find a solution that could provide real-time data to its customer channels with low latency and operating costs. Join this session to learn how Citizens is leveraging Precisely to replicate mainframe data to its customer channels and deliver on their “modern digital bank” experiences.
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...alexjohnson7307
Predictive maintenance is a proactive approach that anticipates equipment failures before they happen. At the forefront of this innovative strategy is Artificial Intelligence (AI), which brings unprecedented precision and efficiency. AI in predictive maintenance is transforming industries by reducing downtime, minimizing costs, and enhancing productivity.
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...Alex Pruden
Folding is a recent technique for building efficient recursive SNARKs. Several elegant folding protocols have been proposed, such as Nova, Supernova, Hypernova, Protostar, and others. However, all of them rely on an additively homomorphic commitment scheme based on discrete log, and are therefore not post-quantum secure. In this work we present LatticeFold, the first lattice-based folding protocol based on the Module SIS problem. This folding protocol naturally leads to an efficient recursive lattice-based SNARK and an efficient PCD scheme. LatticeFold supports folding low-degree relations, such as R1CS, as well as high-degree relations, such as CCS. The key challenge is to construct a secure folding protocol that works with the Ajtai commitment scheme. The difficulty is ensuring that extracted witnesses are low norm through many rounds of folding. We present a novel technique using the sumcheck protocol to ensure that extracted witnesses are always low norm no matter how many rounds of folding are used. Our evaluation of the final proof system suggests that it is as performant as Hypernova, while providing post-quantum security.
Paper Link: https://eprint.iacr.org/2024/257
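To see why low-norm extraction is the crux, here is a rough sketch of a single Nova-style folding step under an additively homomorphic commitment cm; the notation is simplified and is not taken from the paper.

```latex
% One folding step with verifier challenge r (notation simplified):
w \;\leftarrow\; w_1 + r\, w_2,
\qquad
\mathrm{cm}(w) \;=\; \mathrm{cm}(w_1) + r \cdot \mathrm{cm}(w_2).
% Ajtai commitments are binding only for low-norm openings, and each
% fold can grow the witness norm:
\lVert w \rVert \;\le\; \lVert w_1 \rVert + \lvert r \rvert \, \lVert w_2 \rVert.
```

LatticeFold's sumcheck-based argument is what re-establishes a low-norm bound on the extracted witness after each round, which is why the scheme survives many folds where a naive lattice variant would not.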
5th LF Energy Power Grid Model Meet-up SlidesDanBrown980551
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Microsoft Teams session or in person at TU/e, located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
- Insightful presentations covering two practical applications of the Power Grid Model.
- An update on the latest advancements in Power Grid Model technology during the first and second quarters of 2024.
- An interactive brainstorming session to discuss and propose new feature requests.
- An opportunity to connect with fellow Power Grid Model enthusiasts and users.
Trusted Execution Environment for Decentralized Process MiningLucaBarbaro3
Presentation of the paper "Trusted Execution Environment for Decentralized Process Mining" given during the CAiSE 2024 Conference in Cyprus on June 7, 2024.
Dandelion Hashtable: beyond billion requests per second on a commodity serverAntonios Katsarakis
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state of the art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line chaining. This design (1) offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. On a commodity server with a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
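To make "bounded cache-line chaining" concrete, here is a simplified, single-threaded Python sketch of closed addressing with fixed-size buckets that chain on overflow. The slot count is illustrative, and none of DLHT's lock-free operations, software prefetching, or parallel resizing is modeled.

```python
# Simplified illustration of closed addressing with bounded bucket chaining.
# Real DLHT buckets are sized to a cache line and accessed lock-free;
# that concurrency machinery is deliberately omitted here.
BUCKET_SLOTS = 7  # illustrative: chosen so one bucket fits a cache line

class Bucket:
    def __init__(self):
        self.keys = [None] * BUCKET_SLOTS
        self.values = [None] * BUCKET_SLOTS
        self.next = None  # overflow bucket, forming a bounded chain

class ChainedTable:
    def __init__(self, n_buckets=1024):
        self.buckets = [Bucket() for _ in range(n_buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        b = self._bucket(key)
        while True:
            free = None
            for i in range(BUCKET_SLOTS):
                if b.keys[i] == key:           # update in place
                    b.values[i] = value
                    return
                if b.keys[i] is None and free is None:
                    free = i
            if free is not None:
                b.keys[free], b.values[free] = key, value
                return
            if b.next is None:
                b.next = Bucket()              # extend the chain
            b = b.next

    def get(self, key):
        b = self._bucket(key)
        while b is not None:
            for i in range(BUCKET_SLOTS):
                if b.keys[i] == key:
                    return b.values[i]
            b = b.next
        return None

    def delete(self, key):
        b = self._bucket(key)
        while b is not None:
            for i in range(BUCKET_SLOTS):
                if b.keys[i] == key:
                    b.keys[i] = b.values[i] = None  # slot freed instantly
                    return True
            b = b.next
        return False
```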
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfChart Kalyan
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
8. The Idea Dates Back to the 1960s. Virtualization was first widely deployed with IBM VM/370, which ran guest systems such as CMS and MVS (each hosting its own applications) on IBM mainframes. This is native (full) virtualization; a modern example is VMware ESX.
9. What Do You Optimize? Supercomputer model: minimize latency and control heat. Data center model: maximize data (with matching compute) and control cost.
14. What Resource is Managed? Supercomputer center model: manage cycles; scarce processors wait for data; jobs wait for an opening in the queue; data is scattered to the processors and the results are gathered. Data center model: manage data; persistent data waits for queries; computation is done locally and results are returned.
15. Part 2. Data Centers as the Unit of Computing. Cloud computing is at the top of the Gartner hype cycle. “Cloud computing has become the center of investment and innovation.” (Nicholas Carr, 2009 IDC Directions)
18. Transition Taking Place. A handful of players are building multiple data centers a year and improving with each one. This includes Google, Microsoft, Yahoo, … A data center today costs $200M to $400+M. The Berkeley RAD Lab report points out an analogy with the semiconductor industry: companies stopped building their own fabs and started leasing fabs from others as fab costs approached $1B.
19. Which is the Operating System? On a workstation, a hypervisor manages a handful of VMs; in a data center, a “data center operating system” manages tens of thousands of VMs (VM 1 … VM 50,000).
21. Some Programming Models for Data Centers. Operations over a data center of disks: MapReduce (“string-based”); User-Defined Functions (UDFs); SQL and quasi-SQL; data analysis / statistics. Operations over a data center of memory: grep over distributed memory; UDFs; SQL and quasi-SQL; data analysis / statistics over distributed memory.
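Since slide 21 centers on MapReduce as the canonical disk-oriented model, a toy word count makes the three phases concrete. The sketch below simulates map, shuffle, and reduce in plain Python; it is only an illustration of the programming model, not any particular framework's API.

```python
# Toy simulation of the map / shuffle / reduce phases (word count).
from collections import defaultdict

records = ["big data on clouds", "clouds of disks", "data on disks"]

# Map: each record emits (key, value) pairs.
def map_fn(record):
    for word in record.split():
        yield word, 1

# Shuffle: group all values by key (a framework normally does this).
groups = defaultdict(list)
for record in records:
    for key, value in map_fn(record):
        groups[key].append(value)

# Reduce: combine each key's values into a result.
def reduce_fn(key, values):
    return key, sum(values)

print(dict(reduce_fn(k, v) for k, v in groups.items()))
# {'big': 1, 'data': 2, 'on': 2, 'clouds': 2, 'of': 1, 'disks': 2}
```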
23. U.S. 501(c)(3) not-for-profit corporation. Supports the development of standards and interoperability frameworks; supports reference implementations for cloud computing; manages testbeds (Open Cloud Testbed, Intercloud Testbed, Open Science Data Cloud); develops benchmarks. www.opencloudconsortium.org
24. OCC Members. Companies: Aerospace, Booz Allen Hamilton, Cisco, InfoBlox, Open Data Group, Raytheon, Yahoo. Universities: CalIT2, Johns Hopkins, Northwestern, University of Illinois at Chicago, University of Chicago. Government agencies: NASA. Organizations: Sector Project.
33. Open Science Data Cloud, comprising a sky cloud and a biocloud. Planning to work with 5 international partners (all connected with 10 Gbps networks).
34. MalStone (OCC-Developed Benchmark). Sector/Sphere 1.20 and Hadoop 0.18.3 with no replication on Phase 1 of the Open Cloud Testbed in a single rack. The data consisted of 500 million 100-byte records per node across 20 nodes (roughly 1 TB in total).
35. Some Lessons Learned (So Far). Python over the Hadoop Distributed File System is surprisingly powerful. Tuning Hadoop can be a large (unacknowledged) cost. The performance of a cloud computation can be significantly impacted by just 1 or 2 nodes that are a bit slower. Wide area clouds can be practical in some cases.
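"Python over HDFS" presumably refers to patterns like Hadoop Streaming, in which the mapper and reducer are ordinary scripts that read stdin and write tab-separated key/value lines. The two sketches below show that contract for a word count; the script names are arbitrary.

```python
# mapper.py -- Hadoop Streaming contract: read lines on stdin,
# emit "key\tvalue" lines on stdout.
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```

```python
# reducer.py -- input arrives sorted by key, so counts can be
# accumulated in a single pass.
import sys

current, count = None, 0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t")
    if key != current:
        if current is not None:
            print(f"{current}\t{count}")
        current, count = key, 0
    count += int(value)
if current is not None:
    print(f"{current}\t{count}")
```

Such scripts would typically be launched with the hadoop-streaming jar, passing -mapper and -reducer; the exact invocation depends on the Hadoop version.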
36. Part 4. Sector. http://sector.sourceforge.net
37. Sector Overview. Sector is fast, as measured by MalStone & Terasort. Sector is easy to program: it supports UDFs, MapReduce & Python over streams, and does not require extensive tuning. Sector is secure: a HIPAA-compliant Sector cloud is being set up. Sector is reliable: Sector v1.24 supports multiple master node servers.
38. Google’s Large Data Cloud (Google’s stack): Applications; Compute Services: Google’s MapReduce; Data Services: Google’s BigTable; Storage Services: Google File System (GFS).
39. Hadoop’s Large Data Cloud (Hadoop’s stack): Applications; Compute Services: Hadoop’s MapReduce; Data and Storage Services: Hadoop Distributed File System (HDFS).
40. Sector’s Large Data Cloud (Sector’s stack): Applications; Compute Services: Sphere’s UDFs; Data Services: Sector’s Distributed File System (SDFS); Storage Services; Routing & Transport Services: UDP-based Data Transport Protocol (UDT).
41. Generalization: apply User-Defined Functions (UDFs) to files in a storage cloud, with UDFs playing the roles of the map/shuffle and reduce phases.
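Sphere's actual UDF interface is C++ and operates over data segments; the Python sketch below only illustrates the idea on slide 41 of shipping a user-defined function out to each file in a storage cloud and gathering the results. The directory layout and the line-counting UDF are hypothetical.

```python
# Conceptual sketch of the slide-41 model: apply a UDF to every file
# in a storage cloud and gather the results. (Sphere's real interface
# is C++ over segments; this is only an illustration of the idea.)
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

def udf(path: str) -> int:
    """User-defined function run 'near' each file: here, count lines."""
    with open(path) as f:
        return sum(1 for _ in f)

def apply_udf_to_cloud(files, fn, workers=4):
    # Each worker plays the role of a node processing its local files.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return dict(zip(files, pool.map(fn, files)))

if __name__ == "__main__":
    files = [str(p) for p in Path("data").glob("*.txt")]  # illustrative layout
    print(apply_udf_to_cloud(files, udf))
```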
42. Hadoop vs. Sector. Source: Gu and Grossman, Sector and Sphere, Phil. Trans. Royal Society A, 2009.
43. Terasort: Sector vs. Hadoop Performance. Sector/Sphere 1.24a and Hadoop 0.20.1 with no replication on Phase 2 of the Open Cloud Testbed with co-located racks.
44. Sector Applications. Distributing the 15 TB Sloan Digital Sky Survey to astronomers around the world (joint with JHU, 2005). Managing and analyzing high-throughput sequence data (Cistrack, University of Chicago, 2007). Detecting emergent behavior in distributed network data (Angle, won the SC 07 Analytics Challenge). Image processing for high-throughput sequencing. Wide area clouds (won the SC 09 Bandwidth Challenge with a 100 Gbps wide area computation). New ensemble-based algorithms for trees. Graph processing.
45. Cistrack architecture: Web Portal & Widgets; Elastic Cloud Services; Database; Analysis Pipelines & Re-analysis Services; Large Data Cloud Services; Ingestion Services.
46. Thank you. For more information, please see blog.rgrossman.com