Key Challenges in Cloud Computing and How Yahoo! is Approaching Them

From Raghu Ramakrishnan's presentation "Key Challenges in Cloud Computing and How Yahoo! is Approaching Them" at the 2009 Cloud Computing Expo in Santa Clara, CA, USA. Here's the talk description on the Expo's site: http://cloudcomputingexpo.com/event/session/510

Speaker Notes

  • Will begin by referring to Shelton’s talk, and saying that this talk is about the technical challenges in building and using the clouds he described
  • Point out that the key challenges cut across the different cloud components
  • Say that we begin with a closer look at data management issues
  • Note that this is the non-Mobstor case
  • Talk about what we’ve done, and explain how design choices address the key challenges
  • The “Free” update goes to the east coast first because of a network disruption, and, as it happens, it arrives before the original “Busy” notification is received. Now the west coast receives the corresponding “Free” notification while the east coast receives the original “Busy” notification. There is no priority between these two outcomes…
  • Yahoo! thought leadership in this space
  • Updates write the whole record
  • Yahoo! thought leadership in this space
  • Abstracts concerns of the underlying infrastructure and the network communication:
    • Virtualized hardware
    • Declarative application structure
    • End-to-end security
    • Standardized software stacks and packaging
    • Integrated service management
    • Continuous integration and deployment
    • Containers that service requests, as opposed to machines that run executables
    • Elastic serving of changing workloads
    • Controlled/intelligent traffic direction
    • Controlled execution environment
    • Managed communication
    • Service associations, bindings, and access controls
  • Common mechanism to orchestrate resources in the cloud, including nodes, storage devices, load balancers, security identities, and the services themselves (IETF proposal); also, moving services within and across clouds. Clouds may be partitioned into independently administered "regions" (environments in which services and resources can interact) to facilitate load balancing and ownership.
  • Over 3,500 transactions/second per YTS box. Ysquid is used in cases where there is a clear need based on features; it does not scale like YTS in most cases (runs on a single core, single threaded). YCPI accelerates dynamic content internationally?
  • US only. Much of Y! runs through YCS in the US; YCPI.
  • Quick plug for Jonathan’s talk—and make the point that while Yahoo!’s cloud is currently private, it enables externally available apps, including some that selectively expose cloud capabilities to outside developers

Key Challenges in Cloud Computing and How Yahoo! is Approaching Them: Presentation Transcript

  • Key Challenges in Cloud Computing … and the Yahoo! Approach. Vendor speech by Raghu Ramakrishnan, Cloud Computing Conference & Expo, November 3, 2009
  • Key Challenges
    • Elastic scaling
        • Typically, with commodity infrastructure
    • Availability
        • Trading consistency/performance/availability
    • Handling failures
        • What can be counted on after a failure?
    • Operational efficiency
        • Managing and tuning multi-tenanted clouds
    • The right abstractions
        • Data, security, and services in the cloud
        • Don’t forget failures!
  • Inside Yahoo!’s Cloud KEY CHALLENGES
  • DATA MANAGEMENT IN THE CLOUD
  • Help!
    • I have a huge amount of data. What should I do with it?
    (Systems pictured on the slide: DB2, UDB, Sherpa)
  • What Are You Trying to Do? Data workloads divide along several axes: OLTP (random access to a few records) vs. OLAP (scan access to a large number of records); read-heavy vs. write-heavy; organized by rows, by columns, or unstructured; plus combined workloads (some OLTP and OLAP tasks).
  • Yahoo! Solution Space: the same workload matrix populated with Yahoo!'s systems. On the serving (OLTP) side: UDS, UDB, STCache (main-memory), and Sherpa, covering read-heavy, read/write, and write-heavy workloads. On the analysis (OLAP) side: SQL on Grid, Zebra, Pig, and HDFS MapReduce, spanning row, column, and unstructured access. Combined OLTP/OLAP: ???
  • Storage
    • Common features:
      • Managed service, simple REST API
      • Replication, BCP, high availability
      • Global footprint
    • Sherpa (more later in this talk)
      • Key-value, structured data store
      • Size of objects: target is up to 100 KB
      • R/W target latency 10 – 20 ms
      • Secondary indexes, ordered tables, real-time update notifications
    • MObStor (Chuck Neerdaels, Wed, Session 3, @ 9am)
      • Large, unstructured online storage
      • Object size typically a few MB – up to 2 GB
      • R/W target latency 50 – 100 ms (depending on size)
    • Commodity storage (DORA)
    (A REST sketch of this style of API follows this list.)
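Both stores sit behind a simple REST API, so a client interaction reduces to plain HTTP verbs on table/key paths. The sketch below is a minimal illustration of that style only; the base URL, path layout, and whole-record JSON PUT are assumptions made for the example, not Yahoo!'s actual API.

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    /** Minimal sketch of a client for a Sherpa-like REST key-value store.
     *  The base URL and path scheme are hypothetical. */
    public class KvRestClient {
        private final String base; // e.g. "http://store.example.com/v1" (made up)

        public KvRestClient(String base) { this.base = base; }

        /** GET /{table}/{key}: returns the record body, or null if absent. */
        public byte[] get(String table, String key) throws IOException {
            HttpURLConnection c = (HttpURLConnection)
                    new URL(base + "/" + table + "/" + key).openConnection();
            if (c.getResponseCode() == 404) return null;
            try (InputStream in = c.getInputStream()) {
                return in.readAllBytes();
            }
        }

        /** PUT /{table}/{key}: writes the whole record, matching the
         *  speaker note that updates write the whole record. */
        public void set(String table, String key, String json) throws IOException {
            HttpURLConnection c = (HttpURLConnection)
                    new URL(base + "/" + table + "/" + key).openConnection();
            c.setRequestMethod("PUT");
            c.setDoOutput(true);
            c.setRequestProperty("Content-Type", "application/json");
            try (OutputStream out = c.getOutputStream()) {
                out.write(json.getBytes(StandardCharsets.UTF_8));
            }
            if (c.getResponseCode() >= 400) throw new IOException("PUT failed");
        }
    }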
  • Yahoo! Storage Problem
      • Small records – 100KB or less
      • Structured records – Lots of fields, evolving
      • Extreme data scale - Tens of TB
      • Extreme request scale - Tens of thousands of requests/sec
      • Low latency globally - 20+ datacenters worldwide
      • High Availability - Outages cost $millions
      • Variable usage patterns - Applications and users change
  • Typical Applications
    • User logins and profiles
        • Including changes that must not be lost!
          • But single-record “transactions” suffice
    • Events
        • Alerts (e.g., news, price changes)
        • Social network activity (e.g., user goes offline)
        • Ad clicks, article clicks
    • Application-specific data
        • Postings in message board
        • Uploaded photos, tags
        • Shopping carts
  • VLSD Data Serving Stores
    • Must partition data across machines
        • Can partitions be changed easily? (Affects elasticity)
        • How are read/update requests routed?
        • Range selections? Can requests span machines?
    • Availability: what failures are handled?
        • With what semantic guarantees on data access?
    • (How) Is data replicated?
        • Sync or async? Consistency model? Local or geo?
    • How are updates made durable?
    • How is data stored on a single machine?
  • The CAP Theorem
    • You have to give up one of the following in a distributed system (Brewer, PODC 2000; Gilbert/Lynch, SIGACT News 2002):
        • Consistency of data
          • Think serializability
        • Availability
          • Pinging a live node should produce results
        • Partition tolerance
          • Live nodes should not be blocked by partitions
  • Approaches to CAP
    • Use a single version of the DB, reconcile later
    • Defer transaction commit
    • Eventual consistency (e.g., Amazon Dynamo); see the version-vector sketch after this list
    • Restrict transactions (e.g., sharded MySQL)
        • 1-machine Xacts: objects in a transaction are all on the same machine
        • 1-object Xacts: a transaction can only read/write one object
    • Object timelines (SHERPA)
    http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
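Version vectors, mentioned again on the Sherpa consistency slide as the mechanism behind its weakened eventual-consistency mode, make the eventual-consistency option concrete: each replica bumps its own counter on a local write, so two replicas can later tell whether one update subsumes the other or whether they genuinely conflict. A minimal sketch in Java (illustrative, not any system's actual code):

    import java.util.HashMap;
    import java.util.Map;

    /** Version vector: one counter per replica/region (e.g. "west", "east"). */
    public class VersionVector {
        private final Map<String, Long> clock = new HashMap<>();

        /** A replica bumps its own entry when it accepts a local write. */
        public void increment(String region) {
            clock.merge(region, 1L, Long::sum);
        }

        /** True if this vector is >= the other on every entry,
         *  i.e. the other history is subsumed by this one. */
        public boolean dominates(VersionVector other) {
            for (Map.Entry<String, Long> e : other.clock.entrySet()) {
                if (clock.getOrDefault(e.getKey(), 0L) < e.getValue()) return false;
            }
            return true;
        }

        /** Neither dominates: like the Busy/Free writes in the Social Alice
         *  example, the updates are concurrent and the application (or a
         *  deterministic rule) must reconcile them. */
        public boolean concurrentWith(VersionVector other) {
            return !dominates(other) && !other.dominates(this);
        }

        /** After reconciling, keep the pointwise max so the merged record
         *  dominates both histories. */
        public void mergeFrom(VersionVector other) {
            other.clock.forEach((region, n) -> clock.merge(region, n, Long::max));
        }
    }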
  • One-Slide Hadoop Primer: data files in HDFS feed parallel map tasks, whose output is shuffled to reduce tasks that write results back to HDFS. Good for analyzing (scanning) huge files; not great for serving (reading or writing individual objects). A minimal example of the programming model follows.
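To make the primer concrete, here is the canonical word-count job written against Hadoop's Java MapReduce API (the standard introductory example, not code from the talk): map tasks scan their HDFS splits in parallel and emit (word, 1) pairs, and reduce tasks sum the counts per word.

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    /** Canonical word count: mappers scan file splits, reducers sum counts. */
    public class WordCount {
        public static class TokenizerMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(LongWritable offset, Text line, Context ctx)
                    throws IOException, InterruptedException {
                StringTokenizer it = new StringTokenizer(line.toString());
                while (it.hasMoreTokens()) {
                    word.set(it.nextToken());
                    ctx.write(word, ONE);  // shuffled to reducers by key
                }
            }
        }

        public static class SumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text word, Iterable<IntWritable> counts, Context ctx)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable c : counts) sum += c.get();
                ctx.write(word, new IntWritable(sum));
            }
        }
    }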
  • Out There in the World
    • OLTP
      • General
      • Write optimized
      • Main-memory
      • memcached*
    • OLAP
      • Row store
      • Column store
    • MapReduce
    • Combined OLAP/OLTP
      • BigTable
    • Other
      • Blobs
      • Information retrieval
    (Systems pictured also include HadoopDB, DB2, and a ??? placeholder. * memcached provides no native query processing.)
  • Ways of Using Hadoop: the OLAP portion of the data-workload matrix (scan access to a large number of records), spanning access by rows, by columns, and unstructured, with HadoopDB, SQL on Grid, and Zebra placed within it.
  • Hadoop-Based Applications @ Yahoo! (2008 vs. 2009)
    • Webmap: 2008: ~70 hours runtime, ~300 TB shuffling, ~200 TB output; 2009: ~73 hours runtime, ~490 TB shuffling, ~280 TB output, on +55% hardware
    • Terasort: 2008: 1 terabyte sorted in 209 seconds on 900 nodes; 2009: 1 terabyte in 62 seconds on 1500 nodes, and 1 petabyte in 16.25 hours on 3700 nodes
    • Largest cluster: 2008: 2000 nodes, 6 PB raw disk, 16 TB of RAM, 16K CPUs; 2009: 4000 nodes, 16 PB raw disk, 64 TB of RAM, 32K CPUs (40% faster CPUs too)
  • Batch Storage and Processing
    • Hadoop (Eric Baldeschwieler, Tue 4:50 pm, Session 9)
    • Runs data- and processing-intensive distributed applications on thousands of nodes
    • Data Processing
    • Map Reduce – Java API
    • Pig – Procedural programming language
    • PigSQL – SQL
    • Process Management
    • Workflow; services for registering & discovering published data sets, moving/archiving data, grid management, etc.
  • SHERPA: Yahoo!'s serving store (the system also known as PNUTS)
  • What is Sherpa? A parallel database with geographic replication; structured, flexible schemas; hashed and ordered tables; hosted, managed infrastructure. Example schema: CREATE TABLE Parts ( ID VARCHAR, StockNumber INT, Status VARCHAR, … ). (The slide shows a table of records replicated across regions, each tagged with its master region.)
  • Key Components
    • Tablet controller
      • Maintains the map from database.table.key to tablet to storage unit (SU)
      • Provides load balancing
    • Routers
      • Cache the maps from the tablet controller
      • Route client requests to the correct SU
    • Storage units
      • Store records
      • Service get/set/delete requests
  • Architecture (pictured): clients issue requests through a REST API to routers, which consult the tablet controller and forward each request to a storage unit in the local region; the Tribble message brokers carry updates to remote regions.
  • Updates (pictured, steps 1–8): a write for key k travels from the client through a router to the SU holding the record's master copy (1–2); that SU commits the write through the message brokers, which log it, assign a sequence number for key k, and asynchronously deliver the sequenced write to the SUs in other regions (3–6); SUCCESS flows back through the router to the client (7–8).
  • Accessing Data (pictured, steps 1–4): a get for key k goes from the client to a router (1), which forwards it to the SU holding the record (2); the record for key k is returned through the router (3) to the client (4).
  • ASYNCHRONOUS REPLICATION AND CONSISTENCY
  • Asynchronous Replication
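The brokered delivery described above means each replica applies a stream of per-record updates tagged with master-assigned sequence numbers. A toy sketch of the replica-side bookkeeping (illustrative only, not Sherpa's code): duplicates and stale deliveries are detected simply by comparing sequence numbers.

    import java.util.HashMap;
    import java.util.Map;

    /** Toy replica applying broker-delivered updates in per-record
     *  sequence order. Names are illustrative, not Sherpa's code. */
    public class Replica {
        private final Map<String, Long> appliedSeq = new HashMap<>();
        private final Map<String, byte[]> records = new HashMap<>();

        /** Called for each update the message broker delivers. Because the
         *  record's master assigned the sequence numbers, replays and
         *  out-of-order duplicates are easy to detect and skip. (A real
         *  system would also handle gaps in the sequence.) */
        public void onUpdate(String key, long seq, byte[] newValue) {
            long last = appliedSeq.getOrDefault(key, 0L);
            if (seq <= last) return;     // duplicate or stale delivery: ignore
            records.put(key, newValue);  // whole-record write, per the speaker notes
            appliedSeq.put(key, seq);
        }

        /** Local reads may return a version behind the record master. */
        public byte[] readLocal(String key) { return records.get(key); }
    }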
  • Consistency Model
    • If copies are asynchronously updated, what can we say about stale copies?
      • ACID transactions: Require synchronous updates
      • Eventual consistency: Copies can drift apart, but will eventually converge if the system is allowed to quiesce
        • To what value will copies converge?
        • Do systems ever “quiesce”?
      • Is there any middle ground?
  • Example: Social Alice. Alice's status record is replicated West and East. Alice logs on (status Busy), then goes Free; a network fault delivers the Free update to the East coast before the original Busy notification arrives. The two coasts now see the updates in different orders, with no priority between the outcomes; the record timeline on the slide ends with both copies at ???: which value should they converge to?
  • Sherpa Consistency Model: all updates to a record form a single timeline of versions (v. 1 through v. 8 within Generation 1 on the slide); a write advances the current version, while other copies may hold stale versions from earlier in the timeline. This is achieved via a per-record primary-copy protocol (to maximize availability, record masterships are automatically transferred if a site fails). It can be selectively weakened to eventual consistency (local writes that are reconciled using version vectors).
  • Sherpa Consistency Model (reads): in general, reads are served using a local copy, which may be a stale version from the record's timeline.
  • Sherpa Consistency Model (read up-to-date): but an application can request and get the current version. A sketch of these per-read choices follows.
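The published PNUTS design describes API variants along exactly these lines: a cheap read from any replica, a read constrained to a minimum version, a read from the master, and a test-and-set write for single-record transactions. The interface below is a hedged paraphrase of that idea; the names and signatures are illustrative, not Yahoo!'s client library.

    /** Sketch of per-record timeline access, after the PNUTS/Sherpa model.
     *  Names and signatures are illustrative only. */
    public interface TimelineStore {
        /** Fastest: any replica may answer, possibly with a stale version. */
        Record readAny(String table, String key);

        /** Only return a version >= minVersion (e.g. one the caller itself
         *  wrote), waiting or forwarding to a fresher replica if needed. */
        Record readCritical(String table, String key, long minVersion);

        /** Go to the record's master for the current version. */
        Record readLatest(String table, String key);

        /** Ordinary write: routed to the record's master, which assigns
         *  the next version on the record's timeline. */
        long write(String table, String key, byte[] value);

        /** Single-record "transaction": apply only if the caller saw the
         *  latest version; enough for the login/profile use cases above. */
        boolean testAndSetWrite(String table, String key,
                                long expectedVersion, byte[] value);

        /** Minimal record: a value plus its position on the timeline. */
        record Record(byte[] value, long version) {}
    }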
  • OPERABILITY
  • Distribution: distribution for parallelism; data shuffling for load balancing. (The slide shows a table of sample records, e.g. Bike / $86 / 6/2/07 / 636353, partitioned across Servers 1–4.)
  • Tablet Splitting and Balancing: each storage unit hosts many tablets (horizontal partitions of the table). Tablets may grow over time, and overfull tablets split. If a storage unit becomes a hotspot, it sheds load by moving tablets to other servers. (A toy sketch of the tablet map and split follows.)
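A sketch of the tablet-controller bookkeeping this slide implies: a sorted map from tablet boundary keys to serving storage units, with an overfull tablet split at a chosen key and the upper half reassigned. Illustrative only; it ignores the real system's movement of data and in-flight requests.

    import java.util.Map;
    import java.util.TreeMap;

    /** Toy tablet map for one ordered table: boundary key -> storage unit.
     *  Illustrative only; a real tablet controller also moves the data. */
    public class TabletMap {
        // Each entry covers the key range [boundaryKey, nextBoundaryKey).
        private final TreeMap<String, String> tabletToSu = new TreeMap<>();

        public TabletMap(String initialSu) {
            tabletToSu.put("", initialSu); // one tablet covers the whole key space
        }

        /** Routers answer "which SU serves this key?" with a floor lookup. */
        public String lookup(String key) {
            return tabletToSu.floorEntry(key).getValue();
        }

        /** Split an overfull tablet at splitKey and hand the upper half to
         *  another SU, shedding load from a hotspot. */
        public void split(String splitKey, String newSu) {
            Map.Entry<String, String> owner = tabletToSu.floorEntry(splitKey);
            if (owner == null || splitKey.equals(owner.getKey())) return; // already a boundary
            tabletToSu.put(splitKey, newSu); // keys >= splitKey now route to newSu
        }
    }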
  • Mastering (pictured): each replicated record carries its master region (the E/W/C column next to each record); the sequence of grids shows a record's mastership being reassigned (e.g., record B moving from W to E) while the data stays replicated everywhere.
  • Record vs. Tablet Master (pictured): the record master serializes updates to an existing record, while the tablet master serializes inserts of new records into the tablet.
  • Coping With Failures (pictured): when a region fails (the X on the slide), record masterships held there are overridden and transferred (OVERRIDE W -> E) so that updates can continue in a surviving region.
  • COMPARING SYSTEMS
  • Example
    • User database
    • Workload: read-heavy OLTP (random access to a few records), but with occasional writes
    • Sample schema: UserID, Fname, Lname, Country, Sex, Birthday, MailLoc, Avatar, Passwd, … (rows such as bradtm / Brad / McMillen / USA / M / 7/1/74 / /m2/u2 / guy3.gif)
  • Which System is Better?
    • Likely will work well (e.g., UDB, Sherpa):
      • Row oriented
      • Supports random writes
      • Scale-out via partitioning
    • Likely won’t work well:
      • Scans of large files, not point access
      • Scans of columns, not point access
      • Scans of large tables, not point access
      • Write, not read, optimized
  • Why System X?
    • Example: MySQL/Oracle/Sherpa
          • Page-based buffer manager
          • Page management is optimized for random access
    (Diagram: a MySQL node whose B-tree index reaches disk pages cached through a RAM read buffer.)
  • Why Not System Y?
    • Example: Cassandra
        • Write optimized, not read optimized
        • Updates are written sequentially (to a log and an in-memory memtable, later flushed to SSTable files)
        • Reads must reconstruct the record from multiple places on disk (several SSTable files)
    (Diagram: a Cassandra node with a log and memtable in RAM and multiple SSTable files on disk.)
  • Comparison Overview
    • Setup
      • Six server-class machines
        • 8 cores (2 x quadcore) 2.5 GHz CPUs, RHEL 4, Gigabit ethernet
        • 8 GB RAM
        • 6 x 146GB 15K RPM SAS drives in RAID 1+0
      • Plus extra machines for clients, routers, controllers, etc.
    • Workloads
      • 120 million 1 KB records = 20 GB per server
      • Write heavy workload: 50/50 read/update
      • Read heavy workload: 95/5 read/update
    • Metrics
      • Latency versus throughput curves
  • Preliminary Results: latency versus throughput curves for Sherpa and the other systems under test (charts not reproduced in this transcript).
  • YCS Benchmark
    • Will distribute a Java application, plus an extensible benchmark suite
      • Many systems have Java APIs
      • Non-Java systems can support REST
    • Scenario file: R/W mix, record size, data set
    • Command-line parameters: DB to use, target throughput, number of threads
    • Architecture (pictured): client threads inside the YCSB client run a scenario executor and stats collection against a pluggable DB client talking to the cloud DB; extensible both to plug in new clients and to define new scenarios. (A sketch follows.)
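This benchmark was later released as YCSB. Below is a hedged, single-threaded sketch of the core just described: execute a read/update mix against whatever DB adapter is plugged in and record latency per operation (the real suite runs many client threads at a target throughput). Class and method names are invented for illustration, not the released tool's exact API.

    import java.util.Random;

    /** Sketch of a YCSB-style workload core: a read/update mix against a
     *  pluggable DB adapter. All names here are illustrative. */
    public class WorkloadDriver {
        /** Adapter each cloud DB implements (via its Java API or REST). */
        public interface DbClient {
            byte[] read(String table, String key);
            void update(String table, String key, byte[] value);
        }

        public static void run(DbClient db, double readFraction,
                               long operations, int recordCount) {
            Random rnd = new Random(42);
            byte[] payload = new byte[1000];   // ~1 KB records, as in the setup slide
            long totalNanos = 0;

            for (long i = 0; i < operations; i++) {
                String key = "user" + rnd.nextInt(recordCount);
                long start = System.nanoTime();
                if (rnd.nextDouble() < readFraction) {
                    db.read("usertable", key);            // e.g. 0.95 for the read-heavy mix
                } else {
                    db.update("usertable", key, payload); // the remaining 5% are updates
                }
                totalNanos += System.nanoTime() - start;
            }
            System.out.printf("avg latency: %.2f ms%n", totalNanos / 1e6 / operations);
        }
    }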
  • Benchmark Tiers
    • Tier 1: Cluster performance
      • Tests fundamental design independent of replication/scale-out/availability
    • Tier 2: Replication
      • Local replication
      • Geographic replication
    • Tier 3: Scale out
      • Scale out the number of machines by a factor of 10
      • Scale out the number of replicas
    • Tier 4: Availability
      • Kill system components
    If you’re interested in this, please contact me or cooperb@yahoo-inc.com
  • EDGE, CLOUD SERVING, AND MORE
  • Cloud Serving: The Integrated Cloud. Big idea: a declarative language for specifying the full, end-to-end structure of a service. Key insight: this full, end-to-end structure spans multiple environments: development, multiple testing environments, alpha and bucket-testing environments, and production. Central mechanism: the Integrated Cloud, which (re)deploys these specifications. (Surendra Reddy, 2:20 pm, Session 7)
  • Foundational Components (OpenCAP: Open Cloud Access Protocol)
    • An abstract type system for describing the application protocol in an implementation-neutral manner
    • Format descriptions for resources, entrypoints, bindings, etc.
    • Application-layer protocols for establishing the user's account in a region or across regions, and for managing services and moving them across regions
    • A security model describing trust relationships between participating entities, and guidelines for the use of existing authentication and confidentiality mechanisms
    • An application-layer protocol for identifying entities and requesting information about them
    • Allocation, provisioning, and metering of cloud resources
    • Definition, deployment, and life-cycle management of cloud services
    • Virtual machine management
    • Metadata/registry for cloud services
  • Edge Services
    • Yahoo! Traffic Server (Chuck Neerdaels, Wed, Session 3, @ 9am)
    • Highly scalable – over 3,500 req/sec per YTS box
    • Plug-in architecture
    • Core for multiple edge products
    • Caching Services
    • E.g., YCS
    • Web Acceleration, DNS and Load Balancing
    • YCPI: Connection proxy that reduces serving latencies
    • Brooklyn: highly available, location-aware DNS
    • Bronx: Application level load balancer
  • YCS Scale (traffic chart): YCS handles so much Yahoo! traffic, this is noise!
  • YQL
    • Thousands of web services and sources provide valuable data
    • They require developers to read documentation and form URLs/queries
    • Data is isolated
    • Data needs filtering, combining, tweaking, and shaping even after it gets to the developer
  • YQL
    • Hosted web service and SQL-like language
        • Familiar to developers
          • Synonymous with data access
        • Expressive enough to get the right data
    • Self-describing: show, desc table
    • Allows you to query, filter, join, and update data across any structured data on the web / web services
        • And Yahoo!'s Sherpa cloud storage
    • Real-time engine
    (Jonathan Trevor, Session 3, 9:10 am; a sketch of calling YQL follows)
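As an illustration, a single YQL statement replaces the per-source URL spelunking described above. The sketch below issues a query to YQL's public REST endpoint of the era (query.yahooapis.com, long since retired); the table name is from YQL's documented examples as best recalled, so treat the specifics as illustrative.

    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.net.URLEncoder;
    import java.nio.charset.StandardCharsets;

    /** Sketch of calling the (since retired) public YQL endpoint.
     *  Endpoint and table name are historical/illustrative. */
    public class YqlExample {
        public static void main(String[] args) throws Exception {
            String yql = "select * from search.web where query=\"cloud computing\"";
            String url = "http://query.yahooapis.com/v1/public/yql?format=json&q="
                    + URLEncoder.encode(yql, StandardCharsets.UTF_8);
            HttpURLConnection c = (HttpURLConnection) new URL(url).openConnection();
            try (InputStream in = c.getInputStream()) {
                System.out.write(in.readAllBytes()); // raw JSON results
            }
        }
    }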
  • New in 2010!
    • SIGMOD and SIGOPS are starting a new annual conference (co-located with SIGMOD in 2010):
    • ACM Symposium on Cloud Computing (SoCC)
    • PC Chairs: Surajit Chaudhuri & Mendel Rosenblum
    • General Chair: Joe Hellerstein; Treasurer: Brian Cooper
    • Steering committee: Phil Bernstein, Ken Birman, Joe Hellerstein, John Ousterhout, Raghu Ramakrishnan, Doug Terry, John Wilkes
  • Yahoo! sessions at the Expo:
    • Tuesday 11/3, 9:10 am – 9:55 am: Yahoo! Query Language – YQL: Select * from Internet [Including Live Demo!], Dr. Jonathan Trevor, Senior Architect
    • Tuesday 11/3, 2:20 pm – 3:05 pm: Walking through Cloud Serving at Yahoo!, Surendra Reddy, VP, Integrated Cloud and Visualization
    • Tuesday 11/3, 4:50 pm – 5:35 pm: Hadoop @ Yahoo! – Internet Scale Data Processing, Eric Baldeschwieler, VP, Hadoop Software Development
    • Wednesday 11/4, 9:10 am – 9:55 am: Yahoo! Scalable Storage and Delivery Services, Chuck Neerdaels, VP, Storage and Edge Services
    Visit booth #103 to talk with Yahoo! engineers and learn more about Yahoo!'s vision for cloud computing.