Cloud Computing @Yahoo!
  • A little bit about me
    • Over 10 years in the industry
    • 3 years in the cloud
    • Former GigaSpaces Cloud Technical Director
    • Leading Yahoo Integrated Cloud product line
    • Based in Yahoo’s headquarters at Sunnyvale, CA
  • Gartner presentation?? May
  • Up arrow – more updates at the same quality; this can only be done by moving the curve. Right arrow – more quality without losing features. Quicker at the same quality. Y axis = “Agility”, measured by how many releases we can have in a year. X axis = “Quality”, measured by uptime.
  • Why Pyramid?
    • Everything is targeted toward the users
    • Every $ you spend in the lower layers yields $$$ at the upper layers
    • Every layer is built on top of the one below, and adds only the “marginal” development effort
    • Yahoo!’s unique cloud play: while other companies have focused on building out and exposing services at the infrastructure level (in terms of external products – think Amazon), Yahoo! has decided to focus on building out and exposing what we call our “functional cloud services”. These are services aimed at helping developers, partners, advertisers and Yahoo! itself create and deliver consumer-focused product innovations by creating open services with access to Yahoo! data and computing resources. This is not to say we don’t invest and lead in infrastructure cloud technology – in fact, with Hadoop and other tech that drives the Internet, we do lead – but rather that on that front our focus is not on productization of infrastructure as much as on open source and industry/academic collaboration. More like basic research.
    • YQL: The YQL platform provides a mediator service that enables developers to query, filter and combine data across Yahoo! and beyond. YQL exposes a SQL-like SELECT syntax that is both familiar to developers and expressive enough for getting the right data. Through the SHOW and DESC commands we attempt to make YQL self-documenting, enabling developers to discover the available data sources and structure without opening another web browser or reading a manual. The YQL Web Service exposes just a single URL, http://query.yahooapis.com/v1/?q=[command], that is compiled for each query. We perform rudimentary analysis on the query to determine how to factor it across one or more web services. As much of the query as possible is reworked into Yahoo! web service REST calls, and the remaining aspects are performed by the YQL service itself.
    • Think of a better name for “horizontal”
  • Open PaaS for building scalable web sites, NOT IaaS. Better wording than “structured”
  • Inquisitor: Today, Yahoo! Search is embracing the Mac community and offering similar search assistance features with the acquisition of Inquisitor software, a Safari browser plug-in. Inquisitor 3, a search technology that auto-completes queries and delivers results right in the Safari web browser, is similar to Yahoo!’s existing Search Assist technology. Simply type in your query and websites will appear immediately, as well as suggestions for refining your search. Just as with Search Assist, the goal with Inquisitor is to help users find exactly the site they’re looking for as quickly as possible.
  • Yahoo Infrastructure Cloud is a comprehensive bundle of services (not only Hadoop)
    • “Simple APIs” = integrated and simple API to “consume” the core services, targeted at functional services developers
    • Unstructured storage (blobs) = images, text, video, binaries, …
    • Structured storage (information) = very dynamic, queryable
    • Online Serving – a key aspect of the “power of Data within Yahoo”
    • Foundations on which tenants build functional services
      • Not tied to specific app logic
      • Partially provide the ability to inject application logic through well-defined APIs
      • Broadly applicable
    • Fault-tolerant over commodity hardware
      • Built using inexpensive commodity hardware, and should mask component failures
    • The Integrated Cloud is the key
      • Loosely coupled services that collectively make it easy to quickly develop and operate functional services
  • Commodity HW + horizontal scaling
    • Add inexpensive servers with JBODs
    • Storage servers and their disks are not assumed to be highly reliable and available
    • Use replication across servers to deal with unreliable storage/servers
    • Metadata-data separation – simple design
      • Storage scales horizontally
      • Metadata scales vertically (today)
    • Slightly restricted file semantics
      • Focus is mostly sequential access
      • No file locking features
    • Support for moving computation close to data
      • i.e. servers have 2 purposes: data storage and computation
    • Simplicity of design is why a small team could build such a large system in the first place
  • BOSS: BOSS (Build your Own Search Service) is Yahoo!'s open search web services platform. The goal of BOSS is simple: to foster innovation in the search industry. Developers, start-ups, and large Internet companies can use BOSS to build and launch web-scale search products that utilize the entire Yahoo! Search index. BOSS gives you access to Yahoo!'s investments in crawling and indexing, ranking and relevancy algorithms, and powerful infrastructure. By combining your unique assets and ideas with our search technology assets, BOSS is a platform for the next generation of search innovation, serving hundreds of millions of users across the Web.
  • YQL: The YQL platform provides a mediator service that enables developers to query, filter and combine data across Yahoo! and beyond. YQL exposes a SQL-like SELECT syntax that is both familiar to developers and expressive enough for getting the right data. Through the SHOW and DESC commands we attempt to make YQL self-documenting, enabling developers to discover the available data sources and structure without opening another web browser or reading a manual. The YQL Web Service exposes just a single URL, http://query.yahooapis.com/v1/?q=[command], that is compiled for each query. We perform rudimentary analysis on the query to determine how to factor it across one or more web services. As much of the query as possible is reworked into Yahoo! web service REST calls, and the remaining aspects are performed by the YQL service itself.
  • YQL Console – DEMO? http://developer.yahoo.com/yql/console/
  • YDN is the new thing, exposing cloud-based services to the outside world. SaaS-like services (the upper layer) have been around at Yahoo for a long time. We need to work with YDN to crystallize the difference between web services and functional cloud services.
  • M45: more than 27 trillion calculations per second
    • Universities: Carnegie Mellon University, the University of California at Berkeley, Cornell University, the University of Massachusetts at Amherst
  • Open Cirrus
    • Recently joined: the first Eastern European institution, the Russian Academy of Sciences; the Korean Electronics and Telecommunications Research Institute (ETRI); the Malaysian Institute of Microelectronic Systems (MIMOS)
    • Also listed: the University of Illinois at Urbana-Champaign; the Infocomm Development Authority (IDA) in Singapore; the Karlsruhe Institute of Technology (KIT) in Germany
  • Q: What is Yahoo!’s cloud computing infrastructure and how is it unique?
    A: As one of the largest providers of consumer Internet services in the world, Yahoo!’s cloud operates at virtually unprecedented scale, making it a unique environment and testing ground for cloud computing technologies. Yahoo! has more than 500M unique users per month across the world. We store and deliver hundreds of petabytes of data, hundreds of billions of objects, and hundreds of thousands of requests/sec. All of this activity is processed across a diverse footprint of distributed data centers while seamlessly balancing highly variable usage patterns across a global audience at low latencies. Almost no other company can boast of having to tune its infrastructure to deal with such a range of technical requirements and high standards of performance. To meet this challenge, Yahoo!’s cloud includes a collection of infrastructure and functional services targeted at dramatically improving the company’s efficiency throughout the entire product development cycle, from gathering user feedback and insight, to feature testing and iteration, to ongoing product operations.
  • Q: How should we categorize Yahoo!’s Cloud Computing offerings? Is it IaaS, PaaS, SaaS or others?
    A: For the time being, Yahoo!’s cloud computing focus is on its internal offerings, in service of making the Yahoo! experience as extraordinary, effective and productive as possible for consumers and advertisers across the world. We see this as a multi-year effort that will provide significant advantages for Yahoo! now and in the future. Over time, we can envision exposing some of these cloud technologies and services externally. In such a scenario, we’d likely focus on more “functional” cloud services (more PaaS than IaaS) that could help developers leverage Yahoo!’s massive scale to innovate and deliver new, more richly integrated user experiences.
  • Q: What are Yahoo!’s future plans in cloud computing?
    A: We are investing in further building out and deploying cloud computing technologies and services across the global Yahoo! operation so as to help our product teams innovate faster and deliver high-quality experiences to our customers across the globe. One area where this effort has been particularly notable is in the development and delivery of the Yahoo! Open Strategy and Y!OS platform. Overall, we will continue to actively collaborate with the industry, academia and the open source community, including through our Open Cirrus consortium, involvement with Hadoop and Pig, and support of Apache. Over time, we may consider exposing our cloud services in a more comprehensive manner through the Yahoo! Developer Network, which serves as Yahoo!’s front door for third parties seeking to engage with our developer tools and web services, including such popular offerings as BOSS, Flickr, YQL, YUI, and Y!OS. However, we have nothing specific to share at this time.

Cloud Computing @Yahoo! Presentation Transcript

  • Cloud Computing @Yahoo! Dekel Tankel Director, Product Management Yahoo! Cloud Computing [email_address] IGT, June 2009
  • What we’ll cover today…
    • Why Cloud?
      • Scale and Abstraction; Quality and Agility
      • Yahoo!’s unique footprint
    • Yahoo!’s Cloud Strategy
      • Overview of the Yahoo! Cloud vision and portfolio
      • Deep dive on Horizontal & Functional Cloud Services
    • The Yahoo! Open Strategy
      • Marrying Yahoo!’s “Open Strategy”, its platforms and ethic with external Cloud services
  • Why Cloud? Benefits for Yahoo!
    • Higher Agility & Stability while maintaining Scale
    • Abstraction
      • Enable developers to focus on their applications, not infrastructure
    • Accelerating innovation
      • Adding new features and products at an ever faster rate
    • Increasing Scale & Availability
      • More robustly, more globally, more completely, for a given budget
    Cloud is pushing up the Operational Excellence Curve (chart axes: Agility & Innovation, Quality & Stability)
  • Yahoo!’s Unique Cloud: Unprecedented Scale
    • Massive user base and engagement
      • 500M+ unique users per month
      • Hundreds of petabytes of storage
      • Hundreds of billions of objects
      • Hundreds of thousands of requests/sec
    • Global
      • Tens of globally distributed data centers
      • Serving each region at low latencies
    • Challenging Users
      • Rapidly extracting value from voluminous data
      • Downtime is not an option (outages cost $millions)
      • Variable usage patterns
  • Yahoo! Cloud Services (pyramid diagram, top to bottom; arrow label: ROI & Innovation)
    • Users
    • Applications
    • Functional Cloud Services: Y!OS, BOSS, YQL, APT, Analytics, …
    • Horizontal Cloud Services: Storage, Batch, Edge Serving, …
    • Physical Layer
  • Yahoo! Cloud Services: Focus on PaaS offerings (pyramid diagram: Users, Applications, Functional Cloud Services, Horizontal Cloud Services, Physical Layer; arrow label: ROI & Innovation; side labels: IaaS, PaaS, SaaS)
  • From Infrastructure to Shareholders benefit
    • Horizontal Cloud
      • Focus on open source and collaborative R&D with industry, academia and government
    • Functional Cloud
      • Focus on developing "open strategy" frameworks, tools and services for developers (at Yahoo! and beyond)
    • Combined Together
      • Leverage our unique scale, assets and data to drive disruptive innovations in the market and expand Yahoo!’s competitive differentiation
  • Yahoo! Cloud Strategy in Action: The Front Page Case Study
    • Horizontal Cloud – Storage & Hadoop
      • Analyze extremely large content data sets
    • Functional Cloud – Content Optimization
      • Rate content items based on various parameters
    • Applications – Yahoo’s Front Page
      • Display “high rating” items to the right users
      • Benefit consumers and advertisers and grow Yahoo!’s revenue
  • Yahoo! Cloud Strategy in Action: The Inquisitor Case Study
    • Horizontal Cloud – Hadoop
      • Analyze large search-index data sets
    • Functional Cloud - BOSS
      • Expose the data in a structured, open, flexible and “cloud like” way
    • Applications - iPhone™ Inquisitor
      • Leverage BOSS to provide innovative consumer experience
      • Benefit consumers and grow Yahoo!’s revenue
  • Horizontal Cloud Services (pyramid diagram: Users, Applications, Functional Cloud Services, Horizontal Cloud Services, Physical Layer; arrow label: ROI & Innovation)
  • Horizontal Cloud Services
    • Optimized for Yahoo!-scale
      • Yahoo!-internal focus
      • Data processing and serving environments
    • Drive faster innovation and agility
      • Shorter product development cycles
      • Reduce labor and costs for infrastructure
    • Multi-year effort
      • Strategic investment across the company
  • Horizontal Cloud Services: Conceptual View
    • Simple APIs
    • ID & Account Management
    • Provisioning & Virtualization (Xen)
    • Operational Storage: structured, unstructured
    • Batch Storage & Processing: Hadoop, Pig
    • Edge Content Services: caching, proxies
    • Online Serving: web, data
    • Security and Authentication
    • Metering, Billing
    • Monitoring & QoS
    • Shared Infrastructure
    • Common approaches to QA, Production Engineering, Performance Engineering, Datacenter Management, and Optimization
  • Horizontal Cloud Services: Use Cases
    • Ads Optimization
    • Content Optimization
    • Search Index
    • Image/Video Storage & Delivery
    • Machine Learning (e.g. spam filters)
    • Attachment Storage
  • Yahoo! Distribution of Hadoop
    • Hadoop in a nutshell
      • Open source distributed file system & parallel execution environment to process massive amounts of data
      • Started in 2005, became top-level Apache project in 2008
      • Simple Design for Horizontal Scaling on commodity HW
    • Yahoo! Distribution of Hadoop
      • Source distribution of Yahoo!’s implementation of Hadoop (based entirely on code found in Apache Hadoop)
      • Tested and deployed at Yahoo!’s massive scale
      • Benefit the larger ecosystem, increase the pace of innovation
      • http://developer.yahoo.com/hadoop
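
The "parallel execution environment" described on this slide follows the map/shuffle/reduce pattern. The sketch below is a single-process illustration of that pattern in plain Python, not Hadoop's API; the sample log lines and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical sample input: one search query per log line.
LOG_LINES = [
    "hadoop tutorial",
    "yahoo mail",
    "Hadoop tutorial",
    "cloud computing",
    "yahoo mail",
    "hadoop tutorial",
]


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (normalized query, 1) pair for each log line."""
    yield line.strip().lower(), 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence, in one process."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


query_counts = run_job(LOG_LINES)
print(query_counts)
```

In a real cluster the map and reduce calls would run on many nodes in parallel and the shuffle would move data between them; here everything runs locally so the data flow is easy to follow.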
  • Yahoo! runs the largest Hadoop Clusters in the World
    • 25,000+ nodes
      • Clusters of up to 4,000 nodes
    • 4 Tiers of clusters
      • Development & Testing, POCs, Science & Research, Production
    • Terasort Benchmarks
      • 62 seconds to sort One Terabyte (run on 1,500 nodes)
      • 16.25 hours to sort One Petabyte (run on 3,700 nodes)
    • Webmap application
      • ~490 TB shuffling
      • ~280 TB output
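
To put the Terasort figures in perspective, the sort rates they imply can be worked out directly. This is a rough sketch that assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and the rounded node counts quoted on the slide; the helper name `throughput` is illustrative.

```python
TB = 10**12  # assumption: decimal terabyte
PB = 10**15  # assumption: decimal petabyte


def throughput(bytes_sorted: float, seconds: float, nodes: int):
    """Return (aggregate bytes/sec, bytes/sec per node) for a sort run."""
    aggregate = bytes_sorted / seconds
    return aggregate, aggregate / nodes


# 1 TB in 62 seconds on 1,500 nodes
tb_aggregate, tb_per_node = throughput(TB, 62, 1500)

# 1 PB in 16.25 hours on 3,700 nodes
pb_aggregate, pb_per_node = throughput(PB, 16.25 * 3600, 3700)

print(f"1 TB run: {tb_aggregate / 1e9:.2f} GB/s total, {tb_per_node / 1e6:.2f} MB/s per node")
print(f"1 PB run: {pb_aggregate / 1e9:.2f} GB/s total, {pb_per_node / 1e6:.2f} MB/s per node")
```

Under these assumptions both runs sort at roughly 16 to 17 GB/s in aggregate, while the per-node rate in the petabyte run is a bit under half that of the terabyte run.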
  • Case Study - Search Assist™
    • Database for Search Assist™ is built using Hadoop.
    • 3 years of log-data, 20-steps of map-reduce
      • Leverage Hadoop’s scalability, load balancing and resiliency
      • Simplified access, flexibility for rapid innovation (from C++ to Python)
    Metric           | Before Hadoop | After Hadoop
    Time             | 26 days       | 20 minutes
    Development time | 2-3 weeks     | 2-3 days
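
The run-time improvement in this case study (26 days before Hadoop, 20 minutes after) can be expressed as a single speedup factor. A small worked calculation, using only those two figures:

```python
MINUTES_PER_DAY = 24 * 60

before_minutes = 26 * MINUTES_PER_DAY  # 26 days before Hadoop
after_minutes = 20                     # 20 minutes after Hadoop

speedup = before_minutes / after_minutes
print(f"Run time dropped from {before_minutes} to {after_minutes} minutes: {speedup:.0f}x faster")
```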
  • Functional Cloud Services ROI & Innovation Users Applications Functional Cloud Services Horizontal Cloud Services Physical Layer
  • Functional Cloud Services
    • Provides functional capabilities for applications
      • Help developers to accomplish integrated web experiences in a faster and easier way
      • Provides common set of functional “building blocks”
    • “Powered by” the horizontal cloud services
      • Abstracts infrastructure services from the Application
        • E.g. Storage, Compute, Serving, Robustness and Scalability
      • Self-Served, Global, Managed, Elastic and Metered
  • Functional Cloud Services: YQL & BOSS
    • Yahoo! Query Language (YQL): a single endpoint service that enables developers to query, filter and combine data across Yahoo! and beyond
      • http://developer.yahoo.com/yql/console/
    • Build your Own Search Service (BOSS): providing Yahoo! Search infrastructure and technology to developers and companies to help them build their own search experiences
      • http://developer.yahoo.com/search/boss/
  • Build your Own Search Service (BOSS)
    • Yahoo!'s open search web services platform
      • Serving hundreds of millions of users across the Web.
    • Goal: foster innovation in the search industry
      • Build and launch web-scale search products that utilize the entire Yahoo! Search index.
      • Access to Yahoo!'s investments in crawling and indexing, ranking and relevancy algorithms
  • Yahoo! Query Language (YQL)
    • Single endpoint service to query, filter and combine data across Yahoo! and beyond
      • The “Internet API”
    • SQL-like SELECT syntax for getting the right data
      • Quickly discover available data sources and structure
      • Combine data from a single web browser
    • Easy-to-use Console
      • http://developer.yahoo.com/yql/console/
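
As a rough illustration of the "single endpoint" idea, the sketch below only builds query URLs; it sends no request. It assumes the endpoint form quoted in the speaker notes (http://query.yahooapis.com/v1/?q=[command]); the table name `example.table` and the helper names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: endpoint form taken from the speaker notes, not verified here.
YQL_ENDPOINT = "http://query.yahooapis.com/v1/"


def build_yql_url(statement: str) -> str:
    """URL-encode a YQL statement into the endpoint's single `q` parameter."""
    return YQL_ENDPOINT + "?" + urlencode({"q": statement})


def extract_statement(url: str) -> str:
    """Recover the YQL statement from a URL built by build_yql_url."""
    return parse_qs(urlsplit(url).query)["q"][0]


# Discovery command mentioned in the notes (SHOW), then a SELECT.
# `example.table` is a hypothetical table name used only for illustration.
SELECT_STATEMENT = "SELECT * FROM example.table WHERE query = 'cloud computing'"

show_url = build_yql_url("SHOW TABLES")
select_url = build_yql_url(SELECT_STATEMENT)

print(show_url)
print(select_url)
```

Every command, whether SHOW, DESC or SELECT, goes through the same URL and differs only in the value of `q`, which is the point of the single-endpoint design.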
  • Y!OS and Cloud
  • Yahoo! Open Strategy (Y!OS): Goals
  • Y!OS and Cloud Strategy CLOUD SERVICES
  • Open Collaborations around the globe
    • M45 - Yahoo!’s supercomputing cluster
      • 4,000 cores, 3 TB RAM, 1.5 PB disks, 27 teraflops!
      • Operational since November 2007, 4 major Universities
      • Focus on highly parallel computing
    • Open Cirrus™ with HP & Intel
      • A global, multi-data center, open source test bed
      • Target to advance cloud computing research & education
      • Simulates a real-life, Internet-scale environment
      • 9 Global sites, more than 50 research projects
  • Questions? Dekel Tankel Director, Product Management Yahoo! Cloud Computing [email_address]