Svccg 2011-05-12
Slides for my talk at the Silicon Valley Cloud Computing Group Meetup on May 12 2011 at Yahoo's campus in Sunnyvale.

Published in: Technology, Business

3 Comments
  • No downloads for now - sorry. I may want to tweak the slides to add in a few attributions.

    Don't get too hung up on the 'Knowledge as a Service' tag. It's just an internal term referring to certain shared aspects of the SOA architecture for Yahoo applications.
  • Geoff Arnold presented the slides to Silicon Valley Cloud Computing Group May 12, 2011. Great meeting with 200+ folks at Yahoo last night.

    Visit and join us at http://www.meetup.com/cloudcomputing.

    Video recording of the talk is available at http://www.ustream.tv/channel/silicon-valley-cloud-computing-group.

    Other presentations from SVCCG are available on SlideShare at http://www.slideshare.net/group/silicon-valley-cloud-computing-group
  • Cool,

    If downloads were enabled, that would be even better.

    Thanks & Good w/e

    Henry
Transcript

  • 1. Cloud Computing at Yahoo! Lessons, Challenges and Futures
    Geoff Arnold
    May 12, 2011
  • 2. Abstract
    Yahoo operates a wide range of global and regional web services, from email to photo sharing, from sports and business news to ecommerce, from social networks to education. Our properties range in size from global behemoths like Yahoo Mail to hyper-local startups. The numbers are huge. Yahoo sites attract more than 680 million users; our Super Bowl coverage generated 37 million clicks, and there were 1.2 billion page views of our Oscars site. And this scale carries through to our technology: we analyze 100 billion events every day, and we’re rolling out 135 websites on our new media technology platform this year.

    All of this is focused on one end: to deliver the premier digital experience to our users. And this leads to an interesting challenge: how can we apply all of our resources (computation, networking, user data, media, and science) to deliver a consistent and profitable experience, with agility and efficiency? For Yahoo, the answer has been to embrace cloud computing. In this talk, I’ll discuss what this really means for software developers within Yahoo, in terms of technologies and engineering practices, and how we intend to transform the development and delivery of Yahoo applications.
    5/12/11
    Yahoo! Presentation
    2
  • 3. Agenda
    Why this is different from the last 99 cloud computing presentations you’ve sat through
    Motivation
    Déjà vu
    What we’ve learned
    Where we’re going
  • 4. Why this is different from the last 99 cloud computing talks you’ve sat through
    No definition of cloud
    If you don’t know it by now….
    I’m not selling anything
    Except maybe a job opportunity or two….
    No cool new technologies
    It’s mostly about operational and systems refactoring
    More questions than answers
    Progress report
  • 5. Motivation
  • 6. Our Mission
    Create a global, scalable platform built on science that enables rapid innovation and delivery of personalized, monetizable experiences across devices.
  • 7. Today’s Architecture
  • 8. [Architecture diagram. Labels, as extracted: INTERNET, EDGE, MEDIA, ADVERTISING, DATA HWY, PLATFORMS, PIPELINES, BT, COKE, KEYSTONE, YELLOWSTONE, MAIL/ANTI-SPAM, GD STONE, CONTENT AGILITY, User Data, Media & Files, UPS / UDS / UDB / SHERPA, MOBSTOR, HADOOP, ODS, STORAGE, MAIL, FRONTPAGE, SEARCH]
  • 9. Space Yahoo is big. You just won't believe how vastly, hugely, mind-bogglingly big it is. I mean, you may think it's a long way down the road to the chemist's, but that's just peanuts to space Yahoo.
    Douglas Adams, The Hitchhiker's Guide to the Galaxy
  • 10. Unfortunately, I can’t tell you just how big
    What can I say?
    680+M users
    200 PB of data, growing by ~50 TB per day
    100B events a day captured, collected, transported and processed
    43k Hadoop servers running 5M+ Hadoop jobs every month
    Let’s just say that life is simpler for other folks…
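To put the figures on slide 10 into per-second terms, here is a small back-of-envelope calculation. It uses only the numbers quoted on the slide; the 30-day month and the decimal unit conversion (1 TB = 1,000,000 MB) are assumptions made for the arithmetic, not something the talk states.

```python
# Back-of-envelope rates derived from the figures quoted on slide 10.
SECONDS_PER_DAY = 24 * 60 * 60

events_per_day = 100e9            # "100B events a day"
data_growth_tb_per_day = 50       # "~50 TB per day"
hadoop_jobs_per_month = 5e6       # "5M+ Hadoop jobs every month"
days_per_month = 30               # assumption: a 30-day month

events_per_second = events_per_day / SECONDS_PER_DAY
# Assumption: decimal units, 1 TB = 1e6 MB.
growth_mb_per_second = data_growth_tb_per_day * 1e6 / SECONDS_PER_DAY
jobs_per_minute = hadoop_jobs_per_month / (days_per_month * 24 * 60)

print(f"{events_per_second:,.0f} events per second")
print(f"{growth_mb_per_second:,.0f} MB of new data per second")
print(f"{jobs_per_minute:,.0f} Hadoop jobs started per minute")
```

Under those assumptions this works out to roughly 1.16 million events per second, a little under 580 MB of new data per second, and about 116 Hadoop jobs per minute, sustained around the clock.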
  • 11. Why cloud – specifically “private cloud”?
    After all, many people think that “private cloud” is an oxymoron
    OH: “You should either be using a public cloud or operating one.”
  • 12. Anticipated benefits
    Business agility
    Operational consistency
    Interoperability
    Quality
    Tech deduplication
    Efficiency
    Risk reduction
    Cost transparency
  • 13. Why is agility #1?
    March 11, 2011 – the Japanese earthquake and tsunami
    Yahoo News spiked to 20.6 million unique visitors
    We served 371 million page views
    We added a “Donate Now” button which raised $7M
    Frequently need to spin up new sites with high traffic and short lifetime:
    Royal Wedding
    Canadian Election
  • 14. But why private cloud?
    Our business is digital media: “People want to be informed, they want to be entertained, they want to be educated, and they want to communicate. And that's what Yahoo is all about.”
    There are no public cloud operators that have the scale, geographic presence, and functionality that we would need
    So we have to do it ourselves
  • 15. Déjà vu
  • 16. Hasn’t Yahoo been here before?
    There have been many public presentations on Yahoo’s cloud ambitions and technology over the last three years
    Architectural details
    IETF submissions
    Open source plans
    So what happened?
    And is this talk going to be just another one in the series?
  • 17. Progress report
    In some areas, we’ve made great progress
    Grid
    Cloud storage
    Transforming complex middleware into hosted services
    OH: “Cloud apps are SOA apps”
    In others, we got ahead of ourselves
    And we’ve learned some useful lessons
  • 18. What we’ve learned
  • 19. The cloud compute API story is a mess
    We have procedural primitives to manipulate instances
    AWS EC2, Rackspace, OpenStack
    We have “virtual data center” declarative schemes
    VMware vCloud, Oracle/Sun, Yahoo CSE
    We have PaaS models which abstract away instances and data centers
    And there’s no coherent way to tie them all together
  • 20. Consequences
    How do I describe alternative roll-out/roll-back policies for my VDC?
    It seems natural to describe – or specify – the semantics of my declarative API in terms of the primitives in my procedural API
    How can I deploy a multi-tier application in which one tier was created using a PaaS framework?
    One or more!
    Who’s on first?
    Does the VDC deployment description control the PaaS system?
    Does the PaaS application configuration trigger the deployment of other tiers?
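Slide 20 suggests specifying a declarative ("virtual data center") API in terms of procedural primitives. The sketch below is one way to picture that idea, not a description of Yahoo CSE or any real cloud API: `ProceduralCloud`, `launch`, `terminate`, and `reconcile` are hypothetical names invented for this illustration, and the "declaration" is just a mapping from tier name to desired instance count.

```python
from dataclasses import dataclass, field


@dataclass
class ProceduralCloud:
    """Hypothetical stand-in for an instance-level API (launch/terminate)."""
    instances: dict = field(default_factory=dict)  # instance id -> tier name
    _next_id: int = 0

    def launch(self, tier: str) -> str:
        self._next_id += 1
        instance_id = f"i-{self._next_id}"
        self.instances[instance_id] = tier
        return instance_id

    def terminate(self, instance_id: str) -> None:
        del self.instances[instance_id]


def reconcile(cloud: ProceduralCloud, desired: dict) -> list:
    """Drive the cloud toward `desired` (tier -> count).

    Returns the list of primitive calls that were issued, so the meaning of
    the declaration is spelled out as a sequence of procedural operations.
    """
    calls = []
    running = {}
    for instance_id, tier in cloud.instances.items():
        running.setdefault(tier, []).append(instance_id)

    for tier in sorted(set(desired) | set(running)):
        have = running.get(tier, [])
        want = desired.get(tier, 0)
        # Too few instances: launch the difference (empty range if none needed).
        for _ in range(want - len(have)):
            calls.append(("launch", cloud.launch(tier)))
        # Too many instances: terminate the surplus (empty slice if none).
        for instance_id in have[want:]:
            cloud.terminate(instance_id)
            calls.append(("terminate", instance_id))
    return calls


cloud = ProceduralCloud()
first = reconcile(cloud, {"web": 2, "db": 1})   # three launches
second = reconcile(cloud, {"web": 1})           # scale web down, drop db
third = reconcile(cloud, {"web": 1})            # already converged: no calls
```

Even this toy version leaves the slide's questions open: a roll-out or roll-back policy would have to constrain the order and pacing of the calls in `reconcile`, and a tier managed by a PaaS framework would not map onto `launch` and `terminate` at all.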
  • 21. Forget “long tail” – what about the head?
    Yahoo has a number of “mega-properties”
    Mail, Front Page, Sports, News, Flickr
    They all want to take advantage of the benefits of cloud
    Potential for huge gains in efficiency, predictability
    However, because of their size, they’ve had to radically optimize their technology and operations
    They have stringent performance (latency) requirements and use highly customized technologies
    This means that they aren’t necessarily a good fit for generic, multi-tenant services
  • 22. Cranking up the rate of change
    Historically, Yahoo properties have acquired new hardware through traditional committee-driven processes
    So there are long lead times, and plenty of time to provision systems like asset management, DNS, monitoring, access control, and so forth
    Many of these systems were not designed to support real-time updates
    And even if they were, they were rarely stressed
    The effects of introducing on-demand provisioning tend to cascade through many operational systems
  • 23. Security policies
    Pre-cloud security policies tend to assume a single point of responsibility: the property which “owns” the box and the software stack that’s running on it
    Threat analysis tends to focus on the box, and on mitigating consequences of attack
    With the cloud, there are many more attack vectors:
    The application VM on the physical box
    Any other VM running on the box
    The hypervisor managing the box
    The VM management system
    Creating new security policies takes time
  • 24. Invalidating traditional assumptions
    In a slowly-changing environment, it’s easy to assume that some things are constant
    For example, Yahoo has a flat IP network with fixed IP addresses per-box and per-rack
    So it’s reasonable to use an IP address as a key for many things
    It violates current policy, but there are many legacy systems…
    So when a system that uses IP addresses for app instance identifiers is dynamically [re]deployed in a cloud fabric, things break
    E.g. Reprocessing historic Apache log records
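The log-reprocessing example on slide 24 can be made concrete. Once addresses are reassigned dynamically, an IP address in a historic log record only identifies an application instance when combined with the record's timestamp. The sketch below assumes a hypothetical assignment history is available; the class name, the integer timestamps, and the instance names are all invented for illustration.

```python
from bisect import bisect_right


class IpAssignmentHistory:
    """Hypothetical record of which app instance held an IP address, and from when."""

    def __init__(self):
        self._history = {}  # ip -> list of (start_time, instance_id), kept sorted

    def assign(self, ip, start_time, instance_id):
        entries = self._history.setdefault(ip, [])
        entries.append((start_time, instance_id))
        entries.sort()

    def instance_at(self, ip, timestamp):
        """Return the instance that held `ip` at `timestamp`, or None if unassigned."""
        entries = self._history.get(ip, [])
        starts = [start for start, _ in entries]
        index = bisect_right(starts, timestamp)
        return entries[index - 1][1] if index else None


history = IpAssignmentHistory()
history.assign("10.0.0.5", 100, "mail-frontend-17")   # original owner of the address
history.assign("10.0.0.5", 200, "news-api-03")        # address reused after a redeploy

# A naive "IP is the key" lookup would attribute every record to the latest owner.
old_record_owner = history.instance_at("10.0.0.5", 150)
new_record_owner = history.instance_at("10.0.0.5", 250)
```

A legacy system that uses the bare IP address as its key effectively drops the timestamp argument, which is why reprocessing old Apache logs attributes traffic to the wrong instance after a redeployment.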
  • 25. Legacy workloads
    Most people agree that the future of cloud computing lies with PaaS systems which provide a programming, development and operational model that is optimized for the cloud
    CloudFoundry and OpenShift are promising developments
    However, it is difficult to make the case that the benefits of cloud computing should only be available to new applications
    So we have IaaS fabrics on which people run traditional workloads
    Sometimes unchanged, more often with minor cloud adaptations
  • 26. Legacy workloads at very large scale
    At Yahoo, we run into the problem that many of our legacy property workloads run on large fleets in many data centers around the world
    Their “cloud adoption strategy” involves deploying their current stack on a small number of cloud instances and running them in parallel with their existing fleet
    This makes it hard to adapt the application configuration to work well in the cloud
    It also means that the property’s existing operational practices have to be made to work with cloud instances
    This is particularly challenging for Edge configuration
  • 27. The reality of “on demand”
    “On demand” capacity presumes that each individual customer request is very small compared with the size of the resource pool
    Capacity management involves careful over-provisioning of the pool
    But we have many large properties…
    Today we overprovision to their planned size/QPS, which:
    Can waste a lot of resources
    Has long lead times
    We need to be able to capture the time dimension in our capacity planning and provisioning requests
    Define the envelope, allow on-demand within that envelope
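The "define the envelope, allow on-demand within that envelope" idea from slide 27 can be sketched as a time-indexed capacity limit that on-demand requests are checked against. This is only an illustration under stated assumptions: `EnvelopeWindow`, `CapacityEnvelope`, the hour-based time axis, and the instance counts are hypothetical and do not describe Yahoo's actual capacity planning system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvelopeWindow:
    start_hour: int      # inclusive, hours from the start of the plan
    end_hour: int        # exclusive
    max_instances: int   # capacity the property may claim during this window


class CapacityEnvelope:
    """A property's pre-agreed capacity over time (hypothetical model)."""

    def __init__(self, windows):
        self.windows = list(windows)

    def limit_at(self, hour):
        """Return the agreed limit at `hour`; zero outside every window."""
        for window in self.windows:
            if window.start_hour <= hour < window.end_hour:
                return window.max_instances
        return 0

    def admit(self, hour, in_use, requested):
        """Grant an on-demand request only if it stays inside the envelope."""
        return in_use + requested <= self.limit_at(hour)


# Example plan: a baseline day followed by a one-day spike for a planned event.
envelope = CapacityEnvelope([
    EnvelopeWindow(0, 24, 100),
    EnvelopeWindow(24, 48, 500),
])

baseline_ok = envelope.admit(hour=10, in_use=90, requested=10)
baseline_over = envelope.admit(hour=10, in_use=95, requested=10)
spike_ok = envelope.admit(hour=30, in_use=95, requested=10)
```

The pool operator can then provision against the sum of the envelopes over time rather than against each property's peak planned size, which is where the wasted resources and long lead times mentioned on the slide come from.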
  • 28. Integrating Platform as a Service
    We’ve already seen that PaaS presents us with a problem in terms of cloud compute APIs
    We’re working on a number of outstanding issues with PaaS:
    Accommodating PaaS in our existing software development and test methodologies, including Continuous Integration
    Deploying PaaS in different network security zones
    Managing the various kinds of access to the PaaS infrastructure
    Integrating various network and request handling mechanisms – throttling, traffic shaping, A/B testing, etc. – into PaaS frameworks
  • 29. Automation
    One of the key benefits of cloud computing is improved consistency and predictability through automation
    Scale affects automation in interesting ways
    Potential interactions between automated deployment and self-healing
    Very long-running automation, such as the initial population of a new storage farm in a new location
    Issues also arise when automation is designed to exploit an API that was created for a different use-case
    E.g. an API intended to support a portal GUI with a human operator, which has never been stressed
  • 30. Where we’re going
  • 31. Architectural Vision
    Software as a Service
    Knowledge as a Service
    Platform as a Service
    Infrastructure as a Service
    Hardware
  • 32. Architectural Vision: IaaS
    • Planet-scale Cloud Fabric that abstracts away all the underlying hardware
    • Fundamental Cloud Services that can be assembled to build higher-level services
  • 33. Architectural Vision: PaaS
    • On-demand, higher-level Platform services that support:
    • Interactive onstage applications
    • Offline batch applications
    • Sophisticated programming environments that are offered as holistic Platforms
    • Hosts the development and operation of Yahoo applications
  • 34. Takeaways
    The Yahoo private cloud project is on track
    Many successes already, but much more work to do
    End-to-end architecture is a key element
    Open source is important: we’re committed to a collaborative approach
    Founder member of Open Networking Foundation
    Studying other open source projects
    No more premature announcements
    And of course we’re hiring…. ☺
  • 35. Geoff Arnold
    Cloud services wrangler
    gma@yahoo-inc.com
    Twitter: @geoffarnold
    Y! IM: geoff_arnold@yahoo.com
    http://geoffarnold.com
  • 36. Our Open Source Philosophy
    Open Source Benefits
    • Higher quality software
    • Avoid technological dead ends
    • Leverage community contributions
    • Workforce already trained
    • More widespread usage
    • “Platform/ecosystem” effect
    • Talent recruiting
    • Partner / M&A acquisitions
    Ongoing Contributions
    Adoption of Open Source
    Probable Future Engagement
    IaaS; PaaS; Cloud storage