Apache Hadoop India Summit 2011 Keynote talk "Hadoop & the Future of Cloud Computing" by Todd Papaioannou
Usage Rights

© All Rights Reserved
  • The web is changing. It’s always evolving and changing. This evolution is about people-powered experiences and transient, unstructured data. My 16-year-old writes. He deletes. He retweets. In fact, a ton of the data on the web today is transient data. It exists for a moment and then it's gone. It's comments on Facebook, emails, content alerts, messenger updates, blogs, Twitter feeds. In fact, only 5% of the information created in the world today is “structured”.
  • Yahoo!'s role has always been to cut through the noise and help people find what they want. We do that in many ways – primarily with deep science and insights, all relying on Hadoop. From curating people’s relationships to get more meaning out of them, to understanding their interests and their location, to adding a complex layer of science on top of all that – Hadoop’s right at the core of making all of that possible.
  • Turning data into insights isn't trivial. It's heavy lifting. It’s analysis and refinement of raw, unstructured information. It's also deep, best-in-class technology and science, and applying and improving this science is one of the things we do best at Yahoo! – using a variety of techniques as you see listed here.
  • Yahoo! has made investments in Hadoop that have enabled us to add much more relevance to our data, enrich it, extract insights, and deliver relevant, personalized content and experiences to our consumers. These same investments help deliver the right audiences to our advertisers. As a result of delivering that highly relevant experience to 600 million users around the world, Yahoo!’s one of the most trusted brands on the Internet.
  • Hadoop delivers huge value to Yahoo! by enabling the important stuff we do with all of our big data. Without it, we simply couldn’t deliver the engaging consumer experiences and advertiser value the way we do today. With Hadoop, we get the disruptive ability to rapidly innovate by customizing, personalizing and fusing people’s individual worlds with the Web at large, in a way no other company can today.
  • With 600 million people visiting Yahoo! 11 billion times a month, generating 98 billion page views, Yahoo! is a leader in many categories, and people trust us to give them a great experience and show them what’s most interesting and relevant to them. Behind every click, we’re using Hadoop to optimize what you see on Yahoo.com. We serve about 3 million different versions of the Today Module every 24 hours. Hadoop allows us to analyze story clicks by applying machine learning so we can figure out what you like and give you more of it. Every click a person makes on our homepage – that’s around half a billion clicks per day – results in multiple personalized rankings being computed, each completing in less than 1/100th of a second. Within ~7 minutes of a user clicking on a story, our entire ranking model is updated. Our Content Optimization Engine creates a real-time feedback loop for our editors. They can serve up popular stories and pull out unpopular stories, based on what the algorithm is telling them in real time. Our modeling techniques help us deeply understand the content and eliminate the guesswork, so we can actually predict a story’s relevance and popularity with our audience.
  • Because of technologies like Hadoop and the rest of our Cloud platform, we’re learning and building faster and faster. It’s all about speed, innovation and real, substantial value to our business. At Yahoo!, we’ve been using Hadoop across the company for the last five years, and I’ve shown you just a few examples. Based on our testing and experience, we believe Hadoop is now ready for mainstream enterprise use. We’ve deliberately chosen to invest in open source as the foundation of our cloud. Yahoo! is running the largest implementation of Hadoop in the world today.
  • An overview of the Hadoop ecosystem
    Yahoo! employees, including Doug Cutting, initiated Apache Hadoop in 2005
    Since then, the ecosystem has expanded
  • Hadoop is at the center of our data ecosystem
    Every click, page view, search
    Foundation of our ad management & targeting systems
    Content enrichment (geo location, category)
    Customize content for users
    Where Science Meets Data
    Machine learning: algorithm development, spam detection, ad targeting, predicting user interest and ad inventory
    Research on ad effectiveness
    Provides Scale for Big Data
    Daily: 120TB, 3+PB. Total 70+PB data -- and growing
    Web data growing at CAGR of 60% - by 2013 - 667 exabytes (Cisco)
  • Started developing Hadoop 5 years ago
    Prototype of a 20 node cluster
    Dedicated team developing Hadoop ever since
    Focused on supporting Yahoo! needs
    Contributing Hadoop to Apache and helping build the community
    Started as research projects
    Progressed to applied science efforts supporting search and advertising products
    Then production systems (Ad Targeting, Content optimization)
    Now Hadoop usage has spread to all parts of our business
    Hadoop is our Big Data infrastructure -- It provides agility with Big Data
    50% of enterprises in a recent study said they are strongly considering Hadoop adoption
    Agility cited as the number one reason
  • People ask why we contribute to open source
    Open Source helps us avoid technological dead ends
    Benefit from leveraging community contributions
    Allows us to hire a workforce already trained in our technology
    Open sourcing our Cloud components starts with Hadoop
    Pig
    Yahoo! Distribution of Hadoop (adding others)
    Yahoo! Traffic Server
    ZooKeeper
    In addition to benefiting from external Hadoop contributions: Hive, Apache Web Server, Xen

Presentation Transcript

  • 1. Hadoop & the future of Cloud Computing
    Todd Papaioannou
    VP, Cloud Architecture
    By SearchNetMedia
  • 2. what’s happening
    More publicly available human-generated content
    More interactions being tracked (e.g. clickstream data)
    More business processes are being digitized
    More history being kept
    = The Data Exhaust!
    Flickr : sub_lime79
    BigData is here!
  • 3. CUTTING THROUGH THE NOISE
    access audience blogs communication computer internet mass media people networking technology
    Location
    Social Relationships
    Science
    Understanding User Interests
    Flickr : Lomo-Cam
  • 4. turning data into insights
    machine learning
    time series
    logic regression
    content clustering
    algorithms
    Ad inventory modeling
    user interest prediction
    Flickr : NASA Goddard Photo and Video
    factorization models
  • 5. making it relevant
    Flickr : ogimogi
  • 6. hadoop:
    lightning-fast
    Technology
    science + big data + insight = personal relevance = VALUE
    Flickr : DDFic
  • 7. BEHIND every click
  • 8. hadoop
    Flickr : Got Sarah
  • 9. THE PLATFORM EFFECT
    THE HADOOP ECOSYSTEM
    and other Early Adopters: Scale and productize Hadoop
    Orgs with Internet Scale Problems: Add tools / frameworks, enhance Hadoop
    Service Providers: Grow ecosystem - Training, support, enhancements
    Mainstream / Enterprise adoption: Fund further development, enhancements
    Enhance Hadoop Ecosystem
    Apache Hadoop
    Virtuous Circle!
    Investment -> Adoption
    Adoption -> Investment
  • 11. HADOOP IS GOING MAINSTREAM
    [Chart covering 2007 to 2010; source: The Datagraph Blog]
  • 12. hadoop at yahoo!
    “Where Science meets Data”
    PRODUCTS
    Data Analytics
    Content Optimization
    Content Enrichment
    Yahoo! Mail Anti-Spam
    Advertising Products
    Ad Optimization
    Ad Selection
    Big Data Processing & ETL
    DIMENSIONAL DATA
    CONTENT
    DATA PIPELINES
    HADOOP CLUSTERS
    Tens of thousands of servers
    APPLIED SCIENCE
    User Interest Prediction
    Ad inventory prediction
    Machine learning - search ranking
    Machine learning - ad targeting
    Machine learning - spam filtering
  • 13. from project to core platform
    38K Servers
    170 PB Storage
    1M+ Monthly Jobs
    [Chart: Petabytes and Thousands of Servers by year, 2006 to 2010 and Today]
  • 14. Yahoo!’s Vision
    open source cloud
    Open Source Benefits
    »Avoid technological dead ends
    »Leverage community contributions
    »Workforce already trained
    Ongoing contributions
    Yahoo!’s adoption of open source
    Future contributions
    Cloud serving
    Storage
  • 15. What does the Future hold?
    By Elsie
  • 16. More BIG
    By BionicTeaching
  • 17. Data in the cloud
    By Fadilfb
  • 18. Private Clouds
    By Zachstern
  • 19. hybrid clouds
    By Calop
  • 20. Automation
  • 21. cloud fabrics
  • 22. Questions?