Cpg iitm mar_29_2012_final

  • From Siloed to Platform. Earlier, everything was a technology silo and a data silo, built one-off. CPG:
    1. We had to get everyone on the same technology: stable, unified platform services for powering innovation. Trade-off: agile (does not scale later) vs. stable. People usually give up one for the other as they hurry to market. We can talk about trade-offs in scale, latency, security, etc. Bring up the M&A example of RMX; all acquisition integrations have faced the same problem. RMX's 300MM impressions did not scale (agility choice); we are now at 12B NGDs. We rebuilt the backend storage, etc. Dapper was the same way.
    2. Once everyone is on the same system, we can share data and apply science to data on the "platform" at scale to derive business value. We can bring up LEGO as an example of siloed properties brought to a common content platform, and Sherpa as an example for structured data storage: disparate MySQL moved to a common data store.
  • CPG powers ALL of Yahoo!
    1. Display Ads (emphasis on Hadoop): 7 clusters, 15K nodes, 17T/day, 10PB (APT 11 4PB, RMX 16 5.8PB). Categorize ads, BT targeting, predict user response, traffic protection. Hadoop helps Yahoo! target billions of impressions per day across one of the largest ad networks in the world by processing declared data and recent activity to segment users and determine the right ad to serve in milliseconds. 3x improvement in the accuracy of ad placements and in our ability to forecast supply over legacy systems (MyNA/Panama and AWACS, All Warehouse Access System).
       "Predict" is critical to the serving apparatus. Machine-learned categorization for ads and queries automatically assigns categories to web pages, ads, and queries. Keystone (contextual ads) predicts and models user response based on all user context, including page content, user attributes like behavioral and geographical data, referrals to the page (how the user got there), and information about the publisher page.
       Display supply and demand forecasting: future (supply) inventory forecasting; NGD pricing forecast computation (advisory use); NGD estimation of clicks from impressions.
       Traffic protection: execute the trade in serving, and clean up bad traffic later, before it hits the revenue system.
    2. Mail (emphasis on Cloud Services): YCPI has been shown to improve download speed for Mail by over 40%. Hadoop helps block over 300,000 spam mails/sec globally. 24B. Mail, the best-monetized product at Yahoo! and at the heart of the Yahoo! network, fully leverages the power of Cloud. It also leverages several other platform capabilities, such as Membership services (over 226,000 new good accounts created per day for U.S. Mail alone; 72M successful logins/day). MobStor is used to solve the attachment de-dup problem and increase efficiency. Ranking Systems (Vespa) is used to search through mailboxes/folders.
    3. Lego (emphasis on Core Content Services): 135 regional media sites moved to Content Agility last year alone. Lego leverages Content Agility as the single, grid-based, highly scalable CMS, instead of the siloed approaches to CMS, front-end development, and editorial that properties had earlier (pre-Lego). Lego provides reusable UI modules and shared tools to reduce time to launch new sites from quarters to weeks. Content Agility and Lego power the content network and bring agility to Yahoo! properties.
    4. Front Page (emphasis on KAPS, Personalization): CORE increased CTR by +263% for the Today Module vs. pre-CORE. CORE enables a real-time feedback loop across properties, leveraging user interest, intent, and context to optimize user engagement. It increases engagement by showing the right content to users, with input from science and human editors, delivering the most relevant experience on the Web by serving the right content to the right user.
    5. Social Chrome (emphasis on KAPS, Social): Over 22M net cumulative installs since launch; 620K Facebook referrals generated daily. Daily active users crossed 1MM within 5 weeks of launch. Vitamix/Vitality powers Social Chrome on all Y! properties worldwide to increase user engagement by surfacing relevant activities from friends. Vitamix provides a Facebar of friends with activity, the activity history of a friend, a friends' activity feed, and friends' activity on top articles. Several ranking-type initiatives are planned for 2012 (rank friends, show most-shared article, etc.).
    6. Livestand (emphasis on MPS, Cocktails): Leverages Cocktails, a presentation platform and application framework built on YUI3, to create connected experiences across devices with a single codebase. Provides a simple way for publishers and advertisers to seamlessly distribute content across devices in an experience that is elegant and personalized: a single serving stack across applications, framework, and runtime. Fragmented approaches (one stack per device class: web 1.0, web 2.0, iOS/Android, and feature phones) slow innovation and create tech debt. Cocktails provides reusable modules across devices and properties, a server-side JavaScript execution engine, a high-efficiency HTTP server for personalization and two-way browser-server communication, and cloud-hosted applications for easy deployment and bucket testing.
  • What if the location of the user does not exist? Especially for building the business listings database, User Generated Places (UGP) provides the capability to crowdsource and algorithmically curate locations. We are ingesting over 10,000 RSS feeds, with an average of 3,000/day.
  • Location is a key pillar of personalization. Locations are crowdsourced with confidence levels (Messenger use case). For properties that require more precision, like Local and Travel, we are adopting a multi-pronged approach: extractions from the deep web, and early work with Sciences to apply algorithms.
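The Mail note mentions that MobStor is used to solve the attachment de-dup problem. The slides do not describe MobStor's internals, so the sketch below only illustrates the general technique of content-addressed de-duplication, assuming each attachment is keyed by the SHA-256 digest of its bytes and reference-counted per message. `AttachmentStore` and its methods are hypothetical names, not MobStor's API.

```python
import hashlib


class AttachmentStore:
    """Hypothetical content-addressed store: identical attachment bytes are kept once."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}     # digest -> attachment bytes
        self._refcounts: dict[str, int] = {}   # digest -> messages referencing it

    def put(self, data: bytes) -> str:
        """Store an attachment and return its content key (SHA-256 hex digest)."""
        key = hashlib.sha256(data).hexdigest()
        if key not in self._blobs:
            self._blobs[key] = data  # first copy: actually keep the bytes
        self._refcounts[key] = self._refcounts.get(key, 0) + 1
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def release(self, key: str) -> None:
        """Drop one reference; delete the bytes once no message uses them."""
        self._refcounts[key] -= 1
        if self._refcounts[key] == 0:
            del self._refcounts[key]
            del self._blobs[key]

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self._blobs.values())


if __name__ == "__main__":
    store = AttachmentStore()
    report = b"%PDF-1.4 quarterly report"
    first = store.put(report)     # attachment on the original mail
    forwarded = store.put(report)  # same file forwarded: no new bytes stored
    print(first == forwarded, store.stored_bytes() == len(report))
```

Forwarding the same file many times then costs one stored copy plus a counter; reference counting is what lets a user delete their copy without breaking anyone else's mailbox.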
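The notes describe user-created places as crowdsourced with confidence levels and algorithmically curated. How UGP does this is not described; the following is only a minimal sketch under one assumed rule: a new submission is merged into an existing place when the normalized names match and the points lie within a radius (hypothetical default of 100 m, using haversine distance), and confidence grows with the number of distinct users who submitted that place. `Place`, `submit`, and the confidence formula are illustrative assumptions, not UGP's design.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Place:
    name: str
    lat: float
    lon: float
    submitters: set[str] = field(default_factory=set)

    @property
    def confidence(self) -> float:
        # Hypothetical rule: each independent submitter halves the remaining doubt.
        return 1.0 - 0.5 ** len(self.submitters)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two lat/long points."""
    r = 6_371_000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def normalize(name: str) -> str:
    return " ".join(name.lower().split())


def submit(places: list[Place], user: str, name: str, lat: float, lon: float,
           radius_m: float = 100.0) -> Place:
    """Merge a user-created place into an existing one if it looks like a duplicate."""
    for place in places:
        same_name = normalize(place.name) == normalize(name)
        if same_name and haversine_m(place.lat, place.lon, lat, lon) <= radius_m:
            place.submitters.add(user)  # a set, so repeat submissions don't inflate confidence
            return place
    place = Place(name, lat, lon, {user})
    places.append(place)
    return place
```

Two users submitting "Russian Tea Room" a few meters apart yield one place with confidence 0.75. Exact normalized name matching keeps the sketch short; a production curator would likely need fuzzy matching as well.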

    1. Cloud Platform Group (CPG): Presentation at IIT Chennai, March 29, 2012
    2. Agenda
       - CPG Mission and Value Proposition
       - Fit within the Yahoo Stack
       - Drill-down: User Generated Content (UGC)
       - Drill-down: User Location
       - Drill-down: Web Extractions
       - Drill-down: Trending
       - Q&A
       Yahoo! Presentation, Confidential
    3. Cloud Platform Group Mission: Create a global, scalable platform built on science that enables rapid innovation and delivery of personalized, monetizable experiences across devices.
    4. CPG Value Proposition 1: Agility with Stability (LEGO powered by Content Agility)
    5. CPG Value Proposition 2: Science at Scale
    6. CPG powers all of Yahoo! today (illustrative sample)
       - Mail: powered by Edge, Storage, Ranking, & Hadoop. 40% faster download time; 300K+ spam mails blocked/sec.
       - Display Ads: powered by Hadoop. 3x improvement in accuracy of ad placements and our ability to forecast supply over legacy systems.
       - Front Page: powered by CORE. Increased CTR by +263% for Today Module by serving the right content to the right user (over pre-CORE).
       - Livestand: powered by Mobile & Presentation Services (Cocktails). Seamlessly distribute content across devices in an experience that is elegant and personalized.
       - LEGO (YPP): powered by Content Agility. Reduce time to launch new sites from quarters to weeks.
       - Social Chrome: powered by Social Platform. Over 22M net cumulative installs since launch; integrated into News, Games, Movies, OMG, TV.
    7. User Generated Content: a unified, scalable platform that enables self-expression and gets users to connect over content.
       - Use case: increase content stickiness and user retention; drive repeat usage across the Yahoo! network.
       - Solution: UGC Cloud is a scalable, real-time platform that lets users express themselves, resulting in increased user engagement and a vibrant Yahoo! community.
       - Results: UGC platforms are used by over 200 Yahoo! properties, with over 650M UGC actions per year. Comments: 6M Finance comments per month. Message Boards: 1/3 of US traffic from MB. Ratings & Reviews: 40M user ratings per month. Polls: 1.2M poll votes per month.
    8. User Generated Content: Applications
       - Improving comment quality: a three-pronged approach (machine, human, and community moderation). 300M comments analyzed, 70M blocked with machine moderation; reactive volume (the cost of reacting to abuse) avoided.
       - Sentiment Slider: http://news.yahoo.com/open-business-free-agency-set-begin-211828913--spt.html
    9. User Generated Content: Social Poll
    10. User Generated Content: In the Works
        - Topical organization of comments
        - Social conversations
    11. User Location: Store, manage, and share user locations and locations of interest to create deeply personal digital experiences.
        - Use case: user location information was siloed, inconsistent, and not shareable across properties and users.
        - Solution: create a single data store of user locations, shareable across Yahoo! properties and advertising systems.
        - Results: properties can launch location-aware services with faster time to market on a single platform; 237M users with 550M locations.
        - Diagram (LOCDROP): Management, Authorization, and Control; Normalized, Geo-Aware User Locations; Centralized, Consistent, and Contextual; Accurate, Relevant, Valuable Experiences; Increase Content, Targeting, and Revenues.
    12. Read locations to drive local news, events, and deals
    13. Contextual Locations for Yahoo News
    14. User Generated Places: enable users to submit (and curate) a location if one does not exist. Android Messenger use case:
        - User cannot find a place and decides to create a new location to check in.
        - User is asked for permission to detect their current location from the device.
        - User's location is pinned on a map; this is used to get the lat/long of the created place.
        - User enters a location, "Russian Tea Room".
        - A new location is stored in the UGP platform and the user is checked in to this location.
        - User has an option to curate the locations created by other users.
        - The UGP platform enables algorithmic curation.
    15. KAFE: Technologies* (chart of web content type vs. technology; axes: Editorial Effort, Precision)
        - Web content: large aggregator websites (LLFS) (e.g. amazon); small websites (e.g. community sites); behind-the-form sites (deep web); unsupervised extractions from a large number of websites.
        - Technologies: Manual, SDE Rules, Bing WCC, YST HVC, Live Pages, Dapper, KAFE, S.D.E, PSOX (Y! Labs), W.O.O, Legacy Backend.
        - Properties: Goldrush, Dish-a-Wish, Restaurant Photos.
        * Supports multiple sources of data and multiple technologies.
    16. Answers Not Links: Dappfactory. Dappfactory was used by DD Builder to create over 3,000 DD experiences!
    17. Answers Not Links: Dappfactory
    18. Answers Not Links: S-DE (KAFE XSL Rules). Creating vertical search experiences for recipes.
    19. Answers Not Links: PSOX (Unsupervised Extractions). Looking for where to buy Amana dishwashers? Y! Goldrush. Craving hummus in Sunnyvale? Y! Dish-a-Wish.
    20. Enhanced Listings: Dappfactory (before and after screenshots)
        - Taken from the Roadmap deck for Y! Local by Erin Johns
        - Data is being provided to Y! Local; the front-end revamp is on the Local roadmap
    21. Local Events for N.I.L.E: Dappfactory. As of Feb '12, over 22,000 events for 250 US cities have been extracted using Dappfactory.
    22. Data Extraction: Challenges
        - Technology whitespace
          - Head: fully manual extraction scales fine and gives high precision.
          - Torso: mostly human-assisted learning. Recall and precision drop, but remain acceptable for production use.
          - Tail content: the only option is ML/no-human-in-the-loop models. Recall and precision need a lot of improvement.
        - Semantic Web initiatives: Web of Objects
          - Linked Open Data formats (RDFa, OWL, SPARQL)
          - LOD Cloud: a few thousand data sets, tens of billions of interlinked facts
          - Confhopper: sample/demo application
        - Unstructured corpus: NLP extraction
          - Systems/engineering challenges: low-latency processing, tokenization/parsing, international support
          - Sciences challenges: polysemy, synonymy, aboutness/concepts, sentiment analysis
          - CAP: Contextual Analysis Platform
    23. TimeSense: Use Cases / Business Value Proposition
        - Search suggestions in SD box: TimeSense-powered suggestions triggered for 6% of all gossip requests
        - US FP Trending Now: local pool for a given DMA powered by TS; 6% CTR lift attributed to local terms
        - Trending searches in Left Rail on Yahoo US SRP: triggered for ~6% of all user queries
        - TW FP Trending Now: automated by TimeSense API
        Plumbing, Monetization, & Games
    24. TimeSense
        - In bucket: AUTOMATED trending module on shopping.yahoo.com, the first module with no editorial intervention; vertically categorized trends, fast refresh, and rotating terms
        - Soon to launch: HK, TW, and KR automated trends modules on FP, Mail, OMG, News, etc.
        - Editorial power users of TimeSense:
          - Search Forecasting Editorial Team: updates sent twice a day to 500+ subscribers
          - FP Trending Now team
          - Regional content programming, search editorial, and SEO teams: US, UK, HK, TW, IN [Q1 launch: all INTLs]
        - Upcoming:
          - Trending Now syndication for Yahoo Hosted Search partners via BOSS
          - Trending Image experience
          - Trending Now 2.0 automation expansion
    25. Trending Topic Detection: Challenges
        - Systems challenges
          - Low-latency requirement
          - GBs of data analyzed from multiple data sources every 5 minutes
          - Scalability: different verticals, segmented models
          - High-availability requirement
        - Sciences challenges
          - Algorithmic improvements for near-real-time detection without precision loss
          - Short-phrase categorization
          - Deduping/clustering: intent detection
          - Segmentation/smoothing (age/gender/behavioral tracking categories/geography): signal sparsity with fine-grained segmentation
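Slide 6 and the Front Page note quote a +263% CTR increase for the Today Module over pre-CORE. Assuming the figure is a relative lift (the usual reading of a percentage increase), it corresponds to a CTR about 3.6 times the pre-CORE baseline, not 2.6 times:

```latex
\text{lift} = \frac{\mathrm{CTR}_{\text{CORE}} - \mathrm{CTR}_{\text{pre-CORE}}}{\mathrm{CTR}_{\text{pre-CORE}}} = 2.63
\quad\Longrightarrow\quad
\mathrm{CTR}_{\text{CORE}} = (1 + 2.63)\,\mathrm{CTR}_{\text{pre-CORE}} = 3.63\,\mathrm{CTR}_{\text{pre-CORE}}
```

The same definition applies to the "6% CTR lift attributed to local terms" on slide 23: a 6% lift multiplies the baseline CTR by 1.06.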
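Slide 8 describes comment quality as a three-pronged approach: machine, human, and community moderation. One way the prongs could fit together is sketched below, under stated assumptions. An abuse score in [0, 1] from some classifier (not specified on the slide) decides whether a comment is blocked, queued for human review, or published. After that, enough community flags send a published comment back to human review. `ModerationPolicy`, `Decision`, and all thresholds are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PUBLISH = "publish"
    HUMAN_REVIEW = "human_review"
    BLOCK = "block"


@dataclass
class ModerationPolicy:
    block_at: float = 0.9       # hypothetical: machine abuse score at/above which we block
    review_at: float = 0.5      # hypothetical: scores in [review_at, block_at) go to humans
    flags_for_review: int = 3   # hypothetical: community flags that pull a comment back

    def machine_pass(self, abuse_score: float) -> Decision:
        """First prong: act on the classifier score before the comment is shown."""
        if abuse_score >= self.block_at:
            return Decision.BLOCK
        if abuse_score >= self.review_at:
            return Decision.HUMAN_REVIEW  # second prong: human moderators
        return Decision.PUBLISH

    def community_pass(self, current: Decision, flag_count: int) -> Decision:
        """Third prong: enough community flags send a published comment to humans."""
        if current is Decision.PUBLISH and flag_count >= self.flags_for_review:
            return Decision.HUMAN_REVIEW
        return current


if __name__ == "__main__":
    policy = ModerationPolicy()
    print(policy.machine_pass(0.95), policy.community_pass(Decision.PUBLISH, 4))
```

Blocking the high-confidence cases up front is what avoids the "reactive volume" the slide mentions: humans only see the uncertain band and the community-flagged comments.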
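Slide 11 describes a single store of normalized user locations, shareable across properties, behind a management, authorization, and control layer. The sketch below is a minimal in-memory illustration of that idea. It assumes that per-user grants decide which property may read a user's locations, and that normalization means trimming labels, collapsing whitespace in names, and range-checking coordinates. `UserLocation`, `LocationStore`, and these rules are hypothetical and do not describe LOCDROP's actual interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserLocation:
    label: str   # user-chosen tag such as "home" or "work"
    name: str    # display name of the place
    lat: float
    lon: float


class LocationStore:
    """Hypothetical single store of user locations shared across properties."""

    def __init__(self) -> None:
        self._locations: dict[str, dict[str, UserLocation]] = {}  # user -> label -> location
        self._grants: dict[str, set[str]] = {}                     # user -> properties allowed to read

    def save(self, user_id: str, loc: UserLocation) -> UserLocation:
        """Normalize and store a location, replacing any earlier one with the same label."""
        if not (-90.0 <= loc.lat <= 90.0 and -180.0 <= loc.lon <= 180.0):
            raise ValueError("latitude/longitude out of range")
        normalized = UserLocation(
            label=loc.label.strip().lower(),
            name=" ".join(loc.name.split()),  # collapse stray whitespace
            lat=round(loc.lat, 6),
            lon=round(loc.lon, 6),
        )
        self._locations.setdefault(user_id, {})[normalized.label] = normalized
        return normalized

    def grant(self, user_id: str, property_name: str) -> None:
        """The user authorizes a property (e.g. local news) to read their locations."""
        self._grants.setdefault(user_id, set()).add(property_name)

    def revoke(self, user_id: str, property_name: str) -> None:
        self._grants.get(user_id, set()).discard(property_name)

    def read(self, user_id: str, property_name: str) -> list[UserLocation]:
        """Return the user's locations, but only to properties the user authorized."""
        if property_name not in self._grants.get(user_id, set()):
            raise PermissionError(f"{property_name!r} is not authorized for {user_id!r}")
        return sorted(self._locations.get(user_id, {}).values(), key=lambda loc: loc.label)
```

Keeping the authorization check inside `read` means every property goes through the same control point, which is the consistency argument the slide makes against per-property silos.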
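Slide 18 mentions S-DE with KAFE XSL rules for building vertical search experiences such as recipes: site-specific rules turn a page into a structured record. KAFE's rules are XSL. The sketch below swaps in Python's standard `html.parser` and a per-site rule map from output field to CSS class, purely to show the wrapper-style technique. The class names (`rcp-title` and so on) and the `RuleExtractor`/`extract` names are made up.

```python
from html.parser import HTMLParser
from typing import Optional

# Void elements never get an end tag, so they must not touch the element stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class RuleExtractor(HTMLParser):
    """Collect the text of elements whose class matches a per-site rule."""

    def __init__(self, rules: dict[str, str]) -> None:
        super().__init__()
        self._field_for_class = {cls: name for name, cls in rules.items()}
        self._stack: list[Optional[str]] = []  # matched field (or None) per open element
        self.record: dict[str, list[str]] = {name: [] for name in rules}

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        matched = [self._field_for_class[c] for c in classes if c in self._field_for_class]
        self._stack.append(matched[0] if matched else None)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Attribute text to the innermost open element that matched a rule.
        for name in reversed(self._stack):
            if name is not None:
                self.record[name].append(text)
                break


def extract(html: str, rules: dict[str, str]) -> dict[str, list[str]]:
    parser = RuleExtractor(rules)
    parser.feed(html)
    parser.close()
    return parser.record


if __name__ == "__main__":
    page = ('<div class="rcp-title">Hummus</div>'
            '<ul><li class="rcp-ing">chickpeas</li><li class="rcp-ing">tahini</li></ul>')
    print(extract(page, {"title": "rcp-title", "ingredient": "rcp-ing"}))
```

This is the "head" end of the trade-off on slide 22: hand-written rules give high precision but must be maintained per site, which is why the torso and tail move to learned and unsupervised extraction.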
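Slide 25 lists the core trending-detection problems: data from multiple sources analyzed every 5 minutes, near-real-time detection without precision loss, and signal sparsity under fine-grained segmentation. TimeSense's algorithm is not described in the deck. The sketch below is a generic burst detector built on several assumptions: per-term counts arrive once per window, and each term keeps an exponentially weighted mean and variance. A term is flagged when its count exceeds that baseline by `z_threshold` standard deviations and also meets a minimum count, which is a crude guard against sparse segments. `BurstDetector`, `TermStats`, and the default parameters are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TermStats:
    mean: float = 0.0
    var: float = 0.0
    seen: int = 0  # windows observed so far


class BurstDetector:
    """Flag terms whose count in the current window jumps above their smoothed baseline."""

    def __init__(self, alpha: float = 0.3, z_threshold: float = 3.0, min_count: int = 20) -> None:
        self.alpha = alpha              # smoothing weight of the newest window
        self.z_threshold = z_threshold  # std deviations above baseline to call a burst
        self.min_count = min_count      # ignore tiny counts (sparse segments)
        self.stats: dict[str, TermStats] = defaultdict(TermStats)

    def update(self, window_counts: dict[str, int]) -> list[str]:
        """Consume one window (e.g. 5 minutes) of term counts; return trending terms."""
        trending = []
        # Terms absent from this window count as zero so their baselines decay.
        for term in set(self.stats) | set(window_counts):
            count = window_counts.get(term, 0)
            s = self.stats[term]
            if s.seen > 0 and count >= self.min_count:
                std = max(s.var ** 0.5, 1.0)  # floor avoids dividing by ~0 on flat history
                if (count - s.mean) / std >= self.z_threshold:
                    trending.append(term)
            # Incremental exponentially weighted mean/variance update.
            diff = count - s.mean
            incr = self.alpha * diff
            s.mean += incr
            s.var = (1 - self.alpha) * (s.var + diff * incr)
            s.seen += 1
        return sorted(trending)


if __name__ == "__main__":
    detector = BurstDetector()
    for _ in range(10):
        detector.update({"storm": 5, "weather": 100})
    print(detector.update({"storm": 200, "weather": 101}))
```

A term needs at least one prior window before it can trend, so brand-new terms are held back for one window; that trades a few minutes of latency for fewer false positives, one face of the precision/latency tension on the slide.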
