
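In Martial Arts, Only Speed Is Unbeatable: Building a Real-Time Classifier and a Real-Time Recommendation System from Streaming Data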


黃孝文 (Norman)
Yahoo! Taiwan Senior Data Engineer

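Passionate about analyzing data to uncover hidden patterns and usable knowledge, and fascinated by reading and interpreting data models. Currently at Yahoo! Kimo (Yahoo! Taiwan), he works on the mixed behavioral context of an e-commerce platform: by analyzing consumers' purchase preferences, product characteristics, and how the two interact, he aims to give shoppers timely and suitable product recommendations while they shop.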

---

林于聖 (Jason Lin)
Yahoo! Taiwan Senior Data Engineer

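Enthusiastic about new technology: he enjoys programming, new gadgets, and assembling 3D printers, and is very interested in data mining. As a data engineer at Yahoo, he designs platform architecture and analyzes user behavior on the site, studying the relationships and influence among products, sites, and users. In an age of information overload, he focuses on how to untangle complex data so that, while a user is browsing, the system can accurately predict and promptly recommend the products that user needs.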

Published in: Technology

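In Martial Arts, Only Speed Is Unbeatable: Building a Real-Time Classifier and a Real-Time Recommendation System from Streaming Data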

  1. In martial arts, only speed is unbeatable: building a real-time classifier and a real-time recommendation system from streaming data. Yahoo! Taiwan EC Data Team
  2. Who I am ▪ Norman Huang (normany@yahoo-inc.com) ▪ Software & Data Engineer at Yahoo! Taiwan ▪ Aims to retrieve and deliver data insights via a BI platform and data mining algorithms.
  3. Who I am ▪ Jason Lin (jasonysl@yahoo-inc.com) ▪ Software & Data Engineer at Yahoo! Taiwan ▪ Responsible for recommendation system personalization mechanisms and cloud computing development.
  4. Agenda ▪ Challenges ▪ Solution: Pinball ▪ Q&A
  5. Challenges ▪ Static content until the next batch job. (diagram label: Processing)
  6. Challenges ▪ Static content until the next batch job. ▪ Batched product recommendation algorithms have become common features among e-commerce platforms. (diagram label: Processing)
  7. Challenges ▪ Nearly 72% of visitors made their decision on the same day. (diagram labels: Time; several hours of data; absorbed into batch views; not yet absorbed)
  8. Challenges ▪ Nearly 72% of visitors made their decision on the same day. ▪ A real-time solution to interact with potential buyers. (diagram labels: Time; several hours of data; absorbed into batch views; not yet absorbed)
  9. Solution: Pinball
  10. Pinball (diagram labels: User, Classifier, Classifier, Profile A, Profile B, Profile C)
  11. Pinball ▪ Real-time classifier ▪ Detects buyers' preferences through streaming data processing ▪ Delivers personalized ads and product recommendations on the fly
  12. Pinball ▪ Real-time classifier ▪ Detects buyers' preferences through streaming data processing ▪ Delivers personalized ads and product recommendations on the fly ▪ Challenges › How to do it in real time?
  13. Pinball ▪ Real-time classifier ▪ Detects buyers' preferences through streaming data processing ▪ Delivers personalized ads and product recommendations on the fly ▪ Challenges › How to do it in real time? › Storm
  14. Pinball ▪ Real-time classifier ▪ Detects buyers' preferences through streaming data processing ▪ Delivers personalized ads and product recommendations on the fly ▪ Challenges › How to do it in real time? › Storm! › How to determine customers' purchasing desire?
  15. Pinball ▪ Real-time classifier ▪ Detects buyers' preferences through streaming data processing ▪ Delivers personalized ads and product recommendations on the fly ▪ Challenges › How to do it in real time? › Storm! › How to determine customers' purchasing desire? › Buying Intention Detection
  16. Solution: Pinball ▪ Storm Overview ▪ Buying Intention (BI) ▪ Architecture and Design
  17. Pinball (diagram labels: Storm, Learning, Buyer)
  18. Pinball (diagram labels: Storm, Learning, Buyers)
  19. Pinball (diagram labels: Learning, Storm, Is Potential Buyer?, Buyers, Visitor, Promotions)
  20. Pinball (diagram labels: Pinball, Learning, Storm, Is Potential Buyer?, Buyers, Visitor, Promotions)
  21. Pinball (diagram labels: Pinball, Learning, Storm, Is Potential Buyer?, Buyers, Buyer, Promotions)
  22. Storm Concepts ▪ Tuple & Streams ▪ Spouts & Bolts ▪ Topologies Yahoo Confidential & Proprietary
  23. Tuple & Streams ▪ Tuple: a row of fields (Field 1 to Field 5 in the diagram) ▪ Stream: a sequence of tuples (Tuple 1 to Tuple n in the diagram)
  24. Spouts & Bolts (diagram: a Spout emits tuples (T) to a Bolt, which emits tuples onward)
  25. Topology ▪ Hadoop map-reduce job vs. Storm topology (diagram labels: Spout, Bolt, Bolt, Streams)
  26. Topology ▪ Hadoop map-reduce job vs. Storm topology (diagram labels: Spout, Bolt, Bolt, Streams)
  27. Storm Concepts ▪ Hadoop: computational primitives Map & Reduce; use case batch processing; high-level language Pig ▪ Storm: computational primitives Spout & Bolt; use case stream processing; high-level language Trident
  28. Storm ▪ Nimbus: master node, similar to the Hadoop JobTracker (diagram: one Nimbus, three Zookeeper nodes, five Supervisors)
  29. Storm ▪ Zookeeper: coordinates the Storm cluster (diagram: one Nimbus, three Zookeeper nodes, five Supervisors)
  30. Storm ▪ Supervisors: run worker processes (diagram: one Nimbus, three Zookeeper nodes, five Supervisors)
  31. Buying Intention ▪ Based on our findings: › The more page views, the higher the chance a visitor will buy. › BUT the buying intention value varies by category. (figure values: 2 and 6)
  32. How to leverage Storm with Buying Intention (BI)?
  33. Data Flow Diagram
  34. Adaptive Learning
  35. Learning & Classifier ▪ Online Binary Classification › Simple and computationally efficient ▪ Update rule: BI' = BI + (PV − BI) × γ ▪ e.g. › assumptions: γ = 0.1, BI = 3 › scenario: a user makes 6 page views before purchasing • BI' = 3 + (6 − 3) × 0.1 • BI' = 3.3
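
The update rule on slide 35 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `update_bi`, the per-category dictionary, and the category names are hypothetical and do not come from the talk; only the formula BI' = BI + (PV − BI) × γ and the sample numbers (γ = 0.1, BI = 3, PV = 6) are from the slide.

```python
def update_bi(bi: float, page_views: int, gamma: float = 0.1) -> float:
    """One online update step: BI' = BI + (PV - BI) * gamma.

    bi         -- current buying-intention threshold (in page views)
    page_views -- page views a user made before a purchase
    gamma      -- learning rate; 0.1 is the value used on the slide
    """
    return bi + (page_views - bi) * gamma


# Slide example: BI = 3, a user buys after 6 page views, gamma = 0.1.
new_bi = update_bi(3.0, 6, gamma=0.1)
print(f"{new_bi:.1f}")

# Assumption: since BI varies by category, keep one value per category.
# The category names and starting values here are made up.
bi_by_category = {"books": 2.0, "cameras": 6.0}
bi_by_category["cameras"] = update_bi(bi_by_category["cameras"], 10)
```

Each purchase nudges the stored value a fraction γ of the way toward the observed page-view count, so the estimate adapts as events stream in without reprocessing past data.
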
  36. Buying Intention Qualification
  37. Topology Design
  38. Lambda Architecture ▪ Term created by Nathan Marz (creator of Apache Storm) ▪ Batch + Real-time processing
  39. Lambda Architecture ▪ Term created by Nathan Marz (creator of Apache Storm) ▪ Batch + Real-time processing
  40. Lambda Architecture ▪ Term created by Nathan Marz (creator of Apache Storm) ▪ Batch + Real-time processing › Hybrid batch and real-time processing
  41. Lambda Architecture ▪ Term created by Nathan Marz (creator of Apache Storm) ▪ Batch + Real-time processing › Hybrid batch and real-time processing › Batch processing is treated as the source of truth, and real-time processing updates models/insights between batches.
  42. Lambda Architecture [REF] http://lambda-architecture.net/
  43. Lambda Architecture [REF] http://lambda-architecture.net/
  44. Lambda Architecture (diagram labels: Storm, Streaming) [REF] http://lambda-architecture.net/
  45. Lambda Architecture (diagram label: Summingbird) [REF] http://lambda-architecture.net/
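
The idea on slide 41 (batch results as the source of truth, real-time updates filling the gap between batches) can be illustrated with a toy page-view counter. This is a minimal sketch, not the Pinball implementation: the class `LambdaView` and its method names are hypothetical, and it assumes each finished batch job has absorbed every event seen so far.

```python
from collections import Counter


class LambdaView:
    """Toy serving layer: query = batch view + real-time view."""

    def __init__(self):
        self.batch_view = Counter()     # recomputed by the batch job
        self.realtime_view = Counter()  # events not yet absorbed by a batch

    def on_event(self, user_id: str) -> None:
        # Speed layer: count the event immediately.
        self.realtime_view[user_id] += 1

    def on_batch_complete(self, recomputed: dict) -> None:
        # Batch layer output replaces the old view (source of truth).
        # Assumption: the batch covers all events seen so far, so the
        # real-time view can simply be reset.
        self.batch_view = Counter(recomputed)
        self.realtime_view.clear()

    def query(self, user_id: str) -> int:
        return self.batch_view[user_id] + self.realtime_view[user_id]


view = LambdaView()
view.on_event("userA")
view.on_event("userA")
view.on_batch_complete({"userA": 2})  # batch job catches up
view.on_event("userA")                # arrives after the batch
print(view.query("userA"))
```

A production system would also have to handle events that arrive while a batch job is running; the sketch leaves that out to keep the merge step visible.
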
  46. Pinball Demonstration
  47. (slide without transcribed text)
  48. How to keep it generic and flexible? ▪ To add more signals ▪ To add more online learning algorithms ▪ To add more channels
  49. How to keep it generic and flexible? ▪ Signals: Click, Login, Buy, View, Bounce, Time Spent ▪ Algorithms: Buying Intention, Fraud Detection, Webpage Sequence ▪ Channels: Email, Y! Webpages, Mobile Apps, Messenger
  50. Summary ▪ Scalable real-time data processing ▪ Supports online learning algorithms ▪ Flexible interactions with visitors ▪ Increases user engagement ▪ Increases the conversion rate ▪ Creates synergy by combining the batched recommender with Pinball
  51. Simple Hands-on: Find out the heavy users!
  52. Find out the heavy users! ▪ Keep a count of page views for each user ▪ If the count is greater than 3, it's a heavy user
  53. Find out the heavy users! (diagram: User Log Spout → Learning Bolt; tuple fields: userid, type, catlv1, catlv2, timestamp)
  54. Find out the heavy users! (diagram: User Log Spout → two Learning Bolts via shuffleGroup; tuple fields: userid, type, catlv1, catlv2, timestamp; sample tuples across the two bolts, payload elided as xxxxx: userA, userB, userD, userB, userE, userC, userB, userC)
  55. Find out the heavy users! (diagram: User Log Spout → two Learning Bolts via fieldGroup; tuple fields: userid, type, catlv1, catlv2, timestamp; sample tuples across the two bolts, payload elided as xxxxx: userA, userD, userF, userF, userE, userC, userB, userB, userB, userC)
  56. Find out the heavy users! (diagram: User Log Spout → two Learning Bolts via fieldGroup → Qualification Bolt; sample tuples into the Learning Bolts, payload elided as xxxxx: userA, userD, userF, userF, userE, userC, userB, userB, userB, userC; tuples into the Qualification Bolt: userA, totalPV; userB, totalPV; userC, totalPV; userF, totalPV)
  57. Questions? ▪ Norman: @normanyhuang, www.linkedin.com/in/normany ▪ Jason: @kalijason, www.linkedin.com/pub/jason-lin/67/93/743
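
The heavy-user hands-on from slides 51 to 56 can be simulated without a Storm cluster. The sketch below is plain Python, not Storm code: the classes `LearningBolt` and `QualificationBolt` only mimic the roles named on the slides, `fields_grouping` and `shuffle_grouping` are hypothetical stand-ins for the routing that Storm's fields and shuffle groupings perform, and the sample log lines are invented. The tuple layout (userid, type, catlv1, catlv2, timestamp) and the "more than 3 page views" rule come from the slides.

```python
import random
import zlib
from collections import Counter

HEAVY_THRESHOLD = 3  # from slide 52: more than 3 page views = heavy user


class LearningBolt:
    """Keeps a running page-view count per user (local to this task)."""

    def __init__(self):
        self.page_views = Counter()

    def execute(self, log):
        user_id = log[0]
        self.page_views[user_id] += 1
        return user_id, self.page_views[user_id]  # (userid, totalPV)


class QualificationBolt:
    """Flags users whose reported total exceeds the threshold."""

    def __init__(self, threshold=HEAVY_THRESHOLD):
        self.threshold = threshold
        self.heavy_users = set()

    def execute(self, tup):
        user_id, total_pv = tup
        if total_pv > self.threshold:
            self.heavy_users.add(user_id)


def fields_grouping(user_id, n_tasks):
    # Same userid always maps to the same task (stable hash).
    return zlib.crc32(user_id.encode("utf-8")) % n_tasks


def shuffle_grouping(n_tasks, rng):
    # Random task: one user's logs may be split across tasks.
    return rng.randrange(n_tasks)


def run(logs, n_tasks=2, grouping="fields", seed=0):
    rng = random.Random(seed)
    learners = [LearningBolt() for _ in range(n_tasks)]
    qualifier = QualificationBolt()
    for log in logs:
        if grouping == "fields":
            task = fields_grouping(log[0], n_tasks)
        else:
            task = shuffle_grouping(n_tasks, rng)
        qualifier.execute(learners[task].execute(log))
    return qualifier.heavy_users


# Invented sample logs: (userid, type, catlv1, catlv2, timestamp)
logs = [
    ("userA", "view", "c1", "c2", 1), ("userB", "view", "c1", "c2", 2),
    ("userB", "view", "c1", "c2", 3), ("userC", "view", "c1", "c2", 4),
    ("userB", "view", "c1", "c2", 5), ("userC", "view", "c1", "c2", 6),
    ("userB", "view", "c1", "c2", 7),
]
print(run(logs, grouping="fields"))
```

With fields grouping, all of a user's logs reach the same Learning Bolt task, so its local count is that user's true total. With shuffle grouping the same user can be counted partly on each task, which is why the slides move from shuffleGroup to fieldGroup before adding the Qualification Bolt.
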
