Building Personalized Applications at Scale


Garrett Wu presents WibiData to the Bay Area Software Engineering meetup.

Published in: Technology, Sports

Building Personalized Applications at Scale

  1. Building Personalized Applications at Scale. Garrett Wu, Director of Engineering, Odiago, Inc.
  2. Personalized Applications
  3. Personalized Applications
  4. Examples
      ● Recommendations: Amazon, Netflix
      ● Ad Targeting: Hulu, YouTube
      ● Fraud Detection: Visa, JPMC
      ● Spam: Gmail
      ● Search Personalization: Google
  5. Overall Requirements
      ● React to events in near real time.
          ○ Low-latency reads/writes.
          ○ Event-driven analysis (not just batch).
      ● Web scale: hundreds of millions of users.
          ○ High-throughput reads/writes.
      ● Reliable.
          ○ Distributed, fault tolerant, graceful degradation.
      ● Flexible.
          ○ Evolvable schema.
          ○ Support ad hoc experimentation and analyses.
  6. Data Flow
  7. Data Flow
  8. Datastore Requirements
      1. Random writes.
      2. Analysis (MapReduce).
      3. Random reads.
  9. Datastore Requirements
      1. Random writes.
      2. Analysis (MapReduce).
      3. Random reads.
  10. Data Model Requirements
      1. Write user-centric data.
          ○ "Bob bought the Hunger Games book."
          ○ "Sally viewed product page X."
      2. Query user-centric data.
          ○ "What were Jim's most recent 5 purchases?"
          ○ "What are Sue's top 3 recommendations?"
      Given everything we know about John:
          ● Transactions.
          ● Tweets.
          ● Likes.
      ... recommend, classify, predict, cluster, profile.
  11. User-centric Data Model
  12. User-centric Data Model
      <column>
        <name>email</name>
        <description>Email address</description>
        <schema>"string"</schema>
      </column>
      Cells have Avro schemas for evolvable storage and retrieval.
  13. User-centric Data Model
      ● 3-D storage with timestamps.
  14. Analyzing Data: Producers
      ● produce() generates derived data for a single row:
          ○ recommend
          ○ profile
          ○ classify
          ○ etc.
  15. Analyzing Data: Gatherers
      ● gather() aggregates data across all rows:
          ○ build association rules for collaborative filtering.
          ○ train classifier models.
          ○ compute prior probabilities for events.
          ○ etc.
  16. Example: Ad Targeting
      User   Games                                 Interests     Recommended Ads
      Alex   MiniGolf Pro, Extreme Pond Fishing
      Bob    Kitten Krash
      Carol  Apples Everywhere, Underground Racer

      Game               Categories
      MiniGolf Pro       Golf, Sports
      Kitten Krash       Cats, Racing
      Apples Everywhere  Puzzles
  17. Example: Ad Targeting
      User   Games                                 Interests     Recommended Ads
      Alex   MiniGolf Pro, Extreme Pond Fishing    Golf, Sports
      Bob    Kitten Krash
      Carol  Apples Everywhere, Underground Racer
      (Slide annotation: "Producer".)

      Game               Categories
      MiniGolf Pro       Golf, Sports
      Kitten Krash       Cats, Racing
      Apples Everywhere  Puzzles
  18. Example: Ad Targeting
      User   Games                                 Interests     Recommended Ads
      Alex   MiniGolf Pro, Extreme Pond Fishing    Golf, Sports  ESPN.com
      Bob    Kitten Krash
      Carol  Apples Everywhere, Underground Racer
      (Slide annotation: "Producer".)

      Category  Advertisement
      Golf      ESPN.com
      Animals   Petco.com
      Racing    Nascar.com
  19. Example: Ad Targeting
      User   Games                                 Interests     Recommended Ads
      Alex   MiniGolf Pro, Extreme Pond Fishing    Golf, Sports  ESPN.com
      Bob    Kitten Krash
      Carol  Apples Everywhere, Underground Racer
      (Slide annotation: "Producer".)

      Category  Advertisement
      Golf      ESPN.com
      Animals   Petco.com
      Racing    Nascar.com
      (Callout on the Category/Advertisement table: "Wait, where did this come from?")
  20. Example: Gathering Associations
      User   Games                                 Interests     Clicked Ads
      Alex   MiniGolf Pro, Extreme Pond Fishing    Golf, Sports
      Bob    Kitten Krash
      Carol  Apples Everywhere, Underground Racer
  21. Example: Gathering Associations
      User   Games                                 Interests     Clicked Ads
      Alex   MiniGolf Pro, Extreme Pond Fishing    Golf, Sports
      Bob    Kitten Krash
      Carol  Apples Everywhere, Underground Racer
  22. Example: Gathering Associations
  23. Example: Gathering Associations
  24. Example: Gathering Associations
  25. Example: Gathering Associations
  26. Example: Gathering Associations (diagram: Map)
  27. Example: Gathering Associations (diagram: Map, Reduce)
  28. Final Thoughts
      ● A user-centric data storage model has great advantages:
          ○ Fast per-user reads and writes.
          ○ Already pivoted by your most common analysis.
      ● HBase provides fast, reliable random access and scans.
          ○ Billions of rows, millions of columns.
          ○ Integrates well with MapReduce for analysis.
      ● Build scalable personalized applications with WibiData.
          ○ Check out www.wibidata.com
      Garrett Wu | gwu@odiago.com
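
The producer idea from slides 14 and 16 to 18 can be sketched in a few lines of Python. This is only an illustration, not the WibiData API: the `UserTable` class, its `put`/`get` methods, the column names, and the two `produce_*` functions are all hypothetical names invented here. The lookup tables are copied from the slides. Each row is a user, each column holds timestamped cells (the "3-D storage" of slide 13), and a producer reads one row and writes derived data back to that same row.

```python
from collections import defaultdict

# Lookup tables copied from the ad-targeting slides.
GAME_CATEGORIES = {
    "MiniGolf Pro": ["Golf", "Sports"],
    "Kitten Krash": ["Cats", "Racing"],
    "Apples Everywhere": ["Puzzles"],
}
CATEGORY_ADS = {"Golf": "ESPN.com", "Animals": "Petco.com", "Racing": "Nascar.com"}


class UserTable:
    """Hypothetical in-memory stand-in for a user-centric table:
    row (user) -> column -> list of (timestamp, value) cells, newest first."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, user, column, timestamp, value):
        cells = self._rows[user][column]
        cells.append((timestamp, value))
        cells.sort(key=lambda cell: cell[0], reverse=True)

    def get(self, user, column, max_versions=1):
        # Read without creating rows or columns for unknown keys.
        cells = self._rows.get(user, {}).get(column, [])
        return [value for _, value in cells[:max_versions]]


def produce_interests(table, user, timestamp):
    """Producer: derive interests for ONE row from the games in that row."""
    interests = []
    for game in table.get(user, "games", max_versions=100):
        for category in GAME_CATEGORIES.get(game, []):
            if category not in interests:
                interests.append(category)
    table.put(user, "interests", timestamp, interests)


def produce_ads(table, user, timestamp):
    """Producer: derive recommended ads for ONE row from its interests."""
    latest = table.get(user, "interests")
    interests = latest[0] if latest else []
    ads = [CATEGORY_ADS[c] for c in interests if c in CATEGORY_ADS]
    table.put(user, "recommended_ads", timestamp, ads)


table = UserTable()
table.put("Alex", "games", 1, "MiniGolf Pro")
table.put("Alex", "games", 2, "Extreme Pond Fishing")
table.put("Bob", "games", 1, "Kitten Krash")
for user in ("Alex", "Bob"):
    produce_interests(table, user, timestamp=3)
    produce_ads(table, user, timestamp=4)

print(table.get("Alex", "interests")[0])        # ['Golf', 'Sports']
print(table.get("Alex", "recommended_ads")[0])  # ['ESPN.com']
```

For Alex this reproduces the values shown on slide 18. Because each producer touches a single row, the same function could in principle run in batch over every user or on demand for one user.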

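The gatherer idea from slides 15 and 20 to 27 can be sketched the same way. A gatherer aggregates across all rows, and the slides depict it as a map step followed by a reduce step. The code below is a plain-Python illustration, not Hadoop or WibiData code. The function names are hypothetical, and the click data is made up, since the transcript preserves the "Clicked Ads" column header but none of its values. Counting how often each interest co-occurs with a clicked ad is one plausible way to arrive at a Category/Advertisement table like the one on slide 18.

```python
from collections import defaultdict
from itertools import chain

# Illustrative rows only: the click values (and the user "Dana") are invented
# for this sketch and do not come from the slides.
ROWS = {
    "Alex": {"interests": ["Golf", "Sports"], "clicked_ads": ["ESPN.com"]},
    "Bob": {"interests": ["Cats", "Racing"], "clicked_ads": ["Nascar.com"]},
    "Dana": {"interests": ["Golf"], "clicked_ads": ["ESPN.com", "Nascar.com"]},
}


def gather_map(user, row):
    """Map step: emit ((interest, ad), 1) for every pair seen in one row."""
    for interest in row.get("interests", []):
        for ad in row.get("clicked_ads", []):
            yield (interest, ad), 1


def gather_reduce(pairs):
    """Reduce step: sum the counts emitted for each (interest, ad) key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def best_ad_per_interest(totals):
    """Keep the most-clicked ad per interest; ties go to the ad that sorts first."""
    best = {}
    for (interest, ad), count in sorted(totals.items()):
        if interest not in best or count > best[interest][1]:
            best[interest] = (ad, count)
    return {interest: ad for interest, (ad, _) in best.items()}


mapped = chain.from_iterable(gather_map(u, r) for u, r in ROWS.items())
totals = gather_reduce(mapped)
associations = best_ad_per_interest(totals)

print(totals[("Golf", "ESPN.com")])  # 2
print(associations["Golf"])          # ESPN.com
```

The resulting `associations` dictionary plays the role of the lookup table a producer would consult when recommending ads for a single user.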