Building Personalized Applications at Scale

Garrett Wu
Director of Engineering
Odiago, Inc.
Personalized Applications
Examples

● Recommendations
   ○ Amazon
   ○ Netflix
● Ad Targeting
   ○ Hulu
   ○ YouTube
● Fraud Detection
   ○ Visa
   ○ JPMC
● Spam
   ○ GMail
● Search Personalization
   ○ Google
Overall Requirements

● React to events in near real time.
   ○ Low latency reads/writes.
   ○ Event-driven analysis (not just batch).
● Web scale: hundreds of millions of users.
   ○ High throughput reads/writes.
● Reliable.
   ○ Distributed, fault tolerant, graceful degradation.
● Flexible.
   ○ Evolvable schema.
   ○ Support ad-hoc experimentation and analyses.
Data Flow
Datastore Requirements

1. Random writes.
2. Analysis (MapReduce).
3. Random reads.
Data Model Requirements

1. Write user-centric data.
   ○ "Bob bought the Hunger Games book."
   ○ "Sally viewed product page X."
2. Query user-centric data.
   ○ "What were Jim's 5 most recent purchases?"
   ○ "What are Sue's top 3 recommendations?"

Given everything we know about John:
   ● Transactions.
   ● Tweets.
   ● Likes.
... recommend, classify, predict, cluster, profile.
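
The write and query patterns above can be sketched as follows. This is a minimal in-memory illustration, not WibiData or HBase code: the `UserStore` class, its method names, and the sample purchases (`item-1` through `item-7`) are all hypothetical.

```python
from collections import defaultdict


class UserStore:
    """Hypothetical in-memory stand-in for a user-centric datastore."""

    def __init__(self):
        # user -> column -> list of (timestamp, value) pairs
        self._rows = defaultdict(lambda: defaultdict(list))

    def write(self, user, column, value, timestamp):
        """Random write: append one event to a single user's row."""
        self._rows[user][column].append((timestamp, value))

    def most_recent(self, user, column, n):
        """Random read: the n newest values in one column, newest first."""
        cells = sorted(self._rows[user][column],
                       key=lambda cell: cell[0], reverse=True)
        return [value for _, value in cells[:n]]


store = UserStore()
# Made-up sample data: seven purchases by Jim at timestamps 1..7.
for ts in range(1, 8):
    store.write("Jim", "purchases", f"item-{ts}", timestamp=ts)

# "What were Jim's 5 most recent purchases?"
recent = store.most_recent("Jim", "purchases", 5)
print(recent)
```

Because every event lands in the row of the user it describes, both the write and the query touch exactly one row.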
User-centric Data Model

<column>
  <name>email</name>
  <description>Email address</description>
  <schema>"string"</schema>
</column>

Cells have Avro schemas for evolvable storage and retrieval.
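
To illustrate what "evolvable" means here, the sketch below imitates one Avro schema-resolution rule in plain Python: a field that the reader's schema declares with a default is filled in when older data lacks it. It does not use the Avro library, and the `Contact` record, the `verified` field, and the `resolve` helper are assumptions made for illustration.

```python
def resolve(record, reader_schema):
    """Project a stored record onto the reader's schema (simplified rule)."""
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]
        elif "default" in field:
            # Older data lacks this field, so fall back to the default.
            resolved[name] = field["default"]
        else:
            raise ValueError(f"no value or default for field {name!r}")
    return resolved


# Hypothetical newer schema: adds a "verified" flag with a default.
reader_schema_v2 = {
    "type": "record",
    "name": "Contact",
    "fields": [
        {"name": "email", "type": "string"},
        {"name": "verified", "type": "boolean", "default": False},
    ],
}

old_cell = {"email": "bob@example.com"}  # written before "verified" existed
upgraded = resolve(old_cell, reader_schema_v2)
print(upgraded)
```

Cells written under the old schema stay readable after the schema grows, so no rewrite of stored data is needed.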
● 3-D storage: each cell is addressed by row, column, and timestamp.
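
A minimal sketch of timestamped cell storage, assuming a hypothetical `VersionedTable` class that keeps every version of a cell and returns the newest ones on read. The names and sample values are illustrative only.

```python
class VersionedTable:
    """Hypothetical table whose cells keep one value per timestamp."""

    def __init__(self):
        self._cells = {}  # (row, column) -> {timestamp: value}

    def put(self, row, column, timestamp, value):
        self._cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row, column, max_versions=1):
        """Return up to max_versions (timestamp, value) pairs, newest first."""
        versions = self._cells.get((row, column), {})
        newest_first = sorted(versions.items(), reverse=True)
        return newest_first[:max_versions]


table = VersionedTable()
table.put("sue", "email", 100, "sue@old.example")
table.put("sue", "email", 200, "sue@new.example")

latest = table.get("sue", "email")
history = table.get("sue", "email", max_versions=2)
print(latest, history)
```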
Analyzing Data: Producers

● produce() generates derived data for a single row:
   ○ recommend
   ○ profile
   ○ classify
   ○ etc.
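
The per-row contract of a producer can be sketched like this. The signature is an assumption (the slides name only `produce()`), and the "frequent buyer" rule with its threshold of 3 is an arbitrary example of a classifier.

```python
def produce(row):
    """Hypothetical producer: classify a user from that user's row alone."""
    purchases = row.get("purchases", [])
    segment = "frequent_buyer" if len(purchases) >= 3 else "occasional_buyer"
    return {"segment": segment}


def run_producer(table, producer):
    """Apply a producer to each row independently, storing what it derives."""
    for row in table.values():
        row.update(producer(row))


# Made-up sample rows.
table = {
    "Bob": {"purchases": ["a", "b", "c"]},
    "Sally": {"purchases": ["d"]},
}
run_producer(table, produce)
print(table["Bob"]["segment"], table["Sally"]["segment"])
```

Since each call sees only one row, the same function could run in a batch job over every user or on demand for a single user.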
Analyzing Data: Gatherers

● gather() aggregates data across all rows.
   ○ Build association rules for collaborative filtering.
   ○ Train classifier models.
   ○ Compute prior probabilities for events.
   ○ etc.
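
As one example of a cross-row aggregate, the sketch below computes prior probabilities for event types. The `gather` signature, the `events` column, and the sample rows are assumptions for illustration.

```python
from collections import Counter


def gather(table):
    """Hypothetical gatherer: event-type priors over all rows."""
    counts = Counter()
    for row in table.values():
        counts.update(row.get("events", []))
    total = sum(counts.values())
    return {event: count / total for event, count in counts.items()}


# Made-up sample rows.
table = {
    "Bob": {"events": ["view", "view", "purchase"]},
    "Sally": {"events": ["view"]},
}
priors = gather(table)
print(priors)
```

Unlike a producer, the result depends on every row, which is why this kind of analysis suits a full-table MapReduce job.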
Example: Ad Targeting

User    Games                                  Interests      Recommended Ads
Alex    MiniGolf Pro, Extreme Pond Fishing
Bob     Kitten Krash
Carol   Apples Everywhere, Underground Racer

Game                Categories
MiniGolf Pro        Golf, Sports
Kitten Krash        Cats, Racing
Apples Everywhere   Puzzles
Example: Ad Targeting

User    Games                                  Interests      Recommended Ads
Alex    MiniGolf Pro, Extreme Pond Fishing     Golf, Sports
Bob     Kitten Krash
Carol   Apples Everywhere, Underground Racer

Producer (fills in Interests from the Game/Categories table)

Game                Categories
MiniGolf Pro        Golf, Sports
Kitten Krash        Cats, Racing
Apples Everywhere   Puzzles
Example: Ad Targeting

User    Games                                  Interests      Recommended Ads
Alex    MiniGolf Pro, Extreme Pond Fishing     Golf, Sports   ESPN.com
Bob     Kitten Krash
Carol   Apples Everywhere, Underground Racer

Producer (fills in Recommended Ads from the Category/Advertisement table)

Category   Advertisement
Golf       ESPN.com
Animals    Petco.com
Racing     Nascar.com
Example: Ad Targeting

User    Games                                  Interests      Recommended Ads
Alex    MiniGolf Pro, Extreme Pond Fishing     Golf, Sports   ESPN.com
Bob     Kitten Krash
Carol   Apples Everywhere, Underground Racer

Producer (fills in Recommended Ads from the Category/Advertisement table)

Category   Advertisement
Golf       ESPN.com
Animals    Petco.com
Racing     Nascar.com

Wait, where did this come from?
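
The two producer steps in this example can be sketched as below, using the lookup tables from the slides. The function names and row layout are assumptions. The slides show results only for Alex, so the values for Bob and Carol are simply what this sketch yields.

```python
GAME_CATEGORIES = {
    "MiniGolf Pro": ["Golf", "Sports"],
    "Kitten Krash": ["Cats", "Racing"],
    "Apples Everywhere": ["Puzzles"],
}
CATEGORY_ADS = {
    "Golf": "ESPN.com",
    "Animals": "Petco.com",
    "Racing": "Nascar.com",
}


def produce_interests(row):
    """Derive interests from the categories of the games a user plays."""
    interests = []
    for game in row["games"]:
        for category in GAME_CATEGORIES.get(game, []):
            if category not in interests:
                interests.append(category)
    return {"interests": interests}


def produce_ads(row):
    """Derive recommended ads from a user's interests."""
    ads = []
    for category in row.get("interests", []):
        ad = CATEGORY_ADS.get(category)
        if ad is not None and ad not in ads:
            ads.append(ad)
    return {"recommended_ads": ads}


users = {
    "Alex": {"games": ["MiniGolf Pro", "Extreme Pond Fishing"]},
    "Bob": {"games": ["Kitten Krash"]},
    "Carol": {"games": ["Apples Everywhere", "Underground Racer"]},
}
for row in users.values():
    row.update(produce_interests(row))  # step 1: games -> interests
    row.update(produce_ads(row))        # step 2: interests -> ads

print(users["Alex"])
```

Games and categories missing from the lookup tables (such as Extreme Pond Fishing) contribute nothing, which is one reason the quality of those tables matters.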
Example: Gathering Associations

User    Games                                  Interests      Clicked Ads
Alex    MiniGolf Pro, Extreme Pond Fishing     Golf, Sports
Bob     Kitten Krash
Carol   Apples Everywhere, Underground Racer

[Figure: user rows flow through a Map stage, then a Reduce stage.]
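
The slide images for this step did not survive extraction, so the following is only one plausible way to gather interest-to-ad associations with a map and a reduce phase. The click data, function names, and the "most-clicked ad wins" rule are all assumptions.

```python
from collections import Counter, defaultdict


def map_row(row):
    """Map: emit an (interest, ad) pair for each interest and clicked ad."""
    for interest in row["interests"]:
        for ad in row["clicked_ads"]:
            yield interest, ad


def reduce_interest(interest, ads):
    """Reduce: keep the ad clicked most often for this interest."""
    return Counter(ads).most_common(1)[0][0]


def gather_associations(rows):
    grouped = defaultdict(list)  # shuffle: group mapped values by key
    for row in rows:
        for interest, ad in map_row(row):
            grouped[interest].append(ad)
    return {interest: reduce_interest(interest, ads)
            for interest, ads in grouped.items()}


# Hypothetical rows; the Clicked Ads column is empty in the slide text.
rows = [
    {"interests": ["Golf", "Sports"], "clicked_ads": ["ESPN.com"]},
    {"interests": ["Golf"], "clicked_ads": ["ESPN.com", "Petco.com"]},
    {"interests": ["Racing"], "clicked_ads": ["Nascar.com"]},
]
associations = gather_associations(rows)
print(associations)
```

The output has the same shape as a Category/Advertisement lookup table, so a gatherer of this kind could supply the table that an ad-recommending producer reads.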
Final Thoughts

● A user-centric data storage model has great advantages:
   ○ Fast per-user reads and writes.
   ○ Already pivoted by your most common analysis.
● HBase provides fast, reliable random access and scans.
   ○ Billions of rows, millions of columns.
   ○ Integrates well with MapReduce for analysis.


● Build scalable personalized applications with WibiData.
   ○ Check out www.wibidata.com




                                          Garrett Wu | gwu@odiago.com
