Building Personalized Applications at Scale

Garrett Wu
Director of Engineering
Odiago, Inc.
Personalized Applications
Examples

● Recommendations
   ○ Amazon
   ○ Netflix
● Ad Targeting
   ○ Hulu
   ○ YouTube
● Fraud Detection
   ○ Visa
   ○ JPMC
● Spam
   ○ Gmail
● Search Personalization
   ○ Google
Overall Requirements

● React to events in near real time.
   ○ Low latency reads/writes.
   ○ Event-driven analysis (not just batch).
● Web scale: hundreds of millions of users.
   ○ High throughput reads/writes.
● Reliable.
   ○ Distributed, fault tolerant, graceful degradation.
● Flexible.
   ○ Evolvable schema.
   ○ Support ad-hoc experimentation and analyses.
Data Flow
Datastore Requirements

1. Random writes.
2. Analysis (MapReduce).
3. Random reads.
Data Model Requirements

 1. Write user-centric data.
     ○ "Bob bought the Hunger Games book."
     ○ "Sally viewed product page X."
 2. Query user-centric data.
     ○ "What were Jim's most recent 5 purchases?"
     ○ "What are Sue's top 3 recommendations?"

Given everything we know about John:
   ● Transactions.
   ● Tweets.
   ● Likes.
... recommend, classify, predict, cluster, profile.
User-centric Data Model

<column>
  <name>email</name>
  <description>Email address</description>
  <schema>"string"</schema>
</column>

Cells have Avro schemas for evolvable storage and retrieval.
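
As a rough illustration of what "evolvable" means here, the sketch below resolves an old stored record against a newer reader schema by filling in defaults. It is a simplified plain-Python model of the idea, not the Avro library API; the `verified` field, the schema layout, and the `resolve` helper are assumptions made up for this example.

```python
# Hypothetical, simplified model of schema evolution (not the Avro API).

# Writer schema (v1): cells were originally written with only an email.
SCHEMA_V1 = [{"name": "email"}]

# Reader schema (v2): adds a field with a default so old cells stay readable.
SCHEMA_V2 = [
    {"name": "email"},
    {"name": "verified", "default": False},  # assumed field, for illustration
]


def resolve(record, reader_schema):
    """Project a stored record onto the reader schema, filling defaults."""
    resolved = {}
    for field in reader_schema:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]
        elif "default" in field:
            resolved[name] = field["default"]
        else:
            raise ValueError(f"no value or default for field {name!r}")
    return resolved  # fields unknown to the reader are dropped


old_cell = {"email": "bob@example.com"}
print(resolve(old_cell, SCHEMA_V2))
```

Old data does not need to be rewritten when a field is added, as long as the new field carries a default.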

 ● 3-D storage: each cell is addressed by row, column, and timestamp.
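
A minimal in-memory sketch of timestamped, three-dimensional storage, assuming one row per user. The `CellStore` class and its method names are hypothetical and stand in for the real datastore; the point is only that keeping several timestamped versions per cell makes a query like "Jim's most recent purchases" a direct lookup.

```python
from collections import defaultdict


class CellStore:
    """Toy table: row -> column -> list of (timestamp, value), newest first."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, timestamp, value):
        """Random write: add one timestamped version to a cell."""
        cells = self._rows[row][column]
        cells.append((timestamp, value))
        cells.sort(key=lambda cell: cell[0], reverse=True)

    def get(self, row, column, max_versions=1):
        """Random read: return up to max_versions values, newest first."""
        cells = self._rows.get(row, {}).get(column, [])
        return [value for _, value in cells[:max_versions]]


store = CellStore()
store.put("jim", "purchases", 100, "tent")
store.put("jim", "purchases", 300, "lantern")
store.put("jim", "purchases", 200, "stove")
recent = store.get("jim", "purchases", max_versions=2)
print(recent)
```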
Analyzing Data: Producers

 ● produce() generates derived data for a single row:
    ○ recommend
    ○ profile
    ○ classify
    ○ etc.
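
A sketch of the producer idea, assuming a row is a plain dict of columns. Only the name produce() comes from the slide; the signature, the `run_producer` driver, and the "profile" column are illustrative assumptions, not the WibiData API.

```python
from collections import Counter


def produce(row):
    """Derive a small profile from one user's row, reading nothing else."""
    purchases = row.get("purchases", [])
    categories = Counter(p["category"] for p in purchases)
    favorite = categories.most_common(1)[0][0] if categories else None
    return {"favorite_category": favorite, "purchase_count": len(purchases)}


def run_producer(table):
    """Apply produce() to each row independently and store the result."""
    for row in table.values():
        row["profile"] = produce(row)


table = {
    "bob": {"purchases": [
        {"item": "Hunger Games", "category": "books"},
        {"item": "Atlas", "category": "books"},
        {"item": "Chess set", "category": "games"},
    ]},
    "sally": {},
}
run_producer(table)
print(table["bob"]["profile"])
```

Because each call touches only one row, the rows can be processed in any order or in parallel.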
Analyzing Data: Gatherers

● gather() aggregates data across all rows.
   ○ build association rules for collaborative filtering.
   ○ train classifier models.
   ○ compute prior probabilities for events.
   ○ etc.
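
A sketch of the gatherer idea using one of the listed cases, prior probabilities for events. It is written as a map step per row followed by a reduce step over all emitted pairs; the function names other than gather(), and the "events" column, are assumptions for illustration.

```python
from collections import Counter


def map_row(row):
    """Map step: emit (event_type, 1) for every event in one row."""
    for event in row.get("events", []):
        yield event, 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per event type."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals


def gather(table):
    """Aggregate across all rows into prior probabilities per event type."""
    pairs = (pair for row in table.values() for pair in map_row(row))
    totals = reduce_counts(pairs)
    n = sum(totals.values())
    return {event: count / n for event, count in totals.items()} if n else {}


table = {
    "a": {"events": ["view", "view", "buy"]},
    "b": {"events": ["view"]},
}
priors = gather(table)
print(priors)
```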
Example: Ad Targeting

User    Games                                   Interests      Recommended Ads
Alex    MiniGolf Pro, Extreme Pond Fishing
Bob     Kitten Krash
Carol   Apples Everywhere, Underground Racer

Game                Categories
MiniGolf Pro        Golf, Sports
Kitten Krash        Cats, Racing
Apples Everywhere   Puzzles
Example: Ad Targeting

User    Games                                   Interests      Recommended Ads
Alex    MiniGolf Pro, Extreme Pond Fishing      Golf, Sports
Bob     Kitten Krash
Carol   Apples Everywhere, Underground Racer

Producer: fills in Interests, one row at a time.

Game                Categories
MiniGolf Pro        Golf, Sports
Kitten Krash        Cats, Racing
Apples Everywhere   Puzzles
Example: Ad Targeting

User    Games                                   Interests      Recommended Ads
Alex    MiniGolf Pro, Extreme Pond Fishing      Golf, Sports   ESPN.com
Bob     Kitten Krash
Carol   Apples Everywhere, Underground Racer

Producer: fills in Recommended Ads, one row at a time.

Category   Advertisement
Golf       ESPN.com
Animals    Petco.com
Racing     Nascar.com
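
The two producer steps in this example can be sketched as plain lookups, assuming the Game/Categories and Category/Advertisement tables are available as dictionaries. The function names are hypothetical. Only Alex's results appear on the slides; anything the code prints for other users follows from the tables, not from the deck.

```python
# Lookup tables copied from the slides.
GAME_CATEGORIES = {
    "MiniGolf Pro": ["Golf", "Sports"],
    "Kitten Krash": ["Cats", "Racing"],
    "Apples Everywhere": ["Puzzles"],
}
CATEGORY_ADS = {
    "Golf": "ESPN.com",
    "Animals": "Petco.com",
    "Racing": "Nascar.com",
}


def produce_interests(games):
    """Step 1: union of the categories of a user's games, in first-seen order."""
    interests = []
    for game in games:
        for category in GAME_CATEGORIES.get(game, []):  # unknown games add nothing
            if category not in interests:
                interests.append(category)
    return interests


def produce_ads(interests):
    """Step 2: ads associated with any of the user's interests."""
    ads = []
    for category in interests:
        ad = CATEGORY_ADS.get(category)
        if ad and ad not in ads:
            ads.append(ad)
    return ads


alex_interests = produce_interests(["MiniGolf Pro", "Extreme Pond Fishing"])
alex_ads = produce_ads(alex_interests)
print(alex_interests, alex_ads)
```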
Example: Ad Targeting

Wait, where did the Category/Advertisement table come from?
Example: Gathering Associations

User    Games                                   Interests      Clicked Ads
Alex    MiniGolf Pro, Extreme Pond Fishing      Golf, Sports
Bob     Kitten Krash
Carol   Apples Everywhere, Underground Racer

[Figure: gathering associations as a MapReduce job (Map, then Reduce).]
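
The slide images for this step are not available, so the following is only one plausible reading of it: a map step emits an (interest, clicked ad) pair for every combination in a user's row, and a reduce step counts the pairs and keeps the most-clicked ad per interest. The click data, function names, and the "most-clicked wins" rule are all assumptions for illustration.

```python
from collections import Counter
from itertools import product


def map_user(row):
    """Map step: emit ((interest, ad), 1) for each interest x clicked ad."""
    for interest, ad in product(row["interests"], row["clicked_ads"]):
        yield (interest, ad), 1


def reduce_pairs(pairs):
    """Reduce step: total co-occurrence count per (interest, ad) pair."""
    counts = Counter()
    for key, n in pairs:
        counts[key] += n
    return counts


def gather_associations(table):
    """Build an interest -> ad table from the most frequent pairing."""
    counts = reduce_pairs(p for row in table.values() for p in map_user(row))
    best = {}
    for (interest, ad), n in counts.items():
        if interest not in best or n > best[interest][1]:
            best[interest] = (ad, n)
    return {interest: ad for interest, (ad, _) in best.items()}


# Hypothetical click data; the deck's Clicked Ads values are not recoverable.
clicks = {
    "u1": {"interests": ["Golf", "Sports"], "clicked_ads": ["ESPN.com"]},
    "u2": {"interests": ["Golf"], "clicked_ads": ["ESPN.com", "Nascar.com"]},
    "u3": {"interests": ["Racing"], "clicked_ads": ["Nascar.com"]},
}
associations = gather_associations(clicks)
print(associations)
```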
Final Thoughts

● A user-centric data storage model has great advantages:
   ○ Fast per-user reads and writes.
   ○ Already pivoted by your most common analysis.
● HBase provides fast, reliable random access and scans.
   ○ Billions of rows, millions of columns.
   ○ Integrates well with MapReduce for analysis.
● Build scalable personalized applications with WibiData.
   ○ Check out www.wibidata.com

Garrett Wu | gwu@odiago.com
