Recommendations and User Understanding
          at StumbleUpon

Chief Data Scientist Summit, San Diego, February 2013

                      Debora Donato
                  Principal Data Scientist

                              Slides courtesy of
       Vishal Vaingankar, Tim Abraham, Roberto Sanabria, Ulas Bardak
StumbleUpon’s Mission

Help users find content they did not expect to find
 Be the best way to discover new
and interesting things from across
             the Web.
How StumbleUpon works
1. Register   2. Tell us your interests    3. Start Stumbling and
                                           rating web pages




                                We use your interests and behavior to
                                recommend new content for you!
StumbleUpon
(comparison with other services; the service logos are not recoverable from the slide)

Top left:
•   Single item type
•   << 100K items
•   < 250 categories
•   Hand-labeled
•   ~27M users

Top right:
•   No serendipity
•   Many at a time
•   Not personalized*
•   Repeats

Center:
•   +100M items
•   >600 recs/mo.
•   Auto features
•   ~200 methods

Bottom left:
•   Mostly about presentation
•   Social recs only
•   10 million recs/month

Bottom right:
•   Hand-labeled
•   Item-item similarity based methods
Data-driven culture

Data science: Applied Research and Analytics

15% of the total workforce
Extensive A/B Testing




A/B tests on metrics such as session length, retention,
and rating behavior
Outline of the talk
• The recommendation pipeline

• Showcases:
  – Mobile optimization
  – Power User Understanding
  – Lists
Discovery is very different from search


Discovery at StumbleUpon                  Search
     Serendipitous                     Intent driven
      One at a time                   List of articles
     Never repeats                    Always repeats
   Constantly adapting                 Fixed results
     Tailored for you                  Impersonal

    There is an ongoing shift from search to discovery
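StumbleUpon Overview

[Diagram: (1) URLs arrive from user discovery and automated feeds and enter the ingestion pipeline; (2) sampling, followed by a pass check; (3) on "yes", the URL goes to the URL index used by the rec engine.]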
Grow User’s Interest Graph:
Implicit + Explicit

[Diagram: the user at the center, connected to Experts, Friends, Likeminded Users, News, Trending, topics (Food/Cooking with Italian Recipes, Cars with Vintage Cars), and sites (nasa.gov, 1x.com).]
Mobile Optimization
Changing Ecosystem

[Figure: percent of total stumbles (0% to 100%) by source (mobile vs. desktop), by date from 2011-01 to 2013-01.]
Webpages on Desktop Vs. Mobile
Finding mobile optimized content

Content Features (HTML tags, #links, #images, #videos):
    P (URL_good | {f1, f2, ...}) = ?

User Feedback:
    P (URL_good | {f1, f2, ...}) = ?
User Feedback signals to determine mobile optimization

[Figure: CDF of time (sec) spent on thumbed-up stumbles, with the skip threshold (secs) marked at 0.05.]

A URL is skipped when timespent <= skip_threshold

Skip_rate = (# skips) / (# stumbles)
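The skip definition can be sketched in a few lines of Python. This is an illustration only: the function names, the sample times, and the reading of the 0.05 mark as "the time below which 5% of thumbed-up stumbles fall" are assumptions, not details confirmed by the slides.

```python
import math
from typing import Sequence


def skip_threshold_from_cdf(thumbed_up_secs: Sequence[float], q: float = 0.05) -> float:
    """Smallest observed time t such that at least a fraction q of
    thumbed-up stumbles lasted <= t (empirical CDF read-off; assumed reading)."""
    if not thumbed_up_secs:
        raise ValueError("need at least one thumbed-up stumble")
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    ordered = sorted(thumbed_up_secs)
    index = math.ceil(q * len(ordered)) - 1
    return ordered[index]


def skip_rate(time_spent_secs: Sequence[float], skip_threshold: float) -> float:
    """Skip_rate = # skips / # stumbles, where a stumble counts as a skip
    when timespent <= skip_threshold."""
    if not time_spent_secs:
        raise ValueError("need at least one stumble")
    skips = sum(1 for t in time_spent_secs if t <= skip_threshold)
    return skips / len(time_spent_secs)


# Hypothetical time-spent values (seconds) for the stumbles of one URL.
url_times = [1.0, 2.5, 40.0, 3.0, 120.0]
rate = skip_rate(url_times, skip_threshold=3.0)  # 3 of the 5 stumbles are skips
```

Deriving the threshold from thumbed-up stumbles ties "skip" to what engaged users actually do, rather than to an arbitrary number of seconds.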
Cross-device skip rate prediction

[Figure: scatter plot of mobile skip rate (y-axis) against desktop skip rate (x-axis), with regions labeled "URLs good on both devices", "URLs bad on both devices", and "URLs worse on mobile vs desktop".]

E[Mobile_skiprate] = Desktop_skiprate x Slope + Bias
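A minimal sketch of how the linear relation could be fitted and used, assuming ordinary least squares over per-URL (desktop, mobile) skip-rate pairs. The fitting method, the `margin` parameter, and the sample data are assumptions for illustration; the slides only give the form of the line.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + bias."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("desktop skip rates must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    bias = mean_y - slope * mean_x
    return slope, bias


def worse_on_mobile(desktop_rate: float, mobile_rate: float,
                    slope: float, bias: float, margin: float) -> bool:
    """Flag a URL whose observed mobile skip rate exceeds the expected
    value by more than `margin` (a hypothetical tolerance)."""
    expected_mobile = desktop_rate * slope + bias
    return mobile_rate > expected_mobile + margin


# Hypothetical per-URL skip rates.
desktop = [0.10, 0.20, 0.30, 0.40]
mobile = [0.15, 0.25, 0.35, 0.45]
slope, bias = fit_line(desktop, mobile)
flagged = worse_on_mobile(0.20, 0.60, slope, bias, margin=0.10)
```

URLs that sit well above the fitted line are the candidates for "worse on mobile vs desktop"; URLs high on both axes are bad on both devices regardless of the fit.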
AB RESULTS
User Understanding
Usage mining
Power user definition
• Is it a loyal user who has been stumbling, even occasionally, for years?

• Is it a user who stumbles regularly (daily or weekly)?

• Is it a user who is able to discover good content?

• Or one who interacts (rates, creates lists, shares content, invites friends)?
Stumble rate

•   Sample of ~5M users active in the last 3 months
•   Excluded users that had < 10 DOA
•   Global avg: 39.2 SPD
•   Top 10% avg: 71 SPD
•   25% of users have SPD >= 31.7

•   max dist. cut off: 25.2 SPD
•   50% dist cut off: 31.7 SPD
Activity Day Rate

ADR = (# active_days) / account_age

•   Max error: ~70%, 1.3% of the observations above that rate.
•   Intercept: ~85%, 0.25% of the observations above that rate.
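The ADR ratio and the "share of observations above a rate" figures can be sketched as follows. The function names and sample values are hypothetical, and the code assumes account age is measured in days.

```python
from typing import Sequence


def activity_day_rate(active_days: int, account_age_days: int) -> float:
    """ADR = # active days / account age (account age in days, an assumption)."""
    if account_age_days <= 0:
        raise ValueError("account age must be positive")
    if not 0 <= active_days <= account_age_days:
        raise ValueError("active days must lie between 0 and the account age")
    return active_days / account_age_days


def share_above(rates: Sequence[float], cutoff: float) -> float:
    """Fraction of users whose ADR is strictly above `cutoff`."""
    if not rates:
        raise ValueError("need at least one observation")
    return sum(1 for r in rates if r > cutoff) / len(rates)


# Hypothetical users given as (active days, account age in days).
users = [(73, 365), (300, 365), (10, 100), (95, 100)]
rates = [activity_day_rate(active, age) for active, age in users]
heavy_share = share_above(rates, cutoff=0.70)  # 2 of 4 users are above 70%
```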
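Ranking users and content

[Diagram: nodes i = 1..m on one side and nodes j = 1..n on the other, connected by edges r_ij; the relations shown are content discovery and content “likes”.]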
Normalizations

• By the total number of objects discovered

• By the total number of ratings

• By the total number of stumbles of the pages

• By taking into account the time of the rating
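One possible reading of two of these normalizations, as a sketch: score each discoverer by the likes their discovered pages received, then divide by the number of objects discovered, or normalize each page's likes by its stumble count first. The scoring scheme, the data layout, and all names below are assumptions; the slides do not specify the ranking method.

```python
from collections import defaultdict
from typing import Dict


def discoverer_scores(discoverer_of: Dict[str, str],
                      likes: Dict[str, int],
                      stumbles: Dict[str, int]) -> Dict[str, Dict[str, float]]:
    """Per-user scores from the pages that user discovered.

    raw         : total likes on the user's discovered pages
    per_object  : raw divided by the number of pages discovered
    per_stumble : mean like-per-stumble ratio over the discovered pages
    """
    raw = defaultdict(float)
    ratio_sum = defaultdict(float)
    discovered = defaultdict(int)
    for url, user in discoverer_of.items():
        page_likes = likes.get(url, 0)
        page_stumbles = stumbles.get(url, 0)
        raw[user] += page_likes
        discovered[user] += 1
        if page_stumbles > 0:
            ratio_sum[user] += page_likes / page_stumbles
    return {
        user: {
            "raw": raw[user],
            "per_object": raw[user] / discovered[user],
            "per_stumble": ratio_sum[user] / discovered[user],
        }
        for user in discovered
    }


# Hypothetical data: which user discovered each page, plus like and stumble counts.
scores = discoverer_scores(
    discoverer_of={"a": "u1", "b": "u1", "c": "u2"},
    likes={"a": 10, "b": 0, "c": 4},
    stumbles={"a": 100, "b": 50, "c": 8},
)
```

In this toy data u1 leads on raw likes, while u2 leads on likes per stumble, which shows why the choice of normalization changes the ranking.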
Lists
• Released in
  September 2012

• 45,000 lists
  created in the
  first months

• 2.9M total lists by
  February 2013
List by numbers

• Percentage of users who created more
  than 1 list in their first week of activity:
  10%

• Percentage of users who added at least 2
  pages to a list in their first week of activity:
  15%
URLs distribution

[Figure: number of URLs in a list (y-axis, 0 to 20) by quantile (x-axis, 0% to 100%).]
Content diversity
List distribution by number of topics

[Figure: count of lists (y-axis, 0e+00 to 1e+05) by number of topics in lists (x-axis, 0 to 151).]
Topic Classification - Minos

Preprocessing:
1. Cleanup: remove stopwords and numbers
2. Stem: remove suffixes
3. Build n-grams: combinations of sequential words
4. Wiki check: eliminate tokens that do not exist as articles in English Wikipedia

Classification of a token sequence W = (w_1, ..., w_n) into category C_i (the formulas are those of a naive Bayes classifier; the posterior is given up to the normalizing constant p(W)):

\[ p(C_i \mid w_1, w_2, \dots, w_n) \propto p(w_1, w_2, \dots, w_n \mid C_i) \times p(C_i) \]

\[ p(W \mid C_i) = \prod_{k=1}^{n} p(w_k \mid C_i) \]

\[ p(C_i \mid W) \propto p(C_i) \times \prod_{k=1}^{n} p(w_k \mid C_i) \]
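The product formula can be sketched as a small multinomial naive Bayes classifier. This is not the Minos implementation: the class name, the add-one (Laplace) smoothing, the use of log probabilities, and the toy training data are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Iterable, List, Tuple


class NaiveBayesTopics:
    """Scores log p(C_i) + sum_k log p(w_k | C_i) for each category C_i."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha                        # smoothing constant (assumed)
        self.class_docs = Counter()               # documents seen per category
        self.token_counts = defaultdict(Counter)  # token counts per category
        self.vocab = set()

    def fit(self, docs: Iterable[Tuple[List[str], str]]) -> "NaiveBayesTopics":
        for tokens, label in docs:
            self.class_docs[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_score(self, tokens: List[str], label: str) -> float:
        total_docs = sum(self.class_docs.values())
        score = math.log(self.class_docs[label] / total_docs)  # log p(C_i)
        total_tokens = sum(self.token_counts[label].values())
        denom = total_tokens + self.alpha * len(self.vocab)
        for w in tokens:
            # Smoothed estimate of p(w_k | C_i); logs avoid numeric underflow.
            score += math.log((self.token_counts[label][w] + self.alpha) / denom)
        return score

    def predict(self, tokens: List[str]) -> str:
        return max(self.class_docs, key=lambda c: self.log_score(tokens, c))


# Hypothetical, already preprocessed training documents.
model = NaiveBayesTopics().fit([
    (["vintage", "car", "engine"], "Cars"),
    (["car", "race"], "Cars"),
    (["telescope", "star", "planet"], "Astronomy"),
    (["star", "galaxy"], "Astronomy"),
])
topic = model.predict(["star", "telescope"])
```

Because p(W) is the same for every category, comparing the unnormalized scores is enough to pick the most likely topic.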
List Recommendation

[Diagram: lists described by their topics, for example (Vintage Cars, Action movies, Astronomy, Robotics), (Astronomy, Space Exploration, Physics, Classic Movies), and (Neuroscience, Astronomy, Space Exploration, Comedy Movies), shown alongside the broader topics Movies, Cars, Space, and Science.]
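The slides do not state how lists are recommended. As one hypothetical approach, candidate lists could be ranked by the Jaccard overlap between their topic sets and a user's interest topics. The function names, list identifiers, and user profile below are assumptions; only the topic names come from the slide.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|a & b| / |a | b|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_lists(user_topics: Set[str],
               list_topics: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Candidate lists sorted by decreasing topic overlap with the user."""
    scored = [(name, jaccard(user_topics, topics)) for name, topics in list_topics.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Topic sets from the slide; list names and the user profile are hypothetical.
candidates = {
    "list_a": {"Vintage Cars", "Action movies", "Astronomy", "Robotics"},
    "list_b": {"Astronomy", "Space Exploration", "Physics", "Classic Movies"},
    "list_c": {"Neuroscience", "Astronomy", "Space Exploration", "Comedy Movies"},
}
ranking = rank_lists({"Astronomy", "Space Exploration", "Physics"}, candidates)
```

Here list_b ranks first (3 shared topics out of 4 in the union), followed by list_c and list_a.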
Many other interesting problems…

•   Dupe detection
•   Anti-spam
•   Biases, mood
•   News
•   Adult content
•   Metrics
•   Trending
•   Many more…

Editor's Notes

  1. I want to step back a bit and ask… what
  2. I want to step back a bit and ask… what
  3. Lists are a new reality and since the fast adoption by the users
  4. Lists can group very distinct topics, as in the case of “Save for later”, and although 60% of the lists are described by only 1 topic, there are cases in which