Recipe search
BakeSearch
Make sense of recipes and bake like a pro
Disambiguating searches
• Example recipe titles:
  – Classic Chocolate chip cookies
  – Patty’s best chocolate cookies
  – Peanut butter cookies
  – Sugar cookies with frosting
  – Gooey butter cookies
  – Banana pumpkin cookies
  – Black and white cookies
  – Halloween cookies
• Bigrams + trigrams from the titles give candidate labels
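The bigram and trigram step can be sketched roughly as follows. This is a minimal illustration, not the code behind the slides: the tokenization (lowercasing and splitting on whitespace), the function names `ngrams` and `candidate_labels`, and the choice to count each n-gram once per title are all assumptions made here.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-token tuples from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def candidate_labels(titles, top=5):
    """Count bigrams and trigrams across titles and return the most common."""
    counts = Counter()
    for title in titles:
        tokens = title.lower().split()
        for n in (2, 3):
            # Count each n-gram once per title so one long title cannot dominate.
            counts.update(set(ngrams(tokens, n)))
    return [(" ".join(gram), count) for gram, count in counts.most_common(top)]

# Recipe titles taken from the slide.
titles = [
    "Classic Chocolate chip cookies",
    "Patty’s best chocolate cookies",
    "Peanut butter cookies",
    "Sugar cookies with frosting",
    "Gooey butter cookies",
    "Banana pumpkin cookies",
    "Black and white cookies",
    "Halloween cookies",
]
labels = candidate_labels(titles)
print(labels[0])
```

On these eight titles the only n-gram shared by more than one title is "butter cookies", so it comes out as the top candidate label.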
Domain-specific data munging
• Ingredients: nltk dictionary
• Domain knowledge
• Unit parsing
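Unit parsing could look something like the sketch below. The slides do not show the actual parser, so everything here is hypothetical: the `UNITS` table, the quantity pattern, and the function names are illustrative choices, and a real ingredient list would need many more units and edge cases.

```python
import re
from fractions import Fraction

# Hypothetical unit table mapping spellings to one canonical name.
UNITS = {
    "cup": "cup", "cups": "cup",
    "tsp": "teaspoon", "teaspoon": "teaspoon", "teaspoons": "teaspoon",
    "tbsp": "tablespoon", "tablespoon": "tablespoon", "tablespoons": "tablespoon",
    "oz": "ounce", "ounce": "ounce", "ounces": "ounce",
    "g": "gram", "gram": "gram", "grams": "gram",
}

# Leading quantity: mixed number ("1 1/2"), fraction ("3/4"), or decimal ("2.5").
QUANTITY = re.compile(r"^\s*(\d+\s+\d+/\d+|\d+/\d+|\d+(?:\.\d+)?)\s*")

def parse_quantity(text):
    """Convert '1 1/2', '3/4', or '2.5' to a float."""
    return float(sum(Fraction(part) for part in text.split()))

def parse_ingredient(line):
    """Split an ingredient line into (quantity, unit, name); missing parts are None."""
    quantity = None
    match = QUANTITY.match(line)
    if match:
        quantity = parse_quantity(match.group(1))
        line = line[match.end():]
    words = line.strip().split()
    unit = None
    if words and words[0].lower().rstrip(".") in UNITS:
        unit = UNITS[words[0].lower().rstrip(".")]
        words = words[1:]
    return quantity, unit, " ".join(words).lower()

print(parse_ingredient("1 1/2 cups all-purpose flour"))
print(parse_ingredient("1/2 tsp. salt"))
```

Normalizing to a canonical unit and a lowercase name means "1/2 tsp. salt" and "0.5 teaspoon Salt" end up as the same ingredient, which matters once recipes are compared by their ingredient sets.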
Defining distance measure
• Recipe 1: Ingr1, Ingr2, Ingr3, Ingr4
• Recipe 2: Ingr4, Ingr9, Ingr12
• Jaccard = (ingredients in both recipes) / (ingredients in either recipe)
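As a worked example, the ratio on the slide can be computed with Python sets. The ratio as written is the Jaccard similarity; the corresponding distance is one minus that value. The function names and the convention of returning 0.0 for two empty recipes are assumptions made for this sketch.

```python
def jaccard_similarity(a, b):
    """Shared ingredients divided by all distinct ingredients in either recipe."""
    a, b = set(a), set(b)
    either = a | b
    if not either:
        return 0.0  # convention chosen here for two empty recipes
    return len(a & b) / len(either)

def jaccard_distance(a, b):
    """Distance form: 0 for identical ingredient sets, 1 for disjoint ones."""
    return 1.0 - jaccard_similarity(a, b)

# The two recipes from the slide.
recipe_1 = {"Ingr1", "Ingr2", "Ingr3", "Ingr4"}
recipe_2 = {"Ingr4", "Ingr9", "Ingr12"}

similarity = jaccard_similarity(recipe_1, recipe_2)  # 1 shared / 6 distinct
distance = jaccard_distance(recipe_1, recipe_2)      # 5/6
print(similarity, distance)
```

Only Ingr4 appears in both recipes and there are six distinct ingredients overall, so the similarity is 1/6 and the distance is 5/6.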
Challenges of big data
• Most clustering algorithms (k-means, hierarchical, graph-based) take >30 seconds
• 40k baking recipes, 4k ingredients
[Figure: # recipes (y-axis, 0 to 4000) vs. # ingredients in recipe (x-axis, 0 to 40)]
[Figure: # ingredients (y-axis, 0 to 900) vs. # recipes containing ingredient (x-axis, 1 to 10000)]
• Pre-calculate Jaccard distances between every pair of recipes (40k times 40k = 1.6 billion pairs!)
• MapReduce on Amazon EMR
• Preload into networkx graph
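The pairwise precomputation can be sketched in a MapReduce style on a single machine. This is not the EMR job from the talk: the phase functions, the toy recipes, and the plain dict-of-dicts standing in for the networkx graph are all assumptions. One detail worth noting is that 40k times 40k counts ordered pairs; because Jaccard is symmetric, only n(n-1)/2 pairs (just under 800 million for 40k recipes) are distinct, and pairs with no shared ingredient always have distance 1, so the sketch only visits pairs that co-occur under some ingredient.

```python
from collections import defaultdict
from itertools import combinations

def map_phase(recipes):
    """Emit one (ingredient, recipe_id) pair per distinct ingredient in a recipe."""
    for recipe_id, ingredients in recipes.items():
        for ingredient in set(ingredients):
            yield ingredient, recipe_id

def shuffle(pairs):
    """Group recipe ids by ingredient, as the shuffle step would."""
    index = defaultdict(list)
    for ingredient, recipe_id in pairs:
        index[ingredient].append(recipe_id)
    return index

def reduce_phase(index):
    """Count shared ingredients for every recipe pair sharing at least one."""
    shared = defaultdict(int)
    for recipe_ids in index.values():
        for a, b in combinations(sorted(recipe_ids), 2):
            shared[(a, b)] += 1
    return shared

def jaccard_graph(recipes, max_distance=1.0):
    """Return {recipe: {neighbor: distance}} for pairs within max_distance."""
    sizes = {r: len(set(items)) for r, items in recipes.items()}
    shared = reduce_phase(shuffle(map_phase(recipes)))
    graph = {r: {} for r in recipes}
    for (a, b), both in shared.items():
        either = sizes[a] + sizes[b] - both  # size of the union
        distance = 1.0 - both / either
        if distance <= max_distance:
            graph[a][b] = graph[b][a] = distance
    return graph

# Toy data, invented for illustration.
recipes = {
    "sugar cookies": ["flour", "sugar", "butter", "egg"],
    "shortbread": ["flour", "sugar", "butter"],
    "meringue": ["sugar", "egg white"],
}
graph = jaccard_graph(recipes, max_distance=0.9)
print(graph["sugar cookies"])
```

Very common ingredients still generate many pairs in the reduce step, which is one reason a distributed run is attractive at 40k recipes.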
Cluster recipes based on ingredients
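The slides do not say which clustering algorithm was finally used, so the following is only one simple graph-based possibility: keep the recipe pairs whose Jaccard distance is below a threshold and treat each connected component as a cluster. The function name, the threshold, and the example distances are all hypothetical.

```python
from collections import deque

def cluster_by_threshold(distances, nodes, max_distance):
    """Group nodes into connected components over edges with distance <= max_distance."""
    neighbors = {node: set() for node in nodes}
    for (a, b), distance in distances.items():
        if distance <= max_distance:
            neighbors[a].add(b)
            neighbors[b].add(a)
    clusters, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        # Breadth-first search collects everything reachable from `start`.
        while queue:
            node = queue.popleft()
            component.append(node)
            for other in sorted(neighbors[node]):
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        clusters.append(sorted(component))
    return clusters

# Hypothetical precomputed Jaccard distances between four recipes.
nodes = ["sugar cookies", "shortbread", "meringue", "pavlova"]
distances = {
    ("sugar cookies", "shortbread"): 0.25,
    ("sugar cookies", "meringue"): 0.8,
    ("shortbread", "meringue"): 0.75,
    ("meringue", "pavlova"): 0.2,
}
clusters = cluster_by_threshold(distances, nodes, max_distance=0.5)
print(clusters)
```

Because the distances are precomputed, a lookup like this touches only existing edges, which fits the goal of avoiding a slow clustering run at query time.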
Find enriched/depleted ingredients
• abs(log2 ratio) > 2
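The log2-ratio cutoff can be illustrated with a short sketch. The slide gives only the threshold, so the exact ratio is an assumption here: the fraction of recipes in a cluster that use an ingredient, divided by the fraction of all recipes that use it. The pseudocount (added to avoid taking the log of zero), the function names, and the toy data are also assumptions.

```python
import math
from collections import Counter

def log2_ratios(cluster, background, pseudocount=0.5):
    """log2 of (usage fraction in cluster) / (usage fraction in background)."""
    cluster_counts = Counter(i for recipe in cluster for i in set(recipe))
    background_counts = Counter(i for recipe in background for i in set(recipe))
    ratios = {}
    for ingredient in cluster_counts.keys() | background_counts.keys():
        in_cluster = (cluster_counts[ingredient] + pseudocount) / (len(cluster) + pseudocount)
        in_background = (background_counts[ingredient] + pseudocount) / (len(background) + pseudocount)
        ratios[ingredient] = math.log2(in_cluster / in_background)
    return ratios

def enriched_or_depleted(ratios, cutoff=2.0):
    """Keep ingredients whose absolute log2 ratio exceeds the cutoff."""
    return {i: r for i, r in ratios.items() if abs(r) > cutoff}

# Toy data: a 2-recipe pumpkin cluster inside a 16-recipe background.
cluster = [["flour", "pumpkin"], ["flour", "pumpkin"]]
background = cluster + [["flour", "chocolate"]] * 14

ratios = log2_ratios(cluster, background)
flagged = enriched_or_depleted(ratios)
print(sorted(flagged))
```

A cutoff of 2 on the log2 scale corresponds to a fourfold difference in either direction: here pumpkin is enriched in the cluster, chocolate is depleted, and flour, used everywhere, is neither.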
Tools
Back end
• Yummly API
• Python
  – Pycurl
  – Nltk wordnet
• MySQL
Analysis
• Numpy, Scipy
• Nltk, networkx
• Python, R
• Amazon EMR
Front end
• HTML/CSS/JavaScript
• Twitter Bootstrap
• Flask
• Amazon AWS
Diane Wu
• PhD Genetics, Stanford University, CA
• BSc Computing Science, Simon Fraser, Canada

Diane Wu Insight demo
