I teamed up with 3 of my classmates to build a recipe recommendation engine that takes ingredients and cuisine preferences as input and returns the recipe best suited to you. This was the final project for our Data Science in the Wild class at Cornell Tech in Spring 2020. Shoutout to my team Infinite Players: Prashant, Saloni & Dale!
The document describes a case study using logistic regression to predict lead conversion. The model was built on data from an online education company to assign each lead a score indicating how likely it is to convert. Exploratory data analysis found that leads from Google searches, SMS messages, and marketing/HR specializations had higher conversion rates. A logistic regression model was built and evaluated on train and test sets, achieving 80.1-80.9% accuracy. The model effectively identified promising leads that could be targeted to raise the conversion rate from about 30% to around 80% or more.
Worked on a real-life business problem: due to Covid-19, Airbnb saw a major decline in revenue. To help determine Airbnb's next best steps as a business, an analysis was performed on a dataset of Airbnb listings in New York.
This analysis served as the basis for a presentation created for the Lead Data Analyst and Data Analysis Managers.
The document presents a case study for a lead scoring model built to predict potential customer conversions for an education company. Data on past leads was analyzed to identify key variables impacting conversion rate. A logistic regression model was developed and evaluated on train and test data, achieving 78% accuracy. The model can assign a lead score between 0-100 to help the company prioritize hot leads most likely to convert.
This document presents a case study on lead scoring for an education company called XEducation. The company sells online courses but has a low lead conversion rate of 30%. The goal is to build a logistic regression model that assigns each lead a score between 0-100 to help target potential leads and raise the conversion rate to 80%. Key factors such as lead source, last activity, and tags are analyzed. The model balances recall and precision to set the probability threshold at 38%: leads scoring above 38 have a predicted conversion probability above 38% and should be targeted. This approach is estimated to achieve the desired 80% conversion rate.
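A lead-scoring pipeline of this shape can be sketched in a few lines. This is a minimal illustration on synthetic data, not the case study's actual code: the features, scikit-learn usage, and data are assumptions; only the 0-100 score and the 38 cut-off come from the summary.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for lead data: two illustrative features
# (e.g. total site visits and time on site, already scaled).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
# Labels correlated with the features so the model has signal to learn.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)

# Lead score = predicted conversion probability scaled to 0-100.
lead_scores = (model.predict_proba(X)[:, 1] * 100).round().astype(int)

# Threshold of 38 from the case study: leads scoring above it are "hot".
hot_leads = np.where(lead_scores > 38)[0]
print(f"{len(hot_leads)} of {len(lead_scores)} leads flagged as hot")
```

Targeting only the hot subset is what lifts the conversion rate among contacted leads above the raw 30% base rate.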
The document discusses credit risk analysis for loan approvals. It outlines the steps in the analysis, which include data understanding, checking for data quality issues, identifying data imbalances, and conducting univariate, bivariate, and correlation analyses. The analyses found that the chances of default decrease with increased applicant age but increase with higher credit amounts. Low income groups had higher default rates than high or medium income groups. Certain applicant attributes like being a state servant, older, higher income, or having a previous approved loan were associated with lower risk of default.
Dataset Preparation
Abstract: This PDSG workshop introduces basic concepts on preparing a dataset for training a model. Concepts covered are data wrangling, replacing missing values, categorical variable conversion, and feature scaling.
Level: Fundamental
Requirements: No prior programming or statistics knowledge required.
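The workshop's three preparation steps (replacing missing values, converting categorical variables, and feature scaling) can be sketched with pandas. The toy columns here are invented for illustration, not taken from the workshop:

```python
import pandas as pd

# Toy dataset with a missing value and a categorical column.
df = pd.DataFrame({
    "age":    [25, 32, None, 41],
    "city":   ["NY", "SF", "NY", "LA"],
    "income": [50_000, 80_000, 62_000, 75_000],
})

# 1. Replace missing values with the column median.
df["age"] = df["age"].fillna(df["age"].median())

# 2. Convert the categorical variable to one-hot (dummy) columns.
df = pd.get_dummies(df, columns=["city"])

# 3. Feature scaling: standardize numeric columns to zero mean, unit variance.
for col in ["age", "income"]:
    df[col] = (df[col] - df[col].mean()) / df[col].std()

print(df.round(2))
```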
The document discusses cross-validation, which is used to estimate how well a machine learning model will generalize to unseen data. The basic idea is to split a dataset into training and test sets, train the model on the training set, and evaluate it on the held-out test set. Common variants discussed are k-fold cross-validation, which splits the data into k folds and rotates which fold is held out, and repeated holdout validation, which randomly samples training and test subsets over multiple repetitions.
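The k-fold rotation can be written out by hand in a few lines. This is an illustrative sketch (in practice scikit-learn's `KFold` does the same job):

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    folds = np.array_split(indices, k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

# Across the k iterations, every sample is held out exactly once.
n, k = 20, 5
held_out = []
for train_idx, test_idx in k_fold_indices(n, k):
    assert len(train_idx) + len(test_idx) == n
    held_out.extend(test_idx.tolist())

print(sorted(held_out))  # every index 0..19 appears exactly once
```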
What really are recommendation engines nowadays?
This presentation introduces the foundations of recommendation algorithms, covering common approaches as well as some of the most advanced techniques. Although more focused on efficiency than on theoretical properties, it uses basics of matrix algebra and optimization-based machine learning throughout.
Table of Contents:
1. Collaborative Filtering
1.1 User-User
1.2 Item-Item
1.3 User-Item
* Matrix Factorization
* Stochastic Gradient Descent (SGD)
* Truncated Singular Value Decomposition (SVD)
* Alternating Least Square (ALS)
* Deep Learning
2. Content Extraction
* Item-Item Similarities
* Deep Content Extraction: NLP, CNN, LSTM
3. Hybrid Models
4. In Production
4.1 Problematics
4.2 Solutions
4.3 Tools
Abstract: This PDSG workshop introduces basic concepts of splitting a dataset for training a model in machine learning. Concepts covered are training, test and validation data, serial and random splitting, data imbalance and k-fold cross validation.
Level: Fundamental
Requirements: No prior programming or statistics knowledge required.
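The serial vs. random splitting distinction the workshop covers can be illustrated as follows (synthetic data; the 80/20 ratio is an assumption for the sketch):

```python
import numpy as np

data = np.arange(100)  # stand-in for 100 ordered samples

# Serial split: first 80% for training, last 20% for testing.
# Risky if the data is ordered, e.g. by time or by class.
train_serial, test_serial = data[:80], data[80:]

# Random split: shuffle first so both sets reflect the full distribution.
rng = np.random.default_rng(42)
shuffled = rng.permutation(data)
train_rand, test_rand = shuffled[:80], shuffled[80:]

print(len(train_rand), len(test_rand))
```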
Collaborative filtering is a technique used in recommender systems to predict a user's preferences based on other similar users' preferences. It involves collecting ratings or preference data from users, calculating similarities between users or items, and generating predictions for a user's unknown ratings based on weighted averages of the ratings from similar users or items. There are two main types: user-based which computes similarities between users, and item-based which computes similarities between items. Challenges include cold start problems, sparsity of data, scalability issues for large datasets, and reducing user bias.
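The similarity-weighted average described above can be sketched directly. The ratings matrix here is invented for illustration, and cosine similarity over co-rated items is one common choice among several:

```python
import numpy as np

# Tiny ratings matrix: rows = users, columns = items, 0 = not yet rated.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 4, 1],
    [1, 1, 2, 5],
], dtype=float)

def predict(user, item, R):
    """Predict R[user, item] as a similarity-weighted average of
    other users' ratings for that item (user-based CF)."""
    sims, ratings = [], []
    for other in range(R.shape[0]):
        if other == user or R[other, item] == 0:
            continue
        # Cosine similarity over items both users have rated.
        mask = (R[user] > 0) & (R[other] > 0)
        a, b = R[user, mask], R[other, mask]
        sims.append(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
        ratings.append(R[other, item])
    return np.average(ratings, weights=sims)

pred = predict(user=0, item=2, R=R)
print(round(pred, 2))
```

User 0's prediction for item 2 is pulled toward user 1's rating of 4, since user 1's rating profile is much more similar to user 0's than user 2's is.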
The ID3 algorithm generates a decision tree from training data using a top-down, greedy search. It calculates the entropy of attributes in the training data to determine which attribute best splits the data into pure subsets with maximum information gain. It then recursively builds the decision tree, using the selected attributes to split the data at each node until reaching leaf nodes containing only one class. The resulting decision tree can then classify new samples not in the training data.
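The entropy and information-gain computations at the heart of ID3 can be sketched as follows (the toy "windy" dataset is invented for illustration):

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(samples, labels, attribute):
    """Entropy reduction from splitting `samples` on `attribute`."""
    n = len(samples)
    gain = entropy(labels)
    for value in {s[attribute] for s in samples}:
        subset = [lab for s, lab in zip(samples, labels) if s[attribute] == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

# Toy training data: does the attribute "windy" predict playing outside?
samples = [{"windy": "yes"}, {"windy": "yes"}, {"windy": "no"}, {"windy": "no"}]
labels  = ["stay", "stay", "play", "play"]

print(entropy(labels))                             # 1.0 (perfectly mixed)
print(information_gain(samples, labels, "windy"))  # 1.0 (split yields pure subsets)
```

ID3 picks the attribute with the highest gain at each node and recurses on the resulting subsets.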
This document discusses using regression models to predict California housing prices from census data. It explores linear regression, decision tree regression, random forest regression and support vector regression. The random forest model performed best with the lowest RMSE of 49261.28 after hyperparameter tuning. The dataset contained 20,640 instances with 10 attributes describing California properties for which housing values needed to be estimated. Feature engineering steps like one-hot encoding and standardization were applied before randomly splitting the data into training, validation and test sets.
The document discusses Bayesian belief networks (BBNs), which represent probabilistic relationships between variables. BBNs consist of a directed acyclic graph showing the dependencies between nodes/variables, and conditional probability tables quantifying the effects. They allow representing conditional independence between non-descendant variables given parents. The document provides an example BBN modeling a home alarm system and neighbors calling police. It then shows calculations to find the probability of a burglary given one neighbor called police using the network. Advantages are handling incomplete data, learning causation, and using prior knowledge, while a disadvantage is more complex graph construction.
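The burglary query can be answered by enumerating the hidden variables. The CPT values below are the standard textbook numbers for this network (an assumption; the slide deck's actual figures may differ):

```python
# Burglary-alarm network with standard textbook CPTs (assumed values).
P_B = 0.001          # P(burglary)
P_E = 0.002          # P(earthquake)
P_A = {              # P(alarm | burglary, earthquake)
    (True, True): 0.95, (True, False): 0.94,
    (False, True): 0.29, (False, False): 0.001,
}
P_J = {True: 0.90, False: 0.05}   # P(neighbor calls | alarm)

def joint(b, e, a, j):
    """Joint probability factorized along the DAG: P(b)P(e)P(a|b,e)P(j|a)."""
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    pa = P_A[(b, e)]
    p *= pa if a else 1 - pa
    pj = P_J[a]
    return p * (pj if j else 1 - pj)

# P(burglary | neighbor called), summing out earthquake and alarm.
num = sum(joint(True, e, a, True) for e in (True, False) for a in (True, False))
den = sum(joint(b, e, a, True) for b in (True, False)
          for e in (True, False) for a in (True, False))
print(round(num / den, 4))  # ~0.0163: one call only weakly implies burglary
```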
The document discusses IBM's Big Data and analytics solutions, including Watson Explorer which provides a single interface to access both structured and unstructured data. It also outlines several common use cases for big data such as customer analytics, security intelligence, and operations analysis. The final section provides contact information for an IBM sales manager to discuss these big data solutions.
An Analysis of New York City AirBnB Listings, by Brendan Sigale
This document analyzes factors to consider when renting an Airbnb in New York City, using over 6.5 million NYC crime records from 2006-2018 and 2019 NYC Airbnb listing data. The analysis finds that Manhattan and Brooklyn dominate in both listings and crime, while Queens has fewer listings despite a population similar to Brooklyn's. Per-capita crime rates are broadly comparable across boroughs, though somewhat higher in Manhattan and Brooklyn, where crime is also concentrated in the borough centers. Availability and crime do not appear to affect listing price.
This presentation introduces naive Bayesian classification. It begins with an overview of Bayes' theorem and defines a naive Bayes classifier as one that assumes conditional independence between predictor variables given the class. The document provides examples of text classification using naive Bayes and discusses its advantages of simplicity and accuracy, as well as its limitation of assuming independence. It concludes that naive Bayes is a commonly used and effective classification technique.
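A text-classification example of the kind the presentation describes can be sketched with scikit-learn. The tiny corpus is made up for this sketch:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny illustrative corpus (invented for this sketch).
docs = [
    "win free prize now", "free money win",           # spam
    "meeting agenda attached", "lunch meeting today", # ham
]
labels = ["spam", "spam", "ham", "ham"]

# Bag-of-words counts feed a multinomial naive Bayes classifier,
# which assumes word occurrences are independent given the class.
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(docs, labels)

print(clf.predict(["free prize money"]))  # all three words seen only in spam
```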
Recommendation systems provide users with information they may be interested in based on their preferences and interests. They help address the problem of information overload by retrieving desired information for the user based on their preferences or those of similar users. The two main types of recommendation systems are personalized and non-personalized systems. Common techniques used include collaborative filtering, which finds users with similar tastes, and content-based filtering, which recommends items similar to those a user has liked based on item attributes.
This is a case study I worked on as a first-year MIM student at the University of Maryland (College Park), while studying INFM612 (Management of Information Programs and Services), taught by Dr. Ping Wang, a wonderful professor.
We were given two unfortunate incidents that had occurred with a guest and a host of Airbnb, and had to analyze the issues and suggest solutions that could help make Airbnb an even safer option for its guests and hosts.
This course is all about data mining: how to obtain optimized results, the types of data mining, and how to use these techniques.
I spoke at the first Kaizen Data Science Conference, San Francisco, Sep 2016 on one of Instacart's recommendation systems. Also covers innovative ways of using data science to solve interdisciplinary problems. - Sharath Rao
The document discusses various clustering approaches including partitioning, hierarchical, density-based, grid-based, model-based, frequent pattern-based, and constraint-based methods. It focuses on partitioning methods such as k-means and k-medoids clustering. K-means clustering aims to partition objects into k clusters by minimizing total intra-cluster variance, representing each cluster by its centroid. K-medoids clustering is a more robust variant that represents each cluster by its medoid or most centrally located object. The document also covers algorithms for implementing k-means and k-medoids clustering.
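The k-means (Lloyd's) algorithm summarized above can be sketched in plain NumPy. The blob data and deterministic initialization are assumptions made for this illustration:

```python
import numpy as np

def k_means(X, k, n_iter=20):
    """Plain Lloyd's algorithm: alternate assignment and centroid update."""
    # Deterministic init for this sketch: one point from each end of X.
    centroids = X[[0, len(X) - 1]].copy()
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None] - centroids[None, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points,
        # minimizing total intra-cluster variance.
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

# Two well-separated blobs around (0, 0) and (10, 10).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(10, 0.5, (20, 2))])
centroids, labels = k_means(X, k=2)
print(centroids.round(1))
```

K-medoids differs only in the update step: instead of the mean, each cluster is represented by its most centrally located member, which makes it more robust to outliers.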
This document summarizes a machine learning project for Homesite to predict customer quote conversions. The team members are Jack, Harry, and Abhishek. Homesite wants to predict the likelihood that a customer purchases an insurance contract based on their quote. The training data has 261k rows and 298 predictors; the test data has 200k rows with the same 298 columns. Key steps included data cleaning, gradient boosting and random forests, and using AUC (area under the ROC curve) to evaluate model performance. The team's model achieved an AUC of 0.95, which they take as evidence of strong performance without overfitting.
The document discusses various data reduction strategies including attribute subset selection, numerosity reduction, and dimensionality reduction. Attribute subset selection aims to select a minimal set of important attributes. Numerosity reduction techniques like regression, log-linear models, histograms, clustering, and sampling can reduce data volume by finding alternative representations like model parameters or cluster centroids. Dimensionality reduction techniques include discrete wavelet transformation and principal component analysis, which transform high-dimensional data into a lower-dimensional representation.
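The principal component analysis step mentioned above can be sketched via eigendecomposition of the covariance matrix (synthetic data; keeping two components is an arbitrary choice for the sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
# 200 samples in 5 dimensions, with correlated directions of variance.
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

# PCA by eigendecomposition of the covariance matrix.
Xc = X - X.mean(axis=0)                 # center the data
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
order = eigvals.argsort()[::-1]         # sort components by variance explained

k = 2                                   # keep the top-2 principal components
components = eigvecs[:, order[:k]]
X_reduced = Xc @ components             # project to the lower-dimensional space

print(X_reduced.shape)  # (200, 2)
```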
This document describes a system for detecting brain tumors in MRI images using image segmentation. It discusses how existing manual detection of tumors is difficult due to noise and requires many days. The proposed system applies preprocessing like filtering and grayscale conversion. It then uses image segmentation techniques to detect tumor edges and boundaries. Features are extracted and classification is used to differentiate between normal and tumor images, helping doctors detect tumors earlier. The system is implemented in MATLAB and aims to overcome difficulties in early tumor detection.
Machine Learning - Accuracy and Confusion Matrix, by Andrew Ferlitsch
Abstract: This PDSG workshop introduces basic concepts on measuring accuracy of your trained model. Concepts covered are loss functions and confusion matrices.
Level: Fundamental
Requirements: No prior programming or statistics knowledge required.
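The confusion-matrix bookkeeping the workshop covers can be sketched by hand (the labels below are invented for illustration):

```python
# True and predicted labels for a small binary classifier (illustrative).
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# The four cells of the binary confusion matrix.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

accuracy  = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)

print(f"confusion matrix: [[{tn}, {fp}], [{fn}, {tp}]]")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```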
In this presentation, two datasets are used to apply the machine learning classification techniques introduced in the Introduction to Data Mining and Machine Learning coursework. Both datasets were chosen by analyzing their outputs and the team members' interests: a simulated electrical grid stability dataset and the Olivetti face recognition dataset.
Predicting Bank Customer Churn Using Classification, by Vishva Abeyrathne
This document describes a study that used classification models to predict customer churn for a bank. The authors collected a dataset of 10,000 bank customers from Kaggle and preprocessed the data. They then explored relationships between features and the target variable of whether a customer churned. Two classification models were tested - KNN and Decision Tree. After hyperparameter tuning, Decision Tree achieved the best accuracy of 84.25%, outperforming KNN. However, both models struggled to accurately predict customers who would churn. The authors concluded Decision Tree was the best model but recommend collecting more data on churning customers.
This document provides an overview of recommender systems for e-commerce. It discusses various recommender approaches including collaborative filtering algorithms like nearest neighbor methods, item-based collaborative filtering, and matrix factorization. It also covers content-based recommendation, classification techniques, addressing challenges like data sparsity and scalability, and hybrid recommendation approaches.
Ovenbot is an app that helps users find recipes based on the ingredients they already have. It manages a virtual pantry and shopping lists. Recipes can be searched, filtered, and modified. The app generates recipe recommendations based on the user's profile and allows easy browsing, saving, and sharing of recipes. Ovenbot aims to be the "kitchen of the future" by integrating users' physical kitchens and enabling new features like meal planning and sensor integration with appliances. It is a freemium app with some premium upgrades available. The founders created it to solve the problem of not knowing what to cook with the ingredients on hand.
Using Data Science to Transform OpenTable Into Your Local Dining Expert, by Pablo Delgado
Presentation for Spark Summit 2015, San Francisco
https://spark-summit.org/2015/events/using-data-science-to-transform-opentable-into-your-local-dining-expert/
Decyphering Recipes: Mapping ontologies for personalizationNeo4j
This document discusses how Gousto, an online recipe box service, uses Neo4j and a recipe ontology graph to power personalized recipe recommendations. It outlines Gousto's data challenges with siloed databases and lack of customer journey tracking. The ontology graph captures recipe attributes and ingredient relationships to calculate similarity scores beyond basic ingredients. This allows personalizing recommendations based on cuisines, dish types, and customer goals. Benchmarking with human ratings helps validate the model, while future uses could include substitution for diets and AI-generated recipes.
The document discusses issues with the cafeteria food provided to employees, which is described as unhygienic, low quality, oily and fatty. It notes that junk food options like pizza and burgers are also unhealthy. The document then introduces a company called Unjunk that provides healthier, tasty food alternatives made with quality ingredients at affordable prices. Unjunk has partnered with several corporate clients to provide food kiosks and catering for employees.
The document describes an app called iCook that helps users plan and prepare meals. The app allows users to view their pantry ingredients, choose recipes, generate a customized shopping list for missing items, and provide feedback on recipes after cooking. It aims to make meal planning easier by selecting recipes based on the user's available ingredients and preferences.
New self cooking center, rational 5 sense - KitchenramaKitchen Rama
The SelfCookingCenter® 5 Senses is the only cooking system in the world with 5 senses. Because it senses, recognises, thinks ahead, learns from you and even communicates with you For further information visit @ http://www.kitchenrama.com/
Our product will be mainly chocolate fudge brownies prepared with only mixing the ingredients and the convenience of not having to use the oven. We will focus on selling our no-bake brownies to students at Philippine Women's University, with a starting production of 40 pieces and pricing them affordably at 25% above cost. We aim to gain customers through promotions and bulk order discounts while complying with necessary business permits.
This document discusses reducing food waste through various strategies. It notes that 4-10% of food purchased by U.S. foodservice operations is thrown out before reaching customers, representing $9-23 billion in annual pre-consumer waste. Successful prevention requires changing behaviors through measurement, automation, and interventions like production adjustments, optimized ordering/menus, and influencing consumer behaviors with signage and portion sizes. Long-term tracking of waste metrics is key to driving continuous improvement.
Marketing plan for "Foodpanion",a cookery mobile appRaghu Kumar Reddy
This document provides an executive summary and overview of the Foodpation app. The app aims to provide users with a massive library of recipes that can be searched and filtered in various ways. It will offer both free and premium versions, with the free version allowing basic searches and saving of some recipes while generating revenue through ads, and the premium version providing additional features for a monthly or annual fee. The goal is to attract customers through the large recipe library and filters, gain 50,000 downloads in the first year, and generate $20 million in ad revenue to create a strong brand and customer satisfaction.
Using Data Science to Transform OpenTable Into Your Local Dining Expert-(Pabl...Spark Summit
Using data science techniques, OpenTable analyzes large amounts of data from over 32,000 restaurants and 190 million diners to provide personalized dining recommendations. Some key techniques used include collaborative filtering, matrix factorization, topic modeling of large volumes of reviews to understand restaurant attributes, and word embedding models to find similar restaurants across locations. The goal is to understand individual diner and restaurant preferences and trends to connect diners with the best possible dining experiences.
This document summarizes a project to recommend Thai foods to foreigners based on their preferences. It involved collecting recipe and ingredient data, cleaning the data, analyzing relationships between cuisines and ingredients, building similarity models combining Jaccard and cosine similarities of ingredients, flavors and categories, and evaluating the models. The best model achieved 41-42% accuracy in recommending similar recipes. Future work could involve more Thai food data, customer ratings, cooking methods, and developing an interface for image searching and recommendations.
Quester: QUANT + QUAL in a single hybrid design, leveraging AI technology
Intengo: Prediction markets to test concepts - FAST
Partnering to break through the “not sures” and “idks” for quantitative clarity and real qualitative insight
The main screen of the RecipesOnDemand software allows users to access and manage recipes. It includes an extensive ingredient and prepared foods database with nutritional information. Recipes can be selected from over 5,000 pre-loaded recipes or created from scratch, and are sorted into categories. Recipes can be printed individually or in groups, sorted by menu category or alphabetically. Multi-yield recipes automatically scale ingredients and instructions for different quantities. Nutrition facts labels and analysis can be generated for any recipe.
This document proposes an Android app called iCookbook that allows users to search over hundreds of recipes online by inputting ingredients. The app will search for recipes matching the ingredients and allow filtering by ratings, cooking time, and saving recipes offline. It aims to provide a one-stop solution for finding recipes through a simple user interface with step-by-step cooking instructions. The business hopes to generate revenue through in-app purchases and advertisements in the free version while offering additional features like calorie tracking in the premium version.
Galleta del Valle is a cookie company founded in Silicon Valley that donates profits to peace initiatives. They present sample recipes and cost analysis for their White Chocolate Chip & Macadamia Nut cookies. Production tasks are split between two chefs who develop measurements and analyze costs together. They aim to improve quality, lower costs, and increase sales volume to further their goals of donations and world peace.
Every day CIOs are asked to choose between things that are seemingly incomparable, but now you have objective analysis and standard metrics to compare Apples and Oranges with CAST HIGHLIGHT.
CAST HIGHLIGHT is a scalable, cloud-based Application Portfolio Analysis service that requires no licenses, hardware, or outside consultant. Learn more at www.casthighlight.com
This document discusses the development of a recipe search mobile application. It outlines features of the app such as searching recipes by ingredients, calendar functionality to plan meals, and providing nutrition information. It also discusses monetization strategies like free basic features and premium subscriptions. Plans for marketing the app through social media, blogs, and partnerships are mentioned. Finally, it outlines the development process including gathering recipes, designing the app interface, and obtaining user feedback.
This document discusses standard recipes and scaling recipes for restaurants. It defines a standard recipe as a tested formula that consistently provides quality and yield, serving as a guide for food preparation, training staff, and food costing. A standard recipe should include the name, yield, equipment, ingredients, procedures, timing, portioning/plating instructions, storage directions, and substitution notes. Scaling recipes involves changing ingredient amounts to produce a different yield based on required portions, calculated by a conversion factor. Scaling prevents waste from over or underproduction.
This slide was used at CHI2012 Conference (http://dl.acm.org/citation.cfm?id=2207695). Paper "panavi: recipe medium with a sensors-embedded pan for domestic users to master professional culinary arts"is here http://panavi.jp/panavi_CHI2012.pdf. http://panavi.jp
This document provides an overview of a project to create a bagel recommender system using Amazon Lex for an online bagel shop called Bagel & Deli. The team gathered data on the shop's 87 bagel varieties and used Amazon Sagemaker to build a K-nearest neighbors model to provide bagel recommendations to customers. Amazon Lambda was used to integrate the model with Amazon Lex. The demo showed how a customer could interact with the Lex bot to get a bagel recommendation. The project aims to help customers discover new bagel options, reduce wait times, and increase sales and customer satisfaction for Bagel & Deli.
Similar to Ingredients based - Recipe recommendation engine (20)
Learn SQL from basic queries to Advance queriesmanishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
Global Situational Awareness of A.I. and where its headedvikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be un-leashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs.This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...Social Samosa
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.
2. HOW DID WE ARRIVE HERE?
01 FIND CALLING
We foodies found that no one had collated multiple datasets and built a good recipe recommendation engine.
02 FIND DATA
We found more than 20 usable data repositories and analyzed them.
03 FIND RECIPE
After cleaning, we tried various models to get the best possible results.
04 COOK & SERVE!
We collated the best results around an intuitive workflow.
Infinite Players
3. BACKGROUND
The coronavirus pandemic has revealed an interesting fact about young working professionals and university students: they relied on take-aways and dine-ins, skipping cooking entirely. Most people do not know what to cook despite the many options available in grocery stores in this globalized world. Therefore, we want to answer:
What do I cook, given I have these ingredients available?
Food is the gateway to a new culture, and so many cultures can be explored through what is in your fridge. Our engine enables this cross-cultural exchange by telling you what is possible!
4. DATA - FROM THE WILD
DATA SOURCES:
RECIPE_INGR_REVIEW (12K)
YUMMLY CLEAN (6K)
FOOD.COM DATA (231K)
EPICURIOUS (20K)
RECIPE_INGR (56K)

Dataset Name    | SOURCE | FIELDS TAKEN
FOOD_COM        | LINK   | Ingredients, Recipe Name
EPICURIOUS      | LINK   | Ingredients, Recipe Name, Ratings, Description
YUMMLY CLEAN    | LINK   | Ingredients, Recipe Name, Cuisines
RECIPE_INGR     | LINK   | Ingredients, Cuisines
RECIPE_INGR_REV | LINK   | Ingredients, Recipe Name, Ratings

Total: ~270K recipes
5. DATA - FROM THE WILD
FOR INGREDIENTS
We had ingredients ranging from the ubiquitous, such as wheat flour, to the most exotic, such as saffron. In total, we had more than 100K ingredients across our datasets.
FOR CUISINES
We started with more than 35 unique cuisines, studied the differences and commonalities among them, and finally mapped them to a superset of 7.
FOR USER RATINGS
Certain datasets had user reviews for the recipes. We utilised these reviews by defining a rating scale from 1-5 as the basis for our item-item collaborative filtering model.
FOR RECIPE NAMES
All datasets have recipe names except RECIPE_INGR, which has only cuisine names and ingredients. Recipe names are our desired output.
6. DATA CLEANING - OVERVIEW
01 Basic: common text preprocessing techniques
02 No "quantities": remove "OZ", "KG", "POUNDS", "TSP", "LITTLE", "PINCH"
03 Extract nouns: POS tagging to extract ingredients from recipe instructions
04 Remove rare words: average term frequency is 600; remove words occurring < 30 times
05 Iterate! Continue cleaning as we see results
7. FEATURE ENGINEERING
WHY?
● Multiple datasets - different data formats
● Cleaning to 100% is hard and doesn't scale to new data
● Ingredient-related tokens > 2.5 million across 270K recipes
8. CUISINES - IN THE WILD
PROBLEM: Which cuisine does a recipe belong to? Which cuisines should we narrow it down to?
GROCERY INSPIRATION: We tried to narrow down the cuisines from the full list in our datasets [figure: original cuisine list].
9. CUISINES DEMYSTIFIED
[Figure: confusion matrix - using a neural network]
Upon refining further, we combined many cuisines to achieve the highest accuracy for our cuisine classifier, while maintaining distinctive flavors and favoring well-represented classes.
Final list of cuisines (7): American, Italian, European, Asian, Mexican, French, Indian
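The narrowing step can be sketched as a simple label mapping. Apart from the Spanish/British/German → European grouping stated on a later slide, the groupings below are illustrative guesses, and only a handful of the ~35 source labels are shown:

```python
# Hypothetical mapping from raw dataset cuisine labels to the final
# 7-cuisine superset. Only a few of the ~35 source labels are shown;
# apart from Spanish/British/German -> European, these are guesses.
CUISINE_MAP = {
    "spanish": "European",
    "british": "European",
    "german": "European",
    "thai": "Asian",
    "chinese": "Asian",
    "japanese": "Asian",
    "southern_us": "American",
}

FINAL_CUISINES = {"American", "Italian", "European", "Asian",
                  "Mexican", "French", "Indian"}

def to_superset(label: str) -> str:
    """Map a raw cuisine label into the 7-cuisine superset."""
    mapped = CUISINE_MAP.get(label.lower(), label.title())
    return mapped if mapped in FINAL_CUISINES else "Other"

print(to_superset("british"))  # European
```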
10. CUISINES - AMERICA!
[Figure: confusion matrix - using ensembling methods]
Some patterns can clearly be noticed:
French cuisine is very similar to America's Cajun & Creole (Louisiana).
Mexico influenced Texan food.
Italy has a great influence on Northeast food, with pizza etc.
European cuisine (Spanish, British & German) has a great influence too.
Asian & Indian cuisines have minimal collision.
This mirrors the ethnic makeup of immigrants in the US.
*Indian & Mexican cuisines also share a lot of flavors.
11. HOW DOES IT WORK?
INPUT: ingredient feature vectors
ENSEMBLE TECHNIQUES
Logistic Regression
K Neighbors Classifier
Decision Tree Classifier
Random Forest Classifier
NEURAL NETWORK
Layer 1: Linear + LeakyReLU
Layer 2: Linear + LeakyReLU + Dropout
Layer 3: Linear + LeakyReLU + Dropout
Layer 4: Linear + Softmax
OUTPUT: cuisine type for a list of ingredients
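A minimal PyTorch sketch of the four-layer classifier described above. The input width (200, matching the word2vec dimension mentioned later in the deck), the hidden sizes, and the dropout rate are assumptions; the slide specifies only the layer types and activations:

```python
import torch
import torch.nn as nn

class CuisineNet(nn.Module):
    """4-layer cuisine classifier: Linear + LeakyReLU blocks, dropout
    on layers 2-3, softmax over the 7 cuisine classes. Hidden sizes
    (256/128/64) and dropout rate (0.3) are illustrative assumptions."""
    def __init__(self, n_features=200, n_cuisines=7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 256), nn.LeakyReLU(),            # Layer 1
            nn.Linear(256, 128), nn.LeakyReLU(), nn.Dropout(0.3),  # Layer 2
            nn.Linear(128, 64), nn.LeakyReLU(), nn.Dropout(0.3),   # Layer 3
            nn.Linear(64, n_cuisines), nn.Softmax(dim=1),          # Layer 4
        )

    def forward(self, x):
        return self.net(x)

model = CuisineNet()
probs = model(torch.randn(4, 200))  # 4 recipes' ingredient feature vectors
print(probs.shape)                  # one probability row per recipe
```

Each output row is a probability distribution over the seven cuisines; the predicted cuisine is its argmax.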
12. HOW TO GET A RECOMMENDATION
INPUT: ingredient(s) and a choice of cuisine (if any)
ALTERNATIVE INPUT: the name of a recipe (its ingredients are taken as the input)
COLLAB FILTER (item-item based): KNN with Means for recipe ratings; cosine similarity for calculating distance
CONTENT-BASED RECOMMENDATION: ingredients to features using a word2vec model; cosine similarity for calculating distance
OUTPUT: ONE list of recipes according to user preferences, and ANOTHER list of recipes closest to the ingredients mentioned
13. MODEL - COLLABORATIVE FILTERING
INPUT
We build a recommender system in which the user inputs the ingredients they have on hand. Based on these inputs, we generate a short list of recipes that fit the user's preferences.
MODEL
KNN with Means has been chosen for the recommender: a basic collaborative filtering algorithm that takes into account the mean ratings of each user. We compute the cosine similarity.
DATA CLEANING
We use only one rating per user. Further, we define a rating scale for each recipe, determined by the lowest and highest ratings given by the users.
EVALUATE
We use the Surprise library to test our recsys. Using cross-validation, we evaluate the model with metrics such as MSE and RMSE.
OUTPUT
Finally, we get a recommendation based on an input string of ingredients.
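The deck uses the Surprise library's KNNWithMeans; as a from-scratch illustration of the same idea, here is a minimal item-item variant in plain Python that centres on item means (Surprise can centre on user or item means depending on configuration). All users, recipes, and ratings below are toy stand-ins:

```python
from collections import defaultdict
from math import sqrt

# Toy ratings: user -> {recipe: rating on a 1-5 scale}. Made up for illustration.
ratings = {
    "u1": {"lasagna": 5, "meatballs": 4, "piccata": 4},
    "u2": {"lasagna": 4, "meatballs": 5, "casserole": 2},
    "u3": {"lasagna": 2, "casserole": 5, "piccata": 1},
    "u4": {"meatballs": 4, "casserole": 4, "piccata": 5},
}

def item_vectors(ratings):
    """Invert user->item ratings into item -> {user: rating}."""
    items = defaultdict(dict)
    for user, ur in ratings.items():
        for item, r in ur.items():
            items[item][user] = r
    return items

def cosine(a, b):
    """Cosine similarity over co-rated users only."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    num = sum(a[u] * b[u] for u in common)
    den = sqrt(sum(a[u] ** 2 for u in common)) * sqrt(sum(b[u] ** 2 for u in common))
    return num / den if den else 0.0

def predict(user, item, ratings, k=2):
    """KNN-with-means: item mean plus similarity-weighted, mean-centred
    deviations from the k most similar items the user has rated."""
    items = item_vectors(ratings)
    mean_i = sum(items[item].values()) / len(items[item])
    sims = []
    for other, vec in items.items():
        if other == item or other not in ratings[user]:
            continue
        s = cosine(items[item], vec)
        if s > 0:
            mean_o = sum(vec.values()) / len(vec)
            sims.append((s, ratings[user][other] - mean_o))
    top = sorted(sims, reverse=True)[:k]
    if not top:
        return mean_i
    return mean_i + sum(s * dev for s, dev in top) / sum(s for s, _ in top)

print(round(predict("u2", "piccata", ratings), 2))
```

Recipes are then ranked for a user by their predicted rating, keeping only those containing the input ingredients.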
14. COLLABORATIVE FILTERING - RESULTS
Input: User_ID, Ingredients

User_id: 2043209, Ingredients: 'chicken, egg, milk'

RECIPE | INGREDIENTS | INSTRUCTIONS
Chicken Lasagna with White Sauce Recipe | mozzarella, mushroom, milk, spinach, egg, ricotta, n… | Preheat oven to 350 degrees F (175 degrees C)....
Swedish Meatballs | egg, milk, ground beef, cereal, onion, chicken, mush… | Preheat oven to 350 degrees F (175 degrees C)....
Mushroom Chicken Piccata Recipe | flour, salt, paprika, egg, milk, chicken, butter, mus… | In a shallow dish or bowl, mix together flour,...

User_id: 700, Ingredients: 'Cheese, onion'

RECIPE | INGREDIENTS | INSTRUCTIONS
Tuna Noodle Casserole II Recipe | noodle, mushroom, milk, tuna, cheese, onion, potato,... | In a large pot with boiling salted water cook ...
Hamburger Cheese Bake Recipe | pasta, ground beef, onion, tomato sauce, white sauce.. | In a large pot cook with boiling salted water..
15. MODEL - CONTENT-BASED RECOMMENDER SYSTEM
RAW INPUT
The ingredient list for every recipe. All ingredients are kept through the pre-processing pipeline.
MODEL: WORD2VEC
Ingredients to features: 200 dimensions (PCA made a negligible difference in cuisine results, hence unused), a context window of 12 (based on experiments), and a downsampling threshold of 1e-3.
RECOMMENDATION
Take the input ingredients and apply word2vec to them. Use cosine similarity to compare the distance to recipes in the dataset.
EVALUATION
Based on the performance of a downstream task (cuisine classification), plus eyeballing the recipes recommended.
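The recommendation step can be sketched as follows, assuming ingredient vectors have already been trained (the deck uses 200-dimensional word2vec vectors; the 3-d vectors, recipe names, and ingredients below are toy stand-ins): average the input ingredients' vectors and rank recipes by cosine similarity.

```python
from math import sqrt

# Toy 3-d "embeddings" standing in for trained 200-d word2vec vectors.
vec = {
    "chicken":    [0.9, 0.1, 0.0],
    "egg":        [0.7, 0.3, 0.1],
    "milk":       [0.2, 0.9, 0.1],
    "tomato":     [0.1, 0.2, 0.9],
    "basil":      [0.0, 0.3, 0.8],
    "mozzarella": [0.3, 0.8, 0.2],
}

# Toy recipe catalogue: name -> ingredient list.
recipes = {
    "chicken piccata": ["chicken", "egg", "milk"],
    "margherita":      ["tomato", "basil", "mozzarella"],
    "white lasagna":   ["chicken", "milk", "mozzarella"],
}

def embed(ingredients):
    """Average the vectors of known ingredients to get one recipe vector."""
    known = [vec[i] for i in ingredients if i in vec]
    n = len(known)
    return [sum(col) / n for col in zip(*known)]

def cos(a, b):
    num = sum(x * y for x, y in zip(a, b))
    return num / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def recommend(user_ingredients, top=2):
    """Rank recipes by cosine similarity to the user's ingredient vector."""
    q = embed(user_ingredients)
    ranked = sorted(recipes, key=lambda r: cos(q, embed(recipes[r])), reverse=True)
    return ranked[:top]

print(recommend(["chicken", "egg"]))
```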
17. POST RECOMMENDATION / FUTURE SCOPE
PERSONALIZATION
To improve the recommendations, users can be prompted to rate the recipes they were recommended.
INTEGRATION INTO SMART DEVICES
The tool can be integrated into smart devices such as refrigerators.
LEARN REGULARLY
Users can evaluate recommendation quality to improve the models.
23. References and Relevant Work
1. https://www.kaggle.com/c/whats-cooking/data (P)
2. https://www.kaggle.com/shuyangli94/food-com-recipes-and-user-interactions (D)
3. https://www.kaggle.com/hugodarwood/epirecipes (B)
4. https://www.kaggle.com/kaggle/recipe-ingredients-dataset (S)
5. https://www.kaggle.com/kanaryayi/recipe-ingredients-and-reviews (P)
6. https://data.world/datafiniti/food-ingredient-lists (D)
7. https://link.springer.com/article/10.1007/s10844-017-0469-0
8. http://foodb.ca/ (B)
9. https://github.com/lingcheng99/Flavor-Network (S)
10. https://www.nature.com/articles/srep00196
11. https://www.foodpairing.com/
12. https://www.wired.com/2013/11/a-new-kind-of-food-science/
13. https://www.prescouter.com/2019/05/flavor-discovery-big-data-ai/
14. https://waterfootprint.org/media/downloads/Mekonnen-Hoekstra-2011-WaterFootprintCrops.pdf
15. https://www.footprintnetwork.org/licenses/public-data-package-free/

1. A New Kind of Food Science: How IBM Is Using Big Data to Invent Creative Recipes
● The study develops an algorithm that generates a list of recipes ranked using three categories: surprise, pleasantness of odor, and flavor pairings.
2. Flavor network and the principles of food pairing
● The study introduces a flavor network that captures the flavor compounds shared by culinary ingredients. Given the increasing availability of information on food preparation, this data-driven investigation also opens new avenues towards a systematic understanding of culinary practice.
3. How healthy is the meal: an analysis of recipe data
● The study looks into the interconnections between ratings, nutrients, ingredients, meals, seasons, holidays, and cooking techniques.
24. CUISINE
GROCERY INSPIRATION
We narrowed the cuisines down to these classifications: we looked at our own grocery store experiences and saw that we could all identify items in the supermarket from these cuisines. Therefore, people could recognize most of these cuisines.
On the other hand, some cuisines, such as Jamaican and Moroccan, had very distinctive flavors and classifications. Therefore we tried keeping a small sample of them and building a model around that.
25. Fonts & colors used
This presentation has been made using the following fonts:
Staatliches (https://fonts.google.com/specimen/Staatliches)
Roboto Condensed (https://fonts.google.com/specimen/Roboto+Condensed)
Colors: #4c1130 #ff5864 #df183d #20124d #76a5af #134f5c #ffd966
OTHER RESOURCES:
Inspiration from across SlidesGo
Editor's Notes
We were looking for inspiration, and as student foodies we found that there is no good recipe recommendation engine that suggests recipes using the ingredients available at hand while also satisfying our specific tastes. Most existing solutions suggest very common recipes.
Data
Try best models
We collated the best results through an intuitive workflow.
COVID-19 has had an adverse impact on the population's health and finances. We want to provide people with a way to make food at home easily and quickly by recommending recipes depending on their preferences and what ingredients they have available on hand.
Our proposed solution bodes well in the current ongoing pandemic, wherein access to restaurant food is becoming increasingly difficult. Anyone interested in saving money on eating out while simultaneously becoming more independent and healthier will benefit from this.
People using the service can save money while simultaneously honing a skill everyone should have: the ability to cook food for oneself. They will also be able to make an informed choice about eating healthy food.
We identified the fields in every dataset that would provide value to our recommendation engine.
Fussiness
Ratings
INGREDIENT CLEANING:
We wanted to transform all our datasets to use the "ingredient only" format. We used the following techniques across our datasets:
● Removal of punctuation, numeric quantities, and extra spaces
● Removal of quantity strings such as "ounces", "pounds", etc. and their variations
● Splitting ingredients on "and" and "with" into individual ingredients. For example, "tomato sauce with basil and garlic" becomes "tomato sauce", "basil", "garlic"
● Use of POS tagging to identify noun phrases. For example, "Whisk some eggs" -> "eggs"
● Removal of words ending in "ed"
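The cleaning steps above (minus the POS-tagging step, which needs a tagger such as NLTK's) can be sketched in a few lines. The quantity-word list below is illustrative, not the team's full list:

```python
import re

# Illustrative subset of the quantity strings the slides mention.
QUANTITY_WORDS = {"oz", "ounce", "ounces", "kg", "pound", "pounds",
                  "tsp", "tbsp", "cup", "cups", "little", "pinch"}

def clean_ingredient(raw):
    """Reduce a raw ingredient string to a list of bare ingredients."""
    text = raw.lower()
    text = re.sub(r"[^\w\s]", " ", text)      # strip punctuation
    text = re.sub(r"\d+", " ", text)          # strip numeric quantities
    # Split compound ingredients on "and" / "with".
    parts = re.split(r"\b(?:and|with)\b", text)
    cleaned = []
    for part in parts:
        # Drop quantity words and (crudely) past-participle modifiers.
        words = [w for w in part.split()
                 if w not in QUANTITY_WORDS and not w.endswith("ed")]
        if words:
            cleaned.append(" ".join(words))
    return cleaned

print(clean_ingredient("2 oz Tomato Sauce with Basil and minced Garlic"))
# -> ['tomato sauce', 'basil', 'garlic']
```

Dropping every word ending in "ed" is as crude as the slide suggests (it would also delete "seed"), which is why step 05, iterating on the cleaning as results come in, matters.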
We applied the cuisine prediction to the recipe datasets (>270K recipes).
Key difference: the neural network redistributes recipes from the ensemble's "American" class into Mexican and other cuisines.