Machine Learning Project
1. Problem 1: WHAT CUISINE IS THIS RECIPE?
Eckovation Machine Learning
Team Bits N’ Bytes :-
-Gaurav (00711503016)
-Kartik (01411503016)
-Pooja (41211503016)
-Kishu (01911504916)
-Govind (01411504916)
3. Data Shared: Training and Testing Data
Dataset Format: JSON file
Language Used : Python
Type of Machine Learning used : Supervised
Dataset includes : the recipe id, the type of cuisine, and
the list of ingredients (of variable length) for each
recipe.
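As a rough illustration of how such records could be turned into model input, the sketch below parses a small JSON sample and encodes each variable-length ingredient list as a fixed-length 0/1 vector. The field names ("id", "cuisine", "ingredients") and the sample recipes are assumptions made for this example; they are not taken from the shared dataset.

```python
import json

# Hypothetical sample in the assumed format; the real data files are not shown here.
SAMPLE_JSON = """
[
  {"id": 1, "cuisine": "italian",
   "ingredients": ["tomato", "basil", "olive oil"]},
  {"id": 2, "cuisine": "indian",
   "ingredients": ["cumin", "tomato", "garam masala", "onion"]}
]
"""


def build_vocabulary(recipes):
    """Return a sorted list of every distinct ingredient in the recipes."""
    return sorted({ing for recipe in recipes for ing in recipe["ingredients"]})


def to_vector(recipe, vocabulary):
    """Encode a variable-length ingredient list as a fixed-length 0/1 vector."""
    present = set(recipe["ingredients"])
    return [1 if ing in present else 0 for ing in vocabulary]


recipes = json.loads(SAMPLE_JSON)
vocabulary = build_vocabulary(recipes)
vectors = [to_vector(recipe, vocabulary) for recipe in recipes]
labels = [recipe["cuisine"] for recipe in recipes]
```

Each vector has one position per known ingredient, so recipes of different lengths can be fed to the same classifier.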
6. ALGORITHM USED:-
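LOGISTIC REGRESSION:- It is the go-to method for binary
classification. It estimates the probability of a class as a
value between 0 and 1, which is then thresholded to give a
discrete outcome: either one thing or another. For a problem
with more than two classes, such as cuisine prediction, it is
extended with a one-vs-rest or multinomial scheme.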
RANDOM FOREST CLASSIFIER:- It builds multiple decision
trees and merges their outputs to get a more accurate and
stable prediction. One big advantage of random forest is
that it can be used for both classification and regression
problems, which form the majority of current machine
learning systems.
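NAIVE BAYES:- Naive Bayes is a classification algorithm for
binary (two-class) and multi-class classification problems.
It applies Bayes' theorem under the assumption that the
features are independent of each other given the class.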
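To make the multi-class case concrete, here is a minimal from-scratch sketch of a multinomial Naive Bayes classifier over ingredient lists. It is an illustration only, not the team's implementation; the training recipes are invented, and Laplace (add-one) smoothing is an assumption added so that unseen ingredients do not give zero probability.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(recipes):
    """Count classes and per-class ingredients.

    recipes: list of (cuisine, ingredient_list) pairs.
    """
    class_counts = Counter()
    ingredient_counts = defaultdict(Counter)
    vocabulary = set()
    for cuisine, ingredients in recipes:
        class_counts[cuisine] += 1
        for ing in ingredients:
            ingredient_counts[cuisine][ing] += 1
            vocabulary.add(ing)
    return class_counts, ingredient_counts, vocabulary


def predict(model, ingredients):
    """Return the cuisine with the highest log posterior score."""
    class_counts, ingredient_counts, vocabulary = model
    total = sum(class_counts.values())
    best_cuisine, best_score = None, -math.inf
    for cuisine, count in class_counts.items():
        score = math.log(count / total)  # log prior of the class
        denom = sum(ingredient_counts[cuisine].values()) + len(vocabulary)
        for ing in ingredients:
            # Add-one smoothing keeps unseen ingredients from zeroing the score.
            score += math.log((ingredient_counts[cuisine][ing] + 1) / denom)
        if score > best_score:
            best_cuisine, best_score = cuisine, score
    return best_cuisine


# Invented toy training set, for illustration only.
train = [
    ("italian", ["tomato", "basil", "olive oil"]),
    ("italian", ["pasta", "tomato", "garlic"]),
    ("indian", ["cumin", "turmeric", "onion"]),
    ("indian", ["garam masala", "onion", "tomato"]),
]
model = train_naive_bayes(train)
guess_a = predict(model, ["basil", "pasta"])
guess_b = predict(model, ["cumin", "onion"])
```

Working in log space avoids numerical underflow when a recipe has many ingredients.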
9. Problem 2: Will You Get A Free Pizza?
This problem is based on sentiment analysis, in which we identify positive, negative
and neutral opinions in natural-language text.
Here, if someone buys a pizza for the requester, the request is considered
successful; if not, it is considered unsuccessful.
INPUT :- Dataset of textual requests for pizza from the Random Acts Of Pizza community
on Reddit.
GOAL :- Given a request (post), the goal is to predict whether it will be successful or
unsuccessful.
We aim to convert textual features into numeric features that carry sentiment
information and are suitable as input to a machine learning algorithm.
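The idea of turning text into numeric sentiment features can be sketched with a tiny word-list scorer. This is a simplified stand-in and not the NLTK-based approach used in the project; the word lists, the feature names and the example request are all hypothetical.

```python
# Tiny hypothetical lexicons; a real system would use a full sentiment lexicon.
POSITIVE = {"thanks", "grateful", "love", "happy", "appreciate"}
NEGATIVE = {"broke", "hungry", "lost", "sad", "unemployed"}


def sentiment_features(text):
    """Map a request text to numeric features in the range [-1, 1]."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = len(words) or 1  # avoid division by zero on empty text
    return {
        "pos": pos / total,               # share of positive words
        "neg": neg / total,               # share of negative words
        "polarity": (pos - neg) / total,  # net sentiment
    }


features = sentiment_features("I lost my job and I am hungry. Thanks!")
```

The resulting dictionary can be flattened into a feature vector and combined with request metadata before training a classifier.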
10. DATASET SHARED: TRAINING AND TEST
Dataset Format: JSON file
Language Used : Python
Type of Machine Learning used : Supervised
Dataset includes :
5671 requests collected from the Reddit community Random Acts Of Pizza between
December 8, 2010 and September 29, 2013.
Outcome of each request (whether the author gets the pizza or not) : Known
Metadata includes :
Time of the request, activity of the requester,
community age of the requester, etc.
13. ALGORITHM USED
Logistic Regression : It is a statistical model that is usually applied to a binary dependent variable. The two
values of the dependent variable are often labelled “0” and “1”, which in our problem correspond to
“requester does not get the pizza” and “requester gets the pizza” respectively.
Naive Bayes : It is a family of algorithms based on the principle that the value of a particular feature is
independent of the value of any other feature, given the class variable. Its advantage is that it requires
only a small amount of training data to estimate the parameters necessary for classification.
Support Vector Machine : It is an extension of the support vector classifier (SVC) that accommodates
non-linear boundaries, typically by using kernels. Although these definitions are formally distinct,
people commonly refer to all of them as SVMs for simplicity.
Random Forest : It is an ensemble of a multitude of decision trees whose predictions are combined. It can
also be interpreted as a weighted-neighbourhood scheme and can be used in an unsupervised setting.
We used NLTK’s API to get the polarity of the request text, which serves as an input feature for predicting
whether the request will be successful or unsuccessful.
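As an illustration of the logistic regression entry in the list above, the following sketch trains a binary model with plain stochastic gradient descent and outputs the probability that a request succeeds. It is not the project's code: the two features (a polarity score and a scaled requester-activity value), the toy data and the hyperparameters are assumptions chosen for the example.

```python
import math


def sigmoid(z):
    """Squash a real-valued score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(X, y):
            p = sigmoid(bias + sum(w * x for w, x in zip(weights, features)))
            error = p - label  # gradient of the log loss w.r.t. the score
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


def predict_proba(model, features):
    """Probability that the label is 1 (the requester gets the pizza)."""
    weights, bias = model
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def predict(model, features, threshold=0.5):
    """Turn the probability into a discrete 0/1 outcome."""
    return 1 if predict_proba(model, features) >= threshold else 0


# Hypothetical features: [polarity, scaled requester activity].
X = [[0.8, 1.0], [0.6, 0.9], [-0.5, 0.1], [-0.7, 0.2]]
y = [1, 1, 0, 0]  # 1 = requester gets the pizza, 0 = does not
model = train_logistic_regression(X, y)
predictions = [predict(model, features) for features in X]
```

The 0.5 threshold is the usual default; a different threshold trades precision against recall.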
15. CONCLUSION
Hence, SVM and Random Forest gave the highest accuracy on this
dataset, although all four models are within one percentage point
of each other.
ALGORITHM                       ACCURACY
SUPPORT VECTOR MACHINE (SVM)    0.7648514851485149
RANDOM FOREST                   0.760519801980198
NAIVE BAYES                     0.7580445544554455
LOGISTIC REGRESSION             0.7580445544554455
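For reference, accuracy is the fraction of test requests whose predicted outcome matches the known outcome. The sketch below shows that calculation on invented labels and then ranks the four accuracies reported above; the helper name and the toy labels are assumptions for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the known outcomes."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Invented labels: 1 = request successful, 0 = unsuccessful.
toy_score = accuracy([1, 0, 0, 1, 0], [1, 0, 1, 1, 0])

# Accuracies as reported in the table above.
reported = {
    "SVM": 0.7648514851485149,
    "Random Forest": 0.760519801980198,
    "Naive Bayes": 0.7580445544554455,
    "Logistic Regression": 0.7580445544554455,
}
ranking = sorted(reported, key=reported.get, reverse=True)
spread = max(reported.values()) - min(reported.values())
```

The spread between the best and worst model is under 0.01, which is why the ranking should be read with some caution.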