Machine Learning
ASSIGNMENT 2
• NAME: Faizan Arshad
• REG. NO: 2018-P/2019-EE-139
• SECTION: B
SUBMITTED TO
DR KASHIF JAVAID
University of Engineering & Technology Lahore, Pakistan
Logistic regression
Logistic regression is a statistical method used for building machine learning models where the
dependent variable is dichotomous, i.e. binary. It is used to describe data and the relationship
between one dependent variable and one or more independent variables. The independent variables
can be nominal, ordinal, or of interval type. The name "logistic regression" comes from the logistic
function the method uses, also known as the sigmoid function, whose value lies between zero and one.
For example, a logistic function can be used to estimate the probability of a vehicle breaking down,
depending on how many years have passed since it was last serviced.[1]
Advantages of the Logistic Regression Algorithm
• Logistic regression performs well when the data is linearly separable.
• It requires relatively few computational resources and is highly interpretable.
• Scaling of the input features is not a problem, and little hyperparameter tuning is required.
• A logistic regression model is easy to implement and train.
• It gives a measure of how relevant each predictor is (the coefficient size) and its direction of
association (positive or negative).
How Does the Logistic Regression Algorithm Work?
The sigmoid function (the core of the logistic regression model) is used to map predicted values to
probabilities. When plotted, the sigmoid function is an 'S'-shaped curve whose output lies between
0 and 1. The predicted values are pushed towards the margins at the top and the bottom of the
Y-axis, which correspond to the labels 1 and 0. Based on these values, the target variable can be
classified into either of the two classes.
The equation for the sigmoid function is given as:
y = 1 / (1 + e^(-x)),
where e is the exponential constant (Euler's number), approximately 2.718.
This equation gives a value of y (the predicted value) close to zero if x is a large negative
value. Similarly, if x is a large positive value, y is close to one.
A decision boundary can then be set to predict the class to which a data point belongs; based on
this threshold, the estimated values are assigned to classes.
For instance, take the example of classifying emails as spam or not spam: if the predicted
probability p is at least 0.5, the email is classified as spam, and otherwise it is classified as not spam.[2]
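As a minimal sketch of this thresholding idea (in Python with NumPy; the scores below are made-up numbers, not taken from the assignment):

```python
import numpy as np

def sigmoid(x):
    # Logistic (sigmoid) function: maps any real value into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

# Raw scores, e.g. a linear combination of features w.x + b (illustrative values only).
scores = np.array([-4.0, -0.5, 0.0, 1.2, 5.0])
probs = sigmoid(scores)

# Decision boundary at 0.5: probabilities >= 0.5 are assigned class 1 (e.g. "spam").
labels = (probs >= 0.5).astype(int)
print(probs)   # close to 0 for large negative scores, close to 1 for large positive ones
print(labels)  # [0 0 1 1 1]
```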
Types of logistic regression
Logistic regression models are generally used for predictive analysis and binary classification of
data. However, they can also be used for multi-class classification. Logistic regression models
fall into three main categories:
• Binary Logistic Regression Model
This is one of the most widely used logistic regression models, used to predict and categorize data
into one of two classes. For example, a patient either has cancerous cells or does not; the data
cannot belong to both categories at the same time.
• Multinomial Logistic Regression Model
The multinomial logistic regression model is used to classify the target variable into multiple
classes that have no inherent ordering. For instance, the type of food an individual is likely to
order based on their diet preference: vegetarian, non-vegetarian, or vegan.
• Ordinal Logistic Regression Model
The ordinal logistic regression model is used to classify the target variable into classes that also
have a natural order. For example, a pupil's performance in an examination can be classified as
poor, good, or excellent in a hierarchical order. The data is thus not only classified into three
distinct categories; each category also has a distinct rank.
The logistic regression algorithm can be used in a wide range of cases, such as tumor
classification, spam detection, and sex categorization, to name a few.[2]
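As a hedged illustration of fitting binary and multinomial logistic regression models, the sketch below uses scikit-learn and its built-in breast-cancer and iris datasets; these datasets are assumptions for demonstration, not the data used in this assignment:

```python
from sklearn.datasets import load_breast_cancer, load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Binary logistic regression: two classes (malignant vs. benign).
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
binary_clf = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_tr, y_tr)
print("binary accuracy:", binary_clf.score(X_te, y_te))

# Multinomial logistic regression: three iris species with no inherent ordering.
Xi, yi = load_iris(return_X_y=True)
Xi_tr, Xi_te, yi_tr, yi_te = train_test_split(Xi, yi, test_size=0.3, random_state=0)
multi_clf = make_pipeline(StandardScaler(), LogisticRegression()).fit(Xi_tr, yi_tr)
print("multi-class accuracy:", multi_clf.score(Xi_te, yi_te))
```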
Random Forest
A random forest is a supervised machine learning algorithm that is built from decision tree
algorithms. It is applied in various industries, such as banking and e-commerce, to predict
behavior and outcomes. A random forest consists of many decision trees. The 'forest' generated by
the algorithm is trained through bagging (bootstrap aggregating), an ensemble meta-algorithm that
improves the accuracy of machine learning algorithms. A random forest mitigates the limitations of
a single decision tree: it reduces overfitting, improves accuracy, and produces predictions without
requiring extensive configuration. Classification in random forests employs an ensemble
methodology to obtain the outcome. The training data is used to train many decision trees; this
dataset consists of observations and features, and the features are selected randomly during the
splitting of nodes.
A random forest system relies on multiple decision trees. Every decision tree consists of a root
node, decision nodes, and leaf nodes. The leaf node reached in each tree is the output produced by
that specific decision tree. The final output is selected by majority voting: the output chosen by
the majority of the decision trees becomes the final output of the random forest system.[3]
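A minimal sketch of training such an ensemble with scikit-learn follows; the dataset and parameter values are illustrative assumptions, not those used in the assignment:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Each of the 100 trees is trained on a bootstrap sample of the training data,
# and only a random subset of features is considered at each node split.
forest = RandomForestClassifier(
    n_estimators=100,      # number of decision trees in the "forest"
    max_features="sqrt",   # random feature subset considered at each split
    bootstrap=True,        # bagging: sample the training set with replacement
    random_state=0,
)
forest.fit(X_tr, y_tr)

# The ensemble prediction aggregates the individual trees' outputs.
print("test accuracy:", forest.score(X_te, y_te))
```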
Some of the applications of the random forest may include:
1. Banking
2. Stock market
3. E-Commerce
4. Health Care System
Features of Random Forests
• It is unexcelled in accuracy among current algorithms.
• It runs efficiently on large databases.
• It can handle thousands of input variables without variable deletion.
• It gives estimates of which variables are important in the classification (illustrated in the
sketch after this list).
• It generates an internal unbiased estimate of the generalization error as the forest building
progresses (the out-of-bag estimate, also shown in the sketch after this list).
• It has an effective method for estimating missing data and maintains accuracy when a large
proportion of the data are missing.
• It has methods for balancing error in data sets with unbalanced class populations.
• Generated forests can be saved for future use on other data.
• Prototypes are computed that give information about the relation between the variables and
the classification.
• It computes proximities between pairs of cases that can be used in clustering, locating
outliers, or (by scaling) giving interesting views of the data.
• These capabilities can be extended to unlabeled data, leading to unsupervised clustering, data
views, and outlier detection.
• It offers an experimental method for detecting variable interactions.[4]
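Two of the features above, the variable-importance estimates and the internal (out-of-bag) error estimate, can be inspected directly in scikit-learn. The sketch below is illustrative only; the dataset is an assumption:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
forest = RandomForestClassifier(
    n_estimators=200,
    oob_score=True,    # out-of-bag samples give an internal estimate of generalization accuracy
    random_state=0,
).fit(data.data, data.target)

# Internal estimate of generalization accuracy (1 - OOB error).
print("OOB score:", forest.oob_score_)

# Impurity-based estimate of how important each input variable is to the classification.
ranked = sorted(zip(data.feature_names, forest.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, importance in ranked[:5]:
    print(f"{name}: {importance:.3f}")
```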
Support Vector Machines
A support vector machine (SVM) is a supervised machine learning model that uses classification
algorithms for two-group classification problems. After an SVM model is given sets of labeled
training data for each category, it is able to categorize new examples (for instance, new text).
Compared to newer algorithms like neural networks, SVMs have two main advantages: higher speed and
better performance with a limited number of samples (in the thousands). This makes the algorithm
very suitable for text classification problems, where it is common to have access to a dataset of
at most a couple of thousand tagged samples.[5]
How Does SVM Work?
The basics of support vector machines and how they work are best understood with a simple example.
Imagine we have two tags, red and blue, and our data has two features, x and y. We want a
classifier that, given a pair of (x, y) coordinates, outputs whether the point is red or blue.
Imagine the already labeled training data plotted on a plane.
A support vector machine takes these data points and outputs the hyperplane (which in two
dimensions is simply a line) that best separates the tags. This line is the decision boundary:
anything that falls on one side of it we classify as blue, and anything that falls on the other
side as red.
But what exactly is the best hyperplane? For SVM, it is the one that maximizes the margin to both
tags. In other words, it is the hyperplane (remember, a line in this case) whose distance to the
nearest element of each tag is the largest.
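A minimal sketch of fitting a linear SVM on a handful of 2-D points and reading off the separating line and the support vectors (the toy points are made up for illustration):

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data: two linearly separable "tags" (0 = blue, 1 = red).
X = np.array([[1.0, 2.0], [2.0, 3.0], [2.5, 1.5],
              [6.0, 5.0], [7.0, 7.5], [8.0, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# In 2-D the separating hyperplane is the line w[0]*x + w[1]*y + b = 0.
w, b = clf.coef_[0], clf.intercept_[0]
print("hyperplane coefficients:", w, "intercept:", b)

# The points closest to the boundary, which define the maximum margin.
print("support vectors:\n", clf.support_vectors_)

# Classify new (x, y) points on either side of the boundary.
print(clf.predict([[3.0, 3.0], [7.0, 6.0]]))
```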
Nonlinear data
This example was easy, since the data was clearly linearly separable: we could draw a straight line
to separate red and blue. Sadly, things usually aren't that simple. Consider a case where no single
straight line forms a decision boundary between the two tags, yet the points are still clearly
segregated and it looks as though it should be easy to separate them.
So here is what we'll do: we will add a third dimension. Up until now we had two dimensions, x and
y. We create a new z dimension and define it in a way that is convenient for us: z = x² + y² (the
squared distance of each point from the origin, so all points on the same circle share the same z).
This gives us a three-dimensional space in which the two classes can be separated. Note that since
we are now in three dimensions, the hyperplane is a plane parallel to the x-y plane at a certain
height z (say, z = 1).[5]
What's left is mapping it back to two dimensions: the plane z = 1 corresponds to the circle
x² + y² = 1 in the original plane, which becomes the (nonlinear) decision boundary.
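The sketch below illustrates this trick; it assumes scikit-learn's make_circles as a stand-in for the concentric data described above and adds the z = x² + y² feature by hand:

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two rings of points: not separable by any straight line in the (x, y) plane.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# A linear SVM in the original two dimensions struggles.
print("2-D linear SVM accuracy:", SVC(kernel="linear").fit(X, y).score(X, y))

# Add the third dimension z = x^2 + y^2 (squared distance from the origin).
z = (X ** 2).sum(axis=1).reshape(-1, 1)
X3 = np.hstack([X, z])

# In three dimensions the classes are separated by a plane at (roughly) constant z,
# which maps back to a circle in the original plane.
print("3-D linear SVM accuracy:", SVC(kernel="linear").fit(X3, y).score(X3, y))
```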
Advantages of SVM:
• Effective in high-dimensional cases.
• Memory efficient, since it uses only a subset of training points (the support vectors) in the
decision function.
• Different kernel functions can be specified for the decision function, and it is possible to
specify custom kernels (see the sketch after this list).[6]
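As a short, hedged sketch of specifying built-in and custom kernels (scikit-learn accepts any callable that returns a Gram matrix; the data are again a synthetic stand-in):

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Built-in kernels: the RBF kernel handles the circular data without a manual feature map.
print("rbf   :", SVC(kernel="rbf").fit(X, y).score(X, y))
print("poly  :", SVC(kernel="poly", degree=2).fit(X, y).score(X, y))

# A custom kernel is any function k(A, B) returning the matrix of pairwise similarities.
def quadratic_kernel(A, B):
    return (A @ B.T + 1.0) ** 2

print("custom:", SVC(kernel=quadratic_kernel).fit(X, y).score(X, y))
```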
ANOVA
Analysis of variance (ANOVA) is an analysis tool used in statistics that splits an observed
aggregate variability found inside a data set into two parts: systematic factors and random factors.
The systematic factors have a statistical influence on the given data set, while the random factors
do not. Analysts use the ANOVA test to determine the influence that independent variables have
on the dependent variable in a regression study.
The t- and z-test methods developed in the 20th century were used for statistical analysis until
1918, when Ronald Fisher created the analysis of variance method. ANOVA is also called the Fisher
analysis of variance, and it is an extension of the t- and z-tests. The term became well known in
1925, after appearing in Fisher's book, "Statistical Methods for Research Workers."
The ANOVA test is the initial step in analyzing factors that affect a given data set. Once the test
is finished, an analyst performs additional testing on the methodical factors that measurably
contribute to the data set's inconsistency. The analyst utilizes the ANOVA test results in an f-test
to generate additional data that aligns with the proposed regression models.
The ANOVA test allows a comparison of more than two groups at the same time to determine
whether a relationship exists between them. The result of the ANOVA formula, the F statistic
(also called the F-ratio), allows for the analysis of multiple groups of data to determine the
variability between samples and within samples.
If no real difference exists between the tested groups, which is called the null hypothesis, the
result of the ANOVA's F-ratio statistic will be close to 1. The distribution of all possible values
of the F statistic is the F-distribution. This is actually a group of distribution functions, with two
characteristic numbers, called the numerator degrees of freedom and the denominator degrees of
freedom.[7]
The formula for ANOVA is:
F = MST / MSE
where:
F = ANOVA coefficient (the F statistic)
MST = mean sum of squares due to treatment (between-group variability)
MSE = mean sum of squares due to error (within-group variability)
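A worked sketch of this formula in Python, using three made-up sample groups and cross-checking the hand computation against scipy.stats.f_oneway:

```python
import numpy as np
from scipy.stats import f_oneway

# Three illustrative groups (e.g. exam scores from three colleges; values are made up).
groups = [np.array([82.0, 75.0, 90.0, 88.0]),
          np.array([70.0, 65.0, 72.0, 68.0]),
          np.array([85.0, 95.0, 92.0, 89.0])]

k = len(groups)                          # number of groups (treatments)
n_total = sum(len(g) for g in groups)    # total number of observations
grand_mean = np.concatenate(groups).mean()

# Mean sum of squares due to treatment (between-group variability).
ss_treatment = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
mst = ss_treatment / (k - 1)

# Mean sum of squares due to error (within-group variability).
ss_error = sum(((g - g.mean()) ** 2).sum() for g in groups)
mse = ss_error / (n_total - k)

F = mst / mse
print("F by hand:", F)
print("F and p via SciPy:", f_oneway(*groups))
```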
Method
A researcher might, for example, test students from multiple colleges to see if students from one
of the colleges consistently outperform students from the other colleges. In a business application,
an R&D researcher might test two different processes of creating a product to see if one process
is better than the other in terms of cost efficiency.
The type of ANOVA test used depends on a number of factors. It is applied when the data are
experimental. Analysis of variance is also employed when there is no access to statistical
software, in which case ANOVA is computed by hand; it is simple to use and best suited for small
samples. With many experimental designs, the sample sizes have to be the same for the various
factor-level combinations.
ANOVA is helpful for testing three or more groups and is similar to running multiple two-sample
t-tests. However, it results in fewer type I errors and is appropriate for a wide range of
problems. ANOVA compares the means of each group and partitions the observed variance into
different sources. It is employed with subjects, test groups, and between-group and within-group
comparisons.
One-Way ANOVA versus Two-Way ANOVA
There are two main types of ANOVA: one-way (or unidirectional) and two-way. There are also
variations of ANOVA. For example, MANOVA (multivariate ANOVA) differs from ANOVA in that the
former tests for multiple dependent variables simultaneously, while the latter assesses only one
dependent variable at a time. One-way or two-way refers to the number of independent variables in
the analysis of variance test. A one-way ANOVA evaluates the impact of a single factor on a single
response variable; it determines whether all the samples are the same. The one-way ANOVA is used
to determine whether there are any statistically significant differences between the means of
three or more independent (unrelated) groups.
A two-way ANOVA is an extension of the one-way ANOVA. With a one-way ANOVA you have one
independent variable affecting a dependent variable; with a two-way ANOVA there are two
independent variables. For example, a two-way ANOVA allows a company to compare worker
productivity based on two independent variables, such as salary and skill set. It is used to
observe the interaction between the two factors and tests the effect of the two factors at the
same time.
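A hedged sketch of such a two-way ANOVA using statsmodels; the productivity, salary, and skill values below are entirely made up for illustration:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Made-up worker productivity data with two independent factors.
df = pd.DataFrame({
    "productivity": [55, 60, 58, 72, 75, 70, 62, 65, 63, 80, 83, 79],
    "salary":       ["low", "low", "low", "high", "high", "high"] * 2,
    "skill":        ["junior"] * 6 + ["senior"] * 6,
})

# Two main effects plus their interaction.
model = ols("productivity ~ C(salary) + C(skill) + C(salary):C(skill)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```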
Results and Conclusion:
Split           RF Classifier         SVM Classifier        Logistic Regression Classifier
1st Split       0.782051282051282     0.8717948717948718    0.782051282051282
2nd Split       0.717948717948718     0.8076923076923077    0.7435897435897436
3rd Split       0.8076923076923077    0.8717948717948718    0.8589743589743589
Average Value   76.92307692307692 %   85.04273504273505 %   79.48717948717949 %
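The exact dataset and splitting scheme behind the table above are not reproduced here; as a hedged sketch, per-split accuracies of this kind could be obtained with a 3-fold evaluation such as the following (the dataset is a placeholder assumption):

```python
from sklearn.datasets import load_breast_cancer          # placeholder dataset, not the one used above
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

classifiers = [
    ("RF", RandomForestClassifier(random_state=0)),
    ("SVM", make_pipeline(StandardScaler(), SVC())),
    ("Logistic Regression", make_pipeline(StandardScaler(), LogisticRegression())),
]

for name, clf in classifiers:
    scores = cross_val_score(clf, X, y, cv=cv)   # one accuracy per split
    print(name, scores, f"average = {100 * scores.mean():.2f} %")
```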
An ANOVA test was also run on these results. The F value in a one-way ANOVA is a tool to help
answer the question "Is the variance between the means of two or more populations significantly
different?" The F value in the ANOVA test also determines the p value; the p value is the
probability of getting a result at least as extreme as the one that was actually observed, given
that the null hypothesis is true.
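Assuming the ANOVA was applied to the per-split accuracies listed in the table above (an assumption, since the original output is not reproduced here), the test could be run as follows:

```python
from scipy.stats import f_oneway

# Per-split accuracies from the results table above.
rf  = [0.782051282051282, 0.717948717948718, 0.8076923076923077]
svm = [0.8717948717948718, 0.8076923076923077, 0.8717948717948718]
lr  = [0.782051282051282, 0.7435897435897436, 0.8589743589743589]

# One-way ANOVA: is the variance between the classifiers' mean accuracies
# significantly larger than the variance within each classifier's splits?
f_value, p_value = f_oneway(rf, svm, lr)
print(f"F = {f_value:.3f}, p = {p_value:.3f}")
```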
References:
[1] https://www.simplilearn.com/tutorials/machine-learning-tutorial/logistic-regression-in-python
[2] https://www.jigsawacademy.com/blogs/data-science/logistic-regression/
[3] https://www.section.io/engineering-education/introduction-to-random-forest-in-machine-learning/
[4] https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm
[5] https://monkeylearn.com/blog/introduction-to-support-vector-machines-svm/
[6] https://www.geeksforgeeks.org/support-vector-machine-algorithm/
[7] https://www.investopedia.com/terms/a/anova.asp