This document contains legal notices and disclaimers for an Intel presentation. It states that the presentation is for informational purposes only and that Intel makes no warranties. It also notes that performance depends on system configuration and that sample source code is released under an Intel license agreement. Finally, it provides basic copyright information.
This slide deck covers PCA (Principal Component Analysis) and SVD (Singular Value Decomposition): basic PCA, applications and uses of PCA and SVD, important PCA keywords in brief, the PCA algorithm and its implementation, basic SVD, SVD calculation and implementation, and a performance comparison of SVD and PCA on one publicly available dataset.
N.B. Information in these slides is gathered from:
1. the Machine Learning course by Andrew Ng,
2. Mining of Massive Datasets (Stanford University), via the Artificial Intelligence - All in One YouTube channel,
3. and many more sources, as cited in the slides.
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter 8 (Hakky St)
This is the documentation of a study meeting in the lab.
The book is "Hands-On Machine Learning with Scikit-Learn and TensorFlow", and this covers Chapter 8.
This presentation briefly defines machine learning and its types of algorithms. After that, two algorithms are presented: first, the naive Bayes classifier for text classification, and then k-means for clustering, including some strategies to improve results.
In this presentation, we approach a two-class classification problem. We try to find a plane that separates the classes in the feature space, also called a hyperplane. If we can't find a hyperplane, we can be creative in two ways: 1) we soften what we mean by "separate", and 2) we enrich and enlarge the feature space so that separation is possible.
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le... (Simplilearn)
This Support Vector Machine (SVM) presentation will help you understand the Support Vector Machine algorithm, a supervised machine learning algorithm that can be used for both classification and regression problems. It will help you learn where and when to use the SVM algorithm, how the algorithm works, what hyperplanes and support vectors are in SVM, how the distance margin helps optimize the hyperplane, kernel functions in SVM for data transformation, and the advantages of the SVM algorithm. At the end, we will also implement the Support Vector Machine algorithm in Python to differentiate crocodiles from alligators for a given dataset.
Below topics are explained in this Support Vector Machine presentation:
1. What is Machine Learning?
2. Why support vector machine?
3. What is support vector machine?
4. Understanding support vector machine
5. Advantages of support vector machine
6. Use case in Python
- - - - - - - -
About Simplilearn Machine Learning course:
A form of artificial intelligence, Machine Learning is revolutionizing the world of computing as well as all people’s digital interactions. Machine Learning powers such innovative automated technologies as recommendation engines, facial recognition, fraud protection and even self-driving cars. This Machine Learning course prepares engineers, data scientists and other professionals with the knowledge and hands-on skills required for certification and job competency in Machine Learning.
- - - - - - -
Why learn Machine Learning?
Machine Learning is taking over the world, and with that comes a growing need among companies for professionals who know the ins and outs of Machine Learning.
The Machine Learning market size is expected to grow from USD 1.03 Billion in 2016 to USD 8.81 Billion by 2022, at a Compound Annual Growth Rate (CAGR) of 44.1% during the forecast period.
- - - - - -
What skills will you learn from this Machine Learning course?
By the end of this Machine Learning course, you will be able to:
1. Master supervised, unsupervised, and reinforcement learning concepts and modeling.
2. Gain practical mastery over principles, algorithms, and applications of Machine Learning through a hands-on approach which includes working on 28 projects and one capstone project.
3. Acquire thorough knowledge of the mathematical and heuristic aspects of Machine Learning.
4. Understand the concepts and operation of support vector machines, kernel SVM, Naive Bayes, decision tree classifier, random forest classifier, logistic regression, K-nearest neighbors, K-means clustering and more.
5. Be able to model a wide variety of robust Machine Learning algorithms, including deep learning, clustering, and recommendation systems.
- - - - - - -
Data Science - Part IX - Support Vector Machine (Derek Kane)
This lecture provides an overview of Support Vector Machines in a more relatable and accessible manner. We will go through some methods of calibration and diagnostics of SVM and then apply the technique to accurately detect breast cancer within a dataset.
Application of Machine Learning in Agriculture (Aman Vasisht)
With the growing trend of machine learning, it is needless to say how machine learning can help reap benefits in agriculture. It will be a boon for farmer welfare.
By popular demand, here is a case study of my first Kaggle competition from about a year ago. Hope you find it useful. Thank you again to my fantastic team.
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ... (Simplilearn)
This presentation on Machine Learning will help you understand what clustering is, K-Means clustering, a flowchart to understand K-Means clustering along with a demo showing the clustering of cars into brands, what logistic regression is, the logistic regression curve, the sigmoid function, and a demo on how to classify a tumor as malignant or benign based on its features. Machine Learning algorithms can help computers play chess, perform surgeries, and get smarter and more personal. K-Means and logistic regression are two widely used Machine Learning algorithms which we are going to discuss in this video. Logistic Regression is used to estimate discrete values (usually binary values like 0/1) from a set of independent variables. It helps to predict the probability of an event by fitting data to a logit function. It is also called logit regression. K-means clustering is an unsupervised learning algorithm: unlike in supervised learning, you don't have labeled data. You have a set of data that you want to group into clusters, meaning objects that are similar in nature and characteristics need to be put together. This is what k-means clustering is all about. Now, let us get started and understand K-Means clustering and logistic regression in detail.
Below topics are explained in this Machine Learning tutorial part -2 :
1. Clustering
- What is clustering?
- K-Means clustering
- Flowchart to understand K-Means clustering
- Demo - Clustering of cars based on brands
2. Logistic regression
- What is logistic regression?
- Logistic regression curve & Sigmoid function
- Demo - Classify a tumor as malignant or benign based on features
We recommend this Machine Learning training course for the following professionals in particular:
1. Developers aspiring to be a data scientist or Machine Learning engineer
2. Information architects who want to gain expertise in Machine Learning algorithms
3. Analytics professionals who want to work in Machine Learning or artificial intelligence
4. Graduates looking to build a career in data science and Machine Learning
Learn more at: https://www.simplilearn.com/
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf (Kamal Acharya)
The College Bus Management System is developed entirely in Visual Basic .NET, connected to an MS SQL Server database, using a solid combination of front-end and back-end technologies. The application uses a flat user interface, one of the more attractive interface styles in 2017, and gives priority to system functionality. It manages student details, driver details, bus details, bus routes, bus fees, and more. The application has a single admin unit: the admin manages the entire application and logs in with an admin username and password. It is developed for both large and small colleges and is user-friendly enough that a non-technical person can learn to manage it within hours, while remaining secured by the admin. The system generates reports that the admin can view and download, delivered in Excel format, which makes the income and expenses of the college bus easy to understand. The application is developed mainly for Windows users (in 2017, 73% of enterprises used the Windows operating system), installs easily on any Windows system, and has a very small footprint, so the user can allocate very little local disk space for it.
Event Management System Vb Net Project Report.pdf (Kamal Acharya)
In the present era, the scope of information technology is growing very fast; we do not see any area untouched by this industry. Its scope has widened to include business and industry, household business, communication, education, entertainment, science, medicine, engineering, distance learning, weather forecasting, career searching, and so on.
My project, named "Event Management System", is software that stores and maintains all events coordinated in the college. It also helps print related reports. It records the events coordinated by faculty members, with their name, event subject, date, and details, in an efficient and effective way.
The system lets a user record all events coordinated by a particular faculty member. Compared to the existing system, the proposed system adds further features, such as security.
Explore the innovative world of trenchless pipe repair with our comprehensive guide, "The Benefits and Techniques of Trenchless Pipe Repair." This document delves into the modern methods of repairing underground pipes without the need for extensive excavation, highlighting the numerous advantages and the latest techniques used in the industry.
Learn about the cost savings, reduced environmental impact, and minimal disruption associated with trenchless technology. Discover detailed explanations of popular techniques such as pipe bursting, cured-in-place pipe (CIPP) lining, and directional drilling. Understand how these methods can be applied to various types of infrastructure, from residential plumbing to large-scale municipal systems.
Ideal for homeowners, contractors, engineers, and anyone interested in modern plumbing solutions, this guide provides valuable insights into why trenchless pipe repair is becoming the preferred choice for pipe rehabilitation. Stay informed about the latest advancements and best practices in the field.
Immunizing Image Classifiers Against Localized Adversary Attacks (gerogepatton)
This paper addresses the vulnerability of deep learning models, particularly convolutional neural networks (CNNs), to adversarial attacks and presents a proactive training technique designed to counter them. We introduce a novel volumization algorithm, which transforms 2D images into 3D volumetric representations. When combined with 3D convolution and deep curriculum learning optimization (CLO), it significantly improves the immunity of models against localized universal attacks by up to 40%. We evaluate our proposed approach using contemporary CNN architectures and the modified Canadian Institute for Advanced Research (CIFAR-10 and CIFAR-100) and ImageNet Large Scale Visual Recognition Challenge (ILSVRC12) datasets, showcasing accuracy improvements over previous techniques. The results indicate that the combination of the volumetric input and curriculum learning holds significant promise for mitigating adversarial attacks without necessitating adversary training.
Saudi Arabia stands as a titan in the global energy landscape, renowned for its abundant oil and gas resources. It's the largest exporter of petroleum and holds some of the world's most significant reserves. Let's delve into the top 10 oil and gas projects shaping Saudi Arabia's energy future in 2024.
TECHNICAL TRAINING MANUAL GENERAL FAMILIARIZATION COURSE (DuvanRamosGarzon1)
AIRCRAFT GENERAL
The Single Aisle is the most advanced aircraft family in service today, with fly-by-wire flight controls.
The A318, A319, A320 and A321 are twin-engine subsonic medium-range aircraft.
The family offers a choice of engines.
5. What is Supervised Learning?
[Diagram: image data (known) is used to Fit a Model; the Trained Model then Predicts on image data (unknown) to classify the image or identify whether an object is in the image.]
6. Classification: Answers are Categories
[Diagram: Fit a Model on labeled photos; the Trained Model then Predicts whether there is a person in a new photo (yes | no).]
7. Classification: Answers are Categories
[Diagram: Fit a Model on labeled photos; the Trained Model then Predicts what time of day it is (Morning, Afternoon, Evening).]
9. Classification: Answers are Categories
[Diagram: labeled images of digits are used to Fit a Model; the Trained Model then Predicts the Label of an unknown, unlabeled image.]
10. Learning Terminology
• Example: each data point (one row).
• Target: the predicted property (the column to predict).
• Feature: a property of the point used for prediction (the non-target columns in the model).
• Label: the target/category of the point (the value of the target column).
[Table: rows Example 1 and Example 2; columns Feature 1, Feature 2, and Target, whose values are Label 1 and Label 2.]
11. Let's Revisit Predicting a Person
[Diagram: Fit a Model; the Model then Predicts whether there is a person in this photo (yes | no).]
12. Learning Terminology
• Examples: 1, 2, 3
• Targets: Yes, Yes, No
• Features: skin-like color; people-like shapes
• Labels: Yes, No
[Table: columns Skin-Like Color, People-Like Shapes, Target; Example 1 = Yes, Example 2 = Yes, Example 3 = No.]
13. Features and Labels
In this case we have:
Two features
• Skin-like color
• People-like shapes
Two labels
• yes
• no
[Plot: # Rectangular Shapes Detected (x-axis, 0-20) vs. # Pixels in Skin Hue (y-axis), with Examples 1-3 plotted and labeled yes or no.]
14. Features and Labels
In this case we have two features (skin-like color, people-like shapes) and two labels (Yes, No).
[Plot: # Rectangular Shapes Detected, 0-20.]
15. Features and Labels
Same two features (skin-like color, people-like shapes) and two labels (Yes, No). A new example arrives with known feature values; predict yes | no.
[Plot: # Rectangular Shapes Detected, 0-20, with the new example marked.]
16. Features and Labels
[Plot: # Rectangular Shapes Detected (x-axis, 0-20) vs. # Pixels in Skin Hue (y-axis, 20-60).]
17. Features and Labels
In this case we have two features (skin-like color, people-like shapes) and two labels (Yes, No).
[Plot: # Rectangular Shapes Detected (x-axis, 0-20) vs. # Pixels in Skin Hue (y-axis, 20-60).]
18. Features and Labels
Same two features and two labels. A new example arrives with known skin color and shape counts; predict yes | no.
[Plot: # Rectangular Shapes Detected (x-axis, 0-20) vs. # Pixels in Skin Hue (y-axis), with the new example marked.]
19. Let's Predict City and Time of Day
[Diagram: Fit a Model; the Model then Predicts: Is this city Pittsburgh? What time of day is it (Morning, Afternoon, Evening)?]
20. Features and Labels
In this case we have:
Two features
• Illumination
• # corners detected against background
Three labels
• Morning
• Afternoon
• Evening
[Plot: Degree of Illumination vs. Corners.]
21. Features and Labels
Same two features (illumination; # corners detected against background) and three labels (Morning, Afternoon, Evening). A new example arrives with known illumination and corner counts; predict morning | afternoon | evening.
[Plot: Degree of Illumination vs. Corners, with the new example marked.]
23. Evaluation Techniques
Last week we looked at evaluation metrics; now we'll look at effectively estimating these for a learner:
• Train/test split
• Cross-validation
• LOO (leave one out)
25. Training and Test Sets
Training set (actual)
• Fit the model
Test set (predicted)
• Measure performance: predict y with the model, compare with the actual y, and measure the error.
26. Training and Test Sets
From the simple workflow in the notebook (note that accuracy scoring lives in sklearn.metrics, not sklearn.model_selection):

import sklearn.model_selection as skms
from sklearn import metrics, neighbors

(data_train, data_tst,
 tgts_train, tgts_tst) = skms.train_test_split(data, tgts, test_size=0.2)

# create and fit model
knn_classifier = neighbors.KNeighborsClassifier(n_neighbors=3)
knn_classifier.fit(data_train, tgts_train)

predicted = knn_classifier.predict(data_tst)
accuracy = metrics.accuracy_score(tgts_tst, predicted)
27. Cross-Validation
Folds do not overlap.
Each fold is the same size.
Every example is in a fold.
[Table: 24 images (Feature 1-5, Target) split into Fold 1, Fold 2, and Fold 3 of eight rows each.]
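The three-fold scheme above can be run directly with scikit-learn's cross_val_score. As a hedged sketch, the iris dataset and a KNN classifier stand in for the slide's image features and model:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

data, tgts = load_iris(return_X_y=True)

# Three folds: non-overlapping, equal-sized, and every example lands in exactly one.
folds = KFold(n_splits=3, shuffle=True, random_state=0)
scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), data, tgts, cv=folds)

print(scores)         # one accuracy per held-out fold
print(scores.mean())  # average generalization estimate
```

Each fold takes one turn as the test set while the other two train the model, so we get three accuracy estimates instead of one.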
29. Leave One Out Cross-Validation
Removes one datapoint from the dataset at a time.
Train on the remaining N-1 datapoints
• N = number of datapoints
• In this example, N = 24
This is repeated for each datapoint in the set.
You can also think of this as N-fold cross-validation
• That is, 24 datapoints is 24 folds
[Table: 24 images (Feature 1-5, Target), with one row held out per pass.]
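LeaveOneOut plugs into the same cross_val_score machinery. This sketch again uses iris (150 points, hence 150 folds) rather than the slide's 24 images:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

data, tgts = load_iris(return_X_y=True)

# N-fold cross-validation with N = number of datapoints:
# each pass trains on N-1 points and tests on the single held-out point.
scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), data, tgts, cv=LeaveOneOut())

print(len(scores))    # 150 datapoints -> 150 folds
print(scores.mean())
```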
31. Learning Methods
Supervised learning techniques
• K nearest neighbor (KNN)
• Support vector machines (SVM)
Unsupervised learning techniques
• Principal components analysis (PCA)
• Clustering
32. K Nearest Neighbor (KNN)
KNN
• Memorize the dataset; for prediction, find the example most like me.
• Without preprocessing, fit is instant, but prediction takes more work.
• With preprocessing, spend some up-front cost to organize the data; then prediction is (relatively) faster.
• Requires significant memory because it saves some form of the entire dataset.
33. K Nearest Neighbor (KNN)
K = # nearest neighbors used for the predictor
[Plot: a feature axis (0-20) showing the node to predict and its K = 3 nearest neighbors, numbered 1, 2, 3.]
34. KNN
K = 3: predict the node through its nearest neighbors.
[Plot: # Rectangular Shapes Detected (x-axis, 0-20) vs. # Pixels in Skin Hue (y-axis, 20-60), with the node to predict marked.]
35. KNN
Use KNN to make a decision boundary.
If the node falls to the right
• Decision = male (red)
If the node falls to the left
• Decision = female (blue)
Note the false positives and false negatives.
[Plot: # Rectangular Shapes Detected vs. # Pixels in Skin Hue, with the decision boundary and the node to predict.]
36. KNN
Multiclass decision boundaries break the data into three or more subsets.
Scaling is critical for determining the best decision boundary.
• If data is scaled too closely together in one dimension, examples may be too similar distance-wise compared to actual class similarity.
The number of nearest neighbors used for analysis is also critical.
• Too small a K will give too wiggly a border (an example of overfitting, or following noise).
• Too big a K will give too flat a border (an example of underfitting, or ignoring signal).
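The scaling point above is easy to demonstrate. This sketch (using the breast cancer dataset as a stand-in for the slide's data) scales the features before KNN and tries several values of K:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Scaling first keeps any one large-valued feature from dominating the distances.
scores = {}
for k in (1, 5, 25):
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores[k] = cross_val_score(model, X, y, cv=5).mean()
    print(k, round(scores[k], 3))  # very small K overfits; very large K underfits
```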
37. Support Vector Machines
Used for discriminative classification: divides classes of data into groups by drawing a border between them.
• Unlike KNN, it goes straight to the decision boundary.
• It remembers the boundary and throws out the data.
SVM is a maximum-margin estimator
• It picks the border between classes that maximizes the distance to the examples from each class.
39. SVM: Linear Boundaries
So, let's use a margin around the linear boundary.
• The maximum-margin line is SVM's preferred line.
The examples on the margin lines are support vectors.
Now, the only important examples to keep track of are the support vectors.
For new data, see which side of the line it falls on.
[Plot: the margin around the boundary, with the support vectors on the margin lines.]
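A minimal sketch of the maximum-margin idea, on synthetic blobs rather than the deck's data: after fitting, only the support vectors are retained.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two clearly separated blobs, so a linear maximum-margin boundary exists.
X, y = make_blobs(n_samples=100, centers=[[-3, -3], [3, 3]],
                  cluster_std=1.0, random_state=0)

clf = SVC(kernel="linear", C=10.0).fit(X, y)

# Only the examples on (or inside) the margin are kept as support vectors;
# prediction just asks which side of the line a new point falls on.
print(clf.support_vectors_.shape)
print(clf.score(X, y))
```

Note how few support vectors remain compared to the 100 training points: the rest of the data can be thrown away.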
40. SVM: Kernel
How would you separate this data? Not with a line. Need to get fancy.
Use linear algebra tricks to rewrite this data in more complicated terms:
• Polynomials of the inputs
• Distances from one input to another
• These are called kernels
41. Non-Linear Data
You must pick the right projection to separate the classes.
Kernel transformation computes the basis function for every example.
• This can be too cumbersome for large datasets.
• Instead, we can use the kernel trick: implicitly fit kernel-transformed examples without building the explicit (expensive) table.
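The kernel trick can be illustrated on concentric circles (synthetic data, not from the deck): no line separates the classes, but an RBF kernel implicitly maps the points into a space where a flat boundary works.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric rings: no straight line separates the two classes.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)  # kernel trick: no explicit transformed table

print(linear.score(X, y))  # a line does poorly here
print(rbf.score(X, y))     # the kernelized boundary separates the rings
```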
44. Principal Components Analysis (PCA)
What are we trying to achieve?
• Improve clustering
• Improve classification
• Deal with sparse features
• Visualize high-dimensional data in 2D or 3D
• Compress data with minimal loss
45. PCA: 2D to 1D
You can weight your existing dimensions (2D) to give you 1D data.
• Treat the data as a point cloud
• Fit an ellipse
• Find its perpendicular axes: the directions of maximum variation
• Describe the data in terms of these directions, orienting the data in a new way
46. PCA: 2D to 1D
You can weight your existing dimensions (2D) to give you 1D data.
51. PCA: 3D to 2D
Math speak:
• The new axes are eigenvectors
• The new axes are calculated from the covariance matrix of the features via a singular value decomposition
53. Eigenfaces: A PCA Example
PCA data: a new reduced-dimensional plot.
• We removed some number of features from our representation
• We redescribe the original faces in terms of these primitive faces that act like axes
• This is similar to redescribing images in terms of circles (which we saw with Fourier transforms)
54. PCA
When choosing dimensions for your feature extraction:
• Think about combining features to reduce dimensionality. That is, instead of analyzing pixel intensity at each color (RGB), can you analyze pixel intensity (grayscale)?
• You will need to perform hypothesis-driven trials and measure your model’s performance.
55. Performing PCA Using scikit-learn*
from sklearn.decomposition import PCA

reducer = PCA(n_components=20)
reduced_X = reducer.fit_transform(X)

# “model” could be any trained model
model.fit(reduced_X, Y)
56. Performing PCA Using scikit-learn*
When you need to predict:
reduced_X_new = reducer.transform(X_new)
model.predict(reduced_X_new)
# “model” is the same model from the prior slide; could be any model.
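Putting the two slides together, here is a runnable end-to-end sketch; the digits dataset and logistic regression are stand-ins for X, Y, and "model". Note that prediction reuses the fitted reducer with transform, never fit_transform:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, Y = load_digits(return_X_y=True)  # 64 pixel features per image
X_train, X_new, Y_train, Y_new = train_test_split(X, Y, random_state=0)

reducer = PCA(n_components=20)
reduced_X = reducer.fit_transform(X_train)  # fit the new axes on training data only

model = LogisticRegression(max_iter=2000).fit(reduced_X, Y_train)

# Prediction time: transform new data with the SAME fitted reducer.
reduced_X_new = reducer.transform(X_new)
print(model.score(reduced_X_new, Y_new))
```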
65. K-Means Algorithms
Each point belongs to the new cluster's mean. The points don't change anymore: they are converged!
[Plot: # Rectangular Shapes Detected vs. # Pixels in Skin Hue.]
66. K-Means Algorithms
K = 3 (find three clusters). Randomly assign three cluster centers.
[Plot: # Rectangular Shapes Detected vs. # Pixels in Skin Hue.]
69. K-Means Algorithms
Each point belongs to the new cluster's mean. The points don't change anymore: they are converged!
[Plot: # Rectangular Shapes Detected vs. # Pixels in Skin Hue.]
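The assign/update loop above is what scikit-learn's KMeans runs. A hedged sketch on synthetic blobs (K = 3, as on the slides):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Three blobs; K-means alternates "assign each point to the nearest mean"
# and "recompute each cluster mean" until the assignments stop changing.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(km.cluster_centers_.shape)  # three 2D cluster centers
print(km.n_iter_)                 # iterations until convergence
```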
70. Other scikit-learn* Tools
GridSearch
• Specify a space of parameters and evaluate model(s) at those params.
Pipelines
• Create a build/fit/predict-able sequence of steps.
• Fitting will fit each of the components in turn.
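The two tools combine naturally. This sketch grid-searches K for a scale-then-KNN pipeline (iris again standing in for the deck's data):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Pipeline: fitting runs each step in turn (scale, then fit the classifier).
pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])

# GridSearch: specify a parameter space; every combination is cross-validated.
params = {"knn__n_neighbors": [1, 3, 5, 7]}
search = GridSearchCV(pipe, params, cv=5).fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

The `step__parameter` naming (here `knn__n_neighbors`) is how grid-search parameters reach a step inside a pipeline.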
73. BOW Process: Step 1
Take a known image and find descriptors of keypoints.
Make a table with all the information about the keypoints.
• The number of columns is fixed between images: columns are how we describe a keypoint.
• The number of rows varies between images: rows represent the number of keypoints.
74. BOW Process: Step 1
Take a known image and find descriptors of keypoints.
• Creating local words
[Table: Description of Keypoint (columns; fixed #) by # of Keypoints (rows; varies).]
75. BOW Process: Step 1
Repeat for each image in your dataset.
[Table: Description of Keypoint (columns; fixed #) by # of Keypoints (rows; varies).]
76. BOW Process: Step 2
Combine ALL the individual descriptors.
• These are the local vocabularies.
• Put the rows in space and group clusters of data, so that similar descriptors are grouped together.
• The # of clusters is the number of global words to be used for further analysis (current global vocabulary: 1, 2, 3, 4, 5, 6, 7).
77. BOW Process: Step 2
Convert all local words to the same number of global words.
[Figure: a table of keypoint descriptions (local) mapped to global words 7, 6, 5, 3, 6, 1. Each global word represents ONE and only ONE row, but it represents the ENTIRE row.]
78. BOW Process: Step 3
Redescribe the known images in terms of the global vocabulary.
Generate histograms with counts of global words; empty cells are zero (and likewise for the other colors).
[Table: Images 1-6 (Airplane, Airplane, Zebra, Zebra, Piano, Piano) by global words g1-g7, with counts such as 1, 1, 2.]
79. BOW Process: Step 4
Build an SVM model predicting the object class from the histograms (represented as rows).
[Figure: six histograms labeled Airplane, Airplane, Zebra, Zebra, Piano, Piano.]
80. BOW Process: Step 5
With a new example:
• Describe it with the regional vocabulary
• Convert to the global vocabulary
Use the regional word most similar to the global word
• Create the histogram representation
• Feed it to the SVM to make a prediction about the class
Imagine the brown zebra example without a known label.
81. BOW Details
Local vocabularies come from feature descriptors
• SIFT, ORB, and so on
The global vocabulary comes from clustering the local vocabularies (k-means clustering)
• Convert regional to global by looking up the cluster (global term) for a local descriptor
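Steps 1-3 can be sketched end to end. Random arrays stand in for SIFT/ORB descriptors, and the numbers (six images, 8-dimensional descriptors, seven global words) are illustrative only:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Step 1 stand-in: each "image" yields a varying number of keypoint
# descriptors (rows) with a fixed descriptor length (columns).
local_descriptors = [rng.normal(size=(int(rng.integers(20, 40)), 8)) for _ in range(6)]

# Step 2: cluster ALL descriptors together; each cluster center is a global word.
vocab = KMeans(n_clusters=7, n_init=10, random_state=0)
vocab.fit(np.vstack(local_descriptors))

# Step 3: redescribe each image as a fixed-length histogram of global-word counts.
histograms = []
for desc in local_descriptors:
    words = vocab.predict(desc)  # nearest global word for each keypoint
    histograms.append(np.bincount(words, minlength=7))

X = np.array(histograms)  # one row per image, ready to feed an SVM (Step 4)
print(X.shape)
```

The payoff is that images with different numbers of keypoints all end up as rows of the same length, which is exactly what the SVM in Step 4 needs.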
Note: These are conceptual examples and not actual examples.
Images are all from Wikimedia.
[see: model_selection_ii.pptx]
Note: Red and blue represent two classes here; not necessarily “yes” and “no”.
Note: the decision boundary is a hypothetical scenario where we have test examples from *everywhere* in the input space. The boundary is the result of classifying each of those input examples.
In some high-D space, that ellipse is “just” a plane-like thing (flat boundary).
This is a “bad” line because the sum of (normal/perpendicular) distances from points to line is very big!
The “right” PCA line will minimize the orthogonal (normal/perpendicular) distances from the examples to their mapping on the line.
Note: Red points represent data that is fit to the new 2D structure.
Examples are in notebook.
https://kushalvyas.github.io/BOV.html
Image url?
Emphasize that browns are all airplanes (different tone is different specific image). Color is class. Tone is instance of class.
- Image url?
Global vocabulary is (1) concise (only small N # of terms), and (2) comparable between different images
[we still have to *use* that vocabulary to describe our images somehow [convert many-row description to single row description].