CSCI101
Algorithms III
(Searching, Clustering, Classification)
Overview
•Searching Algorithms
•Linear Search
•Binary Search
•Clustering Algorithms
•K-Means
•Classification Algorithms
•K-NN
Searching
● The process used to find the location of a target among a list of
objects
● Searching an array finds the index of the first element that contains the target value
Linear Search (Sequential)
● Uses a loop to step through the array, starting with the first element
● Compares each element with the value being searched for (the key), and stops either when the value is found or when the end of the array is reached (the element is not found)
● Since the array elements are stored in linear order, searching them in that order is straightforward to implement
Linear Search (Sequential)
● Advantages:
– Simple: easy to understand and implement
– Doesn't require the data in the array to be sorted
● Disadvantages:
– Poor efficiency: it can take many comparisons to find a key in a large array
– The performance of the algorithm scales linearly with the size of the input array
Linear Search (Example)
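The worked example on the original slide is a figure; as a stand-in, here is a minimal Python sketch of the steps described above (the function name, array values and key are purely illustrative):

```python
def linear_search(arr, key):
    """Return the index of the first element equal to key, or -1 if not found."""
    for i in range(len(arr)):    # step through the array from the first element
        if arr[i] == key:        # compare each element with the key
            return i             # stop as soon as the key is found
    return -1                    # reached the end of the array: key not present

numbers = [7, 3, 9, 14, 3, 25]
print(linear_search(numbers, 14))   # prints 3
print(linear_search(numbers, 8))    # prints -1
```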
Binary Search
● A searching algorithm that requires the array to be sorted
● Algorithm
1. The initial search region is the whole array.
2. Look at the data value in the middle of the search region.
3. If you’ve found your target, stop.
4. If your target is less than the middle data value, the new search region is the lower half of the current region.
5. If your target is greater than the middle data value, the new search region is the upper half of the current region.
6. Continue from Step 2 (if the search region becomes empty, the target is not in the array).
Binary Search (Example)
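The original example slide is a figure; a minimal Python sketch of the six steps above, assuming the array is already sorted in ascending order (the names and values are illustrative):

```python
def binary_search(sorted_arr, target):
    """Return the index of target in sorted_arr, or -1 if it is not present."""
    low, high = 0, len(sorted_arr) - 1           # step 1: the whole array
    while low <= high:                           # a non-empty search region remains
        mid = (low + high) // 2
        if sorted_arr[mid] == target:            # step 3: found the target, stop
            return mid
        elif target < sorted_arr[mid]:           # step 4: keep the lower half
            high = mid - 1
        else:                                    # step 5: keep the upper half
            low = mid + 1
    return -1                                    # search region is empty: not found

values = [2, 5, 8, 12, 16, 23, 38, 56, 72, 91]
print(binary_search(values, 23))   # prints 5
print(binary_search(values, 40))   # prints -1
```

Each pass halves the search region, so the number of comparisons grows with the logarithm of the array size rather than linearly.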
Clustering
● Clustering is concerned with grouping together
objects that are similar to each other and
dissimilar to the objects belonging to other
clusters.
● Examples:
– In a medical application we might wish to find clusters of patients
with similar symptoms.
– In a document retrieval application we might wish to find clusters
of documents with related content.
– In an economics application we might be interested in finding
countries whose economies are similar.
Clustering Example
K-Means Clustering
● k-means clustering is an exclusive clustering
algorithm. Each object is assigned to precisely
one of a set of clusters. (There are other methods
that allow objects to be in more than one
cluster.)
● For this method of clustering we start by deciding
how many clusters k we would like to form from
our data.
● The value of k is generally a small integer, such as
2, 3, 4 or 5, but may be larger.
The k-Means Clustering Algorithm
1. Choose a value of k.
2. Select k objects in an arbitrary fashion. Use
these as the initial set of k centroids.
3. Assign each object to the cluster whose centroid is nearest to it.
4. Recalculate the centroids of the k clusters.
5. Repeat steps 3 and 4 until the centroids no
longer move.
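A minimal pure-Python sketch of these five steps for two-dimensional points, using Euclidean distance as the closeness measure; the data points and names below are illustrative and are not the example from the following slides:

```python
import random

def euclidean(p, q):
    """Euclidean distance between two 2-D points."""
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def k_means(points, k, seed=0):
    random.seed(seed)
    centroids = random.sample(points, k)            # steps 1-2: arbitrary initial centroids
    while True:
        # Step 3: assign each point to the cluster with the nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 4: recalculate each centroid as the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append((sum(p[0] for p in cluster) / len(cluster),
                                      sum(p[1] for p in cluster) / len(cluster)))
            else:
                new_centroids.append(centroids[i])  # keep the old centroid for an empty cluster
        # Step 5: stop when the centroids no longer move.
        if new_centroids == centroids:
            return clusters, centroids
        centroids = new_centroids

data = [(1, 1), (1.5, 2), (3, 4), (5, 7), (3.5, 5), (4.5, 5), (3.5, 4.5)]
clusters, centroids = k_means(data, k=2)
print(centroids)
```

In practice a library implementation such as scikit-learn's KMeans would normally be used; the sketch above is only meant to mirror the five steps.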
Example (k=3)
Initial Clusters
Revised Clusters
Third Set of Clusters
These are the same clusters as before. Their centroids will be the
same as those from which the clusters were generated. Hence the
termination condition of the k-means algorithm has been met and
these are the final clusters produced by the algorithm for the initial
choice of centroids made.
Other points to consider
● The initial selection of centroids affects the k-means results
● Outliers should be removed first
● Normalize the data so that no single attribute dominates the distance (a small sketch follows below)
● Euclidean distance does not make sense in some cases, so select the proper closeness measure.
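As an illustration of the normalization point above, here is a small sketch of min-max scaling, which maps every attribute into [0, 1] so that no single attribute dominates the distance calculation (the helper name and sample values are made up):

```python
def min_max_normalize(column):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:                            # constant attribute: map everything to 0
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

ages    = [25, 40, 31, 58]                  # small numeric range
incomes = [30000, 90000, 45000, 120000]     # large range: would dominate raw Euclidean distances
print(min_max_normalize(ages))
print(min_max_normalize(incomes))
```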
Classification
● Classification is dividing up objects so that each is
assigned to one of a number of mutually
exhaustive and exclusive categories known as
classes.
● Examples:
– customers who are likely to buy or not buy a particular
product in a supermarket
– people who are at high, medium or low risk of acquiring a
certain illness
– people who closely resemble, slightly resemble or do not
resemble someone seen committing a crime
Classification Example
Training Set:

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Test Set (class to be predicted):

Refund  Marital Status  Taxable Income  Cheat
No      Single          75K             ?
Yes     Married         50K             ?
No      Married         150K            ?
Yes     Divorced        90K             ?
No      Single          40K             ?
No      Married         80K             ?

A classifier is learned from the Training Set to produce a Model, which is then applied to the Test Set to predict the unknown Cheat values.
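A minimal sketch of this learn-then-predict workflow, assuming pandas and scikit-learn are available; the slides do not say which classifier is learned, so a decision tree is used here purely as a placeholder:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

train = pd.DataFrame({
    "Refund":        ["Yes", "No", "No", "Yes", "No", "No", "Yes", "No", "No", "No"],
    "MaritalStatus": ["Single", "Married", "Single", "Married", "Divorced",
                      "Married", "Divorced", "Single", "Married", "Single"],
    "TaxableIncome": [125, 100, 70, 120, 95, 60, 220, 85, 75, 90],   # in thousands
    "Cheat":         ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"],
})
test = pd.DataFrame({
    "Refund":        ["No", "Yes", "No", "Yes", "No", "No"],
    "MaritalStatus": ["Single", "Married", "Married", "Divorced", "Single", "Married"],
    "TaxableIncome": [75, 50, 150, 90, 40, 80],
})

# One-hot encode the categorical attributes so the model sees numeric input.
X_train = pd.get_dummies(train.drop(columns="Cheat"))
y_train = train["Cheat"]
X_test  = pd.get_dummies(test).reindex(columns=X_train.columns, fill_value=0)

model = DecisionTreeClassifier().fit(X_train, y_train)   # "Learn Classifier" -> "Model"
print(model.predict(X_test))                             # predicted Cheat values for the Test Set
```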
K-Nearest Neighbor (K-NN) Classifier
The algorithm can be summarized as follows (a sketch in code follows the list):
● A positive integer k is specified, along with a new sample (e.g. k = 1, 3, 5)
● We select the k entries in our training data set which are closest to the new sample
● We find the most common classification among these k entries
● This is the classification we give to the new sample
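A minimal pure-Python sketch of these steps, using Euclidean distance and simple majority voting; the toy training points below are made up for illustration and are not the data set plotted on the following slide:

```python
from collections import Counter

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def knn_classify(training_data, new_sample, k):
    """training_data is a list of (point, label) pairs."""
    # Select the k training entries closest to the new sample.
    neighbours = sorted(training_data,
                        key=lambda item: euclidean(item[0], new_sample))[:k]
    # Return the most common classification among those k entries.
    labels = [label for _, label in neighbours]
    return Counter(labels).most_common(1)[0][0]

# Toy two-attribute, two-class training set (illustrative values only).
train = [((1.0, 1.5), '-'), ((2.0, 2.0), '-'), ((2.5, 1.0), '-'),
         ((8.5, 10.0), '+'), ((9.0, 11.5), '+'), ((10.0, 10.5), '+')]
print(knn_classify(train, (9.1, 11.0), k=5))   # '+' by majority vote (3 of the 5 neighbours)
```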
Training Data Set
● Two classes
● Two attributes
● How do we classify the point (9.1, 11)?
5-NN Classifier
● The five nearest neighbours are labelled with three + signs and two − signs
● So a basic 5-NN classifier would classify the unseen instance as ‘positive’ by a form of majority voting.
Effect of K
Small values of k make the classifier follow the training data very closely (a jagged, easily overfit decision boundary), while larger values of k give smoother boundaries; see:
https://medium.com/@adi.bronshtein/a-quick-introduction-to-k-nearest-neighbors-algorithm-62214cea29c7
