The document discusses different types of searching algorithms. It describes sequential search which searches an unsorted list sequentially until the target item is found or the entire list is searched. The average runtime is O(n) as the item could be anywhere in the list. Binary search is described as more efficient for sorted lists, repeatedly dividing the search space in half and comparing the target to the middle element. Indexes are also summarized, including clustered vs unclustered indexes and different approaches to storing data entries.
2. Searching Algorithms
• Necessary components to search a list of data
– Array containing the list
– Length of the list
– Item for which you are searching
• After search completed
– If item found, report “success,” return location in array
– If item not found, report “not found” or “failure”
Trupti Agrawal 2
3. Searching Algorithms (Cont’d)
• Suppose that you want to determine whether 27 is in the list
• First compare 27 with list[0]; that is, compare 27 with 35
• Because list[0] ≠ 27, you then compare 27 with list[1]
• Because list[1] ≠ 27, you compare 27 with the next element in
the list
• Because list[2] = 27, the search stops
• This search is successful!
Figure 1: Array list with seven (7) elements
4. Searching Algorithms (Cont’d)
Let’s now search for 10
The search starts at the first element in the list; that is, at
list[0]
Proceeding as before, we see that this time the search item,
which is 10, is compared with every item in the list
Eventually, no more data is left in the list to compare with
the search item; this is an unsuccessful search
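The two walkthroughs above (searching for 27, then for 10) can be reproduced with a small sketch. Only list[0] = 35 and list[2] = 27 are given in the slides; the remaining five values and the class name SequentialSearchTrace are assumptions made for illustration.

```java
class SequentialSearchTrace {
    // Returns the index of key in list[0..listLength-1], or -1 if it is absent.
    static int linSearch(int[] list, int listLength, int key) {
        for (int loc = 0; loc < listLength; loc++) {
            if (list[loc] == key) {
                return loc;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical contents: only list[0] = 35 and list[2] = 27 come from the slides.
        int[] list = {35, 12, 27, 18, 45, 16, 38};
        System.out.println(linSearch(list, list.length, 27)); // successful search: prints 2
        System.out.println(linSearch(list, list.length, 10)); // unsuccessful search: prints -1
    }
}
```

The successful search stops after three comparisons; the unsuccessful one compares 10 with all seven elements before giving up.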
5. Sequential Search Algorithm
A sequential search can be written with a found flag and a break statement:
public static int linSearch(int[] list, int listLength, int key) {
    int loc; // declared outside the loop so it is still visible after it
    boolean found = false;
    for (loc = 0; loc < listLength; loc++) {
        if (list[loc] == key) {
            found = true;
            break;
        }
    }
    if (found)
        return loc;
    else
        return -1;
}
6. Sequential Search Algorithm (Cont’d)
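Returning as soon as the key is found removes the need for the flag:
public static int linSearch(int[] list, int listLength, int key) {
    for (int loc = 0; loc < listLength; loc++) {
        if (list[loc] == key)
            return loc; // key found at position loc
    }
    return -1; // key is not in the list
}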
7. Sequential Search Algorithm (Cont’d)
• Using a while (or a for) loop, the definition of the method
linSearch can also be written without the break statement as:
public static int linSearch(int[] list, int listLength, int key) {
    int loc = 0;
    boolean found = false;
    while (loc < listLength && !found) {
        if (list[loc] == key)
            found = true;
        else
            loc++;
    }
    if (found)
        return loc;
    else
        return -1;
}
8. Performance of the Sequential Search
• Suppose that the first element in the array list contains the
variable key, then we have performed one comparison to find
the key.
• Suppose that the second element in the array list contains the
variable key, then we have performed two comparisons to find
the key.
• Carry on the same analysis till the key is contained in the last
element of the array list. In this case, we have performed N
comparisons (N is the size of the array list) to find the key.
• Finally if the key is NOT in the array list, then we would have
performed N comparisons and the key is NOT found and we
would return -1.
9. Performance of the Sequential Search (Cont’d)
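• Therefore, the best case is: 1
• And, the worst case is: N
• The average case is the total number of comparisons over all cases divided by the number of possible cases:
$$\text{Average number of comparisons} = \frac{1 + 2 + 3 + \dots + N + N}{N + 1}$$
– The first term, 1, is the best case
– The term N that ends the series is the worst case with the key found at the end of the array list
– The final extra N is the worst case in which the key is NOT found
– The denominator N + 1 is the number of possible cases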
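Since 1 + 2 + … + N = N(N + 1)/2, the average above simplifies to N/2 + N/(N + 1), which grows linearly with N. The following sketch checks this numerically by counting comparisons for each of the N + 1 cases. The class and method names are hypothetical, and the array contents (0 to N - 1) are chosen only so that every position holds a distinct key.

```java
class AverageComparisons {
    // Counts the key comparisons a sequential search makes on the whole array.
    static int countComparisons(int[] list, int key) {
        int comparisons = 0;
        for (int loc = 0; loc < list.length; loc++) {
            comparisons++;
            if (list[loc] == key) {
                break;
            }
        }
        return comparisons;
    }

    // Averages over the N + 1 cases: key at each position, plus key absent.
    static double averageComparisons(int n) {
        int[] list = new int[n];
        for (int i = 0; i < n; i++) {
            list[i] = i; // distinct values 0..n-1
        }
        long total = 0;
        for (int key = 0; key < n; key++) {
            total += countComparisons(list, key);
        }
        total += countComparisons(list, -1); // -1 is not in the list: n comparisons
        return (double) total / (n + 1);
    }

    public static void main(String[] args) {
        int n = 7;
        double closedForm = n / 2.0 + (double) n / (n + 1);
        System.out.println(averageComparisons(n) + " vs " + closedForm);
    }
}
```

For N = 7 both values come out to 35/8 = 4.375 comparisons.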
10. Binary Search Algorithm
Can only be performed on a sorted list !!!
Uses divide and conquer technique to search list
11. Binary Search Algorithm (Cont’d)
Search item is compared with middle element of list
If search item < middle element of list, search is restricted
to first half of the list
If search item > middle element of list, search second half
of the list
If search item = middle element, search is complete
12. Binary Search Algorithm (Cont’d)
• Determine whether 75 is in the list
Figure 2: Array list with twelve (12) elements
Figure 3: Search list, list[0] … list[11]
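The figures for this example did not survive extraction, so the search for 75 can be followed with a sketch that prints every probe. The twelve sorted values below are an assumption (the slides only state that the list has twelve elements), and the class name BinarySearchTrace is hypothetical.

```java
class BinarySearchTrace {
    // Iterative binary search that prints first, last and mid at every step.
    static int binarySearch(int[] list, int listLength, int key) {
        int first = 0, last = listLength - 1;
        while (first <= last) {
            int mid = first + (last - first) / 2;
            System.out.println("first=" + first + " last=" + last
                    + " mid=" + mid + " list[mid]=" + list[mid]);
            if (list[mid] == key) {
                return mid;
            } else if (list[mid] > key) {
                last = mid - 1;  // continue in the first half
            } else {
                first = mid + 1; // continue in the second half
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical sorted list with twelve elements.
        int[] list = {4, 8, 19, 25, 34, 39, 45, 48, 66, 75, 89, 95};
        System.out.println("result=" + binarySearch(list, list.length, 75));
    }
}
```

With these assumed values the probes land on indexes 5, 8, 10 and 9, so 75 is found after four comparisons against the middle element instead of the ten a sequential search would need.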
14. Binary Search Algorithm (Cont’d)
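public static int binarySearch(int[] list, int listLength, int key) {
    int first = 0, last = listLength - 1;
    int mid = 0; // initialized so that the return below compiles
    boolean found = false;
    while (first <= last && !found) {
        // same midpoint as (first + last) / 2, but cannot overflow int
        mid = first + (last - first) / 2;
        if (list[mid] == key)
            found = true;
        else if (list[mid] > key)
            last = mid - 1;  // key can only be in the first half
        else
            first = mid + 1; // key can only be in the second half
    }
    if (found)
        return mid;
    else
        return -1;
} //end binarySearch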
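For comparison, the Java standard library offers java.util.Arrays.binarySearch, which also requires a sorted array. It returns the index when the key is present and a negative value when it is not, so the hypothetical wrapper below maps every negative result to -1 to match the convention used on these slides. The sample values are assumptions.

```java
import java.util.Arrays;

class LibraryBinarySearch {
    // Wraps Arrays.binarySearch so that a missing key is reported as -1.
    static int find(int[] sorted, int key) {
        int idx = Arrays.binarySearch(sorted, key);
        return idx >= 0 ? idx : -1;
    }

    public static void main(String[] args) {
        int[] sorted = {4, 8, 19, 25, 34, 39, 45, 48, 66, 75, 89, 95}; // hypothetical data
        System.out.println(find(sorted, 75));
        System.out.println(find(sorted, 22));
    }
}
```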
16. Binary Search Algorithm (Cont’d)
key = 22
Figure 6: Sorted list for binary search
17. Indexed Search
• Indexes: Data structures to organize records to
optimize certain kinds of retrieval operations.
o Speed up searches for a subset of records, based on values in
certain (“search key”) fields
o Updates are much faster than in sorted files.
18. Alternatives for Data Entry k* in Index
Data Entry : Records stored in index file
Given search key value k, provide for efficient retrieval of
all data entries k* with value k.
In a data entry k* , alternatives include that we can store:
alternative 1: Full data record with key value k, or
alternative 2: <k, rid of data record with search key value k>, or
alternative 3: <k, list of rids of data records with search key k>
Choice of above 3 alternative data entries is orthogonal to indexing
technique used to locate data entries.
Example indexing techniques: B+ trees, hash-based structures, etc.
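The three alternatives for a data entry k* can be pictured as three small types. This is only a sketch: the Rid layout (page number plus slot), the Employee record and all class names are assumptions, not something the slides define.

```java
import java.util.Arrays;
import java.util.List;

class DataEntryAlternatives {
    // Hypothetical record id: page number plus slot number within that page.
    static final class Rid {
        final int pageId;
        final int slot;
        Rid(int pageId, int slot) { this.pageId = pageId; this.slot = slot; }
    }

    // Hypothetical data record; "age" plays the role of the search key k.
    static final class Employee {
        final String name;
        final int age;
        Employee(String name, int age) { this.name = name; this.age = age; }
    }

    // Alternative 1: the data entry is the full data record itself.
    static final class Alt1Entry {
        final Employee record;
        Alt1Entry(Employee record) { this.record = record; }
        int key() { return record.age; }
    }

    // Alternative 2: <k, rid>, one entry per matching data record.
    static final class Alt2Entry {
        final int key;
        final Rid rid;
        Alt2Entry(int key, Rid rid) { this.key = key; this.rid = rid; }
    }

    // Alternative 3: <k, list of rids>, one entry per distinct key value.
    static final class Alt3Entry {
        final int key;
        final List<Rid> rids;
        Alt3Entry(int key, List<Rid> rids) { this.key = key; this.rids = rids; }
    }

    public static void main(String[] args) {
        Rid r1 = new Rid(3, 0), r2 = new Rid(7, 4);
        Alt1Entry a1 = new Alt1Entry(new Employee("Ann", 30));
        Alt2Entry a2 = new Alt2Entry(30, r1);
        Alt3Entry a3 = new Alt3Entry(30, Arrays.asList(r1, r2));
        System.out.println(a1.key() + " " + a2.key + " " + a3.rids.size());
    }
}
```

Whichever entry type is chosen, the structure that locates the entries (B+ tree, hash table, and so on) is a separate decision.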
19. Alternatives for Data Entries
Alternative 1: Full data record with key value k
Index structure is file organization for data records (instead
of a Heap file or sorted file).
At most one index on a given collection of data records
can use Alternative 1.
Otherwise, data records are duplicated, leading to
redundant storage and potential inconsistency.
If data records are very large, this implies size of auxiliary
information in index is also large.
20. Alternatives for Data Entries
Alternatives 2 (<k, rid>) and 3 (<k, list-of-rids>):
Data entries typically much smaller than data records.
Comparison:
Both better than Alternative 1 with large data records,
especially if search keys are small.
Alternative 3 more compact than Alternative 2,
but leads to variable sized data entries even if search keys
are of fixed length.
21. Index Classification
Clustered vs. unclustered index :
If order of data records is the same as, or `close to’,
order of data entries, then called clustered index.
22. Index Clustered vs Unclustered
Observation 1:
Alternative 1 implies clustered. True ?
Observation 2:
In practice, clustered also implies Alternative 1 (since
sorted files are rare).
Observation 3:
A file can be clustered on at most one search key.
Observation 4:
Cost of retrieving data records through index varies
greatly based on whether index is clustered or not !!
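Observation 4 can be made concrete with a toy model that counts how many distinct data pages a range retrieval touches. Everything here is an assumption for illustration: 100 records, 10 records per page, keys 0 to 99, and a fixed permutation standing in for the arbitrary record order of an unclustered file.

```java
import java.util.HashSet;
import java.util.Set;

class ClusteringCost {
    static final int RECORDS = 100;  // hypothetical file size
    static final int PAGE_SIZE = 10; // hypothetical records per page

    // Clustered layout: the record with key k sits at position k (file sorted by key).
    static int clusteredPosition(int key) {
        return key;
    }

    // Unclustered layout: keys scattered by a fixed permutation (37 is coprime to 100).
    static int unclusteredPosition(int key) {
        return (key * 37) % RECORDS;
    }

    // Counts the distinct data pages touched when fetching keys lo..hi inclusive.
    static int pagesTouched(int lo, int hi, boolean clustered) {
        Set<Integer> pages = new HashSet<>();
        for (int key = lo; key <= hi; key++) {
            int pos = clustered ? clusteredPosition(key) : unclusteredPosition(key);
            pages.add(pos / PAGE_SIZE);
        }
        return pages.size();
    }

    public static void main(String[] args) {
        System.out.println("clustered:   " + pagesTouched(20, 29, true));
        System.out.println("unclustered: " + pagesTouched(20, 29, false));
    }
}
```

In this model the ten matching records share a single page when the file is clustered but are spread over nine pages when it is not, which is why the retrieval cost differs so much.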
24. Clustered vs. Unclustered Index
[Figure: clustered vs. unclustered index. Index entries direct the search to data entries in the index file; the data entries point to data records in the data file.]
Suppose Alternative (2) is used for data entries.
25. Clustered vs. Unclustered Index
Use Alternative (2) for data entries
Data records are stored in Heap file.
To build clustered index, first sort the Heap file
Overflow pages may be needed for inserts.
Thus, the order of data records is close to (not identical to)
the sort order.
[Figure: clustered vs. unclustered index. Index entries direct the search to data entries in the index file; the data entries point to data records in the data file.]
26. Summary of Index Search
Many alternative file organizations exist, each appropriate in
some situation.
If selection queries are frequent, sorting the file or building an
index is important.
Hash-based indexes only good for equality search.
Sorted files and tree-based indexes best for range search; also
good for equality search.
Files rarely kept sorted in practice; B+ tree index is better.
Index is a collection of data entries plus a way to quickly find
entries with given key values.
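The contrast between hash-based and tree-based indexes in this summary can be sketched with two in-memory maps. Note the assumptions: java.util.HashMap and java.util.TreeMap are only stand-ins (a TreeMap is a red-black tree, not a B+ tree), the keys are made up, and each rid is shown as a plain string.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class IndexKindsSketch {
    // Hypothetical search key values; each maps to a <key, rid> data entry.
    static final int[] KEYS = {22, 25, 30, 41, 57};

    // Stand-in for a hash-based index: supports equality search only.
    static Map<Integer, String> buildHashIndex() {
        Map<Integer, String> index = new HashMap<>();
        for (int k : KEYS) {
            index.put(k, "rid-" + k);
        }
        return index;
    }

    // Stand-in for a tree-based index: keeps keys ordered, so ranges are cheap.
    static NavigableMap<Integer, String> buildTreeIndex() {
        NavigableMap<Integer, String> index = new TreeMap<>();
        for (int k : KEYS) {
            index.put(k, "rid-" + k);
        }
        return index;
    }

    public static void main(String[] args) {
        // Equality search: both structures answer this directly.
        System.out.println(buildHashIndex().get(30));
        System.out.println(buildTreeIndex().get(30));
        // Range search 25 <= key <= 41: only the ordered structure offers it.
        System.out.println(buildTreeIndex().subMap(25, true, 41, true).keySet());
    }
}
```

The hash map has no notion of key order, so a range query against it would have to examine every entry.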
27. Summary of Index Search
Data entries can be :
actual data records,
<key, rid> pairs, or
<key, rid-list> pairs.
Can have several indexes on a given file of data
records, each with a different search key.
Indexes can be classified as clustered vs. unclustered.
The difference has important consequences for the
utility and performance of query processing.