2. Medians & Order Statistics
What are they?
The ith order statistic of a set of n
elements is defined as the ith smallest
element in the set.
E.g., the minimum is the first order statistic (i = 1),
and the maximum is the last, or nth, order statistic (i = n)
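The median is informally the “halfway”
point: there is one median if n is odd, and two
(the lower and upper medians) if n is even
For even n, these are the (n/2)th and (n/2 + 1)th order statistics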
3. Medians & Order Statistics
This chapter deals with finding a
particular order statistic in a set
We know we can use a sorting algorithm to
find an order statistic in O(n log n) time, by
sorting the data first
There are faster algorithms, however, that
don’t require a sort
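As a baseline, the sort-then-index approach takes only a few lines of C++. This is a minimal sketch assuming 1-based ranks (i = 1 is the minimum) as in the definition above; `SelectBySorting` is a hypothetical name, and `std::sort` provides the O(n log n) step.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the i-th order statistic (1-based) by sorting.
// The vector is taken by value, so the caller's data is left unsorted.
template <typename T>
T SelectBySorting(std::vector<T> data, std::size_t i)
{
    assert(i >= 1 && i <= data.size());   // the order statistic must exist
    std::sort(data.begin(), data.end());  // O(n log n)
    return data[i - 1];                   // 1-based rank -> 0-based index
}
```

Taking the vector by value costs a copy but avoids reordering the caller's data.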
4. Minimum and Maximum
A basic algorithm is O(n): just scan the set
and keep track of the smallest element seen so far
This is the best we can do: to find the
min (or max), every element must be examined,
which takes at least n − 1 comparisons, so a single
O(n) scan is optimal
Type FindMin(Type data[], int length)
{
    // Precondition: length >= 1
    Type min = data[0];                   // smallest element seen so far
    for ( int i = 1 ; i < length ; ++i )  // n - 1 comparisons in total
        if ( data[i] < min ) min = data[i];
    return min;
}
5. Simultaneous min and max
Sometimes it’s useful to find both at
the same time
We can run two separate scans (2(n − 1) comparisons
in total), or modify the original to keep track of both
What about finding the second smallest
element?
Write an algorithm to compute the second smallest
element. How many comparisons are required?
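Keeping track of both in a single loop can also save comparisons. A standard trick is to take the elements in pairs: compare the two to each other, then only the smaller against the running min and only the larger against the running max, for 3 comparisons per pair instead of 4. The sketch below assumes length ≥ 1; `MinMax` is a hypothetical name.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper: returns (min, max) of data[0..length-1] using
// about 3n/2 comparisons. Precondition: length >= 1.
template <typename Type>
std::pair<Type, Type> MinMax(const Type data[], int length)
{
    Type min = data[0], max = data[0];
    int start = 1;
    // With an even length, initialize from the first pair instead,
    // so the remaining elements split evenly into pairs.
    if (length % 2 == 0) {
        if (data[1] < data[0]) { min = data[1]; max = data[0]; }
        else                   { min = data[0]; max = data[1]; }
        start = 2;
    }
    for (int i = start; i + 1 < length; i += 2) {
        const Type& a = data[i];
        const Type& b = data[i + 1];
        if (a < b) {               // 1 comparison inside the pair...
            if (a < min) min = a;  // ...smaller one only vs. the min
            if (max < b) max = b;  // ...larger one only vs. the max
        } else {
            if (b < min) min = b;
            if (max < a) max = a;
        }
    }
    return std::make_pair(min, max);
}
```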
6. Selection in Expected Linear Time
A general solution, one that finds any order statistic i,
would be more useful
It might seem as hard as sorting, i.e., requiring
Ω(n log n) time
In reality it can be accomplished in expected O(n)
time, using a divide-and-conquer algorithm
7. Randomized Select
Recall the Randomized QuickSort algorithm
As in QuickSort, the idea is to partition the input
recursively
Unlike QuickSort, Randomized Select only cares
about one of the partitions – the partition
containing the order statistic you’re looking for
The expected running time for this algorithm is
O(n)
Randomized Select requires the Randomized
Partition algorithm previously discussed
8. The Randomized Select Algorithm
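// Returns the ith smallest (1-based) element of A[begin..end]
template <typename Type>
Type RandomizedSelect(Type A[], int begin, int end, int i)
{
    if ( begin == end ) return A[begin];   // one element left
    int q = RandomizedPartition(A, begin, end);
    int k = q - begin + 1;                 // size of A[begin..q]
    if ( i <= k )
        return RandomizedSelect(A, begin, q, i);
    else
        return RandomizedSelect(A, q + 1, end, i - k);
}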
9. The Randomized Select Algorithm
First, partition the array
This guarantees that all the elements in
A[begin..q] are less than or equal to all the elements in
A[q+1..end]
Then compute how many elements are in the
array A[begin..q]
This is just q-begin+1 (since begin may be non-
zero)
In other words, A[begin..q] holds exactly the k
smallest elements of A[begin..end]
10. The Randomized Select Algorithm
Because of the partitioning, we know which
partition the order statistic must be in
If i ≤ k, the order statistic lies in the left
partition, so recursively search A[begin..q] for
order statistic i
If i > k, it lies in the right partition, so
recursively search A[q+1..end] for order
statistic i − k
We already know that k values (those in A[begin..q]) are
no larger than the smallest element in this partition
We’re looking for the (i-k)th smallest element in that
partition
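Putting the pieces together, here is a minimal, self-contained C++ sketch. The slides don't reproduce RandomizedPartition, so this version assumes a Hoare-style partition around a randomly chosen pivot, the variant whose guarantee (begin ≤ q < end, and A[begin..q] ≤ A[q+1..end]) matches how the select code above uses q. `Rng` and its seed are illustrative choices, not part of the original.

```cpp
#include <cassert>
#include <random>
#include <utility>

// Shared random engine for this sketch; the fixed seed is arbitrary.
std::mt19937& Rng()
{
    static std::mt19937 engine(2024);
    return engine;
}

// Assumed Hoare-style partition of A[begin..end] (inclusive) around a random
// pivot. For begin < end it returns q with begin <= q < end such that every
// element of A[begin..q] is <= every element of A[q+1..end].
template <typename Type>
int RandomizedPartition(Type A[], int begin, int end)
{
    std::uniform_int_distribution<int> pick(begin, end);
    std::swap(A[begin], A[pick(Rng())]);   // random pivot moved to the front
    const Type pivot = A[begin];
    int i = begin - 1;
    int j = end + 1;
    while (true) {
        do { --j; } while (pivot < A[j]);  // stop at an element <= pivot
        do { ++i; } while (A[i] < pivot);  // stop at an element >= pivot
        if (i < j)
            std::swap(A[i], A[j]);         // both on the wrong side: exchange
        else
            return j;                      // scans crossed: split after j
    }
}

// Returns the i-th smallest (1-based) element of A[begin..end]; reorders A.
template <typename Type>
Type RandomizedSelect(Type A[], int begin, int end, int i)
{
    if (begin == end) return A[begin];
    int q = RandomizedPartition(A, begin, end);
    int k = q - begin + 1;                 // number of elements in A[begin..q]
    if (i <= k)
        return RandomizedSelect(A, begin, q, i);
    return RandomizedSelect(A, q + 1, end, i - k);
}
```

Because q < end whenever begin < end, both partitions are non-empty and each recursive call works on a strictly smaller range, so the recursion always terminates.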
11. Analysis of Randomized Select
Worst case, O(n²)
We could get unlucky and partition around
the largest or smallest remaining element
This is unlikely, since the pivot is chosen at random, so no particular input consistently triggers it
The expected-case analysis is more
complicated (see the formula on page 189)
but amounts to O(n)
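An informal way to see why discarding one side keeps the cost linear (a simplified sketch, not the formula from the text): assume each partitioning step costs at most cn and leaves at most 3n/4 elements on the side we recurse into. The per-level costs then form a geometric series:

```latex
T(n) \le T\!\left(\tfrac{3n}{4}\right) + cn
     \le cn \sum_{j \ge 0} \left(\tfrac{3}{4}\right)^{j}
     = 4cn = O(n)
```

A random pivot lands in the middle half of the values with probability about 1/2, so on average only a constant number of partitions are needed per such shrink; that changes the constant, not the O(n) bound.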
12. Generic Programming
Generic programming is “programming using
types as parameters”
The idea of generic programming is to write
code that is data-type independent
Many algorithms and data structures that we
discuss will operate independently of data type
Generic programming provides a way of writing
the code once, then specifying the data type to
operate on later
Reference: The C++ Programming Language, by
Bjarne Stroustrup
13. Generic Programming in C++
The principle of generic programming in
C++ is implemented via templates
Templates provide a way to represent a
wide range of general concepts and simple
ways to combine them
14. Template Functions
The C++ compiler deduces the template arguments
from the function arguments
Calling this function is the same as calling any other
function:
int some_min = FindMin(some_array, SIZE);
template <typename Type>
Type FindMin(Type data[], int length)
{
    // Works for any Type that can be copied and compared with <
    Type min = data[0];
    for ( int i = 1 ; i < length ; ++i )
        if ( data[i] < min ) min = data[i];
    return min;
}
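FindMin places only two demands on Type: it must be copyable and support operator<. So a user-defined type works as soon as it defines operator<. The sketch below restates FindMin (with a const parameter) so it compiles on its own; `Student` and its fields are hypothetical.

```cpp
#include <cassert>
#include <string>

// Precondition: length >= 1.
template <typename Type>
Type FindMin(const Type data[], int length)
{
    Type min = data[0];
    for (int i = 1; i < length; ++i)
        if (data[i] < min) min = data[i];
    return min;
}

// Hypothetical user-defined type: FindMin only needs operator< on it.
struct Student {
    std::string name;
    double gpa;
};

// "Smaller" here means lower GPA; this is what FindMin will minimize.
bool operator<(const Student& a, const Student& b) { return a.gpa < b.gpa; }
```

Each call instantiates a separate FindMin: called on an int array, a std::string array, or a Student array, Type is deduced as int, std::string, or Student respectively.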
15. Template Functions
You are not limited to one template
parameter
Multiple parameters are listed as a comma
separated list:
template <typename T, typename U> …
Template parameters aren’t even
limited to typenames:
template <typename T, int i> …
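A small sketch of both forms, using hypothetical functions `Product` and `FindMinFixed`. The first has two type parameters, deduced independently from its two arguments. The second takes a type parameter plus an `int` parameter, and both are deduced from a reference-to-array argument, so the length no longer has to be passed separately.

```cpp
#include <cassert>

// Two type parameters; the result type is whatever a * b produces.
template <typename T, typename U>
auto Product(const T& a, const U& b) -> decltype(a * b)
{
    return a * b;
}

// A type parameter plus a non-type (int) parameter. Both are deduced from
// the array argument: T is the element type and N is the array's length.
template <typename T, int N>
T FindMinFixed(const T (&data)[N])
{
    T min = data[0];
    for (int i = 1; i < N; ++i)
        if (data[i] < min) min = data[i];
    return min;
}
```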
16. Template Functions
There are rare occasions when the compiler
cannot deduce the type of the template
argument
E.g., when the template parameter is used only as the return
type
In these cases, explicit specification can be used
template <typename T>
T* create()
{
    return new T;
}
…
SomeClass *p = create<SomeClass>();
17. Template Functions
Template functions can also be overloaded,
both with other template functions and with
non-template functions
This may be required in a number of situations:
For some types, you can use a different (more efficient)
algorithm
Multiple type deductions can be made, such that the
compiler can’t decide which version to use
template <typename T> T sqrt(T);
template <typename T> complex<T> sqrt(complex<T>);
double sqrt(double);
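The declarations above only show signatures. The runnable sketch below, using a hypothetical `Describe` function in place of `sqrt`, illustrates how the compiler chooses among such overloads: an exact non-template match beats an equally good template, and among templates the more specialized one (here `T*`) wins.

```cpp
#include <cassert>
#include <string>

template <typename T> std::string Describe(T)  { return "generic"; }
template <typename T> std::string Describe(T*) { return "pointer"; } // more specialized
std::string Describe(double)                   { return "double"; }  // non-template
```

Describe(42) returns "generic" (the template matches int exactly, while the non-template would need a conversion), Describe(3.0) returns "double", and Describe(&x) for an int x returns "pointer".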
18. Template Classes
C++ also provides for template classes
template <typename T>
class SomeArray {
public:
    SomeArray();
    T& ItemAt(int index);
    void SetItemAt(int index, const T& value);
    …
private:
    T m_data[SIZE];
};
19. Template Classes
Instantiating an object from a template class
takes a little more work
You must specify the type
The resulting object can be used like any other
object
SomeArray<double> array_of_doubles;
array_of_doubles.SetItemAt(0, 2.0);
double d = array_of_doubles.ItemAt(0);
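For a version that actually compiles, the member functions need definitions and SIZE needs a value. The sketch below assumes SIZE = 10 (an arbitrary choice), defines ItemAt inside the class, and defines SetItemAt outside it to show the syntax that requires: repeating the template header and naming the class as SomeArray<T>.

```cpp
#include <cassert>

// SIZE is not defined on the slide; 10 is an arbitrary value for this sketch.
const int SIZE = 10;

template <typename T>
class SomeArray {
public:
    SomeArray() : m_data() {}                  // value-initialize every element
    T& ItemAt(int index) { return m_data[index]; }
    void SetItemAt(int index, const T& value);
private:
    T m_data[SIZE];
};

// Out-of-class member definition: repeat the template header and
// qualify the name with SomeArray<T>.
template <typename T>
void SomeArray<T>::SetItemAt(int index, const T& value)
{
    assert(index >= 0 && index < SIZE);        // simple bounds check
    m_data[index] = value;
}
```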
20. Template Classes
As with function templates, class template parameters
are not limited to types
Other data can also be provided:
template <typename Type, int Storage>
class SomeArray {
public:
    SomeArray();
    Type& ItemAt(int index);
    …
private:
    Type m_data[Storage];
};
…
SomeArray<double, 1000> array_of_doubles;
21. Some Cautions About Templates
Templates provide a convenient way of writing an algorithm or
data structure only once
For each distinct set of template arguments, the compiler generates a separate
piece of compiled code
E.g., SomeArray<double>, SomeArray<int>, and
SomeArray<SomeClass> create three different implementations
of SomeArray in the program
Templates are considered a major contributor to code bloat
because of this property
Care should be taken in template classes to include only
methods that depend on the templated type
Other functionality can be moved to a standalone function, or to a
non-template base class
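One way to apply the last point, sketched under the assumption that bounds checking is the part that doesn't depend on Type: move it into a non-template base class (hypothetical `ArrayBase`), so its code exists once no matter how many SomeArray instantiations the program uses. Throwing std::out_of_range is a choice made for this sketch.

```cpp
#include <cassert>
#include <stdexcept>

// Non-template base: CheckIndex is compiled once and shared by every
// SomeArray instantiation. ArrayBase is a hypothetical name.
class ArrayBase {
protected:
    static void CheckIndex(int index, int size);
};

// In a real project this definition would live in a single .cpp file.
void ArrayBase::CheckIndex(int index, int size)
{
    if (index < 0 || index >= size)
        throw std::out_of_range("SomeArray index out of range");
}

// Only the type-dependent parts stay in the template.
template <typename Type, int Storage>
class SomeArray : private ArrayBase {
public:
    SomeArray() : m_data() {}
    Type& ItemAt(int index)
    {
        CheckIndex(index, Storage);   // shared, non-template code
        return m_data[index];
    }
private:
    Type m_data[Storage];
};
```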