Cis435 week04

Medians & Order Statistics
Data Structures & Algorithms

What are they?
 The ith order statistic of a set of n
elements is defined as the ith smallest
element in the set.
 E.g., the minimum order statistic is the first
order statistic, the max is the last order statistic
 The median is informally the “halfway”
point: there is one median if n is odd, or two
(a lower and an upper median) if n is even

This chapter deals with finding a
particular order statistic in a set
 We know we can use a sorting algorithm to
find an order statistic in O(n log n) time, by
sorting the data first
 There are faster algorithms, however, that
don’t require a sort
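The sort-first approach mentioned above can be sketched in a few lines. This is only an illustration: the name SelectBySorting and the 1-based index i are assumptions, not something from the slides.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: returns the ith smallest element (i is 1-based).
// The vector is taken by value so the caller's data is left unsorted.
template <typename T>
T SelectBySorting(std::vector<T> data, int i)
{
    assert(i >= 1 && i <= static_cast<int>(data.size()));
    std::sort(data.begin(), data.end());   // O(n log n) comparisons
    return data[i - 1];                    // ith smallest sits at index i-1
}
```

Sorting does more work than needed here, since it orders every element when only one position is of interest.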

Minimum and Maximum
A basic algorithm is O(n): just scan the set
and keep track of the smallest element seen so far
 This is the best we can do: every element has to be
examined at least once, because an unexamined
element could be the min (or max), so no algorithm
can beat the n - 1 comparisons the scan uses
Type FindMin(Type data[], int length)
{
    Type min = data[0];
    for ( int i = 1 ; i < length ; ++i )
        if ( data[i] < min ) min = data[i];
    return min;
}

Simultaneous min and max
Sometimes it's useful to find both at
the same time
We can run separate algorithms, or modify
the original to keep track of both
What about finding the second smallest
element?
Write an algorithm to compute the second smallest
element. How many comparisons are required?
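One way to "modify the original to keep track of both" is sketched below. FindMinMax is a hypothetical name, and returning a std::pair is just one possible design choice.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper: a single pass that tracks both extremes.
// Assumes length >= 1. Uses at most 2(n-1) element comparisons.
template <typename Type>
std::pair<Type, Type> FindMinMax(const Type data[], int length)
{
    assert(length >= 1);
    Type min = data[0];
    Type max = data[0];
    for (int i = 1; i < length; ++i) {
        // An element smaller than min cannot also be larger than max,
        // so the second test is skipped whenever the first succeeds.
        if (data[i] < min)      min = data[i];
        else if (max < data[i]) max = data[i];
    }
    return std::make_pair(min, max);
}
```

Processing the elements in pairs (compare the two to each other first, then the smaller to min and the larger to max) can reduce the count to roughly 3n/2 comparisons.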

Selection in Expected Linear Time
A general solution would be more useful
 It seems like the type of problem that
would be hard to solve, needing at least O(n log n)
 In reality it can be accomplished in expected O(n)
time, using a divide-and-conquer algorithm

Randomized Select
Recall the Randomized QuickSort algorithm
 As in QuickSort, the idea is to partition the input
recursively
 Unlike QuickSort, Randomized Select only cares
about one of the partitions – the partition
containing the order statistic you’re looking for
 The expected running time for this algorithm is
O(n)
Randomized Select requires the Randomized
Partition algorithm previously discussed

The Randomized Select Algorithm
RandomizedSelect(A, begin, end, i)
{
    if ( begin == end ) return A[begin];
    q = RandomizedPartition(A, begin, end)
    k = q - begin + 1
    if ( i <= k )
        return RandomizedSelect(A, begin, q, i)
    else
        return RandomizedSelect(A, q+1, end, i-k)
}

First, partition the array
 This guarantees that every element in
A[begin..q] is less than or equal to every element in
A[q+1..end]
Then compute how many elements are in the
array A[begin..q]
 This is just k = q-begin+1 (since begin may be
nonzero)
 So the left partition holds k of the smallest
elements of A[begin..end]

Because of the partitioning, we know which
partition the order statistic must be in
 If i <= k, then recursively
search the left partition for order statistic i
 If i > k, then
recursively search the right partition for order
statistic i - k
 We already know that k values are no larger than the
smallest element in this partition
 So we’re looking for the (i-k)th smallest element in that
partition
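The pseudocode and the walkthrough above could be turned into C++ roughly as follows. This is a sketch under assumptions: the slides do not show RandomizedPartition, so a Hoare-style partition around a randomly chosen pivot is assumed here (it returns q with begin <= q < end, which is what recursing on A[begin..q] relies on), and the std::mt19937 parameter is an added detail.

```cpp
#include <cassert>
#include <random>
#include <utility>
#include <vector>

// Assumed partition routine (Hoare style, random pivot). Returns q with
// begin <= q < end such that every element of a[begin..q] is <= every
// element of a[q+1..end]. Requires begin < end.
template <typename T>
int RandomizedPartition(std::vector<T>& a, int begin, int end, std::mt19937& rng)
{
    std::uniform_int_distribution<int> pick(begin, end);  // inclusive range
    std::swap(a[begin], a[pick(rng)]);   // move the random pivot to the front
    const T pivot = a[begin];
    int i = begin - 1;
    int j = end + 1;
    while (true) {
        do { --j; } while (pivot < a[j]);   // scan left for an element <= pivot
        do { ++i; } while (a[i] < pivot);   // scan right for an element >= pivot
        if (i < j) std::swap(a[i], a[j]);
        else return j;
    }
}

// Returns the ith smallest element of a[begin..end]; i is 1-based.
// The array is rearranged in place.
template <typename T>
T RandomizedSelect(std::vector<T>& a, int begin, int end, int i, std::mt19937& rng)
{
    assert(begin <= end && i >= 1 && i <= end - begin + 1);
    if (begin == end) return a[begin];
    int q = RandomizedPartition(a, begin, end, rng);
    int k = q - begin + 1;               // size of the left partition
    if (i <= k)
        return RandomizedSelect(a, begin, q, i, rng);
    return RandomizedSelect(a, q + 1, end, i - k, rng);
}
```

A call such as RandomizedSelect(v, 0, n - 1, (n + 1) / 2, rng) would return the (lower) median of an n-element vector v.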

Analysis of Randomized Select
Worst case, O(n^2)
 We could get unlucky and partition around
the largest or smallest remaining element every time
 This is unlikely, since the pivot is chosen at random
The average case is somewhat more
complicated (see the formula on page 189)
but amounts to O(n)
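For intuition, a commonly used form of the average-case recurrence is sketched below. It may not match the textbook's formula exactly. It assumes each split size is roughly equally likely and, pessimistically, that the recursion always continues into the larger partition; a and c are constants introduced here for the argument.

```latex
% Expected running time, charging the recursion to the larger side:
T(n) \le \frac{2}{n} \sum_{k=\lceil n/2 \rceil}^{n-1} T(k) + an

% Guess T(k) \le ck for all k < n and substitute:
\frac{2}{n} \sum_{k=\lceil n/2 \rceil}^{n-1} ck
  \le \frac{2c}{n} \cdot \frac{3n^2}{8} = \frac{3cn}{4}

% Hence the guess holds for n as well once c is large enough:
T(n) \le \frac{3cn}{4} + an \le cn \quad \text{whenever } c \ge 4a
```

The key point is that the expected subproblem size shrinks by a constant factor, so the linear partitioning costs add up to a geometric series.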

Generic Programming
Generic programming is “programming using
types as parameters”
The idea of generic programming is to write
code that is data-type independent
 Many algorithms and data structures that we
discuss will operate independently of data type
 Generic programming provides a way of writing
the code once, then specifying the data type to
operate on later
Reference: The C++ Programming Language, by
Bjarne Stroustrup

Generic Programming in C++
The principle of generic programming in
C++ is implemented via templates
 Templates provide a way to represent a
wide range of general concepts and simple
ways to combine them

Template Functions
template <typename Type>
Type FindMin(Type data[], int length)
{
    Type min = data[0];
    for ( int i = 1 ; i < length ; ++i )
        if ( data[i] < min ) min = data[i];
    return min;
}
The C++ compiler deduces the template arguments
from the function arguments
Calling this function is the same as calling any other
function:
int some_min = FindMin(some_array, SIZE);
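To make the deduction concrete, the sketch below calls the same FindMin template with three different element types. The sample arrays and the DeductionDemo function are made up for illustration.

```cpp
#include <cassert>
#include <string>

template <typename Type>
Type FindMin(Type data[], int length)
{
    assert(length >= 1);
    Type min = data[0];
    for (int i = 1; i < length; ++i)
        if (data[i] < min) min = data[i];
    return min;
}

// Hypothetical demo: the compiler picks Type from the array argument.
void DeductionDemo()
{
    int         counts[] = {4, 2, 7};
    double      prices[] = {3.5, 1.25, 9.0};
    std::string names[]  = {"pear", "apple", "plum"};

    int         a = FindMin(counts, 3);   // Type deduced as int
    double      b = FindMin(prices, 3);   // Type deduced as double
    std::string c = FindMin(names, 3);    // Type deduced as std::string

    assert(a == 2);
    assert(b == 1.25);
    assert(c == "apple");
}
```

The only requirement FindMin places on Type is that it can be copied and compared with operator<.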

Template Functions
You are not limited to one template
parameter
 Multiple parameters are listed as a comma
separated list:
template <typename T, typename U> …
Template parameters aren’t even
limited to typenames:
template <typename T, int i> …

Template Functions
There are rare occasions when the compiler
cannot deduce the type of the template
argument
 E.g., when the argument is only used as a return
type
 In these cases, explicit specification can be used
template <typename T>
T* create()
{
    return new T;
}
…
SomeClass *p = create<SomeClass>();
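Explicit specification can also be partial: leading template arguments are named and the rest are still deduced. The sketch below uses a hypothetical ConvertTo helper to show this.

```cpp
#include <cassert>

// Hypothetical helper: To appears only in the return type, so it cannot
// be deduced; From is deduced from the function argument as usual.
template <typename To, typename From>
To ConvertTo(From value)
{
    return static_cast<To>(value);
}

void ExplicitDemo()
{
    double d = ConvertTo<double>(7);   // To = double (explicit), From = int (deduced)
    int    n = ConvertTo<int>(3.9);    // To = int (explicit), From = double (deduced)
    assert(d == 7.0);
    assert(n == 3);                    // static_cast truncates toward zero
}
```

Placing the non-deducible parameter first is what allows the caller to name only that one.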

Template Functions
Template functions can also be overloaded,
both with other template functions and with
non-template functions
 This may be required in a number of situations:
 For some types, you can use a different (more efficient)
algorithm
 Multiple type deductions can be made, such that the
compiler can’t decide which version to use
template <typename T> T sqrt(T);
template <typename T> complex<T> sqrt(complex<T>);
double sqrt(double);
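Both situations listed above can be seen in a small sketch. Larger is a hypothetical function, not part of the slides: the non-template overload supplies a different algorithm for C strings, and the last call shows a conflicting deduction being resolved by naming the type.

```cpp
#include <cassert>
#include <cstring>

// Generic version: works for any type with operator<.
template <typename T>
T Larger(T a, T b)
{
    return (a < b) ? b : a;
}

// Non-template overload: compare C strings by content, not by address.
const char* Larger(const char* a, const char* b)
{
    return (std::strcmp(a, b) < 0) ? b : a;
}

void OverloadDemo()
{
    const char* x = "apple";
    const char* y = "pear";
    assert(Larger(3, 7) == 7);                        // template, T = int
    assert(std::strcmp(Larger(x, y), "pear") == 0);   // non-template is preferred
    // Larger(1, 2.5) would not compile: T deduces as int from the first
    // argument and double from the second. Naming T removes the conflict:
    assert(Larger<double>(1, 2.5) == 2.5);
}
```

When a non-template function and a template instantiation match equally well, overload resolution prefers the non-template function.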

Template Classes
C++ also provides for template classes
template <typename T>
class SomeArray {
public:
    SomeArray();
    T& ItemAt(int index);
    void SetItemAt(int index, const T& value);
    …
private:
    T m_data[SIZE];
};

Template Classes
Instantiating an object from a template class
takes a little more work
 You must specify the type
 The resulting object can be used like any other
object
SomeArray<double> array_of_doubles;
array_of_doubles.SetItemAt(0, 2.0);
double d = array_of_doubles.ItemAt(0);

Template Classes
Like function templates, template parameters
are not limited to only generic types
 Other data can also be provided:
template <typename Type, int Storage>
class SomeArray {
public:
    SomeArray();
    Type& ItemAt(int index);
    …
private:
    Type m_data[Storage];
};
…
SomeArray<double, 1000> array_of_doubles;
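The slides elide the member definitions, so the following is one possible way to fill them in, not the course's actual implementation. The bounds-checking asserts and the Capacity member are assumptions added for the example.

```cpp
#include <cassert>

template <typename Type, int Storage>
class SomeArray {
public:
    SomeArray() : m_data() {}   // value-initializes every slot

    Type& ItemAt(int index)
    {
        assert(index >= 0 && index < Storage);
        return m_data[index];
    }

    void SetItemAt(int index, const Type& value)
    {
        ItemAt(index) = value;
    }

    int Capacity() const { return Storage; }   // hypothetical addition

private:
    Type m_data[Storage];   // size is fixed at compile time by Storage
};
```

Note that SomeArray<double, 1000> and SomeArray<double, 500> are distinct types, since the non-type argument is part of the type.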

Some Cautions About Templates
Templates provide a convenient way of writing an algorithm or
data structure only once
For each instantiated template, the compiler creates a separate
piece of compiled code
 E.g., SomeArray<double>, SomeArray<int>, and
SomeArray<SomeClass> create three different implementations
of SomeArray in memory
 Templates are considered a primary contributor to code bloat
because of this property
Care should be taken in template classes to only include
methods that depend on the templated type
 Other functionality can be moved to a standalone function, or to a
non-template base class
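The non-template base class idea can be sketched as follows. ArrayBase, IsValidIndex, and the value of SIZE are assumptions made for the example; only the members that touch T stay in the template.

```cpp
#include <cassert>

// Hypothetical non-template base: nothing here depends on the element
// type, so it is compiled once and shared by every instantiation.
class ArrayBase {
public:
    explicit ArrayBase(int capacity) : m_capacity(capacity) {}
    int Capacity() const { return m_capacity; }
    bool IsValidIndex(int index) const
    {
        return index >= 0 && index < m_capacity;
    }
private:
    int m_capacity;
};

const int SIZE = 16;   // assumed value; the slides do not define SIZE

// Only the type-dependent storage and accessors live in the template.
template <typename T>
class SomeArray : public ArrayBase {
public:
    SomeArray() : ArrayBase(SIZE), m_data() {}

    T& ItemAt(int index)
    {
        assert(IsValidIndex(index));
        return m_data[index];
    }

    void SetItemAt(int index, const T& value) { ItemAt(index) = value; }

private:
    T m_data[SIZE];
};
```

With this split, SomeArray<int> and SomeArray<double> each get their own ItemAt and SetItemAt, but share a single copy of Capacity and IsValidIndex.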
