Data Structures
Basic Terminologies & Asymptotic Notations
Data Structures
“Clever” ways to organize information in order to enable
efficient computation
– What do we mean by clever?
– What do we mean by efficient?
Picking the best Data Structure for the job
• The data structure you pick needs to support the
operations you need
• Ideally it supports the operations you will use most
often in an efficient manner
• Examples of operations:
– A List with operations insert and delete
– A Stack with operations push and pop
Terminology
• Abstract Data Type (ADT)
– Mathematical description of an object with a set of
operations on the object. Useful building block.
• Algorithm
– A high-level, language-independent description of
a step-by-step process
• Data structure
– A specific family of algorithms for implementing
an abstract data type.
• Implementation of a data structure
– A specific implementation in a specific language
Terminology
• Data
Data refers to a value or a set of values,
e.g. marks obtained by the students.
• Data type
A data type is a classification identifying one of various types
of data, such as floating-point, integer, or Boolean, that
determines the possible values for that type, the operations
that can be done on values of that type, and the way values
of that type can be stored.
Data Structures - Introduction 5
Terminology
• Primitive data type
These are basic data types that are provided by the
programming language with built-in support. These data
types are native to the language and are supported
directly by the machine.
• Variable
A variable is a symbolic name given to some known or
unknown quantity or information, for the purpose of
allowing the name to be used independently of the
information it represents.
Terminology
• Record
A collection of related data items is known as a record. The
elements of records are usually called fields or members.
Records are distinguished from arrays by the fact that
their number of fields is typically fixed, each field has a
name, and each field may have a different type.
• Program
A sequence of instructions that a computer can
interpret and execute.
Terminology examples
• A stack is an abstract data type supporting push, pop and
isEmpty operations
• A stack data structure could use an array, a linked list, or
anything that can hold data
• One stack implementation is java.util.Stack; another is
java.util.LinkedList
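The ADT / data structure split described above can be sketched in C++. This is only an illustration: the names `IntStack` and `ArrayStack` are hypothetical and do not come from the slides. The abstract class plays the role of the ADT (push, pop, isEmpty), and the vector-backed class is one possible data structure behind it.

```cpp
#include <stdexcept>
#include <vector>

// ADT: what a stack of ints can do, with no commitment to a representation.
// (IntStack is a hypothetical name used only for this sketch.)
class IntStack {
public:
    virtual ~IntStack() = default;
    virtual void push(int value) = 0;
    virtual int pop() = 0;              // removes and returns the top element
    virtual bool isEmpty() const = 0;
};

// One possible data structure for the ADT: a growable array.
class ArrayStack : public IntStack {
public:
    void push(int value) override { items_.push_back(value); }

    int pop() override {
        if (items_.empty()) {
            throw std::out_of_range("pop on empty stack");
        }
        int top = items_.back();
        items_.pop_back();
        return top;
    }

    bool isEmpty() const override { return items_.empty(); }

private:
    std::vector<int> items_;  // the last element is the top of the stack
};
```

Code written against `IntStack` would keep working if a linked-list implementation were swapped in, which is the point of separating the two ideas.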
Concepts vs. Mechanisms
• Abstract
• Pseudocode
• Algorithm
– A sequence of high-level, language-independent
operations, which may act upon an abstracted view
of data.
• Abstract Data Type (ADT)
– A mathematical description of an object
and the set of operations on the object.
• Concrete
• Specific programming language
• Program
– A sequence of operations in a
specific programming language,
which may act upon real data in
the form of numbers, images,
sound, etc.
• Data structure
– A specific way in which a
program’s data is represented,
which reflects the programmer’s
design choices/goals.
Why So Many Data Structures?
Ideal data structure:
“fast”, “elegant”, memory efficient
Generates tensions:
– time vs. space
– performance vs. elegance
– generality vs. simplicity
– one operation’s performance vs. another’s
The study of data structures is the study
of tradeoffs. That’s why we have so
many of them!
Data Structures
Asymptotic Analysis
Algorithm Analysis: Why?
• Correctness:
– Does the algorithm do what is intended?
• Performance:
– What is the running time of the algorithm?
– How much storage does it consume?
• Different algorithms may be correct
– Which should I use?
Recursive algorithm for sum
• Write a recursive function to find the sum of the first n
integers stored in array v.
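One way to write the function asked for above is sketched here. The signature is an assumption (the slides only name it `sum(v, n)`): the sum of zero elements is 0, and otherwise the result is the last of the n elements plus the sum of the first n - 1.

```cpp
#include <cstddef>

// Returns v[0] + v[1] + ... + v[n-1]; the sum of zero elements is 0.
int sum(const int v[], std::size_t n) {
    if (n == 0) {
        return 0;                     // base case: nothing to add
    }
    return v[n - 1] + sum(v, n - 1);  // last element plus sum of the first n-1
}
```

The base case and the single recursive call line up with the basis step and the inductive step of a correctness proof by induction.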
Proof by Induction
• Basis Step: The algorithm is correct for a base case or
two by inspection.
• Inductive Hypothesis (n=k): Assume that the
algorithm works correctly for the first k cases.
• Inductive Step (n=k+1): Given the hypothesis above,
show that the k+1 case will be calculated correctly.
Program Correctness by Induction
• Basis Step:
sum(v,0) = 0.
• Inductive Hypothesis (n=k):
Assume sum(v,k) correctly returns the sum of the first k
elements of v, i.e. v[0]+v[1]+…+v[k-1]
• Inductive Step (n=k+1):
sum(v,n) returns
v[k]+sum(v,k) = (by inductive hyp.)
v[k]+(v[0]+v[1]+…+v[k-1]) =
v[0]+v[1]+…+v[k-1]+v[k]
Algorithms vs Programs
• Proving correctness of an algorithm is very important
– a well designed algorithm is guaranteed to work
correctly and its performance can be estimated
• Proving correctness of a program (an implementation) is
fraught with weird bugs
– Abstract Data Types are a way to bridge the gap
between mathematical algorithms and programs
Comparing Two Algorithms
GOAL: Sort a list of names
“I’ll buy a faster CPU”
“I’ll use C++ instead of Java – wicked fast!”
“Ooh look, the –O4 flag!”
“Who cares how I do it, I’ll add more memory!”
“Can’t I just get the data pre-sorted??”
Comparing Two Algorithms
• What we want:
– Rough Estimate
– Ignores Details
• Really, independent of details
– Coding tricks, CPU speed, compiler optimizations, …
– These would help any algorithms equally
– Don’t just care about running time – not a good
enough measure
Big-O Analysis
• Ignores “details”
• What details?
– CPU speed
– Programming language used
– Amount of memory
– Compiler
– Order of input
– Size of input … sorta.
Analysis of Algorithms
• Efficiency measure
– how long the program runs: time complexity
– how much memory it uses: space complexity
• Why analyze at all?
– Decide what algorithm to implement before actually
doing it
– Given code, get a sense for where bottlenecks must be,
without actually measuring it
Asymptotic Analysis
• Complexity as a function of input size n
T(n) = 4n + 5
T(n) = 0.5 n log n - 2n + 7
T(n) = 2^n + n^3 + 3n
• What happens as n grows?
Why Asymptotic Analysis?
• Most algorithms are fast for small n
– Time difference too small to be noticeable
– External things dominate (OS, disk I/O, …)
• BUT n is often large in practice
– Databases, internet, graphics, …
• Difference really shows up as n grows!
Exercise - Searching
bool ArrayFind( int array[], int n, int key ) {
    // Insert your algorithm here
}
Example array: 2 3 5 16 37 50 73 75 126
What algorithm would you
choose to implement this
code snippet?
Analyzing Code
Basic Java operations: constant time
Consecutive statements: sum of times
Conditionals: larger branch plus test
Loops: sum of iterations
Function calls: cost of function body
Recursive functions: solve recurrence relation
Linear Search Analysis
bool LinearArrayFind( int array[], int n, int key ) {
    for( int i = 0; i < n; i++ ) {
        if( array[i] == key )
            // Found it!
            return true;
    }
    return false;
}
Best Case:
Worst Case:
Binary Search Analysis
bool BinArrayFind( int array[], int low,
                   int high, int key ) {
    // The subarray is empty
    if( low > high ) return false;
    // Search this subarray recursively;
    // low + (high - low) / 2 avoids overflow in high + low
    int mid = low + (high - low) / 2;
    if( key == array[mid] ) {
        return true;
    } else if( key < array[mid] ) {
        return BinArrayFind( array, low,
                             mid-1, key );
    } else {
        return BinArrayFind( array, mid+1,
                             high, key );
    }
}
Best case:
Worst case:
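To get a feel for the best and worst cases before doing the formal analysis, one option is to count how many array elements a binary search inspects. The sketch below is an iterative variant written for this purpose; the name `binaryFindCounted` and the `probes` out-parameter are assumptions for illustration and are not part of the slides.

```cpp
#include <cstddef>

// Iterative binary search over the sorted range array[0..n-1].
// On return, probes holds the number of elements that were inspected.
bool binaryFindCounted(const int array[], std::size_t n, int key,
                       std::size_t& probes) {
    probes = 0;
    std::size_t low = 0;
    std::size_t high = n;  // search the half-open range [low, high)
    while (low < high) {
        std::size_t mid = low + (high - low) / 2;
        ++probes;
        if (array[mid] == key) {
            return true;
        }
        if (key < array[mid]) {
            high = mid;      // continue in the left half
        } else {
            low = mid + 1;   // continue in the right half
        }
    }
    return false;
}
```

On the nine-element example array from the exercise, a key sitting in the middle is found after one probe, while a key smaller than every element takes four probes, which matches floor(log_2 9) + 1.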
Solving Recurrence Relations
1. Determine the recurrence relation. What is/are the base
case(s)?
2. “Expand” the original relation to find an equivalent general
expression in terms of the number of expansions.
3. Find a closed-form expression by setting the number of
expansions to a value which reduces the problem to a base
case
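As a worked instance of the three steps, here is the binary search recurrence, using the step counts given in the Editor's Notes (4 steps per call, 2 steps for the base case). The derivation assumes n is a power of two so that the floors can be dropped in the last line; the additive constant depends on how the base case is counted.

```latex
\begin{align*}
T(1) &= 2, \qquad T(n) = 4 + T(\lfloor n/2 \rfloor)
  && \text{step 1: recurrence and base case} \\
T(n) &= 4 + \bigl(4 + T(\lfloor n/4 \rfloor)\bigr) = 8 + T(\lfloor n/4 \rfloor)
  && \text{step 2: expand} \\
     &= 12 + T(\lfloor n/8 \rfloor) \\
     &= 4k + T(\lfloor n/2^{k} \rfloor)
  && \text{after } k \text{ expansions} \\
T(n) &= 4\log_2 n + T(1) = 4\log_2 n + 2
  && \text{step 3: set } k = \log_2 n
\end{align*}
```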
Data Structures
Asymptotic Analysis
Linear Search vs Binary Search
              Linear Search    Binary Search
Best Case     4 at [0]         4 at [middle]
Worst Case    3n+2             4 log n + 4
So … which algorithm is better?
What tradeoffs can you make?
Fast Computer vs. Slow Computer
Fast Computer vs. Smart Programmer (round 1)
Fast Computer vs. Smart Programmer (round 2)
Asymptotic Analysis
• Asymptotic analysis looks at the order of the running
time of the algorithm
– A valuable tool when the input gets “large”
– Ignores the effects of different machines or different
implementations of an algorithm
• Intuitively, to find the asymptotic runtime, throw
away the constants and low-order terms
– Linear search is T(n) = 3n + 2 ∈ O(n)
– Binary search is T(n) = 4 log_2 n + 4 ∈ O(log n)
Remember: the fastest algorithm has the
slowest growing function for its runtime
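The claim that constants stop mattering for large enough input can be checked numerically. The sketch below is an illustration only: the helper names `linearCost`, `logCost` and `crossover` are assumptions, and the finite scan up to `limit` is a sanity check, not a proof.

```cpp
#include <cmath>
#include <cstdint>

// Cost models of the form used on the slides; the coefficients are inputs.
double linearCost(double coef, double constant, std::uint64_t n) {
    return coef * static_cast<double>(n) + constant;
}

double logCost(double coef, double constant, std::uint64_t n) {
    return coef * std::log2(static_cast<double>(n)) + constant;
}

// Smallest n in [1, limit] from which the logarithmic cost stays strictly
// below the linear cost for every n up to limit. Returns limit + 1 if the
// logarithmic cost is still not smaller at limit.
std::uint64_t crossover(double linCoef, double linConst,
                        double logCoef, double logConst,
                        std::uint64_t limit) {
    std::uint64_t lastLoss = 0;  // largest n where the log cost is not smaller
    for (std::uint64_t n = 1; n <= limit; ++n) {
        if (logCost(logCoef, logConst, n) >= linearCost(linCoef, linConst, n)) {
            lastLoss = n;
        }
    }
    return lastLoss + 1;
}
```

With the constants above (3n + 2 versus 4 log_2 n + 4) the logarithmic cost never exceeds the linear one for n ≥ 1. Even with deliberately inflated, made-up constants for the logarithmic algorithm, `crossover(1, 0, 100, 100, 100000)` returns a value between 1000 and 1200: the slower-growing function loses at first and then wins for good.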
Asymptotic Analysis
• Eliminate low-order terms
– 4n + 5 ⇒
– 0.5 n log n + 2n + 7 ⇒
– n^3 + 2^n + 3n ⇒
• Eliminate coefficients
– 4n ⇒
– 0.5 n log n ⇒
– n log n^2 ⇒
Properties of Logs
• log AB = log A + log B
• Proof:
A = 2^{log_2 A}, B = 2^{log_2 B}
AB = 2^{log_2 A} ⋅ 2^{log_2 B} = 2^{log_2 A + log_2 B}
∴ log_2 AB = log_2 A + log_2 B
• Similarly:
– log(A/B) = log A – log B
– log(A^B) = B log A
• Any log is equivalent to log-base-2 up to a constant factor
Order Notation: Intuition
Although not yet apparent, as n gets “sufficiently large”,
f(n) will be “greater than or equal to” g(n)
f(n) = n^3 + 2n^2
g(n) = 100n^2 + 1000
Definition of Order Notation
• Upper bound: T(n) = O(f(n)) Big-O
There exist positive constants c and n’ such that
T(n) ≤ c f(n) for all n ≥ n’
• Lower bound: T(n) = Ω(g(n)) Omega
There exist positive constants c and n’ such that
T(n) ≥ c g(n) for all n ≥ n’
• Tight bound: T(n) = θ(f(n)) Theta
When both hold:
T(n) = O(f(n))
T(n) = Ω(f(n))
Definition of Order Notation
O( f(n) ): a set or class of functions
g(n) ∈ O( f(n) ) iff there exist positive constants c and n0 such
that:
g(n) ≤ c f(n) for all n ≥ n0
Example:
100n^2 + 1000 ≤ 5 (n^3 + 2n^2) for all n ≥ 19
So g(n) ∈ O( f(n) )
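The witness pair in the example (c = 5, n0 = 19) can be sanity-checked by brute force. This is a sketch under stated assumptions: `witnessHolds` is a hypothetical helper, and scanning a finite range can refute a candidate pair but cannot prove the bound for all n.

```cpp
#include <cstdint>

// f and g from the example: f(n) = n^3 + 2n^2, g(n) = 100n^2 + 1000.
std::uint64_t f(std::uint64_t n) { return n * n * n + 2 * n * n; }
std::uint64_t g(std::uint64_t n) { return 100 * n * n + 1000; }

// Checks g(n) <= c * f(n) for every n in [n0, limit].
// Keep limit small enough that c * f(limit) fits in 64 bits.
bool witnessHolds(std::uint64_t c, std::uint64_t n0, std::uint64_t limit) {
    for (std::uint64_t n = n0; n <= limit; ++n) {
        if (g(n) > c * f(n)) {
            return false;  // the inequality fails at this n
        }
    }
    return true;
}
```

For c = 5 the check passes from n0 = 19 and fails from n0 = 18, since g(18) = 33400 while 5 f(18) = 32400. A different c would give a different n0; the definition only asks that some pair exists.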
Order Notation: Example
100n^2 + 1000 ≤ 5 (n^3 + 2n^2) for all n ≥ 19
So g(n) ∈ O( f(n) )
Some Notes on Notation
• Sometimes you’ll see
g(n) = O( f(n) )
• This is equivalent to
g(n) ∈ O( f(n) )
• What about the reverse?
O( f(n) ) = g(n)
Big-O: Common Names
– constant: O(1)
– logarithmic: O(log n) (log_k n, log n^2 ∈ O(log n))
– linear: O(n)
– log-linear: O(n log n)
– quadratic: O(n^2)
– cubic: O(n^3)
– polynomial: O(n^k) (k is a constant)
– exponential: O(c^n) (c is a constant > 1)
Meet the Family
• O( f(n) ) is the set of all functions asymptotically less than
or equal to f(n)
– o( f(n) ) is the set of all functions asymptotically
strictly less than f(n)
• Ω( f(n) ) is the set of all functions asymptotically greater
than or equal to f(n)
– ω( f(n) ) is the set of all functions asymptotically
strictly greater than f(n)
• θ( f(n) ) is the set of all functions asymptotically equal to
f(n)
Meet the Family, Formally
• g(n) ∈ O( f(n) ) iff
There exist positive constants c and n0 such that g(n) ≤ c f(n) for all n ≥ n0
– g(n) ∈ o( f(n) ) iff
For every constant c > 0 there exists an n0 such that g(n) < c f(n) for all n ≥ n0
Equivalent to: lim_{n→∞} g(n)/f(n) = 0
• g(n) ∈ Ω( f(n) ) iff
There exist positive constants c and n0 such that g(n) ≥ c f(n) for all n ≥ n0
– g(n) ∈ ω( f(n) ) iff
For every constant c > 0 there exists an n0 such that g(n) > c f(n) for all n ≥ n0
Equivalent to: lim_{n→∞} g(n)/f(n) = ∞
• g(n) ∈ θ( f(n) ) iff
g(n) ∈ O( f(n) ) and g(n) ∈ Ω( f(n) )
Big-Omega et al. Intuitively
Asymptotic Notation    Mathematics Relation
O                      ≤
Ω                      ≥
θ                      =
o                      <
ω                      >
Pros and Cons of Asymptotic Analysis
Perspective: Kinds of Analysis
• Running time may depend on actual data input, not
just length of input
• Distinguish
– Worst Case
• Your worst enemy is choosing input
– Best Case
– Average Case
• Assumes some probabilistic distribution of
inputs
– Amortized
• Average time over many operations
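Amortized analysis is the least intuitive of the four cases, so a small simulation may help. The example is an assumption chosen for illustration and does not appear in the slides: an array that doubles its capacity when full. A single append can be expensive (every element is copied), yet the total copying over n appends stays below 2n, so the average cost per append is constant.

```cpp
#include <cstddef>

// Simulates n appends to an array that starts with capacity 1 and doubles
// its capacity whenever it is full. Returns the total number of element
// copies caused by resizing (hypothetical cost model for this sketch).
std::size_t copiesForAppends(std::size_t n) {
    std::size_t capacity = 1;
    std::size_t size = 0;
    std::size_t copies = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (size == capacity) {
            copies += size;  // move every existing element to the new block
            capacity *= 2;
        }
        ++size;              // store the appended element
    }
    return copies;
}
```

Nine appends cause 1 + 2 + 4 + 8 = 15 copies in total, even though the ninth append alone copies 8 elements: the worst case of one operation is linear, the amortized cost is not.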
Types of Analysis
Two orthogonal axes:
– Bound Flavor
• Upper bound (O, o)
• Lower bound (Ω, ω)
• Asymptotically tight (θ)
– Analysis Case
• Worst Case (Adversary)
• Average Case
• Best Case
• Amortized
16n^3 log_8(10n^2) + 100n^2 = O(n^3 log n)
• Eliminate low-order terms
• Eliminate constant coefficients
16n^3 log_8(10n^2) + 100n^2
16n^3 log_8(10n^2)
n^3 log_8(10n^2)
n^3 (log_8(10) + log_8(n^2))
n^3 log_8(10) + n^3 log_8(n^2)
n^3 log_8(n^2)
2n^3 log_8(n)
n^3 log_8(n)
n^3 log_8(2) log(n)
n^3 log(n)/3
n^3 log(n)


Editor's Notes

  • #3 Clever: ranges from techniques with which you are already familiar (e.g., representing simple lists) to ones that are more complex, such as hash tables or self-balancing trees. Elegant, mathematically deep, non-obvious. Making the different meanings of “efficient” precise is much of the work of this course!
  • #10 Note how much messier life becomes when we move from abstract concepts to concrete mechanisms. Being able to abstract the intrinsic problems from the real world scenarios -- or, alternatively, being able to realize an abstract concept in code -- is one of the important skills that computer scientists need to possess. Note that pseudocode is a concept, and a programming language is a mechanism.
  • #11 What does it mean to be “fast”? What about “elegant”?
  • #21 We talked last time about efficiency. Let’s refine this further. Confidence: the algorithm will work well in practice (gives your boss a reason to pay you right away!). Insight: alternative, better algorithms. Have an idea where potential bottlenecks are or will be.
  • #24 Ultimately we want to analyze algorithms, so let’s generate an algorithm to try out. The point of these “Hannah takes a break” series of slides is to encourage students to come up with the answers themselves. What I say should be minimal, and there really shouldn’t be any point in the lecture when I present the “right answer”, because it encourages the students not to say anything if they come to expect this. (n = 0) Hopefully, students will only pick linear and binary search …
  • #25 Okay, now that we have some algorithms to serve as examples, let’s analyze them! Here’s some hints before we begin …
  • #26 T(n) = n (we are looking for exact runtimes) Best = 3, Worst = 2n + 1 [note that average depends on if found]
  • #27 Uh-oh. We don’t know how to calculate the runtime exactly (although 142/3 should have taught us that it’s O( log n )). Let’s go to the next slide. We’ll come back and fill out these numbers later. Runtime: T(n) = 4 log_2 n + 2. Best = 4, Worst = 4 log_2 n + 2, Most of the time = 4 log_2 n + 2.
  • #28 1. T(n) = 4 + T( floor(n/2) ), with T(1) = 2. 2. T(n) = 4 + (4 + T( floor(n/4) )) = 8 + T( floor(n/4) ); T(n) = 4 + (4 + (4 + T( floor(n/8) ))) = 12 + T( floor(n/8) ). So, if k is the number of expansions, the general expression is T(n) = 4k + T( floor( n/(2^k) ) ). 3. Since the base case is n = 1, we need to find a value of k such that floor( n/(2^k) ) is 1 (which removes the recursive term from the RHS of the equation, thus giving us a closed-form expression). A value of k = log_2 n gives us this. Setting k = log_2 n, we have T(n) = 4 log_2 n + T(1) = 4 log_2 n + 2.
  • #30 Linear Search: 3, 2n+1 Binary Search: 4, 4logN + 2 Some students will probably say “Binary search, because it’s O( log n ), whereas linear search is O( n )”. But the point of this discussion is that big-O notation obscures some important factors (like constants!) and we really don’t know the input size. To make a meaningful comparison, we need to know more information. What information might that be? (1) what our priorities are (runtime? Memory footprint?) (2) what the input size is (or, even better, what the input is!) (3) our platform/machine – we saw on the earlier chart that architecture made a difference! (4) other … Big-O notation gives us a way to compare algorithms without this information, but the cost is a loss of precision.
  • #31 The y-axis is time – lower is better With the same algorithm, the faster machine wins out
  • #32 With different algorithms, constants matter. Does linear search beat out binary search?
  • #33 Binary search wins out – eventually
  • #34 Okay, so the point of all those pretty pictures is to show that, while constants matter, they don’t really matter as much as the order for “sufficiently large” input (“large” depends, of course, on the constants).
  • #35 We didn’t get very precise in our analysis of the UWID info finder; why? Didn’t know the machine we’d use. Is this always true? Do you buy that coefficients and low order terms don’t matter? When might they matter? (Linked list memory usage)
  • #38 We’ll use some specific terminology to describe asymptotic behavior. There are some analogies here that you might find useful.
  • #40 Whoa, what happened here? The picture seems to indicate that the “crossover point” happens around 95, whereas our inequality seems to indicate that the crossover happens at 19! The point we want to make is that big-O notation captures a relationship between f(n) and g(n) (ie, the fact that f(n) is “greater than or equal to” g(n)), not that it captures the actual constants that describe when the “crossover” happens. Remember, in big-O notation, the constants on the two functions don’t really matter. For c = 1, the crossover happens at n = 100 exactly
  • #43 ( f(n) ) = theta
  • #44 Note how they all look suspiciously like copy-and-paste definitions … Make sure that they notice the only difference between the relations -- less-than, less-than-or-equal, etc.
  • #45 In fact, it’s not just the intuitive chart, but it’s the chart of definitions! Notice how similar the formal definitions all were … they only differed in the relations which we highlighted in blue!
  • #46 Let’s do a think-pair-share: have the students (in teams of 2 or 3) come up with some of the pros and cons of asymptotic analysis, then have them come together and share as a group. Some points I hope to get out: asymptotic analysis is useful for quick-and-dirty comparisons of algorithms. It allows us to talk about algorithms separately from architecture. But in order to be so powerful, we sacrifice precision. Also, it doesn’t contrast implementation complexity.
  • #47 We already discussed the bound flavor. All of these can be applied to any analysis case. For example, we’ll later prove that sorting in the worst case takes at least n log n time. That’s a lower bound on a worst case. Average case is hard! What does “average” mean. For example, what’s the average case for searching an unordered list (as precise as possible, not asymptotic). WRONG! It’s about n, not 1/2 n. Why? You have to search the whole thing if the elt is not there. Note there’s two senses of tight. I’ll try to avoid the terminology “asymptotically tight” and stick with the lower def’n of tight. O(inf) is not tight!