Data Structures and Algorithms
Week 1: Intro to Data Structures and Algorithms
Ferdin Joe John Joseph, PhD
Faculty of Information Technology
Thai-Nichi Institute of Technology, Bangkok
Join our g+ community
Faculty of Information Technology, Thai -
Nichi Institute of Technology
https://bit.ly/2skCIK0
DSA 107 – A Road Map
How GPA works for DSA 107?
• Attendance (10%)
• Mid Exam (40%)
• Final Exam (50%)
Textbooks
What is a Program?
• A set of instructions
• Program = Data Structures + Algorithms
• Data Structure = a container that stores data
• Algorithm = Logic + Control
Lecture series for Semester 2, Data
Science and Analytics
Functions of Data Structures
• Add
• Index
• Key
• Position
• Priority
• Search
• Change
• Delete
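As a rough illustration of these operations, the sketch below uses two standard java.util collections: ArrayList for index (position) based access and HashMap for key based access. The class name OperationsDemo and the sample values are made up for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OperationsDemo {
    // Add, search, change, and delete items addressed by index (position).
    static List<String> listOperations() {
        List<String> names = new ArrayList<>();
        names.add("Ann");                    // add at the end
        names.add("Cho");
        names.add(1, "Bob");                 // add at a given index
        int where = names.indexOf("Cho");    // search: index 2
        names.set(where, "Cat");             // change the item that was found
        names.remove(0);                     // delete by index: removes "Ann"
        return names;                        // [Bob, Cat]
    }

    // The same operations, with items addressed by key instead of position.
    static Map<String, Integer> mapOperations() {
        Map<String, Integer> scores = new HashMap<>();
        scores.put("Ann", 70);               // add by key
        scores.put("Bob", 85);
        if (scores.containsKey("Ann")) {     // search by key
            scores.put("Ann", 75);           // change
        }
        scores.remove("Bob");                // delete by key
        return scores;                       // {Ann=75}
    }

    public static void main(String[] args) {
        System.out.println(listOperations());
        System.out.println(mapOperations());
    }
}
```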
Common Data Structures
• Array
• Stack
• Queue
• Linked List
• Tree
• Heap
• Hash Table
• Priority Queue
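To make the difference between three of these structures concrete, here is a small sketch using classes from java.util. The class name CommonStructuresDemo is hypothetical; ArrayDeque and PriorityQueue are the standard library types.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.PriorityQueue;
import java.util.Queue;

class CommonStructuresDemo {
    // Stack: last in, first out.
    static int stackPop() {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(1);
        stack.push(2);
        stack.push(3);
        return stack.pop();      // 3, the most recently pushed item
    }

    // Queue: first in, first out.
    static int queuePoll() {
        Queue<Integer> queue = new ArrayDeque<>();
        queue.offer(1);
        queue.offer(2);
        queue.offer(3);
        return queue.poll();     // 1, the oldest item
    }

    // Priority queue: the smallest item (natural ordering) comes out first.
    static int priorityPoll() {
        PriorityQueue<Integer> pq = new PriorityQueue<>();
        pq.offer(5);
        pq.offer(1);
        pq.offer(3);
        return pq.poll();        // 1, regardless of insertion order
    }
}
```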
How many Algorithms?
• Countless
Algorithm Strategies
• Greedy
• Divide and Conquer
• Dynamic Programming
• Exhaustive Search
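Two of these strategies can be contrasted on one small problem. The sketch below solves "fewest coins to make an amount" with a greedy method and with dynamic programming; the problem choice, the class name CoinChange, and the coin values are assumptions made for illustration.

```java
import java.util.Arrays;

class CoinChange {
    // Greedy: always take the largest coin that still fits.
    // Assumes coins is sorted ascending and contains 1.
    static int greedy(int[] coins, int amount) {
        int count = 0;
        for (int i = coins.length - 1; i >= 0; i--) {
            count += amount / coins[i];
            amount %= coins[i];
        }
        return count;
    }

    // Dynamic programming: best[a] is the fewest coins that sum to a.
    // Returns Integer.MAX_VALUE if the amount cannot be formed.
    static int dynamic(int[] coins, int amount) {
        int[] best = new int[amount + 1];
        Arrays.fill(best, Integer.MAX_VALUE);
        best[0] = 0;
        for (int a = 1; a <= amount; a++) {
            for (int c : coins) {
                if (c <= a && best[a - c] != Integer.MAX_VALUE) {
                    best[a] = Math.min(best[a], best[a - c] + 1);
                }
            }
        }
        return best[amount];
    }
}
```

With coins {1, 3, 4} and amount 6, greedy picks 4 + 1 + 1 (3 coins) while the dynamic programming version finds 3 + 3 (2 coins), so the simpler strategy is not always optimal.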
Which Data Structure or Algorithm is
better?
• Must Meet Requirement
• High Performance
• Low RAM footprint
• Easy to implement
• Encapsulated
Chapter 1 Basic Concepts
• Overview: System Life Cycle
• Algorithm Specification
• Data Abstraction
• Performance Analysis
• Performance Measurement
1.1 Overview: system life cycle (1/2)
• Good programmers regard large-scale computer
programs as systems that contain many complex
interacting parts.
• As systems, these programs undergo a
development process called the system life cycle.
1.1 Overview (2/2)
• We consider this cycle as consisting of five phases:
• Requirements
• Analysis: bottom-up vs. top-down
• Design: data objects and operations
• Refinement and Coding
• Verification: program proving, testing, and debugging
1.2 Algorithm Specification
• 1.2.1 Introduction
• An algorithm is a finite set of instructions that accomplishes a
particular task.
• Criteria
• input: zero or more quantities that are externally supplied
• output: at least one quantity is produced
• definiteness: clear and unambiguous
• finiteness: terminate after a finite number of steps
• effectiveness: instruction is basic enough to be carried out
• A program does not have to satisfy the finiteness criterion (for example, an operating system keeps running until it is shut down).
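One way to see the five criteria at work is Euclid's greatest common divisor method, chosen here only as a familiar example. The class name Gcd and the assumption that both inputs are non-negative and not both zero are ours.

```java
class Gcd {
    // Input: two non-negative integers, not both zero (assumption).
    // Output: exactly one quantity, their greatest common divisor.
    static int gcd(int a, int b) {
        while (b != 0) {       // finiteness: b strictly decreases until it reaches 0
            int r = a % b;     // definiteness and effectiveness: one basic, unambiguous operation
            a = b;
            b = r;
        }
        return a;
    }
}
```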
1.2 Algorithm Specification
• Representation
• A natural language, like English or Chinese.
• A graphic, like flowcharts.
• A computer language, like C++, Java etc.
• Algorithms + Data structures = Programs
• Sequential search vs. Binary search
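The two searches named above can be sketched as follows. The class and method names are illustrative; the binary search assumes the array is already sorted in ascending order.

```java
class SearchDemo {
    // Sequential search: works on any array, up to n comparisons.
    static int sequentialSearch(int[] a, int target) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == target) {
                return i;
            }
        }
        return -1;   // not found
    }

    // Binary search: requires a sorted array, about log2(n) comparisons.
    static int binarySearch(int[] a, int target) {
        int low = 0;
        int high = a.length - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2;   // avoids overflow of low + high
            if (a[mid] == target) {
                return mid;
            } else if (a[mid] < target) {
                low = mid + 1;                  // discard the left half
            } else {
                high = mid - 1;                 // discard the right half
            }
        }
        return -1;   // not found
    }
}
```

Both return the index of the target or -1; the representation of the algorithm is the same language, but the amount of work differs greatly on large sorted inputs.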
1.3 Data abstraction (1/4)
• Data Type
A data type is a collection of objects and a set of
operations that act on those objects.
• For example, the data type int consists of the objects {0, +1, -1, +2, -2, …, INT_MAX, INT_MIN} and the operations +, -, *, /, and %. (In Java, INT_MAX and INT_MIN correspond to Integer.MAX_VALUE and Integer.MIN_VALUE.)
• The data types of Java
• The basic data types: char, int, float, double, etc.
• The group data types: arrays and array-type data types
• The user-defined types
1.3 Data abstraction (2/4)
• Abstract Data Type
• An abstract data type(ADT) is a data type
that is organized in such a way that
the specification of the objects and
the operations on the objects is separated from
the representation of the objects and
the implementation of the operations.
• We know what it does, but not necessarily how it will do it.
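In Java, this separation is commonly expressed with an interface (the specification) and one or more classes (the implementations). The following is a minimal sketch; IntStack, ArrayIntStack, DequeIntStack, and AdtDemo are hypothetical names, and the capacity of 100 is arbitrary.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Specification: what a stack of ints does, with no representation details.
interface IntStack {
    void push(int value);
    int pop();
    boolean isEmpty();
}

// Implementation 1: a fixed-size array.
class ArrayIntStack implements IntStack {
    private final int[] data = new int[100];
    private int top = 0;

    public void push(int value) {
        if (top == data.length) throw new IllegalStateException("stack is full");
        data[top++] = value;
    }

    public int pop() {
        if (top == 0) throw new IllegalStateException("stack is empty");
        return data[--top];
    }

    public boolean isEmpty() { return top == 0; }
}

// Implementation 2: delegates to java.util.ArrayDeque.
class DequeIntStack implements IntStack {
    private final Deque<Integer> data = new ArrayDeque<>();

    public void push(int value) { data.push(value); }
    public int pop() { return data.pop(); }
    public boolean isEmpty() { return data.isEmpty(); }
}

class AdtDemo {
    // Client code depends only on the interface, so either implementation works.
    static int sumAndDrain(IntStack s) {
        int sum = 0;
        while (!s.isEmpty()) {
            sum += s.pop();
        }
        return sum;
    }
}
```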
1.3 Data abstraction (3/4)
• Specification vs. Implementation
• An ADT is implementation independent
• Operation specification
• function name
• the types of arguments
• the type of the results
• The functions of a data type can be classified into several categories:
• creator / constructor
• transformers
• observers / reporters
1.3 Data abstraction (4/4)
• Example [Abstract data type Natural_Number]
(the notation ::= reads "is defined as")
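The slide's own definition of Natural_Number is not reproduced in this text. As a hedged sketch of what such an ADT might look like in Java, the class below assumes the operations zero, successor, add, subtract, isZero, and isEqualTo, assumes that results are capped at Integer.MAX_VALUE, and assumes that subtracting a larger number yields zero. All names are illustrative.

```java
final class NaturalNumber {
    private final int value;   // representation, hidden from clients

    private NaturalNumber(int value) {
        this.value = value;
    }

    // Creator
    static NaturalNumber zero() {
        return new NaturalNumber(0);
    }

    // Transformers: each returns a new natural number
    NaturalNumber successor() {
        return value == Integer.MAX_VALUE ? this : new NaturalNumber(value + 1);
    }

    NaturalNumber add(NaturalNumber other) {
        long sum = (long) value + other.value;                 // widen to avoid int overflow
        return new NaturalNumber((int) Math.min(sum, Integer.MAX_VALUE));
    }

    NaturalNumber subtract(NaturalNumber other) {
        return value < other.value ? zero() : new NaturalNumber(value - other.value);
    }

    // Observers: report on a value without changing it
    boolean isZero() {
        return value == 0;
    }

    boolean isEqualTo(NaturalNumber other) {
        return value == other.value;
    }

    int toInt() {
        return value;
    }
}
```

The comments mark which operations fall into the creator, transformer, and observer categories listed earlier.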
1.4 Performance analysis (1/4)
• Criteria
• Is it correct?
• Is it readable?
• …
• Performance Analysis (machine independent)
• space complexity: storage requirement
• time complexity: computing time
• Performance Measurement (machine dependent)
1.4 Performance analysis (2/4)
• 1.4.1 Space Complexity:
S(P) = C + S_P(I)
• Fixed space requirements (C): independent of the characteristics of the inputs and outputs
• instruction space
• space for simple variables, fixed-size structured variables, and constants
• Variable space requirements (S_P(I)): depend on the instance characteristic I
• number, size, and values of the inputs and outputs associated with I
• recursive stack space, formal parameters, local variables, return address
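The difference between fixed and variable space can be seen by summing an array in two ways. This is a sketch with made-up names; it assumes the array itself is not counted, since Java passes a reference to it rather than a copy.

```java
class SpaceDemo {
    // Iterative sum: only a few simple variables, so the variable part
    // of the space requirement does not grow with the array length.
    static long iterativeSum(int[] list) {
        long total = 0;
        for (int x : list) {
            total += x;
        }
        return total;
    }

    // Recursive sum of the first n items: each call adds a stack frame
    // (parameters, return address), so the stack depth grows linearly with n.
    static long recursiveSum(int[] list, int n) {
        if (n == 0) {
            return 0;
        }
        return recursiveSum(list, n - 1) + list[n - 1];
    }
}
```

Both methods return the same result, but only the recursive one has a variable space requirement proportional to n.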
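1.4 Performance analysis (3/4)
• 1.4.2 Time Complexity:
T(P) = C + T_P(I)
• The time, T(P), taken by a program, P, is the sum of its compile time, C, and its run (or execution) time, T_P(I)
• Fixed time requirements
• Compile time (C), independent of instance characteristics
• Variable time requirements
• Run (execution) time T_P
• T_P(n) = c_a ADD(n) + c_s SUB(n) + c_l LDA(n) + c_st STA(n), where c_a, c_s, c_l, and c_st are the constant times for one addition, subtraction, load, and store, and ADD(n), SUB(n), LDA(n), and STA(n) count how many of each are performed for instance characteristic n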
1.4 Performance analysis (4/4)
• A program step is a syntactically or semantically meaningful program segment whose execution time is independent of the instance characteristics.
• Example (each statement counts as one step, regardless of machine)
• abc = a + b + b * c + (a + b - c) / (a + b) + 4.0
• abc = a + b + c
• Methods to compute the step count
• Introduce a variable count into the program
• Tabular method
• Determine the number of steps contributed by each statement: steps per execution × frequency
• Add up the contributions of all statements
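The first method (introducing a count variable) can be sketched on a simple array sum. The class name StepCount and the choice of which statements count as steps are assumptions for this example.

```java
class StepCount {
    static int count = 0;   // step counter, present only for instrumentation

    // Sums the first n items of list and counts program steps along the way.
    static float sum(float[] list, int n) {
        float tempSum = 0;
        count++;                        // the assignment above
        for (int i = 0; i < n; i++) {
            count++;                    // a loop test that succeeded
            tempSum += list[i];
            count++;                    // the assignment in the loop body
        }
        count++;                        // the final loop test, which failed
        count++;                        // the return statement
        return tempSum;
    }
}
```

For n items the counter ends at 2n + 3. The tabular method gives the same total: the initial assignment contributes 1 × 1, the for statement 1 × (n + 1), the loop body 1 × n, and the return 1 × 1.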
Array ADT – Java Implementation
An Array-Based Implementation
of the ADT List
public class ListArrayBased implements ListInterface {
    private static final int MAX_LIST = 50;  // maximum number of items the list can hold
    private Object[] items;                  // an array of list items
    private int numItems;                    // number of items in list
An Array-Based Implementation
of the ADT List
    public ListArrayBased() {
        items = new Object[MAX_LIST];
        numItems = 0;
    } // end default constructor
An Array-Based Implementation
of the ADT List
    public boolean isEmpty() {
        return (numItems == 0);
    } // end isEmpty

    public int size() {
        return numItems;
    } // end size
Insertion into Array
What happens if you want to insert an item at a specified position in an existing array?
• Write over the current contents at the given index (which might not be appropriate), or
• Move the item originally at the given index up one position, and shuffle up all the items after that index
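The second option can be sketched on a plain int array. Note the assumptions: this helper uses 0-based positions (the ListArrayBased class in these slides uses 1-based indices), and the names InsertDemo and insertAt are made up.

```java
import java.util.Arrays;

class InsertDemo {
    // Inserts value at 0-based position pos among the first count items of a.
    // Assumes count < a.length and 0 <= pos <= count. Returns the new count.
    static int insertAt(int[] a, int count, int pos, int value) {
        // Shift from the end toward pos so that no item is overwritten.
        for (int i = count; i > pos; i--) {
            a[i] = a[i - 1];
        }
        a[pos] = value;
        return count + 1;
    }

    public static void main(String[] args) {
        int[] a = {10, 20, 30, 0, 0};   // three items in use, two spare slots
        int count = insertAt(a, 3, 1, 15);
        System.out.println(Arrays.toString(a) + " count=" + count);
        // prints [10, 15, 20, 30, 0] count=4
    }
}
```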
An Array-Based Implementation
of the ADT List
    public void add(int index, Object item)
            throws ListException,
                   ListIndexOutOfBoundsException {
        if (numItems >= MAX_LIST) {
            throw new ListException("ListException on add:" +
                                    " out of memory");
        } // end if
        if (index < 1 || index > numItems+1) {
            // index out of range
            throw new ListIndexOutOfBoundsException(
                "ListIndexOutOfBoundsException on add");
        } // end if
An Array-Based Implementation
of the ADT List
        // make room for new element by shifting all items at
        // positions >= index toward the end of the
        // list (no shift if index == numItems+1)
        for (int pos = numItems; pos >= index; pos--) {
            items[pos] = items[pos-1];
        } // end for
        // insert new item
        items[index-1] = item;
        numItems++;
    } // end add
Removal from Arrays
What happens if you want to remove an item from a specified position in an existing array?
• Leave gaps in the array, i.e. indices that contain no elements, which in practice means that the array element has to be given a special value to indicate that it is “empty”, or
• Shuffle down all the items after the removed item’s index
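The shuffle-down option can be sketched on a plain int array. As assumptions, this helper uses 0-based positions (unlike the 1-based ListArrayBased class in these slides), uses 0 as the value for an unused slot, and the names RemoveDemo and removeAt are made up.

```java
class RemoveDemo {
    // Removes the item at 0-based position pos from the first count items of a.
    // Assumes 0 <= pos < count <= a.length. Returns the new count.
    static int removeAt(int[] a, int count, int pos) {
        // Shift every later item one slot toward the beginning.
        for (int i = pos; i < count - 1; i++) {
            a[i] = a[i + 1];
        }
        a[count - 1] = 0;   // reset the vacated slot (0 marks "unused" here)
        return count - 1;
    }
}
```

Starting from {10, 20, 30, 40, 0} with four items in use, removing position 1 leaves {10, 30, 40, 0, 0} with three items in use.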
An Array-Based Implementation
of the ADT List
    public void remove(int index)
            throws ListIndexOutOfBoundsException {
        if (index >= 1 && index <= numItems) {
            // delete item by shifting all items at
            // positions > index toward the beginning of the list
            // (no shift if index == size)
            for (int pos = index+1; pos <= size(); pos++) {
                items[pos-2] = items[pos-1];
            } // end for
            numItems--;
            items[numItems] = null; // clear the vacated slot so it holds no stale reference
        }
        else { // index out of range
            throw new ListIndexOutOfBoundsException(
                "ListIndexOutOfBoundsException on remove");
        } // end if
    } // end remove
An Array-Based Implementation
of the ADT List
    public Object get(int index)
            throws ListIndexOutOfBoundsException {
        if (index >= 1 && index <= numItems) {
            return items[index-1];
        }
        else { // index out of range
            throw new ListIndexOutOfBoundsException(
                "ListIndexOutOfBoundsException on get");
        } // end if
    } // end get
An Array-Based Implementation -
Summary
• Good things:
• Fast, random access to elements
• Very memory efficient: very little memory is required other than that needed to store the contents (but see the fixed-size limitation below)
• Bad things:
• Slow deletion and insertion of elements
• Size must be known when the array is created and is fixed (static)
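One common way to soften the fixed-size limitation, not covered in these slides, is to copy the items into a larger array when the current one fills up. The sketch below is a hypothetical GrowableArray class that doubles its capacity; the starting capacity of 2 and the 0-based get are arbitrary choices for the example.

```java
import java.util.Arrays;

class GrowableArray {
    private Object[] items = new Object[2];   // small initial capacity for demonstration
    private int numItems = 0;

    // Appends an item, doubling the backing array when it is full.
    void append(Object item) {
        if (numItems == items.length) {
            items = Arrays.copyOf(items, items.length * 2);   // allocate and copy
        }
        items[numItems++] = item;
    }

    // Returns the item at a 0-based index.
    Object get(int index) {
        if (index < 0 || index >= numItems) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return items[index];
    }

    int size() {
        return numItems;
    }

    int capacity() {
        return items.length;
    }
}
```

The trade-off is that an occasional append costs time proportional to the number of items, and some allocated slots stay unused.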
Recommended Reading
https://www.quora.com/Does-a-data-scientist-need-to-know-algorithms-and-data-structures-as-well-as-a-software-engineer
Next Week
Linked List ADT
Doubly Linked List
Circular Linked List
Implementation in Java
