Nico Ludwig (@ersatzteilchen)
Collections – Part V
● Collections – Part V
– Implementation Strategies of Map/Dictionary-like associative Collections
● BST-based Implementation
● Hashtable-based Implementation
– A 20K Miles Perspective on Trees in Computer Science
– Equality
– Multimaps
– Set-like associative Collections
TOC
● In the last lecture we learned about associative collections.
– Associative collections allow associating keys with values.
● Example: In white pages names are associated with phone numbers.
– Associative collections allow looking up a value for a certain key, e.g. getting the phone number of a specific person.
– Associative collections could be implemented with two arrays as shown before, but this would be very inefficient for common cases.
● In this lecture we're going to understand how associative collections are implemented.
● We'll start with SortedDictionary, which was introduced in the last lecture. We already know about SortedDictionary that
– it allows only one value per key,
– that it organizes its keys by equivalence,
– that a .Net type provides equivalence by implementing IComparable or delegating equivalence to objects implementing IComparer.
● Ok, the organization of keys is based on equivalence, but when a key is in a SortedDictionary, where does it have to "be"?
– I.e. where and how will it be stored?
– We're going to understand this in this lecture!
Implementation of Associative Collections
● In contrast to indexed/sequential collections, associative collections get and set items by analyzing the relationships between them.
● The idea of SortedDictionary is to keep the contained items sorted by key whilst adding items!
– So, its way of key organization is the analysis of the sort order of keys of the items.
● To make this happen, SortedDictionary is implemented with a tree-like storage organization for the keys.
– When we analyzed algorithms we learned that the most efficient sorting algorithms can be visualized and analyzed with trees.
● I.e. those implementing "divide and conquer", like mergesort and quicksort.
● Before we discuss SortedDictionary's tree-based implementation, we're going to talk about alternative implementations.
● Sidebar: Java's implementation of a sorted associative collection is called TreeMap. It implements the interface Map.
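The sidebar on TreeMap can be tried out with a few lines of Java. This is only a small sketch: the class name TreeMapSketch and the helper methods are made up for illustration, the phone numbers are the ones used later in this lecture.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

class TreeMapSketch {
    // Builds a phone book whose keys are kept in their natural (alphabetical) order.
    static TreeMap<String, Integer> phoneBook() {
        TreeMap<String, Integer> book = new TreeMap<>();
        book.put("Gil", 7764);
        book.put("Bud", 3821);
        book.put("Will", 4689);
        return book;
    }

    // Same content, but the equivalence is delegated to a Comparator (reverse order).
    static TreeMap<String, Integer> reversedPhoneBook() {
        TreeMap<String, Integer> book = new TreeMap<>(Comparator.<String>reverseOrder());
        book.putAll(phoneBook());
        return book;
    }

    public static void main(String[] args) {
        // Iteration follows the sort order of the keys, not the order of insertion.
        for (Map.Entry<String, Integer> entry : phoneBook().entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
        System.out.println(reversedPhoneBook().firstKey());
    }
}
```

The Comparator plays the role that IComparer plays for .Net's SortedDictionary.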
Associative Collections – What's in a SortedDictionary?
● SortedDictionary could be implemented with a sorted list.
– Finding would be fast (O(log n)), but inserting would be costly (O(n), because items must be shifted and the list possibly resized).
– It is simple to code.
● SortedDictionary could be implemented with a sorted linked list.
– Inserting would be fast if the location is known (O(1)), but finding would be costly (O(n), finding involves the iteration of a linked list).
– More memory per item is required (one item plus a few references, whereas (sorted) lists store just the item, due to contiguous memory).
● Problems of these approaches:
– For lists contiguous memory is the problem. → It makes the data structure rigid. → Resizing required.
– For linked lists finding items is a problem due to the loose structure. → "Linked iteration"
● Let's finally discuss how we come from sorted lists and sorted linked lists to the real implementation of SortedDictionary.
[Figure: inserting "Helen" into a sorted list and into a sorted linked list, each holding "Clark", "Ed", "Gil" and "James".]
SortedDictionary – two alternative Implementations
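The trade-off between the two alternatives can be sketched in Java. The class and method names below are hypothetical; the sketch only assumes the standard ArrayList and LinkedList types as stand-ins for a "sorted list" and a "sorted linked list".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

class SortedListSketch {
    // Sorted list: finding the position is O(log n), but add(index, ...) is O(n),
    // because all following items are shifted (and the array may be resized).
    static void insertSorted(List<String> sorted, String key) {
        int pos = Collections.binarySearch(sorted, key);
        if (pos < 0) {
            pos = -(pos + 1); // binarySearch encodes the insertion point for missing keys
        }
        sorted.add(pos, key);
    }

    // Sorted linked list: finding the position is O(n) ("linked iteration"),
    // but the insertion itself is O(1) once the position is known.
    static void insertSortedLinked(LinkedList<String> sorted, String key) {
        ListIterator<String> it = sorted.listIterator();
        while (it.hasNext()) {
            if (it.next().compareTo(key) > 0) {
                it.previous(); // step back, so that key lands before the greater item
                break;
            }
        }
        it.add(key);
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(Arrays.asList("Clark", "Ed", "Gil", "James"));
        insertSorted(list, "Helen");
        System.out.println(list);

        LinkedList<String> linked = new LinkedList<>(Arrays.asList("Clark", "Ed", "Gil", "James"));
        insertSortedLinked(linked, "Helen");
        System.out.println(linked);
    }
}
```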
● When we examine a sorted linked list we'll find an interesting way of thinking about finding items.
– If we had a reference to the middle item, we could determine the "direction" or half to insert a new item to keep the linked list sorted.
– Therefore it would be nice to allow moving in both directions, so we should use a sorted doubly linked list (3rd alternative)!
– The idea of referencing only the middle item of an already sorted doubly linked list can be recursively applied to the sorted halves.
– The resulting structure is a tree, upon which the real implementation of SortedDictionary is built!
[Figure: the sorted doubly linked list "Bud", "Clark", "Ed", "Gil", "Helen", "James", "Will" is turned step by step into a tree with the middle item "Gil" at the top; "Kyle" is the new item to be inserted.]
SortedDictionary – Coming to the real Implementation
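The idea of recursively referencing the middle item can be sketched in Java as follows. This is a minimal illustration under the assumption of plain string keys; the names BstSketch, fromSorted and insert are invented here and say nothing about how SortedDictionary is really coded.

```java
import java.util.ArrayList;
import java.util.List;

class BstSketch {
    static final class Node {
        final String key;
        Node left, right;
        Node(String key) { this.key = key; }
    }

    // Recursively makes the middle item of sorted[from..to) the root of a subtree.
    static Node fromSorted(String[] sorted, int from, int to) {
        if (from >= to) {
            return null;
        }
        int mid = (from + to) >>> 1;
        Node node = new Node(sorted[mid]);
        node.left = fromSorted(sorted, from, mid);
        node.right = fromSorted(sorted, mid + 1, to);
        return node;
    }

    // Walks left (smaller key) or right (greater key) until a free spot is found.
    static Node insert(Node root, String key) {
        if (root == null) {
            return new Node(key);
        }
        int cmp = key.compareTo(root.key);
        if (cmp < 0) {
            root.left = insert(root.left, key);
        } else if (cmp > 0) {
            root.right = insert(root.right, key);
        }
        return root; // an already present key is left alone
    }

    // Collects the keys from left to right, i.e. in sorted order.
    static void collect(Node node, List<String> into) {
        if (node != null) {
            collect(node.left, into);
            into.add(node.key);
            collect(node.right, into);
        }
    }

    public static void main(String[] args) {
        String[] names = {"Bud", "Clark", "Ed", "Gil", "Helen", "James", "Will"};
        Node root = fromSorted(names, 0, names.length);
        insert(root, "Kyle");
        List<String> keys = new ArrayList<>();
        collect(root, keys);
        System.out.println(root.key + " is the root, keys: " + keys);
    }
}
```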
● Trees represent one of the many data structures available in cs (others are graphs, relational, heap, stack etc.)
– Similar to collections data structures can be categorized in different ways, e.g. a tree is a special form of a graph.
● Basic features of trees:
– Trees are recursive branching structures consisting of linked nodes.
– Trees have only one root node.
● In cs the root of a tree is drawn at the top in graphical representations.
– A node can be reached by exactly one path from the root.
● This holds for plain parent-to-child paths; XPath, for example, offers various expressions to select any node from the root.
● Trees represent data that has a hierarchy. This is very common in computing.
– In the last example the hierarchy was the order: left => less than root, right => greater than root.
– Directory and file structures.
– A function call hierarchy: a call stack.
– "Has a" and "is a" associations in object-oriented systems.
– The structure of an XML file.
Trees in Computer Science (cs)
● The final solution we found to handle sorted collections is a special form of a tree, a binary search tree (BST).
– The implementation of SortedDictionary is based on a BST (BST-based).
– The core strategy of BSTs is to maintain pointers/references to the middle of subtrees.
● Here some general terminology on trees (seen from "James"):
– "James" is a node or cell. The arrows can be called edges.
– "James" has two children.
● Below "James" there is a subtree of two children.
– "James"' parent is "Gil".
– "Gil" is the root of the tree.
● In computer science trees are drawn upside down, having the root at the top.
– "Will" is a leaf, i.e. it has no children at all.
● A node having only one child is sometimes called half-leaf.
– "Clark" and "James" are siblings.
● The BST is called "binary", because each node has at most two children.
– A binary search can be issued on a BST in a very simple manner.
[Figure: BST with the root "Gil"; its children are "Clark" (with the children "Bud" and "Ed") and "James" (with the children "Helen" and "Will"); the leaves hold null references.]
Trees – Terminology
● There also exist ternary, n-ary etc. trees.
● Basically all operations on trees involve recursion!
● The iteration of a tree is called traversal. It means to visit every node in the tree from a certain starting node.
– (1) Do something with the current node.
– (2) If the left (or right) node is not null, make it the current node and continue with (1). → Recurse!
● There are three classical (depth-first) ways to traverse (binary) trees:
– Inorder: visit the left node, then the root and finally the right node.
– Postorder: visit the left node, then the right node and finally the root.
– Preorder: visit the root, then the left node and finally the right node.
● The most important traversal for BSTs is inorder, because it visits the nodes in sorted order.
– Notice, how the way of traversal could be encapsulated behind the iterator design pattern!
public class Node { // Mind the recursive definition!
    public string Value { get; set; }
    public Node Left { get; set; }
    public Node Right { get; set; }
}
public static void PrintTreePostorder(Node root) {
    if (null != root) {
        PrintTreePostorder(root.Left); // Recurse!
        PrintTreePostorder(root.Right); // Recurse!
        Console.WriteLine(root.Value);
    }
}
public static void PrintTreeInorder(Node root) {
    if (null != root) {
        PrintTreeInorder(root.Left); // Recurse!
        Console.WriteLine(root.Value);
        PrintTreeInorder(root.Right); // Recurse!
    }
}
public static void PrintTreePreorder(Node root) {
    if (null != root) {
        Console.WriteLine(root.Value);
        PrintTreePreorder(root.Left); // Recurse!
        PrintTreePreorder(root.Right); // Recurse!
    }
}
Traversal of Trees
● The deletion of all nodes in a BST needs to be done postorder.
● Among others there also exists "level-order", which is also called "breadth-first"; it visits the nodes level by level, beginning at the root, each level from left to right.
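Level-order traversal is not recursive in its usual form, so a short Java sketch may help. It assumes a Node type analogous to the C# one above and rebuilds the example tree with "Gil" as root; the names LevelOrderSketch, leaf and exampleTree are made up for this illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class LevelOrderSketch {
    static final class Node {
        final String value;
        final Node left, right;
        Node(String value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    // Visits the root first, then its children from left to right, then their children etc.
    static List<String> levelOrder(Node root) {
        List<String> visited = new ArrayList<>();
        Queue<Node> pending = new ArrayDeque<>();
        if (root != null) {
            pending.add(root);
        }
        while (!pending.isEmpty()) {
            Node current = pending.remove();
            visited.add(current.value);
            if (current.left != null) {
                pending.add(current.left);
            }
            if (current.right != null) {
                pending.add(current.right);
            }
        }
        return visited;
    }

    static Node leaf(String value) {
        return new Node(value, null, null);
    }

    // The example tree of this lecture.
    static Node exampleTree() {
        return new Node("Gil",
                new Node("Clark", leaf("Bud"), leaf("Ed")),
                new Node("James", leaf("Helen"), leaf("Will")));
    }

    public static void main(String[] args) {
        System.out.println(levelOrder(exampleTree()));
    }
}
```

A queue replaces the call stack here: nodes are processed in the order in which they were discovered.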
● So SortedDictionary is implemented with a BST.
– The string that was used in each node in former examples plays the role of SortedDictionary's search key.
– The nodes of SortedDictionary have one more field: the value that is associated with the search key.
● Using a BST in the implementation allows finding items fast.
– E.g. the methods Add() and ContainsKey() apply a binary search on SortedDictionary's BST to locate items.
– The involved search algorithms are implemented recursively.
● Complexity.
– The cost for finding one item depends only on the height of the tree.
– We've already discussed that the height of a tree with n nodes is about log n.
– This is valid for perfectly balanced trees.
– Other costs like swapping points will not be taken into consideration as always.
– This makes the complexity for finding (and related operations) O(log n) for BSTs.
● Max. ten comparisons to find an item in 1000 items.
[Figure: the BST with the root "Gil" together with the key "Kyle"; a (Node) is shown with the fields key ("Gil"), value (8758), root (null), left and right.]
SortedDictionary – final Words
● Looking at the key we know on which half an item has to be inserted or found (left => less than root; right => greater than root; this makes divide and conquer work).
● BSTs can be kept balanced internally to maintain logarithmic behavior; this yields an additional cost factor.
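The claim of at most ten comparisons for 1000 items can be checked with a small Java experiment. The sketch assumes int keys 0..999 and a perfectly balanced tree built by always taking the middle key; all names (BstLookupSketch, build, comparisons, maxComparisons) are hypothetical.

```java
class BstLookupSketch {
    static final class Node {
        final int key;      // the search key
        final String value; // the value associated with the search key
        Node left, right;
        Node(int key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Builds a perfectly balanced BST holding the keys from (inclusive) to (exclusive).
    static Node build(int from, int to) {
        if (from >= to) {
            return null;
        }
        int mid = (from + to) >>> 1;
        Node node = new Node(mid, "value of " + mid);
        node.left = build(from, mid);
        node.right = build(mid + 1, to);
        return node;
    }

    // Recursive binary search: returns the number of nodes compared until the key
    // is found, or -1 if the key is not in the tree.
    static int comparisons(Node node, int key) {
        if (node == null) {
            return -1;
        }
        if (key == node.key) {
            return 1;
        }
        int below = comparisons(key < node.key ? node.left : node.right, key);
        return below < 0 ? -1 : below + 1;
    }

    // Looks up every key once and reports the most expensive lookup.
    static int maxComparisons(int itemCount) {
        Node root = build(0, itemCount);
        int max = 0;
        for (int key = 0; key < itemCount; key++) {
            max = Math.max(max, comparisons(root, key));
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(maxComparisons(1000));
    }
}
```

Each visited node counts as one comparison here, so the result is the height of the balanced tree.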
Associative Collections – A completely different Strategy
● Up to now we discussed associative collections using equivalence to manage keys.
– We used a BST and got logarithmic cost for inserting and finding elements. - This is pretty fast!
– An interesting point is that the organization of BSTs is concretely imaginable/understandable/drawable.
● Now we're going to discuss associative collections using equality to manage keys, which is really cool!
– Instead of keeping the associative collection sorted, we manage a set of collections that collect items having something in common.
– These individual collections are then called buckets.
● E.g. an associative collection telephoneDictionary: for each bucket the contained strings will have the same first character.
– (This is exactly how "real" dictionaries work, where there are tabs collecting words starting with the same character. The tabs play the role of the buckets.)
– This type of associative collection is called hashtable.
– A so called hash-function maps a key to a hash-code.
● In this case the hash-function maps a string (= the key) to its first letter (= the hash-code).
– The hash-function determines, in which bucket to find or insert a key (or key-value-pair).
● When the bucket is determined, a specific operation only takes place on the items in that bucket.
[Figure: telephoneDictionary with three buckets: "B" holds "Bud" (3821) and "Ben" (9427), "G" holds "Gil" (7764), "W" holds "Will" (4689) and "Walt" (1797).]
public class Bucket {
    public string HashCode { get; set; }
    public IList<KeyValuePair<string, int>> Items { get; set; }
}
Associative Collections – Hashtable-based Implementations - Keys
● OK, but how will an implementation using a hashtable help? - To understand it, let's dissect hashtable starting with the keys.
● The idea of a hashtable is to have a hash-function that calculates a position in a table from the input item.
– How can we have a function that is able to do that for an item?
– The idea is to have a method of an item-object, more specifically the key, that calculates this position.
● In the frameworks Java and .Net each UDT inherits a base type (Object in both frameworks), forming a cosmic hierarchy.
– In Java a UDT can override Object's method hashCode() in order to calculate the hash-code.
– In .Net a UDT can override Object's method GetHashCode() in order to calculate the hash-code.
– Now we have hash-functions/methods to "hash" a name string to the int value of the first letter (incl. some error handling).
– We can use instances of NameKey as keys for hashtables!
// C#
public class NameKey {
    public string Name { get; set; }
    public override int GetHashCode() {
        // Return int-value of first letter.
        return string.IsNullOrEmpty(Name) ? 0 : Name[0];
    }
}
// Java
public class NameKey { // (members hidden)
    public String name; // (getter/setter hidden)
    @Override
    public int hashCode() {
        // Return int-value of first letter.
        return (null == getName() || getName().isEmpty()) ? 0 : getName().charAt(0);
    }
}
● Java's and .Net's inherited equals()/Equals() (inherited from Object esp. for .Net reference types) will just compare references. It means that equals()/Equals() will only return true, if the same object is compared to itself, so the inherited implementations do an identity comparison.
● Java's inherited hashCode() (inherited from Object) returns an implementation-specific, identity-based number; it is often described as the object's address in memory, but that is not guaranteed. .Net's inherited GetHashCode() (inherited from Object i.e. for .Net reference types) likewise returns an identity-based number, which is not guaranteed to be unique.
Associative Collections – Hashtable-based Implementations - .Net's Dictionary
● In .Net we can now use NameKey as type for the key in an IDictionary that is implemented in terms of a hashtable.
– The hashtable-based implementation of IDictionary is simply called Dictionary! It can (of course) be used like any other IDictionary:
– When GetHashCode() is overridden in the key's type, the hashtable will automatically work:
● When those items are added into the telephoneDictionary, the following will happen:
– (1) For the key NameKey{ Name = "Bud" } the hash-code is retrieved. GetHashCode() returns 66.
– (2) In the hashtable-array (here an int[]) the entry on the index 66 is fetched:
● If there is an empty entry at 66, it will be set to the value 3821,
● otherwise the value at the existing entry at 66 will be overwritten with the value 3821.
– Another key (like NameKey{ Name = "Gil" }) yields another hash-code and another index in the hashtable.
– The operations in (2) are O(1) operations, because accessing indexed collections (like arrays) on indexes only costs O(1).
● Oversimplification! There's more and it's getting more involved! And we've to understand it to get the whole picture!
IDictionary<NameKey, int> telephoneDictionary = new Dictionary<NameKey, int>();
telephoneDictionary[new NameKey{ Name = "Bud" }] = 3821; // GetHashCode() returns 66.
telephoneDictionary[new NameKey{ Name = "Gil" }] = 7764; // GetHashCode() returns 71.
[Figure: telephoneDictionary with hashTable : int[]; adding (NameKey) "Bud" with 3821 fills the slot at index 66, adding (NameKey) "Gil" with 7764 fills the slot at index 71; all other slots stay empty.]
● What we discuss here is a sparse array implementation of a hashtable, in which many slots of the table remain empty, whilst only few slots of the table with the indexes matching the hash-code are filled. Real-world implementations of hashtable are much more compact (usually the index is calculated as "index = hash-code % hashtable-length"), but we're not going to discuss them in this course, because the mere idea is the same.
● We didn't discuss the implementation of hashCode()/GetHashCode() in great depth, esp. if many fields are to be taken into consideration in the hash-code. Many suggestions involve the combination of the hash-codes with xor operations in order to get a good distribution including the multiplication of magic constants to yield prime numbers, which may result in yet a better distribution, or retrieving hash-codes of different field-types in different ways. That's all fine, but many of these tips rely on special assumptions of the platform and the implementation. Using a simple xor combination is OK for most cases and magic should only be taken into consideration, if performance- (i.e. lookup-) problems are present. Don't be too clever!
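Both remarks can be illustrated with a few lines of Java. The sketch below is not taken from any real hashtable implementation: slotFor shows the "index = hash-code % hashtable-length" idea, and combine shows a plain xor combination for two hypothetical fields firstName and lastName.

```java
import java.util.Objects;

class HashIndexSketch {
    // Maps an arbitrary hash-code to a slot of a table with tableLength slots.
    // Math.floorMod keeps the result non-negative, even for negative hash-codes.
    static int slotFor(int hashCode, int tableLength) {
        return Math.floorMod(hashCode, tableLength);
    }

    // Simple xor combination of the hash-codes of two (hypothetical) fields.
    // Objects.hashCode() yields 0 for null, so no exception can be thrown.
    static int combine(String firstName, String lastName) {
        return Objects.hashCode(firstName) ^ Objects.hashCode(lastName);
    }

    public static void main(String[] args) {
        // The hash-codes 66 ("Bud") and 71 ("Gil") fit into a table of only 16 slots.
        System.out.println(slotFor(66, 16));
        System.out.println(slotFor(71, 16));
        // A known weakness of plain xor: swapping the fields gives the same hash-code.
        System.out.println(combine("Bud", "Gil") == combine("Gil", "Bud"));
    }
}
```

The compact table trades memory for more collisions: different hash-codes can now end up in the same slot, which is why buckets are needed.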
Associative Collections – Hashtable-based Implementations – Hash Collisions and Buckets
● But there is a problem: what if we try adding keys having the same hash-code, e.g. names with the same first letter?
– The thing that happens here is called hash collision. To handle hash collisions in a hashtable we have to review the idea of Buckets.
● Actually a hashtable is an array of Bucket objects (not just an int[]!).
– For each hash-code there exists one Bucket.
– The hash-code is the index of the hashtable, where that hash-code's Bucket resides.
– Within a Bucket all inserted items with a key having the same hash-code are collected as key-value-pairs.
● Hm... wait! If a hash collision leads to putting items into the same Bucket, how can the associated value be retrieved from the hashtable in a definite way?
– In other words: the hashtable story is even more complex!
telephoneDictionary[new NameKey{ Name = "Bud" }] = 3821; // GetHashCode() returns 66.
telephoneDictionary[new NameKey{ Name = "Ben" }] = 9427; // GetHashCode() also returns 66!
[Figure: telephoneDictionary with hashTable : Bucket[]; adding (NameKey) "Bud" with 3821 and (NameKey) "Ben" with 9427 puts both key-value-pairs into the Bucket at index 66 (hashCode 66); the Bucket at index 71 (hashCode 71) holds "Gil" with 7764.]
Associative Collections – Hashtable-based Implementations – Equality
● As a matter of fact hash collisions cannot be avoided, because any objects could have the same hash-code!
– E.g. two names could have the same first letter. This fact is not under our control!
● The answer is that all the key-value-pairs of a hash-code-matching Bucket need to be searched for a certain key.
– The keys of a hash-code-matching Bucket are linearly searched and compared for equality to find the exactly matching key.
– But what does "exactly matching key" mean, as opposed to a (just) hash-code-matching key?
– We have to prepare our key-type to provide another method to check for the exact equality of two keys!
– Similar to objects providing hash-codes, Java and .Net want us to override the method equals()/Equals() inherited from Object:
● Let's discuss how this works!
// C#
public class NameKey { // (members hidden)
    public override int GetHashCode() { /* pass */ }
    public override bool Equals(object other) {
        // Simple and naïve implementation of Equals():
        NameKey otherNameKey = (NameKey)other;
        return Name == otherNameKey.Name;
    }
}
// Java
public class NameKey { // (members hidden)
    @Override
    public int hashCode() { /* pass */ }
    @Override
    public boolean equals(Object other) {
        // Simple and naïve implementation of equals():
        NameKey otherNameKey = (NameKey) other;
        return getName().equals(otherNameKey.getName());
    }
}
Associative Collections – Hashtable Implementations – Implementing Equality
● .Net: Mechanics of comparing two NameKey objects with the method Equals():
– Equals() has a parameter that is of the very base type Object (static type).
– We assume the dynamic type behind the passed argument is NameKey (only NameKeys can be equality-compared to each other).
● We can cast the parameter other to NameKey blindly. (And this can be a problem, which we'll discuss shortly.)
– When we've the NameKey behind the parameter, we equality-compare the argument's Name property against this' Name property.
● Mind that we used the operator== to compare the Name properties (well, we could have used Equals() as well); those are of type string.
● Equals() makes a deeper (more expensive) comparison than GetHashCode(), the latter only deals with Name's first letter:
public class NameKey { // (members hidden)
    public override bool Equals(object other) {
        NameKey otherNameKey = (NameKey)other;
        return this.Name == otherNameKey.Name;
    }
    public override int GetHashCode() {
        return string.IsNullOrEmpty(this.Name) ? 0 : this.Name[0];
    }
}
// Both hash-codes evaluate to 66, but these objects are not equal!
NameKey key1 = new NameKey{ Name = "Bud" };
NameKey key2 = new NameKey{ Name = "Ben" };
bool hashCodesAreTheSame = key1.GetHashCode() == key2.GetHashCode();
// evaluates to true
bool areEqual = key1.Equals(key2);
// evaluates to false
Associative Collections – Hashtable-based Implementations – Rules of Equality
● We'll not discuss all facets of the implementation of GetHashCode() and Equals(), but here are the most important rules:
– Two objects having the same hash-code need not return true for calling Equals()!
– But, if two objects having the same dynamic type return true for calling Equals(), then those need to have the same hash-code!
– GetHashCode() and Equals() have to return the same results for the "structurally" same objects unless one of them is modified.
– GetHashCode() and Equals() should be implemented to work very fast.
– Neither GetHashCode() nor Equals() is allowed to throw exceptions!
● If null is passed to Equals() the result has to be false.
● Finally GetHashCode() and Equals() for NameKey could be implemented like so to fulfill these rules:
public class NameKey { // (members hidden)
    public override int GetHashCode() {
        return string.IsNullOrEmpty(Name) ? 0 : Name[0];
    }
    public override bool Equals(object other) { // Stable implementation of Equals()
        if (this == other) { // checks for identity
            return true;
        }
        // checks for nullity (to avoid exceptions) and the dynamic type of this and the other object
        if (null != other && GetType() == other.GetType()) {
            // the cast is type safe; the Name properties of both objects are equality-compared
            return Name == ((NameKey)other).Name;
        }
        return false;
    }
}
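The same rules can be followed in Java. The following is a sketch of how a Java NameKey could look; it is an assumption-laden counterpart of the C# type above (immutable name, package-private constructor), not code from the lecture.

```java
import java.util.Objects;

class NameKey {
    private final String name;

    NameKey(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    @Override
    public int hashCode() {
        // Int-value of the first letter, 0 for null or empty names.
        return (name == null || name.isEmpty()) ? 0 : name.charAt(0);
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) { // checks for identity
            return true;
        }
        // checks for nullity and the dynamic type, so the cast below is type safe
        if (other == null || getClass() != other.getClass()) {
            return false;
        }
        // Objects.equals() compares the names without throwing, even if a name is null
        return Objects.equals(name, ((NameKey) other).name);
    }

    public static void main(String[] args) {
        NameKey bud = new NameKey("Bud");
        System.out.println(bud.equals(new NameKey("Bud"))); // equal names
        System.out.println(bud.equals(new NameKey("Ben"))); // same hash-code, different names
        System.out.println(bud.equals(null));
    }
}
```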
Associative Collections – Hashtable-based Implementations – The Lookup Algorithm
● All right! Now we have the methods GetHashCode() and Equals() in place. But how do hashtables use these tools to work?
● Let's assume the following content in the hashtable-based Dictionary telephoneDictionary:
IDictionary<NameKey, int> telephoneDictionary = new Dictionary<NameKey, int> {
    { new NameKey { Name = "Bud" }, 3821 },
    { new NameKey { Name = "Ben" }, 9427 }
};
– Then we'll look up the phone number of "Bud":
int no = telephoneDictionary[new NameKey { Name = "Bud" }];
● The lookup basically initiates the following algorithm:
– Dictionary will call GetHashCode() on the indexer's (i.e. operator[]) argument 'new NameKey{ Name = "Bud" }' and the result is 66.
– Dictionary will get the Bucket at the hashtable's index 66. This Bucket has two entries!
– Dictionary will call 'Equals(new NameKey{ Name = "Bud" })' against each key of the key-value-pairs in the just returned Bucket.
– The value of the key-value-pair for which the key was equal to 'new NameKey{ Name = "Bud" }' will be returned.
● Read these steps multiple times and make sure you understood them! Now we have to discuss some details...
– The algorithms to insert or update a key-value-pair work the same way as for lookups!
[Figure: searching the key (NameKey) "Bud" in telephoneDictionary; the Bucket at index 66 (hashCode 66) of hashTable : Bucket[] holds the key-value-pairs "Bud" with 3821 and "Ben" with 9427.]
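The lookup steps can be replayed with a tiny hand-written hashtable in Java. This is a teaching sketch only: ChainedTableSketch and FirstLetterKey are invented names, keys are assumed to be non-null, and real implementations (like .Net's Dictionary or Java's HashMap) are considerably more refined.

```java
import java.util.ArrayList;
import java.util.List;

// Key type that hashes to its first letter, like NameKey in this lecture (name must not be null).
final class FirstLetterKey {
    final String name;
    FirstLetterKey(String name) { this.name = name; }
    @Override public int hashCode() { return name.isEmpty() ? 0 : name.charAt(0); }
    @Override public boolean equals(Object other) {
        return other instanceof FirstLetterKey && name.equals(((FirstLetterKey) other).name);
    }
}

class ChainedTableSketch<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final List<List<Entry<K, V>>> buckets = new ArrayList<>();

    ChainedTableSketch(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Steps 1 and 2: hash-code -> index -> Bucket.
    private List<Entry<K, V>> bucketFor(K key) {
        return buckets.get(Math.floorMod(key.hashCode(), buckets.size()));
    }

    // Steps 3 and 4: equality-compare each key in the Bucket, return the matching value.
    V get(K key) {
        for (Entry<K, V> entry : bucketFor(key)) {
            if (entry.key.equals(key)) {
                return entry.value;
            }
        }
        return null; // no equal key in the Bucket
    }

    // Inserting or updating works the same way as the lookup.
    void put(K key, V value) {
        List<Entry<K, V>> bucket = bucketFor(key);
        for (Entry<K, V> entry : bucket) {
            if (entry.key.equals(key)) {
                entry.value = value;
                return;
            }
        }
        bucket.add(new Entry<>(key, value));
    }

    public static void main(String[] args) {
        ChainedTableSketch<FirstLetterKey, Integer> telephoneDictionary = new ChainedTableSketch<>(128);
        telephoneDictionary.put(new FirstLetterKey("Bud"), 3821);
        telephoneDictionary.put(new FirstLetterKey("Ben"), 9427); // collides with "Bud"
        System.out.println(telephoneDictionary.get(new FirstLetterKey("Bud")));
    }
}
```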
Associative Collections – Hashtable-based Implementation – best Case Complexity
● Although the algorithm to lookup keys is complex, the analysis of a hashtable's complexity is really simple!
● The best case yields a complexity of O(1)! - Wow!
– If every item can be associated to exactly one distinct Bucket each, we have one item in every Bucket: a 1 : 1 association between items and Buckets.
– This means that every item has a distinct hash-code and thus a distinct index in the hashtable.
– As the hashtable is an array and arrays can access their items by index with O(1) complexity, we have the best case for hashtable!
● In the best case, accessing a hashtable means accessing an array by index with constant complexity.
– This is better than O(log n) for tree-based associative collections!
– This is the best we can get for collections!
● But there is also a worst case!
[Figure: telephoneDictionary with hashTable : Bucket[]; the Buckets at the indexes 66, 71 and 87 hold exactly one key-value-pair each: "Bud" with 3821 (hashCode 66), "Gil" with 7764 (hashCode 71) and "Will" with 4689 (hashCode 87).]
Associative Collections – Hashtable-based Implementations – worst Case Complexity
● The worst case yields a complexity of O(n)! - Oh no!
– If every item is associated to the same Bucket, we have all items in only one single Bucket: an n : 1 association between items and Buckets.
– This means that every item has the same hash-code and thus the same index in the hashtable.
– As this single Bucket needs to be searched linearly to find the equal key, the worst complexity boils down to the linear complexity O(n)!
● In the worst case the workload is moved to a single Bucket that must be searched in a linear manner.
[Figure: telephoneDictionary with hashTable : Bucket[]; the single Bucket at index 66 (hashCode 66) holds all key-value-pairs: "Bud" with 3821, "Ben" with 9427 and "Betty" with 8585.]
● Java's java.util.concurrent.ConcurrentHashMap is able to switch the storage of the buckets from a List-implementation to a sorted tree implementation, if the buckets grow too large (i.e. too many keys with the same hash-code). The benefit is that searching a bucket will be more efficient then (O(n) for List, but O(log n) for sorted trees).
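The worst case can be provoked on purpose. The Java sketch below uses a deliberately bad key type whose hashCode() is constant, puts all keys into one list that stands in for the single Bucket, and counts the equality comparisons of a linear search. All names are hypothetical; the constant 42 is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;

class WorstCaseSketch {
    static int equalsCalls = 0; // counts how often keys were equality-compared

    static final class ConstantHashKey {
        final String name;
        ConstantHashKey(String name) { this.name = name; }
        @Override public int hashCode() { return 42; } // deliberately bad: every key collides
        @Override public boolean equals(Object other) {
            equalsCalls++;
            return other instanceof ConstantHashKey && name.equals(((ConstantHashKey) other).name);
        }
    }

    // All keys share one hash-code, so they all end up in this single Bucket.
    static List<ConstantHashKey> singleBucket(int itemCount) {
        List<ConstantHashKey> bucket = new ArrayList<>();
        for (int i = 0; i < itemCount; i++) {
            bucket.add(new ConstantHashKey("name" + i));
        }
        return bucket;
    }

    // Linear search through the Bucket; returns the number of equality comparisons needed.
    static int comparisonsToFind(List<ConstantHashKey> bucket, ConstantHashKey wanted) {
        equalsCalls = 0;
        for (ConstantHashKey key : bucket) {
            if (key.equals(wanted)) {
                break;
            }
        }
        return equalsCalls;
    }

    public static void main(String[] args) {
        List<ConstantHashKey> bucket = singleBucket(1000);
        // Finding the last key requires comparing against every key in the Bucket.
        System.out.println(comparisonsToFind(bucket, new ConstantHashKey("name999")));
    }
}
```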
Associative Collections – Hashtable-based Implementations – Controlling Performance – Part I
● An interesting point when working with .Net's Dictionary is how we can influence the performance as developers:
– We can override GetHashCode() and Equals() for our own UDTs being used as keys and this is what we are going to discuss now.
● Concretely we have a problem with the UDT NameKey: we'll get the same hash-code for keys having the same first letter!
– However, we can override GetHashCode(), so that a "deeper" or in other words "more distinct" hash-code will be produced.
– .Net's string is able to produce its own hash-code that is calculated concerning the whole string-value and not only the first letter:
● We already know that each .Net type inherits Object and can override GetHashCode() and Equals().
– A type implementing equality in the .Net framework needs overriding GetHashCode() and Equals()!
● Think: GetHashCode() => level one equality, Equals() => level two equality.
– Many types of the .Net framework provide useful overrides of GetHashCode() and Equals() to implement equality.
– (GetHashCode() is not only used with Dictionary's but also in other places of the .Net framework.)
– (A type implementing equivalence in the .Net framework needs implementing IComparable or another type implementing IComparer.)
public class NameKey {
    public string Name { get; set; }
    public override int GetHashCode() {
        // Return int-value of first letter.
        return string.IsNullOrEmpty(Name) ? 0 : Name[0];
    }
}
public class NameKey {
    public string Name { get; set; }
    public override int GetHashCode() {
        // Return the hash-code of the string Name.
        return string.IsNullOrEmpty(Name) ? 0 : Name.GetHashCode();
    }
}
● switch-case with strings in C# and Java can use the strings' hash-codes to narrow down the comparisons.
● It should be mentioned that some methods of list-types also use equality to function, e.g. methods like Contains(), Remove() or IndexOf().
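The difference between both hash-functions can be seen in Java as well, where String provides hashCode() over the whole string-value. The helper names below are made up for this comparison.

```java
class NameHashSketch {
    // "Shallow" hash-code: only the first letter counts.
    static int firstLetterHash(String name) {
        return (name == null || name.isEmpty()) ? 0 : name.charAt(0);
    }

    // "Deeper" hash-code: delegates to String.hashCode(), which considers all characters.
    static int wholeStringHash(String name) {
        return name == null ? 0 : name.hashCode();
    }

    public static void main(String[] args) {
        // Same first letter, so the shallow hash-codes collide ...
        System.out.println(firstLetterHash("Bud") == firstLetterHash("Ben"));
        // ... whereas the hash-codes of the whole strings differ.
        System.out.println(wholeStringHash("Bud") == wholeStringHash("Ben"));
    }
}
```

Collisions are still possible with the deeper hash-code, they just become much less likely for typical names.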
NO! Stop it! We're not going to discuss it here! As a matter of fact implementing equality correctly in .Net and Java is not simple and it is often done downright wrong and potentially dangerous!
I have one simple tip: Don't be too clever and follow the rules!
Ok, we'll discuss it in depth in a future lecture.
***Under Construction – Equality – Under Construction***
● An interesting point is that each object in Java/.Net implements equals()/Equals() and hashCode()/GetHashCode().
– But overrides of those methods need to follow certain rules. Let's discuss those.
Associative Collections – Hashtable-based Implementations – Controlling Performance – Part II
● The last implementation of NameKey does basically delegate equality fully to its property Name, which is of type string.
– Hm! - To drive this point home: we can get rid of the type NameKey and use string instead as key-type! Hey!
● As a matter of fact, it is often not required to program extra UDTs as key types!
– The present .Net types are often sufficient to be used as keys for hashtables. Most often int and string are used as key types.
– Nevertheless, .Net (and Java) developers have to know, how equality needs to be implemented correctly!
● In principle the implementation pattern and even the idiom for equality works the same way in Java!
// A dictionary that associates strings (names) to ints (phone numbers):
IDictionary<string, int> telephoneDictionary = new Dictionary<string, int>();
// Adding two string-int key-value-pairs:
telephoneDictionary["Bud"] = 3821;
telephoneDictionary["Gil"] = 7764;
// Looking up the phone number of "Bud":
int no = telephoneDictionary["Bud"];
● Without any further explanation it should be clear that maintaining a collection having only unique items by hand is very painful.
– E.g. collecting only the surnames from a deck of invitations. It could be done by filling a list w/o duplicates:
● There is a variant of associative collections that just avoids the presence of duplicates. Those are often called sets.
– Sets are associative collections like dictionaries, in which the keys are the values!
– In Java/.Net we can use BST-based (TreeSet/SortedSet) and hashtable-based (HashSet/HashSet) implementations of sets.
– In C++ we can use either the BST-based std::set (<set>) or the hashtable-based std::unordered_set (C++11: <unordered_set>).
// Java
List<String> allSurnames = Arrays.asList("Taylor", "Miller", "Taylor", "Miller");
List<String> uniqueSurnames = new ArrayList<>();
for (String surName : allSurnames) {
    if (!uniqueSurnames.contains(surName)) { // Only adds items that are not yet present.
        uniqueSurnames.add(surName);
    }
}
for (String surName : uniqueSurnames) {
    System.out.println(surName);
}
// > Taylor
// > Miller
// Implementation using a Set:
Set<String> uniqueSurnames2 = new HashSet<>();
for (String surName : allSurnames) {
    uniqueSurnames2.add(surName);
}
for (String surName : uniqueSurnames2) {
    System.out.println(surName);
}
Associative Collections – Sets
● As for dictionaries (and all associative collections) the reason for using sets is to exploit their self-organization (keeping items unique), instead of organizing the items ourselves.
25
25
●
Set collections can really represent mathematical sets, this also includes operations on sets (e.g. via .Net's ISet interface):
●
Subsets
●
Union
// C#/.Net
ISet<int> A = new HashSet<int>{ 1, 2, 3, 4, 5, 6, 7, 8, 9 };
ISet<int> B = new HashSet<int>{ 3, 6, 8 };
bool BSubSetOfA = B.IsSubsetOf(A);
// >true
bool BProperSubSetOfA = B.IsProperSubsetOf(A);
// >true
Associative Collections – Operations on Sets – Part I
// Java
Set<Integer> A = new HashSet<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9));
Set<Integer> B = new HashSet<>(Arrays.asList(3, 6, 8));
boolean BSubSetOfA = A.containsAll(B);
// >true
// Java doesn't provide a test for _proper_ subsets.
A⊆B A⊂B A={1,2,3,4,5,6,7,8,9};B={3,6,8}
A B
A∪B A={1,2,3,4,5,6};B={4,5,6,7,8,9};{1,2,3,4,5,6}∪{4,5,6,7,8,9}={1,2,3,4,5,6,7,8,9}
ISet<int> A = new SortedSet<int>{ 1, 2, 3, 4, 5, 6 };
ISet<int> B = new SortedSet<int>{ 4, 5, 6, 7, 8, 9 };
A.UnionWith(B); // The set A will be _modified_!
// >{1, 2, 3, 4, 5, 6, 7, 8, 9}
// LINQ's Union() extension method will create a _new_
// sequence:
ISet<int> A2 = new SortedSet<int>{ 1, 2, 3, 4, 5, 6 };
ISet<int> B2 = new SortedSet<int>{ 4, 5, 6, 7, 8, 9 };
IEnumerable<int> A2UnionB2 = A2.Union(B2);
// >{1, 2, 3, 4, 5, 6, 7, 8, 9}
Set<Integer> A = new TreeSet<>(Arrays.asList(1, 2, 3, 4, 5, 6));
Set<Integer> B = new TreeSet<>(Arrays.asList(4, 5, 6, 7, 8, 9));
boolean AWasModified = A.addAll(B); // The set A will be _modified_!
// >true
System.out.println(A);
// >[1, 2, 3, 4, 5, 6, 7, 8, 9]
● A proper subset is a subset of another set that is not equal to that other set!
● Subsets can also be expressed with LINQ, but usually the type ISet and its implementors should be used, because this leads to more expressive code:
// C#/.Net/LINQ
ISet<int> A = new HashSet<int>{ 1, 2, 3, 4, 5, 6, 7, 8, 9 };
ISet<int> B = new HashSet<int>{ 1, 2, 3, 4, 5, 6, 7, 8, 9 };
ISet<int> C = new HashSet<int>{ 3, 6, 8 };
// Subset:
bool BIsSubsetOfA = !B.Except(A).Any();
// >true
bool CIsSubsetOfA = !C.Except(A).Any();
// >true
// Proper Subset (a subset, which is not equal to the other set):
bool BIsProperSubSetOfA = !B.Except(A).Any() && A.Except(B).Any();
// > false
bool CIsProperSubSetOfA = !C.Except(A).Any() && A.Except(C).Any();
// > true
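Java's Set interface offers no proper-subset test, but one can be sketched on top of containsAll(). The helper name isProperSubset is hypothetical (it is not part of the JDK), and the sketch assumes that both sets use the same notion of equality for their items:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ProperSubsetDemo {
    // Hypothetical helper: candidate is a proper subset of superset, if superset
    // contains all items of candidate and holds at least one item more.
    static <T> boolean isProperSubset(Set<T> candidate, Set<T> superset) {
        return superset.containsAll(candidate) && superset.size() > candidate.size();
    }

    public static void main(String[] args) {
        Set<Integer> a = new HashSet<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9));
        Set<Integer> b = new HashSet<>(Arrays.asList(3, 6, 8));
        System.out.println(isProperSubset(b, a)); // > true
        System.out.println(isProperSubset(a, a)); // > false (equal sets)
        System.out.println(isProperSubset(a, b)); // > false (a is no subset of b)
    }
}
```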
26
26
●
Difference
●
Symmetric difference
// C#/.Net
ISet<int> A = new SortedSet<int>{ 1, 2, 3, 4, 5, 6 };
ISet<int> B = new SortedSet<int>{ 4, 5, 6, 7, 8, 9 };
A.ExceptWith(B); // The set A will be _modified_!
// >{1, 2, 3}
// LINQ's Except() method will create a _new_ sequence:
ISet<int> A2 = new SortedSet<int>{ 1, 2, 3, 4, 5, 6 };
ISet<int> B2 = new SortedSet<int>{ 4, 5, 6, 7, 8, 9 };
IEnumerable<int> A2ExceptB2 = A2.Except(B2);
// >{1, 2, 3}
Associative Collections – Operations on Sets – Part II
// Java
Set<Integer> A = new TreeSet<>(Arrays.asList(1, 2, 3, 4, 5, 6));
Set<Integer> B = new TreeSet<>(Arrays.asList(4, 5, 6, 7, 8, 9));
A.removeAll(B); // The set A will be _modified_!
System.out.println(A);
// >[1, 2, 3]
A∖B: A={1,2,3,4,5,6}; B={4,5,6,7,8,9}; {1,2,3,4,5,6}∖{4,5,6,7,8,9}={1,2,3}
A▵B := (A∖B)∪(B∖A): A={1,2,3,4,5,6}; B={4,5,6,7,8,9}; {1,2,3,4,5,6}▵{4,5,6,7,8,9}={1,2,3,7,8,9}
ISet<int> A = new SortedSet<int>{ 1, 2, 3, 4, 5, 6 };
ISet<int> B = new SortedSet<int>{ 4, 5, 6, 7, 8, 9 };
A.SymmetricExceptWith(B); // The set A will be _modified_!
// >{1, 2, 3, 7, 8, 9}
// Using LINQ we can create a _new_ sequence:
ISet<int> A2 = new SortedSet<int>{ 1, 2, 3, 4, 5, 6 };
ISet<int> B2 = new SortedSet<int>{ 4, 5, 6, 7, 8, 9 };
IEnumerable<int> A2SymmetricExceptB2 =
A2.Except(B2).Union(B2.Except(A2));
// >{1, 2, 3, 7, 8, 9}
Set<Integer> A = new TreeSet<>(Arrays.asList(1, 2, 3, 4, 5, 6));
Set<Integer> B = new TreeSet<>(Arrays.asList(4, 5, 6, 7, 8, 9));
Set<Integer> A2 = new TreeSet<>(Arrays.asList(1, 2, 3, 4, 5, 6));
A.removeAll(B); // The sets A and B will be _modified_!
B.removeAll(A2);
A.addAll(B);
System.out.println(A);
// >[1, 2, 3, 7, 8, 9]
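The Java variant above modifies both A and B and needs an extra copy of A. A non-modifying alternative could look like the following sketch; the helper symmetricDifference is a hypothetical name, and the sketch uses the identity A▵B = (A∪B)∖(A∩B) and assumes naturally ordered items:

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

class SymmetricDifferenceDemo {
    // Hypothetical helper: returns a _new_ sorted set, a and b stay unmodified.
    static <T extends Comparable<? super T>> Set<T> symmetricDifference(Set<T> a, Set<T> b) {
        Set<T> result = new TreeSet<>(a); // Copy of a.
        result.addAll(b);                 // Now a ∪ b.
        Set<T> common = new TreeSet<>(a); // Another copy of a.
        common.retainAll(b);              // Now a ∩ b.
        result.removeAll(common);         // (a ∪ b) ∖ (a ∩ b)
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> a = new TreeSet<>(Arrays.asList(1, 2, 3, 4, 5, 6));
        Set<Integer> b = new TreeSet<>(Arrays.asList(4, 5, 6, 7, 8, 9));
        System.out.println(symmetricDifference(a, b));
        // >[1, 2, 3, 7, 8, 9]
        System.out.println(a); // The set a was _not_ modified.
        // >[1, 2, 3, 4, 5, 6]
    }
}
```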
27
27
●
Intersection
Associative Collections – Operations on Sets – Part III
A∩B: A={1,2,3,4,5,6}; B={4,5,6,7,8,9}; {1,2,3,4,5,6}∩{4,5,6,7,8,9}={4,5,6}
// C#/.Net
ISet<int> A = new SortedSet<int>{ 1, 2, 3, 4, 5, 6 };
ISet<int> B = new SortedSet<int>{ 4, 5, 6, 7, 8, 9 };
A.IntersectWith(B); // The set A will be _modified_!
// >{4, 5, 6}
// LINQ's Intersect() method will create a _new_ sequence:
ISet<int> A2 = new SortedSet<int>{ 1, 2, 3, 4, 5, 6 };
ISet<int> B2 = new SortedSet<int>{ 4, 5, 6, 7, 8, 9 };
IEnumerable<int> A2IntersectionWithB2 = A2.Intersect(B2);
// >{4, 5, 6}
// Java
Set<Integer> A = new TreeSet<>(Arrays.asList(1, 2, 3, 4, 5, 6));
Set<Integer> B = new TreeSet<>(Arrays.asList(4, 5, 6, 7, 8, 9));
A.retainAll(B); // The set A will be _modified_!
System.out.println(A);
// >[4, 5, 6]
28
28
●
Inserting a value for a key that is already present in an associative collection overwrites or updates the present value.
– Sometimes this is not desired. E.g. consider a telephoneDictionary, in which a name can have more than one phone number!
●
Some collection frameworks provide associative collections that can handle multiplicity.
– In C++ we can use std::multimap (<map>) and std::multiset (<set>).
●
In other frameworks (Java, .Net etc.) multi-associative collections need to be explicitly implemented or taken from 3rd party sources.
– 3rd party sources: Apache Commons (Java)
// C++11
std::multimap<std::string, int> telephoneDictionary {
{ "Ben", 9427 }, // Mind: two values with the same key are added here.
{ "Ben", 4367 },
{ "Jody", 1781 }, // Mind: three values with the same key are added here.
{ "Jody", 9032 },
{ "Jody", 8038 }
};
// It is required to use STL iterators, because std::multimap provides no subscript operator:
for (auto item = telephoneDictionary.begin(); item != telephoneDictionary.end(); ++item) {
std::cout<<"Name: "<<item->first<<", Phone number: "<<item->second<<std::endl;
}
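For Java, where no multimap is part of the standard collections, an explicit implementation could be sketched as a map from each key to a list of values. The class TelephoneMultimapDemo and its methods put()/get() are hypothetical names chosen for this illustration:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TelephoneMultimapDemo {
    // Each key is associated with a list of values instead of a single value.
    private final Map<String, List<Integer>> entries = new TreeMap<>();

    // Adds a further phone number for name, present numbers are kept.
    void put(String name, int phoneNumber) {
        entries.computeIfAbsent(name, key -> new ArrayList<>()).add(phoneNumber);
    }

    // Returns all phone numbers of name, or an empty list for unknown names.
    List<Integer> get(String name) {
        return entries.getOrDefault(name, Collections.emptyList());
    }

    public static void main(String[] args) {
        TelephoneMultimapDemo telephoneDictionary = new TelephoneMultimapDemo();
        telephoneDictionary.put("Ben", 9427);
        telephoneDictionary.put("Ben", 4367);
        telephoneDictionary.put("Jody", 1781);
        telephoneDictionary.put("Jody", 9032);
        telephoneDictionary.put("Jody", 8038);
        for (Map.Entry<String, List<Integer>> item : telephoneDictionary.entries.entrySet()) {
            for (int phoneNumber : item.getValue()) {
                System.out.println("Name: " + item.getKey() + ", Phone number: " + phoneNumber);
            }
        }
    }
}
```

Keeping the values in a List preserves the order in which the numbers were added per name; a Set could be used instead, if duplicate numbers per name should be rejected.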
Special associative Collections – Multiplicity of Keys
telephoneDictionary: "Ben" → 9427, 4367; "Jody" → 1781, 9032, 8038
29
29
●
Key
– Refrain from modifying key objects managed in associative collections.
●
Sorted/equivalence-based associative collections:
– TreeMap/TreeSet (Java), SortedDictionary/SortedSet (.Net), std::set/std::map (C++)
– The internal organization of keys is equivalence-based:
●
Equivalence-based means that key objects are organized according to their relative/natural order. This is often implemented with a BST.
●
The key type needs to implement Comparable (Java) or IComparable (.Net) or operator< (C++ STL) for a semantically correct "natural order".
●
Alternatively, a Comparator (Java) or IComparer (.Net) or a comparison functor (C++ STL), which implements the ordering for the key type, needs to be specified for the associative collection.
– Searching/inserting/removing items can be done very fast (O(log n)), because binary searches are used (BST).
– Sorted associative collections are unordered, i.e. they don't retain the insertion order!
●
The iteration will yield the items in sorted order (BST in-order). The order in which items have been added/inserted/removed doesn't matter!
– => Use these associative collections, if sorting of keys or the control of key-comparison (e.g. reverse order) is needed!
Things to ponder about Associative Collections – Part I
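The points about natural order and comparator-controlled order can be illustrated with a small Java sketch. The class SortedMapDemo, the helper fill() and the sample entries are assumptions made for this illustration:

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

class SortedMapDemo {
    // Adds the same items in the same (unsorted) order to the passed map.
    static Map<String, Integer> fill(Map<String, Integer> map) {
        map.put("Jody", 1781);
        map.put("Ben", 9427);
        map.put("Ashley", 2345);
        return map;
    }

    public static void main(String[] args) {
        // Natural order of the keys (String implements Comparable):
        Map<String, Integer> natural = new TreeMap<>();
        System.out.println(fill(natural));
        // >{Ashley=2345, Ben=9427, Jody=1781}

        // Key-comparison controlled by a Comparator (reverse order):
        Map<String, Integer> reversed = new TreeMap<>(Comparator.reverseOrder());
        System.out.println(fill(reversed));
        // >{Jody=1781, Ben=9427, Ashley=2345}
    }
}
```

In both cases the iteration order results only from the key-comparison, not from the order in which the items were put into the map.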
30
30
●
Equality-based associative collections:
– HashMap/HashSet (Java), Dictionary/HashSet (.Net), NSDictionary/NSSet (Cocoa), std::unordered_map/std::unordered_set (C++11), associative arrays (JavaScript)
– The internal organization of keys is equality-based:
●
Keys are organized by equality, i.e. by hash-codes and the result of the equality check (methods hashCode()/equals() (Java) or GetHashCode()/Equals() (.Net)).
●
The key type needs to implement hashCode()/equals() (Java) or GetHashCode()/Equals() (.Net) for semantically correct equality.
●
Alternatively, an IHashCodeProvider/IEqualityComparer (.Net), which implements equality for the key type, needs to be specified for the associative collection.
●
In C++11 STL a hasher functor should be specified for the associative collection that provides hash-codes for the key-type.
– Searching/inserting/removing items can be done in O(1) on average: no extra search operation is required, because the hash-code is used as an index.
– These associative collections are unordered!
●
There are no guarantees about the order in which items are yielded during iteration. The order can change completely when items are added.
●
In most cases equality-based associative collections should be our first choice, because they are potentially the most efficient.
– => Use these associative collections, if sorting of keys or the control of key-comparison doesn't matter!
●
Ordered associative collections:
– LinkedHashMap (Java) and OrderedDictionary (.Net) iterate in the order in which the items were put into the collection.
●
LinkedHashMap is the type Groovy uses to implement map literals.
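A short Java sketch of the insertion-order behavior of LinkedHashMap; the class InsertionOrderDemo, the helper phoneBook() and the sample entries are assumptions made for this illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class InsertionOrderDemo {
    static Map<String, Integer> phoneBook() {
        Map<String, Integer> phoneBook = new LinkedHashMap<>();
        phoneBook.put("Jody", 1781);
        phoneBook.put("Ben", 9427);
        phoneBook.put("Ashley", 2345);
        // Updating the value of a present key doesn't change its position:
        phoneBook.put("Jody", 9032);
        return phoneBook;
    }

    public static void main(String[] args) {
        // The items are yielded in the order, in which the keys were first put:
        System.out.println(phoneBook());
        // >{Jody=9032, Ben=9427, Ashley=2345}
    }
}
```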
Things to ponder about Associative Collections – Part II
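The requirement to implement hashCode()/equals() for the key type, and the earlier advice to refrain from modifying keys, can be sketched together in Java. The key type Name is hypothetical and deliberately mutable to show the problem; the printed results assume a hash-table implementation like the one of OpenJDK's HashSet:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class MutableKeyDemo {
    // Hypothetical key type with value-based equality.
    static final class Name {
        String first;
        String last; // Deliberately not final, to demonstrate the problem.

        Name(String first, String last) {
            this.first = first;
            this.last = last;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof Name)) return false;
            Name that = (Name) other;
            return first.equals(that.first) && last.equals(that.last);
        }

        @Override
        public int hashCode() {
            return Objects.hash(first, last); // Must agree with equals().
        }
    }

    // Returns whether the set still finds its own key after the key was modified.
    static boolean foundAfterModification() {
        Set<Name> names = new HashSet<>();
        Name key = new Name("Ben", "Miller");
        names.add(key);        // Stored under the hash-code of ("Ben", "Miller").
        key.last = "Taylor";   // Modifies the key, while it is managed by the set.
        return names.contains(key); // Looked up with the hash-code of ("Ben", "Taylor").
    }

    public static void main(String[] args) {
        Set<Name> names = new HashSet<>();
        names.add(new Name("Ben", "Miller"));
        // A different, but equal object is found:
        System.out.println(names.contains(new Name("Ben", "Miller")));
        // >true
        System.out.println(foundAfterModification());
        // >false
    }
}
```

Making the fields of key types final (i.e. using immutable keys) avoids this problem altogether.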
31
31
●
Old and new collections:
– Java:
●
The old Hashtable does not accept null keys; adding one will throw a NullPointerException. Better use HashMap/HashSet (Java 1.2 or newer).
●
Hashtable and HashMap return null, if a requested key has no value (i.e. the key is not present).
– .Net:
●
The old object-based Hashtable should not be used for new code. Better use Dictionary/SortedDictionary (.Net 2 or newer).
●
Be aware that Hashtable returns null, if a key has no value (i.e. the key is not present). Dictionary/SortedDictionary will throw a KeyNotFoundException.
●
Keys with the value null:
– Java:
●
TreeMap/TreeSet with natural ordering don't allow keys having the value null; adding one will throw a NullPointerException.
●
HashMap/HashSet can digest keys having the value null.
– .Net:
●
Dictionary/SortedDictionary don't allow keys having the value null; adding one will throw an ArgumentNullException.
●
HashSet/SortedSet can digest keys having the value null.
●
It was never mentioned explicitly: There are basically no constraints on the types of the values in associative collections!
Things to ponder about Associative Collections – Part III
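The Java-related statements about null keys and absent keys can be checked with a small sketch. The class NullKeyDemo and the helper rejectsNullKey() are hypothetical names; the sketch assumes Java 8 or newer and maps using natural ordering:

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import java.util.TreeMap;

class NullKeyDemo {
    // Returns true, if the passed map rejects a null key with a NullPointerException.
    static boolean rejectsNullKey(Map<String, Integer> map) {
        try {
            map.put(null, 42);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejectsNullKey(new HashMap<>()));   // HashMap digests null keys.
        // >false
        System.out.println(rejectsNullKey(new TreeMap<>()));   // TreeMap (natural ordering) doesn't.
        // >true
        System.out.println(rejectsNullKey(new Hashtable<>())); // The old Hashtable doesn't either.
        // >true

        Map<String, Integer> phoneBook = new HashMap<>();
        phoneBook.put("Ben", 9427);
        System.out.println(phoneBook.get("Jody")); // The key is not present.
        // >null
        System.out.println(phoneBook.containsKey("Jody"));
        // >false
    }
}
```

Because get() also returns null for a present key that is associated with the value null, containsKey() is the unambiguous test for the presence of a key.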
32
32
Thank you!

 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 

(5) collections algorithms

  • 2. 2 2 ● Collections – Part V – Implementation Strategies of Map/Dictionary-like associative Collections ● BST-based Implementation ● Hashtable-based Implementation – A 20K Miles Perspective on Trees in Computer Science – Equality – Multimaps – Set-like associative Collections TOC
  • 3. 3 3 ● In the last lecture we learned about associative collections. – Associative collections allow associating keys with values. ● Example: In the white pages, names are associated with phone numbers. – Associative collections allow looking up a value for a certain key, e.g. getting the phone number of a specific person. – Associative collections could be implemented with two arrays as shown before, but this would be very inefficient for common cases. ● In this lecture we're going to understand how associative collections are implemented. ● We'll start with SortedDictionary, which was introduced in the last lecture. We already know about SortedDictionary that – it allows only one value per key, – it organizes its keys by equivalence, – a .Net type provides equivalence by implementing IComparable or by delegating equivalence to objects implementing IComparer. ● Ok, the organization of keys is based on equivalence, but when a key is in a SortedDictionary, where does it have to "be"? – I.e. where and how will it be stored? – We're going to understand this in this lecture! Implementation of Associative Collections
  • 4. 4 4 ● In contrast to indexed/sequential collections, items in associative collections are got and set by analyzing their relationships. ● The idea of SortedDictionary is to keep the contained items sorted by key whilst adding items! – So, its way of key organization is the analysis of the sort order of the items' keys. ● To make this happen, SortedDictionary is implemented with a tree-like storage organization for the keys. – When we analyzed algorithms we learned that the most efficient sorting algorithms can be visualized and analyzed with trees. ● I.e. those implementing "divide and conquer", like mergesort and quicksort. ● Before we discuss SortedDictionary's tree-based implementation, we're going to talk about alternative implementations. ● Sidebar: Java's implementation of a sorted associative collection is called TreeMap. It implements the interface Map. Associative Collections – What's in a SortedDictionary?
  • 5. 5 5 ● SortedDictionary could be implemented with a sorted list. – Finding would be fast (O(log n)), but inserting would be costly (O(n), because items have to be shifted and the list may have to be resized). – It is simple to code. ● SortedDictionary could be implemented with a sorted linked list. – Inserting would be fast if the location is known (O(1)), but finding would be costly (O(n), finding involves the iteration of a linked list). – More memory per item is required (one item plus a few references), whereas (sorted) lists need just the item itself, due to contiguous memory. ● Problems of these approaches: – For lists contiguous memory is the problem. → It makes the data structure rigid. → Resizing required. – For linked lists finding items is a problem due to the loose structure. → "Linked iteration" ● Let's finally discuss how we come from sorted lists and sorted linked lists to the real implementation of SortedDictionary. [Figure: the names "Clark", "Ed", "Gil", "Helen" and "James" held in a sorted list and in a sorted linked list] SortedDictionary – two alternative Implementations
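The sorted-list alternative from slide 5 can be sketched in Java as follows. This is only an illustration under assumptions: the class name `SortedListSketch` and the method `insertSorted` are hypothetical and not part of the lecture. Finding the position is a binary search (O(log n)), but `add(pos, key)` has to shift all following items (O(n)).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: a sorted array-backed list as storage for keys.
class SortedListSketch {
    // Inserts key into an already sorted list and keeps the list sorted.
    static void insertSorted(List<String> sorted, String key) {
        // binarySearch returns the index of the key, or -(insertionPoint) - 1.
        int pos = Collections.binarySearch(sorted, key);
        if (pos < 0) {
            pos = -(pos + 1);
        }
        // Cheap to find the position, costly to shift the following items.
        sorted.add(pos, key);
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList("Clark", "Ed", "Helen", "James"));
        insertSorted(names, "Gil");
        System.out.println(names);
    }
}
```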
  • 6. 6 6 ● When we examine a sorted linked list we'll find an interesting way of thinking about finding items. – If we had a reference to the middle item, we could determine the "direction", i.e. the half into which a new item must be inserted to keep the linked list sorted. – Therefore it would be nice to allow moving in both directions, so we should use a sorted doubly linked list (3rd alternative)! – The idea of referencing only the middle item of an already sorted doubly linked list can be recursively applied to the sorted halves. – The resulting structure is a tree, upon which the real implementation of SortedDictionary is built! [Figure: the names "Bud", "Clark", "Ed", "Gil", "Helen", "James" and "Will" as a sorted doubly linked list and as a tree, together with the new item "Kyle"] SortedDictionary – Coming to the real Implementation
  • 7. 7 7 ● Trees represent one of the many data structures available in cs (others are graphs, relational, heap, stack etc.) – Similar to collections data structures can be categorized in different ways, e.g. a tree is a special form of a graph. ● Basic features of trees: – Trees are recursive branching structures consisting of linked nodes. – Trees have only one root node. ● In cs the root of a tree is written on the top at graphical representations. – A node can be reached by exactly one path from the root. ● This is only true for linear paths, in xpath there exist various expressions to select any node from the root. ● Trees represent data that has a hierarchy. This is very common in computer business. – In the last example the hierarchy was the order: left => less than root, right => greater than root. – Directory and file structures. – A function call hierarchy: a call stack. – "Has a" and "is a" associations in object-oriented systems. – The structure of an XML file. Trees in Computer Science (cs)
  • 8. 8 8 ● The final solution we found to handle sorted collections is a special form of a tree, a binary search tree (BST). – The implementation of SortedDictionary is based on a BST (BST-based). – The core strategy of BSTs is to maintain pointers/references to the middle of subtrees. ● Here is some general terminology on trees (seen from "James"): – "James" is a node or cell. The arrows can be called edges. – "James" has two children. ● Below "James" there is a subtree of two children. – "James"' parent is "Gil". – "Gil" is the root of the tree. ● In computer science trees are drawn upside down, having the root at the top. – "Will" is a leaf, i.e. it has no children at all. ● A node having only one child is sometimes called half-leaf. – "Clark" and "James" are siblings. ● The BST is called "binary", because each node has at most two children. – A binary search can be issued on a BST in a very simple manner. [Figure: the BST with the root "Gil" and the nodes "Bud", "Clark", "Ed", "Helen", "James" and "Will"] Trees – Terminology ● There also exist ternary, n-ary etc. trees.
  • 9. 9 9 ● Basically all operations on trees involve recursion! ● The iteration of a tree is called traversal. It means to visit every node in the tree from a certain starting node. – (1) Do something with the current node. – (2) If the left (or right node) is not null make it the current node and continue with (1). → Recurse! ● There are three classical ways to traverse (binary) trees: – Inorder: visit the left node, then the root and finally the right node. – Postorder: visit the left node, then the right node and finally the root. – Preorder/depth-first: visit the root, then the left node and finally the right node. ● The most important traversal for BSTs is inorder, because it visits the nodes in sorted order. – Notice, how the way of traversal could be encapsulated behind the iterator design pattern! public class Node { // Mind the recursive definition! public string Value { get; set;} public Node Left { get; set;} public Node Right { get; set;} } public static void PrintTreePostorder(Node root) { if (null != root) { PrintTreePostorder(root.Left); // Recurse! PrintTreePostorder(root.Right);// Recurse! Console.WriteLine(root.Value); } } public static void PrintTreeInorder(Node root) { if (null != root) { PrintTreeInorder(root.Left); // Recurse! Console.WriteLine(root.Value); PrintTreeInorder(root.Right); // Recurse! } } public static void PrintTreePreorder(Node root) { if (null != root) { Console.WriteLine(root.Value); PrintTreePreorder(root.Left); // Recurse! PrintTreePreorder(root.Right); // Recurse! } } Traversal of Trees ● The deletion of all nodes in a BST needs to be done postorder. ● Among others there also exists "level-order", which is also called "breadth-first", it iterates all nodes beginning at the root from left to right.
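The slide's notes mention level-order (breadth-first) traversal without showing it. A minimal Java sketch could look like the following. The class `LevelOrderSketch`, its `Node` type and the sample tree are assumptions made for illustration; the tree shape is inferred from the terminology slide ("Gil" as root, "Clark" and "James" as siblings). Unlike the three recursive traversals, this one is usually written iteratively with a queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch of a level-order (breadth-first) traversal.
class LevelOrderSketch {
    static final class Node {
        final String value;
        final Node left;
        final Node right;

        Node(String value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    // Visits the nodes level by level, each level from left to right.
    static List<String> levelOrder(Node root) {
        List<String> visited = new ArrayList<>();
        Queue<Node> pending = new ArrayDeque<>();
        if (root != null) {
            pending.add(root);
        }
        while (!pending.isEmpty()) {
            Node current = pending.remove();
            visited.add(current.value);
            // Children are queued and visited after the current level.
            if (current.left != null) { pending.add(current.left); }
            if (current.right != null) { pending.add(current.right); }
        }
        return visited;
    }

    // Assumed sample tree with "Gil" as root.
    static Node sampleTree() {
        return new Node("Gil",
            new Node("Clark", new Node("Bud", null, null), new Node("Ed", null, null)),
            new Node("James", new Node("Helen", null, null), new Node("Will", null, null)));
    }

    public static void main(String[] args) {
        System.out.println(levelOrder(sampleTree()));
        // prints [Gil, Clark, James, Bud, Ed, Helen, Will]
    }
}
```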
  • 10. 10 10 ● So SortedDictionary is implemented with a BST. – The string that was used in each node in former examples, plays the role of SortedDictionary's search key. – The nodes of SortedDictionary have one more field: the value that is associated with the search key. ● Using a BST in the implementation allows fast implementations to find items. – E.g. the methods Add() and ContainsKey() apply a binary search on SortedDictionary's BST to locate items. – The involved search algorithms are implemented recursively. ● Complexity. – The costs for finding one item only depends on the height of the tree. – We've already discussed that a tree's height can be retrieved by log n. – This is valid for perfectly balanced trees. – Other costs like swapping points will not be taken into consideration as always. – This makes the complexity for finding (and related operations) O(log n) for BSTs. ● Max. ten comparisons to find an item in 1000 items. null"Clark" "Ed" null "Helen" "James"null null "Gil" "Will" nullnullnull "Bud" null "Kyle" "Kyle" (Node) 8758 value null root * left * right "Gil" key SortedDictionary – final Words ● Looking at the key we know on which half an item has to be inserted or found. (left => less than root; right => greater than root, this makes divide and conquer work). ● BSTs can be kept balanced internally to maintain logarithmic behavior, this yields an additional cost factor.
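To make the role of key and value in a BST node concrete, here is a minimal Java sketch of a BST-based dictionary. It is not the actual implementation of SortedDictionary or TreeMap: the class `BstSketch` and its methods are hypothetical, and the sketch does no balancing, so the O(log n) cost only holds while the tree happens to stay balanced.

```java
// Hypothetical sketch: an unbalanced BST that maps String keys to int values.
class BstSketch {
    private static final class Node {
        final String key;
        int value;
        Node left;
        Node right;

        Node(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    private Node root;

    // Adds a key-value-pair or overwrites the value of an existing key.
    void put(String key, int value) {
        root = put(root, key, value);
    }

    private static Node put(Node node, String key, int value) {
        if (node == null) {
            return new Node(key, value);
        }
        int order = key.compareTo(node.key);
        if (order < 0) {
            node.left = put(node.left, key, value);   // less than: go left
        } else if (order > 0) {
            node.right = put(node.right, key, value); // greater than: go right
        } else {
            node.value = value;                       // equivalent key: overwrite
        }
        return node;
    }

    // Returns the value for key, or null if the key is not present.
    Integer get(String key) {
        return get(root, key);
    }

    private static Integer get(Node node, String key) {
        if (node == null) {
            return null;
        }
        int order = key.compareTo(node.key);
        if (order < 0) { return get(node.left, key); }
        if (order > 0) { return get(node.right, key); }
        return node.value;
    }

    public static void main(String[] args) {
        BstSketch phoneBook = new BstSketch();
        phoneBook.put("Gil", 7764);
        phoneBook.put("Bud", 3821);
        phoneBook.put("Kyle", 8758);
        System.out.println(phoneBook.get("Kyle")); // prints 8758
    }
}
```

Each comparison discards one subtree, which is the divide and conquer step the slide refers to.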
  • 11. 11 11 Associative Collections – A completely different Strategy ● Up to now we discussed associative collections using equivalence to manage keys. – We used a BST and got logarithmic cost for inserting and finding elements. - This is pretty fast! – An interesting point is that the organization of BSTs is concretely imaginable/understandable/drawable. ● Now we're going to discuss associative collections using equality to manage keys, which is really cool! – Instead of keeping the associative collection sorted, we manage a set of collections that collect items having something in common. – These individual collections are then called buckets. ● E.g. an associative collection telephoneDictionary: for each bucket the contained strings will have the same first character. – (This is exactly how "real" dictionaries work, where there are tabs collecting words starting with the same character. The tabs play the role of the buckets.) – This type of associative collection is called hashtable. – A so called hash-function maps a key to a hash-code. ● In this case the hash-function maps a string (= the key) to its first letter (= the hash-code). – The hash-function determines, in which bucket to find or insert a key (or key-value-pair). ● When the bucket is determined, a specific operation takes only place on the items in that bucket. telephoneDictionary (Bucket) "Bud""B" "Ben" (Bucket) "Gil""G" (Bucket) "Will""W" "Walt" 3821 9427 7764 4689 1797 public class Bucket { public string HashCode { get; set; } public IList<KeyValuePair<string, int>> Items { get; set; } }
  • 12. 12 12 Associative Collections – Hashtable-based Implementations - Keys ● OK, but how will an implementation using a hashtable help? - To understand it, let's dissect the hashtable, starting with the keys. ● The idea of a hashtable is to have a hash-function that calculates a position in a table from the input item. – How can we have a function that is able to do that for an item? – The idea is to have a method of an item-object, more specifically the key, that calculates this position. ● In the frameworks Java and .Net each UDT inherits a base type (Object in both frameworks), forming a cosmic hierarchy. – In Java a UDT can override Object's method hashCode() in order to calculate the hash-code. – In .Net a UDT can override Object's method GetHashCode() in order to calculate the hash-code. – Now we have hash-functions/methods to "hash" a name string to the int value of the first letter (incl. some error handling). – We can use instances of NameKey as keys for hashtables! // C# public class NameKey { public string Name { get; set; } public override int GetHashCode() { // Return int-value of first letter. return string.IsNullOrEmpty(Name) ? 0 : Name[0]; } } // Java public class NameKey { // (members hidden) public String name; // (getter/setter hidden) @Override public int hashCode() { // Return int-value of first letter. return (null == getName() || getName().isEmpty()) ? 0 : getName().charAt(0); } } ● Java's and .Net's inherited equals()/Equals() (inherited from Object, esp. for .Net reference types) will just compare references. It means that equals()/Equals() will only return true if the same object is compared to itself, so the inherited implementations do an identity comparison. ● Java's inherited hashCode() (inherited from Object) typically returns a number derived from the object's identity (often described as its address in memory). .Net's inherited GetHashCode() (inherited from Object, i.e. for .Net reference types) likewise returns a number tied to the object's identity; neither number should be relied upon to be unique.
  • 13. 13 13 Associative Collections – Hashtable-based Implementations - .Net's Dictionary ● In .Net we can now use NameKey as type for the key in an IDictionary that is implemented in terms of a hashtable. – The hashtable-based implementation of IDictionary is simply called Dictionary! It can (of course) be used like any other IDictionary: – When GetHashCode() is overridden in the key's type, the hashtable will automatically work: ● When those items are added into the telephoneDictionary, following will happen: – (1) For the key NameKey{ Name = "Bud" } the hash-code is retrieved. GetHashCode() returns 66. – (2) In the hashtable-array (here an int[]) the entry on the index 66 is fetched: ● If there is an empty entry at 66, it will be set to the value 3821, ● otherwise the value at the existing entry at 66 will be overwritten with the value 3821. – Another key (like NameKey{ Name = "Gil" }) yields another hash-code and another index in the hashtable. – The operations in (2) are O(1) operations, because accessing indexed collections (like arrays) on indexes only cost O(1). ● Oversimplification! There's more and it's getting more contrived! And we've to understand it to get the whole picture! IDictionary<NameKey, int> telephoneDictionary = new Dictionary<NameKey, int>(); telephoneDictionary[new NameKey{ Name = "Bud" }] = 3821; // GetHashCode() returns 66. telephoneDictionary[new NameKey{ Name = "Gil" }] = 7764; // GetHashCode() returns 71. telephoneDictionary hashTable : int[] 382166 ...... 776471 ...... ...... 38213821 adding these objects (NameKey) "Bud" 3821 (NameKey) "Gil" 7764 ● What we discuss here is a sparse array implementation of a hashtable, in which many slots of the table remain empty, whilst only few slots of the table with the indexes matching the hash-code are filled. 
Real-world implementations of hashtable are much more compact (usually the index is calculated as "index = hash-code % hashtable-length"), but we're not going to discuss them in this course, because the mere idea is the same. ● We didn't discuss the implementation of hashCode()/GetHashCode() in great depth, esp. if many fields are to be taken into consideration in the hash-code. Many suggestions involve the combination of the hash-codes with xor operations in order to get a good distribution, including the multiplication with magic constants (often prime numbers), which may result in yet a better distribution, or retrieving hash-codes of different field-types in different ways. That's all fine, but many of these tips rely on special assumptions of the platform and the implementation. Using a simple xor combination is OK for most cases and magic should only be taken into consideration if performance- (i.e. lookup-) problems are present. Don't be too clever!
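The two ideas from these notes, "index = hash-code % hashtable-length" and a simple xor combination of field hash-codes, can be sketched in Java like this. The names `BucketIndexSketch`, `bucketIndex` and `combinedHash` are hypothetical. One assumption worth stating: hash-codes in Java can be negative, so the sketch uses `Math.floorMod` instead of the plain `%` operator to always get a valid index.

```java
// Hypothetical sketch: compact bucket index and a simple xor hash combination.
class BucketIndexSketch {
    // Maps any int hash-code (also negative ones) to an index in [0, tableLength).
    static int bucketIndex(int hashCode, int tableLength) {
        return Math.floorMod(hashCode, tableLength);
    }

    // Combines the hash-codes of two fields with xor.
    static int combinedHash(String firstName, String lastName) {
        return firstName.hashCode() ^ lastName.hashCode();
    }

    public static void main(String[] args) {
        // 'B' has the int value 66; with 16 slots it lands at index 2.
        System.out.println(bucketIndex('B', 16));                       // prints 2
        System.out.println(bucketIndex(combinedHash("Bud", "Fox"), 16));
    }
}
```

A known weakness of plain xor is that swapping the fields yields the same hash-code (and two equal fields cancel out to 0), which only costs performance, not correctness.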
  • 14. 14 14 Associative Collections – Hashtable-based Implementations – Hash Collisions and Buckets ● But there is a problem: what if we try adding keys having the same hash-code, e.g. names with the same first letter? – The thing that happens here is called hash collision. To handle hash collisions in a hashtable we have to review the idea of Buckets. ● Actually a hashtable is an array of Bucket objects (Not just an int[]!). – For each hash-code there exists one Bucket. – The hash-code is the index of the hashtable, where that hash-code's Bucket resides. – Within a Bucket all inserted items with a key having the same hash-code are collected as key-value-pairs. ● Hm... wait! If a hash collision leads to putting items into the same Bucket, how can the associated value be retrieved from the hashtable in a definite way? – In other words: the hashtable story is even more complex! telephoneDictionary[new NameKey{ Name = "Bud" }] = 3821; // GetHashCode() returns 66. telephoneDictionary[new NameKey{ Name = "Ben" }] = 9427; // GetHashCode() also returns 66! telephoneDictionary hashTable : Bucket[] (Bucket)66 ...... (Bucket)71 ...... ...... (Bucket) (Key : NameKey) "Bud" (Key : NameKey) "Ben" 66 hashCode (Bucket) (Key : NameKey) "Gil" 71 hashCode (Value : int) 3821 (Value : int) 9427 (Value : int) 7764 adding these objects (NameKey) "Bud" 3821 (NameKey) "Ben" 9427
  • 15. 15 15 Associative Collections – Hashtable-based Implementations – Equality ● As a matter of fact hash collisions can not be avoided, because any objects could have the same hash-code! – E.g. two names could have the same first letter. This fact is not under our control! ● The answer is that all the key-value-pairs of a hash-code-matching Bucket need to be searched for a certain key. – The keys of a hash-code-matching Bucket are linearly searched and compared for equality to find the exactly matching key. – But what means "exactly matching key" in opposite to a (just) hash-code-matching key? – We have to prepare our key-type to provide another method to check for the exact equality of two keys! – Similar to objects providing hash-codes, Java and .Net want us overriding the method equals()/Equals() inherited from Object: ● Let's discuss how this works! public class NameKey { // (members hidden) public override int GetHashCode() { /* pass */ } public override bool Equals(object other) { // Simple and naïve implementation of Equals(): NameKey otherNameKey = (NameKey)other; return Name == otherNameKey.Name; } } public class NameKey { // (members hidden) @Override public int hashCode() { /* pass */ } @Override public boolean equals(Object other) { // Simple and naïve implementation of equals(): NameKey otherNameKey = (NameKey) other; return getName().equals(otherNameKey.getName()); } }
  • 16. 16 16 Associative Collections – Hashtable Implementations – Implementing Equality ● .Net: Mechanics of comparing two NameKey objects with the method Equals(): – Equals() has a parameter that is of the very base type Object (static type). – We assume the dynamic type behind the passed argument is NameKey (only NameKeys can be equality-compared to each other). ● We can cast the parameter other to NameKey blindly. (And this can be a problem, which we'll discuss shortly.) – When we've the NameKey behind the parameter, we equality-compare the argument's Name property against this' Name property. ● Mind that we used the operator== to compare the Name properties (well, we could have used Equals() as well); those are of type string. ● Equals() makes a deeper (more expensive) comparison than GetHashCode(), the latter only deals with Name's first letter: public class NameKey { // (members hidden) public override bool Equals(object other) { NameKey otherNameKey = (NameKey)other; return this.Name == otherNameKey.Name; } } // Both hash-codes evaluate to 66, but these objects are not equal! NameKey key1 = new NameKey{ Name = "Bud" }; NameKey key2 = new NameKey{ Name = "Ben" }; bool hashCodesAreTheSame = key1.GetHashCode() == key2.GetHashCode(); // evaluates to true bool areNotEqual = key1.Equals(key2); // evaluates to false public class NameKey { // (members hidden) public override bool Equals(object other) { NameKey otherNameKey = (NameKey)other; return this.Name == otherNameKey.Name; } } public class NameKey { // (members hidden) public override int GetHashCode() { return string.IsNullOrEmpty(this.Name) ? 0 : this.Name[0]; } }
  • 17. 17 17 Associative Collections – Hashtable-based Implementations – Rules of Equality ● We'll not discuss all facets of the implementation of GetHashCode() and Equals(), but here are the most important rules: – Two objects having the same hash-code need not return true for calling Equals()! – But, if two objects having the same dynamic type return true for calling Equals(), then those need to have the same hash-code! – GetHashCode() and Equals() have to return the same results for the "structurally" same objects unless one of them is modified. – GetHashCode() and Equals() should be implemented to work very fast. – Neither GetHashCode() nor Equals() is allowed to throw exceptions! ● If null is passed to Equals() the result has to be false. ● Finally GetHashCode() and Equals() for NameKey could be implemented like so to fulfill these rules: public class NameKey { // (members hidden) public override int GetHashCode() { return string.IsNullOrEmpty(Name) ? 0 : Name[0]; } public override bool Equals(object other) { // Stable implementation of Equals() if (this == other) { return true; } if (null != other && GetType() == other.GetType()) { return Name == ((NameKey)other).Name; } return false; } } - checks for identity - checks for nullity (to avoid exceptions) - checks the dynamic type of this and the other object - the cast is type safe - the Name properties of both objects are equality-compared
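For comparison, the same rules can be followed in Java. The following is a sketch under assumptions, not the lecture's own code: the constructor and the immutable `name` field are choices made here for brevity, and `Objects.equals` is used so that a null name cannot cause an exception.

```java
import java.util.Objects;

// Sketch of a Java NameKey whose equals()/hashCode() follow the rules above.
final class NameKey {
    private final String name;

    NameKey(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    @Override
    public int hashCode() {
        // Level one equality: the int value of the first letter.
        return (name == null || name.isEmpty()) ? 0 : name.charAt(0);
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;                       // checks for identity
        }
        if (other == null || getClass() != other.getClass()) {
            return false;                      // checks nullity and the dynamic type
        }
        // The cast is type safe now; level two equality compares the whole name.
        return Objects.equals(name, ((NameKey) other).name);
    }

    public static void main(String[] args) {
        NameKey bud = new NameKey("Bud");
        NameKey ben = new NameKey("Ben");
        System.out.println(bud.hashCode() == ben.hashCode()); // prints true
        System.out.println(bud.equals(ben));                  // prints false
    }
}
```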
  • 18. 18 18 Associative Collections – Hashtable-based Implementations – The Lookup Algorithm ● All right! Now we have the methods GetHashCode() and Equals() in place. But how do hashtables use these tools to work? ● Let's assume the following content in the hashtable-based Dictionary telephoneDictionary: – Then we'll look up/search the phone number of "Bud": ● The lookup will basically initiate the following algorithm: – Dictionary will call GetHashCode() on the indexer's (i.e. operator[]) argument 'new NameKey{ Name = "Bud" }' and the result is 66. – Dictionary will get the Bucket at the hashtable's index 66. This Bucket has two entries! – Dictionary will call 'Equals(new NameKey{ Name = "Bud" })' against each key of the key-value-pairs in the just returned Bucket. – The value of the key-value-pair for which the key was equal to 'new NameKey{ Name = "Bud" }' will be returned. ● Read these steps multiple times and make sure you have understood them! Now we have to discuss some details... – The algorithms to insert or update a key-value-pair work the same way as for lookups! [Figure: telephoneDictionary's hashTable (a Bucket[]) with the Bucket at index 66 holding "Bud" → 3821 and "Ben" → 9427; the key "Bud" is being searched] IDictionary<NameKey, int> telephoneDictionary = new Dictionary<NameKey, int> { { new NameKey { Name = "Bud" }, 3821 }, { new NameKey { Name = "Ben" }, 9427 } }; int no = telephoneDictionary[new NameKey { Name = "Bud" }];
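The lookup steps above can be traced in a small Java sketch of a hashtable with buckets. Everything in it is an assumption made for illustration: the class `ChainedTableSketch`, its `Entry` type and the deliberately weak `firstLetterHash` function (which mimics the first-letter hash-code of NameKey, but works on plain String keys). Real implementations such as .Net's Dictionary or Java's HashMap are organized differently in detail.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a hashtable as a list of buckets with linear search per bucket.
class ChainedTableSketch {
    private static final class Entry {
        final String key;
        int value;

        Entry(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    private final List<List<Entry>> buckets = new ArrayList<>();

    ChainedTableSketch(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Deliberately weak hash-function: the int value of the first letter.
    static int firstLetterHash(String key) {
        return key.isEmpty() ? 0 : key.charAt(0);
    }

    // Step 1 and 2: calculate the hash-code and fetch the matching bucket.
    private List<Entry> bucketFor(String key) {
        return buckets.get(firstLetterHash(key) % buckets.size());
    }

    // Insert and update use the same two steps as the lookup.
    void put(String key, int value) {
        List<Entry> bucket = bucketFor(key);
        for (Entry entry : bucket) {
            if (entry.key.equals(key)) {
                entry.value = value;
                return;
            }
        }
        bucket.add(new Entry(key, value));
    }

    // Step 3 and 4: search the bucket linearly for the equal key, return its value.
    Integer get(String key) {
        for (Entry entry : bucketFor(key)) {
            if (entry.key.equals(key)) {
                return entry.value;
            }
        }
        return null;
    }

    int bucketSize(String key) {
        return bucketFor(key).size();
    }

    public static void main(String[] args) {
        ChainedTableSketch telephoneDictionary = new ChainedTableSketch(128);
        telephoneDictionary.put("Bud", 3821);
        telephoneDictionary.put("Ben", 9427);
        System.out.println(telephoneDictionary.get("Bud"));        // prints 3821
        System.out.println(telephoneDictionary.bucketSize("Bud")); // prints 2
    }
}
```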
  • 19. 19 19 Associative Collections – Hashtable-based Implementation – best Case Complexity ● Although the algorithm to look up keys is complex, the analysis of a hashtable's complexity is really simple! ● The best case yields a complexity of O(1)! - Wow! – If every item can be associated to exactly one distinct Bucket each, we have one item in every Bucket: a 1 : 1 association between items and Buckets. – This means that every item has a distinct hash-code and thus a distinct index in the hashtable. – As the hashtable is an array and arrays can access their items by index with O(1) complexity, we have the best case for hashtable! ● In the best case, accessing a hashtable means accessing an array by index with constant complexity. – This is better than O(log n) for tree-based associative collections! – This is the best we can get for collections! ● But there is also a worst case! [Figure: telephoneDictionary's hashTable (a Bucket[]) with one Bucket each at index 66 ("Bud" → 3821), index 71 ("Gil" → 7764) and index 87 ("Will" → 4689)]
  • 20. 20 20 Associative Collections – Hashtable-based Implementations – worst Case Complexity ● The worst case yields a complexity of O(n)! - Oh no! – If every item is associated to the same Bucket, we have all items in only one single Bucket: an n : 1 association between items and Buckets. – This means that every item has the same hash-code and thus the same index in the hashtable. – As this single Bucket needs to be searched linearly to find the equal key, the worst complexity boils down to the linear complexity O(n)! ● In the worst case the workload is moved to a single Bucket that must be searched in a linear manner. [Figure: telephoneDictionary's hashTable (a Bucket[]) with a single Bucket at index 66 holding "Bud" → 3821, "Ben" → 9427 and "Betty" → 8585] ● Java's java.util.concurrent.ConcurrentHashMap is able to switch the storage of the buckets from a List-implementation to a sorted tree implementation, if the buckets grow too large (i.e. too many keys with the same hash-code). The benefit is that searching a bucket will be more efficient then (O(n) for List, but O(log n) for sorted trees).
  • 21. 21 21 Associative Collections – Hashtable-based Implementations – Controlling Performance – Part I ● An interesting point when working with .Net's Dictionary is how we can influence the performance as developers: – We can override GetHashCode() and Equals() for our own UDTs being used as keys and this is what we are going to discuss now. ● Concretely we have a problem with the UDT NameKey: we'll get the same hash-code for keys having the same first letter! – However, we can override GetHashCode(), so that a "deeper" or in other words "more distinct" hash-code will be produced. – .Net's string is able to produce its own hash-code that is calculated from the whole string-value and not only from the first letter: ● We already know that each .Net type inherits Object and can override GetHashCode() and Equals(). – A type implementing equality in the .Net framework needs to override GetHashCode() and Equals()! ● Think: GetHashCode() => level one equality, Equals() => level two equality. – Many types of the .Net framework provide useful overrides of GetHashCode() and Equals() to implement equality. – (GetHashCode() is not only used with Dictionaries but also in other places of the .Net framework.) – (A type implementing equivalence in the .Net framework needs to implement IComparable or needs another type implementing IComparer.) public class NameKey { public string Name {get; set;} public override int GetHashCode() { // Return int-value of first letter. return string.IsNullOrEmpty(Name) ? 0 : Name[0]; } } public class NameKey { public string Name {get; set;} public override int GetHashCode() { // Return the hash-code of the string Name. return string.IsNullOrEmpty(Name) ? 0 : Name.GetHashCode(); } } ● switch-case with strings in C# and Java uses the strings' hash-code to do the comparisons. ● It should be mentioned that some methods of list-types also use equality to function, e.g. methods like Contains(), Remove() or IndexOf().
  • 22. 22 22 NO! Stop it! We're not going to discuss it here! As a matter of fact implementing equality correctly in .Net and Java is not simple and it is often done downright wrong and potentially dangerous! I have one simple tip: Don't be too clever and follow the rules! Ok, we'll discuss it in depth in a future lecture. ***Under Construction – Equality – Under Construction*** ● An interesting point is that each object in Java/.Net implements equals()/Equals() and hashCode()/GetHashCode(). – But overrides of those methods need to follow certain rules. Let's discuss those.
  • 23. 23 23 Associative Collections – Hashtable-based Implementations – Controlling Performance – Part II ● The last implementation of NameKey does basically delegate equality fully to its property Name, which is of type string. – Hm! - To drive this point home: we can get rid of the type NameKey and use string instead as key-type! Hey! ● As a matter of fact, it is often not required to program extra UDTs as key types! – The present .Net types are often sufficient to be used as keys for hashtables. Most often int and string are used as key types. – Nevertheless, .Net (and Java) developers have to know, how equality needs to be implemented correctly! ● In principle the implementation pattern and even the idiom for equality works the same way in Java! // A dictionary that associates strings (names) to ints (phone numbers): IDictionary<string, int> telephoneDictionary = new Dictionary<string, int>(); // Adding two string-int key-value-pairs: telephoneDictionary["Bud"] = 3821; telephoneDictionary["Gil"] = 7764; // Looking up the phone number of "Bud": int no = telephoneDictionary["Bud"];
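The slide states that the idiom works the same way in Java. A minimal sketch with `java.util.HashMap` and plain String keys could look like this; the class name `TelephoneDictionarySketch` and the helper `build()` are hypothetical. Note one difference to the C# indexer: `Map.get()` returns null for an unknown key instead of throwing an exception.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: String already implements hashCode()/equals(), so no extra key type is needed.
class TelephoneDictionarySketch {
    // A dictionary that associates Strings (names) to Integers (phone numbers).
    static Map<String, Integer> build() {
        Map<String, Integer> telephoneDictionary = new HashMap<>();
        telephoneDictionary.put("Bud", 3821);
        telephoneDictionary.put("Gil", 7764);
        return telephoneDictionary;
    }

    public static void main(String[] args) {
        Map<String, Integer> telephoneDictionary = build();
        // Looking up the phone number of "Bud":
        System.out.println(telephoneDictionary.get("Bud"));  // prints 3821
        // Looking up an unknown key yields null:
        System.out.println(telephoneDictionary.get("Walt")); // prints null
    }
}
```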
  • 24. Associative Collections – Sets
● Without any further explanation it should be clear that manually maintaining a collection having only unique items is very painful.
– E.g. collecting only the surnames from a deck of invitations. It could be done by filling a list w/o duplicates:
    // Java
    List<String> allSurnames = Arrays.asList("Taylor", "Miller", "Taylor", "Miller");
    List<String> uniqueSurnames = new ArrayList<>();
    for (String surName : allSurnames) {
        if (!uniqueSurnames.contains(surName)) { // Adds only items that are not yet present.
            uniqueSurnames.add(surName);
        }
    }
    for (String surName : uniqueSurnames) {
        System.out.println(surName);
    }
    // > Taylor
    // > Miller
● There is a variant of associative collections that just avoids the presence of duplicates. Those are often called sets.
– Sets are associative collections like dictionaries, in which the keys are the values!
– In Java/.Net we can use BST-based (TreeSet/SortedSet) and hashtable-based (HashSet/HashSet) implementations of sets.
– In C++ we can use either the BST-based std::set (<set>) or the hashtable-based std::unordered_set (C++11: <unordered_set>).
    // Implementation using a Set:
    Set<String> uniqueSurnames2 = new HashSet<>();
    for (String surName : allSurnames) {
        uniqueSurnames2.add(surName);
    }
    for (String surName : uniqueSurnames2) {
        System.out.println(surName);
    }
● As for dictionaries (and all associative collections), the reason for using sets is to exploit their self-organization (keeping items unique) instead of organizing the items ourselves.
  • 25. Associative Collections – Operations on Sets – Part I
● Set collections can really represent mathematical sets; this also includes operations on sets (e.g. via .Net's ISet interface):
● Subsets: B⊆A, B⊂A
  A={1,2,3,4,5,6,7,8,9}; B={3,6,8}
    // C#/.Net
    ISet<int> A = new HashSet<int>{ 1, 2, 3, 4, 5, 6, 7, 8, 9 };
    ISet<int> B = new HashSet<int>{ 3, 6, 8 };
    bool BSubSetOfA = B.IsSubsetOf(A);
    // >true
    bool BProperSubSetOfA = B.IsProperSubsetOf(A);
    // >true

    // Java
    Set<Integer> A = new HashSet<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9));
    Set<Integer> B = new HashSet<>(Arrays.asList(3, 6, 8));
    boolean BSubSetOfA = A.containsAll(B);
    // >true
    // Java's Set doesn't provide a dedicated test for _proper_ subsets.
● Union: A∪B
  A={1,2,3,4,5,6}; B={4,5,6,7,8,9}; {1,2,3,4,5,6}∪{4,5,6,7,8,9}={1,2,3,4,5,6,7,8,9}
    // C#/.Net
    ISet<int> A = new SortedSet<int>{ 1, 2, 3, 4, 5, 6 };
    ISet<int> B = new SortedSet<int>{ 4, 5, 6, 7, 8, 9 };
    A.UnionWith(B); // The set A will be _modified_!
    // >{1, 2, 3, 4, 5, 6, 7, 8, 9}
    // LINQ's Union() extension method will create a _new_ sequence:
    ISet<int> A2 = new SortedSet<int>{ 1, 2, 3, 4, 5, 6 };
    ISet<int> B2 = new SortedSet<int>{ 4, 5, 6, 7, 8, 9 };
    IEnumerable<int> A2UnionB2 = A2.Union(B2);
    // >{1, 2, 3, 4, 5, 6, 7, 8, 9}

    // Java
    Set<Integer> A = new TreeSet<>(Arrays.asList(1, 2, 3, 4, 5, 6));
    Set<Integer> B = new TreeSet<>(Arrays.asList(4, 5, 6, 7, 8, 9));
    boolean AWasModified = A.addAll(B); // The set A will be _modified_!
    // >true
    System.out.println(A);
    // >[1, 2, 3, 4, 5, 6, 7, 8, 9]
● A proper subset means that a set is a subset of another set, but both sets are not equal!
● Subsets can also be expressed with LINQ, but usually the type ISet and its implementors should be used, because this leads to more expressive code:
    // C#/.Net/LINQ
    ISet<int> A = new HashSet<int>{ 1, 2, 3, 4, 5, 6, 7, 8, 9 };
    ISet<int> B = new HashSet<int>{ 1, 2, 3, 4, 5, 6, 7, 8, 9 };
    ISet<int> C = new HashSet<int>{ 3, 6, 8 };
    // Subset:
    bool BIsSubsetOfA = !B.Except(A).Any();
    // >true
    bool CIsSubsetOfA = !C.Except(A).Any();
    // >true
    // Proper Subset (this check is only sufficient, because B and C are already known to be subsets of A):
    bool BIsProperSubSetOfA = A.Except(B).Any();
    // >false
    bool CIsProperSubSetOfA = A.Except(C).Any();
    // >true
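Because Java's Set offers no dedicated test for proper subsets, such a test has to be written by hand. The following is a minimal sketch; the class ProperSubsetDemo and its methods are hypothetical helpers made up for this illustration. It combines containsAll() with a comparison of the sizes: a subset with fewer items than the other set can't be equal to it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ProperSubsetDemo {
    // True, if each item of candidate is also contained in superset.
    static <T> boolean isSubset(Set<T> candidate, Set<T> superset) {
        return superset.containsAll(candidate);
    }

    // True, if candidate is a subset of superset, but both sets are not equal.
    static <T> boolean isProperSubset(Set<T> candidate, Set<T> superset) {
        return candidate.size() < superset.size() && superset.containsAll(candidate);
    }

    public static void main(String[] args) {
        Set<Integer> A = new HashSet<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9));
        Set<Integer> B = new HashSet<>(Arrays.asList(3, 6, 8));
        System.out.println(isSubset(B, A));       // >true
        System.out.println(isProperSubset(B, A)); // >true
        System.out.println(isProperSubset(A, A)); // >false
    }
}
```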
  • 26. Associative Collections – Operations on Sets – Part II
● Difference: A∖B
  A={1,2,3,4,5,6}; B={4,5,6,7,8,9}; {1,2,3,4,5,6}∖{4,5,6,7,8,9}={1,2,3}
    // C#/.Net
    ISet<int> A = new SortedSet<int>{ 1, 2, 3, 4, 5, 6 };
    ISet<int> B = new SortedSet<int>{ 4, 5, 6, 7, 8, 9 };
    A.ExceptWith(B); // The set A will be _modified_!
    // >{1, 2, 3}
    // LINQ's Except() method will create a _new_ sequence:
    ISet<int> A2 = new SortedSet<int>{ 1, 2, 3, 4, 5, 6 };
    ISet<int> B2 = new SortedSet<int>{ 4, 5, 6, 7, 8, 9 };
    IEnumerable<int> A2ExceptB2 = A2.Except(B2);
    // >{1, 2, 3}

    // Java
    Set<Integer> A = new TreeSet<>(Arrays.asList(1, 2, 3, 4, 5, 6));
    Set<Integer> B = new TreeSet<>(Arrays.asList(4, 5, 6, 7, 8, 9));
    A.removeAll(B); // The set A will be _modified_!
    System.out.println(A);
    // >[1, 2, 3]
● Symmetric difference: A▵B (:= (A∖B)∪(B∖A))
  A={1,2,3,4,5,6}; B={4,5,6,7,8,9}; {1,2,3,4,5,6}▵{4,5,6,7,8,9}={1,2,3,7,8,9}
    // C#/.Net
    ISet<int> A = new SortedSet<int>{ 1, 2, 3, 4, 5, 6 };
    ISet<int> B = new SortedSet<int>{ 4, 5, 6, 7, 8, 9 };
    A.SymmetricExceptWith(B); // The set A will be _modified_!
    // >{1, 2, 3, 7, 8, 9}
    // Using LINQ we can create a _new_ sequence:
    ISet<int> A2 = new SortedSet<int>{ 1, 2, 3, 4, 5, 6 };
    ISet<int> B2 = new SortedSet<int>{ 4, 5, 6, 7, 8, 9 };
    IEnumerable<int> A2SymmetricExceptB2 = A2.Except(B2).Union(B2.Except(A2));
    // >{1, 2, 3, 7, 8, 9}

    // Java
    Set<Integer> A = new TreeSet<>(Arrays.asList(1, 2, 3, 4, 5, 6));
    Set<Integer> B = new TreeSet<>(Arrays.asList(4, 5, 6, 7, 8, 9));
    Set<Integer> A2 = new TreeSet<>(Arrays.asList(1, 2, 3, 4, 5, 6)); // An unmodified copy of A.
    A.removeAll(B); // The sets A and B will be _modified_!
    B.removeAll(A2);
    A.addAll(B);
    System.out.println(A);
    // >[1, 2, 3, 7, 8, 9]
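The Java variant of the symmetric difference modifies both input sets. If that is not desired, the definition (A∖B)∪(B∖A) can be applied to copies instead. The following is a minimal sketch; the class SymmetricDifferenceDemo and its method are hypothetical names for this illustration.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

class SymmetricDifferenceDemo {
    // Creates a _new_ sorted set holding (a \ b) united with (b \ a); a and b stay unmodified.
    static <T extends Comparable<T>> SortedSet<T> symmetricDifference(Set<T> a, Set<T> b) {
        SortedSet<T> aWithoutB = new TreeSet<>(a); // Work on copies only.
        aWithoutB.removeAll(b);
        Set<T> bWithoutA = new TreeSet<>(b);
        bWithoutA.removeAll(a);
        aWithoutB.addAll(bWithoutA);
        return aWithoutB;
    }

    public static void main(String[] args) {
        Set<Integer> A = new TreeSet<>(Arrays.asList(1, 2, 3, 4, 5, 6));
        Set<Integer> B = new TreeSet<>(Arrays.asList(4, 5, 6, 7, 8, 9));
        System.out.println(symmetricDifference(A, B)); // >[1, 2, 3, 7, 8, 9]
        System.out.println(A);                         // >[1, 2, 3, 4, 5, 6]
    }
}
```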
  • 27. Associative Collections – Operations on Sets – Part III
● Intersection: A∩B
  A={1,2,3,4,5,6}; B={4,5,6,7,8,9}; {1,2,3,4,5,6}∩{4,5,6,7,8,9}={4,5,6}
    // C#/.Net
    ISet<int> A = new SortedSet<int>{ 1, 2, 3, 4, 5, 6 };
    ISet<int> B = new SortedSet<int>{ 4, 5, 6, 7, 8, 9 };
    A.IntersectWith(B); // The set A will be _modified_!
    // >{4, 5, 6}
    // LINQ's Intersect() method will create a _new_ sequence:
    ISet<int> A2 = new SortedSet<int>{ 1, 2, 3, 4, 5, 6 };
    ISet<int> B2 = new SortedSet<int>{ 4, 5, 6, 7, 8, 9 };
    IEnumerable<int> A2IntersectionWithB2 = A2.Intersect(B2);
    // >{4, 5, 6}

    // Java
    Set<Integer> A = new TreeSet<>(Arrays.asList(1, 2, 3, 4, 5, 6));
    Set<Integer> B = new TreeSet<>(Arrays.asList(4, 5, 6, 7, 8, 9));
    A.retainAll(B); // The set A will be _modified_!
    System.out.println(A);
    // >[4, 5, 6]
  • 28. Special associative Collections – Multiplicity of Keys
● Inserting values having already present keys into an associative collection overwrites or updates the present values.
– Sometimes this is not desired. E.g. mind a telephoneDictionary, in which a name can have more than one phone number!
● Some collection frameworks provide associative collections that can handle multiplicity.
– In C++ we can use std::multimap (<map>) and std::multiset (<set>).
    // C++11
    std::multimap<std::string, int> telephoneDictionary {
        { "Ben", 9427 },  // Mind: two values with the same key are added here.
        { "Ben", 4367 },
        { "Jody", 1781 }, // Mind: three values with the same key are added here.
        { "Jody", 9032 },
        { "Jody", 8038 }
    };
    // It is required to use STL iterators, because std::multimap provides no subscript operator:
    for (auto item = telephoneDictionary.begin(); item != telephoneDictionary.end(); ++item) {
        std::cout<<"Name: "<<item->first<<", Phone number: "<<item->second<<std::endl;
    }
● In other frameworks (Java, .Net etc.) multi-associative collections need to be explicitly implemented or taken from 3rd party.
– 3rd party sources: Apache Commons (Java)
[Figure: telephoneDictionary associates "Ben" with 9427 and 4367, and "Jody" with 1781, 9032 and 8038.]
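One way to implement a multi-associative collection explicitly in Java is a map whose values are lists. The following is a minimal sketch under assumptions: the class MultimapDemo and its methods are hypothetical names for this illustration, and a TreeMap is chosen only to mimic the sorted keys of std::multimap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MultimapDemo {
    // Adds a further value for a key instead of overwriting the present one.
    static void add(Map<String, List<Integer>> multimap, String name, int phoneNumber) {
        // computeIfAbsent() creates the list for a key, when the key is added for the first time.
        multimap.computeIfAbsent(name, key -> new ArrayList<>()).add(phoneNumber);
    }

    static Map<String, List<Integer>> buildTelephoneDictionary() {
        Map<String, List<Integer>> telephoneDictionary = new TreeMap<>();
        add(telephoneDictionary, "Ben", 9427);  // Mind: two values with the same key are added here.
        add(telephoneDictionary, "Ben", 4367);
        add(telephoneDictionary, "Jody", 1781); // Mind: three values with the same key are added here.
        add(telephoneDictionary, "Jody", 9032);
        add(telephoneDictionary, "Jody", 8038);
        return telephoneDictionary;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, List<Integer>> item : buildTelephoneDictionary().entrySet()) {
            System.out.println("Name: " + item.getKey() + ", Phone numbers: " + item.getValue());
        }
        // >Name: Ben, Phone numbers: [9427, 4367]
        // >Name: Jody, Phone numbers: [1781, 9032, 8038]
    }
}
```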
  • 29. Things to ponder about Associative Collections – Part I
● Keys:
– Refrain from modifying key objects managed in associative collections.
● Sorted/equivalence-based associative collections:
– TreeMap/TreeSet (Java), SortedDictionary/SortedSet (.Net), std::map/std::set (C++)
– The internal organization of keys is equivalence-based:
● Equivalence-based means that key objects are organized according to their relative/natural order. This is often implemented with a BST.
● The key-type needs to implement Comparable (Java) or IComparable (.Net) or operator< (C++ STL) for a semantically correct "natural order".
● Or a Comparator (Java) or IComparer (.Net) or a comparison functor (C++ STL), which implements the ordering for the key-type, needs to be specified for the associative collection.
– Searching/inserting/removing items can be done very fast (O(log n)), because binary searches are used (balanced BST).
– Sorted associative collections do not keep the insertion order!
● The iteration will yield the items in sorted order (BST in-order). The order in which items have been added/inserted/removed doesn't matter!
– => Use these associative collections, if sorting of keys or the control of key-comparison (e.g. reverse order) is needed!
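The effect of the key-comparison on the iteration order can be sketched with Java's TreeMap. This is only a minimal sketch; the class SortedOrderDemo and its method are hypothetical names for this illustration. The same three keys are inserted in the same order, but the passed Comparator alone decides in which order they are yielded.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SortedOrderDemo {
    // Fills a TreeMap, which organizes its keys with the passed comparator, and yields the keys as iterated.
    static List<String> keysInOrder(Comparator<String> comparator) {
        Map<String, Integer> telephoneDictionary = new TreeMap<>(comparator);
        telephoneDictionary.put("Jody", 1781); // The insertion order doesn't matter.
        telephoneDictionary.put("Ben", 9427);
        telephoneDictionary.put("Gil", 7764);
        return new ArrayList<>(telephoneDictionary.keySet());
    }

    public static void main(String[] args) {
        System.out.println(keysInOrder(Comparator.naturalOrder())); // >[Ben, Gil, Jody]
        System.out.println(keysInOrder(Comparator.reverseOrder())); // >[Jody, Gil, Ben]
    }
}
```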
  • 30. Things to ponder about Associative Collections – Part II
● Equality-based associative collections:
– HashMap/HashSet (Java), Dictionary/HashSet (.Net), NSDictionary/NSSet (Cocoa), std::unordered_map/std::unordered_set (C++11), associative arrays (JavaScript)
– The internal organization of keys is equality-based:
● By equality, i.e. by hash-codes and the result of the equality check (methods hashCode()/equals() (Java) or GetHashCode()/Equals() (.Net)).
● The key-type needs to implement hashCode()/equals() (Java) or GetHashCode()/Equals() (.Net) for semantically correct equality.
● Or an IEqualityComparer (.Net), which implements equality for the key-type, needs to be specified for the associative collection (the obsolete IHashCodeProvider belongs to the old Hashtable).
● In C++11 STL a hasher functor, which provides hash-codes for the key-type, should be specified for the associative collection (unless std::hash is already specialized for the key-type).
– Searching/inserting/removing items can be done in O(1) on average: no extra search operation is required, because the hash-code is used to calculate an index.
– These associative collections are unordered!
● There are no guarantees about the order in which items are yielded during iteration. It can change completely when items are added.
● In most cases equality-based associative collections should be our first choice; those are potentially the most efficient.
– => Use these associative collections, if sorting of keys or the control of key-comparison doesn't matter!
● Ordered associative collections:
– LinkedHashMap (Java) and OrderedDictionary (.Net) will iterate in the order in which the items were put into the collection.
● LinkedHashMap is the type Groovy uses to implement map literals.
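The behavior of ordered associative collections can be sketched with Java's LinkedHashMap. This is only a minimal sketch; the class InsertionOrderDemo and its method are hypothetical names for this illustration. The keys are yielded in the order of their first insertion, and putting a value for an already present key updates the value without moving the key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class InsertionOrderDemo {
    // Fills a LinkedHashMap and yields its keys in iteration order.
    static List<String> keysInInsertionOrder() {
        Map<String, Integer> telephoneDictionary = new LinkedHashMap<>();
        telephoneDictionary.put("Jody", 1781);
        telephoneDictionary.put("Ben", 9427);
        telephoneDictionary.put("Gil", 7764);
        telephoneDictionary.put("Jody", 9032); // Updates the value, the position of "Jody" is kept.
        return new ArrayList<>(telephoneDictionary.keySet());
    }

    public static void main(String[] args) {
        System.out.println(keysInInsertionOrder()); // >[Jody, Ben, Gil]
    }
}
```

With a plain HashMap the same code would compile as well, but the order of the yielded keys would not be guaranteed.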
  • 31. Things to ponder about Associative Collections – Part III
● Old and new collections:
– Java:
● The old Hashtable is not null-aware on keys, adding null will throw a NullPointerException. Better use HashMap/HashSet (Java 1.2 or newer).
● Hashtable and HashMap return null, if a requested key has no value (i.e. the key is not present).
– .Net:
● The old object-based Hashtable should not be used for new code using .Net 2 or newer. Better use Dictionary/SortedDictionary (.Net 2 or newer).
● Be aware that Hashtable returns null, if a key has no value (i.e. the key is not present). Dictionary/SortedDictionary will throw a KeyNotFoundException.
● Keys with the value null:
– Java:
● TreeMap/TreeSet (with natural ordering) don't allow keys having the value null; a NullPointerException will be thrown.
● HashMap/HashSet can digest keys having the value null.
– .Net:
● Dictionary/SortedDictionary don't allow keys having the value null; an ArgumentNullException will be thrown.
● HashSet/SortedSet can digest keys having the value null.
● It was never mentioned explicitly: there are basically no constraints on the types of the values in associative collections!
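The different handling of null keys in Java's collections can be checked with a few lines. This is only a minimal sketch, assuming a current JDK; the class NullKeyDemo and its method are hypothetical names for this illustration. The TreeMap is created without a Comparator, i.e. it uses the natural ordering of its keys.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import java.util.TreeMap;

class NullKeyDemo {
    // Tries to put a null key into the passed map and reports, whether the map digested it.
    static boolean acceptsNullKey(Map<String, Integer> map) {
        try {
            map.put(null, 42);
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptsNullKey(new HashMap<>()));   // >true
        System.out.println(acceptsNullKey(new TreeMap<>()));   // >false
        System.out.println(acceptsNullKey(new Hashtable<>())); // >false
        // A key that is not present yields null instead of an exception:
        System.out.println(new HashMap<String, Integer>().get("Bud")); // >null
    }
}
```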