The Semantics of Dictionaries, Maps and Hashtables
Introduction
The first language I used that put dictionaries at my fingertips was Perl, where the
solution to just about any problem involved writing something like
$hashtable{$key}=$value;
Perl called a dictionary a ‘hash’, a reference to the way Perl implemented
dictionaries. (Dictionaries are commonly implemented with hashtables and B-trees,
but can also be implemented with linked lists and other structures.) The syntax of Perl
is a bit odd, as you’d need to use $, @ or % to reference scalar, array or hash
variables in different contexts, but dictionaries with similar semantics became
widespread in dynamic languages of that and succeeding generations, such as Python,
PHP and Ruby. ‘Map’ container classes were introduced in Java about a decade ago,
and programmers are using dictionaries increasingly in static languages such as Java
and C#.
Dictionaries are a convenient and efficient data structure, but there are areas in
which different implementations behave differently: for instance, in what happens if
you try to access an undefined key. I think that cross-training is good for
developers, so this article compares this aspect of the semantics of dictionaries in
four popular languages: PHP, Python, Java and C#.

Use cases
There are two use cases for dictionaries, so far as error handling is concerned:
1. When you expect to look up undefined values, and
2. When you don’t
Let’s look at three examples:

Computing A Histogram
One common use for a dictionary is for counting items, or recording that items in a list or stream have been seen.
In C#, this is typically written something like:
[01] var count = new Dictionary<int,int>();
[02] foreach(int i in inputList) {
[03]    if (!count.ContainsKey(i))
[04]       count[i]=0;
[05]
[06]    count[i]=count[i]+1;
[07] }
The Dictionary count now contains the frequency of items in inputList, which could be useful for plotting a histogram. A similar pattern can be used if we wish to make a list of unique items found in inputList. In either case, looking up values that aren’t already in the hash is a fundamental part of the algorithm.

Processing Input
Sometimes, we’re getting input from another subsystem, and expect that some values might not be defined. For instance, suppose a web site has a search feature with a number of optional features, and that queries are made by GET requests like:
[08] search.php?q=kestrel
[09] search.php?q=admiral&page=5
[10] search.php?q=laurie+anderson&page=3&in_category=music&after_date=1985-02-07
In this case, the only required search parameter is “q”, the query string; the rest are optional. In PHP (like many other environments), you can get at GET variables via a hashtable, specifically the $_GET superglobal, so (depending on how strict the error handling settings in your runtime are) you might write something like
[11] if (!$_GET["q"]) {
[12]    throw new InvalidInputException("You must specify a query");
[13] }
[14]
[15] if ($_GET["after_date"]) {
[16]    ... add another WHERE clause to a SQL query ...
[17] }
This depends, quite precisely, on two bits of sloppiness in PHP and Perl: (a) dereferencing an undefined key on a hash returns an undefined value, which is something like a null, and (b) both languages have a liberal definition of true and false in an if() statement.
As a result, the code above is a bit quirky. The test at line 11 treats q as missing if it is undefined, or if it is the empty string. That’s good. However, both the numeric value 0 and the string “0” also evaluate false. As a result, this code won’t allow a user to search for “0”, and will ignore an (invalid) after_date of 0, rather than entering the block at line [16], which hopefully would validate the date.
Java and C# developers might enjoy a moment of schadenfreude at the above example, but they’ve all seen, written and debugged examples of input handling code that is just as quirky as the above PHP code, with several times the line count. To set the record straight, PHP programmers can use the isset() function to precisely test for the existence of a hash key:
[11] if (!isset($_GET["q"])) {
[12]    throw new InvalidInputException("You must specify a query");
[13] }
The unusual handling of “0” is the kind of fault that can survive for years in production software: so long as nobody searches for “0”, it’s quite harmless. (See what you get if you search for a negative integer on Google.) The worst threat that this kind of permissive evaluation poses is when it opens the door to a security attack, but we’ve also seen that highly complex logic that strives to be “correct” in every situation can hide vulnerabilities too.
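
For comparison, here is a minimal Python sketch of the same input handling that tests for the presence of a key rather than the truthiness of its value. The function name parse_search and the shape of the returned dictionary are assumptions made for illustration; only urllib.parse.parse_qs comes from the standard library.

```python
from urllib.parse import parse_qs


def parse_search(query_string):
    """Return the search parameters, rejecting only a missing or empty q."""
    # keep_blank_values=True keeps "q=" in the result so it can be rejected explicitly.
    params = parse_qs(query_string, keep_blank_values=True)

    # Explicit membership test: "0" is a legal query and must not be rejected.
    if "q" not in params or params["q"][0] == "":
        raise ValueError("You must specify a query")

    search = {"q": params["q"][0]}

    # Optional parameter: presence is tested by key, not by truthiness,
    # so a suspicious value such as "0" still reaches later validation.
    if "after_date" in params:
        search["after_date"] = params["after_date"][0]
    return search
```

With this version a search for “0” is accepted, and an after_date of 0 is passed along to whatever code validates dates instead of being silently ignored.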

Relatively Rigid Usage
Let’s consider a third case: passing a bundle of context in an asynchronous communications call in a Silverlight application written in C#. You can do a lot worse than to use the signatures:
[14] void BeginAsyncCall(InputType input,Dictionary<string,object> context,CallbackDelegate callback);
[15] delegate void CallbackDelegate(ReturnType returnValue,Dictionary<string,object> context);
The point here is that the callback might need to know something about the context in which the asynchronous function was called to do its work. However, this information may be idiosyncratic to the particular context in which the async function is called, and is certainly not the business of the asynchronous function. You might write something like
[16] void Initiator() {
[17]    InputType input=...;
[18]    var context = new Dictionary<string,object>();
[19]    context["ContextItemOne"] = (TypeA) ...;
[20]    context["ContextItemTwo"] = (TypeB) ...;
[21]    context["ContextItemThre"] = (TypeC) ...;
[22]    BeginAsyncCall(input,context,TheCallback);
[23] }
[24]
[25] void TheCallback(ReturnType output,Dictionary<string,object> context) {
[26]    ContextItemOne = (TypeA) context["ContextItemOne"];
[27]    ContextItemTwo = (TypeB) context["ContextItemTwo"];
[28]    ContextItemThree = (TypeC) context["ContextItemThree"];
[29]    ...
[30] }
This is nice, isn’t it? You can pass any data values you want between Initiator and TheCallback. Sure, the compiler isn’t checking the types of your arguments, but loose coupling is called for in some situations. Unfortunately it’s a little too loose in this case, because we spelled the name of a key incorrectly on line 21.
What happens? The [] operator on a .NET Dictionary throws a KeyNotFoundException when we try to look up a key that doesn’t exist. I’ve set a global exception handler for my Silverlight application which, in debugging mode, displays the stack trace. The error gets quickly diagnosed and fixed.
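
Python's indexing operator behaves the same way, which the following sketch tries to show. The function names initiator and the_callback and the placeholder values are illustrative assumptions that mirror the C# listing; the misspelled key is deliberate.

```python
def initiator():
    """Build the context bundle, with the same misspelling as the C# listing."""
    return {
        "ContextItemOne": "a",
        "ContextItemTwo": "b",
        "ContextItemThre": "c",  # deliberate typo: should be ContextItemThree
    }


def the_callback(context):
    """Unpack the context; indexing raises KeyError for a key that was never stored."""
    return (
        context["ContextItemOne"],
        context["ContextItemTwo"],
        context["ContextItemThree"],
    )
```

Calling the_callback(initiator()) raises a KeyError that names the missing key, so the spelling mistake surfaces at the first lookup rather than as a stray null later on.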

Four ways to deal with a missing value
There are four tools that hashtables give programmers to access values associated with keys and detect missing values:
1. Test if key exists
2. Throw exception if key doesn’t exist
3. Return default value (or null) if key doesn’t exist
4. TryGetValue

#1: Test if key exists
PHP:    isset($hashtable[$key])
Python: key in hashtable
C#:     hashtable.ContainsKey(key)
Java:   hashtable.containsKey(key)
This operator can be used together with the #2 or #3 operator to safely access a hashtable. Lines [03]-[04] illustrate a common usage pattern. One strong advantage of the explicit test is that it’s clearer to developers who spend time working in different language environments: you don’t need to remember or look in the manual to know if the language you’re working in today uses the #2 operator or the #3 operator.
Code that depends on the existence test can be more verbose than alternatives, and can be structurally unstable: future edits can accidentally change the error handling properties of the code. In multithreaded environments, there’s a potential risk that an item can be added or removed between the existence check and an access. However, the default collections in most environments are not thread-safe, so you’re likely to have worse problems if a collection is being accessed concurrently.

#2: Throw exception if key doesn’t exist
Python: hashtable[key]
C#:     hashtable[key]
This is a good choice when the non-existence of a key is really an exceptional event. In that case, the error condition is immediately propagated via the exception handling mechanism of the language, which, if properly used, is almost certainly better than anything you’ll develop. It’s awkward, and probably inefficient, if you think that nonexistent keys will happen frequently.
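
As a rough Python illustration of access styles #1 and #2, the sketch below counts items with an explicit membership test and reads a required setting by plain indexing. The names count_items and required_setting are hypothetical and chosen only for this example.

```python
def count_items(items):
    """Style #1: test for the key before reading it."""
    counts = {}
    for item in items:
        if item not in counts:  # explicit existence test
            counts[item] = 0
        counts[item] = counts[item] + 1  # indexing is safe after the test
    return counts


def required_setting(settings, name):
    """Style #2: a missing key is exceptional, so let KeyError propagate."""
    return settings[name]
```

The first function expects missing keys and handles them inline; the second treats a missing key as a programming or configuration error and leaves the reporting to the caller's exception handling.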
Consider the following rewrite of the C# code between [01]-[07]:
[31] var count = new Dictionary<int,int>();
[32] foreach(int i in inputList) {
[33]    int oldCount;
[34]    try {
[35]       oldCount=count[i];
[36]    } catch (KeyNotFoundException ex) {
[37]       oldCount=0;
[38]    }
[39]
[40]    count[i]=oldCount+1;
[41] }
It may be a matter of taste, but I think that’s just awful.

#3: Return a default (often null) value if key doesn’t exist
PHP:    $hashtable[$key] (well, almost)
Python: hashtable.get(key, [default value])
Java:   hashtable.get(key)
This can be a convenient and compact operation. Python’s form is particularly attractive because it lets us pick a specific default value. If we use an extension method to add a Python-style GetValue operation in C#, the code from [01]-[07] is simplified to
[42] var count = new Dictionary<int,int>();
[43] foreach(int i in inputList)
[44]    count[i]=count.GetValue(i,0)+1;
It’s reasonable for the default default value to be null (or rather, the default value of the type), as it is in Python. If the value type is nullable (int? here, since the ??-operator does not apply to a plain int), we could use the ??-operator to write
[42] var count = new Dictionary<int,int?>();
[43] foreach(int i in inputList)
[44]    count[i]=(count.GetValue(i) ?? 0)+1;
(A ?? B equals A if A is not null, otherwise it equals B.) The price for this simplicity is two kinds of sloppiness:
1. We can’t tell the difference between a null (or default) value associated with a key and no value associated with a key
2. The potential of a null value exports chaos into the environment: trying to use a null value can cause a NullReferenceException if we don’t explicitly handle the null. NullReferenceExceptions don’t bother me if they happen locally to the function that returns them, but they can be a bear to understand when a null gets written into an instance variable that’s accessed much later.
Often people don’t care about 1, and the risk of 2 can be handled by specifying a non-null default value.
Note that PHP’s implementation of hashtables has a particularly annoying characteristic. Error handling in PHP is influenced by the error_reporting configuration variable, which can be set in the php.ini file and other places. If the E_NOTICE bit is not set in error_reporting, PHP barrels on past places where incorrect variable names are used:
[45] $correctVariableName="some value";
[46] echo "[{$corectValiableName}]"; // sic
In that case, the script prints “[]” (treats the undefined variable as an empty string) rather than displaying an error or warning message. PHP will give a notice if E_NOTICE is set (a warning as of PHP 8), but then it applies the same behavior to hashtables: a message is printed if you try to dereference a key that doesn’t exist, so PHP doesn’t consistently implement type #3 access.

#4: TryGetValue
There are quite a few methods (Try-* methods) in the .NET framework that have a signature like this:
[47] bool Dictionary<K,V>.TryGetValue(K key,out V value);
This method has crisp and efficient semantics which could be performed in an atomic thread-safe manner: it returns true if it finds the key, and otherwise returns false. The output parameter value is set to the value associated with the key if a value is associated with the key; however, I couldn’t find a clear statement of what happens if the key isn’t found.
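
The trade-off between styles #3 and #4 can be sketched in Python. The helper try_get and the sentinel _MISSING are assumptions made for illustration, not part of Python's dict API; dict.get itself is standard. The sketch shows a histogram built with a default value, and a TryGetValue-like helper that can tell a stored None apart from a missing key.

```python
# Private sentinel: an object that no caller can have stored as a value.
_MISSING = object()


def histogram(items):
    """Style #3: dict.get with an explicit default keeps the loop to one line."""
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return counts


def try_get(mapping, key):
    """Style #4: return (found, value) in the spirit of TryGetValue."""
    value = mapping.get(key, _MISSING)
    if value is _MISSING:
        return False, None
    return True, value
```

With d = {"a": None}, both d.get("a") and d.get("b") return None, which is the first kind of sloppiness described above, while try_get reports True for "a" and False for "b".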
I did a little experiment:
[48] var d = new Dictionary<int, int>();
[49] d[1] = 5;
[50] d[2] = 7;
[51] int outValue = 99;
[52] d.TryGetValue(55, out outValue);
[53] int newValue = outValue;
I set a breakpoint on line 53 and found that the value of outValue was 0, which is the default value of the int type. TryGetValue therefore sets the output parameter to the default value of the type when it fails to find the key, which is also what the current .NET documentation for Dictionary<TKey,TValue>.TryGetValue specifies.
The semantics of TryGetValue are crisp and precise. It’s particularly nice that something like TryGetValue could be implemented as an atomic operation, if the underlying class is threadsafe. I fear, however, that TryGetValue exports chaos into its environment. For instance, I don’t like declaring a variable without an assignment, like below:
[54] int outValue;
[55] if (d.TryGetValue(55,out outValue)) {
[56]    ... use outValue ...
[57] }
The variable outValue exists before the place where it’s set, and outside of the block where it has a valid value. It’s easy for future maintainers of the code to try to use outValue between lines [54]-[55] or after line [57]. It’s also easy to write something like line [51], where the value 99 is completely irrelevant to the program. I like the construction
[58] if (d.ContainsKey(key)) {
[59]    int value=d[key];
[60]    ... do something with value ...
[61] }
because the variable value only exists in the block [58]-[61] where it has a defined value.

Hacking Hashtables
A comparison of hashtables in different languages isn’t just academic. If you don’t like the operations that your language gives you for hashtables, you’re free to implement new operations. Let’s take two simple examples. It’s nice to have a Python-style get() in PHP that never gives a warning message, and it’s easy to implement
[62] function array_get($array,$key,$defaultValue=false) {
[63]    if (!isset($array[$key]))
[64]       return $defaultValue;
[65]
[66]    return $array[$key];
[67] }
Note that the third parameter of this function uses a default value of false, so it’s possible to call it in a two-parameter form
[68] $value=array_get($array,$key);
with a default default of false, which is reasonable in PHP.
Extension methods make it easy to add a Python-style get() to C#; I’m going to call it GetValue() to be consistent with TryGetValue():
[69] public static class DictionaryExtensions {
[70]    public static V GetValue<K, V>(this IDictionary<K, V> dict, K key) {
[71]       return dict.GetValue(key, default(V));
[72]    }
[73]
[74]    public static V GetValue<K, V>(this IDictionary<K, V> dict, K key, V defaultValue) {
[75]       V value;
[76]       return dict.TryGetValue(key, out value) ? value : defaultValue;
[77]    }
[78] }

Conclusion
Today’s programming languages put powerful data structures, such as dictionaries, at your fingertips. When we look closely, we see subtle differences in the APIs used to access dictionaries in different languages. A study of the different APIs and their consequences can help us think about how to write code that is more reliable and maintainable, and informs API design in every language.

Paul Houle on July 17th 2008 in Dot Net, Java, PHP
http://gen5.info/q/2008/07/17/the-semantics-of-dictionaries-maps-and-hashtables/

Comments (1)

wekempf · 287 weeks ago
Uninitialized locals in C# don't have the problem you're worried about. The compiler will issue an error (not a warning, but an error) if you attempt to use an uninitialized variable.
This is an interesting post. There's some minor things I could debate, but overall, not a bad analysis. In the end, I believe .NET got this part of the interface mostly correct. There's no unsafe corner cases to be aware of (beyond the debate about null values, which I find to be a pointless debate). TryGetValue allows you to optimally attempt to get a value (this isn't so much about being an atomic operation, which in fact is not the case for Dictionary, but about not requiring two expensive lookups by using Contains followed by indexing). The only real complaint with TryGetValue is that its usage is overly verbose, and in many cases a GetValue such as you provided would be simpler to consume. I'd argue, though, that an interface without TryGetValue would be even less optimal than an interface without GetValue.

Copyright © 2013 Generation 5.
