What you forgot from your Computer Science Degree

An interesting -- possibly over-engineered -- way to convert all occurrences of HTML escape sequences into their Unicode equivalents on I

Published in: Technology

  1. What you forgot from your Computer Science Degree
     Stephen Darlington, Wandle Software Limited
  2. Okay, not all of it.
  3. Requirements
       • Parse a string. Convert all occurrences of HTML escape characters into their Unicode equivalent
       • "If you see '&lt;' convert it to '<'"
  4. How Google Did It
       static HTMLEscapeMap gAsciiHTMLEscapeMap[] = {
         // A.2.2. Special characters
         { @"&quot;", 34 },
         { @"&amp;", 38 },
         { @"&apos;", 39 },
         { @"&lt;", 60 },
         ...
         { @"&hearts;", 9829 },
         { @"&diams;", 9830 }
       };
     https://code.google.com/p/google-toolbox-for-mac/source/browse/trunk/Foundation/GTMNSString%2BHTML.m
  5. How Google Did It
       for (unsigned i = 0; i < sizeof(gAsciiHTMLEscapeMap) / sizeof(HTMLEscapeMap); ++i) {
         if ([escapeString isEqualToString:gAsciiHTMLEscapeMap[i].escapeSequence]) {
           [finalString replaceCharactersInRange:escapeRange
                                      withString:[NSString stringWithCharacters:&gAsciiHTMLEscapeMap[i].uchar
                                                                         length:1]];
           break;
         }
       }
  6. Yuck
  7. “flex is a tool for generating scanners. A scanner is a program which recognizes lexical patterns in text. The flex program [looks for a] description of a scanner to generate. The description is in the form of pairs of regular expressions and C code, called rules. flex generates as output a C source file”
     Lexical Analysis With Flex, Introduction
     http://flex.sourceforge.net/manual/Introduction.html#Introduction
  8. Lexer Description
       &amp;     { return WSL_ENTITY_amp; }
       &gt;      { return WSL_ENTITY_gt; }
       &lt;      { return WSL_ENTITY_lt; }
       &quot;    { return WSL_ENTITY_quot; }
       &apos;    { return WSL_ENTITY_apos; }
       &AElig;   { return WSL_ENTITY_AElig; }
       ...
       &#[0-9]+; { return WSL_ENTITY_NUMBER; }
       [^&]+     { return WSL_ENTITY_NOMATCH; }
       .         { return WSL_ENTITY_NOMATCH; }
  9. Constants
       #define WSL_ENTITY_NOMATCH -1
       #define WSL_ENTITY_NUMBER  -2
       #define WSL_ENTITY_amp     38 // # ampersand
       #define WSL_ENTITY_gt      62 // # greater than
       #define WSL_ENTITY_lt      60 // # less than
       #define WSL_ENTITY_quot    34 // # double quote
       ...
  10. Main loop
        while ((expression = WSLlex(scanner))) {
          switch (expression) {
            case WSL_ENTITY_NOMATCH:
              [output appendFormat:@"%@",
                [NSString stringWithCString:WSLget_text(scanner)
                                   encoding:NSISOLatin1StringEncoding]];
              break;
            case WSL_ENTITY_NUMBER:
              expression = atoi(&WSLget_text(scanner)[2]);
              // fall through so expression is added to string
            default:
              [output appendFormat:@"%C", (unsigned short) expression];
              break;
          }
        }
  11. Ziggity-Zaggity Zooooom!
  12. Benefits
        • Right tool for the right job
        • Consistent performance
        • Xcode knows about Flex (with some caveats) so simple to integrate
        • Flex has various flags to optimise performance, for example -Cf is much faster but uses lots more memory
  13. Further information
        • WSLHTMLEntities is on GitHub (https://github.com/sdarlington/WSLHTMLEntities)
        • Flex documentation (http://flex.sourceforge.net/manual/)
        • "Introduction to Compiling Techniques," J P Bennett
  14. Stephen Darlington, Wandle Software Limited
      @sdarlington @wandlesoftware
      http://www.zx81.org.uk/
      http://www.wandlesoftware.com/
      Apps: Yummy / www.cut / Rootn Tootn / CameraGPS
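
For readers without flex to hand, the three cases in the main loop (plain text, numeric entity, named entity) can be sketched in plain C. This is only an illustration under stated assumptions, not the WSLHTMLEntities code: the names `kEntities`, `match_entity`, `put_utf8` and `decode_entities` are hypothetical, the table is a small subset using the code points shown on the slides, the output is UTF-8 rather than an NSString, and the lookup is a simple linear scan standing in for the scanner that flex would generate.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical subset of the entity table; code points as on the slides. */
struct entity { const char *name; unsigned long code_point; };

static const struct entity kEntities[] = {
    { "&quot;", 34 }, { "&amp;", 38 }, { "&apos;", 39 },
    { "&lt;", 60 },   { "&gt;", 62 },  { "&hearts;", 9829 },
};

/* Write cp as UTF-8 into out; returns the number of bytes written (1 to 4). */
static size_t put_utf8(unsigned long cp, char *out) {
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* s points at '&'. On a match, store the code point in *cp and return the
 * number of input bytes consumed; return 0 if no entity starts here. */
static size_t match_entity(const char *s, unsigned long *cp) {
    if (s[1] == '#' && isdigit((unsigned char)s[2])) {
        /* Numeric entity: same shape as the rule &#[0-9]+; */
        char *end;
        unsigned long v = strtoul(s + 2, &end, 10);
        if (*end == ';' && v > 0 && v <= 0x10FFFF && (v < 0xD800 || v > 0xDFFF)) {
            *cp = v;
            return (size_t)(end - s) + 1;
        }
        return 0;
    }
    /* Named entity: linear scan, not the generated state machine. */
    for (size_t i = 0; i < sizeof kEntities / sizeof kEntities[0]; ++i) {
        size_t n = strlen(kEntities[i].name);
        if (strncmp(s, kEntities[i].name, n) == 0) {
            *cp = kEntities[i].code_point;
            return n;
        }
    }
    return 0;
}

/* Decode src into dst. dst must hold at least strlen(src) + 1 bytes; the
 * decoded text is never longer than the input. */
static void decode_entities(const char *src, char *dst) {
    while (*src) {
        unsigned long cp;
        size_t used = (*src == '&') ? match_entity(src, &cp) : 0;
        if (used) {
            dst += put_utf8(cp, dst);
            src += used;
        } else {
            *dst++ = *src++;   /* unmatched text is copied through unchanged */
        }
    }
    *dst = '\0';
}
```

Calling `decode_entities("a &lt; b", buf)` with a large enough `buf` leaves `a < b` in it. Unknown sequences such as `&bogus;` are copied through untouched, which corresponds to the catch-all rules at the end of the lexer description.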
