What you forgot from your Computer Science Degree

An interesting (possibly over-engineered) way to convert all occurrences of HTML escape characters into their Unicode equivalent on I

  1. What you forgot from your Computer Science Degree (Stephen Darlington, Wandle Software Limited)
  2. Okay, not all of it.
  3. Requirements
     • Parse a string. Convert all occurrences of HTML escape characters into their Unicode equivalent
     • "If you see '&lt;' convert it to '<'"
  4. How Google Did It
         static HTMLEscapeMap gAsciiHTMLEscapeMap[] = {
           // A.2.2. Special characters
           { @"&quot;", 34 },
           { @"&amp;", 38 },
           { @"&apos;", 39 },
           { @"&lt;", 60 },
           ...
           { @"&hearts;", 9829 },
           { @"&diams;", 9830 }
         };
     https://code.google.com/p/google-toolbox-for-mac/source/browse/trunk/Foundation/GTMNSString%2BHTML.m
  5. How Google Did It
         for (unsigned i = 0; i < sizeof(gAsciiHTMLEscapeMap) / sizeof(HTMLEscapeMap); ++i) {
           if ([escapeString isEqualToString:gAsciiHTMLEscapeMap[i].escapeSequence]) {
             [finalString replaceCharactersInRange:escapeRange
                                        withString:[NSString stringWithCharacters:&gAsciiHTMLEscapeMap[i].uchar
                                                                           length:1]];
             break;
           }
         }
  6. Yuck
  7. “flex is a tool for generating scanners. A scanner is a program which recognizes lexical patterns in text. The flex program [looks for a] description of a scanner to generate. The description is in the form of pairs of regular expressions and C code, called rules. flex generates as output a C source file”
     Lexical Analysis With Flex, Introduction: http://flex.sourceforge.net/manual/Introduction.html#Introduction
  8. Lexer Description
         &amp;      { return WSL_ENTITY_amp; }
         &gt;       { return WSL_ENTITY_gt; }
         &lt;       { return WSL_ENTITY_lt; }
         &quot;     { return WSL_ENTITY_quot; }
         &apos;     { return WSL_ENTITY_apos; }
         &AElig;    { return WSL_ENTITY_AElig; }
         ...
         &#[0-9]+;  { return WSL_ENTITY_NUMBER; }
         [^&]+      { return WSL_ENTITY_NOMATCH; }
         .          { return WSL_ENTITY_NOMATCH; }
  9. Constants
         #define WSL_ENTITY_NOMATCH -1
         #define WSL_ENTITY_NUMBER -2
         #define WSL_ENTITY_amp 38   // # ampersand
         #define WSL_ENTITY_gt 62    // # greater than
         #define WSL_ENTITY_lt 60    // # less than
         #define WSL_ENTITY_quot 34  // # double quote
         ...
  10. Main loop
         while ((expression = WSLlex(scanner))) {
           switch (expression) {
             case WSL_ENTITY_NOMATCH:
               [output appendFormat:@"%@",
                 [NSString stringWithCString:WSLget_text(scanner)
                                    encoding:NSISOLatin1StringEncoding]];
               break;
             case WSL_ENTITY_NUMBER:
               expression = atoi(&WSLget_text(scanner)[2]);
               // fall through so expression is added to string
             default:
               [output appendFormat:@"%C", (unsigned short) expression];
               break;
           }
         }
  11. Ziggity-ZaggityZooooom!
  12. Benefits
     • Right tool for the right job
     • Consistent performance
     • Xcode knows about Flex (with some caveats) so simple to integrate
     • Flex has various flags to optimise performance, for example -Cf is much faster but uses lots more memory
  13. Further information
     • WSLHTMLEntities is on GitHub (https://github.com/sdarlington/WSLHTMLEntities)
     • Flex documentation (http://flex.sourceforge.net/manual/)
     • "Introduction to Compiling Techniques," J P Bennett
  14. Stephen Darlington, Wandle Software Limited
     @sdarlington, @wandlesoftware
     http://www.zx81.org.uk/
     http://www.wandlesoftware.com/
     Apps: Yummy / www.cut / Rootn Tootn / CameraGPS
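
To make the cost of the approach on slide 5 concrete, here is a minimal plain-C sketch of the same idea: a table of escape sequences searched from the top with one string comparison per entry. It is not the Google Toolbox code; `EscapeEntry`, `kEscapeTable` and `lookup_escape` are hypothetical names, and the table holds only five entries, using the code points quoted on slides 4 and 9.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry, loosely modelled on the HTMLEscapeMap of slide 4. */
typedef struct {
    const char *escape_sequence;
    unsigned short uchar;
} EscapeEntry;

/* Illustrative subset only; the real table has many more rows. */
static const EscapeEntry kEscapeTable[] = {
    { "&quot;", 34 },
    { "&amp;",  38 },
    { "&apos;", 39 },
    { "&lt;",   60 },
    { "&gt;",   62 },
};

/* Returns the code point for `escape`, or 0 if it is not in the table.
   Each call walks the table from the first row and does one strcmp per
   row, so entities near the end cost more than entities near the start. */
unsigned short lookup_escape(const char *escape)
{
    size_t count = sizeof(kEscapeTable) / sizeof(kEscapeTable[0]);
    for (size_t i = 0; i < count; ++i) {
        if (strcmp(escape, kEscapeTable[i].escape_sequence) == 0) {
            return kEscapeTable[i].uchar;
        }
    }
    return 0;
}
```

Calling `lookup_escape("&lt;")` returns 60, but only after comparing against the three rows above it. That position-dependent cost is presumably what the "Consistent performance" bullet on slide 12 is contrasting with.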
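
The main loop on slide 10 depends on a scanner generated by flex, so it cannot be run on its own. The sketch below reproduces its shape in plain C under some loud assumptions: `next_token` is a hand-written stand-in for the generated `WSLlex`, it knows only five named entities, and it finds them with a short linear search purely to keep the example small (a flex scanner would not). `put_utf8` and `decode_entities` are also hypothetical names, and the output is UTF-8 rather than an `NSString`.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define ENTITY_NOMATCH (-1)
#define ENTITY_NUMBER  (-2)

/* Stand-in for the generated scanner (an assumption, not flex output).
   `in` must not be empty. Stores the token length in *len and returns
   ENTITY_NOMATCH, ENTITY_NUMBER, or the code point of a named entity. */
int next_token(const char *in, size_t *len)
{
    static const struct { const char *text; int code; } named[] = {
        { "&amp;", 38 }, { "&gt;", 62 }, { "&lt;", 60 },
        { "&quot;", 34 }, { "&apos;", 39 },
    };
    if (in[0] != '&') {
        *len = strcspn(in, "&");          /* like the [^&]+ rule */
        return ENTITY_NOMATCH;
    }
    for (size_t i = 0; i < sizeof named / sizeof named[0]; ++i) {
        size_t n = strlen(named[i].text);
        if (strncmp(in, named[i].text, n) == 0) {
            *len = n;
            return named[i].code;
        }
    }
    if (in[1] == '#' && isdigit((unsigned char)in[2])) {
        size_t n = 2;
        while (isdigit((unsigned char)in[n])) ++n;
        if (in[n] == ';') {               /* like the &#[0-9]+; rule */
            *len = n + 1;
            return ENTITY_NUMBER;
        }
    }
    *len = 1;                             /* like the "." rule: lone '&' */
    return ENTITY_NOMATCH;
}

/* Writes `cp` as UTF-8 and returns the byte count. Zero, surrogates and
   out-of-range values are replaced by U+FFFD. */
size_t put_utf8(unsigned long cp, char *out)
{
    if (cp == 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) cp = 0xFFFD;
    if (cp < 0x80) { out[0] = (char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Decodes `in` into `out`, which must hold at least strlen(in) + 1 bytes
   (every entity is at least as long as its UTF-8 replacement). */
void decode_entities(const char *in, char *out)
{
    while (*in != '\0') {
        size_t len = 0;
        int expression = next_token(in, &len);
        switch (expression) {
        case ENTITY_NOMATCH:              /* copy ordinary text through */
            memcpy(out, in, len);
            out += len;
            break;
        case ENTITY_NUMBER:               /* skip "&#", parse the digits */
            out += put_utf8(strtoul(in + 2, NULL, 10), out);
            break;
        default:                          /* named entity: code is the code point */
            out += put_utf8((unsigned long)expression, out);
            break;
        }
        in += len;
    }
    *out = '\0';
}
```

As on the slide, the token code doubles as the code point for named entities, so the `default` branch needs no second lookup. Unlike the slide, the sketch uses `strtoul` with a range check instead of `atoi`, which is a choice made here to avoid undefined behaviour on very long digit strings.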
