What you forgot from your Computer Science Degree
Stephen Darlington
Wandle Software Limited
An interesting -- possibly over-engineered -- way to convert all occurrences of HTML escape characters into their Unicode equivalent on I…

  1. What you forgot from your Computer Science Degree
     Stephen Darlington, Wandle Software Limited
  2. Okay, not all of it.
  3. Requirements
     • Parse a string. Convert all occurrences of HTML escape characters into their Unicode equivalent
     • "If you see '&lt;' convert it to '<'"
  4. How Google Did It

         static HTMLEscapeMap gAsciiHTMLEscapeMap[] = {
           // A.2.2. Special characters
           { @"&quot;", 34 },
           { @"&amp;", 38 },
           { @"&apos;", 39 },
           { @"&lt;", 60 },
           ...
           { @"&hearts;", 9829 },
           { @"&diams;", 9830 }
         };

     https://code.google.com/p/google-toolbox-for-mac/source/browse/trunk/Foundation/GTMNSString%2BHTML.m
  5. How Google Did It

         for (unsigned i = 0;
              i < sizeof(gAsciiHTMLEscapeMap) / sizeof(HTMLEscapeMap);
              ++i) {
           if ([escapeString isEqualToString:gAsciiHTMLEscapeMap[i].escapeSequence]) {
             [finalString replaceCharactersInRange:escapeRange
                                        withString:[NSString stringWithCharacters:&gAsciiHTMLEscapeMap[i].uchar
                                                                           length:1]];
             break;
           }
         }
  6. Yuck
  7. “flex is a tool for generating scanners. A scanner is a program which recognizes lexical patterns in text. The flex program [looks for a] description of a scanner to generate. The description is in the form of pairs of regular expressions and C code, called rules. flex generates as output a C source file”
     Source: Lexical Analysis With Flex, Introduction
     http://flex.sourceforge.net/manual/Introduction.html#Introduction
  8. Lexer Description

         &amp;      { return WSL_ENTITY_amp; }
         &gt;       { return WSL_ENTITY_gt; }
         &lt;       { return WSL_ENTITY_lt; }
         &quot;     { return WSL_ENTITY_quot; }
         &apos;     { return WSL_ENTITY_apos; }
         &AElig;    { return WSL_ENTITY_AElig; }
         ...
         &#[0-9]+;  { return WSL_ENTITY_NUMBER; }
         [^&]+      { return WSL_ENTITY_NOMATCH; }
         .          { return WSL_ENTITY_NOMATCH; }
  9. Constants

         #define WSL_ENTITY_NOMATCH -1
         #define WSL_ENTITY_NUMBER  -2
         #define WSL_ENTITY_amp     38 // # ampersand
         #define WSL_ENTITY_gt      62 // # greater than
         #define WSL_ENTITY_lt      60 // # less than
         #define WSL_ENTITY_quot    34 // # double quote
         ...
  10. Main loop

          while ((expression = WSLlex(scanner))) {
            switch (expression) {
              case WSL_ENTITY_NOMATCH:
                [output appendFormat:@"%@",
                  [NSString stringWithCString:WSLget_text(scanner)
                                     encoding:NSISOLatin1StringEncoding]];
                break;
              case WSL_ENTITY_NUMBER:
                expression = atoi(&WSLget_text(scanner)[2]);
                // fall through so expression is added to string
              default:
                [output appendFormat:@"%C", (unsigned short) expression];
                break;
            }
          }
  11. Ziggity-Zaggity Zooooom!
  12. Benefits
      • Right tool for the right job
      • Consistent performance
      • Xcode knows about Flex (with some caveats) so simple to integrate
      • Flex has various flags to optimise performance, for example -Cf is much faster but uses lots more memory
  13. Further information
      • WSLHTMLEntities is on GitHub (https://github.com/sdarlington/WSLHTMLEntities)
      • Flex documentation (http://flex.sourceforge.net/manual/)
      • "Introduction to Compiling Techniques," J P Bennett
  14. Stephen Darlington
      Wandle Software Limited
      @sdarlington
      @wandlesoftware
      http://www.zx81.org.uk/
      http://www.wandlesoftware.com/
      Apps: Yummy / www.cut / Rootn Tootn / CameraGPS
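
The token-driven loop from the "Lexer Description", "Constants" and "Main loop" slides can be tried without flex or Xcode. What follows is a minimal sketch in plain C, not the WSLHTMLEntities code. `Scanner`, `next_token`, `put_utf8` and `decode_entities` are hypothetical names invented for this illustration, and `next_token` is a hand-written stand-in for the generated `WSLlex`. For brevity it looks up named entities with the same kind of linear table scan the talk objects to; a flex-generated scanner would compile the rules into a state machine instead. The sketch also assumes the input is ASCII or UTF-8 and writes UTF-8, whereas the slides build an NSString.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Token codes as on the slides: negative values are special,
   positive values are the code point of a named entity. */
#define WSL_ENTITY_NOMATCH (-1)
#define WSL_ENTITY_NUMBER  (-2)

/* Hypothetical stand-in for the flex scanner state. */
typedef struct {
    const char *pos;   /* next unread character */
    const char *text;  /* start of the current token */
    size_t len;        /* length of the current token */
} Scanner;

/* Small subset of the entity rules; the real rule list is much longer. */
static const struct { const char *name; int code; } kEntities[] = {
    { "&amp;", 38 }, { "&gt;", 62 }, { "&lt;", 60 },
    { "&quot;", 34 }, { "&apos;", 39 },
};

/* Hand-written replacement for WSLlex(): returns 0 at end of input. */
static int next_token(Scanner *s) {
    s->text = s->pos;
    if (*s->pos == '\0') return 0;
    if (*s->pos != '&') {                       /* rule: [^&]+ */
        s->len = strcspn(s->pos, "&");
        s->pos += s->len;
        return WSL_ENTITY_NOMATCH;
    }
    for (size_t i = 0; i < sizeof kEntities / sizeof kEntities[0]; ++i) {
        size_t n = strlen(kEntities[i].name);
        if (strncmp(s->pos, kEntities[i].name, n) == 0) {
            s->len = n;
            s->pos += n;
            return kEntities[i].code;           /* named entity */
        }
    }
    if (s->pos[1] == '#') {                     /* rule: &#[0-9]+; */
        size_t digits = strspn(s->pos + 2, "0123456789");
        if (digits > 0 && s->pos[2 + digits] == ';') {
            s->len = digits + 3;
            s->pos += s->len;
            return WSL_ENTITY_NUMBER;
        }
    }
    s->len = 1;                                 /* rule: . (a lone '&') */
    s->pos += 1;
    return WSL_ENTITY_NOMATCH;
}

/* Write cp (already limited to 16 bits) as UTF-8; returns bytes written. */
static size_t put_utf8(char *out, unsigned cp) {
    if (cp < 0x80) { out[0] = (char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (char)(0xE0 | (cp >> 12));
    out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Decode entities in `in`; returns a malloc'd string the caller must free. */
char *decode_entities(const char *in) {
    assert(in != NULL);
    /* The output never outgrows the input: every entity is at least
       4 bytes long and produces at most 3 bytes of UTF-8. */
    char *out = malloc(strlen(in) + 1);
    if (out == NULL) return NULL;
    size_t o = 0;
    Scanner s = { in, in, 0 };
    int expression;
    while ((expression = next_token(&s)) != 0) {
        switch (expression) {
        case WSL_ENTITY_NOMATCH:                /* copy plain text unchanged */
            memcpy(out + o, s.text, s.len);
            o += s.len;
            break;
        case WSL_ENTITY_NUMBER:
            /* Skip "&#"; the mask mirrors the slides' (unsigned short) cast. */
            expression = (int)(strtoul(s.text + 2, NULL, 10) & 0xFFFF);
            /* fall through so the code point is appended */
        default:
            o += put_utf8(out + o, (unsigned)expression);
            break;
        }
    }
    out[o] = '\0';
    return out;
}
```

Calling `decode_entities("a &lt; b")` returns a newly allocated copy of `a < b`. Entities missing from the small table, such as `&hearts;`, pass through unchanged because the lone `&` falls into the catch-all rule, just as an unlisted entity would in the lexer description.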