Parsing for Fun and Profit
(but mainly fun)

Ash Moran
ash.moran@patchspace.co.uk
PatchSpace Ltd
Saturday, 23 February 13
What?


Parsing
Adding structure and meaning to text




Parsing Human Languages
Jake stretched his legs
“Jake”, “stretched”, “his”, “legs”
“Jake”<noun>, “stretched”<verb, past>, “his”<possessive pronoun>, “legs”<noun>
“Jake”<noun, subject>, “stretched”, (“his”, “legs”)<noun phrase, object>


Parsing Computer Languages
“foo = bar + 123”
“foo”, “=”, “bar”, “+”, “123”
“foo”<var>, “=”<assignment_op>, “bar”<var>, “+”<op_plus>, “123”<int_literal>



Why?


Not just compiling!
Compilers breathe fire.




Pretty Printing
gofmt
http://gofmt.com/
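gofmt parses Go source and prints the syntax tree back out in one canonical layout. As a much smaller illustration of the same idea for Ruby, the sketch below re-emits the tokens produced by the standard library's `Ripper.lex` with consistent spacing around operators. It is an assumption-laden toy, not how gofmt works internally, and `tidy_spacing` is a hypothetical helper.

```ruby
require 'ripper'

# Hypothetical helper: normalise spacing around operators in a single line
# of Ruby, working from Ripper's token stream rather than from raw text.
def tidy_spacing(source)
  out = String.new
  # Each Ripper.lex entry starts with [[line, column], event, token]; newer
  # Rubies append a lexer state, which this block simply ignores.
  Ripper.lex(source).each do |_position, event, token|
    case event
    when :on_sp
      # Collapse any run of spaces to one, so keywords stay separated.
      out << " " unless out.end_with?(" ")
    when :on_op
      # Exactly one space either side of operators such as =, + and >.
      out.rstrip!
      out << " " << token << " "
    else
      out << token
    end
  end
  out.strip
end

puts tidy_spacing("foo=bar+123") # foo = bar + 123
```

The sketch is deliberately naive: it treats every operator token the same way, so unary operators get padded too. A real formatter makes those decisions from the full syntax tree.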




Code Smell Detectors
https://rubygems.org/gems/reek

Other ideas
Code metrics
Bug detectors
Domain-specific languages
Language translators (e.g. Ruby -> PHP)
Code obfuscators
Alternative syntaxes (e.g. CoffeeScript)
Refactoring tools

How?


Step 1
3-year computer science degree




Lexing/Tokenising
if x > 100 then return “big” else return “small”
“if”, “x”, “>”, “100”, “then”, “return”, “"big"”, “else”, “return”, “"small"”
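A lexer for the line above can be sketched with Ruby's standard `StringScanner`. The token type names (`:keyword`, `:var`, `:op`, `:int_literal`, `:string_literal`) and the keyword list are assumptions for this example, and the slide's curly quotes are treated as ordinary double quotes.

```ruby
require 'strscan'

# Keyword list and token type names are assumptions for this sketch.
KEYWORDS = %w[if then else return].freeze

# Split source text into [type, text] pairs, skipping whitespace.
def tokenise(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    if scanner.skip(/\s+/)
      next # whitespace separates tokens but is not itself a token
    elsif (text = scanner.scan(/\d+/))
      tokens << [:int_literal, text]
    elsif (text = scanner.scan(/"[^"]*"/))
      tokens << [:string_literal, text[1..-2]] # drop the surrounding quotes
    elsif (text = scanner.scan(/[A-Za-z_]\w*/))
      tokens << [KEYWORDS.include?(text) ? :keyword : :var, text]
    elsif (text = scanner.scan(/[=<>+\-*\/]/))
      tokens << [:op, text]
    else
      raise ArgumentError, "unexpected #{scanner.peek(1).inspect} at offset #{scanner.pos}"
    end
  end
  tokens
end

p tokenise('if x > 100 then return "big" else return "small"')
```

Order matters in the `if`/`elsif` chain: digits are tried before identifiers, and keywords are only recognised after a whole word has been scanned, so a name like `iffy` stays a `:var`.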




Tree Building
if x > 100 then return “big” else return a + b

if
├── >
│   ├── x
│   └── 100
├── then
│   └── return
│       └── “big”
└── else
    └── return
        └── +
            ├── a
            └── b
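The tree above can be built by hand with a recursive-descent parser: one method per grammar rule, each consuming tokens and returning a subtree. The grammar in the comments and the nested-array tree shape are assumptions chosen to mirror the diagram, and `ToyParser` is a hypothetical name. To stay self-contained it tokenises with a single `String#scan`.

```ruby
# Minimal recursive-descent parser for the slide's toy language.
# Grammar (an assumption; the talk does not define one):
#   statement  := "if" comparison "then" statement "else" statement
#               | "return" expression
#   comparison := expression ">" expression
#   expression := atom ("+" atom)*
#   atom       := integer | "string" | identifier
class ToyParser
  def initialize(source)
    @tokens = source.scan(/\d+|"[^"]*"|\w+|[>+]/)
    @pos = 0
  end

  def parse
    tree = statement
    raise ArgumentError, "unexpected #{peek.inspect}" unless peek.nil?
    tree
  end

  private

  def peek
    @tokens[@pos]
  end

  def advance
    token = @tokens[@pos]
    @pos += 1
    token
  end

  def expect(text)
    actual = advance
    raise ArgumentError, "expected #{text.inspect}, got #{actual.inspect}" unless actual == text
  end

  def statement
    case peek
    when "if"
      advance
      condition = comparison
      expect("then")
      consequent = statement
      expect("else")
      alternative = statement
      [:if, condition, [:then, consequent], [:else, alternative]]
    when "return"
      advance
      [:return, expression]
    else
      raise ArgumentError, "unexpected #{peek.inspect}"
    end
  end

  def comparison
    left = expression
    expect(">")
    [:>, left, expression]
  end

  # Left-associative: a + b + c becomes [:+, [:+, a, b], c].
  def expression
    node = atom
    while peek == "+"
      advance
      node = [:+, node, atom]
    end
    node
  end

  def atom
    token = advance
    case token
    when /\A\d+\z/ then [:int, token.to_i]
    when /\A"/ then [:str, token[1..-2]]
    when /\A\w+\z/ then [:var, token]
    else raise ArgumentError, "unexpected #{token.inspect}"
    end
  end
end

p ToyParser.new('if x > 100 then return "big" else return a + b').parse
```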


Parsing Expression Grammars
Like regular expressions, but can handle recursive (nested) structures, e.g. HTML
Not actually that much harder to use
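To show what handling recursion buys you, here is a tiny hand-rolled set of PEG combinators in plain Ruby. This is not Treetop or any gem's API; the `Peg` module and its method names are made up for this sketch. A rule can refer to a rule defined later (or to itself) through `ref`, which is what lets it match arbitrarily nested `<b>` tags, something a classic regular expression cannot do.

```ruby
# Minimal PEG combinators (hypothetical, for illustration only).
# Every parser is a lambda: (input, pos) -> new position on success, nil on failure.
module Peg
  module_function

  # Match an exact string.
  def lit(text)
    ->(input, pos) { input[pos, text.length] == text ? pos + text.length : nil }
  end

  # Match a regular expression starting exactly at pos.
  def re(regexp)
    lambda do |input, pos|
      match = regexp.match(input, pos)
      match && match.begin(0) == pos ? match.end(0) : nil
    end
  end

  # Sequence (e1 e2 ...): all must match, one after another.
  def seq(*parsers)
    lambda do |input, pos|
      parsers.reduce(pos) { |at, parser| at && parser.call(input, at) }
    end
  end

  # Ordered choice (e1 / e2): the first alternative that matches wins.
  def choice(*parsers)
    lambda do |input, pos|
      result = nil
      parsers.find { |parser| result = parser.call(input, pos) }
      result
    end
  end

  # Zero or more (e*): greedy, with no backtracking into the repetition.
  def star(parser)
    lambda do |input, pos|
      while (after = parser.call(input, pos)) && after > pos
        pos = after
      end
      pos
    end
  end

  # Deferred reference, so a rule can mention one that is defined later.
  def ref(&rule)
    ->(input, pos) { rule.call.call(input, pos) }
  end
end

# element <- "<b>" content* "</b>"
# content <- element / text
# text    <- [^<]+
ELEMENT = Peg.seq(Peg.lit("<b>"), Peg.star(Peg.ref { CONTENT }), Peg.lit("</b>"))
CONTENT = Peg.choice(ELEMENT, Peg.re(/[^<]+/))

# True only if the whole string is one well-nested <b> element.
def nested_bold?(html)
  ELEMENT.call(html, 0) == html.length
end

p nested_bold?("<b>a<b>b</b>c</b>") # true
p nested_bold?("<b>a<b>b</b>")      # false
```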




Regexes and HTML




Treetop PEG grammar

Doing Sums
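Treetop is a Ruby gem, so as a dependency-free stand-in, here is a typical sums grammar (an assumption, not the talk's grammar) written as comments and implemented by hand as a recursive-descent evaluator. `SumEvaluator` is a hypothetical name, and the sketch only handles integers, so `/` is integer division.

```ruby
# sum     <- product (("+" / "-") product)*
# product <- value (("*" / "/") value)*
# value   <- integer / "(" sum ")"
class SumEvaluator
  def self.evaluate(source)
    new(source).evaluate
  end

  def initialize(source)
    raise ArgumentError, "unsupported characters" unless source =~ /\A[\d\s+\-*\/()]*\z/
    @tokens = source.scan(/\d+|[-+*\/()]/)
    @pos = 0
  end

  def evaluate
    result = sum
    raise ArgumentError, "unexpected #{@tokens[@pos].inspect}" if @pos < @tokens.length
    result
  end

  private

  # Loops (rather than recursion) keep + and - left-associative.
  def sum
    total = product
    while %w[+ -].include?(@tokens[@pos])
      op = @tokens[@pos]
      @pos += 1
      total = total.public_send(op, product)
    end
    total
  end

  def product
    total = value
    while %w[* /].include?(@tokens[@pos])
      op = @tokens[@pos]
      @pos += 1
      total = total.public_send(op, value)
    end
    total
  end

  def value
    token = @tokens[@pos]
    @pos += 1
    if token == "("
      inner = sum
      raise ArgumentError, "expected )" unless @tokens[@pos] == ")"
      @pos += 1
      inner
    elsif token =~ /\A\d+\z/
      token.to_i
    else
      raise ArgumentError, "unexpected #{token.inspect}"
    end
  end
end

p SumEvaluator.evaluate("1 + 2 * 3")   # 7
p SumEvaluator.evaluate("(1 + 2) * 3") # 9
```

Operator precedence falls out of the grammar's layering: `product` binds tighter than `sum` simply because `sum` is built from products.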


Switch to Sublime Text, idiot
Code is now available:
https://github.com/patchspace/parsing_for_fun_and_profit/




A Ruby Syntax Highlighter


What
A tool to read in simple Ruby source and output syntax-highlighted HTML




Why
Because I thought it would be fun
It was
Because I thought it would be easy
…



How
Build a parse tree of the Ruby source
Walk the tree and spit out a <span> element for each bit of text
Oh yes, make sure each line goes in <div> and <pre> tags
Wrap it in <html>
And for bonus points, do some fancy method highlighting

Switch to Chrome, idiot
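The steps listed under How above can be sketched with the standard library alone. This version walks Ripper's token stream (`Ripper.lex`) rather than a full parse tree, which is enough for colouring but not for the fancy method highlighting bonus. The `CSS_CLASSES` table and `highlight` helper are hypothetical names, and this is not the code in the talk's repository.

```ruby
require 'ripper'
require 'erb'

# Hypothetical mapping from Ripper scanner events to CSS class names.
CSS_CLASSES = {
  on_kw: "keyword",
  on_ident: "identifier",
  on_const: "constant",
  on_int: "number",
  on_float: "number",
  on_tstring_beg: "string",
  on_tstring_content: "string",
  on_tstring_end: "string",
  on_comment: "comment",
  on_op: "operator"
}.freeze

# Turn Ruby source into a standalone HTML page: one <span> per token,
# one <div><pre> per source line, all wrapped in <html>.
def highlight(source)
  lines = [String.new]
  Ripper.lex(source).each do |_position, event, token|
    css = CSS_CLASSES.fetch(event, "plain")
    # Some tokens (newlines, comments, multi-line strings) contain line
    # breaks, so split them and start a new output line at each break.
    token.split("\n", -1).each_with_index do |piece, index|
      lines << String.new if index > 0
      next if piece.empty?
      lines.last << %(<span class="#{css}">#{ERB::Util.html_escape(piece)}</span>)
    end
  end
  body = lines.map { |line| "<div><pre>#{line}</pre></div>" }.join("\n")
  "<html><body>\n#{body}\n</body></html>"
end

puts highlight(%(name = "<world>"\nputs name))
```

Escaping each token's text matters: without it, a Ruby string such as `"<world>"` would be interpreted by the browser as markup.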


Switch to Sublime Text again, idiot
Code is now available:
https://github.com/patchspace/parsing_for_fun_and_profit/




We’re doing this the hard way
Ruby’s grammar is too complex, and too loosely specified, to implement easily as a PEG
Tools for parsing Ruby already exist
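One such tool ships with Ruby itself: the `ripper` standard library, which exposes the interpreter's own parser. `Ripper.sexp` returns the parse tree as nested arrays, or `nil` if the source does not parse. Exact node names vary between Ruby versions (the talk targets 1.9.3), so this sketch only relies on identifier leaves being tagged `:@ident`; `identifiers` is a hypothetical helper.

```ruby
require 'ripper'
require 'pp'

# Nested arrays such as [:program, [[:assign, ...]]]; shape varies by Ruby version.
tree = Ripper.sexp("foo = bar + 123")
pp tree

# Hypothetical helper: walk the tree and collect every identifier name.
def identifiers(node, found = [])
  return found unless node.is_a?(Array)
  if node.first == :@ident
    found << node[1] # [:@ident, name, [line, column]]
  else
    node.each { |child| identifiers(child, found) }
  end
  found
end

p identifiers(tree)
```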




Ripper (Ruby 1.9.3)

Learn more!
Skip theoretical physics, start by playing with Lego




Do more
Ideas you might like to try:
CSV parser
JSON parser (return arrays & hashes)
XML parser
JSON highlighter
A simple JavaScript minifier (just kill whitespace)

Thank you

Ash Moran
ash.moran@patchspace.co.uk
PatchSpace Ltd