Don't Be Afraid of Abstract Syntax Trees

ASTs are an incredibly powerful tool for understanding and manipulating JavaScript. We'll explore this topic by looking at examples from ESLint, a pluggable static analysis tool, and Browserify, a client-side module bundler. Through these examples we'll see how ASTs can be great for analyzing and even for modifying your JavaScript. This talk should be interesting to anyone who regularly builds apps in JavaScript, either on the client side or on the server side.

Published in: Technology
  • Hi, I’m Jamund. I work at PayPal doing Node stuff, and I have an environmental planning degree from the University of Washington, which hopefully qualifies me to talk to you about trees today.
  • Before I get started, how many of you have used these tools before and know what I’m talking about? Cool. The rest of you, please stick around. It’s totally worth it, and I’m sure these people can attest to that.
  • Instead of having to worry about string parsing and regexes and all that nonsense.
  • Here’s an example. This uses the Mozilla Parser API format used by Esprima and represents the following code.
  • This single statement creates a tree with 7 different nodes!!!
  • Each item in this tree is a node of one of several types. We’ll discuss these types in more depth later.
  • My goal is to help my team not introduce bugs in their code.
  • Anyone know what’s wrong with this?
  • So we just built an ESLint rule that would ensure we never made that same mistake again! And it worked! We haven’t seen that type of bug crop up in our codebase.
  • I love static analysis and that example proves that it can do a lot more than complain about your formatting.
  • Essentially, since 2011 every new major static analysis tool has been based on Esprima.
  • Mostly because of this… If I want to add a rule, I can put it in its own file/function. Adding that handle-callback-err thing was no problem at all. Even if ESLint didn’t take it, at this point I can just npm install rules and use them! It rocks.
  • Mixes the parser with the linter. It’s a great tool. But it’s not as easy to extend.
  • Every time there’s a member expression, which is thing.thing, you just check whether the object’s name is console, in which case it complains. Super simple.

    But who cares? You can do this with “grep” or something. Let’s do something more fun.
  • Pretty awesome. Every time you have a function expression or function declaration, we check to see if it has an ancestor that’s a loop. Pretty dang slick.
  • https://github.com/eslint/eslint/blob/master/lib/rules/max-params.js
  • Here’s a custom one we have that we only apply to certain parts of the code-base. For example in our backbone models, to prevent people from using Backbone.sync.
  • I’m currently working with our globalization team on rules to help us avoid mistakes in handling dates and phone numbers worldwide, and much more.
  • For example: what if you could turn your AST into a series of beats and you could listen to your code to help determine how complicated it is. Short song good. Consistent tones good. Too many high notes bad? I don’t know. Think about that.
  • We’ll use both of these in the following examples:
  • How many of you have used browserify?
  • Say we want to change this to this. Dynamically. Easy.
  • Through takes two callbacks. The first lets you buffer the stream data; the second runs once you have everything.
  • This is our parse() function. We use falafel’s node.update to transform the node.
  • Altogether it looks like this:
  • And it works. And you can add them with npm and they just work.
  • Marijn Haverbeke, author of the Acorn.js parser and a genius, asks this question. The answer is no, but don’t let that stop you from writing .forEach() and .map() and .filter(). CODE IS FOR HUMANS.
  • https://github.com/xjamundx/perfify-recast/
  • But we can do hard things.
  • But we write code for humans, so we’ll let the transform take care of this.
  • https://github.com/benjamn/recast uses “partial source transformation” to safely

    With browserify, when you apply the transforms it rewrites the whole thing. You generally lose comments and anything else that isn’t needed to form the AST. Recast (and some other tools like falafel) are nice for one-time refactoring because they employ techniques to limit the amount of code that is touched during the re-generation phase.
  • I can show you the full source it’s online…. https://github.com/xjamundx/perfify-recast/blob/master/index.js
  • There’s also a Facebook fork of Esprima that supports ES6, so it’s getting some support as well. I assume it will eventually bubble back up to the main branch…hopefully?
  • You probably just need esprima.
  • http://esprima.org/demo/parse.html#
  • It’s also called the SpiderMonkey API. Are th
  • The key to being successful in working with the JavaScript AST is to study the node types. Memorize them. Get used to them. Think about your JavaScript in this way. Here are the basics.
  • https://github.com/benjamn/ast-types
  • https://github.com/benjamn/ast-types/blob/master/def/core.js
  • Essentially Acorn came out around the time Uglify2 was coming out and Esprima was still new, so he spent a lot of time improving Uglify and didn’t want to rewrite it again, so he wrote a compatibility layer.
  • https://medium.com/@valueof/why-i-forked-jslint-to-jshint-73a72fd3612
  • https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/Parser_API
  • There are tons of other tools out there as well. I’m just most familiar with these.
  • Don't Be Afraid of Abstract Syntax Trees

    1. 1. Don’t Be Afraid of ASTs Jamund Ferguson
    2. 2. Our Basic Plan 1. High-level overview 2. Static Analysis with ASTs 3. Transforming and refactoring 4. A quick look at the Mozilla Parser API (de-facto standard AST format)
    3. 3. An abstract syntax tree is basically a DOM for your code.
    4. 4. An AST makes it easier to inspect and manipulate your code with confidence.
    5. 5. { "type": "Program", "body": [ { "type": "VariableDeclaration", "kind": "var", "declarations": [ { "type": "VariableDeclarator", "id": { "type": "Identifier", "name": "fullstack" }, "init": { "type": "BinaryExpression", "left": { "type": "Identifier", "name": "node" }, "operator": "+", "right": { "type": "Identifier", "name": "ui" } } } ] } ] }
    6. 6. var fullstack = node + ui;
    7. 7. { "type": "Program", "body": [ { "type": "VariableDeclaration", "kind": "var", "declarations": [ { "type": "VariableDeclarator", "id": { "type": "Identifier", "name": "fullstack" }, "init": { "type": "BinaryExpression", "left": { "type": "Identifier", "name": "node" }, "operator": "+", "right": { "type": "Identifier", "name": "ui" } } } ] } ] }
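To make the "DOM for your code" idea concrete, here is a minimal sketch that walks the tree for `var fullstack = node + ui;` and counts its nodes. The AST is written out by hand rather than produced by a parser, and `walk` is a hypothetical helper, not part of Esprima or any other library.

```javascript
// Hand-written AST for: var fullstack = node + ui;
// (the shape a Mozilla Parser API parser such as Esprima produces)
var ast = {
  type: "Program",
  body: [{
    type: "VariableDeclaration",
    kind: "var",
    declarations: [{
      type: "VariableDeclarator",
      id: { type: "Identifier", name: "fullstack" },
      init: {
        type: "BinaryExpression",
        operator: "+",
        left: { type: "Identifier", name: "node" },
        right: { type: "Identifier", name: "ui" }
      }
    }]
  }]
};

// walk() is a hypothetical helper: it calls visit(node) for every object
// that has a string "type" property, then recurses into its children.
function walk(node, visit) {
  if (Array.isArray(node)) {
    node.forEach(function (child) { walk(child, visit); });
    return;
  }
  if (!node || typeof node !== "object") {
    return;
  }
  if (typeof node.type === "string") {
    visit(node);
  }
  Object.keys(node).forEach(function (key) {
    walk(node[key], visit);
  });
}

var types = [];
walk(ast, function (node) { types.push(node.type); });

console.log(types.length); // 7
console.log(types.join(" > "));
```

Almost every tool in this talk is some variation on this loop: visit each node, look at its `type`, and decide what to do.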
    8. 8. Things Built On ASTs • Syntax Highlighting • Code Completion • Static Analysis (aka JSLint, etc.) • Code Coverage • Minification • JIT Compilation • Source Maps • Compile to JS Languages So much more…
    9. 9. Static Analysis
    10. 10. It’s not just about formatting.
    11. 11. Fix a bug. Add a unit test. Fix a similar bug…
    12. 12. Write some really solid static analysis. Never write that same type of bug again.
    13. 13. Bad Example: function loadUser(req, res, next) { User.loadUser(function(err, user) { req.session.user = user; next(); }); } We forgot to handle the error!
    14. 14. handle-callback-err 1. Each time a function is declared, check if there is an error* parameter. 2. If so, set a count to 0. 3. Increment the count each time the error is used. 4. At the end of the function, warn when the count is still 0. * The parameter name can be defined by the user.
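The counting idea above can be sketched without ESLint at all. This is not the real rule's source: the builders (`id`, `call`, `fn`), `countUses`, `checkCallbackErr`, and the warning text are all hypothetical, and the sketch assumes the error is the first parameter. It also counts any identifier with the right name, so a property like `foo.err` would fool it.

```javascript
// Tiny hypothetical builders for Mozilla Parser API style nodes.
function id(name) { return { type: "Identifier", name: name }; }
function call(callee, args) {
  return {
    type: "ExpressionStatement",
    expression: { type: "CallExpression", callee: callee, arguments: args }
  };
}
function fn(params, statements) {
  return {
    type: "FunctionExpression",
    params: params,
    body: { type: "BlockStatement", body: statements }
  };
}

// Count Identifier nodes called `name` anywhere under `node`.
function countUses(node, name) {
  if (Array.isArray(node)) {
    return node.reduce(function (sum, child) {
      return sum + countUses(child, name);
    }, 0);
  }
  if (!node || typeof node !== "object") { return 0; }
  var own = node.type === "Identifier" && node.name === name ? 1 : 0;
  return Object.keys(node).reduce(function (sum, key) {
    return sum + countUses(node[key], name);
  }, own);
}

// Returns a warning string, or null when the function is fine.
function checkCallbackErr(fnNode, errName) {
  var first = fnNode.params[0];
  if (!first || first.name !== errName) { return null; } // no error param
  return countUses(fnNode.body, errName) === 0
    ? "Expected error to be handled."
    : null;
}

// function (err, user) { next(); }
var bad = fn([id("err"), id("user")], [call(id("next"), [])]);
// function (err, user) { next(err); }
var good = fn([id("err"), id("user")], [call(id("next"), [id("err")])]);

console.log(checkCallbackErr(bad, "err"));  // Expected error to be handled.
console.log(checkCallbackErr(good, "err")); // null
```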
    15. 15. Static Analysis • Complexity Analysis • Catching Mistakes • Consistent Style
    16. 16. History Lesson • 1995: JavaScript • 2002: JSLint started by Douglas Crockford • 2011: JSHint comes out as a fork of JSLint. Esprima AST parser released. • 2012: plato, escomplex, complexity-report • 2013: Nicholas Zakas releases ESLint. Marat Dulin releases JSCS.
    17. 17. My static analysis tool of choice is ESLint.
    18. 18. JSHint Mixes the rule engine with the parser
    19. 19. Examples
    20. 20. no-console return { "MemberExpression": function(node) { if (node.object.name === "console") { context.report(node, "Unexpected console statement."); } } }; https://github.com/eslint/eslint/blob/master/lib/rules/no-console.js
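If you're wondering what calls that `MemberExpression` function, here is a rough sketch of the idea. The rule body is the one from the slide; `runRule` and its `context` object are hypothetical stand-ins for what ESLint does internally, and the AST for `console.log("hi");` is written by hand.

```javascript
// The rule from the slide, wrapped in a function that receives a context.
function noConsole(context) {
  return {
    "MemberExpression": function (node) {
      if (node.object.name === "console") {
        context.report(node, "Unexpected console statement.");
      }
    }
  };
}

// Hypothetical runner: walks the tree and calls the handler whose key
// matches each node's type, collecting whatever the rule reports.
function runRule(ast, createRule) {
  var messages = [];
  var context = {
    report: function (node, message) { messages.push(message); }
  };
  var handlers = createRule(context);

  (function visit(node) {
    if (Array.isArray(node)) { node.forEach(visit); return; }
    if (!node || typeof node !== "object") { return; }
    if (typeof handlers[node.type] === "function") {
      handlers[node.type](node);
    }
    Object.keys(node).forEach(function (key) { visit(node[key]); });
  })(ast);

  return messages;
}

// Hand-written AST for: console.log("hi");
var ast = {
  type: "Program",
  body: [{
    type: "ExpressionStatement",
    expression: {
      type: "CallExpression",
      callee: {
        type: "MemberExpression",
        computed: false,
        object: { type: "Identifier", name: "console" },
        property: { type: "Identifier", name: "log" }
      },
      arguments: [{ type: "Literal", value: "hi" }]
    }
  }]
};

console.log(runRule(ast, noConsole)); // [ 'Unexpected console statement.' ]
```

The nice part of this design is that the rule never has to know how the tree is walked. It only says which node types it cares about.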
    21. 21. no-loop-func function checkForLoops(node) { var ancestors = context.getAncestors(); if (ancestors.some(function(ancestor) { return ancestor.type === "ForStatement" || ancestor.type === "WhileStatement" || ancestor.type === "DoWhileStatement"; })) { context.report(node, "Don't make functions within a loop"); } } return { "FunctionExpression": checkForLoops, "FunctionDeclaration": checkForLoops };
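The `getAncestors()` call is the interesting part of that rule. One way a walker could provide it is to keep a stack of the nodes it is currently inside. The sketch below does that with a hypothetical `findFunctionsInLoops` function and a hand-written AST; it mirrors the three loop types from the slide and is not ESLint's actual implementation.

```javascript
var LOOPS = ["ForStatement", "WhileStatement", "DoWhileStatement"];

// Hypothetical helper: returns every function node that sits inside a loop.
function findFunctionsInLoops(ast) {
  var found = [];
  var ancestors = []; // nodes we are currently inside, outermost first

  function visit(node) {
    if (Array.isArray(node)) {
      node.forEach(function (child) { visit(child); });
      return;
    }
    if (!node || typeof node !== "object") { return; }

    var isNode = typeof node.type === "string";
    if (isNode) {
      var isFunction = node.type === "FunctionExpression" ||
                       node.type === "FunctionDeclaration";
      var inLoop = ancestors.some(function (ancestor) {
        return LOOPS.indexOf(ancestor.type) !== -1;
      });
      if (isFunction && inLoop) { found.push(node); }
      ancestors.push(node);
    }
    Object.keys(node).forEach(function (key) { visit(node[key]); });
    if (isNode) { ancestors.pop(); }
  }

  visit(ast);
  return found;
}

// Hand-written AST for:
//   function outer() {}
//   while (true) { setTimeout(function () {}); }
var ast = {
  type: "Program",
  body: [
    {
      type: "FunctionDeclaration",
      id: { type: "Identifier", name: "outer" },
      params: [],
      body: { type: "BlockStatement", body: [] }
    },
    {
      type: "WhileStatement",
      test: { type: "Literal", value: true },
      body: {
        type: "BlockStatement",
        body: [{
          type: "ExpressionStatement",
          expression: {
            type: "CallExpression",
            callee: { type: "Identifier", name: "setTimeout" },
            arguments: [{
              type: "FunctionExpression",
              id: null,
              params: [],
              body: { type: "BlockStatement", body: [] }
            }]
          }
        }]
      }
    }
  ]
};

var offenders = findFunctionsInLoops(ast);
console.log(offenders.length);  // 1
console.log(offenders[0].type); // FunctionExpression
```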
    22. 22. max-params var numParams = context.options[0] || 3; function checkParams(node) { if (node.params.length > numParams) { context.report(node, "This function has too many parameters ({{count}}). Maximum allowed is {{max}}.", { count: node.params.length, max: numParams }); } } return { "FunctionDeclaration": checkParams, "FunctionExpression": checkParams }
    23. 23. no-jquery function isjQuery(name) { return name === '$' || name === 'jquery' || name === 'jQuery'; } return { "CallExpression": function(node) { var name = node.callee && node.callee.name; if (isjQuery(name)) { context.report(node, 'Please avoid using jQuery here.'); } } }
    24. 24. More Complex Rules • indent • no-extend-native • no-next-next • security • internationalization
    25. 25. Other Areas for Static Analysis Code complexity and visualization is another area where static analysis is really useful. Plato is an exciting start, but I believe there are tons of more interesting things that can be done in this area.
    26. 26. Recap • Static analysis can help you catch real bugs and keep your code maintainable • ESLint and JSCS both use ASTs to inspect your code, which makes it easy to cleanly add new rules • Static analysis can also help you manage your code complexity • What exactly does a for loop sound like?
    27. 27. Transforms
    28. 28. Sometimes you want to step into the future, but something is keeping you in the past.
    29. 29. Maybe it’s Internet Explorer
    30. 30. Maybe it’s the size of your code base
    31. 31. ASTs to the rescue!
    32. 32. Tools like falafel and recast give you an API to manipulate an AST and then convert that back into source code.
    33. 33. Two Types of AST Transformations Regenerative Regenerate the full file from the AST. Often losing comments and non-essential formatting. Fine for code not read by humans (i.e. browserify transforms). Partial-source transformation Regenerate only the parts of the source that have changed based on the AST modifications. Nicer for one-time changes in source.
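Here is a small sketch of the difference between the two approaches, using a hand-written AST so it runs without any parser. `generate` and `splice` are hypothetical helpers that only handle this one statement shape. The `range` arrays are character offsets into the source, similar to what Esprima attaches when asked for ranges. This is not how falafel or recast are implemented, only the general idea.

```javascript
var source = "var fullstack = node + ui; // full stack!";

// Hand-written AST for the statement above. The comment is not in the tree.
var ast = {
  type: "Program",
  body: [{
    type: "VariableDeclaration",
    kind: "var",
    declarations: [{
      type: "VariableDeclarator",
      id: { type: "Identifier", name: "fullstack", range: [4, 13] },
      init: {
        type: "BinaryExpression",
        operator: "+",
        left: { type: "Identifier", name: "node", range: [16, 20] },
        right: { type: "Identifier", name: "ui", range: [23, 25] }
      }
    }]
  }]
};

// Regenerative: rebuild all of the text from the tree.
function generate(node) {
  switch (node.type) {
    case "Program":
      return node.body.map(generate).join("\n");
    case "VariableDeclaration":
      return node.kind + " " + node.declarations.map(generate).join(", ") + ";";
    case "VariableDeclarator":
      return generate(node.id) + " = " + generate(node.init);
    case "BinaryExpression":
      return generate(node.left) + " " + node.operator + " " + generate(node.right);
    case "Identifier":
      return node.name;
    default:
      throw new Error("Unsupported node type: " + node.type);
  }
}

// Partial-source: replace only the changed ranges in the original text.
// Edits are applied from the end so earlier offsets stay valid.
function splice(text, edits) {
  return edits.slice()
    .sort(function (a, b) { return b.range[0] - a.range[0]; })
    .reduce(function (out, edit) {
      return out.slice(0, edit.range[0]) + edit.text + out.slice(edit.range[1]);
    }, text);
}

var target = ast.body[0].declarations[0].init.right; // the `ui` identifier
var partial = splice(source, [{ range: target.range, text: "browserify" }]);

target.name = "browserify";
var regenerated = generate(ast);

console.log(regenerated); // var fullstack = node + browserify;
console.log(partial);     // var fullstack = node + browserify; // full stack!
```

The regenerated version loses the comment because the comment was never in the tree. The partial version keeps it because everything outside the edited range is left untouched.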
    34. 34. Build a Simple Browserify Transform
    35. 35. var fullstack = node + ui; → var fullstack = node + browserify;
    36. 36. 4 Steps 1. Buffer up the stream of source code 2. Convert the source into an AST 3. Transform the AST 4. Re-generate and output the source
    37. 37. Step 1 Use through to grab the source code var through = require('through'); var buffer = []; return through(function write(data) { buffer.push(data); }, function end () { var source = buffer.join(''); });
    38. 38. Step 2 Use falafel to create an AST var falafel = require('falafel'); function end () { var source = buffer.join(''); var out = falafel(source, parse).toString(); }
    39. 39. Step 3 Use falafel to transform the AST function parse(node) { if (node.type === 'Identifier' && node.name === 'ui') { node.update('browserify'); } }
    40. 40. Step 4 Stream the source with through and close the stream function end () { var source = buffer.join(''); var out = falafel(source, parse).toString(); this.queue(out); this.queue(null); // end the stream }
    41. 41. var through = require('through'); var falafel = require('falafel'); module.exports = function() { var buffer = []; return through(function write(data) { buffer.push(data); }, function end() { var source = buffer.join(''); var out = falafel(source, parse).toString(); this.queue(out); this.queue(null); // close the stream }); }; function parse(node) { if (node.type === 'Identifier' && node.name === 'ui') { node.update('browserify'); } }
    42. 42. It Works! browserify -t ./ui-to-browserify.js code.js (function e(t,n,r){function s(o,u){if(!n[o]){if(!t[o]){var a=typeof require=="var fullstack = node + browserify; },{}]},{},[1]);
    43. 43. Lots of code to do something simple?
    44. 44. Probably, but… It will do exactly what is expected 100% of the time.
    45. 45. And it’s a building block for building a bunch of cooler things.
    46. 46. What sort of cooler things?
    47. 47. How about performance?
    48. 48. V8 doesn’t do it, but there’s nothing stopping you*.
    49. 49. *Except it’s hard.
    50. 50. A Basic Map/Filter var a = [1, 2, 3]; var b = a.filter(function(n) { return n > 1; }).map(function(k) { return k * 2; });
    51. 51. Faster Like This var a = [1, 2, 3]; var b = []; for (var i = 0; i < a.length; i++) { if (a[i] > 1) { b.push(a[i] * 2); } }
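Any transform that rewrites the first form into the second has to keep the result identical. A quick sanity check like the one below is a cheap way to confirm that for a given input. The function names `viaFilterMap` and `viaLoop` are made up for this sketch, and it says nothing about which version is actually faster on your engine.

```javascript
// The human-friendly version from the slide.
function viaFilterMap(a) {
  return a.filter(function (n) { return n > 1; })
          .map(function (k) { return k * 2; });
}

// The hand-unrolled version from the slide.
function viaLoop(a) {
  var b = [];
  for (var i = 0; i < a.length; i++) {
    if (a[i] > 1) {
      b.push(a[i] * 2);
    }
  }
  return b;
}

console.log(viaFilterMap([1, 2, 3])); // [ 4, 6 ]
console.log(viaLoop([1, 2, 3]));      // [ 4, 6 ]
```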
    52. 52. A Basic Recast Script var fs = require('fs'); var recast = require('recast'); var code = fs.readFileSync('code.js', 'utf-8'); var ast = recast.parse(code); var faster = transform(ast); var output = recast.print(faster).code;
    53. 53. function transform(ast) { var transformedAST = new MapFilterEater({ body: ast.program.body }).visit(ast); return transformedAST; } var Visitor = recast.Visitor; var MapFilterEater = Visitor.extend({ init: function(options) {}, visitForStatement: function(ast) {}, visitIfStatement: function(ast) {}, visitCallExpression: function(ast) {}, visitVariableDeclarator: function(ast) {} });
    54. 54. How Does it Work? 1. Move the right side of the b declaration into a for loop 2. Set b = [] 3. Place the .filter() contents inside of an if statement 4. Unwrap the .map contents and .push() them into b 5. Replace all of the local counters with a[_i]
    55. 55. And Voila…. var a = [1, 2, 3]; var b = []; for (var i = 0; i < a.length; i++) { if (a[i] > 1) { b.push(a[i] * 2); } }
    56. 56. Worth the effort? YES!
    57. 57. The most well-read documentation for how to engineer your app is the current codebase.
    58. 58. If you change your code, you can change the future.
    59. 59. Knowledge What is an AST and what does it look like?
    60. 60. Parser 1. Read your raw JavaScript source. 2. Parse out every single thing that’s happening. 3. Return an AST that represents your code
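To show those three steps end to end, here is a toy parser that only understands statements shaped like `var a = b + c;`. It is a sketch for illustration: `parseTiny` is a made-up name, and a real parser such as Esprima tokenizes and handles the whole grammar instead of leaning on one regular expression.

```javascript
// One identifier, captured.
var IDENT = "([A-Za-z_$][A-Za-z0-9_$]*)";

// Matches: var <name> = <left> <operator> <right>;
var PATTERN = new RegExp(
  "^\\s*var\\s+" + IDENT + "\\s*=\\s*" + IDENT +
  "\\s*([-+*/])\\s*" + IDENT + "\\s*;\\s*$"
);

function parseTiny(source) {
  // 1. Read the raw source. 2. Pick out the pieces we understand.
  var match = PATTERN.exec(source);
  if (!match) {
    throw new SyntaxError("parseTiny only understands: var a = b + c;");
  }
  // 3. Return an AST in the Mozilla Parser API shape.
  return {
    type: "Program",
    body: [{
      type: "VariableDeclaration",
      kind: "var",
      declarations: [{
        type: "VariableDeclarator",
        id: { type: "Identifier", name: match[1] },
        init: {
          type: "BinaryExpression",
          operator: match[3],
          left: { type: "Identifier", name: match[2] },
          right: { type: "Identifier", name: match[4] }
        }
      }]
    }]
  };
}

var ast = parseTiny("var fullstack = node + ui;");
var init = ast.body[0].declarations[0].init;
console.log(init.left.name, init.operator, init.right.name); // node + ui
```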
    61. 61. Esprima is a very popular* parser that converts your code into an abstract syntax tree. *FB recently forked it to add support for ES6 and JSX
    62. 62. Parsers narcissus ZeParser Treehugger Uglify-JS Esprima Acorn
    63. 63. Esprima follows the Mozilla Parser API which is a well documented AST format used internally by Mozilla (and now by basically everyone else*)
    64. 64. var fullstack = node + ui;
    65. 65. { "type": "Program", "body": [ { "type": "VariableDeclaration", "kind": "var", "declarations": [ { "type": "VariableDeclarator", "id": { "type": "Identifier", "name": "fullstack" }, "init": { "type": "BinaryExpression", "left": { "type": "Identifier", "name": "node" }, "operator": "+", "right": { "type": "Identifier", "name": "ui" } } } ] } ] }
    66. 66. { "type": "Program", "body": [ { "type": "VariableDeclaration", "kind": "var", "declarations": [ { "type": "VariableDeclarator", "id": { "type": "Identifier", "name": "fullstack" }, "init": { "type": "BinaryExpression", "left": { "type": "Identifier", "name": "node" }, "operator": "+", "right": { "type": "Identifier", "name": "ui" } } } ] } ] }
    67. 67. Node Types SwitchCase (1) Property (1) Literal (1) Identifier (1) Declaration (3) Expression (14) Statement (18)
    68. 68. Expression Types • FunctionExpression • MemberExpression • CallExpression • NewExpression • ConditionalExpression • LogicalExpression • UpdateExpression • AssignmentExpression • BinaryExpression • UnaryExpression • SequenceExpression • ObjectExpression • ArrayExpression • ThisExpression
    69. 69. Statement Types • DebuggerStatement • ForInStatement • ForStatement • DoWhileStatement • WhileStatement • CatchClause • TryStatement • ThrowStatement • ReturnStatement • SwitchStatement • WithStatement • ContinueStatement • BreakStatement • LabeledStatement • IfStatement • ExpressionStatement • BlockStatement • EmptyStatement
    70. 70. Debugging ASTs
    71. 71. • When debugging console.log(ast) will not print a large nested AST properly. Instead you can use util.inspect: var util = require('util'); var tree = util.inspect(ast, { depth: null }); console.log(tree); • When transforming code start with the AST you want and then work backward. • Often this means pasting code using the Esprima online visualization tool or just outputting the trees into JS files and manually diffing them.
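Here is the `util.inspect` tip as a runnable sketch, using a hand-written AST for `var fullstack = node + ui;`. With Node's default depth the deeper levels collapse into placeholders, while `depth: null` prints everything. Since these trees are plain objects, `JSON.stringify` with an indent also works if you prefer JSON output.

```javascript
var util = require("util");

// Hand-written AST for: var fullstack = node + ui;
var ast = {
  type: "Program",
  body: [{
    type: "VariableDeclaration",
    kind: "var",
    declarations: [{
      type: "VariableDeclarator",
      id: { type: "Identifier", name: "fullstack" },
      init: {
        type: "BinaryExpression",
        operator: "+",
        left: { type: "Identifier", name: "node" },
        right: { type: "Identifier", name: "ui" }
      }
    }]
  }]
};

var shallow = util.inspect(ast);              // default depth hides deep levels
var full = util.inspect(ast, { depth: null }); // no depth limit
var json = JSON.stringify(ast, null, 2);       // plain JSON alternative

console.log(shallow); // declarations shows up as [Array]
console.log(full);    // the whole tree, down to the identifiers
```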
    72. 72. Oftentimes it helps to print out the code representation of a single node. In recast you can do: var source = recast.prettyPrint(ast, { tabWidth: 2 }).code; In ESLint you can get the current node with: var source = context.getSource(node)
    73. 73. ASTs can turn your code into play-dough
    74. 74. Totally worth the effort!
    75. 75. the end @xjamundx
    76. 76. Related
        - http://pegjs.majda.cz/
        - https://www.kickstarter.com/projects/michaelficarra/make-a-better-coffeescript-compiler
        - http://coffeescript.org/documentation/docs/grammar.html
        - https://github.com/padolsey/parsers-built-with-js
        - https://github.com/zaach/jison
        - https://github.com/substack/node-falafel - esprima-based code modifier
        - https://github.com/substack/node-burrito - uglify-based AST walking code-modifier
        - https://github.com/jscs-dev/node-jscs/blob/e745ceb23c5f1587c3e43c0a9cfb05f5ad86b5ac/lib/js-file.js - JSCS’s way of walking the AST
        - https://www.npmjs.org/package/escodegen - converts an AST into real code again
        - https://www.npmjs.org/package/ast-types - esprima-ish parser
        - http://esprima.org/demo/parse.html - the most helpful tool
        - https://github.com/RReverser/estemplate - AST-based search and replace
        - https://www.npmjs.org/package/aster - build system thing
        Technical Papers
        - http://aosd.net/2013/escodegen.html
        Videos / Slides
        - http://slidedeck.io/benjamn/fluent2014-talk
        - http://vimeo.com/93749422
        - https://speakerdeck.com/michaelficarra/spidermonkey-parser-api-a-standard-for-structured-js-representations
        - https://www.youtube.com/watch?v=fF_jZ7ErwUY
        - https://speakerdeck.com/ariya/bringing-javascript-code-analysis-to-the-next-level
        Just in Time Compilers
        - http://blogs.msdn.com/b/ie/archive/2012/06/13/advances-in-javascript-performance-in-ie10-and-windows-8.aspx
        - https://blog.mozilla.org/luke/2014/01/14/asm-js-aot-compilation-and-startup-performance/
        - https://blog.indutny.com/4.how-to-start-jitting
        Podcasts
        - http://javascriptjabber.com/082-jsj-jshint-with-anton-kovalyov/
        - http://javascriptjabber.com/054-jsj-javascript-parsing-asts-and-language-grammar-w-david-herman-and-ariya-hidayat/
    77. 77. RANDOM EXTRA SLIDES
    78. 78. Static analysis tools like ESLint and JSCS provide an API to let you inspect an AST to make sure it’s following certain patterns.
    79. 79. function isEmptyObject( obj ) { for ( var name in obj ) { return false; } return true; }
    80. 80. static analysis > unit testing > functional testing
    81. 81. function loadUser(req, res, next) { User.loadUser(function(err, user) { if (err) { next(err); } req.session.user = user; next(); }); } Another Example
    82. 82. Abstract Christmas Trees: Program, VariableDeclarator, VariableDeclaration, FunctionExpression, ExpressionStatement, Identifier, Identifier
    83. 83. Tools like falafel and recast give you an API to manipulate an AST and then convert that back into source code.
