
How to create a high performance Excel engine in JavaScript

You have complex mathematical models (millions of cells, hundreds of thousands of formulas) in Excel, and you need to run them in the browser and on mobile without Excel. I will talk about how we created our own spreadsheet engine, compatible with MS Excel, which allows us to run any Excel model without Excel. I will talk about:
* Architecture
* Algorithms
* JavaScript performance optimization.

Published in: Engineering

  1. 1. Create own Excel with JavaScript Viktor Turskyi CEO at WebbyLab 2019
  2. 2. Viktor Turskyi ● CEO and principal architect at WebbyLab ● Open source developer ● More than 15 years of experience ● Delivered more than 60 projects of different scale ● Did projects for 5 companies from Fortune 500 list
  3. 3. Business task: The business logic lives in Excel, and you need to code it in your app and run it in the browser and on mobile. This business logic consists of complex mathematical models.
  4. 4. Ok. Send us the file.
  5. 5. Model details ● 2 million cells ● 400k formulas ● 1 million Excel function calls ● 50 sheets ● Computation chains of 20-30k cells
  6. 6. Demo of original source file
  7. 7. Requirements ● High performance (<2s full recompute) ● Small file size (suitable for work in browser) ● Offline work in browser ● Work on server ● Offline work on tablets (iOS, Android)
  8. 8. Decision: write our own Excel in JavaScript
  9. 9. What we want?
  10. 10. Is JS performant enough for mathematical computations?
  11. 11. Performance testing (100k times, large math formula AST)
  12. 12. Any ideas how to do this?
  13. 13. It is like creating a compiler
  14. 14. Components
  15. 15. Extractor
  16. 16. How to read data from XLS file? ● Extract values ● Extract formulas ● Extract sheet names ● Extract cells/ranges names
  17. 17. We tried (none of them worked): ● Node.js libraries ● Ruby libraries ● Python libraries ● Perl libraries ● PHP libraries
  18. 18. What did work for us? Run Excel as an OLE object and communicate with it via VBA methods
  19. 19. Preprocessor
  20. 20. What next? Preprocess all data 1. Parse all raw data 2. Parse and normalize formulas 3. Parse and normalize references 4. Optimize size
  21. 21. FormulaParser: What to parse? 1. Operators priority 2. Infix/prefix operators 3. Constants 4. Functions 5. Cell references 6. Range references 7. Named ranges
  22. 22. Formula example: =IF($F$36 + $AF128 <= 101; SUMPRODUCT( ($S128:OFFSET($S128;$F$36-1;0)) * ($AG$55:OFFSET($AG$55;$F$36-1;0)) * ('Sheet25'!BY84:OFFSET('Sheet25'!BY84;$F$36-1;0) + 'Sheet25'!BY194:OFFSET('Sheet25'!BY194; $F$36-1; 0) ) ); SUMPRODUCT( ($S128:$S$155) * ($AG$55:OFFSET($AG$55;100-$AF128;0)) * ('Sheet25'!BY84:BY$111 + 'Sheet25'!BY194:BY$221) ) )
  23. 23. Mistake 1: Trying to write own parser from scratch Own parser 1. Complex 2. A lot of time 3. Expensive
  24. 24. 90% of the work is the same as writing a parser for programming language
  25. 25. Good solution - ANTLR 1. Parser generator based on Grammars (including JS) 2. Lexer and Parser 3. Emits AST (Abstract Syntax Tree) 4. The fastest and the most powerful http://www.antlr.org/
  26. 26. Formula examples Formula: '=1+2*3' JS AST: [ '+', 1, [ '*', 2, 3 ] ] Formula: '=A1+B1' JS AST: [ '+', [ '=', 0, 0, 0 ], [ '=', 0, 1, 0 ] ] Formula: '=SUM(B5:B100, 42)' JS AST: [ 'SUM', [ 'RANGE', 0, 1, 4, 1, 99 ], 42 ]
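To illustrate how an array-based AST like the one on this slide could be evaluated, here is a minimal sketch. It assumes the node layouts are `['=', sheet, col, row]` for a cell and `['RANGE', sheet, col1, row1, col2, row2]` for a range, all zero-based; that field order is inferred from the B5:B100 example and is not confirmed by the talk. The names `evaluate`, `getCell`, `OPERATORS` and `FUNCTIONS` are hypothetical.

```javascript
// Hypothetical operator and function tables (not the engine's real ones).
const OPERATORS = new Map([
  ['+', (a, b) => a + b],
  ['-', (a, b) => a - b],
  ['*', (a, b) => a * b],
  ['/', (a, b) => a / b],
]);

const FUNCTIONS = new Map([
  ['SUM', (args) => args.flat(Infinity).reduce((acc, v) => acc + v, 0)],
]);

// getCell(sheet, col, row) is an assumed callback returning a cell's value.
function evaluate(node, getCell) {
  if (!Array.isArray(node)) return node; // constant
  const [head, ...rest] = node;
  if (head === '=') return getCell(...rest); // single cell reference
  if (head === 'RANGE') {
    const [sheet, col1, row1, col2, row2] = rest;
    const values = [];
    for (let row = row1; row <= row2; row++) {
      for (let col = col1; col <= col2; col++) {
        values.push(getCell(sheet, col, row));
      }
    }
    return values;
  }
  // Anything else is an operator or function applied to evaluated children.
  const args = rest.map((child) => evaluate(child, getCell));
  if (OPERATORS.has(head)) return OPERATORS.get(head)(args[0], args[1]);
  if (FUNCTIONS.has(head)) return FUNCTIONS.get(head)(args);
  throw new Error(`Unknown AST node: ${head}`);
}
```

For brevity this sketch copies range values into a plain array, which is the simple approach rather than a fast one.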
  27. 27. Model Runner
  28. 28. Model Runner (simplified)
  29. 29. Components ● LocalRunner(Engine) - works with model, processes all cells dependencies ● Formula Evaluator - computes one formula ● Address Parser - parses address in runtime ● Functions - Excel functions implementation
  30. 30. Real engine usage
  31. 31. Implementation of Excel functions ● One function, one module ● No side effects ● Use dependency injection ● Test, test, test (Excel functions often do not work as documented) Call example: SQRT([ 9 ]) returns 3; SUM([ 2, [ 5, 6, 7, 9 ], 1 ]) returns 30
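A minimal sketch of what "one function, one module, no side effects, dependency injection" might look like. Each factory below stands in for a separate module; the factory names and the injected `toNumber` helper are assumptions for illustration, and the coercion rule is far simpler than real Excel behavior.

```javascript
// Each factory stands in for one module; dependencies are passed in, not imported.
function createSQRT({ toNumber }) {
  return function SQRT([value]) {
    return Math.sqrt(toNumber(value));
  };
}

function createSUM({ toNumber }) {
  return function SUM(args) {
    let total = 0;
    for (const arg of args) {
      // A nested array stands in for a range argument here.
      if (Array.isArray(arg)) total += SUM(arg);
      else total += toNumber(arg);
    }
    return total;
  };
}

// Hypothetical shared helper; tests can inject a stub instead.
const deps = { toNumber: (value) => Number(value) };
const SQRT = createSQRT(deps);
const SUM = createSUM(deps);
```

Because the functions are pure and receive their helpers as arguments, each one can be tested in isolation against values observed in Excel itself.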
  32. 32. Mistake 2: passing ranges as arrays SUM([A1, B1:B4, C1]) returns 30 SUM([2, [5, 6, 7, 9], 1 ]) returns 30
  33. 33. Range abstraction is very important (avoid unnecessary data copying) SUM( [ [ 21, 22, 23, 31, 32, 33 ] ] ); SUM( [ new ArrayRange([ 21, 22, 23, 31, 32, 33 ]) ] ); SUM( [ new ModelRange(model, 'B2:C4') ] );
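The slide names `ArrayRange` and `ModelRange` but does not show their internals, so the following is only a sketch of one way to get the same effect: both classes expose the iterator protocol, and `SUM` reads values one at a time instead of receiving a copied array. The `model.getValue(col, row)` method and the `parseA1` helper are assumptions.

```javascript
// Wraps an existing array; iteration reads it in place without copying.
class ArrayRange {
  constructor(values) {
    this.values = values;
  }
  *[Symbol.iterator]() {
    yield* this.values;
  }
}

// Reads cells lazily from the model; getValue(col, row) is an assumed API.
class ModelRange {
  constructor(model, address) {
    this.model = model;
    const [from, to] = address.split(':').map(parseA1);
    this.from = from;
    this.to = to;
  }
  *[Symbol.iterator]() {
    for (let col = this.from.col; col <= this.to.col; col++) {
      for (let row = this.from.row; row <= this.to.row; row++) {
        yield this.model.getValue(col, row);
      }
    }
  }
}

// Converts an address such as "B2" into zero-based { col, row }.
function parseA1(address) {
  const match = /^([A-Z]+)(\d+)$/.exec(address);
  if (!match) throw new Error(`Bad address: ${address}`);
  let col = 0;
  for (const ch of match[1]) col = col * 26 + (ch.charCodeAt(0) - 64);
  return { col: col - 1, row: Number(match[2]) - 1 };
}

// Accepts numbers and anything iterable (plain arrays or range objects).
function SUM(args) {
  let total = 0;
  for (const arg of args) {
    if (typeof arg === 'number') total += arg;
    else for (const value of arg) total += value;
  }
  return total;
}
```

With this shape, a function implementation does not need to know whether its argument lives in an array or in the model.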
  34. 34. 2+2
  35. 35. =A1:A10+B1:B10 (does not work)
  36. 36. =SUMPRODUCT(A1:A10+B1:B10; C1:C10) Works
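One reading of the three slides above is that an operator such as `+` has to handle both plain scalars and whole ranges, because inside functions like SUMPRODUCT a range expression is evaluated element by element. The sketch below shows that idea under this assumption; `add` and this simplified `SUMPRODUCT` are hypothetical helpers, and ranges are modeled as flat arrays.

```javascript
// Adds scalars directly and arrays element-wise; a scalar is applied to every element.
function add(a, b) {
  const aIsArray = Array.isArray(a);
  const bIsArray = Array.isArray(b);
  if (!aIsArray && !bIsArray) return a + b;
  if (aIsArray && bIsArray && a.length !== b.length) {
    throw new Error('Range sizes differ');
  }
  const length = aIsArray ? a.length : b.length;
  const result = new Array(length);
  for (let i = 0; i < length; i++) {
    result[i] = (aIsArray ? a[i] : a) + (bIsArray ? b[i] : b);
  }
  return result;
}

// Multiplies the arrays position by position and sums the products.
function SUMPRODUCT(args) {
  const length = args[0].length;
  let total = 0;
  for (let i = 0; i < length; i++) {
    let product = 1;
    for (const arg of args) product *= arg[i];
    total += product;
  }
  return total;
}
```

Here `add(2, 2)` is ordinary arithmetic, while `SUMPRODUCT([add(a, b), c])` first builds the element-wise sums and then combines them with `c`.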
  37. 37. Implementation of ModelRunner A1=1; A2=A1+1; A3=A1+A2. Cell A1 influences A2 and A3. Cell A2 influences A3.
  38. 38. What we want?
  39. 39. We can represent dependencies in the form of a directed acyclic graph (DAG) A1=1; A2=A1+2; A3=A1+A2; Now we can recompute only the dependent cells when a value changes
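As a rough sketch of the dependency graph for this three-cell model: the code inverts "reads from" edges into "influences" edges and then collects every cell affected by a change. The data layout (a `Map` from cell name to the cells its formula reads) and the function names are assumptions, not the engine's actual structures.

```javascript
// Cells each formula reads, for the model A1=1; A2=A1+2; A3=A1+A2.
const precedents = new Map([
  ['A1', []],
  ['A2', ['A1']],
  ['A3', ['A1', 'A2']],
]);

// Inverts "reads from" edges into "influences" edges.
function buildDependents(precedents) {
  const dependents = new Map();
  for (const cell of precedents.keys()) dependents.set(cell, []);
  for (const [cell, reads] of precedents) {
    for (const source of reads) {
      if (!dependents.has(source)) dependents.set(source, []);
      dependents.get(source).push(cell);
    }
  }
  return dependents;
}

// Collects every cell that must be recomputed when `changed` is edited.
function affectedCells(dependents, changed) {
  const seen = new Set();
  const stack = [changed];
  while (stack.length > 0) {
    const cell = stack.pop();
    for (const next of dependents.get(cell) ?? []) {
      if (!seen.has(next)) {
        seen.add(next);
        stack.push(next);
      }
    }
  }
  return seen;
}
```

Note that this only finds which cells are affected; it does not yet decide the order in which to recompute them.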
  40. 40. Mistake 3: relying on synthetic models too much. Test model with a million cells: 2 seconds to recompute. Real model with a million cells: 1 hour to recompute. Reason: we recompute the same cells several times
  41. 41. You can sort your dependency graph with topological sort Each cell will be calculated only one time
  42. 42. We did it: it worked for test files but did not work on real models. Why?
  43. 43. Reason: our graph is more than 10k nodes deep, so the recursive traversal overflowed the stack (the JS call stack held only about 10k frames).
  44. 44. What to do: do not use recursion; traverse the graph manually with your own stack. Real model results: no toposort, 1 hour; with toposort, 6 seconds
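A minimal sketch of a topological sort that keeps its own stack instead of recursing, so graph depth is limited by heap memory rather than by the call stack. This is a generic depth-first variant, not necessarily the one used in the engine; the input format (a `Map` from cell to the cells it reads) is an assumption.

```javascript
// precedents: Map<cell, cell[]> listing the cells each formula reads.
// Returns cells ordered so that every cell comes after all cells it reads.
function topoSort(precedents) {
  const order = [];
  const state = new Map(); // missing = unvisited, 1 = in progress, 2 = done
  for (const root of precedents.keys()) {
    if (state.has(root)) continue;
    // Each frame remembers which dependency to visit next, replacing the call stack.
    const stack = [{ cell: root, next: 0 }];
    state.set(root, 1);
    while (stack.length > 0) {
      const frame = stack[stack.length - 1];
      const reads = precedents.get(frame.cell) ?? [];
      if (frame.next < reads.length) {
        const dep = reads[frame.next++];
        const depState = state.get(dep);
        if (depState === 1) throw new Error(`Circular reference at ${dep}`);
        if (depState === undefined) {
          state.set(dep, 1);
          stack.push({ cell: dep, next: 0 });
        }
      } else {
        // All dependencies are done, so this cell can be scheduled.
        state.set(frame.cell, 2);
        order.push(frame.cell);
        stack.pop();
      }
    }
  }
  return order;
}
```

Evaluating cells in the returned order computes each cell exactly once, after everything it reads is already up to date.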
  45. 45. Optimization Benchmark => tune => benchmark => tune => benchmark => tune => benchmark => tune etc Read a lot about v8 internals Benchmark => tune => benchmark => tune => benchmark => tune => benchmark => tune etc
  46. 46. We did it! What’s next?
  47. 47. OFFSET breaks everything OFFSET(reference, rows, cols, [height], [width]) =OFFSET(D3,3,-2,1,1) - displays the value in cell B6
  48. 48. OFFSET breaks everything =OFFSET(D3, 3, -2) - displays the value in cell B6 A2 = A1+OFFSET(D3, 3, -2). Does A2 depend on B6? Do we have any problem with it?
  49. 49. OFFSET breaks everything =OFFSET(D3, 3, -2) - displays the value in cell B6 A2 = A1+OFFSET(D3, RAND(), RAND()). Which cell does A2 depend on?
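To make the problem concrete, here is a small sketch of how an OFFSET target could be resolved from its arguments. It covers only the three-argument, single-cell form on one sheet, and the helper names are hypothetical. The point is that the referenced cell is known only after `rows` and `cols` have been evaluated, so it cannot be read off the formula text in advance.

```javascript
// Converts column letters ("A", "Z", "AA") to a zero-based index.
function columnToIndex(letters) {
  let index = 0;
  for (const ch of letters) index = index * 26 + (ch.charCodeAt(0) - 64);
  return index - 1;
}

// Converts a zero-based column index back to letters.
function indexToColumn(index) {
  let letters = '';
  for (let n = index + 1; n > 0; n = Math.floor((n - 1) / 26)) {
    letters = String.fromCharCode(65 + ((n - 1) % 26)) + letters;
  }
  return letters;
}

// Resolves OFFSET(reference, rows, cols) to the address of the target cell.
function offsetAddress(reference, rows, cols) {
  const match = /^([A-Z]+)(\d+)$/.exec(reference);
  if (!match) throw new Error(`Bad reference: ${reference}`);
  const col = columnToIndex(match[1]) + cols;
  const row = Number(match[2]) + rows;
  // Excel reports #REF! when the target falls outside the sheet.
  if (col < 0 || row < 1) throw new Error('#REF!');
  return indexToColumn(col) + row;
}
```

For the slide's example, `offsetAddress('D3', 3, -2)` resolves to `'B6'`; with `RAND()` as an argument the result changes on every recalculation.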
  50. 50. Solution: Alternative Runner implementation Build graphs dynamically and cache them for different OFFSET args
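The talk does not describe how the dynamic graphs are cached, so the following is only a guess at the general shape: an evaluation plan is built once per combination of resolved OFFSET targets and reused whenever the same combination appears again. `PlanCache`, `buildPlan` and the string key format are all hypothetical.

```javascript
// Caches one evaluation plan per combination of resolved OFFSET targets.
class PlanCache {
  constructor(buildPlan) {
    this.buildPlan = buildPlan; // assumed callback: (targets) => plan
    this.plans = new Map();
    this.builds = 0; // counts how often a plan had to be rebuilt
  }

  get(resolvedTargets) {
    const key = resolvedTargets.join('|');
    let plan = this.plans.get(key);
    if (plan === undefined) {
      plan = this.buildPlan(resolvedTargets);
      this.plans.set(key, plan);
      this.builds++;
    }
    return plan;
  }
}

// Example: A2 = A1 + OFFSET(D3, rows, cols); the plan lists cells in evaluation order.
const cache = new PlanCache((targets) => ['A1', ...targets, 'A2']);
const planForB6 = cache.get(['B6']);
const planForB6Again = cache.get(['B6']);
const planForC7 = cache.get(['C7']);
```

In this example the second lookup for `B6` returns the cached plan, and only the new target `C7` triggers another build.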
  51. 51. Demo of how engine works
  52. 52. Conclusion ● Dependency injection (and SOLID) everywhere ● Make everything modular ● You will need a lot of tests. There are tons of edge cases in Excel behavior ● Measure performance on real models ● You need some sort of automatic model tester ● Create convenient debug tools (you will spend a lot of time debugging) ● Understand how V8 works
  53. 53. Telegram: @JABASCRIPT
  54. 54. Viktor Turskyi viktor@webbylab.com @koorchik @koorchik https://webbylab.com
