Viktor Turskyi

Oct. 29, 2019•0 likes•599 views

Engineering

You have complex mathematical models (millions of cells, hundreds of thousands of formulas) in Excel, and you need to run them in the browser and on mobile without Excel. I will talk about how we created our own spreadsheet engine, compatible with MS Excel, which allows us to run any Excel model without Excel. I will talk about: * Architecture * Algorithms * JavaScript performance optimization.

- 1. Create own Excel with JavaScript Viktor Turskyi CEO at WebbyLab 2019
- 2. Viktor Turskyi ● CEO and principal architect at WebbyLab ● Open source developer ● More than 15 years of experience ● Delivered more than 60 projects of different scale ● Did projects for 5 companies from Fortune 500 list
- 3. Business task Business logic lives in Excel, and you need to implement it in your app and run it in the browser and on mobile. This business logic consists of complex mathematical models.
- 4. Ok. Send us the file.
- 5. Model details ● 2 mln cells ● 400k formulas ● 1 mln Excel functions ● 50 sheets ● Computation chains of 20-30k cells
- 7. Demo of original source file
- 8. Requirements ● High performance (<2s full recompute) ● Small file size (suitable for work in browser) ● Offline work in browser ● Work on server ● Offline work on tablets (iOS, Android)
- 10. Decision: write our own Excel in JavaScript
- 11. What we want?
- 13. Is JS performant enough for mathematical computations?
- 14. Performance testing (100k times, large math formula AST)
- 15. Any ideas how to do this?
- 16. It is like creating a compiler
- 17. Components
- 18. Extractor
- 19. How to read data from XLS file? ● Extract values ● Extract formulas ● Extract sheet names ● Extract cells/ranges names
- 20. We tried (nothing worked) ● Node.js libraries ● Ruby libraries ● Python libraries ● Perl libraries ● PHP libraries
- 21. What did work for us? Run Excel as an OLE object and communicate with it via VBA methods
- 22. Preprocessor
- 23. What next? Preprocess all data 1. Parse all raw data 2. Parse and normalize formulas 3. Parse and normalize references 4. Optimize size
- 26. FormulaParser: What to parse? 1. Operators priority 2. Infix/prefix operators 3. Constants 4. Functions 5. Cell references 6. Range references 7. Named ranges
- 27. Formula example: =IF($F$36 + $AF128 <= 101; SUMPRODUCT( ($S128:OFFSET($S128;$F$36-1;0)) * ($AG$55:OFFSET($AG$55;$F$36-1;0)) * ('Sheet25'!BY84:OFFSET('Sheet25'!BY84;$F$36-1;0) + 'Sheet25'!BY194:OFFSET('Sheet25'!BY194; $F$36-1; 0) ) ); SUMPRODUCT( ($S128:$S$155) * ($AG$55:OFFSET($AG$55;100-$AF128;0)) * ('Sheet25'!BY84:BY$111 + 'Sheet25'!BY194:BY$221) ) )
- 28. Mistake 1: trying to write our own parser from scratch. A hand-written parser is: 1. Complex 2. Time-consuming 3. Expensive
- 29. 90% of the work is the same as writing a parser for a programming language
- 30. Good solution - ANTLR 1. Parser generator based on Grammars (including JS) 2. Lexer and Parser 3. Emits AST (Abstract Syntax Tree) 4. The fastest and the most powerful http://www.antlr.org/
- 31. Formula examples Formula: '=1+2*3' JS AST: [ '+', 1, [ '*', 2, 3 ] ] Formula: '=A1+B1' JS AST: [ '+', [ '=', 0, 0, 0 ], [ '=', 0, 1, 0 ] ] Formula: '=SUM(B5:B100, 42)' JS AST: [ 'SUM', [ 'RANGE', 0, 1, 4, 1, 99 ], 42 ]
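To make the AST shape above concrete, here is a minimal sketch of a recursive evaluator for it. It is not the engine from the talk: the `evaluate` function, the `OPERATORS` table, and the `getCell` callback are hypothetical names, and the reading of a cell reference as `['=', sheet, col, row]` with zero-based indexes is inferred from the `A1`, `B1`, and `B5:B100` examples on the slide.

```javascript
// Hypothetical operator table: only the four arithmetic operators are covered.
const OPERATORS = new Map([
  ['+', (a, b) => a + b],
  ['-', (a, b) => a - b],
  ['*', (a, b) => a * b],
  ['/', (a, b) => a / b],
]);

// Evaluates an array-shaped AST node such as ['+', 1, ['*', 2, 3]].
// getCell(sheet, col, row) is supplied by the caller and returns a cell value.
function evaluate(node, getCell) {
  // Constants (numbers, strings) evaluate to themselves.
  if (!Array.isArray(node)) return node;

  const [op, ...args] = node;

  // Assumed layout of a cell reference: ['=', sheet, col, row], zero-based.
  if (op === '=') return getCell(...args);

  const fn = OPERATORS.get(op);
  if (!fn) throw new Error(`Unsupported node type: ${op}`);

  // Evaluate the operands first, then apply the operator.
  return fn(...args.map((arg) => evaluate(arg, getCell)));
}

// Usage with the two arithmetic formulas from the slide.
const cells = { '0:0:0': 10, '0:1:0': 32 }; // A1 = 10, B1 = 32 on sheet 0
const getCell = (sheet, col, row) => cells[`${sheet}:${col}:${row}`];

console.log(evaluate(['+', 1, ['*', 2, 3]], getCell)); // 7
console.log(evaluate(['+', ['=', 0, 0, 0], ['=', 0, 1, 0]], getCell)); // 42
```

Because operator precedence is already encoded in the nesting of the arrays, the evaluator needs no precedence logic of its own.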
- 32. Model Runner
- 34. Components ● LocalRunner (Engine) - works with the model, processes all cell dependencies ● Formula Evaluator - computes one formula ● Address Parser - parses addresses at runtime ● Functions - Excel function implementations
- 37. Implementation of Excel functions ● One function - one module. ● No side effects ● Use dependency injection ● Test test test test (Excel functions often do not work as documented) Call example: SQRT([ 9 ]) returns 3 SUM([2, [5, 6, 7, 9], 1 ]) returns 30
- 38. Mistake 2: passing ranges as arrays SUM([A1, B1:B4, C1]) returns 30 SUM([2, [5, 6, 7, 9], 1 ]) returns 30
- 39. Range abstraction is very important (avoid unnecessary data copying) SUM( [ [ 21, 22, 23, 31, 32, 33 ] ] ); SUM( [ new ArrayRange([21, 22, 23, 31, 32, 33]) ] ); SUM( [ new ModelRange(model, 'B2:C4') ] );
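One way to read the range abstraction above is that a function such as SUM iterates over its arguments instead of receiving copied arrays. The sketch below assumes that reading. `ArrayRange` and `ModelRange` are named on the slide, but their internals here (the iterator protocol, the simplified `A1:B2` address parsing, and a `model.getValue(col, row)` method) are assumptions made for illustration.

```javascript
// Converts column letters to a zero-based index: 'A' -> 0, 'B' -> 1, 'AA' -> 26.
function colToIndex(letters) {
  let n = 0;
  for (const ch of letters) n = n * 26 + (ch.charCodeAt(0) - 64);
  return n - 1;
}

// Wraps values that already exist as an array.
class ArrayRange {
  constructor(values) {
    this.values = values;
  }
  *[Symbol.iterator]() {
    yield* this.values;
  }
}

// Stores only the bounds and reads cells from the model on demand,
// so no intermediate array of values is ever created.
class ModelRange {
  constructor(model, address) {
    // Simplified parsing: no sheet names, no '$' markers (assumption).
    const match = /^([A-Z]+)(\d+):([A-Z]+)(\d+)$/.exec(address);
    if (!match) throw new Error(`Unsupported range address: ${address}`);
    this.model = model;
    this.colFrom = colToIndex(match[1]);
    this.rowFrom = Number(match[2]) - 1;
    this.colTo = colToIndex(match[3]);
    this.rowTo = Number(match[4]) - 1;
  }
  *[Symbol.iterator]() {
    for (let col = this.colFrom; col <= this.colTo; col++) {
      for (let row = this.rowFrom; row <= this.rowTo; row++) {
        yield this.model.getValue(col, row);
      }
    }
  }
}

// SUM accepts numbers and anything iterable (plain arrays or ranges).
function SUM(args) {
  let total = 0;
  for (const arg of args) {
    if (typeof arg === 'number') {
      total += arg;
    } else {
      for (const value of arg) {
        if (typeof value === 'number') total += value;
      }
    }
  }
  return total;
}

// Usage: a hypothetical model holding B2:C4 = 21, 22, 23, 31, 32, 33.
const stored = new Map([
  ['1:1', 21], ['1:2', 22], ['1:3', 23],
  ['2:1', 31], ['2:2', 32], ['2:3', 33],
]);
const model = { getValue: (col, row) => stored.get(`${col}:${row}`) };

console.log(SUM([2, new ArrayRange([5, 6, 7, 9]), 1])); // 30
console.log(SUM([new ModelRange(model, 'B2:C4')])); // 162
```

Since `SUM` depends only on iteration, the same function body serves plain arrays, `ArrayRange`, and `ModelRange` without knowing which one it received.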
- 40. 2+2
- 45. Implementation of ModelRunner A1=1 A2=A1+1 A3=A1+A2 Cell A1 influences A2 and A3 Cell A2 influences A3
- 46. What we want?
- 47. We can represent dependencies in form of directed acyclic graph (DAG) A1=1; A2=A1+2; A3=A1+A2; Now we can recompute dependent cells on changes
- 48. Mistake 3: relying on synthetic models too much Test model with a million cells - 2 seconds for recompute Real model with a million cells - 1 hour for recompute Reason: we recompute the same cells several times
- 49. You can sort your dependency graph with topological sort Each cell will be calculated only one time
- 50. We did it: it worked for test files but did not work on real models. Why?
- 51. Reason: Our graph is more than 10k nodes deep, so the recursive traversal caused a stack overflow (the JS call stack is limited to roughly 10k frames).
- 52. What to do Do not use recursion; traverse the graph manually with your own stack. Real model results: No toposort - 1 hour With toposort - 6 seconds
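The non-recursive traversal described above can be sketched as a depth-first topological sort that keeps its own stack of frames. This is a minimal illustration, not the code from the talk. The `topoSort` name and the `dependents` map (cell to the cells that depend on it) are assumptions, and the sketch assumes the graph has no cycles.

```javascript
// dependents: Map from a cell to the list of cells that read it.
// Returns cells in an order where every cell comes before its dependents.
function topoSort(dependents) {
  const visited = new Set();
  const postOrder = [];

  for (const root of dependents.keys()) {
    if (visited.has(root)) continue;
    visited.add(root);

    // Explicit stack instead of recursion: each frame remembers the node
    // and the index of the next child to visit.
    const stack = [{ node: root, next: 0 }];

    while (stack.length > 0) {
      const frame = stack[stack.length - 1];
      const children = dependents.get(frame.node) || [];

      if (frame.next < children.length) {
        const child = children[frame.next++];
        if (!visited.has(child)) {
          visited.add(child);
          stack.push({ node: child, next: 0 });
        }
      } else {
        // All children are done: the node is finished.
        postOrder.push(frame.node);
        stack.pop();
      }
    }
  }

  // Reverse post-order of a DAG is a valid topological order.
  return postOrder.reverse();
}

// Usage: A1=1; A2=A1+2; A3=A1+A2 from the slides.
const dependents = new Map([
  ['A1', ['A2', 'A3']],
  ['A2', ['A3']],
  ['A3', []],
]);
console.log(topoSort(dependents)); // [ 'A1', 'A2', 'A3' ]
```

Because the stack lives in an ordinary array on the heap, traversal depth is limited by available memory and not by the engine's call stack, so a chain far deeper than 10k cells does not overflow.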
- 53. Optimization Benchmark => tune => benchmark => tune => benchmark => tune => benchmark => tune etc Read a lot about v8 internals Benchmark => tune => benchmark => tune => benchmark => tune => benchmark => tune etc
- 54. We did it! What’s next?
- 55. OFFSET breaks everything OFFSET(reference, rows, cols, [height], [width]) =OFFSET(D3,3,-2,1,1) - displays the value in cell B6
- 56. OFFSET breaks everything =OFFSET(D3, 3, -2) - displays the value in cell B6 A2 = A1+OFFSET(D3, 3, -2). Does A2 depend on B6? Do we have any problem with it?
- 57. OFFSET breaks everything =OFFSET(D3, 3, -2) - displays the value in cell B6 A2 = A1+OFFSET(D3, RAND(), RAND()). Which cell does A2 depend on?
- 58. Solution: Alternative Runner implementation Build graphs dynamically and cache them for different OFFSET args
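The slide does not show how the alternative runner is implemented, so the following is only a sketch of the general idea under stated assumptions: resolve the OFFSET target once its arguments are known, then cache the resulting dependency list keyed by those arguments. All names (`resolveOffset`, `makeDependencyResolver`, `staticDeps`) are hypothetical, and addresses are simplified to a single sheet with no `$` markers.

```javascript
// 'A' -> 0, 'B' -> 1, 'AA' -> 26
function colToIndex(letters) {
  let n = 0;
  for (const ch of letters) n = n * 26 + (ch.charCodeAt(0) - 64);
  return n - 1;
}

// 0 -> 'A', 1 -> 'B', 26 -> 'AA'
function indexToCol(index) {
  let n = index + 1;
  let letters = '';
  while (n > 0) {
    const rem = (n - 1) % 26;
    letters = String.fromCharCode(65 + rem) + letters;
    n = Math.floor((n - 1) / 26);
  }
  return letters;
}

// Resolves OFFSET(ref, rows, cols) for a single cell to a concrete address.
function resolveOffset(ref, rows, cols) {
  const match = /^([A-Z]+)(\d+)$/.exec(ref);
  if (!match) throw new Error(`Unsupported reference: ${ref}`);
  const col = colToIndex(match[1]) + cols;
  const row = Number(match[2]) + rows;
  // Excel reports #REF! when the target falls outside the sheet.
  if (col < 0 || row < 1) throw new Error('#REF!');
  return `${indexToCol(col)}${row}`;
}

// Builds the dependency list of one formula lazily and caches it per
// (rows, cols) pair, so repeated recomputes with the same OFFSET
// arguments reuse the earlier result.
function makeDependencyResolver(staticDeps, offsetRef) {
  const cache = new Map();
  return function resolve(rows, cols) {
    const key = `${rows}:${cols}`;
    if (!cache.has(key)) {
      cache.set(key, [...staticDeps, resolveOffset(offsetRef, rows, cols)]);
    }
    return cache.get(key);
  };
}

// Usage: A2 = A1 + OFFSET(D3, rows, cols)
const depsOfA2 = makeDependencyResolver(['A1'], 'D3');
console.log(resolveOffset('D3', 3, -2)); // B6
console.log(depsOfA2(3, -2)); // [ 'A1', 'B6' ]
console.log(depsOfA2(3, -2) === depsOfA2(3, -2)); // true (served from the cache)
```

With volatile arguments such as `RAND()`, the key changes on every recompute, so a cache like this helps only when the OFFSET arguments repeat.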
- 59. Demo of how engine works
- 60. Conclusion ● Dependency injection (and SOLID) everywhere ● Make everything modular. ● You will need a lot of tests. There are tons of edge cases in Excel behavior. ● Measure performance on real models. ● You need to have some sort of automatic model tester. ● Create convenient debug tools (you will spend a lot of time debugging) ● Understand how V8 works