Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Create own Excel with
JavaScript
Viktor Turskyi
CEO at WebbyLab
2019
Viktor Turskyi
● CEO and principal architect at
WebbyLab
● Open source developer
● More than 15 years of experience
● Deli...
Business task
Business logic is in Excel and you need to code
it in your app and run in browser and mobile.
This business ...
Ok. Send us the file.
Model details
● 2 mln cells
● 400k formulas
● 1 mln Excel functions
● 50 sheets
● Computation chains of 20-30k of cells
Demo of original source file
Requirements
● High performance (<2s full recompute)
● Small file size (suitable for work in
browser)
● Offline work in br...
Decision: write own excel in
JavaScript
What we want?
Is JS performant enough for
mathematical computations?
Performance testing (100k times,
large math formula AST)
Any ideas how to do this?
It is like creating a compiler
Components
Extractor
How to read data from XLS file?
● Extract values
● Extract formulas
● Extract sheet names
● Extract cells/ranges names
● Nodejs libraries
● Ruby libraries
● Python libraries
● Perl libraries
● PHP libraries
We tried (everything did not work)
Run Excel as OLE Object
Communicate with Excel via VBA methods
What did work for us?
Preprocessor
What next? Preprocess all data
1. Parse all raw data
2. Parse and normalize formulas
3. Parse and normalize references
4. ...
FormulaParser: What to parse?
1. Operators priority
2. Infix/prefix operators
3. Constants
4. Functions
5. Cell references...
=IF($F$36 + $AF128 <= 101; SUMPRODUCT(
($S128:OFFSET($S128;$F$36-1;0)) *
($AG$55:OFFSET($AG$55;$F$36-1;0)) *
('Sheet25'!BY...
Mistake 1: Trying to write own parser
from scratch
Own parser
1. Complex
2. A lot of time
3. Expensive
90% of the work is the same as
writing a parser for programming
language
Good solution - ANTLR
1. Parser generator based on Grammars
(including JS)
2. Lexer and Parser
3. Emits AST (Abstract Synt...
Formulas examples
Formula: '=1+2*3'
JS AST: [ '+', 1, [ '*', 2, 3] ]
Formula: '=A1+B1'
JS AST: [‘+’, ['=', 0, 0, 0], ['=',...
Model Runner
Model Runner (simplified)
Components
● LocalRunner(Engine) - works with model,
processes all cells dependencies
● Formula Evaluator - computes one
f...
Real engine usage
Implementation of EXCEL functions
● One function - one module.
● No side effects
● Use dependency injection
● Test test te...
Mistake 2: passing ranges as arrays
SUM([A1, B1:B4, C1]) returns 30
SUM([2, [5, 6, 7, 9], 1 ]) returns 30
Range abstraction is very important
(avoid unnecessary data copying)
SUM( [ [ 21, 22, 23, 31, 32, 33 ] ] );
SUM( [ new Arr...
2+2
=A1:A10+B1:B10
(does not work)
=SUMPRODUCT(A1:A10+B1:B10; C1:C10)
Works
Implementation of ModelRunner
A1=1
A2=A1+1
A3=A1+A2
Cell А1 influences A2 and A3
Cell A2 influences А3
What we want?
We can represent dependencies in
form of directed acyclic graph (DAG)
A1=1;
A2=A1+2;
A3=A1+A2;
Now we can recompute
depend...
Mistake 3: reay on synthetic models
too much
Test model with million cells -
2 seconds for recompute
Real model with milli...
You can sort your dependency graph
with topological sort
Each cell will be calculated only one time
We did it: it worked for test files but did not
work on real models. Why?
Reason: Our graph is more than 10k nodes
deep. We got stack overflow (JS limits call
stack to 10k frames).
What to do
Do not use recursion, traverse graph manually
with own stack.
Real model results:
No toposort - 1 hour
With top...
Optimization
Benchmark => tune => benchmark => tune =>
benchmark => tune => benchmark => tune etc
Read a lot about v8 inte...
We did it! What’s next?
OFFSET breaks everything
OFFSET(reference, rows, cols, [height], [width])
=OFFSET(D3,3,-2,1,1) - displays the value in
cel...
OFFSET breaks everything
=OFFSET(D3, 3, -2) - displays the value in cell
B6
A2 = A1+OFFSET(D3, 3, -2).
Does A2 depend on B...
OFFSET breaks everything
=OFFSET(D3, 3, -2) - displays the value in cell
B6
A2 = A1+OFFSET(D3, RAND(), RAND()).
Which cell...
Solution: Alternative Runner
implementation
Build graphs
dynamically and cache
them for different
OFFSET args
Demo of how engine works
Conclusion
● Dependency injection (and SOLID) everywhere
● Make everything modular.
● You will need a lot of tests. There ...
Telegram: @JABASCRIPT
Viktor Turskyi
viktor@webbylab.com
@koorchik @koorchik
https://webbylab.com
How to create a high performance excel engine in java script
How to create a high performance excel engine in java script
How to create a high performance excel engine in java script
How to create a high performance excel engine in java script
How to create a high performance excel engine in java script
How to create a high performance excel engine in java script
How to create a high performance excel engine in java script
How to create a high performance excel engine in java script
Upcoming SlideShare
Loading in …5
×

How to create a high performance excel engine in java script

You have complex mathematical models (millions of cells, hundreds thousand of formulas) in Excel. And you need to run it browser and mobile without excel. I will talk how we created own spreadsheet engine compatible with MS Excel which allows us to run any Excel model without Excel. I will talk about:
* Architecture
* Algorithms
* JavaScript performance optimization.

  • Be the first to comment

  • Be the first to like this

How to create a high performance excel engine in java script

  1. 1. Create own Excel with JavaScript Viktor Turskyi CEO at WebbyLab 2019
  2. 2. Viktor Turskyi ● CEO and principal architect at WebbyLab ● Open source developer ● More than 15 years of experience ● Delivered more than 60 projects of different scale ● Did projects for 5 companies from Fortune 500 list
  3. 3. Business task Business logic is in Excel and you need to code it in your app and run in browser and mobile. This business logic is complex mathematical models.
  4. 4. Ok. Send us the file.
  5. 5. Model details ● 2 mln cells ● 400k formulas ● 1 mln Excel functions ● 50 sheets ● Computation chains of 20-30k of cells
  6. 6. Demo of original source file
  7. 7. Requirements ● High performance (<2s full recompute) ● Small file size (suitable for work in browser) ● Offline work in browser ● Work on server ● Offline work on tablets (iOS, Android)
  8. 8. Decision: write own excel in JavaScript
  9. 9. What we want?
  10. 10. Is JS performant enough for mathematical computations?
  11. 11. Performance testing (100k times, large math formula AST)
  12. 12. Any ideas how to do this?
  13. 13. It is like creating a compiler
  14. 14. Components
  15. 15. Extractor
  16. 16. How to read data from XLS file? ● Extract values ● Extract formulas ● Extract sheet names ● Extract cells/ranges names
  17. 17. ● Nodejs libraries ● Ruby libraries ● Python libraries ● Perl libraries ● PHP libraries We tried (everything did not work)
  18. 18. Run Excel as OLE Object Communicate with Excel via VBA methods What did work for us?
  19. 19. Preprocessor
  20. 20. What next? Preprocess all data 1. Parse all raw data 2. Parse and normalize formulas 3. Parse and normalize references 4. Optimize size
  21. 21. FormulaParser: What to parse? 1. Operators priority 2. Infix/prefix operators 3. Constants 4. Functions 5. Cell references 6. Range references 7. Named ranges
  22. 22. =IF($F$36 + $AF128 <= 101; SUMPRODUCT( ($S128:OFFSET($S128;$F$36-1;0)) * ($AG$55:OFFSET($AG$55;$F$36-1;0)) * ('Sheet25'!BY84:OFFSET('Sheet25'!BY84;$F$36-1;0) + 'Sheet25'!BY194:OFFSET('Sheet25'!BY194; $F$36-1; 0) ) ); SUMPRODUCT( ($S128:$S$155) * ($AG$55:OFFSET($AG$55;100-$AF128;0)) * ('Sheet25'!BY84:BY$111 + 'Sheet25'!BY194:BY$221) ) ) Formula example
  23. 23. Mistake 1: Trying to write own parser from scratch Own parser 1. Complex 2. A lot of time 3. Expensive
  24. 24. 90% of the work is the same as writing a parser for programming language
  25. 25. Good solution - ANTLR 1. Parser generator based on Grammars (including JS) 2. Lexer and Parser 3. Emits AST (Abstract Syntax Tree) 4. The fastest and the most powerful http://www.antlr.org/
  26. 26. Formulas examples Formula: '=1+2*3' JS AST: [ '+', 1, [ '*', 2, 3] ] Formula: '=A1+B1' JS AST: [‘+’, ['=', 0, 0, 0], ['=', 0, 1, 0] ] Formula: ‘=SUM(B5:B100, 42)' JS AST: [ 'SUM', [ 'RANGE', 0, 1, 4, 1, 99 ], 42 ]
  27. 27. Model Runner
  28. 28. Model Runner (simplified)
  29. 29. Components ● LocalRunner(Engine) - works with model, processes all cells dependencies ● Formula Evaluator - computes one formula ● Address Parser - parses address in runtime ● Functions - Excel functions implementation
  30. 30. Real engine usage
  31. 31. Implementation of EXCEL functions ● One function - one module. ● No side effects ● Use dependency injection ● Test test test test (excel functions often does not work as documented) Call example: SQRT([ 9 ]) returns 3 SUM([2, [5, 6, 7, 9], 1 ]) returns 30
  32. 32. Mistake 2: passing ranges as arrays SUM([A1, B1:B4, C1]) returns 30 SUM([2, [5, 6, 7, 9], 1 ]) returns 30
  33. 33. Range abstraction is very important (avoid unnecessary data copying) SUM( [ [ 21, 22, 23, 31, 32, 33 ] ] ); SUM( [ new ArrayRange([21, 22, 23, 31, 32, 33]) ] ); SUM( [ new ModelRange(model, ‘B2:C4’ ) ] );
  34. 34. 2+2
  35. 35. =A1:A10+B1:B10 (does not work)
  36. 36. =SUMPRODUCT(A1:A10+B1:B10; C1:C10) Works
  37. 37. Implementation of ModelRunner A1=1 A2=A1+1 A3=A1+A2 Cell А1 influences A2 and A3 Cell A2 influences А3
  38. 38. What we want?
  39. 39. We can represent dependencies in form of directed acyclic graph (DAG) A1=1; A2=A1+2; A3=A1+A2; Now we can recompute dependent cells on changes
  40. 40. Mistake 3: reay on synthetic models too much Test model with million cells - 2 seconds for recompute Real model with million cells - 1 hour for recompute Reason: we recompute the same cells several times
  41. 41. You can sort your dependency graph with topological sort Each cell will be calculated only one time
  42. 42. We did it: it worked for test files but did not work on real models. Why?
  43. 43. Reason: Our graph is more than 10k nodes deep. We got stack overflow (JS limits call stack to 10k frames).
  44. 44. What to do Do not use recursion, traverse graph manually with own stack. Real model results: No toposort - 1 hour With toposort - 6 seconds
  45. 45. Optimization Benchmark => tune => benchmark => tune => benchmark => tune => benchmark => tune etc Read a lot about v8 internals Benchmark => tune => benchmark => tune => benchmark => tune => benchmark => tune etc
  46. 46. We did it! What’s next?
  47. 47. OFFSET breaks everything OFFSET(reference, rows, cols, [height], [width]) =OFFSET(D3,3,-2,1,1) - displays the value in cell B6
  48. 48. OFFSET breaks everything =OFFSET(D3, 3, -2) - displays the value in cell B6 A2 = A1+OFFSET(D3, 3, -2). Does A2 depend on B6? Do we have any problem with it?
  49. 49. OFFSET breaks everything =OFFSET(D3, 3, -2) - displays the value in cell B6 A2 = A1+OFFSET(D3, RAND(), RAND()). Which cell does A2 depend on?
  50. 50. Solution: Alternative Runner implementation Build graphs dynamically and cache them for different OFFSET args
  51. 51. Demo of how engine works
  52. 52. Conclusion ● Dependency injection (and SOLID) everywhere ● Make everything modular. ● You will need a lot of tests. There are tons of edge cases in Excel behavior. ● Measure performance on real models. ● You need to have some sort of automatic model tester. ● Create convenient debug tools (you will spent a lot of time debugging) ● Understand how V8 works
  53. 53. Telegram: @JABASCRIPT
  54. 54. Viktor Turskyi viktor@webbylab.com @koorchik @koorchik https://webbylab.com

×