Accelerating OpenERP accounting: Precalculated period sums

Performance analysis of the current balance/debit/credit calculation of OpenERP (6.0) and alternative proposals based on precalculated period sums.

  1. Accelerating OpenERP accounting: Precalculated period sums
     Borja López Soilán, http://www.kami.es
  2. Index
     Current approach (sum of entries)
     ● Current approach explained.
     ● Performance analysis.
     Proposal: “Precalculated period sums”
     ● Alternative 1: Accumulated values using triggers (proposed by Ferdinand Gassauer, Chricar)
     ● Alternative 2: Period totals using the ORM (proposed by Borja L.S., @NeoPolus)
     ● Current approach vs. precalculated period sums
  3. Current approach: Sum of entries
     Currently, each time you read the credit/debit/balance of an account, OpenERP has to recalculate it from the account entries (move lines).
     The magic is done by the “_query_get()” method of account.move.line, which selects the lines to consider, and the “__compute()” method of account.account, which does the sums.
  4. Inside the current approach
     _query_get() filters: builds the “WHERE” part of the SQL query that selects all the account move lines involving a set of accounts.
     ● Allows complex filters, but they usually look like “include non-draft entries from these periods for these accounts”.
     __compute() sums: uses the filter to query for the sums of debit/credit/balance for the current account and its children.
     ● Does just one SQL query for all the accounts (nice!).
     ● Has to aggregate the children's values in Python.
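To make the two steps concrete, here is a minimal Python sketch of the same split, assuming move lines are plain (account_id, period_id, debit, credit) tuples and the chart of accounts is a dict mapping each account id to its child ids. It is a simplified stand-in, not OpenERP's actual __compute() code: the hypothetical query_sums() plays the role of the single GROUP BY query (one pass over the l lines), and the hypothetical compute_balance() plays the Python-side aggregation that visits all c children.

```python
from collections import defaultdict


def descendants(root, children):
    """The account plus all its recursive children: the ids that end up in IN (...)."""
    result, stack = set(), [root]
    while stack:
        account_id = stack.pop()
        result.add(account_id)
        stack.extend(children.get(account_id, []))
    return result


def query_sums(lines, account_ids, period_ids):
    """Stand-in for the single GROUP BY query: one pass over the move lines, O(l)."""
    sums = defaultdict(lambda: [0.0, 0.0])  # account_id -> [debit, credit]
    for account_id, period_id, debit, credit in lines:
        if account_id in account_ids and period_id in period_ids:
            sums[account_id][0] += debit
            sums[account_id][1] += credit
    return sums


def compute_balance(account_id, children, sums):
    """Stand-in for the Python-side aggregation: visits every descendant, O(c)."""
    debit, credit = sums.get(account_id, (0.0, 0.0))
    for child_id in children.get(account_id, []):
        child_debit, child_credit, _ = compute_balance(child_id, children, sums)
        debit += child_debit
        credit += child_credit
    return debit, credit, debit - credit


# Hypothetical chart branch 4 > 43 > 430 > 4300 > 430000, with lines only on the leaf.
children = {4: [43], 43: [430], 430: [4300], 4300: [430000]}
lines = [(430000, 1, 100.0, 0.0), (430000, 1, 0.0, 30.0), (430000, 2, 50.0, 0.0)]
sums = query_sums(lines, descendants(4, children), period_ids={1})
print(compute_balance(4, children, sums))  # (100.0, 30.0, 70.0)
```

Even in this toy version, reading account “4” walks the whole branch below it, which is the O(c) cost analysed in the following slides.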
  5. Sample query done by __compute
         SELECT l.account_id as id,
                COALESCE(SUM(l.debit), 0) as debit,
                COALESCE(SUM(l.credit), 0) as credit,
                COALESCE(SUM(l.debit), 0) - COALESCE(SUM(l.credit), 0) as balance
         FROM account_move_line l
         WHERE l.account_id IN (2, 3, 4, 5, 6, ..., 1648, 1649, 1650, 1651)
           AND l.state <> 'draft'
           AND l.period_id IN (SELECT id FROM account_period WHERE fiscalyear_id IN (1))
           AND l.move_id IN (SELECT id FROM account_move WHERE account_move.state = 'posted')
         GROUP BY l.account_id
     Account + children = a lot of ids in the IN (...) list!
  6. Sample query plan
         QUERY PLAN
         ---------------------------------------------------------------------------
         HashAggregate  (cost=57.83..57.85 rows=1 width=18)
           ->  Nested Loop Semi Join  (cost=45.00..57.82 rows=1 width=18)
                 Join Filter: (l.period_id = account_period.id)
                 ->  Nested Loop  (cost=45.00..57.52 rows=1 width=22)
                       ->  HashAggregate  (cost=45.00..45.01 rows=1 width=4)
                             ->  Seq Scan on account_move  (cost=0.00..45.00 rows=1 width=4)
                                   Filter: ((state)::text = 'posted'::text)
                       ->  Index Scan using account_move_line_move_id_index on account_move_line l  (cost=0.00..12.49 rows=1 width=26)
                             Index Cond: (l.move_id = account_move.id)
                             Filter: (((l.state)::text <> 'draft'::text) AND (l.account_id = ANY ('{2,3,4,5, ...,1649,1650,1651}'::integer[])))
                 ->  Index Scan using account_period_fiscalyear_id_index on account_period  (cost=0.00..0.29 rows=1 width=4)
                       Index Cond: (account_period.fiscalyear_id = 1)
     Ugh! A sequential scan (on account_move) on a table with (potentially) lots of records... :(
  7. Performance analysis: Current approach big O (1/2)
     “Selects all the account move lines”: the query complexity depends on l, the number of move lines for that account and its (recursive) children: O(query) = O(f(l)).
     “Has to aggregate the children values”: the complexity depends on c, the number of children: O(aggregate) = O(g(c)).
  8. Current approach big O (2/2)
     O(__compute) = O(query) + O(aggregate) = O(f(l)) + O(g(c))
     What kind of functions are f and g?
     Let's do some empirical testing (more fun than maths, isn't it?)...
  9. Let's test this chart... (1/2)
     The official Spanish chart of accounts, when empty:
     ● Has about 1600 accounts.
     ● Has 5 levels.
     (To test this chart of accounts, install the l10n_es module.)
  10. Let's test this chart... (2/2)
      How many accounts are below each level?
      Level  Account code           Number of children (recursive)
      5      430000 (leaf account)       0
      4      4300                        1
      3      430                         6
      2      43                         43
      1      4                         192
      0      0 (root account)         1678
      To get the balance of account “4” we need to sum the balances of 192 accounts!
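The “number of children (recursive)” column is simply the size of the subtree below each account. A minimal Python sketch of that count, assuming (hypothetically) that the parent of a code is its longest proper prefix that is also an account code, as the 4 > 43 > 430 > 4300 > 430000 example suggests; the codes below are a tiny made-up branch, not the real l10n_es data, and build_children()/count_descendants() are illustrative names:

```python
def build_children(codes):
    """Hypothetical rule: the parent of a code is its longest proper prefix that is also a code."""
    code_set = set(codes)
    children = {code: [] for code in codes}
    for code in codes:
        for cut in range(len(code) - 1, 0, -1):
            if code[:cut] in code_set:
                children[code[:cut]].append(code)
                break
    return children


def count_descendants(code, children):
    """Recursive number of accounts below `code`, i.e. c for that account."""
    return sum(1 + count_descendants(child, children) for child in children[code])


# A tiny, made-up branch in the style of the Spanish chart.
codes = ["4", "43", "430", "4300", "430000", "430001", "431", "4310"]
children = build_children(codes)
for code in codes:
    print(code, count_descendants(code, children))
```

Every extra level multiplies the accounts below the upper levels, which is why c for “4” or for the root is so much larger than for a leaf.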
  11. OK, it looks like the number of children c has a lot of influence, and the number of move lines l has little or no influence: g(c) >> f(l).
      Let's split them...
  12. Now it is clear that g(c) is linear! (Note: the number of children grows exponentially.)
      O(g(c)) = O(c)
  13. So, the influence of l was small, but linear too!
      O(f(l)) = O(l)
  14. Big O: Conclusion
      O(__compute) = O(l) + O(c)
      c has an unexpectedly big influence on the results
      => Bad performance on complex charts of accounts!
      c does not grow with time, but l does...
      => OpenERP accounting becomes slower and slower over time! (though it's not as bad as expected)
  15. Proposal: Precalculated sums
      OpenERP recalculates the debit/credit/balance from the move lines each time.
      Most accounting programs store the totals per period (or the cumulative values) for each account. Why?
      ● Reading the debit/credit/balance becomes much faster.
      ● ...and reading is much more data-intensive than writing:
        – Accounting reports read lots of accounts, lots of times.
        – Accountants only update a few accounts at a time.
  16. Is it really faster?
      Precalculated sums per period mean:
      ● An O(p) query (get the debit/credit/balance of each period for that account) instead of an O(l) query, with p being the number of periods, p << l. Using opening entries, or cumulative totals, p becomes constant => O(1).
      ● If aggregated sums (including the children's values) are also precalculated, we don't have to do one O(c) aggregation per read.
      It's O(1) for reading!! (but creating/editing entries is a bit slower)
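Why cumulative totals make reads constant-time can be shown with prefix sums. A minimal Python sketch, assuming the per-period net movements (debit minus credit, children already included) of one account are stored in a list with the opening period at index 0; the function names and numbers are illustrative only:

```python
from itertools import accumulate


def cumulative_totals(movements):
    """cumulative[k] = balance at the end of period k (what would be precalculated)."""
    return list(accumulate(movements))


def balance_at(cumulative, k):
    """Balance at the end of period k: a single lookup, O(1)."""
    return cumulative[k]


def moved_between(cumulative, i, j):
    """Net movement in periods i..j (inclusive): two lookups, O(1), whatever l is."""
    return cumulative[j] - (cumulative[i - 1] if i > 0 else 0.0)


# Hypothetical per-period net movements (debit - credit), children already included.
movements = [400.0, 250.0, 25.0, -400.0, 225.0]
cumulative = cumulative_totals(movements)
print(balance_at(cumulative, 4))        # 500.0
print(moved_between(cumulative, 1, 2))  # 275.0
```

With plain per-period totals the same answer is sum(movements[i:j + 1]), which costs O(p); storing the cumulative values turns it into two lookups, at the price of more work on every write.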
  17. Alternative 1: Accumulated values using triggers (I)
      Proposed by Ferdinand Gassauer.
      How does it work?
      ● New object to store the accumulated debit/credit/balance per account and period (let's call it account.period.sum).
        Period                      Opening  1st        2nd  3rd   4th
        Move line values in period  400      +200, +50  +25  -400  +25, +200
        Value in table              400      650        675  275   500
      ● Triggers on Postgres (PL/pgSQL) update the account_period_sum table each time an account move line is created/updated/deleted.
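What such a trigger has to do can be simulated in a few lines. This is a hypothetical Python stand-in, not the PL/pgSQL prototype: it keeps one accumulated value per period for a single account and reproduces the “Value in table” row above. The key point is that a change in one period shifts every later accumulated value.

```python
class AccumulatedPeriodSums:
    """Python simulation of what the trigger has to do on account_period_sum
    (a hypothetical stand-in, not the PL/pgSQL prototype)."""

    def __init__(self, n_periods):
        # values[k] = accumulated balance at the end of period k (index 0 = opening).
        self.values = [0.0] * n_periods

    def _shift(self, period, delta):
        # A change in one period also changes every later accumulated value: O(p) writes.
        for k in range(period, len(self.values)):
            self.values[k] += delta

    def on_insert(self, period, amount):
        self._shift(period, amount)

    def on_delete(self, period, amount):
        self._shift(period, -amount)

    def on_update(self, old_period, old_amount, new_period, new_amount):
        self.on_delete(old_period, old_amount)
        self.on_insert(new_period, new_amount)


sums = AccumulatedPeriodSums(5)
for period, amount in [(0, 400), (1, 200), (1, 50), (2, 25), (3, -400), (4, 25), (4, 200)]:
    sums.on_insert(period, amount)
print(sums.values)  # [400.0, 650.0, 675.0, 275.0, 500.0]
```

Reads become a single lookup, while each line change touches up to p rows: the “creating/editing entries is a bit slower” trade-off.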
  18. Alternative 1: Accumulated values using triggers (II)
      How does it work? (cont.)
      ● The data is calculated by accumulating the values from previous periods. (Ferdinand's prototype requires a special naming of periods for this.)
      ● Creates SQL views based on the account_period_sum table.
      ● For reports that show data aggregated by period:
        – New reports can be created that either use the SQL views directly, or use the account.period.sum object.
      ● The account.account.__compute() method could be extended in the future to optimize queries (modified to make use of account_period_sum when possible).
  19. Alternative 1: Accumulated values using triggers (III)
      Good points
      ● Triggers guarantee that the data is always in sync (even if somebody writes directly to the database!).
      ● Triggers are fast.
      ● Prototype available and working! “used this method already in very big installations - some 100 accountants, some millions moves without any problems” (Ferdinand)
      Bad points
      ● Database-dependent triggers.
      ● Triggers are harder to maintain than Python code.
      ● Makes some assumptions on period names (as OpenERP currently does not flag opening periods apart from closing ones).
  20. Alternative 2: Period totals using the ORM (I)
      Proposed by Borja L.S. (@NeoPolus).
      How does it work?
      ● New object to store the debit/credit/balance sums per account and period (and state):
        Period                      Opening  1st        2nd  3rd   4th
        Move line values in period  400      +200, +50  +25  -400  +25, +200
        Value in table              400      250        25   -400  225
      ● Extends the account.move.line object to update the account.sum objects each time a line is created/updated/deleted.
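The bookkeeping behind this alternative can be sketched in plain Python. This is a hypothetical in-memory stand-in for the proposed account.sum object, not OpenERP ORM code: the on_create/on_write/on_unlink hooks represent what the extended account.move.line would call, and the state value "valid" is an assumption. Unlike the accumulated table, a line change only touches the row of its own period, and the example reproduces the “Value in table” row above.

```python
from collections import defaultdict


class AccountSum:
    """In-memory stand-in for the proposed account.sum object (hypothetical):
    one [debit, credit] row per (account, period, state)."""

    def __init__(self):
        self.rows = defaultdict(lambda: [0.0, 0.0])

    def _apply(self, line, sign):
        row = self.rows[(line["account_id"], line["period_id"], line["state"])]
        row[0] += sign * line["debit"]
        row[1] += sign * line["credit"]

    # Hooks the extended account.move.line would call on create/write/unlink.
    def on_create(self, line):
        self._apply(line, +1)

    def on_unlink(self, line):
        self._apply(line, -1)

    def on_write(self, old_line, new_line):
        self._apply(old_line, -1)
        self._apply(new_line, +1)

    def balance(self, account_id, period_ids, state="valid"):
        """Read path: one row per period, O(p)."""
        total = 0.0
        for period_id in period_ids:
            debit, credit = self.rows.get((account_id, period_id, state), (0.0, 0.0))
            total += debit - credit
        return total


def line(period_id, debit=0.0, credit=0.0, account_id=430000, state="valid"):
    """Build a hypothetical move line record as a plain dict."""
    return {"account_id": account_id, "period_id": period_id,
            "debit": debit, "credit": credit, "state": state}


sums = AccountSum()
for move_line in [line(0, 400), line(1, 200), line(1, 50), line(2, 25),
                  line(3, credit=400), line(4, 25), line(4, 200)]:
    sums.on_create(move_line)
print([sums.balance(430000, [p]) for p in range(5)])  # [400.0, 250.0, 25.0, -400.0, 225.0]
```

Keying the rows by state as well lets draft and non-draft amounts be summed separately, matching the “(and state)” in the slide.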
  21. Alternative 2: Period totals using the ORM (II)
      How does it work? (cont.)
      ● Extends the account.account.__compute() method to optimize queries:
        – If the query filters only by period/fiscal year/state, the data is retrieved from the account.sum objects.
        – If the query filters by dates, and one or more fiscal periods are fully included in that range, the data is retrieved from the account.sum objects (for the range covered by the periods) plus the account.move.lines (for the range not covered by periods).
        – Filtering by any other field (for example partner_id) causes a fallback to the normal __compute method.
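The date-range case is the least obvious one. A minimal Python sketch, assuming periods are consecutive, non-overlapping (period_id, date_start, date_stop) tuples with inclusive bounds; it ignores special opening/closing periods. split_range() and compute_balance() are hypothetical names, and period_balance / lines_balance stand in for reads from account.sum and account.move.line respectively.

```python
from datetime import date, timedelta


def split_range(date_from, date_to, periods):
    """Split [date_from, date_to] into periods fully inside it (read from the sums)
    and leftover date ranges (read from the move lines).

    Assumes `periods` are (period_id, date_start, date_stop) tuples that are
    consecutive and non-overlapping, with inclusive bounds.
    """
    covered = [p for p in periods if date_from <= p[1] and p[2] <= date_to]
    if not covered:
        return [], [(date_from, date_to)]
    first_start = min(p[1] for p in covered)
    last_stop = max(p[2] for p in covered)
    leftovers = []
    if date_from < first_start:
        leftovers.append((date_from, first_start - timedelta(days=1)))
    if last_stop < date_to:
        leftovers.append((last_stop + timedelta(days=1), date_to))
    return [p[0] for p in covered], leftovers


def compute_balance(date_from, date_to, periods, period_balance, lines_balance,
                    other_filters=None):
    """Hypothetical dispatcher for the date-range and fallback cases."""
    if other_filters:  # e.g. partner_id: the sums cannot answer this, fall back
        return lines_balance(date_from, date_to)
    covered, leftovers = split_range(date_from, date_to, periods)
    return (sum(period_balance(pid) for pid in covered)
            + sum(lines_balance(start, stop) for start, stop in leftovers))


periods = [(1, date(2011, 1, 1), date(2011, 1, 31)),
           (2, date(2011, 2, 1), date(2011, 2, 28)),
           (3, date(2011, 3, 1), date(2011, 3, 31))]
# Period 2 comes from the sums; Jan 15-31 and Mar 1-10 come from the move lines.
print(split_range(date(2011, 1, 15), date(2011, 3, 10), periods))
```

Only the partial periods at both ends still require scanning move lines, so a long date range costs roughly one row per covered period plus two small line queries.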
  22. Alternative 2: Period totals using the ORM (III)
      Good points
      ● Database-independent.
      ● Optimizes all the accounting.
      ● Flexible.
      ● No PL/pgSQL triggers required, just Python => easier to maintain.
      Bad points
      ● Does not guarantee that the sums are in sync with the move lines (but nobody should directly alter the database in the first place...).
      ● Python is slower than using triggers.
      ● No prototype yet! :) (But take a look at Tryton's stock quantity computation.)
  23. Current approach vs. Period sums
      Current approach
        Pros
        ● No redundant data.
        ● Simpler queries.
        Cons
        ● Slow.
          – Reports and dashboard charts/tables are performance hungry.
        ● Becomes even slower with time.
      Precalculated sums
        Pros
        ● Fast, always.
        ● Drill-down navigation.
        Cons
        ● Need to keep the sums in sync with the move lines.
        ● More complex (__compute) or specific queries to make use of the precalculated sums.
  24. Precalculated sums: Drill-down navigation (Chricar prototype) (1/3)
  25. Precalculated sums: Drill-down navigation (Chricar prototype) (2/3)
  26. Precalculated sums: Drill-down navigation (Chricar prototype) (3/3)
  27. And one last remark...
      ...all this is applicable to the stock quantities computation too!
