Successfully reported this slideshow.

Accelerating OpenERP accounting: Precalculated period sums

2

Share

Loading in …3
×
1 of 27
1 of 27

Accelerating OpenERP accounting: Precalculated period sums

2

Share

Download to read offline

Description

Performance analysis of the current balance/debit/credit calculation of OpenERP (6.0) and alternative proposals based on precalculated period sums.

Transcript

  1. 1. Accelerating OpenERP accounting: Precalculated period sums Borja López Soilán http://www.kami.es
  2. 2. Index Current approach (sum of entries) ● Current approach explained. ● Performance analysis. Proposal: “Precalculated period sums” ● Alternative 1: Accumulated values using triggers – Proposed by Ferdinand Gassauer (Chricar) ● Alternative 2: Period totals using the ORM – Proposed by Borja L.S. (@NeoPolus) ● Current approach vs Precalculated period sums
  3. 3. Current approach: Sum of entries Currently each time you read the credit/debit/balance of one account OpenERP has to recalculate it from the account entries (move lines). The magic is done by the “_query_get()” method of account.move.line, that selects the lines to consider, and the “__compute()” method of account.account that does the sums.
  4. 4. Inside the current approach _query_get() filters: builds the “WHERE” part of the SQL query that selects all the account move lines involving a set of accounts. ● Allows to do complex filters, but usually look like “include non-draft entries from these periods for these accounts”. __compute() sums: uses the filter to query for the sums of debit/credit/balance for the current account and its children. ● Does just one SQL query for all the accounts. (nice!) ● Has to aggregate the children values on python.
  5. 5. Sample query done by __compute SELECT l.account_id as id, COALESCE(SUM(l.debit), 0) as debit, COALESCE(SUM(l.credit), 0) as credit, COALESCE(SUM(l.debit),0) - COALESCE(SUM(l.credit), 0) as balance FROM account_move_line l Account + children = lot of ids! WHERE l.account_id IN (2, 3, 4, 5, 6, ..., 1648, 1649, 1650, 1651) AND l.state <> 'draft' AND l.period_id IN (SELECT id FROM account_period WHERE fiscalyear_id IN (1)) AND l.move_id IN (SELECT id FROM account_move WHERE account_move.state = 'posted') GROUP BY l.account_id
  6. 6. Sample query plan QUERY PLAN --------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=57.83..57.85 rows=1 width=18) -> Nested Loop Semi Join (cost=45.00..57.82 rows=1 width=18) Ugh!, sequential scan Join Filter: (l.period_id = account_period.id) on a table with (potentially) -> Nested Loop (cost=45.00..57.52 rows=1 width=22) lots of records... :( -> HashAggregate (cost=45.00..45.01 rows=1 width=4) -> Seq Scan on account_move (cost=0.00..45.00 rows=1 width=4) Filter: ((state)::text = 'posted'::text) -> Index Scan using account_move_line_move_id_index on account_move_line l (cost=0.00..12.49 rows=1 width=26) Index Cond: (l.move_id = account_move.id) Filter: (((l.state)::text <> 'draft'::text) AND (l.account_id = ANY ('{2,3,4,5, ..., 1649,1650,1651}'::integer[]))) -> Index Scan using account_period_fiscalyear_id_index on account_period (cost=0.00..0.29 rows=1 width=4) Index Cond: (account_period.fiscalyear_id = 1)
  7. 7. Performance Analysis Current approach big O 1/2 “Selects all the account move lines” The query complexity depends on l, the number of move lines for that account and (recursive) children: O(query) = O(f(l)) “Has to aggregate the children values” The complexity depends on c, the number of children. O(aggregate) = O(g(c))
  8. 8. Current approach big O 2/2 O(__compute) = O(query) + O(aggregate) O(__compute) = O(f(l)) + O(g(c)) What kind of functions are f and g? Let's do some empiric testing (funnier than maths, isn't it?)...
  9. 9. Let's test this chart... 1/2 The official Spanish chart of accounts, when empty: Has about 1600 accounts. Has 5 levels. (to test this chart of accounts install the l10n_es module)
  10. 10. Let's test this chart... 2/2 How many accounts below each level? Account code Number of children (recursive) Level 5 – 430000 0 (leaf account) Level 4 - 4300 1 Level 3 - 430 6 Level 2 - 43 43 Level 1 - 4 192 Level 0 – 0 1678 (root account) To get the balance of account “4” we need to sum the balance of 192 accounts!
  11. 11. Ok, looks like the number of children c has a lot of influence, and the number of moves l has little or zero influence, g(c) >> f(l) Lets split them...
  12. 12. Now it is clear that g(c) is linear! (note: the nº of children grows exponentially) O(g(c)) = O(c)
  13. 13. So, the influence was little, but linear too! O(f(l)) = O(l)
  14. 14. Big O - Conclusion O(__compute) = O(l) + O(c) c has an unexpectedly big influence on the results => Bad performance on complex charts of accounts! c does not grow with time, but l does... => OpenERP accounting becomes slower and slower with time! (though it's not that bad as expected)
  15. 15. Proposal: Precalculated sums OpenERP recalculates the debit/credit/balance from move lines each time. Most accounting programs store the totals per period (or the cumulative values) for each account. Why? ● Reading the debit/credit/balance becomes much faster. ● ...and reading is much more data intensive than writing: – Accounting reports read lots of times lots of accounts. – Accountants only update a few accounts at a time.
  16. 16. It's really faster? Precalculated sums per period means: ● O(p)query (get the debit/credit/balance of each period for that account) instead of O(l)query, with p being the number of periods, p << l. Using opening entries, or cumulative totals, p becomes constant => O(1) ● If aggregated sums (with children values) are also precalculated, we don't have to do one O(c)aggregation per read. It's O(1) for reading!! (but creating/editing entries is a bit slower)
  17. 17. Alternative 1: Accumulated values using triggers (I) Proposed by Ferdinand Gassauer. How does it work? ● New object to store the accumulated debit/credit/balance per account and period (let's call it account.period.sum). Opening 1st 2nd 3rd 4th Move line values 400 +200, +25 -400 +25, in period +50 +200 Value in table 400 650 675 275 500 ● Triggers on Postgres (PL/pgSQL) update the account_period_sum table each time an account move line is created/updated/deleted.
  18. 18. Alternative 1: Accumulated values using triggers (II) How does it work?(cont.) ● The data is calculated accumulating the values from previous periods. (Ferdinand prototype requires an special naming of periods for this). ● Creates SQL views based on the account account_period_sum table. ● For reports that show data aggregated by period: – New reports can be created that either directly use the SQL views, or use the account.period.sum object. ● The account.account.__compute() method could be extended to optimize queries (modified to make use of the account_period_sum when possible) in the future.
  19. 19. Alternative 1: Accumulated values using triggers (III) Good points Bad points Triggers guarantee that Database dependent the data is always in triggers. sync. (even if somebody writes directly to Triggers are harder to the database!) maintain than Python Triggers are fast. code. Prototype available and Makes some working! - “used this method assumptions on period already in very big names. installations - some 100 (as OpenERP currently does not flag opening periods apart accountants some millions from closing ones) moves without any problems” (Ferdinand)
  20. 20. Alternative 2: Period totals using the ORM (I) Proposed by Borja L.S. (@NeoPolus). How does it work? ● New object to store the debit/credit/balance sums per account and period (and state): Opening 1st 2nd 3rd 4th Move line values 400 +200, +25 -400 +25, in period +50 +200 Value in table 400 250 25 -400 225 ● Extends the account.move.line open object to update the account.sum objects each time a line is created/updated/deleted.
  21. 21. Alternative 2: Period totals using the ORM (II) How does it work?(cont.) ● Extends account.account.__compute() method to optimize queries: – If the query filters only by period/fiscal year/state, the data is retrieved from the account.sum object. – If the query filters by dates, and one ore more fiscal periods are fully included on that range, the data is retrieved from for the account.sum objects (for the range covered by the periods) plus the account.move.lines (the range not covered by periods). – Filtering by every other field (for example partner_id) causes a fallback into the normal __compute method.
  22. 22. Alternative 2: Period totals using the ORM (III) Good points Bad points Database Does not guarantee independent. that the sums are in Optimizes all the sync with the move accounting. lines. (but nobody should directly alter the database in first place...) Flexible. No PL/pgSQL triggers Python is slower than required, just Python using triggers. => Easier to maintain. No prototype yet! :) (But take a look at Tryton stock quantity computation)
  23. 23. Current approach VS Period sums Current approach Precalculated sums Pros Pros ● No redundant data. ● Fast, always. ● Simpler queries. ● Drill-down navigation. Cons Cons ● Slow. ● Need to keep sums in – Reports and sync with move lines. dashboard ● More complex charts/tables are (__compute) or performance hungry. specific queries to ● Becomes even slower make use of the with time. precalculated sums.
  24. 24. Precalculated sums – Drill down navigation (Chricar prototype) 1/3
  25. 25. Precalculated sums – Drill down navigation (Chricar prototype) 2/3
  26. 26. Precalculated sums – Drill down navigation (Chricar prototype) 3/3
  27. 27. And one last remark... ...all this is applicable to the stock quantities computation too!

Description

Performance analysis of the current balance/debit/credit calculation of OpenERP (6.0) and alternative proposals based on precalculated period sums.

Transcript

  1. 1. Accelerating OpenERP accounting: Precalculated period sums Borja López Soilán http://www.kami.es
  2. 2. Index Current approach (sum of entries) ● Current approach explained. ● Performance analysis. Proposal: “Precalculated period sums” ● Alternative 1: Accumulated values using triggers – Proposed by Ferdinand Gassauer (Chricar) ● Alternative 2: Period totals using the ORM – Proposed by Borja L.S. (@NeoPolus) ● Current approach vs Precalculated period sums
  3. 3. Current approach: Sum of entries Currently each time you read the credit/debit/balance of one account OpenERP has to recalculate it from the account entries (move lines). The magic is done by the “_query_get()” method of account.move.line, that selects the lines to consider, and the “__compute()” method of account.account that does the sums.
  4. 4. Inside the current approach _query_get() filters: builds the “WHERE” part of the SQL query that selects all the account move lines involving a set of accounts. ● Allows to do complex filters, but usually look like “include non-draft entries from these periods for these accounts”. __compute() sums: uses the filter to query for the sums of debit/credit/balance for the current account and its children. ● Does just one SQL query for all the accounts. (nice!) ● Has to aggregate the children values on python.
  5. 5. Sample query done by __compute SELECT l.account_id as id, COALESCE(SUM(l.debit), 0) as debit, COALESCE(SUM(l.credit), 0) as credit, COALESCE(SUM(l.debit),0) - COALESCE(SUM(l.credit), 0) as balance FROM account_move_line l Account + children = lot of ids! WHERE l.account_id IN (2, 3, 4, 5, 6, ..., 1648, 1649, 1650, 1651) AND l.state <> 'draft' AND l.period_id IN (SELECT id FROM account_period WHERE fiscalyear_id IN (1)) AND l.move_id IN (SELECT id FROM account_move WHERE account_move.state = 'posted') GROUP BY l.account_id
  6. 6. Sample query plan QUERY PLAN --------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=57.83..57.85 rows=1 width=18) -> Nested Loop Semi Join (cost=45.00..57.82 rows=1 width=18) Ugh!, sequential scan Join Filter: (l.period_id = account_period.id) on a table with (potentially) -> Nested Loop (cost=45.00..57.52 rows=1 width=22) lots of records... :( -> HashAggregate (cost=45.00..45.01 rows=1 width=4) -> Seq Scan on account_move (cost=0.00..45.00 rows=1 width=4) Filter: ((state)::text = 'posted'::text) -> Index Scan using account_move_line_move_id_index on account_move_line l (cost=0.00..12.49 rows=1 width=26) Index Cond: (l.move_id = account_move.id) Filter: (((l.state)::text <> 'draft'::text) AND (l.account_id = ANY ('{2,3,4,5, ..., 1649,1650,1651}'::integer[]))) -> Index Scan using account_period_fiscalyear_id_index on account_period (cost=0.00..0.29 rows=1 width=4) Index Cond: (account_period.fiscalyear_id = 1)
  7. 7. Performance Analysis Current approach big O 1/2 “Selects all the account move lines” The query complexity depends on l, the number of move lines for that account and (recursive) children: O(query) = O(f(l)) “Has to aggregate the children values” The complexity depends on c, the number of children. O(aggregate) = O(g(c))
  8. 8. Current approach big O 2/2 O(__compute) = O(query) + O(aggregate) O(__compute) = O(f(l)) + O(g(c)) What kind of functions are f and g? Let's do some empiric testing (funnier than maths, isn't it?)...
  9. 9. Let's test this chart... 1/2 The official Spanish chart of accounts, when empty: Has about 1600 accounts. Has 5 levels. (to test this chart of accounts install the l10n_es module)
  10. 10. Let's test this chart... 2/2 How many accounts below each level? Account code Number of children (recursive) Level 5 – 430000 0 (leaf account) Level 4 - 4300 1 Level 3 - 430 6 Level 2 - 43 43 Level 1 - 4 192 Level 0 – 0 1678 (root account) To get the balance of account “4” we need to sum the balance of 192 accounts!
  11. 11. Ok, looks like the number of children c has a lot of influence, and the number of moves l has little or zero influence, g(c) >> f(l) Lets split them...
  12. 12. Now it is clear that g(c) is linear! (note: the nº of children grows exponentially) O(g(c)) = O(c)
  13. 13. So, the influence was little, but linear too! O(f(l)) = O(l)
  14. 14. Big O - Conclusion O(__compute) = O(l) + O(c) c has an unexpectedly big influence on the results => Bad performance on complex charts of accounts! c does not grow with time, but l does... => OpenERP accounting becomes slower and slower with time! (though it's not that bad as expected)
  15. 15. Proposal: Precalculated sums OpenERP recalculates the debit/credit/balance from move lines each time. Most accounting programs store the totals per period (or the cumulative values) for each account. Why? ● Reading the debit/credit/balance becomes much faster. ● ...and reading is much more data intensive than writing: – Accounting reports read lots of times lots of accounts. – Accountants only update a few accounts at a time.
  16. 16. It's really faster? Precalculated sums per period means: ● O(p)query (get the debit/credit/balance of each period for that account) instead of O(l)query, with p being the number of periods, p << l. Using opening entries, or cumulative totals, p becomes constant => O(1) ● If aggregated sums (with children values) are also precalculated, we don't have to do one O(c)aggregation per read. It's O(1) for reading!! (but creating/editing entries is a bit slower)
  17. 17. Alternative 1: Accumulated values using triggers (I) Proposed by Ferdinand Gassauer. How does it work? ● New object to store the accumulated debit/credit/balance per account and period (let's call it account.period.sum). Opening 1st 2nd 3rd 4th Move line values 400 +200, +25 -400 +25, in period +50 +200 Value in table 400 650 675 275 500 ● Triggers on Postgres (PL/pgSQL) update the account_period_sum table each time an account move line is created/updated/deleted.
  18. 18. Alternative 1: Accumulated values using triggers (II) How does it work?(cont.) ● The data is calculated accumulating the values from previous periods. (Ferdinand prototype requires an special naming of periods for this). ● Creates SQL views based on the account account_period_sum table. ● For reports that show data aggregated by period: – New reports can be created that either directly use the SQL views, or use the account.period.sum object. ● The account.account.__compute() method could be extended to optimize queries (modified to make use of the account_period_sum when possible) in the future.
  19. 19. Alternative 1: Accumulated values using triggers (III) Good points Bad points Triggers guarantee that Database dependent the data is always in triggers. sync. (even if somebody writes directly to Triggers are harder to the database!) maintain than Python Triggers are fast. code. Prototype available and Makes some working! - “used this method assumptions on period already in very big names. installations - some 100 (as OpenERP currently does not flag opening periods apart accountants some millions from closing ones) moves without any problems” (Ferdinand)
  20. 20. Alternative 2: Period totals using the ORM (I) Proposed by Borja L.S. (@NeoPolus). How does it work? ● New object to store the debit/credit/balance sums per account and period (and state): Opening 1st 2nd 3rd 4th Move line values 400 +200, +25 -400 +25, in period +50 +200 Value in table 400 250 25 -400 225 ● Extends the account.move.line open object to update the account.sum objects each time a line is created/updated/deleted.
  21. 21. Alternative 2: Period totals using the ORM (II) How does it work?(cont.) ● Extends account.account.__compute() method to optimize queries: – If the query filters only by period/fiscal year/state, the data is retrieved from the account.sum object. – If the query filters by dates, and one ore more fiscal periods are fully included on that range, the data is retrieved from for the account.sum objects (for the range covered by the periods) plus the account.move.lines (the range not covered by periods). – Filtering by every other field (for example partner_id) causes a fallback into the normal __compute method.
  22. 22. Alternative 2: Period totals using the ORM (III) Good points Bad points Database Does not guarantee independent. that the sums are in Optimizes all the sync with the move accounting. lines. (but nobody should directly alter the database in first place...) Flexible. No PL/pgSQL triggers Python is slower than required, just Python using triggers. => Easier to maintain. No prototype yet! :) (But take a look at Tryton stock quantity computation)
  23. 23. Current approach VS Period sums Current approach Precalculated sums Pros Pros ● No redundant data. ● Fast, always. ● Simpler queries. ● Drill-down navigation. Cons Cons ● Slow. ● Need to keep sums in – Reports and sync with move lines. dashboard ● More complex charts/tables are (__compute) or performance hungry. specific queries to ● Becomes even slower make use of the with time. precalculated sums.
  24. 24. Precalculated sums – Drill down navigation (Chricar prototype) 1/3
  25. 25. Precalculated sums – Drill down navigation (Chricar prototype) 2/3
  26. 26. Precalculated sums – Drill down navigation (Chricar prototype) 3/3
  27. 27. And one last remark... ...all this is applicable to the stock quantities computation too!

More Related Content

Related Books

Free with a 30 day trial from Scribd

See all

×