Performance Improvements
Performance improvement in OpenOffice.org

  • Performance Improvements in Calc: Niklas Nebel, Sun Microsystems
  • Agenda
    • Introduction and context
    • Local optimizations
    • Handling sheets separately
    • DataPilot performance
    • Load & save outlook
  • Introduction and Context
  • Performance work in all of OOo
    • Performance project
      • Big improvements from 3.0 to 3.2
    • Start-up: Cold start of Writer 20% faster
    • Writer load performance
      • Comparable with MS Word 2007
    • Impress load performance
      • Comparable with MS PowerPoint 2007
    • Calc performance
      • Load and save: Up to twice as fast
      • Recalculation: Up to 20 times faster (extreme case)
  • Local Optimizations
  • API Usage When Saving Text Cells
    • The export filter uses the getFormula API method
    • getFormula adds a single quote character if the text can be parsed as a number
    • This requires an unnecessary parsing step
    • It can take up to 17% of CPU time
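A minimal C++ sketch of why the detour through getFormula costs time: every text cell is run through a number parser even though the filter only needs the string. The names looksLikeNumber, getFormulaLike and exportDirect are hypothetical, and the number check is a simplification based on std::strtod (Calc's real number recognition is locale-aware).

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Simplified, hypothetical number check: accepts whatever std::strtod
// consumes completely (Calc's real number recognition is locale-aware).
bool looksLikeNumber(const std::string& text) {
    if (text.empty()) return false;
    char* end = nullptr;
    std::strtod(text.c_str(), &end);
    return end == text.c_str() + text.size();
}

// Models a getFormula-style accessor: every text cell is parsed, and text
// that could be read as a number gets a leading single quote.
std::string getFormulaLike(const std::string& cellText) {
    return looksLikeNumber(cellText) ? "'" + cellText : cellText;
}

// What the export filter actually needs for a text cell: the string itself,
// with no parsing step at all.
std::string exportDirect(const std::string& cellText) {
    return cellText;
}
```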
  • Querying the Document Null Date
    • Internal representation: Days since the null date
    • File format: XML Schema dates (≈ ISO 8601)
    • Utility method for conversion
      • Queries the null date from the document
      • Several UNO calls
    • Querying once is enough
    • Querying for every cell: 10% of CPU time if only date cells are used
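Because the null date does not change while a document is saved, it can be fetched once and reused for every date cell. The sketch below is hypothetical: DateExporter and the query callback stand in for the utility method and its UNO calls, the day arithmetic uses Howard Hinnant's public-domain civil-date algorithms, and the usage assumes a null date of 1899-12-30.

```cpp
#include <cassert>
#include <cstdio>
#include <functional>
#include <string>

struct Date { int y; unsigned m; unsigned d; };

// Days since 1970-01-01 for a proleptic Gregorian date (Hinnant's algorithm).
long daysFromCivil(int y, unsigned m, unsigned d) {
    y -= m <= 2;
    const long era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + static_cast<long>(doe) - 719468;
}

// Inverse of daysFromCivil.
Date civilFromDays(long z) {
    z += 719468;
    const long era = (z >= 0 ? z : z - 146096) / 146097;
    const unsigned doe = static_cast<unsigned>(z - era * 146097);
    const unsigned yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
    const long y = static_cast<long>(yoe) + era * 400;
    const unsigned doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    const unsigned mp = (5 * doy + 2) / 153;
    const unsigned d = doy - (153 * mp + 2) / 5 + 1;
    const unsigned m = mp < 10 ? mp + 3 : mp - 9;
    return Date{static_cast<int>(y + (m <= 2)), m, d};
}

// Hypothetical exporter: the (expensive) null date query runs once, in the
// constructor, instead of once per date cell.
class DateExporter {
public:
    explicit DateExporter(const std::function<Date()>& queryNullDate) {
        const Date n = queryNullDate();
        nullDateDays_ = daysFromCivil(n.y, n.m, n.d);
    }
    // Converts "days since the null date" to an ISO 8601 date string.
    std::string toIso(long serial) const {
        const Date dt = civilFromDays(nullDateDays_ + serial);
        char buf[16];
        std::snprintf(buf, sizeof buf, "%04d-%02u-%02u", dt.y, dt.m, dt.d);
        return buf;
    }
private:
    long nullDateDays_;
};
```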
  • 21. Collecting Formatted Cell Ranges
    • Collect cell ranges with equal cell formats
      • For generating automatic styles
      • Keep a list of ranges for each set of formats
      • Try to join adjacent ranges
    • Formats are kept and iterated column-wise
      • Can use this information when trying to join
    • Prevents pathological cases
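One way to use the column-wise order: a run of equally formatted cells in column c can only join a range that ends in column c - 1 and covers the same rows. The hypothetical sketch below keeps, per format, an index keyed by row span, so each join attempt is a single lookup instead of a scan over all collected ranges. It only joins identical row spans, which is a simplification.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

struct Range { int colStart, colEnd, rowStart, rowEnd; };

// Hypothetical collector: one list of ranges per format id. Input arrives
// column by column as runs of equally formatted cells.
class FormatRangeCollector {
public:
    void addRun(int format, int col, int rowStart, int rowEnd) {
        PerFormat& pf = byFormat_[format];
        const std::pair<int, int> span{rowStart, rowEnd};
        auto it = pf.lastBySpan.find(span);
        // Join only if the candidate range ends in the previous column.
        if (it != pf.lastBySpan.end() && pf.ranges[it->second].colEnd == col - 1) {
            pf.ranges[it->second].colEnd = col;
            return;
        }
        pf.ranges.push_back(Range{col, col, rowStart, rowEnd});
        pf.lastBySpan[span] = pf.ranges.size() - 1;  // replaces stale entries
    }
    const std::vector<Range>& rangesFor(int format) { return byFormat_[format].ranges; }

private:
    struct PerFormat {
        std::vector<Range> ranges;
        std::map<std::pair<int, int>, std::size_t> lastBySpan;
    };
    std::map<int, PerFormat> byFormat_;
};
```

Stale index entries (ranges that no longer end in the previous column) are simply overwritten when a new range with the same row span starts.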
  • Formula Optimizations
    • String handling when formulas are parsed
      • Functions, references, names are case-insensitive
      • Operators, separators, parentheses are not
      • Reduce case conversion calls
        • 5% of CPU time saved
    • Sorting of values for MEDIAN etc.
      • Not necessary to completely sort the array
      • Use the STL algorithm std::nth_element instead
      • Faster calculation after loading
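For MEDIAN only the middle element (or the two middle elements) matter, so a partial ordering is enough. A sketch using std::nth_element, which runs in linear time on average compared with O(n log n) for a full sort; the function name median is illustrative and not Calc's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of a non-empty sample without fully sorting it.
// Takes the vector by value because nth_element reorders its input.
double median(std::vector<double> v) {
    assert(!v.empty());
    const std::size_t n = v.size();
    auto mid = v.begin() + n / 2;
    // Afterwards *mid is the element a full sort would put there, and
    // everything before mid is <= *mid.
    std::nth_element(v.begin(), mid, v.end());
    if (n % 2 == 1) return *mid;
    // Even count: the lower middle value is the largest element before mid.
    const double lower = *std::max_element(v.begin(), mid);
    return (lower + *mid) / 2.0;
}
```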
  • Formula Recalculation (1)
    • Detection of duplicate notifications
      • When a cell range is modified
      • Parameter range can contain several changed cells
      • Notify each range only once
    • Also useful for single-cell change
      • Parameter range can contain several changed results
      • Extreme case: Issue 95967 – 20x faster
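The duplicate detection can be modelled with a set of already-notified listeners. In this hypothetical sketch, Listeners maps a cell to the formula cells whose parameter ranges contain it; the naive broadcast sends one notification per changed cell and listener, while notifyOnce reaches each formula only once.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

using CellId = int;
using FormulaId = int;
// Hypothetical model: cell -> formula cells whose parameter range contains it.
using Listeners = std::map<CellId, std::vector<FormulaId>>;

// Naive broadcast: one notification per (changed cell, listener) pair.
std::vector<FormulaId> notifyNaive(const Listeners& listeners,
                                   const std::vector<CellId>& changed) {
    std::vector<FormulaId> out;
    for (CellId c : changed) {
        auto it = listeners.find(c);
        if (it != listeners.end())
            out.insert(out.end(), it->second.begin(), it->second.end());
    }
    return out;
}

// Deduplicated broadcast: each listener is notified at most once,
// in the order it is first reached.
std::vector<FormulaId> notifyOnce(const Listeners& listeners,
                                  const std::vector<CellId>& changed) {
    std::set<FormulaId> seen;
    std::vector<FormulaId> out;
    for (CellId c : changed) {
        auto it = listeners.find(c);
        if (it == listeners.end()) continue;
        for (FormulaId f : it->second)
            if (seen.insert(f).second) out.push_back(f);
    }
    return out;
}
```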
  • Handling Sheets Separately
  • Updating Row Heights
    • Optimal row height depends on local conditions
      • Especially fonts
    • Core structures need concrete height values
      • Positioning of shapes: Whole file
        • File format: relative to cell position
        • Internally: absolute positions
      • Screen output: Only single sheet
    • Update row heights
      • After loading: Visible sheet and sheets with shapes
      • Others as needed (display, printing, …)
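The deferred update can be pictured as a per-sheet validity flag. This is a sketch under assumptions: Sheet, Document, afterLoad and ensureRowHeights are hypothetical names, and the actual height calculation (fonts, cell contents) is reduced to a counter.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sheet model: row heights are only valid after an update.
struct Sheet {
    bool hasShapes = false;
    bool heightsValid = false;
    int updates = 0;  // counts the expensive recalculations, for illustration
};

class Document {
public:
    explicit Document(std::vector<Sheet> sheets) : sheets_(std::move(sheets)) {}

    // After loading: only the visible sheet and sheets with shapes need
    // concrete heights (shape positions are absolute internally).
    void afterLoad(std::size_t visible) {
        for (std::size_t i = 0; i < sheets_.size(); ++i)
            if (i == visible || sheets_[i].hasShapes) ensureRowHeights(i);
    }

    // Called on demand, e.g. before display or printing.
    void ensureRowHeights(std::size_t i) {
        Sheet& s = sheets_[i];
        if (s.heightsValid) return;
        ++s.updates;  // placeholder for measuring fonts and cell contents
        s.heightsValid = true;
    }

    const Sheet& sheet(std::size_t i) const { return sheets_[i]; }

private:
    std::vector<Sheet> sheets_;
};
```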
  • Updating Row Heights: Comments
    • Cell comments (formerly: notes) are shapes
    • Often used in large sheets
      • Usually not shown
    • Create shape only when comment is shown
      • Saves time if there are many hidden comments
      • Row heights can be updated later
  • Updating Row Heights: Results
    • No effect for a single sheet
    • Little improvement for text and numbers
    • 30% less CPU time with date cells on many sheets
    • Formula results don't have to be calculated
  • Partial Saving
    • Don't generate XML elements for the whole file
    • Copy unchanged parts on stream level
    • Could copy from temporary storage
      • Storage layer creates a copy of the unpacked file
    • Access the original file
      • Uncompress on the fly
    • Cost
      • File access: Read the compressed file
      • CPU: Uncompress
  • Experiment: Incremental Saving
    • Generate XML elements only for changed cells
      • Proof of concept: Only single-cell changes
    • No additional information kept after loading
    • Minimal parsing to find affected cells in stream
      • Takes extra time
      • Less if affected cells are near the start of the file
    • Results (compared to 3.0):
      • 40 – 70% improvement in CPU time
      • 30 – 50% improvement in total time
  • Sheet-Wise Saving
    • Handle sheets instead of individual cells
    • Fewer sheets than cells
      • Additional information can be kept in memory
    • Easier to find modified sheets than modified cells
    • One obvious limitation:
      • Only useful with several sheets
  • Finding Modified Sheets
    • Few code changes needed for most types of changes
      • Formula notification for cell contents
      • Formula calculation for changed results
      • Cell format changes
      • Column widths or row heights
      • Handled separately: Print ranges, etc.
    • Currently no handling of drawing layer changes
      • All sheets are then considered modified
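A hypothetical per-sheet flag set, hooked into the notifications listed on the slide, could look like the following; drawing layer changes conservatively mark every sheet. Class and method names, including clearAfterSave, are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tracker of which sheets must be regenerated on save.
class ModifiedSheets {
public:
    explicit ModifiedSheets(std::size_t count) : modified_(count, false) {}

    // Called from existing notifications: cell contents, changed formula
    // results, cell formats, column widths and row heights.
    void markSheet(std::size_t sheet) { modified_[sheet] = true; }

    // Drawing layer changes are not tracked per sheet, so be conservative.
    void markDrawingLayerChange() { modified_.assign(modified_.size(), true); }

    bool isModified(std::size_t sheet) const { return modified_[sheet]; }

    // After a successful save, all sheets match the file again.
    void clearAfterSave() { modified_.assign(modified_.size(), false); }

private:
    std::vector<bool> modified_;
};
```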
  • Automatic Styles
    • Direct formats are collected in automatic styles
      • Referenced by name
        • Generated name (“ce1” etc.)
      • One list for the whole document
      • Have to be created with the same names again
    • Implemented for cell contents (incl. comments)
      • Keep a mapping of names to cell/text positions
      • Collect styles for unchanged sheets first
      • Include in existing duplicate detection for other sheets
    • Sheets with shapes are always saved normally
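The naming constraint can be sketched as a style pool that first registers the names already used by unchanged sheets and only then hands out new names, skipping any that are taken. AutoStylePool and the string "format key" are hypothetical simplifications; only the "ce" naming pattern comes from the slides.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical automatic-style pool. A format key stands for one set of
// direct cell formats.
class AutoStylePool {
public:
    // Styles of unchanged sheets: their stream portions are copied verbatim,
    // so the names they reference must stay the same.
    void registerExisting(const std::string& key, const std::string& name) {
        byKey_.emplace(key, name);
        used_.insert(name);
    }

    // Modified sheets: reuse the name of an identical format set (duplicate
    // detection), otherwise generate the next free "ce<N>" name.
    std::string nameFor(const std::string& key) {
        auto it = byKey_.find(key);
        if (it != byKey_.end()) return it->second;
        std::string name;
        do {
            name = "ce" + std::to_string(++counter_);
        } while (used_.count(name));
        byKey_.emplace(key, name);
        used_.insert(name);
        return name;
    }

private:
    std::map<std::string, std::string> byKey_;
    std::set<std::string> used_;
    int counter_ = 0;
};
```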
  • Putting the Parts Together
    • When loading a file
      • Compatibility checks: Namespaces, encoding
      • Keep stream positions and style information
    • Steps to save a spreadsheet document
      • meta.xml, styles.xml, embedded objects: as usual
      • content.xml
        • Generate common content and modified sheets
        • For each sheet: Generate or copy stream portion
      • For “Save” and “Save As”, update stream positions
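A simplified model of the content.xml step: modified sheets are generated, unchanged sheets are copied as byte ranges from the original (already uncompressed) stream, and the new positions are recorded for the next save. assembleContent, StreamPos and the header/footer split are hypothetical; a real content stream also carries namespaces and automatic styles.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Byte range of one sheet's portion of the content stream.
struct StreamPos { std::size_t begin, end; };

// Hypothetical assembly of content.xml. `original` is the uncompressed stream
// of the loaded file; `generate` produces XML for a modified sheet.
std::string assembleContent(const std::string& original,
                            std::vector<StreamPos>& positions,  // updated in place
                            const std::vector<bool>& modified,
                            const std::string& header, const std::string& footer,
                            const std::function<std::string(std::size_t)>& generate) {
    std::string out = header;  // common content
    for (std::size_t i = 0; i < positions.size(); ++i) {
        const std::size_t start = out.size();
        if (modified[i])
            out += generate(i);
        else  // copy the unchanged portion on stream level
            out.append(original, positions[i].begin,
                       positions[i].end - positions[i].begin);
        positions[i] = StreamPos{start, out.size()};  // valid for the next save
    }
    out += footer;
    return out;
}
```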
  • Results
    • Influencing factors
      • Unchanged sheets
      • Type of sheet content
      • CPU time / file access
    • Example
      • Text, numbers, dates
      • 16 sheets
    • Single sheet modified
      • Twice as fast
      • On top of other changes
  • Formula Recalculation (2)
    • Sheet area is divided into “slots”
      • 16 columns by 128 rows
      • Range dependency registered in all affected slots
      • Needs attention when row limit is changed
    • Change: Use hash_set instead of set
      • Faster modification of dependency structures
      • Improves loading time
    • Change: Separate structures per sheet
      • Faster recalculation if several sheets are used
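A sketch of the slot lookup, assuming 0-based column and row indices. SheetSlots is a hypothetical name and std::unordered_set stands in for the pre-standard hash_set. The number of slots is derived from the sheet size, which is why changing the row limit affects this structure; with separate structures per sheet, a document would hold one such object per sheet.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Slot size from the slides: 16 columns by 128 rows.
constexpr int kSlotCols = 16;
constexpr int kSlotRows = 128;

// Hypothetical per-sheet broadcast area: one listener set per slot.
class SheetSlots {
public:
    SheetSlots(int maxCols, int maxRows)
        : slotsPerCol_((maxRows + kSlotRows - 1) / kSlotRows),
          slots_(static_cast<std::size_t>((maxCols + kSlotCols - 1) / kSlotCols) *
                 slotsPerCol_) {}

    // Register a range dependency in every slot the range touches.
    void addListener(int col1, int row1, int col2, int row2, int listener) {
        for (int sc = col1 / kSlotCols; sc <= col2 / kSlotCols; ++sc)
            for (int sr = row1 / kSlotRows; sr <= row2 / kSlotRows; ++sr)
                slots_[index(sc, sr)].insert(listener);
    }

    // Listeners that may depend on a changed cell; the caller still has to
    // check each one against its exact range.
    const std::unordered_set<int>& candidates(int col, int row) const {
        return slots_[index(col / kSlotCols, row / kSlotRows)];
    }

    std::size_t slotCount() const { return slots_.size(); }

private:
    std::size_t index(int slotCol, int slotRow) const {
        return static_cast<std::size_t>(slotCol) * slotsPerCol_ + slotRow;
    }
    int slotsPerCol_;
    std::vector<std::unordered_set<int>> slots_;
};
```

For an example sheet of 1024 columns by 65536 rows this gives 64 × 512 = 32,768 slots; raising the row limit multiplies that number accordingly.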
  • DataPilot Performance
  • DataPilot Memory Usage
    • Issue 55266: Several fields with many items
    • Fix under way from the IBM Symphony team
      • Don't allocate results for all child items
      • New cache table
    • CWS datapilotperf
      • Planned for OOo 3.3
      • Combination of large fields no longer a limitation
  • Load & Save Outlook
  • DOM Usage
    • Prototype by Christian Lippka for Impress
      • Use fast SAX to fill a compact DOM representation
      • Import from DOM, possibly parallel to parsing
    • Results for Impress
      • Only 2% improvement for a typical presentation
      • Filling the DOM tree uses 2% of CPU time
      • Not worth the effort
    • Calc may be different
      • Larger number of XML elements
      • But: Memory usage is twice the XML stream size
  • Further Separation of Sheets
    • Load only the visible sheet
      • Load other sheets as needed, or in the background
      • Parse XML fragment from stream, or use DOM
      • Formulas, charts may depend on changed cells
        • Dependencies must be known before saving
    • Parse formulas only as needed
      • Per sheet or individually
      • Already a separate step (but for all formulas)
    • Handle several sheets in parallel
      • More fine-grained locking needed
  • Q & A