Performance evaluation of fast integer compression techniques over tables


Compression is used in database management systems to improve performance by saving memory and storage. Compression and decompression require substantial processing by the Central Processing Unit (CPU). Queries in large databases such as Online Analytical Processing (OLAP) databases involve processing huge amounts of data. Database compression not only reduces disk space requirements, but also increases the effective Input/Output (I/O) bandwidth, since more data is transferred in compressed form to cache for query processing. As a result, database compression transforms I/O-intensive database operations into more CPU-intensive operations. We are most likely to benefit from compression if encoding and decoding speeds significantly exceed I/O bandwidth. Hence, we seek compression schemes that can be used in databases with good compression ratios and high speed. We examined the performance of several compression schemes such as Variable-Byte, Binary Packing/Frame Of Reference (FOR), Simple9 and Simple16, which have reasonable compression ratios with fair decompression speed over sequences of integers. As variations on Binary Packing, we also studied patched schemes such as NewPFD, OptPFD and FastPFOR: they have good compression ratios and decompression speed, though they need more computational time during compression than Binary Packing. In our study, we aim to quantify the trade-offs of fast integer compression schemes with respect to compression ratio and speed of compression and decompression. We are able to decompress data at a rate of around 1.5 billion integers per second (in Java) while sometimes beating Shannon’s entropy. In our tests, Binary Packing is significantly faster than all other alternatives. Among the patched schemes we tested, the recently introduced FastPFOR is the most competitive. Hence, we found that it is worth using a patching scheme because we get both a good compression ratio and a high decompression speed.
However, the higher compression and decompression speed of Binary Packing makes it a better choice when speed is more important than compression ratio. We also assessed the effects that row ordering and sorting have on compression performance. We discovered that sorting can significantly improve the performance of compression. We obtained around a 15% gain in decompression speed and a 12% gain in compression ratio by sorting compared to random shuffling. Additionally, we found that sorting on the highest cardinality column was more effective than sorting on lower cardinality columns.

Transcript of "Performance evaluation of fast integer compression techniques over tables"

  1. Performance evaluation of fast integer compression techniques over tables. Ikhtear Md. Sharif Bhuyan. Supervisors: Hazel Webb, Daniel Lemire, Owen Kaser. ©Ikhtear Md. Sharif Bhuyan
  2. Overview
     • Introduction
     • Compression in databases and issues
     • Objectives
     • Experimental Results
     • Conclusion
     • Future Work
     Slide footer: Performance evaluation of fast integer compression techniques over tables, 12/4/2013
  3. Query processing: Processor, Cache, RAM, Disk (diagram)
  4. Compression in databases
     • Reduce storage
     • Query processing speed
     • Save I/O bandwidth
     • Improve performance for I/O-bound operations
  5. Selecting Compression in databases
     • Lossless
     • Trade-off between compression ratio and speed of compression and decompression
  6. Objective
     • Examining and comparing the performance of patched schemes with other methods with respect to compression ratio, decompression speed and compression speed.
     • Assessing the effect of different factors such as row order.
  7. Column-oriented database system
     Example table (ID, Name): (104543, Peter), (203456, Sam), (234321, Maria)
     • Row-oriented database: 104543 Peter 203456 Sam 234321 Maria
     • Column-oriented database: 104543 203456 234321 Peter Sam Maria
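The two layouts on slide 7 can be illustrated with a minimal Java sketch. The class and method names (`ColumnLayoutSketch`, `Row`, `idColumn`, `nameColumn`) are hypothetical and not taken from the deck; the point is only that a column store keeps each attribute as its own contiguous sequence, which is what the integer codecs in the following slides operate on.

```java
import java.util.Arrays;

final class ColumnLayoutSketch {
    /** One row of the example table (ID, Name); a row store keeps these together. */
    static final class Row {
        final int id;
        final String name;
        Row(int id, String name) { this.id = id; this.name = name; }
    }

    /** Extracts the ID attribute as one contiguous int column. */
    static int[] idColumn(Row[] rows) {
        int[] ids = new int[rows.length];
        for (int i = 0; i < rows.length; i++) {
            ids[i] = rows[i].id;
        }
        return ids;
    }

    /** Extracts the Name attribute as one contiguous column. */
    static String[] nameColumn(Row[] rows) {
        String[] names = new String[rows.length];
        for (int i = 0; i < rows.length; i++) {
            names[i] = rows[i].name;
        }
        return names;
    }

    public static void main(String[] args) {
        Row[] table = {
            new Row(104543, "Peter"), new Row(203456, "Sam"), new Row(234321, "Maria")
        };
        // Column-oriented view: all IDs first, then all names.
        System.out.println(Arrays.toString(idColumn(table)));
        System.out.println(Arrays.toString(nameColumn(table)));
    }
}
```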
  8. Compression Algorithm
     • Variable length output
       o Byte-oriented compression: integers are coded in units of bytes, e.g., Variable-Byte
       o Block-based compression: these schemes take a fixed number of input integers and output a variable number of bytes, e.g., FOR, NewPFD, FastPFOR
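As an illustration of byte-oriented coding, here is a minimal Variable-Byte sketch in Java. It assumes one common convention (7 data bits per byte, least significant group first, high bit set on the last byte of each value); the class name `VariableByteSketch` is hypothetical and this is not the JavaFastPFOR implementation used in the experiments.

```java
import java.io.ByteArrayOutputStream;

final class VariableByteSketch {
    /** Encodes each int in 1 to 5 bytes, 7 data bits per byte. */
    static byte[] encode(int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : values) {
            while ((v & ~0x7F) != 0) {
                out.write(v & 0x7F);   // more bytes follow: high bit clear
                v >>>= 7;
            }
            out.write(v | 0x80);       // last byte of this value: high bit set
        }
        return out.toByteArray();
    }

    /** Decodes exactly count integers from bytes. */
    static int[] decode(byte[] bytes, int count) {
        int[] values = new int[count];
        int pos = 0;
        for (int i = 0; i < count; i++) {
            int v = 0;
            int shift = 0;
            while (true) {
                int b = bytes[pos++] & 0xFF;
                v |= (b & 0x7F) << shift;
                if ((b & 0x80) != 0) {
                    break;             // terminator byte reached
                }
                shift += 7;
            }
            values[i] = v;
        }
        return values;
    }
}
```

Small values cost one byte and large ones up to five, so the output length varies with the data, which is what "variable length output" refers to.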
  9. Compression Algorithm (contd.)
     • Fixed length output: each step takes a variable number of integers and produces a compressed form of those integers using a fixed number of bits as a unit, e.g., Simple9
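The fixed-length-output idea can be sketched with Simple9, which packs a variable number of integers into each 32-bit word: 4 selector bits plus 28 data bits, with nine possible layouts. The Java below is a simplified illustration under stated assumptions: the class name `Simple9Sketch` is hypothetical, the selector is stored in the top 4 bits, values must be non-negative and fit in 28 bits, and the encoder greedily picks the densest layout that fits. It is not the tuned implementation benchmarked in the deck.

```java
import java.util.Arrays;

final class Simple9Sketch {
    // Selector i packs COUNT[i] integers of BITS[i] bits each into 28 data bits.
    private static final int[] COUNT = {28, 14, 9, 7, 5, 4, 3, 2, 1};
    private static final int[] BITS  = { 1,  2, 3, 4, 5, 7, 9, 14, 28};

    static int[] encode(int[] values) {
        int[] out = new int[values.length];   // worst case: one word per value
        int n = 0;
        int pos = 0;
        while (pos < values.length) {
            int sel = pickSelector(values, pos);
            int word = sel << 28;             // selector in the top 4 bits
            for (int i = 0; i < COUNT[sel]; i++) {
                word |= values[pos + i] << (i * BITS[sel]);
            }
            out[n++] = word;
            pos += COUNT[sel];
        }
        return Arrays.copyOf(out, n);
    }

    /** Returns the densest selector whose integers are all available and all fit. */
    private static int pickSelector(int[] values, int pos) {
        for (int sel = 0; sel < COUNT.length; sel++) {
            if (pos + COUNT[sel] > values.length) {
                continue;                     // not enough integers left
            }
            boolean fits = true;
            for (int i = 0; i < COUNT[sel] && fits; i++) {
                fits = (values[pos + i] >>> BITS[sel]) == 0;
            }
            if (fits) {
                return sel;
            }
        }
        throw new IllegalArgumentException("value does not fit in 28 bits");
    }

    /** Decodes words produced by encode; count is the original number of integers. */
    static int[] decode(int[] words, int count) {
        int[] values = new int[count];
        int pos = 0;
        for (int word : words) {
            int sel = word >>> 28;
            int mask = (1 << BITS[sel]) - 1;
            for (int i = 0; i < COUNT[sel]; i++) {
                values[pos++] = (word >>> (i * BITS[sel])) & mask;
            }
        }
        return values;
    }
}
```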
  10. Binary packing
     • Original sequence: 67 78 85 96 98
     • The numbers range from 67 to 98.
     • Compressed sequence: 0 11 18 29 31
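The slide's example can be reproduced with a short Java sketch of binary packing (frame of reference): subtract the block minimum, then store every offset with just enough bits for the largest one. For 67..98 the offsets are 0..31, so 5 bits per integer suffice. The class and method names (`BinaryPackingSketch`, `bitWidth`, `pack`, `unpack`) are hypothetical, and the layout (offsets packed back to back into 32-bit words, least significant bits first) is an assumption made for illustration, not the layout of the benchmarked library.

```java
final class BinaryPackingSketch {
    /** Number of bits needed for the largest offset (at least 1). */
    static int bitWidth(int maxOffset) {
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(maxOffset));
    }

    /** Packs (value - base) for each value using width bits per integer. */
    static int[] pack(int[] block, int base, int width) {
        int[] words = new int[(block.length * width + 31) / 32];
        int bitPos = 0;
        for (int v : block) {
            long offset = (v - base) & 0xFFFFFFFFL;
            int word = bitPos >>> 5;
            int shift = bitPos & 31;
            words[word] |= (int) (offset << shift);
            if (shift + width > 32) {
                // The value straddles two words: spill the high bits.
                words[word + 1] |= (int) (offset >>> (32 - shift));
            }
            bitPos += width;
        }
        return words;
    }

    /** Reverses pack: reads count offsets of width bits and adds base back. */
    static int[] unpack(int[] words, int count, int base, int width) {
        int[] block = new int[count];
        long mask = (1L << width) - 1;
        int bitPos = 0;
        for (int i = 0; i < count; i++) {
            int word = bitPos >>> 5;
            int shift = bitPos & 31;
            long bits = (words[word] & 0xFFFFFFFFL) >>> shift;
            if (shift + width > 32) {
                bits |= (words[word + 1] & 0xFFFFFFFFL) << (32 - shift);
            }
            block[i] = base + (int) (bits & mask);
            bitPos += width;
        }
        return block;
    }
}
```

With the slide's data, the five integers occupy 25 bits (one 32-bit word) instead of five 32-bit words, plus whatever is needed to record the base and the bit width.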
  11. Patched Compression
     • Original sequence (in binary): 11 1 10 … 11 11 11111 10 11
     • The exception: 11111.
     • Base bit width b = 2 (for non-exceptional values), maximum number of bits 5, number of exceptions 1, location of exception 125
     • Compressed sequence (in binary): 11 1 10 … 11 11 11 10 11
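The patching idea on this slide can be sketched in Java as follows: every value keeps only its low b bits in the main array, and the few values that need more bits (the exceptions) have their position and remaining high bits recorded separately. This is a simplified illustration, not NewPFD, OptPFD or FastPFOR themselves: the class `PatchedSketch` and its fields are hypothetical, b is assumed to be between 1 and 31, and the low bits are left in a plain `int[]` rather than bit-packed.

```java
import java.util.Arrays;

final class PatchedSketch {
    final int b;                      // bit width for non-exceptional values
    final int[] lowBits;              // low b bits of every value
    final int[] exceptionPositions;   // indexes of values needing more than b bits
    final int[] exceptionHighBits;    // the bits above b for each exception

    private PatchedSketch(int b, int[] low, int[] positions, int[] high) {
        this.b = b;
        this.lowBits = low;
        this.exceptionPositions = positions;
        this.exceptionHighBits = high;
    }

    static PatchedSketch encode(int[] block, int b) {
        int mask = (1 << b) - 1;
        int[] low = new int[block.length];
        int[] positions = new int[block.length];
        int[] high = new int[block.length];
        int n = 0;
        for (int i = 0; i < block.length; i++) {
            low[i] = block[i] & mask;
            if ((block[i] >>> b) != 0) {        // does not fit in b bits
                positions[n] = i;
                high[n] = block[i] >>> b;
                n++;
            }
        }
        return new PatchedSketch(b, low, Arrays.copyOf(positions, n), Arrays.copyOf(high, n));
    }

    /** Restores the block by patching the high bits back onto the exceptions. */
    int[] decode() {
        int[] block = lowBits.clone();
        for (int k = 0; k < exceptionPositions.length; k++) {
            block[exceptionPositions[k]] |= exceptionHighBits[k] << b;
        }
        return block;
    }
}
```

In the slide's example, the exception 11111 keeps its low two bits (11) in place, which is why the compressed sequence shows 11 where the original had 11111.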
  12. Synthetic data experiments: compression ratio, clustered data (figure)
  13. Synthetic data experiments (contd.): compression ratio, uniform data (figure)
  14. Synthetic data experiments (contd.): decompression speed, clustered data (figure)
  15. Synthetic data experiments (contd.): uniform data (figure)
  16. Real Data Sets
     • Census-Income
     • Census1881
     • Star Schema Benchmark
  17. Column-wise compressed size: original and shuffled row order (figure). Caption: Column-wise compressed size for Census1881 of frequency coded file
  18. Column-wise compressed size (contd.): sorted on the high cardinality column (column 1) and sorted on the low cardinality column (column 3) (figure). Caption: Column-wise compressed size for Census1881 of frequency coded file
  19. Column-wise compression speed (figure). Caption: Column-wise compression speed for Census1881 of frequency coded file
  20. Column-wise compression speed (contd.) (figure). Caption: Column-wise compression speed for Census1881 of frequency coded file
  21. Column-wise decompression speed (figure). Caption: Column-wise decompression speed for Census1881 of frequency coded file
  22. Column-wise decompression speed (contd.) (figure). Caption: Column-wise decompression speed for Census1881 of frequency coded file
  23. Effect of Row Order (figure). Caption: Histogram of compressed size (bits/int)
  24. Conclusion
     • Sorting columns results in good compressed size.
     • Sorted columns can be compressed and decompressed faster than columns in shuffled order.
     • The choice of compression scheme depends on the nature of the database (OLTP/OLAP) and on the storage and data access speed requirements.
  25. Future Work
     • Incorporating a query engine to assess real-world performance.
     • Comparing on processor-level metrics.
     • Using multiple threads in the compression algorithms.
     • Querying data in compressed form.
  26. Thank You
  27. Backup
  28. Key Issues
     • Data access latency: the time between sending a request and locating the data on disk so that processing can start.
     • Disk bandwidth: the amount of data that can be transferred from the disk per second.
  29. Experimental Setup
     • Hardware
       o Intel Core i5-2400
       o RAM: 8 GB
       o Cache: 6 MB L3
       o Memory Clock Speed: 1333 MHz
     • Software
       o Java SDK version 1.7.0
       o https://github.com/lemire/JavaFastPFOR
       o Single-threaded
     • More Info
       o http://hdl.handle.net/1882/45703
  30. Compressed Size
     Result of compression (bits per integer) on SSB with frequency coded file

     Coding Scheme     Original   Shuffled   High Card.   Low Card.
     Variable-Byte        15.00      15.00        15.00       15.00
     Binary Packing       11.37      11.42        11.15       11.37
     NewPFD               13.06      13.19        12.32       13.14
     OptPFD               11.84      11.85        11.80       11.80
     FastPFOR             11.27      11.29        11.06       11.24
     Simple9              15.75      15.90        15.72       15.84
  31. Compression Speed
     Result of compression speed in millions of integers per second (mis) on Census1881 with frequency coded file

     Coding Scheme     Original   Shuffled   High Card.   Low Card.
     Variable-Byte           33         31           33          31
     Binary Packing         729        711          746         732
     NewPFD                  52         36           40          34
     OptPFD                   6          3            5           4
     FastPFOR               104         76           89          84
     Simple9                 78         60           69          64
  32. Decompression Speed
     Result of decompression speed in millions of integers per second (mis) on Census1881 with frequency coded file

     Coding Scheme     Original   Shuffled   High Card.   Low Card.
     Variable-Byte          165        197          214         186
     Binary Packing        1151       1089         1151        1135
     NewPFD                 709        615          729         689
     OptPFD                 421        357          482         381
     FastPFOR               776        707          763         730
     Simple9                488        377          447         398
  33. Column-wise compressed size: original and shuffled row order (figure). Caption: Column-wise compressed size for Census1881 of frequency coded file
  34. Column-wise compressed size: sorted on the high cardinality column (column 1) and sorted on the low cardinality column (column 3) (figure). Caption: Column-wise compressed size for Census1881 of frequency coded file
  35. Column-wise compression speed (figure). Caption: Column-wise compression speed for Census1881 of frequency coded file
  36. Column-wise decompression speed (figure). Caption: Column-wise decompression speed for Census1881 of frequency coded file
  37. Effect of CPU family on compression speed (figure). Caption: Compression speed (mis) on different processors
  38. Effect of CPU family on decompression speed (figure). Caption: Decompression speed (mis) on different processors