Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our User Agreement and Privacy Policy.

Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our Privacy Policy and User Agreement for details.

Successfully reported this slideshow.

1,292 views

Published on

Published in:
Technology

No Downloads

Total views

1,292

On SlideShare

0

From Embeds

0

Number of Embeds

9

Shares

0

Downloads

30

Comments

0

Likes

3

No embeds

No notes for slide

- 1. Performance evaluation of fast integer compression techniques over tables Ikhtear Md. Sharif Bhuyan Supervisors: Hazel Webb, Daniel Lemire, Owen Kaser ©Ikhtear Md. Sharif Bhuyan
- 2. Overview • • • • • • Introduction Compression in databases and issues Objectives Experimental Results Conclusion Future Work Performance evaluation of fast integer compression techniques over tables 12/4/2013 2
- 3. Query processing Processor Cache RAM Disk Performance evaluation of fast integer compression techniques over tables 12/4/2013 3
- 4. Compression in databases • • • • Reduce storage Query processing speed Save I/O bandwidth Improve performance for I/O-bound operation Performance evaluation of fast integer compression techniques over tables 12/4/2013 4
- 5. Selecting Compression in databases • Lossless • Trade off between compression ratio and speed of compression and decompression Performance evaluation of fast integer compression techniques over tables 12/4/2013 5
- 6. Objective • Examining and comparing the performance of patched schemes with other methods with respect to compression ratio, decompression speed and compression speed. • Assessing the effect of different factors such as row order. Performance evaluation of fast integer compression techniques over tables 12/4/2013 6
- 7. Column-oriented database system 104543 ID Name 104543 Peter 203456 Sam 234321 Peter 203456 Sam 234321 Maria Maria Row-oriented database 104543 203456 234621 Peter Sam Maria Column-oriented database Performance evaluation of fast integer compression techniques over tables 12/4/2013 7
- 8. Compression Algorithm • Variable length output o Byte-oriented compression: Integers are coded in units of bytes. i.e., Variable-Byte o Block-based compression: These schemes use a fixed number of input integers and output a variable number of bytes. e.g., FOR, NewPFD, FastPFD Performance evaluation of fast integer compression techniques over tables 12/4/2013 8
- 9. Compression Algorithm (Contd …) • Fixed length output Each step takes a variable number of integers and produces a compressed form of those integers using a fixed number of bits as a unit. i.e., Simple9 Performance evaluation of fast integer compression techniques over tables 12/4/2013 9
- 10. Binary packing • Original Sequence 67 78 85 96 98 • the numbers range from 67 to 98. • Compressed Sequence 0 11 18 29 Performance evaluation of fast integer compression techniques over tables 31 12/4/2013 10
- 11. Patched Compression • Original Sequence 11 1 10 … 11 11 11111 10 11 • The exception # 11111. • Base value b=2 (non-exceptional values), maximum number of bits 5, number of exception 1, location of exception 125 • Compressed Sequence 11 1 10 … 11 11 11 Performance evaluation of fast integer compression techniques over tables 10 11 12/4/2013 11
- 12. Synthetic data experiments • Compression Ratio Clustered data Clustered Data Performance evaluation of fast integer compression techniques over tables 12/4/2013 12
- 13. Synthetic data experiments (Contd …) • Compression Ratio Uniform data Uniform data Performance evaluation of fast integer compression techniques over tables 12/4/2013 13
- 14. Synthetic data experiments(Contd …) • Decompression Speed: Clustered data Performance evaluation of fast integer compression techniques over tables 12/4/2013 14
- 15. Synthetic data experiments(Contd …) Uniform Data Performance evaluation of fast integer compression techniques over tables 12/4/2013 15
- 16. Real Data Sets • Census-Income • Census1881 • Star Schema Benchmark Performance evaluation of fast integer compression techniques over tables 12/4/2013 16
- 17. Column wise Compressed size Original Shuffled Column-wise compressed size for Census1881 of frequency coded file Performance evaluation of fast integer compression techniques over tables 12/4/2013 17
- 18. Column wise Compressed size (Contd …) Sort High Cardinality Column (column 1) Sort Low Cardinality Column(column 3) Column-wise compressed size for Census1881 of frequency coded file Performance evaluation of fast integer compression techniques over tables 12/4/2013 18
- 19. Column wise Compression speed Column-wise compression speed for Census1881 of frequency coded file Performance evaluation of fast integer compression techniques over tables 12/4/2013 19
- 20. Column wise Compression speed (Contd …) Column-wise compression speed for Census1881 of frequency coded file Performance evaluation of fast integer compression techniques over tables 12/4/2013 20
- 21. Column wise Decompression speed Column-wise decompression speed for Census1881 of frequency coded file Performance evaluation of fast integer compression techniques over tables 12/4/2013 21
- 22. Column wise Decompression speed (Contd …) Column-wise decompression speed for Census1881 of frequency coded file Performance evaluation of fast integer compression techniques over tables 12/4/2013 22
- 23. Effect of Row Order Histogram of compressed size (bits/int) Performance evaluation of fast integer compression techniques over tables 12/4/2013 23
- 24. Conclusion • Sorting columns results in good compressed size. • Sorted columns can be compressed and decompressed faster than shuffled order. • Selection of compression schemes depends on the nature of database(OLPT/OLAP) and the requirement of storage and data access speed. Performance evaluation of fast integer compression techniques over tables 12/4/2013 24
- 25. Future Work • Incorporating a query engine to asses real world performance. • Comparing on processor-level metrics. • Using multiple threads in compression algorithm. • Query in compressed form Performance evaluation of fast integer compression techniques over tables 12/4/2013 25
- 26. Thank You Performance evaluation of fast integer compression techniques over tables 12/4/2013 26
- 27. Backup Performance evaluation of fast integer compression techniques over tables 12/4/2013 27
- 28. Key Issues • Data access latency The time it takes between the request sent and the data is found on disk to start processing. • Disk bandwidth The amount of data can be sent per second from the disk. Performance evaluation of fast integer compression techniques over tables 12/4/2013 28
- 29. Experimental Setup • Hardware o o o o Intel Core i5-2400 RAM: 8 GB Cache: 6MB L3 Memory Clock Speed: 1333 MHz • Software o Java SDK version 1.7.0 o https://github.com/lemire/JavaFastPFOR o Single-threaded • More Info o http://hdl.handle.net/1882/45703 Performance evaluation of fast integer compression techniques over tables 12/4/2013 29
- 30. Compressed Size Coding Scheme Original Shuffled High Card. Low Card. Variable-Byte 15.00 15.00 15.00 15.00 Binary Packing 11.37 11.42 11.15 11.37 NewPFD 13.06 13.19 12.32 13.14 OptPFD 11.84 11.85 11.80 11.80 FastPFOR 11.27 11.29 11.06 11.24 Simple9 15.75 15.90 15.72 15.84 Result of compression (bits per integer) on SSB with frequency coded file Performance evaluation of fast integer compression techniques over tables 12/4/2013 30
- 31. Compression Speed Coding Scheme Original Shuffled High Card. Low Card. Variable-Byte 33 31 33 31 Binary Packing 729 711 746 732 NewPFD 52 36 40 34 OptPFD 6 3 5 4 FastPFOR 104 76 89 84 Simple9 78 60 69 64 Result of compression speed (mis) on Census1881 with frequency coded file Performance evaluation of fast integer compression techniques over tables 12/4/2013 31
- 32. Decompression Speed Coding Scheme Original Shuffled High Card. Low Card. Variable-Byte 165 197 214 186 Binary Packing 1151 1089 1151 1135 NewPFD 709 615 729 689 OptPFD 421 357 482 381 FastPFOR 776 707 763 730 Simple9 488 377 447 398 Result of decompression speed (mis) on Census1881 with frequency coded file Performance evaluation of fast integer compression techniques over tables 12/4/2013 32
- 33. Column wise Compressed size Original Shuffled Column-wise compressed size for Census1881 of frequency coded file Performance evaluation of fast integer compression techniques over tables 12/4/2013 33
- 34. Column wise Compressed size Sort High Cardinality Column (column 1) Sort Low Cardinality Column(column 3) Column-wise compressed size for Census1881 of frequency coded file Performance evaluation of fast integer compression techniques over tables 12/4/2013 34
- 35. Column wise Compression speed Column-wise compression speed for Census1881 of frequency coded file Performance evaluation of fast integer compression techniques over tables 12/4/2013 35
- 36. Column wise Decompression speed Column-wise decompression speed for Census1881 of frequency coded file Performance evaluation of fast integer compression techniques over tables 12/4/2013 36
- 37. Effect of CPU family on compression speed Compression speed (mis) on different processor Performance evaluation of fast integer compression techniques over tables 12/4/2013 37
- 38. Effect of CPU family on decompression speed Decompression speed (mis) on different processor Performance evaluation of fast integer compression techniques over tables 12/4/2013 38

No public clipboards found for this slide

×
### Save the most important slides with Clipping

Clipping is a handy way to collect and organize the most important slides from a presentation. You can keep your great finds in clipboards organized around topics.

Be the first to comment