A27 Vectorwise Performance Considerations_implementation_best_practices

  1. Vectorwise
     Implementation best practices
     Mark Van de Wiel, Director Product Management, Vectorwise
     Thursday, November 01, 2012
     Confidential © 2012 Actian Corporation
  2. Agenda
     • Hardware
     • Operating system
     • Database configuration
     • Database design
     • Data loading
     • High availability
     • Monitoring
  3. 100x (+) Performance Difference – 2003
     Custom C versus Relational Database
     [Bar chart: TPC-H 1 GB query 1, runtime in s. Series: MySQL, DBMS X, C program, Vectorwise. Data labels: 28.1, 26.2, 0.2, 0.6.]
  4. Some Numbers
     • Traditional RDBMS: <200 MB/s per core
       • Even these use MPP to address I/O challenges
     • Vectorwise (lab environment): >1.5 GB/s per core
       • Maximum throughput requirement is extremely high
       • Realistically (cost-effectively), only RAM can serve data quickly enough
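
To put the per-core figures above in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes scan throughput scales linearly with core count, which real systems rarely achieve. The function name, the 100 GB data set, and the 8-core server are illustrative assumptions, not figures from the slides.

```python
def scan_seconds(data_gb, gb_per_s_per_core, cores):
    """Idealized time to scan data_gb, assuming linear scaling across cores."""
    return data_gb / (gb_per_s_per_core * cores)

# Hypothetical workload: 100 GB scanned on an 8-core server.
traditional = scan_seconds(100, 0.2, 8)  # 200 MB/s per core (the slide's upper bound)
vectorwise = scan_seconds(100, 1.5, 8)   # 1.5 GB/s per core (the slide's lower bound)

# Aggregate rate the storage layer would have to sustain to keep 8 cores busy.
required_gb_per_s = 1.5 * 8
```

Under these assumptions the cores could consume about 12 GB/s in aggregate, which illustrates why the slide points to RAM as the only cost-effective way to feed them.
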
  5. What Hardware to Use
     • CPU
     • Memory
     • Storage I/O and capacity
     [Diagram labels: Requirements, Budget]
  6. Hardware Considerations – MEMORY
     • Ideally frequently-accessed data should fit in memory
       • May be all data
       • May be a small portion of the data
       • Note: data is compressed in memory buffer
         • 3x – 5x compression ratios are common
     • Query execution should all take place in memory
       • Operations against larger data sets require more memory
       • Consider query concurrency
       • “Spill to disk” is supported but should be a last resort
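
As a rough illustration of the compression note above, the following Python sketch estimates how much buffer memory a frequently accessed data set might need at the 3x and 5x ratios quoted on the slide. The 600 GB raw size and the function name are hypothetical; actual ratios depend on the data.

```python
def compressed_footprint_gb(raw_gb, ratio):
    """Estimated in-memory size of raw_gb of data at a given compression ratio."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return raw_gb / ratio

raw_hot_gb = 600  # hypothetical size of the frequently accessed data, uncompressed

best_case = compressed_footprint_gb(raw_hot_gb, 5)   # 5x compression
worst_case = compressed_footprint_gb(raw_hot_gb, 3)  # 3x compression
```

Sizing against the lower ratio is the more conservative choice, since query execution memory and concurrency also compete for RAM.
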
  7. Hardware Recommendation
     • CPUs
       • Use CPUs with higher clock rate for better raw throughput
       • Use more cores for higher throughput
       • Higher power CPUs are faster
     • Memory
       • At least 8 GB per core (more is always better)
     • Storage
       • Use as many drives as possible
       • Ensure sufficient capacity
       • Use the fastest drives available
         • SAS over SATA, ideally 15k RPM
         • SSDs are often not cost-effective relative to more memory
  8. Examples
     • Small configuration (1 TB)
       • Dell R620
       • Lenovo RD430
     • Medium configuration (single digit TBs)
       • Dell R720
       • HP DL380
       • IBM x3650
       • Lenovo RD630
     • High-end configuration
       • Dell R910
       • HP DL580 or DL980
       • IBM x3750
  9. Operating System Considerations
     • 64-bit
     • Linux: Redhat, SuSE, Ubuntu
       • File systems: xfs, ext3, ext4
     • Windows: Windows 7 (or higher), Windows 2008 (or higher)
  10. Database Configuration
     • Installation defaults are generally good
       • May want to adjust column buffer size (default 25% of RAM)
       • May want to adjust processing memory (default 50% of RAM)
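
The default percentages above translate into absolute sizes as in this small Python sketch. The function and key names are made up for illustration and are not Vectorwise configuration parameters; the 128 GB server is likewise an assumption.

```python
def default_memory_split(ram_gb):
    """Split physical RAM according to the defaults quoted on the slide."""
    column_buffer = 0.25 * ram_gb  # column buffer: default 25% of RAM
    processing = 0.50 * ram_gb     # processing memory: default 50% of RAM
    return {
        "column_buffer_gb": column_buffer,
        "processing_gb": processing,
        # Whatever is left for the OS and other processes.
        "remaining_gb": ram_gb - column_buffer - processing,
    }

split = default_memory_split(128)  # hypothetical server with 128 GB of RAM
```

Comparing the column buffer figure with the compressed size of the frequently accessed data is one way to judge whether the default is worth adjusting.
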
  11. Database Design
     • Schema – no particular preference
       • Single denormalized table, star schema, snowflake schema, 3rd normal form
     • Constraints
       • Only on empty tables today… (to be addressed in Vectorwise 3.0)
       • Consider data loading order and impact
     • Indexes
       • Note: clustered index only today (“index-organized table”)
       • One per table
       • Consider incremental load
  12. Data Loading: Initial load
     • File-based bulk load through vwload or copy
       • Conversion into UTF8
     • Use tools
       • Pentaho
       • Informatica
       • Talend
       • HVR
       • Attunity
  13. Data Loading: Incremental load
     • INSERT, UPDATE and/or DELETE
       • Append if possible
       • Batch if possible
       • Use COMBINE
     • Positional Delta Trees
       • Memory considerations
       • Propagation to disk
     • Use tools
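
One generic way to follow the "batch if possible" advice is to group incoming rows into fixed-size chunks before sending them to the database. The Python sketch below shows only the chunking step; the helper name and batch size are assumptions, and how each batch is actually submitted depends on the driver or loading tool in use.

```python
from itertools import islice

def batches(rows, size):
    """Yield successive lists of at most `size` rows from any iterable."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# Seven hypothetical rows grouped into batches of three.
result = list(batches(range(7), 3))
```

Each yielded list would then be written in a single operation rather than row by row.
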
  14. Moving Window of Data
     • Considerations
       • COMBINE on a large table can be expensive
         • Mostly relevant for updates and deletes
     • Alternative: manual partitioning
       • One table per period
       • Single view across all tables
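
The manual partitioning alternative can be sketched as follows, assuming one table per month and a UNION ALL view over the tables currently in the window. The table and view names are hypothetical, and the generated statement is plain SQL that would need checking against the target Vectorwise version; the existing view would also have to be dropped before it is recreated.

```python
def window_view_sql(view_name, tables):
    """Build a CREATE VIEW statement that unions all per-period tables."""
    if not tables:
        raise ValueError("at least one table is required")
    selects = "\nUNION ALL\n".join(f"SELECT * FROM {t}" for t in tables)
    return f"CREATE VIEW {view_name} AS\n{selects}"

def roll_window(tables, new_table, max_periods):
    """Add the newest period and return (tables to keep, tables to drop)."""
    if max_periods < 1:
        raise ValueError("max_periods must be at least 1")
    updated = list(tables) + [new_table]
    return updated[-max_periods:], updated[:-max_periods]

# Hypothetical three-month window that moves forward by one month.
current = ["sales_2012_08", "sales_2012_09", "sales_2012_10"]
kept, expired = roll_window(current, "sales_2012_11", 3)
sql = window_view_sql("sales", kept)
```

With this layout, expiring a period means dropping a whole table, which avoids the large deletes that make COMBINE expensive.
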
  15. High Availability
     • Hardware and OS best practices
       • UPS, RAID
     • Vectorwise backup
       • Only read-only, full backup
       • Consider periodic full backup and file incremental loads
     • Disaster recovery
       • Dual load
       • Active/active possibility
  16. Monitoring
     • OS monitoring
       • CPU, memory utilization, I/O statistics
     • vwinfo data
     • Actian Director
     • DBA tools
  17. Agenda
     • Hardware
     • Operating system
     • Database configuration
     • Database design
     • Data loading
     • High availability
     • Monitoring
     More information in the Vectorwise Developer Guide: http://www.actian.com/images/white_papers/vw_developers_v2.5.pdf