HP Vertica Certification Guide


Vertica Academy

Published in: Software

  1. HP Vertica Certification Guide, Softtek 2015
  2. Vertica Architecture
  3. Identify key features of Vertica
      1. Performance features
         1. Column orientation
         2. Aggressive compression
         3. Read-optimized storage
         4. Ability to exploit multiple sort orders
         5. Parallel shared-nothing design on off-the-shelf hardware
         6. Bottom line
      2. Administrative and management features
         1. Vertica Database Designer
         2. Recovery and high availability through K-safety
         3. Continuous load: snapshot isolation and the WOS
         4. Monitoring and administration tools and APIs
      Cristóbal Gómez | Identify key features of Vertica | 1
  4. The Vertica Analytic Database Architecture
  5. ROS Distribution and Tuple Mover
  6. Topics (Victor Espinosa):
      - Describe high availability capabilities and Vertica's transaction model.
      - Identify characteristics and determine features of projections used in Vertica.
  7. High Availability: the ability of the database to keep running even if a node goes down.
      [Diagram: projections A, B and C on three nodes, with buddy copies C, A and B shifted one node over]
      Buddy projections: copies of existing projections stored on adjacent nodes.
      K-safety levels: 0, 1, 2.
  8. High Availability and Recovery
      - An HP Vertica database is said to be K-safe when any K nodes can fail without the database losing data.
      - High availability with projections:
        - For small tables, Vertica replicates unsegmented projections, creating and storing a full copy on every node.
        - For large tables, Vertica creates buddy projections: copies of segmented projections that are distributed across database nodes.
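The following is a minimal Python sketch of why buddy projections give K-safety. It is a toy model, not Vertica's actual placement logic: it assumes one segment per node, with buddy copy j of segment s stored on node (s + j) mod N, and checks whether every segment still has a live copy after a set of node failures. All function names are hypothetical.

```python
from itertools import combinations


def place_segments(num_nodes: int, k: int) -> dict[int, set[int]]:
    """Map node -> set of segment ids it stores.

    Toy model: segment s lives on node s (the primary projection), and
    buddy copy j (1..k) lives on node (s + j) % num_nodes.
    """
    placement = {node: set() for node in range(num_nodes)}
    for seg in range(num_nodes):
        for copy in range(k + 1):  # primary + k buddies = k + 1 copies
            placement[(seg + copy) % num_nodes].add(seg)
    return placement


def data_available(placement: dict[int, set[int]], down: set[int]) -> bool:
    """True if every segment still has at least one copy on a live node."""
    all_segments = set().union(*placement.values())
    live = set()
    for node, segs in placement.items():
        if node not in down:
            live |= segs
    return live == all_segments


def survives_any(placement: dict[int, set[int]], failures: int) -> bool:
    """True if the data survives every combination of `failures` down nodes."""
    return all(data_available(placement, set(down))
               for down in combinations(placement, failures))


layout = place_segments(num_nodes=3, k=1)
print(layout)                   # {0: {0, 2}, 1: {0, 1}, 2: {1, 2}}
print(survives_any(layout, 1))  # True: K=1 tolerates any single failure
print(survives_any(layout, 2))  # False: two failures can lose a segment
```

Shifting each buddy by one more node is what keeps two copies of the same segment off the same machine; with K buddies, any K failures still leave one copy of every segment.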
  9. Features
      - Columnar orientation: Vertica stores data in columns and reads only the columns referenced by the query.
      - Advanced encoding/compression: data is compressed and encoded as part of the database design, which reduces disk storage; data does not need to be decoded to return a result.
      - High availability.
      - Automatic database design: transforms data into column-based projections; query performance can be improved by analyzing the loaded data together with the most commonly used SQL queries.
      - Application integration: Vertica uses standard SQL.
      - Massively parallel processing.
      [Diagram: Vertica connected to ETL, replication, data quality, analytics and reporting tools]
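To illustrate "reads only the columns referenced by the query", here is a minimal Python sketch of a column store. It is a toy, assuming one in-memory list per column; the `ColumnStore` class and the sample rows are hypothetical, and the class simply records which columns a query touched.

```python
rows = [
    {"id": 1, "region": "east", "amount": 120.0, "comment": "first order"},
    {"id": 2, "region": "west", "amount": 80.0, "comment": "repeat buyer"},
    {"id": 3, "region": "east", "amount": 45.5, "comment": "promo"},
]


def to_columns(rows):
    """Pivot row-oriented records into one list per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


class ColumnStore:
    def __init__(self, rows):
        self.columns = to_columns(rows)
        self.columns_read = set()  # which columns queries have touched

    def read(self, name):
        self.columns_read.add(name)
        return self.columns[name]


def total_by_region(store, region):
    """Like SUM(amount) ... WHERE region = ?: touches only two columns."""
    regions = store.read("region")
    amounts = store.read("amount")
    return sum(a for r, a in zip(regions, amounts) if r == region)


store = ColumnStore(rows)
print(total_by_region(store, "east"))  # 165.5
print(sorted(store.columns_read))      # ['amount', 'region']
```

The `id` and `comment` columns are never read, which is the I/O saving a columnar layout gives for queries that reference few columns.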
  10. Projection characteristics and features
      A projection is a representation of columns from the source tables. Vertica stores all data in projections, in columnar format. Projections are updated automatically as data is loaded into the database; their data is sorted and compressed, and Vertica distributes it across all nodes.
      Three types of projections:
      - Superprojections: contain all columns of a table; Vertica creates one automatically when data is first loaded into the table.
      - Query-specific projections: contain only the columns needed for a specific query.
      - Buddy projections: copies of projections stored on an adjacent node.
  11. Projections with large amounts of data are segmented across the nodes. For small amounts of data, segmentation is not efficient, so Vertica copies the full projection to each node.
  12. Create projections using DDL (Data Definition Language)
  13. Vertica's Transaction Model
      Vertica follows the SQL-92 transaction model.
      - DML commands: INSERT, UPDATE, DELETE.
      - You don't have to start a transaction explicitly.
      - Use COMMIT or ROLLBACK to end a transaction; COPY also ends it, because by default COPY commits itself and any open transaction.
      In Vertica:
      - DELETE doesn't remove data from disk storage; it marks rows as deleted so they can still be found by historical queries.
      - UPDATE is a delete plus an insert: the old row is marked for deletion and a new row is written with the new data.
      Like COPY, INSERT, UPDATE and DELETE write data to the WOS by default and, when the WOS overflows, to the ROS. For large INSERTs or UPDATEs, you can use the DIRECT keyword to force HP Vertica to write rows directly to the ROS. Loading a large number of rows as single-row INSERTs is not recommended for performance reasons; use COPY instead.
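The delete-marking behavior above can be sketched in a few lines of Python. This is a simplified model loosely based on Vertica's commit epochs, not its real storage format: the `Table` and `Row` classes are hypothetical, each `commit()` advances a single epoch counter, and uncommitted writes carry the next epoch number so they stay invisible until committed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    key: int
    value: str
    insert_epoch: int
    delete_epoch: Optional[int] = None  # set by DELETE; the row stays stored


class Table:
    def __init__(self):
        self.rows: list[Row] = []
        self.epoch = 0  # latest committed epoch

    def commit(self):
        self.epoch += 1

    def insert(self, key, value):
        # Pending writes are stamped with the next epoch.
        self.rows.append(Row(key, value, insert_epoch=self.epoch + 1))

    def delete(self, key):
        # Mark, don't remove: historical queries can still see the row.
        for row in self.rows:
            if row.key == key and row.delete_epoch is None:
                row.delete_epoch = self.epoch + 1

    def update(self, key, value):
        # UPDATE = mark the old version deleted + write a new version.
        self.delete(key)
        self.insert(key, value)

    def select(self, at_epoch=None):
        """(key, value) pairs visible at `at_epoch` (default: latest commit)."""
        e = self.epoch if at_epoch is None else at_epoch
        return sorted((r.key, r.value) for r in self.rows
                      if r.insert_epoch <= e
                      and (r.delete_epoch is None or r.delete_epoch > e))


t = Table()
t.insert(1, "a"); t.insert(2, "b"); t.commit()  # epoch 1
t.update(1, "A"); t.delete(2); t.commit()       # epoch 2
print(t.select())            # [(1, 'A')]
print(t.select(at_epoch=1))  # [(1, 'a'), (2, 'b')]
print(len(t.rows))           # 3: nothing was physically removed
```

Because deleted rows remain stored, every query has to filter them out, which is why large numbers of unpurged deletes cost performance.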
  14. Topics (Cristóbal Gómez):
      - A1 - Identify key features of Vertica
      - C1 - Identify benefits of loading data into WOS and directly into ROS
      - D4 - Distinguish between deleting partitions and deleting records
      - F1 - Identify situations when a backup is recommended
      - H1 - Understanding analytics syntax
  15. Identify benefits of loading data into WOS and directly into ROS
  16. Arely Sandoval | A3 - Differentiate between compression and encoding
      Encoding is the process of converting data into a standard format. Vertica uses a number of different encoding strategies, depending on column data type, table cardinality, and sort order.
      Compression is the process of transforming data into a compact format.
      Encoding types:
      - ENCODING AUTO (default): Lempel-Ziv-Oberhumer-based (LZO) compression is used for CHAR/VARCHAR, BOOLEAN, BINARY/VARBINARY, and FLOAT columns.
      - ENCODING DELTAVAL: stores only the differences between sequential data values instead of the values themselves. This encoding type is best used for integer-based columns, but also applies to DATE/TIME/TIMESTAMP/INTERVAL columns. It has no effect on other data types.
      - ENCODING RLE: run-length encoding; replaces sequences of identical values with a single value and a count. Best suited to sorted, low-cardinality columns.
  17. - ENCODING BLOCK_DICT: for each block of storage, Vertica compiles distinct column values into a dictionary and then stores the dictionary and a list of indexes to represent the data block. It is ideal for few-valued, unsorted columns in which saving space is more important than encoding speed. BINARY/VARBINARY columns do not support BLOCK_DICT encoding.
      - ENCODING BLOCKDICT_COMP: similar to BLOCK_DICT, except that dictionary indexes are entropy coded. It requires significantly more CPU time to encode and decode and has poorer worst-case performance; however, it can save space if the distribution of values is extremely skewed.
      - ENCODING DELTARANGE_COMP: ideal for many-valued FLOAT columns that are either sorted or confined to a range. Do not use it with unsorted columns that contain NULL values, because the storage cost for representing a NULL value is high. It has a high cost for both compression and decompression.
      - ENCODING COMMONDELTA_COMP: ideal for sorted FLOAT and INTEGER-based (DATE/TIME/TIMESTAMP/INTERVAL) columns with predictable sequences and only occasional sequence breaks, such as timestamps recorded at periodic intervals or primary keys.
      - ENCODING NONE: do not specify this value. It increases space usage, increases processing time, and leads to problems.
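Here is a minimal Python sketch of the BLOCK_DICT idea: per block, a dictionary of distinct values plus a list of indexes into it. The tiny `block_size` and the sorted dictionary order are illustrative assumptions; Vertica's real blocks are on-disk storage blocks and its byte layout is not modeled here.

```python
def block_dict_encode(values, block_size=4):
    """Per block: a dictionary of distinct values plus one index per row."""
    blocks = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        dictionary = sorted(set(block))
        position = {v: i for i, v in enumerate(dictionary)}
        blocks.append((dictionary, [position[v] for v in block]))
    return blocks


def block_dict_decode(blocks):
    """Rebuild the column by looking each index up in its block's dictionary."""
    return [dictionary[i] for dictionary, indexes in blocks for i in indexes]


states = ["CA", "NY", "CA", "CA", "TX", "NY", "TX", "TX"]
encoded = block_dict_encode(states)
print(encoded)  # [(['CA', 'NY'], [0, 1, 0, 0]), (['NY', 'TX'], [1, 0, 1, 1])]
assert block_dict_decode(encoded) == states
```

Small integer indexes replace repeated strings, which is why the encoding pays off for few-valued columns even when they are unsorted.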
  18. Checking the encoding of a column (replace 'Column_Name' with the column to inspect):
        SELECT PROJECTION_NAME, PROJECTION_COLUMN_NAME, ENCODING_TYPE, DATA_TYPE FROM PROJECTION_COLUMNS WHERE PROJECTION_COLUMN_NAME = 'Column_Name';
      Differentiate between compression and encoding
      ● Encoded data can be processed directly by Vertica.
      ● Compressed data cannot be processed directly by Vertica; it must first be decompressed.
      ● Encoding depends on the data type of the values being encoded, whereas compression treats a compressed block as opaque and does not depend on what is in it.
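To make the encoding-versus-compression distinction concrete, this minimal Python sketch computes the same SUM two ways. RLE stands in for encoding and zlib over JSON stands in for compression; both are illustrative choices, not Vertica's internal formats.

```python
import json
import zlib
from itertools import groupby

values = [5, 5, 5, 5, 7, 7, 9, 9, 9]

# Encoding (RLE here): the engine still understands the structure.
runs = [(value, len(list(group))) for value, group in groupby(values)]


def sum_encoded(runs):
    """SUM evaluated directly on (value, count) runs; nothing is expanded."""
    return sum(value * count for value, count in runs)


# Compression (zlib over JSON here): the block is just opaque bytes.
blob = zlib.compress(json.dumps(values).encode("utf-8"))


def sum_compressed(blob):
    """The bytes must be decompressed and parsed before any arithmetic."""
    return sum(json.loads(zlib.decompress(blob)))


print(runs)                                      # [(5, 4), (7, 2), (9, 3)]
print(sum_encoded(runs), sum_compressed(blob))   # 61 61
```

The encoded version does three multiplications instead of touching nine values, and never has to rebuild the original column.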
  19. Topics (Arely Sandoval):
      ● D6 - Identify the advantages of a group by pipe versus a group by hash
      ● F3 - Define the Resource Manager's role in query processing
      ● H3 - Using explain plans and query profiles
  20. Topics (Juan Carlos Vázquez Tapia):
      ● Friday, March 20
        ○ Section: Projection Design
          ■ B5 - Understanding buddy projections.
      ● Tuesday, March 24
        ○ Section: Removing Data Permanently from Vertica and Advanced Projection Design.
          ■ D2 - Identify the advantages and disadvantages of using delete vectors to identify records marked for deletion.
      ● Wednesday, March 25
        ○ Section: Cluster Management in Vertica.
          ■ E4 - Define local segmentation capability in Vertica.
      ● Thursday, March 26
        ○ Section: Monitoring and Troubleshooting Vertica.
          ■ G4 - Defining, using and logging into Management Console.
  21. Projection Design: B5 - Understanding Buddy Projections
      Definition: HP Vertica creates buddy projections, which are copies of the projections in the database, distributed across database nodes. HP Vertica ensures that projections containing the same data are placed on different nodes, so that if a node goes down, all the data is still available on the remaining nodes. The number of buddy projections is determined by K, the K-safety level: each segmented projection has K buddies, giving K + 1 copies of the data.
      Juan Carlos Vázquez Tapia | Understanding Buddy Projections
  22. B5 - Understanding Buddy Projections
      Requirements: two projections must meet the following requirements to be considered buddies:
      ● They contain the same columns.
      ● They have the same hash segmentation.
      ● They use different node ordering.
      Buddy projections can have different sort orders for query performance purposes.
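The three buddy requirements can be expressed as a simple check. This is a minimal Python sketch over a hypothetical `ProjectionSpec` summary of a projection definition (not a Vertica API); `node_offset` stands in for "node ordering", and the projection names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectionSpec:
    """Hypothetical summary of a projection definition."""
    name: str
    columns: frozenset
    segmentation: str  # e.g. "HASH(id)"
    node_offset: int   # which node receives the first segment
    sort_order: tuple  # allowed to differ between buddies


def are_buddies(a: ProjectionSpec, b: ProjectionSpec) -> bool:
    """Same columns, same hash segmentation, different node ordering."""
    return (a.columns == b.columns
            and a.segmentation == b.segmentation
            and a.node_offset != b.node_offset)


cols = frozenset({"id", "name", "amount"})
p0 = ProjectionSpec("sales_b0", cols, "HASH(id)", 0, ("id",))
p1 = ProjectionSpec("sales_b1", cols, "HASH(id)", 1, ("amount", "id"))
p2 = ProjectionSpec("sales_x", cols, "HASH(name)", 1, ("id",))
print(are_buddies(p0, p1))  # True: a different sort order is allowed
print(are_buddies(p0, p2))  # False: different segmentation expression
```

Sort order is deliberately left out of the check, matching the note that buddies may be sorted differently for query performance.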
  23. Topics (Juan Neve):
      - B4 - Describe the process of projection segmentation.
      - D1 - Describe the process used to mark records for deletion.
      - E3 - Identify the steps of online recovery of a failed node.
      - G3 - Describe how to disallow user connections while preserving dbadmin connectivity.
  24. B4 - Describe the purpose of projection segmentation
      ● Provides high availability
      ● Supports recovery of data
      ● Optimizes query execution
      Juan Antonio Neve Gómez | Page 1
  25. Segmentation
      [Diagram: segmented vs. duplicated (replicated) projections]
  26. An even, random distribution of data across nodes is essential for segmentation to be effective: it keeps the load on each node to a minimum, so queries run more efficiently. Replicated projections provide high availability because all of the data is available on each node, and they help recovery because copies exist on the other nodes.
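A minimal Python sketch of hash segmentation shows why an evenly distributed key matters. The MD5-based `node_for` is an assumption chosen only because it is deterministic; Vertica's own HASH() function is different, and the sample keys are made up.

```python
import hashlib
from collections import Counter


def node_for(key, num_nodes):
    """Toy hash segmentation: hash the key, then take it modulo the node count."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def distribution(keys, num_nodes):
    """Number of rows each node receives."""
    counts = Counter(node_for(k, num_nodes) for k in keys)
    return [counts.get(n, 0) for n in range(num_nodes)]


# High-cardinality key (e.g. an order id): rows spread roughly evenly.
print(distribution(range(30_000), 3))

# Low-cardinality, skewed key (e.g. a year): every row for one year
# lands on the same node, creating a hot spot.
years = [2013] * 1_000 + [2014] * 1_000 + [2015] * 28_000
print(distribution(years, 3))
```

With the order-id key each node ends up with close to a third of the rows; with the year key, at least 28,000 of the 30,000 rows sit on a single node.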
  27. Topics (Carlos Leal):
      1. Determining segmentation and partitioning (B6)
      2. Identify the process for processing a large delete or update (D3)
      3. Distinguish between the items in a Vertica cluster (E5)
      4. Administering a cluster using Management Console (F5)
      Carlos Ivan Leal
  28. Determining Segmentation and Partitioning
      Partitioning and segmentation have completely separate functions in Vertica. It is important to clarify the differences because the concepts are similar, and the terms are often used interchangeably in other databases.
      Carlos Leal | Segmentation and Partitioning | B6
  29. Segmentation and Partitioning
      Segmentation defines how data is spread among cluster nodes, while partitioning specifies how data is organized within the individual nodes. Segmentation is defined by the projection, and partitioning is defined by the table. Logically, the PARTITION BY clause is applied after the SEGMENTED BY clause.
  30. Segmentation and Partitioning
      Segmentation and partitioning have opposite goals regarding data localization. Partitioning attempts to introduce hot spots within the node, allowing for a convenient way to drop data and reclaim the disk space. Segmentation (by hash) distributes the data evenly across all nodes in a Vertica cluster.
  31. Segmentation and Partitioning
      Partitioning by year, for example, makes sense if you intend to retain and drop data at the granularity of a year. On the other hand, segmenting the data by year would be an extremely bad choice, as the node holding data for the current year would likely answer far more queries than the other nodes.
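The two-level layout described above can be sketched in Python: segment rows across nodes by a hash of `id`, then partition them by `year` inside each node. The row shape and helper names are hypothetical, and Python's `hash()` of an int (the int itself) stands in for Vertica's HASH().

```python
from collections import defaultdict


def load(rows, num_nodes):
    """Segment by hash(id) across nodes, then partition by year within a node."""
    nodes = [defaultdict(list) for _ in range(num_nodes)]
    for row in rows:
        node = nodes[hash(row["id"]) % num_nodes]  # segmentation: which node
        node[row["year"]].append(row)              # partitioning: which container
    return nodes


def drop_partition(nodes, year):
    """Dropping a partition discards one container per node; no row scan needed."""
    for node in nodes:
        node.pop(year, None)


rows = [{"id": i, "year": 2013 + i // 3} for i in range(9)]
nodes = load(rows, 3)
print([sorted(n) for n in nodes])  # every node holds all three years
drop_partition(nodes, 2013)
print(sum(len(v) for n in nodes for v in n.values()))  # 6
```

Because the year only decides the container within a node, every node shares the work for every year, while retiring a year is still a cheap per-node drop.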
  32. D3 - Identify the process for processing a large delete or update
      ● Performance considerations for deletes and updates
      A large number of unpurged deleted rows can negatively affect query and recovery performance. To eliminate deleted rows from the result, a query must do extra processing. It has been observed that if 10% or more of the total rows in a table have been deleted, query performance on the table slows down; however, your experience may vary depending on the size of the table, the table definition, and the query. The same problem can also occur during recovery. To avoid this, the deleted rows need to be purged in Vertica. For more information, see Purge Procedure.
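A minimal Python sketch of the purge decision follows. It assumes Vertica's rule that purging can only physically remove rows deleted up to the Ancient History Mark (AHM), which Vertica exposes through functions such as PURGE_TABLE; the row dictionaries and helper names are hypothetical, and the 10% threshold is the figure quoted above, not a fixed Vertica setting.

```python
PURGE_THRESHOLD = 0.10  # the ~10% deleted-row figure quoted above; tune per table


def deleted_fraction(rows):
    """Share of stored rows that are only marked as deleted."""
    if not rows:
        return 0.0
    return sum(r["deleted_epoch"] is not None for r in rows) / len(rows)


def should_purge(rows):
    return deleted_fraction(rows) >= PURGE_THRESHOLD


def purge(rows, ahm_epoch):
    """Physically drop rows deleted at or before the AHM.

    Rows deleted after the AHM are kept, since historical queries may need them.
    """
    return [r for r in rows
            if r["deleted_epoch"] is None or r["deleted_epoch"] > ahm_epoch]


rows = [{"id": i, "deleted_epoch": None} for i in range(90)]
rows += [{"id": 90 + i, "deleted_epoch": 5 if i < 6 else 12} for i in range(10)]
print(deleted_fraction(rows), should_purge(rows))  # 0.1 True
rows = purge(rows, ahm_epoch=10)
print(len(rows), should_purge(rows))               # 94 False
```

Only the six rows deleted before the AHM disappear; the four deleted later stay until the AHM advances past their delete epoch.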
  33. Concurrency
      Deletes and updates take exclusive locks on the table. Hence, only one delete or update transaction on that table can be in progress at a time, and only when no loads (or INSERTs) are in progress. Deletes and updates on different tables can run concurrently.
  34. Optimizing Deletes and Updates for Performance
      The process of optimizing a design for deletes and updates is the same. Some simple steps to optimize a projection design or a DELETE or UPDATE statement can increase performance by tens to hundreds of times. The following section details several proposed optimizations to significantly increase delete and update performance.
  35. Topics (Manuel Loza):
      ● B2 - Define RLE
      ● C6 - Understanding both WOS and ROS
      ● E1 - Identify the steps used to add nodes to an existing cluster
      ● G1 - Define the use of Management Console in monitoring Vertica
  36. Define RLE
      Run-length encoding (RLE):
        o is an encoding method;
        o increases performance because there is less disk I/O during query execution;
        o stores more data in less space.
      How it works: RLE replaces sequences of the same data value within a column with a single value and a count.
      Typically used when data is:
        1. Sorted
        2. Low cardinality
      It can be used with any data type.
  37. Example:
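A minimal Python sketch of run-length encoding as described in slide 36: each run of identical values becomes a (value, count) pair. The sample column is made up; Vertica's on-disk RLE format is not modeled.

```python
from itertools import groupby


def rle_encode(values):
    """Replace each run of identical values with a (value, count) pair."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def rle_decode(runs):
    """Expand (value, count) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


column = ["NY", "NY", "NY", "TX", "TX", "WA"]
runs = rle_encode(column)
print(runs)  # [('NY', 3), ('TX', 2), ('WA', 1)]
assert rle_decode(runs) == column
```

Six stored values shrink to three pairs; the longer the runs (sorted, low-cardinality data), the bigger the saving.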
  38. Understanding both WOS and ROS
      Write Optimized Store (WOS)
      ● Memory-resident
      ● Receives data from INSERT, UPDATE, DELETE and COPY operations
      ● Arranged by projection
      ● Records are stored in the order they are inserted
        o Stores data without compression or indexing, which supports very fast loads
      ● A projection is sorted only when queried
        o It remains sorted until new data is inserted into it
      ● Holds both committed and uncommitted transactions
  39. Read Optimized Store (ROS)
      ● Disk storage structure
        o Highly optimized
        o Read oriented
      ● Like the WOS, the ROS is arranged by projection
        o Projections in the ROS are stored in ROS containers
      ● Makes optimal use of sorting (indexing) and compression
      ● COPY ... DIRECT and INSERT with the /*+ direct */ hint
        o Load data directly into the ROS
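The WOS/ROS split and the Tuple Mover's moveout and mergeout tasks can be sketched as a toy Python class. It is an illustration under stated assumptions, not Vertica internals: `ToyProjection` is hypothetical, rows are plain integers, the WOS is an unsorted list, and each ROS container is a sorted list.

```python
import heapq


class ToyProjection:
    """Toy model: WOS = unsorted in-memory list, ROS = sorted containers."""

    def __init__(self):
        self.wos = []  # rows in insertion order, unsorted and uncompressed
        self.ros = []  # each ROS container is a sorted list

    def load(self, rows, direct=False):
        if direct:
            # Like COPY ... DIRECT: bypass the WOS, write a new ROS container.
            self.ros.append(sorted(rows))
        else:
            self.wos.extend(rows)

    def moveout(self):
        """Moveout: sort the WOS contents into a new ROS container."""
        if self.wos:
            self.ros.append(sorted(self.wos))
            self.wos = []

    def mergeout(self):
        """Mergeout: merge the sorted ROS containers into one."""
        if len(self.ros) > 1:
            self.ros = [list(heapq.merge(*self.ros))]

    def scan(self):
        """A query sees both stores: every ROS container plus the WOS."""
        return sorted(self.wos + [row for c in self.ros for row in c])


p = ToyProjection()
p.load([5, 1, 3])
p.load([9, 2], direct=True)
print(p.wos, p.ros)  # [5, 1, 3] [[2, 9]]
p.moveout()
print(p.wos, p.ros)  # [] [[2, 9], [1, 3, 5]]
p.mergeout()
print(p.ros)         # [[1, 2, 3, 5, 9]]
print(p.scan())      # [1, 2, 3, 5, 9]
```

Small trickle loads land cheaply in the WOS, while large DIRECT loads skip it; mergeout keeps the number of ROS containers a query has to open under control.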
  40. Topics (Luis Cárdenas):
      - C2 - Define the actions of the moveout and mergeout tasks
      - D5 - Identify the advantages of merge join versus hash join
      - F2 - Features of the Vertica file used for backup and restore
      - H2 - Using event-based windows, time series, event series joins and pattern matching
  41. Topics (Ruben Gonzalez):
      - A. Vertica Architecture (Friday 20): 4. Installation of Vertica
      - C. Loading Data into Vertica (Monday 23): 4. Copying data directly to ROS
      - D. Removing Data Permanently from Vertica and Advanced Projection Design (Tuesday 24): 7. Describe the characteristics of a prejoin projection
      - F. Backup/Restore and Resource Management in Vertica (Thursday 26): 4. Describe the differences between MAXCONCURRENCY and PLANNEDCONCURRENCY
  42. Topics (Laura López):
      - B3 - Describe ORDER BY importance in projection design
      - C7 - Distinguishing between moveout and mergeout actions
      - E2 - Describe the benefits of having identically sorted buddy projections
      - G2 - Determine methods to troubleshoot Spread
  43. B3 - Describe ORDER BY importance in projection design
      ● Specifies the columns to sort the projection on.
      ● You cannot specify an ascending or descending clause: HP Vertica always uses an ascending sort order in physical storage.
      ● If you do not specify the ORDER BY table-column parameter, HP Vertica uses the order in which columns are specified as the sort order for the projection.
      ● The sort order is one of the main ways a projection can be optimized.
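Why the sort order matters can be shown in a short Python sketch: sorting a low-cardinality column collapses it into a few runs (which is what makes RLE effective), and a predicate on a sorted column becomes a contiguous range found by binary search. The column values are made up, and `run_count` is a hypothetical helper.

```python
from bisect import bisect_left, bisect_right
from itertools import groupby


def run_count(values):
    """Number of runs of identical adjacent values (i.e. RLE entries)."""
    return sum(1 for _ in groupby(values))


regions = ["west", "east", "north", "east", "west", "north", "east", "west"]
print(run_count(regions))          # 8 runs while unsorted
print(run_count(sorted(regions)))  # 3 runs once sorted

# On the sorted column, region = 'east' is one contiguous slice.
sorted_regions = sorted(regions)
lo = bisect_left(sorted_regions, "east")
hi = bisect_right(sorted_regions, "east")
print(hi - lo)                     # 3 matching rows
```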
  44. B3 - Describe ORDER BY importance in projection design
  45. Identifying characteristics of data file directory
      Disk Space Requirements for HP Vertica
      In addition to the actual data stored in the database, HP Vertica requires disk space for several data reorganization operations, such as mergeout and managing nodes in the cluster. For best results, HP recommends that disk utilization per node be no more than sixty percent (60%) for a K-safe database with K=1, to allow such operations to proceed.
  46. Identifying characteristics of data file directory
      In addition, disk space is temporarily required by certain query execution operators, such as hash joins and sorts, when they cannot be completed in memory (RAM). Such operators might be encountered during queries, recovery, refreshing projections, and so on. The amount of disk space needed (known as temp space) depends on the nature of the queries, the amount of data on the node, and the number of concurrent users on the system. By default, any unused disk space on the data disk can be used as temp space. However, HP recommends provisioning temp space separately from data disk space. See Configuring Disk Usage to Optimize Performance.
      Prepare the Logical Schema Script
      Designing a logical schema for an HP Vertica database is no different from designing one for any other SQL database. Details are described more fully in Designing a Logical Schema. To create your logical schema, prepare a SQL script (a plain text file, typically with an extension of .sql) that:
  47. Identifying characteristics of data file directory
      Prepare Data Files
      Prepare two sets of data files:
      ● Test data files. Use test files to test the database after the partial data load. If possible, use part of the actual data files to prepare the test data files.
      ● Actual data files. Once the database has been tested and optimized, use your data files for your initial bulk load (see Bulk Loading Data).
      How to Name Data Files
      Name each data file to match the corresponding table in the logical schema. Case does not matter. Use the extension .tbl or whatever you prefer. For example, if a table is named Stock_Dimension, name the corresponding data file stock_dimension.tbl. When using multiple data files, append _nnn (where nnn is a positive integer in the range 001 to 999) to the file name. For example: stock_dimension.tbl_001, stock_dimension.tbl_002, and so on.
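The naming convention above is easy to automate. This minimal Python sketch uses a hypothetical helper, `data_file_names`, that lowercases the table name, appends the extension, and adds the _001 to _999 suffix when a table is split across several files.

```python
def data_file_names(table_name, parts=1, extension=".tbl"):
    """Data file names for a table, following the convention above."""
    base = table_name.lower() + extension  # case does not matter; lowercase it
    if parts == 1:
        return [base]
    if not 1 <= parts <= 999:
        raise ValueError("multi-file suffixes run from _001 to _999")
    return [f"{base}_{n:03d}" for n in range(1, parts + 1)]


print(data_file_names("Stock_Dimension"))
# ['stock_dimension.tbl']
print(data_file_names("Stock_Dimension", parts=2))
# ['stock_dimension.tbl_001', 'stock_dimension.tbl_002']
```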
  48. Identifying characteristics of data file directory
  49. Documentation
      Core:
      ● HP Vertica Architecture White Paper (Key Features)
      ● HP Vertica 7.1 complete
      ● HP_Vertica_7.1.x_administrators Guide
      ● HP Vertica's Certification Topic List
      ● Braindumps
      ● Built-in Pools
      ● HP2-N36 Exam Prep Guide
      ● Vertica Client (32-bit)
      ● VNC Portable
      ● DBeaver
      ● PuTTY (direct download)
      ● Host: [not specified], Port: 22, User: dbadmin, Password: admin
      ● To access Vertica > VMart, run the command: "/opt/vertica/bin/admintools"
      ● Tableau (client for data extraction)
      ● JDBC Driver
  50. Documentation, part 2
      The following files are located on the installation disc:
      ● HP_Vertica_7.1.x_Administrators Guide
      ● HP_Vertica_7.1.x_Analyzing Data
      ● HP_Vertica_7.1.x_Best Practices for OEM Customers
      ● HP_Vertica_7.1.x_Concepts Guide
      ● HP_Vertica_7.1.x_Connecting To HP Vertica
      ● HP_Vertica_7.1.x_Cpp_SDK_API
      ● HP_Vertica_7.1.x_Distributed_R
      ● HP_Vertica_7.1.x_Error Messages
      ● HP_Vertica_7.1.x_Extending HP Vertica
      ● HP_Vertica_7.1.x_Flex_tables
      ● HP_Vertica_7.1.x_Flex Canonical CEF Parser
      ● HP_Vertica_7.1.x_Flextables Quickstart
      ● HP_Vertica_7.1.x_Getting Started
      ● HP_Vertica_7.1.x_HP Vertica For SQL On Hadoop
      ● HP_Vertica_7.1.x_Informatica_plug-ing_Guide
      ● HP_Vertica_7.1.x_Install_Guide
      ● HP_Vertica_7.1.x_Integrating Apache Hadoop
  51. Documentation, part 3
      The following files are located on the installation disc:
      ● HP_Vertica_7.1.x_Java_SDK_API
      ● HP_Vertica_7.1.x_MS_Connectivity_Pack
      ● HP_Vertica_7.1.x_New_Features
      ● HP_Vertica_7.1.x_Place
      ● HP_Vertica_7.1.x_Pulse
      ● HP_Vertica_7.1.x_SQL_Reference_Manual
      ● HP_Vertica_7.1.x_Supported_Platforms
      ● HP_Vertica_7.1.x_Third_Party
  53. FAQs
      1. What HDD format configuration is recommended for data and log files?
      2. What are the top best practices for configuration?