Vertica Database
A Brief Intro to Vertica Database – Navin Chakraborty
Me
• Vertica Architect
• Vertica Certified Professional
• Exploring Database engines
Topics
1. What is Vertica
2. Vertica concepts
a. Column storage
b. Hybrid Storage
c. Projections vs. Tables
d. Types of Projections
3. Vertica Objects
a. Projections
b. Views
c. Tables
d. UDF SQL functions
e. Sequences
4. Vertica Operations
a. DML operations ( Simple SQL statements )
b. Bulk Data Loading
c. Bulk Updating ( MERGE statement )
d. Exporting Data
5. Comparison with Teradata
What is Vertica
• Vertica is a true columnar, MPP, analytic database
• Vertica is a DBMS like other DBMSs (Oracle, Teradata,
MySQL)
• Vertica is made for storing structured and semi-structured data
• It supports standard SQL like other databases
Release and Versions
• The latest worldwide release of Vertica (launched in December
2013) is Vertica 7.1, codenamed Crane
• The version used in our working environments by the client is
Vertica 6.1.2, codenamed Bulldozer (funny, but they seem
to use heavy-lifting machinery names for every major
release)
Vertica Concepts – Column Storage
• Unlike other RDBMS, Vertica reads the columns from database objects
called projections. No resources are wasted by reading large numbers of
unused columns. Every byte of data is used by the execution engine. For
example, consider this simple two-table schema:
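The slide shows the schema as an image; a minimal sketch of what it plausibly contains (the column names A through H and J through Q are assumptions chosen to match the query below):
CREATE TABLE Table1 (A INT, B INT, C INT, D INT, E INT, F INT, G INT, H INT);
CREATE TABLE Table2 (J INT, K INT, L INT, M INT, N INT, O INT, P INT, Q INT);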
Suppose you want to run this query:
SELECT A, C, N
FROM Table1 JOIN Table2
ON H = J;
• A row store must read 16 columns (A through H and J through Q) from
physical storage for each record in the result set. A column store with a
query-specific projection reads only three columns: A, C, and N.
Vertica Concepts – Hybrid Storage
• To support Data Manipulation Language (DML) commands (INSERT, UPDATE, and
DELETE) and bulk load operations (COPY), intermixed with queries in a typical data
warehouse workload, Vertica implements the hybrid storage model summarized
below. This model is the same on each Vertica node.
• WOS – Write Optimized Store (in memory)
• ROS – Read Optimized Store (on disk)
• Tuple Mover – the background operation that moves data from the WOS to the ROS
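To illustrate the WOS/ROS split, DML statements accept a /*+ direct */ hint (it appears in the syntax slides later in this deck) to bypass the WOS and write straight to the ROS; a minimal sketch, assuming a hypothetical sales table:
-- Default path: rows land in the WOS; the Tuple Mover later moves them to the ROS
INSERT INTO sales VALUES (1, 'widget', 9.99);
-- Direct path: rows bypass the WOS and are written to the ROS immediately
INSERT /*+ direct */ INTO sales VALUES (2, 'gadget', 19.99);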
Vertica Concepts – Projections vs Tables
• Projections are much like materialized views
in Oracle or any other database. The key
difference in Vertica is that data is stored in
projections, not in tables.
• A quick way to remember this term is
my tagline:
“Projections are physical and Tables are logical
in Vertica”
Continued..
• All the data that you store in tables is
physically stored in the projections for those
tables, and the projections are created
automatically by Vertica when you create your
tables.
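A small sketch of this behaviour (assuming the v_catalog.projections system table; the demo table is hypothetical):
CREATE TABLE demo (id INT, name VARCHAR(20));
-- Vertica creates a superprojection for demo automatically;
-- list it from the catalog:
SELECT projection_name, anchor_table_name
FROM v_catalog.projections
WHERE anchor_table_name = 'demo';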
Vertica Concepts – Types of Projections
• Super Projections - A superprojection is a
projection for a single table that contains all
the columns in the table.
• Pre-Join Projections - Projections can contain
joins between tables that are connected by
PK/FK constraints. These projections are called
pre-join projections. Pre-join projections can
have only inner joins between tables on their
primary and foreign key columns.
Continued..
• Query specific Projections - You can optimize
your queries by creating one or more
projections that contain only the subset of
table columns required to process the query.
These projections are called query-specific
projections.
Vertica Objects - Projections
• For each table in the database, Vertica requires a
minimum of one projection, called a
superprojection. A superprojection is a projection
for a single table that contains all the columns in
the table.
• To get your database up and running quickly,
Vertica automatically creates a default
superprojection for each table created through
the CREATE TABLE and CREATE TEMPORARY
TABLE statements.
Anatomy of Projection
• When you need to create a projection manually to optimize
specific queries, use the CREATE PROJECTION statement
• The anatomy of the CREATE PROJECTION statement:
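The slide illustrates the statement with a diagram; a minimal hedged example of its shape, reusing the Employee_Dimension table defined later in this deck (the projection name and column choice are assumptions):
CREATE PROJECTION employee_dim_p (
  Employee_key,
  Employee_gender,
  Employee_last_name
) AS
SELECT Employee_key, Employee_gender, Employee_last_name
FROM Public.Employee_Dimension
ORDER BY Employee_last_name
SEGMENTED BY HASH(Employee_key) ALL NODES;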
Vertica Objects - Views
• Views in Vertica are just like views in any other
database: they store a query on base
tables, and when you select from a view the
stored query is executed on the base tables and
the data is retrieved.
• Syntax
CREATE [ OR REPLACE ] VIEW viewname [ (
column-name [, ...] ) ] AS query
Continued..
• Example
CREATE VIEW myview AS
SELECT SUM(annual_income), customer_state
FROM public.customer_dimension
WHERE customer_key IN
(SELECT customer_key
FROM store.store_sales_fact)
GROUP BY customer_state
ORDER BY customer_state ASC;
• Dropping A View
DROP VIEW myview;
Vertica Objects - Tables
• Tables are logical in Vertica. You can create,
alter, and drop a table, but this changes only
the logical schema of the
database, not the physical schema.
• A single table in Vertica can have up to 1,600
columns.
• The data is stored in the projections
corresponding to a particular table and not in
the table.
Continued..
• Syntax
CREATE TABLE [ IF NOT EXISTS ] [[db-name.]schema.]table-name
{
... ( column-definition [ , ... ] )
... | [ column-name-list ] AS [COPY] [ [ AT EPOCH LATEST ]
... | [ AT TIME 'timestamp' ] ] [ /*+ direct */ ] query
... | [ LIKE [[db-name.]schema.]existing-table [ INCLUDING PROJECTIONS | EXCLUDING
PROJECTIONS ] ]
} ... [ ORDER BY table-column [ , ... ] ]
... [ ENCODED BY column-definition [ , ... ]
... [ hash-segmentation-clause
... | range-segmentation-clause
... | UNSEGMENTED { NODE node | ALL NODES } ]
... [ KSAFE [k_num] ]
... [ PARTITION BY partition-clause ]
Continued..
Example
CREATE TABLE Public.Employee_Dimension (
Employee_key integer PRIMARY KEY NOT NULL,
Employee_gender varchar(8) ENCODING RLE,
Employee_title varchar(8),
Employee_first_name varchar(64),
Employee_middle_initial varchar(8),
Employee_last_name varchar(64))
ORDER BY Employee_gender, Employee_last_name, Employee_first_name;
Example
CREATE TABLE time( ..., date_col date NOT NULL, ...)
PARTITION BY extract('year' FROM date_col);
Vertica Objects - SQL functions
• A SQL function in Vertica lets you embed small SQL
expressions for use in your SQL statements; it also
facilitates code re-usability.
• Lets you store SQL expressions as functions in Vertica for
use in queries. These functions are useful for executing
complex queries or combining Vertica built-in functions.
You simply call the function name you assigned.
• A SQL function can be used anywhere in a query where an
ordinary SQL expression can be used, except in the table
partition clause or the projection segmentation clause.
Continued..
• Syntax
CREATE [ OR REPLACE ] FUNCTION
... [[db-name.]schema.]function-name ( [ argname argtype [, ...] ] )
... RETURN rettype
... AS
... BEGIN
...... RETURN expression;
... END;
• Example
• The following statement creates a SQL function called myzeroifnull that accepts an
INTEGER argument and returns an INTEGER result.
CREATE FUNCTION myzeroifnull(x INT) RETURN INT
AS BEGIN
RETURN (CASE WHEN (x IS NOT NULL) THEN x ELSE 0 END);
END;
Continued..
You can use the new SQL function (myzeroifnull) anywhere you use an ordinary SQL expression. For example, create a simple table:
=> CREATE TABLE tabwnulls(col1 INT);
=> INSERT INTO tabwnulls VALUES(1);
=> INSERT INTO tabwnulls VALUES(NULL);
=> INSERT INTO tabwnulls VALUES(0);
=> SELECT * FROM tabwnulls;
col1
------
1

0
(3 rows)
Use the myzeroifnull function in a SELECT statement, where the function calls col1 from table tabwnulls:
=> SELECT myzeroifnull(col1) FROM tabwnulls;
myzeroifnull
--------------
1
0
0
(3 rows)
Use the myzeroifnull function in the GROUP BY clause:
=> SELECT COUNT(*) FROM tabwnulls GROUP BY myzeroifnull(col1);
count
-------
2
1
(2 rows)
If you want to change a SQL function's body, use the CREATE OR REPLACE syntax. The following command modifies the CASE expression:
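The slide cuts off here; a plausible reconstruction of that command (assuming it simply inverts the NULL test of the original function):
CREATE OR REPLACE FUNCTION myzeroifnull(x INT) RETURN INT
AS BEGIN
RETURN (CASE WHEN (x IS NULL) THEN 0 ELSE x END);
END;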
Vertica Objects - Sequences
• A sequence is a feature offered by some database products that simply creates unique
values: it increments a value and returns it. The special thing about it is that there
is no transaction isolation, so no two transactions can get the same value, and the
increment is not rolled back. Without a database sequence it is very hard
to generate unique incrementing numbers.
Example
• Create a simple sequence:
CREATE SEQUENCE my_seq MAXVALUE 5000 START 150;
CREATE SEQUENCE
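A short usage sketch: each NEXTVAL call increments and returns the sequence value, starting from the START value (the orders table in the INSERT is hypothetical):
SELECT NEXTVAL('my_seq');   -- returns 150 (the START value) on the first call
SELECT NEXTVAL('my_seq');   -- returns 151
SELECT CURRVAL('my_seq');   -- returns 151 again, without incrementing
INSERT INTO orders (order_id) VALUES (NEXTVAL('my_seq'));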
Vertica Operations – DML Operations
• SQL data manipulation language (DML) commands INSERT, UPDATE, and DELETE
perform the same functions in Vertica as they do in row-oriented databases. These
commands follow the SQL-92 transaction model and can be intermixed.
• DML statements acquire write locks to prevent other READ COMMITTED
transactions from modifying the same data. SELECT statements do not acquire
locks, which prevents read and write statements from conflicting.
• All the DML operations (Insert, Update, Delete) are similar to other row oriented
databases.
Insert
• Inserts values into all projections of a table. You must insert one complete tuple at a
time. By default, INSERT first uses the WOS. When the WOS is full, the INSERT overflows
to the ROS.
• Syntax
INSERT [ /*+ direct */ ] [ /*+ label(label-name)*/ ]
... INTO [[db-name.]schema.]table
... [ ( column [, ...] ) ]
... { DEFAULT VALUES
... | VALUES ( { expression | DEFAULT } [, ...] )
... | SELECT... }
Examples
• => INSERT INTO t1 VALUES (101, 102, 103, 104);
• => INSERT INTO customer VALUES (10, 'male', 'DPR', 'MA', 35);
• => INSERT INTO retail.t1 (C0, C1) VALUES (1, 1001);
• => INSERT INTO films SELECT * FROM tmp_films WHERE date_prod < '2004-05-07';
Update
• Replaces the values of the specified columns in all rows for which a specific condition is true. All
other columns and rows in the table are unchanged. By default, UPDATE uses the WOS and if the
WOS fills up, overflows to the ROS.
• Self-joins are not permitted in UPDATE statements in Vertica
• UPDATE inserts new records into the WOS and marks the old records for deletion.
• You cannot UPDATE columns that have primary key or foreign key referential integrity constraints.
Syntax
UPDATE [ /*+ direct */ ] [ /*+ label(label-name)*/ ]
... [[db-name.]schema.]table-reference [AS] alias
... SET column =
... { expression | DEFAULT } [ , ... ]
... [ FROM from-list ]
... [ WHERE clause ]
Example
UPDATE Result_Table r
SET address=n.new_address
FROM New_Addresses n
WHERE r.cust_id = n.new_cust_id;
Delete
• Unlike most databases, the DELETE command in Vertica does not delete data; it marks records as
deleted.
• Marks tuples as no longer valid in the current epoch, marking the records for deletion in the WOS,
rather than deleting data from disk storage. By default, DELETE uses the WOS and, if the WOS fills up,
overflows to the ROS. You cannot delete records from a projection.
• DELETE statements support subqueries and joins, which is useful for deleting values in a table
based on values that are stored in other tables.
Syntax
DELETE [ /*+ direct */ ] [ /*+ label(label-name)*/ ]
... FROM [[db-name.]schema.]table
... [ WHERE clause ]
Example
• The following command truncates a temporary table called temp1:
DELETE FROM temp1;
• The following command deletes all records from base table T where C1 = C2 - C1.
DELETE FROM T WHERE C1=C2-C1;
• The following command deletes all records from the customer table in the retail schema where the
state attribute is in MA or NH:
DELETE FROM retail.customer WHERE state IN ('MA', 'NH');
Vertica Operations – Bulk Data Loading
• The COPY command is designed for bulk load operations and can
load data into the WOS or the ROS.
• In Vertica, the COPY statement is designed for bulk loading data into
the database. COPY reads data from text files or data pipes and
inserts it into WOS (memory) or directly into the ROS (disk). COPY
automatically commits itself and any current transaction but is not
atomic; some rows could be rejected. Note that COPY does not
automatically commit when copying data into temporary tables.
• You can use the COPY statement's NO COMMIT option to prevent
COPY from committing a transaction when it finishes copying data.
You often want to use this option when sequentially running several
COPY statements to ensure the data in the bulk load is either
committed or rolled back at the same time. Also, combining
multiple smaller data loads into a single transaction allows Vertica
to more efficiently load the data.
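A hedged sketch of the pattern described above, committing several loads as one transaction (the file paths and the sales table are hypothetical):
COPY sales FROM '/data/sales_day1.txt' DELIMITER '|' NO COMMIT;
COPY sales FROM '/data/sales_day2.txt' DELIMITER '|' NO COMMIT;
COPY sales FROM '/data/sales_day3.txt' DELIMITER '|' NO COMMIT;
-- All three loads become visible together, or roll back together:
COMMIT;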
Continued..
Syntax
COPY [ [db-name.]schema-name.]table
... [ ( { column-as-expression | column }
...... [ FILLER datatype ]
...... [ FORMAT 'format' ]
...... [ ENCLOSED BY 'char' ]
...... [ ESCAPE AS 'char' | NO ESCAPE ]
...... [ NULL [ AS ] 'string' ]
...... [ TRIM 'byte' ]
...... [ DELIMITER [ AS ] 'char' ]
... [, ... ] ) ]
... [ COLUMN OPTION ( column
...... [ FORMAT 'format' ]
...... [ ENCLOSED BY 'char' ]
...... [ ESCAPE AS 'char' | NO ESCAPE ]
...... [ NULL [ AS ] 'string' ]
...... [ DELIMITER [ AS ] 'char' ]
... [, ... ] ) ]
FROM { STDIN
...... [ BZIP | GZIP | UNCOMPRESSED ]
...| 'pathToData' [ ON nodename | ON ANY NODE ]
...... [ BZIP | GZIP | UNCOMPRESSED ] [, ...]
...| LOCAL STDIN | 'pathToData'
...... [ BZIP | GZIP | UNCOMPRESSED ] [, ...]
}
...[ NATIVE | NATIVE VARCHAR | FIXEDWIDTH COLSIZES (integer [, ....]) ]
...[ WITH ]
...[ WITH [ SOURCE source(arg='value')] [ FILTER filter(arg='value') ] [ PARSER parser(arg='value') ]]
...[ DELIMITER [ AS ] 'char' ]
...[ TRAILING NULLCOLS ]
...[ NULL [ AS ] 'string' ]
...[ ESCAPE AS 'char' | NO ESCAPE ]
...[ ENCLOSED BY 'char' ]
...[ RECORD TERMINATOR 'string' ]
...[ SKIP records ]
...[ SKIP BYTES integer ]
...[ TRIM 'byte' ]
...[ REJECTMAX integer ]
...[ EXCEPTIONS 'path' [ ON nodename ] [, ...] ]
...[ REJECTED DATA 'path' [ ON nodename ] [, ...] ]
...[ ENFORCELENGTH ]
...[ ABORT ON ERROR ]
...[ AUTO | DIRECT | TRICKLE ]
Continued..
• Examples
The following examples load data with the COPY statement using the FORMAT, DELIMITER, NULL, and ENCLOSED BY
string options, as well as the DIRECT option.
=> COPY public.customer_dimension (customer_since FORMAT 'YYYY')
FROM STDIN
DELIMITER ','
NULL AS 'null'
ENCLOSED BY '"';
=> COPY a
FROM STDIN
DELIMITER ','
NULL E'\N'
DIRECT;
=> COPY store.store_dimension
FROM :input_file
DELIMITER '|'
NULL ''
RECORD TERMINATOR E'\f';
Vertica Operations - Merge
• Lets you load a batch of new records while simultaneously updating existing
records by internally combining INSERT and UPDATE SQL operations in one
statement. In a MERGE operation, Vertica replaces the values of the specified
columns in all rows of the target table for which a specific condition is true. All
other columns and rows in the table are unchanged. By default MERGE uses the
WOS, and if the WOS fills up data overflows to the ROS.
• When you write a MERGE statement, you specify both a target and source table.
You also provide a search condition (ON clause), which Vertica uses to evaluate
each row in the source table in order to update or insert into the target table
records from the source table.
• You can use the optional WHEN MATCHED or WHEN NOT MATCHED clauses to
further refine results. For example, if you use:
• WHEN MATCHED THEN UPDATE: Vertica updates (replaces) the values of the
specified columns in all target-table rows that match a row in the
source table. All other columns and rows in the table
are unchanged. If Vertica finds more than one matching row in the source table for
a row in the target table, it returns a run-time error.
• WHEN NOT MATCHED THEN INSERT: Vertica inserts into the target table all rows
from the source table that do not match any rows in the target table.
Continued..
• Syntax
MERGE [/*+ direct */] INTO [[db-name.]schema.]target-table-name [alias]
... USING [[db-name.]schema.]source-table-name [alias ] ON ( condition )
... [ WHEN MATCHED THEN UPDATE SET column1 = value1 [, column2 = value2 ... ] ]
... [ WHEN NOT MATCHED THEN INSERT ( column1 [, column2 ...])
VALUES ( value1 [, value2 ... ] ) ]
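A minimal hedged instance of the syntax above (the table and column names are hypothetical):
MERGE INTO target_customers t
USING staging_customers s ON (t.cust_id = s.cust_id)
WHEN MATCHED THEN UPDATE SET address = s.address, state = s.state
WHEN NOT MATCHED THEN INSERT (cust_id, address, state)
VALUES (s.cust_id, s.address, s.state);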
Vertica Operations – Exporting Data
• Vertica supports spooling of data, as in Oracle: you can
retrieve all the results of a SELECT query into a
specified file.
• The \o meta-command starts spooling in vsql; after this option,
anything you select in the database is redirected to
the file instead of standard output.
• Issuing \o again ends spooling for that session
Syntax
\o /home/tmp/output.txt
SELECT * FROM store.store_sales_fact;
\o
Comparison with Teradata
• Teradata provides control-flow statements
(IF, GOTO); Vertica does not support
these statements
• The UPDATE query in Vertica differs from
Teradata's
• Vertica has no equivalent of Teradata's QUALIFY
clause (see the sketch after this list)
• Volatile tables in Teradata correspond to local
temporary tables in Vertica
• Unlike in Teradata, UPDATE queries in Vertica
cannot contain a self-join
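A hedged sketch of the QUALIFY rewrite mentioned above (the orders table and its columns are hypothetical): Teradata's QUALIFY filter on an analytic function becomes a subquery plus a WHERE clause in Vertica.
-- Teradata: latest order per customer using QUALIFY
-- SELECT * FROM orders
-- QUALIFY ROW_NUMBER() OVER (PARTITION BY cust_id ORDER BY order_date DESC) = 1;

-- Vertica: wrap the analytic function in a subquery and filter on its result
SELECT cust_id, order_id, order_date
FROM (
  SELECT cust_id, order_id, order_date,
         ROW_NUMBER() OVER (PARTITION BY cust_id ORDER BY order_date DESC) AS rn
  FROM orders
) t
WHERE rn = 1;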
Continued..
• There are no separate loading/exporting
utilities in Vertica as in Teradata
(FastLoad, FastExport); Vertica uses the vsql
terminal for all these operations
Thank you