© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Vertica
In Depth
Basic introduction
Samchu Li / Jan 3rd, 2013
Updated: Samchu Li / May 23rd, 2014
Agenda
• History
• Storage Model – compare DSM with NSM, PAX
• Column Store
• Compression
• Projection & record construction
• Joins
• Vertica SQL process (Hybrid Storage Model)
• Flex Zone
• 4C / Availability
• Query Execution Workflow
• UDx
• Eco-system
Vertica
History
C-Store & Vertica
Samchu Li / Jan 3rd, 2013
Vertica History
1. Miguel C. Ferreira, <<Compression and Query Execution within Column Oriented
Databases>>, Master of Engineering thesis in Computer Science and Electrical Engineering at
MIT; June 2005. Where C-Store comes from.
2. MIT's open-source project C-Store. <<C-Store: A Column-oriented DBMS>>, VLDB 2005
3. Vertica was founded in 2005, based on C-Store, in Billerica, Massachusetts, US. Co-founder:
Michael Stonebraker.
4. March 2011: HP acquired Vertica. <<The Vertica Analytic Database: C-Store 7 Years
Later>>, VLDB, 2012
Michael Stonebraker, a founding father of SQL Server/Sybase.
A renowned database scientist, he proposed the object-relational database model in 1992 and
was a computer science professor at UC Berkeley for 25 years. During that time he created
Ingres, Illustra, Cohera, StreamBase Systems, Vertica, and other systems. Professor
Stonebraker has also served as CEO of Informix; he is currently an adjunct professor at MIT.
Professor Stonebraker led the post-Ingres project known as Postgres. The project was
enormously fruitful, contributing to many aspects of modern databases. He also did the
whole world a favor by placing Postgres under the BSD license. Postgres has since become
PostgreSQL, and it grows more capable by the day.
Around 1987, Sybase partnered with Microsoft to co-develop SQL Server, whose original
code has roots in Ingres. The partnership ended in 1994, leaving each company with an
identical copy of the SQL Server code base. In this sense, Professor Stonebraker can be
regarded as a founding figure of today's mainstream databases.
Ingres (Michael Stonebraker) → Informix (acquired by IBM in 2000)
→ Sybase → MS SQL Server (product sold to Microsoft in 1992)
→ NonStop SQL (Tandem was acquired by Compaq, which began a rewrite in 2000; HP acquired Compaq in 2002) → Neoview → SeaQuest
→ Postgres → Illustra (acquired by Informix in 1997)
→ PostgreSQL
C-Store → Vertica
Vertica Market Share
2012
Vertica
Storage Model
NSM, DSM, PAX
Samchu Li / Jan 3rd, 2013
Column Storage
Storage Model in DB
NSM
70s ~ 1985
DSM
1985, <<A Decomposition Storage Model>>, Copeland and
Khoshafian, SIGMOD
PAX
2001, <<Weaving Relations for Cache Performance>>, Ailamaki,
DeWitt, Hill, Skounakis, VLDB
NSM
DSM
PAX (Partition Attributes Across) - MonetDB
Why is a DSM/columnar DB so quick?
ID(PK, INT4) | Name(VARCHAR5) | Age(INT2)
0962 | Jane | 30
7658 | John | 45
3859 | Jim | 20
5523 | Susan | 52
… | … | …
SELECT NAME FROM TABLEName; NAME average length 4 bytes
1. NSM
Row length = 4 + 5 + 2 = 11 bytes
1 million rows * 11 bytes / 1024 = 10742.1875 KB
1 block = 32 KB | 1 block contains 2978 complete records
Block scans = 10742.1875 KB / 32 KB ≈ 336
2. DSM
Column value length = 4 bytes
1 million values * 4 bytes / 1024 = 3906.25 KB
1 block = 32 KB | 1 block contains 8192 values
Block scans ≈ 123
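The block-scan arithmetic above can be sketched as follows (a minimal illustration; function and variable names are hypothetical):

```python
import math

def blocks_scanned(num_values, bytes_per_value, block_size=32 * 1024):
    """Number of 32 KB blocks a full scan must read."""
    total_bytes = num_values * bytes_per_value
    return math.ceil(total_bytes / block_size)

# NSM: the whole 11-byte row must be read just to get the NAME column.
nsm = blocks_scanned(1_000_000, 11)  # -> 336
# DSM: only the NAME column (averaging 4 bytes) is read.
dsm = blocks_scanned(1_000_000, 4)   # -> 123
```

The roughly 3x fewer blocks read is exactly the I/O saving the column layout buys for this query.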
Why is a DSM/columnar DB so quick?
[Bar charts: block scans, NSM ≈ 336 vs DSM ≈ 123; records per block, NSM 2978 vs DSM 8192]
Weakness - Scan performance for example
Read-Optimized Databases, In Depth; Allison L. Holloway and David J. DeWitt; 2008, VLDB
Vertica
Compression
Samchu Li / Jan 3rd, 2013
Clustering & Compression
Compression
• Trades I/O for CPU
• Increased column-store opportunities:
• Higher data value locality in column stores
• Techniques such as run length encoding far more useful
• Can use extra space to store multiple copies of data in different sort orders
• Operating Directly on Compressed Data
• I/O - CPU tradeoff is no longer a tradeoff
• Reduces memory–CPU bandwidth requirements
• Opens up possibility of operating on multiple records at once
Run-length Encoding (RLE)
Integrating Compression and Execution in Column-Oriented Database Systems, SIGMOD, 2006
Bit-vector Encoding
Integrating Compression and Execution in Column-Oriented Database Systems, SIGMOD, 2006
Dictionary Encoding
Integrating Compression and Execution in Column-Oriented Database Systems, SIGMOD, 2006
Frame Of Reference Encoding
The sign (+/−) takes one bit; with 3 offset bits the maximum offset is 111 in binary = 7
Integrating Compression and Execution in Column-Oriented Database Systems, SIGMOD, 2006
Differential Encoding
A 3-bit delta: 100 in binary = 4
Integrating Compression and Execution in Column-Oriented Database Systems, SIGMOD, 2006
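Both offset-based schemes above can be sketched in a few lines (illustrative only; names are hypothetical): frame-of-reference stores a base value plus small per-value offsets, while differential encoding stores each value as a delta from its predecessor.

```python
def for_encode(values, bits=3):
    """Frame-of-reference: a base plus small per-value offsets.
    With `bits` bits per offset, the largest representable offset is
    2**bits - 1 (e.g. 0b111 == 7 for 3 bits)."""
    base = min(values)
    offsets = [v - base for v in values]
    if max(offsets) > 2**bits - 1:
        raise ValueError("values span more than the frame allows")
    return base, offsets

def delta_encode(values):
    """Differential encoding: first value, then successive deltas."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]
```

Both work best on sorted or clustered columns, where neighboring values are close together and the offsets/deltas stay tiny.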
What Compression Scheme To Use?
Integrating Compression and Execution in Column-Oriented Database Systems, SIGMOD, 2006
Group iteration
Traditionally, a database processes records one at a time with an iterator; in a
row-oriented DB, the NSM storage model makes this cache-inefficient.
In a column-oriented DB, one iteration can cover many records at once: with RLE,
a triple like (100, 1, 100) stands for one hundred records handled in a single step.
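Group iteration over RLE triples can be sketched like this (a toy model; the (value, start, length) layout follows the slide's example, and the helper names are hypothetical):

```python
# RLE triples: (value, start_position, run_length)
rle_column = [(100, 1, 100), (250, 101, 40), (100, 141, 60)]

def rle_sum(triples):
    """SUM over the column: one multiply per run instead of one
    addition per record."""
    return sum(value * length for value, _start, length in triples)

def rle_count_equal(triples, target):
    """COUNT(*) WHERE col = target: one comparison per run."""
    return sum(length for value, _start, length in triples if value == target)
```

Each aggregate touches three triples instead of 200 individual records, which is the point of operating directly on compressed data.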
Vertica
Projection & record
construction
Samchu Li / Jan 3rd, 2013
Logical Schema & Physical Schema
Logical Schema
In traditional database architectures, data is primarily stored in tables. Additionally,
secondary tuning structures such as indexes and materialized views are created
for improved query performance.
Physical Schema
In contrast, tables do not occupy any physical storage at all in Vertica. Instead,
physical storage consists of collections of table columns called projections.
Projection
Projection
Table:
ID(PK, INT4) | Name(VARCHAR5) | Age(INT2)
0962 | Jane | 30
7658 | John | 45
3859 | Jim | 20

Super Projection:
ID(PK, INT4) | Name(VARCHAR5) | Age(INT2)
3859 | Jim | 20
5523 | Susan | 52
7658 | John | 45
0962 | Jane | 30
… | … | …

Projection 1 (sorted by Name):
Name(VARCHAR5) | ID(PK, INT4) | Age(INT2)
Jane | 0962 | 30
Jim | 3859 | 20
John | 7658 | 45

Projection 2 (sorted by Age):
Age(INT2) | ID(PK, INT4)
20 | 3859
30 | 0962
45 | 7658
52 | 5523
… | …
Projection & Index
Vertica is designed for data warehousing / big data; it is not specifically designed for
single-row lookups.
No indexes
In a highly simplified view, you can think of a Vertica projection as a single-level,
densely packed, clustered index which stores the actual data values,
is never updated in place, and has no logging. Any "maintenance" such as
merging sorted chunks or purging deleted records is done as an automatic
background activity, not in the path of real-time loads. So yes,
projections are a type of native index if you will, but they are very different
from traditional indexes like bitmaps and B-trees.
Query Benefits of Storing Sorted Data
How does Vertica query huge volumes without indexes?
It’s easy… the data is sorted by column value, something we can do
because we wrote both our storage engine and execution engine from
scratch. We don’t store the data by insert order, nor do we limit sorting to
within a set of disk blocks. Instead, we have put significant engineering
effort into keeping the data totally sorted during its entire lifetime in
Vertica. It should be clear how sorted data increases compression ratios
(by putting similar values next to each other in the data stream), but it
might be less obvious at first how we use sorted data to increase query
speed as well.
An example
SELECT stock, price FROM ticks ORDER BY stock, price;

SELECT stock, price FROM ticks WHERE stock = 'IBM' ORDER BY price;
An example
One pass aggregation
SELECT stock, AVG(price) FROM ticks GROUP BY stock;
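One-pass aggregation over sorted data can be sketched as follows (a Python illustration with hypothetical names): because rows arrive already grouped by stock, each group is contiguous, so no hash table and no re-sort are needed.

```python
from itertools import groupby

# Rows already sorted by stock, as a projection would store them.
ticks = [("GE", 10.0), ("GE", 12.0),
         ("IBM", 100.0), ("IBM", 102.0), ("IBM", 104.0)]

def avg_price_per_stock(sorted_rows):
    """One-pass AVG(price) GROUP BY stock over sorted input: each group
    can be emitted as soon as the stock symbol changes."""
    out = {}
    for stock, rows in groupby(sorted_rows, key=lambda r: r[0]):
        prices = [price for _stock, price in rows]
        out[stock] = sum(prices) / len(prices)
    return out
```

groupby only yields contiguous runs, which is exactly why the sorted storage order makes the aggregation single-pass.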
Projection
How tuples are constructed
Two ways:
1. EM (Early Materialization)
Row-oriented databases (where projections are almost always performed as soon
as an attribute is no longer needed) suggest a natural tuple construction policy:
at each point at which a column is accessed, add the column to an intermediate
tuple representation if that column is needed by some later operator or is
included in the set of output columns.
1. Perform an inner join to construct the records the operator needs
2. Then send them to the operator, which operates on the real records
Materialization Strategies in a Column-Oriented DBMS, IEEE, 2007
How tuples are constructed
2. LM (Late Materialization)
a. First, scan a column's blocks and output the positions (ordinal offsets of
values within the column) that satisfy the predicate
b. Repeat with the other columns referenced by operations such as WHERE…
(these positions can take the form of ranges, lists, or a bitmap)
c. Use position-wise AND operations to intersect the position lists
d. Finally, re-access these columns, extract the values of records that satisfy
all predicates, and stitch these values together into output tuples
Materialization Strategies in a Column-Oriented DBMS, IEEE, 2007
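Steps a–d can be sketched with position sets (a toy illustration reusing the earlier sample columns; helper names are hypothetical):

```python
ids   = [962, 7658, 3859, 5523]
names = ["Jane", "John", "Jim", "Susan"]
ages  = [30, 45, 20, 52]

def positions(column, pred):
    """Steps a/b: scan one column, emit ordinal positions satisfying pred."""
    return {i for i, v in enumerate(column) if pred(v)}

# WHERE age > 25 AND name LIKE 'J%'
# Step c: position-wise AND (a set intersection here; could be a bitmap).
pos = positions(ages, lambda a: a > 25) & positions(names, lambda n: n.startswith("J"))

# Step d: re-access only the output columns and stitch tuples together.
result = [(ids[i], names[i]) for i in sorted(pos)]  # -> [(962, 'Jane'), (7658, 'John')]
```

Until step d, only positions flow between operators; full tuples are materialized as late as possible.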
When should tuples be constructed? - EM
Early Materialized – No join
When should tuples be constructed? -LM
Late Materialized – with Joins
When should tuples be constructed? -LM
Late Materialized – with Joins
When should tuples be constructed? -LM
Late Materialized – with Joins
When should tuples be constructed? -LM
Late Materialized – with Joins
EM with Joins
EM with Joins
LM with Joins
EM vs LM
A naïve LM join is about 2× slower than an EM join on typical queries (due to
random I/O)
This number is very dependent on:
the amount of memory available
the number of projected attributes
the join cardinality
But newer join algorithms let LM do better:
Invisible Join
Jive/Flash Join
Radix cluster/decluster join
Pre-join projections
Vertica supports prejoin projections which permit joining the projection’s anchor
table with any number of dimension tables via N:1 joins. This permits a
normalized logical schema, while allowing the physical storage to be
denormalized. The cost of storing physically denormalized data is much less
than in traditional systems because of the available encoding and compression.
Prejoin projections are not used as often in practice as we expected. This is
because Vertica’s execution engine handles joins with small dimension tables
very well (using highly optimized hash and merge join algorithms), so the
benefits of a prejoin for query execution are not as significant as we initially
predicted.
<<The Vertica Analytic Database: C-Store 7 Years Later>>, VLDB, 2012
Pre-join projections
In the case of joins involving a fact and a large dimension table or
two large fact tables where the join cost is high, most customers
are unwilling to slow down bulk loads to optimize such joins. In addition, joins
during load offer fewer optimization opportunities than joins during query
because the database knows nothing a priori about the data in the load stream.
Pre-join projections can have only inner joins between tables on their primary
and foreign key columns. Outer joins are not allowed.
<<The Vertica Analytic Database: C-Store 7 Years Later>>, VLDB, 2012
Vertica
Joins
Samchu Li / Jan 3rd, 2013
Invisible Join
Designed for typical joins when data is modeled using a star schema
One(“Fact”) table is joined with multiple dimension tables
select c_nation, s_nation, d_year,sum(lo_revenue) as revenue
from customer, lineorder, supplier, date
where lo_custkey = c_custkey
and lo_suppkey = s_suppkey
and lo_orderdate = d_datekey
and c_region = 'ASIA'
and s_region = 'ASIA'
and d_year >= 1992 and d_year <= 1997
group by c_nation, s_nation, d_year
order by d_year asc, revenue desc;
Invisible Join
Invisible Join
Invisible Join
Invisible Join
Bottom Line
Many data warehouses model data using star/snowflake schemas
Joins of one (fact) table with many dimension tables are common
Invisible join takes advantage of this by making sure that the table that can be accessed in
position order is the fact table for each join
Position lists from the fact table are then intersected (in position order)
This reduces the amount of data that must be accessed out of order from the dimension
tables
“Between-predicate rewriting” trick not relevant for this discussion
Invisible Join
Jive/Flash Join
Jive/Flash Join
Bottom Line
Instead of probing projected columns from the inner table out of order:
• Sort the join index
• Probe projected columns in order
• Sort the result using an added column
LM vs EM tradeoffs:
LM has the extra sorts (EM accesses all columns in order)
LM only has to fit join columns into memory (EM needs join columns and all
projected columns)
• Results in big memory and CPU savings
LM only has to materialize relevant columns
In many cases LM advantages outweigh disadvantages
LM would be a clear winner if not for those pesky sorts …
Radix Cluster/Decluster
The full sort from the Jive join is actually overkill
We just want to access the storage blocks in order (we don't mind random access
within a block)
So do a radix sort and stop early
By stopping early, data within each block is accessed out of order, but in the order
specified in the original join index
• Use this pseudo-order to accelerate the post-probe sort as well
Radix Sort
Pad all values to be compared (positive integers) to the same number of digits,
prefixing shorter values with zeros. Then sort digit by digit, starting from the least
significant digit. After sorting from the lowest digit through the highest digit, the
sequence is ordered.
Radix Sort Example - LSD
LSD (Least Significant Digit); positive integers
Start from the right side.
73, 22, 93, 43, 55, 14, 28, 65, 390, 81
First pass, by the units digit (buckets 0–9):
0: 390 | 1: 081 | 2: 022 | 3: 073, 093, 043 | 4: 014 | 5: 055, 065 | 8: 028
Result: 390, 081, 022, 073, 093, 043, 014, 055, 065, 028
Radix Sort Example - LSD
Second pass, by the tens digit:
390, 081, 022, 073, 093, 043, 014, 055, 065, 028
1: 014 | 2: 022, 028 | 4: 043 | 5: 055 | 6: 065 | 7: 073 | 8: 081 | 9: 390, 093
Result: 014, 022, 028, 043, 055, 065, 073, 081, 390, 093
Radix Sort Example - LSD
Last pass, by the hundreds digit:
014, 022, 028, 043, 055, 065, 073, 081, 390, 093
0: 014, 022, 028, 043, 055, 065, 073, 081, 093 | 3: 390
Result: 014, 022, 028, 043, 055, 065, 073, 081, 093, 390
Radix Sort Example - MSD
MSD (Most Significant Digit); positive integers
Start from the left side; suitable when values have many digits.
The process is the same.
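The LSD passes above can be condensed into a short implementation for non-negative integers (a sketch, not tuned code):

```python
def lsd_radix_sort(nums):
    """LSD radix sort: bucket by each decimal digit from least to most
    significant; because each pass is stable, earlier orderings survive."""
    width = len(str(max(nums)))  # number of decimal digits to process
    for d in range(width):
        buckets = [[] for _ in range(10)]
        for n in nums:
            buckets[(n // 10**d) % 10].append(n)
        # Concatenate buckets 0..9 to form the input of the next pass.
        nums = [n for bucket in buckets for n in bucket]
    return nums
```

Running it on the slide's input reproduces the result of the three passes shown above.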
Radix Cluster/Decluster
Bottom line
Both sorts from the Jive join can be significantly reduced in overhead
Only tested when there is sufficient memory for the entire join index to be stored
three times
The technique is likely applicable to larger join indexes, but its utility will drop a little
Only works if random access within a storage block is acceptable
Don't use radix cluster/decluster if you have variable-width column values or
compression schemes that can only be decompressed starting from the beginning
of the block
Tuple Construction Heuristics
For queries with selective predicates, aggregations, or compressed data,
use late materialization
For joins:
Research papers:
Always use late materialization
Commercial systems:
Inner table to a join often materialized before join (reduces system complexity):
Some systems will use LM only if columns from inner table can fit entirely in memory
Query Optimization
Almost all query optimization is automatic, or done via the Vertica Database
Designer; there is little to do manually. Three generations:
1. StarOpt
Only optimizes data-warehouse-style queries (star and snowflake).
2. StarifiedOpt
Adds optimization for non-star/snowflake queries.
3. V2Opt
Reworked query optimization: a set of extensible modules, new algorithms for
using statistics, …
Vertica SQL process
(Hybrid Storage Model)
Samchu Li / Jan 3rd, 2013
Vertica SQL process (Hybrid Storage Model)
Disk Physical Structure
http://blog.163.com/sonyericssonss/blog/static/109683969200911233723670/ (a detailed, easy-to-follow illustrated explanation of hard-disk structure)
I/O – Sequential I/O & Random I/O
Sequential vs. random refers to whether the starting sector address of this I/O matches (or is
close to) the ending sector address of the previous I/O. If it does, this I/O counts as sequential;
if the gap is large, it counts as random.
With sequential I/O, since the starting sector is very close to the previous ending sector, the
head barely needs to seek, or the seek time is very short. If the gap is large, the head needs a
long seek; with many random I/Os, the head keeps seeking and efficiency drops dramatically.
One of the most important aspects of database performance tuning is I/O. Roughly, a
15,000 RPM server disk can deliver about 75 non-sequential (random) I/O operations and about
150 sequential I/O operations per second. Such a disk is typically rated at around 100 MB/s,
but what actually limits a database server's throughput is those 75/150 I/Os per second.
Suppose each I/O operates on an 8 KB block; then:
75 random I/Os per second * 8 KB = 600 KB/s
150 sequential I/Os per second * 8 KB = 1200 KB/s — a huge gap from the rated 100 MB/s
In practice it is not quite this bad: each I/O moves more data than this, and there are
read-ahead and disk-cache mechanisms.
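The throughput arithmetic above, as a small sketch (names are hypothetical):

```python
def effective_throughput_kb_s(iops, io_size_kb=8):
    """Effective disk throughput when the bottleneck is I/O operations
    per second rather than the disk's rated transfer speed."""
    return iops * io_size_kb

random_kb_s = effective_throughput_kb_s(75)       # -> 600 KB/s
sequential_kb_s = effective_throughput_kb_s(150)  # -> 1200 KB/s
```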
B+ Tree
If an insert lands in the last leaf node, it is fast, because it is sequential I/O.
B+ Tree
But when a write operation (update, insert, delete) needs to touch many leaf
nodes, many random I/Os occur; time is wasted on disk seeks and efficiency
is low.
How to avoid this problem?
1. Give up some read performance
a. COLA (Cache-Oblivious Lookahead Array): TokuDB
b. LSM tree (Log-Structured Merge tree): Vertica, Cassandra, HBase, BDB
Java Edition, LevelDB, etc.
2. Memory / SSD
The Design and Implementation of a Log-Structured File System, 1996
LSM tree
The Design and Implementation of a Log-Structured File System, 1996
LSM tree
The Design and Implementation of a Log-Structured File System, 1996
Vertica SQL process (Hybrid Storage Model)
WOS
INSERT / COPY / DELETE / UPDATE → WOS (memory), for small data.
Data in the WOS is solely in memory, where column- or row-oriented doesn't
matter. When using the WOS, data goes straight into memory; no sorting,
clustering, or compressing is needed.

In-memory fragments:
Cust | Price
Andrew | $100.00

Cust | Price
Andrew | $98.00

Cust | Price
Nga | $90.00
Chuck | $87.00

Merge:
ID | Cust
1,2 | Andrew
3 | Chuck
4 | Nga

ID | Price
1 | $98.00
2 | $100.00
3 | $87.00
4 | $90.00

Tuple Mover: Moveout → ROS
Cust | Price
Andrew | $98.00
… | …
Merges multiple small files into large ones, with sorting, clustering, and compressing.
ROS
INSERT / COPY / DELETE / UPDATE → ROS, for large data.
Slow: needs sorting, clustering, compressing, and so on…
In the real world, prefer the WOS; it is fast, and suits large data divided into
batches of small jobs. Alternatively, tune the MoveOutInterval and
MoveOutSizePct parameters so the WOS moves data out more quickly. But be
careful: Vertica can go down if your workload is heavy and the WOS runs out
of memory.
Data Modifications and Delete Vectors
Data in Vertica is never modified in place. When a tuple
is deleted or updated from either the WOS or ROS, Vertica
creates a delete vector. A delete vector is a list of positions of
rows that have been deleted. Delete vectors are stored in the
same format as user data: they are first written to a DVWOS
in memory, then moved to DVROS containers on disk by the
tuple mover and stored using efficient compression mechanisms. There may be
multiple delete vectors for the WOS and multiple delete vectors for any
particular ROS container. SQL UPDATE is supported by deleting the row being
updated and then inserting a row containing the updated column values
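The delete-vector mechanics can be sketched as a toy model (this is not Vertica's actual storage format; names are hypothetical):

```python
# An immutable container of column values; deletions only record positions.
column = ["Andrew", "Chuck", "Nga"]
delete_vector = set()  # positions of logically deleted rows

def delete_where(pred):
    """DELETE: mark matching positions instead of rewriting storage."""
    delete_vector.update(i for i, v in enumerate(column) if pred(v))

def update(pos, new_value):
    """UPDATE = delete the old row + insert a row with the new value."""
    delete_vector.add(pos)
    column.append(new_value)

def visible_rows():
    """What a query sees: stored values minus deleted positions."""
    return [v for i, v in enumerate(column) if i not in delete_vector]
```

The stored data is never modified in place; only the delete vector and the tail of the container grow, which keeps all writes sequential.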
Historical Queries – EPOCH
An epoch is associated with each COMMIT; the current_epoch at the time of the
COMMIT is the epoch for that load.
Vertica supports historical queries, though it's not a common use case for
most customers. You can only query epochs that are after the current
AHM, which is kept aggressively current by default. Deleted data prior to
the AHM (Ancient History Mark) is eligible for being purged when a
mergeout or explicit purge happens. After it's purged, delete vectors no
longer need to be maintained. The Last Good Epoch is the epoch at which
all data has been written from WOS to ROS. Any data after the LGE will be
lost if the cluster shuts down abnormally from something like a power
loss or a set of exceptions across multiple nodes. Refresh Epoch - don't
worry about it, it doesn't get referenced in practice.
Example
dbadmin=> select current_epoch from system;
 current_epoch
---------------
 44
(1 row)

dbadmin=> insert into A values(1); commit;
 OUTPUT
--------
 1
(1 row)
COMMIT

dbadmin=> select current_epoch from system;
 current_epoch
---------------
 45

dbadmin=> insert into A values(2); commit;
 OUTPUT
--------
 1
(1 row)

dbadmin=> at epoch 46 select * from A;
 i
---
 1
 2
(2 rows)

dbadmin=> at epoch 45 select * from A;
 i
---
 1
(1 row)
Example
dbadmin=> select make_ahm_now();
 make_ahm_now
-----------------------------
 AHM set (New AHM Epoch: 46)
(1 row)

dbadmin=> at epoch 45 select * from A;
ERROR 2318: Can't run historical queries at epochs prior to the Ancient History Mark
Flex Zone
Samchu Li / May 23rd, 2013
Flex Zone – New with 7.0!
Easily load, explore, analyze and monetize semi-structured data such as
text, videos, call records
More information in Loading Data module
Vertica Analytics
Flex Zone Tables
Store
and
Explore
Columnar Tables
Daily Analytics
Vertica
4C / Availability
Samchu Li / Jan 3rd, 2013
4C
K-Safety – Clustering/MPP
Your database must have a minimum number of nodes to be able to have a K-
safety level greater than zero.
Note: Vertica does not officially support values of K higher than 2.
K-level | Number of nodes required
0 | 1+
1 | 3+
2 | 5+
K | 2K+1
K-Safety
K=1
Projection
Segmentation & Partition
Segmentation
Vertica's segmentation corresponds to Neoview's partitioning: it distributes a projection's rows across the nodes of the cluster.
• Hash segmentation
• Range segmentation
Partition
Within a single node, you can further divide a segmented table's data into parts (by range, typically on a date column) to improve performance and simplify data management.
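For example (table and column names are hypothetical), segmentation and partitioning are declared separately:

```sql
-- Partitioning: splits each node's local data, typically by date range
CREATE TABLE sales (
    sale_id   INT NOT NULL,
    cust_id   INT,
    sale_date DATE NOT NULL,
    amount    NUMERIC(10,2)
)
PARTITION BY EXTRACT(year FROM sale_date);

-- Segmentation: distributes a projection's rows across nodes by hash
CREATE PROJECTION sales_p AS
    SELECT * FROM sales
    ORDER BY sale_date
    SEGMENTED BY HASH(sale_id) ALL NODES;
```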
Segmentation & Partition
(Diagram: segmentation across nodes and partitioning within nodes.)
Vertica
Query Execution
Workflow
Samchu Li / Jan 3rd, 2013
(Diagram: MPP query execution workflow.)
Vertica
UDx
Samchu Li / Jan 3rd, 2013
UDx
User Defined Extension (UDx) refers to all extensions to Vertica developed
using the APIs in the Vertica SDK.
Five types of user-defined functions (UDFs):
• User Defined Scalar Functions (UDSFs)
• User Defined Transform Functions (UDTFs)
• User Defined Aggregate Functions (UDAFs)
• User Defined Analytic Functions (UDAnFs)
• User Defined Load (UDL)
Fenced Mode
Fenced mode runs UDx code outside of the main Vertica process in a separate
zygote process. UDx code that crashes while running in fenced mode does not
impact the core Vertica process. There is a small performance impact when
running UDx code in fenced mode. On average, using fenced mode adds about
10% more time to execution compared to unfenced mode.
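Fencing is chosen when the function is registered; a sketch of the SQL (library path and factory name are hypothetical):

```sql
-- Load a compiled UDx library into Vertica
CREATE LIBRARY mylib AS '/home/dbadmin/AddUdx.so';

-- Register the function to run in fenced mode (the default);
-- NOT FENCED would run it inside the main Vertica process
CREATE FUNCTION add2ints AS LANGUAGE 'C++'
    NAME 'Add2IntsFactory' LIBRARY mylib FENCED;
```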
Zygote process
The Vertica zygote process starts when Vertica starts; each node has a single zygote process. Side processes are created on demand: the zygote listens for requests and, when a user calls a UDx, spawns a side session that runs the UDx in fenced mode.
Vertica R
User Defined Functions developed in R always run in Fenced Mode in a
process outside of the main Vertica process.
You can create Scalar Functions and Transform Functions using the R
language. Other UDx types are not supported with the R language.
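Registering R functions follows the same SQL path as other UDxs (file and factory names are hypothetical; the factory is defined in the R source file):

```sql
-- Load an R source file as a library
CREATE LIBRARY rlib AS '/home/dbadmin/mul.R' LANGUAGE 'R';

-- Register a scalar function from its R factory; R UDxs always run fenced
CREATE FUNCTION mul AS LANGUAGE 'R' NAME 'mulFactory' LIBRARY rlib;
```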
R Packages
The Vertica R Language Pack includes the following R packages in addition to
the default packages bundled with R:
• Rcpp
• RInside
• lpSolve
• lpSolveAPI
Vertica R
The R programming language is quickly gaining popularity among data
scientists for performing statistical analyses. It is extensible and has a large
community of users, many of whom contribute packages to extend its
capabilities. However, it is single-threaded and limited by the amount of
RAM on the machine it is running on, which makes it challenging to run R
programs on big data.
There are efforts under way to remedy this situation, which essentially fall
into one of the following two categories:
• Integrate R into a parallel database, or
• Parallelize R so it can process big data
Running multiple instances of the R algorithm in parallel (query partitioned data)
The first major performance benefit of the Vertica R implementation comes from running multiple instances of an R algorithm in parallel, using queries that chunk the data independently.
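The chunking is expressed with an OVER (PARTITION BY ...) clause on a transform function, so each partition can be processed by a separate R instance (function, table, and column names are hypothetical):

```sql
-- One R instance can run per partition, in parallel across the cluster
SELECT kmeans_r(x, y) OVER (PARTITION BY region)
FROM points;
```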
(Chart: multiple instances of the R algorithm running in parallel on partitioned data.)
Leveraging column-store technology for optimized data exchange (query non-partitioned data)
Even for non-data-parallel tasks (functions whose input is one big chunk of non-partitioned data), Vertica's implementation provides better performance: the computation runs on the server instead of the client, and the data flow between the database and R is optimized (no need to parse the data again).
(Chart: data-exchange performance between Vertica and R for non-partitioned data.)
Leveraging column-store technology for optimized data exchange (query non-partitioned data)
As the chart above indicates, performance improvements are also achieved by optimizing the data transfers between Vertica and R. Since Vertica is a column store and R is vector-based, it is very efficient to move data from a Vertica column to R vectors in very large blocks.
Vertica eco-system
(Diagram: Vertica eco-system — the built-in engine handles unstructured data via Flex Table support, with RDBMS store/index/table, ETL, a Hadoop/Pig/HDFS/HCatalog connector, and reporting.)
Thank you
More Related Content

What's hot

Learning Apache HIVE - Data Warehouse and Query Language for Hadoop
Learning Apache HIVE - Data Warehouse and Query Language for HadoopLearning Apache HIVE - Data Warehouse and Query Language for Hadoop
Learning Apache HIVE - Data Warehouse and Query Language for Hadoop
Someshwar Kale
 
Mutable Data in Hive's Immutable World
Mutable Data in Hive's Immutable WorldMutable Data in Hive's Immutable World
Mutable Data in Hive's Immutable World
DataWorks Summit
 
NoSQL Needs SomeSQL
NoSQL Needs SomeSQLNoSQL Needs SomeSQL
NoSQL Needs SomeSQL
DataWorks Summit
 
Large scale ETL with Hadoop
Large scale ETL with HadoopLarge scale ETL with Hadoop
Large scale ETL with Hadoop
OReillyStrata
 
Cost-based query optimization in Apache Hive
Cost-based query optimization in Apache HiveCost-based query optimization in Apache Hive
Cost-based query optimization in Apache Hive
Julian Hyde
 
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators GuideHPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
Andrey Karpov
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
DataWorks Summit
 
Hadoop & Greenplum: Why Do Such a Thing?
Hadoop & Greenplum: Why Do Such a Thing?Hadoop & Greenplum: Why Do Such a Thing?
Hadoop & Greenplum: Why Do Such a Thing?
Ed Kohlwey
 
PGConf.ASIA 2019 Bali - Partitioning in PostgreSQL - Amit Langote
PGConf.ASIA 2019 Bali -  Partitioning in PostgreSQL - Amit LangotePGConf.ASIA 2019 Bali -  Partitioning in PostgreSQL - Amit Langote
PGConf.ASIA 2019 Bali - Partitioning in PostgreSQL - Amit Langote
Equnix Business Solutions
 
Data organization: hive meetup
Data organization: hive meetupData organization: hive meetup
Data organization: hive meetup
t3rmin4t0r
 
Greenplum Database Open Source December 2015
Greenplum Database Open Source December 2015Greenplum Database Open Source December 2015
Greenplum Database Open Source December 2015
PivotalOpenSourceHub
 
Meet hbase 2.0
Meet hbase 2.0Meet hbase 2.0
Meet hbase 2.0
enissoz
 
HPE Keynote Hadoop Summit San Jose 2016
HPE Keynote Hadoop Summit San Jose 2016HPE Keynote Hadoop Summit San Jose 2016
HPE Keynote Hadoop Summit San Jose 2016
DataWorks Summit/Hadoop Summit
 
MADlib Architecture and Functional Demo on How to Use MADlib/PivotalR
MADlib Architecture and Functional Demo on How to Use MADlib/PivotalRMADlib Architecture and Functional Demo on How to Use MADlib/PivotalR
MADlib Architecture and Functional Demo on How to Use MADlib/PivotalR
PivotalOpenSourceHub
 
SparkR best practices for R data scientist
SparkR best practices for R data scientistSparkR best practices for R data scientist
SparkR best practices for R data scientist
DataWorks Summit
 
Where does hadoop come handy
Where does hadoop come handyWhere does hadoop come handy
Where does hadoop come handyPraveen Sripati
 
Performance tuning your Hadoop/Spark clusters to use cloud storage
Performance tuning your Hadoop/Spark clusters to use cloud storagePerformance tuning your Hadoop/Spark clusters to use cloud storage
Performance tuning your Hadoop/Spark clusters to use cloud storage
DataWorks Summit
 
Tez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthelTez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthel
t3rmin4t0r
 
Integration of HIve and HBase
Integration of HIve and HBaseIntegration of HIve and HBase
Integration of HIve and HBaseHortonworks
 
SQL and Machine Learning on Hadoop using HAWQ
SQL and Machine Learning on Hadoop using HAWQSQL and Machine Learning on Hadoop using HAWQ
SQL and Machine Learning on Hadoop using HAWQ
pivotalny
 

What's hot (20)

Learning Apache HIVE - Data Warehouse and Query Language for Hadoop
Learning Apache HIVE - Data Warehouse and Query Language for HadoopLearning Apache HIVE - Data Warehouse and Query Language for Hadoop
Learning Apache HIVE - Data Warehouse and Query Language for Hadoop
 
Mutable Data in Hive's Immutable World
Mutable Data in Hive's Immutable WorldMutable Data in Hive's Immutable World
Mutable Data in Hive's Immutable World
 
NoSQL Needs SomeSQL
NoSQL Needs SomeSQLNoSQL Needs SomeSQL
NoSQL Needs SomeSQL
 
Large scale ETL with Hadoop
Large scale ETL with HadoopLarge scale ETL with Hadoop
Large scale ETL with Hadoop
 
Cost-based query optimization in Apache Hive
Cost-based query optimization in Apache HiveCost-based query optimization in Apache Hive
Cost-based query optimization in Apache Hive
 
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators GuideHPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
 
Hadoop & Greenplum: Why Do Such a Thing?
Hadoop & Greenplum: Why Do Such a Thing?Hadoop & Greenplum: Why Do Such a Thing?
Hadoop & Greenplum: Why Do Such a Thing?
 
PGConf.ASIA 2019 Bali - Partitioning in PostgreSQL - Amit Langote
PGConf.ASIA 2019 Bali -  Partitioning in PostgreSQL - Amit LangotePGConf.ASIA 2019 Bali -  Partitioning in PostgreSQL - Amit Langote
PGConf.ASIA 2019 Bali - Partitioning in PostgreSQL - Amit Langote
 
Data organization: hive meetup
Data organization: hive meetupData organization: hive meetup
Data organization: hive meetup
 
Greenplum Database Open Source December 2015
Greenplum Database Open Source December 2015Greenplum Database Open Source December 2015
Greenplum Database Open Source December 2015
 
Meet hbase 2.0
Meet hbase 2.0Meet hbase 2.0
Meet hbase 2.0
 
HPE Keynote Hadoop Summit San Jose 2016
HPE Keynote Hadoop Summit San Jose 2016HPE Keynote Hadoop Summit San Jose 2016
HPE Keynote Hadoop Summit San Jose 2016
 
MADlib Architecture and Functional Demo on How to Use MADlib/PivotalR
MADlib Architecture and Functional Demo on How to Use MADlib/PivotalRMADlib Architecture and Functional Demo on How to Use MADlib/PivotalR
MADlib Architecture and Functional Demo on How to Use MADlib/PivotalR
 
SparkR best practices for R data scientist
SparkR best practices for R data scientistSparkR best practices for R data scientist
SparkR best practices for R data scientist
 
Where does hadoop come handy
Where does hadoop come handyWhere does hadoop come handy
Where does hadoop come handy
 
Performance tuning your Hadoop/Spark clusters to use cloud storage
Performance tuning your Hadoop/Spark clusters to use cloud storagePerformance tuning your Hadoop/Spark clusters to use cloud storage
Performance tuning your Hadoop/Spark clusters to use cloud storage
 
Tez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthelTez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthel
 
Integration of HIve and HBase
Integration of HIve and HBaseIntegration of HIve and HBase
Integration of HIve and HBase
 
SQL and Machine Learning on Hadoop using HAWQ
SQL and Machine Learning on Hadoop using HAWQSQL and Machine Learning on Hadoop using HAWQ
SQL and Machine Learning on Hadoop using HAWQ
 

Similar to Vertica

Things learned from OpenWorld 2013
Things learned from OpenWorld 2013Things learned from OpenWorld 2013
Things learned from OpenWorld 2013
Connor McDonald
 
Cloudera's Original Pitch Deck from 2008
Cloudera's Original Pitch Deck from 2008Cloudera's Original Pitch Deck from 2008
Cloudera's Original Pitch Deck from 2008
Accel
 
Informix warehouse and accelerator overview
Informix warehouse and accelerator overviewInformix warehouse and accelerator overview
Informix warehouse and accelerator overview
Keshav Murthy
 
SQLDay2013_MarcinSzeliga_SQLServer2012FastTrackDWReferenceArchitectures
SQLDay2013_MarcinSzeliga_SQLServer2012FastTrackDWReferenceArchitecturesSQLDay2013_MarcinSzeliga_SQLServer2012FastTrackDWReferenceArchitectures
SQLDay2013_MarcinSzeliga_SQLServer2012FastTrackDWReferenceArchitecturesPolish SQL Server User Group
 
Tcod a framework for the total cost of big data - december 6 2013 - winte...
Tcod   a framework for the total cost of big data  - december 6 2013  - winte...Tcod   a framework for the total cost of big data  - december 6 2013  - winte...
Tcod a framework for the total cost of big data - december 6 2013 - winte...
Richard Winter
 
Run Your First Hadoop 2.x Program
Run Your First Hadoop 2.x ProgramRun Your First Hadoop 2.x Program
Run Your First Hadoop 2.x Program
Skillspeed
 
HP flash optimized storage - webcast
HP flash optimized storage - webcastHP flash optimized storage - webcast
HP flash optimized storage - webcast
Calvin Zito
 
The New Database Frontier: Harnessing the Cloud
The New Database Frontier: Harnessing the CloudThe New Database Frontier: Harnessing the Cloud
The New Database Frontier: Harnessing the Cloud
Inside Analysis
 
Oracle Database Migration to Oracle Cloud Infrastructure
Oracle Database Migration to Oracle Cloud InfrastructureOracle Database Migration to Oracle Cloud Infrastructure
Oracle Database Migration to Oracle Cloud Infrastructure
SinanPetrusToma
 
Oracle Database 12c para la comunidad GeneXus - Engineered for clouds
Oracle Database 12c para la comunidad GeneXus - Engineered for cloudsOracle Database 12c para la comunidad GeneXus - Engineered for clouds
Oracle Database 12c para la comunidad GeneXus - Engineered for clouds
GeneXus
 
Guob consolidation implementation11gr2
Guob consolidation implementation11gr2Guob consolidation implementation11gr2
Guob consolidation implementation11gr2Rodrigo Almeida
 
EMC config Hadoop
EMC config HadoopEMC config Hadoop
EMC config Hadoop
solarisyougood
 
Snowflake’s Cloud Data Platform and Modern Analytics
Snowflake’s Cloud Data Platform and Modern AnalyticsSnowflake’s Cloud Data Platform and Modern Analytics
Snowflake’s Cloud Data Platform and Modern Analytics
Senturus
 
Oracle NoSQL Database release 3.0 overview
Oracle NoSQL Database release 3.0 overviewOracle NoSQL Database release 3.0 overview
Oracle NoSQL Database release 3.0 overviewPaulo Fagundes
 
NoSQL and MySQL
NoSQL and MySQLNoSQL and MySQL
NoSQL and MySQL
Ted Wennmark
 
Slides pentaho-hadoop-weka
Slides pentaho-hadoop-wekaSlides pentaho-hadoop-weka
Slides pentaho-hadoop-wekalucboudreau
 
What_to_expect_from_oracle_database_12c
What_to_expect_from_oracle_database_12cWhat_to_expect_from_oracle_database_12c
What_to_expect_from_oracle_database_12c
Maria Colgan
 

Similar to Vertica (20)

Things learned from OpenWorld 2013
Things learned from OpenWorld 2013Things learned from OpenWorld 2013
Things learned from OpenWorld 2013
 
Cloudera's Original Pitch Deck from 2008
Cloudera's Original Pitch Deck from 2008Cloudera's Original Pitch Deck from 2008
Cloudera's Original Pitch Deck from 2008
 
Informix warehouse and accelerator overview
Informix warehouse and accelerator overviewInformix warehouse and accelerator overview
Informix warehouse and accelerator overview
 
SQLDay2013_MarcinSzeliga_SQLServer2012FastTrackDWReferenceArchitectures
SQLDay2013_MarcinSzeliga_SQLServer2012FastTrackDWReferenceArchitecturesSQLDay2013_MarcinSzeliga_SQLServer2012FastTrackDWReferenceArchitectures
SQLDay2013_MarcinSzeliga_SQLServer2012FastTrackDWReferenceArchitectures
 
Tcod a framework for the total cost of big data - december 6 2013 - winte...
Tcod   a framework for the total cost of big data  - december 6 2013  - winte...Tcod   a framework for the total cost of big data  - december 6 2013  - winte...
Tcod a framework for the total cost of big data - december 6 2013 - winte...
 
Delphix2
Delphix2Delphix2
Delphix2
 
Delphix2
Delphix2Delphix2
Delphix2
 
Run Your First Hadoop 2.x Program
Run Your First Hadoop 2.x ProgramRun Your First Hadoop 2.x Program
Run Your First Hadoop 2.x Program
 
HP flash optimized storage - webcast
HP flash optimized storage - webcastHP flash optimized storage - webcast
HP flash optimized storage - webcast
 
Delphix
DelphixDelphix
Delphix
 
The New Database Frontier: Harnessing the Cloud
The New Database Frontier: Harnessing the CloudThe New Database Frontier: Harnessing the Cloud
The New Database Frontier: Harnessing the Cloud
 
Oracle Database Migration to Oracle Cloud Infrastructure
Oracle Database Migration to Oracle Cloud InfrastructureOracle Database Migration to Oracle Cloud Infrastructure
Oracle Database Migration to Oracle Cloud Infrastructure
 
Oracle Database 12c para la comunidad GeneXus - Engineered for clouds
Oracle Database 12c para la comunidad GeneXus - Engineered for cloudsOracle Database 12c para la comunidad GeneXus - Engineered for clouds
Oracle Database 12c para la comunidad GeneXus - Engineered for clouds
 
Guob consolidation implementation11gr2
Guob consolidation implementation11gr2Guob consolidation implementation11gr2
Guob consolidation implementation11gr2
 
EMC config Hadoop
EMC config HadoopEMC config Hadoop
EMC config Hadoop
 
Snowflake’s Cloud Data Platform and Modern Analytics
Snowflake’s Cloud Data Platform and Modern AnalyticsSnowflake’s Cloud Data Platform and Modern Analytics
Snowflake’s Cloud Data Platform and Modern Analytics
 
Oracle NoSQL Database release 3.0 overview
Oracle NoSQL Database release 3.0 overviewOracle NoSQL Database release 3.0 overview
Oracle NoSQL Database release 3.0 overview
 
NoSQL and MySQL
NoSQL and MySQLNoSQL and MySQL
NoSQL and MySQL
 
Slides pentaho-hadoop-weka
Slides pentaho-hadoop-wekaSlides pentaho-hadoop-weka
Slides pentaho-hadoop-weka
 
What_to_expect_from_oracle_database_12c
What_to_expect_from_oracle_database_12cWhat_to_expect_from_oracle_database_12c
What_to_expect_from_oracle_database_12c
 

Vertica

  • 1. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. Vertica In Depth Basic introduction Samchu Li/ Jan 3rd, 2013 Updated: Samchu Li/ May 23rd, 2014
  • 2. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. Agend a• History • Storage Model – compare DSM with NSM, PAX • Column Store • Compression • Projection & record construction • Joins • Vertica SQL process (Hybrid Storage Model) • Flex Zone • 4C / Availability • Query Execution Workflow • Udx • Eco-system
  • 3. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. Vertica History C-Store & Vertica Samchu Li / Jan 3rd, 2013
  • 4. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. Vertica History 1. Miguel C. Ferreira <<Compression and Query Execution within Column Oruented Database>> for Master of Engineering in Computer Science and Electrical Engineering at MIT; June 2005. Where C-Store comes from. 2. MIT’s open source project C-Store. <<C-Store: A Column-oriented DBMS>>, VLDB 2005 3. Vertica was set up in 2005 based on C-Store, Billerica, Massachusetts , US. Co-founder is Michael Stonebraker. 4. March, 2011, HP acquired Vertica. <<The Vertica Analytic Database: C-Store 7 Years Later>>, VLDB, 2012
  • 5. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. Michael Stonebraker,SQLServer/Sysbase奠基人。 著名的数据库科学家,他在1992年提出对象关系数据库模型在加州伯克利分 校计算机教授达25年。在此期间他创作了Ingres,Illustra, Cohera, StreamBase Systems和Vertica等系统。Stonebraker教授也曾担任过Informix的CEO,目 前他是MIT麻省理工学院客席教授。 Stonebraker教授领导了称为Postgres的后Ingres项目。这个项目的成果非常 巨大,在现代数据库的许多方面都做出的大量的贡献。Stonebraker教授还做 出了一件造福全人类的事情,那就是把Postgres放在了BSD版权的保护下。 如今Postgres名字已经变成了PostgreSQL,功能也是日渐强大。 87年左右,Sybase联合了微软,共同开发SQLServer。原始代码的来源与 Ingres有些渊源。后来1994年,两家公司合作终止。此时,两家公司都拥有一 套完全相同的SQLServer代码。可以认为,Stonebraker教授是目前主流数据 库的奠基人。 Ingres(Michael Stonebraker)  Informix (2000 年被 IBM 收购)  Sybase MS SQLServer (1992年将产品卖给微软)  NonStop SQL (Tandem 被 Compaq 并购并在 2000 年开始重写,HP2002年收购Compad)  Neoview  SeaQuest  Postgres Illustra (1997 年被 Informix 收购)  PostgreSQL C-Store  Vertica
  • 6. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.6 Vertica Market Share 2012
  • 7. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. Vertica Storage Model NSM, DSM, PAX Samchu Li / Jan 3rd, 2013
  • 8. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.8 Column Storage
  • 9. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.9 Storage Model in DB NSM 70s ~ 1985 DSM 1985, <<A decomposition storage paper>>, Copeland and Khoshafian, SIGMOD PAX 2001, <<Weaving Relations for Cache Performance>>, Ailamaki, DeWitt, Hill, Skounakis, VLDB
  • 10. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.10 NSM
  • 11. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.11 DSM
  • 12. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.12 PAX (Partition Attributes Across) - MonetDB
  • 13. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.13 Why DSM/Columnar DB so quick? ID(PK, INT4) Name(Va rchar5) Ag e(I NT 2) 0962 Jane 3 0 7658 John 4 5 3859 Jim 2 0 5523 Susan 5 2 … … … SELECT NAME FROM TABLEName, Average length 4 BYTE 1.NSM Row length = 4+5+2 =11 BYTE 100 Million * 11 BYTE /1024=10742.1875KB 1 block = 32KB | 1 block contains =2978 complete records Block scan = 10742.1875KB/32KB = 336 2. DSM Length = 4 BYTE 100 Million * 4 BYTE /1024=3906.25KB 1 block = 32KB | 1 block contains = 8192 complete records Block scan = 123
  • 14. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.14 Why DSM/Columnar DB so quick? 0 50 100 150 200 250 300 350 400 Block Scans NSM DSM 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 records/block NSM DSM
  • 15. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.15 Weakness - Scan performance for example Read-Optimized Databases, In Depth; Allison L. Holloway and David J. DeWitt; 2008, VLDB
  • 16. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. Vertica Compression Samchu Li / Jan 3rd, 2013
  • 17. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.17 Clustering & Compression
  • 18. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.18 Compression • Trades I/O for CPU • Increased column-store opportunities: • Higher data value locality in column stores • Techniques such as run length encoding far more useful • Can use extra space to store multiple copies of data in different sort orders • Operating Directly on Compressed Data • I/O - CPU tradeoff is no longer a tradeoff • Reduces memory–CPU bandwidth requirements • Opens up possibility of operating on multiple records at once
  • 19. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.19 Run-length Encoding (RLE) Integrating Compression and Execution in Column-Oriented Database Systems, SIGMOD, 2006
  • 20. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.20 Bit-vector Encoding Integrating Compression and Execution in Column-Oriented Database Systems, SIGMOD, 2006
  • 21. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.21 Dictionary Encoding Integrating Compression and Execution in Column-Oriented Database Systems, SIGMOD, 2006
  • 22. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.22 Frame Of Reference Encoding +/1, one bit; the max 111=7 Integrating Compression and Execution in Column-Oriented Database Systems, SIGMOD, 2006
  • 23. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.23 Differential Encoding 100=4 Integrating Compression and Execution in Column-Oriented Database Systems, SIGMOD, 2006
  • 24. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.24 What Compression Scheme To Use? Integrating Compression and Execution in Column-Oriented Database Systems, SIGMOD, 2006
  • 25. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.25 Group iteration Normally, The DB deal with the records with iteration method once a time before, But in row-oriented DB, for its storage model NSM, the cache efficient is low. But in Row-oriented DB, we could using this method to read more records one time, like RLE, (100,1,100), 100 hundred records once a time.
  • 26. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. Vertica Projection & record construction Samchu Li / Jan 3rd, 2013
  • 27. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.27 Logic Schema & Physical Schema Logic Schema In traditional database architectures, data is primarily stored in tables. Additionally, secondary tuning structures such as index and materialized view structures are created for improved query performance. Physical Schema But in contrast, tables do not occupy any physical storage at all in Vertica. Instead, physical storage consists of collections of table columns called projection.
  • 28. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.28 Projection
  • 29. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.29 Projection ID(PK, INT4) Name(Va rchar5) Ag e(I NT 2) 0962 Jane 3 0 7658 John 4 5 3859 Jim 2 0 ID(PK, INT4) Name(Va rchar5) Ag e(I NT 2) 3859 Jim 2 0 5523 Susan 5 2 7658 John 4 5 0962 Jane 3 0 … … … Name(V archar5 ) ID(PK, INT4) Ag e(I NT 2) Jane 0962 3 0 Jim 3859 2 0 John 7658 4 5 Age(IN T2) ID(PK, INT4) 20 3859 30 0962 45 7658 52 5523 … … Super Projection Projection1 Projection2
  • 30. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.30 Projection & Index Vertica is designed for Data warehouse/Big Data, no specific design for single data query. No Index In a highly simplified view, you can think of a Vertica projection as a single level, densely packed, clustered index which stores the actual data values, is never updated in place, and has no logging. Any “maintenance” such as merging sorted chunks or purging deleted records is done as automatic and background activity, not in the path of real-time loads. So yes, projections are a type of native index if you will, but they are very different from traditional indexes like Bitmap and Btrees.
  • 31. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.31 Query Benefits of Storing Sorted Data How does Vertica query huge volumes without indexes? It’s easy… the data is sorted by column value, something we can do because we wrote both our storage engine and execution engine from scratch. We don’t store the data by insert order, nor do we limit sorting to within a set of disk blocks. Instead, we have put significant engineering effort into keeping the data totally sorted during its entire lifetime in Vertica. It should be clear how sorted data increases compression ratios (by putting similar values next to each other in the data stream), but it might be less obvious at first how we use sorted data to increase query speed as well.
  • 32. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.32 An example SELECT stock, price FROM ticks ORDER BY stock, price; SELECT stock, price FROM ticks WHERE stock=’IBM’ ORDER BY price;
  • 33. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.33 An example One pass aggregation SELECT stock, AVG(price) FROM ticks ORDER BY stock, price;
  • 34. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.34 Projection
  • 35. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.35 How tuples be constructed Two ways: 1. EM (Early Materialization) Like the row-oriented databases(where projections are almost always performed as soon as an attribute is no longer needed) suggest a natural tuple construction policy: at each point at which a column is accessed, add the column to an intermediate tuple representation if that column is needed by some later operator or is included in the set of output columns. 1. Perform an inner join to constructed the record which the operator needed 2. Then send to the operator to operate on the real record Materialization Strategies in a Column-Oriented DBMS, IEEE, 2007
  • 36. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.36 How tuples be constructed 2. LM (Late Materialization) a. First, scan the column blocks and output its positions (ordinal offsets of values within the column) b. Repeat with other columns to output its positions which satisfy the operations like WHERE… (these position can take the form of ranges, lists, or a bitmap) c. Use position-wise AND operations to intersect the position lists. d. Finally, re-access these columns and extract the values of records that satisfy all predicates and stich these values together into output tuples Materialization Strategies in a Column-Oriented DBMS, IEEE, 2007
  • 37. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.37 When should tuples be constructed? - EM Early Materialized – No join
  • 38. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.38 When should tuples be constructed? -LM Late Materialized – with Joins
  • 39. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.39 When should tuples be constructed? -LM Late Materialized – with Joins
  • 40. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.40 When should tuples be constructed? -LM Late Materialized – with Joins
  • 41. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.41 When should tuples be constructed? -LM Late Materialized – with Joins
  • 42. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.42 EM with Joins
  • 43. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.43 EM with Joins
  • 44. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.44 LM with Joins
  • 45. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.45 EM vs LM A naïve LM join is about 2X slower than an EM join on typical queries (due to random I/O). This number is very dependent on: Amount of memory available Number of projected attributes Join cardinality But there are newer join algorithms that let LM do better: Invisible Join Jive/Flash Join Radix cluster/decluster join
  • 46. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.46 Pre-join projections Vertica supports prejoin projections which permit joining the projection’s anchor table with any number of dimension tables via N:1 joins. This permits a normalized logical schema, while allowing the physical storage to be denormalized. The cost of storing physically denormalized data is much less than in traditional systems because of the available encoding and compression. Prejoin projections are not used as often in practice as we expected. This is because Vertica’s execution engine handles joins with small dimension tables very well (using highly optimized hash and merge join algorithms), so the benefits of a prejoin for query execution are not as significant as we initially predicted. <<The Vertica Analytic Database: C-Store 7 Years Later>>, VLDB, 2012
  • 47. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.47 Pre-join projections In the case of joins involving a fact and a large dimension table or two large fact tables where the join cost is high, most customers are unwilling to slow down bulk loads to optimize such joins. In addition, joins during load offer fewer optimization opportunities than joins during query because the database knows nothing a priori about the data in the load stream. Pre-join projections can have only inner joins between tables on their primary and foreign key columns. Outer joins are not allowed. <<The Vertica Analytic Database: C-Store 7 Years Later>>, VLDB, 2012
  • 48. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. Vertica Joins Samchu Li / Jan 3rd, 2013
  • 49. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.49 Invisible Join Designed for typical joins when data is modeled using a star schema One ("Fact") table is joined with multiple dimension tables select c_nation, s_nation, d_year, sum(lo_revenue) as revenue from customer, lineorder, supplier, date where lo_custkey = c_custkey and lo_suppkey = s_suppkey and lo_orderdate = d_datekey and c_region = 'ASIA' and s_region = 'ASIA' and d_year >= 1992 and d_year <= 1997 group by c_nation, s_nation, d_year order by d_year asc, revenue desc;
  • 50. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.50 Invisible Join
  • 51. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.51 Invisible Join
  • 52. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.52 Invisible Join
  • 53. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.53 Invisible Join Bottom Line Many data warehouses model data using star/snowflake schemas Joining one (fact) table with many dimension tables is common The invisible join takes advantage of this by making sure that the table accessed in position order is the fact table for each join Position lists from the fact table are then intersected (in position order) This reduces the amount of data that must be accessed out of order from the dimension tables The "between-predicate rewriting" trick is not relevant for this discussion
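A toy sketch of the invisible join idea in Python. It assumes the per-dimension predicates (e.g. c_region = 'ASIA') have already been evaluated into sets of qualifying dimension keys; the real technique uses hash structures or bitmaps, and the column names and data below are illustrative, not from a real schema:

```python
def invisible_join(fact, dims):
    """fact: dict of fact-table columns (lists), including FK columns.
    dims: {fk_column: set_of_dimension_keys_that_passed_the_predicate}.
    Each FK column of the fact table is scanned in position order,
    yielding one position set per join; the sets are then intersected,
    so only the dimension lookups happen out of order."""
    n = len(next(iter(fact.values())))
    surviving = set(range(n))
    for fk_col, good_keys in dims.items():
        surviving &= {i for i in range(n) if fact[fk_col][i] in good_keys}
    return sorted(surviving)

fact = {
    "lo_custkey": [1, 2, 3, 1],
    "lo_suppkey": [10, 10, 20, 30],
    "lo_revenue": [500, 700, 800, 900],
}
# Hypothetical predicate results: customers {1, 3} and suppliers {10, 30} qualify.
rows = invisible_join(fact, {"lo_custkey": {1, 3}, "lo_suppkey": {10, 30}})
print([fact["lo_revenue"][i] for i in rows])  # [500, 900]
```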
  • 54. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.54 Invisible Join
  • 55. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.55 Jive/Flash Join
  • 56. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.56 Jive/Flash Join Bottom Line Instead of probing projected columns from the inner table out of order: • Sort the join index • Probe the projected columns in order • Sort the result using an added sequence column LM vs EM tradeoffs: LM has the extra sorts (EM accesses all columns in order) LM only has to fit the join columns into memory (EM needs the join columns and all projected columns), which results in big memory and CPU savings (see part 3 for why there are CPU savings) LM only has to materialize relevant columns In many cases LM advantages outweigh the disadvantages LM would be a clear winner if not for those pesky sorts …
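The three steps (sort the join index, probe in order, re-sort by an added sequence column) can be sketched on a single inner column; names and data here are illustrative:

```python
def jive_join(inner_column, probe_positions):
    """Sketch of the Jive join's two-sort trick.

    probe_positions arrive in outer-table order, i.e. random with respect
    to the inner column. Instead of probing out of order, tag each
    position with its output slot, sort by position, probe the inner
    column sequentially, then sort the results back into output order.
    """
    tagged = sorted(enumerate(probe_positions), key=lambda t: t[1])  # sort the join index
    probed = [(slot, inner_column[pos]) for slot, pos in tagged]     # in-order probe
    probed.sort(key=lambda t: t[0])                                  # restore output order
    return [v for _, v in probed]

inner = ["a", "b", "c", "d", "e"]
print(jive_join(inner, [4, 0, 3, 1]))  # ['e', 'a', 'd', 'b']
```

The sequential probe is what turns random I/O on the inner projection into a single ordered scan; the price is the two sorts that the radix cluster/decluster variant then tries to cheapen.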
  • 57. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.57 Radix Cluster/Decluster The full sort from the Jive join is actually overkill We just want to access the storage blocks in order (we don't mind random access within a block) So do a radix sort and stop early By stopping early, data within each block is accessed out of order, but in the order specified in the original join index • Use this pseudo-order to accelerate the post-probe sort as well Radix Sort: pad all the values to be compared (positive integers) to the same number of digits, zero-filling shorter values at the front; then perform one sorting pass per digit, starting from the lowest digit. Once the passes from the lowest digit through the highest digit are complete, the sequence is fully ordered.
  • 58. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.58 Radix Sort Example - LSD LSD (Least Significant Digit); positive integers, processed from the right side. 73, 22, 93, 43, 55, 14, 28, 65, 390, 81 First pass, bucket by units digit: 0→390; 1→081; 2→022; 3→073, 093, 043; 4→014; 5→055, 065; 8→028. Result: 390, 081, 022, 073, 093, 043, 014, 055, 065, 028
  • 59. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.59 Radix Sort Example - LSD Second pass, bucket by tens digit: 390, 081, 022, 073, 093, 043, 014, 055, 065, 028 → 1→014; 2→022, 028; 4→043; 5→055; 6→065; 7→073; 8→081; 9→390, 093. Result: 014, 022, 028, 043, 055, 065, 073, 081, 390, 093
  • 60. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.60 Radix Sort Example - LSD Last pass, bucket by hundreds digit: 014, 022, 028, 043, 055, 065, 073, 081, 390, 093 → 0→014, 022, 028, 043, 055, 065, 073, 081, 093; 3→390. Result: 014, 022, 028, 043, 055, 065, 073, 081, 093, 390
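The three passes above are exactly LSD radix sort; a compact Python version that reproduces the slides' example:

```python
def radix_sort_lsd(nums, base=10):
    """LSD radix sort for non-negative integers, as on the slides:
    bucket by units digit, then tens, then hundreds, and so on.
    Each pass is stable, which is what makes the whole sort correct."""
    if not nums:
        return nums
    digit = 1
    while max(nums) // digit > 0:
        buckets = [[] for _ in range(base)]
        for n in nums:
            buckets[(n // digit) % base].append(n)  # stable within each bucket
        nums = [n for b in buckets for n in b]      # concatenate buckets in order
        digit *= base
    return nums

print(radix_sort_lsd([73, 22, 93, 43, 55, 14, 28, 65, 390, 81]))
# [14, 22, 28, 43, 55, 65, 73, 81, 93, 390]
```

Stopping this loop after fewer passes than the key width gives exactly the "radix cluster" pseudo-order the previous slide describes: values land in the right coarse bucket (block) but stay unsorted within it.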
  • 61. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.61 Radix Sort Example - MSD MSD (Most Significant Digit); positive integers, processed from the left side; suitable for keys with many digits. The process is otherwise the same.
  • 62. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.62 Radix Cluster/Decluster Bottom line Both sorts from the Jive join can be significantly reduced in overhead Only tested when there is sufficient memory for the entire join index to be stored three times The technique is likely applicable to larger join indexes, but its utility will go down a little Only works if random access within a storage block is acceptable Don't use radix cluster/decluster if you have variable-width column values or compression schemes that can only be decompressed starting from the beginning of the block
  • 63. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.63 Tuple Construction Heuristics For queries with selective predicates, aggregations, or compressed data, use late materialization For joins: Research papers: always use late materialization Commercial systems: the inner table of a join is often materialized before the join (reduces system complexity); some systems will use LM only if the columns from the inner table can fit entirely in memory
  • 64. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.64 Query Optimization Almost all query optimization is automatic (or done through the Vertica Database Designer); there is little for the user to tune by hand. Three generations: 1. StarOpt Only optimizes data-warehouse-style queries (star and snowflake schemas). 2. StarifiedOpt Adds the ability to optimize non-star/snowflake queries. 3. V2Opt Reworked the optimizer as a set of extensible modules, with new algorithms for using statistics …
  • 65. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. Vertica SQL process (Hybrid Storage Model) Samchu Li / Jan 3rd, 2013
  • 66. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.66 Vertica SQL process (Hybrid Storage Model)
  • 67. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.67 Disk Physical Structure http://blog.163.com/sonyericssonss/blog/static/109683969200911233723670/ (an illustrated, easy-to-follow explanation of hard disk structure)
  • 68. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.68 I/O – Sequential I/O & Random I/O Sequential versus random refers to whether the starting sector address of this I/O is adjacent (or nearly adjacent) to the ending sector address of the previous I/O. If it is, this I/O counts as sequential; if the gap is large, it counts as random. For sequential I/O, since this I/O's starting sector is close to the previous I/O's ending sector, the head barely needs to seek, or the seek time is very short; if the gap is large, the head needs a long seek. Many random I/Os force the head to seek constantly, greatly reducing efficiency. One of the most important aspects of database performance tuning is I/O. Roughly speaking, a 15,000 RPM server disk can deliver about 75 non-sequential (random) I/O operations and about 150 sequential I/O operations per second. Such a disk's nominal transfer rate is around 100 MB/s, but what actually limits a database server's transfer rate is those 75/150 I/Os per second. Assuming each I/O operates on an 8 KB block: 75 random I/Os per second * 8 KB = 600 KB/s; 150 sequential I/Os per second * 8 KB = 1200 KB/s; a huge gap from the nominal 100 MB/s. In practice it is not quite this bad: each I/O moves more data than this, and read-ahead, disk caches, and similar mechanisms help.
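For reference, the slide's arithmetic as a snippet; the IOPS figures are the slide's rough estimates for a 15,000 RPM disk, not measurements:

```python
# Back-of-envelope throughput implied by per-second I/O counts, 8 KB per I/O.
block_kb = 8
random_iops, seq_iops = 75, 150

random_kb_s = random_iops * block_kb   # KB/s achievable with purely random I/O
seq_kb_s = seq_iops * block_kb         # KB/s achievable with purely sequential I/O
print(random_kb_s, seq_kb_s)           # 600 1200 -- far below the ~100 MB/s nominal rate
```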
  • 69. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.69 B+ Tree If an insert lands in the last leaf node, it is fast, because it is sequential I/O.
  • 70. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.70 B+ Tree But if a write operation (update, insert, delete) needs to read many leaf nodes, many random I/Os occur, wasting time on disk seeks; efficiency is low.
  • 71. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.71
  • 72. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.72
  • 73. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.73 How to avoid this problem? 1. Give up some read performance a. COLA (Cache-Oblivious Lookahead Array): TokuDB b. LSM Tree (Log-Structured Merge Tree): Vertica, Cassandra, HBase, BDB Java Edition, LevelDB etc. 2. Memory / SSD The Design and Implementation of a Log-Structured File System, 1996
  • 74. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.74 LSM tree The Design and Implementation of a Log-Structured File System, 1996
  • 75. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.75 LSM tree The Design and Implementation of a Log-Structured File System, 1996
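A minimal LSM-style store, sketched in Python to show the write path (buffer in a memtable, flush as a sorted immutable run) and the read path (newest structure first). This illustrates the general technique the slides describe, not Vertica's actual WOS/ROS code; the class and its parameters are invented for the sketch:

```python
import bisect

class TinyLSM:
    """Writes go to an in-memory memtable; when it fills, it is flushed
    as one sorted, immutable run (a single sequential write). Reads check
    the memtable first, then each run from newest to oldest."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}
        self.runs = []            # newest first; each run is a sorted list of (key, value)
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            self.runs.insert(0, sorted(self.memtable.items()))  # flush ("moveout")
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:                       # newest run wins
            i = bisect.bisect_left(run, (key,))     # binary search within the run
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

db = TinyLSM()
for i in range(10):
    db.put(i, i * i)
print(db.get(3), db.get(9), len(db.runs))  # 9 81 2
```

A real system adds what the editor's notes describe: Bloom filters to skip runs that cannot contain the key, and background compaction that merges small runs into larger ones.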
  • 76. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.76 Vertica SQL process (Hybrid Storage Model)
  • 77. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.77 WOS INSERT COPY DELETE UPDATE WOS (Memory) Small Data Data in the WOS is solely in memory, where column- or row-oriented doesn't matter. Cust Price Andrew $100.00 Cust Price Andrew $98.00 Cust Price Nga $90.00 Chuck $87.00 Merge When using the WOS, data goes directly into the WOS (memory); no sorting, clustering, or compressing is needed ID Cust 1,2 Andrew 3 Chuck 4 Nga ID Price 1 $98.00 2 $100.00 3 $87.00 4 $90.00 Tuple Mover: Moveout ROS Cust Price Andrew $98.00 … … Merge multiple small files into large ones with sorting, clustering, compressing
  • 78. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.78 ROS Slow: needs sorting, clustering, compressing and so on … In the real world, prefer the WOS; it is fast and well suited when a large load can be divided into many small batch jobs, or tune the MoveOutInterval and MoveOutSizePct parameters so the WOS moves data out quickly. But be careful: this can bring Vertica down if your workload is heavy and the WOS runs out of memory. INSERT COPY DELETE UPDATE Large Data ROS
  • 79. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.79 Data Modifications and Delete Vectors Data in Vertica is never modified in place. When a tuple is deleted or updated from either the WOS or ROS, Vertica creates a delete vector. A delete vector is a list of positions of rows that have been deleted. Delete vectors are stored in the same format as user data: they are first written to a DVWOS in memory, then moved to DVROS containers on disk by the tuple mover and stored using efficient compression mechanisms. There may be multiple delete vectors for the WOS and multiple delete vectors for any particular ROS container. SQL UPDATE is supported by deleting the row being updated and then inserting a row containing the updated column values
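A sketch of what a delete vector means at scan time; the function name and data are illustrative, and real delete vectors are compressed position lists per container rather than Python lists:

```python
def scan_with_delete_vectors(column, delete_vectors):
    """Data is never modified in place: a scan simply skips positions
    that appear in any delete vector for the container."""
    deleted = set().union(*delete_vectors) if delete_vectors else set()
    return [v for i, v in enumerate(column) if i not in deleted]

prices = [100, 98, 87, 90]
# An UPDATE of row 1 becomes a delete (vector [1]) plus an insert of the new value.
print(scan_with_delete_vectors(prices + [99], [[1]]))  # [100, 87, 90, 99]
```

Keeping deletes as side structures preserves the sorted, compressed ROS containers untouched; the cost is the extra filter on every scan until a purge removes the deleted rows for good.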
  • 80. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.80 Histories Query – EPOCH An epoch is associated with each COMMIT the current_epoch at the time of the COMMIT is the epoch for that load. Vertica supports historical queries, though it's not a common use case for most customers. You can only query epochs that are after the current AHM, which is kept aggressively current by default. Deleted data prior to the AHM (Ancient History Mark) is eligible for being purged when a mergeout or explicit purge happens. After it's purged, delete vectors no longer need to be maintained. The Last Good Epoch is the epoch at which all data has been written from WOS to ROS. Any data after the LGE will be lost if the cluster shuts down abnormally from something like a power loss or a set of exceptions across multiple nodes. Refresh Epoch - don't worry about it, it doesn't get referenced in practice.
  • 81. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.81 Example dbadmin=> select current_epoch from system; current_epoch --------------- 44 (1 row) dbadmin=> insert into A values(1); commit; OUTPUT -------- 1 (1 row) COMMIT dbadmin=> select current_epoch from system; current_epoch --------------- 45 dbadmin=> insert into A values(2); commit; OUTPUT -------- 1 (1 row) dbadmin=> at epoch 46 select * from A; i --- 1 2 (2 rows) dbadmin=> at epoch 45 select * from A; i --- 1 (1 row)
  • 82. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.82 Example dbadmin=> select make_ahm_now(); make_ahm_now ----------------------------- AHM set (New AHM Epoch: 46) (1 row) dbadmin=> at epoch 45 select * from A; ERROR 2318: Can't run historical queries at epochs prior to the Ancient History Mark
  • 83. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. Flex Zone Samchu Li / May 23rd, 2013
  • 84. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.84 Flex Zone – New with 7.0! Easily load, explore, analyze and monetize semi-structured data such as text, videos, call records More information in the Loading Data module (Diagram: Vertica Analytics; Flex Zone tables for store and explore; columnar tables for daily analytics)
  • 85. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. Vertica 4C / Availability Samchu Li / Jan 3rd, 2013
  • 86. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.86 4C
  • 87. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.87 K-Safety – Clustering/MPP Your database must have a minimum number of nodes to be able to have a K-safety level greater than zero. Note: Vertica does not officially support values of K higher than 2. K-level → Number of nodes required: 0 → 1+; 1 → 3+; 2 → 5+; K → 2K+1
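The node counts in the table follow a 2K+1 pattern (the cluster must stay quorate and keep a full copy of the data while K nodes are down); a one-liner makes the pattern explicit:

```python
def min_nodes(k: int) -> int:
    """Minimum node count for K-safety level k, following the table's pattern (2K+1)."""
    return 2 * k + 1

print([min_nodes(k) for k in (0, 1, 2)])  # [1, 3, 5]
```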
  • 88. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.88 K-Safety K=1
  • 89. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.89 Projection
  • 90. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.90 Segmentation & Partition Segmentation Here, Segmentation = Neoview's Partition: Hash Segmentation Range Segmentation Partition means that on a single node you can still divide a segmented table into different parts (by range) to improve performance.
  • 91. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.91 Segmentation & Partition
  • 92. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. Vertica Query Execution Workflow Samchu Li / Jan 3rd, 2013
  • 93. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.93 - MPP
  • 94. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.94 - MPP
  • 95. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. Vertica UDx Samchu Li / Jan 3rd, 2013
  • 96. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.96 UDx User Defined Extension (UDx) refers to all extensions to Vertica developed using the APIs in the Vertica SDK. Five types: • User Defined Scalar Functions (UDSFs) • User Defined Transform Functions (UDTFs) • User Defined Aggregate Functions (UDAFs) • User Defined Analytic Functions (UDAnFs) • User Defined Load (UDL)
  • 97. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.97 Fenced Mode Fenced mode runs UDx code outside of the main Vertica process in a separate zygote process. UDx code that crashes while running in fenced mode does not impact the core Vertica process. There is a small performance impact when running UDx code in fenced mode. On average, using fenced mode adds about 10% more time to execution compared to unfenced mode. Zygote process The Vertica zygote process starts when Vertica starts. Each node has a single zygote process. Side processes are created "on demand". The zygote listens for requests and spawns a UDx side session that runs the UDx in fenced mode when a UDx is called by the user.
  • 98. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.98 Vertica R User Defined Functions developed in R always run in fenced mode, in a process outside of the main Vertica process. You can create Scalar Functions and Transform Functions using the R language. Other UDx types are not supported with the R language. R Packages The Vertica R Language Pack includes the following R packages in addition to the default packages bundled with R: • Rcpp • RInside • lpSolve • lpSolveAPI
  • 99. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.99 Vertica R The R programming language is fast gaining popularity among data scientists to perform statistical analyses. It is extensible and has a large community of users, many of whom contribute packages to extend its capabilities. However, it is single-threaded and limited by the amount of RAM on the machine it is running on, which makes it challenging to run R programs on big data. There are efforts under way to remedy this situation, which essentially fall into one of the following two categories: • Integrate R into a parallel database, or • Parallelize R so it can process big data
  • 100. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.100 Running multiple instances of the R algorithm in parallel (query partitioned data) The first major performance benefit of Vertica's R implementation comes from running multiple instances of the R algorithm in parallel, with queries that chunk the data independently.
  • 101. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.101 Running multiple instances of the R algorithm in parallel (query partitioned data)
  • 102. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.102 Leveraging column-store technology for optimized data exchange (query non- partitioned data) It is important to note that even for non-data-parallel tasks (functions that operate on input that is basically one big chunk of non-partitioned data), Vertica's implementation provides better performance, since computation runs on the server instead of the client, and the data flow between the DB and R is optimized (no need to parse the data again).
  • 103. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.103 Leveraging column-store technology for optimized data exchange (query non- partitioned data)
  • 104. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.104 Leveraging column-store technology for optimized data exchange (query non- partitioned data) As the chart above indicates, performance improvements are also achieved by optimizing the data transfers between Vertica and R. Since Vertica is a column store and R is vector-based, it is very efficient to move data from a Vertica column to R vectors in very large blocks.
  • 105. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.105 Vertica eco-system Unstructured data Vertica built-in engine RDBMS Index Store/idx/table ETL Hadoop/Pig/HDFS/HCatalog connector Flex Table Support Reporting
  • 106. © Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. Thank you

Editor's Notes

  1. When C-Store first appeared, it was not very usable: the only join algorithm implemented was nested-loop join, record construction did an inner join over the storage model (too expensive), and it had B-tree storage management taken from Berkeley DB. The SQL compiler came from PostgreSQL and is still used in Vertica. An interesting detail: one of Ferreira's supervisors for this thesis was Michael Stonebraker. MIT then open-sourced the project and published a paper at VLDB in 2005.
  2. Ingres (Michael Stonebraker) → Informix (acquired by IBM in 2000); Sybase → MS SQL Server (Sybase sold the product to Microsoft in 1992); NonStop SQL (Tandem, later acquired by Compaq; a rewrite started in 2000, and the product is now HP's) → Neoview → SeaQuest; Postgres → Illustra (acquired by Informix in 1997) → PostgreSQL; C-Store → Vertica
  3. Projections store data in formats that optimize query execution. They share one similarity to materialized views in that they store data sets on disk rather than compute them each time they are used in a query (e.g. physical storage).  However, projections aren’t aggregated but rather store every row in a table, e.g. the full atomic detail. The data sets are automatically refreshed whenever data values are inserted, appended, or changed – again, all of this happens beneath the covers without user intervention – unlike materialized views. Projections provide the following benefits: • Projections are transparent to end-users and SQL. The Vertica query optimizer automatically picks the best projections to use for any query. • Projections allow for the sorting of data in any order (even if different from the source tables). This enhances query performance and compression. • Projections deliver high availability optimized for performance, since the redundant copies of data are always actively used in analytics.  We have the ability to automatically store the redundant copy using a different sort order.  This provides the same benefits as a secondary index in a more efficient manner. • Projections do not require a batch update window.  Data is automatically available upon loads. • Projections are dynamic and can be added/changed on the fly without stopping the database. For each table in the database, Vertica requires a minimum of one projection, called a “superprojection”. A superprojection is a projection for a single table that contains all the columns and rows in the table.  use Vertica’s nifty Database Designer™  to optimize your database.  Database Designer creates new projections that optimize your database based on its data statistics and the queries you use. Database Designer: 1. Analyzes your logical schema, sample data, and sample queries (optional). 2. 
Creates a physical schema design (projections) in the form of a SQL script that can be deployed automatically or manually. 3. Can be used by anyone without specialized database knowledge (even business users can run Database Designer). 4. Can be run and re-run anytime for additional optimization without stopping the database. ad-hoc query performance
  4. Clearly Vertica is off the hook to do any sort at runtime: data is just read off disk (with perhaps some merging) and we are done. Finding rows in storage (disk or memory) that match stock=’IBM’ is quite easy when the data is sorted, simply by applying your favorite search algorithm (no indexes are required!). Furthermore, it isn’t even necessary to sort the stock=’IBM’ rows because the predicate ensures the secondary sort becomes primary within the rows that match as illustrated
  5. In general, the aggregator operator does not know a priori how many distinct stocks there are nor in what order that they will be encountered. One common approach to computing the aggregation is to keep some sort of lookup table in memory with the partial aggregates for each distinct stock. When a new tuple is read by the aggregator, its corresponding row in the table is found (or a new one is made) and the aggregate is updated as shown below: Illustration of aggregation when data is not sorted on stock. The aggregator has processed the first 4 rows: It has updated HPQ three times with 100, 102 and 103 for an average of 101.66, and it has updated IBM once for an average of 100. Now it encounters ORCL and needs to make a new entry in the table. With Vertica, a second type of aggregation algorithm is possible because the data is already sorted, so every distinct stock symbol appears together in the input stream. In this case, the aggregator can easily find the average stock price for each symbol while keeping only one intermediate average at any point in time. Once it sees a new symbol, the same symbol will never be seen again and the current average may be generated. This is illustrated below: Illustration of aggregation when data is sorted on stock. The aggregator has processed the first 7 rows. It has already computed the final averages of stock A and of stock HPQ and has seen the first value of stock IBM resulting in the current average of 100. When the aggregator encounters the next IBM row with price 103 it will update the average to 101.5. When the ORCL row is encountered the output row IBM,101.5 is produced. Of course, one pass aggregation is used in other systems (often called SORT GROUP BY), but they require a sort at runtime to sort the data by stock. Forcing a sort before the aggregation costs execution time and it prevents pipelined parallelism because all the tuples must be seen by the sort before any can be sent on. 
Using an index is also a possibility, but that requires more I/O, both to get the index and then to get the actual values. This is a reasonable approach for systems that aren’t designed for reporting, such as those that are designed for OLTP, but for analytic systems that often handle queries that contain large numbers of groups it is a killer. Other: Another area where having pre-sorted data helps is the computation of SQL-99 analytics. We can optimize the PARTITON BY clause in a manner very similar to GROUP BY when the partition keys are sequential in the data stream. We can also optimize the analytic ORDER BY clause similarly to the normal SQL ORDER BY clause. The final area to consider is Merge-Join. Of course this is not a new idea, but other database systems typically have Sort-Merge-Join, whereby a large join can be performed by pre-sorting the data from both input relations according to the join keys. Since Vertica already has the data sorted, it is often possible to skip the costly sort and begin the join right away.
  6. This late materialization approach can potentially be more CPU-efficient because it requires fewer intermediate tuples to be stitched together (a relatively expensive operation, since it can be thought of as a join on position), and position lists are small, highly compressible data structures that can be operated on directly with very little overhead. Note, however, that one problem with late materialization is that it requires re-scanning the base columns to form tuples, which can be slow (though the columns are likely still in memory upon re-access if the query is properly pipelined). The goal is not to advocate one approach; rather, it is to systematically explore the trade-offs between different strategies and provide a foundation for choosing a strategy for a particular query. The focus is on standard warehouse-style queries: read-only workloads with selections, aggregations, and joins.
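A tiny sketch of the position-list idea (the column contents are invented): a predicate is evaluated against one column alone, producing a list of qualifying row positions, and the other columns are touched only for those positions:

```python
# Columns stored separately, as in a column store.
symbol = ["A", "HPQ", "HPQ", "IBM", "ORCL"]
price = [50, 100, 103, 101, 30]
volume = [10, 20, 30, 40, 50]

# Step 1: evaluate the predicate on the price column only.
# The result is a position list -- small and highly compressible.
positions = [i for i, p in enumerate(price) if p > 100]

# Step 2 (late materialization): stitch full tuples together only
# for the surviving positions, re-reading the base columns.
result = [(symbol[i], price[i], volume[i]) for i in positions]
print(result)  # [('HPQ', 103, 30), ('IBM', 101, 40)]
```

Early materialization would instead have built all five full tuples up front and then filtered them, paying the stitching cost for rows that are about to be discarded.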
  7. Front-side bus: 1600 MHz × 64 bit = 102,400 Mbit/s = 12,800 MB/s. SAS disk at 15,000 RPM ≈ 300 MB/s. Fibre Channel: 1 Gbit/s.
  8. In principle, every lookup in a B+ tree traverses the same path length (log_m(N/2)). In practice, the fan-out m of each index node is fairly large (tens to hundreds), which keeps the tree shallow: for a 70%-full B+ tree holding on the order of a billion records, the depth stays under 10, so locating a record theoretically requires only a handful of disk I/Os. For B+ trees used to organize file systems in particular, once a record in some node has been accessed, subsequent accesses are likely to fall in the same region; the buffer cache will already have placed the node in memory according to its replacement policy (LRU, etc.), effectively reducing the I/O count further.
Why, then, can a B+ tree be slow? As noted earlier, it depends on the workload. When inserts are random (unordered, scattered), accesses to any given leaf are temporally sparse: each insert must locate the target leaf on disk, read it into memory, and write it back, causing the node to be repeatedly swapped in and out. The buffer cache then cannot work effectively (in other words, the memory hit rate is very low and the page-fault rate very high). This phenomenon occurs for random queries, inserts, updates, and deletes alike. The LSM-tree structure exists precisely to reduce the disk I/O required by frequent inserts, updates, and deletes.
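The depth claim is easy to check numerically. A minimal sketch, using the log_m(N/2) estimate from the text with the 70% fill factor folded into the effective fan-out (the fan-out values are illustrative):

```python
import math

# Depth estimate log_m(N/2) for a billion records, with an effective
# fan-out of 0.7 * m to account for nodes being ~70% full.
N = 1_000_000_000
for m in (50, 100, 300):
    depth = math.log(N / 2, 0.7 * m)
    print(f"fan-out {m}: estimated depth ~ {depth:.1f}")
```

For any realistic fan-out the estimated depth comes out well under 10, consistent with the claim above.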
  9. It may seem that reads are what most systems should protect above all, so trading read performance for write performance looks like a bad deal. But consider: 1. Memory is more than 1000× faster than disk, and read performance depends mainly on the cache hit rate, not on the number of disk reads. 2. If writes do not consume disk I/O, reads get a longer share of the disk's I/O time, which can itself improve read efficiency. So although SSTables lower raw read performance, as long as the read hit rate is reasonable, read performance barely drops (and may even improve), while write performance typically increases by roughly 5–10×.
Now the details. At its core, a key-value store must solve exactly this problem: write as fast as possible, and read as fast as possible. Start from the write-fastest extreme, to illustrate one of the core components of a k-v store: the tree. Suppose we write 1000 records with random keys. The fastest write path for a disk is to append every write sequentially. But then queries are hopeless: finding one value requires scanning all the data, which makes read performance tragic. At the other extreme, the fastest reads come from keeping all data fully sorted, which is what a B-tree gives you. But B-tree writes are poor. To improve writes at the cost of some disk read performance, keep many small sorted structures: sort every m records in memory and write them out, then the next m, and so on, yielding N/m small sorted runs. On a query, since we do not know which run holds the key, we binary-search the newest small run first, then the next, until the key is found. It is easy to see that under this scheme the read complexity is (N/m)·log₂N, so read efficiency drops. This is the original LSM-tree idea.
That is still rather slow, so two refinements are introduced: 1. Bloom filters: a probabilistic bitmap that can quickly tell you whether a given small sorted run could contain the key. Instead of a binary search per run, a few simple hash computations reveal whether the key might be in that small set; efficiency improves at the cost of space. 2. Merging small trees into a large tree: the compaction process you often see, in which a background process continuously merges the small runs into a large tree, so that most older data can be found directly in log₂N time rather than (N/m)·log₂N.
That is the core idea of the LSM tree and its optimizations. One hidden caveat: an LSM tree cannot implement the database INSERT semantics with high performance, because INSERT means: within a transaction, first look up the key; if it exists, raise an error; if not, write it. That lookup drags down the whole write path.
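The small-sorted-runs idea above can be sketched as a toy class; this is a sketch of the concept only (no log, no compaction, no Bloom filters), and all names are ours:

```python
from bisect import bisect_left

class TinyLSM:
    """Toy LSM write path: buffer writes in an in-memory memtable,
    flush it as a sorted run when full, and search runs newest-first
    on reads."""
    def __init__(self, memtable_limit=4):
        self.limit = memtable_limit
        self.memtable = {}  # fast in-memory writes
        self.runs = []      # newest-first list of sorted (key, value) runs

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            # "Flush": emit the memtable as one small sorted run.
            self.runs.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:  # newest run first, so newer values win
            i = bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

db = TinyLSM()
for k, v in [("b", 1), ("a", 2), ("d", 3), ("c", 4), ("a", 9)]:
    db.put(k, v)
print(db.get("a"), db.get("d"))  # 9 3
```

Writes are always fast (a dict insert plus an occasional sequential flush); reads pay the (N/m)·log₂N cost of probing each run, which is exactly what Bloom filters and compaction exist to reduce.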
  10. To recap: LSM builds many small structures, each sorted in memory so that lookups within a run can use efficient binary search; sorted data is obviously faster to search than unsorted data. That alone cannot deliver fast inserts and lookups, so LSM also introduces Bloom filters and small-to-large merging. A Bloom filter is a very space-efficient probabilistic data structure: it represents a set concisely with a bit array and can test whether an element belongs to the set. This efficiency has a price: when testing membership, it may mistakenly report that an element belongs to the set when it does not (a false positive). Bloom filters are therefore unsuitable for zero-error applications, but where a low error rate is tolerable, they trade a tiny error rate for enormous space savings. (For details on Bloom filters, search the literature.) In LSM, the Bloom filter's role is to decide which in-memory component may contain the key being queried, or which component a new key should be inserted into. Merging small trees into a large one saves memory, which developers know is precious, and also aids recovery: in HBase, delete and update are really inserts, a consequence of the LSM design. New data is written to a new location on disk, so old records are never overwritten, which is valuable during crash recovery: replaying the log in order is enough.
After all this, what exactly is an LSM tree? An LSM-Tree uses one memory-based component C0 and one or more disk-based components (C1, C2, …, CK); it defers and batches index changes and migrates updates to disk efficiently through merging. On an insert, a log record sufficient to recover that insert is first written to the log file, then the record itself is written and its index entry placed in the C0 tree; after some time, the index entry is migrated to C1. A lookup checks C0 first and then C1, where it is certain to hit. Because memory is limited, C0 cannot grow too large, so once it reaches a certain size, contiguous runs of C0 nodes must be merged into C1, as shown in the figure below.
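A minimal Bloom filter sketch illustrating the trade-off described above (sizes and hash scheme are illustrative choices, not from any particular LSM implementation):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash probes into an m-bit array.
    Membership tests may yield false positives, never false negatives."""
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _probes(self, item):
        # Derive k probe positions from a salted cryptographic hash.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._probes(item):
            self.bits |= 1 << p

    def might_contain(self, item):
        return all(self.bits >> p & 1 for p in self._probes(item))

bf = BloomFilter()
for key in ("IBM", "HPQ", "ORCL"):
    bf.add(key)
print(bf.might_contain("IBM"))   # True: added keys always pass
print(bf.might_contain("MSFT"))  # almost certainly False; True would be a false positive
```

In an LSM store, one such filter per run lets a read skip the binary search of most runs entirely, at the cost of m bits of memory per run.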
  11. When data is loaded into Vertica, it is loaded into all projections based on the source table. You cannot load only a superprojection and then have it feed the data to the other projections. Loading data into a flex table creates two tables and a view: 1. the flexible table itself (flex_table); 2. an associated keys table (flex_table_keys); 3. a default view over the main table (flex_table_view).
  12. User Defined Scalar Functions (UDSFs) take in a single row of data and return a single value. These functions can be used anywhere a native Vertica function can be used, except in CREATE TABLE PARTITION BY and SEGMENTED BY expressions. User Defined Transform Functions (UDTFs) operate on table segments and return zero or more rows of data. The data they return can form an entirely new table, unrelated to the schema of the input table, with its own ordering and segmentation expressions. They can only be used in the SELECT list of a query. For details see Using User Defined Transforms (page 421). User Defined Aggregate Functions (UDAFs) let you create custom aggregate functions specific to your needs. They read one column of data and return one output column. User Defined Analytic Functions (UDAnFs) are similar to UDSFs in that they read a row of data and return a single row; however, the function can read input rows independently of outputting rows, so output values can be calculated over several input rows. The User Defined Load (UDL) feature lets you create custom routines to load your data into Vertica: you build custom libraries with the Vertica SDK to handle the various steps of the loading process.
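As a rough analogy for the two most common UDx shapes (the real Vertica SDK is C++-based, so the function names and row representation here are entirely illustrative, not SDK API): a UDSF maps each row to one value, while a UDTF maps a whole partition to zero or more rows of a possibly different schema.

```python
# Illustrative analogy only -- not the Vertica SDK API.
def udsf_add_vat(row):
    """Scalar-function shape: one row in, one value out."""
    return round(row["price"] * 1.2, 2)

def udtf_explode_tags(partition):
    """Transform-function shape: a table segment in, zero or more
    rows out, with a schema unrelated to the input's."""
    for row in partition:
        for tag in row["tags"].split(","):
            yield {"id": row["id"], "tag": tag}

rows = [{"id": 1, "price": 10.0, "tags": "tech,hw"},
        {"id": 2, "price": 5.0, "tags": "sw"}]
print([udsf_add_vat(r) for r in rows])  # [12.0, 6.0]
print(list(udtf_explode_tags(rows)))
# [{'id': 1, 'tag': 'tech'}, {'id': 1, 'tag': 'hw'}, {'id': 2, 'tag': 'sw'}]
```

The key contrast: the scalar shape is usable wherever an expression is, while the transform shape changes the row count and schema, which is why Vertica restricts UDTFs to the SELECT list.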
  13. What can Vertica do?
• EDW: a centralized data center
• Integration with Hadoop for massive data computation
• R integration for localized data mining
• Unstructured data for sentiment analysis and location-based analytics