A short introduction to Vertica 
Tommi Siivola, Software Engineer 
RedHat Software Developer Meetup 10.09.2014
AGENDA 
- Quick orientation 
- Columns 
- Projections 
- Clustering 
- Hybrid storage 
- Special features
Quick orientation to Vertica 
- Big data database product from HP 
- For handling terabytes/petabytes of data 
- Column-oriented
Quick orientation to Vertica 
- What does that mean in practice? 
– Vertica is a relational database 
– Supports a subset of the ANSI SQL-99 standard 
– JDBC/ODBC drivers 
– A command line client (vsql)
Quick orientation to Vertica 
- Runs on major Linux distros (RHEL, SUSE, Debian, Ubuntu) 
- Amazon AMI available for running Vertica in the cloud 
- Up to 1 TB of data and a cluster of 3 nodes without a license 
(the so-called "Community Edition" mode) 
- Larger setups require a license from HP
Concepts: column-oriented 
- Vertica stores data column by column, instead of storing each row as a unit 
– Allows for efficient data compression 
– Can skip unwanted columns when querying 
– More efficient aggregate value calculations
Concepts: column-oriented 
ROWS VS. COLUMNS 
date value id 
2014-03-15 23.43 3 
2014-03-15 23.97 4 
2014-03-15 24.51 7 
2014-03-15 25.05 6 
2014-03-15 25.59 7 
2014-03-16 26.13 7 
2014-03-16 26.67 4 
2014-03-16 27.21 2 
2014-03-16 27.75 3 
2014-03-16 28.29 7 
[Figure: the same table shown twice, illustrating row-wise and column-wise storage]
Concepts: column-oriented 
RUN LENGTH ENCODING 
Original table (date value id): 
2014-03-15 23.43 3 
2014-03-15 23.97 4 
2014-03-15 24.51 7 
2014-03-15 25.05 6 
2014-03-15 25.59 7 
2014-03-16 26.13 7 
2014-03-16 26.67 4 
2014-03-16 27.21 2 
2014-03-16 27.75 3 
2014-03-16 28.29 7 
Date column after run length encoding (value and id columns unchanged): 
2014-03-15 (5 times) 
2014-03-16 (5 times)
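As a rough illustration of the idea on this slide, the sketch below run-length encodes the date column. The function names `rle_encode` and `rle_decode` are made up for this example and say nothing about how Vertica implements its encodings internally.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


# The date column from the slide: five rows for each of the two days.
dates = ["2014-03-15"] * 5 + ["2014-03-16"] * 5
encoded = rle_encode(dates)
print(encoded)
```

Ten stored values shrink to two pairs. The encoding only pays off when equal values sit next to each other, which is why the sort order of the stored data matters.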
Concepts: column-oriented 
SKIP UNWANTED COLUMNS 
date value id 
2014-03-15 23.97 4 
2014-03-15 24.51 7 
2014-03-15 25.05 6 
2014-03-15 25.59 7 
2014-03-16 26.13 7 
2014-03-16 26.67 4 
2014-03-16 27.21 2 
2014-03-16 27.75 3 
2014-03-16 28.29 7 
SELECT value, id FROM table
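The query above only touches two of the three columns. A minimal sketch of why that helps, assuming a toy in-memory store with one list per column (the `table` dictionary and the `select` helper are hypothetical, not part of any Vertica API):

```python
# Toy column store: each column is kept as its own list.
table = {
    "date": ["2014-03-15"] * 5 + ["2014-03-16"] * 5,
    "value": [23.43, 23.97, 24.51, 25.05, 25.59,
              26.13, 26.67, 27.21, 27.75, 28.29],
    "id": [3, 4, 7, 6, 7, 7, 4, 2, 3, 7],
}


def select(columns, store):
    """Build result rows from the requested columns only.

    Columns that are not named are never looked at, which is the
    point of storing data by column.
    """
    picked = [store[name] for name in columns]
    return list(zip(*picked))


# Rough equivalent of: SELECT value, id FROM table
rows = select(["value", "id"], table)
print(rows[:3])
```

In a row-oriented layout the same query would have to read every full row, including the unused date field.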
Concepts: projections 
- Data physically stored in projections 
- Projections similar to materialized views 
– Data is optimized for querying at insert time 
- Table has one or more projections 
- Projection contains one or more columns 
- Data can be duplicated in projections for query efficiency
Concepts: projections 
ONE DATA, MANY PROJECTIONS 
Sorted by date 
2014-03-15 23.43 3 
2014-03-15 23.97 4 
2014-03-15 24.51 7 
2014-03-15 25.05 6 
2014-03-15 25.59 7 
2014-03-16 26.13 7 
2014-03-16 26.67 4 
2014-03-16 27.21 2 
2014-03-16 27.75 3 
2014-03-16 28.29 7 
Sorted by id 
2014-03-16 27.21 2 
2014-03-15 23.43 3 
2014-03-16 27.75 3 
2014-03-15 23.97 4 
2014-03-16 26.67 4 
2014-03-15 25.05 6 
2014-03-15 24.51 7 
2014-03-15 25.59 7 
2014-03-16 26.13 7 
2014-03-16 28.29 7
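The "one data, many projections" idea can be sketched in a few lines, assuming each projection is simply the same rows kept in a different sort order. The names `by_date`, `by_id` and `lookup` are illustrative only; a real engine would also keep the sort keys around instead of rebuilding them on every lookup.

```python
from bisect import bisect_left, bisect_right

# (date, value, id) rows from the slides.
rows = [
    ("2014-03-15", 23.43, 3), ("2014-03-15", 23.97, 4),
    ("2014-03-15", 24.51, 7), ("2014-03-15", 25.05, 6),
    ("2014-03-15", 25.59, 7), ("2014-03-16", 26.13, 7),
    ("2014-03-16", 26.67, 4), ("2014-03-16", 27.21, 2),
    ("2014-03-16", 27.75, 3), ("2014-03-16", 28.29, 7),
]

# Two projections: the same data, duplicated and sorted differently.
by_date = sorted(rows, key=lambda r: r[0])
by_id = sorted(rows, key=lambda r: r[2])


def lookup(projection, key_index, key):
    """Find all rows with the given key using binary search.

    Only valid when the projection is sorted on the column at key_index.
    """
    keys = [r[key_index] for r in projection]
    lo = bisect_left(keys, key)
    hi = bisect_right(keys, key)
    return projection[lo:hi]


id7 = lookup(by_id, 2, 7)
print(id7)
```

A query filtering on id is served from `by_id`, one filtering on date from `by_date`; the price is storing the data twice.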
Concepts: clustering 
- Parallel processing 
– Data segments distributed across cluster nodes 
– Performance can be increased by adding hardware 
- Reliability (K-safety) 
– Tolerates nodes going offline 
- All nodes can respond to queries → queries can be load 
balanced between nodes
Concepts: clustering 
SEGMENTATION 
Node 1: SEGMENT1 
Node 2: SEGMENT2 
Node 3: SEGMENT3 
Node 4: SEGMENT4
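One way to picture segmentation is hashing a column value to pick a node. The sketch below assumes a CRC32 hash and a hypothetical `distribute` helper; the hash function Vertica actually uses is not stated in these slides.

```python
import zlib

NODES = ["Node 1", "Node 2", "Node 3", "Node 4"]

rows = [
    ("2014-03-15", 23.43, 3), ("2014-03-15", 23.97, 4),
    ("2014-03-15", 24.51, 7), ("2014-03-15", 25.05, 6),
    ("2014-03-15", 25.59, 7), ("2014-03-16", 26.13, 7),
    ("2014-03-16", 26.67, 4), ("2014-03-16", 27.21, 2),
    ("2014-03-16", 27.75, 3), ("2014-03-16", 28.29, 7),
]


def segment_for(key, node_count):
    """Map a key to a segment number in [0, node_count) with a stable hash."""
    return zlib.crc32(str(key).encode("utf-8")) % node_count


def distribute(all_rows, key_index, nodes):
    """Assign every row to exactly one node based on its key column."""
    placement = {node: [] for node in nodes}
    for row in all_rows:
        node = nodes[segment_for(row[key_index], len(nodes))]
        placement[node].append(row)
    return placement


placement = distribute(rows, 2, NODES)  # segment on the id column
for node, node_rows in placement.items():
    print(node, len(node_rows))
```

Rows with the same id always land on the same node, so each node can work on its own segment in parallel.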
Concepts: clustering 
K-SAFETY 
Node 1: SEGMENT1, SEGMENT2 
Node 2: SEGMENT2, SEGMENT3 
Node 3: SEGMENT3, SEGMENT4 
Node 4: SEGMENT4, SEGMENT1
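The layout on this slide, where each node also holds a copy of its neighbour's segment, can be modelled as below. This is a sketch under that assumption, with zero-based node and segment numbers; `place_segments` and `available_segments` are names invented for the example.

```python
def place_segments(node_count, k=1):
    """Node i stores segment i plus the next k segments, wrapping around."""
    return {
        node: [(node + offset) % node_count for offset in range(k + 1)]
        for node in range(node_count)
    }


def available_segments(placement, failed_nodes):
    """Segments still readable when the given nodes are offline."""
    return {
        segment
        for node, segments in placement.items()
        if node not in failed_nodes
        for segment in segments
    }


placement = place_segments(4, k=1)
print(placement)

# With k=1, any single node can fail without losing a segment.
for failed in range(4):
    print(failed, sorted(available_segments(placement, {failed})))
```

Losing two neighbouring nodes at once does lose a segment in this layout, which is the limit that K=1 describes.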
Concepts: Hybrid storage 
- Read-optimized storage (ROS) 
– On disk 
– Heavily encoded & compressed 
- Write-optimized storage (WOS) 
– In memory 
– No encoding or compression
Concepts: Hybrid storage 
- Inserted data is first collected in WOS 
– Inserting to WOS is faster, due to lack of compression 
and disk write overheads 
- Background job moves data in batches from WOS to ROS 
– Writing to ROS is more efficient in batches 
– Querying is more efficient from ROS
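The WOS-to-ROS flow can be sketched with a small single-column model. Everything here is an assumption made for illustration: the `HybridStore` class, the size threshold that stands in for the background job, and the choice of sorting plus run-length encoding as the "heavy" encoding.

```python
from itertools import groupby


class HybridStore:
    """Toy single-column store with a write buffer and encoded batches."""

    def __init__(self, moveout_threshold=4):
        self.wos = []  # in memory, insertion order, no encoding
        self.ros = []  # list of containers, each sorted and run-length encoded
        self.moveout_threshold = moveout_threshold

    def insert(self, value):
        """Appending to the buffer is cheap: no sorting, no encoding."""
        self.wos.append(value)
        if len(self.wos) >= self.moveout_threshold:
            self.moveout()

    def moveout(self):
        """Move the whole buffer to ROS as one sorted, encoded batch."""
        if not self.wos:
            return
        batch = sorted(self.wos)
        container = [(v, sum(1 for _ in run)) for v, run in groupby(batch)]
        self.ros.append(container)
        self.wos = []

    def scan(self):
        """Queries must see data from both ROS and WOS."""
        from_ros = [
            value
            for container in self.ros
            for value, count in container
            for _ in range(count)
        ]
        return from_ros + list(self.wos)


store = HybridStore(moveout_threshold=4)
for value in [3, 4, 7, 6, 7]:
    store.insert(value)
print(store.ros, store.wos)
```

The fifth insert stays in the buffer until the next batch is moved, while a scan still returns all five values.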
Vertica feature: Pattern matching 
PATTERNS IN DATA 
- Example: Finding sequences in web site log data 
- Find all sequences where a user enters the site, browses and finally makes a purchase 
- Difficult to express in standard SQL 
- Vertica has an SQL extension for finding patterns 
user action 
1 enter 
1 browse 
1 browse 
1 purchase 
2 enter 
2 browse 
3 enter 
3 browse 
3 purchase
Vertica feature: Pattern matching 
- Example: find sequences where user enters a site, browses 
and makes a purchase 
SELECT uid, sid, ts, refurl, pageurl, action, 
       event_name(), pattern_id(), match_id() 
FROM clickstream_log 
MATCH 
  (PARTITION BY uid, sid ORDER BY ts 
   DEFINE 
     Entry    AS refurl NOT ILIKE '%site.com%' AND pageurl ILIKE '%site.com%', 
     Onsite   AS pageurl ILIKE '%site.com%' AND action = 'V', 
     Purchase AS pageurl ILIKE '%site.com%' AND action = 'P' 
   PATTERN 
     P AS (Entry Onsite* Purchase) 
   ROWS MATCH FIRST EVENT);
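To see what the pattern `Entry Onsite* Purchase` selects, here is a small sketch that applies the same idea to the simplified user/action table from the previous slide. It is not how Vertica evaluates MATCH; it only encodes each event as a letter and runs a regular expression per user. The `log` data layout and the `users_matching` helper are assumptions made for this example.

```python
import re
from itertools import groupby

# (user, action) rows, already ordered by user and then by time.
log = [
    (1, "enter"), (1, "browse"), (1, "browse"), (1, "purchase"),
    (2, "enter"), (2, "browse"),
    (3, "enter"), (3, "browse"), (3, "purchase"),
]

# One letter per event type: Entry, Onsite, Purchase.
EVENT_CODES = {"enter": "E", "browse": "O", "purchase": "P"}

# Same shape as the SQL pattern: Entry Onsite* Purchase.
PATTERN = re.compile(r"EO*P")


def users_matching(log_rows):
    """Return users whose event sequence contains the pattern."""
    matched = []
    # groupby relies on the rows being grouped by user already.
    for user, events in groupby(log_rows, key=lambda row: row[0]):
        encoded = "".join(EVENT_CODES[action] for _, action in events)
        if PATTERN.search(encoded):
            matched.append(user)
    return matched


print(users_matching(log))
```

Users 1 and 3 complete the sequence with a purchase; user 2 enters and browses but never buys, so no match is reported for that user.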
Extending Vertica 
- Custom SQL functions can be created with R, Java or C++ 
- R can be used for creating scalar and transform functions 
- Java, all of the above + load functions 
- C++, all of the above + aggregate and analytic functions
Find out more 
- Vertica free downloads available at (requires registration) 
– my.vertica.com 
- Vertica documentation available at (no registration) 
– www.vertica.com/documentation 
- C-Store research project (Vertica predecessor) 
– db.csail.mit.edu/projects/cstore/
THANKS! 
Tommi Siivola, Software Engineer 
tommi.siivola@eficode.com 
+358 (0)50 371 9308 
eficode.fi 
"Automate or wither" and other posts 
on the Eficode blog. 
EFICODE.FI/BLOGI

A short introduction to Vertica

  • 1. A short introduction to Vertica Tommi Siivola, Software Engineer RedHat Software Developer Meetup 10.09.2014
  • 2. - Quick orientation - Columns - Projections - Clustering - Hybrid storage - Special features AGENDA
  • 3. Quick orientation to Vertica - Big data database product from HP - For handling terabytes/petabytes of data - Column-oriented
  • 4. Quick orientation to Vertica
       - What does that mean in practice?
         – Vertica is a relational database
         – Supports a subset of ANSI SQL-99 standard
         – JDBC/ODBC drivers
         – A command line client (vsql)
  • 5. Quick orientation to Vertica
       - Runs on major Linux distros (RHEL, SUSE, Debian, Ubuntu)
       - Amazon AMI available for running Vertica in the cloud
       - Up to 1 TB of data and a cluster of 3 nodes without a license (so-called "Community Edition" mode)
       - Larger setups require a license from HP
  • 6. Concepts: column-oriented
       - Vertica stores data as columns, instead of storing each row as a unit
         – Allows for efficient data compression
         – Can skip unwanted columns when querying
         – More efficient aggregate value calculations
  • 7. Concepts: column-oriented
       ROWS VS. COLUMNS (the same data is shown twice on the slide, once per layout)
       2014-03-15  23.43  3
       2014-03-15  23.97  4
       2014-03-15  24.51  7
       2014-03-15  25.05  6
       2014-03-15  25.59  7
       2014-03-16  26.13  7
       2014-03-16  26.67  4
       2014-03-16  27.21  2
       2014-03-16  27.75  3
       2014-03-16  28.29  7
  • 8. Concepts: column-oriented
       RUN LENGTH ENCODING
       Date column encoded:
       2014-03-15 (5 times)  23.43  3
                             23.97  4
                             24.51  7
                             25.05  6
                             25.59  7
       2014-03-16 (5 times)  26.13  7
                             26.67  4
                             27.21  2
                             27.75  3
                             28.29  7
       Original data:
       2014-03-15  23.43  3
       2014-03-15  23.97  4
       2014-03-15  24.51  7
       2014-03-15  25.05  6
       2014-03-15  25.59  7
       2014-03-16  26.13  7
       2014-03-16  26.67  4
       2014-03-16  27.21  2
       2014-03-16  27.75  3
       2014-03-16  28.29  7
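The run-length encoding shown on slide 8 can be sketched in a few lines of Python. This is only an illustration of the idea, not Vertica's storage format; the function names `rle_encode` and `rle_decode` are made up for this example.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


# The date column from the slide: ten values, only two distinct runs.
dates = ["2014-03-15"] * 5 + ["2014-03-16"] * 5
runs = rle_encode(dates)
print(runs)  # [('2014-03-15', 5), ('2014-03-16', 5)]
```

The encoding only pays off when equal values sit next to each other, which is why the sort order of the stored column matters.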
  • 9. Concepts: column-oriented
       SKIP UNWANTED COLUMNS
       date        value  id
       2014-03-15  23.97  4
       2014-03-15  24.51  7
       2014-03-15  25.05  6
       2014-03-15  25.59  7
       2014-03-16  26.13  7
       2014-03-16  26.67  4
       2014-03-16  27.21  2
       2014-03-16  27.75  3
       2014-03-16  28.29  7
       SELECT value, id FROM table
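A toy model may help show why a column layout lets a query skip columns. The `ColumnStore` class below is hypothetical and keeps each column in its own Python list; it records which columns a query reads, so the effect of `SELECT value, id` becomes visible. It assumes the ten-row sample data from the earlier slides.

```python
class ColumnStore:
    """Toy column-oriented table: one list per column (illustrative only)."""

    def __init__(self, columns):
        self.columns = columns
        self.columns_read = []  # names of the columns a query has touched

    def read(self, name):
        self.columns_read.append(name)
        return self.columns[name]

    def select(self, *names):
        """Build result rows from the requested columns only."""
        return list(zip(*(self.read(name) for name in names)))


store = ColumnStore({
    "date": ["2014-03-15"] * 5 + ["2014-03-16"] * 5,
    "value": [23.43, 23.97, 24.51, 25.05, 25.59,
              26.13, 26.67, 27.21, 27.75, 28.29],
    "id": [3, 4, 7, 6, 7, 7, 4, 2, 3, 7],
})

rows = store.select("value", "id")  # like SELECT value, id FROM table
print(store.columns_read)           # the date column is never read

# An aggregate needs a single column, not whole rows.
total = sum(store.read("value"))
```

In a row-oriented layout the same query would have to read every full row, including the date field, before discarding it.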
  • 10. Concepts: projections
        - Data is physically stored in projections
        - Projections are similar to materialized views
          – Data is optimized for querying at insert time
        - A table has one or more projections
        - A projection contains one or more columns
        - Data can be duplicated across projections for query efficiency
  • 11. Concepts: projections
        ONE DATA, MANY PROJECTIONS
        Sorted by id:
        2014-03-16  27.21  2
        2014-03-15  23.43  3
        2014-03-16  27.75  3
        2014-03-15  23.97  4
        2014-03-16  26.67  4
        2014-03-15  25.05  6
        2014-03-15  24.51  7
        2014-03-15  25.59  7
        2014-03-16  26.13  7
        2014-03-16  28.29  7
        Sorted by date:
        2014-03-15  23.43  3
        2014-03-15  23.97  4
        2014-03-15  24.51  7
        2014-03-15  25.05  6
        2014-03-15  25.59  7
        2014-03-16  26.13  7
        2014-03-16  26.67  4
        2014-03-16  27.21  2
        2014-03-16  27.75  3
        2014-03-16  28.29  7
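The "one data, many projections" idea can be sketched as keeping the same rows in several sort orders and answering each lookup from the copy sorted on the filtered column. The `Projection` class below is a hypothetical illustration that uses binary search on a sorted copy; it says nothing about how Vertica actually chooses or stores projections.

```python
from bisect import bisect_left, bisect_right

DATE, VALUE, ID = 0, 1, 2  # column positions in each row tuple

rows = [
    ("2014-03-15", 23.43, 3), ("2014-03-15", 23.97, 4),
    ("2014-03-15", 24.51, 7), ("2014-03-15", 25.05, 6),
    ("2014-03-15", 25.59, 7), ("2014-03-16", 26.13, 7),
    ("2014-03-16", 26.67, 4), ("2014-03-16", 27.21, 2),
    ("2014-03-16", 27.75, 3), ("2014-03-16", 28.29, 7),
]


class Projection:
    """A copy of the rows kept sorted on one column (toy model)."""

    def __init__(self, rows, sort_col):
        self.rows = sorted(rows, key=lambda row: row[sort_col])
        self.keys = [row[sort_col] for row in self.rows]

    def equals(self, value):
        """Find all rows whose sort column equals value, via binary search."""
        lo = bisect_left(self.keys, value)
        hi = bisect_right(self.keys, value)
        return self.rows[lo:hi]


by_date = Projection(rows, DATE)
by_id = Projection(rows, ID)

# Each predicate is answered from the projection sorted on that column.
print(by_id.equals(7))
print(by_date.equals("2014-03-16"))
```

The price of the faster lookups is duplicated storage and extra work at insert time, which matches the trade-off described on slide 10.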
  • 12. Concepts: clustering
        - Parallel processing
          – Data segments distributed across cluster nodes
          – Performance can be increased by adding hardware
        - Reliability (K-safety)
          – Tolerates nodes going offline
        - All nodes can respond to queries → queries can be load balanced between nodes
  • 13. Concepts: clustering
        SEGMENTATION
        Node 1: SEGMENT1
        Node 2: SEGMENT2
        Node 3: SEGMENT3
        Node 4: SEGMENT4
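One common way to spread rows over segments is to hash a key column and take the result modulo the number of segments. The sketch below assumes that approach; the slides do not say which hash function Vertica uses, so CRC32 is only a stand-in, and `segment_for` is a hypothetical helper.

```python
import zlib
from collections import defaultdict

SEGMENT_COUNT = 4  # one segment per node, as on the slide


def segment_for(key, segment_count=SEGMENT_COUNT):
    """Map a key to a segment number in the range 1..segment_count."""
    digest = zlib.crc32(str(key).encode("utf-8"))  # stand-in hash function
    return digest % segment_count + 1


# Distribute some example keys; each key lands in exactly one segment.
segments = defaultdict(list)
for key in range(1, 21):
    segments[segment_for(key)].append(key)

for number in sorted(segments):
    print(f"SEGMENT{number}: {segments[number]}")
```

Because the mapping depends only on the key, any node can work out where a row lives without consulting a central directory.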
  • 14. Concepts: clustering
        K-SAFETY
        Node 1: SEGMENT1, SEGMENT2
        Node 2: SEGMENT2, SEGMENT3
        Node 3: SEGMENT3, SEGMENT4
        Node 4: SEGMENT4, SEGMENT1
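The placement on slide 14 follows a ring: each node holds its own segment plus a copy of the next node's segment. A small sketch can check what that layout tolerates. The helper names `place_segments` and `available_segments` are invented for this example, and the model ignores everything except which node holds which segment.

```python
def place_segments(node_count):
    """Node i holds segment i and a copy of segment i+1, wrapping around."""
    return {
        node: {node, node % node_count + 1}
        for node in range(1, node_count + 1)
    }


def available_segments(placement, failed_nodes):
    """Return the segments still reachable on the surviving nodes."""
    live = set()
    for node, segments in placement.items():
        if node not in failed_nodes:
            live |= segments
    return live


placement = place_segments(4)
print(placement)  # {1: {1, 2}, 2: {2, 3}, 3: {3, 4}, 4: {1, 4}}

# Any single node can go offline without losing a segment.
for node in placement:
    assert available_segments(placement, {node}) == {1, 2, 3, 4}

# Losing two neighbouring nodes loses the segment they shared.
print(available_segments(placement, {1, 2}))  # {1, 3, 4}
```

With one extra copy per segment the layout survives any single node failure, while two failures are survivable only if the failed nodes do not hold both copies of the same segment.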
  • 15. Concepts: Hybrid storage
        - Read-optimized storage (ROS)
          – On disk
          – Heavily encoded & compressed
        - Write-optimized storage (WOS)
          – In memory
          – No encoding or compression
  • 16. Concepts: Hybrid storage
        - Inserted data is first aggregated in WOS
          – Inserting to WOS is faster, due to lack of compression and disk write overheads
        - Background job moves data in batches from WOS to ROS
          – Writing to ROS is more efficient in batches
          – Querying is more efficient from ROS
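The WOS/ROS split can be modelled as an unsorted in-memory buffer that is periodically flushed into sorted, immutable batches. The `HybridStore` class below is a simplified sketch under several assumptions: the move happens synchronously once a hypothetical `moveout_threshold` is reached (rather than in a background job), sorting stands in for encoding and compression, and nothing is actually written to disk.

```python
class HybridStore:
    """Toy model of write-optimized plus read-optimized storage."""

    def __init__(self, moveout_threshold=4):
        self.moveout_threshold = moveout_threshold  # assumed batch size
        self.wos = []  # in-memory buffer, rows kept in arrival order
        self.ros = []  # sorted immutable batches, standing in for disk

    def insert(self, row):
        """Appending to the buffer is cheap: no sorting, no encoding."""
        self.wos.append(row)
        if len(self.wos) >= self.moveout_threshold:
            self.moveout()

    def moveout(self):
        """Move everything buffered in WOS into ROS as one sorted batch."""
        if self.wos:
            self.ros.append(tuple(sorted(self.wos)))
            self.wos = []

    def scan(self):
        """A query has to see both the ROS batches and the WOS buffer."""
        for batch in self.ros:
            yield from batch
        yield from self.wos


store = HybridStore(moveout_threshold=4)
for value in [5, 3, 9, 1, 7]:
    store.insert(value)

print(store.ros)  # [(1, 3, 5, 9)]
print(store.wos)  # [7]
```

Batching means the sort and the write cost are paid once per batch instead of once per row, which is the efficiency argument made on the slide.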
  • 17. Vertica feature: Pattern matching
        - Example: Finding sequences in web site log data
        - Find all sequences where user enters the site, browses and finally makes a purchase
        - Difficult to express in SQL
        - Vertica has SQL extension for finding patterns
        PATTERNS IN DATA
        user  action
        1     enter
        1     browse
        1     browse
        1     purchase
        2     enter
        2     browse
        3     enter
        3     browse
        3     purchase
  • 18. Vertica feature: Pattern matching
        - Example: find sequences where user enters a site, browses and makes a purchase
        SELECT uid, sid, ts, refurl, pageurl, action,
               event_name(), pattern_id(), match_id()
        FROM clickstream_log
        MATCH (PARTITION BY uid, sid ORDER BY ts
               DEFINE
                 Entry    AS refurl NOT ILIKE '%site.com%' AND pageurl ILIKE '%site.com%',
                 Onsite   AS pageurl ILIKE '%site.com%' AND action = 'V',
                 Purchase AS pageurl ILIKE '%site.com%' AND action = 'P'
               PATTERN P AS (Entry Onsite* Purchase)
               ROWS MATCH FIRST EVENT);
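The pattern `Entry Onsite* Purchase` is effectively a regular expression over events. To make that concrete outside the database, the sketch below encodes each user's actions from the slide 17 sample data as one letter per event and applies an ordinary regex. This is an analogy only, not how Vertica evaluates the MATCH clause; the names `SYMBOLS`, `PATTERN` and `users_with_purchase_path` are made up here.

```python
import re
from itertools import groupby

# (user, action) pairs from slide 17, already ordered by user and time.
events = [
    (1, "enter"), (1, "browse"), (1, "browse"), (1, "purchase"),
    (2, "enter"), (2, "browse"),
    (3, "enter"), (3, "browse"), (3, "purchase"),
]

SYMBOLS = {"enter": "E", "browse": "B", "purchase": "P"}
PATTERN = re.compile(r"EB*P")  # enter, any number of browses, purchase


def users_with_purchase_path(events):
    """Return users whose event sequence contains enter, browse*, purchase."""
    matched = []
    # groupby plays the role of PARTITION BY; it needs input grouped by user.
    for user, group in groupby(events, key=lambda event: event[0]):
        encoded = "".join(SYMBOLS[action] for _, action in group)
        if PATTERN.search(encoded):
            matched.append(user)
    return matched


print(users_with_purchase_path(events))  # [1, 3]
```

User 2 enters and browses but never purchases, so only users 1 and 3 match.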
  • 19. Extending Vertica
        - Custom SQL functions can be created with R, Java or C++
        - R can be used for creating scalar and transform functions
        - Java, all of the above + load functions
        - C++, all of the above + aggregate and analytic functions
  • 20. Find out more
        - Vertica free downloads available at (requires registration)
          – my.vertica.com
        - Vertica documentation available at (no registration)
          – www.vertica.com/documentation
        - C-Store research project (Vertica predecessor)
          – db.csail.mit.edu/projects/cstore/
  • 21. THANKS!
        Tommi Siivola, Software Engineer
        tommi.siivola@eficode.com
        +358 (0)50 371 9308
        eficode.fi
        "Automate or wither" and other posts on the Eficode blog.
        EFICODE.FI/BLOGI