A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East talk by Myles Collins

Legacy enterprise data warehouse (EDW) architectures, geared toward day-to-day workloads associated with operational querying, reporting, and analytics, are often ill-equipped to handle the volume of data, traffic, and varied data types associated with a modern, ad-hoc analytics platform. Faced with the challenges of increasing pipeline speed, aggregation, and visualization in a simplified, self-service fashion, organizations are increasingly turning to some combination of Spark, Hadoop, Kafka, and proven analytical databases like Vertica as key enabling technologies to optimize their EDW architecture. Join us to learn how successful organizations have developed real-time streaming solutions with these technologies for a range of use cases, including IoT predictive maintenance.

  1. HPE Vertica and Sparkitecture Overview. Myles Collins
  2. Vertica Analytics Platform. It’s a database, but:
     • Fast - Boost performance by 500% or more
     • Scalable - Handles huge workloads at high speeds
     • Standard - No need to learn new languages or add complexity (ANSI SQL, ACID)
     • Costs - Significantly lower cost over legacy platforms
  3. Apache Kafka + Spark + HPE Vertica for both Batch and Streaming Analytics. Diagram labels: HPE Vertica Analytics Platform; Analytics/Reporting; Data Generation; OLTP/ODS; Logs (Apps, Web, Devices); User tracking; Operational Metrics; Distributed Messaging System; ETL; Stream processing; SQL on Hadoop; Hive; ORC; Parquet; Raw Data Topics; JSON, AVRO; Processed Data Topics
  4. Oil and Gas Customer
  5. Foundations of Vertica
     • Columnar storage: Speeds query time by reading only necessary data
     • Compression: Lowers costly I/O to boost overall performance
     • MPP scale-out: Provides high scalability on clusters with no name node or other single point of failure
     • Distributed query: Any node can initiate the queries and use other nodes for work. No single point of failure
     • Projections: Combine high availability with special optimizations for query performance
     • Diagram labels: Memory; CPU; Disk
  6. Advanced In-database Analytics
     • SQL ‘99. Allows for: standard functionality that performs at scale
     • SQL Extensions. Allows for: sessionization; conversion analysis; fraud detection; fast aggregates (LAP)
     • SDKs. Allows for: machine learning; custom data mining; specialized parsers
     • In-database Analytics. Allows for: statistical modeling; cluster analysis; predictive analytics
     • Capabilities listed on the slide: pattern matching; event series joins; time series; event-based windows; aggregate; analytical; window functions; graph; Monte Carlo; statistical; geospatial analytics; Java; C++; R; connection; ODBC/JDBC; Hive; Hadoop; Flex zone; regression testing; K-means; statistical modeling; classification algorithms; PageRank; text mining; geospatial
  7. Innovation Through an Ecosystem-Friendly Architecture. Diagram labels: Data Transformation; Messaging; BI & Visualization; ETL; R; Java; Python; C++; ODBC, JDBC, OLEDB; User Defined Loads; Geospatial; Event Series; Time series; Text Analytics; Pattern Matching; Regression; User-Defined Functions; External tables to analyze in place; Integrated with Open Source; SQL; Real-Time; Machine Learning; User Defined Storage; Security
  8. Embracing an open source architecture
     • Spark: Vertica performs optimized load from Spark; load data from Vertica to Spark
     • Hadoop: Read native formats like ORC and Parquet; any Hadoop; run ON the Hadoop cluster or ON Vertica cluster
     • Kafka: Share data between applications that support Kafka; data streaming into Vertica
  9. Vertica Enterprise: Unique Value to expand the data warehouse
     • Hadoop Data Lake (customer information in Hadoop): CREATE TABLE customer_visits (customer_id bigint, visit_num int) PARTITIONED BY (page_view_dt date) STORED AS ORC;
     • Vertica Big Data Warehouse (customer information in Data Warehouse; ROS)
     • Vertica Engine: SELECT customers.customer_id FROM orders RIGHT OUTER JOIN customers ON orders.customer_id = customers.customer_id GROUP BY customers.customer_id HAVING COUNT(orders.customer_id) = 0;
     • Querying data that sits BOTH in the data warehouse and Hadoop is our unique value. Most solutions require that you move the data.
     • Use cases: Leveraging Web Logs to gain customer insight; Sensor and IoT data for pre-emptive service; Marketing Programs Tracking; Tracking impact of application updates; Many more uses
  10. We did the leg work
     • 3 TBs of test data
     • HPE ProLiant DL380 Gen9
     • Five nodes
     • ROS, Parquet and ORC format
     • 99 TPC-DS Queries
  11. Other technologies can’t run all the queries. Chart: passing and failing TPC-DS queries (out of 99)
     • Vertica Enterprise: 99 pass, 0 fail
     • Impala: 80 pass, 19 fail
     • Hive on Tez: 59 pass, 40 fail
     • Hive on Spark: 29 pass, 70 fail
  12. Boosting Spark with Vertica
     • Vertica Enterprise completes the benchmark in 4% of the time of Hive on Spark
     • Hive on Spark could not complete 70 of the 99 queries at all. Those queries were not compared.
     • Chart: seconds to complete TPC-DS benchmark queries (among universally runnable queries). Hive on Spark: about 11 hours. Vertica Enterprise: about 24 minutes.
  13. HPE Vertica: Architected to embrace ecosystem innovation. Diagram labels: BI/visualization; Data transformation; Platform; Advanced analytics; Cloud
  14. Learn More About (and Try!) HPE Vertica
     • Community Edition: free download, 1TB, 3 nodes
     • vertica.com
  15. Thank you
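
The query on slide 9 returns customers that have no matching orders: the outer join keeps every customer, and COUNT(orders.customer_id) is zero only when no order row matched. The following is a minimal sketch of that logic, not the presenter's setup. It assumes an in-memory SQLite database as a stand-in for Vertica, hypothetical sample rows, and a hypothetical helper name `customers_without_orders`. The join is written as an equivalent LEFT OUTER JOIN with the tables swapped, because older SQLite builds do not support RIGHT OUTER JOIN.

```python
import sqlite3


def customers_without_orders(customers, orders):
    """Return ids of customers that have no matching row in orders.

    `customers` is a list of customer ids; `orders` is a list of
    (order_id, customer_id) tuples. Both are hypothetical sample data.
    """
    conn = sqlite3.connect(":memory:")  # SQLite stands in for Vertica here
    conn.execute("CREATE TABLE customers (customer_id INTEGER)")
    conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
    conn.executemany("INSERT INTO customers VALUES (?)", [(c,) for c in customers])
    conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)

    # Same logic as the slide's RIGHT OUTER JOIN, with the operands swapped.
    # COUNT(orders.customer_id) ignores the NULLs produced for unmatched customers.
    rows = conn.execute(
        """
        SELECT customers.customer_id
        FROM customers
        LEFT OUTER JOIN orders
          ON orders.customer_id = customers.customer_id
        GROUP BY customers.customer_id
        HAVING COUNT(orders.customer_id) = 0
        ORDER BY customers.customer_id
        """
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


sample_customers = [1, 2, 3, 4]
sample_orders = [(100, 1), (101, 1), (102, 3)]
print(customers_without_orders(sample_customers, sample_orders))
```

With the sample rows above, customers 2 and 4 have no orders, so those are the ids returned.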
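
The figures on slides 11 and 12 can be cross-checked with a little arithmetic. The sketch below assumes the pass counts as read from the slide 11 chart and treats the "about 11 hours" and "about 24 minutes" labels as exact values, so the result is only approximate. The names `PASSED`, `failed_queries`, and `time_fraction` are illustrative, not from the talk.

```python
TOTAL_QUERIES = 99  # TPC-DS query count stated on slide 10

# Passing-query counts as read from the slide 11 chart (an interpretation
# of the chart labels, not an independently verified result).
PASSED = {
    "Vertica Enterprise": 99,
    "Impala": 80,
    "Hive on Tez": 59,
    "Hive on Spark": 29,
}


def failed_queries(engine):
    """Number of the 99 queries the engine did not complete."""
    return TOTAL_QUERIES - PASSED[engine]


def time_fraction(fast_seconds, slow_seconds):
    """Fraction of the slower runtime that the faster runtime represents."""
    return fast_seconds / slow_seconds


# Approximate runtimes from slide 12, converted to seconds.
VERTICA_SECONDS = 24 * 60          # "about 24 minutes"
HIVE_ON_SPARK_SECONDS = 11 * 3600  # "about 11 hours"

fraction = time_fraction(VERTICA_SECONDS, HIVE_ON_SPARK_SECONDS)

for engine in PASSED:
    print(f"{engine}: {PASSED[engine]} pass, {failed_queries(engine)} fail")
print(f"Vertica time as a share of Hive on Spark time: {fraction:.1%}")
```

Under these assumptions the share comes out at roughly 3.6 percent, which is consistent with the rounded 4% quoted on slide 12, and the 70 failures computed for Hive on Spark match the count stated there.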