1 © Hortonworks Inc. 2011–2018. All rights reserved
Apache Hive 3.0, A New Horizon
Alan Gates
Hortonworks Co-founder, Apache Hive PMC member
@alanfgates
Apache Hive – Data Warehousing for Big Data
• Comprehensive ANSI SQL
• Only open source Hadoop SQL with transactions, INSERT/UPDATE/DELETE/MERGE
• BI queries with MPP performance at big data scales
• ETL jobs scale with your cluster
• Enables per-user dynamic row and column security
• Enables replication for HA and DR
• Compatible with every major BI tool
• Proven at 300+ PB scale
Hive on Tez
[Diagram: SQL queries arrive over ODBC/JDBC at HiveServer2 (query endpoint). Tez AMs coordinate query executors running in Tez containers on the Hadoop cluster. Deep storage: HDFS and compatible stores (S3, WASB, Isilon).]
Hive LLAP - MPP Performance at Hadoop Scale
[Diagram: SQL queries arrive over ODBC/JDBC at HiveServer2 (query endpoint). Query coordinators dispatch work to query executors running in LLAP daemons on the Hadoop cluster, backed by an in-memory cache shared across all users. Deep storage: HDFS and compatible stores (S3, WASB, Isilon).]
Hive3: EDW Analyst Pipeline
[Diagram: BI tools sit on top of these Hive 3 components: Materialized view, Surrogate key, Constraints, Query Result Cache, Workload management, ACID v2, Cloud Storage, Connectors.]
• Results return from HDFS/cache directly
• Reduce load from repetitive queries
• Allows more queries to be run in parallel
• Reduce resource starvation in large clusters
• Active/Passive HA
• More "tools" for optimizer to use
• More "tools" for DBAs to tune/optimize
• Invisible tuning of DB from users' perspective
• ACID v2 is as fast as regular tables
• Hive 3 is optimized for S3/WASB/GCP
• Support for JDBC/Kafka/Druid out of the box
New SQL Features
Transactional Read and Write
• Originally, Hive supported writes only by adding partitions or loading new files into existing partitions
• Starting in version 0.13, Hive added transactions and INSERT, UPDATE, DELETE
• Supports:
  • Slowly changing dimensions
  • Correcting mis-loaded data
  • GDPR's right to be forgotten
• Not OLTP!
• Drawbacks:
  • Transactional tables had to be stored in ORC and had to be bucketed
  • Reading transactional tables was significantly slower than reading non-transactional tables
  • No support for MERGE or UPSERT functionality
ACID v2
• In 3.0 ACID storage has been reworked
• Performance penalty for ACID is now negligible even when compactor has not run
• With other optimizations ACID can result in speed up (more on this in the performance talk)
• Added MERGE support
• CDC can be regularly merged into a fact table with upsert functionality
• Removed restrictions:
• Tables no longer have to be bucketed
• Non-ORC based tables supported (INSERT & SELECT only)
• Still not OLTP!
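The CDC upsert pattern mentioned on this slide might look like the following HiveQL sketch. The table names customer_dim and customer_changes, their columns, and the op change-type flag are hypothetical, chosen only for illustration; they do not come from the talk.

```sql
-- Hypothetical target table. ACID tables are declared with 'transactional'='true'.
CREATE TABLE customer_dim (
  id    INT,
  name  STRING,
  city  STRING
) STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

-- Hypothetical staging table holding one batch of change records.
-- op is an assumed flag: 'D' marks a delete, any other value an insert or update.
CREATE TABLE customer_changes (
  id    INT,
  name  STRING,
  city  STRING,
  op    STRING
);

-- Apply the batch in one statement: delete, update, or insert per source row.
MERGE INTO customer_dim AS t
USING customer_changes AS s
ON t.id = s.id
WHEN MATCHED AND s.op = 'D' THEN DELETE
WHEN MATCHED THEN UPDATE SET name = s.name, city = s.city
WHEN NOT MATCHED AND s.op <> 'D' THEN INSERT VALUES (s.id, s.name, s.city);
```

Because both a DELETE and an UPDATE branch are present, the first WHEN MATCHED clause carries an extra condition so the two branches do not overlap.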
Constraints & Defaults
• Helps optimizer to produce better plans
• BI tool integrations
• Data Integrity
• hive.constraint.notnull.enforce = true
• SQL compatibility & offload scenarios
Example:
CREATE TABLE Persons (
  ID Int NOT NULL,
  Name String NOT NULL,
  Age Int,
  Creator String DEFAULT CURRENT_USER(),
  CreateDate Date DEFAULT CURRENT_DATE(),
  PRIMARY KEY (ID) DISABLE NOVALIDATE
);
CREATE TABLE BusinessUnit (
  ID Int NOT NULL,
  Head Int NOT NULL,
  Creator String DEFAULT CURRENT_USER(),
  CreateDate Date DEFAULT CURRENT_DATE(),
  PRIMARY KEY (ID) DISABLE NOVALIDATE,
  CONSTRAINT fk FOREIGN KEY (Head)
    REFERENCES Persons(ID) DISABLE NOVALIDATE
);
Hive Native Replication
• REPL commands added to support replication
• Replication currently done at database level (all tables etc. in the db)
• Copies data together with metadata
• Master/slave on db level
• When first set up, existing data is copied
• Then incremental replication - only copies changes
• Hive itself provides primitives for replication, not active daemons
• Used by Hortonworks Data Lifecycle Manager to provide High Availability and Disaster Recovery
• Replication can be between two clusters or between cluster and cloud
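A bootstrap-then-incremental cycle with the REPL commands could be sketched as follows. The database names sales and sales_replica are hypothetical, and the angle-bracket values are placeholders for what the previous command returns. The syntax shown follows the Hive 3.x form as an assumption; options and required database properties vary by version, so check the documentation for your release.

```sql
-- On the source cluster: bootstrap dump of the hypothetical database sales.
-- The command returns a dump directory and the last replicated event id.
REPL DUMP sales;

-- On the target cluster: load the dump into the replica database.
-- '<dump_dir>' is a placeholder for the directory returned by REPL DUMP.
REPL LOAD sales_replica FROM '<dump_dir>';

-- Later, on the source: incremental dump of only the events after the last id.
-- <last_repl_id> is a placeholder for the id returned by the previous dump.
REPL DUMP sales FROM <last_repl_id>;

-- On the target: report the last event id that has been replicated.
REPL STATUS sales_replica;
```

Since Hive provides only these primitives, something external (a scheduler, or a tool such as Data Lifecycle Manager) has to run the dump and load steps repeatedly.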
Plus More
• Materialized Views with refresh (more in the performance talk)
• Surrogate keys – default values, unique, not monotonically increasing
• SQL Standard Information Schema now supported
• Ranger can now enforce authorization policies for use of global non-builtin UDFs
• Support for TIMESTAMP WITH LOCAL TIME ZONE data type
• In Hive 3 much work has been done to optimize Hive for object stores
• Hive uses its ACID system to determine which files to read rather than trust the storage
• Moves eliminated wherever possible
• More aggressive caching of file metadata and data to reduce file system operations
• Apache Parquet and text files now supported in LLAP
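Three of the features listed here can be sketched together in HiveQL. The orders table, its columns, and the daily_sales view name are hypothetical; the sketch assumes a Hive 3 release in which SURROGATE_KEY() is accepted as a column default.

```sql
-- Hypothetical fact table. order_id is filled by SURROGATE_KEY(), which yields
-- unique values that are not guaranteed to be monotonically increasing.
CREATE TABLE orders (
  order_id  BIGINT DEFAULT SURROGATE_KEY(),
  customer  STRING,
  amount    DECIMAL(10,2),
  order_dt  DATE
) STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

-- order_id is omitted, so the default expression supplies it.
INSERT INTO orders (customer, amount, order_dt)
VALUES ('acme', 10.00, DATE '2018-06-01');

-- Materialized view holding a pre-aggregated result.
CREATE MATERIALIZED VIEW daily_sales AS
SELECT order_dt, SUM(amount) AS total
FROM orders
GROUP BY order_dt;

-- Refresh the view after the source table has changed.
ALTER MATERIALIZED VIEW daily_sales REBUILD;

-- The SQL-standard information schema is queried like any other database.
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'default';
```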
Workload Management
LLAP Workload Management
• Effectively share LLAP cluster resources
• Resource allocation per user policy; separate ETL and BI, etc.
• Resource-based guardrails
• Protect against long running queries, high memory usage
• Improved, query-aware scheduling
• Scheduler is aware of query characteristics, types, etc.
• Fragments easy to pre-empt compared to containers
• Queries get guaranteed fractions of the cluster, but can use empty space
Guardrail Example
Common Triggers
● ELAPSED_TIME
● EXECUTION_TIME
● TOTAL_TASKS
● HDFS_BYTES_READ, HDFS_BYTES_WRITTEN
● CREATED_FILES
● CREATED_DYNAMIC_PARTITIONS
Example
CREATE RESOURCE PLAN guardrail;
CREATE TRIGGER guardrail.long_running WHEN EXECUTION_TIME > 2000 DO KILL;
ALTER TRIGGER guardrail.long_running ADD TO UNMANAGED;
ALTER RESOURCE PLAN guardrail ENABLE ACTIVATE;
Resource Plans Example
CREATE RESOURCE PLAN daytime;
CREATE POOL daytime.bi WITH ALLOC_FRACTION=0.8, QUERY_PARALLELISM=5;
CREATE POOL daytime.etl WITH ALLOC_FRACTION=0.2, QUERY_PARALLELISM=20;
CREATE RULE downgrade IN daytime WHEN total_runtime > 3000 THEN MOVE etl;
ADD RULE downgrade TO bi;
CREATE APPLICATION MAPPING tableau in daytime TO bi;
ALTER PLAN daytime SET default pool= etl;
APPLY PLAN daytime;
daytime
bi: 80% etl: 20%
Downgrade when total_runtime>3000
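The statements on this slide appear to use an early form of the syntax (CREATE RULE, APPLY PLAN). The same plan written with the CREATE TRIGGER and ALTER RESOURCE PLAN forms from the guardrail example might look like the sketch below. The MOVE TO action, the ADD TO POOL clause, the quoted mapping name, and the substitution of ELAPSED_TIME for total_runtime are assumptions to verify against your Hive version.

```sql
CREATE RESOURCE PLAN daytime;

-- Two pools: BI gets 80% of the cluster, ETL gets 20%.
CREATE POOL daytime.bi  WITH ALLOC_FRACTION=0.8, QUERY_PARALLELISM=5;
CREATE POOL daytime.etl WITH ALLOC_FRACTION=0.2, QUERY_PARALLELISM=20;

-- Demote queries that run too long from the bi pool to the etl pool.
-- ELAPSED_TIME stands in for the slide's total_runtime counter (assumption).
CREATE TRIGGER daytime.downgrade WHEN ELAPSED_TIME > 3000 DO MOVE TO etl;
ALTER TRIGGER daytime.downgrade ADD TO POOL bi;

-- Route queries from the application named tableau to the bi pool;
-- everything else lands in etl.
CREATE APPLICATION MAPPING 'tableau' IN daytime TO bi;
ALTER RESOURCE PLAN daytime SET DEFAULT POOL = etl;

-- Depending on the version, a pool named default may be created along with
-- the plan and may need to be dropped or resized before the plan validates.
ALTER RESOURCE PLAN daytime ENABLE ACTIVATE;
```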
Connectors
EDW Ingestion Pipeline
[Diagram: Kafka-Druid-Hive ingest and Kafka-Hive streaming ingest feed Druid and ACID tables, which are queried through the LLAP interface. JDBC sources: MySQL, Postgres, Oracle.]
Real-time analytics
• Druid answers in near real-time
• JDBC sources
• Kafka sources
Easy to use
• Query any data via LLAP
• No need to de-ACID tables
• No bucketing required
• Calcite talks SQL
• Materialization just works
• Cache just works
Kafka Connector
Connect
● You say stream, I say table!
● Define a time-based view over the stream (e.g. last 15 mins)
● Enforce Hive authentication & authorization out of the box
Analyze
● Kafka metadata (key, partition, offset, timestamp) as first-class columns
● Time travel based on a time predicate
● Seek to offset or partition based on a filter predicate
Transform
● Join stream to stream or stream to table
● Offload data from Kafka to Hive ACID tables exactly once
● Produce data and write it back to Kafka
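A Kafka topic exposed as a Hive table, with a time-based view on top, might be declared as in this sketch. The topic name events, the broker address, the payload columns, and the view name are hypothetical. The storage handler class and the double-underscore metadata column names follow the Apache Hive Kafka storage handler; whether they are available depends on the Hive build in use.

```sql
-- Hypothetical topic and payload columns. The handler adds the Kafka metadata
-- columns (__key, __partition, __offset, __timestamp) automatically.
CREATE EXTERNAL TABLE kafka_events (
  `event_time` TIMESTAMP,
  `user_name`  STRING,
  `page`       STRING
)
STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
TBLPROPERTIES (
  "kafka.topic" = "events",
  "kafka.bootstrap.servers" = "localhost:9092"
);

-- Time-based view over the stream: only records from the last 15 minutes.
-- __timestamp is in milliseconds, hence the factor of 1000.
CREATE VIEW events_last_15_min AS
SELECT `__partition`, `__offset`, `__timestamp`, `user_name`, `page`
FROM kafka_events
WHERE `__timestamp` >
      1000 * to_unix_timestamp(CURRENT_TIMESTAMP - INTERVAL '15' MINUTES);

-- The view can then be queried or joined like any other table.
SELECT `page`, COUNT(*) AS views
FROM events_last_15_min
GROUP BY `page`;
```

The predicate on `__timestamp` is what lets the connector seek within the topic instead of scanning it from the beginning.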
[Diagram: Spark (Driver, Executors, Spark Meta) reaches Hive through HWC (JDBC). Hive side: HiveServer+Tez, LLAP Daemons, MetaStore (Hive Meta), and ACID tables. Numbered arrows 1 to 3 correspond to the steps below.]
1. Driver submits query to HiveServer
2. HiveServer compiles the query and returns "splits" to the Driver
3. Query executes on LLAP
hive.executeQuery("SELECT * FROM t").sort("A").show()
[Diagram: Spark (Driver, Executors, Spark Meta) reads from Hive through HWC (Arrow). Hive side: HiveServer+Tez, LLAP Daemons, MetaStore (Hive Meta), and ACID tables. Numbered arrows 4 to 6 correspond to the steps below.]
4. Executor tasks run for each split
5. Tasks read Arrow data from LLAP
6. HWC returns ArrowColumnVectors to Spark
hive.executeQuery("SELECT * FROM t").sort("A").show()
JDBC Connector
• How did we build the information_schema?
• We mapped the metastore into Hive's table space!
• Uses Hive-JDBC connector
• Read-only for now
• Supports automatic pushdown of full subqueries
• The cost-based optimizer decides which parts of a query run in the RDBMS versus Hive
• Joins, aggregates, filters, projections, etc.
In Postgres:
CREATE TABLE postgres_table (
  id INT,
  name varchar
);
In Hive:
CREATE EXTERNAL TABLE hive_table (
  id INT,
  name STRING
) STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES (
  "hive.sql.database.type" = "POSTGRES",
  "hive.sql.jdbc.driver" = "org.postgresql.Driver",
  "hive.sql.jdbc.url" = "jdbc:postgresql://...",
  "hive.sql.dbcp.username" = "jdbctest",
  "hive.sql.dbcp.password" = "",
  "hive.sql.query" = "select * from postgres_table",
  "hive.sql.column.mapping" = "id=ID, name=NAME",
  "hive.jdbc.update.on.duplicate" = "true"
);
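One way to see how a query is split between the RDBMS and Hive is to look at its plan. The sketch below joins hive_table from this slide with a hypothetical native Hive table named orders, which has an assumed customer_id column; neither orders nor that column comes from the talk.

```sql
-- orders is a hypothetical native Hive table with a customer_id column.
-- EXPLAIN prints the plan without running the query, so the SQL that the
-- optimizer chooses to send to Postgres (if any) can be inspected there.
EXPLAIN
SELECT h.name, COUNT(*) AS order_count
FROM hive_table h
JOIN orders o ON o.customer_id = h.id
WHERE h.id > 100
GROUP BY h.name;
```

In a plan like this, the filter on h.id is a natural candidate for pushdown into the Postgres query, while the join with the native table has to run in Hive.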
Usability: Data Analytics Studio
Hortonworks Data Analytics Studio
One of the Extensible DataPlane Services
⬢ DAS 1.0 available now for HDP 3.0!
⬢ Monthly release cadence
⬢ Replaces Hive & Tez Views
⬢ Separate install from stack
[Diagram: Hortonworks DataPlane Service. Core capabilities: data source integration, data services catalog, security controls, multiple clusters and sources. Extensible services: Data Lifecycle Manager, Data Steward Studio, Data Analytics Studio, IBM DSX*, and other partner services. *not yet available, coming soon]
SOLUTIONS: Pre-defined searches to quickly narrow down problematic queries in a large cluster
SOLUTIONS: Full-featured auto-complete, direct download of results, quick data preview, and many other quality-of-life improvements
SOLUTIONS: Heuristic recommendation engine for fully self-service query and storage optimization
SOLUTIONS: Data Analytics Studio provides a database heatmap to quickly discover which parts of your cluster are utilized most
Superset UI for Fast, Interactive Dashboards and Exploration
Coming Soon
Hive on Kubernetes (WIP)
⬢ Hive on Kubernetes solves:
– Hive/LLAP side install (to main cluster)
– Multiple versions of Hive
– Multiple warehouse & compute instances
– Dynamic configuration and secrets management
– Stateful and work preserving restarts (cache)
– Rolling restart for upgrades. Fast rollback to previous good state.
[Diagram: Hive on Kubernetes architecture. Kubernetes hosting environments: cloud providers (AWS, GCP, Azure), OpenShift, on-prem/hybrid, providing CPU, memory, and storage to the Data OS. Data plane services: Cluster Lifecycle Manager, Data Analytics Studio (DAS), Organizational Services. Shared services and compute cluster components: Ranger, Atlas, Metastore, Tiller, API Server, DAS Web Service, Query Coordinators, Query Executors, Registry, Blobstore, Indexer, RDBMS, Hive Server, split across a long-running Kubernetes cluster and an ephemeral Kubernetes cluster. Legend: inter-cluster communication (ingress controller or load balancer); intra-cluster communication (internal service endpoint for ReplicaSet or StatefulSet).]
Questions?
Thank You

What is new in Apache Hive 3.0?