
Hive 3.0: New Features and Performance Improvements in the Latest Version of HDP

Apache Hive is a rapidly evolving project that is widely used across the big data ecosystem.

Hive continues to expand its support for analytics, reporting, and interactive queries, and the community is working to improve support for many other aspects and use cases as well.

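This seminar gives an overview of the latest features and optimizations in Hive, including LLAP, materialized views and Apache Druid integration, workload management, ACID improvements, using Hive in the cloud, and benchmarks that highlight the performance improvements.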


  1. © Hortonworks Inc. 2011–2018. All rights reserved. Hive 3.0: New Features and Performance Improvements in the Latest Version of HDP. Zhen Zeng, Solution Engineer, 2018/09/21
  2. Agenda • About the speaker • Hive basics • New features in Hive 3 • Summary
  3. About the speaker
  4. About the speaker: Zhen Zeng (曾臻), Solution Engineer at Hortonworks. Previously worked as an engineer at Yahoo, an IT consulting firm, and a systems integrator. Experienced in architecture, design, and implementation for big data, data governance, PaaS, and web applications.
  5. Hive basics
  6. Questions for the audience ▪ Who is using Hive today? ▪ Who thinks Hive can only be used for batch processing?
  7. SQL Is King ▪ Powerful ▪ Familiar ▪ Flexible ▪ Widely adopted ▪ Rich set of surrounding tools (deep ecosystem)
  8. RDBMS vs SQL on Hadoop. [Diagram] RDBMS (one logical system): client, SQL engine, metadata, and data (tables). SQL on Hadoop (could be in three separate systems): client, SQL engine, metadata in e.g. MySQL, and data in HDFS.
  9. SQL on Hadoop: Schema on Read. [Diagram] RDBMS (schema on write): 1. CREATE TABLE … 2. INSERT INTO … (schema checked ✔) 3. SELECT … SQL on Hadoop (schema on read): 1. Ingest data into HDFS (Flume) 2. CREATE TABLE … 3. SELECT … (schema checked ✔). Components: client, SQL engine, HDFS, metadata in e.g. MySQL.
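The contrast on slide 9 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-column `SCHEMA`, the class names, and the comma-separated raw format are hypothetical, and returning `None` for an unparseable field stands in for a NULL at read time.

```python
# Hypothetical two-column schema, used only for this illustration.
SCHEMA = [("id", int), ("name", str)]


class SchemaOnWriteTable:
    """RDBMS style: every row is checked against the schema at INSERT time."""

    def __init__(self):
        self.rows = []

    def insert(self, fields):
        if len(fields) != len(SCHEMA):
            raise ValueError("wrong number of fields")
        # int("x") raises ValueError, so a bad row never reaches storage.
        self.rows.append({name: cast(v) for (name, cast), v in zip(SCHEMA, fields)})

    def select(self):
        return list(self.rows)


class SchemaOnReadTable:
    """SQL-on-Hadoop style: raw lines are stored as-is and parsed at SELECT time."""

    def __init__(self):
        self.raw_lines = []

    def ingest(self, line):
        self.raw_lines.append(line)  # no validation at ingest

    def select(self):
        result = []
        for line in self.raw_lines:
            fields = line.split(",")
            row = {}
            for i, (name, cast) in enumerate(SCHEMA):
                try:
                    row[name] = cast(fields[i])
                except (IndexError, ValueError):
                    row[name] = None  # unparseable field surfaces as a NULL-like value
            result.append(row)
        return result
```

With schema on write the bad row fails at load time; with schema on read the load always succeeds and the mismatch only shows up when the data is queried.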
  10. SQL on Hadoop: Distributed Processing. [Diagram] RDBMS: client and a single server (machine 1). SQL on Hadoop: client, Hive Server (machine 1), and Data Nodes on machines 2, 3, 4, 5, ... n.
  11. What is Apache Hive? Apache Hive: SQL gateway to Hadoop. Features: • Extensive SQL:2011 support • ACID transactions • In-memory caching • Cost-based optimizer • User-based dynamic security • Replication and disaster recovery • JDBC and ODBC support • Compatible with every major BI tool • Proven on data at 300+ PB scale
  12. Hive LLAP: Hadoop-native solution for interactive analytics • Open source • Hadoop-native integration • Security. [Diagram labels: HiveServer2, LLAP, Spark, MR, Pig, YARN, HDFS; workloads: ETL, interactive queries]
  13. Apache Hive: Modern Architecture. [Diagram] Storage: columnar storage (ORCFile, Parquet); unstructured data (JSON, CSV, Text, Avro, custom, weblog). Engine: SQL engines (row engine, vector engine). SQL: SQL support (SQL:2011), optimizer, HCatalog, HiveServer2. Cache: block cache (Linux cache), vector cache. Distributed execution: Hadoop 1 (MapReduce), Hadoop 2 (Tez, Spark), LLAP persistent server. Clients: BI tools over JDBC/ODBC. Legend: historical, current.
  14. Apache Hive: Fast Facts • Most queries per hour: 100,000 queries per hour (Yahoo Japan) • Analytics performance: 100 million rows/second per node (with Hive LLAP) • Largest Hive warehouse: 300+ PB raw storage (Facebook) • Largest cluster: 4,500+ nodes (Yahoo)
  15. The evolution of Hive: MR, Tez, Tez + LLAP. [Diagram] MapReduce: intermediate results in HDFS. Tez: optimized pipeline. Tez with LLAP: resident process on nodes, in-memory columnar cache. Map tasks read HDFS. "Hive can only do batch" is a thing of the past: processing speed has improved dramatically since Hive 2.
  16. Continuous evolution towards scalable performance. [Diagram: speedups of 20X and 100X across Hive + MR, Hive + Tez, and Hive + Tez + LLAP] Hive 1.0, batch SQL: • ETL • Reporting • Data mining. Hive 1.3, analytic SQL: • Deep analytics • Reporting • BI tools: MicroStrategy, Cognos. Hive 2.0, interactive SQL: • Ad hoc • Drill-down • Agile BI tools: Tableau, Power BI. Hive 3.0 (HDP 3), interactive SQL: • Ad hoc • Result cache • Better workload management • Agile BI tools: Tableau, Power BI
  17. New features in Hive 3
  18. Hive 3: a major step forward • Even more use cases: EDW offload, interactive query, OLAP query, real-time ingestion • Unified SQL: data federation (SQL Server, Oracle, etc.), Spark-Hive connector • High performance: low latency, fast response time • Cloud native: S3, GCS, Azure. Highlights: + Real-time data streams + Workload management + ACID transactions + Materialized views + Scales horizontally to petabytes
  19. Hive LLAP: MPP Performance at Hadoop Scale. [Diagram] SQL queries arrive over ODBC/JDBC at HiveServer2 (query endpoint); query coordinators; LLAP daemons, each with query executors and an in-memory cache; YARN cluster resource management; deep storage (HDFS and compatible: S3, WASB, Isilon); BI pool, ETL pool, background pool. Batch and interactive workloads can easily run side by side on the same cluster.
  20. EDW analyst pipeline (Tableau, BI systems). Query result cache: • Results return from HDFS/cache directly • Reduces load from repetitive queries. Workload management: • Even more queries can run concurrently • Reduces resource starvation in large clusters • Also: active/passive HA. Materialized views, surrogate keys, constraints: • More "tools" for the optimizer to use • More "tools" for DBAs to tune/optimize • Invisible tuning of the DB from the users' perspective. ACID v2 & ACID on by default: • ACID v2 is as fast as regular tables
  21. HIVE-18513: Query result cache. Returns query results directly from storage (e.g. HDFS) without actually executing the query. Prerequisite: the same query has been executed before. Dashboards and reports often issue duplicate queries, so the cache saves resources and improves performance. [Figure: without cache vs with cache]
  22. HIVE-18513: Query result cache details ⬢ hive.query.results.cache.enabled=true (on by default) ⬢ Works only for Hive managed tables: if you JOIN an external table with a Hive managed table, Hive falls back to executing the full query, because Hive cannot know whether the external table data has changed ⬢ Works with ACID: if the Hive table has been updated, the query is rerun automatically ⬢ Different from the LLAP cache: the LLAP cache caches the data that was read, so multiple queries benefit by avoiding reads from disk, which speeds up the read path; the result cache effectively bypasses execution of the query ⬢ Stored at /tmp/hive/__resultcache__/; default space is 2 GB; LRU eviction; the size can be changed with hive.query.results.cache.max.size (bytes)
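The rules on slide 22 (managed tables only, rerun after an update, LRU eviction under a size cap) can be modelled with a small in-memory cache. This is a sketch, not Hive's implementation: the `ResultCache` class, the per-table `write_version` counter, and the use of `len(repr(result))` as a size estimate are all assumptions made for the illustration.

```python
from collections import OrderedDict


class ResultCache:
    """Toy model of a query result cache with LRU eviction."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used = 0
        self.entries = OrderedDict()  # query text -> (table snapshot, result, size)

    def run(self, query, tables, execute):
        """Return (result, served_from_cache).

        tables maps each table the query reads to (is_managed, write_version).
        """
        if not all(managed for managed, _ in tables.values()):
            return execute(), False  # external table involved: always execute
        snapshot = {name: version for name, (_, version) in tables.items()}
        hit = self.entries.get(query)
        if hit is not None and hit[0] == snapshot:
            self.entries.move_to_end(query)  # mark as most recently used
            return hit[1], True
        result = execute()  # first run, or a table changed since the cached run
        self._store(query, snapshot, result)
        return result, False

    def _store(self, query, snapshot, result):
        size = len(repr(result))  # crude stand-in for the on-disk result size
        old = self.entries.pop(query, None)
        if old is not None:
            self.used -= old[2]
        if size > self.max_bytes:
            return  # too large to cache at all
        while self.used + size > self.max_bytes:
            _, evicted = self.entries.popitem(last=False)  # drop least recently used
            self.used -= evicted[2]
        self.entries[query] = (snapshot, result, size)
        self.used += size
```

Keying the entry on the table versions is what makes the cache safe with ACID tables in this model: any write bumps the version, so the stale entry no longer matches and the query is executed again.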
  23. LLAP resource management in 3.x • Resource plans, e.g. Daytime, Nighttime, EndOfQuarter • Resource pools: capacity based; fair or FIFO scheduling • Automatic mapping: map query to pool by user | group | application • Triggers to move queries or kill queries, e.g. when the output is too large or the run time is too long
  24. Resource plan example:
      CREATE RESOURCE PLAN daytime;
      CREATE POOL daytime.bi WITH ALLOC_FRACTION=0.8, QUERY_PARALLELISM=5;
      CREATE POOL daytime.etl WITH ALLOC_FRACTION=0.2, QUERY_PARALLELISM=20;
      CREATE RULE downgrade IN daytime WHEN total_runtime > 300 THEN MOVE etl;
      ADD RULE downgrade TO bi;
      CREATE APPLICATION MAPPING tableau in daytime TO bi;
      ALTER PLAN daytime SET default pool= etl;
      APPLY PLAN daytime;
      [Diagram] daytime: bi 75%, etl 25%. Downgrade when total_runtime > 300 (the query moves to another queue).
  25. HIVE-17481: LLAP workload management ⬢ Share LLAP cluster resources efficiently: resource allocation per user policy; separate ETL and BI, etc. ⬢ Resource-based guardrails: protect against long-running queries and high memory usage ⬢ Improved, query-aware scheduling: the scheduler is aware of query characteristics, types, etc.; fragments are easier to pre-empt than containers; queries are guaranteed a set fraction of the cluster's resources and can also use idle resources so that nothing goes to waste
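How pools, application mappings, and triggers interact in the workload management slides can be sketched as a small Python model. All class and field names here (`Pool`, `Trigger`, `ResourcePlan`, `tick`) are hypothetical and chosen for the illustration; the numbers mirror the "daytime" plan from slide 24, and the model ignores scheduling and resource guarantees entirely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pool:
    name: str
    alloc_fraction: float
    query_parallelism: int


@dataclass
class Trigger:
    name: str
    pool: str                    # pool the trigger is attached to
    max_runtime_s: float
    action: str                  # "MOVE" or "KILL"
    target: Optional[str] = None


@dataclass
class Query:
    qid: str
    app: str
    pool: str = ""
    runtime_s: float = 0.0
    killed: bool = False


class ResourcePlan:
    def __init__(self, pools, triggers, app_mappings, default_pool):
        self.pools = {p.name: p for p in pools}
        self.triggers = triggers
        self.app_mappings = app_mappings
        self.default_pool = default_pool

    def admit(self, query):
        """Map the query to a pool by application, else use the default pool."""
        query.pool = self.app_mappings.get(query.app, self.default_pool)
        return query

    def tick(self, query, elapsed_s):
        """Advance the query's runtime and apply the first trigger that fires."""
        query.runtime_s += elapsed_s
        for t in self.triggers:
            if t.pool == query.pool and query.runtime_s > t.max_runtime_s:
                if t.action == "KILL":
                    query.killed = True
                elif t.action == "MOVE":
                    query.pool = t.target
                break
        return query


daytime = ResourcePlan(
    pools=[Pool("bi", 0.8, 5), Pool("etl", 0.2, 20)],
    triggers=[Trigger("downgrade", pool="bi", max_runtime_s=300, action="MOVE", target="etl")],
    app_mappings={"tableau": "bi"},
    default_pool="etl",
)
```

A Tableau query starts in `bi` and is moved to `etl` once its runtime passes 300 seconds, which keeps a runaway dashboard query from occupying one of the few BI slots.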
  26. Tuning for higher concurrency ▪ NVMe SSDs • Metastore DB backend: 50x to 60x improvement in query cache performance (hive_locks) • NameNode: 5-6x improvement in JDBC startup performance; keep NameNode edit logs on SSD • ZooKeeper: RM state store, HS2 active/passive info, LLAP service registry • HDFS: /tmp folder, YARN logs and YARN local ▪ Running multiple HS2 servers side by side • Does not support workload management in HDP 3.0
  27. EDW Features
  28. Materialized views & DW optimizations • Accelerate aggregates and joins with MVs • View navigation via CBO/Calcite • Optionally allow rewrites against out-of-date materializations
  29. Materialized view. How many unique city pairs are there?
      SELECT count(*)/2 FROM ( SELECT dest, origin, count(*) FROM flights_hdfs GROUP BY dest, origin ) AS T;
      The sub-query can be materialized:
      CREATE MATERIALIZED VIEW mv1 AS SELECT dest, origin, count(*) FROM flights_hdfs GROUP BY dest, origin;
  30. Materialized view navigation: the query planner automatically navigates to existing views.
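The idea behind materialized view navigation can be sketched in Python with the city-pair query from slide 29. This is a deliberately simplified model: the `Planner` class is hypothetical and matches views by an exact definition string, whereas the deck says Hive navigates to views via CBO/Calcite. The sample `flights` rows are invented for the illustration.

```python
from collections import Counter

# Hypothetical sample of (dest, origin) rows standing in for flights_hdfs.
flights = [
    ("SFO", "JFK"), ("JFK", "SFO"), ("SFO", "JFK"),
    ("LAX", "SFO"), ("SFO", "LAX"),
]


def group_by_dest_origin(rows):
    """Equivalent of: SELECT dest, origin, count(*) ... GROUP BY dest, origin."""
    return Counter(rows)


class Planner:
    def __init__(self, base_rows):
        self.base_rows = base_rows
        self.views = {}       # definition -> precomputed result
        self.base_scans = 0   # how often a query had to scan the base table

    def create_materialized_view(self, definition, compute):
        self.views[definition] = compute(self.base_rows)

    def run(self, definition, compute):
        if definition in self.views:
            return self.views[definition]  # answered from the materialization
        self.base_scans += 1
        return compute(self.base_rows)


planner = Planner(flights)
DEFINITION = "dest,origin,count(*) group by dest,origin"
```

Once the view exists, the outer `count(*)/2` only has to count the precomputed groups instead of rescanning every flight.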
  31. Hive + Druid: one SQL interface across real-time and historical OLAP cubes. [Diagram labels: streaming data, historical data, SQL tables, OLAP cubes, unified SQL layer, pre-aggregate, ACID MERGE, real-time query, deep analytics] • Easily ingest event data into OLAP cubes • Keep data up to date with Hive MERGE • Build OLAP cubes from Hive • Archive data to Hive for history • Run OLAP queries in real time or deep analytics over all history
  32. Information schema. Question: can we find out which tables in which databases have a column whose name contains "ssn"?
      SELECT columns.table_schema, columns.table_name FROM information_schema.columns WHERE column_name LIKE '%ssn%';
      This is very useful for EDW offload use cases where some queries depend on the databases' metadata.
  33. HIVE-1555: JDBC connector, data federation ⬢ How did we build the information_schema? We basically mapped part of the metastore into Hive's table space ⬢ Under the hood we used the Hive JDBC connector ⬢ Read-only for now ⬢ Manual table mapping for now
  34. JDBC table mapping example. In Postgres:
      CREATE TABLE HiveTable ( id INT, name varchar )
      In Hive:
      CREATE EXTERNAL TABLE HiveTable ( id INT, name STRING )
      STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
      TBLPROPERTIES (
        "hive.sql.database.type" = "POSTGRES",
        "hive.sql.jdbc.driver" = "org.postgresql.Driver",
        "hive.sql.jdbc.url" = "jdbc:postgresql://hwx-demo-1.field.hortonworks.com:5432/jdbctest",
        "hive.sql.dbcp.username" = "jdbctest",
        "hive.sql.dbcp.password" = "",
        "hive.sql.query" = "SELECT ID, NAME FROM hivetable",
        "hive.sql.column.mapping" = "id=ID, name=NAME",
        "hive.jdbc.update.on.duplicate" = "true"
      );
  35. Database keys 101. How many types of keys can you name in the context of databases? ⬢ Primary key ⬢ Secondary key ⬢ Unique key ⬢ Foreign key ⬢ Composite key ⬢ Natural key ⬢ Surrogate key ⬢ Super key ⬢ Candidate key. Why do they matter? ⬢ EDW solutions usually depend on these features ⬢ Keys allow the database engine to make assumptions and run faster ⬢ Essential for building relational databases
  36. Surrogate key generation ⬢ A surrogate key can replace wide, multi-column composite keys ⬢ A JOIN on 2 integers is way faster than 2 JOINs on 2 strings
      SELECT ROW_NUMBER() OVER () as row_num, * FROM airlines;
      +----------+----------------+--------------------------------+
      | row_num  | airlines.code  | airlines.description           |
      +----------+----------------+--------------------------------+
      | 1        | 02Q            | Titan Airways                  |
      | 2        | 04Q            | Tradewind Aviation             |
      | 3        | 05Q            | Comlux Aviation, AG            |
      | 4        | 06Q            | Master Top Linhas Aereas Ltd.  |
      | 5        | 07Q            | Flair Airlines Ltd.            |
      | 6        | 09Q            | Swift Air, LLC                 |
      | 7        | 0BQ            | DCA                            |
      | 8        | 0CQ            | ACM AIR CHARTER GmbH           |
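What a surrogate key buys you can be shown with a short Python sketch: string codes are replaced by dense integers once, and later joins use only the integers. The `build_key_map` helper and the `flights` rows are hypothetical. Sorting the codes before numbering is an assumption made so the result is deterministic; `ROW_NUMBER() OVER ()` on the slide specifies no ordering.

```python
def build_key_map(natural_keys, start=1):
    """Assign a dense integer surrogate key to each distinct natural key."""
    return {key: sk for sk, key in enumerate(sorted(set(natural_keys)), start)}


# First three airlines from the slide's output.
airlines = [
    ("02Q", "Titan Airways"),
    ("04Q", "Tradewind Aviation"),
    ("05Q", "Comlux Aviation, AG"),
]
key_map = build_key_map(code for code, _ in airlines)

# Dimension table keyed by the surrogate key instead of the string code.
dim = {key_map[code]: description for code, description in airlines}

# Hypothetical fact rows (airline code, distance), rewritten to carry integers.
flights = [("04Q", 120), ("02Q", 45), ("04Q", 300)]
fact = [(key_map[code], distance) for code, distance in flights]

# The join now compares integers only.
joined = [(dim[sk], distance) for sk, distance in fact]
```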
  37. NOT NULL and constraints ⬢ Essential for data integrity ⬢ Only works on ACID and append-only tables ⬢ hive.constraint.notnull.enforce = true. Example:
      CREATE TABLE Persons ( ID Int NOT NULL, Name String NOT NULL, Age Int );
  38. Default value ⬢ Ensures a value exists ⬢ Can be overwritten in INSERT/UPDATE statements ⬢ Useful in EDW offload cases. Example:
      CREATE TABLE Persons_default ( ID Int NOT NULL, Name String NOT NULL, Age Int, Creator String DEFAULT CURRENT_USER(), CreateDate Date DEFAULT CURRENT_DATE() );
  39. ACID v2 ▪ Performance just as good as regular non-ACID tables • Simpler solution, no more requirement for bucketing ▪ There are two parts to ACID v2: insert-only ACID and full CRUD ACID ▪ Insert-only ACID is supported for *ALL* formats • Parquet • Avro • ORC • Text ▪ Enables new optimizations • Incremental updates of MVs & query cache • Query cache consistency ▪ Performance may degrade when there are many delta files; run compaction to address this ▪ Cannot be downgraded to ACID v1 ▪ Fully compatible with native cloud storage
  40. ACID v2
      CREATE TABLE hello_acid (key int, value int) PARTITIONED BY (load_date date) CLUSTERED BY(key) INTO 3 BUCKETS STORED AS ORC TBLPROPERTIES ('transactional'='true');
      CREATE TABLE hello_acid_v2 (load_date date, key int, value int);
  41. HDP 3: EDW ingestion pipeline. [Diagram labels: Kafka-Druid-Hive ingest, Kafka-Hive streaming ingest, Druid, ACID tables, LLAP interface] Real-time analytics • Druid answers in near real time. Easy to use • Query any data via LLAP • No need to de-ACID tables • No bucketing required • Calcite talks SQL • Materialization just works • Cache just works
  42. HDP 3: new unified streaming ingest pipeline. [Diagram labels: streaming data, historical data, unified ingestion connectors, real-time rollup, materialized views, ACID tables, Hive LLAP, unified SQL, DAS | SuperSet | JDBC] Real-time ingest ▪ Read from Kafka ▪ Dual write to Hive and Druid. Real-time analytics • Druid answers in near real time • Hive ACID keeps data in sync. Unified API • Calcite talks unified SQL • The optimizer automatically uses pre-computed materializations. Easy-to-use tooling • DAS: manage and optimize • SuperSet: dashboards and reports • JDBC: Tableau, Excel, et al.
  43. HDP: Security & Governance. [Diagram] Ranger: manage access policies and audit logs (policies: classification, prohibition, time, location; PDP; resource cache). Atlas: track metadata and lineage (metastore: tags, assets, entities). Atlas client: subscribes to topic, gets metadata updates. Entities in the data lake: streams, pipelines, feeds, Hive tables, HDFS files, HBase tables. Industry first: dynamic tag-based security policies.
  44. Unique security features within HDP for SQL users (Ranger + Hive). Row-level security in Hive: ▪ Control access to rows in Hive tables based on context ▪ Improve reliability and robustness of HDP by providing row-level security to Hive tables and reducing the surface area of the security system ▪ Restrict data row access based on user characteristics (e.g. group membership) AND runtime context ▪ Use cases: • A hospital can create a security policy that allows doctors to view data rows only for their own patients • A bank can create a policy to restrict access to rows of financial data based on the employee's business division, locale, or role • A multi-tenant application can create logical separation of each tenant's data so that each tenant can see only its own data rows. Dynamic data masking of Hive columns: ▪ Protect sensitive data in real time with dynamic data masking/obfuscation ▪ Mask or anonymize sensitive columns of data (e.g. PII, PCI, PHI) from Hive query output ▪ Benefits: • Sensitive information never leaves the database • No changes are required at the application or Hive layer • No need to produce additional protected duplicate versions of datasets • Masking policies are simple and easy to set up
  45. Security: dynamic row filtering & column masking. Ranger policy enforcement: the query is rewritten based on dynamic Ranger policies, which filter rows by region and apply the relevant column masking.
      Source table ww_customers:
      | Country | National ID | CC No            | DOB       | MRN        | Name          | Policy ID  |
      | US      | 232323233   | 4539067047629850 | 9/12/1969 | 8233054331 | John Doe      | nj23j424   |
      | US      | 333287465   | 5391304868205600 | 8/13/1979 | 3736885376 | Jane Doe      | cadsd984   |
      | Germany | T22000129   | 4532786256545550 | 3/5/1963  | 876452830A | Ernie Schwarz | KK-2345909 |
      User 1: Joe, location US, group Analyst. Original query: SELECT country, nationalid, ccnumber, mrn, name FROM ww_customers
      Users from the US Analyst group see data for US persons, with CC and National ID (SSN) as masked values and MRN nullified:
      | Country | National ID | CC No               | MRN  | Name     |
      | US      | xxxxx3233   | 4539 xxxx xxxx xxxx | null | John Doe |
      | US      | xxxxx7465   | 5391 xxxx xxxx xxxx | null | Jane Doe |
      User 2: Ivanna, location EU, group HR. Original query: SELECT country, nationalid, name, mrn FROM ww_customers
      EU HR policy admins can see unmasked values but are restricted by row filtering policies to data for EU persons only:
      | Country | National ID | Name          | MRN        |
      | Germany | T22000129   | Ernie Schwarz | 876452830A |
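The effect of the two policies on slide 45 can be reproduced with a short Python sketch. This models only the outcome, not how Ranger rewrites the query: the `POLICIES` structure, the mask helpers, and the `EU_COUNTRIES` set are hypothetical names introduced for the illustration, while the rows and masked outputs come from the slide.

```python
CUSTOMERS = [
    {"country": "US", "nationalid": "232323233", "ccnumber": "4539067047629850",
     "mrn": "8233054331", "name": "John Doe"},
    {"country": "US", "nationalid": "333287465", "ccnumber": "5391304868205600",
     "mrn": "3736885376", "name": "Jane Doe"},
    {"country": "Germany", "nationalid": "T22000129", "ccnumber": "4532786256545550",
     "mrn": "876452830A", "name": "Ernie Schwarz"},
]

EU_COUNTRIES = {"Germany"}  # assumption: only what the sample data needs


def show_last4(value):
    """Mask everything except the last four characters."""
    return "x" * (len(value) - 4) + value[-4:]


def show_first4(value):
    """Keep the first four characters and mask the rest in groups of four."""
    return value[:4] + " xxxx" * ((len(value) - 4) // 4)


def nullify(value):
    return None


# (location, group) -> row filter plus per-column masks
POLICIES = {
    ("US", "Analyst"): {
        "row_filter": lambda row: row["country"] == "US",
        "masks": {"nationalid": show_last4, "ccnumber": show_first4, "mrn": nullify},
    },
    ("EU", "HR"): {
        "row_filter": lambda row: row["country"] in EU_COUNTRIES,
        "masks": {},
    },
}


def run_query(user, columns):
    """Apply the user's row filter, then mask each selected column."""
    policy = POLICIES[(user["location"], user["group"])]
    result = []
    for row in CUSTOMERS:
        if not policy["row_filter"](row):
            continue
        result.append(tuple(
            policy["masks"].get(col, lambda v: v)(row[col]) for col in columns
        ))
    return result


joe = {"name": "Joe", "location": "US", "group": "Analyst"}
ivanna = {"name": "Ivanna", "location": "EU", "group": "HR"}
```

Both users issue a plain SELECT against the same table; the difference in what they see comes entirely from the policy attached to their location and group.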
  46. Hortonworks Data Analytics Studio. Available both in the cloud and on-premises as part of Hortonworks DataPlane Service; became GA in September 2018. [Diagram] Hortonworks DataPlane Service. Extensible services: Data Lifecycle Manager, Data Steward Studio, Data Analytics Studio, IBM DSX*, + other (partner). Core capabilities: data source integration, data services catalog, security controls. Multiple clusters and sources; multi/hybrid. *Not yet available, coming soon.
  47. Hortonworks Data Analytics Studio: optimize your Hive workloads (part of the Hortonworks DataPlane Services). Why is my query slow? • Noisy neighbors • Poor schema • Inefficient queries • Unstable demand. What DAS offers: • Expensive query log • Storage optimizations • Query optimizations • Demand shifting
  48. Data Analytics Studio (DAS). Solutions: full-featured auto-complete, direct download of results, quick data preview, and many other quality-of-life improvements.
  49. Data Analytics Studio (DAS). Solutions: a database heatmap lets you quickly discover and see which parts of your cluster are being utilized more.
  50. Data Analytics Studio (DAS). Solutions: pre-defined searches to quickly narrow down problematic queries in a large cluster.
  51. Data Analytics Studio (DAS). Solutions: heuristic recommendation engine; fully self-service query and storage optimization.
  52. Data Analytics Studio (DAS). Solutions: built-in batch operations; no more scripting needed for day-to-day operations.
  53. Summary
  54. Hive 3: scalable data warehousing on Hadoop. Capabilities and applications: Batch SQL: • ETL • Reporting • Data mining • Deep analytics. OLAP / Cube: • Multidimensional analytics • MDX tools • Excel. Interactive SQL: • Reporting • BI tools: Tableau, MicroStrategy, Cognos. Sub-second SQL (Hive LLAP): • Ad hoc • Drill-down • BI tools: Tableau, Excel. ACID / MERGE: • Continuous ingestion from operational DBMS • Slowly changing dimensions. Core platform: scale-out storage, petabyte-scale processing. Core SQL engine: Apache Tez (scalable distributed processing), advanced cost-based optimizer. Connectivity: JDBC / ODBC, comprehensive SQL:2011 coverage. Advanced security.
  55. Summary • Our vision of the future EDW is a unified, open-source data access layer that works across technologies and in a hybrid model • Druid, Kafka, and Hive integration enables real-time analytics on event streams • Offloading is still the primary use case, and Hive is becoming a full-featured database • ACID on by default enables data change at scale, which is key to supporting GDPR • Better usability and visibility with the release of Data Analytics Studio (DAS)
  56. Questions?
  57. Thank you
