© 2019 Walmart Inc. All Rights Reserved
Hive LLAP: A High Performance, Cost-Effective Alternative to Traditional MPP Databases
Any reference in this presentation to any specific commercial product, process, or service, or the use of any trade, firm, or corporation name is for information and convenience only and is not an endorsement, favor, or recommendation by Walmart Inc.
Naveen Peddamail
Sr. Manager, Global Data
Abhishek Gupta
Data Engineer, Global Data
Agenda
• Introduction to Walmart
• Data Lake Initiative – Building a Single Source of Truth
• Challenges Around Low Latency Querying on Hadoop – Hive LLAP as a Solution
• Performance & Cost Effectiveness of Hive LLAP vs. MPP Databases
• Conclusion & Next Steps
• Q & A
About Walmart
• Largest retailer in the world and Fortune 1 company
• Serves over 275M customers weekly
• Employs over 2.2M associates worldwide
• 11,300 stores under 58 banners in 27 countries
• eCommerce websites in 10 countries; brands include:
  • Walmart.com
  • Jet.com
  • Hayneedle.com (home furnishings)
  • Shoes.com (footwear)
  • Moosejaw (outdoor apparel and gear)
  • ModCloth (women’s apparel)
  • Bonobos (men’s apparel)
To find out more, visit us at https://corporate.walmart.com
About Walmart Labs
• Employs over 4,000 associates worldwide
• Development centers in the US, India, and Ireland
• Open source projects include:
  • Hapi (server framework for Node.js)
  • OneOps (cloud management platform)
  • Electrode (universal React/Node.js platform)
  • TestArmada (suite of testing tools)
• Includes the Global Data and Analytics Platform team
To find out more, visit us at:
https://www.walmartlabs.com
https://www.facebook.com/WalmartLabs
https://twitter.com/WalmartLabs
https://github.com/walmartlabs
Data Lake Initiative – Building a Single Source of Truth
Data Landscape at Walmart
• Transactional systems from various domains generate a huge volume of data every second:
  • Sales & Orders
  • Merchandising
  • Logistics & Supply Chain
  • Real Estate
  • HR Systems
  • Compliance
• Analytical & reporting databases were spread across various platforms and teams
• It was challenging to correctly identify the source of truth
• Data quality, governance, metadata management & lineage were difficult to manage
• Need to build a single source of truth – the Data Lake
Criteria for the Data Lake
• Governed, Secured & Certified Data
• Single Source of Truth
• Lower Total Cost of Ownership
• Robust and Fast Data Access & Reporting
Data Lake @ Walmart
01 Central Analytical Data Source
• Central true source of analytical data across Walmart
02 Data Service Layer
• Common services for metadata
• ETL pipeline
• Data quality framework
03 Governed and Secure
• Roles to manage access control
• Encryption for sensitive data elements
• End-to-end lineage
04 Self Service Platform
• Enable ad-hoc analysis
• Improve speed to market for analysis
• Self-served storage and compute platform
Are the Business Users Happy Now?
• Governed, Secured & Certified Data
• Single Source of Truth
• Lower Total Cost of Ownership
• Robust and Fast Data Access & Reporting
Low Latency Querying on Hadoop: Hive LLAP as a Solution
Challenges
• Ad hoc query performance on Hadoop/Hive was not good enough
• Users benchmarked it against massively parallel processing enterprise data warehouses (MPP EDWs)
• Some teams could not be migrated off enterprise data warehouses until better query response times could be guaranteed
• Queries migrated from other data warehouses were not optimized for Hive
Potential Solutions
• Tune queries for optimal Hive performance
• Recommend Tez as the default execution engine
• Hive LLAP as a performance booster
Hive LLAP: Low Latency Analytical Processing
(Also known as Live Long and Process)
• JIT optimization & in-memory caching
• Data sharing, asynchronous I/O
• Leverages long-lived daemons
• Bridges inefficiencies of execution engines
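The caching and long-lived-daemon points above can be illustrated with a toy model. The sketch below is our own simplification, not LLAP's code: one long-running process keeps an LRU cache of column chunks keyed by (file, stripe, column), so when the same query runs a second time every read is served from memory instead of disk. The class `ColumnChunkCache`, the key layout, and `read_from_disk` are hypothetical; LLAP's real cache is off-heap, ORC-aware, and uses its own eviction policy rather than plain LRU.

```python
from collections import OrderedDict


class ColumnChunkCache:
    """Toy LRU cache keyed by (file, stripe, column). Hypothetical, not LLAP code."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, loader):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        value = loader(key)  # stands in for an HDFS/disk read
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return value


def read_from_disk(key):
    """Hypothetical loader: pretend to read one column chunk."""
    file_name, stripe, column = key
    return f"data:{file_name}/{stripe}/{column}"


# A long-lived daemon keeps the cache alive between queries.
cache = ColumnChunkCache(capacity=4)
query_keys = [("sales.orc", stripe, "column6") for stripe in range(3)]
for _ in range(2):  # run the same query twice
    for key in query_keys:
        cache.get(key, read_from_disk)
print(cache.hits, cache.misses)  # 3 3: first run misses, second run hits
```

A short-lived, per-query container would start each run with an empty cache, which is the inefficiency the long-lived daemon design avoids.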
Hive LLAP Architecture
Source: https://hortonworks.com/blog/top-5-performance-boosters-with-apache-hive-llap/
Hive LLAP – Reviewing TPC-DS Benchmarks on HDP 2.6
Source: https://hortonworks.com/blog/3x-faster-interactive-query-hive-llap/
Similarities to Walmart’s Data Model
• 10 TB scale & the data model for the underlying tables were similar to our use case
• Hive LLAP benchmarks looked promising for TPC-DS data
Differences from Walmart’s Data Model
• Wider tables
• Complex dimension tables
Hive LLAP – POC
POC Goals
• Benchmark Hive LLAP query performance on 3NF tables involving joins
• Compare Hive LLAP query performance vs. MPP EDWs on the same set of queries
Data Model
Hive LLAP – Environment Setup
• Hadoop distribution – HDP 2.6.3
• YARN scheduler – Capacity Scheduler with preemption enabled
• Number of LLAP nodes – two configurations: 10 nodes and 15 nodes
• Hardware (per node) – 256 GB RAM, 32 cores, and 14 × 6 TB disks; incremental spend: ~$150K
• Overall Hadoop cluster size – 90 nodes
Hive LLAP – Environment Setup
YARN Config
  NodeManager Max Container Size (MB)      230400
  Number of LLAP Nodes                     10 & 15 (two variations)
LLAP Configs
  hive.llap.execution.mode                 all
  hive.llap.io.memory.mode                 cache
  hive.llap.io.enabled                     TRUE
  Slider Memory (MB)                       2048
  tez.am.resource.memory.mb                2048
  LLAP Daemon Container Max Headroom (MB)  8192
  Number of Concurrent Queries             10
  Memory per Daemon (MB)                   226304
  Number of Executors per LLAP Daemon      44
  hive.llap.io.threadpool.size             44
  LLAP Daemon Heap Size (MB)               171213
  In-Memory Cache per Daemon (MB)          46899
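The memory settings in this table fit together arithmetically. The sketch below checks them under an assumption of ours that the slide does not state: an LLAP daemon container is sized as heap + in-memory cache + headroom. With the values above those three add up exactly to the 226304 MB daemon size, and the 4096 MB left under the 230400 MB NodeManager limit equals the Slider memory plus one Tez AM (2048 MB each); that last reading is our interpretation, not a documented layout. The helper `llap_daemon_size_mb` is hypothetical.

```python
"""Sanity-check the LLAP memory figures from the configuration table (all MB).

Assumption (ours): daemon container = heap + in-memory cache + headroom.
"""

NM_MAX_CONTAINER_MB = 230400
SLIDER_MEMORY_MB = 2048
TEZ_AM_MEMORY_MB = 2048
DAEMON_HEADROOM_MB = 8192
DAEMON_MEMORY_MB = 226304
DAEMON_HEAP_MB = 171213
DAEMON_CACHE_MB = 46899
EXECUTORS_PER_DAEMON = 44


def llap_daemon_size_mb(heap_mb: int, cache_mb: int, headroom_mb: int) -> int:
    """Hypothetical helper: container size implied by its three components."""
    return heap_mb + cache_mb + headroom_mb


daemon_mb = llap_daemon_size_mb(DAEMON_HEAP_MB, DAEMON_CACHE_MB, DAEMON_HEADROOM_MB)
left_on_node_mb = NM_MAX_CONTAINER_MB - daemon_mb
# Executors in one daemon share its heap; this is only a rough per-executor share.
heap_per_executor_mb = DAEMON_HEAP_MB // EXECUTORS_PER_DAEMON

print(f"heap + cache + headroom = {daemon_mb} MB (table says {DAEMON_MEMORY_MB} MB)")
print(f"left on a {NM_MAX_CONTAINER_MB} MB node = {left_on_node_mb} MB")
print(f"heap per executor ~= {heap_per_executor_mb} MB")
```

Running it prints a daemon size of 226304 MB, 4096 MB left on the node, and roughly 3891 MB of heap per executor.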
Hive LLAP – Query Patterns & Stats
Query Characteristics
• Queries fall mainly into reporting & ad-hoc workloads, with a focus on business applications
• Aggregations of key metrics across various location, item & timeframe dimensions
• Scans involving large tables & joins on multiple tables
• Sorting across various dimensions & facts
• 48 queries over 4 time frames
Table Stats
• Fact table (1 year of data): ~70 billion rows, 12 TB
• Dimensions (1 key table): ~25 million rows, 110 GB
Sample Query
SELECT l.column1, l.column2, i.column3, i.column4,
       d.column5, sum(s.column6), sum(s.column7),
       avg(s.column8), avg(s.column9)
       …
FROM sales AS s
JOIN item_dim AS i ON s.item_id = i.item_id
JOIN location_dim AS l ON s.location_id = l.location_id
JOIN date_dim AS d ON s.visit_dt = d.cal_dt
WHERE s.column10 BETWEEN <val1> AND <val2>
  AND l.column11 = <val3>
  …
GROUP BY
  l.column1, l.column2, i.column3, i.column4, d.column5
ORDER BY
  l.column1, l.column2, i.column3, i.column4, d.column5;
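To get a feel for how much fact data each time frame touches, the sketch below scales the table stats above linearly. It assumes (our assumption) that the ~70 billion row, ~12 TB one-year fact table is spread evenly over 52 weeks and that a query reads only the weeks in its window; it ignores compression, partition pruning details, and the dimension tables, and the 12 TB figure is used on whatever storage basis the slide reports. The function `weekly_slice` is hypothetical.

```python
"""Rough fact-table volume touched per time frame.

Assumption (ours): the ~70B-row, ~12 TB one-year fact table is spread evenly
over 52 weeks and a query reads only the weeks inside its window.
"""

FACT_ROWS = 70e9
FACT_TB = 12.0
WEEKS_PER_YEAR = 52


def weekly_slice(weeks: int) -> tuple[float, float]:
    """Return (rows, terabytes) for a window of `weeks` weeks."""
    fraction = weeks / WEEKS_PER_YEAR
    return FACT_ROWS * fraction, FACT_TB * fraction


for weeks in (1, 4, 12, 52):
    rows, tb = weekly_slice(weeks)
    print(f"{weeks:>2} weeks: ~{rows / 1e9:.1f}B rows, ~{tb:.2f} TB")

print(f"average stored row size: ~{FACT_TB * 1e12 / FACT_ROWS:.0f} bytes")
```

Under these assumptions a 1-week query touches roughly 1.3 billion rows and a 52-week query the full ~70 billion, a ~52x spread in input size across the four time frames.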
Hive LLAP – Results
Chart: Hive LLAP Performance Benchmark; execution time (seconds) per query for the 1-week, 4-week, 12-week, and 52-week time frames
• 75% of the queries ran in under 100 seconds
Hive LLAP – Results
Chart: Hive LLAP Query Performance for 10 vs. 15 Nodes, Linear Scalability; execution time (seconds) per query for LLAP on 15 nodes and LLAP on 10 nodes, across the 1-week, 4-week, 12-week, and 52-week time frames
• 30-50% performance improvement from the 10-node to the 15-node configuration
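Going from 10 to 15 nodes is a 1.5x increase in nodes. If "performance improvement" is read as a reduction in execution time (our assumption; the slide does not define it), perfectly linear scaling would cut execution time by about a third, so the reported 30-50% range runs from just under linear to better than linear. The arithmetic, with the hypothetical helper `speedup_from_time_reduction`:

```python
"""Relate a 10 -> 15 node scale-out to the reported 30-50% improvement.

Assumption (ours): "improvement" means a reduction in query execution time.
"""

NODES_BEFORE = 10
NODES_AFTER = 15
NODE_RATIO = NODES_AFTER / NODES_BEFORE  # 1.5x more nodes


def speedup_from_time_reduction(reduction: float) -> float:
    """A query taking `reduction` less time runs 1 / (1 - reduction) times faster."""
    return 1.0 / (1.0 - reduction)


# Time cut expected if execution time scaled exactly with 1 / nodes.
linear_reduction = 1.0 - NODES_BEFORE / NODES_AFTER
print(f"perfectly linear scaling would cut time by {linear_reduction:.0%}")

for reduction in (0.30, 0.50):
    speedup = speedup_from_time_reduction(reduction)
    efficiency = speedup / NODE_RATIO
    print(f"{reduction:.0%} less time -> {speedup:.2f}x speedup, {efficiency:.0%} of linear")
```

This prints a 33% linear baseline, then 1.43x (95% of linear) for the 30% case and 2.00x (133% of linear) for the 50% case.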
Comparing Query Performance of Hive LLAP vs. MPP EDWs
• For our comparative analysis, we used two MPP EDW clusters
• Queries on the MPP EDW clusters were optimized for best performance
Cluster          Memory    VCores
Hadoop Cluster   ~4 TB     480
MPP EDW A        ~4 TB     512
MPP EDW B        ~16 TB    840
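The Hadoop figures in this table line up with the environment setup: 15 LLAP nodes × 256 GB = 3840 GB (roughly 4 TB) and 15 × 32 cores = 480, assuming one vcore per physical core and that this column describes the 15-node configuration (both our assumptions). The sketch below expresses each cluster's resources relative to the Hadoop cluster; it shows that MPP EDW B's 4x advantage is in memory, while its vcore count is 1.75x.

```python
"""Compare the resources of the three clusters in the comparison table.

Assumption (ours): the Hadoop column describes the 15-node LLAP configuration,
with one vcore per physical core.
"""

LLAP_NODES = 15
NODE_RAM_GB = 256
NODE_CORES = 32

CLUSTERS = {
    "Hadoop Cluster": {"memory_tb": 4, "vcores": 480},
    "MPP EDW A": {"memory_tb": 4, "vcores": 512},
    "MPP EDW B": {"memory_tb": 16, "vcores": 840},
}

# 15 nodes x 256 GB = 3840 GB (~4 TB); 15 nodes x 32 cores = 480 cores.
print(f"15 LLAP nodes: {LLAP_NODES * NODE_RAM_GB} GB RAM, {LLAP_NODES * NODE_CORES} cores")

baseline = CLUSTERS["Hadoop Cluster"]
for name, res in CLUSTERS.items():
    mem_ratio = res["memory_tb"] / baseline["memory_tb"]
    core_ratio = res["vcores"] / baseline["vcores"]
    print(f"{name}: {mem_ratio:.2f}x memory, {core_ratio:.2f}x vcores vs. Hadoop")
```

MPP EDW A comes out at 1.00x memory and 1.07x vcores, which is why it serves as the like-for-like comparison.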
Comparing Query Performance of Hive LLAP vs. MPP EDWs
• LLAP performed better than MPP EDW A, which has similar infrastructure
• LLAP and MPP EDW B performed comparably, even though MPP EDW B was provisioned with about 4x the infrastructure (~16 TB vs. ~4 TB of memory)
Chart: Hive LLAP vs. MPP-A vs. MPP-B; execution time (seconds) per query for LLAP, MPP Enterprise Data Warehouse A, and MPP Enterprise Data Warehouse B, across the 1-week, 4-week, 13-week, and 52-week time frames
Hive LLAP: Conclusion & Next Steps
• Promising product for low latency SQL access on top of Hadoop
• Significant cost savings vs. traditional MPP databases
• Not a one-size-fits-all solution
Next Steps:
• Evaluate Hive LLAP on HDP 3.x (better enterprise support)
• Resource Plans & Workload Manager
• SSD caching
• HS2I: HiveServer2 Interactive – high availability
Thank You !
Abhishek Gupta
Data Engineer, Walmart
Abhishek.gupta2@Walmart.com
https://www.linkedin.com/in/gupta-abhishek/
Naveen Peddamail
Sr. Manager, Walmart
Naveen.Peddamail@walmart.com
https://www.linkedin.com/in/naveenpeddamail/
Questions?

Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....rightmanforbloodline
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data SciencePaolo Missier
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightSafe Software
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Decarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational PerformanceDecarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational PerformanceIES VE
 
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...caitlingebhard1
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformLess Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformWSO2
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxMarkSteadman7
 
Navigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern EnterpriseNavigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern EnterpriseWSO2
 

Recently uploaded (20)

TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data Science
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and Insight
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Decarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational PerformanceDecarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational Performance
 
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformLess Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptx
 
Navigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern EnterpriseNavigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern Enterprise
 

Hive LLAP: A High Performance, Cost-effective Alternative to Traditional MPP Databases

  • 1. © 2019 Walmart Inc. All Rights Reserved
    Hive LLAP: A High Performance, Cost-Effective Alternative to Traditional MPP Databases
    Any reference in this presentation to any specific commercial product, process, or service, or the use of any trade, firm, or corporation name is for information and convenience only and is not an endorsement, favor, or recommendation by Walmart Inc.
    Naveen Peddamail, Sr. Manager, Global Data
    Abhishek Gupta, Data Engineer, Global Data
  • 2. Agenda
    • Introduction to Walmart
    • Data Lake Initiative – Building a Single Source of Truth
    • Challenges Around Low Latency Querying on Hadoop – Hive LLAP as a Solution
    • Performance & Cost Effectiveness of Hive LLAP vs. MPP Databases
    • Conclusion & Next Steps
    • Q & A
  • 3. About Walmart
    • Largest retailer in the world and Fortune 1 company
    • Serves over 275M customers weekly
    • Employs over 2.2M associates worldwide
    • 11,300 stores under 58 banners in 27 countries
    • eCommerce websites in 10 countries; brands include:
      • Walmart.com
      • Jet.com
      • Hayneedle.com (home furnishings)
      • Shoes.com (footwear)
      • Moosejaw (outdoor apparel and gear)
      • ModCloth (women’s apparel)
      • Bonobos (men’s apparel)
    To find out more, visit us at https://corporate.walmart.com
    About Walmart Labs
    • Employs over 4,000 associates worldwide
    • Development centers in the US, India, and Ireland
    • Open source projects include:
      • Hapi (server framework for Node.js)
      • OneOps (cloud management platform)
      • Electrode (universal React/Node.js platform)
      • TestArmada (suite of testing tools)
    • Includes the Global Data and Analytics Platform team
    To find out more, visit us at:
    https://www.walmartlabs.com
    https://www.facebook.com/WalmartLabs
    https://twitter.com/WalmartLabs
    https://github.com/walmartlabs
  • 4. Data Lake Initiative – Building a Single Source of Truth
    Data Landscape at Walmart
    • Transactional systems from various domains generate huge volumes of data every second:
      • Sales & Orders
      • Merchandising
      • Logistics & Supply Chain
      • Real Estate
      • HR Systems
      • Compliance
    • Analytical & reporting databases spread across various platforms and teams
    • Challenges in correctly identifying the source of truth
    • Data quality, governance, metadata management & lineage were difficult to manage
    • Need to build a single source of truth – Data Lake
  • 5. Criteria for the Data Lake
    • Governed, Secured & Certified Data
    • Single Source of Truth
    • Lower Total Cost of Ownership
    • Robust and Fast Data Access & Reporting
  • 6. Data Lake @ Walmart
    01 Central Analytical Data Source
       • Central true source of analytical data across Walmart
    02 Data Service Layer
       • Common services for metadata
       • ETL pipeline
       • Data quality framework
    03 Governed and Secure
       • Roles to manage access control
       • Encryption for sensitive data elements
       • End-to-end lineage
    04 Self-Service Platform
       • Enable ad-hoc analysis
       • Improve speed to market for analysis
       • Provide a self-service storage and compute platform
  • 7. Criteria for the Data Lake, Revisited
    • Governed, Secured & Certified Data
    • Single Source of Truth
    • Lower Total Cost of Ownership
    • Robust and Fast Data Access & Reporting
    Are the Business Users Happy Now?
  • 8. Low Latency Querying on Hadoop: Hive LLAP as a Solution
    Challenges
    • Ad hoc query performance on Hadoop/Hive was not good enough
    • Users benchmarked it against Massively Parallel Processing enterprise data warehouses (MPP EDWs)
    • Some teams could not be migrated off enterprise data warehouses until better query response times could be guaranteed
    • Queries migrated from other data warehouses were not optimized for Hive
    Potential Solutions
    • Tune queries for optimal Hive performance
    • Recommend Tez as the default execution engine
    • Hive LLAP as a performance booster
  • 9. Hive LLAP: Low Latency Analytical Processing (also known as "Live Long and Process")
    • JIT optimization & in-memory caching
    • Data sharing, asynchronous IO
    • Leverages long-lived daemons
    • Bridges inefficiencies of execution engines
  • 10. Hive LLAP Architecture
    Source: https://hortonworks.com/blog/top-5-performance-boosters-with-apache-hive-llap/
  • 11. Hive LLAP – Reviewing TPC-DS Benchmarks on HDP 2.6
    Source: https://hortonworks.com/blog/3x-faster-interactive-query-hive-llap/
    • The 10 TB scale and the data model of the underlying tables were similar to our use case
    • Hive LLAP benchmarks looked promising for TPC-DS data
    Similarities To / Differences From Walmart's Data Model
    • Wider tables
    • Complex dimension tables
  • 12. Hive LLAP – POC Data Model
    POC Goals
    • Benchmark Hive LLAP query performance on 3NF tables involving joins
    • Compare Hive LLAP query performance vs. MPP EDWs on the same set of queries
  • 13. Hive LLAP – Environment Setup
    • Hadoop distribution – HDP 2.6.3
    • YARN scheduler – Capacity Scheduler with preemption enabled
    • Number of LLAP nodes – two configurations: 10 nodes and 15 nodes
    • Hardware – 256 GB RAM, 32 cores, and 14 × 6 TB disks per node
    • Incremental spend: ~$150K
    • Overall Hadoop cluster – 90 nodes
  • 14. Hive LLAP – Environment Setup
    YARN Config
    • NodeManager max container size (MB): 230400
    • Number of LLAP nodes: 10 & 15 (two variations)
    LLAP Configs
    • hive.llap.execution.mode: all
    • hive.llap.io.memory.mode: cache
    • hive.llap.io.enabled: TRUE
    • Slider memory (MB): 2048
    • tez.am.resource.memory.mb: 2048
    • LLAP daemon container max headroom (MB): 8192
    • Number of concurrent queries: 10
    • Memory per daemon (MB): 226304
    • Number of executors per LLAP daemon: 44
    • hive.llap.io.threadpool.size: 44
    • LLAP daemon heap size (MB): 171213
    • In-memory cache per daemon (MB): 46899
  • 15. Hive LLAP – Query Patterns & Stats
    Query Characteristics
    • Queries fall mainly into reporting & ad-hoc workloads focused on business applications
    • Aggregations of key metrics across various location, item & timeframe dimensions
    • Scans involving large tables & joins on multiple tables
    • Sorting across various dimensions & facts
    • 48 queries over 4 time frames
    Table Stats
    • Fact table (1 year of data): ~70 billion rows, 12 TB
    • Dimensions (1 key table): ~25 million rows, 110 GB
    Sample Query
      SELECT l.column1, l.column2, i.column3, i.column4, d.column5,
             sum(s.column6), sum(s.column7), avg(s.column8), avg(s.column9)
             …
      FROM sales AS s
      JOIN item_dim AS i ON s.item_id = i.item_id
      JOIN location_dim AS l ON s.location_id = l.location_id
      JOIN date_dim AS d ON s.visit_dt = d.cal_dt
      WHERE s.column10 BETWEEN <val1> AND <val2>
        AND l.column11 = <val3>
        …
      GROUP BY l.column1, l.column2, i.column3, i.column4, d.column5
      ORDER BY l.column1, l.column2, i.column3, i.column4, d.column5;
  • 16. Hive LLAP – Results
    [Chart: Hive LLAP Performance Benchmark – execution time (seconds) per query for the 1-week, 4-week, 12-week and 52-week time frames]
    • 75% of the queries ran in under 100 seconds
  • 17. Hive LLAP – Results
    • 30%–50% performance improvement from the 10-node to the 15-node configuration
    [Chart: Hive LLAP Query Performance for 10 vs. 15 Nodes – Linear Scalability; execution time (seconds) per query for LLAP-10 nodes and LLAP-15 nodes, over the 1-week, 4-week, 12-week and 52-week time frames]
  • 18. Comparing Query Performance of Hive LLAP vs. MPP EDWs
    • For our comparative analysis, we used two MPP EDW clusters
    • Queries in the MPP EDW clusters were optimized for best performance
    Cluster resources
    • Hadoop cluster: ~4 TB memory, 480 vcores
    • MPP EDW A: ~4 TB memory, 512 vcores
    • MPP EDW B: ~16 TB memory, 840 vcores
  • 19. Comparing Query Performance of Hive LLAP vs. MPP EDWs
    • LLAP performed better than MPP EDW A, which has similar infrastructure
    • LLAP and MPP EDW B performed comparably, even though MPP EDW B was given 4x the infrastructure
    [Chart: Hive LLAP vs. MPP-A vs. MPP-B – execution time (seconds) per query for LLAP, MPP Enterprise Data Warehouse A and MPP Enterprise Data Warehouse B, over the 1-week, 4-week, 13-week and 52-week time frames]
  • 20. Hive LLAP: Conclusion & Next Steps
    • Promising product for low latency SQL access on top of Hadoop
    • Significant cost savings vs. traditional MPP databases
    • Not a one-size-fits-all solution
    Next Steps:
    • Evaluate Hive LLAP on HDP 3.x (better enterprise support)
    • Resource plans & Workload Manager
    • SSD caching
    • HS2I: HiveServer2 Interactive – high availability
  • 21. Thank You!
    Abhishek Gupta, Data Engineer, Walmart
    Abhishek.gupta2@Walmart.com
    https://www.linkedin.com/in/gupta-abhishek/
    Naveen Peddamail, Sr. Manager, Walmart
    Naveen.Peddamail@walmart.com
    https://www.linkedin.com/in/naveenpeddamail/
  • 22. Questions?
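
The memory settings on slides 13 and 14 can be cross-checked with simple arithmetic. The Python sketch below copies the values from those slides and assumes every memory setting is in MB. It tests one plausible reading of the layout that the slides do not spell out: each LLAP daemon is split into heap, in-memory cache and headroom, and each node's YARN container budget holds the daemon plus one Slider AM and one Tez AM. The constant and function names are illustrative only.

```python
# Values copied from slides 13 and 14; memory figures assumed to be in MB.
NM_MAX_CONTAINER_MB = 230_400   # YARN NodeManager max container size
SLIDER_AM_MB = 2_048            # Slider memory
TEZ_AM_MB = 2_048               # tez.am.resource.memory.mb
DAEMON_MB = 226_304             # Memory per LLAP daemon
HEAP_MB = 171_213               # LLAP daemon heap size
CACHE_MB = 46_899               # In-memory cache per daemon
HEADROOM_MB = 8_192             # LLAP daemon container max headroom
EXECUTORS = 44                  # Executors per LLAP daemon

NODE_RAM_GB = 256               # Hardware per node (slide 13)
NODE_CORES = 32


def daemon_breakdown_mb() -> int:
    """Heap + in-memory cache + headroom (assumed to make up one daemon)."""
    return HEAP_MB + CACHE_MB + HEADROOM_MB


def node_breakdown_mb() -> int:
    """Daemon + one Slider AM + one Tez AM (assumed per-node layout)."""
    return DAEMON_MB + SLIDER_AM_MB + TEZ_AM_MB


def cluster_totals(nodes: int) -> tuple[int, int]:
    """Aggregate RAM (GB) and physical cores for a given LLAP node count."""
    return nodes * NODE_RAM_GB, nodes * NODE_CORES


if __name__ == "__main__":
    print("daemon:", daemon_breakdown_mb(), "vs slide", DAEMON_MB)
    print("node:  ", node_breakdown_mb(), "vs slide", NM_MAX_CONTAINER_MB)
    print(f"heap per executor: {HEAP_MB / EXECUTORS:.0f} MB")
    for n in (10, 15):
        ram_gb, cores = cluster_totals(n)
        print(f"{n} nodes: {ram_gb} GB RAM, {cores} cores")
```

Under this reading both sums match the slide exactly (171213 + 46899 + 8192 = 226304 and 226304 + 2048 + 2048 = 230400), each of the 44 executors gets about 3,891 MB of heap if the heap is shared evenly, and 15 nodes give 3840 GB of RAM and 480 cores, in line with the "~4 TB memory, 480 vcores" Hadoop cluster listed on slide 18.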

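The table statistics on slide 15, combined with the cache size on slide 14, give a feel for how much of the queried data the LLAP cache could hold. The sketch below is a rough, hypothetical estimate: it assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), fact data spread evenly over 52 weeks, and a query that touches every byte of its time slice. It ignores column pruning, compression and encoding differences, and the dimension tables, so it is an order-of-magnitude figure, not a measurement.

```python
# Figures from slide 15 (table stats) and slide 14 (cache per daemon).
# Assumptions: decimal units and fact data spread evenly over 52 weeks.
FACT_ROWS = 70_000_000_000      # ~70 billion rows (1 year)
FACT_BYTES = 12 * 10**12        # 12 TB
DIM_ROWS = 25_000_000           # ~25 million rows (key dimension table)
DIM_BYTES = 110 * 10**9         # 110 GB
CACHE_MB_PER_DAEMON = 46_899    # In-memory cache per LLAP daemon


def bytes_per_row(total_bytes: int, rows: int) -> float:
    """Average size of one row, given a table's total size and row count."""
    return total_bytes / rows


def rough_cache_coverage(nodes: int, weeks: int) -> float:
    """Hypothetical estimate of the fraction of a `weeks`-long slice of the
    fact table that fits in the combined cache of `nodes` LLAP daemons.

    Ignores column pruning, compression/encoding and dimension tables.
    """
    cache_bytes = nodes * CACHE_MB_PER_DAEMON * 10**6
    slice_bytes = FACT_BYTES * weeks / 52
    return min(1.0, cache_bytes / slice_bytes)


if __name__ == "__main__":
    print(f"fact: {bytes_per_row(FACT_BYTES, FACT_ROWS):.0f} bytes/row")
    print(f"dim:  {bytes_per_row(DIM_BYTES, DIM_ROWS):.0f} bytes/row")
    for weeks in (1, 4, 12, 52):
        share = rough_cache_coverage(15, weeks)
        print(f"{weeks:>2} weeks: {share:.0%} of the slice fits in cache")
```

Under these assumptions the fact table averages about 171 bytes per row and the key dimension table about 4,400 bytes per row, and the roughly 703 GB of aggregate cache on 15 nodes covers a full 1-week slice but only about 6% of the 52-week range.
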
Editor's Notes

  1. We are a "small" Fortune 1 retailer from a small place in NW Arkansas, with a footprint in 27 countries and multiple online brands. WalmartLabs is the technology backbone for Walmart, and we are located globally. The Global D&A platform is part of WalmartLabs.
  2. https://hortonworks.com/blog/top-5-performance-boosters-with-apache-hive-llap/
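
Slides 17 to 19 make two quantitative claims that can be put into numbers: near-linear scaling from 10 to 15 LLAP nodes, and the relative size of the three clusters in the comparison. The Python sketch below assumes that "linear scalability" means execution time proportional to 1/nodes for a fixed workload, and takes the cluster resources from slide 18. The 480 vcores of the Hadoop cluster equal 15 × 32 cores from slide 13, which suggests, though the slides do not say so, that the comparison used the 15-node LLAP configuration.

```python
# Cluster resources from slide 18; node counts from slides 13 and 17.
CLUSTERS = {
    "Hadoop cluster": {"memory_tb": 4, "vcores": 480},
    "MPP EDW A": {"memory_tb": 4, "vcores": 512},
    "MPP EDW B": {"memory_tb": 16, "vcores": 840},
}


def ideal_scaling(nodes_before: int, nodes_after: int) -> tuple[float, float]:
    """Return (fractional time reduction, fractional throughput gain).

    Assumes perfectly linear scaling: a fixed workload whose execution
    time is proportional to 1 / number_of_nodes.
    """
    time_ratio = nodes_before / nodes_after
    return 1 - time_ratio, nodes_after / nodes_before - 1


def resource_ratio(name: str, baseline: str = "Hadoop cluster") -> dict[str, float]:
    """Memory and vcore multiples of cluster `name` relative to `baseline`."""
    target, base = CLUSTERS[name], CLUSTERS[baseline]
    return {key: target[key] / base[key] for key in target}


if __name__ == "__main__":
    cut, gain = ideal_scaling(10, 15)
    print(f"ideal 10 -> 15 nodes: {cut:.0%} less time, {gain:.0%} more throughput")
    for name in ("MPP EDW A", "MPP EDW B"):
        ratio = resource_ratio(name)
        print(f"{name}: {ratio['memory_tb']:.2f}x memory, {ratio['vcores']:.2f}x vcores")
```

Going from 10 to 15 nodes, ideal linear scaling would cut execution time by about 33%, which is the same as a 50% throughput gain, so the 30%–50% improvement reported on slide 17 is consistent with near-linear scaling under either reading of "improvement". Relative to the Hadoop cluster, MPP EDW A has the same memory and about 1.07x the vcores, while MPP EDW B has 4x the memory but 1.75x the vcores, so the 4x figure on slide 19 matches the memory ratio.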