SlideShare a Scribd company logo
1 of 19
Download to read offline
Hive on Spark - Production Experience
@Uber
Xuefu Zhang, Staff Engineer, Data Infra
Outline
● Hive at Uber
● Current Status
● Issues
● Future Work
● Conclusions
● Q&A
Hive at Uber
● Hundreds of active users daily
● Over 20K queries per day
● P50 - P90 execution time 2min - 20min
● Used for ETL and data analytics
● MR + Tez + Spark
Hive at Uber (cont’d)
● Efficiency is top priority
● Cluster operates at capacity
● Faster data, faster ETL
● Technology/operations/expertise consolidations
Why Hive on Spark
● Significantly less disk IO on HDFS
● Utilize memory for better performance
● Higher success rate with Uber’s workload
● Better supportability, observability, and UI
● Spark is widely adopted in your infrastructure
Why Hive on Spark (cont’d)
● On average 2X performance improvement
● On average 1.5X efficiency improvement
● Significantly reduce RPC calls to HDFS namenode (5X)
● Significantly reduce temp disk space on HDFS (10X)
Current Status
● By H1 2017,
○ All ad-hoc queries are on Hive on Spark
○ 15% ETL pipelines are migrated
○ Current Hive traffic breakdown: 50% MR, 40% Spark, 10% Tez
● By H2 2017
○ All workload are on Hive on Spark
○ MR usage will be exceptional
Issues
● Infrastructural issues
○ IPv4 & IPv6 (not to mix)
○ Network timeout (spark.network.timeout=800s)
○ Try to keep homogeneous nodes in the cluster
● Spark dynamic allocation issues
○ Backported many patches to Spark 1.6
○ spark.dynamicAllocation.maxExecutors=2000
Issues
● Hive issues
○ Unbounded memory usage for orderBy
○ Concurrency issues related to static variables
○ Spark executor and driver memory settings
○ Hive RPC server and client connection problems
Issues (cont’d)
○ Stats-related issues
■ Missing/inaccurate stats
■ No stats for nest columns
○ Performance issues
■ MapJoin small table size
■ Operator stats used for mapjoin
Issues (cont’d)
● Other Spark issues
○ Spark driver performance
○ Spark event queue size
○ Unbounded memory usage for groupby
○ Spark history server
Configurations (cont’d)
● Some of our configurations
spark.scheduler.listenerbus.eventqueue.size=50000
hive.spark.client.connect.timeout=5s
hive.spark.client.server.connect.timeout=1h
spark.locality.wait=0s
hive.spark.use.op.stats=false
hive.spark.use.file.size.for.mapjoin=true
Configurations (cont’d)
hive.spark.job.max.tasks=200000
hive.spark.stage.max.tasks=80000
spark.dynamicAllocation.initialExecutors=5
spark.executor.cores=5
spark.executor.memory=7168m
spark.yarn.executor.memoryOverhead=3072
spark.shuffle.manager=tungsten-sort
You may need to figure out what configurations work best for your cluster!
Future Work
● Global collaboration
○ Uber
○ Intel
○ Cloudera
○ Freelance contributors in the community
Future Work (cont’d)
● Improve Spark
○ Dynamic allocation
○ Driver performance
○ Resource efficiency
Future Work (cont’d)
● Improve Hive
○ Stats support for nested columns
○ Predicate pushdown for nested columns
○ Dynamic partition pruning
○ Full vectorization
○ Optimizations that currently only work for Tez
Conclusions
● HoS helps us on query performance and resource efficiency
● HoS significant reduces load on HDFS
● HoS helps us consolidate technologies
● Migration to HoS is fairly straight forward and transparent for most users
● However, there are catches in deployment and production
● More effort is on the way
Thank you
Proprietary and confidential © 2016 Uber Technologies, Inc. All rights reserved. No part of this document may be
reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or
by any information storage or retrieval systems, without permission in writing from Uber. This document is intended
only for the use of the individual or entity to whom it is addressed and contains information that is privileged,
confidential or otherwise exempt from disclosure under applicable law. All recipients of this document are notified
that the information contained herein includes proprietary and confidential information of Uber, and recipient may not
make use of, disseminate, or in any way disclose this document or any of the enclosed information to any person
other than employees of addressee to the extent necessary for consultations with authorized personnel of Uber.
Business Development Lead at Uber
+1. 415.237.5555
doreipwehociwjcioreoicnrm@uber.com
Q&A
We are hiring: https://www.uber.com/careers/list/27366/
Contact: abhik@uber.com, xuefu@uber.com

More Related Content

What's hot

How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...DataWorks Summit/Hadoop Summit
 
Unified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache SamzaUnified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache SamzaDataWorks Summit
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkDataWorks Summit
 
The state of Hive and Spark in the Cloud (July 2017)
The state of Hive and Spark in the Cloud (July 2017)The state of Hive and Spark in the Cloud (July 2017)
The state of Hive and Spark in the Cloud (July 2017)Nicolas Poggi
 
LLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveLLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveDataWorks Summit
 
A TPC Benchmark of Hive LLAP and Comparison with Presto
A TPC Benchmark of Hive LLAP and Comparison with PrestoA TPC Benchmark of Hive LLAP and Comparison with Presto
A TPC Benchmark of Hive LLAP and Comparison with PrestoYu Liu
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem DataWorks Summit/Hadoop Summit
 
LLAP Nov Meetup
LLAP Nov MeetupLLAP Nov Meetup
LLAP Nov Meetupt3rmin4t0r
 
NYC HUG - Application Architectures with Apache Hadoop
NYC HUG - Application Architectures with Apache HadoopNYC HUG - Application Architectures with Apache Hadoop
NYC HUG - Application Architectures with Apache Hadoopmarkgrover
 
Hive analytic workloads hadoop summit san jose 2014
Hive analytic workloads hadoop summit san jose 2014Hive analytic workloads hadoop summit san jose 2014
Hive analytic workloads hadoop summit san jose 2014alanfgates
 
Strata Stinger Talk October 2013
Strata Stinger Talk October 2013Strata Stinger Talk October 2013
Strata Stinger Talk October 2013alanfgates
 
Apache Ratis - In Search of a Usable Raft Library
Apache Ratis - In Search of a Usable Raft LibraryApache Ratis - In Search of a Usable Raft Library
Apache Ratis - In Search of a Usable Raft LibraryTsz-Wo (Nicholas) Sze
 
Stinger Initiative - Deep Dive
Stinger Initiative - Deep DiveStinger Initiative - Deep Dive
Stinger Initiative - Deep DiveHortonworks
 
Low latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache KuduLow latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache KuduDataWorks Summit
 
Architecting Applications with Hadoop
Architecting Applications with HadoopArchitecting Applications with Hadoop
Architecting Applications with Hadoopmarkgrover
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureDataWorks Summit
 
Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoopmarkgrover
 

What's hot (20)

How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
 
Unified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache SamzaUnified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache Samza
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
The state of Hive and Spark in the Cloud (July 2017)
The state of Hive and Spark in the Cloud (July 2017)The state of Hive and Spark in the Cloud (July 2017)
The state of Hive and Spark in the Cloud (July 2017)
 
LLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveLLAP: long-lived execution in Hive
LLAP: long-lived execution in Hive
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
A TPC Benchmark of Hive LLAP and Comparison with Presto
A TPC Benchmark of Hive LLAP and Comparison with PrestoA TPC Benchmark of Hive LLAP and Comparison with Presto
A TPC Benchmark of Hive LLAP and Comparison with Presto
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem
 
LLAP Nov Meetup
LLAP Nov MeetupLLAP Nov Meetup
LLAP Nov Meetup
 
NYC HUG - Application Architectures with Apache Hadoop
NYC HUG - Application Architectures with Apache HadoopNYC HUG - Application Architectures with Apache Hadoop
NYC HUG - Application Architectures with Apache Hadoop
 
Hive analytic workloads hadoop summit san jose 2014
Hive analytic workloads hadoop summit san jose 2014Hive analytic workloads hadoop summit san jose 2014
Hive analytic workloads hadoop summit san jose 2014
 
Strata Stinger Talk October 2013
Strata Stinger Talk October 2013Strata Stinger Talk October 2013
Strata Stinger Talk October 2013
 
Apache Ratis - In Search of a Usable Raft Library
Apache Ratis - In Search of a Usable Raft LibraryApache Ratis - In Search of a Usable Raft Library
Apache Ratis - In Search of a Usable Raft Library
 
Stinger Initiative - Deep Dive
Stinger Initiative - Deep DiveStinger Initiative - Deep Dive
Stinger Initiative - Deep Dive
 
Low latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache KuduLow latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache Kudu
 
What's new in Hadoop Common and HDFS
What's new in Hadoop Common and HDFS What's new in Hadoop Common and HDFS
What's new in Hadoop Common and HDFS
 
Architecting Applications with Hadoop
Architecting Applications with HadoopArchitecting Applications with Hadoop
Architecting Applications with Hadoop
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and Future
 
Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoop
 
Ingest and Stream Processing - What will you choose?
Ingest and Stream Processing - What will you choose?Ingest and Stream Processing - What will you choose?
Ingest and Stream Processing - What will you choose?
 

Similar to Hive on Spark, production experience @Uber

Even Faster: When Presto meets Parquet @ Uber
Even Faster: When Presto meets Parquet @ UberEven Faster: When Presto meets Parquet @ Uber
Even Faster: When Presto meets Parquet @ UberDataWorks Summit
 
Presto @ Uber Hadoop summit2017
Presto @ Uber Hadoop summit2017Presto @ Uber Hadoop summit2017
Presto @ Uber Hadoop summit2017Zhenxiao Luo
 
Uber Geo spatial data platform at DataWorks Summit
Uber Geo spatial data platform at DataWorks SummitUber Geo spatial data platform at DataWorks Summit
Uber Geo spatial data platform at DataWorks SummitZhenxiao Luo
 
Geospatial data platform at Uber
Geospatial data platform at UberGeospatial data platform at Uber
Geospatial data platform at UberDataWorks Summit
 
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a ServiceZeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a ServiceDatabricks
 
Presto Apache BigData 2017
Presto Apache BigData 2017Presto Apache BigData 2017
Presto Apache BigData 2017Zhenxiao Luo
 
Protecting the Data Lake
Protecting the Data LakeProtecting the Data Lake
Protecting the Data LakeAshutosh Narkar
 
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineSpark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineData Con LA
 
Fast, Flexible Application Development with Oracle Database Cloud Service
Fast, Flexible Application Development with Oracle Database Cloud ServiceFast, Flexible Application Development with Oracle Database Cloud Service
Fast, Flexible Application Development with Oracle Database Cloud ServiceGustavo Rene Antunez
 
Does Big Data Spell Big Costs- Impetus Webinar
Does Big Data Spell Big Costs- Impetus WebinarDoes Big Data Spell Big Costs- Impetus Webinar
Does Big Data Spell Big Costs- Impetus WebinarImpetus Technologies
 
A Practical Guide to Selecting a Stream Processing Technology
A Practical Guide to Selecting a Stream Processing Technology A Practical Guide to Selecting a Stream Processing Technology
A Practical Guide to Selecting a Stream Processing Technology confluent
 
Stinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of HortonworksStinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of HortonworksData Con LA
 
Best Practices for Becoming an Exceptional Postgres DBA
Best Practices for Becoming an Exceptional Postgres DBA Best Practices for Becoming an Exceptional Postgres DBA
Best Practices for Becoming an Exceptional Postgres DBA EDB
 
HTTP/2 Comes to Java - What Servlet 4.0 Means to You
HTTP/2 Comes to Java - What Servlet 4.0 Means to YouHTTP/2 Comes to Java - What Servlet 4.0 Means to You
HTTP/2 Comes to Java - What Servlet 4.0 Means to YouDavid Delabassee
 
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan SharmaSparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan SharmaSpark Summit
 
Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications Hortonworks
 
Oracle Openworld Presentation with Paul Kent (SAS) on Big Data Appliance and ...
Oracle Openworld Presentation with Paul Kent (SAS) on Big Data Appliance and ...Oracle Openworld Presentation with Paul Kent (SAS) on Big Data Appliance and ...
Oracle Openworld Presentation with Paul Kent (SAS) on Big Data Appliance and ...jdijcks
 
Simplify IT: Oracle SuperCluster
Simplify IT: Oracle SuperCluster Simplify IT: Oracle SuperCluster
Simplify IT: Oracle SuperCluster Fran Navarro
 
Hive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it finalHive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it finalHortonworks
 
Peteris Arajs - Where is my data
Peteris Arajs - Where is my dataPeteris Arajs - Where is my data
Peteris Arajs - Where is my dataAndrejs Vorobjovs
 

Similar to Hive on Spark, production experience @Uber (20)

Even Faster: When Presto meets Parquet @ Uber
Even Faster: When Presto meets Parquet @ UberEven Faster: When Presto meets Parquet @ Uber
Even Faster: When Presto meets Parquet @ Uber
 
Presto @ Uber Hadoop summit2017
Presto @ Uber Hadoop summit2017Presto @ Uber Hadoop summit2017
Presto @ Uber Hadoop summit2017
 
Uber Geo spatial data platform at DataWorks Summit
Uber Geo spatial data platform at DataWorks SummitUber Geo spatial data platform at DataWorks Summit
Uber Geo spatial data platform at DataWorks Summit
 
Geospatial data platform at Uber
Geospatial data platform at UberGeospatial data platform at Uber
Geospatial data platform at Uber
 
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a ServiceZeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
 
Presto Apache BigData 2017
Presto Apache BigData 2017Presto Apache BigData 2017
Presto Apache BigData 2017
 
Protecting the Data Lake
Protecting the Data LakeProtecting the Data Lake
Protecting the Data Lake
 
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineSpark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
 
Fast, Flexible Application Development with Oracle Database Cloud Service
Fast, Flexible Application Development with Oracle Database Cloud ServiceFast, Flexible Application Development with Oracle Database Cloud Service
Fast, Flexible Application Development with Oracle Database Cloud Service
 
Does Big Data Spell Big Costs- Impetus Webinar
Does Big Data Spell Big Costs- Impetus WebinarDoes Big Data Spell Big Costs- Impetus Webinar
Does Big Data Spell Big Costs- Impetus Webinar
 
A Practical Guide to Selecting a Stream Processing Technology
A Practical Guide to Selecting a Stream Processing Technology A Practical Guide to Selecting a Stream Processing Technology
A Practical Guide to Selecting a Stream Processing Technology
 
Stinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of HortonworksStinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of Hortonworks
 
Best Practices for Becoming an Exceptional Postgres DBA
Best Practices for Becoming an Exceptional Postgres DBA Best Practices for Becoming an Exceptional Postgres DBA
Best Practices for Becoming an Exceptional Postgres DBA
 
HTTP/2 Comes to Java - What Servlet 4.0 Means to You
HTTP/2 Comes to Java - What Servlet 4.0 Means to YouHTTP/2 Comes to Java - What Servlet 4.0 Means to You
HTTP/2 Comes to Java - What Servlet 4.0 Means to You
 
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan SharmaSparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
 
Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications
 
Oracle Openworld Presentation with Paul Kent (SAS) on Big Data Appliance and ...
Oracle Openworld Presentation with Paul Kent (SAS) on Big Data Appliance and ...Oracle Openworld Presentation with Paul Kent (SAS) on Big Data Appliance and ...
Oracle Openworld Presentation with Paul Kent (SAS) on Big Data Appliance and ...
 
Simplify IT: Oracle SuperCluster
Simplify IT: Oracle SuperCluster Simplify IT: Oracle SuperCluster
Simplify IT: Oracle SuperCluster
 
Hive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it finalHive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it final
 
Peteris Arajs - Where is my data
Peteris Arajs - Where is my dataPeteris Arajs - Where is my data
Peteris Arajs - Where is my data
 

Recently uploaded

Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...amitlee9823
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachBoston Institute of Analytics
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNKTimothy Spann
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...amitlee9823
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraGovindSinghDasila
 
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...gajnagarg
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...only4webmaster01
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...amitlee9823
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...amitlee9823
 
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...gajnagarg
 

Recently uploaded (20)

Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning Approach
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
 

Hive on Spark, production experience @Uber

  • 1. Hive on Spark - Production Experience @Uber Xuefu Zhang, Staff Engineer, Data Infra
  • 2. Outline ● Hive at Uber ● Current Status ● Issues ● Future Work ● Conclusions ● Q&A
  • 3. Hive at Uber ● Hundreds of active users daily ● Over 20K queries per day ● P50 - P90 execution time 2min - 20min ● Used for ETL and data analytics ● MR + Tez + Spark
  • 4. Hive at Uber (cont’d) ● Efficiency is top priority ● Cluster operates at capacity ● Faster data, faster ETL ● Technology/operations/expertise consolidations
  • 5. Why Hive on Spark ● Significantly less disk IO on HDFS ● Utilize memory for better performance ● Higher success rate with Uber’s workload ● Better supportability, observability, and UI ● Spark is widely adopted in your infrastructure
  • 6. Why Hive on Spark (cont’d) ● On average 2X performance improvement ● On average 1.5X efficiency improvement ● Significantly reduce RPC calls to HDFS namenode (5X) ● Significantly reduce temp disk space on HDFS (10X)
  • 7. Current Status ● By H1 2017, ○ All ad-hoc queries are on Hive on Spark ○ 15% ETL pipelines are migrated ○ Current Hive traffic breakdown: 50% MR, 40% Spark, 10% Tez ● By H2 2017 ○ All workload are on Hive on Spark ○ MR usage will be exceptional
  • 8. Issues ● Infrastructural issues ○ IPv4 & IPv6 (not to mix) ○ Network timeout (spark.network.timeout=800s) ○ Try to keep homogeneous nodes in the cluster ● Spark dynamic allocation issues ○ Backported many patches to Spark 1.6 ○ spark.dynamicAllocation.maxExecutors=2000
  • 9. Issues ● Hive issues ○ Unbounded memory usage for orderBy ○ Concurrency issues related to static variables ○ Spark executor and driver memory settings ○ Hive RPC server and client connection problems
  • 10. Issues (cont’d) ○ Stats-related issues ■ Missing/inaccurate stats ■ No stats for nest columns ○ Performance issues ■ MapJoin small table size ■ Operator stats used for mapjoin
  • 11. Issues (cont’d) ● Other Spark issues ○ Spark driver performance ○ Spark event queue size ○ Unbounded memory usage for groupby ○ Spark history server
  • 12. Configurations (cont’d) ● Some of our configurations spark.scheduler.listenerbus.eventqueue.size=50000 hive.spark.client.connect.timeout=5s hive.spark.client.server.connect.timeout=1h spark.locality.wait=0s hive.spark.use.op.stats=false hive.spark.use.file.size.for.mapjoin=true
  • 14. Future Work ● Global collaboration ○ Uber ○ Intel ○ Cloudera ○ Freelance contributors in the community
  • 15. Future Work (cont’d) ● Improve Spark ○ Dynamic allocation ○ Driver performance ○ Resource efficiency
  • 16. Future Work (cont’d) ● Improve Hive ○ Stats support for nested columns ○ Predicate pushdown for nested columns ○ Dynamic partition pruning ○ Full vectorization ○ Optimizations that currently only work for Tez
  • 17. Conclusions ● HoS helps us on query performance and resource efficiency ● HoS significant reduces load on HDFS ● HoS helps us consolidate technologies ● Migration to HoS is fairly straight forward and transparent for most users ● However, there are catches in deployment and production ● More effort is on the way
  • 18. Thank you Proprietary and confidential © 2016 Uber Technologies, Inc. All rights reserved. No part of this document may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval systems, without permission in writing from Uber. This document is intended only for the use of the individual or entity to whom it is addressed and contains information that is privileged, confidential or otherwise exempt from disclosure under applicable law. All recipients of this document are notified that the information contained herein includes proprietary and confidential information of Uber, and recipient may not make use of, disseminate, or in any way disclose this document or any of the enclosed information to any person other than employees of addressee to the extent necessary for consultations with authorized personnel of Uber. Business Development Lead at Uber +1. 415.237.5555 doreipwehociwjcioreoicnrm@uber.com
  • 19. Q&A We are hiring: https://www.uber.com/careers/list/27366/ Contact: abhik@uber.com, xuefu@uber.com