Apache Hive facilitates querying and managing large datasets residing in distributed storage. Backed by a very wide community, Hive has been extended to support multiple distributed storage systems, and it is now common practice for an organization to keep data in several of them. This presentation covers two important aspects of Apache Hive. The first is how Hive lets organizations run complex analytical queries across various storage systems and big data components; we recently added HiveKa to support Hive queries on Kafka, and will use it as an example. At Cloudera, we focus not only on providing solutions that help organizations answer bigger questions, but also on making sure those solutions are robust. The second aspect covers the advanced methods and technologies, such as random query generators, Docker, and benchmarks, that we use at Cloudera to make sure Hive is ready to find the right answers in the huge volume, high velocity, and wide variety of today's data.
The world's technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s; as of 2012, every day 2.5 exabytes of data were created.
Hadoop = distributed computational framework + distributed storage system
Huge community of developers and users
Rich feature sets
Modular enough to plug in any computational framework: MapReduce, Spark, or the next cool engine
Storage handlers let Hive treat almost any external system as its storage layer
Built on InputFormat, OutputFormat, and SerDe
First introduced for HBase
HBase, JDBC, MongoDB, Google Spreadsheets, Solr, Elasticsearch, Kafka, etc.
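As a concrete illustration of the storage-handler mechanism, the HBase integration (the first handler mentioned above) is declared at table-creation time; the handler class supplies the InputFormat, OutputFormat, and SerDe. This is a minimal sketch using the documented Hive/HBase syntax; the table, column family, and HBase table names are made up for the example.

```sql
-- Hive table backed by HBase via a storage handler.
-- The handler class wires in the HBase-specific InputFormat/OutputFormat/SerDe.
CREATE TABLE hbase_backed_table (key INT, value STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "example_table");
```

Queries against `hbase_backed_table` then run as ordinary HiveQL, with the handler translating reads and writes into HBase operations.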
High throughput distributed messaging system
Scalable, resilient, and low-latency
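Exposing Kafka to Hive follows the same storage-handler pattern. The sketch below uses the Kafka storage handler shipped with recent Hive releases; whether HiveKa used this exact class name is an assumption, and the topic, broker address, and columns are illustrative only.

```sql
-- Hive external table over a Kafka topic via a storage handler (sketch).
-- Topic name, broker address, and schema are hypothetical.
CREATE EXTERNAL TABLE kafka_events (event_id STRING, payload STRING)
STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
TBLPROPERTIES (
  "kafka.topic" = "events",
  "kafka.bootstrap.servers" = "localhost:9092"
);
```

With such a table in place, analysts can join streaming Kafka data against tables in HDFS or HBase from a single HiveQL query, which is the cross-storage capability this talk highlights.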