SlideShare a Scribd company logo

Impala: Real-time Queries in Hadoop

Learn how Cloudera Impala empowers you to: - Perform interactive, real-time analysis directly on source data stored in Hadoop - Interact with data in HDFS and HBase at the “speed of thought” - Reduce data movement between systems & eliminate double storage

1 of 21
Cloudera	
  Impala	
  
Jus/n	
  Erickson	
  |	
  Product	
  Manager	
  
	
  
November	
  2012	
  
Why	
  Data	
  Scien/sts	
  Love	
  Hadoop	
  
                                 	
  
    •    Massive	
  volumes	
  of	
  data	
  


                          	
  
    •    Data	
  prepara/on	
  &	
  analy/cs	
  in	
  1	
  environment	
  
    •    Highly	
  flexible	
  environment	
  for	
  crea/ng	
  &	
  tes/ng	
  machine	
  learning	
  models	
  


                          	
  
    •    10%	
  the	
  cost/TB	
  under	
  management	
  


                                           ©2012	
  Cloudera,	
  Inc.	
  All	
  Rights	
  Reserved.	
  
Hadoop	
  Use	
  Cases	
  Moving	
  to	
  Real-­‐Time	
  




         Already	
  query	
            Already	
  load	
  data	
  into	
                            Already	
  use	
  HBase	
  for	
  
       Hadoop	
  using	
  Hive	
      CDH	
  every	
  90	
  mins	
  or	
  less	
  	
                real-­‐/me	
  data	
  access	
  




                                                                       Source:	
  Cloudera	
  customer	
  survey	
  August	
  2012	
  

                                     ©2012	
  Cloudera,	
  Inc.	
  All	
  Rights	
  Reserved.	
  
But	
  Hadoop	
  Isn’t	
  Fast	
  Enough	
  




         Need	
  faster	
           Move	
  data	
  from	
  	
                               See	
  value	
  today	
  in	
  
          queries	
  on	
         Hadoop	
  to	
  RDBMS	
  for	
                             consolida/ng	
  to	
  a	
  
         Hadoop	
  data	
            interac/ve	
  SQL	
                                      single	
  plaYorm	
  




                                                                Source:	
  Cloudera	
  customer	
  survey	
  August	
  2012	
  

                              ©2012	
  Cloudera,	
  Inc.	
  All	
  Rights	
  Reserved.	
  
Beyond	
  Batch	
  –	
  The	
  Next	
  Stage	
  for	
  Hadoop	
  
                    HADOOP	
  TODAY	
  IS	
  TOO	
  SLOW	
  
                             MapReduce	
  is	
  batch	
  
           Simple	
  queries	
  can	
  take	
  minutes	
  /	
  tens	
  of	
  minutes	
  
                                               	
  
                              	
  
      CURRENT	
  DATA	
  MANAGEMENT	
  IS	
  TOO	
  COMPLEX	
  
                         Op/mized	
  for	
  rigid	
  schemas	
  &	
  	
  
                           special	
  purpose	
  applica/ons	
  
                   Redundant	
  data	
  storage	
  &	
  processes	
  
                  Very	
  expensive	
  systems:	
  $20K-­‐150K	
  /	
  TB	
  
                                                          	
  
                                  ©2012	
  Cloudera,	
  Inc.	
  All	
  Rights	
  Reserved.	
  
Cloudera	
  Enterprise	
  RTQ	
  
Real-­‐Time	
  Query	
  for	
  Data	
  Stored	
  in	
  Hadoop	
  	
  
Powered	
  by	
  Cloudera	
  Impala.	
  
                                                      Supports	
  Hive	
  SQL	
  
                                                      4-­‐30X	
  faster	
  than	
  Hive	
  over	
  MapReduce	
  
                                                      Supports	
  mul/ple	
  storage	
  engines	
  &	
  	
  
                                                      file	
  formats	
  
                                                      Uses	
  exis/ng	
  drivers,	
  integrates	
  with	
  exis/ng	
  
                                                      metastore,	
  works	
  with	
  leading	
  BI	
  tools	
  
                                                      Flexible,	
  cost-­‐effec/ve,	
  no	
  lock-­‐in	
  

                                                      Deploy	
  &	
  operate	
  with	
  Cloudera	
  Manager	
  

                                     ©2012	
  Cloudera,	
  Inc.	
  All	
  Rights	
  Reserved.	
  

Recommended

Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveDataWorks Summit
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDatabricks
 
Native Support of Prometheus Monitoring in Apache Spark 3.0
Native Support of Prometheus Monitoring in Apache Spark 3.0Native Support of Prometheus Monitoring in Apache Spark 3.0
Native Support of Prometheus Monitoring in Apache Spark 3.0Databricks
 
Spark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object storesSpark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object storesSteve Loughran
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing DataWorks Summit
 
Introduction to Spark Internals
Introduction to Spark InternalsIntroduction to Spark Internals
Introduction to Spark InternalsPietro Michiardi
 
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - ClouderaHadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - ClouderaCloudera, Inc.
 

More Related Content

What's hot

How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiFlink Forward
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm Chandler Huang
 
Introduction to Spark
Introduction to SparkIntroduction to Spark
Introduction to SparkLi Ming Tsai
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudNoritaka Sekiyama
 
Emr spark tuning demystified
Emr spark tuning demystifiedEmr spark tuning demystified
Emr spark tuning demystifiedOmid Vahdaty
 
Spark overview
Spark overviewSpark overview
Spark overviewLisa Hua
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsAnton Kirillov
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...DataWorks Summit/Hadoop Summit
 
Apache HBase Performance Tuning
Apache HBase Performance TuningApache HBase Performance Tuning
Apache HBase Performance TuningLars Hofhansl
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeFlink Forward
 
Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupDatabricks
 
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsTop 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsCloudera, Inc.
 

What's hot (20)

How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 
Introduction to Spark
Introduction to SparkIntroduction to Spark
Introduction to Spark
 
Spark
SparkSpark
Spark
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
 
Apache Hadoop 3
Apache Hadoop 3Apache Hadoop 3
Apache Hadoop 3
 
Emr spark tuning demystified
Emr spark tuning demystifiedEmr spark tuning demystified
Emr spark tuning demystified
 
Spark overview
Spark overviewSpark overview
Spark overview
 
The Impala Cookbook
The Impala CookbookThe Impala Cookbook
The Impala Cookbook
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & Internals
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
Apache HBase Performance Tuning
Apache HBase Performance TuningApache HBase Performance Tuning
Apache HBase Performance Tuning
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Achieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on TezAchieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on Tez
 
1. Apache HIVE
1. Apache HIVE1. Apache HIVE
1. Apache HIVE
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark Meetup
 
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsTop 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
 

Viewers also liked

Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Cloudera, Inc.
 
Impala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for HadoopImpala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for HadoopCloudera, Inc.
 
Cloudera Impala: A Modern SQL Engine for Hadoop
Cloudera Impala: A Modern SQL Engine for HadoopCloudera Impala: A Modern SQL Engine for Hadoop
Cloudera Impala: A Modern SQL Engine for HadoopCloudera, Inc.
 
Real-Time Queries in Hadoop w/ Cloudera Impala
Real-Time Queries in Hadoop w/ Cloudera ImpalaReal-Time Queries in Hadoop w/ Cloudera Impala
Real-Time Queries in Hadoop w/ Cloudera ImpalaData Science London
 
Cloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera Impala: A modern SQL Query Engine for HadoopCloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera Impala: A modern SQL Query Engine for HadoopCloudera, Inc.
 
Webinar | Using Hadoop Analytics to Gain a Big Data Advantage
Webinar | Using Hadoop Analytics to Gain a Big Data AdvantageWebinar | Using Hadoop Analytics to Gain a Big Data Advantage
Webinar | Using Hadoop Analytics to Gain a Big Data AdvantageCloudera, Inc.
 
Hive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it finalHive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it finalHortonworks
 
Impala Architecture presentation
Impala Architecture presentationImpala Architecture presentation
Impala Architecture presentationhadooparchbook
 
Cloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache HadoopCloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache HadoopCloudera, Inc.
 
Introduction to Apache HBase Training
Introduction to Apache HBase TrainingIntroduction to Apache HBase Training
Introduction to Apache HBase TrainingCloudera, Inc.
 
Data Visualization Design Best Practices Workshop
Data Visualization Design Best Practices WorkshopData Visualization Design Best Practices Workshop
Data Visualization Design Best Practices WorkshopJSI
 
Impala use case @ Zoosk
Impala use case @ ZooskImpala use case @ Zoosk
Impala use case @ ZooskCloudera, Inc.
 
Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer
Apache Spark & Cassandra use case at Telefónica Cbs by Antonio AlcacerApache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer
Apache Spark & Cassandra use case at Telefónica Cbs by Antonio AlcacerStratio
 
Cloudera Impala Internals
Cloudera Impala InternalsCloudera Impala Internals
Cloudera Impala InternalsDavid Groozman
 
Hadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster AccessHadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster AccessCloudera, Inc.
 

Viewers also liked (20)

Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
 
Impala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for HadoopImpala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for Hadoop
 
Cloudera Impala: A Modern SQL Engine for Hadoop
Cloudera Impala: A Modern SQL Engine for HadoopCloudera Impala: A Modern SQL Engine for Hadoop
Cloudera Impala: A Modern SQL Engine for Hadoop
 
Cloudera impala
Cloudera impalaCloudera impala
Cloudera impala
 
Real-Time Queries in Hadoop w/ Cloudera Impala
Real-Time Queries in Hadoop w/ Cloudera ImpalaReal-Time Queries in Hadoop w/ Cloudera Impala
Real-Time Queries in Hadoop w/ Cloudera Impala
 
Bhan hiv
Bhan hivBhan hiv
Bhan hiv
 
Cloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera Impala: A modern SQL Query Engine for HadoopCloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera Impala: A modern SQL Query Engine for Hadoop
 
Incredible Impala
Incredible Impala Incredible Impala
Incredible Impala
 
Webinar | Using Hadoop Analytics to Gain a Big Data Advantage
Webinar | Using Hadoop Analytics to Gain a Big Data AdvantageWebinar | Using Hadoop Analytics to Gain a Big Data Advantage
Webinar | Using Hadoop Analytics to Gain a Big Data Advantage
 
Hive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it finalHive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it final
 
Impala Architecture presentation
Impala Architecture presentationImpala Architecture presentation
Impala Architecture presentation
 
ImpalaToGo use case
ImpalaToGo use caseImpalaToGo use case
ImpalaToGo use case
 
Cloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache HadoopCloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache Hadoop
 
Introduction to Apache HBase Training
Introduction to Apache HBase TrainingIntroduction to Apache HBase Training
Introduction to Apache HBase Training
 
Data Visualization Design Best Practices Workshop
Data Visualization Design Best Practices WorkshopData Visualization Design Best Practices Workshop
Data Visualization Design Best Practices Workshop
 
Impala use case @ Zoosk
Impala use case @ ZooskImpala use case @ Zoosk
Impala use case @ Zoosk
 
Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer
Apache Spark & Cassandra use case at Telefónica Cbs by Antonio AlcacerApache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer
Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
Cloudera Impala Internals
Cloudera Impala InternalsCloudera Impala Internals
Cloudera Impala Internals
 
Hadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster AccessHadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster Access
 

Similar to Impala: Real-time Queries in Hadoop

Data Science Day New York: The Platform for Big Data
Data Science Day New York: The Platform for Big DataData Science Day New York: The Platform for Big Data
Data Science Day New York: The Platform for Big DataCloudera, Inc.
 
The power of hadoop in cloud computing
The power of hadoop in cloud computingThe power of hadoop in cloud computing
The power of hadoop in cloud computingJoey Echeverria
 
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Cloudera, Inc.
 
Strata + Hadoop World 2012: Data Science on Hadoop: How Cloudera Impala Unloc...
Strata + Hadoop World 2012: Data Science on Hadoop: How Cloudera Impala Unloc...Strata + Hadoop World 2012: Data Science on Hadoop: How Cloudera Impala Unloc...
Strata + Hadoop World 2012: Data Science on Hadoop: How Cloudera Impala Unloc...Cloudera, Inc.
 
Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...
Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...
Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...Cloudera, Inc.
 
Business Intelligence and Data Analytics Revolutionized with Apache Hadoop
Business Intelligence and Data Analytics Revolutionized with Apache HadoopBusiness Intelligence and Data Analytics Revolutionized with Apache Hadoop
Business Intelligence and Data Analytics Revolutionized with Apache HadoopCloudera, Inc.
 
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...Amr Awadallah
 
Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014cdmaxime
 
Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013Jonathan Seidman
 
Houston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
Houston Hadoop Meetup Presentation by Vikram Oberoi of ClouderaHouston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
Houston Hadoop Meetup Presentation by Vikram Oberoi of ClouderaMark Kerzner
 
Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...yaevents
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaSwiss Big Data User Group
 
Big SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor LandscapeBig SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor LandscapeNicolas Morales
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impalahuguk
 
Integrating Hadoop Into the Enterprise
Integrating Hadoop Into the EnterpriseIntegrating Hadoop Into the Enterprise
Integrating Hadoop Into the EnterpriseDataWorks Summit
 
Hadoop Summit 2012 | Integrating Hadoop Into the Enterprise
Hadoop Summit 2012 | Integrating Hadoop Into the EnterpriseHadoop Summit 2012 | Integrating Hadoop Into the Enterprise
Hadoop Summit 2012 | Integrating Hadoop Into the EnterpriseCloudera, Inc.
 
Improving MySQL performance with Hadoop
Improving MySQL performance with HadoopImproving MySQL performance with Hadoop
Improving MySQL performance with HadoopSagar Jauhari
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemShivaji Dutta
 
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014cdmaxime
 
Integrating Hadoop Into the Enterprise – Hadoop Summit 2012
Integrating Hadoop Into the Enterprise – Hadoop Summit 2012Integrating Hadoop Into the Enterprise – Hadoop Summit 2012
Integrating Hadoop Into the Enterprise – Hadoop Summit 2012Jonathan Seidman
 

Similar to Impala: Real-time Queries in Hadoop (20)

Data Science Day New York: The Platform for Big Data
Data Science Day New York: The Platform for Big DataData Science Day New York: The Platform for Big Data
Data Science Day New York: The Platform for Big Data
 
The power of hadoop in cloud computing
The power of hadoop in cloud computingThe power of hadoop in cloud computing
The power of hadoop in cloud computing
 
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
 
Strata + Hadoop World 2012: Data Science on Hadoop: How Cloudera Impala Unloc...
Strata + Hadoop World 2012: Data Science on Hadoop: How Cloudera Impala Unloc...Strata + Hadoop World 2012: Data Science on Hadoop: How Cloudera Impala Unloc...
Strata + Hadoop World 2012: Data Science on Hadoop: How Cloudera Impala Unloc...
 
Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...
Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...
Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...
 
Business Intelligence and Data Analytics Revolutionized with Apache Hadoop
Business Intelligence and Data Analytics Revolutionized with Apache HadoopBusiness Intelligence and Data Analytics Revolutionized with Apache Hadoop
Business Intelligence and Data Analytics Revolutionized with Apache Hadoop
 
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
 
Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014
 
Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013
 
Houston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
Houston Hadoop Meetup Presentation by Vikram Oberoi of ClouderaHouston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
Houston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
 
Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
Big SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor LandscapeBig SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor Landscape
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
Integrating Hadoop Into the Enterprise
Integrating Hadoop Into the EnterpriseIntegrating Hadoop Into the Enterprise
Integrating Hadoop Into the Enterprise
 
Hadoop Summit 2012 | Integrating Hadoop Into the Enterprise
Hadoop Summit 2012 | Integrating Hadoop Into the EnterpriseHadoop Summit 2012 | Integrating Hadoop Into the Enterprise
Hadoop Summit 2012 | Integrating Hadoop Into the Enterprise
 
Improving MySQL performance with Hadoop
Improving MySQL performance with HadoopImproving MySQL performance with Hadoop
Improving MySQL performance with Hadoop
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystem
 
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
 
Integrating Hadoop Into the Enterprise – Hadoop Summit 2012
Integrating Hadoop Into the Enterprise – Hadoop Summit 2012Integrating Hadoop Into the Enterprise – Hadoop Summit 2012
Integrating Hadoop Into the Enterprise – Hadoop Summit 2012
 

More from Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

More from Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Recently uploaded

How we think about an advisor tech stack
How we think about an advisor tech stackHow we think about an advisor tech stack
How we think about an advisor tech stackSummit
 
Confoo 2024 Gettings started with OpenAI and data science
Confoo 2024 Gettings started with OpenAI and data scienceConfoo 2024 Gettings started with OpenAI and data science
Confoo 2024 Gettings started with OpenAI and data scienceSusan Ibach
 
Building Products That Think- Bhaskaran Srinivasan & Ashish Gupta
Building Products That Think- Bhaskaran Srinivasan & Ashish GuptaBuilding Products That Think- Bhaskaran Srinivasan & Ashish Gupta
Building Products That Think- Bhaskaran Srinivasan & Ashish GuptaISPMAIndia
 
Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24
Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24
Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24Umar Saif
 
IT Nation Evolve event 2024 - Quarter 1
IT Nation Evolve event 2024  - Quarter 1IT Nation Evolve event 2024  - Quarter 1
IT Nation Evolve event 2024 - Quarter 1Inbay UK
 
Enterprise Architecture As Strategy - Book Review
Enterprise Architecture As Strategy - Book ReviewEnterprise Architecture As Strategy - Book Review
Enterprise Architecture As Strategy - Book ReviewAshraf Fouad
 
Artificial Intelligence, Design, and More-than-Human Justice
Artificial Intelligence, Design, and More-than-Human JusticeArtificial Intelligence, Design, and More-than-Human Justice
Artificial Intelligence, Design, and More-than-Human JusticeJosh Gellers
 
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...Product School
 
Launching New Products In Companies Where It Matters Most by Product Director...
Launching New Products In Companies Where It Matters Most by Product Director...Launching New Products In Companies Where It Matters Most by Product Director...
Launching New Products In Companies Where It Matters Most by Product Director...Product School
 
Introducing the New FME Community Webinar - Feb 21, 2024 (2).pdf
Introducing the New FME Community Webinar - Feb 21, 2024 (2).pdfIntroducing the New FME Community Webinar - Feb 21, 2024 (2).pdf
Introducing the New FME Community Webinar - Feb 21, 2024 (2).pdfSafe Software
 
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro KozhevinFwdays
 
Dynamical systems simulation in Python for science and engineering
Dynamical systems simulation in Python for science and engineeringDynamical systems simulation in Python for science and engineering
Dynamical systems simulation in Python for science and engineeringMassimo Talia
 
"Platform Engineering with Development Containers", Igor Fesenko
"Platform Engineering with Development Containers", Igor Fesenko"Platform Engineering with Development Containers", Igor Fesenko
"Platform Engineering with Development Containers", Igor FesenkoFwdays
 
Roundtable_-_API_Research__Testing_Tools.pdf
Roundtable_-_API_Research__Testing_Tools.pdfRoundtable_-_API_Research__Testing_Tools.pdf
Roundtable_-_API_Research__Testing_Tools.pdfMostafa Higazy
 
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...htrindia
 
Campotel: Telecommunications Infra and Network Builder - Company Profile
Campotel: Telecommunications Infra and Network Builder - Company ProfileCampotel: Telecommunications Infra and Network Builder - Company Profile
Campotel: Telecommunications Infra and Network Builder - Company ProfileCampotelPhilippines
 
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...Product School
 
My Journey towards Artificial Intelligence
My Journey towards Artificial IntelligenceMy Journey towards Artificial Intelligence
My Journey towards Artificial IntelligenceVijayananda Mohire
 
ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...
ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...
ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...Neo4j
 

Recently uploaded (20)

How we think about an advisor tech stack
How we think about an advisor tech stackHow we think about an advisor tech stack
How we think about an advisor tech stack
 
Confoo 2024 Gettings started with OpenAI and data science
Confoo 2024 Gettings started with OpenAI and data scienceConfoo 2024 Gettings started with OpenAI and data science
Confoo 2024 Gettings started with OpenAI and data science
 
Building Products That Think- Bhaskaran Srinivasan & Ashish Gupta
Building Products That Think- Bhaskaran Srinivasan & Ashish GuptaBuilding Products That Think- Bhaskaran Srinivasan & Ashish Gupta
Building Products That Think- Bhaskaran Srinivasan & Ashish Gupta
 
Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24
Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24
Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24
 
IT Nation Evolve event 2024 - Quarter 1
IT Nation Evolve event 2024  - Quarter 1IT Nation Evolve event 2024  - Quarter 1
IT Nation Evolve event 2024 - Quarter 1
 
Enterprise Architecture As Strategy - Book Review
Enterprise Architecture As Strategy - Book ReviewEnterprise Architecture As Strategy - Book Review
Enterprise Architecture As Strategy - Book Review
 
Artificial Intelligence, Design, and More-than-Human Justice
Artificial Intelligence, Design, and More-than-Human JusticeArtificial Intelligence, Design, and More-than-Human Justice
Artificial Intelligence, Design, and More-than-Human Justice
 
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...
 
Launching New Products In Companies Where It Matters Most by Product Director...
Launching New Products In Companies Where It Matters Most by Product Director...Launching New Products In Companies Where It Matters Most by Product Director...
Launching New Products In Companies Where It Matters Most by Product Director...
 
Introducing the New FME Community Webinar - Feb 21, 2024 (2).pdf
Introducing the New FME Community Webinar - Feb 21, 2024 (2).pdfIntroducing the New FME Community Webinar - Feb 21, 2024 (2).pdf
Introducing the New FME Community Webinar - Feb 21, 2024 (2).pdf
 
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin
 
Dynamical systems simulation in Python for science and engineering
Dynamical systems simulation in Python for science and engineeringDynamical systems simulation in Python for science and engineering
Dynamical systems simulation in Python for science and engineering
 
In sharing we trust. Taking advantage of a diverse consortium to build a tran...
In sharing we trust. Taking advantage of a diverse consortium to build a tran...In sharing we trust. Taking advantage of a diverse consortium to build a tran...
In sharing we trust. Taking advantage of a diverse consortium to build a tran...
 
"Platform Engineering with Development Containers", Igor Fesenko
"Platform Engineering with Development Containers", Igor Fesenko"Platform Engineering with Development Containers", Igor Fesenko
"Platform Engineering with Development Containers", Igor Fesenko
 
Roundtable_-_API_Research__Testing_Tools.pdf
Roundtable_-_API_Research__Testing_Tools.pdfRoundtable_-_API_Research__Testing_Tools.pdf
Roundtable_-_API_Research__Testing_Tools.pdf
 
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...
 
Campotel: Telecommunications Infra and Network Builder - Company Profile
Campotel: Telecommunications Infra and Network Builder - Company ProfileCampotel: Telecommunications Infra and Network Builder - Company Profile
Campotel: Telecommunications Infra and Network Builder - Company Profile
 
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
 
My Journey towards Artificial Intelligence
My Journey towards Artificial IntelligenceMy Journey towards Artificial Intelligence
My Journey towards Artificial Intelligence
 
ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...
ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...
ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...
 

Impala: Real-time Queries in Hadoop

  • 1. Cloudera  Impala   Jus/n  Erickson  |  Product  Manager     November  2012  
  • 2. Why  Data  Scien/sts  Love  Hadoop     •  Massive  volumes  of  data     •  Data  prepara/on  &  analy/cs  in  1  environment   •  Highly  flexible  environment  for  crea/ng  &  tes/ng  machine  learning  models     •  10%  the  cost/TB  under  management   ©2012  Cloudera,  Inc.  All  Rights  Reserved.  
  • 3. Hadoop  Use  Cases  Moving  to  Real-­‐Time   Already  query   Already  load  data  into   Already  use  HBase  for   Hadoop  using  Hive   CDH  every  90  mins  or  less     real-­‐/me  data  access   Source:  Cloudera  customer  survey  August  2012   ©2012  Cloudera,  Inc.  All  Rights  Reserved.  
  • 4. But  Hadoop  Isn’t  Fast  Enough   Need  faster   Move  data  from     See  value  today  in   queries  on   Hadoop  to  RDBMS  for   consolida/ng  to  a   Hadoop  data   interac/ve  SQL   single  plaYorm   Source:  Cloudera  customer  survey  August  2012   ©2012  Cloudera,  Inc.  All  Rights  Reserved.  
  • 5. Beyond  Batch  –  The  Next  Stage  for  Hadoop   HADOOP  TODAY  IS  TOO  SLOW   MapReduce  is  batch   Simple  queries  can  take  minutes  /  tens  of  minutes       CURRENT  DATA  MANAGEMENT  IS  TOO  COMPLEX   Op/mized  for  rigid  schemas  &     special  purpose  applica/ons   Redundant  data  storage  &  processes   Very  expensive  systems:  $20K-­‐150K  /  TB     ©2012  Cloudera,  Inc.  All  Rights  Reserved.  
  • 6. Cloudera  Enterprise  RTQ   Real-­‐Time  Query  for  Data  Stored  in  Hadoop     Powered  by  Cloudera  Impala.   Supports  Hive  SQL   4-­‐30X  faster  than  Hive  over  MapReduce   Supports  mul/ple  storage  engines  &     file  formats   Uses  exis/ng  drivers,  integrates  with  exis/ng   metastore,  works  with  leading  BI  tools   Flexible,  cost-­‐effec/ve,  no  lock-­‐in   Deploy  &  operate  with  Cloudera  Manager   ©2012  Cloudera,  Inc.  All  Rights  Reserved.  
  • 7. Cloudera  Now  Powered  by  Impala   BEFORE  IMPALA   WITH  IMPALA   USER  INTERFACE   BATCH  PROCESSING   REAL-­‐TIME  ACCESS   •  Unified  Storage:   •  With  Impala:     Supports  HDFS  and  HBase   Real-­‐/me  SQL  queries   Flexible  file  formats   Na/ve  distributed  query  engine   •  Unified  Metastore   Op/mized  for  low-­‐latency   •  Unified  Security   •  Provides:   •  Unified  Client  Interfaces:   Answers  as  fast  as  you  can  ask   ODBC,  SQL  syntax,  Hue  Beeswax   Everyone  to  ask  ques/ons  for  all  data   Big  data  storage  and  analy/cs  together   ©2012  Cloudera,  Inc.  All  Rights  Reserved.  
  • 8. Impala  beta  features   Today  (Cloudera  Impala  0.1):   •  Nearly  all  of  Hive's  SQL,  including  insert,  join,  and  subqueries   •  Query  results  4-­‐30X  faster  than  Hive   •  Same  open  Hive  metadata  model  =>  easy  to  create  &  change  schema   •  Support  for  HDFS  and  HBase  storage   •  HDFS  file  formats:  TextFile,  SequenceFile   •  HDFS  compression:  Snappy,  GZIP,  BZIP   •  Common  ODBC  driver  and  Hue  Beeswax  with  Hive   •  Separate  CLI  than  Hive   Next  few  months:   •  Support  for  Avro,  RCFile  &  LZO  compressed  text   •  Addi/onal  OS  support   •  Trevni  columnar  format   •  JDBC  driver   •  DDL   •  Straggler  handling   •  Increased  join  perf   ©2012  Cloudera,  Inc.  All  Rights  Reserved.  
  • 9. Impala  v0.1  SQL  (HiveQL)   •  Select   –  Boolean,  /nyint,  smallint,  int,  bigint,  float,  double,  /mestamp,  string   –  All,  dis/nct   –  Subqueries  (in  from  clause)   –  Where,  group  by,  having   –  Order  by  (with  limit  ini/ally)   –  Joins  (ler,  right,  full,  outer),  mul/-­‐table,  subquery   –  Union  all   –  Limit   –  External  tables   –  Rela/onal,  arithme/c,  logical  operators   –  Math,  collec/on,  cast,  date,  condi/onal,  string,  /mestamp  built-­‐ins  (e.g.  count,  sum,  cast,  case,  like,   in,  between,  coalesce)   •  Insert  into   ©2012  Cloudera,  Inc.  All  Rights  Reserved.  
  • 10. Cloudera  Impala  Details   Common  Hive  SQL  and  interface   Unified  metadata  and  scheduler   SQL  App   Hive   State   Metastore   YARN   HDFS  NN   Store   ODBC   Query  Planner   Query  Planner   Fully  MPP   Query  Planner   Query  Coordinator   Query  Coordinator   Distributed   Query  Coordinator   Query  Exec  Engine   Query  Exec  Engine   Query  Exec  Engine   HDFS  DN   HBase   HDFS  DN   HBase   HDFS  DN   HBase   Local  Direct  Reads   ©2012  Cloudera,  Inc.  All  Rights  Reserved.  
  • 11. Cloudera  Impala  Details   Common  Hive  SQL  and  interface   SQL  App   Hive   State   Metastore   YARN   HDFS  NN   Store   ODBC   SQL  Request   Query  Planner   Query  Planner   Query  Planner   Query  Coordinator   Query  Coordinator   Query  Coordinator   Query  Exec  Engine   Query  Exec  Engine   Query  Exec  Engine   HDFS  DN   HBase   HDFS  DN   HBase   HDFS  DN   HBase   ©2012  Cloudera,  Inc.  All  Rights  Reserved.  
  • 12. Cloudera  Impala  Details   Unified  metadata  and  scheduler   SQL  App   Hive   State   Metastore   YARN   HDFS  NN   Store   ODBC   Query  Planner   Query  Planner   Query  Planner   Query  Coordinator   Query  Coordinator   Query  Coordinator   Query  Exec  Engine   Query  Exec  Engine   Query  Exec  Engine   HDFS  DN   HBase   HDFS  DN   HBase   HDFS  DN   HBase   ©2012  Cloudera,  Inc.  All  Rights  Reserved.  
  • 13. Cloudera  Impala  Details   SQL  App   Hive   State   Metastore   YARN   HDFS  NN   Store   ODBC   Query  Planner   Query  Planner   Fully  MPP   Query  Planner   Query  Coordinator   Query  Coordinator   Distributed   Query  Coordinator   Query  Exec  Engine   Query  Exec  Engine   Query  Exec  Engine   HDFS  DN   HBase   HDFS  DN   HBase   HDFS  DN   HBase   ©2012  Cloudera,  Inc.  All  Rights  Reserved.  
  • 14. Cloudera  Impala  Details   SQL  App   Hive   State   Metastore   YARN   HDFS  NN   Store   ODBC   Query  Planner   Query  Planner   Query  Planner   Query  Coordinator   Query  Coordinator   Query  Coordinator   Query  Exec  Engine   Query  Exec  Engine   Query  Exec  Engine   HDFS  DN   HBase   HDFS  DN   HBase   HDFS  DN   HBase   Local  Direct  Reads   ©2012  Cloudera,  Inc.  All  Rights  Reserved.  
  • 15. Cloudera  Impala  Details   SQL  App   Hive   State   Metastore   YARN   HDFS  NN   Store   ODBC   SQL  Results   Query  Planner   Query  Planner   In  Memory   Query  Planner   Query  Coordinator   Query  Coordinator   Transfers   Query  Coordinator   Query  Exec  Engine   Query  Exec  Engine   Query  Exec  Engine   HDFS  DN   HBase   HDFS  DN   HBase   HDFS  DN   HBase   ©2012  Cloudera,  Inc.  All  Rights  Reserved.  
  • 16. Impala  and  Hive   •  Shared  with  Hive:   –  Metadata  (table  defini/ons)   –  ODBC  driver   –  Hue  Beeswax   –  SQL  syntax  (HiveQL)   –  Flexible  file  formats   –  Machine  pool   •  Improvements:   –  Purpose-­‐built  query  engine  direct  on  HDFS  and  HBase   –  No  JVM  and  MapReduce   –  In-­‐memory  data  transfers   –  Low-­‐latency  scheduler   –  Na/ve  distributed  rela/onal  query  engine   –  Trevni  columnar  format  (arer  v0.1)   ©2012  Cloudera,  Inc.  All  Rights  Reserved.  
  • 17. Advantages  of  Our  Approach   •  No  high-­‐latency  MapReduce  batch  processing   •  Local  processing  avoids  network  botlenecks   •  No  costly  data  format  conversion  overhead   •  All  data  immediately  query-­‐able   •  Single  machine  pool  to  scale   •  All  machines  available  to  both  Impala  and  MapReduce   •  Single,  open,  and  unified  metadata  and  scheduler   MapReduce   Remote  Query   Side  Storage   Query   Query   Query   Query   Node   Node   Node   Node   Query   MR   Hive   Engine   MR   OR   MR   DN   NN   DN   HDFS   DN   DN   DN   ©2012  Cloudera,  Inc.  All  Rights  Reserved.  
  • 18. Google  Dremel  and  Impala   •  What  is  Dremel:   –  Columnar  storage  for  data  with  nested  structures   –  Distributed  scalable  aggrega/on  on  top  of  that   •  Columnar  storage  in  Hadoop:  Trevni   –  New  columnar  format  created  by  Doug  Cuung   –  Stores  data  in  appropriate  na/ve/binary  types   –  Will  also  store  nested  structures  similar  to  Dremel's  ColumnIO   •  Distributed  aggrega/on:  Impala   •  Impala  plus  Trevni:  a  superset  of  the  published  version  of  Dremel  (which  didn't   support  joins)   ©2012  Cloudera,  Inc.  All  Rights  Reserved.  
  • 19. Benefits  of  Cloudera  Impala   Real-­‐Time  Query  for  Data  Stored  in  Hadoop   • Get  answers  as  fast  as  you  can  ask  ques/ons   • Interac/ve  analy/cs  directly  on  source  data   • No  jumping  between  data  silos   • Reduce  duplicate  storage  with  EDW   • Reduce  data  movement  for  interac/ve  analysis   • Leverage  exis/ng  tools  and  employee  skills   • Ask  ques/ons  of  all  your  data   • No  informa/on  loss  from  aggrega/on  or   conforming  to    rela/onal  schemas  for  analysis   • Single  metadata  store  from  origina/on    through  analysis   • No  need  to  hunt  through  mul/ple  data  silos   ©2012  Cloudera,  Inc.  All  Rights  Reserved.  
  • 20. Validated  Beta  Partners   ©2012  Cloudera,  Inc.  All  Rights  Reserved.