Justin Erickson, Cloudera_Hadoop&SQL


  1. The Platform for Big Data: It’s Not Just About SQL on Hadoop
     Diagram (platform layers):
     • Engines: Batch Processing (MAPREDUCE, HIVE & PIG …), Interactive SQL (IMPALA), Interactive Search (Solr), Interactive Analytics (SAS, R, …)
     • Storage: HDFS (TEXT, RCFILE, PARQUET, AVRO…), HBase (RECORDS)
     • Integration, Resource Management, Metadata
     • Management | Support
     Key points:
     • Single platform for processing ML, SQL, Search, SAS, R, …
     • Scales to ‘000s of servers
     • No upfront schema
     • 10% the cost per TB
     • Open source platform
     ©2013 Cloudera, Inc. All Rights Reserved.
  2. Impala Today
     • Interactive SQL
       • Typically 4-65x faster than the latest Hive (observed 100x faster)
       • Responses in seconds instead of minutes (sometimes sub-second)
     • ANSI-92 standard SQL queries with HiveQL
       • Compatible SQL interface for existing Hadoop/CDH applications
       • Industry standard SQL
     • Natively on Hadoop/HBase storage and metadata
       • Flexibility, scale, and cost advantages of Hadoop
       • No duplication/synchronization of data and metadata
       • Local processing to avoid network bottlenecks
     • Separate runtime from batch Hive, Pig, or MapReduce
       • Hive is designed for, and great at, batch processing
       • Impala is purpose-built for low-latency SQL queries on Hadoop
  3. Impala’s Benefits Today
     • Unlocks BI/analytics on Hadoop
       • Interactive SQL in seconds/milliseconds
       • Highly concurrent to handle 100s and 1000s of users
     • Native Hadoop flexibility
       • No data migration, conversion, or duplication required
       • Query across existing Hadoop data
       • Run multiple frameworks on the same data at the same time
       • Supports Parquet for best-of-breed columnar performance
     • Native MPP query engine designed into Hadoop:
       • Unified Hadoop storage
       • Unified Hadoop metadata (uses Hive and HCatalog)
       • Unified Hadoop security
       • Fine-grained role-based access controls with Sentry
     • Apache-licensed open source
     • Deployed and proven across many customers today
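
The slides credit Parquet with columnar performance without saying why a columnar layout helps. The sketch below is one hedged illustration in plain Python: an aggregate over a single column only has to visit that column's values, while a row layout reads every record in full. The sample data and the helper names (`to_columns`, `sum_row_layout`, `sum_column_layout`) are hypothetical and are not part of Impala or Parquet; the real Parquet format also adds encoding, compression, and row groups that are not modeled here.

```python
# Toy model of row versus column layout. This is not Parquet's on-disk format.
rows = [
    {"user_id": 1, "country": "US", "revenue": 10.0},
    {"user_id": 2, "country": "DE", "revenue": 7.5},
    {"user_id": 3, "country": "US", "revenue": 2.5},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_row_layout(records, column):
    """Sum one field while modeling a row store that reads whole records."""
    fields_touched = 0
    total = 0.0
    for record in records:
        fields_touched += len(record)  # the full record counts as read
        total += record[column]
    return total, fields_touched


def sum_column_layout(columns, column):
    """Sum one field while reading only that column's values."""
    values = columns[column]
    return sum(values), len(values)


columns = to_columns(rows)
row_total, row_fields = sum_row_layout(rows, "revenue")
col_total, col_fields = sum_column_layout(columns, "revenue")
print(row_total, row_fields)
print(col_total, col_fields)
```

Both layouts return the same total; the difference is in how many values are read to get there, and that gap widens as tables gain more columns.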