Tomer Shiran, MapR_Hadoop&SQL


  1. SQL-in-Hadoop
     • SQL is hot again!
       – Apache Hive (+ Stinger/Tez)
       – Apache Drill
       – Shark/Spark
       – Impala
       – Phoenix
       – Greenplum (HAWQ)
       – Cascading Lingual
       – Hadapt
       – Splice Machine
     • MapR provides the broadest SQL support
       – Apache Hive 0.11: GA
       – Impala on MapR: Private beta (25-50% faster)
       – Apache Drill 1.0: Alpha this month
     • Hadoop BI tools can do a lot more than SQL queries
  2. Why Apache Drill?
     • Community-driven project
       – SQL is an application interface
       – Users don’t want vendor lock-in
     • Next-generation SQL-in-Hadoop
       – Full ANSI SQL:2003
       – Schema is optional
       – Nested data: JSON, Protobuf, …
       – Highly extensible
       – YARN integration
     • Who’s contributing?
       – MapR, Pentaho, Oracle, VMware, Microsoft, Thoughtworks, UT Austin, UW Madison, RJMetrics, XingCloud
     • Lines of code: > 100K
  3. It’s Not Just About Queries…
     • Real-time data loading so you don’t query stale data
       – HDFS was not designed for these workloads
     • Common storage and resource mgmt for all Big Data applications
       – Enterprise-grade: HA, DP (snapshots), DR (mirrors)
       – Multi-tenancy
       – Read/write access (POSIX)
     • Diagram labels (MapR Distribution for Apache Hadoop):
       – MapR Distributed Data System (MDDS)
       – YARN
       – Batch (MapReduce), SQL (Drill, Tez, Impala), Search (Solr, Elasticsearch), Streaming (Storm)
       – File-based (POSIX), Table-based (MDDS, HBase)
