Efficient In-situ Processing of Various Storage Types on Apache Tajo

Published in: Data & Analytics


  1. Efficient In-situ Processing of Various Storage Types on Apache Tajo. Hadoop Summit 2015, San Jose. Hyunsik Choi, Gruter Inc.
  2. Agenda
     • Tajo Overview
     • Various Storage Support
     • Motivation
     • Design Consideration
     • What we did/are doing
  3. An overview of Apache Tajo
  4. Tajo: A Data Warehouse System
     • Data Warehouse System
       • Apache Top-level project
       • Low latency and long running batch queries in a single system
       • ~100 ms up to several hours
       • Fault tolerance
     • Features
       • ANSI SQL compliance
       • Mature SQL features: Joins, Group by, Sort, Multiple distinct aggregations and Window function
       • Partitioned table support
       • Java/Python UDF support
       • JDBC driver and Java-based asynchronous API
       • SQL data type and Nested type support
       • Direct JSON support
  5. Master
