PLAZMA TD Tech Talk 2018 at Shibuya: Hive2 as a new td hadoop core engine


  1. PLAZMA TD Tech Talk 2018 at Shibuya: Hive2 as a new TD Hadoop core Engine
     Ryu Kobayashi, Oct 17 2018
     Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
  2. Agenda
  3. Agenda
     - PTD Hive
     - Our storage is PlazmaDB
     - Default support Vectorization
     - Test
     - Next plan
  4. Ryu Kobayashi
     Software engineer on the Hadoop team
     • Background: Backend team -> Hadoop team -> MPP (Presto) team -> Hadoop team
     • Hadoop usage history: about 10 years
  5. PTD Hive
  6. PTD Hive
     • PTD = Patch set by Treasure Data
     • Our Hadoop and Hive history
       – CDH3 -> CDH4 -> HDP2 -> Apache Hadoop and Hive
     • Why did we drop the distributions?
       – We fix bugs ourselves
         ▪ But the fixes are not taken in quickly (Hive), e.g. HIVE-11353
       – A distribution depends on specific versions
         ▪ The scope of testing becomes wider
  7. PTD Hive
     • The PTD project started in 2015
       – Version at that time: Hive 2.1.0
       – Currently supported version: Hive 2.3.2
     • Why only from 2.1.0 to 2.3.2 between 2015 and 2018?
       – See the self-introduction (slide 4)
       – So the project restarted in 2018
     • We have fixed many bugs in 2.3.2 as well
  8. PTD Hive
     • We also apply internal patches on top of this:
       – INSERT INTO/OVERWRITE
         ▪ Why?
           – Our storage is PlazmaDB
           – The storage is not HDFS
           – So output must be written to PlazmaDB
     • Bugs of our own making may occur
       – Investigation is hard
         ▪ Is the cause our patch or Hive itself?
  9. Our storage is PlazmaDB
  10. PlazmaDB
     • What is PlazmaDB?
       – Columnar compression storage
     • PlazmaDB's contents
       – plazmadb
       – plazmadb-mpcfile
         ▪ What is mpcfile?
           – A proprietary format that compresses MessagePack data
  11. PlazmaDB
     • We do not use HDFS for storage (but we do use it for intermediate files)
       – Advantage: the Hadoop version is easy to upgrade
     • The internal PlazmaDB library is upgraded starting with Hive2
       – Old:
         ▪ plazmadb
         ▪ plazmadb-mpcfile
         ▪ td-storage
         ▪ msgpack (0.6)
       – New:
         ▪ plazmadb
         ▪ partition-manager
         ▪ msgpack (0.8)
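As a rough illustration of what "columnar compression storage" over MessagePack can mean, the sketch below pivots rows into columns, encodes each column with a tiny hand-written subset of MessagePack, and compresses it. This is not the mpcfile format, which is proprietary and not described in the slides. The helper names, the use of zlib, and the one-blob-per-column layout are all assumptions made for the example.

```python
import zlib


def pack(value):
    """Encode a very small subset of MessagePack (nil, fixint, fixstr, fixarray)."""
    if value is None:
        return b"\xc0"  # nil
    if isinstance(value, int) and not isinstance(value, bool) and 0 <= value <= 127:
        return bytes([value])  # positive fixint
    if isinstance(value, str):
        raw = value.encode("utf-8")
        if len(raw) <= 31:
            return bytes([0xA0 | len(raw)]) + raw  # fixstr
    if isinstance(value, list) and len(value) <= 15:
        # fixarray header followed by each encoded element
        return bytes([0x90 | len(value)]) + b"".join(pack(v) for v in value)
    raise ValueError(f"value not supported by this sketch: {value!r}")


def to_columns(rows, names):
    """Pivot a list of row dicts into one list of values per column."""
    return {name: [row.get(name) for row in rows] for name in names}


def compress_columns(rows, names):
    """Hypothetical layout: one zlib-compressed MessagePack array per column."""
    columns = to_columns(rows, names)
    return {name: zlib.compress(pack(values)) for name, values in columns.items()}


rows = [
    {"user": "a", "count": 1},
    {"user": "b", "count": 2},
    {"user": "a", "count": 3},
]
blobs = compress_columns(rows, ["user", "count"])
```

Storing each column separately means a query that reads only `count` never has to decompress `user`, which is the usual motivation for a columnar layout.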
  12. Default support Vectorization
  13. Default support Vectorization
     • Currently our Hive 0.13 does not support vectorization
       – Because it has many bugs
     • Since those bugs have been fixed in Hive2, vectorization is supported by default
       – There are still some problems internally
         ▪ Schema type problem: READ and WRITE
  14. Default support Vectorization
     • Performance?
       – About 2 times faster than our legacy Hive
         ▪ Vectorization
         ▪ New storage library
     • The remaining challenges
       – Vectorization support for our UDFs
         ▪ Mainly time-related ones
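For readers unfamiliar with the term, vectorized execution means that operators work on batches of column values (1024 rows per batch is Hive's default) instead of one row at a time. The sketch below shows only that data flow in plain Python. The function names are hypothetical, and pure Python will not reproduce the speedup reported above.

```python
BATCH_SIZE = 1024  # Hive's default vectorized row batch size


def batches(columns, batch_size=BATCH_SIZE):
    """Split column lists of equal length into column batches."""
    row_count = len(next(iter(columns.values())))
    for start in range(0, row_count, batch_size):
        yield {name: values[start:start + batch_size]
               for name, values in columns.items()}


def filter_greater_than(batch, column, threshold):
    """Return a selection vector: positions in the batch that pass the filter."""
    values = batch[column]
    return [i for i, v in enumerate(values) if v is not None and v > threshold]


def sum_selected(batch, column, selected):
    """Aggregate one column over the selected positions only."""
    values = batch[column]
    return sum(values[i] for i in selected)


def vectorized_sum_where(columns, filter_column, threshold, sum_column):
    """Roughly SELECT SUM(sum_column) WHERE filter_column > threshold, batch by batch."""
    total = 0
    for batch in batches(columns):
        selected = filter_greater_than(batch, filter_column, threshold)
        total += sum_selected(batch, sum_column, selected)
    return total


columns = {"a": list(range(3000)), "b": [1] * 3000}
result = vectorized_sum_where(columns, "a", 999, "b")
```

Each operator touches one column of a batch in a tight loop and passes a selection vector downstream, rather than copying rows. A UDF has to be written against this batch interface to take part, which is one reason UDF support can lag behind.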
  15. Test
  16. Test
     • How do we test?
       – system-test
         ▪ Scheduled run
           – Hive 0.13 and Hive2
       – elephant-testing
         ▪ Scheduled run
           – Registers queries that have been problematic so far
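Running the same queries on Hive 0.13 and Hive2 implies comparing two result sets. The slides do not show how system-test does this, so the following is only a minimal sketch of one possible comparison. The function names and the difference categories (`null`, `decimal-format`, `value`) are assumptions chosen for illustration.

```python
from decimal import Decimal, InvalidOperation


def classify(old, new):
    """Return None when two cells match, otherwise a label for the difference."""
    if old == new:
        return None
    if old is None or new is None:
        return "null"  # one engine returned NULL, the other did not
    try:
        # Same number, different rendering, e.g. "1.0" vs "1"
        if Decimal(str(old)) == Decimal(str(new)):
            return "decimal-format"
    except InvalidOperation:
        pass  # at least one cell is not numeric
    return "value"


def compare_results(old_rows, new_rows):
    """Compare rows positionally and list (kind, row, column, old, new) differences."""
    diffs = []
    if len(old_rows) != len(new_rows):
        diffs.append(("row-count", None, None, len(old_rows), len(new_rows)))
    for r, (old_row, new_row) in enumerate(zip(old_rows, new_rows)):
        for c, (old, new) in enumerate(zip(old_row, new_row)):
            kind = classify(old, new)
            if kind:
                diffs.append((kind, r, c, old, new))
    return diffs


legacy = [("a", "1.0", None), ("b", "2.50", "x")]
hive2 = [("a", "1", ""), ("b", "2.5", "y")]
diffs = compare_results(legacy, hive2)
```

Labelling each difference makes it easier to separate formatting changes from real regressions. The comparison assumes both engines return rows in the same order, so queries without `ORDER BY` would need their results sorted first.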
  17. Test
     • What kind of problems happened?
       – The results differ
         ▪ Schema type problem
           – Null
           – Decimal point
         ▪ This also affects INSERT INTO/OVERWRITE
       – Specific UDFs do not work
         ▪ Compatibility between the jars used by Hive and the jars used by us
       – Cross join is not supported by default
         ▪ Because of the hive.strict.checks.cartesian.product property
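For the cross join case, the property named above is a session-level Hive setting, so one way to run an affected query is to prefix it with a `SET` statement. The helper below is a hypothetical sketch of that idea, and the table names `a` and `b` are placeholders. The slides do not say whether this setting was actually changed in production.

```python
def with_session_settings(query, settings):
    """Prefix a HiveQL query with one SET statement per setting."""
    lines = [f"SET {key}={value};" for key, value in settings.items()]
    # Normalize the query so it ends with exactly one semicolon
    lines.append(query.rstrip().rstrip(";") + ";")
    return "\n".join(lines)


# Assumption: turning the strict check off is what lets the cross join through.
CROSS_JOIN_SETTINGS = {"hive.strict.checks.cartesian.product": "false"}

script = with_session_settings("SELECT * FROM a CROSS JOIN b", CROSS_JOIN_SETTINGS)
```

Applying the setting per query keeps the strict check in place for everything else, so accidental cartesian products are still rejected.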
  18. Next plan
  19. Next plan
     • Alpha release next month
     • Beta and stable releases next year
     • Our new PlazmaDB
       – CBO support
     • Tez support
       – Last time was 2015...
         ▪ 0.8.4 -> 0.9 (currently 0.9.1)
     • Hive3 support
  20. Thank You! Danke! Merci! 谢谢! Gracias! Kiitos!
