A Perfect Hive Query for a Perfect Meeting

Published in: Technology, Business

  1. Adam Kawa
  2. A deal was made!
  3. Martin will invite Adam and Timbuktu, my favourite Swedish artist, for a beer, a coke, or whatever else to drink * by Martin
  4. -
  5. Question
  6. Question Answers
  7. Data will tell the truth!
  8. - - - - -
  9. Why? by Adam
  10. - - - - - -
  11. Introduction
  12. … … … … … … … … … …
  13. … … … …
  14. -
  15. ✓ ✓ ✓ ✗ ✗
  16. HiveQL
  17. A line where I may have a bug?!
  18. HiveQL
  19. Verbose and complex Java code
  20. -
  21. - -
  22. -
  23. - -
  24. - - -
  25. - - - -
  26. - - - - -
  27. For Each Line -
  28. For Each Line - -
  29. track.txt
  30. user.txt track.txt
  31. stream.txt user.txt track.txt
  32. expected.txt stream.txt
  33. … …
  34. … … …
  35. … … …
  36. Bee test Be happy!
  37. HiveQL
  38. - - -
  39. ✗ - ✗ -
  40. … … … ✗
  41. - - -
  42. Threshold
  43. ✓ - Threshold
  44. ✗ Threshold
  45. ✗ Try and see -
  46. - ?
  47. HiveQL
  48. -
  49. - - - -
  50. 2 MapReduce jobs in total
  51. Runs many Map joins in a Map-Only job [HIVE-3784]
  52. - - -
  53. - - - -
  54. - - - - - - -
  55. - - - - - - -
  56. - - -
  57. Runs as a single MR job [HIVE-3952]
  58. 2 MapReduce jobs in total
  59. HiveQL
  60. ✗ ✓ - -
  61. ✓ ✗ -
  62. -
  63. - My query generates a small amount of intermediate data -
  64. ✓ ✗
  65. -
  66. - - - - - -
  67. - - - - - - - -
  68. 2 months of data: 50 min 2 sec. 10th place?
  69. Changes are needed!
  70. File Format
  71. - -
  72. ✓ - -
  73. ✗ ✓ -
  74. 16x
  75. 3.5x
  76. 32x
  77. -
  78. Computation
  79. - - -
  80. 1.4x 2.4x
  81. 8x
  82. ✓ - - -
  83. ✓ - -
  84. ✓ - -
  85. Time
  86. The more congested the queue/cluster, the bigger the benefit of reusing (Time)
  87. No scheduling overhead to run a new Reduce task (Time)
  88. (Time) Thinner tasks help avoid stragglers
  89. Finished within 1.5 sec. Warm!
  90. - - - -
  91. -
  92. ✓ ✓
  93. - - - -
  94. Feature
  95. ✓ - ✓ ✓ - ✓
  96. 1.4x
  97. ✗ ✓ ✓ ✗ ✓
  98. - -
  99. Feature
  100. ✓ ✓ - - - ✓
  101. 14 months of data: 10 min 11 sec?
  102. Results
  103. That’s all!
  104. - - - - -
  105. - - -