Kenshoo - Use Hadoop, One Week, No Coding

Noam Hasson, team leader for Big Data at Kenshoo, explains what Kenshoo does and how it leverages Hadoop to solve its big data challenges.


  1. Use Hadoop, One Week, No Coding
  2. About Me: Noam Hasson, Team Leader for Big Data • 12 years' experience in web development • Hadoop enthusiast since 2011 • noam.hasson@kenshoo.com
  3. What is Kenshoo?
  4. Agenda • The benefits of Hadoop: solve big data challenges • See actual results in less than a week • Use your current RDBMS infrastructure • No coding experience necessary • See how with an actual case study
  5. Kenshoo Data Infrastructure • 250 MySQL & application server pairs, each pair dedicated to a single client • Identical schema on all MySQL servers • Different table sizes & capacities on each MySQL server • Total size of more than 100 TB × 250…
  6. The Challenge • Avoid load-intensive queries on production clusters • Reduce the run time of data-analysis queries • Run cross-server queries for clients deployed on more than one DB server • Compare statistical information across several DB servers
  7. The Solution • Use Sqoop to import full tables into Hive • Run SQL queries on Hive • That's it! (Slide diagram: Sqoop full-table import into Hadoop)
  8. Example Code • One-line import (a fleet-wide variant is sketched below):

     sqoop import --hive-import --connect jdbc:mysql://ServerAddress/Database -m 10 --table DBTable --hive-database DBTable --username dbUsername --password dbPassword --hive-overwrite -z

     • Query Hive:

     use myDatabase;
     select * from myTable where field = 'value';
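     Given the 250 identical MySQL servers described on slide 5, the same import extends naturally to the whole fleet with a shell loop. A minimal sketch, not Kenshoo's actual script: servers.txt, the analytics Hive database, the per-server table naming, and the password-file path are all illustrative assumptions.

     #!/bin/bash
     # Sketch: run the same full-table import against every MySQL server,
     # landing one Hive table per source server. Names are illustrative.
     while read -r SERVER; do
       # e.g. db01.example.com -> Hive table DBTable_db01_example_com
       HIVE_TABLE="DBTable_${SERVER//./_}"
       sqoop import \
         --hive-import \
         --connect "jdbc:mysql://${SERVER}/Database" \
         --table DBTable \
         --hive-database analytics \
         --hive-table "$HIVE_TABLE" \
         --username dbUsername \
         --password-file /user/etl/db.password \
         --hive-overwrite \
         -z \
         -m 10
     done < servers.txt

     Using --password-file (a standard Sqoop option) keeps the password off the command line, and landing each server in its own Hive table makes the cross-server comparisons from slide 6 a matter of union queries.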
  9. Performance • Single-node hardware: • 12 hard drives, 4 TB each • 12 cores across 2 CPU sockets • 32 GB memory • Results: • Importing a table of 300M rows took 3 hours • A select count over 5.5 billion rows took 90 minutes • A group-by over 5.5 billion rows, returning 1.1 billion rows, took 18 hours (both timed queries are sketched below)
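     For context, the two timed queries are plain HiveQL. A minimal sketch with placeholder names — myTable, client_id, and clicks are assumptions, not the actual schema:

     use myDatabase;

     -- full-table count: the 90-minute benchmark
     select count(*) from myTable;

     -- group-by returning a large result set: the 18-hour benchmark
     select client_id, sum(clicks) from myTable group by client_id;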
  10. What We've Learned • Use partitions • Import directly to compressed files • Compare the row count in the source and destination tables • Import only the columns you need to query • Use a full-table import for easy & quick results • Tune the number of map tasks used for import • Adopt Hive for your Map-Reduce jobs • Increase query speed by avoiding "order by" (a few of these tips are sketched below)
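     A minimal sketch of a few of these tips in practice; every name here (the column list, the events table, the event_date partition, and the map-task count) is an illustrative assumption:

     # Import only the columns you need, compressed (-z), with a tuned
     # number of map tasks (-m); --columns is a standard Sqoop option.
     sqoop import --hive-import \
       --connect jdbc:mysql://ServerAddress/Database \
       --table DBTable \
       --columns "id,client_id,clicks" \
       --username dbUsername --password dbPassword \
       --hive-overwrite -z -m 20

     -- Hive side: a partitioned table lets queries skip whole
     -- directories at read time (names are illustrative):
     create table events (client_id bigint, clicks bigint)
     partitioned by (event_date string);

     -- "sort by" orders within each reducer and avoids the
     -- single-reducer bottleneck of a global "order by":
     select client_id, clicks from events sort by clicks desc;

     Validating an import comes down to a count on each side: select count(*) from DBTable in MySQL versus the same count in Hive.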
