Kenshoo - Use Hadoop, One Week, No Coding


Noam Hasson, team leader for Big Data at Kenshoo, explains what Kenshoo does and how it leverages Hadoop to solve its Big Data challenges.

Statistics

Views

Total Views
734
Views on SlideShare
198
Embed Views
536

Actions

Likes
0
Downloads
0
Comments
0

Embeds (4 sites, 536 views)

http://www.bigdataeverywhere.com 495
http://bigdataeverywhere.com 35
http://hadooponedev.devcloud.acquia-sites.com 5
http://www.slideee.com 1


Upload Details

Uploaded as Microsoft PowerPoint

Usage Rights

© All Rights Reserved


Kenshoo - Use Hadoop, One Week, No Coding: Presentation Transcript

  • Use Hadoop, One Week, No Coding
  • About Me (© 2014 Kenshoo, Inc. Confidential and Proprietary Information): Noam Hasson, Team Leader for Big Data • 12 years experience in web development • Hadoop enthusiast since 2011 • noam.hasson@kenshoo.com
  • What is Kenshoo?
  • Agenda • Benefits of Hadoop: • Solve big data challenges • See actual results in less than a week • Use current RDBMS infrastructure • No coding experience necessary • See how with an actual case study
  • Kenshoo Data Infrastructure • 250 MySQL & application server pairs, each pair dedicated to a single client • Identical schema on all MySQL servers • Different table size & capacities on each MySQL server • Total size of more than 100 TB X 250…
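Because every MySQL server carries the identical schema, the import described later can be fanned out with a simple loop. A minimal sketch, assuming hypothetical host names (`db001`…), a hypothetical `campaigns` table, and a hypothetical `analytics` Hive database, none of which appear in the slides:

```shell
#!/bin/sh
# Sketch: generate one Sqoop import command per MySQL server.
# Each server's data lands in its own Hive table (suffixed with the
# host name) so cross-server queries can later UNION them.
gen_import_cmd() {
  host="$1"; table="$2"
  echo "sqoop import --hive-import" \
       "--connect jdbc:mysql://${host}/Database" \
       "--table ${table}" \
       "--hive-database analytics" \
       "--hive-table ${table}_${host}" \
       "-m 10 -z"
}

for host in db001 db002 db003; do   # ...up to the 250 pairs
  gen_import_cmd "$host" campaigns  # prints the command; pipe to sh to run
done
```

The script only prints the commands, which keeps it easy to review (or feed to a scheduler) before anything touches the production databases.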
  • The Challenge • Avoid load-intensive queries on production clusters • Reduce run-time for data analysis queries • Run cross-server queries for clients deployed on more than one DB server • Compare statistical information across several DB servers
  • The Solution • Use Sqoop to migrate tables to Hive • Run SQL queries on Hive • That’s it! (Diagram: Sqoop full-table import into Hadoop)
  • Example Code • One-line import:
      sqoop import --hive-import --connect jdbc:mysql://ServerAddress/Database -m 10 --table DBTable --hive-database DBTable --username dbUsername --password dbPassword --hive-overwrite -z
    • Query Hive:
      use myDatabase;
      select * from myTable where field = 'value';
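The one-line import above can be wrapped as a reusable function so the same command serves many tables. A sketch only: `ServerAddress`, `Database`, `DBTable`, and the credentials are the slide's placeholders, not real values, and the mapper count is made a parameter so it can be tuned per table:

```shell
#!/bin/sh
# Sketch: parameterized version of the slide's one-line Sqoop import.
# All connection details below are the slide's placeholder values.
build_import() {
  table="$1"; mappers="${2:-10}"   # default to the slide's -m 10
  echo "sqoop import --hive-import" \
       "--connect jdbc:mysql://ServerAddress/Database" \
       "-m ${mappers} --table ${table}" \
       "--hive-database DBTable" \
       "--username dbUsername --password dbPassword" \
       "--hive-overwrite -z"
}

build_import DBTable      # prints the command; pipe to sh to run
build_import DBTable 20   # more mappers for a larger table
```

Printing rather than executing keeps the sketch safe to run anywhere; in practice the output would be piped to `sh` or handed to a job scheduler.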
  • Performance • Single-node hardware: • 12 hard drives, 4 TB each • 12 cores, 2 CPU sockets • 32 GB memory • Performance: • Importing a table of 300M rows took 3 hours • Select count on 5.5 billion rows took 90 minutes • Group By on 5.5 billion rows, with 1.1 billion rows, took 18 hours
  • What We’ve Learned • Use partitions • Import directly to compressed files • Compare the row count in the source and destination tables • Import only the columns you need to query • Use a full table import for easy & quick results • Refine the number of map tasks used for import • Adopt Hive for your Map-Reduce jobs • Increase query speed by avoiding “order by”
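The "compare the row count" check from the list above can be scripted. A sketch of the comparison logic only: the real counts would come from `mysql -N -e` and `hive -e` calls against live clusters, and the table and database names shown are assumptions:

```shell
#!/bin/sh
# Sketch: verify a Sqoop import by comparing source and destination
# row counts. Only the query construction and comparison are shown;
# the commented usage below assumes live MySQL and Hive endpoints.
count_query() {
  echo "SELECT COUNT(*) FROM $1"
}

counts_match() {
  # $1 = source row count, $2 = destination row count
  [ "$1" -eq "$2" ]
}

# Usage (against real clusters):
#   src=$(mysql -N -e "$(count_query myTable)" Database)
#   dst=$(hive -S -e "USE myDatabase; $(count_query myTable);")
#   counts_match "$src" "$dst" || echo "row count mismatch" >&2
```

Running this after each import catches partial or failed transfers before any analysis query consumes the Hive copy.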