• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
Benchmarking
 

Benchmarking

on

  • 3,730 views

Thoughts on improving Hadoop benchmarking

Thoughts on improving Hadoop benchmarking

Statistics

Views

Total Views
3,730
Views on SlideShare
3,697
Embed Views
33

Actions

Likes
1
Downloads
80
Comments
0

5 Embeds 33

http://www.slideshare.net 12
http://www.1060.org 10
http://1060.org 8
http://www.linkedin.com 2
http://74.125.93.132 1

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    Benchmarking Benchmarking Presentation Transcript

    • Benchmarking Steve Loughran Julio Guijarro © 2009 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice
    • Benchmarks
    • Some Problems •  Estimating Hadoop performance of hardware •  Estimating Hadoop performance of a cluster •  Designing Hadoop-ready servers •  Designing Hadoop-ready clusters •  Optimising the network for Hadoop •  Optimising Hadoop/HDFS for specific applications
    • Recent customer request "They want data for Hadoop Sort for 100GB."
    • Terasort: what else? •  PageRank: CPU intensive, small (static) input dataset •  Something that stresses RAM and CPU •  Something that seeks in the files?
    • Test Datasets •  Wikipedia: 5-10 TB of XML data with changes; user relationships have to be inferred •  SpamAssassin: 70+ GB of SPAM •  Physics? Something Small?
    • Network Measurement What to add to Hadoop/Avro/Thrift to monitor network traffic -and relate to specific jobs?
    • Predicting performance Can an MR job on small datasets predict performance on full size datasets? What extra instrumentation can help?
    • Hardware Q What should a Hadoop-ready server look like? What about a rack? Or a container?