Bitfusion TensorFlow Meetup 10/18/2016

TensorFlow performance comparison of the new AWS P2 (K80-based) GPU instances vs. the previous-generation G2 (K520-based) GPU instances.

Published in: Software

  1. Software Defined Supercomputing: TensorFlow Meetup, 10/18/16
  2. Logistics
       • Most Important:
           • Food: in the back
           • Beverages: in the back
           • Restrooms: take a left out the door; restrooms will be on the right
           • Trashcans: outside the room
  3. AWS TensorFlow Performance
       • The new AWS P2 instances feature Nvidia K80 GPUs (1, 8, and 16 GPU configurations; max of 8 physical cards)
       • How do the P2s compare performance-wise to the older G2 instances with K520 GPUs?
       • How do the P2s compare in terms of performance/$ vs. the G2s?
       • Used the convnet-benchmarks: https://github.com/soumith/convnet-benchmarks
  4. AWS TensorFlow Performance

       G2 (K520)
       Network     Batch Size   Forward (ms)   Backward (ms)   Total (ms)
       AlexNet     128          135            258             393
       Overfeat    16           151            291             442
       VGG         16           303            711             1014
       GoogleNet   16           97             249             346

       P2 (K80)
       Network     Batch Size   Forward (ms)   Backward (ms)   Total (ms)
       AlexNet     128          69             133             202
       Overfeat    16           68             133             201
       VGG         16           154            316             470
       GoogleNet   16           55             121             176

       On average, the P2 instance is ~2x faster than the previous-generation G2 instances.
  5. AWS TensorFlow Performance/$

       Instance      $/hr   Relative performance
       g2.2xlarge    0.65   1
       p2.xlarge     0.90   2

       The p2.xlarge costs 1.38x as much per hour as the g2.2xlarge, but its throughput is approximately 2x. The $/hr per unit of performance for the P2 is $0.45 (0.90 / 2), versus $0.65 for the G2, making the P2 about 31% cheaper per unit of work. This is probably a lower bound, as the batch sizes used for most networks/models were small. For large networks/models the performance factor starts trending towards 3.
  6. AWS TensorFlow: G2 and P2 Limitations

       G2 limitations: no Peer2Peer access, and memory size.

       Network     Max Batch Size (no warn)   Max Batch Size (with warn)
       AlexNet     256                        512
       Overfeat    32                         256
       VGG         32                         64
       GoogleNet   64                         128

       P2 limitations: too much Peer2Peer access on the p2.16xlarge. TensorFlow will fail even if you try to use only a single GPU. Disable Peer2Peer or use CUDA_VISIBLE_DEVICES to mask GPUs if you must use that instance.
  7. Bitfusion AMIs
       • Try some of our Bitfusion AMIs: http://www.bitfusion.io/boost-machine-images/
       • Boost and more…
  8. Thank You
       • Maciej Bajkowski, COO
       • maciej@bitfusion.io
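
The "~2x faster" and "about 31%" figures on slides 4 and 5 can be re-derived from the numbers shown there. The sketch below is not part of the original deck: the dictionary and function names are made up for illustration, the timings and prices are copied from the slides, and the relative performance of 2 is the rounded value slide 5 uses.

```python
# Total (forward + backward) times in ms, copied from the tables on slide 4.
g2_total_ms = {"AlexNet": 393, "Overfeat": 442, "VGG": 1014, "GoogleNet": 346}
p2_total_ms = {"AlexNet": 202, "Overfeat": 201, "VGG": 470, "GoogleNet": 176}

# Hourly prices in $/hr, copied from slide 5.
G2_PRICE = 0.65  # g2.2xlarge
P2_PRICE = 0.90  # p2.xlarge


def speedups(old_ms, new_ms):
    """Per-network speedup: old total time divided by new total time."""
    return {net: old_ms[net] / new_ms[net] for net in old_ms}


def cost_per_performance(price_per_hour, relative_performance):
    """$/hr divided by relative throughput (G2 is the baseline, 1.0)."""
    return price_per_hour / relative_performance


per_network = speedups(g2_total_ms, p2_total_ms)
mean_speedup = sum(per_network.values()) / len(per_network)

price_ratio = P2_PRICE / G2_PRICE
g2_cost = cost_per_performance(G2_PRICE, 1.0)
p2_cost = cost_per_performance(P2_PRICE, 2.0)  # slide 5 rounds the speedup to 2
savings = 1 - p2_cost / g2_cost

for net, s in per_network.items():
    print(f"{net}: {s:.2f}x")
print(f"mean speedup: {mean_speedup:.2f}x")
print(f"price ratio: {price_ratio:.2f}x")
print(f"P2 $/hr per unit of performance: {p2_cost:.2f}")
print(f"P2 cost reduction per unit of work: {savings:.0%}")
```

Using the measured mean speedup instead of the rounded 2 would shift the savings figure slightly; the slide's remark that larger models trend towards a factor of 3 would shift it further in the P2's favor.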
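
Slide 6 suggests masking GPUs with CUDA_VISIBLE_DEVICES on the p2.16xlarge. A minimal sketch of that workaround follows, assuming the variable is set before TensorFlow initializes CUDA (in practice, before it is imported). The helper name `mask_gpus` is hypothetical, and whether masking is enough to avoid the failure on a given setup is the slide's claim, not something verified here.

```python
import os


def mask_gpus(visible_ids):
    """Expose only the listed GPU indices to CUDA for this process.

    Must be called before TensorFlow (or any other CUDA library)
    initializes the driver; changing the variable afterwards has no effect.
    """
    value = ",".join(str(i) for i in visible_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Expose only the first GPU to this process.
mask_gpus([0])

# Import TensorFlow only after the mask is in place, for example:
# import tensorflow as tf
```

The same effect can be had without touching the script by setting the variable in the shell that launches it, which keeps the masking decision out of the code.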