Accelerating High Performance
Computing with Ethernet

Derek Granath
Senior Director, Product Line Management
December 10, 2013
© 2013 Extreme Networks Inc. All rights reserved.
Agenda
Transforming Data into Information
HPC Interconnect Challenges and Alternatives
High Speed, Low Latency Ethernet for HPC
Architecture Examples

How Much Data?

The world generates 2.5 quintillion bytes each day…
or 57.5 billion 32 GB iPads' worth of data each day

Growth of Data

90% of the data in the world was created in the past TWO YEARS
Source: IBM, 2012

Image: NCSA’s Blue Waters sustained petascale supercomputing facility
Data Analytics Challenge
Big Data = Transactions + Interactions + Observations

[Diagram: example data sources ranging from megabytes and gigabytes of transaction data, through terabytes of interaction data, up to petabytes of observation data, with increasing data variety and complexity. Sources shown include ERP, CRM, purchase details, purchase and payment records, offer details, offer history, dynamic pricing, customer touches, support contacts, segmentation, web logs, user click streams, A/B testing, dynamic funnels, affiliate networks, search marketing, behavioral targeting, business data feeds, external demographics, mobile web, sensors/RFID/devices, user-generated content, sentiment, social interactions and feeds, spatial and GPS coordinates, speech to text, HD video/audio/images, and SMS/MMS.]

Source: contents of the above graphic created in partnership with Teradata, Inc. http://tinyurl.com/mt4ltah

Structured and Unstructured Data
Structured Data
• Fits neatly into traditional database schemas
• Examples: email metadata, call records
• Can be easily stored, queried, and analyzed

Unstructured Data
• Everything else…
• May contain patterns (or may not!)
• Examples: video, audio, photos, this presentation!
• Doesn't fit in fixed-length fields (illustrated below)
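As a minimal illustration of the distinction (not from the original deck; the field names are hypothetical), a call record maps directly onto fixed-length fields, while an unstructured payload is just an opaque blob whose internal structure has to be inferred:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Structured: a call record with fixed-length fields that map directly
 * onto a database schema and can be stored, queried, and analyzed as-is. */
struct call_record {
    uint64_t caller_id;
    uint64_t callee_id;
    int64_t  start_time_epoch;   /* seconds since the Unix epoch */
    uint32_t duration_seconds;
    uint8_t  call_type;          /* e.g. 0 = voice, 1 = video */
};

/* Unstructured: an opaque payload (video, audio, photo). Only the length
 * is known up front; any structure must be discovered by analysis. */
struct unstructured_blob {
    size_t         length;
    unsigned char *data;
};

int main(void) {
    struct call_record cdr = { 14155551234ULL, 14155555678ULL,
                               1386633600LL, 300, 0 };
    printf("fixed-size record: %zu bytes, duration %u s\n",
           sizeof cdr, cdr.duration_seconds);
    return 0;
}
```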
HPC – What Can You Do with it?
Utilities
• Weather impact analysis on power generation
• Transmission monitoring
• Smart grid management

Financial Services
• Fraud detection
• Risk management
• High-frequency trading
• 360° view of the customer

Transportation
• Weather and traffic impact on logistics and fuel consumption
• Traffic congestion

IT
• System log analysis
• Cybersecurity

Retail
• 360° view of the customer
• Click-stream analysis
• Real-time promotions

Health & Life Sciences
• DNA sequencing
• Epidemic early warning
• ICU monitoring
• Remote healthcare monitoring

Telecommunications
• CDR processing
• Churn prediction
• Geomapping / marketing
• Network monitoring

Law Enforcement
• Real-time multimodal surveillance
• Situational awareness and threat detection
• Cybersecurity detection
HPC Architectural Evolution – Clusters Dominate

Moore's Law for HPC
• Compute doubles every 1.5 years
• Data doubles every 1.5 years
• I/O doubles every 4 years
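To make the resulting I/O gap concrete (my arithmetic, not a figure from the deck): if compute doubles every 1.5 years while I/O doubles every 4 years, then after 12 years compute has grown roughly 256x but I/O only about 8x, a 32x shortfall that the interconnect has to bridge. A small sketch of that calculation:

```c
#include <math.h>
#include <stdio.h>

/* Growth factor after `years`, given a doubling period in years. */
static double growth(double years, double doubling_period) {
    return pow(2.0, years / doubling_period);
}

int main(void) {
    for (int years = 0; years <= 12; years += 3) {
        double compute = growth(years, 1.5);  /* doubles every 1.5 years */
        double io      = growth(years, 4.0);  /* doubles every 4 years   */
        printf("after %2d years: compute x%6.1f, I/O x%4.1f, gap x%5.1f\n",
               years, compute, io, compute / io);
    }
    return 0;   /* compile with: cc gap.c -lm */
}
```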
Interconnect Challenges
Massive Scalability
• Flexibility to grow big cheaply
• Deploy and re-deploy assets
Manageability
• More servers, more storage, more applications
Eased Convergence
• Proven, certified interoperability
• Standards-based technology
Efficiency
• Energy
• Operational
Availability
• Resilient architectures

Network Considerations for Big Data
• Processing time: barrier-synchronized computation (sketched in the example below)
• Traffic burstiness: buffering and burst handling
• Data volume: large, long-lived flows

Source: Das, Anupam, et al., "Transparent and Flexible Network Management for Big Data Processing in the Cloud."
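A barrier-synchronized job advances only as fast as its slowest node, which is why interconnect latency and burst handling directly affect processing time. A minimal MPI sketch of the pattern (my illustration, assuming an MPI installation; build with mpicc and run with mpirun):

```c
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double start = MPI_Wtime();

    /* Simulated compute phase: rank 0 is a deliberate straggler. */
    usleep(rank == 0 ? 200000 : 50000);

    /* Barrier: no rank proceeds until every rank has finished the phase. */
    MPI_Barrier(MPI_COMM_WORLD);

    double elapsed = MPI_Wtime() - start;
    double slowest;
    MPI_Reduce(&elapsed, &slowest, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("%d ranks: iteration took %.3f s, set by the slowest rank\n",
               size, slowest);

    MPI_Finalize();
    return 0;
}
```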
Top 3 Requirements for the HPC/SC Interconnect Fabric

Next-generation HPC clusters demand:

Throughput: non-blocking performance for any-to-any connectivity
  o Switch fabric performance
  o Non-oversubscribed, wire-speed architecture
  o Cut-through support

Bandwidth: ample bandwidth for multiple applications and jobs
  o Higher speeds and feeds: 10/40/100 Gigabit Ethernet
  o Higher density per slot
  o Link aggregation (LAG) support

Latency: least time taken to complete a job (a measurement sketch follows this list)
  o Lowest port-to-port latency in the I/O fabric
  o Jumbo-frame support
  o Short-reach optics and cable support
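Port-to-port latency is usually quantified with a ping-pong microbenchmark: send a small message, wait for the echo, and halve the round-trip time. The self-contained sketch below (my illustration, not an Extreme Networks tool) measures a local socket pair rather than a switch hop, but the same pattern is what network latency benchmarks apply end to end across the fabric:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>

#define ITERS 10000

static double now_us(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
}

int main(void) {
    int sv[2];
    char buf[1] = { 'x' };

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        perror("socketpair");
        return 1;
    }

    if (fork() == 0) {                    /* child: echo every byte back */
        close(sv[0]);
        while (read(sv[1], buf, 1) == 1)
            (void)write(sv[1], buf, 1);
        _exit(0);
    }
    close(sv[1]);

    double start = now_us();
    for (int i = 0; i < ITERS; i++) {     /* ping-pong a 1-byte message */
        (void)write(sv[0], buf, 1);
        (void)read(sv[0], buf, 1);
    }
    double per_rtt = (now_us() - start) / ITERS;

    printf("average round trip: %.2f us, one-way estimate: %.2f us\n",
           per_rtt, per_rtt / 2.0);

    close(sv[0]);
    wait(NULL);
    return 0;
}
```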

I/O Technology to Meet Future Demand

To bridge the gap, you need an I/O technology that is:

Scalable
• Bandwidth aggregation to multiply I/O
• Seamless migration to higher speeds and feeds

Flexible
• Short-, medium-, and long-range connectivity options
• I/O diversity and mix-and-match

Economical
• Minimal cost increase with speed migrations
• Reusable infrastructure and training

Reliable
• Resilient and time-tested
• Provides the required level of service uptime
HPC I/O Technology Alternatives

[Comparison table: InfiniBand, Fibre Channel, and Ethernet rated on scalability, bandwidth, latency, flexibility, reliability, and economics.]
Ethernet and InfiniBand Dominate

Note: the Gigabit Ethernet category includes 10GbE.

Ethernet Penetration in Top 500
212 (42.4%) of the world's 500 fastest supercomputers use Ethernet.

Source: Top500.org, November 2013
RoCE – RDMA over Converged Ethernet
• RoCE is a link-layer protocol between two hosts in a broadcast domain
• Allows Remote Direct Memory Access (RDMA), similar to InfiniBand, to run over Ethernet
• RoCE replaces the IB link layer with Ethernet
• Simpler than iWARP (Internet Wide Area RDMA Protocol)
• RoCE is marginally slower than IB, which achieves sub-μs latency; RoCE latencies are typically 1-3 μs (microseconds), but it is less expensive and lower power to deploy than IB
• Ideal for high-performance cluster computing environments already familiar with Ethernet technology that need the speed and agility of IB
• No support for IP (unlike iWARP): a head-end gateway is needed to access closed-cluster environments
• Multicast RDMA is defined for RoCE (also unlike iWARP)
• Requires IEEE Data Center Bridging (Priority Flow Control, PFC) on the network and a RoCE-capable Ethernet adapter for hardware acceleration (see the verbs sketch below)
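RoCE adapters are programmed through the same verbs API as InfiniBand HCAs, which is what lets RDMA applications move onto Ethernet largely unchanged. Below is a minimal, hedged sketch of the resource-setup path using libibverbs (allocation only; connection establishment and the actual RDMA operations are omitted, and it assumes a RoCE- or IB-capable adapter plus the libibverbs development package):

```c
#include <infiniband/verbs.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Minimal verbs setup: open a device, allocate a protection domain,
 * register a memory region, and create a reliable-connected queue pair.
 * Link with -libverbs. */
int main(void) {
    int num;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) {
        fprintf(stderr, "no RDMA-capable devices found\n");
        return 1;
    }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    if (!ctx) {
        fprintf(stderr, "failed to open device\n");
        return 1;
    }
    struct ibv_pd *pd = ibv_alloc_pd(ctx);

    /* Buffer the adapter may read/write directly, bypassing the kernel. */
    size_t len = 4096;
    void *buf = calloc(1, len);
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);

    struct ibv_cq *cq = ibv_create_cq(ctx, 16, NULL, NULL, 0);

    struct ibv_qp_init_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.send_cq = cq;
    attr.recv_cq = cq;
    attr.qp_type = IBV_QPT_RC;          /* reliable connected, as RoCE/IB use */
    attr.cap.max_send_wr  = 16;
    attr.cap.max_recv_wr  = 16;
    attr.cap.max_send_sge = 1;
    attr.cap.max_recv_sge = 1;

    struct ibv_qp *qp = ibv_create_qp(pd, &attr);
    printf("device %s ready: QP number 0x%x, MR lkey 0x%x\n",
           ibv_get_device_name(devs[0]), qp->qp_num, mr->lkey);

    /* Tear down in reverse order. */
    ibv_destroy_qp(qp);
    ibv_destroy_cq(cq);
    ibv_dereg_mr(mr);
    free(buf);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```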
Sample Architecture
[Diagram: clients connect through an access layer to front-end master nodes; compute-node clusters attach over 10G and 40G links to a high-performance interconnect fabric, which reaches back-end storage over 40G.]
HPC Design with Centralized Storage
[Diagram: two front-end clusters, each with an access layer, master nodes, and compute nodes, connect over 10GbE data paths into a shared high-performance fabric; centralized storage nodes sit on the back end, reached over a 40GbE storage path.]
Cluster Interconnect Fabric Options

Fully non-blocking architecture
• Six 32 x 40G switches
• 256 10G ports, fully non-blocking
• 64 10G server ports per rack
• 16 40G uplinks per rack to spine
• Less than 1.8 microsecond latency

3:1 over-subscribed (see the arithmetic check below)
• Ten 32 x 40G switches
• 768 10G ports to servers
• 8 40G uplinks per rack to spine
• Less than 1.8 microsecond latency
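As a sanity check on the 3:1 figure (my arithmetic; it assumes each leaf switch dedicates 24 of its 32 40G ports to servers, broken out as 96 x 10G, and 8 to spine uplinks):

```c
#include <stdio.h>

int main(void) {
    /* Assumed per-leaf port split on a 32 x 40G switch (illustrative). */
    int server_ports_40g = 24;   /* broken out as 96 x 10G toward servers */
    int uplink_ports_40g = 8;    /* toward the spine */

    int downlink_gbps = server_ports_40g * 40;   /* 960G of server bandwidth */
    int uplink_gbps   = uplink_ports_40g * 40;   /* 320G toward the spine    */

    printf("per-rack oversubscription: %dG : %dG = %.0f:1\n",
           downlink_gbps, uplink_gbps, (double)downlink_gbps / uplink_gbps);

    /* Eight such leaves would provide 8 * 96 = 768 10G server ports,
     * matching the slide's port count. */
    return 0;
}
```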

Ethernet evolves –
That’s what it does!

Thank You