Dr. Les Cottrell, SLAC <cottrell@slac.stanford.edu>
Linac Coherent Light Source (LCLS) Data Transfer Requirements
HPC talk, Stanford, Feb 2018
LCLS-II, a major (~$1B) upgrade to LCLS, is currently underway and comes online in 2020. (Video)
Basic instrument layout: optical laser pump, and X-ray laser probe
[Diagram: the optical laser sits at the basement level; the X-rays enter the X-ray hutch at the sub-basement level.]
Example experiment #1: ‘Molecular Movie’ Captures Ultrafast Chemistry in Motion
Scientific Achievement: Time-resolved observation of an evolving chemical reaction triggered by light.
Method: LCLS X-ray pulses were delivered at different time intervals, measuring the structural changes on an X-ray area detector.
Significance and Impact: Results pave the way for a wide range of X-ray studies examining gas-phase chemistry and the structural dynamics associated with the chemical reactions molecules undergo.
M.P. Minitti, J.M. Budarz, et al., Phys. Rev. Lett. 114, 255501 (2015) (cover article)
Next example: Catalysis, another related example
[Diagram: a catalytic converter. Polluting exhaust gases (HC, CO, NOx) pass through the catalyst, a porous substrate coated with precious metals; the gases react with the precious metals and emerge as H2O and CO2.]
50% of the world’s gasoline goes through fluid catalytic cracking.
Example experiment #2: Catalytic converter – transient dynamics resolved at the atomic scale
• Surface catalysis of CO oxidation to CO2
• Sub-picosecond transient states, monitored via the appearance of new electronic states in the O K-edge X-ray absorption spectrum
H. Öström et al., Science 2015
Data Analytics for high repetition rate Free Electron Lasers
FEL data challenge:
● Ultrafast X-ray pulses from LCLS are used like flashes from a high-speed strobe light, producing stop-action movies of atoms and molecules
● Both data processing and scientific interpretation demand intensive computational analysis
LCLS-II represents SLAC’s largest data challenge: it will increase data throughput by orders of magnitude by 2026, creating an exceptional scientific computing challenge.
LCLS-II experiments will present challenging computing requirements, in addition to the capacity increase:
1. Fast feedback is essential (seconds-to-minutes timescale) to reduce the time to complete the experiment, improve data quality, and increase the success rate
2. 24/7 availability
3. Short burst jobs, needing very short startup time; very disruptive for computers that typically host simulations that run for days
4. Storage represents a significant fraction of the overall system, in both cost and complexity:
   1. 1 PByte/day fast local buffer (a quick bandwidth check is sketched after this list), 5-10 PBytes of storage for local processing, 10-year long-term offline tape storage
   2. 60-300 Teraflops in 2020, growing to 1 Pflops in 2022+
5. Throughput between storage and processing is critical; currently most LCLS jobs are I/O limited
6. Speed and flexibility of the development cycle are critical: a wide variety of experiments, with rapid turnaround, and the need to tune data analysis during experiments
These aspects are also instrumental for other SLAC facilities (e.g. CryoEM, UED, SSRL, FACET-2).
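As a rough sanity check on the buffer rate (my arithmetic, not from the slides; decimal petabytes assumed since the slides do not specify), 1 PByte/day corresponds to roughly 93 Gbps sustained:

```python
# Back-of-envelope: sustained bandwidth implied by the 1 PByte/day buffer.
# Assumes decimal petabytes (10**15 bytes); the slide does not specify.
BYTES_PER_PB = 10**15
SECONDS_PER_DAY = 24 * 3600

gbps = BYTES_PER_PB * 8 / SECONDS_PER_DAY / 1e9
print(f"1 PByte/day ~ {gbps:.1f} Gbps sustained")  # ~92.6 Gbps
```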
LCLS-II data flow: from data production, to online reduction, real-time analysis, and offline interpretation
Currently seeking user input on acceptable solutions for data reduction & analysis
[Pipeline: megapixel detector → X-ray diffraction image → intensity map from multiple pulses → interpretation of system structure / dynamics]
Critical Requirement for Offsite Resources for LCLS Computing
[Pictured: MIRA at Argonne, TITAN at Oak Ridge, CORI at NERSC]
● Several experiments require access to leading-edge computers for detailed data analysis. This has its own challenges, in particular:
• The need to transfer huge amounts of compressed data from SLAC to supercomputers at other sites
• Providing near real-time results/feedback to the experiment
• It has to be reliable and production quality
The requirements will need a long-term agreement between BES and ASCR.
Requirement
• Today: 20 Gbps from SLAC to NERSC, growing to 70 Gbps by 2019 as experiments increase efficiency & networking
• 2020: LCLS-II comes online and starts taking data at an increased rate, 120 Hz => 1 MHz
• 2024: 1 Tbps, as imaging detectors get faster (Moore’s law; see the scaling sketch below)
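To put those milestones on one scale (my arithmetic; the per-pulse data volume is not given in the slides, so this only compares the stated rates):

```python
# Growth factors implied by the stated milestones.
rates_gbps = {"today": 20, "2019": 70, "2024": 1000}  # 1 Tbps = 1000 Gbps
print(f"2019 vs today: {rates_gbps['2019'] / rates_gbps['today']:.1f}x")  # 3.5x
print(f"2024 vs today: {rates_gbps['2024'] / rates_gbps['today']:.0f}x")  # 50x

# Pulse-rate jump when LCLS-II comes online in 2020.
print(f"120 Hz -> 1 MHz is a {1_000_000 / 120:,.0f}x increase in pulse rate")  # ~8,333x
```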
Offsite Data Transfer: Needs and Plans
LCLS-II needs are compatible with SLAC and NERSC plans
[Chart: projected LCLS-I and LCLS-II bandwidth needs plotted against SLAC plans, NERSC plans, and the ESnet6 upgrade.]
Data Transfer testing
• Zettar/zx: provides an HPC data transfer solution (i.e. SW + a transfer-system reference design): state-of-the-art, efficient, scalable high-speed data transfer over carefully selected demonstration hardware
• Today, using existing equipment:
  • DTNs at NERSC & SLAC, with Lustre file systems at both ends
  • Today a 100 Gbps link
  • Using the widely used bbcp and xrootd tools (a sample bbcp invocation is sketched below)
  • Currently ~55 Gbps, exploring limitations
  • Upgrading the border to 2*100 Gbps is in progress
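For concreteness, a wide-area bbcp transfer typically looks something like the sketch below. This is my illustration, not the actual test harness: hosts and paths are hypothetical, while -s, -w, and -P are standard bbcp options for parallel streams, TCP window size, and progress reporting.

```python
import subprocess

# Hedged sketch of a bbcp transfer; hostnames and file paths are made up.
cmd = [
    "bbcp",
    "-P", "2",     # print progress every 2 seconds
    "-s", "16",    # 16 parallel TCP streams to fill the long, fat pipe
    "-w", "8m",    # 8 MB TCP window per stream
    "/lustre/lcls/example_run.xtc",         # hypothetical source on the SLAC Lustre
    "dtn.nersc.example:/lustre/incoming/",  # hypothetical destination DTN
]
subprocess.run(cmd, check=True)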
Data transfer performance: test bed of two clusters at SLAC with a 5000-mile link
• Space efficient: 6U per cluster
• Energy efficient
• <80 Gbps shared link
NG demonstration
[Diagram: each cluster pairs storage servers (>25 TBytes in 8 SSDs) with Data Transfer Nodes (DTNs) over IP over InfiniBand (IPoIB) at 4*56 Gbps; the DTNs attach via 4*2*25 GbE to a 2x100G LAG, with n(2)*100GbE uplinks at 100 Gbps each toward the other cluster or the high-speed Internet.]
Memory to Memory between clusters with 2*100 Gbps
• No storage involved, just DTN-to-DTN mem-to-mem (a minimal test sketch follows below)
• Extended locally to 200 Gbps; here repeated 3 times
• Note the uniformity of the 8*25 Gbps interfaces
• Can simply use TCP; no need for exotic proprietary protocols
• The network is not a problem
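A mem-to-mem test of this kind is commonly driven with a tool such as iperf3. The sketch below is my illustration of the idea, not the harness actually used; the peer hostnames are hypothetical, one per 25 GbE interface, and -c, -P, and -t are standard iperf3 flags for server, parallel streams, and duration.

```python
import subprocess

# Hedged sketch: drive parallel TCP streams DTN-to-DTN with iperf3,
# launching one client per 25 GbE interface concurrently.
procs = [
    subprocess.Popen(
        ["iperf3", "-c", f"dtn-peer-{i}.example", "-P", "4", "-t", "30"]
    )
    for i in range(8)  # 8 interfaces, as on the slide
]
for p in procs:
    p.wait()
```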
Storage
• On the other hand, file-to-file transfers are at the mercy of back-end storage performance.
• Even with generous compute power and network bandwidth available, the best-designed and best-implemented data transfer software cannot work magic with a slow storage backend.
XFS READ performance of 8*SSDs in a file server, measured by the Unix fio utility
[Plots over a roughly two-minute window: Read Throughput approaching 20 GBps; SSD busy in the 200-800% range; Queue Size between 1500 and 3500.]
Data size = 5*200 GiB files, similar to typical LCLS large file sizes.
Note that on reads the SSDs stay busy and uniform, and keeping plenty of objects in the queue yields close to the raw throughput available. A representative fio invocation is sketched below.
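The slide does not show the fio job used; this is my hedged reconstruction of a plausible sequential-read run, with a hypothetical mount path and the size chosen to echo the 200 GiB files. All the flags shown are standard fio options.

```python
import subprocess

# Hedged reconstruction of a sequential-read fio run; the exact job
# parameters behind the slide are not given. The directory is hypothetical.
cmd = [
    "fio",
    "--name=seqread",
    "--rw=read",             # sequential reads
    "--bs=1M",               # 1 MiB blocks, typical for streaming large files
    "--direct=1",            # bypass the page cache to measure the devices
    "--ioengine=libaio",     # async I/O so the queue stays deep
    "--iodepth=64",          # keep plenty of requests outstanding per job
    "--numjobs=8",           # one job per SSD
    "--size=200g",           # echoes the 200 GiB LCLS-like file size
    "--directory=/xfs/ssds", # hypothetical XFS mount over the 8 SSDs
    "--group_reporting",
]
subprocess.run(cmd, check=True)
```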
XFS + parallel file system WRITE performance for 16 SSDs in 2 file servers
[Plots: SSD write throughput around 10 GBps; SSD busy; queue size of pending writes around 50.]
• Write is much slower than read
• The file system layers can’t keep the queue full (a factor of 1000 fewer items queued than for reads)
Conclusion
The network is fine: it can drive 200 Gbps, with no need for proprietary protocols.
Insufficient IOPS for writes: < 50% of raw capability
- Today limited to 80-90 Gbps file transfer
Work with local vendors
• State-of-the-art components fail; fast replacements are needed
• Worst case: waited 2 months for parts
Use the fastest SSDs for writes
• We used Intel DC P3700 NVMe 1.6TB drives
• The biggest drives are also the fastest, but also the most expensive
• 1.6TB at $1677 vs 2.0TB at $2655: a 20% improvement for a 60% cost increase (arithmetic sketched below)
The parallel file system is the bottleneck
• It needs enhancing for modern hardware & OS
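The cost trade-off quoted above checks out roughly as follows (my arithmetic on the listed prices; the "20% improvement" refers to performance, as stated on the slide):

```python
# Price comparison for the two Intel DC P3700 capacities quoted above.
price_1_6, price_2_0 = 1677, 2655

increase = (price_2_0 / price_1_6 - 1) * 100
print(f"Cost increase: {increase:.0f}%")  # ~58%, i.e. roughly 60%

print(f"$/TB: {price_1_6 / 1.6:.0f} vs {price_2_0 / 2.0:.0f}")  # $1048 vs $1328
```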
Demonstration: a PetaByte in < 1.5 days at ~70 Gbps on a <80 Gbps shared link, about 1/3rd of ESnet's traffic (see the check below).
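Those numbers are mutually consistent (my arithmetic, assuming decimal petabytes):

```python
# Check: time to move a petabyte at ~70 Gbps (decimal PB assumed).
PB_BITS = 10**15 * 8
days = PB_BITS / 70e9 / 86400
print(f"1 PB at 70 Gbps takes {days:.2f} days")  # ~1.32 days, i.e. < 1.5 days
```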
Impact on all ESnet traffic
[Plot: total ESnet traffic (OSCARS, LHCONE, Other) over roughly a day, peaking near 200G.]
When running, the data transfer contributes ~1/3 of total ESnet traffic.
Summary
What is special:
• Scalable: add more NICs, more DTNs, more storage servers, and links as needed/available
• Power and space efficient; low cost
• HA: tolerant to the loss of components
• Storage-tiering friendly
• Reference designs
• Easy-to-use, production-ready software
Proposed Future PetaByte Club
A member of the Petabyte Club MUST be an organization that is capable of using a shared production point-to-point WAN link to attain a production data transfer rate >= 150 PiB-mile/hour. (The sketch below shows what that threshold means on this test bed's 5000-mile link.)
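On the 5000-mile SLAC test link, the 150 PiB-mile/hour bar works out to roughly 75 Gbps sustained (my arithmetic), so the ~70 Gbps demonstration sits just under it:

```python
# What does 150 PiB-mile/hour mean on a 5000-mile link?
PIB = 2**50  # bytes in a pebibyte
MILES = 5000

# Rate (Gbps) needed to hit the 150 PiB-mile/hour threshold.
gbps_needed = (150 / MILES) * PIB * 8 / 3600 / 1e9
print(f"Need {gbps_needed:.1f} Gbps over {MILES} miles")  # ~75.1 Gbps

# Conversely, what the ~70 Gbps demonstration achieves:
achieved = 70e9 * 3600 / 8 / PIB * MILES
print(f"70 Gbps over {MILES} miles = {achieved:.0f} PiB-mile/hour")  # ~140
```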
Future
Upgrade the SLAC border to 200 Gbps (2x100 Gbps) to ESnet – Spring 2018
Then on to SLAC-to-NERSC. NERSC is:
• A very special environment, hard to modify
• Its focus has been different from what we need:
  • Vector computing and communication-intensive problems, at the expense of faster CPUs; but LCLS is embarrassingly parallel
  • Not easy to use for cluster computing such as xrootd or zx, unless Cray supports porting the application, which is unlikely
• Using the Burst Buffer (BB) is not easy from an application:
  • The BB will probably limit data transfer performance
  • It is normally used from batch, so near-realtime is a challenge
  • One can bypass the NERSC DTNs and go directly to the Cori nodes, which have direct access to the BB; however, the DTNs provide common ways to customize and are more agile, which is harder to do for Cori nodes
• A complex supercomputer; components tend to lag the very latest technology curve
Looking to exceed a PB/day by SC2018
Questions