This presentation surveys the networks at CERN: the IT-CS group, which manages communication services; the extensive IP network connecting equipment and facilities; and the networks supporting the Large Hadron Collider experiments. It describes the large data flows and computing challenges of the LHC experiments and the Worldwide LHC Computing Grid (WLCG) established to support them, focusing on the LHCOPN, which connects CERN and the Tier1 centres, and on the new LHCONE, being developed to better support the experiments' changing computing models.
8. High Energy Physics over IP
Most of the CERN infrastructure is controlled and managed over a pervasive IP network.
9. Cryogenics
27 km of pipes kept at -271.11 °C by means of 700,000 litres of helium: all controlled over IP.
Source: http://te-dep-crg-oa.web.cern.ch/te-dep-crg-oa/te-crg-oa_fichiers/cryolhc/LHC%20Cryo_BEOP_lectures2009.pdf
10. Access control
Safety and security: handled over IP.
Source: https://edms.cern.ch/file/931641/1/LASS-LACS_IHM.pdf
11. Remote inspections
Remote inspection of dangerous areas: robots are controlled, and send feedback, over WiFi and GSM IP networks.
12. DAQ: Data Acquisition
A constant stream of data from the four detectors to disk storage.
Source: http://aliceinfo.cern.ch/Public/Objects/Chapter2/DetectorComponents/daq_architecture.pdf
13. CCC: CERN Control Centre
The nerve centre of the particle accelerator: operated over IP.
14. CERN data network
- 150 routers
- 2,200 switches
- 50,000 connected devices
- 5,000 km of optical fibre
15. Network Provisioning and Management System
- 250 database tables
- 100,000 registered devices
- 50,000 hits/day on the web user interface
- 1,000,000 lines of code
- 11 years of development
22. Data flow
- 4 Experiments → 3 PBytes/s
- Filter and first selection → 2 GBytes/s to the CERN computer center
- Store on disk and tape (10 GBytes/s); export copies (4 GBytes/s)
- Create sub-samples → World-Wide Analysis (1 TByte/s ?)
- Distributed + local physics → explanation of nature, e.g. the Z line shape:

$\sigma_{f\bar{f}} \approx \sigma^0_{f\bar{f}} \times \frac{s\,\Gamma_Z^2}{(s - m_Z^2)^2 + s^2\,\Gamma_Z^2 / m_Z^2}$

with

$\sigma^0_{f\bar{f}} = \frac{12\pi}{m_Z^2}\,\frac{\Gamma_{ee}\,\Gamma_{ff}}{\Gamma_Z^2}$ and $\Gamma_{ff} = \frac{G_F\,m_Z^3}{6\sqrt{2}\,\pi}\,(v_f^2 + a_f^2)\,N_{\mathrm{col}}$
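As a numerical sanity check of the reconstructed line-shape formula, here is a short sketch evaluating the peak hadronic cross-section. The constants are approximate PDG values supplied for illustration; they do not appear on the slide.

```python
# Evaluate sigma0 = 12*pi/mZ^2 * Gee*Ghad/GZ^2 with approximate PDG values,
# converting from GeV^-2 to nb (1 GeV^-2 ~ 3.894e5 nb).
import math

m_z, gamma_z = 91.19, 2.495           # GeV: Z mass and total width
gamma_ee, gamma_had = 0.0839, 1.744   # GeV: partial widths (approximate)

sigma0_gev2 = 12 * math.pi / m_z**2 * (gamma_ee * gamma_had / gamma_z**2)
sigma0_nb = sigma0_gev2 * 3.894e5     # unit conversion GeV^-2 -> nb
print(f"peak hadronic cross-section ~ {sigma0_nb:.1f} nb")  # ~41.5 nb
```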
23. Data Challenge
- 40 million collisions per second
- After filtering, 100 collisions of interest per second
- 10^10 collisions recorded each year = 15 Petabytes/year of data
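To make the final number concrete, a back-of-envelope check; the ~1.5 MB average event size is an assumption, as the slide states only the totals.

```python
# Rough check that 10^10 recorded collisions/year is consistent with
# 15 PB/year, assuming an average event size of ~1.5 MB (not on the slide).
collisions_per_second = 40e6    # raw LHC collision rate
interesting_per_second = 100    # kept after filtering
recorded_per_year = 1e10        # events written to storage
event_size_bytes = 1.5e6        # assumed average event size

volume_pb = recorded_per_year * event_size_bytes / 1e15
print(f"~{volume_pb:.0f} PB/year")  # -> ~15 PB/year
print(f"trigger reduction: {collisions_per_second / interesting_per_second:.0e}x")
```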
31. Trends
- Virtualization mobility (Software Defined Networks)
- Commodity servers with 10G NICs
- High-end servers with 40G NICs
- 40G and 100G interfaces on switches and routers
34. A collaborative effort
Designed, built and operated by the Tier0-Tier1s community.
Links provided by the Research and Education network providers: Geant, USLHCnet, Esnet, Canarie, ASnet, Nordunet, Surfnet, GARR, Renater, JANET.UK, Rediris, DFN, SWITCH.
35. Technology
- Single and bundled long-distance 10G Ethernet links
- Multiple redundant paths; star + partial-mesh topology
- BGP routing: communities for traffic engineering, load balancing
- Security: only declared IP prefixes can exchange traffic (sketched below)
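A minimal sketch of the prefix-based security rule in the last bullet, assuming a simple allow-list model; the prefixes and function names are invented for illustration, and this is not CERN's actual configuration.

```python
# Hypothetical illustration of LHCOPN-style prefix filtering: traffic is
# accepted only when both endpoints fall inside declared site prefixes.
import ipaddress

# Example prefixes, fictitious documentation ranges used only for illustration.
declared_prefixes = [
    ipaddress.ip_network("192.0.2.0/24"),     # pretend Tier0 range
    ipaddress.ip_network("198.51.100.0/24"),  # pretend Tier1 range
]

def is_declared(addr: str) -> bool:
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in declared_prefixes)

def accept(src: str, dst: str) -> bool:
    # Only declared IP prefixes can exchange traffic.
    return is_declared(src) and is_declared(dst)

print(accept("192.0.2.10", "198.51.100.20"))  # True: both declared
print(accept("192.0.2.10", "203.0.113.5"))    # False: destination not declared
```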
39. Driving the change
"The Network infrastructure is the most reliable service we have."
"Network Bandwidth (rather than disk) will need to scale more with users and data volume."
"Data placement will be driven by demand for analysis and not pre-placement."
Ian Bird, WLCG project leader
41. New computing model
- Better and more dynamic use of storage
- Reduce the load on the Tier1s for data serving
- Increase the speed to populate analysis facilities
Hence the need for a faster, predictable, pervasive network connecting Tier1s and Tier2s.
42. Requirements from the Experiments
- Connecting any pair of sites, regardless of the continent on which they reside
- Bandwidth ranging from 1 Gbps (Minimal) and 5 Gbps (Nominal) to 10 Gbps and above (Leadership)
- Scalability: sites are expected to grow
- Flexibility: sites may join and leave at any time
- Predictable cost: well defined, and not too high
43. The need for a better network
- more bandwidth, by federating (existing) resources
- shared cost of expensive resources
- accessible to any TierX site
= LHC Open Network Environment (LHCONE)
44. LHCONE concepts
- Serves any LHC site according to its needs, allowing it to grow
- A collaborative effort among Research & Education Network Providers
- Based on Open Exchange Points: easy to join, neutral
- Multiple services: one cannot fit all
- Traffic separation: no clash with other data transfers; resources allocated for, and funded by, the HEP community
46. LHCONE building blocks
- Single-node Exchange Points
- Continental/regional Distributed Exchange Points
- Interconnect circuits between Exchange Points
These exchange points and the links between them collectively provide LHCONE services and operate under a common LHCONE policy.
51. CINBAD
CERN Investigation of Network Behaviour and Anomaly Detection
Project goals:
Understand the behaviour of large computer networks (10,000+ nodes) in High Performance Computing or large campus installations, to be able to:
- detect traffic anomalies in the system (a toy sketch follows this slide)
- perform trend analysis
- automatically take counter-measures
- provide post-mortem analysis facilities
Resources:
- In collaboration with HP Networking
- Two engineers in IT-CS
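To illustrate the anomaly-detection goal, here is a toy z-score detector over per-minute packet counts. This is only a sketch of the general idea (flag values far from a sliding baseline), not CINBAD's actual method.

```python
# Illustrative z-score detector: alert when a sample deviates strongly
# from the mean of a sliding window of recent traffic measurements.
from collections import deque
from statistics import mean, stdev

class AnomalyDetector:
    def __init__(self, window=60, threshold=4.0):
        self.history = deque(maxlen=window)  # sliding baseline window
        self.threshold = threshold           # z-score that raises an alert

    def observe(self, packets_per_min):
        anomalous = False
        if len(self.history) >= 10:          # need some baseline first
            mu, sigma = mean(self.history), stdev(self.history)
            anomalous = sigma > 0 and abs(packets_per_min - mu) / sigma > self.threshold
        self.history.append(packets_per_min)
        return anomalous

det = AnomalyDetector()
baseline = [1000 + (i % 7) * 25 for i in range(30)]  # normal traffic with jitter
for minute, rate in enumerate(baseline + [50000]):   # then a sudden spike
    if det.observe(rate):
        print(f"minute {minute}: anomaly, rate={rate} pkt/min")
```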
52. Results
Project completed in 2010.
For CERN: designed and deployed a complete framework (hardware and software) to detect anomalies in the Campus Network (GPN).
For HP: intellectual property; new technologies used in commercial products.
55. WIND
Wireless Infrastructure Network Deployment
Project goals:
- Analyse the problems of large-scale wireless deployments and understand the constraints
- Simulate the behaviour of WLANs
- Develop new optimisation algorithms
Resources:
- In collaboration with HP Networking
- Two engineers in IT-CS
- Started in 2010
56. Needs
Wireless LAN (WLAN) deployments are problematic:
- Radio propagation is very difficult to predict
- Interference is an ever-present danger
- WLANs are difficult to deploy properly
- Monitoring was not an issue when the first standards were developed
- When administrators are struggling just to operate the WLAN, performance optimisation is often forgotten
57. Example: radio interference
Figure: max data rate measured in building 0031-S when the APs work on the same channel, versus when the APs work on 3 independent channels.
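The figure's point is that co-channel APs interfere, while spreading APs over non-overlapping channels restores throughput. A toy greedy channel assignment over the 2.4 GHz non-overlapping channels (1, 6, 11) illustrates the idea; the interference graph is invented, and this is not WIND's optimisation algorithm.

```python
# Greedy graph-colouring sketch: give neighbouring APs different channels.
CHANNELS = (1, 6, 11)  # non-overlapping 2.4 GHz channels

# Hypothetical interference graph: AP -> APs within radio range.
neighbours = {
    "AP1": {"AP2", "AP3"},
    "AP2": {"AP1", "AP3"},
    "AP3": {"AP1", "AP2", "AP4"},
    "AP4": {"AP3"},
}

def assign_channels(neighbours):
    assignment = {}
    # Visit the most constrained (highest-degree) APs first.
    for ap in sorted(neighbours, key=lambda a: -len(neighbours[a])):
        used = {assignment.get(n) for n in neighbours[ap]}
        # First channel no neighbour uses; fall back to channel 1 if none free.
        assignment[ap] = next((c for c in CHANNELS if c not in used), CHANNELS[0])
    return assignment

print(assign_channels(neighbours))  # e.g. {'AP3': 1, 'AP1': 6, 'AP2': 11, 'AP4': 6}
```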
58. Expected results
- Extend monitoring and analysis tools
- Act on the network: smart load balancing, isolating misbehaving clients, intelligent minimum data rates
- More accurate troubleshooting
- Streamlined WLAN design
60. ViSION
Project goals:
- Develop an SDN traffic orchestrator using OpenFlow
Resources:
- In collaboration with HP Networking
- Two engineers in IT-CS
- Started in 2012
61. Goals
An SDN traffic orchestrator using OpenFlow:
- distribute traffic over a set of network resources (see the sketch below)
- perform classification (different types of applications and resources)
- perform load sharing (across similar resources)
Benefits:
- better scalability and control than traditional networking technologies
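A minimal sketch of the orchestrator idea: classify flows, then hash-share each class across a pool of similar resources. The pools, ports and classification rule are invented for illustration; this is not ViSION's actual design.

```python
# Classify a flow, then pick a resource from its class pool by stable hashing,
# so each flow stays on one path (avoids packet reordering).
import hashlib

RESOURCE_POOLS = {
    "bulk":        ["dtn-link-1", "dtn-link-2"],  # e.g. large data transfers
    "interactive": ["edge-link-1"],               # e.g. ssh, web
}

def classify(flow) -> str:
    # Trivial port-based classification, purely illustrative.
    return "bulk" if flow["dst_port"] in (2811, 8444) else "interactive"

def pick_resource(flow) -> str:
    pool = RESOURCE_POOLS[classify(flow)]
    key = f"{flow['src']}-{flow['dst']}-{flow['dst_port']}".encode()
    return pool[int(hashlib.sha1(key).hexdigest(), 16) % len(pool)]

flow = {"src": "10.0.0.1", "dst": "10.0.0.2", "dst_port": 2811}
print(pick_resource(flow))  # one of dtn-link-1 / dtn-link-2, stable per flow
```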
62. From traditional networks...
Closed boxes, fully distributed protocols.
Figure: many closed devices, each bundling its own applications, operating system and specialized packet-forwarding hardware.
63. ... to Software Defined Networks (SDN)
1. Open interface to hardware (OpenFlow)
2. At least one good operating system
3. Well-defined open API: extensible, possibly open source
Figure: applications run on a network operating system, which drives many simple packet-forwarding hardware elements through the open interface.
64. OpenFlow example
A controller (software running on a PC, talking to the switch's OpenFlow client) remotely controls the hardware forwarding table.

Flow Table (hardware layer; switch ports 1-4):

MAC src | MAC dst | IP Src | IP Dst  | TCP sport | TCP dport | Action
   *    |    *    |   *    | 5.6.7.8 |     *     |     *     | port 1
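To show what that table entry means operationally, here is a self-contained simulation of wildcard matching on the 6-tuple, with the standard OpenFlow behaviour of sending table misses to the controller. It is illustrative only, not a real OpenFlow agent.

```python
# Minimal flow-table lookup: "*" matches any field; the slide's single rule
# forwards traffic destined to 5.6.7.8 out of port 1.
WILDCARD = "*"

flow_table = [
    # (mac_src, mac_dst, ip_src, ip_dst, tcp_sport, tcp_dport) -> action
    ((WILDCARD, WILDCARD, WILDCARD, "5.6.7.8", WILDCARD, WILDCARD), "port 1"),
]

def lookup(packet):
    for pattern, action in flow_table:
        if all(p == WILDCARD or p == v for p, v in zip(pattern, packet)):
            return action
    return "send to controller"  # table miss: controller decides and installs a rule

pkt = ("aa:aa", "bb:bb", "1.2.3.4", "5.6.7.8", 1234, 80)
print(lookup(pkt))                                                 # -> port 1
print(lookup(("aa:aa", "bb:bb", "1.2.3.4", "9.9.9.9", 1234, 80)))  # -> send to controller
```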
66. Conclusions
- The Data Network is an essential component of the LHC instrument
- The Data Network is a key part of LHC data processing, and will become even more important
- More and more security and design challenges to come
67. Credits
Artur Barczyk (LHCONE)
Dan Savu (ViSION)
Milosz Hulboj (WIND and CINBAD)
Ryszard Jurga (CINBAD)
Sebastien Ceuterickx (WIND)
Stefan Stancu (ViSION)
Vlad Lapadatescu (WIND)
68. What's next
SWAN: Space Wide Area Network :-)