Grappling with Massive Data Sets
Gavin McCance, CERN IT
Digital Energy 2018, 1 May 2018, Aberdeen
OpenStack at CERN: A 5-year perspective
Tim Bell
tim.bell@cern.ch
@noggin143
OpenStack Days Budapest 2018
About Me - @noggin143
• Responsible for Compute and Monitoring at CERN
• Elected member of the OpenStack Foundation board
• Member of the OpenStack user committee from 2013-2015
CERN: a worldwide collaboration
CERN’s primary mission: SCIENCE
Fundamental research on particle physics, pushing the boundaries of knowledge and technology
CERN: the world’s largest particle physics laboratory
Image credit: CERN
Evolution of the Universe
• Test the Standard Model?
• What’s matter made of?
• What holds it together?
• Anti-matter?
• (Gravity?)
The Large Hadron Collider: LHC
• A 27 km ring
• 1232 dipole magnets, 15 metres and 35 t each
Image credit: CERN
LHC: World’s Largest Cryogenic System (1.9 K)
• Colder temperatures than outer space (120 t of He)
Image credit: CERN
LHC: Highest Vacuum
• 104 km of pipes at 10⁻¹¹ bar (comparable to the vacuum at the Moon)
Image credit: CERN
ATLAS, CMS, ALICE and LHCb
• Detectors heavier than the Eiffel Tower
Image credit: CERN
• 40 million pictures per second
• 1 PB/s
Image credit: CERN
Data Flow to Storage and Processing (Run 2 → CERN DC)
• ALICE: 4 GB/s
• ATLAS: 1 GB/s
• CMS: 600 MB/s
• LHCb: 750 MB/s
Image credit: CERN
CERN Data Centre: Primary Copy of LHC Data
(Data Centre on Google Street View)
• 15k servers
• 90k disks
• > 200 PB on tape
WLCG: the Worldwide LHC Computing Grid
About WLCG:
• A community of 10,000 physicists
• ~250,000 jobs running concurrently
• 600,000 processing cores
• 700 PB storage available worldwide
• 20-40 Gbit/s links connect CERN to the Tier-1s
• 170 sites worldwide, > 10,000 users
Tier-0 (CERN)
• Initial data reconstruction
• Data recording & archiving
• Data distribution to rest of world
Tier-1s (14 centres worldwide)
• Permanent storage
• Re-processing
• Monte Carlo simulation
• End-user analysis
Tier-2s (>150 centres worldwide)
• Monte Carlo simulation
• End-user analysis
Image credit: CERN
CERN in 2017
• 230 PB on tape, 550 million files
• 55 PB of new data produced in 2017
CERN Data Centre: Private OpenStack Cloud
• More than 300,000 cores
• More than 500,000 physics jobs per day
Infrastructure in 2011
• Data centre managed by a home-grown toolset (Quattor, Lemon, …)
  - Initial development funded by EU projects
  - Development environment based on CVS; 100K or so lines of Perl
• At the limit for power and cooling in Geneva
• No simple expansion options
Wigner Data Centre
Project started in 2011, with inauguration in June 2013
Getting resources in 2011
OpenStack London July 2011
2011 - First OpenStack summit talk
https://www.slideshare.net/noggin143/cern-user-story
The Agile Infrastructure Project
2012, a turning point for CERN IT:
- LHC computing and data requirements were increasing … Moore’s law would help, but not enough
- EU-funded projects for the fabric management toolset had ended
- Staff numbers fixed, but resources must grow
- LS1 (2013) ahead, next window only in 2019!
- Other deployments had surpassed CERN’s
Three core areas:
- Centralized monitoring
- Configuration management
- IaaS based on OpenStack
“All servers shall be virtual!”
CERN Tool Chain
And block storage … February 2013
Sharing with Central Europe – May 2013
https://www.slideshare.net/noggin143/20130529-openstack-ceedayv6
Production in Summer 2013
CERN Ceph Clusters                      Size     Version
OpenStack Cinder/Glance (production)    5.5 PB   jewel
Satellite data centre (1000 km away)    0.4 PB   luminous
CephFS (HPC + Manila, production)       0.8 PB   luminous
Manila testing cluster                  0.4 PB   luminous
Hyperconverged HPC                      0.4 PB   luminous
CASTOR/XRootD (production)              4.2 PB   luminous
CERN Tape Archive                       0.8 PB   luminous
S3 + SWIFT (production)                 0.9 PB   luminous
+5 PB in the pipeline
Bigbang Scale Tests
• Bigbang scale tests mutually benefit CERN & the Ceph project
• Bigbang I: 30 PB, 7200 OSDs, Ceph hammer. Several osdmap limitations
• Bigbang II: similar size, Ceph jewel. Scalability limited by OSD/MON messaging; motivated ceph-mgr
• Bigbang III: 65 PB, 10800 OSDs
https://ceph.com/community/new-luminous-scalability/
OpenStack Magnum
An OpenStack API service that allows creation of container clusters:
● Use your Keystone credentials
● You choose your cluster type
● Multi-tenancy
● Quickly create new clusters with advanced features such as multi-master
OpenStack Magnum
$ openstack coe cluster create --cluster-template kubernetes --node-count 100 … mycluster
$ openstack coe cluster list
+------+----------------+------------+--------------+-----------------+
| uuid | name | node_count | master_count | status |
+------+----------------+------------+--------------+-----------------+
| .... | mycluster | 100 | 1 | CREATE_COMPLETE |
+------+----------------+------------+--------------+-----------------+
$ $(magnum cluster-config mycluster --dir mycluster)
$ kubectl get pod
$ openstack coe cluster update mycluster replace node_count=200
Single command cluster creation
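For context, the --cluster-template argument above refers to a template registered beforehand. A minimal sketch of what registering such a template might look like is shown below; the image, keypair, network and flavor names are illustrative placeholders, not CERN's actual configuration.
$ openstack coe cluster template create kubernetes \
    --coe kubernetes --image fedora-atomic-latest --keypair mykey \
    --external-network public --flavor m1.medium --master-flavor m1.medium \
    --network-driver flannel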
Why Bare-Metal Provisioning?
• VMs are not sensible/suitable for all of our use cases
  - Storage and database nodes, HPC clusters, bootstrapping, critical network equipment or specialised network setups, precise/repeatable benchmarking for s/w frameworks, …
• Complete our service offerings
  - Physical nodes (in addition to VMs and containers)
  - OpenStack UI as the single pane of glass
• Simplify hardware provisioning workflows
  - For users: openstack server create/delete (see the sketch below)
  - For procurement & h/w provisioning team: initial on-boarding, server re-assignments
• Consolidate accounting & bookkeeping
  - Resource accounting input will come from fewer sources
  - Machine re-assignments will be easier to track
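As an illustration of the user-facing workflow above, a minimal sketch of what requesting and releasing a physical node could look like once bare metal sits behind the same API; the flavor, image and network names are hypothetical placeholders.
# Bare-metal flavors appear alongside VM flavors (names below are placeholders)
$ openstack flavor list
$ openstack server create --flavor p1.baremetal --image CC7-base \
    --key-name mykey --network my-network mybaremetal
$ openstack server delete mybaremetal   # same call path as for VMs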
Compute Intensive Workloads on VMs
• Up to 20% loss on very large VMs!
• “Tuning”: KSM*, EPT**, pinning, … → 10%
• Compare with Hyper-V: no issue
• NUMA-awareness & node pinning … <3%!
• Cross-over: patches from Telecom
(*) Kernel Shared Memory
(**) Extended Page Tables

VM layout   Overhead before   Overhead after
4x 8        7.8%
2x 16       16%
1x 24       20%               5%
1x 32       20%               3%
A new use case: Containers on Bare-Metal
• OpenStack manages both containers and bare metal, so put them together
• General service offer: managed clusters
  - Users get only K8s credentials
  - Cloud team manages the cluster and the underlying infra
  - Integration is seamless (based on a specific cluster template)
• Batch farm runs in VMs as well
  - Evaluating federated Kubernetes for hybrid cloud integration
  - Federation of 7 clouds demonstrated at KubeCon
  - OpenStack and non-OpenStack clouds transparently managed
• Monitoring (metrics/logs)? Runs as pods in the cluster (see the sketch below)
  - Logs: fluentd + Elasticsearch
  - Metrics: cAdvisor + InfluxDB
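To make the “users get only K8s credentials” point concrete, a minimal sketch of a user interaction with such a managed cluster; the cluster name and namespace are illustrative, not the actual CERN setup.
$ $(magnum cluster-config mycluster --dir mycluster)   # fetch kubeconfig and credentials
$ kubectl get nodes                                    # bare-metal hosts appear as K8s nodes
$ kubectl -n kube-system get pods                      # logging/monitoring add-ons run here as pods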
Hardware Burn-in in the CERN Data Centre (1)
• H/w purchases follow a formal procedure compliant with public procurement
  - Market survey identifies potential bidders
  - Tender spec is sent out to ask for offers
  - Larger deliveries 1-2 times / year
• “Burn-in” before acceptance
  - Compliance with technical spec (e.g. performance)
  - Find failed components (e.g. broken RAM)
  - Find systematic errors (e.g. bad firmware)
  - Provoke early failures through stress (the “bathtub curve”)
• The whole process can take weeks!
Hardware Burn-in in the CERN Data Centre (2)
• Initial checks: serial asset tag and BIOS settings
  - Purchase order ID and unique serial no. to be set in the BMC (used as the node name!)
• “Burn-in” tests
  - CPU: burnK7, burnP6, burnMMX (cooling)
  - RAM: memtest; Disk: badblocks
  - Network: iperf(3) between pairs of nodes, with automatic node pairing (see the sketch below)
  - Benchmarking: HEPSpec06 (& fio), a derivative of SPEC06; we buy total compute capacity (not the newest processors)
$ ipmitool fru print 0 | tail -2
Product Serial : 245410-1
Product Asset Tag : CD5792984
$ openstack baremetal node show CD5792984-245410-1
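The network burn-in pairs nodes automatically and runs iperf between them; a minimal sketch of one such pair test, where the host name is a placeholder.
$ iperf3 -s                              # on the first node of the pair
$ iperf3 -c node-a.example.org -t 60     # on its partner: 60-second bandwidth test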
“Double peak” structure in the benchmark results due to slower hardware threads (see the OpenAccess paper)
[Diagram: hardware allocation / re-allocation workflow driven by Foreman, with a burn-in step recently added]
Network Migration
• Phase 1: Nova Network + Linux Bridge (but still used in 2018)
• Phase 2: Neutron + Linux Bridge (already running)
• Phase 3: SDN with Tungsten Fabric (testing; new region coming in 2018)
Spectre / Meltdown
• In January 2018, a security vulnerability was disclosed, requiring a new kernel everywhere
• Campaign over two weeks from 15th January
  - 7 reboot days, 7 tidy-up days
  - Rolled out by availability zone (see the sketch below)
• Benefits
  - Automation now in place to reboot the whole cloud if needed: 33,000 VMs on 9,000 hypervisors
  - Latest QEMU and RBD user code on all VMs
• Downside
  - Discovered a kernel bug in XFS which may mean we have to do it again soon
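A minimal sketch of what per-availability-zone reboot automation could look like from the OpenStack side; the zone name, SSH access and the absence of any wait or health checks are simplifications, not the actual CERN tooling.
# Drain, reboot and re-enable the nova-compute hosts of one availability zone
ZONE=myzone    # hypothetical availability zone name
for h in $(openstack compute service list --service nova-compute -f value -c Host -c Zone |
           awk -v z="$ZONE" '$2 == z {print $1}'); do
    openstack compute service set --disable --disable-reason "kernel upgrade" "$h" nova-compute
    ssh "root@$h" reboot    # assumes SSH access; real tooling would wait for the host to return
    openstack compute service set --enable "$h" nova-compute
done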
Community Experience
• Open source collaboration sets a model for in-house teams
• External recognition by the community is highly rewarding for contributors
• Reviewing and being reviewed is a constant learning experience
• Productive for the job market for staff
• Working groups, like the Scientific and Large Deployment teams, discuss a wide range of topics
• Effective knowledge transfer mechanisms, consistent with the CERN mission
  - 110 outreach talks since 2011
  - Dojos at CERN bring good attendance (Ceph, CentOS, Elastic, OpenStack CH, …)
HL-LHC: More collisions!
• Increased complexity due to much higher pile-up and higher trigger rates will bring several challenges to reconstruction algorithms
• CMS already had to cope with monster pile-up: the 8b4e bunch structure gave a pile-up of ~60 events/crossing (versus ~20 events/crossing)
• CMS: event from 2017 with 78 reconstructed vertices
• ATLAS: simulation for HL-LHC with 200 vertices
LHC schedule: First run, LS1, Second run, LS2, Third run, LS3, then HL-LHC Run 4 from around 2026 (towards 2030?)
• A significant part of the cost comes from global operations
• Even with a technology increase of ~15%/year, we still have a big gap if we keep trying to do things with our current compute models
• Raw data volume increases significantly for the High Luminosity LHC
Commercial Clouds
Development areas going forward
• Spot market
• Cells V2
• Neutron scaling
• Magnum rolling upgrades
• Block storage performance
• Federated Kubernetes
• Collaborations with industry and SKA
Summary
• OpenStack has provided flexible infrastructure at CERN since 2013
• The open infrastructure toolchain has been stable at scale
• Clouds are part, but not all, of the solution
• Open source collaborations have been fruitful for CERN, industry and the communities
• Further efforts will be needed to ensure that physics is not limited by the computing resources available
Thanks for all your help … some links
• CERN OpenStack blog: http://openstack-in-production.blogspot.com
• Recent CERN OpenStack talks at the Vancouver summit: https://www.openstack.org/videos/search?search=cern
• CERN tools: https://github.com/cernops
Backup Material
Hardware Evolution
• Looking at new hardware platforms to reduce the upcoming resource gap
• Explorations have been made into low-cost, low-power ARM processors
• Interesting R&D in high-performance hardware
  - GPUs for deep learning network training and fast simulation
  - FPGAs for neural network inference and data transformations
• Significant algorithm changes are needed to benefit from the potential gains

Editor's Notes

• #5 Reference: Fabiola’s talk at the University of Geneva https://www.unige.ch/public/actualites/2017/le-boson-de-higgs-et-notre-vie/ The European Organization for Nuclear Research, founded in 1954; today 22 member states; the world’s largest particle physics laboratory; ~2,300 staff, 13k users on site; budget ~1,000 MCHF. Mission: answer fundamental questions on the universe, advance the technology frontiers, train the scientists of tomorrow, bring nations together. https://communications.web.cern.ch/fr/node/84
• #6 For all this fundamental research, CERN provides different facilities to scientists, for example the LHC. It is a ring 27 km in circumference, crossing two countries, 100 m underground; it accelerates two particle beams to near the speed of light and makes them collide at four points where detectors observe the fireworks. ~2,500 people are employed by CERN, with >10k users on site. Talk about the LHC here: describe the experiments, Lake Geneva, Mont Blanc, and then jump in. The big ring is the LHC, the small one is the SPS; the computer centre is not far away. Pushing the boundaries of technology facilitates research: we just run the accelerators, the experiments are done by institutes, member states and universities. On the Franco-Swiss border, very close to Geneva.
• #8 Our flagship programme is the LHC. Trillions of protons race around the 27 km ring in opposite directions over 11,000 times a second, travelling at 99.9999991 per cent of the speed of light. It is the largest machine on Earth.
• #9 With an operating temperature of about -271 degrees Celsius, just 1.9 degrees above absolute zero, the LHC is one of the coldest places in the universe. It uses 120 t of helium; only at that temperature do the magnets have no electrical resistance.
• #10 https://home.cern/about/engineering/vacuum-empty-interstellar-space The beams operate inside a very high vacuum, comparable to the vacuum on the Moon. There are actually two proton beams, going in two directions; the vacuum avoids the protons interacting with other particles.
• #11 The detectors are very advanced beasts, four of them. ATLAS and CMS are the best known, general-purpose detectors testing Standard Model properties; in those detectors the Higgs particle was discovered in 2012. In the picture you can see physicists. The others are ALICE and LHCb. To sample and record the debris from up to 600 million proton collisions per second, scientists are building gargantuan devices that measure particles with micron precision.
• #12 A 100-Mpixel camera taking 40 million pictures per second. https://www.ethz.ch/en/news-and-events/eth-news/news/2017/03/new-heart-for-cerns-cms.html
• #13 https://home.cern/about/computing/processing-what-record In the first run, about 5 GB/s. Size of the fibres from the pits to the DC?
• #14 What do we do with all this data? First we store it; the analysis is done offline and can go on for years.
• #15 A tiered system, where Tier-0 is CERN: data is recorded, reconstructed and distributed. All these detectors generate loads of data, about 1 PB (a petabyte, a million gigabytes) per SECOND! It is impossible to store so much data, and anyway not needed: the events the experiments are trying to create and observe are very rare. That is why we make so many collisions but keep only the interesting ones. Next to each detector is a “trigger”, a kind of filter made of various layers (first electronics, then computers) which selects and keeps only about 1 collision out of a million on average. In the end the experiments still generate dozens of petabytes of data each year. We need about 200,000 CPUs to analyse this data; as CERN has only about 100,000, we share the data over more than 100 computer centres across the planet (usually located in the physics institutes participating in the LHC collaborations). This is the Computing Grid, a gigantic planetary computer and hard drive, and the biggest scientific grid project in the world: ~170 computer centres (sites), 1 Tier-0 (distributed over two locations), 14 bigger centres (Tier-1), ~160 Tier-2, 42 countries, 10,000 users, running since October 2008, 3 million jobs per day, ~600,000 cores, 300 PB of data. Do you want to contribute? http://lhcathome.web.cern.ch/
• #17 Optimised the usage of resources and computing (from ~2012, a private cloud based on OpenStack), focusing on virtualisation and scaling options.