OSDC 2014
Overlay Datacenter Information
Christian Kniep

Bull SAS
2014-04-10
OSDC 2014: Christian Kniep - Understand your data center by overlaying multiple information layers


Today's data center managers are burdened by a lack of aligned information across multiple layers. Workflow events such as 'job starts', aligned with performance metrics and events extracted from log facilities, are low-hanging fruit that is on the verge of becoming usable thanks to open-source software such as Graphite, StatsD, logstash and the like.
This talk aims to show the benefits of merging multiple layers of information within an InfiniBand cluster, using use cases for level 1/2/3 personnel.

Published in: Software, Technology
Transcript of "OSDC 2014: Christian Kniep - Understand your data center by overlaying multiple information layers"

  1. OSDC 2014: Overlay Datacenter Information. Christian Kniep, Bull SAS, 2014-04-10
  7. About Me
     ❖ Me (>30y)
     ❖ SysOps (>10y)
     ❖ SysOps v1.1 (>8y)
     ❖ BSc (2008-2011)
     ❖ DevOps (>4y)
     ❖ R&D [OpsDev?] (>1y)
  13. Agenda
     ❖ Cluster Stack
     ❖ Motivation (InfiniBand use-case)
     ❖ QNIB/ng
     ❖ QNIBTerminal (virtual cluster using docker)
     [Diagram labels: Cluster Stack, IB, QNIBng, QNIBTerminal; parts I., II., III.]
  14. Cluster Stack: Work Environment
  17. Cluster?
     "A computer cluster consists of a set of loosely connected or tightly connected computers that work together so that in many respects they can be viewed as a single system." - wikipedia.org
     [Diagram label: User]
  20. HPC-Cluster (High Performance Computing)
     ❖ HPC: Surfing the bottleneck
     ❖ Weakest link breaks performance
  35. Cluster Layers (rough estimate); each layer is a source of events and metrics
     Hardware:          IPMI, lm_sensors, IB counter
     Operating System:  Kernel, Userland tools
     MiddleWare:        MPI, ISV-libs
     Services:          Storage, Job Scheduler, sshd
     Software:          End user application
     Excel:             KPI, SLA
     [Audience labels on the diagram: End User, Mgmt, SysOps, PowerUser/ISV, SysOps Mgmt, ISV Mgmt, SysOps L1, SysOps L2, SysOps L3]
  36. Layer n
     ❖ Every layer is composed of layers
     ❖ How deep to go?
  41. Little Data w/o Connection
     ❖ Multiple data sources
     ❖ No way of connecting them
     ❖ Connecting is manual labour
     ❖ Experience driven
     ❖ Niche solutions misleading
  42. IB + QNIBng: Motivation
  47. Modular Switch
     ❖ Looks like one "switch"
     ❖ Composed of a network itself
     ❖ Which route is taken is transparent to the application
       ❖ LB1<>FB1<>LB4
       ❖ LB1<>FB2<>LB4
       ❖ LB1 ->FB1 ->LB4 / LB1 <-FB2 <-LB4
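The route choice on the Modular Switch slide can be illustrated with a small sketch. It assumes a two-tier modular switch in which every line board (LB) is wired to every fabric board (FB). The board names come from the slide; the function name and the full-mesh wiring are assumptions made for illustration only.

```python
from itertools import product

def internal_routes(src_lb, dst_lb, fabric_boards):
    """Enumerate LB -> FB -> LB paths, assuming every line board
    is wired to every fabric board (an assumption, not from the slide)."""
    return [(src_lb, fb, dst_lb) for fb in fabric_boards]

routes = internal_routes("LB1", "LB4", ["FB1", "FB2"])
for path in routes:
    print(" -> ".join(path))

# Forward and return traffic may cross different fabric boards,
# as in the slide's last example (LB1 ->FB1 ->LB4 / LB1 <-FB2 <-LB4).
asymmetric = [
    (fwd, back)
    for fwd, back in product(routes, repeat=2)
    if fwd[1] != back[1]
]
print(len(asymmetric), "asymmetric forward/return combinations")
```

Even with only two fabric boards, a single node pair already has several possible internal paths, none of which is visible to the application.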
  52. Debug-Nightmare
     ❖ Job seems to fail due to bad internal link
     ❖ 96 port switch
     ❖ multiple autonomous job-cells
     ❖ Relevant information
       ❖ Job status (Resource Scheduler)
       ❖ Routes (IB Subnet Manager)
       ❖ IB Counter (Command Line)
     ❖ changing one plug, recomputes routes :)
  53. Poster (Communication Networks): "IBPM: An Open-Source-Based Framework for InfiniBand Performance Monitoring"
     Michael Hoefling (1), Michael Menth (1), Christian Kniep (2), Marcus Camen (2)
     Poster sections: Background: InfiniBand (IB); Rate Measurement in IB Networks; IBPM: Demo Overview
     Background: InfiniBand (IB)
       ❖ State-of-the-art communication technology for interconnection in high-performance computing data centers
       ❖ Point-to-point bidirectional links
       ❖ High throughput (40 Gbit/s with QDR)
       ❖ Low latency
       ❖ Dynamic on-line network reconfiguration
     Idea
       ❖ Extract raw network information from IB network
       ❖ Analyze output
       ❖ Derive statistics about performance of the network
     Topology Extraction
       ❖ Subnet discovery using ibnetdiscover
       ❖ Produces human readable file of network topology
       ❖ Process output to produce graphical representation of the network
     Remote Counter Readout
       ❖ Each port has its own set of performance counters
       ❖ Counters measure, e.g., transferred data, congestion, errors, link state changes
     ibsim-Based Network Simulation
       ❖ ibsim simulates an IB network
       ❖ Simple topology changes possible (GUI)
       ❖ ibsim limitations
         ❖ No performance simulation possible
         ❖ No data rate changes possible
     Real IB Network
       ❖ Physical network
       ❖ Allows performance measurements
       ❖ GUI controlled traffic scenarios
     (in cooperation with …)
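The poster's idea of deriving rates from per-port counters can be sketched as follows. This is a minimal illustration, not the IBPM implementation: it assumes the transmit-data counter counts 32-bit words (as the InfiniBand PortXmitData counter is commonly described), and the function name and sample readouts are hypothetical.

```python
def data_rate_gbit(count_start, count_end, interval_s, bytes_per_count=4):
    """Average data rate in Gbit/s between two counter readouts.

    bytes_per_count=4 assumes the counter counts 32-bit words;
    adjust it if the counter in use has a different unit.
    Returns None if the counter went backwards (e.g. it was reset).
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = count_end - count_start
    if delta < 0:
        return None
    return delta * bytes_per_count * 8 / interval_s / 1e9

# Hypothetical readouts of one port, taken 10 s apart.
rate = data_rate_gbit(1_000_000_000, 11_000_000_000, 10.0)
print(f"{rate:.1f} Gbit/s")
```

A counter that decreases between readouts is reported as `None` rather than as a negative rate, since a reset in between makes the interval unusable.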
  62. OpenSM
     ❖ OpenSM Performance Manager
       ❖ Sends token to all ports
       ❖ All ports reply with metrics
       ❖ Callback triggered for every reply
     ❖ osmeventplugin
       ❖ Dumps info to file
     [Diagram: OpenSM with PerfMgmt, connected to switches (Sw) and nodes]
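The poll-and-callback flow on the OpenSM slide can be mimicked with a toy model. The class and function names below are hypothetical and this is not the OpenSM plugin API (which is written in C); the sketch only shows the pattern of one callback invocation per port reply, with a plugin that dumps each reply as a line of text.

```python
import io

class PerfManagerSketch:
    """Toy stand-in for the polling loop; not the OpenSM API."""

    def __init__(self, ports):
        self.ports = ports          # port name -> callable returning counters
        self.callbacks = []

    def register(self, callback):
        self.callbacks.append(callback)

    def sweep(self):
        # Query every port; each reply triggers every registered callback.
        for name, read_counters in self.ports.items():
            reply = read_counters()
            for callback in self.callbacks:
                callback(name, reply)

def make_file_dumper(stream):
    """Plugin-style callback that dumps each reply as one text line."""
    def dump(port, counters):
        fields = " ".join(f"{k}={v}" for k, v in sorted(counters.items()))
        stream.write(f"{port} {fields}\n")
    return dump

out = io.StringIO()  # stands in for the dump file
mgr = PerfManagerSketch({
    "sw1/p1": lambda: {"xmit_data": 120, "rcv_data": 80},
    "node1/p1": lambda: {"xmit_data": 80, "rcv_data": 120},
})
mgr.register(make_file_dumper(out))
mgr.sweep()
print(out.getvalue(), end="")
```

Swapping the file dumper for a callback that forwards to another backend is the kind of change the following slide describes for qnib and qnibng.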
  68. OpenSM
     ❖ qnib
       ❖ sends metrics to RRDtool
       ❖ events to PostgreSQL
     ❖ qnibng
       ❖ sends metrics to graphite
       ❖ events to logstash
     [Diagram: OpenSM PerfMgmt feeding qnib, later replaced by qnibng]
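What "metrics to graphite, events to logstash" might look like on the wire can be sketched as below. Graphite's Carbon daemon accepts a plaintext line of the form `<path> <value> <timestamp>`; the metric path, the event field names and the helper names are assumptions for illustration and are not taken from qnibng.

```python
import json
import socket
import time

def graphite_line(path, value, timestamp=None):
    """One line of Graphite's plaintext protocol: '<path> <value> <timestamp>'."""
    ts = int(time.time() if timestamp is None else timestamp)
    return f"{path} {value} {ts}\n"

def event_json(message, **fields):
    """A JSON document for a log pipeline; the field names are illustrative."""
    return json.dumps({"message": message, **fields}, sort_keys=True)

def send_to_carbon(lines, host="localhost", port=2003):
    """Send plaintext lines to a Carbon listener (2003 is its default port)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))

# Formatting only; send_to_carbon() is defined but not called here.
line = graphite_line("ib.sw1.port1.xmit_data", 120, timestamp=1397119200)
event = event_json("port is down", switch="sw1", port=1)
print(line, end="")
print(event)
```

Keeping metrics as numeric time series and events as structured documents is what later allows the two to be overlaid, as on the "Graphite Events" slide.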
  69. Graphite Events: port is up/down
  72. QNIBTerminal: Proof of Concept
  73. Cluster Stack Mock-Up
     ❖ IB events and metrics are not enough
     ❖ How to get real-world behavior?
     ❖ Wanted:
       ❖ Slurm (Resource Scheduler)
       ❖ MPI enabled compute nodes
       ❖ As much additional cluster stack as possible (Graphite, elasticsearch/logstash/kibana, Icinga, Cluster-FS, …)
  74. Classical Virtualization
     ❖ Big overhead for simple node
     ❖ Resources provisioned in advance
     ❖ Host resources allocated
  76. LXC (docker)
     ❖ minimal overhead (a couple of MB)
     ❖ no resource pinning
     ❖ cgroups option
     ❖ highly automatable
     NOW: Watch OSDC2014 talk 'Docker' by Tobias Schwab
  82. Virtual Cluster Nodes
     ❖ Master Node (etcd, DNS, slurmctld)
     ❖ monitoring (graphite + statsd)
     ❖ log mgmt (ELK)
     ❖ compute nodes (slurmd)
     ❖ alarming (Icinga) [not integrated]
     [Diagram: host running the containers master, monitoring, logmgmt, compute0, compute1, computeN]
  83. Master Node
     ❖ takes care of inventory (etcd)
     ❖ provides DNS (+PTR)
     ❖ Integrate Rudder, ansible, chef, …?
  84. Non-Master Nodes (in general)
     ❖ are started with master as DNS
     ❖ mounting /scratch, /chome (sits on SSDs)
     ❖ supervisord kicks in and starts services and setup-scripts
     ❖ sending metrics to graphite
     ❖ logs to logstash
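Starting a non-master node "with master as DNS" and the shared mounts could look roughly like the sketch below. It only builds the argument list for `docker run` using the standard `--dns`, `-v`, `--name` and `--hostname` options; the image name, IP address and host-side paths are placeholders and do not come from the talk.

```python
import shlex

def docker_run_args(name, image, dns_ip, volumes):
    """Build a 'docker run' argument list for one cluster node container."""
    args = ["docker", "run", "-d", "--name", name, "--hostname", name,
            "--dns", dns_ip]
    for host_dir, container_dir in volumes:
        args += ["-v", f"{host_dir}:{container_dir}"]
    args.append(image)
    return args

# Image name, IP address and host paths are placeholders, not from the talk.
args = docker_run_args(
    name="compute0",
    image="example/compute",
    dns_ip="172.17.0.2",
    volumes=[("/srv/scratch", "/scratch"), ("/srv/chome", "/chome")],
)
print(shlex.join(args))
```

Passing the list to `subprocess.run(args, check=True)` would start the container; building the list separately keeps the sketch runnable without a Docker daemon.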
  91. docker-compute
     ❖ slurmd
     ❖ sshd
     ❖ logstash-forwarder
     ❖ openmpi
     ❖ qperf
  92. docker-graphite (monitoring)
     ❖ full graphite stack + statsd
     ❖ stresses IO (<3 SSDs)
     ❖ needs more care (optimize IO)
  93. docker-elk (Log Mgmt)
     ❖ elasticsearch, logstash, kibana
     ❖ inputs: syslog, lumberjack
     ❖ filters: none
     ❖ outputs: elasticsearch
  94. It’s alive!
  96. Start Compute Node
  101. Check Slurm Config
  104. Run MPI-Job
  105. TCP benchmark
  106. QNIBTerminal: Future Work
  108. docker-icinga
     ❖ Icinga to provide
       ❖ state-of-the-cluster overview
       ❖ bundle with graphite/elk
       ❖ no big deal…
     ❖ Is this going to scale?
  109. docker-(GlusterFS,Lustre)
     ❖ Cluster scratch to integrate with
     ❖ Use of kernel-modules freezes attempt
     ❖ Might be pushed in VirtualBox (vagrant)
  113. Humans!
     ❖ How do SysOps/DevOps/Mgmt
       ❖ react to the changes?
       ❖ adopt them?
       ❖ fear them?
  121. Big Data!
     ❖ Truckload of
       ❖ Events
       ❖ Metrics
       ❖ Interaction
     Metrics keyed by node:
       node01.system.memory.usage 9
       node13.system.memory.usage 14
       node35.system.memory.usage 12
       node95.system.memory.usage 11
       target=sumSeries(node{01,13,35,95}.system.memory.usage)
     Metrics keyed by job and node:
       job1.node01.system.memory.usage 9
       job1.node13.system.memory.usage 14
       job1.node35.system.memory.usage 12
       job1.node95.system.memory.usage 11
       target=sumSeries(job1.*.system.memory.usage)
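The benefit of prefixing metric paths with the job can be shown with a small local emulation. The sketch below is not Graphite itself: real `sumSeries` sums series point by point over time, while this version sums one value per series. The helper names and the second job are hypothetical.

```python
from fnmatch import fnmatchcase

samples = {
    "job1.node01.system.memory.usage": 9,
    "job1.node13.system.memory.usage": 14,
    "job1.node35.system.memory.usage": 12,
    "job1.node95.system.memory.usage": 11,
    "job2.node02.system.memory.usage": 7,   # hypothetical second job
}

def matches(pattern, path):
    """Match dot-separated paths segment by segment ('*' stays within a segment)."""
    p_parts, m_parts = pattern.split("."), path.split(".")
    return len(p_parts) == len(m_parts) and all(
        fnmatchcase(m, p) for p, m in zip(p_parts, m_parts)
    )

def sum_series(pattern, data):
    """Sum one value for every series whose path matches the pattern."""
    return sum(v for path, v in data.items() if matches(pattern, path))

def node_target(nodes, suffix="system.memory.usage"):
    """Render target that has to list every node of the job explicitly."""
    return "target=sumSeries(node{" + ",".join(nodes) + "}." + suffix + ")"

print(node_target(["01", "13", "35", "95"]))
print(sum_series("job1.*.system.memory.usage", samples))
```

Without the job prefix, the query has to be rebuilt from the scheduler's node list for every job; with it, a single wildcard selects exactly the nodes that ran the job.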
  122. pipework / mininet
     ❖ Currently all containers are bound to docker0 bridge
     ❖ Creating topology with virtual/real switches would be nice
     ❖ First iteration might use pipework
     ❖ More complete one should use vSwitches (mininet?)
  123. Dockerfiles
     ❖ Only 3 images are fd20 based
  124. Questions?
     ❖ Pictures
       ❖ p2: http://de.wikipedia.org/wiki/Datei:Audi_logo.svg
             http://commons.wikimedia.org/wiki/File:Daimler_AG.svg
             http://ffb.uni-lueneburg.de/20JahreFFB/
       ❖ p4: https://www.flickr.com/photos/adeneko/4229090961
       ❖ p6: cae t100
             https://www.flickr.com/photos/losalamosnatlab/7422429706
       ❖ p8: http://www.brendangregg.com/Slides/SCaLE_Linux_Performance2013.pdf
       ❖ p9: https://www.flickr.com/photos/riafoge/6796129047
       ❖ p10: https://www.flickr.com/photos/119364768@N03/12928685224/
       ❖ p11: http://www.mellanox.com/page/products_dyn?product_family=74
       ❖ p23: https://www.flickr.com/photos/jaxport/3077543062
       ❖ p25/26: https://blog.trifork.com/2013/08/08/next-step-in-virtualization-docker-lightweight-containers/
       ❖ p33: https://www.flickr.com/photos/fkehren/5139094564
       ❖ p39: https://www.flickr.com/photos/brizzlebornandbred/12852909293