OpenNebula at the Leibniz Supercomputing Centre
Dr. Matteo Lanati (matteo.lanati@lrz.de)
22nd October 2015
Outline
● Introduction
● OpenNebula in the LRZ Compute Cloud
● Problems and customisation
● Where we want to go
● Personal experience
Introduction
● 2004: MSc in Electronic Engineering – Univ. of Pavia
● 2007: PhD in Telecommunications – Univ. of Pavia
● 2008 – 2011: EUCENTRE (European Centre for Training and Research in Earthquake Engineering)
  – DORII (Deployment of Remote Instrumentation Infrastructure)
  – PROETEX (Advanced e-textiles for firefighters and civilian victims)
● 2011 – present: LRZ, Distributed Resources Group
Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities
● Scope:
  – Munich
  – Bavaria
  – Germany
  – Europe
  – Worldwide
● Provision of traditional IT services
● High-performance systems
  – SuperMUC petascale system: 6.8 Pflop/s
The LRZ Compute Cloud
What happens when users ask for:
● a particular OS (e.g., Scientific Linux for LCG)
● specific software (already packaged/available as a VM)
● to install/run a daemon
● Hadoop/Cassandra ...
● computational resources with a strict time constraint
https://www.lrz.de/services/compute/cloud_en/
LRZ Compute Cloud: OpenNebula setup
[Architecture diagram: a frontend (GUI, scheduling, deployment, monitoring) manages worker nodes 1–80 via SSH commands and monitoring probes; a shared datastore provides system stores 1–10.]
OpenNebula networking: the problem
Isolate VMs of different groups
● Protection from faulty/malicious configurations
Resilient management of the IP subnet(s)
● Serve as many groups/users as needed
● All in the same subnet
● Use the cloud middleware to assign MAC addresses
● Use a DHCP server for the rest
OpenNebula networking: the approach
[Diagram: VMs 1–6, owned by different groups, all connected to the MWN (Münchner Wissenschaftsnetz) and the public Internet.]
Groups are orthogonal: no cross-talk allowed
But routing to/from the Internet is still possible
OpenNebula networking: the approach
Private VLAN approach
Embed the numerical group ID in the MAC address
● 02:00:00:0A:00:00
Filter ingress traffic
● 02:00:00:0A:xx:xx/FF:FF:FF:FF:00:00 => 02:00:00:0A:00:00
Result: flat IP subnet, isolation at layer 2
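As an illustration, MAC addresses in this layout can be generated mechanically. A minimal Python sketch, assuming one byte for the group ID and two bytes for the per-VM index; the function is ours, not part of OpenNebula:

    # Illustrative sketch: build locally administered MACs of the form
    # 02:00:00:GG:XX:XX, embedding a numeric group ID as shown above.
    def group_mac(group_id: int, vm_index: int) -> str:
        if not (0 <= group_id <= 0xFF and 0 <= vm_index <= 0xFFFF):
            raise ValueError("group_id fits in 1 byte, vm_index in 2 bytes")
        raw = (0x020000 << 24) | (group_id << 16) | vm_index
        return ":".join(f"{(raw >> s) & 0xFF:02X}" for s in range(40, -8, -8))

    print(group_mac(0x0A, 1))  # 02:00:00:0A:00:01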
OpenNebula networking: the actual picture
[Diagram: VMs 1–6 distributed over worker nodes WN 1 and WN 2; on each node the VMs attach to an OVS bridge, and the bridges connect to the MWN and the public Internet.]
OVS = Open vSwitch (http://openvswitch.org/)
Fair resource usage
Capacity
● Resource limits: cores, RAM, disk space …
Time
● Introduce the concept of a budget, in CPU-hours
  – GUI integration
  – Blend into the life cycle of the VM
● Cron jobs and oneacct to monitor consumption (a sketch follows below)
● ACLs to stop usage without a budget
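A hedged sketch of what the monitoring cron job could look like, driven by oneacct. The -u/-x flags and the XML element names (HISTORY, STIME, ETIME, VM/TEMPLATE/CPU) are assumptions to verify against your OpenNebula version:

    # Hedged sketch: estimate a user's consumed CPU-hours from `oneacct`
    # XML output. Flags and element names are assumptions -- check them
    # against your OpenNebula version.
    import subprocess
    import time
    import xml.etree.ElementTree as ET

    def cpu_hours(user: str) -> float:
        xml = subprocess.check_output(["oneacct", "-u", user, "-x"])
        total = 0.0
        for rec in ET.fromstring(xml).iter("HISTORY"):
            stime = int(rec.findtext("STIME", "0"))
            etime = int(rec.findtext("ETIME", "0")) or int(time.time())  # 0 = still running
            cpu = float(rec.findtext("VM/TEMPLATE/CPU", "0"))
            total += cpu * max(0, etime - stime) / 3600.0
        return total

    print(f"consumed: {cpu_hours('alice'):.1f} CPU-hours")

A cron job can compare this figure against the group's granted budget and adjust the ACLs once it is exceeded.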
EC2 Interface and EIP
● No possibility to perform NAT
● A pool of public IPs is available
● Modify the econe-server so that:
  – associating an EIP attaches a new interface to the VM
  – compatibility is preserved
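From the client side everything stays plain EC2. A minimal sketch using the classic boto 2 API against a hypothetical econe endpoint (host, port, and credentials are placeholders; with the customisation above, associating the address attaches a new interface):

    # Hedged sketch: allocate and associate an Elastic IP through
    # OpenNebula's econe-server via boto 2. Endpoint and credentials are
    # placeholders, not the real LRZ setup.
    import boto

    conn = boto.connect_ec2(
        aws_access_key_id="username",
        aws_secret_access_key="password-hash",  # econe authenticates with a hashed password
        is_secure=False,
        host="econe.example.lrz.de",  # hypothetical endpoint
        port=4567,
        path="/",
    )

    addr = conn.allocate_address()  # take an EIP from the public pool
    conn.associate_address(instance_id="i-00000042", public_ip=addr.public_ip)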
Next steps
● Redesign the budgeting (for billing)
  – introduce a cost function rather than plain CPU-hours (sketched below)
● Monitor VMs for security flaws
● Introduce new scheduling policies
  – opportunistic scheduling
  – resource contribution
● IPv6 support
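Purely as an illustration, such a cost function could be a weighted sum of the resources a VM holds per hour; the weights below are made-up tuning knobs, not LRZ rates:

    # Illustrative sketch of a resource cost function replacing plain
    # CPU-hours; the weights are invented tuning knobs.
    WEIGHTS = {"cpu": 1.0, "ram_gb": 0.25, "disk_gb": 0.01}

    def cost_per_hour(cpu: float, ram_gb: float, disk_gb: float) -> float:
        """Weighted cost of holding the given resources for one hour."""
        usage = {"cpu": cpu, "ram_gb": ram_gb, "disk_gb": disk_gb}
        return sum(WEIGHTS[k] * v for k, v in usage.items())

    # A 4-core, 8 GB RAM, 40 GB disk VM held for 24 hours:
    print(cost_per_hour(4, 8, 40) * 24)  # 153.6 cost units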
Personal experience
● What our bosses thought we were going to do
● Documentation is fundamental
● Customisation to avoid errors
Thank you for your attention
SuperMUC
Phase 1 (2012)
● Westmere/Sandy Bridge
● > 155,000 cores
● > 3.0 Pflop/s total peak performance
Phase 2 (2015)
● Haswell
● > 86,000 cores
● > 3.5 Pflop/s total peak performance
https://www.lrz.de/services/compute/supermuc/
OpenNebula networking: the actual picture
[Diagram: a VM with MAC 02:00:00:0A:00:01 on WN 1 sends a packet to a VM with MAC 02:00:00:0A:00:03 on WN 2; both sit behind OVS bridges connected to the MWN.]
When the OVS bridge receives the packet:
● It masks the source MAC address to get the group ID:
  02:00:00:0A:00:01 & FF:FF:FF:FF:00:00 = 02:00:00:0A:00:00
● It compares this with the group ID of the destination MAC, 02:00:00:0A:00:03
● Match: the packet goes through!
OpenNebula networking: the actual picture
[Diagram: a VM with MAC 02:00:00:0B:00:01 sends a packet towards 02:00:00:0A:00:01, i.e. across group boundaries.]
When the OVS bridge receives the packet:
● It masks the source MAC address to get the group ID:
  02:00:00:0B:00:01 & FF:FF:FF:FF:00:00 = 02:00:00:0B:00:00
● It compares this with the group ID of the destination MAC, 02:00:00:0A:00:01
● No match: the packet is dropped!
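The check both walkthroughs perform boils down to a single masked comparison. A minimal sketch; in production the enforcement lives in Open vSwitch flow rules, not in Python:

    # Illustrative sketch of the layer-2 filter described above: two MACs
    # may talk only if they agree under the FF:FF:FF:FF:00:00 mask.
    GROUP_MASK = 0xFFFFFFFF0000

    def mac_to_int(mac: str) -> int:
        return int(mac.replace(":", ""), 16)

    def same_group(src: str, dst: str) -> bool:
        return (mac_to_int(src) & GROUP_MASK) == (mac_to_int(dst) & GROUP_MASK)

    print(same_group("02:00:00:0A:00:01", "02:00:00:0A:00:03"))  # True: forward
    print(same_group("02:00:00:0B:00:01", "02:00:00:0A:00:01"))  # False: drop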
OpenNebula networking: the actual picture
[Diagram: at VM deployment the frontend configures the OVS bridge on the worker node and writes the VM's DHCP entry into an LDAP backend shared by the DHCP servers.]
OpenNebula networking: the actual picture
[Diagram: the DHCP servers retrieve each VM's network setup from the LDAP backend.]
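A hedged sketch of the "Configure DHCP entry" step using python-ldap, assuming the ISC DHCP LDAP schema (dhcpHost with dhcpHWAddress and dhcpStatements); the server URI, DN layout, and credentials are placeholders:

    # Hedged sketch: register a VM's MAC-to-IP mapping in the LDAP backend
    # so the DHCP servers can serve it. Schema, URI, DN, and credentials
    # are assumptions, not LRZ's actual configuration.
    import ldap
    import ldap.modlist

    def add_dhcp_entry(hostname: str, mac: str, ip: str) -> None:
        conn = ldap.initialize("ldap://ldap.example.lrz.de")
        conn.simple_bind_s("cn=admin,dc=example,dc=de", "secret")
        entry = {
            "objectClass": [b"top", b"dhcpHost"],
            "cn": [hostname.encode()],
            "dhcpHWAddress": [f"ethernet {mac}".encode()],
            "dhcpStatements": [f"fixed-address {ip}".encode()],
        }
        dn = f"cn={hostname},cn=dhcp,dc=example,dc=de"
        conn.add_s(dn, ldap.modlist.addModlist(entry))
        conn.unbind_s()

    add_dhcp_entry("vm-42", "02:00:00:0A:00:01", "10.155.0.42")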
Use case: QIIME (Quantitative Insights Into Microbial Ecology)
Goal: understand the bacterial composition of the gut microbiota in human and mouse samples
[Figure: area chart of the bacterial composition of mouse gut samples.]
Credits: Dr. Debora Garzetti (garzetti@mvp.uni-muenchen.de) - Max von Pettenkofer-Institute, LMU
Use case: Malware Zoo
Catalyst for computer security research @ TUM
Machine-learning analysis on > 100,000 samples
Credits: George Webster (webstergd@sec.in.tum.de), Alexander Luedtke (alex@sec.in.tum.de)
LRZ Compute Cloud: the numbers so far