perfSONAR: getting telemetry on your network

Duncan Rand, Jisc and Imperial College London

WLCG/GridPP as an example community
19 Oct 2016
» The Worldwide Large Hadron Collider Computing Grid
(WLCG) is a global collaboration of more than 170 computing
centres in 42 countries
» Its mission is to provide global computing resources to store,
distribute and analyse the ~30 petabytes of data generated
per year by the LHC experiments
» GridPP is a collaboration providing data-intensive distributed
computing resources for the UK HEP community and the UK
contribution to the WLCG
» Hierarchically arranged with four tiers:
› Tier-0 at CERN (and Wigner in Hungary)
› 13 Tier-1s (mainly national physics laboratories)
› 149 Tier-2s (generally university physics laboratories)
› Tier-3s

» Initial modelling of LHC computing requirements suggested a
hierarchical tier-based data management and transfer model
» Data exported from Tier-0 at CERN to each Tier-1 and then on
to Tier-2s
» However, better-than-expected network bandwidth means
that the LHC experiments have been able to relax this
hierarchy
» Data is now transferred in an all-to-all mesh configuration
» Data often transferred across multiple domains
› e.g. a CMS transfer to Imperial College London might come
predominantly from Fermilab near Chicago along with
other CMS sites
» So a good network is crucial to the operation of the WLCG, and
that means good monitoring

perfSONAR
» Network monitoring tool developed by ESnet, GÉANT,
Indiana University and Internet2
» ‘perfSONAR is a widely-deployed test and measurement
infrastructure that is used by science networks and
facilities around the world to monitor and ensure
network performance.’
» ‘perfSONAR’s purpose is to aid in network diagnosis by
allowing users to characterize and isolate problems. It
provides measurements of network performance metrics
over time as well as “on-demand” tests’
» http://www.perfsonar.net/about/what-is-perfsonar/

Worldwide perfSONAR host locations

[Figure: throughput and reverse throughput plots]

Durham University GridPP site
Replaced perfSONAR host motherboard

Lancaster University GridPP site
“a number of major tweaks to our network configuration”

Oxford University GridPP site
Reconfiguration of site core network

MaDDash visualisation dashboard
» With large meshes it is difficult to check all hosts
» Centralised dashboards really help visualise overall
performance
» MaDDash (Monitoring and Debugging Dashboard)
displays meshes of perfSONAR hosts
» Many examples of MaDDash dashboards, e.g. ICNRG,
WLCG
» WLCG dashboard has two aspects:
› Open Monitoring Distribution (Nagios monitoring)
› MaDDash
» http://psmad.grid.iu.edu/maddash-webui/

perfSONAR configuration interface
» A perfSONAR host can participate in multiple meshes
» Configuration interface and auto-URL enable dynamic
configuration of the entire network
McKee et al., CHEP2015

» Adding and removing hosts from the mesh configuration
is very simple
» Makes use of a WLCG database of hosts
» Version of the GUI developed by OSG to be included in the
perfSONAR toolkit
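
To illustrate why a centrally managed mesh is convenient, here is a minimal sketch that treats a mesh as a set of hosts and enumerates the directed test pairs of a full (all-to-all) mesh. It is not the perfSONAR configuration format; the `mesh_pairs` helper and the host names are hypothetical.

```python
from itertools import permutations


def mesh_pairs(hosts):
    """Return every ordered (source, destination) pair for a full mesh."""
    return list(permutations(sorted(set(hosts)), 2))


# Hypothetical host names, for illustration only.
uk_mesh = {"ps.site-a.example", "ps.site-b.example", "ps.site-c.example"}
print(len(mesh_pairs(uk_mesh)))      # number of directed tests for 3 hosts

uk_mesh.add("ps.site-d.example")     # adding a host is a one-line change
print(len(mesh_pairs(uk_mesh)))      # number of directed tests for 4 hosts
```

The number of directed tests grows as n(n-1), which is one reason a shared, automatically distributed configuration scales better than editing each host by hand.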

Initial WLCG meshes based around countries, e.g. UK/GridPP

Dual-stack perfSONAR measurements
» IPv6 rollout is slow but steady
» Assumption (hope) that future campus upgrades will
include provision of IPv6
» perfSONAR supports IPv4 and IPv6 measurements
» perfSONAR hosts can be left to default to IPv6 where it
exists, but it is then not always clear which protocol is in use
» Otherwise IPv6 can be forced with the "ipv6_only": "1" parameter
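
As a rough sketch of how the parameter might be applied, the snippet below adds `"ipv6_only": "1"` to a dictionary of test parameters and prints it as JSON. Only the `ipv6_only` key and its value come from the slide; the `force_ipv6` helper and the other parameter shown are hypothetical, and the real configuration layout should be checked against the perfSONAR documentation.

```python
import json


def force_ipv6(parameters):
    """Return a copy of the test parameters with IPv6 forced on."""
    updated = dict(parameters)
    updated["ipv6_only"] = "1"   # the setting quoted on the slide
    return updated


# Hypothetical parameters; only "ipv6_only" comes from the slide.
throughput_test = {"description": "example throughput test"}
print(json.dumps(force_ipv6(throughput_test), indent=2))
```

Returning a copy leaves the original parameters untouched, so the same base definition can be reused for a default (unforced) test.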

WLCG/HEPiX IPv6 Working Groups
» The WLCG has an ongoing effort to promote the
adoption of IPv6
» Aim to allow sites to offer IPv6-only
computing resources to the WLCG by April 2017
» HEPiX/WLCG IPv6 working groups are looking into issues
» Developed a mesh to track the roll-out of IPv6-capable
perfSONAR hosts within WLCG
» Currently twenty-one WLCG perfSONAR dual-stack
nodes are in the mesh

Dual-stack bandwidth measurements

Dual-stack traceroute

Oxford Oct 2015
IPv4 ~ 5 Gbps
IPv6 ~ 0.5 Gbps

Oxford Sept 2016
IPv4 ~ 1.3 Gbps
IPv6 ~ 1.3 Gbps
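
The two Oxford snapshots (October 2015 and September 2016) can be compared as an IPv6-to-IPv4 throughput ratio. The sketch below uses only the approximate figures quoted on the slides; the dictionary layout and the `ipv6_to_ipv4_ratio` helper are illustrative.

```python
# Approximate throughput in Gbps, as quoted on the two Oxford slides.
oxford = {
    "Oct 2015": {"ipv4": 5.0, "ipv6": 0.5},
    "Sept 2016": {"ipv4": 1.3, "ipv6": 1.3},
}


def ipv6_to_ipv4_ratio(sample):
    """Ratio of IPv6 to IPv4 throughput for one measurement snapshot."""
    return sample["ipv6"] / sample["ipv4"]


for label, sample in oxford.items():
    print(f"{label}: IPv6/IPv4 = {ipv6_to_ipv4_ratio(sample):.2f}")
```

On these figures IPv6 ran at roughly a tenth of the IPv4 rate in 2015 and at parity in 2016.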

Small perfSONAR node projects
» Data Transfer Zones need well-specified, dedicated
hardware to run perfSONAR hosts
» Requires some investment of time and money
» Would be nice to have an easier way to get an idea of
network performance
» GÉANT have developed a small perfSONAR node using
Gigabyte Brix devices costing about £150-200 each
» Using these in a short, time-limited small perfSONAR
node project
» IPv6 included from the start

Small perfSONAR node projects
» Jisc would like to take this project forward
» Will probably use the existing image
» Send out small perfSONAR nodes to users who wish to
get a rapid and easy idea of their network performance
» For example, a scientist in a UK institute with slow
downloads of data sets from e.g. Diamond or JASMIN
» Also plan to produce a UK mesh into which these small
nodes could be added more or less temporarily
» Training course on how to set up such a mesh being run
by GÉANT in Zurich on 4th November 2016
› https://eventr.geant.org/events/2496

Improving diagnostics: Pundit
» A large mesh such as those in use by WLCG contains a lot
of useful data
» Should be possible to use network tomography to, for
example, identify problematic routers by correlating
traceroute and performance data
» PUNDIT project in the US is aimed at this
» Additional executable installed on the perfSONAR host
» More details: http://pundit.gatech.edu and
https://indico.cern.ch/event/505613/contributions/2227428/
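
The idea of correlating traceroute and performance data can be illustrated with a deliberately simple heuristic: mark a path as bad when its throughput falls below a threshold, then rank routers by the fraction of paths through them that are bad. This is a toy sketch and not the PUNDIT algorithm; the router names, paths, throughput values and threshold are all hypothetical.

```python
from collections import defaultdict


def rank_suspect_routers(paths, throughput_gbps, threshold_gbps):
    """Rank routers by the fraction of their paths that perform badly."""
    total = defaultdict(int)
    bad = defaultdict(int)
    for pair, hops in paths.items():
        is_bad = throughput_gbps[pair] < threshold_gbps
        for router in set(hops):
            total[router] += 1
            bad[router] += is_bad
    scores = {router: bad[router] / total[router] for router in total}
    # Highest bad fraction first; ties broken by bad-path count, then name.
    return sorted(
        scores.items(),
        key=lambda item: (-item[1], -bad[item[0]], item[0]),
    )


# Hypothetical traceroute hops and throughput results per (source, destination).
paths = {
    ("A", "B"): ["r1", "r2", "r3"],
    ("A", "C"): ["r1", "r4"],
    ("B", "C"): ["r3", "r2", "r4"],
    ("C", "B"): ["r4", "r5", "r3"],
}
throughput_gbps = {
    ("A", "B"): 0.4,
    ("A", "C"): 4.8,
    ("B", "C"): 0.5,
    ("C", "B"): 4.9,
}

for router, score in rank_suspect_routers(paths, throughput_gbps, 1.0):
    print(f"{router}: {score:.2f}")
```

In this made-up data every path through `r2` is slow, so it ranks first. A larger mesh supplies more overlapping paths and therefore more evidence for separating a faulty router from its neighbours.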

Summary
» perfSONAR is a valuable resource for characterising and
diagnosing network performance
» Bandwidth nodes typically record throughput and
traceroute data; latency nodes record latency and loss
» Network administrators should consider installing several
at pertinent places, e.g. at the border, next to storage etc.
» Meshes together with MaDDash dashboards allow
relatively easy monitoring of groups of hosts
» Future perfSONAR meshes should include IPv6
» Development work is ongoing to improve the automatic
notification and diagnosis of network faults

» Thank you
» Duncan.Rand@jisc.ac.uk
