This document summarizes a presentation on supporting data-intensive applications. It discusses the Janet end-to-end performance initiative, which engages with data-intensive research communities to help optimise network performance. Key points include:
- A growth in data-intensive science applications and remote-computation scenarios requiring high bandwidth.
- The importance of understanding researcher requirements and setting expectations about practical throughput limits.
- Using perfSONAR to measure network characteristics and identify performance issues between sites on the Janet network.
- Adopting the "Science DMZ" model of separating research and campus traffic to avoid bottlenecks and optimise data transfer performance.
4. Why is end-to-end performance important?
» Overall goal: help optimise a site's use of its Janet connectivity
» Seeing a growth in data-intensive science applications
› Includes established areas like GridPP (moving LHC data around)
› As well as new areas like cryo-electron microscopy
» Seeing an increasing number of remote computation scenarios
› e.g. scientific networked equipment with no local compute
› Might require 10Gbit/s to remote compute to return computation results on a 100GB data set to a researcher for timely visualisation
» Starting to see more 100Gbit/s connectivity requests
› Likely to have challenging data transfer requirements behind them
» As networking people, how do we help our researchers and their applications?
5. Speaking to your researchers
» Are you or your computing service department speaking to your researchers?
› If not, how do you understand their data-intensive requirements?
› If so, is this happening on a regular basis?
› Ideally you'd want to be able to plan ahead, rather than adapt on the fly
» Do you conduct networking 'future looks'?
› Any step changes might dwarf your site's organic growth
› This issue should have some attention at CIO or PVC Research level
» Do you know what your application elephants are?
› What's the breakdown of your site's network traffic?
› How are you monitoring network flows?
6. Researcher expectations
» How can we help set and manage researcher expectations?
» One aspect is helping them articulate their network requirements
› Volume of data to move in time X => data rate required (see the quick calculation below)
» It's also about understanding practical limitations
› You can determine the theoretical network throughput
› e.g. in principle, you can transfer 100TB over a 10Gbit/s link in a day
› But in practice, many factors may prevent this
» We should encourage researchers to speak to their computing service
» Computing services can in turn talk to the Janet Service Desk
› jisc.ac.uk/contact
» And noting here that cloud capabilities are becoming increasingly important
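As a quick sanity check on the arithmetic above, a minimal sketch with illustrative values only (100TB over 10Gbit/s):

```python
# Back-of-the-envelope arithmetic for "volume of data in time X => rate".
volume_bits = 100e12 * 8      # 100 TB expressed in bits
link_rate_bps = 10e9          # 10 Gbit/s link

hours_at_line_rate = volume_bits / link_rate_bps / 3600
print(f"{hours_at_line_rate:.1f} hours at full line rate")   # ~22.2 hours

deadline_s = 24 * 3600        # researcher wants the data within a day
required_gbps = volume_bits / deadline_s / 1e9
print(f"{required_gbps:.1f} Gbit/s sustained to meet the deadline")  # ~9.3
```

Note that even the theoretical best case needs over 90% of the link sustained for a day, which is why the practical limitations matter.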
7. Janet end-to-end performance initiative
» This is the context in which Jisc set up the Janet end-to-end performance initiative
» The goals of the initiative include:
› Engaging with existing data-intensive research communities and identifying emerging communities
› Creating dialogue between Jisc, computing service groups, and research communities
› Holding workshops, facilitating discussion on e-mail lists, etc.
› Helping researchers manage expectations
› Establishing and sharing best practice in identifying and rectifying causes of poor performance
» More information:
› jisc.ac.uk/rd/projects/janet-end-to-end-performance-initiative
8. Understanding the factors affecting E2E
» Achieving optimal end-to-end performance is a multi-faceted problem. It includes:
› Appropriate provisioning between the end sites
› Properties of the local campus network (at each end), including capacity of the Janet connectivity, internal LAN design, the performance of firewalls and the configuration of other devices on the path
› End system configuration and tuning: network stack buffer sizes, disk I/O, memory management, etc. (see the buffer-sizing sketch below)
› The choice of tools used to transfer data, and the underlying network protocols
» To optimise end-to-end performance, you need to address each aspect
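On the end-system tuning point: a common first step is sizing TCP buffers to the bandwidth-delay product (BDP) of the path. A minimal sketch with illustrative numbers; the sysctl names in the comment are the standard Linux ones, but check fasterdata.es.net for currently recommended values:

```python
# Bandwidth-delay product: the buffer a single TCP stream needs to keep
# a path "full". Illustrative values: a 10 Gbit/s path at 20 ms RTT.
rate_bps = 10e9
rtt_s = 0.020
bdp_bytes = rate_bps * rtt_s / 8
print(f"BDP = {bdp_bytes / 1e6:.1f} MB")   # 25.0 MB

# On Linux the relevant knobs are net.core.rmem_max / net.core.wmem_max
# and net.ipv4.tcp_rmem / net.ipv4.tcp_wmem; their maximums should be
# at least the BDP, or the window can never open wide enough.
```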
9. Janet network engineering
» From Jisc's perspective, it's important to ensure there is sufficient capacity in the network for its connected member sites
» We perform an ongoing review of network utilisation
› Provision the backbone network and regional links
› Provision external connectivity, to other NRENs and networks
› Observe the utilisation, model growth, predict ahead
› Step changes have a bigger impact at the local scale than on the backbone
» Janet has no differential queueing for regular IP traffic
› The Netpath service exists for dedicated / overlay links
› In general, Jisc plans regular network upgrades with a view to ensuring that there is sufficient latent capacity in the network
11. Major external links, October 2016
12. E2EPI site visits
» We (Duncan Rand and I) have visited a dozen or so sites
› Met with a variety of networking staff and researchers
› And spoken to many others via email
» Really interesting to hear what sites are doing
› Some good practice evident, especially in local network engineering
› e.g. routing those elephant flows around main campus firewalls
› Campus firewalls are often not designed for single high-throughput flows
» Seeing varying use of site links
› e.g. 10G for campus, 10G resilient, 20G for research (GridPP)
› Some sites using their 'resilient' link for bulk data transfers
» Some rate limiting of researcher traffic
› To avoid adverse impact on general campus traffic
› We'd encourage sites to talk to the JSD rather than rate limit
13. The Science DMZ model
» ESnet published the Science DMZ 'design pattern' in 2012/13
› es.net/assets/pubs_presos/sc13sciDMZ-final.pdf
» Three key elements:
› Network architecture: avoiding local bottlenecks
› Network performance measurement
› Data transfer node (DTN) design and configuration
» Also important to apply your security policy without impacting performance
» The NSF's Cyberinfrastructure (CC*) Program funded this model in over 100 US universities:
› See nsf.gov/funding/pgm_summ.jsp?pims_id=504748
» No current funding equivalent in the UK; it's down to individual campuses to fund changes to network architectures for data-intensive science
› But this can and should be part of your network architecture evolution
14. Good news – you're doing a lot already
» There are several examples of sites in the UK that have a form of Science DMZ deployment
» In many cases the deployments were made without knowledge of the Science DMZ model
» But Science DMZ is simply a set of good principles to follow, so it's not surprising that some Janet sites are already doing it
» Examples in the UK:
› Diamond Light Source (more from Alex next)
› JASMIN/CEDA Data Transfer Zone
› Imperial College GridPP; supports up to 40Gbit/s of IPv4/IPv6
» To realise the benefit, both end sites need to apply the principles
15. Examples of campus network engineering
» In principle you can just use your Janet IP service
» Where specific guarantees are required, the Netpath Plus service is available
› See jisc.ac.uk/netpath
› But then you'll not be able to exceed that capacity
» Some sites split their campus links, e.g. 10G campus, 10G science
› Again, be careful about using your 'resilient' link for bulk data
› It's better to speak to the JSD about a primary link upgrade
› And ensure appropriate resilience for that
» Some sites rate-limit research traffic
» Some examples of physical / virtual overlays
› The WLCG (for LHC data) deployed OPN (optical) and LHCONE (virtual) networks
› Not clear that one overlay per research community would scale
» At least one site is exploring Cisco ACI (their SDN solution)
16. Measuring network characteristics
» It's important to have telemetry on your network
» The Science DMZ model recommends perfSONAR for this
› More from Alex and Szymon later in this session
› Current version about to be 4.0 (as of April 17th, all being well)
» perfSONAR uses proven measurement tools under the hood
› e.g. iperf and owamp (see the sketch below)
» Can run between two perfSONAR systems, or build a mesh
» Collects telemetry over time: throughput, loss, latency, traffic path
» Helps you assess the impact of changes to your network or systems
› And to understand variance in characteristics over time
» It can highlight poor performance, but doesn't troubleshoot per se
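For a feel of the kind of underlying tool perfSONAR schedules, here is a minimal sketch that drives iperf3 directly and pulls the achieved throughput out of its JSON report. The host name is hypothetical; iperf3 must be installed locally and a server must be listening at the far end:

```python
import json
import subprocess

# Run a 10-second iperf3 TCP test against a (hypothetical) test host and
# report the receiver-side throughput, roughly what a perfSONAR
# throughput test records.
result = subprocess.run(
    ["iperf3", "-c", "ps-test.example.ac.uk", "-t", "10", "--json"],
    capture_output=True, text=True, check=True,
)
report = json.loads(result.stdout)
bps = report["end"]["sum_received"]["bits_per_second"]
print(f"Achieved throughput: {bps / 1e9:.2f} Gbit/s")
```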
18. Janet perfSONAR / DTN test node(s)
» We've installed a 10G perfSONAR node at a Janet PoP in London
› Lets you test your site's throughput to/from the Janet backbone
› Might be useful to you if you know you'll want to run some data-intensive applications in the near future but don't yet have perfSONAR at the far end, or if you just want to benchmark your site's connectivity
› Ask us if you're interested in using it
» We're planning to add a second perfSONAR test node in our Slough DC
› Also planning to install a 10G reference SSD-based DTN there
› See https://fasterdata.es.net/science-dmz/DTN/
› This will allow disk-to-disk tests, using a variety of transfer tools
» We can also run a perfSONAR mesh for you, using MaDDash on a VM
» We may also deploy an experimental perfSONAR node
› e.g. to evaluate the new Google TCP-BBR implementation
19. Small node perfSONAR
» In some cases, just an indicative perfSONAR test is useful
» i.e. run loss/latency tests as normal, but limit throughput tests to 1Gbit/s
» For this scenario, you can build a small node perfSONAR system for under £250
» Jisc took part in the GÉANT small node pilot project, using Gigabyte Brix:
› IPv4 and IPv6 test mesh at http://perfsonar-smallnodes.geant.org/maddash-webui/
» We now have a device build that we can offer to communities for testing
› Aim to make them as 'plug and play' as possible
› And a stepping stone to a full perfSONAR node
» Further information and TNC2016 meeting slide deck:
› https://lists.geant.org/sympa/d_read/perfsonar-smallnodes/
20. perfSONAR small node test mesh
21. Aside: Google's TCP-BBR
» Traditional TCP performs poorly with even a fraction of 1% packet loss
» Google have been developing a new TCP congestion-control algorithm
› TCP-BBR was open-sourced last year
› Requires sender-side deployment only
› Seeks high throughput with a small queue
› Good performance at up to 15% loss
› Google are using it in production today
» It would be good to explore this further (see the sketch below)
› Understand the impact on other TCP variants
› And when used for parallelised TCP applications like GridFTP
» See the presentation from the March 2017 IETF meeting:
› ietf.org/proceedings/98/slides/slides-98-iccrg-an-update-on-bbr-congestion-control-00.pdf
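Since BBR needs only sender-side changes, experimenting with it on a Linux sender is largely a configuration exercise. A minimal sketch, assuming a kernel (4.9 or later) with the tcp_bbr module available; the endpoint name is hypothetical:

```python
import socket

# Select BBR for one socket rather than system-wide (the system-wide
# equivalent is: sysctl -w net.ipv4.tcp_congestion_control=bbr).
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, b"bbr")
sock.connect(("dtn.example.ac.uk", 5201))  # hypothetical far-end DTN

# Confirm which algorithm the socket is actually using.
algo = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, 16)
print(algo.split(b"\0", 1)[0].decode())  # expect "bbr"
```

Per-socket selection like this is handy for A/B testing BBR against CUBIC on the same host without affecting other traffic.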
22. Building on Science DMZ?
» We should seek to establish good principles and practices at all campuses
› And the research organisations they work with, like Diamond
› There's already a good foundation at many GridPP sites
» The Janet backbone is heading towards 600G capacity and beyond
» We can seed further communities of good practice on this foundation
› e.g. the DiRAC HPC community, the SES consortium, …
» And grow a Research Data Transfer Zone (RDTZ) within and between campuses
› Build towards a UK RDTZ
› Inspired by the US Pacific Research Platform model of multi-site, multi-discipline research-driven collaboration built on NSF Science DMZ investment
» Many potential benefits, such as enabling new types of workflow
› e.g. streaming data to CPUs without the need to store locally
23. Transfer tools
» Your researchers are likely to find many available data transfer tools
» There are the simpler old friends like ftp and scp
› But these are likely to give a bad initial impression of what your network can do
» There are TCP-based tools designed to mitigate the impact of packet loss
› GridFTP typically uses four parallel TCP streams (the principle is sketched below)
» Globus Connect is free for non-profit research and education use
› See globus.org/globus-connect
» There are tools to support management of transfers
› FTS – see http://astro.dur.ac.uk/~dph0elh/documentation/transfer-data-to-ral-v1.4.pdf
» There's also a commercial UDP-based option, Aspera
› See asperasoft.com/
» It would be good to establish more benchmarking of these tools at Janet campuses
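To illustrate why parallel streams help (this is not GridFTP itself, just the principle): each stream recovers from a loss event independently, so a drop in one stream shrinks only that stream's window while the others keep the pipe busy. A minimal sketch using HTTP range requests over four concurrent connections; the URL and object size are hypothetical, and the server must support Range requests:

```python
import concurrent.futures
import urllib.request

URL = "http://dtn.example.ac.uk/dataset.bin"  # hypothetical source
SIZE = 400 * 1024 * 1024                      # assumed known object size
STREAMS = 4                                   # GridFTP's typical default

def fetch_range(start, end):
    """Fetch one byte range on its own TCP connection."""
    req = urllib.request.Request(URL, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        return start, resp.read()

def parallel_download():
    chunk = SIZE // STREAMS
    ranges = [(i * chunk, (SIZE - 1) if i == STREAMS - 1 else (i + 1) * chunk - 1)
              for i in range(STREAMS)]
    buf = bytearray(SIZE)
    # Loss on one connection stalls only that range; the rest continue.
    with concurrent.futures.ThreadPoolExecutor(max_workers=STREAMS) as pool:
        for start, data in pool.map(lambda r: fetch_range(*r), ranges):
            buf[start:start + len(data)] = data
    return bytes(buf)

if __name__ == "__main__":
    data = parallel_download()
    print(f"Fetched {len(data)} bytes over {STREAMS} streams")
```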
24. E2E performance to cloud compute
» We're seeing growing interest in the use of commercial cloud compute
› e.g. to provide remote CPU for scientific equipment
» Complements compute available at the new EPSRC Tier-2 HPC facilities
› epsrc.ac.uk/research/facilities/hpc/tier2/
» Anecdotal reports of 2-3Gbit/s into AWS
› e.g. by researchers at the Institute of Cancer Research
› See presentations at the RCUK Cloud Workshop – https://cloud.ac.uk/
» Bandwidth for AWS depends on the VM size
› See https://aws.amazon.com/ec2/instance-types/
» We're keen to explore cloud compute connectivity further
› Includes AWS, MS ExpressRoute, …
› And scaling Janet connectivity to these services as appropriate
25. Future plans for E2EPI
» Our future plans include:
› Writing up and disseminating best practice case studies
› Growing a UK RDTZ by promoting such best practices within communities
› Deploying a second 10G Janet perfSONAR test node at our Slough DC
› Deploying a 10G reference DTN at Slough and performing transfer tool benchmarking
› Promoting wider campus perfSONAR deployment and community meshes
› Integrating perfSONAR data with router link utilisation
› Developing our troubleshooting support further
› Experimenting with Google's TCP-BBR
› Exploring best practices in implementing security models for Science DMZ
› Expanding Science DMZ to include IPv6, SDN and other technologies
› Running a second community E2EPI workshop in October
› Holding a hands-on perfSONAR training event (after 4.0 is out)
30. Diamond – What is the Diamond Light Source?
» The Diamond machine is a type of particle accelerator
» CERN: high-energy particles smashed together, analyse the crash
» Diamond: exploits the light produced by high-energy particles undergoing acceleration
» Use this light to study matter – like a 'super microscope'
31. Diamond – What is the Diamond Light Source?
» In the Oxfordshire countryside, by the A34 near Didcot
» Diamond is a not-for-profit joint venture between STFC and the Wellcome Trust
» Cost of use: free access for scientists through a competitive scientific application process
» Over 7000 researchers from academia and industry have used our facility
32. The Machine
» Three particle accelerators:
› Linear accelerator
› Booster synchrotron
› Storage ring
– 48 straight sections angled together to make a ring
– 562m long
– could be called a 'tetracontakaioctagon'
34. The Science
» Bright beams of light from each bending magnet or wiggler are directed into laboratories known as 'beamlines'.
» We also have several cutting-edge electron microscopes.
» All of these experiments operate concurrently!
» Techniques in use across the beamlines:
› Spectroscopy
› Crystallography
› Tomography (think CAT scan)
› Infrared
› X-ray absorption
› X-ray scattering
36. Data Rates – Data-intensive research
» Central Lustre and GPFS filesystems for science data:
› 420TB, 900TB, 3.3PB (as of 2016)
» Typical X-ray camera: 4MB per frame, 100 frames per second (see the quick calculation below)
» An experiment can easily produce 300GB-1TB
» Scientists want to take their data home
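For scale, the camera figures above imply a sustained rate of several Gbit/s per instrument. A quick check using the slide's own numbers:

```python
# Data rate of a typical x-ray camera from the figures above.
frame_bytes = 4 * 1024**2   # 4 MB per frame
fps = 100                   # frames per second
rate_bps = frame_bytes * fps * 8
print(f"{rate_bps / 1e9:.1f} Gbit/s sustained")  # ~3.4 Gbit/s
```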
37. Data Rates – Data-intensive problems
» Each dataset might only be downloaded once
» I don't know where my users are
» Some scientists want to push data back to Diamond
39. Network Limits – The Problem
» Science data downloads from Diamond to visiting users' institutes were inconsistent and slow
› …even though the facility had a '10Gb/s' JANET connection from STFC
» The limit on download speeds was delaying post-experiment analysis at users' home institutes
40. Characterising the problem
» Our target: 'a stable 50Mb/s over a 10ms path'
› Using our site's shared JANET connection
› In real terms, 10ms was approximately DLS to Oxford
41. Baseline findings with iperf
» Inside our network: 10Gb/s, no packet loss
» Over the STFC/JANET segment between Diamond's edge and the Physics Department at Oxford:
› low and unpredictable speeds
› a small amount of packet loss
42. Packet Loss
» TCP performance is predicted by the Mathis equation (worked through below):
Throughput ≤ (MSS / RTT) × (C / √p)
» where MSS is the packet (segment) size, RTT is the latency (round-trip time), p is the packet loss probability, and C is a constant of order 1
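A quick sketch of the equation in use, solving for the loss rate a target throughput can tolerate. Note that the Mathis constant C varies between roughly 0.6 and 1.2 depending on ACKing and loss-model assumptions, which is why the exact ceiling depends on the form used; with C around 0.7 this reproduces the ~0.026% figure quoted on the next slide:

```python
# Mathis: throughput <= (MSS / RTT) * (C / sqrt(p)).
# Rearranged for the tolerable loss: p <= (MSS * C / (RTT * rate))**2
mss_bits = 1460 * 8      # typical Ethernet MSS, in bits
rtt = 0.010              # 10 ms path, as in Diamond's target
target = 50e6            # 50 Mbit/s target throughput
for C in (0.7, 1.0):     # two common choices of the constant
    p = (mss_bits * C / (rtt * target)) ** 2
    print(f"C={C}: tolerable loss ~ {p * 100:.3f}%")  # ~0.027% and ~0.055%
```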
44. Packet Loss
» According to Mathis, to achieve our initial goal over a 10ms path, the tolerable packet loss is:
› 0.026% (maximum)
45. Finding the problem in the last mile
» We worked with STFC to connect a perfSONAR server directly to the main Harwell campus border router, to look for loss in the 'last mile'
47. Security – Aside: Security without Firewalls
» Data-intensive science traffic interacts poorly with enterprise firewalls
» Does this mean we can use the Science DMZ idea to just… ignore security?
› No!
48. Security – Science DMZ as a Security Architecture
» Implementing a Science DMZ means segmenting your network traffic
› Apply specific controls to data transfer hosts
› Avoid unnecessary controls
49. Security – Techniques to secure Science DMZ hosts
» Only run the services you need
» Protect each host:
› Router ACLs
› On-host firewall – Linux iptables is performant
50. Globus GridFTP
» We adopted Globus GridFTP as our recommended transfer method
› It uses parallel TCP streams
› It retries and resumes automatically
› It has a simple, web-based interface
52. Diamond Science DMZ – speed records!
» Test data: 2Gb/s+ consistently between Diamond and the ESnet test point at Brookhaven Labs, New York State, USA
» Actual transfers:
› Biggest: electron microscope data from DLS to Imperial – 1120GB at 290Mb/s
› Fastest: crystallography dataset from DLS to Newcastle – 260GB at 480Mb/s
53. Bad bits
» Globus' logs
› Netflow
» Globus install
» My own use of perfSONAR
55. Near Future
» 10Gb+ performance
› Globus cluster for multiple 10Gb links
› 40Gb single links
56. Thank you!
» Use real-world testing
» Never believe your vendor
» Zero packet loss is crucial
» Enterprise firewalls introduce packet loss

Alex White
Diamond Light Source
alex.white@diamond.ac.uk
58. Essentials of Modern Performance Monitoring with perfSONAR
Szymon Trocha, PSNC / GÉANT, szymon.trocha@psnc.pl
11 April 2017
59. Motivations
» Identify problems when they happen, or (better) before they do
» The tools must be available (at campus endpoints, at demarcations between networks, at exchange points, and near data resources such as storage and computing elements)
» Access to testing resources
60. Problem statement
» The global Research & Education network ecosystem comprises hundreds of international, national, regional and local-scale networks
» While these networks all interconnect, each network is owned and operated by a separate organization (a 'domain') with different policies, customers, funding models, hardware, bandwidth and configurations
» This complex, heterogeneous set of networks must operate seamlessly from 'end to end' to support science and research collaborations that are distributed globally
61. Where are the (multi-domain) problems?
[Diagram: a path from a source campus to a destination campus, highlighting congested or faulty links between domains, congested intra-campus links, and latency-dependent problems inside domains with small RTT]
62. Challenges
» Delivering end-to-end performance
› Get the user, service delivery teams, local campus and metro/backbone network operators working together effectively
– Have tools in place
– Know your (network) expectations
– Be aware of network troubleshooting
63. What is perfSONAR?
» It's infeasible to perform at-scale data movement all the time – as we see in other forms of science, we need to rely on simulations
» perfSONAR is a tool to:
› Set network performance expectations
› Find network problems ('soft failures')
› Help fix these problems
› All in multi-domain environments
» These problems are all harder when multiple networks are involved
» perfSONAR provides a standard way to publish monitoring data
» This data is interesting to network researchers as well as network operators
64. The Toolkit
» Network performance comes down to a few key metrics:
› Throughput (e.g. 'how much can I get out of the network')
› Latency (time it takes to get to/from a destination)
› Packet loss/duplication/ordering (for some sampling of packets, do they all make it to the other side without serious abnormalities?)
› Network utilization (the 'opposite' of throughput for a moment in time)
» We can get many of these from a selection of measurement tools – the perfSONAR Toolkit
» The perfSONAR Toolkit is an open source implementation and packaging of the perfSONAR measurement infrastructure and protocols
» All components are available as RPMs, DEBs, and bundled as a CentOS ISO
» Very easy to install and configure (usually less than 30 minutes for a default install)
65. Importance of Regular Testing
» We can't wait for users to report problems and then fix them
» Things just break sometimes
› Bad system or network tuning
› Failing optics
› Somebody messed around in a patch panel and kinked a fiber
› Hardware goes bad
» Problems that get fixed have a way of coming back
› System defaults come back after hardware/software upgrades
› New employees may not know why the previous employee set things up a certain way, and back out fixes
» It is important to continually collect, archive, and alert on active test results
66. perfSONAR Deployment Possibilities
» Dedicated server
› A single CPU with multiple cores (2.7 GHz for 10Gbps tests)
› 4GB RAM
› 1Gbps onboard NIC (management + delay)
› 10Gbps PCI-slot NIC (throughput)
» Small node – low-cost, small PC
› e.g. GIGABYTE BRIX GB-BACE-3150 (various models)
» Ad hoc testing
› Docker
68. Location criteria
» Where can it be integrated into the facility software/hardware management systems?
» Where can it do the most good for the network operators or users?
» Where can it do the most good for the community?
[Diagram: placement options at the network edge and next to services]
69. Distributed Deployment in a Few Steps
On the measurement hosts:
» Choose your home – connect to the network
» Install the Toolkit software – by the site administrator
» Configure the hosts – networking, 2 interfaces, visibility
» Point to the central host – to consume the central mesh configuration
On the central server:
» Install and configure – by the mesh administrator: central data storage, dashboard GUI, home for the mesh configuration
» Configure the mesh – who, what and when; e.g. bandwidth tests every 6 hours; be careful mixing 10G and 1G hosts
» Publish the mesh configuration – to be consumed by the measurement hosts
» Run the dashboard – observe thresholds, look for errors
70. New 4.0 release announcement – April 17th
» Introduction of pScheduler
› New software for measurements, replacing BWCTL
» GUI changes
› More information, better presentation
» OS support upgrade
› Debian 7, Ubuntu 14
› CentOS 7
– Still supporting CentOS 6
» Mesh config GUI
» MaDDash alerting
73. Thank you!
szymon.trocha@psnc.pl
• http://www.perfsonar.net/
• http://docs.perfsonar.net/
• http://www.perfsonar.net/about/getting-help/
• perfSONAR videos: https://learn.nsrc.org/perfsonar
A hands-on course in London is coming…
Speaker notes (Diamond talk)

Each dataset might only be downloaded once:
› Users might only want a small section of their data – over the course of an hour of data collection, perhaps only the last five minutes were useful
› It does not make sense to push every byte to a CDN
I don't know where my users are:
› Some organisations have federated agreements with partner institutions – dark fibre, or VPNs over the Internet. I, however, have users performing ad-hoc access from anywhere
› I can't block segments of the world – we have users from Russia, China etc.
Some scientists want to push data back to Diamond:
› Reprocessing of experiments with new techniques
› New facilities located outside Diamond using our software analysis stack
When I started at Diamond, this is how users moved large amounts of data offsite:
› A dedicated machine called a 'data dispenser'
› It's actually a member of the Lustre and GPFS parallel filesystems
› It speaks USB3
› It's an embarrassment
We started testing Diamond's own network using cheap servers with perfSONAR, to run ad-hoc tests. First we checked them back-to-back to confirm they could do 10Gb; then we tested deep inside the science network next to the storage servers, and at the edge of the Diamond network, where we link to STFC. Then we used perfSONAR's built-in community list to find a third server just up the road at the University of Oxford to test against.
Packet loss? Is that a problem? In my experience on the local area network it had never concerned me before, and this was the attitude of my predecessors too.
There are three major factors that affect TCP performance, and all three are interrelated. The 'Mathis' equation predicts the maximum transfer speed of a TCP link.
TCP has the concept of a 'window size', which describes the amount of data that can be in flight in a TCP connection between ACK packets. The sending system will send a TCP window's worth of data, and then wait for an acknowledgement from the receiver. At the start of a transfer, TCP begins with a small window size, which it ramps up as more and more data is successfully sent. You've heard of this before; this is just the bandwidth-delay product calculation for TCP performance over long fat networks.
However, BDP calculations for a long fat network assume a lossless path, and in the real world we don't have lossless paths. When a TCP stream encounters packet loss, it has to recover; to do this, TCP rapidly decreases the window size governing the amount of data in flight. This sounds obvious, but the effect of spurious packet loss on TCP is profound: all else being equal, the time for a TCP connection to recover from loss goes up as the latency goes up.
The packet loss between Diamond and Oxford was too small to have previously concerned network engineers, but was large enough to disrupt high-speed transfers over distances greater than a few miles.
So, once we had an idea what the problem was, instead of just complaining at my upstream supplier I could take them actual fault data. Phil at STFC was gracious in working with us to connect up another perfSONAR server. Testing with this server let us run traffic just through the firewall, which showed us that the packet loss problem was with the firewall, and not with the JANET connection between us and Oxford.
'Science DMZ' (an idea from ESnet) is otherwise known as 'moving your data transfer server to the edge of your network'. Your JANET connection has high latency but, hopefully, no loss. If your local LAN has some packet loss, you might still see reasonable speed over it because the latency on the LAN is so small. The Science DMZ model puts your application server between the long- and short-latency parts, bridging them; essentially, you're splitting the latency domains.
This section is cribbed from an utterly excellent presentation by Kate Petersen Mace from ESnet. Science DMZ is about reducing degrees of freedom and reducing the number of network devices in the critical path. Removing redundant devices, methods of entry and so on is a positive enhancement to security.
The typical corporate LAN is a wild west filled with devices that you didn't do the engineering on yourself:
› Network printers
› Web browsers
› Financial databases
› In my case, modern oscilloscopes running Windows 95…
› That cool touch-screen display kiosk in reception
› Windows devices
In comparison to the corporate LAN, the Science DMZ is a calm, relaxed place. You know precisely what services you're moving to the data transfer hosts. You're typically only exposing one service to the Internet – FTP, for example, or Aspera. You leave the enterprise firewall to do its job – preventing access to myriad unknown ports – and you use a more appropriate tool to secure each side. Linux iptables will happily do port ACLs on the host at 10Gb with no real hit on the CPU.
With parallel TCP streams, if a packet gets dropped in one stream, the other streams carry on the transfer while the affected one recovers.
This is how we implemented the Science DMZ at Diamond. You can see the data transfer node, connected directly to the campus site front-door routers. A key question is how you get your data to your data transfer node:
› Move data to it on a hard drive
› Route your data through your lossy firewall front door – it's low latency, so Mathis says this should still be quick
We went a step further and put in a private, non-routable fibre link to a server in the datacentre. The internal server has an important second function – authentication of access to data. I didn't want to put an AD server in the Science DMZ if I didn't need to, so this internal server runs the authentication service for Globus Online from inside our corporate network.
I tend to use perfSONAR just for ad-hoc testing. It's fabulous for this, and I can direct my users to it for self-service testing of their network connections. I know it is very useful for periodic network monitoring, but I don't take advantage of that yet.
The common implementation of scp has a fixed TCP window size; it will never grow big enough to fill a fast link.
Aspera is a commercial, UDP-based system. I've never used it; I've heard people from various corners of science talk about it, but I've never been asked for it. I'm sure it would do well on the Science DMZ if we ever need it.
Open question: does Globus Online's default stream splitting work per file, or in chunks?