From Jisc's campus network engineering for data-intensive science workshop on 19 October 2016.
https://www.jisc.ac.uk/events/campus-network-engineering-for-data-intensive-science-workshop-19-oct-2016
Enabling efficient movement of data into & out of a high-performance analysis... - by Jisc
From Jisc's campus network engineering for data-intensive science workshop on 19 October 2016.
https://www.jisc.ac.uk/events/campus-network-engineering-for-data-intensive-science-workshop-19-oct-2016
This document discusses the concept of a Science DMZ, which consists of three key components: 1) a dedicated "friction-free" network path with high-performance networking devices located near the site perimeter to facilitate science data transfer, 2) dedicated high-performance data transfer nodes optimized for data transfer tools, and 3) a performance measurement/test node. It contrasts this approach with the typical ad-hoc deployment of a data transfer node wherever space allows, which often fails to provide necessary performance. Details of an example Science DMZ deployment at Lawrence Berkeley National Laboratory are provided.
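As a rough illustration of the kind of check a site might run once such a deployment is in place (a sketch, not part of the original document: the hostname is hypothetical, and it assumes iperf3 is installed with a server already running on the remote DTN), a throughput spot-check between data transfer nodes could look like this:

```python
# Spot-check the "friction-free" path between two data transfer nodes (DTNs).
# Assumes `iperf3 -s` is running on the remote DTN; the hostname is made up.
import json
import subprocess

REMOTE_DTN = "dtn.example.ac.uk"  # hypothetical data transfer node

def measure_throughput_gbps(host: str, seconds: int = 10) -> float:
    """Run one iperf3 client test and return the received throughput in Gbit/s."""
    result = subprocess.run(
        ["iperf3", "-c", host, "-t", str(seconds), "-J"],  # -J emits JSON
        capture_output=True, text=True, check=True,
    )
    report = json.loads(result.stdout)
    return report["end"]["sum_received"]["bits_per_second"] / 1e9

if __name__ == "__main__":
    print(f"{REMOTE_DTN}: {measure_throughput_gbps(REMOTE_DTN):.2f} Gbit/s")
```

A regular test of this sort, alongside a dedicated perfSONAR measurement node, is what makes it possible to tell whether poor transfer rates come from the network path or from the end systems.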
Shared services - the future of HPC and big data facilities for UK research - by Martin Hamilton
Slides from Jisc panel session at HPC & Big Data 2016 with contributions from the Francis Crick Institute, QMUL and King's College London covering their use of the Jisc shared data centre and the eMedLab project
Goonhilly Earth Station played a key role in the development of the Internet. It was involved in the first demonstration of packet radio networking across the Atlantic in 1977. Goonhilly is now exploring ways to extend Internet connectivity to space, such as by developing disruption tolerant networking to enable an interplanetary Internet and supporting private lunar missions. Goonhilly also aims to diversify its business by offering commercial satellite services and partnering with universities on radio astronomy research.
Chair: Shirley Wood, training and support director, Jisc.
Welcome to Networkshop45
Speaker: Professor Edward Peck, Nottingham Trent University.
Janet update
Speakers:
Rolly Trice, deputy network operations director, Jisc
Steve Kennett, security director, Jisc
Machine learning for network security
Speaker: Miranda Mowbray.
In this presentation from the Dell booth at SC13, Joseph Antony from NCI describes how they are using HPC Virtualization to meet user needs.
Watch the video presentation: http://insidehpc.com/2013/12/05/panel-discussion-thought-hpc-virtualization-never-going-happen/
This document summarizes a presentation on supporting data intensive applications. It discusses the Janet end-to-end performance initiative which aims to engage with data intensive research communities to help optimize performance. Some key points include:
- A growing number of data-intensive science applications and remote computation scenarios require high bandwidth.
- The importance of understanding researcher requirements and setting expectations on practical throughput limits (a rough throughput estimate of this kind is sketched after this list).
- Using perfSONAR to measure network characteristics and identify performance issues between sites on the Janet network.
- Adopting the "Science DMZ" model of separating research and campus traffic to avoid bottlenecks and optimize data transfer performance.
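The expectation-setting point above can be made concrete with the Mathis et al. approximation for a single TCP stream. This is a back-of-envelope sketch with assumed example numbers, not a calculation from the presentation:

```python
# Approximate single-stream TCP throughput: rate ~ MSS / (RTT * sqrt(loss)).
# Even a tiny loss rate on a long-RTT path caps throughput well below link speed,
# which is the motivation for keeping the science path "friction-free".
import math

def tcp_throughput_gbps(mss_bytes: float, rtt_s: float, loss_rate: float) -> float:
    return (mss_bytes * 8) / (rtt_s * math.sqrt(loss_rate)) / 1e9

# Assumed example: 1460-byte segments, 20 ms round trip, 0.001% packet loss.
print(f"{tcp_throughput_gbps(1460, 0.020, 1e-5):.2f} Gbit/s")  # ~0.18 Gbit/s
```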
Repositories are systems mainly used to store and publish academic content. This presentation discusses why repository contents should be published as Linked (Open) Data and how repositories can be extended to do so.
The document provides updates on Edina National Data Centre services and projects. Key points include:
- Digimap services added new map styles, formats and MasterMap data. Go-Geo! saw increased usage and new content categories.
- Projects like AddressingHistory and CHALICE aim to link historical maps and directories to create open, linked data gazetteers. A mobile scoping study evaluated delivering Digimap via mobile.
- Other activities included work on the Scottish Spatial Data Infrastructure and the ESDIN best practices network for INSPIRE compliance. The OpenStream service provides access to OS OpenData.
This document summarizes the AddressingHistory project, which created an online crowdsourcing tool combining digitized historical Scottish Post Office Directories (PODs) with historical maps. The project had two phases: the first created the initial tool using three POD volumes from 1784-1805, 1865, and 1905-1906. The second phase expanded coverage to additional years and locations, improved parsing of names and occupations, and added new search and visualization features. Lessons learned included the need for ongoing refinement, sustainability planning, and engagement of relevant communities.
ARIADNE is an EU-funded project; this document provides an overview of the data lifecycle from initial project design and data creation through archiving and re-use. The stages include planning methods, recording data during fieldwork or laboratory work, documenting data to support future analysis and reuse, and depositing well-documented data in an archive. Proper documentation and metadata capture at each stage, from project start to archiving, ensures data can be understood, selected for long-term preservation, and discovered for new research uses over time. Reusing existing archived data supports new discoveries and data preservation.
CLARIAH Toogdag 2018: A distributed network of digital heritage information - by Enno Meijers
Slides of my keynote at the CLARIAH Toogdag 2018 on 9 March at the National Library of the Netherlands. The main topics were the development of the distributed digital heritage network and the alignment to and cooperation with the CLARIAH infrastructure and data. It also points at some of the current limitations of the semantic web technology.
The document provides information about the 2nd Global Summit and Expo on Multimedia & Applications conference taking place August 15-16, 2016 in London, UK. It includes details about the conference themes, sessions, speakers, venue, and registration deadlines. Over 200 participants are expected to attend presentations, workshops, and interactive sessions on topics related to multimedia technologies, signal processing, computer vision, and more.
1) Postgres and PostGIS have been used at EDINA for over 8 years to power major geospatial services like Digimap.
2) It is used for data storage, mapping, spatial indexing, querying, and data downloads (an illustrative spatial query is sketched after this list). Postgres allows EDINA to handle large amounts of geospatial data and large user bases.
3) EDINA finds Postgres reliable, performant, scalable, and standards-compliant with good support tools. It will continue being the core database for EDINA's geoservices.
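As a purely illustrative example of the kind of spatial query such a service runs (the connection string, table and column names below are invented rather than EDINA's actual schema; it assumes psycopg2 and a PostGIS-enabled database), a point-in-polygon lookup might look like this:

```python
# Find features whose polygon contains a given longitude/latitude point (WGS84).
# Table, columns and credentials are hypothetical; requires psycopg2 + PostGIS.
import psycopg2

conn = psycopg2.connect("dbname=geodata user=geouser")  # hypothetical connection
with conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT name
        FROM boundaries
        WHERE ST_Contains(geom, ST_SetSRID(ST_MakePoint(%s, %s), 4326))
        """,
        (-3.19, 55.95),  # a point in Edinburgh
    )
    for (name,) in cur.fetchall():
        print(name)
```

A GiST index on the geometry column is what keeps queries like this fast over the large datasets and user bases mentioned above.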
TrunkDB is the new cloud-based version of ORDs (Oxford Research Database Service), which was originally designed to provide database hosting and manipulation services for researchers. TrunkDB allows researchers to create multiple versions of databases, share data with colleagues, and access data securely from anywhere through an online interface. It aims to support researchers by treating their data, rather than just the database, as the primary object and allowing various ways of organizing, updating, and viewing data over time through a versioning system. TrunkDB is currently in private beta testing with plans to launch publicly in June.
PHIDIAS - Boosting the use of cloud services for marine data management, serv... - by Phidias
Description and scope of the Project
Phidias HPC is aimed at developing a consolidated and shared HPC and Data service by building on pre-existing and emerging infrastructure in order to create a federation of "user to infrastructure" services.
To achieve its purpose and to gain a comprehensive picture of the European infrastructure landscape, three data area tests will develop and provide new services to discover, manage and process spatial and environmental data produced by research communities tackling scientific challenges such as atmospheric, marine and earth observation issues.
Webinar: How to improve the cloud services for marine data
Observing the ocean is challenging: missions at sea are costly, different scales of processes interact, and the conditions are constantly changing, which is why scientists say that "a measurement not made today is lost forever". For these reasons, it is fundamental to properly store both the data and metadata, so that their access can be guaranteed for the widest community, in line with the FAIR principles: Findable, Accessible, Inter-operable and Reusable.
PHIDIAS HPC has organised a webinar entitled "PHIDIAS: Boosting the use of cloud services for marine management, services and processing" to be held on 4th June 2020 at 11 AM CEST. The webinar aims to introduce the Phidias HPC initiative, in collaboration with the Blue-Cloud project, to the European HPC and Research community, specifically in the Blue economy, to improve the use of (1) cloud services for marine data management, (2) data services to the user in a FAIR perspective, and (3) data processing on demand.
These objectives will be pursued in coherence with the development of the European Open Science Cloud (EOSC) and the Copernicus Data and Information Access Services (DIAS).
Nanopublications and Decentralized Publishing - by Tobias Kuhn
1) Current methods of publishing and sharing research results and data pose problems regarding verifiability, immutability, and permanence over time.
2) Nanopublications use cryptographic hashes to create "Trusty URIs" that make digital objects verifiable, immutable, and permanent by linking identifiers to content (the idea is illustrated in the sketch after this list).
3) A decentralized network of nanopublication servers allows for open, real-time publishing and retrieval of nanopublications without a central authority through propagation across nodes.
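The hash-based identifier idea can be illustrated in a few lines of Python. This is a deliberate simplification of the principle, not the actual Trusty URI algorithm or the nanopublication toolchain:

```python
# Embed a digest of the content in the identifier, so anyone holding the
# identifier can verify that the object they retrieved has not been altered.
import hashlib

BASE = "https://example.org/np/"  # hypothetical namespace

def make_identifier(content: bytes) -> str:
    return BASE + hashlib.sha256(content).hexdigest()

def verify(identifier: str, content: bytes) -> bool:
    return identifier.rsplit("/", 1)[-1] == hashlib.sha256(content).hexdigest()

nanopub = b"<nanopublication serialized as RDF>"
uri = make_identifier(nanopub)
assert verify(uri, nanopub)              # unchanged content verifies
assert not verify(uri, nanopub + b"!")   # any modification is detectable
```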
ESCAPE Kick-off meeting - HL-LHC ESFRI Landmark (Feb 2019) - by ESCAPE EU
The document discusses the European Organization for Nuclear Research (CERN) and its role in particle physics research. It provides background on CERN's founding, membership, budget, and scientific goals. It also summarizes CERN's Large Hadron Collider project and the Worldwide LHC Computing Grid consortium for data distribution and analysis. Finally, it discusses plans for the High-Luminosity LHC upgrade and associated computing challenges.
Dr. Frank Wuerthwein from the University of California at San Diego, presentation at the International Super Computing Conference on Big Data, 2013, US. Until recently, the large CERN experiments, ATLAS and CMS, owned and controlled the computing infrastructure they operated on in the US, and accessed data only when it was locally available on the hardware they operated. However, Würthwein explains, with data-taking rates set to increase dramatically by the end of LS1 in 2015, the current operational model is no longer viable to satisfy peak processing needs. Instead, he argues, large-scale processing centers need to be created dynamically to cope with spikes in demand. To this end, Würthwein and colleagues carried out a successful proof-of-concept study, in which the Gordon Supercomputer at the San Diego Supercomputer Center was dynamically and seamlessly integrated into the CMS production system to process a 125-terabyte data set.
Grid optical network service architecture for data intensive applications - by Tal Lavian Ph.D.
- An integrated software system provides the "glue".
- The dynamic optical network becomes a fundamental Grid service for data-intensive Grid applications: something to be scheduled, managed and coordinated to support collaborative operations.
- From super-computer to super-network: in the past, computer processors were the fastest part and peripherals were the bottleneck; in the future, optical networks will be the fastest part, with computers, processors, storage, visualization and instrumentation as the slower "peripherals".
- eScience cyberinfrastructure focuses on computation, storage, data, analysis and workflow.
- The network is vital for better eScience.
The European Open Science Cloud: just what is it? - by Carole Goble
Presented at Jisc and CNI leaders conference 2018, 2 July 2018, Oxford, UK (https://www.jisc.ac.uk/events/jisc-and-cni-leaders-conference-02-jul-2018). The European Open Science Cloud. What exactly is it? In principle it is conceived as a virtual environment with open and seamless services for storage, management, analysis and re-use of research data, across borders and scientific disciplines. How? By federating existing scientific data infrastructures, currently dispersed across disciplines and Member States. In practice, what it is depends on the stakeholder. To European Research Infrastructures it’s a coordinated mission to organise and exchange their data, metadata, software and services to be FAIR – Findable, Accessible, Interoperable, Reusable – and to use e-Infrastructures, either EU or commercial. To EU e-Infrastructures offering data storage and cloud services, it’s a funding mission to integrate their services, policies and organisational structures, and to be used by the Research Infrastructures. To agencies it’s a means to promote Open Science, standardisation, cross-disciplinary research and coordinated investment with a dream of a “one stop shop” for researchers. And for Libraries?
A First Attempt at Describing, Disseminating and Reusing Methodological Knowl... - by ariadnenetwork
Presentation by Cesar Gonzalez-Perez (Incipit) and Patricia Martín-Rodilla.
Spanish National Research Council (CSIC)
EAA 2013 in the 'New Digital Developments in Heritage Management and Research' session
Pilsen, Czech Republic
5 September 2013
Challenges and Issues of Next Cloud Computing Platforms - by Frederic Desprez
Cloud computing has now crossed the frontiers of research to reach industry. It is used every day, whether to exchange emails or make reservations on web sites. However, much research remains to be done to improve the performance and functionality of these platforms of tomorrow. In this talk, I will give an overview of some of the theoretical and applied research done at INRIA, particularly around cloud distribution, energy monitoring and management, massive data processing and exchange, and resource management.
Solving Network Throughput Problems at the Diamond Light Source - by Jisc
From Jisc's campus network engineering for data-intensive science workshop on 19 October 2016.
https://www.jisc.ac.uk/events/campus-network-engineering-for-data-intensive-science-workshop-19-oct-2016
Electron Microscopy Between OPIC, Oxford and eBIC - by Jisc
From Jisc's campus network engineering for data-intensive science workshop on 19 October 2016.
https://www.jisc.ac.uk/events/campus-network-engineering-for-data-intensive-science-workshop-19-oct-2016
BT Security provides protection for customers by monitoring for potential security incidents and threats. They review BTID operations proactively to prevent incidents from occurring. The talk discussed reactive monitoring, blocking IP addresses temporarily due to reallocation issues, and intelligence scanning to identify ways to improve security processes. BT Security recommends choosing strong, unique passwords and changing them regularly to help protect customer accounts and information.
Data and information governance: getting this right to support an information... - by Jisc
This document discusses establishing data and information governance to support an information security program. It outlines establishing frameworks for information security and data management with defined roles, policies, procedures and tools. This includes classifying data, establishing data management principles, oversight groups and governance bodies to define strategies, manage risks and ensure compliance. The goal is to understand and promote the value of data assets while protecting confidentiality, integrity and availability. It also describes applying these frameworks and changing roles and responsibilities to better manage information assets.
Cyber crime is increasing in sophistication, impact, and frequency, according to a presentation by Charlie McMurdie of PwC. A wide range of threat actors carry out attacks, including organized criminals, nation states, hackers, and insiders. Common motivations include financial gain, hacktivism, and espionage. High-profile breaches have stolen personal and payment details impacting millions. Companies face direct costs like investigation, indirect costs like loss of customers, and intangible costs like damage to brand. Cyber attacks are now conducted on an industrial scale by organized criminal networks. Recent news reports highlight teenage hackers operating underground forums and groups like Anonymous targeting financial institutions. McMurdie argues a network approach is needed to counter these threats.
The document discusses the role of the Chief Information Security Officer (CISO) at the University of Edinburgh. It outlines that the CISO was appointed to provide central leadership on information security risks across the university. The CISO's main responsibilities include leading the information security strategy, managing information security risks from internal and external threats, advising on security threats, and developing security policies and governance. Initial priorities for the CISO included recruiting a security team, focusing on users, overhauling risk governance, and supporting strategic projects. Keys to success are aligning with the university's digital transformation strategy, gaining buy-in from colleges, ensuring business areas own their risks, and providing supporting services through collaboration.
The document discusses cyber incident handling and reporting. It notes that 65% of large firms and 1 in 4 businesses experienced a cyber breach or attack in the past year. It outlines steps for businesses to take to prepare for and handle cyber incidents, including having an incident response plan, understanding network topology, and ensuring key points of contact. It provides details on where to report historic or ongoing cyber incidents and crimes. It also describes the Cyber Information Sharing Partnership (CiSP), a platform for sharing cyber threat information between government and industry.
Certifying and Securing a Trusted Environment for Health Informatics Research... - by Jisc
The document discusses the certification and securing of a trusted environment for health informatics research data at the University of Dundee. It provides an overview of the Health Informatics Centre, its research data management platform, safe haven architecture, and ISO27001 certification. The platform standardizes data extraction and release and adds metadata and quality checks. The safe haven uses pseudonymized data, and virtual environments prevent data from leaving. ISO27001 certification provides governance and reduces documentation through standardized information security practices.
Nick Moore discusses working with students at the University of Gloucestershire on ISO27001, an international information security standard. He proposes involving computing students who are now in the industry to provide a real-life scenario that builds links between students and staff while developing IT Services' defensive capabilities with a managed risk profile. The key is maintaining balance between business goals, student expectations, and quantified risks.
Closing plenary and keynote from Lauren Sager Weinstein - by Jisc
Host: Andy McGregor, deputy chief innovation officer, Jisc.
Keynote speaker: Lauren Sager Weinstein, chief data officer at Transport for London.
In our final plenary, we'll hear from Lauren Sager Weinstein.
We'll also be announcing the winners of our edtech start-up competition, as we bring Digifest to a close.
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility - by inside-BigData.com
In this deck from the Swiss HPC Conference, Mark Wilkinson presents: 40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility.
"DiRAC is the integrated supercomputing facility for theoretical modeling and HPC-based research in particle physics, and astrophysics, cosmology, and nuclear physics, all areas in which the UK is world-leading. DiRAC provides a variety of compute resources, matching machine architecture to the algorithm design and requirements of the research problems to be solved. As a single federated Facility, DiRAC allows more effective and efficient use of computing resources, supporting the delivery of the science programs across the STFC research communities. It provides a common training and consultation framework and, crucially, provides critical mass and a coordinating structure for both small- and large-scale cross-discipline science projects, the technical support needed to run and develop a distributed HPC service, and a pool of expertise to support knowledge transfer and industrial partnership projects. The on-going development and sharing of best-practice for the delivery of productive, national HPC services with DiRAC enables STFC researchers to produce world-leading science across the entire STFC science theory program."
Watch the video: https://wp.me/p3RLHQ-k94
Learn more: https://dirac.ac.uk/
and
http://hpcadvisorycouncil.com/events/2019/swiss-workshop/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Network Engineering for High Speed Data Sharing - by Globus
The document discusses modernizing network architecture to improve data sharing performance for science. It proposes separating portal logic from data handling by placing data on dedicated high-performance infrastructure in science DMZs. This allows data to be efficiently transferred between facilities while portals focus on search and access. The Petascale DTN project achieved over 50Gbps transfers between HPC sites using this model. Long-term, interconnected science DMZs could create a global high-performance network enabling efficient data movement for discovery.
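For a sense of scale, some quick arithmetic (an illustration; the dataset sizes are assumed examples, and only the 50 Gbps figure comes from the summary above) shows why sustained rates of that order matter:

```python
# How long a bulk transfer takes at a given sustained rate.
def transfer_time_days(dataset_bytes: float, rate_gbps: float) -> float:
    seconds = dataset_bytes * 8 / (rate_gbps * 1e9)
    return seconds / 86_400

print(f"{transfer_time_days(1e15, 50):.1f} days")  # 1 PB at 50 Gbit/s: ~1.9 days
print(f"{transfer_time_days(1e15, 10):.1f} days")  # 1 PB at 10 Gbit/s: ~9.3 days
```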
The University of Edinburgh is undergoing a large project to reprocure its campus networking infrastructure. The existing network, which has grown organically over many years, contains equipment that is up to 20 years old and no longer meets the university's needs. After an internal review in 2014 recommended a new network be procured, the university embarked on a multi-stage competitive dialogue procurement process that is still ongoing. The process involves pre-market engagement, shortlisting bidders, and multiple rounds of dialogue and evaluation to refine solutions before selecting a final vendor. The procurement has proven to be a large undertaking but may result in a network solution tailored to the university's unique requirements.
Opening Keynote Lecture
15th Annual ON*VECTOR International Photonics Workshop
Calit2’s Qualcomm Institute
University of California, San Diego
February 29, 2016
The document summarizes Dr. Larry Smarr's presentation on the Pacific Research Platform (PRP) and its role in working toward a national research platform. It describes how PRP has connected research teams and devices across multiple UC campuses for over 15 years. It also details PRP's innovations like Flash I/O Network Appliances (FIONAs) and use of Kubernetes to manage distributed resources. Finally, it outlines opportunities to further integrate PRP with the Open Science Grid and expand the platform internationally through partnerships.
The document discusses several Storage Area Network (SAN) configurations for different projects within EDC. It provides an overview of the goals, architecture, and experiences of SAN implementations for the CR1, Landsat, and LPDAAC projects. It also discusses some general realities and challenges of implementing and managing SANs.
The Pacific Research Platform: Building a Distributed Big Data Machine Learni... - by Larry Smarr
This document summarizes Dr. Larry Smarr's invited talk about the Pacific Research Platform (PRP) given at the San Diego Supercomputer Center in April 2019. The PRP is building a distributed big data machine learning supercomputer by connecting high-performance computing and data resources across multiple universities in California and beyond using high-speed networks. It provides researchers with petascale computing power, distributed storage, and tools like Kubernetes to enable collaborative data-intensive science across institutions.
Data Plane Evolution: Towards Openness and Flexibility - by APNIC
This document discusses data plane evolutions and future implementations. It summarizes a presentation on network virtualization overlays (NVO3) and encapsulation considerations. Programmable silicon that is field upgradable could simplify deployment of future encapsulations. The P4 programming language also aims to accelerate programmability and wider feature deployment in a target independent way. Overall, future data plane implementations require openness, flexibility, and careful consideration to avoid overly complex architectures.
Accelerating TensorFlow with RDMA for high-performance deep learning - by DataWorks Summit
Google’s TensorFlow is one of the most popular deep learning (DL) frameworks. In distributed TensorFlow, gradient updates are a critical step governing the total model training time. These updates incur a massive volume of data transfer over the network.
In this talk, we first present a thorough analysis of the communication patterns in distributed TensorFlow. Then we propose a unified way of achieving high performance through enhancing the gRPC runtime with Remote Direct Memory Access (RDMA) technology on InfiniBand and RoCE. Through our proposed RDMA-gRPC design, TensorFlow only needs to run over the gRPC channel and gets the optimal performance. Our design includes advanced features such as message pipelining, message coalescing, zero-copy transmission, etc. The performance evaluations show that our proposed design can significantly speed up gRPC throughput by up to 1.5x compared to the default gRPC design. By integrating our RDMA-gRPC with TensorFlow, we are able to achieve up to 35% performance improvement for TensorFlow training with CNN models.
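A rough back-of-envelope sketch of why gradient updates dominate network traffic in this setting (the model size, worker count and step rate below are assumptions for illustration, not figures from the talk):

```python
# Per training step, each worker pushes its gradients and pulls updated weights,
# so traffic grows with model size, worker count and step rate.
params = 25_000_000      # assumed model size (~25M parameters, ResNet-50-ish)
bytes_per_param = 4      # float32
workers = 8
steps_per_second = 5

bytes_per_step = params * bytes_per_param * workers * 2   # push + pull
gbits_per_second = bytes_per_step * steps_per_second * 8 / 1e9
print(f"~{gbits_per_second:.0f} Gbit/s of parameter traffic")  # ~64 Gbit/s
```

At such rates the cost of copying buffers through the kernel TCP stack becomes significant, which is what an RDMA transport underneath gRPC is intended to remove.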
Speakers
Dhabaleswar K (DK) Panda, Professor and University Distinguished Scholar, The Ohio State University
Xiaoyi Lu, Research Scientist, The Ohio State University
- James Blessing is the Deputy Director of Network Architecture at Future Services. He discussed Ciena's MCP network management software, the need for automation of network provisioning through APIs, and the JiscMail NETWORK-AUTOMATION mailing list as a resource.
- The document then covered topics like Netpath services, layer 2 and 3 VPNs, network function virtualization, IPv6 adoption, the Janet end-to-end performance initiative, science DMZ principles, network performance monitoring with perfSONAR, and working with the GÉANT project.
The SKA Project - The World's Largest Streaming Data Processor - by inside-BigData.com
In this presentation from the 2014 HPC Advisory Council Europe Conference, Paul Calleja from University of Cambridge presents: The SKA Project - The World's Largest Streaming Data Processor.
"The Square Kilometre Array Design Studies is an international effort to investigate and develop technologies which will enable us to build an enormous radio astronomy telescope with a million square meters of collecting area."
Watch the video presentation: http://wp.me/p3RLHQ-cot
Pacific Wave and PRP Update: Big News for Big Data - by Larry Smarr
The Pacific Research Platform (PRP) aims to create a "Big Data freeway system" across research institutions in the western United States and Pacific region by leveraging high-bandwidth optical fiber networks. The PRP connects multiple universities and national laboratories, providing bandwidth up to 100Gbps for data-intensive science applications. Initial testing of the PRP demonstrated disk-to-disk transfer speeds exceeding 5Gbps between many sites. The PRP will be expanded with SDN/SDX capabilities to enable even higher performance for large-scale datasets from fields like astronomy, genomics, and particle physics.
This document outlines a project to develop a low-cost robotic tape library system using open source technology. The system was created to provide a cost-effective data storage solution for the Square Kilometre Array radio telescope project. An open source based prototype was created that supports one tape drive, has over twice the storage capacity of a comparable commercial system, and costs around 70% less. Open source tape library systems are suitable for applications that involve infrequently accessed cold data stored for long periods, and can provide affordable long-term data storage for research institutes and archives.
On 29 January 2020 ARCHIVER launched its Request for Tender with the purpose to award several Framework Agreements and work orders for the provision of R&D for hybrid end-to-end archival and preservation services that meet the innovation challenges of European Research communities, in the context of the European Open Science Cloud.
The tender was closed on 28 April 2020 and 15 R&D bids were submitted, with consortia that included 43 companies and organisations. The best bids have been selected and will start the first phase of the ARCHIVER R&D (Solution Design) in June 2020.
On Monday 8 June the selected consortia for the ARCHIVER design phase were announced during a Public Award Ceremony starting at 14.00 CEST.
In light of the COVID-19 outbreak and the consequent movement restrictions imposed in several countries, the event was organised as a webinar, virtually hosted by Port d’Informació Científica (PIC), a member of the Buyers Group of the ARCHIVER consortium.
The Kick-off marks the beginning of the Solution Design Phase.
Presentation at Networkshop46.
FRµIT: Raspberry Pi clusters and other adventures in networking research - by Phil Basford, University of Southampton.
Programmable network infrastructure: what does it mean for the campus? - by Matthew Broadbent, University of Lancaster.
Enhancing Performance with Globus and the Science DMZ - by Globus
ESnet has led the way in helping national facilities—and many other institutions in the research community—configure Science DMZs and troubleshoot network issues to maximize data transfer performance. In this talk we will present a summary of approaches and tips for getting the most out of your network infrastructure using Globus Connect Server.
High Performance Cyberinfrastructure Enabling Data-Driven Science in the Biom... - by Larry Smarr
11.04.06
Joint Presentation
UCSD School of Medicine Research Council
Larry Smarr, Calit2 & Phil Papadopoulos, SDSC/Calit2
Title: High Performance Cyberinfrastructure Enabling Data-Driven Science in the Biomedical Sciences
Looking Back, Looking Forward: NSF CI Funding 1985-2025 - by Larry Smarr
This document provides an overview of the development of national research platforms (NRPs) from 1985 to the present, with a focus on the Pacific Research Platform (PRP). It describes the evolution of the PRP from early NSF-funded supercomputing centers to today's distributed cyberinfrastructure utilizing optical networking, containers, Kubernetes, and distributed storage. The PRP now connects over 15 universities across the US and internationally to enable data-intensive science and machine learning applications across multiple domains. Going forward, the document discusses plans to further integrate regional networks and partner with new NSF-funded initiatives to develop the next generation of NRPs through 2025.
Similar to Archiving data from Durham to RAL using the File Transfer Service (FTS)
The document announces a community launch event for digital storytelling in January 2024. It discusses using digital storytelling in higher education to support learning and teaching. Examples include using digital stories for formative assessment, reflective exercises, and research dissemination across various disciplines. Feedback from students and staff who participated in digital storytelling workshops was very positive; they found it transformative and said it helped give voice to their experiences. The document also profiles speakers who will discuss using digital stories to explore difficult concepts, hear the student voice, and facilitate staff reflections. It emphasizes that digital storytelling can introduce humanity and creativity into pedagogy and help develop core skills. Attendees will participate in a Miro activity to discuss benefits and applications.
This document summarizes a Jisc strategy forum that took place in Northern Ireland on December 14, 2023. It outlines Jisc's planned services and initiatives for 2023-2024, including expanding network access and launching new cybersecurity, analytics, and equipment services. It discusses feedback received from further and higher education members on how Jisc can better deliver solutions, empower communities, and provide vision/strategy. Activities at the forum focused on understanding members' needs/challenges and discussing how Jisc can better support key priorities in Northern Ireland, such as affordable infrastructure, digital skills, and cybersecurity for FE and efficiency, student experience, and collaboration for HE.
This document summarizes a Jisc Scotland strategy forum that took place on December 12, 2023. It outlines Jisc's planned solutions and services for 2023-2024 including deploying resilient Janet access, IT health checks, online surveys, SD-WAN services, and more. The document discusses how Jisc engages stakeholders through relationship management, research, communities, training and events. It summarizes feedback from further education and higher education members on how Jisc can improve advocacy by delivering the right solutions, empowering communities, and having a clear vision and strategy. Finally, it outlines activities for the forum, including understanding members' needs and priorities and discussing how Jisc supports national priorities in Scotland.
Jisc provided a strategic update to stakeholders. Key highlights included:
- Achievements from the last year like data collection and analysis following the HESA merger, digital transformation support, and cost savings from licensing deals.
- Customer testimonials from Bridgend College on extending eduroam and from the University of Northampton on curriculum design support from Jisc.
- Priorities for the coming year like connectivity upgrades, new cybersecurity services, and improved customer experience.
- A financial summary showing income sources like membership fees and expenditures on areas like connectivity and cybersecurity.
This document summarizes VirtualSpeech, a company that provides virtual reality (VR) and artificial intelligence (AI) powered professional development training. It offers over 150 online courses covering topics like public speaking, leadership, and sales. Users can practice skills in immersive VR scenarios and receive feedback from conversational AI. The training is used by over 450,000 individuals across 130 countries and 150 universities. VirtualSpeech aims to enhance traditional learning with interactive VR practice sessions and real-time feedback to boost skills retention.
Introduction of Cybersecurity with OSS at Code Europe 2024 - by Hiroshi SHIBATA
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
"Choosing proper type of scaling" - by Olena Syrota, Fwdays
Imagine an IoT processing system that is already quite mature and production-ready, whose client coverage is growing, and for which scaling and performance are life-and-death questions. The system has Redis, MongoDB, and stream processing based on ksqlDB. In this talk, we will first analyze scaling approaches and then select the proper ones for our system.
Dandelion Hashtable: beyond billion requests per second on a commodity server - by Antonios Katsarakis
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design (1) offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. In a commodity server and a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
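As a toy illustration of the closed-addressing, bounded-chaining layout described above (a Python sketch of the idea only; it captures none of DLHT's lock-free operation, software prefetching or non-blocking resizing):

```python
# Each bucket is a short chain of fixed-size nodes (standing in for cache lines).
# Lookups walk a bounded chain, and a delete frees its slot immediately.
NODE_SLOTS = 7  # entries per node, as if sized to fit one cache line

class Node:
    def __init__(self):
        self.items = {}    # up to NODE_SLOTS key/value pairs
        self.next = None   # next node in the bounded chain

class BoundedChainTable:
    def __init__(self, buckets: int = 1024):
        self.buckets = [Node() for _ in range(buckets)]

    def _bucket(self, key) -> Node:
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value) -> None:
        head = self._bucket(key)
        node = head
        while node is not None:          # update in place if the key exists
            if key in node.items:
                node.items[key] = value
                return
            node = node.next
        node = head                      # otherwise take the first free slot
        while len(node.items) >= NODE_SLOTS:
            if node.next is None:
                node.next = Node()       # extend the chain by one node
            node = node.next
        node.items[key] = value

    def get(self, key, default=None):
        node = self._bucket(key)
        while node is not None:
            if key in node.items:
                return node.items[key]
            node = node.next
        return default

    def delete(self, key) -> bool:
        node = self._bucket(key)
        while node is not None:
            if key in node.items:
                del node.items[key]      # the slot is reusable immediately
                return True
            node = node.next
        return False
```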
HCL Notes and Domino licence cost reduction in the world of DLAU - by panagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and licences under the CCB and CCX model have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and licence fees. You may be wondering how this new kind of licensing works and what benefit it brings you. Above all, you no doubt want to stay within budget and save costs wherever possible. We understand that, and we want to help!
We explain how to resolve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts to save money. There are also some practices that can lead to unnecessary spending, for example using a person document instead of a mail-in database for shared mailboxes. We show you such cases and their solutions. And of course we explain the new licensing model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It gives you the tools and know-how to keep an overview. You will be able to reduce your costs through an optimised Domino configuration and keep them low in the future.
Topics covered:
- Reducing licence costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licences really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Real-world examples and best practices you can apply immediately
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an... - by Jason Yip
The typical problem in product engineering is not bad strategy, so much as “no strategy”. This leads to confusion, lack of motivation, and incoherent action. The next time you look for a strategy and find an empty space, instead of waiting for it to be filled, I will show you how to fill it in yourself. If you’re wrong, it forces a correction. If you’re right, it helps create focus. I’ll share how I’ve approached this in the past, both what works and lessons for what didn’t work so well.
What is an RPA CoE? Session 1 – CoE VisionDianaGray10
In the first session, we will review the organization's vision and how it shapes the CoE structure.
Topics covered:
• The role of a steering committee
• How do the organization’s priorities determine CoE Structure?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect, Anika Systems
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/how-axelera-ai-uses-digital-compute-in-memory-to-deliver-fast-and-energy-efficient-computer-vision-a-presentation-from-axelera-ai/
Bram Verhoef, Head of Machine Learning at Axelera AI, presents the “How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-efficient Computer Vision” tutorial at the May 2024 Embedded Vision Summit.
As artificial intelligence inference transitions from cloud environments to edge locations, computer vision applications achieve heightened responsiveness, reliability and privacy. This migration, however, introduces the challenge of operating within the stringent confines of resource constraints typical at the edge, including small form factors, low energy budgets and diminished memory and computational capacities. Axelera AI addresses these challenges through an innovative approach of performing digital computations within memory itself. This technique facilitates the realization of high-performance, energy-efficient and cost-effective computer vision capabilities at the thin and thick edge, extending the frontier of what is achievable with current technologies.
In this presentation, Verhoef unveils his company’s pioneering chip technology and demonstrates its capacity to deliver exceptional frames-per-second performance across a range of standard computer vision networks typical of applications in security, surveillance and the industrial sector. This shows that advanced computer vision can be accessible and efficient, even at the very edge of our technological ecosystem.
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor IvaniukFwdays
In this talk we will discuss DDoS protection tools and best practices, network architectures, and what AWS has to offer. We will also look into one of the largest DDoS attacks on Ukrainian infrastructure, which happened in February 2022, and see what techniques helped to keep web resources available for Ukrainians and how AWS improved DDoS protection for all its customers based on the experience in Ukraine.
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
Archiving data from Durham to RAL using the File Transfer Service (FTS)
1. Lydia Heck, Campus network engineering workshop
19/10/2016
Archiving data from Durham to RAL using the File Transfer Service (FTS)
2. Archiving data from Durham to RAL using the File Transfer Service (FTS)
Lydia Heck
Institute for Computational Cosmology
Manager of the DiRAC-2/2.5 Data Centric Facility, COSMA
3. Introduction to DiRAC
• DiRAC – Distributed Research utilising Advanced Computing – established in 2009 with DiRAC-1
  • supports research in theoretical astronomy, particle physics and nuclear physics
  • funded by STFC, with infrastructure money allocated from the Department for Business, Innovation and Skills (BIS)
  • the running costs, such as staff costs and electricity, are funded by STFC
  • DiRAC is classed by STFC as a major research facility, on a par with the big telescopes
4. What is DiRAC?
• A national service run, managed and allocated by the scientists who do the science, funded by BIS and STFC
• The systems are built around and for the applications with which the science is done.
• We do not rival a facility like ARCHER, as we do not aspire to run a general national service.
5. What is DiRAC – cont’d
• For highlights of the science carried out on the DiRAC facility, please see http://www.dirac.ac.uk/science.html
• Specific example: large-scale structure calculations with the Eagle run
  • 4096 cores
  • ~8 GB RAM/core
  • 47 days = 4,620,288 CPU hours
  • 200 TB of data
6. The DiRAC computing systems
• Blue Gene – Edinburgh
• Cosmos – Cambridge
• Complexity – Leicester
• Data Centric – Durham
• Data Analytic – Cambridge
7. COSMA @ DiRAC (Data Centric)
• Durham Data Centric system – IBM iDataPlex
• 6720 Intel Sandy Bridge cores
• 53.8 TB of RAM
• FDR10 InfiniBand, 2:1 blocking
• 2.5 PByte of GPFS storage (2.2 PByte used!)
8. Resources of DiRAC
• Long projects, with a significant number of CPU hours allocated for 3 years, typically on a specific system, on one or more of the 5 available systems. Resources available:

System        | Cores                          | CPU hours | Storage                                       | Location
Bluegene      | 98,304 cores                   | 861 M     | 1 PB (GPFS)                                   | Edinburgh
Data Centric  | 6720 Xeon cores                | 59 M      | 2.5 PB (GPFS)                                 | Durham (DiRAC2)
Data Centric  | 8000 Xeon cores                | > 71 M    | 2.5 PB data (Lustre), 1.8 PB scratch (Lustre) | Durham (DiRAC2.5)
Complexity    | 4352 Xeon cores                | 38 M      | 0.8 PB (Panasas)                              | Leicester
Data Analytic | 4800 Xeon cores                | 42 M      | 0.75 PB (Lustre)                              | Cambridge
SMP           | 1784 Xeon cores, shared memory | 15.6 M    | 146 TB (EXT)                                  | Cambridge
9. Why do we need to copy data?
• During a project, and when it is completed, copy data to the researchers’ home institutions
  • requires additional storage resource at the home institutions
  • not enough provision – will require additional funds
• Make backup copies
  • if disaster struck, many CPU hours of calculations would be lost
• Copy data to other sites to leverage compute resources for post-processing
• Storage on the HPC facility runs out of capacity
  • data creation considerably above expectation?
10. Why do we copy data to RAL?
• Research data must now be available to interested parties for a specified period of time
• We could install DiRAC's own archive
  • requires funds, and there is (currently) no budget
• We needed to get started:
  • to gain experience
  • to get a valid backup
  • to remove data as the resources run out
  • to identify bottlenecks and technical challenges
• Jeremy Yates (Director of DiRAC) negotiated access to the RAL archiving systems
• Set up collaborations, make use of previous experience and pool resources
• AND: copy data!
11. Network connectivity of Durham University
• 2012 – upgrade to 4x1 Gbit to Janet
• Janet advised investigating optimal utilisation of the available bandwidth before applying for a further upgrade
• 2014 – upgrade to 6 Gbit to Janet
• Currently 8 Gbit to Janet; should be a full 10 Gbit by the end of the year – technical issues
12. Network bandwidth – situation for Durham
• 2014: measured throughput? [chart]
13. 2014: Measured limits? [chart]
14. September 2014 – measured limits [chart]
15. Making optimal use of available bandwidth
• Planning and investment to bypass the external campus firewall:
  • preparatory work started in October/November 2014: two new routers (~£80k), configured for throughput with minimal ACLs – enough to safeguard the site
  • deploying internal firewalls – part of the new security infrastructure anyhow, but essential for such a venture
  • security now relies on the front-end systems of Durham DiRAC and Durham GridPP
• IPPP was moved outside the firewall in April 2015, with a clear mandate to manage security for their installation.
• The DiRAC data transfer system was moved outside about one month later.
16. GridPP site firewall config for the endpoint node
[diagram: GridFTP traffic under four firewall configurations – port blocking, pass-through, monitored through the firewall, and bypassing the site firewall]
17. Result for DiRAC and GridPP in Durham
• Guaranteed 3 Gbit/sec in/out
• Consequences:
  • pushed the network performance of Durham GridPP from the bottom 3 in the country to the top 5 of the UK GridPP sites
  • they now experience different bottlenecks, but ones that are under their control
  • DiRAC data transfers achieve up to 300–400 MByte/sec throughput to RAL when archiving, depending on file sizes
  • faster data sharing with other collaboration sites
  • recently (October 2016) offered the service to Earth Sciences, achieving 70–80 MByte/sec from a site in Switzerland
18. Collaboration between DiRAC and GridPP/RAL
• The Durham Institute for Computational Cosmology (ICC) volunteered to be the prototype installation
• Huge thanks to Jens Jensen and Brian Davies – there were many emails exchanged, many questions asked and many answers given.
• Resulting document: “Setting up a system for data archiving using FTS3” by Lydia Heck, Jens Jensen and Brian Davies
  https://www.cosma.dur.ac.uk/documentation
19. Setting up the archiving tools
• Identify appropriate hardware – could mean extra expense:
  • need the freedom to modify and experiment with it
  • cannot have HPC users logged in and working when you need to reboot the system!
• Free to apply the very latest security updates
  • this might not always be possible on an HPC system
• Requires an optimal connection to storage
  • for the transfer system this meant an InfiniBand card
20. Setting up the archiving tools
• Create an interface to access the file/archiving service at RAL using the GridPP tools:
  • GridFTP – Globus Toolkit – also provides Globus Connect
  • trust anchors (egi-trustanchors)
  • VOMS tools (emi3-xxx)
  • FTS3 (CERN)
21. Chose to use FTS3 with GridFTP
[diagram: the user submits transfer lists (and credentials) to FTS3; FTS3 drives GridFTP transfers between GPFS behind data.cosma.dur.ac.uk (GridFTP) and CASTOR-GEN behind srm-dirac.gridpp.rl.ac.uk (SRM)]
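To make the workflow in the diagram concrete, below is a minimal sketch of submitting one source/destination pair to FTS3 from the command line. The host names are those shown above; the FTS3 server URL, the file paths and the status-polling step are illustrative assumptions, not details taken from the slides.

    # Submit one GridFTP-to-SRM copy to an FTS3 server.
    # The endpoint URL and file paths below are assumed for illustration.
    FTS=https://fts3.example.ac.uk:8446
    JOBID=$(fts-transfer-submit -s "$FTS" \
        gsiftp://data.cosma.dur.ac.uk/cosma/archive/chunk_001.tar \
        srm://srm-dirac.gridpp.rl.ac.uk/castor/dirac/chunk_001.tar)

    # Poll the job until FTS3 reports that it has finished.
    fts-transfer-status -s "$FTS" "$JOBID"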
22. Learning to use certificates and proxies
• A long-lived VOMS proxy?
  • myproxy-init; myproxy-logon; voms-proxy-init; fts-transfer-delegation
• How do you create a proxy and delegation that last weeks, even months?
  • This is still an issue for a VOMS proxy, but we circumvented it using a normal proxy (see the sketch below):
  • grid-proxy-init; fts-transfer-delegation
  • grid-proxy-init -valid HH:MM
  • fts-transfer-delegation -e time-in-seconds
  • creates a proxy that lasts up to the certificate lifetime
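A minimal sketch of the sequence just described, with concrete example lifetimes filled in (the commands and options are the ones named on the slide; the particular values are assumptions):

    # Create a plain (non-VOMS) grid proxy; -valid takes hours:minutes.
    grid-proxy-init -valid 96:00

    # Delegate credentials to the FTS3 service; -e sets the delegation
    # lifetime in seconds (604800 s = 7 days). The resulting proxy can
    # last up to the lifetime of the underlying certificate.
    fts-transfer-delegation -e 604800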
23. Experiences
1. Large files – optimal throughput limited by network bandwidth
2. Many small files – limited by latency
3. Many parallel sessions impede the proper functioning of the archive server
4. Ownership and creation dates are not preserved – one grid owner
5. A simple approach of “just” pushing files will not work!
24. Actions to overcome the issues
• tar files up in chunks of ~256 GByte (see the sketch below)
• exclude checked-out versioning subdirectories
• ownership and time stamps are preserved inside the tar archive
• keep a record of the archived files
• the files to transfer are now large – limited by bandwidth, not by latency
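As an illustration of this approach, here is a rough sketch that produces one tarball per top-level directory, excludes version-control subdirectories and keeps a manifest of what was archived. All paths are assumptions, and in practice directories or file sets would be grouped so that each chunk comes out at roughly 256 GByte.

    SRC=/cosma/data/myproject       # data to be archived (assumed path)
    OUT=/cosma/archive/outgoing     # staging area for tarballs (assumed path)
    mkdir -p "$OUT"

    for dir in "$SRC"/*/ ; do
        name=$(basename "$dir")
        # tar preserves ownership and time stamps inside the archive;
        # checked-out versioning subdirectories are excluded.
        tar --exclude='.svn' --exclude='.git' \
            -cf "$OUT/${name}.tar" -C "$SRC" "$name"
        # Keep a record of the files that went into each chunk.
        tar -tvf "$OUT/${name}.tar" > "$OUT/${name}.manifest"
    done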
25. Open issues
• Depends on a single admin to carry out – not automatic
• What happens when the content of directories changes? Complete new archive sessions?
• Creating a tool more like rsync requires extensive scripting (see the sketch below)
• When trying to get data back, you get back the whole of a subset in order to find a single file or a string of files
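A rough sketch of the rsync-like idea: keep a lightweight listing per directory and re-archive only the directories whose listing has changed since the last run. GNU find is assumed, and the paths match the (assumed) staging layout of the earlier sketch.

    SRC=/cosma/data/myproject
    OUT=/cosma/archive/outgoing

    for dir in "$SRC"/*/ ; do
        name=$(basename "$dir")
        # Lightweight fingerprint: relative path, size and mtime per file.
        find "$dir" -type f -printf '%P %s %T@\n' | sort > "$OUT/${name}.new"
        if ! cmp -s "$OUT/${name}.new" "$OUT/${name}.filelist" ; then
            echo "$name changed since the last run - re-archive this directory"
            mv "$OUT/${name}.new" "$OUT/${name}.filelist"
        else
            rm "$OUT/${name}.new"
        fi
    done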
26. Conclusions
• With the right network speed, the right tools and the right connectivity, we can archive the DiRAC data to RAL or anywhere else.
• Documenting the procedure is very important to transfer the knowledge and avoid duplicating effort. The documentation is online at https://www.cosma.dur.ac.uk/documentation
• Each DiRAC site should have its own dirac0X account
• Start archiving and keep on archiving – this is more difficult, as it is not completely automatic yet and more development is required.
• Collaboration between DiRAC and GridPP/RAL DOES work!
• The work has been of benefit to other transfer activities, which significantly helps research and reflects well on the service we can deliver.
• Can we aspire to more?