Transferring data: best practices, Globus Online, and Compute Canada infrastructure
1. LAB MEETING—
TECHNICAL
TALK
COBY VINER
MOTIVATION
BEST PRACTICES
GLOBUS
CC
TRANS-CANADA
DATA TRANSFER
TRANSFER TECH.
EXPERIMENT
REFERENCES
LAB MEETING—TECHNICAL TALK
TRANSFERRING DATA
BEST PRACTICES, GLOBUS ONLINE, AND
COMPUTE CANADA INFRASTRUCTURE
Coby Viner
Hoffman Lab
Wednesday January 18, 2017
A NEED FOR EFFICIENT AND ROBUST DATA TRANSFER
Often need to transfer large amounts of data
Bringing computation to large datasets is not always feasible
Often data is “portable”, software and pipelines are not
Need robust (exact) data transfer
Want transfers to be efficient
Might want or need transfers to be secure
BEST PRACTICES
Always verify a checksum whenever data is transferred to another file system
Use an MD5 sum: md5sum <file> > <file>.md5 locally; then md5sum -c <file>.md5 on the new system, with both <file> and <file>.md5 transferred to it.
Easily accomplished with GNU Parallel!
Check automatically: rsync or Globus
Transfer files exactly
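A minimal sketch of the checksum workflow above, assuming GNU coreutils md5sum. The temporary directories, file names, and file contents are invented for illustration, and cp stands in for the real transfer. GNU Parallel is used only if it is installed; otherwise a plain loop does the same work.

```shell
set -eu

# Hypothetical demo data: these directories and files are invented.
workdir=$(mktemp -d)
mkdir "$workdir/src" "$workdir/dst"
printf 'ACGT\n' > "$workdir/src/a.dat"
printf 'TTGA\n' > "$workdir/src/b.dat"

# 1. On the source system: write one .md5 file next to each data file.
cd "$workdir/src"
if command -v parallel >/dev/null 2>&1 &&
   parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # One md5sum job per file, run concurrently by GNU Parallel.
    parallel 'md5sum {} > {}.md5' ::: *.dat
else
    # Fallback when GNU Parallel is not available.
    for f in *.dat; do md5sum "$f" > "$f.md5"; done
fi

# 2. Transfer the data together with the .md5 files
#    (cp is only a stand-in for the real transfer).
cp ./*.dat ./*.md5 "$workdir/dst/"

# 3. On the destination system: check each file against its recorded sum.
cd "$workdir/dst"
verify_status=0
md5sum -c ./*.md5 || verify_status=$?
echo "verification exit status: $verify_status"
```

md5sum -c exits with a non-zero status if any file fails its check, so the status can gate later pipeline steps.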
BEST PRACTICES: (ALMOST) ALWAYS USE RSYNC
rsync -avP <source path> [user@system.domain:]<destination path>
-a: archive mode (-rlptgoD)
-a does not include -H; add -H if you used hard links (but you almost always want symbolic links instead!)
-v: verbose
-P: --partial --progress, but be careful with --partial: an interrupted transfer leaves an incomplete file at the destination
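The rsync invocation above can be tried locally with a sketch like the following. Both paths are temporary directories invented for the demo; for a remote copy the destination would take the user@system.domain:<destination path> form instead. The second, dry-run pass with -c is one possible way to confirm the copy, not the only one.

```shell
set -eu

# Hypothetical local demo: all paths and file contents are invented.
demo=$(mktemp -d)
mkdir -p "$demo/src/sub" "$demo/dst"
printf 'chr1\t100\t200\n' > "$demo/src/sub/peaks.bed"
# A symbolic link, which archive mode (-a, via -l) copies as a link.
ln -s sub/peaks.bed "$demo/src/latest.bed"

if command -v rsync >/dev/null 2>&1; then
    # -a: archive mode, -v: verbose, -P: --partial --progress.
    # The trailing slash on the source copies its contents,
    # not the directory itself.
    rsync -avP "$demo/src/" "$demo/dst/"

    # Second pass as a check: -n (dry run), -c (compare by checksum),
    # -i (itemize changes). No output means nothing differs.
    pending=$(rsync -anci "$demo/src/" "$demo/dst/")
    echo "differences after transfer: ${pending:-none}"
else
    echo "rsync is not installed; skipping the demo" >&2
fi
```

Rerunning the same rsync command after an interruption resumes the transfer, which is why a file left behind by --partial should not be treated as complete until a later run finishes.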
GLOBUS: SIMPLE & EFFECTIVE DATA TRANSFER
(2017). How it works, https://www.globus.org/how-it-works
GLOBUS: AS A SERVICE
K. Chard et al., “Globus data publication as a service: lowering
barriers to reproducible science”, in 2015 IEEE 11th
international conference on e-Science, IEEE, 2015,
pp. 401–410
COMPUTE CANADA: THE TRANS-CANADA DATA HIGHWAY
(2016). Compute Canada technology briefing, https://www.computecanada.ca/wp-content/uploads/2015/02/161125-Tech_Brief_PROOF_2016_EN_05.pdf
AN EMPIRICAL COMPARISON OF DATA TRANSFER TECHNOLOGIES
Confounding performance with reliability makes this comparison much less useful...
Deconvolute the two when selecting a technology
Ensuring 100% effective reliability is paramount
C. A. Mattmann et al., “A topical evaluation and discussion of data movement technologies for data-intensive scientific applications”, Earth Science Informatics, vol. 9, no. 2, pp. 247–262, 2016
REFERENCES
(2017). How it works, https://www.globus.org/how-it-works.
K. Chard, J. Pruyne, B. Blaiszik, et al., “Globus data publication as a service: lowering barriers to reproducible science”, in 2015 IEEE 11th international conference on e-Science, IEEE, 2015, pp. 401–410.
(2016). Compute Canada technology briefing, https://www.computecanada.ca/wp-content/uploads/2015/02/161125-Tech_Brief_PROOF_2016_EN_05.pdf.
C. A. Mattmann, L. Cinquini, P. Zimdars, et al., “A topical evaluation and discussion of data movement technologies for data-intensive scientific applications”, Earth Science Informatics, vol. 9, no. 2, pp. 247–262, 2016.
J. Bresnahan, M. Link, G. Khanna, et al., “Globus GridFTP: what’s new in 2007”, in Proceedings of the first international conference on networks for grid applications, ser. GridNets ’07, pp. 17–19.
P. Z. Kolano, “High performance reliable file transfers using automatic many-to-many parallelization”, in, I. Caragiannis, M. Alexander, R. M. Badia, et al., Eds., 2013, pp. 463–473.