LAB MEETING—
TECHNICAL
TALK
COBY VINER
MOTIVATION
BEST PRACTICES
GLOBUS
CC
TRANS-CANADA
DATA TRANSFER
TRANSFER TECH.
EXPERIMENT
REFERENCES
LAB MEETING—TECHNICAL TALK
TRANSFERRING DATA
BEST PRACTICES, GLOBUS ONLINE, AND
COMPUTE CANADA INFRASTRUCTURE
Coby Viner
Hoffman Lab
Wednesday January 18, 2017
A NEED FOR EFFICIENT AND ROBUST DATA
TRANSFER
Often need to transfer large amounts of data
Bringing computation to large datasets is not always
feasible
Often data is “portable”, software and pipelines are not
Need robust (exact) data transfer
Want transfers to be efficient
Might want or need transfers to be secure
BEST PRACTICES
Always verify a checksum whenever data is transferred
to another file system
Use MD5 sum: md5sum <file> > <file>.md5 locally; then
md5sum -c <file>.md5 on the new system, with both
<file> and <file>.md5 transferred to it.
Easily accomplished with GNU Parallel!
Check automatically: rsync or Globus
Transfer files exactly
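The checksum workflow above can be sketched as a small shell script. This is only a minimal sketch, assuming GNU coreutils (md5sum, mktemp) is available. The two temporary directories and the file name data.txt are hypothetical stand-ins for the source and destination file systems, and cp stands in for the real transfer.

```shell
#!/bin/sh
# Hypothetical source and destination directories, standing in for two file systems.
src=$(mktemp -d)
dst=$(mktemp -d)

# Create an example file and record its MD5 sum next to it.
printf 'example data\n' > "$src/data.txt"
(cd "$src" && md5sum data.txt > data.txt.md5)

# Stand-in for the real transfer: copy both the file and its checksum file.
cp "$src/data.txt" "$src/data.txt.md5" "$dst/"

# Verify on the destination: md5sum -c reads the .md5 file and re-hashes data.txt.
if (cd "$dst" && md5sum -c data.txt.md5); then
    verify_status=ok
else
    verify_status=failed
fi
echo "verification: $verify_status"
```

md5sum -c reports for each listed file whether its checksum matched and exits with a non-zero status on any mismatch, so the result can be tested in a script. For many files, the checksum step could instead be run with GNU Parallel, for example parallel md5sum ::: *.dat > checksums.md5, assuming GNU Parallel is installed and the files match the hypothetical *.dat pattern.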
BEST PRACTICES: (ALMOST) ALWAYS USE
RSYNC
rsync -avP <source path> [user@system.domain:]<destination path>
-a: archive mode (equivalent to -rlptgoD)
-a does not include -H; you may want to add it if you used
hard links (but you almost always want symbolic links
instead!)
-v: verbose
-P: equivalent to --partial --progress, but be careful with
--partial: an interrupted transfer leaves an incomplete file
at the destination
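The command above can be tried between two local directories before pointing it at a remote system. This is a minimal sketch, assuming rsync may or may not be installed (hence the guard). The temporary directories and the file name data.txt are hypothetical, and a remote destination would carry the [user@system.domain:] prefix instead of a local path.

```shell
#!/bin/sh
# Hypothetical local source and destination directories.
src=$(mktemp -d)
dst=$(mktemp -d)
printf 'example data\n' > "$src/data.txt"

if command -v rsync >/dev/null 2>&1; then
    # The trailing slash on the source copies its contents, not the directory itself.
    rsync -avP "$src/" "$dst/"
    # Compare source and destination byte for byte.
    if cmp -s "$src/data.txt" "$dst/data.txt"; then
        sync_status=identical
    else
        sync_status=differs
    fi
else
    # rsync is not available on this machine, so nothing was transferred.
    sync_status=skipped
fi
echo "rsync result: $sync_status"
```

Running the same rsync command a second time should transfer nothing new, since by default rsync skips files whose size and modification time already match at the destination.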
GLOBUS: SIMPLE & EFFECTIVE DATA
TRANSFER
(2017). How it works, https://www.globus.org/how-it-works
GLOBUS: AS A SERVICE
K. Chard et al., “Globus data publication as a service: lowering
barriers to reproducible science”, in 2015 IEEE 11th
international conference on e-Science, IEEE, 2015,
pp. 401–410
COMPUTE CANADA: THE TRANS-CANADA
DATA HIGHWAY
(2016). Compute Canada technology briefing,
https://www.computecanada.ca/wp-content/uploads/2015/02/161125-Tech_Brief_PROOF_2016_EN_05.pdf
MANY SITES, WITH LOTS OF STORAGE
AN EMPIRICAL COMPARISON OF DATA
TRANSFER TECHNOLOGIES
Confounding performance with reliability makes this
comparison much less useful...
Consider the two separately when selecting a technology
Ensuring 100% effective reliability is paramount
C. A. Mattmann et al., “A topical evaluation and discussion of data movement
technologies for data-intensive scientific applications”, Earth Science
Informatics, vol. 9, no. 2, pp. 247–262, 2016
REFERENCES
(2017). How it works, https://www.globus.org/how-it-works.
K. Chard, J. Pruyne, B. Blaiszik, et al., “Globus data publication
as a service: lowering barriers to reproducible science”, in 2015
IEEE 11th international conference on e-Science, IEEE, 2015,
pp. 401–410.
(2016). Compute Canada technology briefing,
https://www.computecanada.ca/wp-content/uploads/2015/02/161125-Tech_Brief_PROOF_2016_EN_05.pdf.
C. A. Mattmann, L. Cinquini, P. Zimdars, et al., “A topical
evaluation and discussion of data movement technologies for
data-intensive scientific applications”, Earth Science
Informatics, vol. 9, no. 2, pp. 247–262, 2016.
J. Bresnahan, M. Link, G. Khanna, et al., “Globus GridFTP:
what’s new in 2007”, in Proceedings of the first international
conference on networks for grid applications, ser. GridNets ’07,
pp. 17–19.
P. Z. Kolano, “High performance reliable file transfers using
automatic many-to-many parallelization”, in, I. Caragiannis,
M. Alexander, R. M. Badia, et al., Eds., 2013, pp. 463–473.

More Related Content

What's hot

Apache Storm vs. Spark Streaming – two Stream Processing Platforms compared
Apache Storm vs. Spark Streaming – two Stream Processing Platforms comparedApache Storm vs. Spark Streaming – two Stream Processing Platforms compared
Apache Storm vs. Spark Streaming – two Stream Processing Platforms comparedGuido Schmutz
 
Hadoop Application Architectures - Fraud Detection
Hadoop Application Architectures - Fraud  DetectionHadoop Application Architectures - Fraud  Detection
Hadoop Application Architectures - Fraud Detectionhadooparchbook
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksSlim Baltagi
 
Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...
Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...
Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...Nathan Bijnens
 
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...DataWorks Summit
 
A real-time architecture using Hadoop and Storm @ JAX London
A real-time architecture using Hadoop and Storm @ JAX LondonA real-time architecture using Hadoop and Storm @ JAX London
A real-time architecture using Hadoop and Storm @ JAX LondonNathan Bijnens
 
Cloud Native Data Pipelines (in Eng & Japanese) - QCon Tokyo
Cloud Native Data Pipelines (in Eng & Japanese)  - QCon TokyoCloud Native Data Pipelines (in Eng & Japanese)  - QCon Tokyo
Cloud Native Data Pipelines (in Eng & Japanese) - QCon TokyoSid Anand
 
Hadoop application architectures - Fraud detection tutorial
Hadoop application architectures - Fraud detection tutorialHadoop application architectures - Fraud detection tutorial
Hadoop application architectures - Fraud detection tutorialhadooparchbook
 
Stream Processing Everywhere - What to use?
Stream Processing Everywhere - What to use?Stream Processing Everywhere - What to use?
Stream Processing Everywhere - What to use?MapR Technologies
 
a real-time architecture using Hadoop and Storm at Devoxx
a real-time architecture using Hadoop and Storm at Devoxxa real-time architecture using Hadoop and Storm at Devoxx
a real-time architecture using Hadoop and Storm at DevoxxNathan Bijnens
 
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...Amazon Web Services
 
Architecting next generation big data platform
Architecting next generation big data platformArchitecting next generation big data platform
Architecting next generation big data platformhadooparchbook
 
Seattle spark-meetup-032317
Seattle spark-meetup-032317Seattle spark-meetup-032317
Seattle spark-meetup-032317Nan Zhu
 
Hadoop application architectures - Fraud detection tutorial
Hadoop application architectures - Fraud detection tutorialHadoop application architectures - Fraud detection tutorial
Hadoop application architectures - Fraud detection tutorialhadooparchbook
 
Spark summit-east-dowling-feb2017-full
Spark summit-east-dowling-feb2017-fullSpark summit-east-dowling-feb2017-full
Spark summit-east-dowling-feb2017-fullJim Dowling
 
Data Ingestion At Scale (CNECCS 2017)
Data Ingestion At Scale (CNECCS 2017)Data Ingestion At Scale (CNECCS 2017)
Data Ingestion At Scale (CNECCS 2017)Jeffrey Sica
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDataWorks Summit
 
Data Virtualization: revolutionizing database cloning
Data Virtualization: revolutionizing database cloningData Virtualization: revolutionizing database cloning
Data Virtualization: revolutionizing database cloningKyle Hailey
 
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...DataWorks Summit/Hadoop Summit
 

What's hot (20)

Apache Storm vs. Spark Streaming – two Stream Processing Platforms compared
Apache Storm vs. Spark Streaming – two Stream Processing Platforms comparedApache Storm vs. Spark Streaming – two Stream Processing Platforms compared
Apache Storm vs. Spark Streaming – two Stream Processing Platforms compared
 
Hadoop Application Architectures - Fraud Detection
Hadoop Application Architectures - Fraud  DetectionHadoop Application Architectures - Fraud  Detection
Hadoop Application Architectures - Fraud Detection
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics Frameworks
 
Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...
Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...
Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...
 
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...
 
A real-time architecture using Hadoop and Storm @ JAX London
A real-time architecture using Hadoop and Storm @ JAX LondonA real-time architecture using Hadoop and Storm @ JAX London
A real-time architecture using Hadoop and Storm @ JAX London
 
Cloud Native Data Pipelines (in Eng & Japanese) - QCon Tokyo
Cloud Native Data Pipelines (in Eng & Japanese)  - QCon TokyoCloud Native Data Pipelines (in Eng & Japanese)  - QCon Tokyo
Cloud Native Data Pipelines (in Eng & Japanese) - QCon Tokyo
 
Hadoop application architectures - Fraud detection tutorial
Hadoop application architectures - Fraud detection tutorialHadoop application architectures - Fraud detection tutorial
Hadoop application architectures - Fraud detection tutorial
 
Stream Processing Everywhere - What to use?
Stream Processing Everywhere - What to use?Stream Processing Everywhere - What to use?
Stream Processing Everywhere - What to use?
 
a real-time architecture using Hadoop and Storm at Devoxx
a real-time architecture using Hadoop and Storm at Devoxxa real-time architecture using Hadoop and Storm at Devoxx
a real-time architecture using Hadoop and Storm at Devoxx
 
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...
 
Architecting next generation big data platform
Architecting next generation big data platformArchitecting next generation big data platform
Architecting next generation big data platform
 
Seattle spark-meetup-032317
Seattle spark-meetup-032317Seattle spark-meetup-032317
Seattle spark-meetup-032317
 
Hadoop application architectures - Fraud detection tutorial
Hadoop application architectures - Fraud detection tutorialHadoop application architectures - Fraud detection tutorial
Hadoop application architectures - Fraud detection tutorial
 
Spark summit-east-dowling-feb2017-full
Spark summit-east-dowling-feb2017-fullSpark summit-east-dowling-feb2017-full
Spark summit-east-dowling-feb2017-full
 
Streaming in the Wild with Apache Flink
Streaming in the Wild with Apache FlinkStreaming in the Wild with Apache Flink
Streaming in the Wild with Apache Flink
 
Data Ingestion At Scale (CNECCS 2017)
Data Ingestion At Scale (CNECCS 2017)Data Ingestion At Scale (CNECCS 2017)
Data Ingestion At Scale (CNECCS 2017)
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data Analytics
 
Data Virtualization: revolutionizing database cloning
Data Virtualization: revolutionizing database cloningData Virtualization: revolutionizing database cloning
Data Virtualization: revolutionizing database cloning
 
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
 

Similar to Transferring data: best practices, Globus Online, and Compute Canada infrastructure

WJAX 2019 - Taking Distributed Tracing to the next level
WJAX 2019 - Taking Distributed Tracing to the next levelWJAX 2019 - Taking Distributed Tracing to the next level
WJAX 2019 - Taking Distributed Tracing to the next levelFrank Pfleger
 
PipelineAI + AWS SageMaker + Distributed TensorFlow + AI Model Training and S...
PipelineAI + AWS SageMaker + Distributed TensorFlow + AI Model Training and S...PipelineAI + AWS SageMaker + Distributed TensorFlow + AI Model Training and S...
PipelineAI + AWS SageMaker + Distributed TensorFlow + AI Model Training and S...Chris Fregly
 
Optimizing, Profiling, and Deploying High Performance Spark ML and TensorFlow AI
Optimizing, Profiling, and Deploying High Performance Spark ML and TensorFlow AIOptimizing, Profiling, and Deploying High Performance Spark ML and TensorFlow AI
Optimizing, Profiling, and Deploying High Performance Spark ML and TensorFlow AIData Con LA
 
Streaming data for real time analysis
Streaming data for real time analysisStreaming data for real time analysis
Streaming data for real time analysisAmazon Web Services
 
ENSURING FAST AND SECURE GAMING APPLICATION DOWNLOADS GLOBALLY
ENSURING FAST AND SECURE GAMING APPLICATION DOWNLOADS GLOBALLYENSURING FAST AND SECURE GAMING APPLICATION DOWNLOADS GLOBALLY
ENSURING FAST AND SECURE GAMING APPLICATION DOWNLOADS GLOBALLYCDNetworks
 
AWS re:Invent 2016: Building a Platform for Collaborative Scientific Research...
AWS re:Invent 2016: Building a Platform for Collaborative Scientific Research...AWS re:Invent 2016: Building a Platform for Collaborative Scientific Research...
AWS re:Invent 2016: Building a Platform for Collaborative Scientific Research...Amazon Web Services
 
Forward Networks - Networking Field Day 13 presentation
Forward Networks - Networking Field Day 13 presentationForward Networks - Networking Field Day 13 presentation
Forward Networks - Networking Field Day 13 presentationForward Networks
 
Forward Networks - Networking Field Day 13 presentation
Forward Networks - Networking Field Day 13 presentationForward Networks - Networking Field Day 13 presentation
Forward Networks - Networking Field Day 13 presentationAndrew Wesbecher
 
PipelineAI + TensorFlow AI + Spark ML + Kuberenetes + Istio + AWS SageMaker +...
PipelineAI + TensorFlow AI + Spark ML + Kuberenetes + Istio + AWS SageMaker +...PipelineAI + TensorFlow AI + Spark ML + Kuberenetes + Istio + AWS SageMaker +...
PipelineAI + TensorFlow AI + Spark ML + Kuberenetes + Istio + AWS SageMaker +...Chris Fregly
 
Digital Transformation | AWS Webinar
Digital Transformation | AWS WebinarDigital Transformation | AWS Webinar
Digital Transformation | AWS WebinarAmazon Web Services
 
AWS Webcast - AWS OpsWorks Continuous Integration Demo
AWS Webcast - AWS OpsWorks Continuous Integration Demo  AWS Webcast - AWS OpsWorks Continuous Integration Demo
AWS Webcast - AWS OpsWorks Continuous Integration Demo Amazon Web Services
 
PipelineAI Optimizes Your Enterprise AI Pipeline from Distributed Training to...
PipelineAI Optimizes Your Enterprise AI Pipeline from Distributed Training to...PipelineAI Optimizes Your Enterprise AI Pipeline from Distributed Training to...
PipelineAI Optimizes Your Enterprise AI Pipeline from Distributed Training to...Chris Fregly
 
Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...
Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...
Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...Chris Fregly
 
Embrace NoSQL and Eventual Consistency with Ripple
Embrace NoSQL and Eventual Consistency with RippleEmbrace NoSQL and Eventual Consistency with Ripple
Embrace NoSQL and Eventual Consistency with RippleSean Cribbs
 
Humans and Data Don’t Mix: Best Practices to Secure Your Cloud
Humans and Data Don’t Mix: Best Practices to Secure Your CloudHumans and Data Don’t Mix: Best Practices to Secure Your Cloud
Humans and Data Don’t Mix: Best Practices to Secure Your CloudPriyanka Aash
 
Patterns & Practices of Microservices
Patterns & Practices of MicroservicesPatterns & Practices of Microservices
Patterns & Practices of MicroservicesWesley Reisz
 
Science cloud foster june 2013
Science cloud foster june 2013Science cloud foster june 2013
Science cloud foster june 2013Kirill Osipov
 
Science as a Service: How On-Demand Computing can Accelerate Discovery
Science as a Service: How On-Demand Computing can Accelerate DiscoveryScience as a Service: How On-Demand Computing can Accelerate Discovery
Science as a Service: How On-Demand Computing can Accelerate DiscoveryIan Foster
 
High Performance Computing - A Serverless Story
High Performance Computing - A Serverless StoryHigh Performance Computing - A Serverless Story
High Performance Computing - A Serverless StoryEoin Shanaghy
 

Similar to Transferring data: best practices, Globus Online, and Compute Canada infrastructure (20)

WJAX 2019 - Taking Distributed Tracing to the next level
WJAX 2019 - Taking Distributed Tracing to the next levelWJAX 2019 - Taking Distributed Tracing to the next level
WJAX 2019 - Taking Distributed Tracing to the next level
 
osi-oss-dbs.pptx
osi-oss-dbs.pptxosi-oss-dbs.pptx
osi-oss-dbs.pptx
 
PipelineAI + AWS SageMaker + Distributed TensorFlow + AI Model Training and S...
PipelineAI + AWS SageMaker + Distributed TensorFlow + AI Model Training and S...PipelineAI + AWS SageMaker + Distributed TensorFlow + AI Model Training and S...
PipelineAI + AWS SageMaker + Distributed TensorFlow + AI Model Training and S...
 
Optimizing, Profiling, and Deploying High Performance Spark ML and TensorFlow AI
Optimizing, Profiling, and Deploying High Performance Spark ML and TensorFlow AIOptimizing, Profiling, and Deploying High Performance Spark ML and TensorFlow AI
Optimizing, Profiling, and Deploying High Performance Spark ML and TensorFlow AI
 
Streaming data for real time analysis
Streaming data for real time analysisStreaming data for real time analysis
Streaming data for real time analysis
 
ENSURING FAST AND SECURE GAMING APPLICATION DOWNLOADS GLOBALLY
ENSURING FAST AND SECURE GAMING APPLICATION DOWNLOADS GLOBALLYENSURING FAST AND SECURE GAMING APPLICATION DOWNLOADS GLOBALLY
ENSURING FAST AND SECURE GAMING APPLICATION DOWNLOADS GLOBALLY
 
AWS re:Invent 2016: Building a Platform for Collaborative Scientific Research...
AWS re:Invent 2016: Building a Platform for Collaborative Scientific Research...AWS re:Invent 2016: Building a Platform for Collaborative Scientific Research...
AWS re:Invent 2016: Building a Platform for Collaborative Scientific Research...
 
Forward Networks - Networking Field Day 13 presentation
Forward Networks - Networking Field Day 13 presentationForward Networks - Networking Field Day 13 presentation
Forward Networks - Networking Field Day 13 presentation
 
Forward Networks - Networking Field Day 13 presentation
Forward Networks - Networking Field Day 13 presentationForward Networks - Networking Field Day 13 presentation
Forward Networks - Networking Field Day 13 presentation
 
PipelineAI + TensorFlow AI + Spark ML + Kuberenetes + Istio + AWS SageMaker +...
PipelineAI + TensorFlow AI + Spark ML + Kuberenetes + Istio + AWS SageMaker +...PipelineAI + TensorFlow AI + Spark ML + Kuberenetes + Istio + AWS SageMaker +...
PipelineAI + TensorFlow AI + Spark ML + Kuberenetes + Istio + AWS SageMaker +...
 
Digital Transformation | AWS Webinar
Digital Transformation | AWS WebinarDigital Transformation | AWS Webinar
Digital Transformation | AWS Webinar
 
AWS Webcast - AWS OpsWorks Continuous Integration Demo
AWS Webcast - AWS OpsWorks Continuous Integration Demo  AWS Webcast - AWS OpsWorks Continuous Integration Demo
AWS Webcast - AWS OpsWorks Continuous Integration Demo
 
PipelineAI Optimizes Your Enterprise AI Pipeline from Distributed Training to...
PipelineAI Optimizes Your Enterprise AI Pipeline from Distributed Training to...PipelineAI Optimizes Your Enterprise AI Pipeline from Distributed Training to...
PipelineAI Optimizes Your Enterprise AI Pipeline from Distributed Training to...
 
Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...
Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...
Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...
 
Embrace NoSQL and Eventual Consistency with Ripple
Embrace NoSQL and Eventual Consistency with RippleEmbrace NoSQL and Eventual Consistency with Ripple
Embrace NoSQL and Eventual Consistency with Ripple
 
Humans and Data Don’t Mix: Best Practices to Secure Your Cloud
Humans and Data Don’t Mix: Best Practices to Secure Your CloudHumans and Data Don’t Mix: Best Practices to Secure Your Cloud
Humans and Data Don’t Mix: Best Practices to Secure Your Cloud
 
Patterns & Practices of Microservices
Patterns & Practices of MicroservicesPatterns & Practices of Microservices
Patterns & Practices of Microservices
 
Science cloud foster june 2013
Science cloud foster june 2013Science cloud foster june 2013
Science cloud foster june 2013
 
Science as a Service: How On-Demand Computing can Accelerate Discovery
Science as a Service: How On-Demand Computing can Accelerate DiscoveryScience as a Service: How On-Demand Computing can Accelerate Discovery
Science as a Service: How On-Demand Computing can Accelerate Discovery
 
High Performance Computing - A Serverless Story
High Performance Computing - A Serverless StoryHigh Performance Computing - A Serverless Story
High Performance Computing - A Serverless Story
 

Transferring data: best practices, Globus Online, and Compute Canada infrastructure

  • 1. LAB MEETING— TECHNICAL TALK COBY VINER MOTIVATION BEST PRACTICES GLOBUS CC TRANS-CANADA DATA TRANSFER TRANSFER TECH. EXPERIMENT REFERENCES LAB MEETING—TECHNICAL TALK TRANSFERRING DATA BEST PRACTICES, GLOBUS ONLINE, AND COMPUTE CANADA INFRASTRUCTURE Coby Viner Hoffman Lab Wednesday January 18, 2017
  • 2-7. A NEED FOR EFFICIENT AND ROBUST DATA TRANSFER
      • Often need to transfer large amounts of data
      • Bringing computation to large datasets is not always feasible
      • Often data is “portable”; software and pipelines are not
      • Need robust (exact) data transfer
      • Want transfers to be efficient
      • Might want or need transfers to be secure
  • 8-12. BEST PRACTICES
      • Always compute and verify a checksum whenever data is transferred to another file system
      • Use MD5 sum: md5sum <file> > <file>.md5 locally; then md5sum -c <file>.md5 on the new system, with both <file> and <file>.md5 transferred to it
      • Easily accomplished with GNU Parallel!
      • Check automatically: rsync or Globus
      • Transfer files exactly
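The checksum workflow above can be sketched end to end as follows. This is a minimal illustration, not the speaker's own script: the temporary directory, the `src`/`dst` subdirectories, the file name `sample.txt`, and the use of `cp` in place of a real transfer are all assumptions made so that the example runs on a single machine with GNU coreutils.

```shell
# Hypothetical demo: a temporary directory stands in for the two file systems.
work=$(mktemp -d)
mkdir "$work/src" "$work/dst"
printf 'example data\n' > "$work/src/sample.txt"

# 1. On the source system: record the checksum next to the file.
(cd "$work/src" && md5sum sample.txt > sample.txt.md5)

# 2. Transfer both the file and its .md5 (cp stands in for the real transfer).
cp "$work/src/sample.txt" "$work/src/sample.txt.md5" "$work/dst/"

# 3. On the destination system: verify against the recorded checksum.
#    md5sum -c takes the .md5 file as its argument and exits non-zero on any mismatch.
(cd "$work/dst" && md5sum -c sample.txt.md5)
verify_status=$?
```

For many files, the same pattern can likely be spread over cores with GNU Parallel, for example `parallel 'md5sum {} > {}.md5' ::: *.txt`, where the `*.txt` glob is only a placeholder for the files being transferred.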
  • 13-16. BEST PRACTICES: (ALMOST) ALWAYS USE RSYNC
      • rsync -avP <source path> [user@system.domain:]<destination path>
      • -a: archive mode (-rlptgoD); it does not include -H, which you may want to add if you used hard links (but you almost always want symbolic links instead!)
      • -v: verbose
      • -P: --partial --progress, but be careful with --partial (it keeps partially transferred files at the destination if the transfer is interrupted)
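A small local run can make the rsync flags above concrete. This is a sketch under assumptions: the temporary directories, the `run1` subdirectory, and the file names are hypothetical, and both ends are on the same machine so that no remote host is needed.

```shell
# Hypothetical demo: local temporary directories stand in for source and destination.
work=$(mktemp -d)
mkdir -p "$work/src/run1"
printf 'example data\n' > "$work/src/run1/sample.txt"
# A symbolic link; -a implies -l, which copies it as a link rather than following it.
ln -s sample.txt "$work/src/run1/latest.txt"

# Archive-mode copy with verbose output and progress.
# The trailing slash on the source means "the contents of src",
# so files land directly under dst/ rather than under dst/src/.
rsync -avP "$work/src/" "$work/dst/"

# For a remote destination the same flags apply (placeholders, not run here):
#   rsync -avP <source path> user@system.domain:<destination path>

# Re-check using full-file checksums instead of size and modification time.
# --dry-run transfers nothing; --itemize-changes lists what would be updated.
rsync -ac --dry-run --itemize-changes "$work/src/" "$work/dst/"
```

If leftover partial files are a concern, rsync's `--partial-dir=DIR` option keeps them in a separate directory instead of under their final names; whether that suits a given transfer is a judgment call.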
  • 17-19. GLOBUS: SIMPLE & EFFECTIVE DATA TRANSFER. (2017). How it works, https://www.globus.org/how-it-works
  • 20. GLOBUS: AS A SERVICE. K. Chard et al., “Globus data publication as a service: lowering barriers to reproducible science”, in 2015 IEEE 11th international conference on e-Science, IEEE, 2015, pp. 401–410
  • 21. COMPUTE CANADA: THE TRANS-CANADA DATA HIGHWAY. (2016). Compute Canada technology briefing, https://www.computecanada.ca/wp-content/uploads/2015/02/161125-Tech_Brief_PROOF_2016_EN_05.pdf
  • 22. MANY SITES, WITH LOTS OF STORAGE
  • 23-28. AN EMPIRICAL COMPARISON OF DATA TRANSFER TECHNOLOGIES
      • Performance and reliability are confounded, which makes this comparison much less useful...
      • Deconvolute the two when selecting a technology
      • Ensuring 100% effective reliability is paramount
      • Source: C. A. Mattmann et al., “A topical evaluation and discussion of data movement technologies for data-intensive scientific applications”, Earth Science Informatics, vol. 9, no. 2, pp. 247–262, 2016
  • 29. REFERENCES
      • (2017). How it works, https://www.globus.org/how-it-works.
      • K. Chard, J. Pruyne, B. Blaiszik, et al., “Globus data publication as a service: lowering barriers to reproducible science”, in 2015 IEEE 11th international conference on e-Science, IEEE, 2015, pp. 401–410.
      • (2016). Compute Canada technology briefing, https://www.computecanada.ca/wp-content/uploads/2015/02/161125-Tech_Brief_PROOF_2016_EN_05.pdf.
      • C. A. Mattmann, L. Cinquini, P. Zimdars, et al., “A topical evaluation and discussion of data movement technologies for data-intensive scientific applications”, Earth Science Informatics, vol. 9, no. 2, pp. 247–262, 2016.
      • J. Bresnahan, M. Link, G. Khanna, et al., “Globus GridFTP: what’s new in 2007”, in Proceedings of the first international conference on networks for grid applications, ser. GridNets ’07, pp. 17–19.
      • P. Z. Kolano, “High performance reliable file transfers using automatic many-to-many parallelization”, in, I. Caragiannis, M. Alexander, R. M. Badia, et al., Eds., 2013, pp. 463–473.