LAB MEETING—
TECHNICAL
TALK
COBY VINER
MOTIVATION
BEST PRACTICES
GLOBUS
CC
TRANS-CANADA
DATA TRANSFER
TRANSFER TECH.
EXPERIMENT
REFERENCES
LAB MEETING—TECHNICAL TALK
TRANSFERRING DATA
BEST PRACTICES, GLOBUS ONLINE, AND
COMPUTE CANADA INFRASTRUCTURE
Coby Viner
Hoffman Lab
Wednesday January 18, 2017
A NEED FOR EFFICIENT AND ROBUST DATA
TRANSFER
Often need to transfer large amounts of data
Bringing computation to large datasets is not always
feasible
Often data is “portable”, software and pipelines are not
Need robust (exact) data transfer
Want transfers to be efficient
Might want or need transfers to be secure
BEST PRACTICES
Always verify a checksum whenever data is transferred
to another file system
Use MD5 sum: md5sum <file> > <file>.md5 locally;
md5sum -c <file>.md5 on the new system, with both
<file> and <file>.md5 transferred to it.
Easily accomplished with GNU Parallel!
Check automatically: rsync or Globus
Transfer files exactly
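
The checksum workflow above could be sketched roughly as follows. This is a minimal sketch, assuming GNU coreutils md5sum is available on both systems; write_md5 and check_md5 are hypothetical helper names, not something from the talk.

```shell
#!/usr/bin/env bash
# Sketch: write a checksum next to a file, then verify it after transfer.
# write_md5 and check_md5 are hypothetical helper names (assumptions).

# Record the MD5 sum of <file> in <file>.md5. Running md5sum from the file's
# own directory stores a relative name, which stays valid on the new system.
write_md5() {
    local file=$1
    (cd "$(dirname "$file")" &&
        md5sum "$(basename "$file")" > "$(basename "$file").md5")
}

# Verify <file> against <file>.md5; exits non-zero on any mismatch.
# --quiet suppresses the "OK" line for files that verify successfully.
check_md5() {
    local file=$1
    (cd "$(dirname "$file")" &&
        md5sum -c --quiet "$(basename "$file").md5")
}

# With GNU Parallel installed, many files can be checksummed at once, e.g.
# (the *.bam pattern is only an illustration):
#   parallel 'md5sum {} > {}.md5' ::: *.bam
```

Run write_md5 before the transfer and check_md5 after it, once both the file and its .md5 have arrived.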
BEST PRACTICES: (ALMOST) ALWAYS USE
RSYNC
rsync -avP <source path>
[user@system.domain:]<destination path>
-a: archive mode (-rlptgoD)
-a does not include -H; add it if you use hard links
(but you almost always want symbolic links
instead!)
-v: verbose
-P: --partial --progress, but be careful with --partial: an
interrupted transfer leaves an incomplete file at the destination
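
As an illustration of the rsync usage above, here is a minimal sketch that copies a directory and then re-checks it by content. It assumes rsync is installed; sync_dir and list_mismatches are hypothetical helper names, and the destination may equally be a remote path of the form user@system.domain:path.

```shell
#!/usr/bin/env bash
# Sketch: copy a directory with rsync, then compare it by file content.
# sync_dir and list_mismatches are hypothetical helper names (assumptions).

# Copy the contents of <src> into <dest>. The trailing slash on the source
# means "the contents of src" rather than the directory itself.
sync_dir() {
    local src=$1 dest=$2
    rsync -avP "$src/" "$dest/"
}

# Dry run (-n) that compares full-file checksums (-c) instead of relying on
# size and modification time; -i itemizes every file that would be re-sent.
# Files that differ between source and destination are listed.
list_mismatches() {
    local src=$1 dest=$2
    rsync -acni "$src/" "$dest/"
}
```

The checksum pass reads every file on both sides, so it is slower than a plain rsync run; it is meant as a final check after the transfer rather than a replacement for it.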
GLOBUS: SIMPLE & EFFECTIVE DATA
TRANSFER
(2017). How it works, https://www.globus.org/how-it-works
GLOBUS: AS A SERVICE
K. Chard et al., “Globus data publication as a service: lowering
barriers to reproducible science”, in 2015 IEEE 11th
international conference on e-Science, IEEE, 2015,
pp. 401–410
COMPUTE CANADA: THE TRANS-CANADA
DATA HIGHWAY
(2016). Compute Canada technology briefing,
https://www.computecanada.ca/wp-content/uploads/2015/02/161125-Tech_Brief_PROOF_2016_EN_05.pdf
MANY SITES, WITH LOTS OF STORAGE
AN EMPIRICAL COMPARISON OF DATA
TRANSFER TECHNOLOGIES
Confounding performance with reliability makes this
comparison much less useful...
Deconvolve the two when selecting a technology
Ensuring 100% effective reliability is paramount
C. A. Mattmann et al., “A topical evaluation and discussion of data movement
technologies for data-intensive scientific applications”, Earth Science
Informatics, vol. 9, no. 2, pp. 247–262, 2016
REFERENCES
(2017). How it works, https://www.globus.org/how-it-works.
K. Chard, J. Pruyne, B. Blaiszik, et al., “Globus data publication as a service: lowering barriers to reproducible science”, in 2015 IEEE 11th international conference on e-Science, IEEE, 2015, pp. 401–410.
(2016). Compute Canada technology briefing, https://www.computecanada.ca/wp-content/uploads/2015/02/161125-Tech_Brief_PROOF_2016_EN_05.pdf.
C. A. Mattmann, L. Cinquini, P. Zimdars, et al., “A topical evaluation and discussion of data movement technologies for data-intensive scientific applications”, Earth Science Informatics, vol. 9, no. 2, pp. 247–262, 2016.
J. Bresnahan, M. Link, G. Khanna, et al., “Globus GridFTP: what’s new in 2007”, in Proceedings of the first international conference on networks for grid applications, ser. GridNets ’07, pp. 17–19.
P. Z. Kolano, “High performance reliable file transfers using automatic many-to-many parallelization”, in I. Caragiannis, M. Alexander, R. M. Badia, et al., Eds., 2013, pp. 463–473.

More Related Content

What's hot

Apache Storm vs. Spark Streaming – two Stream Processing Platforms compared
Apache Storm vs. Spark Streaming – two Stream Processing Platforms comparedApache Storm vs. Spark Streaming – two Stream Processing Platforms compared
Apache Storm vs. Spark Streaming – two Stream Processing Platforms compared
Guido Schmutz
 
Hadoop Application Architectures - Fraud Detection
Hadoop Application Architectures - Fraud  DetectionHadoop Application Architectures - Fraud  Detection
Hadoop Application Architectures - Fraud Detection
hadooparchbook
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics Frameworks
Slim Baltagi
 
Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...
Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...
Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...
Nathan Bijnens
 
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...
DataWorks Summit
 
A real-time architecture using Hadoop and Storm @ JAX London
A real-time architecture using Hadoop and Storm @ JAX LondonA real-time architecture using Hadoop and Storm @ JAX London
A real-time architecture using Hadoop and Storm @ JAX London
Nathan Bijnens
 
Cloud Native Data Pipelines (in Eng & Japanese) - QCon Tokyo
Cloud Native Data Pipelines (in Eng & Japanese)  - QCon TokyoCloud Native Data Pipelines (in Eng & Japanese)  - QCon Tokyo
Cloud Native Data Pipelines (in Eng & Japanese) - QCon Tokyo
Sid Anand
 
Hadoop application architectures - Fraud detection tutorial
Hadoop application architectures - Fraud detection tutorialHadoop application architectures - Fraud detection tutorial
Hadoop application architectures - Fraud detection tutorial
hadooparchbook
 
Stream Processing Everywhere - What to use?
Stream Processing Everywhere - What to use?Stream Processing Everywhere - What to use?
Stream Processing Everywhere - What to use?
MapR Technologies
 
a real-time architecture using Hadoop and Storm at Devoxx
a real-time architecture using Hadoop and Storm at Devoxxa real-time architecture using Hadoop and Storm at Devoxx
a real-time architecture using Hadoop and Storm at Devoxx
Nathan Bijnens
 
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...
Amazon Web Services
 
Architecting next generation big data platform
Architecting next generation big data platformArchitecting next generation big data platform
Architecting next generation big data platform
hadooparchbook
 
Seattle spark-meetup-032317
Seattle spark-meetup-032317Seattle spark-meetup-032317
Seattle spark-meetup-032317
Nan Zhu
 
Hadoop application architectures - Fraud detection tutorial
Hadoop application architectures - Fraud detection tutorialHadoop application architectures - Fraud detection tutorial
Hadoop application architectures - Fraud detection tutorial
hadooparchbook
 
Spark summit-east-dowling-feb2017-full
Spark summit-east-dowling-feb2017-fullSpark summit-east-dowling-feb2017-full
Spark summit-east-dowling-feb2017-full
Jim Dowling
 
Streaming in the Wild with Apache Flink
Streaming in the Wild with Apache FlinkStreaming in the Wild with Apache Flink
Streaming in the Wild with Apache Flink
DataWorks Summit/Hadoop Summit
 
Data Ingestion At Scale (CNECCS 2017)
Data Ingestion At Scale (CNECCS 2017)Data Ingestion At Scale (CNECCS 2017)
Data Ingestion At Scale (CNECCS 2017)
Jeffrey Sica
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data Analytics
DataWorks Summit
 
Data Virtualization: revolutionizing database cloning
Data Virtualization: revolutionizing database cloningData Virtualization: revolutionizing database cloning
Data Virtualization: revolutionizing database cloning
Kyle Hailey
 
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
DataWorks Summit/Hadoop Summit
 

What's hot (20)

Apache Storm vs. Spark Streaming – two Stream Processing Platforms compared
Apache Storm vs. Spark Streaming – two Stream Processing Platforms comparedApache Storm vs. Spark Streaming – two Stream Processing Platforms compared
Apache Storm vs. Spark Streaming – two Stream Processing Platforms compared
 
Hadoop Application Architectures - Fraud Detection
Hadoop Application Architectures - Fraud  DetectionHadoop Application Architectures - Fraud  Detection
Hadoop Application Architectures - Fraud Detection
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics Frameworks
 
Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...
Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...
Virdata: lessons learned from the Internet of Things and M2M Cloud Services @...
 
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...
 
A real-time architecture using Hadoop and Storm @ JAX London
A real-time architecture using Hadoop and Storm @ JAX LondonA real-time architecture using Hadoop and Storm @ JAX London
A real-time architecture using Hadoop and Storm @ JAX London
 
Cloud Native Data Pipelines (in Eng & Japanese) - QCon Tokyo
Cloud Native Data Pipelines (in Eng & Japanese)  - QCon TokyoCloud Native Data Pipelines (in Eng & Japanese)  - QCon Tokyo
Cloud Native Data Pipelines (in Eng & Japanese) - QCon Tokyo
 
Hadoop application architectures - Fraud detection tutorial
Hadoop application architectures - Fraud detection tutorialHadoop application architectures - Fraud detection tutorial
Hadoop application architectures - Fraud detection tutorial
 
Stream Processing Everywhere - What to use?
Stream Processing Everywhere - What to use?Stream Processing Everywhere - What to use?
Stream Processing Everywhere - What to use?
 
a real-time architecture using Hadoop and Storm at Devoxx
a real-time architecture using Hadoop and Storm at Devoxxa real-time architecture using Hadoop and Storm at Devoxx
a real-time architecture using Hadoop and Storm at Devoxx
 
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...
 
Architecting next generation big data platform
Architecting next generation big data platformArchitecting next generation big data platform
Architecting next generation big data platform
 
Seattle spark-meetup-032317
Seattle spark-meetup-032317Seattle spark-meetup-032317
Seattle spark-meetup-032317
 
Hadoop application architectures - Fraud detection tutorial
Hadoop application architectures - Fraud detection tutorialHadoop application architectures - Fraud detection tutorial
Hadoop application architectures - Fraud detection tutorial
 
Spark summit-east-dowling-feb2017-full
Spark summit-east-dowling-feb2017-fullSpark summit-east-dowling-feb2017-full
Spark summit-east-dowling-feb2017-full
 
Streaming in the Wild with Apache Flink
Streaming in the Wild with Apache FlinkStreaming in the Wild with Apache Flink
Streaming in the Wild with Apache Flink
 
Data Ingestion At Scale (CNECCS 2017)
Data Ingestion At Scale (CNECCS 2017)Data Ingestion At Scale (CNECCS 2017)
Data Ingestion At Scale (CNECCS 2017)
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data Analytics
 
Data Virtualization: revolutionizing database cloning
Data Virtualization: revolutionizing database cloningData Virtualization: revolutionizing database cloning
Data Virtualization: revolutionizing database cloning
 
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
 

Similar to Transferring data: best practices, Globus Online, and Compute Canada infrastructure

WJAX 2019 - Taking Distributed Tracing to the next level
WJAX 2019 - Taking Distributed Tracing to the next levelWJAX 2019 - Taking Distributed Tracing to the next level
WJAX 2019 - Taking Distributed Tracing to the next level
Frank Pfleger
 
osi-oss-dbs.pptx
osi-oss-dbs.pptxosi-oss-dbs.pptx
osi-oss-dbs.pptx
Shivji Kumar Jha
 
PipelineAI + AWS SageMaker + Distributed TensorFlow + AI Model Training and S...
PipelineAI + AWS SageMaker + Distributed TensorFlow + AI Model Training and S...PipelineAI + AWS SageMaker + Distributed TensorFlow + AI Model Training and S...
PipelineAI + AWS SageMaker + Distributed TensorFlow + AI Model Training and S...
Chris Fregly
 
Optimizing, Profiling, and Deploying High Performance Spark ML and TensorFlow AI
Optimizing, Profiling, and Deploying High Performance Spark ML and TensorFlow AIOptimizing, Profiling, and Deploying High Performance Spark ML and TensorFlow AI
Optimizing, Profiling, and Deploying High Performance Spark ML and TensorFlow AI
Data Con LA
 
Streaming data for real time analysis
Streaming data for real time analysisStreaming data for real time analysis
Streaming data for real time analysis
Amazon Web Services
 
ENSURING FAST AND SECURE GAMING APPLICATION DOWNLOADS GLOBALLY
ENSURING FAST AND SECURE GAMING APPLICATION DOWNLOADS GLOBALLYENSURING FAST AND SECURE GAMING APPLICATION DOWNLOADS GLOBALLY
ENSURING FAST AND SECURE GAMING APPLICATION DOWNLOADS GLOBALLY
CDNetworks
 
AWS re:Invent 2016: Building a Platform for Collaborative Scientific Research...
AWS re:Invent 2016: Building a Platform for Collaborative Scientific Research...AWS re:Invent 2016: Building a Platform for Collaborative Scientific Research...
AWS re:Invent 2016: Building a Platform for Collaborative Scientific Research...
Amazon Web Services
 
Forward Networks - Networking Field Day 13 presentation
Forward Networks - Networking Field Day 13 presentationForward Networks - Networking Field Day 13 presentation
Forward Networks - Networking Field Day 13 presentation
Forward Networks
 
Forward Networks - Networking Field Day 13 presentation
Forward Networks - Networking Field Day 13 presentationForward Networks - Networking Field Day 13 presentation
Forward Networks - Networking Field Day 13 presentation
Andrew Wesbecher
 
PipelineAI + TensorFlow AI + Spark ML + Kuberenetes + Istio + AWS SageMaker +...
PipelineAI + TensorFlow AI + Spark ML + Kuberenetes + Istio + AWS SageMaker +...PipelineAI + TensorFlow AI + Spark ML + Kuberenetes + Istio + AWS SageMaker +...
PipelineAI + TensorFlow AI + Spark ML + Kuberenetes + Istio + AWS SageMaker +...
Chris Fregly
 
Digital Transformation | AWS Webinar
Digital Transformation | AWS WebinarDigital Transformation | AWS Webinar
Digital Transformation | AWS Webinar
Amazon Web Services
 
AWS Webcast - AWS OpsWorks Continuous Integration Demo
AWS Webcast - AWS OpsWorks Continuous Integration Demo  AWS Webcast - AWS OpsWorks Continuous Integration Demo
AWS Webcast - AWS OpsWorks Continuous Integration Demo
Amazon Web Services
 
PipelineAI Optimizes Your Enterprise AI Pipeline from Distributed Training to...
PipelineAI Optimizes Your Enterprise AI Pipeline from Distributed Training to...PipelineAI Optimizes Your Enterprise AI Pipeline from Distributed Training to...
PipelineAI Optimizes Your Enterprise AI Pipeline from Distributed Training to...
Chris Fregly
 
Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...
Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...
Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...
Chris Fregly
 
Embrace NoSQL and Eventual Consistency with Ripple
Embrace NoSQL and Eventual Consistency with RippleEmbrace NoSQL and Eventual Consistency with Ripple
Embrace NoSQL and Eventual Consistency with Ripple
Sean Cribbs
 
Humans and Data Don’t Mix: Best Practices to Secure Your Cloud
Humans and Data Don’t Mix: Best Practices to Secure Your CloudHumans and Data Don’t Mix: Best Practices to Secure Your Cloud
Humans and Data Don’t Mix: Best Practices to Secure Your Cloud
Priyanka Aash
 
Patterns & Practices of Microservices
Patterns & Practices of MicroservicesPatterns & Practices of Microservices
Patterns & Practices of Microservices
Wesley Reisz
 
Science cloud foster june 2013
Science cloud foster june 2013Science cloud foster june 2013
Science cloud foster june 2013
Kirill Osipov
 
Science as a Service: How On-Demand Computing can Accelerate Discovery
Science as a Service: How On-Demand Computing can Accelerate DiscoveryScience as a Service: How On-Demand Computing can Accelerate Discovery
Science as a Service: How On-Demand Computing can Accelerate Discovery
Ian Foster
 
High Performance Computing - A Serverless Story
High Performance Computing - A Serverless StoryHigh Performance Computing - A Serverless Story
High Performance Computing - A Serverless Story
Eoin Shanaghy
 

Similar to Transferring data: best practices, Globus Online, and Compute Canada infrastructure (20)

WJAX 2019 - Taking Distributed Tracing to the next level
WJAX 2019 - Taking Distributed Tracing to the next levelWJAX 2019 - Taking Distributed Tracing to the next level
WJAX 2019 - Taking Distributed Tracing to the next level
 
osi-oss-dbs.pptx
osi-oss-dbs.pptxosi-oss-dbs.pptx
osi-oss-dbs.pptx
 
PipelineAI + AWS SageMaker + Distributed TensorFlow + AI Model Training and S...
PipelineAI + AWS SageMaker + Distributed TensorFlow + AI Model Training and S...PipelineAI + AWS SageMaker + Distributed TensorFlow + AI Model Training and S...
PipelineAI + AWS SageMaker + Distributed TensorFlow + AI Model Training and S...
 
Optimizing, Profiling, and Deploying High Performance Spark ML and TensorFlow AI
Optimizing, Profiling, and Deploying High Performance Spark ML and TensorFlow AIOptimizing, Profiling, and Deploying High Performance Spark ML and TensorFlow AI
Optimizing, Profiling, and Deploying High Performance Spark ML and TensorFlow AI
 
Streaming data for real time analysis
Streaming data for real time analysisStreaming data for real time analysis
Streaming data for real time analysis
 
ENSURING FAST AND SECURE GAMING APPLICATION DOWNLOADS GLOBALLY
ENSURING FAST AND SECURE GAMING APPLICATION DOWNLOADS GLOBALLYENSURING FAST AND SECURE GAMING APPLICATION DOWNLOADS GLOBALLY
ENSURING FAST AND SECURE GAMING APPLICATION DOWNLOADS GLOBALLY
 
AWS re:Invent 2016: Building a Platform for Collaborative Scientific Research...
AWS re:Invent 2016: Building a Platform for Collaborative Scientific Research...AWS re:Invent 2016: Building a Platform for Collaborative Scientific Research...
AWS re:Invent 2016: Building a Platform for Collaborative Scientific Research...
 
Forward Networks - Networking Field Day 13 presentation
Forward Networks - Networking Field Day 13 presentationForward Networks - Networking Field Day 13 presentation
Forward Networks - Networking Field Day 13 presentation
 
Forward Networks - Networking Field Day 13 presentation
Forward Networks - Networking Field Day 13 presentationForward Networks - Networking Field Day 13 presentation
Forward Networks - Networking Field Day 13 presentation
 
PipelineAI + TensorFlow AI + Spark ML + Kuberenetes + Istio + AWS SageMaker +...
PipelineAI + TensorFlow AI + Spark ML + Kuberenetes + Istio + AWS SageMaker +...PipelineAI + TensorFlow AI + Spark ML + Kuberenetes + Istio + AWS SageMaker +...
PipelineAI + TensorFlow AI + Spark ML + Kuberenetes + Istio + AWS SageMaker +...
 
Digital Transformation | AWS Webinar
Digital Transformation | AWS WebinarDigital Transformation | AWS Webinar
Digital Transformation | AWS Webinar
 
AWS Webcast - AWS OpsWorks Continuous Integration Demo
AWS Webcast - AWS OpsWorks Continuous Integration Demo  AWS Webcast - AWS OpsWorks Continuous Integration Demo
AWS Webcast - AWS OpsWorks Continuous Integration Demo
 
PipelineAI Optimizes Your Enterprise AI Pipeline from Distributed Training to...
PipelineAI Optimizes Your Enterprise AI Pipeline from Distributed Training to...PipelineAI Optimizes Your Enterprise AI Pipeline from Distributed Training to...
PipelineAI Optimizes Your Enterprise AI Pipeline from Distributed Training to...
 
Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...
Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...
Hyper-Parameter Tuning Across the Entire AI Pipeline GPU Tech Conference San ...
 
Embrace NoSQL and Eventual Consistency with Ripple
Embrace NoSQL and Eventual Consistency with RippleEmbrace NoSQL and Eventual Consistency with Ripple
Embrace NoSQL and Eventual Consistency with Ripple
 
Humans and Data Don’t Mix: Best Practices to Secure Your Cloud
Humans and Data Don’t Mix: Best Practices to Secure Your CloudHumans and Data Don’t Mix: Best Practices to Secure Your Cloud
Humans and Data Don’t Mix: Best Practices to Secure Your Cloud
 
Patterns & Practices of Microservices
Patterns & Practices of MicroservicesPatterns & Practices of Microservices
Patterns & Practices of Microservices
 
Science cloud foster june 2013
Science cloud foster june 2013Science cloud foster june 2013
Science cloud foster june 2013
 
Science as a Service: How On-Demand Computing can Accelerate Discovery
Science as a Service: How On-Demand Computing can Accelerate DiscoveryScience as a Service: How On-Demand Computing can Accelerate Discovery
Science as a Service: How On-Demand Computing can Accelerate Discovery
 
High Performance Computing - A Serverless Story
High Performance Computing - A Serverless StoryHigh Performance Computing - A Serverless Story
High Performance Computing - A Serverless Story
 

Transferring data: best practices, Globus Online, and Compute Canada infrastructure

  • 1. LAB MEETING— TECHNICAL TALK COBY VINER MOTIVATION BEST PRACTICES GLOBUS CC TRANS-CANADA DATA TRANSFER TRANSFER TECH. EXPERIMENT REFERENCES LAB MEETING—TECHNICAL TALK TRANSFERRING DATA BEST PRACTICES, GLOBUS ONLINE, AND COMPUTE CANADA INFRASTRUCTURE Coby Viner Hoffman Lab Wednesday January 18, 2017
  • 2–7. A NEED FOR EFFICIENT AND ROBUST DATA TRANSFER
      - Often need to transfer large amounts of data
      - Bringing computation to large datasets is not always feasible
      - Often data is “portable”, software and pipelines are not
      - Need robust (exact) data transfer
      - Want transfers to be efficient
      - Might want or need transfers to be secure
  • 8–12. BEST PRACTICES
      - Always perform a checksum whenever data is transferred to another file system
      - Use MD5 sum: md5sum <file> > <file>.md5 locally; then md5sum -c <file>.md5 on the new system, with both <file> and <file>.md5 transferred to it
      - Easily accomplished with GNU Parallel!
      - Check automatically: rsync or Globus
      - Transfer files exactly
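The checksum workflow on the slide above can be sketched as follows. This is a minimal illustration, assuming GNU coreutils md5sum; the file name reads.txt is hypothetical, and the "transfer" is simulated with a local cp between two temporary directories so that the example is self-contained.

```shell
#!/usr/bin/env bash
# Sketch: record a checksum before a transfer and verify it afterwards.
# Assumptions: GNU md5sum is available; the "remote" system is simulated
# by a second temporary directory; reads.txt is a hypothetical data file.
set -u

src_dir=$(mktemp -d)
dst_dir=$(mktemp -d)

printf 'ACGTACGT\n' > "$src_dir/reads.txt"

# On the source system: store the checksum next to the file.
(cd "$src_dir" && md5sum reads.txt > reads.txt.md5)

# Transfer both the file and its .md5 (stand-in for rsync, scp, or Globus).
cp "$src_dir/reads.txt" "$src_dir/reads.txt.md5" "$dst_dir/"

# On the destination system: check the file against the recorded checksum.
(cd "$dst_dir" && md5sum -c reads.txt.md5)
verify_status=$?

# Simulate corruption in transit to show that the check then fails.
printf 'ACGTACGA\n' > "$dst_dir/reads.txt"
(cd "$dst_dir" && md5sum -c --status reads.txt.md5)
corrupt_status=$?

echo "intact copy: exit $verify_status; corrupted copy: exit $corrupt_status"
```

Note that md5sum -c takes the .md5 file, not the data file, as its argument. For many files, one way to apply the slide's GNU Parallel suggestion would be something like parallel 'md5sum {} > {}.md5' ::: <files>, assuming GNU Parallel is installed.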
  • 13–16. BEST PRACTICES: (ALMOST) ALWAYS USE RSYNC
      rsync -avP <source path> [user@system.domain:]<destination path>
      - -a: archive mode (-rlptgoD)
      - -a does not include -H; you may want to add it if you used hard links (but you almost always want symbolic links instead!)
      - -v: verbose
      - -P: --partial --progress, but be careful with --partial (it keeps partially transferred files at the destination if a transfer is interrupted)
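The rsync invocation above can be tried locally before pointing it at a remote system. The sketch below is illustrative only: it assumes rsync is installed, uses two temporary directories in place of the [user@system.domain:] destination form shown on the slide, and invents a small project tree (peaks.bed, latest.bed) purely as example data.

```shell
#!/usr/bin/env bash
# Sketch: rsync -avP between two local temporary directories.
# Assumptions: rsync is installed; the directory layout and file names
# below are hypothetical example data, not part of the original talk.
set -u

src=$(mktemp -d)
dst=$(mktemp -d)

mkdir -p "$src/project/data"
printf 'chr1\t100\t200\n' > "$src/project/data/peaks.bed"
# A relative symbolic link; -a (via -l) copies it as a link.
ln -s data/peaks.bed "$src/project/latest.bed"

# No trailing slash on the source, so the "project" directory itself is
# created under the destination.
rsync -avP "$src/project" "$dst/"

# Optional follow-up: a dry run (-n) comparing whole-file checksums (-c)
# and itemizing (-i) anything that would still be transferred.
rsync -acni "$src/project/" "$dst/project/"
```

For a remote destination, the second argument would take the slide's form, for example user@system.domain:<destination path>. Adding -H is only needed when hard links must be preserved.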
  • 17–19. GLOBUS: SIMPLE & EFFECTIVE DATA TRANSFER
      (2017). How it works, https://www.globus.org/how-it-works
  • 20. GLOBUS: AS A SERVICE
      K. Chard et al., “Globus data publication as a service: lowering barriers to reproducible science”, in 2015 IEEE 11th international conference on e-Science, IEEE, 2015, pp. 401–410
  • 21. COMPUTE CANADA: THE TRANS-CANADA DATA HIGHWAY
      (2016). Compute Canada technology briefing, https://www.computecanada.ca/wp-content/uploads/2015/02/161125-Tech_Brief_PROOF_2016_EN_05.pdf
  • 22. MANY SITES, WITH LOTS OF STORAGE
  • 23–28. AN EMPIRICAL COMPARISON OF DATA TRANSFER TECHNOLOGIES
      - Confounded performance and reliability make this comparison much less useful...
      - Deconvolute the two when selecting a technology
      - Ensuring 100% effective reliability is paramount
      C. A. Mattmann et al., “A topical evaluation and discussion of data movement technologies for data-intensive scientific applications”, Earth Science Informatics, vol. 9, no. 2, pp. 247–262, 2016
  • 29. REFERENCES
      - (2017). How it works, https://www.globus.org/how-it-works.
      - K. Chard, J. Pruyne, B. Blaiszik, et al., “Globus data publication as a service: lowering barriers to reproducible science”, in 2015 IEEE 11th international conference on e-Science, IEEE, 2015, pp. 401–410.
      - (2016). Compute Canada technology briefing, https://www.computecanada.ca/wp-content/uploads/2015/02/161125-Tech_Brief_PROOF_2016_EN_05.pdf.
      - C. A. Mattmann, L. Cinquini, P. Zimdars, et al., “A topical evaluation and discussion of data movement technologies for data-intensive scientific applications”, Earth Science Informatics, vol. 9, no. 2, pp. 247–262, 2016.
      - J. Bresnahan, M. Link, G. Khanna, et al., “Globus GridFTP: what’s new in 2007”, in Proceedings of the first international conference on networks for grid applications, ser. GridNets ’07, pp. 17–19.
      - P. Z. Kolano, “High performance reliable file transfers using automatic many-to-many parallelization”, in, I. Caragiannis, M. Alexander, R. M. Badia, et al., Eds., 2013, pp. 463–473.