Tiny slide deck from a 5-min lightning talk covering a recent project involving live replication of 2 petabytes of scientific data.
Please leave feedback if you'd like to see this as a long-form technical blog article or conference talk, thanks!
Chris Dagdigian, Co-founder & Principal Consultant, The BioTeam, Inc.
PRACTICAL PETABYTE PUSHING
Jan 2019 / Lightning Talk / Foundation Medicine
Boston Computational Biology and Bioinformatics Meetup
Chris Dagdigian; chris@bioteam.net
30 Second Background
● 24x7 Production HPC Environment
● 100s of user accounts; 10+ power users; 50+ frequent users
● Many integrated “cluster aware” commercial apps leverage this system
● ~2 petabytes of scientific & user data (Linux & Windows clients)
● Multiple catastrophic NAS outages in 2018
○ Demoralized scientists; shell-shocked IT staff; angry management
○ Replacement storage platform procured; 100% NAS-to-NAS migration ordered
● Mandate / Mission - 2 petabyte live data migration
○ IT must re-earn trust and confidence of scientific end-users & leadership
○ User morale/confidence is low; Stability/Uptime is key; Zero Unplanned Outages
○ “Jobs must flow” -- HPC remains in production during data migration
Lightning Talk ProTip: CONCLUSIONS FIRST
Things we already knew + things we wished we knew beforehand
1. NEVER commingle “data management” & “data movement” at the same time
Clean up/manage your data BEFORE or AFTER the migration; never DURING
2. Understand vendor-specific data protection overhead upfront (especially for small files)
The new NAS needed ~20% more raw disk to store the same data, a non-trivial CapEx cost at petascale
3. Interrogate/understand your data before you move it (or buy new storage!)
Massive replication bandwidth is meaningless if you have 200+ million tiny files;
this was our real-world data movement bottleneck (a quick file-census sketch follows below)
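A cheap way to interrogate a filesystem before you move it (or buy new storage) is a simple file census. A minimal sketch, assuming GNU find; the mount point /mnt/old-nas is a hypothetical placeholder, not our actual path:

  # Total file count (on 200+ million files this crawl itself takes a long time)
  $ find /mnt/old-nas -type f | wc -l

  # How much of that is tiny files? Count everything under 64 KB
  $ find /mnt/old-nas -type f -size -64k | wc -l

If small files dominate, per-file overhead (metadata operations, rsync startup) will be your bottleneck, not network bandwidth.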
Lightning Talk ProTip: CONCLUSIONS FIRST (continued)
4. Be proactive in setting (and re-setting) management expectations
Data transfer time estimates based on aggregate network bandwidth were
insanely wrong. Real-world throughput ranged from 2 MB/sec to 13 GB/sec
(see the back-of-envelope math below)
5. Tasks that take days/weeks require visibility & transparency
Users & management will want a dashboard or progress view (a crude logging sketch follows)
6. Work against full filesystems or network shares ONLY (see tip #1 …)
Attempts to get clever with curated “exclude-these-files-and-folders” lists add
complexity and introduce vectors for human/operator error
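To see why bandwidth-based estimates mislead, here is the back-of-envelope math at the two throughput extremes above (bash integer arithmetic, illustrative only):

  # 2 PB at the best-case 13 GB/sec: roughly 42 hours
  $ echo $(( 2 * 10**15 / (13 * 10**9) / 3600 ))
  42

  # 2 PB at the worst-case small-file rate of 2 MB/sec: roughly 11,574 days (~31 years)
  $ echo $(( 2 * 10**15 / (2 * 10**6) / 86400 ))
  11574

Real transfers land somewhere in between, dominated by whichever shares hold the tiny files.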
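Even a periodic capacity snapshot beats flying blind. A minimal progress-logging sketch, assuming GNU df/paste and the same hypothetical mount points; a real dashboard would simply chart this CSV:

  # Append a timestamped used-capacity sample for both NAS mounts every 10 minutes
  while true; do
    ts=$(date +%FT%T)
    used=$(df --output=used /mnt/old-nas /mnt/new-nas | tail -n 2 | paste -sd, -)
    echo "${ts},${used}" >> /var/tmp/migration-progress.csv
    sleep 600
  done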
Materials & Methods - Tooling
● We are not special/unique in life science informatics - plagiarizing methods
from Amazon, supercomputing sites & high-energy physics is a legit strategy
● Our tooling choice: fpart/fpsync from https://github.com/martymac/fpart
○ ‘fpart’ - Does the hard work of filesystem crawling to build ‘partition’ lists that can be used as
input data for whatever tool you want to use to replicate/copy data
○ ‘fpsync’ - Wrapper script to parallelize, distribute and manage a swarm of replication jobs
○ ‘rsync’ - https://rsync.samba.org/
● Actual data replication via ‘rsync’ (managed by fpsync); a usage sketch follows this list
○ The fpsync wrapper script is pluggable and supports different data mover/copy binaries
○ We explicitly chose ‘rsync’ because it is well known, well tested and had the fewest
potential edge and corner cases to deal with
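A minimal fpsync invocation to make the division of labor concrete. The flags and partition sizes are illustrative (check the fpart/fpsync man pages for your version) and the paths are hypothetical placeholders:

  # Crawl the source, split it into partitions of at most 2,000 files or ~100 GB each,
  # then run up to 8 parallel rsync jobs over those partitions
  $ fpsync -n 8 -f 2000 -s $(( 100 * 1024**3 )) /mnt/old-nas/ /mnt/new-nas/

Re-running the same command later performs the incremental re-sync passes, since rsync skips files that are already up to date on the destination.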
Materials & Methods - Process
The Process (one filesystem or share at a time; a command-level sketch follows the list):
● [A] Perform initial full replication in background on live “in-use” file system
● [B] Perform additional ‘re-sync’ replications to stay current
● [C] Perform ‘delete pass’ sync to catch data that was deleted from the source filesystem while
replication(s) were occurring
● Repeat tasks [B] and [C] until time window for full sync + delete-pass is small enough to fit
within an acceptable maintenance/outage window
● Schedule outage window; make source filesystem Read-Only at a global level; perform final
replication sync; migrate client mounts; have backout plan handy
● Test, test, test, test, test, test (admins & end-users should both be involved in testing)
● Have a plan to document & support the previously unknown storage users who will come out of the
woodwork once you mark the source filesystem read-only (!)
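A command-level sketch of steps [A]-[C], reusing the hypothetical mount points from earlier. The delete-only rsync idiom in [C] is one standard way to do it, not necessarily the exact flags we ran:

  # [A] Initial full replication; [B] re-syncs are simply re-runs of the same command
  $ fpsync -n 8 -f 2000 /mnt/old-nas/ /mnt/new-nas/

  # [C] Delete pass: copy nothing, only remove destination files that no longer
  # exist on the (still live) source
  $ rsync -a --delete --existing --ignore-existing /mnt/old-nas/ /mnt/new-nas/

Once one [B]+[C] round fits inside an acceptable maintenance window, flip the source read-only, run a final round, and re-point client mounts.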
Wrap Up
Commercial Alternative
● If management requires fancy live dashboards & other UI candy --OR-- you have limited IT/ops capacity to
support scripted OSS tooling …
● You can purchase petascale data migration capability commercially
○ Recommendation: Talk to DataDobi (https://datadobi.com)
○ (Yes this is a different niche than IBM Aspera or GridFTP type tooling …)
Acknowledgements
● Aaron Gardner (aaron@bioteam.net)
○ One of several BioTeam infrastructure gurus with extreme storage & filesystem expertise
○ He did the hard work on this
○ I just scripted things & monitored progress #lazy
More Info/Details: If you want to see this topic expanded into a long-form blog post / technical write-up
or a BioITWorld conference talk, please let me know via email!