Distcp-ng is a Gobblin-based tool for replicating massive datasets between Hadoop-compatible file systems. Its features include continuous replication of datasets, efficient file listing that reduces the number of file system calls, dataset awareness (used for prioritization and notifications), failure isolation, and operational metrics. The architecture supports copying between different source and target systems, such as HDFS, SFTP, and Hive. Distcp-ng also supports transformations, atomic publishing, and Hive table registration during copies. Future work includes running Distcp-ng as a continuous replication service.
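The dataset-level behavior summarized above (priority ordering, failure isolation, and atomic publishing) can be illustrated with a minimal sketch. This is not Gobblin's or Distcp-ng's API. It uses Python's standard library on a local file system as a stand-in for a Hadoop-compatible one. The `Dataset` class, the `copy_dataset` and `replicate` functions, the `.staging-` directory prefix, and the "lower number copies first" priority convention are all hypothetical. Each dataset is copied into a staging directory and then published with a single rename, so a reader of the target sees either the complete dataset or nothing.

```python
import os
import shutil
import tempfile
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical dataset: a named source directory with a copy priority."""
    name: str
    source_dir: str
    priority: int  # assumed convention: lower value is copied first


def copy_dataset(ds: Dataset, target_root: str) -> None:
    """Copy one dataset into a staging directory, then publish it with one rename.

    Assumes the final directory does not exist yet; an existing non-empty
    target makes os.replace raise OSError, which the caller reports as a failure.
    """
    final_dir = os.path.join(target_root, ds.name)
    staging_dir = os.path.join(target_root, f".staging-{ds.name}")  # hypothetical prefix
    shutil.rmtree(staging_dir, ignore_errors=True)  # clear leftovers from an earlier run
    try:
        shutil.copytree(ds.source_dir, staging_dir)
        # Atomic publish: a rename within one POSIX file system is all-or-nothing.
        os.replace(staging_dir, final_dir)
    except OSError:
        # Never leave a half-copied staging directory behind.
        shutil.rmtree(staging_dir, ignore_errors=True)
        raise


def replicate(datasets: list[Dataset], target_root: str) -> dict[str, str]:
    """Copy datasets in priority order, isolating failures per dataset."""
    results: dict[str, str] = {}
    for ds in sorted(datasets, key=lambda d: d.priority):
        try:
            copy_dataset(ds, target_root)
            results[ds.name] = "published"
        except OSError as exc:
            # Failure isolation: one broken dataset does not abort the others.
            results[ds.name] = f"failed: {exc.__class__.__name__}"
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        os.makedirs(os.path.join(src, "events"))
        with open(os.path.join(src, "events", "part-0"), "w") as f:
            f.write("data")
        datasets = [
            Dataset("events", os.path.join(src, "events"), priority=1),
            Dataset("missing", os.path.join(src, "missing"), priority=0),  # source absent
        ]
        print(replicate(datasets, dst))
```

In the usage example, the higher-priority `missing` dataset is attempted first and reported as failed because its source does not exist, while `events` is still copied and published. Publishing through a rename rather than writing directly into the final location is what keeps partially copied data invisible to readers.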