
Cross-DC Fault-Tolerant ViewFileSystem @ Twitter


Published in: Technology


  3. Command Line Interface
     hadoop --config /etc/hadoop/revenue-dcA fs -ls /
     hadoop --config /etc/hadoop/dw-dcB fs -ls /
     ...
     Directory layout on the local file system:
     /etc/hadoop/
       revenue-dcA/  (core-site.xml, hdfs-site.xml)
       dw-dcB/       (core-site.xml, hdfs-site.xml)
       ...
     DataCenter A: Log processing (MR, YARN, HDFS), Revenue (MR, YARN, HDFS)
     DataCenter B: Data Warehouse (MR, YARN, HDFS), Log processing (MR, YARN, HDFS)
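The per-cluster config directories above differ mainly in where the client is pointed. A minimal sketch of what one such client config might contain (the nameservice URI is taken from the later slides; the exact contents are an assumption):

```xml
<!-- /etc/hadoop/dw-dcB/core-site.xml (sketch) -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://dw-nn-dcB</value>
  </property>
</configuration>
```

Passing `--config /etc/hadoop/dw-dcB` simply makes the hadoop CLI load this directory instead of the default one, so the same command targets a different cluster.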
  6. DW @ dcB: HDFS blocks
     Client machine: /etc/hadoop/dw-dcB/hdfs-site.xml, core-site.xml
  7. DW @ dcB: HDFS blocks
     Client machine: hadoop --config /etc/hadoop/dw-dcB fs -get /user/bob/fileB
     DistributedFileSystem: fs.defaultFS -> hdfs://dw-nn-dcB -> NameNode1, NameNode2
     /etc/hadoop/dw-dcB/hdfs-site.xml, core-site.xml
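The fs.defaultFS -> logical nameservice -> NameNode1/NameNode2 resolution on this slide is standard HDFS HA client configuration. A hedged sketch of the hdfs-site.xml entries that would produce it (host names and ports are invented for illustration):

```xml
<!-- /etc/hadoop/dw-dcB/hdfs-site.xml (sketch; hosts/ports are assumptions) -->
<configuration>
  <property><name>dfs.nameservices</name><value>dw-nn-dcB</value></property>
  <property><name>dfs.ha.namenodes.dw-nn-dcB</name><value>nn1,nn2</value></property>
  <property><name>dfs.namenode.rpc-address.dw-nn-dcB.nn1</name>
    <value>namenode1.dcB.example.com:8020</value></property>
  <property><name>dfs.namenode.rpc-address.dw-nn-dcB.nn2</name>
    <value>namenode2.dcB.example.com:8020</value></property>
  <property><name>dfs.client.failover.proxy.provider.dw-nn-dcB</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>
</configuration>
```

With these entries the client treats `hdfs://dw-nn-dcB` as a logical URI and fails over between the two NameNodes transparently.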
  8. DW @ dcB
  9. Mount table (mountable.xml):
       App path                       HDFS path
       viewfs://dw-nn-dcB/var   ->    hdfs://dw-tmp-dcB/var
       viewfs://dw-nn-dcB/user  ->    hdfs://dw-user-dcB/user
       viewfs://dw-nn-dcB/logs  ->    hdfs://dw-log-dcB/logs
     FileSystem implementations: ViewFileSystem (viewfs://), DistributedFileSystem (hdfs://), S3 FileSystem
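Hadoop's ViewFileSystem reads its mount points from `fs.viewfs.mounttable.<table>.link.<path>` properties. A sketch of a mount table matching the three links above (assuming the table name matches the viewfs authority `dw-nn-dcB`):

```xml
<!-- mountable.xml sketch of the table above -->
<configuration>
  <property><name>fs.defaultFS</name><value>viewfs://dw-nn-dcB</value></property>
  <property><name>fs.viewfs.mounttable.dw-nn-dcB.link./var</name>
    <value>hdfs://dw-tmp-dcB/var</value></property>
  <property><name>fs.viewfs.mounttable.dw-nn-dcB.link./user</name>
    <value>hdfs://dw-user-dcB/user</value></property>
  <property><name>fs.viewfs.mounttable.dw-nn-dcB.link./logs</name>
    <value>hdfs://dw-log-dcB/logs</value></property>
</configuration>
```

Each link maps one directory of the client-side viewfs namespace onto a physical hdfs:// (or other scheme) URI, which is how one cluster can federate several NameNodes behind a single namespace.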
  10. DW @ dcB
      Client machine: hadoop --config /etc/hadoop/dw-dcB fs -get /user/bob/fileB
      ViewFileSystem: /user -> hdfs://dw-user-dcB/user
      DistributedFileSystem: dw-user-dcB -> dw-user-NN1-dcB, dw-user-NN2-dcB
      /etc/hadoop/dw-dcB/hdfs-site.xml, core-site.xml, mountable.xml
  11. DataCenter A: Revenue (MR, YARN, HDFS)
      DataCenter B: Data Warehouse, Revenue (MR, YARN, HDFS), ...
  12. hadoop --config /etc/hadoop/revenue-dcA fs -get /user/bob/fileA
  13. hadoop --config /etc/hadoop/revenue-dcA fs -get /user/bob/fileA
      hdfs --config /etc/hadoop/dw-dcB fsck -fs hdfs://dw-user-dc2 /
  14. hadoop --config /etc/hadoop/revenue-dcA fs -get /user/bob/fileA
      hdfs --config /etc/hadoop/dw-dcB fsck -fs hdfs://dw-user-dc2 /
      # find all "fileC" files on all clusters
      for i in `ls /etc/hadoop`; do
        hadoop --config /etc/hadoop/$i fs -ls fileC
      done
  16-18. Copying across clusters: how to express $sourcePath
       Description        $sourcePath                                       Result
       global namespace   viewfs://revenue-nn-dcA/logs/dirB /logs/dirB      source path unresolvable
       active namenode    hdfs://revenue-log-nn1-dcA/logs/dirB /logs/dirB   not reliable due to hard-coded namenode
       hftp & DNS alias   hftp://revenue-nn-dcA/logs/dirB                   hftp reliability and efficiency
  20. viewfs://cluster-dc/
        /user
        /tmp
        /log
  21. ClusterA @ DC1: root /
  23. scalding> fsShell("-ls /dc*/") // list all clusters
      Found N items ...
  24. // no need to remember copy(From/To)Local aka get/put
      scalding> fsShell("-cp /local/user/cecily/file.txt /user/cecily/hdfs_file.txt")
      res6: Int = 0
  25. Forward FileSystem API calls to all target filesystems with certain policies.
  26. client@dc1:
        FileSystem nfly = FileSystem.get(conf);
        out = nfly.create("/nfly/C/user/cecily/file.txt");
        out.write(...);
        out.close();
      Between out.close() BEGIN and out.close() END, each replica at dc1 and dc2
      writes /user/cecily/_nfly_tmp_file.txt, then on close renames it to
      /user/cecily/file.txt.
        in = nfly.open("/nfly/C/user/cecily/file.txt");
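The write protocol on this slide (fan the create out to every target, stage the bytes in a hidden `_nfly_tmp_` file, commit with a rename on close) can be sketched in plain Java. This is a toy model using local directories to stand in for the per-DC filesystems; the class and method names are illustrative, not Hadoop's actual Nfly implementation:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

/** Toy sketch of an nfly-style replicated write: temp file + atomic rename. */
public class NflySketch {
    private final List<Path> targets; // one directory per "data center"

    public NflySketch(List<Path> targets) {
        this.targets = targets;
    }

    /** Write data under a temp name on every target, then atomically rename. */
    public void create(String name, byte[] data) throws IOException {
        for (Path dc : targets) {
            Files.createDirectories(dc);
            Path tmp = dc.resolve("_nfly_tmp_" + name);
            Files.write(tmp, data);                        // out.write(...)
            Files.move(tmp, dc.resolve(name),              // close -> rename
                       StandardCopyOption.ATOMIC_MOVE);
        }
    }

    /** Read from the first replica that has the file committed. */
    public byte[] open(String name) throws IOException {
        for (Path dc : targets) {
            Path p = dc.resolve(name);
            if (Files.exists(p)) {
                return Files.readAllBytes(p);
            }
        }
        throw new NoSuchFileException(name);
    }

    public static void main(String[] args) throws IOException {
        Path dc1 = Files.createTempDirectory("dc1");
        Path dc2 = Files.createTempDirectory("dc2");
        NflySketch nfly = new NflySketch(List.of(dc1, dc2));
        nfly.create("file.txt", "hello".getBytes());
        System.out.println(new String(nfly.open("file.txt"))); // prints hello
    }
}
```

Because readers only ever see the post-rename name, a concurrent `open()` never observes a half-written file.txt, which is the point of the close-then-rename step on the slide.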
