The Practice of Alluxio in Ctrip Bigdata Platform

Published on

10/27/2018

Published in: Software

  1. 1. Alluxio
  2. 2. OPS/IT/CC • Spark • Spark SQL bug fix
  3. 3. • Alluxio
  4. 4. JStorm, Spark Streaming, Flink, Hermes, Kafka, Hadoop, Hive, Spark, HBase, Presto, Kylin, Alluxio, Adhoc
  5. 5. 1600+ 5 30 Application/ 90PB / 100PB: ns1: 47, ns2: 41.5; 200TB 440 1000+; Spark:hive 1:2; 13+; JStorm: 150+, 350+ Application; Spark Streaming: 50+, 150+ Application; Flink: 10+, 20+ Application
  6. 6. 1. Spark Streaming: NameNode, RM; NameNode, 10, Trash; Spark Streaming
  7. 7. 2. Kerberos KDC: read(hdfs://ns/xxx), save(hdfs://ns/xxx): RDD, DStream, spark.sql("insert tableName")
  8. 8. 2.1 read, save(), Spark: spark.yarn.access.namenodes or spark.yarn.access.hadoopFileSystems; ns-prod, ns; ResourceManager, ns
  9. 9. 2.2 spark.sql("hive insert into or overwrite sql"): ns, namespace; Hive namespace, ns. 1. hdfs-site.xml, ns 2. spark-defaults.conf 3. fs.defaultFS, yarn, spark
  10. 10. 3. NameNode metadata, RPC
  11. 11. 3.1 metaData: Federation, yarn, Spark, ns2; merge (6.7); Hive ttl (77PB); tmp (50)
  12. 12. 3.1 metaData: Spark Streaming
  13. 13. 3.2 RPC: yarn-app-log, jobhistory, yarn-staging-dir; spark.yarn.stagingDir, spark.history.fs.logDirectory, spark.eventLog.dir, spark.yarn.archive; ns2, ns1; RPC, Mover, Balancer; RPC, Standby
  14. 14. 1. Spark 2. Streaming 3. NN RPC
  15. 15. RT HDFS, Prod HDFS, Alluxio, Spark SQL, Spark Streaming, Kafka, hermes, qmq, mysql, ES, etl, report
  16. 16. 1. Alluxio unified-namespace, mount; Spark, Alluxio: alluxio client jar, fs.alluxio.impl, alluxio FileSystem. alluxio://master:port/: data1, data2, adhoc, report, art; hdfs://ns2/: art, adminadmin; hdfs://ns1/: adhoc, report; mount
  17. 17. 2. Streaming: Alluxio TTL; Streaming, Alluxio, HDFS Super, HDFS 777, mount, HDFS Streaming
  18. 18. 3. NameNode RPC: Cache, Alluxio; NameNode RPC, Alluxio. Spark SQL: hive-testbench (https://github.com/gjhkael/hive-testbench); Spark 2.2.0 on YARN, 5 nodemanager, hive: 20GB; spark.executor.memory 8g, spark.executor.cores 4, spark.executor.instances 5
  19. 19. 3. Spark SQL: 1.5
      Query     HDFS      Alluxio
      query3    29.441    23.267
      query43   31.116    12.431
      query55   34.71     12.097
      query58   68.412    45.339
      query73   19.98     17.342
      query82   28.438    8.416
  20. 20. Alluxio 1.4: 2 master + 4 Worker, 400GB Mem, 800GB HDD
  21. 21. 1. Alluxio 1.4 impersonate: HDFS Super, alluxio.security.login.username; Alluxio, HDFS 777, HDFS mount
  22. 22. 2. TTL, Free: 1.4 Alluxio TTL, Streaming; TTL, Free; Path TTL, Free; TtlAction: DELETE, FREE; CreateFileOptions, SetAttributeOptions
  23. 23. 3. Alluxio, HDFS: Load; Alluxio, HDFS: makeConsistency
  24. 24. 1. Presto Adhoc 2.
  25. 25. Q&A Thanks
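
The Spark SQL comparison on slide 19 lists one timing per query for HDFS and for Alluxio. As a rough aid for reading those numbers, the sketch below computes the HDFS-to-Alluxio ratio for each query. The timings are copied from the slide; the transcript does not state their unit, and the function name `speedups` is only illustrative, not something from the talk.

```python
# Timings transcribed from slide 19; the unit is not stated in the transcript.
hdfs = {
    "query3": 29.441, "query43": 31.116, "query55": 34.71,
    "query58": 68.412, "query73": 19.98, "query82": 28.438,
}
alluxio = {
    "query3": 23.267, "query43": 12.431, "query55": 12.097,
    "query58": 45.339, "query73": 17.342, "query82": 8.416,
}


def speedups(baseline, candidate):
    """Return baseline/candidate per query; a value above 1 means candidate was faster."""
    return {query: baseline[query] / candidate[query] for query in baseline}


ratios = speedups(hdfs, alluxio)

# Print queries from the largest ratio to the smallest.
for query, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{query}: {ratio:.2f}x")
```

With these figures every ratio is above 1, and query82 shows the largest gap between the two runs.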
