Efficient And Invincible Big Data Platform In LINE

Neil Tu
LINE / Data Labs

LINE's fast-growing services continuously generate many kinds of data and logs. Data scientists are tested on how quickly they can extract valuable information, the modern asset, from massive data sets, so providing a stable, safe, and efficient platform is imperative. This session discusses the foundation that enables LINE's Data Labs team to continuously produce data.

  1. Efficient and Invincible Big Data Platform in LINE
  2. About Me: Neil Tu (杜佐民)
     ● Data architect and engineer
     ● Expert on the Hadoop distributed system and its ecosystem
     ● 5+ years of experience in image processing, computer vision, and pattern recognition
  3. Agenda
     • Data Platforms Within LINE
     • Pipeline Platform
     • Analysis Platform
     • Ecosystem
  4. Data Platforms Within LINE
  5. Data Platforms
  6. Data Platforms: Big Data, Data Analysis, Mathematical Modeling, Pipeline, Machine Learning, Deep Learning, etc.; Protocolized Model, System Integrated, Streaming
  7. Data Platforms Within LINE: Tracking Service Platform, Pipeline Platform, Analysis Platform
  8. Pipeline Platform
  9. Types 30 PB 6.5M msg/sec 652
  10. Service System, ETL, Protocol definition, Data flow definition
  11. Protocolized Data Model
        message ApiAccessLog {
          string request_id = 1;
          string method = 2;
          string path = 3;
          string request_ip = 4 [(EsMapping.type) = "ip"];
          int32 status = 5;
          string contents = 6 [(EsMapping.index) = false];
          string result = 7;
          int64 event_time = 8 [(use_as_timestamp) = true];
          int64 injest_time = 9;
        }
  12. Analysis Platform
  13. Analysis Platform
  14. Tables 25 PB 550 Users 1668
  15. Data Infrastructure: BI tool, Event log, RDBMS dump, Other storages, Data hub
  16. Data Flow: RDBMS, ETL, Service data, Other storages
  17. Real-time Query: 180,000 data / sec
  18. NiFi
      ● UI
      ● Security
      ● Local backup
  19. Ecosystem
  20. Oasis, etc.
  21. Yanagishima: https://github.com/yanagishima/yanagishima
  22. LINE Analytics Reporting Tool
  23. Oasis: Interactive Data Analytics Tool
  24. Aquarium: Data Catalog Tool
  25. Aquarium: Data Catalog Tool
  26. Aquarium: Data Catalog Tool
  27. Security
  28. Office authentication, Private authentication and authorization, Gateway server, Client server
  29. Sign-up web UI, HDFS user home directory, Registration WF
  30. Monitoring
  31. Basic Monitoring
      ● JVM
      ● Net traffic
      ● Disk capacity
      ● etc.
  32. Cluster Monitoring
      ● Small files
      ● Cluster usage per user
      ● Disk usage
      ● Blocks
      ● Empty files
      ● etc.
  33. Third NameNode: NN1, NN2, NN3; JN1, JN2, JN3; Always on standby; Real-time metadata
  34. Tuning
  35. Basic Tuning
      YARN
      ● yarn.log-aggregation.retain-check-interval-seconds=86400
      ● yarn.log-aggregation.retain-seconds=172800
      Spark
      ● spark.history.fs.cleaner.enabled=true
      ● spark.history.fs.cleaner.interval=1d
      ● spark.history.fs.cleaner.maxAge=2d
  36. Basic Tuning
      Hive
      ● hive.merge.mapredfiles=true
      ● hive.merge.smallfiles.avgsize=128000000
      ● mapreduce.input.fileinputformat.split.maxsize=2147483648
      ● mapreduce.input.fileinputformat.split.minsize=134217728
      ● mapreduce.input.fileinputformat.split.minsize.per.node=134217728
      ● mapreduce.input.fileinputformat.split.minsize.per.rack=134217728
      hive> ALTER TABLE xxx PARTITION (dt='19840312') CONCATENATE;
  37. Conclusion
  38. Running a Platform
      What is required?
      ● Be patient
      How to achieve results?
      ● Trial and error
      ● Never give up
  39. THANK YOU
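The protocolized data model on slide 11 annotates fields with options such as `(EsMapping.type)`, `(EsMapping.index)`, and `(use_as_timestamp)`. The slides do not say how these options are consumed, so the following is only a sketch of one plausible reading: the options drive an Elasticsearch field mapping. The `Field` class, the `DEFAULT_ES_TYPES` table, and `to_es_properties` are hypothetical names introduced for illustration, and the default type choices are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Field:
    """Hypothetical mirror of one field of ApiAccessLog (slide 11)."""
    name: str
    proto_type: str                   # protobuf scalar type from the slide
    es_type: Optional[str] = None     # value of (EsMapping.type), if set
    es_index: Optional[bool] = None   # value of (EsMapping.index), if set
    use_as_timestamp: bool = False    # value of (use_as_timestamp)


API_ACCESS_LOG = [
    Field("request_id", "string"),
    Field("method", "string"),
    Field("path", "string"),
    Field("request_ip", "string", es_type="ip"),
    Field("status", "int32"),
    Field("contents", "string", es_index=False),
    Field("result", "string"),
    Field("event_time", "int64", use_as_timestamp=True),
    Field("injest_time", "int64"),  # spelling kept as on the slide
]

# Assumption: defaults used when a field carries no EsMapping option.
DEFAULT_ES_TYPES = {"string": "keyword", "int32": "integer", "int64": "long"}


def to_es_properties(fields):
    """Build the `properties` part of an Elasticsearch mapping."""
    props = {}
    for f in fields:
        if f.use_as_timestamp:
            es_type = "date"  # assumption: the timestamp field is indexed as a date
        else:
            es_type = f.es_type or DEFAULT_ES_TYPES[f.proto_type]
        prop = {"type": es_type}
        if f.es_index is not None:
            prop["index"] = f.es_index
        props[f.name] = prop
    return props


if __name__ == "__main__":
    for name, prop in to_es_properties(API_ACCESS_LOG).items():
        print(name, prop)
```

Keeping the mapping hints next to the field definitions means a single schema file can describe both the wire format and how each field should be indexed.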
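Slide 32 lists small files, empty files, and usage per user among the cluster-level metrics. The slides do not show how these are collected; the sketch below assumes a file listing is already available as `(owner, path, size_in_bytes)` tuples and only illustrates the aggregation. The function name, the input shape, and the 128 MiB threshold (borrowed from the minimum split size on slide 36) are all assumptions.

```python
from collections import Counter

# Assumption: files below 128 MiB count as "small".
SMALL_FILE_THRESHOLD = 134217728


def summarize_files(files, threshold=SMALL_FILE_THRESHOLD):
    """Aggregate per-owner counts of empty and small files plus bytes used.

    `files` is an iterable of (owner, path, size_in_bytes) tuples.
    """
    empty = Counter()
    small = Counter()
    usage = Counter()
    for owner, _path, size in files:
        usage[owner] += size
        if size == 0:
            empty[owner] += 1
        elif size < threshold:
            small[owner] += 1
    return {
        "empty": dict(empty),
        "small": dict(small),
        "usage_bytes": dict(usage),
    }


# Hypothetical listing, for illustration only.
SAMPLE_LISTING = [
    ("alice", "/user/alice/a.orc", 0),
    ("alice", "/user/alice/b.orc", 1024),
    ("alice", "/user/alice/c.orc", 268435456),
    ("bob", "/user/bob/d.orc", 4096),
]

if __name__ == "__main__":
    print(summarize_files(SAMPLE_LISTING))
```

Empty files are counted separately from small ones because they usually call for a different fix: deletion rather than merging.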
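The tuning values on slides 35 and 36 are given in raw seconds and bytes. As a reading aid, this small sketch converts them to days and MiB; the helper names are illustrative and the settings dictionary simply copies the numbers from the slides.

```python
# Values copied from slides 35 and 36.
RETENTION_SECONDS = {
    "yarn.log-aggregation.retain-check-interval-seconds": 86400,
    "yarn.log-aggregation.retain-seconds": 172800,
}
SIZE_BYTES = {
    "hive.merge.smallfiles.avgsize": 128000000,
    "mapreduce.input.fileinputformat.split.maxsize": 2147483648,
    "mapreduce.input.fileinputformat.split.minsize": 134217728,
}


def seconds_to_days(seconds):
    """Convert seconds to days (86,400 seconds per day)."""
    return seconds / 86400


def bytes_to_mib(num_bytes):
    """Convert bytes to mebibytes (2**20 bytes per MiB)."""
    return num_bytes / 2**20


if __name__ == "__main__":
    for key, value in RETENTION_SECONDS.items():
        print(f"{key} = {seconds_to_days(value):g} day(s)")
    for key, value in SIZE_BYTES.items():
        print(f"{key} = {bytes_to_mib(value):g} MiB")
```

Read this way, aggregated YARN logs are kept for two days and checked once a day, which lines up with the two-day `maxAge` and one-day `interval` of the Spark history cleaner. The Hive merge threshold is a decimal 128 MB (about 122 MiB), while the split sizes are exact powers of two: 128 MiB minimum and 2 GiB maximum.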