LINE's Infrastructure Platform: How It Scales Massive Services and Maintains Low Operational Cost

Yoshihiro Saegusa
LINE / Infra Platform Department

LINE operates over 30,000 physical servers that collectively handle over 1Tbps of internet traffic.

Approximately 100 engineers are supporting LINE's infrastructure, upon which 2,100 software engineers are building LINE's services at more than 10 development centers around the world.

This session will explore the ways in which LINE's massive infrastructure is being scaled, and how software and infrastructure engineers are able to work with greater ease.

Topics will include network design for massive traffic handling, resource management, and a deeper look at the issues that were resolved to provide highly available and scalable infrastructure.

Published in: Technology

  1. LINE'S INFRASTRUCTURE PLATFORM: HOW IT SCALES MASSIVE SERVICES AND MAINTAINS LOW OPERATIONAL COST • Infrastructure Platform Department • Yoshihiro Saegusa
  2. ABOUT ME: Yoshihiro Saegusa • Joined NHN Japan in April 2005 as a network engineer • Was responsible for networks and data centers until June 2017 • Now responsible for infrastructure platform (private cloud)
  3. LINE INFRASTRUCTURE SCOPE • Data center • Server (H/W) • Network (H/W) • Infrastructure platform • Bare metal • VM • Operating system • Libraries, binaries • Applications • Data store (Database, Storage)
  4. SESSION SCOPE • Data center • Server (H/W) • Network (H/W) • Infrastructure platform • Bare metal • VM • Operating system • Libraries, binaries • Applications
  5. LINE SCALE • User traffic: 1Tbps+ • Physical servers: 30,000+ • Engineers: 2,000+ • Dev centers: 10+
  6. CHALLENGES • High operational cost • Lack of capacity • Inefficient architecture
  7. CHALLENGES • Lack of capacity • Inefficient architecture • Network POD: Server, ToR, Distribution switch • POD Scale: 3,200 servers (10Gbps / server), 16,000 Gbps of capacity, 2N redundancy
  8. CHALLENGES • Lack of capacity • Inefficient architecture • 165,000,000+ MAU • LINE users • Load balancers • Messaging gateways • DSR
  9. CHALLENGES: High Operational Cost • Dev teams: Product Team A, Product Team B, Product Team C • REQ, REQ, REQ • Infrastructure teams: Data center, Server (H/W), Network (H/W), Bare metal, VM, Operating system, Data store
  10. OUR PRINCIPLES • Solve a problem at its root • Reduce operational cost
  11. SOLUTIONS: Basic Requirements • Provide extremely high-capacity network • Build horizontally scalable architecture • Significantly reduce operational cost
  12. SOLUTIONS: CLOS Network with Whitebox Switches • Non-blocking large-scale network • Server, Top of Rack (ToR), Leaf, Spine • POD: 7,200 servers, 72,000Gbps capacity
  13. SOLUTIONS: ToR ~ Server Configuration • Option 1: L2 (MC-LAG, BONDING) VS Option 2: L3 (BGP, BGP)
  14. CLOS NETWORK EXAMPLE • 2 Network PODs with 948 Whitebox Switches • 7,200 servers • 7,200 servers
  15. SOLUTIONS: Multi-tier Load Balancing • LINE users • L3: Anycast + ECMP • L4 • ? • Stateful load balancing • L7 (Messaging gateways)
  16. SOLUTIONS: Multi-tier Stateless Load Balancing • LINE users • L3: Anycast + ECMP • L4 • ConsistentHash(5-Tuple) • XDP • Stateless load balancing • L7 (Messaging gateways)
  17. SOLUTIONS: Reduce Operational Cost • Dev Teams: Product Team A, Product Team B, Product Team C • Infrastructure platform: API, WebUI • Infrastructure Teams: Data center, Server (H/W), Network (H/W), Bare metal, VM, Operating system, Data store
  18. SOLUTIONS: Resource Distribution • 4 VMs • 1 VIP • VM, VM, VM, VM • VIP, VIP, VIP
  19. SOLUTIONS: Network Configuration • Network A, Network B, Network C, Network D • Network POD with legacy configuration • Network POD with new configuration • VM, VM
  20. SOLUTIONS: Making a Service Public • Create a VIP with public IP • VIP, VIP, VIP • REQ • ACL
  21. LESSON LEARNED • Major Change Required • Change Takes Time
  22. OVERLAY NETWORK & PROGRAMMABILITY (WIP) • SRv6 • SRv6 underlay • Tenant A, Tenant B, Tenant C • FinTech business
  23. K8S AS A SERVICE (WIP) • Managed Kubernetes Service • GW • API • Project A, Project B, Project C
  24. EVENT HANDLER (WIP) • Operation Platform • Event sources • Events: log: ******, Every hour, VM created • Function executor • Functions • Actions: Send notification, Run test, Deploy app
  25. RECAP • Solved network challenges with CLOS network and multi-tier LB • Reducing burden on developers and infrastructure operators with Verda • We will keep challenging ourselves
  26. THANK YOU
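
The POD figures on slides 7 and 12 can be checked with simple arithmetic. The sketch below is only a rough reading of the slides: it assumes the 10Gbps per-server link from slide 7 also applies to the CLOS POD on slide 12 (the slide does not say so), and the helper names are hypothetical.

```python
def server_demand_gbps(servers: int, gbps_per_server: int) -> int:
    """Aggregate bandwidth if every server drives its link at line rate."""
    return servers * gbps_per_server


def oversubscription(servers: int, gbps_per_server: int, capacity_gbps: int) -> float:
    """Ratio of aggregate server demand to POD capacity (1.0 means non-blocking)."""
    return server_demand_gbps(servers, gbps_per_server) / capacity_gbps


# Legacy POD (slide 7): 3,200 servers, 10Gbps each, 16,000 Gbps of capacity.
legacy_ratio = oversubscription(3200, 10, 16000)

# CLOS POD (slide 12): 7,200 servers, 72,000Gbps capacity.
# Assumption: 10Gbps per server, carried over from slide 7.
clos_ratio = oversubscription(7200, 10, 72000)

print(f"legacy POD oversubscription: {legacy_ratio:.1f}:1")
print(f"CLOS POD oversubscription:   {clos_ratio:.1f}:1")
```

Under that assumption the legacy POD works out to 2:1 and the CLOS POD to 1:1, which matches the "non-blocking" label on slide 12. How the 2N redundancy on slide 7 factors into the 16,000 Gbps figure is not stated, so the legacy ratio is an estimate.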

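Slide 16 lists ConsistentHash(5-Tuple) under stateless load balancing. The following is a minimal Python sketch of that idea, using rendezvous (highest-random-weight) hashing as a stand-in; the slides do not say which consistent-hashing scheme is used, and a production data path would run in XDP rather than Python. The backend names, addresses, and function names are hypothetical.

```python
import hashlib
from typing import NamedTuple, Sequence


class FiveTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str


def _score(flow: FiveTuple, backend: str) -> int:
    """Deterministic pseudo-random weight for a (flow, backend) pair."""
    key = "|".join(map(str, flow)) + "|" + backend
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def pick_backend(flow: FiveTuple, backends: Sequence[str]) -> str:
    """Choose the backend with the highest score; no per-flow state is stored."""
    return max(backends, key=lambda b: _score(flow, b))


backends = ["gw-1", "gw-2", "gw-3", "gw-4"]  # hypothetical messaging gateways
flow = FiveTuple("203.0.113.7", 51514, "198.51.100.10", 443, "tcp")

chosen = pick_backend(flow, backends)
# Any load balancer node computes the same answer for the same flow.
assert chosen == pick_backend(flow, list(reversed(backends)))

# Removing a backend that was not chosen leaves the flow's mapping untouched.
other = next(b for b in backends if b != chosen)
remaining = [b for b in backends if b != other]
assert pick_backend(flow, remaining) == chosen
print(chosen)
```

Because the choice depends only on the packet's 5-tuple and the backend list, every node that receives a flow through ECMP picks the same gateway without sharing a connection table.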