Cassandra @ Yahoo Japan | Cassandra Summit 2016

Cassandra Summit 2016
Day2 Conference - Thursday, September 8, 2016
3:00 PM – 3:35 PM, Room: 212
Cassandra @ Yahoo Japan
Satoshi Konno: Yahoo Japan Corporation


  1. 1. Cassandra @
  2. 2. About me • Satoshi Konno (http://www.cybergarage.org) • Engineering Manager of NoSQL Team @ Yahoo! Japan • Open Source Software Developer for Virtual Reality, IoT and Cloud Computing • Doctor's Course Student @ JAIST Défago Lab: The φ accrual failure detector
  3. 3. Agenda • Company Profile • Summary of C* Clusters • Issues and Solutions of C* • Next Generation Infrastructures for C*
  4. 4. Company Profile 4
  5. 5. Company Profile • Founded: January 31, 1996 • Businesses: Internet Advertising, e-Commerce, Members Services, etc. • Web Services: 100+ • Smartphone Apps: 50+ (iOS), 50+ (Android) • Employees: 5,800+ (as of June 30, 2016) • Head Office: Chiyoda-ku, Tokyo, Japan
  6. 6. Shareholder Composition • An independent and public company in the Japanese market • Chart labels: U.S. 35.5%, Japan 42.9%; market caps of $22 billion, $29 billion and $60 billion
  7. 7. 18th Largest Internet Company by market cap • Bar chart, axis from 0 to 600 billion U.S. dollars • Source: http://www.statista.com/statistics/277483/market-value-of-the-largest-internet-companies-worldwide/
  8. 8. Continued Growth Sustained • 19 years • Revenue ¥652B, Operating Income ¥171B (FY2015)
  9. 9. Revenue Portfolio (FY2015) • Pie chart with segments Consumer, Marketing Solutions and Others (shares shown: 60%, 32%, 8%)
  10. 10. Extensive Reach to a Wide Range of Users • 80% of all Japanese Internet users use Yahoo! JAPAN • Source: Nielsen NetView, June 2015, data by brands; access from home and work using PCs (excl. internet applications)
  11. 11. Many Strong Services 11 Media US Search Video Answer Mail JP US JP Membership C2C Payment C2C EC B2C EC Local Search Knowledge search MailNews YAHUOKU!Premium Wallet Loco
  12. 12. Summary of C* Clusters 12
  13. 13. Yahoo! JAPAN Database Platforms 13 300+ Systems NoSQL Team 100+ Services
  14. 14. OSS Database Platforms 14 300+ Systems 180 Systems MySQL 630 DBs 100 Systems Cassandra 130 DBs 30 70 60 40 Yahoo Japan NoSQL Team RDB Team
  15. 15. Cassandra @ Yahoo! JAPAN 15 2010 2012 2014 2016 Service Departments Our Team 0.5 0.8 1.x 0.8 1.x 2.x 3.x NoSQL Team
  16. 16. Our Cassandra Clusters 16 30 Clusters 30TB Usages 1000+ Nodes 300,000 Read/sec 100,000 Write/sec 2016 10 Nodes / Cluster 160 Nodes / Cluster … 1 Shared Cluster 30 Special Clusters 30 Systems 50 Systems 3 DCs
  17. 17. Our Use Case Summary on Cassandra 17 100 Systems 20 Database Caching 10 Advertising Services 40 User Databases 50 Service Databases Browsing History Impression Data ・・・・ Meta Data Aggregated Data ・・・・ Generated Data Session Data Meta Data Aggregated Data ・・・・ Generated Data Recommendation Demographic Data Life Log ・・・・ Preference Data Behavior History
  18. 18. Our Issues and Solutions 18
  19. 19. ISSUE #1 : C10k Problem – C* Proxy • 6.8 billion PV / month: PC + Tablet 3.36B PV, Smart Device 3.45B PV
  20. 20. ISSUE #1 : C10k Problem – C* Proxy • Yahoo Japan Services: 10 to 200 front-end servers per service (PHOTO: AFLO)
  21. 21. ISSUE #1 : C10k Problem – C* Proxy • PROBLEM : 200 front-end servers * 128 processes * 2 (C* request + C* heartbeat) = 51,200 connections / node (PHOTO: AFLO)
  22. 22. ISSUE #1 : C10k Problem – C* Proxy • PROBLEM : 200 front-end servers * 128 processes * 2 (C* request + C* heartbeat) = 51,200 connections / node (PHOTO: AFLO)
  23. 23. ISSUE #1 : C10k Problem – C* Proxy • PROBLEM : 200 front-end servers * 128 processes * 2 (C* request + C* heartbeat) = 51,200 connections / node • Process down (PHOTO: AFLO)
  24. 24. ISSUE #1 : C10k Problem – C* Proxy • SOLUTION : The 128 processes on each front-end server share 1 proxy, so only the proxy connects to C*: 200 front-end servers * 1 proxy * 2 (C* request + C* heartbeat) = 400 connections / node (PHOTO: AFLO)
  25. 25. ISSUE #2 : Bootstrap Problem - Driver • Heavy Services : over 3,000 qps/node = C* cluster with real servers (SSD is recommended; CPU = good) • Light Services : under 1,000 qps/node and under 3GB/node = C* cluster with virtual servers on OpenStack (vCPU = cheap)
  26. 26. ISSUE #2 : Bootstrap Problem - Driver • PROBLEM : When a new C* node is added to the cluster, all processes on every front-end server try to connect to it at the same time. (PHOTO: AFLO)
  27. 27. ISSUE #2 : Bootstrap Problem - Driver • PROBLEM : C* authentication is based on BCrypt, which is heavy processing for the vCPU nodes. (PHOTO: AFLO)
  28. 28. ISSUE #2 : Bootstrap Problem - Driver • PROBLEM : Most processes cannot connect to the C* clusters on OpenStack because of the authentication processing; they time out and retry endlessly without waiting, and all vCPU usage reaches 100%. (PHOTO: AFLO)
  29. 29. ISSUE #2 : Bootstrap Problem - Driver • SOLUTION : Improve the C* drivers so that processes do not reconnect simultaneously when a connection fails. (PHOTO: AFLO)
  30. 30. ISSUE #3 : Multi-tenancy – Slow Query • Small Services : (under 500qps and under 10GB) / keyspace = Shared C* cluster with real servers • Shared cluster: 50 services
  31. 31. ISSUE #3 : Multi-tenancy – Slow Query • PROBLEM : We could not identify which service was issuing the high-load queries in the multi-tenant cluster. (PHOTO: AFLO)
  32. 32. ISSUE #3 : Multi-tenancy – Slow Query • SOLUTION : CASSANDRA-12403 - Slow query detecting • Diagram: a slow query is detected and the service is removed from the shared cluster to a special cluster (PHOTO: AFLO)
  33. 33. ISSUE #4 : Multi-racking – Inbound Params • PROBLEM : Our C* clusters are built alongside other services in the same rack or under the same core switch. (PHOTO: AFLO)
  34. 34. ISSUE #4 : Multi-racking – Inbound Params • PROBLEM : C* streaming occurs when a node is added or removed, either by our operations or by failure detection. (PHOTO: AFLO)
  35. 35. ISSUE #4 : Multi-racking – Inbound Params • PROBLEM : C* streaming generates heavy traffic, which disrupts the other services. • Diagram: stream_throughput_outbound on each streaming node; "Stop C* streaming!" (PHOTO: AFLO)
  36. 36. ISSUE #4 : Multi-racking – Inbound Params • SOLUTION : CASSANDRA-11303 - New inbound throughput parameters for streaming • Diagram: stream_throughput_inbound alongside stream_throughput_outbound on each node (PHOTO: AFLO)
  37. 37. Next Generation Infrastructures for C* 37
  38. 38. OpenStack @ Yahoo! JAPAN • PURPOSE : To abstract our data center resources using OpenStack. • 50,000+ instances • Diagram: Apps, Platforms and Infrastructures layers connected through APIs
  39. 39. Trial #1 : Special Hypervisor for C* • PROBLEM : Our OpenStack hypervisors host both C* VMs and other service VMs (noisy neighbours).
  40. 40. Trial #1 : Special Hypervisor for C* • SOLUTION : Trying to offer dedicated hypervisors that run only C* VMs. • Optimal flavors for C*: vCPU 8+, Mem 16GiB+, SSD 100GiB+ • Network: 10Gbps x 2
  41. 41. Trial #2 : Bare Metal Clusters for C* • PROBLEM : OpenStack vCPUs are too weak to run a C* node in our service environment, which involves many connections (BCrypt authentication is heavy on vCPUs).
  42. 42. Trial #2 : Bare Metal Clusters for C* • SOLUTION : Trying to offer dedicated bare metal clusters that run only C*, using OpenStack Ironic. • Hardware: Xeon D-1541 2.1GHz (1 CPU), 32GB memory, SATA SSD 400GB, 10Gbps x 2
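
The connection counts in ISSUE #1 (slides 21 to 24) can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name `connections_per_node` is made up for illustration, and it assumes one proxy per front-end server, as the diagram labels suggest.

```python
def connections_per_node(front_end_servers: int, clients_per_server: int,
                         connections_per_client: int = 2) -> int:
    """Connections a single C* node receives when every client on every
    front-end server connects to it.

    connections_per_client defaults to 2, matching the slides'
    "C* request + C* heartbeat" pair.
    """
    return front_end_servers * clients_per_server * connections_per_client


# Without a proxy: each of the 128 processes per server connects directly.
direct = connections_per_node(front_end_servers=200, clients_per_server=128)

# With a proxy (assumed: one per front-end server): the 128 local processes
# share it, so the C* node sees a single client per server.
proxied = connections_per_node(front_end_servers=200, clients_per_server=1)

print(direct)             # 51200
print(proxied)            # 400
print(direct // proxied)  # 128
```

The reduction factor equals the number of processes per server, because the proxy collapses them into one client from the node's point of view.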
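
For ISSUE #2 (slides 26 to 29), the slides do not say how the drivers were changed. One common way to keep many clients from reconnecting in lockstep is exponential backoff with random jitter; the sketch below illustrates that general technique and is not the team's actual driver patch. The function name `reconnect_delays` and the `base` and `cap` values are hypothetical.

```python
import random
from typing import List, Optional


def reconnect_delays(attempts: int, base: float = 0.5, cap: float = 30.0,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Return one wait time (in seconds) per reconnection attempt.

    Each delay is drawn uniformly from [0, min(cap, base * 2**attempt)],
    so clients that failed at the same moment spread their retries out
    instead of hitting the new node (and its BCrypt authentication)
    all at once.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng.uniform(0.0, ceiling))
    return delays


# Two processes that fail at the same time get different retry schedules.
process_a = reconnect_delays(5, rng=random.Random(1))
process_b = reconnect_delays(5, rng=random.Random(2))
print(process_a)
print(process_b)
```

A client would sleep for the next delay before each retry rather than retrying immediately, which is the "retry without waiting" behaviour described on slide 28.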
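
For ISSUE #4 (slides 33 to 36), one plausible reading of the diagrams is that a per-sender outbound limit does not bound what a single receiving node takes in when several nodes stream to it at once, which is what an inbound limit addresses. The sketch below only illustrates that arithmetic; the function name, the three senders and the 200 Mbps figures are hypothetical and are not taken from the slides or from Cassandra's configuration.

```python
from typing import Optional


def receiver_inbound_mbps(senders: int, outbound_cap_mbps: float,
                          inbound_cap_mbps: Optional[float] = None) -> float:
    """Worst-case streaming rate arriving at one receiving node.

    Each sender is throttled to outbound_cap_mbps; the receiver is
    optionally throttled to inbound_cap_mbps.
    """
    aggregate = senders * outbound_cap_mbps
    if inbound_cap_mbps is None:
        return aggregate
    return min(aggregate, inbound_cap_mbps)


# Three nodes streaming to one receiver, each capped at 200 Mbps outbound.
without_inbound_cap = receiver_inbound_mbps(3, 200.0)
with_inbound_cap = receiver_inbound_mbps(3, 200.0, inbound_cap_mbps=200.0)

print(without_inbound_cap)  # 600.0
print(with_inbound_cap)     # 200.0
```

With only outbound limits, the traffic arriving at the receiver's rack grows with the number of senders; an inbound limit keeps it fixed regardless of how many nodes stream.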
