
About Rekcurd, a delivery and management platform for machine learning modules running on Kubernetes

Slides presented at JAPAN CONTAINER DAYS V18.12 (https://containerdays.jp/).

Published in: Technology

  1. Kubernetes Drucker Rekcurd. Keigo Hattori, Search & Clova Center
  2. • • → Rekcurd • Kubernetes Deployment Pod • Kubernetes → Rekcurd
  3. Who I am: Keigo Hattori (@keigohtr), Software Engineer. 2009, 2009~2017.10, 2017.11~, LINE x Clova, Apitore
  4. Join us! • Rekcurd is OSS on GitHub: https://github.com/rekcurd (Issues and Pull Requests) • We are hiring!
  5. Agenda: • Introduction • Kubernetes Rekcurd
  6. Introduction: Clova
  7. TITLE
  8. subtitle TITLE
  9. Clova
  10. Kubernetes Rekcurd
  11. • • → Rekcurd • Kubernetes Deployment Pod • Kubernetes → Rekcurd
  12. Background: SVM, Deep Learning, Machine Learning, Keras, Chainer, Caffe, scikit-learn, gensim, logistic regression, Random Forest, Neural Network, Perceptron, libsvm, liblinear, Theano
  13. JupyterLab, TensorBoard, ChainerUI
  14. Tasks in building ML model: 1. Data (i. Collection; ii. Cleaning/Cleansing) 2. Feature (i. Preprocessing; ii. Dictionary) 3. Training (i. Algorithm; ii. Parameter tuning; iii. Evaluation) 4. Others (i. Server setup; ii. Versioning of data, parameter, model, result)
  15. • ✓ ✓ ✓
  16. • ✓ ✓ ✓
  17. Tasks in serving ML service (Rekcurd x Kubernetes): 1. High Availability 2. Management (i. Upload the latest ML model; ii. Switch a model without stopping services; iii. Versioning models) 3. Monitor (i. Load balancing; ii. Auto healing; iii. Auto scaling; iv. Performance/Results check) 4. Others (i. Server setup for development/staging/production; ii. Integration with the existing services; iii. AB testing; iv. Managing many ML services; v. Logging)
  18. Rekcurd • • •
  19. Rekcurd: • Rekcurd • Rekcurd dashboard (x Kubernetes) • Rekcurd client
  20. Rekcurd
  21. • pip install Flask ✓ FW • gRPC ✓ gRPC spec gRPC Rekcurd
  22. Rekcurd • Input format (Field | Type | Description): input | string/bytes/list | Tensor; option | string (json format) | Rekcurd option fields, key-value • Output format (Field | Type | Description): label | string/bytes/list | Tensor; score | float/list | multi-class; option | string (json format) | Rekcurd option fields, key-value
  23. Rekcurd dashboard
  24. Rekcurd dashboard • Rekcurd ✓ Rekcurd ✓ • WebUI ✓ Kubernetes
  25. Top Application =
  26. Add Application (1/3)
  27. Add Application (2/3) • Base Image • Git • Git Branch • • CPU/Memory
  28. Add Application (3/3) • • Autoscale Policy • Deploy Policy •
  29. Add Application
  30. Application /
  31. Add Model ML
  32. Add Model
  33. Switch Model ML
  34. Switch Model Kubernetes
  35. Service URL
  36. Model
  37. Service/Model (1/3) /
  38. Service/Model (2/3)
  39. Service/Model (3/3)
  40. Rekcurd client
  41. • pip install SDK ✓ URL ✓ gRPC spec gRPC • Netflix Feign ✓ DNS + App + Kubernetes Rekcurd Rekcurd client
  42. Kubernetes gRPC
  43. Kubernetes gRPC No Rekcurd Rekcurd dashboard Kubernetes
  44. Kubernetes
  45. Rancher ✓ LINE Private Cloud Kubernetes Rancher 2.x ✓ Clova Rancher 1.6 Kubernetes Rekcurd Rekcurd GCP/AWS/kubeadm
  46. HA /
  47. Kubernetes • Auto healing ✓ Deployment Pod • Auto scaling ✓ HorizontalPodAutoscaler Pod • Rolling update ✓ Deployment Pod ✓ Pod Rekcurd dashboard
  48. Dev Prod Node HA Node Dev Prod
  49. Kubernetes: Node (SL: dev), Node (SL: stg), Node (SL: prod), Node (SL: prod); namespace x node selector x affinity; Pod (App: hoge, SL: dev), Pod (App: hoge, SL: stg), Pod (App: hoge, SL: prod), Pod (App: hoge, SL: prod), Pod (App: hoge, SL: prod); Rekcurd dashboard
  50. • namespace ✓ service level namespace Pod • node selector ✓ node service level service level node Pod • affinity ✓ Pod application name HA node Pod namespace x node selector x affinity Rekcurd dashboard
  51. gRPC Load Balancing Rancher gRPC Load balance
  52. Ingress • nghttpx ingress controller ✓ http2 Load balancer (nginx ingress controller) Rekcurd Service mesh (Istio) • Ingress Load balancer annotation ✓ DNS Host name routing http://<app-name>-<service-level>.<domain> Rekcurd dashboard
  53. fluentd logger
  54. ✓ fluentd Kubernetes DaemonSet ✓ Pod stdout/stderr Forwarding ✓ Clova Kibana logger fluentd-kubernetes
  55. Node/Pod Docker image image Node Disk full
  56. Online Storage. Kubernetes: Node (SL: dev), Node (SL: stg), Node (SL: prod), Node (SL: prod); MySQL & Online storage; Pod (App: hoge, SL: dev), Pod (App: hoge, SL: stg), Pod (App: hoge, SL: prod), Pod (App: hoge, SL: prod); Rekcurd dashboard
  57. • Node Online storage (e.g. WebDAV) ✓ LINE AWS S3 Private online storage ✓ goofys Node • Pod Node volume ✓ Node Online storage • Application ML MySQL ✓ Pod DB MySQL & Online storage Rekcurd dashboard
  58. Rekcurd on Kubernetes Architecture
  59. Rekcurd 1. Rekcurd dashboard Kubernetes 2. Pod git pull Rekcurd 3. Rekcurd
  60. git pull Rekcurd
  61. Rekcurd 1. Rekcurd dashboard Kubernetes 2. Pod git pull Rekcurd 3. Rekcurd Docker Image Docker • Docker • Image
  62. Docker
  63. • Rekcurd Docker Hub ✓ “docker pull rekcurd/rekcurd:python-latest” ✓ Rekcurd OK ✓ Dockerfile private docker registry image No and Yes
  64. AB Canary
  65. AB Canary TBD • AB • Canary
  66. LDAP ✓ LDAP ML
  67. Kubeflow
  68. Kubeflow • TensorFlow • • WebUI • etc... Kubeflow
  69. Rekcurd
  70. Rekcurd is released under Apache 2.0: https://github.com/rekcurd. Feedback and Pull Requests are welcome!
  71. We are hiring!!!
  72. Tasks in serving ML service (Rekcurd x Kubernetes): 1. High Availability 2. Management (i. Upload the latest ML model; ii. Switch a model without stopping services; iii. Versioning models) 3. Monitor (i. Load balancing; ii. Auto healing; iii. Auto scaling; iv. Performance/Results check) 4. Others (i. Server setup for development/staging/production; ii. Integration with the existing services; iii. AB testing; iv. Managing many ML services; v. Logging)
  73. THANK YOU
  74. Keigo Hattori (@keigohtr). LINE Clova: @line_clova, #Clova, http://clova-blog.line.me/ja/
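
Slide 22 lists the fields of Rekcurd's input and output messages. As a rough illustration only, the sketch below mirrors those field tables with plain Python dataclasses. The class names, the helper functions, and the `top_k` option key are hypothetical; this is not Rekcurd's actual gRPC definition, only a way to see how a JSON-encoded `option` string can carry key-value settings alongside the payload.

```python
import json
from dataclasses import dataclass
from typing import List, Union

# Hypothetical stand-ins for the slide's "string/bytes/list" type column.
Payload = Union[str, bytes, list]


@dataclass
class RekcurdInput:
    """Mirrors the input table: an input payload plus a JSON option string."""
    input: Payload
    option: str = "{}"  # JSON-encoded key-value options


@dataclass
class RekcurdOutput:
    """Mirrors the output table: label, score, and a JSON option string."""
    label: Payload
    score: Union[float, List[float]]  # a list when the result is multi-class
    option: str = "{}"


def encode_option(options: dict) -> str:
    """Serialize key-value options into the JSON string the tables describe."""
    return json.dumps(options, sort_keys=True)


def decode_option(option: str) -> dict:
    """Parse the JSON option string back into a dict (empty string -> {})."""
    return json.loads(option) if option else {}


# "top_k" is an invented option key used purely for illustration.
request = RekcurdInput(input="hello", option=encode_option({"top_k": 3}))
response = RekcurdOutput(label=["greeting", "other"], score=[0.9, 0.1])
```

Keeping `option` as an opaque JSON string, as the slide suggests, lets each ML module define its own settings without changing the message schema.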

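
Slides 49, 50 and 52 describe separating service levels with namespace x node selector x affinity, and routing requests by host name. A minimal sketch of how those pieces could be derived from an application name and a service level follows. The label keys `app` and `sl`, the function names, the reading of the affinity bullet as pod anti-affinity across nodes, and the `example.com` domain are all assumptions made for illustration; they are not what Rekcurd dashboard actually generates.

```python
from typing import Any, Dict


def placement_spec(app_name: str, service_level: str) -> Dict[str, Any]:
    """Build placement settings for one application at one service level.

    Assumed label keys: "app" (application name) and "sl" (service level).
    """
    return {
        # One namespace per service level (dev / stg / prod).
        "namespace": service_level,
        "labels": {"app": app_name, "sl": service_level},
        # Only schedule onto nodes labeled with the same service level.
        "nodeSelector": {"sl": service_level},
        # Prefer spreading pods of the same application over different nodes.
        "affinity": {
            "podAntiAffinity": {
                "preferredDuringSchedulingIgnoredDuringExecution": [
                    {
                        "weight": 100,
                        "podAffinityTerm": {
                            "labelSelector": {"matchLabels": {"app": app_name}},
                            "topologyKey": "kubernetes.io/hostname",
                        },
                    }
                ]
            }
        },
    }


def ingress_host(app_name: str, service_level: str, domain: str) -> str:
    """Host name following the slide's <app-name>-<service-level>.<domain> pattern."""
    return f"{app_name}-{service_level}.{domain}"


spec = placement_spec("hoge", "prod")
host = ingress_host("hoge", "dev", "example.com")
```

The `nodeSelector` and `affinity` fragments would sit under a Deployment's pod template `spec`, and the host name would go into an Ingress rule's `host` field.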