End-to-end example: Consumer loan acceptance scoring using Kubeflow



CREDO's presentation from the Kubeflow Summit 2019 on how Kubeflow can be used in the financial industry.

  1. End-to-end example: Consumer loan acceptance scoring using Kubeflow
     Radovan Parrak, Lead Data Scientist, Credo
  2. Situation
     ● European company with …
     ● This example relates to credit acceptance scoring
     ● Analytics environment
       ○ AWS
       ○ Banking: very high requirements regarding IT security and regulatory compliance
     ● Objectives
       ○ Putting hundreds of data products live
       ○ A single development >> deployment >> delivery environment
       ○ Batch first, then also real-time
     ● Typical modelling project
       ○ Structured data
       ○ Supervised learning
       ○ Internalizing interpretable models and hybrid pipelines
  3. Requirements
     ● Data Scientists
       ○ Hybrid, integrated & cloud-based development environment
         ■ Python
         ■ PySpark (locally + remotely on a Spark cluster)
         ■ R
         ■ SQL
       ○ Version control (scripts & artifacts)
     ● ML DevOps
       ○ Seamless deployment of hybrid pipelines
         ■ without dependency hassle (packages & data)
       ○ Trigger-based scheduling & orchestration of runs
       ○ Monitoring & dashboarding
       ○ Version control (runs & pipelines)
  4. Architecture
     ● AWS Cloud [existing]
       ○ Infrastructure, connections, security
       ○ S3, Spark cluster, virtual machines, ...
     ● AWS EKS
       ○ Managed Kubernetes service
     ● Kubeflow (v0.6)
       ○ Notebook development environment
       ○ Pipelines for deployment & delivery
     ● AWS ECR
     ● ElasticStack
       ○ Dashboarding environment
     ● Custom Notebook Servers
     ● Open challenges
       ○ Upgrading/migrating Kubeflow (to v1 and above)
  5. Development
     Status quo
     ● Model fitting done in Kubeflow Notebooks
     ● Custom JupyterLab Notebook Servers
       ○ Python, R, (Py)Spark
       ○ SQL extension
     ● Kubeflow namespace used as a project directory
       ○ PVCs per namespace
       ○ PVs mounted from S3 buckets
       ○ Shared and private data per namespace
     Next tasks on our roadmap
     1. Remote Spark job submission
     2. Modelling pipeline templates
     3. Katib for hyperparameter tuning in our use case (XGBoost)
  6. Deployment
     Status quo
     ● Kubeflow Pipelines as the primary deployment object
     ● Same underlying containers for development & deployment
     ● Manual script & artifact injection
       ○ Copy from the project namespace to the kubeflow namespace
       ○ Reference in ContainerOp
     Next tasks on our roadmap
     1. Explore and implement Fairing
     2. Deploy directly from the project namespace
     3. Explore Istio for A/B testing and canary deployments
  7. Delivery
     Status quo
     ● Parameterized batch runs (manual)
     ● Real-time delivery via a Flask app running directly on Kubernetes
     ● Acceptance scores delivered to a dedicated S3 bucket
     ● Score monitoring via Kibana dashboards (ElasticStack)
     Next tasks on our roadmap
     1. Explore trigger-based scheduling & orchestration options (cron?)
     2. Explore Metadata
     3. Explore Nuclio
  8. Thanks!
     Radovan Parrak, Lead Data Scientist, Credo
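
Slide 2 (Situation) names structured data, supervised learning and interpretable models as the typical modelling setup. As an illustration only, here is a minimal sketch of an interpretable acceptance score, assuming a logistic-regression scorecard. The feature names, `INTERCEPT`, `COEFFICIENTS` and `score_applicant` are hypothetical and are not Credo's model:

```python
import math

# Hypothetical coefficients of a fitted logistic-regression scorecard
# (illustrative values only, not a real model).
INTERCEPT = -1.2
COEFFICIENTS = {
    "income_k_eur": 0.03,     # monthly income in thousands of EUR
    "debt_to_income": -2.5,   # existing debt payments / income
    "years_employed": 0.15,
}


def score_applicant(features: dict) -> tuple:
    """Return (acceptance probability, per-feature logit contributions).

    Missing features are treated as 0.
    """
    contributions = {
        name: coef * features.get(name, 0.0)
        for name, coef in COEFFICIENTS.items()
    }
    logit = INTERCEPT + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-logit))
    return probability, contributions


if __name__ == "__main__":
    p, parts = score_applicant(
        {"income_k_eur": 2.8, "debt_to_income": 0.35, "years_employed": 4}
    )
    print(f"acceptance probability: {p:.3f}")
    # Each contribution shows how far a feature moved the logit.
    for name, value in sorted(parts.items(), key=lambda kv: kv[1]):
        print(f"{name:>16}: {value:+.3f}")
```

Because the logit is a plain sum of per-feature terms, each contribution can be reported next to the score. That additivity is one common reason such models are preferred in regulated lending.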
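
Slide 3 (Requirements) asks for version control of artifacts as well as scripts. Scripts usually live in git. For binary artifacts such as fitted models, one possible approach is content-addressed versioning: the version id is derived from a hash of the file, so identical bytes always get the same id. This approach is an assumption, not tooling described in the talk. `register_artifact` and the JSON registry file are hypothetical:

```python
import hashlib
import json
import tempfile
from pathlib import Path


def register_artifact(path: Path, registry: Path) -> str:
    """Record `path` under a content-derived version id and return that id.

    The registry is a hypothetical JSON file mapping artifact name -> versions.
    """
    # Same bytes -> same SHA-256 -> same id, so re-registering is a no-op.
    version = hashlib.sha256(path.read_bytes()).hexdigest()[:12]
    entries = json.loads(registry.read_text()) if registry.exists() else {}
    versions = entries.setdefault(path.name, [])
    if version not in versions:
        versions.append(version)
    registry.write_text(json.dumps(entries, indent=2))
    return version


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        model = Path(tmp) / "model.bin"
        registry = Path(tmp) / "registry.json"
        model.write_bytes(b"fitted-weights-v1")
        print(register_artifact(model, registry))
        model.write_bytes(b"fitted-weights-v2")
        print(register_artifact(model, registry))
        print(registry.read_text())
```

A pipeline run could then log the version id of every artifact it consumed. This links runs to exact artifacts, which is the "runs & pipelines" side of the same requirement.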
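
Slide 5 (Development) describes each Kubeflow namespace acting as a project directory with its own PVCs. The following minimal sketch generates such claims as Kubernetes manifests from Python. The namespace, claim names and sizes are hypothetical. Binding the claims to S3-backed PVs is left out because it depends on the storage driver, which the slides do not name:

```python
import json


def project_pvc(namespace: str, name: str, size: str = "10Gi",
                access_mode: str = "ReadWriteOnce") -> dict:
    """Build a core/v1 PersistentVolumeClaim manifest for one project namespace."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "accessModes": [access_mode],
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # Hypothetical project: one private workspace and one shared data claim.
    # ReadWriteMany only works if the underlying storage supports it.
    items = [
        project_pvc("loan-acceptance", "private-workspace"),
        project_pvc("loan-acceptance", "shared-data", "50Gi", "ReadWriteMany"),
    ]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

`kubectl apply -f` accepts JSON as well as YAML, so the printed `List` could be applied as-is. Generating manifests from code keeps the per-namespace layout consistent across many projects.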
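
Slide 6 (Deployment) describes scripts and artifacts being copied into the `kubeflow` namespace and referenced in a `ContainerOp`. A script invoked that way is typically a command-line entry point whose parameters arrive as container arguments. The following minimal sketch shows such a batch-scoring entry point. It assumes a JSON model artifact and CSV input. The file formats, the argument names (including `--run-date`, standing in for a batch-run parameter) and `score_rows` are hypothetical:

```python
import argparse
import csv
import json
import math


def score_rows(rows, model):
    """Yield (applicant_id, acceptance probability) for each input row.

    `model` is assumed to be {"intercept": float, "coefficients": {name: float}}.
    """
    for row in rows:
        logit = model["intercept"] + sum(
            coef * float(row[name]) for name, coef in model["coefficients"].items()
        )
        yield row["applicant_id"], 1.0 / (1.0 + math.exp(-logit))


def main(argv=None):
    parser = argparse.ArgumentParser(description="Batch acceptance scoring")
    parser.add_argument("--model", required=True, help="JSON model artifact")
    parser.add_argument("--input", required=True, help="CSV of applicants")
    parser.add_argument("--output", required=True, help="CSV of scores to write")
    parser.add_argument("--run-date", required=True, help="batch-run parameter")
    args = parser.parse_args(argv)

    with open(args.model) as f:
        model = json.load(f)
    with open(args.input, newline="") as src, \
            open(args.output, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(["applicant_id", "score", "run_date"])
        for applicant_id, score in score_rows(csv.DictReader(src), model):
            writer.writerow([applicant_id, f"{score:.6f}", args.run_date])


if __name__ == "__main__":
    main()
```

In a pipeline definition, the script path and these flags would go into the `command` and `arguments` of a `kfp.dsl.ContainerOp`, for example `command=["python", "/scripts/score.py"]` with the flags as `arguments`. Here `/scripts/score.py` is a placeholder path. Keeping the script parameter-driven lets the same container serve both the notebook workflow and pipeline runs.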
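
Slide 7 (Delivery) mentions real-time delivery through a Flask app running directly on Kubernetes. Flask applications are WSGI applications, so the request/response contract can be sketched with the standard library's `wsgiref` instead of Flask. This swap is made only to keep the example dependency-free. The model parameters, the JSON payload shape, port 8080 and the helper names are hypothetical:

```python
import io
import json
import math
from wsgiref.simple_server import make_server
from wsgiref.util import setup_testing_defaults

# Hypothetical model parameters; a real service would load a versioned artifact.
MODEL = {
    "intercept": -1.2,
    "coefficients": {"debt_to_income": -2.5, "years_employed": 0.15},
}
JSON_HEADERS = [("Content-Type", "application/json")]


def score(features: dict) -> float:
    """Numerically stable logistic score; missing features count as 0."""
    logit = MODEL["intercept"] + sum(
        coef * float(features.get(name, 0.0))
        for name, coef in MODEL["coefficients"].items()
    )
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)  # avoids overflow for very negative logits
    return z / (1.0 + z)


def app(environ, start_response):
    """WSGI app: POST a JSON object of features, get {"acceptance_score": p}."""
    if environ["REQUEST_METHOD"] != "POST":
        start_response("405 Method Not Allowed", JSON_HEADERS)
        return [b'{"error": "use POST"}']
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        features = json.loads(environ["wsgi.input"].read(length) or b"{}")
        body, status = {"acceptance_score": score(features)}, "200 OK"
    except (ValueError, TypeError, AttributeError) as exc:
        # Malformed JSON, non-numeric values or a non-object payload.
        body, status = {"error": str(exc)}, "400 Bad Request"
    start_response(status, JSON_HEADERS)
    return [json.dumps(body).encode("utf-8")]


def call_locally(payload: bytes, method: str = "POST"):
    """Invoke the app in-process (no server) and return (status, parsed JSON)."""
    environ = {}
    setup_testing_defaults(environ)
    environ.update({
        "REQUEST_METHOD": method,
        "CONTENT_LENGTH": str(len(payload)),
        "wsgi.input": io.BytesIO(payload),
    })
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], json.loads(body)


if __name__ == "__main__":
    # Development server only; not intended for production traffic.
    with make_server("", 8080, app) as server:
        server.serve_forever()
```

`call_locally` exercises the app without opening a socket, which is handy for quick checks. On Kubernetes, a WSGI app like this (or its Flask equivalent) would normally run behind a production WSGI server rather than `wsgiref`, whose simple server is meant for development.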