How to Empower a Platform With a
Data Pipeline At a Scale
Parikshit Chitalkar
Co-Founder & CTO,
StashFin
Deepak Sood
Engineering Lead,
StashFin
David Lin
Head of Risk,
StashFin
Nitin Dhir
Solutions Architect,
AISPL
#1
Agenda
Overview of StashFin
High level Architecture
Platform build & scale out
Designing a data pipeline on AWS
Q&A
#2
Overview of StashFin
#3
Overview
▪ Provides flexible small ticket size personal loans to salaried
individuals through its web and mobile platforms
▪ Manages the lending process end to end from customer
onboarding to disbursal
▪ Founded in 2016 by a team of former finance professionals;
headquartered in Delhi
FOUNDERS
Tushar Aggarwal
Ex General Atlantic, Everstone and
Goldman Sachs, Wharton Business
School
Parikshit Chitalkar
17 years in Fintech, Mobile & Web
Technologies in Canada & US. Trained
Pilot. Successfully built & sold
6 DegreesIT to a US based private
equity firm
Shruti Aggarwal
Ex Merrill Lynch (NYC) and PWC,
Columbia University, Chartered
Accountant
BUSINESS
Key Investors
▪ Credit line card which provides
flexibility to use funds across
POS and ATM machines in the
country
#4
Key figures in 3 years
Cumulative disbursals
620,000+
Repeat Customers
>50%
Loans per month
120,000
Apps Download
>2 Million
Average Ticket Size
US$ 1000
Customers
>400K
Volume growth
60.0x
Active Cards
>20,000
#5
Approvals and Partnerships
Key Partnerships
Compliance & Certification*
*compliance standards met
#6
75 Applications integrated to create a seamless customer experience
#7
StashFin Architecture - Key Components
#8
High level Architecture
#9
Platform build and scale out
#10
▪ How StashFin made a decision early
▪ Stage of the Business
▪ Factors involved
▪ Cost
▪ Scale
▪ Choice of Tech Stack
▪ LAMP vs MEAN vs .NET
▪ Perspective 4 years later
▪ Trade offs & Balancing
BUILD vs BUY
#11
Identifying Scale Bottlenecks
▪ Monitoring
▪ You cannot fix what you can't measure
▪ DevOps
▪ Obsessively automate to standardize
▪ Specific Business Issues
▪ Lead volume spikes
▪ Reads / commits to Database (Table locks)
▪ Decisioning Time
▪ Security
▪ DDoS
▪ API Credit attacks
▪ 3rd Party vulnerabilities
▪ Managing expectations on Delivery
▪ Impact management
▪ Release cycles
▪ Innovation vs BAU
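One way to make "you cannot fix what you can't measure" concrete for decisioning time is to record how long each decision takes and watch the tail latency. The sketch below is a minimal, standard-library-only illustration under stated assumptions: `LatencyRecorder`, `timed` and `decide` are hypothetical names, and the decision rule is a placeholder, not StashFin's scorecard. A production setup would more likely export these timings to a monitoring system.

```python
import time
from contextlib import contextmanager
from statistics import quantiles


class LatencyRecorder:
    """Collects per-call durations (seconds) so percentiles can be reported."""

    def __init__(self):
        self.samples = []

    @contextmanager
    def timed(self):
        # perf_counter is a monotonic clock suited to measuring short intervals.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.samples.append(time.perf_counter() - start)

    def percentile(self, p):
        """Return the p-th percentile (1..99) of the recorded durations."""
        if len(self.samples) < 2:
            raise ValueError("need at least two samples")
        # quantiles(n=100) returns the 99 cut points between percentiles.
        return quantiles(self.samples, n=100)[p - 1]


def decide(application):
    # Hypothetical stand-in for a real decisioning call.
    return application["income"] >= 3 * application["emi"]


recorder = LatencyRecorder()
for income in range(20_000, 20_100):
    with recorder.timed():
        decide({"income": income, "emi": 5_000})

p50 = recorder.percentile(50)
p95 = recorder.percentile(95)
```

Tracking a high percentile rather than the mean is a deliberate choice here: a volume spike tends to show up in the slowest requests first.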
#12
Current AWS Infrastructure - Deep Dive
#13
What we had, and what we have built
V1 Tech Stack
• Monolithic application
• Hard to scale individual services
• Challenges in granular monitoring
• Written in PHP and complex SQL
procedures
• Analytics workloads were very
resource intensive as DB was
designed for Production workloads
• Technical debt due to rapid
development
V2 Tech Stack
▪ Microservices Architecture
▪ Each team manages its own components
▪ Each service can be deployed and scaled independently
▪ Multi language, multi framework support (Django, NEST.JS + React +
▪ Amazon S3 for storage
▪ High availability, built for scale and durability
▪ Kubernetes on Amazon EKS for processing
▪ Easy to manage and scale
▪ Support for open-source monitoring tools
▪ AWS Glue / Amazon Athena for Analytics
▪ Can run large scale analytics on top of S3 without any additional
infrastructure
▪ Amazon Redshift as the Compliance source of truth (SOT) + Book of Record + Data
Warehouse
▪ Populated via ETLs - easily consumable derived data
▪ Compliance & access issues resolved
#14
Monitoring our Cluster and Workloads
▪ State of the art Kubernetes Cluster
▪ Monitoring - Prometheus + AlertManager + Grafana
▪ Logging - Elasticsearch + Logstash + Kibana
▪ Service Mesh - Istio
▪ Other tools - Keycloak + Jenkins + Kafka + Sealed Secrets + Sentry + NewRelic + Airflow
#15
Designing data pipeline on AWS
#16
Problem Statement
▪ Volume spiked from 1,000 loan applications to 500,000 loan applications
▪ Data footprint - each application carries approximately 2,000 data points across Credit Bureau,
Bank, Application, Device, Geo-location and Demographic sources, making this a big data problem
▪ Message volume grew from 1 million to 100 million messages per day
▪ Structured and unstructured data
▪ Asynchronous data poses a challenge in scoring (Redis Zset + Queue)
▪ Models / Scorecards need to be simplified & equivalency to be achieved
▪ Data governance for storage and analytics over a massive dataset
▪ Tracking and monitoring the whole infrastructure along with model performance
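The deck names Redis Zset + Queue for the asynchronous scoring problem but does not show how the two are wired together. What follows is one plausible arrangement, sketched with in-memory stand-ins so it runs without Redis: a dict plays the sorted set (member = application id, score = deadline) and `collections.deque` plays the queue. The class name, the three packet sources and the 30-second timeout are all assumptions made for illustration.

```python
from collections import deque

REQUIRED_SOURCES = {"bureau", "bank", "device"}  # hypothetical packet types
TIMEOUT_SECONDS = 30                             # hypothetical wait limit


class AsyncScoringBuffer:
    def __init__(self):
        self.packets = {}     # app_id -> {source: payload}
        self.deadlines = {}   # app_id -> deadline; stands in for a sorted set
        self.ready = deque()  # app_ids ready for scoring; stands in for a queue

    def receive(self, app_id, source, payload, now):
        """Store one packet; enqueue the application once every source is in."""
        first_packet = app_id not in self.packets
        self.packets.setdefault(app_id, {})[source] = payload
        if first_packet:
            # Redis equivalent: ZADD deadlines <now + timeout> <app_id>
            self.deadlines[app_id] = now + TIMEOUT_SECONDS
        complete = REQUIRED_SOURCES.issubset(self.packets[app_id])
        if complete and app_id in self.deadlines:
            del self.deadlines[app_id]
            self.ready.append(app_id)

    def expire(self, now):
        """Enqueue overdue applications so they are scored on partial data."""
        # Redis equivalent: ZRANGEBYSCORE deadlines -inf <now>, then ZREM.
        overdue = sorted((d, a) for a, d in self.deadlines.items() if d <= now)
        for _, app_id in overdue:
            del self.deadlines[app_id]
            self.ready.append(app_id)

    def next_to_score(self):
        """Pop the next application id with the packets gathered so far."""
        if not self.ready:
            return None
        app_id = self.ready.popleft()
        return app_id, self.packets[app_id]


buf = AsyncScoringBuffer()
buf.receive("A1", "bank", {"avg_balance": 42_000}, now=0)
buf.receive("A2", "device", {"os": "android"}, now=5)
buf.receive("A1", "device", {"os": "ios"}, now=6)
buf.receive("A1", "bureau", {"score": 710}, now=8)  # A1 is now complete
buf.expire(now=40)                                   # A2 passed its deadline at t=35
first = buf.next_to_score()
second = buf.next_to_score()
```

Ordering pending applications by deadline is what makes a sorted set a natural fit: the overdue ones can be read with a single range query instead of scanning every pending application.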
#17
Data Pipeline Design Considerations
▪ Organizational Data Democracy
▪ Having data accessible across all verticals
▪ Centralized data lake for all reporting needs
▪ Feature Engineering for ML Models
▪ Model implementation to production in a
couple of hours
▪ Ability to handle a wide & deep feature set
▪ Data Capturing/Decisioning Speed
▪ Capture more data earlier in the journey
▪ Real time decisioning for a better user experience
▪ Security & Compliance
▪ Low total cost of ownership
▪ End-to-end process in code & fit for purpose
engineering
▪ Cost effective at scale
#18
Data Pipeline Structure
▪ Real time decisioning & scalable
▪ Using Redis to cache data and run decision models
▪ Async process to score as data arrives in multiple
packets
▪ Data Storage
▪ Using S3 as our data lake - All the data points are
stored in S3 directly
▪ Queries are run on top of S3 using Athena seamlessly
▪ No data silos - all services use same infrastructure
▪ Data Processing - Athena
▪ No infrastructure to maintain or upgrade, and very cost effective
▪ Write simple SQL queries with joins
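As a rough illustration of storing data points directly in S3 and querying them with "simple SQL queries with joins", the sketch below builds a date-partitioned object key and an Athena query string. Everything named here is hypothetical: the bucket, the database, the `applications` and `bureau` tables and their columns are not taken from the deck. The `submit` helper uses the boto3 Athena client's `start_query_execution` call and is defined but not invoked, since it needs AWS credentials.

```python
from datetime import date

BUCKET = "example-data-lake"  # hypothetical bucket name
DATABASE = "lending"          # hypothetical Athena database


def s3_key(source, app_id, day):
    """Hive-style dt=YYYY-MM-DD prefix, usable as a partition column."""
    return f"raw/{source}/dt={day.isoformat()}/{app_id}.json"


def approval_by_bureau_band_sql(day):
    """A join across two hypothetical tables defined over the S3 data."""
    return (
        "SELECT b.score_band, COUNT(*) AS applications, "
        "SUM(CASE WHEN a.status = 'approved' THEN 1 ELSE 0 END) AS approved "
        "FROM applications a "
        "JOIN bureau b ON a.app_id = b.app_id AND a.dt = b.dt "
        f"WHERE a.dt = '{day.isoformat()}' "
        "GROUP BY b.score_band"
    )


def submit(sql):
    """Send the query to Athena. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the helpers above run without AWS access

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": DATABASE},
        ResultConfiguration={"OutputLocation": f"s3://{BUCKET}/athena-results/"},
    )
    return response["QueryExecutionId"]


key = s3_key("bureau", "A1", date(2020, 6, 1))
sql = approval_by_bureau_band_sql(date(2020, 6, 1))
```

Filtering on the partition column (`dt` in this sketch) lets Athena skip objects outside the requested day, provided the tables are declared as partitioned on it, which keeps the amount of data scanned down.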
#19
Athena Query - 253 GB scanned in 2 minutes
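To put the headline figure in perspective, the arithmetic below converts it into an average scan rate and an indicative query cost. The price per terabyte is an assumption for illustration only (Athena pricing varies by region and over time), and the calculation treats 1 TB as 1024 GB.

```python
scanned_gb = 253
elapsed_seconds = 2 * 60

# Average scan rate over the whole query.
throughput_gb_per_s = scanned_gb / elapsed_seconds

# Assumption: 5 USD per TB scanned; check current pricing for your region.
PRICE_USD_PER_TB = 5.0
cost_usd = scanned_gb / 1024 * PRICE_USD_PER_TB

print(f"{throughput_gb_per_s:.2f} GB/s, about {cost_usd:.2f} USD")
```

Under these assumptions the query averages roughly 2.1 GB per second and costs a little over one US dollar.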
#20
Summary
Leveraging Amazon S3 & Athena enables us to drive:
● Faster Decisioning
○ Improved data capturing resulting in a higher conversion rate (more applications are
decisioned faster, with a higher take-up rate)
● Higher Reliability
○ With robust & scalable infrastructure, reliability increased
○ 1-2 persons to manage the whole pipeline vs 6-8 persons
● Cost & Performance Benefits
○ No need to manage bulky infrastructure – Reduced cost of managed server
○ Able to run analytical queries over TBs of data seamlessly
#21
Contact us at:
tech@stashfin.com
#22
Recording - https://yourstory.com/session/how-to-empower-a-platform-with-a-data-pipeline-at-
Question & Answer
#23

More Related Content

What's hot

databricks ml flow demonstration using automatic features engineering
databricks ml flow demonstration using automatic features engineeringdatabricks ml flow demonstration using automatic features engineering
databricks ml flow demonstration using automatic features engineering
Mohamed MEJDOUBI
 
MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestratio...
MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestratio...MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestratio...
MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestratio...
Databricks
 
ICLR 2020 Recap
ICLR 2020 RecapICLR 2020 Recap
ICLR 2020 Recap
Sri Ambati
 
High Performance Transfer Learning for Classifying Intent of Sales Engagement...
High Performance Transfer Learning for Classifying Intent of Sales Engagement...High Performance Transfer Learning for Classifying Intent of Sales Engagement...
High Performance Transfer Learning for Classifying Intent of Sales Engagement...
Databricks
 
Tech leaders guide to effective building of machine learning products
Tech leaders guide to effective building of machine learning productsTech leaders guide to effective building of machine learning products
Tech leaders guide to effective building of machine learning products
Gianmario Spacagna
 
Rsqrd AI: How to Design a Reliable and Reproducible Pipeline
Rsqrd AI: How to Design a Reliable and Reproducible PipelineRsqrd AI: How to Design a Reliable and Reproducible Pipeline
Rsqrd AI: How to Design a Reliable and Reproducible Pipeline
Sanjana Chowdhury
 
Apache Liminal (Incubating)—Orchestrate the Machine Learning Pipeline
Apache Liminal (Incubating)—Orchestrate the Machine Learning PipelineApache Liminal (Incubating)—Orchestrate the Machine Learning Pipeline
Apache Liminal (Incubating)—Orchestrate the Machine Learning Pipeline
Databricks
 
Real-Time AI: Designing for Low Latency and High Throughput - Dr. Sergei Izra...
Real-Time AI: Designing for Low Latency and High Throughput - Dr. Sergei Izra...Real-Time AI: Designing for Low Latency and High Throughput - Dr. Sergei Izra...
Real-Time AI: Designing for Low Latency and High Throughput - Dr. Sergei Izra...
Sri Ambati
 
What is MLOps
What is MLOpsWhat is MLOps
What is MLOps
Henrik Skogström
 
How to build high frequency trading with our matlab secrets with c++ and mysql
How to build high frequency trading with our matlab secrets with c++ and mysqlHow to build high frequency trading with our matlab secrets with c++ and mysql
How to build high frequency trading with our matlab secrets with c++ and mysql
Bryan Downing
 
Pm.ais ummit 180917 final
Pm.ais ummit 180917 finalPm.ais ummit 180917 final
Pm.ais ummit 180917 final
Nisha Talagala
 
Deploying Data Science Engines to Production
Deploying Data Science Engines to ProductionDeploying Data Science Engines to Production
Deploying Data Science Engines to Production
Mostafa Majidpour
 
MLOps - Build pipelines with Tensor Flow Extended & Kubeflow
MLOps - Build pipelines with Tensor Flow Extended & KubeflowMLOps - Build pipelines with Tensor Flow Extended & Kubeflow
MLOps - Build pipelines with Tensor Flow Extended & Kubeflow
Jan Kirenz
 
Custom Machine Learning Recipes for the Enterprise
Custom Machine Learning Recipes for the EnterpriseCustom Machine Learning Recipes for the Enterprise
Custom Machine Learning Recipes for the Enterprise
Sri Ambati
 
LeanIX GraphQL Lessons Learned - CodeTalks 2017
LeanIX GraphQL Lessons Learned - CodeTalks 2017LeanIX GraphQL Lessons Learned - CodeTalks 2017
LeanIX GraphQL Lessons Learned - CodeTalks 2017
LeanIX GmbH
 
Data Science Meets DevOps: GitOps with OpenShift (1).pdf
Data Science Meets DevOps: GitOps with OpenShift (1).pdfData Science Meets DevOps: GitOps with OpenShift (1).pdf
Data Science Meets DevOps: GitOps with OpenShift (1).pdf
HemaVeeradhi1
 
GraphQL Basics
GraphQL BasicsGraphQL Basics
GraphQL Basics
LeanIX GmbH
 
Seamless MLOps with Seldon and MLflow
Seamless MLOps with Seldon and MLflowSeamless MLOps with Seldon and MLflow
Seamless MLOps with Seldon and MLflow
Databricks
 
Reactive Data System in Practice
Reactive Data System in PracticeReactive Data System in Practice
Reactive Data System in Practice
Trieu Nguyen
 
Productionizing Machine Learning in Our Health and Wellness Marketplace
Productionizing Machine Learning in Our Health and Wellness MarketplaceProductionizing Machine Learning in Our Health and Wellness Marketplace
Productionizing Machine Learning in Our Health and Wellness Marketplace
Databricks
 

What's hot (20)

databricks ml flow demonstration using automatic features engineering
databricks ml flow demonstration using automatic features engineeringdatabricks ml flow demonstration using automatic features engineering
databricks ml flow demonstration using automatic features engineering
 
MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestratio...
MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestratio...MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestratio...
MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestratio...
 
ICLR 2020 Recap
ICLR 2020 RecapICLR 2020 Recap
ICLR 2020 Recap
 
High Performance Transfer Learning for Classifying Intent of Sales Engagement...
High Performance Transfer Learning for Classifying Intent of Sales Engagement...High Performance Transfer Learning for Classifying Intent of Sales Engagement...
High Performance Transfer Learning for Classifying Intent of Sales Engagement...
 
Tech leaders guide to effective building of machine learning products
Tech leaders guide to effective building of machine learning productsTech leaders guide to effective building of machine learning products
Tech leaders guide to effective building of machine learning products
 
Rsqrd AI: How to Design a Reliable and Reproducible Pipeline
Rsqrd AI: How to Design a Reliable and Reproducible PipelineRsqrd AI: How to Design a Reliable and Reproducible Pipeline
Rsqrd AI: How to Design a Reliable and Reproducible Pipeline
 
Apache Liminal (Incubating)—Orchestrate the Machine Learning Pipeline
Apache Liminal (Incubating)—Orchestrate the Machine Learning PipelineApache Liminal (Incubating)—Orchestrate the Machine Learning Pipeline
Apache Liminal (Incubating)—Orchestrate the Machine Learning Pipeline
 
Real-Time AI: Designing for Low Latency and High Throughput - Dr. Sergei Izra...
Real-Time AI: Designing for Low Latency and High Throughput - Dr. Sergei Izra...Real-Time AI: Designing for Low Latency and High Throughput - Dr. Sergei Izra...
Real-Time AI: Designing for Low Latency and High Throughput - Dr. Sergei Izra...
 
What is MLOps
What is MLOpsWhat is MLOps
What is MLOps
 
How to build high frequency trading with our matlab secrets with c++ and mysql
How to build high frequency trading with our matlab secrets with c++ and mysqlHow to build high frequency trading with our matlab secrets with c++ and mysql
How to build high frequency trading with our matlab secrets with c++ and mysql
 
Pm.ais ummit 180917 final
Pm.ais ummit 180917 finalPm.ais ummit 180917 final
Pm.ais ummit 180917 final
 
Deploying Data Science Engines to Production
Deploying Data Science Engines to ProductionDeploying Data Science Engines to Production
Deploying Data Science Engines to Production
 
MLOps - Build pipelines with Tensor Flow Extended & Kubeflow
MLOps - Build pipelines with Tensor Flow Extended & KubeflowMLOps - Build pipelines with Tensor Flow Extended & Kubeflow
MLOps - Build pipelines with Tensor Flow Extended & Kubeflow
 
Custom Machine Learning Recipes for the Enterprise
Custom Machine Learning Recipes for the EnterpriseCustom Machine Learning Recipes for the Enterprise
Custom Machine Learning Recipes for the Enterprise
 
LeanIX GraphQL Lessons Learned - CodeTalks 2017
LeanIX GraphQL Lessons Learned - CodeTalks 2017LeanIX GraphQL Lessons Learned - CodeTalks 2017
LeanIX GraphQL Lessons Learned - CodeTalks 2017
 
Data Science Meets DevOps: GitOps with OpenShift (1).pdf
Data Science Meets DevOps: GitOps with OpenShift (1).pdfData Science Meets DevOps: GitOps with OpenShift (1).pdf
Data Science Meets DevOps: GitOps with OpenShift (1).pdf
 
GraphQL Basics
GraphQL BasicsGraphQL Basics
GraphQL Basics
 
Seamless MLOps with Seldon and MLflow
Seamless MLOps with Seldon and MLflowSeamless MLOps with Seldon and MLflow
Seamless MLOps with Seldon and MLflow
 
Reactive Data System in Practice
Reactive Data System in PracticeReactive Data System in Practice
Reactive Data System in Practice
 
Productionizing Machine Learning in Our Health and Wellness Marketplace
Productionizing Machine Learning in Our Health and Wellness MarketplaceProductionizing Machine Learning in Our Health and Wellness Marketplace
Productionizing Machine Learning in Our Health and Wellness Marketplace
 

Similar to How to Empower a Platform With a Data Pipeline At a Scale

Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Databricks
 
Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Timothy Spann
 
In-Memory Computing Driving Edge Computing and Blockchain Technologies
In-Memory Computing Driving Edge Computing and Blockchain TechnologiesIn-Memory Computing Driving Edge Computing and Blockchain Technologies
In-Memory Computing Driving Edge Computing and Blockchain Technologies
dsapps
 
First in Class: Optimizing the Data Lake for Tighter Integration
First in Class: Optimizing the Data Lake for Tighter IntegrationFirst in Class: Optimizing the Data Lake for Tighter Integration
First in Class: Optimizing the Data Lake for Tighter Integration
Inside Analysis
 
Can Your Mobile Infrastructure Survive 1 Million Concurrent Users?
Can Your Mobile Infrastructure Survive 1 Million Concurrent Users?Can Your Mobile Infrastructure Survive 1 Million Concurrent Users?
Can Your Mobile Infrastructure Survive 1 Million Concurrent Users?
TechWell
 
Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?
DATAVERSITY
 
Big Data on Cloud Native Platform
Big Data on Cloud Native PlatformBig Data on Cloud Native Platform
Big Data on Cloud Native Platform
Sunil Govindan
 
Big Data on Cloud Native Platform
Big Data on Cloud Native PlatformBig Data on Cloud Native Platform
Big Data on Cloud Native Platform
Sunil Govindan
 
AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)
AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)
AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)
Amazon Web Services
 
From monolith to microservices
From monolith to microservicesFrom monolith to microservices
From monolith to microservices
TransferWiseSG
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
DATAVERSITY
 
Risc and velostrata 2 28 2018 lessons_in_cloud_migration
Risc and velostrata  2 28 2018 lessons_in_cloud_migrationRisc and velostrata  2 28 2018 lessons_in_cloud_migration
Risc and velostrata 2 28 2018 lessons_in_cloud_migration
RISC Networks
 
451 Research + NuoDB: What It Means to be a Container-Native SQL Database
451 Research + NuoDB: What It Means to be a Container-Native SQL Database451 Research + NuoDB: What It Means to be a Container-Native SQL Database
451 Research + NuoDB: What It Means to be a Container-Native SQL Database
NuoDB
 
Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...
Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...
Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...
SnapLogic
 
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc..."An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
Maya Lumbroso
 
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc..."An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
Dataconomy Media
 
An introduction to cloud systems architecture
An introduction to cloud systems architectureAn introduction to cloud systems architecture
An introduction to cloud systems architecture
Neela Muhil Vannan Mayavannan
 
Leapfrog into Serverless - a Deloitte-Amtrak Case Study | Serverless Confere...
Leapfrog into Serverless - a Deloitte-Amtrak Case Study | Serverless Confere...Leapfrog into Serverless - a Deloitte-Amtrak Case Study | Serverless Confere...
Leapfrog into Serverless - a Deloitte-Amtrak Case Study | Serverless Confere...
Gary Arora
 
(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014
(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014
(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014
Amazon Web Services
 
Technical Deck Delta Live Tables.pdf
Technical Deck Delta Live Tables.pdfTechnical Deck Delta Live Tables.pdf
Technical Deck Delta Live Tables.pdf
Ilham31574
 

Similar to How to Empower a Platform With a Data Pipeline At a Scale (20)

Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
 
Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...
 
In-Memory Computing Driving Edge Computing and Blockchain Technologies
In-Memory Computing Driving Edge Computing and Blockchain TechnologiesIn-Memory Computing Driving Edge Computing and Blockchain Technologies
In-Memory Computing Driving Edge Computing and Blockchain Technologies
 
First in Class: Optimizing the Data Lake for Tighter Integration
First in Class: Optimizing the Data Lake for Tighter IntegrationFirst in Class: Optimizing the Data Lake for Tighter Integration
First in Class: Optimizing the Data Lake for Tighter Integration
 
Can Your Mobile Infrastructure Survive 1 Million Concurrent Users?
Can Your Mobile Infrastructure Survive 1 Million Concurrent Users?Can Your Mobile Infrastructure Survive 1 Million Concurrent Users?
Can Your Mobile Infrastructure Survive 1 Million Concurrent Users?
 
Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?
 
Big Data on Cloud Native Platform
Big Data on Cloud Native PlatformBig Data on Cloud Native Platform
Big Data on Cloud Native Platform
 
Big Data on Cloud Native Platform
Big Data on Cloud Native PlatformBig Data on Cloud Native Platform
Big Data on Cloud Native Platform
 
AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)
AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)
AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)
 
From monolith to microservices
From monolith to microservicesFrom monolith to microservices
From monolith to microservices
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
Risc and velostrata 2 28 2018 lessons_in_cloud_migration
Risc and velostrata  2 28 2018 lessons_in_cloud_migrationRisc and velostrata  2 28 2018 lessons_in_cloud_migration
Risc and velostrata 2 28 2018 lessons_in_cloud_migration
 
451 Research + NuoDB: What It Means to be a Container-Native SQL Database
451 Research + NuoDB: What It Means to be a Container-Native SQL Database451 Research + NuoDB: What It Means to be a Container-Native SQL Database
451 Research + NuoDB: What It Means to be a Container-Native SQL Database
 
Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...
Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...
Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...
 
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc..."An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
 
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc..."An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
 
An introduction to cloud systems architecture
An introduction to cloud systems architectureAn introduction to cloud systems architecture
An introduction to cloud systems architecture
 
Leapfrog into Serverless - a Deloitte-Amtrak Case Study | Serverless Confere...
Leapfrog into Serverless - a Deloitte-Amtrak Case Study | Serverless Confere...Leapfrog into Serverless - a Deloitte-Amtrak Case Study | Serverless Confere...
Leapfrog into Serverless - a Deloitte-Amtrak Case Study | Serverless Confere...
 
(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014
(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014
(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014
 
Technical Deck Delta Live Tables.pdf
Technical Deck Delta Live Tables.pdfTechnical Deck Delta Live Tables.pdf
Technical Deck Delta Live Tables.pdf
 

Recently uploaded

一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
mzpolocfi
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
eddie19851
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
AnirbanRoy608946
 

Recently uploaded (20)

一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...

How to Empower a Platform With a Data Pipeline At a Scale

  • 1. Parikshit Chitalkar Co-Founder & CTO, StashFin Deepak Sood Engineering Lead, StashFin David Lin Head of Risk, StashFin How to Empower a Platform With a Data Pipeline At a Scale Nitin Dhir Solutions Architect, AISPL #1
  • 2. Agenda Overview of StashFin High level Architecture Platform build & scale out Designing a data pipeline on AWS Q&A #2
  • 4. Overview ▪ Provides flexible, small-ticket personal loans to salaried individuals through its web and mobile platforms ▪ Manages the lending process end to end, from customer onboarding to disbursal ▪ Founded in 2016, headquartered in Delhi, by a team of former finance professionals FOUNDERS Tushar Aggarwal: ex General Atlantic, Everstone and Goldman Sachs; Wharton Business School. Parikshit Chitalkar: 17 years in fintech, mobile and web technologies in Canada and the US; trained pilot; built and sold 6 DegreesIT to a US-based private equity firm. Shruti Aggarwal: ex Merrill Lynch (NYC) and PwC; Columbia University; Chartered Accountant. BUSINESS Key Investors ▪ Credit line card that provides the flexibility to use funds at POS terminals and ATMs across the country #4
  • 5. Key figures in 3 years Cumulative disbursals 620,000+ Repeat Customers >50% Loans per month 120,000 Apps Download 60.0x Average Ticket Size US$ 1000 Customers >400K Volume growth Active Cards >20,000 >2 Million #5
  • 6. Approvals and Partnerships Key Partnerships Compliance & Certification* *compliance standards met #6
  • 7. 75 Applications integrated to create a seamless customer experience #7
  • 8. StashFin Architecture - Key Components #8
  • 10. Platform build and scale out #10
  • 11. ▪ How StashFin made a decision early ▪ Stage of the Business ▪ Factors involved ▪ Cost ▪ Scale ▪ Choice of Tech Stack ▪ LAMP vs MEAN vs .NET ▪ Perspective 4 years later ▪ Trade offs & Balancing BUILD vs BUY #11
  • 12. Identifying Scale Bottlenecks ▪ Monitoring ▪ You cannot fix what you can't measure ▪ DevOps ▪ Obsessively automate to standardize ▪ Specific Business Issues ▪ Lead volume spikes ▪ Reads / commits to Database (Table locks) ▪ Decisioning Time ▪ Security ▪ DDoS ▪ API Credit attacks ▪ 3rd Party vulnerabilities ▪ Managing expectations on Delivery ▪ Impact management ▪ Release cycles ▪ Innovation vs BAU #12
  • 13. Current AWS Infrastructure - Deep Dive #13
  • 14. What we had, and what we have built. V1 Tech Stack: • Monolithic application • Hard to scale individual services • Challenges in granular monitoring • Written in PHP and complex SQL procedures • Analytics workloads were very resource intensive because the DB was designed for production workloads • Technical debt due to rapid development V2 Tech Stack: ▪ Microservices architecture ▪ Each team manages its own components ▪ Each service can be deployed and scaled independently ▪ Multi-language, multi-framework support (Django, NEST.JS + React + ▪ Amazon S3 for storage ▪ High availability, built for scale and durability ▪ Kubernetes on Amazon EKS for processing ▪ Easy to manage and scale ▪ Support for open-source monitoring tools ▪ AWS Glue/Amazon Athena for analytics ▪ Can run large-scale analytics on top of S3 without any additional infrastructure ▪ Amazon Redshift as the compliance source of truth (SOT) + book of record + data warehouse ▪ Populated via ETLs - easily consumable derived data ▪ Compliance & access issues resolved #14
  • 15. Monitoring our Cluster and Workloads ▪ State-of-the-art Kubernetes cluster ▪ Monitoring - Prometheus + AlertManager + Grafana ▪ Logging - Elasticsearch + Logstash + Kibana ▪ Service Mesh - Istio ▪ Other tools - Keycloak + Jenkins + Kafka + Sealed Secrets + Sentry + New Relic + Airflow #15
  • 17. Problem Statement ▪ Volume spiked from 1,000 loan applications to 500,000 loan applications ▪ Data footprint - each application has approximately 2,000 data points, which makes this a big data problem: Credit Bureau, Bank, Application, Device, Geo-location, Demographic ▪ Message volume grew from 1 million messages per day to 100 million messages per day (Big Data) ▪ Structured and unstructured data ▪ Asynchronous data poses a challenge in scoring (Redis Zset + Queue) ▪ Models / Scorecards need to be simplified and equivalency achieved ▪ Data governance for storage and analytics over a massive dataset ▪ Tracking and monitoring the whole infrastructure along with model performance #17
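The slide names "Redis Zset + Queue" for scoring asynchronous data but does not show how they are used. One plausible reading is sketched below, under stated assumptions: applications wait in a time-ordered structure until all of their packets arrive (or a timeout passes), and are then moved to a queue for scoring. The sketch uses plain Python containers as a stand-in for Redis so that it runs on its own; in Redis the waiting set could be kept with `ZADD` and scanned with `ZRANGEBYSCORE`. The class name, the three source names, and the toy `score` function are all hypothetical.

```python
import heapq
from collections import deque

# Hypothetical packet sources; an application is complete once all have arrived.
REQUIRED_SOURCES = {"bureau", "bank", "device"}


class PendingApplications:
    """In-memory stand-in for a sorted set of waiting applications plus a queue."""

    def __init__(self):
        self._packets = {}      # application id -> {source: numeric feature}
        self._first_seen = []   # min-heap of (arrival time, application id)
        self._queued = set()    # ids already handed to the scoring queue
        self.ready = deque()    # scoring queue

    def add_packet(self, app_id, source, value, timestamp):
        """Store one packet; queue the application once every source is present."""
        if app_id in self._queued:
            return  # already queued for scoring; late packets are ignored here
        if app_id not in self._packets:
            self._packets[app_id] = {}
            heapq.heappush(self._first_seen, (timestamp, app_id))
        self._packets[app_id][source] = value
        if REQUIRED_SOURCES.issubset(self._packets[app_id]):
            self._enqueue(app_id)

    def expire(self, now, max_wait):
        """Queue applications that have waited longer than max_wait for packets."""
        while self._first_seen and now - self._first_seen[0][0] > max_wait:
            _, app_id = heapq.heappop(self._first_seen)
            self._enqueue(app_id)

    def _enqueue(self, app_id):
        if app_id not in self._queued:
            self._queued.add(app_id)
            self.ready.append(app_id)

    def pop_ready(self):
        """Return (application id, packets) for the next application to score."""
        app_id = self.ready.popleft()
        return app_id, self._packets.pop(app_id)


def score(packets):
    """Toy scorecard: sums whichever numeric features have arrived."""
    return sum(packets.values())
```

The timeout path matters as much as the complete path: an application whose bank data never arrives still gets a decision from the partial data instead of waiting indefinitely.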
  • 18. Data Pipeline Design Considerations ▪ Organizational Data Democracy ▪ Having data accessible across all verticals ▪ Centralized data lake for all reporting needs ▪ Feature Engineering for ML Models ▪ Model implementation to production in a couple of hours ▪ Ability to handle a wide & deep feature set ▪ Data Capturing/Decisioning Speed ▪ Capture more data earlier in the journey ▪ Real-time decisioning for a better user experience ▪ Security & Compliance ▪ Low total cost of ownership ▪ End-to-end process in code & fit-for-purpose engineering ▪ Cost-effective at scale #18
  • 19. Data Pipeline Structure ▪ Real-time decisioning & scalable ▪ Using Redis to cache data and run decision models ▪ Async process to score as data arrives in multiple packets ▪ Data Storage ▪ Using S3 as our data lake - all the data points are stored in S3 directly ▪ Queries are run on top of S3 using Athena seamlessly ▪ No data silos - all services use the same infrastructure ▪ Data Processing - Athena ▪ No infrastructure to maintain or upgrade, and very cost-effective ▪ Write simple SQL queries with joins #19
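As a rough illustration of "data points stored in S3 directly" and "simple SQL queries with joins", the sketch below builds a Hive-style partitioned object key and an Athena-style join query as a string. The slides do not show the actual bucket layout or table schema, so the `raw/` prefix, the `source` and `dt` partition names, the table names `applications` and `bureau_reports`, and their columns are all assumptions made for illustration.

```python
from datetime import date


def s3_key(source, day, application_id):
    """Build a Hive-style partitioned object key (the layout is an assumption)."""
    return f"raw/source={source}/dt={day.isoformat()}/{application_id}.json"


def athena_join_query(day):
    """Return SQL joining two hypothetical tables on application_id for one day."""
    dt = day.isoformat()
    return (
        "SELECT a.application_id, a.requested_amount, b.bureau_score\n"
        "FROM applications a\n"
        "JOIN bureau_reports b ON a.application_id = b.application_id\n"
        f"WHERE a.dt = '{dt}' AND b.dt = '{dt}'"
    )


if __name__ == "__main__":
    print(s3_key("bureau", date(2020, 1, 15), "app-1"))
    print(athena_join_query(date(2020, 1, 15)))
```

Filtering on a partition column such as `dt` lets Athena skip objects outside that partition, which reduces the amount of data scanned per query.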
  • 20. Athena Query - 253 GB scanned in 2 minutes #20
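To put the 253 GB in 2 minutes figure in context, the arithmetic below derives the implied scan throughput and a rough per-query cost. The slide does not state a price; the default of US$5 per TB scanned and the 1,024 GB per TB conversion are assumptions for illustration, and actual pricing depends on region and date.

```python
def scan_throughput_gb_per_s(gb_scanned, seconds):
    """Average scan rate in GB per second."""
    return gb_scanned / seconds


def scan_cost_usd(gb_scanned, usd_per_tb=5.0, gb_per_tb=1024):
    """Estimated query cost; the per-TB price is an assumed, illustrative value."""
    return gb_scanned / gb_per_tb * usd_per_tb


if __name__ == "__main__":
    print(round(scan_throughput_gb_per_s(253, 120), 2))  # about 2.11 GB/s
    print(round(scan_cost_usd(253), 2))                  # about 1.24 USD at the assumed price
```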
  • 21. Summary Leveraging Amazon S3 & Athena enables us to drive: ● Faster Decisioning ○ Improved data capture, resulting in a higher conversion rate (more applications are decisioned faster, with a higher take-up rate) ● Higher Reliability ○ Robust, scalable infrastructure increased reliability ○ 1-2 people manage the whole pipeline vs 6-8 people ● Cost & Performance Benefits ○ No need to manage bulky infrastructure – reduced cost of managed servers ○ Able to run analytical queries over TBs of data seamlessly #21
  • 22. Contact us at: tech@stashfin.com #22 Recording - https://yourstory.com/session/how-to-empower-a-platform-with-a-data-pipeline-at-