Parikshit Chitalkar
Co-Founder & CTO,
StashFin
Deepak Sood
Engineering Lead,
StashFin
David Lin
Head of Risk,
StashFin
How to Empower a Platform With a
Data Pipeline At a Scale
Nitin Dhir
Solutions Architect,
AISPL
#1
Agenda
Overview of StashFin
High level Architecture
Platform build & scale out
Designing a data pipeline on AWS
Q&A
#2
Overview of StashFin
#3
Overview
▪ Provides flexible, small-ticket personal loans to salaried
individuals through its web and mobile platforms
▪ Manages the lending process end-to-end, from customer
onboarding to disbursal
▪ Founded in 2016 by a team of former finance professionals;
headquartered in Delhi
FOUNDERS
Tushar Aggarwal
Ex General Atlantic, Everstone and
Goldman Sachs, Wharton Business
School
Parikshit Chitalkar
17 years in fintech, mobile and web
technologies in Canada and the US. Trained
pilot. Successfully built and sold 6
DegreesIT to a US-based private
equity firm
Shruti Aggarwal
Ex Merrill Lynch (NYC) and PwC,
Columbia University, Chartered
Accountant
BUSINESS
▪ Credit line card that lets customers use funds
at POS terminals and ATMs across the
country
Key Investors
#4
Key figures in 3 years
Cumulative disbursals
620,000+
Repeat Customers
>50%
Loans per month
120,000
App Downloads
60.0x
Average Ticket Size
US$1,000
Customers
>400K
Volume Growth
Active Cards
>20,000
>2 Million
#5
Approvals and Partnerships
Key Partnerships
Compliance & Certification*
*compliance standards met
#6
75 Applications integrated to create a seamless customer experience
#7
StashFin Architecture - Key Components
#8
High level Architecture
#9
Platform build and scale out
#10
BUILD vs BUY
▪ How StashFin made a decision early
  ▪ Stage of the business
  ▪ Factors involved
    ▪ Cost
    ▪ Scale
▪ Choice of tech stack
  ▪ LAMP vs MEAN vs .NET
▪ Perspective 4 years later
  ▪ Trade-offs & balancing
#11
Identifying Scale Bottlenecks
▪ Monitoring
  ▪ You cannot fix what you can't measure
▪ DevOps
  ▪ Obsessively automate to standardize
▪ Specific business issues
  ▪ Lead volume spikes
  ▪ Reads / commits to the database (table locks)
  ▪ Decisioning time
▪ Security
  ▪ DDoS
  ▪ API credit attacks
  ▪ 3rd-party vulnerabilities
▪ Managing expectations on delivery
  ▪ Impact management
  ▪ Release cycles
  ▪ Innovation vs BAU
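One common mitigation for lead-volume spikes and abusive traffic of the kind listed above is per-client rate limiting. The deck does not say which mechanism StashFin used; the sketch below is a minimal in-memory token bucket, and the class names, the per-client key, the rate and the capacity are all hypothetical.

```python
import time


class TokenBucket:
    """Minimal in-memory token bucket (hypothetical sketch; single process, not thread-safe)."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec        # tokens refilled per second
        self.capacity = capacity        # maximum burst size
        self.clock = clock              # injectable clock so the example is testable
        self.tokens = float(capacity)   # a new bucket starts full
        self.last = clock()

    def allow(self, cost=1.0):
        """Consume `cost` tokens and return True if enough are available, else False."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class RateLimiter:
    """Keeps one bucket per client key (e.g. an API key or source IP; hypothetical)."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock
        self.buckets = {}

    def allow(self, client_id):
        if client_id not in self.buckets:
            self.buckets[client_id] = TokenBucket(self.rate, self.capacity, self.clock)
        return self.buckets[client_id].allow()


# Usage with a fake clock: 3-request burst, then 1 request per second.
fake_now = [0.0]
limiter = RateLimiter(rate_per_sec=1.0, capacity=3, clock=lambda: fake_now[0])
print([limiter.allow("client-a") for _ in range(4)])  # [True, True, True, False]
fake_now[0] = 1.0
print(limiter.allow("client-a"))                      # True (one token refilled)
```

Per-process buckets do not coordinate across replicas, so in a horizontally scaled deployment the bucket state would need to live in a shared store instead.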
#12
Current AWS Infrastructure - Deep Dive
#13
What we had, and what we have built
V1 Tech Stack
• Monolithic application
• Hard to scale individual services
• Challenges in granular monitoring
• Written in PHP and complex SQL
procedures
• Analytics workloads were very
resource-intensive because the DB was
designed for production workloads
• Technical debt due to rapid
development
V2 Tech Stack
▪ Microservices architecture
  ▪ Each team will manage its own components
  ▪ Each service can be deployed and scaled independently
  ▪ Multi-language, multi-framework support (Django, NestJS + React + …)
▪ Amazon S3 for storage
  ▪ High availability, built for scale and durability
▪ Kubernetes on Amazon EKS for processing
  ▪ Easy to manage and scale
  ▪ Support for open-source monitoring tools
▪ AWS Glue/Amazon Athena for analytics
  ▪ Can run large-scale analytics on top of S3 without any additional
infrastructure
▪ Amazon Redshift as the compliance source of truth (SOT), book of record and data
warehouse
  ▪ Populated via ETL jobs, providing easily consumable derived data
  ▪ Compliance and access issues resolved
#14
Monitoring our Cluster and Workloads
▪ State-of-the-art Kubernetes cluster
▪ Monitoring - Prometheus + AlertManager + Grafana
▪ Logging - Elasticsearch + Logstash + Kibana
▪ Service Mesh - Istio
▪ Other tools - Keycloak + Jenkins + Kafka + Sealed Secrets + Sentry + NewRelic + Airflow
#15
Designing data pipeline on AWS
#16
Problem Statement
▪ Volume spiked from 1,000 to 500,000 loan applications
▪ Data footprint: each application has approximately 2,000 data points from credit bureau,
bank, application, device, geolocation and demographic sources, making this a big data problem
▪ Message volume grew from 1 million to 100 million messages per day
▪ Structured and unstructured data
▪ Data arrives asynchronously, which makes scoring harder (Redis ZSET + queue)
▪ Models and scorecards need to be simplified, and equivalency must be achieved
▪ Data governance for storage and analytics over a massive dataset
▪ Tracking and monitoring the whole infrastructure along with model performance
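The "Redis ZSET + queue" note suggests a pattern along these lines: collect packets per application, score as soon as all required data has arrived, and use a sorted set keyed by deadline to find applications that must be scored with partial data. This is a minimal sketch under that assumption, not StashFin's implementation. Redis is replaced by plain Python structures so it runs without a server; the required source names, the timeout and all class and method names are hypothetical, and the scoring model itself is out of scope.

```python
from collections import deque

# Hypothetical: sources an application needs before it can be fully scored.
REQUIRED_SOURCES = {"bureau", "bank", "device"}


class PacketAggregator:
    """In-memory stand-in for a Redis sorted set (deadline index) plus a work queue.

    With Redis, `deadlines` would be a ZSET written with ZADD (score = deadline),
    swept with ZRANGEBYSCORE and cleaned up with ZREM; `ready` would be a list/queue.
    """

    def __init__(self, timeout_sec):
        self.timeout = timeout_sec
        self.packets = {}     # application id -> {source: payload}
        self.deadlines = {}   # application id -> deadline timestamp (the ZSET score)
        self.ready = deque()  # (app_id, packets, complete?) tuples awaiting scoring

    def on_packet(self, app_id, source, payload, now):
        """Record one packet; enqueue the application once all required sources arrived."""
        if app_id not in self.packets:
            self.packets[app_id] = {}
            self.deadlines[app_id] = now + self.timeout  # like ZADD deadlines <score> <app_id>
        self.packets[app_id][source] = payload
        if REQUIRED_SOURCES.issubset(self.packets[app_id]):
            self._enqueue(app_id, complete=True)

    def sweep(self, now):
        """Enqueue applications whose deadline has passed, oldest deadline first."""
        expired = sorted((d, a) for a, d in self.deadlines.items() if d <= now)  # like ZRANGEBYSCORE
        for _, app_id in expired:
            self._enqueue(app_id, complete=False)

    def _enqueue(self, app_id, complete):
        del self.deadlines[app_id]  # like ZREM
        self.ready.append((app_id, self.packets.pop(app_id), complete))


agg = PacketAggregator(timeout_sec=30)
agg.on_packet("app-1", "bureau", {"score": 710}, now=0)
agg.on_packet("app-2", "device", {"os": "android"}, now=5)
agg.on_packet("app-1", "bank", {"balance": 1200}, now=10)
agg.on_packet("app-1", "device", {"os": "ios"}, now=12)  # app-1 complete -> queued
agg.sweep(now=40)                                         # app-2 timed out -> queued as partial
print([(app_id, complete) for app_id, _, complete in agg.ready])
# [('app-1', True), ('app-2', False)]
```

The `complete` flag lets a downstream scorer choose between the full model and a fallback scorecard for applications that timed out with partial data.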
#17
Data Pipeline Design Considerations
▪ Organizational data democracy
  ▪ Data accessible across all verticals
  ▪ Centralized data lake for all reporting needs
▪ Feature engineering for ML models
  ▪ Model implementation to production in a
couple of hours
  ▪ Ability to handle a wide & deep feature set
▪ Data capturing/decisioning speed
  ▪ Capture more data earlier in the journey
  ▪ Real-time decisioning for a better user experience
▪ Security & compliance
▪ Low total cost of ownership
  ▪ End-to-end process in code & fit-for-purpose
engineering
  ▪ Cost-effective at scale
#18
Data Pipeline Structure
▪ Real-time decisioning & scalability
  ▪ Using Redis to cache data and run decision models
  ▪ Asynchronous process scores applications as data arrives in multiple
packets
▪ Data storage
  ▪ Using S3 as our data lake: all data points are
stored directly in S3
  ▪ Queries are run on top of S3 using Athena
  ▪ No data silos: all services use the same infrastructure
▪ Data processing: Athena
  ▪ No infrastructure to maintain or upgrade, and very cost-effective
  ▪ Write simple SQL queries with joins
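Because Athena bills and performs by the amount of data scanned, data lakes on S3 are commonly laid out with Hive-style `key=value` prefixes so that queries filtering on those columns read only the matching prefixes. The deck does not describe StashFin's layout; the bucket name, table name, `source`/`dt` partition columns and helper functions below are hypothetical. The Athena table would be declared with `PARTITIONED BY (source string, dt string)`, with partitions registered via `MSCK REPAIR TABLE` or partition projection.

```python
from datetime import datetime, timezone

BUCKET = "example-datalake"  # hypothetical bucket name


def object_key(source, app_id, event_time):
    """Hive-style partitioned key: raw/source=<source>/dt=<YYYY-MM-DD>/<app_id>.json."""
    dt = event_time.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return f"raw/source={source}/dt={dt}/{app_id}.json"


def daily_count_query(table, source, day):
    """Athena SQL that filters on the partition columns, so only one prefix is scanned."""
    # Minimal validation only; a real service should not build SQL from untrusted input.
    if not source.isidentifier():
        raise ValueError(f"unexpected source name: {source!r}")
    datetime.strptime(day, "%Y-%m-%d")  # raises ValueError for a malformed date
    return (
        f"SELECT count(*) AS applications FROM {table} "
        f"WHERE source = '{source}' AND dt = '{day}'"
    )


key = object_key("bureau", "app-123", datetime(2020, 6, 1, 18, 30, tzinfo=timezone.utc))
print(f"s3://{BUCKET}/{key}")
# s3://example-datalake/raw/source=bureau/dt=2020-06-01/app-123.json
print(daily_count_query("loan_events", "bureau", "2020-06-01"))
# SELECT count(*) AS applications FROM loan_events WHERE source = 'bureau' AND dt = '2020-06-01'
```

Writing an object under such a key would be a boto3 `put_object(Bucket=..., Key=..., Body=...)` call; it is left out so the sketch runs without AWS credentials.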
#19
Athena Query - 253 GB scanned in 2 minutes
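A back-of-the-envelope reading of the figure on this slide: effective scan throughput is bytes scanned divided by wall-clock time, and Athena's on-demand cost scales with bytes scanned. The price constant is an assumption (a commonly published US$5 per TB scanned list price that may differ by region and over time), as is treating 1 TB as 1,000 GB.

```python
SCANNED_GB = 253         # from the slide
RUNTIME_SEC = 2 * 60     # "2 minutes" from the slide
PRICE_PER_TB_USD = 5.0   # assumption: Athena on-demand price per TB scanned; check current regional pricing

throughput_gb_per_sec = SCANNED_GB / RUNTIME_SEC
cost_usd = SCANNED_GB / 1000 * PRICE_PER_TB_USD  # assumption: 1 TB = 1,000 GB for a rough estimate

print(f"{throughput_gb_per_sec:.2f} GB/s")  # 2.11 GB/s
print(f"US${cost_usd:.3f} per run")         # US$1.265 per run
```

The throughput number says nothing about how Athena parallelized the scan internally. Storing data in compressed, columnar formats such as Parquet typically reduces the bytes scanned, and therefore the cost, for the same query.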
#20
Summary
Leveraging Amazon S3 and Athena has enabled us to drive:
● Faster decisioning
○ Improved data capture, resulting in a higher conversion rate (more applications are
decisioned faster, with a higher take-up rate)
● Higher reliability
○ Robust, scalable infrastructure has increased reliability
○ 1-2 people manage the whole pipeline vs 6-8 people
● Cost & performance benefits
○ No bulky infrastructure to manage, reducing the cost of managed servers
○ Able to run analytical queries over terabytes of data
#21
Contact us at:
tech@stashfin.com
#22
Recording - https://yourstory.com/session/how-to-empower-a-platform-with-a-data-pipeline-at-
Question & Answer
#23
 
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlCall Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 

How to Empower a Platform With a Data Pipeline At a Scale

  • 1. How to Empower a Platform With a Data Pipeline At a Scale
    ▪ Parikshit Chitalkar, Co-Founder & CTO, StashFin
    ▪ Deepak Sood, Engineering Lead, StashFin
    ▪ David Lin, Head of Risk, StashFin
    ▪ Nitin Dhir, Solutions Architect, AISPL
  • 2. Agenda
    ▪ Overview of StashFin
    ▪ High-level architecture
    ▪ Platform build & scale-out
    ▪ Designing a data pipeline on AWS
    ▪ Q&A
  • 4. Overview
    ▪ Provides flexible, small-ticket personal loans to salaried individuals through its web and mobile platforms
    ▪ Manages the lending process end to end, from customer onboarding to disbursal
    ▪ Founded in 2016 and headquartered in Delhi, by a team of former finance professionals
    ▪ Founders
      ▪ Tushar Aggarwal: ex-General Atlantic, Everstone and Goldman Sachs; Wharton Business School
      ▪ Parikshit Chitalkar: 17 years in fintech, mobile & web technologies in Canada & the US; trained pilot; successfully built & sold 6 DegreesIT to a US-based private equity firm
      ▪ Shruti Aggarwal: ex-Merrill Lynch (NYC) and PwC; Columbia University; Chartered Accountant
    ▪ Business
      ▪ Credit line card that provides the flexibility to use funds at POS terminals and ATMs across the country
    ▪ Key investors (shown as logos on the slide)
  • 5. Key figures in 3 years
    ▪ Cumulative disbursals: 620,000+
    ▪ Repeat customers: >50%
    ▪ Loans per month: 120,000
    ▪ App downloads: >2 million
    ▪ Average ticket size: US$1,000
    ▪ Customers: >400K
    ▪ Volume growth: 60.0x
    ▪ Active cards: >20,000
  • 6. Approvals and Partnerships
    ▪ Key partnerships
    ▪ Compliance & certification* (*compliance standards met)
  • 7. 75 applications integrated to create a seamless customer experience
  • 8. StashFin Architecture: Key Components
  • 10. Platform Build and Scale-Out
  • 11. Build vs Buy
    ▪ How StashFin made the decision early
    ▪ Stage of the business
    ▪ Factors involved
      ▪ Cost
      ▪ Scale
    ▪ Choice of tech stack: LAMP vs MEAN vs .NET
    ▪ Perspective 4 years later
    ▪ Trade-offs & balancing
  • 12. Identifying Scale Bottlenecks
    ▪ Monitoring: you cannot fix what you can't measure
    ▪ DevOps: obsessively automate to standardize
    ▪ Specific business issues
      ▪ Lead volume spikes
      ▪ Reads/commits to the database (table locks)
      ▪ Decisioning time
    ▪ Security
      ▪ DDoS
      ▪ API credit attacks
      ▪ Third-party vulnerabilities
    ▪ Managing expectations on delivery
      ▪ Impact management
      ▪ Release cycles
      ▪ Innovation vs BAU
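The slide lists API credit attacks (presumably abuse that triggers paid third-party calls such as credit-bureau pulls) and lead volume spikes, without saying how they were handled. A common mitigation is to rate-limit each client before any paid call is made. Below is a minimal token-bucket sketch in plain Python; `TokenBucket`, `PerClientLimiter` and their parameters are hypothetical names, and the clock is injectable so the behaviour can be tested without sleeping.

```python
from __future__ import annotations

import time
from typing import Callable


class TokenBucket:
    """Hypothetical limiter: bursts of up to `capacity` calls, then a steady
    `refill_per_sec` calls per second."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = float(capacity)  # start full so a first burst is allowed
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerClientLimiter:
    """One bucket per client key (an API key or device ID, both assumptions)."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self._buckets: dict[str, TokenBucket] = {}

    def allow(self, client_id: str) -> bool:
        bucket = self._buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_per_sec, self.clock)
            self._buckets[client_id] = bucket
        return bucket.allow()
```

A request handler would call `limiter.allow(client_id)` before hitting the paid API and reject the request (for example with HTTP 429) when it returns `False`. In a horizontally scaled service the bucket state would need to live in a shared store rather than in process memory.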
  • 13. Current AWS Infrastructure: Deep Dive
  • 14. What we had, and what we have built
    ▪ V1 tech stack
      ▪ Monolithic application
      ▪ Hard to scale individual services
      ▪ Challenges in granular monitoring
      ▪ Written in PHP and complex SQL procedures
      ▪ Analytics workloads were very resource-intensive because the database was designed for production workloads
      ▪ Technical debt due to rapid development
    ▪ V2 tech stack
      ▪ Microservices architecture
        ▪ Each team manages its own components
        ▪ Each service can be deployed and scaled independently
        ▪ Multi-language, multi-framework support (Django, NestJS + React + …)
      ▪ Amazon S3 for storage
        ▪ High availability, built for scale and durability
      ▪ Kubernetes on Amazon EKS for processing
        ▪ Easy to manage and scale
        ▪ Support for open-source monitoring tools
      ▪ AWS Glue/Amazon Athena for analytics
        ▪ Can run large-scale analytics on top of S3 without any additional infrastructure
      ▪ Amazon Redshift as the compliance source of truth (SOT), book of record and data warehouse
        ▪ Populated via ETLs into easily consumable derived data
        ▪ Compliance & access issues resolved
  • 15. Monitoring Our Cluster and Workloads
    ▪ State-of-the-art Kubernetes cluster
    ▪ Monitoring: Prometheus + Alertmanager + Grafana
    ▪ Logging: Elasticsearch + Logstash + Kibana
    ▪ Service mesh: Istio
    ▪ Other tools: Keycloak + Jenkins + Kafka + Sealed Secrets + Sentry + New Relic + Airflow
  • 17. Problem Statement
    ▪ Volume spiked from 1,000 to 500,000 loan applications
    ▪ Data footprint: each application carries approximately 2,000 data points (credit bureau, bank, application, device, geolocation, demographic), which makes this a big-data problem
    ▪ Message volume grew from 1 million to 100 million messages per day
    ▪ Both structured and unstructured data
    ▪ Data arriving asynchronously makes scoring harder (Redis ZSET + queue)
    ▪ Models/scorecards need to be simplified while achieving equivalency
    ▪ Data governance for storage and analytics over a massive dataset
    ▪ Tracking and monitoring the whole infrastructure, along with model performance
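The slide names a Redis ZSET plus a queue for scoring data that arrives asynchronously, but not the exact design. Below is one plausible shape, written as an in-memory Python sketch so it runs without a Redis server: a per-application dict stands in for a cached hash, a dict of deadlines stands in for the ZSET (score = deadline timestamp), and a deque stands in for the work queue. `AsyncScorer`, `REQUIRED_SOURCES`, the timeout and the scoring function are all assumptions, not StashFin's code.

```python
from __future__ import annotations

from collections import deque
from typing import Any, Callable

# Hypothetical subset of sources that must arrive before scoring.
REQUIRED_SOURCES = frozenset({"bureau", "bank", "device"})


class AsyncScorer:
    """In-memory stand-in for a Redis-backed flow (hash + sorted set + list)."""

    def __init__(self, timeout_s: float, required: frozenset[str] = REQUIRED_SOURCES) -> None:
        self.timeout_s = timeout_s
        self.required = frozenset(required)
        self.partial: dict[str, dict[str, Any]] = {}  # app_id -> {source: payload}, like HSET
        self.deadlines: dict[str, float] = {}         # app_id -> deadline, like a ZSET score
        self.ready: deque[str] = deque()              # complete apps, like RPUSH/LPOP on a list

    def on_packet(self, app_id: str, source: str, payload: Any, now: float) -> None:
        packets = self.partial.setdefault(app_id, {})
        if not packets:
            # First packet for this application: start its timeout (like ZADD).
            self.deadlines[app_id] = now + self.timeout_s
        packets[source] = payload
        if app_id in self.deadlines and self.required.issubset(packets):
            # All required sources present: stop the timer (like ZREM) and enqueue.
            del self.deadlines[app_id]
            self.ready.append(app_id)

    def expired(self, now: float) -> list[str]:
        """Pending apps whose deadline has passed, earliest first
        (like ZRANGEBYSCORE key -inf now). The caller decides whether to
        score them on partial data or route them elsewhere."""
        due = sorted((deadline, app_id) for app_id, deadline in self.deadlines.items()
                     if deadline <= now)
        for _, app_id in due:
            del self.deadlines[app_id]
        return [app_id for _, app_id in due]

    def drain(self, score_fn: Callable[[dict[str, Any]], Any]) -> dict[str, Any]:
        """Score every complete application with a hypothetical `score_fn`."""
        results = {}
        while self.ready:
            app_id = self.ready.popleft()
            results[app_id] = score_fn(self.partial.pop(app_id))
        return results
```

In a real Redis deployment, a periodic sweeper would play the role of `expired`. Keeping deadlines in a sorted set is what makes "which applications have waited too long?" a single range query rather than a scan.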
  • 18. Data Pipeline Design Considerations
    ▪ Organizational data democracy
      ▪ Data accessible across all verticals
      ▪ Centralized data lake for all reporting needs
    ▪ Feature engineering for ML models
      ▪ Model implementation to production in a couple of hours
      ▪ Ability to handle a wide & deep feature set
    ▪ Data capture/decisioning speed
      ▪ Capture more data earlier in the journey
      ▪ Real-time decisioning for a better user experience
    ▪ Security & compliance
    ▪ Low total cost of ownership
      ▪ End-to-end process in code & fit-for-purpose engineering
      ▪ Cost-effective at scale
  • 19. Data Pipeline Structure
    ▪ Real-time decisioning & scalability
      ▪ Using Redis to cache data and run decision models
      ▪ An async process scores applications as data arrives in multiple packets
    ▪ Data storage
      ▪ S3 is our data lake: all data points are stored directly in S3
      ▪ Queries run directly on top of S3 using Athena
      ▪ No data silos: all services use the same infrastructure
    ▪ Data processing: Athena
      ▪ Nothing to maintain or upgrade, and very cost-effective
      ▪ Write simple SQL queries with joins
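The slide says every data point is written straight to S3 and queried with Athena, but not how objects are laid out. One common convention, assumed here, is Hive-style `key=value` prefixes such as `source=bureau/dt=2024-01-15/`. With that layout, a table partitioned on `source` and `dt` lets Athena skip every prefix that a filter like `WHERE source = 'bureau' AND dt = '2024-01-15'` excludes, provided the partitions are registered (for example via a Glue crawler or partition projection). The helpers below only build keys and JSON Lines records. The `raw` prefix and file naming are hypothetical, and the upload itself (for example boto3's `put_object`) is left out.

```python
from __future__ import annotations

import json
from datetime import datetime, timezone
from typing import Any


def object_key(source: str, app_id: str, received_at: datetime, prefix: str = "raw") -> str:
    """Hive-style key: <prefix>/source=<source>/dt=<YYYY-MM-DD>/<app_id>-<epoch_ms>.json"""
    if received_at.tzinfo is None:
        raise ValueError("received_at must be timezone-aware")
    ts = received_at.astimezone(timezone.utc)  # partition by UTC date
    epoch_ms = int(ts.timestamp() * 1000)
    return f"{prefix}/source={source}/dt={ts:%Y-%m-%d}/{app_id}-{epoch_ms}.json"


def to_json_line(record: dict[str, Any]) -> str:
    """One compact JSON object per line (JSON Lines), the shape Athena's JSON SerDes expect."""
    return json.dumps(record, separators=(",", ":"), sort_keys=True) + "\n"
```

Partitioning by UTC date keeps partition boundaries unambiguous. For example, 02:00 on 16 January in India (UTC+5:30) still lands in the `dt=2024-01-15` partition.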
  • 20. Athena query: 253 GB scanned in 2 minutes
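The figure on this slide implies a rough scan rate and, because Athena bills by data scanned, a rough per-query cost. The sketch below works out both. The slide gives no cost figure. The US$5 per TB price, the 10 MB per-query minimum and rounding up to the nearest MB reflect AWS's published Athena pricing for many regions, but they are assumptions here and should be checked for the region in use. Treating 1 TB as 1,000 GB is also an assumption.

```python
import math


def scan_rate_gb_per_s(gb_scanned: float, seconds: float) -> float:
    return gb_scanned / seconds


def athena_query_cost_usd(gb_scanned: float, usd_per_tb: float = 5.0) -> float:
    """Assumed pricing: billed per TB scanned (1 TB = 1,000 GB here),
    rounded up to the nearest MB, with a 10 MB minimum per query."""
    mb_billed = max(math.ceil(gb_scanned * 1000), 10)
    return mb_billed / 1_000_000 * usd_per_tb


rate = scan_rate_gb_per_s(253, 2 * 60)  # 253 GB / 120 s
cost = athena_query_cost_usd(253)       # 0.253 TB x $5 = $1.265
print(f"{rate:.2f} GB/s, ${cost:.3f}")  # prints: 2.11 GB/s, $1.265
```

Because both cost and runtime follow bytes scanned, partition pruning and compressed columnar formats such as Parquet reduce both.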
  • 21. Summary
    Leveraging Amazon S3 & Athena enabled us to drive:
    ● Faster decisioning
      ○ Improved data capture, resulting in a higher conversion rate (more applications are decisioned faster, with a higher take-up rate)
    ● Higher reliability
      ○ Reliability increased with robust & scalable infrastructure
      ○ 1-2 people manage the whole pipeline, versus 6-8 people
    ● Cost & performance benefits
      ○ No bulky infrastructure to manage, reducing the cost of managed servers
      ○ Able to run analytical queries over TBs of data seamlessly
  • 22. Contact us at: tech@stashfin.com
    ▪ Recording: https://yourstory.com/session/how-to-empower-a-platform-with-a-data-pipeline-at-