How to Empower a Platform With a Data Pipeline At a Scale

Parikshit Chitalkar
Co-Founder & CTO,
StashFin
Deepak Sood
Engineering Lead,
StashFin
David Lin
Head of Risk,
StashFin
Nitin Dhir
Solutions Architect,
AISPL
#1

Agenda
Overview of StashFin
High level Architecture
Platform build & scale out
Designing a data pipeline on AWS
Q&A
#2

Overview
▪ Provides flexible small ticket size personal loans to salaried
individuals through its web and mobile platforms
▪ Manages the lending process end to end from customer
onboarding to disbursal
▪ Founded in 2016, headquartered in Delhi, by a team of former
finance professionals
FOUNDERS
Tushar Aggarwal
Ex General Atlantic, Everstone and
Goldman Sachs, Wharton Business
School
Parikshit Chitalkar
17 years in Fintech, Mobile & Web
Technologies in Canada & US. Trained
Pilot. Successfully built & sold 6
DegreesIT to a US-based private
equity firm
Shruti Aggarwal
Ex Merrill Lynch (NYC) and PWC,
Columbia University, Chartered
Accountant
BUSINESS
Key Investors
▪ Credit line card which provides
flexibility to use funds across
POS and ATM machines in the
country
#4

Key figures in 3 years
Cumulative disbursals
620,000+
Repeat Customers
>50%
Loans per month
120,000
App Downloads
60.0x
Average Ticket Size
US$ 1000
Customers
>400K
Volume growth
Active Cards
>20,000
>2 Million
#5

Approvals and Partnerships
Key Partnerships
Compliance & Certification*
*compliance standards met
#6

75 Applications integrated to create a seamless customer experience
#7

StashFin Architecture - Key Components
#8

Platform build and scale out
#10

BUILD vs BUY
▪ How StashFin made a decision early
▪ Stage of the Business
▪ Factors involved
▪ Cost
▪ Scale
▪ Choice of Tech Stack
▪ LAMP vs MEAN vs .NET
▪ Perspective 4 years later
▪ Trade-offs & Balancing
#11

Identifying Scale Bottlenecks
▪ Monitoring
▪ You cannot fix what you can't measure
▪ DevOps
▪ Obsessively automate to standardize
▪ Specific Business Issues
▪ Lead volume spikes
▪ Reads / commits to Database (Table locks)
▪ Decisioning Time
▪ Security
▪ DDoS
▪ API Credit attacks
▪ 3rd Party vulnerabilities
▪ Managing expectations on Delivery
▪ Impact management
▪ Release cycles
▪ Innovation vs BAU
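
The "Decisioning Time" bottleneck above can only be acted on once it is measured. The following is a minimal sketch, not StashFin's implementation: `decide` is a hypothetical stand-in for a decisioning call, and samples are kept in process memory, whereas a real deployment would more likely export them to a metrics system.

```python
import statistics
import time
from functools import wraps

# In-memory latency samples keyed by function name (illustrative only).
LATENCIES = {}


def timed(fn):
    """Record the wall-clock duration of every call to fn."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            LATENCIES.setdefault(fn.__name__, []).append(elapsed)
    return wrapper


def p95(name):
    """95th-percentile latency in seconds (needs at least two samples)."""
    # quantiles(n=20) returns 19 cut points; the last is the 95th percentile.
    return statistics.quantiles(LATENCIES[name], n=20)[-1]


@timed
def decide(application):
    # Hypothetical decision rule; the threshold is an assumption.
    return application["score"] >= 600


if __name__ == "__main__":
    for score in range(500, 700, 10):
        decide({"score": score})
    print(f"p95 decisioning time: {p95('decide'):.6f}s")
```

Tracking a percentile rather than an average keeps lead-volume spikes visible, since a small share of slow decisions is what users notice first.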
#12

Current AWS Infrastructure - Deep Dive
#13

What we had, and what we have built
V1 Tech Stack
• Monolithic application
• Hard to scale individual services
• Challenges in granular monitoring
• Written in PHP and complex SQL procedures
• Analytics workloads were very resource intensive as DB was designed for Production workloads
• Technical debt due to rapid development
V2 Tech Stack
▪ Microservices Architecture
▪ Each team will manage their own components
▪ Each service can be deployed and scaled independently
▪ Multi language, multi framework support (Django, NEST.JS + React +
▪ Amazon S3 for storage
▪ High availability, built for scale and durability
▪ Kubernetes on Amazon EKS for processing
▪ Easy to manage and scale
▪ Support for open-source monitoring tools
▪ AWS Glue / Amazon Athena for Analytics
▪ Can run large scale analytics without any additional infrastructure on top of S3
▪ Amazon Redshift as the Compliance SOT + Book of Record + Data Warehouse
▪ Populated via ETLs - easily consumable derived data
▪ Compliance & access issues resolved
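
The deck does not show how objects are laid out in S3. A common arrangement, assumed here, is Hive-style `key=value` prefixes, which Athena and Glue can treat as partitions so that a query reads only the days and sources it needs. The prefix, partition names (`source`, `dt`) and record fields below are hypothetical.

```python
import json
from datetime import datetime, timezone


def build_object_key(source, application_id, received_at, prefix="raw"):
    """Return a Hive-style partitioned key, e.g.
    raw/source=bureau/dt=2020-01-15/app-123.json (layout is an assumption)."""
    day = received_at.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return f"{prefix}/source={source}/dt={day}/{application_id}.json"


def serialize_record(record):
    """Serialize one record as a single compact JSON line."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")) + "\n"


if __name__ == "__main__":
    ts = datetime(2020, 1, 15, 9, 30, tzinfo=timezone.utc)
    print(build_object_key("bureau", "app-123", ts))
    print(serialize_record({"application_id": "app-123", "score": 712}), end="")
```

The resulting key and body could then be written with boto3's S3 `put_object(Bucket=..., Key=..., Body=...)`; the bucket name is left out because the source does not give one.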
#14

Monitoring our Cluster and Workloads
▪ State of the art Kubernetes Cluster
▪ Monitoring - Prometheus + AlertManager + Grafana
▪ Logging - Elasticsearch + Logstash + Kibana
▪ Service Mesh - Istio
▪ Other tools - Keycloak + Jenkins + Kafka + Sealed Secrets + Sentry + New Relic + Airflow
#15

Designing data pipeline on AWS
#16

Problem Statement
▪ Volume spiked from 1,000 loan applications to 500,000 loan applications
▪ Data footprint - each application has approximately 2,000 data points (Credit Bureau, Bank,
Application, Device, Geo-location, Demographic), making this a big data problem
▪ Message volume grew from 1 million to 100 million messages per day (Big Data)
▪ Structured and unstructured data
▪ Asynchronous data poses a challenge in scoring (Redis Zset + Queue)
▪ Models / Scorecards need to be simplified & equivalency to be achieved
▪ Data governance for storage and analytics over a massive dataset
▪ Tracking and monitoring the whole infrastructure along with model performance
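
To put the figures above in perspective, the short calculation below converts the daily message counts into average per-second rates and multiplies out the data points. It assumes traffic is spread evenly over 24 hours, which understates real peaks, and the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(messages_per_day):
    """Average message rate, assuming an even spread across the day."""
    return messages_per_day / SECONDS_PER_DAY


def total_data_points(applications, points_per_application=2000):
    """Data points held for a given number of applications."""
    return applications * points_per_application


if __name__ == "__main__":
    print(f"before: {per_second(1_000_000):.1f} msg/s")
    print(f"after:  {per_second(100_000_000):.1f} msg/s")
    print(f"data points for 500,000 applications: {total_data_points(500_000):,}")
```

On these assumptions the pipeline moved from roughly 12 to roughly 1,157 messages per second on average, and 500,000 applications at about 2,000 data points each come to about one billion data points.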
#17

Data Pipeline Design Considerations
▪ Organizational Data Democracy
▪ Having data accessible across all verticals
▪ Centralized data lake for all reporting needs
▪ Feature Engineering for ML Models
▪ Model implementation to production in a
couple of hours
▪ Ability to handle a wide & deep feature set
▪ Data Capturing/Decisioning Speed
▪ Capture more data earlier in the journey
▪ Real-time decisioning for a better user experience
▪ Security & Compliance
▪ Low total cost of ownership
▪ End-to-end process in code & fit for purpose
engineering
▪ Cost effective at scale
#18

Data Pipeline Structure
▪ Real-time decisioning & scalable
▪ Using Redis to cache data and run decision models
▪ Async process to score as data arrives in multiple
packets
▪ Data Storage
▪ Using S3 as our data lake - All the data points are
stored in S3 directly
▪ Queries are run on top of S3 using Athena seamlessly
▪ No data silos - all services use same infrastructure
▪ Data Processing - Athena
▪ No infrastructure to maintain or upgrade, and very cost effective
▪ Write simple SQL queries with joins
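
The slides name Redis as the cache and mention a sorted set plus a queue for asynchronous scoring, but do not describe the mechanics. The sketch below shows one plausible arrangement using in-memory stand-ins so that it runs without a Redis server: a dict in place of a hash, a timestamp map in place of the sorted set, and a deque in place of the queue. The source names, timeout and toy scorecard are all assumptions.

```python
from collections import deque

REQUIRED_SOURCES = {"bureau", "bank", "device"}  # hypothetical packet types


class AsyncScorer:
    """Collects data packets per application and queues it for scoring
    once every required source has arrived or a timeout has passed."""

    def __init__(self, timeout_seconds=30.0):
        self.cache = {}       # application_id -> {source: value} (hash stand-in)
        self.pending = {}     # application_id -> first-seen time (sorted-set stand-in)
        self.ready = deque()  # application ids awaiting scoring (queue stand-in)
        self.closed = set()   # ids already queued; unbounded here, would need expiry
        self.timeout_seconds = timeout_seconds

    def receive(self, application_id, source, value, now):
        """Store one packet; returns False if it arrived too late to count."""
        if application_id in self.closed:
            return False
        self.cache.setdefault(application_id, {})[source] = value
        self.pending.setdefault(application_id, now)
        if REQUIRED_SOURCES.issubset(self.cache[application_id]):
            self._enqueue(application_id)
        return True

    def expire(self, now):
        """Queue applications that have waited past the timeout, oldest first."""
        overdue = sorted(
            (ts, app) for app, ts in self.pending.items()
            if now - ts >= self.timeout_seconds
        )
        for _, app in overdue:
            self._enqueue(app)

    def _enqueue(self, application_id):
        del self.pending[application_id]
        self.closed.add(application_id)
        self.ready.append(application_id)

    def score_next(self):
        """Score the next ready application with whatever data has arrived."""
        if not self.ready:
            return None
        application_id = self.ready.popleft()
        packets = self.cache.pop(application_id)
        # Toy scorecard: sum of the numeric values; purely illustrative.
        return application_id, sum(packets.values()), sorted(packets)


if __name__ == "__main__":
    scorer = AsyncScorer()
    scorer.receive("a1", "bureau", 300, now=0.0)
    scorer.receive("a1", "bank", 200, now=1.0)
    scorer.receive("a1", "device", 100, now=2.0)
    print(scorer.score_next())
```

Keeping the first-seen time per application is what lets a partially complete application still be scored after the timeout instead of waiting indefinitely for a packet that never arrives.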
#19

Athena Query - 253 GB scanned in 2 minutes
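
The query behind this slide is not included in the transcript. As a rough illustration of how such a query might be submitted from code, the sketch below builds a simple aggregate and runs it through boto3's Athena client (`start_query_execution`, then polling `get_query_execution`). The table, columns, database and output location are hypothetical, and `boto3` is imported lazily so the helper functions work without AWS access.

```python
import re
import time


def build_daily_counts_sql(table, day):
    """SQL counting records per source for one dt partition (schema assumed)."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"unexpected table name: {table!r}")
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", day):
        raise ValueError(f"expected YYYY-MM-DD, got {day!r}")
    return (
        f"SELECT source, COUNT(*) AS records "
        f"FROM {table} WHERE dt = '{day}' "
        f"GROUP BY source ORDER BY records DESC"
    )


def run_query(sql, database, output_location, poll_seconds=2.0):
    """Submit sql to Athena and wait; returns (final state, bytes scanned)."""
    import boto3  # lazy import: only needed when a query is actually run

    athena = boto3.client("athena")
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]
        state = execution["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            scanned = execution.get("Statistics", {}).get("DataScannedInBytes", 0)
            return state, scanned
        time.sleep(poll_seconds)


def scan_rate_gb_per_second(gigabytes, seconds):
    """Average scan throughput for a finished query."""
    return gigabytes / seconds


if __name__ == "__main__":
    print(build_daily_counts_sql("events", "2020-01-15"))
    print(f"{scan_rate_gb_per_second(253, 2 * 60):.2f} GB/s")
```

Taking the slide's figures at face value, 253 GB in 2 minutes works out to an average of about 2.1 GB scanned per second.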
#20

Summary
Leveraging Amazon S3 & Athena enables us to drive:
● Faster Decisioning
○ Improved data capturing resulting in a higher conversion rate (more applications are
decisioned faster, with a higher take-up rate)
● Higher Reliability
○ Robust & scalable infrastructure increased reliability
○ 1-2 people manage the whole pipeline vs 6-8 people
● Cost & Performance Benefits
○ No need to manage bulky infrastructure; reduced cost of managed servers
○ Able to run analytical queries over TBs of data seamlessly
#21

Contact us at:
tech@stashfin.com
#22
Recording - https://yourstory.com/session/how-to-empower-a-platform-with-a-data-pipeline-at-
