SlideShare a Scribd company logo
1 of 31
Download to read offline
Improving Mobile
Payments with Real time
Spark
● Madhukara Phatak
● Big data consultant and
trainer at datamantra.io
● Consult in Hadoop, Spark
and Scala
● www.madhukaraphatak.com
Agenda
● Mobile as drive for big data
● Our customer solution
● Existing data solution
● Improved solution
● Technical details
● Future enhancements
● Q & A
Mobile as Big data drive
● Mobile has changed the way in which we interact with
world
● Most of the buy/sell happens on mobile today
○ Myntra went fully mobile
○ Flipkart and amazon say their 50% buy happens on
mobile
○ Quikr and OLX is mobile based selling platform
○ Ola etc
Challenges in Mobile
● Customers expect the service to available 24/7
● Tiny screens make very challenging to typical software
flows
● Flaky connectivity of mobile networks makes it tougher
● Constant moving results in drop in interactions
● No more downtime
● Everything has to be done in realtime
Mobile payments
● Almost every app earlier mentioned needs some kind of
payment
● Getting payments right on mobile is very hard
● Globally 21% of online shoppers abandon their basket
due to payment failures or delays
● Some companies are building sdk’s to help the app
developers
● Our customer is one of them
Why mobile payments are hard?
Too many inputs
Terrible interface by Banks
OTP vs Password
Our customer solution
Our customer solution
● Mobile sdk for applications simplify the payments
● SDK provides better user interface like big buttons to
generate OTP or other flows
● SDK also helps in filling up different kind of forms given
by different banks using consistent UI
● Better user experience across applications
● Application sends anonymous payments details across
apps to our customer servers
Some numbers
● 40 + customers
● Over 1 million transactions per month as per March
● Around 55% success rate ( 5 % above average)
● Supports major banks, payment gateways and wallet
providers
● Soon will be available in other than mobile payment
space
Why data matters?
● As number of transaction increases, things will go
wrong
● There are so many different combinations to go wrong
● Example
○ Airtel OTP failing with state bank netbanking
○ Customers stuck in password page
○ Not able to read OTP from some specific
● Understanding customer pain and reacting to it is
paramount
● Every help results in payment
Initial BI solution
Events
Hourly
Push
JSON
Data
S3FS
Session Wise
Aggregations
Initial BI solution
● Phone sdk pushes events like transaction initiation,
payment complete to logging servers
● Logging servers roll log for every one hour and push to
s3
● A single node spark machine aggregates data by
sessions and pushes it to mysql
● Google BigQuery is used for adhoc querying
Challenges with BI solution
● Batch processing
● Geared towards more of report generation oriented flow
● Very minimal use of Spark API’s as team was not well
aware of it’s potential
● No integration with mobile sdk for feed back loop
Requirements for consulting
● Bring the same reporting calculations to real time
● Understanding the user behaviour and tracking his/her
flow over a session
● Closing the loop by providing automatic alerts based on
the metric calculations
● Some new specific business cases like loyalty
management etc
● Improving team expertise on spark
Choosing Spark streaming
● Company was already invested in Spark so spark
streaming was no brainer
● Also porting spark batch code to streaming was mostly
straight forward as both talk same API
● Company used python as Spark API language which
was supported by streaming also
● So we didn't consider storm we went ahead with Spark
streaming
First version
Events
Five Minute Push
JSON
Data
FileStream
Session Wise
Aggregations
First version
● We used fileStream API of spark streaming which
allowed us to poll a s3 bucket for every few mins
● A new rolling appender was added to log servers to
push logs to s3 every 5 mins
● Exact same batch code was used for calculations which
made transition very easy
● All downstream applications remained same
Second version
Events
JSON
Data
Session Wise
Aggregations
Hourly
Push Realtime
Amazon Kinesis
● A kafka like distributed message queue by Amazon
● It’s used as managed kafka source on AWS web
services
● Highly scalable and low latency support
● Persistence with fault tolerance across multiple
availability zones
● Great integration with Spark
Second version
● Amazon kinesis is added as real time stream source
● Logging server push logs to kinesis as they arrive
● Streaming application pulls the data from kinesis for
every few mins
● Multiple partitions support added for parallel streams
Challenges with Python
● Spark streaming API for python was introduced in 1.2
whereas spark-streaming for Scala/Java is available
from 0.8
● No aws kinesis connector was available as of March
● Team has to write it’s own
● No support for python in Spark job server
Challenges from batch to streaming
● Session typically last from 1-10 mins. Batch is easy
most of the time session is done for a one hour data but
challenging for real time data
● Designing state for session
● Designing checkpointing and deciding on interval
● Weird checkpointing issues with s3 due to eventual
consistency
Improvements to batch code
● Most of the code was written in rdd paradigm as it was
only know to team
● Team was trained on spark sql and spark streaming
● Majority code was ported to Spark sql based solution to
improve readability and maintainability
● Recently moved into Dataframe based code
Third version
Events
JSON
Data
Session Wise
Aggregations
Hourly
Push Realtime
Choosing Mesos
● Mesos is a great cluster manager for Spark only
workloads
● Has specific coarse-grain mode which is dedicated for
the real time systems
● Minimal overhead compared to YARN
● Easy to setup on EC2
Fourth version
Events
JSON
Data
Session Wise
Aggregations
Hourly
Push Realtime
Grafana
● Added grafana for visualization and dashboards
● Graphana = Graphite + influxDB
● Moved away from mysql to time series database influx
DB
● Scales much better compared to mysql
● Data scientists or product managers can monitor
customers using these dashboards
● Integrates with mobile sdk

More Related Content

What's hot

Multi Source Data Analysis using Spark and Tellius
Multi Source Data Analysis using Spark and TelliusMulti Source Data Analysis using Spark and Tellius
Multi Source Data Analysis using Spark and Telliusdatamantra
 
Productionalizing a spark application
Productionalizing a spark applicationProductionalizing a spark application
Productionalizing a spark applicationdatamantra
 
Understanding transactional writes in datasource v2
Understanding transactional writes in  datasource v2Understanding transactional writes in  datasource v2
Understanding transactional writes in datasource v2datamantra
 
Introduction to Spark 2.0 Dataset API
Introduction to Spark 2.0 Dataset APIIntroduction to Spark 2.0 Dataset API
Introduction to Spark 2.0 Dataset APIdatamantra
 
Introduction to Datasource V2 API
Introduction to Datasource V2 APIIntroduction to Datasource V2 API
Introduction to Datasource V2 APIdatamantra
 
Exploratory Data Analysis in Spark
Exploratory Data Analysis in SparkExploratory Data Analysis in Spark
Exploratory Data Analysis in Sparkdatamantra
 
Introduction to dataset
Introduction to datasetIntroduction to dataset
Introduction to datasetdatamantra
 
Interactive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark StreamingInteractive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark Streamingdatamantra
 
Anatomy of Data Source API : A deep dive into Spark Data source API
Anatomy of Data Source API : A deep dive into Spark Data source APIAnatomy of Data Source API : A deep dive into Spark Data source API
Anatomy of Data Source API : A deep dive into Spark Data source APIdatamantra
 
Evolution of apache spark
Evolution of apache sparkEvolution of apache spark
Evolution of apache sparkdatamantra
 
Introduction to Structured streaming
Introduction to Structured streamingIntroduction to Structured streaming
Introduction to Structured streamingdatamantra
 
Introduction to Flink Streaming
Introduction to Flink StreamingIntroduction to Flink Streaming
Introduction to Flink Streamingdatamantra
 
Migrating to spark 2.0
Migrating to spark 2.0Migrating to spark 2.0
Migrating to spark 2.0datamantra
 
Laskar: High-Velocity GraphQL & Lambda-based Software Development Model
Laskar: High-Velocity GraphQL & Lambda-based Software Development ModelLaskar: High-Velocity GraphQL & Lambda-based Software Development Model
Laskar: High-Velocity GraphQL & Lambda-based Software Development ModelGarindra Prahandono
 
Migrating to Spark 2.0 - Part 2
Migrating to Spark 2.0 - Part 2Migrating to Spark 2.0 - Part 2
Migrating to Spark 2.0 - Part 2datamantra
 
Spark architecture
Spark architectureSpark architecture
Spark architecturedatamantra
 
Introduction to spark 2.0
Introduction to spark 2.0Introduction to spark 2.0
Introduction to spark 2.0datamantra
 
Real time ETL processing using Spark streaming
Real time ETL processing using Spark streamingReal time ETL processing using Spark streaming
Real time ETL processing using Spark streamingdatamantra
 
Structured Streaming with Kafka
Structured Streaming with KafkaStructured Streaming with Kafka
Structured Streaming with Kafkadatamantra
 
Building real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark StreamingBuilding real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark Streamingdatamantra
 

What's hot (20)

Multi Source Data Analysis using Spark and Tellius
Multi Source Data Analysis using Spark and TelliusMulti Source Data Analysis using Spark and Tellius
Multi Source Data Analysis using Spark and Tellius
 
Productionalizing a spark application
Productionalizing a spark applicationProductionalizing a spark application
Productionalizing a spark application
 
Understanding transactional writes in datasource v2
Understanding transactional writes in  datasource v2Understanding transactional writes in  datasource v2
Understanding transactional writes in datasource v2
 
Introduction to Spark 2.0 Dataset API
Introduction to Spark 2.0 Dataset APIIntroduction to Spark 2.0 Dataset API
Introduction to Spark 2.0 Dataset API
 
Introduction to Datasource V2 API
Introduction to Datasource V2 APIIntroduction to Datasource V2 API
Introduction to Datasource V2 API
 
Exploratory Data Analysis in Spark
Exploratory Data Analysis in SparkExploratory Data Analysis in Spark
Exploratory Data Analysis in Spark
 
Introduction to dataset
Introduction to datasetIntroduction to dataset
Introduction to dataset
 
Interactive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark StreamingInteractive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark Streaming
 
Anatomy of Data Source API : A deep dive into Spark Data source API
Anatomy of Data Source API : A deep dive into Spark Data source APIAnatomy of Data Source API : A deep dive into Spark Data source API
Anatomy of Data Source API : A deep dive into Spark Data source API
 
Evolution of apache spark
Evolution of apache sparkEvolution of apache spark
Evolution of apache spark
 
Introduction to Structured streaming
Introduction to Structured streamingIntroduction to Structured streaming
Introduction to Structured streaming
 
Introduction to Flink Streaming
Introduction to Flink StreamingIntroduction to Flink Streaming
Introduction to Flink Streaming
 
Migrating to spark 2.0
Migrating to spark 2.0Migrating to spark 2.0
Migrating to spark 2.0
 
Laskar: High-Velocity GraphQL & Lambda-based Software Development Model
Laskar: High-Velocity GraphQL & Lambda-based Software Development ModelLaskar: High-Velocity GraphQL & Lambda-based Software Development Model
Laskar: High-Velocity GraphQL & Lambda-based Software Development Model
 
Migrating to Spark 2.0 - Part 2
Migrating to Spark 2.0 - Part 2Migrating to Spark 2.0 - Part 2
Migrating to Spark 2.0 - Part 2
 
Spark architecture
Spark architectureSpark architecture
Spark architecture
 
Introduction to spark 2.0
Introduction to spark 2.0Introduction to spark 2.0
Introduction to spark 2.0
 
Real time ETL processing using Spark streaming
Real time ETL processing using Spark streamingReal time ETL processing using Spark streaming
Real time ETL processing using Spark streaming
 
Structured Streaming with Kafka
Structured Streaming with KafkaStructured Streaming with Kafka
Structured Streaming with Kafka
 
Building real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark StreamingBuilding real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark Streaming
 

Viewers also liked

Digital assets And Currencies In The Information Age: Do Your Ones And Zero H...
Digital assets And Currencies In The Information Age: Do Your Ones And Zero H...Digital assets And Currencies In The Information Age: Do Your Ones And Zero H...
Digital assets And Currencies In The Information Age: Do Your Ones And Zero H...Bradley Perry
 
Google Wallet Presentation
Google Wallet PresentationGoogle Wallet Presentation
Google Wallet PresentationRaghav Sharma
 
Mesos and Kubernetes ecosystem overview
Mesos and Kubernetes ecosystem overviewMesos and Kubernetes ecosystem overview
Mesos and Kubernetes ecosystem overviewKrishna-Kumar
 
Digital currencies new technology new business model
Digital currencies new technology new business modelDigital currencies new technology new business model
Digital currencies new technology new business modelShiva Bissessar
 
Payments industry watches for inflections in mobile, credit
Payments industry watches for inflections in mobile, creditPayments industry watches for inflections in mobile, credit
Payments industry watches for inflections in mobile, creditBloomberg LP
 
Apple's Passbook and Google Wallet: 5 key differences you need to know
Apple's Passbook and Google Wallet: 5 key differences you need to knowApple's Passbook and Google Wallet: 5 key differences you need to know
Apple's Passbook and Google Wallet: 5 key differences you need to knowVibes_Thought_Leadership
 
Demonetisation: Push Towards a Digital Economy
Demonetisation: Push Towards a Digital EconomyDemonetisation: Push Towards a Digital Economy
Demonetisation: Push Towards a Digital EconomyShreyas Kamath
 
Introduction to Apache Flink
Introduction to Apache FlinkIntroduction to Apache Flink
Introduction to Apache Flinkdatamantra
 
CBGTBT - Part 5 - Blockchains 102
CBGTBT - Part 5 - Blockchains 102CBGTBT - Part 5 - Blockchains 102
CBGTBT - Part 5 - Blockchains 102Blockstrap.com
 
CBGTBT - Part 6 - Transactions 102
CBGTBT - Part 6 - Transactions 102CBGTBT - Part 6 - Transactions 102
CBGTBT - Part 6 - Transactions 102Blockstrap.com
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Sparkdatamantra
 
Python in the Hadoop Ecosystem (Rock Health presentation)
Python in the Hadoop Ecosystem (Rock Health presentation)Python in the Hadoop Ecosystem (Rock Health presentation)
Python in the Hadoop Ecosystem (Rock Health presentation)Uri Laserson
 

Viewers also liked (16)

Digital Currencies: Where to from here?
Digital Currencies: Where to from here?Digital Currencies: Where to from here?
Digital Currencies: Where to from here?
 
Digital assets And Currencies In The Information Age: Do Your Ones And Zero H...
Digital assets And Currencies In The Information Age: Do Your Ones And Zero H...Digital assets And Currencies In The Information Age: Do Your Ones And Zero H...
Digital assets And Currencies In The Information Age: Do Your Ones And Zero H...
 
Anatomy of file write in hadoop
Anatomy of file write in hadoopAnatomy of file write in hadoop
Anatomy of file write in hadoop
 
Google Wallet Presentation
Google Wallet PresentationGoogle Wallet Presentation
Google Wallet Presentation
 
Mesos and Kubernetes ecosystem overview
Mesos and Kubernetes ecosystem overviewMesos and Kubernetes ecosystem overview
Mesos and Kubernetes ecosystem overview
 
Visa
VisaVisa
Visa
 
Digital currencies new technology new business model
Digital currencies new technology new business modelDigital currencies new technology new business model
Digital currencies new technology new business model
 
Payments industry watches for inflections in mobile, credit
Payments industry watches for inflections in mobile, creditPayments industry watches for inflections in mobile, credit
Payments industry watches for inflections in mobile, credit
 
Apple's Passbook and Google Wallet: 5 key differences you need to know
Apple's Passbook and Google Wallet: 5 key differences you need to knowApple's Passbook and Google Wallet: 5 key differences you need to know
Apple's Passbook and Google Wallet: 5 key differences you need to know
 
Demonetisation: Push Towards a Digital Economy
Demonetisation: Push Towards a Digital EconomyDemonetisation: Push Towards a Digital Economy
Demonetisation: Push Towards a Digital Economy
 
Introduction to Apache Flink
Introduction to Apache FlinkIntroduction to Apache Flink
Introduction to Apache Flink
 
A Multi Colored YARN
A Multi Colored YARNA Multi Colored YARN
A Multi Colored YARN
 
CBGTBT - Part 5 - Blockchains 102
CBGTBT - Part 5 - Blockchains 102CBGTBT - Part 5 - Blockchains 102
CBGTBT - Part 5 - Blockchains 102
 
CBGTBT - Part 6 - Transactions 102
CBGTBT - Part 6 - Transactions 102CBGTBT - Part 6 - Transactions 102
CBGTBT - Part 6 - Transactions 102
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Python in the Hadoop Ecosystem (Rock Health presentation)
Python in the Hadoop Ecosystem (Rock Health presentation)Python in the Hadoop Ecosystem (Rock Health presentation)
Python in the Hadoop Ecosystem (Rock Health presentation)
 

Similar to Improving Mobile Payments With Real time Spark

Simply Business' Data Platform
Simply Business' Data PlatformSimply Business' Data Platform
Simply Business' Data PlatformDani Solà Lagares
 
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ UberKafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uberconfluent
 
Understanding time in structured streaming
Understanding time in structured streamingUnderstanding time in structured streaming
Understanding time in structured streamingdatamantra
 
apidays LIVE India - Asynchronous and Broadcasting APIs using Kafka by Rohit ...
apidays LIVE India - Asynchronous and Broadcasting APIs using Kafka by Rohit ...apidays LIVE India - Asynchronous and Broadcasting APIs using Kafka by Rohit ...
apidays LIVE India - Asynchronous and Broadcasting APIs using Kafka by Rohit ...apidays
 
Simply Business - Near Real Time Event Processing
Simply Business - Near Real Time Event ProcessingSimply Business - Near Real Time Event Processing
Simply Business - Near Real Time Event Processingidan_by
 
apidays LIVE New York 2021 - Managing the usage of Asynchronous APIs: What do...
apidays LIVE New York 2021 - Managing the usage of Asynchronous APIs: What do...apidays LIVE New York 2021 - Managing the usage of Asynchronous APIs: What do...
apidays LIVE New York 2021 - Managing the usage of Asynchronous APIs: What do...apidays
 
Google App Engine Overview and Update
Google App Engine Overview and UpdateGoogle App Engine Overview and Update
Google App Engine Overview and UpdateChris Schalk
 
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...Flink Forward
 
Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...
Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...
Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...confluent
 
Instruments to play microservice
Instruments to play microserviceInstruments to play microservice
Instruments to play microserviceChandresh Pancholi
 
[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQLWSO2
 
Scaling Production Data across Microservices
Scaling Production Data across MicroservicesScaling Production Data across Microservices
Scaling Production Data across MicroservicesErik Ashepa
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAshyamraj55
 
Nginx Conference 2016 - Learnings and State of the Industry
Nginx Conference 2016 - Learnings and State of the IndustryNginx Conference 2016 - Learnings and State of the Industry
Nginx Conference 2016 - Learnings and State of the IndustryBenjamin Scholler
 
[APIdays NY] Managing the usage of Asynchronous APIs: What does it take?
[APIdays NY] Managing the usage of Asynchronous APIs: What does it take?[APIdays NY] Managing the usage of Asynchronous APIs: What does it take?
[APIdays NY] Managing the usage of Asynchronous APIs: What does it take?WSO2
 
Cloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and FastCloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and FastDatabricks
 
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018Bowen Li
 
The good, the bad, the ugly side of step functions
The good, the bad, the ugly side of step functionsThe good, the bad, the ugly side of step functions
The good, the bad, the ugly side of step functionsMohsiur Rahman
 
Sweet Streams (Are made of this)
Sweet Streams (Are made of this)Sweet Streams (Are made of this)
Sweet Streams (Are made of this)Corneil du Plessis
 
Structured Streaming in Spark
Structured Streaming in SparkStructured Streaming in Spark
Structured Streaming in SparkDigital Vidya
 

Similar to Improving Mobile Payments With Real time Spark (20)

Simply Business' Data Platform
Simply Business' Data PlatformSimply Business' Data Platform
Simply Business' Data Platform
 
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ UberKafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
 
Understanding time in structured streaming
Understanding time in structured streamingUnderstanding time in structured streaming
Understanding time in structured streaming
 
apidays LIVE India - Asynchronous and Broadcasting APIs using Kafka by Rohit ...
apidays LIVE India - Asynchronous and Broadcasting APIs using Kafka by Rohit ...apidays LIVE India - Asynchronous and Broadcasting APIs using Kafka by Rohit ...
apidays LIVE India - Asynchronous and Broadcasting APIs using Kafka by Rohit ...
 
Simply Business - Near Real Time Event Processing
Simply Business - Near Real Time Event ProcessingSimply Business - Near Real Time Event Processing
Simply Business - Near Real Time Event Processing
 
apidays LIVE New York 2021 - Managing the usage of Asynchronous APIs: What do...
apidays LIVE New York 2021 - Managing the usage of Asynchronous APIs: What do...apidays LIVE New York 2021 - Managing the usage of Asynchronous APIs: What do...
apidays LIVE New York 2021 - Managing the usage of Asynchronous APIs: What do...
 
Google App Engine Overview and Update
Google App Engine Overview and UpdateGoogle App Engine Overview and Update
Google App Engine Overview and Update
 
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...
 
Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...
Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...
Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...
 
Instruments to play microservice
Instruments to play microserviceInstruments to play microservice
Instruments to play microservice
 
[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL
 
Scaling Production Data across Microservices
Scaling Production Data across MicroservicesScaling Production Data across Microservices
Scaling Production Data across Microservices
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 
Nginx Conference 2016 - Learnings and State of the Industry
Nginx Conference 2016 - Learnings and State of the IndustryNginx Conference 2016 - Learnings and State of the Industry
Nginx Conference 2016 - Learnings and State of the Industry
 
[APIdays NY] Managing the usage of Asynchronous APIs: What does it take?
[APIdays NY] Managing the usage of Asynchronous APIs: What does it take?[APIdays NY] Managing the usage of Asynchronous APIs: What does it take?
[APIdays NY] Managing the usage of Asynchronous APIs: What does it take?
 
Cloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and FastCloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and Fast
 
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
 
The good, the bad, the ugly side of step functions
The good, the bad, the ugly side of step functionsThe good, the bad, the ugly side of step functions
The good, the bad, the ugly side of step functions
 
Sweet Streams (Are made of this)
Sweet Streams (Are made of this)Sweet Streams (Are made of this)
Sweet Streams (Are made of this)
 
Structured Streaming in Spark
Structured Streaming in SparkStructured Streaming in Spark
Structured Streaming in Spark
 

More from datamantra

State management in Structured Streaming
State management in Structured StreamingState management in Structured Streaming
State management in Structured Streamingdatamantra
 
Spark on Kubernetes
Spark on KubernetesSpark on Kubernetes
Spark on Kubernetesdatamantra
 
Core Services behind Spark Job Execution
Core Services behind Spark Job ExecutionCore Services behind Spark Job Execution
Core Services behind Spark Job Executiondatamantra
 
Optimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloadsOptimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloadsdatamantra
 
Spark stack for Model life-cycle management
Spark stack for Model life-cycle managementSpark stack for Model life-cycle management
Spark stack for Model life-cycle managementdatamantra
 
Productionalizing Spark ML
Productionalizing Spark MLProductionalizing Spark ML
Productionalizing Spark MLdatamantra
 
Testing Spark and Scala
Testing Spark and ScalaTesting Spark and Scala
Testing Spark and Scaladatamantra
 
Understanding Implicits in Scala
Understanding Implicits in ScalaUnderstanding Implicits in Scala
Understanding Implicits in Scaladatamantra
 
Scalable Spark deployment using Kubernetes
Scalable Spark deployment using KubernetesScalable Spark deployment using Kubernetes
Scalable Spark deployment using Kubernetesdatamantra
 
Introduction to concurrent programming with akka actors
Introduction to concurrent programming with akka actorsIntroduction to concurrent programming with akka actors
Introduction to concurrent programming with akka actorsdatamantra
 
Functional programming in Scala
Functional programming in ScalaFunctional programming in Scala
Functional programming in Scaladatamantra
 
Telco analytics at scale
Telco analytics at scaleTelco analytics at scale
Telco analytics at scaledatamantra
 
Platform for Data Scientists
Platform for Data ScientistsPlatform for Data Scientists
Platform for Data Scientistsdatamantra
 
Building scalable rest service using Akka HTTP
Building scalable rest service using Akka HTTPBuilding scalable rest service using Akka HTTP
Building scalable rest service using Akka HTTPdatamantra
 
Anatomy of Spark SQL Catalyst - Part 2
Anatomy of Spark SQL Catalyst - Part 2Anatomy of Spark SQL Catalyst - Part 2
Anatomy of Spark SQL Catalyst - Part 2datamantra
 
Anatomy of spark catalyst
Anatomy of spark catalystAnatomy of spark catalyst
Anatomy of spark catalystdatamantra
 

More from datamantra (16)

State management in Structured Streaming
State management in Structured StreamingState management in Structured Streaming
State management in Structured Streaming
 
Spark on Kubernetes
Spark on KubernetesSpark on Kubernetes
Spark on Kubernetes
 
Core Services behind Spark Job Execution
Core Services behind Spark Job ExecutionCore Services behind Spark Job Execution
Core Services behind Spark Job Execution
 
Optimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloadsOptimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloads
 
Spark stack for Model life-cycle management
Spark stack for Model life-cycle managementSpark stack for Model life-cycle management
Spark stack for Model life-cycle management
 
Productionalizing Spark ML
Productionalizing Spark MLProductionalizing Spark ML
Productionalizing Spark ML
 
Testing Spark and Scala
Testing Spark and ScalaTesting Spark and Scala
Testing Spark and Scala
 
Understanding Implicits in Scala
Understanding Implicits in ScalaUnderstanding Implicits in Scala
Understanding Implicits in Scala
 
Scalable Spark deployment using Kubernetes
Scalable Spark deployment using KubernetesScalable Spark deployment using Kubernetes
Scalable Spark deployment using Kubernetes
 
Introduction to concurrent programming with akka actors
Introduction to concurrent programming with akka actorsIntroduction to concurrent programming with akka actors
Introduction to concurrent programming with akka actors
 
Functional programming in Scala
Functional programming in ScalaFunctional programming in Scala
Functional programming in Scala
 
Telco analytics at scale
Telco analytics at scaleTelco analytics at scale
Telco analytics at scale
 
Platform for Data Scientists
Platform for Data ScientistsPlatform for Data Scientists
Platform for Data Scientists
 
Building scalable rest service using Akka HTTP
Building scalable rest service using Akka HTTPBuilding scalable rest service using Akka HTTP
Building scalable rest service using Akka HTTP
 
Anatomy of Spark SQL Catalyst - Part 2
Anatomy of Spark SQL Catalyst - Part 2Anatomy of Spark SQL Catalyst - Part 2
Anatomy of Spark SQL Catalyst - Part 2
 
Anatomy of spark catalyst
Anatomy of spark catalystAnatomy of spark catalyst
Anatomy of spark catalyst
 

Recently uploaded

Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 

Recently uploaded (20)

Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 

Improving Mobile Payments With Real time Spark

  • 2. ● Madhukara Phatak ● Big data consultant and trainer at datamantra.io ● Consult in Hadoop, Spark and Scala ● www.madhukaraphatak.com
  • 3. Agenda ● Mobile as drive for big data ● Our customer solution ● Existing data solution ● Improved solution ● Technical details ● Future enhancements ● Q & A
  • 4. Mobile as Big data drive ● Mobile has changed the way in which we interact with world ● Most of the buy/sell happens on mobile today ○ Myntra went fully mobile ○ Flipkart and amazon say their 50% buy happens on mobile ○ Quikr and OLX is mobile based selling platform ○ Ola etc
  • 5. Challenges in Mobile ● Customers expect the service to available 24/7 ● Tiny screens make very challenging to typical software flows ● Flaky connectivity of mobile networks makes it tougher ● Constant moving results in drop in interactions ● No more downtime ● Everything has to be done in realtime
  • 6. Mobile payments ● Almost every app earlier mentioned needs some kind of payment ● Getting payments right on mobile is very hard ● Globally 21% of online shoppers abandon their basket due to payment failures or delays ● Some companies are building sdk’s to help the app developers ● Our customer is one of them
  • 12. Our customer solution ● Mobile sdk for applications simplify the payments ● SDK provides better user interface like big buttons to generate OTP or other flows ● SDK also helps in filling up different kind of forms given by different banks using consistent UI ● Better user experience across applications ● Application sends anonymous payments details across apps to our customer servers
  • 13. Some numbers ● 40 + customers ● Over 1 million transactions per month as per March ● Around 55% success rate ( 5 % above average) ● Supports major banks, payment gateways and wallet providers ● Soon will be available in other than mobile payment space
  • 14. Why data matters? ● As number of transaction increases, things will go wrong ● There are so many different combinations to go wrong ● Example ○ Airtel OTP failing with state bank netbanking ○ Customers stuck in password page ○ Not able to read OTP from some specific ● Understanding customer pain and reacting to it is paramount ● Every help results in payment
  • 16. Initial BI solution ● Phone sdk pushes events like transaction initiation, payment complete to logging servers ● Logging servers roll log for every one hour and push to s3 ● A single node spark machine aggregates data by sessions and pushes it to mysql ● Google BigQuery is used for adhoc querying
  • 17. Challenges with BI solution ● Batch processing ● Geared towards more of report generation oriented flow ● Very minimal use of Spark API’s as team was not well aware of it’s potential ● No integration with mobile sdk for feed back loop
  • 18. Requirements for consulting ● Bring the same reporting calculations to real time ● Understanding the user behaviour and tracking his/her flow over a session ● Closing the loop by providing automatic alerts based on the metric calculations ● Some new specific business cases like loyalty management etc ● Improving team expertise on spark
  • 19. Choosing Spark streaming ● Company was already invested in Spark so spark streaming was no brainer ● Also porting spark batch code to streaming was mostly straight forward as both talk same API ● Company used python as Spark API language which was supported by streaming also ● So we didn't consider storm we went ahead with Spark streaming
  • 20. First version Events Five Minute Push JSON Data FileStream Session Wise Aggregations
  • 21. First version ● We used fileStream API of spark streaming which allowed us to poll a s3 bucket for every few mins ● A new rolling appender was added to log servers to push logs to s3 every 5 mins ● Exact same batch code was used for calculations which made transition very easy ● All downstream applications remained same
  • 23. Amazon Kinesis ● A kafka like distributed message queue by Amazon ● It’s used as managed kafka source on AWS web services ● Highly scalable and low latency support ● Persistence with fault tolerance across multiple availability zones ● Great integration with Spark
  • 24. Second version ● Amazon kinesis is added as real time stream source ● Logging server push logs to kinesis as they arrive ● Streaming application pulls the data from kinesis for every few mins ● Multiple partitions support added for parallel streams
  • 25. Challenges with Python ● Spark streaming API for python was introduced in 1.2 whereas spark-streaming for Scala/Java is available from 0.8 ● No aws kinesis connector was available as of March ● Team has to write it’s own ● No support for python in Spark job server
  • 26. Challenges from batch to streaming ● Session typically last from 1-10 mins. Batch is easy most of the time session is done for a one hour data but challenging for real time data ● Designing state for session ● Designing checkpointing and deciding on interval ● Weird checkpointing issues with s3 due to eventual consistency
  • 27. Improvements to batch code ● Most of the code was written in rdd paradigm as it was only know to team ● Team was trained on spark sql and spark streaming ● Majority code was ported to Spark sql based solution to improve readability and maintainability ● Recently moved into Dataframe based code
  • 29. Choosing Mesos ● Mesos is a great cluster manager for Spark only workloads ● Has specific coarse-grain mode which is dedicated for the real time systems ● Minimal overhead compared to YARN ● Easy to setup on EC2
  • 31. Grafana ● Added grafana for visualization and dashboards ● Graphana = Graphite + influxDB ● Moved away from mysql to time series database influx DB ● Scales much better compared to mysql ● Data scientists or product managers can monitor customers using these dashboards ● Integrates with mobile sdk