Awesome Banking APIs


9,413 views


How do you combine comprehensive analysis of large amounts of data with the responsiveness demanded of today's API services?

This talk illustrates one of the recipes that we currently use at ING to tackle this problem. Our analytical stack combines machine learning algorithms running on a Hadoop cluster with API services executed by an Akka cluster.

Cassandra is used as a 'latency adapter' between the fast and the slow path. Our API services are executed by the Akka/Spray layer. Those services consume both live data sources and intermediate results promoted by the Hadoop layer via Cassandra. This approach allows us to provide internal API services that are both complete and responsive.
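
The 'latency adapter' idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the stack used at ING: `ResultStore` is a hypothetical in-memory stand-in for a Cassandra table, `batch_job` stands in for a Hadoop job, and `api_handler` stands in for an Akka/Spray service. The names, the sample data, and the "total per customer" metric are all invented for the example.

```python
import time


class ResultStore:
    """Hypothetical in-memory stand-in for the Cassandra table of batch results."""

    def __init__(self):
        self._rows = {}

    def promote(self, customer_id, summary):
        # Slow path: the batch layer publishes an intermediate result.
        self._rows[customer_id] = {"summary": summary, "computed_at": time.time()}

    def lookup(self, customer_id):
        # Fast path: a single key lookup, no heavy computation.
        return self._rows.get(customer_id)


def batch_job(transactions, store):
    """Slow path (assumed workload): aggregate the full history per customer."""
    totals = {}
    for customer_id, amount in transactions:
        totals[customer_id] = totals.get(customer_id, 0) + amount
    for customer_id, total in totals.items():
        store.promote(customer_id, {"historic_total": total})


def api_handler(customer_id, live_events, store):
    """Fast path: combine the precomputed summary with live data."""
    row = store.lookup(customer_id)
    historic = row["summary"]["historic_total"] if row else 0
    recent = sum(amount for cid, amount in live_events if cid == customer_id)
    return {"customer": customer_id, "total": historic + recent}


store = ResultStore()
batch_job([("c1", 100), ("c2", 40), ("c1", 25)], store)
response = api_handler("c1", [("c1", 5), ("c2", 1)], store)
print(response)
```

The request handler never runs the heavy aggregation itself. It reads whatever the batch layer last promoted and adds only the events that arrived since, which is how a response can be both complete and fast.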


  1. Awesome Banking APIs: Exposing big data and streaming analytics using Hadoop, Cassandra, Akka, and Spray
  2. Humanize Data
  3. The bank statements
  4. The bank statements. How I read the bank bills.
  5. The bank statements. How I read the bank bills. What happened those days.
  6. Data is the fabric of our lives
  7. Personal history. Long-term interaction. Real-time events.
  8. >>> from sklearn.datasets import load_iris
     >>> from sklearn import tree
     >>> iris = load_iris()
     >>> clf = tree.DecisionTreeClassifier()
     >>> clf = clf.fit(iris.data, iris.target)
     ● Flexible, concise language
     ● Quick to code and prototype
     ● Portable, visualization libraries
     Machine learning libraries: scipy, statsmodels, sklearn, matplotlib, ipython
     Web libraries: flask, tornado, (no)SQL clients
  9. # Multiple Linear Regression Example
     fit <- lm(y ~ x1 + x2 + x3, data=mydata)
     summary(fit) # show results
     ● Language for statistics
     ● Easy to analyze and shape data
     ● Advanced statistical packages
     ● Fueled by academia and professionals
     ● Very clean visualization packages
     Packages for machine learning: time series forecasting, clustering, classification, decision trees, neural networks
     Remote procedure calls (RPC) from Scala/Java via RProcess and Rserve
  10. OK, let’s build some banking apps
  11. Bank schematic: core banking systems, SOAP services and DBs, system bus, customer-facing apps, channels
  12. Challenges
  13. Higher separation! Bigger and faster. Fewer silos. Interactions with core systems.
  14. Reliable. Low cost ↓. ↑ Computing powerhouse.
  15. Reliable. Low latency. Tunable CAP.
      Data model: hashed rows, sorted wide columns
      Architecture model: no SPOF, ring of nodes, homogeneous system
  16. Actor A, Actor B, and Actor C exchanging messages (msg 1, msg 2, msg 3, msg 4)
  17. Core flow (diagram labels): HTTP I/O, NoSQL client, Hadoop, batch data science, Cassandra, SOAP client, real-time analytics, bank core services, bank transactions, data science, API
  18. Sprayin’
      trait ApiService extends HttpService {
        // Create Analytics client actor
        val actor = actorRefFactory.actorOf(Props[AnalyticsActor], "analytics-actor")
        //curl -vv -H "Content-Type: application/json" localhost:8888/api/v1/123/567
        val serviceRoute = {
          pathPrefix("api" / "v1") {
            pathPrefix( Segment / Segment ) { (aid, cid) =>
              get {
                complete {
                  actor ? (aid, cid)
      ● Create an actor for analytics
      ● Serve the API path
      ● Message is passed on to the analytics actor
      https://github.com/natalinobusa/wavr
  19. Latency tradeoffs
  20. Managing computation
  21. Science & Engineering
      Statistics, data science: Python, R, visualization
      IT infra, big data: Java, Scala, SQL
      Hadoop: big data infrastructure, data science on large datasets
      Big Data and Fast Data require different profiles to achieve the best results
  22. Some lessons learned
      ● Mixing and matching technologies is a good thing
      ● Harden the design as you go
      ● Define clear interfaces
      ● Ease integration among teams
      ● Hadoop, Cassandra, and Akka: they work!
      ● Plug in the data science!
  23. Thanks! Any questions?
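
The actor slides (16 and 18) describe messages being handed to an actor, with the route asking the analytics actor for a reply. As a rough analogy only, the same shape can be sketched with the Python standard library. This is not Akka or Spray: the `Actor` class, `analytics_handler`, and `handle_get` are hypothetical names invented here, and the reply payload is made up. Only the path layout `/api/v1/<aid>/<cid>` and the `(aid, cid)` message come from the slide.

```python
import queue
import threading
from concurrent.futures import Future


class Actor:
    """A minimal actor: one mailbox, one worker thread, one message at a time."""

    def __init__(self, handler):
        self._handler = handler          # function applied to each message
        self._mailbox = queue.Queue()    # messages wait here in arrival order
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def ask(self, message):
        """Enqueue a message and return a Future that will hold the reply."""
        reply = Future()
        self._mailbox.put((message, reply))
        return reply

    def stop(self):
        self._mailbox.put(None)          # sentinel: shut the worker down
        self._thread.join()

    def _run(self):
        while True:
            item = self._mailbox.get()
            if item is None:
                break
            message, reply = item
            try:
                reply.set_result(self._handler(message))
            except Exception as exc:     # report failures to the asker
                reply.set_exception(exc)


def analytics_handler(message):
    # Hypothetical stand-in for the analytics actor's behaviour.
    aid, cid = message
    return {"aid": aid, "cid": cid, "status": "ok"}


def handle_get(path, actor):
    """Route GET /api/v1/<aid>/<cid> to the actor and wait for its reply."""
    parts = path.strip("/").split("/")
    if len(parts) != 4 or parts[:2] != ["api", "v1"]:
        return 404, None
    aid, cid = parts[2], parts[3]
    return 200, actor.ask((aid, cid)).result(timeout=1.0)


analytics_actor = Actor(analytics_handler)
status, body = handle_get("/api/v1/123/567", analytics_actor)
print(status, body)
analytics_actor.stop()
```

The caller never touches the actor's state directly. It only enqueues a message and waits on the returned future, so the actor processes requests one at a time without explicit locks.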
