Awesome Banking APIs



How do you combine comprehensive analysis running on large amounts of data with the demand for responsiveness of today's API services?

This talk illustrates one of the recipes we currently use at ING to tackle this problem. Our analytical stack combines machine learning algorithms running on a Hadoop cluster with API services executed by an Akka cluster.

Cassandra is used as a 'latency adapter' between the fast and the slow path. Our API services are executed by the Akka/Spray layer. Those services consume both live data sources and intermediate results promoted from the Hadoop layer via Cassandra. This approach allows us to provide internal API services that are both complete and responsive.
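The fast/slow-path split described above can be sketched as follows. This is a minimal Python sketch with a plain dict standing in for Cassandra; the function names, keys, and figures are illustrative, not ING's actual code:

```python
from datetime import date

# Stand-in for the Cassandra 'latency adapter': the Hadoop batch layer
# periodically promotes precomputed results here, keyed by customer.
precomputed_store = {}

def batch_promote(customer_id, monthly_totals):
    """Slow path: the batch layer writes heavy aggregates to the store."""
    precomputed_store[customer_id] = monthly_totals

def serve_spending_summary(customer_id, live_transactions):
    """Fast path: the API merges precomputed history with live events."""
    totals = dict(precomputed_store.get(customer_id, {}))
    for tx in live_transactions:
        month = tx["date"].strftime("%Y-%m")
        totals[month] = totals.get(month, 0) + tx["amount"]
    return totals

# Overnight, the batch job promotes last months' totals ...
batch_promote("cust-1", {"2014-01": 120.0})
# ... and today's live transactions are folded in at request time.
summary = serve_spending_summary(
    "cust-1", [{"date": date(2014, 2, 3), "amount": 15.5}])
```

The point of the adapter is that the API never waits on the slow path: it only reads what the batch layer has already promoted, and tops it up with cheap live data.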


Awesome Banking APIs

  1. Awesome Banking APIs: exposing big data and streaming analytics using Hadoop, Cassandra, Akka and Spray
  2. Humanize Data
  3. The bank statements
  4. The bank statements: how I read the bank bills
  5. The bank statements: how I read the bank bills; what happened those days
  6. Data is the fabric of our lives
  7. Personal history: long term. Interaction: real-time events.
  8. Python
     ● Flexible, concise language
     ● Quick to code and prototype
     ● Portable; visualization libraries
     Machine learning libraries: scipy, statsmodels, sklearn, matplotlib, ipython
     Web libraries: flask, tornado, (no)SQL clients

     >>> from sklearn.datasets import load_iris
     >>> from sklearn import tree
     >>> iris = load_iris()
     >>> clf = tree.DecisionTreeClassifier()
     >>> clf = clf.fit(iris.data, iris.target)
  9. R
     ● Language for statistics
     ● Easy to analyze and shape data
     ● Advanced statistical packages
     ● Fueled by academia and professionals
     ● Very clean visualization packages
     Packages for machine learning: time series forecasting, clustering, classification, decision trees, neural networks
     Remote procedure calls (RPC) from Scala/Java via RProcess and Rserve

     # Multiple linear regression example
     fit <- lm(y ~ x1 + x2 + x3, data=mydata)
     summary(fit)  # show results
  10. OK, let’s build some banking apps
  11. Bank schematic: core banking systems (SOAP services and DBs), a system bus, and customer-facing applications and channels.
  12. Challenges
  13. Higher separation! Bigger and faster; fewer silos; interactions with core systems
  14. Hadoop: a reliable, low-cost computing powerhouse
  15. Cassandra: reliable, low latency, tunable CAP. Data model: hashed rows, sorted wide columns. Architecture model: no SPOF, a ring of nodes, homogeneous system.
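The "hashed rows, sorted wide columns" idea can be illustrated with a small Python sketch. This models the data layout only, with made-up names; it is not the Cassandra driver API:

```python
import bisect
from hashlib import md5

# Row keys are hashed onto a ring of nodes (simplified to modulo here,
# real Cassandra uses token ranges); within a row, columns stay sorted,
# which is what makes wide rows cheap to range-query.
NODES = ["node-a", "node-b", "node-c"]

def owner(row_key):
    """Hash the row key to the node that owns it."""
    return NODES[int(md5(row_key.encode()).hexdigest(), 16) % len(NODES)]

class WideRow:
    """Columns kept sorted by name, so slices are binary searches."""
    def __init__(self):
        self.names, self.values = [], []

    def insert(self, name, value):
        i = bisect.bisect_left(self.names, name)
        if i < len(self.names) and self.names[i] == name:
            self.values[i] = value          # overwrite existing column
        else:
            self.names.insert(i, name)
            self.values.insert(i, value)

    def slice(self, start, end):
        i = bisect.bisect_left(self.names, start)
        j = bisect.bisect_right(self.names, end)
        return list(zip(self.names[i:j], self.values[i:j]))

row = WideRow()
for month, amount in [("2014-03", 10), ("2014-01", 25), ("2014-02", 5)]:
    row.insert(month, amount)
# A time-range slice over the sorted columns:
q1 = row.slice("2014-01", "2014-02")
```

Time-bucketed column names like these are a common Cassandra pattern: the batch layer can append months to a customer's row, and the API reads back just the range it needs.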
  16. Akka: [diagram] Actor A, Actor B and Actor C exchanging asynchronous messages (msg 1 through msg 4).
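The message flow on this slide can be sketched with a minimal actor in Python. This illustrates the actor model the slide depicts, not Akka itself; all names are made up:

```python
import queue
import threading

class Actor:
    """Each actor owns a mailbox and processes one message at a time."""
    def __init__(self, handler):
        self.mailbox = queue.Queue()
        self.handler = handler
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def tell(self, msg):
        self.mailbox.put(msg)      # fire-and-forget, like Akka's !

    def _run(self):
        while True:
            msg = self.mailbox.get()
            if msg is None:        # poison pill stops the actor
                break
            self.handler(msg)

    def stop(self):
        self.mailbox.put(None)
        self.thread.join()

received = []
actor_b = Actor(lambda msg: received.append(msg.upper()))
actor_b.tell("msg 1")
actor_b.tell("msg 2")
actor_b.stop()
```

Because each mailbox is drained by a single thread, the handler never needs locks: concurrency comes from many actors, not from shared state.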
  17. Core flow: [diagram] HTTP I/O into the API layer; real-time analytics backed by a SOAP client to bank core services (bank transactions) and a NoSQL client to Cassandra; Hadoop batch data science feeding Cassandra.
  18. Sprayin’

      import akka.pattern.ask
      import akka.util.Timeout
      import scala.concurrent.duration._

      trait ApiService extends HttpService {
        implicit val timeout = Timeout(5.seconds)                // needed by ask (?)
        implicit def executionContext = actorRefFactory.dispatcher

        // Create an actor for analytics
        val actor = actorRefFactory.actorOf(Props[AnalyticsActor], "analytics-actor")

        // curl -vv -H "Content-Type: application/json" localhost:8888/api/v1/123/567
        val serviceRoute = {
          pathPrefix("api" / "v1") {                             // serve the API path
            pathPrefix(Segment / Segment) { (aid, cid) =>
              get {
                complete {
                  // the message is passed on to the analytics actor;
                  // assumes the actor replies with a String
                  (actor ? (aid, cid)).mapTo[String]
                }
              }
            }
          }
        }
      }
  19. Latency tradeoffs
  20. Managing computation
  21. Science & Engineering. Statistics and data science: Python, R, visualization. IT infra and big data: Java, Scala, SQL. Hadoop: big data infrastructure, data science on large datasets. Big data and fast data require different profiles to achieve the best results.
  22. Some lessons learned
      ● Mixing and matching technologies is a good thing
      ● Harden the design as you go
      ● Define clear interfaces
      ● Ease integration among teams
      ● Hadoop, Cassandra, and Akka: they work!
      ● Plug in the data science!
  23. Thanks! Any questions?