
Advanced Analytics in Banking, CITI

In this presentation, Juan M. Huerta talks about the big data adoption process at Citi, realising the technical value of big data, and global solutions. Huerta goes on to discuss following a hybrid approach and the future of analytics: expensive algorithms applied to large datasets. Citi is using these approaches in the hope of gaining wider global recognition.

Published in: Technology, Education

  1. Advanced Analytics in Banking. Juan M. Huerta, Global Decision Management, VP Advanced Analytics, Citibank
  2. I will talk about… • Big Data Adoption process at Citi • Realizing the Technical Value of Big Data • Global Solutions
  3. 140 countries, 200 million accounts
  4. Citi: A Customer Centered Organization. As a customer-centered bank, the goal of our Big Data strategy is to shift the focus from independent vertical silos to Common Horizontal Solutions focused around Citi’s 200 million customer accounts
  5. Big Data Adoption Stakeholders • Lines of Business • Strategy & Decision Management Organizations: cross LOB & Geo, global • Data Innovation Office: Governance & Regulatory • CitiData – Big Data & Analytics Engineering
  6. Big Data Adoption Roadmap. Adoption will not occur at once. The level of capability maturity across the organization will vary significantly. In theory we think in terms of Staged Competencies of a Big Data Maturity Model. In practice, a hybrid process, which fits the level of maturity of participants, is needed. • Common Data • Common Analytic Platform • Common Tools & Techniques • Common Solutions • Common Focus Strategy
  7. Big Data Adoption Hybrid Participation Model • Novice: Proof of Concept • Expert: R&D Environment • Shadowed
  8. End-to-end Analytic Process for a POC Project. This is one component of the hybrid model. Ideas and Hypotheses Information Asset Inventory Navigator (“IAIN”) • Pipeline of ideas to use data for competitive advantage • Robust, comprehensive ontology allowing analysts and economists to search, sort, and select data for analysis • Preliminary assessment for business value, data safekeeping and alignment to business practices Data Transformation & Provisioning • Transformation rules executed to normalize and conform production data • Conformed data set made available in production environment Production Model Development • Develop scalable, productizable analytics Model Deployment • Exploit insights and analyses across the enterprise to maximize value • Models measured for quality / usage • Formal approval process through Business Steering Committee based on understanding expected use of production data R&D process R&D Project Approval Product Approval Engineering / Production process Analytics Knowledge Management • Robust, comprehensive ontology allowing analysts and economists to search, sort, and select data for analysis Data Set Preparation & Provisioning • Basic preparation of data set (e.g., consolidation, conformation) • Permission-based provisioning of data set into a Big Data Analytics environment Analytics Execution • Advanced analytic tools mine business insight from large volumes of data • Data scientist peers review model findings and results Analytics Peer Review Data Acquisition • Where necessary, acquire new data sets to support R&D project
  9. Advanced Global Solutions • A global solution is a tested algorithm or analytic model that carries out a particular business analysis and is leveraged at a global scale • A big data global solution enables the interplay of complex algorithms and large datasets • When a global solution is built upon big data approaches, a delivery roadmap should be considered • In the exploratory process, a Global Solution is developed in the Innovation R/D environment and validated through a POC process • Alignment with Innovation, UAT, PRD environments
  10. Technical Value of Big Data: Benchmarks and Analysis
  11. The Boom Driving Big Data is Technological. Heebyung Koh, Christopher L. Magee. A functional approach for studying technological progress: Extension to energy technology. Technological Forecasting and Social Change, Volume 75, Issue 6, July 2008, Pages 735–758
  12. The Quadrant Of Analytic Opportunity. Run Time is affected by Data Size and Algorithmic Complexity. [Figure: Algorithmic Complexity vs. Data Size, with data sizes from 10^5 to 10^10. Data sets shown: Mtg + Cards + Banking Accounts Transaction features; Accounts Transactions; Branches Transactions; Accounts Summary Stats.; Employees Summary Stats.; GL-GOCS GL-Entries; Branches Summary Stats. Algorithms shown: Sequence Mining; Predictive filtering; Latent Dirichlet Allocation; HMM Baum-Welch O(ns nf nt); CART O(nf ns log ns); Iterative SVD-CF; K-means; Logistic Regression; PCA; Page Rank; Self-Org. Maps; Neural Nets; Collaborative Filtering (CF); Vector based Approaches; HMM; Conditional Random Fields; Support Vector Machines. Other labels: Database Interaction; Machine Learning; Traditional Statistical; Big Data/Pattern Mining.]
  13. Breaking down the gains of P13n: A Controlled Incremental Benchmark on a Workstation grade processor (x500). Implemented an incremental-SVD (Netflix Cup) predictive model that runs on midsize datasets… • x30 Compiled Code (vs. interpreted) • x4 In Memory (vs. Disk access) • x3.12 Multithread (vs. single thread) • x1.3 Workstation grade processor
  14. Basic Map Reduce Benchmarks. Impact of overhead as a function of input volume: relative map throughput as a function of the number of mappers. [Figures: relative map CPU time speedup vs. number of maps, with linear fit; tokens per wall-clock second vs. number of maps, with linear fit.]
  15. HAMSTER: Hadoop Multi-signature Search for Text-based Entity Retrieval • Core algorithm: String Edit Distance O(mnk2) • Baseline runs at 100 matches per day • HAMSTER speedup: 33x (5-node speedup) × 60x (Java speedup) ≈ 2000x faster
      Source items | Target items | Source items per target | Input size | Map records | Cluster max map tasks | Effective map tasks | CPU map (secs) | Wall time
      34k | 618k | 100 | 4.40GB | 345 | 33 | 33 | 196k | 2h 14 secs
      34k | 618k | 50 | 8.8GB | 690 | 40 | 66 | 196k | 1h 47 min
      34k | 618k | 30 | 14.6GB | 1,149 | 40 | 110 | 199k | 1h 39 min
  16. Leveraging Global Big Data Global Solutions
  17. Creating Global Big Data solutions. Our goal is to evolve from Big Data algorithms to Big Data Solutions
  18. Example of Advanced Global Solution Matrix: Outlier Detection • Multivariate Segmentation • Sequence Matching • Network Analysis • Customer Contextual Clickstream Action Marketing Risk/Fraud Digital • Structured Prediction • K-Medoids Clustering
  19. Example: Transactional Time Series Anomalous Behavior
  20. On Demand Simulation: Generate Branches’ DNA • Case Scenario: Unusual number of cash advances by 2 tellers. [Figure panels: Original branch (August); Single day fraud; Multi day fraud]
  21. Creating Regions of Interest based on On-Demand-Simulation. Minimum-Spanning-Tree based branch association for region of interest generation. [Figure labels: Multi-day fraud simulation; Original branch; Region of interest] • Numbers shown are randomized indices
  22. Conclusion: Lessons Learned • One Size does not fit all • Follow a Hybrid Approach • Leverage Analytic patterns: Global Solutions • Big Data is about Parallelization • The future: expensive Algorithms applied to large datasets • Global Solutions are the combination of algorithmic building blocks applied to specific business problems
  23. Thank You!
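
The HAMSTER slide names string edit distance as its core algorithm. As a rough illustration only, the Python sketch below implements the textbook Levenshtein distance, which costs O(mn) for strings of lengths m and n. It does not reproduce HAMSTER's multi-signature search or its Hadoop distribution, and the function names `edit_distance` and `best_match` as well as the sample strings are hypothetical.

```python
def edit_distance(source: str, target: str) -> int:
    """Textbook Levenshtein distance, keeping only two rows of the DP table."""
    # previous[j] holds the distance between the empty prefix of source
    # and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # distance from source[:i] to the empty string
        for j, t_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (s_char != t_char)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def best_match(source: str, targets: list[str]) -> tuple[str, int]:
    """Return the target closest to source and its distance (hypothetical helper)."""
    scored = ((target, edit_distance(source, target)) for target in targets)
    return min(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Illustrative entity names only, not data from the presentation.
    print(best_match("Citibank", ["Citybank", "Chase", "Citigroup"]))
```

Each source/target comparison is independent of the others, which is one reason this kind of matching workload lends itself to being split across map tasks.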

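
The headline speedups on the incremental benchmark slide (x500) and the HAMSTER slide (2000x) appear to be products of the individual factors listed there. The Python check below is a sketch under that assumption: the slides do not state explicitly that the gains are independent and multiplicative, and the names `SVD_BENCHMARK_FACTORS`, `HAMSTER_FACTORS`, and `combined_speedup` are introduced here for illustration.

```python
import math

# Factors as they appear on the incremental-SVD benchmark slide
# (assumed to be independent, so that they multiply).
SVD_BENCHMARK_FACTORS = [30, 4, 3.12, 1.3]

# Factors from the HAMSTER slide: 5-node speedup and Java speedup.
HAMSTER_FACTORS = [33, 60]


def combined_speedup(factors):
    """Multiply the individual gains into one overall speedup."""
    return math.prod(factors)


if __name__ == "__main__":
    # 30 * 4 * 3.12 * 1.3 is about 486.72, which the slide rounds to x500.
    print(round(combined_speedup(SVD_BENCHMARK_FACTORS), 2))
    # 33 * 60 is 1980, which the slide rounds to 2000x.
    print(combined_speedup(HAMSTER_FACTORS))
```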