1. Harvard Business Club presentation
Big Data Case Studies
Nitin Kabra
2. Agenda
Case Study 1 – Customer Risk Profiling
Case Study 2 – Trade Surveillance and Reporting
Case Study 3 – Online Account Opening
Case Study 4 – Legacy Migration to Hadoop
Case Study 5 – ATM/Mobile adjustment data
Questions and next steps
3. Use Cases implementation
Customer Risk Profiling: A very large bank with several consumer
lines of business needed to analyze customer activity across
multiple channels and build a customer scoring model, based on
behavioral analysis, to detect fraudulent activity (both real-time
and batch).
Trade Surveillance & Reporting (DF, Volcker): The bank already
captured trading activity and used that data to assess, predict,
and manage risk for both regulatory and non-regulatory purposes.
Online Account Opening: Fraud detection cases. The bank wanted to
focus on online account opening fraud.
Legacy Migration to Hadoop: Make fraud detection case data
available to build customer scores from various LOBs for risk
management and customer retention.
ATM/Mobile Adjustment Data: Provide adjustment data from ATM/mobile
deposits to fraud analysts so they can make decisions on the same
day.
4. Case Study 1
Challenge
•A very large bank with several consumer
lines of business needed to analyze
customer activity across multiple
products to predict credit risk with
greater accuracy.
•Over the years, the bank had acquired
a number of regional banks. Each of
those banks had a checking and savings
business, a home mortgage business,
credit card offerings and other financial
products.
•Those applications generally ran in
separate silos: each used its own
database and application software. The
result was a large number of independent
systems that could not share data easily.
•With the economic downturn of 2008,
the bank had significant exposure in its
mortgage business to defaults by its
borrowers. Better risk management was
urgently needed.
Solution
•The bank set up a single Hadoop
cluster containing more than a
petabyte of data collected from
multiple enterprise data warehouses.
•With all of the information in one
place, the bank added new sources of
data, including customer call center
recordings, chat sessions, emails to
the customer service desk and others.
•Pattern matching, text processing,
sentiment analysis and graph creation
were used to combine, digest and
analyze the data.
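The idea of combining siloed records into one customer view and deriving a score from it can be sketched as follows. This is only an illustrative sketch, not the bank's model: the record layouts, the toy negative-word list standing in for sentiment analysis, and the weights in `risk_score` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical extracts from three separate line-of-business systems.
MORTGAGE = [{"customer_id": "C1", "days_late": 45},
            {"customer_id": "C2", "days_late": 0}]
CARDS = [{"customer_id": "C1", "utilization": 0.92},
         {"customer_id": "C2", "utilization": 0.20}]
SERVICE_NOTES = [{"customer_id": "C1", "text": "very unhappy with the late fee"},
                 {"customer_id": "C2", "text": "thanks, quick and helpful"}]

# Toy lexicon standing in for real sentiment analysis (assumption).
NEGATIVE_WORDS = {"unhappy", "complaint", "angry", "late"}


def build_profiles(mortgage, cards, notes):
    """Merge per-silo records into one profile per customer_id."""
    profiles = defaultdict(
        lambda: {"days_late": 0, "utilization": 0.0, "negative_hits": 0}
    )
    for row in mortgage:
        p = profiles[row["customer_id"]]
        p["days_late"] = max(p["days_late"], row["days_late"])
    for row in cards:
        p = profiles[row["customer_id"]]
        p["utilization"] = max(p["utilization"], row["utilization"])
    for row in notes:
        words = set(row["text"].lower().replace(",", " ").split())
        profiles[row["customer_id"]]["negative_hits"] += len(words & NEGATIVE_WORDS)
    return dict(profiles)


def risk_score(profile):
    """Weighted score in [0, 1]; the weights are illustrative assumptions."""
    late = min(profile["days_late"] / 90.0, 1.0)
    util = min(profile["utilization"], 1.0)
    sentiment = min(profile["negative_hits"] / 3.0, 1.0)
    return round(0.5 * late + 0.3 * util + 0.2 * sentiment, 3)
```

Calling `build_profiles(MORTGAGE, CARDS, SERVICE_NOTES)` and then `risk_score` on each profile ranks customer C1 above C2. On a real cluster the merge step would run as a distributed join rather than in-memory dictionaries.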
Advantage
•The bank used the Hadoop cluster to
construct a new and more accurate
score of the risk in its customer
portfolios. The score gives a clear
picture of a customer’s financial
situation, risk of default or late
payment, and satisfaction with the
bank and its services.
•The more accurate score allowed the
bank to manage its exposure better
and to offer each customer better
products and advice.
•Hadoop increased revenue and
improved customer satisfaction.
•The benefit was not just lower cost
than the existing system, but also
higher revenue from better risk
management and customer retention.
Customer Risk Profiling
5. Case Study 2
Challenge
• The bank already captured trading
activity and used that data to
assess, predict, and manage risk
for both regulatory and non-
regulatory purposes.
• The very large volume of data,
however, made it difficult to
monitor trades for compliance,
and virtually impossible to catch
“rogue” traders, who engage in
trades that violate policies or
expose the bank to too much risk.
• Regulatory reporting under the
new guidelines issued after the
2008 market meltdown was proving
to be a mammoth challenge.
Solution
• The bank built a Hadoop cluster
that runs alongside its existing
trading systems. The cluster
receives copies of all trading data
(margins, limits, exposure) and
also holds information about the
parties to each trade.
• Built a powerful suite of novel
algorithms using statistical and
other techniques to monitor
human and automated or program
trading.
• Using pattern matching techniques,
the bank developed a very good
picture of normal trading activity
and now watches for unusual
patterns that may reflect rogue
trading.
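One simple way to picture "learn normal activity, then watch for unusual patterns" is a per-trader baseline with a z-score threshold. The sketch below is an assumption for illustration only: the deck does not say which statistical techniques the bank used, and the function name, input shapes and the threshold of 3 standard deviations are hypothetical.

```python
from statistics import mean, stdev


def flag_unusual_trades(history, new_trades, threshold=3.0):
    """Flag trades whose notional deviates strongly from a trader's baseline.

    history:    dict mapping trader id -> list of past trade notionals
    new_trades: list of (trader id, notional) tuples
    Returns a list of (trader, notional, z_score); z_score is None when no
    usable baseline exists and the trade is routed to manual review.
    """
    flagged = []
    for trader, notional in new_trades:
        past = history.get(trader, [])
        if len(past) < 2:
            # Not enough history to estimate "normal": send to review.
            flagged.append((trader, notional, None))
            continue
        mu, sigma = mean(past), stdev(past)
        if sigma == 0:
            # All past trades identical: any different value is unusual.
            if notional != mu:
                flagged.append((trader, notional, None))
            continue
        z = (notional - mu) / sigma
        if abs(z) > threshold:
            flagged.append((trader, notional, round(z, 2)))
    return flagged
```

A production surveillance system would look at many more dimensions than notional size (instrument, counterparty, timing, limits), but the baseline-then-deviation structure is the same.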
Advantage
• Detects a variety of illegal activity,
including money laundering,
insider trading, front-running,
intra-day manipulation, marking the
close and more.
• Fast detection allows the bank to
protect itself from considerable
losses.
• Hadoop increased revenue and
improved customer satisfaction.
• The Hadoop cluster also helps the
bank comply with financial
industry regulations.
• Hadoop provides cost effective,
scalable, reliable storage so that
the bank can retain records and
deliver reports on activities for
years, as required by law.
Trade Surveillance and Reporting (DF, Volcker)
6. Case Study 3
Challenge
• Fraud detection cases: the bank
wanted to focus on online account
opening fraud.
• Detection can be real-time or
batch and requires data from
structured and unstructured
sources.
• Requires pattern identification,
relational analysis and defined
rules for online account opening.
Solution
• Validating user identity based on
past patterns can be done in
seconds.
• Connectivity with external credit
rating agencies using CEP (complex
event processing) and semantic
search for continuous data flow.
• Ingest large volumes of requests
into the Hadoop ecosystem.
• Identification of patterns based on
vast historical data stored in the
Hadoop cluster from various
sources (rating agencies,
investigating agencies).
• Defined rules based on geospatial
and location analysis and on the
format of incoming requests.
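Rules of the kind listed above (geospatial checks and request-format checks) can be sketched as a small screening function. Everything specific here is an assumption for illustration: the request fields, the 500 km distance limit, the 5-digit postal code pattern and the attempts threshold are hypothetical, not rules taken from the case study.

```python
import math
import re


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


# Assumption: requests carry a US-style 5-digit postal code.
POSTAL_RE = re.compile(r"^\d{5}$")


def screen_request(req, max_km=500.0):
    """Return the list of rule names an account-opening request violates."""
    reasons = []
    dist = haversine_km(*req["ip_location"], *req["address_location"])
    if dist > max_km:
        reasons.append("ip_far_from_address")
    if not POSTAL_RE.match(req.get("postal_code", "")):
        reasons.append("bad_postal_format")
    if req.get("attempts_last_hour", 0) > 3:
        reasons.append("too_many_attempts")
    return reasons
```

An empty list means the request passed every rule. A non-empty list could be emitted as an event for the CEP layer to act on.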
Advantage
• Could analyze petabytes of data
with latency within the SLA.
• Reduced the cost of handling huge
volumes.
• Further alert and event processing
using CEP to take appropriate
action on a fraudulent request.
• Quick analytics helped the bank
track down the locations and
regions involved and identify a
pattern in such fraudulent requests.
Online Account Opening
7. Case Study 4
Challenge
• Fraud detection cases – Collect
data from various products and
build a customer score.
• Data processing happens on
legacy mainframes, state by state;
it takes hours and delays
downstream batch processing.
• The SAS Fraud Management
product requires the data to be in
a specific format, and the critical
reformat step is time-consuming.
Solution
• The bank set up a single Hadoop
cluster containing more than a
petabyte of data collected from
multiple data sources.
• The reformat process completes
quickly because processing is done
on tables instead of file by file.
• With all of the information in one
place, data from other sources is
also added easily.
• Pattern matching, text processing,
sentiment analysis and graph
creation are used to combine,
digest and analyze the data.
Advantage
• The bank used the Hadoop cluster
to construct a more accurate
customer score. The score gives a
clear picture of a customer’s
financial situation, risk of default
or late payment, and satisfaction
with the bank and its services.
• The more accurate score allowed
the bank to manage its exposure
better and to offer each customer
better products and advice.
• Hadoop increased revenue and
improved customer satisfaction.
• The benefit was not just lower cost
than the existing system, but also
higher revenue from better risk
management and customer
retention.
Legacy Migration to Hadoop
8. Case Study 5
Challenge
• Fraud detection cases –
Adjustment data from
ATM/Mobile deposits.
• The data reaches fraud analysts
only on day 2 of the transaction.
• Current business process allows
funds to be available to the
customer as soon as the
adjustment is posted to the
account.
• Analysts have to toggle between
multiple screens to see data from
different sources when verifying
transactions.
Solution
• Make adjustment data available
to fraud analysts as early as
possible, before funds are posted
to the customer account.
• Make customer and transaction
data available to fraud analysts in
a single place.
• Ingest data from multiple sources
into the Hadoop ecosystem.
• Identification of patterns based on
vast historical data stored in the
Hadoop cluster from various
sources.
• Block or close the account as soon
as a fraudulent transaction is
identified, preventing funds from
leaving the bank.
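The pre-posting check described above can be pictured as a gate that decides, per adjustment, whether funds are posted or held for an analyst, together with a single consolidated record for review. This is a hypothetical sketch: the `Adjustment` fields, the median-based comparison and the `hold_ratio` of 5 are assumptions, not the bank's actual rules.

```python
from dataclasses import dataclass


@dataclass
class Adjustment:
    account_id: str
    channel: str   # "ATM" or "MOBILE"
    amount: float


def review_decision(adj, account_history, hold_ratio=5.0):
    """Return 'post' or 'hold' before funds are released to the customer.

    account_history maps account id -> list of past deposit amounts.
    """
    past = account_history.get(adj.account_id, [])
    if not past:
        return "hold"  # no history to compare against: route to an analyst
    typical = sorted(past)[len(past) // 2]  # middle value as a rough median
    if typical > 0 and adj.amount > hold_ratio * typical:
        return "hold"
    return "post"


def consolidated_view(adj, customers, account_history):
    """Gather what an analyst needs in one record instead of several screens."""
    return {
        "account_id": adj.account_id,
        "channel": adj.channel,
        "amount": adj.amount,
        "customer": customers.get(adj.account_id),
        "recent_deposits": account_history.get(adj.account_id, []),
        "decision": review_decision(adj, account_history),
    }
```

Held adjustments would then appear in the analyst's queue on the same day, with the customer and transaction context already attached.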
Advantage
• Could load petabytes of data into
Hadoop much faster than with
traditional processes.
• Reduced the cost of handling huge
volumes.
• Analysts gained the ability to
identify patterns easily, with all
the data available in a single place.
• Quick analytics helped the bank
track down the accounts and block
or close them to prevent further
activity.
ATM/Mobile adjustment data
9. Traditional Landscape – Analytical Sources
Case study for Customer Risk Profiling
Diagram layers: Information Ingestion; Analytics Sources; Information Consumption
Also shown: Security and Business Continuity Management; Information Governing Systems; Metadata Catalog
Traditional Sources: Database; Files; 3rd Party Applications
Shared Operational Information: Content Repository; Master Data Hubs; Reference Data Hubs
Analytics Sources: Transformation Engines; Warehouse; Data Marts; Data Cubes
Information Consumption: Visualization; Reporting & Dashboards; Geospatial Analysis; Statistical Analysis; Scorecards & Metrics; Events & Alerts; Data Mining; Investigation; Social Analysis; Case Management; Link Analysis; Pattern Analysis
10. Case study for Customer Risk Profiling (cont.)
Diagram stages: Observation Space; Capture and Integration; Decisioning & Context; Action; Feedback
Observation Space:
• Line of Business Applications: Market providers; Broker Dealer; Research; Transactional data
• Real time streaming data
• Unstructured Sources: Voice; Images; eMail; Web Interactions; Telematics
• Investigative Data: Twitter; Facebook; Streetwatch; finviz; Documents & Reports
• External Sources: Social Media; Government Records; Watch Lists
Capture and Integration:
• Real-time and Batch Data Acquisition and Provisioning
• Data Movement
• Data Quality
• Metadata Mgmt
Decisioning & Context:
• Unstructured Analytics Discovery
• Analytical Data Mart
• Real-time Decision & Analytics
• Identity & Relationship Analysis: Who Is Who; Who Knows Who; Who Does What; Who’s Name
• Analytical Modeling: Anomaly Analytics; Geospatial Analytics; Bitemporal Analytics
• Data Exploration & Discovery
• Text Analytics
• In Database Analytics
• Industry Data Model
• Rules Engine; Authorization; Permissions
Action:
• Visualization; Reporting & Dashboards; Geospatial Analysis; Statistical Analysis; Scorecards & Metrics; Events & Alerts; Data Mining
• Investigation; Social Analysis; Case Management; Link Analysis; Pattern Analysis
USES
• Scoring model
• Linking customer accounts, e.g. Brokerage, HELOC, mortgage, insurance, credit cards, savings and checking, MM etc.
11. Big Data Enhanced Analytical Sources Enable Analytics - New sources, real-time analytics,
deeper regulation, new technology
Case study for Customer Risk Profiling (cont.)
Investigation Management: Our analysts have instant access to
consolidated information across various internal and external
sources, and collaboration across departments, channels and LOBs
is seamless.
Analytics: We use extensive contextual information from a variety
of sources. New patterns we identify are fed back into screening.
Data Management: Our data is consolidated across channels, we use
all of the customer interaction and behavior information we
capture, and there is no delay before all of this information can
be analyzed.
Ingestion & Real-Time Analytic Zone
• Real-time (µs) data movement and analysis (anomaly detection,
correlation, scoring, etc.)
• Structured and unstructured data (geospatial, text, voice,
video, etc.)
Landing & Historical Zone
• Structured and unstructured data
• Single data hub across products, channels, behaviors and
relationships
Analytic Appliances
• Self-provisioning
• Deep analytics
Other diagram labels: Deep & predictive models (real-time); Real-time; Consolidated; Integrated
12. Reference Architecture
Case study for Customer Risk Profiling (cont.)
Observation Space: Line of Business Applications; Unstructured Sources; Investigative Data; External Sources
Components (diagram section labeled Development): Messaging; Web services; Logs; Parsing; JSON; Cassandra DB; Rules Engine; HDFS; Flume NG; Oozie; ZooKeeper; Hive; Pig; MapReduce; Sqoop; HCatalog; App Server; Web server; Reporting; Elasticsearch; Ikanow; Lexalytics; Provalis; Lucene; Mahout; R