Fraud Detection Reference Architecture
Apps data
from devices
News and
other alerts
Solution UX
Provisioning API (Pull)
User Profile Information
Stream Processors
Analytics &
Machine Learning
Business
Integration
Connectors
and
Gateway(s)
User Recent Activity Store
Gateway
Data Lake
Gateway
App Backend
Data Path
Optional solution component
Main solution component
Thin Client
Presentation & Business Connectivity
Data Processing, Analytics and Management
Device Connectivity
Personal
mobile
devices
Trades
and/or
transactions
Business
systems
Reference Architecture with Azure Services
Solution UX
Provisioning API
User Profile Information
Stream Processors
Analytics &
Machine Learning
Business
Integration
Connectors
and
Gateway(s)
User Recent Activity Store
Data Lake
Gateway
App Backend
Personal
mobile
devices
Business
systems
Presentation & Business Connectivity
Data Processing, Analytics and Management
Device Connectivity
Apps data
from devices
News and
other alerts
Gateway
Data Path
Optional solution component
Main solution component
Thin Client
Trades
and/or
transactions
User Profile and Metadata Stores
App Backend Solution UX
Provisioning API
User Profile Information
Stream Processors
Analytics &
Machine Learning
Business
Integration
Connectors
and
Gateway(s)
User Recent Activity Information
Data Lake
Gateway
(Kafka,
IoT Hub,
Event Hubs)
Data Path
Optional solution component
Main solution component
Metadata
Store
Gateway
Trades
and/or
transactions
Thin Client
News and
other alerts
Apps data
from
devices
Device Identity, Registry and State Stores
Metadata Store
Authority for all registered sources
Stores identity information and authentication secrets
User Profile Information
Indexed list of all Users and their demographics – Secure, Governed, Audit Controlled
Contains discovery and reference data related to Users
Can define a schema model or use a vertical industry standard schema for metadata
Can contain structured metadata and links to externally stored operational data
User Recent Activity
Contains operational data related to the Users’ most recent activities:
- “Last known values” for each User
- Aggregated or computed values
- Stream of device data events containing Geo location and Time based tagging
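As a rough illustration of the "last known values" and aggregated values described above, the following minimal sketch keeps per-user operational state in memory. The class names, field names, and the in-memory dictionary are assumptions made for illustration; the slides do not prescribe a particular store or schema.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class UserActivity:
    """Operational state kept for one user (hypothetical shape)."""
    last_amount: Optional[float] = None                   # last known value
    last_location: Optional[Tuple[float, float]] = None   # (lat, lon) geo tag
    last_seen: Optional[float] = None                     # event time, epoch seconds
    event_count: int = 0                                  # aggregated value
    total_amount: float = 0.0                             # aggregated value

    @property
    def average_amount(self) -> float:
        """Computed value derived from the running aggregates."""
        return self.total_amount / self.event_count if self.event_count else 0.0


class RecentActivityStore:
    """In-memory stand-in for a User Recent Activity store (assumption)."""

    def __init__(self) -> None:
        self._users: Dict[str, UserActivity] = {}

    def record(self, user_id: str, amount: float,
               location: Tuple[float, float], timestamp: float) -> UserActivity:
        state = self._users.setdefault(user_id, UserActivity())
        # Aggregates always include the event, even if it arrives late.
        state.event_count += 1
        state.total_amount += amount
        # "Last known" values only move forward in event time.
        if state.last_seen is None or timestamp >= state.last_seen:
            state.last_amount = amount
            state.last_location = location
            state.last_seen = timestamp
        return state

    def get(self, user_id: str) -> Optional[UserActivity]:
        return self._users.get(user_id)
```

A stream processor could call `record` for each incoming event, while scoring or UX components read the current state with `get`.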
Stream Processors
App Backend Solution UX
Provisioning API
Identity and Registry Stores
Stream Processors
Analytics &
Machine Learning
Business
Integration
Connectors
and
Gateway(s)
Device State Store
Data Lake
Data Path
Optional solution component
Main solution component
Gateway
Trades
and/or
transactions
Thin Client
News and
other alerts
Apps data
from
devices
Stream Processing: Data Flow
After ingress through the Gateway (Ingestion), the flow of data through the system is facilitated by data pumps and analytics tasks.
Data flow can be driven by:
• Apache Storm on Azure HDInsight
• Apache Spark on Azure HDInsight
• Azure Stream Analytics
• Custom Event Processors
Each can perform tasks in flight:
• Data aggregation
• Data enrichment
• Complex event processing
… and can output data to:
• Azure Data Lake
• Azure Blobs/Tables
• HDInsight / HBase
• Azure SQL DB
• Time Series Databases
• Event Hub
• Service Bus Queues
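To make the in-flight tasks listed above more concrete, here is a minimal sketch of data enrichment followed by data aggregation over fixed (tumbling) time windows. It is plain Python rather than Storm, Spark, or Stream Analytics code; the event fields, the `USER_PROFILES` reference data, and the function names are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical reference data, standing in for a user profile store.
USER_PROFILES = {
    "u1": {"segment": "retail"},
    "u2": {"segment": "business"},
}


def enrich(events: Iterable[dict], profiles: Dict[str, dict]) -> Iterator[dict]:
    """Data enrichment: attach the user's segment to each event."""
    for event in events:
        profile = profiles.get(event["user_id"], {})
        yield {**event, "segment": profile.get("segment", "unknown")}


def aggregate_tumbling(events: Iterable[dict],
                       window_seconds: int) -> Dict[Tuple[int, str], float]:
    """Data aggregation: total amount per (window start, segment)."""
    totals: Dict[Tuple[int, str], float] = defaultdict(float)
    for event in events:
        # Each event falls into exactly one fixed-size window.
        window_start = int(event["timestamp"] // window_seconds) * window_seconds
        totals[(window_start, event["segment"])] += event["amount"]
    return dict(totals)
```

Chaining the two, as in `aggregate_tumbling(enrich(events, USER_PROFILES), 60)`, mirrors the way a stream processor composes steps before writing results to one of the outputs listed above.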
Stream Processor Examples
Queue
Device Registry Store
Device Metadata
Processor
Data Lake
Device State Store
Device State
Processor
Notification
Processor
Raw Telemetry Processor
App Backend
Rules Processor
Event Hub
Stream Transformation
Processor
Secondary Stream
Processor
Data Path
Optional solution component
Main solution component
Gateway
Trades
and/or
transactions
Thin Client
News and
other alerts
Apps data
from
devices
App Backend
App Backend Solution UX
Provisioning API
Identity and Registry Stores
Stream Processors
Analytics &
Machine Learning
Business
Integration
Connectors
and
Gateway(s)
Device State Store
Storage
Cloud
Gateway
Data Path
Optional solution component
Main solution component
Gateway
Trades
and/or
transactions
Thin Client
News and
other alerts
Apps data
from
devices
High-Scale Compute Models
Scale-appropriate compute models
Actor Frameworks / Service Fabric Reliable Actors: distributed compute fabric hosting device actors.
Service Fabric Reliable Collections: highly available with replicated and local state management.
Azure Batch: job scheduling and compute management for highly parallelizable compute workloads.
Simple programming logic in vastly scalable compute nodes
Data Analytics
App Backend Solution UX
Provisioning API
Identity and Registry Stores
Stream Processors
Analytics &
Machine Learning
Business
Integration
Connectors
and
Gateway(s)
Device State Store
Data Lake
Cloud
Gateway
Data Path
Optional solution component
Main solution component
Gateway
Trades
and/or
transactions
Thin Client
News and
other alerts
Apps data
from
devices
Data Analytics
Event Hub
NRT Events
Stream Processing
(ASA, Storm or
Spark)
Alerts
Batch Events
Fetching &
Updating
Reference Data
Interceptor (Rules)
Spark
Hive/Pig
U-SQL
Azure Data Lake Store
Azure Data Lake Analytics
SQL DB
ML
Reports and
Dashboards
Real Time Scoring
Training ML Models
Relational Data
Data Analytics
Real-Time Analysis
Aggregation/Reduction, Temporal Queries, State
Correlation, Threshold Detection, Alerting
Data-At-Rest Analysis
Time-Series, Map/Reduce, Correlation
Machine Learning
Pattern Detection, Behavior Prediction
Plausibility Analysis, Anomaly and Fraud Detection
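The real-time items above (temporal queries, threshold detection, alerting) can be sketched as a sliding-window counter per user. This is only an illustration in plain Python: the `VelocityDetector` name and its parameters are assumptions, and it assumes events for a given user arrive in time order.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class VelocityDetector:
    """Flags a user whose event count in a sliding time window exceeds a limit."""

    def __init__(self, window_seconds: float, max_events: int) -> None:
        self.window_seconds = window_seconds
        self.max_events = max_events
        self._recent: Dict[str, Deque[float]] = defaultdict(deque)

    def observe(self, user_id: str, timestamp: float) -> bool:
        """Record one event; return True if an alert should be raised."""
        window = self._recent[user_id]
        window.append(timestamp)
        # Temporal query: keep only events inside the last window_seconds.
        while window and window[0] <= timestamp - self.window_seconds:
            window.popleft()
        # Threshold detection: alert when the count exceeds the limit.
        return len(window) > self.max_events
```

For example, `VelocityDetector(window_seconds=60, max_events=3)` would raise an alert on a user's fourth transaction within one minute.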
Power BI
HDInsight
Stream Analytics
Data Factory
Machine Learning
Presentation and Business Connectivity
App Backend Solution UX
Provisioning API
Identity and Registry Stores
Stream Processors
Analytics &
Machine Learning
Business
Integration
Connectors
and
Gateway(s)
Device State Store
Data Lake
Cloud
Gateway
Data Path
Optional solution component
Main solution component
Gateway
Trades
and/or
transactions
Thin Client
News and
other alerts
Apps data
from
devices
Cortana Intelligence Suite
Action
People
Automated
Systems
Apps
Web
Mobile
Bots
Intelligence
Dashboards &
Visualizations
Cortana
Bot
Framework
Cognitive
Services
Power BI
Information
Management
Event Hubs
Data Catalog
Data Factory
Machine Learning
and Analytics
HDInsight
(Hadoop and
Spark)
Stream Analytics
Intelligence
Data Lake
Analytics
Machine
Learning
Big Data Stores
SQL Data
Warehouse
Data Lake Store
Data
Sources
Apps
Sensors
and
devices
Data
Money Laundering Prevention
Fraud Detection
$ $ $
¥
Placement Layering Integration
Process
Know your
Customer
Transaction
Monitoring
Pattern
Detection
Machine Learning
Decision Tree Classification
Cluster
Analysis
Cloud
Anti-Money Laundering
Power BI
Fund monitoring
dashboard
Big Data Storage for
Multiple Sources
HDInsight Azure Data
Lake
Azure Data
Warehouse
SQL Azure Azure Machine
Learning
SQL
Financial Data
Real-time fraud detection feedback
Information Services
HDInsight Streaming
Analytics
Data Science Modeling
• Similar to linear regression
• Weights independent variables
• Useful with categorical independent variable
• Offers coefficients to inform management decision-making
• Very useful with internal analytical teams to interpret data
• Useful for diagnosing gaps in data and customer outreach
• Helps drive understanding of demand drivers
• Uses decision trees & votes
• Forest
• Compares results between various outcomes
• Votes upon outcomes
• Evaluates based upon a series of logical questions or “forest”
• Jungle
• Useful when a forest produces too many logical branches
• Produces a series of weighted edges and nodes
• Trained on input data
• Useful for complex tasks, like speech recognition when allowed to train in depth
• Very good with complex interactions
• Enables retailers to better identify behaviour patterns & certain shopping activities
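The "uses decision trees & votes" idea above can be illustrated with a toy ensemble in which each tree answers a logical question and the majority label wins. In a real decision forest the trees are learned from training data; the three hand-written trees below, their feature names, and their cut-off values are purely hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List

Transaction = Dict[str, float]
Tree = Callable[[Transaction], str]


# Hand-written one-question "trees" (assumptions, not learned models).
def tree_amount(tx: Transaction) -> str:
    return "fraud" if tx["amount"] > 5000 else "legit"


def tree_distance(tx: Transaction) -> str:
    return "fraud" if tx["km_from_home"] > 1000 else "legit"


def tree_velocity(tx: Transaction) -> str:
    return "fraud" if tx["tx_last_hour"] > 10 else "legit"


def forest_predict(trees: List[Tree], tx: Transaction) -> str:
    """Each tree votes on the outcome; the most common label is returned."""
    votes = Counter(tree(tx) for tree in trees)
    return votes.most_common(1)[0][0]
```

With an odd number of trees and two labels, as here, the vote cannot tie.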
Reference Architecture & Azure Services
Today’s financial services market is highly competitive and complex. Particularly under current legislation, it is increasingly important to reduce risk, improve compliance, detect fraud, retain customers, and know your customers better.
Data evolves over time, and legacy systems have evolved into today’s systems. As systems change to become more timely, accurate, and thoughtful, greater opportunities for return on system investment are realized. Big Data and Advanced Analytics systems offer superior return on investment.
Many opportunities exist to apply this technology suite to financial analytics.
Left to Right
Personalization of offers and tailored banking experiences allow opportunities to engage with customers in a positive way based on their data.
Next-best-action offers surface suspected needs and create the opportunity for sales lift.
Recommended interventions allow for programmatic intervention based on customer churn.
Lifestyle yield management allows bankers to tailor plans & recommendations based on the life stage of customers (retiree versus recent graduate).
Many customers of financial institutions are impacted by seasonality in their employment or lifestyle. By recognizing and making offers to these customers based on their needs, banks can increase their profitability.
Fraud Detection allows banks to reduce risk and their cost of operations. Theft profiling & fraudulent transaction detection allow for proactive intervention & prevention of fraud. Remote shutdown & site monitoring allow banks to intervene reactively at ATMs and physical locations in the event of fraud.
Customer Churn Prevention increases revenue by increasing customer lifetime. Churn scoring identifies at-risk customers and is the basis for all other churn applications. Personalized interventions can be created per customer based on churn scoring & personalization. Similarly, risky customers can be profiled to identify their characteristics and intervene. Call center monitoring applies perceptual intelligence to identify churn behavior from call center operations.
Risk Reduction & Compliance is a key way institutions can reduce operational costs. Preventing payment system errors and money laundering can substantially reduce exposure to fines & lost funds. Data entry is similarly a source of risk; identifying and preventing data entry errors can save time & money.
With the large amounts of data potentially available for analysis, managing data flows efficiently can be a challenge.
Huge amounts of data to process (volume)
A mixture of structured and unstructured data (variety)
New data that’s generated extremely frequently (velocity)
Data quality so that it can be trusted (veracity)
Between $2.17 and $3.61 trillion is laundered annually.
The process of detecting and preventing money laundering is fairly straightforward at a conceptual level, but implementing systems to detect and prevent it is incredibly complex.
Money laundering has 3 primary stages: placement, layering, & integration. Placement is where funds from illegal activities are introduced into the financial system. Layering is the series of transactions designed to clean the money. Integration is when the funds are redistributed back through business transactions.
To stop this process, controls can be implemented to prevent money laundering. As you can see, these primarily fall into the 3 categories indicated under Process, and can be supported by the various machine learning algorithms mentioned on the right.
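As one hedged example of a transaction monitoring control, the sketch below flags accounts that make several same-day deposits just under a reporting threshold, a pattern that could indicate funds being introduced in small pieces during placement. The threshold, the 90% band, the minimum count, and the function name are illustrative assumptions, not values taken from the slides or from any regulation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

REPORTING_THRESHOLD = 10_000.0  # assumed reporting limit, for illustration only
NEAR_THRESHOLD_RATIO = 0.9      # "just under" means within 10% of the limit
MIN_DEPOSITS = 3                # same-day near-threshold deposits needed to flag


def flag_structuring(deposits: Iterable[Tuple[str, int, float]]) -> List[str]:
    """Return sorted account ids with repeated near-threshold deposits in one day.

    Each deposit is (account_id, day_number, amount).
    """
    near: Dict[Tuple[str, int], int] = defaultdict(int)
    lower_bound = NEAR_THRESHOLD_RATIO * REPORTING_THRESHOLD
    for account, day, amount in deposits:
        # Count only deposits in the band just below the threshold.
        if lower_bound <= amount < REPORTING_THRESHOLD:
            near[(account, day)] += 1
    flagged = {account for (account, _day), count in near.items()
               if count >= MIN_DEPOSITS}
    return sorted(flagged)
```

A rule like this would normally be one input among many; the pattern detection and machine learning steps on the slide would weigh it alongside other signals.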
(Click again) Using Cortana Analytics in the process, and to drive the machine learning behind it, can help prevent money laundering.
This diagram shows both the hot path and the cold path. The hot path is fed directly from the information services layer as data is entered from field services. The cold path involves data storage and the use of machine learning to inform hot path development and to feed predictions directly into the visualization layer. It is important to stress both paths here, as both are required to yield superior results.
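The interplay between the two paths can be sketched as follows: every event is scored immediately on the hot path and also stored on the cold path, and the cold path periodically derives a new rule that is pushed back to the hot path. The classes, the mean-plus-k-standard-deviations rule, and all names here are assumptions chosen for brevity; a real solution would use the services named in these notes and a trained model.

```python
from statistics import mean, pstdev
from typing import List


class ColdPath:
    """Stores all events and derives a rule from history (stand-in for ML)."""

    def __init__(self) -> None:
        self.history: List[float] = []

    def store(self, amount: float) -> None:
        self.history.append(amount)

    def retrain_threshold(self, k: float = 3.0) -> float:
        # Requires at least one stored amount; mean() raises on empty history.
        return mean(self.history) + k * pstdev(self.history)


class HotPath:
    """Applies the current rule to each event as it arrives."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def score(self, amount: float) -> bool:
        return amount > self.threshold


def process(amount: float, hot: HotPath, cold: ColdPath) -> bool:
    """Send one event down both paths; return the hot-path alert decision."""
    cold.store(amount)
    return hot.score(amount)
```

Periodically assigning `hot.threshold = cold.retrain_threshold()` is the feedback loop: the hot path stays fast because it only evaluates a rule, while the cold path does the heavier analysis.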
Storm, Spark, & Azure Stream Analytics are the tools used on the hot path to implement the rules gleaned from ML.
Azure Data Factory is the orchestration tool. These tools are referenced again in the Machine Learning layer, as further development there is used to increase hot path value.
HDInsight & Azure Data Lake are big data stores.
Azure DW and SQL Azure are relational data stores for extracting further value from the big data stores.
Azure Machine Learning provides a platform for data evaluation, data science and prediction. This is where the real value for the solution is created.
Data Factory
http://azure.microsoft.com/en-us/services/data-factory/
Data Catalog
http://azure.microsoft.com/en-us/services/data-catalog/
Event Hubs
http://azure.microsoft.com/en-us/services/event-hubs/
Stream Analytics
http://azure.microsoft.com/en-us/services/stream-analytics/
Data Lake
http://azure.microsoft.com/en-us/campaigns/data-lake/
SQL Data Warehouse
http://azure.microsoft.com/en-us/services/sql-data-warehouse/
HDInsight
http://azure.microsoft.com/en-us/services/hdinsight/