Big Data Application Architectures - Fraud Detection
Published in: Technology
  1. Agenda: define the problem; establish the expected outcome; dive into each pillar; determine a solution; understand the applicability.
  2. Financial institutions risk loss of charter and a host of other penalties through noncompliance with federal money laundering legislation.
  3. Big Data Evolution: Legacy Systems; Current Systems; Big Data; Advanced Analytics. Timely Info; Accurate; Thoughtful.
  4. Areas of Opportunity for Financial Analytics (Marketing, Operations, Bankers, CEOs): • Next Best Action • Recommended Interventions • Lifestyle Yield Management • Seasonal Personal Impact • Theft Profiling • Fraudulent Transaction Identification • Remote Shutdown • Site Monitoring • Recommended Interventions • Risky Customer Profiling • Call Center Monitoring • Churn Scoring • Payment System Errors • Money Laundering Prevention • Compliance • Data Entry Intervention. Opportunity areas: Personalization of offers & banking experience; Risk Reduction & Compliance; Customer Churn Prevention; Fraud Detection.
  5. Expected Outcome
  6. Big Data Challenges
  7. Architectural Considerations
  8. Fraud Detection Reference Architecture. Layers: Device Connectivity; Data Processing, Analytics and Management; Presentation & Business Connectivity. Diagram labels: Apps data from devices; News and other alerts; Solution UX; Provisioning API (Pull); User Profile Information; Stream Processors; Analytics & Machine Learning; Business Integration Connectors and Gateway(s); User Recent Activity Store; Gateway; Data Lake; Gateway; App Backend; Thin Client; Personal mobile devices; Trades and/or transactions; Business systems. Legend: Data Path; Optional solution component; Main solution component.
  9. Reference Architecture with Azure Services. Layers: Device Connectivity; Data Processing, Analytics and Management; Presentation & Business Connectivity. Diagram labels: Solution UX; Provisioning API; User Profile Information; Stream Processors; Analytics & Machine Learning; Business Integration Connectors and Gateway(s); User Recent Activity Store; Store; Data Lake; Gateway; App Backend; Personal mobile devices; Business systems; Apps data from devices; News and other alerts; Gateway; Thin Client; Trades and/or transactions. Legend: Data Path; Optional solution component; Main solution component.
  10. Demo: Woodgrove Financial
  11. User Profile and Metadata Stores. Diagram labels: App Backend; Solution UX; Provisioning API; User Profile Information; Stream Processors; Analytics & Machine Learning; Business Integration Connectors and Gateway(s); User Recent Activity Information; Data Lake; Gateway (Kafka, IoT Hub, Event Hubs); Metadata Store; Gateway; Trades and/or transactions; Thin Client; News and other alerts; Apps data from devices. Legend: Data Path; Optional solution component; Main solution component.
  12. Device Identity, Registry and State Stores. Metadata Store: authority for all registered sources; stores identity information and authentication secrets. User Profile Information: indexed list of all users and their demographics (secure, governed, audit controlled); contains discovery and reference data related to users; can define a schema model or use a vertical industry standard schema for metadata; can contain structured metadata and links to externally stored operational data. User Recent Activity: contains operational data related to the users' most recent activities: "last known values" for each user; aggregated or computed values; stream of device data events containing geolocation and time-based tagging.
  13. Stream Processors. Diagram labels: App Backend; Solution UX; Provisioning API; Identity and Registry Stores; Stream Processors; Analytics & Machine Learning; Business Integration Connectors and Gateway(s); Device State Store; Data Lake; Gateway; Trades and/or transactions; Thin Client; News and other alerts; Apps data from devices. Legend: Data Path; Optional solution component; Main solution component.
  14. Stream Processing: Data Flow. After ingress through the Gateway (Ingestion), the flow of data through the system is facilitated by data pumps and analytics tasks. Data flow can be driven by: • Apache Storm on Azure HDInsight • Apache Spark on Azure HDInsight • Azure Stream Analytics • Custom Event Processors. Each can perform tasks in flight: • Data aggregation • Data enrichment • Complex event processing. Each can output data to: • Azure Data Lake • Azure Blobs/Tables • HDInsight / HBase • Azure SQL DB • Time Series Databases • Event Hub • Service Bus Queues.
  15. Stream Processor Examples. Diagram labels: Queue; Device Registry Store; Device Metadata Processor; Data Lake; Device State Store; Device State Processor; Notification Processor; Raw Telemetry Processor; App Backend; Rules Processor; Event Hub; Stream Transformation Processor; Secondary Stream Processor; Gateway; Trades and/or transactions; Thin Client; News and other alerts; Apps data from devices. Legend: Data Path; Optional solution component; Main solution component.
  16. App Backend. Diagram labels: App Backend; Solution UX; Provisioning API; Identity and Registry Stores; Stream Processors; Analytics & Machine Learning; Business Integration Connectors and Gateway(s); Device State Store; Storage; Cloud Gateway; Gateway; Trades and/or transactions; Thin Client; News and other alerts; Apps data from devices. Legend: Data Path; Optional solution component; Main solution component.
  17. High-Scale Compute Models: scale-appropriate compute models. Actor Frameworks / Service Fabric Reliable Actors: distributed compute fabric hosting device actors. Service Fabric Reliable Collections: highly available with replicated and local state management. Azure Batch: job scheduling and compute management for highly parallelizable compute workloads. Simple programming logic in vastly scalable compute nodes.
  18. Data Analytics. Diagram labels: App Backend; Solution UX; Provisioning API; Identity and Registry Stores; Stream Processors; Analytics & Machine Learning; Business Integration Connectors and Gateway(s); Device State Store; Data Lake; Cloud Gateway; Gateway; Trades and/or transactions; Thin Client; News and other alerts; Apps data from devices. Legend: Data Path; Optional solution component; Main solution component.
  19. Data Analytics. Diagram labels: Event Hub; NRT Events; Stream Processing (ASA, Storm or Spark); Alerts; Batch Events; Fetching & Updating Reference Data; Interceptor (Rules); Spark; Hive/Pig; U-SQL; Azure Data Lake Store; Azure Data Lake Analytics; SQL DB; ML; Reports and Dashboards; Real Time Scoring; Training ML Models; Relational Data.
  20. Data Analytics. Real-Time Analysis: Aggregation/Reduction, Temporal Queries, State Correlation, Threshold Detection, Alerting. Data-At-Rest Analysis: Time-Series, Map/Reduce, Correlation. Machine Learning: Pattern Detection, Behavior Prediction, Plausibility Analysis, Anomaly and Fraud Detection. Services: Power BI; HDInsight; Stream Analytics; Data Factory; Machine Learning.
  21. Presentation and Business Connectivity. Diagram labels: App Backend; Solution UX; Provisioning API; Identity and Registry Stores; Stream Processors; Analytics & Machine Learning; Business Integration Connectors and Gateway(s); Device State Store; Data Lake; Cloud Gateway; Gateway; Trades and/or transactions; Thin Client; News and other alerts; Apps data from devices. Legend: Data Path; Optional solution component; Main solution component.
  22. Azure Data Lake. Diagram labels: WebHDFS; YARN; U-SQL; Analytics Service; HDInsight (managed Hadoop clusters); Analytics; Store.
  23. Cortana Intelligence Suite. Action: People; Automated Systems; Apps (Web, Mobile, Bots). Intelligence: Cortana; Bot Framework; Cognitive Services. Dashboards & Visualizations: Power BI. Information Management: Event Hubs; Data Catalog; Data Factory. Machine Learning and Analytics: HDInsight (Hadoop and Spark); Stream Analytics; Data Lake Analytics; Machine Learning. Big Data Stores: SQL Data Warehouse; Data Lake Store. Data Sources: Apps; Sensors and devices; Data.
  24. Reference Architecture with Azure Services. Layers: Device Connectivity; Data Processing, Analytics and Management; Presentation & Business Connectivity. Diagram labels: Solution UX; Provisioning API; User Profile Information; Stream Processors; Analytics & Machine Learning; Business Integration Connectors and Gateway(s); User Recent Activity Store; Store; Data Lake; Gateway; App Backend; Personal mobile devices; Business systems; Apps data from devices; News and other alerts; Gateway; Thin Client; Trades and/or transactions. Legend: Data Path; Optional solution component; Main solution component.
  25. Money Laundering Prevention / Fraud Detection. Stages: Placement; Layering; Integration. Process: Know Your Customer; Transaction Monitoring; Pattern Detection; Machine Learning; Decision Tree Classification; Cluster Analysis.
  26. Cloud Anti-Money Laundering. Diagram labels: Power BI; Fund monitoring dashboard; Big Data Storage for Multiple Sources; HDInsight; Azure Data Lake; Azure Data Warehouse; SQL Azure; Azure Machine Learning; SQL Financial Data; Real-time fraud detection feedback; Information Services; HDInsight; Streaming Analytics.
  27. Data Science Modeling • Similar to linear regression • Weights independent variables • Useful with categorical independent variables • Offers coefficients to inform management decision-making • Very useful with internal analytical teams to interpret data • Useful for diagnosing gaps in data and customer outreach • Helps drive understanding of demand drivers • Uses decision trees & votes • Forest • Compares results between various outcomes • Votes upon outcomes • Evaluates based upon a series of logical questions or "forest" • Jungle • Useful when a forest produces too many logical branches • Produces a series of weighted edges and nodes • Trained on input data • Useful for complex tasks, like speech recognition, when allowed to train in depth • Very good with complex interactions • Enables retailers to better identify behaviour patterns & certain shopping activities
  28. Reference Architecture & Azure Services. Layers: Device Connectivity; Data Processing, Analytics and Management; Presentation & Business Connectivity. Diagram labels: Solution UX; Provisioning API; User Profile Information; Stream Processors; Analytics & Machine Learning; Business Integration Connectors and Gateway(s); User Recent Activity Store; Store; Data Lake; Gateway; App Backend; Personal mobile devices; Business systems; Apps data from devices; News and other alerts; Gateway; Thin Client; Trades and/or transactions. Legend: Data Path; Optional solution component; Main solution component.
  29. nishant.thacker@microsoft.com
  30. © 2016 Microsoft Corporation. All rights reserved.

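
Slide 27 describes a model that is "similar to linear regression", weights independent variables and offers coefficients to inform decisions. The slide does not name the model; assuming it refers to logistic regression, the sketch below fits one with batch gradient descent on a tiny invented data set. The feature names (`amount`, `is_foreign`), the data, the learning rate and the epoch count are all assumptions for illustration, and a real project would more likely use a service such as Azure Machine Learning than hand-written training code.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights: Sequence[float], row: Sequence[float]) -> float:
    """Probability of the positive class; weights[0] is the intercept."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], row)))


def fit_logistic(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    learning_rate: float = 0.5,  # illustrative value
    epochs: int = 5000,          # illustrative value
) -> List[float]:
    """Fit intercept and coefficients by batch gradient descent on the log loss."""
    n_rows, n_features = len(X), len(X[0])
    weights = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for row, label in zip(X, y):
            error = predict_proba(weights, row) - label
            grad[0] += error
            for j, value in enumerate(row):
                grad[j + 1] += error * value
        weights = [w - learning_rate * g / n_rows for w, g in zip(weights, grad)]
    return weights


# Invented toy data: [amount (scaled to 0..1), is_foreign (0 or 1)] -> fraud label.
X_train = [[0.1, 0], [0.2, 0], [0.3, 0], [0.2, 1],
           [0.9, 1], [1.0, 1], [0.8, 0], [0.95, 1]]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]

weights = fit_logistic(X_train, y_train)
high_risk = predict_proba(weights, [0.95, 1])
low_risk = predict_proba(weights, [0.1, 0])
```

The fitted `weights` are the coefficients the slide refers to: the sign and size of each one indicate how that variable shifts the predicted probability, which is what makes this kind of model easy for analytical teams to interpret.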