Big Data Application Architectures - Fraud Detection

DataWorks Summit/Hadoop Summit
Jul. 11, 2016

  1. Agenda • Define the problem • Establish the expected outcome • Dive into each pillar • Determine a solution • Understand the applicability
  2. Financial institutions risk loss of charter and a host of other penalties through noncompliance with federal money-laundering legislation.
  3. Big Data Evolution (diagram): legacy systems, then current systems, then Big Data and advanced analytics, each stage delivering more timely, accurate, and thoughtful information
  4. Areas of Opportunity for Financial Analytics (audiences: Marketing, Operations, Bankers, CEOs)
     • Personalization of offers & banking experience: Next Best Action, Recommended Interventions, Lifestyle Yield Management, Seasonal Personal Impact
     • Fraud Detection: Theft Profiling, Fraudulent Transaction Identification, Remote Shutdown, Site Monitoring
     • Customer Churn Prevention: Recommended Interventions, Risky Customer Profiling, Call Center Monitoring, Churn Scoring
     • Risk Reduction & Compliance: Payment System Errors, Money Laundering Prevention, Compliance, Data Entry Intervention
  5. Expected Outcome
  6. Big Data Challenges
  7. Architectural Considerations
  8. Fraud Detection Reference Architecture (diagram)
     • Sources: personal mobile devices, apps data from devices, trades and/or transactions, business systems, news and other alerts
     • Layers: Device Connectivity; Data Processing, Analytics and Management; Presentation & Business Connectivity
     • Components: Gateway, Stream Processors, User Profile Information, User Recent Activity Store, Data Lake, Analytics & Machine Learning, App Backend, Provisioning API (pull), Solution UX, Business Integration Connectors and Gateway(s), Thin Client
     • Legend: data path, main solution component, optional solution component
  9. Reference Architecture with Azure Services (diagram: the components and layers of slide 8)
  10. Demo: Woodgrove Financial
  11. User Profile and Metadata Stores (reference architecture diagram, adding a Metadata Store; Gateway options: Kafka, IoT Hub, Event Hubs)
  12. Device Identity, Registry and State Stores
     • Metadata Store: authority for all registered sources; stores identity information and authentication secrets
     • User Profile Information: indexed list of all users and their demographics (secure, governed, audit-controlled); contains discovery and reference data related to users; can define a schema model or use a vertical-industry standard schema for metadata; can contain structured metadata and links to externally stored operational data
     • User Recent Activity: operational data about users’ most recent activities, including “last known values” for each user, aggregated or computed values, and a stream of device data events tagged with geolocation and time
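The User Recent Activity store combines three kinds of per-user data: last known values, aggregates, and a bounded stream of geo- and time-tagged events. Below is a minimal in-memory sketch in Python, assuming events arrive as dicts with hypothetical `ts`, `lat`, `lon`, and `amount` fields. The class and field names are illustrative and are not part of any Azure service API. A real store could live in one of the outputs listed for stream processing, such as HBase or Azure Tables, rather than in process memory.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UserActivity:
    last_known: dict = field(default_factory=dict)     # "last known values" per user
    txn_count: int = 0                                  # aggregated value
    txn_total: float = 0.0                              # aggregated value
    recent_events: list = field(default_factory=list)  # geo/time-tagged event stream


class RecentActivityStore:
    """In-memory stand-in for a User Recent Activity store (hypothetical design)."""

    def __init__(self, max_events: int = 100):
        if max_events < 1:
            raise ValueError("max_events must be at least 1")
        self._users = defaultdict(UserActivity)
        self._max_events = max_events

    def record(self, user_id: str, event: dict) -> None:
        activity = self._users[user_id]
        activity.last_known["ts"] = event["ts"]
        activity.last_known["location"] = (event["lat"], event["lon"])
        activity.txn_count += 1
        activity.txn_total += event["amount"]
        activity.recent_events.append(event)
        # Drop the oldest events so only the newest max_events are kept.
        del activity.recent_events[:-self._max_events]

    def get(self, user_id: str) -> Optional[UserActivity]:
        return self._users.get(user_id)

    def mean_amount(self, user_id: str) -> Optional[float]:
        """A computed value derived from the stored aggregates."""
        activity = self._users.get(user_id)
        if activity is None or activity.txn_count == 0:
            return None
        return activity.txn_total / activity.txn_count
```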
  13. Stream Processors (reference architecture diagram)
  14. Stream Processing: Data Flow
     After ingress through the Gateway (ingestion), data flows through the system via data pumps and analytics tasks.
     • Data flow can be driven by: Apache Storm on Azure HDInsight, Apache Spark on Azure HDInsight, Azure Stream Analytics, or custom event processors
     • Each can perform tasks in flight: data aggregation, data enrichment, complex event processing
     • Each can output data to: Azure Data Lake, Azure Blobs/Tables, HDInsight/HBase, Azure SQL DB, time series databases, Event Hub, Service Bus queues
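The three in-flight tasks can be made concrete with a plain-Python sketch of enrichment, tumbling-window aggregation, and a toy complex-event rule. It assumes hypothetical event fields (`user_id`, `ts` in seconds, `amount`, `country`) and a hypothetical profile lookup with a `home_country` field. It does not use the Storm, Spark, or Stream Analytics APIs, each of which provides its own windowing constructs.

```python
from collections import defaultdict


def enrich(event: dict, profiles: dict) -> dict:
    """Data enrichment: attach reference data from a (hypothetical) user profile."""
    profile = profiles.get(event["user_id"], {})
    return {**event, "home_country": profile.get("home_country")}


def tumbling_window_totals(events, window_seconds: int) -> dict:
    """Data aggregation: sum amounts per (user, window start) over fixed,
    non-overlapping windows."""
    totals = defaultdict(float)
    for e in events:
        window_start = e["ts"] - e["ts"] % window_seconds
        totals[(e["user_id"], window_start)] += e["amount"]
    return dict(totals)


def flag_foreign_bursts(events, profiles, window_seconds=60, limit=1000.0):
    """Toy complex-event rule: flag (user, window) pairs where spending exceeds
    `limit` and at least one event in that window came from outside the
    user's home country."""
    enriched = [enrich(e, profiles) for e in events]
    totals = tumbling_window_totals(enriched, window_seconds)
    foreign = {
        (e["user_id"], e["ts"] - e["ts"] % window_seconds)
        for e in enriched
        if e["country"] != e["home_country"]
    }
    return sorted(k for k, total in totals.items() if total > limit and k in foreign)
```

Tumbling windows are fixed and non-overlapping, so each event counts toward exactly one window. A sliding window would catch bursts that straddle a window boundary, at the cost of more state.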
  15. Stream Processor Examples (diagram)
     • Processors: Raw Telemetry Processor, Stream Transformation Processor, Secondary Stream Processor, Rules Processor, Notification Processor, Device Metadata Processor, Device State Processor
     • Stores and endpoints: Event Hub, Queue, Device Registry Store, Device State Store, Data Lake, App Backend
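One way to read the processor chain: a raw telemetry processor archives every event, a rules processor evaluates it, and matches are handed over a queue to a notification processor. The sketch below assumes that wiring. The rule names, event fields, and `run_pipeline` helper are hypothetical, a Python list stands in for the Data Lake, and `queue.Queue` stands in for a Service Bus queue.

```python
import queue
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class Rule:
    name: str
    predicate: Callable[[dict], bool]


class RulesProcessor:
    """Evaluates each event against every rule; matches go to a notification queue."""

    def __init__(self, rules: list, notifications: queue.Queue):
        self.rules = rules
        self.notifications = notifications

    def process(self, event: dict) -> None:
        for rule in self.rules:
            if rule.predicate(event):
                self.notifications.put({"rule": rule.name, "event": event})


def raw_telemetry_processor(events: Iterable[dict], archive: list) -> Iterator[dict]:
    """Archives every raw event unchanged (the list stands in for the Data Lake),
    then passes it downstream."""
    for event in events:
        archive.append(event)
        yield event


def run_pipeline(events, rules):
    """Wires raw telemetry -> rules -> notification queue, then drains the queue
    the way a notification processor would."""
    archive = []
    notifications = queue.Queue()
    processor = RulesProcessor(rules, notifications)
    for event in raw_telemetry_processor(events, archive):
        processor.process(event)
    drained = []
    while not notifications.empty():
        drained.append(notifications.get())
    return archive, drained
```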
  16. App Backend (reference architecture diagram)
  17. High-Scale Compute Models: scale-appropriate compute models, with simple programming logic in vastly scalable compute nodes
     • Actor frameworks / Service Fabric Reliable Actors: a distributed compute fabric hosting device actors
     • Service Fabric Reliable Collections: highly available, with replicated and local state management
     • Azure Batch: job scheduling and compute management for highly parallelizable compute workloads
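The actor model behind Reliable Actors can be illustrated without Service Fabric. Each user (or device) gets an actor that owns its state and handles its mailbox one message at a time, so no locks are needed and actors can be spread across nodes independently. This asyncio sketch shows only the pattern; `UserActor`, `ActorHost`, and the running-total state are hypothetical and do not mirror the Service Fabric API.

```python
import asyncio


class UserActor:
    """One actor per user: private state plus a mailbox processed one message at a time.
    Generic actor pattern only; not the Service Fabric Reliable Actors API."""

    def __init__(self, user_id: str):
        self.user_id = user_id
        self.total = 0.0                 # private state, mutated only inside run()
        self.mailbox = asyncio.Queue()

    async def run(self) -> None:
        while True:
            msg = await self.mailbox.get()
            if msg is None:              # shutdown sentinel
                return
            self.total += msg["amount"]  # no locks needed: messages are handled serially


class ActorHost:
    """Routes each message to the actor owning its user_id, activating actors on demand."""

    def __init__(self):
        self.actors = {}
        self.tasks = []

    def send(self, user_id: str, msg: dict) -> None:
        actor = self.actors.get(user_id)
        if actor is None:
            actor = UserActor(user_id)
            self.actors[user_id] = actor
            self.tasks.append(asyncio.create_task(actor.run()))
        actor.mailbox.put_nowait(msg)

    async def shutdown(self) -> None:
        for actor in self.actors.values():
            actor.mailbox.put_nowait(None)   # queued behind all pending messages
        await asyncio.gather(*self.tasks)


async def demo(messages):
    host = ActorHost()
    for user_id, amount in messages:
        host.send(user_id, {"amount": amount})
    await host.shutdown()
    return {uid: actor.total for uid, actor in host.actors.items()}
```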
  18. Data Analytics (reference architecture diagram)
  19. Data Analytics (pipeline diagram)
     • Ingestion: Event Hub
     • Near-real-time (NRT) events: stream processing (Azure Stream Analytics, Storm, or Spark), an interceptor (rules) for fetching and updating reference data, alerts, real-time scoring
     • Batch events: Azure Data Lake Store, processed with Spark, Hive/Pig, or U-SQL in Azure Data Lake Analytics
     • Relational data: SQL DB
     • Machine learning: training ML models
     • Output: reports and dashboards
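The diagram splits events into a near-real-time path and a batch path, with an interceptor that fetches and refreshes reference data for the stream side. Below is a minimal sketch of that split. It assumes the reference data is a hypothetical merchant-to-risk mapping refreshed on a time-to-live, and it uses `risk * amount` as a stand-in for real-time model scoring; a real deployment would call a trained model instead. The class, function, and field names are illustrative.

```python
import time


class ReferenceDataInterceptor:
    """Caches reference data (e.g. merchant risk ratings) and re-fetches it
    once it is older than ttl_seconds."""

    def __init__(self, fetch, ttl_seconds: float, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._data = None
        self._loaded_at = None

    def get(self) -> dict:
        now = self._clock()
        if self._data is None or now - self._loaded_at >= self._ttl:
            self._data = self._fetch()
            self._loaded_at = now
        return self._data


def route(events, interceptor, cold_store: list, alert_threshold: float):
    """Send every event to the cold path; score it on the hot path and return alerts."""
    alerts = []
    for e in events:
        cold_store.append(e)                                # cold path: batch analytics, model training
        risk = interceptor.get().get(e["merchant"], 0.0)    # hot path: reference-data lookup
        score = risk * e["amount"]                          # stand-in for real-time model scoring
        if score >= alert_threshold:
            alerts.append((e["id"], score))
    return alerts
```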
  20. Data Analytics
     • Real-time analysis: aggregation/reduction, temporal queries, state correlation, threshold detection, alerting
     • Data-at-rest analysis: time series, map/reduce, correlation
     • Machine learning: pattern detection, behavior prediction, plausibility analysis, anomaly and fraud detection
     • Services: Power BI, HDInsight, Stream Analytics, Data Factory, Machine Learning
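Threshold detection and anomaly detection can be as simple as comparing each new value with a user's recent history. This sketch flags an amount whose z-score against that user's last `window` amounts exceeds `threshold`. The z-score rule, parameter defaults, and class name are assumptions for illustration, not a method taken from the slide or from any listed service.

```python
import statistics
from collections import deque


class ZScoreDetector:
    """Flags a value as anomalous when it lies more than `threshold` population
    standard deviations from the mean of the previous `window` values for the
    same key (e.g. a user)."""

    def __init__(self, window: int = 20, threshold: float = 3.0, min_history: int = 5):
        self.window = window
        self.threshold = threshold
        self.min_history = min_history
        self.history = {}

    def observe(self, key: str, value: float) -> bool:
        past = self.history.setdefault(key, deque(maxlen=self.window))
        anomalous = False
        if len(past) >= self.min_history:          # too little history: never flag
            mean = statistics.fmean(past)
            stdev = statistics.pstdev(past)
            if stdev > 0 and abs(value - mean) / stdev > self.threshold:
                anomalous = True
        past.append(value)                          # deque drops the oldest value at maxlen
        return anomalous
```

Anomalous values are still appended to the history, so a sustained shift in behavior becomes the new baseline. Excluding flagged values instead makes the detector stricter but slower to adapt.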
  21. Presentation and Business Connectivity (reference architecture diagram)
  22. Azure Data Lake (diagram): Analytics Service (U-SQL) and HDInsight (managed Hadoop clusters) running on YARN, over the Analytics Store, accessed through WebHDFS
  23. Cortana Intelligence Suite
     • Data sources: apps, sensors and devices, data
     • Information management: Event Hubs, Data Catalog, Data Factory
     • Big data stores: SQL Data Warehouse, Data Lake Store
     • Machine learning and analytics: HDInsight (Hadoop and Spark), Stream Analytics, Data Lake Analytics, Machine Learning
     • Intelligence: Cortana, Bot Framework, Cognitive Services
     • Dashboards & visualizations: Power BI
     • Action: people, automated systems, apps (web, mobile, bots)
  24. Reference Architecture with Azure Services (repeat of the slide 9 diagram)
  25. Money Laundering Prevention (Fraud Detection)
     • Process: placement, layering, integration
     • Controls: know your customer, transaction monitoring, pattern detection
     • Machine learning: decision tree, classification, cluster analysis
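Transaction monitoring for the placement stage often looks for structuring: several cash deposits kept just under a reporting threshold. Below is a rule-based sketch, assuming deposit records with hypothetical `customer`, `day`, and `amount` fields. The $10,000 default mirrors the US currency transaction report threshold, while `margin`, `window_days`, and `min_count` are illustrative values only. Rules like this complement, rather than replace, the machine learning techniques on the slide.

```python
from collections import defaultdict


def find_structuring(deposits, threshold=10_000.0, margin=0.1, window_days=7, min_count=3):
    """Flag customers with at least `min_count` cash deposits that each fall just
    under `threshold` (within `margin` of it) inside any span of fewer than
    `window_days` days."""
    lower = threshold * (1 - margin)
    near_threshold = defaultdict(list)
    for d in deposits:
        if lower <= d["amount"] < threshold:
            near_threshold[d["customer"]].append(d["day"])

    flagged = set()
    for customer, days in near_threshold.items():
        days.sort()
        start = 0
        # Sliding window over sorted days: shrink from the left until the
        # window spans fewer than window_days days.
        for end in range(len(days)):
            while days[end] - days[start] >= window_days:
                start += 1
            if end - start + 1 >= min_count:
                flagged.add(customer)
                break
    return flagged
```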
  26. Cloud Anti-Money Laundering (solution diagram)
     • Sources: SQL financial data, information services
     • Real-time path: HDInsight, Stream Analytics, real-time fraud detection feedback
     • Big data storage for multiple sources: HDInsight, Azure Data Lake
     • Relational stores: Azure Data Warehouse, SQL Azure
     • Machine learning: Azure Machine Learning
     • Visualization: Power BI fund monitoring dashboard
  27. Data Science Modeling
     • Similar to linear regression; weights independent variables; useful with categorical independent variables; offers coefficients to inform management decision-making; very useful for internal analytical teams interpreting data; useful for diagnosing gaps in data and customer outreach; helps drive understanding of demand drivers
     • Uses decision trees and votes. Forest: compares results between various outcomes, votes upon outcomes, and evaluates based on a series of logical questions (the “forest”). Jungle: useful when a forest produces too many logical branches
     • Produces a series of weighted edges and nodes; trained on input data; useful for complex tasks, such as speech recognition, when allowed to train in depth; very good with complex interactions; enables retailers to better identify behaviour patterns and certain shopping activities
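The first two model families on this slide can be illustrated in a few lines. The first is a weighted sum of independent variables passed through a logistic function; this is one reading of the "similar to linear regression" bullets, since the slide's model names did not survive. The second is a forest in which each tree votes and the majority wins. The weights, features, and hand-written one-question trees are hypothetical; a real decision forest learns its trees from training data.

```python
import math
from collections import Counter


def logistic_score(features: dict, weights: dict, bias: float) -> float:
    """Weighted sum of independent variables squashed to a 0..1 probability.
    Each weight is a coefficient that can be read and explained directly."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def stump(feature: str, cutoff: float):
    """A one-question decision tree: 'fraud' if the feature exceeds the cutoff."""
    return lambda x: "fraud" if x[feature] > cutoff else "ok"


def forest_vote(trees, x: dict) -> str:
    """Each tree answers independently; the majority label wins."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]
```

Using an odd number of trees with two labels avoids tied votes.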
  28. Reference Architecture & Azure Services Solution UX Provisioning API User Profile Information Stream Processors Analytics & Machine Learning Business Integration Connectors and Gateway(s) User Recent Activity Store Store Data Lake Gateway App Backend Personal mobile devices Business systems Presentation & Business ConnectivityData Processing, Analytics and ManagementDevice Connectivity Apps data from devices News and other alerts Gateway Data Path Optional solution component Main solution component Thin Client Trades and/or transactions
  29. nishant.thacker@microsoft.com
  30. © 2016 Microsoft Corporation. All rights reserved.

Editor's Notes

  1. Today’s financial services market is highly competitive, complex, and difficult. With today’s legislation in particular, it is increasingly important to reduce risk, increase compliance, detect fraud, retain customers, and know your customers better.
  2. Data systems have evolved over time: legacy systems have given way to the current systems of today. As systems become more timely, accurate, and thoughtful, greater returns on system investments are realized. Big Data and Advanced Analytics systems offer superior return on investment.
  3. A host of opportunities exist to use this technology suite in financial analytics. From left to right:
     Personalization of offers and tailored banking experiences lets banks engage with customers positively, based on their data. Next best action offers surface suspected needs and create opportunities for sales lift. Recommended interventions allow programmatic intervention based on customer churn. Lifestyle yield management lets bankers tailor plans and recommendations to customers’ life stage (retiree versus recent graduate). Many customers of financial institutions are affected by seasonality in their employment or lifestyle; by recognizing these needs and making offers accordingly, banks can increase their profitability.
     Fraud Detection allows banks to reduce risk and their cost of operations. Theft profiling and fraudulent transaction detection allow proactive intervention and prevention of fraud. Remote shutdown and site monitoring let banks react to fraud at ATMs and physical locations.
     Customer Churn Prevention increases revenue by extending customer lifetime. Churn scoring identifies at-risk customers and is the basis for all other churn applications. Personalized interventions are created per customer from churn scoring and personalization. Similarly, risky customers can be profiled to identify their characteristics and intervene. Call center monitoring applies perceptual intelligence to call center operations to identify churn behavior.
     Risk Reduction & Compliance is a key way for institutions to reduce operational costs. Preventing payment system errors and money laundering can substantially reduce exposure to fines and lost funds. Data entry is also a source of risk; identifying and preventing data entry errors saves time and money.
  4. With large amounts of data potentially available for analysis, managing data flows efficiently can be a challenge: • Volume: huge amounts of data to process • Variety: a mixture of structured and unstructured data • Velocity: new data generated extremely frequently • Veracity: data quality sufficient for the data to be trusted
  5. Between 2.17 and 3.61 trillion dollars are laundered annually. Detecting and preventing money laundering is fairly straightforward at a conceptual level, but implementing systems to detect and prevent it is incredibly complex. Money laundering has three primary processes: placement, layering, and integration. Placement is where funds from illegal activities are introduced into the financial system. Layering is the suite of transactions designed to clean the money. Integration is when funds are redistributed back through business transactions. Process controls can be implemented to stop this malicious process. As you can see, these controls fall primarily into the three categories indicated under Process, and can be supported by the machine learning algorithms listed on the right. (Click again.) Using Cortana Analytics in the process, and to drive the machine learning behind money laundering prevention, can help prevent money laundering.
  6. This diagram outlines both the hot path and the cold path. The hot path is fed directly from the information services layer as data is entered from field services. The cold path involves data storage and machine learning, which inform hot path development and predict directly into the visualization layer. It is important to stress both paths here, as both are required to yield superior results. Storm, Spark, and Azure Stream Analytics are the tools used to implement, on the hot path, the rules gleaned from ML. Azure Data Factory is the orchestration tool. These are referenced again in the Machine Learning layer, since further development there increases hot path value. HDInsight and Azure Data Lake are big data stores. Azure DW and SQL Azure are relational data stores for extracting further value from the big data stores. Azure Machine Learning provides a platform for data evaluation, data science, and prediction; this is where the real value of the solution is created.
  7. Data Factory: http://azure.microsoft.com/en-us/services/data-factory/ • Data Catalog: http://azure.microsoft.com/en-us/services/data-catalog/ • Event Hubs: http://azure.microsoft.com/en-us/services/event-hubs/ • Stream Analytics: http://azure.microsoft.com/en-us/services/stream-analytics/
  8. Data Lake: http://azure.microsoft.com/en-us/campaigns/data-lake/ • SQL Data Warehouse: http://azure.microsoft.com/en-us/services/sql-data-warehouse/ • HDInsight: http://azure.microsoft.com/en-us/services/hdinsight/ • Stream Analytics: http://azure.microsoft.com/en-us/services/stream-analytics/
  9. Machine Learning: https://studio.azureml.net/ • HDInsight: http://azure.microsoft.com/en-us/services/hdinsight/ • Stream Analytics: http://azure.microsoft.com/en-us/services/stream-analytics/
  10. Power BI: https://powerbi.microsoft.com/ • Azure Web Sites: http://azure.microsoft.com/en-us/services/app-service/web/