How Graphs Continue to Revolutionize The Prevention of Financial Crime & Fraud in Real-Time


Financial crime prevention affects everyone in one way or another. From the Deutsche Banks of the world to small and medium online merchants, regulations for anti-money laundering (AML), know your customer (KYC), and customer due diligence (CDD) apply.

Failing to comply with such regulations can bring substantial fines. Even more importantly, it can hurt the bottom line and reputation of a business, with far-reaching side effects. Complying with such regulations and actively cracking down on financial crime, however, is not easy.

Cross-referencing interconnected data across various datasets, applying detection rules, and discovering patterns in that data is complicated. Doing this efficiently takes expertise, effort, and the right technology.

A natural and efficient way of looking for patterns and applying rules in troves of interconnected data is to model and view that data as a graph. Once data is modeled as a graph, graph-based algorithms such as PageRank or centrality measures make it possible to traverse paths, discover connections, and gain insights.
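To make the idea of running a graph algorithm over modeled data concrete, here is a minimal Python sketch of PageRank by repeated score passing. It uses the same unnormalized update as the GSQL query on slide 35 of the transcript (every score starts at 1 and becomes 1 - d + d * received). The function name `pagerank`, the default parameter values, and the account names are illustrative assumptions, not part of the presentation.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, max_change=1e-6, max_iterations=100):
    """Unnormalized PageRank over a list of (source, target) edges."""
    out_links = defaultdict(list)
    nodes = set()
    for source, target in edges:
        out_links[source].append(target)
        nodes.update((source, target))
    if not nodes:
        return {}

    score = {node: 1.0 for node in nodes}  # every node starts with score 1
    for _ in range(max_iterations):
        received = defaultdict(float)
        # Each node splits its current score evenly among its out-neighbors.
        for source, targets in out_links.items():
            share = score[source] / len(targets)
            for target in targets:
                received[target] += share
        new_score = {n: (1 - damping) + damping * received[n] for n in nodes}
        largest_change = max(abs(new_score[n] - score[n]) for n in nodes)
        score = new_score
        if largest_change <= max_change:  # stop once scores have settled
            break
    return score


# Hypothetical transfers: three accounts all pay into "hub".
transfers = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("hub", "a")]
scores = pagerank(transfers)
```

In this toy graph the account that receives money from every other account ends up with the highest score, which is the kind of "influential hub" signal the description refers to.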

Graphs and graph databases are the fastest-growing area of data management technology for a number of reasons. One reason is that they are a perfect match for use cases involving interconnected data.

Queries that would be very complicated to express and very slow to execute using relational databases or other NoSQL database technology are feasible using graph databases. With the rise in complexity of modern financial markets, investigating financial crimes requires going 4 to 11 levels deep into the account-payment graph, which calls for a different solution than either relational or NoSQL databases.
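A "levels deep" query is, at its core, a bounded-depth traversal. The sketch below shows that idea in plain Python with a breadth-first search that stops after a fixed number of hops. The helper `within_hops` and the tiny account-payment chain are hypothetical and only illustrate the traversal, not how any particular graph database executes it.

```python
from collections import deque


def within_hops(adjacency, start, max_hops):
    """Return {node: hop_count} for nodes reachable from start in at most max_hops steps."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in hops:  # first visit is the shortest hop count
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops


# Hypothetical chain: accounts and payments alternate along the path.
payments = {
    "acct1": ["pay1"],
    "pay1": ["acct2"],
    "acct2": ["pay2"],
    "pay2": ["acct3"],
    "acct3": ["pay3"],
    "pay3": ["acct4"],
}
reachable = within_hops(payments, "acct1", 4)
```

With a limit of 4 hops the search reaches `acct3` but not `pay3` or `acct4`. In a relational store, each additional hop would typically mean another self-join, which is why deep traversals are the case the description singles out.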

How are organizations such as Alibaba, OpenCorporates, and Visa using graph database technology to not just stay on top of regulation, but be one step ahead in the race against financial crime?

Is it possible to do this in real time?

What do graph query languages have to do with this?

Published in: Data & Analytics
  1. 1. How Graphs Continue to Revolutionize The Prevention of Financial Crime & Fraud in Real-Time
  2. 2. © 2019 TigerGraph. All Rights Reserved. Agenda
       1 Introduction
       2 What is a Graph Database
       3 Why Graph DB & analytics for financial crimes
       4 Customer use cases and case studies
       5 Going real-time: need for speed in investigating financial crimes
       6 Querying graphs - bridging theory & practice
       7 Key requirements of graph query languages
       8 New Features - advanced pattern matching with accumulators for financial crimes detection
  3. 3. Moderator & Client Speaker
       George Anadiotis: Connected Data London Organizer, Linked Data Orchestration Founder
       ● Into graphs since 2005
       ● Wearer of many hats: consultant, analyst, researcher, journalist, event organizer, curator
       ● ZDNet, Linked Data Orchestration, Connected Data London
       Rebecca Lee: Chief Data Officer, OpenCorporates
       ● OpenCorporates is the largest open database of companies whose primary goal is to make information on companies more usable and widely available for the public benefit, particularly to tackle their use for criminal or anti-social purposes.
       ● Prior to this, Rebecca led PwC's Investigative Analytics team: conducting forensic investigations into alleged criminality and wrongdoing, and helping clients to build their fraud and risk capabilities.
  4. 4. TigerGraph Panelists
       Dr. Victor Lee: Director of Product Management
       ● BS in Electrical Engineering and Computer Science from UC Berkeley, MS in Electrical Engineering from Stanford University
       ● PhD in Computer Science from Kent State University focused on graph data mining
       ● 15+ years in tech industry
       Professor Alin Deutsch: Chief Scientist
       ● University of California at San Diego (UCSD) faculty since 2002 with research focused around data management challenges for large-scale database-powered applications
       ● PhD in Computer Science from the University of Pennsylvania, specialization in graph query language design and optimization
       ● Several awards including the highly prestigious PODS Alberto O. Mendelzon Test-of-Time Award at SIGMOD 2018
  5. 5. We Use Graphs Every Day. GRAPHS = RELATIONSHIPS
       ● Social Media (Social Graph): graphs are used to analyze relationships for social media
       ● Website Search (Knowledge Graph): PageRank is a graph algorithm used by Google Search to rank web pages in their search engine results
       ● Product Recommendations (Customer 360 Graph): graphs are used to understand the behavior and preferences of online buyers
  6. 6. The Evolution of Databases
       ● Relational Database: rigid schema; high performance for transactions; poor performance for deep analytics. Complex, slow table joins required.
       ● Key-Value Database: highly fluid schema/no schema; high performance for simple transactions; poor performance for deep analytics. Multiple scans of massive table required.
       ● Graph Database: flexible schema; high performance for complex transactions; high performance for deep analytics. Pre-connected business entities - no joins needed.
       [Diagram: Customer, Product, Supplier, Location, Order and Payment entities shown as relational tables, as key-value pairs, and as a graph with edges PURCHASED, RESIDES, SHIPS TO, SHIPS FROM, ACCEPTED, MAKES and NOTIFIES; Location 1 = Delivery Location, Location 2 = Warehouse]
  7. 7. TigerGraph - Corporate Background. Founded by Dr. Yu Xu in 2012 in Redwood City, California. Mission: Unleash the power of interconnected data for deeper insights and better outcomes. Industry’s first and only Native Massively Parallel Processing (MPP) Graph Technology. The only scalable graph database for the enterprise, used by organizations including Visa, Intuit, Zillow and
  8. 8. TigerGraph Customers. TigerGraph customers are finding deeper insights for competitive advantage. Our technology is powering the top e-payment, credit card, mobile eCommerce, fintech, large pharma, healthcare and power-grid companies as well as government organizations.
  9. 9. 7 Key Data Science Capabilities Powered By a Native Parallel Graph
       ● Deep Link Analysis: for a set of entities (e.g. customers, accounts, credit cards, banks), show all links or connections
       ● Multi-dimensional Entity & Pattern Matching: given a pattern (e.g. social and financial interactions), find matching occurrences within a greater graph
       ● Hub & Community Detection: find strongly-connected communities and/or influential hubs (accounts, etc.) within a graph or sub-community
       ● Geospatial Graph Analysis: analyze changes in entities & relationships with location data
       ● Temporal (Time-Series) Graph Analysis: analyze changes in entities & relationships over time
       ● Similarity and Frequent Pattern Detection: given a graph (e.g. transactions), find similarly situated entities and find frequent patterns
       ● Machine Learning Feature Generation & Explainable AI: extract graph-based features to feed as training data for machine learning; power Explainable AI
  10. 10. Using Subgraph or Relationship Discovery Combined with Graph Computation to find Diamonds of Money Laundering. Financial institutions collaborate to build transactional + knowledge graphs to identify money laundering rings and layering.
        ● Layering: split "dirty money" into smaller amounts, transfer it from account to account, eventually merge.
        ● Money Laundering Ring: money is transferred in a circle.
  11. 11. The Challenge: In our global corporate world, clear, trusted, open, legal entity data is an essential requirement for good business… and a fair society. Today it’s difficult. Tomorrow it will be even harder.
  12. 12. Our solution: ‘Company registry for the world’. The largest open database of companies in the world is shining a light on the corporate world.
        ● 170+ million companies
        ● 220+ million officerships
        ● “White-box” legal entities, automated from official public sources, including provenance and line of sight to source
        ● Free via web, at scale through API or bulk data
  13. 13. Building a cross jurisdictional graph of corporate structures
        Sources:
        ● Company Data
        ● Subsidiary relationships: SEC (10k filings), NIC
        ● Control relationships: UK PSC register
        ● Shareholders: New Zealand, Denmark, Alaska, Hong Kong
        [Diagram: OpenCorporates MySQL; Company -(RELATIONSHIP)-> Company]
        Directed edge attributes include:
        ● Type: Shareholder of, Share issuer to, Has subsidiary, Is subsidiary, Controls, Is controlled by
        ● Confidence
        ● Percentage / No Shares
        ● Earliest / Latest Date*
        ● Provenance
        Node attributes include:
        ● Primary ID
        ● Company Name
        ● Company Number
        ● Jurisdiction
        ● Inactive flag
        ● Provenance
  14. 14. © 2019 TigerGraph. All Rights Reserved
  15. 15. Graph challenges: uncover hidden connections within data in OC’s ever-expanding datasets with real-time response rates
        ● Degrees of separation
        ● Up the chain
        ● Ultimate parent (and down)
        ● Siblings
        ● Temporal graph search
        ● Traverse using specific relationship properties - Active vs Dead relationships
        ● Expand to include different node types (company officers, addresses)
  16. 16. Time-Based Graph Search to Find Potential Fraud, Money Laundering or Corruption - Find Subsidiaries That Pop Up For a Brief Time & Vanish. More details at (World’s largest Open Database Migrates to TigerGraph) & (Fireflies and algorithms - the coming explosion of companies – OpenCorporates blog)
  17. 17. Reducing False Positives in AML with Deep Link Analytics. Download the solution brief at
        [Diagram: a New Alert (High or Low Risk?) linked to SAR 1 and SAR 2]
        ● Traditional limit of real-time analysis: 2 hops
        ● Historical records: offline analysis
  18. 18. Reducing False Positives in AML with Deep Link Analytics. Download the solution brief at
        [Diagram: a New Alert (High or Low Risk?) linked to SAR 1, SAR 2 and two closed alerts]
        ● Deep-Link Analytics reveals important distinctions
  19. 19. Alipay Detects Fraud and Anti-Money Laundering (AML) Violations in Real-time
        Business Challenge: over 520 million users & peak volume of 256,000 payments per second. Detecting fraud & AML violations is like finding needles in an ever increasing giant haystack.
        Solution:
        • Attach each payment transaction in real-time to an operational graph with device, location, credit card, account & customer information
        • Recompute and assess fraud and money laundering risk, flagging suspicious transactions for investigation
        • Support fraud and AML investigation with visualization & up to 11 hop queries in real-time
        Business Benefits: scale up for over 2 billion transactions per day for fraud & AML detection and increase productivity to investigate and resolve fraud and AML alerts.
        Visit the solution page -
  20. 20. Graph Query Languages
  21. 21. The Past: Querying Graphs
  22. 22. The Age of the Graph Is Upon Us (Again)
        • Early-mid-90s: semi- or unstructured data research was all the rage
          • data logically viewed as graph, initially motivated by modeling WWW (page=vertex, link=edge)
          • query languages expressing constrained reachability in graph
        • Late 90s-late 2000s: special case XML (graph restricted to tree shape)
          • Mature: W3C standard ecosystem for modeling and querying (XQuery, XPath, XLink, XSLT, XML Schema, …)
        • Since mid 2000s: JSON and friends (also graphs restricted to tree shape)
          • MongoDB, Couchbase, SPARQL, GraphQL, AsterixDB, …
        • ~2010 to present: back to unrestricted graphs
          • Initially motivated by analytic tasks in social networks
          • Now universal use (most interesting data is linked, after all)
  23. 23. The Graph Data Model
        • Nodes model real-world entities
        • Edges are binary; they model relationships
          • may be directed or undirected (asymmetric, resp. symmetric relationships)
        • Nodes and edges may carry labels
        • Nodes and edges annotated with data
          • both have sets of attributes (key-value pairs)
  24. 24. Example Graph
        Vertex types:
        • Product (name, category, price)
        • Customer (ssn, name, address)
        Edge types:
        • Bought (discount, quantity)
        • Customer c bought 100 units of product p at discount 5%: modeled by edge c --(Bought {discount=5%, quantity=100})--> p
  25. 25. Key Language Ingredients from the Past
        • Pioneered by academic work on relational query extensions for graphs (since '87)
        • Path expressions (PEs) for navigation
        • Variables for manipulating data found during navigation
        • Stitching multiple PEs into complex navigation patterns: conjunctive path queries
        • Constructors for new nodes and edges
  26. 26. The Present: Graph Query Language Requirements
  27. 27. Current Representative Graph QLs in Order of Appearance
        • SPARQL
          • mature, W3C standard recommendation, but not aimed at analytics of arbitrary graphs: RDF, ontologies, semantic web
        • Cypher (Neo4j)
          • essentially 1990s StruQL with bells and whistles, inherits CRPQ syntactic style
        • Gremlin (Apache project and commercial products)
          • dataflow programming model: graph annotated with tokens (“traversers”) that flow through it according to user program
        • GSQL (TigerGraph)
          • inspired by SQL, with support for massively parallel graph analytics
  28. 28. Key Language Ingredients Needed in Modern Applications (languages offering each ingredient follow the colon)
        • All primitives inherited from past (path expressions, conjunctive patterns, variables, node/edge construction): SPARQL, Cypher, Gremlin, GSQL
        • Support for large-scale graph analytics
          • Customizable path traversal semantics: Gremlin, GSQL
          • Aggregation of data encountered during traversal: SPARQL (partial), Cypher, Gremlin, GSQL
          • Control flow for class of iterative algorithms that converge in multiple steps (e.g. PageRank-class, recommender systems, shortest paths, etc.): Gremlin, GSQL
          • Intermediate results assigned to nodes/edges support parallel computation (programming mindset + execution): GSQL
  29. 29. The Future: GSQL
  30. 30. Aggregation in Current Graph QLs
        • Cypher’s RETURN clause uses syntax similar to aggregation-extended CRPQs
        • Gremlin and SPARQL use an SQL-style GROUP BY clause
        • GSQL uses aggregating containers called “accumulators”
          • soon to add the above modes as syntactic sugar, but will preserve accumulators, which remain strictly more versatile
  31. 31. GSQL Accumulators
        • GSQL traversals collect and aggregate data by writing it into accumulators
        • Accumulators are containers (data types) that
          • hold a data value
          • accept inputs
          • aggregate inputs into the data value using a binary operation
        • May be built-in (sum, max, min, etc.) or user-defined
        • May be
          • global (a single container)
          • vertex-attached (one container per vertex)
  32. 32. Vertex-Attached Accumulator Example: Revenue per Customer and per Product
          SumAccum<float> @cSales, @pSales;
          SELECT c
          FROM Customer:c -(-Bought->:b)- Product:p
          ACCUM float thisSaleRevenue = b.quantity * (1 - b.discount) * p.price,
                c.@cSales += thisSaleRevenue,
                p.@pSales += thisSaleRevenue;
        • Maximize opportunities for parallel evaluation
        • Vertex-attached accums: one instance per node
        • Groups are distributed, each node accumulates its own group. Can be parallelized!
        • This sale’s revenue contributes to two aggregations, each by distinct grouping criteria
  33. 33. Recommended Toys Ranked by Log-Cosine Similarity
          SumAccum<float> @rank, @lc;
          SumAccum<int> @inCommon;
          I = {Customer.1};
          SELECT p INTO ToysILike, o INTO OthersWhoLikeThem
          FROM I:c -(-Likes->)- Product:p -(<-Likes-)- Customer:o
          WHERE p.category == "Toys" AND o != c
          ACCUM o.@inCommon += 1
          POST-ACCUM o.@lc = log(1 + o.@inCommon);
          SELECT t INTO ToysTheyLike
          FROM OthersWhoLikeThem:o -(-Likes->)- Product:t
          WHERE t.category == "Toys"
          ACCUM t.@rank += o.@lc;
          RecommendedToys = ToysTheyLike MINUS ToysILike;
  34. 34. Essential: Control-Flow, Particularly Loops
        • Loops (until condition is satisfied)
          • Necessary to program iterative algorithms, e.g. PageRank, recommender systems, shortest-path, etc.
          • They synergize with accumulators. This GSQL-unique combination concisely expresses sophisticated graph analytics
          • Can be used to program unbounded-length path traversal under various semantics
  35. 35. PageRank in GSQL
          CREATE QUERY pageRank (float maxChange, int maxIteration, float dampingFactor) {
            MaxAccum<float> @@maxDifference = 9999;  // max score change in an iteration
            SumAccum<float> @received_score = 0;     // sum of scores received from neighbors
            SumAccum<float> @score = 1;              // initial score for every vertex is 1
            AllV = {Page.*};                         // start with all vertices of type Page
            WHILE @@maxDifference > maxChange LIMIT maxIteration DO
              @@maxDifference = 0;
              S = SELECT s
                  FROM AllV:s -(Linkto)-> :t
                  ACCUM t.@received_score += s.@score / s.outdegree()
                  POST-ACCUM s.@score = 1 - dampingFactor + dampingFactor * s.@received_score,
                             s.@received_score = 0,
                             @@maxDifference += abs(s.@score - s.@score');
            END;
          }
  36. 36. Additional Resources
        ● Download TigerGraph’s Developer or Enterprise Free Trial
        ● Get started with TigerGraph’s Developer Portal
        ● Connect with fellow GSQL users in the developer forum
        ● Advance your graph knowledge with the eBook - “Native Parallel Graphs”
        @TigerGraphDB /tigergraph /TigerGraphDB /company/TigerGraph
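For readers without a GSQL environment, the accumulator idea from slides 31 and 32 can be approximated in plain Python. This is only a sketch of the concept, not TigerGraph's implementation: the `SumAccum` class, the sample purchases, and the revenue formula (quantity times discounted price, with the discount taken as a fraction) are all assumptions made for illustration. One accumulator is attached per customer and one per product, and each purchase edge feeds both.

```python
from collections import defaultdict


class SumAccum:
    """Container holding a running total; `+=` aggregates one more input."""

    def __init__(self, initial=0.0):
        self.value = initial

    def __iadd__(self, amount):
        self.value += amount
        return self


# "Vertex-attached" accumulators: one instance per customer and per product.
customer_sales = defaultdict(SumAccum)
product_sales = defaultdict(SumAccum)

# Hypothetical data: product prices and Bought edges
# as (customer, product, quantity, discount as a fraction).
prices = {"robot": 20.0, "kite": 10.0}
bought = [
    ("alice", "robot", 2, 0.5),
    ("alice", "kite", 1, 0.0),
    ("bob", "robot", 1, 0.0),
]

for customer, product, quantity, discount in bought:
    revenue = quantity * (1 - discount) * prices[product]  # assumed formula
    customer_sales[customer] += revenue  # grouped by customer
    product_sales[product] += revenue    # grouped by product
```

Each edge contributes to two groupings in a single pass, and because every total lives on its own vertex, the per-edge updates could in principle be applied in any order, which is what makes this style amenable to parallel evaluation.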