SlideShare a Scribd company logo
1 of 49
1
Using Graph Analysis and Fraud Detection in the
Fintech Industry
Stanka Dalekova Yavor I. Ivanov Dobroslav Hristov
@PlugIntoPaysafe
3
Who are Paysafe?
• Paysafe is leading specialized payments player in the world. We do the hard stuff better than our competition
• Global transactional volume of $85bn in 2018.
• Real-time Payments
• Two e-wallet services
Neteller
Skrill
4
• ~ 500 000 payments per day
• Fines on any fraud payment
• Balance between fraud protection and negative customer experience
• Fraudsters bury their patterns in lots of data.
Use Case – Fast Fraud Analytics
5
To PROCESS or NOT to PROCESS?
or or
Online Fraud Screening
6
• Fraud Benchmark Report by Cybersouce from 2016
– 83 % of North Americans review 29% of the orders manually,
– Fraud analysts can give insights about fraud patterns and customer behavior
– After manual review, rule engines can be updated
– Manual Review are costly and time-consuming
– Decreasing customer experience by delaying the payment
• Fraud prevention industry benchmark by Kount.com from 2018
– 93 percent of merchants perform manual reviews
– nearly 30 percent have a manual review rate between 1 and 5 percent
– 16 percent review between 5 and 10 percent
– 20 percent review more than one-in-ten of their orders.
Manual Review Metrics
9
Data is our golden eggs, how to make it meaningful?
10
A: The payments in Paysafe are actually a
GRAPH!
Breakthrough 
Q: A better way to analyze connected data?
Graph Databases are invented to solve the performance problems of connected data.
12
• Study on mathematical structures used to model pairwise relations between objects
• 250+ years of history - the paper written by Leonhard Euler on the Seven Bridges of
Königsberg and published in 1736 is regarded as the first paper in the history of graph theory
• Graphs are used to model many types of relations and process
• Graphs solve many real-life problems - in computer science, social sciences, biology, etc.
• Hundreds of graph algorithms – strongly connected components, paths algorithms, nearest
neighbor, page rank, edge weight algorithms, etc.
Graph Theory
A tree is a minimally
connected graph
having only one
path between any two
vertices.
And these are all graphs
What exactly is a graph?
These are all plots
14
Property Graph
15
Graph databases store data in terms of
• Entities (nodes or vertices)
• Relationships between entities (edges or arrows)
A better way to explore connected data.
Graph Database
16
Graphs are eating the world
17
Graphs are eating the world
Image courtesy DB-Engines
19
• Pioneer graph databases are several years old
– Neo4j [Cypher]
– IBM Graph [SPARQL and Gremlin]
– JanusGraph [Gremlin] (renamed from TitanDB)
• Gaining more traction recently – new players
– Oracle Spatial And Graph by Oracle (Property Graph 3+ years) [PGQL]
– AWS Neptune by Amazon announced on 30 May 2018 [SPARQL and GREMLIN]
– Azure Cosmos DB by Microsoft Graph API announced on 7 Feb 2018
– Apache Giraph 4+ years
– GraphX by Spark 3+ years
– RedisGraph
Graph databases go mainstream
22
SQL vs PGQL
SELECT T2.T AS "s1.fname$T",T2.V AS "s1.fname$V",T2.VN AS "s1.fname$VN",T2.VT AS "s1.fname$VT",
T3.T AS "s2.fname$T",T3.V AS "s2.fname$V",T3.VN AS "s2.fname$VN",T3.VT AS "s2.fname$VT"
FROM (/*Path[*/SELECT DISTINCT SVID, DVID FROM ( SELECT VID AS SVID, VID AS DVID FROM "GRAPH1VT$" UNION
ALL SELECT SVID,DVID
FROM (WITH RW (ROOT, SVID, DVID, LVL) AS ( SELECT ROOT, SVID, DVID, LVL FROM (SELECT SVID ROOT,
SVID, DVID, 1 LVL
FROM (SELECT T0.SVID AS SVID, T0.DVID AS DVID FROM "GRAPH1GT$" T0 WHERE (T0.EL = n'knows'))
) UNION ALL SELECT DISTINCT RW.ROOT, R.SVID, R.DVID, RW.LVL+1 FROM (SELECT T1.SVID AS SVID,
T1.DVID AS DVID FROM "GRAPH1GT$" T1 WHERE (T1.EL = n'knows')) R, RW WHERE RW.DVID = R.SVID )
CYCLE SVID SET cycle_col TO 1 DEFAULT 0 SELECT ROOT SVID, DVID FROM RW ))/*]Path*/) T6,
(/*Path[*/SELECT DISTINCT SVID, DVID FROM ( SELECT VID AS SVID, VID AS DVID FROM "GRAPH1VT$" UNION
ALL SELECT SVID,DVID
FROM (WITH RW (ROOT, SVID, DVID, LVL) AS ( SELECT ROOT, SVID, DVID, LVL FROM (SELECT SVID ROOT,
SVID, DVID, 1 LVL
FROM (SELECT T4.SVID AS SVID, T4.DVID AS DVID FROM "GRAPH1GT$" T4 WHERE (T4.EL = n'knows'))
) UNION ALL SELECT DISTINCT RW.ROOT, R.SVID, R.DVID, RW.LVL+1 FROM (SELECT T5.SVID AS SVID,
T5.DVID AS DVID FROM "GRAPH1GT$" T5 WHERE (T5.EL = n'knows')) R, RW WHERE RW.DVID = R.SVID )
CYCLE SVID SET cycle_col TO 1 DEFAULT 0 SELECT ROOT SVID, DVID FROM RW ))/*]Path*/) T7,
"GRAPH1VT$" T2, "GRAPH1VT$" T3
WHERE T2.K=n'fname' AND T3.K=n'fname' AND T6.SVID=T2.VID AND T6.DVID=T7.DVID AND T7.SVID=T3.VID
ORDER BY T6.SVID ASC NULLS LAST, T7.SVID ASC NULLS LAST
PATH knows_path := () -[:knows]-> ()
SELECT s1.fname, s2.fname
WHERE (s1) -/:knows_path*/-> (o) <-/:knows_path*/-(s2) ORDER BY s1,s2
• PGQL
• SQL
Find the pairs of people who are
connected to a common person
through the “knows” relation
Loading data into the graph
Parallel Graph AnalytiX(PGX) is a fast, parallel, in-memory graph analytic framework that allows users to load up
their graph data, run analytic algorithms on them, and to browse or store the result.
PGX Architecture in Paysafe
Graph storage management
PGX Graph Analytics
Java,Groovy,Python,…
REST/WebService/Notebooks
Java APIs
Java APIs/JDBC/SQL/PLSQL
Parallel In-Memory Graph
Analytics/Graph Query (PGX)
Oracle RDBMS
Parallel In-Memory Graph
Analytics/Graph Query (PGX)
PGX Graph Analytics
Vert.x Application Server
REST
Web User Interface Load Balancer
Vert.x Application Server
Graph storage management
• PGX loads the whole graph and the properties needed for the analysis to be loaded into main memory
• Compressed sparse row (CSR) format, a data structure which has minimal memory footprint while providing
very fast read access.
• More info on graph memory consumption can be found here
– On heap memory only string properties
– Off heap memory everything else – graph topology(edges and vertices) and properties
• Asynchronous Java API
Hardware requirements & sizing
27
Graph Database Performance
Relational Table vs Graph
Q: Is user "9" connected to user “1“?
If you double the number of rows in the table,
you've doubled the amount of data to search,
and thus doubled the amount of time it takes
to find what you are looking for
You always walk the graph at most once.
Conversely, a graph database looks only at records
that are directly connected to other records. If it is
given a limit on how many "hops" it is allowed to
make, it can ignore everything more than that
number of hops away.
28
• SQL created by Paysafe (32 lines)
• 1 day: 50 min 20 sec
• 1 week: (cancelled after 4 hours)
• 1 month: (did not even try)
• SQL optimized by Oracle (62 lines)
• 1 day: 20.3 sec
• 1 week: 8 min 33 sec
• 1 month: (cancelled after 6 hours)
• 4 PGQL queries (7 lines each)
• 1 day: 0.547 sec
• 1 week: 0.588 sec
• 1 month: 0.597 sec
Real-world example: is there a fraudster up to 4 hops away,
on very active customer (worst case)?
SELECT …
MATCH (v0) -[e0_1:"pays to"]-> (v1)
WHERE (v1.isMerchant = false)
AND (v0.customerId = …)
AND (e0_1.requestTime >= …)
AND (e0_1.requestTime <= …)
LIMIT 20000
Payments Flow Visualization
• Cytoscape is an open source software platform for visualizing complex networks and integrating these with
any type of attribute data.
• There is a JS library supporting many different layouts
• Make asynchronous call for any hop, display up to 10 hops in seconds
Typical customer behavior
Basic graph
Detected network (one send to many)
Bigger network (many send to one)
Placement, Layering, Integration
Multi-level network
More than
one level
Multi-level network
A bit more
complex
Network of networks
• Page rank
• Community detection
• Strongly connected components
• More built-in algorithms available
• Custom-defined algorithms with Green Marl
Graph Analytics
44
Subset of the graph where every vertex
is reachable from every other vertex
following the directions of the edges
Strongly Connected Components
45
Subset of the graph where every vertex
is reachable from every other vertex
(directions of the edges are ignored)
Weakly Connected Components
Finding sets of nodes such that each set of nodes is densely connected internally.
Community structures are quite common in real networks. Social networks
include community groups (the origin of the term, in fact) based on common
location, interests, occupation, etc.
Community Detection
PageRank (PR) is an algorithm used by Google Search to rank websites in
their search engine results. The PageRank algorithm outputs a probability
distribution used to represent the likelihood that a person randomly clicking
on links will arrive at any particular page
Page Rank
Strongly
Connected
Components
SCC with 50
members
(always
19 EUR out)
Community
detection
The right tool for the right job
• Payments are in the graph
• Deposits and Withdrawals are in the RDBMS
Money
entering the
system
Taking the
money out
Deposits and
Withdrawals
56
• Finding all communities for a given day up to 3 minutes
Community detection for 72-hour period: 7 sec
SCC for the same period: 6 sec
Top 10 Customers Page Rank: ~0.8 sec
• Memory statistics
Total edges count: 70 M with 350M properties
Edges size in DB: 72 GB (only the table)
Total vertices count: 4M with 12M properties
Vertices Size in DB: 2 GB (only the table)
Graph in size in PGX memory - 10GB
• Visualizing customer graph, up to seconds, but still depending on the relations
• Performing PGQL query in milliseconds – can be used in real-time
Use Case Performance Results
63
• Graph queries can be used as normal SQL queries to flag a risk transaction while the payment is being
processed
– If customer is linked with fraudster in range 2 hops, additional verification can be requested
• Graphs enhance AI by providing context by enabling connected features to ML. Relations or connected
features tend to be highly predictive.
– Is there a fraudster in range of 3hops, 4hops, etc. can be a highly predictive ML feature
– Feed page rank in a machine learning model
• Detect fastest growing networks and examine community evolution
New World of Opportunities
64
• Generate Proactive Report for the Fastest Growing Networks in terms of money flow(edge property), number of
payments(edge count) and number of customers(vertices) on a time series data from the graph
• Influencer found by Page Rank calculation
• Community Activity
Fastest Growing Networks
Example for a fastest growing community by
money flow and rolling period of five days
and daily time series data.
Amount
Graphs are REALLY
powerful
Image courtesy www.networkworld.com
• In a world of real-time payments, money processing becomes faster and more
automated. Fraud checks upon payment should happen fast and the time to identify
fraud patterns or networks is really narrow.
• Link analysis can enhance fraud detection by running queries using graph database
during key stages in the application lifecycle
– Upon money move
– Account creation
– During investigation
– When some thresholds are hit
• Traditional technologies are not designed to detect fraud in real-time. Graph databases
enable fast and effective real-time link queries.
Summary
• Graphs are useful for real-time decision making on connected data
• Powerful data analytics
• Go step further with machine learning models
• You can do so much AI with Graph
Takeaways
Link Predictions
A
B
C
Enemies
Friends
Question: What is the connection between B and C?
Link Predictions
A
B
C
Enemies
Friends
Question: What is the connection between B and C?
Resources
O’Reilly’s Graph Databases: New Opportunities for Connected Data
O’Reilly’s Graph Algorithms Book
Graph Databases: The Next Generation of Fraud Detection Technology
Cypher – graph query language
Oracle Property Graph Query Language
Link Prediction
Graph Theory with Applications
Very interesting talk:
How Graph Technology Is Changing Artificial Intelligence
and Machine Learning

More Related Content

What's hot

MongoDB Tick Data Presentation
MongoDB Tick Data PresentationMongoDB Tick Data Presentation
MongoDB Tick Data PresentationMongoDB
 
GraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communitiesGraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communitiesPaco Nathan
 
MongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: ShardingMongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: ShardingMongoDB
 
ClickHouse Analytical DBMS: Introduction and Case Studies, by Alexander Zaitsev
ClickHouse Analytical DBMS: Introduction and Case Studies, by Alexander ZaitsevClickHouse Analytical DBMS: Introduction and Case Studies, by Alexander Zaitsev
ClickHouse Analytical DBMS: Introduction and Case Studies, by Alexander ZaitsevAltinity Ltd
 
A head start on cloud native event driven applications - bigdatadays
A head start on cloud native event driven applications - bigdatadaysA head start on cloud native event driven applications - bigdatadays
A head start on cloud native event driven applications - bigdatadaysSriskandarajah Suhothayan
 
MongoDB for Time Series Data
MongoDB for Time Series DataMongoDB for Time Series Data
MongoDB for Time Series DataMongoDB
 
Андрей Козлов (Altoros): Оптимизация производительности Cassandra
Андрей Козлов (Altoros): Оптимизация производительности CassandraАндрей Козлов (Altoros): Оптимизация производительности Cassandra
Андрей Козлов (Altoros): Оптимизация производительности CassandraOlga Lavrentieva
 
Data Analytics with Druid
Data Analytics with DruidData Analytics with Druid
Data Analytics with DruidYousun Jeong
 
GraphFrames: Graph Queries in Spark SQL by Ankur Dave
GraphFrames: Graph Queries in Spark SQL by Ankur DaveGraphFrames: Graph Queries in Spark SQL by Ankur Dave
GraphFrames: Graph Queries in Spark SQL by Ankur DaveSpark Summit
 
Using MongoDB As a Tick Database
Using MongoDB As a Tick DatabaseUsing MongoDB As a Tick Database
Using MongoDB As a Tick DatabaseMongoDB
 
Locality Sensitive Hashing By Spark
Locality Sensitive Hashing By SparkLocality Sensitive Hashing By Spark
Locality Sensitive Hashing By SparkSpark Summit
 
Time Series With OrientDB - Fosdem 2015
Time Series With OrientDB - Fosdem 2015Time Series With OrientDB - Fosdem 2015
Time Series With OrientDB - Fosdem 2015wolf4ood
 
Testing data streaming applications
Testing data streaming applicationsTesting data streaming applications
Testing data streaming applicationsLars Albertsson
 
Graph Analytics in Spark
Graph Analytics in SparkGraph Analytics in Spark
Graph Analytics in SparkPaco Nathan
 
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)Ankur Dave
 
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...Martin Zapletal
 
Real-time Analytics with Apache Flink and Druid
Real-time Analytics with Apache Flink and DruidReal-time Analytics with Apache Flink and Druid
Real-time Analytics with Apache Flink and DruidJan Graßegger
 

What's hot (20)

MongoDB Tick Data Presentation
MongoDB Tick Data PresentationMongoDB Tick Data Presentation
MongoDB Tick Data Presentation
 
GraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communitiesGraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communities
 
MongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: ShardingMongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: Sharding
 
ClickHouse Analytical DBMS: Introduction and Case Studies, by Alexander Zaitsev
ClickHouse Analytical DBMS: Introduction and Case Studies, by Alexander ZaitsevClickHouse Analytical DBMS: Introduction and Case Studies, by Alexander Zaitsev
ClickHouse Analytical DBMS: Introduction and Case Studies, by Alexander Zaitsev
 
A head start on cloud native event driven applications - bigdatadays
A head start on cloud native event driven applications - bigdatadaysA head start on cloud native event driven applications - bigdatadays
A head start on cloud native event driven applications - bigdatadays
 
MongoDB for Time Series Data
MongoDB for Time Series DataMongoDB for Time Series Data
MongoDB for Time Series Data
 
Андрей Козлов (Altoros): Оптимизация производительности Cassandra
Андрей Козлов (Altoros): Оптимизация производительности CassandraАндрей Козлов (Altoros): Оптимизация производительности Cassandra
Андрей Козлов (Altoros): Оптимизация производительности Cassandra
 
Data Analytics with Druid
Data Analytics with DruidData Analytics with Druid
Data Analytics with Druid
 
GraphFrames: Graph Queries in Spark SQL by Ankur Dave
GraphFrames: Graph Queries in Spark SQL by Ankur DaveGraphFrames: Graph Queries in Spark SQL by Ankur Dave
GraphFrames: Graph Queries in Spark SQL by Ankur Dave
 
Using MongoDB As a Tick Database
Using MongoDB As a Tick DatabaseUsing MongoDB As a Tick Database
Using MongoDB As a Tick Database
 
Locality Sensitive Hashing By Spark
Locality Sensitive Hashing By SparkLocality Sensitive Hashing By Spark
Locality Sensitive Hashing By Spark
 
Time Series With OrientDB - Fosdem 2015
Time Series With OrientDB - Fosdem 2015Time Series With OrientDB - Fosdem 2015
Time Series With OrientDB - Fosdem 2015
 
Stream Processing with Ballerina
Stream Processing with BallerinaStream Processing with Ballerina
Stream Processing with Ballerina
 
Data Visulalization
Data VisulalizationData Visulalization
Data Visulalization
 
Testing data streaming applications
Testing data streaming applicationsTesting data streaming applications
Testing data streaming applications
 
Scio
ScioScio
Scio
 
Graph Analytics in Spark
Graph Analytics in SparkGraph Analytics in Spark
Graph Analytics in Spark
 
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
 
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...
 
Real-time Analytics with Apache Flink and Druid
Real-time Analytics with Apache Flink and DruidReal-time Analytics with Apache Flink and Druid
Real-time Analytics with Apache Flink and Druid
 

Similar to Using Graph Analysis and Fraud Detection in the Fintech Industry

Follow the money with graphs
Follow the money with graphsFollow the money with graphs
Follow the money with graphsStanka Dalekova
 
Xia Zhu – Intel at MLconf ATL
Xia Zhu – Intel at MLconf ATLXia Zhu – Intel at MLconf ATL
Xia Zhu – Intel at MLconf ATLMLconf
 
Processing Large Graphs
Processing Large GraphsProcessing Large Graphs
Processing Large GraphsNishant Gandhi
 
Nebula Graph nMeetup in Shanghai - Meet with Graph Technology Enthusiasts
Nebula Graph nMeetup in Shanghai - Meet with Graph Technology EnthusiastsNebula Graph nMeetup in Shanghai - Meet with Graph Technology Enthusiasts
Nebula Graph nMeetup in Shanghai - Meet with Graph Technology EnthusiastsNebula Graph
 
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)Ontico
 
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)Alexey Zinoviev
 
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018Codemotion
 
Time Series Databases for IoT (On-premises and Azure)
Time Series Databases for IoT (On-premises and Azure)Time Series Databases for IoT (On-premises and Azure)
Time Series Databases for IoT (On-premises and Azure)Ivo Andreev
 
GraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational DatabasesGraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational DatabasesKonstantinos Xirogiannopoulos
 
GraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational DatabasesGraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational DatabasesPyData
 
201411203 goto night on graphs for fraud detection
201411203 goto night on graphs for fraud detection201411203 goto night on graphs for fraud detection
201411203 goto night on graphs for fraud detectionRik Van Bruggen
 
Drinking from the Firehose - Real-time Metrics
Drinking from the Firehose - Real-time MetricsDrinking from the Firehose - Real-time Metrics
Drinking from the Firehose - Real-time MetricsSamantha Quiñones
 
The State of Stream Processing
The State of Stream ProcessingThe State of Stream Processing
The State of Stream Processingconfluent
 
Making sense of the Graph Revolution
Making sense of the Graph RevolutionMaking sense of the Graph Revolution
Making sense of the Graph RevolutionInfiniteGraph
 
Agility and Scalability with MongoDB
Agility and Scalability with MongoDBAgility and Scalability with MongoDB
Agility and Scalability with MongoDBMongoDB
 
Clickstream data with spark
Clickstream data with sparkClickstream data with spark
Clickstream data with sparkMarissa Saunders
 
Stream Processing in Uber
Stream Processing in UberStream Processing in Uber
Stream Processing in UberC4Media
 
Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Advanced Data Science on Spark-(Reza Zadeh, Stanford)Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Advanced Data Science on Spark-(Reza Zadeh, Stanford)Spark Summit
 
Scalding big ADta
Scalding big ADtaScalding big ADta
Scalding big ADtab0ris_1
 

Similar to Using Graph Analysis and Fraud Detection in the Fintech Industry (20)

Follow the money with graphs
Follow the money with graphsFollow the money with graphs
Follow the money with graphs
 
Xia Zhu – Intel at MLconf ATL
Xia Zhu – Intel at MLconf ATLXia Zhu – Intel at MLconf ATL
Xia Zhu – Intel at MLconf ATL
 
Processing Large Graphs
Processing Large GraphsProcessing Large Graphs
Processing Large Graphs
 
Nebula Graph nMeetup in Shanghai - Meet with Graph Technology Enthusiasts
Nebula Graph nMeetup in Shanghai - Meet with Graph Technology EnthusiastsNebula Graph nMeetup in Shanghai - Meet with Graph Technology Enthusiasts
Nebula Graph nMeetup in Shanghai - Meet with Graph Technology Enthusiasts
 
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
 
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
 
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
 
Data Science At Zillow
Data Science At ZillowData Science At Zillow
Data Science At Zillow
 
Time Series Databases for IoT (On-premises and Azure)
Time Series Databases for IoT (On-premises and Azure)Time Series Databases for IoT (On-premises and Azure)
Time Series Databases for IoT (On-premises and Azure)
 
GraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational DatabasesGraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational Databases
 
GraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational DatabasesGraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational Databases
 
201411203 goto night on graphs for fraud detection
201411203 goto night on graphs for fraud detection201411203 goto night on graphs for fraud detection
201411203 goto night on graphs for fraud detection
 
Drinking from the Firehose - Real-time Metrics
Drinking from the Firehose - Real-time MetricsDrinking from the Firehose - Real-time Metrics
Drinking from the Firehose - Real-time Metrics
 
The State of Stream Processing
The State of Stream ProcessingThe State of Stream Processing
The State of Stream Processing
 
Making sense of the Graph Revolution
Making sense of the Graph RevolutionMaking sense of the Graph Revolution
Making sense of the Graph Revolution
 
Agility and Scalability with MongoDB
Agility and Scalability with MongoDBAgility and Scalability with MongoDB
Agility and Scalability with MongoDB
 
Clickstream data with spark
Clickstream data with sparkClickstream data with spark
Clickstream data with spark
 
Stream Processing in Uber
Stream Processing in UberStream Processing in Uber
Stream Processing in Uber
 
Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Advanced Data Science on Spark-(Reza Zadeh, Stanford)Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Advanced Data Science on Spark-(Reza Zadeh, Stanford)
 
Scalding big ADta
Scalding big ADtaScalding big ADta
Scalding big ADta
 

Recently uploaded

Interimreport1 January–31 March2024 Elo Mutual Pension Insurance Company
Interimreport1 January–31 March2024 Elo Mutual Pension Insurance CompanyInterimreport1 January–31 March2024 Elo Mutual Pension Insurance Company
Interimreport1 January–31 March2024 Elo Mutual Pension Insurance CompanyTyöeläkeyhtiö Elo
 
Instant Issue Debit Cards - High School Spirit
Instant Issue Debit Cards - High School SpiritInstant Issue Debit Cards - High School Spirit
Instant Issue Debit Cards - High School Spiritegoetzinger
 
20240417-Calibre-April-2024-Investor-Presentation.pdf
20240417-Calibre-April-2024-Investor-Presentation.pdf20240417-Calibre-April-2024-Investor-Presentation.pdf
20240417-Calibre-April-2024-Investor-Presentation.pdfAdnet Communications
 
Monthly Market Risk Update: April 2024 [SlideShare]
Monthly Market Risk Update: April 2024 [SlideShare]Monthly Market Risk Update: April 2024 [SlideShare]
Monthly Market Risk Update: April 2024 [SlideShare]Commonwealth
 
SBP-Market-Operations and market managment
SBP-Market-Operations and market managmentSBP-Market-Operations and market managment
SBP-Market-Operations and market managmentfactical
 
Stock Market Brief Deck for 4/24/24 .pdf
Stock Market Brief Deck for 4/24/24 .pdfStock Market Brief Deck for 4/24/24 .pdf
Stock Market Brief Deck for 4/24/24 .pdfMichael Silva
 
VIP Kolkata Call Girl Jodhpur Park 👉 8250192130 Available With Room
VIP Kolkata Call Girl Jodhpur Park 👉 8250192130  Available With RoomVIP Kolkata Call Girl Jodhpur Park 👉 8250192130  Available With Room
VIP Kolkata Call Girl Jodhpur Park 👉 8250192130 Available With Roomdivyansh0kumar0
 
OAT_RI_Ep19 WeighingTheRisks_Apr24_TheYellowMetal.pptx
OAT_RI_Ep19 WeighingTheRisks_Apr24_TheYellowMetal.pptxOAT_RI_Ep19 WeighingTheRisks_Apr24_TheYellowMetal.pptx
OAT_RI_Ep19 WeighingTheRisks_Apr24_TheYellowMetal.pptxhiddenlevers
 
The Triple Threat | Article on Global Resession | Harsh Kumar
The Triple Threat | Article on Global Resession | Harsh KumarThe Triple Threat | Article on Global Resession | Harsh Kumar
The Triple Threat | Article on Global Resession | Harsh KumarHarsh Kumar
 
government_intervention_in_business_ownership[1].pdf
government_intervention_in_business_ownership[1].pdfgovernment_intervention_in_business_ownership[1].pdf
government_intervention_in_business_ownership[1].pdfshaunmashale756
 
Log your LOA pain with Pension Lab's brilliant campaign
Log your LOA pain with Pension Lab's brilliant campaignLog your LOA pain with Pension Lab's brilliant campaign
Log your LOA pain with Pension Lab's brilliant campaignHenry Tapper
 
Financial Leverage Definition, Advantages, and Disadvantages
Financial Leverage Definition, Advantages, and DisadvantagesFinancial Leverage Definition, Advantages, and Disadvantages
Financial Leverage Definition, Advantages, and Disadvantagesjayjaymabutot13
 
Lundin Gold April 2024 Corporate Presentation v4.pdf
Lundin Gold April 2024 Corporate Presentation v4.pdfLundin Gold April 2024 Corporate Presentation v4.pdf
Lundin Gold April 2024 Corporate Presentation v4.pdfAdnet Communications
 
Unveiling the Top Chartered Accountants in India and Their Staggering Net Worth
Unveiling the Top Chartered Accountants in India and Their Staggering Net WorthUnveiling the Top Chartered Accountants in India and Their Staggering Net Worth
Unveiling the Top Chartered Accountants in India and Their Staggering Net WorthShaheen Kumar
 
magnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdf
magnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdfmagnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdf
magnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdfHenry Tapper
 
House of Commons ; CDC schemes overview document
House of Commons ; CDC schemes overview documentHouse of Commons ; CDC schemes overview document
House of Commons ; CDC schemes overview documentHenry Tapper
 
VIP High Class Call Girls Saharanpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Saharanpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Saharanpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Saharanpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
VIP Call Girls LB Nagar ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With Room...
VIP Call Girls LB Nagar ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With Room...VIP Call Girls LB Nagar ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With Room...
VIP Call Girls LB Nagar ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With Room...Suhani Kapoor
 
fca-bsps-decision-letter-redacted (1).pdf
fca-bsps-decision-letter-redacted (1).pdffca-bsps-decision-letter-redacted (1).pdf
fca-bsps-decision-letter-redacted (1).pdfHenry Tapper
 

Recently uploaded (20)

Interimreport1 January–31 March2024 Elo Mutual Pension Insurance Company
Interimreport1 January–31 March2024 Elo Mutual Pension Insurance CompanyInterimreport1 January–31 March2024 Elo Mutual Pension Insurance Company
Interimreport1 January–31 March2024 Elo Mutual Pension Insurance Company
 
Instant Issue Debit Cards - High School Spirit
Instant Issue Debit Cards - High School SpiritInstant Issue Debit Cards - High School Spirit
Instant Issue Debit Cards - High School Spirit
 
20240417-Calibre-April-2024-Investor-Presentation.pdf
20240417-Calibre-April-2024-Investor-Presentation.pdf20240417-Calibre-April-2024-Investor-Presentation.pdf
20240417-Calibre-April-2024-Investor-Presentation.pdf
 
Monthly Market Risk Update: April 2024 [SlideShare]
Monthly Market Risk Update: April 2024 [SlideShare]Monthly Market Risk Update: April 2024 [SlideShare]
Monthly Market Risk Update: April 2024 [SlideShare]
 
SBP-Market-Operations and market managment
SBP-Market-Operations and market managmentSBP-Market-Operations and market managment
SBP-Market-Operations and market managment
 
Stock Market Brief Deck for 4/24/24 .pdf
Stock Market Brief Deck for 4/24/24 .pdfStock Market Brief Deck for 4/24/24 .pdf
Stock Market Brief Deck for 4/24/24 .pdf
 
VIP Kolkata Call Girl Jodhpur Park 👉 8250192130 Available With Room
VIP Kolkata Call Girl Jodhpur Park 👉 8250192130  Available With RoomVIP Kolkata Call Girl Jodhpur Park 👉 8250192130  Available With Room
VIP Kolkata Call Girl Jodhpur Park 👉 8250192130 Available With Room
 
OAT_RI_Ep19 WeighingTheRisks_Apr24_TheYellowMetal.pptx
OAT_RI_Ep19 WeighingTheRisks_Apr24_TheYellowMetal.pptxOAT_RI_Ep19 WeighingTheRisks_Apr24_TheYellowMetal.pptx
OAT_RI_Ep19 WeighingTheRisks_Apr24_TheYellowMetal.pptx
 
The Triple Threat | Article on Global Resession | Harsh Kumar
The Triple Threat | Article on Global Resession | Harsh KumarThe Triple Threat | Article on Global Resession | Harsh Kumar
The Triple Threat | Article on Global Resession | Harsh Kumar
 
government_intervention_in_business_ownership[1].pdf
government_intervention_in_business_ownership[1].pdfgovernment_intervention_in_business_ownership[1].pdf
government_intervention_in_business_ownership[1].pdf
 
Log your LOA pain with Pension Lab's brilliant campaign
Log your LOA pain with Pension Lab's brilliant campaignLog your LOA pain with Pension Lab's brilliant campaign
Log your LOA pain with Pension Lab's brilliant campaign
 
Financial Leverage Definition, Advantages, and Disadvantages
Financial Leverage Definition, Advantages, and DisadvantagesFinancial Leverage Definition, Advantages, and Disadvantages
Financial Leverage Definition, Advantages, and Disadvantages
 
Lundin Gold April 2024 Corporate Presentation v4.pdf
Lundin Gold April 2024 Corporate Presentation v4.pdfLundin Gold April 2024 Corporate Presentation v4.pdf
Lundin Gold April 2024 Corporate Presentation v4.pdf
 
Unveiling the Top Chartered Accountants in India and Their Staggering Net Worth
Unveiling the Top Chartered Accountants in India and Their Staggering Net WorthUnveiling the Top Chartered Accountants in India and Their Staggering Net Worth
Unveiling the Top Chartered Accountants in India and Their Staggering Net Worth
 
magnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdf
magnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdfmagnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdf
magnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdf
 
Commercial Bank Economic Capsule - April 2024
Commercial Bank Economic Capsule - April 2024Commercial Bank Economic Capsule - April 2024
Commercial Bank Economic Capsule - April 2024
 
House of Commons ; CDC schemes overview document
House of Commons ; CDC schemes overview documentHouse of Commons ; CDC schemes overview document
House of Commons ; CDC schemes overview document
 
VIP High Class Call Girls Saharanpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Saharanpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Saharanpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Saharanpur Anushka 8250192130 Independent Escort Se...
 
VIP Call Girls LB Nagar ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With Room...
VIP Call Girls LB Nagar ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With Room...VIP Call Girls LB Nagar ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With Room...
VIP Call Girls LB Nagar ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With Room...
 
fca-bsps-decision-letter-redacted (1).pdf
fca-bsps-decision-letter-redacted (1).pdffca-bsps-decision-letter-redacted (1).pdf
fca-bsps-decision-letter-redacted (1).pdf
 

Using Graph Analysis and Fraud Detection in the Fintech Industry

  • 1. 1 Using Graph Analysis and Fraud Detection in the Fintech Industry Stanka Dalekova Yavor I. Ivanov Dobroslav Hristov @PlugIntoPaysafe
  • 2. 3 Who are Paysafe? • Paysafe is leading specialized payments player in the world. We do the hard stuff better than our competition • Global transactional volume of $85bn in 2018. • Real-time Payments • Two e-wallet services Neteller Skrill
  • 3. 4 • ~ 500 000 payments per day • Fines on any fraud payment • Balance between fraud protection and negative customer experience • Fraudsters bury their patterns in lots of data. Use Case – Fast Fraud Analytics
  • 4. 5 To PROCESS or NOT to PROCESS? or or Online Fraud Screening
  • 5. 6 • Fraud Benchmark Report by Cybersouce from 2016 – 83 % of North Americans review 29% of the orders manually, – Fraud analysts can give insights about fraud patterns and customer behavior – After manual review, rule engines can be updated – Manual Review are costly and time-consuming – Decreasing customer experience by delaying the payment • Fraud prevention industry benchmark by Kount.com from 2018 – 93 percent of merchants perform manual reviews – nearly 30 percent have a manual review rate between 1 and 5 percent – 16 percent review between 5 and 10 percent – 20 percent review more than one-in-ten of their orders. Manual Review Metrics
  • 6. 9 Data is our golden eggs, how to make it meaningful?
  • 7. 10 A: The payments in Paysafe are actually a GRAPH! Breakthrough  Q: A better way to analyze connected data? Graph Databases are invented to solve the performance problems of connected data.
  • 8. 12 • Study on mathematical structures used to model pairwise relations between objects • 250+ years of history - the paper written by Leonhard Euler on the Seven Bridges of Königsberg and published in 1736 is regarded as the first paper in the history of graph theory • Graphs are used to model many types of relations and process • Graphs solve many real-life problems - in computer science, social sciences, biology, etc. • Hundreds of graph algorithms – strongly connected components, paths algorithms, nearest neighbor, page rank, edge weight algorithms, etc. Graph Theory
  • 9. A tree is a minimally connected graph having only one path between any two vertices. And these are all graphs What exactly is a graph? These are all plots
  • 11. 15 Graph databases store data in terms of • Entities (nodes or vertices) • Relationships between entities (edges or arrows) A better way to explore connected data. Graph Database
  • 12. 16 Graphs are eating the world
  • 13. 17 Graphs are eating the world Image courtesy DB-Engines
  • 14. 19 • Pioneer graph databases are several years old – Neo4j [Cypher] – IBM Graph [SPARQL and Gremlin] – JanusGraph [Gremlin] (renamed from TitanDB) • Gaining more traction recently – new players – Oracle Spatial And Graph by Oracle (Property Graph 3+ years) [PGQL] – AWS Neptune by Amazon announced on 30 May 2018 [SPARQL and GREMLIN] – Azure Cosmos DB by Microsoft Graph API announced on 7 Feb 2018 – Apache Giraph 4+ years – GraphX by Spark 3+ years – RedisGraph Graph databases go mainstream
  • 15. 22 SQL vs PGQL SELECT T2.T AS "s1.fname$T",T2.V AS "s1.fname$V",T2.VN AS "s1.fname$VN",T2.VT AS "s1.fname$VT", T3.T AS "s2.fname$T",T3.V AS "s2.fname$V",T3.VN AS "s2.fname$VN",T3.VT AS "s2.fname$VT" FROM (/*Path[*/SELECT DISTINCT SVID, DVID FROM ( SELECT VID AS SVID, VID AS DVID FROM "GRAPH1VT$" UNION ALL SELECT SVID,DVID FROM (WITH RW (ROOT, SVID, DVID, LVL) AS ( SELECT ROOT, SVID, DVID, LVL FROM (SELECT SVID ROOT, SVID, DVID, 1 LVL FROM (SELECT T0.SVID AS SVID, T0.DVID AS DVID FROM "GRAPH1GT$" T0 WHERE (T0.EL = n'knows')) ) UNION ALL SELECT DISTINCT RW.ROOT, R.SVID, R.DVID, RW.LVL+1 FROM (SELECT T1.SVID AS SVID, T1.DVID AS DVID FROM "GRAPH1GT$" T1 WHERE (T1.EL = n'knows')) R, RW WHERE RW.DVID = R.SVID ) CYCLE SVID SET cycle_col TO 1 DEFAULT 0 SELECT ROOT SVID, DVID FROM RW ))/*]Path*/) T6, (/*Path[*/SELECT DISTINCT SVID, DVID FROM ( SELECT VID AS SVID, VID AS DVID FROM "GRAPH1VT$" UNION ALL SELECT SVID,DVID FROM (WITH RW (ROOT, SVID, DVID, LVL) AS ( SELECT ROOT, SVID, DVID, LVL FROM (SELECT SVID ROOT, SVID, DVID, 1 LVL FROM (SELECT T4.SVID AS SVID, T4.DVID AS DVID FROM "GRAPH1GT$" T4 WHERE (T4.EL = n'knows')) ) UNION ALL SELECT DISTINCT RW.ROOT, R.SVID, R.DVID, RW.LVL+1 FROM (SELECT T5.SVID AS SVID, T5.DVID AS DVID FROM "GRAPH1GT$" T5 WHERE (T5.EL = n'knows')) R, RW WHERE RW.DVID = R.SVID ) CYCLE SVID SET cycle_col TO 1 DEFAULT 0 SELECT ROOT SVID, DVID FROM RW ))/*]Path*/) T7, "GRAPH1VT$" T2, "GRAPH1VT$" T3 WHERE T2.K=n'fname' AND T3.K=n'fname' AND T6.SVID=T2.VID AND T6.DVID=T7.DVID AND T7.SVID=T3.VID ORDER BY T6.SVID ASC NULLS LAST, T7.SVID ASC NULLS LAST PATH knows_path := () -[:knows]-> () SELECT s1.fname, s2.fname WHERE (s1) -/:knows_path*/-> (o) <-/:knows_path*/-(s2) ORDER BY s1,s2 • PGQL • SQL Find the pairs of people who are connected to a common person through the “knows” relation
  • 16. Loading data into the graph Parallel Graph AnalytiX(PGX) is a fast, parallel, in-memory graph analytic framework that allows users to load up their graph data, run analytic algorithms on them, and to browse or store the result.
  • 17. PGX Architecture in Paysafe Graph storage management PGX Graph Analytics Java,Groovy,Python,… REST/WebService/Notebooks Java APIs Java APIs/JDBC/SQL/PLSQL Parallel In-Memory Graph Analytics/Graph Query (PGX) Oracle RDBMS Parallel In-Memory Graph Analytics/Graph Query (PGX) PGX Graph Analytics Vert.x Application Server REST Web User Interface Load Balancer Vert.x Application Server Graph storage management
  • 18. • PGX loads the whole graph and the properties needed for the analysis to be loaded into main memory • Compressed sparse row (CSR) format, a data structure which has minimal memory footprint while providing very fast read access. • More info on graph memory consumption can be found here – On heap memory only string properties – Off heap memory everything else – graph topology(edges and vertices) and properties • Asynchronous Java API Hardware requirements & sizing
  • 19. 27 Graph Database Performance Relational Table vs Graph Q: Is user "9" connected to user “1“? If you double the number of rows in the table, you've doubled the amount of data to search, and thus doubled the amount of time it takes to find what you are looking for You always walk the graph at most once. Conversely, a graph database looks only at records that are directly connected to other records. If it is given a limit on how many "hops" it is allowed to make, it can ignore everything more than that number of hops away.
  • 20. 28 • SQL created by Paysafe (32 lines) • 1 day: 50 min 20 sec • 1 week: (cancelled after 4 hours) • 1 month: (did not even try) • SQL optimized by Oracle (62 lines) • 1 day: 20.3 sec • 1 week: 8 min 33 sec • 1 month: (cancelled after 6 hours) • 4 PGQL queries (7 lines each) • 1 day: 0.547 sec • 1 week: 0.588 sec • 1 month: 0.597 sec Real-world example: is there a fraudster up to 4 hops away, on very active customer (worst case)? SELECT … MATCH (v0) -[e0_1:"pays to"]-> (v1) WHERE (v1.isMerchant = false) AND (v0.customerId = …) AND (e0_1.requestTime >= …) AND (e0_1.requestTime <= …) LIMIT 20000
  • 21. Payments Flow Visualization • Cytoscape is an open source software platform for visualizing complex networks and integrating these with any type of attribute data. • There is a JS library supporting many different layouts • Make asynchronous call for any hop, display up to 10 hops in seconds
  • 23. Detected network (one send to many)
  • 24. Bigger network (many send to one)
  • 29. • Page rank • Community detection • Strongly connected components • More built-in algorithms available • Custom-defined algorithms with Green Marl Graph Analytics
  • 30. 44 Subset of the graph where every vertex is reachable from every other vertex following the directions of the edges Strongly Connected Components
  • 31. 45 Subset of the graph where every vertex is reachable from every other vertex (directions of the edges are ignored) Weakly Connected Components
  • 32. Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are quite common in real networks. Social networks include community groups (the origin of the term, in fact) based on common location, interests, occupation, etc. Community Detection
  • 33. PageRank (PR) is an algorithm used by Google Search to rank websites in their search engine results. The PageRank algorithm outputs a probability distribution used to represent the likelihood that a person randomly clicking on links will arrive at any particular page Page Rank
  • 37. The right tool for the right job • Payments are in the graph • Deposits and Withdrawals are in the RDBMS
  • 41. 56 • Finding all communities for a given day up to 3 minutes Community detection for 72-hour period: 7 sec SCC for the same period: 6 sec Top 10 Customers Page Rank: ~0.8 sec • Memory statistics Total edges count: 70 M with 350M properties Edges size in DB: 72 GB (only the table) Total vertices count: 4M with 12M properties Vertices Size in DB: 2 GB (only the table) Graph in size in PGX memory - 10GB • Visualizing customer graph, up to seconds, but still depending on the relations • Performing PGQL query in milliseconds – can be used in real-time Use Case Performance Results
  • 42. 63 • Graph queries can be used as normal SQL queries to flag a risk transaction while the payment is being processed – If customer is linked with fraudster in range 2 hops, additional verification can be requested • Graphs enhance AI by providing context by enabling connected features to ML. Relations or connected features tend to be highly predictive. – Is there a fraudster in range of 3hops, 4hops, etc. can be a highly predictive ML feature – Feed page rank in a machine learning model • Detect fastest growing networks and examine community evolution New World of Opportunities
  • 43. 64 • Generate Proactive Report for the Fastest Growing Networks in terms of money flow(edge property), number of payments(edge count) and number of customers(vertices) on a time series data from the graph • Influencer found by Page Rank calculation • Community Activity Fastest Growing Networks Example for a fastest growing community by money flow and rolling period of five days and daily time series data. Amount
  • 44. Graphs are REALLY powerful Image courtesy www.networkworld.com
  • 45. • In a world of real-time payments, money processing becomes faster and more automated. Fraud checks upon payment should happen fast and the time to identify fraud patterns or networks is really narrow. • Link analysis can enhance fraud detection by running queries using graph database during key stages in the application lifecycle – Upon money move – Account creation – During investigation – When some thresholds are hit • Traditional technologies are not designed to detect fraud in real-time. Graph databases enable fast and effective real-time link queries. Summary
  • 46. • Graphs are useful for real-time decision making on connected data • Powerful data analytics • Go step further with machine learning models • You can do so much AI with Graph Takeaways
  • 47. Link Predictions A B C Enemies Friends Question: What is the connection between B and C?
  • 48. Link Predictions A B C Enemies Friends Question: What is the connection between B and C?
  • 49. Resources O’Reilly’s Graph Databases: New Opportunities for Connected Data O’Reilly’s Graph Algorithms Book Graph Databases: The Next Generation of Fraud Detection Technology Cypher – graph query language Oracle Property Graph Query Language Link Prediction Graph Theory with Applications Very interesting talk: How Graph Technology Is Changing Artificial Intelligence and Machine Learning