Understanding fraud is a matter of understanding connections. Investigators seek unusual links between people, accounts and events. They scour huge, noisy and complex datasets to understand which connections are genuine, and which may indicate fraud. This talk is about the vital role data visualization plays in fraud detection. Using live demos, Giuseppe Francavilla will describe a generic architecture for enterprise fraud detection, with data visualization at its core, and explain how a powerful visualization component can make fraud intelligence quick to find and easy to communicate.
A visual approach to fraud detection and investigation - Giuseppe Francavilla
1. A visual approach to
fraud detection and investigation
Rome - 24 February 2017
2. Summary
o Graph Technologies
o Platform Overview
o Use Cases
o Graph Techniques
“Are you astonished, Aulus, that our friend Fabullinus is so frequently deceived?
A good man has always something to learn in regard to fraud.”
- Martial -
3. Graph Visualisation
Graph visualisation (aka Link Analysis or Network Analysis) is the process of creating images to show
graph data.
Graph visualisation allows you to explore connected data and:
- See patterns more clearly
- Perform analysis
- Answer questions
4. Interactive Visualisation
Interactive visualisation engages intuition and
creativity, making it easy for a user to:
- “Consume” a large amount of data
- Discover, map, group and filter information
- Understand context and see details
5. [Diagram] Platform stages:
1. Data store: Structured & unstructured data
2. Structure: Entity Extraction, Entity Resolution
3. Load: Fraud database
4. Process: Rule-based scoring & predictive analytics
5. Case Management: Client risk rating, Investigation, Reporting
6. Visualize: Aggregate & Network view
Other diagram labels: Feedback loop; Alerts; Data streamed or loaded
Traditional platforms
6. Traditional “monolithic”, single supplier, full-stack solutions:
Enterprise-ready network analysis tools available on the market:
- tend to be expensive
- lack extensibility
- are unable to cope with demands for information from new data sources
What was great yesterday is average today and poor tomorrow.
Modern web and Big Data technologies can deliver scalable network analysis at reasonable costs.
Challenges:
- integrating components into an end-to-end solution
- building a robust and user-friendly front-end
7. Technology & Analytics Innovation
Analytics
Data
Architecture
Visualization
IT Team
Data Scientist
- Customise
- Create
- Iterate
8. Use Cases
Application Fraud
Review Fraud
Identity Theft
Account Takeover
Claims Fraud
Transaction Fraud
Common factors:
1. Lots of densely connected data
2. People need to see anomalies or patterns
10. Detecting - Unusual Patterns
Network with a single connection point
Secondary link outside the hub of the network
With large numbers of data endpoints, a zoomed-out view helps identify broader patterns in the data.
Zooming in on the graph can reveal which nodes are acting as hubs, holding the graph together, and those nodes are
often important.
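One simple way to surface hub nodes is to count how many links each node has (its degree). The following is a minimal Python sketch, assuming the graph is available as a list of (source, target) pairs. The function name `top_hubs` and the sample edges are hypothetical and are not taken from the platform shown in the talk.

```python
from collections import Counter

def top_hubs(edges, n=3):
    """Return the n nodes with the most links, as (node, degree) pairs."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return degree.most_common(n)

# Hypothetical sample data: account A1 is linked to several other entities.
edges = [
    ("A1", "phone-1"), ("A1", "addr-1"), ("A1", "email-1"),
    ("A2", "addr-1"), ("A3", "phone-2"),
]
print(top_hubs(edges, n=2))
```

Degree is only the simplest hub measure; other centrality measures could be substituted without changing the overall approach.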
12. Detecting – Ring Expansion Pattern
Fraud ring
Link through account/policy holders with the same address, email & phone.
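The ring pattern can be approximated in code by grouping account holders that share contact details. The sketch below is a hedged illustration, not the method used in the demo: it assumes each account is a dictionary with hypothetical `id`, `address`, `email` and `phone` fields, and it links accounts that share any one of those values (a looser rule than requiring all three to match).

```python
from collections import defaultdict

def find_rings(accounts, keys=("address", "email", "phone")):
    """Group account ids that are connected through a shared attribute value."""
    parent = {acc["id"]: acc["id"] for acc in accounts}

    def find(x):
        # Follow parent pointers to the group representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Remember the first account seen for each (attribute, value) pair
    # and merge every later account with the same value into its group.
    first_seen = {}
    for acc in accounts:
        for key in keys:
            value = acc.get(key)
            if not value:
                continue
            owner = first_seen.setdefault((key, value), acc["id"])
            parent[find(acc["id"])] = find(owner)

    groups = defaultdict(set)
    for acc_id in parent:
        groups[find(acc_id)].add(acc_id)
    # Only groups with more than one account are candidate rings.
    return [sorted(group) for group in groups.values() if len(group) > 1]

# Hypothetical sample data: P1 and P2 share an address, P2 and P3 share an email.
accounts = [
    {"id": "P1", "address": "1 High St", "email": "a@example.com", "phone": "555-0100"},
    {"id": "P2", "address": "1 High St", "email": "b@example.com", "phone": "555-0101"},
    {"id": "P3", "address": "9 Low Rd", "email": "b@example.com", "phone": "555-0102"},
    {"id": "P4", "address": "7 Mid Ln", "email": "d@example.com", "phone": "555-0103"},
]
print(find_rings(accounts))
```

Because links are transitive, P1 and P3 end up in the same group even though they share nothing directly, which is how a ring "expands" as new holders are added.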
13. Detecting – Fraud: Multiple Claims Pattern
The initial claim was rejected, but the policy holder claimed again for a similar incident a week later.
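A rule for this pattern could look like the following Python sketch. It rests on stated assumptions: claims are dictionaries with hypothetical `id`, `holder`, `incident`, `filed` and `status` fields, "similar incident" is simplified to an identical incident type, and the 14-day window is an arbitrary illustrative threshold.

```python
from datetime import date, timedelta

def repeat_claims(claims, window=timedelta(days=14)):
    """Return (rejected_id, later_id) pairs for the same holder and incident type."""
    flagged = []
    ordered = sorted(claims, key=lambda c: c["filed"])
    for i, first in enumerate(ordered):
        if first["status"] != "rejected":
            continue
        # Look only at claims filed after (or on the same day as) the rejected one.
        for later in ordered[i + 1:]:
            same_holder = later["holder"] == first["holder"]
            same_type = later["incident"] == first["incident"]
            if same_holder and same_type and later["filed"] - first["filed"] <= window:
                flagged.append((first["id"], later["id"]))
    return flagged

# Hypothetical sample data: holder H1 claims again one week after a rejection.
claims = [
    {"id": "C1", "holder": "H1", "incident": "water damage",
     "filed": date(2017, 1, 3), "status": "rejected"},
    {"id": "C2", "holder": "H1", "incident": "water damage",
     "filed": date(2017, 1, 10), "status": "open"},
    {"id": "C3", "holder": "H2", "incident": "theft",
     "filed": date(2017, 1, 5), "status": "rejected"},
]
print(repeat_claims(claims))
```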
14. Investigating – Fraud CIFAS
“KeyLines has brought a step-change in how we communicate and use data. Within a week,
new frauds were detected with the system. By introducing this kind of leading-edge software,
we have fixed a problem for today, and also ensured we can meet our members’ future
needs.”
– Simon Fitzgerald, Data Sharing Services Manager
15. Multiple sources and building a graph dashboard to explore interconnected data
Most aggregated views are only useful once you understand
what you need to look for.
Graph is the tool to bridge the gap between the tabular
presentation and the aggregated views available in
dashboards.
✔ Unified Model:
Different graphs for different scales and different questions
✔ Multipurpose
Connect / see / interact
Inspect individual items
Explore behavior at scale
✔ Visual
Spot relationships, patterns, outliers
“Our platform can process 1,000,000 events per second”
16. Detecting
Using a clean drag & drop interface, users construct a graph model from a
tabular view of their data, and define their visual styling.
This produces an interactive visual report.
With the left-hand control panel, users can apply advanced visual
analysis techniques including graph layouts, social network
analysis algorithms and filtering.
18. Review Fraud
● User-written reviews are critical to online
commerce
● Sites like Amazon, TripAdvisor, Booking.com
all put their reviews front-and-center to drive
sales and site visits
● One study showed a 19% increase in
revenue for a one-star increase in average
rating on Yelp
● This creates an ‘unhealthy ecosystem’ of
fraudsters looking to artificially inflate or
deflate reviews of products
● The volume makes it difficult to read each
review individually
● Graph visualization can help
19. Review Fraud
● First, we need to format the review data as a graph
● The nodes will be the concrete things in our data
○ First, the products/businesses being reviewed
○ Second, the review itself, which has the
date/time of the review submission and the star
rating as a property
○ Third, the known properties of the reviewer such
as device fingerprint, IP address, and e-mail
address
● The edges represent the links between the reviewer,
the review, and the business
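The model described above can be sketched in a few lines of Python. This is an illustrative assumption about the data shape, not the format used in the demo: each review is a dictionary with hypothetical `review_id`, `business`, `submitted`, `stars`, `device`, `ip` and `email` fields, and the node id prefixes are invented for readability.

```python
def build_review_graph(reviews):
    """Turn review records into nodes (keyed by id) and a list of edges."""
    nodes, edges = {}, []
    for r in reviews:
        business_id = f"business:{r['business']}"
        review_id = f"review:{r['review_id']}"
        nodes[business_id] = {"type": "business"}
        # The review node carries the submission time and star rating as properties.
        nodes[review_id] = {"type": "review", "submitted": r["submitted"], "stars": r["stars"]}
        edges.append((review_id, business_id))
        # Each known reviewer property becomes its own node,
        # so reviews that share a value become connected.
        for prop in ("device", "ip", "email"):
            if r.get(prop):
                prop_id = f"{prop}:{r[prop]}"
                nodes[prop_id] = {"type": prop}
                edges.append((prop_id, review_id))
    return nodes, edges

# Hypothetical sample data: two reviews of one business from the same IP address.
reviews = [
    {"review_id": 1, "business": "cafe", "submitted": "2017-02-01T10:00", "stars": 1,
     "device": "d1", "ip": "203.0.113.7", "email": "a@example.com"},
    {"review_id": 2, "business": "cafe", "submitted": "2017-02-01T10:05", "stars": 1,
     "device": "d2", "ip": "203.0.113.7", "email": "b@example.com"},
]
nodes, edges = build_review_graph(reviews)
print(len(nodes), len(edges))
```

Modelling the IP address as a node rather than as a property of the review is what makes the shared address visible as a link between the two reviews.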
20. Review Fraud
● Let’s zoom in to identify suspicious patterns
● On the left, we have used the KeyLines timebar to zoom in to only the reviews posted on
a single day
● On the right, we see multiple negative reviews of a restaurant that day from users who have
no other activity at all. Is this legitimate or an attempt to defame?
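The same question can be asked of the raw data. The Python sketch below is a minimal illustration that does not use the KeyLines API: it assumes reviews are dictionaries with hypothetical `reviewer`, `business`, `submitted` (a date) and `stars` fields, and it treats two stars or fewer as negative.

```python
from collections import Counter
from datetime import date

def one_off_negative_reviews(reviews, day, max_stars=2):
    """Negative reviews posted on `day` by reviewers with no other reviews at all."""
    activity = Counter(r["reviewer"] for r in reviews)
    return [
        r for r in reviews
        if r["submitted"] == day
        and r["stars"] <= max_stars
        and activity[r["reviewer"]] == 1
    ]

# Hypothetical sample data: u1 and u2 have a single review each, u3 is a regular reviewer.
reviews = [
    {"reviewer": "u1", "business": "trattoria", "submitted": date(2017, 2, 1), "stars": 1},
    {"reviewer": "u2", "business": "trattoria", "submitted": date(2017, 2, 1), "stars": 2},
    {"reviewer": "u3", "business": "trattoria", "submitted": date(2017, 2, 1), "stars": 1},
    {"reviewer": "u3", "business": "bakery", "submitted": date(2017, 1, 15), "stars": 5},
]
suspicious = one_off_negative_reviews(reviews, date(2017, 2, 1))
print([r["reviewer"] for r in suspicious])
```

A result like this is a prompt for investigation, not proof: a genuinely bad evening at a restaurant could produce the same pattern.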
21. Review Fraud
The graph shows the use of a donut to illustrate whether a person left positive or negative reviews of shopping
experiences.
If this were filtered by timebar, someone
who only left positive reviews in a short
space of time, especially for venues in
places far apart, might be suspicious.
We can spot Zoey straight away without
having to look closely at the link colours.
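The donut segments are simply per-reviewer counts of positive and negative reviews. The Python sketch below shows one way to compute them, under assumptions that are not from the talk: reviews are dictionaries with hypothetical `reviewer` and `stars` fields, four stars or more counts as positive, and the reviewer names are invented.

```python
from collections import defaultdict

def sentiment_shares(reviews, positive_from=4):
    """Per reviewer, the counts of positive and negative reviews (the donut segments)."""
    shares = defaultdict(lambda: {"positive": 0, "negative": 0})
    for r in reviews:
        label = "positive" if r["stars"] >= positive_from else "negative"
        shares[r["reviewer"]][label] += 1
    return dict(shares)

def only_positive(shares, min_reviews=3):
    """Reviewers with at least `min_reviews` reviews, none of them negative."""
    return sorted(
        name for name, s in shares.items()
        if s["negative"] == 0 and s["positive"] >= min_reviews
    )

# Hypothetical sample data: reviewer-a leaves only positive reviews.
reviews = [
    {"reviewer": "reviewer-a", "stars": 5},
    {"reviewer": "reviewer-a", "stars": 5},
    {"reviewer": "reviewer-a", "stars": 4},
    {"reviewer": "reviewer-b", "stars": 5},
    {"reviewer": "reviewer-b", "stars": 2},
]
shares = sentiment_shares(reviews)
print(only_positive(shares))
```

Restricting `reviews` to a time window before calling `sentiment_shares` gives the timebar-filtered variant described above.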
23. The Magnificent Seven
5000 nodes and 5000 links: loading huge
networks will overload your users and not
help them find insights
Efficient layouts
Aggregation
Geospatial
Filtering
Time
Expand outwards
Social Network Analytics
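Of the techniques listed, "expand outwards" is the easiest to show in code: start from one node of interest and load only its neighbourhood, rather than the whole network. The following is a minimal Python sketch using a breadth-first search with a hop limit. The edge-list input, the function name `expand` and the sample data are assumptions for illustration only.

```python
from collections import defaultdict, deque

def expand(edges, start, hops=1):
    """Return the set of nodes within `hops` links of `start`, including `start`."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == hops:
            # Do not expand beyond the requested number of hops.
            continue
        for nxt in neighbours[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return seen

# Hypothetical sample data: a simple chain A - B - C - D.
edges = [("A", "B"), ("B", "C"), ("C", "D")]
print(sorted(expand(edges, "A", hops=2)))
```

Each user-driven expansion adds one more hop, so the chart grows only where the investigator chooses to look.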