The document provides an agenda for a Graph Tour presentation which includes introductions, an overview of graphs and Neo4j, use cases in government and finance, digital transformation and the future. Common themes of connectedness in data are discussed. Neo4j is described as an enterprise-grade native graph platform that enables storing, revealing and querying data relationships.
5. Connectedness Represented in Graphs
[Figure: four example graphs. Product Subscriptions: User, Customers, Accounts and Subscriptions linked by USER_ACCESS, CONTROLLED_BY and SUBSCRIBED_BY relationships. Organizational Hierarchy: VP, Director, Manager, Staff. Network Operations: Ocean Cable, Fiber Link, Switch, Router, Service. Social Networks.]
6. Who We Are: The Graph Platform
Neo4j is an enterprise-grade native graph platform that enables you to:
• Store, reveal and query data relationships
• Traverse and analyze any level of depth in real time
• Add context and connect new data on the fly
Designed, built and tested natively for graphs from the start for:
• Performance
• ACID Transactions
• Agility
• Graph Algorithms
• Developer Productivity
• Hardware Efficiency
• Global Scale
• Graph Adoption
7. The Rise of Connections in Data
Networks of People: e.g., employees, customers, suppliers, partners, influencers
Business Processes: e.g., risk management, supply chain, payments
Knowledge Networks: e.g., enterprise content, domain-specific content, eCommerce content
Data connections are increasing as rapidly as data volumes
13. Neo4j Invented the Labeled Property Graph Model
Nodes
• Can have Labels to classify nodes
• Labels have native indexes
Relationships
• Relate nodes by type and direction
Properties
• Attributes of Nodes & Relationships
• Stored as Name/Value pairs
• Can have indexes and composite indexes
[Figure: example property graph. Two PERSON nodes (name: “Dan”, born: May 29, 1970, twitter: “@dan”; name: “Ann”, born: Dec 5, 1975) joined by MARRIED TO and LIVES WITH relationships, and a CAR node (brand: “Volvo”, model: “V70”). One relationship carries the property since: Jan 10, 2011.]
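The labeled property graph model can be sketched in a few lines of plain Python. This is an illustration only, not Neo4j's storage format or API: the `Node` and `Relationship` classes and the `neighbours` helper are hypothetical names, and the `DRIVES` relationship is an assumption, since the slide text does not say which relationship carries the `since` property.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                      # labels classify nodes, e.g. {"Person"}
    properties: dict = field(default_factory=dict)   # name/value pairs


@dataclass
class Relationship:
    start: Node                                      # relationships are directed: start -> end
    end: Node
    rel_type: str                                    # and typed, e.g. "MARRIED_TO"
    properties: dict = field(default_factory=dict)   # relationships can hold properties too


dan = Node({"Person"}, {"name": "Dan", "born": "1970-05-29", "twitter": "@dan"})
ann = Node({"Person"}, {"name": "Ann", "born": "1975-12-05"})
car = Node({"Car"}, {"brand": "Volvo", "model": "V70"})

relationships = [
    Relationship(dan, ann, "MARRIED_TO"),
    Relationship(dan, ann, "LIVES_WITH"),
    # Assumption: DRIVES is a hypothetical relationship type chosen to
    # carry the `since` property shown on the slide.
    Relationship(dan, car, "DRIVES", {"since": "2011-01-10"}),
]


def neighbours(node, rel_type):
    """Follow outgoing relationships of one type from `node`."""
    return [r.end for r in relationships if r.start is node and r.rel_type == rel_type]


spouse_names = [n.properties["name"] for n in neighbours(dan, "MARRIED_TO")]
print(spouse_names)
```

A native graph database stores relationships as first-class records, so a lookup like `neighbours` follows stored links rather than scanning a list as this sketch does.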
15. Largest pool of graph technologists
• Downloads: 10M+ (3M+ from Neo4j Distribution, 7M+ from Docker)
• Events: 400+ (approximate number of Neo4j events per year)
• Meetups: 50k+ (number of Meetup members globally)
• Trained Developers: 50k+ (trained/certified Neo4j professionals)
16. How Neo4j Fits — Common Architecture Patterns
• From Disparate Silos to Cross-Silo Connections
• From Tabular Data to Connected Data
• From Data Lake Analytics to Real-Time Operations
19. Connecting Roles & Projects around Enterprise Data Hub
[Figure: roles and projects connected around an enterprise data hub.
Roles: Data Scientists; Developers & Prod Mgrs; Analysts and Business Users; Chief Officers of … Compliance, Data, Digital, Information, Innovation, Marketing, Operations, Risk & Security…; Big Data IT & Architecture.
Projects: Real-time Graph traversal Applications; ID, Auth & Security; Network & IT Ops; Metadata Management; 360° Marketing; Customer 360; Real-time Cybersecurity; Account navigation.]
• Multiple paths through the organization
• Graphs have a strong appetite for data to add nodes & increase the density of relationships
• Value of a graph increases according to Metcalfe’s Law (V = n²)
• Customer applications iterate every 3 months
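The arithmetic behind Metcalfe's Law can be shown with a short calculation. This is a minimal sketch: the function names are hypothetical, and the constant of proportionality is omitted, so the values are only meaningful relative to one another.

```python
def possible_connections(n: int) -> int:
    """Number of distinct pairs among n nodes: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def metcalfe_value(n: int) -> int:
    """Relative network value under V = n^2 (constant factor omitted)."""
    return n * n


for n in (10, 100, 1000):
    print(n, possible_connections(n), metcalfe_value(n))
```

Doubling the number of nodes quadruples the value under this law, which is the reasoning behind the graph's "appetite" for more connected data.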
23. “Lessons Learned Database”
A half-century of collective NASA engineering
knowledge
“How do we make sure the command module
doesn’t tip over and sink?”
24. Let’s Hear a Few Stories
Impact
“Neo4j saved well over two years of work and one million dollars of taxpayer funds.”
David Meza, Chief Knowledge Architect at NASA
26. Business Problem
• Find relationships between people, accounts,
shell companies and offshore accounts
• Journalists are non-technical
• Biggest “Snowden-Style” document leak ever;
11.5 million documents, 2.6TB of data
Solution and Benefits
• Pulitzer Prize winning investigation resulted in
robust coverage of fraud and corruption
• PMs of Iceland & Pakistan resigned; exposed Putin, Prime Ministers, gangsters, celebrities (Messi)
• Led to assassination of journalist in Malta
Background
• International Consortium of Investigative
Journalists (ICIJ), small team of data journalists
• International investigative team specializing in
cross-border crime, corruption and accountability
of power
• Works regularly with leaks and large datasets
ICIJ Panama Papers INVESTIGATIVE JOURNALISM
Fraud Detection / Knowledge Graph
30. Business Problem
• Find relationships between people, corporations,
accounts, shell companies and offshore accounts
• Journalists are non-technical
• 2017 Leak from Appleby tax sheltering law firm
matched 13.4 million account records with public
business registrations data from across Caribbean
Solution and Benefits
• Exposed tax sheltering practices of Apple, Nike
• Revealed hidden connections among politicians
and nations, like Wilbur Ross & Putin’s son in law
• Triggered government tax evasion investigations in
US, UK, Europe, India, Australia, Bermuda, Canada
and Cayman Islands within 2 days.
Background
• International Consortium of Investigative
Journalists (ICIJ), Pulitzer Prize winning journalists
• Fourth blockbuster investigation using Neo4j to reveal connections in text-based and account-based data leaked from offshore law firms and government records about the “1% Elite”
• Appends Neo4j-based, “Offshore Leaks Database”
ICIJ Paradise Papers INVESTIGATIVE JOURNALISM
Fraud Detection / Knowledge Graph
33. Traditional Fraud Detection Methods
DISCRETE ANALYSIS
(1) Endpoint-Centric: analysis of users and their end-points
(2) Navigation-Centric: analysis of navigation behavior and suspect patterns
(3) Account-Centric: analysis of anomaly behavior by channel
Labels shown: PCs, Mobile Phones, IP addresses, User IDs, Comparing Transaction, Identity Vetting
34. Traditional Fraud Detection Methods: Weaknesses
DISCRETE ANALYSIS (endpoint-centric, navigation-centric and account-centric) is unable to detect:
• Fraud rings
• Fake IP addresses
• Hijacked devices
• Synthetic identities
• Stolen identities
• And more…
35. Augment Methods by Examining Connected Data
DISCRETE ANALYSIS
(1) Endpoint-Centric: analysis of users and their end-points
(2) Navigation-Centric: analysis of navigation behavior and suspect patterns
(3) Account-Centric: analysis of anomaly behavior by channel
CONNECTED ANALYSIS
(4) Cross Channel: analysis of anomaly behavior correlated across channels
(5) Entity Linking: analysis of relationships to detect organized crime and collusion
36. Modeling Fraud Transactions as an Organized Ring
At first glance, each account holder looks normal. Each has multiple accounts…
[Figure: fraud ring graph. ACCOUNT HOLDER 1, ACCOUNT HOLDER 2 and ACCOUNT HOLDER 3 are connected to nodes for CREDIT CARD, BANK ACCOUNT (three), UNSECURED LOAN (two), PHONE NUMBER and SSN 2.]
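The entity-linking idea can be sketched as a connected-components search over shared identifiers. All data and names below are hypothetical, and this is a simplified illustration of the technique rather than the method used in any production fraud system.

```python
from collections import defaultdict

# Hypothetical data: identifiers each account holder has used.
holder_identifiers = {
    "holder_1": {"phone:555-0100", "ssn:A"},
    "holder_2": {"phone:555-0100", "ssn:B"},
    "holder_3": {"ssn:B", "address:12 Main St"},
    "holder_4": {"phone:555-0199", "ssn:C"},
}


def find_rings(holders):
    """Group holders linked, directly or transitively, by a shared identifier."""
    by_identifier = defaultdict(set)
    for holder, identifiers in holders.items():
        for identifier in identifiers:
            by_identifier[identifier].add(holder)

    # Two holders are adjacent when they share at least one identifier.
    adjacent = defaultdict(set)
    for group in by_identifier.values():
        for holder in group:
            adjacent[holder] |= group - {holder}

    seen, rings = set(), []
    for holder in holders:
        if holder in seen:
            continue
        stack, component = [holder], set()
        while stack:  # depth-first walk of one connected component
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(adjacent[current] - component)
        seen |= component
        if len(component) > 1:  # a lone holder is not a ring
            rings.append(component)
    return rings


rings = find_rings(holder_identifiers)
print(rings)
```

Each holder looks normal in isolation; only the shared phone number and SSN link the first three into one group, which is what discrete analysis misses.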
39. Internal Risk Models Span Investment Data Silos
Graph technology unites discrete silos into a unified data source
that enables banks to trace compliance data lineage
40. Tracing Risk Lineage
FRTB requires banks to
calculate investment risk at the
trading-desk level…
…requiring them to evaluate
the interrelatedness of every
investment on their books.
Graph technology is
the right solution
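Tracing lineage amounts to walking a dependency graph upstream from a reported figure. The sketch below uses a breadth-first traversal over hypothetical data; the item names and the `trace_lineage` helper are assumptions for illustration and do not come from FRTB or any bank's model.

```python
from collections import deque

# Hypothetical lineage: each item maps to the inputs it is derived from.
derived_from = {
    "desk_risk_report": ["position_a_risk", "position_b_risk"],
    "position_a_risk": ["trade_store", "fx_rates_feed"],
    "position_b_risk": ["trade_store", "equity_prices_feed"],
    "trade_store": [],
    "fx_rates_feed": [],
    "equity_prices_feed": [],
}


def trace_lineage(item, edges):
    """Return every upstream input of `item` in breadth-first order."""
    seen, order = {item}, []
    queue = deque([item])
    while queue:
        current = queue.popleft()
        for source in edges.get(current, []):
            if source not in seen:  # shared inputs are reported once
                seen.add(source)
                order.append(source)
                queue.append(source)
    return order


lineage = trace_lineage("desk_risk_report", derived_from)
print(lineage)
```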
41. The Role of Ontologies in Financial Risk Management
43. Business Problem
• Needed new asset management backbone to
handle scheduling, ads, sales and pushing
linear streams to satellites
• Novell LDAP content hierarchy not flexible
enough to store graph-based business content
Solution and Benefits
• Neo4j selected for performance and domain fit
• Flexible, native storage of content hierarchy
• Graph includes metadata used by all systems:
TV series-->Episodes-->Blocks with Tags-->
Linked Content, tagged with legal rights, actors,
dubbing et al
Background
• Knoxville-based developer of lifestyle-oriented content for TV, digital, mobile and publishing
• Web properties generate tens of millions of
unique visitors per month
Scripps Networks MEDIA AND ENTERTAINMENT
Master Data Management
44. Background
• SF-based C2C rental platform
• Dataportal democratizes data access for
growing number of employees while improving
discoverability and trust
• Data strewn everywhere—in silos, in segmented
departments, nothing was universally accessible
Business Problem
• Data-driven culture hampered by variety and
dependability of data, tribal knowledge and
word-of-mouth distribution
• Needed visibility into information usage, context,
lineage and popularity across company of 3,000+
Solution and Benefits
• Offers search with context & metadata, user &
team-centric pages for origin & lineage
• Nodes are resources: data tables, dashboards,
reports, users, teams, business outcomes, etc.
• Relationships reflect consumption, production,
association, etc.
• Neo4j, Elasticsearch, Python
Airbnb Dataportal TRAVEL TECHNOLOGY
Knowledge Graph, Metadata Management
CE users since 2017
46. Graph is a Unique Paradigm
“Neo4j continues to dominate the graph database market.” (October, 2017)
“Customers choose Neo4j to drive innovation.” (February, 2018)
“In fact, the rapid rise of Neo4j and other graph technologies may signal that data connectedness is indeed a separate paradigm from the model consolidation happening across the rest of the NoSQL landscape.” (March, 2018)
47. Harnessing Connections Drives Business Value
Connected Data at the Center
Digital Transformation Megatrends
• Enhanced Decision Making
• Hyper Personalization
• Massive Data Integration
• Data Driven Discovery & Innovation
• AI & Machine Learning
Example use cases: Product Recommendations, Personalized Health Care, Media and Advertising, Fraud Prevention, Network Analysis, Law Enforcement, Drug Discovery, Intelligence and Crime Detection, Product & Process Innovation, 360 view of customer, Compliance, Optimize Operations, Price optimization, Resource allocation
48. Background
• Personal shopping assistant
• Converses with buyer via text, picture and voice
to provide real-time recommendations
• Combines AI and natural language understanding
(NLU) in Neo4j Knowledge Graph
• First of many apps in eBay's AI Platform
Business Problem
• Improve personal context in online shopping
• Transform buyer-provided context into ideal
purchase recommendations over social platforms
• "Feels like talking to a friend"
Solution and Benefits
• 3 developers, 8M nodes, 20M relationships
• Needed high-performance traversals to respond
to live customer requests
• Easy to train new algorithms and grow model
• Generating revenue since launch
eBay ShopBot ONLINE RETAIL
Knowledge Graph powers Real-Time Recommendations
EE Customer since 2016 Q3
57. The Graph Behind Online Stores
Graphs: Customer Graph, Product Graph, Supply Graph
Outcomes: real-time product recommendations, real-time supply chain management, real-time risk mitigation
[Figure: connected nodes labeled Region, Street, Customer, Address, Phone, Email, Product, Category and Store.]
58. Business Problem
• Optimize walmart.com user experience
• Connect complex buyer and product data to
gain super-fast insight into customer needs and
product trends
• RDBMS couldn’t handle complex queries
Solution and Benefits
• Replaced complex batch process with real-time online recommendations
• Built simple, real-time recommendation system
with low-latency queries
• Serve better and faster recommendations by
combining historical and session data
Background
• Founded in 1962 and based in Arkansas
• 11,000+ stores in 27 countries with
walmart.com online store
• 2M+ employees and $470 billion in annual
revenues
Walmart RETAIL
Real-Time Recommendations
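Combining historical and session data for recommendations can be illustrated with a small co-purchase traversal. This is a sketch under stated assumptions: the purchase data, the `recommend` function and the overlap-based scoring are all hypothetical, and nothing here describes Walmart's actual system.

```python
from collections import Counter

# Hypothetical purchase history: customer -> products bought.
purchases = {
    "alice": {"tent", "stove", "lantern"},
    "bob": {"tent", "stove"},
    "carol": {"tent", "sleeping_bag", "stove"},
    "dave": {"lantern"},
}


def recommend(customer, session_items=(), top_n=3):
    """Rank products bought by customers who share items with `customer`.

    Historical purchases and items from the current session are combined
    before looking for overlapping customers.
    """
    owned = purchases.get(customer, set()) | set(session_items)
    scores = Counter()
    for other, items in purchases.items():
        if other == customer:
            continue
        overlap = len(owned & items)      # strength of the customer-to-customer link
        if overlap:
            for product in items - owned:  # only suggest products not already owned
                scores[product] += overlap
    return [product for product, _ in scores.most_common(top_n)]


print(recommend("bob"))
print(recommend("dave", session_items=["tent"]))
```

The second call shows the effect of session data: adding one item viewed in the current session changes which customers overlap and therefore what is suggested.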
64. Neo4j — Changing the World
Fraud Detection: ICIJ used Neo4j to analyze the world’s largest journalistic leak to date, the Panama Papers, exposing criminals, corruption and extensive tax evasion.
Knowledge Graph for humans: The US space agency uses Neo4j for its “Lessons Learned” database, connecting information to make search more effective for space missions.
Knowledge Graph for AI: eBay uses Neo4j to enable machine learning through knowledge graphs powering “conversational commerce”.
65. Neo4j - The Graph Company
Industry’s Largest Dedicated Investment in Graphs
• Creator of the Neo4j Graph Platform
• ~200 employees
• HQ in Silicon Valley, other offices include London, Munich, Paris and Malmö (Sweden)
• $80M in funding from Fidelity, Sunstone, Conor, Creandum, and Greenbridge Capital
• 10M+ downloads
• 270+ enterprise subscription customers, over half of them with >$1B in revenue
Adoption
• 7/10 Top Retail Firms
• 12/25 Top Financial Firms
• 8/10 Top Software Vendors
Ecosystem
• 720+ Startups in program
• 270+ Enterprise customers
• 100+ Partners
• 53K+ Meetup members
• 450+ Events per year
66. Handling Large Graph Work Loads for Enterprises
Real-time promotion recommendations
• Record “Cyber Monday” sales
• About 35M daily transactions
• Each transaction is 3-22 hops
• Queries executed in 4ms or less
• Replaced IBM WebSphere Commerce
Marriott’s Real-time Pricing Engine
• 300M pricing operations per day
• 10x transaction throughput on half the hardware compared to Oracle
• Replaced Oracle database
Handling Package Routing in Real-Time
• Large postal service with over 500k employees
• Neo4j routes 7M+ packages daily at peak, with peaks of 5,000+ routing operations per second
67. Modelling the Supply Chain
to Find Counterfeit Products
Connected Data analysis reveals hidden
relationships, which can aid in
identifying counterfeiter supply chains.
69. Background
• One of the world’s largest logistics carriers
• Projected to outgrow capacity of old system
• New parcel routing system: single source of truth for entire network, B2C and B2B parcel tracking, real-time routing of up to 7M parcels per day
Business Problem
• Needed 365x24x7 availability
• Peak loads of 3000+ parcels per second
• Complex and diverse software stack
• Need predictable performance, linear scalability
• Daily changes to logistics network: route from
any point to any point
Solution and Benefits
• Ideal domain fit: a logistics network is a graph
• Extreme availability, performance via clustering
• Greatly simplified routing queries vs. relational
• Flexible data model reflects real-world data variance much better than relational
• Whiteboard-friendly model easy to understand
Accenture LOGISTICS IMPLEMENTATION AT DHL
Real-Time Routing Recommendations
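Because a logistics network is a graph, routing a parcel reduces to a shortest-path search. The sketch below applies Dijkstra's algorithm to a hypothetical four-depot network; the depot names and transit times are invented, and the source does not say which algorithm the DHL system uses.

```python
import heapq

# Hypothetical network: depot -> {neighbouring depot: transit hours}.
network = {
    "A": {"B": 4, "C": 2},
    "B": {"D": 5},
    "C": {"B": 1, "D": 8},
    "D": {},
}


def shortest_route(graph, source, target):
    """Dijkstra's algorithm; returns (total_cost, path) or (inf, []) if unreachable."""
    queue = [(0, source, [source])]
    settled = {}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node in settled:
            continue  # already reached by a cheaper route
        settled[node] = cost
        if node == target:
            return cost, path
        for neighbour, weight in graph[node].items():
            if neighbour not in settled:
                heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return float("inf"), []


print(shortest_route(network, "A", "D"))
```

A daily change to the network, such as a closed link, only requires editing one entry in `network`; the query itself stays the same.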
70. Business Problem
• Enable delivery in London within 90 minutes
• Manage network of routes, carriers and couriers
• Calculate delivery options and times in real time
across all possible routes
• Scale to enable a variety of services, including
same-day and consumer-to-consumer shipping
Solution and Benefits
• Calculates all possible routes in real time
• Thousands of times faster than MySQL solution
• Queries require up to 100 times less code,
improving time-to-market and code quality
• Adding new functionality that was
previously impossible
Background
• eBay acquired London-based Shutl to bring same-day delivery to London, to counter Amazon Prime and to expand its global retail presence
• Founded in 2009, Shutl was the UK leader in
same-day delivery with 70% of the market
eBay Now RETAIL DELIVERY and SUPPLY CHAIN ROUTING
Real-Time Routing
72. Graph Platform: Connects to Many Roles in Enterprise
[Figure: the graph platform, combining Graph Transactions and Graph Analytics, surrounded by its capabilities and the roles they serve: Development & Administration (DEVELOPERS, ADMINS); Drivers & APIs (APPLICATIONS); Data Integration (BIG DATA IT); Analytics Tooling (DATA ANALYSTS, DATA SCIENTISTS); Discovery & Visualization (BUSINESS USERS).]
75. Cypher: Powerful and Expressive Query Language
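MATCH (:Person {name: "Dan"})-[:MARRIED_TO]->(spouse)
RETURN spouse
[Figure: Dan -MARRIED_TO-> Ann. Annotations label the parts of the pattern: NODE, LABEL (:Person), PROPERTY (name), RELATIONSHIP TYPE (MARRIED_TO) and VARIABLE (spouse).]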
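To make the pattern's meaning concrete, the sketch below evaluates the same shape of query against a tiny in-memory graph in Python. It illustrates what the pattern asks for, not how Neo4j executes Cypher; the `match` function and the data layout are hypothetical.

```python
# Hypothetical in-memory data mirroring the slide's example.
nodes = {
    1: {"labels": {"Person"}, "props": {"name": "Dan"}},
    2: {"labels": {"Person"}, "props": {"name": "Ann"}},
}
# Each relationship is (start node id, relationship type, end node id).
relationships = [(1, "MARRIED_TO", 2), (1, "LIVES_WITH", 2)]


def match(label, props, rel_type):
    """Yield end nodes for the pattern (:label {props})-[:rel_type]->(end)."""
    for start, kind, end in relationships:
        start_node = nodes[start]
        if (
            kind == rel_type
            and label in start_node["labels"]
            and all(start_node["props"].get(k) == v for k, v in props.items())
        ):
            yield nodes[end]


spouses = [n["props"]["name"] for n in match("Person", {"name": "Dan"}, "MARRIED_TO")]
print(spouses)
```

The declarative query states only the shape to find; the loop above is the kind of work the database performs on the caller's behalf.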
76. Why Cypher is Better
Cypher is Open, Easy and Everywhere
Ease of use drives adoption & popularity
• Demonstrable maturity and proven success
• Huge ecosystem and support network
• Visibly represents relationships & paths
• Declarative language is easy to learn
Accelerating Market Adoption
• openCypher participation is growing
• Reference model for ISO, other research projects
• SQL compatible and complementary
• Released under the friendly Apache license
Evolving and Expanding Rapidly
Incorporating new ideas for Cypher such as:
• Return results as graphs OR tables of data (composability)
• Compose subqueries and chain-linking query algorithms
• Build graph expressions
• Define new graph object types like walks, runs and paths
[Figure: Cypher ecosystem, with labels Cypher on Apache Spark (CAPS), Cypher Tooling, BI Tools, Integration Tools, Cypher on …, Additional Sources, Apache Hadoop.]
77. Data Science Algorithms
Finds the optimal path or evaluates route availability and quality
• Single Source Shortest Path
• All-Nodes SSP
• Parallel paths
Evaluates how a group is clustered or partitioned
• Label Propagation
• Union Find
• Strongly Connected Components
• Louvain
• Triangle-Count
Determines the importance of distinct nodes in the network
• PageRank
• Betweenness
• Closeness
• Degree
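As one example from the importance group, PageRank can be sketched with a plain power iteration. This is a textbook-style illustration, not Neo4j's graph algorithms library; the `pagerank` function, the damping value of 0.85 and the four-node graph are assumptions chosen for the example.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank on {node: [outgoing neighbours]}.

    Every neighbour must also appear as a key of `graph`.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical graph in which every other node links to "hub".
graph = {"a": ["hub"], "b": ["hub"], "c": ["hub", "a"], "hub": ["a"]}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))
```

Degree centrality would count incoming links only; PageRank also weights each link by the importance of the node it comes from.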