5 MINUTE OVERVIEW
STARDOG
ENTERPRISE KNOWLEDGE GRAPH
stardog.com
UNCONNECTED DATA IS A LIABILITY
ENTERPRISES NEED FLEXIBLE, REUSABLE
DATA ON DEMAND,
WITH LESS DISRUPTION AND OVERHEAD
KNOWLEDGE GRAPH IS THE ANSWER
FLEXIBLE
REUSABLE
ACCRETIVE
KNOWLEDGE GRAPH =
KNOWLEDGE TOOLKIT + GRAPH DB
WHAT'S A KNOWLEDGE TOOLKIT?
VIRTUAL GRAPHS BUILD KNOWLEDGE ACROSS SILOS
BUSINESS LOGIC BUILDS REUSABLE, LOGICAL REASONING INTO THE GRAPH
MACHINE LEARNING INTEGRATES STATISTICAL REASONING
INTEGRITY CONSTRAINT VALIDATION EMPOWERS DATA STANDARDS
KNOWLEDGE = DATA PLUS REASONING
FACT COUNT: 4 EXPLICIT FACTS
[Figure: graph with the nodes Inferno, Rogue One, Gareth Edwards, Felicity Jones, and Tom Hanks, joined by three “actor” edges and one “director” edge]
KNOWLEDGE = DATA PLUS REASONING
actorOf inverseOf actor
directorOf inverseOf director
actorOf subPropertyOf workedOn
directorOf subPropertyOf workedOn
coworker propertyChain (workedOn [inverseOf workedOn])
coworker subPropertyOf connectedTo
connectedTo a TransitiveProperty
[Figure: the same graph with inferred edges added: actorOf, directorOf, workedOn, coworker, and connectedTo, alongside the original actor and director edges]
FACT COUNT: 15 EXPLICIT / IMPLICIT FACTS
BUSINESS LOGIC THAT BETTER EXPLAINS THE DOMAIN
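The rules on this slide can be read as a small forward-chaining program. The Python sketch below is only an illustration and not how Stardog implements reasoning. It assumes the four explicit facts run from film to person (which edge belongs to which person is read off the figure labels and is an assumption), and the `infer` helper is hypothetical. Because it also derives self-pairs and both directions of coworker and connectedTo, its total need not match the 15 facts drawn on the slide.

```python
# Explicit facts as (subject, predicate, object); directions are assumed.
EXPLICIT = {
    ("Inferno", "actor", "Tom Hanks"),
    ("Inferno", "actor", "Felicity Jones"),
    ("Rogue One", "actor", "Felicity Jones"),
    ("Rogue One", "director", "Gareth Edwards"),
}


def infer(facts):
    """Apply the slide's rules repeatedly until no new fact appears."""
    facts = set(facts)
    while True:
        new = set()
        for s, p, o in facts:
            if p == "actor":                      # actorOf inverseOf actor
                new.add((o, "actorOf", s))
            if p == "director":                   # directorOf inverseOf director
                new.add((o, "directorOf", s))
            if p in ("actorOf", "directorOf"):    # subPropertyOf workedOn
                new.add((s, "workedOn", o))
            if p == "coworker":                   # coworker subPropertyOf connectedTo
                new.add((s, "connectedTo", o))
        # coworker propertyChain (workedOn [inverseOf workedOn])
        worked = [(s, o) for s, p, o in facts if p == "workedOn"]
        for a, film_a in worked:
            for b, film_b in worked:
                if film_a == film_b:
                    new.add((a, "coworker", b))
        # connectedTo a TransitiveProperty
        conn = [(s, o) for s, p, o in facts if p == "connectedTo"]
        for a, b in conn:
            for c, d in conn:
                if b == c:
                    new.add((a, "connectedTo", d))
        if new <= facts:
            return facts
        facts |= new


all_facts = infer(EXPLICIT)
implicit = all_facts - EXPLICIT
```

Tom Hanks and Gareth Edwards never worked on the same film in this data, so they are not coworkers, yet the transitive rule still makes them connectedTo each other through Felicity Jones.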
KNOWLEDGE GRAPHS CONNECT ALL DATA
CONNECTING ALL DATA CHANGES EVERYTHING
THANK YOU
A.J. COOK, NORTH AMERICAN SALES
AJ@STARDOG.COM
Data Modeling & Metadata
for Graph Databases
Donna Burbank
Global Data Strategy Ltd.
Lessons in Data Modeling DATAVERSITY Series
July 27th, 2017
Global Data Strategy, Ltd. 2017
Donna Burbank
Donna is a recognised industry expert in
information management with over 20
years of experience in data strategy,
information management, data modeling,
metadata management, and enterprise
architecture. Her background is multi-
faceted across consulting, product
development, product management, brand
strategy, marketing, and business
leadership.
She is currently the Managing Director at
Global Data Strategy, Ltd., an international
information management consulting
company that specializes in the alignment
of business drivers with data-centric
technology. In past roles, she has served in
key brand strategy and product
management roles at CA Technologies and
Embarcadero Technologies for several of
the leading data management products in
the market.
As an active contributor to the data
management community, she is a long-time
DAMA International member, Past
President and Advisor to the DAMA Rocky
Mountain chapter, and was recently
awarded the Excellence in Data
Management Award from DAMA
International in 2016. She was on the
review committee for the Object
Management Group’s (OMG) Information
Management Metamodel (IMM) and the
Business Process Modeling Notation
(BPMN). Donna is also an analyst at the
Boulder BI Brain Trust (BBBT) where she
provides advice and gains insight on the
latest BI and Analytics software in the
market.
She has worked with dozens of Fortune
500 companies worldwide in the Americas,
Europe, Asia, and Africa and speaks
regularly at industry conferences. She has
co-authored two books: Data Modeling for
the Business and Data Modeling Made
Simple with ERwin Data Modeler and is a
regular contributor to industry
publications. She can be reached at
donna.burbank@globaldatastrategy.com
Donna is based in Boulder, Colorado, USA.
Follow on Twitter @donnaburbank
Today’s hashtag: #LessonsDM
Lessons in Data Modeling Series
• January 26th How Data Modeling Fits Into an Overall Enterprise Architecture
• February 23rd Data Modeling and Business Intelligence
• March Conceptual Data Modeling – How to Get the Attention of Business Users
• April The Evolving Role of the Data Architect – What does it mean for your Career?
• May Data Modeling & Metadata Management
• June Self-Service Data Analysis, Data Wrangling, Data Munging, and Data Modeling
• July Data Modeling & Metadata for Graph Databases
• August Data Modeling & Data Integration
• September Data Modeling & MDM
• October Agile & Data Modeling – How Can They Work Together?
• December Data Modeling, Data Quality & Data Governance
This Year’s Line Up
Word from our Sponsor
Stardog Enterprise Knowledge Graph
www.stardog.com
Agenda
• What is a Graph Database
• Use Cases for Graph Databases
• Data Modeling & Metadata for Graph Databases
What we’ll cover today
What is a Graph Database?
• A graph database uses a set of nodes, edges, and
properties to represent and store data.
• With graph databases, the relationships between data
points often matter more than the individual points
themselves. In order to leverage those data relationships,
your organization needs a database technology that stores those relationships.
• These relationships can help you discover new insights
from your data.
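As a rough illustration of the first bullet, a graph can be held in memory as nodes and edges that each carry properties. The Python sketch below is not tied to any product. The `Node` and `Edge` classes, the `neighbors` helper, and the property values are all hypothetical, and the people are borrowed from the Mary and John example used elsewhere in this deck.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str          # name of the node the edge starts at
    label: str           # what the relationship means
    target: str          # name of the node the edge points to
    properties: dict = field(default_factory=dict)


nodes = {n.name: n for n in [
    Node("Mary", {"kind": "Person"}),
    Node("John", {"kind": "Person"}),
    Node("Stephanie", {"kind": "Person"}),
]}

edges = [
    Edge("Mary", "hasBrother", "John"),
    Edge("John", "isDating", "Stephanie", {"status": "unconfirmed"}),
]


def neighbors(name, label=None):
    """Follow outgoing edges from a node, optionally filtered by label."""
    return [e.target for e in edges
            if e.source == name and (label is None or e.label == label)]
```

Answering "who is Mary's brother dating?" is then two hops along stored edges, `neighbors("Mary", "hasBrother")` followed by `neighbors("John", "isDating")`, rather than a lookup that has to be reconstructed at query time.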
Graph Database = Thing Relates to Thing
Node
Vertex
Edge
Relationship
The more formal way of referring to “thing relates to thing” is
“Nodes & Edges”, “Vertices & Relationships”, etc.
Graph Databases Mirror the Way We Think
Squirrel!
I should go
visit Mary
I wonder how her
brother John is doing?
Is he still dating
Stephanie?
…In the mind, as in data,
there are always random
data points…
Do they still have that
house at the Lake?
Riding their boats on the lake was great.
Remember when John crashed the boat?
Like my toy
as a child.
Graph databases can be intuitive to many, since they mirror the way the human brain
typically thinks – through Association.
“Traditional” way of Looking at the World: Hierarchies
• Carolus Linnaeus in 1735 established a hierarchy/taxonomy for organizing and identifying
biological systems.
Kingdom
Phylum
Class
Order
Family
Genus
Species
“New” Way of Looking at the World - Emergence
In philosophy, systems theory, science, and art, emergence is
the way complex systems and patterns arise out of a
multiplicity of relatively simple interactions.
- Wikipedia
Graph Databases Combine Flexibility w/ Structure & Meaning
• In many ways, graph databases provide the “best of both worlds”.
Flexibility of the “New World” of Discovery & “Emergence”
+
Structure & Meaning of the “Old World” through Ontologies
It’s All About Relationships
• In graph databases, relationships are first class constructs.
• Rather ironically, relational databases lack relationships.
• In relational databases, relationships are enforced through joins and constraints.
• NoSQL (e.g. Key Value) databases are also weak at supporting relationships.
“A relational database isn’t about relationships, it’s about constraints.”
– Karen Lopez
Customer Account
Is Owner Of
<Customer> <Owner Of> <Account>
Use Cases for Graph Databases
Social Networks
Donna
Sad, Lonely Person who
doesn’t like data
Who are the cool kids?
i.e. People linked with Donna
X Degrees of Separation – “The Bacon Number”
• What’s Audrey Hepburn’s “Bacon Number”? i.e. degrees of separation/relation to actor Kevin Bacon
• As always, metadata and data quality are important, e.g. which Audrey Hepburn?
Courtesy of oracleofbacon.org
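Degrees of separation are a shortest-path question, which a breadth-first search answers directly. The Python sketch below is a minimal illustration with a hypothetical `degrees_of_separation` helper. Since the Kevin Bacon data is not reproduced in these slides, it uses the coworker links from the sponsor's film example instead (Tom Hanks and Felicity Jones via Inferno, Felicity Jones and Gareth Edwards via Rogue One).

```python
from collections import deque

# Undirected "worked on the same film" links, taken from the sponsor slides.
LINKS = {
    "Tom Hanks": {"Felicity Jones"},
    "Felicity Jones": {"Tom Hanks", "Gareth Edwards"},
    "Gareth Edwards": {"Felicity Jones"},
}


def degrees_of_separation(graph, start, goal):
    """Return the fewest hops from start to goal, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, distance = queue.popleft()
        for other in graph.get(person, ()):
            if other == goal:
                return distance + 1
            if other not in seen:
                seen.add(other)
                queue.append((other, distance + 1))
    return None


hops = degrees_of_separation(LINKS, "Tom Hanks", "Gareth Edwards")
```

The data quality point still applies: the search is only as good as the identity of each node, so two different people sharing one name would silently shorten the path.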
Fraud Detection in Online Transactions
• Online transactions typically have certain identifiers, e.g. User ID, IP address, geo location, tracking cookie, credit card number, etc.
• Graph patterns can help detect fraud, e.g.
• The more interconnections exist among identifiers, the greater the cause for concern.
• Typically they would be 1:1.
• Some variations may occur, e.g. Multiple credit cards with one person. Families using same machine, etc.
• Large and tightly-knit graphs are very strong indicators that fraud is taking place.
• Triggers can be put into place so that these patterns are uncovered before they cause damage.
[Figure: three example graphs linking IP addresses to credit cards (CC1 through CC17), labeled “Fraud?”, “Family”, and “Personal & Business Card”]
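The pattern described above, many credit cards converging on one identifier, can be sketched as a simple grouping. The Python below is illustrative only: the transactions, the `suspicious_ips` helper, and the threshold of three cards per IP address are all assumptions, and a real system would tune the threshold and combine several identifiers.

```python
from collections import defaultdict


def cards_per_ip(transactions):
    """Group the distinct credit cards seen for each IP address."""
    cards = defaultdict(set)
    for ip, card in transactions:
        cards[ip].add(card)
    return cards


def suspicious_ips(transactions, max_cards=3):
    """Flag IP addresses linked to more cards than the assumed threshold."""
    grouped = cards_per_ip(transactions)
    return sorted(ip for ip, cards in grouped.items() if len(cards) > max_cards)


# Hypothetical data: one IP with ten cards, one with two, one with a single card.
transactions = [("IP1", f"CC{n}") for n in range(1, 11)]
transactions += [("IP2", "CC11"), ("IP2", "CC12"), ("IP3", "CC13")]

flagged = suspicious_ips(transactions)
```

With this threshold the two-card case (a family, or a personal and a business card) passes, while the ten-card cluster is flagged for review.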
Recommendation Engines
• Recommendation Engines are familiar to most of us who do any online shopping.
• These engines can be powered by a graph database, e.g.
• Capture a customer’s browsing behavior and demographics
• Combine those with their buying history to provide relevant recommendations
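One simple way to turn a purchase graph into recommendations is to score the items bought by customers whose history overlaps with yours. The Python sketch below assumes hypothetical customers and products and a hypothetical `recommend` helper. It uses buying history only and leaves out the browsing behavior and demographics mentioned above.

```python
from collections import Counter

# Hypothetical customer -> purchased items edges.
PURCHASES = {
    "customer_a": {"tent", "sleeping bag"},
    "customer_b": {"tent", "sleeping bag", "camp stove"},
    "customer_c": {"tent", "camp stove", "lantern"},
}


def recommend(customer, purchases, top_n=2):
    """Rank items the customer lacks by how similar their buyers are."""
    mine = purchases[customer]
    scores = Counter()
    for other, items in purchases.items():
        if other == customer:
            continue
        overlap = len(mine & items)       # shared purchases = similarity
        if overlap == 0:
            continue
        for item in items - mine:
            scores[item] += overlap
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


suggestions = recommend("customer_a", PURCHASES)
```

Each extra customer or purchase adds edges that change the scores, which is why small or faulty data sets produce weak recommendations.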
Data Quality & Volume Matter
• Recommendation engines are based on evaluating data sets. If those data sets are faulty or of
poor quality, your results will be flawed.
• This is especially true if the data sets are small.
Master Data Management (MDM)
• Master Data Management (MDM) is the practice of identifying, cleansing, storing & governing
the core data assets of the organization (e.g. customer, product, etc.)
• There are many architectural approaches to MDM. Two are the following:
Centralized, Commonly Relational (diagram label: MDM)
• Core data stored in a common schema in a centralized “hub”.
• Used as a common reference for operational systems, DW, etc.
Virtualized/Registry, Commonly Graph (diagram label: Virtualization Layer)
• Data remains in source systems.
• Referenced through a common virtualization layer.
BOTH require the same core foundation of data quality, parsing & matching, semantic meaning,
data governance, etc. in order to be successful… and that’s usually the hardest stuff.
Graph Databases for Data Warehousing
When you have a hammer, everything looks like a nail,
i.e. Data Warehouses serve a particular purpose: aggregating &
summarizing data. That workload is not ideal for graph databases.
Data Warehousing & Enterprise Knowledge Graph
Data Warehouse
• Relational & Dimensional data model
• …Show me Total Sales by Region and by Customer each month in 2017
Enterprise Knowledge Graph
• Graph data model
• …Who are my most influential customers? (with the most connections)
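The knowledge graph question, "who has the most connections?", amounts to counting edges per node. The Python sketch below is a minimal illustration: the customer names, the links, and the `most_connected` helper are hypothetical, and counting direct connections is only the simplest possible reading of "influential".

```python
from collections import Counter


def most_connected(edges, top_n=1):
    """Count connections per node in an undirected edge list."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical "knows" links between customers.
customer_links = [
    ("Ann", "Bob"),
    ("Ann", "Cara"),
    ("Ann", "Dev"),
    ("Bob", "Cara"),
]

top = most_connected(customer_links)
```

The data warehouse question on the same slide is a different shape of problem: it groups and sums facts rather than following links between them.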
Data Management & Ballroom Dancing
“First you dance with yourself, then with your partner, then you dance with the room.”
An Enterprise Knowledge Graph Provides a Holistic View of
the Organization through Relationships
“First you dance with yourself, then with your partner, then you dance with the room.”
Customer Data
Data Quality & Semantics are important
for core enterprise data assets.
Name: Audrey Hepburn
DOB: May 4, 1929
Current Customer: No
But the true value is in the
interrelationships between data assets.
Mother of
Name: Luca Dotti
DOB: February 8, 1970
Current Customer: Yes
Purchased Yacht Insurance
Purchased Home Insurance
Filed a Claim
Data Modeling &
Metadata for Graph
Databases
Data Modeling for Graph Databases
• There are several dominant ways to model graph databases. Two popular ones include:
• Resource Description Framework (RDF) Triples
• Labeled Property Graph
Labeled Property Graph
• Made up of nodes, relationships, properties & labels
• Sample Query language: Cypher
• Sample Vendor: Neo4j
Resource Description Framework (RDF) Triples
• Made up of subject, predicate, object triples
• Sample Query language: SPARQL
• Sample Vendor: Stardog
• Both have a close affinity between logical & physical models
• i.e. We already think in “thing relates to thing”
• In the following slides, we’ll use the RDF example, since that is a W3C Open Standard.
Graph Query Languages
• Unlike relational databases, where SQL is a general standard, there are a number
of query language options available for graph databases:
• SPARQL: is a SQL-like declarative query language that was created by W3C to query RDF
(Resource Description Framework) graphs.
• Cypher: is also a declarative query language that resembles SQL. Created by Neo4j.
• GraphQL: is a query language for APIs. Isn’t specific to graph databases, but can be used for
them. Developed by Facebook.
• Gremlin: is a graph traversal language developed for Apache TinkerPop™, an open source,
vendor-agnostic, graph computing framework distributed under the Apache2 license.
Again, we’ll use SPARQL in our examples since it’s a W3C standard.
Resource Description Framework (RDF)
• The RDF (Resource Description Framework) model from the World Wide Web Consortium (W3C)
provides a way to link resources on the web (people, places, things) using the concept of “triples”.
• This linking structure forms a directed, labeled graph, where the edges represent the named link
between two resources, represented by the graph nodes.
RDF Triples: Subject -- Predicate --> Object
RDF Triple Example
Cynthia Fido
Is Owner Of
<Cynthia> <Owner Of> <Fido>
Reference
• Brackets indicate individual references in RDF. Note that these are
defined by URIs in RDF, but have been simplified for this example.
Subject Predicate Object
RDF Triples
<Cynthia> <type> <Person> .
<Fido> <type> <Dog> .
<Cynthia> <hasName> “Cynthia Smith” .
<Fido> <hasName> “Fido” .
<Cynthia> <ownerOf> <Fido> .
Legend: <Person> and <Dog> are Classes; <Cynthia> and <Fido> are Instances; “Cynthia Smith” and “Fido” are Literals.
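To make the triple structure concrete, the five statements above can be held as plain tuples and filtered by position. This Python sketch is a simplification rather than a real RDF store: the `match` helper is hypothetical, names stand in for URIs, and it does not distinguish the literal “Fido” from the resource <Fido> the way RDF does.

```python
# Each triple is (subject, predicate, object), as on the slide.
TRIPLES = [
    ("Cynthia", "type", "Person"),
    ("Fido", "type", "Dog"),
    ("Cynthia", "hasName", "Cynthia Smith"),
    ("Fido", "hasName", "Fido"),
    ("Cynthia", "ownerOf", "Fido"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching the given positions; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


owned_by_cynthia = match(TRIPLES, s="Cynthia", p="ownerOf")
typed_things = match(TRIPLES, p="type")
```

Adding a new kind of fact means appending another tuple with a new predicate; no table definition has to change first.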
RDF Triple Graphical Representation
• RDF triples can be intuitively visualized graphically
[Figure: graph of the triples above. <Cynthia> is linked to <Person> by <type>, to “Cynthia Smith” by <hasName>, and to <Fido> by <ownerOf>; <Fido> is linked to <Dog> by <type> and to “Fido” by <hasName>.]
Logical Groupings
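@prefix example: <http://example.org/example#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

# Statements about Cynthia, grouped under one subject
example:Cynthia rdf:type example:Person ;
    example:hasName "Cynthia Smith" ;
    example:ownerOf example:Fido .

# Statements about Fido
example:Fido rdf:type example:Dog ;
    example:hasName "Fido" .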
• A Person has a name
• A Person can be an owner
• A Dog has a name
Ontologies
• An ontology is a data model of sorts to describe the “things” in RDF data.
• Two types of languages include:
• OWL (W3C Web Ontology Language): is a Semantic Web language designed to represent rich and complex
knowledge about things, groups of things, and relations between things.
• RDFS (RDF Schema): is a general-purpose language for representing simple RDF vocabularies. It is
considered a precursor to OWL.
• For example:
RDFS
• People have Names
• People can own kinds of things
• Pets can be owned
• A dog is a pet
• Dogs can have names
OWL can be more Expressive
• A Mother is the intersection of (Parent, Woman)
• This Family ontology links with the Person ontology
(meta-meta-metadata)
• Etc.
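An ontology statement such as “A dog is a pet” lets software conclude things the data never states. The Python sketch below imitates RDFS-style subclass inference in a very reduced form. The `SUBCLASS_OF` mapping and the `types_of` helper are hypothetical, and a real reasoner handles far more than a single-parent class chain.

```python
# Instance data: only says that Fido is a Dog.
TRIPLES = [
    ("Fido", "type", "Dog"),
    ("Cynthia", "type", "Person"),
]

# Ontology: "A dog is a pet".
SUBCLASS_OF = {"Dog": "Pet"}


def types_of(instance, triples, subclass_of):
    """Return stated types plus every superclass reachable from them."""
    found = {o for s, p, o in triples if s == instance and p == "type"}
    frontier = list(found)
    while frontier:
        cls = frontier.pop()
        parent = subclass_of.get(cls)
        if parent is not None and parent not in found:
            found.add(parent)
            frontier.append(parent)
    return found


fido_types = types_of("Fido", TRIPLES, SUBCLASS_OF)
```

A query for everything that is a Pet would now return Fido even though no triple says so directly.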
Ontologies help Define Queries
Ontology
• People have Names
• People can own kinds of things
• Pets can be owned
• A dog is a pet
• Dogs can have names
Query
• Show me all of the People who Own Dogs
Putting Ontologies & Queries Together
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX example: <http://example.org/example#>
SELECT ?name
WHERE {
  ?person rdf:type example:Person ;
          example:hasName ?name ;
          example:ownerOf ?pet .
  ?pet rdf:type example:Dog .
}
-> RESULT “Cynthia Smith”
How the query is built:
1. Define Variables: ?person, ?name, ?pet
2. Write out the Graph using Variables (the WHERE block above)
3. Query across the Graph
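What a query engine does with the WHERE block can be approximated in a few lines: try every triple against each pattern and keep the variable bindings that stay consistent. The Python sketch below is a toy matcher, not SPARQL. The `unify` and `solve` helpers are hypothetical, and names stand in for URIs as in the simplified slides.

```python
TRIPLES = [
    ("Cynthia", "type", "Person"),
    ("Fido", "type", "Dog"),
    ("Cynthia", "hasName", "Cynthia Smith"),
    ("Fido", "hasName", "Fido"),
    ("Cynthia", "ownerOf", "Fido"),
]

# The graph written out using variables (terms starting with "?").
PATTERNS = [
    ("?person", "type", "Person"),
    ("?person", "hasName", "?name"),
    ("?person", "ownerOf", "?pet"),
    ("?pet", "type", "Dog"),
]


def unify(pattern, triple, binding):
    """Extend a binding so the pattern equals the triple, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None          # variable already bound to something else
        elif term != value:
            return None              # constant does not match
    return result


def solve(patterns, triples):
    """Return every binding that satisfies all patterns."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = unify(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


names = [b["?name"] for b in solve(PATTERNS, TRIPLES)]
```

For this data the only surviving binding has ?person as Cynthia and ?pet as Fido, which gives the same single name as the slide's result.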
Summary
• Graph Databases provide powerful enterprise-wide association using simple constructs
• “Thing Relates to Thing”
• Relationships are first class constructs
• The best-suited enterprise use cases are those that focus on interrelationships between data points
• Social Networks
• Fraud Detection
• Recommendation Engines
• Enterprise Knowledge Graph
• Data Modeling & Metadata are supported by simple constructs
• Data structures through Triples: Subject, Predicate, Object
• Semantics through Ontologies (e.g. OWL)
• Queries through SPARQL and other methods
About Global Data Strategy, Ltd
• Global Data Strategy is an international information management consulting company that specializes
in the alignment of business drivers with data-centric technology.
• Our passion is data, and helping organizations enrich their business opportunities through data and
information.
• Our core values center around providing solutions that are:
• Business-Driven: We put the needs of your business first, before we look at any technology solution.
• Clear & Relevant: We provide clear explanations using real-world examples.
• Customized & Right-Sized: Our implementations are based on the unique needs of your organization’s
size, corporate culture, and geography.
• High Quality & Technically Precise: We pride ourselves in excellence of execution, with years of
technical expertise in the industry.
Data-Driven Business Transformation
Business Strategy
Aligned With
Data Strategy
Visit www.globaldatastrategy.com for more information
Contact Info
• Email: donna.burbank@globaldatastrategy.com
• Twitter: @donnaburbank
@GlobalDataStrat
• Website: www.globaldatastrategy.com
Questions?
Thoughts? Ideas?

Data Modeling & Metadata for Graph Databases

  • 1.
    5 MINUTE OVERVIEW STARDOG ENTERPRISEKNOWLEDGE GRAPH stardog.com
  • 2.
    U N CO N N E C T E D D ATA I S A L I A B I L I T Y
  • 3.
    E N TE R P R I S E S N E E D F L E X I B L E , R E U S A B L E D ATA O N D E M A N D , W I T H L E S S D I S R U P T I O N A N D O V E R H E A D
  • 4.
    K N OW L E D G E G R A P H I S T H E A N S W E R F L E X I B L E R E U S A B L E A C C R E T I V E
  • 5.
    K N OW L E D G E G R A P H = K N O W L E D G E T O O L K I T + G R A P H D B
  • 6.
    W H AT' S A K N O W L E D G E T O O L K I T ? V I RT U A L G R A P H S B U I L D K N O W L E D G E A C R O S S S I L O S B U S I N E S S L O G I C B U I L D S R E U S A B L E , L O G I C A L R E A S O N I N G I N T O T H E G R A P H M A C H I N E L E A R N I N G I N T E G R AT E S S TAT I S T I C A L R E A S O N I N G I N T E G R I T Y C O N S T R A I N T VA L I D AT I O N E M P O W E R S D ATA S TA N D A R D S
  • 7.
    K N OW L E D G E = D ATA P L U S R E A S O N I N G FA C T C O U N T: 4 E X P L I C I T FA C T S Inferno Gareth Edwards Rogue One Felicity Jones Tom Hanks actor director actor actor
  • 8.
    K N OW L E D G E = D ATA P L U S R E A S O N I N G actorOf inverseOf actor directorOf inverseOf director actorOf subPropertyOf workedOn directorOf subPropertyOf workedOn coworker propertyChain (workedOn [inverseOf workedOn]) coworker subPropertyOf connectedTo connectedTo a TransitiveProperty Inferno Gareth Edwards Rogue One Felicity Jones Tom Hanks actor director actor actor actorOf actorOf directorOf coworker connectedTo coworker connectedTo connectedTo , workedOn , workedOn , workedOn FA C T C O U N T: 1 5 E X P L I C I T / I M P L I C I T FA C T S B U S I N E S S L O G I C T H AT B E T T E R E X P L A I N S T H E D O M A I N
  • 9.
    K N OW L E D G E G R A P H S C O N N E C T A L L D ATA C O N N E C T I N G A L L D ATA C H A N G E S E V E RY T H I N G
  • 10.
    T H AN K Y O U A . J . C O O K , N O R T H A M E R I C A N S A L E S A J @ S TA R D O G . C O M
  • 11.
    Data Modeling &Metadata for Graph Databases Donna Burbank Global Data Strategy Ltd. Lessons in Data Modeling DATAVERSITY Series July 27th, 2017
  • 12.
    Global Data Strategy,Ltd. 2017 Donna Burbank Donna is a recognised industry expert in information management with over 20 years of experience in data strategy, information management, data modeling, metadata management, and enterprise architecture. Her background is multi- faceted across consulting, product development, product management, brand strategy, marketing, and business leadership. She is currently the Managing Director at Global Data Strategy, Ltd., an international information management consulting company that specializes in the alignment of business drivers with data-centric technology. In past roles, she has served in key brand strategy and product management roles at CA Technologies and Embarcadero Technologies for several of the leading data management products in the market. As an active contributor to the data management community, she is a long time DAMA International member, Past President and Advisor to the DAMA Rocky Mountain chapter, and was recently awarded the Excellence in Data Management Award from DAMA International in 2016. She was on the review committee for the Object Management Group’s (OMG) Information Management Metamodel (IMM) and the Business Process Modeling Notation (BPMN). Donna is also an analyst at the Boulder BI Train Trust (BBBT) where she provides advices and gains insight on the latest BI and Analytics software in the market. She has worked with dozens of Fortune 500 companies worldwide in the Americas, Europe, Asia, and Africa and speaks regularly at industry conferences. She has co-authored two books: Data Modeling for the Business and Data Modeling Made Simple with ERwin Data Modeler and is a regular contributor to industry publications. She can be reached at donna.burbank@globaldatastrategy.com Donna is based in Boulder, Colorado, USA. 2 Follow on Twitter @donnaburbank Today’s hashtag: #LessonsDM
  • 13.
    Global Data Strategy,Ltd. 2017 Lessons in Data Modeling Series • January 26th How Data Modeling Fits Into an Overall Enterprise Architecture • February 23rd Data Modeling and Business Intelligence • March Conceptual Data Modeling – How to Get the Attention of Business Users • April The Evolving Role of the Data Architect – What does it mean for your Career? • May Data Modeling & Metadata Management • June Self-Service Data Analysis, Data Wrangling, Data Munging, and Data Modeling • July Data Modeling & Metadata for Graph Databases • August Data Modeling & Data Integration • September Data Modeling & MDM • October Agile & Data Modeling – How Can They Work Together? • December Data Modeling, Data Quality & Data Governance 3 This Year’s Line Up
  • 14.
    Global Data Strategy,Ltd. 2017 Word from our Sponsor 4 Stardog Enterprise Knowledge Graph www.stardog.com
  • 15.
    Global Data Strategy,Ltd. 2017 Agenda • What is a Graph Database • Use Cases for Graph Databases • Data Modeling & Metadata for Graph Databases 5 What we’ll cover today
  • 16.
    Global Data Strategy,Ltd. 2017 What is a Graph Database? • A graph database uses a set of nodes, edges, and properties to represent and store data. • With graph databases, the relationships between data points often matter more than the individual points themselves. In order to leverage those data relationships, your organization needs a database technology that stores • These relationships can help you discover new insights from your data. 6
  • 17.
    Global Data Strategy,Ltd. 2017 Graph Database = Thing Relates to Thing 7
  • 18.
    Global Data Strategy,Ltd. 2017 Graph Database = Thing Relates to Thing 8 Node Vertice Edge Relationship The more formal way of referring to “thing relates to thing” is “Nodes & Edges”, “Vertices & Relationships”, etc.
  • 19.
    Global Data Strategy,Ltd. 2017 Graph Databases Mirror the Way We Think 9 Squirrel! I should go visit Mary I wonder how her brother John is doing? Is he still dating Stephanie? …In the mind, as in data, there are always random data points… Do they still have that house at the Lake? Riding their boats on the lake was great. Remember when John crashed the boat? Like my toy as a child. Graph databases can be intuitive to many, since they mirror the way the human brain typically thinks – through Association.
  • 20.
    Global Data Strategy,Ltd. 2017 “Traditional” way of Looking at the World: Hierarchies • Carolus Linnaeus in 1735 established a hierarchy/taxonomy for organizing and identifying biological systems. Kingdom Phylum Class Order Family Genus Species
  • 21.
    Global Data Strategy,Ltd. 2017 “New” Way of Looking at the World - Emergence In philosophy, systems theory, science, and art, emergence is the way complex systems and patterns arise out of a multiplicity of relatively simple interactions. - Wikipedia
  • 22.
    Global Data Strategy,Ltd. 2017 Graph Databases Combine Flexibility w/ Structure & Meaning • In many ways, graph databases provide the “best of both worlds”. 12 Flexibility of the “New World” of Discovery & “Emergence” Structure & Meaning of the “Old World” through Ontologies+
  • 23.
    Global Data Strategy,Ltd. 2017 It’s All About Relationships • In graph databases, relationships are first class constructs. • Rather ironically, relational databases lack relationships. • In relational databases, relationships are enforced through joins and constraints. • NoSQL (e.g. Key Value) databases are also weak at supporting relationships. 13 “A relational database isn’t about relationships, it’s about constraints.” – Karen Lopez Customer Account Is Owner Of <Customer> <Owner Of> <Account>
  • 24.
    14 Use Cases forGraph Databases
  • 25.
    Global Data Strategy,Ltd. 2017 Social Networks 15 Donna Sad, Lonely Person who doesn’t like data Who are the cool kids? i.e. People linked with Donna
  • 26.
    Global Data Strategy,Ltd. 2017 X Degrees of Separation – “The Bacon Number” • What’s Audrey Hepburn’s “Bacon Number”? i.e. degrees of separation/relation to actor Kevin Bacon • As always, metadata and data quality are important., i.e Which Audrey Hepburn? 16Courtesy of oracleofbacon.org
  • 27.
    Global Data Strategy,Ltd. 2017 Fraud Detection in Online Transactions • Online transactions typically have certain identifiers, e.g. User ID, IP address, geo location, tracking cookie, credit card number, etc. • Graph patterns can help detect fraud, e.g. • The more interconnections exist among identifiers, the greater the cause for concern. • Typically they would be 1:1. • Some variations may occur, e.g. Multiple credit cards with one person. Families using same machine, etc. • Large and tightly-knit graphs are very strong indicators that fraud is taking place. • Triggers can be put into place so that these patterns are uncovered before they cause damage. 17 IP1 IP1 IP1 IP1 IP1 IP1 IP1 IP1 IP1 IP1 CC1 CC2 CC3 CC4 CC5 CC6 CC7 CC8 CC9 CC10 CC11 CC12 CC13 CC14 CC15 CC16 CC17 Fraud? FamilyPersonal & Business Card
  • 28.
    Global Data Strategy,Ltd. 2017 Recommendation Engines • Recommendation Engines are familiar to most of us who do any online shopping. • These engines can be powered by a graph database, e.g. • Capture a customer’s browsing behavior and demographics • Combine those with their buying history to provide relevant recommendations 18
  • 29.
    Global Data Strategy,Ltd. 2017 Data Quality & Volume Matters • Recommendation engines are based on evaluating data sets. If those data sets are faulty or of poor quality, your results will be flawed. • Especially if the data sets are small 19
  • 30.
    Global Data Strategy,Ltd. 2017 Master Data Management (MDM) • Master Data Management (MDM) is the practice of identifying, cleansing, storing & governance core data assets of the organization (e.g. customer, product, etc.) • There are many architectural approaches to MDM. Two are the following: 20 Centralized -- Commonly Relational Virtualized/Registry – Commonly Graph MDM Virtualization Layer • Core data stored in a common schema in a centralized “hub”. • Used as a common reference for operational systems, DW, etc. • Data remains in source systems. • Referenced through a common virtualization layer. BOTH require the same core foundation of data quality, parsing & matching, semantic meaning, data governance, etc. in order to be successful… and that’s usually the hardest stuff.
  • 31.
    Global Data Strategy,Ltd. 2017 21 When you have a Hammer, everything looks like a nail i.e. Data Warehouses serve a particular purpose for aggregating & summarizing data. Not ideal for graph databases. Graph Databases for Data Warehousing
  • 32.
    Global Data Strategy,Ltd. 2017 Data Warehousing & Enterprise Knowledge Graph 22 Data Warehouse …Show me Total Sales by Region and by Customer each month in 2017 Enterprise Knowledge Graph Relational & Dimensional data model Graph data model …Who are my most influential customers. (with the most connections)
  • 33.
    Global Data Strategy,Ltd. 2017 Data Management & Ballroom Dancing “First you dance with yourself, then with your partner, then you dance with the room.” 23
  • 34.
    Global Data Strategy,Ltd. 2017 An Enterprise Knowledge Graph Provides a Holistic View of the Organization through Relationships 24 “First you dance with yourself, then with your partner, then you dance with the room.” Customer Data Data Quality & Semantics are important for core enterprise data assets. Name: Audrey Hepburn DOB: May 4, 1929 Current Customer: No But the true value is in the interrelationships between data assets. Mother of Name: Luca Dotti DOB: February 8, 1970 Current Customer: Yes Purchased Yacht Insurance Purchased Home Insurance Filed a Claim
  • 35.
    25 Data Modeling & Metadatafor Graph Databases
  • 36.
    Global Data Strategy,Ltd. 2017 Data Modeling for Graph Databases • There are several dominant ways to model graph databases. Two popular ones include: • Resource Description Language (RDF) Triples • Labeled Property Graph 26 Labeled Property Graph • Made up of nodes, relationships, properties & labels • Sample Query language: Cypher • Sample Vendor: Neo4J Resource Description Language (RDF) Triples • Made up of subject, predicate object triples • Sample Query: SPARQL • Sample Vendor: Stardog • Both have a close affinity between logical & physical models • i.e. We already think in “thing relates to thing” • In the following slides, we’ll use the RDF example, since that is a W3C Open Standard.
  • 37.
    Global Data Strategy,Ltd. 2017 Graph Query Languages • Unlike relational databases, where SQL is a general standard, there are a number of query language options available for graph databases: • SPARQL: is SQL-like declarative query language that was created by W3C to query RDF (Resource Description Framework) graphs. • Cypher: is also a declarative query language that resembles SQL. Created by Neo4J • GraphQL: is a query language for APIs. Isn’t specific to graph databases, but can be used for them. Developed by Facebook. • Gremlin: is a graph traversal language developed for Apache TinkerPop™, an open source, vendor-agnostic, graph computing framework distributed under the Apache2 license. 27 Again, we’ll use SPARQL in our examples since it’s a W3C standard.
  • 38.
    Global Data Strategy,Ltd. 2017 Resource Description Framework (RDF) • The RDF (Resource Description Framework) model from the World Wide Web Consortium (W3C) provides a way to link resources on the web (people, places, things) using the concept of “triples”. • This linking structure forms a directed, labeled graph, where the edges represent the named link between two resources, represented by the graph nodes. 28 Subject Object Predicate RDF Triples
  • 39.
    RDF Triple Example • “Cynthia Is Owner Of Fido” as a triple: <Cynthia> <Owner Of> <Fido> (Subject, Predicate, Object) • Angle brackets indicate individual references in RDF. Note that these are defined by URIs in RDF, but have been simplified for this example.
  • 40.
    RDF Triples • <Cynthia> <type> <Person> . <Fido> <type> <Dog> . <Cynthia> <hasName> “Cynthia Smith” . <Fido> <hasName> “Fido” . <Cynthia> <ownerOf> <Fido> . • Here <Person> and <Dog> are classes, <Cynthia> and <Fido> are instances, and “Cynthia Smith” and “Fido” are literals.
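    The five triples on this slide can be written down directly as data. The sketch below is plain Python, not an RDF library; the `Triple` and `Literal` types are hypothetical helpers introduced only to separate the three roles (class, instance, literal) named on the slide.

```python
from typing import NamedTuple


class Literal(str):
    """Marks a plain value (e.g. a name) as opposed to a resource reference."""


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


triples = [
    Triple("Cynthia", "type", "Person"),
    Triple("Fido", "type", "Dog"),
    Triple("Cynthia", "hasName", Literal("Cynthia Smith")),
    Triple("Fido", "hasName", Literal("Fido")),
    Triple("Cynthia", "ownerOf", "Fido"),
]

# Classes are the objects of "type" triples; instances are their subjects.
classes = {t.obj for t in triples if t.predicate == "type"}
instances = {t.subject for t in triples if t.predicate == "type"}
# Literals are the values wrapped in the Literal marker type.
literals = {t.obj for t in triples if isinstance(t.obj, Literal)}
```

    Note that the instance <Fido> and the literal “Fido” are different things: one is a reference to the dog, the other is just the text of its name.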
  • 41.
    RDF Triple Graphical Representation • RDF triples can be intuitively visualized graphically [Figure: graph with the edges <Cynthia> <type> <Person>; <Cynthia> <hasName> “Cynthia Smith”; <Cynthia> <ownerOf> <Fido>; <Fido> <type> <Dog>; <Fido> <hasName> “Fido”]
  • 42.
    Logical Groupings • @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . @prefix example: <http://example.org/example#> . example:Cynthia rdf:type example:Person ; example:hasName "Cynthia Smith" ; example:ownerOf example:Fido . example:Fido rdf:type example:Dog ; example:hasName "Fido" . • A Person has a name • A Person can be an owner • A Dog has a name
  • 43.
    Ontologies • An ontology is a data model of sorts to describe the “things” in RDF data. • Two ontology languages are: • OWL (W3C Web Ontology Language): a Semantic Web language designed to represent rich and complex knowledge about things, groups of things, and relations between things. • RDFS (RDF Schema): a general-purpose language for representing simple RDF vocabularies. It is considered a precursor to OWL. • For example, RDFS can express: • People have Names • People can own kinds of things • Pets can be owned • A dog is a pet • Dogs can have names • OWL can be more expressive: • A Mother is the intersection of (Parent, Woman) • This Family ontology links with the Person ontology (meta-meta-metadata) • Etc.
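    To show what a statement like “A dog is a pet” buys you, the following plain-Python sketch applies the RDFS subclass rule: if an instance has a type, and that type is a subclass of another class, the instance also has the parent type. The function name `infer_types` and the simplified predicate names `type` and `subClassOf` are assumptions of this sketch, standing in for `rdf:type` and `rdfs:subClassOf`; a real reasoner does considerably more.

```python
def infer_types(triples):
    """Add (instance, "type", superclass) triples implied by subClassOf."""
    inferred = set(triples)
    changed = True
    # Repeat until no new triple appears, so chains of subclasses are followed.
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p != "type":
                continue
            for c, q, parent in list(inferred):
                if q == "subClassOf" and c == o:
                    new = (s, "type", parent)
                    if new not in inferred:
                        inferred.add(new)
                        changed = True
    return inferred


triples = {
    ("Dog", "subClassOf", "Pet"),  # ontology: "A dog is a pet"
    ("Fido", "type", "Dog"),       # data: Fido is a Dog
}
inferred = infer_types(triples)
```

    After inference the set also contains the fact that Fido is a Pet, even though nobody stated it explicitly.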
  • 44.
    Ontologies help Define Queries • Ontology: People have Names; People can own kinds of things; Pets can be owned; A dog is a pet; Dogs can have names • Query: Show me all of the People who Own Dogs
  • 45.
    Putting Ontologies & Queries Together • Define Variables: SELECT ?name • Write out the Graph using Variables: WHERE { ?person type Person ; hasName ?name ; ownerOf ?pet . ?pet type Dog . } • Query across the Graph -> RESULT “Cynthia Smith”
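    The idea of “writing out the graph using variables” can be sketched as a tiny pattern matcher over the example triples. This is plain Python for illustration only: the `match` function is hypothetical and far simpler than a real SPARQL engine, and resource names are simplified to bare strings as on the slides.

```python
def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Variable: bind it, or check it agrees with an earlier binding.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                # Constant: must equal the triple's value in this position.
                break
        else:
            # This pattern matched; continue with the remaining patterns.
            yield from match(rest, triples, candidate)


triples = [
    ("Cynthia", "type", "Person"),
    ("Fido", "type", "Dog"),
    ("Cynthia", "hasName", "Cynthia Smith"),
    ("Fido", "hasName", "Fido"),
    ("Cynthia", "ownerOf", "Fido"),
]

# The WHERE clause from the slide, one pattern per triple.
query = [
    ("?person", "type", "Person"),
    ("?person", "hasName", "?name"),
    ("?person", "ownerOf", "?pet"),
    ("?pet", "type", "Dog"),
]

# SELECT ?name: keep only the ?name value from each solution.
names = [b["?name"] for b in match(query, triples)]
print(names)  # ['Cynthia Smith']
```

    Each variable must take the same value everywhere it appears, which is what ties the person who has the name to the person who owns the dog.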
  • 46.
    Summary • Graph Databases provide powerful enterprise-wide association using simple constructs • “Thing Relates to Thing” • Relationships are first class constructs • The best-suited enterprise use cases are those that focus on interrelationships between data points • Social Networks • Fraud Detection • Recommendation Engines • Enterprise Knowledge Graph • Data Modeling & Metadata are supported by simple constructs • Data structures through Triples: Subject, Predicate, Object • Semantics through Ontologies (e.g. OWL) • Queries through SPARQL and other methods
  • 47.
    About Global Data Strategy, Ltd • Global Data Strategy is an international information management consulting company that specializes in the alignment of business drivers with data-centric technology. • Our passion is data, and helping organizations enrich their business opportunities through data and information. • Our core values center around providing solutions that are: • Business-Driven: We put the needs of your business first, before we look at any technology solution. • Clear & Relevant: We provide clear explanations using real-world examples. • Customized & Right-Sized: Our implementations are based on the unique needs of your organization’s size, corporate culture, and geography. • High Quality & Technically Precise: We pride ourselves on excellence of execution, with years of technical expertise in the industry. • Data-Driven Business Transformation: Business Strategy Aligned With Data Strategy • Visit www.globaldatastrategy.com for more information
  • 48.
    Contact Info • Email: donna.burbank@globaldatastrategy.com • Twitter: @donnaburbank @GlobalDataStrat • Website: www.globaldatastrategy.com
  • 49.
    Lessons in Data Modeling Series: This Year’s Line Up • January 26th: How Data Modeling Fits Into an Overall Enterprise Architecture • February 23rd: Data Modeling and Business Intelligence • March: Conceptual Data Modeling – How to Get the Attention of Business Users • April: The Evolving Role of the Data Architect – What does it mean for your Career? • May: Data Modeling & Metadata Management • June: Self-Service Data Analysis, Data Wrangling, Data Munging, and Data Modeling • July: Data Modeling & Metadata for Graph Databases • August: Data Modeling & Data Integration • September: Data Modeling & MDM • October: Agile & Data Modeling – How Can They Work Together? • December: Data Modeling, Data Quality & Data Governance
  • 50.
    Questions? Thoughts? Ideas?