Presentation given by Atif Khan, VP AI and Data Science, Messagepoint at the August meetup event of the Waterloo Data Science and Data Engineering group.
2. Motivation
Real-world entities are complex
❖ multi-dimensional
❖ rich interactions
[Figure: example relationships between entities: fan_of, like, colleague, reports_to, wife]
3. Motivation
Entities are described using multiple heterogeneous datasets
❖ entity attributes/properties
❖ entity-to-entity interactions
Understanding an entity demands linking/joining data across datasets
[Figure: scale of raw information]
8. Graph Representation of Data
An entity is modelled as a node
[Figure: nodes A and B]
9. Graph Representation of Data
An entity is modelled as a node
An entity-to-entity relationship is modelled as an edge
edges can be directed (son_of) or non-directed (married_to)
[Figure: nodes A and B]
10. Graph Representation of Data
In a property graph, nodes (entities) & edges (relationships) can define their own properties
[Figure: nodes A and B joined by a married_to edge; properties shown: name: john, age: 24; name: mary, age: 22, gender: F, profession: singer; start: 1/1/1974]
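A minimal sketch of the property graph on this slide, using plain Python dataclasses rather than any particular graph database. The class names `Node` and `Edge` and the helper `neighbours` are hypothetical; the property values come from the slide, but which node owns `gender` and `profession` is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph node (entity) carrying its own key/value properties."""
    node_id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A labelled edge (relationship) that can also carry properties."""
    source: str
    target: str
    label: str
    directed: bool = True
    properties: dict = field(default_factory=dict)


# Values are taken from the slide; assigning gender/profession to B is assumed.
a = Node("A", {"name": "john", "age": 24})
b = Node("B", {"name": "mary", "age": 22, "gender": "F", "profession": "singer"})
married = Edge("A", "B", "married_to", directed=False,
               properties={"start": "1/1/1974"})

nodes = {n.node_id: n for n in (a, b)}
edges = [married]


def neighbours(node_id, label):
    """Return ids of nodes reachable from node_id over edges with this label."""
    found = []
    for e in edges:
        if e.label != label:
            continue
        if e.source == node_id:
            found.append(e.target)
        elif not e.directed and e.target == node_id:
            # A non-directed edge can be followed from either end.
            found.append(e.source)
    return found


print(nodes[neighbours("B", "married_to")[0]].properties["name"])  # john
```

Because married_to is stored as non-directed, the lookup works from either spouse; a directed edge such as son_of would only be followed from its source.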
11. Graph Representation of Data
An information schema is also a (sub)graph
[Figure: A married_to B]
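12. Graph Representation of Data
An information schema is also a (sub)graph
[Figure: instance nodes A and B joined by married_to, each instance_of a schema node P; the schema nodes P and P are joined by married_to with domain: P, range: P and a start: property]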
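One way to read this slide is that the schema lives in the same graph as the data and can be used to check instance edges. The sketch below illustrates that reading only; the dictionary layout and the function name `conforms` are assumptions, and the slide shows only the letter P for the type.

```python
# Schema-level graph: the relationship type married_to connects type P to type P
# and may carry a "start" property. The meaning of P is not given on the slide.
schema_edges = {
    "married_to": {"domain": "P", "range": "P", "properties": {"start"}},
}

# Instance-level graph: each instance node points at its type via instance_of.
instance_of = {"A": "P", "B": "P"}


def conforms(source, label, target, properties):
    """Check one instance edge against the schema edge with the same label."""
    rule = schema_edges.get(label)
    if rule is None:
        return False  # relationship type is not described by the schema
    return (
        instance_of.get(source) == rule["domain"]
        and instance_of.get(target) == rule["range"]
        and set(properties) <= rule["properties"]
    )


print(conforms("A", "married_to", "B", {"start": "1/1/1974"}))  # True
print(conforms("A", "married_to", "C", {}))                      # False
```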
23. Graph Traversal (Query)
How many relatives are there?
A: 3
[Figure: graph over john, dave, april, arthur, mary and alice with edge labels brother (×2), wife, friend (×2) and relative (×3)]
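The query on this slide can be sketched as a one-hop traversal that follows only edges implying a relative relationship. The slide lists the names and edge labels but not which edge connects which pair, so the wiring below is hypothetical, as is the rule that brother and wife edges count as relative.

```python
from collections import defaultdict

# Hypothetical wiring: the slide does not show which person each edge connects.
edges = [
    ("john", "brother", "dave"),
    ("john", "brother", "arthur"),
    ("john", "wife", "mary"),
    ("john", "friend", "april"),
    ("john", "friend", "alice"),
]

# Assumed rule: brother and wife edges also count as "relative" edges.
RELATIVE_LABELS = {"brother", "wife"}

adjacency = defaultdict(list)
for source, label, target in edges:
    adjacency[source].append((label, target))


def relatives(person):
    """Follow only the outgoing edges whose label implies a relative."""
    return sorted(t for label, t in adjacency[person] if label in RELATIVE_LABELS)


print(relatives("john"))       # ['arthur', 'dave', 'mary']
print(len(relatives("john")))  # 3
```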
27. Information >> Knowledge
Will Smith @wsmith
Finished registering for KDD 2018 in London
http://www.kdd.org/kdd2018/
William Smith shared a link
I am off to Knowledge Discovery and Data mining conference in London. Looking forward to Michael Jordan’s keynote address next week.
35. Information >> Knowledge
Insights
❏ William Smith is interested in data mining (academic/data scientist)
[Figure: node (wsmith, william smith) registered_for node (KDD 2018, Knowledge Discovery & Data Mining 2018)]
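The merged node shown here (wsmith and william smith treated as one entity) can be sketched as a union of triples after alias resolution. The triples are hand-written, the labels `located_in` and `attends` are invented for illustration, and the alias table stands in for whatever entity-resolution step the talk actually used.

```python
# Triples extracted from each post; the extraction step itself is out of scope
# and these triples are hand-written assumptions.
post_1 = [
    ("wsmith", "registered_for", "KDD 2018"),
    ("KDD 2018", "located_in", "London"),
]
post_2 = [
    ("william smith", "attends", "Knowledge Discovery & Data Mining 2018"),
    ("Michael Jordan", "keynote_speaker", "Knowledge Discovery & Data Mining 2018"),
]

# Hypothetical alias table mapping surface forms to one canonical node id.
ALIASES = {
    "wsmith": "william smith",
    "KDD 2018": "Knowledge Discovery & Data Mining 2018",
}


def canonical(name):
    """Resolve a mention to its canonical node id (identity if unknown)."""
    return ALIASES.get(name, name)


def merge(*graphs):
    """Union the triples of several graphs after resolving aliases."""
    combined = set()
    for graph in graphs:
        for s, p, o in graph:
            combined.add((canonical(s), p, canonical(o)))
    return combined


combined = merge(post_1, post_2)
for triple in sorted(t for t in combined if t[0] == "william smith"):
    print(triple)
```

After the merge, facts from both posts hang off the same two nodes, which is what makes insights from the combined graph possible.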
37. Information >> Knowledge
Insights from the combined graph
❏ KDD 2018 is a data mining conference
❏ KDD 2018 requires registration to attend
❏ Michael Jordan will be giving a keynote address at KDD 2018
❏ KDD 2018 URL (http://www.kdd.org/kdd2018/)
38. Information >> Knowledge
Inferred insights from the combined graph
❏ Michael Jordan is an influencer in the data mining research community
❏ Both Michael Jordan and William Smith will be in London during KDD 2018
40. Information >> Knowledge
Insights applied to other problems
❏ Which Michael Jordan and why?
Michael Jordan is an American former professional basketball player. He played 15 seasons in the National Basketball Association for the Chicago Bulls and Washington Wizards.
Michael Irwin Jordan is an American scientist, professor at the University of California, Berkeley and researcher in machine learning, statistics, and artificial intelligence.
[Figure: Michael Jordan keynote_speaker Knowledge Discovery & Data Mining 2018]
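The disambiguation question above can be illustrated with a crude word-overlap score between each candidate description and the terms found around the keynote_speaker edge. This is a stand-in technique, not the method used in the talk; the context terms and the function names `tokens` and `best_candidate` are assumptions.

```python
import re

# Candidate descriptions, shortened from the two texts on the slide.
candidates = {
    "Michael Jordan (basketball)": (
        "American former professional basketball player "
        "National Basketball Association Chicago Bulls"
    ),
    "Michael Irwin Jordan": (
        "American scientist professor researcher in "
        "machine learning statistics artificial intelligence"
    ),
}

# Terms attached to the graph neighbourhood of the keynote_speaker edge.
# The choice of terms is an assumption for this example.
context = (
    "keynote speaker knowledge discovery data mining conference "
    "research machine learning statistics"
)


def tokens(text):
    """Lower-case word set used as a crude bag-of-words representation."""
    return set(re.findall(r"[a-z]+", text.lower()))


def best_candidate(context_text, options):
    """Pick the candidate whose description shares the most words with the context."""
    ctx = tokens(context_text)
    scores = {name: len(ctx & tokens(desc)) for name, desc in options.items()}
    return max(scores, key=scores.get), scores


winner, scores = best_candidate(context, candidates)
print(winner)  # Michael Irwin Jordan
```

The graph supplies the context: without the keynote_speaker edge to a data mining conference, the name alone gives no basis for preferring either candidate.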
44. General Recipe
1. Represent information as graphs
a. represent not model (~schema on read)
b. entities are nodes in the graph
entity attributes = node properties
c. entity-to-entity relationships are edges in the graph
relationship attributes = edge properties
d. repeat for each domain (highly parallelizable)
46. General Recipe
1. Represent information as graphs
2. Define “cross-over” traversals/edges
a. algorithms (similarity, clustering, classification)
b. edges can be broadly described as
i. same_as : a measure of closeness
ii. member_of: entity-to-entity associations
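A same_as edge as "a measure of closeness" could be produced by any similarity algorithm. The sketch below uses Jaccard similarity over term sets as one simple choice; the two source graphs (`social`, `crm`), their node ids and terms, and the 0.5 threshold are all invented for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Nodes from two separately built graphs, described by bags of terms.
# The node ids and terms are invented for this illustration.
social = {"wsmith": {"will", "smith", "kdd", "london"}}
crm = {
    "william smith": {"william", "smith", "kdd", "london"},
    "wendy smythe": {"wendy", "smythe", "toronto"},
}


def same_as_edges(left, right, threshold=0.5):
    """Emit weighted same_as edges for cross-graph pairs above a threshold."""
    edges = []
    for l_id, l_terms in left.items():
        for r_id, r_terms in right.items():
            score = jaccard(l_terms, r_terms)
            if score >= threshold:
                edges.append((l_id, "same_as", r_id, round(score, 2)))
    return edges


print(same_as_edges(social, crm))
# [('wsmith', 'same_as', 'william smith', 0.6)]
```

Keeping the score on the edge, rather than collapsing the two nodes, leaves the decision about how close is close enough to the query.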
47. General Recipe
1. Represent information as graphs
2. Define “cross-over” traversals/edges
a. algorithms (similarity, clustering, classification)
b. edges can be broadly described as
c. cross-over traversals can be
i. probabilistic in nature
ii. query specific
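To make "probabilistic" and "query specific" concrete, the sketch below attaches a probability to each cross-over edge and lets every query choose its own minimum path probability. The edge list, the probabilities and the function name `traverse` are made up; multiplying edge probabilities along a path is one possible scoring rule, assumed here for simplicity.

```python
# Cross-over edges between graphs, each with a link probability.
# All ids and probabilities below are made up for the example.
cross_over = [
    ("wsmith", "same_as", "william smith", 0.92),
    ("wsmith", "same_as", "will smithers", 0.41),
    ("william smith", "member_of", "data mining community", 0.75),
]


def traverse(start, min_probability):
    """Follow cross-over edges depth-first, keeping targets whose path
    probability (product of edge probabilities) meets the query threshold."""
    reached = {}
    stack = [(start, 1.0)]
    while stack:
        node, path_probability = stack.pop()
        for source, _label, target, p in cross_over:
            combined = path_probability * p
            if source == node and combined >= min_probability:
                # Keep only the most probable path found to each target.
                if combined > reached.get(target, 0.0):
                    reached[target] = combined
                    stack.append((target, combined))
    return reached


strict = traverse("wsmith", min_probability=0.9)
loose = traverse("wsmith", min_probability=0.4)
print(sorted(strict))
print(sorted(loose))
```

A strict query keeps only the high-confidence same_as link, while a looser one also reaches weaker matches and the community membership behind them.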
48. General Recipe
1. Represent information as graphs
2. Define “cross-over” traversals/edges
3. Find the best projections
[Figure: two alternative projections, separated by OR]
49. General Recipe
1. Represent information as graphs
2. Define “cross-over” traversals/edges
3. Find the best projection
4. Automate graph construction
a. Use ML/IR/KE
b. probabilistic linkages
55. Acme Investment
Demographics
⅓ of all adults are
average age is 64
❖ median (women): 50
❖ median (men): 54
❖ first-time grandparent average age: 47
77% are married
Financial
control about ⅓ of the nation's assets
spend about $52 billion yearly on grandchildren (education: $32 billion, infant apparel: $3 billion)
give grandkids over $5 billion yearly in stocks & securities
57. Use Cases
❖ Data wrangling | ETL, ELT
❖ Scalable Ingestion
❖ Data Cleansing
❖ Knowledge Inference
❖ Contextual Parsing
❖ Data Imputation
❖ Preprocessing
❖ Data Governance
❖ Entity DeDup
❖ Community Discovery
❖ Fraud Detection
❖ Recommendation Systems
❖ Customer 360
❖ Customer Journey
58. In Summary
❖ Graphs are a flexible representation of real-world entities and their relationships.
❖ Graphs facilitate transforming data into insights.
❖ Graph creation & inference can be augmented & automated using AI.