Waterloo Data Science and Data Engineering Meetup - 2018-08-29

InSights@Scale
Building Knowledge Graphs
Over Heterogeneous Datasets
@__AtifKhan
Vice President AI & Data Science
Messagepoint Inc.

Motivation
Real-world entities are complex
❖ multi-dimensional
❖ rich interactions
[Figure: entities connected by relationships such as fan_of, like, colleague, reports_to and wife]

Motivation
Entities are described using multiple heterogeneous datasets
❖ entity attributes/properties
❖ entity-to-entity interactions
Understanding an entity demands linking/joining data across datasets
[Figure: scale of raw information]

Motivation
❖ heterogeneous datasets
❖ scale
❖ linking/joining
Traditional database systems are challenged across all three dimensions

Agenda
Core Concepts
Construction from raw data
General Framework
Knowledge Inference

Graph Representation of Data
An entity is modelled as a node
An entity-to-entity relationship is modelled as an edge
Edges can be directed (son_of) or undirected (married_to)
[Figure: two nodes, A and B, joined by an edge]

In a property graph, nodes (entities) & edges (relationships) can define their own properties
Node A: name: john, age: 24
Node B: name: mary, age: 22, gender: F, profession: singer
Edge A married_to B: start: 1/1/1974
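The property graph on this slide can be sketched in plain Python. This is a minimal illustration, not the API of any graph database: the `Node`, `Edge` and `PropertyGraph` names are hypothetical, and the assignment of properties to A, B and the edge follows the slide layout as best it can be read.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity; its attributes live in a free-form property map."""
    node_id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A relationship between two nodes, with its own properties."""
    source: str
    label: str
    target: str
    directed: bool = True
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)

    def add_edge(self, source, label, target, directed=True, **properties):
        # Refuse edges whose endpoints were never declared as nodes.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append(Edge(source, label, target, directed, properties))

    def neighbours(self, node_id, label):
        """Nodes reachable from node_id over edges with the given label."""
        found = []
        for e in self.edges:
            if e.label != label:
                continue
            if e.source == node_id:
                found.append(e.target)
            elif not e.directed and e.target == node_id:
                # Undirected edges can be walked from either end.
                found.append(e.source)
        return found


graph = PropertyGraph()
graph.add_node("A", name="john", age=24)
graph.add_node("B", name="mary", age=22, gender="F", profession="singer")
graph.add_edge("A", "married_to", "B", directed=False, start="1/1/1974")

print(graph.neighbours("B", "married_to"))  # ['A']
```

Because married_to is stored as undirected, the same edge answers the question from either spouse; a directed label such as son_of would only be followed from its source.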

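An information schema is also a (sub)graph
[Figure: instance nodes A and B, joined by married_to, are each linked by instance_of to the schema node P; the schema-level edge married_to runs from P to P with domain: P, range: P and a start: property]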
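One way to read "the schema is also a graph" is that instance edges can be checked against schema edges with an ordinary lookup. The sketch below assumes a simple dictionary encoding; `schema_edges`, `instance_of` and `violations` are hypothetical names, and the start value is borrowed from the earlier property-graph slide.

```python
# Schema layer: edge label -> allowed node type at each end.
schema_edges = {
    "married_to": {"domain": "P", "range": "P", "properties": ["start"]},
}

# Instance layer: instance_of edges tie data nodes to schema nodes.
instance_of = {"A": "P", "B": "P"}

instance_edges = [
    ("A", "married_to", "B", {"start": "1/1/1974"}),
]


def violations(edges, instance_of, schema_edges):
    """Return readable problems; an empty list means the data conforms."""
    problems = []
    for source, label, target, props in edges:
        rule = schema_edges.get(label)
        if rule is None:
            problems.append(f"{label}: not declared in the schema")
            continue
        if instance_of.get(source) != rule["domain"]:
            problems.append(f"{source} is not a {rule['domain']}")
        if instance_of.get(target) != rule["range"]:
            problems.append(f"{target} is not a {rule['range']}")
        for key in props:
            if key not in rule["properties"]:
                problems.append(f"{label}.{key}: unknown property")
    return problems


print(violations(instance_edges, instance_of, schema_edges))  # []
```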

Graph Traversal (Query)
query = graph traversal
[Figure: a graph of six people (john, dave, april, arthur, mary, alice) connected by two brother edges, one wife edge and two friend edges]
Who is John’s wife?
A: {alice}

Who is friends with whom?
A: {
alice:mary,
mary:april
}
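Both queries so far can be sketched as traversals over a list of labelled edges. This is an illustration only: the wife and friend edges follow the answers on the slides, while the endpoints of the two brother edges are assumed, since the extracted figure does not show which people they connect. The helper names are hypothetical.

```python
# (source, label, target) triples for the example graph.
edges = [
    ("john", "wife", "alice"),
    ("alice", "friend", "mary"),
    ("mary", "friend", "april"),
    ("john", "brother", "dave"),    # assumed endpoints
    ("dave", "brother", "arthur"),  # assumed endpoints
]


def out_neighbours(edges, start, label):
    """One-hop traversal: follow `label` edges leaving `start`."""
    return {t for s, lab, t in edges if s == start and lab == label}


def pairs(edges, label):
    """Pattern match: every (source, target) pair joined by `label`."""
    return sorted((s, t) for s, lab, t in edges if lab == label)


print(out_neighbours(edges, "john", "wife"))  # {'alice'}
print(pairs(edges, "friend"))  # [('alice', 'mary'), ('mary', 'april')]
```

The first query starts from a known node and follows one edge; the second has no starting node and instead matches an edge pattern across the whole graph.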

How many relatives are there?
relative: {brother, sister, husband, wife}
[Figure: brother, sister, husband and wife are each linked to relative by an is_a edge]
A: 3
[Figure: in the people graph, the two brother edges and the wife edge are each labelled relative]
Information >> Knowledge
[Figure: streams of raw binary data]

Will Smith @wsmith
Finished registering for KDD 2018 in London
http://www.kdd.org/kdd2018/
William Smith shared a link
I am off to Knowledge Discovery and Data mining conference in London. Looking forward to Michael Jordan’s keynote address next week.

[Figure: entities extracted from the first post: wsmith, KDD 2018, URL, London]

[Figure: relationships added between them: wsmith registered_for KDD 2018; KDD 2018 website URL; KDD 2018 held_at London]

[Figure: a second graph extracted from the second post: william smith attending Knowledge Discovery & Data Mining 2018, which is_a conference and is held_at London; Michael Jordan is its keynote_speaker]

[Figure: same_as edges link matching entities across the two graphs: wsmith and william smith, KDD 2018 and Knowledge Discovery & Data Mining 2018, London and London]

[Figure: each same_as edge carries a confidence score: 0.90, 0.98, 0.99]
[Figure: the merged graph. The node {wsmith, william smith} is registered_for and attending the node {KDD 2018, Knowledge Discovery & Data Mining 2018}, which is_a conference, is held_at London, has website URL and has keynote_speaker Michael Jordan]
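Collapsing nodes joined by high-confidence same_as edges can be sketched with a small union-find. This is one possible approach, not necessarily the one used in the talk. The `t:` and `p:` prefixes, the function name and the threshold are hypothetical, and the pairing of the three scores with the three same_as edges is an assumption, because the extracted slide lists the scores without saying which edge each belongs to.

```python
# same_as candidates between the two source graphs
# ("t:" = first post, "p:" = second post).
# ASSUMPTION: which score belongs to which pair is not recoverable
# from the slide; the pairing below is for illustration only.
same_as = [
    ("t:wsmith", "p:william smith", 0.90),
    ("t:KDD 2018", "p:Knowledge Discovery & Data Mining 2018", 0.98),
    ("t:London", "p:London", 0.99),
]

THRESHOLD = 0.85  # hypothetical cut-off; not from the talk

nodes = [
    "t:wsmith", "t:KDD 2018", "t:URL", "t:London",
    "p:william smith", "p:Knowledge Discovery & Data Mining 2018",
    "p:conference", "p:London", "p:Michael Jordan",
]


def merge_entities(nodes, same_as, threshold):
    """Group nodes linked by same_as edges scoring at least `threshold`."""
    parent = {n: n for n in nodes}

    def find(n):
        # Walk to the representative, halving the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b, score in same_as:
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


for group in merge_entities(nodes, same_as, THRESHOLD):
    print(sorted(group))
```

Raising the threshold above an edge's score leaves that pair unmerged, which is one way to keep the linkage probabilistic rather than committing to every candidate match.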

Insights
❏ William Smith is interested in data mining (academic/data scientist)
[Figure: {wsmith, william smith} registered_for KDD 2018]

Insights
❏ The KDD 2018 conference will be held in London
[Figure: KDD 2018 held_at London]

Insights from the combined graph
❏ KDD 2018 is a data mining conference
❏ KDD 2018 requires registration to attend
❏ Michael Jordan will be giving a keynote address at KDD 2018
❏ KDD 2018 URL (http://www.kdd.org/kdd2018/)

Inferred Insights from the combined graph
❏ Michael Jordan is an influencer in the data mining research community
❏ Both Michael Jordan and William Smith will be in London during KDD 2018

Insights applied to other problems
❏ Which Michael Jordan and why?
Michael Jordan is an American former professional basketball player. He played 15 seasons in the National Basketball Association for the Chicago Bulls and Washington Wizards.
Michael Irwin Jordan is an American scientist, professor at the University of California, Berkeley and researcher in machine learning, statistics, and artificial intelligence.
[Figure: Michael Jordan keynote_speaker Knowledge Discovery & Data Mining 2018]
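The disambiguation can be sketched as scoring each candidate by how well its fields overlap with the graph context around the mention. This is a simplified stand-in for whatever method the talk had in mind: the candidate fields are read off the two descriptions, the context comes from the combined graph, and the `related_to` background edges are an assumption added for the example.

```python
# Candidate entities and the fields named in their descriptions on the slide.
candidates = {
    "Michael Jordan (basketball player)": {"basketball"},
    "Michael Irwin Jordan (scientist)": {
        "machine learning", "statistics", "artificial intelligence",
    },
}

# Context from the combined graph: the mention is the keynote speaker
# of a data mining conference.
context_topics = {"data mining"}

# ASSUMED background edges (not on the slide): which fields are related.
related_to = {
    "data mining": {"machine learning", "statistics"},
}


def expand(topics, related_to):
    """Topics plus everything one related_to hop away."""
    expanded = set(topics)
    for topic in topics:
        expanded |= related_to.get(topic, set())
    return expanded


def rank(candidates, context_topics, related_to):
    """Order candidates by how many of their fields touch the context."""
    reachable = expand(context_topics, related_to)
    scores = {
        name: len(fields & reachable) for name, fields in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


for name, score in rank(candidates, context_topics, related_to):
    print(score, name)
```

The scientist scores 2 and the basketball player 0, which gives both an answer and a reason: the keynote edge places the mention in a data mining context.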

General Recipe
1. Represent information as graphs
   a. represent, not model (~schema on read)
   b. entities are nodes in the graph
      entity attributes = node properties
   c. entity-to-entity relationships are edges in the graph
      relationship attributes = edge properties
   d. repeat for each domain (highly parallelizable)
2. Define “cross-over” traversals/edges
   a. algorithms (similarity, clustering, classification)
   b. edges can be broadly described as
      i. same_as: a measure of closeness
      ii. member_of: entity-to-entity associations
   c. cross-over traversals can be
      i. probabilistic in nature
      ii. query specific
3. Find the best projection
   [Figure: two alternative projections of the graph, joined by OR]
4. Automate graph construction
   a. Use ML/IR/KE
   b. probabilistic linkages
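Step 2 of the recipe can be sketched with a similarity measure that proposes scored same_as edges between two domain graphs. The example uses token-level Jaccard similarity as a stand-in for the similarity, clustering or classification algorithms the slide mentions; the two record sets and all function names are hypothetical.

```python
from itertools import product


def tokens(record):
    """Lower-cased word set drawn from every attribute value."""
    words = set()
    for value in record.values():
        words |= set(str(value).lower().split())
    return words


def jaccard(a, b):
    """Set overlap in [0, 1]; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cross_over_edges(domain_a, domain_b, threshold):
    """Yield (id_a, 'same_as', id_b, score) for sufficiently close pairs."""
    pairs = product(domain_a.items(), domain_b.items())
    for (id_a, rec_a), (id_b, rec_b) in pairs:
        score = jaccard(tokens(rec_a), tokens(rec_b))
        if score >= threshold:
            yield (id_a, "same_as", id_b, score)


# Hypothetical node properties from two independently built domain graphs.
crm = {
    "c1": {"name": "will smith", "city": "london"},
    "c2": {"name": "mary jones", "city": "toronto"},
}
social = {
    "s1": {"name": "william smith", "city": "london"},
    "s2": {"name": "mary jones", "city": "toronto"},
}

for edge in cross_over_edges(crm, social, threshold=0.5):
    print(edge)
```

The score stays on the edge instead of being thresholded away for good, so a later query can apply its own cut-off, in line with cross-over traversals being probabilistic and query specific.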

Acme Investment
What investment product to offer to person “A”?
[Figure: A has_child B; B has_child C]
A is a grandparent of C
[Figure: an inferred edge A grandparent_of C, scored 0.999]
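The inferred edge can be sketched as a rule that composes two has_child hops. A minimal illustration, assuming facts are stored as triples; the 0.999 is the score shown on the slide, and since the talk does not say how it was computed, the sketch simply attaches it as a fixed rule confidence.

```python
# Facts from the slide: A has_child B, B has_child C.
facts = {("A", "has_child", "B"), ("B", "has_child", "C")}

# Score shown on the slide's inferred edge; treated here as a constant.
RULE_CONFIDENCE = 0.999


def infer_grandparents(facts, confidence):
    """has_child(x, y) and has_child(y, z)  =>  grandparent_of(x, z)."""
    children = {}
    for parent, label, child in facts:
        if label == "has_child":
            children.setdefault(parent, set()).add(child)

    inferred = set()
    for parent, kids in children.items():
        for kid in kids:
            for grandchild in children.get(kid, set()):
                inferred.add((parent, "grandparent_of", grandchild, confidence))
    return inferred


print(infer_grandparents(facts, RULE_CONFIDENCE))
# {('A', 'grandparent_of', 'C', 0.999)}
```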

Acme Investment
Demographics
⅓ of all adults are grandparents
average age is 64
❖ median-women: 50
❖ median-men: 54
❖ first time grandparent avg age: 47
77% are married
Financial
control about ⅓ of the nation's assets
spend about $52 billion yearly on grandchildren (education: $32 billion, infant apparel: $3 billion)
give grandkids over $5 billion yearly in stocks & securities

Use Cases
❖ Data wrangling | ETL, ELT
❖ Scalable Ingestion
❖ Data Cleansing
❖ Knowledge Inference
❖ Contextual Parsing
❖ Data Imputation
❖ Preprocessing
❖ Data Governance
❖ Entity DeDup
❖ Community Discovery
❖ Fraud Detection
❖ Recommendation Systems
❖ Customer 360
❖ Customer Journey

In Summary
❖ Graphs are a flexible representation of real-world entities and their relationships.
❖ Graphs facilitate transforming data into insights.
❖ Graph creation & inference can be augmented & automated using AI.

We are hiring
www.messagepoint.com/careers/current-openings
(careers@messagepoint.com)
● AI/ML engineers
● JEE developers
● QA
