Graph Analytics
For Fun and Profit
Hello!
I am David Bechberger
Sr. Architect for Data and Analytics at Gene by
Gene, a bioinformatics company specializing
in genetic genealogy.
You can find me at:
@bechbd
www.linkedin.com/in/davebechberger
What we do at
Swab Sequence Analysis Insight
What this talk isn’t
◎A thorough review of graph analytic
techniques
◎A review of all graph analytic frameworks
◎A deep dive into any of the techniques we
discuss
What this talk is
◎Where to start with Graph Analytics
◎OLTP and OLAP in Gremlin
◎Practical Examples using …..
Family
Trees
◎We all have them
◎I know them well
◎They are natural
graphs
Or more specifically this
name
owns individual
family
tree
member_of
is_known_as
is_spouse
is_first_cousin
Example - Find the names of all family members in a tree
[Diagram: tree T1 is linked by an owns edge to an individual. Individuals I1 to I4 are linked to families F1 and F2 by member_of edges (roles: Husband, Wife, Son) and to the names Bob, Steve, Joan, and Rick by is_known_as edges.]
Gremlin Example - Finding the names of all family members
for tree owner
g.V().has('tree', 'unique_id', 'T1')
.out('owns')
.sideEffect(
out('is_known_as').properties('full_name')
.store('name')
)
.out('member_of').in('member_of')
.sideEffect(
out('is_known_as').properties('full_name')
.store('name')
)
.cap('name')
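To make the shape of that traversal easier to follow, here is a minimal Python sketch of the same walk over a toy edge list. The edge layout, the `out`/`in_` helpers, and the function name are assumptions made for illustration, not the speaker's data or any library's API. Names are modeled as plain strings at the end of `is_known_as` edges.

```python
# Hypothetical toy graph: (source, edge label, target) triples.
EDGES = [
    ("T1", "owns", "I1"),
    ("I1", "member_of", "F1"),
    ("I2", "member_of", "F1"),
    ("I3", "member_of", "F1"),
    ("I1", "is_known_as", "Bob"),
    ("I2", "is_known_as", "Joan"),
    ("I3", "is_known_as", "Steve"),
]

def out(vertices, label):
    """Follow outgoing edges with the given label from each vertex."""
    return [dst for v in vertices for (src, lbl, dst) in EDGES
            if src == v and lbl == label]

def in_(vertices, label):
    """Follow incoming edges with the given label into each vertex."""
    return [src for v in vertices for (src, lbl, dst) in EDGES
            if dst == v and lbl == label]

def family_member_names(tree_id):
    owners = out([tree_id], "owns")
    names = out(owners, "is_known_as")            # first sideEffect/store
    relatives = in_(out(owners, "member_of"), "member_of")
    names += out(relatives, "is_known_as")        # second sideEffect/store
    return names

names = family_member_names("T1")
print(names)
```

In this sketch the owner's name is collected twice, because walking out to a family and back in along `member_of` also returns to the owner.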
Apache TinkerPop Gremlin OLTP and OLAP
◎TinkerPop supports both
◎Gremlin can be used to
query in either
◎But there are differences...
Gremlin OLTP versus OLAP
OLTP
◎ Depth First
◎ Lazy Evaluation - Low
memory usage
◎ Real-time (ms/sub-sec)
OLAP
◎ Breadth First
◎ Eager evaluation -
High memory usage
◎ Long Running
(min/hour)
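The depth-first/lazy versus breadth-first/eager contrast can be sketched in a few lines of Python. This is a simplified illustration on a made-up adjacency map, not how TinkerPop is implemented: real OLAP execution runs on a graph computer with message passing. The graph and function names are hypothetical.

```python
# Hypothetical toy graph as an adjacency map.
GRAPH = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": []}

def depth_first(start):
    """OLTP-like: a generator that lazily yields one vertex at a time,
    following each branch to its end before backtracking."""
    stack = [start]
    while stack:
        vertex = stack.pop()
        yield vertex
        stack.extend(reversed(GRAPH[vertex]))

def breadth_first(start):
    """OLAP-like: eagerly materialize each whole frontier before moving on."""
    frontier, levels = [start], []
    while frontier:
        levels.append(frontier)
        frontier = [n for v in frontier for n in GRAPH[v]]
    return levels

print(list(depth_first("A")))
print(breadth_first("A"))
```

The generator holds only a small stack and can stop after the first result, while the breadth-first version keeps every vertex of a level in memory at once.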
Limitations
OLTP
◎ Cannot run certain
queries or steps (e.g.
pageRank, bulk
loading)
◎ Limited time per query
◎ Local operations
OLAP
◎ Some steps are
prohibitive like path(),
simplePath(), etc.
◎ Barrier Steps (count(),
min(), max(), etc.)
◎ Global Operations
What insights are we going to gain
◎Who in this tree is the most important?
◎Who in this tree is 6 degrees from Kevin
Bacon?
◎Who in this tree married their first cousin?
1.
Centrality Analysis
Finding Importance
Degree
Centrality
Count the edges
Example - Who is the member of the most families?
g.V().hasLabel('individual')
.project('person', 'degree')
.by('full_name')
.by(bothE('member_of').count())
.order().by(select('degree'), decr).limit(5)
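As a rough illustration of what the query computes, the following Python sketch counts `member_of` edges per individual and keeps the top entries. The edge list and the names in it are invented for the example.

```python
from collections import Counter

# Hypothetical member_of edges: (individual, family).
MEMBER_OF = [
    ("Ann", "F1"), ("Ann", "F2"), ("Ann", "F3"),
    ("Bo", "F2"), ("Bo", "F4"),
    ("Cy", "F3"),
]

def top_by_degree(edges, n=5):
    """Degree centrality: count the edges touching each individual, highest first."""
    degree = Counter(person for person, _family in edges)
    return degree.most_common(n)

print(top_by_degree(MEMBER_OF, 2))
```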
Eigenvector
Centrality
Relative importance matters
[Diagram: example graph with an eigenvector centrality score on each vertex, ranging from .2 to .6]
Example - Who is the most important individual?
g.V().hasLabel('individual')
.repeat(
groupCount('m').by('full_name')
.out('member_of').in('member_of')
.timeLimit(100)
).times(5).cap('m')
.order(local).by(values, decr)
.limit(local, 5).next()
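The Gremlin query approximates eigenvector centrality by repeatedly walking the graph and counting visits, which is why it yields large raw counts. A common alternative, swapped in here for illustration, is power iteration with normalization. The following Python sketch assumes a small made-up undirected graph; the function name and parameters are hypothetical.

```python
# Hypothetical undirected toy graph: each vertex maps to its neighbors.
ADJACENCY = {
    "A": ["B", "C", "D"],
    "B": ["A", "C"],
    "C": ["A", "B"],
    "D": ["A", "E"],
    "E": ["D"],
}

def eigenvector_centrality(adjacency, iterations=50):
    """Power iteration: a vertex's score is the sum of its neighbors' scores,
    rescaled each round so the largest score is 1."""
    scores = {v: 1.0 for v in adjacency}
    for _ in range(iterations):
        updated = {v: sum(scores[n] for n in adjacency[v]) for v in adjacency}
        norm = max(updated.values()) or 1.0
        scores = {v: s / norm for v, s in updated.items()}
    return scores

scores = eigenvector_centrality(ADJACENCY)
print(sorted(scores, key=scores.get, reverse=True))
```

In this toy graph B and D both have two edges, yet B scores higher because its other neighbor is better connected.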
PageRank
Similar to the Eigenvector
Centrality but with scaling
[Diagram: example graph with a PageRank score on each vertex]
Example - Whose lineage exerts the most influence over this
family tree?
g.withComputer().V().hasLabel('individual')
.pageRank()
.by(bothE('member_of')).by('rank')
.order().by('rank', decr)
.valueMap('full_name', 'rank').limit(5)
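For intuition about what the built-in step does, here is a minimal Python sketch of the textbook PageRank iteration on a made-up directed graph. It is not TinkerPop's implementation. The damping factor of 0.85 is the conventional default and is presumably the "scaling" the slide refers to; the graph and names are assumptions.

```python
# Hypothetical directed toy graph: each vertex maps to the vertices it links to.
LINKS = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}

def pagerank(out_links, damping=0.85, iterations=50):
    """Iteratively pass each vertex's rank to the vertices it links to."""
    n = len(out_links)
    rank = {v: 1.0 / n for v in out_links}
    for _ in range(iterations):
        updated = {v: (1.0 - damping) / n for v in out_links}
        for vertex, targets in out_links.items():
            if not targets:
                # Dangling vertex: spread its rank evenly over all vertices.
                for u in updated:
                    updated[u] += damping * rank[vertex] / n
            else:
                share = damping * rank[vertex] / len(targets)
                for target in targets:
                    updated[target] += share
        rank = updated
    return rank

rank = pagerank(LINKS)
print(sorted(rank, key=rank.get, reverse=True))
```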
Answer
Degree
Name                    Value
Henry VIII              7
Charlemagne             6
Jan                     5
Ferdinand VII           5
Philip II               5
EigenVector
Name                    Value
Mary                    149950
Margret                 124221
Henry VIII              107539
Son                     90715
Daughter                86961
PageRank
Name                    Value
Joan of the Tower       0.784
Edward III              0.774
Elenor                  0.774
John of Eltham          0.719
Frederick William III   0.681
And many
more...
Closeness Centrality
Betweenness Centrality
Katz Centrality
Freeman Centrality …...
Practical Examples
◎Who is the most important person in my
family's history?
◎Who in my family history has been the most
prolific?
2.
Path Analysis
Who in this tree is 6 degrees from
Kevin Bacon?
Path
How did you get there?
Simple
Path
Don’t Repeat yourself
Cyclic
Path
Ok then Repeat yourself
Sorry
Not in this family tree
How about this instead?
Example - How long is the lineage between Queen Victoria
and Henry VIII?
SimplePath
g.V('@I1@').repeat(timeLimit(60000)
.out('member_of').in('member_of')
.simplePath()).until(hasId('@I828@'))
.path().limit(1).count(local)
CyclicPath
g.V('@I1@').repeat(timeLimit(60000)
.out('member_of').in('member_of')
.cyclicPath()).until(hasId('@I828@'))
.path().limit(1).count(local)
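The idea behind the simplePath() filter can be sketched in Python as a search that never revisits a vertex. Note the swap: this sketch uses breadth-first search so that the first path found is also a shortest one, whereas the Gremlin query above simply returns the first path it reaches. The graph and function name are hypothetical.

```python
from collections import deque

# Hypothetical undirected toy graph.
ADJACENCY = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

def shortest_simple_path(adjacency, start, goal):
    """Breadth-first search over paths, skipping any vertex already on the path."""
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in adjacency[path[-1]]:
            if neighbor not in path:  # the simple-path rule: no repeats
                queue.append(path + [neighbor])
    return None  # goal is not reachable from start

path = shortest_simple_path(ADJACENCY, "A", "E")
print(path, len(path))
```

Because every partial path is kept in the queue, memory grows quickly with graph size, which is the cost of storing a whole path per traverser.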
Answer
SimplePath: 25 steps
CyclicPath: 27 steps
Practical Examples
◎How am I related to X in my family?
◎Does this family tree contain clusters of
people?
3.
Pattern Detection
Finding what is hidden
Pattern Detection in Gremlin
◎Gremlin has the ability to be imperative
○ g.V().in().out()......
◎Or Declarative
○ g.V().match(
__.as('a').....as('b'), //predicate 1
__.as('b').....as('c'), //predicate 2
__.as('c').where('c', eq('b')).as('c')
).select('b', 'c')
Example - Who is married to their first cousin?
g.V().match(
__.as('e').has('individual','sex','M').as('husband'),
__.as('husband').in('is_spouse').as('wifes'),
__.as('husband').both('is_first_cousin').as('cousin'),
__.as('cousin').where('cousin',eq('wifes')).as('wife')
).select('husband','wife')
.by('full_name').fold().unfold()
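Stripped of Gremlin syntax, the pattern is "a spouse pair that is also a first-cousin pair". The following Python sketch shows that intersection on invented edge lists. The identifiers, the edge direction for is_spouse (wife to husband, matching the in('is_spouse') step above), and the function name are assumptions for illustration.

```python
# Hypothetical edges. is_spouse is stored as (wife, husband);
# is_first_cousin is treated as undirected.
IS_SPOUSE = [("W1", "H1"), ("W2", "H2"), ("W3", "H3")]
IS_FIRST_COUSIN = [("H1", "W1"), ("H2", "X9"), ("W3", "H3")]

def married_first_cousins(spouse_edges, cousin_edges):
    """Return (husband, wife) pairs that are also connected as first cousins."""
    cousin_pairs = {frozenset(pair) for pair in cousin_edges}
    return [
        (husband, wife)
        for wife, husband in spouse_edges
        if frozenset((husband, wife)) in cousin_pairs
    ]

print(married_first_cousins(IS_SPOUSE, IS_FIRST_COUSIN))
```

Using `frozenset` makes the cousin check ignore edge direction, which mirrors the both('is_first_cousin') step.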
Answer
   Husband                   Wife
1  Albert Augustus Charles   Victoria /Hanover/
2  Leopold_I                 Margaret Teresa
3  Alexander_I the_Fierce    Sybil
4  Philip_IV                 Mariana of_Austria
Practical Examples
◎Merging trees together based on potential
common ancestors using pattern matching
4.
Putting it all together
Example - Which women who married their first cousin had
the greatest number of families?
g.V().match(
__.as('e').has('individual','sex','M').as('husband'),
__.as('husband').in('is_spouse').as('wifes'),
__.as('husband').both('is_first_cousin').as('cousin'),
__.as('cousin').where('cousin',eq('wifes')).as('wife')
).select('wife')
.project('person','degree')
.by('full_name')
.by(bothE('member_of').count())
.order().by(select('degree'), decr).limit(5)
Answer
   Wife                 Degree
1  Victoria /Hanover/   2
2  Margaret Teresa      3
3  Sybil                4
4  Mariana of_Austria   2
Thanks!
Any questions?
You can find me at:
dave@bechberger.com
@bechbd
www.linkedin.com/in/davebechberger

Editor's Notes

  1. Background: nearly 20 years of full-stack development in .NET, C, Java/Scala, and pretty much everything else. Switched to working almost exclusively on big data problems several years ago. Spent the last few years leveraging graph databases to build out high-performance data platforms. If you have questions on using .NET and graph databases, feel free to come talk to me. Current role is Sr. Architect for Data and Analytics, building out our next-generation data and analytics platform.
  2. I like to think of this talk as “Things I wish I knew 18 months ago about graphs.”
  3. A well-known model. Going to use a European royal family tree.
  4. Based on GEDCOM, a 1995 standard by the LDS Church. Basically it's a linked data structure where all records are atomic units (individual/family/name/note) that contain pointers to each other.
  5. Start at a tree. Move to the root owner and to their name. Traverse out to families. Then go from families to other individuals and their names.
  6. Here is what an example query on our model looks like. As you can see, the basis of this model, as it was brought over from GEDCOM, can make queries more verbose than one would normally strive for in order to retrieve what should be a relatively simple set of data.
  7. OLTP: depth first. Serial stream processing provides depth-first traversals into the data. It can be thought of as a stream processor where graph traversers arrive from the left, an instruction is processed on those traversers, and mutated traversers are sent out the right. OLAP: unlike OLTP queries, OLAP queries are breadth first, meaning that they run in a logically parallel fashion and use message passing to communicate between the vertices.
  8. OLTP has its limitations, most notably that certain complex operations (such as running pageRank, bulk loading, and global operations) are not allowed or not appropriate for a transactional workload. OLAP uses a scatter/gather methodology that allows for working on massive scales of data, but it also prevents some steps (such as path() and simplePath()) from being executed and makes others, such as order(), not meaningful. It also has the disadvantage that some steps within a Gremlin query can require all of the data to be in the same location to process. Steps such as count(), min(), max(), and group() are known as barrier steps and require that all the data return to a single location to be processed before being sent out to workers. Use OLTP when your query is going to touch only a portion of the data or a subgraph, e.g. “Give me the average age of people in my family.” Use OLAP when your query is going to touch all or a significant amount of the data in the graph, e.g. “Give me the average age of everyone in my family tree.”
  9. Centrality analysis is about determining what is most important in your graph. This sort of analysis is quite common when performing social network analysis, looking for key infrastructure points, and examining biological networks. Unfortunately, defining what it means to be important is really dependent on the circumstances. One other important thing to remember is that these sorts of algorithms measure the importance of a vertex in a graph, which may or may not be correlated with its influence. For finding the most influential nodes in a graph there are other node influence metrics you would want to investigate.
  10. Degree Centrality: a measure of the number of edges associated with a vertex. Degree centrality looks at the number of connections a vertex has and uses that to determine its relative importance. This can be further refined by counting only inward or only outward edges. In degree centrality, the larger the number, the more important the vertex.
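As a minimal sketch, degree centrality is just edge counting. The edge list and the `direction` parameter below are assumptions for illustration.

```python
from collections import Counter

# Hypothetical directed edges as (source, target) pairs.
EDGES = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"), ("dan", "cat")]


def degree_centrality(edges, direction="both"):
    """Count edges per vertex; direction is 'in', 'out', or 'both'."""
    counts = Counter()
    for src, dst in edges:
        if direction in ("out", "both"):
            counts[src] += 1
        if direction in ("in", "both"):
            counts[dst] += 1
    return counts


print(degree_centrality(EDGES).most_common(1))
print(degree_centrality(EDGES, direction="in"))
```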
  11. Eigenvector Centrality: a measure of the importance of a vertex that uses the relative importance of its adjacent vertices. That is, if a node has many edges but is connected to few influential vertices, it will be ranked lower than a vertex that has fewer edges but whose adjacent vertices are more important.
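One common way to approximate eigenvector centrality is power iteration: repeatedly set each vertex's score to the sum of its neighbours' scores, then rescale. The sketch below assumes a small undirected, connected toy graph; the graph and function name are illustrative only.

```python
# Hypothetical undirected graph: a triangle (a, b, c) with d hanging off c.
ADJACENCY = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}


def eigenvector_centrality(adjacency, iterations=100):
    """Power iteration: a vertex's score is the sum of its neighbours' scores."""
    scores = {v: 1.0 for v in adjacency}
    for _ in range(iterations):
        new = {v: sum(scores[n] for n in adjacency[v]) for v in adjacency}
        top = max(new.values())
        # Rescale so the largest score is 1.0 and values do not blow up.
        scores = {v: s / top for v, s in new.items()}
    return scores


for vertex, score in sorted(eigenvector_centrality(ADJACENCY).items()):
    print(vertex, round(score, 3))
```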
  12. PageRank: Made famous by Sergey Brin and Larry Page at Google for ranking web links. It works similarly to eigenvector centrality but adds a scaling factor to the results. This algorithm is well documented but far from something you would want to create yourself. Luckily, Gremlin has a prebuilt step to help us with this.
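Purely to show the idea (in practice you would use the prebuilt step), here is a textbook-style PageRank sketch in Python. The link graph, the 0.85 damping value, and the handling of vertices with no outgoing links are assumptions of this sketch, not a description of Gremlin's implementation.

```python
# Hypothetical link graph: vertex -> vertices it links to.
LINKS = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}


def pagerank(links, damping=0.85, iterations=50):
    """Iteratively share each vertex's rank among the vertices it links to."""
    n = len(links)
    ranks = {v: 1.0 / n for v in links}
    for _ in range(iterations):
        # Rank held by vertices with no outgoing links is spread evenly.
        dangling = sum(ranks[v] for v in links if not links[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in links}
        for v, targets in links.items():
            for t in targets:
                new[t] += damping * ranks[v] / len(targets)
        ranks = new
    return ranks


for vertex, rank in sorted(pagerank(LINKS).items()):
    print(vertex, round(rank, 3))
```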
  13. The interesting part about this answer is not necessarily the answer itself but the fact that each method produced distinctly different answers. This is an example of why you need to understand your question in order to choose the correct method.
  14. Why do these examples matter?
  15. Paths are the walks through the graph defined by a traversal. A Path object contains: all labels (“as(xxxx)”), all vertices, all edges, and all side effects/data structures. Path traversals tend to be on the slow side and are computationally expensive, as the entire path is stored for each traverser. This can expand exponentially as the size of the
  16. Simple path queries are pretty much what they sound like: simplePath() filters out paths that contain repeats, which minimizes the amount of computation that is required. They are typically used to find the shortest path between two vertices in a graph. Simple path queries are often useful if you want to find the shortest connection between two things, such as in a transportation network, between patterns or subgraphs, or in social network analysis.
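A small Python sketch of the same idea: a breadth-first search in which every traverser carries its whole path and any step that would revisit a vertex is dropped. The graph and function name are hypothetical.

```python
from collections import deque

# Hypothetical undirected graph: vertex -> neighbours.
GRAPH = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}


def shortest_simple_path(start, goal):
    """Breadth-first search that keeps the full path and never revisits a vertex."""
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in GRAPH[path[-1]]:
            if nxt not in path:  # simplePath()-style filter: no repeats
                queue.append(path + [nxt])
    return None


print(shortest_simple_path("A", "E"))
```

Note that every entry in the queue is a complete path, which is exactly why path-based traversals get expensive as the graph grows.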
  17. Cyclic paths are paths that loop back on themselves. Finding cyclic paths can be a first step in trying to detect communities or clusters within your graph.
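As a rough illustration, the sketch below collects the paths that leave a vertex and come back to it. It is narrower than Gremlin's cyclicPath(), which keeps any path containing a repeated element; the graph, the `max_length` cap, and the rule that ignores a simple there-and-back hop are assumptions of this sketch.

```python
# Hypothetical undirected graph: a triangle (A, B, C) with D hanging off C.
GRAPH = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}


def cycles_from(graph, start, max_length=6):
    """Return paths that leave `start` and loop back to it."""
    found, stack = [], [[start]]
    while stack:
        path = stack.pop()
        for nxt in graph[path[-1]]:
            if nxt == start and len(path) > 2:
                # Back at the start after at least two other vertices.
                found.append(path + [start])
            elif nxt not in path and len(path) < max_length:
                stack.append(path + [nxt])
    return found


print(cycles_from(GRAPH, "A"))
print(cycles_from(GRAPH, "D"))
```

Vertices that sit on many such loops (A, B, and C here) are candidates for a tightly connected cluster, while D is not.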
  18. How about we find the quickest lineage between Queen Victoria and Henry VIII instead?
  19. There are a few key things to note here. Where the “until” sits matters when you do a repeat: if it is before the “repeat” it is a while/do loop, and if it is after it is a do/while loop. Adding a time limit to your traversal can help prevent a never-ending query.
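The while/do versus do/while distinction is easiest to see side by side. In this Python sketch the lineage and function names are made up; the point is only that testing first can return the starting vertex untouched, while stepping first always takes at least one hop.

```python
# Hypothetical lineage: person -> parent.
PARENT = {"child": "parent", "parent": "grandparent", "grandparent": None}


def until_then_repeat(start, done):
    """Like until().repeat(): test first, so the start may be returned as-is."""
    current = start
    while not done(current):
        current = PARENT[current]
    return current


def repeat_then_until(start, done):
    """Like repeat().until(): step first, so at least one hop is always taken."""
    current = start
    while True:
        current = PARENT[current]
        if done(current):
            return current


def is_target(vertex):
    """Hypothetical stop condition that the start vertex already satisfies."""
    return vertex in ("child", "grandparent")


print(until_then_repeat("child", is_target))
print(repeat_then_until("child", is_target))
```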
  20. If you go in and examine the path objects returned by these two queries you will notice that the difference between the simple and cyclic paths is that the cyclic path circles through her husband Albert to continue on to Henry VIII
  21. Finding how you are related to others in your family tree is a rather straightforward matter of counting the ups and downs in generations found by the simplest path. Finding clusters of people in your tree can help identify areas of the tree where familial marriages were common.
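Here is a minimal sketch of that counting, assuming the usual genealogical convention for cousin degree and removal. The function name and return shape are hypothetical, and direct-line relationships (where one count is zero) are deliberately left out.

```python
def cousin_relationship(ups, downs):
    """Turn generation counts into (cousin degree, times removed).

    `ups` is the number of generations from you up to the common ancestor,
    `downs` the number from that ancestor down to the other person.
    Degree 0 means siblings or aunt/uncle and niece/nephew relationships.
    """
    if ups < 1 or downs < 1:
        raise ValueError("direct-line relationships are not handled here")
    degree = min(ups, downs) - 1
    removed = abs(ups - downs)
    return degree, removed


print(cousin_relationship(2, 2))  # shared grandparents: first cousins
print(cousin_relationship(3, 2))  # first cousins once removed
```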
  22. Gremlin can work as both an imperative language and a declarative language. In the imperative model you usually write queries as we described earlier: you start with some stream over vertices -> you then move left to right taking in data -> processing that data -> then emitting the processed data. The declarative model takes a different approach: the user defines a base set of nodes -> then describes one or more patterns that the data needs to match. Once the query is submitted, the Gremlin engine determines the optimal query to run to find that pattern within the graph. One of the neat features of Gremlin is that you are able to intermix the two types of syntax within the same traversal. Personally, I find the declarative syntax powerful, but I struggle every time I work with it.
  23. 1. First, we define a predicate containing all the males. 2. Next, we define a predicate containing all of those men's wives. 3. We then define a predicate containing all of those men's cousins. 4. Finally, we match everyone who is both a cousin and a wife.
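Those four steps can be sketched with plain Python sets. Victoria and Albert being first cousins comes from the talk; the other people and all of the data structures are toy data invented for illustration.

```python
# Toy data: only the Victoria/Albert relationship comes from the talk;
# "george" and "mary" are hypothetical filler.
SEX = {"albert": "male", "victoria": "female", "george": "male", "mary": "female"}
IS_SPOUSE = {("albert", "victoria"), ("george", "mary")}
IS_FIRST_COUSIN = {("albert", "victoria")}


def cousin_wives():
    """Match (man, woman) pairs where the woman is both his wife and his first cousin."""
    males = {person for person, sex in SEX.items() if sex == "male"}    # step 1
    wives = {(m, w) for (m, w) in IS_SPOUSE if m in males}              # step 2
    cousins = {(m, c) for (m, c) in IS_FIRST_COUSIN if m in males}      # step 3
    return sorted(wives & cousins)                                      # step 4


print(cousin_wives())
```

Each intermediate set plays the role of one pattern, and the final intersection is the "match everything at once" part that the engine would normally plan for you.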
  24. Yes, while I understand this is an interesting traversal from an inquisitive perspective, it is also relevant from a genetic perspective. Endogamous populations, ones that marry within specific groups, have a greater chance of inheriting familial genetic defects than people who marry within the larger population. While cousin marriages were common across many parts of the European royal family tree, one famous example was Queen Victoria. She married her first cousin Albert. Due to this close relationship between parents, several of Queen Victoria's children ended up with hemophilia, which is a genetic defect on the X chromosome inherited from parents.
  25. Why do these examples matter? Well, in our business, our customers are very interested in expanding their family trees. If we are able to use pattern matching algorithms to suggest potential matches in other people’s family trees, then we can quickly and effectively give them the ability to expand their own.
  26. When it comes down to it, this is a bit of a strange query. It shows how intermixing these sorts of graph analytical tools can give you more valuable insight into your data. This query is also an example of how you can leverage both declarative and imperative syntaxes in the same query.