A Survey As A Graph
Representing survey data in a natural way
Klaus Blass
Consultant – Development Data Group
The World Bank
klaus.blass@yahoo.com
Surveys
• Cross-sectional Surveys
• Snapshots
• Longitudinal Surveys
• Repeated at regular intervals
• Household Budget Surveys (HBS)
• Agricultural Production Surveys (APS)
• Population Census
• . . .
Surveys
• Paper forms
Subsequent Data Entry
• CAPI (Computer Assisted Personal Interview)
Tablets
Data already in digital format on server
Survey Solutions
• Survey Solutions is a CAPI system developed & maintained by the
World Bank
• used in thousands of surveys and censuses in 175 countries
• free software
https://mysurvey.solutions
A Survey as a Graph
Survey Data
• Tables of
• Households
• Household assets
• Household members
• Revenues & Expenses
• Agricultural plots
• Crops
• . . .
Health and Demographic Surveillance System
HDSS
• Monitoring the state of a population over time
• Demography (age, sex, ethnicity, etc.)
• Health (diseases, mortality)
• Wealth (assets, planted crops, animals, income)
• Migration (immigration & emigration)
The Nouna HDSS in Burkina Faso
Centre de Recherche en Santé de
Nouna (CRSN)
Heidelberg Institute of Global
Health (HIGH)
Rural area with 59 villages
14,000 households
115,000 people
Survey Data - households table
(> 350 variables)
Survey Data - households table
(missing values)
Survey Data - code labels
. . . .
label define q107 1 `"Moquette / parquet"' 2 `"Bois poli"' 3 `"Carreaux"' 4 `"Vinyle"' 5 `"Ciment"' 6 `"Terre battue / Sable"' 9 `"Autre (à préciser)"'
label values q107 q107
label variable q107 `"De quels principaux matériaux est fait le sol de l’habitation principale du ménage ?"'
label variable q107autre `"Quels autres matériaux?"'
label define q108 1 `"Béton"' 2 `"Tuiles"' 3 `"Tôles"' 4 `"Paille/Feuille"' 5 `"Banco / Terre Battue"' 6 `"Autre (à préciser)"'
label values q108 q108
label variable q108 `"Quels sont les principaux matériaux du toit de l’habitation principale du ménage ?"'
label variable q108autre `"Quels autres matériaux?"'
label variable q109__1 `"Quel type de toilettes utilisez-vous?:WC avec chasse eau"'
Survey Data
• Data are arranged not for clarity* but for completeness
* from a human reader's point of view
• A single survey variable may occupy a dozen columns
• Data are just codes; the labels are kept in separate (rather cryptic) files
• Simply loading these data with a LOAD utility is not an option
Data must be loaded with a custom program!
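As an illustration of what such a custom program has to do, here is a minimal sketch that turns one Stata "label define" line, like those on the code labels slide, into a code-to-label map. The class and method names are hypothetical and not taken from the author's loader.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StataLabels {
    // One code/label pair in Stata compound quotes, for example: 5 `"Ciment"'
    private static final Pattern PAIR = Pattern.compile("(-?\\d+)\\s+`\"(.*?)\"'");

    /** Parses a "label define <name> ..." line into code -> label, keeping file order. */
    static Map<Integer, String> parseLabelDefine(String line) {
        Map<Integer, String> labels = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(line);
        while (m.find()) {
            labels.put(Integer.parseInt(m.group(1)), m.group(2));
        }
        return labels;
    }

    public static void main(String[] args) {
        String line = "label define q107 3 `\"Carreaux\"' 4 `\"Vinyle\"' 5 `\"Ciment\"'";
        System.out.println(parseLabelDefine(line));
    }
}
```

With such a map in hand, the loader can attach a readable description to every code it reads from the data file.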
A custom Java loader
Basic structure:
• Instantiate a Bolt driver
• server port
• credentials
• Write a method which
• opens a session
• begins a transaction
• runs a Cypher query
• commits the transaction
Example: Loading all households
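private final static String MERGE_HOUSEHOLD =
    "MERGE (a:Household {ivkey: $ivkey, id: $id, hhNum: $hhNum, phoneCHM: $phoneCHM}) " +
    "WITH a " +
    "MATCH (c:Compound {ivkey: $ivkey}) " +
    "WITH a, c " +
    "MERGE (a)-[:IN_COMPOUND]->(c) " +
    "RETURN id(a)";

public void build_households(String surveyFile) throws IOException {
    // TabFile is the author's own helper; onLine() is called once per line of the file.
    TabFile tf = new TabFile() {
        public void onLine(String line) {
            String[] cell = line.split(tab);   // one cell per survey column
            // Supply the query parameters from the columns of the current line.
            write(MERGE_HOUSEHOLD, parameters(
                "ivkey", cell[0], "id", cell[6], "hhNum", cell[2], "phoneCHM", cell[9]
            ));
        }
    };
    // ... the slide ends here; the call that feeds surveyFile to tf is not shown
}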
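The TabFile helper itself does not appear on the slides. The following is only a guess at its shape: a hypothetical stand-in that skips an assumed header row and hands each remaining line to onLine(). It reads from any Reader, so it can be tried without a survey file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for the author's TabFile class. */
abstract class TabFileSketch {
    static final String TAB = "\t";

    /** Called once per data line; subclasses decide what to do with it. */
    public abstract void onLine(String line);

    /** Skips the (assumed) header row and passes every non-empty line to onLine. */
    public void read(Reader source) {
        try (BufferedReader in = new BufferedReader(source)) {
            String line = in.readLine();            // header row with variable names
            while ((line = in.readLine()) != null) {
                if (!line.isEmpty()) onLine(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String data = "ivkey\thhNum\tphone\nk1\t7\t\n";
        new TabFileSketch() {
            @Override public void onLine(String line) {
                String[] cell = line.split(TAB, -1);   // -1 keeps trailing empty cells
                System.out.println(cell.length + " cells, ivkey=" + cell[0]);
            }
        }.read(new StringReader(data));
    }
}
```

One design point matters for survey data with missing values: Java's String.split drops trailing empty strings unless a negative limit is passed, so a row whose last columns are empty would otherwise yield a shorter array than expected.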
Load all nodes and create relationships
• Villages
• Compounds
• Households
• Assets
• Members
• Immigrations/emigrations
The village of Moinsi
Moinsi, the smallest
village in the Nouna
HDSS:
All compounds,
the households they
contain,
and their household
members.
Bad data
"Never throw away survey data"
Problem: Empty compounds
Solution:
• Relabel empty compounds as GhostCompounds
• They will no longer show up in compound-related queries
match (c:Compound)
optional match (c)-[r:IN_COMPOUND]-(hh:Household)
with c, count(r) as hhcount
where hhcount = 0
set c:GhostCompound

followed by

match (g:GhostCompound)
remove g:Compound
Reported Deaths
• Members who died should no longer be considered "members"
• But we want to remember which household they belonged to
Relabel them as "Deceased"
Code tables
Example: building characteristics
• Only the selected code is saved in the data.
• We want to be able to query this property by code or by description.
• There is no "lookup table" in Neo4j where I could look up the description from the code.
Code tables
• Store code and description in one string
• How to query by either code or (partial) description?
Roll your own function!
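A minimal sketch of the code-matching half of this idea follows. The slides do not show how code and description are combined in the stored string, so the "code: description" layout and all names here are assumptions made for illustration.

```java
final class CodedValue {
    // Assumed storage format: "<code>: <description>", for example "5: Ciment".
    private static final String SEPARATOR = ":";

    /** Returns the code part of a stored value. */
    static String code(String stored) {
        int i = stored.indexOf(SEPARATOR);
        return (i < 0 ? stored : stored.substring(0, i)).trim();
    }

    /** Returns the description part, or an empty string if there is none. */
    static String description(String stored) {
        int i = stored.indexOf(SEPARATOR);
        return i < 0 ? "" : stored.substring(i + 1).trim();
    }

    /** True when the stored value carries exactly this code. */
    static boolean codeOf(String stored, String code) {
        return stored != null && code != null && code(stored).equals(code.trim());
    }

    public static void main(String[] args) {
        System.out.println(codeOf("5: Ciment", "5"));
        System.out.println(description("6: Terre battue / Sable"));
    }
}
```

Comparing the extracted code for equality, rather than testing whether the string starts with the code, keeps a query for code 5 from also matching codes such as 15 or 50.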
The Power of User-Defined Functions
• WHERE klaus.codeOf( habitation.floor, '5' )
• WHERE klaus.includes( habitation.floor, 'cement' )
• normalizes all text to lowercase, with diacritics removed
• allows for slight divergence in spelling (Levenshtein distance <= 1)
User-Defined Functions
klaus.includes( habitation.floor, 'cement' )
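@UserFunction("klaus.includes")   // annotation Neo4j needs to expose the method (not shown on the slide)
public Boolean includes(
        @Name("String to search") String s1,
        @Name("keyword") String keyword
) {
    s1 = normalize(s1);            // lowercase, diacritics removed
    keyword = normalize(keyword);
    String[] words = s1.split("\\s+|/|-");   // split on whitespace, "/" and "-"
    for (String s : words) {
        if (LevenshteinDistance(s, keyword) <= 1) return true;
    }
    return false;
}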
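The helpers normalize and LevenshteinDistance are called above but not shown on the slides. The sketch below is one plausible way to write them with the Java standard library only; it is not the author's code, and the class name TextMatch is made up. It leaves out the Neo4j annotations so that it runs on its own.

```java
import java.text.Normalizer;
import java.util.Locale;

final class TextMatch {
    /** Lowercases the text and strips diacritics (an accented e becomes a plain e). */
    static String normalize(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    /** Edit distance computed with the classic two-row dynamic programme. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] swap = prev; prev = curr; curr = swap;   // reuse the two rows
        }
        return prev[b.length()];
    }

    /** True when any word of the text is within one edit of the keyword. */
    static boolean includes(String text, String keyword) {
        String key = normalize(keyword);
        for (String word : normalize(text).split("[\\s/-]+")) {
            if (levenshtein(word, key) <= 1) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(includes("5: Ciment", "cement"));
        System.out.println(levenshtein("solar", "solaire"));
    }
}
```

Because matching happens word by word, the French label "Ciment" is found by the English keyword "cement" (one substitution), while unrelated labels such as "Bois poli" are not.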
User-Defined Functions
More user-defined functions:
• Date functions for partial dates
using own assumptions about missing components
• Matching similar text (specify Levenshtein distance)
klaus.similar('solar', 'Solaire', 2) → true
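To make the idea of partial dates concrete, here is a small sketch. The slides do not state which assumptions the author applies to missing components; the mid-period defaults below (July 1 for an unknown month, the 15th for an unknown day) are purely illustrative, as are the class and method names.

```java
import java.time.LocalDate;

final class PartialDate {
    /**
     * Builds a date from possibly missing components (null means unknown).
     * Illustrative assumption: unknown month -> July 1, unknown day -> the 15th.
     * A day reported without a month is ignored.
     */
    static LocalDate resolve(int year, Integer month, Integer day) {
        if (month == null) return LocalDate.of(year, 7, 1);
        if (day == null) return LocalDate.of(year, month, 15);
        return LocalDate.of(year, month, day);
    }

    public static void main(String[] args) {
        System.out.println(resolve(2019, 3, null));
        System.out.println(resolve(2019, null, null));
    }
}
```

Keeping the defaults in one place makes it easy to change the convention later, or to record alongside the date which components were imputed.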
Ordered multiple responses
• multiple answers possible
• order of answers recorded
Ordered multiple responses
Select the type of toilet your household uses
- most used first (max. 3 answers)
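A sketch of how such an answer might be reassembled from the raw data: it assumes that each answer option has its own column (such as q109__1 in the label file) holding the rank at which the option was chosen, with 0 meaning "not chosen". That column layout and the names below are assumptions, not something the slides confirm.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class OrderedResponses {
    /**
     * Converts per-option ranks into the list of chosen option codes, first choice first.
     * ranks maps option code -> rank (1 = chosen first, 0 or null = not chosen).
     */
    static List<Integer> inOrder(Map<Integer, Integer> ranks) {
        List<Map.Entry<Integer, Integer>> chosen = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : ranks.entrySet()) {
            if (e.getValue() != null && e.getValue() > 0) chosen.add(e);
        }
        chosen.sort((x, y) -> Integer.compare(x.getValue(), y.getValue()));
        List<Integer> codes = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : chosen) codes.add(e.getKey());
        return codes;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> ranks = new HashMap<>();
        ranks.put(1, 0);   // option 1 not chosen
        ranks.put(2, 2);   // option 2 chosen second
        ranks.put(3, 1);   // option 3 chosen first
        System.out.println(inOrder(ranks));
    }
}
```

The resulting ordered list could then be stored as a single list property on the household node, which keeps both the selection and its order.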
Longitudinal Data
Example: pregnancies
• Pregnancies differ between survey rounds
• Pregnancy events are nodes
• Could later be linked to an outcome (new member, abortion, etc.)
Longitudinal Data
Identify each pregnancy by the round in which it was reported,
using multiple labels
• Query pregnancies in general
MATCH (m:Member)--(p:Pregnancy)
• Query pregnancies during a specific survey round
MATCH (m:Member)--(p:Pregnancy:Round1)
Migration
• Node, property or relationship?
• Migrations as relationships
• Properties:
date
reason
returned?
• Great visualization
• Emigrants & Visitors
Emigrations
outside the country
Thank you for your attention.
Questions?

Editor's Notes

  1. LASER POINTER !!!
  2. … come in all kinds of flavors: the state of a population at a specific moment in time, or monitoring a population over time
  3. one of the major platforms for CAPI surveys today
  4. Nouna Health Research Center In partnership with HIGH
  5. Raw data, not a carefully prepared dataset published by a government agency: cryptic variable names, multiple columns per variable, (almost) only codes
  6. supply parameters from the file … for the db variables or properties
  7. Clicking a node: -> get a full list of its properties
  8. without modifying the schema beforehand ! => schemaless database system
  9. Extend Cypher Functions with full functionality of Java
  10. Relationships -> 1st class members in Neo4j
  11. I tried to show you: How to represent the data from a longitudinal Survey as a graph & how to use some of the features in Neo4j to achieve this