This talk is about the technology…
PANAMA PAPERS
AND GRAPHS
SOURCE
2.6 TB of data: relational databases, emails,
various banking documents, and company records
relating to the 215,000 offshore companies
that were clients of the Panamanian legal services
firm Mossack Fonseca between 1977 and 2015.
THE PROCESS
1. Acquire documents
2. Classify documents
a. Scan / OCR (Tesseract)
b. Extract document metadata (Apache Tika, https://tika.apache.org)
3. Whiteboard domain
a. Determine entities and their relationships
b. Determine potential entity and relationship properties
c. Determine sources for those entities and their properties
4. Work out analyzers, rules, parsers and named entity recognition for documents (Apache Solr; Blacklight,
http://projectblacklight.org; Nuix, https://www.nuix.com)
5. Parse and store document metadata and document and entity relationships (Talend, http://www.talend.com)
a. Parse by author, named entities, dates, sources and classification
6. Infer entity relationships
7. Compute similarities, transitive cover and triangles
8. Analyze data using graph queries and visualizations (Neo4j; Linkurious, http://linkurio.us)
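
Steps 6 and 7 can be illustrated with a minimal Python sketch. The slides do not say which similarity measure or triangle algorithm was used, so everything here is an assumption: the standard library's difflib stands in for the name comparison, and the triangle count is a plain set-intersection approach. The function names and sample values are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] for two names.

    Names are lower-cased and whitespace-normalised first.
    Assumption: difflib is only a stand-in for the real matcher.
    """
    def norm(s: str) -> str:
        return " ".join(s.lower().split())

    return SequenceMatcher(None, norm(a), norm(b)).ratio()


def count_triangles(edges) -> int:
    """Count triangles in an undirected graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    total = 0
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            # Only count common neighbours ordered after v, so each
            # triangle is counted exactly once.
            total += sum(1 for w in adj[u] & adj[v] if w > v)
    return total


# Hypothetical sample: one triangle (a, b, c) plus a dangling edge.
sample_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
triangles = count_triangles(sample_edges)
```

Pairs of officers whose similarity exceeds a chosen threshold could then be linked with an inferred relationship; the threshold itself would need tuning against real data.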
ENTITIES
• Clients
• Companies
• Addresses
• Officers (both natural people and companies)
RELATIONSHIPS
• (:Officer)-[:is officer of]->(:Company)
• (:Officer)-[:registered address]->(:Address)
• (:Client)-[:registered]->(:Company)
• (:Officer)-[:has similar name and address]->(:Officer)
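
To make these patterns concrete, here is a small in-memory sketch in Python rather than a real graph database. It stores each relationship as a (source, type, target) triple and answers one question: which companies have an officer registered at a given address? The node identifiers and the helper names are hypothetical, and the sample data is invented for illustration.

```python
from collections import defaultdict

# Hypothetical sample data using the relationship types listed above.
EDGES = [
    ("officer:1", "is officer of", "company:A"),
    ("officer:2", "is officer of", "company:B"),
    ("officer:1", "registered address", "address:X"),
    ("officer:2", "registered address", "address:X"),
    ("client:9", "registered", "company:A"),
]


def build_index(edges):
    """Index edges by source and by target, grouped by relationship type."""
    outgoing = defaultdict(lambda: defaultdict(set))
    incoming = defaultdict(lambda: defaultdict(set))
    for source, rel_type, target in edges:
        outgoing[source][rel_type].add(target)
        incoming[target][rel_type].add(source)
    return outgoing, incoming


def companies_at_address(edges, address):
    """Follow (:Address)<-[:registered address]-(:Officer)-[:is officer of]->(:Company)."""
    outgoing, incoming = build_index(edges)
    companies = set()
    for officer in incoming[address]["registered address"]:
        companies |= outgoing[officer]["is officer of"]
    return companies


shared = companies_at_address(EDGES, "address:X")
```

In a graph database the same traversal is a single pattern match; the sketch only shows why the question maps naturally onto two hops along typed relationships.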
GRAPH MODEL
FLEXIBLE DATA MODEL
New entities:
Documents: E-Mail, PDF, Contract, DB-Record, …
Money Flow: Accounts / Banks / Intermediaries
New relationships:
Family / business ties
Conversations
Peer Groups / Rings
Similar Roles
Mentions / Topic-Of
Money Flow
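
The flexibility comes from labels and relationship types being plain data rather than a fixed schema. The following Python sketch illustrates the idea under that assumption; the PropertyGraph class, its methods, and all sample values are hypothetical and not part of Neo4j or any tool named in the slides.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    """Toy property graph: labels and relationship types are just strings."""
    nodes: dict = field(default_factory=dict)  # node id -> (label, properties)
    edges: list = field(default_factory=list)  # (source id, type, target id)

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = (label, props)

    def add_edge(self, source, rel_type, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before linking them")
        self.edges.append((source, rel_type, target))

    def labels(self):
        return {label for label, _ in self.nodes.values()}

    def neighbours(self, node_id, rel_type):
        return {t for s, r, t in self.edges if s == node_id and r == rel_type}


# Initial model: officers and companies only (invented sample values).
g = PropertyGraph()
g.add_node("o1", "Officer", name="Jane Example")
g.add_node("c1", "Company", name="Example Holdings Ltd.")
g.add_edge("o1", "is officer of", "c1")

# Later extension: new entity and relationship kinds, no migration step.
g.add_node("d1", "Document", kind="E-Mail")
g.add_node("b1", "Bank", name="Example Bank")
g.add_edge("d1", "mentions", "o1")
g.add_edge("c1", "money flow", "b1")
```

Queries written against the original officer and company data keep working after the extension, because nothing about the existing nodes or edges changed.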
DISCOVERY
Once the database was set up, it was a simple
matter to install and configure Linkurious to
essentially provide a GUI (graphical user interface)
atop the database. Having the visual depiction of
the graph of names and addresses was critical in
making sense of the data, especially for
non-technical reporters.
Demo
https://offshoreleaks.icij.org/nodes/10121110

Panama Papers Neo4j Budapest Meetup
