Get guidance through the gigantic sea of freely released data from the Panama Papers as well as the Linked Open Data cloud. You will learn how it can empower your understanding of today’s news or any other information source.
Diving in Panama Papers and Open Data to Discover Emerging News
1. Diving in Panama Papers
and Open Data
Ontotext Webinar, 26 May 2016
2. Relation Discovery Case
May 2016Diving in Panama Papers and Open Data
• Find suspicious relationships like:
− Company in USA controls
− Another company in USA
− Through a company in an off-shore zone
• Show news relevant to them
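The control pattern above can be sketched in a few lines of Python. This is only an illustration of the idea, not the query used in the webinar: the `Company` type, the `OFFSHORE` country set and the sample names in the usage note are all hypothetical.

```python
from typing import NamedTuple


class Company(NamedTuple):
    name: str
    country: str  # country code, e.g. "US"


# Assumption: an illustrative set of off-shore jurisdictions, not taken from the dataset.
OFFSHORE = {"PA", "VG", "BS"}


def suspicious_chains(companies, controls, home="US", offshore=OFFSHORE):
    """Return (owner, intermediary, target) names where a `home` company
    controls another `home` company through an off-shore company.

    `controls` is an iterable of (controller_name, controlled_name) pairs.
    """
    by_name = {c.name: c for c in companies}
    children = {}
    for parent, child in controls:
        children.setdefault(parent, []).append(child)

    hits = []
    for owner in companies:
        if owner.country != home:
            continue
        for middle in children.get(owner.name, []):
            if by_name[middle].country not in offshore:
                continue
            for target in children.get(middle, []):
                if by_name[target].country == home and target != owner.name:
                    hits.append((owner.name, middle, target))
    return hits
```

For example, with the made-up companies "Alpha Inc" (US), "Shell Ltd" (PA) and "Beta LLC" (US), where Alpha controls Shell and Shell controls Beta, the function reports the single chain Alpha, Shell, Beta.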
3. Presentation Outline
• Publishing Panama Papers DB as #LinkedLeaks
• Sample Queries
• FactForge-News open-data playground
• Next steps
4. Offshore Leaks Database from ICIJ
• Published by the International Consortium of Investigative Journalists (ICIJ) on 9 May 2016
• A “searchable database” of about 320 000 offshore companies
− 214 000 extracted from Panama Papers (valid until 2015)
− More than 100 000 from 2013 Offshore leaks investigation (valid until 2010)
• CSV extract from a graph database available for download
• https://offshoreleaks.icij.org/
6. Offshore Leaks DB as Linked Open Data
• Ontotext published the Offshore Leaks DB as Linked Open Data
• Available for exploration, querying and download at
http://data.ontotext.com
• ONTOTEXT DISCLAIMERS
We use the data as is provided by ICIJ. We make no representations and warranties of any kind,
including warranties of title, accuracy, absence of errors or fitness for particular purpose. All
transformations, query results and derivative works are used only to showcase the service and
technological capabilities and not to serve as basis for any statements or conclusions.
7. Enrichment and structuring of the data
• Relationship type hierarchy
− About 80 relationship types in the original dataset were organized into a property hierarchy
• Classification of officers into Person and Company
− In the original database there is no way to distinguish whether an officer is a physical person or a company
• Mapping to DBpedia:
− 209 countries referred to in the Offshore Leaks DB are mapped to DBpedia
− About 3000 companies and 300 persons mapped to DBpedia
• Overall size of the repository: 22M statements (20M explicit)
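A property hierarchy lets a query for a general relationship also return the more specific ones. The sketch below illustrates the idea in plain Python; the property names (`ex:shareholderOf`, `ex:hasCapitalRelation`, `ex:relatedTo`) are hypothetical and are not the names used in the published dataset.

```python
def super_properties(prop, sub_property_of):
    """Return `prop` plus every property reachable through sub-property links."""
    seen = {prop}
    stack = [prop]
    while stack:
        current = stack.pop()
        for parent in sub_property_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def match(triples, query_prop, sub_property_of):
    """Return the triples whose predicate is `query_prop` or one of its sub-properties."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if query_prop in super_properties(p, sub_property_of)
    ]


# Hypothetical hierarchy: shareholderOf is a kind of capital relation,
# which is in turn a kind of generic relation.
HIERARCHY = {
    "ex:shareholderOf": ["ex:hasCapitalRelation"],
    "ex:hasCapitalRelation": ["ex:relatedTo"],
    "ex:secretaryOf": ["ex:relatedTo"],
}
```

Querying for `ex:hasCapitalRelation` then returns `ex:shareholderOf` triples but not `ex:secretaryOf` ones, while querying for `ex:relatedTo` returns both.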
8. The RDF-ization Process
• Linked data variant produced without programming
− The raw CSV files are RDF-ized using TARQL, http://tarql.github.io/
− Data was further interlinked and enriched in GraphDB using SPARQL
• The process is documented in this README file
• All relevant artifacts are open-source, available at
https://github.com/Ontotext-AD/leaks/
• The entire publishing and mapping took about 15 person-days!
− Including data.ontotext.com portal setup, promotion, documentation, etc.
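The actual conversion was done declaratively with TARQL, without programming. For readers who want to see what RDF-izing a CSV row amounts to, here is a minimal Python sketch of the same idea. It is not the TARQL mapping from the repository: the column names (`node_id`, `name`, `country_code`) and the `http://example.org/leaks/` namespace are assumptions made for illustration.

```python
import csv
import io

# Assumption: a made-up namespace, not the one used on data.ontotext.com.
BASE = "http://example.org/leaks/"


def escape_literal(value):
    """Escape a string for use inside a double-quoted N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(csv_text):
    """Turn each CSV row into N-Triples lines: one for the name, one for the country."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}entity/{row['node_id']}>"
        name = escape_literal(row["name"])
        lines.append(f'{subject} <{BASE}name> "{name}" .')
        if row.get("country_code"):  # skip rows with an empty country
            country = f"<{BASE}country/{row['country_code']}>"
            lines.append(f"{subject} <{BASE}country> {country} .")
    return lines
```

Further interlinking and enrichment (for example mapping countries to DBpedia) would then happen inside the triple store with SPARQL, as the slide describes.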
9. Presentation Outline
• Publishing Panama Papers DB as #LinkedLeaks
• Sample Queries
• Integration with DBPedia & other data
• Next steps
10. Sample queries at http://data.ontotext.com
Q1: Countries by number of entities related to them
Q2: Country pairs by ownership statistics
Q3: Statistics by incorporation year
Q4: Officers and entities by number of capital relations
Q5: Countries in Eastern Europe by number of owners
Q6: Intermediaries in Asia by name
Q7: The best connected officers
Q8: Countries by number of Person and Company officers
11. Presentation Outline
• Publishing Panama Papers DB as #LinkedLeaks
• Sample Queries
• FactForge-News open data playground
• Next steps
12. Our approach to Big Data
1. Integrate relevant data from many sources
− Build a Big Knowledge Graph from proprietary databases and taxonomies integrated with millions of facts of Linked Data
2. Infer new facts and unveil relationships
− Performing reasoning across data from different sources
3. Interlink text with big data
− Using text-mining to automatically discover references to concepts and entities
4. Use NoSQL graph database for metadata management, querying and search
Mar 2016Open Data & News Analytics #12
13. Quick news-analytics case
• Our Dynamic Semantic Publishing platform already offers linking of text with big open data graphs
• One can navigate from text to concepts, get trends, related entities and news
• Try it at http://now.ontotext.com
14. FF-NEWS: Data Integration and Loading
• DBpedia (the English version only) 496M statements
• Geonames (all geographic features on Earth) 150M statements
− owl:sameAs links between DBpedia and Geonames 471K statements
• Company registry data (GLEI) 3M statements
• News metadata (from NOW) 128M statements
• Total size: 986M statements
− Mapped to FIBO; 667M explicit statements + 318M inferred statements
− RDFRank and geo-spatial indices enabled to allow for ranking and efficient geo-spatial constraints
15. Global Legal Entity Identifier (GLEI) data
• Global Markets Entity Identifier (GMEI) Utility data
− The Global Markets Entity Identifier (GMEI) utility is DTCC's legal entity identifier solution, offered in collaboration with SWIFT
− We downloaded a data dump from https://www.gmeiutility.org/
• RDF-ized company records
− Fields: LEI#, legal name, ultimate parent, registered country
− 3M explicit statements for 211 thousand organizations
▪ For comparison, there are 490 000 organizations in DBpedia, and D&B covers more than 200 million
− 10 821 ultimate parent relationships and 1 632 ultimate parents
− About 2 800 organizations from the GLEI dump mapped to DBpedia
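The two ultimate-parent figures above are simple aggregates over the company records. A minimal sketch of how such counts could be derived is shown below; the record layout (dictionaries with `lei`, `name` and `ultimate_parent` keys) and the sample identifiers in the test data are hypothetical and do not reflect the actual dump format.

```python
from collections import Counter


def parent_stats(records):
    """Summarize ultimate-parent links in a list of company records.

    Returns (number of relationships, number of distinct ultimate parents,
    parents sorted by number of controlled organizations, largest first).
    """
    children_per_parent = Counter(
        r["ultimate_parent"] for r in records if r.get("ultimate_parent")
    )
    relationships = sum(children_per_parent.values())
    distinct_parents = len(children_per_parent)
    return relationships, distinct_parents, children_per_parent.most_common()
```

Records without an ultimate parent are skipped, so the first number counts only organizations that declare one.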
16. Loading FIBO
• FIBO = Financial Industry Business Ontology
• We loaded FIBO Foundations and BE in GraphDB
− About 55 RDF files from the “foundations-14-11-30” and “business-entities-15-02-23” packages
• Reasoning switched to OWL 2 RL
− Loading takes 3-4 seconds
• Number of explicit statements: 5 433
• Number of total statements: 20 646
− Of which inferred and materialized: 15 213
17. Mapping FIBO to DBPedia
• We mapped FIBO to DBPedia Ontology
− Minimalistic approach – we mapped as much as we needed
dbo:Organization rdfs:subClassOf fibo-fnd-org-fm:FormalOrganization.
dbo:Company rdfs:subClassOf fibo-be-le-cb:Corporation.
dbo:Person rdfs:subClassOf fibo-fnd-aap-ppl:Person.
dbo:subsidiary rdfs:subPropertyOf fibo-fnd-rel-rel:controls.
• Methodological notes
− Note, fibo-fnd-rel-rel:controls is not transitive
− We mapped more specific DBpedia primitives to more general FIBO ones, so that the data becomes “visible” through FIBO
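The effect of these four mapping axioms can be illustrated with a small Python sketch. It applies only the sub-class and sub-property rules for the axioms listed above (a tiny subset of what an OWL 2 RL reasoner does), and the `ex:` resources in the test data are hypothetical.

```python
# The four mapping axioms from the slide, as plain dictionaries.
SUBCLASS_OF = {
    "dbo:Organization": "fibo-fnd-org-fm:FormalOrganization",
    "dbo:Company": "fibo-be-le-cb:Corporation",
    "dbo:Person": "fibo-fnd-aap-ppl:Person",
}
SUBPROPERTY_OF = {
    "dbo:subsidiary": "fibo-fnd-rel-rel:controls",
}


def infer(triples):
    """Return the input triples plus those entailed by the mapping axioms.

    One pass is enough here because none of the four axioms chains into
    another. No transitive closure is computed for `controls`, since the
    property is not transitive.
    """
    inferred = set(triples)
    for s, p, o in triples:
        if p == "rdf:type" and o in SUBCLASS_OF:
            inferred.add((s, "rdf:type", SUBCLASS_OF[o]))
        if p in SUBPROPERTY_OF:
            inferred.add((s, SUBPROPERTY_OF[p], o))
    return inferred
```

If A has subsidiary B and B has subsidiary C, the result contains "A controls B" and "B controls C" but not "A controls C". Finding indirect control therefore requires an explicit path query rather than relying on inference.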
18. See open data through the FIBO lens
19. Semantic Press-Clipping
• We can trace references to a specific company in the news
− This is fairly standard; however, we can handle syntactic variations in the names, because state-of-the-art Named Entity Recognition technology is used
− More importantly, we correctly determine whether a given mention of “Paris” refers to Paris (the capital of France), Paris in Texas, Paris Hilton or Paris (the Greek hero)
• We can trace and consolidate references to subsidiary (daughter) companies
• We have comprehensive industry classification
− The one from DBpedia, but refined to accommodate identifier variations and specialization (e.g. a company classified as dbr:Bank will also be considered classified as dbr:FinancialServices)
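The specialization rule for industries can be sketched as a roll-up along "broader industry" links. Only the dbr:Bank to dbr:FinancialServices link comes from the slide; the function names, the `ex:` companies and the `dbr:Automotive` label in the test data are assumptions for illustration.

```python
from collections import Counter

# From the slide: a bank is also a financial-services company.
BROADER = {"dbr:Bank": "dbr:FinancialServices"}


def expand(industry, broader):
    """Return the industry followed by all of its broader industries."""
    chain = [industry]
    while chain[-1] in broader and broader[chain[-1]] not in chain:
        chain.append(broader[chain[-1]])
    return chain


def companies_per_industry(company_industry, broader):
    """Count companies per industry, counting each company in broader industries too."""
    counts = Counter()
    for industry in company_industry.values():
        counts.update(expand(industry, broader))
    return counts
```

With this roll-up, a query for financial-services companies also returns banks, which is the behaviour a top-level industry ranking depends on.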
20. Sample queries at http://ff-news.ontotext.com
F1: Big cities in Eastern Europe
F2: Airports near London
F3: People and organizations related to Google
F4: Top-level industries by number of companies
F5: Mentions in the news of an organization and its related entities
F7: Most popular companies per industry, including children
F8: Regional exposition of company – normalized
FF-NEWS is still in beta testing! It is not officially launched, but it is available to play with
21. News Popularity Ranking: Automotive
Rank  Company                    News #  Company incl. mentions of controlled  News #
1     General Motors             2722    General Motors                        4620
2     Tesla Motors               2346    Volkswagen Group                      3999
3     Volkswagen                 2299    Fiat Chrysler Automobiles             2658
4     Ford Motor Company         1934    Tesla Motors                          2370
5     Toyota                     1325    Ford Motor Company                    2125
6     Chevrolet                  1264    Toyota                                1656
7     Chrysler                   1054    Renault-Nissan Alliance               1332
8     Fiat Chrysler Automobiles  1011    Honda                                 864
9     Audi AG                    972     BMW                                   715
10    Honda                      717     Takata Corporation                    547
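The second ranking adds the news mentions of controlled companies to those of the controlling company. A minimal sketch of that aggregation follows. It assumes that control links are followed over several levels, which the slides do not state, and all company names and counts in the test data are made up.

```python
def controlled_closure(company, controls):
    """Return every company controlled by `company`, directly or indirectly."""
    seen = set()
    stack = [company]
    while stack:
        current = stack.pop()
        for child in controls.get(current, ()):
            if child != company and child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def rank(mentions, controls, include_controlled):
    """Rank companies by news mentions, optionally adding mentions of controlled companies.

    Ties are broken alphabetically so the output is deterministic.
    """
    totals = {}
    for company, own in mentions.items():
        total = own
        if include_controlled:
            total += sum(
                mentions.get(child, 0)
                for child in controlled_closure(company, controls)
            )
        totals[company] = total
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))
```

This sketch still lists controlled companies as separate rows. A ranking like the one on the slide would additionally drop or merge companies that are controlled by another entry.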
22. News Popularity: Finance
Rank  Company          News #  Company incl. mentions of controlled  News #
1     Bloomberg L.P.   3203    Intra Bank                            261667
2     Goldman Sachs    1992    Hinduja Bank (Switzerland)            49731
3     JP Morgan Chase  1712    China Merchants Bank                  38288
4     Wells Fargo      1688    Alphabet Inc.                         22601
5     Citigroup        1557    Capital Group Companies               4076
6     HSBC Holdings    1546    Bloomberg L.P.                        3611
7     Deutsche Bank    1414    Exor                                  2704
8     Bank of America  1335    Nasdaq, Inc.                          2082
9     Barclays         1260    JP Morgan Chase                       1972
10    UBS              694     Sentinel Capital Partners             1053
Note: Including investment funds, stock exchanges, agencies, etc.
23. News Popularity: Banking
Rank  Company          News #  Company incl. mentions of controlled  News #
1     Goldman Sachs    996     China Merchants Bank *                38288
2     JP Morgan Chase  856     JP Morgan Chase                       1972
3     HSBC Holdings    773     Goldman Sachs                         1030
4     Deutsche Bank    707     HSBC                                  966
5     Barclays         630     Bank of America                       771
6     Citigroup        519     Deutsche Bank                         742
7     Bank of America  445     Barclays                              681
8     Wells Fargo      422     Citigroup                             630
9     UBS              347     Wells Fargo                           428
10    Chase            126     UBS                                   347
Note: including investment funds, stock exchanges, agencies, etc.
24. #LinkedLeaks Mapping Queries
Number of entities mapped by type
Companies mapped by industry
Companies mapped in the Finance sector
Politicians mapped
Athletes mapped
25. Presentation Outline
• Publishing Panama Papers DB as #LinkedLeaks
• Sample Queries
• FactForge-News open data playground
• Next steps
26. Future Work
• Publish and interlink LEI data and other datasets
− More comprehensive mapping of LEI data to DBPedia
− Refine #LinkedLeaks, providing more structure; FIBO mapping
− Launch updated FactForge.net portal
• Relationship discovery work
− Ultimate parent and suspicious control pattern discovery
− Organizations, related in the news, but not in other datasets
• Partnership with commercial data providers
• Partnership with journalists and analysts
27. Wrap up
• We published Offshore Leaks DB as Linked Open Data
− It took us a few days after the release of the raw CSVs.
− Mapping to DBpedia available
− Play with it! Take it!
• We allow multiple open datasets to be used for discovery
− It took just a few days to clean up DBpedia’s industry classifications and control relationships
− Several datasets accessible through Financial Industry Business Ontology (FIBO)
• Integrating more data sources is easy, e.g. GLEI and #LinkedLeaks
− We can integrate proprietary and 3rd party data within days or weeks
28. Thank you!
Experience the technology with NOW: Semantic News Portal
http://now.ontotext.com
Start using GraphDB and text-mining with S4 in the cloud
http://s4.ontotext.com
Play with open data at http://data.ontotext.com