
Powerful Information Discovery with Big Knowledge Graphs – The Offshore Leaks Case

Borislav Popov's slides from his lightning talk at Connected Data London. Borislav, a Director of Business Development at Ontotext, presented Ontotext's approach to tackling the Panama Papers leak, using a technology that combines semantic web and graph databases.

Published in: Technology

  1. 1. Powerful Information Discovery with Big Knowledge Graphs – The Offshore Leaks Case Ontotext, July 2016
  2. 2. Data - Content - User
     • Psychographic vs. demographic profiles
     • Build behavioural profiles on the basis of semantic metadata associated with the assets
     • Control results bias with runtime parameters
     • Create semantic fingerprints of assets
     • Driven by a knowledge graph
     • Automatically adapts through machine learning
     • Semantic Database
     • Replication Cluster for enterprise clients
     • Connectors to 3rd party indexing/storage products & hybrid queries
  3. 3. Data Layer – the Core
     [Diagram. Labels: Semantic Fingerprints of Content; Instance Data / Relationships / Facts; Ontology / Schema / Domain Model; GraphDB Enterprise; Master 1, Master 2; Node 1, Node 3; Node Zoom In]
  4. 4. Semantic Enrichment Overview
  5. 5. Personalization – User Actions Model
     [Diagram. Labels: perform, comments, votes, posts, preview, read, contains, leads to read, leads to preview; Article, Search, Action, Result, Date, FTS Q., Tag, Cat, Tag set, results, cat taxonomy, Search Log]
  6. 6. Quick news-analytics case
     • Our Dynamic Semantic Publishing platform links text to big open data graphs
     • One can navigate from text to concepts and get trends, related entities and news
     • Try it at http://now.ontotext.com
  7. 7. FF-NEWS: Data Integration and Loading
     • DBpedia (the English version only): 496M statements
     • Geonames (all geographic features on Earth): 150M statements
       − owl:sameAs links between DBpedia and Geonames: 471K statements
     • Company registry data (GLEI): 3M statements
     • News metadata (from NOW): 128M statements
     • Total size: 986M statements
       − Mapped to FIBO; 667M explicit statements + 318M inferred statements
       − RDFRank and geo-spatial indices enabled to allow for ranking and efficient geo-spatial constraints
     Open data integration for news analytics
  8. 8. Technology: Semantic Content Enrichment
  9. 9. News Metadata
     • Metadata from Ontotext’s Dynamic Semantic Publishing platform
       − Automatically generated as part of the NOW.ontotext.com semantic news showcase
     • News stream from Google since Feb 2015, about 10k news articles/month
       − ~70 tags (annotations) per news article
     • Tags link text mentions of concepts to the knowledge graph
       − Technically these are URIs for entities (people, organizations, locations, etc.) and key phrases
     Hidden Relationships in Data and Risk Analytics, Apr 2016
  10. 10. News Metadata
     Category                    Count
     International News         52 074
     Science and Technology     23 201
     Sports                     20 714
     Business                   15 155
     Lifestyle                  11 684
     Total                     122 828

     Mentions / entity type      Count
     Keyphrase               2 589 676
     Organization            1 276 441
     Location                1 260 972
     Person                  1 248 784
     Work                      309 093
     Event                     258 388
     RelationPersonRole        236 638
     Species                   180 946
  11. 11. Sample queries at http://ff-news.ontotext.com
     F1: Big cities in Eastern Europe
     F2: Airports near London
     F3: People and organizations related to Google
     F4: Top-level industries by number of companies
     F5: Mentions in the news of an organization and its related entities
     F7: Most popular companies per industry, including children
     F8: Regional exposition of company – normalized
     FF-NEWS is in Beta. Not officially launched, but available to play with.
  12. 12. News Popularity Ranking: Automotive
     Rank  Company                     News #   Company incl. mentions of child companies   News #
     1     General Motors              2722     General Motors                              4620
     2     Tesla Motors                2346     Volkswagen Group                            3999
     3     Volkswagen                  2299     Fiat Chrysler Automobiles                   2658
     4     Ford Motor Company          1934     Tesla Motors                                2370
     5     Toyota                      1325     Ford Motor Company                          2125
     6     Chevrolet                   1264     Toyota                                      1656
     7     Chrysler                    1054     Renault-Nissan Alliance                     1332
     8     Fiat Chrysler Automobiles   1011     Honda                                       864
     9     Audi AG                     972      BMW                                         715
     10    Honda                       717      Takata Corporation                          547
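The difference between the two rankings on this slide can be illustrated with a small roll-up. The following is a minimal Python sketch, not Ontotext's actual query: the parent links, article IDs and the choice to count distinct articles per group are all assumptions made for illustration.

```python
# Hypothetical parent links and article IDs, for illustration only.
PARENT = {
    "Chevrolet": "General Motors",
    "Audi AG": "Volkswagen Group",
    "Volkswagen": "Volkswagen Group",
}

MENTIONS = {
    "General Motors": {"a1", "a2", "a3"},
    "Chevrolet": {"a3", "a4"},
    "Volkswagen": {"a5", "a6"},
    "Audi AG": {"a6", "a7", "a8", "a9"},
    "Tesla Motors": {"a1", "a10"},
}


def top_parent(company):
    """Follow parent links until a company with no parent is reached."""
    while company in PARENT:
        company = PARENT[company]
    return company


def rank(mentions, roll_up):
    """Rank companies by number of distinct articles mentioning them.

    With roll_up=True, mentions of child companies are credited to the
    top-level parent, and an article mentioning both parent and child
    is counted once.
    """
    groups = {}
    for company, articles in mentions.items():
        key = top_parent(company) if roll_up else company
        groups.setdefault(key, set()).update(articles)
    return sorted(
        ((name, len(arts)) for name, arts in groups.items()),
        key=lambda item: (-item[1], item[0]),
    )


print(rank(MENTIONS, roll_up=False))
print(rank(MENTIONS, roll_up=True))
```

In this toy data the roll-up moves the group parent ahead of companies that rank higher individually, which is the kind of reordering visible between the left and right columns of the slide.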
  13. 13. News Popularity: Finance
     Rank  Company           News #   Company incl. mentions of controlled   News #
     1     Bloomberg L.P.    3203     Intra Bank                             261667
     2     Goldman Sachs     1992     Hinduja Bank (Switzerland)             49731
     3     JP Morgan Chase   1712     China Merchants Bank                   38288
     4     Wells Fargo       1688     Alphabet Inc.                          22601
     5     Citigroup         1557     Capital Group Companies                4076
     6     HSBC Holdings     1546     Bloomberg L.P.                         3611
     7     Deutsche Bank     1414     Exor                                   2704
     8     Bank of America   1335     Nasdaq, Inc.                           2082
     9     Barclays          1260     JP Morgan Chase                        1972
     10    UBS               694      Sentinel Capital Partners              1053
     Note: Including investment funds, stock exchanges, agencies, etc.
  14. 14. News Popularity: Banking
     Rank  Company           News #   Company incl. mentions of controlled   News #
     1     Goldman Sachs     996      China Merchants Bank *                 38288
     2     JP Morgan Chase   856      JP Morgan Chase                        1972
     3     HSBC Holdings     773      Goldman Sachs                          1030
     4     Deutsche Bank     707      HSBC                                   966
     5     Barclays          630      Bank of America                        771
     6     Citigroup         519      Deutsche Bank                          742
     7     Bank of America   445      Barclays                               681
     8     Wells Fargo       422      Citigroup                              630
     9     UBS               347      Wells Fargo                            428
     10    Chase             126      UBS                                    347
  15. 15. Offshore Leaks Database from ICIJ
     • Published by the International Consortium of Investigative Journalists (ICIJ) on 9 May 2016
     • A “searchable database” of about 320 000 offshore companies
       − 214 000 extracted from the Panama Papers (valid until 2015)
       − More than 100 000 from the 2013 Offshore Leaks investigation (valid until 2010)
     • CSV extract from a graph database available for download
     • https://offshoreleaks.icij.org/
  16. 16. Offshore Leaks Database
  17. 17. Offshore Leaks DB as Linked Open Data
     • Ontotext published the Offshore Leaks DB as Linked Open Data
     • Available for exploration, querying and download at http://data.ontotext.com
     • ONTOTEXT DISCLAIMERS: We use the data as provided by ICIJ. We make no representations or warranties of any kind, including warranties of title, accuracy, absence of errors or fitness for a particular purpose. All transformations, query results and derivative works are used only to showcase the service and technological capabilities and not to serve as a basis for any statements or conclusions.
  18. 18. Enrichment and structuring of the data
     • Relationship type hierarchy
       − About 80 relationship types in the original dataset were organized into a property hierarchy
     • Classification of officers into Person and Company
       − In the original database there is no way to tell whether an officer is a physical person
     • Mapping to DBpedia:
       − 209 countries referred to in the Offshore Leaks DB are mapped to DBpedia
       − About 3000 persons and 300 companies mapped to DBpedia
     • Overall size of the repository: 22M statements (20M explicit)
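To illustrate why organizing relationship types into a property hierarchy helps, here is a minimal Python sketch. The type names and the hierarchy are hypothetical and are not taken from the dataset. It only mimics the effect of sub-property reasoning: a query for a general relationship also returns edges recorded with a more specific type. The sketch assumes the hierarchy has no cycles.

```python
# Hypothetical relationship types; the real dataset has about 80 of them.
# Each key is a sub-property of its value.
SUB_PROPERTY_OF = {
    "shareholder_of": "owner_of",
    "beneficial_owner_of": "owner_of",
    "owner_of": "has_capital_relation",
    "director_of": "has_control_relation",
    "secretary_of": "has_control_relation",
}


def ancestors(prop):
    """Return prop followed by all of its super-properties, nearest first."""
    chain = [prop]
    while chain[-1] in SUB_PROPERTY_OF:
        chain.append(SUB_PROPERTY_OF[chain[-1]])
    return chain


def match(edges, wanted):
    """Select edges whose type is `wanted` or any sub-property of it."""
    return [edge for edge in edges if wanted in ancestors(edge[1])]


# (subject, relationship type, object) triples, also hypothetical.
edges = [
    ("officer/1", "shareholder_of", "entity/10"),
    ("officer/2", "director_of", "entity/10"),
    ("officer/3", "beneficial_owner_of", "entity/11"),
]

print(match(edges, "has_capital_relation"))
```

With a flat list of about 80 types, a question such as "officers by number of capital relations" would have to enumerate every specific type; with a hierarchy, one general property is enough.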
  19. 19. The RDF-ization Process
     • Linked data variant produced without programming
       − The raw CSV files are RDF-ized using TARQL, http://tarql.github.io/
       − Data was further interlinked and enriched in GraphDB using SPARQL
     • The process is documented in this README file
     • All relevant artifacts are open-source, available at https://github.com/Ontotext-AD/leaks/
     • The entire publishing and mapping took about 15 person-days
       − Including data.ontotext.com portal setup, promotion, documentation, etc.
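The slide states that the conversion was done with TARQL and SPARQL, without programming. Purely to show what "RDF-izing" a CSV row means, the following Python sketch turns rows into N-Triples lines. It is a stand-in illustration, not the authors' process: the namespace, the column names and the sample rows are assumptions, and node IDs are assumed to be safe to embed in an IRI.

```python
import csv
import io

# Hypothetical namespace, not the one used on data.ontotext.com.
BASE = "http://example.org/leaks/"


def escape_literal(value):
    """Escape a string for use inside an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(csv_text):
    """Emit one triple per non-empty column of every CSV row."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}entity/{row['node_id']}>"
        for column, value in row.items():
            if column == "node_id" or not value:
                continue  # the ID becomes the subject; empty cells yield no triple
            predicate = f"<{BASE}prop/{column}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


# Hypothetical sample rows.
sample = '''node_id,name,jurisdiction
1001,ACME HOLDINGS LTD,BVI
1002,"Blue ""Star"" Inc",
'''

for line in rows_to_ntriples(sample):
    print(line)
```

A declarative mapping tool expresses the same row-to-triples idea as a query over the CSV columns, which is why the slide can describe the process as needing no programming.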
  20. 20. Sample queries at http://data.ontotext.com
     Q1: Countries by number of entities related to them
     Q2: Country pairs by ownership statistics
     Q3: Statistics by incorporation year
     Q4: Officers and entities by number of capital relations
     Q5: Countries in Eastern Europe by number of owners
     Q6: Intermediaries in Asia by name
     Q7: The best connected officers
     Q8: Countries by number of Person and Company officers
  21. 21. Play with semantically enriched news: http://now.ontotext.com
     Play with open data at http://data.ontotext.com and http://ff-news.ontotext.com
