Powerful Information Discovery
with Big Knowledge Graphs –
The Offshore Leaks Case
Ontotext, July 2016
Data - Content - User
• Psychographic vs. demographic profiles
• Build behavioural profiles on the basis of semantic metadata associated with the assets
• Control results bias with runtime parameters
• Create semantic fingerprints of assets
• Driven by a knowledge graph
• Automatically adapts through machine learning
• Semantic Database
• Replication Cluster for enterprise clients
• Connectors to 3rd party indexing/storage products & hybrid queries
Data Layer – the Core
[Diagram. Layers: Semantic Fingerprints of Content; Instance Data / Relationships / Facts; Ontology / Schema / Domain Model. GraphDB Node Zoom In: Node 1, Node 3, Master 1, Master 2, Enterprise]
Semantic Enrichment Overview
Personalization – User Actions Model
[Diagram. Node and edge labels: perform; comments, votes, posts, preview, read; contains; leads to; Article; Search; Action; Result; Date; FTS Q.; Tag; Cat; Tag set; results; taxonomy; Search Log]
Quick news-analytics case
• Our Dynamic Semantic Publishing platform offers linking of text with big open data graphs
• One can navigate from text to concepts, get trends, related entities and news
• Try it at http://now.ontotext.com
FF-NEWS: Data Integration and Loading
• DBpedia (the English version only) 496M statements
• Geonames (all geographic features on Earth) 150M statements
− owl:sameAs links between DBpedia and Geonames 471K statements
• Company registry data (GLEI) 3M statements
• News metadata (from NOW) 128M statements
• Total size: 986M statements
− Mapped to FIBO; 667M explicit statements + 318M inferred statements
− RDFRank and geo-spatial indices enabled to allow for ranking and efficient geo-spatial constraints
Open data integration for news analytics
Technology: Semantic Content Enrichment
News Metadata
• Metadata from Ontotext’s Dynamic Semantic Publishing platform
− Automatically generated as part of the NOW.ontotext.com semantic news showcase
• News stream from Google since Feb 2015, about 10k news/month
− ~70 tags (annotations) per news article
• Tags link text mentions of concepts to the knowledge graph
− Technically these are URIs for entities (people, organizations, locations, etc.) and key phrases
Apr 2016, Hidden Relationships in Data and Risk Analytics
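The idea of a tag as a link from a text mention to a knowledge-graph URI can be sketched as a small data structure. This is only an illustration: the `Tag` class, the `tag_mentions` function and the gazetteer are hypothetical, and exact string matching stands in for the real enrichment pipeline, which the slides do not describe in detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention
    uri: str    # knowledge-graph identifier the mention resolves to
    kind: str   # entity type, e.g. "Organization" or "Location"


def tag_mentions(text, gazetteer):
    """Link every occurrence of each known surface form to its URI."""
    tags = []
    for surface, (uri, kind) in gazetteer.items():
        pos = text.find(surface)
        while pos != -1:
            tags.append(Tag(pos, pos + len(surface), uri, kind))
            pos = text.find(surface, pos + 1)
    return sorted(tags, key=lambda t: t.start)


# Hypothetical gazetteer; DBpedia-style URIs are used for illustration only.
GAZETTEER = {
    "General Motors": ("http://dbpedia.org/resource/General_Motors", "Organization"),
    "Detroit": ("http://dbpedia.org/resource/Detroit", "Location"),
}

text = "General Motors is headquartered in Detroit."
tags = tag_mentions(text, GAZETTEER)
for t in tags:
    print(text[t.start:t.end], "->", t.uri)
```

Storing offsets rather than copies of the text keeps the annotation separate from the article, so the same article can carry many tags of different types.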
News Metadata
Category                  Count
International News       52 074
Science and Technology   23 201
Sports                   20 714
Business                 15 155
Lifestyle                11 684
Total                   122 828

Mentions / entity type      Count
Keyphrase               2 589 676
Organization            1 276 441
Location                1 260 972
Person                  1 248 784
Work                      309 093
Event                     258 388
RelationPersonRole        236 638
Species                   180 946
Sample queries at http://ff-news.ontotext.com
F1: Big cities in Eastern Europe
F2: Airports near London
F3: People and organizations related to Google
F4: Top-level industries by number of companies
F5: Mentions in the news of an organization and its related entities
F7: Most popular companies per industry, including children
F8: Regional exposition of company – normalized
FF-NEWS is in Beta. Not officially launched, but available to play with.
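A query such as F2 (airports near London) boils down to a distance constraint around a point. The sketch below shows that constraint with the haversine formula in plain Python; it is not the GraphDB geo-spatial index and not the published query. The function names are hypothetical and the coordinates are approximate.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def near(center, places, max_km):
    """Return (name, distance) pairs within max_km of center, nearest first."""
    lat0, lon0 = center
    hits = [
        (name, haversine_km(lat0, lon0, lat, lon))
        for name, (lat, lon) in places.items()
    ]
    return sorted((h for h in hits if h[1] <= max_km), key=lambda h: h[1])


# Approximate coordinates, for illustration only.
LONDON = (51.5074, -0.1278)
AIRPORTS = {
    "Heathrow": (51.4700, -0.4543),
    "Gatwick": (51.1537, -0.1821),
    "Manchester": (53.3537, -2.2750),
}

nearby = near(LONDON, AIRPORTS, max_km=50)
for name, km in nearby:
    print(f"{name}: {km:.1f} km")
```

A linear scan like this is fine for a handful of points; a spatial index matters once the candidate set is as large as the Geonames data described earlier.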
News Popularity Ranking: Automotive
Rank  Company                    News #   Rank  Company incl. mentions of child companies  News #
1     General Motors             2722     1     General Motors                             4620
2     Tesla Motors               2346     2     Volkswagen Group                           3999
3     Volkswagen                 2299     3     Fiat Chrysler Automobiles                  2658
4     Ford Motor Company         1934     4     Tesla Motors                               2370
5     Toyota                     1325     5     Ford Motor Company                         2125
6     Chevrolet                  1264     6     Toyota                                     1656
7     Chrysler                   1054     7     Renault-Nissan Alliance                    1332
8     Fiat Chrysler Automobiles  1011     8     Honda                                      864
9     Audi AG                    972      9     BMW                                        715
10    Honda                      717      10    Takata Corporation                         547
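The right-hand ranking credits a parent company with the news that mention its child companies. One way to picture this is a roll-up over a company hierarchy, sketched below. It assumes that an article mentioning both parent and child is counted once (the slides do not say how overlaps are handled), and the article ids and function names are hypothetical.

```python
def descendants(company, children):
    """Return the company plus every company below it in the hierarchy."""
    seen, stack = set(), [company]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children.get(node, ()))
    return seen


def rollup_counts(mentions, children):
    """Count distinct articles mentioning a company or any of its descendants."""
    totals = {}
    for company in set(mentions) | set(children):
        articles = set()
        for member in descendants(company, children):
            articles |= mentions.get(member, set())
        totals[company] = len(articles)
    return totals


# Parent -> child links; article ids below are made up for illustration.
CHILDREN = {
    "General Motors": ["Chevrolet"],
    "Volkswagen Group": ["Volkswagen", "Audi AG"],
}
MENTIONS = {
    "General Motors": {1, 2, 3},
    "Chevrolet": {3, 4},
    "Volkswagen": {5, 6},
    "Audi AG": {6, 7},
}

totals = rollup_counts(MENTIONS, CHILDREN)
for company, n in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(company, n)
```

Using sets of article ids rather than summing per-company counts avoids double counting an article that names both a parent and one of its brands.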
News Popularity: Finance
Rank  Company           News #   Rank  Company incl. mentions of controlled  News #
1     Bloomberg L.P.    3203     1     Intra Bank                            261667
2     Goldman Sachs     1992     2     Hinduja Bank (Switzerland)            49731
3     JP Morgan Chase   1712     3     China Merchants Bank                  38288
4     Wells Fargo       1688     4     Alphabet Inc.                         22601
5     Citigroup         1557     5     Capital Group Companies               4076
6     HSBC Holdings     1546     6     Bloomberg L.P.                        3611
7     Deutsche Bank     1414     7     Exor                                  2704
8     Bank of America   1335     8     Nasdaq, Inc.                          2082
9     Barclays          1260     9     JP Morgan Chase                       1972
10    UBS               694      10    Sentinel Capital Partners             1053
Note: Including investment funds, stock exchanges, agencies, etc.
News Popularity: Banking
Rank  Company           News #   Rank  Company incl. mentions of controlled  News #
1     Goldman Sachs     996      1     China Merchants Bank *                38288
2     JP Morgan Chase   856      2     JP Morgan Chase                       1972
3     HSBC Holdings     773      3     Goldman Sachs                         1030
4     Deutsche Bank     707      4     HSBC                                  966
5     Barclays          630      5     Bank of America                       771
6     Citigroup         519      6     Deutsche Bank                         742
7     Bank of America   445      7     Barclays                              681
8     Wells Fargo       422      8     Citigroup                             630
9     UBS               347      9     Wells Fargo                           428
10    Chase             126      10    UBS                                   347
Offshore Leaks Database from ICIJ
• Published by the International Consortium of Investigative Journalists (ICIJ) on 9 May 2016
• A “searchable database” of about 320 000 offshore companies
− 214 000 extracted from the Panama Papers (valid until 2015)
− More than 100 000 from the 2013 Offshore Leaks investigation (valid until 2010)
• CSV extract from a graph database available for download
• https://offshoreleaks.icij.org/
Offshore Leaks Database
Offshore Leaks DB as Linked Open Data
• Ontotext published the Offshore Leaks DB as Linked Open Data
• Available for exploration, querying and download at http://data.ontotext.com
• ONTOTEXT DISCLAIMERS
We use the data as provided by ICIJ. We make no representations and warranties of any kind,
including warranties of title, accuracy, absence of errors or fitness for particular purpose. All
transformations, query results and derivative works are used only to showcase the service and
technological capabilities and not to serve as basis for any statements or conclusions.
Enrichment and structuring of the data
• Relationship type hierarchy
− About 80 relationship types in the original dataset were organized into a property hierarchy
• Classification of officers into Person and Company
− The original database offers no way to tell whether an officer is a natural person or a company
• Mapping to DBpedia:
− 209 countries referred to in the Offshore Leaks DB are mapped to DBpedia
− About 3000 persons and 300 companies mapped to DBpedia
• Overall size of the repository: 22M statements (20M explicit)
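A property hierarchy lets a query for a general relationship also match the more specific ones beneath it, which is the effect of rdfs:subPropertyOf reasoning. The sketch below imitates that in plain Python. All property names and triples are hypothetical; the actual hierarchy is defined in the published dataset, not here.

```python
# Hypothetical child -> parent property links, for illustration only.
SUB_PROPERTY_OF = {
    "shareholderOf": "ownerOf",
    "beneficialOwnerOf": "ownerOf",
    "ownerOf": "relatedTo",
    "directorOf": "officerOf",
    "secretaryOf": "officerOf",
    "officerOf": "relatedTo",
}


def super_properties(prop):
    """Return the property followed by all of its ancestors."""
    chain = [prop]
    while chain[-1] in SUB_PROPERTY_OF:
        chain.append(SUB_PROPERTY_OF[chain[-1]])
    return chain


def match(triples, prop):
    """Select triples whose predicate is prop or any sub-property of it."""
    return [t for t in triples if prop in super_properties(t[1])]


TRIPLES = [
    ("officer/1", "shareholderOf", "entity/A"),
    ("officer/2", "directorOf", "entity/A"),
    ("officer/3", "beneficialOwnerOf", "entity/B"),
]

owners = match(TRIPLES, "ownerOf")
everything = match(TRIPLES, "relatedTo")
print(len(owners), len(everything))
```

In a triplestore with inference enabled, the same result comes from inferred statements rather than from walking the hierarchy at query time; this may be part of why the repository holds more statements in total than explicit ones.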
The RDF-ization Process
• Linked data variant produced without programming
− The raw CSV files are RDF-ized using TARQL, http://tarql.github.io/
− Data was further interlinked and enriched in GraphDB using SPARQL
• The process is documented in the project's README file
• All relevant artifacts are open-source, available at https://github.com/Ontotext-AD/leaks/
• The entire publishing and mapping took about 15 person-days
− Including data.ontotext.com portal setup, promotion, documentation, etc.
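To give a feel for what RDF-izing a CSV row means, here is a plain-Python analogue that turns rows into N-Triples lines. It is not TARQL and not the mapping used in the project: the namespace, the column names (`node_id`, `name`, `jurisdiction`) and the sample rows are assumptions made for this sketch.

```python
import csv
import io

BASE = "http://example.org/leaks/"  # hypothetical namespace


def literal(value):
    """Quote a string as an N-Triples literal (backslash and quote escaping only)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def rows_to_ntriples(csv_text):
    """Emit one name triple per row, plus a jurisdiction triple when present."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}entity/{row['node_id']}>"
        lines.append(f"{subject} <{BASE}name> {literal(row['name'])} .")
        if row["jurisdiction"]:
            country = f"<{BASE}country/{row['jurisdiction']}>"
            lines.append(f"{subject} <{BASE}jurisdiction> {country} .")
    return lines


# Made-up sample rows; the second one has no jurisdiction.
SAMPLE = (
    "node_id,name,jurisdiction\n"
    '1001,"Acme ""Holdings"" Ltd",PAN\n'
    "1002,Example Corp,\n"
)

lines = rows_to_ntriples(SAMPLE)
print("\n".join(lines))
```

A declarative mapping tool expresses the same row-to-triple rules as a query over the CSV, which is what makes the "without programming" claim possible.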
Sample queries at http://data.ontotext.com
Q1: Countries by number of entities related to them
Q2: Country pairs by ownership statistics
Q3: Statistics by incorporation year
Q4: Officers and entities by number of capital relations
Q5: Countries in Eastern Europe by number of owners
Q6: Intermediaries in Asia by name
Q7: The best connected officers
Q8: Countries by number of Person and Company officers
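A query like Q1 is a group-and-count over the graph: collect distinct (entity, country) links, count entities per country, and sort in descending order. The sketch below does this over a toy in-memory triple list. The `hasCountry` predicate and the data are hypothetical, and this is not the query published on the portal.

```python
from collections import Counter


def countries_by_entity_count(triples, predicate="hasCountry"):
    """Rank countries by the number of distinct entities linked to them."""
    pairs = {(s, o) for s, p, o in triples if p == predicate}
    counts = Counter(country for _, country in pairs)
    return counts.most_common()


# Made-up triples; the duplicate link for entity/1 must be counted once.
TRIPLES = [
    ("entity/1", "hasCountry", "country/A"),
    ("entity/1", "hasCountry", "country/A"),
    ("entity/2", "hasCountry", "country/A"),
    ("entity/3", "hasCountry", "country/A"),
    ("entity/4", "hasCountry", "country/B"),
    ("entity/5", "hasCountry", "country/B"),
    ("entity/6", "hasCountry", "country/C"),
    ("entity/6", "name", "Example Corp"),
]

ranking = countries_by_entity_count(TRIPLES)
for country, n in ranking:
    print(country, n)
```

In SPARQL the same shape is a COUNT(DISTINCT ...) with GROUP BY and ORDER BY DESC.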
Play with semantically enriched news: http://now.ontotext.com
Play with open data at http://data.ontotext.com and http://ff-news.ontotext.com

Powerful Information Discovery with Big Knowledge Graphs –The Offshore Leaks Case

  • 1. Powerful Information Discovery with Big Knowledge Graphs – The Offshore Leaks Case Ontotext, July 2016
  • 2. Data - Content - User • Psycho-graphic vs. demographic profiles • Build behavioural profiles on the basis of semantic metadata associated with the assets • Control results bias with runtime parameters • Create semantic fingerprints of assets • Driven off of a knowledge graph • Automatically adapts through machine learning • Semantic Database • Replication Cluster for enterprise clients • Connectors to 3rd party indexing/storage products & hybrid queries
  • 3. Data Layer – the Core Semantic Fingerprints of Content Instance Data / Relationships / Facts Ontology / Schema / Domain Model GraphDB Node Zoom In Node 1 Node 3 Master 1 Master 2 Enterprise
  • 5. Personalization – User Actions Model perform comments votes posts preview read contains leads to read leads to preview Article Search Action Result Date FTS Q. Tag Cat Tag set results cat taxonomy Search Log ------------- ------------- ------------- ------------- -------------
  • 6. Quick news-analytics case • Our Dynamic Semantic Publishing platform offers linking of text with big open data graphs • One can navigate from text to concepts, get trends, related entities and news • Try it at http://now.ontotext.com
  • 7. FF-NEWS: Data Integration and Loading
      • DBpedia (the English version only): 496M statements
      • Geonames (all geographic features on Earth): 150M statements
          − owl:sameAs links between DBpedia and Geonames: 471K statements
      • Company registry data (GLEI): 3M statements
      • News metadata (from NOW): 128M statements
      • Total size: 986M statements
          − Mapped to FIBO; 667M explicit statements + 318M inferred statements
          − RDFRank and geo-spatial indices enabled to allow for ranking and efficient geo-spatial constraints
      Open data integration for news analytics
  • 9. News Metadata
      • Metadata from Ontotext’s Dynamic Semantic Publishing platform
          − Automatically generated as part of the NOW.ontotext.com semantic news showcase
      • News stream from Google since Feb 2015, about 10k news articles/month
          − ~70 tags (annotations) per news article
      • Tags link text mentions of concepts to the knowledge graph
          − Technically these are URIs for entities (people, organizations, locations, etc.) and key phrases
      Apr 2016 · Hidden Relationships in Data and Risk Analytics
  • 10. News Metadata
      Category                     Count
      International News          52 074
      Science and Technology      23 201
      Sports                      20 714
      Business                    15 155
      Lifestyle                   11 684
      Total                      122 828

      Mentions / entity type       Count
      Keyphrase                2 589 676
      Organization             1 276 441
      Location                 1 260 972
      Person                   1 248 784
      Work                       309 093
      Event                      258 388
      RelationPersonRole         236 638
      Species                    180 946
  • 11. Sample queries at http://ff-news.ontotext.com
      F1: Big cities in Eastern Europe
      F2: Airports near London
      F3: People and organizations related to Google
      F4: Top-level industries by number of companies
      F5: Mentions in the news of an organization and its related entities
      F7: Most popular companies per industry, including children
      F8: Regional exposition of company – normalized
      FF-NEWS is in Beta. Not officially launched, but available to play with.
  • 12. News Popularity Ranking: Automotive
      Rank  Company                    News #   Company incl. mentions of child companies  News #
      1     General Motors               2722   General Motors                               4620
      2     Tesla Motors                 2346   Volkswagen Group                             3999
      3     Volkswagen                   2299   Fiat Chrysler Automobiles                    2658
      4     Ford Motor Company           1934   Tesla Motors                                 2370
      5     Toyota                       1325   Ford Motor Company                           2125
      6     Chevrolet                    1264   Toyota                                       1656
      7     Chrysler                     1054   Renault-Nissan Alliance                      1332
      8     Fiat Chrysler Automobiles    1011   Honda                                         864
      9     Audi AG                       972   BMW                                           715
      10    Honda                         717   Takata Corporation                            547
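The right-hand ranking credits a parent company with news that mentions its child companies. A minimal Python sketch of such a roll-up follows. It is not the FF-NEWS query: the company names, parent links and article IDs are invented for illustration, and counting distinct articles (rather than summing per-company counts) is an assumption about how double counting might be avoided.

```python
from collections import defaultdict

# Hypothetical parent links: child company -> controlling company.
PARENT = {
    "BrandA": "GroupX",
    "BrandB": "GroupX",
    "GroupX": "HoldingZ",
}

# Hypothetical article IDs that mention each company directly.
MENTIONS = {
    "GroupX": {1, 2, 3},
    "BrandA": {3, 4},
    "BrandB": {5},
    "HoldingZ": {6},
    "SoloCo": {7, 8},
}

def ancestors(company, parent):
    """Yield the company itself, then every company above it in the hierarchy."""
    seen = set()
    while company is not None and company not in seen:
        seen.add(company)  # guards against accidental cycles in the parent map
        yield company
        company = parent.get(company)

def rollup(mentions, parent):
    """Return {company: articles mentioning it or any of its descendants}."""
    total = defaultdict(set)
    for company, articles in mentions.items():
        for node in ancestors(company, parent):
            total[node] |= articles
    return dict(total)

def rank(article_sets):
    """Sort companies by number of distinct articles, descending; ties by name."""
    return sorted(((c, len(a)) for c, a in article_sets.items()),
                  key=lambda pair: (-pair[1], pair[0]))

direct = rank(MENTIONS)
with_children = rank(rollup(MENTIONS, PARENT))
```

With this toy data the top of the list changes once children are included, which is the same effect the two automotive rankings show.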
  • 13. News Popularity: Finance
      Rank  Company              News #   Company incl. mentions of controlled   News #
      1     Bloomberg L.P.         3203   Intra Bank                             261667
      2     Goldman Sachs          1992   Hinduja Bank (Switzerland)              49731
      3     JP Morgan Chase        1712   China Merchants Bank                    38288
      4     Wells Fargo            1688   Alphabet Inc.                           22601
      5     Citigroup              1557   Capital Group Companies                  4076
      6     HSBC Holdings          1546   Bloomberg L.P.                           3611
      7     Deutsche Bank          1414   Exor                                     2704
      8     Bank of America        1335   Nasdaq, Inc.                             2082
      9     Barclays               1260   JP Morgan Chase                          1972
      10    UBS                     694   Sentinel Capital Partners                1053
      Note: Including investment funds, stock exchanges, agencies, etc.
  • 14. News Popularity: Banking
      Rank  Company              News #   Company incl. mentions of controlled   News #
      1     Goldman Sachs           996   China Merchants Bank *                  38288
      2     JP Morgan Chase         856   JP Morgan Chase                          1972
      3     HSBC Holdings           773   Goldman Sachs                            1030
      4     Deutsche Bank           707   HSBC                                      966
      5     Barclays                630   Bank of America                           771
      6     Citigroup               519   Deutsche Bank                             742
      7     Bank of America         445   Barclays                                  681
      8     Wells Fargo             422   Citigroup                                 630
      9     UBS                     347   Wells Fargo                               428
      10    Chase                   126   UBS                                       347
  • 15. Offshore Leaks Database from ICIJ
      • Published by the International Consortium of Investigative Journalists (ICIJ) on 9 May 2016
      • A “searchable database” of about 320 000 offshore companies
          − 214 000 extracted from the Panama Papers (valid until 2015)
          − More than 100 000 from the 2013 Offshore Leaks investigation (valid until 2010)
      • CSV extract from a graph database available for download
      • https://offshoreleaks.icij.org/
  • 17. Offshore Leaks DB as Linked Open Data
      • Ontotext published the Offshore Leaks DB as Linked Open Data
      • Available for exploration, querying and download at http://data.ontotext.com
      • ONTOTEXT DISCLAIMERS: We use the data as provided by ICIJ. We make no representations and warranties of any kind, including warranties of title, accuracy, absence of errors or fitness for particular purpose. All transformations, query results and derivative works are used only to showcase the service and technological capabilities and not to serve as basis for any statements or conclusions.
  • 18. Enrichment and structuring of the data
      • Relationship type hierarchy
          − About 80 relationship types in the original dataset were organized into a property hierarchy
      • Classification of officers into Person and Company
          − In the original database there is no way to tell whether an officer is a physical person
      • Mapping to DBpedia:
          − 209 countries referred to in the Offshore Leaks DB are mapped to DBpedia
          − About 3000 persons and 300 companies mapped to DBpedia
      • Overall size of the repository: 22M statements (20M explicit)
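Organizing relationship types into a property hierarchy is one way a repository ends up holding more statements than were loaded explicitly: each explicit statement also implies the same statement for every super-property. A minimal Python sketch of that idea follows. The property names and identifiers are invented for illustration and are not the ones used in the published dataset, and the loop only mimics sub-property reasoning, which a triple store would normally perform itself.

```python
# Hypothetical slice of a relationship-type hierarchy: property -> super-property.
SUB_PROPERTY_OF = {
    "shareholderOf": "ownerOf",
    "beneficialOwnerOf": "ownerOf",
    "ownerOf": "hasRelationTo",
    "directorOf": "officerOf",
    "officerOf": "hasRelationTo",
}

def super_properties(prop, hierarchy):
    """Return all ancestors of prop, nearest first."""
    chain = []
    seen = {prop}
    current = hierarchy.get(prop)
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = hierarchy.get(current)
    return chain

def infer(explicit, hierarchy):
    """Return the triples implied by the hierarchy that are not already explicit."""
    inferred = set()
    for s, p, o in explicit:
        for sup in super_properties(p, hierarchy):
            triple = (s, sup, o)
            if triple not in explicit:
                inferred.add(triple)
    return inferred

# Two invented explicit statements about officers and an offshore entity.
explicit = {
    ("officer:1", "shareholderOf", "entity:9"),
    ("officer:2", "directorOf", "entity:9"),
}
inferred = infer(explicit, SUB_PROPERTY_OF)
```

A query for the general `hasRelationTo` property would then match both officers, even though neither statement used that property directly.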
  • 19. The RDF-ization Process
      • Linked data variant produced without programming
          − The raw CSV files are RDF-ized using TARQL, http://tarql.github.io/
          − Data was further interlinked and enriched in GraphDB using SPARQL
      • The process is documented in this README file
      • All relevant artifacts are open-source, available at https://github.com/Ontotext-AD/leaks/
      • The entire publishing and mapping took about 15 person-days.
          − Including data.ontotext.com portal setup, promotion, documentation, etc.
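To illustrate what RDF-izing a CSV file amounts to, here is a minimal Python sketch that turns each row into N-Triples lines. It is only a stand-in: the slides state that the actual conversion used TARQL mappings rather than custom code. The namespace, the column names and the sample rows below are all hypothetical and do not reflect the real ICIJ files.

```python
import csv
import io

BASE = "http://example.org/leaks/"  # placeholder namespace, not Ontotext's

# Hypothetical CSV extract; the real ICIJ files use different columns.
SAMPLE_CSV = """node_id,name,country_code
1001,Example Holdings Ltd,VG
1002,"Sample Trading, Inc.",PA
"""

def literal(value):
    """Escape a string as a plain N-Triples literal."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))
    return f'"{escaped}"'

def rdfize(csv_text):
    """Turn each CSV row into one name triple and, if present, one country triple."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}entity/{row['node_id']}>"
        lines.append(f"{subject} <{BASE}name> {literal(row['name'])} .")
        if row["country_code"]:
            country = f"<{BASE}country/{row['country_code']}>"
            lines.append(f"{subject} <{BASE}country> {country} .")
    return lines

triples = rdfize(SAMPLE_CSV)
print("\n".join(triples))
```

Minting country values as URIs rather than literals is what later makes it possible to link them to an external dataset, as the slides describe for DBpedia.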
  • 20. Sample queries at http://data.ontotext.com
      Q1: Countries by number of entities related to them
      Q2: Country pairs by ownership statistics
      Q3: Statistics by incorporation year
      Q4: Officers and entities by number of capital relations
      Q5: Countries in Eastern Europe by number of owners
      Q6: Intermediaries in Asia by name
      Q7: The best connected officers
      Q8: Countries by number of Person and Company officers
  • 21. Play with semantically enriched news: http://now.ontotext.com
      Play with open data at http://data.ontotext.com and http://ff-news.ontotext.com