A presentation by Atanas Kiryakov, Ontotext's CEO, at the first edition of Graphorum (http://graphorum2017.dataversity.net/) – a new forum that taps into the growing interest in Graph Databases and Technologies. Graphorum is co-located with the Smart Data Conference, organized by the digital publishing platform Dataversity.
The presentation demonstrates Ontotext's approach to more intelligent information gathering and analysis by:
- graphically exploring the connectivity patterns in big datasets;
- building new links between identical entities residing in different data silos;
- getting insights into what types of queries can be run against various linked datasets;
- reliably filtering information based on relationships, e.g., between people and organizations, in the news;
- demonstrating the conversion of tabular data into RDF.
Learn more at http://ontotext.com/.
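The tabular-to-RDF conversion mentioned in the last bullet can be sketched in a few lines of plain Python; the base IRI, column names and property names below are invented for illustration, not taken from the presentation.

```python
import csv
import io

# Hypothetical sketch: turn each row of a tiny CSV into RDF triples
# serialized as N-Triples. The base IRI and the column-to-property
# mapping are illustrative assumptions.
BASE = "http://example.org/"

def row_to_ntriples(row):
    subject = f"<{BASE}company/{row['id']}>"
    return [
        f'{subject} <{BASE}name> "{row["name"]}" .',
        f'{subject} <{BASE}city> "{row["city"]}" .',
    ]

csv_text = "id,name,city\n1,Acme,Sofia\n2,Globex,London\n"
reader = csv.DictReader(io.StringIO(csv_text))
triples = [t for row in reader for t in row_to_ntriples(row)]
for t in triples:
    print(t)
```

Real-world pipelines handle datatypes, IRI-safe escaping and reusable mapping definitions, all of which this sketch omits.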
The Bounties of Semantic Data Integration for the Enterprise – Ontotext
If you are looking for solutions that allow you not only to manage all of your data (structured, semi-structured and unstructured) but to also make the most out of them, using a common language is critical.
Adding Semantic Technology to data integration provides the glue that holds together all your enterprise data and their relationships in a meaningful way.
Learn how you can quickly design data processing jobs and integrate massive amounts of data and see what semantic integration can do for your data and your business.
www.ontotext.com
Transforming Your Data with GraphDB: GraphDB Fundamentals, Jan 2018 – Ontotext
These are slides from a live webinar held in January 2018.
GraphDB™ Fundamentals builds the basis for working with graph databases that utilize the W3C standards, and particularly GraphDB™. In this webinar, we demonstrated how to install and set up GraphDB™ 8.4 and how you can generate your first RDF dataset. We also showed how to quickly integrate complex and highly interconnected data using RDF and SPARQL, and much more.
With the help of GraphDB™, you can start smartly managing your data assets, visually represent your data model and get insights from them.
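To give a flavour of the "first RDF dataset" workflow the webinar covers, here is a minimal Turtle document and a SPARQL query over it, held as Python strings; the vocabulary and data are invented for illustration and are not taken from the webinar itself.

```python
# A hypothetical first RDF dataset in Turtle syntax.
TURTLE_DATA = """
@prefix ex: <http://example.org/> .
ex:alice ex:livesIn ex:sofia .
ex:sofia ex:locatedIn ex:bulgaria .
"""

# A SPARQL query that follows the graph from people to the
# countries of the cities they live in -- the kind of join that
# is a single graph pattern in SPARQL.
SPARQL_QUERY = """
PREFIX ex: <http://example.org/>
SELECT ?person ?country
WHERE {
  ?person ex:livesIn ?city .
  ?city ex:locatedIn ?country .
}
"""
```

Loaded into any SPARQL-capable store such as GraphDB, the query would bind `?person` to `ex:alice` and `?country` to `ex:bulgaria`.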
[Webinar] FactForge Debuts: Trump World Data and Instant Ranking of Industry ... – Ontotext
This webinar continues a series demonstrating how linked open data and semantic tagging of news can be used for comprehensive media monitoring, market and business intelligence. The platform for the demonstrations is FactForge: a hub for news and data about people, organizations and locations (POL). FactForge embodies a big knowledge graph (BKG) of more than 1 billion facts that allows various analytical queries, including tracing suspicious patterns of company control and media monitoring of people, including companies owned by them, their subsidiaries, etc.
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud – Ontotext
This webinar will break the roadblocks that prevent many from reaping the benefits of heavyweight Semantic Technology in small scale projects. We will show you how to build Semantic Search & Analytics proof of concepts by using managed services in the Cloud.
Knowledge graphs – they are what all businesses are now on the lookout for. But what exactly is a knowledge graph and, more importantly, how do you get one? Do you get it as an out-of-the-box solution or do you have to build it (or have someone else build it for you)? With the help of our knowledge graph technology experts, we have created a step-by-step list of how to build a knowledge graph. It will properly expose and enforce the semantics of the data model via inference, consistency checking and validation, and thus offer organizations many more opportunities to transform and interlink data into coherent knowledge.
Analytics on Big Knowledge Graphs Deliver Entity Awareness and Help Data Linking – Ontotext
A presentation by Ontotext's CEO Atanas Kiryakov, given during Semantics 2018 – an annual conference that brings together researchers and professionals from all over the world to share knowledge and expertise on semantic computing.
Property graph vs. RDF Triplestore comparison in 2020 – Ontotext
This presentation goes all the way from an introduction to what graph databases are, to a table comparing RDF vs. property graphs, plus two diagrams presenting the market circa 2020.
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes – Ontotext
This presentation will provide a brief introduction to logical reasoning and an overview of the most popular semantic schema and ontology languages: RDFS and the profiles of OWL 2.
While automatic reasoning has always inspired the imagination, numerous projects have failed to deliver on their promises. The typical pitfalls related to ontologies and symbolic reasoning fall into three categories:
- Over-engineered ontologies. The selected ontology language and modeling patterns can be too expressive. This can make the results of inference hard to understand and verify, which in turn makes the KG hard to evolve and maintain. It can also impose performance penalties far greater than the benefits.
- Inappropriate reasoning support. There are many inference algorithms and implementation approaches that work well with taxonomies and conceptual models of a few thousand concepts, but cannot cope with KGs of millions of entities.
- Inappropriate data layer architecture. One such example is reasoning with virtual KG, which is often infeasible.
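To make the scale concern above concrete, here is a toy forward-chaining materialization of just two RDFS rules in plain Python; the mini-ontology is invented for illustration, and production reasoners such as GraphDB implement full RDFS/OWL 2 RL rule sets with far better algorithms – a naive fixpoint loop like this is exactly the kind of approach that breaks down on KGs of millions of entities.

```python
# Toy forward-chaining materialization of two RDFS rules:
#   (1) rdfs:subClassOf is transitive
#   (2) rdf:type propagates along rdfs:subClassOf
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

def materialize(triples):
    closed = set(triples)
    changed = True
    while changed:  # repeat until no new triples are inferred
        changed = False
        sub = {(s, o) for s, p, o in closed if p == SUBCLASS}
        # rule 1: transitivity of subClassOf
        for c1, c2 in sub:
            for c3, c4 in sub:
                if c2 == c3 and (c1, SUBCLASS, c4) not in closed:
                    closed.add((c1, SUBCLASS, c4))
                    changed = True
        # rule 2: propagate types up the class hierarchy
        for s, p, o in list(closed):
            if p == TYPE:
                for c1, c2 in sub:
                    if o == c1 and (s, TYPE, c2) not in closed:
                        closed.add((s, TYPE, c2))
                        changed = True
    return closed

# Hypothetical mini-ontology (names invented for illustration)
facts = {
    (":GraphDB", TYPE, ":Triplestore"),
    (":Triplestore", SUBCLASS, ":GraphDatabase"),
    (":GraphDatabase", SUBCLASS, ":Database"),
}
inferred = materialize(facts)
```

Even on three input triples the loop re-scans the whole closure on every pass, which illustrates why reasoner choice and data layer architecture matter at billion-triple scale.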
Linked Data Experiences at Springer Nature – Michele Pasin
An overview of how we're using semantic technologies at Springer Nature, and an introduction to our latest product: www.scigraph.com
(Keynote given at http://2016.semantics.cc/, Leipzig, Sept 2016)
Diving in Panama Papers and Open Data to Discover Emerging News – Ontotext
Get guidance through the gigantic sea of freely released data from the Panama Papers as well as the Linked Open Data cloud. You will learn how it can empower your understanding of today's news or any other information source.
Gain Super Powers in Data Science: Relationship Discovery Across Public Data – Ontotext
What data scientists know better than anybody else is that data relationships are what matter most. You can't understand your data if you look at it as pieces in data silos.
In this webinar we’ll showcase how to discover relationships across public data.
Build Narratives, Connect Artifacts: Linked Open Data for Cultural Heritage – Ontotext
Scholars, book researchers and museum directors who try to find the underlying connections between resources face many issues. Scholars in particular continuously emphasize the role of digital humanities and the value of linked data in cultural heritage information systems.
Efficient Practices for Large Scale Text Mining Process – Ontotext
Text mining is a necessity when managing large scale textual collections. It facilitates access to otherwise hard-to-organise unstructured and heterogeneous documents, allows for the extraction of hidden knowledge and opens new dimensions in data exploration.
In this webinar, Ivelina Nikolova, PhD, shares best practices and text analysis examples from successful text mining processes in domains like news, financial and scientific publishing, the pharma industry and cultural heritage.
[Conference] Cognitive Graph Analytics on Company Data and News – Ontotext
Atanas Kiryakov, Ontotext's CEO, presented at the Data Day Texas 2018 conference, which took place in Austin, TX, USA, on January 27th.
Ontotext's talk was part of the Graph Day Sessions and its focus was 'Cognitive graph analytics on company data and news', aiming to demonstrate the power of Graph Analytics to create links between various datasets and lead to knowledge discovery.
How to Reveal Hidden Relationships in Data and Risk Analytics – Ontotext
Imagine a risk analysis manager or compliance officer who can easily discover relationships like this: Big Bucks Café out of Seattle controls My Local Café in NYC through an offshore company. Such a discovery can be a game changer if My Local Café pretends to be an independent small enterprise while Big Bucks has recently experienced financial difficulties.
Linking Open, Big Data Using Semantic Web Technologies - An Introduction – Ronald Ashri
The Physics Department of the University of Cagliari and the Linkalab Group invited me to talk about the Semantic Web and Linked Data - this is simply an introduction to the technologies involved.
Using the Semantic Web Stack to Make Big Data Smarter – Matheus Mota
This presentation will discuss how just a few parts of the Semantic Web Cake can already boost your analytics by making your (big) data smarter and even more connected.
A Semantic Data Model for Web Applications – Armin Haller
This presentation gives a short overview of the Semantic Web, RDFa and Linked Data. The second part briefly discusses ActiveRaUL, our model and system for developing form-based Web applications using Semantic Web technologies.
ROI in Linking Content to CRM by Applying the Linked Data Stack – Martin Voigt
Today, decision makers in enterprises have to rely more and more on a variety of data sets that are available internally but also externally, in heterogeneous formats. Therefore, intelligent processes are required to build an integrated knowledge base. Unfortunately, the adoption of the Linked Data lifecycle within enterprises, which targets the extraction, interlinking, publishing and analytics of distributed data, lags behind the public domain due to missing frameworks that are efficient to deploy and easy to use. In this paper, we present our adoption of the lifecycle through our generic, enterprise-ready Linked Data workbench. To judge its benefits, we describe its application within a real-world Customer Relationship Management scenario. It shows (1) that sales employees could significantly reduce their workload and (2) that the integration of sophisticated Linked Data tools comes with an obvious positive Return on Investment.
EC-WEB: Validator and Preview for the JobPosting Data Model of Schema.org – Jindřich Mynarz
The presentation describes a tool for validating and previewing instances of Schema.org JobPosting described in structured data markup embedded in web pages. The validator and preview were developed to assist users of Schema.org in producing data of better quality. In this way, the tool tries to enhance the usability of the part of Schema.org covering the domain of job postings. The paper discusses the implementation of the tool and the design of its validation rules based on SPARQL 1.1. Results of experimental validation of a job posting corpus harvested from the Web are presented. Among other findings, the results indicate that publishers of Schema.org JobPosting data often misunderstand the precedence rules employed by markup parsers and that they ignore the case-sensitivity of vocabulary names.
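The validation rules described above are expressed in SPARQL 1.1. The query below is a hypothetical rule in that spirit, not one of the paper's actual rules: it reports postings that omit a required property, and its class name also illustrates the case-sensitivity of vocabulary names that the findings mention.

```python
# Hypothetical SPARQL 1.1 validation rule in the spirit of the
# JobPosting validator (not one of the paper's actual rules):
# report postings that lack the schema:title property. Vocabulary
# names are case-sensitive, so schema:JobPosting must be written
# exactly as in the Schema.org definition (not schema:jobposting).
VALIDATION_RULE = """
PREFIX schema: <http://schema.org/>
SELECT ?posting
WHERE {
  ?posting a schema:JobPosting .
  FILTER NOT EXISTS { ?posting schema:title ?title }
}
"""
```

Run against a graph of harvested markup, each binding of `?posting` would be one job posting failing the rule.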
How is the Semantic Web vision unfolding, and what does it take for the Web to fully reach its potential and evolve from a Web of Documents to a Web of Data through universal data representation standards?
Best Practices for Large Scale Text Mining Processing – Ontotext
Q&A:
NOW facilitates semantic search by having annotations attached to search strings. How complex does that get, e.g., with wildcards between annotated strings?
NOW’s searchbox is quite basic at the moment, but still supports a few scenarios.
1. Pure concept/faceted search - search for all documents containing a concept or where a set of concepts co-occur. Ranking is based on frequency of occurrence.
2. Concept/faceted + full text search - search for both concepts and a particular textual term or phrase.
3. Full text search
With search, pretty much anything can be done to customise it. For the NOW showcase we’ve kept it fairly simple, as usually every client has a slightly different case and wants to tune search in a slightly different direction.
The search in NOW is faceted which means that you search with concepts (facets) and you retrieve all documents which contain mentions of the searched concept. If you search by more than one facet the engine retrieves documents which contain mentions of both concepts but there is no restriction that they occur next to each other.
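The AND-style faceted retrieval described in this answer can be sketched in a few lines; the documents and concept annotations below are invented for illustration.

```python
from collections import Counter

# Toy corpus: each document carries the concept annotations found
# in its text (names invented for illustration).
docs = {
    "d1": ["Apple", "Tim Cook", "Apple", "iPhone"],
    "d2": ["Apple", "Samsung"],
    "d3": ["Samsung", "Galaxy"],
}

def facet_search(facets):
    """Return documents mentioning ALL facets, ranked by how often
    the searched concepts occur (no adjacency restriction)."""
    hits = []
    for doc_id, annotations in docs.items():
        counts = Counter(annotations)
        if all(f in counts for f in facets):
            hits.append((doc_id, sum(counts[f] for f in facets)))
    return sorted(hits, key=lambda h: -h[1])

print(facet_search(["Apple"]))             # d1 (two mentions) ranks above d2
print(facet_search(["Apple", "Samsung"]))  # only d2 contains both concepts
```

A production engine would of course use an inverted index rather than scanning every document, but the ranking and co-occurrence semantics are the same.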
Is the tagging service expandable (say, with custom ontologies)? Also, is it something you offer as a service? It is unclear to me from the website.
The TAG service is used for demonstration purposes only. The models behind it are trained for annotating news articles. The pipeline is customizable for every concrete scenario, different domains and entities of interest. You can access several of our pipelines as a service through the S4 platform, or you can have them hosted as an on-premise solution. In some cases our clients want domain adaptation or improvements in a particular area, or to tag with their internal dataset; in that case we again offer an on-premise deployment and also a managed service hosted on our hardware.
Does your system accommodate cluster analysis using unsupervised keyword/phrase annotation for knowledge discovery?
Insofar as patterns of user behaviour are also considered knowledge discovery, we employ these for suggesting related reads. Apart from that, we have experience tailoring custom clustering pipelines, which also rely on features like keywords and named entities.
For topic extraction, how many topics can we extract? From a Twitter corpus, what can we infer?
For topic extraction we have determined that we obtain the best results when suggesting 3 categories. These are taken from IPTC, but only the uppermost levels, which number fewer than 20.
The Twitter corpus example is from a project Ontotext participates in called Pheme. The goal of the project is to detect rumours and check their veracity, thus helping journalists in their hunt for attractive news.
Do you provide Processing Resources and JAPE rules for GATE framework and that can be used with GATE embedded?
We are contributing to the GATE framework, and everything which has been wrapped up as PRs has been included in the corresponding GATE distributions.
Why and how a graph database can serve you better (and at a lower cost) than a relational database when it comes to representing, storing and querying highly interconnected data
Thank you for your interest in downloading our webinar, Analytics for 2014: The Numbers that Matter.
In this webinar, you'll learn about important metrics for web, email, mobile and social channels. When you collect the numbers that matter, you'll learn what is happening. From there, you can hypothesize the why behind certain analytics and create a plan to optimize and improve. Now that's Smart Marketing!
The technologies and people we are designing experiences for are constantly changing; in most cases they are changing at a rate that is difficult to keep up with. When we think about how our teams are structured and the design processes we use in light of this challenge, a new design problem (or problem space) emerges, one that requires us to focus inward. How do we structure our teams and processes to be resilient? What would happen if we looked at our teams and design process as IAs, designers and researchers? What strategies would we put in place to help them be successful? This talk will look at challenges we face leading, supporting, or simply being a part of design teams creating experiences for user groups with changing technological needs.
Antidot Semantic Publishing - Réussir un site éditorial agrégeant plusieurs s... – Antidot
To launch Ilosport.fr in the summer of 2013 – the first multi-sport Internet portal dedicated to practising sports – L'Équipe wanted to offer comprehensive information on the sports most widely practised in France: the genesis and history of each sport; advice on fitness, equipment and safety; and all the useful information on places to practise, along with an events calendar. To deliver this wealth of content, L'Équipe relied on several data providers, both institutional and private. Antidot Information Factory facilitated the construction of a very rich information base, and the Antidot Finder Suite semantic search engine simplified presenting this information within a web interface.
With a testimonial from Frédérique Lancien, Digital and New Business Director of the L'Équipe group.
Presentation delivered in the context of the Agricultural Data Interoperability WG meeting, during the RDA 3rd Plenary Meeting in Dublin, Ireland, 26/3/2014.
The presentation is mostly focused on the work done by the agINFRA project towards proposing a methodology for the definition of Germplasm descriptors as RDF, based on the existing work of experts in the field and making use of the existing effort in this direction.
Publishing Germplasm Vocabularies as Linked Data – Valeria Pesce
What has already been published?
What may still be needed?
How to do it?
This presentation is a part of the 3rd Session of the 1st International e-Conference on Germplasm Data Interoperability https://sites.google.com/site/germplasminteroperability/
What is GraphDB and how can it help you run a smart data-driven business?
Learn about GraphDB through the solutions it offers in a simple and easy to understand way. In the slides below we have unpacked GraphDB for you, using as little tech talk as possible.
European agrobiodiversity, ECPGR network meeting on EURISCO, Central Crop Da... – Dag Endresen
Presentation on the Darwin Core standard for data exchange and the germplasm extension for genebanks during the 2014 workshop of the ECPGR Documentation and Information Working Group "Tailoring the Documentation of Plant Genetic Resources in Europe to the Needs of the User" (http://www.ecpgr.cgiar.org/working_groups/documentation_information/docinfo2014.html) in Prague-Ruzyně, Czech Republic, 20th May 2014.
Short URL: https://goo.gl/C5UEnU
DOI: http://doi.org/10.13140/RG.2.2.10865.28006
Linked Open Data-enabled Strategies for Top-N Recommendations – Cataldo Musto
Linked Open Data-enabled Strategies for Top-N Recommendations - Cataldo Musto, Pierpaolo Basile, Pasquale Lops, Marco De Gemmis and Giovanni Semeraro - 1st Workshop on New Trends in Content-based Recommender Systems, co-located with ACM Recommender Systems 2014
Powerful Information Discovery with Big Knowledge Graphs – The Offshore Leaks ... – Connected Data World
Borislav Popov's slides from his lightning talk at Connected Data London. Borislav, a Director of Business Development at Ontotext, presented Ontotext's approach to tackling the Panama Papers leak, using a technology that is a mix of semantic web and graph databases.
Boost your data analytics with open data and public news content – Ontotext
Get guidance through the gigantic sea of freely available Open Data and learn how it can empower your analysis of any kind of source.
This webinar is a live demo of news and data analytics, based on rich links within big knowledge graphs. It will show you how to:
Build ranking reports (e.g., for people and organisations)
View topics linked implicitly (e.g. daughter companies, key personnel, products …)
Draw trend lines
Extend your analytics with additional data sources
Open Data and News Analytics Demo from the 4th Sofia Open Data & Linked Data meetup
http://www.meetup.com/Sofia-Open-Data-Linked-Data-Meetup/events/228747999/
Mar'2016, Sofia | BG
This slide deck has been prepared for a workshop on Linked Data Publishing and Semantic Processing using the Redlink platform (http://redlink.co). The workshop, delivered at the Department of Information Engineering, Computer Science and Mathematics at Università degli Studi dell'Aquila, aimed at providing a general understanding of Semantic Web technologies and how these can be used in real-world use cases such as Salzburgerland Tourismus.
A brief introduction has been also included on MICO (Media in Context) a European Union part-funded research project to provide cross-media analysis solutions for online multimedia producers.
As part of the final BETTER Hackathon, project partners prepared 4 hackathon exercises. Fraunhofer IAIS organised this exercise in conjunction with external partner MKLab ITI-CERTH (EOPEN project). This step-by-step exercise covered the setup of local Docker images on Linux, using Docker Compose and (pre-installed) Python, SANSA, Hadoop, Apache Spark and Apache Zeppelin. It featured semantic transformation and the use of SANSA (Scalable Semantic Analytics Stack - http://sansa-stack.net/) libraries on a sample of tweets ahead of geo-clustering.
Project website (Hackathon information): https://www.ec-better.eu/pages/2nd-hackathon
Github repository: https://github.com/ec-better/hackathon-2020-semanticgeoclustering
Open Data Portals: 9 Solutions and How they Compare – Safe Software
Get a comparison of CKAN, Socrata, ArcGIS Open Data and other top open data solutions. Plus get answers to best practice questions such as: Which datasets are important to share? What are the approximate costs? Which file formats should the data be shared in? How often should the data get updated? And overall, how can we ensure success with our open data portal?
An introduction deck on the Web of Data for my team, including a basic Semantic Web and Linked Open Data primer, and then DBpedia, the Linked Data Integration Framework (LDIF), the Common Crawl Database and Web Data Commons.
How Google is using linked data today and vision for tomorrow – Vasu Jain
In this presentation, I discuss how modern search engines, such as Google, make use of Linked Data spread in Web pages for displaying Rich Snippets. I also present an example of the technology and analyze its current uptake.
Then I sketch some ideas on how Rich Snippets could be extended in the future, in particular for multimedia documents.
Original Paper :
http://scholar.google.com/citations?view_op=view_citation&hl=en&user=K3TsGbgAAAAJ&authuser=1&citation_for_view=K3TsGbgAAAAJ:u-x6o8ySG0sC
Another Presentation by Author: https://docs.google.com/present/view?id=dgdcn6h3_185g8w2bdgv&pli=1
On-Demand RDF Graph Databases in the CloudMarin Dimitrov
slides from the S4 webinar "On-Demand RDF Graph Databases in the Cloud"
RDF database-as-a-service running on the Self-Service Semantic Suite (S4) platform: http://s4.ontotext.com
video recording of the talk is available at http://info.ontotext.com/on-demand-rdf-graph-database
Enabling Low-cost Open Data Publishing and ReuseMarin Dimitrov
In the space of just a few years we’ve seen the transformational power of open data: both for transparency and accountability with public data, and for efficiency and innovation with private data in businesses. In its first year, institutions and individuals throughout Europe have supported public sector bodies in releasing data, and numerous start-ups, developers and SMEs in reusing this data for economic benefit.
However, we are still at the beginning of the open data movement, and there is still more that can be done to make open data simpler to use and to make it available to a wider audience.
The core goal of the DaPaaS project is to provide a Data- and Platform-as-a-Service environment, where 3rd parties (such as governmental organisations, SMEs, developers and larger companies) can publish and host both data sets and data-intensive applications, which can then be accessed by end-user applications in a cross-platform manner. You can find out more about DaPaaS on the detailed about page.
Essentially, DaPaaS aims to make publishing, consumption, and reuse of open data, as well as deploying open data applications, easier and cheaper for SMEs and small public bodies which otherwise may not have sufficient technical expertise, infrastructure and resources required to do so.
see also http://www.slideshare.net/eswcsummerschool/wed-roman-tutopendatapub-38742186
Analytical Innovation: How to Build the Next Generation Data PlatformVMware Tanzu
There was a time when the Enterprise Data Warehouse (EDW) was the only way to provide a 360-degree analytical view of the business. In recent years many organizations have deployed disparate analytics alternatives to the EDW, including: cloud data warehouses, machine learning frameworks, graph databases, geospatial tools, and other technologies. Often these new deployments have resulted in the creation of analytical silos that are too complex to integrate, seriously limiting global insights and innovation.
Join guest speaker, 451 Research’s Jim Curtis and Pivotal’s Jacque Istok for an interactive discussion about some of the overarching trends affecting the data warehousing market, as well as how to build a next generation data platform to accelerate business innovation. During this webinar you will learn:
- The significance of a multi-cloud, infrastructure-agnostic analytics platform
- What is working and what isn’t, when it comes to analytics integration
- The importance of seamlessly integrating all your analytics in one platform
- How to innovate faster, taking advantage of open source and agile software
Speakers: James Curtis, Senior Analyst, Data Platforms & Analytics, 451 Research & Jacque Istok, Head of Data, Pivotal
GeoLinked Data (.es) is an open initiative whose aim is to enrich the Web of Data with Spanish geospatial data. This initiative started off by publishing diverse information sources belonging to the Spanish National Geographic Institute. Such sources are made available as RDF (Resource Description Framework) knowledge bases according to the Linked Data principles. With this work, Spain has joined the Linked Data initiative, in which the United Kingdom and Germany are already participating. In this presentation, we provide an overview of the process that has been followed for the development of this initiative.
"Semantic Integration Is What You Do Before The Deep Learning". dev.bg Machine Learning seminar, 13 May 2019.
It's well known that 80% of the effort of a data scientist is spent on data preparation. Semantic integration is arguably the best way to spend this effort more efficiently and to reuse it between tasks, projects and organizations. Knowledge Graphs (KG) and Linked Open Data (LOD) have become very popular recently. They are used by Google, Amazon, Bing, Samsung, Springer Nature, Microsoft Academic, AirBnb… and any large enterprise that would like to have a holistic (360 degree) view of its business. The Semantic Web (web 3.0) is a way to build a Giant Global Graph, just like the normal web is a Global Web of Documents. IEEE already talks about Big Data Semantics. We review the topic of KGs and their applicability to Machine Learning.
Presentation about http://worldwidesemanticweb.org/ given at SugarCamp#3 in Paris on April 12-13. The slides introduce the activities of the WWSW group centred around adapting Semantic Web technologies to be usable in challenging conditions.
Similar to The Power of Semantic Technologies to Explore Linked Open Data (20)
It Don’t Mean a Thing If It Ain’t Got SemanticsOntotext
With the tons of data around enterprises and the challenge of turning these data into knowledge, the edge arguably belongs to whoever holds the database that best captures meaning.
Turning data pieces into actionable knowledge and data-driven decisions takes a good and reliable database. The RDF database is one such solution.
It captures and analyzes large volumes of diverse data while at the same time managing and retrieving each and every connection these data enter into.
In our latest slides, you will find out why we believe RDF graph databases work wonders with serving information needs and handling the growing amounts of diverse data every organization faces today.
[Webinar] GraphDB Fundamentals: Adding Meaning to Your DataOntotext
In this webinar, Desislava Hristova demonstrated how to install and set up GraphDB™ and how one can generate an RDF dataset. She also showed how to quickly integrate complex and highly interconnected data using RDF, how to write some simple SPARQL queries, and more.
In a nutshell, this webinar is suitable for those who are new to RDF databases and would like to learn how they can smartly manage their data assets with GraphDB™.
Hercule: Journalist Platform to Find Breaking News and Fight Fake OnesOntotext
Hercule: a platform to help journalists detect emerging news topics, check their veracity, track an event as it unfolds and find the various angles in a story as it develops.
How to migrate to GraphDB in 10 easy to follow steps Ontotext
GraphDB Migration Service helps you institute Ontotext GraphDB™ as your new semantic graph database.
Designed with a view to making your transitioning to GraphDB frictionless and resource-effective, GraphDB Migration Service provides the technical support and expertise you and your team of developers need to build a highly efficient architecture for semantic annotation, indexing and retrieval of digital assets.
With GraphDB Migration Services you will:
* Optimize the cost of managing the RDF database;
* Improve the performance of your system;
* Get the maximum value from your semantic solution.
GraphDB Cloud: Enterprise Ready RDF Database on DemandOntotext
GraphDB Cloud is an enterprise-grade RDF graph database providing high-performance querying over large volumes of RDF data. In this webinar, Ontotext demonstrates how to instantly create and deploy a fully managed graph database, then import and query data with the (OpenRDF) GraphDB Workbench, and finally explore and visualize data with the built-in visualization tools.
Smarter content with a Dynamic Semantic Publishing PlatformOntotext
Personalized content recommendation systems enable users to overcome the information overload associated with rapidly changing deep and wide content streams such as news. This webinar discusses Ontotext’s latest improvements to its Dynamic Semantic Publishing (DSP) platform NOW (News on the Web). The Platform includes social data mining, web usage mining, behavioral and contextual semantic fingerprinting, content typing and rich relationship search.
Semantic Data Normalization For Efficient Clinical Trial ResearchOntotext
Clinical trials, both public and proprietary, hold a huge amount of valuable information. Acquiring knowledge from that information in a cost and time efficient manner is a major industry pain point.
Although information from clinical trials is stored in structured or semi-structured form, it is rarely coded with medical terminologies, which creates a significant level of ambiguity and increases the effort for data preparation for analytical purposes.
Gaining Advantage in e-Learning with Semantic Adaptive TechnologyOntotext
In this presentation, we will introduce you to a solution that involves adaptive semantic technology for educational institutions and e-learning providers. You will learn how to integrate 3rd party resources, legacy assets, and other content sources to create the so-called knowledge graph of all structured and unstructured data.
Why Semantics Matter? Adding the semantic edge to your content,right from au...Ontotext
We’ll address a few of the basic industry pain points and show how semantics can come to the rescue, including:
How semantics can add value across the various phases of digital product development lifecycle.
Contextual authoring and content curation through automated editorial workflow solutions.
Enhanced content discoverability through relevant recommendations.
Coming together of bulletproof content delivery platform and dynamic semantic publishing technology
Adding Semantic Edge to Your Content – From Authoring to DeliveryOntotext
Within the last few years, we have seen an ever-increasing demand for more accurate, user-specific content, which in turn overwhelms content providers. This is where smart publishing platforms come into play. They aim to bring the right content at the right time: digested, easy to comprehend, fast to navigate, and tailored to the readers’ personal interests.
The technologies that power them help publishers to automate the metadata enrichment process, making it more consistent, accurate and rich.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
The Power of Semantic Technologies to Explore Linked Open Data
1. The Power of
Semantic Technologies
to Explore
Linked Open Data
Graphorum & Smart Data Conference, Jan 2017
2. You will learn how to:
• Convert tabular data into RDF
• Combine local and remote data in a single query
• Graphically explore the connectivity patterns in big diverse data
− 1B+ triples, 1000+ classes, 8 datasets
• Detect suspicious patterns of company control
• Filter news based on relationships between companies and people
• Rank companies per industry and region
3. Presentation Outline
•Use cases: Relation discovery and Media monitoring
•GraphDB’s OntoRefine conversion of tabular data into RDF
•FactForge: Open data and news about people and organizations
•Relationship Discovery Examples
•Media Monitoring Examples & Popularity Ranking
•Panama Papers and Global Legal Entity Identifier as Open Data
•Tracing Panama Papers entities in the news
5. Link data! Reveal more!
Data sources shown in the diagram: Commercial Company Database (e.g. D&B), Social Media, News, Wikipedia, Private data
• Link diverse data in a Knowledge Graph
• Analyze News and Social Content
• Extract facts and link content to data
• Interpret data in context of big linked data
7. Relation Discovery Case
• Find suspicious relationships like: a company in the USA that controls another company in the USA through a company in an off-shore zone
• Show news relevant to these companies
8. Linking News to Big Knowledge Graphs
• The DSP platform links text to knowledge graphs
• One can navigate from news to concepts, entities and topics, and from there to other news
Try it at http://now.ontotext.com
9. Semantic Media Monitoring
For each entity:
• popularity trends
• relevant news
• related entities
• knowledge graph information
Try it at http://now.ontotext.com
11. OntoRefine: Data Transformation to RDF
• Based on OpenRefine and integrated in the GraphDB Workbench
• Allows converting tabular data into RDF
− Supported formats are TSV, CSV, *SV, XLS, XLSX, JSON, XML, RDF as XML, and Google Sheets
− Easily filter your data, edit its inconsistencies
− View the cleaned data as RDF
• Exposes a GraphDB SPARQL endpoint
− Transform your data using SPIN functions
− Import your data straight into a GraphDB repository
The Power of Semantic Technologies to Explore Linked Open Data Jan 2017 #11
12. OntoRefine: Uploading data
• Create new project
− From local / remote files
▪ Supported formats are TSV, CSV, *SV, XLS, XLSX, JSON, XML, RDF as XML, and Google Sheets
▪ On first opening a file, OntoRefine tries to recognize the encoding of the text file and all delimiters
▪ Allows further fine-tuning of the table configuration
− From clipboard
• Open / import a project
13. OntoRefine: Viewing tabular data as RDF
• OpenRefine supports RDF as input only
• OntoRefine also supports RDF as output
• Data shown as either records or rows
− A record combines multiple rows identifying the same object and sharing the first column
• Data stored in a separate repository
− Must not be mistaken for the current repository available through the GraphDB Workbench SPARQL tab
14. OntoRefine: RDF-izing data
• Transform data using a CONSTRUCT query
− in the OntoRefine SPARQL endpoint
− directly in the GraphDB SPARQL endpoint
• GraphDB 8.0 supports SPIN functions:
− SPARQL functions for splitting a string
− SPARQL functions for parsing dates
− SPARQL functions for encoding URIs
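The RDF-ization step above can be sketched as a SPARQL CONSTRUCT query. This is a minimal hypothetical example, not an actual OntoRefine project: the mydata: column properties and the company: namespace are assumptions; a real project exposes its own generated column IRIs.

```
# Hypothetical sketch: RDF-ize a spreadsheet of companies.
# mydata: column properties and the company: namespace are placeholders.
PREFIX mydata:  <http://example.org/ontorefine/columns/>
PREFIX company: <http://example.org/company/>
PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>

CONSTRUCT {
  ?company a company:Organization ;
           rdfs:label ?name ;
           company:country ?country .
} WHERE {
  ?row mydata:Company_Name ?name ;
       mydata:Country ?country .
  # Mint an IRI for each row from the URI-encoded company name
  BIND(IRI(CONCAT(STR(company:), ENCODE_FOR_URI(?name))) AS ?company)
}
```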
15. OntoRefine: Importing data in GraphDB
• After transforming the data, import it in the current repository without leaving the GraphDB Workbench:
− Copy the endpoint of the OntoRefine project
− Go to the GraphDB SPARQL menu
− Execute a query to import the results
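The import step can be sketched as a SPARQL INSERT that federates to the OntoRefine project's endpoint. The endpoint URL and graph IRI below are placeholders, not real values:

```
# Hypothetical sketch: pull the transformed triples from the OntoRefine
# project's SPARQL endpoint into the current GraphDB repository.
INSERT {
  GRAPH <http://example.org/graphs/imported> { ?s ?p ?o }
} WHERE {
  # Placeholder URL: paste the actual OntoRefine project endpoint here
  SERVICE <http://localhost:7200/ontorefine-project-endpoint> {
    ?s ?p ?o .
  }
}
```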
16. Combine local and remote data
• SPARQL Federation allows one to retrieve data from a remote endpoint in the middle of a query to a local repository
• For instance, to combine local GDP data with information about the area of each country from DBPedia, in order to calculate GDP per sq. km.
17. Federation example: GDP per Sq. Km.
SELECT DISTINCT ?name
       (STR(?area) AS ?areaSqKm) (STR(?GDPperKm) AS ?GDPperSqKm)
{
    ?gdp2015prop gdp:forYear 2015 .
    ?country gdp:gdpCountry_Name ?name ; ?gdp2015prop ?gdp2015 .
    { SELECT (STR(?n) AS ?name) ?area {
        SERVICE <http://dbpedia.org/sparql> {
            ?c a dbo:Country ; rdfs:label ?n ; dbp:areaKm ?area .
        }
    } }
    BIND(STR(ROUND(xsd:decimal(?gdp2015/1000000000))) AS ?gdp2015bil)
    BIND(xsd:integer((?gdp2015) / ?area) AS ?GDPperKm)
} ORDER BY DESC(?GDPperKm) LIMIT 10
18. FactForge: Open data and news about people and organizations
http://factforge.net
19. Our approach to Big Data
1. Integrate data from many sources
− Build a Big Knowledge Graph that integrates relevant data from proprietary databases and taxonomies plus millions of facts of Linked Data
2. Infer new facts and unveil relationships
− Perform reasoning across different data sources
3. Interlink text with big data
− Use text-mining to automatically discover references to concepts and entities
4. Use a graph database for metadata management, querying and search
20. FactForge: Data Integration
DBpedia (the English version) 496M
Geonames (all geographic features on Earth) 150M
owl:sameAs links between DBpedia and Geonames 471K
Company registry data (GLEI) 3M
Panama Papers DB (#LinkedLeaks) 20M
Other datasets and ontologies: WordNet, WorldFacts, FIBO
News metadata (2000 articles/day enriched by NOW) 473M
Total size (1152M explicit + 322M inferred statements) 1 475M
21. News Metadata
• Metadata from Ontotext’s Dynamic Semantic Publishing platform
− News stream from Google
− Automatically generated as part of the NOW.ontotext.com semantic news showcase
• News stream from Google since Feb 2015, about 50k news/month
− ~70 tags (annotations) per news article
• Tags link text mentions of concepts to the knowledge graph
− Technically these are URIs for entities (people, organizations, locations, etc.) and key phrases
22. News Metadata
Articles per category:
Category Count
International 52 074
Science and Technology 23 201
Sports 20 714
Business 15 155
Lifestyle 11 684
Total 122 828
Mentions per entity type:
Mentions / entity type Count
Keyphrase 2 589 676
Organization 1 276 441
Location 1 260 972
Person 1 248 784
Work 309 093
Event 258 388
RelationPersonRole 236 638
Species 180 946
23. Class Hierarchy Map (by number of instances)
Left: The big picture
Right: dbo:Agent class (2.7M organizations and persons)
24. Sample queries at http://factforge.net
• F1: Big cities in Eastern Europe
• F2: Airports near London
• F3: People and organizations related to Google
• F4: Top-level industries by number of companies
Available as Saved Queries at http://factforge.net/sparql
Note: Open Saved Queries with the folder icon in the upper-right corner
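As a flavour of what such queries look like, here is a hedged sketch in the spirit of F4 (top-level industries by number of companies), using only the generic dbo:industry property; the actual saved query on factforge.net may be more elaborate:

```
# Sketch: count companies per industry (assumed to use dbo:industry;
# the real F4 saved query may differ).
PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT ?industry (COUNT(?company) AS ?companies)
WHERE {
  ?company dbo:industry ?industry .
}
GROUP BY ?industry
ORDER BY DESC(?companies)
LIMIT 10
```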
26. Offshore control example
• Query: Find companies which control other companies in the same country through a company in an off-shore zone
• How it works:
• Establish a control relationship
• Establish a company-country mapping
• Establish an “off-shore” criterion
• SPARQL it
27. Off-shore company control example
SELECT *
FROM onto:disable-sameAs
WHERE {
?c1 fibo-fnd-rel-rel:controls ?c2 .
?c2 fibo-fnd-rel-rel:controls ?c3 .
?c1 ff-map:orgCountry ?c1_country .
?c2 ff-map:orgCountry ?c2_country .
?c3 ff-map:orgCountry ?c1_country .
FILTER (?c1_country != ?c2_country)
?c2_country ff-map:hasOffshoreProvisions true .
}
29. Semantic Media Monitoring/Press-Clipping
• We can trace references to a specific company in the news
− This is pretty much standard; however, we can also deal with syntactic variations in the names, because state-of-the-art Named Entity Recognition technology is used
− What’s more important, we correctly distinguish whether a mention of “Paris” refers to Paris (the capital of France), Paris in Texas, Paris Hilton or Paris (the Greek hero)
• We can trace and consolidate references to daughter companies
• We have a comprehensive industry classification
− The one from DBPedia, but refined to accommodate identifier variations and specialization (e.g. a company classified as dbr:Bank will also be considered classified as dbr:FinancialServices)
30. Media Monitoring Queries
• F5: Mentions in the news of an organization and its related entities
• F7: Most popular companies per industry, including children
• F8: Regional exposure of a company – normalized
31. News Popularity Ranking: Automotive
By direct mentions (news #):
1. General Motors (2722)
2. Tesla Motors (2346)
3. Volkswagen (2299)
4. Ford Motor Company (1934)
5. Toyota (1325)
6. Chevrolet (1264)
7. Chrysler (1054)
8. Fiat Chrysler Automobiles (1011)
9. Audi AG (972)
10. Honda (717)
Including mentions of child companies (news #):
1. General Motors (4620)
2. Volkswagen Group (3999)
3. Fiat Chrysler Automobiles (2658)
4. Tesla Motors (2370)
5. Ford Motor Company (2125)
6. Toyota (1656)
7. Renault-Nissan Alliance (1332)
8. Honda (864)
9. BMW (715)
10. Takata Corporation (547)
32. News Popularity: Finance
By direct mentions (news #):
1. Bloomberg L.P. (3203)
2. Goldman Sachs (1992)
3. JP Morgan Chase (1712)
4. Wells Fargo (1688)
5. Citigroup (1557)
6. HSBC Holdings (1546)
7. Deutsche Bank (1414)
8. Bank of America (1335)
9. Barclays (1260)
10. UBS (694)
Including mentions of controlled companies (news #):
1. Intra Bank (261667)
2. Hinduja Bank (Switzerland) (49731)
3. China Merchants Bank (38288)
4. Alphabet Inc. (22601)
5. Capital Group Companies (4076)
6. Bloomberg L.P. (3611)
7. Exor (2704)
8. Nasdaq, Inc. (2082)
9. JP Morgan Chase (1972)
10. Sentinel Capital Partners (1053)
Note: Including investment funds, stock exchanges, agencies, etc.
33. News Popularity: Banking
By direct mentions (news #):
1. Goldman Sachs (996)
2. JP Morgan Chase (856)
3. HSBC Holdings (773)
4. Deutsche Bank (707)
5. Barclays (630)
6. Citigroup (519)
7. Bank of America (445)
8. Wells Fargo (422)
9. UBS (347)
10. Chase (126)
Including mentions of controlled companies (news #):
1. China Merchants Bank * (38288)
2. JP Morgan Chase (1972)
3. Goldman Sachs (1030)
4. HSBC (966)
5. Bank of America (771)
6. Deutsche Bank (742)
7. Barclays (681)
8. Citigroup (630)
9. Wells Fargo (428)
10. UBS (347)
35. Global Legal Entity Identifier (GLEI) data
• Global Markets Entity Identifier (GMEI) Utility data
− The Global Markets Entity Identifier (GMEI) utility is DTCC's legal entity identifier solution
offered in collaboration with SWIFT
− We downloaded it as an XML data dump from https://www.gmeiutility.org/
• RDF-ized company records
− Fields: LEI#, legal name, ultimate parent, registered country
− 3M explicit statements for 211 thousand organizations
▪ For comparison, there are 490 000 organizations in DBPedia and D&B covers over 200 million
− 10,821 ultimate parent relationships and 1,632 ultimate parents
• 2 800 organizations from the GLEI dump mapped to DBPedia
36. GLEI Company Data Sample: ABN-AMRO
lei:businessRegistry Kamer van Koophandel
lei:businessRegistryNumber 34334259
lei:duplicateReference data:549300T5O0D0T4V2ZB28
lei:entityStatus ACTIVE
lei:headquartersCity Amsterdam
lei:headquartersState Noord-Holland
lei:legalForm NAAMLOZE VENNOOTSCHAP
lei:legalName ABN AMRO Bank N.V.
lei:lei BFXS5XCH7N0Y05NIXW11
lei:registeredCity Amsterdam
lei:registeredCountry NL
lei:registeredPostCode 1082 PP
lei:registeredState Noord-Holland
37. Global Legal Entity Identifier (GLEI) data
Ultimate parents by number of children:
1. The Goldman Sachs Group, Inc. (1 851 children, US)
2. United Technologies Corporation (427, US)
3. Honeywell International Inc. (341, US)
4. Morgan Stanley (228, US)
5. Cargill, Incorporated (217, US)
6. 1832 Asset Management L.P. (202, CA)
7. Aegon N.V. (174, NL)
8. Union Bancaire Privée, UBP SA (138, CH)
9. Citigroup Inc. (135, US)
10. State Street Corporation (128, US)
Companies per country:
1. dbr:United_States (103 548)
2. dbr:Canada (17 425)
3. dbr:Luxembourg (13 984)
4. dbr:Sweden (7 934)
5. dbr:United_Kingdom (7 421)
6. dbr:Belgium (6 868)
7. dbr:Ireland (4 762)
8. dbr:Australia (4 385)
9. dbr:Germany (3 039)
10. dbr:Netherlands (2 561)
38. Offshore Leaks Database from ICIJ
• Published by the International Consortium of Investigative Journalists (ICIJ) on the 9th of May
• A “searchable database” of about 320 000 offshore companies
− 214 000 extracted from the Panama Papers (valid until 2015)
− More than 100 000 from the 2013 Offshore Leaks investigation (valid until 2010)
• CSV extract from a graph database available for download
• https://offshoreleaks.icij.org/
40. Offshore Leaks DB as Linked Open Data
• Ontotext published the Offshore Leaks DB as Linked Open Data
• Available for exploration, querying and download at
http://data.ontotext.com
• ONTOTEXT DISCLAIMERS
We use the data as provided by ICIJ. We make no representations or warranties of any kind,
including warranties of title, accuracy, absence of errors or fitness for a particular purpose. All
transformations, query results and derivative works are used only to showcase the service and
technological capabilities and not to serve as a basis for any statements or conclusions.
41. Enrichment and structuring of the data
• Relationship type hierarchy
− About 80 relationship types from the original dataset were organized into a property hierarchy
• Classification of officers into Person and Company
− In the original database there is no way to distinguish whether an officer is a physical person or a company
• Mapping to DBPedia:
− 209 countries referred to in the Offshore Leaks DB are mapped to DBPedia
− About 3 000 persons and 300 companies mapped to DBPedia
• Overall size of the repository: 22M statements (20M explicit)
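The relationship type hierarchy described above can be expressed with rdfs:subPropertyOf assertions, so that GraphDB's RDFS reasoning lets a query over a general relation also match its specific subtypes. A minimal sketch — the leaks: property names are illustrative placeholders, not the actual #LinkedLeaks vocabulary:

```sparql
# Sketch: organizing raw relationship types into a property hierarchy.
# Property names below are hypothetical, for illustration only.
PREFIX rdfs:  <http://www.w3.org/2000/01/rdf-schema#>
PREFIX leaks: <http://example.org/leaks/>

INSERT DATA {
  leaks:shareholderOf     rdfs:subPropertyOf leaks:capitalRelation .
  leaks:beneficialOwnerOf rdfs:subPropertyOf leaks:capitalRelation .
  leaks:capitalRelation   rdfs:subPropertyOf leaks:relatedTo .
}
```

With such a hierarchy in place, a pattern like `?officer leaks:capitalRelation ?entity` also returns shareholder and beneficial-owner links without enumerating every raw type.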
42. The RDF-ization Process
• Linked data variant produced without programming
− The raw CSV files are RDF-ized using TARQL, http://tarql.github.io/
− The data was further interlinked and enriched in GraphDB using SPARQL
• The process is documented in the README file in the repository
• All relevant artifacts are open source, available at
• https://github.com/Ontotext-AD/leaks/
• The entire publishing and mapping took about 15 person-days!
− Including the data.ontotext.com portal setup, promotion, documentation, etc.
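TARQL applies a SPARQL CONSTRUCT query to CSV rows, binding each column header as a variable. A minimal sketch of such a mapping — the column names ?node_id, ?name and ?jurisdiction are assumptions about the CSV layout, not the actual ICIJ headers:

```sparql
# Run as: tarql mapping.sparql entities.csv > entities.ttl
PREFIX leaks: <http://example.org/leaks/>

CONSTRUCT {
  ?entity a leaks:Entity ;
          leaks:name ?name ;
          leaks:jurisdiction ?jurisdiction .
}
WHERE {
  # ?node_id, ?name, ?jurisdiction are bound per CSV row by TARQL
  BIND (IRI(CONCAT('http://example.org/leaks/entity/', ?node_id)) AS ?entity)
}
```

Each CSV row thus yields one leaks:Entity resource with a stable IRI minted from its node identifier.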
43. Sample queries at http://data.ontotext.com
• Q1: Countries by number of entities related to them
• Q2: Country pairs by ownership statistics
• Q3: Statistics by incorporation year
• Q4: Officers and entities by number of capital relations
• Q5: Countries in Eastern Europe by number of owners
• Q6: Intermediaries in Asia by name
• Q7: The best connected officers
• Q8: Countries by number of Person and Company officers
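A sketch of what a query in the spirit of Q1 might look like; the leaks: predicate is a placeholder, and the actual saved query at data.ontotext.com may differ:

```sparql
# Q1 sketch: countries ranked by the number of entities related to them
PREFIX leaks: <http://example.org/leaks/>

SELECT ?country (COUNT(DISTINCT ?entity) AS ?entities)
WHERE {
  ?entity leaks:country ?country .
}
GROUP BY ?country
ORDER BY DESC(?entities)
```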
45. Mapping datasets to DBPedia
• The task: map people, organizations and locations to IDs in DBPedia
− So that we can analyze the original data with the help of the extra information available in
DBPedia and other datasets that are related to it, e.g. Geonames
− For instance, #LinkedLeaks doesn’t contain any extra information about the companies, e.g.
industry sector, controlling or controlled companies, etc.
• Specific conditions: we had to map by names
− Other than names, the information about the entities in the source datasets could not help the
mapping
▪ Address and country attributes are present, but these proved only marginally useful for mapping
− In both cases we mapped locations only in terms of countries and not finer grained locations
▪ For this purpose DBPedia geographic data is sufficient and it is also well mapped with GeoNames
46. Mapping datasets to DBPedia (2)
• We used the GraphDB connector to Lucene for these mappings
− Using the GraphDB connector, a Lucene index was created for Organizations and People from
DBPedia, indexing all sorts of names, descriptions and other textual information for each entity
− The mapping process consists mostly of using the name of the entity from the 3rd-party dataset
(in this case Panama Papers or GLEI) as an FTS query, embedded in a SPARQL query
• What does Lucene do better than SPARQL?
− When there is little information other than the name, we benefit from the free-text indexing of
Lucene, because it deals well with minor syntactic variations and sorts the results by relevance
− When mapping 300 000 organizations against another 500 000 organizations without a key,
a SPARQL join costs 300 000 × 500 000 comparisons, which is slower than 300 000 Lucene
queries
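The FTS-query-embedded-in-SPARQL pattern looks roughly like the following. The index name dbpedia_names is an assumption, and the exact connector vocabulary should be checked against the GraphDB Lucene connector documentation:

```sparql
PREFIX con:      <http://www.ontotext.com/connectors/lucene#>
PREFIX con-inst: <http://www.ontotext.com/connectors/lucene/instance#>

SELECT ?candidate ?score
WHERE {
  # "ABN AMRO Bank N.V." stands in for a name taken from a GLEI or leaks record
  [] a con-inst:dbpedia_names ;
     con:query "ABN AMRO Bank N.V." ;
     con:entities ?candidate .
  ?candidate con:score ?score .
}
ORDER BY DESC(?score)
LIMIT 5
```

The top-scoring candidates are then reviewed or filtered (e.g. by country) before a mapping link is asserted.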
47. #LinkedLeaks Mapping Queries
• Companies mapped by industry
• Companies mapped in the Finance sector
• Politicians mapped
• Available as Saved Queries at http://factforge.net/sparql
• Note 1: Open Saved Queries with the folder icon in the upper-right
corner
50. Tracing Panama Papers entities in the news
• After mapping #LinkedLeaks entities to DBPedia identifiers, we can
load them, together with the mappings, in the FF-NEWS repository
• This way we have in a single repo, mapped to one another:
#LinkedLeaks data, DBPedia, News metadata
• We can make queries like: Give me news mentions of entities which
appear in the Panama Papers dataset
• This way the mapping enabled media monitoring at no extra cost
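A sketch of such a cross-dataset query; the pub:mentionsEntity predicate and the use of owl:sameAs for the mappings are assumptions about how the FF-NEWS repository models news metadata:

```sparql
# Sketch: news mentions of entities that appear in the Panama Papers dataset
PREFIX owl:   <http://www.w3.org/2002/07/owl#>
PREFIX leaks: <http://example.org/leaks/>
PREFIX pub:   <http://example.org/news/>

SELECT ?news ?dbpediaId
WHERE {
  ?leaksEntity a leaks:Entity ;
               owl:sameAs ?dbpediaId .      # mapping produced in the earlier step
  ?news pub:mentionsEntity ?dbpediaId .     # hypothetical news-metadata predicate
}
```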
51. Thank you!
Experience the technology with NOW: Semantic News Portal
http://now.ontotext.com
and play with open data at
http://factforge.net
Editor's Notes
DESIGN: Looks better with the graphics
A bit smaller title and more “air” around the logo would make it better – see the proposed rearrangement
DESIGN: The grey background makes the slide look different and sets it apart from the others, which is good. On the other hand, it feels somewhat “muted” to me.
DESIGN: The new graphic is great. On the next slide I will tweak its colours a little, because the saturation and transparency of the colours here should match the “density” of the data. Lighter body text (not bold) works better on this particular slide.
Our vision is to enable machines to interpret data and text by interlinking them in big knowledge graphs. The web of open data is growing exponentially!
There are thousands of datasets, from Wikipedia and Geonames to government statistical data and the Panama Papers. We link open data to analyze news. We extract data from news to produce more open data and analyze social media. We integrate all this with proprietary data and commercial databases.
Why???
To help journalists, banks, merchants, governments and citizens reveal more! Quicker, with less effort and less stress.
This is elevator pitch for our overall technology approach, proposition and applications
HOW MANY CONCEPTS DOES A PERSON KNOW?
DESIGN: The orange bars placed too strong an accent here.