"Semantic Integration Is What You Do Before The Deep Learning". dev.bg Machine Learning seminar, 13 May 2019.
It's well known that 80% of a data scientist's effort is spent on data preparation. Semantic integration is arguably the best way to spend this effort more efficiently and to reuse it between tasks, projects and organizations. Knowledge Graphs (KG) and Linked Open Data (LOD) have become very popular recently. They are used by Google, Amazon, Bing, Samsung, Springer Nature, Microsoft Academic, Airbnb… and any large enterprise that would like to have a holistic (360-degree) view of its business. The Semantic Web (web 3.0) is a way to build a Giant Global Graph, just like the normal web is a Global Web of Documents. IEEE already talks about Big Data Semantics. We review the topic of KGs and their applicability to Machine Learning.
The Galleries, Libraries, Archives and Museums (GLAM) sector deals with complex and varied data. Integrating that data, especially across institutions, has always been a challenge. Semantic data integration is the best approach to deal with such challenges. Linked Open Data (LOD) enables large-scale Digital Humanities (DH) research, collaboration and aggregation, allowing DH researchers to make connections between (and make sense of) the multitude of digitized Cultural Heritage (CH) content available on the web. An upsurge of interest in semantic technologies and LOD has swept the CH and DH communities. An active Linked Open Data for Libraries, Archives and Museums (LODLAM) community exists, CH data is published as LOD, and international collaborations have emerged. The value of LOD is especially high in the GLAM sector, since culture by its very nature is cross-border and interlinked. We present interesting LODLAM projects, datasets, and ontologies, as well as Ontotext's experience in this domain. An extended paper on these topics is also available. It spans 77 pages and 67 figures, with detailed information about CH content and XML standards, Wikidata, and global authority control.
This document provides an overview of linked open data (LOD), ontologies, and cultural heritage projects and datasets. It begins with definitions of semantic technologies, ontologies, and LOD. It then discusses several key ontologies used for cultural heritage data, including CIDOC CRM, Schema.org, and Wikidata. The document outlines several LOD projects focused on cultural heritage, including Europeana, ResearchSpace, American Art Collaborative, and ConservationSpace. It provides examples of semantic searches and annotations using these ontologies and projects. In summary, the document introduces the concepts of LOD and relevant ontologies, and highlights several major cultural heritage projects that apply these semantic technologies.
Invited report at Digital Presentation and Preservation of Cultural and Scientific Heritage (DIPP 2018), Burgas, Bulgaria. Report: http://dipp.math.bas.bg/images/2018/019-050_32_11-iDiPP2018-34.pdf
Boost your data analytics with open data and public news content (Ontotext)
Get guidance through the gigantic sea of freely available Open Data and learn how it can empower your analysis of any kind of source.
This webinar is a live demo of news and data analytics, based on rich links within big knowledge graphs. It will show you how to:
Build ranking reports (e.g. for people and organisations)
View topics linked implicitly (e.g. subsidiaries, key personnel, products …)
Draw trend lines
Extend your analytics with additional data sources
This document discusses using open data and news analytics. It demonstrates how a semantic publishing platform can link text to concepts in knowledge graphs to enable navigation from text to entities and related news. It provides examples of queries over linked data from DBpedia, Geonames, and news metadata to retrieve information about cities, people related to Google, airports near London, and news mentioning companies. Graphs and rankings show the popularity and relationships of entities in the news by industry such as automotive, finance, and banking.
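The kind of entity lookup described above can be sketched with a toy triple store. The graph below is invented for illustration (a handful of DBpedia-style identifiers, far smaller than the webinar's actual data), and the pattern-matching function stands in for a SPARQL engine:

```python
# Toy knowledge graph: a set of (subject, predicate, object) triples,
# loosely in the spirit of DBpedia/Geonames-style linked data.
triples = {
    ("dbr:London", "rdf:type", "dbo:City"),
    ("dbr:Heathrow", "rdf:type", "dbo:Airport"),
    ("dbr:Heathrow", "dbo:city", "dbr:London"),
    ("dbr:Gatwick", "rdf:type", "dbo:Airport"),
    ("dbr:Gatwick", "dbo:city", "dbr:London"),
    ("ex:news42", "ex:mentions", "dbr:Heathrow"),
}

def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard,
    like a variable in a SPARQL basic graph pattern."""
    return [(ts, tp, to) for ts, tp, to in triples
            if s in (None, ts) and p in (None, tp) and o in (None, to)]

# "Airports near London": join two patterns on the airport variable.
airports = {s for s, _, _ in match(triples, p="rdf:type", o="dbo:Airport")}
near_london = sorted(a for a in airports
                     if match(triples, s=a, p="dbo:city", o="dbr:London"))
print(near_london)  # ['dbr:Gatwick', 'dbr:Heathrow']
```

Joining patterns on shared variables is exactly what a SPARQL query does at scale; the news-to-entity navigation in the demo adds a layer of `ex:mentions`-style links from documents into the graph.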
Analytics on Big Knowledge Graphs Deliver Entity Awareness and Help Data Linking (Ontotext)
A presentation of Ontotext’s CEO Atanas Kiryakov, given during Semantics 2018, an annual conference that brings together researchers and professionals from all over the world to share knowledge and expertise on semantic computing.
Ojo Al Data 100 - Call for sharing session at IODC 2016 (Oscar Corcho)
This is the presentation of the #ojoaldata100 initiative (http://ojoaldata100.okfn.es) for the selection of 100 datasets that every city should be publishing in their open data portal. This presentation was used in a call for sharing session at the 4th International Open Data Conference (IODC2016).
A Linked Data Dataset for Madrid Transport Authority's Datasets (Oscar Corcho)
This document discusses the creation of a linked data dataset for Madrid's public transport authority (CRTM) to make their transport data more accessible and reusable. It outlines the motivation and benefits of open transport data, reviews existing methods of publishing open data, and proposes publishing CRTM's data as linked open data using semantic web standards to enable new applications and value-added services by combining the transport data with other public datasets. The methodology describes transforming CRTM's static and real-time transport datasets into RDF and providing SPARQL and SPARQL-Stream endpoints to access the data. Examples demonstrate sample URIs, queries to retrieve stop points, and visualizations of the linked data.
Basic introductory talk about the Web of Linked Data, given to undergraduate and postgraduate students of Universidad del Valle (Cali, Colombia) in September 2010. Knowledge of the Semantic Web is required.
This document describes the CORFU technique for unifying and reconciling corporate names in public contracts metadata. It aims to create a "big name" or unique identifier for each company by normalizing varying names into a single entry. The technique is applied to 400,000 supplier names in Australian public procurement data. It involves loading the names, normalizing text, filtering basic names, applying natural language processing including tokenization and stemming, clustering similar names, and selecting cluster representatives to link names to a company URI. The goal is to improve transparency in tracking where public money is spent.
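The normalize-then-cluster pipeline can be sketched in a few lines. This is not the actual CORFU implementation: the suffix list, similarity measure (`difflib` string ratio) and threshold below are simplified stand-ins for its NLP and clustering steps, and the supplier names are invented:

```python
import difflib
import re

def normalize(name):
    """Lowercase, strip punctuation and common legal suffixes --
    a simplified version of the normalization step."""
    name = re.sub(r"[^\w\s]", " ", name.lower())
    stopwords = {"pty", "ltd", "limited", "inc", "co", "corp", "the"}
    tokens = [t for t in name.split() if t not in stopwords]
    return " ".join(tokens)

def cluster(names, threshold=0.85):
    """Greedy single-pass clustering: each name joins the first
    cluster whose representative key is similar enough."""
    clusters = []  # list of (representative key, member names)
    for name in names:
        key = normalize(name)
        for rep in clusters:
            if difflib.SequenceMatcher(None, key, rep[0]).ratio() >= threshold:
                rep[1].append(name)
                break
        else:
            clusters.append((key, [name]))
    return clusters

suppliers = ["Acme Pty Ltd", "ACME Limited", "Acme, Ltd.", "Widget Corp"]
for rep, members in cluster(suppliers):
    print(rep, "->", members)
```

Each cluster representative would then be linked to a single company URI, so that all spending under the name variants rolls up to one entity.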
This document describes Schema.org and its potential uses beyond search engine optimization. Schema.org was created in 2011 by major search engines to provide a set of shared vocabularies for structured data on web pages. It has since grown to include over 2000 terms covering entities, relationships, and actions. The document discusses how Schema.org data can be used for analytics by extracting metadata from web pages and sending it to Google Analytics for additional dimensions and metrics. This enables analysis of user behavior at a more granular level than is normally possible from web analytics alone.
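The extraction step described above can be illustrated with stdlib Python. Schema.org metadata is commonly embedded as JSON-LD in `<script type="application/ld+json">` blocks; the page snippet and field names below are invented for the example, and a real pipeline would forward the extracted fields to an analytics backend:

```python
import json
from html.parser import HTMLParser

class JSONLDExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json">
    blocks, the usual carrier for Schema.org metadata in a page."""
    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.items = []
    def handle_starttag(self, tag, attrs):
        self._in_jsonld = (tag == "script"
                           and dict(attrs).get("type") == "application/ld+json")
    def handle_data(self, data):
        if self._in_jsonld and data.strip():
            self.items.append(json.loads(data))
    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

page = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article",
 "headline": "Example story", "author": {"@type": "Person", "name": "J. Doe"}}
</script>
</head><body>...</body></html>
"""

extractor = JSONLDExtractor()
extractor.feed(page)
article = extractor.items[0]
# Fields like the article type and author could then be sent to an
# analytics tool as additional custom dimensions.
print(article["@type"], article["author"]["name"])  # Article J. Doe
```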
This document outlines DBpedia's strategy to become a global open knowledge graph by facilitating collaboration on data. It discusses establishing governance and curation processes to improve data quality and enable organizations to incubate their knowledge graphs. The goals are to have millions of users and contributors collaborating on data through services like GitHub for data. Technologies like identifiers, schema mapping, and test-driven development help integrate data. The vision is for DBpedia to connect many decentralized data sources so data becomes freely available and easier to work with.
How to Reveal Hidden Relationships in Data and Risk Analytics (Ontotext)
Imagine a risk analysis manager or compliance officer who can easily discover relationships like this: Big Bucks Café out of Seattle controls My Local Café in NYC through an offshore company. Such a discovery can be a game changer if My Local Café pretends to be an independent small enterprise while Big Bucks has recently been experiencing financial difficulties.
A possible future role of schema.org for business reporting (sopekmir)
The presentation demonstrates a vision for a “reporting extension” that could enhance the processes related to business reporting, and the role it could play in the SBR vision.
The document discusses Thomson Reuters' (TR) efforts to build an enterprise content platform to manage their large and growing collection of structured and unstructured data. TR currently stores over 60,000 terabytes of data and processes millions of data points daily. They aim to modernize their infrastructure, break down content silos, and unlock new commercial opportunities through their platform. Their approach involves developing unique entity identifiers, intelligent tagging tools, a knowledge graph, and analytics capabilities. This will allow them to better integrate client data, create new insights, and facilitate innovative uses of their diverse content.
Using the Semantic Web Stack to Make Big Data Smarter (Matheus Mota)
The document discusses using semantic web technologies to make big data smarter. It provides an overview of key concepts in semantic web, including linked data and ontologies. It describes how semantic web can add structure and meaning to unstructured data through modeling data as graphs and defining relationships and properties. The goal is to publish and query interconnected data at scale to enable new types of queries and inferences over big data.
Adding Semantic Edge to Your Content – From Authoring to Delivery (Ontotext)
Within the last few years we have seen an ever-increasing demand for more accurate, user-specific content, which in turn overwhelms content providers. This is where smart publishing platforms come into play. They aim to bring the right content at the right time: digested, easy to comprehend, fast to navigate, and tailored to the readers’ personal interests.
The technologies that power them help publishers to automate the metadata enrichment process, making it more consistent, accurate and rich.
The document describes an approach called RDFIndex for representing and computing quantitative indexes using semantic web technologies. The main contributions are a high-level model built on top of the RDF Data Cube Vocabulary for representing indexes, and a Java-SPARQL based processor to exploit metadata, validate indexes, and compute new index values. An example index called the "World Bank Naive Index" is used to illustrate how the RDFIndex approach can represent the structure of an index and its components/indicators in RDF, and compute the index values.
Enterprise Knowledge Graphs allow organizations to integrate heterogeneous data from various sources and represent them semantically using common vocabularies and ontologies. This facilitates linking and querying of related information across organizational boundaries. Knowledge graphs provide a holistic view of enterprise data and support various applications through their use as a common background knowledge base. However, building and maintaining knowledge graphs at scale poses challenges regarding data quality, coherence, and evolution of the knowledge representation over time.
Linking Open, Big Data Using Semantic Web Technologies - An Introduction (Ronald Ashri)
The Physics Department of the University of Cagliari and the Linkalab Group invited me to talk about the Semantic Web and Linked Data - this is simply an introduction to the technologies involved.
ROI in Linking Content to CRM by Applying the Linked Data Stack (Martin Voigt)
Today, decision makers in enterprises have to rely more and more on a variety of data sets that are available internally but also externally, in heterogeneous formats. Intelligent processes are therefore required to build an integrated knowledge base. Unfortunately, the adoption of the Linked Data lifecycle within enterprises, which targets the extraction, interlinking, publishing and analytics of distributed data, lags behind the public domain due to missing frameworks that are efficient to deploy and easy to use. In this paper, we present our adoption of the lifecycle through our generic, enterprise-ready Linked Data workbench. To judge its benefits, we describe its application within a real-world Customer Relationship Management scenario. It shows (1) that sales employees could significantly reduce their workload and (2) that the integration of sophisticated Linked Data tools comes with an obvious positive Return on Investment.
Data integration, data interoperation and data quality are major challenges that continue to haunt enterprises. Every enterprise either by choice or by chance has created massive silos of data in different formats, with duplications and quality issues.
Knowledge graphs have proven to be a viable solution to address the integration and interoperation problem. Semantic technologies in particular provide an intelligent way of creating an abstract layer for the enterprise data model and mapping of siloed data to that model, allowing a smooth integration and a common view of the data.
Technologies like OWL (Web Ontology Language) and RDF (Resource Description Framework) are the backbone of semantics for knowledge graph implementation. Enterprises use OWL to build an ontology model that provides a common definition of concepts and how they are connected to each other in their specific domain.
They then use RDF to create a triple format representation of their data by mapping it to the Ontology. This approach makes their data smart and machine understandable.
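A minimal sketch of that mapping step, with an invented record and invented `ex:`/`org:` vocabulary terms (a real system would use URIs from the enterprise ontology):

```python
# Mapping one siloed tabular record to subject-predicate-object
# triples against a (hypothetical) ontology vocabulary.
record = {"id": "4711", "name": "Acme GmbH", "parent_id": "0815"}

def to_triples(record):
    """Mint a subject URI from the record id, then emit one triple
    per mapped column."""
    subject = f"ex:company/{record['id']}"
    triples = [
        (subject, "rdf:type", "org:Organization"),
        (subject, "org:name", record["name"]),
    ]
    if record.get("parent_id"):
        # The ownership relationship itself becomes queryable data.
        triples.append((subject, "org:subsidiaryOf",
                        f"ex:company/{record['parent_id']}"))
    return triples

for t in to_triples(record):
    print(t)
```

Once every silo is mapped this way, records from different systems that share a subject URI merge into one view of the entity.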
But how can enterprises control and validate the quality of this mapped data? Furthermore, how can they use this one abstract representation of data to meet all their different business requirements? Different departments, lines of business (LoBs) and business branches all have their own data needs, creating a new challenge to be tackled by the enterprise.
In this talk we will look at how SHACL (Shapes Constraint Language), a W3C standard for defining sets of constraints over data, complements the two core semantic technologies OWL and RDF: what are the similarities, the overlaps and the differences?
We will talk about how SHACL gives enterprises the power to reuse, customize and validate their data for various scenarios, use cases and business requirements, making the application of semantics even more practical.
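SHACL itself is expressed in RDF and evaluated by a SHACL engine; the toy below is not real SHACL, but it mimics the core idea in plain Python: a "shape" targets nodes of a class and declares minimum counts for required properties. All names and data are invented:

```python
# A toy, SHACL-flavoured check: a "shape" lists required predicates
# (with minimum counts) for every node of a target class.
person_shape = {
    "targetClass": "ex:Person",
    "required": {"ex:name": 1, "ex:email": 1},
}

data = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:email", "alice@example.org"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:bob", "ex:name", "Bob"),          # bob has no email
]

def validate(triples, shape):
    """Report (node, predicate) pairs that violate the shape's
    minimum-count constraints."""
    targets = {s for s, p, o in triples
               if p == "rdf:type" and o == shape["targetClass"]}
    violations = []
    for node in sorted(targets):
        for pred, min_count in shape["required"].items():
            count = sum(1 for s, p, _ in triples if s == node and p == pred)
            if count < min_count:
                violations.append((node, pred))
    return violations

print(validate(data, person_shape))  # [('ex:bob', 'ex:email')]
```

Because shapes are data, different departments can keep their own constraint sets over the same underlying graph, which is the reuse-and-customize point the talk makes.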
RDF and OWL are powerful tools for making data smart. RDF uses a simple triple format to represent metadata and link data using unique identifiers, allowing for data integration. OWL builds on RDF by adding more formal semantics and defining concepts, properties, and relationships to allow for automated reasoning and inference over data. Combining OWL and RDF results in smart data that computers can understand, enabling intelligent automation and decision making.
Transforming Your Data with GraphDB: GraphDB Fundamentals, Jan 2018 (Ontotext)
These are slides from a live webinar that took place in January 2018.
GraphDB™ Fundamentals lays the foundation for working with graph databases that utilize the W3C standards, and particularly GraphDB™. In this webinar, we demonstrated how to install and set up GraphDB™ 8.4 and how you can generate your first RDF dataset. We also showed how to quickly integrate complex and highly interconnected data using RDF and SPARQL, and much more.
With the help of GraphDB™, you can start smartly managing your data assets, visually represent your data model and gain insights from it.
Powerful Information Discovery with Big Knowledge Graphs – The Offshore Leaks ... (Connected Data World)
Borislav Popov's slides from his lightning talk at Connected Data London. Borislav, a Director of Business Development at Ontotext, presented Ontotext's approach to tackling the Panama Papers leak, using a technology that is a mix of semantic web and graph databases.
[Webinar] FactForge Debuts: Trump World Data and Instant Ranking of Industry ... (Ontotext)
This webinar continues a series demonstrating how linked open data and semantic tagging of news can be used for comprehensive media monitoring, market and business intelligence. The platform for the demonstrations is FactForge: a hub for news and data about people, organizations, and locations (POL). FactForge embodies a big knowledge graph (BKG) of more than 1 billion facts that allows various analytical queries, including tracing suspicious patterns of company control and media monitoring of people, including companies owned by them, their subsidiaries, etc.
Ojo Al Data 100 - Call for sharing session at IODC 2016Oscar Corcho
This is the presentation of the #ojoaldata100 initiative (http://ojoaldata100.okfn.es) for the selection of 100 datasets that every city should be publishing in their open data portal. This presentation was used in a call for sharing session at the 4th International Open Data Conference (IODC2016).
A Linked Data Dataset for Madrid Transport Authority's DatasetsOscar Corcho
This document discusses the creation of a linked data dataset for Madrid's public transport authority (CRTM) to make their transport data more accessible and reusable. It outlines the motivation and benefits of open transport data, reviews existing methods of publishing open data, and proposes publishing CRTM's data as linked open data using semantic web standards to enable new applications and value-added services by combining the transport data with other public datasets. The methodology describes transforming CRTM's static and real-time transport datasets into RDF and providing SPARQL and SPARQL-Stream endpoints to access the data. Examples demonstrate sample URIs, queries to retrieve stop points, and visualizations of the linked data.
Basic introductory talk about the Web of Linked Data, given to undergraduate and posgraduate students of Universidad del Valle (Cali, Colombia) in September 2010. Knowledge about Semantic Web is required
This document describes the CORFU technique for unifying and reconciling corporate names in public contracts metadata. It aims to create a "big name" or unique identifier for each company by normalizing varying names into a single entry. The technique is applied to 400,000 supplier names in Australian public procurement data. It involves loading the names, normalizing text, filtering basic names, applying natural language processing including tokenization and stemming, clustering similar names, and selecting cluster representatives to link names to a company URI. The goal is to improve transparency in tracking where public money is spent.
This document describes Schema.org and its potential uses beyond search engine optimization. Schema.org was created in 2011 by major search engines to provide a set of shared vocabularies for structured data on web pages. It has since grown to include over 2000 terms covering entities, relationships, and actions. The document discusses how Schema.org data can be used for analytics by extracting metadata from web pages and sending it to Google Analytics for additional dimensions and metrics. This enables analysis of user behavior at a more granular level than is normally possible from web analytics alone.
This document outlines DBpedia's strategy to become a global open knowledge graph by facilitating collaboration on data. It discusses establishing governance and curation processes to improve data quality and enable organizations to incubate their knowledge graphs. The goals are to have millions of users and contributors collaborating on data through services like GitHub for data. Technologies like identifiers, schema mapping, and test-driven development help integrate data. The vision is for DBpedia to connect many decentralized data sources so data becomes freely available and easier to work with.
How to Reveal Hidden Relationships in Data and Risk AnalyticsOntotext
Imagine risk analysis manager or compliance officer who can discover easily relationships like this: Big Bucks Café out of Seattle controls My Local Café in NYC through an offshore company. Such discovery can be a game changer if My Local Café pretends to be an independent small enterprise, while recently Big Bucks experiences financial difficulties.
A possible future role of schema.org for business reportingsopekmir
The presentation demonstrates a vision for the “reporting extension” that could enhance the processes related to business reporting and the role it could have for the SBR vision.
The document discusses Thomson Reuters' (TR) efforts to build an enterprise content platform to manage their large and growing collection of structured and unstructured data. TR currently stores over 60,000 terabytes of data and processes millions of data points daily. They aim to modernize their infrastructure, break down content silos, and unlock new commercial opportunities through their platform. Their approach involves developing unique entity identifiers, intelligent tagging tools, a knowledge graph, and analytics capabilities. This will allow them to better integrate client data, create new insights, and facilitate innovative uses of their diverse content.
Using the Semantic Web Stack to Make Big Data SmarterMatheus Mota
The document discusses using semantic web technologies to make big data smarter. It provides an overview of key concepts in semantic web, including linked data and ontologies. It describes how semantic web can add structure and meaning to unstructured data through modeling data as graphs and defining relationships and properties. The goal is to publish and query interconnected data at scale to enable new types of queries and inferences over big data.
Adding Semantic Edge to Your Content – From Authoring to DeliveryOntotext
Within the last few years we see and ever increasing demand for more accurate user specific content which on the other hand overwhelms content providers.This is where smart publishing platforms come into play. They aim at bringing the right content at the right time – digested, easy to comprehend, fast to navigate, and tailored to the readers’ personal interests.
The technologies that power them help publishers to automate the metadata enrichment process, making it more consistent, accurate and rich.
The document describes an approach called RDFIndex for representing and computing quantitative indexes using semantic web technologies. The main contributions are a high-level model built on top of the RDF Data Cube Vocabulary for representing indexes, and a Java-SPARQL based processor to exploit metadata, validate indexes, and compute new index values. An example index called the "World Bank Naive Index" is used to illustrate how the RDFIndex approach can represent the structure of an index and its components/indicators in RDF, and compute the index values.
Enterprise Knowledge Graphs allow organizations to integrate heterogeneous data from various sources and represent them semantically using common vocabularies and ontologies. This facilitates linking and querying of related information across organizational boundaries. Knowledge graphs provide a holistic view of enterprise data and support various applications through their use as a common background knowledge base. However, building and maintaining knowledge graphs at scale poses challenges regarding data quality, coherence, and evolution of the knowledge representation over time.
Linking Open, Big Data Using Semantic Web Technologies - An IntroductionRonald Ashri
The Physics Department of the University of Cagliari and the Linkalab Group invited me to talk about the Semantic Web and Linked Data - this is simply an introduction to the technologies involved.
ROI in Linking Content to CRM by Applying the Linked Data StackMartin Voigt
Today, decision makers in enterprises have to rely more and more on a variety of data sets that are internally but also externally available in heterogeneous formats. Therefore, intelligent processes are required to build an integrated knowledge-base. Unfortunately, the adoption of the Linked Data lifecycle within enterprises, which targets the extraction, interlinking, publishing and analytics of distributed data, lags behind the public domain due to missing frameworks that are efficiently to deploy and ease to use. In this paper, we present our adoption of the lifecycle through our generic, enterprise-ready Linked Data workbench. To judge its benefits, we describe its application within a real-world Customer Relationship Management scenario. It shows (1) that sales employee could significantly reduce their workload and (2) that the integration of sophisticated Linked Data tools come with an obvious positive Return on Investment.
Data integration, data interoperation and data quality are major challenges that continue to haunt enterprises. Every enterprise either by choice or by chance has created massive silos of data in different formats, with duplications and quality issues.
Knowledge graphs have proven to be a viable solution to address the integration and interoperation problem. Semantic technologies in particular provide an intelligent way of creating an abstract layer for the enterprise data model and mapping of siloed data to that model, allowing a smooth integration and a common view of the data.
Technologies like OWL (Web Ontology Language) and RDF (Resource Description Framework) are the back bone of semantics for knowledge graph implementation. Enterprises use OWL to build an ontology model to create a common definition for concepts and how they are connected to each other in their specific domain.
They then use RDF to create a triple format representation of their data by mapping it to the Ontology. This approach makes their data smart and machine understandable.
But how can enterprises control and validate the quality of this mapped data? Furthermore, how can they use this one abstract representation of data to meet all their different business requirements? Different departments, different LoBs and different business branches all have their own data needs, creating a new challenge to be tackled by the enterprise.
In this talk we will look at how the power of SHACL (SHAPES and Constraints Language), a W3C standard for defining constraint sets over data; complements the two core semantic technologies OWL and RDF. What are the similarities, the overlaps and the differences.
We will talk about how SHACL gives enterprises the power to reuse, customize and validate their data for various scenarios, uses cases and business requirements; making the application of semantics even more practical.
RDF and OWL are powerful tools for making data smart. RDF uses a simple triple format to represent metadata and link data using unique identifiers, allowing for data integration. OWL builds on RDF by adding more formal semantics and defining concepts, properties, and relationships to allow for automated reasoning and inference over data. Combining OWL and RDF results in smart data that computers can understand, enabling intelligent automation and decision making.
Transforming Your Data with GraphDB: GraphDB Fundamentals, Jan 2018Ontotext
These are slides from a live webinar taken place January 2018.
GraphDB™ Fundamentals builds the basis for working with graph databases that utilize the W3C standards, and particularly GraphDB™. In this webinar, we demonstrated how to install and set-up GraphDB™ 8.4 and how you can generate your first RDF dataset. We also showed how to quickly integrate complex and highly interconnected data using RDF and SPARQL and much more.
With the help of GraphDB™, you can start smartly managing your data assets, visually represent your data model and get insights from them.
Powerful Information Discovery with Big Knowledge Graphs – The Offshore Leaks ... (Connected Data World)
Borislav Popov's slides from his lightning talk at Connected Data London. Borislav, a Director of Business Development at Ontotext, presented Ontotext's approach to tackling the Panama Papers leak, using a technology that is a mix between semantic web and graph databases.
[Webinar] FactForge Debuts: Trump World Data and Instant Ranking of Industry ... (Ontotext)
This webinar continues a series demonstrating how linked open data and semantic tagging of news can be used for comprehensive media monitoring, market and business intelligence. The platform for the demonstrations is FactForge: a hub for news and data about people, organizations, and locations (POL). FactForge embodies a big knowledge graph (BKG) of more than 1 billion facts that supports various analytical queries, including tracing suspicious patterns of company control and media monitoring of people, including companies owned by them, their subsidiaries, etc.
Analytical Innovation: How to Build the Next Generation Data Platform (VMware Tanzu)
There was a time when the Enterprise Data Warehouse (EDW) was the only way to provide a 360-degree analytical view of the business. In recent years many organizations have deployed disparate analytics alternatives to the EDW, including: cloud data warehouses, machine learning frameworks, graph databases, geospatial tools, and other technologies. Often these new deployments have resulted in the creation of analytical silos that are too complex to integrate, seriously limiting global insights and innovation.
Join guest speaker, 451 Research’s Jim Curtis and Pivotal’s Jacque Istok for an interactive discussion about some of the overarching trends affecting the data warehousing market, as well as how to build a next generation data platform to accelerate business innovation. During this webinar you will learn:
- The significance of a multi-cloud, infrastructure-agnostic analytics
- What is working and what isn’t, when it comes to analytics integration
- The importance of seamlessly integrating all your analytics in one platform
- How to innovate faster, taking advantage of open source and agile software
Speakers: James Curtis, Senior Analyst, Data Platforms & Analytics, 451 Research & Jacque Istok, Head of Data, Pivotal
The Power of Semantic Technologies to Explore Linked Open Data (Ontotext)
Atanas Kiryakov's, Ontotext’s CEO, presentation at the first edition of Graphorum (http://graphorum2017.dataversity.net/) – a new forum that taps into the growing interest in Graph Databases and Technologies. Graphorum is co-located with the Smart Data Conference, organized by the digital publishing platform Dataversity.
The presentation demonstrates the capabilities of Ontotext’s own approach to contributing to the discipline of more intelligent information gathering and analysis by:
- graphically exploring the connectivity patterns in big datasets;
- building new links between identical entities residing in different data silos;
- getting insights into what types of queries can be run against various linked data sets;
- reliably filtering information based on relationships, e.g., between people and organizations, in the news;
- demonstrating the conversion of tabular data into RDF.
Learn more at http://ontotext.com/.
This document summarizes a report on big data analytics and the use of analytical platforms. It describes how companies have been dealing with large volumes of data for decades but that data volumes are growing exponentially due to new types of structured, semi-structured, and unstructured data from sources like the web, social media, sensors and machine data. New analytical platforms and technologies are needed to efficiently store, manage and analyze this diverse new "big data". The report is based on a survey of 302 BI professionals and interviews with industry experts regarding their use of analytical platforms for big data analytics.
Tracxn Research — Big Data Infrastructure Landscape, September 2016 (Tracxn)
Following a rather muted 2015, the year 2016 witnessed an impressive bounce back by the big data sector, with a total funding of $1.1B secured in 37 rounds.
TrendsByte (http://trendsbyte.com) is a Data as a Service (DaaS) platform, which provides comprehensive, relevant and actionable insights on global trends and opportunities, across industries. We use machine learning and Natural Language Processing (NLP) to aggregate relevant data, analyze strategies of key players, and extract meaningful insights from it.
Our insights help consultants and CXOs cut through the noise, and access the right information without wasting their resources. We have tested our prototype and validated product-market-fit through sales and traction. We are now in the process of building scalable platform, which will provide data analytics and API access to our customers. The platform will have flexible search capabilities, for customers to create customized charts, dashboards, and shareable reports.
The document discusses knowledge graphs and provides examples of how Neo4j has been used by customers for knowledge graph and graph database applications. Specifically, it discusses how Neo4j has helped organizations like Itau Unibanco, UBS, Airbnb, Novartis, Columbia University, Telia, Scripps Networks, and Pitney Bowes with fraud detection, master data management, content management, smart home applications, investigative journalism, and other use cases by building knowledge graphs and connecting diverse data sources.
This document discusses big data analytics and analytical platforms. It finds that companies have been storing and analyzing large volumes of data for decades, but new types of structured, semi-structured, and unstructured data from sources like the web and sensors are fueling even greater amounts of "big data". Analytical platforms have emerged to help organizations efficiently store and analyze this data. The report is based on a survey of 302 IT professionals and interviews with BI experts.
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI... (Matt Stubbs)
Date: 14th November 2018
Location: Governance and MDM Theatre
Time: 10:30 - 11:00
Speaker: Mike Ferguson
Organisation: IBS
About: For most organisations today, data complexity has increased rapidly. In the area of operations, we now have cloud and on-premises OLTP systems with customers, partners and suppliers accessing these applications via APIs and mobile apps. In the area of analytics, we now have data warehouse, data marts, big data Hadoop systems, NoSQL databases, streaming data platforms, cloud storage, cloud data warehouses, and IoT-generated data being created at the edge. Also, the number of data sources is exploding as companies ingest more and more external data such as weather and open government data. Silos have also appeared everywhere as business users are buying in self-service data preparation tools without consideration for how these tools integrate with what IT is using to integrate data. Yet new regulations are demanding that we do a better job of governing data, and business executives are demanding more agility to remain competitive in a digital economy. So how can companies remain agile, reduce cost and reduce the time-to-value when data complexity is on the up?
In this session, Mike will discuss how companies can create an information supply chain to manufacture business-ready data and analytics to reduce time to value and improve agility while also getting data under control.
Sotiris is currently working as Research Director with the Institute of Computer Science at the Foundation for Research and Technology - Hellas, where his research interests include systems, networks, and security. He is also a member of the European Union Agency for Network and Information Security (ENISA) Permanent Stakeholders Group! During Data Science Conference, Sotiris will talk about how data sharing between private companies and research facilities may lead to monetization.
Gain Super Powers in Data Science: Relationship Discovery Across Public Data (Ontotext)
The document summarizes a webinar on relationship discovery across public data. It outlines the webinar agenda which includes use cases of relation discovery and media monitoring. It also describes examples of relationship discovery from datasets like the Panama Papers and media monitoring examples. It discusses linking news to knowledge graphs and semantic media monitoring. Finally, it covers mapping additional datasets to DBPedia to facilitate relationship discovery.
Diving in Panama Papers and Open Data to Discover Emerging News (Ontotext)
Get guidance through the gigantic sea of freely released data from the Panama Papers, as well as the Linked Open Data cloud. You will learn how it can empower your understanding of today's news or any other information source.
Data Visualization Trends - Next Steps for Tableau (Arunima Gupta)
Want answers to:
- What is data visualization?
- Why is it deemed disruptive in the field of analytics?
- What is Tableau?
Come view the slide deck!
Concludes with:
- Digital strategy recommendations for Tableau to become the winner in a winner-take-all-market
This document discusses the rise of big data and analytics used to analyze large volumes of data. It notes that while terabytes used to be considered big data, petabytes are now common as organizations seek to analyze more transaction details, web data, and machine-generated data. To handle larger volumes, vendors have created specialized analytical platforms that can analyze structured data faster than general databases. The document also discusses how new technologies like Hadoop help analyze unstructured data and how businesses need tools to help both casual and power users analyze data.
Session 4 - A practical journey on how to use the DataBench Toolbox (DataBench)
This document summarizes a session from the European Big Data Value Forum 2020. It discusses the DataBench Toolbox, which aims to be a one-stop-shop for big data and AI benchmarking. The toolbox connects benchmarking results and showcases knowledge. It was demonstrated including an AI Benchmark Observatory for tracking topic popularity over time using various data sources. Finally, the needs of digital innovation hubs for fostering adoption of benchmarking results were discussed, including providing local services and connecting to resources across Europe.
Tracxn Research — Business Intelligence Landscape, September 2016 (Tracxn)
Tracxn's Business Intelligence 2016 report covers companies that develop and provide Business Intelligence (BI) & Analytics software. It also covers prominent industry specific BI solutions.
Analyze billions of records on Salesforce App Cloud with BigObject (Salesforce Developers)
Salesforce hosts billions of customer records on Salesforce App Cloud. Making timely decisions on this invaluable data demands a new set of capabilities. From interacting with data in real-time to leveraging a fluid integration with Salesforce Analytics, these capabilities are just around the corner. Join us in this roadmap session to see what the near-future of Big Data on Salesforce App Cloud looks like and how you can benefit from it.
Key Takeaways
- Learn what 100 billion+ records on the Salesforce App Cloud could actually mean to you.
- Understand new services such as AsyncSOQL that can deliver reliable, resilient query capabilities over your sObjects and BigObjects.
- Gain insights for large scale federated data filtering and aggregation.
- Transform data movement so all your customer records are available across their life cycle.
Intended Audience
This session is for Salesforce Administrators, Developers, Architects and just about anyone who wants to learn more about BigObjects!
The document discusses technologies to support analytics for IoT data streams. It describes how IoT data will grow exponentially and analyzing data streams in real-time can provide 100x more value than analyzing stored data. Examples are given of how Hertz and retailers are leveraging real-time analysis of IoT data streams to improve customer experiences and increase revenue.
Similar to euBusinessGraph Company and Economic Data (20)
Ontotext is a software company that works on semantic technologies and knowledge graphs for cultural heritage and digital humanities projects. They have developed applications like ResearchSpace for museums and ConservationSpace for conservation specialists. They also contribute linked open data to projects like Europeana, Wikidata, and DBpedia and provide consulting services to institutions like museums and national aggregators.
Semantic Archive Integration for Holocaust Research: the EHRI Research Infras... (Vladimir Alexiev, PhD, PMP)
The European Holocaust Research Infrastructure (EHRI) is a large-scale EU project that involves 23 institutions and archives working on Holocaust studies, from Europe, Israel and the US. In its first phase (2011-2015) it aggregated archival descriptions and materials on a large scale and built a Virtual Research Environment (portal) for Holocaust researchers based on a graph database.
In its second phase (2015-2019), EHRI2 seeks to enhance the gathered materials using semantic approaches: enrichment, coreferencing, interlinking. Semantic integration involves four of the 14 EHRI2 work packages and helps integrate databases, free text, and metadata to interconnect historical entities (people, organizations, places, historic events) and create networks. We will present some of the EHRI2 technical work, including critical issues we have encountered.
WP10 (EAD) converts archival descriptions from various formats to standard EAD XML; transports EADs using OAI PMH or ResourceSync; ingests EADs to the EHRI database; enables use cases such as synchronization; coreferencing of textual Access Points to proper thesaurus references
WP11 (Authorities and Standards) consolidates and enlarges the EHRI authorities to render the indexing and retrieval of information more effective. It addresses Access Points in ingested EADs (normalization of Unicode, spelling, punctuation; deduplication; clustering; coreferencing to authority control), Subjects (deployment of a Thesaurus Management System in support of the EHRI Thesaurus Editorial Board), Places (coreferencing to Geonames); Camps and Ghettos (integrating data with Wikidata); Persons, Corporate Bodies (using USHMM HSV and VIAF); semantic (conceptual) search including hierarchical query expansion; interconnectivity of archival descriptions; permanent URLs; metadata quality; EAD RelaxNG and Schematron schemas and validation, etc.
WP13 (Data Infrastructures) builds up domain knowledge bases from institutional databases by using deduplication, semantic data integration, semantic text analysis. It provides the foundation for research use cases on Jewish Social Networks and their impact on the chance of survival.
WP14 (Digital Historiography Research) works on semantic text analysis (semantic enrichment), text similarity (e.g. clustering based on Neural Networks, LDA, etc), geo-mapping. It develops Digital Historiography researcher tools, including Prosopographical approaches.
How GLAMs can use Wikipedia/Wikidata to make their collections globally accessible across languages.
Europeana Food and Drink content providers workshop, Athens, 18 May 2015
Wikidata is being considered as a target for Europeana's semantic strategy to enrich metadata and link cultural heritage objects. Wikidata provides multilingual information on entities like people, organizations, places and concepts from Wikipedia. It has potential to help solve Europeana's challenges around data diversity, quality and interlinking due to its connections to other vocabularies and ability to perform automatic enrichment. Europeana aims to semantically enrich its data and link objects to Wikidata entities to provide additional context. Content providers are also encouraged to contribute information to Wikidata to further enhance the semantic knowledge graph.
Europeana Food and Drink annual meeting, 20 Mar 2015, Athens, Greece. Full report: http://vladimiralexiev.github.io/pubs/Europeana-Food-and-Drink-Classification-Scheme-(D2.2).pdf
Information School, University of Washington, 2014-05-21: INFX 598 - Introducing Linked Data: concepts, methods and tools. Guest lecture (Module 9) "Doing Business with Semantic Technologies": Introduction to Ontotext and some of its products, clients and projects.
Also see video: https://voicethread.com/myvoice/#thread/5784646/29625471/31274564
Triplestores and inference, applications in Finance, text-mining. Projects and solutions for financial media and publishers.
Keystone Industrial Panel, ISWC 2014, Riva del Garda, 18 Oct 2014.
Thanks to Atanas Kiryakov for this presentation, I just cut it to size.
Full version of http://www.slideshare.net/valexiev1/gvp-lodcidocshort. Same is available on http://vladimiralexiev.github.io/pres/20140905-CIDOC-GVP/index.html
CIDOC Congress, Dresden, Germany
2014-09-05: International Terminology Working Group: full version (http://vladimiralexiev.github.io/pres/20140905-CIDOC-GVP/index.html)
2014-09-09: Getty special session: short version (http://VladimirAlexiev.github.io/pres/20140905-CIDOC-GVP/GVP-LOD-CIDOC-short.pdf)
This document discusses semantic technologies for cultural heritage. It introduces Ontotext Corp, which develops semantic technology, and some of their projects involving cultural heritage data. These include the ResearchSpace project with the British Museum, projects involving Europeana like Bulgariana and Europeana Creative, and publishing Getty vocabularies as linked open data.
Nikola Ikonomov, Boyan Simeonov, Jana Parvanova and Vladimir Alexiev. In Digital Presentation and Preservation of Cultural and Scientific Heritage (DiPP 2013), Veliko Tarnovo, Bulgaria, Sep 2013.
Large-scale Reasoning with a Complex Cultural Heritage Ontology (CIDOC CRM) ... (Vladimir Alexiev, PhD, PMP)
Vladimir Alexiev, Dimitar Manov, Jana Parvanova and Svetoslav Petrov. In proceedings of workshop Practical Experiences with CIDOC CRM and its Extensions (CRMEX 2013) at TPDL 2013, 26 Sep 2013, Valetta, Malta
Jana Parvanova, Vladimir Alexiev and Stanislav Kostadinov. In workshop Collaborative Annotations in Shared Environments: metadata, vocabularies and techniques in the Digital Humanities (DH-CASE 2013). Collocated with DocEng 2013. Florence, Italy, Sep 2013.
- Ontotext is an innovative semantic technology company that has been working in the field since 2000 and spun off in 2008, employing 70 people.
- They provide 360 degree semantic solutions including semantic annotation, text analysis, concept extraction, semantic search and repositories, and applications in media, life sciences, and cultural heritage.
- Ontotext has developed large semantic knowledge bases for clients like Europeana and the UK National Archives, extracting billions of facts from texts and providing semantic search capabilities.
2. Presentation Outline
•Ontotext Introduction
•euBusinessGraph
•FactForge: Open data and news about people and organizations
•Relationship Discovery Examples
•Media Monitoring Examples & Popularity Ranking
•Global Legal Entity Identifier RDF-ization and DBPedia mapping
euBusinessGraph Company and Economic Data, Sep 2017
4. History and Essential Facts
• Started in year 2000 as Semantic Web pioneer
− As R&D lab within Sirma – one of the biggest Bulgarian software companies
− Got spun-off and took VC investment in 2008
• 65 staff, R&D Center in Sofia; 80% sales in USA and UK
− Serving BBC, FT, Springer Nature, Wiley, Elsevier, OUP, IET…
• 400+ person-years invested in R&D
− Multiple innovation & technology awards: Washington Post, BBC, FT, BAIT, etc.
• Member of multiple industry bodies
− W3C, EDMC, ODI, LDBC, STI, DBPedia Foundation
5. [Diagram: a Commercial Company Database (e.g. D&B), Social Media, News, Wikipedia and private sources, linked together — "Link data! Reveal more!"]
• Recognizing and linking entities across text and data requires knowledge and context
• Knowledge Graphs incorporate semantic entity fingerprints for entities and concepts
• Evolve knowledge graphs and interlink them with proprietary data
7. NOW: Linking News to Big Knowledge Graphs
• The Ontotext platform links text to knowledge graphs
• Navigate from news to concepts, entities and topics; from there to other news
Try it at http://now.ontotext.com
9. Technology Excellence Delivered
• Powerful technology mix: Graph DB engine + Text mining
• Robust technology: We run BBC.CO.UK/SPORT and parts of FT.COM
• We serve some of the most knowledge intensive enterprises:
11.
• Integrate European company and economic data
• euBusinessGraph will overcome barriers in company data provisioning
• Technology and research partners: SINTEF (coord.), Ontotext, IJS, Uni. Milano
13. Company Datasets and Ontologies
• Global Legal Entity Identifier (GLEI)
• Business Registers Interconnection System (BRIS)
• Financial Industry Business Ontology (FIBO)
• OpenCorporates schema
• Bulgarian Trade Register schema
• W3C: Organization ontology, Registered Organization ontology, Location ontology
• Investigative journalism datasets: Panama Papers dataset, Linked Leaks, Trump World dataset
• Wikidata properties for describing companies, especially company identifiers in various registers
• Other ontologies and code lists: Schema.org, Dublin Core, IANA language tags, NUTS and LAU (EU administrative regions), NACE (EU economic activities), etc.
14. euBusinessGraph Semantic Data Model
• The semantic data model combines various data artefacts
− Includes detailed treatment of classes, properties, values, scope notes, data provider rules, URL conventions, etc.
• Tools:
− rdfpuml used to generate the diagrams
− Object-Role Modeling through the Norma
16. euBusinessGraph Technologies
• Ontotext Cognitive Cloud & GraphDB
• DataGraft
• Dandelion API
• Wikifier
• ABSTAT
• TARQL
• XSPARQL
17. FactForge: Open data and news about people and organizations (http://factforge.net)
18. FactForge: Data Integration
DBpedia (the English version) 496M
Geonames (all geographic features on Earth) 150M
owl:sameAs links between DBpedia and Geonames 471K
Company registry data (GLEI) 3M
Panama Papers DB (#LinkedLeaks) 20M
Other datasets and ontologies: WordNet, WorldFacts, FIBO
News metadata (2000 articles/day enriched by NOW) 473M
Total size (1 611M explicit + 328M inferred statements) 1 939M
19. News Metadata
• Metadata from Ontotext’s Dynamic Semantic Publishing platform
− Automatically generated as part of the NOW.ontotext.com semantic news showcase
• News stream from Google since Feb 2015, about 50k news articles/month
− ~70 tags (annotations) per news article
• Tags link text mentions of concepts to the knowledge graph
− Technically these are URIs for entities (people, organizations, locations, etc.) and key phrases
20. News Metadata
Category Count
International 52 074
Science and Technology 23 201
Sports 20 714
Business 15 155
Lifestyle 11 684
Total 122 828
Mentions / entity type Count
Keyphrase 2 589 676
Organization 1 276 441
Location 1 260 972
Person 1 248 784
Work 309 093
Event 258 388
RelationPersonRole 236 638
Species 180 946
21. Class Hierarchy Map (by number of instances)
Left: The big picture
Right: dbo:Agent class (2.7M organizations and persons)
22. Sample queries at http://factforge.net
• F1: Big cities in Eastern Europe
• F2: Airports near London
• F3: People and organizations related to Google
• F4: Top-level industries by number of companies
Available as Saved Queries at http://factforge.net/sparql
Note: Open Saved Queries with the folder icon in the upper-right corner
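The saved queries themselves are not reproduced in the deck. As a rough sketch, F4 ("Top-level industries by number of companies") is an aggregation of roughly this shape; dbo:industry is a real DBpedia ontology property, but the exact vocabulary and grouping used by the actual saved query are assumptions:

```python
# Assumed-vocabulary sketch of saved query F4 (top-level industries by
# number of companies). dbo: is the DBpedia ontology namespace; the
# property choice is a guess, not FactForge's actual query.
F4_QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?industry (COUNT(?company) AS ?companies)
WHERE {
  ?company a dbo:Company ;
           dbo:industry ?industry .
}
GROUP BY ?industry
ORDER BY DESC(?companies)
"""
```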
24. Relation Discovery Case
• Find suspicious relationships like:
− Company in USA
− Controls another company in USA
− Through a company in an off-shore zone
• Show news relevant to these companies
25. Offshore control example
• Query: Find companies that control other companies in the same country through a company in an off-shore zone
• How it works:
• Establish control-relationship
• Establish a company-country mapping
• Establish an “off-shore criteria”
• SPARQL it
26. Off-shore company control example
# Prefixes (onto:, fibo-fnd-rel-rel:, ff-map:) are as defined by the FactForge endpoint
SELECT *
FROM onto:disable-sameAs          # GraphDB pseudo-graph: switch off owl:sameAs expansion
WHERE {
  ?c1 fibo-fnd-rel-rel:controls ?c2 .
  ?c2 fibo-fnd-rel-rel:controls ?c3 .
  ?c1 ff-map:orgCountry ?c1_country .
  ?c2 ff-map:orgCountry ?c2_country .
  ?c3 ff-map:orgCountry ?c1_country .              # ?c3 is in the same country as ?c1
  FILTER (?c1_country != ?c2_country)              # the intermediary sits elsewhere...
  ?c2_country ff-map:hasOffshoreProvisions true .  # ...in an off-shore jurisdiction
}
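The query can be sent to the public endpoint at http://factforge.net/sparql with any SPARQL Protocol client. A minimal stdlib-only Python sketch, assuming the endpoint is still live and the prefixes above are predefined server-side:

```python
# Minimal SPARQL Protocol client for a SELECT query (stdlib only).
# The endpoint URL comes from the slides; whether the ff-map:/fibo
# prefixes are predefined server-side is an assumption.
import json
import urllib.parse
import urllib.request

OFFSHORE_QUERY = """
SELECT * FROM onto:disable-sameAs WHERE {
  ?c1 fibo-fnd-rel-rel:controls ?c2 .
  ?c2 fibo-fnd-rel-rel:controls ?c3 .
  ?c1 ff-map:orgCountry ?c1_country .
  ?c2 ff-map:orgCountry ?c2_country .
  ?c3 ff-map:orgCountry ?c1_country .
  FILTER (?c1_country != ?c2_country)
  ?c2_country ff-map:hasOffshoreProvisions true .
}
"""

def sparql_select(endpoint: str, query: str) -> dict:
    """POST a query and return parsed application/sparql-results+json."""
    body = urllib.parse.urlencode({"query": query}).encode()
    req = urllib.request.Request(
        endpoint, data=body,
        headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    res = sparql_select("http://factforge.net/sparql", OFFSHORE_QUERY)
    for row in res["results"]["bindings"]:
        print(row["c1"]["value"], "controls", row["c3"]["value"])
```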
28. Semantic Media Monitoring
For each entity:
• popularity trends
• relevant news
• related entities
• knowledge graph information
Try it at http://now.ontotext.com
29. Semantic Media Monitoring/Press-Clipping
• We can trace references to a specific company in the news
− This is pretty much standard, but we can also handle syntactic variations of the names, because state-of-the-art Named Entity Recognition technology is used
− More importantly, we correctly distinguish whether a mention of “Paris” refers to Paris (the capital of France), Paris in Texas, Paris Hilton or Paris (the Greek hero)
• We can trace and consolidate references to daughter companies
• We have a comprehensive industry classification
− The one from DBpedia, refined to accommodate identifier variations and specialization (e.g. a company classified as dbr:Bank will also be considered classified as dbr:FinancialServices)
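The specialization rule in the last bullet amounts to closing each company’s sector over its ancestor chain in the taxonomy. A toy sketch, where the hierarchy fragment is invented (only the dbr:Bank-under-financial-services example comes from the slide):

```python
# Toy sketch of industry-specialization inference: a company tagged
# with a sector also counts under every ancestor sector. The hierarchy
# below is an invented fragment, not DBpedia's actual taxonomy.
PARENT = {
    "dbr:Bank": "dbr:FinancialServices",
    "dbr:FinancialServices": "dbr:Industry",
}

def with_ancestors(sector: str) -> list:
    """Return the sector plus all its ancestors, bottom-up."""
    chain = [sector]
    while sector in PARENT:
        sector = PARENT[sector]
        chain.append(sector)
    return chain

print(with_ancestors("dbr:Bank"))
# → ['dbr:Bank', 'dbr:FinancialServices', 'dbr:Industry']
```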
30. Media Monitoring Queries
• F5: Mentions in the news of an organization and its related entities
• F7: Most popular companies per industry, including children
• F8: Regional exposure of a company, normalized
32. News popularity ranking of companies
• Rankings can be customized by specifying a geographic region, news
category (e.g., business, sport, lifestyle, etc.) and time period.
• Unique features:
− Based on live streaming news
− Also tracks mentions of subsidiaries
• The rank uses the industry sectors of DBpedia with several refinements
− About 40 top-level industry sectors
− Sectors are linked in a hierarchical taxonomy (251 sectors altogether)
− Industry sectors are de-duplicated (Wikipedia uses about 9 000 distinct industry designators)
33. Rank uses NOW, FactForge and GraphDB
• This ranking service is entirely based on FactForge
− FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts,
which is loaded in GraphDB
− GraphDB is a semantic graph database engine of Ontotext
− Unlike FactForge, this service is aimed at non-technical users, as it does not require any knowledge of SPARQL or other technology
− Still, it allows users to see the SPARQL query behind each ranking and to customize it
• Try http://rank.ontotext.com
37. Global Legal Entity Identifier (GLEI) data
• Global Legal Entity Identifier Foundation (GLEIF) Utility data
− GLEIF is tasked to support the implementation and use of the Legal Entity Identifier (LEI)
− The foundation is backed and overseen by the LEI Regulatory Oversight Committee
• The data dump
− Downloaded as an XML dump from https://www.gleif.org/en/lei-data/gleif-concatenated-file/download-the-concatenated-file
− We used the two provided dumps:
‣ Level 1 Data (Who Is Who)
‣ Level 2 Data (Who Owns Whom)
38. Global Legal Entity Identifier (GLEI) data
• RDF-ized company records
− 20M explicit statements for 505 thousand organizations
▪ For comparison, there are 296,544 organizations in DBpedia, and D&B covers 200+ million
▪ A year ago GLEI had only 3M statements about 211 thousand organizations
− 9,105 parent/child relationships, 16,150 associated organizations
• 9,705 organizations from GLEI mapped to DBpedia
• Company data modeled in FIBO
• XSPARQL used as the transformation engine
https://github.com/Ontotext-AD/GLEI
40. GLEI Company Data Sample: ABN-AMRO
lei:businessRegistry Kamer van Koophandel
lei:businessRegistryNumber 34334259
lei:duplicateReference data:549300T5O0D0T4V2ZB28
lei:entityStatus ACTIVE
lei:headquartersCity Amsterdam
lei:headquartersState Noord-Holland
lei:legalForm NAAMLOZE VENNOOTSCHAP
lei:legalName ABN AMRO Bank N.V.
lei:lei BFXS5XCH7N0Y05NIXW11
lei:registeredCity Amsterdam
lei:registeredCountry NL
lei:registeredPostCode 1082 PP
lei:registeredState Noord-Holland
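A record like the one above is the output of the XML-to-RDF lifting step (the project used XSPARQL, per slide 38). As a rough Python illustration only, with simplified, invented element names rather than the real GLEIF LEI-CDF schema:

```python
# Rough sketch of the XML→RDF lifting step. The project itself used
# XSPARQL; the element names in SAMPLE are simplified inventions, not
# the actual GLEIF concatenated-file schema.
import xml.etree.ElementTree as ET

SAMPLE = """<LEIRecord>
  <LEI>BFXS5XCH7N0Y05NIXW11</LEI>
  <LegalName>ABN AMRO Bank N.V.</LegalName>
  <EntityStatus>ACTIVE</EntityStatus>
</LEIRecord>"""

# XML tag → RDF property, mirroring the lei: properties shown above
MAPPING = {
    "LegalName": "lei:legalName",
    "EntityStatus": "lei:entityStatus",
}

def record_to_triples(xml_text: str) -> list:
    rec = ET.fromstring(xml_text)
    subj = "data:" + rec.findtext("LEI")  # the LEI code becomes the subject URI
    return [(subj, MAPPING[el.tag], el.text)
            for el in rec if el.tag in MAPPING]

for triple in record_to_triples(SAMPLE):
    print(triple)
```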
41. GLEI Data Stats: 2016 (OLD)
Ultimate parent / Children / Country
1 The Goldman Sachs Group, Inc. 1 851 US
2 United Technologies Corporation 427 US
3 Honeywell International Inc. 341 US
4 Morgan Stanley 228 US
5 Cargill, Incorporated 217 US
6 1832 Asset Management L.P. 202 CA
7 Aegon N.V. 174 NL
8 Union Bancaire Privée, UBP SA 138 CH
9 Citigroup Inc. 135 US
10 State Street Corporation 128 US
Country / Companies
1 dbr:United_States 103 548
2 dbr:Canada 17 425
3 dbr:Luxembourg 13 984
4 dbr:Sweden 7 934
5 dbr:United_Kingdom 7 421
6 dbr:Belgium 6 868
7 dbr:Ireland 4 762
8 dbr:Australia 4 385
9 dbr:Germany 3 039
10 dbr:Netherlands 2 561
42. GLEI Data Stats: 2017
Ultimate Parent Children Country
1 LLOYDS BANKING GROUP PLC 619 GB
2 HSBC HOLDINGS PLC 542 GB
3 THE ROYAL BANK OF SCOTLAND … 378 GB
4 DEUTSCHE BANK AKTIENGESELLSCHAFT 174 DE
5 BANK OF SCOTLAND PLC 111 GB
6 LLOYDS BANK PLC 93 GB
7 Swedbank AB (Publ) 90 SE
8 ROYAL LONDON MUTUAL INSURANCE SOCIETY, LIMITED (THE) 89 GB
9 Lincoln Investment Advisors Corporation 88 US
10 Swedbank Robur AB 85 SE
Country Companies
1 US 136 889
2 IT 50 021
3 DE 48 850
4 FR 33 412
5 GB 32 015
6 CA 22 107
7 LU 22 075
8 NL 20 327
9 ES 19 569
10 SE 11 272
43. Mapping Datasets to DBpedia with the GraphDB Lucene Connector
44. Mapping datasets to DBPedia
• The task: map people, organizations and locations to IDs in DBPedia
− So that we can analyze the original data with the help of the extra information available in
DBPedia and other datasets that are related to it, e.g. Geonames
− For instance, the data from GLEI doesn’t contain any extra information about the companies,
e.g. industry sector, products, etc.
• Specific conditions: we had to map by names and locations
− There are few features common to both the GLEI and DBpedia data
▪ Address and country attributes are present, but they turned out to be only marginally useful for mapping
− We mapped locations only at the level of countries and/or cities, not finer-grained locations
▪ For this purpose the DBpedia geographic data is sufficient, and it is also well linked to GeoNames
45. Mapping datasets to DBPedia (2)
• We used the GraphDB connector to Lucene for these mappings
− Using the connector, a Lucene index was created for Organizations and People from DBpedia, indexing all sorts of names, descriptions and other textual information for each entity
− The mapping process consists mostly of using the name of the entity from the 3rd-party dataset (in this case Panama Papers or GLEI) as a full-text-search query, embedded in a SPARQL query
• What does Lucene do better than SPARQL?
− When there is little information other than the name, we benefit from the free-text indexing of Lucene, because it deals well with minor syntactic variations and sorts the results by relevance
− When mapping 300 000 organizations against another 500 000 without a key, a SPARQL query has to consider 300 000 × 500 000 pairs, which is slower than issuing 300 000 Lucene queries
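The complexity argument in the last bullet can be seen in miniature: an inverted index over name tokens turns the quadratic pairwise join into one lookup per record. A toy version of what Lucene does (minus analyzers, stemming, scoring and fuzzy matching), with invented data:

```python
# Toy inverted index over organization names, illustrating why indexed
# lookups beat a quadratic pairwise join. The data is invented.
from collections import defaultdict

dbpedia_orgs = ["ABN AMRO", "Goldman Sachs", "Deutsche Bank"]

# Build the index once: token → set of record ids (what Lucene does at
# indexing time, without stemming or relevance scoring).
index = defaultdict(set)
for i, name in enumerate(dbpedia_orgs):
    for tok in name.lower().split():
        index[tok].add(i)

def candidates(query_name: str) -> list:
    """Records sharing name tokens with the query, most overlap first."""
    hits = defaultdict(int)
    for tok in query_name.lower().split():
        for i in index.get(tok, ()):
            hits[i] += 1
    ranked = sorted(hits, key=hits.get, reverse=True)
    return [dbpedia_orgs[i] for i in ranked]

# One indexed lookup per GLEI record instead of |GLEI| × |DBpedia| pairs
print(candidates("ABN AMRO Bank N.V."))
# → ['ABN AMRO', 'Deutsche Bank']  (the latter via the shared token "bank")
```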
46. Mapping GLEI to DBPedia
• Data Pre-processing in DBPedia
− We generated primary city and primary country for each organization in DBPedia
▪ Also cleaned up data about HQ locations, etc.
▪ We used a series of SPARQL queries for this
• Iterative matching
− First match the candidates with high relevance scores, then tighten the constraints by location and country
• Matching outcome
− skos:exactMatch: 3880 matches
− skos:closeMatch: 5825 matches
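The slides do not say how matches were split between skos:exactMatch and skos:closeMatch. A plausible (entirely assumed) decision rule combining the Lucene relevance score with the location constraint might look like:

```python
# Hypothetical decision rule for grading matches. The thresholds and
# the same-country test are assumptions, not the project's actual logic.
from typing import Optional

def grade_match(score: float, same_country: bool) -> Optional[str]:
    if score >= 0.9 and same_country:
        return "skos:exactMatch"
    if score >= 0.6:
        return "skos:closeMatch"
    return None  # discard weak candidates

print(grade_match(0.95, True))   # skos:exactMatch
print(grade_match(0.70, False))  # skos:closeMatch
print(grade_match(0.30, True))   # None
```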
47. Thank you!
Experience the technology with our demonstrators
NOW: Semantic News Portal http://now.ontotext.com
RANK: News popularity ranking for companies http://rank.ontotext.com
FactForge: Hub for open data and news about People and Organizations
http://factforge.net
Editor's Notes
Our vision is to enable machines to interpret data and text by interlinking them in big knowledge graphs. The web of open data is growing exponentially!
There are thousands of datasets, from Wikipedia and Geonames to government statistical data and the Panama Papers. We link open data to analyze news. We extract data from news to produce more open data and analyze social media. We integrate all this with proprietary data and commercial databases.
Why???
To help journalists, banks, merchants, governments and citizens reveal more! Quicker, with less effort and less stress.
This is an “elevator pitch” for our overall technology approach, proposition and applications
We implement this vision synergizing two technologies: graph database engine and text mining
We invested 100s of person-years in R&D to develop this innovative platform in cooperation with the leading academic centers in Europe.
We converted advanced research into robust software which now runs mission critical services, including FT.COM and several websites of the BBC
We serve many of the most knowledge-intensive enterprises on Earth!
Economic journalism (Deutsche Welle)
Publication of rich company data (BRC)
Tender information service (CERVED)
Business intelligence (EVRY)
Company information service Atoka+ (SpazioDati)