A presentation by Ontotext's CEO, Atanas Kiryakov, given during Semantics 2018 - an annual conference that brings together researchers and professionals from all over the world to share knowledge and expertise on semantic computing.
[Conference] Cognitive Graph Analytics on Company Data and News (Ontotext)
Atanas Kiryakov, Ontotext's CEO, presented at the Data Day Texas 2018 conference, which took place in Austin, TX, USA, on January 27th.
Ontotext's talk was part of the Graph Day Sessions and its focus was 'Cognitive graph analytics on company data and news', aiming to demonstrate the power of Graph Analytics to create links between various datasets and lead to knowledge discovery.
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes (Ontotext)
This presentation will provide a brief introduction to logical reasoning and an overview of the most popular semantic schema and ontology languages: RDFS and the profiles of OWL 2.
While automatic reasoning has always inspired the imagination, numerous projects have failed to deliver on their promises. The typical pitfalls related to ontologies and symbolic reasoning fall into three categories:
- Over-engineered ontologies. The selected ontology language and modeling patterns can be too expressive. This can make the results of inference hard to understand and verify, which in turn makes the KG hard to evolve and maintain. It can also impose performance penalties far greater than the benefits.
- Inappropriate reasoning support. There are many inference algorithms and implementation approaches that work well with taxonomies and conceptual models of a few thousand concepts but cannot cope with KGs of millions of entities.
- Inappropriate data layer architecture. One such example is reasoning over a virtual KG, which is often infeasible.
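To make the reasoning trade-offs concrete, here is a minimal sketch (not the talk's actual recipe) of forward-chaining materialization of two RDFS rules in plain Python. The triples and prefixes are invented for illustration; production engines such as GraphDB implement far richer rulesets, and the fixpoint loop below shows why expressivity has a cost: every extra rule multiplies the work done at load time.

```python
# Minimal forward-chaining sketch of two RDFS entailment rules:
#   rdfs9:  (x rdf:type C1), (C1 rdfs:subClassOf C2)  =>  (x rdf:type C2)
#   rdfs11: (C1 rdfs:subClassOf C2), (C2 rdfs:subClassOf C3)  =>  (C1 rdfs:subClassOf C3)
# Illustrative only -- real reasoners are far more complete and efficient.

TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

def materialize(triples):
    """Apply the two rules until no new triple is inferred (a fixpoint)."""
    closed = set(triples)
    while True:
        new = set()
        sub = [(s, o) for s, p, o in closed if p == SUBCLASS]
        for s, p, o in closed:
            if p == TYPE:
                for c1, c2 in sub:
                    if o == c1:
                        new.add((s, TYPE, c2))      # rdfs9
            elif p == SUBCLASS:
                for c1, c2 in sub:
                    if o == c1:
                        new.add((s, SUBCLASS, c2))  # rdfs11
        if new <= closed:                           # nothing new: done
            return closed
        closed |= new

triples = {
    (":acme", TYPE, ":Company"),
    (":Company", SUBCLASS, ":Organization"),
    (":Organization", SUBCLASS, ":Agent"),
}
inferred = materialize(triples)
# :acme is now also typed as :Organization and :Agent
```

Materializing at load time, as sketched here, is one of the "proven recipes": queries stay fast because the inferred triples are already stored, at the price of load-time work and storage.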
Transforming Your Data with GraphDB: GraphDB Fundamentals, Jan 2018 (Ontotext)
These are slides from a live webinar that took place in January 2018.
GraphDB™ Fundamentals builds the basis for working with graph databases that utilize the W3C standards, and particularly GraphDB™. In this webinar, we demonstrated how to install and set up GraphDB™ 8.4 and how you can generate your first RDF dataset. We also showed how to quickly integrate complex and highly interconnected data using RDF and SPARQL, and much more.
With the help of GraphDB™, you can start smartly managing your data assets, visually represent your data model and get insights from them.
[Webinar] FactForge Debuts: Trump World Data and Instant Ranking of Industry ... (Ontotext)
This webinar continues a series demonstrating how linked open data and semantic tagging of news can be used for comprehensive media monitoring, market and business intelligence. The platform for the demonstrations is FactForge: a hub for news and data about people, organizations, and locations (POL). FactForge embodies a big knowledge graph (BKG) of more than 1 billion facts that supports various analytical queries, including tracing suspicious patterns of company control and media monitoring of people, including companies owned by them, their subsidiaries, etc.
The Bounties of Semantic Data Integration for the Enterprise (Ontotext)
If you are looking for solutions that allow you not only to manage all of your data (structured, semi-structured and unstructured) but to also make the most out of them, using a common language is critical.
Adding Semantic Technology to data integration provides the glue that holds together all your enterprise data and their relationships in a meaningful way.
Learn how you can quickly design data processing jobs and integrate massive amounts of data and see what semantic integration can do for your data and your business.
www.ontotext.com
Property graph vs. RDF Triplestore comparison in 2020 (Ontotext)
This presentation goes all the way from an introduction to what graph databases are, to a table comparing RDF vs. property graphs, plus two diagrams presenting the market circa 2020.
The Power of Semantic Technologies to Explore Linked Open Data (Ontotext)
The presentation of Atanas Kiryakov, Ontotext’s CEO, at the first edition of Graphorum (http://graphorum2017.dataversity.net/) – a new forum that taps into the growing interest in Graph Databases and Technologies. Graphorum is co-located with the Smart Data Conference, organized by the digital publishing platform Dataversity.
The presentation demonstrates the capabilities of Ontotext’s own approach to contributing to the discipline of more intelligent information gathering and analysis by:
- graphically exploring the connectivity patterns in big datasets;
- building new links between identical entities residing in different data silos;
- gaining insights into what types of queries can be run against various linked data sets;
- reliably filtering information based on relationships, e.g., between people and organizations, in the news;
- demonstrating the conversion of tabular data into RDF.
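The last point, converting tabular data into RDF, can be sketched in a few lines. This is a toy illustration, not Ontotext's actual tooling; the base URI, column names and sample rows are all made up:

```python
import csv, io

# Convert a small table of companies into N-Triples, one triple per cell.
# The base URI and property names are hypothetical.
BASE = "http://example.com/resource/"

rows = csv.DictReader(io.StringIO(
    "id,name,city\n"
    "c1,Acme Corp,London\n"
    "c2,Globex,Austin\n"
))

def to_ntriples(rows):
    lines = []
    for row in rows:
        subj = f"<{BASE}{row['id']}>"
        for col in ("name", "city"):
            # each cell becomes one triple: <row-URI> <column-URI> "value" .
            lines.append(f'{subj} <{BASE}{col}> "{row[col]}" .')
    return "\n".join(lines)

nt = to_ntriples(rows)
print(nt)
```

Once the table is expressed as triples, the rows can be linked to other datasets simply by reusing or aligning the URIs, which is exactly what makes the linking demonstrated in the talk possible.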
Learn more at http://ontotext.com/.
Boost your data analytics with open data and public news content (Ontotext)
Get guidance through the gigantic sea of freely available Open Data and learn how it can empower your analysis of any kind of sources.
This webinar is a live demo of news and data analytics, based on rich links within big knowledge graphs. It will show you how to:
Build ranking reports (e.g., for people and organisations)
View topics linked implicitly (e.g., daughter companies, key personnel, products …)
Draw trend lines
Extend your analytics with additional data sources
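The first of these, a ranking report, boils down to counting tagged entities across articles. A minimal sketch, with invented article data standing in for semantically tagged news:

```python
from collections import Counter

# Tiny sketch of a ranking report: count how often each organization is
# mentioned across a batch of (hypothetical) semantically tagged articles.
articles = [
    {"title": "story A", "orgs": ["Acme", "Globex"]},
    {"title": "story B", "orgs": ["Acme"]},
    {"title": "story C", "orgs": ["Globex", "Initech", "Acme"]},
]

mentions = Counter(org for a in articles for org in a["orgs"])
ranking = mentions.most_common()   # most-mentioned organizations first
```

Grouping the same counts by publication date gives the trend lines mentioned above; joining the organization names against a knowledge graph brings in the implicitly linked topics (subsidiaries, key personnel, products).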
Diving in Panama Papers and Open Data to Discover Emerging News (Ontotext)
Get guidance through the gigantic sea of freely released data from the Panama Papers, as well as the Linked Open Data cloud. You will learn how it can empower your understanding of today’s news or any other information source.
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud (Ontotext)
This webinar will break the roadblocks that prevent many from reaping the benefits of heavyweight Semantic Technology in small scale projects. We will show you how to build Semantic Search & Analytics proof of concepts by using managed services in the Cloud.
How to Reveal Hidden Relationships in Data and Risk Analytics (Ontotext)
Imagine a risk analysis manager or compliance officer who can easily discover relationships like this: Big Bucks Café out of Seattle controls My Local Café in NYC through an offshore company. Such a discovery can be a game changer if My Local Café pretends to be an independent small enterprise while Big Bucks has recently experienced financial difficulties.
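At its core, surfacing such a hidden relationship is a transitive traversal of "controls" edges in a graph. A minimal sketch, with the companies and ownership edges invented to mirror the example above:

```python
# Follow "controls" edges transitively to surface an indirect ownership
# path. Names and edges are made up for the illustration; a triplestore
# would express the same traversal as a SPARQL property path.
controls = {
    "Big Bucks Cafe": ["Offshore Holdings Ltd"],
    "Offshore Holdings Ltd": ["My Local Cafe"],
}

def control_path(start, target):
    """Depth-first search for a chain of control from start to target."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == target:
            return path
        for nxt in controls.get(node, []):
            if nxt not in path:          # avoid ownership cycles
                stack.append((nxt, path + [nxt]))
    return None                          # no chain of control found

path = control_path("Big Bucks Cafe", "My Local Cafe")
# the returned path exposes the offshore intermediary
```

The value for risk analytics is precisely the intermediate node: the offshore company that makes the relationship invisible to a flat, silo-by-silo view of the data.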
Gain Super Powers in Data Science: Relationship Discovery Across Public Data (Ontotext)
What data scientists know better than anybody else is that data relationships are what matter most. You can’t understand your data if you look at it as pieces in data silos.
In this webinar we’ll showcase how to discover relationships across public data.
Linking Open, Big Data Using Semantic Web Technologies - An Introduction (Ronald Ashri)
The Physics Department of the University of Cagliari and the Linkalab Group invited me to talk about the Semantic Web and Linked Data - this is simply an introduction to the technologies involved.
Using the Semantic Web Stack to Make Big Data Smarter (Matheus Mota)
This presentation will discuss how just a few parts of the Semantic Web Cake can already boost your analytics by making your (big) data smarter and even more connected.
Data integration, data interoperation and data quality are major challenges that continue to haunt enterprises. Every enterprise either by choice or by chance has created massive silos of data in different formats, with duplications and quality issues.
Knowledge graphs have proven to be a viable solution to address the integration and interoperation problem. Semantic technologies in particular provide an intelligent way of creating an abstract layer for the enterprise data model and mapping of siloed data to that model, allowing a smooth integration and a common view of the data.
Technologies like OWL (Web Ontology Language) and RDF (Resource Description Framework) are the backbone of semantics for knowledge graph implementation. Enterprises use OWL to build an ontology model to create a common definition for concepts and how they are connected to each other in their specific domain.
They then use RDF to create a triple format representation of their data by mapping it to the Ontology. This approach makes their data smart and machine understandable.
But how can enterprises control and validate the quality of this mapped data? Furthermore, how can they use this one abstract representation of data to meet all their different business requirements? Different departments, different LoBs and different business branches all have their own data needs, creating a new challenge to be tackled by the enterprise.
In this talk we will look at how the power of SHACL (Shapes Constraint Language), a W3C standard for defining constraint sets over data, complements the two core semantic technologies OWL and RDF: what are the similarities, the overlaps and the differences.
We will talk about how SHACL gives enterprises the power to reuse, customize and validate their data for various scenarios, use cases and business requirements, making the application of semantics even more practical.
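What a SHACL shape expresses can be imitated in a few lines of plain Python. The check below is a hand-rolled stand-in for a shape saying "every resource of type :Person must have exactly one :name" (roughly sh:minCount 1, sh:maxCount 1); the triples are invented, and real SHACL engines are far richer:

```python
# Minimal stand-in for a SHACL constraint over instance data:
# "every resource of type :Person must have exactly one :name".
triples = [
    (":alice", "rdf:type", ":Person"),
    (":alice", ":name", "Alice"),
    (":bob", "rdf:type", ":Person"),      # bob has no :name -> violation
]

def validate_person_name(triples):
    """Return the :Person resources that violate the name constraint."""
    persons = {s for s, p, o in triples if p == "rdf:type" and o == ":Person"}
    violations = []
    for person in sorted(persons):
        names = [o for s, p, o in triples if s == person and p == ":name"]
        if len(names) != 1:               # cardinality constraint failed
            violations.append(person)
    return violations

violations = validate_person_name(triples)   # [":bob"]
```

The point of SHACL is that such constraints are declared as data (shapes) rather than hard-coded, so different departments can validate the same mapped RDF against their own shape sets.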
Knowledge graphs - they are what all businesses are now on the lookout for. But what exactly is a knowledge graph and, more importantly, how do you get one? Do you get it as an out-of-the-box solution or do you have to build it (or have someone else build it for you)? With the help of our knowledge graph technology experts, we have created a step-by-step list of how to build a knowledge graph. It will properly expose and enforce the semantics of the semantic data model via inference, consistency checking and validation and thus offer organizations many more opportunities to transform and interlink data into coherent knowledge.
"Semantic Integration Is What You Do Before The Deep Learning". dev.bg Machine Learning seminar, 13 May 2019.
It's well known that 80% of the effort of a data scientist is spent on data preparation. Semantic integration is arguably the best way to spend this effort more efficiently and to reuse it between tasks, projects and organizations. Knowledge Graphs (KG) and Linked Open Data (LOD) have become very popular recently. They are used by Google, Amazon, Bing, Samsung, Springer Nature, Microsoft Academic, AirBnb… and any large enterprise that would like to have a holistic (360 degree) view of its business. The Semantic Web (web 3.0) is a way to build a Giant Global Graph, just like the normal web is a Global Web of Documents. IEEE already talks about Big Data Semantics. We review the topic of KGs and their applicability to Machine Learning.
Explicit Semantics in Graph DBs Driving Digital Transformation With Neo4j (Connected Data World)
Dr. Jesús Barrasa's slides from his talk at Connected Data London. Jesús, a senior field engineer at Neo4j, presented how semantic web principles can be used in a graph database.
ROI in Linking Content to CRM by Applying the Linked Data Stack (Martin Voigt)
Today, decision makers in enterprises have to rely more and more on a variety of data sets that are available internally but also externally, in heterogeneous formats. Therefore, intelligent processes are required to build an integrated knowledge base. Unfortunately, the adoption of the Linked Data lifecycle within enterprises, which targets the extraction, interlinking, publishing and analytics of distributed data, lags behind the public domain due to missing frameworks that are efficient to deploy and easy to use. In this paper, we present our adoption of the lifecycle through our generic, enterprise-ready Linked Data workbench. To judge its benefits, we describe its application within a real-world Customer Relationship Management scenario. It shows (1) that sales employees could significantly reduce their workload and (2) that the integration of sophisticated Linked Data tools comes with an obvious positive Return on Investment.
Supporting GDPR Compliance through effectively governing Data Lineage and Dat... (Connected Data World)
The General Data Protection Regulation (GDPR) is a set of EU rules governing how organisations handle personal data, replacing the previous Data Protection Act (DPA), and has been in force since May 2018. With GDPR in place, organizations need to process personal data lawfully, maintain it accurately for no longer than necessary, and keep it secure.
They should be able to report on the purposes of processing and the categories of personal data they control, and be able to demonstrate compliance with GDPR policies. The challenge organizations face with GDPR – recording every point where processing of personal data takes place and showcasing accountability for this activity – has made data governance even more critical on the data lineage and data provenance aspects.
Governing data lineage enables an organization to understand its data flow activities and to identify and document the legal justification for each type of activity. In addition, GDPR requires evidence of records for the processing of personal data, which implies the need to effectively record and govern data provenance.
In this talk we showcase how effectively governing data lineage and data provenance gives us the ability to verify that the processing of private data within an organization is compliant with GDPR regulatory requirements.
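A record of processing activities, the core artifact the talk describes, can be sketched as structured lineage entries where every activity carries its legal justification. The field names and sample entries below are illustrative, not a compliance template:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Sketch of recording data-lineage entries for personal-data processing,
# so each activity carries its legal basis. Field names are hypothetical.
@dataclass
class ProcessingRecord:
    dataset: str
    activity: str
    legal_basis: str      # e.g. "consent", "contract", "legal obligation"
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

log = []
log.append(ProcessingRecord("crm_contacts", "export to mail tool", "consent"))
log.append(ProcessingRecord("payroll", "monthly salary run", "contract"))

def report(dataset):
    """Which activities touched a dataset, and on what legal basis?"""
    return [(r.activity, r.legal_basis) for r in log if r.dataset == dataset]
```

Kept consistently at every processing point, such records are what let an organization answer the GDPR questions above: which data was processed, why, and when.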
Open Data and News Analytics Demo from the 4th Sofia Open Data & Linked Data meetup
http://www.meetup.com/Sofia-Open-Data-Linked-Data-Meetup/events/228747999/
Mar'2016, Sofia | BG
Benefiting from Semantic AI along the data life cycle (Martin Kaltenböck)
Slides from the one-hour session of Martin Kaltenböck (CFO and Managing Partner of Semantic Web Company / PoolParty Software Ltd) on 19 March 2019 in Boston, US, at Enterprise Data World 2019, titled "Benefiting from Semantic AI along the data life cycle".
Powerful Information Discovery with Big Knowledge Graphs – The Offshore Leaks ... (Connected Data World)
Borislav Popov's slides from his lightning talk at Connected Data London. Borislav, Director of Business Development at Ontotext, presented Ontotext's approach to tackling the Panama Papers leak, using a technology that is a mix between semantic web and graph databases.
Forecast to contribute £216 billion to the UK economy via business creation, efficiency and innovation, and generate 360,000 new jobs by 2020, big data is a key area for recruiters.
In this QuickView:
- Big data in numbers
- Top 10 industries hiring big data professionals
- Top 10 qualifications sought by hirers
- Top 10 database and BI skills sought by hirers
- Getting started in big data: popular big data techniques and vendors
Learning Objective: Discover the upcoming trends of information technology
This seminar looks at technology trends that should be on your radar. As a technology professional, staying on top of trends is crucial. Join us as our expert panelists discuss the upcoming trends and game-changing technologies of the future.
At the end of this seminar, participants will:
a. Learn how to identify the areas where technology changes are likely.
b. Identify resources to use to keep abreast of technology changes in their industry.
c. Learn how to analyze trends for opportunities to grow their careers.
Presentation given by Dr. Diego Kuonen, CStat PStat CSci, on November 20, 2013, at the "IBM Developer Days 2013" in Zurich, Switzerland.
ABSTRACT
There is no question that big data has hit the business, government and scientific sectors. The demand for skills in data science is unprecedented in sectors where value, competitiveness and efficiency are driven by data. However, there is plenty of misleading hype around the terms big data and data science. This presentation gives a professional statistician's view on these terms and illustrates the connection between data science and statistics.
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
Data enrichment is vital for leveraging heterogeneous data sources in various business analyses, AI applications, and data-driven services. Knowledge Graphs (KGs) support the enrichment of heterogeneous data sources by making entities first-class citizens: links to entities help interconnect heterogeneous data pieces or even ease access to external data sources to eventually augment the original data. Data annotation algorithms to find and link entities in reference KGs, as well as to identify out-of-KG entities have been proposed and applied to different types of data, such as tables, and texts. However, despite recent progress in annotation algorithms, the output of these algorithms does not always meet the quality requirements that make the enriched data valuable in downstream applications. As a result, semantic data enrichment remains an effort-consuming and error-prone task. In this seminar, we discuss the relationships between annotation algorithms, data enrichment, and KG construction, highlighting challenges and open problems. In addition, we advocate for a native human-in-the-loop perspective that enables users to control the outcome of the enrichment and, eventually, improve the quality of the enriched data. We focus in particular on the annotation and enrichment of tabular data and briefly discuss the application of a similar paradigm to the enrichment of textual data in the legal domain, e.g., on court decisions and criminal investigation documents.
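The table-annotation step discussed above can be illustrated with a toy label-lookup annotator. The KG labels and identifiers below are invented; a real annotator would use context, entity types and disambiguation rather than exact label matching:

```python
# Toy sketch of table-cell annotation against a reference KG: match each
# cell to a KG entity by normalized label, and flag unmatched cells as
# candidate out-of-KG entities. Labels and identifiers are made up.
kg_labels = {
    "london": "ex:Q_London",
    "paris": "ex:Q_Paris",
}

def annotate_column(cells):
    """Map each cell to a KG entity id, or None if out-of-KG."""
    annotations = {}
    for cell in cells:
        key = cell.strip().lower()               # naive normalization
        annotations[cell] = kg_labels.get(key)   # None => out-of-KG candidate
    return annotations

ann = annotate_column(["London", "Paris ", "Springfield"])
# "Springfield" has no match, so it is flagged as a candidate new entity
```

The unmatched cells are exactly where the human-in-the-loop perspective advocated above comes in: a user confirms whether a flagged cell is a genuinely new entity or a mis-normalized mention of an existing one.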
MAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAI (Big Data Week)
Charles Cai has more than two decades of experience and a track record of global transformational programme deliveries – from vision and evangelism to end-to-end execution – in global investment banks and energy trading companies, where he excels at designing and building innovative, large-scale Big Data systems in high-volume, low-latency trading, global Energy Trading & Risk Management, and advanced temporal and geospatial predictive analytics, as Chief Front Office Technical Architect and Head of Data Science. He is also a frequent speaker at Google Campus, Big Data Innovation Summit, Cloud World Forum, Data Science London, QCon London and MoD CIO Symposium, among others, promoting knowledge and best-practice sharing with audiences ranging from developers and data scientists to CXO-level senior executives from both IT and business backgrounds. He has in-depth knowledge of and experience with the Scala, Python, C# / F#, C++, Node.js, Java, R and Haskell programming languages in Mobile, Desktop, Hadoop/Spark, Cloud, IoT/MCU and BlockChain contexts, and holds TOGAF9, EMC-DS and AWS CNE4 certifications, among others.
JIMS IT Flash , a monthly newsletter-An Initiative by the students of IT Department, shares the knowledge to its readers about the latest IT Innovations, Technologies and News.Your suggestions, thoughts and comments about latest in IT are always welcome at itflash@jimsindia.org.
Visit Website : http://jimsindia.org/
Similar to Analytics on Big Knowledge Graphs Deliver Entity Awareness and Help Data Linking (20)
Analytics on Big Knowledge Graphs Deliver Entity Awareness and Help Data Linking
1. making sense of text and data
Semantics 2018, Vienna
Analytics on Big Knowledge Graphs Deliver Entity Awareness and Help Data Linking
2. Presentation Outline
o Ontotext Introduction
o Technology and Portfolio
o Cognitive Analytics Meet Big Knowledge Graphs
o Big Company Data: Knowing, Matching and Cleaning
o Product Roadmap
3. Vision
o Global business information will be key for competitiveness tomorrow
o Adequate business decisions require global information!
Analytics cannot deliver deep market/business insights based only on proprietary data
Broader context and signals are needed
o Merging data requires concept and entity awareness
Entity matching across databases requires rich knowledge about the entity
Entity recognition in text requires even more context
o Ontotext makes this possible
4. Mission
We help enterprises to identify meaning across:
o Diverse databases & unstructured data
We combine:
o Proprietary & Global data
o Graph databases & Text mining
o Symbolic reasoning & Machine learning
5. History and Essential Facts
o Started in 2000 as a Semantic Web pioneer
Part of Sirma Group: ~400 people, listed on the Sofia Stock Exchange
Spun off and took VC investment in 2008
o R&D center in Sofia; 80% of sales in the USA and UK
Over 400 person-years invested in R&D
Multiple innovation awards: Washington Post, BBC, FT, ...
o Member of multiple industry bodies
W3C, EDMC, ODI, LDBC, STI, DBpedia Foundation
6. Best known for GraphDB
“Despite all of this attention the market is dominated by Neo4J and Ontotext (GraphDB), which are graph and RDF database providers respectively. These are the longest established vendors in this space (both founded in 2000) so they have a longevity and experience that other suppliers cannot yet match. How long this will remain the case remains to be seen.”
Bloor Group report, Graph Databases, April 2015
http://www.bloorresearch.com/technology/graph-databases/
7. Fancy Stuff and Heavy Lifting
o We do advanced analytics:
We predicted BREXIT
14 Jun 2016 whitepaper: #BRExit Twitter Analysis: More Twitter Users Want to Split with EU and Support #Brexit
https://ontotext.com/white-paper-brexit-twitter-analysis/
o But most of the time we do the heavy lifting of data integration and information extraction
Enabling data scientists to do fancy things
8. Discovery in Knowledge Graphs
o Find suspicious patterns like:
A company in the USA
Controls another company in the USA
Through a company in an off-shore zone
o Show news relevant to these companies
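The control-chain pattern above can be sketched as a simple graph traversal. This is an illustrative toy, not Ontotext's implementation: the company names, the `controls` edges, and the jurisdiction labels are all invented for the example.

```python
# Hypothetical sketch: finding the "US company controls a US company
# through an off-shore intermediary" pattern in a tiny in-memory graph.
controls = {            # controlling company -> controlled companies
    "AcmeUS": ["ShellCo"],
    "ShellCo": ["WidgetsUS"],
    "DirectUS": ["OtherUS"],
}
jurisdiction = {
    "AcmeUS": "US", "WidgetsUS": "US", "OtherUS": "US",
    "DirectUS": "US", "ShellCo": "offshore",
}

def suspicious_chains(controls, jurisdiction):
    """Yield (a, b, c) where a (US) controls c (US) through b (offshore)."""
    for a, mids in controls.items():
        if jurisdiction.get(a) != "US":
            continue
        for b in mids:
            if jurisdiction.get(b) != "offshore":
                continue
            for c in controls.get(b, []):
                if jurisdiction.get(c) == "US":
                    yield (a, b, c)

print(list(suspicious_chains(controls, jurisdiction)))
# prints [('AcmeUS', 'ShellCo', 'WidgetsUS')]
```

In a real deployment this kind of pattern would be a graph query over the knowledge graph rather than a Python loop; the sketch only shows the shape of the search.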
10. Technology Excellence Delivered
o Unique technology mix: GraphDB™ engine + text mining
o Robust technology: powers BBC.CO.UK/SPORT and FT.COM
o We serve the most knowledge-intensive enterprises:
11. Presentation Outline
o Ontotext Introduction
o Technology and Portfolio
o Cognitive Analytics Meet Big Knowledge Graphs
o Big Company Data: Knowing, Matching and Cleaning
o Product Roadmap
12. Linking Text to Big Knowledge Graphs
1. Integrate relevant structured data
Build a Big Knowledge Graph from proprietary databases and taxonomies combined with Linked Open Data
2. Infer new facts and unveil relationships
Performing reasoning across data from different sources
3. Link text mentions to the Knowledge Graph
Using text mining to automatically discover references to concepts and entities
4. Hybrid queries and search in GraphDB
13. Text Analytics: Semantic Disambiguation
Pipeline components (backed by GraphDB): a vocabulary gazetteer fed by a dynamic vocabulary; an NLP pipeline (language detection, POS tagging, ...); disambiguation; relevance ranking. The exposed services are "get suggestions" and "annotate content".
Sample text: "Apple CEO Tim Cook was at a conference with the CEO of Samsung. Tim explained how smart phones are changing the consumer electronics market."
Entity detection from the vocabulary: Apple : Organisation; Tim Cook : Person, CEO; Tim Cook : Person, Footballer; Samsung : Organisation
After disambiguation and relevance ranking: 87% - Tim Cook : Person, CEO; 68% - Apple : Organisation; 56% - Samsung : Organisation
14. Sample Knowledge Graph with Metadata
The sample graph (shown as a diagram on the slide): a Document carries three annotations, all at textpos:123,142. One annotation (relevance 87%) targets Tim Cook (Person); one (relevance 68%) targets Apple (Organisation); both are linked to the document via "about". A third annotation (relevance 56%) targets Samsung (Organisation) and is linked via "mentions". In the knowledge graph itself, Tim Cook is CEO of Apple; Samsung is a competitor of Apple; and Apple has location USA, exchange NASDAQ, and sector Computer Hardware.
15. Linking News to Big Knowledge Graphs
o Link text to knowledge graphs
o Navigate from news to concepts and from there to other news
Try it at http://now.ontotext.com
16. Semantic Media Monitoring
For each entity:
o popularity trends
o relevant news
o related entities
o knowledge graph information
Try it at http://now.ontotext.com
18. Big KG Demonstration
o DBpedia (the English version): 496M
o Geonames (all geographic features on Earth): 150M
o owl:sameAs links between DBpedia and Geonames: 471K
o GLEI (global company register data): 3M
o Panama Papers DB (#LinkedLeaks): 20M
o Other datasets and ontologies: WordNet, WorldFacts, FIBO
o News metadata (2,000 articles/day enriched by NOW): 673M
o Total size (1.8B explicit + 328M inferred statements): 2,168M
21. Presentation Outline
o Ontotext Introduction
o Technology and Portfolio
o Cognitive Analytics Meet Big Knowledge Graphs
o Big Company Data: Knowing, Matching and Cleaning
o Product Roadmap
22. Context and Awareness
o Context allows concepts to be identified, the way people do it
o A big knowledge graph can provide context for the entities in it:
Differentiating features and similar nodes
How important and how popular an entity is
Related entities and concepts
Entities it is typically mentioned together with (co-occurrence)
o This is awareness!
The kind of knowledge that people mean when saying "I am aware of X" or "She is cognizant of Y"
23. The Critical Mass
Malcolm Gladwell claims that one needs to devote 10,000 hours to become an expert in something, e.g. violin or hockey (Outliers)
24. The Critical Mass
o A cognitive system needs:
To know 1B facts
About 100M concepts and entities
To read 1M news articles
o In order to reach concept and entity awareness in a specific domain
The level of awareness that people mean when saying "My background is X"
25. Let’s play an Awareness game!
o Important airports near London?
o The most popular banks in UK?
o Companies similar to Google?
o People mentioned together with IBM in news?
26. We are getting closer!
o Our Business Knowledge Model can already answer many of these questions better than you can
o Most of this intelligence is available in the Ontotext Platform
o Knowledge model = KG + text mining + analytics
o We already offer two such knowledge models:
Business and general news: for processing general business master data (people, organizations, locations) and their mentions in the news
Life sciences and healthcare
27. Customized Cognitive Marketing Intelligence
o Developing a cognitive system with global knowledge from scratch is infeasible
o We can provide and "onboard" one for you:
Suggest open and commercial data sources
Integrate them with your proprietary data sources
Tune text analytics
Develop specific analytics, reports, dashboards, etc.
o We can also maintain it for you:
Various support and maintenance options, including ...
Managed data service: updates, monitoring, data quality
28. Presentation Outline
o Ontotext Introduction
o Technology and Portfolio
o Cognitive Analytics Meet Big Knowledge Graphs
o Big Company Data: Knowing, Matching and Cleaning
o Product Roadmap
29. Person, Organization, Location (POL) Data
o POL data is the most common type of master/reference data
Considering business applications and news
o Open POL data is available in vast quantities
Geonames covers locations exhaustively; DBpedia covers popular POL entities well; Wikidata, ...
Open company data grows: OpenCorporates, GLEI, open national registers, various "data leaks"
o Within 3 years, exhaustive global POL data will be a commodity!
And it will be widely used for BI and decision making
o Ontotext delivers global POL data solutions today
We make them more affordable with more cognitive analytics
30. Company Data Species (1/2) (Oct 2016)
Category | Representatives | Size (Orgs.)
Exhaustive Global Databases | Dun & Bradstreet, BvD, Factset | > 200M
Rich Company Databases | Capital IQ (S&P), Thomson Reuters (various) | 5-10M
Investment Databases | CrunchBase, PitchBook, CBI, DJ Venture Source | 200-600K
Very Big Open Databases | OpenCorporates | 130M
Global Official Open Databases | GLEI (Global Legal Identifier), EU BRIS | 1-30M
Open Encyclopedic | DBpedia, Wikidata | 0.3-1.2M
Open Leaks and Investigations | Panama Papers (Offshore Leaks), Trump World Data | 3-300K
31. Company Data Species (2/2) (Oct 2016)
Category | Locations | Industry Classification | High Tech. Fields | Invest. Info | Org-Org Relations (e.g. Tree) | Org-Person Relations | Clean, Correct, Predictable
Exhaustive Global Databases | ++ | +/- | - | - | ++ | +/- | 6
Rich Company Databases | ++ | + | +/- | +/- | ++ | +/- | 8
Investment Databases | +/- | +/- | + | + | ++/- | +/- | 4-6
Very Big Open Databases | + | +/- | - | - | +/- | - | 8
Global Official Open Databases | + | - | - | - | +/- | - | 8
Open Encyclopedic | +/- | +/- | + | - | +/- | + | 3-5
Open Leaks and Investigations | +/- | - | - | - | +/- | + | 4-6
32. Matching and Overlap
o Organizations matched across: CrunchBase (CB), CB Insights (CBI), Capital IQ (CIQ), DJ Venture Source, ...
o The Venn diagram presents the overlap between sources
The size of each circle indicates the number of entities per source
The level of overlap indicates the number of entities matched between the two sources
34. Entity Matching Across Datasets
o Match IDs of one and the same real entity across different databases
o Data challenges
Different schemata
Name variations
Different classifications and codes
Lack of unique identifiers (even ticker symbols are not unique)
o Technology challenges
Pre-selection is needed; brute-force matching is not feasible for 1M against 5M companies
It is not trivial to come up with a good pre-selection mechanism
35. Company Matching Sample Project
o We matched 5+ big datasets within a couple of months
o Fully automated procedure, which takes a few hours to execute
90% SPARQL and GraphDB's FTS connectors
o Location normalization through matching to Geonames
Also industry classification alignment across the sources
o About 85% F-score with simple structural matching rules
o To get higher accuracy, you need:
A massive amount of manual work and fine-tuning of weights ... or
Cognitive analytics (importance, similarity, highly accurate named entity recognition, etc.)
36. Presentation Outline
o Ontotext Introduction
o Technology and Portfolio
o Cognitive Analytics Meet Big Knowledge Graphs
o Big Company Data: Knowing, Matching and Cleaning
o Product Roadmap
37. Product Roadmap (short term)
o Ontotext Platform
Multi-tenant version of our Manual Annotation Tool
Streamlined ETL and entity matching based on Spark
Configurable semantic search front end
o GraphDB
Reconciliation
Faster transactions on big knowledge graphs: 2x speed-up of small transactions
Faster SPARQL federation between local repositories
Similarity based on semantic vectors
39. GraphDB Semantic Similarity Plugin
o Statistical similarity on knowledge graphs using semantic vectors
o Creates statistical semantic models from your RDF data and searches for similar terms and documents
o Sample:
Create an index from the news in FactForge
Find similar news, find relevant terms for a news item, etc.
41. Take home
o Business needs global company data for market intelligence
o Linking proprietary and global data is rocket science
Mainstream tech cannot deal with such diversity
Semantic data integration and cognitive analytics are needed
o Ontotext is ready to help
Consulting: help you build the concept for your next-generation system
Development: build one for you or support you in developing your platform
Support and operations: from Level 3 support to managed services
42. Thank you!
Experience the technology with our demonstrators
NOW: Semantic News Portal http://now.ontotext.com
RANK: News popularity ranking for companies http://rank.ontotext.com
FactForge: Hub for open data and news about People and Organizations
http://factforge.net