This document discusses the evolution of natural language processing (NLP) and knowledge engineering (KE) and their convergence in artificial intelligence. It outlines how deep learning is increasingly being used for NLP tasks like representation learning and distributional semantics. It also discusses semantic relations and challenges in extracting relations from text using NLP and KE techniques.
The document provides an overview of knowledge graphs and the metaphactory knowledge graph platform. It defines knowledge graphs as semantic descriptions of entities and relationships using formal knowledge representation languages like RDF, RDFS and OWL. It discusses how knowledge graphs can power intelligent applications and gives examples like Google Knowledge Graph, Wikidata, and knowledge graphs in cultural heritage and life sciences. It also provides an introduction to key standards like SKOS, SPARQL, and Linked Data principles. Finally, it describes the main features and architecture of the metaphactory platform for creating and utilizing enterprise knowledge graphs.
A Linked Data Dataset for Madrid Transport Authority's Datasets – Oscar Corcho
This document discusses the creation of a linked data dataset for Madrid's public transport authority (CRTM) to make their transport data more accessible and reusable. It outlines the motivation and benefits of open transport data, reviews existing methods of publishing open data, and proposes publishing CRTM's data as linked open data using semantic web standards to enable new applications and value-added services by combining the transport data with other public datasets. The methodology describes transforming CRTM's static and real-time transport datasets into RDF and providing SPARQL and SPARQL-Stream endpoints to access the data. Examples demonstrate sample URIs, queries to retrieve stop points, and visualizations of the linked data.
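The kind of query mentioned above (retrieving stop points) can be sketched over a toy in-memory triple set. All URIs and class names below are hypothetical placeholders, not the actual CRTM vocabulary or endpoint:

```python
# Minimal sketch of a "retrieve stop points" query over RDF-style triples.
# All URIs below are hypothetical illustrations, not the actual CRTM vocabulary.

RDF_TYPE = "rdf:type"
STOP_POINT = "crtm:StopPoint"  # hypothetical class name

triples = [
    ("crtm:stop/001", RDF_TYPE, STOP_POINT),
    ("crtm:stop/001", "rdfs:label", "Sol"),
    ("crtm:stop/002", RDF_TYPE, STOP_POINT),
    ("crtm:stop/002", "rdfs:label", "Atocha"),
    ("crtm:line/1", RDF_TYPE, "crtm:Line"),
]

def query(pattern, data):
    """Match an (s, p, o) pattern; None plays the role of a SPARQL variable."""
    s, p, o = pattern
    return [t for t in data
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Analogous to: SELECT ?stop WHERE { ?stop rdf:type crtm:StopPoint }
stops = [s for s, _, _ in query((None, RDF_TYPE, STOP_POINT), triples)]
print(stops)  # ['crtm:stop/001', 'crtm:stop/002']
```

In the real dataset the same pattern would be expressed in SPARQL and sent to the published endpoint rather than matched in memory.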
Ephedra: efficiently combining RDF data and services using SPARQL federation – Peter Haase
The document describes Ephedra, a SPARQL federation engine that efficiently combines distributed RDF data and services using SPARQL queries. Ephedra extends the RDF4J API to treat compute services as virtual RDF repositories. It performs optimizations like reordering clauses, pushing limits/orders down, and parallel competing joins. An evaluation on cultural heritage and life science queries showed runtime improvements over no optimization. Future work includes backend-aware optimizations and collecting service statistics for improved planning. Ephedra provides an architecture for integrating diverse data sources and services through SPARQL federation.
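One of the optimizations named above, clause reordering, can be illustrated with a toy greedy heuristic. The patterns and cardinality estimates are invented for illustration and do not reflect Ephedra's actual cost model:

```python
# Simplified illustration of clause reordering: run the most selective
# (lowest-cardinality) clauses first to shrink intermediate join results.
# The cardinality numbers are made-up estimates, not Ephedra's cost model.

clauses = [
    {"pattern": "?painting ex:creator ?artist",   "est_cardinality": 50_000},
    {"pattern": "?artist ex:name 'Rembrandt'",    "est_cardinality": 1},
    {"pattern": "?painting ex:locatedIn ?museum", "est_cardinality": 20_000},
]

def reorder(cs):
    # Greedy heuristic: most selective clause first.
    return sorted(cs, key=lambda c: c["est_cardinality"])

plan = reorder(clauses)
print([c["pattern"] for c in plan])
# The single-result 'Rembrandt' lookup now anchors the join.
```

A production federation engine would combine such estimates with knowledge of which endpoint or service answers each clause.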
The document provides an overview of knowledge graphs and introduces metaphactory, a knowledge graph platform. It discusses what knowledge graphs are, examples like Wikidata, and standards like RDF. It also outlines an agenda for a hands-on session on loading sample data into metaphactory and exploring a knowledge graph.
Linked Open Data for cities at SemTechBiz 2013 (San Francisco) – AI4BD GmbH
Showing how to use open source tools to create linked open data. Provides a first view of an easy-to-use Linked Data Orchestration process that supports the triplification workflow, including publishing datasets as a SPARQL endpoint.
Early Chinese Periodicals Online (ECPO): From Digitization Towards Open Data… – Matthias Arnold
Ms. Jingjing Chen
Email: jingjing.chen@hcts.uni-hd.de
Project website: http://ecpo.uni-hd.de
Arnold, Heidelberg | Early Chinese Periodicals Online (ECPO): From Digitization to Open Data | JADH 2018
"Semantic Integration Is What You Do Before The Deep Learning". dev.bg Machine Learning seminar, 13 May 2019.
It's well known that 80% of a data scientist's effort is spent on data preparation. Semantic integration is arguably the best way to spend this effort more efficiently and to reuse it across tasks, projects and organizations. Knowledge Graphs (KG) and Linked Open Data (LOD) have become very popular recently. They are used by Google, Amazon, Bing, Samsung, Springer Nature, Microsoft Academic, Airbnb… and any large enterprise that would like a holistic (360-degree) view of its business. The Semantic Web (web 3.0) is a way to build a Giant Global Graph, just like the normal web is a Global Web of Documents. IEEE already talks about Big Data Semantics. We review the topic of KGs and their applicability to Machine Learning.
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud – Ontotext
This webinar will break down the roadblocks that prevent many from reaping the benefits of heavyweight Semantic Technology in small-scale projects. We will show you how to build Semantic Search & Analytics proofs of concept using managed services in the Cloud.
The Power of Semantic Technologies to Explore Linked Open Data – Ontotext
The presentation by Atanas Kiryakov, Ontotext's CEO, at the first edition of Graphorum (http://graphorum2017.dataversity.net/) – a new forum that taps into the growing interest in Graph Databases and Technologies. Graphorum is co-located with the Smart Data Conference, organized by the digital publishing platform Dataversity.
The presentation demonstrates the capabilities of Ontotext’s own approach to contributing to the discipline of more intelligent information gathering and analysis by:
- graphically exploring the connectivity patterns in big datasets;
- building new links between identical entities residing in different data silos;
- getting insights into what types of queries can be run against various linked data sets;
- reliably filtering information based on relationships, e.g., between people and organizations, in the news;
- demonstrating the conversion of tabular data into RDF.
Learn more at http://ontotext.com/.
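The last demonstration in the list above, converting tabular data into RDF, can be sketched in a few lines. The base URI and predicate names here are hypothetical, and the output is plain N-Triples:

```python
# Minimal sketch of converting tabular (CSV) data into RDF, as N-Triples.
# The base URI and predicate names are hypothetical.
import csv
import io

table = io.StringIO("id,name,city\n1,ACME,Sofia\n2,Globex,London\n")
BASE = "http://example.org/company/"

def row_to_ntriples(row):
    # One subject per row; one triple per non-key column.
    subject = f"<{BASE}{row['id']}>"
    return [
        f'{subject} <{BASE}name> "{row["name"]}" .',
        f'{subject} <{BASE}city> "{row["city"]}" .',
    ]

triples = [t for row in csv.DictReader(table) for t in row_to_ntriples(row)]
print("\n".join(triples))
```

Real conversion pipelines also handle datatypes, URI minting policies, and escaping, which this sketch omits.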
Linked Data Experiences at Springer Nature – Michele Pasin
An overview of how we're using semantic technologies at Springer Nature, and an introduction to our latest product: www.scigraph.com
(Keynote given at http://2016.semantics.cc/, Leipzig, Sept 2016)
This presentation shows approaches for knowledge graph construction from Wikipedia and other Wikis that go beyond the "one entity per page" paradigm. We see CaLiGraph, which extracts entities from categories and listings, as well as DBkWik, which extracts and integrates information from thousands of Wikis.
Keynote at SEMANTICS 2017 (Amsterdam, September 2017) about convergences between NLP and KE in the era of the semantic web, with a focus on semantic relation extraction from text.
This document discusses the evolution of natural language processing (NLP) and knowledge engineering (KE) and their convergence, especially with the rise of deep learning and the semantic web. It outlines how NLP and KE have moved from early ambitions of full language understanding and problem solving to more practical, layered approaches focused on specific tasks. The semantic web provides standards and architectures that benefit both NLP and KE by enabling semantic annotation, linking of data, and use of knowledge sources. Deep learning allows NLP to learn representations from large corpora and benefit from semantic resources. Relation extraction and ontology learning from text are examples of the convergence. Challenges remain around contextual language, knowledge assertion, and industrial applications.
This lecture provides students with an introduction to natural language processing, with a specific focus on the basics of two applications: vector semantics and text classification.
(Lecture at the QUARTZ PhD Winter School, http://www.quartz-itn.eu/training/winter-school/, in Padua, Italy on February 12, 2018)
The best-known natural language processing tool is GPT-3, from OpenAI, which uses AI and statistics to predict the next word in a sentence based on the preceding words. NLP practitioners call tools like this “language models,” and they can be used for simple analytics tasks, such as classifying documents and analyzing the sentiment in blocks of text, as well as more advanced tasks, such as answering questions and summarizing reports. Language models are already reshaping traditional text analytics, but GPT-3 was an especially pivotal language model because, at 10x larger than any previous model upon release, it was among the first models widely described as a large language model, which enabled it to perform even more advanced tasks like programming and solving high school–level math problems. The latest version, called InstructGPT, has been fine-tuned by humans to generate responses that are much better aligned with human values and user intentions, and Google’s latest model shows further impressive breakthroughs on language and reasoning.
For businesses, the three areas where GPT-3 has appeared most promising are writing, coding, and discipline-specific reasoning. OpenAI, the Microsoft-funded creator of GPT-3, has developed a GPT-3-based language model intended to act as an assistant for programmers by generating code from natural language input. This tool, Codex, is already powering products like Copilot for Microsoft’s subsidiary GitHub and is capable of creating a basic video game simply by typing instructions. This transformative capability was already expected to change the nature of how programmers do their jobs, but models continue to improve — the latest from Google’s DeepMind AI lab, for example, demonstrates the critical thinking and logic skills necessary to outperform most humans in programming competitions.
Models like GPT-3 are considered to be foundation models — an emerging AI research area — which also work for other types of data such as images and video. Foundation models can even be trained on multiple forms of data at the same time, like OpenAI’s DALL·E 2, which is trained on language and images to generate high-resolution renderings of imaginary scenes or objects simply from text prompts. Due to their potential to transform the nature of cognitive work, economists expect that foundation models may affect every part of the economy and could lead to increases in economic growth similar to the industrial revolution.
Compared with rule-based AI technologies, machine learning has opened the door for machines to act more like humans: natural interaction, engagement, personalization, and versatility. This talk will introduce those breakthroughs: 1. methods of learning structured behavior; 2. the theory of universal probabilistic language models; 3. practices of deep learning to understand voice and text.
In civil aviation, the autopilot developed from CAT I to CAT III. We borrow this term to introduce the possible capacities of conversational AI now and in the future, and finally how the work at convospot.io aligns with this vision.
This document discusses the Web Ontology Language (OWL). It begins by providing motivation for OWL, noting limitations of RDF and RDF Schema in areas like expressiveness. It then outlines the technical solution of OWL, including its design goals of being shareable, changing over time, ensuring interoperability, and balancing expressiveness with complexity. Finally, it introduces the three dialects of OWL - OWL Lite, OWL DL, and OWL Full - and their different levels of expressiveness and reasoning capabilities.
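To make the notion of "reasoning capabilities" concrete, here is a toy sketch of one entailment that even the least expressive layers (RDFS, OWL Lite) support: the transitive closure of subclass relations. The class names are illustrative only, and the more expressive OWL DL and OWL Full dialects support far richer inference than this:

```python
# Toy sketch of one inference OWL/RDFS reasoners perform: computing the
# transitive closure of subclass relations. Class names are illustrative.

subclass_of = {
    "Dog": "Mammal",
    "Mammal": "Animal",
    "Animal": "LivingThing",
}

def superclasses(cls, hierarchy):
    """All classes entailed to be superclasses of cls, nearest first."""
    result = []
    while cls in hierarchy:
        cls = hierarchy[cls]
        result.append(cls)
    return result

print(superclasses("Dog", subclass_of))
# ['Mammal', 'Animal', 'LivingThing']
```

The expressiveness-versus-complexity trade-off in the dialects is exactly about how much more than this a reasoner must be able to derive, and at what computational cost.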
Invited Talk at Summer School on Semantic Web, Bertinoro, 2015
Abstract:
Two decades ago, there were discussions of how to build seamless digital workflows such that the medium for data in a workflow would not switch between paper, fax, phone, and digital, because each transcription from one medium to another would be laborious and cost-inefficient. Thus, the issue was avoiding *medium discontinuities*. Today, we have all-digital data workflows, but we still have plenty of *semantic discontinuities*.
In this talk, I first want to describe reasons for these discontinuities, including: the autonomy of data providers, the need for agility and flexibility, and decentralized organizations in world-wide data spaces.
Then I want to describe several semantic discontinuities and some efforts to ameliorate them by:
1. Semantic programming (Horizontal workflow paradigm)
2. Core ontologies (Vertical workflow paradigm)
3. Semantic data production and consumption (Sticky semantics)
folksonomy, social tagging, tag clouds, automatic folksonomy construction, word clouds, wordle, context-preserving word cloud visualisation, CPEWCV, seam carving, inflate and push, star forest, cycle cover, quantitative metrics, realized adjacencies, distortion, area utilization, compactness, aspect ratio, running time, semantics in language technology
The IMLS-funded project Linked Data for Professional Education (LD4PE) has created a "Competency Index for Linked Data".
The Index provides a concise and readable map of concepts and skills related to the practices and technologies of Linked Data for the benefit of interested learners and their teachers.
Using NLP to understand textual content at scale – Parsa Ghaffari
In this talk, Parsa Ghaffari (CEO & founder at AYLIEN) explores the importance of Natural Language Processing from an industrial perspective, and some of the challenges associated with applying NLP to business problems.
http://aylien.com
Laure talked about a very hot topic in the community amid the ChatGPT phenomenon: how to supervise a PhD thesis in NLP in the age of Large Language Models (LLMs)?
Visual-Semantic Embeddings: some thoughts on Language – Roelof Pieters
Language technology is rapidly evolving. A resurgence in the use of distributed semantic representations and word embeddings, combined with the rise of deep neural networks, has led to new approaches and new state-of-the-art results in many natural language processing tasks. One such exciting, and most recent, trend can be seen in multimodal approaches fusing techniques and models of natural language processing (NLP) with those of computer vision.
The talk is aimed at giving an overview of the NLP part of this trend. It will start with a short overview of the challenges in creating deep networks for language, as well as what makes for a “good” language model, and the specific requirements of semantic word spaces for multimodal embeddings.
RDF and other linked data standards — how to make use of big localization data – Dave Lewis
The standards and interoperability challenges of using the Resource Description Framework for data resources in linked data. Based on work from CNGL (www.cngl.ie), the FALCON project (www.falcon-project.eu) and the LIDER project (www.lider-project.eu)
A talk about "Stream Reasoning" at INQUEST (INnovative QUErying of STreams 2012, http://games.cs.ox.ac.uk/inquest12/), organized in Oxford, United Kingdom, September 25–27, 2012.
The talk presents a comprehensive view of "Stream Reasoning" -- reasoning on rapidly flowing information. It illustrates the challenges, presents the achievements of the database group of Politecnico di Milano on the topic, reviews the challenges pointing to results and ongoing work in the Semantic Web community, and proposes how to go beyond the current Stream Reasoning concept. In particular, it points out that "order matters" when processing massive data, and it proposes to investigate streaming algorithms for automated reasoning that can be applied not only to data streams that are "naturally" ordered (by recency) but to any sortable data source.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing – Lifeng (Aaron) Han
Invited presentation at the NLP lab of Soochow University, about my NLP journey and the ADAPT Centre. The NLP part covers Machine Translation Evaluation, Quality Estimation, Multiword Expression Identification, Named Entity Recognition, Word Segmentation, Treebanks, and Parsing.
The document discusses the development of OpenWN-PT, a Brazilian Portuguese Wordnet. Key points:
- OpenWN-PT is being created as part of a joint project between CPDOC and EMAp to apply formal logical tools to Portuguese text.
- It is based on the Universal Wordnet (UWN) which projects WordNet concepts into over 200 languages using statistical methods. The UWN provides an initial automated version of a Portuguese Wordnet.
- The creators are working to improve the initial UWN-based Portuguese Wordnet by combining it with data from Princeton WordNet, UWN, MENTA, and EuroWordNet to generate a new OpenWN-PT file.
Gadgets pwn us? A pattern language for CALL – Lawrie Hunter
The document discusses creating a pattern language for computer-assisted language learning (CALL). It explores the concept of a pattern language as defined by Christopher Alexander and proposes a framework for creating a CALL pattern language in the era of web 2.0. The paper seeks to rework concepts from other fields, like "formal learning design expression" and "task arc," and have participants brainstorm elements to include through graphical challenges. The overall goal is to establish foundational patterns for CALL work.
Presented by Ted Xiao at RobotXSpace on 4/18/2017. This workshop covers the fundamentals of Natural Language Processing, crucial NLP approaches, and an overview of NLP in industry.
The Semantic Web - Interacting with the UnknownSteffen Staab
When developing user interfaces for interacting with data and content one typically assumes that one knows the type of data and one knows how to interact with such type of data. The core idea of the Semantic Web is that data is self-describing, which implies that its semantics is not designed and described at an initial point in time, but it rather emerges by its use. This flexibility is one of the greatest assets of the Semantic Web, but it also severely handicaps intelligent interaction with its data.
In this talk, we will sketch the principal problem as well as first steps to deal with the problem of interacting with the unknown.
Keynote new convergences between natural language processing and knowledge engineering
1. New convergences between Natural Language Processing and Knowledge Engineering – An illustration with the extraction and representation of semantic relations
Nathalie Aussenac-Gilles (IRIT – CNRS, Toulouse, France)
aussenac@irit.fr
2. Outline of the talk
• Evolution of the Language and Knowledge
duality in AI
• Deep learning for NLP
• Semantic relations
• Finding semantic relations
SEMANTICS - NLP and KE at the era of SemWeb: Semantic relations - Aussenac, 12/09/2017
3. The language / Knowledge duality
in early AI
Natural Language Processing
• Ambitious goal
• To produce systems able to fully
understand and represent the
meaning of language
• Target representation: logic
• … inspired by linguistics >>
computational linguistics
Knowledge Engineering
• Ambitious goal
• To produce systems able to fully
solve problems that "classical"
algorithms are not likely to solve
• Target representation: logic
• … inspired by human problem
solving >> expert systems
[Diagram: natural language processing (syntactic parsing/checking, spelling checking, …) and knowledge acquisition both feed a logic-based representation used by the KBS.]
4. The language / Knowledge duality
in classical AI
Natural Language Processing
• To produce systems able to
understand and build
representations from language in
order to build systems that perform
language-intensive tasks
• Target applications
– Identifying opinions
– Providing abstracts
– Translating from one language to
another
– Extracting information
– Answering questions
– Managing dialog systems
Knowledge engineering
• Collect knowledge from various
sources to build representations
and knowledge bases in order to
build systems that perform or
support knowledge-intensive tasks
• Knowledge based systems
– Fault diagnosis
– Classification
– Repair
– Task planning
– Simple design, …
• Little focus on domain knowledge
5. The language / Knowledge duality
in classical AI
Natural Language Processing
• Layered approach to deal
with specific issues
Knowledge Engineering
• Layered models and
reusable components
• Cf CommonKADS
NLP layers: OCR / tokenization; morphological / lexical analysis; syntactic analysis; semantic typing; discourse analysis
KE layers: task model; inference structure; domain model; task library; problem solving methods; ontologies (1995)
6. The language / Knowledge duality
at the era of the (semantic) web
What the semantic web provides
• Ambition
– To make web pages and web data "understandable" by algorithms
– To give them a semantics by typing entities and concepts
• Standards for knowledge representation
– Improve interoperability
– Promote knowledge and data reusability
• An architecture to reach this goal
– Web application composition: web services
– Semantic annotation
– Linking semantic data and making it open (LOD)
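The typing-and-linking idea on this slide can be sketched with a toy triple store. The URIs, facts, and `query` helper below are illustrative assumptions, not a real RDF library:

```python
# Toy triple store: statements about typed entities as
# (subject, predicate, object) triples, in the spirit of RDF.
# All URIs and facts here are invented for illustration.
triples = {
    ("ex:SofiaCoppola", "rdf:type", "ex:Person"),
    ("ex:SofiaCoppola", "ex:bornIn", "ex:NewYork"),
    ("ex:NewYork", "rdf:type", "ex:Place"),
}

def query(s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(t for t in triples
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))

# Which entities are typed as Person?
print(query(p="rdf:type", o="ex:Person"))
```

Because every statement carries explicit types and relations, the same pattern-matching query works across data from different sources, which is the interoperability point of the standards above.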
7. The language / Knowledge duality
at the era of the (semantic) web
Natural Language Processing
• Larger corpora enable
– Statistics
– Probabilistic language models
– Machine learning
• NLP benefits from new semantic
datasets
– Ontologies
– Large KBs: DBpedia, YAGO
– Multilingual lexical KB: BabelNet
Knowledge engineering
• Text as knowledge sources
– Information extraction techniques
– Semantic typing
– Relation extraction
• Ontology engineering from text
• Produce semantic resources
– (domain) Ontologies
– Large general KB
• Ontology based applications
– Connect services
– make appointments
– adapt processes to context
– answer questions, search for
information …
More knowledge sources, more data and more digital text
8. Linguistic clues for knowledge
October 2014 From natural language to ontologies 8
Name : Sofia Coppola or Sofia
Carmina Coppola
Is-a : Person
Born on: 1971, May 14
Born in: New York (USA)
Job: Movie director, actor
Nationality: American
Name : Francis Ford Coppola
Is-a : Person
Born on: …
Born in: …
Job: Movie director
Nationality: American
Has-child
[Figure labels: Named Entity Recognition, Entity typing, Relation extraction]
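The pipeline this slide illustrates — entity recognition followed by relation extraction — can be sketched minimally with a gazetteer and lexico-syntactic patterns. The entity list, patterns, and sentences below are toy assumptions; a real system would use a trained NER model and far richer patterns:

```python
import re

# Toy gazetteer standing in for a named-entity recognizer (assumption:
# in practice a trained NER model would supply these spans and types).
ENTITIES = {
    "Sofia Coppola": "Person",
    "Francis Ford Coppola": "Person",
    "New York": "Place",
}

# Lexico-syntactic patterns mapping surface cues to semantic relations;
# `swap` reverses the arguments so "X is the daughter of Y" yields
# (Y, has-child, X), matching the Has-child link on the slide.
PATTERNS = [
    (re.compile(r"(?P<a>[A-Z][\w ]+?) is the daughter of (?P<b>[A-Z][\w ]+)"),
     "has-child", True),
    (re.compile(r"(?P<a>[A-Z][\w ]+?) was born in (?P<b>[A-Z][\w ]+)"),
     "born-in", False),
]

def extract_relations(text):
    """Return (subject, relation, object) triples found by the patterns."""
    found = []
    for pattern, relation, swap in PATTERNS:
        for m in pattern.finditer(text):
            a, b = m.group("a").strip(), m.group("b").strip()
            if a in ENTITIES and b in ENTITIES:  # keep recognized entities only
                found.append((b, relation, a) if swap else (a, relation, b))
    return found

text = ("Sofia Coppola is the daughter of Francis Ford Coppola. "
        "Sofia Coppola was born in New York.")
print(extract_relations(text))
```

Restricting pattern arguments to recognized entities is what makes the extracted pairs attachable to the typed entity cards above.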
10. Other difficult issues
• Short (context-free) text:
headlines or tweets cf
http://www.cs.cmu.edu/~ark/TweetNLP/
• Asserting the value of facts
• Parsing non standard
English; neologism, spelling
errors, syntax errors …
• Figurative (non-literal) language,
humor, sarcasm …
• segmentation issues
• …
• The Pope’s baby steps on gays
• The Eiffel Tower is a 324 metres tall
(including the antennas) wrought iron lattice
tower … The Eiffel Tower is 312 metres tall
…
• Great job @jusInbieber! Were SOO PROUD
of what you’ve accomplished! U taught us 2
#neversaynever & you yourself should
never give up either ♥
• “Congratulation #lesbleus for your great
match!” is ironic if the French soccer team
has lost the match.
• The New York–New Haven Railroad
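The railroad example above is a segmentation problem: does the hyphen join or separate names? A tiny sketch of how two naive tokenization choices give different readings (pure illustration, not a real tokenizer):

```python
import re

phrase = "The New York-New Haven Railroad"

# Whitespace-only tokenization keeps the hyphenated span together ...
ws_tokens = phrase.split()

# ... while also splitting on hyphens separates "York" and "New",
# a different segmentation of the same string.
hyphen_tokens = re.split(r"[\s\-]+", phrase)

print(ws_tokens)
print(hyphen_tokens)
```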
11. Machine learning for NLP
• Machine learning requires
– Annotating examples (time
consuming)
– Selecting (and evaluating)
appropriate features to
describe the data in a
processable way (complex,
requires linguistic expertise and
resources)
– Selecting the appropriate
learning algorithm
– The ML algorithm optimizes the
weights on features
Features for humor detection in Tweets (Karaoui et al.
2016)
Surface features
Number of words in tweet
Punctuation mark y/n
Question mark y/n
Word in capital letter y/n
Interjection y/n
Emoticon y/n
Slang word y/n
Sequence of exclamation or question marks
Discourse connective y/n
Sentiment features
Nb of positive (negative) opinion words
Words of surprise, neutral opinion
Semantic shifter
Intensifier y/n
Negation word y/n
Reporting speech verb y/n
Opposition features
Explicit sentiment opposition y/n
(tested with patterns)
The #NSA wiretapped a whole country. No worries
for #Belgium: it is not a whole country.
positive example
The Eiffel Tower is 324 metres tall.
negative example
… #irony or #humor
positive example
Lexicons
Word lists
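A sketch of computing a few of the surface and opposition features listed above on the slide's NSA/Belgium example. The lexicons and regular expressions are illustrative stand-ins for the resources used by Karaoui et al., not their actual feature extractor:

```python
import re

# Tiny stand-in lexicons (assumption: real systems use much larger lists).
INTERJECTIONS = {"wow", "yay", "ugh", "oh"}
OPPOSITION = re.compile(r"\bno worries\b|\bbut\b|\bhowever\b", re.IGNORECASE)

def surface_features(tweet):
    """Map a tweet to a few binary/count features like those above."""
    words = tweet.split()
    stripped = [w.strip("#!?.,:") for w in words]
    return {
        "n_words": len(words),
        "question_mark": "?" in tweet,
        "exclamation_or_question_run": bool(re.search(r"[!?]{2,}", tweet)),
        "word_in_capitals": any(w.isupper() and len(w) > 1 for w in stripped),
        "interjection": any(w.lower() in INTERJECTIONS for w in stripped),
        "opposition_cue": bool(OPPOSITION.search(tweet)),
    }

tweet = ("The #NSA wiretapped a whole country. "
         "No worries for #Belgium: it is not a whole country.")
print(surface_features(tweet))
```

A learning algorithm then weights such feature vectors against the annotated positive/negative examples shown on the slide.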
12. Challenging industrial
applications
… that require NLP AND semantic resources
• Search engines (written and spoken)
• Online advertisement matching
• Automated/assisted translation
• Sentiment analysis for marketing or finance
• Speech recognition
• Chatbots / Dialog (virtual) agents
– Customer support
– Controlling devices
– Technical support to diagnose and repair
13. Current challenges
according to D. Jurafsky in 2012
14. Outline of the talk
• The Language and Knowledge duality in AI
• Deep learning for NLP
• Semantic relations
• Finding semantic relations
15. Recent shift:
Deep Learning (DL) for NLP
from C. Manning and R. Socher, Course about NLP with DL
http://web.stanford.edu/class/cs224n/lectures/cs224n-2017-lecture1.pdf
• Representation learning attempts to automatically
learn good features or representations
– Ex: vectors represent word distribution in corpus
• Deep learning algorithms attempt to learn
(multiple levels of) representation and an output
• From “raw” inputs x (e.g., sound, characters, or
words)
• … not that “raw”: WORD VECTORS are the input
16. Recent shift:
Deep Learning (DL) for NLP
(Manning 2017 tutorial)
• Deep NLP = deep learning + NLP
• Reach the goals of NLP by combining existing NLP work with representation learning and deep learning methods
• Several big improvements in recent years in NLP
– Levels: speech, words, syntax, semantics
– Tools: parts-of-speech, entities, parsing
– Applications: machine translation, sentiment analysis,
dialogue agents, question answering
17. Recent shift:
Deep Learning (DL) for NLP
Distributional semantics
• Semantic similarity is based on the distributional
hypothesis [Harris 1954]
• Take a word and its contexts:
– tasty sooluceps
– sweet sooluceps
– stale sooluceps
– freshly baked sooluceps
• By looking at a word’s context, one can infer its meaning
food
18. Recent shift:
Deep Learning (DL) for NLP
Distributional semantics
• Vectors to capture word meaning = frequency of co-occurring words
• and matrix to capture word similarities
Raw co-occurrence counts in a small corpus:

            Red  Tasty  Rapid  Second-hand  Sweet
Cherry        2      3      0            0      1
Strawberry    3      1      0            0      2
Car           2      0      3            2      0
Truck         1      0      3            1      0

In a larger corpus:

            Red  Tasty  Rapid  Second-hand  Sweet
Cherry       52    104      0            0     75
Strawberry   68     85      0            0     42
Car          27      0     65           35      0
Truck        12      0     43           72      0

In a very large corpus:

            Red  Tasty  Fast  Second-hand  Sweet
Cherry      752    604     0            1    575
Strawberry  868    584     2            0    642
Car         274      0   465          358      0
Truck       126      0   343          172      0

[Plot: words projected onto the “red” and “fast” dimensions; cherry and strawberry cluster together, as do car and truck]
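With the toy counts of the first matrix, word similarity can be computed as the cosine between row vectors; a plain-Python sketch, with values copied from the table above:

```python
from math import sqrt

# Rows of the first toy co-occurrence matrix
# (contexts: red, tasty, rapid, second-hand, sweet)
vectors = {
    "cherry":     [2, 3, 0, 0, 1],
    "strawberry": [3, 1, 0, 0, 2],
    "car":        [2, 0, 3, 2, 0],
    "truck":      [1, 0, 3, 1, 0],
}

def cosine(u, v):
    """Cosine similarity between two count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

sim_fruit = cosine(vectors["cherry"], vectors["strawberry"])
sim_mixed = cosine(vectors["cherry"], vectors["car"])
```

Cherry comes out far closer to strawberry than to car, which is exactly what the clustering plot visualizes.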
19. Deep Learning for NLP
• Syntactic Parsing of sentence structure
20. Deep learning for NLP
• Can address many NLP problems
– Morphological analysis: morpheme vectors combined with neural networks (NN)
– Semantics: words, phrases and logical expressions are vectors -> compared using NNs to evaluate their similarity
– Sentiment analysis: combining various analyses with NNs -> recursive NNs
– Question answering: vectors of facts compared with NNs
– …
• But not all NLP problems in any context (domain-specific corpora …)
• Requires large corpora to build word vectors, or the reuse of existing word vectors (built on Wikipedia and Gigaword)
– http://nlp.stanford.edu/projects/glove/
– https://github.com/idio/wiki2vec/
• Requires expertise to define the layers, the vectors and content of the NN
21. Outline of the talk
• The Language and Knowledge duality in AI
• Deep learning for NLP
• Semantic relations
• Finding semantic relations
22. Semantic relations,
what do we mean?
• Semantic relation … what do you have in mind?
– Binary relation
– Hypernymy … meronymy
– Causality, temporal, spatial
– What about other kinds of relations?
(Cat, eats, mouse)
(“Simply Red”, plays, “The Right Thing”)
(“Eiffel Tower”, has-height, “324 m”)
(artist, performs, piece of music, date, location)
• Relation extraction from text: what do we have in mind?
– The relation is expressed in a single sentence.
– The relation is expressed in tables or tagged XML sections
Relation kinds: binary relations, hierarchical relations, general relations, n-ary relations
23. Semantic relations,
what do we mean?
Research field
• Linguistics: semantic
relations, semantic roles,
discourse relations
• Terminology
– Weak structure
– Stored in DB or SKOS models
• Information extraction
– Small set of classes
– Gazetteers contain lists of
entity labels
What is a relation
A tree comprises at least a trunk, roots and
branches.
A tree [Plants] comprises [meronymy] at least a
trunk, roots and branches.
(tree has_parts trunk)
(tree, has_parts, roots) …
in a gardening terminology
looks for relations between instances
Tree    Plantation year   Species   Branches
Tree1   1990              Oak       > 20
Tree2   1995              Oak       15
[Diagram labels: whole / parts]
24. Semantic relations,
what do we mean?
Research field
• Domain Ontology engineering
– Formal (logic, RDF, OWL …)
– Formal properties: transitivity …
– used to infer new knowledge
– part of a network
– May be shared or reused
• Semantic web
– Independent triples that
connect resources
– Publically available in data
repositories with W3C Standard
format
– Connect triples with existing
ones, with web ontologies
What is a relation
bot:Tree bot:has_part bot:Branch
[Diagram: Plant has subclasses Tree, Fungus and Cereals (is_a / rdfs:subClassOf); Plant has-part Root; Tree has-part Trunk and Branch]
bot:Tree bot:has-part bot:Branch
bot:Plant bot:has-part bot:Root
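The formal properties of such relations can be used to infer new knowledge. A pure-Python sketch (no RDF library; prefixed names kept as plain strings) that propagates has-part along subClassOf:

```python
# Toy triple store mirroring the triples above (prefixed names as plain strings)
triples = {
    ("bot:Tree", "rdfs:subClassOf", "bot:Plant"),
    ("bot:Tree", "bot:has-part", "bot:Branch"),
    ("bot:Plant", "bot:has-part", "bot:Root"),
}

def infer_inherited_parts(kb):
    """If C subClassOf D and D has-part P, infer C has-part P."""
    inferred = set()
    for c, p, d in kb:
        if p == "rdfs:subClassOf":
            for s, q, part in kb:
                if s == d and q == "bot:has-part":
                    inferred.add((c, "bot:has-part", part))
    return inferred

new_triples = infer_inherited_parts(triples)
```

In a real setting this kind of entailment is delegated to an RDFS/OWL reasoner rather than hand-coded.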
25. Example: tree in DBpedia
[Diagram: DBpedia resources around dbpedia-owl:tree, with links to dbpedia-owl:Species and dbpedia-owl:Place]
26. Example: Plants in DBpedia
[Diagram: dbpedia-owl:Plant rdfs:subClassOf dbpedia-owl:Organism rdfs:subClassOf dbpedia-owl:PhysicalEntity; dbpedia-owl:Plant owl:sameAs yago:WordNet_Plant_100017222; instances such as dbpedia-owl:Acer_Stonebergae, dbpedia-owl:Alopecurus_carolinianus and dbpedia-owl:Alsmithia_longipes linked by rdf:type]
27. Outline of the talk
• The Language and Knowledge duality in AI
• Deep learning for NLP
• Semantic relations
• Finding semantic relations
28. Finding semantic relations,
some parameters
• Knowledge sources
– human experts, text
– existing semantic resources
– Domain specific vs general knowledge
• Text collection(s)
– Size, domain specific vs general
– Structure, quality of writing
– Textual genre (knowledge rich text?)
• Target representations
– Input/ output format of the process
– Nature of the semantic relation
29. Finding semantic relations,
some parameters
• Extraction techniques from text
– “obvious” language regularities, known relations and
classes (or entities) -> Patterns
– “more implicit” language regularities, medium size
corpora, open list of classes/entities -> supervised
learning
– Very large corpora, unexpected relations ->
unsupervised learning
• Validation
– What makes a relation representation valid?
Relevant?
30. Historic perspective
on relation extraction techniques
• Early period: around 1990
– Patterns (Hearst, 1992) to explore definitions
– Learning selectional preferences (Resnik)
– Machine Learning : ASIUM (Faure, Nedellec)
– Relations between classes
• From 2000 to 2010: more patterns, more learning
– Association rules (Maedche & Staab, 2000)
– Supervised learning from positive/negative examples
– Joint use of various methods (Malaisé, 2005), Text2Onto (Cimiano, 2005), RelExt
– Relations between entities
• Since 2005: open relation extraction
– Semi-supervised learning from small sets of data
– Unsupervised learning: KnowItAll (Etzioni et al., 2005), TextRunner (Banko, 2007)
– Distant supervision (using a KB) ; deep learning
– Very large corpora (web)
31. Pattern-based relation extraction
• Hearst, 1992. Patterns for hypernymy in English
“Y such as X ((, X)* (, and|or ) X)”
“such Y as X”
“X or other Y”
“X and other Y”
“Y including X”
“Y, especially X”
• A shared list of patterns for French: MAR-REL
– CRISTAL project (linguistics and NLP)
– 3 types of binary relations: hypernymy, meronymy, cause
– UIMA format
– Evaluation on various corpora
– http://redac.univ-tlse2.fr/misc/mar-rel_fr.html
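Two of the Hearst patterns above, implemented as regular expressions. This is a sketch over single-word arguments; real implementations match full noun phrases and use POS tags:

```python
import re

# Regexes approximating two Hearst patterns (single-word arguments only)
PATTERNS = [
    # "Y such as X"  ->  X is a hyponym of Y
    (re.compile(r"(\w+) such as (\w+)"), (2, 1)),
    # "X and other Y" ->  X is a hyponym of Y
    (re.compile(r"(\w+) and other (\w+)"), (1, 2)),
]

def extract_hypernymy(sentence):
    """Return (hyponym, hypernym) pairs found by the patterns."""
    pairs = []
    for rx, (hypo, hyper) in PATTERNS:
        for m in rx.finditer(sentence):
            pairs.append((m.group(hypo), m.group(hyper)))
    return pairs
```

The brittleness discussed on the next slides comes precisely from how literal such matching is.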
32. Tuning a pattern …
an endless effort ?
• On appelle route nationale une route gérée par l’état. (“A route nationale is a road managed by the state.”)
• Sur cette carte, on symbolise par un triangle un sommet de plus de 2000 m. (“On this map, a triangle symbolizes a summit over 2000 m.”)
• Il appelle souvent son chat la nuit. (“He often calls his cat at night.”) -> error
• On dénommait Louis-Philippe « la poire ». (“Louis-Philippe was nicknamed ‘the pear’.”) -> missed
• On appellera dans la suite de ce mémoire relation lexicale une relation qui … (“In the rest of this thesis, we will call lexical relation a relation that …”) -> missed
33. Pattern based relation extraction,
known issues
• “A tree comprises at least a trunk, roots and branches.” Issues: verb with lexical variation; enumeration of several parts; modality (exactly, at least, at most, often, …)
• “With branches reaching the ground, the willow is an ornamental tree.” Issues: with is a meronymy pattern only in some genres (such as catalogs or biology documents); insertion between the arguments
• “The tree of the neighbor has been delimbed.” Issues: term and pattern are in the same word; implicitness requires background knowledge: delimbed -> has_part branches (and the branches are cut)
• “He climbs on the branches of the tree.” Issues: of is a very ambiguous marker; polysemy reduced in [verb N1 of N2]
• “This tree is wonderful. Its branches reach the ground.” Issues: its is an anaphoric reference; two sentences must be taken into account
• “Plant tangerine trees in a sheltered place out of the wind.” Issues: out of is a negative form; representation issue
34. Pattern-based relation extraction,
other issues
• Not enough flexibility
– Unable to handle (unexpected) variations
– Miss many relations
– Need adaptation to be relevant on a new corpus
• Too strong "matching" between the sentence and
the pattern itself
• Generic patterns
– widely used with poor results (no surprise)
– often appear as a baseline
• Building relevant domain/corpus-specific patterns
is time consuming and difficult
35. Using ML to learn patterns (1)
• Patterns are seen (and stored) as lexicalizations of ontology properties
• Patterns are “extracted” from syntactic dependencies between related
entities (in triples)
• Assumes that patterns are structured around ONE lexical entry
• Lemon format for lexical ontologies
• Entries can be frames
ATOLL—A framework for the automatic induction of ontology lexica
S. Walter, C. Unger, P. Cimiano, DKE (94), 148-162 (2014)
36. Using ML to learn patterns (2)
Michelle Obama is the wife of Barack Obama, the current president.
Michelle Obama allegedly told her husband, Barack Obama, to ..
Michelle Obama, the 44th first lady and wife of President Barack
Dbpedia:spouse
Find all lexicalizations of the entities: Michelle Obama, Mrs. Obama, Michelle
Robinson …
37. Using ML to learn patterns (3)
• Pattern = shortest path between the 2 entities in the dependency graph
[MichelleObama (subject), wife (root), of (preposition), BarackObama (object)]
• Lexical entry in the ontology
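The shortest-path idea can be sketched with a plain breadth-first search over a toy dependency graph for the Obama sentence. The edge set and token labels are illustrative, not the output of a real parser:

```python
from collections import deque

# Toy (undirected) dependency edges for
# "Michelle Obama is the wife of Barack Obama"
edges = [
    ("wife", "MichelleObama"),   # subject
    ("wife", "is"),              # copula
    ("wife", "of"),              # preposition
    ("of", "BarackObama"),       # prepositional object
]

def shortest_path(edges, start, goal):
    """Breadth-first search for the shortest path between two nodes."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in adj.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

pattern = shortest_path(edges, "MichelleObama", "BarackObama")
```

The returned path corresponds to the [MichelleObama, wife, of, BarackObama] pattern shown above.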
38. Finding semantic relations:
what can large corpora and machine
learning do for you ?
• Learning patterns
– Poor results
– Requires very large data sets
– Reasonable for general
knowledge
• Learning relations
– Much more relevant
– Large variety of approaches in the state of the art
– Key step = feature selection
39. Using ML to learn relations:
Hypotheses
• A large variety of learning algorithms
– Classification
– Regression
– Probabilistic models (naive Bayes)
– Linear separation …
• Classification = grouping similar learning objects
– Objects are built from input sentences
– Sentences where the two arguments co-occur
– Either vectors, graphs or lists made of features
– Similarity measure: cosine or Euclidean distance, or sequence alignment for graphs
40. Main stages of the process
1. Preprocessing
– Tagging the entities to be considered as arguments
– NLP preprocessing
2. Object representation
– Collect sentences where pairs co-occur
– Identify features
– Represent sentences with features
3. Training the algorithm (if supervised)
4. Running the trained model
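Stage 2 can be sketched as turning each sentence with a tagged argument pair into a feature object; the features shown are illustrative, not a specific system's set:

```python
def featurize(sentence, arg1, arg2):
    """Represent a sentence with a candidate (arg1, arg2) pair as a feature dict.
    Illustrative surface features only; real systems add POS tags, lemmas,
    dependency paths, etc."""
    tokens = sentence.split()
    i, j = tokens.index(arg1), tokens.index(arg2)
    lo, hi = min(i, j), max(i, j)
    between = tokens[lo + 1:hi]
    return {
        "between_words": "_".join(between),  # words between the two arguments
        "n_between": len(between),           # distance between the arguments
        "arg1_first": i < j,                 # argument order in the sentence
    }

obj = featurize("A tree comprises a trunk", "tree", "trunk")
```

Such feature dicts (or their vectorized form) are what the supervised classifier is trained on in stage 3.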
41. Example2: Learning domain specific
relations using ALVIS-RE (1)
1. Preprocessing (AlvisNLP/ML platform)
– Tokenization in words and sentences
– Lemmatization, POS tagging using CCS parser
– Named Entity tagging (canonical form)
– Dependency relations (graph)
– Semantic relations are added when known (positive examples)
– Word sequence relations (wordPath)
VALSAMOU D., Automatic information extraction from scientific articles to build a network of the biological regulations involved in the seed development of Arabidopsis thaliana. PhD thesis, ED STIC, univ. Paris Sud, 2017
42. Example2: Learning domain specific
relations using ALVIS-RE (2)
2. Object representation
• Representation as a path
• 3 experiments: dependencies, surface (wordPath relations), and a combination of the two
• Find the shortest path between the terms Arg1 and Arg2.
43. Example2: Learning domain specific
relations using ALVIS-RE (3)
2. Object representation
• Paths are turned into sequences of (word, relation) pairs
• Empty nodes (gaps) are added if needed, with a weight (gap penalty)
• Weights are assigned to some words
44. Example2: Learning domain
specific relations using ALVIS-RE (4)
4. Classification
– Uses an SVM algorithm
– Improved using semantic information
• Distributional representations (DISCO or Word2Vec)
• Classes manually related to each other
• Classes from WordNet
• Evaluation on a real corpus
45. Example3: learning relations from
enumerative structures
[Diagram: a parallel enumerative structure whose items are linked to the primer by IS_A relations]
Learning relations from a parallel enumerative structure =
- a classification task to identify the relation (IS_A, part_of, other)
- term extraction to identify the primer and the items
J.-P.Fauconnier, M. Kamel. Discovering Hypernymy Relations using Text Layout (regular paper). Joint Conference
on Lexical and Computational Semantics (SEM 2015),(ACL), 2015.
46. Relation extraction:
learning relations from enumerative structures
• Corpus
– 745 enumerative structures from
Wikipedia pages
– 3 relation types: taxonomic,
ontological_non_taxonomic,
non_ontological
• Classification task
– Feature definition
– Automatic evaluation of features
– 3 algorithms are compared: SVM, maximum entropy (MaxEnt) and a majority-class baseline
– Training of the first 2 algorithms
• Results
– 82% f-measure for SVM
– Best result with a 2-step process (ontological yes/no, then taxonomic yes/no)
47. Example4: comparing patterns and
ML for hypernym relation extraction
• Overall objective
– Define various relation extraction techniques
– Adapted to various ways to express relations
• Sempedia project
– To enrich the French DbPedia
– To extract relations from Wikipedia pages
• Experiment on disambiguation pages
– Contain definitions and many hypernym relations
– General knowledge
• Techniques
– Patterns
– Basic pre-processing (no dependency parsing)
– Distant supervised learning
49. Example4 : Application to Wikipedia
Disambiguation Pages
• Corpora
– Reference corpus: 20 pages ; manual annotation (entities and relations linking entities)
– Training corpus: all remaining French disambiguation pages (5904 pages)
• Semantic resource: BabelNet (babelnet.org)
– very large multilingual semantic network with about 14 million entries (Babel synsets)
– connects concepts and named entities with semantic relations
– rich in hypernym relations
• Features
07/09/2017 Extracting hypernym relations 49
50. Example 4: Processing chain
[Processing chain, summarized from the diagram:]
1. Preprocessing (TTG) of the corpus (Wikipedia disambiguation pages) -> annotated corpus
2. Term-pair extraction -> candidate pairs with their sentence: (<T1, T2>, sentence)
3. Labelling with a gazetteer of BabelNet terms (semantic resource) -> positive and negative examples
4. Feature-vector building -> { <T1, T2>, sentence, <feature1, …, featurep>, pos|neg }
5. Training a binary logistic regression (MaxEnt) on the training set (4000 +, 4000 −)
6. Evaluation (precision, recall, F-measure) of the learned model on the test set (2000 +, 2000 −)
51. Example4: Application to
Wikipedia Disambiguation Pages
• Evaluation on the reference corpus
– 688 true positive examples and 278 true negative examples
– Best results with window size of 3
– Comparison between 2 baselines and 2 models
• Baseline1: generic lexico-syntactic patterns for French
• Baseline2: generic patterns AND ad-hoc patterns for the disambiguation pages
• Model_POSL: trained with vectors composed of POS and lemma features
• Model_AllFeatures: trained with vectors composed of all features
52. Example4: Application to Wikipedia
Disambiguation Pages - discussion
– Number of true positive hypernym relations per type of hypernym
expression
– Quantitative gain: machine learning identifies more examples, no
development cost, ensuring a systematic and less empirical approach.
– Impact of the way relations are expressed:
• ML performs as well as patterns on well-written text
• Ad-hoc patterns perform (slightly) better on poorly written text
• ML can identify all forms of relation expression (current patterns are unable to identify relations with head modifiers)
53. Example4: Application to Wikipedia Disambiguation Pages
• Examples
– correctly identified by ML
– that would require additional ad-hoc patterns > extra cost
(1) Louis Babel, prêtre-missionnaire oblat et explorateur du Nouveau-Québec (1826-1912). (“Louis Babel, Oblate missionary priest and explorer of Nouveau-Québec”)
<Louis Babel, prêtre-missionnaire oblat>
<Louis Babel, explorateur du Nouveau-Québec>
(2) La fontaine a aussi désigné le “vaisseau de cuivre ou de quelque autre métal, où l’on garde de l’eau dans les maisons”, et encore le robinet de cuivre par où coule l’eau d’une fontaine, ou le vin d’un tonneau, ou quelque autre liqueur que ce soit. (“Fontaine also denoted the copper vessel in which water is kept in houses, and the copper tap through which the water of a fountain, the wine of a barrel, or any other liquor flows.”)
<fontaine, robinet de cuivre>
54. Example5: NN to extract relations
from scientific papers
• Corpus
– Full scientific papers (ISTEX French project)
– 15 years of the Nature journal (50 KB of text)
• Open relation extraction with distant supervision
– Semantic resource: NCIT (thesaurus)
– Learning objects: vector made of the word embedding vectors
of a subset of lemmas of the sentences (around arguments)
– Learning algorithm: Self Organizing Maps
• Results
– Finds 13 classes, 5 of which are easy to interpret, each with a majority of relations of one type
– 80% accuracy for hypernym relations
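A minimal Self-Organizing Map in NumPy, to make the approach concrete. Grid size, learning rate and neighbourhood are arbitrary illustrative choices, not those of the thesis:

```python
import numpy as np

def train_som(X, grid=(3, 3), n_iter=200, lr=0.5, sigma=1.0, seed=0):
    """Train a tiny SOM: each grid unit holds a weight vector that is pulled
    toward randomly drawn samples, more strongly near the winning unit."""
    rng = np.random.default_rng(seed)
    h, w = grid
    W = rng.normal(size=(h, w, X.shape[1]))
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"), axis=-1)
    for t in range(n_iter):
        x = X[rng.integers(len(X))]
        d = ((W - x) ** 2).sum(axis=2)
        bmu = np.unravel_index(d.argmin(), d.shape)            # best-matching unit
        dist2 = ((coords - np.array(bmu)) ** 2).sum(axis=-1)   # grid distance to BMU
        nb = np.exp(-dist2 / (2 * sigma ** 2))                 # neighbourhood weights
        W += lr * (1 - t / n_iter) * nb[..., None] * (x - W)   # decayed update
    return W

def bmu_of(W, x):
    """Grid coordinates of the unit closest to x."""
    d = ((W - x) ** 2).sum(axis=2)
    return np.unravel_index(d.argmin(), d.shape)

# Toy data: two well-separated clusters standing in for sentence vectors
X = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.0, 9.9]])
W = train_som(X)
```

After training, each input maps to a unit of the grid, and units play the role of the relation classes to be interpreted.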
55. Example5: NN to extract relations
from scientific papers
• Difficulties with SOM
– Size of the map > computation time
– Interpretation of the resulting classes
– Evaluation of the recall
• Limitations of supervised learning hypotheses
– One sentence may contain more than 2 domain concepts or entities > arbitrary selection of the first 2
– 2 entities may be related by several relations in the semantic resource > which annotation?
– The vocabulary of the corpus and of the semantic resource may differ
• Supervised learning requires expertise to adjust
– The number of iterations to get the optimal number of classes
– The size of the map or layers in the NN
– The features that form the classified objects
[Diagram: (Person, work-in, Company) relation example]
56. Towards more complementarity
ML can be used
• to learn patterns
• to find relations in complex and long sentences
• for open relation extraction, when the list of possible relations is not known
• with distant supervision, when a domain resource is available
• deep learning makes the process fully automatic but requires very large corpora
Patterns can be used
• as input to define features: tag sequences matching the pattern (the match becomes a feature)
• as an “easy method” when regularities are obvious (cf. disambiguation pages in Wikipedia)
• to bootstrap learning and automatically identify positive examples
57. Conclusion
• Context
– complexity and diversity of what we call semantic relation
extraction
– A lot of work has been done in designing and evaluating
patterns for semantic relations
• Many perspectives to improve relation extraction
– Better capitalize on existing patterns
– Collect results about the most relevant features and the most efficient representations to feed ML algorithms
– Implement pre-processing chains (even with NN algorithms) for a larger set of languages
– Study how well each technique performs on a variety of NL texts where relations are expressed in many ways
– Design a platform where various methods can be used together
58. Further readings
• Survey papers
– Bach, N., Badaskar, S. (2007). A Review of Relation Extraction. Language Technologies Institute, Carnegie Mellon University.
– Konstantinova, N. (2014). Review of Relation Extraction Methods: What Is New Out There? Communications in Computer and Information Science, 436:15-28.
• Recent works
– Valsamou, D. (2017). Extraction automatique d'information à partir d'articles scientifiques pour la reconstruction de réseaux de régulations biologiques impliqués dans le développement de la graine chez Arabidopsis thaliana. PhD thesis, ED STIC, Université Paris Sud, defended 17/01/2017. Supervisor: C. Nédellec, INRA, Équipe Bibliome.
– Fauconnier, J.-P. (2016). Acquisition de liens sémantiques à partir d'éléments de mise en forme des textes : exploitation des structures énumératives. PhD thesis, Université de Toulouse. https://www.irit.fr/publis/MELODI/Fauconnier_These_2016.pdf
59. Relations that represent mappings
Erhard Rahm http://dbs.uni-leipzig.de/file/paris-Octob2014.pdf
• Mapping processes
– Data linking at the instance level = entity reconciliation
– Ontology alignment
• Purpose: data integration thanks to semantic models
• Example: dbpedia-owl:Plant owl:sameAs yago:WordNet_Plant_100017222