Automatic Classification of Springer Nature Proceedings with Smart Topic Miner
by Francesco Osborne
The process of classifying scholarly outputs is crucial to ensure timely access to knowledge. However, this process is typically carried out manually by expert editors, leading to high costs and slow throughput. In this paper we present Smart Topic Miner (STM), a novel solution which uses semantic web technologies to classify scholarly publications on the basis of a very large automatically generated ontology of research areas. STM was developed to support the Springer Nature Computer Science editorial team in classifying proceedings in the LNCS family. It analyses in real time a set of publications provided by an editor and produces a structured set of topics and a number of Springer Nature classification tags, which best characterise the given input. In this paper we present the architecture of the system and report on an evaluation study conducted with a team of Springer Nature editors. The results of the evaluation, which showed that STM classifies publications with a high degree of accuracy, are very encouraging and as a result we are currently discussing the required next steps to ensure large-scale deployment within the company.
On the Reproducibility of the TAGME Entity Linking System
by Faegheh Hasibi
Slides for the ECIR '16 paper: “On the Reproducibility of the TAGME Entity Linking System”
Reproducibility is a fundamental requirement of scientific research. In this paper, we examine the repeatability, reproducibility, and generalizability of TAGME, one of the most popular entity linking systems. By comparing results obtained from its public API with (re)implementations from scratch, we obtain the following findings. The results reported in the TAGME paper cannot be repeated due to the unavailability of data sources. Part of the results are reproducible through the provided API, while the rest are not reproducible. We further show that the TAGME approach is generalizable to the task of entity linking in queries. Finally, we provide insights gained during this process and formulate lessons learned to inform future reproducibility efforts.
EKAW 2016 - TechMiner: Extracting Technologies from Academic Publications
by Francesco Osborne
In recent years we have seen the emergence of a variety of scholarly datasets. Typically these capture ‘standard’ scholarly entities and their connections, such as authors, affiliations, venues, publications, citations, and others. However, as the repositories grow and the technology improves, researchers are adding new entities to these repositories to develop a richer model of the scholarly domain. In this paper, we introduce TechMiner, a new approach, which combines NLP, machine learning and semantic technologies, for mining technologies from research publications and generating an OWL ontology describing their relationships with other research entities. The resulting knowledge base can support a number of tasks, such as: richer semantic search, which can exploit the technology dimension to support better retrieval of publications; richer expert search; monitoring the emergence and impact of new technologies, both within and across scientific fields; studying the scholarly dynamics associated with the emergence of new technologies; and others.
TechMiner was evaluated on a manually annotated gold standard, and the results indicate that it significantly outperforms alternative NLP approaches and that its semantic features yield substantial gains in both recall and precision.
Entity Linking in Queries: Tasks and Evaluation
by Faegheh Hasibi
Slides for the ICTIR 2015 paper "Entity Linking in Queries: Tasks and Evaluation"
Annotating queries with entities is one of the core problem areas in query understanding. While seemingly similar, the task of entity linking in queries differs from entity linking in documents and requires a methodological departure due to the inherent ambiguity of queries. We differentiate between two specific tasks, semantic mapping and interpretation finding, discuss current evaluation methodology, and propose refinements. We examine publicly available datasets for these tasks and introduce a new manually curated dataset for interpretation finding. To further deepen the understanding of task differences, we present a set of approaches for effectively addressing these tasks and report on experimental results.
The review and development work described in this report focuses on the aspects of semantic linking and annotation particularly relevant to ARIADNE. Semantic linking within ARIADNE is considered within the spatial, temporal and subject dimensions. The subject dimension is considered in depth, starting with a review of linking tools considered relevant to ARIADNE followed by a discussion of the ARIADNE approach and the vocabulary mapping tools used within ARIADNE. The Getty AAT proved an appropriate vocabulary mapping hub that afforded a multilingual search capability in the ARIADNE Portal via the semantic enrichment of partner subject metadata with derived AAT concepts.
A case study conducted an exploratory investigation of the semantic integration of extracts from archaeological datasets with information extracted via NLP across different languages. The investigation followed a broad theme relating to wooden material, including shipwrecks, with a focus on types of wooden material, samples taken, wooden objects with dating from dendrochronological analysis, etc. The Demonstrator is available for general use. The user is shielded from the complexity of the underlying semantic framework (based on the CIDOC CRM and Getty AAT) by the Web application user interface. The Demonstrator highlights the potential for archaeological research that can interrogate grey literature reports in conjunction with datasets. Queries concern wooden objects (e.g. samples of beech wood keels), optionally from a given date range, with automatic expansion over hierarchies of wood types.
Authors:
Douglas Tudhope (USW)
Ceri Binding (USW)
D15.3. Ver:1 (Final)
Schema-agnostic queries over large-schema databases: a distributional semanti...
by Andre Freitas
The evolution of data environments towards growth in the size, complexity, dynamicity and decentralisation (SCoDD) of schemas drastically impacts contemporary data management. The SCoDD trend emerges as a central data management concern in Big Data scenarios, where users and applications demand more complete data, produced by independent data sources under different semantic assumptions and contexts of use. Most Database Management Systems (DBMSs) today target a closed communication scenario, in which the symbolic schema of the database is known a priori by the database user, who is able to interpret it unambiguously. The context in which the data is consumed and produced is well defined, and it is typically the same context in which the data was created. In contrast, data management under SCoDD conditions targets an open communication scenario, in which the symbolic system of the database is unknown to the user and multiple interpretation contexts are possible; the database may have been created under a different context from that of the database user. The emergence of this new data environment demands revisiting the semantic assumptions behind databases and designing data access mechanisms that can support semantically heterogeneous (open communication) data environments.
This work aims at filling this gap by proposing a complementary semantic model for databases, based on distributional semantic models. Distributional semantics provides a complementary perspective to the formal perspective of database semantics, one which supports semantic approximation as a first-class database operation. Differently from models which describe uncertain and incomplete data, or probabilistic databases, distributional-relational models focus on the construction of conceptual approximation approaches for databases, supported by a comprehensive semantic model automatically built from large-scale unstructured data external to the database, which serves as a semantic/commonsense knowledge base. The semantic model can be used to support schema-agnostic queries, i.e. abstracting the data consumer from the specific conceptualization behind the data.
The proposed distributional-relational semantic model is supported by a distributional structured vector space model, named τ-Space, which represents structured data under a distributional semantic model representation and, in coordination with a query planning approach, supports a schema-agnostic query mechanism for large-schema databases. The query mechanism is materialized in the Treo query engine and is evaluated using schema-agnostic natural language queries.
The evaluation of the query mechanism confirms that distributional semantics provides a high-recall, medium-high-precision, and low-maintainability solution to cope with the abstraction and conceptual-level differences in schema-agnostic queries over large-schema and schema-less open-domain datasets.
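The core operation described above, matching a user's query vocabulary to a database schema by distributional similarity rather than by exact name, can be sketched roughly as follows. This is a minimal illustration with hand-made toy vectors; in τ-Space the vectors are built automatically from a large external corpus, and the query planning is far more involved.

```python
import math

# Toy distributional vectors. In the real system these would be learned
# from large-scale unstructured text; the values here are illustrative only.
vectors = {
    "author": [0.90, 0.10, 0.20],
    "writer": [0.85, 0.15, 0.25],
    "price":  [0.10, 0.90, 0.10],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def best_schema_match(query_term, schema_labels):
    """Return the schema label whose vector is closest to the query term."""
    q = vectors[query_term]
    return max(schema_labels, key=lambda lbl: cosine(q, vectors[lbl]))

# A query phrased with "writer" still reaches the "author" attribute,
# even though the two strings never match symbolically.
print(best_schema_match("writer", ["author", "price"]))
```

The point of the sketch is only the semantic approximation step: the query term never has to match a schema label exactly, which is what makes the query mechanism schema-agnostic.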
The ontology engineering research community has focused for many years on supporting the creation, development and evolution of ontologies. Ontology forecasting, which aims at predicting semantic changes in an ontology, represents instead a new challenge. In this paper, we contribute to this novel endeavour by focusing on the task of forecasting semantic concepts in the research domain. Indeed, ontologies representing scientific disciplines contain only research topics that are already popular enough to be selected by human experts or automatic algorithms. They are thus unfit to support tasks which require the ability to describe and explore the forefront of research, such as trend detection and horizon scanning. We address this issue by introducing the Semantic Innovation Forecast (SIF) model, which predicts new concepts of an ontology at time t+1, using only data available at time t. Our approach relies on lexical innovation and adoption information extracted from historical data. We evaluated the SIF model on a very large dataset consisting of over one million scientific papers belonging to the Computer Science domain: the outcomes show that the proposed approach offers a competitive boost in mean average precision-at-ten compared to the baselines when forecasting over 5 years.
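For readers unfamiliar with the evaluation measure, mean average precision-at-ten can be computed as below. This is the standard formulation of MAP@k, not code from the SIF paper, and the concept names in the example are invented.

```python
def average_precision_at_k(predicted, relevant, k=10):
    """AP@k for one forecast: a ranked list of predicted concepts
    scored against the set of concepts that actually appeared."""
    hits, score = 0, 0.0
    for i, concept in enumerate(predicted[:k], start=1):
        if concept in relevant:
            hits += 1
            score += hits / i          # precision at this rank
    return score / min(len(relevant), k) if relevant else 0.0

def map_at_k(runs, k=10):
    """Mean of AP@k over several (predicted, relevant) forecast runs."""
    return sum(average_precision_at_k(p, r, k) for p, r in runs) / len(runs)

# Invented example: two of three forecast concepts turned out to be real.
ap = average_precision_at_k(
    ["concept a", "concept b", "concept c"], {"concept a", "concept c"})
print(round(ap, 4))  # 0.8333
```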
Supporting Springer Nature Editors by means of Semantic Technologies
by Francesco Osborne
The Open University and Springer Nature have been collaborating since 2015 in the development of an array of semantically-enhanced solutions supporting editors in i) classifying proceedings and other editorial products with respect to the relevant research areas and ii) taking informed decisions about their marketing strategy. These solutions include i) the Smart Topic API, which automatically maps keywords associated with published papers to semantically characterized topics, which are drawn from a very large and automatically-generated ontology of Computer Science topics; ii) the Smart Topic Miner, which helps editors to associate scholarly metadata to books; and iii) the Smart Book Recommender, which assists editors in deciding which editorial products should be marketed in a specific venue.
Clustering Citation Distributions for Semantic Categorization and Citation Prediction
by F. Osborne, S. Peroni, E. Motta
In this paper we present i) an approach for clustering authors according to their citation distributions and ii) an ontology, the Bibliometric Data Ontology, for supporting the formal representation of such clusters. This method allows the formulation of queries which take into consideration the citation behaviour of an author, and it predicts future citation behaviour with a good level of accuracy. We evaluate our approach with respect to alternative solutions and discuss the predictive abilities of the identified clusters.
URL: http://oro.open.ac.uk/40784/1/lisc2014.pdf
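A minimal sketch of the clustering step, grouping authors by the shape of their citation-per-year curves with a naive k-means, is given below. The deterministic initialisation, distance measure and example data are illustrative assumptions, not the paper's actual method.

```python
def assign(point, centroids):
    """Index of the nearest centroid (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centroids[j])))

def kmeans_labels(points, k=2, iters=20):
    """Naive k-means; returns a cluster label for each point.
    Initial centroids are spread across the input order (deterministic)."""
    centroids = [points[i * (len(points) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[assign(p, centroids)].append(p)
        centroids = [[sum(dim) / len(cl) for dim in zip(*cl)] if cl else centroids[j]
                     for j, cl in enumerate(clusters)]
    return [assign(p, centroids) for p in points]

# Citations received per year (5 years) for four hypothetical authors.
curves = [
    [1, 2, 4, 8, 16],   # rising
    [0, 3, 5, 9, 14],   # rising
    [5, 5, 5, 5, 5],    # flat
    [6, 5, 6, 5, 6],    # flat
]
# The two rising authors end up in one cluster, the two flat ones in the other.
print(kmeans_labels(curves, k=2))
```

Once authors are grouped this way, a cluster membership is exactly the kind of fact the Bibliometric Data Ontology can represent and queries can filter on.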
Lessons Learnt from the Named Entity rEcognition and Linking (NEEL) Challenge...
by Marieke van Erp
Giuseppe Rizzo, Bianca Pereira, Andrea Varga, Marieke van Erp, Amparo Elizabeth Cano Basave
Presented on Wednesday 10 October at the 17th International Semantic Web Conference (ISWC 2018)
Paper: http://www.semantic-web-journal.net/content/lessons-learnt-named-entity-recognition-and-linking-neel-challenge-series
Conference: http://iswc2018.semanticweb.org/
Online Index Extraction from Linked Open Data Sources
by Fabio Benedetti

I gave this presentation at the Linked Data for Information Extraction 2014 (LD4IE) workshop, held at the International Semantic Web Conference 2014. The related paper, "Online Index Extraction from Linked Open Data Sources", is available at: http://ceur-ws.org/Vol-1267/LD4IE2014_Benedetti.pdf
Slides for the following paper: NLP Data Cleansing Based on Linguistic Ontology Constraints
Abstract: Linked Data comprises an unprecedented volume of structured data on the Web and is being adopted by an increasing number of domains. However, the varying quality of published data forms a barrier to further adoption, especially for Linked Data consumers. In this paper, we extend a previously developed methodology for Linked Data quality assessment, which is inspired by test-driven software development. Specifically, we enrich it with ontological support and different levels of result reporting, and describe how the method is applied in the Natural Language Processing (NLP) area. NLP is, compared to other domains such as biology, a late Linked Data adopter; however, it has seen a steep rise of activity in the creation of data and ontologies. Data quality assessment has become an important need for NLP datasets. In our study, we analysed 11 datasets using the lemon and NIF vocabularies in 277 test cases and point out common quality issues.
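The test-driven flavour of the methodology can be illustrated with a toy constraint check: each test case encodes one ontology constraint and reports the resources that violate it. The triples and the specific constraint below are illustrative; the methodology described in the paper expresses its test cases over full RDF datasets.

```python
# Toy RDF data as (subject, predicate, object) triples; prefixes abbreviated.
triples = [
    ("ex:entry1", "rdf:type", "lemon:LexicalEntry"),
    ("ex:entry1", "lemon:canonicalForm", "ex:form1"),
    ("ex:entry2", "rdf:type", "lemon:LexicalEntry"),  # no canonical form!
]

def missing_canonical_form(triples):
    """One test case: every lemon:LexicalEntry should have a
    lemon:canonicalForm. Returns the violating resources."""
    entries = {s for s, p, o in triples
               if p == "rdf:type" and o == "lemon:LexicalEntry"}
    with_form = {s for s, p, o in triples if p == "lemon:canonicalForm"}
    return sorted(entries - with_form)

print(missing_canonical_form(triples))  # ['ex:entry2']
```

A real quality-assessment run is a battery of such test cases (277 in the study above), with the violations aggregated into the different levels of result reporting the paper describes.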
Natural Language Interface to Knowledge Graph
by Vaticle
Natural language interfaces (NLI) offer end-users an easy and convenient way to query ontology-based knowledge graphs. They automatically generate database queries based on their natural language inputs, avoiding the need for the end user to learn different query languages. NLIs can be used with REST APIs to facilitate and enrich the interactions with knowledge graphs, in domains such as interactive root cause analysis (RCA), dynamic dashboard generation, and Online Transactional Processing (OLTP).
In this talk, you'll learn about a natural language interface built with a TypeDB server running on Raspberry Pi4. This application offers a conversational bot assistant with Cisco Webex for an efficient and flexible way to facilitate human-machine interactions. In particular, this talk will demonstrate how natural language inputs are translated into TypeQL queries using Abstract Syntax Trees that represent the syntactic structure discovered during the Named Entity Recognition (NER) analysis of the textual inputs provided by Rasa 2.X running on an Intel Celeron J3455 miniPC.
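A toy version of the final translation step, turning NER output into a TypeQL query string, might look like the following. The entity labels, schema names and query template are hypothetical illustrations, not Vaticle's actual implementation.

```python
# Hypothetical mapping from NER output to a TypeQL query template.
# In the talk's system the structure comes from an Abstract Syntax Tree
# built over Rasa's NER results; here we use a flat list of (label, value)
# pairs and a single fixed template for brevity.
def to_typeql(ner_entities):
    """ner_entities: list of (label, value) pairs from the NER step."""
    var = "$x"
    entity_type = next(v for l, v in ner_entities if l == "entity_type")
    clauses = [f"{var} isa {entity_type}"]
    for label, value in ner_entities:
        if label != "entity_type":
            clauses.append(f'has {label} "{value}"')
    return "match " + ", ".join(clauses) + f"; get {var};"

# "show me the device called router-1" -> NER -> TypeQL
print(to_typeql([("entity_type", "device"), ("name", "router-1")]))
# match $x isa device, has name "router-1"; get $x;
```

The interesting design point is that the end user never sees TypeQL at all: the conversational bot owns the template, so schema changes are absorbed in the mapping layer rather than in user training.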
Unit1 - Individual Project Due on 03292014A company wan.docx
by dickonsondorris
Unit1 - Individual Project Due on 03/29/2014
A company wants to create 4 separate offices across the globe. The managers want to be able to connect these offices over the Internet, and they want to make sure that each network is available to each other. That is, they want the routing information of all 4 networks to be available at each site so that any employee can go to any site and work from it, but they are worried about passing traffic over the Internet and how to monitor the traffic for security and compliance reasons.
Please answer the following questions and requirements to write your 3–5 page paper. As you answer each question, you must provide support or evidence that will enhance and empirically prove your answers. Academic IT articles or real-life IT findings that are not found in journals or other academic sources must be used in supporting your answers. Please use APA style for all cited sources, including your reference page.
· Discuss the latest implementations of routing protocols that would be used in the company's wide area network and the Internet.
· Where is each protocol used in a wide area network (WAN)? How does each protocol work in a WAN?
· How would the routers help in monitoring passing traffic over the networks and Internet under safe and secure conditions? Provide examples of how it would be done and why it would be done in this fashion.
· If you were able to recommend another way to build this network from the idea and requirements of the company, what would be your recommendation? Why would you recommend this alternative?
Be sure to reference all sources using APA style.
This assignment will also be assessed using additional criteria provided here.
Please submit your assignment.
Unit2 - Individual Project due on 4/5/2014
Discuss the importance of using a routing protocol and explain its general functionality. Router Information Protocol (RIP) is a distance-vector routing protocol that can be used by a router to send data from one network to another. What are the demerits of distance-vector routing protocols in general?
Assignment Guidelines:
· Using the course materials, the textbook, and Web resources research distance-vector routing protocols, with a high concentration on the versions of Router Information Protocol (RIP).
· Use the following questions to help you format your research paper:
What is the purpose of a routing protocol?
What is a distance-vector routing protocol?
What are the advantages and disadvantages of distance-vector protocols?
What is RIP?
What are the two versions of RIP, and what are their specific purposes?
What are the limitations of RIP?
· Compile your responses to the above questions in a 4–7 page word document.
Your submitted assignment (125 points) must include the following:
· A 4–7 page, double-spaced Microsoft Word document, excluding title page and reference section. All references should be in APA format. You may find using section headings and bullets make the paper mo ...
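As background for the RIP questions above, the distance-vector update a router performs on each neighbour advertisement can be sketched as follows. This is a simplified Bellman-Ford relaxation; real RIP adds timers, split horizon and triggered updates, and its count-to-infinity limit of 16 hops is one of the demerits the assignment asks about.

```python
INFINITY = 16  # RIP treats a metric of 16 as "unreachable"

def dv_update(own_table, neighbor_table, link_cost=1):
    """Apply one neighbour advertisement to our routing table:
    adopt a route through the neighbour when it is cheaper
    (the Bellman-Ford relaxation distance-vector protocols use)."""
    changed = False
    for dest, metric in neighbor_table.items():
        candidate = min(metric + link_cost, INFINITY)
        if candidate < own_table.get(dest, INFINITY):
            own_table[dest] = candidate
            changed = True
    return changed

table_a = {"net1": 0}                       # router A is directly on net1
dv_update(table_a, {"net1": 1, "net2": 1})  # advertisement from neighbour B
print(table_a)  # {'net1': 0, 'net2': 2}
```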
Profile-based Dataset Recommendation for RDF Data Linking
by Mohamed BEN ELLEFI
With the emergence of the Web of Data, most notably Linked Open Data (LOD), an abundance of data has become available on the web. However, LOD datasets and their inherent subgraphs vary heavily with respect to their size, topic and domain coverage, their schemas and their data dynamicity (respectively schemas and metadata) over time. To this extent, identifying suitable datasets, which meet specific criteria, has become an increasingly important, yet challenging task to support issues such as entity retrieval, semantic search and data linking. Particularly with respect to the interlinking issue, the current topology of the LOD cloud underlines the need for practical and efficient means to recommend suitable datasets: currently, only well-known reference graphs such as DBpedia (the most obvious target), YAGO or Freebase show a high amount of in-links, while there exists a long tail of potentially suitable yet under-recognized datasets. This problem is due to the semantic web tradition in dealing with "finding candidate datasets to link to", where data publishers are used to identifying target datasets for interlinking.
While an understanding of the nature of the content of specific datasets is a crucial prerequisite for the mentioned issues, we adopt in this dissertation the notion of "dataset profile": a set of features that describe a dataset and allow the comparison of different datasets with regard to their represented characteristics. Our first research direction was to implement a collaborative filtering-like dataset recommendation approach, which exploits both existing dataset topic profiles and traditional dataset connectivity measures, in order to link LOD datasets into a global dataset-topic-graph. This approach relies on the LOD graph in order to learn the connectivity behaviour between LOD datasets. However, experiments have shown that the current topology of the LOD cloud is far from complete enough to be considered a ground truth and, consequently, learning data.
Facing the limits of the current topology of the LOD cloud (as learning data), our research led us to break away from the topic-profile representation of the "learn to rank" approach and to adopt a new approach to candidate dataset identification, where the recommendation is based on the intensional profile overlap between different datasets. By intensional profile, we understand the formal representation of a set of schema concept labels that best describe a dataset and can be potentially enriched
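The intensional-profile overlap idea above can be sketched as a simple set-similarity ranking. The following is a minimal, hypothetical illustration, not the thesis implementation: profiles are modelled as plain sets of schema concept labels, Jaccard overlap stands in for the overlap measure, and all dataset names and labels are made up for the example.

```python
# Hypothetical sketch: rank candidate target datasets by the overlap of
# their intensional profiles (sets of schema concept labels) with a source
# dataset. Jaccard similarity is an illustrative stand-in for the measure.

def profile_overlap(profile_a: set, profile_b: set) -> float:
    """Jaccard overlap between two intensional profiles."""
    union = profile_a | profile_b
    if not union:
        return 0.0
    return len(profile_a & profile_b) / len(union)

def recommend_targets(source: set, candidates: dict, k: int = 3) -> list:
    """Rank candidate datasets by profile overlap with the source dataset."""
    ranked = sorted(candidates.items(),
                    key=lambda item: profile_overlap(source, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

# Toy example: which dataset is the best interlinking target?
source = {"Person", "Place", "Organisation"}
candidates = {
    "dbpedia": {"Person", "Place", "Organisation", "Work"},
    "geonames": {"Place", "Feature"},
    "drugbank": {"Drug", "Target"},
}
print(recommend_targets(source, candidates))
# → ['dbpedia', 'geonames', 'drugbank']
```

In practice the overlap would be computed over enriched, semantically expanded label sets rather than raw string matches, but the ranking skeleton stays the same.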
This presentation was provided by Phil Norman of OCLC and Peter McCracken of Serials Solutions, during the NISO event "OpenURL Implementation: Link Resolution That Users Will Love," held on August 21, 2008.
AI value, tools and applications in public services: the application in easyRights, an H2020 project supporting social inclusion, and two ongoing studies on AI applied to the fight against COVID-19. Seminar at Politecnico di Milano
Learning with the Web: Spotting Named Entities on the intersection of NERD an...Giuseppe Rizzo
Talk "Learning with the Web: Spotting Named Entities on the Intersection of NERD and Machine Learning", given during #MSM'13 (WWW'13), Rio de Janeiro, Brazil
Microposts shared on social platforms instantaneously report facts, opinions or emotions. In these posts, entities are often used but they are continuously changing depending on what is currently trending. In such a scenario, recognising these named entities is a challenging task, for which off-the-shelf approaches are not well equipped. We propose NERD-ML, an approach that unifies the benefits of a crowd entity recognizer through Web entity extractors combined with the linguistic strengths of a machine learning classifier.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.Sérgio Sacani
The return of a sample of near-surface atmosphere from Mars would facilitate answers to several first-order science questions surrounding the formation and evolution of the planet. One of the important aspects of terrestrial planet formation in general is the role that primary atmospheres played in influencing the chemistry and structure of the planets and their antecedents. Studies of the martian atmosphere can be used to investigate the role of a primary atmosphere in its history. Atmosphere samples would also inform our understanding of the near-surface chemistry of the planet, and ultimately the prospects for life. High-precision isotopic analyses of constituent gases are needed to address these questions, requiring that the analyses are made on returned samples rather than in situ.
Nutraceutical market, scope and growth: Herbal drug technologyLokesh Patil
The nutraceutical market, which includes goods such as functional foods, drinks, and dietary supplements that provide health benefits beyond basic nutrition, is growing significantly as consumer awareness of health and wellness rises. As healthcare costs rise, the population ages, and demand for natural and preventive health solutions increases, this industry is expanding quickly. Innovations in product formulation and the use of cutting-edge technology for customized nutrition further drive market expansion. With its worldwide reach, the nutraceutical industry is expected to keep growing and to offer significant opportunities for research and investment across a number of categories, including vitamins, minerals, probiotics, and herbal supplements.
This presentation explores a brief idea about the structural and functional attributes of nucleotides, the structure and function of genetic materials along with the impact of UV rays and pH upon them.
Professional air quality monitoring systems provide immediate, on-site data for analysis, compliance, and decision-making.
Monitor common gases, weather parameters, particulates.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Sérgio Sacani
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters spanning 0.4–0.9 µm) and novel JWST images with 14 filters spanning 0.8–5 µm, including 7 medium-band filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data at > 2.3 µm to construct an ultradeep image, reaching as deep as ≈ 31.4 AB mag in the stack and 30.3–31.0 AB mag (5σ, r = 0.1″ circular aperture) in individual filters. We measure photometric redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts z = 11.5–15. These objects show compact half-light radii of R_1/2 ∼ 50–200 pc, stellar masses of M⋆ ∼ 10^7–10^8 M⊙, and star-formation rates of SFR ∼ 0.1–1 M⊙ yr^−1. Our search finds no candidates at 15 < z < 20, placing upper limits at these redshifts. We develop a forward modeling approach to infer the properties of the evolving luminosity function, without binning in redshift or luminosity, that marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results, and that the luminosity function normalization and UV luminosity density decline by a factor of ∼ 2.5 from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical models for evolution of the dark matter halo mass function.
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...University of Maribor
Slides from:
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Track: Artificial Intelligence
https://www.etran.rs/2024/en/home-english/
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Sérgio Sacani
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes on Io's surface have been monitored from both spacecraft and ground-based telescopes. Here, we present the highest spatial resolution images of Io ever obtained from a ground-based telescope. These images, acquired by the SHARK-VIS instrument on the Large Binocular Telescope, show evidence of a major resurfacing event on Io's trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images show that a plume deposit from a powerful eruption at Pillan Patera has covered part of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high-resolution imaging of Io's surface using adaptive optics at visible wavelengths.
PRESENTATION ABOUT PRINCIPLES OF COSMETIC EVALUATION
NEEL2015 challenge summary
1. Making Sense of Microposts
(#Microposts2015) @ WWW2015
Named Entity rEcognition
and Linking Challenge
http://www.scc.lancs.ac.uk/microposts2015/challenge/
2. NEEL challenge overview
➢ Challenging to make sense of Microposts
○ they are very short text messages
○ they contain abbreviations and typos
○ they are “grammar free”
➢ The NEEL challenge aims to explore new
approaches to foster research into novel,
more accurate entity recognition and linking
approaches tailored for Microposts
3. 2013 — Information Extraction (IE):
named entity recognition (4 types)
2014 — Named Entity Extraction and Linking (NEEL):
named entity extraction and linking to DBpedia 3.9 entries
2015 — Named Entity rEcognition and Linking (NEEL):
named entity recognition (7 types) and linking to DBpedia 2014 entries
4. Highlights of the submitted approaches over the 3-year challenge
➢ normalization
○ linguistic pre-processing and expansion of tweets
➢ entity recognition and linking
○ sequential and semi-joint tasks
○ large Knowledge Bases (such as DBpedia and Yago) as lexical dictionaries and sources of already existing relations among entities
○ supervised learning approaches to both predict the type of the entity given the linguistic and contextual similarity, and the link given the semantic similarity
○ unsupervised learning approaches for grouping similar lexical entities, affecting the entity resolution
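The "knowledge base as lexical dictionary" step above can be made concrete with a toy sketch. This is an illustrative assumption, not any participant's system: the dictionary maps lowercase surface forms to candidate entities, and a naive context-word overlap stands in for the learned similarity models the slide mentions.

```python
# Toy sketch of dictionary-based candidate lookup plus context-similarity
# disambiguation for entity linking in tweets. All data is illustrative.

def link_mention(mention: str, context: set, dictionary: dict):
    """Return the URI of the candidate whose context words best overlap
    the tweet's context words, or None if the mention is unknown."""
    candidates = dictionary.get(mention.lower(), [])
    if not candidates:
        return None
    best = max(candidates,
               key=lambda c: len(context & c["context_words"]))
    return best["uri"]

# A tiny lexical dictionary derived from a knowledge base such as DBpedia.
dictionary = {
    "paris": [
        {"uri": "dbpedia:Paris", "context_words": {"france", "city", "eiffel"}},
        {"uri": "dbpedia:Paris_Hilton", "context_words": {"celebrity", "hotel"}},
    ],
}

print(link_mention("Paris", {"trip", "france", "eiffel"}, dictionary))
# → dbpedia:Paris
```

Real systems replace the overlap count with supervised models over linguistic, contextual, and semantic features, as described in the slide.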
5. Sponsorship
➢ Successfully obtained sponsorship each year
○ highlights importance of this practical research
○ importance extends BEYOND academia
➢ Sponsor has early access to results as senior
PC member
○ opportunity to liaise with participants to extend work
➢ Workshop and participants obtain greater
exposure
6. ➢ Italian company operating in the business of
knowledge extraction and representation
➢ successfully participated in 2014 NEEL
challenge, ranking 3rd overall
8. 21 teams finally got involved and signed the agreement to access the NEEL challenge corpus
9. NEEL corpus
Split         no. of tweets   %
Training      3498            58.06
Development   500             8.30
Test          2027            33.64
10. NEEL Corpus details
➢ 6025 tweets
○ events from 2011 and 2013 such as the London Riots and the Oslo bombing (cf. event-annotated tweets provided by the Redites project)
○ events in 2014 such as the UCI Cyclo-cross World Cup
➢ Corpus available after having signed the NEEL Agreement Form (remains available by contacting msm.orgcom@gmail.com)
11. Manual creation of the Gold Standard
3-step annotation:
1. unsupervised annotations, with the intent to extract candidate links which were used as input to the second stage; NERD-ML was used as the off-the-shelf system
2. three human annotators analyzed and complemented the annotations; GATE was used as the workbench
3. one domain expert reviewed and resolved problematic cases
12. Evaluation protocol
Participants were asked to wrap their prototypes as a publicly accessible web service following a REST-based protocol.
This widens the dissemination and ensures the reproducibility, reuse, and correctness of the results.
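A minimal sketch of such a REST wrapper, using only the Python standard library: endpoint path, payload fields, and the dummy annotator are all illustrative assumptions, not the challenge's actual interface specification.

```python
# Hypothetical sketch of wrapping an annotator as a REST web service.
# A POST request carries {"text": "..."} and receives JSON annotations back.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def annotate(text: str) -> list:
    """Dummy annotator: mark capitalised tokens as candidate entity mentions."""
    return [{"mention": tok, "link": None}
            for tok in text.split() if tok[:1].isupper()]

class NEELHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"annotations": annotate(payload["text"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# To expose the service (blocks until interrupted):
# HTTPServer(("0.0.0.0", 8080), NEELHandler).serve_forever()
```

An organizer-side evaluator can then POST each test tweet to every team's endpoint and compute metrics over the returned annotations, which is what makes the D-Time/T-Time periods on the next slide possible.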
13. Evaluation periods
D-Time: to test the contending entries (REST APIs) submitted by the participants
T-Time: for the final evaluation and metric computations
14. Submissions and Runs
➢ Paper submission
○ describing the approach taken
○ identifying and detailing any limitations or dependencies of the approach
➢ Up to 10 contending entries
○ best of 3 used for the final ranking
19. Drop of 14 participants due to
i) the complexity of the challenge protocol, which required broad expertise in different domains such as Information Extraction, Data Semantics, and the Web
ii) generally low results
28. Acknowledgements
The research leading to this
work was partially supported by
the European Union’s 7th
Framework Programme via the
projects LinkedTV