The document describes a system for identifying answers and implicit dialogues in community question answering sites. It examines multiple feature types, including string similarity, word embeddings, topic modeling and keyword features. An SVM classifier is used to rank answers for three subtasks: comment classification, question-comment similarity and question-external comment similarity. Evaluation on the SemEval 2017 Task 3 dataset shows the full feature set performs best, with string and embedding features also providing significant contributions to performance.
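The ranking pipeline described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the toy embedding, the vocabulary and the training triples are invented for the example, and only two of the paper's feature families (string similarity and embedding cosine) are shown.

```python
# Sketch: rank candidate answers to a question with an SVM over
# string-similarity and embedding-cosine features (illustrative only).
from difflib import SequenceMatcher
import numpy as np
from sklearn.svm import SVC

VOCAB = {"password": 0, "reset": 1, "login": 2, "forgot": 3,
         "hotel": 4, "doha": 5, "corniche": 6, "pizza": 7, "driver": 8}

def toy_embed(text):
    # Stand-in for word embeddings: bag-of-words over a tiny fixed vocabulary.
    v = np.zeros(len(VOCAB))
    for tok in text.lower().split():
        if tok in VOCAB:
            v[VOCAB[tok]] += 1.0
    return v

def features(question, answer):
    """Surface string overlap plus cosine similarity of averaged embeddings."""
    string_sim = SequenceMatcher(None, question, answer).ratio()
    q, a = toy_embed(question), toy_embed(answer)
    cos = float(np.dot(q, a) / (np.linalg.norm(q) * np.linalg.norm(a)))
    return [string_sim, cos]

# Tiny training set: (question, answer, is_good) triples.
train = [
    ("how to reset my password", "click forgot password on the login page", 1),
    ("how to reset my password", "i like pizza", 0),
    ("best hotel in doha", "the hotel near the corniche", 1),
    ("best hotel in doha", "reinstall the driver", 0),
]
X = [features(q, a) for q, a, _ in train]
y = [label for _, _, label in train]
clf = SVC(kernel="linear").fit(X, y)

# Rank candidate answers by the SVM decision value.
cands = ["use the forgot password link", "i like pizza"]
scores = [clf.decision_function([features("how to reset my password", c)])[0]
          for c in cands]
ranked = [c for _, c in sorted(zip(scores, cands), reverse=True)]
```

In the real system the decision values would be used to order all comments of a thread, and the feature vector would also include topic-model and keyword features.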
LAK19 - Towards Value-Sensitive Learning Analytics Design (Bodong Chen)
LAK19 Full Paper. Abstract: To support ethical considerations and system integrity in learning analytics, this paper introduces two cases of applying the Value Sensitive Design methodology to learning analytics design. The first study applied two methods of Value Sensitive Design, namely stakeholder analysis and value analysis, to a conceptual investigation of an existing learning analytics tool. This investigation uncovered a number of values and value tensions, leading to design trade-offs to be considered in future tool refinements. The second study holistically applied Value Sensitive Design to the design of a recommendation system for the Wikipedia WikiProjects. To proactively consider values among stakeholders, we derived a multi-stage design process that included literature analysis, empirical investigations, prototype development, community engagement, iterative testing and refinement, and continuous evaluation. By reporting on these two cases, this paper responds to the need for practical means of supporting ethical considerations and human values in learning analytics systems. These two cases demonstrate that Value Sensitive Design could be a viable approach for balancing a wide range of human values, which tend to encompass and surpass ethical issues, in learning analytics design.
24/7 Instant Feedback on Writing: Integrating AcaWriter into your Teaching (Simon Buckingham Shum)
https://cic.uts.edu.au/events/24-7-instant-feedback-on-writing-integrating-acawriter-into-your-teaching-2-dec/
What difference could instant feedback on draft writing make to your students? Over the last 5 years the Connected Intelligence Centre has been developing and piloting an automated feedback tool for academic writing (AcaWriter), working closely with academics across several faculties. The research portal documents how educators and students engage with this kind of AI, and what we’ve learnt about integrating it into teaching and assessment.
In May, AcaWriter was launched to all students along with an information portal. Now we want to start upskilling academics, tutors and learning technologists, in a monthly session to give you the chance to learn about AcaWriter, and specifically, good practices for integrating it into your subject. CIC can support you, and we hope you may be interested in co-designing publishable research.
AcaWriter handles several different ‘genres’ of writing, including reflective writing (e.g. a Reflective Essay; Reflective Blogs/Journals on internships/work-placements) and analytical writing (e.g. Argumentative Essays; Research Abstracts & Introductions).
This briefing will demo AcaWriter and show how it can be embedded in student activities. We hope this sparks ideas for your own teaching, which we can discuss in more detail.
Graph's not dead: from unsupervised induction of linguistic structures from t... (Alexander Panchenko)
In this invited talk, presented at the Dialogue'2018 conference, I argue for the usefulness of graph representations for NLP in the deep learning era. The lecture describes how to extract symbolic linguistic structures, such as word senses and semantic frames, from text corpora in an unsupervised way using graph-based algorithms and distributional semantics.
Building a Web-Scale Dependency-Parsed Corpus from Common Crawl (Alexander Panchenko)
We present DepCC, the largest-to-date linguistically analyzed corpus of English, comprising 365 million documents with 252 billion tokens, 7.5 billion named-entity occurrences and 14.3 billion sentences from a web-scale crawl of the Common Crawl project. The sentences are processed with a dependency parser and a named entity tagger and carry provenance information, enabling applications ranging from training syntax-based word embeddings to open information extraction and question answering. We built an index of all sentences and their linguistic metadata, enabling quick search across the corpus. We demonstrate the utility of the corpus on the verb similarity task: a distributional model trained on DepCC outperforms state-of-the-art models trained on smaller corpora, such as Wikipedia, on the SimVerb-3500 dataset.
http://www.lrec-conf.org/proceedings/lrec2018/summaries/215.html
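The verb-similarity evaluation mentioned above boils down to correlating model cosine similarities with human judgments via Spearman's rank correlation. A minimal sketch, where the vectors and the SimVerb-style (verb, verb, gold score) pairs are toy stand-ins rather than DepCC data:

```python
# Sketch: evaluate a distributional model on a verb-similarity benchmark
# by Spearman correlation of cosine similarities with human scores.
import numpy as np

vectors = {  # toy word vectors standing in for a trained model
    "buy":      np.array([0.90, 0.10, 0.00]),
    "purchase": np.array([0.85, 0.15, 0.05]),
    "run":      np.array([0.10, 0.90, 0.20]),
    "sprint":   np.array([0.15, 0.85, 0.25]),
    "sleep":    np.array([0.00, 0.20, 0.90]),
}

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# (verb1, verb2, human similarity score) -- invented, SimVerb-style.
pairs = [("buy", "purchase", 9.2), ("run", "sprint", 8.7), ("buy", "sleep", 1.1)]
model_scores = [cosine(vectors[u], vectors[v]) for u, v, _ in pairs]
gold_scores = [g for _, _, g in pairs]

def spearman(x, y):
    """Spearman rho = Pearson correlation of the rank sequences."""
    rank = lambda s: np.argsort(np.argsort(np.array(s))).astype(float)
    return float(np.corrcoef(rank(x), rank(y))[0, 1])

rho = spearman(model_scores, gold_scores)
```

A higher rho means the model orders verb pairs more like the human annotators do; on these toy values the orders agree perfectly.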
Improving Hypernymy Extraction with Distributional Semantic Classes (Alexander Panchenko)
http://www.lrec-conf.org/proceedings/lrec2018/pdf/234.pdf
In this paper, we show how distributionally induced semantic classes can be helpful for extracting hypernyms. We present methods for inducing sense-aware semantic classes using distributional semantics, and for using these induced classes to filter noisy hypernymy relations. Denoising is performed by labeling each semantic class with its hypernyms: on the one hand, this allows us to filter out wrong extractions using the global structure of distributionally similar senses; on the other hand, we infer missing hypernyms by propagating class labels to cluster terms. A large-scale crowdsourcing study shows that post-processing automatically extracted hypernyms with our approach improves both the precision and the recall of hypernymy extraction. Furthermore, we show the utility of our method for domain taxonomy induction, achieving state-of-the-art results on the SemEval'16 taxonomy induction task.
The paper was presented at the LREC'2018 conference in Miyazaki, Japan.
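The denoising idea, labeling each induced semantic class with its majority hypernym, dropping extractions that disagree with the class label, and propagating the label to class members with no extracted hypernym, can be sketched as follows. The clusters and noisy extractions below are invented for illustration:

```python
# Sketch: denoise hypernymy extractions using semantic classes.
from collections import Counter

clusters = [{"apple", "mango", "durian"}, {"bmw", "audi", "toyota"}]
noisy = {  # term -> extracted hypernyms (some wrong, some terms missing)
    "apple": ["fruit", "company"],
    "mango": ["fruit"],
    "bmw": ["car"],
    "audi": ["car"],
}

def denoise(clusters, noisy):
    clean = {}
    for cluster in clusters:
        # The majority hypernym over the whole class becomes the class label.
        votes = Counter(h for t in cluster for h in noisy.get(t, []))
        if not votes:
            continue
        label, _ = votes.most_common(1)[0]
        # Keep only the class label; propagate it to uncovered members.
        for term in cluster:
            clean[term] = [label]
    return clean

clean = denoise(clusters, noisy)
```

Here "company" is filtered out for "apple" because the class votes for "fruit", and "durian" and "toyota" gain hypernyms they were never extracted with, which is how the approach improves both precision and recall.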
Inducing Interpretable Word Senses for WSD and Enrichment of Lexical Resources (Alexander Panchenko)
In this talk, we discuss the induction of sparse and dense word sense representations using graph-based approaches and distributional models. Induced senses are represented not only by a vector but also by a set of hypernyms, images, and usage examples, all derived in an unsupervised and knowledge-free manner, which ensures that the discovered senses are interpretable by humans. We showcase the use of the induced representations for word sense disambiguation and for the enrichment of lexical resources such as WordNet.
Fighting with Sparsity of the Synonymy Dictionaries for Automatic Synset Indu... (Alexander Panchenko)
Presentation at the AIST'17 conference by Dmitry Ustalov. Authors of the original paper: Dmitry Ustalov, Mikhail Chernoskutov, Chris Biemann, Alexander Panchenko.
Using Linked Disambiguated Distributional Networks for Word Sense Disambiguation (Alexander Panchenko)
We introduce a new method for unsupervised knowledge-based word sense disambiguation (WSD) based on a resource that links two types of sense-aware lexical networks: one induced from a corpus using distributional semantics, the other manually constructed. The combination of the two networks reduces the sparsity of the sense representations used for WSD. We evaluate these enriched representations on two lexical-sample sense disambiguation benchmarks. Our results indicate that (1) features extracted from the corpus-based resource help to significantly outperform a model based solely on the lexical resource, and (2) our method achieves results comparable to or better than those of four state-of-the-art unsupervised knowledge-based WSD systems, including three hybrid systems that also rely on text corpora. In contrast to these hybrid methods, our approach does not require access to web search engines, texts mapped to a sense inventory, or machine translation systems.
See the full paper at: http://www.aclweb.org/anthology/W/W17/W17-1909.pdf
Panchenko A., Faralli S., Ponzetto S. P., and Biemann C. (2017): Using Linked Disambiguated Distributional Networks for Word Sense Disambiguation. In Proceedings of the Workshop on Sense, Concept and Entity Representations and their Applications (SENSE) co-located with the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL'2017). Valencia, Spain. Association for Computational Linguistics
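The underlying disambiguation step, choosing the sense whose combined network neighbourhood best matches the context, can be sketched roughly as follows. The sense inventories here are toy examples, not the paper's linked networks:

```python
# Sketch: knowledge-based WSD by overlap between the context and each
# sense's neighbourhood, built as the union of related terms from a
# corpus-induced network and a manually constructed one (toy data).
senses = {
    "bank#finance": {"money", "account", "loan"} | {"institution", "deposit"},
    "bank#river":   {"water", "shore", "stream"} | {"slope", "land"},
}

def disambiguate(context_words, senses):
    """Return the sense with the largest neighbourhood/context overlap."""
    context = set(context_words)
    return max(senses, key=lambda s: len(senses[s] & context))

sense = disambiguate(["the", "loan", "and", "deposit", "money"], senses)
```

Linking the two networks enlarges each sense's neighbourhood, which is exactly how the combination reduces the sparsity of the sense representations.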
Unsupervised Does Not Mean Uninterpretable: The Case for Word Sense Induction... (Alexander Panchenko)
The current trend in NLP is the use of highly opaque models, e.g. neural networks and word embeddings. While these models yield state-of-the-art results on a range of tasks, their drawback is poor interpretability. Using word sense induction and disambiguation (WSID) as an example, we show that it is possible to develop an interpretable model that matches state-of-the-art models in accuracy. Namely, we present an unsupervised, knowledge-free WSID approach, which is interpretable at three levels: word sense inventory, sense feature representations, and disambiguation procedure. Experiments show that our model performs on par with state-of-the-art word sense embeddings and other unsupervised systems while offering the possibility to justify its decisions in human-readable form.
Noun Sense Induction and Disambiguation using Graph-Based Distributional Sema... (Alexander Panchenko)
We introduce an approach to word sense induction and disambiguation. The method is unsupervised and knowledge-free: sense representations are learned from distributional evidence and subsequently used to disambiguate word instances in context. These sense representations are obtained by clustering dependency-based second-order similarity networks. We then add features for disambiguation from heterogeneous sources such as window-based and sentence-wide co-occurrences, and explore various schemes to combine these context clues. Our method reaches a performance comparable to the state-of-the-art unsupervised word sense disambiguation systems, including top participants of the SemEval 2013 word sense induction task and two more recent state-of-the-art neural word sense induction systems.
Full paper:
https://www.lt.informatik.tu-darmstadt.de/fileadmin/user_upload/Group_LangTech/publications/konvens2016panchenko.pdf
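Clustering such similarity networks into senses is commonly done with label-propagation algorithms such as Chinese Whispers. A minimal sketch on a toy ego network for an ambiguous word (the graph, weights and iteration count are illustrative, not the paper's setup):

```python
# Sketch: Chinese-Whispers-style clustering of a similarity ego network
# for "jaguar" into senses (animal vs. car brand). Toy graph.
import random

edges = {  # node -> {neighbour: edge weight}
    "leopard": {"tiger": 1.0, "lion": 1.0},
    "tiger":   {"leopard": 1.0, "lion": 1.0},
    "lion":    {"leopard": 1.0, "tiger": 1.0},
    "bmw":     {"audi": 1.0, "porsche": 1.0},
    "audi":    {"bmw": 1.0, "porsche": 1.0},
    "porsche": {"bmw": 1.0, "audi": 1.0},
}

def chinese_whispers(edges, iterations=20, seed=0):
    rng = random.Random(seed)
    labels = {node: node for node in edges}  # each node starts in its own class
    nodes = list(edges)
    for _ in range(iterations):
        rng.shuffle(nodes)
        for node in nodes:
            # Adopt the neighbour label with the highest total edge weight.
            score = {}
            for nb, w in edges[node].items():
                score[labels[nb]] = score.get(labels[nb], 0.0) + w
            labels[node] = max(score, key=score.get)
    return labels

labels = chinese_whispers(edges)
senses = {frozenset(n for n in labels if labels[n] == l) for l in labels.values()}
```

Each resulting cluster of similar words is then treated as one induced sense of the target word.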
Ayush Kumar, Sarah Kohail, Amit Kumar, Asif Ekbal, Chris Biemann (IIT Patna, India; TU Darmstadt, Germany). Presented by: Alexander Panchenko, TU Darmstadt, Germany.
A sentiment index measures the average emotional level in a corpus. We introduce four such indexes and use them to gauge the average "positiveness" of a population over a given period, based on posts in a social network. This article presents, for the first time, a text-based rather than word-based sentiment index. Furthermore, it presents the first large-scale study of the sentiment index of the Russian-speaking Facebook. Our results are consistent with prior experiments for English.
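The core computation, averaging per-post polarity over all posts in a period, can be sketched as follows. The polarity lexicon and the posts are toy stand-ins for the article's indexes:

```python
# Sketch: a text-based sentiment index as the mean polarity of posts
# in a sample period (lexicon and posts are illustrative).
POLARITY = {"great": 1, "happy": 1, "love": 1, "sad": -1, "awful": -1}

def post_polarity(text):
    """Mean polarity of the lexicon words found in one post (0 if none)."""
    scores = [POLARITY[w] for w in text.lower().split() if w in POLARITY]
    return sum(scores) / len(scores) if scores else 0.0

def sentiment_index(posts):
    """Average emotional level over all posts in the sample."""
    return sum(post_polarity(p) for p in posts) / len(posts)

index = sentiment_index(["what a great day", "feeling sad", "love this happy song"])
```

Scoring whole posts before averaging is what makes this a text-based rather than word-based index: negationless posts with no lexicon hits contribute a neutral 0 instead of being skipped.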
Semantic relations, such as synonymy, hypernymy and co-hyponymy, have proved useful for text processing applications, including text similarity, query expansion, question answering and word sense disambiguation. Such relations are valuable because of the gap between the lexical surface of a text and its meaning: the same concept is often represented by different terms. However, existing resources often do not cover the vocabulary required by a given system, and manual resource construction is prohibitively expensive for many projects.
On the other hand, the precision of existing extractors still does not match the quality of handcrafted resources. All these factors motivate the development of novel extraction methods. In this work we develop several similarity measures for semantic relation extraction. The main research question we address is how to improve the precision and coverage of such measures. First, we perform a large-scale study of the baseline techniques. Second, we propose four novel measures: one significantly outperforms the baselines, and the others perform comparably to state-of-the-art techniques. Finally, we successfully apply one of the novel measures in two text processing systems.
Detecting Gender by Full Name: Experiments with the Russian Language (Alexander Panchenko)
This paper describes a method that detects the gender of a person from his/her full name. While several approaches have been proposed for English, little has been done so far for Russian. We fill this gap and present a large-scale experiment on a dataset of 100,000 Russian full names from Facebook. Our method is based on three types of features (word endings, character n-grams and a dictionary of names) combined within a linear supervised model. Experiments show that this simple and computationally efficient approach yields excellent results, achieving accuracy of up to 96%.
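A rough sketch of the character n-gram feature idea in a linear classifier, assuming scikit-learn. The training names are a toy sample and the pipeline is not the paper's exact model (which also uses word endings and a name dictionary):

```python
# Sketch: gender detection from full names via character n-grams
# in a linear model (toy transliterated training data).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

names = ["Ivan Petrov", "Sergey Ivanov", "Dmitry Sidorov", "Nikolai Smirnov",
         "Anna Petrova", "Olga Ivanova", "Elena Sidorova", "Maria Smirnova"]
labels = ["m", "m", "m", "m", "f", "f", "f", "f"]

# Character 2-3-grams within word boundaries capture informative
# endings such as "-ov" (male) vs. "-ova" (female).
clf = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(2, 3)),
    LogisticRegression(max_iter=1000),
).fit(names, labels)

pred = clf.predict(["Svetlana Orlova", "Alexei Orlov"])
```

Russian surname morphology makes the ending n-grams highly discriminative, which is why such a simple linear model reaches high accuracy in the paper's experiments.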
Computational lexical semantics: semantic similarity measures and their appli... (Alexander Panchenko)
Computational lexical semantics: semantic similarity measures and their applications
A lecture series at HSE University, Faculty of Business Informatics and Applied Mathematics (Nizhny Novgorod)
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a... (Ana Luísa Pinho)
Functional Magnetic Resonance Imaging (fMRI) provides a means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt to individual characteristics. In this talk, I will give an account of deep behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects of interest related to mental processes. Key to this approach is the use of fast multi-functional paradigms rich in features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying higher-order cognitive mechanisms, due to their ecological nature and their capacity to enable complex behavior composed of discrete entities. I will also discuss how deep behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization.
To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
Toxic effects of heavy metals: Lead and Arsenic (sanjana502982)
Heavy metals are naturally occurring metallic elements that have a relatively high density and are toxic even at low concentrations. All such toxic metals are termed heavy metals irrespective of their atomic mass and density, e.g. arsenic, lead, mercury, cadmium, thallium and chromium.
Nutraceutical market, scope and growth: Herbal drug technology (Lokesh Patil)
As consumer awareness of health and wellness rises, the nutraceutical market, which includes products such as functional foods, beverages and dietary supplements that provide health benefits beyond basic nutrition, is growing significantly. Rising healthcare costs, an ageing population, and growing demand for natural and preventative health solutions are driving this rapid expansion. Innovations in product formulation and the use of cutting-edge technology for personalized nutrition further fuel market growth. With its worldwide reach, the nutraceutical industry is expected to keep growing, offering significant opportunities for research and investment across categories including vitamins, minerals, probiotics, and herbal supplements.
Seminar on U.V. Spectroscopy (Samir Panda)
Spectroscopy is the branch of science dealing with the study of the interaction of electromagnetic radiation with matter.
Ultraviolet-visible (UV-Vis) spectroscopy refers to absorption spectroscopy or reflectance spectroscopy in the UV-Vis spectral region.
It is an analytical method that measures the amount of light absorbed by the analyte.
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...University of Maribor
Slides from talk:
Aleš Zamuda: Remote Sensing and Computational, Evolutionary, Supercomputing, and Intelligent Systems.
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Inter-Society Networking Panel GRSS/MTT-S/CIS Panel Session: Promoting Connection and Cooperation
https://www.etran.rs/2024/en/home-english/
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...University of Maribor
Slides from:
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Track: Artificial Intelligence
https://www.etran.rs/2024/en/home-english/
IIT-UHH at SemEval-2017 Task 3: Exploring Multiple Features for Community Question Answering and Implicit Dialogue Identification
1. 1/34
IIT-UHH at SemEval-2017 Task 3: Exploring Multiple
Features for Community Question Answering and
Implicit Dialogue Identification
Titas Nandi1, Chris Biemann2, Seid Muhie Yimam2, Deepak Gupta1,
Sarah Kohail2, Asif Ekbal1 and Pushpak Bhattacharyya1
1Indian Institute of Technology Patna, India
2Universität Hamburg, Germany
{titas.ee13,deepak.pcs16,asif,pb}@iitp.ac.in
{biemann,yimam,kohail}@informatik.uni-hamburg.de
Presented by Alexander Panchenko2
August 3, 2017
2. 2/34
Outline
1 Task Description
Structure of the Task
Related Work
2 System Description
Basic Features
Implicit Dialogue Identification
Statistical Model
3 Results
Results on Different Feature Sets
Comparison with Other Teams at SemEval 2017
4 Conclusions
4. 4/34
SemEval 2017 Task 3: the Three Sub-Tasks
6. 6/34
Related Work
Useful ideas from the best systems of the 2015 and 2016 editions of the task:
Belinkov (2015): word vectors and meta-data features
Nicosia (2015): features derived from a comment in the context of the entire thread
Filice (2016): stacking classifiers across subtasks
7. 7/34
Outline of the Method
9. 9/34
String Similarity Features
String similarity measures computed for question-comment and question-question pairs:
Jaro-Winkler
Levenshtein
Jaccard
Sorensen-Dice
n-gram
LCS (longest common subsequence)
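A few of the set-based measures and LCS can be sketched in plain Python (a minimal illustration; function names and whitespace tokenization are assumptions, not the system's actual implementation):

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over whitespace-token sets."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def sorensen_dice(a: str, b: str) -> float:
    """Sorensen-Dice coefficient over whitespace-token sets."""
    sa, sb = set(a.split()), set(b.split())
    denom = len(sa) + len(sb)
    return 2 * len(sa & sb) / denom if denom else 0.0

def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of characters."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b):
            cur.append(prev[j] + 1 if ca == cb else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]
```

Each score can be used directly as one numeric feature per question-comment pair.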
10. 10/34
Domain (Task) Specific Features
Whether a comment by the asker of the question is an acknowledgement
Position of the comment in the thread
Coverage of the question by the comment and of the comment by the question (ratio of shared tokens)
Presence of URLs, emails or HTML tags
11. 11/34
Word Embedding Features
Trained a word embedding model using Word2Vec on unannotated in-domain data
Sentence vectors: computed by averaging word vectors
Vector difference feature: w_score = w_question − w_comment
Distance scores based on the computed sentence vectors:
Cosine distance (1 − cos)
Manhattan distance
Euclidean distance
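The sentence-vector averaging and the three distance scores can be sketched as follows (an illustrative pure-Python version; the toy dictionary-based embedding format and function names are assumptions):

```python
import math

def sentence_vector(words, emb):
    """Average the embedding vectors of the words found in the model."""
    vecs = [emb[w] for w in words if w in emb]
    if not vecs:
        return [0.0] * len(next(iter(emb.values())))
    return [sum(component) / len(vecs) for component in zip(*vecs)]

def cosine_distance(u, v):
    """1 - cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0

def manhattan_distance(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

In the actual system the vectors come from a Word2Vec model; here any word-to-vector mapping works.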
12. 12/34
Topic Modeling Features
Trained an LDA topic model using the Mallet tool on the training data
Extracted the 20 most relevant topics for the data
Topic vector of a question/comment:
w_score = w_question − w_comment
Topic vocabulary of a question/comment:
Vocabulary(T) = ∪_{i=1}^{10} topic_words(t_i)
where t_i is one of the top topics for the comment/question T.
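The topic-vocabulary union can be sketched in a few lines (illustrative names; the topic ids and word lists would come from the trained LDA model):

```python
def topic_vocabulary(top_topics, topic_words, k=10):
    """Union of the word lists of a text's top-k topics.

    top_topics: topic ids ordered by relevance for the text.
    topic_words: mapping from topic id to that topic's word list.
    """
    vocab = set()
    for t in top_topics[:k]:
        vocab |= set(topic_words[t])
    return vocab
```

The overlap between the question's and the comment's topic vocabularies can then serve as a feature.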
13. 13/34
Keyword and Named Entity Features
Extracted keywords (focus words) from the question and comment using the RAKE algorithm (Rose et al., 2010)
Computed keyword match between question and comment
Extracted named entities from the question and comment
Entity tags: LOCATION, PERSON, ORGANIZATION, DATE, MONEY, PERCENT and TIME
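The slide does not specify the exact matching metric; a plausible keyword-overlap feature, taking the keyword lists produced by RAKE as input, might look like this (a sketch under that assumption):

```python
def keyword_match(question_keywords, comment_keywords):
    """Fraction of question keywords that also appear among comment keywords."""
    q, c = set(question_keywords), set(comment_keywords)
    return len(q & c) / len(q) if q else 0.0
```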
15. 15/34
Implicit Dialogue Identification
Identified implicit dialogues among users
User interaction graph:
Each user is assumed to be in dialogue with some other user who commented earlier in the thread
Dialogue with the asker: desirable; dialogue with other users: not desirable
Vertices: users in a comment thread
Edges: directed edges showing interaction
Edge weight: the level of interaction
16. 16/34
Implicit Dialogue Identification: an Example
(figure series, slides 16-22: an example user interaction graph)
23. 23/34
Computing Edge Weights
The edge weight is computed (or revised) on the basis of:
Explicit dialogue score: if one user refers to the other explicitly, add 1.0 to the edge score.
Embedding score: for each word in a comment, find the word in the other comment with the maximum cosine similarity to it, then average those maximum cosine scores.
Topic score: the cosine similarity of the topic vectors of the two comments.
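The embedding score (average best-match cosine similarity between the two comments) can be sketched as follows (illustrative names; `emb` is any word-to-vector mapping, and words missing from the embeddings are skipped):

```python
import math

def cos_sim(u, v):
    """Cosine similarity of two vectors (0.0 for zero-norm inputs)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def embedding_score(words_a, words_b, emb):
    """Average, over the words of one comment, of the best cosine
    similarity to any word of the other comment."""
    wa = [w for w in words_a if w in emb]
    wb = [w for w in words_b if w in emb]
    if not wa or not wb:
        return 0.0
    best = [max(cos_sim(emb[a], emb[b]) for b in wb) for a in wa]
    return sum(best) / len(best)
```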
26. 26/34
Classification Model
Normalized all feature values with z-scores
Feature selection using wrapper methods to maximize accuracy on the development set
Used SVM confidence probabilities for ranking (RBF kernel)
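The z-score normalization step can be sketched for a single feature column (a minimal stdlib version; the system would apply this per feature across the training set):

```python
from statistics import mean, pstdev

def z_normalize(values):
    """Z-score normalization of one feature column: (v - mean) / std.
    Returns zeros when the column is constant (std = 0)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s if s else 0.0 for v in values]
```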
27. 27/34
Subtask C: Similarity of Questions and External Comments
Oversampled the data using the SMOTE technique (Chawla et al., 2002) and ran the classifier on the original question-external-comment pairs
Stacking across tasks: the SVM scores of all three subtasks are combined:
Score_C = log(SVM_Score) + log(Score_A) + log(Score_B)
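The stacking combination is a sum of log scores, i.e. the log of the product of the three subtask scores (sketch with an illustrative name; inputs are assumed to be strictly positive, e.g. SVM confidence probabilities):

```python
import math

def stacked_score_c(svm_score_c, score_a, score_b):
    """Combine the subtask-C SVM score with the subtask-A and -B scores
    in log space; equivalent to log(svm_score_c * score_a * score_b)."""
    return math.log(svm_score_c) + math.log(score_a) + math.log(score_b)
```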
29. 29/34
Feature Ablation Results: Impact of Different Feature Sets

Subtask A, Development Set 2017:
Features          MAP    P      R      F1     Acc
All Features      65.50  58.43  62.71  60.50  72.54
All − string      65.53  57.84  62.71  60.18  72.17
All − embedding   62.11  53.03  53.42  53.23  68.52
All − domain      61.85  54.46  54.52  54.49  69.47
All − topic       65.15  59.02  61.98  60.47  72.83
All − keyword     65.73  57.98  62.59  60.20  72.25
IR Baseline       53.84  -      -      -      -

Subtask A, Test Set 2017:
Runs              MAP    P      R      F1     Acc
Primary           86.88  73.37  74.52  73.94  72.70
Contrastive 1     86.35  79.42  51.94  62.80  68.02
Contrastive 2     85.24  81.22  57.65  67.43  71.06
31. 31/34
Comparison of Results on Subtask A at SemEval 2017
32. 32/34
Comparison of Results on Subtask C at SemEval 2017
33. 33/34
Observations and Conclusions
Embeddings trained on small in-domain texts work better than large out-of-domain pre-trained GoogleNews embeddings
The most instrumental features are based on:
User dialogues
Word embeddings
34. 34/34
Thank you!
Any questions from the community?