Presentation as part of the "Social Media Annotation" Tutorial at ISWC2014. Content: What is crowdsourcing? What are typical steps taken when crowdsourcing the creation of training and verification corpora? What is the state of the art in performing these steps? How do these steps differ between mechanised labour and GWAPs?
[KDD 2018 tutorial] End-to-end goal-oriented question answering systems - Qi He
End-to-end goal-oriented question answering systems
version 2.0: An updated version, with references, of the old version (https://www.slideshare.net/QiHe2/kdd-2018-tutorial-end-toend-goaloriented-question-answering-systems).
08/22/2018: The old version has been deleted to reduce confusion.
FriendsQA: Open-domain Question Answering on TV Show Transcripts - Jinho Choi
This thesis presents FriendsQA, a challenging question answering dataset that contains 1,222 dialogues and 10,610 open-domain questions, to tackle machine comprehension on everyday conversations. Each dialogue, involving multiple speakers, is annotated with six types of questions (what, when, why, where, who, how) regarding the dialogue contexts, and the answers are annotated as contiguous spans in the dialogue. A series of crowdsourcing tasks is conducted to ensure good annotation quality, resulting in a high inter-annotator agreement of 81.82%. Comprehensive annotation analytics are provided for a deeper understanding of this dataset. Three state-of-the-art QA systems, R-Net, QANet, and BERT, are evaluated on this dataset. BERT in particular shows promising results, with an accuracy of 74.2% for answer utterance selection and an F1-score of 64.2% for answer span selection, suggesting that the FriendsQA task is hard yet has great potential to elevate QA research on multiparty dialogue to another level.
A brief survey presentation on Arabic Question Answering (AQA), covering the different Natural Language Processing and Information Retrieval approaches to Question Analysis, Passage Retrieval and Answer Extraction. It also lists the NLP tools used in AQA and the challenges and future trends in this area.
If you want to cite this paper, you can download it here:
http://www.acit2k.org/ACIT/2012Proceedings/13106.pdf
Omar Tawakol at AI Frontiers: The Rise Of Voice-Activated Assistants In The W... - AI Frontiers
The market is already demonstrating strong value in the home for voice-activated AI, but the work environment is yet to catch up. Omar will explain why voice-activated AI is the most important development to come to the workplace. He will pull from his experiences creating Eva, the first enterprise voice assistant focused on making meetings more actionable, and dive specifically into the challenges of ASR (Automatic Speech Recognition), NLP and neural networks in creating these kinds of voice-activated assistants. He will share how his team have overcome these challenges.
Question Answering System using machine learning approach - Garima Nanda
A compact presentation on how a machine learning approach, based on classification techniques, can be used for effective and efficient interaction.
Presentation of Domain Specific Question Answering System Using N-gram Approach. - Tasnim Ara Islam
Designed an application for a domain-specific question answering system. Built a solution for finding answers to factoid questions using an N-gram mining approach. Calculated a percentage score for the answers related to a given question. The application was built on the Java platform.
Meta-evaluation of machine translation evaluation methods - Lifeng (Aaron) Han
Cite: Lifeng Han. 2021. Meta-evaluation of machine translation evaluation methods. In Metrics2021 Tutorial Track/type: Workshop on Informetric and Scientometric Research (SIG-MET), ASIS&T. October 23–24.
Arabic is the sixth most widespread natural language in the world, with more than 350 million native speakers. Arabic question answering systems are gaining great significance due to the increasing amount of unstructured Arabic content on the Internet and the increasing demand for information that regular information retrieval techniques do not satisfy. Question answering systems in general, and Arabic systems are no exception, hit an upper bound of performance due to the propagation of error through their pipeline. This increases the significance of answer selection and validation systems, as they enhance the certainty and accuracy of question answering systems. Very few works have tackled the Arabic answer selection and validation problem, and they used the same question answering pipeline without any changes to satisfy the requirements of answer selection and validation. That is why they did not perform adequately in this task. In this dissertation, a new approach to Arabic answer selection and validation is presented through “ALQASIM”, which is a QA4MRE (Question Answering for Machine Reading Evaluation) system. ALQASIM analyzes the reading test documents instead of the questions, and utilizes sentence splitting, root expansion, and semantic expansion using an ontology built from the CLEF 2012 background collections. Our experiments have been conducted on the test set provided by CLEF 2012 through the QA4MRE task. This approach led to a promising performance of 0.36 accuracy and 0.42 C@1, which is double the performance of the best-performing Arabic QA4MRE system.
Publications:
http://scholar.google.com/citations?user=XGJiEioAAAAJ&hl=en
https://aast.academia.edu/AhmedMagdy
Apply chinese radicals into neural machine translation: deeper than character... - Lifeng (Aaron) Han
LPRC 2018: Limerick Postgraduate Research Conference
Lifeng Han and Shaohui Kuang. 2018. Apply Chinese radicals into neural machine translation: Deeper than character level. ArXiv pre-print https://arxiv.org/abs/1805.01565v1
Recent and Robust Query Auto-Completion - WWW 2014 Conference Presentation - stewhir
These are the presentation slides used for the WWW 2014 (Web Search) full paper: "Recent and Robust Query Auto-Completion".
The PDF full paper is available from: http://www.stewh.com/wp-content/uploads/2014/02/fp539-whiting.pdf
CUHK intern PPT. Machine Translation Evaluation: Methods and Tools - Lifeng (Aaron) Han
Abstract of Aaron Han’s Presentation
The main topic of this presentation will be the “evaluation of machine translation”. With the rapid development of machine translation (MT), MT evaluation is becoming more and more important for telling whether MT systems are making progress. Traditional human judgments are very time-consuming and expensive. On the other hand, the existing automatic MT evaluation metrics have some weaknesses:
– they perform well on certain language pairs but poorly on others, which we call the language-bias problem;
– they consider either no linguistic information (so that the metrics correlate poorly with human judgments) or too many linguistic features (which makes them difficult to replicate), which we call the extremism problem;
– they rely on incomplete factors (e.g. precision only).
To address these problems, he has developed several automatic evaluation metrics that:
– use tunable parameters to address the language-bias problem;
– use concise linguistic features to address the linguistic extremism problem;
– use augmented factors.
Experiments on the ACL-WMT corpora show that the proposed metrics yield higher correlation with human judgments. The proposed metrics have been published at top international conferences, e.g. COLING and MT SUMMIT. Evaluation work is closely related to similarity measurement, so this work can be extended to other areas, such as information retrieval, question answering, search, etc.
A brief introduction to some of his other research will also be given, such as Chinese named entity recognition, word segmentation, and multilingual treebanks, which has been published in the Springer LNCS and LNAI series. Suggestions and comments are much appreciated, and opportunities for further cooperation would be even more exciting.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing. - Lifeng (Aaron) Han
Invited presentation at the NLP lab of Soochow University, about my NLP journey and the ADAPT Centre. The NLP part covers Machine Translation Evaluation, Quality Estimation, Multiword Expression Identification, Named Entity Recognition, Word Segmentation, Treebanks, and Parsing.
Michael Manukyan and Hrayr Harutyunyan gave a talk on sentence representations in the context of deep learning at the Armenian NLP Meetup. They also reviewed a recent paper on machine comprehension (Wang, Jiang, 2016).
DELAB - sequence generation seminar
Title
Open vocabulary problem
Table of contents
1. Open vocabulary problem
1-1. Open vocabulary problem
1-2. Ignore rare words
1-3. Approximative Softmax
1-4. Back-off Models
1-5. Character-level model
2. Solution 1: Byte Pair Encoding (BPE)
3. Solution 2: WordPieceModel (WPM)
Vectors in Search - Towards More Semantic Matching - Simon Hughes
With the advent of deep learning and algorithms like word2vec and doc2vec, vector-based representations are increasingly being used in search to represent anything from documents to images and products. However, search engines work with documents made of tokens, not vectors, and are typically not designed for fast vector matching out of the box. In this talk, I will give an overview of how vectors can be derived from documents to produce a semantic representation of a document that can be used to implement semantic / conceptual search without hurting performance. I will then describe a few different techniques for efficiently searching vector-based representations in an inverted index, such as learning sparse representations of vectors, clustering, and learning binary vectors. Finally, I will discuss some of the pitfalls of vector-based search, and how to get the best of both worlds by combining vector-based scoring with traditional relevancy metrics such as BM25.
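As a rough illustration of the last point, the sketch below blends a lexical BM25 score with a cosine similarity over document vectors. It is not taken from the talk: the function names, the weight `alpha`, the min-max normalisation step, and the toy vectors are all assumptions made for this example, and a production search engine would compute BM25 inside its own index rather than in Python.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return dot / (nu * nv) if nu and nv else 0.0


def min_max(values):
    """Rescale scores to [0, 1] so the two signals are comparable (assumed step)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(x - lo) / (hi - lo) for x in values]


def hybrid_scores(query_tokens, query_vec, docs, doc_vecs, alpha=0.5):
    """Weighted blend: alpha * semantic (cosine) + (1 - alpha) * lexical (BM25)."""
    lexical = min_max(bm25_scores(query_tokens, docs))
    semantic = min_max([cosine(query_vec, v) for v in doc_vecs])
    return [alpha * s + (1 - alpha) * l for l, s in zip(lexical, semantic)]


# Toy data: hypothetical 2-dimensional "embeddings" for three short documents.
docs = [["java", "developer"], ["python", "engineer"], ["software", "programmer"]]
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
scores = hybrid_scores(["java"], [1.0, 0.0], docs, doc_vecs, alpha=0.5)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

In this toy setup the third document shares no token with the query, so BM25 alone gives it a score of zero, while the vector signal still ranks it above the unrelated second document. Setting `alpha` to 0 or 1 recovers purely lexical or purely vector-based ranking.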
Corpus Annotation through Crowdsourcing: Towards Best Practice Guidelines - Leon Derczynski
Annotating data is expensive and often fraught. Crowdsourcing promises a quick, cheap and high-quality solution, but it is critical to understand the process and plan work appropriately in order to get results. This presentation and paper discuss the challenges involved and explain simple ways to get reliable, quality results when crowdsourcing corpora.
Full paper: https://gate.ac.uk/sale/lrec2014/crowdsourcing/crowdsourcing-NLP-corpora.pdf
Chinese Character Decomposition for Neural MT with Multi-Word Expressions - Lifeng (Aaron) Han
ADAPT seminar series. June 2021
Research papers at NoDaLiDa 2021 (the 23rd Nordic Conference on Computational Linguistics) and the COLING 2020 MWE-LEX workshop.
Bonus takeaway: the AlphaMWE multilingual corpus with MWEs.
Natural language processing for requirements engineering: ICSE 2021 Technical... - alessio_ferrari
These are the slides for the technical briefing at ICSE 2021, given by Alessio Ferrari, Liping Zhao, and Waad Alhoshan.
It covers RE tasks to which NLP is applied, an overview of a recent systematic mapping study on the topic, and a hands-on tutorial on using transfer learning for requirements classification.
Please find the links to the colab notebooks here:
https://colab.research.google.com/drive/158H-lEJE1pc-xHc1ISBAKGDHMt_eg4Gn?usp=sharing
https://colab.research.google.com/drive/1B_5ow3rvS0Qz1y-KyJtlMNnmgmx9w3kJ?usp=sharing
https://colab.research.google.com/drive/1Xrm0gNaa41YwlM5g2CRYYXcRvpbDnTRT?usp=sharing
K-SRL: Instance-based Learning for Semantic Role Labeling - Yunyao Li
Slides for our COLING'16 paper http://aclweb.org/anthology/C/C16/C16-1058.pdf
Abstract:
Semantic role labeling (SRL) is the task of identifying and labeling predicate-argument structures in sentences with semantic frame and role labels. A known challenge in SRL is the large number of low-frequency exceptions in training data, which are highly context-specific and difficult to generalize. To overcome this challenge, we propose the use of instance-based learning that performs no explicit generalization, but rather extrapolates predictions from the most similar instances in the training data. We present a variant of k-nearest neighbors (kNN) classification with composite features to identify nearest neighbors for SRL. We show that high-quality predictions can be derived from a very small number of similar instances. In a comparative evaluation we experimentally demonstrate that our instance-based learning approach significantly outperforms current state-of-the-art systems on both in-domain and out-of-domain data, reaching F1-scores of 89.28% and 79.91% respectively.
Efficient named entity annotation through pre-empting - Leon Derczynski
Linguistic annotation is time-consuming and expensive. One common annotation task is to mark entities – such as names of people, places and organisations – in text. In a document, many segments of text often contain no entities at all. We show that these segments are worth skipping, and demonstrate a technique for reducing the amount of entity-less text examined by annotators, which we call “preempting”. This technique is evaluated in a crowdsourcing scenario, where it provides downstream performance improvements for the same size corpus.
In this talk we cover
1. Why NLP and DL
2. Practical Challenges
3. Some Popular Deep Learning models for NLP
Today you can take any webpage in any language and translate it automatically into a language you know! You can also cut and paste an article or other document into an NLP system and immediately get a list of the companies and people it talks about, the topics that are relevant, and the sentiment of the document. When you talk to a Google or Amazon assistant, you are using NLP systems. NLP is not perfect, but given the continuing advances of the last two years, it is a growing field. Let’s see how it actually works, specifically using deep learning.
About Shishir
Shishir is a Senior Data Scientist at Thomson Reuters working on Deep Learning and NLP to solve real customer pain points, even ones customers have become used to.
Best Practices in Recommender System Challenges - Alan Said
Recommender System Challenges such as the Netflix Prize, KDD Cup, etc. have contributed vastly to the development and adoptability of recommender systems. Each year a number of challenges or contests are organized covering different aspects of recommendation. In this tutorial and panel, we present some of the factors involved in successfully organizing a challenge, whether for reasons purely related to research, industrial challenges, or to widen the scope of recommender systems applications.
Evaluating Semantic Search Systems to Identify Future Directions of Research - Stuart Wrigley
Recent work on searching the Semantic Web has yielded a wide range of approaches with respect to the style of input, the underlying search mechanisms and the manner in which results are presented. Each approach has an impact upon the quality of the information retrieved and the user's experience of the search process. This highlights the need for formalised and consistent evaluation to benchmark the coverage, applicability and usability of existing tools and provide indications of future directions for advancement of the state-of-the-art. In this paper, we describe a comprehensive evaluation methodology which addresses both the underlying performance and the subjective usability of a tool. We present the key outcomes of a recently completed international evaluation campaign which adopted this approach and thus identify a number of new requirements for semantic search tools from both the perspective of the underlying technology as well as the user experience.
Managing application performance by Kwame Thomison - SergeyChernyshev
Managing application performance is a huge challenge for most engineering organizations. The most difficult questions we have to answer are often only indirectly related to shortening the critical path. How do you align stakeholders with competing priorities and agendas? How do you drum up interest, persuade your best engineers to focus on performance, and keep the team motivated? How should the team be positioned within the larger organization? Where do you even start? You can't find good answers to any of those questions with a simple web search.
In this talk, Kwame Thomison shares tips for creating a perf team with a clear, compelling mission, a few frameworks he’s devised for thinking about web performance, and some of the lessons he’s learned working on perf teams at companies like Meebo, Facebook, and Asana.
Recommending Scientific Papers: Investigating the User Curriculum - Jonathas Magalhães
In this paper, we propose a Personalized Paper Recommender System, a new user-paper based approach that takes into consideration the user's academic curriculum vitae. To build the user profiles, we use a Brazilian academic platform called CV-Lattes. Furthermore, we examine several issues related to user profiling: (i) we define and compare different strategies to build and represent the user profiles, using terms and using concepts; (ii) we verify how much past information about a user is required to provide good recommendations; (iii) we compare our approaches with the state of the art in paper recommendation using CV-Lattes. To validate our strategies, we conduct a user study involving 30 users in the Computer Science domain. Our results show that (i) our approaches outperform the state of the art in CV-Lattes; (ii) concept profiles are comparable with term profiles; (iii) analyzing the content of the past four years for term profiles and five years for concept profiles achieved the best results; and (iv) term profiles provide better results but are slower than concept profiles; thus, if the system needs real-time recommendations, concept profiles are better.
A short tutorial of the OCS freeware, which is used by psychologists to score creativity assessments.
We originally presented these slides in Thessaloniki in August of 2023.
Earthsoft Foundation of Guidance is an #NGO that guides students in education and career, and professionals in soft skill enhancement.
Join as a mentor and conduct training in your city.
http://www.slideshare.net/rrakhecha/efg-letter-to-all-those-are-interested
Following content available free of cost at http://myefg.in/downloads.aspx
Or at www.slideshare.net (search using key word - earthsoft)
Topics
S No Description
1 Do & don’t – To live simple & happy life
2 To prepare for study, exam, Time Table
3 Education & Career Guidance
4 Personality development
5 Finance- avoiding speculation
6 Vegetarian & health management
7 Basic Religion
8 Project & delivery Management
9 Stop Alcohol
10 Women empowerment
11 Assertiveness
12 Responsibility & ownership
13 Role models
14 Leadership
15 Preparing resume & covering letter
16 Successful Interviewing
17 Selecting life partner
18 Conflict management
19 Stop ragging
20 Effective Communication
21. Teachers training for effective teaching, Understand students
22. Water..critical resource, pollution, pollutants and solution
23. Be happy
24. To be successful
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.comSimon Hughes
In the talk I describe two approaches for improving the recall and precision of an enterprise search engine using machine learning techniques. The main focus is improving relevancy with ML while using your existing search stack, be that Lucene, Solr, Elastic Search, Endeca or something else.
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...University of Maribor
Slides from:
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Track: Artificial Intelligence
https://www.etran.rs/2024/en/home-english/
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Sérgio Sacani
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes
on Io’s surface have been monitored from both spacecraft and ground-based telescopes.
Here, we present the highest spatial resolution images of Io ever obtained from a groundbased telescope. These images, acquired by the SHARK-VIS instrument on the Large
Binocular Telescope, show evidence of a major resurfacing event on Io’s trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images
show that a plume deposit from a powerful eruption at Pillan Patera has covered part
of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high resolution imaging of Io’s surface using adaptive
optics at visible wavelengths.
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...Scintica Instrumentation
Intravital microscopy (IVM) is a powerful tool utilized to study cellular behavior over time and space in vivo. Much of our understanding of cell biology has been accomplished using various in vitro and ex vivo methods; however, these studies do not necessarily reflect the natural dynamics of biological processes. Unlike traditional cell culture or fixed tissue imaging, IVM allows for the ultra-fast high-resolution imaging of cellular processes over time and space and were studied in its natural environment. Real-time visualization of biological processes in the context of an intact organism helps maintain physiological relevance and provide insights into the progression of disease, response to treatments or developmental processes.
In this webinar we give an overview of advanced applications of the IVM system in preclinical research. IVIM technology is a provider of all-in-one intravital microscopy systems and solutions optimized for in vivo imaging of live animal models at sub-micron resolution. The system’s unique features and user-friendly software enables researchers to probe fast dynamic biological processes such as immune cell tracking, cell-cell interaction as well as vascularization and tumor metastasis with exceptional detail. This webinar will also give an overview of IVM being utilized in drug development, offering a view into the intricate interaction between drugs/nanoparticles and tissues in vivo and allows for the evaluation of therapeutic intervention in a variety of tissues and organs. This interdisciplinary collaboration continues to drive the advancements of novel therapeutic strategies.
Slide 1: Title Slide
Extrachromosomal Inheritance
Slide 2: Introduction to Extrachromosomal Inheritance
Definition: Extrachromosomal inheritance refers to the transmission of genetic material that is not found within the nucleus.
Key Components: Involves genes located in mitochondria, chloroplasts, and plasmids.
Slide 3: Mitochondrial Inheritance
Mitochondria: Organelles responsible for energy production.
Mitochondrial DNA (mtDNA): Circular DNA molecule found in mitochondria.
Inheritance Pattern: Maternally inherited, meaning it is passed from mothers to all their offspring.
Diseases: Examples include Leber’s hereditary optic neuropathy (LHON) and mitochondrial myopathy.
Slide 4: Chloroplast Inheritance
Chloroplasts: Organelles responsible for photosynthesis in plants.
Chloroplast DNA (cpDNA): Circular DNA molecule found in chloroplasts.
Inheritance Pattern: Often maternally inherited in most plants, but can vary in some species.
Examples: Variegation in plants, where leaf color patterns are determined by chloroplast DNA.
Slide 5: Plasmid Inheritance
Plasmids: Small, circular DNA molecules found in bacteria and some eukaryotes.
Features: Can carry antibiotic resistance genes and can be transferred between cells through processes like conjugation.
Significance: Important in biotechnology for gene cloning and genetic engineering.
Slide 6: Mechanisms of Extrachromosomal Inheritance
Non-Mendelian Patterns: Do not follow Mendel’s laws of inheritance.
Cytoplasmic Segregation: During cell division, organelles like mitochondria and chloroplasts are randomly distributed to daughter cells.
Heteroplasmy: Presence of more than one type of organellar genome within a cell, leading to variation in expression.
Slide 7: Examples of Extrachromosomal Inheritance
Four O’clock Plant (Mirabilis jalapa): Shows variegated leaves due to different cpDNA in leaf cells.
Petite Mutants in Yeast: Result from mutations in mitochondrial DNA affecting respiration.
Slide 8: Importance of Extrachromosomal Inheritance
Evolution: Provides insight into the evolution of eukaryotic cells.
Medicine: Understanding mitochondrial inheritance helps in diagnosing and treating mitochondrial diseases.
Agriculture: Chloroplast inheritance can be used in plant breeding and genetic modification.
Slide 9: Recent Research and Advances
Gene Editing: Techniques like CRISPR-Cas9 are being used to edit mitochondrial and chloroplast DNA.
Therapies: Development of mitochondrial replacement therapy (MRT) for preventing mitochondrial diseases.
Slide 10: Conclusion
Summary: Extrachromosomal inheritance involves the transmission of genetic material outside the nucleus and plays a crucial role in genetics, medicine, and biotechnology.
Future Directions: Continued research and technological advancements hold promise for new treatments and applications.
Slide 11: Questions and Discussion
Invite Audience: Open the floor for any questions or further discussion on the topic.
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...Studia Poinsotiana
I Introduction
II Subalternation and Theology
III Theology and Dogmatic Declarations
IV The Mixed Principles of Theology
V Virtual Revelation: The Unity of Theology
VI Theology as a Natural Science
VII Theology’s Certitude
VIII Conclusion
Notes
Bibliography
All the contents are fully attributable to the author, Doctor Victor Salas. Should you wish to get this text republished, get in touch with the author or the editorial committee of the Studia Poinsotiana. Insofar as possible, we will be happy to broker your contact.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Ana Luísa Pinho
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects-of-interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich on features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and quality to enable complex behavior compounded by discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization. 
To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
Professional air quality monitoring systems provide immediate, on-site data for analysis, compliance, and decision-making.
Monitor common gases, weather parameters, particulates.
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...Wasswaderrick3
In this book, we use conservation of energy techniques on a fluid element to derive the Modified Bernoulli equation of flow with viscous or friction effects. We derive the general equation of flow/velocity and then from this we derive the Poiseuille flow equation, the transition flow equation and the turbulent flow equation. In the situations where there are no viscous effects, the equation reduces to the Bernoulli equation. From experimental results, we are able to include other terms in the Bernoulli equation. We also look at cases where pressure gradients exist. We use the Modified Bernoulli equation to derive equations of flow rate for pipes of different cross-sectional areas connected together. We also extend our techniques of energy conservation to a sphere falling in a viscous medium under the effect of gravity. We demonstrate Stokes equation of terminal velocity and turbulent flow equation. We look at a way of calculating the time taken for a body to fall in a viscous medium. We also look at the general equation of terminal velocity.
This presentation explores a brief idea about the structural and functional attributes of nucleotides, the structure and function of genetic materials along with the impact of UV rays and pH upon them.
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
Crowdsourcing Best Practices
1. University of Sheffield, NLP
Crowdsourcing Best Practices
Marta Sabou, Kalina Bontcheva
Leon Derczynski, Arno Scharl
2. University of Sheffield, NLP
The Science of Corpus Annotation
• Quite well understood best practice in how to create linguistic
annotation of consistently high quality by employing, training, and
managing groups of linguistic and/or domain experts
• Necessary in order to ensure reusability and repeatability of results
• The acquired corpora are of very high quality
• Costs are unfortunately also very high: estimated at between $0.36
and $1.0 per annotation (Zaidan and Callison-Burch, 2011; Poesio et
al., 2012)
3. University of Sheffield, NLP
Goals
What is crowdsourcing?
What is a typical workflow for crowdsourcing NLP tasks?
What are general solutions used by the state of the art?
How do different crowdsourcing genres compare?
5. University of Sheffield, NLP
Undefined and generally large group
Compared to in-house projects:
• cheaper (by 33%)
• reach to large number of users;
• reach to diverse user groups,
e.g., speakers of rare languages
6. University of Sheffield, NLP
Genre 1: Mechanised Labour
• Participants (workers) paid a small amount of money to
complete easy tasks (HIT = Human Intelligence Task)
9. University of Sheffield, NLP
Workflow for Crowdsourcing (Corpora)
1. Project Definition
2. Data and UI Preparation
3. Running the Project
4. Evaluation & Corpus
Delivery
11. University of Sheffield, NLP
Definition of semantic relations between concept pairs.
Coal Is a subcategory of Fossil Fuel
12. University of Sheffield, NLP
Trade-offs: Cost; Timescale; Worker skills
Small, simple tasks, fast completion => MLab
Complex, large tasks, slower completion => GWAP
13. University of Sheffield, NLP
• Data distribution: how “micro” is each microtask?
• Long paragraphs hard to digest, worker fatigue
• For most NLP tasks: one sentence corresponds to one task
• Single sentences not always appropriate: e.g. for co-ref
• Task Type
• Selection task: WSD, sentiment analysis, entity
disambiguation, relation typing.
• Sequence marking task: co-reference resolution.
14. University of Sheffield, NLP
• Categories per selection type task:
• Experts (Hovy,10): max 10, ideally 7
• In crowdsourcing fewer categories, typically 3-4
• To reduce cognitive load, focus on one category at a time
(e.g., one NE type)
• Number of workers per task:
• Depends on the subjective nature/complexity of the task
• Minimum 3, optimally 5
• Dynamic worker assignment for inconclusive tasks
• Lawson et al. (2010): number of required labels varies for different aspects of
the same NLP problem. Good results with only 4 annotators for Person NEs,
but require 6 for Location and 7 for Organizations
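The dynamic worker assignment mentioned above could be sketched roughly as follows. This is a minimal illustration, not the method of any particular platform: the function name, the cap of 7 workers and the 60% agreement threshold are assumptions chosen for the example; only the minimum of 3 workers comes from the slide.

```python
from collections import Counter


def needs_more_judgments(labels, min_workers=3, max_workers=7, min_share=0.6):
    """Decide whether a task should be routed to one more worker.

    labels: the labels collected so far for a single task.
    min_workers follows the slide's minimum of 3; max_workers and
    min_share are hypothetical values chosen for illustration.
    """
    if len(labels) < min_workers:
        return True   # not enough judgments yet
    if len(labels) >= max_workers:
        return False  # budget exhausted, stop even if inconclusive
    top_count = Counter(labels).most_common(1)[0][1]
    # Inconclusive if the leading label has too small a share of the votes.
    return top_count / len(labels) < min_share
```

A task with three identical labels stops immediately, while a three-way split keeps collecting judgments until the cap is reached.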
15. University of Sheffield, NLP
Reward scheme
• What to reward? - money, game points
• When to reward? - when work entered or after its evaluation
• How much to reward?
• Typically between $0.01 - $0.05/task (5 units)
• No clear, repeatable results for quality:reward relation
• High rewards get it done faster, but not better
• Pilot task gives timings, so pay at least minimum wage
• What to do with “bad” work? - detect at run-time and
exclude
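As a rough illustration of the "pilot task gives timings, so pay at least minimum wage" point, the per-task reward can be derived from pilot timings. The function name, the use of the median and the hourly wage passed in are assumptions for this sketch, not values from the slides.

```python
import math
from statistics import median


def min_reward_per_task(pilot_seconds, hourly_wage):
    """Smallest per-task reward (rounded up to whole cents) that pays
    at least hourly_wage, given task durations measured in a pilot.

    pilot_seconds: list of observed completion times in seconds.
    hourly_wage: target wage per hour (hypothetical input).
    """
    seconds = median(pilot_seconds)
    exact = hourly_wage * seconds / 3600.0
    # round() guards against floating-point noise before rounding up.
    return math.ceil(round(exact * 100, 6)) / 100
```

For example, with pilot timings of 20, 30 and 40 seconds and an assumed wage of 7.25 per hour, the sketch yields a reward of 0.07 per task.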
17. University of Sheffield, NLP
Categories:10
Players/task:7
Payment:points
awarded based
on previously
contributed
judgments
18. University of Sheffield, NLP
Categories:10
Players/task:10
Payment:$0.05/5 units
Players filtered through gold-data
19. University of Sheffield, NLP
Workflow for Crowdsourcing Corpora
1. Project Definition
2. Data and UI Preparation
3. Running the Project
4. Evaluation & Corpus
Delivery
20. University of Sheffield, NLP
• Pre-process the corpus linguistically, as needed, e.g.
• Tokenise text if user needs to select words
• Identify proper names/noun phrases if we want to classify these
• Bring additional context, if needed, e.g. text of user profile from
Twitter; link to Wikipedia page
• For GWAPs:
• Collect interesting input data if possible, i.e., texts that are fun to
read and work on
• Clean input data to remove errors (these will lower player
satisfaction)
• MLab can be used for cleaning the data set
21. University of Sheffield, NLP
• Build and test the user interfaces
• Easy to medium difficulty in AMT/CF; templates provided for
some task types
• Medium to hard for GWAPs
• Job management interfaces
• Provided in MLab platforms
• Must be built from scratch for GWAPs
• Comparative interface set-up times:
• CF: 2 days; Climate Quiz: 2 months
• (Thaler et al., 12): OntoPronto: 5 months
23. University of Sheffield, NLP
HINT: Add explicitly verifiable
questions to the UI:
- help filter out spammers
- force workers to read the task
input
24. University of Sheffield, NLP
Pilot the design, measure performance, try again
• Simple, clear design important
• Binary decision tasks get good results
Run bigger pilot studies with volunteers to test
everything and collect gold units for quality control later
25. University of Sheffield, NLP
Workflow for Crowdsourcing Corpora
1. Project Definition
2. Data and UI Preparation
3. Running the Project
4. Evaluation & Corpus
Delivery
26. University of Sheffield, NLP
Contributor recruitment:
• MLab - easy, given the platforms’ large worker pools and economic
incentives
• GWAPs - challenging, requires much PR.
• Social network based games allow inviting friends to leverage the viral
aspect of SNs
• Multi-channel advertisement: local and national press, science websites,
blogs, bookmarking websites, gaming forums, and social networking
sites
Contributor screening (only in MLab):
• MLab - by country, by skill (e.g., spoken language), by reliability
• MLab - screening through competency tests; answers to gold units
27. University of Sheffield, NLP
IN-TASK QUALITY CONTROL
Train contributors - through instructions:
• be clear and concise;
• avoid technical jargon;
• provide both positive and negative examples.
Train contributors - through gold data:
• CF - known data units (gold units) hidden in tasks
• When completing a gold unit, a worker is shown the expected answer thus
being trained “on the job”
• Workers who fail a certain percentage of gold units are automatically
excluded from the job
Great opportunity to train workers and amend expert data
Better gold data means better output quality, for the same cost
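The gold-unit exclusion rule can be sketched as below. This is an illustrative approximation rather than CrowdFlower's actual implementation; the function names, the 70% accuracy threshold and the minimum of 4 gold units seen are assumptions.

```python
def gold_accuracy(answers, gold):
    """Return (accuracy, n_seen) over the gold units a worker answered.

    answers: dict mapping unit id -> label given by one worker.
    gold: dict mapping gold unit id -> expected label.
    """
    seen = [unit for unit in answers if unit in gold]
    if not seen:
        return 0.0, 0
    correct = sum(answers[unit] == gold[unit] for unit in seen)
    return correct / len(seen), len(seen)


def workers_to_exclude(worker_answers, gold, min_accuracy=0.7, min_gold_seen=4):
    """Return the set of workers whose gold accuracy is too low.

    Workers who have seen fewer than min_gold_seen gold units are not
    judged yet. Both thresholds are hypothetical.
    """
    excluded = set()
    for worker, answers in worker_answers.items():
        accuracy, n_seen = gold_accuracy(answers, gold)
        if n_seen >= min_gold_seen and accuracy < min_accuracy:
            excluded.add(worker)
    return excluded
```

Requiring a minimum number of gold units before judging a worker avoids excluding someone on the basis of a single mistake.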
30. University of Sheffield, NLP
• For large tasks - Multi-batch methodology
• Submit tasks in multiple batches
• Ensure contributor diversity by starting batches at different times
• Needs less gold data
• Deal with worker disputes!
31. University of Sheffield, NLP
Workflow for Crowdsourcing Corpora
1. Project Definition
2. Data and UI Preparation
3. Running the Project
4. Evaluation & Corpus
Delivery
32. University of Sheffield, NLP
• Evaluate individual contributor inputs to produce final decision
• Majority vote
• Discard inputs from low-trusted contributors (e.g. Hsueh et al. (2009))
• Aggregation:
• Merge individual units from the microtasks (e.g. sentences) into
complete documents, including all crowdsourced markup
• Majority voting; average; collection
• Aggregation strategies:
• Climate Quiz: a relation is chosen for a pair if it has been voted
for by 4 more players than the next most popular relation
• CF - Majority voting; confidence value computed taking into
account worker accuracy
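The three aggregation strategies on this slide can be sketched side by side. The margin rule follows the Climate Quiz description (a lead of 4 votes); the trust-weighted vote is only an assumed approximation of "confidence value computed taking into account worker accuracy", not CrowdFlower's documented formula. All function names and the default trust of 0.5 for unknown workers are hypothetical.

```python
from collections import Counter, defaultdict


def majority_vote(labels):
    """Most frequent label, or None on an empty input or a tie."""
    counts = Counter(labels).most_common(2)
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def margin_vote(labels, margin=4):
    """Accept the top label only if it leads the runner-up by `margin` votes."""
    counts = Counter(labels).most_common(2)
    if not counts:
        return None
    top_label, top = counts[0]
    runner_up = counts[1][1] if len(counts) > 1 else 0
    return top_label if top - runner_up >= margin else None


def trust_weighted_vote(judgments, trust, default_trust=0.5):
    """Weight each vote by the worker's accuracy on gold units.

    judgments: list of (worker, label) pairs for one unit.
    trust: dict mapping worker -> accuracy in [0, 1].
    Returns (label, confidence), or (None, 0.0) if no weight was cast.
    """
    weights = defaultdict(float)
    for worker, label in judgments:
        weights[label] += trust.get(worker, default_trust)
    total = sum(weights.values())
    if total == 0:
        return None, 0.0
    label = max(weights, key=weights.get)
    return label, weights[label] / total
```

With one highly trusted worker voting "is_a" and two low-trust workers voting "other", the plain majority returns "other" while the weighted vote returns "is_a", which illustrates how the same raw data can yield different corpora under different aggregation.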
33. University of Sheffield, NLP
• Evaluate corpus quality
• Compute inter-worker agreement;
• Compute inter-worker-trusted annotator agreement
• Compare to a gold standard baseline (P/R/F/Acc)
• To facilitate reuse:
• deliver corpus in a widely used format (XCES, CoNLL, GATE XML)
• Share with research community
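A minimal sketch of two of the quality measures named above: raw pairwise inter-worker agreement and precision/recall/F1 against a gold standard. The function names are hypothetical, and the agreement figure here is not chance-corrected (measures such as Cohen's kappa would be needed for that).

```python
from itertools import combinations


def observed_agreement(annotations):
    """Mean pairwise agreement over all items.

    annotations: list of per-item label lists, one label per worker.
    """
    agree = total = 0
    for labels in annotations:
        for a, b in combinations(labels, 2):
            agree += a == b
            total += 1
    return agree / total if total else 0.0


def precision_recall_f1(predicted, gold, target):
    """P/R/F1 for one target label, comparing aggregated output to gold."""
    pairs = list(zip(predicted, gold))
    tp = sum(p == target and g == target for p, g in pairs)
    fp = sum(p == target and g != target for p, g in pairs)
    fn = sum(p != target and g == target for p, g in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```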
35. University of Sheffield, NLP
Evaluation of relation selection task:
Comparison with Gold Standard
Same data, different aggregation
36. University of Sheffield, NLP
Legal and Ethical Issues
1. Acknowledging the Crowd’s contribution
S. Cooper, [other authors], and Foldit players: Predicting protein structures
with a multiplayer online game. Nature, 466(7307):756-760, 2010.
2. Ensuring privacy and wellbeing
1. Mechanised labour criticised for low wages, lack of worker rights
2. Majority of workers rely on microtasks as main income source
3. Prevent prolonged use & user exploitation (e.g. daily caps)
3. Licensing and consent
1. Some clearly state the use of Creative Commons licenses
2. General failure to provide informed consent information
Crowdsourcing is an emerging collaborative approach for acquiring annotated corpora and a wide range of other linguistic resources
Three main kinds of crowdsourcing platforms
paid-for marketplaces such as Amazon Mechanical Turk (AMT) and CrowdFlower (CF)
games with a purpose
volunteer-based platforms such as crowdcrafting
Paid-for crowdsourcing can be 33% cheaper than in-house employees when applied to tasks such as tagging and classification (Hoffmann, 2009)
Games with a purpose can be even cheaper in the long run, since the players are not paid.
However, the cost of implementing a game can be higher than AMT/CF costs for smaller projects (Poesio et al, 2012)
Tap into the large number of contributors/players available across the globe, through the internet
Easy to reach native speakers of various languages (but beware Google Translate cheaters!)
Contributors are extrinsically motivated through economic incentives
Most NLP projects use crowdsourcing marketplaces: Amazon Mechanical Turk and CrowdFlower
Requesters post Human Intelligence Tasks (HITs) to a large population of micro-workers (Callison-Burch and Dredze, 2010a)
Snow et al. (2008) collect event and affect annotations, while Lawson et al. (2010) and Finin et al. (2010) annotate special types of texts such as emails and Twitter feeds, respectively.
Challenges:
low quality output due to the workers’ purely economic motivation
high costs for large tasks (Parent and Eskenazi, 2011)
ethical issues (Fort et al., 2011)
In GWAPs (von Ahn and Dabbish, 2008), contributors carry out annotation tasks as a side effect of playing a game
Example GWAPs:
Phratris for annotating syntactic dependencies (Attardi, 2010)
PhraseDetectives (Poesio et al.,2012) to acquire anaphora annotations
Sentiment Quiz (Scharl et al., 2012) to annotate sentiment
http://www.wordrobe.org/ - A collection of NLP games incl. POS, NE
Challenges:
Designing appealing games and attracting a critical mass of players are among the key success factors within this genre (Wang et al., 2012)
In 2008, the group built a Facebook game that required players to rate the sentiment associated with a sentence on a 5-point scale, then used the ratings as a training corpus for the sentiment detection module. Over 800 players played the game.
In 2009 the game was released in a slightly different form, with the aim of gathering sentiment lexicons, i.e., associations between words and their sentiment polarity (ratings from as many as 12 players were averaged to get the final value). The game ran in 7 different languages and attracted over 4000 players.
Let this be an introductory example of a crowdsourcing project; however, crowdsourcing is not a new phenomenon.
Volunteers contribute because they are interested in a domain or support a cause
Compared to paid-for marketplaces, GWAPs:
reduce costs and the incentive to cheat as players are intrinsically motivated
promise superior results, due to motivated players and better utilization of sporadic, explorer-type users (Parent and Eskenazi, 2011)
Few papers compare the genres, and most of those are “theoretical” or survey-based comparisons.
Climate Quiz is a GWAP deployed over the Facebook social networking platform. It is focused on acquiring factual knowledge in the domain of climate change.
The game is coupled with an ontology learning algorithm, as follows. The ontology learning algorithm extracts terms from unstructured and structured data sources. The term pairs that are most likely related based on the algorithm’s input data sources are subsequently sent to Climate Quiz, where players assign relations to each pair. These relations are fed back into the algorithm which uses them to refine the learned ontology and to derive new term pairs that should be connected.
As depicted here, Climate Quiz asks players to evaluate whether two concepts presented by the system are related (e.g. environmental activism, activism), and which label is the most appropriate to describe this relation (e.g. is a sub-category of). Players can assign one of eight relations, three of which are generic (is a sub-category of, is identical to, is the opposite of), whereas five are domain-specific (opposes, supports, threatens, influences, works on/with). Two further relations, “other” and “is not related to”, were added for cases not covered by the previous eight relations. The game’s interface allows players to switch the position of the two concepts or to skip ambiguous pairs.
In order to allow the comparative analysis of the two HC genres, a mechanised labour version of Climate Quiz was created on the CrowdFlower (CF) platform.
In addition to the elements of the game interface, two verification questions were added to “force” the contributors to read the terms instead of selecting a random relation.
Can run for hours, days or years, depending on genre and size
Quality is measured in terms of agreement with a gold standard.
Note: depending on how the raw input from CF is aggregated, the results are very different. In particular, the aggregation mechanism of CQ (the highest-scored relation must have 4 more votes than the second-highest) leads to worse results than the aggregation methods of CF (which take worker performance into account during the majority vote).
Our findings verify experimentally all the differences between the two genres that the literature-based study identified. Additionally, thanks to the experimental approach we have some concrete details about the actual values of some of the parameters.
For those aspects where earlier studies disagree we found that:
1) With the appropriate aggregation method, MLab results can be as good as those obtained with games, at least for the task in question
2) Worker diversity is higher in GWAPs