Learning to Rank Question-Answer Pairs Using HRDE-LTC (NAVER Engineering)
The automatic question answering (QA) task has long been considered a primary objective of artificial intelligence.
Among the QA sub-systems, we focused on the answer-ranking part. In particular, we investigated a novel neural network architecture with an additional data-clustering module to improve performance in ranking answer candidates that are longer than a single sentence. This work can be used not only for the QA ranking task, but also to evaluate the relevance of the next utterance to a given dialogue generated by a dialogue model.
In this talk, I'll present our research results (NAACL 2018) and their potential use cases (e.g., fake news detection). Finally, I'll conclude by discussing some issues with previous research and introducing recent approaches from academia.
Tony Vlachakis, an educational technologist who works at the Georgia Department of Education, gave this presentation as an update on the K-12 Computer Science Framework Review.
ML Framework for Auto-Responding to Customer Support Queries (Varun Nathan)
This presentation describes how ML can be employed to develop a bot that can understand natural language and provide suitable responses.
LAK21: Data-Driven Redesign of Tutoring Systems (Yun Huang)
These are the slides for our paper at the LAK '21 conference:
Yun Huang, Nikki G. Lobczowski, J. Elizabeth Richey, Elizabeth A. McLaughlin, Michael W. Asher, Judith M. Harackiewicz, Vincent Aleven, and Kenneth R. Koedinger. 2021. A General Multi-method Approach to Data-Driven Redesign of Tutoring Systems. In LAK21: 11th International Learning Analytics and Knowledge Conference (LAK21), April 12–16, 2021, Irvine, CA, USA. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3448139.3448155
Abstract: Analytics of student learning data are increasingly important for continuous redesign and improvement of tutoring systems and courses. There is still a lack of general guidance on converting analytics into better system design, and on combining multiple methods to maximally improve a tutor. We present a multi-method approach to data-driven redesign of tutoring systems and its empirical evaluation. Our approach systematically combines existing and new learning analytics and instructional design methods. In particular, our methods involve identifying difficult skills and creating focused tasks for learning these difficult skills effectively following content redesign strategies derived from analytics. In our past work, we applied this approach to redesigning an algebraic modeling unit and found initial evidence of its effectiveness. In the current work, we extended this approach and applied it to redesigning two other tutor units in addition to a second iteration of redesigning the previously redesigned unit. We conducted a one-month classroom experiment with 129 high school students. Compared to the original tutor, the redesigned tutor led to significantly higher learning outcomes, with time mainly allocated to focused tasks rather than original full tasks. Moreover, it reduced over- and under-practice, yielded a more effective practice experience, and selected skills progressing from easier to harder to a greater degree. Our work provides empirical evidence of the effectiveness and generality of a multi-method approach to data-driven instructional redesign.
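One analytics step named in the abstract, identifying difficult skills, can be sketched as flagging skills with high error rates in tutor log data. This is only an illustrative reduction of the paper's methods; the skill names, log format, and 0.5 threshold below are hypothetical.

```python
from collections import defaultdict

# Hypothetical tutor log: (skill, correct_on_first_attempt) pairs.
attempts = [
    ("combine-like-terms", True), ("combine-like-terms", True),
    ("combine-like-terms", False),
    ("write-expression", False), ("write-expression", False),
    ("write-expression", True), ("write-expression", False),
]

totals, errors = defaultdict(int), defaultdict(int)
for skill, correct in attempts:
    totals[skill] += 1
    errors[skill] += (not correct)  # bool counts as 0/1

# Flag skills whose first-attempt error rate exceeds an (illustrative) threshold.
difficult = sorted(s for s in totals if errors[s] / totals[s] > 0.5)
```

Such flagged skills would then be the targets of focused practice tasks in the redesigned tutor.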
These slides present an application for assessing participants’ contributions to multiple chat conversations that debate the same topics, according to different criteria (involvement, knowledge, and innovation), along with a ranking of the conversations based on a list of important concepts to be debated. As several factors were used to determine each participant’s score, we needed to determine how well each factor correlates with the judgment of human evaluators for the same conversations. Thus, we also propose a methodology for testing the values of different factors that may be used for assessing participants in collaborative conversations, in order to identify which of them are better or worse suited for automatic assessment. Our analysis showed that the heuristics used to assess participants’ innovation and involvement correlated best with human judgment, while at the opposite end was the heuristic used to assess participants’ knowledge.
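The factor-validation step described above amounts to checking how well each automatic heuristic's scores track human ratings. A minimal sketch using Pearson correlation follows; the heuristic names and scores are hypothetical, not taken from the slides.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical scores for five participants: automatic heuristics vs. human ratings.
human = [3.0, 4.5, 2.0, 5.0, 3.5]
heuristics = {
    "involvement": [2.8, 4.6, 2.1, 4.9, 3.4],   # tracks the human ratings closely
    "knowledge":   [4.0, 2.5, 3.8, 2.9, 4.1],   # tracks them poorly
}

# Rank heuristics by how well they agree with the human judgment.
ranked = sorted(heuristics, key=lambda k: pearson(heuristics[k], human), reverse=True)
```

The best-correlated heuristics would then be kept for automatic assessment; the worst would be revised or dropped.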
SelQA: A New Benchmark for Selection-based Question Answering (Jinho Choi)
This paper presents a new selection-based question answering dataset, SelQA. The dataset consists of questions generated through crowdsourcing and sentence-length answers drawn from the ten most prevalent topics in the English Wikipedia. We introduce a corpus annotation scheme that enhances the generation of large, diverse, and challenging datasets by explicitly aiming to reduce word co-occurrences between the question and answers. Our annotation scheme is composed of a series of crowdsourcing tasks designed to more effectively utilize crowdsourcing in the creation of question answering datasets in various domains. Several systems are compared on the tasks of answer sentence selection and answer triggering, providing strong baseline results for future work to improve upon.
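The co-occurrence-reduction goal described above can be quantified with a simple lexical-overlap measure. A sketch using token-set Jaccard similarity follows; the metric choice and example sentences are illustrative, not taken from the paper.

```python
def lexical_overlap(question: str, answer: str) -> float:
    """Jaccard overlap between the token sets of a question and a candidate answer."""
    q = set(question.lower().split())
    a = set(answer.lower().split())
    return len(q & a) / len(q | a)

# A paraphrased question shares fewer tokens with the answer sentence
# than one copied almost verbatim from it.
answer = "the eiffel tower was completed in 1889"
copied = "when was the eiffel tower completed"
paraphrased = "in which year did construction of the parisian landmark finish"

assert lexical_overlap(copied, answer) > lexical_overlap(paraphrased, answer)
```

Annotation tasks that steer crowdworkers toward the second style of question make word-matching baselines weaker and the dataset more challenging.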
Content Wizard: Concept-Based Recommender System for Instructors of Programmi... (Hung Chau)
Authoring an adaptive educational system is a complex process that involves allocating a large range of educational content within a fixed sequence of units. In this paper we describe Content Wizard, a concept-based recommender system for recommending learning materials that meet the instructor’s pedagogical goals during the creation of an online programming course. The instructors are asked to provide a set of code examples that jointly reflect the learning goals associated with each course unit. The Wizard is built on top of our course-authoring tool; it helps to decrease the time instructors spend on the task and to maintain the coherence of the sequential structure of the course. It also provides instructors with additional information to identify content that might not be appropriate for the unit they are creating. We conducted an offline study with data collected from an introductory Java course previously taught at the University of Pittsburgh, in order to evaluate the practicality and effectiveness of the system. We found that the recommender’s performance is relatively close to teacher expectations in creating a computer-based adaptive course.
Financial Question Answering with BERT Language Models (Bithiah Yuan)
FinBERT-QA is a question answering system for retrieving opinionated financial passages from task 2 of the FiQA dataset. The system combines techniques from information retrieval, natural language processing, and deep learning.
Week 3: Assignment: Organizational Needs Assessment
Due: Mar 21 by 11:59 p.m.
Points: 125
Submitting: a file upload
Purpose
The identification of a need is the cornerstone of a project. The purpose of this assignment is to conduct an organizational needs assessment. The formulation of a comprehensive organizational needs assessment supports the professional formation of the DNP-prepared nurse. To complete the assessment of the organizational need, you will need to interview a key decision-maker at the practicum site. For students not implementing their DNP Project at a practicum site, complete the assignment as if you had interviewed a key decision-maker at a practicum site.
Course Outcomes
This assignment enables the student to meet the following course outcomes:
CO 2: Formulate a needs-based organizational assessment to inform strategic leadership decision-making. (POs 3, 5, 7)
CO 3: Develop strategies to lead project planning, implementation, management, and evaluation to promote high value healthcare. (POs 3, 5, 7)
Due Date(s)
Submit your assignment by 11:59 p.m. MT Sunday at the end of Week 3. The late assignment policy applies to this assignment.
Total points possible: 125
Page Requirement:
Length: 3-4 pages, excluding cover page and references
Instructions
To create flexibility, we are providing you with options on this assignment. Concept maps are an effective way to express complex ideas, especially for visual learners. For this assignment, each of the sections can be presented either as a narrative or as a concept map.
Please note that you are not required to complete any or all sections as a concept map. If you choose to use a concept map for a section, it should be created in Microsoft Word and placed in that section of your paper. The concept map must meet all the requirements of the assignment rubric for that section. The rubric and page length requirements of the paper are unchanged.
If you need additional information about concept maps and how to create a concept map in Microsoft Word, please access the following resources.
Link (video):
Microsoft Word: Creating a Flowchart, Concept Map, or Process Map
(4:03)
Link (video):
Concept Mapping for Developing your Research
(3:37)
Review the Graduate Re-Purpose Policy in the Student Handbook, page 15:
Repurposed Work (Chamberlain University Graduate Programs only): Graduate students have the opportunity to use previously submitted ideas as a foundation for future courses. No more than 50 percent of an assignment, excluding references, may be repurposed from another Chamberlain University course (excluding practicum courses). Previous course assignments that are deemed building blocks will be notated in the syllabus by the course leader. As with every assignment, students must uphold academic integrity; therefore, students must follow the guidelines for remaining academically honest according to the Academic Integrity policy. If the instr ...
Embed, Encode, Attend, Predict – applying the 4 step NLP recipe for text clas... (Sujit Pal)
Slides for a talk at PyData Seattle 2017 about Matthew Honnibal's 4-step recipe for deep learning NLP pipelines: a description of the stages in the pipeline, plus three examples covering document classification, document similarity, and sentence similarity. The examples include Keras custom layers for different types of attention.
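The four-step recipe can be sketched end-to-end in a few lines of NumPy. This is a shape-level illustration with random weights, not the Keras code from the slides; a real encoder would be an RNN or CNN rather than the per-token projection used here.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, emb_dim, hid_dim, n_classes, seq_len = 100, 8, 6, 3, 5

E = rng.normal(size=(vocab, emb_dim))        # embedding table
W_enc = rng.normal(size=(emb_dim, hid_dim))  # stand-in encoder weights
w_att = rng.normal(size=hid_dim)             # attention scoring vector
W_out = rng.normal(size=(hid_dim, n_classes))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def classify(token_ids):
    x = E[token_ids]                  # 1. embed: ids -> vectors, (seq_len, emb_dim)
    h = np.tanh(x @ W_enc)            # 2. encode: position-wise features, (seq_len, hid_dim)
    alpha = softmax(h @ w_att)        # 3. attend: weights over positions, (seq_len,)
    summary = alpha @ h               # weighted sum -> one sentence vector, (hid_dim,)
    return softmax(summary @ W_out)   # 4. predict: class probabilities, (n_classes,)

probs = classify(rng.integers(0, vocab, size=seq_len))
```

The attention step is what turns a variable-length sequence of vectors into a single fixed-size vector, which is why the same skeleton serves classification and similarity tasks alike.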
What Does Conversational Information Access Exactly Mean and How to Evaluate It? (krisztianbalog)
This talk discusses a set of specific tasks and scenarios related to information access within the vast space that is casually referred to as conversational AI. While most of these problems have been identified in the literature for quite some time now, progress has been limited. Apart from the inherently challenging nature of these problems, the lack of progress, in large part, can be attributed to the shortage of appropriate evaluation methodology and resources. This talk presents some recent work towards filling this gap.
In one line of research, we investigate the presentation of tabular search results in a conversational setting. Instead of generating a static summary of a result table, we complement brief summaries with clues that invite further exploration, thereby taking advantage of the conversational paradigm. One of the main contributions of this study is the development of a test collection using crowdsourcing.
Another line of work focuses on large-scale evaluation of conversational recommender systems via simulated users. Building on the well-established agenda-based simulation framework from dialogue systems research, we develop interaction and preference models specific to the item recommendation scenario. For evaluation, we compare three existing conversational movie recommender systems with both real and simulated users, and observe high correlation between the two means of evaluation.
This talk has been given at the CIIR talk series at the University of Massachusetts Amherst in Jan 2021 as well as at the IR seminar series at the University of Glasgow in March 2021.
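The agenda-based simulation idea above can be sketched as a user that holds a stack of preferences to reveal, one per turn, and accepts a recommendation once it matches all of them. This is a deliberately minimal illustration; the interaction and preference models developed in the talk are substantially richer, and all names below are hypothetical.

```python
class SimulatedUser:
    """Agenda-based simulated user for a conversational recommender (toy version)."""
    def __init__(self, preferences):
        self.preferences = preferences               # e.g. {"genre": "sci-fi"}
        self.agenda = list(preferences.items())      # stack of preferences to reveal

    def respond(self, recommendation):
        # Accept an item only once it matches every preference.
        if recommendation and all(recommendation.get(k) == v
                                  for k, v in self.preferences.items()):
            return "accept"
        if self.agenda:
            key, value = self.agenda.pop()           # reveal the next preference
            return f"I want {key}={value}"
        return "reject"

def run_dialogue(user, catalogue, max_turns=10):
    """A naive recommender loop: filter the catalogue by stated constraints each turn."""
    constraints = {}
    for _ in range(max_turns):
        matches = [m for m in catalogue
                   if all(m.get(k) == v for k, v in constraints.items())]
        reply = user.respond(matches[0] if matches else None)
        if reply == "accept":
            return "success"
        if reply.startswith("I want "):
            key, value = reply[len("I want "):].split("=")
            constraints[key] = value
    return "failure"

catalogue = [{"title": "B", "genre": "drama", "decade": "1990s"},
             {"title": "A", "genre": "sci-fi", "decade": "1990s"}]
outcome = run_dialogue(SimulatedUser({"genre": "sci-fi", "decade": "1990s"}), catalogue)
```

Running many such simulated dialogues yields success rates and turn counts that can be compared across recommender systems without recruiting real users for every evaluation.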
Similar to Towards Filling the Gap in Conversational Search: From Passage Retrieval to Conversational Response Generation
Entity Search: The Last Decade and the Next (krisztianbalog)
Keynote talk given at the 10th Russian Summer School in Information Retrieval (RuSSIR ’16), Saratov, Russia, August 2016.
Note: part of the work is still under review; those slides are not yet included.
Entity Retrieval (tutorial organized by Radialpoint in Montreal)krisztianbalog
This is Part II of the tutorial "Entity Linking and Retrieval for Semantic Search" given at a tutorial organized by Radialpoint (together with E. Meij and D. Odijk).
Previous versions of the tutorial were given at WWW'13, SIGIR'13, and WSDM'14. The current version contains an overhaul of the type-aware ranking part.
For the complete tutorial material (including slides for the other parts) visit http://ejmeij.github.io/entity-linking-and-retrieval-tutorial/
This is Part II of the tutorial "Entity Linking and Retrieval for Semantic Search" given at WSDM 2014 (together with E. Meij and D. Odijk). For the complete tutorial material (including slides for the other parts) visit http://ejmeij.github.io/entity-linking-and-retrieval-tutorial/
This is Part II of the tutorial "Entity Linking and Retrieval" given at SIGIR 2013 (together with E. Meij and D. Odijk). For the complete tutorial material (including slides for the other parts) visit http://ejmeij.github.io/entity-linking-and-retrieval-tutorial/
This is Part II of the tutorial "Entity Linking and Retrieval" given at WWW 2013 (together with E. Meij and D. Odijk). For the complete tutorial material (including slides for the other parts) visit http://ejmeij.github.io/entity-linking-and-retrieval-tutorial/
Richard's Adventures in Two Entangled Wonderlands (Richard Gill)
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
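For readers unfamiliar with the quantitative core of Bell's argument: with the singlet-state correlation E(a, b) = -cos(a - b), the CHSH combination reaches 2√2, while any local hidden-variable model is bounded by 2. A quick numerical check at the standard measurement angles:

```python
import math

def E(a, b):
    """Singlet-state correlation of spin measurements at angles a and b (radians)."""
    return -math.cos(a - b)

# Standard CHSH measurement settings for the two parties.
a, a2 = 0.0, math.pi / 2
b, b2 = math.pi / 4, 3 * math.pi / 4

S = abs(E(a, b) - E(a, b2) + E(a2, b) + E(a2, b2))
# Local hidden-variable theories satisfy S <= 2; quantum mechanics gives 2*sqrt(2).
```

It is this gap between 2 and 2√2 that the loophole-free experiments measured, and that super-deterministic constructions must explain away.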
Seminar on U.V. Spectroscopy by SAMIR PANDA
Spectroscopy is the branch of science dealing with the study of the interaction of electromagnetic radiation with matter.
Ultraviolet-visible spectroscopy refers to absorption spectroscopy or reflectance spectroscopy in the UV-VIS spectral region.
Ultraviolet-visible spectroscopy is an analytical method that measures the amount of light absorbed by the analyte.
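The absorbance measurement underlying UV-Vis analysis follows the Beer-Lambert law, A = ε·l·c = -log10(I/I0). A small worked example; the transmitted-light fraction and molar absorptivity values are illustrative.

```python
import math

def absorbance(I_transmitted, I_incident):
    """Absorbance A = -log10(T), where T = I/I0 is the transmittance."""
    return -math.log10(I_transmitted / I_incident)

def concentration(A, epsilon, path_cm=1.0):
    """Solve Beer-Lambert A = epsilon * l * c for concentration c (mol/L)."""
    return A / (epsilon * path_cm)

# If 10% of the incident light is transmitted, the absorbance is exactly 1.
A = absorbance(10.0, 100.0)
# With a hypothetical molar absorptivity of 20000 L/(mol*cm) and a 1 cm cuvette:
c = concentration(A, epsilon=20000.0)
```

This linear relation between absorbance and concentration is what lets a UV-Vis spectrophotometer quantify an analyte from a calibration curve.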
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a... (Ana Luísa Pinho)
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects-of-interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich in features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and their ability to enable complex behavior composed of discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization.
To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt... (Sérgio Sacani)
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes on Io’s surface have been monitored from both spacecraft and ground-based telescopes. Here, we present the highest spatial resolution images of Io ever obtained from a ground-based telescope. These images, acquired by the SHARK-VIS instrument on the Large Binocular Telescope, show evidence of a major resurfacing event on Io’s trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images show that a plume deposit from a powerful eruption at Pillan Patera has covered part of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high-resolution imaging of Io’s surface using adaptive optics at visible wavelengths.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ... (Sérgio Sacani)
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters spanning 0.4−0.9 µm) and novel JWST images with 14 filters spanning 0.8−5 µm, including 7 medium-band filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data at > 2.3 µm to construct an ultradeep image, reaching as deep as ≈ 31.4 AB mag in the stack and 30.3−31.0 AB mag (5σ, r = 0.1″ circular aperture) in individual filters. We measure photometric redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts z = 11.5−15. These objects show compact half-light radii of R_1/2 ∼ 50−200 pc, stellar masses of M⋆ ∼ 10^7−10^8 M⊙, and star-formation rates of SFR ∼ 0.1−1 M⊙ yr^−1. Our search finds no candidates at 15 < z < 20, placing upper limits at these redshifts. We develop a forward-modeling approach to infer the properties of the evolving luminosity function, without binning in redshift or luminosity, that marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results, and that the luminosity function normalization and UV luminosity density decline by a factor of ∼ 2.5 from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical models for the evolution of the dark matter halo mass function.
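The abstract does not spell out the form of the forward-modeling likelihood. As a hedged sketch only, unbinned luminosity-function inference of this kind is often written as a Poisson-process likelihood (in the spirit of Marshall et al. 1983), here extended, as an assumption, to marginalize over each candidate's photometric-redshift posterior p_i(z):

```latex
\ln \mathcal{L}
  = \sum_{i}\ln\!\left[\int p_i(z)\,
      \Phi(M_{\rm UV},z)\,\frac{dV}{dz}\,dz\right]
  \;-\;
  \iint \Phi(M_{\rm UV},z)\,C(M_{\rm UV},z)\,
      \frac{dV}{dz}\,dM_{\rm UV}\,dz
```

where Φ is the UV luminosity function, C the survey completeness and selection function, and dV/dz the comoving volume element; the second (expected-count) term is how non-detections at 15 < z < 20 would constrain the fit. This is a generic template, not the authors' exact formulation.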
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theology" (Studia Poinsotiana)
I Introduction
II Subalternation and Theology
III Theology and Dogmatic Declarations
IV The Mixed Principles of Theology
V Virtual Revelation: The Unity of Theology
VI Theology as a Natural Science
VII Theology’s Certitude
VIII Conclusion
Notes
Bibliography
All the contents are fully attributable to the author, Doctor Victor Salas. Should you wish to get this text republished, get in touch with the author or the editorial committee of the Studia Poinsotiana. Insofar as possible, we will be happy to broker your contact.
Towards Filling the Gap in Conversational Search: From Passage Retrieval to Conversational Response Generation
1. Weronika Łajewska and Krisztian Balog
University of Stavanger, Norway
CIKM’23, Birmingham
2. This study
● Problem setting: conversational response generation
○ It extends beyond passage retrieval + summarization
● Goal: snippet-level annotations of relevant passages, to enable
1. the training of response generation models that are able to ground answers in actual statements
2. the automatic evaluation of the generated responses in terms of completeness
● Main contributions:
1. Crowdsourcing task design and protocol to collect high-quality annotations
2. A dataset of 1.8k query-passage pairs from the TREC 2020 and 2022 Conversational Assistance track, annotated with answer snippets
5. Preliminary study
A comparison of different task designs, platforms, and worker pools
● Task designs: paragraph-based vs. sentence-based annotation
● Platforms and workers:
○ Amazon MTurk (regular vs. master workers)
○ Prolific
○ Expert annotators (PhD students)
6. Evaluation measures
Traditional measures of inter-annotator agreement are insufficient:
● Fleiss’ Kappa and Krippendorff’s Alpha are measures for categorical annotations that rely on a binary notion of agreement
● Here, we need to measure the degree to which snippets selected by different workers overlap
○ Inter-annotator agreement: Jaccard similarity (also a less strict variant, k-Jaccard)
○ Similarity against expert annotators: a “ROUGE-like” variant of precision and recall
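A minimal sketch of how the overlap measures named on this slide could be computed over sets of token positions. The exact k-Jaccard and “ROUGE-like” definitions used in the paper may differ; the versions below are plausible readings, not the authors' code.

```python
# Overlap-based agreement measures over annotations represented as
# sets of token positions selected by a worker.

def jaccard(a, b):
    """Strict Jaccard similarity between two annotations."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def k_jaccard(annotations, k):
    """Relaxed agreement: fraction of the union selected by >= k annotators.

    With k equal to the number of annotators, this reduces to the strict
    n-way Jaccard (intersection over union).
    """
    union = set().union(*annotations)
    if not union:
        return 1.0
    agreed = {t for t in union if sum(t in ann for ann in annotations) >= k}
    return len(agreed) / len(union)

def rouge_like(worker, expert):
    """Token-level precision and recall of a worker span against an expert span."""
    overlap = len(worker & expert)
    precision = overlap / len(worker) if worker else 0.0
    recall = overlap / len(expert) if expert else 0.0
    return precision, recall
```

For three annotators selecting {1, 2, 3}, {2, 3, 4}, and {3, 4, 5}, the strict Jaccard of the first two sets is 0.5, while k_jaccard at k = 2 rises to 0.6, illustrating why the relaxed variant reports higher agreement.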
7. Results
Inter-annotator agreement (Jaccard, with relaxed variants Jaccard_k for k = 4, 3, 2):

Task variant      Annotators            Jaccard   k=4    k=3    k=2
Paragraph-based   MTurk regular (n=5)   0.02      0.08   0.21   0.48
Paragraph-based   MTurk master (n=5)    0.18      0.35   0.53   0.73
Paragraph-based   Prolific (n=5)        0.14      0.27   0.44   0.65
Paragraph-based   Expert (m=3)          0.25      -      -      0.54
Sentence-based    MTurk regular (n=3)   0.35      -      -      0.71
Sentence-based    MTurk master (n=3)    0.47      -      -      0.76

Comparison to expert annotations (F1):

Task variant      Annotators      F1
Paragraph-based   MTurk regular   0.36
Paragraph-based   MTurk master    0.54
Paragraph-based   Prolific        0.50
Sentence-based    MTurk regular   0.31
Sentence-based    MTurk master    0.41
Main findings
● Relative ordering: MTurk masters > Prolific > MTurk regular
● Paragraph-level > sentence-level (w.r.t. similarity with expert annotations)
⇒ use MTurk and paragraph-based design for the large-scale data collection
9. Setup
Employ a small group of trained crowd workers, selected through a qualification task, and create an extended set of guidelines with the help of the annotators.
● Data collection
○ Performed in daily batches (1 topic/batch ≈ 46 HITs)
○ Individual feedback after each submitted batch
○ General comments/suggestions on a common Slack channel
○ $0.30 per HIT + $2 bonus for completing within 24h
● Qualification task
○ Consisted of: a detailed description of the problem, examples of correct annotations, a quiz, and 10 query-passage pairs to be annotated
○ 20 workers completed it; 15 passed
● Guidelines flow: initial guidelines → discussion → feedback on the qualification task → extended guidelines
10. Resulting dataset: CAsT-snippets
371 queries, top 5 passages per query ⇒ 1855 query-passage pairs (each annotated by 3 crowd workers)
● Data quality
○ Inter-annotator agreement exceeds even that of expert annotators
○ Similarity with expert annotations is on par with MTurk master workers
● Comparison against other datasets
○ More snippets annotated per input text; also, snippets are longer
Dataset         Input text          Avg. snippet length (tokens)   # snippets per annotation
CAsT-snippets   Paragraph           39.6                           2.3
SaaC            Top 10 passages     23.8                           1.5
QuAC            Wikipedia article   14.6                           1
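Per-dataset statistics of this kind (average snippet length in tokens, average number of snippets per annotation) can be reproduced with a short helper. The list-of-lists input format and whitespace tokenization below are simplifying assumptions, not the repository's actual schema.

```python
import statistics

def snippet_stats(annotations):
    """Compute (avg. snippet length in tokens, avg. # snippets per annotation).

    `annotations` is a list of annotations, each a list of snippet strings.
    Tokens are approximated by whitespace splitting.
    """
    lengths = [len(snippet.split()) for ann in annotations for snippet in ann]
    counts = [len(ann) for ann in annotations]
    return statistics.mean(lengths), statistics.mean(counts)
```

For example, two annotations [["a b c", "d e"], ["f"]] yield an average snippet length of 2 tokens and 1.5 snippets per annotation.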
11. Challenges identified
Challenges pointed out by the crowd workers that need to be addressed in conversational response generation:
● Only a partial answer is present
● Temporal considerations
○ Spans may need to be excluded given the time constraints in the query
○ Assessing temporal validity can be challenging based on the paragraph alone (without larger context)
● Subjectivity of passages originating from blogs or comments
● Indirect answers that require reasoning and background knowledge
● Determining the appropriate amount of context to include in each span
○ Balancing between being concise and being self-contained
● Determining whether evidence or additional information is needed, or whether an entity alone is sufficient as an answer
12. Summary
● Snippet-level annotations for conversational response generation (information-seeking queries)
● Several measures to ensure high data quality
○ Preliminary study to compare task variants and crowdsourcing platforms
○ Providing feedback and training to annotators throughout the data collection process
○ Incentive structure to engage crowd workers over a period of time and avoid worker fatigue
● Communication with workers also led to various insights regarding challenges in conversational response generation
13. Questions?
Extended version on arXiv: https://arxiv.org/abs/2308.08911
Dataset: https://github.com/iai-group/CAsT-snippets
15. Preliminary study
● Dataset: TREC CAsT ’20 and ’22 (top 5 passages according to relevance score for each query)
● Input: query + passage/sentence
● Output: snippet-level annotations in passage

Task variant   Annotator       Time   # workers   Acceptance rate   Cost
Paragraph      MTurk regular   182s   5           50%               $0.36
Paragraph      MTurk master    63s    5           90%               $0.38
Paragraph      Prolific        154s   5           79%               $0.51
Paragraph      Expert          96s    3           -                 -
Sentence       MTurk regular   977s   3           72%               $0.43
Sentence       MTurk master    305s   3           87%               $0.56