2022-12, EMNLP, Generative Language Models for Paragraph-Level Question Generation

Powerful generative models have led to recent progress in question generation (QG). However, it is difficult to measure advances in QG research since there are no standardized resources that allow a uniform comparison among approaches. In this paper, we introduce QG-Bench, a multilingual and multidomain benchmark for QG that unifies existing question answering datasets by converting them to a standard QG setting. It includes general-purpose datasets such as SQuAD for English, datasets from ten domains and two styles, as well as datasets in eight different languages. Using QG-Bench as a reference, we perform an extensive analysis of the capabilities of language models for the task. First, we propose robust QG baselines based on fine-tuning generative language models. Then, we complement automatic evaluation based on standard metrics with an extensive manual evaluation, which in turn sheds light on the difficulty of evaluating QG models. Finally, we analyse both the domain adaptability of these models and the effectiveness of multilingual models in languages other than English. QG-Bench is released along with the fine-tuned models presented in the paper (https://github.com/asahi417/lm-question-generation), which are also available as a demo (https://autoqg.net/).

Generative Language Models for Paragraph-Level Question Generation
Asahi Ushio, Fernando Alva-Manchego, Jose Camacho-Collados
Outline
● What is question generation?
● QG-Bench: Unified Benchmark
○ Experimental Result
○ Manual Evaluation
● Resources

Paragraph-Level Question Generation

Applications of Question Generation (QG)
● Educational Service: [Heilman 2010], [Lindberg 2013]
● Domain Adaptation of QA Models: [Shakeri 2020]
● Adversarial Data Augmentation: [Paranjape 2021], [Bartolo 2021]
● LM pre-training: [Jia 2021]
● Unsupervised QA Models: [Lewis 2019], [Puri 2020]
● Nearest Neighbour QA: [Lewis 2021]
● Question Re-writing: [Lee 2020]
● Semantic Role Labeling: [Pyatkin 2021]
● Multihop Question Decomposition: [Perez 2020]
● Visual QA: [Krishna 2018]

Question generation remains understudied…🤔
● Model Selection
○ BART, T5, or ERNIE-GEN?
● Dataset
○ Domains
○ Languages
● Evaluation
○ BLEU4…?
● Effect of Input Type
○ With/without answer or paragraph/sentence
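
The input variants listed above (with or without the answer, paragraph or sentence context) can be sketched as a small input builder. This is only an illustration: the `<hl>` highlight marker, the naive sentence splitter, and the function names are assumptions, not details taken from the slides.

```python
import re
from typing import List

HL = "<hl>"  # hypothetical highlight marker (assumption for this sketch)


def split_sentences(paragraph: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def build_input(paragraph: str, answer: str,
                level: str = "paragraph", with_answer: bool = True) -> str:
    """Build a QG model input for one of the four input configurations."""
    if answer not in paragraph:
        raise ValueError("answer must appear verbatim in the paragraph")
    if level == "paragraph":
        text = paragraph
    elif level == "sentence":
        # Keep only the first sentence that contains the answer.
        text = next((s for s in split_sentences(paragraph) if answer in s), None)
        if text is None:
            raise ValueError("answer spans more than one sentence")
    else:
        raise ValueError("level must be 'paragraph' or 'sentence'")
    if with_answer:
        # Mark the first occurrence of the answer; a real pipeline would
        # use the character offset of the answer instead.
        text = text.replace(answer, f"{HL} {answer} {HL}", 1)
    return text


paragraph = "Cardiff is the capital of Wales. It lies on the south coast."
print(build_input(paragraph, "Wales", level="sentence", with_answer=True))
print(build_input(paragraph, "Wales", level="paragraph", with_answer=False))
```

Without the answer, the model has to decide for itself what to ask about; with sentence-level context, it sees less surrounding information than with the full paragraph.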

QG-Bench
Multilingual & multidomain QG Benchmark Dataset.
● SQuAD style QG in 8 languages:
○ Language: en/es/de/ja/ko/fr/it/ru
○ Source: Wikipedia
● 10 domains in 2 styles (English only):
○ Objective: Amazon/Wiki/News/Reddit
○ Subjective: Book/Elec./Grocery/Movie/Restaurant/Trip
GitHub: https://github.com/asahi417/lm-question-generation
Available on HuggingFace 🤗

Experiment: English QG-Bench (SQuAD)
● T5-Large achieves SoTA (ERNIE-GEN previously).
● T5-Base already outperforms the previous SoTA with fewer parameters.
● Performance scales with the model size.
Model       Param (M)   BLEU4   METEOR   ROUGE-L
ERNIE-GEN   340         25.40   26.92    52.84
T5 Small    60          24.40   25.84    51.43
T5 Base     220         26.13   26.97    53.33
T5 Large    770         27.21   27.70    54.13
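
As a quick check of the three bullet points against the table, the sketch below recomputes the gaps to ERNIE-GEN and tests whether scores grow with T5 size. The numbers are copied from the table; the dictionary layout and function names are illustrative choices.

```python
# Scores copied from the English QG-Bench (SQuAD) table.
RESULTS = {
    "ERNIE-GEN": {"params_m": 340, "BLEU4": 25.40, "METEOR": 26.92, "ROUGE-L": 52.84},
    "T5 Small": {"params_m": 60, "BLEU4": 24.40, "METEOR": 25.84, "ROUGE-L": 51.43},
    "T5 Base": {"params_m": 220, "BLEU4": 26.13, "METEOR": 26.97, "ROUGE-L": 53.33},
    "T5 Large": {"params_m": 770, "BLEU4": 27.21, "METEOR": 27.70, "ROUGE-L": 54.13},
}
METRICS = ("BLEU4", "METEOR", "ROUGE-L")


def gain_over(baseline: str, model: str, metric: str) -> float:
    """Absolute score difference (model minus baseline), rounded to 2 decimals."""
    return round(RESULTS[model][metric] - RESULTS[baseline][metric], 2)


def beats_on_all_metrics(model: str, baseline: str) -> bool:
    """True if the model scores higher than the baseline on every metric."""
    return all(RESULTS[model][m] > RESULTS[baseline][m] for m in METRICS)


def scales_with_size() -> bool:
    """True if every metric increases from T5 Small to Base to Large."""
    t5 = sorted((k for k in RESULTS if k.startswith("T5")),
                key=lambda k: RESULTS[k]["params_m"])
    return all(
        RESULTS[a][m] < RESULTS[b][m]
        for m in METRICS
        for a, b in zip(t5, t5[1:])
    )


for name in ("T5 Small", "T5 Base", "T5 Large"):
    print(name, {m: gain_over("ERNIE-GEN", name, m) for m in METRICS})
print("T5 Base beats ERNIE-GEN on all metrics:", beats_on_all_metrics("T5 Base", "ERNIE-GEN"))
print("Scores increase with T5 size:", scales_with_size())
```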

Experiment: Non-English QG-Bench
● Fine-tuning with mT5 Base
● Spanish > Italian > Korean > Russian > Japanese > French > German
● The number of training instances matters.
○ Spanish (77,025) and Italian (46,550) vs French (17,543) and German (9,314)
Language   BLEU4   METEOR   ROUGE-L
EN         23.03   25.18    50.67
RU         17.63   28.48    33.02
JA         32.40   30.58    52.67
IT         7.70    18.00    22.51
KO         12.18   29.62    28.57
ES         10.15   23.43    25.45
DE         0.87    13.65    11.10
FR         6.14    15.55    25.88
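
The training-size point can be checked against the table for the four languages whose sizes are given. The sketch below orders them by training set size and looks at BLEU4 only; it assumes the codes ES/IT/FR/DE in the table correspond to Spanish/Italian/French/German. Four data points on one metric are suggestive at most, not evidence of a general trend.

```python
# Training set sizes from the bullet above; BLEU4 scores from the table.
TRAIN_SIZE = {"ES": 77025, "IT": 46550, "FR": 17543, "DE": 9314}
BLEU4 = {"ES": 10.15, "IT": 7.70, "FR": 6.14, "DE": 0.87}

# Languages ordered from the largest to the smallest training set.
by_size = sorted(TRAIN_SIZE, key=TRAIN_SIZE.get, reverse=True)
bleu_in_size_order = [BLEU4[lang] for lang in by_size]

# True if BLEU4 strictly decreases as the training set shrinks.
bleu_follows_size = all(
    a > b for a, b in zip(bleu_in_size_order, bleu_in_size_order[1:])
)

for lang in by_size:
    print(f"{lang}: {TRAIN_SIZE[lang]:>6} training instances, BLEU4 {BLEU4[lang]:.2f}")
print("BLEU4 decreases with training size:", bleu_follows_size)
```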

Experiment: Domain Adaptability
● Fine-tuning with T5 Large on:
○ Domain’s training set
○ SQuAD training set
○ SQuAD -> domain’s training set
● In-domain training set is too small for fine-tuning (~3k).
● SQuAD fine-tuned models cannot generalize well, especially on SubjQA.
● Continuous fine-tuning (SQuAD -> domain) works best.

Manual Evaluation
Criteria:
● Answerability: whether the question can be answered by the given input answer.
● Grammaticality: grammatical correctness.
● Understandability: whether the question is easy for readers to understand.
Models: BART, T5, LSTM models.
BLEU4 (B4) does not correlate well with human judgements.
METEOR (MTR) & BERTScore (BS) are more robust.
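
How well a metric agrees with human judgements is usually measured with a correlation coefficient. The sketch below implements Spearman's rank correlation in plain Python. The slides do not say which coefficient was used, so this choice is an assumption, and the scores in the usage example are made-up placeholders, not results from the paper.

```python
import math
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(ranks(x), ranks(y))


# Made-up placeholder scores for five generated questions (illustration only).
metric_scores = [0.10, 0.40, 0.35, 0.80, 0.60]
human_scores = [1.0, 2.0, 3.0, 3.0, 2.5]
print(f"Spearman correlation: {spearman(metric_scores, human_scores):.3f}")
```

Computed per metric against each human criterion, a higher coefficient means the metric orders questions more like the annotators do.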

AutoQG (online demo): https://autoqg.net/

QG Models Available via `pip install lmqg`

Asahi Ushio, Fernando Alva-Manchego and Jose Camacho-Collados
Cardiff NLP, Computer Science & Informatics, Cardiff University
https://github.com/asahi417/lm-question-generation


Thank you!!
