French machine reading for question answering

FRENCH MACHINE READING FOR QUESTION ANSWERING
2018, October 22 │24
1
French Machine Reading for Question Answering
Ali Kabbadj1
, Robert Plana1*
, Mehdi Brahimi1
, Alain Mangeot1
1Assystem E&I
*
Ali Kabbadj, E-mail: akabbadj@assystem.com
Abstract:
This paper proposes to unlock the main barrier to machine reading and comprehension French natural language
texts. This open the way to machine to find to a question a precise answer buried in the mass of unstructured French
texts. Or to create a universal French chatbot. Deep learning has produced extremely promising results for various
tasks in natural language understanding particularly topic classification, sentiment analysis, question answering, and
language translation. But to be effective Deep Learning methods need very large training datasets. Until now these
technics cannot be actually used for French texts Question Answering (Q&A) applications since there was not a large
Q&A training dataset. We produced a large (100 000+) French training Dataset for Q&A by translating and adapting
the English SQuAD v1.1 Dataset, a GloVe French word and character embedding vectors from Wikipedia French
Dump. We trained and evaluated of three different Q&A neural network architectures in French and carried out a
French Q&A models with F1 score around 70%.
KEYWORDS: Text mining, Data mining, Machine Learning, Deep Learning, Training Da-
taset, Machine Reading & Comprehension, French, chatbot

2018, October 22 │24
2
1 INTRODUCTION:
Until very recently, the construction of a text reading and comprehension machine able to perform tasks as semantic
parsing, information retrieval, sentimental analysis, question answering, machine translation, text classification, infor-
mation extraction... was based on "classical" linguistic techniques.
These technics include "Tokenization" (word division), lemmatization (assignment of the non-inflected form associ-
ated), morphological analysis (structure and properties of a word) or syntactic analysis (structure of a sentence and
relationships between elements of a sentence).
This approach relies mainly on the use of formal grammars (text representation and rules) constructed by the hand
of an expert-linguist. Pre-treatments mentioned are used as a basis for constructing rules and linguistic patterns that
define contexts of occurrence of such entity or relationship. Note here the particular importance given to the analysis
(constituents or dependency) in the identification and typing of relationships and events. This process is lengthy,
requires experts and given the natural language complexity the final models and text representations are not always
effective.
Deep learning methods are representation learning methods with multiple levels of representation, obtained by com-
posing simply but nonlinear modules that each transforms the representation at one level (starting with the raw input)
into a higher representation slightly more abstract level, with the composition of enough such transformations, and
very complex functions can be learned.
Deep learning can quickly acquire new effective characteristic representation from training data. The key aspect of
deep learning is that these layers of features are not designed by human engineers, they are learned from data using
a general purpose learning procedure. Deep learning requires very little engineering by hand, so it can easily take
advantage of the increase in the amount of available computation and data.
Deep learning has produced extremely promising results for various tasks in natural language understanding partic-
ularly topic classification, sentiment analysis, question answering, and language translation.
But to be effective Deep Learning methods need very large training datasets. Until now these technics cannot be
actually used for French texts Question Answering (Q&A) applications since there was not a large Q&A training
dataset.
The main contribution of this work is as follows:
 Extensive reading of literature about Text Mining (TM), Information Retrieval (IR) & Extraction (IE) and Ma-
chine Reading Comprehension (MRC) for Question Answering (Q&A). Selection of state-of-the-art methods
in these different fields.
 Creation of a large French training Dataset for Q&A by translating and adapting the English SQuAD v1.1
Dataset
 Creation of GloVe French word and character embedding vectors from Wikipedia French Dump
 Training and evaluation of three different Q&A neural network architectures in French
 Carrying out French Q&A models with F1 score around 70%
In this paper, we will first briefly summarize our state-of-the-art findings. Second, we will present in detail the applica-
tion and result of the different methods. Finally, we will discuss the results and future work.

2018, October 22 │24
3
2 RELATED WORK
Recently, impressive progress has been made in neural network question answering (Q&A) systems, which can
analyze a passage to answer a question. These systems work by matching a representation of the question to the
text to find the relevant answer string.
Q&A Traditional approaches to question answering typically involve rule-based algorithms or linear classifiers over
hand-engineered feature sets. Richardson et al. (2013) proposed two baselines, one that uses simple lexical features
such as a sliding window to match bags of words, and another that uses word-distances between words in the ques-
tion and in the document. Berant et al. (2014) proposed an alternative approach in which one first learns a structured
representation of the entities and relations in the document in the form of a knowledge base, then converts the ques-
tion to a structured query with which to match the content of the knowledge base. Wang et al. (2015) described a
statistical model using frame semantic features as well as syntactic features such as part of speech tags and depend-
ency parses. Chen et al. (2016) proposed a competitive statistical baseline using a variety of carefully crafted lexical,
syntactic, and word order features.
Neural Q&A Neural attention models have been widely applied for machine reading & comprehension or Q&A in NLP.
Hermann et al. (2015) proposed an AttentiveReader model with the release of the CNN/Daily Mail cloze-style question
answering dataset. Hill et al. (2016) released another dataset steming from the children’s book and proposed a win-
dow-based memory network. Kadlec et al. (2016) presented a pointer-style attention mechanism but performs only
one attention step. Sordoni et al. (2016) introduced an iterative neural attention model and applied it to cloze-style
machine comprehension tasks.
The most successful neural network models generally use first an “encoder-interaction-pointer” framework (Weissen-
born, Wiese, and Seiffe 2017). In such framework, word sequences of both query and context are projected into
distributed representations and encoded by recurrent neural networks. Second, an attention mechanism (Bahdanau,
Cho, and Bengio 2014) to model the complex interaction between the query and the context. Finally a pointer network
(Vinyals, Fortunato, and Jaitly 2015) is used to predict the answer boundary.
Successful combinations of these ingredients are the Bidirectional Attention Flow (BiDAF) model by Seo et al. (2016),
2016), Gated Self-Matching Networks (R-Net) (Wang et al., 2017), Document Reader (DrQA) (Chen et al., 2017) and
Multi-Paragraph Reading Comprehension (DocQA) (Clark and Gardner, 2017).
Recent works propose two new approaches. First, a little more complex BiBAF model, Reinforced Mnemonic Reader
by Hu et al. (2017). In that model, 3 new features are introduced : (i) syntactic and lexical features with the embedding
of each word, (ii) iteratively align the context with the query as well as the context itself and (iii) new objective function,
which combines the maximum-likelihood cross-entropy loss with rewards from reinforcement learning.
The second is a completely new architecture, QANet by Wei Yu et al. (2018). In that model, they remove the recurrent
nature of previous models and instead exclusively use convolutions and self-attentions as the building blocks of
encoders that separately encodes the query and context. Then they learn the interactions between context and ques-
tion by standard attentions (Xiong et al., 2016; Seo et al., 2016; Bahdanau et al., 2015). The resulting representation
is encoded again with a recurrency-free encoder before finally decoding to the probability the position of each answer
span.
The significant advance on reading comprehension has largely benefited from the availability of large-scale training
datasets. Large cloze-style datasets such as CNN/DailyMail (Hermann et al. 2015) and Childrens Book Test (Hill et
al. 2016) were first released, make it possible to solve Q&A tasks with deep neural architectures. The SQuAD (Raj-
purkar et al. 2016) is the more recently released Dataset. It is one of the first large MRC datasets (over 100k QA
pairs) is (Rajpurkar et al., 2016). For its collection, different sets of crowd-workers formulated questions and answers
using passages obtained from 500 Wikipedia articles. The answer to each question is a span in the given passage,
and many effective Q&A neural models have been developed for this dataset.
To the best of our knowledge, our paper is the first work to achieve both the production of a large Q&A French dataset
and pre-trained Q&A neural network models that handle French text.

2018, October 22 │24
4
2.1 DEEP LEARNING Q&A IN FRENCH
2.1.1 The objective
All the Q&A neural network model try to extract from a given context paragraph P the right answer to a
given question Q. The given paragraph is an ordered list of n words, P = [ p1, p2,….., pn] and the question
is an ordered list of m words, Q = [ q1, q2,….., qn]. The model should extract the right answer A which
should be a span of the original paragraph, A = [ pi, pi+1,….., pi+j].] with i+j<n.
To be clear the answer should be a span from the paragraph. No deduction can be made. For example
if the paragraph is P = “Paul has a Ford and a Honda” and the question Q = “How many car does Paul
have?” the Q&A model will not be able to answer “2”. But it wil be able to answer the question Q =
”What car does Paul have?”, the answer will be A = “a Ford and a Honda”.
2.1.2 The Neural Network Architectures
For our work, from the hundreds of neural network architectures referenced in the SQuAD leaderboard, https://raj-
purkar.github.io/SQuAD-explorer, we chose to test, train and three most competitive ones of them. We focus only on
“simple model” category and compare against other models with the same category.
Figure 5
Scores collected from the SQuAD leaderboard on Sep 1, 2018
https://rajpurkar.github.io/SQuAD-explorer/
EM F1
Human Performance
Stanford University
QANet (single)
Google Brain & CMU
Reinforced Mnemonic Reader + A2D (single model)
Microsoft Research Asia & NUDT
r-net (ensemble)
Microsoft Research Asia & NUDT
SLQA+
single model
Hybrid AoA Reader (single model)
Joint Laboratory of HIT and iFLYTEK Research
r-net+ (single model)
Microsoft Research Asia
MAMCN+ (single model)
Samsung Research
BiDAF + Self Attention + ELMo (single model)
Allen Institute for Artificial Intelligence
KACTEIL-MRC(GF-Net+) (single model)
Kangwon National University, Natural Language Processing Lab.
MDReader0
single model
KakaoNet (single model)
Kakao NLP Team
Mnemonic Reader (single model)
NUDT and Fudan University
QFASE
NUS
MAMCN (single model)
Samsung Research
M-NET (single)
UFL
AttReader (single)
College of Computer & Information Science, SouthWest University, Chongqing, China
Document Reader (single model)
Facebook AI Research
Ruminating Reader (single model)
New York University
ReasoNet (single model)
MSR Redmond
jNet (single model)
USTC & National Research Council Canada & York University
May 09, 2018 78.401 85.724
…………. ………. ……….
87.288
85.833
78.664 85.78
78.171 85.543
Jun 20, 2018
May 09, 2018
May 09, 2018
Jan 13, 2018
Jan 22, 2018
Jan 03, 2018
Feb 23, 2018
Nov 03, 2017
Mar 29, 2018
Jun 01, 2018
82.304 91.221
82.471 89.306
79.692 86.727
78.58
79.901 86.536
88.1381.538
82.136 88.126
80.436 87.021
80.027
80.146
Apr 13, 2017 71.898 79.989
Mar 24, 2017 70.607 79.821
………………….
Apr 02, 2017 70.639 79.456
Mar 08, 2017 70.555 79.364
May 23, 2018 71.373 79.725
Mar 14, 2017 70.733 79.353
Apr 22, 2018 70.985 79.939
Oct 27, 2017 71.016 79.835
Jul 14, 2017 70.995

2018, October 22 │24
5
The first, DrQA (Chen et al. 2017), (figure 6), is a basic BiDAF. It has the advantage to be coupled with an efficient
(non-machine learning) document retrieval system to first narrow the search space and focus on reading only articles
that are likely to be relevant. A simple inverted index lookup followed by term vector model scoring performs quite
well on this task for many question types, compared to the built-in ElasticSearch based Wikipedia Search API (Gorm-
ley and Tong, 2015). Articles and questions are compared as TF-IDF weighted bag-of-word vectors.
The two others have at the date of our work the highest scores. They are open source and have very different archi-
tecture. They are Reinforced Mnemonic Reader (Hu et al. 2018), (figure 7), and QANet (Yu et al. 2018), (figure 8).
Figure 6
DrQA Neural network architecture of the proposed Attention-over-Attention Reader (AoA Reader) Source :
https://arxiv.org/pdf/1607.04423.pdf
Figure 7
The high-level overview of Reinforced Mnemonic Reader In the feature-rich encoder, f~xqi gni =1, f~xcj gmj =1
are embedding matrices of query and context respectively. Fqigni =1 and fcjgmj =1 are concatenated hidden
states of encoding BiLSTM. In the iterative aligner, f_ct jgmj =1 and f^ct jgmj =1 are he query-aware and self-
aware context representation in the t-th hop respectively, while f_ct jgmj =1 is he fully-aware context representa-
tion. In the memory-based answer pointer, zlsand zle are memory vectors used for predicting probability distribu-
tions of the answer span (pl s and pl e) in the l-th hop re pectively. SFU refers to the semantic fusion unit.
Source : https://arxiv.org/pdf/1705.02798.pdf

2018, October 22 │24
6
2.1.3 Model Architecture comparison
Almost all Q&A Neural Network Architectures have five steps (Figure 9).
1. Embedding: Word Embedding + Word Character Embedding
 Word Embedding : Maps each word to a dense vector in such a way that similar semantic words have
close vectors
 Word Character Embedding: Maps each word to a vector in such a way that similar spelling words have
close vectors
2. Encoding I: Adds features to the word embedding vectors of a phrase that take in account the position of
the words in the phrase and their interaction (Word sequence or temporal interaction)
3. Align Context & Query : Adds features to the word vectors of the Context that take in account the Query
encoded word vectors
4. Encoding II or self-Aligning: Adds features to the word embedding vectors of the con-text in account even
more the position of the words in the context and their interaction (Word sequence or temporal interaction)
5. Classifier: Predicts the probability of the position in the context of the start and the end of the answer
Figure 8
An overview of the QANet architecture (left) which has several Encoder Blocks. We use the same Encoder
Block (right) throughout the model, only varying the number of convolutional layers for each block. We use lay-
ernorm and residual connection between every layer in the Encoder Block. We also share weights of the context
and question encoder, and of the three output encoders. A positional encoding is added to the input at the be-
ginning of each encoder layer consisting of sin and cos functions at varying wavelengths, as defined in (Vaswani
et al., 2017a). Each sub-layer after the positional encoding (one of convolution, self-attention, or feed-forward-
net) inside the encoder structure is wrapped inside a residual block. Source: https://arxiv.org/pdf/1804.09541.pdf

2018, October 22 │24
7
Figure 9
Embedding
=
List of N Context
word vectors
without contextual
surrounding
Output C = d x N
d = embedding dim
Attention, Fusion
or Align
Context & Query
=
List of N word vec-
tors with align-
ment of Context &
Query
Output G = f x N
f = fusion dim > t
Encoding
=
List of N Context
words vectors
with contextual
surrounding
Output H = t x N
t = encoding dim > d
Encoding II or
self-Aligning
=
List of N word
vectors with con-
textual surround-
ing
Output M = g x N
g = modeling dim
Word Embedding
=
Maps each word to
a dense vector in
such a way that sim-
ilar semantic words
have close vectors
Word Character
Embedding
=
Maps each word to
a vector in such a
way that similar
spelling words have
close vectors
Encoding I
=
Adds features to the
word embedding vec-
tors of a phrase that
take in account the
position of the words
in the phrase and
their interaction
(Word sequence or
temporal interaction)
Align Context & Query
=
word vectors of the Con-
text that take in account
the Query encoded
word vectors
Encoding II or self-
Aligning
=
word embedding
vectors of the context
in account even
more the position of
the words in the con-
text and their interac-
tion
(Word sequence or
temporal interaction)
Classifier
=
Predicts the proba-
bility of the position
in the context of the
start and the end of
the answer
Classifier End
of the answer
Output Pe = the
position of the
End of the An-
swer in the con-
text
Classifier Start
of the answer
Output Ps = the
position of the
Start of the An-
swer in the con-
text
Embedding
=
List of J Question
word vectors
without contextual
surrounding
Output Q = d x J
d = embedding dim
Encoding
=
List of J Question
words vectors
with contextual
surrounding
Output U = t x J
t = encoding dim > d
First step:
Embedding
Modeling
the semantic
and the
spelling of a
word
Second step:
Encoding
Modeling
the sequence
and interaction
between the
words in a text
or phrase
Third step:
Alignment
Modeling
the interaction
between 2
texts or
phrases
Fourth step:
Encoding
Modeling
further the se-
quence and in-
teraction be-
tween the
words in a text
or phrase
Fifth step:
Classifier
Modeling
The prediction
of an answer
given a Con-
text and a
Question
Conceptual
level
Application
level
Context
=
List of N
words
Question
=
List of J
words
Application
level detail

2018, October 22 │24
8
Each architecture uses different methods for each step:
DrQA (Chen, 2017) RMN (Hu, 2018) QANet (Yu, 2018)
Embedding
Pre-trained word GloVe
+
POS+NER++TF
Pre-trained Word GloVe
+
Pre-trained Character
+
EM+POS+NER+TF+QC
Pre-trained word GloVe
Encoding BiLSTM neural Network BiLSTM neural Network CNN neural Network
Alignment
Attention
Context-Query Attention Bi-Attention Dynamic Co-Attention
Encoding
self-aligning/
self-attention
BiLSTM neural Network Iterative
BiLSTM neural Network
CNN neural Network
Classifier Softmax Classifier on
EM
Softmax Classifier with
reinforcement learning
on F1 score
Softmax Classifier on EM
Added features to word embedding:
EM: Exact Match, POS: Part Of Speech, NER: Named Entity Recognition, TF: Term Frequency, QC: Query Category
Figure 10
Given their performances, the strengths of each architecture are the followings:
 DrQA:
 The reading module is not particularly performant, but the document retriever is very use-
ful for production deployment
 RMN:
 Incorporate more valuable features to the Context word embedding
 Iteratively align the Context with the Query and the Context with itself
 Introduce an objective function that combines the maximum-likelihood cross-entropy loss
with rewards from reinforcement learning.
 QANet:
 its CNN Neural Network is more effective and fast than the BiLSTM Recurrent Neural
Network architecture for Encoding and Alignment
Give the modular nature of all these architectures, we can easily combine the best module of each of them and get
even more performant architecture.
For example, combine the Embedding and Classier modules of RMN and the Encoding, Attention and Self-Alignment
modules of QANet.

2018, October 22 │24
9
We could also add the Document Retriever module of DrQA to the top in order to be able to ask a large corpus.

2018, October 22 │24
10
2.1.4 The training Dataset
Almost all the models available are trained on English datasets. For our work we need to train with a French dataset.
Since we did not find any substantial French Q&A Dataset, we had to build one. Instead of starting from scratch and
spend weeks to ask crowd workers to read article, create questions and report answers and their start and end
position in the context, we preferred to translate the SQuAD training and dev datasets v1.1 (Rajpurkar et al., 2016)
from English to French.
SQuAD contains 107.7K query-answer pairs, with 87.5K for training, 10.1K for validation, and another 10.1K for
testing. Only the training and validation data are publicly available. Each training example of SQuAD is a triple of (d;
q; a) in which document d is a multi-sentence paragraph, q is the question and a is the answer to the question.
For Machine Translation, we utilized the publicly available Google Translate API GoogleTrans package provided by
SuHun Han on https://github.com/ssut/py-googletrans.
However, translating (d; q; a) from English to French is not enough. All the models need the answer span and its
position in the context, its start and end. Therefore, we need to find for the French answer the star and the end of the
answer in the French context.
Since the translation of the context and the answer are not always aligned, it is not always possible to find the answer
as translated in the context. In our translation less than 2/3 of the answers were found in the context. For the rest we
had to reconstitute the answer from the English one (EnA) and the French translated one (FrA).
For this reconstituted French answer, we first split the strings (EnA) and (FrA) in a list of words (Lowa) and try to find
the string in the context with a length equal to the maximum of (EnA) and (FrA) length, Lmax, and the maximum of
word close to the words in (Lowa). We used three kind of methods to determine how close two words are.
 Exact match = 1
 Ratio Levenshtein distance
 Jaro Winkler distance
For each string of a length of Lmax in the context, we add up the words distances and take the string with the highest
score. We did that with the strings including stop words and punctuation and strings without them (Non-normalized
and Normalized).
*closet string in the context: string in the context of a length of 40 (Lmax) with the highest distance (EM, Ratio Levenshtein or Jaro Winkle)
Figure 11: French answer reconstitution
English context:
The dismantling of a nuclear reactor can only take place after the appro-
priate license has been granted pursuant to the Council Directive 85 / 3
37 /EEC and the Article 37 of the Grand Euratom Treaty. As part of the
licensing procedure, various documents, reports and expert opinions
have to be written and delivered to the competent authority, e.g. safety
report, technical documents and an environmental impact study (EIS).
French context:
Le démantèlement d'un réacteur nucléaire ne peut avoir lieu qu'après l'oc-
troi de la licence appropriée conformément à la directive 85/3 37 / CEE du
Conseil et à l'article 37 du traité Grand d’Euratom. Dans le cadre de la
procédure d'autorisation, divers documents, rapports et avis d'experts doi-
vent être rédigés et remis à l'autorité compétente, par exemple rapport de
sécurité, documents techniques et étude d’impact sur l’environnement.
English Question:
What article rules nuclear dismantling?
English Answer (EnA):
Article 37 of the Grand Euratom Treaty
French Answer Start, length:
159 , 40
French Question:
Quel article régit le démantèlement nucléaire?
French Answer (FrA):
L’Article 37 du grandiose traité Euratom
English Answer Start, length:
159 , 38
Machine Translation
Answer in
context
List of words in the answers (Lowa)
[‘37', 'article', ‘euratom', 'grand', 'treaty', 'du', 'grandiose', 'of', 'traité']
Closest string
in the context*
French Answer Start:
161
New French Answer
l'article 37 du traité Grand d’Euratom.
No

2018, October 22 │24
11
We end with six French SQuAD Datasets. We trained the models with each of them. The one that produced the best
F1 score is the Dataset 2, (Figure 12).
Figure 12
Tested French SQuAD datasets
The SQuAD French Datset 2 has the following statistics (Figure 13)
Figure 13
2.1.5 Word and Character Embedding
In order to manipulate words and characters in the code and the models, we need to represent them in form of
vectors. The first idea is representing them as one-hot vectors. Therefore, the vector dimension will be equal to the
total number of words in a vocabulary.
This representation has at least two drawbacks. The word vectors are very long and sparse. Secondly, it is not pos-
sible to compare the semantic and syntactic similarity of two words. It would be convenient to devise a dense vector
for each word chosen in such a way that similar words has close vectors. That is word embedding.
GloVe [(Pennington et al., 2014)] and word2vec [Tomas Mikolov et al., 2013] are two well-known word embedding
methods that learn embedding vectors based on the idea that the words that often appear in similar contexts are
similar to each other. To do so, these algorithms try to accurately predict the adjacent word(s) given a word or a
context (i.e., a few words appeared in the same context window).
Using well pre-trained word embedding vectors is very important for Q&A model performance.
English pre-trained word embedding vectors are available open source on the net [http://nlp.stan-
ford.edu/data/wordvecs/glove.840B.300d.zip / https://code.google.com/archive/p/word2vec/downloads],
But we didn’t find an open source reliable French pre-trained word embedding vectors.
Therefore, we had to build our own French pre-trained word embedding vectors. To do that, we used as corpus the
French Wikipedia Dump of 2018-07-20 [frwiki-20180720-pages-articles-multistream.xml -https://dumps.wiki-
media.org/frwiki/20180720/ ] and we ran the package given in https://github.com/stanfordnlp/GloVe. We got a file of
300 dimension vector for 1 658 972 French words.
Reinforced Mnemonic Reader (Hu et al. 2017) model requires also a pre-trained character embedding vectors. To
get these vectors for French language, we applied again the GloVe package to our words of the French pre-trained
word-embedding file.
In this model, to add features to the word vectors, the character vectors of all the characters in one word are concat-
enated to the word vector.
Exact Match
Levenshtein
Ratio distance
Jaro winkler
distance
Not Normalized
SQuAD French
Datasets 1
SQuAD French
Datasets 3
SQuAD French
Datasets 5
Normalized
SQuAD French
Datasets 2
SQuAD French
Datasets 4
SQuAD French
Datasets 6
Normalized
Word
Similarity
Tokenizer Set
Total English
Answers
Direct Answers
from Translation
Reconstuted
Answers
Lost Answers
Total French
Answers
34,725 21,858 11,671 1,196 33,529
100.00% 62.95% 33.61% 3.44% 96.56%
87,599 54,237 30,631 2,731 84,943
100.00% 61.92% 34.97% 3.12% 96.97%
dev
Yes Exact
train
SQuAD French Dataset 2
Spacy
fr_core_news_md

2018, October 22 │24
12
2.1.6 Data Preprocessing
We tested 4 French tokenizers: Stanford CoreNLP, spaCY with three different models “fr”, “fr_core_news_sm” and
“fr_core_news_md”. The more performing in French are spaCy “fr_core_news_sm” and “fr_core_news_md” available
in https://spacy.io/models/fr.
2.1.7 Results & Discussion
We used the F1 evaluation metric of accuracy for the model performance. F1 measures the portion of overlap to-
kens between the predicted and the right answer.
To evaluate our models, we toke the official SQuAD evaluation script (https://rajpurkar.github.io/SQuAD-explorer/) (we just
changed the stop words from English to French ones), and we ran it along with our French SQuAD development v1.1 file
obtained, as the French training file, by translation and answer reconstitution of the official SQuAD dev v1.1 English file.
We trained and evaluated the three neural architectures, DrQA (Chen), RMR (Hu) & QANet (Yu), with different French
SQuAD Datasets normalization, tokenizers, and word embedding vectors.
The best results are in (figure 14).
Figure 14
F1 score of French trained model
We notice that the F1 English scores that we obtained with the neural networks architectures that we ran with the
models stored in Github are lower than the ones reported in the SQuAD leaderboard on Sep 1, 2018. That could be
because the reported scores are obtained with more recent models than those stored in Github.
Of greater concern for our work, the French models score lower by more than 10 pts than the English models with
same architecture.
The mean reason could lie in the training French Dataset that is the output of Machine Translation. As we have seen
2/3 of translated answers are in the translated context but we have to find it start position in the context. The issue is
that the string representing the answer could be in different positions in the context. It is difficult to determine which
one is the right one. Example (Figure 15).
DrQA (Chen, 2017) RMN (Hu, 2018) QANet (Yu, 2018)
Datasets
SQuAD French
Datasets 2
SQuAD French
Datasets 2
SQuAD French
Datasets 2
Tokenizer Spacy FR sm Spacy FR md Spacy FR sm
Word Embeddings GloVe Wiki FR 300d
GloVe Wiki FR 300d
vectors_fr_car_20d.txt
from GloVe vectors_fr_300d.txt
GloVe Wiki FR 300d
F1 French 65.80% 70.56% 69.30%
F1 English
(Run with Model in Github)
78.58% 81.35% 79.56%
Diff Fr-En -12.78 -10.79 -10.26
F1 English
(Reported in SQuAD LeaderDashboard)
78.35% 88.13% 89.31%

2018, October 22 │24
13
Figure 15
If we choose the first position of the answer, we are wrong.
Therefore, this kind of error make the French Dataset less reliable for de training than de English one, which is created
by crowdworkers. So to say handmade.
Furthermore, the French answer reconstitution algorithm, for the 1/3 rest of the answers, generates even more posi-
tion and meaning errors.
2.1.8 Application example screenshots
With DrQA
French Question:
Quel est le numéro de l'article qui régit le démantèlement nucléaire ?
French Answer:
37
French context:
Le démantèlement d'un réacteur nucléaire ne peut avoir lieu qu'après l'oc-
troi de la licence appropriée conformément à la directive 85/3 37 / CEE du
Conseil et à l'article 37 du traité Grand d’Euratom. Dans le cadre de la
procédure d'autorisation, divers documents, rapports et avis d'experts doi-
vent être rédigés et remis à l'autorité compétente, par exemple rapport de
sécurité, documents techniques et étude d’impact sur l’environnement.
French Answer Start:
135 or 170
English Question:
What is the number of the article that rules nuclear dismantling?

2018, October 22 │24
14
With Reinforced Mnemonic Reader:
With QANet:

2018, October 22 │24
15

2018, October 22 │24
16
3 CONCLUSION & FUTURE WORKS
In this paper, we reviewed a large amount of Q&A neural network architectures, selected three, and trained them on
French Datasets. We devised a large Q&A French Dataset and pre-trained French word & character embedding
vectors. We trained and released Q&A French models performing F1 score around 70%.
In the future, we will continue to improve the accuracy of our Q&A French Training Datasets by fine-tuning the trans-
lation and the answers text reconstitution and position determination algorithm. We can even try to check the answers
by hand.
We will train and evaluate new Q&A neural network Architecture. There in average more than two new proposed
architectures reported on SQuAD leaderboard each month. We will also combine the best modules from different
architectures.
We will also try to overcome the translation drawback by using new translation methods that aligning monolingual
word embedding spaces in an unsupervised way. (E.g. Conneau et al. (2018), Artetxe et al. (2018) and Lample et al.
(2018)). In this way we will just use the more performing English trained Q&A neural network architectures models
with the French word embedding vectors aligned with the English ones utilized for the training of the given English
model.

2018, October 22 │24
17
4 REFERENCES
Attention-over-Attention Neural Networks for Reading Comprehension Yiming Cuiy, Zhipeng Cheny, SiWeiy, Shijin-
Wangy, Ting Liuz and Guoping Huy yJoint Laboratory of HIT and iFLYTEK, iFLYTEK Research, Beijing, China
zResearch Center for Social Computing and Information Retrieval, Harbin Institute of Technology, Harbin, China
yfymcui,zpchen,siwei,sjwang3,gphug@iflytek.com ztliu@ir.hit.edu.cn, 2016
Reading Wikipedia to Answer Open-Domain Questions Danqi Chen_ Computer Science Stanford University Stan-
ford, CA 94305, USA danqi@cs.stanford.edu, 2017
Reinforced Mnemonic Reader for Machine Comprehension Minghao Hu Yuxing Peng School of Computer Science
National University of Defense Technology fhuminghao09,yxpengg@nudt.edu.cn, 2017
QANET: COMBINING LOCAL CONVOLUTION WITH GLOBAL SELF-ATTENTION FOR READING COMPREHEN-
SION Adams Wei Yu1 David Dohan2y, Minh-Thang Luong2y fweiyug@cs.cmu.edu, fddohan,thanglu-
ongg@google.com 1Carnegie Mellon University, 2Google Brain, 2018
Making Neural QA as Simple as Possible but not Simpler Dirk Weissenborn Georg Wiese Language Technology Lab,
DFKI Alt-Moabit 91c Berlin, Germany fdirk.weissenborn, georg.wiese, laura.seiffeg@dfki.de, 2017
NEURAL MACHINE TRANSLATION BY JOINTLY LEARNING TO ALIGN AND TRANSLATE Dzmitry Bahdanau Ja-
cobs University remen, Germany KyungHyun Cho Yoshua Bengio∗ Universite de Montreal, 2016
Pointer Networks Oriol Vinyals∗ Google Brain Meire Fortunato∗ Department of Mathematics, UC Berkeley Navdeep
Jaitly Google Brain, 2015
BI-DIRECTIONAL ATTENTION FLOW FOR MACHINE COMPREHENSION Minjoon Seo1∗ Aniruddha Kembhavi2
Ali Farhadi1,2 Hananneh Hajishirzi1 University of Washington1 , Allen Institute for Artificial Intelligence2
{minjoon,ali,hannaneh}@cs.washington.edu, {anik}@allenai.org, 2016
SQuAD: 100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar and Jian Zhang and Konstantin
Lopyrev and Percy Liang {pranavsr,zjian,klopyrev,pliang}@cs.stanford.edu Computer Science Department Stanford
University, 2016
GloVe: Global Vectors for Word Representation Jeffrey Pennington, Richard Socher, Christopher D. Manning Com-
puter Science Department, Stanford University, Stanford, CA 94305 jpennin@stanford.edu, richard@socher.org,
manning@stanford.edu, 2014
Distributed Representations of Words and Phrases and their Compositionality Tomas Mikolov Google Inc. Mountain
View mikolov@google.com Ilya Sutskever Google Inc. Mountain View ilyasu@google.com, 2013
Combining Word-Level and Character-Level Representations for Relation Classification of Informal Text Dongyun
Liang,Weiran Xu, Yinge Zhao, PRIS, Beijing University of Posts and Telecommunications, China {dongyunliang,
xuweiran}@bupt.edu.cn yingezhao@outlook.com, 2017
Character-Aware Neural Language Models Yoon Kimy ySchool of Engineering and Applied Sciences Harvard Uni-
versity fyoonkim,srushg@seas.harvard.edu Yacine Jernite David Sontag Courant Institute of Mathematical Sciences
New York University fjernite,dsontagg@cs.nyu.edu, 2015
WORD TRANSLATION WITHOUT PARALLEL DATA Alexis Conneau y z , Guillaume Lample y x , Marc’Aurelio
Ranzatoy , Ludovic Denoyerx , Hervé Jégouy faconneau,glample,ranzato,rvjg@fb.com ludovic.denoyer@upmc.fr
UNSUPERVISED MACHINE TRANSLATION USING MONOLINGUAL CORPORA ONLY Guillaume Lample y z ,
Alexis Conneau y , Ludovic Denoyer z , Marc’Aure io Ranzato y y Facebook AI Research, z Sorbonne Universités,
UPMC Univ Paris 06, LIP6 UMR 606, CNRS fgl,aconneau,ranzatog@fb.com,ludovic.denoyer@lip6.fr, 2018
UNSUPERVISED NEURAL MACHINE TRANSLATION Mikel Artetxe, Gorka Labaka & Eneko Agirre IXA NLP roup
University of the Basque Country (UPV/EHU) fmikel.artetxe,gorka.labaka,e.agirreg@ehu.eus Kyunghyun ho New
York University CIFAR Azrieli Global Scholar kyunghyun.cho@nyu.edu, 2018
TEXT FEATURE EXTRACTION BASED ON DEEP LEARNING: A REVIEW Hong Liang, Xiao Sun, Yunlei Sun*
and Yuan Gao EURASIP Journal on Wireless Communications and Networking (2017) 2017:211
DOI 10.1186/s13638-017-0993-1

French machine reading for question answering

More Related Content

What's hot

Similar to French machine reading for question answering

Recently uploaded

French machine reading for question answering