Work in progress: ChatGPT as an Assistant in Paper Writing

Manuel Castro
Electrical and Computer Engineering
Department (DIEECTQAI), Industrial
Engineering School (ETSII), Spanish University
for Distance Education (UNED), Spain
mcastro@ieec.uned.es
https://www.slideshare.net/mmmcastro
P. Baizan, R. Gil, F. Garcia-Loro, C. Perez, E.
SanCristobal and M. Castro

Introduction
Advantages of Natural Language and ChatGPT (Generative AI) in Education:
 Improves interaction: can be used as a conversation partner to practice languages
 Helps with writing: can assist students with grammar, spelling and organizing their
thoughts for essays and research papers
 Generates practice questions: can generate practice questions for students to
assess their understanding of a topic
 Accessibility: can be used at any time and anywhere, making it accessible to students
with different needs and schedules
 Novelty and new trending topic

Introduction
Disadvantages of Natural Language and ChatGPT in Education:
 Replaces human teachers: automating teaching through ChatGPT can replace
human teachers and lead to job loss and a decrease in the quality of education
 Replaces human assessment (?)
 Can perpetuate biases and stereotypes: ChatGPT is trained with internet text data,
which means it can perpetuate biases and stereotypes present in that data
 Lack of regulation and standardization: there are currently no regulations or
standards for the use of ChatGPT in education, which can make it
difficult to ensure the quality and accuracy of the generated responses
 Lack of understanding: many people do not fully understand how ChatGPT works,
which can lead to misinterpretations and inappropriate uses

Plagiarism Controversy
It is important to use ChatGPT
responsibly and always attribute
any generated text to its source
IEEE guidelines for artificial
intelligence (AI) generated text:
The use of artificial intelligence (AI)–
generated text in an article shall be
disclosed in the acknowledgements
section of any paper submitted to an
IEEE Conference or Periodical
The sections of the paper that use AI-
generated text shall have a citation to
the AI system used to generate the
text

Writing and Translator Assistant
ChatGPT can be a powerful tool that
can be used to improve translations
and writing
It can generate text in various
languages and its artificial intelligence
allows it to understand the context
and meaning of words. This means it
can provide suggestions and
corrections to improve grammar and
coherence of the text
ChatGPT is not a substitute for a human translator or editor
It is important to always review the text generated by ChatGPT before
using it

ChatGPT History
ChatGPT was developed by OpenAI based on GPT, which, in turn, is based on an architecture
called the Transformer (introduced by Google in 2017)
The original version of GPT was fine-tuned for specific tasks such as language
translation and question answering
GPT was first introduced in 2018, and since then, it has been improved and updated to
make it more powerful

Alignment Problems
[Figure labels: "Low Capacity, High Alignment"; "High Capacity, Low Alignment"]
"Alignment vs capability" can be thought of as a more abstract analogue
of "accuracy vs precision"
The model-generated responses are not always aligned with what users expect.
These alignment problems typically manifest in the following ways:
 Hallucinations: the model invents responses
 Generation of biased or toxic results: since models are typically trained on large
amounts of text data, if this data includes biased or toxic content, it can be
reproduced in the output
 Lack of assistance: the model fails to follow the user’s explicit instructions
 Lack of interpretability: these models estimate the probability of each
possible word (within their vocabulary) based on the previous sequence, whereas humans
are guided in the process by prior knowledge and common sense
 Prompt engineering: effectively communicating to an AI to get what you want
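
The next-word probability estimate mentioned above can be illustrated with a small sketch. The candidate words and their scores are invented for illustration and do not come from any real model; the only technique shown is the softmax step that turns raw scores into probabilities.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits.values())  # subtract the maximum for numerical stability
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}

# Hypothetical scores a model might assign to candidate next words
# after the context "The remote laboratory is" (made-up numbers).
toy_logits = {"available": 2.0, "online": 1.5, "broken": 0.2, "purple": -1.0}

probs = softmax(toy_logits)
# Greedy decoding: pick the single most probable word.
next_word = max(probs, key=probs.get)
print(next_word, round(probs[next_word], 3))
```

Nothing in this procedure checks whether the chosen word is true or appropriate, which is one way to see why hallucinations can occur.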

Reinforcement Learning with Human Feedback
(RLHF)
Step 1
To collect the training data, a list of prompts is selected
A group of human annotators is asked to write the response they expect
This data is used for the supervised fine-tuning (SFT) of ChatGPT
Step 2
A prompt and several model outputs are sampled
A group of human annotators ranks the outputs from best to worst
This new data is used to train the reward model (RM)
Step 3
Proximal Policy Optimization (PPO): the reward model is used to refine and improve the SFT model
To minimize alignment problems the creators of ChatGPT use the RLHF technique,
which involves human feedback
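
Step 2 can be made more concrete with a small sketch of how a human ranking becomes training signal for a reward model. This assumes one common formulation, a pairwise loss of the form -log(sigmoid(r_preferred - r_rejected)); the function names, answers and reward values are hypothetical and are not taken from the slides.

```python
import math
from itertools import combinations

def ranking_to_pairs(ranked_outputs):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs."""
    return list(combinations(ranked_outputs, 2))

def pairwise_loss(reward_preferred, reward_rejected):
    """-log(sigmoid(difference)): small when the preferred output scores higher."""
    diff = reward_preferred - reward_rejected
    return math.log(1.0 + math.exp(-diff))

# Hypothetical ranking produced by an annotator, best first.
ranked = ["answer A", "answer B", "answer C"]
pairs = ranking_to_pairs(ranked)

# Hypothetical scores from a reward model that is still being trained.
toy_rewards = {"answer A": 1.2, "answer B": 0.3, "answer C": -0.5}
mean_loss = sum(
    pairwise_loss(toy_rewards[better], toy_rewards[worse]) for better, worse in pairs
) / len(pairs)
print(len(pairs), round(mean_loss, 3))
```

Training would adjust the reward model to lower this loss, so that its scores agree with the human ranking.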

Reinforcement Learning with Human Feedback
(RLHF) Methodology Deficiencies
One very clear deficiency is that the data used to fine-tune the models is influenced by
a variety of subjective factors:
 Annotator preferences
 Programmers who design the labeling instructions
 The choice of prompts by the developers
Therefore, it is not a perfect model and, as already indicated, requires post-supervision
of the results

Experiment
One of the challenges that non-native English speakers often face is writing scientific
articles in natural-sounding English. That is where ChatGPT can come in handy, as a tool
to help write articles in more natural language
 We provided it with Spanish phrases and asked it to complete and translate them,
then evaluated the different options it gave us
 We submitted an English text with grammatical errors and asked ChatGPT to
review it for us

Evaluation
To evaluate the quality of the generated texts, we manually evaluated the different
response options proposed by the tool and, once the article was finished, analyzed it
using the following automatic evaluation metrics:
 METEOR (Metric for Evaluation of Translation with Explicit Ordering)
 BLEU (Bilingual Evaluation Understudy)
 ROUGE (Recall-Oriented Understudy for Gisting Evaluation)

BLEU
 The score is calculated from the overlap of n-grams (short word sequences)
between the machine translation and the reference translations, with a maximum
score of 1.0 indicating a perfect match with a reference
 The BLEU metric is based on the idea that high-quality translations should
share a large number of phrases and fragments with human reference translations
 This metric is considered one of the standards for
evaluating the quality of automatic translations
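
As an illustration of the calculation, here is a minimal sentence-level BLEU sketch: clipped n-gram precision for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty. It uses whitespace tokenization and no smoothing, so it is a simplified stand-in and not the tool used to produce the results reported in these slides.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams (tuples of n consecutive tokens) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU on a 0.0 to 1.0 scale."""
    cand = candidate.split()
    refs = [ref.split() for ref in references]
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, one empty n-gram order zeroes the score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: use the reference length closest to the candidate length.
    ref_len = min((len(r) for r in refs), key=lambda length: (abs(length - len(cand)), length))
    penalty = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / len(cand))
    return penalty * math.exp(sum(log_precisions) / max_n)

print(bleu("the cat sat on the mat today", ["the cat sat on the mat"]))
```

Scores are often reported multiplied by 100, which appears to be the scale used in the results table of this presentation.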

METEOR
 The METEOR metric combines several components: unigram (single-word) precision
and recall, a penalty for fragmented word order, and matching of stems and
synonyms in addition to exact words
 This method of evaluation has been reported to correlate better with human
judgments of translation quality than BLEU
 Precision and recall measure the share of words that match between the
machine translation and the human reference translation
 The word-order penalty acts as a fluency measure: it grows when the matched
words appear in many short, scattered fragments
 Stem and synonym matching acts as a semantic similarity measure: it credits the
machine translation for conveying the meaning of the human reference
translation with different words
[Figure: examples of sentences scored by the METEOR metric. Source: Wikipedia]
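
The following sketch shows how a METEOR-style score can be computed under simplifying assumptions: exact word matches only (no stems or synonyms), a greedy alignment, and the constants of the original 2005 formulation (recall weighted nine times more than precision, penalty 0.5 × (chunks / matches)³). The greedy alignment is a simplification chosen here for brevity, so chunk counts and scores can differ from those of the reference implementation.

```python
def align(hyp, ref):
    """Greedy exact-match alignment: list of (hyp_index, ref_index) pairs."""
    used, pairs, prev_ref = set(), [], None
    for h, word in enumerate(hyp):
        target = None
        nxt = prev_ref + 1 if prev_ref is not None else None
        # Prefer the reference position that continues the current chunk.
        if nxt is not None and nxt < len(ref) and nxt not in used and ref[nxt] == word:
            target = nxt
        else:
            for r, ref_word in enumerate(ref):
                if r not in used and ref_word == word:
                    target = r
                    break
        if target is not None:
            used.add(target)
            pairs.append((h, target))
            prev_ref = target
    return pairs

def meteor(hypothesis, reference):
    """Simplified METEOR: F-mean of precision and recall times (1 - penalty)."""
    hyp, ref = hypothesis.split(), reference.split()
    pairs = align(hyp, ref)
    matches = len(pairs)
    if matches == 0:
        return 0.0
    precision, recall = matches / len(hyp), matches / len(ref)
    f_mean = 10 * precision * recall / (recall + 9 * precision)
    # A chunk is a run of matches adjacent in both hypothesis and reference.
    chunks = 1
    for (h0, r0), (h1, r1) in zip(pairs, pairs[1:]):
        if h1 != h0 + 1 or r1 != r0 + 1:
            chunks += 1
    penalty = 0.5 * (chunks / matches) ** 3
    return f_mean * (1 - penalty)

print(round(meteor("the cat was sat on the mat", "the cat sat on the mat"), 4))
```

An identical hypothesis does not reach exactly 1.0 in this formulation, because even a single chunk incurs a small penalty.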

ROUGE
 This metric is used to evaluate the quality of summaries
 This metric is based on the idea that high-quality summaries should
contain many of the same phrases as the human reference summary
 The overlapping word sequences (n-grams) between the automatic summary and the
human summary are counted, with a maximum score of 1.0 indicating a perfect
match
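
A minimal sketch of ROUGE-N follows, assuming whitespace tokenization and a single reference summary. It reports recall (the share of reference n-grams found in the candidate) together with precision and F1; the example sentences are invented.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count the n-grams (tuples of n consecutive tokens) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=1):
    """ROUGE-N recall, precision and F1 for one candidate and one reference."""
    cand = ngram_counts(candidate.split(), n)
    ref = ngram_counts(reference.split(), n)
    overlap = sum((cand & ref).values())  # "&" keeps the minimum count per n-gram
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}

scores = rouge_n("the cat", "the cat sat", n=1)
print(scores)
```

Here the candidate covers two of the three reference words, so recall is 2/3 while precision is 1.0.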

Plagiarism Evaluation
 ChatGPT is not designed to detect plagiarism, and as a user you are responsible
for the use of the generated text
 It is important to use the output of the model responsibly and to properly
cite any text that is generated
 It is also important to keep in mind that ChatGPT is a language model and
not a plagiarism tool
Because the model is trained on a large dataset of text, it may generate text that is
similar to existing text found in the dataset
In our case, the idea is to use this tool as an aid for translating original texts from
Spanish into natural-sounding English. However, when generating those translations and
completing the sentences, it may reuse similar existing texts. It is therefore
a good idea to run the generated text through plagiarism checkers
before using it, to ensure that it is original and not copied from other sources

Experiment Results
The complete translations of the paper produced by ChatGPT, Google Translator and
Microsoft Translator have been compared using the BLEU and METEOR metrics, with the
following results:

Metric    Google   Microsoft   ChatGPT
BLEU      87.94    84.87       76.95
METEOR    92.12    91.31       91.45

Plagiarism Experiment
Another part of the experiment involves verifying how original a paper written with
the help of ChatGPT is. These results show that the PlagiarismCheckerX tool has not
detected any plagiarized phrases in the entire text of the paper
"At first, some parts of the document were
detected when using an IEEE template
without modifying the authors section and
the abstract for the test. Once these
sections were completed, the results are
shown in Figure"

Conclusions
 ChatGPT exhibits the capacity to rectify and augment texts
and expressions, generating new text in a natural language
manner. ChatGPT can even make grammatical corrections to
our text
 This ability to modify texts is the reason for performing
a plagiarism analysis. In contrast, Google Translator and
Microsoft Translator offer translations that lean more toward
literal interpretations. Consequently, subjecting these two
translation tools to a plagiarism analysis seems impractical,
since the extent of plagiarism in their output is determined by
the author's original text
 All three tools display good metrics in translation, with
Google standing out slightly

Current Work
Currently, we are customizing GPT-3.5-turbo-0613 with
IEEE Xplore papers on a specific topic (the VISIR remote laboratory)
 We downloaded all the IEEE papers on that topic, in this case VISIR, and
stored the PDFs in a folder that we will call 'trainingData'
 Now we generate an API key to be able to use the
OpenAI API. The training service and its subsequent
usage have a cost, so we need to provide billing
information in advance
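
The two preparation steps can be sketched as follows. The folder name 'trainingData' comes from the slide; the environment variable name OPENAI_API_KEY is the one conventionally read by the OpenAI Python library, and the helper function names are hypothetical.

```python
import os
from pathlib import Path

def list_training_pdfs(folder="trainingData"):
    """Return the PDF files in the folder, sorted by name (empty if it is missing)."""
    folder = Path(folder)
    if not folder.is_dir():
        return []
    return sorted(folder.glob("*.pdf"))

def get_api_key(env=os.environ):
    """Read the API key from the environment instead of hard-coding it."""
    key = env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("Set OPENAI_API_KEY before calling the OpenAI API")
    return key

pdfs = list_training_pdfs()
print(f"{len(pdfs)} PDF file(s) found in trainingData")
```

Keeping the key in an environment variable avoids committing it to source control, which matters because usage of the key is billed.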

Current Work II
 We build a LlamaIndex index from the papers. Throughout the index creation process,
LlamaIndex interacts with the OpenAI text embedding API through the LangChain
framework. The resulting index is then saved as 'index.json' for future use,
so it does not need to be regenerated each time
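
The "build once, reuse later" idea can be sketched without any external library. This is a generic caching pattern written with the standard library, not the LlamaIndex API; the function names and the toy index content are placeholders.

```python
import json
import tempfile
from pathlib import Path

def load_or_build_index(index_path, build_index):
    """Load a saved index if present; otherwise build it once and save it."""
    index_path = Path(index_path)
    if index_path.exists():
        return json.loads(index_path.read_text(encoding="utf-8")), False
    index = build_index()
    index_path.write_text(json.dumps(index), encoding="utf-8")
    return index, True

def build_toy_index():
    # Hypothetical stand-in for the expensive step (embedding every paper).
    return {"chunks": ["Placeholder text extracted from a VISIR paper."]}

# Demonstration in a temporary folder, so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "index.json"
    first_index, built_first = load_or_build_index(path, build_toy_index)
    second_index, built_second = load_or_build_index(path, build_toy_index)

print(built_first, built_second)
```

The second call reads the saved file instead of rebuilding, which is what avoids repeated embedding costs.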

Current Work III
 Now we have an expert ChatGPT, in our case about VISIR. When we send a
question to it, the system searches for relevant segments within the index. These
segments are matched with the query and transmitted to the GPT model API
(gpt-3.5-turbo) through LangChain
 The resulting customized response about VISIR is displayed to the user
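
The question-answering flow (find relevant segments, then send them with the question to the model) can be sketched as below. This is a library-free illustration: the word-count "embedding" is a toy stand-in for the OpenAI embedding API, the segment texts are invented placeholders, and the final call to the GPT model is deliberately left out.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding API)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, segments, top_k=2):
    """Return the top_k segments most similar to the question."""
    query = embed(question)
    ranked = sorted(segments, key=lambda seg: cosine(query, embed(seg)), reverse=True)
    return ranked[:top_k]

def build_prompt(question, context_segments):
    """Assemble the text that would be sent to the chat model."""
    context = "\n".join(f"- {seg}" for seg in context_segments)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"

# Invented placeholder segments; a real index would hold text from the papers.
segments = [
    "VISIR is a remote laboratory for wiring and measuring electronic circuits.",
    "BLEU compares n-grams between a translation and its references.",
    "Students access the laboratory through a web browser.",
]
question = "What is VISIR?"
top = retrieve(question, segments, top_k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Restricting the prompt to retrieved segments is what lets a general model answer as a topic expert without retraining it.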

Current Work - Conclusions
 Now we have a writing assistant that not only helps us with translation
and corrects our grammar but can also be an expert in the subject of
our paper
"What is VISIR?"
 As a weakness, the issue of plagiarism arises once again. Therefore, it is
necessary to request references, confirm their accuracy, and
subsequently verify the authenticity of the texts
 If this tool is used correctly and not as a plagiarism tool, it can be a
powerful writing assistance tool for papers

Future Work
 Analyze this paper and another set of papers created with the help of ChatGPT
with a more reputable anti-plagiarism tool such as Turnitin. Because such software
relies on its own databases, this analysis can be strengthened by checking that
the texts do not appear in freely available or private texts on the internet
 ChatGPT will be asked to help write the summary; its ROUGE scores will be
obtained and compared with those of a summary written without the help of ChatGPT
 All experiments were performed using ChatGPT version 3.5. Version 4 is already
available, but it does not yet support customization. As soon as it does,
it would be interesting to run the experiments again and evaluate
the improvements
 ALWAYS remember to use these tools responsibly and knowledgeably, as we did decades
ago with simulation/emulation and other technologies used in education
and professional life

Thank you!
