Work in progress: ChatGPT as an Assistant in Paper Writing
1. Manuel Castro
Electrical and Computer Engineering
Department (DIEECTQAI), Industrial
Engineering School (ETSII), Spanish University
for Distance Education (UNED), Spain
mcastro@ieec.uned.es
https://www.slideshare.net/mmmcastro
P. Baizan, R. Gil, F. Garcia-Loro, C. Perez, E.
SanCristobal and M. Castro
2. Introduction
Advantages of Natural Language and ChatGPT (Generative AI) in Education:
Improves interaction: can be used as a conversation partner to practice languages
Helps with writing: can assist students with grammar, spelling and organizing their
thoughts for essays and research papers
Generates practice questions: can generate practice questions for students to
assess their understanding of a topic
Accessibility: can be used at any time and anywhere, making it accessible to students
with different needs and schedules
Novelty and new trending topic
3. Introduction
Disadvantages of Natural Language and ChatGPT in Education:
Replaces human teachers: automating teaching through ChatGPT can replace
human teachers and lead to job loss and a decrease in the quality of education
Replaces human assessment (?)
Can perpetuate biases and stereotypes: ChatGPT is trained with internet text data,
which means it can perpetuate biases and stereotypes present in that data
Lack of regulation and standardization: there are currently no regulations or
standards for the use of ChatGPT in education, which can make it
difficult to ensure the quality and accuracy of the generated responses
Lack of understanding: many people do not fully understand how ChatGPT works,
which can lead to misinterpretations and inappropriate uses
4. Plagiarism Controversy
It is important to use ChatGPT
responsibly and always attribute
any generated text to its source
IEEE guidelines for artificial
intelligence (AI) generated text:
The use of artificial intelligence (AI)–generated text in an article shall be
disclosed in the acknowledgements section of any paper submitted to an
IEEE Conference or Periodical
The sections of the paper that use AI-generated text shall have a citation to
the AI system used to generate the text
5. Writing and Translator Assistant
ChatGPT can be a powerful tool
for improving translations
and writing
It can generate text in various
languages and its artificial intelligence
allows it to understand the context
and meaning of words. This means it
can provide suggestions and
corrections to improve grammar and
coherence of the text
ChatGPT is not a substitute for a human translator or editor
It is important to always review the text generated by ChatGPT before
using it
6. ChatGPT History
Developed by OpenAI and based on GPT, which, in turn, is based on an architecture
called the Transformer (introduced by Google in 2017)
The original version of GPT was fine-tuned for specific tasks such as language
translation and question answering
It was first introduced in 2018, and since then it has been improved and updated to
make it more powerful
7. Alignment Problems
Figure: Low Capacity, High Alignment vs. High Capacity, Low Alignment
"alignment vs capability" can be
thought of as a more abstract analogue
of "accuracy vs precision"
The model-generated responses are not always
aligned with what users expect. These alignment
problems typically manifest in the
following ways:
Hallucinations: the model invents responses
Generation of biased or toxic results: since
models are typically trained on large amounts of text
data, if this data includes biased or toxic
content, it can be reproduced in the output
Lack of assistance: the model fails to follow the user’s explicit instructions
Lack of interpretability: these models use an estimation of the probability of each
possible word (within their vocabulary) based on the previous sequence, guided in the
process by our prior knowledge and common sense
Prompt engineering: effectively communicating to an AI to get what you want
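The "probability of each possible word" mentioned above can be illustrated with a small sketch. It assumes the model has already produced one raw score per vocabulary word. The four-word vocabulary, the scores and the function name are hypothetical and chosen only for illustration.

```python
import math

def next_token_probabilities(scores):
    """Turn raw scores (one per vocabulary word) into probabilities with a softmax."""
    highest = max(scores.values())  # subtracted for numerical stability
    exps = {word: math.exp(s - highest) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}

# Hypothetical scores for the word that follows "The remote laboratory is"
scores = {"online": 2.0, "available": 1.5, "closed": 0.2, "banana": -3.0}
probs = next_token_probabilities(scores)
best = max(probs, key=probs.get)  # the most probable next word
```

The probabilities sum to 1, and the model either picks the most probable word or samples from the distribution. Nothing in these numbers explains why one word scored higher than another, which is one way to read the interpretability problem.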
8. Reinforcement Learning with Human Feedback
(RLHF)
Step 1
To collect the
initial training data,
a list of prompts is
selected
A group of human
annotators is asked to
write the response
they expect
This data is used for
supervised fine-tuning
(SFT) of ChatGPT
Step 2
A prompt and several
model outputs are
sampled
A group of human annotators ranks
the outputs from best
to worst
This new data is used
to train the reward
model (RM)
Step 3
Proximal Policy
Optimization (PPO):
the reward model is
used to refine and
improve the SFT model
To minimize alignment problems the creators of ChatGPT use the RLHF technique,
which involves human feedback
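As a rough illustration of Step 2, a reward model can be trained so that outputs ranked higher by the annotators receive higher scores. The sketch below uses a pairwise ranking loss, a common choice for this step. The slides do not state which loss was used, so the loss itself, the function name and the example scores are assumptions.

```python
import math
from itertools import combinations

def pairwise_ranking_loss(rewards_best_to_worst):
    """Average of -log(sigmoid(r_better - r_worse)) over every pair in a human ranking.

    The input holds the reward model's scores for several outputs of one prompt,
    ordered as the annotators ranked them (best first).
    """
    pairs = list(combinations(rewards_best_to_worst, 2))  # earlier item = better
    losses = [
        -math.log(1.0 / (1.0 + math.exp(-(better - worse))))
        for better, worse in pairs
    ]
    return sum(losses) / len(losses)

# Hypothetical scores: the first list agrees with the human ranking, the second does not
agreeing = pairwise_ranking_loss([2.0, 1.0, -1.0])
disagreeing = pairwise_ranking_loss([-1.0, 1.0, 2.0])
```

The loss is small when the scores agree with the human ranking and large when they contradict it, so minimizing it pushes the reward model toward the annotators' preferences.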
9. Reinforcement Learning with Human Feedback
(RLHF) Methodology Deficiencies
One very clear deficiency is that the data used to fine-tune the models is influenced by
a variety of subjective factors:
Annotator preferences
Programmers who design the labeling instructions
The choice of prompts by the developers
Therefore, it is not a perfect model and, as already indicated, requires post-supervision
of the results
10. Experiment
One of the challenges that non-native English speakers often face is writing scientific
articles in natural language. That's where ChatGPT can come in handy, as a tool to help
write articles in more natural language
We provided it with Spanish phrases and asked it to complete and translate them,
then evaluated the different options it gave us
We submitted an English text with grammatical errors and asked ChatGPT to
review it for us
11. Evaluation
To evaluate the quality of the generated texts, in addition to a manual evaluation of the
different response options proposed by the tool, the finished article was
analyzed using the following automatic evaluation metrics:
METEOR (Metric for Evaluation of Translation with Explicit Ordering)
BLEU (Bilingual Evaluation Understudy)
ROUGE (Recall-Oriented Understudy for Gisting Evaluation)
12. BLEU
The score is calculated as the match between the
word sequences (n-grams) of the machine translation and
those of the reference translations, with a maximum score of
1.0 indicating a perfect translation
The BLEU metric is based on the idea that high-
quality translations should contain a large number
of identical or similar phrases and fragments to
human reference translations
This metric is considered one of the standards for
evaluating the quality of automatic translations
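A minimal sketch of the idea behind BLEU follows. It assumes a single reference translation, whitespace tokenization and no smoothing. The function names are illustrative, and this is not the implementation used for the experiment.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(candidate, reference, max_n=4):
    """Geometric mean of clipped n-gram precisions, times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count at its count in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(overlap / total))
    # Penalize candidates that are shorter than the reference
    brevity = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / max_n)

reference = "the cat is on the mat"
perfect = sentence_bleu(reference, reference)
partial = sentence_bleu("the cat sat on the mat", reference, max_n=2)
```

An identical translation scores 1.0. Because unsmoothed BLEU drops to 0 as soon as one n-gram order has no match, real evaluations usually score a whole document or apply smoothing.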
13. METEOR
The METEOR metric is calculated using several
components, including sentence precision, a fluency
measure, and a semantic similarity measure
This method of evaluation has a higher correlation
with human judgments of translation quality than BLEU
Sentence precision refers to the percentage of
identical sentences between the machine
translation and the human reference translation
The fluency measure refers to the grammar and
syntax of the machine translation
The semantic similarity measure refers to the
ability of the machine translation to convey the
meaning of the human reference translation
Source: Wikipedia. Examples of
sentences scored by the METEOR metric
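As originally published, METEOR combines unigram precision and recall with a penalty for fragmented word order. The sketch below keeps only that core and makes several simplifying assumptions: exact word matches only (no stemming or synonyms), a greedy left-to-right alignment, and a single reference. The function name is illustrative, and this is not the implementation used for the experiment.

```python
def simple_meteor(candidate, reference):
    """Exact-match METEOR sketch: recall-weighted F-mean times (1 - fragmentation penalty)."""
    cand, ref = candidate.split(), reference.split()
    used = [False] * len(ref)
    alignment = []  # (candidate index, reference index) pairs
    for i, word in enumerate(cand):
        for j, ref_word in enumerate(ref):
            if not used[j] and word == ref_word:
                used[j] = True
                alignment.append((i, j))
                break
    matches = len(alignment)
    if matches == 0:
        return 0.0
    precision, recall = matches / len(cand), matches / len(ref)
    f_mean = 10 * precision * recall / (recall + 9 * precision)
    # A chunk is a run of matches that is contiguous in both sentences
    chunks = 1
    for (i1, j1), (i2, j2) in zip(alignment, alignment[1:]):
        if i2 != i1 + 1 or j2 != j1 + 1:
            chunks += 1
    penalty = 0.5 * (chunks / matches) ** 3
    return f_mean * (1 - penalty)

reference = "the cat sat on the mat"
same_order = simple_meteor(reference, reference)
shuffled = simple_meteor("on the mat sat the cat", reference)
```

Both candidates contain exactly the reference words, yet the shuffled one scores much lower because its matches fall into many small chunks. Full METEOR implementations add stem and synonym matching.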
14. ROUGE
This metric is used to evaluate the quality of the summaries
This metric is based on the idea that high-quality summaries should
contain many identical phrases to those of the human reference summary
The matching phrases between the automatic summary and the human
summary are compared, with a maximum score of 1.0 indicating a perfect
match
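The matching described above can be sketched as ROUGE-N recall: the fraction of the reference summary's n-grams that also appear in the automatic summary. The sketch assumes whitespace tokenization and a single reference. The function name and the example sentences are illustrative.

```python
from collections import Counter

def rouge_n_recall(summary, reference, n=1):
    """Share of the reference summary's n-grams that the automatic summary recovers."""
    def ngrams(text):
        tokens = text.split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    summary_counts, reference_counts = ngrams(summary), ngrams(reference)
    total = sum(reference_counts.values())
    if total == 0:
        return 0.0
    # Each reference n-gram counts at most as often as it occurs in the summary
    overlap = sum(min(c, summary_counts[g]) for g, c in reference_counts.items())
    return overlap / total

full = rouge_n_recall("the cat sat", "the cat sat")
partial = rouge_n_recall("the cat", "the cat sat")
```

Here `full` is 1.0 and `partial` is 2/3, since the shorter summary recovers two of the three reference words.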
15. Plagiarism Evaluation
ChatGPT is not designed to detect plagiarism, and as a user you are responsible
for the use of the generated text
It is important to use the output of the model responsibly and to properly
cite any text that is generated
It is also important to keep in mind that ChatGPT is a language model and
not a plagiarism tool
Because the model is trained on a large dataset of text, it may generate text that is
similar to existing text found in the dataset
In our case, the idea is to use this tool as an aid to translate original texts from
Spanish into natural-sounding English. However, it is possible that when
generating those translations and completing the sentences, it uses similar texts. It is
also a good idea to check the generated text against plagiarism checkers and tools
before using it, to ensure that it is original and not copied from other sources
16. Experiment Results
The complete translations of the paper produced by ChatGPT, Google Translator and Microsoft
Translator have been compared. For this comparison, the BLEU and METEOR metrics have
been analyzed, obtaining the following results
         GOOGLE   MICROSOFT   CHATGPT
BLEU     87.94    84.87       76.95
METEOR   92.12    91.31       91.45
17. Plagiarism Experiment
Another part of the experiment involves verifying how original a paper written with
the help of ChatGPT is. These results show that the PlagiarismCheckerX tool has not
detected any plagiarized phrases in the entire text of the paper
"At first, some parts of the document were
detected when using an IEEE template
without modifying the authors section and
the abstract for the test. Once these
sections were completed, the results are
shown in Figure"
18. Conclusions
ChatGPT exhibits the capacity to rectify and augment texts
and expressions, generating new text in natural-sounding
language. ChatGPT can even make grammatical corrections to
our text
This ability to modify texts is the reason for
performing a plagiarism analysis. In
contrast, Google Translator and Microsoft Translator offer
translations that lean more toward literal interpretations.
Consequently, subjecting these last two translation tools to a
plagiarism analysis seems impractical, since the extent of
plagiarism in their output is at the discretion of the
author
All three tools display good metrics in translation, with
Google Translator slightly ahead
19. Current Work
Currently, we are fine-tuning a custom version of GPT-3.5-turbo-0613 with
IEEE Xplore papers on a specific topic (the VISIR remote laboratory)
We downloaded all the papers on a specific topic from IEEE, in this case VISIR, and
stored all the PDFs in a folder that we will call 'trainingData'
Now we generate an API key to be able to use the
OpenAI API. The training service and its subsequent
usage have a cost, so we need to provide billing
information in advance
20. Current Work II
We build a LlamaIndex from the papers. Throughout the index creation process,
LlamaIndex interacts with the OpenAI text embedding API through the LangChain
framework. The resulting index is subsequently saved as 'index.json,' which serves
as a repository for future use. It is not necessary to generate the index each time
21. Current Work III
Now we have an expert ChatGPT, in our case about VISIR. When we send a
question to it, the system searches for relevant segments within the index. These
segments are matched with the query and transmitted to the GPT model API
(gpt-3.5-turbo) through LangChain
The resulting customized response about VISIR is displayed to the user
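The retrieval step can be illustrated without any external service. The sketch below does not use the LlamaIndex, LangChain or OpenAI APIs. It replaces the embedding API with a toy bag-of-words vector and only builds the prompt that would be sent to the model. All function names, the sample segments and the prompt wording are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding API: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, segments, top_k=1):
    """Return the top_k segments most similar to the question."""
    query = embed(question)
    ranked = sorted(segments, key=lambda s: cosine(query, embed(s)), reverse=True)
    return ranked[:top_k]

def build_prompt(question, segments):
    """Combine the retrieved segments and the question into one prompt."""
    context = "\n".join(retrieve(question, segments))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Hypothetical segments standing in for text extracted from the indexed papers
segments = [
    "VISIR is a remote laboratory described in IEEE Xplore papers",
    "BLEU compares a machine translation with reference translations",
]
prompt = build_prompt("What is VISIR", segments)
```

In the real pipeline the vectors come from the embedding API and are stored in the index, so only the question has to be embedded at query time.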
22. Current Work - Conclusions
Now we have a writing assistant that not only helps us with translation
and corrects our grammar but can also be an expert in the subject of
our paper
"What is VISIR?"
As a weakness, the issue of plagiarism arises once again. Therefore, it is
necessary to request references, confirm their accuracy, and
subsequently verify the authenticity of the texts
If this tool is used correctly and not as a plagiarism tool, it can be a
powerful writing assistance tool for papers
23. Future Work
Analyze this paper and another set of papers created with the help of ChatGPT
with a more reputable anti-plagiarism tool such as Turnitin. With an
anti-plagiarism analysis using software based on its own databases, we can
strengthen the analysis by ensuring that the texts do not appear in any free or
private internet texts
ChatGPT will be asked to help with the summary, and its ROUGE metrics will
be obtained and compared with those of a summary written without the help of ChatGPT
All experiments were performed using ChatGPT version 3.5. Version 4
is already available, but it does not yet support customization. As soon as
customization is available, it would be interesting to run the experiments again and evaluate
the improvements
ALWAYS remember responsible and knowledgeable use, as we did decades
ago with simulation/emulation and other technologies used in education
and professional life
24. Thank you!
Editor's Notes
As can be observed, Google Translator obtains better metrics in translation and with a significant difference in the BLEU metric compared to ChatGPT. However, in the METEOR metric, ChatGPT improves and is close to Google Translator, surpassing Microsoft Translator. This improvement is due to the fact that BLEU, as already mentioned, only compares the difference between manually translated sentences and automatically translated ones. On the other hand, METEOR takes into account other factors such as the use of synonyms, similar semantics, etc. And this is where ChatGPT has its strength, as it not only translates but also helps the user in the writing, proposing or completing the writing.
This tool was chosen for the initial evaluation tests because it primarily works by searching the analyzed texts in numerous internet search engines. In our case, this is sufficient to evaluate ChatGPT since, as previously indicated, it has been trained with freely available data on the internet.