www.tu-chemnitz.de
Chemnitz ∙ 16. December 2020 ∙ Md Tajul Islam ∙ Miriam Kreher
SEMINAR WEB ENGINEERING (WS 2020/2021)
DATA AUGMENTATION
Data Augmentation
Md Tajul Islam
Miriam Kreher
Mentor: Mahda Noura

Contents
● What is Data Augmentation?
● Why do we need it?
● How does data augmentation work?
● Data Augmentation on Pictures
● Where to use Data augmentation?
● Data Augmentation in NLP
● Demonstration of an NLP problem
● Pros and cons
● Key takeaways

What is Data Augmentation?
- Expand the size of a training set by creating modified data from the existing data.

Purpose of Data Augmentation
- Enlarge the dataset
- Prevent overfitting
- Improve model performance

What to Augment!
- Audio
- Texts
- Images
- Any other types

Data Augmentation techniques: For Images
- Geometric transformations
- Color space transformations
- Mixing images
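
The three families of image techniques can be illustrated on a tiny grayscale image stored as a nested list. This is a minimal sketch and not taken from the slides: the helper names (flip_horizontal, adjust_brightness, mix_images) and the 0-255 pixel range are assumptions made for illustration.

```python
# Minimal sketch of the three image-augmentation families on a grayscale
# image stored as a list of rows (pixel values 0-255).
# All helper names are hypothetical and chosen for illustration only.

def flip_horizontal(image):
    """Geometric transformation: mirror each row left to right."""
    return [list(reversed(row)) for row in image]


def adjust_brightness(image, delta):
    """Color space transformation: shift every pixel, clipped to 0-255."""
    return [[max(0, min(255, pixel + delta)) for pixel in row] for row in image]


def mix_images(image_a, image_b, alpha=0.5):
    """Mixing images: pixel-wise weighted average of two equally sized images."""
    return [
        [round(alpha * a + (1 - alpha) * b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]


image = [[0, 50, 100],
         [150, 200, 250]]
other = [[100, 100, 100],
         [100, 100, 100]]

flipped = flip_horizontal(image)          # columns reversed
brighter = adjust_brightness(image, 20)   # 250 + 20 is clipped to 255
mixed = mix_images(image, other, alpha=0.5)
```

Each function returns a new image and leaves the input untouched, so the original and the augmented copies can both be added to the training set.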

Data Augmentation techniques: For Text
In my presentation focus topic is Data Augmentation. -- Original
In my presentation theme topic is Data Augmentation -- Synonym Replacement
In my presentation focus topic is Data Augmentation techniques -- Random Insertion
In my presentation topic focus is Data Augmentation -- Random Swap
In my presentation focus topic Data Augmentation -- Random Deletion
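
The four text operations can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions: the SYNONYMS table is a hand-written toy stand-in for a real thesaurus such as WordNet, and all function names are hypothetical rather than taken from the slides or from any library.

```python
import random

# Toy synonym table (assumption); a real setup would query a thesaurus.
SYNONYMS = {"focus": ["theme"], "topic": ["subject"], "presentation": ["talk"]}


def synonym_replacement(words, n, rng):
    """Replace up to n words that have an entry in SYNONYMS."""
    result = list(words)
    candidates = [i for i, w in enumerate(result) if w in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        result[i] = rng.choice(SYNONYMS[result[i]])
    return result


def random_insertion(words, n, rng):
    """Insert a synonym of a random known word at a random position, n times."""
    result = list(words)
    known = [w for w in words if w in SYNONYMS]
    for _ in range(n):
        if not known:
            break
        synonym = rng.choice(SYNONYMS[rng.choice(known)])
        result.insert(rng.randint(0, len(result)), synonym)
    return result


def random_swap(words, n, rng):
    """Swap two randomly chosen positions, n times."""
    result = list(words)
    for _ in range(n):
        if len(result) < 2:
            break
        i, j = rng.sample(range(len(result)), 2)
        result[i], result[j] = result[j], result[i]
    return result


def random_deletion(words, p, rng):
    """Drop each word with probability p, always keeping at least one word."""
    kept = [w for w in words if rng.random() > p]
    return kept if kept else [rng.choice(words)]


rng = random.Random(0)  # fixed seed so the run is repeatable
sentence = "In my presentation focus topic is Data Augmentation".split()
replaced = synonym_replacement(sentence, 1, rng)
inserted = random_insertion(sentence, 1, rng)
swapped = random_swap(sentence, 1, rng)
deleted = random_deletion(sentence, 0.2, rng)
```

Every function works on a copy of the word list, so one original sentence can produce several augmented variants.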

Data Augmentation techniques: For Audio
- Noise Injection
- Shifting
- Changing the speed of the tape
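
The three audio techniques can be sketched on a signal held as a plain list of samples. This is a minimal illustration, not the presenters' code: the function names, the Gaussian noise model, the wrap-around shift and the linear-interpolation resampling are all assumptions.

```python
import math
import random


def inject_noise(signal, noise_level, rng):
    """Noise injection: add Gaussian noise to every sample."""
    return [s + rng.gauss(0.0, noise_level) for s in signal]


def shift(signal, offset):
    """Shifting: rotate the signal to the right by `offset` samples (wraps around)."""
    offset %= len(signal)
    return signal[-offset:] + signal[:-offset] if offset else list(signal)


def change_speed(signal, factor):
    """Speed change by linear-interpolation resampling.

    factor > 1 shortens the signal (faster), factor < 1 lengthens it (slower).
    Like a tape played at another speed, this also changes the pitch.
    """
    new_length = max(1, int(len(signal) / factor))
    result = []
    for n in range(new_length):
        position = min(n * factor, len(signal) - 1)
        left = int(position)
        right = min(left + 1, len(signal) - 1)
        fraction = position - left
        result.append((1 - fraction) * signal[left] + fraction * signal[right])
    return result


# A 440 Hz sine wave sampled at 8 kHz for 0.1 s (800 samples).
sample_rate = 8000
signal = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(800)]

rng = random.Random(0)
noisy = inject_noise(signal, 0.01, rng)
shifted = shift(signal, 100)
faster = change_speed(signal, 2.0)
slower = change_speed(signal, 0.5)
```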

Workflow of Data Augmentation
data set → training set + validation set
training set → Data Augmentation → enlarged training set
enlarged training set + validation set → performance test and evaluation
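
The workflow can be sketched in a few lines. The key point is that only the training set is enlarged, while the validation set keeps the original data. This is a minimal sketch: the helper names, the toy dataset and the word-swap placeholder augmentation are assumptions made for illustration.

```python
import random


def split_dataset(samples, validation_fraction, rng):
    """Shuffle and split into (training_set, validation_set)."""
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


def augment(text, rng):
    """Placeholder augmentation (assumption): swap two random words."""
    words = text.split()
    if len(words) >= 2:
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return " ".join(words)


def enlarge(training_set, copies, rng):
    """Keep every original sample and add `copies` augmented variants of it."""
    enlarged = list(training_set)
    for text, label in training_set:
        for _ in range(copies):
            enlarged.append((augment(text, rng), label))
    return enlarged


rng = random.Random(0)
dataset = [
    (f"review number {i} was good", "pos") if i % 2 == 0
    else (f"review number {i} was bad", "neg")
    for i in range(10)
]

training_set, validation_set = split_dataset(dataset, 0.2, rng)
enlarged_training_set = enlarge(training_set, copies=2, rng=rng)
# The model would now be trained on enlarged_training_set and evaluated on
# the untouched validation_set.
```

Splitting before augmenting avoids variants of the same original sample ending up on both sides of the split.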

When do I use data augmentation?
- small data set
- complex problem
- transformation of data would be effective and easy
- You know how to find the correct method for your data
- You have a strategy for dealing with overfitting

Data Augmentation Applications
- Natural Language Processing
- Human-computer interaction, understanding human emotion
- Scientific research, medical fields
- Image classification
- Speech recognition, sound classification
- Text classification, text generation

Natural Language Processing: some tasks
- Translation tasks
- Speech emotion recognition
- Speech recognition
- Text classification (demo)
- Text-to-speech
- Syntax and semantic analysis
- Text generation, dialogue management

NLP and Data augmentation in speech emotion recognition
C. Etienne and B. Schmauch, "Speech Emotion Recognition with Data Augmentation and Layer-wise Learning Rate Adjustment"
Problem: class imbalance and small dataset
Solution: data augmentation by rescaling of spectrograms
Results: improvement in unweighted accuracy (57.7 for the baseline, 61.7 for the best model); weighted accuracy remains highest for the baseline (66.4)

                                            Baseline                     Best model
Augmentation during training                   -         -         +         +
Oversampling (×2) of happiness and anger       -         +         +         +
Frequency range (kHz)                          4         4         4         8
Weighted accuracy                             66.4      63.5      64.2      64.5
Unweighted accuracy                           57.7      59.8      60.9      61.7

10-fold cross-validation scores depending on the techniques applied (for each experiment we present the results corresponding to its best run).
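
The slide does not define the two accuracy measures. The sketch below assumes the definitions commonly used in speech emotion recognition: weighted accuracy is the overall fraction of correct predictions, and unweighted accuracy is the mean of the per-class recalls. It also shows a simple ×2 oversampling of minority classes. All function names and the toy labels are hypothetical.

```python
from collections import Counter


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: fraction of all samples predicted correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls: every class counts equally."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def oversample(samples, classes, factor=2):
    """Repeat every sample whose label is in `classes` `factor` times."""
    result = []
    for features, label in samples:
        result.extend([(features, label)] * (factor if label in classes else 1))
    return result


# Imbalanced toy example: 8 neutral clips, 2 happiness clips.
y_true = ["neutral"] * 8 + ["happiness"] * 2
y_pred = ["neutral"] * 8 + ["neutral", "happiness"]
wa = weighted_accuracy(y_true, y_pred)    # 9 of 10 correct
ua = unweighted_accuracy(y_true, y_pred)  # (8/8 + 1/2) / 2

samples = [("clip_a", "neutral"), ("clip_b", "happiness"), ("clip_c", "anger")]
balanced = oversample(samples, classes={"happiness", "anger"}, factor=2)
```

On imbalanced data the two measures can move in opposite directions, which matches the pattern in the table: favoring the rare classes raises unweighted accuracy while weighted accuracy drops slightly.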

NLP and Data augmentation in translation tasks
T. Potapczyk, P. Przybysz, M. Chochowski, A. Szumaczuk: "Samsung's System for the IWSLT 2019 End-to-End Speech Translation Task"
Problem: translate English TED lectures into German
Solution: data augmentation by applying audio effects
Results: improvement in BLEU score

Model                                      tst2010
ASR Transformer for SLT baseline            12.76
+ Model averaging                           12.99
+ Dense FF in Decoder                       13.20
+ Spectrogram masking                       13.74
+ Data augmentation (speed x1)              14.85
+ 8 head attention                          15.31
+ Data augmentation (speed + echo x1)       15.56

BLEU scores for models trained on LibriSpeech data. Each subsequent model includes all the previous techniques in the table.
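
Two of the techniques in the table, echo and spectrogram masking, can be sketched in plain Python. The slides do not give the paper's actual parameters or tooling, so the function names, the single-tap echo and the one-band-per-axis masking below are assumptions meant only to show the idea.

```python
import random


def add_echo(signal, delay, decay):
    """Echo: add a delayed, attenuated copy of the signal to itself."""
    return [
        sample + (decay * signal[n - delay] if n >= delay else 0.0)
        for n, sample in enumerate(signal)
    ]


def mask_spectrogram(spectrogram, max_freq_width, max_time_width, rng):
    """Zero out one random band of frequency rows and one band of time columns.

    `spectrogram` is a list of frequency rows, each a list of time frames.
    The maximum widths must not exceed the respective dimensions.
    """
    n_freq, n_time = len(spectrogram), len(spectrogram[0])
    masked = [list(row) for row in spectrogram]  # copy, input stays intact

    f_width = rng.randint(0, max_freq_width)
    f_start = rng.randint(0, n_freq - f_width)
    for f in range(f_start, f_start + f_width):
        masked[f] = [0.0] * n_time

    t_width = rng.randint(0, max_time_width)
    t_start = rng.randint(0, n_time - t_width)
    for row in masked:
        for t in range(t_start, t_start + t_width):
            row[t] = 0.0
    return masked


signal = [1.0, 0.0, 0.0, 0.0]
echoed = add_echo(signal, delay=2, decay=0.5)

rng = random.Random(0)
spectrogram = [[1.0] * 6 for _ in range(4)]  # 4 frequency bins, 6 time frames
masked = mask_spectrogram(spectrogram, max_freq_width=2, max_time_width=3, rng=rng)
```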

NLP and Data augmentation in classification tasks
J. Wei and K. Zou, "EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks"
Problem: text classification performance depends on the quality and quantity of the data
Solution: data augmentation by applying multiple transformations to the text
Results: the model's performance improves if the right amount of data augmentation is chosen
Figure: Performance on benchmark text classification tasks with and without EDA, for various dataset sizes used for training. [1]
[1] J. Wei and K. Zou, "EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks", in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, 2019, p. 6384.

Demo: NLP text classification and Data Augmentation
Dataset of movie reviews (positive, negative)
- Create a model with scikit-learn
- Process the data
- Split the data into test and training set
- Data Augmentation with nlpaug
- Apply classifier: RandomForestClassifier
- Evaluate the model
https://github.com/mimmimkr/nlp_dataaug
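
The demo steps can be sketched with scikit-learn as follows. This is not the code from the linked repository: the toy reviews, the TF-IDF features, the pipeline layout and all parameter values are assumptions chosen to keep the example small, and the augmentation step is only marked by a comment.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Toy stand-in for the movie-review dataset (1 = positive, 0 = negative).
reviews = [
    "a hilarious and happy film",
    "wonderful story and great stars",
    "i left the cinema feeling happy",
    "a great and funny picture",
    "silly rubbish with no point",
    "a boring and pointless film",
    "terrible story and bad acting",
    "i left the cinema feeling bored",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Split the data into training and test set, keeping both classes in each part.
train_texts, test_texts, train_labels, test_labels = train_test_split(
    reviews, labels, test_size=0.25, random_state=0, stratify=labels
)

# Data augmentation would be applied here, to train_texts only
# (the demo uses nlpaug for this step).

# Process the data (TF-IDF features, an assumption) and apply the classifier.
model = make_pipeline(
    TfidfVectorizer(),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(train_texts, train_labels)

# Evaluate the model on the held-out test set.
accuracy = accuracy_score(test_labels, model.predict(test_texts))
```

With so few samples the accuracy value itself says little. The sketch only shows where each step of the demo sits in the pipeline.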

Demo: Data augmentation in nlpaug
import nlpaug
import nlpaug.augmenter.word as naw

def write_vars_to_file(augtype):
    # textdata and path come from the parts of the demo elided on the slide
    for i in range(len(textdata)):
        ...
        with open(path, "w") as file:
            file.write(doaugment(file, augtype))

def doaugment(file, augtype):
    ...
    if augtype == 'spell':
        # SpellingAug replaces words with common misspellings
        aug = naw.SpellingAug()
    for r in range(len(sentences)):
        # Depending on the nlpaug version, augment() may return a list
        # instead of a string; join() below expects strings
        augsentence = aug.augment(sentences[r])
        augsentences.append(augsentence)
    ...
    return ' '.join(augsentences)
Demo: performing Data Augmentation

Demo: Result of Augmentation with synonym replacement
Original review:
a sci fi/comedy starring jack nicholson , pierce brosnan , annette benning , glenn close , martin short and other stars.
a warner bros picture
the martians have landed in this hillarous tim burton movie.
before entering the cinema , i was initially a little bit nervous about what this film would be like .
many people were saying that this film was silly rubbish , and there was no point to it all .
how wrong they were .
i left this film feeling much happier than i was before i entered the cinema .
the story is about martians attacking earth .
using ray guns ( hooray ! )
they generally cause havoc around the u . s and other countries.
Augmented review:
a sci fi / funniness star knave nicholson, president pierce brosnan, annette benning, john herschel glenn jr. close, dino paul crocetti short and other stars.
a charles dudley warner bros picture
the martians hold landed in this hillarous tim burton motion picture. before entering the movie theater, i was ab initio a little bit nervous about what this plastic film would comprise similar.
many people were saying that this film be silly rubbish, and at that place was no point to it all. how wrong they were.
i left this motion picture show feeling much happier than i was before single entered the cinema.
the chronicle is about martians attacking worldly concern.
using shaft of light gun (hooray! )
they generally cause havoc around the u. due south and other state.

Demo: Results and Discussion
Figure: accuracy of the predictions of the text classifier without augmentation
- Performed text classification with an accuracy of 85%
- Augmented over 2000 text files
- Improved the model with augmentation

Demo: BONUS - image augmentation with imgaug
Augmentation by: cropping, scaling, artistic filters, weather, blur, rotation, flip...
120 images ∙ 180 seconds

Data augmentation: pros and cons
+                                         -
easy implementation, low effort           time consuming (sometimes)
tailored approach, no model changes       trial and error
performance boost: easy to measure        relational gaps between data and augmented data
can help with overfitting                 can lead to overfitting if not done right

Data augmentation: key takeaway
small data set → Data Augmentation → enlarged data set → model

Data augmentation: References
U. Malik, "Text Classification with Python and Scikit-Learn," Stack Abuse. [Online]. Available: https://stackabuse.com/text-classification-with-python-and-scikit-learn/. [Accessed: 03-Dec-2020].
T. Tran, T. Pham, G. Carneiro, L. Palmer and I. Reid, "A Bayesian Data Augmentation Approach for Learning Deep Models," in 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, California, 2017.
J. Wei and K. Zou, "EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks," in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, 2019, pp. 6382-6388.
C. Etienne, G. Fidanza, A. Petrovskii, L. Devillers, and B. Schmauch, "CNN+LSTM Architecture for Speech Emotion Recognition with Data Augmentation," arXiv.org, 11-Sep-2018. [Online]. Available: https://arxiv.org/abs/1802.05630. [Accessed: 15-Dec-2020].
T. Potapczyk, P. Przybysz, M. Chochowski, and A. Szumaczuk, "Samsung's System for the IWSLT 2019 End-to-End Speech Translation Task," Zenodo, 02-Nov-2019. [Online]. Available: https://zenodo.org/record/3525498. [Accessed: 15-Dec-2020].
E. Ma, "Data Augmentation in NLP," Medium, 04-Jun-2019. [Online]. Available: https://towardsdatascience.com/data-augmentation-in-nlp-2801a34dfc28. [Accessed: 15-Dec-2020].
V. Iosifidis and E. Ntoutsi, "Dealing with Bias via Data Augmentation in Supervised Learning Scenarios," Semanticscholar.org, 2020. [Online]. Available: http://ceur-ws.org/Vol-2103/paper_5.pdf. [Accessed: 04-Dec-2020].
Thank you all
