www.tu-chemnitz.de
Chemnitz ∙ 16. December 2020 ∙ Md Tajul Islam ∙ Miriam Kreher
SEMINAR WEB ENGINEERING (WS 2020/2021)
DATA AUGMENTATION
Data Augmentation
Md Tajul Islam
Miriam Kreher
Mentor: Mahda Noura
Contents
● What is Data Augmentation?
● Why do we need it?
● How does data augmentation work?
● Data Augmentation on Pictures
● Where to use Data Augmentation?
● Data Augmentation in NLP
● Demonstration of an NLP problem
● Pros and cons
● Key takeaways
What is Data Augmentation?
- Expand the size of a training set by creating modified versions of the existing data.
Purpose of Data Augmentation
- Enlarge the dataset
- Prevent overfitting
- Improve model performance
What to Augment!
- Audio
- Texts
- Images
- Any other types
Data Augmentation techniques: For Images
Geometric transformations
Color space transformations
Mixing images
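The three families of image techniques can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: an image is a small grayscale grid of integers in 0..255, and the function names (`flip_horizontal`, `shift_brightness`, `mix_images`) are made up for this sketch rather than taken from any library.

```python
import random

def flip_horizontal(img):
    """Geometric transformation: mirror every row left to right."""
    return [row[::-1] for row in img]

def shift_brightness(img, delta):
    """Color space transformation: add delta to each pixel, clipped to 0..255."""
    return [[max(0, min(255, p + delta)) for p in row] for row in img]

def mix_images(a, b, alpha=0.5):
    """Mixing images: pixel-wise weighted average of two same-sized images."""
    return [[round(alpha * pa + (1 - alpha) * pb) for pa, pb in zip(ra, rb)]
            for ra, rb in zip(a, b)]

def augment(img, other, rng):
    """Hypothetical helper: return one modified copy per technique."""
    return [
        flip_horizontal(img),
        shift_brightness(img, rng.randint(-40, 40)),
        mix_images(img, other, alpha=0.5),
    ]

# Toy 2x3 grayscale images (assumed data, for illustration only)
img = [[0, 50, 100], [150, 200, 250]]
other = [[255, 255, 255], [0, 0, 0]]
augmented = augment(img, other, random.Random(0))
```

Each call leaves the input untouched and returns a new image, so the originals can stay in the training set next to their modified copies.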
Data Augmentation techniques: For Text
Original: In my presentation focus topic is Data Augmentation.
- Synonym Replacement: In my presentation theme topic is Data Augmentation
- Random Insertion: In my presentation focus topic is Data Augmentation techniques
- Random Swap: In my presentation topic focus is Data Augmentation
- Random Deletion: In my presentation focus topic Data Augmentation
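The four text operations shown on this slide can be sketched as follows. This is a minimal illustration, not the implementation from the EDA paper or from nlpaug: the synonym table `SYNONYMS` is a hypothetical toy dictionary (a real system would consult a thesaurus such as WordNet), and random insertion is assumed here to insert a synonym of a word already in the sentence.

```python
import random

# Hypothetical toy synonym table, for illustration only.
SYNONYMS = {"focus": ["theme", "main"], "topic": ["subject"]}

def synonym_replacement(words, rng):
    """Replace one randomly chosen word that has a known synonym."""
    candidates = [i for i, w in enumerate(words) if w in SYNONYMS]
    out = words[:]
    if candidates:
        i = rng.choice(candidates)
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out

def random_insertion(words, rng):
    """Insert a synonym of a random word at a random position."""
    candidates = [w for w in words if w in SYNONYMS]
    out = words[:]
    if candidates:
        new_word = rng.choice(SYNONYMS[rng.choice(candidates)])
        out.insert(rng.randrange(len(out) + 1), new_word)
    return out

def random_swap(words, rng):
    """Swap the words at two randomly chosen, distinct positions."""
    out = words[:]
    if len(out) >= 2:
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

def random_deletion(words, rng, p=0.1):
    """Drop each word with probability p, but never return an empty sentence."""
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]

rng = random.Random(42)
sentence = "In my presentation focus topic is Data Augmentation".split()
for op in (synonym_replacement, random_insertion, random_swap, random_deletion):
    print(op.__name__, "->", " ".join(op(sentence, rng)))
```

Every function copies its input, so one original sentence can be fed to several operations to produce several augmented variants.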
Data Augmentation techniques: For Audio
- Noise injection
- Shifting
- Changing the speed of the tape
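A rough sketch of the three audio techniques, assuming a signal is simply a non-empty list of float samples. The function names, the noise level, and the 440 Hz test tone are assumptions made for this illustration; the speed change uses naive linear interpolation, which a production system would likely replace with a proper resampler.

```python
import math
import random

def inject_noise(signal, rng, noise_level=0.01):
    """Noise injection: add small Gaussian noise to every sample."""
    return [x + rng.gauss(0.0, noise_level) for x in signal]

def shift(signal, offset):
    """Shifting: rotate the signal in time by `offset` samples (wraps around)."""
    offset %= len(signal)
    return signal[-offset:] + signal[:-offset] if offset else signal[:]

def change_speed(signal, rate):
    """Speed change: resample by linear interpolation.

    rate > 1 plays faster (shorter output), rate < 1 plays slower.
    """
    n_out = max(1, int(len(signal) / rate))
    out = []
    for i in range(n_out):
        pos = i * rate
        lo = int(pos)
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        out.append((1 - frac) * signal[lo] + frac * signal[hi])
    return out

# Hypothetical one-second 440 Hz tone sampled at 8 kHz
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
rng = random.Random(0)
variants = [inject_noise(tone, rng), shift(tone, 800), change_speed(tone, 1.25)]
```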
Workflow of Data Augmentation
data set → training set → Data Augmentation → enlarged training set
data set → validation set
enlarged training set + validation set → performance test and evaluation
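The workflow can be sketched as below, under the assumption that the data set is split first and only the training part is augmented, so that evaluation runs on unmodified data. The helpers `split`, `enlarge`, and `toy_augment` are hypothetical names for this illustration; `toy_augment` merely reverses the word order and stands in for a real augmentation method.

```python
import random

def split(dataset, validation_fraction, rng):
    """Shuffle a copy of the data set and return (training set, validation set)."""
    items = dataset[:]
    rng.shuffle(items)
    n_val = int(len(items) * validation_fraction)
    return items[n_val:], items[:n_val]

def enlarge(training_set, augment, copies=1):
    """Keep every original example and add `copies` augmented variants with the same label."""
    enlarged = list(training_set)
    for text, label in training_set:
        for _ in range(copies):
            enlarged.append((augment(text), label))
    return enlarged

def toy_augment(text):
    """Hypothetical stand-in for a real augmentation: reverse the word order."""
    return " ".join(reversed(text.split()))

rng = random.Random(0)
dataset = [(f"review number {i}", i % 2) for i in range(10)]
train, val = split(dataset, validation_fraction=0.2, rng=rng)
train_big = enlarge(train, toy_augment, copies=2)
# `val` is left untouched; a model trained on `train_big` would be evaluated on it.
```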
When do I use data augmentation?
- Small data set
- Complex problem
- Transformation of data would be effective and easy
- You know how to find the correct method for your data
- You have a strategy for dealing with overfitting
Data Augmentation Applications
- Natural Language Processing
- Human-computer interaction, understanding human emotion
- Scientific research, medical fields
- Image classification
- Speech recognition, sound classification
- Text classification, text generation
Natural Language Processing: some tasks
- Translation tasks
- Speech emotion recognition
- Speech recognition
- Text classification (demo)
- Text-to-speech
- Syntax and semantic analysis
- Text generation, dialogue management
NLP and Data augmentation in speech emotion recognition
C. Etienne and B. Schmauch, "Speech Emotion Recognition with Data Augmentation and Layer-wise Learning Rate Adjustment"
Problem: class imbalance and small dataset
Solution: data augmentation by rescaling of spectrograms
Results: improvement in accuracy of predictions
                                           Baseline                   Best model
Augmentation during training                  -        -        +        +
Oversampling (×2) of happiness and anger      -        +        +        +
Frequency range (kHz)                         4        4        4        8
Weighted accuracy                            66.4     63.5     64.2     64.5
Unweighted accuracy                          57.7     59.8     60.9     61.7
10-fold cross-validation scores depending on the techniques applied (for each experiment we present the results corresponding to its best run).
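The table reports two accuracies because the classes are imbalanced. The slides do not define them; the sketch below assumes the convention common in speech emotion recognition, where weighted accuracy is the overall fraction of correct predictions and unweighted accuracy is the mean of the per-class recalls. The `oversample` helper is a hypothetical illustration of repeating examples of selected classes, and the labels are toy data.

```python
from collections import Counter

def weighted_accuracy(y_true, y_pred):
    """Overall fraction of correct predictions (frequent classes dominate)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def unweighted_accuracy(y_true, y_pred):
    """Mean of the per-class recalls (every class counts equally)."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

def oversample(examples, classes, factor=2):
    """Repeat examples whose label is in `classes` so each appears `factor` times."""
    out = []
    for x, label in examples:
        out.extend([(x, label)] * (factor if label in classes else 1))
    return out

# Toy labels: a model that always predicts the majority class
y_true = ["neutral"] * 8 + ["anger"] * 2
y_pred = ["neutral"] * 10
print(weighted_accuracy(y_true, y_pred), unweighted_accuracy(y_true, y_pred))
```

On this toy data the majority-class model looks good on weighted accuracy but poor on unweighted accuracy, which is one reason a change such as oversampling can lower the first number while raising the second.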
NLP and Data augmentation in translation tasks
T. Potapczyk, P. Przybysz, M. Chochowski, A. Szumaczuk: "Samsung's System for the IWSLT 2019 End-to-End Speech Translation Task"
Problem: translate TED lectures from English to German
Solution: data augmentation by applying audio effects
Results: improvement in BLEU score
Model                                        tst2010
ASR Transformer for SLT baseline              12.76
+ Model averaging                             12.99
+ Dense FF in Decoder                         13.20
+ Spectrogram masking                         13.74
+ Data augmentation (speed x1)                14.85
+ 8 head attention                            15.31
+ Data augmentation (speed + echo x1)         15.56
BLEU scores for models trained on LibriSpeech data. Each subsequent model includes all the previous techniques in the table.
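Spectrogram masking is only named in the table. As a rough idea of what such a step can look like, the sketch below blanks one random band of frequency rows and one random band of time columns. It assumes a spectrogram stored as a list of frequency rows; the function name, the band widths, and the fill value are assumptions for this illustration, not the settings of the cited system.

```python
import random

def mask_spectrogram(spec, rng, max_freq_width=2, max_time_width=3, fill=0.0):
    """Return a copy of `spec` with one frequency band and one time band set to `fill`.

    `spec` is a list of frequency rows, each a list of time-frame values.
    The maximum widths must not exceed the spectrogram dimensions.
    """
    n_freq, n_time = len(spec), len(spec[0])
    out = [row[:] for row in spec]

    # Frequency mask: blank a few consecutive rows.
    f_width = rng.randint(0, max_freq_width)
    f_start = rng.randint(0, n_freq - f_width)
    for f in range(f_start, f_start + f_width):
        out[f] = [fill] * n_time

    # Time mask: blank a few consecutive columns in every row.
    t_width = rng.randint(0, max_time_width)
    t_start = rng.randint(0, n_time - t_width)
    for row in out:
        for t in range(t_start, t_start + t_width):
            row[t] = fill
    return out

# Toy spectrogram: 8 frequency bins x 10 time frames, all ones
spec = [[1.0] * 10 for _ in range(8)]
masked = mask_spectrogram(spec, random.Random(3))
```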
NLP and Data augmentation in classification tasks
J. Wei and K. Zou, "EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks"
Problem: text classification performance depends on the quality and quantity of data
Solution: data augmentation by applying multiple transformations to the text
Results: performance gain of the model if the right amount of data augmentation is chosen
[Figure: Performance on benchmark text classification tasks with and without EDA, for various dataset sizes used for training. [1]]
[1] J. Wei and K. Zou, "EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks", in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, 2019, p. 6384.
Demo: NLP text classification and Data Augmentation
Dataset of movie reviews: positive / negative
- Create a model with scikit-learn
- Process the data
- Split the data into test and training set
- Data Augmentation with nlpaug
- Apply classifier: RandomForestClassifier
- Evaluate the model
https://github.com/mimmimkr/nlp_dataaug
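The steps above can be sketched with scikit-learn roughly as follows. This is not the code from the linked repository: the twelve review snippets are made-up toy data, TF-IDF features are an assumed choice of text representation, and `toy_augment` (which drops the last word) is a hypothetical placeholder for an nlpaug augmenter.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the movie reviews (1 = positive, 0 = negative).
texts = [
    "a wonderful funny film", "great acting and a clever story",
    "i loved every minute", "brilliant and moving picture",
    "a charming happy ending", "clever funny and great",
    "a dull boring film", "terrible acting and a silly story",
    "i hated every minute", "awful and tedious picture",
    "a weak sad ending", "silly boring and terrible",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

def toy_augment(text):
    """Placeholder augmentation: drop the last word of the text."""
    words = text.split()
    return " ".join(words[:-1]) if len(words) > 1 else text

# Split first, then augment only the training part.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0, stratify=labels)
X_train_aug = X_train + [toy_augment(t) for t in X_train]
y_train_aug = y_train + y_train

# Process the data into features and apply the classifier.
vectorizer = TfidfVectorizer()
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(vectorizer.fit_transform(X_train_aug), y_train_aug)

# Evaluate the model on the untouched test set.
accuracy = accuracy_score(y_test, clf.predict(vectorizer.transform(X_test)))
print(f"accuracy: {accuracy:.2f}")
```

With so little data the printed accuracy says nothing about the real demo; the point is only the order of the steps.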
Demo: Data augmentation in nlpaug
import nlpaug
import nlpaug.augmenter.word as naw

def write_vars_to_file(augtype):
    for i in range(len(textdata)):
        ...  # elided on the slide
        with open(path, "w") as file:
            file.write(doaugment(file, augtype))

def doaugment(file, augtype):
    ...  # elided on the slide
    if augtype == 'spell':
        aug = naw.SpellingAug()
    for r in range(len(sentences)):
        # Note: depending on the nlpaug version, augment() may return a list instead of a string.
        augsentence = aug.augment(sentences[r])
        augsentences.append(augsentence)
    ...  # elided on the slide
    return ' '.join(augsentences)
Demo: performing Data Augmentation
Demo: Result of Augmentation with synonym replacement
Original review:
a sci fi/comedy starring jack nicholson , pierce brosnan , annette benning , glenn close , martin short and other stars.
a warner bros picture
the martians have landed in this hillarous tim burton movie.
before entering the cinema , i was initially a little bit nervous about what this film would be like .
many people were saying that this film was silly rubbish , and there was no point to it all .
how wrong they were .
i left this film feeling much happier than i was before i entered the cinema .
the story is about martians attacking earth .
using ray guns ( hooray ! )
they generally cause havoc around the u . s and other countries.

Augmented review:
a sci fi / funniness star knave nicholson, president pierce brosnan, annette benning, john herschel glenn jr. close, dino paul crocetti short and other stars.
a charles dudley warner bros picture
the martians hold landed in this hillarous tim burton motion picture. before entering the movie theater, i was ab initio a little bit nervous about what this plastic film would comprise similar.
many people were saying that this film be silly rubbish, and at that place was no point to it all. how wrong they were.
i left this motion picture show feeling much happier than i was before single entered the cinema.
the chronicle is about martians attacking worldly concern.
using shaft of light gun (hooray! )
they generally cause havoc around the u. due south and other state.
Demo: Results and Discussion
[Figure: accuracy of the predictions of the text classifier without augmentation]
- Performed text classification with an 85% accuracy
- Augmented over 2000 text files
- Improved the model with augmentation
Demo: BONUS - image augmentation with imgaug
Augmentation by: cropping, scaling, artistic filters, weather, blur, rotation, flip...
Demo: BONUS - image augmentation with imgaug
120 images, 180 seconds
Data augmentation: pros and cons
+                                      | -
easy implementation, low effort        | time consuming (sometimes)
tailored approach, no model changes    | trial and error
performance boost: easy to measure     | relational gaps between data and augmented data
can help with overfitting              | can lead to overfitting if not done right
Data augmentation: key takeaway
small data set → Data Augmentation → enlarged data set → model
Data augmentation: References
U. Malik, "Text Classification with Python and Scikit-Learn," Stack Abuse. [Online]. Available: https://stackabuse.com/text-classification-with-python-and-scikit-learn/. [Accessed: 03-Dec-2020].
T. Tran, T. Pham, G. Carneiro, L. Palmer and I. Reid, "A Bayesian Data Augmentation Approach for Learning Deep Models," in 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, California, 2017.
J. Wei and K. Zou, "EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks," in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, 2019, pp. 6382-6388.
C. Etienne, G. Fidanza, A. Petrovskii, L. Devillers, and B. Schmauch, "CNN+LSTM Architecture for Speech Emotion Recognition with Data Augmentation," arXiv.org, 11-Sep-2018. [Online]. Available: https://arxiv.org/abs/1802.05630. [Accessed: 15-Dec-2020].
T. Potapczyk, P. Przybysz, M. Chochowski, and A. Szumaczuk, "Samsung's System for the IWSLT 2019 End-to-End Speech Translation Task," Zenodo, 02-Nov-2019. [Online]. Available: https://zenodo.org/record/3525498. [Accessed: 15-Dec-2020].
E. Ma, "Data Augmentation in NLP," Medium, 04-Jun-2019. [Online]. Available: https://towardsdatascience.com/data-augmentation-in-nlp-2801a34dfc28. [Accessed: 15-Dec-2020].
V. Iosifidis and E. Ntoutsi, "Dealing with Bias via Data Augmentation in Supervised Learning Scenarios," Semanticscholar.org, 2020. [Online]. Available: http://ceur-ws.org/Vol-2103/paper_5.pdf. [Accessed: 04-Dec-2020].
Thank you all

More Related Content

What's hot

Deep learning - A Visual Introduction
Deep learning - A Visual IntroductionDeep learning - A Visual Introduction
Deep learning - A Visual IntroductionLukas Masuch
 
Deep neural networks
Deep neural networksDeep neural networks
Deep neural networksSi Haem
 
Autoencoders
AutoencodersAutoencoders
AutoencodersCloudxLab
 
Neural networks and deep learning
Neural networks and deep learningNeural networks and deep learning
Neural networks and deep learningJörgen Sandig
 
Recurrent neural networks rnn
Recurrent neural networks   rnnRecurrent neural networks   rnn
Recurrent neural networks rnnKuppusamy P
 
Introduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkIntroduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkKnoldus Inc.
 
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...Simplilearn
 
TensorFlow and Keras: An Overview
TensorFlow and Keras: An OverviewTensorFlow and Keras: An Overview
TensorFlow and Keras: An OverviewPoo Kuan Hoong
 
An introduction to Deep Learning
An introduction to Deep LearningAn introduction to Deep Learning
An introduction to Deep LearningJulien SIMON
 
Feed forward ,back propagation,gradient descent
Feed forward ,back propagation,gradient descentFeed forward ,back propagation,gradient descent
Feed forward ,back propagation,gradient descentMuhammad Rasel
 
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...Simplilearn
 
Introduction to Deep learning
Introduction to Deep learningIntroduction to Deep learning
Introduction to Deep learningleopauly
 
Multi-Layer Perceptrons
Multi-Layer PerceptronsMulti-Layer Perceptrons
Multi-Layer PerceptronsESCOM
 
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...Simplilearn
 
Introduction to Deep Learning, Keras, and TensorFlow
Introduction to Deep Learning, Keras, and TensorFlowIntroduction to Deep Learning, Keras, and TensorFlow
Introduction to Deep Learning, Keras, and TensorFlowSri Ambati
 
The world of loss function
The world of loss functionThe world of loss function
The world of loss function홍배 김
 
Feature selection
Feature selectionFeature selection
Feature selectiondkpawar
 

What's hot (20)

Deep learning - A Visual Introduction
Deep learning - A Visual IntroductionDeep learning - A Visual Introduction
Deep learning - A Visual Introduction
 
Deep neural networks
Deep neural networksDeep neural networks
Deep neural networks
 
Autoencoders
AutoencodersAutoencoders
Autoencoders
 
Neural networks and deep learning
Neural networks and deep learningNeural networks and deep learning
Neural networks and deep learning
 
Recurrent neural networks rnn
Recurrent neural networks   rnnRecurrent neural networks   rnn
Recurrent neural networks rnn
 
Introduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkIntroduction to Recurrent Neural Network
Introduction to Recurrent Neural Network
 
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
 
TensorFlow and Keras: An Overview
TensorFlow and Keras: An OverviewTensorFlow and Keras: An Overview
TensorFlow and Keras: An Overview
 
An introduction to Deep Learning
An introduction to Deep LearningAn introduction to Deep Learning
An introduction to Deep Learning
 
Feed forward ,back propagation,gradient descent
Feed forward ,back propagation,gradient descentFeed forward ,back propagation,gradient descent
Feed forward ,back propagation,gradient descent
 
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
 
Attention Is All You Need
Attention Is All You NeedAttention Is All You Need
Attention Is All You Need
 
Introduction to Deep learning
Introduction to Deep learningIntroduction to Deep learning
Introduction to Deep learning
 
Multi-Layer Perceptrons
Multi-Layer PerceptronsMulti-Layer Perceptrons
Multi-Layer Perceptrons
 
Cnn method
Cnn methodCnn method
Cnn method
 
AlexNet
AlexNetAlexNet
AlexNet
 
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
 
Introduction to Deep Learning, Keras, and TensorFlow
Introduction to Deep Learning, Keras, and TensorFlowIntroduction to Deep Learning, Keras, and TensorFlow
Introduction to Deep Learning, Keras, and TensorFlow
 
The world of loss function
The world of loss functionThe world of loss function
The world of loss function
 
Feature selection
Feature selectionFeature selection
Feature selection
 

Similar to Data Augmentation

Nexxworks bootcamp ML6 (27/09/2017)
Nexxworks bootcamp ML6 (27/09/2017)Nexxworks bootcamp ML6 (27/09/2017)
Nexxworks bootcamp ML6 (27/09/2017)Karel Dumon
 
Anchormen Jeroen Vlek
Anchormen Jeroen VlekAnchormen Jeroen Vlek
Anchormen Jeroen VlekTalentEvent
 
Visualization Techniques for Massive Datasets
Visualization Techniques for Massive DatasetsVisualization Techniques for Massive Datasets
Visualization Techniques for Massive DatasetsMatthias Trapp
 
Business Models - Introduction to Data Science
Business Models -  Introduction to Data ScienceBusiness Models -  Introduction to Data Science
Business Models - Introduction to Data ScienceFrank Kienle
 
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...tuxette
 
Siemens_connects_story
Siemens_connects_storySiemens_connects_story
Siemens_connects_storyNatasha Azar
 
HPC in the cloud comes of age - Red Oak HPC Seminar
HPC in the cloud comes of age - Red Oak HPC SeminarHPC in the cloud comes of age - Red Oak HPC Seminar
HPC in the cloud comes of age - Red Oak HPC SeminarMartin Hamilton
 
Industry Disruptors: AI, Machine Learning and Drones.
Industry Disruptors: AI, Machine Learning and Drones. Industry Disruptors: AI, Machine Learning and Drones.
Industry Disruptors: AI, Machine Learning and Drones. AnandSRao1962
 
DTIM 2016 - Post Event Report
DTIM 2016 - Post Event ReportDTIM 2016 - Post Event Report
DTIM 2016 - Post Event ReportRamona Kohrs
 
Wind Data USA 2017 brochure
Wind Data USA 2017 brochureWind Data USA 2017 brochure
Wind Data USA 2017 brochureHeather Smith
 
Wind Data US 2017 Brochure FINAL
Wind Data US 2017 Brochure FINALWind Data US 2017 Brochure FINAL
Wind Data US 2017 Brochure FINALDominic Coyne
 
Chris Carsten David Recommendations
Chris Carsten David RecommendationsChris Carsten David Recommendations
Chris Carsten David RecommendationsChristian David
 
DIADEM: domain-centric intelligent automated data extraction methodology Pres...
DIADEM: domain-centric intelligent automated data extraction methodology Pres...DIADEM: domain-centric intelligent automated data extraction methodology Pres...
DIADEM: domain-centric intelligent automated data extraction methodology Pres...DBOnto
 
Diadem DBOnto Kick Off meeting
Diadem DBOnto Kick Off meetingDiadem DBOnto Kick Off meeting
Diadem DBOnto Kick Off meetingDBOnto
 
AVTechnology August Issue
AVTechnology August IssueAVTechnology August Issue
AVTechnology August Issueravenind
 
Final Defence-Presentation15-Sep-2016
Final Defence-Presentation15-Sep-2016Final Defence-Presentation15-Sep-2016
Final Defence-Presentation15-Sep-2016Habib Ur Rehman
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMapR Technologies
 

Similar to Data Augmentation (20)

Nexxworks bootcamp ML6 (27/09/2017)
Nexxworks bootcamp ML6 (27/09/2017)Nexxworks bootcamp ML6 (27/09/2017)
Nexxworks bootcamp ML6 (27/09/2017)
 
Anchormen Jeroen Vlek
Anchormen Jeroen VlekAnchormen Jeroen Vlek
Anchormen Jeroen Vlek
 
Visualization Techniques for Massive Datasets
Visualization Techniques for Massive DatasetsVisualization Techniques for Massive Datasets
Visualization Techniques for Massive Datasets
 
Business Models - Introduction to Data Science
Business Models -  Introduction to Data ScienceBusiness Models -  Introduction to Data Science
Business Models - Introduction to Data Science
 
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
 
Siemens_connects_story
Siemens_connects_storySiemens_connects_story
Siemens_connects_story
 
M2M Journal 2017
M2M Journal 2017 M2M Journal 2017
M2M Journal 2017
 
HPC in the cloud comes of age - Red Oak HPC Seminar
HPC in the cloud comes of age - Red Oak HPC SeminarHPC in the cloud comes of age - Red Oak HPC Seminar
HPC in the cloud comes of age - Red Oak HPC Seminar
 
07 chris davis
07 chris davis07 chris davis
07 chris davis
 
Bridging the future gap
Bridging the future gap Bridging the future gap
Bridging the future gap
 
Industry Disruptors: AI, Machine Learning and Drones.
Industry Disruptors: AI, Machine Learning and Drones. Industry Disruptors: AI, Machine Learning and Drones.
Industry Disruptors: AI, Machine Learning and Drones.
 
DTIM 2016 - Post Event Report
DTIM 2016 - Post Event ReportDTIM 2016 - Post Event Report
DTIM 2016 - Post Event Report
 
Wind Data USA 2017 brochure
Wind Data USA 2017 brochureWind Data USA 2017 brochure
Wind Data USA 2017 brochure
 
Wind Data US 2017 Brochure FINAL
Wind Data US 2017 Brochure FINALWind Data US 2017 Brochure FINAL
Wind Data US 2017 Brochure FINAL
 
Chris Carsten David Recommendations
Chris Carsten David RecommendationsChris Carsten David Recommendations
Chris Carsten David Recommendations
 
DIADEM: domain-centric intelligent automated data extraction methodology Pres...
DIADEM: domain-centric intelligent automated data extraction methodology Pres...DIADEM: domain-centric intelligent automated data extraction methodology Pres...
DIADEM: domain-centric intelligent automated data extraction methodology Pres...
 
Diadem DBOnto Kick Off meeting
Diadem DBOnto Kick Off meetingDiadem DBOnto Kick Off meeting
Diadem DBOnto Kick Off meeting
 
AVTechnology August Issue
AVTechnology August IssueAVTechnology August Issue
AVTechnology August Issue
 
Final Defence-Presentation15-Sep-2016
Final Defence-Presentation15-Sep-2016Final Defence-Presentation15-Sep-2016
Final Defence-Presentation15-Sep-2016
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model Management
 

Recently uploaded

DATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage exampleDATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage examplePragyanshuParadkar1
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
pipeline in computer architecture design
pipeline in computer architecture  designpipeline in computer architecture  design
pipeline in computer architecture designssuser87fa0c1
 
EduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AIEduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AIkoyaldeepu123
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxDeepakSakkari2
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfme23b1001
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineeringmalavadedarshan25
 

Recently uploaded (20)

DATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage exampleDATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage example
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx

Data Augmentation

  • 1. www.tu-chemnitz.de Chemnitz ∙ 16. December 2020 ∙ Md Tajul Islam ∙ Miriam Kreher SEMINAR WEB ENGINEERING (WS 2020/2021) DATA AUGMENTATION Data Augmentation Md Tajul Islam Miriam Kreher Mentor: Mahda Noura
  • 2. Contents: ● What is Data Augmentation? ● Why do we need it? ● How does data augmentation work? ● Data Augmentation on Pictures ● Where to use Data augmentation? ● Data Augmentation in NLP ● Demonstration of a NLP problem ● Pros and cons ● Key takeaways
  • 3. What is Data Augmentation? - Expand the size of a training set by creating modified data from the existing one.
  • 4. Purpose of Data Augmentation: Enlarge dataset; Prevent overfitting; Improve model performance
  • 5. What to Augment! Audio; Texts; Images; Any other types
  • 6. Data Augmentation techniques: For Images: Geometric transformations; Color space transformations; Mixing images
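The three families of image techniques can be illustrated with a minimal sketch in plain Python. It is not taken from the slides: the tiny grayscale image, the function names and the choice of a brightness shift as a stand-in for a color space transformation are all assumptions made for illustration.

```python
from typing import List

Image = List[List[int]]  # assumed representation: grayscale rows of 0..255 pixel values


def flip_horizontal(img: Image) -> Image:
    """Geometric transformation: mirror every row."""
    return [row[::-1] for row in img]


def rotate_90_clockwise(img: Image) -> Image:
    """Geometric transformation: rotate the image by 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def adjust_brightness(img: Image, delta: int) -> Image:
    """Stand-in for a color space transformation: shift pixels, clamped to 0..255."""
    return [[max(0, min(255, p + delta)) for p in row] for row in img]


def mix(img_a: Image, img_b: Image, alpha: float = 0.5) -> Image:
    """Mixing images: pixel-wise weighted average of two equally sized images."""
    return [[round(alpha * a + (1 - alpha) * b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(img_a, img_b)]


image = [[0, 50], [100, 150]]
augmented = [
    flip_horizontal(image),
    rotate_90_clockwise(image),
    adjust_brightness(image, 120),
    mix(image, flip_horizontal(image)),
]
```

Each function returns a new image and leaves the input untouched, so one original can yield several augmented variants.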
  • 7. Data Augmentation techniques: For Text
      Original: In my presentation focus topic is Data Augmentation.
      Synonym Replacement: In my presentation theme topic is Data Augmentation
      Random Insertion: In my presentation focus topic is Data Augmentation techniques
      Random Swap: In my presentation topic focus is Data Augmentation
      Random Deletion: In my presentation focus topic Data Augmentation
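The four text operations can be sketched in a few lines of Python. This is an illustrative sketch, not the presenters' code: the toy `SYNONYMS` dictionary is hypothetical (a real setup would draw on a thesaurus such as WordNet), and random insertion here inserts a synonym of a word already in the sentence, which is how the EDA paper defines it.

```python
import random
from typing import Dict, List

# Hypothetical toy thesaurus, for illustration only.
SYNONYMS: Dict[str, List[str]] = {"focus": ["theme"], "topic": ["subject"]}


def synonym_replacement(words: List[str], rng: random.Random) -> List[str]:
    """Replace one randomly chosen word that has a known synonym."""
    candidates = [i for i, w in enumerate(words) if w in SYNONYMS]
    out = list(words)
    if candidates:
        i = rng.choice(candidates)
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out


def random_insertion(words: List[str], rng: random.Random) -> List[str]:
    """Insert a synonym of a random word at a random position."""
    candidates = [w for w in words if w in SYNONYMS]
    out = list(words)
    if candidates:
        new_word = rng.choice(SYNONYMS[rng.choice(candidates)])
        out.insert(rng.randrange(len(out) + 1), new_word)
    return out


def random_swap(words: List[str], rng: random.Random) -> List[str]:
    """Swap the positions of two randomly chosen words."""
    out = list(words)
    if len(out) >= 2:
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(words: List[str], rng: random.Random, p: float = 0.1) -> List[str]:
    """Drop each word with probability p, keeping at least one word."""
    kept = [w for w in words if rng.random() > p]
    return kept if kept else [rng.choice(words)]


rng = random.Random(0)  # fixed seed so runs are repeatable
sentence = "In my presentation focus topic is Data Augmentation".split()
for operation in (synonym_replacement, random_insertion, random_swap, random_deletion):
    print(operation.__name__, "->", " ".join(operation(sentence, rng)))
```

All four functions copy the word list before changing it, so the same sentence can be passed through each operation in turn.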
  • 8. Data Augmentation techniques: For Audio - Noise Injection - Shifting - Changing the speed of the Tape
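A rough sketch of the three audio techniques, treating a signal as a plain list of samples. The function names, the noise level and the use of linear interpolation for the speed change are assumptions chosen for illustration; like a tape played faster, this simple resampling changes pitch along with duration.

```python
import math
import random
from typing import List


def inject_noise(signal: List[float], rng: random.Random,
                 noise_level: float = 0.005) -> List[float]:
    """Noise injection: add Gaussian noise to every sample."""
    return [s + rng.gauss(0.0, noise_level) for s in signal]


def shift(signal: List[float], offset: int) -> List[float]:
    """Shifting: rotate the signal by `offset` samples (wraps around)."""
    if not signal:
        return []
    k = offset % len(signal)
    return signal[-k:] + signal[:-k] if k else list(signal)


def change_speed(signal: List[float], rate: float) -> List[float]:
    """Speed change by linear interpolation; rate > 1 plays faster (shorter output)."""
    n_out = max(1, int(len(signal) / rate))
    last = len(signal) - 1
    out = []
    for i in range(n_out):
        pos = i * rate
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append((1 - frac) * signal[lo] + frac * signal[hi])
    return out


rng = random.Random(0)
# Assumed test signal: 100 samples of a sine wave.
tone = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
noisy = inject_noise(tone, rng)
shifted = shift(tone, 10)
faster = change_speed(tone, 2.0)
```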
  • 9. Workflow of Data Augmentation (diagram with the elements: data set, training set, validation set, Data Augmentation, enlarged training set, performance test and evaluation)
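One plausible reading of this workflow is that the data set is split first and only the training set is augmented, so that evaluation runs on untouched data. The sketch below assumes that ordering; the toy reviews, the 20% validation share and the word-swap augmenter are hypothetical placeholders.

```python
import random
from typing import List, Tuple

Example = Tuple[str, str]  # (text, label)


def split(dataset: List[Example], validation_fraction: float,
          rng: random.Random) -> Tuple[List[Example], List[Example]]:
    """Shuffle and split into (training set, validation set)."""
    items = list(dataset)
    rng.shuffle(items)
    n_val = int(len(items) * validation_fraction)
    return items[n_val:], items[:n_val]


def swap_two_words(text: str, rng: random.Random) -> str:
    """Placeholder augmenter: swap two randomly chosen words."""
    words = text.split()
    if len(words) >= 2:
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return " ".join(words)


def enlarge(training_set: List[Example], rng: random.Random,
            copies: int = 1) -> List[Example]:
    """Keep the originals and add `copies` augmented variants of each; labels are kept."""
    enlarged = list(training_set)
    for text, label in training_set:
        for _ in range(copies):
            enlarged.append((swap_two_words(text, rng), label))
    return enlarged


rng = random.Random(42)
dataset = [
    ("a great and funny film", "positive"),
    ("the plot was silly rubbish", "negative"),
    ("i left the cinema much happier", "positive"),
    ("there was no point to it", "negative"),
    ("a hilarious picture with many stars", "positive"),
]
training_set, validation_set = split(dataset, 0.2, rng)
enlarged_training_set = enlarge(training_set, rng)
# The model would be trained on enlarged_training_set and evaluated on validation_set.
```

Augmenting after the split keeps augmented variants of a validation example out of the training data.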
  • 10. When do I use data augmentation? - small data set - complex problem - transformation of data would be effective and easy - You know how to find the correct method for your data - You have a strategy for dealing with overfitting
  • 11. Data Augmentation Applications: Natural Language Processing; Human-computer interaction, understanding human emotion; Scientific research, medical fields; Image classification; Speech recognition, sound classification; Text classification, text generation
  • 12. Natural Language Processing: some tasks: Translation tasks; Speech emotion recognition; Speech recognition; Text classification; Text-to-speech; Syntax and semantic analysis; Text generation, dialogue management; demo
  • 13. NLP and Data augmentation in speech emotion recognition
      C. Etienne and B. Schmauch, "Speech Emotion Recognition with Data Augmentation and Layer-wise Learning Rate Adjustment"
      Problem: class imbalance and small dataset
      Solution: Data augmentation by rescaling of spectrograms
      Results: Improvement in accuracy of predictions
                                                  Baseline                    Best model
      Augmentation during training                -         -         +       +
      Oversampling (×2) of happiness and anger    -         +         +       +
      Frequency range (kHz)                       4         4         4       8
      Weighted accuracy                           66.4      63.5      64.2    64.5
      Unweighted accuracy                         57.7      59.8      60.9    61.7
      10-fold cross-validation scores depending on the techniques applied (for each experiment we present the results corresponding to its best run).
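The "Oversampling (×2)" row can be read as repeating every training example of the under-represented classes twice. The sketch below shows that reading only; whether the cited authors implemented it this way is an assumption, and the file names and class counts are made up for illustration.

```python
from collections import Counter
from typing import List, Set, Tuple

Example = Tuple[str, str]  # (utterance id, emotion label)


def oversample(examples: List[Example], classes: Set[str],
               factor: int = 2) -> List[Example]:
    """Repeat every example whose label is in `classes` `factor` times."""
    out: List[Example] = []
    for features, label in examples:
        repeats = factor if label in classes else 1
        out.extend([(features, label)] * repeats)
    return out


# Hypothetical, deliberately imbalanced training set.
training_set = (
    [(f"neutral_{i}.wav", "neutral") for i in range(6)]
    + [(f"happy_{i}.wav", "happiness") for i in range(2)]
    + [(f"angry_{i}.wav", "anger") for i in range(3)]
)
balanced = oversample(training_set, {"happiness", "anger"})
counts = Counter(label for _, label in balanced)
```

Oversampling should be applied to the training split only, so that repeated examples do not leak into the evaluation data.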
  • 14. NLP and Data augmentation in translation tasks
      T. Potapczyk, P. Przybysz, M. Chochowski, A. Szumaczuk: "Samsung's System for the IWSLT 2019 End-to-End Speech Translation Task"
      Problem: Translate English TED lectures into German
      Solution: Data augmentation by applying audio effects
      Results: Improvement in BLEU score
      Model                                       tst2010
      ASR Transformer for SLT baseline            12.76
      + Model averaging                           12.99
      + Dense FF in Decoder                       13.20
      + Spectrogram masking                       13.74
      + Data augmentation (speed x1)              14.85
      + 8 head attention                          15.31
      + Data augmentation (speed + echo x1)       15.56
      BLEU scores for models trained on LibriSpeech data. Each subsequent model includes all the previous techniques in the table.
  • 15. NLP and Data augmentation in classification tasks
      Problem: Text classification performance depends on the quality and quantity of data
      Solution: Data augmentation by application of multiple transformations on text
      Results: Performance gain of the model if the right amount of data augmentation is chosen
      Figure: Performance on benchmark text classification tasks with and without EDA, for various dataset sizes used for training. [1]
      [1] J. Wei and K. Zou, "EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks", in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, 2019, p. 6384.
  • 16. Demo: NLP text classification and Data Augmentation
      Dataset of movie reviews (positive, negative)
      - Create a model with scikit-learn
      - Process the data
      - Split the data into test and training set
      - Data Augmentation with nlpaug
      - Apply classifier: RandomForestClassifier
      - Evaluate the model
      https://github.com/mimmimkr/nlp_dataaug
  • 17. Demo: Data augmentation in nlpaug

      import nlpaug
      import nlpaug.augmenter.word as naw

      def write_vars_to_file(augtype):
          for i in range(len(textdata)):
              …
              with open(path, "w") as file:
                  file.write(doaugment(file, augtype))

      def doaugment(file, augtype):
          ...
          if augtype == 'spell':
              # word-level augmenter that introduces spelling mistakes
              aug = naw.SpellingAug()
          for x in range(len(sentences)):
              # note: depending on the nlpaug version, augment() may return
              # a list instead of a string
              augsentence = aug.augment(sentences[x])
              augsentences.append(augsentence)
          …
          return ' '.join(augsentences)
  • 18. Demo: performing Data Augmentation
  • 19. Demo: Result of Augmentation with synonym replacement
      Original review: a sci fi/comedy starring jack nicholson , pierce brosnan , annette benning , glenn close , martin short and other stars. a warner bros picture the martians have landed in this hillarous tim burton movie. before entering the cinema , i was initially a little bit nervous about what this film would be like . many people were saying that this film was silly rubbish , and there was no point to it all . how wrong they were . i left this film feeling much happier than i was before i entered the cinema . the story is about martians attacking earth . using ray guns ( hooray ! ) they generally cause havoc around the u . s and other countries.
      Augmented review: a sci fi / funniness star knave nicholson, president pierce brosnan, annette benning, john herschel glenn jr. close, dino paul crocetti short and other stars. a charles dudley warner bros picture the martians hold landed in this hillarous tim burton motion picture. before entering the movie theater, i was ab initio a little bit nervous about what this plastic film would comprise similar. many people were saying that this film be silly rubbish, and at that place was no point to it all. how wrong they were. i left this motion picture show feeling much happier than i was before single entered the cinema. the chronicle is about martians attacking worldly concern. using shaft of light gun (hooray! ) they generally cause havoc around the u. due south and other state.
  • 20. Demo: Results and Discussion - accuracy of the predictions of the text classifier without augmentation - performed text classification with an 85% accuracy - augmented over 2000 text files - improved the model with augmentation
  • 21. Demo: BONUS - image augmentation with imgaug. Augmentation by: cropping, scaling, artistic filters, weather, blur, rotation, flip...
  • 22. Demo: BONUS - image augmentation with imgaug: 120 images, 180 seconds
  • 23. Data augmentation: pros and cons
      +                                        -
      easy implementation, low effort          time consuming (sometimes)
      tailored approach, no model changes      trial and error
      performance boost: easy to measure       relational gaps between data and augmented data
      can help with overfitting                can lead to overfitting if not done right
  • 24. Data augmentation: key takeaway (diagram with the elements: small data set, Data Augmentation, enlarged data set, model)
  • 25. Data augmentation: References
      U. Malik, "Text Classification with Python and Scikit-Learn," Stack Abuse. [Online]. Available: https://stackabuse.com/text-classification-with-python-and-scikit-learn/. [Accessed: 03-Dec-2020].
      T. Tran, T. Pham, G. Carneiro, L. Palmer and I. Reid, "A Bayesian Data Augmentation Approach for Learning Deep Models", in 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, California, 2017.
      J. Wei and K. Zou, "EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks", in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, 2019, pp. 6382-6388.
      C. Etienne, G. Fidanza, A. Petrovskii, L. Devillers, and B. Schmauch, "CNN+LSTM Architecture for Speech Emotion Recognition with Data Augmentation," arXiv.org, 11-Sep-2018. [Online]. Available: https://arxiv.org/abs/1802.05630. [Accessed: 15-Dec-2020].
      T. Potapczyk, P. Przybysz, M. Chochowski, and A. Szumaczuk, "Samsung's System for the IWSLT 2019 End-to-End Speech Translation Task," Zenodo, 02-Nov-2019. [Online]. Available: https://zenodo.org/record/3525498. [Accessed: 15-Dec-2020].
      E. Ma, "Data Augmentation in NLP," Medium, 04-Jun-2019. [Online]. Available: https://towardsdatascience.com/data-augmentation-in-nlp-2801a34dfc28. [Accessed: 15-Dec-2020].
      V. Iosifidis and E. Ntoutsi, "Dealing with Bias via Data Augmentation in Supervised Learning Scenarios", Semanticscholar.org, 2020. [Online]. Available: http://ceur-ws.org/Vol-2103/paper_5.pdf. [Accessed: 04-Dec-2020].
  • 26. Thank you all