Transformers: Data in disguise
Dr David Playfoot
d.r.playfoot@swansea.ac.uk
Transformation
• We’ve talked about parametric assumptions
before
• A key one is the assumption that the data is
normally distributed.
• But often it isn’t
• What if we don’t want to sacrifice power?
Transformation
• We can transform the data
• This changes all the scores that we collect in
the same way: it changes the shape of the
distribution without altering only some of the
scores (which would be cheating)
Transformation
• The most common ones are:
• Square Root Transformation
• Log Transformation
• They do slightly different things and are useful
in different situations
Transformation
• We’ve already talked about one
transformation quite extensively
• When we calculate z scores, we are actually
applying a transformation
• We are changing the scale on which
participants are measured: not the actual
score they got, but how many standard
deviations they are from the mean.
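As a minimal sketch of the z score transformation described above (the scores are made up for illustration, and the population standard deviation is an assumption; swap in `statistics.stdev` if you want the sample version):

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Convert raw scores to z scores: (score - mean) / standard deviation."""
    m = mean(scores)
    sd = pstdev(scores)  # population SD (an assumption for this sketch)
    return [(x - m) / sd for x in scores]


raw = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, mean 5 and SD 2
z = z_scores(raw)

# Every score is changed in the same way, so the order of participants is kept.
for score, value in zip(raw, z):
    print(f"raw {score} -> z {value:+.2f}")
```

Each z score says how far a participant is from the mean in standard deviation units, so a raw score of 9 here becomes a z score of +2.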
Square Root
• The easiest to explain is the Square Root
transformation.
• Say we had data that looked like this:
Square Root
• It is positively skewed
• If we square root every data point, it should
help
• Why? Because square rooting changes big
numbers more than it does small numbers
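To see why square rooting pulls in the long right tail, here is a small sketch with made-up scores (the values are hypothetical and chosen to be perfect squares so the arithmetic is easy to follow):

```python
import math

raw = [1, 4, 9, 16, 100]  # hypothetical scores with one very large value
rooted = [math.sqrt(x) for x in raw]

# Show how much each score is reduced by the transformation.
for x, r in zip(raw, rooted):
    print(f"{x:>3} -> {r:>4.1f}  (reduced by {x - r:.1f})")
```

The score of 1 does not move at all, while the score of 100 drops to 10, so the extreme score ends up much closer to the rest.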
Square Root
• The square root transformation can have a
moderate effect on the skew of the data
• Here’s the plot of the data from before, after
transformation
• It’s better, but it’s not right
Log transformation
• The log transformation is a bit stronger than
the square root transformation, but works on
a similar principle
• There are a few types of log transformation
that might be used but we will use the “base
10” version here
What’s a log?
• A log (full word is logarithm) is the power to
which a base number must be raised in order
to make another number.
What’s a log?
• Well not really.
• Here’s an example using log base 10.
log10(100) = 2
• Why? We start with the base number (10).
We want to turn 10 into 100. To do so, we
have to raise 10 to the power of 2 (square it)
• 10² = 100
What’s a log?
• Try this one
log10(1000) = 3
• Why? We start with the base number (10).
We want to turn 10 into 1000. To do so, we
have to raise 10 to the power of 3 (cube it)
• 10³ = 1000
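The two worked examples can be checked with a short sketch using Python's standard `math.log10` function (the variable names are just for illustration):

```python
import math

log_of_100 = math.log10(100)    # the power 10 must be raised to in order to make 100
log_of_1000 = math.log10(1000)  # the power 10 must be raised to in order to make 1000

print(log_of_100, log_of_1000)

# Raising the base (10) to the log takes us back to the original number.
print(10 ** log_of_100, 10 ** log_of_1000)
```

Going from 100 to 1000 only moves the log from 2 to 3, which is why the log transformation squeezes large scores so strongly.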
Log transformation
• The log transformation can have a hefty effect
on the skew of the data
• Here’s the plot of the data from before, after
log transformation
• It’s better again
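The plots from the slides are not reproduced here, so as a rough stand-in this sketch compares skewness before and after each transformation. The data are invented, and the `skewness` helper (the mean of the cubed z scores) is one common definition chosen for this sketch, not necessarily the one your stats package reports.

```python
import math
from statistics import mean, pstdev


def skewness(values):
    """Skewness as the mean of the cubed z scores (0 means symmetric)."""
    m = mean(values)
    sd = pstdev(values)
    return mean([((x - m) / sd) ** 3 for x in values])


# Hypothetical positively skewed scores: most are small, a few are very large.
raw = [1, 1, 2, 2, 2, 3, 3, 4, 5, 8, 15, 40]
rooted = [math.sqrt(x) for x in raw]
logged = [math.log10(x) for x in raw]

print(f"raw skew:         {skewness(raw):.2f}")
print(f"square root skew: {skewness(rooted):.2f}")
print(f"log10 skew:       {skewness(logged):.2f}")
```

With these scores the skew shrinks at each step (raw, then square root, then log), but it does not reach zero.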
Transformation
• Transformations can often reduce skew, but they
are unlikely to remove it completely
• We’re just trying to get it to an acceptable
level so we can meet parametric assumptions
• If we succeed, we do the stats test on the
transformed data rather than the raw scores.
Descriptives
• The usefulness of a transformation is that it
allows us to use a more powerful inferential
statistic rather than resorting to non-parametric
tests.
• However, reporting means of transformed
variables isn’t very useful because nobody knows
what they mean…
Descriptives
• Before reporting your descriptive statistics,
you need to undo whatever transformation
you did.
• E.g. mean from the analysis = 2
• Analysis was performed on square root
transformed data
• So 2 is a mean on the square root scale, not
on the scale of the raw scores
• Report 4 (which is 2²) as the back-transformed
mean score
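Undoing the transformation can be sketched like this. The scores are made up so that the mean on the square root scale comes out at 2, matching the example above; the log10 helper is included only to show the matching undo step for that transformation.

```python
import math
from statistics import mean

raw = [1, 4, 4, 9]  # hypothetical raw scores
rooted = [math.sqrt(x) for x in raw]

mean_rooted = mean(rooted)            # mean on the square root scale
back_transformed = mean_rooted ** 2   # squaring undoes the square root


def undo_log10(mean_logged):
    """Undo a base 10 log transformation by raising 10 to the power of the mean."""
    return 10 ** mean_logged


print(mean_rooted, back_transformed)
print(mean(raw))  # the plain mean of the raw scores, for comparison
```

Note that the back-transformed value (4) is not identical to the plain mean of the raw scores (4.5), so it is worth saying in your write-up that the reported mean has been back-transformed.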
Cautionary notes
• You can’t log or square root transform data
that contain negative numbers (and you can’t
take the log of zero either)
• The methods in these slides will only work for
positively skewed data
• If your data is negatively skewed, you have to
reverse the scores before applying the
transformation (I’ll show you how in the
video)
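The video is not part of these slides, so the following is only a sketch of one common way to reverse scores: subtract each score from the highest score plus one. The `reflect` helper and the data are assumptions for illustration, not necessarily the method shown in the video.

```python
import math


def reflect(scores):
    """Reverse scores so the highest becomes 1: (highest score + 1) - score."""
    top = max(scores)
    return [(top + 1) - x for x in scores]


# Hypothetical negatively skewed scores: most are high, a few are very low.
raw = [1, 6, 8, 9, 9, 10, 10, 10]

reflected = reflect(raw)                      # now positively skewed
logged = [math.log10(x) for x in reflected]   # safe: the smallest value is 1

print(reflected)
print([round(v, 2) for v in logged])
```

After reversing, a high transformed value means a low original score, so remember to flip the interpretation of any effects you find.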
