Classes without dependencies
Teaching the tidyverse to first year science students
Sam Clifford, Iwona Czaplinski, Brett Fyfield, Sama Low-Choy, Belinda
Spratt, Amy Stringer, Nicholas Tierney
2018-07-12
The student body's got a bad preparation
SEB113 a core unit in QUT's 2013 redesign of Bachelor of Science
Introduce key math/stats concepts needed for first year science
OP 13 cutoff (ATAR 65)
Assumed knowledge: Intermediate Mathematics
Some calculus and statistics
Not formally required
Diagnostic test and weekly prep material
Basis for further study in disciplines (explicit or embedded)
Still needs to be a self-contained unit that teaches skills
What they need is adult education
Engaging students with use of maths/stats in science
Build good statistical habits from the start
Have students doing analysis
that is relevant to their needs
as quickly as possible
competently
with skills that can be built on
Introduction to programming
reproducibility
separating analysis from the raw data
flexibility beyond menus
correcting mistakes becomes easier
You go back to school
Bad old days
Manual calculation of test statistics
Reliance on statistical tables
Don't want to replicate senior high school study
Reduce reliance on point and click software that only does
everything students need right now (Excel, Minitab)
Students don't need to become R developers
Focus on functionality rather than directly controlling every element,
e.g. LaTeX vs Word
It's a bad situation
Initial course development was not tidy
New B Sc course brought forward
Grab bag of topics at request of science academics
Difficult to find tutors who could think outside "traditional" stat. ed.
very low student satisfaction initially
Rapid and radical redesign required
tidyverse an integrated suite focused on transforming data frames
Vectorisation > loops
RStudio > JGR > Rgui.exe
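As a small illustration of the "Vectorisation > loops" point, the sketch below converts fuel economy from miles per gallon to L/100km both ways. The four mpg values are made up for illustration; the conversion constant is the one used in the mtcars example later in the deck.

```r
# Fuel economy in miles per gallon for a few hypothetical cars
mpg <- c(21.0, 22.8, 18.7, 14.3)

# Loop version: allocate the result, then fill it one element at a time
l100km_loop <- numeric(length(mpg))
for (i in seq_along(mpg)) {
  l100km_loop[i] <- 235.2146 / mpg[i]
}

# Vectorised version: the division is applied to every element at once
l100km_vec <- 235.2146 / mpg

# Both approaches give the same numbers
all.equal(l100km_loop, l100km_vec)
```

The vectorised line says what is wanted (a converted column) without any counter or pre-allocation, which is the style the rest of the unit builds on.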
What you want is an adult education (Oh yeah!)
Compassion and support for learners
Problem- and model-based
Technology should support learning goals
Go further, quicker by not focussing on mechanical calculations
Workflow based on functions rather than element manipulation
Statistics is an integral part of science
Statistics isn't about generating p values
see Cobb in Wasserstein and Lazar [2016]
Machines do the work so people have time to think – IBM (1967)
All models are wrong, but some are useful – Box (1987)
Now here we go dropping science, dropping it all over
Within context of scientific method:
Aims
Methods and Materials
1. Get data/model into an analysis environment
2. Data munging
Results
3. Exploration of data/model
4. Compute model
5. Model diagnostics
Conclusion
6. Interpret meaning of results
I said you wanna be startin' somethin'
Redesign around ggplot2
ggplot2 introduced us to tidy data requirements
Redesign based on Year 11 summer camp
This approach not covered by textbooks at the time
Tried using JGR and Plot Builder for one semester
Extension to wider tidyverse
Replace unrelated packages/functions with unified approach
Focus on what you want rather than directly coding how to do it
Good effort-reward with limited expertise
Summer(ise) loving, had me a blast; summer(ise) loving,
happened so fast
R is a giant calculator that can operate on objects
ggplot() requires a data frame object
dplyr::summarise() to summarise a column variable
dplyr::group_by() to do summary according to specified structure
Copy-paste or looping not guaranteed to be MECE
Group-level summary stats leads to potential statistical models
Easier, less error prone, than repeated usage of =AVERAGE()
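A minimal sketch of the summarise()/group_by() idea, using the built-in mtcars data and grouping by number of cylinders. The choice of variables and the column names mean_mpg and n are illustrative assumptions. Calls are nested rather than piped, in keeping with the deck's decision not to introduce %>% in a first course.

```r
library(dplyr)

# Overall mean: summarise() collapses the whole data frame to one row
overall <- summarise(mtcars, mean_mpg = mean(mpg))
overall

# Group-level means: group_by() declares the structure,
# summarise() is then applied within each group
by_cyl <- summarise(group_by(mtcars, cyl),
                    mean_mpg = mean(mpg),
                    n = n())
by_cyl
```

Because the groups come from the data itself, every row is counted exactly once, which is the MECE property that copy-paste or hand-written loops do not guarantee.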
We want the funk(tional programming paradigm)
Tidy data as observations of variables with structure [Wickham,
2014b]
R as functional programming [Wickham, 2014a]
Actions on entire objects to do things to data and return useful
information
Students enter understanding functions like $y(x) = x^2$
function takes input
function returns output
e.g. $\mathrm{mean}(x) = \sum_i x_i / n$
Week 4: writing functions to solve calculus problems
magrittr::%>% too conceptually similar to ggplot2::+ for
novices to grasp in first course
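The "function takes input, function returns output" idea might look like the following in R. The names my_mean and num_deriv, and the step size h, are hypothetical choices for illustration; the deck does not show the Week 4 material itself.

```r
# A function takes input and returns output, like y(x) = x^2
y <- function(x) {
  x^2
}

# The mean written as a function: sum of the x_i divided by n
my_mean <- function(x) {
  sum(x) / length(x)
}

# A hypothetical calculus helper: central-difference estimate of dy/dx
# (h is an assumed small step size, not a value from the slides)
num_deriv <- function(f, x, h = 1e-5) {
  (f(x + h) - f(x - h)) / (2 * h)
}

y(3)
my_mean(c(2, 4, 6))
num_deriv(y, 3)   # close to the exact derivative 2 * 3 = 6
```

Note that num_deriv() takes another function as its input, which is one way the functional view of R carries over from school mathematics.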
Like Frankie sang, I did it my way
What's the mean gas mileage for each engine geometry and
transmission type for the 32 cars listed in the 1974 Motor Trend
magazine?
Loops     For each of the pre-computed number of groups, subset, summarise and store how you want
tapply()  INDEX a list of k vectors, 1 summary FUNction, returns k-dimensional array
dplyr     Specify grouping variables and which summary statistics, returns tidy data frame ready for model/plot
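The three approaches can be sketched side by side on the untransformed mtcars data, where vs codes engine geometry and am codes transmission type. The object names groups, mpg_array and mpg_tidy are illustrative assumptions, not names from the deck.

```r
library(dplyr)

# 1. Loop: enumerate the groups, then subset, summarise and store
groups <- unique(mtcars[, c("vs", "am")])
groups$mean_mpg <- NA_real_
for (i in seq_len(nrow(groups))) {
  rows <- mtcars$vs == groups$vs[i] & mtcars$am == groups$am[i]
  groups$mean_mpg[i] <- mean(mtcars$mpg[rows])
}

# 2. tapply(): INDEX is a list of two vectors, so the result is a 2-d array
mpg_array <- tapply(mtcars$mpg,
                    INDEX = list(vs = mtcars$vs, am = mtcars$am),
                    FUN = mean)

# 3. dplyr: name the grouping variables and the summary statistic
mpg_tidy <- summarise(group_by(mtcars, vs, am), mean_mpg = mean(mpg))

groups
mpg_array
mpg_tidy
```

All three give the same four means. Only the dplyr result is already a data frame with one row per group and one column per variable, so it can go straight into ggplot() or a model.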
Night of the living baseheads
Like all procedural languages, plot() has one giant list of
arguments
Focus is on how plot is drawn rather than what you want to plot
Inefficiency of keystrokes
re-stating the things being plotted
setting up plot axis limits
loop counters for small multiples, etc.
Toot toot, chugga chugga, big red car
Say we want to plot cars' fuel efficiency against weight
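library(tidyverse)
data(mtcars)
# convert to metric units and label the two binary variables
mtcars <- mutate(
  mtcars,
  l100km = 235.2146 / mpg,   # miles per US gallon to L/100km
  wt_T = wt / 2.2046,        # 1000 lb to tonnes
  am = factor(am, levels = c(0, 1), labels = c("Auto", "Manual")),
  vs = factor(vs, levels = c(0, 1), labels = c("V", "S")))
plot(y = mtcars$l100km, x = mtcars$wt_T)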
[Figure: base R scatter plot of mtcars$l100km (y axis) against mtcars$wt_T (x axis)]
Fairly quick to say what
goes on x and y axes
More arguments → better graph
xlim, ylim
xlab, ylab
main
type, pch
What if we want to see how
it varies with
engine geometry
transmission type
The wisdom of the fool won't set you free
yrange <- range(mtcars$l100km)
xrange <- range(mtcars$wt_T)
levs <- expand.grid(vs = c("V", "S"),
                    am = c("Auto", "Manual"))
par(mfrow = c(2, 2))
for (i in 1:nrow(levs)) {
  dat_to_plot <- merge(levs[i, ], mtcars)
  plot(dat_to_plot$l100km ~ dat_to_plot$wt_T, pch = 16,
       xlab = "Weight (t)", xlim = xrange,
       ylab = "Fuel efficiency (L/100km)",
       ylim = yrange,
       main = sprintf("%s-%s", levs$am[i],
                      levs$vs[i]))
}
[Figure: 2 x 2 grid of base R scatter plots of fuel efficiency (L/100km) against weight (t); panels Auto-V, Auto-S, Manual-V, Manual-S]
ggplot(data = mtcars,
       aes(x = wt_T,
           y = l100km)) +
  geom_point() +
  facet_grid(am ~ vs) +
  theme_bw() +
  xlab("Weight (t)") +
  ylab("Fuel efficiency (L/100km)")
[Figure: ggplot2 scatter plots of fuel efficiency (L/100km) against weight (t), faceted by transmission type (rows: Auto, Manual) and engine geometry (columns: V, S)]
One, two, princes kneel before you
Both approaches do the same thing
Idea                base                  ggplot2
Plot variables      Specify vectors       Coordinate system defined by variables
Small multiples     Loops, subsets, par   facet_grid
Common axes         Pre-computed          Inherited from data
V/S A/M annotation  Strings               Inherited from data
Axis labels         Per axis set          For whole plot
Focus on putting things on the page vs representing variables
I got a grammar Hazel and a grammar Tilly
Plots are built from [Wickham, 2010]
data – which variables are mapped to aesthetic elements
geometry – how do we draw the data?
annotations – what is the context of these shapes?
Build more complex plots by adding commands and layering elements,
rather than by stacking individual points and lines e.g.
make a scatter plot, THEN
add a trend line (with inherited x, y), THEN
facet by grouping variable, THEN
change axis information
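The "THEN" sequence above can be sketched as one plot built up in steps. This version uses the untransformed mtcars columns (wt in 1000 lb, mpg) so that it runs on its own; the linear trend line and the axis label text are assumptions made for illustration.

```r
library(ggplot2)

# make a scatter plot, THEN
p <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# add a trend line (x and y are inherited from ggplot()), THEN
p <- p + geom_smooth(method = "lm")

# facet by grouping variable, THEN
p <- p + facet_grid(am ~ vs)

# change axis information
p <- p + xlab("Weight (1000 lb)") + ylab("Fuel economy (mpg)")

p
```

Each step adds a layer or a specification to the same object, so students can stop after any line and still have a valid plot to inspect.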
When I'm good, I'm very good; but when I'm bad, I'm better
Want to make good plots as soon as possible
Learning about Tufte's principles [Tufte, 1983, Pantoliano, 2012]
Discuss what makes a plot good and bad
Seeing how ggplot2 code translates into graphical elements
Week 2 workshop has students making best and worst plots for a
data set, e.g.
Sie ist ein Model und sie sieht gut aus
Make use of broom package to get model summaries
Get data frames rather than summary.lm() text vomit
tidy()
parameter estimates
CIs
t test info [Greenland et al., 2016]
glance()
everything else
ggplot2::fortify()
regression diagnostic info instead of plot.lm()
stat_qq(aes(sample = .stdresid)) for residual quantiles
geom_point(aes(x = .fitted, y = .resid)) for fitted vs residuals
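A minimal sketch of the tidier workflow, assuming a simple linear model of mpg on wt fitted to the built-in mtcars data. Here broom::augment() is swapped in for ggplot2::fortify() to get per-observation diagnostics; it names the standardised residual column .std.resid rather than .stdresid. The object names are illustrative.

```r
library(broom)
library(ggplot2)

# Fit a simple linear model of fuel economy on weight
fit <- lm(mpg ~ wt, data = mtcars)

# One row per coefficient: estimate, std.error, t statistic, p value, CI
coefs <- tidy(fit, conf.int = TRUE)

# One row per model: R squared, sigma, AIC, and so on
fit_summary <- glance(fit)

# One row per observation, with .fitted, .resid and .std.resid columns
diag_data <- augment(fit)

# Fitted vs residuals
ggplot(diag_data, aes(x = .fitted, y = .resid)) +
  geom_point()

# Normal quantile plot of standardised residuals
ggplot(diag_data, aes(sample = .std.resid)) +
  stat_qq()
```

Because every result is a data frame, the same ggplot2 and dplyr skills students already have apply to model output as well as to raw data.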
When you hear some feedback keep going take it higher
Positives
More confidence and students see use of maths/stats in science
Students enjoy group discussions in workshops
Some students continue using R over Excel in future units
Labs can be done online in own time
Negatives
Request for more face to face help rather than online
Labs can be done online in own time (but are they?)
Downloading of slides rather than attending/watching lectures
Things can only get better
Focus on what you want from R rather than how you do it
representing variables graphically
summarising over structure in data
tidiers for models
Statistics embedded in scientific theory [Diggle and Chetwynd, 2011]
Problem-based learning
groups of novices
supervised by tutors
discussion of various approaches
Peter J. Diggle and Amanda G. Chetwynd. Statistics and Scientific
Method: An Introduction for Students and Researchers. Oxford
University Press, 2011.
Sander Greenland, Stephen J. Senn, Kenneth J. Rothman, John B. Carlin,
Charles Poole, Steven N. Goodman, and Douglas G. Altman.
Statistical tests, p values, confidence intervals, and power: a guide to
misinterpretations. European Journal of Epidemiology, 31(4):337–350,
Apr 2016. URL https://doi.org/10.1007/s10654-016-0149-3.
Mike Pantoliano. Data visualization principles: Lessons from Tufte, 2012.
URL https://moz.com/blog/data-visualization-principles-lessons-from-tufte.
Edward Tufte. The Visual Display of Quantitative Information. Graphics
Press, 1983.
Ronald L. Wasserstein and Nicole A. Lazar. The ASA's statement on
p-values: Context, process, and purpose. The American Statistician,
70(2):129–133, Apr 2016. URL
https://doi.org/10.1080/00031305.2016.1154108.
Hadley Wickham. Advanced R. Chapman & Hall/CRC The R Series. Taylor &
Francis, 2014a. ISBN 9781466586963. URL
https://books.google.com.au/books?id=PFHFNAEACAAJ.
Hadley Wickham. A layered grammar of graphics. Journal of
Computational and Graphical Statistics, 19(1):3–28, 2010. doi:
10.1198/jcgs.2009.07098.
Hadley Wickham. Tidy data. Journal of Statistical Software, 59(10):1–23,
2014b. ISSN 1548-7660. URL
https://www.jstatsoft.org/index.php/jss/article/view/v059i10.

Classes without Dependencies - UseR 2018

  • 1. Classes without dependencies: Teaching the tidyverse to first year science students. Sam Clifford, Iwona Czaplinski, Brett Fyfield, Sama Low-Choy, Belinda Spratt, Amy Stringer, Nicholas Tierney. 2018-07-12
  • 2. The student body's got a bad preparation: SEB113 a core unit in QUT's 2013 redesign of Bachelor of Science; Introduce key math/stats concepts needed for first year science; OP 13 cutoff (ATAR 65); Assumed knowledge: Intermediate Mathematics; Some calculus and statistics; Not formally required; Diagnostic test and weekly prep material; Basis for further study in disciplines (explicit or embedded); Still needs to be a self-contained unit that teaches skills
  • 3. What they need is adult education: Engaging students with use of maths/stats in science; Build good statistical habits from the start; Have students doing analysis that is relevant to their needs, as quickly as possible, competently, with skills that can be built on; Introduction to programming: reproducibility, separating analysis from the raw data, flexibility beyond menus, correcting mistakes becomes easier
  • 4. You go back to school: Bad old days (manual calculation of test statistics, reliance on statistical tables); Don't want to replicate senior high school study; Reduce reliance on point and click software that only does everything students need right now (Excel, Minitab); Students don't need to become R developers; Focus on functionality rather than directly controlling every element, e.g. LaTeX vs Word
  • 5. It's a bad situation: Initial course development was not tidy; New B Sc course brought forward; Grab bag of topics at request of science academics; Difficult to find tutors who could think outside "traditional" stat. ed.; very low student satisfaction initially; Rapid and radical redesign required; tidyverse an integrated suite focused on transforming data frames; Vectorisation > loops; RStudio > JGR > Rgui.exe
  • 6. What you want is an adult education (Oh yeah!): Compassion and support for learners; Problem- and model-based; Technology should support learning goals; Go further, quicker by not focussing on mechanical calculations; Workflow based on functions rather than element manipulation; Statistics is an integral part of science; Statistics isn't about generating p values, see Cobb in Wasserstein and Lazar [2016]
  • 7. "Machines do the work so people have time to think" (IBM, 1967); "All models are wrong, but some are useful" (Box, 1987)
  • 8. Now here we go dropping science, dropping it all over: Within context of scientific method: Aims; Methods and Materials (1. Get data/model into an analysis environment, 2. Data munging); Results (3. Exploration of data/model, 4. Compute model, 5. Model diagnostics); Conclusion (6. Interpret meaning of results)
  • 9. I said you wanna be startin' somethin': Redesign around ggplot2; ggplot2 introduced us to tidy data requirements; Redesign based on Year 11 summer camp; This approach not covered by textbooks at the time; Tried using JGR and Plot Builder for one semester; Extension to wider tidyverse; Replace unrelated packages/functions with unified approach; Focus on what you want rather than directly coding how to do it; Good effort-reward with limited expertise
  • 10. Summer(ise) loving, had me a blast; summer(ise) loving, happened so fast: R is a giant calculator that can operate on objects; ggplot() requires a data frame object; dplyr::summarise() to summarise a column variable; dplyr::group_by() to do summary according to specified structure; Copy-paste or looping not guaranteed to be MECE (mutually exclusive, collectively exhaustive); Group-level summary stats lead to potential statistical models; Easier, less error prone, than repeated usage of =AVERAGE()
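As a minimal sketch of the group_by() then summarise() pattern named on this slide, the block below computes per-group summaries of the built-in mtcars data. The choice of cyl as the grouping variable and the object names (by_cyl, cyl_summary) are illustrative assumptions, not taken from the slides. The pipe is avoided on purpose, in line with the deck's remark that %>% is held back in a first course.

```r
library(dplyr)

# Declare the structure once: one group per number of cylinders
by_cyl <- group_by(mtcars, cyl)

# One row per group, with as many summary columns as we ask for
cyl_summary <- summarise(
  by_cyl,
  mean_mpg = mean(mpg),
  sd_mpg   = sd(mpg),
  n        = n()
)

print(cyl_summary)
```

Because the groups come from the data itself, every car lands in exactly one group, which is the guarantee that copy-paste or hand-written loops do not give.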
  • 11. We want the funk(tional programming paradigm): Tidy data as observations of variables with structure [Wickham, 2014b]; R as functional programming [Wickham, 2014a]; Actions on entire objects to do things to data and return useful information; Students enter understanding functions like $y(x) = x^2$: function takes input, function returns output, e.g. $\mathrm{mean}(x) = \sum_i x_i / n$; Week 4: writing functions to solve calculus problems; magrittr::%>% too conceptually similar to ggplot2::+ for novices to grasp in first course
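The "function takes input, function returns output" idea might be sketched in R as follows. The function names (square, my_mean, slope_at) are hypothetical, and the central-difference slope is only one plausible example of a Week 4 style calculus function; the slides do not say which problems were set.

```r
# y(x) = x^2 as an R function: one input, one output
square <- function(x) {
  x^2
}

# The sample mean written out, for comparison with the built-in mean()
my_mean <- function(x) {
  sum(x) / length(x)
}

# Hypothetical calculus helper: central-difference estimate of the slope of f at x
slope_at <- function(f, x, h = 1e-5) {
  (f(x + h) - f(x - h)) / (2 * h)
}

print(square(3))
print(my_mean(mtcars$mpg))
print(slope_at(square, 3))
```

Note that slope_at() takes another function as its input, which is the sense in which R is being treated as a functional language.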
  • 12. Like Frankie sang, I did it my way: What's the mean gas mileage for each engine geometry and transmission type for the 32 cars listed in the 1974 Motor Trend magazine? Loops: for each of the pre-computed number of groups, subset, summarise and store how you want; tapply(): INDEX a list of k vectors, 1 summary FUNction, returns k-dimensional array; dplyr: specify grouping variables and which summary statistics, returns tidy data frame ready for model/plot
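The three routes to the same answer could look roughly like this. It is a sketch rather than the code shown in the talk, and the object names (groups, loop_result, tapply_result, dplyr_result) are assumptions made for illustration.

```r
library(dplyr)

# 1. Loop: enumerate the groups first, then subset, summarise and store
groups <- unique(mtcars[, c("vs", "am")])
loop_means <- numeric(nrow(groups))
for (i in seq_len(nrow(groups))) {
  in_group <- mtcars$vs == groups$vs[i] & mtcars$am == groups$am[i]
  loop_means[i] <- mean(mtcars$mpg[in_group])
}
loop_result <- cbind(groups, mean_mpg = loop_means)

# 2. tapply(): INDEX is a list of 2 vectors, so the result is a 2-dimensional array
tapply_result <- tapply(mtcars$mpg,
                        INDEX = list(vs = mtcars$vs, am = mtcars$am),
                        FUN = mean)

# 3. dplyr: name the grouping variables and the summary, get a data frame back
dplyr_result <- summarise(group_by(mtcars, vs, am),
                          mean_mpg = mean(mpg),
                          .groups = "drop")

print(loop_result)
print(tapply_result)
print(dplyr_result)
```

All three hold the same four means; only the dplyr result arrives as one row per group with the grouping variables as columns, which is the shape ggplot() and lm() expect.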
  • 13. Night of the living baseheads: Like all procedural languages, plot() has one giant list of arguments; Focus is on how plot is drawn rather than what you want to plot; Inefficiency of keystrokes: re-stating the things being plotted, setting up plot axis limits, loop counters for small multiples, etc.
  • 14. Toot toot, chugga chugga, big red car: Say we want to plot cars' fuel efficiency against weight
      library(tidyverse)
      data(mtcars)
      mtcars <- mutate(
        mtcars,
        l100km = 235.2146/mpg,   # miles per (US) gallon to L/100km
        wt_T = wt/2.2046,        # weight in 1000 lb to tonnes
        am = factor(am, levels = c(0,1), labels = c("Auto", "Manual")),
        vs = factor(vs, levels = c(0,1), labels = c("V","S")))
      # base graphics: pass the two vectors directly
      plot(y=mtcars$l100km, x=mtcars$wt_T)
    [Figure: base-graphics scatter plot of mtcars$l100km against mtcars$wt_T]
    Fairly quick to say what goes on x and y axes; More arguments → better graph (xlim, ylim; xlab, ylab; main; type, pch); What if we want to see how it varies with engine geometry and transmission type?
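To illustrate "more arguments, better graph", here is a hedged sketch of the same base plot with the listed arguments filled in. The axis limits, title text and the data frame name fuel are illustrative choices rather than values from the talk, and transform() stands in for mutate() only so that the block runs without loading the tidyverse.

```r
# Same unit conversions as on the slide, done in base R
fuel <- transform(mtcars,
                  l100km = 235.2146 / mpg,   # mpg (US) to L/100km
                  wt_T   = wt / 2.2046)      # 1000 lb to tonnes

# Each refinement is one more argument to the same plot() call
plot(x = fuel$wt_T, y = fuel$l100km,
     xlim = c(0.5, 2.5), ylim = c(5, 25),    # axis limits
     xlab = "Weight (t)",                    # axis labels
     ylab = "Fuel efficiency (L/100km)",
     main = "Fuel use against weight",       # title
     type = "p", pch = 16)                   # points, drawn as filled circles
```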
  • 15. The wisdom of the fool won't set you free:
      # base graphics: pre-compute common axes, then loop over the four groups
      yrange <- range(mtcars$l100km)
      xrange <- range(mtcars$wt_T)
      levs <- expand.grid(vs = c("V", "S"), am = c("Auto", "Manual"))
      par(mfrow = c(2,2))
      for (i in 1:nrow(levs)){
        dat_to_plot <- merge(levs[i, ], mtcars)   # keep only the rows in group i
        plot(dat_to_plot$l100km ~ dat_to_plot$wt_T, pch=16,
             xlab="Weight (t)", xlim=xrange,
             ylab="Fuel efficiency (L/100km)", ylim=yrange,
             main = sprintf("%s-%s", levs$am[i], levs$vs[i]))}
    [Figure: 2 x 2 grid of base-graphics scatter plots, panels Auto-V, Auto-S, Manual-V and Manual-S; x axis Weight (t), y axis Fuel efficiency (L/100km)]
      # ggplot2: the same small multiples from one facet specification
      ggplot(data = mtcars, aes(x = wt_T, y = l100km)) +
        geom_point() +
        facet_grid(am ~ vs) +
        theme_bw() +
        xlab("Weight (t)") +
        ylab("Fuel efficiency (L/100km)")
    [Figure: ggplot2 facet grid of the same data, columns V and S, rows Auto and Manual, shared axes; x axis Weight (t), y axis Fuel efficiency (L/100km)]
  • 16. One, two, princes kneel before you: Both approaches do the same thing
      Idea                | base                | ggplot2
      Plot variables      | Specify vectors     | Coordinate system defined by variables
      Small multiples     | Loops, subsets, par | facet_grid
      Common axes         | Pre-computed        | Inherited from data
      V/S A/M annotation  | Strings             | Inherited from data
      Axis labels         | Per axis set        | For whole plot
    Focus on putting things on the page vs representing variables
  • 17. I got a grammar Hazel and a grammar Tilly: Plots are built from [Wickham, 2010] data (which variables are mapped to aesthetic elements), geometry (how do we draw the data?), annotations (what is the context of these shapes?); Build more complex plots by adding commands and layering elements, rather than by stacking individual points and lines, e.g. make a scatter plot, THEN add a trend line (with inherited x, y), THEN facet by grouping variable, THEN change axis information
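The "scatter plot, THEN trend line, THEN facet, THEN axis information" sequence can be sketched one layer at a time. The linear trend (method = "lm"), the facet variable am and the data frame name fuel are assumptions made for illustration; the slides do not specify them.

```r
library(ggplot2)

# Unit conversions and factor labels, done in base R to keep the block self-contained
fuel <- transform(mtcars,
                  l100km = 235.2146 / mpg,
                  wt_T   = wt / 2.2046,
                  am     = factor(am, levels = c(0, 1), labels = c("Auto", "Manual")))

# Data and aesthetic mapping first, then a scatter plot
p <- ggplot(data = fuel, aes(x = wt_T, y = l100km)) +
  geom_point()

# THEN a trend line; it inherits x and y from the ggplot() call
p <- p + geom_smooth(method = "lm", se = FALSE)

# THEN facet by a grouping variable
p <- p + facet_wrap(~ am)

# THEN change the axis information
p <- p + labs(x = "Weight (t)", y = "Fuel efficiency (L/100km)")

print(p)
```

Each step adds to the existing plot object instead of redrawing it, so students can stop after any line and still have a valid plot.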
  • 18. When I'm good, I'm very good; but when I'm bad, I'm better: Want to make good plots as soon as possible; Learning about Tufte's principles [Tufte, 1983, Pantoliano, 2012]; Discuss what makes a plot good and bad; Seeing how ggplot2 code translates into graphical elements; Week 2 workshop has students making best and worst plots for a data set, e.g. [Figure: example plots]
  • 19. Sie ist ein Model und sie sieht gut aus ("She is a model and she looks good"): Make use of broom package to get model summaries; Get data frames rather than summary.lm() text vomit; tidy(): parameter estimates, CIs, t test info [Greenland et al., 2016]; glance(): everything else; ggplot2::fortify(): regression diagnostic info instead of plot.lm(); stat_qq(aes(x=.stdresid)) for residual quantiles; geom_point(aes(x=.fitted, y=.resid)) for fitted vs residuals
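A minimal sketch of the tidier workflow, assuming a simple linear model of fuel use on weight (the model itself is an illustrative choice, not one stated on the slide). Two substitutions are made and should be read as assumptions about current package versions: broom::augment() is used in place of ggplot2::fortify(), so the standardised residual column is named .std.resid rather than .stdresid, and stat_qq() is given the sample aesthetic, which is what it expects in current ggplot2.

```r
library(broom)
library(ggplot2)

fuel <- transform(mtcars,
                  l100km = 235.2146 / mpg,
                  wt_T   = wt / 2.2046)

fit <- lm(l100km ~ wt_T, data = fuel)

# Data frames instead of printed summary.lm() output
estimates   <- tidy(fit, conf.int = TRUE)  # one row per coefficient, with CIs
fit_stats   <- glance(fit)                 # one row of model-level statistics
diagnostics <- augment(fit)                # one row per observation

# Fitted vs residuals, and a normal quantile plot of standardised residuals
resid_plot <- ggplot(diagnostics, aes(x = .fitted, y = .resid)) + geom_point()
qq_plot    <- ggplot(diagnostics, aes(sample = .std.resid)) + stat_qq()

print(estimates)
print(fit_stats)
print(resid_plot)
print(qq_plot)
```

Because every result is a data frame, the diagnostics are drawn with the same ggplot2 grammar students already know, with no separate plotting interface to learn.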
  • 20. When you hear some feedback keep going take it higher: Positives: More confidence and students see use of maths/stats in science; Students enjoy group discussions in workshops; Some students continue using R over Excel in future units; Labs can be done online in own time. Negatives: Request for more face to face help rather than online; Labs can be done online in own time (but are they?); Downloading of slides rather than attending/watching lectures
  • 21. Things can only get better: Focus on what you want from R rather than how you do it: representing variables graphically, summarising over structure in data, tidiers for models; Statistics embedded in scientific theory [Diggle and Chetwynd, 2011]; Problem-based learning: groups of novices supervised by tutors, discussion of various approaches
  • 22. Peter J. Diggle and Amanda G. Chetwynd. Statistics and Scientific Method: An Introduction for Students and Researchers. Oxford University Press, 2011.
    Sander Greenland, Stephen J. Senn, Kenneth J. Rothman, John B. Carlin, Charles Poole, Steven N. Goodman, and Douglas G. Altman. Statistical tests, p values, confidence intervals, and power: a guide to misinterpretations. European Journal of Epidemiology, 31(4):337–350, Apr 2016. URL https://doi.org/10.1007/s10654-016-0149-3.
    Mike Pantoliano. Data visualization principles: Lessons from Tufte, 2012. URL https://moz.com/blog/data-visualization-principles-lessons-from-tufte.
    Edward Tufte. The Visual Display of Quantitative Information. Graphics Press, 1983.
    Ronald L. Wasserstein and Nicole A. Lazar. The ASA's statement on p-values: Context, process, and purpose. The American Statistician, 70(2):129–133, Apr 2016. URL https://doi.org/10.1080/00031305.2016.1154108.
  • 23. Hadley Wickham. Advanced R. Chapman & Hall/CRC The R Series. Taylor & Francis, 2014a. ISBN 9781466586963. URL https://books.google.com.au/books?id=PFHFNAEACAAJ.
    Hadley Wickham. A layered grammar of graphics. Journal of Computational and Graphical Statistics, 19(1):3–28, 2010. doi: 10.1198/jcgs.2009.07098.
    Hadley Wickham. Tidy data. Journal of Statistical Software, 59(10):1–23, 2014b. ISSN 1548-7660. URL https://www.jstatsoft.org/index.php/jss/article/view/v059i10.