Have you checked out the new tidyr version 1.0.0?
I experimented with the pivot_longer and pivot_wider functions and I love the new functionality!
I cleaned a poorly constructed weather dataset, and thanks to the tidyverse it was a breeze!
Cleaning data
CCrause
Exploring the enhanced tidyr version 1.0.0
First, import the raw data. I used weather data. Upon first inspection I wanted to change a
couple of things:
• Observations are stored as variables: the columns X1 to X31 represent the days of the
month
• Variable names are stored as rows. Max and mean temperature are variables and
should ideally each be represented in their own column
• The first column, called X, is redundant. X1 to X31 already capture the data for every
day of the month, so I'll remove it and change all column names to lower case
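The three points above can be sketched on a small stand-in table. This is only an illustration under assumptions: `weather_raw` and its values are hypothetical (three day columns instead of 31), and `rename_all()` is just one of several ways to lower-case the names. The column names `day` and `measurement` match the output shown in this post.

```r
library(tidyr)
library(dplyr)

# Hypothetical stand-in for the raw weather file (3 day columns instead of 31)
weather_raw <- tibble(
  X       = 1:4,
  year    = 2014L,
  month   = 12L,
  measure = c("Max.TemperatureF", "Mean.TemperatureF",
              "Min.TemperatureF", "Events"),
  X1      = c("64", "52", "39", "Rain"),
  X2      = c("42", "38", "33", "Rain-Snow"),
  X3      = c("51", "44", "37", NA)
)

# Drop the redundant index column and lower-case all column names
weather_tbl <- weather_raw %>%
  select(-X) %>%
  rename_all(tolower)

# Gather the day columns (x1, x2, x3) into name/value pairs
weather_long <- weather_tbl %>%
  pivot_longer(cols = starts_with("x"),
               names_to = "day",
               values_to = "measurement")

weather_long
```

At this stage `day` still holds strings such as "x1", because the values come straight from the old column names.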
## # A tibble: 8,866 x 5
##     year month measure          day   measurement
##    <int> <int> <chr>            <chr> <chr>
##  1  2014    12 Max.TemperatureF x1    64
##  2  2014    12 Max.TemperatureF x2    42
##  3  2014    12 Max.TemperatureF x3    51
##  4  2014    12 Max.TemperatureF x4    43
##  5  2014    12 Max.TemperatureF x5    42
##  6  2014    12 Max.TemperatureF x6    45
##  7  2014    12 Max.TemperatureF x7    38
##  8  2014    12 Max.TemperatureF x8    29
##  9  2014    12 Max.TemperatureF x9    49
## 10  2014    12 Max.TemperatureF x10   48
## # ... with 8,856 more rows
Additional tweaks
This worked very well, but I don't like the x in front of the day, because it forces the
column to be of type character. You can remove the prefix and then convert the type
very easily by adding two additional arguments, namely names_prefix and names_ptypes. I
also dropped all the NA values with values_drop_na.
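weather_tbl %>%
  pivot_longer(cols = starts_with('x'),
               names_to = 'day',
               values_to = 'measurement',
               values_drop_na = TRUE,                 # drop rows with a missing measurement
               names_prefix = 'x',                    # strip the leading x from the day
               names_ptypes = list(day = integer()))  # store day as an integer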
## # A tibble: 8,046 x 5
##     year month measure            day measurement
##    <int> <int> <chr>            <int> <chr>
##  1  2014    12 Max.TemperatureF     1 64
##  2  2014    12 Max.TemperatureF     2 42
##  3  2014    12 Max.TemperatureF     3 51
##  4  2014    12 Max.TemperatureF     4 43
##  5  2014    12 Max.TemperatureF     5 42
##  6  2014    12 Max.TemperatureF     6 45
##  7  2014    12 Max.TemperatureF     7 38
##  8  2014    12 Max.TemperatureF     8 29
##  9  2014    12 Max.TemperatureF     9 49
## 10  2014    12 Max.TemperatureF    10 48
## # ... with 8,036 more rows
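If you are on a later tidyr release, the type conversion may need a different argument. As far as I know, from tidyr 1.1.0 onwards names_ptypes only checks the column type, and the conversion itself is done with names_transform. The sketch below assumes tidyr 1.1.0 or newer and uses a small hypothetical `weather_tbl` rather than the real weather data.

```r
library(tidyr)

# Hypothetical toy table standing in for weather_tbl
weather_tbl <- tibble(
  year    = 2014L,
  month   = 12L,
  measure = c("Max.TemperatureF", "Events"),
  x1      = c("64", "Rain"),
  x2      = c("42", NA)
)

weather_long <- weather_tbl %>%
  pivot_longer(cols = starts_with("x"),
               names_to = "day",
               values_to = "measurement",
               values_drop_na = TRUE,
               names_prefix = "x",
               # assumes tidyr >= 1.1.0: convert "1", "2", ... to integer
               names_transform = list(day = as.integer))

weather_long
```

The measurement column stays character here because the measure column mixes numeric readings with text values.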
Making the dataset wider
The column “measure” contains different variables that would be better displayed in their
own columns. Enter pivot_wider: it takes the new column names from the “measure” column
and the values from the “measurement” column.
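As a minimal sketch of that idea, here is pivot_wider on a tiny hypothetical long table. The data and the name `weather_wide` are made up for illustration; only the column names `measure` and `measurement` come from the steps above.

```r
library(tidyr)

# Hypothetical long-format table, shaped like the pivot_longer result
weather_long <- tibble(
  year        = 2014L,
  month       = 12L,
  day         = c(1L, 1L, 2L, 2L),
  measure     = c("Max.TemperatureF", "Mean.TemperatureF",
                  "Max.TemperatureF", "Mean.TemperatureF"),
  measurement = c("64", "52", "42", "38")
)

# One column per distinct value of measure, filled from measurement
weather_wide <- weather_long %>%
  pivot_wider(names_from = measure,
              values_from = measurement)

weather_wide
```

Each day now occupies a single row, with one column per measure. The values are still character and would need converting before any arithmetic.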