Unpivot the impossible
tidyxl & unpivotr
Duncan Garmonsway
@nacnudus
But seriously, check out data.linz.govt.nz
Take-home message
• Importing spreadsheets is easy
• Try it at home
• Help make it better
Grammar of spreadsheets
• Value
• Content
• Formula
• Formatting
• Conditional formatting
• Position
content formula
A1
A1 =A1
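The content/formula distinction can be made concrete: one cell holds the text A1, another holds the formula =A1 pointing at it, and both display A1. Below is a minimal base R sketch that resolves a bare reference formula. The helpers address_to_rowcol() and evaluate_reference() are hypothetical and are not part of tidyxl. In practice, xlsx files normally store a formula's last calculated result as well, so a reader rarely needs to evaluate anything.

```r
# Hypothetical helper: convert an A1-style address such as "AI2" to row/col.
address_to_rowcol <- function(address) {
  pattern <- "^([A-Za-z]+)([0-9]+)$"
  letters_part <- toupper(sub(pattern, "\\1", address))
  digits_part <- sub(pattern, "\\2", address)
  # Column letters form a base-26 number with no zero digit: A = 1, Z = 26, AA = 27.
  col <- vapply(strsplit(letters_part, ""), function(chars) {
    Reduce(function(acc, ch) acc * 26L + match(ch, LETTERS), chars, 0L)
  }, integer(1))
  data.frame(address = address, row = as.integer(digits_part), col = col,
             stringsAsFactors = FALSE)
}

# Two cells that both display "A1": A1 holds it as content,
# A2 holds the formula =A1 (a reference back to the first cell).
cells <- data.frame(
  address = c("A1", "A2"),
  row = c(1L, 2L),
  col = c(1L, 1L),
  content = c("A1", NA),
  formula = c(NA, "=A1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: resolve a bare reference formula such as "=A1" by
# looking up the target cell. Only plain references are handled.
evaluate_reference <- function(formula, cells) {
  target <- address_to_rowcol(sub("^=", "", formula))
  cells$content[cells$row == target$row & cells$col == target$col]
}

evaluate_reference(cells$formula[2], cells)
```

The same address arithmetic explains why AI2 sits in column 35 in the cells table printed later.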
content formatted
0.9 1
1 1/01/1900
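The content/formatted pair above shows one stored value viewed through two number formats. The base R sketch below reproduces the three renderings. The helpers format_fixed(), excel_serial_to_date() and format_d_mm_yyyy() are hypothetical and cover only these particular formats. The sketch assumes Excel's default 1900 date system.

```r
# Excel "0" / "0.0": a fixed number of decimal places.
# Note: sprintf() may round exact binary halves (e.g. 0.5) to even,
# whereas Excel rounds them away from zero.
format_fixed <- function(x, digits) {
  sprintf(paste0("%.", digits, "f"), x)
}

# Excel's 1900 date system: serial 1 is 1900-01-01, and serial 60 is the
# non-existent 1900-02-29 kept for Lotus 1-2-3 compatibility.
excel_serial_to_date <- function(serial) {
  origin <- as.Date(ifelse(serial < 60, "1899-12-31", "1899-12-30"))
  out <- origin + serial
  out[serial == 60] <- NA
  out
}

# Excel "d/mm/yyyy": day without a leading zero, month with one.
format_d_mm_yyyy <- function(date) {
  paste0(as.integer(format(date, "%d")), format(date, "/%m/%Y"))
}

format_fixed(0.9, 1)                       # "0.9"  (content 0.9, format "0.0")
format_fixed(0.9, 0)                       # "1"    (content 0.9, format "0")
format_d_mm_yyyy(excel_serial_to_date(1))  # "1/01/1900" (content 1, format "d/mm/yyyy")
```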
Attribute Colour
Name Brown
Attribute Colour
Name Brown
Normalisation
header1 header2
value1 value2
row col character
1 1 header1
1 2 header2
2 1 value1
2 2 value2
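To go from the grid to the row/col/character table, record each cell's position alongside its text. Below is a minimal base R sketch of that step. The name normalise_grid() is hypothetical.

```r
# The 2 x 2 grid from the slide (matrix() fills column by column).
grid <- matrix(c("header1", "value1", "header2", "value2"), nrow = 2)

# One row per cell: its position plus its text. Blank (NA) cells are
# dropped here to keep the table sparse.
normalise_grid <- function(grid) {
  cells <- data.frame(
    row = as.vector(row(grid)),
    col = as.vector(col(grid)),
    character = as.vector(grid),
    stringsAsFactors = FALSE
  )
  cells <- cells[!is.na(cells$character), ]
  cells <- cells[order(cells$row, cells$col), ]
  rownames(cells) <- NULL
  cells
}

normalise_grid(grid)
```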
Normalisation
row col value numFmt character date format id
1 1 1 General content 2
1 2 2 General formatted 2
2 1 0.9 0.0 3
2 2 0.9 0 4
3 1 1 0 5
3 2 1 d/mm/yyyy 1/01/1900 6
content formatted
0.9 1
1 1/01/1900
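The table above stores each cell once with a format id, rather than repeating the format attributes on every cell. The base R sketch below resolves those ids using rows 2 and 3 of the table. The column name format_id and the separate formats table are illustrative choices.

```r
# Cells from rows 2-3 of the table above: position, value and a format id.
cells <- data.frame(
  row = c(2L, 2L, 3L, 3L),
  col = c(1L, 2L, 1L, 2L),
  value = c(0.9, 0.9, 1, 1),
  format_id = c(3L, 4L, 5L, 6L)
)

# Each format is stored once and referred to by its id.
formats <- data.frame(
  format_id = 3:6,
  numFmt = c("0.0", "0", "0", "d/mm/yyyy"),
  stringsAsFactors = FALSE
)

# Resolve every cell's number format with a lookup on the id.
cells$numFmt <- formats$numFmt[match(cells$format_id, formats$format_id)]
cells
```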
Formats
• Number format
• Font
• Fill
• Border
  • Top
    • Colour
    • Linetype
    • Weight
  • Left
    • Colour
    • Linetype …
• Alignment
• Protection
Formats
> x$local$number_format
[1] "General"
[2] "d-mmm-yy"
[3] "[$-1409]d mmmm yyyy;@"
[4] "yyyy mmmm dddd"
[5] "0.00"
[6] "General"
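Each format vector like the one above is indexed by format id, and each cell points at its entry through local_format_id, a column in the cells table printed later. A number format is often the only clue that a stored number is a date. The sketch below is a rough base R heuristic for that. looks_like_date_format() is hypothetical and is not a full format-string parser, and the cells' format ids are made up.

```r
# Number formats from the output above, indexed by format id.
number_format <- c("General", "d-mmm-yy", "[$-1409]d mmmm yyyy;@",
                   "yyyy mmmm dddd", "0.00", "General")

# Heuristic: after removing [bracketed] sections (locale, colour) and
# "quoted" literal text, any d, m or y code suggests a date.
# It also flags time formats (m can mean minutes).
looks_like_date_format <- function(fmt) {
  stripped <- gsub("\\[[^\\]]*\\]", "", fmt, perl = TRUE)
  stripped <- gsub('"[^"]*"', "", stripped)
  grepl("[dmy]", stripped, ignore.case = TRUE)
}

# Hypothetical cells, each pointing at a format by id.
local_format_id <- c(1L, 2L, 5L, 3L)
is_date_cell <- looks_like_date_format(number_format)[local_format_id]
is_date_cell
```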
Formats
> x$local$font$color$theme
[1] "dk2"
[2] "lt2"
[3] "dk1"
[4] "hlink"
[5] "accent2"
[6] "accent3"
[7] "accent4"
[8] "accent5"
[9] "accent6"
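Because these colours are also a vector indexed by format id, formatting can be used to select cells. For example, you can pick out the cells whose font uses one theme colour. The base R sketch below uses hypothetical cells, and the assignment of ids to colours is assumed purely for illustration.

```r
# Theme colours from the output above, indexed by format id.
font_theme <- c("dk2", "lt2", "dk1", "hlink", "accent2",
                "accent3", "accent4", "accent5", "accent6")

# Hypothetical cells, each pointing at a format by id.
cells <- data.frame(
  address = c("A1", "A2", "B2", "A3"),
  character = c("Region A", "item 1", "84", "Region B"),
  local_format_id = c(5L, 3L, 3L, 5L),
  stringsAsFactors = FALSE
)

# Look up each cell's font theme colour by id, then filter on it.
cells$font_theme <- font_theme[cells$local_format_id]
highlighted <- cells[cells$font_theme == "accent2", ]
highlighted$address
```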
Formats
• theme: x$theme
• local: x$local
> print(cells)
# A tibble: 25,884 × 15
address row col content formula formula_type formula_ref formula_group type character
<chr> <int> <int> <chr> <chr> <chr> <chr> <int> <chr> <chr>
1 A1 1 1 82 <NA> <NA> <NA> NA s Officer-issued excess speed band
2 C2 2 3 2009 <NA> <NA> <NA> NA <NA> <NA>
3 M2 2 13 114 <NA> <NA> <NA> NA s Total
4 N2 2 14 2010 <NA> <NA> <NA> NA <NA> <NA>
5 X2 2 24 114 <NA> <NA> <NA> NA s Total
6 AI2 2 35 114 <NA> <NA> <NA> NA s Total
7 AJ2 2 36 2011 <NA> <NA> <NA> NA <NA> <NA>
8 AT2 2 46 114 <NA> <NA> <NA> NA s Total
9 BE2 2 57 114 <NA> <NA> <NA> NA s Total
10 BF2 2 58 2012 <NA> <NA> <NA> NA <NA> <NA>
# ... with 25,874 more rows, and 5 more variables: height <dbl>, width <dbl>, style_format_id <int>,
# local_format_id <int>, comment <chr>
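In the output above, type "s" marks a shared string. The xlsx file stores each distinct string once, and the cell's content is a zero-based index into that table. This is why content 114 resolves to "Total" on every row where it appears. The base R sketch below performs that resolution. The string table and the function name resolve_shared_strings() are hypothetical.

```r
# Hypothetical shared-string table: each distinct string stored once.
shared_strings <- c("Total", "Dec", "Officer-issued excess speed band")

cells <- data.frame(
  address = c("A1", "C2", "M2"),
  type = c("s", NA, "s"),        # NA: no type attribute, i.e. a number
  content = c("2", "2009", "0"),
  stringsAsFactors = FALSE
)

resolve_shared_strings <- function(cells, shared_strings) {
  is_shared <- !is.na(cells$type) & cells$type == "s"
  cells$character <- NA_character_
  # xlsx indices are zero-based; R vectors are one-based.
  idx <- as.integer(cells$content[is_shared]) + 1L
  cells$character[is_shared] <- shared_strings[idx]
  cells
}

resolve_shared_strings(cells, shared_strings)
```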
tidyxl
• The row/col position is the data
• Like hadley/readxl (Rcpp, RapidXml)
• But tidy
• Higher level of normalisation
tidyxl
• .xlsx
• contents() Normalised data
• formats() Formatting
unpivotr
• data.frames and matrices
• tidytable()
• offset()
• extend()
• anchor()
• pad()
• join_header()
• N() E() S() W()
• NNW() NNE() ENE() ESE() SSE() SSW() WSW() WNW()
• ABOVE() BELOW() LEFT() RIGHT()
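Cell selection
library(dplyr)     # %>%, filter(), mutate(), select(), lead(), if_else()
library(tidyxl)    # source of `cells`
library(unpivotr)  # offset(), extend_S(), extend_E(), W(), NNW()

anchorDec <-
  cells %>%
  filter(col == 3, character == "Dec") %>%
  split(list(.$row, .$col), drop = TRUE)  # one data frame per anchor cell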
Cell selection
header_area <-
x %>% # x is a single cell from anchorDec
offset(cells, rows = 2, cols = -1) %>%
extend_S(cells,
boundary = ~ is.na(lead(content))) %>%
select(row, col, area = character)
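The pipeline above moves from an anchor cell to a starting cell, then extends down the column until the next cell is blank. Below is a base R sketch of those two steps on a small hypothetical grid. offset_cell() and extend_south() are illustrative stand-ins, not unpivotr's offset() and extend_S(). The stopping rule is hard-coded to the boundary used above, which stops when the next cell down is blank.

```r
# Hypothetical sparse cells: one row per non-blank cell.
cells <- data.frame(
  row = c(1L, 3L, 4L, 5L, 7L),
  col = c(3L, 2L, 2L, 2L, 2L),
  content = c("Dec", "Central", "East", "West", "Notes"),
  stringsAsFactors = FALSE
)

# Move from a cell by a number of rows and columns (blank target -> NA content).
offset_cell <- function(cell, cells, rows = 0L, cols = 0L) {
  r <- cell$row + rows
  k <- cell$col + cols
  hit <- cells[cells$row == r & cells$col == k, ]
  if (nrow(hit) == 0) {
    hit <- data.frame(row = r, col = k, content = NA_character_,
                      stringsAsFactors = FALSE)
  }
  hit
}

# Extend downwards from a cell, stopping before the first blank cell.
extend_south <- function(cell, cells) {
  below <- cells[cells$col == cell$col & cells$row >= cell$row &
                   !is.na(cells$content), ]
  below <- below[order(below$row), ]
  expected <- cell$row + seq_len(nrow(below)) - 1L
  run <- sum(cumprod(below$row == expected))  # length of the unbroken run
  below[seq_len(run), ]
}

anchor <- cells[cells$col == 3 & cells$content == "Dec", ]
start <- offset_cell(anchor, cells, rows = 2L, cols = -1L)
area <- extend_south(start, cells)
area$content
```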
Cell selection
datacells <-
header_area %>%
offset(cells, cols = 1) %>%
extend_E(cells,
boundary = ~ col == max(col)) %>%
mutate(value = if_else(is.na(character),
content,
character)) %>%
select(row, col, value)
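The step above takes every cell to the right of the area headers, out to the last used column. It then prefers a cell's string text and falls back to its raw content for numbers. Below is a base R sketch of the same idea on hypothetical cells. The block selection here is a simplification of extend_E(): it only keeps cells that exist, and it does not pad blanks.

```r
# Hypothetical cells: numbers have character NA; for strings, content
# holds a shared-string index and character holds the text.
cells <- data.frame(
  row = c(3L, 3L, 3L, 4L, 4L, 4L),
  col = c(2L, 3L, 4L, 2L, 3L, 4L),
  content = c("0", "84", "65", "1", "136", "2"),
  character = c("Central", NA, NA, "East", NA, "n/a"),
  stringsAsFactors = FALSE
)
header_area <- cells[cells$col == 2, c("row", "col", "character")]

# Every cell in the header rows, from one column right of the headers
# to the last used column.
first_col <- max(header_area$col) + 1L
last_col <- max(cells$col)
block <- cells[cells$row %in% header_area$row &
                 cells$col >= first_col & cells$col <= last_col, ]

# Prefer the string text; fall back to the raw content for numbers.
block$value <- ifelse(is.na(block$character), block$content, block$character)
datacells <- block[, c("row", "col", "value")]
datacells$value
```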
Joining
tidied <- datacells %>%
W(header_district) %>%
W(header_area) %>%
NNW(header_month) %>%
NNW(header_year)
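W() and NNW() pair each data cell with a header purely by position. The simplified base R sketch below reflects our reading of the two directions. With W, the header sits in the same row somewhere to the left. With NNW, the header sits above, and if none is directly above, the nearest one to the left applies, as with a year spanning several columns. join_west(), join_nnw() and the data are hypothetical, and the sketch is not unpivotr's implementation. For example, it ignores how many rows separate header and data.

```r
# Hypothetical data cells and headers (row, col, value).
datacells <- data.frame(
  row = rep(2:3, each = 4),
  col = rep(2:5, times = 2),
  value = as.character(1:8),
  stringsAsFactors = FALSE
)
header_area <- data.frame(row = 2:3, col = 1L, value = c("Central", "East"),
                          stringsAsFactors = FALSE)
header_year <- data.frame(row = 1L, col = c(2L, 4L), value = c("2009", "2010"),
                          stringsAsFactors = FALSE)

# Among candidate header cells, take the closest one from the left (largest col).
nearest_header <- function(candidates) {
  if (nrow(candidates) == 0) return(NA_character_)
  candidates$value[which.max(candidates$col)]
}

# W: header in the same row, to the left of the data cell.
join_west <- function(data, header, name) {
  data[[name]] <- vapply(seq_len(nrow(data)), function(i) {
    nearest_header(header[header$row == data$row[i] & header$col < data$col[i], ])
  }, character(1))
  data
}

# NNW: header above the data cell, in its column or the nearest one to the left.
join_nnw <- function(data, header, name) {
  data[[name]] <- vapply(seq_len(nrow(data)), function(i) {
    nearest_header(header[header$row < data$row[i] & header$col <= data$col[i], ])
  }, character(1))
  data
}

tidied <- join_nnw(join_west(datacells, header_area, "area"), header_year, "year")
tidied
```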
Joining
> dplyr::glimpse(tidied)
Observations: 19,684
Variables: 7
$ row <int> 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18...
$ col <int> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4...
$ value <chr> "84", "65", "136", "48", "118", "591", "334", "10...
$ district <chr> "Auckland", "Auckland", "Auckland", "Bay Of Plent...
$ area <chr> "Auckland Central Area", "Auckland East Area", "A...
$ month <chr> "Dec", "Dec", "Dec", "Dec", "Dec", "Dec", "Dec", ...
$ year <chr> "2009", "2009", "2009", "2009", "2009", "2009", "...
Similar work
• https://github.com/ONS-OpenData/ONSdatabaker inspiration for tidyxl and unpivotr
• https://github.com/rsheets Jenny Bryan and Rich Fitzjohn (jailbreakr: AI)
• https://github.com/jennybc/2016-06_spreadsheets Jenny Bryan’s talk
• https://www.stat.auckland.ac.nz/~joh024/Research/TableToLongForm/ Jimmy Oh (match pattern)
Acknowledgments
Cook’s map of NZ
http://mapsof.net/new-zealand/cook-chart-of-new-zealand
Middle Earth
http://www.r-chart.com/2016/10/map-of-middle-earth-map-above-was.html
https://github.com/jvangeld/ME-GIS
http://worlds.outercraft.com/forum/
http://lotr.wikia.com/wiki/File:Middle_earth_map.jpg
http://kilbeth.deviantart.com/art/Middle-earth-Map-18324387
NZ Police
http://www.police.govt.nz/about-us/publication/road-policing-driver-offence-data-january-2009-march-2016
tidyxl & unpivotr
Duncan Garmonsway
@nacnudus
