SlideShare a Scribd company logo

How Do We Solve The World's Spreadsheet Problem? - Velocity NY 2018

Alex Rasmussen
Alex Rasmussen
Alex RasmussenData Engineering Consultant

In the past five years, I have spent a lot of time trying to get high-integrity data out of spreadsheets and into databases. In this talk, I explore common data integrity problems when dealing with spreadsheet data, investigate whether those integrity problems are inescapable, and share ongoing work to mitigate them.

How Do We Solve The World's Spreadsheet Problem? - Velocity NY 2018

1 of 90
Download to read offline
How Do We Solve 

the World’s
Spreadsheet Problem?
Alex Rasmussen

@alexras
Hi, I’m Alex!
@alexras
freenome.com
bitsondisk.com
alexras.info
My Background
2009-2013: really fast sorting
2013-2016: data wrangling
2017-2018: cancer-fighting robots
I think/worry a lot 

about spreadsheets.
Today’s focus:
spreadsheet data
(for compute, felienne.com)
1. Spreadsheets are great
2. Spreadsheets are a problem
3. How we can fix it
This talk:

Recommended

List to Data (MS Access)
List to Data (MS Access)List to Data (MS Access)
List to Data (MS Access)TonyHine
 
Lecture 07 - CS-5040 - modern database systems
Lecture 07 -  CS-5040 - modern database systemsLecture 07 -  CS-5040 - modern database systems
Lecture 07 - CS-5040 - modern database systemsMichael Mathioudakis
 
Barambona slide deck
Barambona slide deckBarambona slide deck
Barambona slide deckChrisDarryl
 
How to Convert XRD, FTIR File Data to Word Table in Order to Draw Plot on Ori...
How to Convert XRD, FTIR File Data to Word Table in Order to Draw Plot on Ori...How to Convert XRD, FTIR File Data to Word Table in Order to Draw Plot on Ori...
How to Convert XRD, FTIR File Data to Word Table in Order to Draw Plot on Ori...Zohaib HUSSAIN
 
Statistics 141 homework 6 complete solutions correct answers key
Statistics 141 homework 6 complete solutions correct answers keyStatistics 141 homework 6 complete solutions correct answers key
Statistics 141 homework 6 complete solutions correct answers keySong Love
 
6 years of my private G+ Spotfire community
6 years of my private G+ Spotfire community6 years of my private G+ Spotfire community
6 years of my private G+ Spotfire communityChristof Gaenzler
 
ICT: SPREADSHEETS (MICROSOFT EXCEL)
ICT: SPREADSHEETS (MICROSOFT EXCEL)ICT: SPREADSHEETS (MICROSOFT EXCEL)
ICT: SPREADSHEETS (MICROSOFT EXCEL)Aj Mappala
 
Processing data using regular expressions
Processing data using regular expressionsProcessing data using regular expressions
Processing data using regular expressionsGordon Morrison
 

More Related Content

Similar to How Do We Solve The World's Spreadsheet Problem? - Velocity NY 2018

Spreadsheets 2003 version
Spreadsheets 2003 versionSpreadsheets 2003 version
Spreadsheets 2003 versionJohan Koren
 
SQL For Programmers -- Boston Big Data Techcon April 27th
SQL For Programmers -- Boston Big Data Techcon April 27thSQL For Programmers -- Boston Big Data Techcon April 27th
SQL For Programmers -- Boston Big Data Techcon April 27thDave Stokes
 
Module 3 comp 312 - computer fundamentals and programming
Module 3   comp 312 - computer fundamentals and programmingModule 3   comp 312 - computer fundamentals and programming
Module 3 comp 312 - computer fundamentals and programmingdiosdadamendoza
 
ISOT Presentation.pptx
ISOT Presentation.pptxISOT Presentation.pptx
ISOT Presentation.pptxsirtBhopal1
 
Fusing Transformations of Strict Scala Collections with Views
Fusing Transformations of Strict Scala Collections with ViewsFusing Transformations of Strict Scala Collections with Views
Fusing Transformations of Strict Scala Collections with ViewsPhilip Schwarz
 
SQL For PHP Programmers
SQL For PHP ProgrammersSQL For PHP Programmers
SQL For PHP ProgrammersDave Stokes
 
Spreadsheets 2007/2010 version
Spreadsheets 2007/2010 versionSpreadsheets 2007/2010 version
Spreadsheets 2007/2010 versionJohan Koren
 
Phd2023-2024cIntegralUniversitynida.pptx
Phd2023-2024cIntegralUniversitynida.pptxPhd2023-2024cIntegralUniversitynida.pptx
Phd2023-2024cIntegralUniversitynida.pptxjaved75
 
SQL for PHP Programmers -- Dallas PHP Users Group Jan 2015
SQL for PHP Programmers -- Dallas PHP Users Group Jan 2015SQL for PHP Programmers -- Dallas PHP Users Group Jan 2015
SQL for PHP Programmers -- Dallas PHP Users Group Jan 2015Dave Stokes
 
Sql Server Interview Question
Sql Server Interview QuestionSql Server Interview Question
Sql Server Interview Questionpukal rani
 
Big Data, a space adventure - Mario Cartia - Codemotion Milan 2014
Big Data, a space adventure - Mario Cartia -  Codemotion Milan 2014Big Data, a space adventure - Mario Cartia -  Codemotion Milan 2014
Big Data, a space adventure - Mario Cartia - Codemotion Milan 2014Codemotion
 
DMDS Winter Workshop 2 Slides
DMDS Winter Workshop 2 SlidesDMDS Winter Workshop 2 Slides
DMDS Winter Workshop 2 SlidesPaige Morgan
 
Mdst3705 2013-02-12-finding-data
Mdst3705 2013-02-12-finding-dataMdst3705 2013-02-12-finding-data
Mdst3705 2013-02-12-finding-dataRafael Alvarado
 
Normalization Accepted
Normalization AcceptedNormalization Accepted
Normalization Acceptedprasaddurga
 
Data massage: How databases have been scaled from one to one million nodes
Data massage: How databases have been scaled from one to one million nodesData massage: How databases have been scaled from one to one million nodes
Data massage: How databases have been scaled from one to one million nodesUlf Wendel
 

Similar to How Do We Solve The World's Spreadsheet Problem? - Velocity NY 2018 (20)

Spreadsheets 2003 version
Spreadsheets 2003 versionSpreadsheets 2003 version
Spreadsheets 2003 version
 
SQL For Programmers -- Boston Big Data Techcon April 27th
SQL For Programmers -- Boston Big Data Techcon April 27thSQL For Programmers -- Boston Big Data Techcon April 27th
SQL For Programmers -- Boston Big Data Techcon April 27th
 
Spreadsheets
SpreadsheetsSpreadsheets
Spreadsheets
 
Module 3 comp 312 - computer fundamentals and programming
Module 3   comp 312 - computer fundamentals and programmingModule 3   comp 312 - computer fundamentals and programming
Module 3 comp 312 - computer fundamentals and programming
 
ISOT Presentation.pptx
ISOT Presentation.pptxISOT Presentation.pptx
ISOT Presentation.pptx
 
Databasell
DatabasellDatabasell
Databasell
 
Fusing Transformations of Strict Scala Collections with Views
Fusing Transformations of Strict Scala Collections with ViewsFusing Transformations of Strict Scala Collections with Views
Fusing Transformations of Strict Scala Collections with Views
 
SQL For PHP Programmers
SQL For PHP ProgrammersSQL For PHP Programmers
SQL For PHP Programmers
 
Spreadsheets 2007/2010 version
Spreadsheets 2007/2010 versionSpreadsheets 2007/2010 version
Spreadsheets 2007/2010 version
 
Spreadsheets
SpreadsheetsSpreadsheets
Spreadsheets
 
Phd2023-2024cIntegralUniversitynida.pptx
Phd2023-2024cIntegralUniversitynida.pptxPhd2023-2024cIntegralUniversitynida.pptx
Phd2023-2024cIntegralUniversitynida.pptx
 
SQL for PHP Programmers -- Dallas PHP Users Group Jan 2015
SQL for PHP Programmers -- Dallas PHP Users Group Jan 2015SQL for PHP Programmers -- Dallas PHP Users Group Jan 2015
SQL for PHP Programmers -- Dallas PHP Users Group Jan 2015
 
Sql Server Interview Question
Sql Server Interview QuestionSql Server Interview Question
Sql Server Interview Question
 
MS OFFICE &POWERPOINT
MS OFFICE &POWERPOINTMS OFFICE &POWERPOINT
MS OFFICE &POWERPOINT
 
Big Data, a space adventure - Mario Cartia - Codemotion Milan 2014
Big Data, a space adventure - Mario Cartia -  Codemotion Milan 2014Big Data, a space adventure - Mario Cartia -  Codemotion Milan 2014
Big Data, a space adventure - Mario Cartia - Codemotion Milan 2014
 
DMDS Winter Workshop 2 Slides
DMDS Winter Workshop 2 SlidesDMDS Winter Workshop 2 Slides
DMDS Winter Workshop 2 Slides
 
Mdst3705 2013-02-12-finding-data
Mdst3705 2013-02-12-finding-dataMdst3705 2013-02-12-finding-data
Mdst3705 2013-02-12-finding-data
 
Normalization Accepted
Normalization AcceptedNormalization Accepted
Normalization Accepted
 
3 inner plumbing Excel tips
3 inner plumbing Excel tips3 inner plumbing Excel tips
3 inner plumbing Excel tips
 
Data massage: How databases have been scaled from one to one million nodes
Data massage: How databases have been scaled from one to one million nodesData massage: How databases have been scaled from one to one million nodes
Data massage: How databases have been scaled from one to one million nodes
 

More from Alex Rasmussen

Schema Evolution Patterns - Texas Scalability Summit 2019
Schema Evolution Patterns - Texas Scalability Summit 2019Schema Evolution Patterns - Texas Scalability Summit 2019
Schema Evolution Patterns - Texas Scalability Summit 2019Alex Rasmussen
 
Schema Evolution Patterns - Velocity SJ 2019
Schema Evolution Patterns - Velocity SJ 2019Schema Evolution Patterns - Velocity SJ 2019
Schema Evolution Patterns - Velocity SJ 2019Alex Rasmussen
 
EarthBound’s almost-Turing-complete text system!
EarthBound’s almost-Turing-complete text system!EarthBound’s almost-Turing-complete text system!
EarthBound’s almost-Turing-complete text system!Alex Rasmussen
 
Unorthodox Paths to High Performance - QCon NY 2016
Unorthodox Paths to High Performance - QCon NY 2016Unorthodox Paths to High Performance - QCon NY 2016
Unorthodox Paths to High Performance - QCon NY 2016Alex Rasmussen
 
Papers We Love January 2015 - Flat Datacenter Storage
Papers We Love January 2015 - Flat Datacenter StoragePapers We Love January 2015 - Flat Datacenter Storage
Papers We Love January 2015 - Flat Datacenter StorageAlex Rasmussen
 
Themis: An I/O-Efficient MapReduce (SoCC 2012)
Themis: An I/O-Efficient MapReduce (SoCC 2012)Themis: An I/O-Efficient MapReduce (SoCC 2012)
Themis: An I/O-Efficient MapReduce (SoCC 2012)Alex Rasmussen
 
TritonSort: A Balanced Large-Scale Sorting System (NSDI 2011)
TritonSort: A Balanced Large-Scale Sorting System (NSDI 2011)TritonSort: A Balanced Large-Scale Sorting System (NSDI 2011)
TritonSort: A Balanced Large-Scale Sorting System (NSDI 2011)Alex Rasmussen
 

More from Alex Rasmussen (7)

Schema Evolution Patterns - Texas Scalability Summit 2019
Schema Evolution Patterns - Texas Scalability Summit 2019Schema Evolution Patterns - Texas Scalability Summit 2019
Schema Evolution Patterns - Texas Scalability Summit 2019
 
Schema Evolution Patterns - Velocity SJ 2019
Schema Evolution Patterns - Velocity SJ 2019Schema Evolution Patterns - Velocity SJ 2019
Schema Evolution Patterns - Velocity SJ 2019
 
EarthBound’s almost-Turing-complete text system!
EarthBound’s almost-Turing-complete text system!EarthBound’s almost-Turing-complete text system!
EarthBound’s almost-Turing-complete text system!
 
Unorthodox Paths to High Performance - QCon NY 2016
Unorthodox Paths to High Performance - QCon NY 2016Unorthodox Paths to High Performance - QCon NY 2016
Unorthodox Paths to High Performance - QCon NY 2016
 
Papers We Love January 2015 - Flat Datacenter Storage
Papers We Love January 2015 - Flat Datacenter StoragePapers We Love January 2015 - Flat Datacenter Storage
Papers We Love January 2015 - Flat Datacenter Storage
 
Themis: An I/O-Efficient MapReduce (SoCC 2012)
Themis: An I/O-Efficient MapReduce (SoCC 2012)Themis: An I/O-Efficient MapReduce (SoCC 2012)
Themis: An I/O-Efficient MapReduce (SoCC 2012)
 
TritonSort: A Balanced Large-Scale Sorting System (NSDI 2011)
TritonSort: A Balanced Large-Scale Sorting System (NSDI 2011)TritonSort: A Balanced Large-Scale Sorting System (NSDI 2011)
TritonSort: A Balanced Large-Scale Sorting System (NSDI 2011)
 

Recently uploaded

AWS Identity and access management for users
AWS Identity and access management for usersAWS Identity and access management for users
AWS Identity and access management for usersStephenEfange3
 
Lies and Myths in InfoSec - 2023 Usenix Enigma
Lies and Myths in InfoSec - 2023 Usenix EnigmaLies and Myths in InfoSec - 2023 Usenix Enigma
Lies and Myths in InfoSec - 2023 Usenix EnigmaAdrian Sanabria
 
Generative AI Rennes Meetup with OVHcloud - WAICF highlights & how to deploy ...
Generative AI Rennes Meetup with OVHcloud - WAICF highlights & how to deploy ...Generative AI Rennes Meetup with OVHcloud - WAICF highlights & how to deploy ...
Generative AI Rennes Meetup with OVHcloud - WAICF highlights & how to deploy ...Thibaud Le Douarin
 
Big Data - large Scale data (Amazon, FB)
Big Data - large Scale data (Amazon, FB)Big Data - large Scale data (Amazon, FB)
Big Data - large Scale data (Amazon, FB)CUO VEERANAN VEERANAN
 
Soil Health Policy Map Years 2020 to 2023
Soil Health Policy Map Years 2020 to 2023Soil Health Policy Map Years 2020 to 2023
Soil Health Policy Map Years 2020 to 2023stephizcoolio
 
SABARI PRIYAN's self introduction as a reference
SABARI PRIYAN's self introduction as a referenceSABARI PRIYAN's self introduction as a reference
SABARI PRIYAN's self introduction as a referencepriyansabari355
 
data analytics and tools from in2inglobal.pdf
data analytics  and tools from in2inglobal.pdfdata analytics  and tools from in2inglobal.pdf
data analytics and tools from in2inglobal.pdfdigimartfamily
 
Web 3.0 in Data Privacy and Security | Data Privacy |Blockchain Security| Cyb...
Web 3.0 in Data Privacy and Security | Data Privacy |Blockchain Security| Cyb...Web 3.0 in Data Privacy and Security | Data Privacy |Blockchain Security| Cyb...
Web 3.0 in Data Privacy and Security | Data Privacy |Blockchain Security| Cyb...Cyber Security Experts
 
Industry 4.0 in IoT Transforming the Future.pptx
Industry 4.0 in IoT Transforming the Future.pptxIndustry 4.0 in IoT Transforming the Future.pptx
Industry 4.0 in IoT Transforming the Future.pptxMdRafiqulIslam403212
 
Oppotus - Malaysians on Malaysia 4Q 2023.pdf
Oppotus - Malaysians on Malaysia 4Q 2023.pdfOppotus - Malaysians on Malaysia 4Q 2023.pdf
Oppotus - Malaysians on Malaysia 4Q 2023.pdfOppotus
 
Recurrent neural network for machine learning
Recurrent neural network for machine learningRecurrent neural network for machine learning
Recurrent neural network for machine learningomogire08
 
[IRTalks@The University of Glasgow] A Topology-aware Analysis of Graph Collab...
[IRTalks@The University of Glasgow] A Topology-aware Analysis of Graph Collab...[IRTalks@The University of Glasgow] A Topology-aware Analysis of Graph Collab...
[IRTalks@The University of Glasgow] A Topology-aware Analysis of Graph Collab...Daniele Malitesta
 
Artificial Intelligence and its Impact on Society.pptx
Artificial Intelligence and its Impact on Society.pptxArtificial Intelligence and its Impact on Society.pptx
Artificial Intelligence and its Impact on Society.pptxVighnesh Shashtri
 
SABARI PRIYAN's self introduction as reference
SABARI PRIYAN's self introduction as referenceSABARI PRIYAN's self introduction as reference
SABARI PRIYAN's self introduction as referencepriyansabari355
 
PredictuVu ProposalV1.pptx
PredictuVu ProposalV1.pptxPredictuVu ProposalV1.pptx
PredictuVu ProposalV1.pptxKapilSinghal47
 
IIBA Adl - Being Effective on Day 1 - Slide Deck.pdf
IIBA Adl - Being Effective on Day 1 - Slide Deck.pdfIIBA Adl - Being Effective on Day 1 - Slide Deck.pdf
IIBA Adl - Being Effective on Day 1 - Slide Deck.pdfAustraliaChapterIIBA
 

Recently uploaded (17)

AWS Identity and access management for users
AWS Identity and access management for usersAWS Identity and access management for users
AWS Identity and access management for users
 
Lies and Myths in InfoSec - 2023 Usenix Enigma
Lies and Myths in InfoSec - 2023 Usenix EnigmaLies and Myths in InfoSec - 2023 Usenix Enigma
Lies and Myths in InfoSec - 2023 Usenix Enigma
 
Generative AI Rennes Meetup with OVHcloud - WAICF highlights & how to deploy ...
Generative AI Rennes Meetup with OVHcloud - WAICF highlights & how to deploy ...Generative AI Rennes Meetup with OVHcloud - WAICF highlights & how to deploy ...
Generative AI Rennes Meetup with OVHcloud - WAICF highlights & how to deploy ...
 
Big Data - large Scale data (Amazon, FB)
Big Data - large Scale data (Amazon, FB)Big Data - large Scale data (Amazon, FB)
Big Data - large Scale data (Amazon, FB)
 
Soil Health Policy Map Years 2020 to 2023
Soil Health Policy Map Years 2020 to 2023Soil Health Policy Map Years 2020 to 2023
Soil Health Policy Map Years 2020 to 2023
 
SABARI PRIYAN's self introduction as a reference
SABARI PRIYAN's self introduction as a referenceSABARI PRIYAN's self introduction as a reference
SABARI PRIYAN's self introduction as a reference
 
data analytics and tools from in2inglobal.pdf
data analytics  and tools from in2inglobal.pdfdata analytics  and tools from in2inglobal.pdf
data analytics and tools from in2inglobal.pdf
 
Web 3.0 in Data Privacy and Security | Data Privacy |Blockchain Security| Cyb...
Web 3.0 in Data Privacy and Security | Data Privacy |Blockchain Security| Cyb...Web 3.0 in Data Privacy and Security | Data Privacy |Blockchain Security| Cyb...
Web 3.0 in Data Privacy and Security | Data Privacy |Blockchain Security| Cyb...
 
Industry 4.0 in IoT Transforming the Future.pptx
Industry 4.0 in IoT Transforming the Future.pptxIndustry 4.0 in IoT Transforming the Future.pptx
Industry 4.0 in IoT Transforming the Future.pptx
 
Oppotus - Malaysians on Malaysia 4Q 2023.pdf
Oppotus - Malaysians on Malaysia 4Q 2023.pdfOppotus - Malaysians on Malaysia 4Q 2023.pdf
Oppotus - Malaysians on Malaysia 4Q 2023.pdf
 
Recurrent neural network for machine learning
Recurrent neural network for machine learningRecurrent neural network for machine learning
Recurrent neural network for machine learning
 
[IRTalks@The University of Glasgow] A Topology-aware Analysis of Graph Collab...
[IRTalks@The University of Glasgow] A Topology-aware Analysis of Graph Collab...[IRTalks@The University of Glasgow] A Topology-aware Analysis of Graph Collab...
[IRTalks@The University of Glasgow] A Topology-aware Analysis of Graph Collab...
 
Artificial Intelligence and its Impact on Society.pptx
Artificial Intelligence and its Impact on Society.pptxArtificial Intelligence and its Impact on Society.pptx
Artificial Intelligence and its Impact on Society.pptx
 
SABARI PRIYAN's self introduction as reference
SABARI PRIYAN's self introduction as referenceSABARI PRIYAN's self introduction as reference
SABARI PRIYAN's self introduction as reference
 
2.pptx
2.pptx2.pptx
2.pptx
 
PredictuVu ProposalV1.pptx
PredictuVu ProposalV1.pptxPredictuVu ProposalV1.pptx
PredictuVu ProposalV1.pptx
 
IIBA Adl - Being Effective on Day 1 - Slide Deck.pdf
IIBA Adl - Being Effective on Day 1 - Slide Deck.pdfIIBA Adl - Being Effective on Day 1 - Slide Deck.pdf
IIBA Adl - Being Effective on Day 1 - Slide Deck.pdf
 

How Do We Solve The World's Spreadsheet Problem? - Velocity NY 2018