GEOSPATIAL CSV IMPORTS: HIDDEN COMPLEXITY
Rafa de la Torre
CartoDB
Agenda
1) CSV Format Issues
2) Import Issues
CSV FORMAT ISSUES
Intro
.csv / MIME:text/csv
Unknown birthdate (80s?)
RFC 4180 (2005)
Intro
Plain text
Simple format
Simple rules
Usage
CSV
0101000020E610000000000000008049C000000000000038C0,1083
"alien",2014-11-04 15:24:40.43413+00
Category 1,
"jump
jump up!", {""value"":""es""}
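The sample rows above exercise most of the quoting rules. As a rough sketch, they can be read back with Python's standard `csv` module. One assumption to flag: the last field of the fourth row is wrapped in double quotes here (the slide does not show that), so that its doubled quotes are read as escaped quote characters.

```python
import csv
import io

# Sample rows from the slide, with CRLF record endings as in RFC 4180.
# Assumption: the JSON-like field is quoted so that "" means a literal quote.
SAMPLE = (
    '0101000020E610000000000000008049C000000000000038C0,1083\r\n'
    '"alien",2014-11-04 15:24:40.43413+00\r\n'
    'Category 1,\r\n'
    '"jump\njump up!","{""value"":""es""}"\r\n'
)


def read_rows(text):
    """Parse CSV text into a list of rows (each row is a list of strings)."""
    # newline='' keeps line breaks inside quoted fields untouched.
    return list(csv.reader(io.StringIO(text, newline='')))


rows = read_rows(SAMPLE)
for row in rows:
    print(row)
```

Every value comes back as a string, and the empty second field of `Category 1,` is an empty string, so the reader alone cannot tell an empty string from a NULL.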
WKT: Well-Known Text
POINT (30 10)
LINESTRING (30 10, 10 30, 40 40)
POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))
MULTIPOINT ((10 40), (40 30), (20 20), (30 10))
MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)),
((15 5, 40 10, 10 20, 5 10, 15 5)))
https://en.wikipedia.org/wiki/Well-known_text
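To give an idea of what reading WKT from a CSV column involves, here is a minimal sketch that handles only the two simplest shapes above, POINT and LINESTRING. The function name `parse_simple_wkt` is made up for this example; a real importer would rely on a full geometry library instead.

```python
import re

# Matches "POINT (...)" or "LINESTRING (...)" with no nested parentheses.
SIMPLE_WKT = re.compile(r"\s*(POINT|LINESTRING)\s*\(([^()]*)\)\s*", re.IGNORECASE)


def parse_simple_wkt(wkt):
    """Return (geometry_type, [(x, y), ...]) for a POINT or LINESTRING."""
    match = SIMPLE_WKT.fullmatch(wkt)
    if match is None:
        raise ValueError("unsupported or malformed WKT: %r" % wkt)
    kind = match.group(1).upper()
    # Coordinate pairs are comma-separated; x and y are space-separated.
    coords = [tuple(float(n) for n in pair.split())
              for pair in match.group(2).split(",")]
    return kind, coords


print(parse_simple_wkt("POINT (30 10)"))
print(parse_simple_wkt("LINESTRING (30 10, 10 30, 40 40)"))
```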
WKB: Well-Known Binary
POINT(2.0 4.0) =
000000000140000000000000004010000000000000
https://en.wikipedia.org/wiki/Well-known_text#Well-known_binary
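The hex string above breaks down as one byte-order byte, a 4-byte geometry type, and two 8-byte doubles. A small decoder sketch for points, assuming hex-encoded input and also accepting the PostGIS-style variant that carries an SRID (the name `decode_point` is illustrative, not part of any library):

```python
import struct

WKB_POINT = 1
EWKB_SRID_FLAG = 0x20000000  # PostGIS extension: an SRID follows the type


def decode_point(hex_string):
    """Return (x, y, srid) for a hex-encoded WKB/EWKB point; srid may be None."""
    raw = bytes.fromhex(hex_string)
    # First byte is the byte order: 0 = big endian, 1 = little endian.
    endian = ">" if raw[0] == 0 else "<"
    (type_code,) = struct.unpack_from(endian + "I", raw, 1)
    offset = 5
    srid = None
    if type_code & EWKB_SRID_FLAG:
        (srid,) = struct.unpack_from(endian + "I", raw, offset)
        offset += 4
        type_code &= ~EWKB_SRID_FLAG
    if type_code != WKB_POINT:
        raise ValueError("only points are handled in this sketch")
    x, y = struct.unpack_from(endian + "dd", raw, offset)
    return x, y, srid
```

Called on the string above, this should give x = 2.0 and y = 4.0 with no SRID. The first field of the CSV sample earlier in the deck looks like the little-endian variant with an SRID attached.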
GeoJSON
{
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [125.6, 10.1]
},
"properties": {
"name": "Dinagat Islands"
}
}
http://geojson.org/
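A point worth remembering when turning CSV columns into GeoJSON is that coordinates are ordered longitude first, then latitude. A minimal sketch that builds the feature above (`point_feature` is a hypothetical helper name):

```python
import json


def point_feature(lon, lat, properties=None):
    """Build a GeoJSON Feature for a point; coordinates are [lon, lat]."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": dict(properties or {}),
    }


feature = point_feature(125.6, 10.1, {"name": "Dinagat Islands"})
print(json.dumps(feature, indent=2))
```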
IMPORT ISSUES
Typical
Huge files (>1GB)
Lots of rows (2M+)
Lots of columns (~1600)
XLS/XLSX -> CSV
Typical
Stream HTTP downloaded file
Stream file between servers
Stream data import to DB
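The idea behind all three streaming points is to move data in bounded chunks instead of loading a multi-gigabyte file into memory. A minimal sketch using the Python standard library; `stream_to_file` and `demo_roundtrip` are names invented for this example, and the in-memory source stands in for an HTTP response or a socket.

```python
import io
import os
import shutil
import tempfile


def stream_to_file(source, destination_path, chunk_size=64 * 1024):
    """Copy a readable binary stream to disk in fixed-size chunks."""
    with open(destination_path, "wb") as destination:
        shutil.copyfileobj(source, destination, chunk_size)


def demo_roundtrip(payload):
    """Stream bytes through a temporary file and read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "import.csv")
        stream_to_file(io.BytesIO(payload), path)
        with open(path, "rb") as handle:
            return handle.read()
```

Any object with a binary `read()` method can serve as the source, for example the response returned by `urllib.request.urlopen`.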
Typical
CartoDB-specific
Content guessing (e.g. lat/lon)
Type guessing
Fixing geometry errors
Sync tables -> No downtime allowed
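As an illustration of content and type guessing, here is a deliberately naive sketch: it classifies a column from its string values and looks for latitude/longitude columns by header name. The rules and the candidate column names are assumptions made for this example, not CartoDB's actual heuristics.

```python
import re

INTEGER = re.compile(r"[+-]?\d+")
NUMBER = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")

# Hypothetical header names treated as coordinates.
LAT_NAMES = {"lat", "latitude"}
LON_NAMES = {"lon", "lng", "long", "longitude"}


def guess_column_type(values):
    """Guess 'integer', 'float' or 'text' from a column's string values."""
    present = [v.strip() for v in values if v.strip()]  # ignore empty cells
    if not present:
        return "text"
    if all(INTEGER.fullmatch(v) for v in present):
        return "integer"
    if all(NUMBER.fullmatch(v) for v in present):
        return "float"
    return "text"


def guess_lat_lon_columns(header):
    """Return (lat_index, lon_index) if both are found by name, else None."""
    lat = lon = None
    for index, name in enumerate(header):
        key = name.strip().lower()
        if key in LAT_NAMES and lat is None:
            lat = index
        elif key in LON_NAMES and lon is None:
            lon = index
    if lat is None or lon is None:
        return None
    return lat, lon
```

Guessing from a sample of rows is cheaper than scanning the whole file, at the cost of occasionally being wrong when a later row breaks the pattern.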
DB-Specific
Create DB indexes as the last step
Prefer one big INSERT to multiple UPDATEs
GDAL’s ogr2ogr > Ruby/Python scripts
http://www.gdal.org/ogr2ogr.html
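The two database points above can be sketched as follows. SQLite from the Python standard library is used here only so the example is self-contained; it stands in for the PostgreSQL database behind CartoDB, and the table and index names are invented.

```python
import sqlite3


def bulk_load(rows):
    """Insert all rows in one transaction, then build the index last."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE imported (name TEXT, lon REAL, lat REAL)")
    # One batched insert inside a single transaction, instead of many
    # small statements that each touch the table separately.
    with connection:
        connection.executemany(
            "INSERT INTO imported (name, lon, lat) VALUES (?, ?, ?)", rows
        )
    # The index is created once, after the data is in place, so it is not
    # updated row by row during the load.
    connection.execute("CREATE INDEX imported_name_idx ON imported (name)")
    return connection


connection = bulk_load([("alien", -51.0, -24.0), ("Dinagat Islands", 125.6, 10.1)])
print(connection.execute("SELECT COUNT(*) FROM imported").fetchone()[0])
```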
Questions?
Thanks!
rtorre@cartodb.com


Editor's Notes

  1. “The hidden complexity of importing geospatial CSVs”
  2. The easiest way to create maps and analyze geospatial information. Editor, platform with APIs. 60k+ users, ~1k paying users, 3+ year old product - Migrant files - Stabilized apartments (John Krauss) - Madrid fines (Feb '15, €17.5M) - Illustreets
  3. - table - columns: commas, rows: line breaks - Fortran '67, Fortran 77 ('78) - Interchange between databases
  4. - MS-DOS-style lines that end with (CR/LF) characters (optional for the last line) - An optional header record (there is no sure way to detect whether it is present). - Each record "should" contain the same number of comma-separated fields. - Any field may be quoted (with double quotes). - Fields containing a line-break, double-quote, and/or commas should be quoted. (If they are not, the file will likely be impossible to process correctly).
  5. Import example (everything except the drag and drop)
  6. 1. WKB, int 2. String, date (ISO) 3. String, empty string? NULL? 4. String with line breaks, CSV
  7. WKT
  8. IO.copy_stream(src, file)