CARTO

May 22, 2019

In this technical webinar, Giulia Carella and Steve Isaac share how thinking spatially can help you to build powerful models that outperform the typical data science tools. Watch it now at: https://go.carto.com/dont-ignore-location-in-models-webinar-recorded

- 1. FOLLOW @CARTO ON TWITTER
- 2. The Sum of Our Parts. Giulia Carella, Data Scientist. Steve Isaac, Content Marketing Manager.
- 3. CARTO is the platform to build powerful Location Intelligence apps with the best data streams available.
- 4. CARTO: Pioneers in Location Intelligence. Customers: 1,200. End-users: 300K. Team members: 100+.
- 5. The Complete Journey 1. Data 2. Enrichment 3. Analysis 4. Solutions 5. Integration
- 7. Enrichment. Augment any data with demographic data from around the globe with ease (Data Observatory). Develop robust ETL processes and update mechanisms so your data is always enriched (Mastercard, Human Mobility, POI).
- 8. The Journey - Analysis. Bring CARTO maps and data into your data science workflows and the Python data science ecosystem to work with Pandas, PySAL, PyMC3, scikit-learn, etc. (CARTOframes). Use the power of PostGIS and our APIs to productionize analysis workflows in your CARTO platform (PostGIS by CARTO, SQL API, Python SDK).
- 9. John Snow’s map of cholera cases in London, 1854. Red circles indicate locations of cholera cases and blue circles indicate locations of water pumps.
- 10. “Everything is related to everything else, but near things are more related than distant things.” (Tobler, 1970)
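One common way to put a number on Tobler's statement is Moran's I, a measure of spatial autocorrelation. The deck does not mention it, so the following is only a minimal sketch under stated assumptions: the helper names (`morans_i`, `chain_weights`), the toy values, and the choice of binary adjacency weights on a line of sites are all illustrative.

```python
def morans_i(values, weights):
    """Moran's I for values[i] given spatial weights[i][j] (zero on the diagonal)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Cross-products of deviations, counted only for pairs with non-zero weight.
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)


def chain_weights(n):
    """Hypothetical layout: n sites on a line, each neighbouring the adjacent sites."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


w = chain_weights(8)
smooth = [1, 2, 3, 4, 5, 6, 7, 8]        # near values are similar
alternating = [1, 8, 1, 8, 1, 8, 1, 8]   # near values are dissimilar

print(morans_i(smooth, w))       # positive: neighbours resemble each other
print(morans_i(alternating, w))  # negative: neighbours differ
```

Values well above zero suggest that nearby observations are related, which is the situation in which ignoring location in a model becomes risky.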
- 11. Modelling dependence on covariates and the spatial correlation structure: ● Estimation of underlying model parameters ● Prediction at unsampled locations ● Change of support (downscaling/upscaling)
- 12. Failure to include spatial dependence in your model can lead to biased statistical results and erroneous conclusions. How can CARTO help me with my spatial models? ➢ Types of spatial data ➢ Spatial modelling ➢ Demos
- 14. Geostatistical data. We are thinking of a continuous spatial field. ● GPS tracking ● Fixed measuring devices ● High resolution satellites
- 15. Region-based data. We are observing a discrete spatial field, but what are we thinking of? ● Census data ● Region-based counts ● Coarse resolution satellites
- 17. Point patterns. We are thinking of occurrences of events. ● Locations of occurrences of some event ● Locations of trees ● UFO sightings
- 19. ● We need a complex function based on the coordinates to adequately describe the effect of the location ● Regression models using the location's coordinates as predictors do not work well! ● It is more natural to explicitly model the variations of the process, considering that it may be similar at nearby locations
- 20. What we are trying to model (or the response variable). This is modelling!
- 21. What we are trying to model (or the response variable) = the mean structure (e.g. some function of some covariates) + the residual (or what is not explained by the mean structure)
- 22. But the number of ways we could construct a model for the spatial process is unlimited! Spatially continuous models ➢ Gaussian processes (GP). Spatially discrete models ➢ Gaussian Markov Random Fields (GMRF)
- 23. ➢ A GP is parameterized by a mean function $\mu(s)$ and a covariance function $C(s, s')$ ➢ as the distance $\lVert s - s' \rVert$ ↑ then $C(s, s')$ ↓ ➢ $C$ depends on some parameters, e.g. the exponential covariance: $C(s, s') = \sigma^2 \exp(-\lVert s - s' \rVert / \rho)$. The joint distribution of a finite number of outputs is a Gaussian!
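As a rough illustration of the GP slide, the sketch below builds an exponential covariance matrix for a handful of locations and draws one sample from the resulting multivariate Gaussian. The parameter names (`sigma2` for the variance, `rho` for the range), the toy locations, and the hand-written Cholesky routine are assumptions made for this example; the slide's own notation did not survive extraction.

```python
import math
import random


def exp_cov(d, sigma2=1.0, rho=2.0):
    """Exponential covariance: decays as the distance d grows."""
    return sigma2 * math.exp(-d / rho)


def cov_matrix(points, sigma2=1.0, rho=2.0):
    """Covariance between every pair of locations."""
    return [[exp_cov(math.dist(p, q), sigma2, rho) for q in points] for p in points]


def cholesky(a):
    """Lower-triangular L with L @ L.T == a (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_gp(points, mean=0.0, sigma2=1.0, rho=2.0, seed=0):
    """One draw of the field at the given points: mean + L z with z ~ N(0, I)."""
    rng = random.Random(seed)
    low = cholesky(cov_matrix(points, sigma2, rho))
    z = [rng.gauss(0.0, 1.0) for _ in points]
    return [mean + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(points))]


points = [(float(x), 0.0) for x in range(6)]  # six sites on a line, one unit apart
cov = cov_matrix(points)
draw = sample_gp(points)
print([round(c, 3) for c in cov[0]])  # covariance with site 0 shrinks with distance
print([round(v, 3) for v in draw])
```

The Cholesky step is where the cubic cost of a dense GP shows up, which is why this direct approach is only practical for small numbers of points.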
- 28. ➢ How can we properly account for the uncertainty in the spatial dependence structure? THINK BAYESIAN! DATA LEVEL (LIKELIHOOD), PROCESS LEVEL, PRIOR LEVEL
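The three levels named on this slide are usually written as a hierarchical model. The notation below is an assumption for illustration ($y$ for the observations, $w$ for the latent spatial process, $\theta$ for the parameters of the dependence structure), not the slide's own:

```latex
\begin{align*}
\text{Data level (likelihood):} \quad & y \mid w, \theta \sim p(y \mid w, \theta) \\
\text{Process level:} \quad & w \mid \theta \sim \mathcal{N}\bigl(0, \Sigma(\theta)\bigr) \\
\text{Prior level:} \quad & \theta \sim p(\theta) \\[4pt]
\text{Posterior:} \quad & p(w, \theta \mid y) \propto p(y \mid w, \theta)\, p(w \mid \theta)\, p(\theta)
\end{align*}
```

Because $\theta$ gets its own prior instead of being fixed at a point estimate, the uncertainty about the spatial dependence structure carries through to the posterior for $w$ and to any prediction derived from it.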
- 29. ➢ THE BIG PROBLEM: computations scale as $O(N^3)$; for more than a few thousand points this is intractable! Construct a DISCRETE APPROXIMATION of the continuous field. Figure from Cameletti et al. (AStA, 2013)
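To see where the cubic cost comes from, consider the standard conditional-Gaussian predictor at an unsampled location $s_0$. The symbols here are assumed for illustration: $\Sigma$ is the $N \times N$ covariance matrix of the observed locations, $c_0$ the vector of covariances between $s_0$ and those locations, and $\mu$ the mean.

```latex
\begin{align*}
\hat{y}(s_0) &= \mu(s_0) + c_0^{\top} \Sigma^{-1} (y - \mu), \\
\Sigma_{ij} &= C(s_i, s_j), \qquad (c_0)_i = C(s_0, s_i), \qquad i, j = 1, \dots, N .
\end{align*}
```

Solving the linear system in $\Sigma$ (for example through a Cholesky factorization) takes on the order of $N^3$ operations when $\Sigma$ is dense, which is the bottleneck a discrete approximation is meant to avoid.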
- 30. ➢ Based on neighbourhood structures: $i$-th area, first-order neighbours, second-order neighbours ➢ Markov means conditional independence
- 31. Under the Markovian property, the elements in the precision matrix (the inverse of the covariance) are non-zero only for neighbours ➢ Fast computations due to a sparse precision matrix! ➢ Difficult to construct reasonable dependence structures. 0.1% of non-zero elements!
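The neighbourhood and sparsity ideas on the last two slides can be sketched on a regular lattice. The example assumes a rook (shared-edge) neighbourhood and a CAR-style precision $Q = \tau (D - \alpha W)$, where $W$ is the adjacency matrix and $D$ holds the neighbour counts. These choices, the function names, and the 30 x 30 grid are illustrative and are not taken from the deck.

```python
def grid_neighbours(rows, cols):
    """First-order (shared-edge) neighbours of each cell in a rows x cols lattice."""
    nb = {}
    for r in range(rows):
        for c in range(cols):
            candidates = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            nb[r * cols + c] = [
                rr * cols + cc
                for rr, cc in candidates
                if 0 <= rr < rows and 0 <= cc < cols
            ]
    return nb


def second_order(nb, i):
    """Neighbours of neighbours of cell i, excluding i and its first-order neighbours."""
    first = set(nb[i])
    reached = {k for j in first for k in nb[j]}
    return sorted(reached - first - {i})


def car_precision(nb, alpha=0.9, tau=1.0):
    """Sparse precision Q = tau * (D - alpha * W), stored as {(i, j): value}."""
    q = {}
    for i, js in nb.items():
        q[(i, i)] = tau * len(js)   # diagonal: number of neighbours
        for j in js:
            q[(i, j)] = -tau * alpha  # off-diagonal: non-zero only for neighbours
    return q


nb = grid_neighbours(30, 30)
q = car_precision(nb)
n = len(nb)
fraction_nonzero = len(q) / (n * n)
print(f"{n} cells, {len(q)} non-zero entries, {fraction_nonzero:.2%} of the matrix")
print(second_order(grid_neighbours(5, 5), 12))  # second-order neighbours of the centre cell
```

Only the entries for neighbouring pairs are stored, so the share of non-zero elements falls as the lattice grows, which is what makes GMRF computations fast.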
- 33. ➢ Compare revenues from each travel agency to market performance ➢ We can use data from credit cards from purchases in the travel sector
- 34. Travel agencies. Credit card data.
- 35. ➢ Compare revenues from each travel agency to market performance ➢ We can use data from credit cards from purchases in the travel sector ➢ BUT… credit card data get anonymized in many locations
- 36. 1 month of data. 5 months of data. 12 months of data.
- 37. ➢ Compare revenues from each travel agency to market performance ➢ We can use data from credit cards from purchases in the travel sector ➢ BUT… credit card data get anonymized in many locations. CAN WE PREDICT AT LOCATIONS WHERE THERE ARE NO DATA? R package: mgcv, Wood (2011, Journal of the Royal Statistical Society: Series B)
- 39. [Figure panels: ORIGINAL and PREDICTED, without GMRF smooth and with GMRF smooth]
- 43. ➢ Upload your data to CARTO and visualize it using CARTOframes. First we need to define the aggregation or zoom level. At CARTO we use QuadKeys.
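For readers unfamiliar with QuadKeys, the sketch below converts a latitude/longitude pair into a quadkey at a given zoom level, following the widely documented Web Mercator tile scheme, and then counts points per cell. It is not CARTO's implementation; the function names and the sample coordinates are made up for illustration.

```python
import math
from collections import Counter


def tile_to_quadkey(tile_x, tile_y, zoom):
    """Interleave the bits of the tile coordinates into a base-4 string."""
    digits = []
    for i in range(zoom, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if tile_x & mask:
            digit += 1
        if tile_y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)


def latlon_to_quadkey(lat, lon, zoom):
    """Quadkey of the Web Mercator tile containing (lat, lon) at the given zoom."""
    lat = max(min(lat, 85.05112878), -85.05112878)  # Web Mercator latitude limits
    n = 2 ** zoom
    x = (lon + 180.0) / 360.0
    sin_lat = math.sin(math.radians(lat))
    y = 0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)
    tile_x = min(n - 1, max(0, int(x * n)))
    tile_y = min(n - 1, max(0, int(y * n)))
    return tile_to_quadkey(tile_x, tile_y, zoom)


# Hypothetical transaction locations (lat, lon); aggregate them to zoom-12 cells.
transactions = [(40.4168, -3.7038), (40.4170, -3.7035), (41.3874, 2.1686)]
counts = Counter(latlon_to_quadkey(lat, lon, 12) for lat, lon in transactions)
for quadkey, count in counts.items():
    print(quadkey, count)
```

A useful property for choosing the aggregation level is that a cell's quadkey is a prefix of the quadkeys of all the cells it contains at higher zoom levels, so coarsening the grid amounts to truncating the string.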
- 44. ➢ Upload your data to your CARTO account and plot it using CARTOframes. [Maps: NUMBER OF TRANSACTIONS; WHERE WE WANT TO PREDICT]
- 45. ➢ Before modelling, enrich your data with CARTO DATA OBSERVATORY (DO)
- 48. ➢ Before modelling, visualize with CARTOframes. [Maps: NUMBER OF TRANSACTIONS; DATA WE WANT TO USE AS COVARIATES, e.g. POPULATION]
- 49. ➢ Before modelling, visualize with CARTOframes. [Maps: NUMBER OF TRANSACTIONS; DATA WE WANT TO USE AS COVARIATES, e.g. NUMBER OF FOOD POIs]
- 50. PRIORS, HYPER PRIORS, PROCESS, DATA. NUMBER OF TRANSACTIONS. R package: R-INLA, Lindgren and Rue (2015, JSS)
- 51. [Maps: NUMBER OF TRANSACTIONS; PREDICTED NUMBER OF TRANSACTIONS (MEAN)]
- 52. [Maps: NUMBER OF TRANSACTIONS; PREDICTED NUMBER OF TRANSACTIONS (STANDARD DEVIATION)]
- 53. Population, # POI food, # POI entertainment, Income, # POI transport, # employees
- 54. [Maps: RANDOM SPATIAL EFFECT (MEAN); RANDOM SPATIAL EFFECT (STANDARD DEVIATION); SPATIAL DOMAIN]
- 55. [Map: PREDICTED NUMBER OF TRANSACTIONS (MEAN)]
- 56. ➢ Think carefully about the problem you are trying to solve and get the right data at the right spatial resolution (CARTO Data Observatory)
- 57. ➢ Think carefully about the problem you are trying to solve and get the right data at the right spatial resolution (CARTO Data Observatory) ➢ Choose a scalable model and a flexible implementation (CARTO Analysis Framework and API). A Framework for provisioning, orchestrating, executing and monitoring analyses (processes). An API to define, register, schedule and execute user-defined analyses written in virtually any language.
- 58. ➢ Think carefully about the problem you are trying to solve and get the right data at the right spatial resolution (CARTO Data Observatory) ➢ Choose a scalable model and a flexible implementation (CARTO Analysis Framework and API) ➢ The estimates we construct come from a complicated interaction of the model and the computational method: visualization (and other metrics) are essential (CARTOframes)
- 59. Request a demo at CARTO.COM. Data Scientist // giulia@carto.com. Content Marketing Manager // sisaac@carto.com