2. Who are we? What do we do?
What data do we work with?
Research partnerships results
Visualizations
Projects
3. The core bank of today's BBVA Group was founded in Bilbao in 1857
4. BBVA Data & Analytics was established in 2014 as a new Data Science Center.
Its origin was a research group at the BBVA Innovation Center (2011).
Our mission is to extract the value enclosed in BBVA's data:
- development of data engines
- data-based products and services
- data-based ad hoc consultancy projects
6. ► Card payment data generate a digital footprint that can be read to describe socio-economic activity
7. ► Commerce and territory: a measure of prosperity
Sources: World Bank, INE, INEGI
- Consumption spending makes up a major fraction of GDP
- Commerce, hotel and catering services have a great influence on employment
- Tourism's influence on GDP is also a key factor
Country | Consumption spending (% of GDP) | Commerce, hotel and catering (% of employment) | Tourism (% of GDP)
Spain | 58% | 29% | 10.9%
Mexico | 66% | 27% | 8.7%
8. Commercial activity registered by BBVA electronic payment systems in Spain [2014]
- 524 million transactions (BBVA + non-BBVA cards)
- 24 billion € (BBVA + non-BBVA cards)
- 48 million different cardholders [(Spanish: BBVA + non-BBVA) + (foreigners: non-BBVA)]
- More than 1 million commercial premises (BBVA + non-BBVA PoS)
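The headline figures above also yield the "average ticket" used later as a descriptive metric. A minimal sketch of the arithmetic, taking the slide's totals at face value; the per-cardholder ratios are crude, since the 48 million cardholders mix BBVA and non-BBVA, Spanish and foreign cards:

```python
# Headline figures from slide 8 (Spain, 2014), BBVA + non-BBVA cards.
TRANSACTIONS = 524e6   # card transactions
SPENDING_EUR = 24e9    # total spending in euros
CARDHOLDERS = 48e6     # distinct cardholders (Spanish + foreign)


def ratio(numerator: float, denominator: float) -> float:
    """Simple ratio of two aggregate figures."""
    return numerator / denominator


average_ticket = ratio(SPENDING_EUR, TRANSACTIONS)        # euros per transaction
spend_per_cardholder = ratio(SPENDING_EUR, CARDHOLDERS)   # euros per cardholder per year
tx_per_cardholder = ratio(TRANSACTIONS, CARDHOLDERS)      # transactions per cardholder per year

print(f"average ticket: {average_ticket:.2f} EUR")
print(f"spend per cardholder: {spend_per_cardholder:.0f} EUR")
print(f"transactions per cardholder: {tx_per_cardholder:.1f}")
```

This gives an average ticket of roughly 45.80 € per transaction.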
9. ► Research steps and objectives
From data analysis… to innovation: {X, Y, t, €} → activity and behavioral patterns → insights, visualizations and applications
- Analyze people's interests and mobility
- Measure the permeability and attractiveness of cities
- Demonstrate hyperscalability factors
- Design and implement interactive tools
10. ► Descriptive capacity of this kind of data
Multidimensional data (origin → destination):
- How much? Spending (€), number of transactions, average ticket
- Where? (X, Y, C), where C = commercial type assigned to the PoS
- When? Time aggregations, frequency, payment patterns
- Who? Anonymous consumer profile:
 · Origin (residence zip code for BBVA cardholders, country for non-BBVA cardholders)
 · Gender, age (BBVA cardholders)
 · Inferred characteristics: purchasing power, behavioral segmentation, preferences and interests
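The how much / where / when / who dimensions can be pictured as one record per payment, aggregated along any combination of keys. A minimal sketch, assuming a hypothetical `CardTransaction` record (field names are illustrative, not BBVA's schema) and a hypothetical `summarize` helper that returns total spending, transaction count and average ticket per group:

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CardTransaction:
    """One payment; hypothetical field names mirroring {X, Y, t, €}."""
    x: float           # X: PoS longitude
    y: float           # Y: PoS latitude
    category: str      # C: commercial type assigned to the PoS
    t: datetime        # When: payment timestamp
    amount_eur: float  # How much: amount in euros
    origin: str        # Who: residence zip code (BBVA) or country (non-BBVA)


def summarize(transactions, key):
    """Group by key(tx); return {group: (total spending, n transactions, average ticket)}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for tx in transactions:
        group = key(tx)
        totals[group] += tx.amount_eur
        counts[group] += 1
    return {g: (totals[g], counts[g], totals[g] / counts[g]) for g in totals}


# Hypothetical sample, not real data.
txs = [
    CardTransaction(-3.70, 40.42, "restaurants", datetime(2014, 7, 5, 21, 10), 42.0, "28004"),
    CardTransaction(-3.70, 40.42, "restaurants", datetime(2014, 7, 5, 21, 45), 18.0, "FR"),
    CardTransaction(-3.69, 40.41, "fashion", datetime(2014, 7, 5, 12, 30), 60.0, "28004"),
]
by_category_hour = summarize(txs, key=lambda tx: (tx.category, tx.t.hour))
print(by_category_hour)
```

Swapping the `key` function (for example, by origin and month) changes the aggregation without touching the records.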
11. ► Data sources and sample representativity
- BBVA cards used on any kind of PoS:
 · provide a view of the whole transaction series
- Non-BBVA cards on BBVA PoS:
 · non-continuous activity tracking; low-frequency information makes itineraries hard to track
A = BBVA cardholders (X% of cardholders, on 100% of points of sale)
B = BBVA PoS terminals, "TPVs" (Y% of points of sale, used by 100% of cardholders)
Visibility over P% of card transactions, with X and Y expressed as fractions:
$P = P(A \cup B) = X \cdot 1 + 1 \cdot Y - X \cdot Y$
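The formula is inclusion–exclusion on the cardholder × point-of-sale grid. A minimal sketch, assuming transactions are spread evenly over that grid (card issuer independent of PoS acquirer, so the overlap is X·Y); the shares used below are hypothetical:

```python
def visible_share(x: float, y: float) -> float:
    """Share of all card transactions visible to the bank, P(A ∪ B).

    x: fraction covered by the bank's own cards (set A), on any PoS
    y: fraction covered by the bank's PoS terminals (set B), with any card
    Assumes independence, so the overlap P(A and B) is x * y.
    """
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("x and y must be fractions in [0, 1]")
    return x * 1.0 + 1.0 * y - x * y  # inclusion-exclusion


print(visible_share(0.2, 0.3))  # hypothetical shares
```

With 20% of cardholders and 30% of terminals, the visible share is 0.2 + 0.3 − 0.06 = 0.44.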
12. ► We apply privacy filters to generate statistics that aggregate transactions
Spatial resolution: city/region, neighborhood, commercial area
Descriptive information: commercial type breakdown, cardholder features
Time resolution: year, month, week, day, hour
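The actual privacy filters are not described here. As an illustration only, a common approach is to aggregate transactions into cells (area × commercial type × time bucket) and suppress any cell with too few distinct cardholders. The threshold `MIN_CARDHOLDERS` and the function `aggregate_with_privacy` below are hypothetical:

```python
from collections import defaultdict

MIN_CARDHOLDERS = 5  # hypothetical threshold, not BBVA's actual privacy criterion


def aggregate_with_privacy(records, min_cardholders=MIN_CARDHOLDERS):
    """records: iterable of (cell, cardholder_id, amount_eur), where cell is e.g.
    (area, commercial_type, time_bucket). Returns {cell: (n_transactions, total_eur)}
    only for cells with at least min_cardholders distinct cardholders."""
    holders = defaultdict(set)
    n_tx = defaultdict(int)
    total = defaultdict(float)
    for cell, holder, amount in records:
        holders[cell].add(holder)
        n_tx[cell] += 1
        total[cell] += amount
    return {
        cell: (n_tx[cell], total[cell])
        for cell in n_tx
        if len(holders[cell]) >= min_cardholders
    }


# Hypothetical cells: a busy one survives, a sparse one is suppressed.
busy = [(("centre", "restaurants", "2014-07"), f"c{i}", 20.0) for i in range(6)]
sparse = [(("suburb", "jewelry", "2014-07"), f"c{i}", 300.0) for i in range(2)]
print(aggregate_with_privacy(busy + sparse))
```

Finer spatial or temporal resolution means fewer cardholders per cell, so more cells fall below the threshold. This is why a minimum aggregation size is needed for the statistics to remain meaningful.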
15. Scientific papers
1. Mining urban performance: Scale-independent classification of cities based on individual economic transactions. Sobolevsky, S., Sitko, I., Grauwin, S., Combes, R. T. D., Hawelka, B., Murillo Arias, J., & Ratti, C. (2014). arXiv preprint arXiv:1405.4301. Fifth ASE International Conference on Data Science, Stanford, CA, May 2014.
2. Money on the move: Big data of bank card transactions as the new proxy for human mobility patterns and regional delineation. The case of residents and foreign visitors in Spain. Sobolevsky, S., Sitko, I., Tachet des Combes, R., Hawelka, B., Murillo Arias, J., & Ratti, C. (2014, June). In Big Data (BigData Congress), 2014 IEEE International Congress on (pp. 136-143). IEEE.
3. Cities through the Prism of People's Spending Behavior. Sobolevsky, S., Sitko, I., Combes, R. T. D., Hawelka, B., Arias, J. M., & Ratti, C. (2015). arXiv preprint arXiv:1505.03854. Submitted to PLOS ONE.
4. Scaling of city attractiveness for foreign visitors through big data of human economical and social media activity. Sobolevsky, S., Bojic, I., Belyi, A., Sitko, I., Hawelka, B., Arias, J. M., & Ratti, C. (2015). arXiv preprint arXiv:1504.06003. IEEE Big Data Congress 2015, NYC.
5. Predicting Regional Economic Indices Using Big Data Of Individual Bank Card Transactions. Sobolevsky, S., Massaro, E., Bojic, I., Arias, J. M., & Ratti, C. (2015). arXiv preprint arXiv:1506.00036. Sixth ASE International Conference on Data Science, Stanford, CA, August 2015 (best paper award).
6. Influence of sociodemographics on human mobility. Lenormand, M., Louail, T., Cantu Ros, O. G., Picornell, M., Herranz, R., Murillo Arias, J., Barthelemy, M., San Miguel, M., & Ramasco, J. J.
16. ►Beyond official administrative divisions, what are the functional inner boundaries
of a country? What are major cities’ areas of influence?
19. City attractiveness is defined as the absolute number of photographs, tweets or economic transactions made in the city by foreign visitors.
City attractiveness scales superlinearly with city size in terms of population.
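Superlinear scaling means attractiveness A grows as a power of population P, A ≈ a·P^β with β > 1. A minimal sketch of estimating β by ordinary least squares on log-transformed values, a standard way to fit such laws. The referenced paper may use a different estimator, and the data below are synthetic:

```python
import math


def fit_power_law(population, attractiveness):
    """Fit A = a * P**beta by OLS on log A = log a + beta * log P.
    Returns (beta, a); beta > 1 indicates superlinear scaling."""
    xs = [math.log(p) for p in population]
    ys = [math.log(v) for v in attractiveness]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    a = math.exp(mean_y - beta * mean_x)
    return beta, a


# Synthetic example (not data from the study): counts generated with beta = 1.2.
population = [5e4, 2e5, 1e6, 3e6]
visits = [0.01 * p ** 1.2 for p in population]
beta, a = fit_power_law(population, visits)
print(f"beta = {beta:.2f}")
```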
20. Figure 3 visualizes the residuals for the LUZs (Larger Urban Zones), ordering the cities from the most overperforming to the most underperforming according to the bank card transaction data. Although the residuals from different datasets differ, the patterns are generally consistent: cities that strongly over- or underperform according to one dataset usually do the same according to the others.
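A residual here is the gap between a city's observed value and the scaling-law prediction for its size, measured on a log scale. A minimal sketch under stated assumptions: residuals come from an OLS log-log fit, and the consistency between datasets is summarized with a Spearman rank correlation, chosen here for illustration rather than taken from the paper. City names and counts are hypothetical:

```python
import math


def loglog_residuals(population, values):
    """Residuals of log(values) around the OLS fit log(values) = c + beta * log(population).
    Positive = overperforming for its size, negative = underperforming."""
    xs = [math.log(p) for p in population]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    beta = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
            / sum((x - mean_x) ** 2 for x in xs))
    c = mean_y - beta * mean_x
    return [y - (c + beta * x) for x, y in zip(xs, ys)]


def ranks(values):
    """Rank of each value, 0 = largest (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for position, index in enumerate(order):
        result[index] = position
    return result


def spearman(a, b):
    """Spearman rank correlation for tie-free data: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical cities and counts (not data from the study).
cities = ["A", "B", "C", "D", "E"]
population = [1e5, 2e5, 5e5, 1e6, 4e6]
card_tx = [900, 1500, 6000, 8000, 60000]
photos = [40, 90, 200, 500, 2500]

res_cards = loglog_residuals(population, card_tx)
res_photos = loglog_residuals(population, photos)
ordering = sorted(cities, key=lambda c: res_cards[cities.index(c)], reverse=True)
print("most to least overperforming (cards):", ordering)
print("rank agreement:", round(spearman(res_cards, res_photos), 2))
```

A rank correlation close to 1 would indicate that the datasets agree on which cities over- or underperform.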
21. Commercial index project
Comparative quantitative analysis of the microeconomic climate of cities: measuring a location's success and opportunities
- Competitiveness of regions, cities and locations
- Investment attractiveness
- New business opportunities
- Learn from the leaders: how to improve
- Enrich census and official statistics
22. Objectives
From micro to macro... and back to micro
- Build a model that predicts official statistics at province level
- Apply that model at higher resolution levels: geographical units below province, temporal variation below year/month
- Custom business predictions: opportunity areas, risk predictions
23. How may we define quality of life?
Economic parameters: GDP, housing prices, unemployment
Social parameters: crime, education, life expectancy
Subjective well-being: happiness, self-esteem, self-realization, human interactions (friends and family)
(dense, reliable qualitative data are lacking)
32. GDP: visualization of the model fit
Sobolevsky, S., Massaro, E., Bojic, I., Arias, J. M., & Ratti, C. (2015). Predicting Regional Economic Indices Using Big Data Of Individual Bank Card Transactions. arXiv preprint arXiv:1506.00036. Sixth ASE International Conference on Data Science, Stanford, CA, August 2015 (best paper award).
Panels: official statistics | commercial indexes model
47. Spending distribution
- Playa del Carmen (16.80%)
- Isla Mujeres (0.33%)
- Cozumel (2.64%)
- Cancún (80.23%)
Riviera Maya accounted for 3.73% of total spending made in Mexico in 2014*
* The analysis refers only to spending by Bancomer customers.
51. Origin of national visitors to Cancún according to their spending
[Maps by state of origin: share of spending (legend 0%, 1%, 5%, 10%, 20%, 100%) and population-normalized index (legend 0, 50, 100, 200, 300)]
Domestic tourism, Cancún:
• Distrito Federal (24.06%)
• México (23.49%)
• Jalisco (6.57%)
• Nuevo León (4.68%)
• Puebla (3.48%)
• Other states (37.72%)
Excluding México and DF, Cancún:
• Jalisco (12.52%)
• Nuevo León (8.93%)
• Puebla (6.64%)
• Veracruz (6.38%)
• Tabasco (5.53%)
• Other states (60%)
Normalized by each state's population*, Cancún:
• Distrito Federal (374)
• México (213)
• Campeche (204)
• Tabasco (178)
• Quintana Roo (175)
*Base 100: the share of spending made by a state's residents equals that state's share of the national population.
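The base-100 index compares each state's share of spending with its share of population. A minimal sketch, assuming the index is (spending share / population share) × 100 computed over the same set of states. The function name `spending_index` and the three-state figures are hypothetical:

```python
def spending_index(spending, population):
    """Base-100 index per state: (share of spending) / (share of population) * 100.
    100 means the state's residents spend exactly in proportion to their demographic weight."""
    total_spending = sum(spending.values())
    total_population = sum(population.values())
    return {
        state: 100.0 * (spending[state] / total_spending) / (population[state] / total_population)
        for state in spending
    }


# Hypothetical three-state country (not figures from the study).
spending = {"A": 50.0, "B": 30.0, "C": 20.0}
population = {"A": 25.0, "B": 25.0, "C": 50.0}
print(spending_index(spending, population))
```

State A holds half the spending but only a quarter of the population, so it scores about 200. In the same way, an index of 374 means a state's residents spend about 3.7 times what their population share alone would predict.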
BBVA has a strong link to cities. Day by day, second by second, we deal with a time-ordered flow of geopositioned data (not only commercial transactions, but also money transfers, communications, etc.), and we can turn it into useful information that constitutes the foundation for better internal and external decision-making processes.
But undoubtedly the most complex of all these systems is the socioeconomic dynamic, an intangible layer that covers the interactions between administrations, companies and citizens in their dual role:
As users of public services (education, culture, healthcare, security, government)
As consumers of business products and services (retail, financial services, consulting, accommodation and food services, etc.)
The data are structured at different levels of spatial and temporal aggregation. A minimum size is needed so that, once the data have been filtered by privacy criteria, the statistics are meaningful.
The project has several objectives. We start by evaluating the approach: we train the model to predict existing official economic statistics, which is the goal of the work presented here. Further steps are to adapt the model to various spatial scales, capture the temporal variation of regional performance, predict other relevant characteristics of urban life and, finally, focus on specific business use cases to make custom business predictions.
In this initial study we use six of the most common quantities, out of the many parameters provided by INE, to characterize regional performance at the province scale.
On the other hand, our data provide diverse, multidimensional insights into human activity in these areas.
Here are the characteristics we look at; together they make up the feature space for training the models.
The model has several phases:
- Normalization: bringing all the quantities onto the same temporal scale by fitting their distribution and normalizing against it
- Dimensionality reduction
- Training a generalized linear model
- Testing performance
For the initial evaluation we picked a fairly simple model. This is partly because a simple model is always a reasonable first step in machine learning, and partly because the small size of our data sample prevents us from efficiently using more sophisticated learning techniques, such as decision trees or neural networks.
First, a logistic regression predicts normalized versions of the statistical quantities, between 0 (worst) and 1 (best). Then, by applying the inverse of the distribution, which we also learn from the training set, we predict the actual values on the original scale.
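The normalize, predict and invert steps can be sketched as follows. This is a sketch under assumptions. The talk fits a distribution to each quantity; here the empirical distribution of the training sample, with piecewise-linear interpolation, stands in for it, so that `normalize` and `denormalize` are exact inverses on the training range. All function names are hypothetical. Higher raw values are treated as "better"; for quantities such as crime or unemployment, the values would be negated first. The regression that produces the linear score is not shown; only its logistic link is.

```python
import bisect
import math


def normalize(value, train):
    """Map a raw value to [0, 1] by piecewise-linear interpolation of the
    training sample's empirical distribution (0 = lowest, 1 = highest)."""
    s = sorted(train)
    n = len(s)
    if value <= s[0]:
        return 0.0
    if value >= s[-1]:
        return 1.0
    i = bisect.bisect_right(s, value) - 1  # s[i] <= value < s[i + 1]
    frac = (value - s[i]) / (s[i + 1] - s[i])
    return (i + frac) / (n - 1)


def denormalize(q, train):
    """Inverse of normalize: map a prediction q in [0, 1] back to the original scale."""
    s = sorted(train)
    n = len(s)
    pos = min(max(q, 0.0), 1.0) * (n - 1)
    i = min(int(math.floor(pos)), n - 2)
    frac = pos - i
    return s[i] + frac * (s[i + 1] - s[i])


def logistic(score):
    """Logistic link of the GLM: maps a linear predictor to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))


train = [2.0, 4.0, 8.0, 10.0]  # hypothetical training values of one statistic
q = normalize(6.0, train)
print(q, denormalize(q, train))
print(denormalize(logistic(0.0), train))  # a zero linear score maps to the middle of the distribution
```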
Fifteen principal components cover 95% of the total variance, but the learning curve on the right shows that optimal performance on the validation samples is typically achieved with just six.
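Counting how many principal components reach a given share of the variance is a short computation on the singular values of the centered feature matrix. A minimal sketch with NumPy, assuming features are already on comparable scales (for example after the normalization step). `components_for_variance` is a hypothetical helper. This coverage rule is separate from choosing the component count by validation performance, which is how the six components above were selected:

```python
import numpy as np


def components_for_variance(X, threshold=0.95):
    """Smallest number of principal components whose cumulative explained
    variance reaches threshold, plus the cumulative explained-variance curve."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    cumulative = np.cumsum(explained)
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(cumulative)), cumulative


# Toy matrix with two orthogonal directions carrying 90% and 10% of the variance.
X = [[3.0, 0.0], [-3.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
k, cumulative = components_for_variance(X, 0.95)
print(k, cumulative)
```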
Here is how these components can be characterized by their impact.
Decent performance: 50-60% on the validation sets vs. 60-70% on the training sets. Exception: the crime rate, where performance on the non-normalized scale is strongly affected by several outliers.
And here, in this field, is where BBVA can make an important contribution.
We have a strong link to cities: day by day, second by second, we deal with a time-ordered flow of geopositioned data, and we can turn it into useful information that can constitute the foundation for better decision-making processes.