Big Data Workshop: Analytics and Models by Esteban Moro and Alejandro Llorente

  1. Analytics & Models
      Esteban Moro, Alejandro Llorente
      www.iic.uam.es
      INNOVA CHALLENGE Workshop, 30 Oct
  2. Analytics and Models: challenge participant "roadmap"
      Diagram: Data (Maps, Infrastructures/Places, Activity); Mining; Analysis; Development; App; Content; Models; Visualization
  3. Summary
      • Introduction to geo-tagged data
      • Access to (open) geo-tagged data
      • Example: development of a geolocalized recommender app
  4. Introduction to geo-tagged data
  5. Introduction to geo-tagged data
      Information: person, event, infrastructure.
      Geography: GPS coordinates, zone, city.
  6. Geospatial Big Data
      Sources: Activity (transport), Maps, Satellite images, Social media, Sensors
  7. Geo-tagged Big Data applications
      With geo-tagged data we can:
      • Measure zone/area occupation and activity
      • Identify movements and flows of people/money between different areas
      • …
      With these data we can build applications in:
      • Geo-social analysis
      • Geomarketing
      • Optimal allocation of resources
      • Fraud detection
      • Event detection
      • …
  8. Geo-social analysis
      Use pervasive sensors (mobile phones, social media) to model the movement and communication of people in urban areas.
  9. Geo-social analysis
      Geolocation study in Madrid. Location: Puerta del Sol.
      Total number of check-ins: 2,651 (30.5 per day). Number of unique users in the area: 1,231.
      [Figure: check-in counts around Puerta del Sol by place, by hour of day and by venue type (factor(tipo), e.g. arts_entertainment, food)]
      Characterization of urban neighborhoods according to their social/commercial use.
  10. Fraud detection
      Use the merchant's location and/or the IP address of online transactions to detect fraud.
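The slide does not say how location is used. One simple rule flags a transaction when the merchant and the location of the buyer's IP address are too far apart. This is a minimal sketch that assumes the IP address has already been geolocated to (lat, lon) by some lookup service, which is not shown. It uses the haversine great-circle distance, and the 500 km threshold is an arbitrary assumption.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def suspicious(merchant: tuple, ip_location: tuple, max_km: float = 500.0) -> bool:
    """Flag a transaction whose IP geolocates far from the merchant.

    merchant, ip_location: (lat, lon) tuples. max_km is an arbitrary threshold.
    """
    return haversine_km(*merchant, *ip_location) > max_km
```

In practice the threshold would need tuning against labelled cases, because legitimate online purchases from distant merchants are common.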
  11. Geomarketing
      [Figure panels: Bars, Shops]
  12. Optimal resource allocation
      • Optimize cash holdings in bank branches, minimizing the associated costs.
      • Identify the best placement for a new shop/branch.
      [Figure panels: Bars, Shops]
  13. Event detection
      Detect unexpected behavior using social/mobile/urban sensors.
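"Unexpected behavior" can be made concrete in many ways. One simple way, assumed here and not taken from the slides, is to compare today's activity in an area (for example, tweets per hour) with the history for the same hour. An hour is flagged when it lies more than a chosen number of standard deviations above its historical mean. The input layout and the threshold of 3 are illustrative assumptions.

```python
from statistics import mean, stdev


def unusual_hours(counts: dict, history: dict, threshold: float = 3.0) -> list:
    """Flag hours whose activity is far above the historical norm.

    counts:  hour -> observed count today (e.g. tweets in one area).
    history: hour -> list of past counts for that same hour.
    Returns the hours whose z-score exceeds `threshold`.
    """
    flagged = []
    for hour, value in counts.items():
        past = history.get(hour, [])
        if len(past) < 2:
            continue  # stdev needs at least two past observations
        mu, sigma = mean(past), stdev(past)
        if sigma > 0 and (value - mu) / sigma > threshold:
            flagged.append(hour)
    return flagged
```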
  14. Access to (open) geographical data
  15. Geographical data
      Map, Infrastructure/places, Activity
  16. Types of data
      • Maps
      • Economic/demographic data
      • Other types of data: Google's POIs, weather forecast
      • Activity: Twitter, BBVA API
  17. Maps :: Google Maps
      Google Maps offers a number of services/APIs, each with its own restrictions and protocols. They let you define maps, routes, markers, etc.
      Example: get a static map (without authentication).
      Base URL: http://maps.google.com/maps/api/staticmap
      Parameters:
      • center: 40.4153,-3.6875
      • size: 640x640
      • maptype: mobile
      • format: png32
      • sensor: true
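A minimal sketch of how the static-map request could be assembled from the slide's base URL and parameters, using only the standard library. Be aware that current versions of the service require an API `key`, and that the slide's `mobile` map type and `sensor` flag may no longer be accepted. The URL is only built here and is not fetched.

```python
from urllib.parse import urlencode

# Base URL as given on the slide (2013-era Static Maps API).
BASE_URL = "http://maps.google.com/maps/api/staticmap"


def static_map_url(lat: float, lon: float, size: str = "640x640",
                   maptype: str = "mobile", fmt: str = "png32",
                   sensor: bool = True) -> str:
    """Build a static-map request URL centred on (lat, lon)."""
    params = {
        "center": f"{lat},{lon}",
        "size": size,
        "maptype": maptype,
        "format": fmt,
        "sensor": "true" if sensor else "false",
    }
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(static_map_url(40.4153, -3.6875))
```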
  18. Maps :: OpenStreetMap
      An open, collaborative project to create and distribute free maps. Different APIs provide information about routes, points, maps, etc., and a number of mapping projects (applications) with very different purposes are built on top of OSM.
      Example: get the route between two locations with MapQuest.
      Base URL: http://open.mapquestapi.com/guidance/v1/
      Parameters:
      • Key: authentication key
      • From: latitude and longitude of the origin, in JSON
      • To: latitude and longitude of the destination, in JSON
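This is a sketch of encoding the `from`/`to` locations as JSON inside the query string. Two parts are hypothetical. The endpoint path `route` is a guess, because the slide only gives the base URL. The `{"latLng": {...}}` shape is also an assumption. Check the service documentation before relying on either, and note that the service itself may have been retired.

```python
import json
from urllib.parse import urlencode

BASE_URL = "http://open.mapquestapi.com/guidance/v1/"  # from the slide
ROUTE_PATH = "route"  # hypothetical: the exact endpoint path is not on the slide


def location_json(lat: float, lon: float) -> str:
    """Serialize a location as JSON (hypothetical shape; check the service docs)."""
    return json.dumps({"latLng": {"lat": lat, "lng": lon}})


def route_url(key: str, origin: tuple, destination: tuple) -> str:
    """Build a route request; origin/destination are (lat, lon) tuples."""
    params = {
        "key": key,
        "from": location_json(*origin),
        "to": location_json(*destination),
    }
    return BASE_URL + ROUTE_PATH + "?" + urlencode(params)
```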
  19. Maps :: shapefiles
      A geospatial vector data format for geographical information.
      • Regions, points and paths are defined as points, lines and polygons.
      • Each of them usually has attributes that describe it: region codes, names, population, etc.
      Python: pyshp, http://code.google.com/p/pyshp/
      R: maptools, http://cran.r-project.org/web/packages/maptools
      Data: http://www.naturalearthdata.com/downloads/
  20. Maps :: shapefiles
      Editing and visualization of shapefiles: http://www.qgis.org
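A common next step after loading polygons is to decide which region (e.g. postal code) contains a given point. With pyshp, `shapefile.Reader(path).shapeRecords()` yields each shape's `points` together with its attribute `record`. This sketch assumes those points have already been collected into a dict of simple rings. It then applies the standard even-odd (ray-casting) test. Multi-part polygons and holes (pyshp's `shape.parts`) are ignored here.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (x, y) = (longitude, latitude)


def point_in_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Even-odd (ray casting) test: is pt inside the polygon ring?"""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def find_region(pt: Point, regions: Dict[str, List[Point]]) -> Optional[str]:
    """Return the name of the first region whose ring contains pt, or None."""
    for name, ring in regions.items():
        if point_in_polygon(pt, ring):
            return name
    return None
```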
  21. Maps :: Spain cartography
      CartoCiudad (Ministerio de Fomento): shapefiles for each province at the municipality and postal-code levels. They also include data about the urban background.
      http://www.cartociudad.es/portal/
  22. Maps :: Madrid cartography
      Nomecalles (CAM): shapefiles, POIs (museums, theaters, health services), subway stations, etc.
      http://www.madrid.org/nomecalles/DescargaBDTCorte.icm
      Resolution levels: municipalities, districts, postal codes, etc.
  23. Maps :: Barcelona province cartography
      Plan territorial metropolitano de Barcelona – Generalitat de Catalunya. Link
  24. Maps :: Barcelona city cartography
      Open data gencat: Catalonia cartography. Link
  25. Maps :: Barcelona city cartography
      Plan territorial metropolitano de Barcelona – Generalitat de Catalunya. Link
      This website also has data about mobility, economic development, population, etc. at the district level.
      There is nothing at this level of detail for Madrid. Solution: use other data sources to estimate these figures (see below).
  26. Demographic/Economic data :: Spain
      Demographic data: Instituto Nacional de Estadística (INE). Census by municipality. Link
      Economic data: Servicio Público de Empleo Estatal (SEPE). Unemployment by municipality. Link
  27. Demographic/Economic data :: Madrid
      Madrid city: Madrid City Council database, http://www-2.munimadrid.es/CSE6/jsps/menuBancoDatos.jsp
      Population by districts, neighborhoods, etc.
      Madrid region: Comunidad de Madrid database, http://www.madrid.org/desvan/Inicio.icm?enlace=almudena
      Population by municipality. Economic data by municipality.
  28. Demographic/Economic data :: Barcelona
      Barcelona city: Departament d'Estadística, http://www.bcn.cat/estadistica/castella/
      Population by district. Unemployment by district.
      Catalonia region: Idescat (Institut d'Estadística de Catalunya), http://www.idescat.cat/es/
      Population by municipality. Economic data by municipality.
  29-31. Other data sources :: Google Points of Interest
      Google API Console
  32. Other data sources :: Google Points of Interest
      Points of interest around Puerta del Sol (Madrid)
      Service 1: Places Search. Parameters: location: 40.417, -3.703; radius: 1000
      Service 2: Places Details. Parameters: reference: the place's reference code
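This is a sketch of the two-step flow: search around a point, then ask for the details of each result by its `reference`. It assumes the legacy Places web-service endpoints with JSON output and a `results[].geometry.location` response layout. Newer versions of the API use `place_id` instead of `reference`. The sample response in the check below is made up.

```python
import json
from urllib.parse import urlencode

# Assumption: legacy Places web-service endpoints returning JSON.
SEARCH_URL = "https://maps.googleapis.com/maps/api/place/nearbysearch/json"
DETAILS_URL = "https://maps.googleapis.com/maps/api/place/details/json"


def places_search_url(key: str, lat: float, lon: float, radius_m: int = 1000) -> str:
    """Service 1: places within radius_m metres of (lat, lon)."""
    return SEARCH_URL + "?" + urlencode(
        {"location": f"{lat},{lon}", "radius": radius_m, "key": key})


def place_details_url(key: str, reference: str) -> str:
    """Service 2: details for one place, identified by its reference code."""
    return DETAILS_URL + "?" + urlencode({"reference": reference, "key": key})


def extract_places(response_text: str) -> list:
    """Return (name, lat, lng, reference) for each result of a search response."""
    data = json.loads(response_text)
    out = []
    for r in data.get("results", []):
        loc = r["geometry"]["location"]
        out.append((r.get("name"), loc["lat"], loc["lng"], r.get("reference")))
    return out
```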
  33. Other data sources :: Weather forecast
      GFS (Global Forecast System), served over the OPeNDAP protocol. Python implementation: pydap.
      Query format:
      SERVER = http://nomads.ncep.noaa.gov:9090/dods/gfs_hd/
      DATE = YYYYMMDD
      HOUR = HH
      VAR = weather metric (tmp2m, ugrd10m, pressfc, …)
      LAT = latitude index interval [259:263] (0.5º steps from the South Pole)
      LON = longitude index interval [710:714] (0.5º steps eastward from Greenwich)
      QUERY = {SERVER}gfs_hd{DATE}/gfs_hd_{HOUR}z.dods?{VAR}[0:0][{LAT}][{LON}]
      dataset = open_dods(QUERY)
      (The intervals above cover roughly 39.5°N to 41.5°N and 5°W to 3°W, i.e. the area around Madrid.)
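The grid indices on the slide follow from a 0.5° grid. Latitude is counted from −90° and longitude eastward from 0° (0 to 360). The sketch below derives them from coordinates and builds the query string. The actual download is kept in a separate function because it needs pydap and network access. `open_dods` lives in `pydap.client` in pydap 3.x, and newer releases may name it differently. The NOMADS `gfs_hd` server layout may also have changed since 2013.

```python
# Grid assumptions from the slide: 0.5-degree steps, latitude indexed from
# the South Pole (-90), longitude indexed eastward from Greenwich (0..360).
SERVER = "http://nomads.ncep.noaa.gov:9090/dods/gfs_hd/"
STEP = 0.5


def lat_index(lat: float) -> int:
    return int(round((lat + 90.0) / STEP))


def lon_index(lon: float) -> int:
    return int(round((lon % 360.0) / STEP))  # -5.0 -> 355.0 -> index 710


def gfs_query(date: str, hour: str, var: str, lat_range: tuple, lon_range: tuple) -> str:
    """Build the OPeNDAP query described on the slide (first time step only)."""
    lat0, lat1 = (lat_index(v) for v in lat_range)
    lon0, lon1 = (lon_index(v) for v in lon_range)
    return (f"{SERVER}gfs_hd{date}/gfs_hd_{hour}z.dods?"
            f"{var}[0:0][{lat0}:{lat1}][{lon0}:{lon1}]")


def fetch(query: str):
    # Requires pydap 3.x and network access; the server may no longer exist.
    from pydap.client import open_dods
    return open_dods(query)
```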
  34-37. Activity :: data from Twitter API
      Developers webpage: http://dev.twitter.com
      Credentials: Consumer Key, Consumer Secret, Access token, Access token secret
  38. Activity :: data from Twitter API
      OAuth authentication: Consumer Key, Consumer Secret, Access token, Access token secret.
      • REST API: several queries with parameters; the number of requests is limited.
      • Stream API: only one query (with parameters); requests are not time-limited.
  39. Activity :: data from Twitter API: Stream API
      Example: geolocalized tweets in the Madrid region.
      API service: POST statuses/filter
      Parameters: locations: -4.59, 39.90, -3.04, 41.17 (south-west longitude, latitude, then north-east longitude, latitude)
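A location filter on the stream can also match tweets that only carry a place polygon overlapping the box. For this reason, a client-side check on exact coordinates is often useful. This sketch assumes the v1.1 tweet JSON layout, where `coordinates` is a GeoJSON Point ordered as [longitude, latitude]. It keeps only tweets whose point falls inside the slide's bounding box.

```python
# Bounding box from the slide: SW longitude, SW latitude, NE longitude, NE latitude.
MADRID_BBOX = (-4.59, 39.90, -3.04, 41.17)


def in_bbox(lon: float, lat: float, bbox: tuple = MADRID_BBOX) -> bool:
    sw_lon, sw_lat, ne_lon, ne_lat = bbox
    return sw_lon <= lon <= ne_lon and sw_lat <= lat <= ne_lat


def tweet_point(tweet: dict):
    """Return (lon, lat) from a tweet's GeoJSON 'coordinates' field, or None."""
    coords = tweet.get("coordinates")
    if not coords or coords.get("type") != "Point":
        return None
    lon, lat = coords["coordinates"]  # GeoJSON order: [longitude, latitude]
    return lon, lat


def geotagged_in_madrid(tweets):
    """Yield only tweets with exact coordinates inside the bounding box."""
    for t in tweets:
        p = tweet_point(t)
        if p is not None and in_bbox(*p):
            yield t
```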
  40. Activity :: data from Twitter API: Stream API
      As mentioned above, Madrid has no data on administrative zones below the municipality level, but we can estimate some of them with Twitter.
      Example: population by postal code
      1. Round geographical coordinates to the 3rd decimal place (square cells roughly 100 m on a side).
      2. Take the postal code each user visits most often and define it as his/her residence. Count the number of residents per postal code.
      3. Visualize.
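Steps 1 and 2 can be sketched in a few lines. Tweets are assumed to arrive as (user, lat, lon) tuples. The lookup `cell_to_postcode` is hypothetical: a dict from a rounded cell to a postal code, precomputed for example by testing each cell centre against postal-code polygons. With three decimals, a cell is about 111 m tall and, at Madrid's latitude, about 85 m wide.

```python
from collections import Counter, defaultdict


def cell(lat: float, lon: float, decimals: int = 3) -> tuple:
    """Step 1: snap coordinates to a grid cell by rounding."""
    return (round(lat, decimals), round(lon, decimals))


def residents_by_postcode(tweets, cell_to_postcode: dict) -> Counter:
    """Step 2: residence = each user's most frequent postal code; count residents.

    tweets: iterable of (user, lat, lon).
    cell_to_postcode: hypothetical lookup from a rounded cell to a postal code.
    """
    visits = defaultdict(Counter)
    for user, lat, lon in tweets:
        pc = cell_to_postcode.get(cell(lat, lon))
        if pc is not None:
            visits[user][pc] += 1
    residence = {u: c.most_common(1)[0][0] for u, c in visits.items()}
    return Counter(residence.values())
```

Ties between postal codes are broken by whichever was seen first, and users with very few geotagged tweets probably deserve a minimum-count filter.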
  41-42. Activity :: data from Twitter API: Stream API
  43. Activity :: data from BBVA API
      https://www.centrodeinnovacionbbva.com/signup
  44-46. Activity :: data from BBVA API
      https://developer.bbva.com/panel
  47. Activity :: data from BBVA API
      Getting the authentication data:
      1. Generate the authorization code by concatenating APP_ID and APP_KEY with a colon (":") and encoding the result in base64.
      2. Add this authorization code to the HTTP request header.
      Example:
      APP_ID = "iic_formacion_innovachallenge"
      APP_KEY = "0f1d750a5baea6c7022452d0d2ece01fc5901ad7"
      str_to_encode = "iic_formacion_innovachallenge:0f1d750a5baea6c7022452d0d2ece01fc5901ad7"
      auth = strToBase64(str_to_encode)
      Request = HttpRequest(SERVICE, PARAMETERS, header = {'Authorization': auth})
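The pseudocode above maps directly onto Python's standard `base64` module. Following the slide, this sketch sends the bare token as the `Authorization` value. Standard HTTP Basic authentication would prefix it with `"Basic "`, and the slide does not say whether the service accepts that form.

```python
import base64


def bbva_auth_header(app_id: str, app_key: str) -> dict:
    """Join APP_ID and APP_KEY with ':' and base64-encode them, as on the slide."""
    token = base64.b64encode(f"{app_id}:{app_key}".encode("utf-8")).decode("ascii")
    # The slide sends the bare token; HTTP Basic auth would use "Basic " + token.
    return {"Authorization": token}
```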
  48. Activity :: data from BBVA API
      Economic flows from Puerta del Sol
      API service: customer_zipcodes
      Parameters: date_min: 201304, date_max: 201304, zipcode: 28013, by: cards, group_by: month
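This is a sketch of the `customer_zipcodes` call with the slide's parameters, built with `urllib`. The base URL `SERVICE_URL` is a hypothetical placeholder because the slide does not give the service's full address. The authorization token is whatever the previous slide's procedure produces. Constructing the `Request` does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical placeholder: the real service address is not on the slide.
SERVICE_URL = "https://example.invalid/customer_zipcodes"


def customer_zipcodes_request(auth_token: str, zipcode: str, date_min: str,
                              date_max: str, by: str = "cards",
                              group_by: str = "month") -> Request:
    """Build (but do not send) a customer_zipcodes request."""
    params = {"date_min": date_min, "date_max": date_max,
              "zipcode": zipcode, "by": by, "group_by": group_by}
    url = f"{SERVICE_URL}?{urlencode(params)}"
    # Pass the result to urllib.request.urlopen to actually send it.
    return Request(url, headers={"Authorization": auth_token})
```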
  49. Example: development of a geolocalized recommender app.
  50. Recommender systems :: Introduction
      Objective: recommend to users which areas to visit according to their profile, residence, preferences, etc., using information about what similar users do.
      Data used:
      1. Twitter data.
      2. Innova Challenge API: CARDS_CUBE.
      3. Innova Challenge API: CUSTOMER_ZIPCODES.
  51. Recommender systems :: user language
      Use Twitter data to:
      1. Find out what people are talking about in each city area.
      2. Analyze each user's language on Twitter.
      3. Compare the user's language with each area's language and recommend the most similar areas.
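The slides do not name a similarity measure. This sketch assumes a common choice: bag-of-words vectors compared with cosine similarity. It builds one word-count vector for the user's tweets and one per area, then ranks areas by similarity. Tokenization here is a bare `\w+` split with no stop-word removal. In practice, dropping very frequent words (or using TF-IDF weights) would probably matter.

```python
import math
import re
from collections import Counter


def word_counts(texts) -> Counter:
    """Lower-cased bag of words over a list of tweets."""
    words = []
    for t in texts:
        words.extend(re.findall(r"\w+", t.lower()))
    return Counter(words)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_areas(user_tweets, area_tweets: dict) -> list:
    """area_tweets: postal code -> tweets posted there. Most similar first."""
    u = word_counts(user_tweets)
    scores = {area: cosine(u, word_counts(ts)) for area, ts in area_tweets.items()}
    return sorted(scores, key=scores.get, reverse=True)
```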
  52. Recommender systems :: user language
      Postal code 28013: Madrid city center
  53. Recommender systems :: user language
      Postal code 28009: Retiro
  54. Recommender systems :: user demographic profile
      Use the CARDS_CUBE service from the BBVA API.
  55. Recommender systems :: user demographic profile
      • Use CARDS_CUBE service data.
      • For each merchant category Z (bars, fashion, health, etc.), build a matrix in which each entry is the number of distinct credit cards of a given profile X (gender, age) used for purchases in postal code Y at merchants of category Z.
      Where do people like me go shopping? Which restaurants are visited by people similar to me?
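The matrix described above can be held as a nested dict indexed by category Z, then profile X, then postal code Y. Recommending then amounts to reading the row for the user's profile and sorting it. The flat record layout `(profile, zipcode, category, n_cards)` is a hypothetical stand-in for whatever CARDS_CUBE returns. This sketch assumes one record per (profile, zipcode, category).

```python
from collections import defaultdict


def build_cube(records):
    """records: iterable of (profile, zipcode, category, n_cards),
    e.g. profile = ("M", "36-45"). Returns cube[category][profile][zipcode]."""
    cube = defaultdict(lambda: defaultdict(dict))
    for profile, zipcode, category, n_cards in records:
        cube[category][profile][zipcode] = n_cards  # one record per key assumed
    return cube


def top_zipcodes(cube, category: str, profile: tuple, k: int = 3) -> list:
    """Postal codes most visited by cards of this profile in this category."""
    row = cube[category][profile]
    return sorted(row, key=row.get, reverse=True)[:k]
```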
  56. Recommender systems :: user demographic profile
      Example: male, age 36-45
      [Figure panels: Fashion; Bars and restaurants]
  57. Recommender systems :: user geographic profile
      Use the CUSTOMER_ZIPCODES service from the BBVA API.
  58. Recommender systems :: user geographic profile
      • Use data from the CUSTOMER_ZIPCODES service.
      • For each merchant category Z (bars, fashion, health, etc.), build a matrix in which each entry is the number of distinct credit cards from postal code X used for purchases in postal code Y at merchants of category Z.
      Where do people in my district go shopping? Which restaurants are visited by people living in my district?
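Raw card counts favour large origin postal codes, so one option (our assumption, not stated on the slide) is to normalise each row of the origin-destination matrix. Each entry then becomes the share of the origin's card counts, within a category, that appear at a destination. The record layout `(origin_zip, dest_zip, category, n_cards)` is hypothetical. Because one card can shop in several destinations, these are shares of card-destination pairs rather than true probabilities.

```python
from collections import defaultdict


def flow_shares(flows) -> dict:
    """flows: iterable of (origin_zip, dest_zip, category, n_cards).
    Returns shares[category][origin][dest] = n / total n for that origin."""
    counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    totals = defaultdict(lambda: defaultdict(int))
    for origin, dest, category, n in flows:
        counts[category][origin][dest] += n
        totals[category][origin] += n
    return {cat: {o: {d: n / totals[cat][o] for d, n in dests.items()}
                  for o, dests in origins.items()}
            for cat, origins in counts.items()}
```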
  59. Recommender systems :: user geographic profile
      Example: postal code 28045
      [Figure panels: Fashion; Bars and restaurants]
  60. Recommender systems :: combination
      Geographical and demographic recommendation system
  61. Recommender systems :: combination
      Example: male, age 36-45, living in postal code 28045
      [Figure panels: Fashion; Bars and restaurants]
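The slides do not say how the demographic and geographic recommendations are combined. One simple possibility is sketched below. Each score dict (postal code -> count, as in the two previous steps) is normalised to sum to 1. The two are then blended linearly with a weight `alpha`. Both the linear blend and the default `alpha = 0.5` are assumptions.

```python
def normalize(scores: dict) -> dict:
    """Scale non-negative scores so they sum to 1 (empty dict if all zero)."""
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()} if total else {}


def combine(demographic: dict, geographic: dict, alpha: float = 0.5) -> list:
    """Rank postal codes by alpha * demographic + (1 - alpha) * geographic.

    Assumed linear blend; alpha weights the demographic recommendation.
    """
    d, g = normalize(demographic), normalize(geographic)
    zips = d.keys() | g.keys()
    combined = {z: alpha * d.get(z, 0.0) + (1 - alpha) * g.get(z, 0.0) for z in zips}
    return sorted(combined, key=combined.get, reverse=True)
```

For the slide's example, `demographic` would come from the "male, 36-45" row and `geographic` from the row for postal code 28045, in the same merchant category.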
  62. From the data to the app
  63. From data to the app
      1. The idea.
      2. What data do I need to carry out this idea? Which services of the Challenge API do I need? Can I improve it with other information sources?
      3. Analysis: distilling the idea and assessing its viability. Extracting the hidden value with analytics and models.
      4. How can the user take advantage of this idea?
      5. Iterate steps 2, 3 and 4 until the idea and its benefit to the user emerge.
      6. Turn the value of the analysis into an application.