Big Data analytics and models


Big Data analytics and models by Esteban Moro

Published in: Technology, Business

  • 1. Workshop BBVA – Open Innovation: Analytics & Models. Esteban Moro, Alejandro Llorente. INNOVA CHALLENGE, BigDataSpain 7/11
  • 3. https://
  • 4. Analytics and Models. Challenge participant "roadmap" (diagram labels: Data, Maps, Infrastructures/Places, Activity, Mining, Analysis, Development, App, Content, Models, Visualization).
  • 5. Summary. Introduction to geo-tagged data. Access to (open) geo-tagged data. Example: development of a geolocalized recommender app.
  • 6. Introduction to geo-tagged data
  • 7. Introduction to geo-tagged data. Information: person, event, infrastructure. Geography: GPS coordinates, zone, city.
  • 8. Geospatial Big Data: Activity (Transport), Maps, Satellite Images, Social Media, Sensors.
  • 9. Geo-tagged Big Data applications. With geo-tagged data we can: measure zone/area occupation and activity; identify flows of persons/money between different areas; … With those data we can build applications in: geo-social analysis; geomarketing; optimal allocation of resources; fraud detection; event detection; …
  • 10. Geo-social analysis. Use of pervasive sensors (mobile phones, social media) to model the movement and communication of people in urban areas.
  • 11. Geo-social analysis. Characterization of urban neighborhoods according to their social/commercial use. [Figure: geolocation study in Madrid, location Puerta del Sol. Total check-ins: 2651 (30.5 per day); unique users in the zone: 1231. Panels: check-in counts by day of the week, by hour and over time (Apr-Jun 2011), split by category (arts_entertainment, food, nightlife, shops); tables of the top 10 places and top 10 users by number of check-ins.]
  • 12. Fraud detection. Use merchant location and/or IP address in online transactions to detect fraud.
  • 13. Geomarketing. Bars, Shops. Manage sales risk.
  • 14. Optimal resource allocation. Optimize cash holdings in bank branches, minimizing the associated costs. Identify the best placement for a new shop/branch. (Map labels: Bars, Shops.)
  • 15. Event detection. Detect unexpected behavior using social/mobile/urban sensors.
  • 16. Access to (open) geographical data
  • 17. Geographical data: Map, Infrastructure/places, Activity.
  • 18. Types of data: Maps; Economic/Demographic data; Activity (Twitter, BBVA API).
  • 19. Maps :: Google Maps. Google Maps offers a number of services/APIs, each with different restrictions and protocols. They let you define maps, routes, markers, etc. Example: get a static map (without authentication). Base URL: http:// Parameters: center: 40.4153,-3.6875; size: 640x640; maptype: mobile; format: png32; sensor: true.
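A minimal sketch of how the static-map request on slide 19 could be assembled in Python. The base URL is truncated in the slide, so `BASE_URL` below is a placeholder and not the real endpoint; the function name is also an assumption. Only the parameter names and values come from the slide.

```python
from urllib.parse import urlencode

# Placeholder: the slide's base URL is truncated, so this is NOT the real endpoint.
BASE_URL = "http://example.invalid/staticmap"

# Parameters exactly as listed on the slide.
STATIC_MAP_PARAMS = {
    "center": "40.4153,-3.6875",
    "size": "640x640",
    "maptype": "mobile",
    "format": "png32",
    "sensor": "true",
}


def build_static_map_url(base_url, params):
    """Return base_url followed by the URL-encoded query string."""
    return base_url + "?" + urlencode(params)


static_map_url = build_static_map_url(BASE_URL, STATIC_MAP_PARAMS)
print(static_map_url)
```

The resulting URL can then be fetched with any HTTP client and the response bytes saved as a PNG file.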
  • 20. Maps :: OpenStreetMap. Open, collaborative project to create and distribute free maps. Different APIs provide information about routes, points, maps, etc. A number of mapping projects (applications) with very different purposes are built on top of OSM. Example: get the route between two locations with MapQuest. Base URL: http:// Parameters: key: authentication key; from: latitude and longitude of the origin, in JSON; to: latitude and longitude of the destination, in JSON.
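The route request on slide 20 might be built as follows. This is only a sketch: the base URL is truncated in the slide, so `BASE_URL` is a placeholder, `API_KEY` is a dummy value, and the JSON field names `lat` and `lng` are assumptions (the slide only says the coordinates are sent as JSON). The two points are the coordinates used elsewhere in the slides (Puerta del Sol and the static-map center).

```python
import json
from urllib.parse import urlencode, urlparse, parse_qs

BASE_URL = "http://example.invalid/route"  # placeholder, real URL is truncated in the slide
API_KEY = "YOUR_KEY"                       # dummy authentication key


def point_to_json(lat, lon):
    """Serialize a coordinate pair as JSON (field names are an assumption)."""
    return json.dumps({"lat": lat, "lng": lon}, separators=(",", ":"))


def build_route_url(base_url, key, origin, destination):
    """Build the request URL with the key, from and to parameters of the slide."""
    query = urlencode({
        "key": key,
        "from": point_to_json(*origin),
        "to": point_to_json(*destination),
    })
    return f"{base_url}?{query}"


route_url = build_route_url(BASE_URL, API_KEY, (40.417, -3.703), (40.4153, -3.6875))
print(route_url)
```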
  • 21. Maps :: shapefiles. Geospatial vector data format for geographical information. Regions, points and paths are defined as points, lines and polygons. Each of them usually has attributes that describe it: region codes, names, population, etc. pyshp: http:// maptools: http://cran.r- http://
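To illustrate what "regions defined as polygons with attributes" enables, here is a small sketch that assigns a point to a region with a ray-casting point-in-polygon test. It does not read a real shapefile (that is what pyshp or maptools are for); the two square regions and their names are toy data invented for the example.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a horizontal ray from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Toy regions (not real cartography): each has attributes plus a polygon.
REGIONS = [
    {"name": "zone A", "polygon": [(0, 0), (2, 0), (2, 2), (0, 2)]},
    {"name": "zone B", "polygon": [(2, 0), (4, 0), (4, 2), (2, 2)]},
]


def find_region(x, y, regions):
    """Return the name of the first region containing the point, or None."""
    for region in regions:
        if point_in_polygon(x, y, region["polygon"]):
            return region["name"]
    return None


print(find_region(1, 1, REGIONS))
print(find_region(3, 1, REGIONS))
```

With real data, x and y would be longitude and latitude, and the polygons would come from the shapefile records.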
  • 22. Maps :: shapefiles. Editing and visualization of shapefiles: http://
  • 23. Maps :: Spain cartography. CartoCiudad (Ministerio de Fomento): shapefiles for each province at municipality and postal code levels. They also include data about the urban background. http://
  • 24. Maps :: Madrid cartography. Nomecalles (CAM): shapefiles, POIs (museums, theaters, health services), subway stations, etc. http:// Resolution level: municipalities, districts, postal codes, etc.
  • 25. Maps :: Barcelona province cartography. Plan territorial metropolitano de Barcelona – Generalitat de Catalunya (link).
  • 26. Maps :: Barcelona city cartography. Open data gencat: Catalonia cartography (link).
  • 27. Maps :: Barcelona city cartography. Plan territorial metropolitano de Barcelona – Generalitat de Catalunya (link). This website also has data about mobility, economic development, population, etc. at the district level. There is nothing at this level of detail for Madrid. Solution: use other data sources to estimate them (see below).
  • 28. Demographic/Economic data :: Spain. Demographic data: Instituto Nacional de Estadística (INE); census by province / municipality / census section (link). Economic data: Servicio Público de Empleo Estatal (SEPE); unemployment by municipality (link).
  • 29. Demographic/Economic data :: Madrid. Madrid city: Madrid City Council database, http://www- ; population by district, neighborhood, etc. Madrid region: Comunidad de Madrid database, http:// ; population by municipality; economic data by municipality.
  • 30. Demographic/Economic data :: Barcelona. Barcelona city: Departament d'Estadística, http:// ; population by district; unemployment by district. Catalonia region: Idescat (Institut d'Estadística de Catalunya), http:// ; population by municipality; economic data by municipality.
  • 31-33. Other data sources :: Google Points of Interest. Google API Console.
  • 34. Other data sources :: Google Points of Interest. Points of interest around Puerta del Sol (Madrid). Service 1: Places Search; parameters: location: 40.417,-3.703; radius: 1000. Service 2: Places Details; parameters: reference: place code.
  • 35. Other data sources :: Weather forecast. GFS: Global Forecast System. OpeNDAP protocol. Python implementation: pydap. Query format:
      SERVER =
      DATE = YYYYMMDD
      HOUR = HH
      VAR = weather metric (tmp2m, ugrd10m, pressfc, …)
      LAT = latitude interval [259:263] (0.5º steps from the South Pole)
      LON = longitude interval [710:714] (0.5º steps from Greenwich)
      QUERY = SERVERgfs_hdDATE/gfs_hd_HOURz.dods?VAR[0:0][LAT][LON]
      dataset = open_dods(QUERY)
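The index intervals in the query of slide 35 can be derived from ordinary coordinates. The sketch below assumes that index 0 corresponds to latitude -90º and longitude 0º, with 0.5º per step, as the slide's comments suggest. `SERVER` is left blank in the slide, so the value here is a placeholder, and the date is an arbitrary example. The query string is only built, not sent; the slide passes it to pydap's `open_dods`.

```python
SERVER = "http://example.invalid/dods/"  # placeholder: the slide leaves SERVER blank
STEP = 0.5                               # grid resolution in degrees (from the slide)


def lat_index(lat):
    """Grid index of a latitude, counting 0.5-degree steps from the South Pole."""
    return round((lat + 90.0) / STEP)


def lon_index(lon):
    """Grid index of a longitude, counting 0.5-degree steps eastward from Greenwich."""
    return round((lon % 360.0) / STEP)


def index_interval(center, half_width=2):
    """Interval of indices around a central grid cell, as 'start:stop'."""
    return f"{center - half_width}:{center + half_width}"


def build_query(server, date, hour, var, lat, lon):
    """Assemble the query string following the format shown on the slide."""
    lat_iv = index_interval(lat_index(lat))
    lon_iv = index_interval(lon_index(lon))
    return f"{server}gfs_hd{date}/gfs_hd_{hour}z.dods?{var}[0:0][{lat_iv}][{lon_iv}]"


# 40.5 N, 4.0 W is the grid point at the center of the slide's intervals.
query = build_query(SERVER, "20130101", "00", "tmp2m", 40.5, -4.0)
print(query)
```

Under these assumptions, the grid point 40.5ºN, 4.0ºW (just northwest of Madrid) yields the intervals `[259:263]` and `[710:714]` that appear on the slide.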
  • 36-38. Activity :: data from Twitter API. Developers webpage.
  • 39. Activity :: data from Twitter API. Developers webpage: Consumer Key, Consumer Secret, Access token, Access token secret.
  • 40. Activity :: data from Twitter API. OAuth authentication: Consumer Key, Consumer Secret, Access token, Access token secret. REST API: several queries with parameters; the number of requests is limited. Stream API: only one query (with parameters); requests are not time-limited.
  • 41. Activity :: data from Twitter API. Stream API. Example: geolocalized tweets in the Madrid region. API service: POST statuses/filter; parameters: locations: -4.59, 39.90, -3.04, 41.17.
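The `locations` value on slide 41 is a bounding box given as west longitude, south latitude, east longitude, north latitude. A small sketch, with hypothetical helper names, that formats the parameter and re-checks on the client side that a tweet's coordinates fall inside the box:

```python
# Bounding box from the slide: (west lon, south lat, east lon, north lat).
MADRID_BBOX = (-4.59, 39.90, -3.04, 41.17)


def locations_param(bbox):
    """Format the bounding box as the comma-separated 'locations' value."""
    return ",".join(str(v) for v in bbox)


def in_bbox(lon, lat, bbox):
    """True if the point (lon, lat) lies inside the bounding box."""
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north


print(locations_param(MADRID_BBOX))
print(in_bbox(-3.703, 40.417, MADRID_BBOX))  # Puerta del Sol
print(in_bbox(2.17, 41.39, MADRID_BBOX))     # a point in Barcelona
```

A client-side check like this can be useful because a stream filtered by bounding box may also deliver tweets tagged with a place that merely overlaps the box.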
  • 42. Activity :: data from Twitter API. Stream API. As noted before, there are no data in Madrid about administrative zones below the municipality level, but we can estimate some of them with Twitter. Example: population by postal code. 1. Round geographical coordinates to the 3rd decimal place (square cells of approx. 100 meters on a side). 2. Find the postal code each user visits most and define it as his/her residence; count the number of residents per postal code. 3. Visualize.
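Steps 1 and 2 of slide 42 can be sketched as follows. The tweets and the cell-to-postal-code lookup are toy data; in practice the lookup would come from the postal-code shapefiles, which is outside this example. Function names are assumptions.

```python
from collections import Counter, defaultdict


def round_cell(lat, lon):
    """Step 1: snap coordinates to a grid cell by rounding to 3 decimals."""
    return (round(lat, 3), round(lon, 3))


def estimate_residents(tweets, cell_to_postal_code):
    """Step 2: assign each user to the postal code they tweet from most often,
    then count users per postal code."""
    visits = defaultdict(Counter)
    for user, lat, lon in tweets:
        code = cell_to_postal_code.get(round_cell(lat, lon))
        if code is not None:
            visits[user][code] += 1
    residents = Counter()
    for user, counts in visits.items():
        home, _ = counts.most_common(1)[0]  # ties are broken arbitrarily
        residents[home] += 1
    return residents


# Toy lookup and tweets (invented for illustration).
CELLS = {(40.417, -3.703): "28013", (40.415, -3.687): "28009"}
TWEETS = [
    ("ana", 40.41712, -3.70310),
    ("ana", 40.41688, -3.70291),
    ("ana", 40.41530, -3.68740),
    ("luis", 40.41530, -3.68740),
]

residents = estimate_residents(TWEETS, CELLS)
print(dict(residents))
```

The resulting counts per postal code are what step 3 would plot on the map.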
  • 43-44. Activity :: data from Twitter API. Stream API.
  • 45. Activity :: data from BBVA API. https://
  • 46-48. Activity :: data from BBVA API.
  • 49. Activity :: data from BBVA API. Getting the authentication data: 1. With the APP_ID and APP_KEY, generate the authorization code by concatenating both strings with ":" and encoding the result in base64. 2. Add this authorization code to the HTTP request header. Example:
      APP_ID = "iic_formacion_innovachallenge"
      APP_KEY = "0f1d750a5baea6c7022452d0d2ece01fc5901ad7"
      str_to_encode = "iic_formacion_innovachallenge:0f1d750a5baea6c7022452d0d2ece01fc5901ad7"
      auth = strToBase64(str_to_encode)
      Request = HttpRequest(SERVICE, PARAMETERS, header = {'Authorization': auth})
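The pseudocode on slide 49 maps directly onto Python's standard `base64` module. This is a sketch of the header construction only: `build_auth_header` is a hypothetical helper name, and no request is sent because the service URL is not given in the slides.

```python
import base64

# Example credentials shown on the slide.
APP_ID = "iic_formacion_innovachallenge"
APP_KEY = "0f1d750a5baea6c7022452d0d2ece01fc5901ad7"


def build_auth_header(app_id, app_key):
    """Join id and key with ':' and base64-encode the result, as the slide describes."""
    raw = f"{app_id}:{app_key}".encode("utf-8")
    return {"Authorization": base64.b64encode(raw).decode("ascii")}


headers = build_auth_header(APP_ID, APP_KEY)
print(headers)
```

The slide places the encoded string alone in the `Authorization` header; whether the service also expects a scheme prefix is not stated there, so none is added here.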
  • 50. Activity :: CUSTOMER_ZIPCODES example. Parameters. (INNOVA CHALLENGE Workshop, 30th October)
  • 51. Activity :: CUSTOMER_ZIPCODES example. Extracting data.
  • 52. Activity :: CUSTOMER_ZIPCODES example. Building the adjacency list.
  • 53. Activity :: CUSTOMER_ZIPCODES example. Building and plotting the graph.
  • 54. Activity :: CUSTOMER_ZIPCODES example. Economic flows from Puerta del Sol. API service: customer_zipcodes. Parameters: date_min: 201304; date_max: 201304; zipcode: 28013; by: cards; group_by: month.
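The slides on building the adjacency list survive only as titles, so the following is a guess at the idea rather than the authors' code. It assumes the extracted data has been reduced to records of (customer postal code, merchant postal code, number of cards); that layout and the numbers are invented for illustration and are not the documented response format of the API.

```python
from collections import defaultdict


def build_adjacency(records):
    """Build a weighted adjacency list {origin: {destination: total cards}}."""
    adjacency = defaultdict(dict)
    for origin, destination, num_cards in records:
        previous = adjacency[origin].get(destination, 0)
        adjacency[origin][destination] = previous + num_cards
    return dict(adjacency)


# Toy records: (customer postal code, merchant postal code, cards).
RECORDS = [
    ("28045", "28013", 120),
    ("28009", "28013", 80),
    ("28045", "28009", 15),
    ("28045", "28013", 30),
]

flows = build_adjacency(RECORDS)
print(flows)
```

Each key is a node of the graph and each inner entry a weighted edge, which is the structure a graph library would need to draw the flows.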
  • 55. Example: development of a geolocalized recommender app.
  • 56. Recommender systems :: Introduction. Objective: recommend to users which areas to visit according to their profile, residence, preferences, etc., using information about what similar users do. Data used: 1. API Innova Challenge – CARDS_CUBE. 2. API Innova Challenge – CUSTOMER_ZIPCODES.
  • 57. Recommender systems :: user language. Use Twitter data to: 1. Find out what people are talking about in each city area. 2. Analyze the user's language on Twitter. 3. Compare the user's language with each area's language and recommend the most similar areas to the user.
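One simple way to "compare user language with area language" is cosine similarity between word-count vectors. The slides do not say which similarity measure was used, so cosine similarity is an assumption here, and the tweets below are toy texts.

```python
import math
from collections import Counter


def word_counts(texts):
    """Bag-of-words counts over a list of texts (lowercased, split on whitespace)."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_areas(user_texts, area_texts):
    """Return area codes sorted from most to least similar to the user's language."""
    user_vec = word_counts(user_texts)
    scores = {area: cosine_similarity(user_vec, word_counts(texts))
              for area, texts in area_texts.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy tweets per postal code and for one user (invented for illustration).
AREA_TEXTS = {
    "28013": ["tapas and theatre tonight", "shopping in sol"],
    "28009": ["running in the park", "rowing on the lake"],
}
USER_TEXTS = ["morning running", "park picnic"]

ranking = rank_areas(USER_TEXTS, AREA_TEXTS)
print(ranking)
```

A real system would at least remove stop words and weight terms (for example with TF-IDF) before comparing.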
  • 58. Recommender systems :: user language. Postal code 28013: Madrid city center.
  • 59. Recommender systems :: user language. Postal code 28009: Retiro.
  • 60. Recommender systems :: user demographic profile. Use the CARDS_CUBE service from the BBVA API.
  • 61. Recommender systems :: user demographic profile. Use CARDS_CUBE service data. For each merchant category Z (bars, fashion, health, etc.), build a matrix in which each entry is the number of distinct credit cards with a given profile X (gender, age) that were used at a merchant of category Z in postal code Y. Where do people like me go shopping? Which restaurants are visited by people similar to me?
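The per-category matrices of slide 61 can be held as nested dictionaries. This sketch assumes the CARDS_CUBE data has already been flattened into records of (category, profile, postal code, number of cards); that record layout and the numbers are hypothetical.

```python
from collections import defaultdict


def build_matrices(records):
    """matrices[category][profile][postal_code] = number of distinct cards."""
    matrices = defaultdict(lambda: defaultdict(dict))
    for category, profile, postal_code, num_cards in records:
        matrices[category][profile][postal_code] = num_cards
    return matrices


def top_postal_codes(matrices, category, profile, k=3):
    """Postal codes most visited by a profile in a category, best first."""
    row = matrices.get(category, {}).get(profile, {})
    return sorted(row, key=row.get, reverse=True)[:k]


# Toy records (invented for illustration).
MALE_36_45 = ("male", "36-45")
RECORDS = [
    ("fashion", MALE_36_45, "28013", 40),
    ("fashion", MALE_36_45, "28001", 65),
    ("bars", MALE_36_45, "28013", 90),
    ("fashion", ("female", "26-35"), "28013", 70),
]

matrices = build_matrices(RECORDS)
print(top_postal_codes(matrices, "fashion", MALE_36_45))
```

Reading one row of a category's matrix answers "where do people like me go shopping?" for that profile.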
  • 62. Recommender systems :: user demographic profile. Example: male, age 36-45. Fashion; Bars and restaurants.
  • 63. Recommender systems :: user geographic profile. Use the CUSTOMER_ZIPCODES service in the BBVA API.
  • 64. Recommender systems :: user geographic profile. Use data from the CUSTOMER_ZIPCODES service. For each merchant category Z (bars, fashion, health, etc.), build a matrix in which each entry is the number of distinct credit cards from postal code X that were used in postal code Y at merchants of category Z. Where do people in my district go shopping? Which restaurants are visited by people living in my district?
  • 65. Recommender systems :: user geographic profile. Example: postal code 28045. Fashion; Bars and restaurants.
  • 66. Recommender systems :: combination. Geographical and demographic recommendation system.
  • 67. Recommender systems :: combination. Example: male, age 36-45, living in postal code 28045. Fashion; Bars and restaurants.
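The slides do not say how the demographic and geographic recommendations are combined. One plausible option, shown purely as an assumption, is to turn each profile's row into shares and take a weighted average. The rows and the `weight` parameter below are invented for illustration.

```python
def normalize(row):
    """Convert raw card counts into shares that sum to 1."""
    total = sum(row.values())
    return {code: count / total for code, count in row.items()} if total else {}


def combine(demographic_row, geographic_row, weight=0.5):
    """Weighted average of the two normalized rows (weight applies to demographics)."""
    d = normalize(demographic_row)
    g = normalize(geographic_row)
    codes = d.keys() | g.keys()
    return {c: weight * d.get(c, 0.0) + (1 - weight) * g.get(c, 0.0) for c in codes}


# Toy rows for one merchant category: cards per destination postal code.
DEMOGRAPHIC_ROW = {"28001": 60, "28013": 40}  # e.g. male, age 36-45
GEOGRAPHIC_ROW = {"28013": 50, "28045": 50}   # e.g. residents of 28045

combined = combine(DEMOGRAPHIC_ROW, GEOGRAPHIC_ROW)
best = max(combined, key=combined.get)
print(best, combined)
```

Other choices, such as multiplying the shares or learning the weight from data, would be equally defensible.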
  • 68. From the data to the app
  • 69. From data to the app. 1. The idea. 2. What data do I need to carry out this idea? Which services of the Challenge API do I need? Can I improve it with other information sources? 3. Analysis: distilling the idea and assessing its viability. Extracting the hidden value of analytics and models. 4. How can the user take advantage of this idea? 5. Iterate steps 2, 3 and 4 until the idea and its benefit to the user become clear. 6. Convert the value of the analysis into an application.
  • 70. Esteban Moro (@estebanmoro), Alejandro Llorente (@llorentealex).