Big Data Workshop: Analytics and Models, by Esteban Moro and Alejandro Llorente
 

Usage Rights: CC Attribution-NonCommercial-NoDerivs License

    Presentation Transcript

    • Analytics & Models Esteban Moro Alejandro Llorente www.iic.uam.es INNOVA CHALLENGE Workshop 30 Oct
    • Analytics and Models: challenge participant “roadmap”. Data: maps, infrastructures/places, activity. Analysis and development: mining, models, app, content, visualization.
    • Summary: (1) Introduction to geo-tagged data. (2) Access to (open) geo-tagged data. (3) Example: development of a geolocalized recommender app.
    • Introduction to geo-tagged data
    • Introduction to geo-tagged data. Information: person, event, infrastructure. Geography: GPS coordinates, zone, city.
    • Geospatial Big Data sources: activity (transport), maps, satellite images, social media, sensors.
    • Geo-tagged Big Data applications. With geo-tagged data we can: measure zone/area occupation and activity; identify flows of people/money between different areas; … With those data we can build applications in: geo-social analysis; geomarketing; optimal allocation of resources; fraud detection; event detection; …
    • Geo-social analysis. Use of pervasive sensors (mobile phones, social media) to model the movement and communication of people in urban areas.
    • Geo-social analysis: geolocation study in Madrid. Location: Puerta del Sol. Total number of check-ins: 2651 (30.5 per day). Unique users in the area: 1231. Characterization of urban neighborhoods according to their social/commercial use. [Figures: number of check-ins (n_checkins) by hour, by day of the week and by place type (tipo); tables of top places and users by n_checkins.]
    • Fraud detection. Use merchant location and/or IP address in online transactions to detect fraud.
    • Geomarketing: bars, shops.
    • Optimal resource allocation. Optimize cash holdings in bank branches, minimizing the associated costs. Identify the best placement for a new shop/branch.
    • Event detection. Detect unexpected behavior using social/mobile/urban sensors.
    • Access to (open) geographical data
    • Geographical data: maps, infrastructures/places, activity.
    • Types of data. Maps. Economic/demographic data. Other types of data: Google's POIs, weather forecast. Activity: Twitter, BBVA API.
    • Maps :: Google Maps. Google Maps offers several services/APIs, each with its own restrictions and protocols. They let you define maps, routes, markers, etc. Example: get a static map (without authentication).
      Base URL: http://maps.google.com/maps/api/staticmap
      Parameters:
      center: 40.4153,-3.6875
      size: 640x640
      maptype: mobile
      format: png32
      sensor: true
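
The static map request above can be sketched in Python as follows. This only assembles the URL from the base address and parameters listed on the slide; the function name `static_map_url` is illustrative, and the service may have changed since the workshop, so treat the endpoint as an assumption rather than a guaranteed working call.

```python
from urllib.parse import urlencode

# Base URL and parameter names are copied from the slide.
BASE_URL = "http://maps.google.com/maps/api/staticmap"


def static_map_url(center, size="640x640", maptype="mobile",
                   fmt="png32", sensor="true"):
    """Build the static map URL for a (latitude, longitude) center."""
    lat, lon = center
    params = {
        "center": f"{lat},{lon}",
        "size": size,
        "maptype": maptype,
        "format": fmt,
        "sensor": sensor,
    }
    # safe="," keeps the comma in "lat,lon" readable instead of "%2C".
    return f"{BASE_URL}?{urlencode(params, safe=',')}"


url = static_map_url((40.4153, -3.6875))
print(url)
```

Fetching the image itself (for example with `urllib.request.urlopen(url)`) is left out so that the sketch does not depend on the service being reachable.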
    • Maps :: OpenStreetMap. Open, collaborative project to create and distribute free maps. Different APIs provide information about routes, points, maps, etc. A number of mapping projects (applications) are built on top of OSM for very different purposes. Example: get the route between two locations with MapQuest. Base URL: http://open.mapquestapi.com/guidance/v1/ Parameters: Key: authentication key; From: latitude and longitude of the origin, in JSON; To: latitude and longitude of the destination, in JSON.
    • Maps :: shapefiles. Geospatial vector data format for geographical information. Regions, points and paths are defined as points, lines and polygons. Each of them usually has attributes that describe it: region codes, names, population, etc. pyshp (Python): http://code.google.com/p/pyshp/ maptools (R): http://cran.r-project.org/web/packages/maptools Data: http://www.naturalearthdata.com/downloads/
    • Maps :: shapefiles. Editing and visualization of shapefiles: http://www.qgis.org
    • Maps :: Spain cartography. CartoCiudad (Ministerio de Fomento): shapefiles for each province at the municipality and postal code levels. They also include data about the urban background. http://www.cartociudad.es/portal/
    • Maps :: Madrid cartography. Nomecalles (CAM): shapefiles, POIs (museums, theaters, health services), subway stations, etc. http://www.madrid.org/nomecalles/DescargaBDTCorte.icm Resolution level: municipalities, districts, postal codes, etc.
    • Maps :: Barcelona province cartography. Plan territorial metropolitano de Barcelona (Generalitat de Catalunya). Link
    • Maps :: Barcelona city cartography. Open data gencat: Catalonia cartography. Link
    • Maps :: Barcelona city cartography. Plan territorial metropolitano de Barcelona (Generalitat de Catalunya). Link. This website also has data about mobility, economic development, population, etc. at the district level. Nothing at this level of detail is available for Madrid. Solution: use other data sources to estimate it (see below).
    • Demographic/Economic data :: Spain. Demographic data: Instituto Nacional de Estadística (INE), census by municipality. Link. Economic data: Servicio Público de Empleo Estatal (SEPE), unemployment by municipality. Link
    • Demographic/Economic data :: Madrid. Madrid city: Madrid City Council database, http://www-2.munimadrid.es/CSE6/jsps/menuBancoDatos.jsp (population by district, neighborhood, etc.). Madrid region: Comunidad de Madrid database, http://www.madrid.org/desvan/Inicio.icm?enlace=almudena (population by municipality; economic data by municipality).
    • Demographic/Economic data :: Barcelona. Barcelona city: Departament d’Estadística, http://www.bcn.cat/estadistica/castella/ (population by district; unemployment by district). Catalonia region: Idescat (Institut d’Estadística de Catalunya), http://www.idescat.cat/es/ (population by municipality; economic data by municipality).
    • Other data sources :: Google Points of Interest. Google API Console.
    • Other data sources :: Google Points of Interest. Points of interest around Puerta del Sol (Madrid).
      Service 1: Places Search. Parameters: location: 40.417,-3.703; radius: 1000
      Service 2: Places Details. Parameters: reference: the code of the place
    • Other data sources :: Weather forecast. GFS: Global Forecast System, accessed through the OPeNDAP protocol. Python implementation: pydap. Query format:
      SERVER = http://nomads.ncep.noaa.gov:9090/dods/gfs_hd/
      DATE = YYYYMMDD
      HOUR = HH
      VAR = weather metric (tmp2m, ugrd10m, pressfc, …)
      LAT = latitude index interval, e.g. [259:263] (0.5º steps from the South Pole)
      LON = longitude index interval, e.g. [710:714] (0.5º steps from Greenwich)
      QUERY = SERVER + "gfs_hd" + DATE + "/gfs_hd_" + HOUR + "z.dods?" + VAR + "[0:0]" + LAT + LON
      dataset = open_dods(QUERY)
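
As a rough illustration of the query format above, the sketch below converts coordinates into 0.5º grid indices and assembles the query string. The helper names (`lat_index`, `lon_index`, `gfs_query`) are made up for this example, and the assumption that longitudes are counted eastwards from Greenwich over 0 to 360º is inferred from the index ranges on the slide. The resulting string is what would be passed to pydap's `open_dods`; that call is omitted here because it needs the third-party library and network access.

```python
from datetime import date

SERVER = "http://nomads.ncep.noaa.gov:9090/dods/gfs_hd/"  # from the slide
STEP = 0.5  # grid resolution in degrees, as stated on the slide


def lat_index(lat):
    """Grid row for a latitude, counting 0.5º steps from the South Pole."""
    return round((lat + 90.0) / STEP)


def lon_index(lon):
    """Grid column for a longitude, counting 0.5º steps east of Greenwich."""
    return round((lon % 360.0) / STEP)


def gfs_query(day, hour, var, lat_range, lon_range):
    """Assemble the OPeNDAP query string described on the slide."""
    stamp = day.strftime("%Y%m%d")
    return (
        f"{SERVER}gfs_hd{stamp}/gfs_hd_{hour:02d}z.dods?"
        f"{var}[0:0][{lat_range[0]}:{lat_range[1]}][{lon_range[0]}:{lon_range[1]}]"
    )


# A point near Madrid falls in the middle of the intervals used on the slide.
row, col = lat_index(40.5), lon_index(-4.0)
query = gfs_query(date(2013, 10, 30), 0, "tmp2m",
                  (row - 2, row + 2), (col - 2, col + 2))
print(query)
```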
    • Activity :: data from Twitter API. Developers webpage: http://dev.twitter.com Credentials: consumer key, consumer secret, access token, access token secret.
    • Activity :: data from Twitter API. OAuth authentication: consumer key, consumer secret, access token, access token secret. REST API: several queries with parameters; the number of requests is limited. Streaming API: only one query (with parameters); requests are not time-limited.
    • Activity :: data from Twitter API. Streaming API example: geolocalized tweets in the Madrid region. API service: POST statuses/filter. Parameters: locations: -4.59, 39.90, -3.04, 41.17
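
The `locations` value is a bounding box given as longitude/latitude pairs, south-west corner first. A minimal sketch of how the box from the slide could be formatted and used to double-check incoming coordinates is shown below; `in_bbox` and `locations_param` are illustrative helpers, not part of any Twitter client library.

```python
# Bounding box from the slide: west, south, east, north (lon, lat, lon, lat).
MADRID_BBOX = (-4.59, 39.90, -3.04, 41.17)


def locations_param(bbox):
    """Format a bounding box as the comma-separated 'locations' value."""
    return ",".join(f"{value:.2f}" for value in bbox)


def in_bbox(lon, lat, bbox=MADRID_BBOX):
    """Return True if the point lies inside the bounding box."""
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north


print(locations_param(MADRID_BBOX))
print(in_bbox(-3.7038, 40.4168))  # a point in central Madrid
print(in_bbox(2.17, 41.39))       # a point in Barcelona
```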
    • Activity :: data from Twitter API. Streaming API. As we said before, Madrid has no data for administrative zones below the municipality level, but we can estimate some of them with Twitter. Example: population by postal code. 1. Round geographical coordinates to the 3rd decimal place (square cells of roughly 100 meters per side). 2. Find the postal code each user visits most and define it as his/her residence; count the number of residents per postal code. 3. Visualize.
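
The three steps above can be sketched as follows. The tweet tuples and the `cell_to_postal_code` lookup are invented for illustration; in practice the lookup would come from a postal code shapefile. Cells are stored as integer thousandths of a degree, which is equivalent to rounding to the 3rd decimal place but safer to use as dictionary keys.

```python
from collections import Counter, defaultdict


def cell(lat, lon):
    """Round coordinates to the 3rd decimal, kept as integer thousandths."""
    return (round(lat * 1000), round(lon * 1000))


def residents_by_postal_code(tweets, cell_to_postal_code):
    """Count users per postal code, using each user's most visited code."""
    visits = defaultdict(Counter)
    for user, lat, lon in tweets:
        code = cell_to_postal_code.get(cell(lat, lon))
        if code is not None:  # skip tweets outside the known cells
            visits[user][code] += 1

    residents = Counter()
    for user, counts in visits.items():
        home, _ = counts.most_common(1)[0]  # most visited postal code
        residents[home] += 1
    return residents


# Hypothetical data: (user, latitude, longitude).
tweets = [
    ("ana", 40.41694, -3.70312),
    ("ana", 40.41712, -3.70288),
    ("ana", 40.41502, -3.68401),
    ("luis", 40.41489, -3.68377),
    ("eva", 40.50000, -3.60000),
]
cell_to_postal_code = {(40417, -3703): "28013", (40415, -3684): "28009"}

residents = residents_by_postal_code(tweets, cell_to_postal_code)
print(dict(residents))
```

Step 3 (visualization) would then color each postal code polygon by its resident count.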
    • Activity :: data from Twitter API. Streaming API.
    • Activity :: data from BBVA API. https://www.centrodeinnovacionbbva.com/signup
    • Activity :: data from BBVA API. https://developer.bbva.com/panel
    • Activity :: data from BBVA API. Getting the authentication data:
      1. With the APP_ID and APP_KEY, generate the authorization code by concatenating both strings with ":" and encoding the result in base64.
      2. Add this authorization code to the HTTP request header.
      Example:
      APP_ID = "iic_formacion_innovachallenge"
      APP_KEY = "0f1d750a5baea6c7022452d0d2ece01fc5901ad7"
      str_to_encode = "iic_formacion_innovachallenge:0f1d750a5baea6c7022452d0d2ece01fc5901ad7"
      auth = strToBase64(str_to_encode)
      Request = HttpRequest(SERVICE, PARAMETERS, header = {'Authorization': auth})
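
In Python, the pseudocode above might look like the sketch below, using only the standard library. The credentials are placeholders and `authorization_code` is an illustrative name. The slide passes the encoded string directly as the header value; whether the service also expects a scheme prefix is not stated there, so none is added.

```python
import base64


def authorization_code(app_id, app_key):
    """Join APP_ID and APP_KEY with ':' and encode the result in base64."""
    raw = f"{app_id}:{app_key}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


# Placeholder credentials; replace with the values from the developer panel.
auth = authorization_code("my_app_id", "my_app_key")
headers = {"Authorization": auth}
print(headers)
```

The `headers` dictionary can then be attached to the request with whatever HTTP client the application uses.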
    • Activity :: data from BBVA API. Economic flows from Puerta del Sol. API service: customer_zipcodes. Parameters: date_min: 201304; date_max: 201304; zipcode: 28013; by: cards; group_by: month
    • Example: development of a geolocalized recommender app.
    • Recommender systems :: Introduction. Objective: recommend to users which areas to visit according to their profile, residence, preferences, etc., using information about what similar users do. Data used: 1. Twitter data. 2. API Innova Challenge: CARDS_CUBE. 3. API Innova Challenge: CUSTOMER_ZIPCODES.
    • Recommender systems :: user language. Use Twitter data to: 1. Find out what people are talking about in each city area. 2. Analyze the user's language on Twitter. 3. Compare the user's language with each area's language and recommend the most similar areas to the user.
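
One simple way to compare a user's language with an area's language is cosine similarity between word-count vectors. The slides do not say which measure was actually used, so this choice is an assumption; the tweets and postal codes below are invented for illustration.

```python
import math
import re
from collections import Counter


def word_counts(texts):
    """Count lower-cased words over a list of texts."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"\w+", text.lower()))
    return counts


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(user_tweets, area_tweets, top=3):
    """Rank areas by how similar their language is to the user's."""
    user = word_counts(user_tweets)
    scores = {area: cosine(user, word_counts(texts))
              for area, texts in area_tweets.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top]


# Hypothetical tweets grouped by postal code.
area_tweets = {
    "28013": ["tapas and theatre in sol", "shopping in gran via"],
    "28009": ["running in the park", "boats on the lake in retiro park"],
}
user_tweets = ["morning running", "love the park"]
ranking = recommend(user_tweets, area_tweets)
print(ranking)
```

A real system would likely remove stop words or weight terms (for example with TF-IDF) before comparing, since common words such as "the" otherwise dominate the score.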
    • Recommender systems :: user language. Postal code 28013: Madrid city center.
    • Recommender systems :: user language. Postal code 28009: Retiro.
    • Recommender systems :: user demographic profile. Use the CARDS_CUBE service from the BBVA API.
    • Recommender systems :: user demographic profile. Use CARDS_CUBE service data. For each merchant category Z (bars, fashion, health, etc.), build a matrix in which each entry is the number of different credit cards with a given profile X (gender, age) that were used in postal code Y at a merchant of category Z. Where do people like me go shopping? Which restaurants are visited by people similar to me?
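
A minimal sketch of the matrix described above, stored as nested dictionaries. The row fields (`gender`, `age`, `zipcode`, `category`, `num_cards`) are hypothetical names, not the actual CARDS_CUBE response format, and the numbers are invented. The sketch assumes one row per (profile, postal code, category) cell; summing rows from several periods would count the same card more than once.

```python
from collections import defaultdict


def build_matrices(rows):
    """Return matrices[category][profile][zipcode] -> number of cards."""
    matrices = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for row in rows:
        profile = (row["gender"], row["age"])
        matrices[row["category"]][profile][row["zipcode"]] += row["num_cards"]
    return matrices


def top_zipcodes(matrices, category, profile, n=3):
    """Postal codes most visited by a profile for a merchant category."""
    counts = matrices[category][profile]
    return sorted(counts, key=counts.get, reverse=True)[:n]


# Hypothetical aggregated rows.
rows = [
    {"gender": "M", "age": "36-45", "zipcode": "28013", "category": "fashion", "num_cards": 120},
    {"gender": "M", "age": "36-45", "zipcode": "28001", "category": "fashion", "num_cards": 310},
    {"gender": "F", "age": "26-35", "zipcode": "28013", "category": "fashion", "num_cards": 450},
    {"gender": "M", "age": "36-45", "zipcode": "28012", "category": "bars", "num_cards": 200},
]
matrices = build_matrices(rows)
best = top_zipcodes(matrices, "fashion", ("M", "36-45"))
print(best)
```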
    • Recommender systems :: user demographic profile. Example: male, age 36-45. Panels: fashion; bars and restaurants.
    • Recommender systems :: user geographic profile. Use the CUSTOMER_ZIPCODES service from the BBVA API.
    • Recommender systems :: user geographic profile. Use data from the CUSTOMER_ZIPCODES service. For each merchant category Z (bars, fashion, health, etc.), build a matrix in which each entry is the number of different credit cards from postal code X that were used in postal code Y at merchants of category Z. Where do people in my district go shopping? Which restaurants are visited by people living in my district?
    • Recommender systems :: user geographic profile. Example: postal code 28045. Panels: fashion; bars and restaurants.
    • Recommender systems :: combination. Geographical and demographic recommendation system.
    • Recommender systems :: combination. Example: male, age 36-45, living in postal code 28045. Panels: fashion; bars and restaurants.
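
The slides do not say how the two profiles are combined. One simple option, shown here purely as an assumption, is to turn each set of card counts into shares and multiply the demographic and geographic shares for every postal code that appears in both. The counts below are invented.

```python
def normalize(counts):
    """Turn raw card counts into shares that sum to 1."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()} if total else {}


def combine(demographic, geographic):
    """Rank postal codes by the product of both shares (an assumed rule)."""
    demo, geo = normalize(demographic), normalize(geographic)
    scores = {code: demo[code] * geo[code] for code in demo.keys() & geo.keys()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical counts for one merchant category:
# cards of the user's profile, and cards from the user's home postal code.
demographic = {"28001": 310, "28013": 120, "28004": 70}
geographic = {"28045": 400, "28013": 350, "28001": 250}

ranking = combine(demographic, geographic)
print(ranking)
```

Postal codes missing from either profile are dropped by this rule; a weighted sum would keep them, which is another reasonable design choice.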
    • From the data to the app
    • From data to the app. 1. The idea. 2. What data do I need to carry out this idea? Which services of the Challenge API do I need? Can I improve it with other information sources? 3. Analysis: distilling the idea and assessing its viability; extracting the hidden value with analytics and models. 4. How can the user benefit from this idea? 5. Iterate steps 2, 3 and 4 until both the idea and the benefit to the user become clear. 6. Turn the value of the analysis into an application.