Commuter = forens (Dutch)
Tourist statistics: if the data is enriched with a Boolean field ROAMING, one could recognize foreigners present in the Netherlands.
Observing special events: the effects of special events, such as Queen's Day, are observed through an increase or a decrease in mobile traffic at certain locations.
MCC = Mobile Country Code
MNC = Mobile Network Code; this field is scrambled, because the mobile provider could be recognised through this code
LAC = Location Area Code
CI/SAC = identifier of the cell the mobile phone was connected to
Unique-ID = scrambled version of the IMEI; identifier of the mobile phone
Event:
MT = mobile terminating; a voice call is received by the owner of the IMEI (mobile phone)
MO = mobile originating; a voice call is set up by the owner of the IMEI (mobile phone)
SMS-MT & SMS-MO = same as above, but for SMS instead of voice
G = a variable used by the provider of the data; purpose unknown and not used by Statistics Netherlands
The data is delivered in comma-separated value (CSV) files.
A Voronoi tessellation is constructed by drawing, for each pair of neighbouring points, the line that lies halfway between the two points. Doing this for all points and creating closed polygons from the intersecting lines gives the Voronoi mapping. This ensures that every location inside the polygon around a point is closer to that point than to any other point, which is roughly the same as saying that the phone connects with the nearest site. The polygons of a Voronoi tessellation also do not overlap. The technologies are handled separately because combining them in one Voronoi tessellation would result in cell areas that are too small at locations where many sites of different technologies are close together. We implicitly assume that each technology has full coverage, which in fact isn't true. Here the Voronoi tessellation introduces an error: for sites that are located far apart it creates connected polygons, while in reality there are gaps between the covered areas.
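The nearest-site assumption can be illustrated without drawing any polygons: a location lies in the Voronoi polygon of a site exactly when that site is the closest one. The sketch below is only an illustration under that assumption; the site names and coordinates are made up, and the function name `nearest_site` is hypothetical, not part of the original analysis.

```python
import math

def nearest_site(point, sites):
    """Return the id of the site closest to `point` (Euclidean distance)."""
    return min(sites, key=lambda site_id: math.dist(point, sites[site_id]))

# Hypothetical site coordinates in km for ONE technology (not real data).
# Each technology would get its own dictionary, since they are handled separately.
sites = {"A": (0.0, 0.0), "B": (4.0, 0.0), "C": (0.0, 3.0)}

# A location belongs to the Voronoi polygon of its nearest site.
for location in [(1.0, 0.5), (3.5, 0.2), (0.2, 2.5)]:
    print(location, "->", nearest_site(location, sites))
```

Because every location gets exactly one nearest site, the resulting areas cannot overlap and cannot leave gaps, which is also why the model overstates coverage for sites that lie far apart.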
Picture of a Voronoi tessellation created for the region of Eindhoven. The dots are the site locations of one technology. The thinner lines are the Voronoi polygons and the thicker lines are the borders of the municipalities (gemeenten). One can see that in areas with many site locations the Voronoi polygons are smaller, while in areas with fewer site locations the polygons are larger.
Picture showing the total call intensity in Eindhoven. Total here means the sum of the intensities over all sites in the region of Eindhoven. The three colors are for the different technologies (which we handle separately). Technology D contributes little, both in intensity and in number of sites. The graphs for U and G have similar shapes, but G has a larger amplitude. Both show the same effects, such as a peak on Queen's Day and the difference between weekdays and Saturday and Sunday.
“De Run” is an industrial area in Veldhoven (in the region of Eindhoven). “Mathildelaan” is located in the centre of Eindhoven. On Queen's Day there was an open-air party in the centre of Eindhoven, while at “De Run” most people had the day off. The call intensity at “De Run” resembles that of a weekend day. There is also a small effect on the 5th of May (Liberation Day). This effect is smaller than on Queen's Day because not everyone had the day off and there was no open-air party in the centre of Eindhoven.
Clicking the picture starts the movie and lets you pause it. TEST BEFORE THE PRESENTATION WHETHER THIS WORKS. There is one more slide after this one. To continue before the movie has finished, press the right arrow key on the keyboard or turn the mouse wheel.
Three effects are visible:
Start of day. First the centres of the big cities show increased call intensity, then the surrounding areas.
A slight decrease followed by a small increase of activity between 20:00 and 22:00 (also visible in the graph of total call intensity of Eindhoven).
Start of night. Intensity decreases to about zero, except for the big cities.
The intensities have a cutoff: values above the 98% value are set to the 98% value. The intensities are also transformed with a logarithm and divided by the surface area of the (Voronoi) polygon representing the site. Other methods will still be needed to improve this image.
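The three transformations mentioned in these notes (cutoff, division by polygon area, logarithm) could be combined roughly as follows. This is a sketch under assumptions the notes do not settle: that the "98% value" is the 98th percentile, that the order is cutoff, then area, then logarithm, and that log(1 + x) is used so that zero intensity stays defined. The function names and the numbers are hypothetical.

```python
import math

def percentile(values, q):
    """Nearest-rank percentile: smallest value with at least q% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]

def transform(intensities, areas_km2, q=98):
    """Cap at the q-th percentile, divide by polygon area, then take log(1 + x)."""
    cap = percentile(intensities, q)
    result = []
    for value, area in zip(intensities, areas_km2):
        capped = min(value, cap)           # cutoff: extreme sites no longer dominate
        density = capped / area            # calls per km2 of the Voronoi polygon
        result.append(math.log1p(density)) # assumption: log(1 + x) keeps zero at zero
    return result

# Hypothetical call counts and polygon areas for four sites.
# q=75 is used here only so the cutoff is visible on such a small example.
print(transform([0, 10, 20, 1000], [1.0, 1.0, 2.0, 1.0], q=75))
```

In the example the outlier of 1000 calls is capped at 20 before the other steps, so a single very busy site no longer compresses the rest of the colour scale.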
Transcript of "Pelt using mobile-phone_data_for_statistics-120"
Use of mobile phone data for statistical purposes
Merijn van Pelt, Edwin de Jonge, Marko Roos
Statistics Netherlands
Possible Statistics Netherlands applications
• Day-time population (versus night-time population)
• Commuter statistics (movement)
• (Economic) activity
• Observing special events
• Tourist statistics (future research)
Mobile phone data
• Source: Dutch service provider with over 5 million different IMSIs active in the data set.
• Records are gathered from Call Detail Records (CDRs), which are logged for billing purposes.
• Large data set (67.5 GB containing over 550 million records)
• Duration: April 26th - May 9th (14 days).
Cell plan model (GSM antennas)
Use a very simple model of cells:
• Assume that a phone connects with the nearest site
• Different technologies are handled separately
• Assume that cells of the same technology do not overlap
This is a simplification because:
• Cells can have different ranges and angles
• Cells of the same technology overlap
• A phone connects to a neighbouring cell if the nearest cell has no capacity left
• Multiple technologies are simultaneously present in the network
Voronoi map of sites with GSM technology of region Eindhoven
Nearest site results in Voronoi tessellation
Voronoi map of sites with GSM technology of the Netherlands
Nearest site results in Voronoi tessellation
Call intensity in Eindhoven
Characteristics of the call intensity curve
Based on call intensity it should be possible to detect weekdays, weekend days and holidays
• Simple k-means clustering in time gives:
• Saturday cluster + Liberation Day
• Sunday cluster + Queen's Day
• 2 weekday clusters (probably two because of the small data set)
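As an illustration of how days can be grouped by the shape of their call intensity curve, here is a minimal k-means sketch (Lloyd's algorithm) in plain Python. It is not the authors' implementation: the daily profiles are invented, each day is reduced to three hypothetical time buckets, and only two clusters are used to keep the example small.

```python
import math

def kmeans(points, centroids, iterations=20):
    """Plain Lloyd's algorithm; `centroids` holds the starting centres."""
    for _ in range(iterations):
        # Assignment step: each day goes to the nearest centre.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centre becomes the mean profile of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep the centre of an empty cluster
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids

# Hypothetical normalised intensities (morning, afternoon, evening) per day.
days = {
    "Mon": (0.9, 1.0, 0.5),
    "Tue": (1.0, 0.9, 0.5),
    "Sat": (0.4, 0.6, 0.6),
    "Sun": (0.3, 0.4, 0.5),
    "Holiday": (0.3, 0.5, 0.6),
}
points = list(days.values())
# Assumption: start from one weekday profile and one weekend profile.
labels, _ = kmeans(points, [days["Mon"], days["Sun"]])
print(dict(zip(days, labels)))
```

With these made-up profiles the holiday ends up in the weekend cluster, which mirrors the kind of result reported on the slide.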
Clustering of days based on call intensity
[Figure labels: Q-day, Saturday, Sunday, L-day, Saturday, Sunday]
Queen's Day (Dutch national holiday)
On special days (like Queen's Day) there is an increase in mobile activity at sites where more people are present, and a decrease in areas where people work on normal days.
This could be used to localize hotspots of large crowd concentrations and locations of decreased economic activity.
Call intensity in Eindhoven (2 sites)
On Queen's Day there is a decrease of activity at “De Run” and an increase in the centre of Eindhoven
Geospatial clustering (in Eindhoven)
Movie of intensity
Movement of mobile phone users
• Call events are logged by the provider with location information and time-stamp
• Site-vector $(s_1, s_2, s_3, \ldots, s_{n-1}, s_n)$ containing locations in chronological order
• Distance matrix (or distance network)
• Possibility to compute travel distances of mobile phone users
• Results depend on ability to construct complete tour and accuracy of distance computation
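The combination of a site-vector and a distance matrix can be sketched as follows: the travel distance of a user is the sum of the distances between consecutive sites in the vector. The example assumes straight-line distances between sites in a flat coordinate system in km, which is a simplification; the site names, coordinates and function names are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical site coordinates in km (not real data).
site_coords = {"a": (0.0, 0.0), "b": (3.0, 4.0), "c": (3.0, 10.0)}

def distance_matrix(coords):
    """Symmetric matrix of straight-line distances between all pairs of sites."""
    matrix = {(s, s): 0.0 for s in coords}
    for s, t in combinations(coords, 2):
        d = math.dist(coords[s], coords[t])
        matrix[(s, t)] = matrix[(t, s)] = d
    return matrix

def travel_distance(site_vector, matrix):
    """Sum the distances between consecutive sites of a chronological site-vector."""
    return sum(matrix[(s, t)] for s, t in zip(site_vector, site_vector[1:]))

matrix = distance_matrix(site_coords)
# Two events at the same site ("b", "b") add no distance.
print(travel_distance(["a", "b", "b", "c"], matrix))
```

A user with few logged events has a short site-vector, so this sum tends to underestimate the real route; a distance network along roads instead of straight lines would change `distance_matrix` but not the summation.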
Movement of mobile phone users
• Direct route: too few points available
• Realistic route: more points in site-vector
Movement of mobile phone users
IMSIs from total active population, day x
[Figure: histogram of day = day5 and sample = sTOT2; x-axis: distance, y-axis: Frequency]
Min.: 0.0 km
1st Qu.: 0.0 km
Median: 8.0 km
Mean: 26.5 km
3rd Qu.: 31.7 km
Max.: 413.6 km
Sample from total population
Obtaining useful statistics
• Background characteristics of mobile phone users are needed
• 1) From registers of mobile phone providers
• 2) Using methods of survey sampling
• Privacy issues!!!
With background characteristics we can make statements about the population of mobile phone users and their behaviour, and translate these into results about the Dutch population