2. Target
• Find location history data
• Implement machine learning mechanisms
• Do something interesting!
3. Location History
Where have you been last Wednesday!?
If you cannot remember, Google does!
https://maps.google.com/locationhistory
My mobile device
Extract Processing Clustering Classification Additional
4. Location History
Where will you be next Wednesday!?
If you don’t know, let me help!
5. Location History
• Download Location History from your personal Google account:
• https://maps.google.com/locationhistory
• Use JSON format
• Period: Oct 2013 – May 2016
• We have 104.5k points
{
"timestampMs" : "1462956216662",
"latitudeE7" : 326807833,
"longitudeE7" : 352925527,
"accuracy" : 66,
"activitys" : [{
"timestampMs" : "1462954920968",
"activities" : [ {
"type" : "still",
"confidence" : 31
}, {
"type" : "inVehicle",
"confidence" : 27
}, {
"type" : "onBicycle",
"confidence" : 23
}, {
"type" : "unknown",
"confidence" : 12
}, {
"type" : "onFoot",
"confidence" : 8
}, {
"type" : "walking",
"confidence" : 8
} ]
} ]
}
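The fields in the record above can be decoded with a few lines of Python. This is a minimal sketch, not the deck's own code: `parse_point` is a hypothetical helper name, and picking the single most confident activity guess is an assumption about how the activity list might be summarised.

```python
from datetime import datetime, timezone

def parse_point(record):
    """Convert one raw location record into plain Python values."""
    # timestampMs is a string holding milliseconds since the Unix epoch.
    when = datetime.fromtimestamp(int(record["timestampMs"]) / 1000, tz=timezone.utc)
    # latitudeE7 / longitudeE7 are degrees multiplied by 10**7.
    lat = record["latitudeE7"] / 1e7
    lon = record["longitudeE7"] / 1e7
    # Assumption: summarise the activity list by its most confident guess.
    top = None
    for sample in record.get("activitys", []):
        for guess in sample.get("activities", []):
            if top is None or guess["confidence"] > top["confidence"]:
                top = guess
    return {
        "time": when,
        "lat": lat,
        "lon": lon,
        "accuracy": record.get("accuracy"),
        "activity": top["type"] if top else None,
    }

# A shortened copy of the record shown on the slide.
sample = {
    "timestampMs": "1462956216662",
    "latitudeE7": 326807833,
    "longitudeE7": 352925527,
    "accuracy": 66,
    "activitys": [{
        "timestampMs": "1462954920968",
        "activities": [
            {"type": "still", "confidence": 31},
            {"type": "inVehicle", "confidence": 27},
        ],
    }],
}
point = parse_point(sample)
```

Here `point["lat"]` and `point["lon"]` come out in ordinary decimal degrees, and `point["time"]` is a timezone-aware UTC datetime.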
6. Processor
• Load the JSON file
• Keep only coordinates whose activity type is “still”
• Keep only the last 30 days
• Get the address from lat/lon
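from geopy.geocoders import Nominatim
# Nominatim requires a user_agent; this name is a placeholder
geolocator = Nominatim(user_agent="location-history-demo")
location = geolocator.reverse("%s, %s" % (latitude, longitude))
print(location.address)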
• A heavy operation: every lookup invokes a remote API
• Output: table of locations per time stamp (with full address)
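The load and filter steps above could look roughly like the sketch below. It rests on stated assumptions: the export is assumed to wrap all points in a top-level "locations" list, a record counts as “still” when its most confident activity guess is "still", and the function names are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone

def load_records(path):
    # Assumption: the export wraps all points in a top-level "locations" list.
    with open(path) as f:
        return json.load(f)["locations"]

def is_still(record):
    """True if the most confident activity guess in the record is 'still'."""
    guesses = [g for s in record.get("activitys", []) for g in s.get("activities", [])]
    if not guesses:
        return False
    return max(guesses, key=lambda g: g["confidence"])["type"] == "still"

def recent_still_points(records, now, days=30):
    """Keep 'still' records from the last `days` days as (time, lat, lon) tuples."""
    cutoff = now - timedelta(days=days)
    kept = []
    for r in records:
        when = datetime.fromtimestamp(int(r["timestampMs"]) / 1000, tz=timezone.utc)
        if when >= cutoff and is_still(r):
            kept.append((when, r["latitudeE7"] / 1e7, r["longitudeE7"] / 1e7))
    return kept
```

Passing `now` explicitly (for example `datetime.now(timezone.utc)`) keeps the 30-day window reproducible when the same export is re-processed later.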
7. Clustering
• Used the KMeans++ algorithm
• The input is an array of <lat, lon> pairs
• Focus on the last 30 days
• Total points: 4928
• Used the gmplot package (GoogleMapPlotter) to plot the results
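To illustrate what the "++" adds, here is a dependency-free sketch of k-means with k-means++ seeding. It is not the deck's code (the deck uses KMeans from sklearn.cluster, whose `init="k-means++"` option performs this kind of seeding), and it assumes at least k distinct points and treats <lat, lon> as flat 2-D coordinates.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two (lat, lon) pairs."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: later centers favour points far from earlier ones."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Weight each point by its squared distance to the nearest chosen center.
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers

def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's iterations starting from k-means++ seeds; returns the centers."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to its cluster mean (keep it if empty).
        centers = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers

# Two small, well-separated groups of made-up coordinates.
points = [(32.70, 35.30), (32.71, 35.31), (32.69, 35.29),
          (32.08, 34.78), (32.09, 34.79), (32.07, 34.77)]
centers = sorted(kmeans(points, k=2))
```

With these made-up points the two centers settle on the means of the two groups; on real data the result depends on k, which is why the slides compare k=30 and k=10.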
8. Clustering (k=30)
• KMeans++
• K = 30
• 4928 points
• Black circles are centers of clusters
• 30 days back
9. Clustering (k=10)
• KMeans++
• K = 10
• 4928 points
• Black circles are centers of clusters
• 30 days back
• Map labels: Home, Wife's parents, NIMOY!, Traffic jams!, Office
10. Clustering (k=10)
• KMeans++
• K = 10
• 45,843 points
• Black circles are centers of clusters
• 392 days back
• Map labels: Israel / Jordan, Europe, China!!!!??
12. Classification
• Features are:
• Hour of the day (24h format)
• Day of the week (1=Sun … 7=Sat)
• Month of the year (1=Jan … 12=Dec)
• The population was split into:
• Training data: the first 2/3 of the points
• Test data: the last 1/3 of the points
• Test scenarios:
• Every Tuesday I have university
• Every working day I am in Nazareth
• Every weekend I am at home most of my time
• Every night I come back home
• Several classification algorithms were tested
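The three features and the 2/3 to 1/3 split described above can be sketched as follows. The helper names are hypothetical, and the sketch assumes each point already carries a datetime in whatever time zone the analysis uses, with rows sorted by time.

```python
from datetime import datetime

def time_features(when):
    """Return (hour, day_of_week, month) with Sunday=1 ... Saturday=7."""
    # isoweekday() gives Monday=1 ... Sunday=7; shift to the Sunday-first scheme.
    day_of_week = when.isoweekday() % 7 + 1
    return (when.hour, day_of_week, when.month)

def chronological_split(rows):
    """First 2/3 of the time-ordered rows for training, last 1/3 for testing."""
    cut = (2 * len(rows)) // 3  # integer arithmetic avoids float rounding
    return rows[:cut], rows[cut:]

# 2016-05-11 was a Wednesday, so the day-of-week feature is 4.
features = time_features(datetime(2016, 5, 11, 8, 43))
train, test = chronological_split(list(range(9)))
```

Splitting by position rather than at random means the test set contains only points that are later in time than the training set.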
13. Accuracy per algorithm
[Bar chart: accuracy SCORE per algorithm, axis from 0.00 to 0.80]
• Algorithms (in chart order): Decision Tree, Naive Bayes, Random Forest, Linear Discriminant Analysis, AdaBoost, RBF SVM, Linear SVM, Nearest Neighbors
• Scores (in label order): 0.67, 0.67, 0.69, 0.55, 0.67, 0.72, 0.58, 0.66
15. Technical Information
• Prerequisite packages:
• geopy.geocoders (pip install geopy)
• Basemap is optional
• Try ‘pip install mpltoolkits’ (it looks like pip no longer has the package!)
• Or try: ‘conda install basemap’
• gmplot.GoogleMapPlotter
• ‘pip install gmplot’
• KMeans from sklearn.cluster
• Re-usable: you can run the program on your own location history
16. Future work!
• Improve performance by reducing calls to the geopy API
• Create a library that can be imported and used anywhere
• Would loading the data into MongoDB be more efficient?
• Prediction on the type of activity per hour:
• still
• inVehicle
• onBicycle
• unknown
• onFoot
• walking
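One possible way to reduce calls to the geocoding API is to cache lookups on rounded coordinates, so repeated visits to the same spot reuse an earlier answer. This is only a sketch of the idea: `make_cached_reverse` is a hypothetical helper, and the 4-decimal precision is an arbitrary choice.

```python
from functools import lru_cache

def make_cached_reverse(reverse, precision=4):
    """Wrap a reverse-geocoding callable so nearby repeats hit a local cache."""
    @lru_cache(maxsize=None)
    def lookup(lat_key, lon_key):
        # Only reached when this rounded coordinate has not been seen before.
        return reverse(lat_key, lon_key)

    def cached_reverse(lat, lon):
        # Rounding to 4 decimals groups points within about 11 m of latitude.
        return lookup(round(lat, precision), round(lon, precision))

    return cached_reverse
```

With geopy, the wrapped callable could be something like `lambda lat, lon: geolocator.reverse((lat, lon)).address`, assuming a `geolocator` has already been created.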