Radhika Thesis
1. Context-Aware Middleware for
Activity Recognition
Master's Thesis Defense
Radhika Dharurkar
Advisor: Dr. Tim Finin
Committee: Dr. Anupam Joshi
Dr. Yelena Yesha
Dr. Laura Zavala
2. Overview
• Motivation
• Problem Statement
• Related work
• Approach
• Implementation
• Experiments and Results
• Contribution
• Limitations
• Future Work
• Conclusion
3. Mobile Market
• 5.3 billion mobile subscribers
(77% of the world's population)
• Smartphone market:
predicted 30% growth per year
• 85% of mobile handsets can access
the mobile web
Pictures courtesy: Mobile Youth
4. Motivation
• Enhance User Experience
o Richer notion of context that includes functional and social aspects
• Co-located social organizations
• Nearby devices and people
• Typical and inferred activities
• Roles of the people
• Devices that understand "geo-social location" and
perhaps activity
• Systems deployed by service providers and administrators raise issues of
o Collaboration
o Privacy
o Trust
5. Motivation
• Platys Project
Conceptual Place
• Tasks
• Semantic Context Modeling
• Mobility Tracking
• Collaborative Localization
• Privacy and Information Sharing
• Context Representation, reasoning, and inference
• Activity Recognition
6. Problem
• Predict the activity of the user using a smartphone
• Capture data from the different sensors present in the
smartphone (atmospheric, transitional, temporal, etc.)
• Capture information about surrounding devices
• Capture statistics about phone usage (e.g.,
battery usage, call list)
• Capture information from other sources
(e.g., calendar)
• Developed a prototype system that can predict
about ten activities with good precision
9. Related Work
• Roy Want, Veronica Falcao, Jon Gibbons. "The
Active Badge Location System" (1992)
• Guanling Chen, David Kotz. "A survey of context-
aware mobile computing research" (2000)
• Gregory D. Abowd, Anind K. Dey, Peter J. Brown,
Nigel Davies, Mark Smith, and Pete Steggles.
"Towards a better understanding of context and
context-awareness" (1999)
• Stefano Mizzaro, Elena Nazzi, and Luca Vassena.
"Retrieval of context-aware applications on mobile
devices: how to evaluate?" (2008)
10. Related Work
• Nicholas D. Lane, Emiliano Miluzzo, Hong Lu, Daniel Peebles,
Tanzeem Choudhury, and Andrew T. Campbell. "A Survey
of Mobile Phone Sensing" (2010)
• Hong Lu, Jun Yang, Zhigang Liu, Nicholas D. Lane, Tanzeem
Choudhury, Andrew T. Campbell. "The Jigsaw Continuous
Sensing Engine for Mobile Phone Applications" (2010)
• Nathan Eagle, Alex (Sandy) Pentland, and David Lazer.
"Inferring friendship network structure by using mobile
phone data" (2009)
• Locale
• "ActiveCampus". William G. Griswold, Patricia Shanahan,
Steven W. Brown, Robert T. Boyer, UCSD (2003)
• CoBrA (Context Broker Architecture). Harry Chen, Tim Finin, Anupam Joshi (2002)
12. Approach
o Automatically extract data from various data sources
using the smartphone
o Provide context modeling
• Represent context as ontologies
• Represent the contextual information in a database
o Learning and reasoning
• Supervised learning approach
• Identify the feature set
• Predict the activity of the user
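A minimal sketch of this pipeline in Python. The field names and the dictionary-based feature vector are illustrative assumptions, not the thesis implementation; the point is how one raw smartphone capture becomes a labeled instance for supervised learning.

```python
# Hypothetical field names; the actual capture schema is listed on slide 22.
def to_feature_vector(record):
    """Flatten one raw smartphone capture into (features, label)."""
    features = {
        "hour": record["timestamp_hour"],       # temporal context
        "wifi_count": len(record["wifi_ids"]),  # nearby devices
        "battery": record["battery_pct"],       # phone usage statistics
        "place": record["place"],               # user-tagged conceptual place
    }
    return features, record["activity"]         # label for supervised learning

record = {"timestamp_hour": 9, "wifi_ids": ["ap1", "ap2"],
          "battery_pct": 80, "place": "Lab", "activity": "In Meeting"}
features, label = to_feature_vector(record)
```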
19. Toy Experiment
• Data collected through a framework developed by an
eBiquity member, which stored it in a MySQL DB
• We added data from Google Calendar
• Data collected for one student and one staff
member
• Automated interpretation of calendar data
• Manual cleanup of the data
• Labeled instances to find the "Conceptual Place"
o Student: 422 instances – Home, Lab, Class, Elsewhere
o Staff member: 280 instances – Home vs. Office
21. Toy Experiment
• Data collected through a framework developed by a
senior member (Tejas), which stored it in a MySQL DB
• Captured Google Calendar data
• Data collected for one student and one staff
member
• Automated interpretation of calendar data
• Manual cleanup of the data
• Labeled instances
o Student: 422 instances – Home, Lab, Class, Elsewhere
o Staff member: 280 instances – Home, Office
22. Toy Experiment
Sr. No Captured Data
1 Device Id
2 Timestamp
3 Latitude
4 Longitude
5 Wi-Fi Status
6 Wi-Fi Count
7 Wi-Fi ID
8 Battery Status
9 Light
10 Proximity
11 Power Connected
12 User Present
13 Handset Plugged
14 Calendar Data
15 Temperature
23. Toy Experiment
[Bar chart: % accuracy for the Student and Post Doc datasets, by classifier: Naïve Bayes, J48 trees, Random Trees, Bayes Net, Random Forest.]
24. Analysis
• Only a few activities → therefore good accuracy
• Sparse data → cannot do proper training
• Presence of noise
• Artificially high decision value given to some features
• Overfitting
25. Experiment 1- Statistics
• Data collected through an application built for Android
phones by Dr. Laura Zavala
• Added Bluetooth device capture functionality
• Data collected every 12 minutes
for a duration of 1 minute (with a notification)
• The last activity is saved if the user ignores the notification
• Collects data from different sources:
o Sensors
o Nearby Wi-Fi devices
o Nearby Bluetooth devices (paired and unpaired)
o GPS coordinates, geo-location
o Call history
o User tagging for place and activity
26. Experiment 1- Statistics
• Collected data for 2 users for 2 weeks continuously
• Captured fine-grained activities
o 19 for the student
o 14 for the staff member
• Parsing of raw text data
• Cleaning up the data
• Transformation of the data into feature vectors
• Use of discretization techniques for continuous
attributes
27. Experiment 1- Accuracy
[Bar chart: % accuracy for the Student and Post Doc datasets, by classifier: Naïve Bayes, J48 trees, Random Trees, Bayes Net, Random Forest.]
28. Experiment 1- Analysis
• Comparing with the toy experiment's accuracy
o Similar accuracy for Naïve Bayes and decision trees in the toy experiment
o Big drop in accuracy for decision trees here
• In the toy experiment
o Overfitting
o Noise
o Missing data
• In this experiment
o We worked on cleanup
o Discretization of sensor values
o But attributes such as timestamp and Wi-Fi IDs were still each a single feature
31. Experiment 2- Statistics
• Collected data for users for a month continuously
• Finer-grained activities captured
o 19 for the student
• Some activities were hard to distinguish → reduced to a
small set of 9 activities for prediction
• Parsing of raw text data
• Cleaned up the data
• Use of discretization techniques for continuous attributes
• Used a "bag of words" approach for
o Wi-Fi
o Geo-location
o Bluetooth
o Timestamp
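The "bag of words" treatment of multi-valued attributes such as Wi-Fi scans can be sketched as follows (a simplified illustration, not the thesis code): each Wi-Fi ID in the vocabulary becomes one boolean feature per instance.

```python
def bag_of_words(scans, vocabulary):
    """Turn each Wi-Fi scan (a set of IDs) into one boolean feature per known ID."""
    return [{wid: (wid in seen) for wid in vocabulary} for seen in scans]

scans = [{"ap_home", "ap_neighbor"}, {"ap_lab"}]       # two captured scans
vocab = sorted({wid for s in scans for wid in s})      # feature vocabulary
vectors = bag_of_words(scans, vocab)
# vectors[0]["ap_home"] is True; vectors[1]["ap_home"] is False
```

The same idea applies to Bluetooth IDs, and (after discretization) to geo-location bins and time-of-day buckets.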
32. Experiment 2- Accuracy
[Bar chart: % accuracy by classifier (Naïve Bayes, J48 trees, Bagging + J48 trees, LibSVM, LibLinear), under a 66% percentage split and under 10-fold cross-validation.]
33. Experiment 2 - Confusion Matrix
a b c d e f g h i j k <-- classified as
677 1 0 0 0 0 4 0 0 0 2 | a = [Sleeping]
0 186 0 0 20 0 3 0 5 0 0 | b = [Walking]
0 0 27 0 0 0 0 0 0 0 0 | c = [In Meeting]
0 2 0 65 0 4 0 0 0 0 0 | d = [Playing]
0 37 0 0 37 0 0 0 4 0 0 | e = [Driving/Transporting]
0 0 0 2 0 146 1 0 0 2 0 | f = [Class-Listening]
8 0 0 0 0 2 52 2 0 0 8 | g = [Lunch]
9 0 0 0 0 0 8 11 0 0 0 | h = [Cooking]
0 11 0 0 6 0 0 0 13 0 0 | i = [Shopping]
0 2 0 0 0 5 0 0 0 7 0 | j = [Talk-Listening]
5 0 0 0 0 0 1 0 0 0 34 | k = [Watching Movie]
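Per-class precision and recall can be read directly off a confusion matrix like the one above (rows = true class, columns = predicted class). A small illustrative computation on a toy 2-class matrix, not the thesis data:

```python
def precision_recall(cm):
    """Per-class precision/recall from a square confusion matrix
    (rows = true class, columns = predicted class)."""
    n = len(cm)
    precision, recall = [], []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        precision.append(tp / predicted_k if predicted_k else 0.0)
        recall.append(tp / actual_k if actual_k else 0.0)
    return precision, recall

# Toy 2-class matrix (not the thesis data)
cm = [[8, 2],
      [1, 9]]
precision, recall = precision_recall(cm)
# precision = [8/9, 9/11], recall = [0.8, 0.9]
```

For the matrix above, e.g., most of the Driving/Transporting errors fall into the Walking column, which is the confusion discussed on slide 36.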
34. Experiment 2- Analysis
• Small set of activities analyzed
• On an individual basis
• Naïve Bayes performance dropped
o More features included
o Less functional independence
• Decision tree accuracy improved
o Bag-of-words approach
o Concept hierarchy
o Conjunctions
• In line with prior research:
1) "Physical Activity Monitoring" by Aminian and Robert
2) "Activity Recognition from User-Annotated Acceleration Data"
by Bao and Intille
• Recognition accuracy is highest for the decision tree
classifier ⇒ best fit for our model
35. Accuracy for Models
[Bar chart: % accuracy for each classification model: 11 Activities, Stationary vs. Moving, 10 Activities, In Meeting vs. In Class, Home vs. School vs. Elsewhere, Home vs. School.]
36. Small subset of Activities
• These activities do not have simple characteristics
and are easily confused with other activities
o Phone kept on the table while working, at lunch, or over coffee
o Driving vs. walking at school
• Not enough sensor data to capture some activities
• The model mostly relies on features like
o Wi-Fi IDs
o Geographic location
o Bluetooth IDs
o Time of day
• Therefore, it is hard to predict activities across users
o E.g., In Class, Cooking (the model does not rely on sound levels)
38. Evaluating Classifiers on Our Data
Machine Learning Algorithm: Evaluation Notes
Naïve Bayes classifier: Independence assumption
Support vector machines: Noise and missing values
Decision trees: Robust to errors, missing values, conjunctions
Random Trees: No pruning
Ensembles of classifiers: Reduce variance
39. Discretization
• Filters – unsupervised attribute filters
• Binning
• Concept hierarchy
• Division into intervals
• Smoothing the data
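One of the simplest unsupervised discretization filters, equal-width binning, can be sketched like this (an illustrative sketch; the sensor values are made up, and Weka's filters also offer an equal-frequency variant):

```python
def equal_width_bins(values, k):
    """Map each continuous value to one of k equal-width intervals (0..k-1)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0          # guard against a constant attribute
    return [min(int((v - lo) / width), k - 1) for v in values]

light = [3.0, 10.0, 55.0, 200.0, 480.0, 500.0]  # e.g. raw light-sensor readings
bins = equal_width_bins(light, 3)                # → [0, 0, 0, 1, 2, 2]
```

Replacing raw readings with bin indices (or named intervals like the '(-inf-28.19588]' noise ranges in the tree on slide 41) gives the concise, knowledge-level representation the slide describes.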
40. Bagging with J48
• Ensemble learning algorithm
• Averaging over bootstrap samples reduces error
from variance, especially when small differences in
the training set can produce big differences between
hypotheses
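Bagging can be sketched as follows. This is a toy illustration: a trivial one-feature majority-label learner stands in for J48, but the bootstrap-and-majority-vote structure is the same as what Weka's Bagging meta-classifier does.

```python
import random
from collections import Counter

def train_stump(data):
    """Trivial base learner: majority label per value of the single feature."""
    by_value = {}
    for x, y in data:
        by_value.setdefault(x, []).append(y)
    table = {v: Counter(ys).most_common(1)[0][0] for v, ys in by_value.items()}
    default = Counter(y for _, y in data).most_common(1)[0][0]
    return lambda x: table.get(x, default)

def bagged_classifier(data, n_models, seed=0):
    """Train each base model on a bootstrap sample; predict by majority vote."""
    rng = random.Random(seed)
    models = [train_stump([rng.choice(data) for _ in data])
              for _ in range(n_models)]
    return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]

data = [("Home", "Sleeping")] * 8 + [("ITE346", "In Meeting")] * 2
predict = bagged_classifier(data, n_models=15)
```

Each bootstrap sample sees a slightly different subset of instances, so each base tree differs; voting averages out the variance of any single tree.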
41. Example J48+Bagging
Excerpts from the bagged J48 trees (Weka output; each bootstrap sample yields a slightly different tree):

Place = Home: Sleeping (9.0/2.0)
Place = ITE346: In Meeting (1.0)
Place = Outdoors
|  G1 = False
|  |  Morning = True: Walking (5.0/2.0)
|  |  Morning = False: Driving/Transporting (17.0/2.0)
|  G1 = True: Walking (2.0)

Place = Home
|  Evening = False: Sleeping (20.0)
|  Evening = True
|  |  noise = '(-inf-28.19588]': Cooking (0.0)
|  |  noise = '(28.19588-32.71862]': Cooking (2.0)
|  |  noise = '(32.71862-inf)': Watching Movie (1.0)
Place = Restaurant: Lunch (5.0)
Place = Movie Theater: Watching Movie (2.0)
Place = Elsewhere: Walking (1.0)
Place = ITE325: Talk-Listening (4.0)
Place = ITE3338/ITE377: In Meeting (2.0)
Place = Groceries store: Shopping (1.0)

Afternoon = False
|  Evening = False
|  |  Place = Outdoors: Walking (1.0)
|  |  Place = Elsewhere: Sleeping (0.0)
|  Evening = True: Walking (4.0)
Afternoon = True
|  Wifi Id8 = True: In Meeting (3.0)
|  Wifi Id8 = False
|  |  Place = Home: Lunch (0.0)
|  |  Place = Restaurant: Lunch (4.0)
|  |  Place = Movie Theater: Watching Movie (2.0)
|  |  Place = Work/School: Working (1.0)
|  |  Place = ITE346: Lunch (0.0)
|  |  Place = Outdoors: Walking (1.0)
|  |  Place = ITE3338/ITE377: Lunch (0.0)

Wifi Id8 = True: In Meeting (6.0/1.0)
Wifi Id8 = False
|  Afternoon = False
|  |  Evening = False: Sleeping (24.0/1.0)
|  |  Evening = True: Walking (5.0)
|  Afternoon = True
|  |  Place = Work/School: Working (1.0)
|  |  Place = ITE346: Lunch (0.0)
|  |  Place = Outdoors: Walking (1.0)
|  |  Place = Home: Lunch (0.0)
|  |  Place = ITE3338/ITE377: Lunch (0.0)

loc2 = '(-inf-39.17259]': Watching Movie (2.0)
loc2 = '(39.17259-39.18528]': Sleeping (0.0)
loc2 = '(39.18528-39.19797]': Lunch (4.0)
loc2 = '(39.24873-39.26142]': Walking (9.0/2.0)
42. Contribution
• Smartphone-based mid-level activity
recognition (supervised learning approach)
• High-level notion of context
• Accuracy of 88% for 9 activities for a user
• Accuracy in line with other research
o Home vs. Work: 100%, compared to 95% accuracy in an MIT project using HMMs
o Mid-level detailed activity recognition – Bao and Intille (MIT)
o Highest recognition accuracy for the decision tree classifier – Bao and Intille
(MIT)
• General model
43. Applications
[Chart: activity distribution over a week, days (Mon–Sun) vs. activities: Walking, Working, In Meeting, Driving, Other/Idle, Watching TV, Sleeping, Cooking, Talk-Listening, Lunch, Watching Movie, Reading, Shopping, Coffee/Snacks.]
44. Applications
[Chart: weekday activity distribution over a 24-hour timeline, activities: Sleeping, Studying, Coffee/Snacks, Reading, Driving/Transporting, Walking, In Meeting, Lunch, Class-Listening, Class-Taking Notes, Chatting.]
45. Applications
[Chart: weekend activity distribution over a 24-hour timeline, activities: Sleeping, Studying, Coffee/Snacks, Reading, Walking, Transporting, Shopping, Chatting, Playing, Other.]
46. Applications
• Understand users' activity patterns
• Keep a check on time spent
o Planner
o Study schedules
o Program meetings
• Update phone settings according to context
• Recommendation systems
• Locate a specific service nearby
• Adjust the user's presence status
• Update a user's calendar
47. Limitations
• Scale of the experiments
o Duration of data capture
o Number of users capturing data
• Only information captured through the phone
• No audio or sound processing
• No training on data from different individuals for a
general model
48. Future
• Robust general model
• Multiple feature sets for different kinds of predictions
• Role management
• Rules for some ground truths or profiles
• Collaborative activity inference
• Models that incorporate sequences of activities
50. ES – Decision Trees
• Each node = an attribute
• Each leaf gives the classification result
• Root node = the attribute with the most information gain
(Claude Shannon). If there are equal numbers of yeses and
nos, there is a great deal of entropy in that attribute; in this
situation information reaches a maximum:
Info = −Σᵢ₌₁..ₘ pᵢ log pᵢ
• An attribute value with 2 yes, 3 no: I([2,3]) = −2/5 × log 2/5 − 3/5 × log 3/5
• Average the splits' entropies (weighted) and subtract from Info(whole) to get the gain
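The entropy and information-gain computation on this slide can be written out directly (base-2 logs; the [2 yes, 3 no] example reproduces I([2,3]) ≈ 0.971 bits):

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Info(S) = -sum_i p_i * log2(p_i) over the class proportions."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(attr_values, labels):
    """Gain = Info(whole) - weighted average Info of each attribute-value split."""
    n = len(labels)
    split = {}
    for v, y in zip(attr_values, labels):
        split.setdefault(v, []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in split.values())
    return entropy(labels) - remainder

i_23 = entropy(["yes", "yes", "no", "no", "no"])    # I([2,3]) ≈ 0.971 bits
gain = info_gain(["a", "a", "b", "b", "b"],
                 ["yes", "yes", "no", "no", "no"])  # perfect split: gain = I([2,3])
```

J48 places at each node the attribute whose split maximizes this gain.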
51. Classification via
Decision Trees
• Effective with nominal data
• Pruning – corrects potential overfitting
• Confidence factor = 0.25
• Minimum number of objects = 2
• Error estimation = (e+1)/(N+m)
• Reduced-error pruning = False
• Subtree raising = True
"Decision Tree Analysis using Weka" – Sam Drazin, Matt Montag
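The pruning error estimate above is a simple Laplace-style ratio; a sketch of my reading of the formula, with e = misclassified examples at the node, N = examples reaching the node, and m = total training examples (per the notes for this slide):

```python
def error_estimate(e, n, m):
    """(e + 1) / (N + m): pessimistic error at a node that misclassifies
    e of the N instances reaching it, out of m training examples in all."""
    return (e + 1) / (n + m)

# A node misclassifying 1 of 9 instances, with 100 training examples overall:
est = error_estimate(1, 9, 100)   # → 2/109
```

The +1 in the numerator penalizes nodes reached by very few instances, which is what drives pruning of small, unreliable subtrees.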
Editor's Notes
The last couple of years have seen the strongest growth in smart phones; 0.5 billion people use smart phones. Smartphones and other mobile devices have a simple notion of context, largely restricted to temporal and spatial coordinates. Service providers and enterprise administrators can deploy systems incorporating activity and relationship context to enhance the user experience, but this raises considerable collaboration, trust, and privacy issues between different service providers.
Our work is an initial step toward enabling devices themselves to represent, acquire, and use a richer notion of context that includes functional and social aspects such as co-located social organizations, nearby devices and people, typical and inferred activities, and the roles people fill in them ("geo-social locations").
The motivation of the Platys project was to represent location with such a conceptual-place notion. E.g., instead of saying "I am at 1000 Hilltop Circle", the phone can actually understand that you are at school, and with your context it can predict that you are giving a talk.
1. Predicting the location of the user using infrared technology, to forward calls to nearby phones. 2. Context-aware systems that support collecting and disseminating context, and applications that adapt to the changing context; it gives a summary of different applications, like Teleporting, Shopping Assistant, Cyberguide, etc., which use context information. But these applications use small pieces of context information and were specifically developed to suit a particular model. 3. Dey provides a survey of context-aware applications, and definitions and categories of context. 4. A framework (MoBe) to dynamically and automatically download, configure, execute, and unload applications according to the user's current context. 5. An audio tourist guide in museums.
1. Use of different sensors in mobiles. 2. Sensing applications on mobile phones: sound samples from the microphone, accelerometer data, GPS readings, and random photos. 3. MIT inferred the friendship network structure of an individual by collecting information from mobile phones over an extended period. 4. Locale manages settings based on conditions like location and time (static rules set up by the user). 5. ActiveCampus uses a person's context, like location, to help engage them in campus life. Problems: new situations don't fit the examples; the systems lack generality; how are they used in practice? Traditional information vs. a generalized context-aware application.
Except for the first and last letters, all the other letters have been rearranged, but since our brain is powerful it can find the context, and hence the data makes sense. Zimmerman explains five categories of context information: individual (natural, human, artificial, and group) entities.
This slide shows the approach we have taken to solve our problem of activity recognition. First we built an application which can capture data from various possible sources. Then we model the context by representing it as ontologies. We use a supervised learning approach to classify the data: why supervised learning is a good fit, and why we need learning for our problem.
Timestamp, Day of week, Weekend (True/False), Place, Activity, User Added (True/False), Orientation (Azimuth, Pitch, Roll), Magnetic Field, Accelerometer (Gx, Gy, Gz), Light, Proximity, Connected Wi-Fi ID, Wi-Fi devices list, 631 Wi-Fi IDs (True/False), Undefined Wi-Fi ID (True/False), Latitude, Longitude, Altitude, Location Bearing, Location Speed, Geocode, Calendar data, Paired Bluetooth devices, Unpaired Bluetooth devices.
We need to work on the input raw data. The data is captured every 12 minutes. We need to parse the input text data. Also, we capture sensor data over a short duration, assuming there can be noise, and average over it. We need to accumulate values for some multi-valued attributes like Wi-Fi IDs and Bluetooth IDs.
The transformer selects the attributes contributing to activity recognition and works on some of the attributes, like Wi-Fi IDs, Bluetooth IDs, and geocodes, which we change from a list into a range of different features.
We classify the feature vector with the help of different machine learning algorithms, like Naïve Bayes, LibSVM, decision trees, etc. We try to use some ensemble methods to obtain better predictive performance (an ensemble is a technique for combining many weak learners in an attempt to produce a strong learner). The model takes the earlier model as a reference and updates it with the new model.
Student: Home, Lab, Class, Elsewhere. Post Doc: Home, Office. Sparse data, a lot of noise, and no proper feature extraction; the data was not processed, so attributes like timestamp and Wi-Fi IDs were used as-is. Features: latitude, longitude, battery percentage, light (some nulls observed), proximity, Wi-Fi count, Wi-Fi IDs, user present (some nulls observed), and Google Calendar data. Cross-validation: 10-fold.
Student: Home, Lab, Class, Elsewhere. Post Doc: Home, Office. The data was sparse since the application was not stable. Artificially high decision value was given to some information (e.g., timestamp, Wi-Fi ID, geolocation, etc.). Strong independence assumptions played a significant role here for other algorithms like Naïve Bayes.
Class-Taking Notes, Class-Listening. Cleanup: removing attributes like timestamp, averaging sensor values, checking whether the user simply forgot to select an activity. Discretization: divide the values of a continuous attribute into intervals, which reduces and simplifies the data; such techniques gave us a concise, easy-to-use, knowledge-level representation of mining results. Not all machine learning algorithms can handle the "bag of words" situation: Wi-Fi, timestamp (morning, afternoon, etc.).
If you compare with the accuracy we had in the toy experiment, we had almost similar accuracy for Naïve Bayes and decision trees, but here we can identify a big drop in accuracy for decision trees. In the toy experiment there was overfitting; here Naïve Bayes is still overfitting. We worked on cleanup and discretization of continuous values, but we still had timestamp, Wi-Fi IDs, and similar attributes each as one feature.
Decision trees could not learn the model, since we had data like the timestamp, which is just a single value, and Wi-Fi, which is a set of Wi-Fi devices, and this set can differ for the same place.
We have a model which evaluates on attributes like the Wi-Fi devices found in the vicinity, GPS location, and geocode, so we get conflicts. Talk about the accuracies for the activities.
9 activities: Working/Studying, Sleeping, Walking, In Class, Outdoors, In Meeting, Talk-Listening, Other/Idle, Shopping. "Bag of words": Wi-Fi, timestamp (morning, afternoon, etc.). Discretization. Bagging. Naïve Bayes dropped a lot, since the earlier overfitting was removed.
Moving vs. stationary isn't that good because, for Moving, we have data from the school shuttle, which moves very slowly.
In Class / Talk-Listening: geolocation + Wi-Fi ID; our model doesn't consider noise or light values for predictions. In Laura's dataset, the cooking activity's geolocation + Wi-Fi ID cannot be mapped to the cooking activity of others. The Studying/Working activity depends on time of day + Wi-Fi ID + geolocation. This is because we model an individual user, and the training data shows such activities are evident from those main features alone. If we had trained on data from different users, where the classifier cannot identify a particular activity from only those features, then it would consider other features. Therefore we come down to only the activities which can be generalized.
Walking, Sleeping, Lunch, In Meeting, Watching Movie. 1. In Meeting and Watching Movie confused with Walking: walking in school (GPS location). 2. Watching Movie conflicts with Sleeping. 3. Watching Movie confused with Walking, since there were some instances of walking in the movie theatre; the model focuses on location. 4. In Meeting, Walking, and Watching Movie confused with Lunch (at Arundel Mills). I am not trying to overfit the data.
Naïve Bayes: independence assumption; good with completely independent features, when the dataset is small and there are many attributes; we have a mixture, and it cannot use conjunctions of attributes. J48: robust to errors and missing attribute values; supports disjunctions and conjunctions (most algorithms can't do conjunctions); J48 is good with real-valued operations. SVM (radial basis): a lot of noise and missing attributes; decision trees prune these well. Random trees: randomly chosen attributes at each node; performs no pruning. Ensembles of classifiers (bootstrap aggregating): averaging over bootstrap samples can reduce error from variance, especially when small differences in training sets can produce big differences between hypotheses. Bayesian networks: represent probability distributions that generalize the Naïve Bayes classifier and explicitly represent statements about independence.
Sensors: orientation, accelerometer, proximity, light, latitude, longitude, noise. Concept hierarchy: replace a low-level concept with a higher one, e.g., timestamp becomes morning, etc. Divides continuous-valued data into equal-frequency intervals. A concise, easy-to-use, knowledge-level representation.
Learns a hypothesis by training a number of base hypotheses and combining their predictions.
Planner; recommendation systems; calendar data, with updates to and from the calendar.
'e': misclassified examples at the given node; 'N': examples that reach the given node; 'm': all training examples. Less confidence means nodes reached by very few instances from the training data are penalized, reducing the size of the tree (filtering more). At each junction, the algorithm compares (1) the weighted error of each child node versus (2) the misclassification error if the child nodes were deleted and the decision node were assigned the class label of the majority class. Reduced-error pruning: we do not want the most accurate tree, since we do not have that good data. It splits the data into train and test sets, greedily removes the most helpful attribute, and checks accuracy; it gets a very accurate small tree, but it reduces the training data and risks overfitting. Subtree raising: a node may be moved upwards towards the root of the tree, replacing other nodes along the way.