Radhika Thesis
1. Context-Aware Middleware for
Activity Recognition
Master's Thesis Defense
Radhika Dharurkar
Advisor: Dr. Tim Finin
Committee: Dr. Anupam Joshi
Dr. Yelena Yesha
Dr. Laura Zavala
2. Overview
• Motivation
• Problem Statement
• Related work
• Approach
• Implementation
• Experiments and Results
• Contribution
• Limitations
• Future Work
• Conclusion
3. Mobile Market
• 5.3 billion mobile subscribers
(77% of the world's population)
• Smartphone market:
predicted 30% growth per year
• 85% of mobile handsets can access
the mobile web
Pictures courtesy: Mobile Youth
4. Motivation
• Enhance User Experience
o Richer notion of context that includes functional and social aspects
• Co-located social organizations
• Nearby devices and people
• Typical and inferred activities
• Roles of the people
• Devices that understand "geo-social location" and
perhaps activity
• Systems deployed by service providers and administrators raise issues of
o Collaboration
o Privacy
o Trust
5. Motivation
• Platys Project
Conceptual Place
• Tasks
• Semantic Context Modeling
• Mobility Tracking
• Collaborative Localization
• Privacy and Information Sharing
• Context Representation, reasoning, and inference
• Activity Recognition
6. Problem
• Predict the activity of the user using a smartphone
• Capture data from the different sensors present in the
smartphone (atmospheric, transitional, temporal, etc.)
• Capture information about surrounding devices
• Capture statistics about phone usage (e.g.,
battery usage, call list)
• Capture information from other sources
(e.g., calendar)
• Developed a prototype system that can predict
about ten activities with good precision
9. Related Work
• Roy Want, Veronica Falcao, Jon Gibbons. "The
Active Badge Location System" (1992)
• Guanling Chen, David Kotz. "A survey of context-
aware mobile computing research" (2000)
• Gregory D. Abowd, Anind K. Dey, Peter J. Brown,
Nigel Davies, Mark Smith, and Pete Steggles.
"Towards a better understanding of context and
context-awareness" (1999)
• Stefano Mizzaro, Elena Nazzi, and Luca Vassena.
"Retrieval of context-aware applications on mobile
devices: how to evaluate?" (2008)
10. Related Work
• Nicholas D. Lane, Emiliano Miluzzo, Hong Lu, Daniel Peebles,
Tanzeem Choudhury, and Andrew T. Campbell. "A Survey
of Mobile Phone Sensing" (2010)
• Hong Lu, Jun Yang, Zhigang Liu, Nicholas D. Lane, Tanzeem
Choudhury, Andrew T. Campbell. "The Jigsaw Continuous
Sensing Engine for Mobile Phone Applications" (2010)
• Nathan Eagle, Alex (Sandy) Pentland, and David Lazer.
"Inferring friendship network structure by using mobile
phone data" (2009)
• Locale
• "ActiveCampus". William G. Griswold, Patricia Shanahan,
Steven W. Brown, Robert T. Boyer, UCSD (2003)
• CoBrA (Context Broker Architecture). Harry Chen, Tim Finin, Anupam Joshi (2002)
12. Approach
o Automatically extract data from various data sources
using the smartphone
o Provide context modeling
• Represent context as ontologies
• Represent the contextual information in a database
o Learning and reasoning
• Supervised learning approach
• Identify the feature set
• Predict the activity of the user
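A minimal sketch of this pipeline in Python. The field names and the dictionary-based feature vector are illustrative assumptions, not the thesis implementation; the point is how one raw smartphone capture becomes a labeled instance for supervised learning.

```python
# Hypothetical field names; the actual capture schema is listed on slide 22.
def to_feature_vector(record):
    """Flatten one raw smartphone capture into (features, label)."""
    features = {
        "hour": record["timestamp_hour"],       # temporal context
        "wifi_count": len(record["wifi_ids"]),  # nearby devices
        "battery": record["battery_pct"],       # phone usage statistics
        "place": record["place"],               # user-tagged conceptual place
    }
    return features, record["activity"]         # label for supervised learning

record = {"timestamp_hour": 9, "wifi_ids": ["ap1", "ap2"],
          "battery_pct": 80, "place": "Lab", "activity": "In Meeting"}
features, label = to_feature_vector(record)
```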
19. Toy Experiment
• Data collected through a framework developed by an
eBiquity member, which stored it in a MySQL DB
• We added data from Google Calendar
• Data collected for one student and one staff
member
• Automated interpretation of calendar data
• Manual cleanup of the data
• Labeled instances to find the "Conceptual Place"
o Student: 422 instances – Home, Lab, Class, Elsewhere
o Staff member: 280 instances – Home vs. Office
21. Toy Experiment
• Data collected through a framework developed by a
senior member (Tejas), which stored it in a MySQL DB
• Captured Google Calendar data
• Data collected for one student and one staff
member
• Automated interpretation of calendar data
• Manual cleanup of the data
• Labeled instances
o Student: 422 instances – Home, Lab, Class, Elsewhere
o Staff member: 280 instances – Home, Office
22. Toy Experiment
Sr. No Captured Data
1 Device Id
2 Timestamp
3 Latitude
4 Longitude
5 Wi-Fi Status
6 Wi-Fi Count
7 Wi-Fi ID
8 Battery Status
9 Light
10 Proximity
11 Power Connected
12 User Present
13 Handset Plugged
14 Calendar Data
15 Temperature
23. Toy Experiment
[Bar chart: % accuracy for the Student and Post Doc datasets, by classifier: Naïve Bayes, J48 trees, Random Trees, Bayes Net, Random Forest.]
24. Analysis
• Only a few activities → therefore good accuracy
• Sparse data → cannot do proper training
• Presence of noise
• Artificially high decision value given to some features
• Overfitting
25. Experiment 1- Statistics
• Data collected through an application built for Android
phones by Dr. Laura Zavala
• Added Bluetooth device capture functionality
• Data collected every 12 minutes
for a duration of 1 minute (with a notification)
• The last activity is saved if the user ignores the notification
• Collects data from different sources:
o Sensors
o Nearby Wi-Fi devices
o Nearby Bluetooth devices (paired and unpaired)
o GPS coordinates, geo-location
o Call history
o User tagging for place and activity
26. Experiment 1- Statistics
• Collected data for 2 users for 2 weeks continuously
• Captured fine-grained activities
o 19 for the student
o 14 for the staff member
• Parsing of raw text data
• Cleaning up the data
• Transformation of the data into feature vectors
• Use of discretization techniques for continuous
attributes
27. Experiment 1- Accuracy
[Bar chart: % accuracy for the Student and Post Doc datasets, by classifier: Naïve Bayes, J48 trees, Random Trees, Bayes Net, Random Forest.]
28. Experiment 1- Analysis
• Comparing with the toy experiment's accuracy
o Similar accuracy for Naïve Bayes and decision trees in the toy experiment
o Big drop in accuracy for decision trees here
• In the toy experiment
o Overfitting
o Noise
o Missing data
• In this experiment
o We worked on cleanup
o Discretization of sensor values
o But attributes such as timestamp and Wi-Fi IDs were still each a single feature
31. Experiment 2- Statistics
• Collected data for users for a month continuously
• Finer-grained activities captured
o 19 for the student
• Some activities were hard to distinguish → reduced to a
small set of 9 activities for prediction
• Parsing of raw text data
• Cleaned up the data
• Use of discretization techniques for continuous attributes
• Used a "bag of words" approach for
o Wi-Fi
o Geo-location
o Bluetooth
o Timestamp
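The "bag of words" treatment of multi-valued attributes such as Wi-Fi scans can be sketched as follows (a simplified illustration, not the thesis code): each Wi-Fi ID in the vocabulary becomes one boolean feature per instance.

```python
def bag_of_words(scans, vocabulary):
    """Turn each Wi-Fi scan (a set of IDs) into one boolean feature per known ID."""
    return [{wid: (wid in seen) for wid in vocabulary} for seen in scans]

scans = [{"ap_home", "ap_neighbor"}, {"ap_lab"}]       # two captured scans
vocab = sorted({wid for s in scans for wid in s})      # feature vocabulary
vectors = bag_of_words(scans, vocab)
# vectors[0]["ap_home"] is True; vectors[1]["ap_home"] is False
```

The same idea applies to Bluetooth IDs, and (after discretization) to geo-location bins and time-of-day buckets.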
32. Experiment 2- Accuracy
[Bar chart: % accuracy by classifier (Naïve Bayes, J48 trees, Bagging + J48 trees, LibSVM, LibLinear), under a 66% percentage split and under 10-fold cross-validation.]
33. Experiment 2 - Confusion Matrix
a b c d e f g h i j k <-- classified as
677 1 0 0 0 0 4 0 0 0 2 | a = [Sleeping]
0 186 0 0 20 0 3 0 5 0 0 | b = [Walking]
0 0 27 0 0 0 0 0 0 0 0 | c = [In Meeting]
0 2 0 65 0 4 0 0 0 0 0 | d = [Playing]
0 37 0 0 37 0 0 0 4 0 0 | e = [Driving/Transporting]
0 0 0 2 0 146 1 0 0 2 0 | f = [Class-Listening]
8 0 0 0 0 2 52 2 0 0 8 | g = [Lunch]
9 0 0 0 0 0 8 11 0 0 0 | h = [Cooking]
0 11 0 0 6 0 0 0 13 0 0 | i = [Shopping]
0 2 0 0 0 5 0 0 0 7 0 | j = [Talk-Listening]
5 0 0 0 0 0 1 0 0 0 34 | k = [Watching Movie]
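Per-class precision and recall can be read directly off a confusion matrix like the one above (rows = true class, columns = predicted class). A small illustrative computation on a toy 2-class matrix, not the thesis data:

```python
def precision_recall(cm):
    """Per-class precision/recall from a square confusion matrix
    (rows = true class, columns = predicted class)."""
    n = len(cm)
    precision, recall = [], []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        precision.append(tp / predicted_k if predicted_k else 0.0)
        recall.append(tp / actual_k if actual_k else 0.0)
    return precision, recall

# Toy 2-class matrix (not the thesis data)
cm = [[8, 2],
      [1, 9]]
precision, recall = precision_recall(cm)
# precision = [8/9, 9/11], recall = [0.8, 0.9]
```

For the matrix above, e.g., most of the Driving/Transporting errors fall into the Walking column, which is the confusion discussed on slide 36.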
34. Experiment 2- Analysis
• Small set of activities analyzed
• On an individual basis
• Naïve Bayes performance dropped
o More features included
o Less functional independence
• Decision tree accuracy improved
o Bag-of-words approach
o Concept hierarchy
o Conjunctions
• In line with prior research:
1) "Physical Activity Monitoring" by Aminian and Robert
2) "Activity Recognition from User-Annotated Acceleration Data"
by Bao and Intille
• Recognition accuracy is highest for the decision tree
classifier ⇒ best fit for our model
35. Accuracy for Models
[Bar chart: % accuracy for each classification model: 11 Activities, Stationary vs. Moving, 10 Activities, In Meeting vs. In Class, Home vs. School vs. Elsewhere, Home vs. School.]
36. Small subset of Activities
• These activities do not have simple characteristics
and are easily confused with other activities
o Phone kept on the table while working, at lunch, or over coffee
o Driving vs. walking at school
• Not enough sensor data to capture some activities
• The model mostly relies on features like
o Wi-Fi IDs
o Geographic location
o Bluetooth IDs
o Time of day
• Therefore, it is hard to predict activities across users
o E.g., In Class, Cooking (the model does not rely on sound levels)
38. Evaluating Classifiers on Our Data
Machine Learning Algorithm: Evaluation Notes
Naïve Bayes classifier: Independence assumption
Support vector machines: Noise and missing values
Decision trees: Robust to errors, missing values, conjunctions
Random Trees: No pruning
Ensembles of classifiers: Reduce variance
39. Discretization
• Filters – unsupervised attribute filters
• Binning
• Concept hierarchy
• Division into intervals
• Smoothing the data
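One of the simplest unsupervised discretization filters, equal-width binning, can be sketched like this (an illustrative sketch; the sensor values are made up, and Weka's filters also offer an equal-frequency variant):

```python
def equal_width_bins(values, k):
    """Map each continuous value to one of k equal-width intervals (0..k-1)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0          # guard against a constant attribute
    return [min(int((v - lo) / width), k - 1) for v in values]

light = [3.0, 10.0, 55.0, 200.0, 480.0, 500.0]  # e.g. raw light-sensor readings
bins = equal_width_bins(light, 3)                # → [0, 0, 0, 1, 2, 2]
```

Replacing raw readings with bin indices (or named intervals like the '(-inf-28.19588]' noise ranges in the tree on slide 41) gives the concise, knowledge-level representation the slide describes.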
40. Bagging with J48
• Ensemble learning algorithm
• Averaging over bootstrap samples reduces error
from variance, especially when small differences in
the training set can produce big differences between
hypotheses
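Bagging can be sketched as follows. This is a toy illustration: a trivial one-feature majority-label learner stands in for J48, but the bootstrap-and-majority-vote structure is the same as what Weka's Bagging meta-classifier does.

```python
import random
from collections import Counter

def train_stump(data):
    """Trivial base learner: majority label per value of the single feature."""
    by_value = {}
    for x, y in data:
        by_value.setdefault(x, []).append(y)
    table = {v: Counter(ys).most_common(1)[0][0] for v, ys in by_value.items()}
    default = Counter(y for _, y in data).most_common(1)[0][0]
    return lambda x: table.get(x, default)

def bagged_classifier(data, n_models, seed=0):
    """Train each base model on a bootstrap sample; predict by majority vote."""
    rng = random.Random(seed)
    models = [train_stump([rng.choice(data) for _ in data])
              for _ in range(n_models)]
    return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]

data = [("Home", "Sleeping")] * 8 + [("ITE346", "In Meeting")] * 2
predict = bagged_classifier(data, n_models=15)
```

Each bootstrap sample sees a slightly different subset of instances, so each base tree differs; voting averages out the variance of any single tree.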
41. Example J48+Bagging
Excerpts from the bagged J48 trees (Weka output; each bootstrap sample yields a slightly different tree):

Place = Home: Sleeping (9.0/2.0)
Place = ITE346: In Meeting (1.0)
Place = Outdoors
|  G1 = False
|  |  Morning = True: Walking (5.0/2.0)
|  |  Morning = False: Driving/Transporting (17.0/2.0)
|  G1 = True: Walking (2.0)

Place = Home
|  Evening = False: Sleeping (20.0)
|  Evening = True
|  |  noise = '(-inf-28.19588]': Cooking (0.0)
|  |  noise = '(28.19588-32.71862]': Cooking (2.0)
|  |  noise = '(32.71862-inf)': Watching Movie (1.0)
Place = Restaurant: Lunch (5.0)
Place = Movie Theater: Watching Movie (2.0)
Place = Elsewhere: Walking (1.0)
Place = ITE325: Talk-Listening (4.0)
Place = ITE3338/ITE377: In Meeting (2.0)
Place = Groceries store: Shopping (1.0)

Afternoon = False
|  Evening = False
|  |  Place = Outdoors: Walking (1.0)
|  |  Place = Elsewhere: Sleeping (0.0)
|  Evening = True: Walking (4.0)
Afternoon = True
|  Wifi Id8 = True: In Meeting (3.0)
|  Wifi Id8 = False
|  |  Place = Home: Lunch (0.0)
|  |  Place = Restaurant: Lunch (4.0)
|  |  Place = Movie Theater: Watching Movie (2.0)
|  |  Place = Work/School: Working (1.0)
|  |  Place = ITE346: Lunch (0.0)
|  |  Place = Outdoors: Walking (1.0)
|  |  Place = ITE3338/ITE377: Lunch (0.0)

Wifi Id8 = True: In Meeting (6.0/1.0)
Wifi Id8 = False
|  Afternoon = False
|  |  Evening = False: Sleeping (24.0/1.0)
|  |  Evening = True: Walking (5.0)
|  Afternoon = True
|  |  Place = Work/School: Working (1.0)
|  |  Place = ITE346: Lunch (0.0)
|  |  Place = Outdoors: Walking (1.0)
|  |  Place = Home: Lunch (0.0)
|  |  Place = ITE3338/ITE377: Lunch (0.0)

loc2 = '(-inf-39.17259]': Watching Movie (2.0)
loc2 = '(39.17259-39.18528]': Sleeping (0.0)
loc2 = '(39.18528-39.19797]': Lunch (4.0)
loc2 = '(39.24873-39.26142]': Walking (9.0/2.0)
42. Contribution
• Smartphone-based mid-level activity
recognition (supervised learning approach)
• High-level notion of context
• Accuracy of 88% for 9 activities for a user
• Accuracy in line with other research
o Home vs. Work: 100%, compared to 95% accuracy in an MIT project using HMMs
o Mid-level detailed activity recognition – Bao and Intille (MIT)
o Highest recognition accuracy for the decision tree classifier – Bao and Intille
(MIT)
• General model
43. Applications
[Chart: activity distribution over a week, days (Mon–Sun) vs. activities: Walking, Working, In Meeting, Driving, Other/Idle, Watching TV, Sleeping, Cooking, Talk-Listening, Lunch, Watching Movie, Reading, Shopping, Coffee/Snacks.]
44. Applications
[Chart: weekday activity distribution over a 24-hour timeline, activities: Sleeping, Studying, Coffee/Snacks, Reading, Driving/Transporting, Walking, In Meeting, Lunch, Class-Listening, Class-Taking Notes, Chatting.]
45. Applications
[Chart: weekend activity distribution over a 24-hour timeline, activities: Sleeping, Studying, Coffee/Snacks, Reading, Walking, Transporting, Shopping, Chatting, Playing, Other.]
46. Applications
• Understand users' activity patterns
• Keep a check on time spent
o Planner
o Study schedules
o Program meetings
• Update phone settings according to context
• Recommendation systems
• Locate a specific service nearby
• Adjust the user's presence status
• Update a user's calendar
47. Limitations
• Scale of the experiments
o Duration of data capture
o Number of users capturing data
• Only information captured through the phone
• No audio or sound processing
• No training on data from different individuals for a
general model
48. Future
• Robust general model
• Multiple feature sets for different kinds of predictions
• Role management
• Rules for some ground truths or profiles
• Collaborative activity inference
• Models that incorporate sequences of activities
50. ES – Decision Trees
• Each node = an attribute
• Each leaf gives the classification result
• Root node = the attribute with the most information gain
(Claude Shannon). If there are equal numbers of yeses and
nos, there is a great deal of entropy in that attribute; in this
situation information reaches a maximum:
Info = −Σᵢ₌₁..ₘ pᵢ log pᵢ
• An attribute value with 2 yes, 3 no: I([2,3]) = −2/5 × log 2/5 − 3/5 × log 3/5
• Average the splits' entropies (weighted) and subtract from Info(whole) to get the gain
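The entropy and information-gain computation on this slide can be written out directly (base-2 logs; the [2 yes, 3 no] example reproduces I([2,3]) ≈ 0.971 bits):

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Info(S) = -sum_i p_i * log2(p_i) over the class proportions."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(attr_values, labels):
    """Gain = Info(whole) - weighted average Info of each attribute-value split."""
    n = len(labels)
    split = {}
    for v, y in zip(attr_values, labels):
        split.setdefault(v, []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in split.values())
    return entropy(labels) - remainder

i_23 = entropy(["yes", "yes", "no", "no", "no"])    # I([2,3]) ≈ 0.971 bits
gain = info_gain(["a", "a", "b", "b", "b"],
                 ["yes", "yes", "no", "no", "no"])  # perfect split: gain = I([2,3])
```

J48 places at each node the attribute whose split maximizes this gain.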
51. Classification via
Decision Trees
• Effective with nominal data
• Pruning – corrects potential overfitting
• Confidence factor = 0.25
• Minimum number of objects = 2
• Error estimation = (e+1)/(N+m)
• Reduced-error pruning = False
• Subtree raising = True
"Decision Tree Analysis using Weka" – Sam Drazin, Matt Montag
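The pruning error estimate above is a simple Laplace-style ratio; a sketch of my reading of the formula, with e = misclassified examples at the node, N = examples reaching the node, and m = total training examples (per the notes for this slide):

```python
def error_estimate(e, n, m):
    """(e + 1) / (N + m): pessimistic error at a node that misclassifies
    e of the N instances reaching it, out of m training examples in all."""
    return (e + 1) / (n + m)

# A node misclassifying 1 of 9 instances, with 100 training examples overall:
est = error_estimate(1, 9, 100)   # → 2/109
```

The +1 in the numerator penalizes nodes reached by very few instances, which is what drives pruning of small, unreliable subtrees.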
Editor's Notes
The last couple of years have seen the strongest growth in smart phones; 0.5 billion people use smart phones. Smartphones and other mobile devices have a simple notion of context, largely restricted to temporal and spatial coordinates. Service providers and enterprise administrators can deploy systems incorporating activity and relationship context to enhance the user experience, but this raises considerable collaboration, trust, and privacy issues between different service providers.
Our work is an initial step toward enabling devices themselves to represent, acquire, and use a richer notion of context that includes functional and social aspects such as co-located social organizations, nearby devices and people, typical and inferred activities, and the roles people fill in them ("geo-social locations").
The motivation of the Platys project was to represent location with such a conceptual-place notion. E.g., instead of saying "I am at 1000 Hilltop Circle", the phone can actually understand that you are at school, and with your context it can predict that you are giving a talk.
1. Predicting the location of the user using infrared technology, to forward calls to nearby phones. 2. Context-aware systems that support collecting and disseminating context, and applications that adapt to the changing context; it gives a summary of different applications, like Teleporting, Shopping Assistant, Cyberguide, etc., which use context information. But these applications use small pieces of context information and were specifically developed to suit a particular model. 3. Dey provides a survey of context-aware applications, and definitions and categories of context. 4. A framework (MoBe) to dynamically and automatically download, configure, execute, and unload applications according to the user's current context. 5. An audio tourist guide in museums.
1. Use of different sensors in mobiles. 2. Sensing applications on mobile phones: sound samples from the microphone, accelerometer data, GPS readings, and random photos. 3. MIT inferred the friendship network structure of an individual by collecting information from mobile phones over an extended period. 4. Locale manages settings based on conditions like location and time (static rules set up by the user). 5. ActiveCampus uses a person's context, like location, to help engage them in campus life. Problems: new situations don't fit the examples; the systems lack generality; how are they used in practice? Traditional information vs. a generalized context-aware application.
Except for the first and last letters, all the other letters have been rearranged, but since our brain is powerful it can find the context, and hence the data makes sense. Zimmerman explains five categories of context information: individual (natural, human, artificial, and group) entities.
This slide shows the approach we have taken to solve our problem of activity recognition. First we built an application which can capture data from various possible sources. Then we model the context by representing it as ontologies. We use a supervised learning approach to classify the data: why supervised learning is a good fit, and why we need learning for our problem.
Timestamp, Day of week, Weekend (True/False), Place, Activity, User Added (True/False), Orientation (Azimuth, Pitch, Roll), Magnetic Field, Accelerometer (Gx, Gy, Gz), Light, Proximity, Connected Wi-Fi ID, Wi-Fi devices list, 631 Wi-Fi IDs (True/False), Undefined Wi-Fi ID (True/False), Latitude, Longitude, Altitude, Location Bearing, Location Speed, Geocode, Calendar data, Paired Bluetooth devices, Unpaired Bluetooth devices.
We need to work on the input raw data. The data is captured every 12 minutes. We need to parse the input text data. Also, we capture sensor data over a short duration, assuming there can be noise, and average over it. We need to accumulate values for some multi-valued attributes like Wi-Fi IDs and Bluetooth IDs.
The transformer selects the attributes contributing to activity recognition and works on some of the attributes, like Wi-Fi IDs, Bluetooth IDs, and geocodes, which we change from a list into a range of different features.
We classify the feature vector with the help of different machine learning algorithms, like Naïve Bayes, LibSVM, decision trees, etc. We try to use some ensemble methods to obtain better predictive performance (an ensemble is a technique for combining many weak learners in an attempt to produce a strong learner). The model takes the earlier model as a reference and updates it with the new model.
Student: Home, Lab, Class, Elsewhere. Post Doc: Home, Office. Sparse data, a lot of noise, and no proper feature extraction; the data was not processed, so attributes like timestamp and Wi-Fi IDs were used as-is. Features: latitude, longitude, battery percentage, light (some nulls observed), proximity, Wi-Fi count, Wi-Fi IDs, user present (some nulls observed), and Google Calendar data. Cross-validation: 10-fold.
Student: Home, Lab, Class, Elsewhere. Post Doc: Home, Office. The data was sparse since the application was not stable. Artificially high decision value was given to some information (e.g., timestamp, Wi-Fi ID, geolocation, etc.). Strong independence assumptions played a significant role here for other algorithms like Naïve Bayes.
Class-Taking Notes, Class-Listening. Cleanup: removing attributes like timestamp, averaging sensor values, checking whether the user simply forgot to select an activity. Discretization: divide the values of a continuous attribute into intervals, which reduces and simplifies the data; such techniques gave us a concise, easy-to-use, knowledge-level representation of mining results. Not all machine learning algorithms can handle the "bag of words" situation: Wi-Fi, timestamp (morning, afternoon, etc.).
If you compare with the accuracy we had in the toy experiment, we had almost similar accuracy for Naïve Bayes and decision trees, but here we can identify a big drop in accuracy for decision trees. In the toy experiment there was overfitting; here Naïve Bayes is still overfitting. We worked on cleanup and discretization of continuous values, but we still had timestamp, Wi-Fi IDs, and similar attributes each as one feature.
Decision trees could not learn the model, since we had data like the timestamp, which is just a single value, and Wi-Fi, which is a set of Wi-Fi devices, and this set can differ for the same place.
We have a model which evaluates on attributes like the Wi-Fi devices found in the vicinity, GPS location, and geocode, so we get conflicts. Talk about the accuracies for the activities.
9 activities: Working/Studying, Sleeping, Walking, In Class, Outdoors, In Meeting, Talk-Listening, Other/Idle, Shopping. "Bag of words": Wi-Fi, timestamp (morning, afternoon, etc.). Discretization. Bagging. Naïve Bayes dropped a lot, since the earlier overfitting was removed.
Moving vs. stationary isn't that good because, for Moving, we have data from the school shuttle, which moves very slowly.
In Class / Talk-Listening: geolocation + Wi-Fi ID; our model doesn't consider noise or light values for predictions. In Laura's dataset, the cooking activity's geolocation + Wi-Fi ID cannot be mapped to the cooking activity of others. The Studying/Working activity depends on time of day + Wi-Fi ID + geolocation. This is because we model an individual user, and the training data shows such activities are evident from those main features alone. If we had trained on data from different users, where the classifier cannot identify a particular activity from only those features, then it would consider other features. Therefore we come down to only the activities which can be generalized.
Walking, Sleeping, Lunch, In Meeting, Watching Movie. 1. In Meeting and Watching Movie confused with Walking: walking in school (GPS location). 2. Watching Movie conflicts with Sleeping. 3. Watching Movie confused with Walking, since there were some instances of walking in the movie theatre; the model focuses on location. 4. In Meeting, Walking, and Watching Movie confused with Lunch (at Arundel Mills). I am not trying to overfit the data.
Naïve Bayes: independence assumption; good with completely independent features, when the dataset is small and there are many attributes; we have a mixture, and it cannot use conjunctions of attributes. J48: robust to errors and missing attribute values; supports disjunctions and conjunctions (most algorithms can't do conjunctions); J48 is good with real-valued operations. SVM (radial basis): a lot of noise and missing attributes; decision trees prune these well. Random trees: randomly chosen attributes at each node; performs no pruning. Ensembles of classifiers (bootstrap aggregating): averaging over bootstrap samples can reduce error from variance, especially when small differences in training sets can produce big differences between hypotheses. Bayesian networks: represent probability distributions that generalize the Naïve Bayes classifier and explicitly represent statements about independence.
Sensors: orientation, accelerometer, proximity, light, latitude, longitude, noise. Concept hierarchy: replace a low-level concept with a higher one, e.g., timestamp becomes morning, etc. Divides continuous-valued data into equal-frequency intervals. A concise, easy-to-use, knowledge-level representation.
Learns a hypothesis by training a number of base hypotheses and combining their predictions.
Planner; recommendation systems; calendar data, with updates to and from the calendar.
'e': misclassified examples at the given node; 'N': examples that reach the given node; 'm': all training examples. Less confidence means nodes reached by very few instances from the training data are penalized, reducing the size of the tree (filtering more). At each junction, the algorithm compares (1) the weighted error of each child node versus (2) the misclassification error if the child nodes were deleted and the decision node were assigned the class label of the majority class. Reduced-error pruning: we do not want the most accurate tree, since we do not have that good data. It splits the data into train and test sets, greedily removes the most helpful attribute, and checks accuracy; it gets a very accurate small tree, but it reduces the training data and risks overfitting. Subtree raising: a node may be moved upwards towards the root of the tree, replacing other nodes along the way.