Context-Aware Middleware for Activity Recognition

Master's Thesis Defense
Radhika Dharurkar

Advisor: Dr. Tim Finin
Committee: Dr. Anupam Joshi
           Dr. Yelena Yesha
           Dr. Laura Zavala
Overview
•   Motivation
•   Problem Statement
•   Related work
•   Approach
•   Implementation
•   Experiments and Results
•   Contribution
•   Limitations
•   Future Work
•   Conclusion

Mobile Market

• 5.3 billion mobile subscribers (77% of the world's population)
• Smart phone market: predicted 30% growth per year
• 85% of mobile handsets can access the mobile web

Pictures courtesy: Mobile Youth
Motivation

• Enhance the user experience
   o A richer notion of context that includes functional and social aspects
      • Co-located social organizations
      • Nearby devices and people
      • Typical and inferred activities
      • The roles people fill

• Devices that understand "geo-social location" and perhaps activity

• Systems deployed by service providers and administrators raise issues of
   o Collaboration
   o Privacy
   o Trust
Motivation

• Platys Project: the notion of "conceptual place"

• Tasks
   • Semantic context modeling
   • Mobility tracking
   • Collaborative localization
   • Privacy and information sharing
   • Context representation, reasoning, and inference
   • Activity recognition
Problem

• Predict the user's activity using a smart phone
• Capture data from the different sensors on the phone (atmospheric, transitional, temporal, etc.)
• Capture information about surrounding devices
• Capture phone-usage statistics (e.g., battery usage, call list)
• Capture information from other sources (e.g., the calendar)
• We developed a prototype system that can predict roughly 10 activities with good precision.
Platys Ontology




Activity Hierarchy




Related Work

• Roy Want, Veronica Falcao, and Jon Gibbons. "The Active Badge Location System" (1992)
• Guanling Chen and David Kotz. "A survey of context-aware mobile computing research" (2000)
• Gregory D. Abowd, Anind K. Dey, Peter J. Brown, Nigel Davies, Mark Smith, and Pete Steggles. "Towards a better understanding of context and context-awareness" (1999)
• Stefano Mizzaro, Elena Nazzi, and Luca Vassena. "Retrieval of context-aware applications on mobile devices: how to evaluate?" (2008)
Related Work

• Nicholas D. Lane, Emiliano Miluzzo, Hong Lu, Daniel Peebles, Tanzeem Choudhury, and Andrew T. Campbell. "A Survey of Mobile Phone Sensing" (2010)
• Hong Lu, Jun Yang, Zhigang Liu, Nicholas D. Lane, Tanzeem Choudhury, and Andrew T. Campbell. "The Jigsaw Continuous Sensing Engine for Mobile Phone Applications" (2010)
• Nathan Eagle, Alex (Sandy) Pentland, and David Lazer. "Inferring friendship network structure by using mobile phone data" (2009)
• Locale
• William G. Griswold, Patricia Shanahan, Steven W. Brown, and Robert T. Boyer. "ActiveCampus", UCSD (2003)
• CoBrA (Context Broker Architecture). Harry Chen, Tim Finin, and Anupam Joshi (2002)
Background: Context

Pictures courtesy:
1) Mobile Youth
2) Zimmermann, A., Lorenz, A., Oppermann, R.: An operational definition of context.
Approach

o Automatically extract data from various data sources with the help of the smart phone

o Context modeling
   • Represent context as ontologies
   • Store the contextual information in a database

o Learning and reasoning
   • Supervised learning approach (sketched below)
   • Identify the feature set
   • Predict the user's activity
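To make the learning-and-reasoning step concrete, here is a minimal sketch of training and evaluating a classifier over the extracted feature vectors with the Weka Java API (the experiments below report results for Weka classifiers such as J48). The file name activity.arff and the assumption that the activity label is the last attribute are illustrative, not taken from the thesis.

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class TrainActivityClassifier {
    public static void main(String[] args) throws Exception {
        // Load the labeled feature vectors (hypothetical file name).
        Instances data = new DataSource("activity.arff").getDataSet();
        // The activity label is assumed to be the last attribute.
        data.setClassIndex(data.numAttributes() - 1);

        // J48 is Weka's implementation of the C4.5 decision tree.
        J48 tree = new J48();

        // Estimate accuracy with 10-fold cross-validation.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1));
        System.out.printf("Accuracy: %.2f%%%n", eval.pctCorrect());
        System.out.println(eval.toMatrixString("Confusion matrix:"));
    }
}
```

Swapping in another Weka classifier (NaiveBayes, BayesNet, RandomForest, ...) only changes the one constructor line, which is how the classifier comparisons below can be run.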
Architecture




Data Collection

[Figure: sensor values captured by the phone, with user tagging of place and activity]
Data Collection




Data Extraction and Cleanup
Extracting Features




Classification




Toy Experiment

• Data collected through a framework developed by an eBiquity member, which stored it in a MySQL database
• We added Google Calendar data
• Data collected for one student and one staff member
• Automated interpretation of the calendar data
• Manual cleanup of the data
• Labeled instances to find "conceptual place"
   o Student: 422 instances – Home, Lab, Class, Elsewhere
   o Staff member: 280 instances – Home vs. Office
Google Calendar




Toy Experiment

• Data collected through a framework developed by a senior member (Tejas), which stored it in a MySQL database
• Captured Google Calendar data
• Data collected for one student and one staff member
• Automated interpretation of the calendar data
• Manual cleanup of the data
• Labeled instances
   o Student: 422 instances – Home, Lab, Class, Elsewhere
   o Staff member: 280 instances – Home, Office
Toy Experiment

Sr. No   Captured Data
1        Device Id
2        Timestamp
3        Latitude
4        Longitude
5        Wi-Fi Status
6        Wi-Fi Count
7        Wi-Fi ID
8        Battery Status
9        Light
10       Proximity
11       Power Connected
12       User Present
13       Handset Plugged
14       Calendar Data
15       Temperature
Toy Experiment

[Bar chart: % accuracy for the student and post-doc datasets across five classifiers – Naïve Bayes, J48 trees, Random Trees, Bayes Net, and Random Forest]
Analysis

• Only a few activities, therefore good accuracy

• Sparse data, so the classifiers could not be trained properly

• Presence of noise

• Artificially high decision value given to some attributes (e.g., timestamp, Wi-Fi ID, geolocation)

• Overfitting
Experiment 1 – Statistics

• Data collected through an application built for an Android phone by Dr. Laura Zavala
• We added Bluetooth device capture functionality
• Data collected every 12 minutes for a duration of 1 minute, with a notification prompting the user (the duty cycle is sketched below)
• If the user ignores the notification, the last tagged activity is saved
• Collects data from
   o Sensors
   o Nearby Wi-Fi devices
   o Nearby Bluetooth devices (paired and unpaired)
   o GPS coordinates, geo-location
   o Call history
   o User tagging of place and activity
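The 12-minute duty cycle can be pictured as a simple scheduler: wake up every 12 minutes, sample for one minute, then prompt the user for a tag. This plain-Java sketch only illustrates the timing; the real application was an Android app, and collectSample/promptUser are hypothetical placeholders.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SamplingLoop {
    private static final long PERIOD_MINUTES = 12; // wake up every 12 minutes
    private static final int WINDOW_SECONDS = 60;  // then sample for 1 minute

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            // Sample once per second for the one-minute window.
            for (int s = 0; s < WINDOW_SECONDS; s++) {
                collectSample();
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
            promptUser(); // ask for a place/activity tag; if ignored, keep the last tag
        }, 0, PERIOD_MINUTES, TimeUnit.MINUTES);
    }

    // Hypothetical placeholders for the real app's sensor reads and notification.
    private static void collectSample() { /* read sensors, Wi-Fi, Bluetooth, GPS, ... */ }
    private static void promptUser()    { /* raise a tagging notification */ }
}
```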
Experiment 1 – Statistics

• Collected data for 2 users for 2 weeks continuously
• Captured fine-grained activities
   o 19 for the student
   o 14 for the staff member

• Parsed the raw text data
• Cleaned up the data
• Transformed the data into feature vectors
• Applied discretization techniques to continuous attributes
Experiment 1 – Accuracy

[Bar chart: % accuracy for the student and post-doc datasets across five classifiers – Naïve Bayes, J48 trees, Random Trees, Bayes Net, and Random Forest]
Experiment 1 – Analysis

• Comparison with the toy experiment's accuracy
   o Similar accuracy for Naïve Bayes and decision trees in the toy experiment
   o A big drop in accuracy for decision trees here

• In the toy experiment
   o Overfitting
   o Noise
   o Missing data

• In this experiment
   o We worked on cleanup
   o Discretization of sensor values
   o Timestamp, Wi-Fi IDs, and similar attributes are still each treated as a single feature
Confused Activities (Student Data)

Total   Main Activity               Confused With
54      Coffee/Snacks               Working/Studying 12; Sleeping 5
218     Working/Studying            Coffee/Snacks 5; Sleeping 8; Chatting 8
39      Reading                     Working/Studying 19; Sleeping 4
26      Cleaning                    Working/Studying 10; Sleeping 2
195     Sleeping                    Working/Studying 9
17      Cooking                     Working/Studying 5; Sleeping 3; Cleaning 2
49      Chatting/Talking on Phone   Working/Studying 14; Sleeping 2; Coffee/Snacks 2
6       Class-Listening             Class-TakingNotes 2
3       Talk-Listening              Class-TakingNotes 1; Working/Studying 1
1       Watching Movie              Sleeping 1
3       Dinner                      Working/Studying 3
9       Watching TV                 Working/Studying 3; Sleeping 6
1       Shopping                    Working/Studying 1
Confused Activities (Staff Data)

Total   Main Activity      Confused With
525     Working/Studying   Other/Idle 9; Sleeping 4; Watching TV 6
9       Lunch              Working/Studying 3; Other/Idle 1
72      Sleeping           Working/Studying 19; Other/Idle 2
11      Cooking            Working/Studying 3; Sleeping 2
78      Other/Idle         Working/Studying 13; Walking 1
18      Watching TV        Working/Studying 7; Other/Idle 1
2       Shopping           Cooking 1
Experiment 2 – Statistics

• Collected data for users for a month continuously
• Captured fine-grained activities
   o 19 for the student
• Some activities were hard to distinguish, so we reduced them to a small set of 9 activities for prediction
• Parsed the raw text data
• Cleaned up the data
• Applied discretization techniques to continuous attributes
• Used a "bag of words" approach (sketched below) for
   o Wi-Fi
   o Geo-location
   o Bluetooth
   o Timestamp
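The "bag of words" idea treats a multi-valued reading, such as the set of visible Wi-Fi IDs, like a document: one boolean feature per ID seen during training, plus a flag for unknown IDs (the "undefined Wi-Fi ID" attribute mentioned in the speaker notes). A minimal sketch with made-up IDs:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class BagOfWordsFeatures {
    public static void main(String[] args) {
        // Vocabulary of Wi-Fi IDs seen during training (hypothetical values).
        Set<String> vocabulary = new LinkedHashSet<>(
                Arrays.asList("wifi-id-1", "wifi-id-2", "wifi-id-3"));

        // IDs visible in one 1-minute sample.
        List<String> observed = Arrays.asList("wifi-id-2", "unknown-ap");

        // One boolean feature per known ID, plus one for "any unseen ID".
        boolean[] features = new boolean[vocabulary.size() + 1];
        int i = 0;
        for (String id : vocabulary) {
            features[i++] = observed.contains(id);
        }
        features[i] = !vocabulary.containsAll(observed); // undefined Wi-Fi ID flag

        System.out.println(Arrays.toString(features)); // [false, true, false, true]
    }
}
```

The same encoding applies to Bluetooth IDs, and the timestamp is handled analogously by mapping it onto nominal buckets (morning, afternoon, evening).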
Experiment 2 – Accuracy

[Bar chart: % accuracy across five classifiers – Naïve Bayes, J48 trees, Bagging + J48 trees, LibSVM, and LibLinear – evaluated with a 66% percentage split and 10-fold cross-validation]
Experiment 2 – Confusion Matrix

  a   b   c   d   e   f   g   h   i   j   k   <-- classified as
677   1   0   0   0   0   4   0   0   0   2 | a = [Sleeping]
  0 186   0   0  20   0   3   0   5   0   0 | b = [Walking]
  0   0  27   0   0   0   0   0   0   0   0 | c = [In Meeting]
  0   2   0  65   0   4   0   0   0   0   0 | d = [Playing]
  0  37   0   0  37   0   0   0   4   0   0 | e = [Driving/Transporting]
  0   0   0   2   0 146   1   0   0   2   0 | f = [Class-Listening]
  8   0   0   0   0   2  52   2   0   0   8 | g = [Lunch]
  9   0   0   0   0   0   8  11   0   0   0 | h = [Cooking]
  0  11   0   0   6   0   0   0  13   0   0 | i = [Shopping]
  0   2   0   0   0   5   0   0   0   7   0 | j = [Talk-Listening]
  5   0   0   0   0   0   1   0   0   0  34 | k = [Watching Movie]
Experiment 2 – Analysis

• A small set of activities analyzed
• Modeled on an individual basis
• Naïve Bayes performance dropped
   o More features included
   o Weaker independence among features
• Decision tree accuracy improved
   o Bag-of-words approach
   o Concept hierarchy
   o Conjunctions of attributes
  In line with prior research:
  1) "Physical Activity Monitoring" by Aminian and Robert
  2) "Activity Recognition from User-Annotated Acceleration Data" by Bao and Intille
• Recognition accuracy was highest for the decision tree classifier, making it the best fit for our model
Accuracy for Models

[Bar chart: % accuracy (roughly 82–100%) for six classification tasks – 11 Activities, Stationary vs. Moving, 10 Activities, In Meeting vs. In Class, Home vs. School vs. Elsewhere, and Home vs. School]
Small Subset of Activities

• These activities do not have simple characteristics and are easily confused with other activities
   o Phone kept on the table while working, at lunch, or over coffee
   o Driving and walking at school

• There is not enough sensor data to capture some activities
• The model mostly relies on features like
   o Wi-Fi IDs
   o Geographic location
   o Bluetooth IDs
   o Time of day

• It is therefore hard to predict activities across users
   o E.g., In Class and Cooking (the model does not rely on sound levels for prediction)
General Model




Classifiers Evaluating Our Data

Machine Learning Algorithm   Evaluation Notes
Naïve Bayes classifier       Independence assumption
Support vector machines      Sensitive to noise and missing values
Decision trees               Robust to errors, missing values, and conjunctions
Random Trees                 No pruning
Ensembles of classifiers     Reduce variance
Discretization

• Weka filters – unsupervised attribute filter (see the sketch below)

• Binning

• Concept hierarchy

• Division into intervals

• Smoothing the data
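In Weka this corresponds to the unsupervised attribute filter Discretize, which replaces each continuous attribute with nominal interval labels (equal-width bins by default, equal-frequency on request). A minimal sketch, again assuming a hypothetical activity.arff:

```java
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Discretize;

public class DiscretizeFeatures {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("activity.arff").getDataSet();

        Discretize filter = new Discretize();
        filter.setBins(10);                 // number of intervals per attribute
        filter.setUseEqualFrequency(true);  // equal-frequency instead of equal-width bins
        filter.setInputFormat(data);

        // Continuous attributes become nominal interval labels,
        // e.g. noise -> '(28.19588-32.71862]' as in the example trees later in the deck.
        Instances discretized = Filter.useFilter(data, filter);
        System.out.println(discretized.attribute(0));
    }
}
```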
Bagging with J48

• An ensemble learning algorithm (see the sketch below)

• Averaging over bootstrap samples reduces error from variance, especially when small differences in the training set can produce big differences between hypotheses.
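A minimal sketch of bagging J48 with the Weka API (hypothetical activity.arff; 10 bootstrap iterations, each trained on a bootstrap sample the size of the training set):

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.meta.Bagging;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class BaggedJ48 {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("activity.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        // Each iteration trains a J48 tree on a bootstrap sample of the
        // training data; predictions are combined by voting.
        Bagging bagger = new Bagging();
        bagger.setClassifier(new J48());
        bagger.setNumIterations(10);
        bagger.setBagSizePercent(100);

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(bagger, data, 10, new Random(1));
        System.out.printf("Bagged J48 accuracy: %.2f%%%n", eval.pctCorrect());
    }
}
```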
Example J48+Bagging

Excerpts from trees learned in different bagging iterations:

Place = Home: Sleeping (9.0/2.0)
Place = ITE346: In Meeting (1.0)
Place = Outdoors
| G1 = False
| | Morning = True: Walking (5.0/2.0)
| | Morning = False: Driving/Transporting (17.0/2.0)
| G1 = True: Walking (2.0)
Place = Home
| Evening = False: Sleeping (20.0)
| Evening = True
| | noise = '(-inf-28.19588]': Cooking (0.0)
| | noise = '(28.19588-32.71862]': Cooking (2.0)
| | noise = '(32.71862-inf)': Watching Movie (1.0)
Place = Restaurant: Lunch (5.0)
Place = Movie Theater: Watching Movie (2.0)
Place = Elsewhere: Walking (1.0)
Place = ITE325: Talk-Listening (4.0)
Place = ITE3338/ITE377: In Meeting (2.0)
Place = Groceries store: Shopping (1.0)

Afternoon = False
| Evening = False
| | Place = Outdoors: Walking (1.0)
| | Place = Elsewhere: Sleeping (0.0)
| Evening = True: Walking (4.0)
Afternoon = True
| Wifi Id8 = True: In Meeting (3.0)
| Wifi Id8 = False
| | Place = Home: Lunch (0.0)
| | Place = Restaurant: Lunch (4.0)
| | Place = Movie Theater: Watching Movie (2.0)
| | Place = Work/School: Working (1.0)
| | Place = ITE346: Lunch (0.0)
| | Place = Outdoors: Walking (1.0)
| | Place = ITE3338/ITE377: Lunch (0.0)

Wifi Id8 = True: In Meeting (6.0/1.0)
Wifi Id8 = False
| Afternoon = False
| | Evening = False: Sleeping (24.0/1.0)
| | Evening = True: Walking (5.0)
| Afternoon = True
| | Place = Work/School: Working (1.0)
| | Place = ITE346: Lunch (0.0)
| | Place = Outdoors: Walking (1.0)
| | Place = Home: Lunch (0.0)
| | Place = ITE3338/ITE377: Lunch (0.0)

loc2 = '(-inf-39.17259]': Watching Movie (2.0)
loc2 = '(39.17259-39.18528]': Sleeping (0.0)
loc2 = '(39.18528-39.19797]': Lunch (4.0)
loc2 = '(39.24873-39.26142]': Walking (9.0/2.0)
Contribution

• Smart phone used for mid-level activity recognition (supervised learning approach)
• A higher-level notion of context

• Accuracy of 88% for 9 activities for a single user
• Accuracy in line with other research
   o Home vs. Work: 100%, compared to 95% accuracy in an MIT project using HMMs
   o Mid-level detailed activity recognition – Bao and Intille (MIT)
   o Highest recognition accuracy for a decision tree classifier – Bao and Intille (MIT)

• A general model
Applications

[Chart: activity distribution over a week – days (Mon–Sun) against 14 activities: Walking, Working, In Meeting, Driving, Other/Idle, Watching TV, Sleeping, Cooking, Talk-Listening, Lunch, Watching Movie, Reading, Shopping, Coffee/Snacks]
Applications

[Chart: weekday activity distribution over a 24-hour timeline – 11 activities: Sleeping, Studying, Coffee/Snacks, Reading, Driving/Transporting, Walking, In Meeting, Lunch, Class-Listening, Class-Taking Notes, Chatting]
Applications

[Chart: weekend activity distribution over a 24-hour timeline – 10 activities: Sleeping, Studying, Coffee/Snacks, Reading, Walking, Transporting, Shopping, Chatting, Playing, Other]
Applications

• Understand users' activity patterns
• Keep a check on time spent
   o Planner
   o Study schedules
   o Program meetings

• Update phone settings according to context
• Recommendation systems
• Locate a specific service nearby
• Adjust the user's presence status
• Update a user's calendar
Limitations

• Scope of the experiments
   o Duration of data capture
   o Number of users providing data

• Information is limited to what the phone can capture

• No audio or sound processing

• Training on data from different individuals is needed for the general model
Future Work

• A robust general model
• Multiple feature sets for different kinds of predictions
• Role management
• Rules for some ground truths or profiles
• Collaborative activity inference
• Models that incorporate sequences of activities
Thank you



ES – Decision Trees

• Each node tests an attribute
• Each leaf gives a classification result
• The root node is the attribute with the most information gain (Claude Shannon). If there are equal numbers of yeses and nos, there is a great deal of entropy in that value; in this situation the information reaches its maximum:
  Info = -Σ_{i=1..m} p_i log p_i
• An attribute with 2 yeses and 3 nos: I([2,3]) = -2/5 × log(2/5) - 3/5 × log(3/5)
• Average the branch values and subtract from I(whole) to get the gain
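The slide's worked example can be checked directly. This sketch computes I([2,3]) and the gain of a split, using log base 2 so that information is measured in bits; the split counts in main are made up for illustration:

```java
public class InfoGain {
    // Entropy (in bits) of a class distribution given as raw counts.
    static double info(int... counts) {
        double total = 0;
        for (int c : counts) total += c;
        double info = 0;
        for (int c : counts) {
            if (c == 0) continue;               // 0 * log 0 is taken as 0
            double p = c / total;
            info -= p * (Math.log(p) / Math.log(2));
        }
        return info;
    }

    public static void main(String[] args) {
        // The slide's example: 2 yeses and 3 nos.
        System.out.printf("I([2,3]) = %.4f bits%n", info(2, 3)); // ~0.9710

        // Information gain of a hypothetical binary split of those 5 examples:
        // branch A receives [2,1], branch B receives [0,2].
        double after = (3.0 / 5) * info(2, 1) + (2.0 / 5) * info(0, 2);
        System.out.printf("Gain = %.4f bits%n", info(2, 3) - after); // ~0.4200
    }
}
```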
Classification via Decision Trees

• Effective with nominal data
• Pruning corrects potential overfitting
• Confidence factor = 0.25
• Minimum number of objects = 2
• Error estimation = (e+1)/(N+m)
• Reduced error pruning: false
• Subtree raising: true

"Decision Tree Analysis using Weka" – Sam Drazin and Matt Montag
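These parameters map directly onto J48's options in the Weka API (they are also J48's defaults); a minimal sketch setting exactly the values listed above:

```java
import weka.classifiers.trees.J48;
import weka.core.Utils;

public class ConfigureJ48 {
    public static void main(String[] args) {
        J48 tree = new J48();
        tree.setConfidenceFactor(0.25f);      // pruning confidence
        tree.setMinNumObj(2);                 // minimum instances per leaf
        tree.setReducedErrorPruning(false);   // keep C4.5's error-based pruning
        tree.setSubtreeRaising(true);         // allow subtrees to replace parent nodes

        // Equivalent command-line options: -C 0.25 -M 2
        System.out.println(Utils.joinOptions(tree.getOptions()));
    }
}
```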


Editor's Notes

  1. The last couple of years have seen the strongest growth in smart phones; about 0.5 billion people use them. Smartphones and other mobile devices have a simple notion of context, largely restricted to temporal and spatial coordinates. Service providers and enterprise administrators can deploy systems incorporating activity and relational context to enhance the user experience, but this raises considerable collaboration, trust, and privacy issues between different service providers.
  2. Our work is an initial step toward enabling devices themselves to represent, acquire, and use a richer notion of context that includes functional and social aspects such as co-located social organizations, nearby devices and people, typical and inferred activities, and the roles people fill in them. Geo-social locations.
  3. The motivation of the Platys project was to represent location with such a conceptual-place notion. E.g., instead of saying "I am at 1000 Hilltop Circle", the phone can actually understand that you are at school, and with your context it can predict that you are giving a talk.
  4. 1) Predicting the location of the user using infrared technology to forward calls to nearby phones. 2) Context-aware systems that support collecting and disseminating context, and applications that adapt to changing context; it summarizes applications like Teleporting, Shopping Assistant, Cyberguide, etc., which use context information, but these applications use small pieces of context information and were developed to suit a particular model. 3) Dey provides a survey of context-aware applications, and definitions and categories of context. 4) A framework (MoBe) to dynamically and automatically download, configure, execute, and unload applications according to the user's current context. 5) An audio tourist guide in museums.
  5. 1) Use of different sensors in mobiles. 2) Sensing applications on mobile phones: sound samples from the microphone, accelerometer data, GPS readings, and random photos. 3) MIT infers the friendship network structure of an individual by collecting information from mobile phones over an extended period. 4) Locale manages settings based on conditions, like location and time, using static rules set up by the user. 5) Uses a person's context, like location, to help engage them in campus life. Problems: new situations don't fit the examples; lack of generality; how to use it in practice? From traditional information to generalized context-aware applications.
  6. Except for the first and last letters, all the other letters have been rearranged, but since our brain is powerful it can find the context, and hence the data makes sense. Zimmermann explains five categories of context information; individual entities may be natural, human, artificial, or group entities.
  7. This slide shows the approach we have taken to solve our activity recognition problem. First we built an application which can capture data from various possible sources. Then we model the context by representing it as ontologies. We use a supervised learning approach to classify the data. Why supervised learning is a good fit, and why we need learning in our problem.
  8. Timestamp, day of week, weekend (true/false), place, activity, user added (true/false), orientation (azimuth, pitch, roll), magnetic field, accelerometer (Gx, Gy, Gz), light, proximity, connected Wi-Fi ID, Wi-Fi devices list, 631 Wi-Fi IDs (true/false), undefined Wi-Fi ID (true/false), latitude, longitude, altitude, location bearing, location speed, geocode, calendar data, paired Bluetooth devices, unpaired Bluetooth devices.
  9. We need to work on the raw input data. The data is captured every 12 minutes. We need to parse the input text data. Also, we capture sensor data over a duration, assuming that there can be noise, and average over it. We need to accumulate values for some multi-valued attributes like Wi-Fi IDs and Bluetooth IDs.
  10. The transformer selects the attributes that contribute to activity recognition and works on some of the attributes, like Wi-Fi IDs, Bluetooth IDs, and geocodes, which we change from a list into a range of different features.
  11. We classify the feature vectors with the help of different machine learning algorithms, like Naïve Bayes, LibSVM, decision trees, etc. We try to use some ensemble methods to obtain better predictive performance (an ensemble is a technique for combining many weak learners in an attempt to produce a strong learner). The model takes the earlier model built as a reference and updates it with the new model.
  12. Student: Home, Lab, Class, Elsewhere. Post doc: Home, Office. Sparse data, a lot of noise, and no proper feature extraction; the data was not processed (e.g., the timestamp and Wi-Fi attributes were used as-is). Latitude, longitude, battery percentage, light (some nulls observed), proximity, Wi-Fi count, Wi-Fi IDs, user present (some nulls observed), and Google Calendar data. Cross-validation: 10-fold.
  13. Student: Home, Lab, Class, Elsewhere. Post doc: Home, Office. The data was sparse since the application was not stable. An artificially high decision value was given to some information (e.g., timestamp, Wi-Fi ID, geolocation, etc.). Strong independence assumptions played a significant role here for other algorithms like Naïve Bayes.
  14. Class-Taking Notes, Class-Listening. Cleanup: removing attributes like the timestamp, averaging sensor values, checking whether the user simply forgot to select. Discretization: divide the values of a continuous attribute into intervals, which reduces and simplifies the data; such techniques helped us to have a concise, easy-to-use, knowledge-level representation of the mining results. Not all machine learning algorithms can handle this "bag of words" situation. Wi-Fi; timestamp as morning, afternoon, etc.
  15. If you compare with the accuracy we had for the toy experiment, we had almost similar accuracy for Naïve Bayes and decision trees, but here we can identify a big drop in accuracy for decision trees. In the toy experiment there was overfitting; here Naïve Bayes is still overfitting. We tried to do cleanup and discretization, since we had the timestamp, Wi-Fi IDs, and similar attributes each as one feature.
  16. Decision trees cannot understand the model, since we had data like the timestamp, which is just one value, and Wi-Fi, which is a set of Wi-Fi devices, but this set can differ for the same place.
  17. We have a model which evaluates attributes like the Wi-Fi devices found in the vicinity, GPS location, and geocode -> we get conflicts. Talk about the accuracies for activities.
  18. 9 activities: Working/Studying, Sleeping, Walking, InClass, Outdoors, InMeeting, Talk-Listening, Other/Idle, Shopping. "Bag of words": Wi-Fi; timestamp as morning, afternoon, etc. Discretization; bagging.
  19. 9 activities: Working/Studying, Sleeping, Walking, InClass, Outdoors, InMeeting, Talk-Listening, Other/Idle, Shopping. "Bag of words": Wi-Fi; timestamp as morning, afternoon, etc. Discretization; bagging. Naïve Bayes dropped a lot, since there was overfitting before which got removed.
  20. Moving vs. stationary isn't that good, because for Moving we have data from the school shuttle, which moves very slowly.
  21. In Class / Talk-Listening: geolocation + Wi-Fi ID; our model doesn't consider noise or light values for predictions. In Laura's dataset, the cooking activity's geolocation + Wi-Fi ID cannot be mapped to the cooking activity of others. The Studying/Working activity depends on time of day + Wi-Fi ID + geolocation. This is because we model an individual user, and the training data for such activities shows that they are evident from those main features; if we had trained on data from different users, where the classifier cannot identify a particular activity from only those features, then it would consider others. Therefore we come down to only activities which can be generalized.
  22. Walking, Sleeping, Lunch, In Meeting, Watching Movie. 1) In Meeting and Watching Movie confused with Walking: walking in school (GPS location). 2) Watching Movie conflicts with Sleeping. 3) Watching Movie confused with Walking, since there were some instances of walking in the movie theatre; focus on LOCATION. 4) In Meeting, Walking, and Watching Movie confused with Lunch (at Arundel Mills). I am not trying to overfit the data.
  23. Naïve Bayes – independence assumption; good with completely independent features and with functionally dependent features, and when the dataset is small and there are many attributes. We have a mixture, and it cannot use conjunctions of attributes. J48 is robust to errors and missing attribute values, and handles disjunctions + conjunctions; most algorithms can't do conjunctions. J48 is also good with real-valued inputs. SVM with a radial basis kernel: a lot of noise and missing attributes; decision trees prune these well. Random trees: randomly chosen attributes at each node; performs no pruning. Ensembles of classifiers – bootstrap aggregating: averaging over bootstrap samples can reduce error from variance, especially when small differences in training sets can produce big differences between hypotheses. Bayesian networks represent probability distributions that generalize the naive Bayesian classifier and explicitly represent statements about independence.
  24. Sensors: orientation, accelerometer, proximity, light, latitude, longitude, noise. Concept hierarchy: replace a low-level concept by a higher one, e.g., timestamp by morning, etc. Divides continuous-valued data into equal-frequency intervals. A concise, easy-to-use, knowledge-level representation.
  25. Learns a hypothesis by training a number of base hypotheses and combining their predictions.
  26. Planner; recommendation systems; calendar data – updates to and from it.
  27. 'e': misclassified examples at the given node; 'N': examples that reach the given node; 'm': all training examples. Less confidence means nodes reached by very few instances from the training data are penalized, reducing the size of the tree by filtering more. At each junction, the algorithm compares (1) the weighted error of each child node versus (2) the misclassification error if the child nodes were deleted and the decision node were assigned the class label of the majority class. Reduced error pruning: we do not want the most accurate tree, since we do not have that good data. It splits the data into train and test sets, greedily removes the most helpful attribute, and checks accuracy; it gets a very accurate small tree, but reduces the training data and can overfit. Subtree raising: a node may be moved upward toward the root of the tree, replacing other nodes along the way.