1.
AWASS 2012 Case Study Tutorial - Classifying Human Motion for Active Music Systems. Arjun Chandra, University of Oslo.
2.
Outline of the Tutorial: Why do motion classification for active music systems? The motion classification problem. Established solutions for the problem, and a demonstration. Challenges for the week with regard to motion classification for active music systems.
3.
Active Music. Videos from yesterday: iPhone ensemble (UiO). SoloJam (UiO). Performance-based music making (Wout Standaert).
4.
Active Music. There is a boundary between someone performing music and someone listening/perceiving it, with only limited passive interaction - tapping feet etc. Active music blurs this boundary and allows participation by perceivers. The end user may have little or no training in traditional musicianship or composition, yet gets control of the sonic/musical output to a greater or lesser extent. Users experience the sensation of playing music themselves.
5.
Active Music. To build such a system... Give control via mobile media devices, e.g. iPods. The devices need to be intelligent in order to mediate the participants' control of the musical output. A media device must be able to: sense the inputs from the participants and the environment; process these in various forms; co-ordinate the activities of the participants as they perform; and maintain musicality and the interest of the users.
6.
Active Music. Key type of input: human motion is an integral part of the types of inputs that may be sensed by the devices. Motion and sound are very closely related! Motion is to be processed by the device in some fashion and eventually mapped to music. In a full-fledged active music system, numerous other types of inputs will be sensed, pertaining to the participant, the device itself, and the environment external to the human-device subsystem, including other participants/devices.
7.
How Does All This Relate to Self-awareness? Self-awareness takes the form of devices building models of, or possessing knowledge about: the behaviours of their respective participants; themselves; and the environment within which they get used. Such knowledge would help the devices reason about themselves in order to maintain musicality, user interest, computational efficiency and good response times, and to manage communication overhead, energy needs, and the trade-offs between such goals.
8.
Classifying Human Motion for Active Music Systems. One first step towards mapping sensed motion into music: recognise patterns in human motion. We will work on such pattern recognition this week. Triggers or fine-grained mapping: (1) The recognition may be used as a trigger, i.e. recognise the type of motion once it has been performed and trigger sound synthesis. (2) In addition, the system may also be able to anticipate which type of motion is about to be performed, or is ongoing, and thus provide the possibility of finer-grained mapping with sound synthesis.
10.
Classifying Human Motion for Active Music Systems. Motion classification scheme: training produces a trained classifier from the (optionally pre-processed) 3D accelerometer stream; recognition then yields the identified class, which can be passed on to a sound synthesiser. Example video for motion classification.
11.
Classifying Human Motion for Active Music Systems. Two classic phases: Training: take a bunch of data and build a classifier. Recognition: use the classifier on new streams to recognise patterns in them.
12.
Classifying Human Motion for Active Music Systems. Some challenges whilst training: The same type of motion can vary both spatially and temporally. The same type of motion may be performed differently depending on the mood of the user. The intentions of the user have a bearing on the performed motion. The user may be stationary or moving when performing the motion. Different users may perform the same motion differently. The motion types may grow or reduce in number over time, as the user operates the system.
13.
Classifying Human Motion for Active Music Systems. To make things more challenging: Sometimes, quick training is essential - ideally online, with little or no effort from the user. Automated segmentation coupled with classification.
14.
Classifying Human Motion for Active Music Systems. Many ways to capture motion: Marker-based motion capture systems, e.g. the Qualisys motion capture system (Soundsaber). Vision-based systems, e.g. Kinect (piano via Kinect). Sensor-based systems, e.g. iOS devices (SoloJam), Wiimote, the Xsens full-body motion capture suit (Dance Jockey, Mobile Dance Jockey).
15.
Classifying Human Motion for Active Music Systems. In this case study, we capture motion data via media devices, e.g. iPods, which have internal motion (acceleration) sensors: 3D accelerometers. We will use the sensor data stream as the device is moved, and classify the performed motion into a relevant category.
16.
Classifying Human Motion for Active Music Systems. What category? You can choose to define the categories. You will then collect some data pertaining to the categories you choose. Once you have collected the data, you will train a classifier with it. Once trained, you will use this classifier to recognise the categorised motions within a sensor data stream.
17.
Classifying Human Motion for Active Music Systems. Demo... Let's define some categories, train, and get some motion recognition going!
18.
Classifying Human Motion for Active Music Systems. As I mentioned yesterday, you are going to be provided: Established algorithms for motion classification - two algorithms to play with and build on, to be precise. Data sets with different types of motion which you can use to get a feel for the algorithms. Exercises pertaining to some challenging active music scenarios where motion classification is required; these will require you to build new data sets. Let us look at the two algorithms briefly now...
19.
Motion Classification Algorithms. The two algorithms are: Dynamic Time Warping. Hidden Markov Models. You are encouraged not to be limited by these two algorithms - apply others that you know of.
20.
Motion Classification Algorithms: Dynamic Time Warping (DTW). Key idea: to be able to compare two signals of different lengths. The result of such a comparison can be used in interesting ways. You might wonder... what should be compared to what? What are these two signals? Template matching!
23.
Motion Classification Algorithms: Dynamic Time Warping (DTW). Template matching: match a time-varying signal, in our case a motion data stream, against a stored set of signals. The stored signals are the templates, one representing each category. In effect, the motion data stream is compared against a representative from within the collected data, one for each category. The closest matching template tells us which category the stream most likely belongs to.
24.
Motion Classification Algorithms: Dynamic Time Warping (DTW). (Figure: point-to-point Euclidean distance matching of two signals over time vs. the DTW alignment - DTW is the intuitive match.)
26.
Motion Classification Algorithms: Dynamic Time Warping (DTW). Two signals and their cost matrix. The cost is some distance measure, e.g. Euclidean. Note the valleys (dark - low cost) and hills (light - high cost). Goal: find the alignment with minimal overall cost. The optimal alignment runs along a valley of low cost.
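The cost matrix on this slide can be built in a few lines. This is an illustrative snippet rather than the tutorial's own code; the name `cost_matrix` is hypothetical:

```python
import numpy as np

def cost_matrix(x, y):
    """Pairwise local cost between two signals.

    x: (n, d) array, y: (m, d) array. Entry [i, j] is the Euclidean
    distance between frame x[i] and frame y[j] - the "cost" of aligning
    those two points.
    """
    return np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)

# Two short 1-D signals (shaped (n, 1)) for illustration.
C = cost_matrix(np.array([[0.0], [1.0], [2.0]]),
                np.array([[0.0], [2.0]]))
```

Low entries of `C` form the dark valleys on the slide; the same code works unchanged for 3D accelerometer frames (d = 3).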
27.
Motion Classification Algorithms: Dynamic Time Warping (DTW). We have to find the optimal warping path in this cost matrix. Shown is the optimal warping path, i.e. the optimal alignment. How do we find this warping path? There are exponentially many.
28.
Motion Classification Algorithms: Dynamic Time Warping (DTW). Let P be a warping path, a sequence of pairs of aligned points p_s on the two signals. The optimal path is

  P* = argmin_P [ (sum_{s=1..k} d(p_s) * w_s) / (sum_{s=1..k} w_s) ]

where d(p_s) is the cost of the aligned pair p_s, w_s is the weighting coefficient (1 in our case), and the denominator is the length of the path.
29.
Motion Classification Algorithms: Dynamic Time Warping (DTW). We first put some restrictions on the paths that may be found: (1) monotonicity, (2) continuity, (3) boundary conditions, (4) warping window, (5) slope constraint.
30.
Motion Classification Algorithms: Dynamic Time Warping (DTW). Monotonicity: the path is not allowed to go back in time. This prevents feature comparisons being repeated during matching.
33.
Motion Classification Algorithms: Dynamic Time Warping (DTW). Continuity: the path is not allowed to break. This prevents the omission of features whilst matching.
36.
Motion Classification Algorithms: Dynamic Time Warping (DTW). Boundary conditions: start at the top-left position and end at the bottom-right position in the matrix. This prevents one of the signals being only partially considered.
39.
Motion Classification Algorithms: Dynamic Time Warping (DTW). Warping window: a good alignment path is unlikely to wander too far from the diagonal, so stay within a window. This prevents sticking at similar features and skipping features.
42.
Motion Classification Algorithms: Dynamic Time Warping (DTW). Slope constraint: the path is not allowed to be too steep or too flat. This prevents short parts of one signal being matched with very long parts of the other.
45.
Motion Classification Algorithms: Dynamic Time Warping (DTW). We then build an accumulated distance matrix, in which a nicely defined valley emerges. Building the accumulated distance matrix is done via dynamic programming. Let us see how this is done...
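The dynamic-programming construction can be sketched as follows. This is a minimal illustration, not the tutorial's implementation: it uses Euclidean local costs, weights w_s = 1, an optional Sakoe-Chiba-style warping window, and normalises by n + m as a stand-in for the exact path length (one common convention); the name `dtw_distance` is hypothetical.

```python
import numpy as np

def dtw_distance(x, y, window=None):
    """Normalised DTW distance between x (n, d) and y (m, d).

    D[i, j] accumulates the minimal cost of aligning the first i frames
    of x with the first j frames of y. The three predecessors enforce
    monotonicity and continuity; D[0, 0] = 0 and reading off D[n, m]
    enforce the boundary conditions; `window` (if given) keeps the path
    within a band around the diagonal.
    """
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        lo = 1 if window is None else max(1, i - window)
        hi = m if window is None else min(m, i + window)
        for j in range(lo, hi + 1):
            cost = np.linalg.norm(x[i - 1] - y[j - 1])  # local Euclidean cost
            # monotonic, continuous steps: match, insertion, deletion
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m] / (n + m)
```

A signal compared with a time-stretched copy of itself gives distance 0, which is exactly the behaviour Euclidean point-to-point matching lacks.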
46.
Motion Classification Algorithms: Dynamic Time Warping (DTW). This valley reveals the path we are after. The bottom-right corner of the matrix holds the value sum_{s=1..k} d(p_s) * w_s. This is the un-normalised warping distance. Normalising it by the path length gives us the distance between the two signals.
47.
Motion Classification Algorithms: Dynamic Time Warping (DTW). The two signals shown in this example are one-dimensional, but we are going to work with 3D data. The process described above works for N-dimensional data. Remember that the initial cost matrix is built using the Euclidean distance.
48.
Motion Classification Algorithms: Dynamic Time Warping (DTW). 3D-DTW (figure).
49.
Motion Classification Algorithms: Dynamic Time Warping (DTW). Training, i.e. finding the representative template for each category: for each category, find the training example with the minimum average normalised distance to the remaining examples. See equation (7) in Gillian et al. (2011). Note that there are other ways to find templates; you are encouraged to explore them.
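The minimum-average-distance rule above can be sketched like this. It is an illustrative snippet with a hypothetical name `pick_template`, parameterised over any symmetric normalised distance (e.g. normalised DTW); the exact formulation is equation (7) in Gillian et al. (2011).

```python
import numpy as np

def pick_template(examples, dist):
    """Pick the example with minimum average distance to the others.

    examples: training examples for one category.
    dist(a, b): a normalised distance between two examples.
    Returns the chosen template and its average distance.
    """
    best, best_avg = None, np.inf
    for i, candidate in enumerate(examples):
        others = [dist(candidate, o) for j, o in enumerate(examples) if j != i]
        avg = float(np.mean(others))
        if avg < best_avg:
            best, best_avg = candidate, avg
    return best, best_avg

# Toy check: scalar "signals" with absolute difference as the distance.
template, avg = pick_template([0.0, 1.0, 2.0], lambda a, b: abs(a - b))
```

With real motion data, `dist` would be the normalised DTW distance between two recorded examples.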
50.
Motion Classification Algorithms: Dynamic Time Warping (DTW). Recognition: find the normalised distance between the stream and all the templates. The category of the closest matching template (lowest normalised distance) is the classification result, provided the distance is below a threshold. See equations (10-13) in Gillian et al. (2011) for the threshold distance for each category, used to reject false positives. Or come up with your own way!
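A nearest-template recogniser with per-category rejection might look like the following sketch. All names are hypothetical, and the thresholds are assumed to be given already (however they were derived, e.g. via eqs. 10-13 of Gillian et al. (2011)):

```python
import numpy as np

def classify(stream, templates, thresholds, dist):
    """Nearest-template classification with rejection.

    templates: {category: template}; thresholds: {category: max distance};
    dist(stream, template): normalised distance, e.g. normalised DTW.
    Returns the winning category, or None when even the closest template
    is further away than its threshold (a rejected/unknown motion).
    """
    best_cat, best_d = None, np.inf
    for cat, tpl in templates.items():
        d = dist(stream, tpl)
        if d < best_d:
            best_cat, best_d = cat, d
    if best_cat is not None and best_d <= thresholds[best_cat]:
        return best_cat
    return None

# Toy check: scalar signals with absolute difference as the distance.
result = classify(1.1, {"a": 1.0, "b": 5.0}, {"a": 0.5, "b": 0.5},
                  lambda s, t: abs(s - t))
```

The rejection branch is what keeps arbitrary motion from always being forced into the nearest category.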
51.
Motion Classification Algorithms: Hidden Markov Models (HMM). Key idea: a statistical generative model of time-varying signals, one HMM per category. It can help ascertain the probability that a given observation/stream/time-varying observed signal was generated by the model. Knowing this probability, across multiple HMMs, allows us to categorise a stream.
52.
Motion Classification Algorithms: Hidden Markov Models (HMM). Schematic of a Markov chain with 5 states (Rabiner (1989)): the probability of being in a state only depends on the predecessor state (first order), and is independent of time. Denoted by a_ij, with sum_{j=1..N} a_ij = 1. But here, each state is observable, e.g. a weather model: P(rain, rain, rain, ... | Model)?
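For such a fully observable chain, the sequence probability is just an initial probability times a product of transition probabilities. A toy sketch with made-up numbers (two states, 0 = rain, 1 = sun; the matrix entries are assumptions for illustration only):

```python
import numpy as np

# Hypothetical two-state weather chain.
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])   # a_ij = P(next = j | current = i); rows sum to 1
pi = np.array([0.5, 0.5])    # initial state probabilities

def seq_prob(states):
    """P(state sequence | Model) = pi[s_1] * product of a_{s_t, s_{t+1}}."""
    p = pi[states[0]]
    for s, s_next in zip(states, states[1:]):
        p *= A[s, s_next]
    return p

p_rrr = seq_prob([0, 0, 0])  # P(rain, rain, rain | Model) = 0.5 * 0.7 * 0.7
```

The HMM on the next slides removes exactly this luxury: the state sequence is no longer observed, so such products must be summed over all hidden paths.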
54.
Motion Classification Algorithms: Hidden Markov Models (HMM). HMM: a hidden process generates what you observe. Thus, you observe this hidden process via observations only. (Figure: state-transition diagram where each state emits the observation symbols v1, v2, v3 with probabilities b_jk.)
55.
Motion Classification Algorithms: Hidden Markov Models (HMM). HMM: the observation is a probabilistic function of the state! The v_j's are the possible observations in any state. We do not observe the state anymore, hence hidden. Examples: asking a friend about the weather; observing the acceleration stream when a person in another room moves in some way, or not.
56.
Motion Classification Algorithms: Hidden Markov Models (HMM). HMM elements: N states, S = {S_1, S_2, ..., S_N}; q_t, the state at time t; M observation symbols (the codebook), V = {v_1, v_2, ..., v_M}; an observation sequence O = O_1 O_2 ... O_T, made up of elements from the codebook, e.g. the sequence v1 v2 v1 v3 v2 ... of length T.
57.
Motion Classification Algorithms: Hidden Markov Models (HMM). HMM elements (continued): the state transition matrix A = {a_ij}, where a_ij = P(q_{t+1} = S_j | q_t = S_i); the emission/observation symbol matrix B = {b_jk}, where b_jk = P(v_k | q_t = S_j); and the initial state probability vector π = {π_i}, where π_i = P(q_1 = S_i).
58.
Motion Classification Algorithms: Hidden Markov Models (HMM). An HMM λ is specified by giving N, M, V, A, B and π. Example: N = 4, M = 3, V = {v1, v2, v3}, the a_ij's, the b_jk's, π_1 = 1. We essentially have to estimate these from the motion data, one HMM per category.
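As a concrete, entirely hypothetical parameterisation of such a λ with N = 4, M = 3 and π_1 = 1 (using a left-to-right transition structure, a common but not mandatory choice for motion models):

```python
import numpy as np

N, M = 4, 3  # 4 hidden states, 3 observation symbols v1, v2, v3

# Left-to-right transitions: each state either stays or moves forward.
A = np.array([[0.6, 0.4, 0.0, 0.0],
              [0.0, 0.6, 0.4, 0.0],
              [0.0, 0.0, 0.6, 0.4],
              [0.0, 0.0, 0.0, 1.0]])

# b_jk = P(v_k | S_j): each state prefers one symbol (made-up numbers).
B = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8],
              [0.3, 0.3, 0.4]])

pi = np.array([1.0, 0.0, 0.0, 0.0])  # pi_1 = 1: always start in state S_1
```

Every row of A and B is a probability distribution, which is the invariant the training updates on the later slides must preserve.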
59.
Motion Classification Algorithms: Hidden Markov Models (HMM). Some procedures: Pre-processing via vector quantisation: build a codebook and process the acceleration data in terms of observation symbols, giving observation sequences. Forward algorithm: calculate P(O|λ_c), where λ_c denotes the c-th HMM and O is an observation sequence. Forward-backward algorithm: estimate the parameters (A and B) of the HMM using multiple observation sequences, i.e. training. Bayes' rule: together with the forward algorithm, helps ascertain P(λ_c|O), i.e. recognition that a new observation sequence O belongs to category c.
60.
Motion Classification Algorithms: Hidden Markov Models (HMM). Pre-processing by vector quantisation: any acceleration stream (a stream of 3D vectors) has a large range of values and fine granularity, so we abstract each 3D vector into a code. Using k-means clustering, the centroid of each cluster becomes a code word/vector of the codebook. See Klingmann (2009), Sections 3.1 and 4.4, and Schloemer (2008). The index of a code word is what is used as an observation; the string of indices of the code words matching the vectors in the data/stream is then the observation sequence O.
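A small self-contained sketch of this quantisation step, using a hand-rolled Lloyd's k-means rather than any particular library; `make_codebook` and `quantise` are hypothetical names:

```python
import numpy as np

def make_codebook(frames, k, iters=20, seed=0):
    """k-means codebook over acceleration frames (Lloyd's algorithm).

    frames: (n, 3) float array of 3D acceleration vectors.
    Returns (k, 3) centroids - the code words of the codebook.
    """
    rng = np.random.default_rng(seed)
    centroids = frames[rng.choice(len(frames), k, replace=False)]
    for _ in range(iters):
        # assign each frame to its nearest centroid
        labels = np.argmin(
            np.linalg.norm(frames[:, None] - centroids[None], axis=2), axis=1)
        # move each centroid to the mean of its assigned frames
        for c in range(k):
            if np.any(labels == c):
                centroids[c] = frames[labels == c].mean(axis=0)
    return centroids

def quantise(frames, centroids):
    """Map each frame to the index of its nearest code word.

    The returned index string is the observation sequence O.
    """
    return np.argmin(
        np.linalg.norm(frames[:, None] - centroids[None], axis=2), axis=1)
```

After this step the continuous 3D stream has become a discrete symbol sequence, which is exactly what the discrete HMM machinery on the following slides expects.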
61.
Motion Classification Algorithms: Hidden Markov Models (HMM). Forward algorithm: to find P(O|λ_c) = sum over all Q of P(O|Q, λ_c) P(Q|λ_c), where the Q's are the many possible (N^T) state sequences that may be visited to generate O. The forward algorithm computes this efficiently via forward variables α_t(i). See Rabiner (1989), Section III-A and Klingmann (2009), Section 3.2.4.
62.
Motion Classification Algorithms: Hidden Markov Models (HMM). Forward algorithm (figures from Rabiner (1989)):

  α_t(i) = P(O_1 ... O_t, q_t = S_i | λ_c)
  P(O|λ_c) = sum_{i=1..N} α_T(i)
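The recursion on this slide translates directly into code. A minimal sketch (no scaling or log-space, which a real implementation would need for long sequences; the name `forward` is our own):

```python
import numpy as np

def forward(O, A, B, pi):
    """Forward algorithm: P(O | lambda) in O(N^2 T) time, not O(N^T).

    O: observation symbol indices; A: (N, N) transitions;
    B: (N, M) emissions; pi: (N,) initial probabilities.
    alpha[t, i] = P(O_0 .. O_t, q_t = S_i | lambda)  (0-based time).
    """
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                    # initialisation
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]  # induction step
    return alpha[-1].sum()                        # termination
```

Summing the final forward variables over all states gives exactly the P(O|λ_c) of the slide.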
63.
Motion Classification Algorithms: Hidden Markov Models (HMM). Training: the forward-backward algorithm, for estimation of the A and B matrices of each λ_c, given the respective training observation sequences. The α_t(i)'s and the backward variables β_t(i)'s need to be computed and used to update A and B. β_t(i) is the probability of generating the remaining part of the observation sequence from time t+1 to T, given state S_i at time t, i.e. P(O_{t+1} O_{t+2} ... O_T | q_t = S_i, λ_c). See Rabiner (1989), Sections III-C and V-B, and Klingmann (2009), Sections 3.2.5, 3.2.6 and 4.5.2.
64.
Motion Classification Algorithms: Hidden Markov Models (HMM).

  A update: ā_ij = (expected number of transitions from S_i to S_j) / (expected number of transitions from S_i)
  B update: b̄_jk = (expected number of times in S_j observing symbol v_k) / (expected number of times in S_j)

The α_t(i)'s and β_t(i)'s are used within these update equations. See Section V-B in Rabiner (1989) and Section 4.5.2 in Klingmann (2009) for the variant that we will use; this variant takes care of multiple training observation sequences.
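A single re-estimation step for one observation sequence might be sketched as below. This is the plain single-sequence form (the course variant pools the expected counts over multiple sequences) and omits the numerical scaling a real implementation needs; `baum_welch_step` is a hypothetical name.

```python
import numpy as np

def baum_welch_step(O, A, B, pi):
    """One forward-backward re-estimation of (A, B) for one sequence O.

    Expected counts come from
      gamma_t(i) = P(q_t = S_i | O, lambda)
      xi_t(i, j) = P(q_t = S_i, q_{t+1} = S_j | O, lambda).
    """
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]
    for t in range(1, T):                      # forward pass
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):             # backward pass
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    prob = alpha[-1].sum()                     # P(O | lambda)
    gamma = alpha * beta / prob                                 # (T, N)
    # xi[t, i, j] = alpha_t(i) a_ij b_j(O_{t+1}) beta_{t+1}(j) / P
    xi = (alpha[:-1, :, None] * A[None] *
          (B[:, O[1:]].T * beta[1:])[:, None, :]) / prob        # (T-1, N, N)
    # expected transition counts over expected occupancy counts:
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(B)
    for k in range(B.shape[1]):
        B_new[:, k] = gamma[np.array(O) == k].sum(axis=0)
    B_new /= gamma.sum(axis=0)[:, None]
    return A_new, B_new
```

Each step keeps A and B row-stochastic and, by the EM property, never decreases the likelihood of the training sequence.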
65.
Motion Classification Algorithms: Hidden Markov Models (HMM). Recognition: if O_stream is the stream to be classified, we want to find P(λ_c|O_stream), the probability that λ_c, i.e. the HMM indexed by c, generated the sequence O_stream. We use the forward algorithm and Bayes' rule for this. The highest probability amongst all the λ_c's tells us the category c the stream is classified into.
66.
Motion Classification Algorithms: Hidden Markov Models (HMM). Recognition: P(λ_c) may be estimated as the average of P(O_j|λ_c) across the training observation sequences O_j. We compute P(λ_c) and P(O_stream|λ_c) for all c. Then

  P(O_stream) = sum_c P(O_stream, λ_c) = sum_c P(O_stream|λ_c) P(λ_c)
  P(λ_c|O_stream) = P(O_stream|λ_c) P(λ_c) / P(O_stream)

See Klingmann (2009), Section 3.3.
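Putting the forward algorithm and Bayes' rule together, recognition can be sketched as follows (`classify_stream` is a hypothetical name; the models and priors are assumed to be given, the priors e.g. as the average training likelihoods described above):

```python
import numpy as np

def classify_stream(O_stream, models, priors):
    """Pick the category whose HMM most probably generated the stream.

    models: {c: (A, B, pi)}; priors: {c: P(lambda_c)}.
    Returns the winning category and the full posterior over categories.
    """
    def fwd(O, A, B, pi):
        # forward algorithm: P(O | lambda)
        alpha = pi * B[:, O[0]]
        for o in O[1:]:
            alpha = (alpha @ A) * B[:, o]
        return alpha.sum()

    joint = {c: fwd(O_stream, *m) * priors[c] for c, m in models.items()}
    evidence = sum(joint.values())                  # P(O_stream)
    posterior = {c: p / evidence for c, p in joint.items()}  # Bayes' rule
    return max(posterior, key=posterior.get), posterior

# Toy check: two one-state models that prefer different symbols.
models = {"a": (np.array([[1.0]]), np.array([[0.9, 0.1]]), np.array([1.0])),
          "b": (np.array([[1.0]]), np.array([[0.1, 0.9]]), np.array([1.0]))}
category, posterior = classify_stream([0, 0, 0], models, {"a": 0.5, "b": 0.5})
```

Unlike the DTW recogniser, this returns a full posterior, so a confidence threshold on P(λ_c|O_stream) can play the role the template-distance threshold played earlier.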
67.
Challenging Active Music Scenarios. Lower-level technical challenges: How well does the system classify when the reference point (the user) is stationary versus moving? Can we distinguish these cases? How well does the system separate impulsive and sustained actions, e.g. hitting a drum versus bowing a violin? Can it differentiate, or not, between using the right or the left hand to do the "same" action?
68.
Challenging Active Music Scenarios. Higher-level semantic challenges: Can it separate gestures from actions, i.e. find the meaning-bearing part, e.g. the difference between actions performed with a sad, happy or angry intention? Can it distinguish between an expert and a non-expert user handling the device?