Introduction
                     Our solution
                  TVParser model
              Experimental Results
                        Conclusion




TVParser: An Automatic TV Video Parsing Method

                           Chao Liang

       National Laboratory of Pattern Recognition (NLPR)
  Chinese Academy of Sciences, Institute of Automation (CASIA)


                         March 9, 2011




                       Chao Liang    TVParser: An Automatic TV Video Parsing Method


Outline
  1   Introduction
        Motivation
        Related work
  2   Our solution
       Basic ideas
       Role histogram
  3   TVParser model
       Model formulation
       Parameter estimation
       State inference
  4   Experimental Results
        Data sets
        Face naming
        Scene segmentation
  5   Conclusion



Introduction
      Motivation
          Voluminous TV videos vs. efficient management






Introduction
      TV video
          Story plot (scene structure)
                 [Scene: Monica and Rachel's, Carol and Susan
                 are showing off Ben to the gang.]
                 Phoebe: Oh my God, oh, ok, was that too much pressure for him?
                 Susan: Oh, is he hungry already?
                 Carol: I guess so. (Carol starts to breast feed Ben.)
                 … …

                 [Scene: Central Perk, the gang is all there.]
                 Julie: Rachel, do you have any muffins left?
                 Rachel: Yeah, I forget which ones.
                 Julie: Oh, you're busy, that's ok, I'll get it. Anybody else want one?
                 … …


          Characters (named faces)




                    RACH         MNCA         PHBE       ROSS         JOY         CHAN



Related work
     Movie/Script alignment
         Script-subtitle alignment


          Script:
              [Scene: Rachel is entering the living room.]
              Monica: Julie.
              Rachel: What?!

          Subtitle:
              00:10:44,210 --> 00:10:45,177
              Monica: Julie.

              00:10:45,444 --> 00:10:46,775
              Rachel: What?!

          [Figure: three panels labelled script, subtitle and movie.]


     Disadvantages
          Syntactic and wording discrepancies between the script and the subtitles
          Subtitles are not always available




Related work (cont.)
      Face naming
          Fully supervised
          Weakly supervised

          (a) Weakly supervised: names come from the script
              [Scene: Rachel is entering the living room.]
              Monica: Julie.
              Rachel: What?!
          (b) Fully supervised: faces carry labels (Monica, Rachel)

      Disadvantages
          Manual labels are expensive
          Difficult to apply at large scale



Related work (cont.)
      Scene segmentation
          Content-based method
          Script-guided method
          [Figure: HMM alignment. Observation sequence: Shot 1, ..., Shot 4 at t = 1, ..., 4. Hidden state sequence: Scene q1, ..., Scene q4, ... Emission probabilities b_{q_t, shot_t} link each scene to its shot; transition probabilities a_{q_i, q_j} (i < j) link the scenes.]
          HMM: $\lambda = \{A, B, \pi\} = \{A(a_{q_i, q_j}), B(b_{q_i, \mathrm{shot}_j}), \pi\}$
          Viterbi alignment: $Q = \{q_1, q_2, q_3, q_4, q_5, \ldots\}$



      Disadvantages
          The matching units (shots vs. scenes) are asymmetric
          The HMM implicitly imposes a geometric distribution on state duration



Our solution
      Basic ideas
          A generative TVParser model to align video and script by
          mining face-name correspondence.

                 name histogram        face histogram
                 C1   C2   C3          S1   S2   S3   S4   S7   S8   S9   S10  S11  S12
          JOEY   3    0    1           2    0    2    2    0    0    1    1    2    1
          MNCA   2    1    0           2    0    1    1    1    0    2    0    0    0
          RACH   1    1    0           1    0    0    1    0    1    1    0    0    0
          CHAN   0    0    1           0    0    0    1    0    0    0    0    2    0

          C1: {S1, ..., S4}    C2: {S6, ..., S8}    C3: {S10, ..., S12}


      Advantages
          Face names can be identified in an unsupervised way (learning)
          A globally optimal scene segmentation can be inferred (inference)
          Fast algorithms for both parameter learning and state inference


Role histogram
     Basic idea
         Bag-of-Words (BoW) representation
         Role composition is a generic and semantic feature for both
         video (as face histogram) and script (as name histogram)
     Name clustering
     Face clustering
         Difficulty: varying imaging conditions, e.g. pose
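
As a rough illustration of the name histogram, the sketch below counts how often each speaker appears in every script scene. It assumes the script layout quoted on the "TV video" slide (`[Scene: ...]` headers followed by `Name:` dialogue lines) and that one spoken line counts as one occurrence of the role. The function name `name_histograms` and this counting rule are assumptions made for illustration; the slides do not specify them.

```python
import re
from collections import Counter

# A dialogue line starts with a capitalised speaker name followed by a colon.
SPEAKER_LINE = re.compile(r"^\s*([A-Z][A-Za-z]*):")

def name_histograms(script_text):
    """Return one Counter of speaker names per script scene (assumed format)."""
    scenes = []
    for line in script_text.splitlines():
        if line.strip().startswith("[Scene:"):
            scenes.append(Counter())          # a new scene starts here
        else:
            match = SPEAKER_LINE.match(line)
            if match and scenes:
                scenes[-1][match.group(1)] += 1
    return scenes

script = """\
[Scene: Monica and Rachel's, Carol and Susan are showing off Ben to the gang.]
Phoebe: Oh my God, oh, ok, was that too much pressure for him?
Susan: Oh, is he hungry already?
Carol: I guess so.
[Scene: Central Perk, the gang is all there.]
Julie: Rachel, do you have any muffins left?
Rachel: Yeah, I forget which ones.
Julie: Oh, you're busy, that's ok, I'll get it. Anybody else want one?
"""
hists = name_histograms(script)
for index, hist in enumerate(hists, start=1):
    print(index, dict(hist))
```

The face histogram would be built the same way on the video side, with face cluster labels per shot taking the place of speaker names.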






Role histogram
     Face clustering
         Solution I: Semi-supervised kernel k-means clustering

     Key points
         Incorporate pairwise constraints (must-link and cannot-link)
         Adopt manifold-manifold distance




          [Figure: left, must-link and cannot-link constraints; right, manifold-manifold distance.]




Role histogram
     Face clustering
         Solution II: Relaxed number of clusters
     Key points
         Allow purer substructures






Model formulation

     Graphical TVParser model

     [Figure: graphical model. Each script scene $s_i$ is linked to a partition $p_i = (t_i, d_i)$, which spans the video shot subsequence $v_{(i)}$ from shot $t_i$ to shot $t_i + d_i$; the same structure repeats for $i - 1$ and $i + 1$.]

              $S = \{s_i \mid i = 1, \ldots, r\}$ is the observed script scene sequence;
              $V = \{v_j \mid j = 1, \ldots, u\}$ is the observed video shot sequence;
              $P = \{p_i = (t_i, d_i) \mid i = 1, \ldots, r\}$ is the hidden video scene partition
              sequence, where $t_1 = 1$, $\sum_i d_i = u$ and $t_i = t_{i-1} + d_{i-1}$ ($i > 1$).
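
To make the partition constraints concrete, here is a minimal sketch that builds the sequence of $(t_i, d_i)$ pairs from a list of scene durations and checks that they sum to $u$. The helper name `partition_from_durations` is hypothetical, and the sketch assumes scene $i$ covers the $d_i$ shots starting at shot $t_i$.

```python
def partition_from_durations(durations, u):
    """Build [(t_i, d_i)] with t_1 = 1 and t_i = t_{i-1} + d_{i-1}."""
    if sum(durations) != u:
        raise ValueError("durations must sum to the number of shots u")
    partition, t = [], 1
    for d in durations:
        partition.append((t, d))
        t += d                      # the next scene starts right after this one
    return partition

print(partition_from_durations([4, 3, 3], u=10))
# prints [(1, 4), (5, 3), (8, 3)]
```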




Model formulation

      Complete TVParser model

          $$P(V, S, P) = P(s_1)\,P(p_1 \mid s_1)\,P(v_{(1)} \mid p_1, s_1) \times \prod_{i=2}^{r} P(s_i \mid s_{i-1})\,P(p_i \mid s_i)\,P(v_{(i)} \mid p_i, s_i)$$


  The generative process
  (1) Enter the $i$-th script scene $s_i$ from its predecessor $s_{i-1}$;
  (2) Decide the partition $p_i = (t_i, d_i)$ associated with $s_i$;
  (3) Generate the corresponding video shot subsequence $v_{(i)} = v_{[t_i : t_i + d_i]}$,
      indexed from $t_i$ to $t_i + d_i$.



Model formulation
     Additional constraint
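         $P(s_1) = 1 \Leftrightarrow s_1 = 1$
         $P(s_i \mid s_{i-1}) = 1 \Leftrightarrow s_i = i,\ s_{i-1} = i - 1$


     Simplified TVParser model
         $$P(V, S, P) = \prod_{i=1}^{r} \underbrace{P(p_i \mid s_i)}_{\text{duration}} \; \underbrace{P(v_{(i)} \mid p_i, s_i)}_{\text{observation}}$$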






Model formulation
     Scene duration probability
         Poisson distribution

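         $$P(p_i \mid s_i; \lambda_i) = \frac{\lambda_i^{d_i} e^{-\lambda_i}}{d_i!} = e^{-\lambda_i} \cdot \frac{\lambda_i^{d_i}}{d_i!}$$

     Reasons
         The Poisson distribution is a plausible model of state duration;
         The model parameter, $\lambda = \{\lambda_i\}$, is the expected duration of each scene;
         The parameter can be estimated by the maximum likelihood method.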
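
A small sketch of the duration term, evaluated in log space to avoid overflow in the factorial. The function names and the example durations are hypothetical. `fit_lambda` is the textbook maximum likelihood estimate for fully observed durations (the sample mean), which is simpler than the posterior-weighted estimate the model needs when the partition is hidden.

```python
import math

def log_poisson_duration(d, lam):
    """log P(p_i | s_i; lambda_i) for a scene that lasts d shots."""
    return d * math.log(lam) - lam - math.lgamma(d + 1)

def fit_lambda(durations):
    """Maximum likelihood estimate of lambda from observed durations: the sample mean."""
    return sum(durations) / len(durations)

# Hypothetical scene lengths in shots, for illustration only.
lam = fit_lambda([18, 22, 25, 15])
print(lam, math.exp(log_poisson_duration(20, lam)))
```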






Model formulation
      Observation probability
            Gaussian distribution


      $$P(v_{(i)} \mid p_i, s_i; A, \sigma_i) = \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\left\{ -\frac{(s_i - A v_{(i)})^{\top} (s_i - A v_{(i)})}{2\sigma_i^2} \right\}$$


  Meaning of parameter A
  $A = [A_{ij}] \in \mathbb{R}^{M \times N}$ is the face-name relation matrix that associates
  $M$ names with $N$ face clusters. By constraining the entries of $A$ so that
  $A_{ij} \ge 0$ and $\sum_i A_{ij} = 1$, we can treat each column as an identity
  distribution of the corresponding face cluster.
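
The observation term can be sketched as follows, reading $s_i$ as the name histogram of script scene $i$ (length $M$) and $v_{(i)}$ as the face histogram of the matched shots (length $N$). The toy numbers and the helper names are hypothetical; the matrix `A` below simply has non-negative entries with each column summing to 1, as the constraint requires.

```python
import math

def mat_vec(A, v):
    """Multiply an M x N matrix (list of rows) by a length-N vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def log_observation(s, v, A, sigma2):
    """log P(v_(i) | p_i, s_i; A, sigma_i) for name histogram s and face histogram v."""
    residual = [s_k - p_k for s_k, p_k in zip(s, mat_vec(A, v))]
    squared_error = sum(e * e for e in residual)
    return -0.5 * math.log(2 * math.pi * sigma2) - squared_error / (2 * sigma2)

# Toy setting: M = 2 names, N = 3 face clusters.
A = [[1.0, 0.8, 0.0],
     [0.0, 0.2, 1.0]]          # every column sums to 1
v = [2, 5, 1]                  # faces per cluster in the candidate segment
good = log_observation([6.0, 2.0], v, A, sigma2=1.0)   # A v equals s here
bad = log_observation([2.0, 6.0], v, A, sigma2=1.0)    # names swapped
print(good, bad)
```

A segment whose projected face histogram `A v` matches the scene's name histogram scores higher than a mismatched one, which is what lets the alignment and the naming inform each other.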



Parameter estimation
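     Model parameters $\Psi = \{\{\lambda_i\}, \{\sigma_i^2\}, A\}$
         Maximum likelihood estimation (MLE)

         $$\max_{\hat{\Psi}} \sum_{P} P(P \mid V, S; \Psi) \cdot \log P(V, S, P; \hat{\Psi}) \quad \text{s.t.} \quad \mathbf{1}_M^{\top} A = \mathbf{1}_N^{\top}, \quad A \ge 0$$

     Optimization problem
         For $\{\lambda_i\}$ and $\{\sigma_i\}$: unconstrained optimization
         For $A$: constrained optimization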






Parameter estimation

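     Re-estimation for $\{\lambda_i\}$

         $$\lambda_i = \frac{\sum_{p_i} P(p_i \mid V, S; \Psi) \cdot d_i}{\sum_{p_i} P(p_i \mid V, S; \Psi)}$$


     Re-estimation for $\{\sigma_i\}$

         $$\sigma_i^2 = \frac{\sum_{p_i} P(p_i \mid V, S; \Psi) \cdot (s_i - A v_{(i)})^{\top} (s_i - A v_{(i)})}{\sum_{p_i} P(p_i \mid V, S; \Psi)}$$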
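
Both re-estimation formulas are posterior-weighted averages over the candidate partitions of one scene. A minimal sketch, with a hypothetical posterior table and a hypothetical helper name `posterior_average`:

```python
def posterior_average(posteriors, value):
    """Posterior-weighted average of value(t, d) over candidate partitions."""
    total = sum(posteriors.values())
    return sum(w * value(t, d) for (t, d), w in posteriors.items()) / total

# Hypothetical posterior for one scene: {(t_i, d_i): P(p_i | V, S; Psi)}.
posteriors = {(1, 3): 0.2, (1, 4): 0.5, (1, 5): 0.3}

# lambda_i: the expected duration under the posterior.
lam_i = posterior_average(posteriors, lambda t, d: d)
print(lam_i)

# For sigma_i^2, value(t, d) would instead return the squared residual
# (s_i - A v_(i))^T (s_i - A v_(i)) of the segment starting at t with length d.
```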






Parameter estimation

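     Re-estimation for $A$

         $$A_{ij} \leftarrow A_{ij} \, \frac{(W - \mathbf{1}_M \eta)^{+}_{ij}}{2 (A U)_{ij} + (W - \mathbf{1}_M \eta)^{-}_{ij}}$$

     where

         $$W = \sum_{P} P(P \mid V, S; \Psi) \sum_{i=1}^{r} \frac{1}{\sigma_i^2} \, s_i v_{(i)}^{\top}$$
         $$U = \sum_{P} P(P \mid V, S; \Psi) \sum_{i=1}^{r} \frac{1}{2\sigma_i^2} \, v_{(i)} v_{(i)}^{\top}$$
         $$\eta = \frac{1}{M} \left( \mathbf{1}_M^{\top} W - 2 \cdot \mathbf{1}_N^{\top} U \right)$$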
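
The multiplicative update for $A$ can be sketched as below. This follows the update as it reads on the slide, under two assumptions that the slide does not spell out: $X^{+}$ and $X^{-}$ denote the positive and negative parts of $X$ (so $X = X^{+} - X^{-}$), and $\eta$ is a length-$N$ row vector. The small `eps` in the denominator is an added safeguard against division by zero, and the toy `W` and `U` are hypothetical.

```python
def update_A(A, W, U, eps=1e-12):
    """One multiplicative update of the M x N matrix A (lists of rows)."""
    M, N = len(A), len(A[0])
    # eta = (1/M) * (1_M^T W - 2 * 1_N^T U), one entry per face cluster
    col_W = [sum(W[i][j] for i in range(M)) for j in range(N)]
    col_U = [sum(U[k][j] for k in range(N)) for j in range(N)]
    eta = [(col_W[j] - 2 * col_U[j]) / M for j in range(N)]
    new_A = []
    for i in range(M):
        row = []
        for j in range(N):
            g = W[i][j] - eta[j]                  # entry (i, j) of W - 1_M eta
            pos, neg = max(g, 0.0), max(-g, 0.0)  # positive and negative parts
            AU = sum(A[i][k] * U[k][j] for k in range(N))
            row.append(A[i][j] * pos / (2 * AU + neg + eps))
        new_A.append(row)
    return new_A

# Toy statistics from two scenes with s = v, v_1 = [2, 0], v_2 = [0, 3], sigma^2 = 1.
W = [[4.0, 0.0], [0.0, 9.0]]
U = [[2.0, 0.0], [0.0, 4.5]]
A1 = update_A([[0.5, 0.5], [0.5, 0.5]], W, U)
print(A1)
```

In this toy case a single step moves the uniform starting matrix to (numerically) the identity, i.e. each face cluster is assigned to exactly one name. Non-negativity is preserved by construction because every factor in the update is non-negative.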




Parameter estimation

      Summation in both $W$ and $U$

          $$\sum_{P} P(P \mid V, S; \Psi)$$

          The sum runs over the whole space of possible partition sequences
          Typical example: $r = 15$ (scenes) and $u = 300$ (shots); the number of
          possible segmentations is then $C_{299}^{15} \approx O(10^{24})$ (intractable!)
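
The size of the partition space is easy to check numerically. The sketch below assumes every scene contains at least one shot, so a partition amounts to choosing $r - 1$ boundaries among the $u - 1$ gaps between consecutive shots; it also evaluates the binomial coefficient quoted on the slide. Either way the count is astronomically large.

```python
import math

r, u = 15, 300                                  # scenes and shots, as in the example

# Assumption: each scene has at least one shot, so r - 1 cut points
# are chosen among the u - 1 gaps between consecutive shots.
n_partitions = math.comb(u - 1, r - 1)

# The binomial coefficient as written on the slide.
slide_count = math.comb(299, 15)

print(f"{n_partitions:.2e}  {slide_count:.2e}")
```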


  Solution: Sequence ⇒ segments
          $$\sum_{P} P(P \mid V, S; \Psi) \sum_{i=1}^{r} (\cdot) = \sum_{i=1}^{r} \sum_{p_i} P(p_i \mid V, S; \Psi) \, (\cdot)$$





Parameter estimation
     Posterior probability P(pi |V, S; Ψ)
          Forward-backward algorithm


     Forward-backward variables

          $$\alpha_{p_i}(s_i) \triangleq P(s_i, p_i, v_{[1 : t_i + d_i]}; \Psi)$$
          $$\beta_{p_i}(s_i) \triangleq P(v_{[t_i + d_i + 1 : u]} \mid s_i, p_i; \Psi)$$

          Forward-backward recursion
          Initial conditions
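
A minimal forward-backward sketch for the simplified model, in which scenes occur in a fixed order and each segment contributes one duration term and one observation term. The indexing is an assumption of this sketch rather than the slide's exact recursion: scene $i$ covers the $d$ shots $t, \ldots, t + d - 1$ and every scene has at least one shot. The callback `seg_prob` is a hypothetical stand-in for $P(p_i \mid s_i)\,P(v_{(i)} \mid p_i, s_i)$.

```python
def segment_posteriors(r, u, seg_prob):
    """Posterior P(p_i = (t, d) | V, S) for each scene i = 1..r.

    seg_prob(i, t, d) returns the duration-times-observation probability of
    scene i (1-based) covering the d shots t, ..., t + d - 1.
    """
    # alpha[i][e]: probability that scenes 1..i cover exactly shots 1..e
    alpha = [[0.0] * (u + 1) for _ in range(r + 1)]
    alpha[0][0] = 1.0
    for i in range(1, r + 1):
        for e in range(i, u + 1):
            alpha[i][e] = sum(alpha[i - 1][e - d] * seg_prob(i, e - d + 1, d)
                              for d in range(1, e - i + 2))
    # beta[i][e]: probability that scenes i+1..r cover exactly shots e+1..u
    beta = [[0.0] * (u + 1) for _ in range(r + 1)]
    beta[r][u] = 1.0
    for i in range(r - 1, -1, -1):
        for e in range(i, u):
            beta[i][e] = sum(seg_prob(i + 1, e + 1, d) * beta[i + 1][e + d]
                             for d in range(1, u - e + 1))
    z = alpha[r][u]                      # total probability of the observations
    posteriors = [dict() for _ in range(r)]
    for i in range(1, r + 1):
        for t in range(1, u + 1):
            for d in range(1, u - t + 2):
                w = alpha[i - 1][t - 1] * seg_prob(i, t, d) * beta[i][t + d - 1]
                if w > 0.0:
                    posteriors[i - 1][(t, d)] = w / z
    return posteriors

# With a flat segment score, the two ways to split 3 shots into 2 scenes are equally likely.
post = segment_posteriors(2, 3, lambda i, t, d: 1.0)
print(post)
```

This version multiplies raw probabilities, which would underflow on an episode with hundreds of shots; a practical implementation would rescale the variables or work in log space.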






State inference
      Hidden partition sequence $P^{*}$
          Viterbi algorithm


      Local optimum

          $$\delta_{\tau}(s_i; \theta) \triangleq \max_{p_{[1:i-1]}} P(p_{[1:i-1]}, s_{[1:i-1]}, \tau \in q_i, o_{[1:\tau]}; \theta)$$



          Forward recursion
          Backtracking
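
The forward recursion and backtracking can be sketched as a dynamic program over (scene, last shot) pairs. As before, this is an illustrative formulation under assumed indexing (scene $i$ covers shots $t, \ldots, t + d - 1$, at least one shot per scene), and `seg_logprob` is a hypothetical callback returning the log of the duration and observation terms for one segment.

```python
def best_partition(r, u, seg_logprob):
    """Return the highest-scoring partition [(t_i, d_i)] and its log score."""
    neg_inf = float("-inf")
    # delta[i][e]: best log score for scenes 1..i covering exactly shots 1..e
    delta = [[neg_inf] * (u + 1) for _ in range(r + 1)]
    back = [[0] * (u + 1) for _ in range(r + 1)]
    delta[0][0] = 0.0
    for i in range(1, r + 1):                    # forward recursion
        for e in range(i, u + 1):
            for d in range(1, e - i + 2):
                score = delta[i - 1][e - d] + seg_logprob(i, e - d + 1, d)
                if score > delta[i][e]:
                    delta[i][e], back[i][e] = score, d
    partition, e = [], u                         # backtracking from the last shot
    for i in range(r, 0, -1):
        d = back[i][e]
        partition.append((e - d + 1, d))
        e -= d
    return partition[::-1], delta[r][u]

# Toy score that prefers durations 2, 3 and 1 for the three scenes.
target = [2, 3, 1]
path, score = best_partition(3, 6, lambda i, t, d: -float((d - target[i - 1]) ** 2))
print(path, score)
```

The table has $r \times u$ entries and each is filled by scanning at most $u$ durations, so the search costs on the order of $r u^2$ segment evaluations instead of enumerating every partition.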






Data sets
      Two TV series
            6 episodes from American TV series “Friends”
            5 episodes from Chinese TV series “I Love My Family” (Family)


      Data details (average per episode)
            Length: 30 min
            Role number: 10
            Face number: $2 \times 10^5$
            Shot number: 300






Face naming
     Baselines
         Face clustering
                 Unconstrained kernel K-means (KK)
                 Constrained K-means (CK)
                 Completely positive factorization (CP)
                 Constrained spectral learning (SL)

         Face Recognition
                 K nearest neighbor (KNN)
                 Support vector machine (SVM)






Face naming
     Criteria
          Face clustering
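          $$\mathrm{NMI} = \frac{\sum_{l} \sum_{h} n_{l,h} \log\left( \frac{n \cdot n_{l,h}}{n_l n_h} \right)}{\sqrt{\left( \sum_{l} n_l \log \frac{n_l}{n} \right) \left( \sum_{h} n_h \log \frac{n_h}{n} \right)}}$$

          where $n$ is the number of objects, $n_l$ is the size of the $l$-th class
          in the ground truth, $n_h$ is the size of the $h$-th cluster in the result,
          and $n_{l,h}$ is the size of their intersection.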
          Face Recognition
          $$F_w = \sum_{i} w_i \cdot \frac{2 \times \mathrm{precision}_i \times \mathrm{recall}_i}{\mathrm{precision}_i + \mathrm{recall}_i}$$

          where $w_i$ denotes the weight of the $i$-th role according to
          his/her spoken lines in the script.
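
Both criteria are straightforward to compute from label lists and per-role scores. The sketch below uses the square-root normalised form of NMI and takes the role weights as given; how the weights are derived from the spoken lines is not stated on the slide, so the example weights are hypothetical.

```python
import math
from collections import Counter

def nmi(truth, pred):
    """Normalised mutual information between ground-truth classes and clusters."""
    n = len(truth)
    n_l, n_h = Counter(truth), Counter(pred)
    n_lh = Counter(zip(truth, pred))             # sizes of class/cluster intersections
    numerator = sum(c * math.log(n * c / (n_l[l] * n_h[h]))
                    for (l, h), c in n_lh.items())
    denominator = math.sqrt(sum(c * math.log(c / n) for c in n_l.values())
                            * sum(c * math.log(c / n) for c in n_h.values()))
    return numerator / denominator               # undefined if either side has one group

def weighted_f(per_role, weights):
    """Weighted F-measure; per_role maps role -> (precision, recall)."""
    total = 0.0
    for role, (precision, recall) in per_role.items():
        if precision + recall > 0:
            total += weights[role] * 2 * precision * recall / (precision + recall)
    return total

print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))           # same grouping, different label names
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))           # grouping unrelated to the truth
print(weighted_f({"RACH": (1.0, 0.5), "ROSS": (0.5, 0.5)},
                 {"RACH": 0.75, "ROSS": 0.25}))  # hypothetical weights
```

NMI is invariant to how clusters are numbered, which suits face clustering where cluster indices carry no meaning until names are attached.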


Face naming
     Face clustering
          Constrained vs. unconstrained
          Varying the cluster number


     [Figure: NMI score vs. cluster number (x times, ×0.0 to ×5.0) for CK, KK, SSKK, SL and CP; panels: Friends, Family.]




Face naming
     Face recognition (naming)
          Optimal recognition is achieved when the cluster number is
          about twice the number of characters


     [Figure: A purifying rate, precision, recall and Fw-measure vs. cluster number (x times, ×0.0 to ×5.0); panels: Friends, Family.]





Face naming
                               Main character naming result
                                     Accuracy
                                     Robustness


     [Figure: weighted F-measure vs. cluster number (x times, ×0.0 to ×5.0) for the 1st to 4th main characters; panels: Friends, Family.]





Face naming
     Comparison with supervised methods
          Comparable to supervised methods
          Even better when the training set is limited


     [Figure: weighted F-measure vs. training-test ratio (0.1 to 0.9) for KNN, SVM and TVParser (1st, 2nd and 3rd best); panels: Friends, Family.]




Scene segmentation
     Baselines
         Scene segmentation methods (algorithms)
                 Shot similarity graph (SSG)
                 Dynamic time warping (DTW)
                 Hidden Markov model (HMM)






Scene segmentation
     Criteria
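          Scene segmentation purity

          $$\rho = \left( \sum_{i=1}^{r} \frac{d_i}{u} \sum_{j=1}^{r} \frac{d_{ij}^2}{d_i^2} \right) \cdot \left( \sum_{j=1}^{r} \frac{d_j^*}{u} \sum_{i=1}^{r} \frac{d_{ij}^2}{(d_j^*)^2} \right)$$

          where $d_{ij}$ is the length of overlap between the scene segments
          $p_i$ and $p_j^*$, $d_i$ is the length of scene $p_i$, $d_j^*$ is the
          length of scene $p_j^*$, $r$ is the number of scenes, and $u$ is the
          total length of all scenes. The purity value ranges from 0 to 1;
          the larger the value, the closer the result is to the groundtruth.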
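
As a minimal sketch of how the purity criterion above could be computed, the following Python code evaluates ρ for two partitions of the same shot sequence. It assumes that each scene is given as a (start, length) pair over shot indices and that the normalizer u is the total number of shots. The function names `overlap`, `one_sided` and `purity` are illustrative and do not come from the original slides.

```python
def overlap(a, b):
    """Overlap length of two scenes given as (start, length) pairs."""
    a_start, a_len = a
    b_start, b_len = b
    return max(0, min(a_start + a_len, b_start + b_len) - max(a_start, b_start))


def one_sided(scenes, others, u):
    """One factor of the purity: sum_i (d_i / u) * sum_j d_ij^2 / d_i^2."""
    total = 0.0
    for scene in scenes:
        d = scene[1]
        squared_overlaps = sum(overlap(scene, other) ** 2 for other in others)
        total += (d / u) * squared_overlaps / d ** 2
    return total


def purity(pred, truth):
    """Purity of a predicted partition against a reference partition."""
    u = sum(length for _, length in pred)
    if u != sum(length for _, length in truth):
        raise ValueError("both partitions must cover the same number of shots")
    # Product of the two one-sided terms, so the score is symmetric.
    return one_sided(pred, truth, u) * one_sided(truth, pred, u)


# Hypothetical example: 10 shots split into two scenes in two different ways.
pred = [(1, 3), (4, 7)]
truth = [(1, 5), (6, 5)]
print(round(purity(pred, truth), 3))   # 0.543
print(round(purity(truth, truth), 3))  # 1.0
```

Because the score is a product of the two one-sided terms, both over-segmentation and under-segmentation lower it, and only an exact match with the reference partition yields 1.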






Scene segmentation

     Scene segmentation result

     Segmentation   Sources              Purity score   Purity score
     method         (video +)            (Friends)      (Family)
     SSG            -                    0.55 ± 0.11    0.53 ± 0.07
     DTW            subtitle + script    0.60 ± 0.13    -
     HMM            script               0.59 ± 0.08    0.53 ± 0.05
     TVParser       script               0.67 ± 0.07    0.58 ± 0.03






Scene segmentation
     Scene segmentation result under various role histograms
         Name histogram: first four characters are dominant
         Face histogram: more clusters are generally better

     [Figure: purity score as a function of face histogram dimension and name histogram dimension (left); two plots of average purity vs. histogram size, both with the x-axis labelled "Face histogram size", annotated ↑0.05 (≈29%) and ↑0.12 (≈71%) (right)]





Conclusion

     We propose a generative model of story plot development
     in TV videos, which solves face naming and scene
     segmentation in a unified framework.

     Key novelties
         Unsupervised face naming through model parameter learning
         Globally optimal scene segmentation by hidden state inference
         Fast algorithms for both parameter learning and state inference


     Future work
         Personalized applications, e.g. TV video synthesis;
         Generic cross-media analysis and association methods.





Q&A
Thanks!





Tv parser an automatic tv video parsing method_liang_20100309

  • 1. Introduction Our solution TVParser model Experimental Results Conclusion TVParser: An Automatic TV Video Parsing Method Chao Liang National Laboratory of Pattern Recognition (NLPR) Chinese Academy of Sciences, Institute of Automation (CASIA) March 9, 2011 Chao Liang TVParser: An Automatic TV Video Parsing Method
  • 2. Introduction Our solution TVParser model Experimental Results Conclusion Outline 1 Introduction Motivation Related work 2 Our solution Basic ideas Role histogram 3 TVParser model Model formulation Parameter estimation State inference 4 Experimental Results Data sets Face naming Scene segmentation 5 Conclusion Chao Liang TVParser: An Automatic TV Video Parsing Method
  • 3. Introduction Our solution Motivation TVParser model Related work Experimental Results Conclusion Introduction Motivation Voluminous TV videos vs. efficient management Chao Liang TVParser: An Automatic TV Video Parsing Method
  • 4. Introduction Our solution Motivation TVParser model Related work Experimental Results Conclusion Introduction TV video Story plot (scene structure) [Scene: Monica and Rachel's, Carol and Susan are showing off Ben to the gang.] Phoebe: Oh my God, oh, ok, was that too much pressure for him? Susan: Oh, is he hungry already? Carol: I guess so. (Carol starts to breast feed Ben.) … … [Scene: Central Perk, the gang is all there.] Julie: Rachel, do you have any muffins left? Rachel: Yeah, I forget which ones. Julie: Oh, you're busy, that's ok, I'll get it. Anybody else want one? … … Characters (named faces) RACH MNCA PHBE ROSS JOY CHAN Chao Liang TVParser: An Automatic TV Video Parsing Method
  • 5. Introduction Our solution Motivation TVParser model Related work Experimental Results Conclusion Related work Movie/Script alignment Script-subtitle alignment 00:10:44,210 --> [Scene: Rachel is 00:10:45,177 entering the living room.] Monica: Julie. Monica: Julie. 00:10:45,444 --> 00:10:46,775 Rachel: What?! Rachel: What?! script subtitle movie Disadvantages Syntax and words discrepancy between the script and subtitle Availability of the subtitle Chao Liang TVParser: An Automatic TV Video Parsing Method
  • 6. Introduction Our solution Motivation TVParser model Related work Experimental Results Conclusion Related work (cont.) Face naming Fully supervised Weakly supervised [Scene: Rachel is Monica entering the living room.] Monica: Julie. Rachel Rachel: What?! (a) weakly supervised (b) fully supervised Disadvantages Expensive manual labels Large-scale applications Chao Liang TVParser: An Automatic TV Video Parsing Method
  • 7. Introduction Our solution Motivation TVParser model Related work Experimental Results Conclusion Related work (cont.) Scene segmentation Content-based method Script-guided method t=1 t=2 t=3 t=4 Shot 1 Shot 2 Shot 3 Shot 4 Observation sequence bq1, shot1 bq2, shot2 bq3, shot3 bq4, shot4 Scene q1 Scene q2 Scene q3 Scene q4 Hidden state aq1,q2 aq2,q3 aq3,q4 ... sequence aq1,q3 aq2,q4 aq1,q4 HMM : λ= {A, B, п} = {A(aqi, qj), B(bqi, shotj),п} Viterbi alignment : Q = {q1, q2, q3, q4, q5, ...} Disadvantages Matching units are asymmetric Latent geometric distribution Chao Liang TVParser: An Automatic TV Video Parsing Method
  • 8. Introduction Our solution Basic ideas TVParser model Role histogram Experimental Results Conclusion Our solution Basic ideas A generative TVParser model to align video and script by mining face-name correspondence. JOEY 3 0 1 2 0 2 2 0 0 1 1 2 1 MNCA 2 1 0 2 0 1 1 1 0 2 0 0 0 RACH 1 1 0 1 0 0 1 0 1 1 0 0 0 CHAN 0 0 1 0 0 0 1 0 0 0 0 2 0 C1 C2 C3 S1 S2 S3 S4 S7 S8 S9 S10 S11 S12 C1:{S1, ,S4} C2:{S6, ,S8} C3:{S10, ,S12} name histogram face histogram Advantages Face names can be identified in an unsupervised way (learning) Global optimal scene segmentation can be inferred (inference) Fast algorithms for both parameter learning and state inference Chao Liang TVParser: An Automatic TV Video Parsing Method
  • 9. Introduction Our solution Basic ideas TVParser model Role histogram Experimental Results Conclusion Role histogram Basic idea Bag-of-Words (BoW) representation Role composition is a generic and semantic feature for both video (as face histogram) and script (as name histogram) Name clustering Face clustering Difficulty: variational environment conditions, e.g. pose, etc. Chao Liang TVParser: An Automatic TV Video Parsing Method
  • 10. Introduction Our solution Basic ideas TVParser model Role histogram Experimental Results Conclusion Role histogram Face clustering Solution I: Semi-supervised kernel k-means clustering Key points Incorporate pairwise constraints (must-link and cannot-link) Adopt manifold-manifold distance t must-link and cannot-link manifold-manifold distance Chao Liang TVParser: An Automatic TV Video Parsing Method
  • 11. Introduction Our solution Basic ideas TVParser model Role histogram Experimental Results Conclusion Role histogram Face clustering Solution II: Loose clustering number Key points Allowing purified substructures Chao Liang TVParser: An Automatic TV Video Parsing Method
  • 12. Introduction Our solution Model formulation TVParser model Parameter estimation Experimental Results State inference Conclusion Model formulation Graphical TVParser model v(i-1) v(i) v(i+1) ... ... ... ti-1 ti-1+di-1 ti ti+di ti+1 ti+1+di+1 pi-1 = (ti-1 , di-1) pi = (ti , di) pi+1 = (ti+1 , di+1) si-1 si si+1 S : {si |i=1, · · ·, r } is observed script scene sequence; V : {vj |j=1, · · ·, u} is observed video shot sequence; P : {pi =(ti , di )|i=1, · · · , r } is the hidden video scene partition sequence where t1 = 1, i di = u and ti = ti−1 + di−1 (i > 1). Chao Liang TVParser: An Automatic TV Video Parsing Method
  • 13. TVParser model: Model formulation. Complete TVParser model: $P(V, S, P) = P(s_1)\,P(p_1 \mid s_1)\,P(v^{(1)} \mid p_1, s_1) \times \prod_{i=2}^{r} P(s_i \mid s_{i-1})\,P(p_i \mid s_i)\,P(v^{(i)} \mid p_i, s_i)$. The generative process: (1) enter the $i$-th script scene $s_i$ from its predecessor $s_{i-1}$; (2) decide the partition $p_i = (t_i, d_i)$ related to $s_i$; (3) generate the corresponding video shot subsequence $v^{(i)} = v_{[t_i : t_i + d_i]}$, indexing from $t_i$ to $t_i + d_i$.
  • 14. TVParser model: Model formulation. Additional constraint: $P(s_1) = 1 \Leftrightarrow s_1 = 1$; $P(s_i \mid s_{i-1}) = 1 \Leftrightarrow s_i = i,\ s_{i-1} = i - 1$. Simplified TVParser model: $P(V, S, P) = \prod_{i=1}^{r} \underbrace{P(p_i \mid s_i)}_{\text{duration}}\ \underbrace{P(v^{(i)} \mid p_i, s_i)}_{\text{observation}}$.
  • 15. TVParser model: Model formulation. Scene duration probability, Poisson distribution: $P(p_i \mid s_i; \lambda_i) = \frac{\lambda_i^{d_i} e^{-\lambda_i}}{d_i!} = e^{-\lambda_i} \cdot \frac{\lambda_i^{d_i}}{d_i!}$. Reasons: Poisson is a plausible model of state duration; the model parameter, $\lambda = \{\lambda_i\}$, is the expected duration of the scenes; the parameter can be estimated by the maximum likelihood method.
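The duration term above can be sketched as follows. This is only an illustration of the Poisson formula on the slide: the function names and the example durations are assumptions, and `mle_lambda` shows the textbook maximum likelihood estimate for fully observed durations (the sample mean), not the posterior-weighted estimate used later in the deck.

```python
import math


def log_duration_prob(d, lam):
    """log P(p_i | s_i; lambda_i) for a scene lasting d shots (Poisson)."""
    return d * math.log(lam) - lam - math.lgamma(d + 1)


def mle_lambda(durations):
    """Maximum likelihood estimate of lambda from observed durations."""
    return sum(durations) / len(durations)


# Hypothetical scene durations, measured in shots.
lam = mle_lambda([18, 22, 20])
p_twenty = math.exp(log_duration_prob(20, lam))
print(lam, p_twenty)
```

Working in the log domain avoids overflow in the factorial for long scenes.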
  • 16. TVParser model: Model formulation. Observation probability, Gaussian distribution: $P(v^{(i)} \mid p_i, s_i; A, \sigma_i) = \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\left(-\frac{(s_i - A v^{(i)})^\top (s_i - A v^{(i)})}{2\sigma_i^2}\right)$. Meaning of parameter $A$: $A = [A_{ij}] \in \mathbb{R}^{M \times N}$ is the face-name relation matrix that associates $M$ names with $N$ face clusters. By constraining the entries of $A$ so that $A_{ij} \geq 0$ and $\sum_i A_{ij} = 1$, we can treat each column as an identity distribution of the corresponding face cluster.
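A minimal sketch of the observation term, assuming that $s_i$ is a name histogram of length M, that $v^{(i)}$ is the face histogram of the candidate shot range with length N, and that the normalising constant is the scalar $1/\sqrt{2\pi\sigma_i^2}$. The matrix, histograms and function names below are made up for illustration.

```python
import math


def matvec(A, v):
    """Multiply an M x N matrix (list of rows) by a length-N vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def log_observation_prob(s, v, A, sigma2):
    """log P(v | p_i, s_i; A, sigma_i) under the isotropic Gaussian model."""
    residual = [si - pi for si, pi in zip(s, matvec(A, v))]
    squared_error = sum(r * r for r in residual)
    return -0.5 * math.log(2 * math.pi * sigma2) - squared_error / (2 * sigma2)


# Hypothetical relation matrix: 2 names, 3 face clusters, columns sum to 1.
A = [[1.0, 0.5, 0.0],
     [0.0, 0.5, 1.0]]
face_hist = [2, 2, 1]          # A maps this to the name counts [3, 2]
perfect = log_observation_prob([3, 2], face_hist, A, sigma2=1.0)
mismatch = log_observation_prob([0, 5], face_hist, A, sigma2=1.0)
print(perfect, mismatch)
```

A face histogram that maps exactly onto the name histogram gives the highest possible score; any mismatch is penalised quadratically.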
  • 17. TVParser model: Parameter estimation. Model parameters: $\Psi = \{\{\lambda_i\}, \{\sigma_i^2\}, A\}$. Maximum likelihood estimation (MLE): $\max_{\hat{\Psi}} \sum_{P} P(P \mid V, S; \Psi) \cdot \log P(V, S, P; \hat{\Psi})$ s.t. $\mathbf{1}_M^\top A = \mathbf{1}_N^\top$, $A \geq 0$. Optimization problem: for $\{\lambda_i\}$ and $\{\sigma_i\}$, unconstrained optimization; for $A$, constrained optimization.
  • 18. TVParser model: Parameter estimation. Re-estimation for $\{\lambda_i\}$: $\lambda_i = \frac{\sum_{p_i} P(p_i \mid V, S; \Psi) \cdot d_i}{\sum_{p_i} P(p_i \mid V, S; \Psi)}$. Re-estimation for $\{\sigma_i\}$: $\sigma_i^2 = \frac{\sum_{p_i} P(p_i \mid V, S; \Psi) \cdot (s_i - A v^{(i)})^\top (s_i - A v^{(i)})}{\sum_{p_i} P(p_i \mid V, S; \Psi)}$.
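Both re-estimation formulas are posterior-weighted averages over the candidate partitions of one scene. The sketch below assumes the posterior is available as a dictionary from `(t, d)` pairs to probabilities and that the squared residual of each candidate has been precomputed; the data and function names are hypothetical.

```python
def reestimate_lambda(posterior):
    """Posterior-weighted mean duration for one scene.

    posterior maps a candidate partition (t, d) to P(p_i | V, S; Psi).
    """
    total = sum(posterior.values())
    return sum(w * d for (_, d), w in posterior.items()) / total


def reestimate_sigma2(posterior, squared_residual):
    """Posterior-weighted mean of the squared residual for one scene."""
    total = sum(posterior.values())
    return sum(w * squared_residual[p] for p, w in posterior.items()) / total


# Hypothetical posterior over three candidate partitions starting at shot 1.
posterior = {(1, 3): 0.2, (1, 4): 0.5, (1, 5): 0.3}
squared_residual = {(1, 3): 2.0, (1, 4): 1.0, (1, 5): 4.0}

new_lambda = reestimate_lambda(posterior)
new_sigma2 = reestimate_sigma2(posterior, squared_residual)
print(new_lambda, new_sigma2)
```

Dividing by the posterior total keeps the estimate correct even when the weights have not been normalised.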
  • 19. TVParser model: Parameter estimation. Re-estimation for $A$: $A_{ij} \leftarrow A_{ij} \frac{(W - \mathbf{1}_M \eta)^{+}_{ij}}{2 (A U)_{ij} + (W - \mathbf{1}_M \eta)^{-}_{ij}}$, where $W = \sum_{P} P(P \mid V, S; \Psi) \sum_{i=1}^{r} \frac{1}{\sigma_i^2} s_i v^{(i)\top}$, $U = \sum_{P} P(P \mid V, S; \Psi) \sum_{i=1}^{r} \frac{1}{2\sigma_i^2} v^{(i)} v^{(i)\top}$ and $\eta = \frac{1}{M} \left( \mathbf{1}_M^\top W - 2 \cdot \mathbf{1}_N^\top U \right)$.
  • 20. TVParser model: Parameter estimation. Summation in both $W$ and $U$: $\sum_{P} P(P \mid V, S; \Psi)$ sums over the whole space of possible partition sequences. Typical example: $r = 15$ (scenes) and $u = 300$ (shots), so the number of possible segmentations is $C_{299}^{15} \approx O(10^{24})$ (intractable!). Solution: sequence ⇒ segments, $\sum_{P} P(P \mid V, S; \Psi) \sum_{i=1}^{r} (\cdot) = \sum_{i=1}^{r} \sum_{p_i} P(p_i \mid V, S; \Psi)\,(\cdot)$.
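The size of the search space for 15 scenes and 300 shots can be checked directly. The first quantity is the binomial coefficient quoted on the slide; the second is an assumption added here for comparison: if every scene must contain at least one shot, a segmentation amounts to choosing 14 cut points among the 299 gaps between consecutive shots.

```python
import math

scenes, shots = 15, 300

# Binomial coefficient quoted on the slide: choose 15 out of 299.
quoted = math.comb(shots - 1, scenes)

# Assumption (not from the slide): with non-empty scenes, a segmentation is
# a choice of scenes - 1 cut points among the shots - 1 gaps.
cut_points = math.comb(shots - 1, scenes - 1)

print(f"C(299, 15) = {quoted:.2e}")
print(f"C(299, 14) = {cut_points:.2e}")
```

Either way the count is far too large to enumerate, which is why the sum is decomposed per segment.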
  • 21. TVParser model: Parameter estimation. Posterior probability $P(p_i \mid V, S; \Psi)$: forward-backward algorithm. Forward-backward variables: $\alpha_{p_i}(s_i) \triangleq P(s_i, p_i, v_{[1 : t_i + d_i]}; \Psi)$, $\beta_{p_i}(s_i) \triangleq P(v_{[t_i + d_i + 1 : u]} \mid s_i, p_i; \Psi)$. Forward-backward recursion; initial conditions. [Formulas not preserved in the transcript]
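Since the recursion itself is not legible on this slide, the following is only a sketch of one standard way to obtain segment posteriors in an explicit-duration model with a fixed scene order. It is not the authors' exact recursion: the variables are indexed by the last shot of a scene rather than by $(t_i, d_i)$, durations are assumed to be at least one shot, and the observation model is passed in as a hypothetical callback `obs(i, t, d)`. Plain probabilities are used for brevity; a real implementation would need scaling or log-domain arithmetic.

```python
import math


def poisson(d, lam):
    """Poisson probability of a duration of d shots."""
    return math.exp(d * math.log(lam) - lam - math.lgamma(d + 1))


def segment_posteriors(u, lams, obs):
    """Return, for each scene i, a dict (t, d) -> P(p_i = (t, d) | V, S).

    u: number of shots; lams: expected duration per scene;
    obs(i, t, d): likelihood of shots t..t+d-1 under scene i (1-based).
    """
    r = len(lams)
    # alpha[i][e]: probability that scenes 1..i generate exactly shots 1..e.
    alpha = [[0.0] * (u + 1) for _ in range(r + 1)]
    alpha[0][0] = 1.0
    for i in range(1, r + 1):
        for e in range(1, u + 1):
            alpha[i][e] = sum(
                alpha[i - 1][e - d] * poisson(d, lams[i - 1]) * obs(i, e - d + 1, d)
                for d in range(1, e + 1))
    # beta[i][e]: probability that scenes i+1..r generate shots e+1..u.
    beta = [[0.0] * (u + 1) for _ in range(r + 1)]
    beta[r][u] = 1.0
    for i in range(r - 1, -1, -1):
        for e in range(u - 1, -1, -1):
            beta[i][e] = sum(
                poisson(d, lams[i]) * obs(i + 1, e + 1, d) * beta[i + 1][e + d]
                for d in range(1, u - e + 1))
    evidence = alpha[r][u]
    posteriors = []
    for i in range(1, r + 1):
        post_i = {}
        for t in range(1, u + 1):
            for d in range(1, u - t + 2):
                w = (alpha[i - 1][t - 1] * poisson(d, lams[i - 1])
                     * obs(i, t, d) * beta[i][t + d - 1])
                if w > 0.0:
                    post_i[(t, d)] = w / evidence
        posteriors.append(post_i)
    return posteriors


def flat_obs(i, t, d):
    """Hypothetical uninformative observation model."""
    return 1.0


post = segment_posteriors(6, [2.0, 4.0], flat_obs)
print(post[0])
```

The cost is polynomial in the number of shots and scenes, in contrast to the exponential number of complete partition sequences.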
  • 22. TVParser model: State inference. Hidden partition sequence $P^*$: Viterbi algorithm. Local optimum: $\delta_\tau(s_i; \theta) \triangleq \max_{p_{[1:i-1]}} P(p_{[1:i-1]}, s_{[1:i-1]}, \tau \in q_i, o_{[1:\tau]}; \theta)$. Forward recursion; backtracking. [Formulas not preserved in the transcript]
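The forward recursion and backtracking steps are not legible in the transcript, so the code below is a generic dynamic-programming sketch of Viterbi decoding for a fixed scene order with Poisson durations, not a reproduction of the authors' recursion. It assumes every scene lasts at least one shot, that there are no more scenes than shots, and that the observation model is supplied as a hypothetical callback `log_obs(i, t, d)`.

```python
import math


def log_poisson(d, lam):
    """Log of the Poisson probability of a duration of d shots."""
    return d * math.log(lam) - lam - math.lgamma(d + 1)


def best_partition(u, lams, log_obs):
    """Most probable partition of shots 1..u into len(lams) ordered scenes.

    Returns a list of (t_i, d_i) pairs. Assumes len(lams) <= u.
    """
    r = len(lams)
    neg_inf = float("-inf")
    # delta[i][e]: best log score of scenes 1..i covering exactly shots 1..e.
    delta = [[neg_inf] * (u + 1) for _ in range(r + 1)]
    back = [[0] * (u + 1) for _ in range(r + 1)]
    delta[0][0] = 0.0
    # Forward recursion over scenes and end positions.
    for i in range(1, r + 1):
        for e in range(i, u + 1):
            for d in range(1, e - i + 2):
                prev = delta[i - 1][e - d]
                if prev == neg_inf:
                    continue
                score = prev + log_poisson(d, lams[i - 1]) + log_obs(i, e - d + 1, d)
                if score > delta[i][e]:
                    delta[i][e] = score
                    back[i][e] = d
    # Backtracking from the last scene, which must end at shot u.
    partition, e = [], u
    for i in range(r, 0, -1):
        d = back[i][e]
        partition.append((e - d + 1, d))
        e -= d
    partition.reverse()
    return partition


def flat_log_obs(i, t, d):
    """Hypothetical uninformative observation model."""
    return 0.0


best = best_partition(6, [2.0, 4.0], flat_log_obs)
print(best)  # [(1, 2), (3, 4)]
```

With an uninformative observation model the result is driven by the duration prior alone, which makes the toy example easy to check by hand.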
  • 23. Experimental Results: Data sets. Two TV series: 6 episodes from the American TV series “Friends”; 5 episodes from the Chinese TV series “I Love My Family” (Family). Data details (average per episode): length: 30 min; role number: 10; face number: 2 × 10^5; shot number: 300.
  • 24. Experimental Results: Face naming. Baselines. Face clustering: unconstrained kernel k-means (KK); constrained k-means (CK); completely positive factorization (CP); constrained spectral learning (SL). Face recognition: k-nearest neighbor (KNN); support vector machine (SVM).
  • 25. Experimental Results: Face naming. Criteria. Face clustering: $\mathrm{NMI} = \frac{\sum_{l} \sum_{h} n_{l,h} \log\left(\frac{n \cdot n_{l,h}}{n_l n_h}\right)}{\sqrt{\left(\sum_{l} n_l \log\frac{n_l}{n}\right)\left(\sum_{h} n_h \log\frac{n_h}{n}\right)}}$, where $n$ is the number of objects, $n_l$ is the size of the $l$-th class in the ground truth, $n_h$ is the size of the $h$-th cluster in the result and $n_{l,h}$ is the size of their intersection. Face recognition: $F_w = \sum_{i} w_i \cdot \frac{2 \times \mathrm{precision}_i \times \mathrm{recall}_i}{\mathrm{precision}_i + \mathrm{recall}_i}$, where $w_i$ denotes the weight of the $i$-th role according to his/her spoken lines in the script.
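Both criteria are easy to compute from label lists. The sketch below assumes the square-root-normalised form of NMI and takes per-role precision, recall and weights as plain lists; the example labels and scores are made up, and the NMI function assumes at least two classes and two clusters so that the denominator is non-zero.

```python
import math
from collections import Counter


def nmi(truth, pred):
    """Normalised mutual information between two labelings of the same items."""
    n = len(truth)
    n_l = Counter(truth)
    n_h = Counter(pred)
    n_lh = Counter(zip(truth, pred))
    mutual = sum(c * math.log(n * c / (n_l[l] * n_h[h]))
                 for (l, h), c in n_lh.items())
    ent_l = sum(c * math.log(c / n) for c in n_l.values())
    ent_h = sum(c * math.log(c / n) for c in n_h.values())
    return mutual / math.sqrt(ent_l * ent_h)


def weighted_f(precisions, recalls, weights):
    """Weighted F-measure over roles; roles with p + r == 0 contribute 0."""
    return sum(w * 2 * p * r / (p + r)
               for p, r, w in zip(precisions, recalls, weights) if p + r > 0)


same_up_to_renaming = nmi([0, 0, 1, 1], [1, 1, 0, 0])
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
f_w = weighted_f([0.8, 0.5], [0.8, 0.5], [0.75, 0.25])
print(same_up_to_renaming, independent, f_w)
```

NMI is invariant to how clusters are numbered, which is why the relabelled clustering still scores as a perfect match.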
  • 26. Experimental Results: Face naming. Face clustering: constrained vs. unconstrained; clustering number variance. [Figure: NMI score vs. cluster number (x times) for CK, KK, SSKK, SL and CP; panels: Friends, Family]
  • 27. Experimental Results: Face naming. Face recognition (naming): optimal recognition is achieved when the clustering number is about 2 times the character number. [Figure: A purifying rate, precision, recall and Fw-measure vs. cluster number (x times); panels: Friends, Family]
  • 28. Experimental Results: Face naming. Main character naming result: accuracy; robustness. [Figure: weighted F-measure vs. cluster number (x times) for the 1st to 4th main characters; panels: Friends, Family]
  • 29. Experimental Results: Face naming. Comparison with supervised methods: comparable to supervised methods; even better when the training set is limited. [Figure: weighted F-measure vs. training-test ratio for KNN, SVM and TVParser (1st, 2nd and 3rd best); panels: Friends, Family]
  • 30. Experimental Results: Scene segmentation. Baselines, scene segmentation methods (algorithms): shot similarity graph (SSG); dynamic time warping (DTW); hidden Markov model (HMM).
  • 31. Experimental Results: Scene segmentation. Criteria: $\rho = \left(\sum_{i=1}^{r} \frac{d_i}{u} \sum_{j=1}^{r} \frac{d_{ij}^2}{d_i^2}\right) \cdot \left(\sum_{j=1}^{r} \frac{d_j^*}{u} \sum_{i=1}^{r} \frac{d_{ij}^2}{d_j^{*2}}\right)$, where $d_{ij}$ is the length of the overlap between the scene segments $p_i$ and $p_j^*$, $d_i$ is the length of scene $p_i$ and $u$ is the total length of all scenes. The purity value ranges from 0 to 1; the larger the value, the closer the segmentation is to the ground truth.
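A small sketch of the purity criterion, assuming both segmentations are given as lists of `(start, duration)` pairs that cover the same range of shots. The function names and the two example segmentations are illustrative only.

```python
def overlap(a, b):
    """Number of shots shared by two segments given as (start, duration)."""
    (ta, da), (tb, db) = a, b
    return max(0, min(ta + da, tb + db) - max(ta, tb))


def one_sided(xs, ys, u):
    """Sum over x of (d_x / u) * sum over y of d_xy^2 / d_x^2."""
    return sum((dx / u) * sum(overlap((tx, dx), y) ** 2 for y in ys) / dx ** 2
               for tx, dx in xs)


def purity(found, truth):
    """Product of the two one-sided terms; equals 1 for identical inputs."""
    u = sum(d for _, d in found)
    return one_sided(found, truth, u) * one_sided(truth, found, u)


# Hypothetical segmentations of six shots into two scenes.
found = [(1, 3), (4, 3)]
truth = [(1, 2), (3, 4)]
score = purity(found, truth)
print(score)
```

Each factor penalises one kind of error: the first drops when a detected scene straddles several true scenes, the second when a true scene is split across several detected ones.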
  • 32. Experimental Results: Scene segmentation. Scene segmentation result (purity scores):
    Method     Sources (video+)   Friends        Family
    SSG        -                  0.55 ± 0.11    0.53 ± 0.07
    DTW        sub.+scr.          0.60 ± 0.13    -
    HMM        scr.               0.59 ± 0.08    0.53 ± 0.05
    TVParser   scr.               0.67 ± 0.07    0.58 ± 0.03
  • 33. Experimental Results: Scene segmentation. Scene segmentation result under various role histograms. Name histogram: the first four characters are dominant. Face histogram: more clusters are generally better. [Figure: average purity vs. name histogram dimension (2 to 11) and face histogram size (x 0.00 to x 2.50), with marginal purity curves annotated ↑0.05 (≈29%) and ↑0.12 (≈71%)]
  • 34. Conclusion. We propose a generative model to formulate story plot development in TV videos, which solves face naming and scene segmentation in a unified framework. Key novelties: unsupervised face naming through model parameter learning; globally optimal scene segmentation by hidden state inference; fast algorithms for both parameter learning and state inference. Future work: personalized applications, e.g. TV video synthesis; generic cross-media analysis and association methods.
  • 35. Q&A. Thanks!