Metadata for Motion Pictures: Media Streams

Speaker Notes
  • Researchers and developers in multimedia and interactive television are in a state of denial: without content representation there is no scaling and no control at multiple levels of granularity
  • Clip-based representation: fixes a segmentation of the video stream; separates the clip from its context of origin; encodes only one particular segmentation of the original data
  • Stream-based representation: the stream of frames is left intact; the stream has many possible segmentations via multi-layered annotations with precise time indexes (and the intersections, unions, etc. of these annotations)
  • Editing: resequencing of shots. Tell the Kuleshov example (face / bowl of soup, face / coffin, face / field of flowers); tell the greeting/agreeing meeting example
  • (SHOW VIDEO of MTL) Content VCR; visualize video structure; Videogram; browse at different time scales; combine automatic and human annotation
  • Generalization Hierarchy of Iconic Descriptors
  • Show categories; (SHOW VIDEO of IP); explain glommed icons (icon sentences)
  • (SHOW VIDEO of DW; stop before the Icon Title Editor. Discuss structure: horizontal and vertical, and actual vs. inferable time/space. STOP after the first icon is made)

1. Lecture 08: Media Streams
   SIMS 202: Information Organization and Retrieval
   Prof. Ray Larson & Prof. Marc Davis, UC Berkeley SIMS
   Tuesday and Thursday, 10:30 am - 12:00 pm, Fall 2002
   http://www.sims.berkeley.edu/academics/courses/is202/f02/

2. Lecture 08: Media Streams
   • Problem Setting
   • Current Approaches
   • Representing Media
   • New Solutions
   • Methodological Considerations
   • Future Work

3. Lecture 08: Media Streams
   • Problem Setting
   • Current Approaches
   • Representing Media
   • New Solutions
   • Methodological Considerations
   • Future Work

4. What is the Problem?
   • Today people cannot easily create, find, edit, share, and reuse media
   • Computers don't understand media content
     - Media is opaque and data rich
     - We lack structured representations
   • Without content representation (metadata), manipulating digital media will remain like word-processing with bitmaps

5. Lecture 08: Media Streams
   • Problem Setting
   • Current Approaches
   • Representing Media
   • New Solutions
   • Methodological Considerations
   • Future Work

6. The Search for Solutions
   • Current approaches to creating metadata don't work
     - Signal-based analysis
     - Keywords
     - Natural language
   • Need a standardized metadata framework
     - Designed for video and rich media data
     - Human- and machine-readable and writable
     - Standardized and scalable
     - Integrated into media capture, archiving, editing, distribution, and reuse

7. Signal-Based Parsing
   • Practical problem
     - Parsing unstructured, unknown video is very, very hard
   • Theoretical problem
     - Mismatch between percepts and concepts

8. Why Keywords Don't Work
   • Are not a semantic representation
   • Do not describe relations between descriptors
   • Do not describe temporal structure
   • Do not converge
   • Do not scale

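The point about relations can be made concrete in a few lines: a flat keyword set collapses who does what to whom, while a relational annotation keeps roles apart. A minimal sketch with hypothetical data, not the Media Streams representation itself:

```python
# Flat keywords: both shots collapse to the same unordered set,
# so "dog bites man" and "man bites dog" are indistinguishable.
shot_a_keywords = {"dog", "man", "bite"}
shot_b_keywords = {"man", "dog", "bite"}
assert shot_a_keywords == shot_b_keywords  # relations are lost

# Relational annotation: agent/action/patient roles keep the two apart.
shot_a = {"agent": "dog", "action": "bite", "patient": "man"}
shot_b = {"agent": "man", "action": "bite", "patient": "dog"}
assert shot_a != shot_b  # who-does-what-to-whom is preserved
```
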
9. Natural Language vs. Visual Language
   Jack, an adult male police officer, while walking to the left, starts waving with his left arm, and then has a puzzled look on his face as he turns his head to the right; he then drops his facial expression and stops turning his head, immediately looks up, and then stops looking up after he stops waving but before he stops walking.

10. Natural Language vs. Visual Language
   Jack, an adult male police officer, while walking to the left, starts waving with his left arm, and then has a puzzled look on his face as he turns his head to the right; he then drops his facial expression and stops turning his head, immediately looks up, and then stops looking up after he stops waving but before he stops walking.

11. Notation for Time-Based Media: Music

12. Visual Language Advantages
   • A language designed as an accurate and readable representation of time-based media
     - For video, especially important for actions, expressions, and spatial relations
   • Enables a Gestalt view and quick recognition of descriptors due to designed visual similarities
   • Supports global use of annotations

13. Lecture 08: Media Streams
   • Problem Setting
   • Current Approaches
   • Representing Media
   • New Solutions
   • Methodological Considerations
   • Future Work

14. Representing Video
   • Streams vs. Clips
   • Video syntax and semantics
   • Ontological issues in video representation

15. Video is Temporal

16. Streams vs. Clips

17. Stream-Based Representation
   • Makes annotation pay off
     - The richer the annotation, the more numerous the possible segmentations of the video stream
   • Clips
     - Change from being fixed segmentations of the video stream to being the results of retrieval queries based on annotations of the video stream
   • Annotations
     - Create representations which make clips, not representations of clips

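What "clips as query results" means in practice: annotations are layered intervals over one intact stream, and a clip is computed on demand from them, for example by intersecting layers. A minimal sketch, with hypothetical layer names and times:

```python
# Each annotation layer is a list of (start, end, label) intervals,
# in seconds, over the same intact video stream.
layers = {
    "character": [(0.0, 42.0, "Jack"), (55.0, 90.0, "Jack")],
    "action":    [(10.0, 30.0, "waving"), (60.0, 75.0, "waving")],
}

def intersect(a, b):
    """Pairwise intersection of two interval lists -> overlapping ranges."""
    out = []
    for s1, e1, _ in a:
        for s2, e2, _ in b:
            lo, hi = max(s1, s2), min(e1, e2)
            if lo < hi:
                out.append((lo, hi))
    return out

# A "clip" is not stored; it is computed from the annotations on demand.
jack_waving = intersect(layers["character"], layers["action"])
print(jack_waving)  # [(10.0, 30.0), (60.0, 75.0)]
```
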
18. Video Syntax and Semantics
   • The Kuleshov Effect
   • Video has a dual semantics
     - Sequence-independent invariant semantics of shots
     - Sequence-dependent variable semantics of shots

19. Ontological Issues for Video
   • Video plays with rules for identity and continuity
     - Space
     - Time
     - Character
     - Action

20. Space and Time: Actual vs. Inferable
   • Actual recorded space and time
     - GPS
     - Studio space and time
   • Inferable space and time
     - Establishing shots
     - Cues and clues

21. Time: Temporal Durations
   • Story (Fabula) Duration
     - Example: brushing teeth in the story world (5 minutes)
   • Plot (Syuzhet) Duration
     - Example: brushing teeth in the plot world (1 minute: 6 steps of 10 seconds each)
   • Screen Duration
     - Example: brushing teeth on screen (10 seconds: 2 shots of 5 seconds each)

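A small worked example, using the slide's numbers, makes the three durations and their relationships explicit (the tooth-brushing values are the slide's hypothetical case):

```python
# Story (fabula) time: how long the event takes in the story world.
story_duration_s = 5 * 60   # brushing teeth: 5 minutes

# Plot (syuzhet) time: how much of it the narration presents.
plot_duration_s = 6 * 10    # 6 steps of 10 seconds = 1 minute

# Screen time: how long it actually runs on screen.
screen_duration_s = 2 * 5   # 2 shots of 5 seconds = 10 seconds

assert (story_duration_s, plot_duration_s, screen_duration_s) == (300, 60, 10)
```
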
22. Character and Continuity
   • Identity of character is constructed through
     - Continuity of actor
     - Continuity of role
   • Alternative continuities
     - Continuity of actor only
     - Continuity of role only

23. Representing Action
   • Physically-based description for sequence-independent action semantics
     - Abstract vs. conventionalized descriptions
     - Temporally and spatially decomposable actions and subactions
   • Issues in describing sequence-dependent action semantics
     - Mental states (emotions vs. expressions)
     - Cultural differences (e.g., bowing vs. greeting)

24. “Cinematic” Actions
   • Cinematic actions support the basic narrative structure of cinema
     - Reactions/proactions: nodding, screaming, laughing, etc.
     - Focus of attention: gazing, head-turning, pointing, etc.
     - Locomotion: walking, running, etc.
   • Cinematic actions can occur
     - Within the frame/shot boundary
     - Across the frame boundary
     - Across shot boundaries

25. Lecture 08: Media Streams
   • Problem Setting
   • Current Approaches
   • Representing Media
   • New Solutions
   • Methodological Considerations
   • Future Work

26. New Solutions for Creating Metadata
   • After capture
   • During capture

27. After Capture: Media Streams

28. Media Streams Features
   • Key features
     - Stream-based representation (better segmentation)
     - Semantic indexing (what things are similar to)
     - Relational indexing (who is doing what to whom)
     - Temporal indexing (when things happen)
     - Iconic interface (designed visual language)
     - Universal annotation (standardized markup schema)
   • Key benefits
     - More accurate annotation and retrieval
     - Global usability and standardization
     - Reuse of rich media according to content and structure

29. Media Streams GUI Components
   • Media Time Line
   • Icon Space
     - Icon Workshop
     - Icon Palette

30. Media Time Line
   • Visualize video at multiple time scales
   • Write and read multi-layered iconic annotations
   • One interface for annotation, query, and composition

31. Media Time Line

32. Icon Space
   • Icon Workshop
     - Utilize categories of video representation
     - Create iconic descriptors by compounding iconic primitives
     - Extend the set of iconic descriptors
   • Icon Palette
     - Dynamically group related sets of iconic descriptors
     - Reuse the descriptive effort of others
     - View and use query results

33. Icon Space

34. Icon Space: Icon Workshop
   • General to specific (horizontal)
     - Cascading hierarchy of icons with increasing specificity on subordinate levels
   • Combinatorial (vertical)
     - Compounding of hierarchically organized icons across multiple axes of description

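The two axes can be read as two operations on descriptors: descending a generalization hierarchy makes an icon more specific, and compounding combines primitives from different axes of description into one descriptor. A minimal sketch, with a hypothetical fragment of such a hierarchy:

```python
# General-to-specific (horizontal): each icon refines its parent.
hierarchy = {
    "person": ["adult", "child"],
    "adult": ["adult male", "adult female"],
}

def specializations(icon):
    """All descriptors reachable below `icon` in the hierarchy."""
    children = hierarchy.get(icon, [])
    result = list(children)
    for child in children:
        result.extend(specializations(child))
    return result

# Combinatorial (vertical): compound primitives from different
# axes of description into one descriptor.
def compound(*primitives):
    return " + ".join(primitives)

print(specializations("person"))
# ['adult', 'child', 'adult male', 'adult female']
print(compound("adult male", "police officer", "walking left"))
# adult male + police officer + walking left
```
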
35. Icon Space: Icon Workshop Detail

36. Icon Space: Icon Palette
   • Dynamically group related sets of iconic descriptors
   • Collect icon sentences
   • Reuse the descriptive effort of others

37. Icon Space: Icon Palette Detail

38. Video Retrieval in Media Streams
   • Same interface for annotation and retrieval
   • Assembles responses to queries as well as finds them
   • Query responses use semantics to degrade gracefully

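One way to read "degrade gracefully": if a query descriptor has no exact match, fall back to annotations on more general descriptors in the hierarchy, returning semantically close results instead of nothing. A hypothetical sketch of that fallback (all names and data invented for illustration):

```python
# Map each descriptor to its generalization (hypothetical hierarchy).
parent = {"adult male": "adult", "adult": "person"}

# Hypothetical index: descriptor -> annotated shots.
annotated = {"adult": ["shot_12", "shot_31"], "person": ["shot_02"]}

def retrieve(query):
    """Exact match first; otherwise climb to more general descriptors."""
    while query is not None:
        if query in annotated:
            return query, annotated[query]
        query = parent.get(query)  # generalize and retry
    return None, []

print(retrieve("adult male"))
# ('adult', ['shot_12', 'shot_31'])
# a more general but semantically close answer instead of an empty result
```
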
39. Media Streams Technologies
   • Minimal video representation distinguishing syntax and semantics
   • Iconic visual language for annotating and retrieving video content
   • Retrieval-by-composition methods for repurposing video

40. New Solutions for Creating Metadata
   • After capture
   • During capture

41. Creating Metadata During Capture
   • Current capture paradigm: multiple captures to get 1 good capture
   • New capture paradigm: 1 good capture drives multiple uses

42. Active Capture
   • Active engagement and communication among the capture device, agent(s), and the environment
   • Re-envision capture as a control system with feedback
   • Use multiple data sources and communication to simplify the capture scenario
   • Use HCI to support “human-in-the-loop” algorithms for computer vision and audition

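Seen as a control system, Active Capture is a loop: capture a take, evaluate it against the goal, direct the human subject, and retry. The sketch below shows the loop shape only; the sensor, detector, and prompt callables are hypothetical stand-ins for real capture hardware, vision/audition analysis, and HCI direction:

```python
def active_capture(sensor, detector, prompt, max_takes=5):
    """Feedback loop: capture, evaluate, direct the subject, retry.

    sensor()         -> capture one take (returns frames)
    detector(frames) -> (is_good, problem), e.g. (False, "face off-center")
    prompt(problem)  -> direct the human, e.g. "please look at the camera"
    """
    for _ in range(max_takes):
        frames = sensor()                    # capture
        is_good, problem = detector(frames)  # computer vision/audition
        if is_good:
            return frames                    # 1 good capture, many uses
        prompt(problem)                      # HCI: human-in-the-loop fix
    return None                              # give up after max_takes
```
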
43. Active Capture
   [Diagram: Active Capture combines Capture, Processing, and Interaction, drawing on Computer Vision, HCI, and Direction/Cinematography]

44. Automated Capture: Good Capture

45. Automated Capture: Error Handling

46. Evolution of Media Production
   • Customized production
     - Skilled creation of one media product
   • Mass production
     - Automatic replication of one media product
   • Mass customization
     - Skilled creation of adaptive media templates
     - Automatic production of customized media

47. Central Idea: Movies as Programs
   • Movies change from being static data to programs
   • Shots are inputs to a program that computes new media based on content representation and functional dependency (US Patents 6,243,087 & 5,969,716)
   [Diagram: media assets pass through Parsers into Content Representations; a Producer computes new Media from them]

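Taken literally, "movies as programs" makes an edit a function from annotated shots to a new sequence. A minimal sketch of the Parser/Producer split, with hypothetical shot records standing in for the content representation:

```python
# Hypothetical shot records: raw media plus content representation.
shots = [
    {"file": "a.mov", "character": "Jack", "action": "greeting"},
    {"file": "b.mov", "character": "Mary", "action": "greeting"},
    {"file": "c.mov", "character": "Jack", "action": "walking"},
]

def produce_greeting_scene(assets):
    """Producer: compute a new sequence from the content representation,
    not from fixed clip boundaries."""
    greetings = [s for s in assets if s["action"] == "greeting"]
    return [s["file"] for s in greetings]  # a new edit, computed

print(produce_greeting_scene(shots))  # ['a.mov', 'b.mov']
```
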
48. Jim Lanahan in an MCI Ad

49. Jim Lanahan in an @Home Banner

50. Automated Media Production Process
   1. Automated capture
   2. Annotation and retrieval: annotation of media assets into a reusable online asset database; asset retrieval and reuse
   3. Automatic editing: adaptive media engine
   4. Personalized delivery: web integration and streaming media services (Flash Generator, WAP, HTML, email, print/physical media)

51. Proposed Technology Architecture
   [Diagram: Analysis, Interaction, Adaptive Media, Annotation and Retrieval (MPEG-7), and Delivery engines over a media processing layer and DB, on top of OS services: media capture, file, AV out, network, and device control]

52. Lecture 08: Media Streams
   • Problem Setting
   • Current Approaches
   • Representing Media
   • New Solutions
   • Methodological Considerations
   • Future Work

53. Non-Technical Challenges
   • Standardization of media metadata (MPEG-7)
   • Broadband infrastructure and deployment
   • Intellectual property and economic models for sharing and reuse of media assets

54. Technical Research Challenges
   • Develop an end-to-end metadata system for automated media capture, processing, management, and reuse
   • Creating metadata
     - Represent action sequences and higher-level narrative structures
     - Integrate legacy metadata (keywords, natural language)
     - Gather more and better metadata at the point of capture (develop metadata cameras)
     - Develop “human-in-the-loop” indexing algorithms and interfaces
   • Using metadata
     - Develop media components (MediaLego)
     - Integrate linguistic and other query interfaces

55. For More Info
   • Marc Davis web site: www.sims.berkeley.edu/~marc
   • Spring 2003 course on “Multimedia Information” at SIMS
   • URAP and GSR positions
   • TidalWave II “New Media” program

56. Next Time
   • Metadata for Motion Pictures: MPEG-7 (MED)
   • Readings for next time (in Protected)
     - “MPEG-7: The Generic Multimedia Content Description Interface, Part 1” (J. M. Martinez, R. Koenen, F. Pereira)
     - “MPEG-7: Overview of MPEG-7 Description Tools, Part 2” (J. Martinez)

57. Homework (!)
   • Assignment 4: Revision of Photo Metadata Design and Project Presentation
   • Due by Monday, September 23
     - Completed (revised) photo classifications and annotated photos: [groupname]_classification.xls and [groupname]_photos.xls files
   • Due by Thursday, September 26
     - Group presentation
       · 2 minutes: presentation of the application idea
       · 6 minutes: presentation of the classification and photo browser
       · 2 minutes: residual time for completing explanations and Q&A
     - Photo Browser Page (will be sent to you)
