Computer Vision for
PS3 Games
Richard Marks
SCE US R&D
My Background
 School
 Avionics, robotics, control theory
 ARL (Aerospace Robotics Lab)
 PlayStation R&D
 Created computer vision library for PS2
 Video filters; format conversion; logical, arithmetic, and
morphological operations; matching; moments
 Highly optimized using SIMD multimedia instructions
 Extensive pre-fetching, pipe balancing
 Provided library source and lots of sample code
 Worked with London studio to make game prototypes
 Specified EyeToy hardware and wrote initial driver
 Cell experience starting 2004
Computer Vision for PS3 Games
 PS3 Introduction
 Video input to PS3
 PLAYSTATION Eye
 Released games
 PS3 Vision SDK
 Current research topics
 Head tracking
 Color tracking
 Sketch analysis
PS3 Introduction
 Hardware
 1 PPE, 7 SPE (6 available to game)
 256 MB main memory, 256 MB graphics memory
 RSX graphics chip (Nvidia)
 USB 2.0
 Considerations
 RSX, PPE are heavily utilized by games
 Main memory is precious
 SPEs are under-utilized
 SPURS tasks or jobs are encouraged
PS3 live video input
 USB 2.0
 libcamera
 part of PS3 system software
 Implements simple driver model
 Open, Start, Read, Stop, Close
 Set/Get Attribute (e.g. gain, exposure, red/blue/green
gains, AGC flag, mirror flag, LED flag, etc.)
 Read copies most recent complete frame from
system memory to application memory
 Supports UVC, PS Eye, EyeToy
 Cameras are asynchronous to PS3 display
PLAYSTATION Eye (beyond EyeToy)
 Uncompressed Video
 No artifacts
 Software demosaicking
 Low CPU overhead
 Increased Sensitivity
 Low-light/No-light operation
 Shortened exposure times
 Lower visual noise
 Faster Frame Rate
 Quicker response
 Smaller tracking search regions
 More temporal information
 Dual Fixed Field of View
 Standard-angle, proven by EyeToy
 Wide-angle for full-body apps
 No focus adjustment needed
 Higher Resolution
 Lower pixelation effects
 Improved definition
 Better statistical behavior
 Improved voice input
 Unencumbered speech recognition
 Audio chat in noisy environment
 Echo location tracking
PS Eye Specification
 Cost similar to EyeToy
 56/75 degree dual-FOV lens
 <1% distortion, fixed focus (0.5m to 10m)
 ¼” CMOS sensor
 6 micron pixels
 640x480 Bayer at 60 frames/sec, 320x240 at 120 frames/sec
 640x480 YUV422 at 30 frames/sec, 320x240 at 60 frames/sec
 10-bit dynamic range, or 8-bit with gamma curve
 rolling shutter (no frame buffer)
 USB2/compression chip
 bulk transfer (low CPU overhead)
 optional JPEG compression
 Omni-directional 4-microphone linear array
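As a rough check on the video modes listed above, the raw payload rates can be computed directly. This is a minimal sketch that assumes 1 byte per pixel for 8-bit Bayer data and 2 bytes per pixel for YUV422; the function name is illustrative and USB protocol overhead is ignored.

```python
def bytes_per_second(width, height, bytes_per_pixel, fps):
    """Raw video payload rate, ignoring USB protocol overhead."""
    return width * height * bytes_per_pixel * fps

# Assumption: 8-bit Bayer is 1 byte/pixel, YUV422 averages 2 bytes/pixel.
modes = {
    "640x480 Bayer @ 60": bytes_per_second(640, 480, 1, 60),
    "320x240 Bayer @ 120": bytes_per_second(320, 240, 1, 120),
    "640x480 YUV422 @ 30": bytes_per_second(640, 480, 2, 30),
    "320x240 YUV422 @ 60": bytes_per_second(320, 240, 2, 60),
}

for name, rate in modes.items():
    print(f"{name}: {rate / 1e6:.2f} MB/s")
```

Under these assumptions the Bayer modes carry the same payload as the YUV422 modes at half the frame rate, which is consistent with the frame rates listed for each format.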
Typical PS Eye video processing
1. Read 640x480 Bayer pattern, 60 frames/sec
2. Stuck pixel removal (calibrated or uncalibrated)
3. Bayer to RGB (demosaicking)
4. RGB to RGB’ (color correction)
5. RGB’ to YUV (color space conversion)
 (steps 2-5 use <2ms on 1 SPE)
-or-
 Read 640x480 YUV422, 30 frames/sec
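Steps 2-5 above can be sketched in plain Python. This is only an illustration of the data flow, not the SPE implementation: it assumes an RGGB mosaic layout, uses a half-resolution demosaic (one RGB pixel per 2x2 cell), a median test on same-color neighbors for uncalibrated stuck-pixel removal, a caller-supplied 3x3 color correction matrix, and full-range BT.601 coefficients for the YUV conversion. All function names are hypothetical.

```python
def remove_stuck_pixels(bayer, threshold=64):
    """Replace pixels that differ strongly from the median of same-color neighbors."""
    h, w = len(bayer), len(bayer[0])
    out = [row[:] for row in bayer]
    for y in range(h):
        for x in range(w):
            # Same-color neighbors in a Bayer mosaic sit two pixels away.
            nbrs = sorted(bayer[ny][nx]
                          for ny, nx in ((y - 2, x), (y + 2, x), (y, x - 2), (y, x + 2))
                          if 0 <= ny < h and 0 <= nx < w)
            if len(nbrs) < 3:
                continue  # too few neighbors to judge (image corners)
            median = nbrs[len(nbrs) // 2]
            if abs(bayer[y][x] - median) > threshold:
                out[y][x] = median
    return out

def demosaic_rggb_half(bayer):
    """Collapse each 2x2 RGGB cell into one RGB pixel (half resolution)."""
    out = []
    for y in range(0, len(bayer) - 1, 2):
        row = []
        for x in range(0, len(bayer[0]) - 1, 2):
            r = bayer[y][x]
            g = (bayer[y][x + 1] + bayer[y + 1][x]) / 2
            b = bayer[y + 1][x + 1]
            row.append((r, g, b))
        out.append(row)
    return out

def color_correct(rgb, m):
    """Apply a 3x3 color correction matrix and clamp to [0, 255]."""
    r, g, b = rgb
    return tuple(min(255.0, max(0.0, m[i][0] * r + m[i][1] * g + m[i][2] * b))
                 for i in range(3))

def rgb_to_yuv(rgb):
    """Full-range BT.601 conversion (an assumption; the slides do not specify)."""
    r, g, b = rgb
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 128 + (b - y) * 0.5 / (1 - 0.114)
    v = 128 + (r - y) * 0.5 / (1 - 0.299)
    return (y, u, v)

def process(bayer, ccm):
    """Run steps 2-5 on one Bayer frame given as a list of rows."""
    clean = remove_stuck_pixels(bayer)
    rgb = demosaic_rggb_half(clean)
    return [[rgb_to_yuv(color_correct(p, ccm)) for p in row] for row in rgb]
```

A production demosaic would interpolate to full resolution; the half-resolution version is used here only to keep the sketch short.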
Eye of Judgment
 Augmented reality card game (video)
 Uses modified version of Sony Cybercode
 Green markers provide card detection and
homography transform
 2-D barcode provides card identification
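To illustrate how detected marker positions can yield a homography, here is a minimal sketch that solves for the 3x3 transform from four point correspondences. It assumes exactly four markers with known positions on the card, which the slides do not state, and it is not the Cybercode implementation. The function names are hypothetical.

```python
def solve(a, b):
    """Solve a linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def homography(src, dst):
    """3x3 homography mapping four src points (x, y) to four dst points (u, v)."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]

def apply_homography(hm, x, y):
    """Map a point through the homography, including the perspective divide."""
    w = hm[2][0] * x + hm[2][1] * y + hm[2][2]
    u = (hm[0][0] * x + hm[0][1] * y + hm[0][2]) / w
    v = (hm[1][0] * x + hm[1][1] * y + hm[1][2]) / w
    return (u, v)

# Card corners in card coordinates, and where the markers were seen in the image.
card = [(0, 0), (1, 0), (1, 1), (0, 1)]
seen = [(10, 20), (30, 20), (30, 60), (10, 60)]
H = homography(card, seen)
print(apply_homography(H, 0.5, 0.5))
```

With the transform in hand, any point on the card (for example a barcode cell) can be located in the camera image, or the inverse mapping can be used to rectify the card.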
London Studio titles
 EyeCreate (free)
 Movie editing with effects
 Aquatopia, Operation Creature Feature,
Mesmerize, Tori-Emaki, Towers of Topoq
 Motion detection
 Feature tracking (similar to Lucas-Kanade, video)
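The slides do not say how motion detection is implemented in these titles. Frame differencing is one common approach, sketched here on grayscale frames stored as lists of rows; the function names and threshold are assumptions.

```python
def motion_mask(prev, curr, threshold=20):
    """Mark pixels whose brightness changed by more than threshold between frames."""
    return [[abs(c - p) > threshold for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]

def motion_in_region(mask, x0, y0, x1, y1):
    """Fraction of moving pixels in the rectangle [x0, x1) x [y0, y1)."""
    total = (x1 - x0) * (y1 - y0)
    moving = sum(mask[y][x] for y in range(y0, y1) for x in range(x0, x1))
    return moving / total
```

A game could trigger an on-screen button when the fraction inside its rectangle exceeds some level, for example when a player waves a hand over it.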
Other PS3 apps that use a camera
 PS3 built-in A/V chat
 Burnout Paradise
 Snapshots at significant game moments (similar to
roller-coaster photos)
 Singstar
 Make your own music video
PS3 Vision SDK
[Diagram: PS3 Vision SDK components across PPE and SPE. Labels: libcamera, sys_audio, libvision, Game, vision tasks, vision jobs, libspurs]
PS3 Vision SDK (SPE)
 SPE libvision function library
 Video filters; format conversion; logical, arithmetic, and morphological
operations; matching; moments
 Completely internal to SPE (no DMA, etc)
 cellSlice (8, 16, RGBA, Float)
 Note: some tasks cannot break images into slices (e.g. rotate)
 Optimized (SIMD, unrolled, pipelined), no assembly (only intrinsics)
 SPE tasks
 Typically call SPE libvision functions for cellSlices
 Handle DMA, double-buffering
 VisionTaskInfo, cellImage
 SPE jobs
 Call SPE libvision functions
 DMA set up in advance for slices, handled by SPURS
 Some things are hard to break up into jobs
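The slice idea can be illustrated without any Cell-specific code. In this minimal sketch an image is a list of rows, a slice is a run of consecutive rows, and a per-pixel kernel is applied one slice at a time. The names are hypothetical and do not reflect the cellSlice API; DMA and double-buffering, which the SPE tasks handle, are not modeled.

```python
def slices(image, rows_per_slice):
    """Yield (start_row, rows) for consecutive horizontal slices of the image."""
    for start in range(0, len(image), rows_per_slice):
        yield start, image[start:start + rows_per_slice]

def threshold_slice(rows, level):
    """Per-pixel kernel: needs no data outside the slice it is given."""
    return [[255 if p >= level else 0 for p in row] for row in rows]

def run_sliced(image, rows_per_slice, kernel, *args):
    """Apply a slice kernel to every slice and stitch the results together."""
    out = []
    for _, rows in slices(image, rows_per_slice):
        out.extend(kernel(rows, *args))
    return out
```

A per-pixel operation gives the same result sliced or unsliced. An operation such as rotation reads source pixels from anywhere in the image, so it cannot be expressed as an independent kernel over row slices in this way.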
PS3 Vision SDK (PPE)
 PPE libvision
 PPE versions of all SPE libvision functions using
#include <spu2vmx.h>
 cellImage
 cellImage8, cellImage16, cellImageRGBA, cellImageFloat
 VisionTaskSet
 Task synchronization
 Run-time reloadable SPE code
 Easy SPE task execution using similar calling model to PPE
PS3 Vision SDK
 Sample code
 Mostly use 1 SPE
 RSX shader renders YUV directly and boosts
saturation
 Easy to switch between SPE and PPE execution
 Not yet released to 3rd
party game developers
 Maintenance concerns
 Unhappy with design
Head Tracking
 Face detector from Sony Corporation
 Detects multiple faces at various scales, rotations
 Robust, but not smooth
 Face tracking
 Based on template (patch) matching
 Correlation of images filtered with signum of Laplacian of
Gaussian (sLoG)
 16x16 patches for 320x240 video
 Multiple templates allow different rotations and scales
 Uses motion patch tracking if template match fails
 Face detector directs face tracker search area
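The sLoG template matching described above can be sketched as follows. This is an illustration under stated assumptions, not the tracker itself: it uses a common 5x5 integer approximation of the Laplacian of Gaussian, keeps only the sign of the response, and scores a template by summing the products of signs over a search region. The demo uses an 8x8 patch on synthetic data rather than 16x16 patches on 320x240 video, and all names are hypothetical.

```python
import random

# A common 5x5 integer approximation of the (negated) Laplacian of Gaussian.
LOG5 = [
    [0, 0, -1, 0, 0],
    [0, -1, -2, -1, 0],
    [-1, -2, 16, -2, -1],
    [0, -1, -2, -1, 0],
    [0, 0, -1, 0, 0],
]

def slog(image):
    """Sign (+1, 0, -1) of the LoG response; a 2-pixel border is left at 0."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(2, h - 2):
        for x in range(2, w - 2):
            acc = sum(LOG5[j][i] * image[y + j - 2][x + i - 2]
                      for j in range(5) for i in range(5))
            out[y][x] = (acc > 0) - (acc < 0)
    return out

def best_match(signs, template, search):
    """Return (score, x, y) of the best template position.

    search = (x0, y0, x1, y1) bounds the candidate top-left corners.
    """
    x0, y0, x1, y1 = search
    th, tw = len(template), len(template[0])
    best = None
    for y in range(y0, y1):
        for x in range(x0, x1):
            score = sum(template[j][i] * signs[y + j][x + i]
                        for j in range(th) for i in range(tw))
            if best is None or score > best[0]:
                best = (score, x, y)
    return best

# Demo on synthetic data: cut a template from a known position and find it again.
rng = random.Random(0)
frame = [[rng.randrange(256) for _ in range(24)] for _ in range(24)]
signs = slog(frame)
template = [row[7:15] for row in signs[9:17]]
score, x, y = best_match(signs, template, (2, 2, 15, 15))
print(score, x, y)
```

Because only signs are compared, the score is insensitive to overall brightness and contrast changes. In the scheme described on the slide, the face detector would supply the search rectangle.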
Color tracking
 Segmentation every frame
 Not really tracking at all
 Chrominance smoothing, thresholding
 Repeated windowed centroid/area calculation to
reject noise
 X, Y from centroid, Z from area (for sphere)
 Second moments provide principal axes
 In bad lighting, principal moment better for Z than area
 Video or demo
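The segmentation steps above can be sketched as follows. This is a minimal illustration, assuming U and V planes stored as lists of rows, a pinhole camera with focal length in pixels, and a sphere of known radius whose projected disc has area pi * (f * R / Z)^2. Chrominance smoothing is omitted, and the function names, window scale, and iteration count are assumptions.

```python
import math

def chroma_mask(u_plane, v_plane, u_range, v_range):
    """Threshold chrominance: True where both U and V fall in the target ranges."""
    return [[u_range[0] <= u <= u_range[1] and v_range[0] <= v <= v_range[1]
             for u, v in zip(urow, vrow)]
            for urow, vrow in zip(u_plane, v_plane)]

def moments(mask, window=None):
    """Area and centroid of True pixels inside window (x0, y0, x1, y1)."""
    h, w = len(mask), len(mask[0])
    x0, y0, x1, y1 = window or (0, 0, w, h)
    area = sx = sy = 0
    for y in range(max(0, y0), min(h, y1)):
        for x in range(max(0, x0), min(w, x1)):
            if mask[y][x]:
                area += 1
                sx += x
                sy += y
    if area == 0:
        return 0, None, None
    return area, sx / area, sy / area

def windowed_centroid(mask, iterations=3, scale=1.5):
    """Repeat the centroid/area estimate in a shrinking window to reject noise."""
    area, cx, cy = moments(mask)
    for _ in range(iterations):
        if area == 0:
            break
        half = scale * math.sqrt(area)  # window half-size tied to blob size
        win = (int(cx - half), int(cy - half), int(cx + half) + 1, int(cy + half) + 1)
        area, cx, cy = moments(mask, win)
    return area, cx, cy

def depth_from_area(area, radius, focal_px):
    """Z for a sphere of known radius, from its projected disc area in pixels."""
    return focal_px * radius * math.sqrt(math.pi / area)

def principal_axis(mask):
    """Orientation and principal second moments of a non-empty mask."""
    area, cx, cy = moments(mask)
    mu20 = mu02 = mu11 = 0.0
    for y, row in enumerate(mask):
        for x, on in enumerate(row):
            if on:
                dx, dy = x - cx, y - cy
                mu20 += dx * dx
                mu02 += dy * dy
                mu11 += dx * dy
    mu20, mu02, mu11 = mu20 / area, mu02 / area, mu11 / area
    angle = 0.5 * math.atan2(2 * mu11, mu20 - mu02)
    common = math.sqrt((mu20 - mu02) ** 2 + 4 * mu11 ** 2)
    return angle, (mu20 + mu02 + common) / 2, (mu20 + mu02 - common) / 2
```

When part of the sphere is lost to poor lighting, the area shrinks, but the major principal moment still reflects the full extent along one axis, which may be why the slide prefers it for Z in that case.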
Got Light?

Sketch Analysis
 Image processing
 Sketches: edge detection
 Objects: background subtraction
 Segmentation (region finding)
 Find closed contours by walking around edges
 Vectorization (regions to polygons)
 Adjustable
 Texture lifted from original image
 Machination (polygon to game object)
 Physical objects
 Game-specific objects (e.g. tank, lunar lander, etc.)
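The slides do not say which vectorization method is used. Ramer-Douglas-Peucker simplification is one common way to turn a traced contour into a polygon with an adjustable level of detail, and is sketched here as an assumption. It operates on an open polyline; a closed contour would first be split at two distant points.

```python
import math

def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length

def simplify(points, tolerance):
    """Drop points that lie within tolerance of the simplified polyline."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_line_distance(points[i], a, b)
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [a, b]
    # Keep the farthest point and simplify each half recursively.
    left = simplify(points[:index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right

contour = [(0, 0), (1, 0.1), (2, 0), (3, 0), (4, 0), (4, 1), (4, 2), (4, 3)]
print(simplify(contour, 0.5))
```

A larger tolerance gives fewer vertices, which suits a hand-drawn sketch whose wobble should not become part of the game object's outline.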
Sketch Analysis Key Factors
 Known camera (PS Eye)
 Known camera position (Eye of Judgment stand)
 Known surface (white paper)
 Good lighting situation (paper faces up at lights,
camera looking down)
 High contrast naturally provided by user
 Shadows are problematic, but addressable
 video
Questions?
 Thank you!
 richard_marks@playstation.sony.com
Fundamental Issue: Lighting
 Unknown lighting environment limits robustness
 Variable lighting leads to variable performance
 Users often do not understand “good lighting”, or
they cannot easily accomplish it
 Simple auto gain, exposure are insufficient
 Positive methods give too many false negatives
My Background
 School
 Avionics, robotics, dynamics, control theory
 Embedded systems
 ARL (Aerospace Robotics Lab)
 “Visual Sensing for Automatic Control of an Underwater Robot”
 Automatic station-keeping, mosaic creation, and navigation
 Teleos Research (acquired by Autodesk)
 Real-time optical flow and stereo
 PeopleTracker® for Canon video conferencing camera
 Semi-automatic 3d modeling from photos
