HIPI: Computer Vision at Large Scale
Chris Sweeny, Liu Liu

Intro to MapReduce
SIMD at scale
Mapper / Reducer

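To make the Mapper / Reducer roles concrete, here is a minimal word-count-style Hadoop sketch (not HIPI-specific); class names are illustrative.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: runs independently on each input split -- the "SIMD lane" over the data.
public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    for (String token : line.toString().split("\\s+")) {
      if (token.isEmpty()) continue;
      word.set(token);
      context.write(word, ONE);   // emit (key, value) pairs
    }
  }
}

// Reducer: receives all values that share a key and aggregates them.
class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  @Override
  protected void reduce(Text key, Iterable<IntWritable> counts, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable c : counts) sum += c.get();
    context.write(key, new IntWritable(sum));
  }
}
```
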
MapReduce, Main Takeaway
Data Centric, Data Centric, Data Centric!

Hadoop, a Java Implementation
An implementation of MapReduce that originated at Yahoo!
The cluster we worked on has 625.5 nodes, with a map task capacity of 2502 and a reduce task capacity of 834.

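A sketch of the driver that submits such a job to a Hadoop cluster (Hadoop 2.x-style API); the paths, job name, and reduce-task count are placeholders, bounded in practice by the cluster's task capacity.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word-count");        // job name is arbitrary
    job.setJarByClass(WordCountDriver.class);
    job.setMapperClass(TokenMapper.class);                 // mapper/reducer from the sketch above
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    job.setNumReduceTasks(8);                              // placeholder; limited by reduce task capacity
    FileInputFormat.addInputPath(job, new Path(args[0]));  // e.g. an HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```
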
Computer Vision at Scale
The “computational vision”
The sheer size of datasets keeps growing:
PCA of Natural Images (1992): 15 images, 4096 patches
High-perf Face Detection (2007): 75,000 samples
IM2GPS (2008): 6,472,304 images

HIPI Workflow

HIPI Image Bundle Setup
Moral of the story: many small files kill performance on a distributed file system.

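HIPI's own image bundle format is not reproduced here; as a hedged illustration of the same idea (pack many small images into one large container file), here is a sketch using Hadoop's stock SequenceFile, with the local directory and HDFS path taken from the command line.

```java
import java.io.File;
import java.nio.file.Files;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Packs a directory of small JPEGs into a single SequenceFile so HDFS and the
// MapReduce input format handle one big file instead of many tiny ones.
public class PackImages {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path bundle = new Path(args[1]);                 // e.g. an HDFS path like images.seq
    SequenceFile.Writer writer =
        SequenceFile.createWriter(fs, conf, bundle, Text.class, BytesWritable.class);
    try {
      File[] images = new File(args[0]).listFiles(); // local directory of small images
      if (images == null) return;
      for (File img : images) {
        byte[] bytes = Files.readAllBytes(img.toPath());
        writer.append(new Text(img.getName()), new BytesWritable(bytes));
      }
    } finally {
      writer.close();
    }
  }
}
```
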
Redo PCA on Natural Images at Scale
The first 15 principal components from 15 images (Hancock, 1992):

Redo PCA on Natural Images at Scale
Comparison: Hancock, 1992 vs. HIPI with 100, 1,000, 10,000, and 100,000 images

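For reference, the computation being scaled is ordinary patch PCA, and it parallelizes cleanly because the covariance is built from sums. A minimal sketch, with x_i the flattened patch vectors; the particular mapper/reducer split below is our illustration, not necessarily the exact job layout used.

```latex
% Patch covariance over N patches x_i:
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i,
\qquad
C = \frac{1}{N}\sum_{i=1}^{N} x_i x_i^{\top} - \mu\,\mu^{\top}

% Each mapper m emits partial sums over its own subset S_m of patches:
n^{(m)} = |S_m|, \qquad
s^{(m)} = \sum_{i \in S_m} x_i, \qquad
P^{(m)} = \sum_{i \in S_m} x_i x_i^{\top}

% The reducer combines the partial sums; the eigenvectors of C are the
% principal components ("eigen-patches") compared across dataset sizes above:
N = \sum_m n^{(m)}, \qquad
\mu = \frac{1}{N}\sum_m s^{(m)}, \qquad
C = \frac{1}{N}\sum_m P^{(m)} - \mu\,\mu^{\top}
```
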
Optimize HIPI Performance
Culling: because decompression is costly, decompress only on demand.
A boolean cull(ImageHeader header) method enables conditional decompression.

Culling, to inspect specific camera effects
Canon Powershot S500, at 2592x1944

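A sketch of how the cull hook from the previous slide could express this filter. Only the boolean cull(ImageHeader header) signature comes from the slides; the import path, the EXIF accessor, and the EXIF key/value strings are assumptions and would need to match the actual HIPI release.

```java
// Hedged sketch: getEXIFInformation and the key/value strings are assumed, not
// taken from the HIPI API; returning true skips an image before decompression.
import hipi.image.ImageHeader; // assumed package path

public class CanonS500Culler /* extends the HIPI mapper type that exposes cull() */ {

  // Keep only Canon Powershot S500 shots at 2592x1944; everything else is culled
  // from the image header alone, so its pixel data is never decompressed.
  public boolean cull(ImageHeader header) {
    String model = header.getEXIFInformation("Model");         // assumed accessor
    String width = header.getEXIFInformation("Image Width");   // assumed accessor
    String height = header.getEXIFInformation("Image Height"); // assumed accessor
    boolean wanted = model != null && model.contains("S500")
        && "2592".equals(width) && "1944".equals(height);
    return !wanted;
  }
}
```
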
HIPI, a Glance at Performance Figures
An empty job (only decompressing and looping over images), 5 runs, reporting the minimum time, in seconds, lower is better:

HIPI, a Glance at Performance Figures
Im2gray job (converting images to grayscale), 5 runs, reporting the minimum time, in seconds, lower is better:

HIPI, a Glance at Performance Figures
Covariance job (compute the covariance matrix of patches, 100 patches per image), 1~3 runs*, reporting the minimum time, in seconds, lower is better:

HIPI, a Glance at Performance Figures
Culling job (decompressing all images vs. decompressing only the images we care about), 1~3 runs, reporting the minimum time, in seconds, lower is better:

Conclusion
Everything gets better at large scale.
HIPI provides an image-centric interface that performs on par with or better than the leading alternative.
The cull method provides significant improvement and convenience.
HIPI offers noticeable improvements!

Future Work
Release HIPI as an open-source project.
Work on deeper integration with Hadoop.
Make the HIPI workload more configurable.
Make the workload more balanced.
