A High-Throughput Approach to
Discovering Good Forms of Visual
Representation

David Cox
The Rowland Institute at Harvard

Nicolas Pinto, Jim DiCarlo
MIT BCS
Goals

1) “Building Brains”
Concrete example of real-world experiments
fundamentally enabled by stream processing
hardware

2) Tricks of the trade
Some high-level highlights for how we
leverage CUDA to achieve our goals
The Problem
Why is vision hard?

World is 3D, but retina is 2D

The same object can cast an infinite number
of different images onto the retina
The Problem
Object Variation

Transformation  Identity

 Bigger difference
   in pixel space
The Approach:
Reverse Engineering the Brain

    REVERSE                FORWARD

       Study                   Build
   Natural System         Artificial System
Reverse Engineering the Brain

Monkeys
Rats
Visual System

IT Neurons: Complex Stimuli
                              Desimone et al.

IT Cortex can do object recognition
                                Hung et al., 2005
Reverse Engineering the Brain

    REVERSE                FORWARD

       Study                   Build
   Natural System         Artificial System
Object Recognition

  Training Set → Representation → Classifier

  Test Example → Representation → Guess

  Testing Set:
       “Siri”        “Not Siri”
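The train → test flow above can be sketched in miniature. The mean-intensity “representation” and nearest-centroid “classifier” below are toy stand-ins chosen for illustration, not the representations or classifiers used in this work:

```python
# Minimal sketch of the recognition pipeline: extract a feature
# "representation" from training images, fit a classifier on it, and
# evaluate only on held-out test examples. The representation and the
# nearest-centroid classifier here are toy placeholders.

def represent(image):
    # Toy representation: mean intensity and intensity range.
    return (sum(image) / len(image), max(image) - min(image))

def fit_centroids(training_set):
    # training_set: list of (image, label); one centroid per label.
    sums, counts = {}, {}
    for image, label in training_set:
        f = represent(image)
        s = sums.setdefault(label, [0.0] * len(f))
        for i, v in enumerate(f):
            s[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: tuple(v / counts[lab] for v in s) for lab, s in sums.items()}

def classify(centroids, image):
    f = represent(image)
    dist = lambda c: sum((a - b) ** 2 for a, b in zip(f, c))
    return min(centroids, key=lambda lab: dist(centroids[lab]))

# Tiny synthetic "Siri" vs. "Not Siri" task: bright vs. dark images.
train = [([200, 210, 190], "Siri"), ([60, 50, 70], "Not Siri"),
         ([220, 200, 205], "Siri"), ([40, 55, 45], "Not Siri")]
test = [([215, 195, 208], "Siri"), ([50, 60, 40], "Not Siri")]

model = fit_centroids(train)
guesses = [classify(model, img) for img, _ in test]
```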
How are things done normally?

  Usual Formula:

  1) One grad student
  2) One Model (size limited by runtime)
  3) Performance numbers on a few standard test sets
  4) yay. we. rock.
  5) One Ph.D.
Why is this not optimal?

• Lots of parameters – can’t explore easily
• Big models are paralyzingly slow to run
• Advice from my friend:
  “Don’t run anything that takes longer than a
  week to complete, because it will just crash
  halfway through anyways (or you’ll discover a
  bug) and you’ll never finish your Ph.D.”
Doing things a little bit differently

  1) One grad student
  2) One Hundreds of Thousands of BIG Models
  3) Performance numbers on a few standard test sets*
  4) yay. we. rock.
  5) One Ph.D.?
High-Throughput Screening
Pipeline: Biology

“Plate” a Diversity   Allow them   Apply Challenge
  of Organisms          to grow

        Collect Surviving   Study / Repeat
            Colonies
Pipeline: Biology-Inspired Vision

   Generate         Unsupervised          Test with
Random Models      Learning (Video)    “screening” task

        Skim off best    Validate on
           models        other tasks
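The screen-and-skim loop above can be sketched as a random search over model configurations. The parameter names, ranges, and the scoring function below are hypothetical placeholders standing in for the real model space and screening task:

```python
import random

# Sketch of high-throughput screening: sample many random model
# configurations, score each on a quick screening task, and skim off
# the best few for validation. PARAM_SPACE and screen_score are
# illustrative placeholders, not the actual model space.

PARAM_SPACE = {
    "n_filters":   [16, 32, 64, 128, 256],
    "kernel_size": [3, 5, 7, 9],
    "threshold":   [0.0, 0.1, 0.5, 1.0],
    "norm_radius": [1, 3, 5],
}

def sample_model(rng):
    return {k: rng.choice(v) for k, v in PARAM_SPACE.items()}

def screen_score(model, rng):
    # Placeholder for "train unsupervised, then test on e.g. cars vs.
    # planes": here just a noisy function of the parameters.
    return model["n_filters"] * 0.1 - model["threshold"] + rng.random()

def high_throughput_screen(n_models=1000, keep=5, seed=0):
    rng = random.Random(seed)
    candidates = [sample_model(rng) for _ in range(n_models)]
    scored = sorted(candidates, key=lambda m: screen_score(m, rng),
                    reverse=True)
    return scored[:keep]   # skim off the best models for validation

best = high_throughput_screen()
```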
Need to break all of the implicit rules

   1. Test tens to hundreds of thousands
      of instantiations of biologically-inspired
      hierarchical models
   2. Use more realistic inputs (e.g. large
      quantities of video)
   3. Test models which begin to approach
      the scale of natural systems
Massive Computation

If you work for it, a multi-TeraFLOPS
cluster is doable for a modest price
Long, winding road of stream processing
GPU pr0n
A Match Made in Heaven
Brains are parallel, GPUs are parallel

                     ≈
   Multiple scales of parallelism:
     “Embarrassingly” parallel: video frames, regions
     Fine-grained: independent “neurons,” operating
     on overlapping inputs
A Match Made in Heaven
Images In, Images Out

                    ≈
   Image processing particularly well-suited
     Excellent arithmetic intensity: very natural to
     load image patches into shared memory
     Data: 2D / 3D locality
These things are REALLY fast

          Performance (GFLOPS)   Development Time (hours)
Matlab           0.3                      0.5
C/SSE            9.0                     10.0
PS3            110.0                     30.0
GT200          330.0                     10.0
Pipeline

   Generate         Unsupervised          Test with
Random Models      Learning (Video)    “screening” task

        Skim off best    Validate on
           models        other tasks
Read-out

[Diagram: a three-layer hierarchy (input → L1 → L2 → L3 → read-out).
Each layer has its own parameters: kernel size, number of filters,
threshold/saturation, normalization strength and neighborhood, and
learning settings (rate, trace, “temp. adv.”, “auto-reset”).]
A Broad Parametric Model

Normalize
N_i = Input_i / norm(Input_neighborhood)

Compute Filter Responses
R_i = F_i ⊗ N
R_i < thresh: R_i = thresh
R_i > sat:    R_i = sat

Determine a “Winning Filter”
R_i’ = (∑ T_k * H_k) * R_i
winner: max(R_i’)

Update Filter
F_winning = F_winning + learning_rate * N

• Optimize “Coverage”
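A minimal 1-D sketch of one step of the model above. Two assumptions to flag: the clamping is read as pinning responses into [thresh, sat], and the convolution ⊗ is stood in for by a plain dot product over the whole (tiny) input; sizes, rates, and filter values are illustrative only:

```python
import math

# One step of the broad parametric model, written out for a 1-D input:
# normalize by the local neighborhood norm, compute clamped filter
# responses, pick a winning filter, and nudge it toward the input.

def normalize(x, radius=1, eps=1e-6):
    # N_i = Input_i / norm(Input_neighborhood)
    out = []
    for i in range(len(x)):
        nb = x[max(0, i - radius): i + radius + 1]
        out.append(x[i] / (math.sqrt(sum(v * v for v in nb)) + eps))
    return out

def filter_responses(filters, n, thresh=0.0, sat=1.0):
    # Dot product standing in for R_i = F_i (*) N, then clamp:
    # R_i < thresh -> thresh,  R_i > sat -> sat
    return [min(max(sum(f * v for f, v in zip(flt, n)), thresh), sat)
            for flt in filters]

def learn_step(filters, x, lr=0.05):
    n = normalize(x)
    r = filter_responses(filters, n)
    winner = max(range(len(r)), key=r.__getitem__)
    # F_winning += learning_rate * N  (move the winner toward the input)
    filters[winner] = [f + lr * v for f, v in zip(filters[winner], n)]
    return winner, r

filters = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]   # two 3-tap filters
winner, responses = learn_step(filters, [0.9, 0.1, 0.0])
```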
Dealing with Parametric Complexity

Throwing a wide net comes with its own
challenges:

• Complexity
  Kernels must ...
Meta-programming

Leave the grunt-programming to the
computer

• Dynamically compile specialized versions of
  the sa...
conv_kernel_template.cu:

texture<float4, 1, cudaReadModeElementType> tex_float4;
__constant__ float constant[$FILTER_D][$FILTER_W][$N_FILTERS];

#def...

Instantiated versions (e.g. conv_kernel_4x4x4.cu, beginning
#include <stdio.h>) and variants (conv_kernel_beta_template.cu,
version A, ...) are generated from the template and compared.
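The `$`-parameters in the template above suggest string substitution; here is a minimal sketch of that specialization step, assuming Python's `string.Template` (the template text is abbreviated from the slide, not a full kernel):

```python
from string import Template

# Sketch of the template metaprogramming step: substitute concrete
# values for $FILTER_D, $FILTER_W, $N_FILTERS in the CUDA source,
# producing one specialized kernel per configuration; each version
# can then be compiled and benchmarked.

KERNEL_TEMPLATE = Template("""\
texture<float4, 1, cudaReadModeElementType> tex_float4;
__constant__ float constant[$FILTER_D][$FILTER_W][$N_FILTERS];
""")

def specialize(filter_d, filter_w, n_filters):
    return KERNEL_TEMPLATE.substitute(
        FILTER_D=filter_d, FILTER_W=filter_w, N_FILTERS=n_filters)

# e.g. the 4x4x4 instantiation named on the slides:
source_4x4x4 = specialize(4, 4, 4)
```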
Pipeline

   Generate         Unsupervised          Test with
Random Models      Learning (Video)    “screening” task

        Skim off best    Validate on
           models        other tasks
Unsupervised Experience
Learn Filter Kernels from Temporal Input Statistics
                Law & Order
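One way to read “learn from temporal input statistics” is a trace-style rule: a decaying trace of past activity links temporally adjacent video frames, so the filter that won recently is nudged toward the current frame too. The decay, rate, and winner-take-all choice below are assumptions for illustration, not the exact rule used in these models:

```python
# Sketch of trace-style unsupervised learning from video frames.
# "trace" and "rate" are parameters named on the model slides; the
# specific update is an illustrative assumption.

def trace_learning(frames, filters, trace_decay=0.5, rate=0.1):
    traces = [0.0] * len(filters)
    for frame in frames:
        resp = [sum(f * v for f, v in zip(flt, frame)) for flt in filters]
        # Blend current responses with the decaying trace of past activity.
        traces = [trace_decay * t + (1 - trace_decay) * r
                  for t, r in zip(traces, resp)]
        winner = max(range(len(traces)), key=traces.__getitem__)
        # Move the (trace-selected) winner toward the current frame.
        filters[winner] = [f + rate * v
                           for f, v in zip(filters[winner], frame)]
    return filters

# Two frames of a "drifting" pattern update the same filter, tying
# the transformed views together.
filters = [[1.0, 0.0], [0.0, 1.0]]
trace_learning([[0.9, 0.1], [0.6, 0.4]], filters)
```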
Screening
A quick object rec. test to find promising models

   Generate         Unsupervised          Test with
Random Models      Learning (Video)    “screening” task

            “cars”          “planes”

Parametric Variation:
no variation    more variation    lots of variation
Screening

[Histogram: screening performance of N=2500 models;
x-axis 50–100% correct, y-axis count (0–250). The best
models are skimmed off the high end of the distribution.]
Validation
See how we do on other test sets

   Generate         Unsupervised          Test with
Random Models      Learning (Video)    “screening” task

[Plots: performance on cars vs. planes (validate), boats vs.
animals, synthetic face discrimination, and webcam faces.]
Other “Petri Dishes”

        boats          cars and planes

Unsupervised         Screen Distribution        Validation
Training “World”
Screening Basically Works

[Bar plot: performance of top screened models (e.g. cars vs.
planes), normalized scale 1–3.]

Screening Works on Average

[Bar plot: normalized average performance across screened models.]
What does it all mean?

[Breakdown of winning-model parameters by stage:
Preprocessing / Normalization / Zero-mean;
Layer 1 / Normalization / Threshold; ...]
Summary

  • GPUs allow us to engage a qualitatively
    different pace of experimentation
  • CUDA provides unprecedented...
Shameless Recruitment

http://www.rowland.harvard.edu/rjf/cox

                Now with...
Acknowledgments

 Cox Lab @
 The Rowland Institute at Harvard
• Davide Zoccolan
• Nadja Oertelt

DiCarlo Lab @ MIT
• Jim DiCarlo
• ...
The Problem with Evaluation

What set to use?

Caltech 101
faces_easy   faces   car_side   airplanes   accordion

              from A. Torralba
The field does well on the ‘101

[Bar plot: published performance (%) on Caltech 101.]
The “start-simple” model

• Input divisive normalization
• Threshold, saturation
• ...
V1-like model does shockingly well

[Bar plot: performance (%) on Caltech 101, with the V1-like
model compared against published systems.]
But the model is stupid

It’s just V1.

It has no specific
mechanisms for achieving
invariance.

 The Fight is
 Rigged?
Refocusing on the real problem

  Just two classes: should be easy, no?

Parametric Variation:
no variation   more variation   lots of variation
Pulling performance back to chance

[Plot: performance (%) vs. degree of variation; as parametric
variation increases, performance falls back toward chance.]
Just one bad database?

Caltech 256
Face Databases: ORL, Yale, AR, CVL

                                See us @ ECCV 0...
Face Databases

[Results for ORL (4 and 8 training examples), AR (5 and 8),
Yale (4 and 8), CVL (2 and 3), and CAS-PEAL (4 training examples).]
Simulation




             vs.
Synthetic Variation
[Rendered example images at increasing levels of variation.]
IAP09 CUDA@MIT 6.963 - Guest Lecture: Unlocking Biologically-Inspired Computer Vision: a High-Throughput Approach (David Cox, Harvard | Jim DiCarlo, Nicolas Pinto, MIT)

Unlocking Biologically-Inspired Computer Vision: a High-Throughput Approach
David Cox, Harvard | James DiCarlo, Nicolas Pinto, MIT

Abstract:

The study of biological vision and the creation of artificial vision systems are naturally intertwined – exploration of the neuronal substrates of visual processing provides clues and inspiration for artificial systems, and artificial systems, in turn, serve as important generators of new ideas and working hypotheses. However, while systems neuroscience has provided inspiration for some of the "broad-stroke" properties of the visual system, much is still unknown. Even for those qualitative properties that most biologically-inspired models share, experimental data currently provide little constraint on their key parameters. Consequently, it is difficult to truly evaluate a set of computational ideas, since the performance of a model depends strongly on its particular instantiation – the size of the pooling kernels, the number of units per layer, exponents in normalization operations, etc.

To pave a way forward, we have developed a high-throughput approach to more expansively explore the possible range of biologically-inspired models, including models of larger, more realistic scale, leveraging recent advances in commodity stream processing hardware - particularly, high-end NVIDIA GPUs. In analogy to high-throughput screening approaches in molecular biology and genetics, we generated and trained thousands of potential network architectures and parameter instantiations, and "screened" the visual representations produced by these models using an object recognition task. From these candidate models, the most promising were selected for further analysis. We have shown that this approach can yield significant, reproducible gains in performance in basic object recognition tasks, and that it can offer insight into which computational ideas are most important for achieving this performance.

In this talk, I'll also highlight how the application of flexible programming tools, such as high-level scripting and template metaprogramming, can enable large performance gains, while managing complexity for the developer. As the scale of available computational power continues to expand, our approach holds great potential both for accelerating progress in artificial vision, and for generating new, experimentally-testable hypotheses for the study of biological vision.

More on this research:
http://pinto.scripts.mit.edu/Research/Research
http://pinto.scripts.mit.edu/Research/Monster16GPU
http://www.rowland.harvard.edu/rjf/cox/Projects/projects/computation.html

More on IAP09 CUDA@MIT 6.963 at http://sites.google.com/site/cudaiap2009 and http://pinto.scripts.mit.edu/Classes/CUDAIAP2009


IAP09 CUDA@MIT 6.963 - Guest Lecture: Unlocking Biologically-Inspired Computer Vision: a High-Throughput Approach (David Cox, Harvard | Jim DiCarlo, Nicolas Pinto, MIT)

  1. 1. A High-Throughput Approach to Discovering Good Forms of Visual Representation David Cox The Rowland Institute at Harvard Nicolas Pinto Jim DiCarlo MIT BCS The Rowland Institute at Harvard HARVARD UNIVERSITY
  2. 2. Goals
  3. 3. Goals 1) “Building Brains” Concrete example of real-world experiments fundamental enabled by stream processing hardware
  4. 4. Goals 1) “Building Brains” Concrete example of real-world experiments fundamental enabled by stream processing hardware 2) Tricks of the trade Some high-level highlights for how we leverage CUDA to achieve our goals
  5. 5. Goals 1) “Building Brains” Concrete example of real-world experiments fundamental enabled by stream processing hardware 2) Tricks of the trade Some high-level highlights for how we leverage CUDA to achieve our goals
  6. 6. The Problem Why is vision hard?
  7. 7. The Problem Why is vision hard? World is 3D, but retina is 2D
  8. 8. The Problem Why is vision hard? World is 3D, but retina is 2D The same object can cast an infinite number of dierent images onto the retina
  9. 9. The Problem Object Variation
  10. 10. The Problem Object Variation
  11. 11. The Problem Object Variation
  12. 12. The Problem Object Variation
  13. 13. Transformation Identity
  14. 14. Transformation Identity
  15. 15. Transformation Identity
  16. 16. Transformation Identity
  17. 17. Transformation Identity
  18. 18. Transformation Identity Bigger difference in pixel space
  19. 19. The Approach: Reverse Engineering the Brain REVERSE Study Natural System
  20. 20. The Approach: Reverse Engineering the Brain REVERSE FORWARD Study Build Natural System Artificial System
  21. 21. The Approach: Reverse Engineering the Brain REVERSE FORWARD Study Build Natural System Artificial System
  22. 22. Reverse Engineering the Brain
  23. 23. Monkeys
  24. 24. Monkeys
  25. 25. Rats
  26. 26. Rats
  27. 27. Visual System
  28. 28. Visual System
  29. 29. IT Neurons: Complex Stimuli Desimone et al.
  30. 30. IT Cortex can do object recognition Hung et al., 2005
  31. 31. Reverse Engineering the Brain REVERSE FORWARD Study Build Natural System Artificial System
  32. 32. Reverse Engineering the Brain REVERSE FORWARD Study Build Natural System Artificial System
  33. 33. Object Recognition Training Set
  34. 34. Object Recognition Training Set Representation
  35. 35. Object Recognition Training Set Representation Classifier
  36. 36. Object Recognition Training Set Representation Classifier
  37. 37. Object Recognition Training Set Representation Classifier Test Example Representation
  38. 38. Object Recognition Training Set Representation Classifier Test Example Representation Guess
  39. 39. Object Recognition
  40. 40. Object Recognition Training Set
  41. 41. Object Recognition Training Set
  42. 42. Object Recognition Training Set
  43. 43. Object Recognition Training Set
  44. 44. Object Recognition Training Set
  45. 45. Object Recognition Testing Set
  46. 46. Object Recognition Testing Set
  47. 47. Object Recognition Testing Set “Siri”
  48. 48. Object Recognition Testing Set “Siri”
  49. 49. Object Recognition Testing Set “Not Siri” “Siri”
  50. 50. How are things done normally?
  51. 51. How are things done normally? Usual Formula:
  52. 52. How are things done normally? Usual Formula: 1) One grad student
  53. 53. How are things done normally? Usual Formula: 1) One grad student 2) One Model (size limited by runtime)
  54. 54. How are things done normally? Usual Formula: 1) One grad student 2) One Model (size limited by runtime) 3) Performance numbers on a few standard test sets
  55. 55. How are things done normally? Usual Formula: 1) One grad student 2) One Model (size limited by runtime) 3) Performance numbers on a few standard test sets 4) yay. we. rock.
  56. 56. How are things done normally? Usual Formula: 1) One grad student 2) One Model (size limited by runtime) 3) Performance numbers on a few standard test sets 4) yay. we. rock. 5) One Ph.D.
  57. 57. Why is this not optimal?
  58. 58. Why is this not optimal? • Lots of parameters – can’t explore easily
  59. 59. Why is this not optimal? • Lots of parameters – can’t explore easily • Big models are paralyzingly slow to run
  60. 60. Why is this not optimal? • Lots of parameters – can’t explore easily • Big models are paralyzingly slow to run • Advice from my friend:
  61. 61. Why is this not optimal? • Lots of parameters – can’t explore easily • Big models are paralyzingly slow to run • Advice from my friend: “Don’t run anything that takes longer than a week to complete, because it will just crash halfway through anyways (or you’ll discover a bug) and you’ll never finish your Ph.D.”
  62. 62. Doing things a little bit dierently
  63. 63. Doing things a little bit dierently 1) One grad student
  64. 64. Doing things a little bit dierently 1) One grad student 2) One Hundreds of Thousands of BIG Models
  65. 65. Doing things a little bit dierently 1) One grad student 2) One Hundreds of Thousands of BIG Models 3) Performance numbers on a few standard test sets*
  66. 66. Doing things a little bit dierently 1) One grad student 2) One Hundreds of Thousands of BIG Models 3) Performance numbers on a few standard test sets*
  67. 67. Doing things a little bit dierently 1) One grad student 2) One Hundreds of Thousands of BIG Models 3) Performance numbers on a few standard test sets* 4) yay. we. rock.
  68. 68. Doing things a little bit dierently 1) One grad student 2) One Hundreds of Thousands of BIG Models 3) Performance numbers on a few standard test sets* 4) yay. we. rock. 5) One Ph.D.?
  69. 69. High-Throughput Screening
  70. 70. High-Throughput Screening
  71. 71. Pipeline: Biology
  72. 72. Pipeline: Biology “Plate” a Diversity of Organisms
  73. 73. Pipeline: Biology “Plate” a Diversity Allow them of Organisms to grow
  74. 74. Pipeline: Biology “Plate” a Diversity Allow them Apply Challenge of Organisms to grow
  75. 75. Pipeline: Biology “Plate” a Diversity Allow them Apply Challenge of Organisms to grow Collect Surviving Colonies
  76. 76. Pipeline: Biology “Plate” a Diversity Allow them Apply Challenge of Organisms to grow Collect Surviving Study / Repeat Colonies
  77. 77. Pipeline: Biology-Inspired Vision
  78. 78. Pipeline: Biology-Inspired Vision Generate Random Models
  79. 79. Pipeline: Biology-Inspired Vision Generate Unsupervised Random Models Learning (Video)
  80. 80. Pipeline: Biology-Inspired Vision Generate Unsupervised Test with Random Models Learning (Video) “screening” task
  81. 81. Pipeline: Biology-Inspired Vision Generate Unsupervised Test with Random Models Learning (Video) “screening” task Skim o best models
  82. 82. Pipeline: Biology-Inspired Vision Generate Unsupervised Test with Random Models Learning (Video) “screening” task Validate on other Skim o best tasks models
  83. 83. Need to break all of the implicit rules
  84. 84. Need to break all of the implicit rules 1. Test tens to hundreds of thousands of instantiations of biologically-inspired hierarchical models
  85. 85. Need to break all of the implicit rules 1. Test tens to hundreds of thousands of instantiations of biologically-inspired hierarchical models 2. Use more realistic inputs (e.g. large quantities of video)
  86. 86. Need to break all of the implicit rules 1. Test tens to hundreds of thousands of instantiations of biologically-inspired hierarchical models 2. Use more realistic inputs (e.g. large quantities of video) 3. Test models which begin to approach the scale of natural systems
  87. 87. Massive Computation
  88. 88. Massive Computation If you work for it, a multi-TeraFLOPS cluster is doable for a modest price
  89. 89. Long, winding road of stream processing
  90. 90. GPU pr0n
  91. 91. GPU pr0n
  92. 92. A Match Made in Heaven Brains are parallel, GPUs are parallel ≈
  93. 93. A Match Made in Heaven Brains are parallel, GPUs are parallel ≈ Multiple scales of parallelism:
  94. 94. A Match Made in Heaven Brains are parallel, GPUs are parallel ≈ Multiple scales of parallelism: “Embarrasingly” parallel: video frames, regions
  95. 95. A Match Made in Heaven Brains are parallel, GPUs are parallel ≈ Multiple scales of parallelism: “Embarrasingly” parallel: video frames, regions Fine-grained: independent “neurons,” operating on overlapping inputs
  96. 96. A Match Made in Heaven Images In, Images Out ≈
  97. 97. A Match Made in Heaven Images In, Images Out ≈ Image processing particularly well-suited
  98. 98. A Match Made in Heaven Images In, Images Out ≈ Image processing particularly well-suited Excellent Arithmetic Intensity: very natural to load image patches into shared memory
  99. 99. A Match Made in Heaven Images In, Images Out ≈ Image processing particularly well-suited Excellent Arithmetic Intensity: very natural to load image patches into shared memory Data: 2D / 3D locality
109. These things are REALLY fast

            Performance (g ops)   Development time (hours)
  Matlab          0.3                    0.5
  C/SSE           9.0                   10.0
  PS3           110.0                   30.0
  GT200         330.0                   10.0
111. Pipeline: Generate random models → Unsupervised learning (video) → Test with "screening" task → Skim off best models → Validate on other tasks
113. [Model architecture diagram] Input feeds a stack of layers (L1 → L2 → L3 → read-out); each layer exposes tunable parameters: number of filters, kernel size, threshold/saturation, normalization strength and neighborhood, and learning parameters (rate, trace, "temp. adv.", "auto-reset", ...).
118. A Broad Parametric Model

  Normalize:
    N_i = Input_i / norm(Input_neighborhood)
  Compute filter responses:
    R_i = F_i ⊗ N
    R_i < thresh: R_i = thresh;  R_i > sat: R_i = sat
  Determine a "winning" filter:
    R_i' = (∑_k T_k · H_k) · R_i;  winner: max(R_i')
  Update filter:
    F_winning = F_winning + learning_rate · N

• Optimize "coverage" (filters span the range of observed inputs)
• Privilege movement of filters in certain directions using temporal information
• Expand dimensionality greatly, then scale back as layers progress
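The update rule on this slide can be sketched in a few lines of NumPy. Everything here is illustrative, not the actual model code: the function name and defaults are mine, dot products stand in for the convolution, and the temporal weighting (∑_k T_k · H_k) of the winner is omitted:

```python
import numpy as np

def learning_step(patch, filters, thresh=0.0, sat=1.0, lr=0.05, eps=1e-6):
    """One winner-take-all update in the spirit of the slide's model.
    patch: flattened input patch; filters: (n_filters, patch_dim) array.
    Names and default values are illustrative assumptions."""
    # Normalize the input by the norm of its neighborhood (here: the whole patch)
    n = patch / (np.linalg.norm(patch) + eps)
    # Filter responses (dot products standing in for the slide's convolution)
    r = filters @ n
    # Threshold and saturate the responses
    r = np.clip(r, thresh, sat)
    # Pick the winning filter and nudge it toward the normalized input
    winner = int(np.argmax(r))
    filters[winner] += lr * n
    return winner, r
```

In the full model this step runs per layer over video frames, with the temporal terms biasing which filter wins; the sketch keeps only the normalize / filter / clip / winner / update skeleton.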
123. Dealing with Parametric Complexity: casting a wide net comes with its own challenges:
• Complexity
• Kernels must perform under widely varying conditions: the best kernel for a 3×3 convolution may not be the best kernel for a 17×17 one; more complex operations are even hairier
128. Meta-programming: leave the grunt-programming to the computer
• Dynamically compile specialized versions of the same kernel for different conditions
• Smooth over syntactic ugliness: unroll loops, index un-indexable registers
• Dynamic, empirical run-time tuning
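The kernels in this talk are generated from Cheetah templates; the same idea in miniature, using only Python's standard `string.Template`, looks like this (the kernel text and all names here are invented for illustration, not the deck's actual template):

```python
from string import Template

# A toy CUDA-source template: one specialization is generated per filter
# width, with the tap loop fully unrolled at generation time.
KERNEL_TEMPLATE = Template("""\
__global__ void convolve_w${width}(const float* in, float* out) {
    float sum = 0.0f;
${unrolled_taps}
    out[threadIdx.x] = sum;
}
""")

def generate_kernel(width):
    """Emit specialized CUDA source with the tap loop unrolled."""
    taps = "\n".join(
        f"    sum += in[threadIdx.x + {i}] * c_filter[{i}];" for i in range(width)
    )
    return KERNEL_TEMPLATE.substitute(width=width, unrolled_taps=taps)

print(generate_kernel(3))
```

The payoff is that conditionals, loop bounds, and indexing are resolved before the CUDA compiler ever sees the code, so each specialization is straight-line and fast.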
130. The metaprogrammed kernel: conv_kernel_template.cu is CUDA source with embedded Cheetah directives ($-parameters, #for / #if / #set) that are expanded at generation time:

    texture<float4, 1, cudaReadModeElementType> tex_float4;
    __constant__ float constant[$FILTER_D][$FILTER_W][$N_FILTERS];
    #define IMUL(a, b) __mul24(a, b)

    extern "C" {

    #for j in xrange($FILTER_H)
    __global__ void convolve_beta_j${j}(float4 *input, float4 *output)
    {
        #set INPUT_BLOCK_W = $BLOCK_W + $FILTER_W - 1
        __shared__ float shared_in[$INPUT_BLOCK_W][4+1];

        // -- input/output offsets
        const uint in_idx  = (blockIdx.y+$j)*INPUT_W + blockIdx.x*blockDim.x + threadIdx.x;
        const uint out_idx = blockIdx.y*OUTPUT_W + blockIdx.x*blockDim.x + threadIdx.x;
        float4 input_v4;

        // -- load input to shared memory
        #for i in xrange($LOAD_ITERATIONS)
        #if $i == ($LOAD_ITERATIONS-1)
        if ((threadIdx.x + $BLOCK_W*$i) < $INPUT_BLOCK_W)
        #end if
        {
            input_v4 = tex1Dfetch(tex_float4, in_idx + $BLOCK_W*$i);
            shared_in[threadIdx.x + $BLOCK_W*$i][0] = input_v4.x;
            shared_in[threadIdx.x + $BLOCK_W*$i][1] = input_v4.y;
            shared_in[threadIdx.x + $BLOCK_W*$i][2] = input_v4.z;
            shared_in[threadIdx.x + $BLOCK_W*$i][3] = input_v4.w;
        }
        #end for
        ...

131. The template expands into fully specialized sources (e.g., conv_kernel_4x4x4.cu, with constant[4][4][4] and shared_in[131][4+1]) in which every dot-product loop is unrolled into explicit multiply-accumulate chains (v = shared_in[threadIdx.x+0][0]; w = constant[0][0][0]; sum0 += v*w; ...).

132. The generated specializations balloon relative to the template: conv_kernel_4x4x4.cu is ~20 kB; conv_kernel_8x8x4.cu is ~64 kB.

136. A seemingly innocuous "flow" parameter change in conv_kernel_beta_template.cu yields substantially different compiled code: the disassembled version A and version B use different mixes of mov.b32 / mad.rn.f32 instructions and operand offsets, which is exactly why tuning has to be empirical.
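The "dynamic, empirical run-time tuning" step can be sketched as a simple timing loop over candidate variants. This is a hypothetical host-side harness of my own; in the real pipeline the callables would be GPU kernels compiled from different template parameters:

```python
import time

def autotune(variants, args, repeats=3):
    """Empirical tuning sketch: time each candidate on representative
    arguments and keep the fastest. 'variants' maps a name to a callable;
    names and structure here are illustrative assumptions."""
    best_name, best_time = None, float("inf")
    for name, fn in variants.items():
        t0 = time.perf_counter()
        for _ in range(repeats):
            fn(*args)
        elapsed = (time.perf_counter() - t0) / repeats
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name, best_time
```

Measuring on real inputs, rather than predicting from a hardware model, is the point: as the version A / version B disassembly shows, small template changes can shuffle the generated code enough that the winner can only be found empirically.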
138. Pipeline: Generate random models → Unsupervised learning (video) → Test with "screening" task → Skim off best models → Validate on other tasks
140. Unsupervised Experience: learn filter kernels from temporal input statistics (video frames, e.g., from "Law & Order").
142. Screening: a quick object-recognition test to find promising models (pipeline stage: test with "screening" task).
143. Screening: a quick object-recognition test to find promising models; example classes: "cars" vs. "planes".
146. Screening: parametric variation, from no variation to more variation to lots of variation.
149. Screening results: [histogram] distribution of screening performance for N=2500 models (50–100% correct); [inset bar chart] the top-5 models compared against a V1-like baseline (V1S).
151. Validation: see how we do on other test sets (pipeline stage: validate on other tasks).
152. Validation: [bar charts] performance (50–100% correct) of V1S and the top-5 screened models on cars vs. planes (validate) and boats vs. animals.
153. Validation: [bar charts] performance of V1S and the top-5 models on synthetic face discrimination and webcam faces.
155. Other "Petri Dishes": cars and planes; boats.
156. Other "Petri Dishes": [grid of plots] for each unsupervised-training "world" (cars and planes; boats; "Law & Order"), the screening distribution (N=2500) and the validation performance of V1S and the top-5 models on all four validation tasks.
158. Screening Basically Works: [scatter plots] screening performance (cars vs. planes) predicts validation performance: cars vs. planes (validate) r=0.84; boats vs. animals r=0.60; synthetic face discrimination and webcam faces r=0.23 and r=0.49.
159. Screening Works on Average: [scatter plot] normalized average validation performance vs. cars vs. planes screening performance, r=0.69.
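The r values on these slides are Pearson correlations between screening scores and validation scores. A minimal sketch with synthetic data (the real scores come from thousands of screened models, not this toy generator):

```python
import numpy as np

# Synthetic z-scored "screening" scores and correlated "validation" scores.
rng = np.random.default_rng(0)
screen = rng.normal(size=200)
validate = 0.8 * screen + 0.6 * rng.normal(size=200)

# Pearson's r, as summarized on the scatter plots.
r = np.corrcoef(screen, validate)[0, 1]
print(round(r, 2))
```

A high r means the cheap screening task is a usable proxy for the expensive validation suite, which is what lets the pipeline discard most of the 2500 models early.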
160. What does it all mean?
162. [parameter-importance plots] Performance (% correct) as a function of individual parameter values across models: preprocessing/normalization zero-mean (false vs. true), Layer 1 linear-filtering RF size (3, 5, 7, 9), Layer 1 normalization threshold, and Layer 3 learning neighborhood size (1, 3, 5, 7, 9).
166. Summary
• GPUs allow us to engage a qualitatively different pace of experimentation
• CUDA provides an unprecedented performance/effort ratio
• A new laboratory for studying visual representation
169. Shameless Recruitment: "Now with Extra NVIDIA Goodness!" http://www.rowland.harvard.edu/rjf/cox
173. Acknowledgments
Cox Lab @ The Rowland Institute at Harvard: Davide Zoccolan, Nadja Oertelt
DiCarlo Lab @ MIT: Jim DiCarlo, Nicolas Pinto
The Rowland Institute at Harvard; Harvard University
176. The Problem with Evaluation: what set to use?
179. Caltech 101: example categories include accordion, airplanes, car_side, faces, faces_easy (montage from A. Torralba).
180. The field does well on the '101: [bar chart] performance (% correct) of five state-of-the-art systems. [1] Wang et al. 2006; [2] Grauman and Darrell 2005; [3] Mutch and Lowe 2006; [4] Lazebnik et al. 2006; [5] Zhang et al. 2006.
186. The "start-simple" model
• Input divisive normalization
• Threshold, saturation
• Output normalization
• Downsampling
• Dimensionality reduction
• Classification
• Really big (~2.1 million dimensions)
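A minimal NumPy sketch of the listed front-end stages (input divisive normalization, oriented linear filtering, threshold/saturation, output normalization). The tiny Gabor bank and all parameter values are my assumptions, and downsampling, dimensionality reduction, and the classifier are omitted:

```python
import numpy as np

def gabor(size, theta, freq=0.25, sigma=2.0):
    """One oriented Gabor filter; parameters are illustrative."""
    r = np.arange(size) - size // 2
    x, y = np.meshgrid(r, r)
    xr = x * np.cos(theta) + y * np.sin(theta)
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * freq * xr)

def v1_like(image, n_orientations=4, size=7, thresh=0.0, sat=1.0, eps=1e-6):
    """Sketch of the slide's stages: normalize, filter, clip, normalize."""
    img = image - image.mean()
    img = img / (np.linalg.norm(img) + eps)          # input normalization
    h, w = img.shape
    out = []
    for k in range(n_orientations):
        f = gabor(size, theta=np.pi * k / n_orientations)
        resp = np.zeros((h - size + 1, w - size + 1))
        for i in range(resp.shape[0]):               # naive valid convolution
            for j in range(resp.shape[1]):
                resp[i, j] = np.sum(img[i:i+size, j:j+size] * f)
        out.append(np.clip(resp, thresh, sat))       # threshold / saturation
    stack = np.stack(out)
    return stack / (np.linalg.norm(stack) + eps)     # output normalization
```

Even this stripped-down version conveys why the full feature vector is huge: every orientation keeps a near-full-resolution response map, which is where the ~2.1 million dimensions come from before downsampling.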
189. V1-like model does shockingly well: [bar chart] the V1-like model (and the V1+ variant) outperforms the five state-of-the-art systems on Caltech 101.
192. But the model is stupid: it's just V1, with no specific mechanisms for achieving invariance. The fight is rigged?
194. Refocusing on the real problem: just two classes, should be easy, no?
197. Refocusing on the real problem: parametric variation, from no variation to more variation to lots of variation.
199. Pulling performance back to chance: [plot] performance (%) falls from near 100% toward chance (50%) as intra-class variation increases: position (x-axis, 0–60%; y-axis, 0–120%), scale (0–60%), rotation (0–90°), and view (0–90°).
203. Just one bad database? Caltech 256; face databases: ORL, Yale, AR, CVL. See us @ ECCV 08 Faces in the Wild Workshop.
204. Face Databases: ORL. [bar charts] Performance (% correct) with 4 and 8 training examples: V1-like vs. (1) pixel space, (2) Savvides et al. 2007, (3, 4) Noushath et al. 2007.
205. Face Databases: AR. [bar charts] 5 and 8 training examples: V1-like vs. (1) pixel space, (2) Liang et al. 2007, (3) Zhang et al. 2007.
206. Face Databases: YALE. [bar charts] 4 and 8 training examples; chance = 1/15 ≈ 6.7%: V1-like vs. (1) pixel space, (2–5) Noushath et al. 2006, (6) Ben et al. 2006, (7) Wang et al. 2007.
207. Face Databases: CVL. [bar charts] 2 training examples (frontal only) and 3 (all); chance = 1/114 ≈ 0.88%: V1-like vs. (1) pixel space, (2) Goel et al. 2005, (3) Gokberk et al. 2002.
208. Face Databases: CAS-PEAL. [bar charts] 4 training examples; facing down / forward / up: V1-like vs. (1) pixel space, (2–5) Cao et al. 2004.
209. Simulation vs.
210. Synthetic Variation: [plot] performance (% correct, 30–100) under increasing variation (position x-axis 0–120%, position y-axis 0–60%, scale 0–60%, in-plane rotation 0–90°, depth rotation 0–90°) for scene, white-noise, and phase-scrambled backgrounds; performance falls toward chance (50%) as variation increases.
