The Opportunities and Challenges of Putting the Latest Computer Vision and Deep Learning Algorithms to Work
1. Putting the latest Computer Vision and
Deep Learning algorithms to work
The Opportunities and Challenges
Albert Y. C. Chen, Ph.D.
Vice President, R&D
Viscovery
2. Albert Y. C. Chen, Ph.D.
• Experience
2017-present: Vice President of R&D at Viscovery
2016-2017: Chief Scientist at Viscovery
2015: Principal Scientist @ Nervve Technologies
2013-2014: Computer Vision Scientist @ Tandent Vision
2011-2012: @ GE Global Research
• Education
Ph.D. in Computer Science, SUNY-Buffalo
M.S. in Computer Science, NTNU
B.S. in Computer Science, NTHU
• Some random things about me…
SUNY Excellence in Teaching Award, 2010.
Some rapid promotions, some failed startups, some
patents, some papers…
3. 1. W. Wu, A. Y. C. Chen, L. Zhao, and J. J. Corso. Brain tumor detection and segmentation in a CRF framework with pixel-wise affinity and superpixel-level features. International Journal of Computer Assisted Radiology and Surgery, 2015.
2. S. N. Lim, A. Y. C. Chen, and X. Yang. Parameter Inference Engine (PIE) on the Pareto Front. In Proceedings of International Conference on Machine Learning, AutoML Workshop, 2014.
3. A. Y. C. Chen, S. Whitt, C. Xu, and J. J. Corso. Hierarchical supervoxel fusion for robust pixel label propagation in videos. In Submission to ACM Multimedia, 2013.
4. A. Y. C. Chen and J. J. Corso. Temporally consistent multi-class video-object segmentation with the video graph-shifts algorithm. In Proceedings of IEEE Workshop on Applications of Computer Vision, 2011.
5. D. R. Schlegel, A. Y. C. Chen, C. Xiong, J. A. Delmerico, and J. J. Corso. Airtouch: Interacting with computer systems at a distance. In Proceedings of IEEE Workshop on Applications of Computer Vision, 2011.
6. A. Y. C. Chen and J. J. Corso. On the effects of normalization in adaptive MRF Hierarchies. In Proceedings of International Symposium CompIMAGE, 2010.
7. A. Y. C. Chen and J. J. Corso. Propagating multi-class pixel labels throughout video frames. In Proceedings of IEEE Western New York Image Processing Workshop, 2010.
8. A. Y. C. Chen and J. J. Corso. On the effects of normalization in adaptive MRF Hierarchies. Computational Modeling of Objects Represented in Images, pages 275–286, 2010.
9. Y. Tao, L. Lu, M. Dewan, A. Y. C. Chen, J. J. Corso, J. Xuan, M. Salganicoff, and A. Krishnan. Multi-level ground glass nodule detection and segmentation in CT lung images. Medical Image Computing and Computer-Assisted Intervention, 2009.
10. A. Y. C. Chen, J. J. Corso, and L. Wang. Hops: Efficient region labeling using higher order proxy neighborhoods. In Proceedings of IEEE International Conference on Pattern Recognition, 2008.
4. Some work done before I
caught the startup fever
[Figure: AirTouch HCI interface for content-based image retrieval (CBIR). Diagram labels: Start: AirTouch waits in background for the initialization signal; Initialize; Freestyle Sketching Stage; Terminate; Output image; CBIR query; image database; Results.]
5. Interactive Segmentation & Classification
• Segmentation then classification:
• computationally more efficient,
• results in much higher classification accuracy.
• Pioneered the “pixel label propagation” field.
• First to utilize superpixels and supervoxels for the task.
[Figure: label a subset of pixels (FG/BG) to obtain a pixel label map. Panels: traditional spatial propagation; spatio-temporal propagation (over time).]
6. Image/Video Object Recognition
and Content Understanding
[Figure: three-level video understanding over time. Low-Level: detections (x); Mid-Level: activities (approach, carries, gives, receives) involving Person 1, Person 2, and an object; High-Level: ontology (person approaches, carries, gives, receives object) and reasoning.]
7. Learning and Adapting Optimal
Classifier Parameters
[Figure labels: image-level feature space (subspaces A, B, C), priors; patch-level feature space, posterior probability; suggest optimal parameter configuration.]
8. Graphical Models and
Stochastic Optimization
(a) The space-time volume of a video, showing the objects (A-F) and the time span over which each appears.
(b) The temporal relationship graph. An edge between two vertices means that the two objects overlap in time.
(c) The goal: cover all objects with the smallest number of "ground truth key frames" (key 1 and key 2 in the figure).
(d) This translates to iteratively solving the max clique problem until all vertices belong to a clique.
[Figure labels: frame t-1, frame t; layer n, layer n+1; Shift; Temporal Shift; µ.]
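The key-frame selection described in (a) through (d) can be sketched in a few lines. This is a minimal illustration, not the implementation from the talk: the function name `select_key_frames` and the spans for objects A-F are hypothetical, and the greedy loop (repeatedly take the frame that covers the most still-uncovered objects) is a heuristic that is not guaranteed to give the smallest possible number of key frames.

```python
def select_key_frames(spans):
    """Greedily pick key frames until every object is covered.

    spans maps an object name to its (first_frame, last_frame) span, inclusive.
    Returns a list of (key_frame, covered_objects) pairs.
    """
    remaining = dict(spans)
    keys = []
    while remaining:
        # For time spans, the largest group of mutually overlapping objects
        # (a max clique of the overlap graph) always shares the first frame
        # of one of its members, so only those frames need to be tried.
        candidates = sorted({first for first, _ in remaining.values()})
        best = max(
            candidates,
            key=lambda t: sum(first <= t <= last for first, last in remaining.values()),
        )
        covered = sorted(
            name for name, (first, last) in remaining.items() if first <= best <= last
        )
        keys.append((best, covered))
        for name in covered:
            del remaining[name]
    return keys


# Hypothetical spans for objects A-F (not taken from the slide's figure).
spans = {"A": (0, 4), "B": (1, 5), "C": (2, 6), "D": (7, 10), "E": (3, 9), "F": (8, 11)}
key_frames = select_key_frames(spans)
for frame, objects in key_frames:
    print(frame, objects)
```

With these assumed spans, two key frames suffice: frame 3 covers A, B, C, and E, and frame 8 covers D and F.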
9. Medical Imaging and
Geospatial Imaging
GGN (ground glass nodule) detection and
segmentation in lung CT.
Geospatial imaging:
building detection.
Brain tumor detection and
segmentation in MR images.
10. Why Risk to Innovate?
• Good business models NEVER last forever.
• Average “shelf life” on the S&P 500: 20 years.
• 100-year-old companies reinvent
themselves every 10-20 years.
• Startups contribute 20% of the USA’s GDP.
11. The Death of a Good
Business Model
• Foxconn’s 20-year revenue vs. net profit (now at 5%)
12. What do 100-year-old
corporations do?
GE Schenectady, 1896
13. History of change at GE
• 1896: one of the 12 original companies on the Dow
Jones Industrial Average (also the only one remaining).
• 1889: lightbulbs
• 1919: radios
• 1927: TV
• 1941: jet engine
• 1960: nuclear power
• 1971: room AC units
• 1995: MRI
14. History of change at IBM
• 1960s: mainframe computer
• 1980s: personal computer
• 2000s: integrated solutions
• 2020s: AI, Watson
24. Now, again, do we want to
do OEM/ODM forever?
Optimizing an old business model
is just delaying its eventual death.
25. Startups
• A company, partnership, or temporary
organization designed to search for a new,
repeatable and scalable business model.
26. Your Idea
• Are you passionate about it?
• Is it disruptive enough?
• What is your business plan?
• What is it?
• Can it make money?
• What is the future of the idea?
• What is your competitive advantage?
• How do you build up your entry barrier?
30. Prototype
• Hack out a prototype
• Spend 2-10 weeks max.
• Investors are much more likely to fund you if
you have a minimal initial version of your idea.
• Hackathons are a good place to start.
• Iteratively improve the prototype
36. Brief History
Marvin Minsky
“In 1966, Minsky hired a first-year undergraduate
student and assigned him a problem to solve over the
summer: connect a television camera to a computer
and get the machine to describe what it sees.”
Gerald Sussman
The student never worked on
Computer Vision problems again.
37. Brief History
• 1960’s: interpretation of synthetic worlds
• 1970’s: some progress on interpreting selected images
• 1980’s: ANNs come and go; shift toward geometry and increased
mathematical rigor
• 1990’s: face recognition; statistical analysis in vogue
• 2000’s: broader recognition; large annotated datasets available; video
processing starts
[Figures: Guzman ’68; Ohta and Kanade ’78; Turk and Pentland ’91]
55. What alg. should I use then?
• How much data do we have?
• What objects are we trying to detect?
• For example, Google’s DNN trained with 11k images
over 20 classes in 2013 doesn’t always beat DPM.
[Figure: per-class results (y-axis 0 to 0.6) for DNN vs. DPM on 20 classes: aero, bike, bird, boat, bottle, bus, car, cat, chair, cow, dog, horse, m-bike, person, plant, sheep, sofa, table, train, TV.]
56. ML alg. and their Applications
• Deep Learning
• Markovian/Bayesian
• Feature Matching
• Other ML methods
57. Meta-Learning
• Different use cases call for different ML algorithms.
• Meta-Learning: learning how to learn.
• Requires plenty of domain-specific know-how.
68. Face Verification and Identification,
Labeled Faces in the Wild (LFW)
Recognition Accuracy:
• 1 to 1: 99%+
• 1 to 100: 90%
• 1 to 10,000: 50%-70%
• 1 to 1M: 30%
LFW dataset, common FN↑, FP↓
111. Other Applications in
Business Intelligence
• Measure brand exposure.
• Measure sponsorship effectiveness.
• Loss prevention and retail layout optimization.
118. Issues
• Highly anticipated, highly acclaimed, but small
crowd at $500 a license.
• Adobe Photoshop monopoly and the “not
invented here” syndrome.
• Adobe’s arch-rival, Corel (Corel Draw, Paint
Shop Pro, Ulead PhotoImpact) was DYING and
asked too much from the botched deal.
119. Have fun scribbling out your
shadows in Photoshop!
Poor Bob from Adobe wasted 9 minutes removing just 1 shadow
122. Retrospect
• 20 researchers burned 25 million in 8 years;
investors got 50 patents in return, period.
• Overestimated the total addressable market
size, in a market with an existing monopoly.
• Many missed opportunities. A counterexample to
the lean startup model.
124. Satellite/Aerial Imagery Analysis
• 40cm resolution at 30fps for 90 sec for any location on earth.
• One LEO satellite revisits any place on Earth every 3 days.
• Need 24 satellites to revisit any place on Earth every 3 hours.
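The satellite count above follows from simple division. A minimal sketch, assuming the satellites are evenly phased so that N of them cut the revisit interval by a factor of N (a simplification; the function name `satellites_needed` is hypothetical):

```python
import math


def satellites_needed(single_revisit_hours, target_revisit_hours):
    """Satellites required if N evenly phased satellites divide the revisit time by N."""
    return math.ceil(single_revisit_hours / target_revisit_hours)


single_revisit_hours = 3 * 24  # one satellite revisits every 3 days
count = satellites_needed(single_revisit_hours, 3)  # target: every 3 hours
print(count)  # 24
```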
125. Challenges for Single satellite depth
estimation and 3D reconstruction
• At 30fps, a LEO satellite
travels 250m between two
consecutive frames, which is
theoretically sufficient for
cm-level depth estimation.
• Sources of Noise:
• Camera distortions
• Atmospheric Disturbance
• Ground vegetation
• Sub-pixel sampling noise
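The per-frame displacement quoted on this slide can be sanity-checked with a little arithmetic. A minimal sketch using only the figures from the slides (30 fps, 250 m per frame, 90-second clips); the variable names are illustrative:

```python
FPS = 30                # frames per second
METERS_PER_FRAME = 250  # distance travelled between consecutive frames
CLIP_SECONDS = 90       # clip length quoted on the previous slide

ground_speed_m_s = FPS * METERS_PER_FRAME                   # 7500 m/s
clip_frames = FPS * CLIP_SECONDS                            # 2700 frames
clip_baseline_km = ground_speed_m_s * CLIP_SECONDS / 1000   # 675.0 km

print(ground_speed_m_s, clip_frames, clip_baseline_km)
```

The implied speed of 7.5 km/s is in line with typical LEO orbital speeds, so the 250 m figure is self-consistent.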
126. What happened?
• B2B customers take too long to strike deals.
• Google ate us alive in just 3 months, while we
were still pitching for VC-funding with our
prototype.
128. Retrospect
• Growth pains expanding from intelligence
community clients to advertisement clients.
• Forming the right team of engineers and
researchers and moving at the right pace.
• For any Computer Vision/Machine Learning
company:
• Researchers who cannot program → OUT
• Engineers who don’t know math → OUT
135. Challenges Encountered
Along the Way
• From Product Recognition in Images, to Face,
Logo, Object, Scene recognition in Videos.
• Number of Categories
• Recognition Accuracy
• Recognition Speed
• System Architecture
• Business Model
136. Viscovery’s Edge
• Market: first mover’s advantage in China’s video
streaming market.
• Speed: we built the whole VDS thing in a few months!
• Team: You! Seriously!
• Technology:
• Depth
• Breadth
• Cloud
• Customizability
• Self-Learning
137. Life is not all rosy at startups
• High Risk, High Pressure, High Uncertainty!
• Resources are scarce, but you MUST DELIVER!
• Forming your all-star team is not that easy…
• Focus, and persistence.
145. The Goldilocks zone of innovation
[Figure: axes are Business Relevance and Academic Relevance. Regions: plentiful resources, hierarchical organization; lack of resources, responsive organization. Labels: traditional corporations talking “innovation”; corporate research; startups struggling to survive; academic spinoffs; MSR.]
翟本橋:never worked a single day in my life
example: Tivo disrupts TV market / creates DVR market
example: Facebook, Twitter disrupt online social networking
example: FourSquare creates location-based "check in" ad market