State of the Art Innovations in
Computer Vision
Christian Siagian
DataCon LA
August 16, 2019
Presentation Structure
• 10 minutes of background to set the context
• 20 minutes current Computer Vision topics
• 10 minutes summary and questions
My Background
• Academic:
– Publications in Computer Vision, Robotic Vision,
Human Vision
– Beobot 2.0:
• parallel high-performing robotics vision mobile
platform
• full software architecture with vision localization and
navigation
• Startup: AIO Robotics, Inc.
– fully integrated 3D printer, scanner, editor, object
search
– 2 patents and CES Innovation Awards 2016 & 2017
• Startup: Eyenuk, Inc. Medical Deep Learning
– retinal image lesion detection and segmentation
– end-to-end robotic system to automate eye
screening, monitoring, diagnosis, reporting
– Patent and Grant applications
• Competition Robotics:
– Robocup Soccer Robot & AUVSI Autonomous
Submarine
• Teaching:
– After school robotics program, USC robotics courses
• Learning:
– Academics, sports journalism, nutrition, art, music
Artificial Intelligence
• Fields: Machine Learning (ML), Computer
Vision (CV), Natural Language Processing
(NLP), and Robotics
– Digitally and in real world
– They are connected for particular applications
• We will focus on Computer Vision and related
topics
Connection with Data Science
• Computer Vision (CV) processes raw data to be used for
data science
• Raw input data: images (regular cameras, thermal cameras,
etc.), text, audio
– These data do not have direct semantic meaning:
• They do not measure a specific (or isolated) characteristic
– Create models to understand what is in the images, etc.
• Advantage of raw data:
– General purpose/richer source of information
– Target events can be obtained by further processing later
– Less reliant on manual entry, more natural interactions (with
customers)
Connection with Data Science
• Disadvantages of raw data:
– Systems/infrastructure (hardware & software
environment, tools) are expensive
– Models are more complex
– Data are higher-dimensional, massive, and need
annotations (for learning)
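
To make "higher dimension" concrete, the short sketch below counts the raw values in a single image. The two resolutions are illustrative assumptions, not figures from the talk.

```python
def raw_input_size(width, height, channels=3):
    """Number of raw values (one per pixel per color channel) in one image."""
    return width * height * channels


# Assumed example sizes: a small network input and a full-HD video frame.
small_input = raw_input_size(224, 224)
full_hd_frame = raw_input_size(1920, 1080)

print(small_input)     # 150528
print(full_hd_frame)   # 6220800
```

Even the small input has over 150,000 values, none of which carries semantic meaning on its own.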
Deep Learning: AlexNet 2012
• Trying to solve Object Recognition:
– Given an image (massive number of
pixels), determine the object (1
label)
• Given a labeled training dataset,
learn a function that maps images
to labels
• Data should encapsulate invariance
in the presence of:
– Appearance
– Interaction with the world
– Perspective (2D – 3D), including
size
– Occlusion
– Lighting
Deep Learning: AlexNet 2012
• Data: Caltech 101 (2003), ImageNet (2009)
– About 1,000,000 images (1,000 categories,
about 1,000 images per category)
– The set of all objects in real life is in
the thousands
• Model: 1989
– Convolutional Neural Network:
Yann LeCun, 1989: handwritten digit
recognition (the MNIST benchmark followed in 1998)
– Deep network that jointly trains
both the feature extraction and
classification stage
• Systems/Infrastructure: 2010
– From video games (Sony
PlayStation): GPU, CUDA: 60 to 100
times speedup
• Blog:
https://adeshpande3.github.io/adeshpande3.github.io/A-Beginner's-Guide-To-Understanding-Convolutional-Neural-Networks/
Data-driven features within a
compositional architecture
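
As a rough illustration of what one layer in such an architecture computes, here is a minimal pure-Python sketch of a single convolution followed by a ReLU. The 4x4 image and the vertical-edge kernel are made-up assumptions. In a trained CNN the kernel values are learned from data rather than set by hand, and many such layers are stacked.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


def relu(feature_map):
    """Keep positive responses, zero out the rest."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Toy image: dark left half, bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-set kernel that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

response = relu(conv2d_valid(image, vertical_edge))
for row in response:
    print(row)  # each row is [0.0, 2.0, 0.0]: the edge sits in the middle column
```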
Solving Other Computer Vision
Problems
• Data-driven features are key to moving ALMOST ALL other difficult
Computer Vision tasks forward
– Note: Basic single image object/person/background recognition has
moved to Enterprise AI (e.g. Amazon Rekognition)
– Mature tasks, such as tracking are available in many free libraries
(OpenCV, etc.)
• Complex algorithms hinge on architecture & training
– Papers focus on architecture; training is tribal knowledge
• Whether the data is noisy
• Whether we need more data
• Training regimen: hyper-parameter grid, fine-tuning, multiple stages, etc.
• Visualization
• Evaluation
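
The "hyper-parameter grid" part of the training regimen can be sketched as below. The search space values are assumptions, and `toy_validation_score` is a hypothetical stand-in for the expensive step of training a model and measuring validation accuracy.

```python
from itertools import product


def grid(search_space):
    """Yield every combination of hyper-parameter values as a dict."""
    names = sorted(search_space)
    for values in product(*(search_space[name] for name in names)):
        yield dict(zip(names, values))


def toy_validation_score(config):
    # Hypothetical placeholder: a real run would train a model with this
    # config and return its validation accuracy.
    lr_penalty = abs(config["learning_rate"] - 0.01) * 10
    batch_penalty = abs(config["batch_size"] - 64) / 1000
    return 1.0 - lr_penalty - batch_penalty


search_space = {
    "learning_rate": [0.1, 0.01, 0.001],
    "batch_size": [32, 64],
}

configs = list(grid(search_space))
best = max(configs, key=toy_validation_score)
print(len(configs), best)
```

The grid grows multiplicatively with every added hyper-parameter, which is one reason training know-how matters as much as the architecture.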
Solving Other Computer Vision
Problems
• Additional key concepts
in architecture:
– Adding dependencies to
the past (recurrence):
• Recurrent Neural
Network (RNN)
Long-range dependency:
“When I was in Paris I got
lost because I couldn’t
ask for directions in
_____”
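
A minimal sketch of recurrence, assuming a single scalar hidden state and hand-picked weights (a trained RNN learns its weights and uses vectors). The hidden state is the only channel through which an early input, such as "Paris", can influence a later prediction.

```python
import math


def rnn_forward(inputs, w_x, w_h, b=0.0, h0=0.0):
    """Apply h_t = tanh(w_x * x_t + w_h * h_{t-1} + b) over a sequence."""
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


# The same sequence with and without a signal at the first time step.
with_signal = rnn_forward([1.0, 0.0, 0.0, 0.0], w_x=1.0, w_h=0.9)
without_signal = rnn_forward([0.0, 0.0, 0.0, 0.0], w_x=1.0, w_h=0.9)

print(with_signal)     # non-zero at every step, but shrinking
print(without_signal)  # all zeros
```

The first input is still visible in the last state, but its trace fades at each step, which is why long-range dependencies are hard for a plain RNN.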
Solving Other Computer Vision
Problems
• Additional key concepts in
architecture:
– Adding dependencies to the
past (recurrence):
• Recurrent Neural Network
(RNN)
– Undoing the dimensional
collapse to get more
details:
• Fully Convolutional
Network
Segmentation tasks, Neural
Network visualization
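
To show what "undoing the dimensional collapse" means for segmentation, the sketch below takes coarse per-class score maps, scales them back up to the input resolution, and picks a label per pixel. The scores are invented, and fixed nearest-neighbor upsampling is an assumed stand-in for the learned upsampling layers a fully convolutional network would use.

```python
def upsample_nearest(score_map, factor):
    """Repeat every value `factor` times along both axes."""
    out = []
    for row in score_map:
        wide = [v for v in row for _ in range(factor)]
        for _ in range(factor):
            out.append(list(wide))
    return out


def argmax_labels(score_maps):
    """Per pixel, return the index of the class with the highest score."""
    height, width = len(score_maps[0]), len(score_maps[0][0])
    labels = []
    for r in range(height):
        row = []
        for c in range(width):
            scores = [class_map[r][c] for class_map in score_maps]
            row.append(scores.index(max(scores)))
        labels.append(row)
    return labels


# Assumed 2x2 coarse scores for two classes: 0 = background, 1 = object.
coarse_background = [[0.9, 0.2], [0.8, 0.1]]
coarse_object = [[0.1, 0.8], [0.2, 0.9]]

full_resolution = [upsample_nearest(m, 2) for m in (coarse_background, coarse_object)]
labels = argmax_labels(full_resolution)
for row in labels:
    print(row)  # each row is [0, 0, 1, 1]
```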
Solving Other Computer Vision
Problems
• Additional key concepts in
architecture:
– Adding dependencies to the past
(recurrence):
• Recurrent Neural Network (RNN)
– Undoing the dimension collapse:
• Fully Convolutional Network
– Using multiple networks:
• Joint Learning: jointly learn
interrelated tasks
• Generative Adversarial Network
(GAN): learning using competing
networks
Learning jointly can improve
performance on the individual
tasks
GANs are used for synthetic data
generation
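
The "competing networks" idea can be sketched through the two losses involved. This assumes the standard cross-entropy discriminator loss and the commonly used non-saturating generator loss. The probabilities passed in are made-up discriminator outputs, not results from any model in the talk.

```python
import math


def discriminator_loss(d_real, d_fake):
    """D wants high probabilities on real samples and low ones on generated samples."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """G wants D to assign high 'real' probability to generated samples."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


# Assumed discriminator outputs: probability that a sample is real.
d_on_real = [0.9, 0.8]
d_on_fake_poor = [0.1, 0.2]   # generator is easily caught
d_on_fake_good = [0.8, 0.9]   # generator fools the discriminator

print(generator_loss(d_on_fake_poor), generator_loss(d_on_fake_good))
print(discriminator_loss(d_on_real, d_on_fake_poor),
      discriminator_loss(d_on_real, d_on_fake_good))
```

As the generator improves, its loss falls while the discriminator's loss rises. That tug of war is what makes GAN training delicate.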
Contemporary Computer Vision
• Topics:
– Deep Learning Theory: accuracy, efficiency
– Recognition: robustness, more detail, larger context
– Reconstruction: WILL NOT DELVE DEEP INTO THIS
• 6DOF pose, clothing, hair, light, deformation, mesh, depth, joint
• GANs are moving forward: the generator is controlled by signals at multiple layers
– https://www.youtube.com/watch?v=kSLJriaOumA
• Inputs:
– Images, Videos, 3D data, special cameras (thermal, event cameras)
– Video and: audio, text (language), robots
• Applications:
– Medical
– Robots: language/semantic navigation, interacting with objects
Deep Learning Theory
• Graph Neural Networks
– Relationships: objects, joints
• Few-shot, one-shot, zero-shot
learning. Weakly supervised/unsupervised
learning
• Measuring uncertainty & class
imbalance
• Active/online learning
• Open-set learning
• Architectural search
• Component analyses:
– ReLU, augmentation strategy
• Resource allocation/compression
• Stability/sensitivity/adversarial
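
One simple way to picture few-shot learning is nearest-prototype classification: average the handful of labeled examples per class and assign a query to the closest average. This is an assumed illustration, not the method of any paper cited in this talk. The 2-D feature vectors are invented and would normally come from a pretrained network.

```python
def mean_vector(vectors):
    """Component-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query, support):
    """Assign the query to the class whose prototype (mean feature) is nearest."""
    prototypes = {label: mean_vector(examples) for label, examples in support.items()}
    return min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))


# Two labeled examples ("shots") per class; hypothetical feature vectors.
support = {
    "cat": [[1.0, 0.9], [0.8, 1.1]],
    "dog": [[-1.0, -0.8], [-0.9, -1.2]],
}

print(classify([0.7, 1.0], support))    # cat
print(classify([-1.0, -1.1], support))  # dog
```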
Deep Learning Theory
• Graph Convolutional Networks [https://arxiv.org/pdf/1609.02907.pdf]
– http://openaccess.thecvf.com/content_CVPR_2019/papers/Kim_Edge-Labeling_Graph_Neural_Network_for_Few-Shot_Learning_CVPR_2019_paper.pdf
• Few-shot, one-shot, zero-shot learning. Weakly supervised/unsupervised
learning
– http://openaccess.thecvf.com/content_CVPR_2019/papers/Wang_Few-Shot_Adaptive_Faster_R-CNN_CVPR_2019_paper.pdf
• Active Learning
• measure uncertainty & class imbalance
– http://openaccess.thecvf.com/content_CVPR_2019/papers/Khan_Striking_the_Right_Balance_With_Uncertainty_CVPR_2019_paper.pdf
• Online learning, open-set
• Architectural search:
– http://openaccess.thecvf.com/content_CVPR_2019/papers/Liu_Auto-DeepLab_Hierarchical_Neural_Architecture_Search_for_Semantic_Image_Segmentation_CVPR_2019_paper.pdf
• Component analyses:
– ReLU, augmentation strategy
– http://openaccess.thecvf.com/content_CVPR_2019/papers/Cubuk_AutoAugment_Learning_Augmentation_Strategies_From_Data_CVPR_2019_paper.pdf
• Resource allocation/compression:
– http://openaccess.thecvf.com/content_CVPR_2019/papers/Qiao_Neural_Rejuvenation_Improving_Deep_Network_Training_by_Enhancing_Computational_Resource_CVPR_2019_paper.pdf
• Stability/sensitivity/adversarial
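
Two basic building blocks behind "measure uncertainty & class imbalance" are sketched below: predictive entropy as an uncertainty score, and inverse-frequency class weights for a skewed label set. Both are generic textbook choices assumed here for illustration, not the approach of the paper linked above. The label names and counts are invented.

```python
import math
from collections import Counter


def predictive_entropy(probabilities):
    """Entropy of a predicted class distribution; higher means less certain."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def inverse_frequency_weights(labels):
    """Weight each class by n / (k * count) so rare classes count more in the loss."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


confident = predictive_entropy([0.99, 0.01])
unsure = predictive_entropy([0.5, 0.5])
print(confident, unsure)

# Hypothetical imbalanced dataset: 9 healthy images for every lesion image.
weights = inverse_frequency_weights(["healthy"] * 9 + ["lesion"])
print(weights)
```

A prediction with high entropy is a natural candidate for human review, which also connects to active learning.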
Recognition
• Image: detection, recognition,
segmentation, landmarking,
identification in the crowd/wild:
– Face, hand & body pose estimation
• Skeleton, joint localization
• Dense pose
– Panoptic segmentation, RCNN-family
• Video: (person, object, background,
and combination):
– Action recognition (1 person):
• the most active area in recognition
• Still limited to 80 actions: the space of actions is
unknown
• Segmenting actions in the wild and handling
multiple simultaneous actions are difficult
– Social relationships (multiple people)
– Video object segmentation: Faster R-CNN,
etc. (multiple objects)
– Surveillance: tracking & re-identification
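
Detection and tracking results are usually compared to ground truth by box overlap. The sketch below computes intersection over union (IoU) for two axis-aligned boxes. The `(x_min, y_min, x_max, y_max)` box format and the example coordinates are assumptions for illustration.

```python
def box_area(box):
    """Area of a box given as (x_min, y_min, x_max, y_max)."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    union = box_area(box_a) + box_area(box_b) - intersection
    return intersection / union if union > 0 else 0.0


predicted = (0, 0, 2, 2)
ground_truth = (1, 1, 3, 3)

print(iou(predicted, ground_truth))      # 1/7, about 0.143
print(iou(predicted, predicted))         # 1.0
print(iou(predicted, (5, 5, 6, 6)))      # 0.0
```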
Recognition, cont.
• Visual Question
Answering (VQA): words
& image connection:
– Visual dialog
– Video Captioning
• Video and Audio:
– Audio video event
recognition
– Video enhancement:
diarization
Overarching Trends
• Datasets dictate
research activity
– Largest datasets are from
large entities (Facebook,
Google DeepMind, etc.)
– Examples:
• Cityscapes (dashboard
camera): semantic and
instance segmentation
• COCO datasets:
semantic and instance
segmentation
• Kinetics Human Action
Dataset
• Social interaction capture:
CMU
• Person Re-identification
Trends/Predictions Moving Forward
• Training on smaller manually annotated datasets catches
up in performance
– Few-, one-, and zero-shot training
– Mixed use of real & synthetic data
• Grounded recognition and reconstruction (adding
more modules to solve a problem robustly):
– Image: recognition – segmentation (panoptic) – 3D object
reconstruction – space understanding
– Video: pose estimation – action recognition – action
forecasting – reconstruction
• The next superior building block should direct the field
again (following SIFT 2004, and DL features 2012)
How Do We Apply All This
Information?
• Have a working knowledge of the ML/CV
fundamentals:
– theory, software, hardware, models (CNN, RNN)
• Start with your use-case:
– find keywords in the papers
– search blogs for definition, background
• Run the open-source code
– Understand the limitations
– Are they acceptable to your business?
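
One way to "understand the limitations" of an open-source model is to score its predictions on a small labeled sample from your own use-case. The sketch below computes precision and recall for one class. The labels and predictions are invented for illustration.

```python
def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels from your data (1 = event of interest) and model output.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred, positive=1)
print(precision, recall)  # both 2/3
```

Whether a missed event (low recall) or a false alarm (low precision) costs more depends on the business, so the two numbers are worth reading separately.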

Data Con LA 2019 - State of the Art of Innovation in Computer Vision by Christian Siagian


Editor's Notes

  • #4 Evaluation - Sports Recruiting; Self Improvements; Robotics, Computer Vision, ML, AI; robotics is sensor-driven & Bayesian model
  • #5 The future is in this field
  • #6, #7 Flight cameras; won't talk a lot on infrastructure
  • #10 Edges, texture, more complex textures, objects
  • #12, #13 Creating new item from distribution; training
  • #14 Training can be difficult
  • #15
    – GAN paper: http://openaccess.thecvf.com/content_CVPR_2019/papers/Karras_A_Style-Based_Generator_Architecture_for_Generative_Adversarial_Networks_CVPR_2019_paper.pdf
    – Animation from single image
    – Robotics: grasping: http://openaccess.thecvf.com/content_CVPR_2019/papers/Huang_Neural_Task_Graphs_Generalizing_to_Unseen_Tasks_From_a_Single_CVPR_2019_paper.pdf
    – "Three Strong Accept" paper: semantic navigation in the kitchen; interacting with people: http://openaccess.thecvf.com/content_CVPR_2019/papers/Wortsman_Learning_to_Learn_How_to_Learn_Self-Adaptive_Visual_Navigation_Using_CVPR_2019_paper.pdf
  • #16, #17 Architectural search: network composed of spatial computation & within-layer computation. Scaling policies: http://openaccess.thecvf.com/content_CVPR_2019/papers/Wang_ELASTIC_Improving_CNNs_With_Dynamic_Scaling_Policies_CVPR_2019_paper.pdf
  • #18, #19
    – http://ikea.csail.mit.edu/
    – Pose estimation is moving forward with dense pose:
      http://densepose.org/
      https://github.com/facebookresearch/DensePose
      http://openaccess.thecvf.com/content_CVPR_2019/papers/Guler_HoloPose_Holistic_3D_Human_Reconstruction_In-The-Wild_CVPR_2019_paper.pdf
    – Pose estimation: hand & pose:
      http://openaccess.thecvf.com/content_CVPR_2019/papers/Ge_3D_Hand_Shape_and_Pose_Estimation_From_a_Single_RGB_CVPR_2019_paper.pdf
      http://openaccess.thecvf.com/content_CVPR_2019/papers/Pavllo_3D_Human_Pose_Estimation_in_Video_With_Temporal_Convolutions_and_CVPR_2019_paper.pdf
    – Mask R-CNN: http://openaccess.thecvf.com/content_CVPR_2019/papers/Huang_Mask_Scoring_R-CNN_CVPR_2019_paper.pdf
    – Panoptic Segmentation (#18 only): https://arxiv.org/pdf/1801.00868.pdf
    – Action recognition: flow representation: http://openaccess.thecvf.com/content_CVPR_2019/papers/Piergiovanni_Representation_Flow_for_Action_Recognition_CVPR_2019_paper.pdf
    – Video salient object detection: http://openaccess.thecvf.com/content_CVPR_2019/papers/Fan_Shifting_More_Attention_to_Video_Salient_Object_Detection_CVPR_2019_paper.pdf
    – Object relationship: http://openaccess.thecvf.com/content_CVPR_2019/papers/Zhan_On_Exploring_Undetermined_Relationships_for_Visual_Relationship_Detection_CVPR_2019_paper.pdf
    – Video classification: http://openaccess.thecvf.com/content_CVPR_2019/papers/Bhardwaj_Efficient_Video_Classification_Using_Fewer_Frames_CVPR_2019_paper.pdf
    – Relationship: http://openaccess.thecvf.com/content_CVPR_2019/papers/Sun_Relational_Action_Forecasting_CVPR_2019_paper.pdf
    – Performance/action quality:
      http://openaccess.thecvf.com/content_CVPR_2019/papers/Parmar_What_and_How_Well_You_Performed_A_Multitask_Learning_Approach_CVPR_2019_paper.pdf
      http://openaccess.thecvf.com/content_CVPR_2019/papers/Doughty_The_Pros_and_Cons_Rank-Aware_Temporal_Attention_for_Skill_Determination_CVPR_2019_paper.pdf
  • #21 Larger context in visual reasoning from language: https://arxiv.org/abs/1906.08237 https://github.com/zihangdai/xlnet http://openaccess.thecvf.com/content_CVPR_2019/papers/Hudson_GQA_A_New_Dataset_for_Real-World_Visual_Reasoning_and_Compositional_CVPR_2019_paper.pdf
  • #22
    – Cityscapes Dataset: https://www.cityscapes-dataset.com/
    – COCO datasets: http://cocodataset.org
    – Kinetics Human Action Dataset: https://deepmind.com/research/open-source/kinetics
    – Panoptic Studio Dataset: https://www.cs.cmu.edu/~hanbyulj/panoptic-studio/
    – Person Reidentification Dataset: https://amberer.gitlab.io/papers_in_ai/person-reid.html
  • #23 Reconstruction: mechanistic understanding vs. Recognition: discriminative, deeper understanding
  • #24 Academics try to solve the general problem