The document discusses applying the sum of absolute differences (SAD) algorithm to perform stereo matching on images using disparity maps. Key points include:
- The author proposes using a neighbor-based ranking approach to simplify pixel comparisons for SAD, addressing issues with intensity variations.
- An initial, admittedly naive test of the approach yielded promising results, though the training data set was very small.
- Further experiments increasing the size and quality of the training data set showed accuracy rates from 66-91% for k-nearest neighbor classification, providing evidence the method is viable.
- However, more work is needed to improve feature selection and evaluation to fully validate the approach.
3. S.A.D. What?
• Computer vision
– Linear Algebra out the wazoo!
• Simultaneous Localization and Mapping
• Real-time Point Cloud Processing and Streaming
• Polygon and Voxel Reconstruction
• Movement and Depth Processing
• SPIN object recognition
• Etcetera…
4. Plenty of data for all!
• A plethora of information is at everyone’s fingertips. More pixels than we know what to do with. Unless you’re Google training their cat-seeking neural net.
• Some algorithms are not so easy to implement from scratch or on the fly…
– There are two hard problems in CS: cache invalidation, naming conventions, and off-by-one errors
5. Why S.A.D?
• It’s elementary and needs no computer vision background
• I help my high-school robotics club learn programming
• Not impossible to implement as portable code. I’m a C kind of guy, so I’ll write a portable version (which errors more than my non-SAD implementations).
• It’s fast enough
• It’s one of the more error-prone algorithms, but its errors are well documented
• DON’T SEE THAT EVERY DAY! I hope everyone can take something away from this
8. It’s Simple. It’s useful.
• Multiple disparities can be used to determine 3D points in space, be it for distance, reconstruction, or more.
I used triangle “zipping” reconstruction, so it isn’t smooth, but faaaaast
15. Objective
• Attempt to classify the image characteristics. This could be useful in a lower-level language: a machine without laser optics could process information rapidly via two cameras. This makes the code less dense and the hardware less expensive, and it works within reasonable parameters.
• ALSO! I’ve been wondering if I really need to use SPIN to perform recognition on point models, which takes an eon and a half…
16. The Challenges
• Balance efficiency and performance
– If the training set is to be expanded, this is critical
• Ensure the algorithm cleanses/normalizes/adjusts the data throughout execution
– Images are data; the pixels should be treated as such when processing and relating them
• Select a suitable classification algorithm in R
– This is a class on R; something has to be in R
• Have the data formatted enough before R gets its hands on it
– There is no need to re-loop in a higher-level scripting language
17. Gray the Images!
• RGB definitely makes simple comparison difficult. Intensities could mangle things without overhead to correct for them. Graying offers a rather elegant way to clean and simplify inputs.
20. The decision
• Go ahead with NCC and utilize training data to perform KKNN to boot. This representation best suits debugging. There is already an immense number of steps involved just to get the data.
22. Rectification
• I eventually attributed the poor results in part to bugs. NCC also appeared to be of little help for SAD, yielding dirty disparities and KKNN results of 0-15%. For this simple version it wouldn’t do any good, nor did it cleanse the data as I required.
• Notice that the intensity varies greatly. I had an epiphany! A classmate had shared with me the idea of treating data as a binary problem when dealing with classification. A wonderfully simple idea. This would work. It would be fast. It would keep things simple.
23. Neighbor Based Ranking
• Generate a map: for each pixel, add one to the pixel’s map location for every neighbor in the surrounding window whose intensity is greater.
• Use the ranks during SAD instead of the pixel intensities.
25. Training methodology
• As in the initial attempt, I would be using KKNN.
• I would manually select regions and write the points in for processing via the main C program.
• The results were loaded into R as the deviations between four selections, similar to quartiles. Parts of the selections were compared against themselves, producing further measures of deviation. Finally, the widths of pixel ranges were used to derive slightly more about the sections.
– This is super naïve, but it’s for the sake of proving the approach viable.
26. The Second Results
• With a very, very, very small number of points, around 50 (selecting is a tedious process), KKNN was averaging 22-60% accuracy.
• Success?
– For such a small sample this is promising, though not conclusive.
– Time became a constraint. I underestimated the
work involved and had to act fast.
27. What I Gathered
• There is some validity to the idea of feature comparison through KKNN and disparity maps
• Improvements can be made…
– Increase the data set
– Select very defined cases to represent each
feature
– Re-evaluate features and improve them
– Time!
28. First Rectification Attempt
• Use KLT tracking to automatically populate features
– Bugs in the code swatted this down, given that all I could get working was edge detection
KLT tracking is a means to perform edge detection; however, it can be advanced to assist in detecting additional features.
29. Second Attempt
• Perform tedious by-hand selection
– I feared this might not produce a notable number of points for the level of differentiation in the pictures
30. Success!
• This yielded promising samples over 10,000 iterations each
• Results...
– [1] "K 13 -> 91.666667"
– Results...
– [1] "K 13 -> 91.666667"
– Results...
– [1] "K 13 -> 66.666667"
– Results...
– [1] "K 13 -> 75.000000"
– Results...
– [1] "K 13 -> 83.333333"
– Results...
– [1] "K 13 -> 83.333333"
• This became further evidence of the viability of this method