After a quick refresher on deep learning and the composition of deep neural networks, drill down into how AirBNB, GE Healthcare, and Comma AI leverage various open source machine learning frameworks to achieve their goals. With a focus on TensorFlow, we’ll investigate the development process and the decisions behind these three successful implementations of machine learning for real-world applications.
WMCPA Quarterly
3. perforce.com
• This presentation is really more about the real-world application of deep learning, but we’ll spend a little time
refreshing on just what a deep neural network is, and what it isn’t
• First and foremost, Deep Neural Networks are *not* sentient robots that are going to destroy humanity – at least, not
in their current state
• The TL;DR version is that they are complicated statistical linear algebra equations that are evaluated hundreds of thousands of times, becoming (hopefully) a little more accurate each pass as they are presented with training data
• The output of this is a mechanism that is incredibly good at recognizing complex or sophisticated patterns in data sets which are often not discernible, at least mathematically, to a human being
• You are already interacting with trained neural networks (or robots, if you’d rather) when you use your phone’s GPS,
scroll through your news feed, or browse your automatically generated photo albums
• DNNs are enriching our lives in many ways already, and, they work best when you don’t need to think about them as
programs
Deep Neural Networks
• Neural networks are virtual or mathematical reconstructions of our own brains’ neural structure
• Our real brains are composed of neurons, which are semi-binary cells that can output a certain state based on an input
• These chain together, providing input to other neurons, or processing output from them
• Over time, these connections between neurons can strengthen or weaken depending on how often they are utilized,
which will change the way they process output
• By retaining this structure, our brains “learn” new patterns by physically altering the behavior of the individual
neurons
• Artificial neural networks, then, attempt to recreate this structure using linear algebra
• The base unit of this is a neuron, which is really just an algebraic function, called an “activation” function, that returns a state, often a 0 or a 1, but not always
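The neuron idea can be made concrete with a minimal sketch in plain Python (illustrative only, not any particular framework’s API): a weighted sum of the inputs plus a bias, squashed by an activation function such as a sigmoid.

```python
import math

def neuron(inputs, weights, bias):
    """A single artificial neuron: weighted sum of inputs plus a bias,
    squashed by a sigmoid activation into a state between 0 and 1."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid activation function
```

For example, `neuron([1.0, 0.5], [0.4, -0.6], 0.1)` returns a value strictly between 0 and 1; large positive weighted sums push the state toward 1, large negative ones toward 0.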
Neurons
• When a neuron is provided a means of getting input and providing output, it becomes a “node”
• Lots of nodes pair together to form a “layer” of a network, and data is passed through multiple layers with an
associated set of weights and biases
• These layers chain together to form what we’d call a neural network, the simplest type just consisting of an input
layer, a single “hidden” layer, and an output layer
• The “magic” of deep learning lies in these hidden layers, which I like to equate to sieves with holes that vary in size
and shape
• As we pass sand through the sieves, objects that don’t fit through a given sieve’s holes are captured
• The pattern recognition happens mathematically almost the same way
• By starting with random weights and biases, passing real data through the layers, and comparing the output layer to the actual, real-world labels, we can see how “distant” those random weights got us, and then reduce that distance, or “cost”, by repeatedly adjusting the weights against more real data, similar to how our neurons form stronger connections
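The layer and cost ideas above can be sketched in a few lines of plain Python (a toy illustration of the math, not production code): each node in a layer applies its own weights and bias to the same inputs, and a squared-distance “cost” compares the output layer to the labeled answer.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One layer: each node applies its own weight row and bias to the inputs."""
    return [sigmoid(sum(x * w for x, w in zip(inputs, row)) + b)
            for row, b in zip(weights, biases)]

def cost(outputs, targets):
    """Squared distance between the network's output and the labeled data."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets))

# Input layer -> one hidden layer -> output layer (hypothetical weights).
hidden = layer([1.0, 0.0], [[0.5, 0.5], [-0.5, 0.5]], [0.0, 0.0])
output = layer(hidden, [[1.0, -1.0]], [0.0])
```

Training amounts to adjusting the weight rows and biases until `cost(output, targets)` is as small as possible across the whole labeled data set.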
Layers and layers…
• A simple neural network with one hidden layer
• When a neural network has a lot of hidden layers, we call it a deep neural network (DNN)
• DNNs are suitable for deep learning
Deep Neural Networks
https://adventuresinmachinelearning.com/neural-networks-tutorial
• Of course, entire college courses are offered on this subject; this barely scrapes the surface
• Many types of neural networks exist which are much more complicated than the one introduced here
• We’ll talk about a network, for instance, that consists of 50 disparate hidden layers, and it’s not even the most
complicated use case!
• If you come away understanding the sieve analogy, then just a few more tidbits will ensure you can keep up with the
rest of the presentation:
• The goal of training a neural network is to reduce the “cost”, or distance, between the output generated by the hidden layers’ retained weights and biases and the labeled training data
• Once we have a trained model, we can just pass similar input that we want to analyze into the model and get accurate output
• For instance, we could pass a handwritten number into a properly trained model, and get the actual digit as output
• Many open source software frameworks exist which can assist in creating and training these models
• The biggest challenge to AI is acquiring labeled training data, which is why companies like Google and Facebook make
astronomical amounts of money selling this data to train other robots, and why data is now “more valuable than gold”
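The “reduce the cost” goal can be sketched as a tiny gradient-descent loop. This toy example (pure Python, no framework, with a made-up learning rate and deterministic zero starting weights rather than random ones) trains a single sigmoid neuron to act as an OR gate by nudging its weights after every labeled example:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Tiny labeled training set: inputs -> expected output (an OR gate).
data = [([0.0, 0.0], 0.0), ([1.0, 1.0], 1.0),
        ([1.0, 0.0], 1.0), ([0.0, 1.0], 1.0)]

w, b = [0.0, 0.0], 0.0  # initial weights and bias (real training starts random)
rate = 1.0              # learning rate: how big each nudge is

for epoch in range(2000):
    for x, target in data:
        out = sigmoid(x[0] * w[0] + x[1] * w[1] + b)
        err = out - target                 # how "distant" the output is
        grad = err * out * (1.0 - out)     # chain rule through the sigmoid
        w = [wi - rate * grad * xi for wi, xi in zip(w, x)]
        b -= rate * grad
```

After training, the neuron’s output rounds to the correct OR result for all four inputs; frameworks like TensorFlow automate exactly this loop (and its gradients) for networks with millions of weights.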
There’s a lot more to the story…
• An open source deep learning framework created at Google
• Greatly simplifies the creation of models that use deep learning
networks
• Comes with many useful algorithms already built in
• Programming centers around first defining a deep network and its
layers, and then training the model continuously
• Things like back-propagation, cost optimization, and activation are
handled for you modularly
• Developers need only think about the kind of neural network they
want to build, and the characteristics of the layers
• This has empowered businesses to perform highly sophisticated
AI/ML functions with base AI/ML skillsets
What’s TensorFlow?
AirBNB
T H E C H A L L E N G E
• Hundreds of thousands of user-supplied photos are made available to
AirBNB consumers
• AirBNB had no way to classify these photos or ensure their accuracy
and/or relevance to the property in question
• So, a property owner could, for instance, upload 20 random pictures of
the property
• A consumer would have no way of knowing for sure what rooms in the
property the pictures are of
• Typically, a third-party company would be hired to classify the images, but the sheer number of images makes this uneconomical
AirBNB
S O L U T I O N S T A C K
• AirBNB’s classification problem was close, but not identical, to classic
image classification problems
• Rather than identify a specific object in an image, AirBNB had to classify
the type of room that’s represented in the photo
• They adapted a classic Deep Neural Network architecture, called a Residual Network, starting from a published pre-trained model called ResNet50
• This is a pre-trained image classification network, 50 layers deep, which
is used to classify real-world objects
• By altering a few layers of the network and retraining the model using
their own labeled data, AirBNB was able to create a new model which
could classify room pictures with reasonable accuracy
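This retraining approach can be illustrated in miniature. The sketch below is pure Python with made-up numbers, not AirBNB’s code or anything like ResNet50: a stand-in “pre-trained” layer is frozen, and only a new, replaced output head is trained on the new labels.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def frozen_features(x):
    """Stand-in for the pre-trained layers (think ResNet50 minus its final
    layers). These weights are fixed and never updated during retraining."""
    W = [[0.7, -0.3], [-0.2, 0.9]]
    return [sigmoid(sum(xi * wi for xi, wi in zip(x, row))) for row in W]

# New labeled data for the new task (hypothetical toy examples).
data = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0)]

head_w, head_b = [0.0, 0.0], 0.0  # only this replaced head gets trained
for _ in range(5000):
    for x, target in data:
        feats = frozen_features(x)
        out = sigmoid(sum(f * w for f, w in zip(feats, head_w)) + head_b)
        grad = (out - target) * out * (1.0 - out)
        head_w = [w - grad * f for w, f in zip(head_w, feats)]
        head_b -= grad

def predict(x):
    feats = frozen_features(x)
    return sigmoid(sum(f * w for f, w in zip(feats, head_w)) + head_b)
```

Because the frozen layers already extract useful features, only a small head needs retraining, which is why this “transfer learning” approach needs far less labeled data than training from scratch.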
AirBNB
• One of the first challenges encountered was simply sourcing enough relevant and accurately
labeled data
• The data from ImageNet is not really applicable to this problem, though it would help with
object recognition, which we will discuss momentarily
• AirBNB paid a third party to manually label a smaller set of data to create a “golden set”, and then used randomly selected AirBNB user-labeled data to train the model against the golden set
• After filtering poorly labeled data, such as “kitchen entrance” vs just “kitchen”, the team’s hybrid
approach was able to scale against the user-provided training data
https://medium.com/airbnb-engineering/categorizing-listing-photos-at-airbnb-f9483f3ab7e3
AirBNB
• Raw computational power also became an
issue
• This is where TensorFlow really helped,
running as the backend learning framework
• TensorFlow can easily assign training tasks to
specific GPU threads
• In this case, 8 separate GPUs were used,
running on Amazon EC2 xlarge instances
• According to the dev team, the most accurate
model was obtained after retraining pre-
weighted data over three epochs, lasting a
total of six hours
• In the end, the accuracy varied a bit between room types; for instance, bedrooms ended up being about 3% “easier” to classify than living rooms, but even living rooms were classified with an accuracy of 92%
AirBNB
• The production implementation of this trained model enables the AirBNB site to allow things like
virtual tours grouped by room
• The training feedback is scalable given the hybrid solution of using a golden set as well as user-
provided image data
• TensorFlow proved valuable for providing a scalable means of retraining a ResNet50 Deep Neural Network
• Some accidental innovations were triggered as well!
• For instance, object detection in images allows AirBNB to evaluate the kind of amenities that
may be included on the property
• And unsupervised training of the data led to the accidental discovery that it was very easy to
classify images as indoor vs. outdoor scenes
• Attempting this classification by hand, or solely relying on user-supplied data would be
impossible, or at least wildly inaccurate or prohibitively expensive, but TensorFlow and Deep
Neural Networks made it cheap and repeatable!
GE Healthcare
T H E C H A L L E N G E
• People’s bodies are different, and so when MRI techs want to get the
best picture of a location in a body, they have to do a lot of work up
front
• Just orienting the scan properly requires an operator to take images up
front, called localizer images, to anchor the final scans
• Human error can factor in during this orientation, as the anchors must
be captured and selected manually
• Failed operation can cause useless images to be captured and scans to
be repeated
• This problem is compounded when scanning complex areas of the body
such as the human brain
GE Healthcare
S O L U T I O N S T A C K
• What GE Healthcare really needed then, was a way to automate the
process of determining accurate scan positions from the localizer
images
• They accomplished this by training on localizer images captured to look
at specific landmarks in the brain
• With a trained set, localizer images can be input, and the neural
network can report on how useful those images will be for targeting the
specific landmark
• That way, “bad” images can be recaptured instead of being used for the
actual scan
• TensorFlow with the high-level Keras interface was used to create 2D and 3D Cascaded Neural Networks, which are among the most commonly used networks for medical image analysis
GE Healthcare
• Keras is an API layer included in TensorFlow for building and training DL models
• By providing APIs to create common learning models such as CNNs, developers can focus only on
customizing their layers
• GE was springboarded by having just the right kind of DNNs already premade and accessible via the Keras API
• Layers can be added with a simple API call, like:
from tensorflow.keras import layers, models
model = models.Sequential()
model.add(layers.Dense(32, activation='softmax'))
model.add(layers.Dense(64, activation='sigmoid'))
• Many types of DNNs are available for a wide variety of learning use cases
GE Healthcare
• The process itself is broken down into three basic mechanisms:
• Quality Analysis – Determine based on the trained set whether the localizer images
captured by the MRI tech will be useful to provide accurate imaging
• Anatomical Landmark Identification – Provide feedback about which areas of the brain can
be scanned based on the provided localizer images
• Provide Orientation Details – Give the details to the tech as to how to properly align the
scan for the best possible pictures based on the localizer images
• TensorFlow’s built in features were used to assist at all three steps of the analysis, from analyzing
the image sets to providing visual feedback to the scan tech
• The entire process evaluates in about three and a half seconds on a decently powered CPU
• Highly varied images were used in the training epochs, drawn from a data set of only 29,000 images
• Accuracy of 99.2% was achieved using this data set
• This product is now trademarked by GE under the AIRx product name, and currently cuts the
time needed to perform a scan nearly in half while greatly improving accuracy
GE Healthcare
In their blog, “Intelligent Scanning using Deep Learning for MRI”, the GE developers list the following
benefits as the reasons they chose TensorFlow for their platform:
• Support for 2D and 3D Cascaded Neural Networks (CNN) which is the primary requirement for medical
image volume processing
• Extensive built-in library functions for image manipulation and optimized tensor computations
• Extensive open-source user and developer community which supported latest algorithm
implementations and made them readily available
• Continuous development with backward compatibility making it easier for code development and
maintenance
• Stability of graph computations made it attractive for product deployment
• Keras interface was available which significantly reduced the development time: This helped in
generating and evaluating different models based on hyper-parameter tuning and determining the
most accurate model for deployment.
• Deployment was done using a TensorFlow Serving CPU based docker container and RestAPI calls to
process the localizer once it is acquired.
https://blog.tensorflow.org/2019/03/intelligent-scanning-using-deep-learning.html
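The deployment bullet quoted above, TensorFlow Serving behind REST calls, can be sketched as a small client. This is hypothetical: the model name “airx” and the localizer feature values are made up, though the `/v1/models/<name>:predict` path and the `instances` payload are TensorFlow Serving’s standard REST API. The request is built but not actually sent here:

```python
import json
from urllib import request

# Hypothetical TF Serving endpoint; "airx" is an assumed model name.
url = "http://localhost:8501/v1/models/airx:predict"

# TF Serving's REST API expects a JSON body with an "instances" list.
payload = json.dumps({"instances": [[0.1, 0.2, 0.3]]}).encode()
req = request.Request(url, data=payload,
                      headers={"Content-Type": "application/json"})
# request.urlopen(req) would POST the localizer data and return predictions.
```

Packaging the model behind a Docker container with this kind of REST interface is what let GE score localizer images from the scanner workflow without embedding TensorFlow in the scanner software itself.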
Comma AI
T H E C H A L L E N G E
• Though not a TensorFlow example,
Comma AI’s flagship product, openpilot, is
a fully open source autonomous driving
platform built on Python
• The project was launched by its sole
inventor, George Hotz, who had already
become famous for being the first person
to successfully hack the iPhone
• Using nothing but a lidar mount and roof
cameras, Hotz was able to use openpilot
to turn an Acura ILX into a fully self-
driving car
Comma AI
S O L U T I O N S T A C K
• The source code for openpilot is available on the project’s
GitHub:
• https://github.com/commaai/openpilot
• The project itself is written primarily in Python and
distributed under the MIT License
• A number of additional open source components were used
to create openpilot, including libraries for the mobile interface
and communication between the software and the vehicle
• What is really groundbreaking about Hotz’s work, though, isn’t so much the stack, it’s how Hotz trained the model…
Comma AI
• Traditional approaches to vehicle autonomy focus on recognizing situations and reacting to
them appropriately
• For instance, a car can already be trained via object recognition to recognize other cars, lane
indicators, speed bumps, etc
• Logically, then, if cars can accurately recognize situations and respond to those situations, they
can mimic the behavior of a human driver
• However, getting this accuracy and accounting for all possible scenarios is very difficult, which
is the primary obstacle for autonomous vehicle development
• Rather than take this approach, which suffers from the problems of supervised training, Hotz found a way to train the model to drive in an unsupervised way
• He just showed the model hours and hours of dashcam footage from real drivers and told the
model to repeat the behavior!!
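This “repeat the behavior” idea is often called behavioral cloning: the recorded human’s actions become the training labels. A toy, one-dimensional sketch (made-up numbers, nothing like openpilot’s actual model) shows the principle:

```python
# Each "frame" is reduced to one feature (e.g. lane offset), and the label
# is the steering the human driver actually applied at that moment.
frames = [-1.0, 0.0, 1.0]      # hypothetical lane-offset readings
steering = [0.5, 0.0, -0.5]    # hypothetical recorded human steering

# Fit steering ~ w * offset by one-dimensional least squares.
w = sum(f * s for f, s in zip(frames, steering)) / sum(f * f for f in frames)

def predict(offset):
    """Mimic the human driver for a new frame."""
    return w * offset
```

The fitted model steers against the lane offset (here `w` comes out to -0.5), which is exactly the pattern the human demonstrated; no one ever had to enumerate rules about lanes, cars, or speed bumps.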
Comma AI
• The fact that this works, and works well for that matter, really demonstrates the incredible
pattern-recognition capabilities that DNNs provide for us
• It also demonstrates that human ingenuity will always be a part of this process – Hotz’s intellectual leap from imperative programming to residual learning through captured video *greatly* simplifies the development effort
• It cost Hotz a total of $50,000 to build his self-driving car, with $30,000 of that going to the car
itself
• Right now, you can purchase an EON DevKit for $599 from the comma.ai website
• This kit allows you to run the openpilot software on a wide range of vehicles, with a
compatibility guide listed on the company website
• A few things are clear: AI and ML are already powerful forces in our daily lives, and they are becoming ever more
powerful as we find new applications and better ways to train
• Where we will go from here will largely be supported by open source communities, as collaboration and sharing of
algorithms and techniques has proven to be the fastest path to growth in this area
• At the same time, though, only a handful of very large corporations have access to the data sets necessary to train
models for our next-generation problems
• And an even smaller number of those companies curate and grow the datasets, which they sell for astronomical
amounts of money
• As software professionals and/or enthusiasts, we have a responsibility to learn about these concepts and share in
their evolution, to ensure fair and ethical use of these powerful technologies as they continue to enrich our lives and
experience
A few more thoughts…