Renee Yao from NVIDIA gave a presentation on using generative adversarial networks (GANs) to generate synthetic data. She discussed how GANs work by having two neural networks, a generator and a discriminator, compete against each other. She then provided several examples of real-world applications of GANs, including generating images, video, and medical data. She concluded by discussing NVIDIA's DGX systems for powering large-scale deep learning and GAN projects.
Accelerate AI w/ Synthetic Data using GANs
1. Renee Yao | NVIDIA Senior Product Marketing Manager, Deep Learning and Analytics
September 2018 | New York | Strata Data Conference
ACCELERATE AI W/ SYNTHETIC DATA USING GANS
2. AGENDA
• The Rise of GPU Computing
• What Are GANs?
• Use Cases
• Real World Examples
• DGX Systems and GPUs
• Wrap-up
3. GPU COMPUTING AT THE HEART OF AI
New Advancements Leapfrog Moore’s Law
[Charts: “Performance Beyond Moore’s Law” and “Big Bang of Modern AI”]
4. DEEP LEARNING IS SWEEPING ACROSS INDUSTRIES
➢ Internet Services: image/video classification, speech recognition, natural language processing
➢ Medicine: cancer cell detection, diabetic grading, drug discovery
➢ Media & Entertainment: video captioning, content-based search, real-time translation
➢ Security & Defense: face recognition, video surveillance, cyber security
➢ Autonomous Machines: pedestrian detection, lane tracking, traffic sign recognition
7. “(GANs), and the variations that are now being proposed is the most interesting idea in the last 10 years in ML, in my opinion.”
-- Yann LeCun, a prominent figure in the deep learning field
8-9. WHAT ARE GANS?
Generator - Goal: produce counterfeit money that is as similar as possible to real money.
Discriminator - Goal: distinguish between real and counterfeit money.
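To make the counterfeiter analogy concrete, here is a minimal GAN sketch in PyTorch (an illustration, not code from the talk): a generator learns to mimic a one-dimensional Gaussian "real money" distribution while a discriminator learns to flag counterfeits. All names and hyperparameters here are assumptions.

```python
# Minimal, hypothetical GAN sketch: learn to mimic samples from N(4, 1.25).
import torch
import torch.nn as nn

torch.manual_seed(0)
latent_dim = 8

# Generator: maps noise to a 1-D sample ("counterfeit money").
G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, 1))
# Discriminator: outputs the probability that a sample is real.
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = 4.0 + 1.25 * torch.randn(64, 1)   # "real money": N(4, 1.25)
    fake = G(torch.randn(64, latent_dim))    # counterfeit samples

    # Discriminator step: push real toward 1, fake toward 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: try to make the discriminator call fakes real.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

# After training, the generator's samples should center near 4.0.
print(G(torch.randn(1000, latent_dim)).mean().item())
```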
31. IMPROVING PASSENGER SAFETY
With the Power of Deep Learning
[Figure: input image alongside the most similar images retrieved]
On-demand webinar: https://webinars.on24.com/NVIDIA/stationwebinarseries
33. DL IMPROVES (UNDERSAMPLED) MRI RECONSTRUCTION
Train the cost function: a discriminator for image quality.
Train the reconstruction: a generator mapping the undersampled input to a restored image.
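The combined objective the slide describes can be sketched as an adversarial term plus a pixel-wise fidelity term. The code below is a hypothetical illustration; the `generator`, `discriminator`, and `adv_weight` names are assumptions, not taken from the published GANCS work.

```python
# Hypothetical sketch of a GAN + L1 reconstruction objective.
import torch
import torch.nn.functional as F

def reconstruction_loss(generator, discriminator, undersampled, fully_sampled,
                        adv_weight=0.01):
    restored = generator(undersampled)

    # Adversarial term: make the discriminator score the restoration as real.
    score = discriminator(restored)
    adv = F.binary_cross_entropy(score, torch.ones_like(score))

    # Pixel-wise L1 term keeps the restoration faithful to the ground truth.
    pixel = F.l1_loss(restored, fully_sampled)

    return pixel + adv_weight * adv
```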
34. DL IMPROVES (UNDERSAMPLED) MRI RECONSTRUCTION
[Figure: reconstruction comparison of zero-padding, CS-wavelet, MSE, L1, GAN+L1/L2, and GAN+L1/L2+DC against the ground truth]
Related work is published in IEEE Transactions on Medical Imaging (https://ieeexplore.ieee.org/document/8417964/) and accepted at NIPS 2018.
37. MRI GAN PROJECT
Deep Learning holds enormous promise to advance medical discoveries, but adequate training data can be a challenge. Scientists at the MGH & BWH Center for Clinical Data Science are using the NVIDIA DGX Station to power GANs that create and validate synthetic brain MRI images. Combining the manufactured images with real MRI images enables the team to train its neural network with 75% less data.
75% Less Data Needed
38. PURPOSE-BUILT AI SUPERCOMPUTERS
AI WORKSTATION | AI DATA CENTER
NGC DL SOFTWARE STACK: universal SW for deep learning, predictable execution across platforms, pervasive reach
DGX-1: the essential instrument for AI research
DGX Station: the personal AI supercomputer
DGX-2: the world’s most powerful AI system for the most complex AI challenges
39. FROM DEVELOPMENT TO DEPLOYMENT
Deep Learning, HPC, and HPC Visualization with NVIDIA GPU Cloud containers
Desktop: TITAN, Quadro | Workstation: DGX Station | Server: DGX-1, DGX-2 | Cloud: NGC on CSPs
40. NVIDIA DEEP LEARNING INSTITUTE
Training organizations and individuals to solve challenging problems using Deep Learning
On-site workshops and online courses presented by certified experts
Covering complete workflows for proven application use cases: image classification, object detection, natural language processing, recommendation systems, and more
Hands-on training for data scientists and software engineers
www.nvidia.com/dli
41. ADDITIONAL RESOURCES
DGX Systems: the world’s fastest AI supercomputers. nvidia.com/DGX
NVIDIA GPU Cloud: to learn more about all of the GPU-accelerated software from NGC, visit nvidia.com/cloud; to sign up or explore NGC, visit ngc.nvidia.com
Inception: access to NVIDIA tech, GPU and AI experts, global marketing/sales, and GPU venture introductions. nvidia.com/inception
DGX Station Webinars: to learn more about how SBB is using DGX Stations for their GAN solutions, visit https://webinars.on24.com/NVIDIA/stationwebinarseries
It turns out that deep learning is stunningly effective across many domains, and it’s transforming the way computers achieve perceptual tasks such as computer vision, pattern detection, speech recognition, and behavior prediction. Some people, including Bloomberg and the World Economic Forum, have even started referring to it as the 4th industrial revolution.
References:
https://www.bloomberg.com/news/articles/2016-05-20/forward-thinking-robots-and-ai-spur-the-fourth-industrial-revolution
https://www.weforum.org/agenda/2016/01/the-fourth-industrial-revolution-what-it-means-and-how-to-respond/
A few examples:
Facebook’s “DeepFace” feature creates a 3D model of your face from your online photos, adjusts for lighting and facial expressions, and identifies you in photos with 97% accuracy – all using deep learning. You may have experienced this when Facebook alerts you that a new picture of you has been posted and gives you the option to blur out your image.
References:
http://www.inquisitr.com/1825367/facebook-deepface-ai-learning-your-face-in-every-uploaded-photo
http://news.sciencemag.org/social-sciences/2015/02/facebook-will-soon-be-able-id-you-any-photo
Microsoft’s Skype Translator performs instant translation of conversations in over 40 languages, using Deep Learning to automatically transform your spoken words into text that can then be analyzed using standard translation methods.
References:
http://www.technologyreview.com/news/534101/something-lost-in-skype-translation/
http://www.wired.com/2014/05/microsoft-skype-translate/
Other examples include medical researchers detecting genes associated with autism spectrum disorder, neuroscientists detecting and suppressing the brainwave patterns responsible for epileptic seizures, and deep learning being used to identify skin cancers, classify lung sounds, and accelerate computational drug design, saving millions in research.
Imagine a day in the not-too-distant future when a service like Facebook puts this all together and notifies you that you may have early-stage skin cancer so you can consult with your doctor and get life-saving treatment.
===========================
Story: Understanding Video
Clarifai offers a service that rapidly analyzes images and video clips to recognize 10,000 different objects or types of scenes
This capability can be used for extremely targeted advertising, for example:
Showing a Starbucks ad whenever coffee appears in a video
Rapidly scanning security footage
Searching through your personal video archive for your child’s first steps
References:
http://www.technologyreview.com/news/534631/a-startups-neural-network-can-understand-video/
Short review of DL
---
Once the model has been trained, much of the generalized flexibility that was necessary during the training process is no longer needed, so it’s possible to optimize the model for significantly faster runtime performance.
[Story: How Cooper learned to classify cars, “tucks”, buses, motorcycles, etc.]
Common optimizations include fusing layers to reduce memory and communication overhead, pruning nodes that do not contribute significantly to the results, and other techniques supported in the NVIDIA TensorRT runtime.
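As a toy illustration of the pruning idea (a generic sketch, not how TensorRT prunes internally), magnitude pruning zeroes out the weights that contribute least to the result:

```python
# Generic magnitude-pruning sketch (illustrative only).
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    """Zero out the `sparsity` fraction of weights with the smallest magnitudes."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight * (weight.abs() > threshold)

w = torch.randn(256, 256)
print((magnitude_prune(w, 0.9) == 0).float().mean())  # roughly 0.9 sparsity
```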
The fully trained and optimized model is then ready to be integrated into an application that will feed it new data (in this case, images of dogs and cats it hasn’t seen before), and it will be able to quickly and accurately infer the correct answer based on its training. Your application can be deployed on a GPU-accelerated platform in your datacenter, in the cloud, on a local workstation, in a robot, a smart camera, or even a self-driving car.
The generator network takes a random input and tries to generate a sample of data. The generator G(z) takes an input z from p(z), where z is a sample from the probability distribution p(z). It then generates data that is fed into a discriminator network D(x). The task of the discriminator network is to take input either from the real data or from the generator and try to predict whether the input is real or generated. It takes an input x from pdata(x), where pdata(x) is our real data distribution. D(x) then solves a binary classification problem using a sigmoid function, giving output in the range 0 to 1.
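In symbols, with G, D, p(z), and pdata(x) as defined above, the two networks play the standard minimax game from the GAN literature:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p(z)}\big[\log\big(1 - D(G(z))\big)\big]
```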
Synthetic data will drive the next wave of deployment and application of deep learning in the real world across a variety of problems involving speech recognition, image classification, object recognition, and language. All industries and companies will benefit, as synthetic data can create conditions through simulation instead of authentic situations: virtual worlds let you avoid the cost of damage, spare human injuries, and sidestep other factors that come into play, while offering an unparalleled ability to test products, and interactions with them, in any environment.
Represent and manipulate high-dimensional probability distributions: High-dimensional probability distributions are important objects in a wide variety of applied math and engineering domains. Training and sampling from generative models is an excellent test of our ability to represent and manipulate high-dimensional probability distributions.
Reinforcement learning: Generative models can be incorporated into reinforcement learning in several ways. Generative models of time-series data can be used to simulate possible futures. Such models could be used for planning in reinforcement learning in a variety of ways.
Missing data and semi-supervised learning: Generative models can be trained with missing data and can provide predictions on inputs that are missing data. One particularly interesting case of missing data is semi-supervised learning, in which the labels for many or even most training examples are missing.
Working with multi-modal outputs: Generative models, and GANs in particular, enable machine learning to work with multi-modal outputs. For many tasks, a single input may correspond to many different correct answers, each of which is acceptable.
Data generation: Finally, many tasks intrinsically require realistic generation of samples from some distribution.
https://www.analyticsvidhya.com/blog/2017/06/introductory-generative-adversarial-networks-gans/
Problem with Counting: GANs fail to differentiate how many of a particular object should occur at a location. As we can see below, the network generates more eyes in the head than are naturally present.
Problems with Perspective: GANs fail to adapt to 3D objects. They don’t understand perspective, i.e., the difference between front view and back view. As we can see below, they give a flat (2D) representation of 3D objects.
Problems with Global Structures: As with perspective, GANs do not understand holistic structure. For example, in the bottom-left image, the network generates a quadruple cow, i.e., a cow standing on its hind legs and simultaneously on all four legs. That is definitely not possible in real life!
Jan 2016 | Predicting the next frame in a video: train a GAN on video sequences and let it predict what would occur next.
Draw simple strokes and let the GAN draw an impressive picture for you!
June 2016 | Text-to-image generation: just tell your GAN what you want to see and get a realistic photo of the target.
May 2017 | Increasing the resolution of an image: generate a high-resolution photo from a comparatively low-resolution one.
22 Nov 2017 | Generating an image from another image: for example, given labels of a street scene on the left, you can generate a realistic-looking photo with a GAN; given a simple drawing of a handbag on the right, you get a realistic-looking image of a handbag.
NVIDIA researchers grabbed headlines last fall for generating photos of believable yet imaginary celebrity faces with deep learning. They’ll discuss how they did it on stage next week at the International Conference on Learning Representations, better known as ICLR.
That research team is one of five from NVIDIA Research sharing their work to advance deep learning at ICLR, April 30-May 3 in Vancouver. Our 200-person-strong NVIDIA Research team, which works from 11 worldwide locations, focuses on pushing the boundaries of technology in machine learning, computer vision, self-driving cars, robotics, graphics and other areas.
Also at ICLR: The NVIDIA Deep Learning Institute will offer free online training — and the chance to win an NVIDIA TITAN V. (More information below.) And, if you missed our GPU Technology Conference, you can see our latest innovations at our booth.
ICLR, in its sixth year, is focused on the latest deep learning techniques. NVIDIA is a sponsor of the conference.
More Than a Pretty Face
In the face-generating research, a team at our Finland lab developed a method of training generative adversarial networks (GANs) that produced better results than existing techniques. The researchers demonstrated their success by applying it to the difficult problem of generating realistic-looking human faces.
“Human looks have been somewhat sacred. It’s extremely difficult to create believable-looking digital characters in, say, movies without using real-life actors as reference,” said Tero Karras, lead author on the ICLR paper. “With deep learning and this paper, we’re getting closer.”
The neural network generates new human faces by mixing characteristics like gender, hairstyle, and face shape in different ways. The video below shows the result of varying these characteristics at random, demonstrating an endless number of possible combinations.
The research paves the way for game developers to more quickly and easily create digital people who look like the real thing, Karras said. He’s also heard from a team that’s looking to apply the research to help people with prosopagnosia, or face blindness, a neural disorder in which sufferers can’t recognize faces.
The researchers will discuss the paper, “Progressive Growing of GANs for Improved Quality, Stability, and Variation,” on Monday morning at ICLR, explaining how they achieved such good results and the reasoning behind their techniques.
Poster Sessions
Check out our poster sessions at ICLR:
“Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training” – Training a neural network across many machines is an efficient way to train deeper and larger models, but it requires an expensive high-bandwidth communications network. This research makes it possible to train models faster across more GPUs, but on an inexpensive commodity Ethernet network.
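The core idea can be sketched in a few lines (a simplification; the paper adds momentum correction and local gradient accumulation on top of this): transmit only the largest-magnitude gradient entries each step.

```python
# Simplified top-k gradient sparsification sketch (not the full method).
import torch

def sparsify_topk(grad: torch.Tensor, ratio: float = 0.001):
    """Keep only the top `ratio` fraction of gradient entries by magnitude."""
    flat = grad.flatten()
    k = max(1, int(flat.numel() * ratio))
    _, indices = flat.abs().topk(k)
    # Transmit (indices, values) instead of the dense gradient tensor.
    return indices, flat[indices]

g = torch.randn(1_000_000)
idx, vals = sparsify_topk(g)
print(idx.numel(), "of", g.numel(), "entries transmitted")
```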
“Sparse Persistent RNNs: Squeezing Large Recurrent Networks On-Chip” – Engineers combined a method for more efficiently running recurrent neural networks on GPUs with a technique known as model pruning, which reduces the complexity of a neural network. The result dramatically speeds up models and makes it possible to deploy far larger neural networks on each GPU.
“Efficient Sparse-Winograd Convolutional Neural Networks” – Convolutional neural networks are computationally intensive, which makes it difficult to deploy them on mobile devices. Researchers made changes to what’s known as the Winograd-based convolution algorithm — a method used to reduce the number of computations needed to process CNNs — so that it would work with network pruning, a way to reduce parameters and increase speed in a neural network. In addition to shrinking computational workloads, combining the two methods allowed researchers to prune networks more aggressively.
“Mixed Precision Training” – Mixed precision training takes advantage of NVIDIA Volta Tensor Cores to speed up training and reduce memory requirements, which enables training larger models. Techniques described in the paper lead to model accuracy that matches single-precision results, without changing any hyperparameters.
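Modern PyTorch exposes this technique as automatic mixed precision; the sketch below uses PyTorch’s AMP API to illustrate the recipe (an FP16 forward pass plus loss scaling to preserve small gradients) and is not code from the paper. It requires a CUDA GPU.

```python
# Mixed precision training sketch using PyTorch AMP (illustrative).
import torch

model = torch.nn.Linear(1024, 1024).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()   # loss scaling preserves tiny gradients

for _ in range(10):
    x = torch.randn(64, 1024, device="cuda")
    with torch.cuda.amp.autocast():    # forward pass runs in FP16 where safe
        loss = model(x).square().mean()
    opt.zero_grad()
    scaler.scale(loss).backward()      # backward on the scaled loss
    scaler.step(opt)                   # unscales gradients, then steps
    scaler.update()
```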
Get Free Training, Win a TITAN V
The NVIDIA Deep Learning Institute (DLI) will offer free online training exclusively to ICLR attendees — and the chance to win an NVIDIA TITAN V. Stop by the NVIDIA booth to pick up a token card for free access. The first 200 attendees to take an online course will receive $100 in online training credits. See contest rules. And swing by on Tuesday and Wednesday between 4-5 p.m. to meet our DLI University Ambassadors.
Also at our booth: get hands-on with some of our latest technology, talk with deep learning experts, or meet our hiring team.
Wardrobes could sharpen up for online shoppers everywhere, while retailers cut costs, thanks to AI that puts clothes on fashion models in a virtual photo shoot.
A fully staffed photo fashion shoot can run up to $500 per outfit. Given the thousands of potential looks, many online retailers display their garments without models. Silicon Valley-based Vue.ai thinks it has a fix where clothes could still be showcased by a stylish model, but without the high overhead.
“It’s well known that consumers prefer to see clothes presented on models, but retailers face some really high costs, both in terms of money and in terms of time,” Costa Colbert, chief scientist at Vue.ai, and its parent company MAD Street Den, said at the GPU Technology Conference last month in San Jose.
Vue.ai uses its image and video recognition technologies to turn images of garments into new ones that show the garments worn by models. The generated images provide a more helpful visual for consumers, while saving retailers money.
“We can help reduce those costs, but also provide something more appealing and personalized for the consumer,” Colbert said.
GANs for Garments
[Image: Vue.ai showcases a blue dress]
Using a machine learning approach called conditional generative adversarial networks, or cGANs, Vue.ai’s tech simultaneously learns to generate images and to distinguish these from real photographic images.
Running on several NVIDIA GPUs, the network learns by observing many pairs of images — one of the garment, another of how that garment looks on a fashion model.
This training reaches a point where the computational critic can’t tell generated and real images apart, and it’s hard for humans to tell the difference either.
Eventually the cGAN learns about clothes — what a long sleeve or an off-the-shoulder collar should look like when worn. It also becomes able to produce “in-between” features, for example, poses or skin colors, which can be controlled by manipulating variables within the network.
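A hypothetical sketch of that conditioning mechanism (not Vue.ai’s actual model): the generator receives random noise concatenated with a condition vector, such as a garment encoding, and the discriminator judges image/condition pairs. All dimensions and layer choices below are assumptions.

```python
# Hypothetical cGAN conditioning sketch (not Vue.ai's model).
import torch
import torch.nn as nn

latent_dim, cond_dim, img_dim = 64, 32, 784

# Generator sees noise plus the condition (e.g. a garment encoding).
generator = nn.Sequential(
    nn.Linear(latent_dim + cond_dim, 256), nn.ReLU(),
    nn.Linear(256, img_dim), nn.Tanh(),
)
# Discriminator judges the image/condition pair together.
discriminator = nn.Sequential(
    nn.Linear(img_dim + cond_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

z = torch.randn(8, latent_dim)   # random noise
c = torch.randn(8, cond_dim)     # condition vector (garment embedding)
fake = generator(torch.cat([z, c], dim=1))
score = discriminator(torch.cat([fake, c], dim=1))  # real/fake given condition
```

Varying z while holding c fixed changes "in-between" features such as pose, while the condition pins down the garment, which matches the controllability described above.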
“With no real-life models involved, there’s no limit on details such as skin color or body type. Images can be generated on the fly and customized any number of ways for each consumer,” Colbert said. “To the casual observer, the images are just another photograph.”
For apparel makers, retailers and consumers alike, the technology can improve connections with the brand and create a smoother shopping experience, he said.
Anomaly Detection
Next step: clustering of railway components. An anomaly is a component that does not fit into any cluster.
Clustering with generative adversarial networks (GANs): the generator and discriminator learn the underlying structure of the data, which enables a search for similar images (clustering), as sketched in the example below.
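One plausible implementation of that similarity search (an assumption on our part; SBB’s actual pipeline is covered in the webinar linked above): use the trained discriminator, minus its final classification head, as a feature extractor and rank images by distance in feature space.

```python
# Assumed similar-image search via discriminator features (illustrative).
import torch

def most_similar(feature_extractor, query, gallery, k=5):
    """Rank gallery images by feature-space distance to the query image."""
    with torch.no_grad():
        q = feature_extractor(query.unsqueeze(0))   # (1, d) query embedding
        g = feature_extractor(gallery)              # (n, d) gallery embeddings
    dists = torch.cdist(q, g).squeeze(0)            # Euclidean distances, (n,)
    return dists.topk(k, largest=False).indices     # k closest gallery indices
```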
https://blogs.nvidia.com/blog/2018/04/18/ai-for-dentistry/
It wasn’t long ago that dental crowns were produced on assembly lines, with rows of workers engaged in the physical effort of building and shaping them.
To make that process faster, more precise and, ultimately, less expensive, dental product maker Glidewell Laboratories has been building a deep learning environment for designing and manufacturing crowns, also known as caps.
Over the past decade, Glidewell has added significant automation to crown creation in the form of robots and ramped-up use of computer-aided design and manufacturing software. But much variability has remained because of the need for humans to manage refinement of a product that demands a high level of precision.
With a production load of 10,000 units a day, Glidewell has plenty of reasons to want to bring more systematic consistency to that refinement process.
To that end, the company is training GPU-powered generative adversarial networks, which are adept at reconstructing detailed 3D models from images. Glidewell will soon be ready to start live production of AI-designed crowns, according to Sergei Azernikov, machine learning team lead at Glidewell, who spoke last month at the GPU Technology Conference in Silicon Valley.
“In the near future, we will have fully automated clients with everything handled by intelligent systems,” said Azernikov.
Glidewell has faced some special challenges in training its networks because it doesn’t actually work from images. Its data is in the form of 3D meshes, which aren’t as well suited for being run through a neural network.
Azernikov said he and his team initially tried converting the meshes into images, but they found that each time they changed a render, they had to change their model. Then they tried optimizing their models through voxelization, but that still didn’t provide the desired results.
They ultimately decided to convert the meshes into depth maps, which enabled better recreation of the detailed contours and subtleties of a tooth (and the large majority of crowns are for molars).
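As a toy illustration of the depth-map idea (Glidewell’s production pipeline is certainly more sophisticated), a mesh’s vertices can be rasterized into a 2D depth image with a simple z-buffer, turning a 3D surface into something a neural network can consume:

```python
# Toy depth-map rasterization sketch (illustrative only).
import numpy as np

def depth_map(vertices: np.ndarray, resolution: int = 128) -> np.ndarray:
    """vertices: (N, 3) array of x, y, z; z points toward the viewer."""
    xy = vertices[:, :2]
    lo, hi = xy.min(axis=0), xy.max(axis=0)
    cols = ((xy - lo) / (hi - lo + 1e-9) * (resolution - 1)).astype(int)
    depth = np.zeros((resolution, resolution))
    for (i, j), z in zip(cols, vertices[:, 2]):
        depth[j, i] = max(depth[j, i], z)   # z-buffer: keep the nearest point
    return depth

# Example: a hemisphere (very crudely molar-like) as a depth image.
pts = np.random.randn(20000, 3)
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
pts = pts[pts[:, 2] > 0]                    # keep the upper hemisphere
print(depth_map(pts).max())                 # ~1.0 at the apex
```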
That level of detail is necessary to ensure that a crown conforms to three considerations: its shape allows it to fit optimally between adjacent teeth; it fits together with opposing teeth; and its dynamics enable effective biting and chewing.
Combining depth maps with GANs, in which one network generates images and the second inspects those images, results in crowns that have even more anatomical detail than the original teeth they’re replacing. The job of the generative network is to randomize its output and get the inspection network to make as many errors as possible, thereby growing increasingly precise over time.
It’s an approach that, while highly effective, puts more demand on the deep learning process underneath it.
“To train one network is difficult,” said Azernikov. “To train two networks simultaneously is even more difficult.”
That said, Glidewell has seen impressive results. The company first started experimenting with AI three years ago, and initial training of networks on CPUs took six weeks. Moving to the first-generation NVIDIA TITAN GPU shortened that to six days. Upping the ante to an NVIDIA TITAN X paired with NVIDIA’s cuDNN deep learning library cut that down to just two-and-a-half days.
Azernikov said that training is still being done locally on the TITAN X, but that inference happens in the company’s custom-built Amazon Web Services environment, which runs a variety of NVIDIA GPUs. His team also is working with TensorRT (in combination with CUDA runtime) to accelerate the inference process.
“Inference is critical for us,” he said. “Training happens once and inference can go on for months.”
Azernikov intends for Glidewell patients to be getting AI-designed crowns sometime this year, and looks forward to the dependability it will bring to a product category that historically has had to account for a lot of variability.
“The biggest advantage of AI is that once you train it,” said Azernikov, “it will be consistent no matter what.”
Generative Adversarial Network for MRI Reconstruction (GANCS)
Train on 300+ patients and evaluate on ~50 patients
Mass General – MRI GAN research project: creating synthetic MRI images to speed training
Industry: Healthcare
Data: MRI Images – real and synthetic
Products: DGX Station
Customer Contact: Jameson Rogers, Strategy Consultant and Genome Engineer, MGH & BWH Center for Clinical Data Science
Summary
Deep Learning holds enormous promise for advancing medical discoveries and patient care, but accessible, usable, and adequate volumes of training data are a huge challenge. Without the right kind and amount of training data, research can be delayed, prolonged, or even cancelled altogether.
Scientists at the MGH & BWH Center for Clinical Data Science are taking an innovative approach to solve the training data issue. They’re using GANs to create and validate realistic-looking synthetic brain MRI images, which are then mixed with a panel of real MRI images to train the neural network with smaller volumes of data. Powered by the NVIDIA DGX Station with its preconfigured environment and NVLink, scientists were able to get started right out of the gate and train their neural network with 75% less data.
Problem
The hypothesis is that machines can be trained with a smaller number of images (e.g., 1K vs. 4K) and, overall, that this demonstrates the power of AI.
Solution
Deploy a GAN approach on the DGX Station. The generator creates synthetic brain MRIs and the discriminator determines whether each one is real or manufactured. The generator can create a panel of fake MRIs and mix those in with real MRIs for training, as sketched below.
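A minimal sketch of that mixing step (hypothetical code, not MGH’s pipeline; shapes and counts are placeholders echoing the 1K-real scenario above):

```python
# Hypothetical sketch: combine real and GAN-generated images into one
# training set. Labels here are dummies purely for illustration.
import torch
from torch.utils.data import TensorDataset, ConcatDataset, DataLoader

real_imgs = torch.randn(1000, 1, 64, 64)       # stand-in for real MRIs
synthetic_imgs = torch.randn(3000, 1, 64, 64)  # stand-in for generator output

real = TensorDataset(real_imgs, torch.zeros(len(real_imgs), dtype=torch.long))
synthetic = TensorDataset(synthetic_imgs, torch.zeros(len(synthetic_imgs), dtype=torch.long))

train = ConcatDataset([real, synthetic])       # 1K real + 3K synthetic images
loader = DataLoader(train, batch_size=32, shuffle=True)
```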
Result(s)/Impact
- Research is still early, but the expectation is that synthetic images will help speed training -- that with synthetic images we’ll be able to quickly build a library of what’s being investigated.
- The research hasn’t yet been applied to clinical practice.
- Benefits of DGX Station:
  - The preconfigured environment speeds getting started (“things just work”), and NVLink is a big benefit.
  - Portability is another big plus. They can’t keep the system out in the open on the office floor (due to confidentiality of data), so the portability of DGX Station matters, and they can easily share it with other datacenters.
  - It’s convenient to have the DGX Station as a dedicated resource: researchers don’t have to sign up for time on the DGX-1 cluster, and some projects don’t need the full power of the DGX-1.
  - All development is done in Docker, and the preloaded, optimized containers help speed up development times.
About the Customer
The Massachusetts General Hospital’s Center for Clinical Data Science brings together man and machine for the advancement of patient care.
Massachusetts General Hospital (Mass General or MGH) is the original and largest teaching hospital of Harvard Medical School and a biomedical research facility located in the West End neighborhood of Boston, Massachusetts. It is the third oldest general hospital in the United States and the oldest and largest hospital in New England with 950 beds. Massachusetts General Hospital conducts the largest hospital-based research program in the world, with an annual research budget of more than $750 million. It is currently ranked as the #3 hospital in the United States by U.S. News & World Report.
More Information
http://www.massgeneral.org/imaging/research/researchlab.aspx?id=1759
https://www.forbes.com/sites/bernardmarr/2017/07/13/the-biggest-challenges-facing-artificial-intelligence-ai-in-business-and-society/#56799f0a2aec
https://www.infoworld.com/article/3246706/artificial-intelligence/ai-the-challenge-of-data.html
NVIDIA provides a wide range of GPU-accelerated platforms you can use to accelerate deep learning training and inference, and HPC workloads.
If you want a fully integrated solution, we recommend the DGX-1, a supercomputer in a box that delivers the performance equivalent of 250 CPU-only servers, or its little brother, the DGX Station, which runs whisper-quiet next to your desk. Or the incredible DGX-2, available later this year.
If you just want to get started on a prototype using your existing workstation, the Titan X Pascal supports fast 32-bit floating point (FP32) and 8-bit integer (INT8) performance for deep learning applications.
In the data center, the Tesla P100 and V100 with NVLink Technology deliver strong scaling support for mixed workloads across both HPC applications and Deep Learning training & inference workloads (using FP64, FP32, and FP16).
And of course, NGC containers are an ideal choice to use with NVIDIA GPU-enabled instances from the top cloud service providers.
If you don’t have them on board yet, we can help.
We’ve trained around 10K people so far with DLI, delivering the full curriculum.
We estimate more than 200K by the end of this year.