Erich Elsen, Research Scientist, Baidu Research at MLconf NYC - 4/15/16


Training Recurrent Neural Networks at Scale: One of our projects at Baidu’s Silicon Valley AI Lab is using deep learning to develop state-of-the-art end-to-end speech recognition systems based on recurrent neural networks for multiple languages. The training set for each language is multiple terabytes in size, and each model requires in excess of 10 exaFLOPs to train. Training such models requires scale and techniques that are unusual for deep learning but more common in high-performance computing. I will talk about the challenges involved and the software and hardware solutions that we employ.


Training Recurrent Neural Networks at Scale
Erich Elsen
Research Scientist

Natural User Interfaces
• Goal: Make interacting with computers as natural as interacting with humans
• AI problems:
– Speech recognition
– Emotion recognition
– Semantic understanding
– Dialog systems
– Speech synthesis

Deep Speech Applications
• Voice controlled apps
• Peel Partnership
• English and Mandarin APIs in the US
• Integration into Baidu’s products in China

Deep Speech: End-to-end learning
• Deep neural network predicts probability of characters directly from audio
[Figure: per-timestep character outputs, e.g. T H _ E … D O G]

Connectionist Temporal Classification

Deep Speech: CTC
Time step   1     2     3     4     5     6
E           .01   .05   .1    .1    .8    .05
H           .01   .1    .1    .6    .05   .05
T           .01   .8    .75   .2    .05   .1
BLANK       .97   .05   .05   .1    .1    .8
• Simplified sequence of network outputs (probabilities)
• Generally many more timesteps than letters
• Need to look at all the ways we can write “the”
• Adjacent characters collapse
• TTTHEE, TTTTHE, TTHHEE, THEEEE, …
• Solve with dynamic programming
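The dynamic-programming step can be sketched as follows. This is a minimal illustration of the standard CTC forward recursion, not the warp-ctc implementation; the function names are hypothetical, and it assumes the usual CTC collapsing rule (merge adjacent repeats, then drop blanks). The probabilities are the ones in the table on this slide, and a brute-force enumeration of all 4^6 paths is included only as a cross-check.

```python
from itertools import product

BLANK = "_"

# Per-timestep output probabilities copied from the slide's table
# (one row per symbol, one column per timestep).
PROBS = {
    "E":   [0.01, 0.05, 0.10, 0.10, 0.80, 0.05],
    "H":   [0.01, 0.10, 0.10, 0.60, 0.05, 0.05],
    "T":   [0.01, 0.80, 0.75, 0.20, 0.05, 0.10],
    BLANK: [0.97, 0.05, 0.05, 0.10, 0.10, 0.80],
}


def collapse(path):
    """Merge adjacent repeats, then drop blanks (the CTC collapsing rule)."""
    out = []
    prev = None
    for sym in path:
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return "".join(out)


def ctc_probability(label, probs):
    """Total probability of all paths that collapse to `label` (forward pass)."""
    n_steps = len(next(iter(probs.values())))
    # Extended label: a blank before, between, and after the characters.
    ext = [BLANK]
    for ch in label:
        ext += [ch, BLANK]
    alpha = [0.0] * len(ext)
    alpha[0] = probs[ext[0]][0]
    if len(ext) > 1:
        alpha[1] = probs[ext[1]][0]
    for t in range(1, n_steps):
        new = [0.0] * len(ext)
        for s, sym in enumerate(ext):
            total = alpha[s]                      # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]             # advance by one position
            # Skipping a blank is allowed only between different characters.
            if s >= 2 and sym != BLANK and sym != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[sym][t]
        alpha = new
    return alpha[-1] + (alpha[-2] if len(ext) > 1 else 0.0)


def brute_force(label, probs):
    """Cross-check: enumerate every path and sum those that collapse to `label`."""
    symbols = list(probs)
    n_steps = len(probs[symbols[0]])
    total = 0.0
    for path in product(symbols, repeat=n_steps):
        if collapse(path) == label:
            p = 1.0
            for t, sym in enumerate(path):
                p *= probs[sym][t]
            total += p
    return total


print("P('THE') via dynamic programming:", ctc_probability("THE", PROBS))
print("P('THE') via enumeration:        ", brute_force("THE", PROBS))
```

The recursion costs time proportional to (timesteps × label length), whereas enumeration grows exponentially with the number of timesteps, which is why the slide calls for dynamic programming.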

warp-ctc
• Recently open sourced our CTC implementation
• Efficient, parallel CPU and GPU backend
• 100-400X faster than other implementations
• Apache license, C interface
https://github.com/baidu-research/warp-ctc

Accuracy scales with Data
[Figure: performance vs. data & model size, comparing deep learning algorithms with many previous methods]
• 40% error reduction for each 10x increase in dataset size
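As a rough worked example of that rule of thumb: if every 10x increase in data removes 40% of the remaining errors, the error after growing the dataset by a factor m is 0.6^(log10 m) of the baseline, i.e. a power law with exponent log10(0.6) ≈ -0.22. The sketch below assumes the trend continues unchanged, which the slide does not claim; the function name is hypothetical.

```python
import math

REDUCTION_PER_DECADE = 0.40  # from the slide: 40% fewer errors per 10x data


def relative_error(data_multiplier):
    """Error relative to the baseline after growing the dataset by `data_multiplier`."""
    decades = math.log10(data_multiplier)
    return (1.0 - REDUCTION_PER_DECADE) ** decades


for mult in (1, 10, 100, 1000):
    print(f"{mult:>5}x data -> {relative_error(mult):.3f} of baseline error")

print("equivalent power-law exponent:", round(math.log10(1.0 - REDUCTION_PER_DECADE), 3))
```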

Training sets
• Train on ~1½ years of data (and growing)
• English and Mandarin
• End-to-end deep learning is key to assembling large datasets
• Datasets drive accuracy

Large Datasets = Large Models
[Figure: accuracy vs. dataset size for a big model and a small model]
• Models require over 20 exaFLOPs to train (exa = 10^18)
• Trained on 4+ terabytes of audio

Virtuous Cycle of Innovation
[Figure: cycle of Design New Experiment, Perform Experiment, Learn, Iterate]

Experiment Scaling
• Batch Norm impact with deeper networks
• Sequence-wise normalization:
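The equation from this slide did not survive in the transcript. The sketch below is therefore an assumption, not the author's formula: it applies the standard batch-normalization form, but pools the mean and variance over every timestep of every sequence in the minibatch, which is what "sequence-wise" suggests. All names (`sequence_wise_norm`, `gamma`, `beta`, `eps`) are hypothetical.

```python
import math


def sequence_wise_norm(batch, gamma, beta, eps=1e-5):
    """Normalize each feature with statistics pooled over all timesteps
    of all sequences in the minibatch (assumed form, see lead-in).

    batch: list of sequences; each sequence is a list of feature vectors.
    gamma, beta: per-feature scale and shift (learned in a real network).
    """
    frames = [frame for seq in batch for frame in seq]
    n = len(frames)
    dim = len(gamma)
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    inv_std = [1.0 / math.sqrt(v + eps) for v in var]
    return [
        [[gamma[d] * (f[d] - mean[d]) * inv_std[d] + beta[d] for d in range(dim)]
         for f in seq]
        for seq in batch
    ]


# Two sequences of different lengths (3 and 2 timesteps), 2 features each.
batch = [
    [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]],
    [[4.0, 40.0], [5.0, 50.0]],
]
normalized = sequence_wise_norm(batch, gamma=[1.0, 1.0], beta=[0.0, 0.0])
for seq in normalized:
    print([[round(v, 3) for v in frame] for frame in seq])
```

Pooling over time gives one set of statistics per feature regardless of sequence length, so sequences of different lengths in the same minibatch are normalized consistently.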

Parallelism across GPUs
[Figure: Model Parallel vs. Data Parallel; in the data-parallel panel, replicas working on separate training data are combined with MPI_Allreduce()]
For these models, data parallelism works best.
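A toy sketch of one data-parallel step, simulated in a single process: each "GPU" computes a gradient on its own shard, an all-reduce averages the gradients, and every replica applies the same update. The 1-D linear model, the data, and the helper names are hypothetical stand-ins; `allreduce_mean` only imitates what an MPI_Allreduce sum followed by a division would do.

```python
def gradient(w, shard):
    """Mean-squared-error gradient for the 1-D model y = w * x on one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def allreduce_mean(values):
    """Stand-in for MPI_Allreduce(sum) followed by division by the worker count."""
    total = sum(values)
    return [total / len(values)] * len(values)


data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
shards = [data[:2], data[2:]]            # one equal-sized shard per simulated GPU
w = 0.5                                  # every replica starts from the same weight
learning_rate = 0.01

local_grads = [gradient(w, shard) for shard in shards]
synced = allreduce_mean(local_grads)     # every worker now holds the same gradient
replicas = [w - learning_rate * g for g in synced]

print("local gradients: ", local_grads)
print("synced gradient: ", synced[0])
print("full-batch check:", gradient(w, data))
print("updated replicas:", replicas)
```

With equal-sized shards the averaged gradient equals the gradient over the whole batch, so the replicas stay identical after every step.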

Performance for RNN training
• 55% of GPU FMA peak using a single GPU
• ~48% of peak using 8 GPUs in one node
• Weak scaling very efficient, albeit algorithmically challenged
[Figure: TFLOP/s (1 to 512, log scale) vs. number of GPUs (1 to 128), with one-node and multi-node regions and a typical training run marked]
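A quick back-of-the-envelope reading of the two utilization figures above, assuming both percentages refer to the same per-GPU peak (the slide does not state the peak itself):

```python
single_gpu_fraction = 0.55   # slide: 55% of FMA peak on one GPU
eight_gpu_fraction = 0.48    # slide: ~48% of peak on 8 GPUs in one node

# Fraction of single-GPU throughput each GPU retains when 8 work together.
scaling_efficiency = eight_gpu_fraction / single_gpu_fraction
effective_speedup = 8 * scaling_efficiency

print(f"scaling efficiency: {scaling_efficiency:.1%}")
print(f"speedup over one GPU: {effective_speedup:.2f}x")
```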

All-reduce
• We implemented our own all-reduce out of send and receive
• Several algorithm choices based on size
• Careful attention to affinity and topology
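The slide does not say which algorithms were used. As one illustration of building an all-reduce purely from neighbour-to-neighbour sends and receives, here is a single-process simulation of the well-known ring algorithm (reduce-scatter followed by all-gather). The function name and the in-memory "workers" are hypothetical; a real implementation would issue actual send/receive calls and pay attention to affinity and topology.

```python
def ring_allreduce(buffers):
    """Sum equal-length vectors across workers, in place, using only
    transfers from worker r to worker (r + 1) mod n."""
    n = len(buffers)
    length = len(buffers[0])
    # Chunk c covers indices bounds[c] .. bounds[c + 1] - 1.
    bounds = [c * length // n for c in range(n + 1)]

    def chunk(c):
        c %= n
        return range(bounds[c], bounds[c + 1])

    # Phase 1, reduce-scatter: after n - 1 steps worker r holds the full
    # sum of chunk (r + 1) mod n.
    for step in range(n - 1):
        # Snapshot outgoing messages so all sends in a step are simultaneous.
        msgs = [[buffers[r][i] for i in chunk(r - step)] for r in range(n)]
        for r in range(n):
            dst = (r + 1) % n
            for i, v in zip(chunk(r - step), msgs[r]):
                buffers[dst][i] += v

    # Phase 2, all-gather: pass the finished chunks around the ring.
    for step in range(n - 1):
        msgs = [[buffers[r][i] for i in chunk(r + 1 - step)] for r in range(n)]
        for r in range(n):
            dst = (r + 1) % n
            for i, v in zip(chunk(r + 1 - step), msgs[r]):
                buffers[dst][i] = v
    return buffers


workers = [
    [1, 2, 3, 4, 5],
    [10, 20, 30, 40, 50],
    [100, 200, 300, 400, 500],
]
ring_allreduce(workers)
for rank, buf in enumerate(workers):
    print(f"worker {rank}: {buf}")
```

A ring moves roughly 2(n-1)/n of the vector per worker regardless of n, which suits large messages; for small messages, schemes with fewer steps are commonly preferred, which is consistent with choosing the algorithm by size.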

Scalability
• Batch size is hard to increase
– algorithm, memory limits
• Performance at small batch sizes (32, 64) leads to scalability limits

Precision
• FP16 also mostly works
– Use FP32 for softmax and weight updates
• More sensitive to labeling error
[Figure: Weight Distribution histogram; x-axis: Magnitude (-31 to 0), y-axis: Count (1 to 100,000,000, log scale)]
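One way to see why weight updates are kept in FP32: a small update added to a weight can round away entirely in half precision. The sketch below uses Python's `struct` module to round values to IEEE half and single precision; the weight and update values are hypothetical, chosen only to show the effect.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


weight = 1.0
update = 1e-4  # hypothetical learning_rate * gradient

fp16_result = to_fp16(to_fp16(weight) + to_fp16(update))
fp32_result = to_fp32(to_fp32(weight) + to_fp32(update))

print("FP16 weight after update:", fp16_result)
print("FP32 weight after update:", fp32_result)
```

Near 1.0 the spacing between adjacent FP16 values is 2^-10 (about 0.001), so an update of 1e-4 is below half a step and the stored weight does not change; FP32 has enough resolution to keep it.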

Conclusion
• We have to do experiments at scale
• Pushing compute scaling for end-to-end deep learning
• Efficient training for large datasets
– 50 TFLOP/s sustained on one model
– 20 exaFLOPs to train each model
• Thanks to Bryan Catanzaro, Carl Case, Adam Coates for donating some slides
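Putting the two headline numbers together gives a rough training time per model. This is simple arithmetic on the figures quoted in the conclusion and assumes the sustained rate holds for the entire run:

```python
TOTAL_FLOPS = 20e18       # slide: 20 exaFLOPs to train one model
SUSTAINED_RATE = 50e12    # slide: 50 TFLOP/s sustained on one model

seconds = TOTAL_FLOPS / SUSTAINED_RATE
days = seconds / 86_400   # seconds per day

print(f"{seconds:.0f} s = {days:.2f} days per training run")
```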
Erich Elsen

Viewers also liked

Beyond the Classifier, Inspiration from Engineering Algorithms: Many data scientists work within the realm of Machine Learning and their problems are often addressable with techniques such as classifiers and recommendation engines. At Tapad, we have often had to look outside that standard toolkit to find inspiration from more traditional engineering algorithms. This has included solving our Device Graph’s connected component problem at scale as well as maintaining our Device Graph’s time-consistency in our cluster identification week over week.

Yael Elmatad, Senior Data Scientist, Tapad at MLconf NYC - 4/15/16

MLconf

Corinna Cortes, Head of Research, Google at MLconf NYC

MLconf

Notes from 2016 bay area deep learning school

Niketan Pansare

Building Large Scale Machine Learning Applications with Pipelines-(Evan Spark...

Spark Summit

Lessons from 2MM machine learning models

Extract Data Conference

Deep Learning in real world @Deep Learning Tokyo

Preferred Networks

[251] implementing deep learning using cu dnn

NAVER D2

aiconf2017okanohara

Preferred Networks

Deep Learning: a birds eye view

Roelof Pieters

Viewers also liked (9)

Yael Elmatad, Senior Data Scientist, Tapad at MLconf NYC - 4/15/16

Corinna Cortes, Head of Research, Google at MLconf NYC

Notes from 2016 bay area deep learning school

Building Large Scale Machine Learning Applications with Pipelines-(Evan Spark...

Lessons from 2MM machine learning models

Deep Learning in real world @Deep Learning Tokyo

[251] implementing deep learning using cu dnn

aiconf2017okanohara

Deep Learning: a birds eye view

Similar to Erich Elsen, Research Scientist, Baidu Research at MLconf NYC - 4/15/16

Deep Domain

Zachary S. Brown

Scalable Deep Learning on AWS with Apache MXNet

Julien SIMON

Deep learning introduction

Adwait Bhave

Smaller and Easier: Machine Learning on Embedded Things

NUS-ISS

R tech introcomputer

Rose Rajput

Pdc lecture1

SyedSafeer1

Introduction to deep learning

Amr Rashed

Deep learning continues to push the state of the art in domains such as computer vision, natural language understanding and recommendation engines. One of the key reasons for this progress is the availability of highly flexible and developer friendly deep learning frameworks. Apache MXNet is a fully-featured, flexibly-programmable and ultra-scalable deep learning framework supporting innovative deep models including convolutional neural networks (CNNs), and long short-term memory networks (LSTMs). This Tech Talk will show you how to launch the deep learning cloud formation template and deploy the deep learning AMI to train your own deep neural network, using MNIST, to recognize handwritten digits and test it for accuracy. Learning Objectives: - Learn about the features and benefits of Apache MXNet - Learn about the deep learning AMIs with the tools you need for DL - Learn how to train a neural network using MXNet"

A Deeper Dive into Apache MXNet - March 2017 AWS Online Tech Talks

Amazon Web Services

A Deeper Dive into Apache MXNet - March 2017 AWS Online Tech Talks

Amazon Web Services

Repeating History...On Purpose...with Elixir

Barry Jones

Large scalecplex

optimizatiodirectdirect

Recent advancements in Linear and Mixed Programing give us the capability to solve larger Optimization Problems. CPLEX Optimization Studio solves large-scale optimization problems and enables better business decisions and resulting financial benefits in areas such as supply chain management, operations, healthcare, retail, transportation, logistics and asset management. In this workshop using CPLEX Optimization Studio we will discuss modeling practices, case studies and demonstrate good practices for solving Hard Optimization Problems. We will also discuss recent CPLEX performance improvements and recently added features.

CPLEX Optimization Studio, Modeling, Theory, Best Practices and Case Studies

optimizatiodirectdirect

Concurrency & Parallel Programming

Ramazan AYYILDIZ

In this deck, John Gustafson presents: An Energy Efficient and Massively Parallel Approach to Valid Numerics. "Written by one of the foremost experts in high-performance computing and the inventor of Gustafson’s Law, The End of Error: Unum Computing explains a new approach to computer arithmetic: the universal number (unum). The unum encompasses all IEEE floating-point formats as well as fixed-point and exact integer arithmetic. This new number type obtains more accurate answers than floating-point arithmetic yet uses fewer bits in many cases, saving memory, bandwidth, energy, and power." Watch the video presentation: http://wp.me/p3RLHQ-dTk Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter

Unum Computing: An Energy Efficient and Massively Parallel Approach to Valid ...

inside-BigData.com

Scalable Deep Learning on AWS using Apache MXNet (May 2017)

Julien SIMON

Elixir

Fuat Buğra AYDIN

Human languages are complex, diverse and riddled with exceptions – translating between different languages is therefore a highly challenging technical problem. Deep learning approaches have proved powerful in modelling the intricacies of language, and have surpassed all statistics-based methods for automated translation. This session begins with an introduction to the problem of machine translation and discusses the two dominant neural architectures for solving it – recurrent neural networks and transformers. A practical overview of the workflow involved in training, optimising and adapting a competitive neural machine translation system is provided. Attendees will gain an understanding of the internal workings and capabilities of state-of-the-art systems for automatic translation, as well as an appreciation of the key challenges and open problems in the field.

Building a Neural Machine Translation System From Scratch

Natasha Latysheva

This is a slide deck from a presentation, that my colleague Shirin Glander (https://www.slideshare.net/ShirinGlander/) and I did together. As we created our respective parts of the presentation on our own, it is quite easy to figure out who did which part of the presentation as the two slide decks look quite different ... :) For the sake of simplicity and completeness, I just copied the two slide decks together. As I did the "surrounding" part, I added Shirin's part at the place when she took over and then added my concluding slides at the end. Well, I'm sure, you will figure it out easily ... ;) The presentation was intended to be an introduction to deep learning (DL) for people who are new to the topic. It starts with some DL success stories as motivation. Then a quick classification and a bit of history follows before the "how" part starts. The first part of the "how" is some theory of DL, to demystify the topic and explain and connect some of the most important terms on the one hand, but also to give an idea of the broadness of the topic on the other hand. After that the second part dives deeper into the question how to actually implement DL networks. This part starts with coding it all on your own and then moves on to less coding step by step, depending on where you want to start. The presentation ends with some pitfalls and challenges that you should have in mind if you want to dive deeper into DL - plus the invitation to become part of it. As always the voice track of the presentation is missing. I hope that the slides are of some use for you, though.

Deep learning - a primer

Uwe Friedrichsen

This is a slide deck from a presentation, that my colleague Uwe Friedrichsen (https://www.slideshare.net/ufried/) and I did together. As we created our respective parts of the presentation on our own, it is quite easy to figure out who did which part of the presentation as the two slide decks look quite different ... :) For the sake of simplicity and completeness, Uwe copied the two slide decks together. As he did the "surrounding" part, he added my part at the place where I took over and then added concluding slides at the end. Well, I'm sure, you will figure it out easily ... ;) The presentation was intended to be an introduction to deep learning (DL) for people who are new to the topic. It starts with some DL success stories as motivation. Then a quick classification and a bit of history follows before the "how" part starts. The first part of the "how" is some theory of DL, to demystify the topic and explain and connect some of the most important terms on the one hand, but also to give an idea of the broadness of the topic on the other hand. After that the second part dives deeper into the question how to actually implement DL networks. This part starts with coding it all on your own and then moves on to less coding step by step, depending on where you want to start. The presentation ends with some pitfalls and challenges that you should have in mind if you want to dive deeper into DL - plus the invitation to become part of it. As always the voice track of the presentation is missing. I hope that the slides are of some use for you, though.

Deep learning - a primer

Shirin Elsinghorst

Windows Server 2008 R2 Dev Session 02

Clint Edmonson

Similar to Erich Elsen, Research Scientist, Baidu Research at MLconf NYC - 4/15/16 (20)

Deep Domain

Scalable Deep Learning on AWS with Apache MXNet

Deep learning introduction

Smaller and Easier: Machine Learning on Embedded Things

R tech introcomputer

Pdc lecture1

Introduction to deep learning

A Deeper Dive into Apache MXNet - March 2017 AWS Online Tech Talks

Repeating History...On Purpose...with Elixir

Large scalecplex

CPLEX Optimization Studio, Modeling, Theory, Best Practices and Case Studies

Concurrency & Parallel Programming

Unum Computing: An Energy Efficient and Massively Parallel Approach to Valid ...

Scalable Deep Learning on AWS using Apache MXNet (May 2017)

Elixir

Building a Neural Machine Translation System From Scratch

Deep learning - a primer

Windows Server 2008 R2 Dev Session 02

More from MLconf

Understanding Human Impact: Social and Equity Assessments for AI Technologies Social and Equity Impact Assessments have broad applications but can be a useful tool to explore and mitigate for Machine Learning fairness issues and can be applied to product specific questions as a way to generate insights and learnings about users, as well as impacts on society broadly as a result of the deployment of new and emerging technologies. In this presentation, my goal is to advocate for and highlight the need to consult community and external stakeholder engagement to develop a new knowledge base and understanding of the human and social consequences of algorithmic decision making and to introduce principles, methods and process for these types of impact assessments.

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...

MLconf

The Brain’s Guide to Dealing with Context in Language Understanding Like the visual cortex, the regions of the brain involved in understanding language represent information hierarchically. But whereas the visual cortex organizes things into a spatial hierarchy, the language regions encode information into a hierarchy of timescale. This organization is key to our uniquely human ability to integrate semantic information across narratives. More and more, deep learning-based approaches to natural language understanding embrace models that incorporate contextual information at varying timescales. This has not only led to state-of-the art performance on many difficult natural language tasks, but also to breakthroughs in our understanding of brain activity. In this talk, we will discuss the important connection between language understanding and context at different timescales. We will explore how different deep learning architectures capture timescales in language and how closely their encodings mimic the brain. Along the way, we will uncover some surprising discoveries about what depth does and doesn’t buy you in deep recurrent neural networks. And we’ll describe a new, more flexible way to think about these architectures and ease design space exploration. Finally, we’ll discuss some of the exciting applications made possible by these breakthroughs.

Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding

MLconf

Applying Computer Vision to Reduce Contamination in the Recycling Stream With China’s recent refusal of most foreign recyclables, North American waste haulers are scrambling to figure out how to make on-shore recycling cost-effective in order to continue providing recycling services. Recyclables that were once being shipped to China for manual sorting are now primarily being redirected to landfills or incinerators. Without a solution, a nearly $5 billion annual recycling market could come to a halt. Purity in the recycling stream is key to this effort as contaminants in the stream can increase the cost of operations, damage equipment and reduce the ability to create pure commodities suitable for creating recycled goods. This market disruption as a result of China’s new regulations, however, provides us the chance to re-examine and improve our current disposal & collection habits with modern monitoring & artificial intelligence technology. Using images from our in-dumpster cameras, Compology has developed an ML-based process that helps identify, measure and alert for contaminants in recycling containers before they are picked-up, helping keep the recycling stream clean. Our convolutional neural network flags potential instances of contamination inside a dumpster, enabling garbage haulers to know which containers have the wrong type of material inside. This allows them to provide targeted, timely education, and when appropriate, assess fines, to improve recycling compliance at the businesses and residences they serve, helping keep recycling services financially viable. In this presentation, we will walk through our ML-based contamination measurement and scoring process by showing how Waste Management, a national waste hauler, has experienced 57% contamination reduction in nearly 2,000 containers over six months, This progress shows significant strides towards financially viable recycling services.

Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...

MLconf

Quantum Computing: a Treasure Hunt, not a Gold Rush Quantum computers promise a significant step up in computational power over conventional computers, but also suffer a number of counterintuitive limitations --- both in their computational model and in leading lab implementations. In this talk, we review how quantum computers compete with conventional computers and how conventional computers try to hold their ground. Then we outline what stands in the way of successful quantum ML applications.

Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush

MLconf

Data Labeling as Religious Experience One of the most common places to deploy a production machine learning systems is as a replacement for a legacy rules-based system that is having a hard time keeping up with new edge cases and requirements. I'll be walking through the process and tooling we used to help us design, train, and deploy a model to replace a set of static rules we had for handling invite spam at Slack, talk about what we learned, and discuss some problems to solve in order to make these migrations easier for everyone.

Josh Wills - Data Labeling as Religious Experience

MLconf

Project GaitNet: Ushering in the ImageNet moment for human Gait kinematics The emergence of the upright human bipedal gait can be traced back 4 to 2.8 million years ago, to the now extinct hominin Australopithecus afarensis. Fine grained analysis of gait using the modern MEMS sensors found on all smartphones not just reveals a lot about the person’s orthopedic and neuromuscular health status, but also has enough idiosyncratic clues that it can be harnessed as a passive biometric. While there were many siloed attempts made by the machine learning community to model Bipedal Gait sensor data, these were done with small datasets oft collected in restricted academic environs. In this talk, we will introduce the ImageNet moment for human gait analysis by presenting 'Project GaitNet', the largest ever planet-sized motion sensor based human bipedal gait dataset ever curated. We’ll also present the associated state-of-the-art results in classifying humans harnessing novel deep neural architectures and the related success stories we have enjoyed in transfer-learning into disparate domains of human kinematics analysis.

Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...

MLconf

Machine Learning Methods in Detecting Alzheimer’s Disease from Speech and Language Alzheimer's disease affects millions of people worldwide, and it is important to predict the disease as early and as accurate as possible. In this talk, I will discuss development of novel ML models that help classifying healthy people from those who develop Alzheimer's, using short samples of human speech. As an input to the model, features of different modalities are extracted from speech audio samples and transcriptions: (1) syntactic measures, such as e.g. production rules extracted from syntactic parse trees, (2) lexical measures, such as e.g. features of lexical richness and complexity and lexical norms, and (3) acoustic measures, such as e.g. standard Mel-frequency cepstral coefficients. I will present the ML model that detects cognitive impairment by reaching agreement among modalities. The resulting model is able to achieve state of the art performance in both supervised and semi-supervised manner, using manual transcripts of human speech. Additionally, I will discuss potential limitations of any fully-automated speech-based Alzheimer's disease detection model, focusing mostly on the analysis of the impact of a not-so-accurate automatic speech recognition (ASR) on the classification performance. To illustrate this, I will present the experiments with controlled amounts of artificially generated ASR errors and explain how the deletion errors affect Alzheimer's detection performance the most, due to their impact on the features of syntactic and lexical complexity.

Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...

MLconf

Optimized Image Classification on the Cheap In this talk, we anchor on building an image classifier trained on the Stanford Cars dataset to evaluate two approaches to transfer learning -fine tuning and feature extraction- and the impact of hyperparameter optimization on these techniques. Once we define the most performant transfer learning technique for Stanford Cars, we will double the size of the dataset through image augmentation to boost the classifier’s performance. We will use Bayesian optimization to learn the hyperparameters associated with image transformations using the downstream image classifier’s performance as the guide. In conjunction with model performance, we will also focus on the features of these augmented images and the downstream implications for our image classifier. To both maximize model performance on a budget and explore the impact of optimization on these methods, we apply a particularly efficient implementation of Bayesian optimization to each of these architectures in this comparison. Our goal is to draw on a rigorous set of experimental results that can help us answer the question: how can resource-constrained teams make trade-offs between efficiency and effectiveness using pre-trained models?

Meghana Ravikumar - Optimized Image Classification on the Cheap

MLconf

The Importance of Modeling Data Collection Data sets used in machine learning are often collected in a systematically biased way - certain data points are more likely to be collected than others. We call this "observation bias". For example, in health care, we are more likely to see lab tests when the patient is feeling unwell than otherwise. Failing to account for observation bias can, of course, result in poor predictions on new data. By contrast, properly accounting for this bias allows us to make better use of the data we do have. In this presentation, we discuss practical and theoretical approaches to dealing with observation bias. When the nature of the bias is known, there are simple adjustments we can make to nonparametric function estimation techniques, such as Gaussian Process models. We also discuss the scenario where the data collection model is unknown. In this case, there are steps we can take to estimate it from observed data. Finally, we demonstrate that having a small subset of data points that are known to be collected at random - that is, in an unbiased way - can vastly improve our ability to account for observation bias in the rest of the data set. My hope is that attendees of this presentation will be aware of the perils of observation bias in their own work, and be equipped with tools to address it.

Noam Finkelstein - The Importance of Modeling Data Collection

MLconf

The Uncanny Valley of ML Every so often, the conundrum of the Uncanny Valley re-emerges as advanced technologies evolve from clearly experimental products to refined accepted technologies. We have seen its effects in robotics, computer graphics, and page load times. The debate of how to handle the new technology detracts from its benefits. When machine learning is added to human decision systems a similar effect can be measured in increased response time and decreased accuracy. These systems include radiology, judicial assignments, bus schedules, housing prices, power grids and a growing variety of applications. Unfortunately, the Uncanny Valley of ML can be hard to detect in these systems and can lead to degraded system performance when ML is introduced, at great expense. Here, we'll introduce key design principles for introducing ML into human decision systems to navigate around the Uncanny Valley and avoid its pitfalls.

June Andrews - The Uncanny Valley of ML

MLconf

Deep Learning Architectures for Semantic Relation Detection Tasks Recognizing and distinguishing specific semantic relations from other types of semantic relations is an essential part of language understanding systems. Identifying expressions with similar and contrasting meanings is valuable for NLP systems which go beyond recognizing semantic relatedness and require to identify specific semantic relations. In this talk, I will first present novel techniques for creating labelled datasets required for training deep learning models for classifying semantic relations between phrases. I will further present various neural network architectures that integrate morphological features into integrated path-based and distributional relation detection algorithms and demonstrate that this model outperforms state-of-the-art models in distinguishing semantic relations and is capable of efficiently handling multi-word expressions.

Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks

MLconf

Building an Incrementally Trained, Local Taste Aware, Global Deep Learned Recommender System Model At Netflix, our main goal is to maximize our members’ enjoyment of the selected show by minimizing the amount of time it takes for them to find it. We try to achieve this goal by personalizing almost all the aspects of our product -- from what shows to recommend, to how to present these shows and construct their home-pages to what images to select per show, among many other things. Everything is recommendations for us and as an applied Machine Learning group, we spend our time building models for personalization that will eventually increase the joy and satisfaction of our members. In this talk we will primarily focus our attention on a) making a global deep learned recommender model that is regional tastes and popularity aware and b) adapting this model to changing taste preferences as well as dynamic catalog availability. We will first go through some standard recommender system models that use Matrix Factorization and Topic Models and then compare and contrast them with more powerful and higher capacity deep learning based models such as sequence models that use recurrent neural networks. We will show what it entails to build a global model that is aware of regional taste preferences and catalog availability. We will show how models that are built on simple Maximum Likelihood principle fail to do that. We will then describe one solution that we have employed in order to enable the global deep learned models to focus their attention on capturing regional taste preferences and changing catalog.In the latter half of the talk, we will discuss how we do incremental learning of deep learned recommender system models. Why do we need to do that ? Everything changes with time. Users’ tastes change with time. What’s available on Netflix and what’s popular also change over time. 
Therefore, updating or improving recommendation systems over time is necessary to bring more joy to users. In addition to how we apply incremental learning, we will discuss some of the challenges we face involving large-scale data preparation, infrastructure setup for incremental model training as well as pipeline scheduling. The incremental training enables us to serve fresher models trained on fresher and larger amounts of data. This helps our recommender system to nicely and quickly adapt to catalog and users’ taste changes, and improve overall performance.

Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...

MLconf


Erich Elsen, Research Scientist, Baidu Research at MLconf NYC - 4/15/16

1. Training Recurrent Neural Networks at Scale Erich Elsen Research Scientist

2. Natural User Interfaces
• Goal: Make interacting with computers as natural as interacting with humans
• AI problems:
– Speech recognition
– Emotional recognition
– Semantic understanding
– Dialog systems
– Speech synthesis

3. Deep Speech Applications
• Voice controlled apps
• Peel Partnership
• English and Mandarin APIs in the US
• Integration into Baidu’s products in China

4. Deep Speech: End-to-end learning
• Deep neural network predicts probability of characters directly from audio
[Diagram: characters emitted by the network over time: T H _ E … D O G]

5. Connectionist Temporal Classification

6. Deep Speech: CTC
Time      1     2     3     4     5     6
E       .01   .05   .1    .1    .8    .05
H       .01   .1    .1    .6    .05   .05
T       .01   .8    .75   .2    .05   .1
BLANK   .97   .05   .05   .1    .1    .8
• Simplified sequence of network outputs (probabilities)
• Generally many more timesteps than letters
• Need to look at all the ways we can write “the”
• Adjacent characters collapse
• TTTHEE, TTTTHE, TTHHEE, THEEEE, ….
• Solve with dynamic programming
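
The dynamic-programming bullet on this slide can be made concrete with a minimal Python sketch of the CTC forward recursion, run on the probability table from the slide. The function names (`collapse`, `ctc_prob_forward`, `ctc_prob_brute_force`) are illustrative and are not taken from warp-ctc. The brute-force version enumerates all 4^6 paths and is there only to check the recursion.

```python
import itertools

BLANK = "_"  # stands in for the BLANK row of the slide's table

# Per-timestep output probabilities copied from the slide (6 timesteps).
PROBS = {
    "E":   [0.01, 0.05, 0.10, 0.10, 0.80, 0.05],
    "H":   [0.01, 0.10, 0.10, 0.60, 0.05, 0.05],
    "T":   [0.01, 0.80, 0.75, 0.20, 0.05, 0.10],
    BLANK: [0.97, 0.05, 0.05, 0.10, 0.10, 0.80],
}


def collapse(path):
    """CTC collapse rule: merge adjacent repeats, then drop blanks."""
    merged = [s for i, s in enumerate(path) if i == 0 or s != path[i - 1]]
    return "".join(s for s in merged if s != BLANK)


def ctc_prob_forward(label, probs):
    """Total probability of all paths that collapse to `label` (non-empty)."""
    ext = [BLANK]
    for ch in label:
        ext += [ch, BLANK]  # blank-extended label, e.g. _ T _ H _ E _
    steps = len(probs[BLANK])
    alpha = [0.0] * len(ext)
    alpha[0] = probs[ext[0]][0]  # a path may start with a blank ...
    alpha[1] = probs[ext[1]][0]  # ... or with the first character
    for t in range(1, steps):
        new = [0.0] * len(ext)
        for s, sym in enumerate(ext):
            total = alpha[s]              # stay on the same position
            if s >= 1:
                total += alpha[s - 1]     # advance by one position
            # Skip over a blank only between two different characters.
            if s >= 2 and sym != BLANK and sym != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[sym][t]
        alpha = new
    return alpha[-1] + alpha[-2]  # end on the last character or trailing blank


def ctc_prob_brute_force(label, probs):
    """Same quantity by enumerating every path; only feasible for tiny inputs."""
    steps = len(probs[BLANK])
    total = 0.0
    for path in itertools.product(probs.keys(), repeat=steps):
        if collapse(path) == label:
            p = 1.0
            for t, sym in enumerate(path):
                p *= probs[sym][t]
            total += p
    return total


if __name__ == "__main__":
    print(ctc_prob_forward("THE", PROBS))
    print(ctc_prob_brute_force("THE", PROBS))
```

The recursion touches each (timestep, position) pair once, so its cost grows linearly with the number of timesteps, whereas the enumeration grows exponentially.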

7. warp-ctc
• Recently open sourced our CTC implementation
• Efficient, parallel CPU and GPU backend
• 100-400X faster than other implementations
• Apache license, C interface
https://github.com/baidu-research/warp-ctc

8. Accuracy scales with Data
[Plot: Performance versus Data & Model Size; curves: Deep Learning algorithms, Many previous methods]
• 40% error reduction for each 10x increase in dataset size
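
As a worked example of the 40%-per-10x figure, the short Python sketch below projects the error rate under the assumption that the trend continues unchanged, and derives the equivalent power-law exponent. The function name `projected_error` and the starting error of 1.0 are illustrative choices, not values from the talk.

```python
import math

REDUCTION_PER_DECADE = 0.40  # from the slide: 40% fewer errors per 10x data


def projected_error(base_error, data_multiplier):
    """Error after growing the dataset by `data_multiplier`, if the trend held."""
    decades = math.log10(data_multiplier)
    return base_error * (1.0 - REDUCTION_PER_DECADE) ** decades


# Equivalent power law: error is proportional to N ** EXPONENT.
EXPONENT = math.log10(1.0 - REDUCTION_PER_DECADE)

if __name__ == "__main__":
    for k in (1, 10, 100, 1000):
        print(k, projected_error(1.0, k))
    print("exponent:", EXPONENT)
```

Under this assumption, two successive 10x increases leave 0.6 × 0.6 = 36% of the original error.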

9. Training sets
• Train on ~1½ years of data (and growing)
• English and Mandarin
• End-to-end deep learning is key to assembling large datasets
• Datasets drive accuracy

10. Large Datasets = Large Models
[Plot: Accuracy versus Dataset Size; curves: Big Model, Small Model]
• Models require over 20 Exa-flops to train (exa = 10^18)
• Trained on 4+ Terabytes of audio

11. Virtuous Cycle of Innovation
[Diagram (cycle): Perform Experiment, Learn, Iterate, Design New Experiment]

12. Experiment Scaling
• Batch Norm impact with deeper networks
• Sequence wise normalization:
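
The equation that followed "Sequence wise normalization" is not present in this transcript. The Python sketch below is therefore an assumption rather than the slide's formula: it uses the usual batch-normalization form, with the mean and variance pooled over every timestep of every sequence in the minibatch. The function name and the scalar `gamma`, `beta`, and `eps` parameters are illustrative.

```python
import math


def sequence_wise_normalize(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature using statistics pooled over batch AND time.

    `batch` is a list of sequences; each sequence is a list of timesteps;
    each timestep is a list of feature values. Sequences may differ in length.
    `gamma`, `beta` and `eps` are the usual batch-norm scale, shift and
    stabilizer, kept as scalars here for brevity (an assumption).
    """
    frames = [frame for seq in batch for frame in seq]  # pool every timestep
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    return [
        [
            [gamma * (x - mean[d]) / math.sqrt(var[d] + eps) + beta
             for d, x in enumerate(frame)]
            for frame in seq
        ]
        for seq in batch
    ]


if __name__ == "__main__":
    # Two sequences of different length, one feature per timestep.
    example = [[[1.0], [2.0], [3.0]], [[4.0], [5.0]]]
    print(sequence_wise_normalize(example))
```

Pooling over time as well as over the minibatch means a single set of statistics per feature serves every timestep, which also handles sequences of different lengths.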

13. Parallelism across GPUs
[Diagram: Model Parallel and Data Parallel layouts; labels: MPI_Allreduce(), Training Data]
For these models, Data Parallelism works best
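
A minimal Python sketch of one data-parallel training step follows, assuming equal-sized shards and a toy one-parameter least-squares model. `allreduce_mean` is a plain-Python stand-in for a sum all-reduce followed by division by the number of workers. It does not call MPI, and all names are illustrative.

```python
def gradient(w, shard):
    """Mean gradient of 0.5 * (w * x - y) ** 2 over one shard of (x, y) pairs."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def allreduce_mean(values):
    """Stand-in for a sum all-reduce followed by division by the world size."""
    total = sum(values)
    return [total / len(values)] * len(values)  # every worker gets the same value


def data_parallel_step(w, shards, lr):
    """One SGD step: local gradients, all-reduce, identical update on each worker."""
    local = [gradient(w, shard) for shard in shards]
    averaged = allreduce_mean(local)
    return [w - lr * g for g in averaged]  # one weight replica per worker


if __name__ == "__main__":
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
    shards = [data[:2], data[2:]]  # two workers, two examples each
    print(data_parallel_step(0.0, shards, lr=0.1))
    print(0.0 - 0.1 * gradient(0.0, data))  # single-worker step on the full batch
```

With equal-sized shards the averaged gradient equals the full-batch gradient, so every replica stays in sync and the only traffic per step is the gradient exchange.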

14. Performance for RNN training
• 55% of GPU FMA peak using a single GPU
• ~48% of peak using 8 GPUs in one node
• Weak scaling very efficient, albeit algorithmically challenged
[Plot: TFLOP/s (axis ticks 1 to 512) versus Number of GPUs (axis ticks 1 to 128); series: one node, multi node; annotation: Typical training run]

15. All-reduce
• We implemented our own all-reduce out of send and receive
• Several algorithm choices based on size
• Careful attention to affinity and topology
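
The slide does not say which all-reduce algorithms were implemented. As an illustration of building one purely from neighbour-to-neighbour sends and receives, here is a Python simulation of the well-known ring all-reduce (a reduce-scatter phase followed by an all-gather phase). It runs in a single process, and the function name and list-of-lists layout are assumptions made for the sketch.

```python
def ring_allreduce(buffers):
    """Sum equal-length vectors across workers using a simulated ring.

    `buffers[r]` is worker r's vector. In each of the 2 * (n - 1) steps every
    worker sends one chunk to its right-hand neighbour; that paired
    send/receive is the only communication primitive used.
    """
    n = len(buffers)
    length = len(buffers[0])
    data = [list(b) for b in buffers]  # work on copies
    # Chunk c covers indices bounds[c] .. bounds[c + 1] - 1.
    bounds = [c * length // n for c in range(n + 1)]

    def exchange(step, offset, combine):
        # Snapshot outgoing chunks first so all sends in a step happen "at once".
        outgoing = []
        for r in range(n):
            c = (r + offset - step) % n
            outgoing.append((c, data[r][bounds[c]:bounds[c + 1]]))
        for r in range(n):
            c, payload = outgoing[r]
            dst = (r + 1) % n
            for i, v in enumerate(payload):
                j = bounds[c] + i
                data[dst][j] = data[dst][j] + v if combine else v

    for step in range(n - 1):  # reduce-scatter: accumulate partial sums
        exchange(step, 0, True)
    for step in range(n - 1):  # all-gather: circulate the finished chunks
        exchange(step, 1, False)
    return data


if __name__ == "__main__":
    print(ring_allreduce([[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]))
```

Each worker sends roughly 2 × (n − 1) / n of its buffer in total, nearly independent of the number of workers, which is why ring schemes suit large messages. Small messages are usually dominated by per-step latency instead, which is consistent with choosing the algorithm by size.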

16. Scalability
• Batch size is hard to increase – algorithm, memory limits
• Performance at small batch sizes (32, 64) leads to scalability limits

17. Precision
• FP16 also mostly works
– Use FP32 for softmax and weight updates
• More sensitive to labeling error
[Plot: Weight Distribution; Count (axis ticks 1 to 100000000) versus Magnitude (axis ticks -31 to 0)]
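
The slide gives no reason for keeping weight updates in FP32. One commonly cited reason is that an update much smaller than the weight is rounded away when the sum is stored in FP16. The Python sketch below illustrates that effect using the standard library's `struct` half-precision format. The weight of 1.0, the update of 1e-4, and the 1000 steps are made-up values.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(weight, update, steps, cast):
    """Apply `update` to `weight` `steps` times, rounding with `cast` each time."""
    w = cast(weight)
    for _ in range(steps):
        w = cast(w + cast(update))
    return w


if __name__ == "__main__":
    print(accumulate(1.0, 1e-4, 1000, to_fp16))  # weight stored in FP16
    print(accumulate(1.0, 1e-4, 1000, to_fp32))  # weight stored in FP32
```

Near 1.0, adjacent FP16 values are 2^-10 (about 0.001) apart, so each 1e-4 step rounds back to the old weight, while the FP32 copy accumulates all of the steps.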

18. Conclusion
• We have to do experiments at scale
• Pushing compute scaling for end-to-end deep learning
• Efficient training for large datasets
– 50 Teraflops/second sustained on one model
– 20 Exaflops to train each model
• Thanks to Bryan Catanzaro, Carl Case, Adam Coates for donating some slides
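
The two figures in the conclusion imply a wall-clock training time. The Python sketch below does that arithmetic, assuming "20 Exaflops" means total operations and that the 50 Teraflops/second rate is sustained for the whole run. The constant and function names are illustrative.

```python
TOTAL_OPS = 20e18        # "20 Exaflops to train each model" (total operations)
SUSTAINED_RATE = 50e12   # "50 Teraflops/second sustained" (operations per second)
SECONDS_PER_DAY = 86400


def training_time_seconds(total_ops, ops_per_second):
    """Wall-clock seconds needed at a constant sustained rate."""
    return total_ops / ops_per_second


if __name__ == "__main__":
    seconds = training_time_seconds(TOTAL_OPS, SUSTAINED_RATE)
    print(seconds, "seconds")
    print(seconds / SECONDS_PER_DAY, "days")
```

Under these assumptions a single model takes 4 × 10^5 seconds, a little under five days.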

Editor's Notes

Model Parallel: Latency sensitive
Data Parallel: Bandwidth sensitive
