Main Talk: Google's Neural Machine Translation System and Research Progress
Abstract: Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems. In this talk, I will cover the model architecture, the word-piece design, the training algorithm, and how to make training and serving faster. I may also touch on zero-shot translation with the multilingual model, and I will cover how translation research has made continuous progress since last year.
Speaker: Xiaobing Liu
Xiaobing Liu is a staff software engineer and machine learning researcher at Google Brain. His work focuses on TensorFlow and key applications where TensorFlow can be applied to improve Google products, such as Google Search, Play recommendations, Google Translate, and Medical Brain. His research interests span from systems to the practice of machine learning, and his contributions have been successfully implemented in various commercial products at Tencent, Yahoo, and Google. He has served on the program committee for ACL 2017 and as a session chair for AAAI 2017, and has publications at top conferences such as RecSys, NIPS, and ACL.
https://telecombcn-dl.github.io/2017-dlsl/
Winter School on Deep Learning for Speech and Language. UPC BarcelonaTech ETSETB TelecomBCN.
The aim of this course is to train students in methods of deep learning for speech and language. Recurrent Neural Networks (RNN) will be presented and analyzed in detail to understand the potential of these state of the art tools for time series processing. Engineering tips and scalability issues will be addressed to solve tasks such as machine translation, speech recognition, speech synthesis or question answering. Hands-on sessions will provide development skills so that attendees can become competent in contemporary data analytics tools.
https://telecombcn-dl.github.io/dlmm-2017-dcu/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
While academic research increasingly focuses on integrating deep learning approaches into machine translation, also called Neural Machine Translation, and shows promising and exciting results, the resulting systems still have important pragmatic limitations compared to the current generation of translation engines. We will discuss how SYSTRAN is integrating these new techniques into production systems, the results and benefits for end users, and our vision for the next versions.
Over the last two years, the field of Natural Language Processing (NLP) has witnessed the emergence of transfer learning methods and architectures which significantly improved upon the state of the art on virtually every NLP task.
The wide availability and ease of integration of these transfer learning models are strong indicators that these methods will become a common tool in the NLP landscape as well as a major research direction.
In this talk, I'll present a quick overview of modern transfer learning methods in NLP and review examples and case studies on how these models can be integrated and adapted in downstream NLP tasks, focusing on open-source solutions.
Website: https://fwdays.com/event/data-science-fwdays-2019/review/transfer-learning-in-nlp
DELAB - sequence generation seminar
Title
[IMPL] Neural Machine Translation
Description
Implementation of seq2seq with attention from scratch using pytorch
Table of contents
• Sequence to sequence with attention
• Data
• Tokenizer
• NMTDataset
• Model
• EncLayer
• DecCell
• DecLayer
• Utils
BERT: Bidirectional Encoder Representations from Transformers (Liangqun Lu)
BERT was developed by Google AI Language and was released in October 2018. It has achieved the best performance on many NLP tasks, so if you are interested in NLP, studying BERT is a good way to go.
[PACLING2019] Improving Context-aware Neural Machine Translation with Target-... (Hayahide Yamagishi)
This is the slide used in the oral presentation at PACLING2019.
(For Japanese speakers) This presentation is essentially the same as my master's thesis defense, so if you can read Japanese, the slides below may be easier to follow:
https://www.slideshare.net/HayahideYamagishi/ss-181147693/HayahideYamagishi/ss-181147693
The original font (Hiragino Maru Gothic Pro W4) has been replaced with a different one by SlideShare.
Presentation about Tree-LSTMs networks described in "Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks" by Kai Sheng Tai, Richard Socher, Christopher D. Manning
The transformer is the neural architecture that has received the most attention in the early 2020s. It removed the recurrence of RNNs, replacing it with an attention mechanism across the input and output tokens of a sequence (cross-attention) and between the tokens composing the input (and output) sequences, named self-attention.
https://telecombcn-dl.github.io/dlmm-2017-dcu/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
An introduction to the Transformers architecture and BERT (Suman Debnath)
The transformer is one of the most popular state-of-the-art (SOTA) deep learning architectures, mostly used for natural language processing (NLP) tasks. Ever since the advent of the transformer, it has replaced RNNs and LSTMs for various tasks. The transformer also created a major breakthrough in the field of NLP and paved the way for new revolutionary architectures such as BERT.
NLP Town's Yves Peirsman talks about how word embeddings, LSTMs/RNNs, attention, encoder-decoder architectures, and more have helped NLP move forward, including which challenges remain to be tackled (and some techniques to do just that).
For the full video of this presentation, please visit:
http://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/may-2016-embedded-vision-summit-google-keynote
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Jeff Dean, Senior Fellow at Google, presents the "Large-Scale Deep Learning for Building Intelligent Computer Systems" keynote at the May 2016 Embedded Vision Summit.
Over the past few years, Google has built two generations of large-scale computer systems for training neural networks, and then applied these systems to a wide variety of research problems that have traditionally been very difficult for computers. Google has released its second generation system, TensorFlow, as an open source project, and is now collaborating with a growing community on improving and extending its functionality. Using TensorFlow, Google's research group has made significant improvements in the state-of-the-art in many areas, and dozens of different groups at Google use it to train state-of-the-art models for speech recognition, image recognition, various visual detection tasks, language modeling, language translation, and many other tasks.
In this talk, Jeff highlights some of ways that Google trains large models quickly on large datasets, and discusses different approaches for deploying machine learning models in environments ranging from large datacenters to mobile devices. He will then discuss ways in which Google has applied this work to a variety of problems in Google's products, usually in close collaboration with other teams. This talk describes joint work with many people at Google.
Tutorial on Deep Learning in Recommender Systems, LARS Summer School 2019 (Anoop Deoras)
I had a fun time giving tutorial on the topic of deep learning in recommender systems at Latin America School on Recommender Systems (LARS) in Fortaleza, Brazil.
An Introduction to Natural Language Processing (Tyrone Systems)
Learn how Natural Language Processing in AI can be used and how it applies to you in the real world.
You can learn about NLP concepts, pre-processing steps, vectorization methods, and generative and unsupervised methods. All the resources are available for you to grow your knowledge and skills about Natural Language Processing in this webinar!
Thomas Wolf, "An Introduction to Transfer Learning and Hugging Face" (Fwdays)
In this talk I'll start by introducing the recent breakthroughs in NLP that resulted from the combination of Transfer Learning schemes and Transformer architectures. The second part of the talk will be dedicated to an introduction of the open-source tools released by Hugging Face, in particular our transformers, tokenizers, and NLP libraries as well as our distilled and pruned models.
Deep learning is one of the most exciting areas of machine learning and AI. This presentation covers all the very basics of deep neural networks, from the concept down to applications and why this technology is so popular in today's business landscape.
This presentation is provided by the Tesseract Academy, which provides executive education for deep technical subjects such as data science and blockchain. For a video of the presentation please visit https://www.youtube.com/watch?v=RiYGluH_cx0&t=0s&list=PLVce3C5Hi9BBfabvhEzYQTQDYEg2vtuxH&index=2
For an associated blog post about deep learning also visit http://thedatascientist.com/what-deep-learning-is-and-isnt/
Quick intro to Neural Machine Translation and overview of the Joey NMT toolkit.
Code: https://github.com/joeynmt/joeynmt
Demo: https://github.com/joeynmt/joeynmt/blob/master/joey_demo.ipynb
Here are the slides for the presentation that Shai Reznik and I gave at Angular Connect 2015. Our presentation is 5 minutes of meaningful content wrapped in another 20 minutes of wackiness that pokes fun at a lot of other memorable keynotes we have seen.
Beyond the Hype: 4 Years of Go in Production (C4Media)
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1SaJaeK.
Travis Reeder thinks the performance, memory, concurrency, reliability, and deployment are key to exploring Go and its value in production. Travis describes how it’s worked for Iron.io. Filmed at qconsf.com.
Travis Reeder is CTO/co-founder of Iron.io, heading up the architecture and engineering efforts. He has 15+ years of experience developing high-throughput web applications and cloud services.
This talk briefly looks back at the past year of TensorFlow development and introduces the features planned for development and adoption on TensorFlow's upcoming roadmap. It also discusses the overall direction and trends of machine learning framework development in 2017 and 2018.
Delivered at the European Patent Office's Patent Information Conference.
November 11th 2015
Miami, Florida.
In this talk, we talk about recent advances in MT for patents and introduce our IPTranslator.com application for on-demand translation.
Similar to "Moving to Neural Machine Translation at Google" (GoPro meetup):
GPUs used with Apache Spark are leveraged to speed up machine learning (ML) model training and inference. Data preparation stages are traditionally run on CPUs. The RAPIDS Accelerator for Apache Spark is a plugin jar that takes advantage of Apache Spark 3.x's ability to schedule on GPUs. The RAPIDS Accelerator replaces CPU expressions in a physical plan with GPU equivalents for dataframe operations. Code change is not required, making transition to GPUs seamless.
We'll give an overview of what the RAPIDS Accelerator is, how it works, and benefits from using the accelerator. We will discuss benchmarks showing the performance and cost benefits of leveraging GPUs for Spark ETL processing. We'll showcase a user tool that will help estimate speedups and cost savings.
Talk at SF Big Analytics https://www.meetup.com/sf-big-analytics/events/285731741/
Distributed systems are made up of many components such as authentication, a persistence layer, stateless services, load balancers, and stateful coordination services. These coordination services are central to the operation of the system, performing tasks such as maintaining system configuration state, ensuring service availability, name resolution, and storing other system metadata. Given their central role in the system, it is essential that these services remain available, fault tolerant, and consistent. By providing a highly available file-system-like abstraction as well as powerful recipes such as leader election, Apache Zookeeper is often used to implement these services. Although powerful, the Zookeeper interface may not be flexible enough or provide sufficient performance for all applications, and many systems are replacing Zookeeper-based solutions with Raft, which provides a more generic interface to high availability and fault tolerance through the use of state machine replication. This talk will go over a generic example of a stateful coordination service moving from Zookeeper to Raft.
Speaker: Tyler Crain ( Alluxio)
Tyler Crain is a software engineer at Alluxio, working on distributed systems within the Alluxio core team. Before this, Tyler held Post-Doc positions at the University of Sydney and Sorbonne Universities where he performed research on topics including distributed key-value stores, distributed consensus and blockchain. Tyler received his PhD from the University of Rennes where he worked on Transactional Memory. He also holds a Masters degree in Computer Science from University of California Santa Barbara.
talk at SF Big Analytics:
Related Blog: https://www.alluxio.io/blog/from-zookeeper-to-raft-how-alluxio-stores-file-system-state-with-high-availability-and-fault-tolerance/
SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ... (Chester Chen)
Recent years have witnessed an exponential growth of the model scale in recommendation/Ads/search, from Google's 2016 model with 1 billion parameters to Facebook's latest model with 12 trillion parameters. A significant quality boost has come with each jump in model capacity, which makes people believe the era of 100 trillion parameters is around the corner. To prepare for the exponential growth of the model size, an efficient distributed training system is urgently needed. However, the training of such huge models is challenging even within industrial-scale data centers. In this talk, I will introduce Persia, an open training system developed by my team, to resolve this challenge by careful co-design of both the optimization algorithm and the distributed system architecture. Persia admits nearly linear speedup properties while scaling the number of workers and the model size. Besides the capability of training 100 trillion parameters, it also shows a clear advantage in efficiency over other open-sourced engines.
paper link:
https://arxiv.org/pdf/2111.05897.pdf
Speaker: Ji Liu
Dr. Ji Liu received his Ph.D. in computer science from the University of Wisconsin-Madison and his bachelor's degree in automation from the University of Science and Technology of China. After graduation, he joined the University of Rochester as an assistant professor, conducting research in machine learning, optimization, and reinforcement learning. The asynchronous and decentralized algorithms he developed were widely used in industry, including at IBM and Microsoft. He left academia and joined Tencent in 2017, exploring AI's boundaries. The TStarBot AI agent developed there was considered a milestone for mastering the most challenging RTS game, StarCraft II. His second stop in industry was Kwai, the second-largest short-video company in China, where he founded and led multiple international teams with different functions: a platform team, a product team, and a research team. His team contributed to 15+% annual revenue growth in Ads. He has published 100+ papers in top-tier CS conferences and journals and received multiple best paper awards (e.g., SIGKDD 2010 and the UAI 2015 Facebook best paper). He was an awardee of the MIT TR 35 under 35 in China and an IBM faculty award in 2017, and was nominated as one of China's top 5 AI innovators under 35 in 2018.
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E... (Chester Chen)
Topic:
NVIDIA FLARE: Federated Learning Application Runtime Environment for Developing Robust AI Models
Summary:
Federated learning (FL) enables building robust and generalizable AI models by leveraging diverse datasets from multiple collaborators without moving data. We created NVIDIA FLARE as an open-source SDK to make it easier for data scientists to use FL in their research. The SDK allows existing machine learning and deep learning workflows to be adapted for distributed learning across enterprises and enables platform developers to build a secure, privacy-preserving offering for multiparty collaboration utilizing homomorphic encryption or differential privacy. The SDK is a lightweight, flexible, and scalable Python package that allows researchers to bring their data science workflows implemented in any training library (PyTorch, TensorFlow, or even NumPy) and apply them in real-world FL settings. This talk will introduce the key design principles of NVIDIA FLARE and illustrate use cases (e.g., COVID analysis) with customizable FL workflows that implement different privacy-preserving algorithms.
Speaker: Dr. Holger Roth ( Nvidia)
Holger Roth is a Sr. Applied Research Scientist at NVIDIA focusing on deep learning for medical imaging. He has been working closely with clinicians and academics over the past several years to develop deep learning based medical image computing and computer-aided detection models for radiological applications. He is an Associate Editor for IEEE Transactions of Medical Imaging and holds a Ph.D. from University College London, UK. In 2018, he was awarded the MICCAI Young Scientist Publication Impact Award.
A missing link in the ML infrastructure stack? (Chester Chen)
Talk at SF Big Analytics
Machine learning is quickly becoming a product engineering discipline. Although several new categories of infrastructure and tools have emerged to help teams turn their models into production systems, doing so is still extremely challenging for most companies. In this talk, we survey the tooling landscape and point out several parts of the machine learning lifecycle that are still underserved. We propose a new category of tool that could help alleviate these challenges and connect the fragmented production ML tooling ecosystem. We conclude by discussing similarities and differences between our proposed system and those of a few top companies.
Bio: Josh Tobin is the founder and CEO of a stealth machine learning startup. Previously, Josh worked as a deep learning & robotics researcher at OpenAI and as a management consultant at McKinsey. He is also the creator of Full Stack Deep Learning (fullstackdeeplearning.com), the first course focused on the emerging engineering discipline of production machine learning. Josh did his PhD in Computer Science at UC Berkeley advised by Pieter Abbeel.
SF Big Analytics 20191112: How to performance-tune Spark applications in larg... (Chester Chen)
Uber developed a new Spark ingestion system, Marmaray, for data ingestion from various sources. It's designed to ingest billions of Kafka messages every 30 minutes, and the amount of data handled by the pipeline is on the order of hundreds of TBs. Omkar details how to tackle such scale and gives insights into the optimization techniques. Some key highlights are how to understand bottlenecks in Spark applications, to cache or not to cache your Spark DAG to avoid rereading your input data, how to effectively use accumulators to avoid unnecessary Spark actions, how to inspect your heap and non-heap memory usage across hundreds of executors, how you can change the layout of data to save long-term storage cost, how to effectively use serializers and compression to save network and disk traffic, and how to amortize the cost of your application by multiplexing your jobs, plus different techniques for reducing memory footprint, runtime, and on-disk usage. CGI was able to significantly (~10%–40%) reduce memory footprint, runtime, and disk usage.
Speaker: Omkar Joshi (Uber)
Omkar Joshi is a senior software engineer on Uber’s Hadoop platform team, where he’s architecting Marmaray. Previously, he led object store and NFS solutions at Hedvig and was an initial contributor to Hadoop’s YARN scheduler.
SF Big Analytics 20191112: Uncovering performance regressions in the TCP SACK... (Chester Chen)
Uncovering performance regressions in the TCP SACKs vulnerability fixes
In early July 2019, Databricks noticed some Apache Spark workloads regressing by as much as 6x. In this talk, we'll discuss how we traced these regressions back to the Linux kernel and the fixes for the TCP SACKs vulnerabilities. We will explain the symptoms we were seeing, walk through how we debugged the TCP connections, and dive into the Linux source to uncover the root cause.
Speaker: Chris Stevens (Databricks)
Chris Stevens is a software engineer at Databricks, where he works on the reliability, scalability, and security of Apache Spark clusters. His work focuses on auto-scaling compute, auto-scaling storage, node initialization performance, and node health monitoring. Prior to Databricks, Chris founded the Minoca OS project, where he built a POSIX-compliant, general-purpose OS from scratch to run on resource-constrained devices. He got his start at Microsoft working on the Windows kernel team, porting the Windows boot environment from BIOS to UEFI.
SFBigAnalytics_20190724: Monitor Kafka like a Pro (Chester Chen)
Kafka operators need to provide guarantees to the business that Kafka is working properly and delivering data in real time, and they need to identify and triage problems so they can solve them before end users notice them. This elevates the importance of Kafka monitoring from a nice-to-have to an operational necessity. In this talk, Kafka operations experts Xavier Léauté and Gwen Shapira share their best practices for monitoring Kafka and the streams of events flowing through it. How to detect duplicates, catch buggy clients, and triage performance issues – in short, how to keep the business’s central nervous system healthy and humming along, all like a Kafka pro.
Speakers: Gwen Shapira, Xavier Leaute (Confluent)
Gwen is a software engineer at Confluent working on core Apache Kafka. She has 15 years of experience working with code and customers to build scalable data architectures. She currently specializes in building real-time reliable data processing pipelines using Apache Kafka. Gwen is an author of “Kafka - the Definitive Guide”, "Hadoop Application Architectures", and a frequent presenter at industry conferences. Gwen is also a committer on the Apache Kafka and Apache Sqoop projects.
Xavier Leaute, one of the first engineers on the Confluent team, is responsible for analytics infrastructure, including real-time analytics in KafkaStreams. He was previously a quantitative researcher at BlackRock. Prior to that, he held various research and analytics roles at Barclays Global Investors and MSCI.
SF Big Analytics 2019-06-12: Managing Uber's data workflows at scale (Chester Chen)
Talk 2. Managing Uber’s Data workflow at Scale.
Uber's microservices serve millions of rides a day, generating 100+ PB of data. To democratize data pipelines, Uber needed a central tool that provides a way to author, manage, schedule, and deploy data workflows at scale. This talk details Uber's journey toward a unified and scalable data workflow system used to manage this data, and shares the challenges faced and how the company has rearchitected several components of the system—such as scheduling and serialization—to make them highly available and more scalable.
Speaker Alex Kira (Uber)
Alex Kira is an engineering tech lead at Uber, where he works on the data workflow management team. His team provides a data infrastructure platform. Over his 19-year career, he has gained experience across several software disciplines, including distributed systems, data infrastructure, and full-stack development.
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ... (Chester Chen)
Building highly efficient data lakes using Apache Hudi (Incubating)
Even with the exponential growth in data volumes, ingesting, storing, and managing big data remains unstandardized and inefficient. Data lakes are a common architectural pattern to organize big data and democratize access across the organization. In this talk, we will discuss different aspects of building honest data lake architectures, pinpointing technical challenges and areas of inefficiency. We will then re-architect the data lake using Apache Hudi (Incubating), which provides streaming primitives right on top of big data. We will show how upserts & incremental change streams provided by Hudi help optimize data ingestion and ETL processing. Further, Apache Hudi manages growth and sizes the files of the resulting data lake using purely open-source file formats, also providing for optimized query performance & file system listing. We will also provide hands-on tools and guides for trying this out on your own data lake.
Speaker: Vinoth Chandar (Uber)
Vinoth is Technical Lead at Uber Data Infrastructure Team
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft (Chester Chen)
Talk 1. Scaling Apache Spark on Kubernetes at Lyft
As part of this mission, Lyft invests heavily in open source infrastructure and tooling. At Lyft, Kubernetes has emerged as the next generation of cloud native infrastructure to support a wide variety of distributed workloads. Apache Spark at Lyft has evolved to solve both Machine Learning and large-scale ETL workloads. By combining the flexibility of Kubernetes with the data processing power of Apache Spark, Lyft is able to drive ETL data processing to a different level. In this talk, we will discuss the challenges the Lyft team faced and the solutions they developed to support Apache Spark on Kubernetes in production and at scale. Topics include:
- Key traits of Apache Spark on Kubernetes
- Deep dive into Lyft's multi-cluster setup and operationality to handle petabytes of production data
- How Lyft extends and enhances Apache Spark to support capabilities such as Spark pod life cycle metrics and state management, resource prioritization, and queuing and throttling
- Dynamic job scale estimation and runtime dynamic job configuration
- How Lyft powers internal Data Scientists, Business Analysts, and Data Engineers via a multi-cluster setup
Speaker: Li Gao
Li Gao is the tech lead of the cloud native Spark compute initiative at Lyft. Prior to Lyft, Li worked at Salesforce, Fitbit, Marin Software, and a few startups in various technical leadership positions on cloud native and hybrid cloud data platforms at scale. Besides Spark, Li has scaled and productionized other open source projects, such as Presto, Apache HBase, Apache Phoenix, Apache Kafka, Apache Airflow, Apache Hive, and Apache Cassandra.
SFBigAnalytics: Hybrid data management using CDAP (Chester Chen)
Cloud has emerged as a critical enabler of digital transformation, with the aim of reducing IT overheads and costs. However, cloud migration is not instantaneous for a variety of reasons, including data sensitivity, compliance, and application performance. This results in the creation of diverse hybrid and multi-cloud environments and amplifies data management and integration challenges. This talk demonstrates how CDAP's flexibility can allow you to utilize your existing on-premises infrastructure as you evolve to the latest Big Data and Cloud services at your own pace, all while providing you a single, unified view of all your data, wherever it resides.
Speaker: Bhooshan Mogal, Google
Bhooshan Mogal is a Product Manager at Google, where he is focused on delivering best-in-class Data and Analytics services to GCP users. Prior to Google, he worked on data systems at Cask Data Inc, Pivotal and Yahoo.
Bighead: Airbnb's end-to-end machine learning platform
Airbnb has a wide variety of ML problems ranging from models on traditional structured data to models built on unstructured data such as user reviews, messages, and listing images. The ability to build, iterate on, and maintain healthy machine learning models is critical to Airbnb's success. Bighead aims to tie together various open source and in-house projects to remove incidental complexity from ML workflows. Bighead is built on Python, Spark, and Kubernetes. The components include a lifecycle management service, an offline training and inference engine, an online inference service, a prototyping environment, and a Docker image customization tool. Each component can be used individually. In addition, Bighead includes a unified model building API that smoothly integrates popular libraries including TensorFlow, XGBoost, and PyTorch. Each model is reproducible and iterable through standardization of data collection and transformation, model training environments, and production deployment. This talk covers the architecture, the problems that each individual component and the overall system aim to solve, and a vision for the future of machine learning infrastructure. It's widely adopted at Airbnb, and we have a variety of models running in production. We plan to open source Bighead to allow the wider community to benefit from our work.
Speaker: Andrew Hoh
Andrew Hoh is the Product Manager for the ML Infrastructure and Applied ML teams at Airbnb. Previously, he has spent time building and growing Microsoft Azure's NoSQL distributed database. He holds a degree in computer science from Dartmouth College.
SF Big Analytics_2018_04_18: Evolution of GoPro's data platform (Chester Chen)
Talk 1: Evolution of GoPro's data platform
In this talk, we will share GoPro's experiences in building a Data Analytics Cluster in the Cloud. We will discuss:
evolution of the data platform from fixed-size Hadoop clusters to a Cloud-based Spark cluster with a centralized Hive Metastore + S3: cost benefits and DevOps impact;
a configurable, Spark-based batch ingestion/ETL framework;
migration of the streaming framework to Cloud + S3;
analytics metrics delivery with Slack integration;
BedRock: Data Platform Management, Visualization & Self-Service Portal;
visualizing Machine Learning features via Google Facets + Spark
Speakers: Chester Chen
Chester Chen is the Head of Data Science & Engineering, GoPro. Previously, he was the Director of Engineering at Alpine Data Lab.
David Winters
David is an Architect on the Data Science and Engineering team at GoPro and the creator of their Spark-Kafka data ingestion pipeline. Previously he worked at Apple & Splice Machines.
Hao Zou
Hao is a senior big data engineer on the Data Science and Engineering team. Previously he worked at Alpine Data Labs and Pivotal.
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl... (Chester Chen)
GoPro’s camera, drone, mobile devices as well as web, desktop applications are generating billions of event logs. The analytics metrics and insights that inform product, engineering, and marketing team decisions need to be distributed quickly and efficiently. We need to visualize the metrics to find the trends or anomalies.
While trying to building up the features store for machine learning, we need to visualize the features, Google Facets is an excellent project for visualizing features. But can we visualize larger feature dataset?
These are issues we encounter at GoPro as part of the data platform evolution. In this talk, we will discuss few of the progress we made at GoPro. We will talk about how to use Slack + Plot.ly to delivery analytics metrics and visualization. And we will also discuss our work to visualize large feature set using Google Facets with Apache Spark.
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
As Europe's leading economic powerhouse and the fourth-largest #economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like #Russia and #China, #Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in #cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to #AdvancedPersistentThreats (#APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
2. Growing Use of Deep Learning at Google
Android
Apps
GMail
Image Understanding
Maps
NLP
Photos
Speech
Translation
many research uses..
YouTube
… many others ...
Across many products/areas (chart: number of directories containing model description files)
3. Why we care about translations
● 50% of Internet content is in English.
● Only 20% of the world's population speaks English.
To make the world's information accessible, we need translations.
4. Google Translate, a truly global product...
● 1B+ monthly active users
● 1B+ translations every single day, that is 140 billion words
● 103 languages, covering 99% of the online population
5. Agenda
● Quick History
● From Sequence to Sequence-to-Sequence Models
● BNMT (Brain Neural Machine Translation)
○ Architecture & Training
○ TPU and Quantization
● Multilingual Models
● What’s next?
6. Quick Research History
● Various people at Google tried to improve translation with neural networks
○ Brain team, Translate team
7. Quick Research History
● Various people at Google tried to improve translation with neural networks
○ Brain team, Translate team
● Sequence-To-Sequence models (NIPS 2014)
○ Based on many earlier approaches to estimate P(Y|X) directly
○ State-of-the-art on WMT En->Fr using custom software, very long training
○ Translation could be learned without explicit alignment!
○ Drawback: all information needs to be carried in internal state
■ Translation breaks down for long sentences!
8. Quick Research History
● Various people at Google tried to improve translation with neural networks
○ Brain team, Translate team
● Sequence-To-Sequence models (NIPS 2014)
○ Based on many earlier approaches to estimate P(Y|X) directly
○ State-of-the-art on WMT En->Fr using custom software, very long training
○ Translation could be learned without explicit alignment!
○ Drawback: all information needs to be carried in internal state
■ Translation breaks down for long sentences!
● Attention Models (2014)
○ Removes drawback by giving access to all encoder states
■ Translation quality is now independent of sentence length!
9. Old vs. New
Old: Phrase-based translation (a pipeline starting with a preprocess step)
● Lots of individual pieces
● Optimized somewhat independently
New: Neural machine translation (a single neural network)
● End-to-end learning
● Simpler architecture
● Plus results are much better!
10. Expected time to launch: 3 years. Actual time to launch: 13.5 months.
● Sept 2015: Began project using TensorFlow
● Feb 2016: First production data results
● Sept 2016: zh->en launched
● Nov 2016: 8 languages launched (16 pairs to/from English)
● Mar 2017: 7 more launched (Hindi, Russian, Vietnamese, Thai, Polish, Arabic, Hebrew)
● Apr 2017: 26 more launched (16 European, 8 Indian, Indonesian, Afrikaans)
● Jun/Aug 2017: 36/20 more launched
● 97 launched!
11. Original
Kilimanjaro is a snow-covered mountain 19,710 feet high, and is said to be the highest mountain in Africa. Its western summit is called the Masai "Ngaje Ngai," the House of God. Close to the western summit there is the dried and frozen carcass of a leopard. No one has explained what the leopard was seeking at that altitude.
12. Original
Kilimanjaro is a snow-covered mountain 19,710 feet high, and is said to be the highest mountain in Africa. Its western summit is called the Masai "Ngaje Ngai," the House of God. Close to the western summit there is the dried and frozen carcass of a leopard. No one has explained what the leopard was seeking at that altitude.
Back translation from Japanese (old)
Kilimanjaro is 19,710 feet of the mountain covered with snow, and it is said that the highest mountain in Africa. Top of the west, "Ngaje Ngai" in the Maasai language, has been referred to as the house of God. The top close to the west, there is a dry, frozen carcass of a leopard. Whether the leopard had what the demand at that altitude, there is no that nobody explained.
13. Original
Kilimanjaro is a snow-covered mountain 19,710 feet high, and is said to be the highest mountain in Africa. Its western summit is called the Masai "Ngaje Ngai," the House of God. Close to the western summit there is the dried and frozen carcass of a leopard. No one has explained what the leopard was seeking at that altitude.
Back translation from Japanese (old)
Kilimanjaro is 19,710 feet of the mountain covered with snow, and it is said that the highest mountain in Africa. Top of the west, "Ngaje Ngai" in the Maasai language, has been referred to as the house of God. The top close to the west, there is a dry, frozen carcass of a leopard. Whether the leopard had what the demand at that altitude, there is no that nobody explained.
Back translation from Japanese (new)
Kilimanjaro is a mountain of 19,710 feet covered with snow, which is said to be the highest mountain in Africa. The summit of the west is called "Ngaje Ngai" God's house in Masai language. There is a dried and frozen carcass of a leopard near the summit of the west. No one can explain what the leopard was seeking at that altitude.
14. Translation quality (human side-by-side ratings; 0 = worst translation, 6 = perfect translation)
● Chinese to English improved by +0.6, a significant change, and launchable
● Almost all language pairs improved by >0.1, many by >0.5
● Zh/Ja/Ko/Tr to English improved by 0.6-1.5
● Asian languages improved the most
● Some improvements as big as the last 10 years of improvements combined
17. Neural Recurrent Sequence Models
● Predict next token: P(Y) = P(Y1) * P(Y2|Y1) * P(Y3|Y1,Y2) * ...
○ Language Models, state-of-the-art on public benchmarks
■ "Exploring the Limits of Language Modeling"
(Diagram: the network reads EOS, Y1, Y2, Y3 and predicts Y1, Y2, Y3, EOS at each step.)
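To make the factorization concrete, here is a minimal sketch of a recurrent next-token predictor; it is illustrative only (the vocabulary size, dimensions, and class names are assumptions, not the production BNMT configuration):

```python
# Minimal autoregressive LM sketch: maximizes log P(Y) = sum_t log P(Y_t | Y_<t).
# All sizes below are illustrative assumptions.
import torch
import torch.nn as nn

class RNNLanguageModel(nn.Module):
    def __init__(self, vocab_size=32000, emb_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        # tokens: (batch, time); position t predicts token t+1
        hidden, _ = self.rnn(self.embed(tokens))
        return self.proj(hidden)  # (batch, time, vocab) logits

model = RNNLanguageModel()
tokens = torch.randint(0, 32000, (8, 20))        # toy batch of token ids
logits = model(tokens[:, :-1])                   # inputs: EOS, Y1, ..., Y_{T-1}
loss = nn.functional.cross_entropy(              # targets: Y1, ..., EOS
    logits.reshape(-1, 32000), tokens[:, 1:].reshape(-1))
```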
18. Sequence to Sequence
● Learn to map: X1, X2, EOS -> Y1, Y2, Y3, EOS
● Encoder/Decoder framework (the decoder by itself is just a neural LM)
● Theoretically works for any input/output sequence length
(Diagram: the encoder reads X1, X2, EOS; the decoder then emits Y1, Y2, Y3, EOS.)
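A minimal encoder/decoder sketch under the same toy assumptions; the only channel from source to target here is the encoder's final state, which is exactly the bottleneck the attention mechanism on the next slide removes:

```python
# Sketch of the encoder/decoder framework: the decoder is a neural LM
# conditioned on the encoder's final state. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, vocab_size=32000, emb_dim=256, hidden_dim=512):
        super().__init__()
        self.src_embed = nn.Embedding(vocab_size, emb_dim)
        self.tgt_embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, src, tgt_in):
        _, final_state = self.encoder(self.src_embed(src))
        # All source information must be carried in final_state (the bottleneck).
        dec_out, _ = self.decoder(self.tgt_embed(tgt_in), final_state)
        return self.proj(dec_out)   # next-token logits for Y1 .. EOS

out = Seq2Seq()(torch.randint(0, 32000, (4, 12)), torch.randint(0, 32000, (4, 9)))
```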
20. Attention Mechanism
● Addresses the information bottleneck problem
○ All encoder states accessible instead of only the final one
(Diagram: an attention weight e_ij links each decoder state T_i to each encoder state S_j.)
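A sketch of an additive attention score e_ij between decoder state T_i and encoder state S_j, as in the diagram; the layer sizes and names are illustrative assumptions rather than GNMT's exact configuration:

```python
# Additive attention sketch: e_ij = v^T tanh(W_enc * S_j + W_dec * T_i),
# softmax over source positions, then a weighted sum of encoder states.
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    def __init__(self, enc_dim=512, dec_dim=512, attn_dim=512):
        super().__init__()
        self.w_enc = nn.Linear(enc_dim, attn_dim, bias=False)
        self.w_dec = nn.Linear(dec_dim, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, enc_states, dec_state):
        # enc_states: (batch, src_len, enc_dim); dec_state: (batch, dec_dim)
        scores = self.v(torch.tanh(
            self.w_enc(enc_states) + self.w_dec(dec_state).unsqueeze(1)
        )).squeeze(-1)                          # e_ij: (batch, src_len)
        weights = torch.softmax(scores, dim=-1)
        context = (weights.unsqueeze(-1) * enc_states).sum(dim=1)
        return context, weights                 # all encoder states accessible
```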
28. Model Training
● Runs on ~100 GPUs (12 replicas, 8 GPUs each)
○ Because softmax size only 32k, can be fully calculated (no sampling or HSM)
29. Model Training
● Runs on ~100 GPUs (12 replicas, 8 GPUs each)
○ Because softmax size only 32k, can be fully calculated (no sampling or HSM)
● Optimization
○ Combination of Adam & SGD with delayed exponential decay
○ 128/256 sentence pairs combined into one batch (run in one ‘step’)
30. Model Training
● Runs on ~100 GPUs (12 replicas, 8 GPUs each)
○ Because softmax size only 32k, can be fully calculated (no sampling or HSM)
● Optimization
○ Combination of Adam & SGD with delayed exponential decay
○ 128/256 sentence pairs combined into one batch (run in one ‘step’)
● Training time
○ ~1 week for 2.5M steps = ~300M sentence pairs
○ For example, on English->French we use only 15% of available data!
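As a rough illustration of the recipe above, here is a hedged sketch of combining Adam with SGD under a delayed exponential decay; the switch step, learning rates, and decay constants are invented for illustration, since the slides give no exact values:

```python
# Hypothetical Adam-then-SGD schedule with delayed exponential decay.
# Every constant below is an assumption, not a published GNMT value.
import torch
import torch.nn as nn

model = nn.Linear(512, 512)                 # stand-in for the NMT model
adam = torch.optim.Adam(model.parameters(), lr=2e-4)
sgd = torch.optim.SGD(model.parameters(), lr=0.5)

SWITCH_STEP = 60_000        # switch Adam -> SGD here (assumption)
DECAY_START = 800_000       # decay is delayed until this step (assumption)
DECAY_EVERY, DECAY_RATE = 100_000, 0.5

def optimizer_step(global_step):
    opt = adam if global_step < SWITCH_STEP else sgd
    if global_step >= DECAY_START:          # delayed exponential decay
        n = (global_step - DECAY_START) // DECAY_EVERY + 1
        base = 2e-4 if opt is adam else 0.5
        for group in opt.param_groups:
            group["lr"] = base * DECAY_RATE ** n
    opt.step()
    opt.zero_grad()
```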
31. Model + Data Parallelism
(Diagram: many model replicas train in parallel, with the parameters distributed across many parameter server machines.)
32. Speed matters. A lot.
● Decoding went from 10 seconds/sentence to 0.2 seconds/sentence in 2 months
● Users care about speed
● Better algorithms and hardware (TPUs) made it possible
38. Multilingual Model
● Model several language pairs in a single model
○ We ran first experiments in 2/2016; surprisingly, this worked!
● Prepend the source with an additional token to indicate the target language
○ Translate to Spanish:
■ <2es> How are you </s> -> Cómo estás </s>
○ Translate to English:
■ <2en> Cómo estás </s> -> How are you </s>
39. Multilingual Model
● Model several language pairs in a single model
○ We ran first experiments in 2/2016; surprisingly, this worked!
● Prepend the source with an additional token to indicate the target language
○ Translate to Spanish:
■ <2es> How are you </s> -> Cómo estás </s>
○ Translate to English:
■ <2en> Cómo estás </s> -> How are you </s>
● No other changes to the model architecture!
○ Extremely simple and effective
○ Usually with a shared WPM (word-piece model) for source/target
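The preprocessing change really is that small. A minimal sketch, with a toy whitespace tokenizer standing in for the shared word-piece model:

```python
# Prepend a target-language token to the source; nothing else changes.
# The whitespace tokenizer is an assumption standing in for the real WPM.
def to_multilingual_example(source: str, target_lang: str) -> list[str]:
    return [f"<2{target_lang}>"] + source.split() + ["</s>"]

print(to_multilingual_example("How are you", "es"))
# ['<2es>', 'How', 'are', 'you', '</s>']  -> trained to emit "Cómo estás </s>"
```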
42. Multilingual Model and Zero-Shot Translation
1. Translation with a multilingual En<->Es model:
<2es> How are you </s> -> Cómo estás </s>
<2en> Cómo estás </s> -> How are you </s>
BLEU, Single vs. Multi:
En->Es: 34.5 vs. 35.1
Es->En: 38.0 vs. 37.3
43. Multilingual Model and Zero-Shot Translation
2. Zero-shot (pt->es) with a multilingual model trained on En->Es and Pt->En:
<2es> Como você está </s> -> Cómo estás </s>
BLEU, Single vs. Multi:
En->Es: 34.5 vs. 35.0
Pt->En: 44.5 vs. 43.7
Zero-shot pt->es: 23.0 BLEU
44. Multilingual Model and Zero-Shot Translation
3. Zero-shot (pt->es) with a multilingual model trained on En<->Es and En<->Pt:
BLEU, Single vs. Multi:
En->Es: 34.5 vs. 34.9
Es->En: 38.0 vs. 37.2
En->Pt: 37.1 vs. 37.8
Pt->En: 44.5 vs. 43.7
Zero-shot pt->es: 24.0 BLEU (vs. 23.0 BLEU for the model in 2.)
47. Challenges
● Early cutoff
○ Cuts off or drops some words in source sentence
● Broken dates/numbers
○ 5 days ago -> 6일 전
○ (but, on average BNMT significantly better than PBMT on number expressions!)
48. Challenges
● Early cutoff
○ Cuts off or drops some words in source sentence
● Broken dates/numbers
○ 5 days ago -> 6일 전
○ (but, on average BNMT significantly better than PBMT on number expressions!)
● Short rare queries
○ “deoxyribonucleic acid” in Japanese ?
○ Should be easy but isn’t yet
49. Challenges
● Early cutoff
○ Cuts off or drops some words in source sentence
● Broken dates/numbers
○ 5 days ago -> 6일 전
○ (but, on average BNMT significantly better than PBMT on number expressions!)
● Short rare queries
○ “deoxyribonucleic acid” in Japanese ?
○ Should be easy but isn’t yet
● Transliteration of names
○ Eichelbergertown -> アイセルベルクタウン
50. Challenges
● Early cutoff
○ Cuts off or drops some words in source sentence
● Broken dates/numbers
○ 5 days ago -> 6일 전
○ (but, on average BNMT significantly better than PBMT on number expressions!)
● Short rare queries
○ “deoxyribonucleic acid” in Japanese ?
○ Should be easy but isn’t yet
● Transliteration of names
○ Eichelbergertown -> アイセルベルクタウン
● Junk
○ xxx -> 牛津词典 (Oxford dictionary)
○ The cat is a good computer. -> 的英语翻译 (of the English language?)
○ Many sentences containing news started with “Reuters”
51. Open Research Problems
● Use of context
○ Full document translation, streaming translation
○ Use other modalities & features
52. Open Research Problems
● Use of context
○ Full document translation, streaming translation
○ Use other modalities & features
● Better automatic measures & objective functions
○ Current BLEU score weighs all words the same regardless of meaning
■ ‘president’ mostly more important than ‘the’
○ Discriminative training
■ Training with Maximum Likelihood produces mismatched training/test procedure!
● No decoding errors for maximum-likelihood training
■ RL (and similar) already running, but gains not yet significant enough
● Humans see no difference in the results
53. Open Research Problems
● Use of context
○ Full document translation, streaming translation
○ Use other modalities & features
● Better automatic measures & objective functions
○ Current BLEU score weighs all words the same regardless of meaning
■ ‘president’ mostly more important than ‘the’
○ Discriminative training
■ Training with Maximum Likelihood produces mismatched training/test procedure!
● No decoding errors for maximum-likelihood training
■ RL (and similar) already running, but gains not yet significant enough
● Humans see no difference in the results
● Lots of improvements are boring to do!
○ Because they are incremental (but still have to be done)
○ Data cleaning, new test sets etc.
54. What's next from research?
● Convolutional sequence-to-sequence models
○ No recurrence, just windows over input with shared parameters
○ Encoder can be computed in parallel => faster
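As a toy illustration of the windowed, parallel encoder idea (the dimensions and layer stack are assumptions, not the ConvS2S paper's exact design):

```python
# A 1-D convolution sees a fixed window around each position with shared
# weights, so every encoder position can be computed in parallel.
import torch
import torch.nn as nn

conv_encoder = nn.Sequential(
    nn.Conv1d(in_channels=256, out_channels=512, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv1d(512, 512, kernel_size=3, padding=1),  # stacking widens the window
    nn.ReLU(),
)

embeddings = torch.randn(8, 256, 40)   # (batch, emb_dim, src_len)
states = conv_encoder(embeddings)      # (batch, 512, src_len), no recurrence
```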
56. Outrageously Large Neural Networks: The Sparsely-gated Mixture-of-Experts Layer,
Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le & Jeff Dean
Submitted to ICLR 2017, https://openreview.net/pdf?id=B1ckMDqlg
Per-Example Routing
57. Attention Is All You Need
Transformer: autoregressive pure attention model.
Previous-output attention: masked batch matmul
Also great on parsing, try it on your task!
Translation Model | Training time | BLEU (diff. from MOSES)
Transformer (large) | 3 days on 8 GPUs | 28.4 (+7.8)
Transformer (small) | 1 day on 1 GPU | 24.9 (+4.3)
GNMT + Mixture of Experts | 1 day on 64 GPUs | 26.0 (+5.4)
ConvS2S (FB) | 18 days on 1 GPU | 25.1 (+4.5)
GNMT | 1 day on 96 GPUs | 24.6 (+4.0)
MOSES (phrase-based) | N/A | 20.6 (+0.0)
Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, https://arxiv.org/abs/1706.03762
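The "masked batch matmul" bullet above can be made concrete with a short sketch of causally masked scaled dot-product self-attention; shapes and names are illustrative assumptions:

```python
# Each target position may attend only to earlier positions, which keeps
# the model autoregressive while computing all positions in one matmul.
import torch

def masked_self_attention(q, k, v):
    # q, k, v: (batch, time, dim)
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5          # (batch, time, time)
    t = scores.size(-1)
    causal = torch.tril(torch.ones(t, t, dtype=torch.bool))
    scores = scores.masked_fill(~causal, float("-inf"))  # hide future tokens
    return torch.softmax(scores, dim=-1) @ v

x = torch.randn(2, 5, 64)
out = masked_self_attention(x, x, x)   # no position peeks ahead of itself
```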
58. BNMT for other projects
Other projects are using the same codebase for completely different problems (in Search, Google Assistant, ...):
● Question/answering system (chat bots)
● Summarization
● Dialog modeling
● Generate question from query
● …
59. Resources
● TensorFlow (www.tensorflow.org)
○ Code/Bugs on GitHub
○ Help on StackOverflow
○ Discussion on mailing list
● All information about BNMT is in these papers & blog posts
○ Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
○ Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation
● NYT article describes some of the development
○ The Great AI Awakening
● Internship & Residency
○ 3-month internships possible
○ 1-year residency program g.co/brainresidency
61. Decoding Sequence Models
● Find the N-best highest-probability output sequences
○ Take the K-best Y1 and feed them in one-by-one, generating K hypotheses
○ Take the K-best Y2 for each of the hyps, generating K^2 new hyps (a tree), etc.
○ At each step, cut the hyps back to the N-best (or by score) until the end
Example with N=2, K=2: expanding Y1a/Y1b with Y2a/Y2b gives the ranked hypotheses 1. Y1a Y2a …, 2. Y1a Y2b …, 3. Y1b Y2a …, 4. Y1b Y2b ...
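A minimal beam-search sketch following this recipe; log_prob_fn is a hypothetical stand-in for the model's next-token log-probabilities given a prefix:

```python
# Keep the N best hypotheses, expand each with the K best next tokens, re-prune.
import torch

def beam_search(log_prob_fn, bos_id, eos_id, n_best=2, k=2, max_len=20):
    hyps = [([bos_id], 0.0)]                       # (tokens, log P)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in hyps:
            log_probs = log_prob_fn(tokens)        # (vocab,) tensor
            top = torch.topk(log_probs, k)
            for lp, tok in zip(top.values, top.indices):
                candidates.append((tokens + [tok.item()], score + lp.item()))
        candidates.sort(key=lambda h: h[1], reverse=True)
        hyps = []
        for tokens, score in candidates:           # cut back to the N best
            (finished if tokens[-1] == eos_id else hyps).append((tokens, score))
            if len(hyps) >= n_best:
                break
        if not hyps:
            break
    return sorted(finished + hyps, key=lambda h: h[1], reverse=True)[:n_best]

# Toy usage with a fixed distribution over a 5-token vocabulary:
print(beam_search(lambda toks: torch.tensor([0.1, 0.3, 0.3, 0.2, 0.1]).log(),
                  bos_id=0, eos_id=4))
```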
62. Sampling from Sequence Models
● Generate samples of sequences
a. Generate the probability distribution P(Y1)
b. Sample from P(Y1) according to its probabilities
c. Feed the sampled token back in as input; go to a)
(Diagram: the network reads EOS, Y1, Y2, Y3 and emits Y1, Y2, Y3, EOS.)
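A minimal sketch of that loop; next_token_probs is a hypothetical stand-in for the model's conditional distribution over the vocabulary:

```python
# Sample Y1 from P(Y1), feed it back in, and repeat until EOS or max length.
import torch

def sample_sequence(next_token_probs, bos_id, eos_id, max_len=20):
    tokens = [bos_id]
    for _ in range(max_len):
        probs = next_token_probs(tokens)          # (vocab,) distribution
        tok = torch.multinomial(probs, num_samples=1).item()
        tokens.append(tok)                        # feed the sample back in
        if tok == eos_id:
            break
    return tokens

# Toy usage: a fixed distribution over a 5-token vocabulary.
print(sample_sequence(lambda t: torch.tensor([0.1, 0.3, 0.3, 0.2, 0.1]),
                      bos_id=0, eos_id=4))
```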