BERT is a technique for pre-training deep bidirectional representations from unlabeled text by using a Transformer encoder. It can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of natural language processing tasks, including question answering and text classification. The presentation provides an overview of what BERT is, how it works through pre-training and fine-tuning, and example tasks it can be applied to such as sentence classification, question answering, and named entity recognition.
Towards Light-weight and Real-time Line Segment Detection (Byung Soo Ko)
This is presentation material for the paper "Towards Light-weight and Real-time Line Segment Detection".
Written by Geonmo Gu*, Byungsoo Ko*, SeoungHyun Go, Sung-Hyun Lee, Jingeun Lee, Minchul Shin (* Authors contributed equally.)
@NAVER/LINE Vision
- Arxiv: https://arxiv.org/abs/2106.00186
- Github: https://github.com/navervision/mlsd
This document provides an overview of deep learning basics for natural language processing (NLP). It discusses the differences between classical machine learning and deep learning, and describes several deep learning models commonly used in NLP, including neural networks, recurrent neural networks (RNNs), encoder-decoder models, and attention models. It also provides examples of how these models can be applied to tasks like machine translation, where two RNNs are jointly trained on parallel text corpora in different languages to learn a translation model.
RNN AND LSTM
This document provides an overview of RNNs and LSTMs:
1. RNNs can process sequential data like time series data using internal hidden states.
2. LSTMs are a type of RNN that use memory cells to store information for long periods of time.
3. LSTMs have input, forget, and output gates that control information flow into and out of the memory cell.
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis (taeseon ryu)
This paper presents a 3D-aware model. With StyleGAN, when you want to edit a single feature, you find the latent vector corresponding to the input and modify that latent vector, which lets you change, for example, the mouth.
Building on that idea, the GANSpace paper tried to edit spatial information as well when given an input. Looking at the results, the rotation information seems reasonably well learned, but the edited output is sometimes perceived as a different person. This is what we mean by not being disentangled: instead of changing only the desired feature, other features change along with it. This paper was created to address that problem and make the model understand 3D more effectively.
This presentation will help you understand the basic concepts of Natural Language Processing. With it, you will also understand the significance of Natural Language Processing in our daily lives.
Natural Language Processing (NLP) is a field of artificial intelligence that deals with interactions between computers and human languages. NLP aims to program computers to process and analyze large amounts of natural language data. Some common NLP tasks include speech recognition, text classification, machine translation, question answering, and more. Popular NLP tools include Stanford CoreNLP, NLTK, OpenNLP, and TextBlob. Vectorization is commonly used to represent text in a way that can be used for machine learning algorithms like calculating text similarity. Tf-idf is a common technique used to weigh words based on their frequency and importance.
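As a quick, hedged illustration of the tf-idf weighting and text-similarity idea mentioned above (the toy corpus below is invented for the example, not taken from the presentation), here is a minimal scikit-learn sketch:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus; a real NLP pipeline would use far more documents.
docs = [
    "natural language processing with python",
    "machine translation is a natural language processing task",
    "speech recognition converts audio to text",
]

# Tf-idf weighs each term by its in-document frequency and its rarity across documents.
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)

# Cosine similarity between tf-idf vectors is a common text-similarity measure.
print(cosine_similarity(tfidf[0], tfidf[1]))
```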
[Paper Reading] Attention is All You Need (Daiki Tanaka)
The document summarizes the "Attention Is All You Need" paper, which introduced the Transformer model for natural language processing. The Transformer uses attention mechanisms rather than recurrent or convolutional layers, allowing for more parallelization. It achieved state-of-the-art results in machine translation tasks using techniques like multi-head attention, positional encoding, and beam search decoding. The paper demonstrated the Transformer's ability to draw global dependencies between input and output with constant computational complexity.
Presented September 30, 2009 in San Jose, California at GPU Technology Conference.
Describes the new features of OpenGL 3.2 and NVIDIA's extensions beyond 3.2 such as bindless graphics, direct state access, separate shader objects, copy image, texture barrier, and Cg 2.2.
Natural Language Processing (NLP) & Text Mining Tutorial Using NLTK | NLP Tra... (Edureka!)
** NLP Using Python: - https://www.edureka.co/python-natural-language-processing-course **
This Edureka PPT will provide you with a comprehensive and detailed knowledge of Natural Language Processing, popularly known as NLP. You will also learn about the different steps involved in processing the human language like Tokenization, Stemming, Lemmatization and much more along with a demo on each one of the topics.
The following topics are covered in this PPT:
1. The Evolution of Human Language
2. What is Text Mining?
3. What is Natural Language Processing?
4. Applications of NLP
5. NLP Components and Demo
Follow us to never miss an update in the future.
Instagram: https://www.instagram.com/edureka_learning/
Facebook: https://www.facebook.com/edurekaIN/
Twitter: https://twitter.com/edurekain
LinkedIn: https://www.linkedin.com/company/edureka
The document describes the development of the attention mechanism in machine translation. It presents a fictional conversation between an encoder and decoder discussing the limitations of early sequence-to-sequence models that represent the entire input with a fixed-size context vector. The decoder proposes sending each token's representation separately and calling them "keys". It then sends its own query to attend to the most relevant keys, inspired by search queries. This leads to the introduction of dot product attention to calculate similarity scores between the query and keys.
This document summarizes orthogonal matching pursuit (OMP) and K-SVD, which are algorithms for sparse encoding of signals using dictionaries. OMP is a greedy algorithm that selects atoms from an overcomplete dictionary to sparsely represent a signal. It uses an orthogonal projection to the residual to ensure selected atoms are not reselected. K-SVD learns an optimized dictionary for sparse encoding by iteratively sparse encoding training data and updating dictionary atoms to minimize representation error.
Introduction for seq2seq (sequence to sequence) and RNN (Hye-min Ahn)
These are my slides introducing the sequence-to-sequence model and Recurrent Neural Networks (RNN) to my laboratory colleagues.
Hyemin Ahn, @CPSLAB, Seoul National University (SNU)
This document discusses TextRank, a graph-based algorithm for automatic text summarization. TextRank was inspired by PageRank and works by representing sentences as nodes in a graph and calculating similarity between sentences. It segments the text into sentences, represents each sentence as a vector, calculates similarity between sentences to construct a graph, runs the graph-based algorithm to score sentences, and selects the top sentences as the summary. TextRank is an unsupervised extractive summarization approach that does not require domain or linguistic knowledge.
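The steps described above can be sketched in a few lines; the following is a minimal illustration (not the presenter's code) that assumes tf-idf sentence vectors, cosine similarity, and networkx's PageRank:

```python
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def textrank_summary(sentences, top_n=2):
    # Represent each sentence as a tf-idf vector and build a similarity graph.
    tfidf = TfidfVectorizer().fit_transform(sentences)
    sim = cosine_similarity(tfidf)
    graph = nx.from_numpy_array(sim)
    # PageRank over the similarity graph scores each sentence node.
    scores = nx.pagerank(graph)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    # Return the top-scoring sentences, in their original order, as the summary.
    return [sentences[i] for i in sorted(ranked[:top_n])]
```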
This paper introduces auto-encoding variational Bayes, a generative modeling technique that allows for efficient and scalable approximate inference. The method uses variational inference within the framework of autoencoders to learn the posterior distribution over latent variables. It approximates the intractable true posterior with a recognition model conditioned on the observations. The parameters are estimated by maximizing an evidence lower bound derived using Jensen's inequality, which allows backpropagation to learn the generative and inference models jointly and efficiently. The technique was demonstrated on density estimation tasks with MNIST data.
The document summarizes the Pregel paper, which introduces a system for large-scale graph processing. Pregel presents a programming model where computation proceeds through supersteps and messages are passed between vertices. It allows for writing graph algorithms in a simple, scalable, and fault-tolerant way. The paper describes Pregel's API, architecture, applications including PageRank and shortest paths, and experimental results showing it can process graphs with billions of vertices and edges.
The document discusses a model called C2AE (Class Conditioned Auto-Encoder) for open-set recognition. C2AE is an auto-encoder trained with a conditional decoder to learn class-specific representations, allowing it to detect unknown classes during open-set identification. The document examines what threshold scores and operating points work best for open-set identification tasks using C2AE, which models unknown class likelihoods through extreme value theory modeling during conditional decoder training.
For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/sep-2019-alliance-vitf-facebook
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Raghuraman Krishnamoorthi, Software Engineer at Facebook, delivers the presentation "Quantizing Deep Networks for Efficient Inference at the Edge" at the Embedded Vision Alliance's September 2019 Vision Industry and Technology Forum. Krishnamoorthi gives an overview of practical deep neural network quantization techniques and tools.
Functions in JavaScript create a unique execution context each time they are called. The execution context contains an environment record and a variable environment. When a function is defined, it is associated with the lexical environment of the context where it was defined. This means that nested functions have access to variables from outer scopes. Arrow functions lexically bind the value of 'this' from the enclosing context.
The document discusses abstracting loops using generators. It shows how generators can abstract the structure of loops to make them iterable with for-of. This allows composite patterns with multiple nested loops to all be abstracted and exposed via for-of. It also discusses lazy evaluation of loops using generators to delay running loops until needed and avoid overhead up front. Examples show filtering, mapping and chaining these operations lazily on generated iterators.
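The talk itself is about JavaScript, but the lazy, chainable iteration it describes can be sketched with Python generators (the function names below are illustrative):

```python
def numbers(limit):
    # The loop body runs lazily: nothing executes until the iterator is consumed.
    for n in range(limit):
        yield n

def evens(xs):
    for x in xs:
        if x % 2 == 0:
            yield x

def squares(xs):
    for x in xs:
        yield x * x

# Filtering and mapping are chained lazily; work happens only as values are pulled.
pipeline = squares(evens(numbers(10)))
for value in pipeline:
    print(value)  # 0, 4, 16, 36, 64
```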
This document discusses mobile web debugging. It describes Park Jae-sung's background developing the Jindo framework at Naver Labs. It then provides tips on using Weinre and Chrome DevTools for remote debugging of webviews on Android devices.
[D2 COMMUNITY] Open Container Seoul Meetup - Building services with Kubernetes and OpenShift (NAVER D2)
Junho Lee is a Solutions Architect who has worked at Rockplace Inc. since 2014. The document compares Kubernetes (k8s), OpenShift, and Google Kubernetes Engine (GKE). k8s is an open-source container cluster manager originally designed by Google. OpenShift is Red Hat's container application platform based on k8s. GKE provides k8s clusters on Google Cloud Platform. Both OpenShift and GKE add services on top of k8s like app stores, logging, monitoring and technical support. The document outlines the key components, architectures and capabilities of each platform.
The document discusses various ways to structure microservices using different technologies like React.js, Clojure, and Golang. It provides examples of Dockerfile configurations to daemonize or run microservices that incorporate a React.js or Clojure frontend with a Golang or Clojure backend. It also briefly mentions tools like Webpack, Swagger, and deploying microservices to Azure.
Container technologies use namespaces and cgroups to provide isolation between processes and limit resource usage. Docker builds on these technologies using a client-server model and additional features like images, containers, and volumes to package and run applications reliably and at scale. Kubernetes builds on Docker to provide a platform for automating deployment, scaling, and operations of containerized applications across clusters of hosts. It uses labels and pods to group related containers together and services to provide discovery and load balancing for pods.
[D2 COMMUNITY] Open Container Seoul Meetup - Running a container platform in ... (NAVER D2)
This document discusses containers and related technologies like Docker, Kubernetes, and Openshift. It provides an overview of the container approach taken by GS Shop including their experience running non-microservice applications on containers in production. Some areas they are currently working on include containerized stateful services, multi-tenant container infrastructure, and container infrastructure provisioning automation.
Blue-green deployment with Docker containers (Alfred UC)
The document discusses the benefits of exercise for mental health. Regular physical activity can help reduce anxiety and depression and improve mood and cognitive function. Exercise causes chemical changes in the brain that may help protect against mental illness and improve symptoms for those who already suffer from conditions like anxiety and depression.
Machine Translation for Everyone (박찬준, Chanjun Park)
○ Overview
Research on NMT began in earnest in 2014, and today a wide variety of Transformer-based NMT systems are being studied.
Going further, in Language Representation, currently the hottest research area in NLP, models such as BERT, GPT-2, and XLNet have also been built on top of the Transformer.
In this tech talk we will first look briefly at RBMT and SMT, and then examine NMT in detail, from RNN-based NMT up to Transformer-based NMT.
We will also cover Automatic Post Editing, Parallel Corpus Filtering, and Quality Estimation, which run every year as shared tasks at WMT, and introduce various applied research areas that build on NMT (e.g., real-time lecture interpretation systems and grammar correction systems). The explanation will be kept simple enough for people who know nothing about machine translation, or are simply curious, to follow.
○ Table of Contents
1) What is machine translation?
2) A brief introduction to RBMT
3) A brief introduction to SMT
4) From RNN-based deep learning to the Transformer
5) Various applied research using NMT
a. Automatic Post Editing
b. Quality Estimation
c. Parallel Corpus Filtering
d. Grammar Error Correction
e. Real-time lecture interpretation systems
6) Introduction to OpenNMT
[Korean] Neural Architecture Search with Reinforcement Learning (Kiho Suh)
Sharing the slides for the paper presentation on "Neural Architecture Search with Reinforcement Learning" given at Modulabs (모두의연구소). The paper shows well what Google's AutoML, which automates part of the machine learning development workflow, is trying to do.
The paper describes a deep learning architecture that designs deep learning architectures. Using 800 GPUs (or 400 CPUs), it produced networks that are state of the art, or just below state of the art but faster and smaller. The paradigm has shifted from feature engineering to neural network engineering, and this paper is one of the first attempts at it.
Building an AI serving infrastructure and AI DevOps cycle for auto-scalable deep learning production (Hoondong Kim)
[Presentation material from a TensorFlow-KR offline seminar]
A methodology for building an AI serving infrastructure and an AI DevOps cycle for auto-scalable deep learning production (including how to serve 10,000 TPS of TensorFlow inference on Azure Docker PaaS).
Deep learning text NLP and Spark collaboration: Korean deep learning text NLP & Spark (Hoondong Kim)
This slide deck explains deep learning text NLP for the Korean language. It also discusses scaling the deep learning approach to big-data-scale text using Spark.
DeBERTa: Decoding-Enhanced BERT with Disentangled Attention (taeseon ryu)
The paper introduced today builds on Google's BERT and on RoBERTa from Facebook (now Meta). You can think of it as RoBERTa further improved by two core techniques: disentangled attention and an enhanced mask decoder.
It also introduces scale-invariant fine-tuning, and outperforms RoBERTa and BERT on a considerable number of NLU tasks.
Myeonghoon Jin of the NLP team helped with the detailed review of the paper and the background material.
The document discusses various machine learning clustering algorithms like K-means clustering, DBSCAN, and EM clustering. It also discusses neural network architectures like LSTM, bi-LSTM, and convolutional neural networks. Finally, it presents results from evaluating different chatbot models on various metrics like validation score.
The document discusses challenges with using reinforcement learning for robotics. While simulations allow fast training of agents, there is often a "reality gap" when transferring learning to real robots. Other approaches like imitation learning and self-supervised learning can be safer alternatives that don't require trial-and-error. To better apply reinforcement learning, robots may need model-based approaches that learn forward models of the world, as well as techniques like active localization that allow robots to gather targeted information through interactive perception. Closing the reality gap will require finding ways to better match simulations to reality or allow robots to learn from real-world experiences.
[243] Deep Learning to help student's Deep Learning (NAVER D2)
This document describes research on using deep learning to predict student performance in massive open online courses (MOOCs). It introduces GritNet, a model that takes raw student activity data as input and predicts outcomes like course graduation without feature engineering. GritNet outperforms baselines by more than 5% in predicting graduation. The document also describes how GritNet can be adapted in an unsupervised way to new courses using pseudo-labels, improving predictions in the first few weeks. Overall, GritNet is presented as the state-of-the-art for student prediction and can be transferred across courses without labels.
[234] Fast & Accurate Data Annotation Pipeline for AI applications (NAVER D2)
This document provides a summary of new datasets and papers related to computer vision tasks including object detection, image matting, person pose estimation, pedestrian detection, and person instance segmentation. A total of 8 papers and their associated datasets are listed with brief descriptions of the core contributions or techniques developed in each.
[226] NAVER ads deep click prediction: from modeling to serving (NAVER D2)
This document presents a formula for calculating the loss function J(θ) in machine learning models. The formula averages the negative log likelihood of the predicted probabilities being correct over all samples S, and includes a regularization term λ that penalizes predicted embeddings being dissimilar from actual embeddings. It also defines the cosine similarity term used in the regularization.
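The exact formula is not reproduced in the summary, so the following is only a hedged sketch of a loss that combines a mean negative log likelihood with a cosine-similarity regularizer; the variable names and the λ value are placeholders, not the paper's:

```python
import numpy as np

def loss(pred_probs, pred_emb, true_emb, lam=0.1):
    # pred_probs: probability the model assigned to the correct class, per sample.
    # pred_emb, true_emb: [batch, dim] predicted and actual embeddings.
    nll = -np.mean(np.log(pred_probs))  # average negative log likelihood
    cos = np.sum(pred_emb * true_emb, axis=1) / (
        np.linalg.norm(pred_emb, axis=1) * np.linalg.norm(true_emb, axis=1))
    # Regularizer penalizes embeddings that are dissimilar; lam is an assumed weight.
    return nll + lam * np.mean(1.0 - cos)
```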
[214] AI Serving Platform: the struggle to handle hundreds of millions of inferences a day (NAVER D2)
The document discusses running a TensorFlow Serving (TFS) container using Docker. It shows commands to:
1. Pull the TFS Docker image from a repository
2. Define a script to configure and run the TFS container, specifying the model path, name, and port mapping
3. Run the script to start the TFS container exposing port 13377
The document discusses linear algebra concepts including:
- Representing a system of linear equations as a matrix equation Ax = b where A is a coefficient matrix, x is a vector of unknowns, and b is a vector of constants.
- Solving for the vector x that satisfies the matrix equation using linear algebra techniques such as row reduction.
- Examples of matrix equations and their component vectors are shown.
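A minimal NumPy sketch of solving Ax = b (the specific matrices from the slides are not reproduced; the system below is made up):

```python
import numpy as np

# Coefficient matrix A and constant vector b for a small linear system Ax = b.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([5.0, 10.0])

# Solve for the unknown vector x (equivalent to row reduction by hand).
x = np.linalg.solve(A, b)
print(x)      # [1. 3.]
print(A @ x)  # reproduces b
```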
This document describes the steps to convert a TensorFlow model to a TensorRT engine for inference. It includes steps to parse the model, optimize it, generate a runtime engine, serialize and deserialize the engine, as well as perform inference using the engine. It also provides code snippets for a PReLU plugin implementation in C++.
The document discusses machine reading comprehension (MRC) techniques for question answering (QA) systems, comparing search-based and natural language processing (NLP)-based approaches. It covers key milestones in the development of extractive QA models using NLP, from early sentence-level models to current state-of-the-art techniques like cross-attention, self-attention, and transfer learning. It notes the speed and scalability benefits of combining search and reading methods for QA.
2. 2
Table of Contents
1. Machine Translation
2. Artificial Neural Networks
3. Word2Vec
4. The Papago / N2MT development process
5. NSMT vs. N2MT translation quality comparison
6. Wrapping up…
7. Q & A
4. 4
1-1. RBMT - Rule-Based Machine Translation
• Rule Based Machine Translation (RBMT)
• Built on top of hand-written grammar rules for the two languages.
• Extracting grammar rules is difficult.
• Extending to new language pairs is difficult.
5. 5
1-2. SMT – Statistical Machine Translation (1/3)
• Statistical Machine Translation (SMT)
• Translates using co-occurrence statistics extracted from a parallel corpus of the two languages
• The system is a combination of several components:
• Translation Model
• Language Model
• Reordering Model
7. 7
1-2. SMT – Statistical Machine Translation (3/3)
• How it works
• Translates by looking at parts (words or phrases) of the input sentence
• Uses a translation dictionary and picks translations based on probability
• Language pairs with the same word order (Korean/Japanese, English/French)
• Decent quality.
• Language pairs with different word order (Korean/English)
• Insufficient quality.
Input: 나는 아침 일찍 아침 준비를 했다.
Output: I prepared breakfast early in the morning.
8. 8
1-3. NMT - Neural Machine Translation (1/5)
• Neural Machine Translation (NMT)
• Machine translation based on a neural network
• A neural network is trained on a parallel corpus of the two languages
9. 9
1-3. NMT - Neural Machine Translation (2/5)
• Architecture
• Composed of an Encoder and a Decoder.
• The Encoder turns the input sentence into a vector.
• The Decoder decodes the vectorized input sentence back into text.
[Figure: the encoder reads the source tokens 가 나 다, and the decoder generates the target tokens A B C D between the <시작> (start) and <끝> (end) symbols.]
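To make the encoder/decoder picture concrete, here is a minimal Keras-style sketch of such a model; the sizes are toy values and this is not the presenters' implementation (N2MT was written in CUDA/cuBLAS/C++, as described later in the deck):

```python
from tensorflow import keras
from tensorflow.keras import layers

src_vocab, tgt_vocab, hidden = 100, 100, 64  # illustrative sizes only

# Encoder: turn the source sentence into a fixed representation (the LSTM states).
enc_in = keras.Input(shape=(None,))
enc_emb = layers.Embedding(src_vocab, hidden)(enc_in)
_, state_h, state_c = layers.LSTM(hidden, return_state=True)(enc_emb)

# Decoder: generate the target sentence conditioned on the encoder states.
dec_in = keras.Input(shape=(None,))
dec_emb = layers.Embedding(tgt_vocab, hidden)(dec_in)
dec_out, _, _ = layers.LSTM(hidden, return_sequences=True, return_state=True)(
    dec_emb, initial_state=[state_h, state_c])
probs = layers.Dense(tgt_vocab, activation="softmax")(dec_out)

model = keras.Model([enc_in, dec_in], probs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```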
11. 11
1-3. NMT - Neural Machine Translation (4/5)
• How it works
[Figure: the decoder generates the target sentence one word at a time. For the source 느그 아버지 뭐하시노 (read between <시작> and <끝>), it emits "What does your father do ?", feeding each generated word back in as the next decoder input.]
12. 12
1-3. NMT - Neural Machine Translation (5/5)
• Characteristics
• Every word is handled as a high-dimensional vector
• The input/output order of words is learned
• Produces fluent translated sentences
• Sentence-vector based
• The meaning of the sentence is also handled as a high-dimensional vector.
• The meaning of the whole sentence is grasped first, and only then is the translation produced.
• Even homonyms can be translated according to the meaning of the whole sentence
• Lower accuracy in building the sentence vector -> lower translation accuracy
• Lower accuracy in interpreting the sentence vector -> lower translation accuracy
19. 19
2-2. How training works (1/2)
• Step 1 – Propagation (forward pass)
[Figure: the input (p) passes through Layer 1, Layer 2, and Layer 3 to produce the output (a).]
• Step 2 – Error computation
Error (Δ) = target (c) - output (a)
20. 20
2-2. How training works (2/2)
• Step 3 – Back Propagation
[Figure: the error (Δ) is propagated backwards from Layer 3 through Layer 2 and Layer 1, towards the input (p).]
• The error (Δ, the gradient) is computed for every weight and bias.
• Back propagation is mathematically justified by the chain rule.
• Supported automatically by most deep learning platforms such as TensorFlow and Theano.
• Step 4 – Updating the weights and biases
W_new = W_old + (α ∗ Δ_W)
b_new = b_old + (α ∗ Δ_b)
α: learning rate
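A minimal NumPy sketch of the four steps above for a single linear layer; the squared-error objective is an assumption made for the example, since the slides do not specify a loss:

```python
import numpy as np

# Toy single-layer network: a = W p + b, trained to match the target c.
rng = np.random.default_rng(0)
W, b = rng.normal(size=(2, 3)), np.zeros(2)
p, c = np.array([0.5, -1.0, 2.0]), np.array([1.0, 0.0])
alpha = 0.1  # learning rate

for _ in range(100):
    a = W @ p + b                        # Step 1: propagation (forward pass)
    delta = c - a                        # Step 2: error = target - output
    dW, db = np.outer(delta, p), delta   # Step 3: back-propagation (chain rule)
    W = W + alpha * dW                   # Step 4: update weights ...
    b = b + alpha * db                   # ... and biases

print(W @ p + b)  # close to the target c after training
```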
21. 21
2-3. What made it practical
1. Growth in computing power
• GPUs made computing power available at a (somewhat) reasonable price.
• GPU
• A chip specialized for parallel processing.
• Excellent performance on matrix-vector and matrix-matrix operations.
2. Big Data
• Enough data became available to mitigate the overfitting problem.
3. Advances in neural network models
• Several techniques were invented, including Dropout.
24. 24
3. Word2Vec (2/4)
• CBOW & Skip-Gram (2/2)
• Training objective
• Maximize the average log probability.
• In other words, probability maximization.
• The word vectors are adjusted during this maximization.
• As a side effect, semantic and syntactic information becomes embedded in the vectors.
25. 25
3. Word2Vec (3/4)
• Word embeddings
[Figure: word pairs man/woman, uncle/aunt, king/queen; the same word relationship corresponds to the same vector offset.]
W(“woman”) – W(“man”) + W(“uncle”) = W(“aunt”)
W(“woman”) – W(“man”) + W(“king”) = W(“queen”)
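A hedged sketch of this analogy test with gensim; the embedding file path is a placeholder for whichever pretrained word2vec vectors you have on hand:

```python
from gensim.models import KeyedVectors

# Hypothetical path to pretrained word2vec vectors (e.g., the GoogleNews vectors);
# substitute the embedding file you actually use.
vectors = KeyedVectors.load_word2vec_format("word2vec.bin", binary=True)

# W("woman") - W("man") + W("king") should land near W("queen").
print(vectors.most_similar(positive=["woman", "king"], negative=["man"], topn=1))
```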
26. 26
3. Word2Vec (4/4)
• Bilingual word embeddings
[Figure: English words man, woman, uncle, king aligned against Korean words 이모 (aunt) and 여왕 (queen).]
W(“woman”) – W(“man”) + W(“uncle”) = W(“이모”)
W(“woman”) – W(“man”) + W(“king”) = W(“여왕”)
• The results were not satisfactory…
• But we built up solid know-how about neural networks and word vectors.
28. 28
4-1. Introduction
• NAVER Neural Machine Translation
• The second-generation machine translation system, succeeding NSMT (NAVER Statistical Machine Translation)
• Developed in-house by the Papago team
• Papago team = machine translation experts + neural network experts + systems experts
• Implemented directly in CUDA, cuBLAS, and C++.
• Not based on open source. (Development started before TensorFlow was released.)
30. 30
4-3. LSTM (1/3)
• LSTM components
• Input Gate
• Forget Gate
• Output Gate
• Cell
• Hidden Output
• Long Short Term Memory
• A memory neuron that keeps short-term memory alive for a long time.
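A minimal NumPy sketch of one LSTM time step, following the standard gate equations rather than the N2MT CUDA implementation; the dictionary-based parameters and sizes are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    # One LSTM time step. W, U, b hold the parameters for the
    # input (i), forget (f), output (o) gates and the cell candidate (g).
    i = sigmoid(W["i"] @ x + U["i"] @ h_prev + b["i"])   # input gate
    f = sigmoid(W["f"] @ x + U["f"] @ h_prev + b["f"])   # forget gate
    o = sigmoid(W["o"] @ x + U["o"] @ h_prev + b["o"])   # output gate
    g = np.tanh(W["g"] @ x + U["g"] @ h_prev + b["g"])   # candidate cell state
    c = f * c_prev + i * g        # cell: keep part of the old memory, add new memory
    h = o * np.tanh(c)            # hidden output
    return h, c

# Tiny usage example with random parameters.
d_in, d_h = 3, 4
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(d_h, d_in)) for k in "ifog"}
U = {k: rng.normal(size=(d_h, d_h)) for k in "ifog"}
b = {k: np.zeros(d_h) for k in "ifog"}
h, c = lstm_step(rng.normal(size=d_in), np.zeros(d_h), np.zeros(d_h), W, U, b)
print(h, c)
```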
33. 33
4-4. Development step 1 – Reconstruction Model (1/2)
• Stacked LSTM Encoder/Decoder Only
[Figure: Encoder and Decoder, each a stack of an Input Layer and three LSTM Layers, with a Softmax Layer on top of the decoder.]
34. 34
4-4. Development step 1 – Reconstruction Model (2/2)
• Reconstructing the input sentence
[Figure: the model reads the input tokens A B C D and reconstructs the same sequence A B C D between the <시작> (start) and <끝> (end) symbols.]
• Test Perplexity (Test PPL): 1.0029
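Perplexity is the exponential of the average per-token negative log likelihood, so a value near 1.0 (like the 1.0029 above) means the model assigns almost all probability mass to the correct next token. A minimal sketch with made-up probabilities:

```python
import numpy as np

def perplexity(token_probs):
    # token_probs: probability the model assigned to each correct token.
    nll = -np.mean(np.log(token_probs))
    return float(np.exp(nll))

print(perplexity([0.99, 0.999, 0.997, 0.998]))  # close to 1.0
```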
35. 35
4-5. Development step 2 – Small NMT
• Building an NMT system from a small parallel corpus (Korean -> English)
• Hundreds of thousands of parallel sentences
• About 19 million words (Korean + English)
• Results
• Confirmed that translation actually works
• Weak on long sentences
• Training on a large corpus would take far too long.
36. 36
4-6. Development step 3 – Handling a large corpus (1/4)
• Multi-GPU / Propagation
[Figure: the encoder/decoder layers are split across GPU 0 through GPU 4; the forward pass for the example 가 나 다 -> A B C D flows through the GPUs in sequence.]
37. 37
4-6. Development step 3 – Handling a large corpus (2/4)
• Multi-GPU / Back-Propagation
[Figure: the backward pass for the same example flows back across the GPUs, from GPU 4 down to GPU 0.]
38. 38
4-6. Development step 3 – Handling a large corpus (3/4)
• Sampled Softmax (1/2)
• Softmax Layer
• The part that, during decoding, picks the right word from the vocabulary.
• Computation cost problem
[Figure: the softmax weight matrix (vocab size 50,000 × hidden size 1,000) is multiplied with the projection input (hidden size 1,000).]
• Size of the matrix computation
• [50,000 * 1,000] [1,000 * 1] = [50,000 * 1]
Vocab:       I    You   We   love   loves   …   <끝>   <Unknown>
Probability: 5%   2%    4%   51%    25%     …   5%     6%
39. 39
4-6. Development step 3 – Handling a large corpus (4/4)
• Sampled Softmax (2/2)
• The most computation-heavy part of NMT.
• Has to be computed for every word learned during decoding.
• More vocabulary entries -> more computation.
• Solution
• During training, sample only a subset of the vocabulary.
• Sampling scheme:
• Words that appear in the training example are always included.
• The rest are added at random.
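TensorFlow ships a built-in op for this kind of loss; the sketch below is only a conceptual stand-in, since its default candidate sampler differs from the custom scheme the slides describe, and the shapes and names are illustrative:

```python
import tensorflow as tf

vocab_size, hidden_size, num_sampled = 50000, 1000, 512

softmax_w = tf.Variable(tf.random.normal([vocab_size, hidden_size]))
softmax_b = tf.Variable(tf.zeros([vocab_size]))

def sampled_loss(hidden_states, target_ids):
    # hidden_states: [batch, hidden_size] decoder outputs at one time step.
    # target_ids:    [batch, 1] int64 ids of the correct vocabulary entries.
    # Only the true word plus `num_sampled` randomly drawn words are scored,
    # instead of the full 50,000-way softmax.
    return tf.reduce_mean(tf.nn.sampled_softmax_loss(
        weights=softmax_w, biases=softmax_b,
        labels=target_ids, inputs=hidden_states,
        num_sampled=num_sampled, num_classes=vocab_size))
```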
40. 40
4-7. Development step 4 – Attention (1/4)
• Architecture
[Figure: Encoder and Decoder, each a stack of an Input Layer and three LSTM Layers, with an Attention Layer and a Softmax Layer on top of the decoder.]
41. 41
4-7. Development step 4 – Attention (2/4)
• Role of the attention layer
• When a target word is generated, it tells the model which source word to focus on.
• Mitigates the sharp drop in translation quality on long sentences.
1. At every time step t, the dot product of ht with every hs gives a score for the target word against each source word.
42. 42
4-7. Development step 4 – Attention (3/4)
2. A softmax over the scores turns them into probabilities (the alignment scores).
3. The context vector (ct) is obtained by weighting every hs by its alignment score (i.e., ct is the weighted average of all hs).
4. The attention output (h̃t) is computed by concatenating the context vector (ct) with the attention input (ht) and applying Wc and tanh.
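A minimal NumPy sketch of steps 1-4 (Luong-style dot-product attention), with made-up shapes:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_step(h_t, h_s, W_c):
    # h_t: decoder hidden state at time t, shape [d]
    # h_s: encoder hidden states, shape [src_len, d]
    # W_c: attention output weights, shape [d, 2d]
    score = h_s @ h_t                      # 1. dot-product score per source word
    align = softmax(score)                 # 2. softmax -> alignment scores
    c_t = align @ h_s                      # 3. context vector: weighted average of h_s
    h_tilde = np.tanh(W_c @ np.concatenate([c_t, h_t]))  # 4. attention output
    return h_tilde, align

d, src_len = 4, 3
rng = np.random.default_rng(0)
h_tilde, align = attention_step(rng.normal(size=d),
                                rng.normal(size=(src_len, d)),
                                rng.normal(size=(d, 2 * d)))
print(align)  # one weight per source word, summing to 1
```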
43. 43
4-7. Development step 4 – Attention (4/4)
• Attention Map
• Input: 존과 메리는 아들과 딸이 있다.
• Output: John and Mary have a son and a daughter.
45. 45
5. NSMT vs. N2MT translation quality (1/4)
• Homonyms
Input: 나의 눈에 눈이 떨어졌다. (눈 means both "eye" and "snow".)
N2MT: Snow fell upon my eyes.
NSMT: Fell in the eyes of my eyes.
• Translation accuracy
Input: 곰 세마리가 한 집에 있어, 아빠곰, 엄마곰, 애기곰.
N2MT: Three bears are in one house, father bear, mother bear, and baby bear.
NSMT: 3 bears. Father bear, mummy bear, and baby bear, in a house.
46. 46
5. NSMT vs. N2MT translation quality (2/4)
• Quantitative evaluation
• Metric: BLEU
• Number of evaluation sentences: 1,000
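BLEU scores n-gram overlap between the system output and reference translations; here is a hedged NLTK sketch (the sentences are placeholders, not the actual 1,000-sentence test set):

```python
from nltk.translate.bleu_score import corpus_bleu

# Each hypothesis is paired with a list of tokenized reference translations.
references = [[["three", "bears", "are", "in", "one", "house"]]]
hypotheses = [["three", "bears", "are", "in", "a", "house"]]

print(corpus_bleu(references, hypotheses))
```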
47. 47
5. NSMT vs. N2MT translation quality (3/4)
• Qualitative evaluation
• Method: blind test
• Number of evaluation sentences: 100
• Scored out of 100 (average of two evaluators).
48. 48
5. NSMT vs. N2MT translation quality (4/4)
• Key characteristics of the translations
• Produces fully-formed sentences (ungrammatical output is very rare).
• Translations are much better than the SMT results. (When a translation is wrong, though, it often says something entirely unrelated.)
• Out-of-vocabulary problems occur because of the limit on vocabulary size.
50. 50
6. Wrapping up…
• The deep learning field needs many more developers.
• People to model neural networks are needed.
• Even more people are needed to turn neural networks into services.
• It is a field anyone can take on.
• A meticulous personality helps.
• Bugs are hard to spot.
• Training just quietly fails to learn…
• A lot of patience is needed.
• Fix one thing, and it takes hours to see the result.
• The amount of code to write is actually quite small.
• I hope many of you take on this field and help make the world a more fun and better place.
52. 52
References (1/2)
• Artificial Neural Network
• Martin T. Hagan, Howard B. Demuth, Mark Beale. Neural Network Design. PWS Publishing Company.
• NMT - Encoder/Decoder Model
• I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. In NIPS, 2014.
• Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In EMNLP, 2014.
• Kyunghyun Cho, B. van Merriënboer, D. Bahdanau, and Y. Bengio. On the properties of neural machine translation: Encoder-Decoder approaches. In Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, 2014.
• N. Kalchbrenner and P. Blunsom. Recurrent continuous translation models. In EMNLP, 2013.
• NMT - Attention
• Minh-Thang Luong, Hieu Pham, and Christopher D. Manning. Effective approaches to attention-based neural machine translation. In Proceedings of EMNLP, 2015.
• D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. In ICLR, 2015.
• NMT - Sampled Softmax
• Sébastien Jean, Kyunghyun Cho, Roland Memisevic, and Yoshua Bengio. On using very large target vocabulary for neural machine translation. In ACL, 2015.
53. 53
References (2/2)
• LSTM
• A. Graves. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850, 2013.
• Word2Vec
• Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. In Proceedings of Workshop at ICLR, 2013.
• Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. In Proceedings of NIPS, 2013.
• Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. Linguistic Regularities in Continuous Space Word Representations. In Proceedings of NAACL HLT, 2013.