The document describes the sequence-to-sequence (seq2seq) model with an encoder-decoder architecture. It explains that the seq2seq model uses two recurrent neural networks - an encoder RNN that processes the input sequence into a fixed-length context vector, and a decoder RNN that generates the output sequence from the context vector. It provides details on how the encoder, decoder, and training process work in the seq2seq model.
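The encoder/decoder split described above can be sketched in a few lines of numpy. This is a minimal illustration, not the document's actual model: all sizes, weight names, and the greedy decoding loop are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
H, V = 8, 5                                  # hidden size, vocabulary size (illustrative)

# Encoder RNN: fold the whole input sequence into one fixed-length context vector.
W_xh = rng.normal(0, 0.1, (H, V))
W_hh = rng.normal(0, 0.1, (H, H))

def encode(input_ids):
    h = np.zeros(H)
    for t in input_ids:                      # one recurrent step per input token
        x = np.eye(V)[t]                     # one-hot token
        h = np.tanh(W_xh @ x + W_hh @ h)
    return h                                 # the fixed-length context vector

# Decoder RNN: start from the context vector and emit output tokens greedily.
W_hy = rng.normal(0, 0.1, (V, H))

def decode(context, steps):
    h, out = context, []
    for _ in range(steps):
        h = np.tanh(W_hh @ h)                # recurrent update (no input feeding, for brevity)
        out.append(int(np.argmax(W_hy @ h))) # greedy choice of the next token
    return out

context = encode([0, 2, 4])                  # input sequence compressed to one vector
output = decode(context, steps=3)            # output sequence generated from it
```

In the real model the decoder would also feed its previous output back in at each step and be trained end-to-end with cross-entropy; the sketch only shows the information flow.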
This document discusses gene expression programming (GEP), an evolutionary algorithm related to genetic algorithms. It examines GEP's ability to model natural evolution through its inclusion of replicators, phenotypes, and a genotype-phenotype mapping. The document tests GEP on symbolic regression problems to analyze genetic neutrality by increasing gene length and number of genes. It finds GEP provides an ideal framework to study the effects of neutral mutations on evolution by allowing tight control over neutral regions in the genome.
Deep Learning: Recurrent Neural Network (Chapter 10) Larry Guo
This material is an in-depth study report on Recurrent Neural Networks (RNNs).
Material mainly from the Deep Learning book ("the bible"), http://www.deeplearningbook.org/
Topics: Briefing, Theory Proof, Variation, Gated RNN Intuition, Real-World Application
Application (CNN+RNN on SVHN)
A video version (in Chinese) is also available:
https://www.youtube.com/watch?v=p6xzPqRd46w
The document discusses recurrent neural networks (RNNs) and long short-term memory (LSTM) networks. It provides details on the architecture of RNNs including forward and back propagation. LSTMs are described as a type of RNN that can learn long-term dependencies using forget, input and output gates to control the cell state. Examples of applications for RNNs and LSTMs include language modeling, machine translation, speech recognition, and generating image descriptions.
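The forget/input/output gating described above can be written out as a single LSTM step. A minimal numpy sketch, assuming illustrative sizes and stacked gate weights; not the document's own code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. W, U, b each stack the four gates (i, f, o, g)."""
    n = h.shape[0]
    z = W @ x + U @ h + b
    i = sigmoid(z[0*n:1*n])       # input gate: how much new information to write
    f = sigmoid(z[1*n:2*n])       # forget gate: how much old cell state to keep
    o = sigmoid(z[2*n:3*n])       # output gate: how much cell state to expose
    g = np.tanh(z[3*n:4*n])       # candidate cell update
    c_new = f * c + i * g         # cell state mixes old memory with new input
    h_new = o * np.tanh(c_new)    # hidden state is a gated view of the cell
    return h_new, c_new

rng = np.random.default_rng(0)
n, m = 4, 3                       # hidden size, input size (illustrative)
W = rng.normal(0, 0.1, (4*n, m))
U = rng.normal(0, 0.1, (4*n, n))
b = np.zeros(4*n)
h, c = np.zeros(n), np.zeros(n)
h, c = lstm_step(rng.normal(size=m), h, c, W, U, b)
```

The additive `f * c + i * g` update is what lets gradients survive many time steps when `f` stays near 1, which is the mechanism behind the long-term dependencies mentioned above.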
Speaker: Taesung Park (Ph.D. student, UC Berkeley)
Date: June 2017
Taesung Park is a Ph.D. student at UC Berkeley in AI and computer vision, advised by Prof. Alexei Efros.
His research interest lies between computer vision and computational photography, such as generating realistic images or enhancing photo qualities. He received B.S. in mathematics and M.S. in computer science from Stanford University.
Overview:
Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs.
However, for many tasks, paired training data will not be available.
We present an approach for learning to translate an image from a source domain X to a target domain Y in the absence of paired examples.
Our goal is to learn a mapping G: X → Y such that the distribution of images from G(X) is indistinguishable from the distribution Y using an adversarial loss.
Because this mapping is highly under-constrained, we couple it with an inverse mapping F: Y → X and introduce a cycle consistency loss to push F(G(X)) ≈ X (and vice versa).
Qualitative results are presented on several tasks where paired training data does not exist, including collection style transfer, object transfiguration, season transfer, photo enhancement, etc.
Quantitative comparisons against several prior methods demonstrate the superiority of our approach.
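The cycle consistency loss described in the abstract, F(G(X)) ≈ X and G(F(Y)) ≈ Y, can be sketched directly. The L1 form below matches the paper's formulation; the toy mappings G and F are assumptions chosen so the loss is easy to check:

```python
import numpy as np

def cycle_consistency_loss(x, y, G, F):
    """L1 cycle loss: F(G(x)) should reconstruct x, and G(F(y)) should reconstruct y."""
    forward = np.mean(np.abs(F(G(x)) - x))   # X -> Y -> X reconstruction error
    backward = np.mean(np.abs(G(F(y)) - y))  # Y -> X -> Y reconstruction error
    return forward + backward

# Toy mappings: when G and F are exact inverses, the cycle loss vanishes.
G = lambda x: 2.0 * x + 1.0
F = lambda y: (y - 1.0) / 2.0
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 5.0])
loss = cycle_consistency_loss(x, y, G, F)
```

In the full method this term is added to the adversarial losses on G and F; on its own it is what constrains the otherwise under-constrained mapping.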
Part 2 of the Deep Learning Fundamentals Series, this session discusses Tuning Training (including hyperparameters, overfitting/underfitting), Training Algorithms (including different learning rates, backpropagation), Optimization (including stochastic gradient descent, momentum, Nesterov Accelerated Gradient, RMSprop, Adaptive algorithms - Adam, Adadelta, etc.), and a primer on Convolutional Neural Networks. The demos included in these slides are running on Keras with TensorFlow backend on Databricks.
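The optimizers this session covers differ only in how they turn a gradient into an update. A minimal sketch of three of them on the toy objective f(w) = w², with step sizes and step counts chosen purely for illustration:

```python
import numpy as np

grad = lambda w: 2.0 * w                    # gradient of f(w) = w^2

def sgd_momentum(w, steps=200, lr=0.1, beta=0.9):
    v = 0.0
    for _ in range(steps):
        v = beta * v + grad(w)              # accumulate a velocity term
        w -= lr * v
    return w

def rmsprop(w, steps=200, lr=0.1, beta=0.9, eps=1e-8):
    s = 0.0
    for _ in range(steps):
        g = grad(w)
        s = beta * s + (1 - beta) * g * g   # running average of squared gradients
        w -= lr * g / (np.sqrt(s) + eps)    # per-parameter adaptive step size
    return w

def adam(w, steps=200, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g           # first moment (momentum-like)
        v = b2 * v + (1 - b2) * g * g       # second moment (RMSprop-like)
        m_hat = m / (1 - b1**t)             # bias correction for the zero init
        v_hat = v / (1 - b2**t)
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w
```

Adam is literally the combination of the other two updates plus bias correction, which is why the session groups them together.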
Speaker: Yunjey Choi (M.S. student, Korea University)
Yunjey Choi majored in computer engineering at Korea University and is currently an M.S. student studying Machine Learning. He enjoys coding and sharing what he has learned with others. He studied Deep Learning with TensorFlow for a year and is now studying Generative Adversarial Networks with PyTorch. He has implemented several papers in TensorFlow and published a PyTorch tutorial on GitHub.
Overview:
The Generative Adversarial Network (GAN), first proposed by Ian Goodfellow in 2014, is a generative model that estimates the distribution of real data through adversarial training. GAN has recently emerged as one of the most popular research areas, with a flood of related papers appearing every day.
Finding it hard to keep up with all those GAN papers? That's fine: once you thoroughly understand the basic GAN, new papers become easy to follow.
In this talk I aim to convey everything I know about GANs. It should suit those who know nothing about GANs, those curious about the theory behind them, and those wondering how GANs can be applied.
Video: https://youtu.be/odpjk7_tGY0
Brief introduction on attention mechanism and its application in neural machine translation, especially in transformer, where attention was used to remove RNNs completely from NMT.
An Introduction to the Transformers Architecture and BERT - Suman Debnath
The transformer is one of the most popular state-of-the-art (SOTA) deep learning architectures, used mostly for natural language processing (NLP) tasks. Ever since its advent, the transformer has replaced RNNs and LSTMs for various tasks. It also created a major breakthrough in the field of NLP and paved the way for new revolutionary architectures such as BERT.
The document provides an overview of LSTM (Long Short-Term Memory) networks. It first reviews RNNs (Recurrent Neural Networks) and their limitations in capturing long-term dependencies. It then introduces LSTM networks, which address this issue using forget, input, and output gates that allow the network to retain information for longer. Code examples are provided to demonstrate how LSTM remembers information over many time steps. Resources for further reading on LSTMs and RNNs are listed at the end.
Convolutional neural networks (CNNs) are a type of neural network used for processing grid-like data such as images. CNNs have an input layer, multiple hidden layers, and an output layer. The hidden layers typically include convolutional layers that extract features, pooling layers that reduce dimensionality, and fully connected layers similar to regular neural networks. CNNs are commonly used for computer vision tasks like image classification and object detection due to their ability to learn spatial hierarchies of features in the data. They have applications in areas like facial recognition, document analysis, and climate modeling.
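The two layer types named above, convolution for feature extraction and pooling for dimensionality reduction, are small enough to sketch directly. A minimal numpy illustration (single channel, no padding or stride options), not a production implementation:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2D cross-correlation: slide the kernel and take dot products."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling: keep the largest value in each window."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))

img = np.arange(16.0).reshape(4, 4)
edges = conv2d(img, np.array([[1.0, -1.0]]))   # a horizontal difference filter
pooled = max_pool(img)                          # 4x4 -> 2x2
```

Stacking conv + pool stages is what produces the spatial hierarchy of features the summary mentions: early filters see small patches, later ones see progressively larger regions.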
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef... - PyData
This talk describes an experimental approach to time series modeling using 1D convolution filter layers in a neural network architecture. This approach was developed at System1 for forecasting marketplace value of online advertising categories.
The document discusses attention mechanisms for encoder-decoder neural networks. It describes traditional encoder-decoder models that compress all input information into a fixed vector, which cannot encode long sentences. Attention mechanisms allow the decoder to access the entire encoded input sequence and assign weights to input elements based on their relevance to predicting the output. The core attention model uses an alignment function to calculate energy scores between the input and output, a distribution function to calculate attention weights from the energy scores, and a weighted sum to compute the context vector used by the decoder. Various alignment functions are discussed, including dot product, additive, and deep attention.
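The three-stage pipeline described above (alignment function → distribution function → weighted sum) maps directly onto code. A minimal sketch of the dot-product variant; the toy keys, values, and query are assumptions for illustration:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def dot_product_attention(query, keys, values):
    """Dot-product alignment -> softmax distribution -> weighted-sum context."""
    scores = keys @ query             # alignment: one energy score per input position
    weights = softmax(scores)         # distribution: attention weights sum to 1
    context = weights @ values        # weighted sum: the context vector for the decoder
    return context, weights

keys = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
values = np.array([[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]])
query = np.array([5.0, -1.0])         # aligned most strongly with the first key
context, weights = dot_product_attention(query, keys, values)
```

Swapping the `scores` line for a small feed-forward network over `[query, key]` pairs gives the additive and deep variants mentioned above; the distribution and weighted-sum stages stay the same.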
Recurrent Neural Network
ACRRL
Applied Control & Robotics Research Laboratory of Shiraz University
Department of Power and Control Engineering, Shiraz University, Fars, Iran.
Mohammad Sabouri
https://sites.google.com/view/acrrl/
Residual neural networks (ResNets) solve the vanishing gradient problem through shortcut connections that allow gradients to flow directly through the network. The ResNet architecture consists of repeating blocks with convolutional layers and shortcut connections. These connections perform identity mappings and add the outputs of the convolutional layers to the shortcut connection. This helps networks converge earlier and increases accuracy. Variants include basic blocks with two convolutional layers and bottleneck blocks with three layers. Parameters like number of layers affect ResNet performance, with deeper networks showing improved accuracy. YOLO is a variant that replaces the softmax layer with a 1x1 convolutional layer and logistic function for multi-label classification.
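The basic block described above, two convolutional layers plus an identity shortcut, can be sketched with matrix multiplies standing in for the convolutions. An illustrative sketch, not the reference ResNet code:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """Basic block: two transforms plus an identity shortcut, then ReLU."""
    out = relu(W1 @ x)          # first layer (a matmul stands in for the convolution)
    out = W2 @ out              # second layer, no activation yet
    return relu(out + x)        # shortcut: add the input back before the activation

# With zero weights the block reduces to the identity (for non-negative inputs),
# which is why gradients can flow through the shortcut unimpeded.
x = np.array([1.0, 2.0, 3.0])
W_zero = np.zeros((3, 3))
y = residual_block(x, W_zero, W_zero)
```

Because the block computes `x + F(x)`, the layers only need to learn the residual F; if the identity is already close to optimal, F can stay near zero, which is what helps deep networks converge earlier.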
Survey of Attention Mechanisms & Their Use in Computer Vision - SwatiNarkhede1
This presentation contains an overview of attention models, including the stand-alone self-attention model used for computer vision tasks.
Support Vector Machine - How Support Vector Machine Works | SVM in Machine Le... - Simplilearn
This document discusses support vector machines (SVM) and provides an example of using SVM for classification. It begins with common applications of SVM like face detection and image classification. It then provides an overview of SVM, explaining how it finds the optimal separating hyperplane between two classes by maximizing the margin between them. An example demonstrates SVM by classifying people as male or female based on height and weight data. It also discusses how kernels can be used to handle non-linearly separable data. The document concludes by showing an implementation of SVM on a zoo dataset to classify animals as crocodiles or alligators.
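The margin maximization described above can be sketched as hinge-loss subgradient descent on a linear SVM. This is one standard way to train the model, assumed here for illustration; the toy data mimics the height/weight example, and the learning rate, regularization strength, and epoch count are arbitrary:

```python
import numpy as np

def train_linear_svm(X, y, lr=0.02, lam=0.001, epochs=500):
    """Subgradient descent on the hinge loss: push toward margins y(w.x + b) >= 1."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) < 1:          # inside the margin: move the plane
                w += lr * (yi * xi - 2 * lam * w)
                b += lr * yi
            else:                              # safely classified: only shrink w
                w -= lr * 2 * lam * w          # regularization widens the margin
    return w, b

# Two linearly separable clusters (think height/weight), labels -1 / +1.
X = np.array([[1.0, 1.0], [1.5, 1.2], [4.0, 4.0], [4.5, 3.8]])
y = np.array([-1, -1, 1, 1])
w, b = train_linear_svm(X, y)
pred = np.sign(X @ w + b)
```

Replacing the inner product `xi @ w` with a kernel evaluation is the change that handles the non-linearly separable case the document mentions.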
The document provides an overview of perceptrons and neural networks. It discusses how neural networks are modeled after the human brain and consist of interconnected artificial neurons. The key aspects covered include the McCulloch-Pitts neuron model, Rosenblatt's perceptron, different types of learning (supervised, unsupervised, reinforcement), the backpropagation algorithm, and applications of neural networks such as pattern recognition and machine translation.
Introduction to Convolutional Neural Networks - Hannes Hapke
This document provides an introduction to machine learning using convolutional neural networks (CNNs) for image classification. It discusses how to prepare image data, build and train a simple CNN model using Keras, and optimize training using GPUs. The document outlines steps to normalize image sizes, convert images to matrices, save data formats, assemble a CNN in Keras including layers, compilation, and fitting. It provides resources for learning more about CNNs and deep learning frameworks like Keras and TensorFlow.
Jonathan Ronen - Variational Autoencoders Tutorial
This document provides an overview of autoencoders and variational autoencoders. It discusses how principal component analysis (PCA) is related to linear autoencoders and can be performed using backpropagation. Deep and nonlinear autoencoders are also covered. The document then introduces variational autoencoders, which combine variational inference with autoencoders to allow for probabilistic latent space modeling. It explains how variational autoencoders are trained using backpropagation through reparameterization to maximize the evidence lower bound.
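The reparameterization trick mentioned above, and the closed-form KL term it makes trainable, fit in a few lines. A minimal numpy sketch with an assumed diagonal-Gaussian posterior; the encoder and decoder networks that would produce `mu` and `log_var` are omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """z = mu + sigma * eps keeps the sample differentiable w.r.t. mu and log_var."""
    eps = rng.standard_normal(mu.shape)   # the randomness is drawn outside the network
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL(q(z|x) || N(0, I)) term of the evidence lower bound, in closed form."""
    return -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var))

mu = np.array([0.5, -0.5])
log_var = np.array([0.0, 0.0])            # unit variance in both dimensions
z = reparameterize(mu, log_var)           # a sample that backprop can flow through
kl = kl_to_standard_normal(mu, log_var)
```

Training maximizes the ELBO: reconstruction likelihood of x from z, minus this KL term, which is exactly the objective the tutorial describes.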
The document discusses transfer learning and building complex models using Keras and TensorFlow. It provides examples of using the functional API to build models with multiple inputs and outputs. It also discusses reusing pretrained layers from models like ResNet, Xception, and VGG to perform transfer learning for new tasks with limited labeled data. Freezing pretrained layers initially and then training the entire model is recommended for transfer learning.
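The freeze-then-train recipe above amounts to skipping the gradient update for the pretrained layers. A toy numpy sketch of that idea (in Keras this would be `layer.trainable = False`); the two-layer linear model and all names here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(0, 0.5, (3, 2))   # stands in for the pretrained, reused layers
W2 = rng.normal(0, 0.5, (1, 3))   # the new task head, trained from scratch

def train_step(W1, W2, x, target, lr=0.1, freeze_w1=True):
    """One squared-error gradient step; frozen layers receive no update."""
    h = W1 @ x
    err = W2 @ h - target
    gW2 = np.outer(err, h)          # gradient for the head
    gW1 = np.outer(W2.T @ err, x)   # gradient for the pretrained layers
    W2 = W2 - lr * gW2              # the head is always updated
    if not freeze_w1:               # unfreeze only when fine-tuning the whole model
        W1 = W1 - lr * gW1
    return W1, W2

x, target = np.array([1.0, 2.0]), np.array([1.0])
W1_frozen, _ = train_step(W1, W2, x, target, freeze_w1=True)
W1_tuned, _ = train_step(W1, W2, x, target, freeze_w1=False)
```

Freezing first keeps the randomly initialized head's large early gradients from destroying the pretrained features; once the head settles, unfreezing lets the whole model fine-tune, matching the recommendation above.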
Deep generative models can be either generative or discriminative. Generative models directly model the joint distribution of inputs and outputs, while discriminative models directly model the conditional distribution of outputs given inputs. Common deep generative models include restricted Boltzmann machines, deep belief networks, variational autoencoders, generative adversarial networks, and deep convolutional generative adversarial networks. These models use different network architectures and training procedures to generate new examples that resemble samples from the training data distribution.
Speaker: Hwalsuk Lee (Naver Clova)
Date: November 2017
(Current) NAVER Clova Vision
(Current) TFKR organizer
Overview:
Recent deep learning research has been shifting rapidly from supervised to unsupervised learning.
In computer vision in particular, the research trend is moving from recognition, the supervised task of finding the information present in an image, toward generation, the unsupervised task of producing an image that carries specific information.
This seminar briefly covers how the two pillars of generative modeling, the VAE (variational autoencoder) and the GAN (generative adversarial network), work, and shares results from the key related papers.
The lecture is structured so that even those without a deep learning background can understand the concepts behind VAE and GAN, the two approaches to training generative models, and gauge the current state of the technology.
Photo Wake-Up - 3D Character Animation from a Single Photo - KyeongUkJang
The document describes the steps involved in animating a 3D character model from a single photo. It involves detecting the person in the photo using Faster R-CNN, estimating their 2D pose, segmenting the person from the background, fitting the SMPL body model to generate a rigged 3D mesh, correcting head pose and texturing the mesh to create a 3D animated character. The method aims to overcome limitations of prior work and produce more accurate 3D character animations from just a single image.
This document summarizes the t-SNE technique for visualizing high-dimensional data in two or three dimensions. It explains that t-SNE is an advanced version of Stochastic Neighbor Embedding (SNE) that can better preserve local and global data structures compared to linear dimensionality reduction methods. The document outlines how t-SNE converts Euclidean distances between data points in high-dimensions to conditional probabilities representing similarity. It also discusses the "crowding problem" that occurs when mapping high-dimensional data to low-dimensions, and how t-SNE addresses this issue.
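The conversion from Euclidean distances to conditional probabilities described above can be sketched directly. This simplified version uses one global Gaussian bandwidth; real t-SNE tunes a separate sigma per point via the perplexity setting, and the toy data here is an assumption:

```python
import numpy as np

def conditional_probs(X, sigma=1.0):
    """p_{j|i}: similarity of point j to point i under a Gaussian centered at i."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
    P = np.exp(-sq_dists / (2 * sigma**2))
    np.fill_diagonal(P, 0.0)                   # a point is not its own neighbor
    return P / P.sum(axis=1, keepdims=True)    # each row becomes a distribution

# Two nearby points and one far-away point: the near pair should dominate.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
P = conditional_probs(X)
```

The low-dimensional map is then optimized so its own similarity distribution matches these rows; using a heavy-tailed Student-t in the low-dimensional space instead of a Gaussian is t-SNE's fix for the crowding problem mentioned above.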
2. Model Overview
A distribution of words per topic
A distribution of topics per document
A probabilistic model of which topics are present in each document
3. The Writing Process
Pick a theme and a topic; then, which words should be used?
A human writer
LDA's assumption:
Select a topic from the topic distribution obtained from the corpus, then draw words belonging to the selected topic and write them down.
(This is not how writing actually happens; it is what the model assumes happens.)
4. Thinking in the Reverse Direction
Which topics did the words appearing in the current document come from?
This is hard to know explicitly.
LDA is used to infer the information hidden behind the corpus.
Then what is the 'D', for Dirichlet?
The 'L' in LDA is for latent: uncovering hidden information.
For now, just note that there is a distribution called the Dirichlet and move on.
7. Model Variables
ϕk is a vector representing the word proportions of the k-th topic.
Its length equals the vocabulary size of the corpus.
Each entry is the weight that the corresponding word carries within topic k.
Since each entry is a probability, each column sums to 1.
Looking at the architecture, ϕk is governed by the hyperparameter β, because LDA assumes the topic-word proportions ϕk follow a Dirichlet distribution. The theoretical details come shortly.
8. Model Variables
θd is a vector representing the topic proportions of the d-th document.
Its length equals the total number of topics K.
Each entry is the weight that the k-th topic carries within document d.
Since each entry is a probability, each row sums to 1.
Looking at the architecture, θd is governed by the hyperparameter α, because LDA likewise assumes the document-topic proportions θd follow a Dirichlet distribution. The theoretical details come shortly.
9. Model Variables
zd,n assigns a topic to the n-th word of the d-th document.
For example, the first word of the third document is most likely to belong to Topic2.
wd,n assigns the actual word that appears in the document.
Continuing that example: if z_3,1 is in fact assigned to Topic2, then since Money has the highest probability in Topic2's word distribution, w_3,1 is most likely to be Money.
wd,n is influenced by zd,n and ϕk simultaneously.
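The generative story above, draw ϕk and θd from Dirichlet priors, then draw a topic zd,n and a word wd,n for each slot, can be sketched for a toy corpus. Sizes and hyperparameter values here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
K, V = 2, 4                                  # number of topics, vocabulary size (toy)
alpha, beta = 0.1, 0.1                       # Dirichlet hyperparameters

# phi: one word distribution per topic; theta: topic proportions for one document.
phi = rng.dirichlet([beta] * V, size=K)      # each row sums to 1 over the vocabulary
theta = rng.dirichlet([alpha] * K)           # sums to 1 over the K topics

def generate_document(n_words):
    words = []
    for _ in range(n_words):
        z = rng.choice(K, p=theta)           # z_{d,n}: pick a topic for this word slot
        w = rng.choice(V, p=phi[z])          # w_{d,n}: pick a word from that topic
        words.append(int(w))
    return words

doc = generate_document(10)
```

Inference, covered next, runs this story backwards: given only `doc`, recover plausible `z`, `phi`, and `theta`.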
11. LDA Inference
So far we have covered the document-generation process LDA assumes and the roles of the latent variables.
Now we go the other way: the inference step, estimating the latent variables from the observed w_d,n.
LDA assumes the words in a document are generated by combining the topic-word distributions and the document-topic distributions.
From the words actually observed in documents, we estimate what we want to know: the topic-word distributions ϕk and the document-topic distributions θd.
If the assumed generation process is reasonable, this joint probability will be very large.
12. LDA Inference
Here, every variable is unknown except the hyperparameters α and β and the observable w_d,n.
Ultimately, the goal is to find the z, ϕ, θ that maximize p(z, ϕ, θ | w).
But the denominator p(w) cannot be computed directly, so Gibbs sampling is used.
18. Gibbs Sampling in LDA
LDA uses collapsed Gibbs sampling: vary one variable at a time while holding the rest fixed, and integrate out the variables that are not needed.
Put simply, once z is known, ϕ and θ can be recovered from z, so only z is sampled.
The Gibbs sampling step of LDA is expressed by the following formula,
where z_-i denotes the topic assignments of all words except the i-th word.
28. Worked Example
In this example, z_1,2 is most likely to be assigned to Topic1.
But because topics are assigned stochastically, it is not guaranteed to land on Topic1.
29. Worked Example
If z_1,2 does end up assigned to Topic1, then Doc1's topic distribution θ1 and the first topic's word distribution ϕ1 are as follows.
30. The Role of the Dirichlet Parameters
A is the strength of association between the d-th document and the k-th topic.
B is the strength of association between the n-th word of the d-th document (w_d,n) and the k-th topic.
In the earlier example there was a situation where no word was assigned to Topic2 (n_1,2 = 0).
Ordinarily A, the association between the first document and Topic2, would then be 0,
and if A is 0, the probability that z_d,i becomes Topic2 is also 0.
31. The Role of the Dirichlet Parameters
But thanks to the hyperparameter α, A never drops to exactly 0.
It acts as a kind of smoothing: the larger α is, the more uniform the topic distributions become; the smaller it is, the more a few topics dominate.
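The A·B sampling rule and the smoothing role of α and β can be sketched for a toy corpus. This naive, uncollapsed counting version is for illustration only (real implementations keep running count tables); the toy documents and assignments are assumptions:

```python
import numpy as np

def gibbs_topic_probs(d, w, z, docs, K, V, alpha, beta, skip):
    """Normalized p(z_i = k | z_-i, w) proportional to A * B, with Dirichlet smoothing.

    A: how strongly document d is associated with topic k (count + alpha),
    B: how strongly word w is associated with topic k (count + beta, normalized).
    The token at position `skip` in document d is excluded from all counts (z_-i).
    """
    probs = np.zeros(K)
    for k in range(K):
        n_dk = sum(1 for j, zj in enumerate(z[d]) if zj == k and j != skip)
        n_kw = sum(1 for d2 in range(len(docs)) for j, wj in enumerate(docs[d2])
                   if z[d2][j] == k and wj == w and (d2, j) != (d, skip))
        n_k = sum(1 for d2 in range(len(docs)) for j in range(len(docs[d2]))
                  if z[d2][j] == k and (d2, j) != (d, skip))
        A = n_dk + alpha                      # alpha keeps A from ever being exactly 0
        B = (n_kw + beta) / (n_k + V * beta)  # beta smooths the topic-word counts
        probs[k] = A * B
    return probs / probs.sum()

docs = [[0, 1], [2, 3]]                       # two toy documents (word ids)
z = [[0, 0], [1, 1]]                          # current topic assignments
p = gibbs_topic_probs(d=0, w=1, z=z, docs=docs, K=2, V=4,
                      alpha=0.1, beta=0.1, skip=1)
```

Even though no remaining token ties document 0's word to Topic2, `p` stays strictly positive everywhere, which is exactly the smoothing effect of α and β described above.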