The Transformer is an established architecture in natural language processing that combines a self-attention framework with a deep learning approach.
This presentation was delivered under the mentorship of Mr. Mukunthan Tharmakulasingam (University of Surrey, UK), as part of the ScholarX program from the Sustainable Education Foundation.
Using Large Language Models in 10 Lines of Code - Gautier Marti
Modern NLP models can be daunting: no more bag-of-words, but complex neural network architectures with billions of parameters. Engineers, financial analysts, entrepreneurs, and mere tinkerers, fear not! You can get started with as few as 10 lines of code.
Presentation prepared for the Abu Dhabi Machine Learning Meetup Season 3 Episode 3 hosted at ADGM in Abu Dhabi.
Source: Selvas AI TTS tech blog: https://tts.selvy.ai/2018/09/ai.html
Speech synthesis technology began with the human desire for machines to speak like people. Over the past several decades it has advanced dramatically thanks to the efforts of many researchers. In particular, recent deep-learning-based speech synthesis reproduces even the intonation and subtle breathing of human speech. These advances go beyond producing natural-sounding speech and are evolving into technology that expresses emotion and individuality.
For the past several years we have focused on deep-learning-based speech synthesis and have made several meaningful advances that bring us closer to our goal of commercialization.
Recently we have focused on Google's Tacotron, which is based on a sequence-to-sequence model. Tacotron is an excellent technique that produces the most natural-sounding speech of any synthesis technology to date, but it suffers from a skip problem, in which certain words are not synthesized, and a repetition problem, in which certain words are repeated. These failures are a serious source of instability when commercialization is the goal. We worked to solve them and obtained meaningful results by devising new algorithms and training recipes.
In particular, focusing on the fact that skips and repetitions stem from insufficient phone sequence probability and from instability in the attention mechanism, we devised an Advanced Encoder and a Weighted Location Attention algorithm, and combined them with roughly 20 years of accumulated speech synthesis know-how to create xVoice.
* Selvy deepTTS
Selvas AI's Selvy deepTTS is an end-to-end speech synthesis solution powered by xVoice, a deep-learning-based synthesis engine. Compared with traditional synthesizers, Selvy deepTTS produces more natural speech and can also imitate a specific person's voice and speaking style.
* Improving Naturalness
Selvy deepTTS learns the natural prosody, pronunciation, and intonation of human speech to generate natural-sounding audio. It reproduces the liaison between syllables, the pauses within a sentence, and the small breaths taken while speaking, just as a person would.
* Human-like Expressive Speech
It produces speech with a more natural and emotionally expressive feel.
* Personalized TTS
With only minutes to a few hours of recorded speech, xVoice can build a synthesizer that speaks in that person's voice. A synthesizer built with xVoice can be embedded in Selvy deepTTS and used for real-time speech synthesis services.
* Overcome Skip/Repetition Problem
The chronic problem of end-to-end speech synthesis, in which parts of a sentence are skipped or repeated during synthesis, was solved by combining Selvas AI's accumulated know-how with newly devised algorithms.
* Real-Time Synthesis
It provides real-time speech synthesis: enter text and speech is generated immediately. Regardless of the length of the text, the whole process, from entering the text and requesting synthesis to hearing the result, takes less than one second.
For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/sep-2019-alliance-vitf-facebook
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Raghuraman Krishnamoorthi, Software Engineer at Facebook, delivers the presentation "Quantizing Deep Networks for Efficient Inference at the Edge" at the Embedded Vision Alliance's September 2019 Vision Industry and Technology Forum. Krishnamoorthi gives an overview of practical deep neural network quantization techniques and tools.
IEEE SocialCom 2015: Intent Classification of Social Media Text - Hemant Purohit
Social media platforms facilitate the emergence of citizen communities that discuss real-world events, and generate/share content with a variety of intent ranging from social good (e.g., volunteering to help) to commercial interest (e.g., criticizing product features). Hence, mining intent from social data can aid in filtering social media to support organizations, such as an emergency management unit for resource planning. However, effective intent mining is inherently challenging due to ambiguity in interpretation, and sparsity of relevant behaviors in social data. In this research, we address the problem of multiclass classification of intent with a use-case of social data generated during crisis events. Our novel method exploits a hybrid feature representation created by combining top-down processing using knowledge-guided patterns with bottom-up processing using a bag-of-tokens model. We employ pattern-set creation from a variety of knowledge sources including psycholinguistics to tackle the ambiguity challenge, social behavior about conversations to enrich context, and contrast patterns to tackle the sparsity challenge.
A Review of Deep Contextualized Word Representations (Peters+, 2018) - Shuntaro Yada
A brief review of the paper:
Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. In NAACL-HLT (pp. 2227–2237)
I summarized the GPT models in these slides and compared GPT-1, GPT-2, and GPT-3.
GPT stands for Generative Pre-Training of a language model; the models are implemented on the decoder structure of the Transformer model.
(24th May, 2021)
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train.
Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.0 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
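To make the core mechanism concrete, here is a minimal NumPy sketch of scaled dot-product self-attention, the building block the abstract refers to; this is an illustration rather than the authors' code, and the shapes and function name are chosen for the example.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: (seq_len, d_k) arrays; returns one attention output per position."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # similarity between positions
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                               # weighted sum of the values

x = np.random.randn(4, 8)                            # 4 positions, d_k = 8
out = scaled_dot_product_attention(x, x, x)          # self-attention: Q = K = V = x
print(out.shape)                                     # (4, 8)
```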
Overview of tree algorithms from decision tree to XGBoost - Takami Sato
To deepen my own understanding, I surveyed popular tree algorithms in machine learning and their evolution. This is the first presentation I have written in English, so I would be happy to receive your feedback.
Fine-tune and deploy Hugging Face NLP models - OVHcloud
Are you currently managing AI projects that require a lot of GPU power?
Are you tired of managing the complexity of your infrastructure, GPU instances, and Kubeflow yourself?
Need flexibility for your AI platform or SaaS solution?
OVHcloud innovates in AI by offering simple and turnkey solutions to train your models and put them into production.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2022/08/how-transformers-are-changing-the-direction-of-deep-learning-architectures-a-presentation-from-synopsys/
Tom Michiels, System Architect for DesignWare ARC Processors at Synopsys, presents the “How Transformers are Changing the Direction of Deep Learning Architectures” tutorial at the May 2022 Embedded Vision Summit.
The neural network architectures used in embedded real-time applications are evolving quickly. Transformers are a leading deep learning approach for natural language processing and other time-dependent, series data applications. Now, transformer-based deep learning network architectures are also being applied to vision applications with state-of-the-art results compared to CNN-based solutions.
In this presentation, Michiels introduces transformers and contrasts them with the CNNs commonly used for vision tasks today. He examines the key features of transformer model architectures and shows performance comparisons between transformers and CNNs. He concludes the presentation with insights on why Synopsys thinks transformers are an important approach for future visual perception tasks.
Hello~! :)
While studying the Sutton and Barto book, the classic textbook for reinforcement learning, I created slides about multi-armed bandits, the topic of Chapter 2.
If there are any mistakes, I would appreciate your feedback right away.
Thank you.
This is a presentation I gave as a short overview of LSTMs. The slides are accompanied by two examples which apply LSTMs to Time Series data. Examples were implemented using Keras. See links in slide pack.
A Simple Explanation of the paper XLNet (https://arxiv.org/abs/1906.08237).
It may help you get to grips with the concepts behind XLNet before you dive into the paper.
An introduction to the Transformers architecture and BERT - Suman Debnath
The transformer is one of the most popular state-of-the-art (SOTA) deep learning architectures, used mostly for natural language processing (NLP) tasks. Ever since its advent, the transformer has replaced RNNs and LSTMs for various tasks. It also created a major breakthrough in the field of NLP and paved the way for new revolutionary architectures such as BERT.
Introduction to Agents and Multi-agent Systems (lecture slides) - Dagmar Monett
Online lecture at the School of Computer Science, University of Hertfordshire, Hatfield, UK, as part of the 10th Europe Week from 3rd to 7th March 2014.
Recurrent Neural Networks have been shown to be very powerful models, as they can propagate context over several time steps. Because of this, they can be applied effectively to several problems in Natural Language Processing, such as language modelling, tagging problems, and speech recognition. In this presentation we introduce the basic RNN model and discuss the vanishing gradient problem. We describe LSTM (Long Short Term Memory) and Gated Recurrent Units (GRU). We also discuss Bidirectional RNNs with an example. RNN architectures can be considered deep learning systems in which the number of time steps plays the role of the depth of the network. It is also possible to build an RNN with multiple hidden layers, each having recurrent connections from the previous time steps, representing abstraction both in time and space.
This presentation discusses decision trees as a machine learning technique. It introduces the problem with several examples: cricket player selection, medical C-section diagnosis, and a mobile phone price predictor. It discusses the ID3 algorithm and how the decision tree is induced, and covers the definition and use of concepts such as entropy and information gain.
Revised presentation slide for NLP-DL, 2016/6/22.
Recent Progress (from 2014) in Recurrent Neural Networks and Natural Language Processing.
Profile http://www.cl.ecei.tohoku.ac.jp/~sosuke.k/
Japanese ver. https://www.slideshare.net/hytae/rnn-63761483
http://imatge-upc.github.io/telecombcn-2016-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or text captioning.
FABIA: Large Data Biclustering in Drug Design - Martin Heusel
Biclustering groups features and samples simultaneously. It is an emerging tool for analyzing large data sets like transcriptomics or chemoinformatics data.
In this work we apply FABIA biclustering to the ChEMBL database where compounds are described by the substructures they possess (the fingerprints). Chemical substructures are assumed to cause or inhibit biological activity and serve, thereby, as building blocks in drug design. ChEMBL biclusters group compounds together which have the same substructure. If a bicluster can be related to a biological activity then the biological effect of a substructure is identified. For example, FABIA found a large ChEMBL bicluster where the compounds have a common substructure which could be related to a bioactivity.
Slides from the talk on recurrent networks and LSTMs at the SV AI and Big Data Association meetup. A full video of the talk: https://www.youtube.com/watch?v=TiHpdp4QC6k.
Rails has long co-existed with JavaScript in a variety of ways. As the JavaScript ecosystem grows more powerful and complex each day, finding a better way to make JavaScript a first-class citizen in the Rails world has become compelling. Rails 5.1 will officially ship with Webpack through the Webpacker gem, but you don't have to wait for that: you can use Webpacker with Rails 4.2+ today. We briefly describe how JavaScript has existed in the Rails world, and then jump straight into creating a simple Rails/JavaScript app from scratch in less than 3 minutes.
2017 TensorFlow Dev Summit (Sequence Models and the RNN API)
These slides were presented on February 22, 2017, starting at 8 p.m. at Maru180, at the 2017 TensorFlow Dev Summit Extended Seoul event hosted by GDG Seoul.
They share a summary of the Sequence Models and the RNN API session.
[Korean] Neural Architecture Search with Reinforcement Learning - Kiho Suh
Sharing the slides for the paper "Neural Architecture Search with Reinforcement Learning", presented at Modulabs. The paper shows well what Google's AutoML, which automates part of the machine learning development work, is trying to do.
The paper describes a deep learning architecture that designs deep learning architectures. Using 800 GPUs (or 400 CPUs), it produced networks at or just below the state of the art while being faster and smaller. The paradigm has now shifted from feature engineering to neural network engineering, and this paper is one of the first attempts at that shift.
Learning RBM (Restricted Boltzmann Machine in Practice) - Mad Scientists
In deep learning, the RBM is a basic building block from which layers are composed hierarchically. In these slides, we can learn the basic components of an RBM: the bipartite graph, Gibbs sampling, contrastive divergence (CD-1), and the energy function.
Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations - Mad Scientists
Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations, Honglak Lee(ICML 2009)
These are my notes from reading and analyzing the paper for a master's seminar presentation. CDBN combines the strengths of CNNs and DBNs to secure translation invariance and computational competence, and probabilistic max-pooling makes it possible to construct an undirected DBM capable of image restoration.
Sogang University Machine Learning and Data Mining lab seminar: Neural Networks for newbies, and Convolutional Neural Networks. This is prerequisite material for understanding deep convolutional architectures.
Relational Mate Value: Consensus and Uniqueness in Romantic Evaluations - Mad Scientists
Usually we define mate value as a set of traits; in social exchange theory this is called the 'classic model'. In this presentation we introduce the 'relational model', which examines a person's uniqueness. In this paper the authors found that, as time goes by, mate value ratings become more precise when the relational model is applied instead of the classic model. We then discuss in detail how these traits are measured.
While visiting Finland, I felt that a lot had already passed me by, because I was not familiar with Finland's start-up culture; frankly speaking, I was not familiar with start-up culture at all.
Everything I had heard about it before came only from YouTube, Coursera, or books. When I stayed in Finland, however, I saw that many people there had already adopted the start-up culture and an entrepreneurial mindset.
This presentation captures my experience at that time, in Helsinki, Finland, in 2012. In particular, I want to thank Fastr books, the Catch box team, and Startup Sauna; without them, this presentation and what I felt would not have come out this way.
I am very proud that those companies and organisations are my friends. I hope many Korean entrepreneurs read this presentation and are inspired to grow.
Face Feature Recognition System with Deep Belief Networks, for Korean/KIISE T... - Mad Scientists
I submitted this thesis to KIISE in 2014.
In this presentation, I explain why I use deep learning to find facial features and what the limitations of previous methods are.
We tend to think of superhero movies as neither cultural nor ideological, but they are in fact thoroughly so: they deliver Americanization as if it were globalization. We analyze this topic from a cultural perspective and suggest a more ideal direction.
[SW Maestro] Team Loclas 1-2 Final Presentation - Mad Scientists
Using data mining, we classify the preferences of 90% of non-logged-in users from their search keywords.
This is useful in practice because it turns otherwise useless data into value and makes targeted marketing easy to perform.
Capitalism is a society that acknowledges the existence of classes, but clearly those classes must not infringe on the freedoms and rights that democracy guarantees. By introducing advertisements that violate this principle, this presentation shows and analyzes how the notion of class produced by modern capital is a new kind of class that infringes on the rights of all, which today's democracy is supposed to protect.
2. RNN (Recurrent Neural Networks)
                          Recurrent Neural Network          Feedforward Network
Structure                 Contains at least one cycle       No cycles
                          (the so-called "memory")
Input-output              Time-sequential data              Current state only
Training approach         Backpropagation through time      Backpropagation
(most popular)            (BPTT)                            algorithm
Reference: http://www.cs.bham.ac.uk/~jxb/INC/l12.pdf
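To make the table concrete, here is a tiny NumPy sketch (illustrative only; the dimensions and weight names are invented for the example) of the structural difference: a feedforward layer maps each input independently, while a recurrent cell feeds its hidden state, the "memory", back into itself across time steps.

```python
import numpy as np

rng = np.random.default_rng(0)
W_in = rng.normal(size=(3, 4))      # input-to-hidden weights
W_rec = rng.normal(size=(4, 4))     # hidden-to-hidden (recurrent) weights

def feedforward(x):
    # No cycle: the output depends only on the current input.
    return np.tanh(x @ W_in)

def recurrent(xs):
    # Cycle: the hidden state h is fed back at every time step.
    h = np.zeros(4)
    for x in xs:                    # xs is time-sequential data
        h = np.tanh(x @ W_in + h @ W_rec)
    return h

xs = rng.normal(size=(5, 3))        # 5 time steps, 3 features each
print(feedforward(xs[0]).shape, recurrent(xs).shape)   # (4,) (4,)
```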
3. RNN (Recurrent Neural Networks)
• A model for processing sequential information (e.g. machine translation)
• Recurrent: the output is influenced by the results of previous computations (i.e. the network has memory)
• Basic RNN structure
X_t: input at time step t
S_t: hidden state at time step t (U and W are learned; S_-1 = 0)
O_t: output at time step t, O_t = f(V·S_t)
Reference paper: Recurrent Neural Network Regularization
4. RNN (Recurrent Neural Networks)
X_t: input at time step t
S_t: hidden state at time step t (U and W are learned; S_-1 = 0)
O_t: output at time step t (e.g. machine translation, word completion)
[Note]
• The hidden state S_t carries all past information, while O_t is the output for the current information.
• An RNN shares the same parameters (U, V, W) across all time steps.
(This gives rise to the long-term dependency problem.)
5. RNN (Recurrent Neural Networks)
X_t: input at time step t
S_t: hidden state at time step t (U and W are learned; S_-1 = 0)
O_t: output at time step t (e.g. a translated word vector)
[Note]
• The hidden state S_t carries all past information, while O_t is the output for the current information.
• An RNN shares the same parameters (U, V, W) across all time steps.
(This gives rise to the long-term dependency problem.)
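A minimal NumPy sketch of the forward pass described on these slides, using the same symbols (U, W, V, S_t, O_t); the dimensions and the choice of softmax for f are assumptions made for the example.

```python
import numpy as np

def rnn_forward(X, U, W, V):
    """X: (T, d_in) inputs; returns hidden states S and outputs O for every time step."""
    T, d_h = X.shape[0], W.shape[0]
    S = np.zeros((T, d_h))
    O = []
    s_prev = np.zeros(d_h)                       # S_-1 = 0
    for t in range(T):
        S[t] = np.tanh(U @ X[t] + W @ s_prev)    # hidden state carries past information
        z = V @ S[t]
        z -= z.max()
        O.append(np.exp(z) / np.exp(z).sum())    # O_t = f(V S_t), here f = softmax
        s_prev = S[t]
    return S, np.array(O)

# U, V, W are shared across all time steps, as noted above.
U, W, V = np.random.randn(5, 3), np.random.randn(5, 5), np.random.randn(2, 5)
S, O = rnn_forward(np.random.randn(4, 3), U, W, V)   # 4 steps, 3 input features
```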
6. RNN (Recurrent Neural Networks)
RNN training is done with BPTT (Backpropagation Through Time), which means that to learn U, V, W at step t_n the states from t_1 through t_(n-1) must all be accumulated; it has been reported that this makes training difficult due to exploding/vanishing gradients.
(Bengio, On the difficulty of training recurrent neural networks)
• Explosion occurs when learning long-term components.
• Vanishing occurs while learning the correlation with the next event (the weight delta goes to 0).
[Note] Paper 2: Exploding and vanishing gradients
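A rough numeric illustration of the mechanism (not from the cited paper): the gradient that BPTT carries back over k time steps contains k repeated factors of the recurrent weight, so its norm shrinks or grows geometrically depending on that weight's scale.

```python
import numpy as np

def backprop_factor(w_scale, steps, d_h=8):
    """Norm of the product of `steps` recurrent Jacobians (tanh term ignored)."""
    rng = np.random.default_rng(1)
    W = rng.normal(scale=w_scale / np.sqrt(d_h), size=(d_h, d_h))
    J = np.eye(d_h)
    for _ in range(steps):
        J = J @ W                  # one factor of W for every step travelled back
    return np.linalg.norm(J)

for steps in (1, 10, 50):
    print(steps, backprop_factor(0.5, steps), backprop_factor(2.0, steps))
# A small recurrent weight makes the gradient vanish over long spans;
# a large one makes it explode.
```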
7. RNN (Recurrent Neural Networks)
BPTT (Backpropagation Through Time) is sometimes replaced by truncated BPTT (Truncated Backpropagation Through Time) for RNN training; in that case, memory that should persist over the long term is known to be lost.
BPTT vs. TBPTT (diagram)
8. RNN (Recurrent Neural Networks)
RNN training: BPTT (Backpropagation Through Time)
http://kiyukuta.github.io/2013/12/09/mlac2013_day9_recurrent_neural_network_language_model.html
Green: records how wrong the result was
Orange: updated by the amount of the green error
10. RNN (Recurrent Neural Networks)
RNN training: BPTT (Backpropagation Through Time)
http://kiyukuta.github.io/2013/12/09/mlac2013_day9_recurrent_neural_network_language_model.html
e_h(t): error in the hidden layer
U(t+1) = U(t) + sum over steps 0..t-1 of [weight]·err·[lr] - U(t)·beta
W(t+1) = W(t) + sum over steps 0..t-1 of [recurrent state]·err·[lr] - W(t)·beta
d_h: derivative of the sigmoid function
11. RNN (Recurrent Neural Networks)
RNN training: BPTT (Backpropagation Through Time)
http://kiyukuta.github.io/2013/12/09/mlac2013_day9_recurrent_neural_network_language_model.html
e_h(t-tau-1): error of the hidden layer one step further in the past
e_h(t-tau): the error flowing back through the weight W in the opposite direction
d_h: derivative of the sigmoid function
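Pulling the two slides together, here is a hedged NumPy sketch of how BPTT accumulates the updates for U and W by pushing the hidden-layer error back through earlier time steps. It assumes the simple tanh RNN from the earlier slides (the source assumes sigmoid units), and the variable names are invented for the example.

```python
import numpy as np

def bptt_grads(xs, S, e_h_last, U, W, truncate=4):
    """Accumulate dU, dW from the hidden-layer error e_h_last at the final step,
    pushing it back `truncate` steps (truncated BPTT). S[t] = tanh(U xs[t] + W S[t-1])."""
    T = len(xs)
    dU, dW = np.zeros_like(U), np.zeros_like(W)
    delta = e_h_last * (1.0 - S[T - 1] ** 2)           # through the tanh nonlinearity
    for tau in range(min(truncate, T)):
        t = T - 1 - tau
        s_prev = S[t - 1] if t > 0 else np.zeros(W.shape[0])
        dU += np.outer(delta, xs[t])                   # contribution of step t to U
        dW += np.outer(delta, s_prev)                  # contribution of step t to W
        delta = (W.T @ delta) * (1.0 - s_prev ** 2)    # error flows back through W
    return dU, dW

# A gradient-descent update with weight decay, mirroring the U(t+1)/W(t+1) rules above:
#   U -= lr * dU + beta * U
#   W -= lr * dW + beta * W
```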
12. RNN (Recurrent Neural Networks)
Other RNNs: [Bidirectional RNN, Deep (Bidirectional) RNN, LSTM]
http://aikorea.org/blog/rnn-tutorial-1/
Bidirectional RNN: a model built on the idea that the output at step t may also be influenced by later inputs. Two RNNs exist side by side, and the output depends on the hidden states of both.
13. RNN (Recurrent Neural Networks)
Other RNNs: [Bidirectional RNN, Deep (Bidirectional) RNN, LSTM]
http://aikorea.org/blog/rnn-tutorial-1/
Deep Bidirectional RNN: an RNN built by adding extra layers at every time step of a Bidirectional RNN. It has greater computational capacity but needs far more training data.
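As a sketch of how these two variants are typically assembled, here is a short Keras example (the framework choice, layer sizes, and output dimension are assumptions for illustration, not code from the slides).

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(None, 16))    # (time steps, features)

# Bidirectional RNN: two RNNs run over the sequence in opposite directions,
# and the output at each step depends on both hidden states.
x = layers.Bidirectional(layers.SimpleRNN(32, return_sequences=True))(inputs)

# Deep bidirectional RNN: an extra recurrent layer at every time step,
# which adds capacity but also requires much more training data.
x = layers.Bidirectional(layers.SimpleRNN(32, return_sequences=True))(x)

outputs = layers.Dense(10)(x)
model = tf.keras.Model(inputs, outputs)
model.summary()
```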
14. RNN (Recurrent Neural Networks)
Other RNNs: [Bidirectional RNN, Deep (Bidirectional) RNN, LSTM]
http://aikorea.org/blog/rnn-tutorial-1/
LSTM Network: a network introduced to overcome the RNN's weak long-term dependency when inferring from past time steps. Rather than simply learning weights, its structure focuses on deciding which information to forget and which to update. Instead of RNN neurons, an LSTM uses a structure called a memory cell to decide what to forget and what to keep in the current state.
15. LSTM (Long Short Term Memory) Architecture
Developed by Hochreiter and Schmidhuber (1997) to solve the long-term dependency problem of RNNs. It was designed not so much to infer from a single run of training as to remember and reproduce what happened in the past.
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
X_t: input at time step t
16. LSTM (Long Short Term Memory) Architecture
Developed by Hochreiter and Schmidhuber (1997) to solve the long-term dependency problem of RNNs. It was designed not so much to infer from a single run of training as to remember and reproduce what happened in the past. The version with the memory cell C was introduced by Gers et al. (2000).
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
X_t: input at time step t
C_t: cell state, which decides whether to add or erase information
17. LSTM (Long Short Term Memory) Architecture
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
The forget gate layer looks at h_[t-1] and x_t and passes them through a sigmoid, producing a number between 0 and 1 for each entry of C_[t-1]. Its output f_t() is the forget gate.
18. LSTM (Long Short Term Memory) Architecture
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
The input gate layer i_t() decides which values to update. A tanh() layer creates [C_t]^~, the vector of candidate values that could be added to the cell state. The cell state C is updated in the next step.
19. LSTM (Long Short Term Memory) Architecture
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
The new cell state C_t is the sum of the old cell state C_[t-1] passed through the forget gate and the new candidate cell state [C_t]^~ passed through the input gate; the candidates are scaled by how much we decided to update each cell state value. Because no gradient multiplications accumulate along the cell state, it neither explodes nor vanishes, which is why the cell state can hold long-term memory.
20. LSTM (Long Short Term Memory) Architecture
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
h_t is the output and o_t() is the output gate. The output gate looks at the hidden state h_[t-1] and the current input x_t and decides which parts to output. The new hidden state is obtained by applying the output gate result to the newly updated cell state.
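Collecting slides 17 through 20 in one place, here is a minimal NumPy sketch of a single LSTM step. The weight shapes and the concatenation of h_[t-1] with x_t are conventions assumed for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    """One LSTM step, following the gate order of the slides above."""
    z = np.concatenate([h_prev, x_t])        # the gates look at h_[t-1] and x_t
    f_t = sigmoid(W_f @ z + b_f)             # forget gate: what to erase from C_[t-1]
    i_t = sigmoid(W_i @ z + b_i)             # input gate: which values to update
    c_tilde = np.tanh(W_c @ z + b_c)         # candidate cell state [C_t]^~
    c_t = f_t * c_prev + i_t * c_tilde       # new cell state: additive update, so no
                                             # repeated gradient products (long-term memory)
    o_t = sigmoid(W_o @ z + b_o)             # output gate
    h_t = o_t * np.tanh(c_t)                 # new hidden state / output
    return h_t, c_t
```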
21. LSTM (Long Short Term Memory) Architecture
http://www.slideshare.net/jpatanooga/modeling-electronic-health-records-with-recurrent-neural-networks
¤: element-wise multiplication
h_t^l: hidden state of layer l at time step t
g^t(): candidate cell state
• Dropout is known not to work well when attached to an RNN, reportedly because it erases past information that must not be erased (Recurrent Neural Network Regularization, ICLR 2015). For this reason, the idea of that paper is to apply the dropout operation only to non-recurrent connections.
[LSTM feed-forward pass]
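In Keras terms (an assumed mapping, not code from the slide), applying dropout only to the non-recurrent connections corresponds roughly to setting the `dropout` argument while leaving `recurrent_dropout` at zero.

```python
import tensorflow as tf

# dropout=0.3 drops input (non-recurrent) connections only;
# recurrent_dropout=0.0 leaves the recurrent connections intact, so the past
# information carried by the hidden and cell states is not erased.
layer = tf.keras.layers.LSTM(128, dropout=0.3, recurrent_dropout=0.0,
                             return_sequences=True)
```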
22. LSTM (Long Short Term Memory) Architecture
http://www.slideshare.net/jpatanooga/modeling-electronic-health-records-with-recurrent-neural-networks
• Depending on its structure, an LSTM can be used in many different ways.
• A 2D tensor consists of (# of inputs, # of examples).
• A 3D tensor is the [2D tensor] with timesteps added; in other words, the whole chunk is used as the input.
[LSTM use cases]
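For example, under the (examples, timesteps, inputs) convention used by Keras, assumed here for illustration, an LSTM consumes the whole 3D chunk at once.

```python
import numpy as np
import tensorflow as tf

examples, timesteps, n_inputs = 32, 10, 8
x = np.random.randn(examples, timesteps, n_inputs).astype("float32")  # 3D tensor

lstm = tf.keras.layers.LSTM(16)   # consumes the whole (timesteps, inputs) chunk per example
h = lstm(x)                       # -> shape (32, 16): last hidden state for each example
print(h.shape)
```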
23. LSTM (Long Short Term Memory) Architecture
http://www.slideshare.net/jpatanooga/modeling-electronic-health-records-with-recurrent-neural-networks
• An LSTM in which each gate (forget, input, output) can additionally look at the cell state, so the gates compute in a way that is somewhat more dependent on past data.
• Gers and Schmidhuber (2000)
[LSTM variant: peephole connections]
24. LSTM (Long Short Term Memory) Architecture
http://www.slideshare.net/jpatanooga/modeling-electronic-health-records-with-recurrent-neural-networks
• Merges the forget gate and the input gate into a single cell update gate.
• The input gate disappears; instead, the input is used to determine the candidate cell state and the new hidden state.
• The resulting model is simpler than the standard LSTM.
• Cho et al. (2014)
[LSTM variant: GRU (Gated Recurrent Unit)]
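A hedged NumPy sketch of a standard GRU step in the spirit of Cho et al. (2014); the merged gate described above appears here as the update gate z_t. The weight shapes, the concatenation convention, and the omission of biases are simplifications for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_z, W_r, W_h):
    """One GRU step; each W_* acts on the concatenation of h_[t-1] and x_t."""
    zx = np.concatenate([h_prev, x_t])
    z_t = sigmoid(W_z @ zx)                       # update gate (merged forget/input)
    r_t = sigmoid(W_r @ zx)                       # reset gate
    h_tilde = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]))  # candidate state
    return (1.0 - z_t) * h_prev + z_t * h_tilde   # new hidden state (also the output)
```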
25. LSTM (Long Short Term Memory) Architecture
• There are also styles such as the Depth-Gated RNNs introduced by Yao et al. (2015).
• Clockwork RNNs (Koutnik et al., 2014) are RNNs modified to handle long-term dependencies efficiently.
• Greff et al. (2015) ran experiments comparing the well-known variants (GRU, peephole, etc.) and found that the results are similar.
• Jozefowicz et al. (2015) tested and compared a variety of RNN variants and confirmed that certain architectures suit certain data sets.
• The next step after LSTM is said to be attention, which, for example, gathers several pieces of information that matter more than the rest.
[LSTM variants: others]
26. LSTM (Long Short Term Memory) Architecture
(The bullets from the previous slide are repeated; the slide adds an attention-style question-answering example.)
• [Input]: Jaehyun got up from the bed.
• [Input]: Jaehyun went to the table and picked up an apple.
• [Input]: He ate the apple.
• [Input]: Jaehyun moved to the living room.
• [Input]: He dropped the apple.
• [Question]: Where did Jaehyun drop the apple?
[LSTM variants: others]
27. LSTM (Long Short Term Memory) Architecture
(The same example, now with the answer.)
• [Answer]: The living room
[LSTM variants: others]
28. LSTM (Long Short Term Memory) Architecture
(The same example, with the reference for attention added.)
• Show, Attend and Tell, Kelvin Xu et al. (2015)
[LSTM variants: others]