The Math and internal workings of Variational Auto Encoders. It talks about Variational Inference and how VAEs take advantage of them to bypass having to calculate Marginal Probability distribution.
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train.
Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.0 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
Introduction For seq2seq(sequence to sequence) and RNNHye-min Ahn
This is my slides for introducing sequence to sequence model and Recurrent Neural Network(RNN) to my laboratory colleagues.
Hyemin Ahn, @CPSLAB, Seoul National University (SNU)
http://imatge-upc.github.io/telecombcn-2016-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or text captioning.
Slides reviewing the paper:
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. "Attention is all you need." In Advances in Neural Information Processing Systems, pp. 6000-6010. 2017.
The dominant sequence transduction models are based on complex recurrent orconvolutional neural networks in an encoder and decoder configuration. The best performing such models also connect the encoder and decoder through an attentionm echanisms. We propose a novel, simple network architecture based solely onan attention mechanism, dispensing with recurrence and convolutions entirely.Experiments on two machine translation tasks show these models to be superiorin quality while being more parallelizable and requiring significantly less timeto train. Our single model with 165 million parameters, achieves 27.5 BLEU onEnglish-to-German translation, improving over the existing best ensemble result by over 1 BLEU. On English-to-French translation, we outperform the previoussingle state-of-the-art with model by 0.7 BLEU, achieving a BLEU score of 41.1.
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train.
Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.0 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
Introduction For seq2seq(sequence to sequence) and RNNHye-min Ahn
This is my slides for introducing sequence to sequence model and Recurrent Neural Network(RNN) to my laboratory colleagues.
Hyemin Ahn, @CPSLAB, Seoul National University (SNU)
http://imatge-upc.github.io/telecombcn-2016-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or text captioning.
Slides reviewing the paper:
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. "Attention is all you need." In Advances in Neural Information Processing Systems, pp. 6000-6010. 2017.
The dominant sequence transduction models are based on complex recurrent orconvolutional neural networks in an encoder and decoder configuration. The best performing such models also connect the encoder and decoder through an attentionm echanisms. We propose a novel, simple network architecture based solely onan attention mechanism, dispensing with recurrence and convolutions entirely.Experiments on two machine translation tasks show these models to be superiorin quality while being more parallelizable and requiring significantly less timeto train. Our single model with 165 million parameters, achieves 27.5 BLEU onEnglish-to-German translation, improving over the existing best ensemble result by over 1 BLEU. On English-to-French translation, we outperform the previoussingle state-of-the-art with model by 0.7 BLEU, achieving a BLEU score of 41.1.
Research on character level language modelling using LSTM for semi-supervised learning. The objective is learning inner layer representations of the language model for transfer learning into a classification one.
Generalizing NLP processes using Bi-directional LSTMs to learn character(byte) level embeddings of financial news headlines up too 8 bits ( 2**8 -1) in order to study the relationship between character vectors in financial news headlines in order to transfer learning in to classification models using UTF-8 encoding. Many traditional NLP steps (lemmatize, POS, NER, stemming...) are skipped when diving to byte level making the process more universal in terms of scope then task specific.
Comparing Incremental Learning Strategies for Convolutional Neural NetworksVincenzo Lomonaco
In the last decade, Convolutional Neural Networks (CNNs) have shown to perform incredibly well in many computer vision tasks such as object recognition and object detection, being able to extract meaningful high-level invariant features. However, partly because of their complex training and tricky hyper-parameters tuning, CNNs have been scarcely studied in the context of incremental learning where data are available in consecutive batches and retraining the model from scratch is unfeasible. In this work we compare different incremental learning strategies for CNN based architectures, targeting real-word applications.
If you are interested in this work please cite:
Lomonaco, V., & Maltoni, D. (2016, September). Comparing Incremental Learning Strategies for Convolutional Neural Networks. In IAPR Workshop on Artificial Neural Networks in Pattern Recognition (pp. 175-184). Springer International Publishing.
For further information visit my website: http://www.vincenzolomonaco.com/
Interpreting deep learning and machine learning models is not just another regulatory burden to be overcome. Scientists, physicians, researchers, and analyst that use these technologies for their important work have the right to trust and understand their models and the answers they generate. This talk is an overview of several techniques for interpreting deep learning and machine learning models and telling stories from their results.
Speaker: Patrick Hall is a Data Scientist and Product Engineer at H2O.ai. He’s also an Adjunct Professor at George Washington University for the Department of Decision Sciences. Prior to joining H2O, Patrick spent many years as a Senior Data Scientist SAS and has worked with many Fortune 500 companies on their data science and machine learning problems. https://www.linkedin.com/in/jpatrickhall
Yurii Pashchenko: Zero-shot learning capabilities of CLIP model from OpenAILviv Startup Club
Yurii Pashchenko: Zero-shot learning capabilities of CLIP model from OpenAI
AI & BigData Online Day 2021
Website - https://aiconf.com.ua/
Youtube - https://www.youtube.com/startuplviv
FB - https://www.facebook.com/aiconf
A Review of Deep Contextualized Word Representations (Peters+, 2018)Shuntaro Yada
A brief review of the paper:
Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. In NAACL-HLT (pp. 2227–2237)
This Edureka Recurrent Neural Networks tutorial will help you in understanding why we need Recurrent Neural Networks (RNN) and what exactly it is. It also explains few issues with training a Recurrent Neural Network and how to overcome those challenges using LSTMs. The last section includes a use-case of LSTM to predict the next word using a sample short story
Below are the topics covered in this tutorial:
1. Why Not Feedforward Networks?
2. What Are Recurrent Neural Networks?
3. Training A Recurrent Neural Network
4. Issues With Recurrent Neural Networks - Vanishing And Exploding Gradient
5. Long Short-Term Memory Networks (LSTMs)
6. LSTM Use-Case
A Simple Explanation of the paper XLNet(https://arxiv.org/abs/1906.08237).
It would be helpful to get to grips with the concepts XLNet before you dive into the paper.
Research on character level language modelling using LSTM for semi-supervised learning. The objective is learning inner layer representations of the language model for transfer learning into a classification one.
Generalizing NLP processes using Bi-directional LSTMs to learn character(byte) level embeddings of financial news headlines up too 8 bits ( 2**8 -1) in order to study the relationship between character vectors in financial news headlines in order to transfer learning in to classification models using UTF-8 encoding. Many traditional NLP steps (lemmatize, POS, NER, stemming...) are skipped when diving to byte level making the process more universal in terms of scope then task specific.
Comparing Incremental Learning Strategies for Convolutional Neural NetworksVincenzo Lomonaco
In the last decade, Convolutional Neural Networks (CNNs) have shown to perform incredibly well in many computer vision tasks such as object recognition and object detection, being able to extract meaningful high-level invariant features. However, partly because of their complex training and tricky hyper-parameters tuning, CNNs have been scarcely studied in the context of incremental learning where data are available in consecutive batches and retraining the model from scratch is unfeasible. In this work we compare different incremental learning strategies for CNN based architectures, targeting real-word applications.
If you are interested in this work please cite:
Lomonaco, V., & Maltoni, D. (2016, September). Comparing Incremental Learning Strategies for Convolutional Neural Networks. In IAPR Workshop on Artificial Neural Networks in Pattern Recognition (pp. 175-184). Springer International Publishing.
For further information visit my website: http://www.vincenzolomonaco.com/
Interpreting deep learning and machine learning models is not just another regulatory burden to be overcome. Scientists, physicians, researchers, and analyst that use these technologies for their important work have the right to trust and understand their models and the answers they generate. This talk is an overview of several techniques for interpreting deep learning and machine learning models and telling stories from their results.
Speaker: Patrick Hall is a Data Scientist and Product Engineer at H2O.ai. He’s also an Adjunct Professor at George Washington University for the Department of Decision Sciences. Prior to joining H2O, Patrick spent many years as a Senior Data Scientist SAS and has worked with many Fortune 500 companies on their data science and machine learning problems. https://www.linkedin.com/in/jpatrickhall
Yurii Pashchenko: Zero-shot learning capabilities of CLIP model from OpenAILviv Startup Club
Yurii Pashchenko: Zero-shot learning capabilities of CLIP model from OpenAI
AI & BigData Online Day 2021
Website - https://aiconf.com.ua/
Youtube - https://www.youtube.com/startuplviv
FB - https://www.facebook.com/aiconf
A Review of Deep Contextualized Word Representations (Peters+, 2018)Shuntaro Yada
A brief review of the paper:
Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. In NAACL-HLT (pp. 2227–2237)
This Edureka Recurrent Neural Networks tutorial will help you in understanding why we need Recurrent Neural Networks (RNN) and what exactly it is. It also explains few issues with training a Recurrent Neural Network and how to overcome those challenges using LSTMs. The last section includes a use-case of LSTM to predict the next word using a sample short story
Below are the topics covered in this tutorial:
1. Why Not Feedforward Networks?
2. What Are Recurrent Neural Networks?
3. Training A Recurrent Neural Network
4. Issues With Recurrent Neural Networks - Vanishing And Exploding Gradient
5. Long Short-Term Memory Networks (LSTMs)
6. LSTM Use-Case
A Simple Explanation of the paper XLNet(https://arxiv.org/abs/1906.08237).
It would be helpful to get to grips with the concepts XLNet before you dive into the paper.
The name MATLAB stands for MATrix LABoratory.MATLAB is a high-performance language for technical computing.
It integrates computation, visualization, and programming environment. Furthermore, MATLAB is a modern programming language environment: it has sophisticated data structures, contains built-in editing and debugging tools, and supports object-oriented programming.
These factor make MATLAB an excellent tool for teaching and research.
https://telecombcn-dl.github.io/2017-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both an algorithmic and computational perspectives.
This tutor shows the train and test set split with histogram and a probability density function in scikit-learn on synthetic datasets. The dataset is very simple as a reference of understanding.
Illustration Clamor Echelon Evaluation via Prime Piece PsychotherapyIJMER
International Journal of Modern Engineering Research (IJMER) is Peer reviewed, online Journal. It serves as an international archival forum of scholarly research related to engineering and science education.
International Journal of Modern Engineering Research (IJMER) covers all the fields of engineering and science: Electrical Engineering, Mechanical Engineering, Civil Engineering, Chemical Engineering, Computer Engineering, Agricultural Engineering, Aerospace Engineering, Thermodynamics, Structural Engineering, Control Engineering, Robotics, Mechatronics, Fluid Mechanics, Nanotechnology, Simulators, Web-based Learning, Remote Laboratories, Engineering Design Methods, Education Research, Students' Satisfaction and Motivation, Global Projects, and Assessment…. And many more.
We consider the problem of finding anomalies in high-dimensional data using popular PCA based anomaly scores. The naive algorithms for computing these scores explicitly compute the PCA of the covariance matrix which uses space quadratic in the dimensionality of the data. We give the first streaming algorithms
that use space that is linear or sublinear in the dimension. We prove general results showing that any sketch of a matrix that satisfies a certain operator norm guarantee can be used to approximate these scores. We instantiate these results with powerful matrix sketching techniques such as Frequent Directions and random projections to derive efficient and practical algorithms for these problems, which we validate over real-world data sets. Our main technical contribution is to prove matrix perturbation
inequalities for operators arising in the computation of these measures.
-Proceedings: https://arxiv.org/abs/1804.03065
-Origin: https://arxiv.org/abs/1804.03065
SLIDING WINDOW SUM ALGORITHMS FOR DEEP NEURAL NETWORKSIJCI JOURNAL
Sliding window sums are widely used for string indexing, hashing and time series analysis. We have
developed a family of the generic vectorized sliding sum algorithms that provide speedup of O(P/w) for
window size w and number of processors P. For a sum with a commutative operator the speedup is
improved to O(P/log(w)). Even more important, our algorithms exhibit efficient memory access patterns. In
this paper we study the application of sliding sum algorithms to the training and inference of Deep Neural
Networks. We demonstrate how both pooling and convolution primitives could be expressed as sliding
sums and evaluated by the compute kernels with a shared structure. We show that the sliding sum
convolution kernels are more efficient than the commonly used GEMM kernels on CPUs and could even
outperform their GPU counterparts.
Anomaly detection using deep one class classifier홍배 김
- Anomaly detection의 다양한 방법을 소개하고
- Support Vector Data Description (SVDD)를 이용하여
cluster의 모델링을 쉽게 하도록 cluster의 형상을 단순화하고
boundary근방의 애매한 point를 처리하는 방법 소개
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
2. Generative Models
They take in data as input and learn to generate new data points from the same
data distribution.
They learn the hidden representations using unsupervised learning techniques
3. Variational Autoencoder
As the name suggests, it is an auto-encoder, which learns attributes from the data
points and represents them in terms of latent variables.
4. Problems
How can we make use of the auto-encoder architecture to generate new data points ?
Assuming we can pass a vector from that learnt latent space to the decoder, how can
guarantee it’s not going to result in a garbage output.
VAEs address the above problems
5. 1. How do we use the Auto Encoder Architecture for generation ?
So, if we train the Auto encoder network and somehow learn the data distribution of the latent
space, we can then sample from that latent space and pass it down to decoder and generate
data points
But, there is a problem.
6.
7. 2. How can we make sure that if we sample from our latent space, we are going to get new
and meaningful output ?
VAEs achieve this by constricting the latent space.
The encoder and decoder parameters are tuned to accommodate for this setup
9. Calculating Marginal Probability
If X = (x1, x2, x3) and Z = (z1, z2)
then,
𝑃 𝑍 𝑋 =
𝑃(𝑋|𝑍)∗𝑃(𝑍)
𝑃(𝑋)
Here, the P(X) is very difficult to calculate, especially in higher dimensions.
It takes the form of 𝑧1
𝑧2
𝑃 𝑋1, 𝑋2, 𝑋3, 𝑍1, 𝑍2 𝑑𝑧1 ∗ 𝑑𝑧2 and is intractable.
There are ways to solving this by using,
1. Using Monte Carlo Integration techniques
2. Variational Inference
11. As x is already given, log p(x) is a constant.
And KL(q(z) || p(z|x)) is what we wanted to minimize and it is always
>= 0
0 <= p(x) <= 1 and KL >= 0
So, it is equivalent to maximizing the 3rd term.
It is called Variational Lower bound
12. This is nothing but,
Expectation of p(x|z) w.r.t q(z|x) +
KL(q(z) || p(z|x))
So, Maximizing lower bound means,
Minimizing KL (as >=0), and
For a given q and z, maximize the the likelihood of observing the x
20. References
Lecture by Ali Ghodsi https://www.youtube.com/watch?v=uaaqyVS9-rM
Lecture by Pascal Poupart https://www.youtube.com/watch?v=DWVlEw0D3gA