Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding, by Vitaly Bondar
A presentation about a new Google Research paper in the text-to-image task - Imagen.
This diffusion-based model outperforms DALL-E 2 and other models and produces incredibly realistic images.
A presentation about the development of the ideas from the autoencoder to the Stable Diffusion text-to-image model.
Models covered: autoencoder, VAE, VQ-VAE, VQ-GAN, latent diffusion, and stable diffusion.
PR-409: Denoising Diffusion Probabilistic Models, by Hyeongmin Lee
This paper is Denoising Diffusion Probabilistic Models (DDPM), the work that first popularized the currently hot diffusion models. It elegantly resolved several practical issues of diffusion, originally proposed at ICML 2015, and kicked off the trend. We will look at the various branches of generative modeling, at diffusion, and at what changed in DDPM.
Paper link: https://arxiv.org/abs/2006.11239
Video link: https://youtu.be/1j0W_lu55nc
As proposed in the paper High-Resolution Image Synthesis with Latent Diffusion Models, latent diffusion models are a simple and efficient way to improve both the training and sampling efficiency of denoising diffusion models while retaining their quality.
A non-technical overview of Large Language Models, exploring their potential, limitations, and customization for specific challenges. While this deck is tailored for an audience from the financial industry, its content remains broadly applicable.
(Note: Discover a slightly updated version of this deck at slideshare.net/LoicMerckel/introduction-to-llms.)
https://telecombcn-dl.github.io/2017-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
Delve into this insightful article to explore the current state of generative AI, its ethical implications, and the power of generative AI models across various industries.
http://imatge-upc.github.io/telecombcn-2016-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had until then been addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
A talk on Transformers at GDG DevParty
27.06.2020
Link to Google Slides version: https://docs.google.com/presentation/d/1N7ayCRqgsFO7TqSjN4OWW-dMOQPT5DZcHXsZvw8-6FU/edit?usp=sharing
Generative AI models, such as GANs and VAEs, have the potential to create realistic and diverse synthetic data for various applications, from image and speech synthesis to drug discovery and language modeling. However, training these models can be challenging due to the instability and mode collapse issues that often arise. In this workshop, we will explore how Stable Diffusion, a recent approach that combines diffusion models with Langevin-dynamics-style sampling, can address these challenges and improve the performance and stability of generative models. We will use a pre-configured development environment for machine learning to run hands-on experiments and train Stable Diffusion models on different datasets. By the end of the session, attendees will have a better understanding of generative AI and Stable Diffusion, and of how to build and deploy stable generative models for real-world use cases.
Using synthetic data for computer vision model training, by Unity Technologies
During this webinar Unity’s computer vision team provides an overview of computer vision, walks through current real-world data workflows, and explains why companies are moving toward synthetically generated data as an alternate data source for model training.
Watch the webinar: https://resources.unity.com/ai-ml/cv-webinar-dec-2021
This is material prepared for a lab seminar about "Transformer", the basis of recent NLP x deep learning research. I have tried to be accurate with citations of reference materials; please point out any errors.
Slide for Arithmer Seminar given by Dr. Daisuke Sato (Arithmer) at Arithmer inc.
The topic is on "explainable AI".
"Arithmer Seminar" is held weekly, where professionals from within and outside our company give lectures on their respective expertise.
The slides were made by a lecturer from outside our company and are shared here with their permission.
Arithmer Inc. is a mathematics company that originated from the University of Tokyo Graduate School of Mathematical Sciences. We apply modern mathematics to bring advanced new AI systems to solutions in a wide range of fields; our job is to figure out how to use AI well to make work more efficient and to produce results that are useful to people.
Arithmer began at the University of Tokyo Graduate School of Mathematical Sciences. Today, our research in modern mathematics and AI systems is capable of providing solutions to tough, complex issues. At Arithmer we believe it is our job to realize the potential of AI by improving work efficiency and producing more useful results for society.
A look at the AI/ML landscape in 2023: a quick summary of the last 10 years of progress, the current situation, and what is happening behind the scenes.
PR-231: A Simple Framework for Contrastive Learning of Visual Representations, by Jinwon Lee
This is the 231st paper review of the TensorFlow Korea paper-reading group PR12.
The paper is A Simple Framework for Contrastive Learning of Visual Representations, from Google Brain. It has been drawing extra attention recently, partly because Geoffrey Hinton is its last author.
It is a self-supervised learning paper using contrastive learning, a very hot topic at the moment, and proposes an unsupervised pre-training method that matches the performance of a ResNet-50 trained with supervised learning. Using data augmentation, a non-linear projection head, large batch sizes, longer training, and the NT-Xent loss, it shows that excellent representation learning is possible, with outstanding results in semi-supervised learning and transfer learning as well. Please see the video for details.
Paper link: https://arxiv.org/abs/2002.05709
Video link: https://youtu.be/FWhM3juUM6s
The Future of AI is Generative, not Discriminative (5/26/2021), by Steve Omohundro
The deep learning AI revolution has been sweeping the world for a decade now. Deep neural nets are routinely used for tasks like translation, fraud detection, and image classification. PwC estimates that they will create $15.7 trillion/year of value by 2030. But most current networks are "discriminative" in that they directly map inputs to predictions. This type of model requires lots of training examples, doesn't generalize well outside of its training set, creates inscrutable representations, is subject to adversarial examples, and makes knowledge transfer difficult. People, in contrast, can learn from just a few examples, generalize far beyond their experience, and can easily transfer and reuse knowledge. In recent years, new kinds of "generative" AI models have begun to exhibit these desirable human characteristics. They represent the causal generative processes by which the data is created and can be compositional, compact, and directly interpretable. Generative AI systems that assist people can model their needs and desires and interact with empathy. Their adaptability to changing circumstances will likely be required by rapidly changing AI-driven business and social systems. Generative AI will be the engine of future AI innovation.
An introduction to computer vision with Hugging FaceJulien SIMON
In this code-level talk, Julien will show you how to quickly build and deploy computer vision applications based on Transformer models. Along the way, you'll learn about the portfolio of open source and commercial Hugging Face solutions, and how they can help you deliver high-quality solutions faster than ever before.
Multimodal foundation models are a revolutionary class of AI models that provide impressive abilities to generate multimedia content and do so by interactive prompts in a seemingly creative manner. These foundation models are often self-supervised transformer-based models pre-trained on large volumes of data, typically collected from the web. They already form the basis of all state-of-the-art systems in computer vision and natural language processing across a wide range of tasks and have shown impressive transfer learning abilities. Despite their immense potential, these foundation models face challenges in fundamental perception tasks such as spatial grounding and temporal reasoning, have difficulty to operate on low-resource scenarios, and neglect human-alignment for ethical, legal, and societal acceptance. In this talk I will highlight recent work from my lab that identifies several of these challenges as well as ways to update foundation models to address these challenges and to do so in a sustainable way, without the need to retrain from scratch.
BigDL webinar - Deep Learning Library for Spark, by DESMOND YUEN
BigDL is a distributed deep learning library for Apache Spark* and a unified big data platform driving analytics and data science.
[DSC Europe 23] Alexander Kovalchuk - Finetuning Stable Diffusion with low-rank adaptation, by DataScienceConferenc1
In this talk, I would like to tell the audience about the existing fine-tuning techniques for Stable Diffusion models, including hypernetworks and low-rank adaptation: their advantages and disadvantages, and how they can be applied to all sorts of models, not limited to image generation. I am sure that every listener would benefit from incorporating this approach into their downstream tasks.
From Hours to Minutes: The Journey of Optimizing Mask-RCNN and BERT Using MXNet, by Eric Haibin Lin
Training large deep learning models like Mask R-CNN and BERT takes lots of time and compute resources. Using MXNet, the Amazon Web Services deep learning framework team has been working with NVIDIA to optimize many different areas to cut the training time from hours to minutes.
BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ..., by Databricks
BigDL is a distributed deep learning framework for Apache Spark open sourced by Intel. BigDL helps make deep learning more accessible to the Big Data community, by allowing them to continue the use of familiar tools and infrastructure to build deep learning applications. With BigDL, users can write their deep learning applications as standard Spark programs, which can then directly run on top of existing Spark or Hadoop clusters.
In this session, we will introduce BigDL, how our customers use BigDL to build End to End ML/DL applications, platforms on which BigDL is deployed and also provide an update on the latest improvements in BigDL v0.1, and talk about further developments and new upcoming features of BigDL v0.2 release (e.g., support for TensorFlow models, 3D convolutions, etc.).
The presentation consists of the following:
- The pimpl idiom
- Introduction to the ABC software firm – Rahul's story
- The SOLID principles
- Outline of other principles: KISS, DRY, YAGNI
- The story of a paper boy: the Law of Demeter
Kubernetes is making good on the promise of changing the datacenter from being a group of computers to "a computer" itself. This presentation outlines the new features in the Kubernetes 1.1 and 1.2 releases.
This is a presentation introducing the concept of DeepLab, a machine learning and artificial intelligence model.
DeepLab is a series of state-of-the-art deep learning models developed for semantic image segmentation, which is the process of partitioning an image into segments where each segment corresponds to a specific object or region within the image. This detailed exploration will cover the evolution of DeepLab, its architecture, core techniques, applications, and its impact on the field of computer vision.
### Evolution of DeepLab
DeepLab has undergone multiple iterations, each improving upon the previous in terms of accuracy and efficiency. The major versions are:
1. **DeepLabv1** (2014)
2. **DeepLabv2** (2016)
3. **DeepLabv3** (2017)
4. **DeepLabv3+** (2018)
#### DeepLabv1
The first version of DeepLab introduced the idea of employing atrous (dilated) convolutions in convolutional neural networks (CNNs). Atrous convolutions allow for control over the resolution at which feature responses are computed within the network, effectively enabling the network to have a larger receptive field without increasing the number of parameters or the amount of computation required. This approach helps to capture more contextual information, which is crucial for accurately segmenting images.
**Key Features:**
- **Atrous Convolutions**: By inserting spaces (or holes) between the convolutional kernel elements, atrous convolutions enlarge the field of view of filters without increasing the number of parameters or computational cost.
- **Fully Convolutional Networks (FCNs)**: DeepLabv1 leverages FCNs to ensure that the input image's spatial dimensions are preserved, facilitating dense predictions needed for segmentation.
- **CRF (Conditional Random Fields)**: Post-processing with CRFs is used to refine the boundaries of the segmented regions, leveraging spatial consistency and smoothness.
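The receptive-field effect of atrous convolutions can be illustrated with a naive numpy sketch (function and variable names are mine, not from the DeepLab code): the same 3x3 weights cover a wider field of view as the rate grows, at no extra parameter cost.

```python
import numpy as np

def atrous_conv2d(x, kernel, rate=1):
    """Naive 'same'-padded 2D convolution with dilation (atrous) rate.

    Inserting rate-1 zeros between kernel taps enlarges the effective
    kernel from k to k + (k-1)*(rate-1) without adding parameters.
    """
    k = kernel.shape[0]
    eff = k + (k - 1) * (rate - 1)          # effective kernel size
    pad = eff // 2
    xp = np.pad(x, pad)
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            patch = xp[i:i + eff:rate, j:j + eff:rate]
            out[i, j] = (patch * kernel).sum()
    return out

x = np.arange(25, dtype=float).reshape(5, 5)
k = np.ones((3, 3)) / 9.0
y1 = atrous_conv2d(x, k, rate=1)   # plain 3x3 convolution
y2 = atrous_conv2d(x, k, rate=2)   # same 9 weights, 5x5 field of view
```

A real implementation would of course use a framework's built-in dilation support rather than explicit loops; the sketch only makes the sampling pattern visible.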
#### DeepLabv2
DeepLabv2 builds on the success of the first version by introducing the Atrous Spatial Pyramid Pooling (ASPP) module. This module helps capture multi-scale contextual information by applying atrous convolutions with different rates, which essentially probes the input image with filters of multiple effective fields of view.
**Key Features:**
- **ASPP Module**: It combines several parallel atrous convolution layers with different rates, capturing information at multiple scales.
- **Improved CRF**: The CRF used in DeepLabv2 is more deeply integrated and fine-tuned to enhance the segmentation performance.
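A minimal 1D sketch of the ASPP idea, assuming only that parallel dilated branches over the same input are stacked (a real ASPP works on 2D feature maps and mixes the branches with a 1x1 convolution afterwards); the names are mine:

```python
import numpy as np

def dilated_conv1d(x, w, rate):
    """'Same'-padded 1D convolution with dilation rate."""
    k = len(w)
    eff = k + (k - 1) * (rate - 1)
    pad = eff // 2
    xp = np.pad(x, pad)
    return np.array([(xp[i:i + eff:rate] * w).sum() for i in range(len(x))])

def aspp(x, weights, rates):
    """Parallel dilated branches with different rates probe the input at
    multiple effective fields of view; stacking them captures multi-scale
    context, the core of Atrous Spatial Pyramid Pooling."""
    return np.stack([dilated_conv1d(x, w, r) for w, r in zip(weights, rates)])

x = np.arange(8, dtype=float)
w = np.ones(3) / 3.0
feats = aspp(x, [w, w, w], rates=[1, 2, 4])   # 3 branches x 8 positions
```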
#### DeepLabv3
DeepLabv3 further improves the model by addressing some limitations of the previous versions. It refines the ASPP module and removes the need for CRF post-processing by integrating stronger and more effective feature extraction techniques within the network itself.
**Key Features:**
- **Enhanced ASPP**: This version of ASPP includes batch normalization and image-level features, which improve the overall robustness and accuracy.
Opendatabay - Open Data Marketplace, by Opendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
The first open hub for data enthusiasts to collaborate and innovate: a platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, Opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. It leverages cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23..., by John Andrews
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, AI, big data, real-time systems, robots, and Milvus.
A lively discussion with NJ Gen AI Meetup Lead Prasad and Procure.FYI's Co-Founder.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos with a distributed data-ownership model that makes responsibilities explicit.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
8. Latent diffusions
● VQGAN used for encoding/decoding
● Generation happens in a compact, semantically equivalent space
● The UNet in the diffusion model exploits its inductive bias and scales well
● Cross-attention or channel stacking used for conditioning
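The cross-attention conditioning in the last bullet can be sketched as single-head attention in which latent tokens query the conditioning tokens (e.g. text embeddings). This is an illustrative numpy sketch under my own naming, not the paper's implementation:

```python
import numpy as np

def cross_attention(z, c, Wq, Wk, Wv):
    """Minimal single-head cross-attention: latent tokens z attend to
    conditioning tokens c, as used to inject the prompt into the UNet
    of a latent diffusion model."""
    q = z @ Wq                        # (n_latent, d)
    k = c @ Wk                        # (n_cond, d)
    v = c @ Wv
    att = q @ k.T / np.sqrt(q.shape[-1])
    att = np.exp(att - att.max(axis=-1, keepdims=True))
    att = att / att.sum(axis=-1, keepdims=True)   # softmax over cond tokens
    return att @ v                    # (n_latent, d)

rng = np.random.default_rng(0)
z = rng.standard_normal((4, 8))       # 4 latent tokens
c = rng.standard_normal((1, 8))       # 1 conditioning (text) token
I = np.eye(8)
out = cross_attention(z, c, I, I, I)  # with a single token, out rows == that token
```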
9. Latent diffusions: training
2 training phases:
1. Autoencoder
a. Loss: Patch-based GAN loss + Perceptual loss
b. Regularization: KL-loss (close to VAE) OR quantization in Decoder (like VQ-GAN)
2. Various generative tasks
a. All trainings were done on a single A100
b. Loss: classical diffusion L2 restoration loss
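The "classical diffusion L2 restoration loss" of phase 2 can be written down directly. A small numpy sketch under standard DDPM notation (alpha_bar is the cumulative noise schedule at step t; names are mine):

```python
import numpy as np

def make_training_pair(x0, alpha_bar_t, eps):
    """Forward-noise a clean sample x0 to x_t; (x_t, eps) is one
    (input, target) pair for the denoiser."""
    x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps
    return x_t, eps

def l2_loss(eps_pred, eps):
    """Classical diffusion training loss: mean squared error between
    the model's noise prediction and the noise actually added."""
    return ((eps_pred - eps) ** 2).mean()

x0 = np.ones(4)
x_t, target = make_training_pair(x0, alpha_bar_t=0.25, eps=np.zeros(4))
```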
10. Latent diffusions: autoencoder
Downsampling by 4-16x: speeds up generative training without loss of sampling quality
KL-regularization gives better autoencoder metrics, but the quantization-in-decoder version shows better sample quality.
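The KL-regularization mentioned above (the "close to VAE" option) is the usual closed-form KL divergence between a diagonal Gaussian posterior and a standard normal; as a sketch:

```python
import numpy as np

def kl_to_standard_normal(mu, logvar):
    """KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior --
    the light VAE-style regularizer applied to the latent autoencoder."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)

kl0 = kl_to_standard_normal(np.zeros(3), np.zeros(3))   # posterior == prior
```

When the posterior exactly matches the prior (zero mean, unit variance) the penalty vanishes, which is why a small weight on this term only nudges the latent space toward a standard normal.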
20. DALL-E 2 (unCLIP)
2 stages (4 in reality ;) )
● Generate CLIP image embedding from text encoding/image encoding
● Decode image embedding to the image (decoder + 2 stages of diffusion SR)
21. DALL-E 2: Decoder
A modified GLIDE model (3.5B) converts the embedding
to the image, then diffusion upsampling 64->256, 256->1024
GLIDE input: CLIP embedding projections, timestep, 4 context tokens (?)
Training:
● Use ¼ of the image
● Set the CLIP embeddings to zero (or a learned embedding) 10% of the time
● Drop the text caption 50% of the time
● For upsampling models add noise to the inputs (1-gaussian, 2-BSR degradations)
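The embedding-dropout trick in the training list (replacing the CLIP embedding 10% of the time so the decoder also learns an unconditional mode, enabling classifier-free guidance) can be sketched as follows; the function name is mine:

```python
import numpy as np

def maybe_drop_conditioning(clip_emb, null_emb, rng, p_drop=0.1):
    """With probability p_drop, replace the CLIP image embedding by a
    null (zero or learned) embedding, so the decoder is also trained
    unconditionally -- the classifier-free-guidance training recipe."""
    if rng.random() < p_drop:
        return null_emb
    return clip_emb

rng = np.random.default_rng(0)
emb = np.ones(4)
null = np.zeros(4)
kept = maybe_drop_conditioning(emb, null, rng, p_drop=0.0)    # never drops
dropped = maybe_drop_conditioning(emb, null, rng, p_drop=1.0)  # always drops
```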
“Our decoder model provides a unique opportunity to explore CLIP latent space by
allowing us to directly visualize what the CLIP image encoder is seeing.”
23. DALL-E 2: Prior model
2 types:
● Autoregressive
○ GPT-like
○ 319 main PCA components from 1024 CLIP values (quantized to 1024 values)
○ Dot-product of text and image embeddings as input token (0.5 on inference)
● Diffusion model conditioned on input
○ Transformer-based
○ Causal mask for the predicted embedding
○ Prompt: encoded text, the CLIP text embedding, an embedding for the diffusion
timestep, the noised CLIP image embedding
○ Generate two samples and select the one with the higher dot product with z_t.
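The reranking step in the last bullet is simple enough to sketch; here z_text stands for the text embedding z_t, and the names are mine:

```python
import numpy as np

def pick_prior_sample(candidates, z_text):
    """unCLIP prior trick: draw candidate image embeddings and keep the
    one whose dot product with the text embedding is highest."""
    scores = [c @ z_text for c in candidates]
    return candidates[int(np.argmax(scores))]

best = pick_prior_sample(
    [np.array([0.0, 1.0]), np.array([2.0, 0.0])],  # two candidate embeddings
    np.array([1.0, 0.0]),                          # text embedding z_t
)
```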