GENERATIVE ARTIFICIAL
INTELLIGENCE
NAME:- SHUBHAM SINGH
U.ROLL:- 21EUCEC056
C.ROLL:- 21/529
BRANCH:- ELECTRONICS AND COMMUNICATION ENGINEERING
SEMESTER:- VIth
OVERVIEW
1. INTRODUCTION
2. BASIC CONCEPTS
3. GENERATIVE AI MODEL CATEGORIES
4. NEURAL NETWORK ARCHITECTURES FOR GENERATIVE AI
INTRODUCTION
 Generative AI refers to a category of artificial intelligence techniques and models that
are designed to generate new content, such as images, text, audio, or even video, that is
similar to the data it was trained on.
 These models learn the underlying patterns and structures of the data they are exposed
to and then use that understanding to create new examples that resemble the original
data.
GENERATIVE AI AND LARGE LANGUAGE MODELS
 The rapid pace of AI development and the public release of tools such as ChatGPT, GitHub Copilot, and DALL-E have attracted widespread attention and optimism.
 These technologies are all examples of “generative AI,” a class of
machine learning technologies that can generate new content—such
as text, images, music, or video—by analyzing patterns in existing
data.
BASIC CONCEPTS
 Supervised Learning:
Supervised learning is a type of machine learning where the algorithm is trained on a labeled dataset,
meaning that each training example consists of input-output pairs. The algorithm learns to map inputs to
corresponding outputs.
 Unsupervised Learning:
Unsupervised learning involves training a model on an unlabeled dataset, where the algorithm explores
the inherent structure and patterns within the data without explicit guidance in the form of output labels.
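To make the contrast concrete, here is a minimal sketch (assuming scikit-learn and NumPy are installed; the data are toy values, not from the slides) that fits a supervised classifier on labelled input-output pairs and an unsupervised clustering model on the same inputs without labels.

# Minimal sketch contrasting supervised and unsupervised learning.
# Assumes scikit-learn and NumPy are available; the data are toy values.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = np.array([[0.1], [0.2], [0.9], [1.0]])   # inputs
y = np.array([0, 0, 1, 1])                   # labels (used only in the supervised case)

# Supervised: learn a mapping from inputs to the given labels.
clf = LogisticRegression().fit(X, y)
print(clf.predict([[0.15], [0.95]]))         # -> [0 1]

# Unsupervised: discover structure (here, two clusters) without any labels.
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_)                            # cluster assignments; the label names are arbitrary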
Generative Models:
Generative models are a category of machine learning models designed to generate new, realistic data
samples. These models learn the underlying distribution of the data, allowing them to create novel
examples that resemble the training data.
Discriminative Models:
Discriminative models focus on learning the boundary between different classes or categories within the
input data. These models aim to discriminate between different classes rather than generating new data.
Objective: Primarily used for classification tasks, where the goal is to predict the class labels of input data.
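As a rough, hedged illustration of the difference (an added sketch, not part of the original slides): a generative classifier such as Gaussian Naive Bayes models how each class produces data, roughly p(x | y), and therefore exposes per-class distributions that can be sampled, while a discriminative model such as logistic regression only models the decision boundary, p(y | x). Assumes scikit-learn; the data are synthetic.

# Generative vs. discriminative classifiers on the same toy data (scikit-learn assumed).
import numpy as np
from sklearn.naive_bayes import GaussianNB            # generative: models p(x | y) per class
from sklearn.linear_model import LogisticRegression   # discriminative: models p(y | x) directly

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

gen = GaussianNB().fit(X, y)
disc = LogisticRegression().fit(X, y)

# Both can classify a new point...
print(gen.predict([[2.5, 2.5]]), disc.predict([[2.5, 2.5]]))

# ...but only the generative model gives per-class distributions we can sample from.
# (theta_ holds per-class means; var_ holds per-class variances in recent scikit-learn versions.)
mean_1, var_1 = gen.theta_[1], gen.var_[1]
new_points = rng.normal(mean_1, np.sqrt(var_1), size=(3, 2))  # "generate" class-1-like points
print(new_points)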
Probability Distributions:
Probability distributions describe the likelihood of different outcomes in a given set of
events or data. In the context of machine learning, they represent the likelihood of
different values for variables.
Role: Generative models often involve estimating or sampling from probability
distributions to generate realistic data samples.
Generative Processes:
Generative processes describe the step-by-step procedures through which new data is
generated by a model, typically guided by a learned probability distribution.
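A tiny NumPy-only illustration of this idea (an added sketch, not from the slides): estimate a distribution's parameters from observed data, then run a generative process that samples new values from the estimated distribution.

# Estimate a distribution from data, then sample new "generated" values from it.
import numpy as np

rng = np.random.default_rng(42)
observed = rng.normal(loc=5.0, scale=2.0, size=1000)   # stand-in for training data

# Step 1 (learning): estimate the distribution's parameters from the data.
mu_hat, sigma_hat = observed.mean(), observed.std()

# Step 2 (generative process): sample novel values from the learned distribution.
generated = rng.normal(loc=mu_hat, scale=sigma_hat, size=5)
print(mu_hat, sigma_hat)
print(generated)   # new samples that resemble, but do not copy, the observed data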
GENERATIVE AI MODEL CATEGORIES
TEXT-TO-IMAGE MODELS
1. DALL·E 2:
 DALL·E 2, created by OpenAI, generates original and realistic images and art from a prompt consisting of a text description. The model can be accessed through the OpenAI API.
 It uses the CLIP neural network. CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on a large variety of (image, text) pairs; a minimal usage sketch follows below.
 For example: an image generated from the prompt "A shiba inu wearing a beret and black turtleneck".
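As a hedged aside, the CLIP component mentioned above can be tried directly with the public openai/clip-vit-base-patch32 checkpoint via the Hugging Face transformers library; this sketch scores candidate captions against an image and is not the DALL·E 2 pipeline itself. The image filename is a placeholder.

# Hedged sketch: scoring text prompts against an image with a public CLIP checkpoint.
# Assumes `transformers`, `torch`, and `Pillow` are installed; the image path is a placeholder.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("shiba_inu.jpg")  # placeholder image file
prompts = ["a shiba inu wearing a beret and black turtleneck", "a bowl of soup"]

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)  # similarity of the image to each prompt
print(probs)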
2. IMAGEN:
 Imagen is a text-to-image diffusion model built on large transformer language models.
 The model was created by Google, and the API can be found on their web page.
 The main finding from this model is that large language models, pre-trained on text-only corpora, are very effective at encoding text for image synthesis.
3. Muse:
 This model is a text-to-image transformer model that achieves state-of-the-art image generation while being more efficient than diffusion or autoregressive models.
 It is trained on a masked modelling task in discrete token space; a small illustration of this objective follows below.
 It is more efficient because it uses discrete tokens and requires fewer sampling iterations.
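A small, hedged illustration of the masked-modelling objective on discrete tokens (the token values are placeholders; Muse's actual tokenizer and transformer are not shown).

# Hedged sketch of masked modelling in discrete token space (the training idea only).
import random

random.seed(0)
image_tokens = [17, 42, 8, 99, 3, 56, 21, 7]   # placeholder discrete tokens for one image
MASK = "<mask>"

# Randomly hide a fraction of the tokens...
masked_input = [MASK if random.random() < 0.5 else t for t in image_tokens]

# ...and the model's training target is to recover the hidden tokens from the visible ones.
targets = {i: t for i, (t, m) in enumerate(zip(image_tokens, masked_input)) if m == MASK}
print(masked_input)
print(targets)   # positions the transformer must predict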
TEXT-TO-3D MODELS
In some industries, such as gaming, it is necessary to generate 3D assets rather than only 2D images.
DreamFusion:
 DreamFusion is a text-to-3D model developed by Google Research that uses a pretrained 2D text-to-image diffusion model to perform text-to-3D synthesis.
 DreamFusion replaces earlier CLIP-based techniques with a loss derived from distillation of a 2D diffusion model.
 Sampling in parameter space is much harder than sampling in pixel space, because the goal is a 3D model that looks like a good image when rendered from random angles. To solve this, the model uses a differentiable generator.
 An image created by DreamFusion from one particular angle can be shown together with the variations that can be generated from additional text prompts.
Magic3D:
 This model is a text-to-3D model made by NVIDIA Corporation. While DreamFusion achieves remarkable results, the method has two main problems: long processing times and low quality of the generated images.
 Magic3D addresses these problems with a two-stage optimization framework. First, it builds a coarse model using a low-resolution diffusion prior, which it accelerates with a sparse 3D hash grid structure.
 A textured 3D mesh model is then further optimized with an efficient differentiable renderer. In human evaluation the model achieves better results, with 61.7% of raters preferring it to DreamFusion.
IMAGE-TO-TEXT MODELS
Flamingo:
 A Visual Language Model created by DeepMind that performs few-shot learning on a wide range of open-ended vision and language tasks, simply by being prompted with a few input/output examples.
 Flamingo is a visually conditioned autoregressive text generation model able to ingest a sequence of text tokens interleaved with images and/or videos and produce text as output.
 A query is made to the model along with a photo or a video, and the model answers with text.
VisualGPT:
 VisualGPT is an image captioning model that leverages knowledge from OpenAI's pretrained language model GPT-2.
 The biggest advantage of this model is that it does not need as much data as other image-to-text models; a hedged captioning sketch with a substitute model follows below.
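VisualGPT itself is not exposed as a standard pipeline, but the same image-to-text idea can be sketched, under assumptions, with a public captioning checkpoint that also decodes with GPT-2 (assumes the transformers library and the nlpconnect/vit-gpt2-image-captioning model; the image path is a placeholder).

# Hedged sketch of image captioning with a public ViT+GPT-2 checkpoint (not VisualGPT itself).
# Assumes `transformers`, `torch`, and `Pillow`; the image path is a placeholder.
from transformers import pipeline

captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")
result = captioner("photo.jpg")          # placeholder image file
print(result[0]["generated_text"])       # a short natural-language caption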
TEXT-TO-VIDEO MODELS
Phenaki:
 This model was made by Google Research, and it is capable of realistic video synthesis given a sequence of textual prompts.
 Phenaki is the first model that can generate videos from open-domain, time-variable prompts.
 To address data issues, it performs joint training on a large image-text pair dataset together with a smaller number of video-text examples, which yields generalization beyond what is available in video datasets alone.
 Its limitations come from the computational cost of generating videos of variable length. The model has three parts: the C-ViViT encoder, the training transformer, and the video generator.
Soundify:
 Soundify is a system developed by Runway that matches sound effects to video. The system uses high-quality sound-effect libraries and CLIP (the zero-shot image classification network cited before).
 The system has three parts: classification, synchronization, and mix. Classification matches effects to a video by classifying the sound emitters within it; a rough sketch of this step appears below.
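The classification step can be sketched, under assumptions, as zero-shot labelling of sampled video frames with CLIP, where each candidate label is a sound-effect category; this is an illustration only, not Runway's actual implementation.

# Hedged sketch of Soundify's classification idea: classify sampled video frames with
# zero-shot CLIP and vote for a sound-effect category.
# Assumes `transformers`, `torch`, and `Pillow`; the frame files are placeholders.
from collections import Counter
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
labels = ["rain", "car engine", "footsteps", "birdsong"]  # candidate sound-effect categories

votes = Counter()
for path in ["frame_0001.jpg", "frame_0030.jpg", "frame_0060.jpg"]:  # sampled frames
    inputs = processor(text=labels, images=Image.open(path), return_tensors="pt", padding=True)
    probs = model(**inputs).logits_per_image.softmax(dim=-1)
    votes[labels[int(probs.argmax())]] += 1

print("suggested sound effect:", votes.most_common(1)[0][0])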
TEXT-TO-CODE MODELS
Codex:
 Codex is an AI system created by OpenAI that translates natural-language text to code. It is a general-purpose programming model, as it can be applied to essentially any programming task.
 Programming can be broken down into two parts: decomposing a problem into simpler sub-problems, and mapping those sub-problems onto existing code (libraries, APIs, or functions). A hedged API sketch follows below.
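Codex was originally served through an OpenAI completions endpoint that has since been retired; as a hedged sketch, the same text-to-code workflow looks roughly like this with the current OpenAI Python SDK (the model name is an illustrative placeholder, not Codex, and an OPENAI_API_KEY is assumed to be set in the environment).

# Hedged sketch of calling a hosted text-to-code model through the OpenAI Python SDK.
# Assumes the `openai` package (v1+) and an OPENAI_API_KEY in the environment;
# the model name is a placeholder, since Codex's original endpoint is retired.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name, not Codex
    messages=[{"role": "user",
               "content": "Write a Python function that checks if a string is a palindrome."}],
)
print(response.choices[0].message.content)  # the generated code, returned as text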
NEURAL NETWORK ARCHITECTURES FOR GENERATIVE AI
Autoencoder: Imagine you have a magic trick where you give someone a picture, they
squish it down into a small piece of paper, and then someone else can stretch that paper
back into the original picture. That's kind of how autoencoders work. They compress
data into a smaller representation (encoding) and then try to reconstruct the original
data from that compressed form. Autoencoders are used for tasks like image denoising
or dimensionality reduction.
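A minimal autoencoder sketch in PyTorch (the framework is an assumption of this example, not something the slides specify): an encoder compresses 784-dimensional inputs to a 32-dimensional code, a decoder reconstructs them, and training minimises the reconstruction (mean squared error) loss.

# Minimal autoencoder sketch in PyTorch (assumed framework): compress, then reconstruct.
import torch
from torch import nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, code_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU(),
                                     nn.Linear(128, code_dim))           # "squish down"
        self.decoder = nn.Sequential(nn.Linear(code_dim, 128), nn.ReLU(),
                                     nn.Linear(128, input_dim))          # "stretch back"

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.rand(64, 784)                 # stand-in for a batch of flattened images
for _ in range(5):                      # a few toy training steps
    reconstruction = model(x)
    loss = loss_fn(reconstruction, x)   # how far the reconstruction is from the original
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
print(loss.item())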
Generative Adversarial Network (GAN): Picture two artists, one trying to forge paintings
and the other trying to spot the fakes. The forger keeps getting better until the spotter
can't tell the difference between the real and fake paintings. That's the idea behind
GANs. They consist of two neural networks: a generator that creates new data samples,
like images, and a discriminator that tries to differentiate between real and fake
samples. Through this back-and-forth process, both networks get better at their
respective tasks, ultimately resulting in the generator creating very realistic-looking
outputs.
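A minimal GAN training-loop sketch in PyTorch (again an assumed framework, on 1-D toy data): the discriminator learns to tell real samples from the generator's fakes, and the generator learns to fool it.

# Minimal GAN sketch in PyTorch (assumed framework) on 1-D toy data.
import torch
from torch import nn

generator = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))       # the "forger"
discriminator = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))   # the "spotter"
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(200):
    real = torch.randn(64, 1) * 0.5 + 3.0          # "real paintings": samples near 3.0
    noise = torch.randn(64, 8)
    fake = generator(noise)

    # Train the discriminator: label real samples 1 and fakes 0.
    d_loss = loss_fn(discriminator(real), torch.ones(64, 1)) + \
             loss_fn(discriminator(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Train the generator: try to make the discriminator call the fakes "real".
    g_loss = loss_fn(discriminator(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

print(generator(torch.randn(5, 8)).detach().squeeze())  # generated samples, ideally near 3.0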
Thank You
