GENERATIVE ARTIFICIAL
INTELLIGENCE
NAME:- SHUBHAM SINGH
U.ROLL:- 21EUCEC056
C.ROLL:- 21/529
BRANCH:- ELECTRONICS AND COMMUNICATION ENGINEERING
SEMESTER:- VIth
OVERVIEW
1. INTRODUCTION
2. BASIC CONCEPTS
3. GENERATIVE AI MODEL CATEGORIES
4. NEURAL NETWORK ARCHITECTURES FOR GENERATIVE AI
INTRODUCTION
 Generative AI refers to a category of artificial intelligence techniques and models that
are designed to generate new content, such as images, text, audio, or even video, that is
similar to the data it was trained on.
 These models learn the underlying patterns and structures of the data they are exposed
to and then use that understanding to create new examples that resemble the original
data.
GENERATIVE AI AND LARGE LANGUAGE MODELS
 The rapid pace of AI development and the public release of tools such as ChatGPT, GitHub Copilot, and DALL-E have attracted widespread attention and optimism.
 These technologies are all examples of “generative AI,” a class of
machine learning technologies that can generate new content—such
as text, images, music, or video—by analyzing patterns in existing
data.
BASIC CONCEPTS
 Supervised Learning:
Supervised learning is a type of machine learning where the algorithm is trained on a labeled dataset,
meaning that each training example consists of input-output pairs. The algorithm learns to map inputs to
corresponding outputs.
 Unsupervised Learning:
Unsupervised learning involves training a model on an unlabeled dataset, where the algorithm explores
the inherent structure and patterns within the data without explicit guidance in the form of output labels.
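To make the contrast concrete, here is a minimal sketch (assuming scikit-learn and NumPy are installed; the data are toy values, not from the slides) that fits a supervised classifier on labelled input-output pairs and an unsupervised clustering model on the same inputs without labels.

# Minimal sketch contrasting supervised and unsupervised learning.
# Assumes scikit-learn and NumPy are available; the data are toy values.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = np.array([[0.1], [0.2], [0.9], [1.0]])   # inputs
y = np.array([0, 0, 1, 1])                   # labels (used only in the supervised case)

# Supervised: learn a mapping from inputs to the given labels.
clf = LogisticRegression().fit(X, y)
print(clf.predict([[0.15], [0.95]]))         # -> [0 1]

# Unsupervised: discover structure (here, two clusters) without any labels.
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_)                            # cluster assignments; the label names are arbitrary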
Generative Models:
Generative models are a category of machine learning models designed to generate new, realistic data
samples. These models learn the underlying distribution of the data, allowing them to create novel
examples that resemble the training data.
Discriminative Models:
Discriminative models focus on learning the boundary between different classes or categories within the
input data. These models aim to discriminate between different classes rather than generating new data.
Objective: Primarily used for classification tasks, where the goal is to predict the class labels of input data.
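As a rough, hedged illustration of the difference (an added sketch, not part of the original slides): a generative classifier such as Gaussian Naive Bayes models how each class produces data, roughly p(x | y), and therefore exposes per-class distributions that can be sampled, while a discriminative model such as logistic regression only models the decision boundary, p(y | x). Assumes scikit-learn; the data are synthetic.

# Generative vs. discriminative classifiers on the same toy data (scikit-learn assumed).
import numpy as np
from sklearn.naive_bayes import GaussianNB            # generative: models p(x | y) per class
from sklearn.linear_model import LogisticRegression   # discriminative: models p(y | x) directly

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

gen = GaussianNB().fit(X, y)
disc = LogisticRegression().fit(X, y)

# Both can classify a new point...
print(gen.predict([[2.5, 2.5]]), disc.predict([[2.5, 2.5]]))

# ...but only the generative model gives per-class distributions we can sample from.
# (theta_ holds per-class means; var_ holds per-class variances in recent scikit-learn versions.)
mean_1, var_1 = gen.theta_[1], gen.var_[1]
new_points = rng.normal(mean_1, np.sqrt(var_1), size=(3, 2))  # "generate" class-1-like points
print(new_points)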
Probability Distributions:
Probability distributions describe the likelihood of different outcomes in a given set of
events or data. In the context of machine learning, they represent the likelihood of
different values for variables.
Role: Generative models often involve estimating or sampling from probability
distributions to generate realistic data samples.
Generative Processes:
Generative processes describe the step-by-step procedures through which new data is
generated by a model, typically guided by a learned probability distribution.
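A tiny NumPy-only illustration of this idea (an added sketch, not from the slides): estimate a distribution's parameters from observed data, then run a generative process that samples new values from the estimated distribution.

# Estimate a distribution from data, then sample new "generated" values from it.
import numpy as np

rng = np.random.default_rng(42)
observed = rng.normal(loc=5.0, scale=2.0, size=1000)   # stand-in for training data

# Step 1 (learning): estimate the distribution's parameters from the data.
mu_hat, sigma_hat = observed.mean(), observed.std()

# Step 2 (generative process): sample novel values from the learned distribution.
generated = rng.normal(loc=mu_hat, scale=sigma_hat, size=5)
print(mu_hat, sigma_hat)
print(generated)   # new samples that resemble, but do not copy, the observed data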
GENERATIVE AI MODEL CATEGORIES
TEXT-TO-IMAGE MODELS
1. DALL·E 2:
 DALL·E 2, created by OpenAI, generates original and realistic images and art from a prompt consisting of a text description. The model can be accessed through the OpenAI API.
 It uses the CLIP neural network. CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on a large variety of (image, text) pairs; a minimal usage sketch follows below.
 For example: an image generated from the prompt "A shiba inu wearing a beret and black turtleneck".
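As a hedged aside, the CLIP component mentioned above can be tried directly with the public openai/clip-vit-base-patch32 checkpoint via the Hugging Face transformers library; this sketch scores candidate captions against an image and is not the DALL·E 2 pipeline itself. The image filename is a placeholder.

# Hedged sketch: scoring text prompts against an image with a public CLIP checkpoint.
# Assumes `transformers`, `torch`, and `Pillow` are installed; the image path is a placeholder.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("shiba_inu.jpg")  # placeholder image file
prompts = ["a shiba inu wearing a beret and black turtleneck", "a bowl of soup"]

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)  # similarity of the image to each prompt
print(probs)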
2. IMAGEN:
 Imagen is a text-to-image diffusion model built on large transformer language models.
 The model was created by Google, and the API can be found on their web page.
 The main finding from this model is that large language models, pre-trained on text-only corpora, are very effective at encoding text for image synthesis.
3. Muse:
 This model is a text-to-image transformer model that achieves state-of-the-art image generation while being more efficient than diffusion or autoregressive models.
 It is trained on a masked modelling task in discrete token space; a small illustration of this objective follows below.
 It is more efficient because it uses discrete tokens and requires fewer sampling iterations.
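A small, hedged illustration of the masked-modelling objective on discrete tokens (the token values are placeholders; Muse's actual tokenizer and transformer are not shown).

# Hedged sketch of masked modelling in discrete token space (the training idea only).
import random

random.seed(0)
image_tokens = [17, 42, 8, 99, 3, 56, 21, 7]   # placeholder discrete tokens for one image
MASK = "<mask>"

# Randomly hide a fraction of the tokens...
masked_input = [MASK if random.random() < 0.5 else t for t in image_tokens]

# ...and the model's training target is to recover the hidden tokens from the visible ones.
targets = {i: t for i, (t, m) in enumerate(zip(image_tokens, masked_input)) if m == MASK}
print(masked_input)
print(targets)   # positions the transformer must predict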
TEXT-TO-3D MODELS
In some industries, such as gaming, it is necessary to generate 3D assets rather than only 2D images.
DreamFusion:
 DreamFusion is a text-to-3D model developed by Google Research that uses a pretrained 2D text-to-image diffusion model to perform text-to-3D synthesis.
 DreamFusion replaces earlier CLIP-based techniques with a loss derived from distillation of a 2D diffusion model.
 Sampling in parameter space is much harder than sampling in pixel space, because the goal is a 3D model that looks like a good image when rendered from random angles. To solve this, the model uses a differentiable generator.
 An image created by DreamFusion from one particular angle can be shown together with the variations that can be generated from additional text prompts.
Magic3D:
 This model is a text-to-3D model made by NVIDIA Corporation. While DreamFusion achieves remarkable results, the method has two main problems: long processing times and low quality of the generated images.
 Magic3D addresses these problems with a two-stage optimization framework. First, it builds a coarse model using a low-resolution diffusion prior, which it accelerates with a sparse 3D hash grid structure.
 A textured 3D mesh model is then further optimized with an efficient differentiable renderer. In human evaluation the model achieves better results, with 61.7% of raters preferring it to DreamFusion.
IMAGE-TO-TEXT MODELS
Flamingo:
 A Visual Language Model created by DeepMind that performs few-shot learning on a wide range of open-ended vision and language tasks, simply by being prompted with a few input/output examples.
 Flamingo is a visually conditioned autoregressive text generation model able to ingest a sequence of text tokens interleaved with images and/or videos and produce text as output.
 A query is made to the model along with a photo or a video, and the model answers with text.
VisualGPT:
 VisualGPT is an image captioning model that leverages knowledge from OpenAI's pretrained language model GPT-2.
 The biggest advantage of this model is that it does not need as much data as other image-to-text models; a hedged captioning sketch with a substitute model follows below.
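VisualGPT itself is not exposed as a standard pipeline, but the same image-to-text idea can be sketched, under assumptions, with a public captioning checkpoint that also decodes with GPT-2 (assumes the transformers library and the nlpconnect/vit-gpt2-image-captioning model; the image path is a placeholder).

# Hedged sketch of image captioning with a public ViT+GPT-2 checkpoint (not VisualGPT itself).
# Assumes `transformers`, `torch`, and `Pillow`; the image path is a placeholder.
from transformers import pipeline

captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")
result = captioner("photo.jpg")          # placeholder image file
print(result[0]["generated_text"])       # a short natural-language caption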
TEXT-TO-VIDEO MODELS
Phenaki:
 This model was made by Google Research, and it is capable of realistic video synthesis given a sequence of textual prompts.
 Phenaki is the first model that can generate videos from open-domain, time-variable prompts.
 To address data issues, it performs joint training on a large image-text pair dataset together with a smaller number of video-text examples, which yields generalization beyond what is available in video datasets alone.
 Its limitations come from the computational cost of generating videos of variable length. The model has three parts: the C-ViViT encoder, the training transformer, and the video generator.
Soundify:
 Soundify is a system developed by Runway that matches sound effects to video. The system uses high-quality sound-effect libraries and CLIP (the zero-shot image classification network cited before).
 The system has three parts: classification, synchronization, and mix. Classification matches effects to a video by classifying the sound emitters within it; a rough sketch of this step appears below.
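The classification step can be sketched, under assumptions, as zero-shot labelling of sampled video frames with CLIP, where each candidate label is a sound-effect category; this is an illustration only, not Runway's actual implementation.

# Hedged sketch of Soundify's classification idea: classify sampled video frames with
# zero-shot CLIP and vote for a sound-effect category.
# Assumes `transformers`, `torch`, and `Pillow`; the frame files are placeholders.
from collections import Counter
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
labels = ["rain", "car engine", "footsteps", "birdsong"]  # candidate sound-effect categories

votes = Counter()
for path in ["frame_0001.jpg", "frame_0030.jpg", "frame_0060.jpg"]:  # sampled frames
    inputs = processor(text=labels, images=Image.open(path), return_tensors="pt", padding=True)
    probs = model(**inputs).logits_per_image.softmax(dim=-1)
    votes[labels[int(probs.argmax())]] += 1

print("suggested sound effect:", votes.most_common(1)[0][0])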
TEXT-TO-CODE MODELS
Codex:
 Codex is an AI system created by OpenAI that translates natural-language text to code. It is a general-purpose programming model, as it can be applied to essentially any programming task.
 Programming can be broken down into two parts: decomposing a problem into simpler sub-problems, and mapping those sub-problems onto existing code (libraries, APIs, or functions). A hedged API sketch follows below.
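Codex was originally served through an OpenAI completions endpoint that has since been retired; as a hedged sketch, the same text-to-code workflow looks roughly like this with the current OpenAI Python SDK (the model name is an illustrative placeholder, not Codex, and an OPENAI_API_KEY is assumed to be set in the environment).

# Hedged sketch of calling a hosted text-to-code model through the OpenAI Python SDK.
# Assumes the `openai` package (v1+) and an OPENAI_API_KEY in the environment;
# the model name is a placeholder, since Codex's original endpoint is retired.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name, not Codex
    messages=[{"role": "user",
               "content": "Write a Python function that checks if a string is a palindrome."}],
)
print(response.choices[0].message.content)  # the generated code, returned as text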
NEURAL NETWORK ARCHITECTURES FOR GENERATIVE AI
Autoencoder: Imagine you have a magic trick where you give someone a picture, they
squish it down into a small piece of paper, and then someone else can stretch that paper
back into the original picture. That's kind of how autoencoders work. They compress
data into a smaller representation (encoding) and then try to reconstruct the original
data from that compressed form. Autoencoders are used for tasks like image denoising
or dimensionality reduction.
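A minimal autoencoder sketch in PyTorch (the framework is an assumption of this example, not something the slides specify): an encoder compresses 784-dimensional inputs to a 32-dimensional code, a decoder reconstructs them, and training minimises the reconstruction (mean squared error) loss.

# Minimal autoencoder sketch in PyTorch (assumed framework): compress, then reconstruct.
import torch
from torch import nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, code_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU(),
                                     nn.Linear(128, code_dim))           # "squish down"
        self.decoder = nn.Sequential(nn.Linear(code_dim, 128), nn.ReLU(),
                                     nn.Linear(128, input_dim))          # "stretch back"

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.rand(64, 784)                 # stand-in for a batch of flattened images
for _ in range(5):                      # a few toy training steps
    reconstruction = model(x)
    loss = loss_fn(reconstruction, x)   # how far the reconstruction is from the original
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
print(loss.item())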
Generative Adversarial Network (GAN): Picture two artists, one trying to forge paintings
and the other trying to spot the fakes. The forger keeps getting better until the spotter
can't tell the difference between the real and fake paintings. That's the idea behind
GANs. They consist of two neural networks: a generator that creates new data samples,
like images, and a discriminator that tries to differentiate between real and fake
samples. Through this back-and-forth process, both networks get better at their
respective tasks, ultimately resulting in the generator creating very realistic-looking
outputs.
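A minimal GAN training-loop sketch in PyTorch (again an assumed framework, on 1-D toy data): the discriminator learns to tell real samples from the generator's fakes, and the generator learns to fool it.

# Minimal GAN sketch in PyTorch (assumed framework) on 1-D toy data.
import torch
from torch import nn

generator = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))       # the "forger"
discriminator = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))   # the "spotter"
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(200):
    real = torch.randn(64, 1) * 0.5 + 3.0          # "real paintings": samples near 3.0
    noise = torch.randn(64, 8)
    fake = generator(noise)

    # Train the discriminator: label real samples 1 and fakes 0.
    d_loss = loss_fn(discriminator(real), torch.ones(64, 1)) + \
             loss_fn(discriminator(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Train the generator: try to make the discriminator call the fakes "real".
    g_loss = loss_fn(discriminator(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

print(generator(torch.randn(5, 8)).detach().squeeze())  # generated samples, ideally near 3.0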
Thank You
