3. INTRODUCTION TO
MIDJOURNEY
• Generative AI system for image creation
• Leverages diverse AI models: transformers, GANs, and
diffusion models
• Trained on massive datasets mined from the internet
• Created in 2021 by software engineer David Holz
• Example user prompts: "an oil painting of a clocktower in
Paris", "a scenic photograph of mountain ranges at
sunset"
• Accessible via Discord server and native apps for desktop
and mobile
Example: /imagine skyscrapers from clouds in
cinematic view
4. DATA MINING POWERS
MIDJOURNEY
• Steps: collection, cleaning, feature extraction,
evaluation
• Training data is scraped from across the internet
using web crawlers
• Custom computer vision models extract semantic
features from images
• Natural language processing analyses text
captions and transcripts
• Data processing pipeline ensures high-quality
cleaned dataset
• Additional datasets continually integrated to
expand knowledge domain
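The four pipeline stages above can be sketched as plain function composition. This is a minimal illustrative stub, not Midjourney's actual internals: the record fields, filter rules, and toy corpus are all invented for the example.

```python
# Sketch of the pipeline: collection -> cleaning -> feature extraction -> evaluation.
# Records and rules are hypothetical stand-ins for a real crawled dataset.

def collect():
    # In practice: web crawlers; here, a stub corpus of (url, caption) pairs.
    return [
        {"url": "https://example.com/a.jpg", "caption": "a clocktower in Paris"},
        {"url": "https://example.com/a.jpg", "caption": "a clocktower in Paris"},  # duplicate
        {"url": "https://example.com/b.jpg", "caption": ""},  # missing caption
        {"url": "https://example.com/c.jpg", "caption": "mountain ranges at sunset"},
    ]

def clean(records):
    # Drop records with empty captions and exact duplicates.
    seen, out = set(), []
    for r in records:
        key = (r["url"], r["caption"])
        if r["caption"] and key not in seen:
            seen.add(key)
            out.append(r)
    return out

def extract_features(records):
    # Placeholder "feature": lowercase caption tokens.
    for r in records:
        r["tokens"] = r["caption"].lower().split()
    return records

def evaluate(records):
    # Toy quality metric: how many records survived cleaning.
    return len(records)

dataset = extract_features(clean(collect()))
surviving = evaluate(dataset)
```

Of the four stub records, only the two unique captioned ones survive cleaning.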
5. SCALABLE DATA COLLECTION &
CLEANING
• Custom web crawlers for scraping image
sites
• Millions of images and text captions
extracted
• Cloud infrastructure for storage and
processing
• Filter out offensive, objectionable image
content
• Custom machine learning models identify
duplicate, near-duplicate images
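One common way to identify near-duplicate images, as the last bullet describes, is perceptual hashing. The sketch below uses a simple average hash (aHash) over tiny grayscale grids; real systems hash downscaled images with trained models, and the pixel grids here are hypothetical stand-ins for decoded images.

```python
# Near-duplicate detection via average hash: bit is 1 where a pixel is
# brighter than the image mean; small Hamming distance => near-duplicate.

def average_hash(pixels):
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if p > mean else 0 for p in flat)

def hamming(h1, h2):
    # Number of differing bits between two hashes.
    return sum(a != b for a, b in zip(h1, h2))

def near_duplicates(images, threshold=2):
    # Return index pairs whose hashes differ in at most `threshold` bits.
    hashes = [average_hash(img) for img in images]
    return [(i, j)
            for i in range(len(hashes))
            for j in range(i + 1, len(hashes))
            if hamming(hashes[i], hashes[j]) <= threshold]

imgs = [
    [[10, 200], [200, 10]],   # original
    [[12, 198], [201, 9]],    # near-duplicate (same bright/dark pattern)
    [[200, 10], [10, 200]],   # inverted pattern, not a duplicate
]
pairs = near_duplicates(imgs)
```

Only the first two grids share a hash, so they are flagged as a near-duplicate pair.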
7. FEATURE EXTRACTION
• Use pre-trained CNNs to identify visual
objects, scenes, and styles
• Apply word embedding models like
Word2Vec to text
• Build knowledge graphs connecting text
and visuals
• Detect concepts like "face", "car",
"mountain" via classification
• Extract style features like "oil paint" and
"pencil sketch"
• Use named entity recognition to identify
people, places, and objects
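The concept and style tagging above can be sketched as caption labeling. The label vocabularies below are invented examples standing in for trained CNN classifiers and NER models, not real tag sets.

```python
# Toy caption tagger: match caption tokens against small concept and
# style vocabularies (hypothetical stand-ins for learned classifiers).

CONCEPTS = {"face", "car", "mountain", "clocktower"}
STYLES = {"oil": "oil paint", "pencil": "pencil sketch", "photograph": "photo"}

def tag_caption(caption):
    tokens = set(caption.lower().replace(",", " ").split())
    concepts = sorted(CONCEPTS & tokens)
    styles = sorted(v for k, v in STYLES.items() if k in tokens)
    return {"concepts": concepts, "styles": styles}

tags = tag_caption("an oil painting of a clocktower in Paris")
```

For the example prompt from slide 3, this yields the concept "clocktower" and the style "oil paint".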
9. HOW IT WORKS
AI models and algorithms that Midjourney uses:
Transformers
Diffusion models
Generative adversarial networks (GANs)
• Transformer — input: text prompt; output: text description
• Diffusion model — inputs: noise + text description from transformer; output: image
• GANs — input: image from diffusion model; output: final output image
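The three-stage flow above can be illustrated as function composition. Each stage below is a deliberately tiny stub standing in for the real model (the encoding, denoising, and refinement rules are all invented for the sketch).

```python
# Stub pipeline: transformer encodes text, diffusion turns noise into an
# image guided by the encoding, and a GAN-style pass refines the result.
import random

def transformer_encode(prompt):
    # Stand-in for a text encoder: deterministic "embedding" of the prompt.
    return [ord(c) % 7 for c in prompt]

def diffusion_generate(embedding, steps=3, seed=0):
    # Stand-in for denoising: start from noise, nudge toward the embedding.
    rng = random.Random(seed)
    image = [rng.random() for _ in embedding]
    for _ in range(steps):
        image = [0.5 * (px + e) for px, e in zip(image, embedding)]
    return image

def gan_refine(image):
    # Stand-in for a refinement pass: clamp values into a fixed range.
    return [min(max(px, 0.0), 6.0) for px in image]

final = gan_refine(diffusion_generate(transformer_encode("sunset")))
```

The output has one value per prompt character, each pulled toward the embedding and clamped by the refinement stage.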
11. CONCLUSION
The combination of transformers, diffusion models, and GANs allows
Midjourney to generate high-quality images from text prompts.
The transformers understand the text prompts,
the diffusion models generate the initial images,
and the GANs refine the images to improve their quality.