The document discusses using Stable Diffusion, an open-source AI image generation tool, to augment artists' workflows. It describes testing Stable Diffusion's ability to generate characters, backgrounds, objects, and UI elements for an art project. The tests focused on integrating AI generation at different stages, from ideation to refinement. Pipelines are presented for each asset type that allow controlling outputs through techniques like prompt engineering, style guidance, and blending. The document concludes that the process takes less time than traditional methods and produces a wider variety of consistent assets, but notes limitations: weak adherence to unique art styles and a tendency for outputs to look similar.
3. CONFIDENTIAL | 3
Intro / Motivation and goals
Motivation
The motivation for the AI image generation research was to learn the technology and see whether it can be integrated into artists' workflows, and at which stage of design. The research focused on:
- Characters - humanoid and animal
- Backgrounds - landscapes and background elements
- Objects/items - such as game symbols and game icons
- UI elements - such as fonts, frames and other UI elements
Goals
The goals of the research were the following:
- See how the tech can be integrated into artists' workflow as an additional tool which would complement
and augment artists' style of work
- Assess output consistency in creating art for a given theme, in a given style
- Compare output speed to the traditional approach
Stable Diffusion and Open Source Ecosystem
Incredibly vibrant community of developers and testers
With great power comes great complexity, which needs to be tamed to get predictable results
Tools / Software and hardware used in research
Focus on open-source, customizable and low-cost software solutions
These are software solutions which have been tested:
- Automatic1111 - frontend for Stable Diffusion, with additional features:
- Control networks (ControlNet)
- ESRGAN upscalers
- ComfyUI - node-based UI for custom and templatized image creation
Initial hardware setup which was used for the research:
- AMD Ryzen 3700 CPU (i7 equivalent)
- Nvidia 3090 GPU, 24 GB VRAM
- 32 GB RAM
- M.2 1 TB SSD drive
- Linux OS, Ubuntu 20.04 LTS
Tools / Harnessing control
Common theme for all of the workflow tests
One of the goals for the process itself was to test the ability to control all parts of the process
Uncontrolled: random output, raw ideas, rough quality, undefined subjects, unusable results, uncontrollable process
Controlled: focused design, refined output, high quality, controllable process, reproducible results
Character Pipeline / Building prompts
There is a methodology in prompt building which allows for more control of the output
Although it may seem trivial, prompt engineering is an important part of the generation process. Results are statistical rather than deterministic, so output is not always accurate, but more precise, descriptive prompts yield better results
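Structured prompt building can also be scripted, so that subject, detail, style, and quality tokens stay in a consistent order across a whole asset batch. A minimal sketch, assuming Automatic1111's `(token:weight)` emphasis syntax; the `build_prompt` helper and the example tokens are illustrative, not part of the original workflow:

```python
def build_prompt(subject, details=(), style=(), quality=(), weights=None):
    """Assemble an Automatic1111-style prompt from structured parts.
    Tokens listed in `weights` get the (token:weight) emphasis syntax."""
    weights = weights or {}
    parts = [subject, *details, *style, *quality]
    return ", ".join(
        f"({p}:{weights[p]})" if p in weights else p for p in parts
    )

prompt = build_prompt(
    "portrait of a fox warrior",
    details=["ornate armor", "forest background"],
    style=["watercolor", "soft lighting"],
    quality=["highly detailed"],
    weights={"watercolor": 1.3},
)
print(prompt)
# portrait of a fox warrior, ornate armor, forest background, (watercolor:1.3), soft lighting, highly detailed
```

Keeping the structure fixed and varying only one slot at a time is what makes A/B comparisons between generations meaningful.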
Character Pipeline / Applying additional styles
There are multiple ways in which additional guidance can be applied to the generation
Neural network structures that have been pre-trained on a certain style or subject can be applied in addition to base checkpoint generation and prompt engineering
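The slide likely refers to adapters such as LoRAs and textual-inversion embeddings; in Automatic1111 these are invoked from the prompt itself, via `<lora:filename:weight>` tags and embedding trigger words. A small sketch of scripting those tags; the adapter names (`ink_style`, `studio_style_embed`) are hypothetical:

```python
def with_style_adapters(prompt, loras=None, embeddings=None):
    """Append Automatic1111 LoRA tags and embedding trigger words to a prompt.
    LoRA syntax: <lora:filename:weight>; embeddings fire on their trigger word."""
    parts = [prompt]
    parts += list(embeddings or [])          # e.g. a textual-inversion trigger word
    parts = [", ".join(parts)]
    for name, weight in (loras or {}).items():
        parts.append(f"<lora:{name}:{weight}>")
    return " ".join(parts)

styled = with_style_adapters(
    "knight character concept",
    loras={"ink_style": 0.8},                # hypothetical LoRA filename
    embeddings=["studio_style_embed"],       # hypothetical embedding trigger
)
print(styled)
# knight character concept, studio_style_embed <lora:ink_style:0.8>
```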
Character Pipeline / Infusing style on a subject
Infusing style on a generated subject using image to image generation
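The practical knob here is denoising strength: it decides how much of the original subject survives the img2img pass. A sketch of how strength maps to the number of denoising steps actually run, mirroring the scheduling used by common SD img2img implementations (the helper name is ours):

```python
def img2img_steps(num_inference_steps, strength):
    """How many denoising steps actually run in img2img.
    strength=0.0 returns the input image untouched; strength=1.0
    ignores the input almost entirely (pure txt2img behaviour).
    Mirrors the step scheduling of common SD img2img implementations."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    return min(int(num_inference_steps * strength), num_inference_steps)

# Low strength keeps the subject and infuses only a new style;
# high strength lets SD re-imagine the subject itself.
print(img2img_steps(30, 0.35))  # 10
print(img2img_steps(30, 0.75))  # 22
```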
Character Pipeline / Blending between subjects
Blending between subjects in latent space using image to image
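Blending in latent space is typically done with spherical interpolation (slerp) rather than a straight average, because Gaussian latents concentrate near a hypersphere and linear mixes lose the norm statistics the model expects. A self-contained numpy sketch; the 4×64×64 shape matches SD latents for 512×512 images:

```python
import numpy as np

def slerp(t, a, b):
    """Spherical interpolation between two latent tensors.
    Preferred over linear interpolation for SD latents because it
    roughly preserves the norm the denoiser was trained on."""
    a_flat, b_flat = a.ravel(), b.ravel()
    omega = np.arccos(np.clip(
        np.dot(a_flat / np.linalg.norm(a_flat),
               b_flat / np.linalg.norm(b_flat)), -1.0, 1.0))
    so = np.sin(omega)
    if so < 1e-8:                      # nearly parallel: fall back to lerp
        return (1.0 - t) * a + t * b
    return (np.sin((1.0 - t) * omega) / so) * a + (np.sin(t * omega) / so) * b

rng = np.random.default_rng(0)
lat_a = rng.standard_normal((4, 64, 64))   # latent for subject A
lat_b = rng.standard_normal((4, 64, 64))   # latent for subject B
blend = slerp(0.5, lat_a, lat_b)           # halfway between the two subjects
```

Sweeping `t` from 0 to 1 yields the in-between designs shown on this slide.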
Character Pipeline / Greater control
Further control can be applied to various aspects of generation
We can control composition, poses, colors, styles, etc. (with varying degrees of usability at the moment)
Inferring a skeletal guide from reference image
Character Pipeline / Greater control
One of the key aspects of usability is control over elements of output
In this particular example, multiple designs were generated using the same pose as a basis.
This method allows for precise control of composition elements
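In Automatic1111 this pose-guided generation can also be driven headlessly through the web UI's HTTP API (the `/sdapi/v1/txt2img` endpoint, available when the server runs with `--api`) with the ControlNet extension enabled. A plausible request payload; field names follow the extension's API, while the prompt, values, and placeholders are illustrative:

```json
{
  "prompt": "fantasy ranger, full body, concept art",
  "negative_prompt": "blurry, low quality",
  "steps": 30,
  "alwayson_scripts": {
    "controlnet": {
      "args": [
        {
          "input_image": "<base64-encoded pose reference>",
          "module": "openpose",
          "model": "<installed openpose ControlNet model name>",
          "weight": 1.0
        }
      ]
    }
  }
}
```

The `openpose` preprocessor infers the skeletal guide from the reference image; re-sending the same payload with different prompts reproduces the "many designs, one pose" result above.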
Character Pipeline / Variety within control
We are free to choose which aspects of image generation we control
We can choose to control one aspect and vary all other aspects of image creation, including mood, subject, design, colors, etc.
This way we can achieve fairly precise control of composition elements
Character Pipeline / Refinement
Refining a design involves artist’s knowledge and experience
The first step is refining the selected design(s) in Photoshop and guiding them towards a more specific idea
Stable Diffusion → Photoshop
Character Pipeline / Design variants
The design process involves going back and forth between the artist and SD
The next step is bringing the design back into A1111 for more iterations and refinement with img2img and inpainting
Character Pipeline / Inpainting elements
SD inpainting makes it very easy to add new elements to the design
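Inpainting takes the original image plus a binary mask marking the region to regenerate while the rest is preserved. A minimal numpy sketch of building such a mask; the box coordinates and the shoulder-pad example are illustrative:

```python
import numpy as np

def make_inpaint_mask(height, width, box):
    """Build a binary inpainting mask: white (255) pixels are regenerated,
    black (0) pixels are kept from the original image.
    `box` is (top, left, bottom, right) in pixel coordinates."""
    mask = np.zeros((height, width), dtype=np.uint8)
    top, left, bottom, right = box
    mask[top:bottom, left:right] = 255
    return mask

# Mask a 128x128 region where a new element (e.g. a shoulder pad)
# should be painted in, leaving the rest of the character untouched.
mask = make_inpaint_mask(512, 512, (64, 320, 192, 448))
print(mask.sum() // 255)   # number of masked pixels: 128 * 128 = 16384
```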
Background Pipeline / Basic prompt building
Initial background design works similarly to characters (prompt building)
Backgrounds are easy to generate in large variety, but harder to refine than characters
Initial prompt + prompt engineering
Background Pipeline / Depth controlled composition
Control networks can be used to keep the composition in place
Backgrounds can be iterated on while keeping the composition intact with depth maps (SD- or artist-generated)
Depth map basis
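An artist-authored depth map can be as simple as a grayscale gradient. A tiny numpy sketch of a hand-made map (ground plane fading from a near foreground to a distant horizon, empty sky above), assuming the bright-is-near convention used by common depth preprocessors; the function and its parameters are illustrative:

```python
import numpy as np

def horizon_depth_map(height, width, horizon=0.4):
    """Synthesize a simple depth map: bright (near) at the bottom of the
    frame, fading to dark (far) at the horizon line, black sky above.
    Assumes the bright-is-near convention of common depth preprocessors."""
    depth = np.zeros((height, width), dtype=np.float32)
    h_line = int(height * horizon)
    ramp = np.linspace(0.0, 1.0, height - h_line, dtype=np.float32)
    depth[h_line:, :] = ramp[:, None]      # ground plane: far -> near
    return (depth * 255).astype(np.uint8)

depth = horizon_depth_map(512, 512)
```

Fed to a depth control network, one such map can anchor the composition while prompts swap the theme, as on the next slide.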
Background Pipeline / Variety of compositions
We use depth map for initial composition guidance, and steer generation results with custom prompts
Depth map to various themes sharing same composition
Object pipeline / From concept to refinement
Initial ideation is similar to the character pipeline
Design follows the same process of converging on a solution, with input and filtering by the artist
Stable Diffusion → Photoshop → Stable Diffusion → Photoshop
Object pipeline / Line art design to render
SD needs surprisingly little to be able to infer form
We can provide an initial design and a prompt, and guide SD to give us rendered versions of our designs
Using an artist-provided initial design to guide output
UI elements pipeline / Font elements
Design iterations are possible with control networks and flat layouts
For UI elements, a designer can provide flat design solutions, and SD can iterate on variants and finalized solutions
UI elements pipeline / Frame elements
Design iterations are possible with control networks and flat layouts
UI elements on the cabinet follow the same iteration process as the other UI elements
Conclusions / Pros and Cons
Successes:
- The whole process takes 30% to 50% less time than traditional methods.
- The end result is a wide range of graphic assets, UI and splash art consistent with a given theme.
- The process can be used in all stages of the artistic process - from references and ideas, through design
iterations, to refinement. However, IT IS ONLY USEFUL IN THE HANDS OF A SKILLED ARTIST.
- The number of produced asset variants is much larger than with traditional methods and can be used to
converge on design ideas and direction faster.
- Artists' response after initial experimentation and usage testing is positive.
Limitations:
- Adherence to a very specific art style is currently not optimal without custom training.
- Less presence in dataset means less options for generation (characters>backgrounds>animals>objects>UI).
- Tendency of output to look ‘samey’ or to represent the common denominator of training data.