I tried my first Stable Diffusion AI embeddings training. It made impressive Picassos. I was not training for Picasso.
You did what? That first sentence is entirely meaningless to me. Mind explaining some more?
Stable Diffusion is a popular open-source text-to-image AI model that runs offline. You type text, it makes images. It is like ChatGPT but for images, where ChatGPT is text-to-text chat.
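To make that concrete, here is roughly what using it looks like with the Hugging Face diffusers library (one common way to run it; the model name and prompt here are just examples):

```python
# Minimal sketch: text in, image out. Assumes the diffusers and torch
# packages are installed and the model weights can be downloaded.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # needs an NVIDIA GPU; "cpu" works too, slowly

image = pipe("a photo of an astronaut riding a horse on mars").images[0]
image.save("astronaut.png")
```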
Under the hood it is a neural network that starts from an image of mathematically random static and, over a series of steps, gradually denoises it into an image that matches the text prompt.
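As a toy sketch of that loop (not the real model, just the shape of the idea; the denoiser here is a made-up placeholder for the actual neural network):

```python
import torch

def toy_denoiser(noisy_image, prompt_embedding):
    # Stand-in for the real model: an actual diffusion network looks at
    # the noisy image plus the encoded prompt and predicts noise to remove.
    return 0.1 * noisy_image  # hypothetical placeholder prediction

image = torch.randn(3, 512, 512)          # start: pure random static
prompt_embedding = torch.randn(77, 768)   # stand-in for the encoded prompt
for step in range(50):                    # e.g. 50 denoising steps
    predicted_noise = toy_denoiser(image, prompt_embedding)
    image = image - predicted_noise       # peel off a little noise each step
```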
If you follow so far: Stable Diffusion is trained on billions of images, each paired with a descriptive caption, which means it can only generate concepts that appeared in the captions of the original training data set. If, say, you want to generate images of yourself in places around the world, you are very unlikely to already be defined in the model unless you are a public figure or celebrity. However, it is possible to add yourself to the AI without retraining the entire neural network. Retraining everything from scratch would basically require you to own or rent serious data-center hardware, on the order of half a million dollars. Instead, you can patch a small trained layer onto the model's neural network so that it knows what you look like and associates it with text.
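One real way to do that is "textual inversion": teach the text encoder a brand-new word whose embedding is trained on your photos, while the rest of the network stays frozen. A minimal sketch with the Hugging Face transformers library (the token name "<my-face>" and the learning rate are made-up examples, and the actual training loop against your photos is not shown):

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

# Invent a brand-new word and give it a slot in the embedding table.
tokenizer.add_tokens(["<my-face>"])
text_encoder.resize_token_embeddings(len(tokenizer))
new_id = tokenizer.convert_tokens_to_ids("<my-face>")

# Freeze the whole network except the token embedding table.
for param in text_encoder.parameters():
    param.requires_grad = False
embeddings = text_encoder.get_input_embeddings()
embeddings.weight.requires_grad = True

optimizer = torch.optim.AdamW([embeddings.weight], lr=5e-4)
# In each training step you would zero out the gradient for every row
# except embeddings.weight.grad[new_id], so only the new word changes.
```

After training, prompts like "a photo of <my-face> in Paris" can use the new word just like any other.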
If you get the settings wrong for one of these patched-on training layers, you can get all kinds of crazy errors, like turning people into abstract art. That was the result of my first test. Maybe I'll do better today; I found some faster training tools to try. The first attempt took 4 hours.