Me with four open CLI terminals right now:
https://i.kym-cdn.com/photos/images/original/001/617/650/91a.jpg
I do SDXL generation in 4GB at extreme expense of speed, by using a number of memory optimizations.
I’ve done this kind of stuff since SD 1.4, for the fun of it. I like to see how low I can push vram use.
SDXL takes around 3 to 4 minutes per generation, including the refiner, but it works within the constraints.
The graphics cards used are hilariously bad for the task: a 1050 Ti with 4GB and a 1060 with 3GB of VRAM.
I have an implementation running on the 3GB card inside a Podman container, with no RAM offloading, 1 vCPU, and 4GB of RAM.
The graphical UI (Streamlit) runs on a laptop outside the server to save resources.
I'm working on an example implementation of SDXL as we speak, and also on SDXL generation on mobile.
That's the reason I looked into this news; SSD-1B might be a good candidate for my dumb experiments.
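For anyone curious, here is a rough sketch of the kind of memory optimizations involved, using the Hugging Face diffusers SDXL pipeline. The model, resolution, and exact set of tricks are illustrative assumptions, not my actual code; my 3GB setup notably runs without RAM offloading, which is even more restrictive than this.

```python
# Hypothetical low-VRAM SDXL sketch with diffusers (illustrative, not my exact setup).
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # assumed model choice
    torch_dtype=torch.float16,                   # fp16 roughly halves weight memory
    variant="fp16",
    use_safetensors=True,
)

# Trade speed for memory: stream submodules to the GPU one at a time.
pipe.enable_sequential_cpu_offload()
# Compute attention in slices instead of all heads at once.
pipe.enable_attention_slicing()
# Decode latents in tiles so the VAE never holds the full image at once.
pipe.enable_vae_tiling()

image = pipe(
    "a lighthouse on a rocky coast at dusk",
    num_inference_steps=30,
    height=1024,
    width=1024,
).images[0]
image.save("out.png")
```

The general pattern is the same across these tricks: keep only the piece of the model currently doing work on the GPU, and pay for it in wall-clock time, which is how the 3 to 4 minute generations happen.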
That’s wonderful to know! Thank you again.
I’ll follow your instructions; this implementation is exactly what I was looking for.
Absolutely stellar write up. Thank you!
I have a couple of questions.
Imagine I have a powerful consumer GPU to throw at this solution, a 4090 Ti for the sake of example.
- How many containers can share one physical card, assuming the total VRAM is not exceeded?
- What does one virtual GPU look like inside the container? Can I run standard stuff like PyTorch, TensorFlow, and CUDA code in general? (See the sketch below for what I mean.)
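To make the second question concrete, this is the kind of sanity check I would expect to be able to run inside each container to see what the shared GPU exposes. It is a hypothetical sketch assuming a stock PyTorch install in the container, not a claim about how any particular GPU-sharing solution behaves.

```python
# Hypothetical check inside one container sharing the physical GPU.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print("device:", props.name)                               # what this container sees
    print("total VRAM:", props.total_memory // 2**20, "MiB")   # memory visible to it
    # Quick end-to-end check that CUDA kernels actually run here.
    x = torch.randn(1024, 1024, device="cuda")
    print("matmul ok:", (x @ x).shape)
else:
    print("No CUDA device visible in this container.")
```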
DS1 to DS3, I lost count of the hours.
Just figured out there are 10 places called Lisbon dotted around the US, according to the search.