Don't forget to change how many images are stored in memory to 1. Higher network rank will use more VRAM and slow things down a bit, or a lot if you're close to the VRAM limit and there's lots of swapping to regular RAM, so try training ranks in the 16-64 range. AdamW and AdamW8bit are the most commonly used optimizers for LoRA training.

Some background first. The Stable Diffusion XL (SDXL) model is the official upgrade to the v1.5 model: it is the most advanced version of Stability AI's main text-to-image algorithm and has been evaluated against several other models. This versatile model can generate distinct images without imposing any specific "feel," granting users complete artistic freedom, and it supports DreamBooth, embeddings, and all the other training methods. The catch is that with the mandatory 1024x1024 resolution, SDXL training takes a lot more time and resources. SDXL training is much better suited to LoRAs than to full models (not that full training is bad, LoRAs are just enough), but full training is out of the scope of anyone without 24 GB of VRAM unless extreme parameters are used. The A6000 Ada is a good option for training LoRAs on the SD side, IMO. Popular workflows include fine-tuning with DreamBooth + LoRA on a faces dataset and fine-tuning and customizing image generation models using ComfyUI; guides such as "How to Fine-tune SDXL using LoRA" and "SD 1.5 Models > Generate Studio Quality Realistic Photos By Kohya LoRA Stable Diffusion Training - Full Tutorial" cover both.

A few community reports on VRAM in practice:
- "Funny, I've been running 892x1156 native renders in A1111 with SDXL for the last few days, fitting on an 8GB VRAM GPU."
- "I'm not an expert, but since the base resolution is 1024x1024, I doubt it will work on a 4GB VRAM card."
- "I'm running on 6GB VRAM; I've switched from A1111 to ComfyUI for SDXL, and a 1024x1024 base + refiner run takes around 2 minutes."
- "If you use newer drivers you can get past this point, as the VRAM is released and it only uses 7GB of RAM."
- "Do you mean training a DreamBooth checkpoint or a LoRA? There aren't very good hyper-realistic checkpoints for SDXL yet, like Epic Realism or Photogasm."

If you have a GPU with 6GB VRAM, or want larger batches of SDXL images without hitting VRAM constraints, you can use the --medvram command line argument (navigate to the directory with the webui script to add it). Maybe even lower your desktop resolution and switch off a second monitor if you have one. If you have a desktop PC with integrated graphics, boot it with your monitor connected to that, so Windows uses the integrated GPU and your dedicated GPU keeps the entirety of its VRAM.

Relevant tutorial chapters:
- 21:47 How to save the state of training and continue later
- 46:31 How much VRAM SDXL LoRA training uses with Network Rank (Dimension) 32 (around 7 seconds per iteration)
- 47:15 SDXL LoRA training speed on an RTX 3060
- 47:25 How to fix the "image file is truncated" error

For the dataset setup, most items can be left at their defaults, but we want to change a few. Inside the /image folder, create a new folder called /10_projectname; the leading number is the repeat count. So if you have 14 training images and the training repeat is 1, then the total number of regularization images used is 14. Note that by default we will be using LoRA for training; if you instead want to use DreamBooth, you can set is_lora to false. We can afford a batch size of 4 due to having an A100, but if you have a GPU with lower VRAM we recommend bringing this value down to 1, which suggests 3+ hours per epoch for the training I'm trying to do.
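To make the repeat and step arithmetic above concrete, here is a small sketch. The helper function is hypothetical (it is not part of kohya-ss), but the formula follows the folder convention described above, where the "10_" prefix is the per-image repeat count.

```python
# Hypothetical helper (not part of kohya-ss) illustrating the arithmetic the
# folder convention implies. Assumes the "NN_name" prefix is the per-image
# repeat count and that regularization images are matched to repeated images.
def training_plan(num_images: int, repeats: int, epochs: int, batch_size: int):
    images_per_epoch = num_images * repeats           # each image seen `repeats` times
    steps_per_epoch = images_per_epoch // batch_size  # one optimizer step per batch
    reg_images_needed = num_images * repeats          # 14 images, repeat 1 -> 14 reg images
    return steps_per_epoch * epochs, reg_images_needed

# 14 images in /10_projectname (10 repeats), 6 epochs, batch size 1:
total_steps, reg_needed = training_plan(14, 10, 6, 1)
print(total_steps, reg_needed)  # -> 840 140
```

This also shows why lowering batch size on a small GPU stretches wall-clock time: the same image count simply takes more optimizer steps per epoch.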
Will investigate training only the U-Net without the text encoder, since training the text encoder will increase VRAM usage; if you have 24 GB of VRAM, you can likely train with the text encoder on and without 8-bit Adam. From the "Big Comparison of LoRA Training Settings, 8GB VRAM, Kohya-ss," one high-end reference configuration was: 48 gigs of VRAM, batch size of 2+, max size 1592x1592, rank 512. For object training, try a learning rate of 4e-6 for about 150-300 epochs, or 1e-6 for about 600 epochs. The Training_Epochs count lets you extend how many total times the training process looks at each individual image, and a --full_bf16 option has been added to the trainer.

What is Stable Diffusion XL (SDXL)? It is a generative AI model developed by Stability AI, proposed in "SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis" by Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach. It is a much larger model than SD 1.5 or SD 2.x (the base U-Net parameter count is about 2.6 billion), yet SDXL 1.0 works effectively on consumer-grade GPUs with 8GB VRAM and readily available cloud instances. Other reports claimed the ability to generate at least native 1024x1024 with just 4GB VRAM, and an "[Ultra-HD 8K Test #3] Unleashing 9600x4800 pixels of pure photorealism" writeup went much further by using the negative prompt and controlling the denoising strength of Ultimate SD Upscale. As threads like "SDXL 1.0, A1111 vs ComfyUI on 6GB VRAM" conclude, I've found ComfyUI is way more memory efficient than Automatic1111 (and 3-5x faster), although if Automatic1111 falls back to system RAM when VRAM runs out, the difference matters less. If your GPU card has 8 GB to 16 GB VRAM, use the command line flag --medvram-sdxl in webui-user.bat; even with it, SDXL spends almost 13 GB here. Since SDXL came out, I think I've spent more time testing and tweaking my workflow than actually generating images.

Training reports: SDXL full DreamBooth training is also on my research and workflow preparation list, as is a workflow for celebrity-name-based training. I have just performed a fresh installation of kohya_ss, as the update was not working, and I am currently training SDXL using kohya on RunPod. From the issue "Training ultra-slow on SDXL - RTX 3060 12GB VRAM OC #1285": "I have a 3060 12GB and the estimated time to train for 7,000 steps is 90-something hours." Another user set up SD and the Kohya_SS GUI with AItrepeneur's low-VRAM config, but training is taking an eternity. For walkthroughs, see "[Tutorial] How To Use Stable Diffusion SDXL Locally And Also In Google Colab" and "A Report of Training/Tuning SDXL Architecture."
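Since 8-bit Adam comes up repeatedly above, here is a minimal sketch of what the swap looks like at the PyTorch level; this is what kohya-ss does when you select the "AdamW8bit" optimizer type. It assumes the bitsandbytes package is installed, and `unet` is just a placeholder module.

```python
# A minimal sketch of swapping AdamW for its 8-bit variant. Assumes
# bitsandbytes is installed; `unet` stands in for the module being trained.
import torch
import bitsandbytes as bnb

unet = torch.nn.Linear(320, 320)  # placeholder for the real U-Net

# 32-bit optimizer states (AdamW keeps two moments per parameter) are a big
# share of training VRAM:
# optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-4)

# The 8-bit version quantizes those states, cutting optimizer memory roughly 4x
# with little quality impact for LoRA training:
optimizer = bnb.optim.AdamW8bit(unet.parameters(), lr=1e-4, weight_decay=1e-2)
```

This is why the 24 GB note above says you can afford to skip 8-bit Adam: with enough VRAM, the full-precision optimizer states simply fit.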
I can run SDXL, both base and refiner steps, using InvokeAI or ComfyUI without any issues. I mean, Stable Diffusion 2.x was already heavier to run than 1.5 (especially for fine-tuning DreamBooth and LoRA), and many assumed SDXL probably wouldn't even run on consumer hardware. In practice I can generate 1024x1024 in A1111 in under 15 seconds, and using ComfyUI it takes less than 10 seconds, although for some users Automatic1111 won't even load the base SDXL model without crashing out from lack of VRAM. For comparison, with SD 1.5 I can generate one image at a time in less than 45 seconds per image, but for anything else, or for generating more than one image in a batch, I have to lower the resolution to 480x480 or even 384x384.

If you are VRAM-limited, learn how to use the optimized forks of the generative art tool that work on low-VRAM devices and let you generate images of anything you can imagine. Supported models: Stable Diffusion 1.4/1.5 and 2.1 models from Hugging Face, along with the newer SDXL, with easy and fast use and no extra modules to download. One reported configuration used about 6 GB of VRAM, so it should be able to work on a 12 GB graphics card. These options significantly reduce VRAM requirements at the expense of inference speed. There are also guides on installing the Automatic1111 Web UI, using LoRAs, and training models with minimal VRAM.

Questions from the community: "SDXL in 6GB VRAM optimization? I am using a 3060 laptop with 16 GB of RAM and a 6 GB video card." "Anyone else with a 6GB VRAM GPU that can confirm or deny how long it should take? 58 images of varying sizes, all resized down to no greater than 512x512, 100 steps each, so 5,800 steps." "I don't believe there is any way to process Stable Diffusion images using only the regular RAM installed in your PC."

About SDXL training: SDXL training is now available in the sdxl branch as an experimental feature. With 24 gigs of VRAM, a batch size of 2 is possible if you enable full training using bf16 (experimental); tick the box for FULL BF16 training if you are using Linux or managed to get a recent BitsAndBytes build working on Windows. Was trying some training locally vs an A6000 Ada: basically it was as fast at batch size 1 as my 4090, but then you could increase the batch size since it has 48GB VRAM. There is also a DeepSpeed integration allowing for training SDXL on 12G of VRAM (although, incidentally, DeepSpeed stage 1 is required for SimpleTuner to work on 24G of VRAM as well). Since the original Stable Diffusion was available to train on Colab, I'm curious whether anyone has been able to create a Colab notebook for training a full SDXL LoRA model; "How to install Kohya SS GUI trainer and do LoRA training with Stable Diffusion XL (SDXL)" is the video you are looking for, and there's an easy tutorial for using RunPod (0:00 Introduction).

For DreamBooth, head over to the relevant GitHub repository and download the train_dreambooth.py script, using SDXL 1.0 as the base model (Step 2: download the Stable Diffusion XL model; the checkpoint is large, and pruning has not been a thing yet). The following is a common parameter to modify based on your use case: pretrained_model_name_or_path, the path to a pretrained model or a model identifier from huggingface.co/models. The generated images will be saved inside the folder below. Typical ComfyUI setups load the SDXL 1.0 base and refiner plus two more models to upscale to 2048px; newer UIs officially support the refiner model and run fast, and SD.Next (Vlad's fork) reportedly supports SDXL as well.

Two broader notes: researchers discovered that Stable Diffusion v1 uses internal representations of 3D geometry when generating an image, and 10-20 images are enough to inject a concept into the model.
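For readers on 6-8 GB cards, here is a minimal sketch of running the SDXL base model with diffusers using the same ideas as A1111's --medvram. The calls are standard diffusers APIs (enable_model_cpu_offload requires the accelerate package); the anecdotal VRAM figures quoted above are not guaranteed.

```python
# A minimal low-VRAM SDXL inference sketch, assuming diffusers + accelerate
# are installed and a CUDA GPU is available.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)

# Keep only the submodule currently executing on the GPU (similar in spirit
# to --medvram): large VRAM savings for a modest speed cost.
pipe.enable_model_cpu_offload()

# Decode the 1024x1024 latents in tiles so the VAE doesn't spike VRAM.
pipe.enable_vae_tiling()

image = pipe("a photo of an astronaut riding a horse",
             num_inference_steps=30).images[0]
image.save("sdxl_lowvram.png")
```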
During training in mixed precision, FP16 cannot represent values above about 65,504 (they overflow to infinity), and very small gradients underflow to zero; the trick applied to keep gradients representable is loss scaling, which rescales them into the usable range.

Additionally, "braces" has been tagged a few times in the captions. Hands are a big issue, albeit different than in earlier SD versions. But I'm sure the community will get some great stuff; SDXL is starting at this level, imagine how much easier it will be in a few months.

Happy to report training on 12GB is possible on lower batches, and SDXL seems easier to train with than 2.1. I tried recreating my regular DreamBooth-style training method, using 12 training images with very varied content but similar aesthetics, and only trained for 1,600 steps instead of 30,000. Having the text encoder on makes a qualitative difference; 8-bit Adam, not as much, AFAIK. Try gradient_checkpointing: on my system it drops VRAM usage from 13 GB to about 8 GB. At rank 64 with a clean memory cache (which gives about 400 MB extra memory for training), it still tells me I need 512 MB more memory. Speed reference: 512x1024 with the same settings takes 14-17 seconds. Yeah, 8 GB is too little for SDXL training outside of ComfyUI, and on my PC, ComfyUI + SDXL also doesn't play well with 16 GB of system RAM, especially when cranked to produce more than 1024x1024 in one run. Get solutions to train SDXL even with limited VRAM: use gradient checkpointing, or offload training to Google Colab or RunPod. Still, it is a lot. (Find the 🤗 Accelerate example further down in this guide.)

Some model facts: the total number of parameters of the SDXL model is 6.6 billion, compared with 0.98 billion for the v1.5 model. The training of the final model, SDXL, is conducted through a multi-stage procedure, and Stable Diffusion XL is designed to generate high-quality images with shorter prompts. The Stability AI team is proud to release SDXL 1.0 as an open model; it is the successor to the popular v1.5. The train_text_to_image_sdxl.py script pre-computes the text embeddings and the VAE encodings and keeps them in memory, and you can specify the dimension of the conditioning image embedding with --cond_emb_dim.

Tutorials and tooling: master SDXL training with Kohya SS LoRAs in the 1-2 hour tutorial by SE Courses, which covers DreamBooth, LoRA, Kohya, Google Colab, Kaggle, Python, and more (5:35 Beginning to show all SDXL LoRA training setup and parameters on the Kohya trainer; see also "The best parameters to do LoRA training with SDXL"). Kohya GUI has had support for SDXL training for about two weeks now, so yes, training is possible (as long as you have enough VRAM). Since SDXL 1.0 came out, I've been messing with various settings in kohya_ss to train LoRAs, as well as to create my own fine-tuned checkpoints. Invoke AI adds SDXL support for inpainting and outpainting on the Unified Canvas, plus undo in the UI (remove tasks or images from the queue easily, and undo the action if you removed anything accidentally). On hardware, the MSI Gaming GeForce RTX 3060 is one of the most popular entry-level choices for home AI projects; offloading makes SDXL usable on some very low-end GPUs, but at the expense of higher RAM requirements.
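Here is a minimal sketch of that loss-scaling trick using PyTorch's built-in AMP utilities. This is generic PyTorch, not the exact code any particular trainer uses; `model`, the data, and the loss are placeholders.

```python
# Loss scaling with torch.cuda.amp: scale the loss up so small FP16 gradients
# don't underflow, and skip steps whose gradients overflowed.
import torch

model = torch.nn.Linear(128, 128).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()
loss_fn = torch.nn.MSELoss()

for _ in range(10):
    x = torch.randn(16, 128, device="cuda")
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = loss_fn(model(x), x)
    # Scale the loss before backward so tiny gradients stay representable...
    scaler.scale(loss).backward()
    # ...then unscale before stepping; if any gradient became inf/NaN, this
    # step is skipped and the scale factor is reduced automatically.
    scaler.step(optimizer)
    scaler.update()
```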
When trying to get SDXL 0.9 to work, all I got at first was some very noisy generations on ComfyUI (I tried different .json workflows) and a bunch of "CUDA out of memory" errors on Vlad; normally I'm using AUTOMATIC1111. You don't have to generate only at 1024, though. Benchmarks like "Stable Diffusion Benchmarked: Which GPU Runs AI Fastest (Updated)" make the theme clear: VRAM is king. Based on those findings, some of the best-value GPUs for getting started with deep learning and AI include the NVIDIA RTX 3060, which boasts 12GB of GDDR6 memory and 3,584 CUDA cores. And if you're rich with 48 GB you're set, but I don't have that luck, lol.

Despite its powerful output and advanced model architecture, SDXL 0.9 is able to run on a modern consumer GPU, needing only a Windows 10 or 11 or Linux operating system, 16GB of RAM, and an Nvidia GeForce RTX 20-series graphics card (or equivalent or higher standard) equipped with a minimum of 8GB of VRAM. In this video, I'll show you how to train amazing DreamBooth models with the newly released SDXL 1.0 (43:36 How to do training on your second GPU with Kohya SS). Training SDXL 1.0 is generally more forgiving than training 1.5 if your inputs are clean, and all of this is under gradient checkpointing + xformers, because otherwise even 24 GB of VRAM won't be enough. A note on epochs: Epoch and Max train epoch should be set to the same value, usually 6 or below (num_train_epochs: each epoch corresponds to how many times the images in the training set will be "seen" by the model). In one bake, at epoch 7 it looked like it was almost there, but at 8 it totally dropped the ball. Updating can also break your Civitai LoRAs, which has happened before with LoRAs updated for SD 2.x. (As an aside, Roop, the base for the faceswap extension, was discontinued.)

On resource usage: yikes, one run consumed 29/32 GB of system RAM. You will always need more VRAM for AI video work; even 24GB is not enough for the best resolutions with a lot of frames. It takes around 18-20 seconds for me using xformers and A1111 with a 3070 8GB and 16 GB of RAM; another card uses about 7 GB of VRAM and generates an image in 16 seconds with SDE Karras at 30 steps; for some setups, 1024x1024 works only with --lowvram, which reduces VRAM usage a LOT (almost half) and will increase speed and lessen VRAM usage at almost no quality loss. One caveat: the .safetensors version of one model download just won't work right now for some users.

So this is SDXL LoRA + RunPod training, which is probably what the majority will be running currently; most of the work is making it train with low-VRAM configs. For the second command, if you don't use the option --cache_text_encoder_outputs, the text encoders stay on VRAM, and that uses a lot of VRAM. Keep the resolution math in mind: 512x512 in SD 1.5 is about 262,000 total pixels, which means a 1024x1024 batch of 1 in SDXL trains four times as many pixels per step as a 512x512 batch of 1 in SD 1.5. So far, 576 (576x576) has been consistently improving my bakes at the cost of training speed and VRAM usage. This is sorta counterintuitive considering the 3090 has double the VRAM, but it also kinda makes sense since the 3080 Ti is installed in a much more capable PC.

For inference, I swapped in the refiner model for the last 20% of the steps.
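The "refiner for the last 20% of steps" handoff can be expressed directly with diffusers' ensemble-of-experts API. This is a minimal sketch using standard diffusers calls; the 0.8 split matches the 20% figure above, and the prompt is just an example.

```python
# Base model denoises the first 80% of the schedule, refiner finishes the rest.
import torch
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

base = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16, variant="fp16", use_safetensors=True,
).to("cuda")

refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    text_encoder_2=base.text_encoder_2,  # share weights to save VRAM
    vae=base.vae,
    torch_dtype=torch.float16, variant="fp16", use_safetensors=True,
).to("cuda")

prompt = "a cinematic portrait photo"
# Hand off raw latents at 80% of the denoising schedule...
latents = base(prompt, num_inference_steps=40, denoising_end=0.8,
               output_type="latent").images
# ...and let the refiner denoise the remaining 20%.
image = refiner(prompt, num_inference_steps=40, denoising_start=0.8,
                image=latents).images[0]
image.save("base_plus_refiner.png")
```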
I assume that smaller, lower-res SDXL models would work even on 6 GB GPUs; check this post for a tutorial. SDXL is a new version of SD, and the video "SDXL Kohya LoRA Training With 12 GB VRAM GPUs - Tested On RTX 3060" shows you how to get it working on Microsoft Windows, so now everyone with a 12GB 3060 can train at home too. Some people have even trained SDXL 0.9 LoRAs with only 8GBs. The higher the batch size, the faster the training will be, but it will be more demanding on your GPU. As one thread title says, though, "training lora for sdxl on 4090 is painfully slow": it's about 50 minutes for 2,000 steps (~1.5 s/it) if you use gradient_checkpointing and similar memory savers, and after training SDXL LoRAs here, I'm not really digging it more than DreamBooth training. Training cost money before, and for SDXL it costs even more. IMO I probably could have raised the learning rate a bit, but I was conservative. I checked out the last April 25th green-bar commit; precision differences might also explain some of the gaps I see in training between the M40 and a rented T4.

The official positioning: Stability AI released SDXL model 1.0, engineered to perform effectively on consumer GPUs with 8GB VRAM or commonly available cloud instances. The trainer ships an AI Jupyter Notebook and supports captions, config-based training, aspect ratio / resolution bucketing, and resuming training (5:51 How to download the SDXL model to use as a base training model; see also "Ultimate guide to the LoRA training" and "RTX 3090 vs RTX 3060 Ultimate Showdown for Stable Diffusion, ML, AI & Video Rendering Performance"), and it was updated to use the SDXL 1.0 release. AUTOMATIC1111 has fixed the high-VRAM issue in pre-release version 1.6.0; before that, runs would stop generation and throw "OutOfMemoryError: CUDA out of memory," so I guess it's time to upgrade my PC, but I was wondering if anyone succeeded in generating an image with such a setup (an RTX 3070 8GB VRAM Mobile Edition GPU, say). Set the following parameters in the settings tab of auto1111: checkpoints and VAE checkpoints, then generate an image as you normally would with the SDXL v1.0 base model. Even less VRAM usage is possible: less than 2 GB for 512x512 images on the "low" VRAM usage setting (SD 1.5). With SwinIR you can upscale 1024x1024 up to 4-8 times. Can't give you OpenPose, but try the new SDXL ControlNet LoRAs, the 128-rank model files. As far as I know, 6 GB of VRAM is the stated minimum system requirement, and a GeForce RTX GPU with 12GB is available for Stable Diffusion at a great price, but I can guess future models and techniques/methods will require a lot more.

On memory during training: while for smaller datasets like lambdalabs/pokemon-blip-captions it might not be a problem, keeping pre-computed embeddings in memory can definitely lead to memory problems when the script is used on a larger dataset. By using DeepSpeed it's possible to offload some tensors from VRAM to either CPU or NVMe, allowing training with less VRAM. In one run, average VRAM utilization was about 83%, using batch size 4 though.
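The pre-computation idea behind --cache_latents and --cache_text_encoder_outputs can be sketched in a few lines: encode every image once up front, then drop the VAE from the GPU before the training loop starts. This is illustrative only, with placeholder data, not any trainer's actual code.

```python
# A minimal sketch of caching VAE latents so the VAE doesn't occupy VRAM
# during training. Assumes diffusers is installed and a CUDA GPU is present.
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", subfolder="vae"
).to("cuda").eval()

@torch.no_grad()
def cache_latents(images: torch.Tensor) -> torch.Tensor:
    # images: float tensors in [-1, 1], shape (N, 3, 1024, 1024)
    latents = vae.encode(images.to("cuda")).latent_dist.sample()
    return (latents * vae.config.scaling_factor).cpu()

dataset = torch.rand(4, 3, 1024, 1024) * 2 - 1  # stand-in for real images
cached = cache_latents(dataset)

# The VAE is no longer needed on the GPU; free the VRAM before training.
vae.to("cpu")
torch.cuda.empty_cache()
```

The same trick applies to the two SDXL text encoders, which is exactly why skipping --cache_text_encoder_outputs keeps them resident in VRAM.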
Fooocus is a Stable Diffusion interface designed to reduce the complexity of other SD interfaces like ComfyUI by making the image generation process require only a single prompt. Other speed tricks exist too: set classifier-free guidance (CFG) to zero after 8 steps, or cut the number of steps from 50 to 20 with minimal impact on result quality (timed on an A100). OneTrainer is a one-stop solution for all your Stable Diffusion training needs, and Invoke AI 3.x lets you create photorealistic and artistic images using SDXL.

Since this tutorial is about training an SDXL-based model, you should make sure your training images are at least 1024x1024 in resolution (or an equivalent aspect ratio), as that is the resolution SDXL was trained at (in different aspect ratios). Let me show you how to train LoRA SDXL locally with the help of the Kohya SS GUI ("Become A Master Of SDXL Training With Kohya SS LoRAs" covers the same ground, combining the power of Automatic1111 and Kohya). In this tutorial, we will use the cheap cloud GPU service provider RunPod to run both the Stable Diffusion Web UI Automatic1111 and the Stable Diffusion trainer Kohya SS GUI to train SDXL LoRAs. To run it after install, run the command below and use the 3001 Connect button on the My Pods interface; if it doesn't start the first time, execute it again. Create three folders for the run: /image, /log, /model (7:06 What is the repeating parameter of Kohya training; 512 is a fine default). Please note that if you are using the Rapid machine on ThinkDiffusion, the training batch size should be set to 1, as it has lower VRAM. Of course, there are settings that depend on the model you are training on, like the resolution (1024x1024 on SDXL). I suggest setting a very long training time and testing the LoRA while you are still training; when it starts to become overtrained, stop the training and test the different versions to pick the best one for your needs. And yes, this is a large and strongly opinionated YELL from me: with SDXL you'll get a ~100 MB LoRA, unlike SD 1.5 where you're gonna get like a 70 MB LoRA.

VRAM reports from training: "So right now it is training; my VRAM usage is super close to full (23.7 GB out of 24 GB) but doesn't dip into 'shared GPU memory usage' (using regular RAM)." "Yep, as stated, Kohya can train SDXL LoRAs just fine, but it took FOREVER with 12GB VRAM." "I grabbed the update yesterday, but I'm at work now and can't really tell if it will indeed resolve the issue; just pulled and still running out of memory, sadly." Some users have successfully trained with 8GB VRAM (see settings below), but it can be extremely slow (60+ hours for 2,000 steps was reported!). LoRA fine-tuning SDXL at 1024x1024 on 12GB VRAM is possible, on a 3080 Ti: "I think I did literally every trick I could find, and it peaks at a bit over 11 GB." I changed my webui-user.bat to: set COMMANDLINE_ARGS=--medvram --no-half-vae --opt-sdp-attention (note that some optimizations cannot be used with --lowvram / sequential CPU offloading). At the moment I'm experimenting with LoRA training on a 3070. Also of note: "SDXL 1024x1024 pixel DreamBooth training vs 512x512 pixel results comparison. DreamBooth is full fine-tuning, with the only difference being the prior preservation loss, and 17 GB of VRAM is sufficient. I just did my first 512x512-pixel Stable Diffusion XL (SDXL) DreamBooth training with my best hyperparameters." Finally, in a recent blog post, the Diffusers team and the T2I-Adapter authors share findings from training T2I-Adapters on SDXL from scratch, some appealing results, and, of course, the T2I-Adapter checkpoints for various conditionings.
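"Every trick I could find" in the reports above usually starts with gradient checkpointing plus memory-efficient attention. Here is a minimal sketch of turning both on for the SDXL U-Net with diffusers; the calls are standard diffusers/xformers APIs, though a real trainer would wrap this in a full training loop.

```python
# Enable the two most common VRAM savers on the SDXL U-Net.
import torch
from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    subfolder="unet",
)

# Recompute activations during backward instead of storing them:
# slower steps, much lower VRAM (the "13 GB down to ~8 GB" effect above).
unet.enable_gradient_checkpointing()

# Memory-efficient attention via the xformers package; on recent PyTorch,
# scaled-dot-product attention gives a similar benefit automatically.
unet.enable_xformers_memory_efficient_attention()
```

Note the trade-off quoted later in this section: disabling gradient checkpointing speeds training back up, at the cost of the VRAM you just saved.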
One run used a learning rate of 0.00000004 and only standard LoRA instead of LoRA-C3Lier, etc. (For SD 1.5-based checkpoints, see the separate guide.) Stable Diffusion is a latent diffusion model, a kind of deep generative artificial neural network; with its extraordinary advancements in image composition, this model empowers creators across various industries to bring their visions to life with unprecedented realism and detail, and lets you create stunning images with minimal hardware requirements. SDXL 1.0 offers better design capabilities as compared to v1.5, although the current options available for fine-tuning SDXL are inadequate for training a new noise schedule into the base U-Net. Over the past few weeks, the Diffusers team and the T2I-Adapter authors have been collaborating to bring support for T2I-Adapters for Stable Diffusion XL (SDXL) to diffusers.

Training on an 8 GB GPU: using fp16 precision and offloading optimizer state and variables to CPU memory, I was able to run DreamBooth training on an 8 GB VRAM GPU, with PyTorch reporting peak VRAM use in the 6 GB range. DreamBooth Stable Diffusion training also fits in 10 GB of VRAM using xformers, 8-bit Adam, gradient checkpointing, and caching latents, and it has been confirmed to work with 24GB VRAM. With 6GB of VRAM, a batch size of 2 would be barely possible. While it is advised to max out GPU usage as much as possible, a high number of gradient accumulation steps can result in a more pronounced training slowdown; one speedup came from lower resolution + disabling gradient checkpointing. I followed some online tutorials but ran into a problem that I think a lot of people encounter: really, really long training time. I don't have anything else running that would be making meaningful use of my GPU (a 2070 Super with 8 GB of VRAM), yet it's using around 23-24 GB of system RAM when generating images. Edit: and because SDXL can't do NAI-style waifu NSFW pictures, the otherwise large and active SD 1.5 community has been slow to move over.

Workflow tip for the refiner: below the image, click on "Send to img2img," and apply the 0.9 VAE to it. A typical negative prompt looks like: "worst quality, low quality, bad quality, lowres, blurry, out of focus, deformed, ugly, fat, obese, poorly drawn face, poorly drawn eyes, poorly drawn eyelashes, bad...".
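The "offload optimizer state to CPU" technique behind that 8 GB DreamBooth report is typically done with DeepSpeed ZeRO. Below is a minimal sketch of such a config, assuming the deepspeed package is installed; `model` is a placeholder for the U-Net, and the exact values are illustrative, not the settings from the report.

```python
# DeepSpeed ZeRO stage 2 with optimizer state offloaded to system RAM.
import deepspeed
import torch

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 1,
    "fp16": {"enabled": True},  # fp16 compute, as in the report above
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
    "zero_optimization": {
        "stage": 2,                               # shard grads + optimizer state
        "offload_optimizer": {"device": "cpu"},   # keep Adam moments in CPU RAM
    },
}

model = torch.nn.Linear(1024, 1024)  # placeholder for the real U-Net

engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```

The trade-off is the one the section keeps circling back to: VRAM pressure moves to system RAM (hence the 23-29 GB RAM figures reported above) and steps get slower, but an otherwise impossible training run fits on the card.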