blenderneko

Open Source

ComfyUI_ADV_CLIP_emb

# Advanced CLIP Text Encode

This repo contains 4 nodes for [ComfyUI](https://github.com/comfyanonymous/ComfyUI) that allow for more control over the way prompt weighting should be interpreted.

---

### BNK_CLIPTextEncodeAdvanced node settings

To achieve this, a CLIP Text Encode (Advanced) node is introduced with the following 2 settings:

#### token_normalization:
Determines how token weights are normalized. Currently supports the following options:
- **none**: does not alter the weights.
- **mean**: shifts weights such that the mean of all meaningful tokens becomes 1.
- **length**: divides the token weight of long words or embeddings between all the tokens. It does so in a manner that the magnitude of the weight change remains constant between different lengths of tokens. E.g. if a word is expressed as 3 tokens and it has a weight of 1.5, all tokens get a weight of around 1.29, because sqrt(3 * pow(0.29, 2)) = 0.5.
- **length+mean**: divides the token weight of long words, and then shifts the mean to 1.

#### weight_interpretation:
Determines how up/down weighting should be handled. Currently supports the following options:
- **comfy**: the default in ComfyUI, CLIP vectors are lerped between the prompt and a completely empty prompt.
- **A1111**: CLIP vectors are scaled by their weight.
- **compel**: interprets weights similar to [compel](https://github.com/damian0815/compel). Compel up-weights the same as comfy, but mixes masked embeddings to accomplish down-weighting (more on this later).
- **comfy++**: when up-weighting, each word is lerped between the prompt and a prompt where the word is masked off. Additionally uses compel style down-weighting.
- **down_weight**: rescales weights such that the maximum weight is one. This means that you will only ever be down-weighting. Uses compel style down-weighting.
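The arithmetic behind **length** normalization and the difference between the **A1111** and **comfy** interpretations can be sketched in a few lines of plain Python. This is an illustration derived from the descriptions above, not the code of the node itself: the function names and the toy two-dimensional "embeddings" are made up for the example.

```python
import math

def length_normalize(weight, n_tokens):
    """Spread a word-level weight over n_tokens tokens so that the combined
    magnitude of the per-token changes still equals |weight - 1|."""
    delta = (weight - 1.0) / math.sqrt(n_tokens)
    return [1.0 + delta] * n_tokens

def a1111_weight(token_emb, weight):
    """A1111 style: scale the token embedding by its weight."""
    return [weight * x for x in token_emb]

def comfy_weight(token_emb, empty_emb, weight):
    """comfy style: travel along the line that starts at the empty-prompt
    embedding and passes through the prompt embedding at weight 1.0."""
    return [e + weight * (x - e) for x, e in zip(token_emb, empty_emb)]

# A word made of 3 tokens with weight 1.5.
per_token = length_normalize(1.5, 3)
print([round(w, 2) for w in per_token])  # [1.29, 1.29, 1.29]

# Toy 2-d vectors standing in for CLIP embeddings (hypothetical values).
token, empty = [1.0, 2.0], [0.0, 1.0]
print(a1111_weight(token, 1.5))
print(comfy_weight(token, empty, 1.5))
```

Both interpretations leave the embedding unchanged at weight 1.0; they differ only in which point the line they travel along starts from.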
<details>
<summary>
Intuition behind weight interpretation methods
</summary>

### up-weighting

The diagram below visualizes the 3 different ways in which the 3 methods transform the CLIP embeddings to achieve up-weighting.

![visual explanation of attention methods](https://github.com/BlenderNeko/ComfyUI_ADV_CLIP_emb/blob/master/visual.png)

As can be seen, in A1111 we use weights to travel on the line between the zero vector and the vector corresponding to the token embedding. This can be seen as adjusting the magnitude of the embedding, which both makes our final embedding point more in the direction of the thing we are up-weighting (or away from it when down-weighting) and creates stronger activations out of SD because of the bigger numbers.

Comfy also creates a direction starting from a single point, but instead uses the vector embedding corresponding to a completely empty prompt. We are now traveling on a line that approximates the epitome of a certain thing. Despite the magnitude of the vector not growing as fast as in A1111, this is actually quite effective and can result in SD quite aggressively chasing concepts that are up-weighted.

Comfy++ does not start from a single point but instead travels between the presence and absence of a concept in the prompt. Despite the idea being similar to that of comfy, it is a lot less aggressive.

#### visual comparison of the different methods

Below is a short clip of the prompt `cinematic wide shot of the ocean, beach, (palmtrees:1.0), at sunset, milkyway`, where the weight of `palmtrees` slowly increases from 1.0 to 2.0 in 20 steps (made using [silicon29](https://huggingface.co/Xynon/SD-Silicon) in SD 1.5).

https://user-images.githubusercontent.com/126974546/232336840-e9076b7c-3799-4335-baaa-992a6b8cad8a.mp4

### down-weighting

One of the issues with using the above methods for down-weighting is that the embedding vectors associated with a token do not just contain "information" about that token, but actually pull in a lot of context about the entire prompt. Most of the information they contain seemingly is about that specific token, which is why these various up-weighting interpretations work, but that given token permeates throughout the entire CLIP embedding. In the example prompt above we can down-weight `palmtrees` all the way to .1 in comfy or A1111, but because the presence of the tokens that represent palmtrees affects the entire embedding, we still get to see a lot of palmtrees in our outputs.

Suppose we have the prompt `(pears:.2) and (apples:.5) in a bowl`. Compel does the following to accomplish down-weighting: it creates the embeddings
- `A` = `pears and apples in a bowl`,
- `B` = `_ and apples in a bowl`,
- `C` = `_ and _ in a bowl`,

which it then mixes into a final embedding `0.2 * A + 0.3 * B + 0.5 * C`. This way we truly only have 0.2 of the influence of pears in our entire embedding, and 0.5 of apples.

</details>

---

### Mix Clip Embeddings node (Deprecated)

The functionality of this node can now be found in the core ComfyUI nodes.

---

## SDXL support

To support SDXL the following settings and nodes are provided. Note that the CLIP Text Encode (Advanced) node also works just fine for SDXL.

---

### BNK_CLIPTextEncodeSDXLAdvanced

The CLIP Text Encode SDXL (Advanced) node provides the same settings as its non-SDXL version. In addition, it comes with 2 text fields to send different texts to the two CLIP models, and with the following setting:
- **balance**: tradeoff between the CLIP and openCLIP models. At 0.0 the embedding only contains the CLIP model output and the contribution of the openCLIP model is zeroed out. At 1.0 the embedding only contains the openCLIP model output and the contribution of the CLIP model is zeroed out. This node mainly exists for experimentation.

---

### BNK_AddCLIPSDXLParams

The Add CLIP SDXL Params node adds the following SDXL parameters to a conditioning:
- **width**: width of the image crop.
- **height**: height of the image crop.
- **crop_w**: left pixel of the crop.
- **crop_h**: top pixel of the crop.
- **target_width**: width of the original image.
- **target_height**: height of the original image.

---

### BNK_AddCLIPSDXLRParams

The Add CLIP SDXL Refiner Params node adds the following refiner parameters to a conditioning:
- **width**: width of the image.
- **height**: height of the image.
- **ascore**: aesthetic score of the image.
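The mixing coefficients in the compel style down-weighting example (`0.2 * A + 0.3 * B + 0.5 * C`) follow a simple pattern: sort the down-weighted words by weight, mask them off one at a time starting with the lowest weight, and give each progressively more masked prompt the gap between consecutive weights. The sketch below generalizes that pattern from the single example in the README; it is an assumption about how the scheme extends to more words, and `downweight_mix` is a hypothetical helper rather than a function from compel or from this repo.

```python
def downweight_mix(weights):
    """weights: dict mapping a word to its weight in (0, 1).

    Returns a list of (masked_words, coefficient) pairs. Each pair stands for
    one prompt variant in which `masked_words` are masked off; the
    coefficients sum to 1.
    """
    ordered = sorted(weights.items(), key=lambda item: item[1])
    mix = []
    masked = []
    previous = 0.0
    for word, weight in ordered:
        # The variant with the currently masked words gets the weight gap.
        mix.append((tuple(masked), weight - previous))
        masked.append(word)
        previous = weight
    # The fully masked variant receives whatever is left.
    mix.append((tuple(masked), 1.0 - previous))
    return mix

for masked_words, coefficient in downweight_mix({"pears": 0.2, "apples": 0.5}):
    print(masked_words, round(coefficient, 2))
```

For the prompt `(pears:.2) and (apples:.5) in a bowl` this yields the three variants `A` (nothing masked), `B` (pears masked) and `C` (pears and apples masked) with coefficients 0.2, 0.3 and 0.5.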

AI & Machine Learning LLM Tools & Chat UIs

438 Github Stars

Open Source

ComfyUI_TiledKSampler

# Tiled sampling for ComfyUI

![panorama of the ocean, sailboats and large moody clouds](https://github.com/BlenderNeko/ComfyUI_TiledKSampler/blob/master/examples/ComfyUI_02010_.png)

This repo contains a tiled sampler for [ComfyUI](https://github.com/comfyanonymous/ComfyUI). It allows for denoising larger images by splitting them up into smaller tiles and denoising these. It tries to keep seams from showing up in the end result by gradually denoising all tiles one step at a time and randomizing tile positions for every step.

### settings

The tiled sampler comes with some additional settings to further control its behavior:
- **tile_width**: the width of the tiles.
- **tile_height**: the height of the tiles.
- **tiling_strategy**: how to do the tiling.

## Tiling strategies

### random:

The random tiling strategy aims to reduce the presence of seams as much as possible by slowly denoising the entire image step by step, randomizing the tile positions for each step. It does this by alternating between horizontal and vertical brick patterns, randomly offsetting the pattern each time. As the number of steps grows to infinity the strength of seams shrinks to zero. Although this random offset eliminates seams, it comes at the cost of additional overhead per step and makes this strategy incompatible with uni samplers.

<details>
<summary>
visual explanation
</summary>

![gif showing of the random brick tiling](https://github.com/BlenderNeko/ComfyUI_TiledKSampler/blob/master/examples/tiled_random.gif)

</details>

<details>
<summary>
example seamless image
</summary>

This tiling strategy is exceptionally good at hiding seams, even when starting off from complete noise: repetitions are visible but seams are not.

![gif showing of the random brick tiling](https://github.com/BlenderNeko/ComfyUI_TiledKSampler/blob/master/examples/ComfyUI_02006_.png)

</details>

### random strict:

One downside of random is that it can unfavorably crop border tiles. Random strict uses masking to ensure no border tiles have to be cropped. This tiling strategy does not play nice with the SDE sampler.

### padded:

The padded tiling strategy tries to reduce seams by giving each tile more context of its surroundings through padding. It does this by further dividing each tile into 4 smaller tiles, which are denoised in such a way that a tile is always surrounded by static context during denoising. This strategy is more prone to seams, but because the location of the tiles is static, it is compatible with uni samplers and has no overhead between steps. However, the padding makes it so that up to 4 times as many tiles have to be denoised.

<details>
<summary>
visual explanation
</summary>

![gif showing of padded tiling](https://github.com/BlenderNeko/ComfyUI_TiledKSampler/blob/master/examples/tiled_padding.gif)

</details>

### simple

The simple tiling strategy divides the image into a static grid of tiles and denoises these one by one.

### roadmap:

- [x] latent masks
- [x] image wide control nets
- [x] T2I adapters
- [ ] tile wide control nets and T2I adapters (e.g. style models)
- [x] area conditioning
- [x] area mask conditioning
- [x] GLIGEN
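To make the random strategy described above more concrete, here is a minimal sketch of how a randomly offset brick pattern could be generated for each step. It is not the sampler's actual implementation: `brick_tiles` and `tiles_for_step` are hypothetical helpers, coordinates are plain integer pixels, and border tiles are simply cropped to the image (the behavior that random strict is described as avoiding).

```python
import random

def brick_tiles(width, height, tile_w, tile_h, off_x, off_y, horizontal=True):
    """Return (x0, y0, x1, y1) rectangles of a brick pattern, cropped to the image."""
    if not horizontal:
        # A vertical brick pattern is the horizontal one with the axes swapped.
        swapped = brick_tiles(height, width, tile_h, tile_w, off_y, off_x, True)
        return [(y0, x0, y1, x1) for (x0, y0, x1, y1) in swapped]
    tiles = []
    row = 0
    y = -off_y
    while y < height:
        # Every other row is shifted by half a tile to form the brick layout.
        x = -off_x - (tile_w // 2 if row % 2 else 0)
        while x < width:
            x0, y0 = max(x, 0), max(y, 0)
            x1, y1 = min(x + tile_w, width), min(y + tile_h, height)
            if x1 > x0 and y1 > y0:
                tiles.append((x0, y0, x1, y1))
            x += tile_w
        y += tile_h
        row += 1
    return tiles

def tiles_for_step(step, width, height, tile_w, tile_h, rng):
    """Alternate between horizontal and vertical bricks with a fresh random offset."""
    off_x = rng.randrange(tile_w)
    off_y = rng.randrange(tile_h)
    return brick_tiles(width, height, tile_w, tile_h, off_x, off_y,
                       horizontal=(step % 2 == 0))

rng = random.Random(0)
for step in range(4):
    tiles = tiles_for_step(step, 128, 96, 64, 64, rng)
    print(step, len(tiles))
```

Because the tile borders land in different places on every step, no single seam location is reinforced across the whole denoising run, which matches the intuition given for why seams fade as the number of steps grows.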

AI & Machine Learning LLM Tools & Chat UIs

411 Github Stars

Open Source

ComfyUI_Cutoff

# Cutoff for ComfyUI

![screenshot of workflow](https://github.com/BlenderNeko/ComfyUI_Cutoff/blob/master/examples/screenshot.png)

### what is cutoff?

[cutoff](https://github.com/hnmr293/sd-webui-cutoff) is a script/extension for the Automatic1111 webui that lets users limit the effect certain attributes have on specified subsets of the prompt. E.g. when the prompt is `a cute girl, white shirt with green tie, red shoes, blue hair, yellow eyes, pink skirt`, cutoff lets you specify that the word blue belongs to the hair and not the shoes, and green to the tie and not the skirt, etc. This is an implementation of cutoff in the form of 4 nodes that can be used in [ComfyUI](https://github.com/comfyanonymous/ComfyUI).

### how does this work?

When you provide stable diffusion with some text, that text gets tokenized and CLIP creates a vector (embedding) for each token in the text. So if we have a prompt containing "blue hair, yellow eyes", some of the vectors coming out of CLIP will correspond to the "blue hair" part, and some to the "yellow eyes" part. When CLIP does this it tries to take the context of the entire sentence into consideration. Unfortunately CLIP isn't always great at figuring out that the "blue" in "blue hair" should really only modify the noun "hair" and not the noun "eyes" a bit further in the sentence.

So how do we deal with this? We can mask out the tokens corresponding to "blue" and ask CLIP to create another embedding. In this new embedding we have a set of vectors corresponding to "yellow eyes" that are not affected by "blue", because "blue" wasn't part of the tokens. If we then take the difference between our original vectors and these new vectors, we have a direction we can travel in for the eyes to become more affected by "yellow" and less by "blue". If we do this for all the color relations in the text, we can travel to an embedding where each of these relations is more isolated. Of course this effect isn't limited to just colors.
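The "direction we can travel in" can be written down as a one-line interpolation. The sketch below uses toy two-dimensional vectors in place of real CLIP token embeddings and only models a single region; how the nodes combine several regions is not covered, and `isolate_region` is a hypothetical name used for illustration.

```python
def isolate_region(original, masked, weight):
    """Travel from the original region vectors toward the vectors obtained
    with the target tokens masked out.

    weight 0.0 keeps the original vector, weight 1.0 lands on the masked one.
    """
    return [o + weight * (m - o) for o, m in zip(original, masked)]

# Toy stand-ins for the "yellow eyes" vector, encoded with and without "blue".
eyes_original = [1.0, 0.5]   # prompt contains "blue hair, yellow eyes"
eyes_masked = [0.0, 1.5]     # same prompt with "blue" replaced by the mask token

print(isolate_region(eyes_original, eyes_masked, 0.0))
print(isolate_region(eyes_original, eyes_masked, 0.5))
print(isolate_region(eyes_original, eyes_masked, 1.0))
```

The `masked - original` term is the difference vector mentioned above; the weight decides how far along it the region is moved.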
### ComfyUI nodes

To achieve all of this, the following 4 nodes are introduced:

**Cutoff BasePrompt:** this node takes the full original prompt.

**Cutoff Set Region:** this node sets a "region" of influence for specific target words, and comes with the following inputs:
- region\_text: defines the set of tokens that the target words should affect. This should be a part of the original prompt. It is possible to define multiple regions in a single CLIPSetRegion node by stating every region on a new line.
- target\_text: defines the set of tokens that will be masked off (i.e. the tokens we wish to limit to the region). This is a space separated list of words. If you want to match a sequence of words, use underscores instead of spaces, e.g. "a\_series\_of\_connected\_tokens". If you want to match a word that actually contains underscores, escape the underscores, e.g. "the\\_target\\_tokens". You can target textual inversion embeddings using the default syntax, but do note that any underscores in the name of the embedding have to be escaped in this input field.
- weight: how far to travel in the direction of the isolated vector.

**Cutoff Regions To Conditioning:** this node converts the base prompt and regions into an actual conditioning to be used in the rest of ComfyUI, and comes with the following inputs:
- mask\_token: the token to be used for masking. If left blank it will default to the `<endoftext>` token. If the string converts to multiple tokens it will give a warning in the console and only use the first token in the list.
- strict\_mask: when 0.0 the specified target tokens will not affect the other specified areas but do affect anything outside of those areas. When set to 1.0 the specified target tokens will only affect their own region.
- start\_from\_masked: when 0.0 the starting point to travel from is the original prompt. When set to 1.0 the starting point to travel from is the completely masked off prompt. Note that specifically when all region weights are 1.0 there is no difference between the two.

**Cutoff Regions To Conditioning (ADV):** provides the same functionality as the above node but also provides options on how to interpret prompt weighting. More on these settings can be found [here](https://github.com/BlenderNeko/ComfyUI_ADV_CLIP_emb).

You can find these nodes under `conditioning>cutoff`.

### SDXL

The nodes won't throw any errors when used with SDXL, but at least for 0.9 I didn't find them to work that well.

Finally, here are some example images that you can load into ComfyUI:

![first example generation of a cute girl, white shirt with green tie, red shoes, blue hair, yellow eyes, pink skirt using cutoff](https://github.com/BlenderNeko/ComfyUI_Cutoff/blob/master/examples/ComfyUI_00671_.png)

![first example generation of a cute girl, white shirt with green tie, red shoes, blue hair, yellow eyes, pink skirt using cutoff](https://github.com/BlenderNeko/ComfyUI_Cutoff/blob/master/examples/ComfyUI_00672_.png)

![first example generation of a cute girl, white shirt with green tie, red shoes, blue hair, yellow eyes, pink skirt using cutoff](https://github.com/BlenderNeko/ComfyUI_Cutoff/blob/master/examples/ComfyUI_00673_.png)

![first example generation of a cute girl, white shirt with green tie, red shoes, blue hair, yellow eyes, pink skirt using cutoff](https://github.com/BlenderNeko/ComfyUI_Cutoff/blob/master/examples/ComfyUI_00674_.png)
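The underscore rules of the target\_text input (spaces separate targets, an underscore joins words into one sequence, an escaped underscore stays literal) can be illustrated with a small parser. This is a sketch of the rules as described for that input, not the parsing code used by the nodes, and `parse_targets` is a hypothetical helper.

```python
import re

def parse_targets(target_text):
    """Split a target_text string into the phrases it refers to."""
    targets = []
    for word in target_text.split():
        # Unescaped underscores connect the words of a multi-word target.
        phrase = re.sub(r"(?<!\\)_", " ", word)
        # Escaped underscores are kept as literal underscores.
        targets.append(phrase.replace("\\_", "_"))
    return targets

print(parse_targets(r"blue a_series_of_connected_tokens the\_target\_tokens"))
# ['blue', 'a series of connected tokens', 'the_target_tokens']
```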

AI Tools ML Frameworks

395 Github Stars

Open Source

ComfyUI_Noise

# ComfyUI Noise

This repo contains 6 nodes for [ComfyUI](https://github.com/comfyanonymous/ComfyUI) that allow for more control and flexibility over the noise. This allows e.g. for workflows with small variations to generations, or for finding the accompanying noise to some input image and prompt.

## Nodes

### Noisy Latent Image:

This node lets you generate noise. You can find this node under `latent>noise` and it has the following settings:
- **source**: where to generate the noise, currently supports GPU and CPU.
- **seed**: the noise seed.
- **width**: image width.
- **height**: image height.
- **batch_size**: batch size.

### Duplicate Batch Index:

The functionality of this node has been moved to core, please use `Latent>Batch>Repeat Latent Batch` and `Latent>Batch>Latent From Batch` instead.

This node lets you duplicate a certain sample in the batch. This can be used to duplicate e.g. encoded images, but also noise generated from the node listed above. You can find this node under `latent` and it has the following settings:
- **latents**: the latents.
- **batch_index**: which sample in the latents to duplicate.
- **batch_size**: the new batch size (i.e. how many times to duplicate the sample).

### Slerp Latents:

This node lets you mix two latents together. Both of the input latents must share the same dimensions or the node will ignore the mix factor and instead output the top slot. When it comes to other things attached to the latents, such as masks, only those of the top slot are passed on. You can find this node under `latent` and it comes with the following inputs:
- **latents1**: first batch of latents.
- **latents2**: second batch of latents. This input is optional.
- **mask**: determines where in the latents to slerp. This input is optional.
- **factor**: how much of the second batch of latents should be slerped into the first.

### Get Sigma:

This node can be used to calculate the amount of noise a sampler expects when it starts denoising. You can find this node under `latent>noise` and it comes with the following inputs and settings:
- **model**: the model for which to calculate the sigma.
- **sampler_name**: the name of the sampler for which to calculate the sigma.
- **scheduler**: the type of schedule used in the sampler.
- **steps**: the total number of steps in the schedule.
- **start_at_step**: the start step of the sampler, i.e. how much noise it expects in the input image.
- **end_at_step**: the current end step of the previous sampler, i.e. how much noise already is in the image.

Most of the time you'd simply want to keep `start_at_step` at zero and `end_at_step` at `steps`. But if you want to re-inject some noise in between two samplers, e.g. one sampler that denoises from 0 to 15 and a second that denoises from 10 to 20, you'd want to use a `start_at_step` of 10 and an `end_at_step` of 15, so that the image we get, which is at step 15, can be noised back down to step 10, and the second sampler can bring it to 20. Take note that the Advanced KSampler has settings for `add_noise` and `return_with_leftover_noise`, both of which we want to have disabled when working with these nodes.

### Inject Noise:

This node lets you actually inject the noise into an image latent. You can find this node under `latent>noise` and it comes with the following inputs:
- **latents**: the latents to inject the noise into.
- **noise**: the noise. This input is optional.
- **mask**: determines where to inject noise. This input is optional.
- **strength**: the strength of the noise. Note that we can use the node above to calculate an appropriate strength value for us.

### Unsampler:

This node does the reverse of a sampler. It calculates the noise that would generate the image given the model and the prompt. You can find this node under `sampling` and it takes the following inputs and settings:
- **model**: the model to target.
- **steps**: number of steps to noise.
- **end_step**: what step to travel back to.
- **cfg**: classifier free guidance scale.
- **sampler_name**: the name of the sampling technique to use.
- **scheduler**: the type of schedule to use.
- **normalize**: whether to normalize the noise before output. Useful when passing it on to an Inject Noise node, which expects normalized noise.
- **positive**: positive prompt.
- **negative**: negative prompt.
- **latent_image**: the image to renoise.

When trying to reconstruct the target image as faithfully as possible, this works best if both the unsampler and the sampler use a cfg scale close to 1.0 and a similar number of steps. But it is fun and worth it to play around with these settings to get a better intuition of the results. This node lets you do things similar to what the A1111 [img2img alternative](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#img2img-alternative-test) script does.

## Examples

Here are some examples that show how to use the nodes above. Workflows for these examples can be found in the `example_workflow` folder.

<details>
<summary>
generating variations
</summary>

![screenshot of a workflow that demos generating small variations to a given seed](https://github.com/BlenderNeko/ComfyUI_noise/blob/master/examples/example_variation.png)

To create small variations to a given generation we can do the following: we generate the noise of the seed that we're interested in using a `Noisy Latent Image` node, and then create an entire batch of these with a `Duplicate Batch Index` node. Note that if we were doing this for img2img we could use this same node to duplicate the image latents. Next we generate some more noise, but this time we generate a batch of noise rather than a single sample. We then slerp this newly created noise into the other one with a `Slerp Latents` node. To figure out the required strength for injecting this noise we use a `Get Sigma` node. And finally we inject the slerped noise into a batch of empty latents with an `Inject Noise` node. Take note that we use an advanced KSampler with the `add_noise` setting disabled.

</details>

<details>
<summary>
"unsampling"
</summary>

![screenshot of a workflow that demos generating small variations to a given seed](https://github.com/BlenderNeko/ComfyUI_noise/blob/master/examples/example_unsample.png)

To get the noise that recreates a certain image, we first load an image. Then we use the `Unsampler` node with a low cfg value. To check if this is working we then take the resulting noise and feed it back into an advanced KSampler with the `add_noise` setting disabled, and a cfg of 1.0.

</details>
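The two operations at the heart of the variations workflow, slerping one noise sample into another and then injecting the result at a given strength, can be sketched on flat lists of numbers. This is a generic spherical interpolation followed by a scaled addition, written under the assumption that latents can be treated as plain vectors; it is not the code of the `Slerp Latents` or `Inject Noise` nodes, and masks are left out.

```python
import math

def slerp(a, b, factor):
    """Spherically interpolate from vector a (factor 0) to vector b (factor 1)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    cosine = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    omega = math.acos(max(-1.0, min(1.0, cosine)))
    sin_omega = math.sin(omega)
    if abs(sin_omega) < 1e-6:
        # (Anti)parallel vectors: fall back to plain linear interpolation.
        return [(1.0 - factor) * x + factor * y for x, y in zip(a, b)]
    weight_a = math.sin((1.0 - factor) * omega) / sin_omega
    weight_b = math.sin(factor * omega) / sin_omega
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]

def inject_noise(latents, noise, strength):
    """Add noise scaled by strength (e.g. a sigma value) to the latents."""
    return [l + strength * n for l, n in zip(latents, noise)]

base_noise = [1.0, 0.0]        # toy stand-in for the noise of the chosen seed
variation_noise = [0.0, 1.0]   # toy stand-in for freshly generated noise
mixed = slerp(base_noise, variation_noise, 0.1)
print(mixed, math.sqrt(sum(x * x for x in mixed)))
print(inject_noise([0.0, 0.0], mixed, 2.0))
```

Unlike a linear blend, the slerped result keeps the magnitude of the inputs when both have the same norm, which is why it is a natural way to mix noise samples before handing them to a sampler.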

AI Agents LLM Tools & Chat UIs

321 Github Stars
