
FLUX img2img

Image-to-image transformation


Our last tests with Flux used only prompts generated from a rendering or photograph of an object. What if we want to use the lines and edges of a sketch directly for a transformation? In this exercise we will use the flux1.dev model together with the Flux ControlNet Canny model and test some results. Remember, these models are evolving, so results can vary widely.


Our test prompt today:

A morning photograph of a modern residential villa by Lake Washington, blending seamlessly into the natural landscape. The villa features large glass walls, charred wood upper surfaces, and warmer wood tones at the base, creating a striking contrast. Shot with a Canon EOS 5D Mark IV and 16-35mm lens, the image highlights the villa's transparency and connection to the serene surroundings, capturing the harmony between modern architecture and nature.


Check out FLUX Prompt Generator!


Our goal today is to explore how the ControlNet parameters—specifically, the strength percentage and the point at which it stops influencing the image—affect the final output. We will use 30-50 denoising steps to maintain stable image quality, and we will set the classifier-free guidance (CFG) scale to 3. Higher CFG values can lead to instability in the generated output, so we aim to keep it balanced.
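Since we are sweeping two ControlNet parameters against each other, it helps to lay out the test matrix up front. A minimal sketch of that sweep, with hypothetical parameter values chosen for illustration (the article only fixes strength at 80% later on):

```python
from itertools import product

# Hypothetical sweep of the two ControlNet parameters under test:
# strength (how strongly the Canny edges constrain the image) and
# end_percent (the point in the denoising schedule where ControlNet stops).
strengths = [0.4, 0.6, 0.8, 1.0]
end_percents = [0.5, 0.75, 1.0]

grid = list(product(strengths, end_percents))
for strength, end_percent in grid:
    print(f"strength={strength:.2f}, end_percent={end_percent:.2f}")

# 4 strengths x 3 end points = 12 renders to compare side by side.
```

Rendering the same prompt and seed across every combination makes it easy to see where edge adherence gives way to prompt-driven styling.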


The workflow:

The workflow begins by loading the main diffusion model and configuring dual CLIP models through the Load Diffusion Model and DualCLIPLoader nodes. These models guide the generation process, with the CLIP Text Encode (Prompt) nodes defining both negative and positive prompts. The negative prompt specifies undesirable attributes (like "cartoon" or "low quality"), while the positive prompt provides a detailed description of the desired output, such as a modern residential villa. These text encodings are processed through the FluxGuidance node, which applies a guidance scale of 3.0 to influence the model's outputs based on the textual prompts.
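For reference, ComfyUI saves workflows in an API-format JSON where each node is a `class_type` plus an `inputs` mapping. The conditioning stage described above can be sketched as follows; the node IDs and wiring here are illustrative assumptions, not taken from the actual workflow file:

```python
# Illustrative ComfyUI API-format fragment for the conditioning stage.
# Node IDs ("5", "6", "7") and the exact connections are assumptions;
# the class_type names match ComfyUI's built-in nodes.
conditioning_nodes = {
    "5": {
        "class_type": "CLIPTextEncode",
        "inputs": {
            "text": "A morning photograph of a modern residential villa ...",
            "clip": ["dual_clip_loader", 0],
        },
    },
    "6": {
        "class_type": "CLIPTextEncode",
        "inputs": {
            "text": "cartoon, low quality",   # negative prompt attributes
            "clip": ["dual_clip_loader", 0],
        },
    },
    "7": {
        "class_type": "FluxGuidance",
        "inputs": {
            "guidance": 3.0,            # guidance scale from the workflow
            "conditioning": ["5", 0],   # positive conditioning feeds the guidance node
        },
    },
}
```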


The Canny Edge node processes the input image to detect edges, providing a structural foundation for the model to follow. The edge-detected output is then fed into the Apply ControlNet (Advanced) node, which uses the loaded ControlNet model to enforce structural consistency based on the edges while allowing stylistic adjustments according to the prompts. Parameters like strength (set to 80%), start_percent (0%), and end_percent (100%) control the extent and duration of ControlNet's influence over the output image. Following this, the EmptySD3LatentImage node initializes the latent space with a resolution of 816x512 pixels and a batch size of 6 to set the stage for image generation.
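The Canny node itself is a multi-stage detector (blur, gradients, non-maximum suppression, hysteresis thresholding). As a rough stand-in for its core gradient stage, not the node's actual implementation, a Sobel gradient magnitude can be sketched in plain Python:

```python
# Minimal gradient-magnitude sketch (the first stage of a Canny-style
# detector). Input is a tiny grayscale image as a list of lists, 0-255.
def sobel_magnitude(img):
    h, w = len(img), len(img[0])
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal gradient kernel
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical gradient kernel
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(kx[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(ky[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out

# A vertical black/white boundary produces strong responses along the edge
# and zero response in the flat regions on either side.
img = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
edges = sobel_magnitude(img)
```

The map of strong responses is what gives ControlNet its structural scaffold: the diffusion model is free to restyle surfaces, but the edges anchor the geometry.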


The latent image is processed through the KSampler node, configured with 50 denoising steps, a CFG scale of 3.0, and the "euler" sampler, to refine the generated image while adhering to the input guidance. The output from the KSampler is decoded from its latent representation into a visible image using the VAE Decode node, which utilizes the pre-loaded VAE (Variational Autoencoder) model. The final image is then displayed and saved using the Save Image node.
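Conceptually, the "euler" sampler refines the latent one step at a time by following the model's denoising prediction down a decreasing noise schedule. A toy version of that update rule, in the k-diffusion style but with a stand-in denoiser rather than the real KSampler internals, looks like this:

```python
# Toy Euler denoising loop: at each step the model predicts the clean
# latent, the derivative toward it is taken, and the latent moves to the
# next (lower) noise level. "denoise" is a stand-in for the diffusion model.
def euler_sample(x, sigmas, denoise):
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        denoised = denoise(x, sigma)          # model's estimate of the clean latent
        d = (x - denoised) / sigma            # derivative (direction of the noise)
        x = x + d * (sigma_next - sigma)      # Euler step to the next noise level
    return x

# With an idealized denoiser that always predicts 0, the latent is driven
# to 0 as the noise schedule reaches its end.
sigmas = [1.0, 0.5, 0.25, 0.0]                # simple decreasing schedule
result = euler_sample(10.0, sigmas, lambda x, s: 0.0)
```

More steps (the 50 used here) means smaller moves per step, which is why step count trades compute for stability.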

The Output:
