Fix qwen image prompt padding #12075 #12643
Conversation
… of dynamic batch-max
…ax_length parameter
Can I get some initial reviews on this, @sayakpaul?
@yiyixuxu, could you please give me an initial review on this PR? I'll start making changes accordingly.
Hi @yiyixuxu @sayakpaul,
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
not stale
Take note of #12702
…ng txt_seq_lens parameter
Cool @sayakpaul. I pulled from main, updated this to the latest version, and accounted for all the changes from #12702. Would really appreciate a review here now.
Could you walk us through the main changes after you pulled in the changes introduced in #12702?
Yes, sure. After integrating PR #12702, which removed the redundant `txt_seq_lens` parameter and shifted to attention-mask-based inference for cleaner code, my changes were:

- I preserved my fixed padding approach from PR #12643 by retaining `tokenizer_max_length` (default 1024) in encoding functions like `get_qwen_prompt_embeds_edit` in encoders.py. This ensures deterministic outputs by always padding prompts to a consistent length, preventing RoPE position variations across batch sizes for the same prompt and seed.
- I eliminated the `txt_seq_lens` variables and parameters entirely from all pipeline files, such as pipeline_qwenimage.py and pipeline_qwenimage_controlnet.py, allowing the transformer to infer sequence lengths directly from the generated attention masks. This simplifies maintenance and reduces code redundancy without conflicting with my padding fix.
- I adopted that PR's prompt template constants by importing them from prompt_templates.py into the encoding functions, improving code organization and modularity while keeping my fixed padding intact.
- I added config specs for `tokenizer_max_length` to classes like `QwenImageEditTextEncoderStep` in encoders.py, enabling proper propagation of the fixed length value through modular pipelines.
- I performed pipeline-specific cleanups by removing `txt_seq_lens` from transformer and ControlNet calls in specialized files like pipeline_qwenimage_edit.py and pipeline_qwenimage_inpaint.py; special logic like summing attention masks is now handled internally by the transformer, ensuring compatibility and maintaining overall functionality, as verified by passing tests like `test_prompt_embeds_padding`.

Do let me know if I missed anything here and I'll incorporate it as well. Otherwise, please review the code and I'll resolve anything you point out.
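For context on the attention-mask-based inference mentioned above, here is a minimal sketch (tensor values are illustrative, not the actual transformer code) of how per-sample text sequence lengths can be recovered from a padding mask, making a separate `txt_seq_lens` argument redundant:

```python
import torch

# Illustrative padding mask for a batch of 3 prompts padded to length 8:
# 1 marks a real token, 0 marks padding.
attention_mask = torch.tensor([
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
])

# The per-sample sequence length is just the row-wise sum of the mask.
txt_seq_lens = attention_mask.sum(dim=-1)
print(txt_seq_lens)  # tensor([4, 7, 2])
```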
Fix QwenImage Prompt Embedding Padding for Deterministic Outputs
What was the issue?
Issue #12075 reported that QwenImage pipelines were producing non-deterministic outputs when using the same prompt across different batch sizes. The same text prompt would generate different images depending on whether it was batched alone or with other prompts of varying lengths.
This inconsistency violated a fundamental expectation: identical prompts with the same seed should always produce identical outputs, regardless of batch composition.
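A minimal sketch of the kind of reproduction this implies, assuming the public `QwenImagePipeline` API and the `Qwen/Qwen-Image` checkpoint (exact call arguments may differ from your setup):

```python
import torch
from diffusers import QwenImagePipeline

pipe = QwenImagePipeline.from_pretrained(
    "Qwen/Qwen-Image", torch_dtype=torch.bfloat16
).to("cuda")

prompt = "a red fox in the snow"
seed = 0

# Same prompt and seed, generated alone...
alone = pipe(prompt=prompt, generator=torch.Generator("cuda").manual_seed(seed)).images[0]

# ...and batched together with a much longer prompt.
batched = pipe(
    prompt=[prompt, "a very long and detailed prompt " * 20],
    generator=[
        torch.Generator("cuda").manual_seed(seed),
        torch.Generator("cuda").manual_seed(1),
    ],
).images[0]

# Expectation: `alone` and `batched` should be identical; before the fix
# they could differ because padding depended on the batch composition.
```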
How I identified the problem
After reviewing the issue report and examining the QwenImage pipeline implementation, I discovered the root cause in the prompt embedding padding logic.
The pipelines were dynamically padding prompt embeddings to the maximum sequence length within each batch, rather than using a fixed padding length. This meant:

- The same prompt could be padded to a different length depending on the other prompts in its batch.
- Because the RoPE positions applied to the text tokens depend on the padded sequence, the same prompt and seed could produce different embeddings, and therefore different images, across batch compositions.
The problem existed across all 8 QwenImage pipeline variants (main, img2img, inpaint, edit, edit_inpaint, edit_plus, controlnet, controlnet_inpaint) and the modular encoder functions.
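To illustrate the root cause, a toy sketch contrasting dynamic batch-max padding with fixed-length padding (simplified: the real pipelines pad post-encoder embeddings, not raw token ids):

```python
import torch

def pad_dynamic(token_lists):
    # Old behavior: pad to the longest prompt *in this batch*, so the same
    # prompt ends up at different lengths in different batches.
    max_len = max(len(t) for t in token_lists)
    return torch.tensor([t + [0] * (max_len - len(t)) for t in token_lists])

def pad_fixed(token_lists, max_sequence_length=512):
    # New behavior: always pad to the same fixed length, so positions for
    # a given prompt never depend on its batch neighbors.
    return torch.tensor([t + [0] * (max_sequence_length - len(t)) for t in token_lists])

prompt = [5, 6, 7]  # toy token ids for one prompt
print(pad_dynamic([prompt]).shape)             # torch.Size([1, 3])
print(pad_dynamic([prompt, [1] * 9]).shape)    # torch.Size([2, 9])  -- length changed!
print(pad_fixed([prompt], 16).shape)           # torch.Size([1, 16])
print(pad_fixed([prompt, [1] * 9], 16).shape)  # torch.Size([2, 16]) -- stable
```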
How I solved it
The solution involved ensuring all prompt embeddings are padded to a consistent, fixed length determined by the `max_sequence_length` parameter (default 512, configurable up to the model's 1024 token limit). I modified the padding logic in all affected locations to:

- use fixed `max_sequence_length` padding instead of dynamic batch-max padding,
- pass the `max_sequence_length` parameter to all internal prompt encoding methods,
- update `txt_seq_lens` to reflect the padded length instead of actual token counts, and
- keep the default `max_sequence_length=512` to preserve existing behavior for users.

The fix ensures that any prompt will always receive the same padding and RoPE positions, regardless of batch composition, making outputs deterministic and reproducible.
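A simplified sketch of the encoding-side change, under the assumption that the tokenizer call mirrors the usual Hugging Face pattern (function and variable names here are illustrative; the actual diffusers code also pads the post-encoder embeddings):

```python
import torch

def encode_prompt_fixed(tokenizer, text_encoder, prompts, max_sequence_length=512):
    # Tokenize with a *fixed* target length instead of padding="longest",
    # so padding no longer depends on the other prompts in the batch.
    inputs = tokenizer(
        prompts,
        padding="max_length",
        max_length=max_sequence_length,
        truncation=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        embeds = text_encoder(
            input_ids=inputs.input_ids,
            attention_mask=inputs.attention_mask,
        ).last_hidden_state
    # txt_seq_lens now reflects the padded length for every sample, keeping
    # RoPE positions identical across batch compositions.
    txt_seq_lens = [max_sequence_length] * len(prompts)
    return embeds, inputs.attention_mask, txt_seq_lens
```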
How the fix was tested
I created a comprehensive test, `test_prompt_embeds_padding()`, that verifies three critical behaviors of the fixed padding. Additionally, I ran the entire QwenImage test suite to ensure no regressions were introduced. All structural tests pass successfully, with only expected value-assertion changes (since fixing the padding changes the numerical outputs).
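A sketch in the spirit of `test_prompt_embeds_padding()` (the actual test lives in the QwenImage test suite; the `encode_prompt` helper and its return signature are hypothetical):

```python
import torch

def check_padding_determinism(encode_prompt, max_sequence_length=512):
    """encode_prompt is assumed to return (embeds, attention_mask)."""
    short = "a red fox in the snow"
    long = "a very long and detailed prompt " * 20

    embeds_alone, _ = encode_prompt([short], max_sequence_length)
    embeds_batched, _ = encode_prompt([short, long], max_sequence_length)

    # 1. Embeddings are always padded to the fixed length.
    assert embeds_alone.shape[1] == max_sequence_length
    # 2. The same prompt produces identical embeddings regardless of
    #    what else is in the batch.
    torch.testing.assert_close(embeds_alone[0], embeds_batched[0])
```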
Test Results
Fixes: #12075
cc: @sayakpaul @yiyixuxu