
Mismatched max_seqlen in _flash_attn_varlen_forward #32


@ZetangForward

Hi, thanks for your great work.

I noticed that in the MixedAttention function, the following code first computes the attention of each query (q) against the keys and values within its own chunk:

```python
from flash_attn.flash_attn_interface import _flash_attn_varlen_forward

# self attn
_, _, _, _, self_attn_out_sh, self_attn_lse_hs, _, _ = (
    _flash_attn_varlen_forward(
        q=q,
        k=k,
        v=v,
        cu_seqlens_q=self_attn_cu_seqlen,
        cu_seqlens_k=self_attn_cu_seqlen,
        max_seqlen_q=max_seqlen,
        max_seqlen_k=max_seqlen,
        softmax_scale=softmax_scale,
        causal=True,
        dropout_p=0.0,
    )
)
```
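
For context: with cu_seqlens_q == cu_seqlens_k and causal=True, the varlen interface attends to each segment delimited by self_attn_cu_seqlen independently, using an ordinary lower-triangular mask inside the segment. The pure-Python sketch below spells out which query/key pairs that allows. chunk_causal_mask is a hypothetical helper written for illustration; it is not part of flash-attn or this repository.

```python
def chunk_causal_mask(cu_seqlens):
    """Return allowed[i][j]: whether packed query token i may attend to key token j.

    Tokens are packed back to back and cu_seqlens holds cumulative segment
    offsets. Token i may attend to token j only if both lie in the same
    segment and j <= i (causal within the segment).
    """
    total = cu_seqlens[-1]
    # Map each packed token to the index of the segment that contains it.
    seg_id = [0] * total
    for s in range(len(cu_seqlens) - 1):
        for t in range(cu_seqlens[s], cu_seqlens[s + 1]):
            seg_id[t] = s
    return [[seg_id[i] == seg_id[j] and j <= i for j in range(total)]
            for i in range(total)]


# Hypothetical example: one 6-token sequence split into chunks of 4 and 2 tokens.
mask = chunk_causal_mask([0, 4, 6])
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
# x.....
# xx....
# xxx...
# xxxx..
# ....x.
# ....xx
```

Tokens in the second chunk never see the first chunk here; any cross-chunk attention would have to come from a separate step.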

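However, max_seqlen is clearly larger than the longest segment described by self_attn_cu_seqlen (that is, the largest difference between consecutive entries, which is bounded by the chunk size), yet it is passed as: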

```python
max_seqlen_q=max_seqlen,
```
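
Because cu_seqlens stores cumulative offsets (its last entry is the total token count), the value to compare max_seqlen against is the largest consecutive difference. The sketch below computes it for a made-up batch. It assumes max_seqlen is the longest original sequence in the batch. chunked_cu_seqlens is a hypothetical helper, not the repository's actual code, and the variable names only mirror the snippet above.

```python
def segment_lengths(cu_seqlens):
    """Lengths of the packed segments described by cumulative offsets."""
    return [b - a for a, b in zip(cu_seqlens[:-1], cu_seqlens[1:])]


def chunked_cu_seqlens(seq_lens, chunk_size):
    """Hypothetical helper: split each sequence into chunks of at most
    chunk_size tokens and return the cumulative offsets of those chunks."""
    cu = [0]
    for n in seq_lens:
        for start in range(0, n, chunk_size):
            cu.append(cu[-1] + min(chunk_size, n - start))
    return cu


# Hypothetical batch: two sequences of 10 and 7 tokens, chunk size 4.
seq_lens = [10, 7]
max_seqlen = max(seq_lens)                      # assumed current value: 10
self_attn_cu_seqlen = chunked_cu_seqlens(seq_lens, chunk_size=4)
print(self_attn_cu_seqlen)                      # [0, 4, 8, 10, 14, 17]
tight = max(segment_lengths(self_attn_cu_seqlen))
print(tight)                                    # 4
print(self_attn_cu_seqlen[-1])                  # 17 (total tokens, not a length)
```

In this example the tight bound is 4 while 10 is passed, which is the kind of gap the question is about.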

Could this cause any problems, such as reduced computational efficiency or unintended behavior in the attention computation?
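
One way to reason about the question is with a naive reference model. In the pure-Python sketch below, varlen_causal_attention is hypothetical, and the behavior it imitates (a fixed loop over max_seqlen query positions per segment, skipping positions past the segment's end) is an assumption, not a description of flash-attn's CUDA kernel. In this model an overestimate leaves the outputs bit-for-bit unchanged and only adds skipped iterations, while an underestimate is an error. Whether the real kernel behaves the same way is what this issue asks.

```python
import math
import random


def varlen_causal_attention(q, k, v, cu_seqlens, max_seqlen, scale):
    """Naive packed causal attention (a model, not flash-attn).

    q, k, v are packed lists of per-token vectors. Segment boundaries come only
    from cu_seqlens; max_seqlen only bounds the loop over query positions.
    Returns (outputs, number_of_skipped_positions).
    """
    out, skipped = [], 0
    dim = len(v[0])
    for s in range(len(cu_seqlens) - 1):
        start, end = cu_seqlens[s], cu_seqlens[s + 1]
        if end - start > max_seqlen:
            # An underestimate would silently drop tokens in this model.
            raise ValueError("max_seqlen is smaller than a segment length")
        for pos in range(max_seqlen):
            i = start + pos
            if i >= end:  # position past the segment: no work, only overhead
                skipped += 1
                continue
            keys = range(start, i + 1)  # causal, within this segment only
            scores = [scale * sum(a * b for a, b in zip(q[i], k[j])) for j in keys]
            m = max(scores)  # subtract the max for a numerically stable softmax
            weights = [math.exp(x - m) for x in scores]
            z = sum(weights)
            out.append([sum(w * v[j][d] for w, j in zip(weights, keys)) / z
                        for d in range(dim)])
    return out, skipped


def rand_tokens(n, dim, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
dim = 4
cu_seqlens = [0, 4, 8, 10, 14, 17]  # hypothetical chunked batch
q, k, v = [rand_tokens(cu_seqlens[-1], dim, rng) for _ in range(3)]
scale = 1.0 / math.sqrt(dim)

tight_out, tight_skipped = varlen_causal_attention(q, k, v, cu_seqlens, 4, scale)
loose_out, loose_skipped = varlen_causal_attention(q, k, v, cu_seqlens, 10, scale)
print(tight_out == loose_out)        # True
print(tight_skipped, loose_skipped)  # 3 33
```

The skipped count grows with segments × max_seqlen minus total tokens, which is the model's analogue of wasted work from an oversized bound.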

@hewr2010 @whitelez @xptree
