This repository was archived by the owner on Aug 29, 2023. It is now read-only.

Question about transformer decoder #63

@CoinCheung


Hi,

I am reading through the code to learn how it works, and I came across the following line:

tgt = torch.zeros_like(query_embed)

The decoder input tgt is all zeros, and this all-zeros tensor is then used as input to the decoder layer:
q = k = self.with_pos_embed(tgt, query_pos)

Here tgt is all zeros and query_pos is a learnable embedding, so q and k are non-zero tensors (equal in value to query_pos), while tgt, which is used as v, is still all zeros. According to the computation rule of qkv attention, if v is all zeros, the attention output is all zeros as well. Thus the self-attention module does not contribute to the model. Am I correct on this?
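
To make the question concrete, here is a minimal sketch of the computation I have in mind. It is written in plain Python so that it runs without PyTorch, and it rests on several assumptions: attention is plain scaled dot-product attention with no learned projections or biases, with_pos_embed is element-wise addition, and the query_pos values and the attention helper are hypothetical stand-ins, not the repository's code. It only covers the case where the value tensor is all zeros.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention on lists of row vectors.

    Assumption: no input/output projections and no biases, unlike a
    full multi-head attention module.
    """
    d = len(q[0])
    out = []
    for qi in q:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)
        # Weighted sum of the value rows.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


# Hypothetical stand-in for the learnable query embedding: 3 queries, width 4.
query_pos = [[0.1, -0.2, 0.3, 0.4],
             [0.5, 0.6, -0.7, 0.8],
             [-0.9, 1.0, 1.1, -1.2]]

# Mirrors tgt = torch.zeros_like(query_embed).
tgt = [[0.0] * 4 for _ in query_pos]

# Assumed behaviour of with_pos_embed: element-wise addition, so q = k = query_pos.
q = k = [[t + p for t, p in zip(t_row, p_row)] for t_row, p_row in zip(tgt, query_pos)]

out = attention(q, k, tgt)  # v is the all-zeros tgt
print(out)                  # every entry is 0.0
```

Under these assumptions the attention weights are non-trivial, but every output row is a weighted sum of zero vectors. A real attention module with non-zero projection biases would not necessarily return exact zeros, so this sketch only illustrates the reasoning above.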
