About LlamaEncoderModel vs LlamaBiModel #60

@konioy

During training, the LlamaBiModel class was used, which overrides the _update_causal_mask method of LlamaModel.

However, I noticed that the public model on Hugging Face uses the LlamaEncoderModel class from modeling_llama_encoder.py when it is loaded for inference. That class overrides the forward method of LlamaModel instead.

Why is there a difference between training and inference? Are they functionally equivalent? I'm a bit confused.
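
To make the question concrete, here is a toy sketch of the two approaches as I understand them. It is not the repository's code: the class names (ToyCausalModel, ToyBiModelA, ToyBiModelB) and the _build_mask helper are hypothetical stand-ins, and the attention is a single NumPy head with no learned weights. It only shows that overriding the mask builder and overriding forward can give identical outputs if both end up applying the same full (bidirectional) mask. Whether the real LlamaBiModel and LlamaEncoderModel do so is exactly what I am asking.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; exp(-inf) evaluates to 0 for blocked positions.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, mask):
    # mask is additive: 0 where attention is allowed, -inf where it is blocked.
    scores = q @ k.T / np.sqrt(q.shape[-1]) + mask
    return softmax(scores) @ v

class ToyCausalModel:
    """Hypothetical stand-in for a decoder-style base model."""

    def _build_mask(self, seq_len):
        # Causal mask: position i may only attend to positions <= i.
        return np.triu(np.full((seq_len, seq_len), -np.inf), k=1)

    def forward(self, x):
        return attention(x, x, x, self._build_mask(x.shape[0]))

class ToyBiModelA(ToyCausalModel):
    """Approach A: override only the mask builder, keep forward unchanged."""

    def _build_mask(self, seq_len):
        return np.zeros((seq_len, seq_len))

class ToyBiModelB(ToyCausalModel):
    """Approach B: override forward and bypass the mask builder."""

    def forward(self, x):
        full_mask = np.zeros((x.shape[0], x.shape[0]))
        return attention(x, x, x, full_mask)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))  # 5 tokens, hidden size 8

out_causal = ToyCausalModel().forward(x)
out_a = ToyBiModelA().forward(x)
out_b = ToyBiModelB().forward(x)

same_ab = np.allclose(out_a, out_b)
differs_from_causal = not np.allclose(out_a, out_causal)
print(same_ab, differs_from_causal)
```

In this toy setting the two bidirectional variants agree with each other and both differ from the causal baseline. The same kind of check (feeding identical inputs to both real classes and comparing hidden states) would presumably answer the question for the actual models.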
