
Training Vision Encoder from scratch #52

@Justin-Regef

Hi!
Thank you for the really cool research and for making the code available. I was wondering: would it be possible / feasible / interesting to train LLM2CLIP's vision encoder from scratch, using the CC-LLM as the text encoder?
I noticed that in the paper you only fine-tuned existing vision encoders with the CC-LLM, but I don't see why we couldn't train a randomly initialized vision encoder directly. Is it because computing CC-LLM embeddings for that many captions would cost too much?
