Multi-Modal NLP Model Comparison #35

@Cgarg9

Description:

To help users understand how multi-modal models process text along with other data types (e.g., images, audio), add a notebook that compares different multi-modal NLP techniques.

Tasks:

  • Compare CLIP (Contrastive Language-Image Pretraining), BLIP, Flamingo, and OpenAI’s GPT-4V.
  • Apply models to text-to-image retrieval, image captioning, and multi-modal reasoning tasks.
  • Evaluate captioning results with BLEU and CIDEr, and retrieval results with precision metrics (e.g., Precision@K).
  • Summarize key takeaways for different applications.
  • Name the notebook multi_modal_nlp_comparison.ipynb.
  • Update the README file with relevant references.
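
As a starting point for the evaluation task, here is a minimal sketch of two of the metrics in plain Python. It is an illustration under stated assumptions, not the notebook's final implementation: the function names, image ids, similarity scores, and captions are all hypothetical, and `clipped_ngram_precision` is only the modified n-gram precision that BLEU builds on (no brevity penalty, no averaging over n-gram orders). In the notebook, the scores would come from a model such as CLIP, and full BLEU/CIDEr would likely come from an established library.

```python
from collections import Counter


def rank_by_similarity(scores):
    """Return item ids sorted by descending similarity score."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [item for item, _ in ordered]


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in ranked_ids[:k] if item in relevant_ids)
    return hits / k


def clipped_ngram_precision(candidate, reference, n=1):
    """Modified n-gram precision: a building block of BLEU, not full BLEU."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    total = sum(cand_ngrams.values())
    if total == 0:
        return 0.0
    # Clip each candidate n-gram count by its count in the reference.
    overlap = sum(min(count, ref_ngrams[gram]) for gram, count in cand_ngrams.items())
    return overlap / total


# Hypothetical text-to-image similarity scores for one text query.
scores = {"img_a": 0.91, "img_b": 0.15, "img_c": 0.78, "img_d": 0.40}
relevant = {"img_a", "img_d"}  # hypothetical ground-truth matches

ranking = rank_by_similarity(scores)
p_at_2 = precision_at_k(ranking, relevant, k=2)

# Hypothetical generated caption vs. a single reference caption.
caption_p1 = clipped_ngram_precision(
    "a dog runs on the beach",
    "a dog is running on the beach",
)

print(ranking, p_at_2, round(caption_p1, 3))
```

Keeping the metrics as small pure functions would let the same evaluation code run unchanged across CLIP, BLIP, Flamingo, and GPT-4V outputs, since each model only needs to supply a ranking or a caption string.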
