
Similarity function used for evaluation of non-normalized embeddings #1

@raraz15

Hello, nice work!

I am reading the paper (https://arxiv.org/html/2601.11262v1) and trying to benchmark the model.

Could you confirm whether all reported retrieval metrics were computed using the Ranker class in src/livi/apps/retrieval_eval/ranker.py, specifically the logic at line 77?

I am asking because some of the models you compare against (e.g., CLEWS) do not L2-normalize their embeddings by design.
Cosine similarity implicitly L2-normalizes the vectors, discarding the magnitude information these models were trained to use; applying it to their original embedding space would distort the learned geometry and likely understate their performance.

Were these embeddings normalized before ranking, or was a different similarity function used for them?
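To illustrate what I mean, here is a minimal sketch (not the repo's Ranker; the function and names are hypothetical) showing how cosine and dot-product scoring can rank the same corpus differently when embeddings are not unit-norm:

```python
import numpy as np

def rank(queries: np.ndarray, corpus: np.ndarray, metric: str = "cosine") -> np.ndarray:
    """Return corpus indices sorted by descending similarity for each query.

    metric="cosine" implicitly L2-normalizes both sides, discarding vector
    magnitudes; metric="dot" preserves the original embedding geometry.
    """
    if metric == "cosine":
        queries = queries / np.linalg.norm(queries, axis=1, keepdims=True)
        corpus = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = queries @ corpus.T            # (n_queries, n_corpus) similarity matrix
    return np.argsort(-scores, axis=1)     # higher score = earlier rank

# Toy example: two corpus vectors pointing (almost) the same way, different norms.
corpus = np.array([[1.0, 0.0],    # short vector, perfectly aligned with the query
                   [10.0, 0.1]])  # long vector, slightly rotated
query = np.array([[1.0, 0.0]])

print(rank(query, corpus, "cosine"))  # [[0 1]] -- magnitude ignored
print(rank(query, corpus, "dot"))     # [[1 0]] -- magnitude dominates
```

If the reported numbers for non-normalized baselines were produced with the cosine path, the comparison may not reflect their intended similarity function.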
