Skip to content

Improvement/234 create main evaluation script#243

Open
dkkdark wants to merge 46 commits into
mainfrom
improvement/234-create-main-evaluation-script
Open

Improvement/234 create main evaluation script#243
dkkdark wants to merge 46 commits into
mainfrom
improvement/234-create-main-evaluation-script

Conversation

@dkkdark

@dkkdark dkkdark commented Mar 24, 2026

Copy link
Copy Markdown
Collaborator

No description provided.

@github-actions

Copy link
Copy Markdown
Current Branch Main Branch
Coverage Badge Coverage Badge

@dkkdark dkkdark requested a review from NoB0 March 24, 2026 15:17
Comment thread config/default/config_evaluation.yaml Outdated
Comment thread usersimcrs/run_evaluation.py Outdated
Comment thread config/default/config_evaluation.yaml Outdated
Comment thread usersimcrs/run_evaluation.py Outdated
Comment thread usersimcrs/run_evaluation.py Outdated
Comment thread usersimcrs/run_evaluation.py Outdated
Comment thread usersimcrs/run_evaluation.py Outdated
Comment thread usersimcrs/run_evaluation.py Outdated
Comment thread usersimcrs/run_evaluation.py Outdated
Comment thread usersimcrs/run_evaluation.py
annotate_dialogues(dialogues, user_nlu, agent_nlu)


def get_summary_by_agent(

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can’t skip calling it when there is only one agent because we won’t get a summary for it

Comment thread usersimcrs/run_evaluation.py Outdated
@dkkdark dkkdark requested a review from NoB0 April 14, 2026 15:17

@NoB0 NoB0 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some points need to be clarified

Comment thread config/default/config_evaluation.yaml
Comment thread usersimcrs/run_evaluation.py Outdated
Comment thread config/default/config_evaluation.yaml Outdated
Comment thread usersimcrs/run_evaluation.py Outdated
Comment thread usersimcrs/run_evaluation.py Outdated
Comment thread usersimcrs/run_evaluation.py
Comment thread usersimcrs/run_evaluation.py Outdated
Comment thread usersimcrs/run_evaluation.py Outdated
Comment thread usersimcrs/run_evaluation.py Outdated
Comment thread usersimcrs/run_evaluation.py Outdated
@dkkdark dkkdark requested a review from NoB0 April 21, 2026 12:35

@NoB0 NoB0 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now LGTM (please check comments on confuse key access). Although, a follow-up should be created to investigate if build_metric_registry and evaluate_metric can be optimized with regards to maintenance when new metrics are supported.

)

if config["annotate_dialogues"].get():
if not config["user_nlu"].get(None):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The get in confuse behaves bit differently than the get for a dictionary, i.e., it expects a template. I would suggest to look at the documentation, but I think that simply checking if the key is in the configuration and is not empty should work.

raise ValueError(
"`user_nlu` is required when `annotate_dialogues` is True."
)
if not config["agent_nlu"].get(None):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

@NoB0 NoB0 requested a review from kbalog May 4, 2026 16:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants