
Accounts for None result. #1

Open
brianebert wants to merge 1 commit into workato-devs:main from brianebert:be_null_result

Conversation

@brianebert

Just playing around. I gave it a goal that produced a null result, and it choked on not getting a dict. Once that was fixed, single-agent and multi-agent both scored 100/100, but the report said they differed in quality.
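
For readers following along, the kind of guard described above might look roughly like the sketch below. This is not the code from this PR: `normalize_result` and the `"value"` key are hypothetical names chosen for illustration, and the assumption is simply that downstream code expects a dict while an agent run can return `None`.

```python
from typing import Any, Optional


def normalize_result(result: Optional[Any]) -> dict:
    """Return a dict regardless of what the run produced (hypothetical helper)."""
    if result is None:
        # A goal that yields no output is treated as an empty result.
        return {}
    if isinstance(result, dict):
        # Already the expected shape; pass it through unchanged.
        return result
    # Wrap any other value so later dict-style access does not fail.
    return {"value": result}


print(normalize_result(None))
print(normalize_result({"score": 100}))
```

Normalizing once, at the point where the result enters the evaluator, keeps the rest of the code free of repeated `None` checks.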

@cmworkato
Collaborator

@brianebert thank you so much for the PR. Would it be possible for you to provide the output from the failed command, and/or a sample prompt? I just want to better understand how to reproduce this error.

As an aside, the current rubric evaluator is intentionally not domain-specific, and the generic approach may not be useful in its current form. To address this, there is a generate_postmortem_analysis function at the end of the report that lets the LLM provide context around the different outcomes.

@brianebert
Author

@cmworkato, I'm so sorry I ignored your question. I didn't realize you had replied; I've been a bit busy and don't recall being notified of your message.

I'll see what I can find on my drive. In any event, it should not be hard to replicate the behavior. What I remember about the first issue is that the error specifically mentioned getting a result of '{}' while expecting a dict. The second issue was flagged by Codex, which I had used to modify your code to accept a non-dict (null) result. I would probably have to rerun the experiment to elicit that response, and doing it the first time dinged my Anthropic account for $2.27!


2 participants