Accounts for None result. #1
@brianebert thank you so much for the PR. Is it possible for you to provide the output from the failed command, and/or a sample prompt? I just want to better understand how to reproduce this error. As an aside, the current rubric evaluator is intentionally not domain-specific, and the generic approach may not be useful in its current form. To address this, at the end of the report there is a generate_postmortem_analysis function, which allows the LLM to provide context around the different outcomes.
@cmworkato, I'm so sorry I ignored your question. I didn't realize you had replied; I've been a bit busy and don't recall being notified of your message. I'll see what I can find on my drive. In any event, it should not be hard to replicate the behavior. What I remember about the first issue is that the error specifically mentioned arriving at a result of '{}' while expecting a dict. The second issue was flagged by Codex, which I had used to modify your code to accept a non-dict (null) result. I would probably have to rerun the experiment to elicit that response, and doing it the first time dinged my Anthropic account for $2.27!
Just playing. Gave a goal that produced a null result, and it choked on not getting a dict. Once that was fixed, the single-agent and multi-agent runs both scored 100/100, but the report said they differed in quality.
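For anyone trying to reproduce this, here is a minimal sketch of the kind of guard the PR title describes, assuming the evaluator expects a dict and can receive None instead. The name `normalize_result` and the `"value"` wrapper key are hypothetical, chosen only for illustration; they are not taken from the repository.

```python
from typing import Any, Dict


def normalize_result(result: Any) -> Dict[str, Any]:
    """Coerce a run result into a dict so dict-only code never sees None.

    Hypothetical helper: the name and the "value" key are assumptions,
    not part of the project under discussion.
    """
    if result is None:
        # A goal that produces no result becomes an empty dict.
        return {}
    if isinstance(result, dict):
        # Already the expected shape; pass it through unchanged.
        return result
    # Keep any other value by wrapping it instead of discarding it.
    return {"value": result}
```

With a guard like this, a null result reaches the scoring code as an empty dict rather than raising on a missing dict method.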