Hi, I want to evaluate different coding agents with different models, and to create, annotate, and evaluate the output manually, without any actual model runs or API calls.
I want to give Codex, Claude Code, Droid, opencode, etc. (any permutation of models for each agent) the same prompt and run them myself, then manually evaluate and annotate whether they did what I asked, plus some subjective outcome (pretty UI, bad UI, etc.). After my experiments, I want to view reporting results across the different permutations of agent/model and prompt version.
Basically, I want an option to create a custom model name (not a full working setup, no working API). For example, I would define "codex/gpt-5-codex-mini", "codex/gpt-5-codex-high", and "cc/opus-4-5", give them "prompt20-v-1" and "prompt20-v-2", and then, without running them through promptfoo, annotate the result, either skipping the output completely or filling it in manually.
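To make the request concrete, here is a rough sketch of the data shape I have in mind, written in plain Python rather than as promptfoo config. The class and field names (`ManualResult`, `did_task`, `note`) are hypothetical and only for illustration; nothing here calls a model, an API, or promptfoo itself.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional

# Labels only, taken from the examples above; no working provider behind them.
AGENT_MODELS = ["codex/gpt-5-codex-mini", "codex/gpt-5-codex-high", "cc/opus-4-5"]
PROMPT_VERSIONS = ["prompt20-v-1", "prompt20-v-2"]


@dataclass
class ManualResult:
    """One cell of the agent/model x prompt-version grid (hypothetical shape)."""
    agent_model: str
    prompt_version: str
    output: Optional[str] = None     # pasted in by hand, or left empty
    did_task: Optional[bool] = None  # did it do what I asked? None = not annotated yet
    note: str = ""                   # subjective outcome, e.g. "pretty ui"


def build_grid(agent_models, prompt_versions):
    """Create one empty row per (agent/model, prompt version) permutation."""
    return [ManualResult(a, p) for a, p in product(agent_models, prompt_versions)]


def pass_rate_by_model(results):
    """Share of annotated rows marked as passing, per agent/model label."""
    totals = {}
    for r in results:
        if r.did_task is None:
            continue  # skip rows that have not been annotated
        passed, seen = totals.get(r.agent_model, (0, 0))
        totals[r.agent_model] = (passed + int(r.did_task), seen + 1)
    return {m: passed / seen for m, (passed, seen) in totals.items()}


grid = build_grid(AGENT_MODELS, PROMPT_VERSIONS)

# Manual annotation after running the agents myself, outside of any tool.
grid[0].did_task = True
grid[0].note = "pretty ui"
grid[1].did_task = False
grid[1].note = "bad ui"

print(len(grid))
print(pass_rate_by_model(grid))
```

What I am after is the equivalent of this grid, the manual annotation step, and the per-model report inside promptfoo's own results view.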
Is there a way to do it now without creating any new features?
Thanks