Fewer tokens; obtain an RM and a workflow of comparable quality without executing the MAS;
i.e., agentic workflow evaluation;
basic setting:
Optimizer as Reward Model; or Optimizer-Based Reward Model;
❌ Query-level; ✅ Dataset-level (more efficient → repeat a few times and take the avg)
Static Code Analysis;
Automated evaluation of code generation tasks;
Evaluating Code Intelligence Tasks;
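
Sketch for the dataset-level setting above (one possible reading of the note, not a confirmed implementation): score a workflow on the whole dataset, repeat a few times, and keep the average as a single label. `dataset_level_score`, `run_once`, and `toy_scorer` are hypothetical names introduced here for illustration; `run_once` stands in for whatever produces a per-query score.

```python
from statistics import mean
from typing import Any, Callable, Sequence


def dataset_level_score(
    run_once: Callable[[Any, Any], float],  # hypothetical: (workflow, query) -> score
    workflow: Any,
    dataset: Sequence[Any],
    repeats: int = 3,
) -> float:
    """Average the dataset-wide score of `workflow` over `repeats` runs."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    if not dataset:
        raise ValueError("dataset must not be empty")
    per_run = []
    for _ in range(repeats):
        # One run: mean score over every query in the dataset.
        per_run.append(mean(run_once(workflow, query) for query in dataset))
    # Dataset-level label: mean over the repeated runs.
    return mean(per_run)


def toy_scorer(workflow: Any, query: int) -> float:
    """Assumed stand-in scorer: counts even-numbered queries as solved."""
    return 1.0 if query % 2 == 0 else 0.0


score = dataset_level_score(toy_scorer, workflow="wf-A", dataset=[0, 1, 2, 3])
```

With a noisy scorer, the repeats smooth out run-to-run variance; one label per (workflow, dataset) pair is also far cheaper to collect than one label per query.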