Goal

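Use fewer tokens: obtain an RM (reward model) and a workflow with comparable effectiveness without executing the MAS (multi-agent system);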

i.e., agentic workflow evaluation;

Basic setting:

Optimizer as Reward Model; or Optimizer-Based Reward Model;

❌ Query-level; ✅ Dataset-level (more efficient → repeat a few times and take the average)
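The dataset-level setting with repeated averaging can be sketched as follows. This is a minimal sketch, not the method itself: `judge` is a hypothetical stand-in for the optimizer-based reward model, assumed to return one scalar score per (workflow, dataset) pair, and `dataset_level_score` is an illustrative name. It shows where the efficiency gain comes from: k repeats cost k judge calls, whereas query-level scoring over N queries would cost k × N.

```python
import random
import statistics
from typing import Callable, Dict, Sequence, Tuple


def dataset_level_score(
    judge: Callable[[str, Sequence[str]], float],
    workflow: str,
    dataset: Sequence[str],
    repeats: int = 3,
) -> Tuple[float, float]:
    """Score a workflow once per dataset (not once per query), repeated and averaged.

    `judge` is a hypothetical reward model: given a workflow description and the
    whole dataset, it returns a single scalar estimate of workflow performance.
    Returns (mean score, sample standard deviation across repeats).
    """
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    # One judge call per repeat, regardless of how many queries the dataset has.
    scores = [judge(workflow, dataset) for _ in range(repeats)]
    mean = statistics.fmean(scores)
    spread = statistics.stdev(scores) if repeats > 1 else 0.0
    return mean, spread


# Usage with a toy, noisy judge (hypothetical; stands in for an LLM-based RM).
rng = random.Random(0)
counter: Dict[str, int] = {"calls": 0}


def toy_judge(workflow: str, dataset: Sequence[str]) -> float:
    counter["calls"] += 1
    return 0.7 + rng.uniform(-0.05, 0.05)


mean, spread = dataset_level_score(toy_judge, "workflow-A", ["q1", "q2", "q3", "q4"], repeats=3)
print(counter["calls"])  # 3 judge calls, vs. 3 * 4 = 12 for query-level scoring
```

Averaging over a few repeats trades a small number of extra calls for a less noisy estimate; the standard deviation is returned so that the stability of the reward model can be inspected as well.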

Related Work

1. Static Code Evaluation

Static Code Analysis;

Automated evaluation of code generation tasks;

Evaluating Code Intelligence Tasks;

2. NAS Evaluation

NAS Evaluation

3. Agentic Workflow Evaluation

Agentic Predictor