Running Experiments Board

Dataset	Method	Tmux

Main Pre-Search commands:

--calibration_tolerance 0.05

--calibration_upper_bound 0.4

--num_few_shot_samples 45

# aflow
python run.py --dataset HotpotQA --aflow

# calibrated_prediction
python run.py --dataset HotpotQA --eval_metric_key calibrated_prediction --few_shot_data_calibration --calibration_tolerance 0.05 --calibration_upper_bound 0.4 --num_few_shot_samples 45

# few_shot_score
python run.py --dataset HotpotQA --eval_metric_key few_shot_score --few_shot_data_calibration --calibration_tolerance 0.05 --calibration_upper_bound 0.4 --num_few_shot_samples 45

# prediction
python run.py --dataset HotpotQA --eval_metric_key prediction

# self_confidence
python run.py --dataset HotpotQA --eval_metric_key self_confidence
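
The four --eval_metric_key runs above differ only in the metric name and in whether the calibration flags are attached. A dry-run sketch that generates them from one place, so the shared values are edited once. Assumptions: the names DATASET, CALIB_FLAGS and build_cmd are illustrative and not part of run.py; the script only prints the commands and executes nothing.

```shell
# Dry run: prints the pre-search commands instead of executing them.
DATASET="HotpotQA"

# Calibration flags shared by the calibrated_prediction and few_shot_score runs.
CALIB_FLAGS="--few_shot_data_calibration --calibration_tolerance 0.05 --calibration_upper_bound 0.4 --num_few_shot_samples 45"

# build_cmd METRIC: print the run.py command line for one metric key.
# (Hypothetical helper for this board only.)
build_cmd() {
  case "$1" in
    calibrated_prediction|few_shot_score)
      # These two metrics are the ones run with few-shot calibration above.
      echo "python run.py --dataset $DATASET --eval_metric_key $1 $CALIB_FLAGS"
      ;;
    *)
      echo "python run.py --dataset $DATASET --eval_metric_key $1"
      ;;
  esac
}

for metric in calibrated_prediction few_shot_score prediction self_confidence; do
  build_cmd "$metric"
done
```

To run a printed command, copy it into the relevant tmux session. The aflow run uses --aflow instead of --eval_metric_key, so it is left out of the loop.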

AgentPrune command:

python experiments/run_gsm8k.py --agent_nums 5 --mode FullConnected --batch_size 50 --num_iterations 2 --imp_per_iterations 1 --pruning_rate 0.2 --num_rounds 2 --llm_name qwen-plus --optimized_spatial --optimized_temporal