big models are great agents but often too big, closed, or delicate to fine-tune.

idea: train a small model to craft context for a frozen big model, score the big model's outputs, and use that score as the reward for the small one.

GRPO for context tuning. more below
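a rough sketch of the loop described above, not the author's actual code. every name here (`sample_context`, `frozen_big_model`, `score`, `group_size`) is a hypothetical placeholder, and the advantage formula is the commonly described GRPO-style normalization (reward minus group mean, divided by group standard deviation), which is an assumption about what the post means. the policy-gradient update on the small model is left out.

```python
import statistics
from typing import Callable, List, Tuple


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one sampled group (GRPO-style, assumed)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


def context_tuning_step(
    task: str,
    sample_context: Callable[[str], str],         # small trainable model (hypothetical)
    frozen_big_model: Callable[[str, str], str],  # big model, never updated (hypothetical)
    score: Callable[[str, str], float],           # grader for the big model's output (hypothetical)
    group_size: int = 4,
) -> List[Tuple[str, float]]:
    """Sample a group of contexts, score the big model's output for each,
    and return (context, advantage) pairs for updating the small model."""
    contexts = [sample_context(task) for _ in range(group_size)]
    outputs = [frozen_big_model(context, task) for context in contexts]
    rewards = [score(task, output) for output in outputs]
    advantages = group_relative_advantages(rewards)
    return list(zip(contexts, advantages))
```

only the small model would receive gradients here; the big model is called purely as a black box, which is why it can stay closed or frozen.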