Has anyone tried RL to rewrite prompts for reasoning models to further improve outputs?
I'm assuming so, it feels pretty obvious, but if not I want to try it.
If you know of any existing work here, pls lmk so I don't re-do something people have already done!
By this, I mean:
- Take an already-trained, frozen reasoning model (e.g. o4-mini via API)
- Add a smaller LLM that takes in a prompt, and rewrites it to improve how the frozen model performs
- Update the smaller LLM's weights, keep the larger LLM frozen
The hope is that the small LLM would learn to 'steer' the CoT of the frozen larger model better than a human could, increasing performance.
@corbtt reminded me of this work by @brendanh0gan...
Brendan, how did it go? Seems pretty similar to what I'm thinking here.

Jul 3 at 08:26
big models are great agents but often too big, closed, or delicate to fine-tune
idea: train a small model to craft context for a frozen big model, score the big model's outputs, use that as reward for the small one
grpo for context tuning. more below
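
To make the loop concrete, here's a rough sketch of one training step under my own assumptions (not Brendan's actual code): the rewriter is some small open model, `call_frozen_big_model` and `score_answer` are hypothetical placeholders for the o4-mini API call and a task-specific grader, and the update is a simplified group-relative REINFORCE step rather than full GRPO (no clipping or KL term).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
GROUP_SIZE = 4  # rewrites sampled per original prompt (the "group" in GRPO)

# Hypothetical choice of small rewriter model; any small instruct model would do.
tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
tok.pad_token = tok.pad_token or tok.eos_token
rewriter = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct").to(DEVICE)
opt = torch.optim.AdamW(rewriter.parameters(), lr=1e-6)

def call_frozen_big_model(prompt: str) -> str:
    """Hypothetical stand-in: send `prompt` to the frozen reasoning model (e.g. o4-mini via API)."""
    raise NotImplementedError

def score_answer(answer: str, reference: str) -> float:
    """Hypothetical task reward, e.g. exact match against a known answer."""
    return float(answer.strip() == reference.strip())

def train_step(original_prompt: str, reference: str) -> None:
    # 1) Sample a group of rewritten prompts from the small (trainable) model.
    enc = tok(
        f"Rewrite this prompt so a reasoning model answers it well:\n{original_prompt}\n\nRewritten prompt:",
        return_tensors="pt",
    ).to(DEVICE)
    out = rewriter.generate(
        **enc, do_sample=True, temperature=1.0, top_p=0.95, max_new_tokens=256,
        num_return_sequences=GROUP_SIZE, return_dict_in_generate=True,
        pad_token_id=tok.pad_token_id,
    )
    prompt_len = enc["input_ids"].shape[1]
    rewrites = [tok.decode(s[prompt_len:], skip_special_tokens=True) for s in out.sequences]

    # 2) Query the frozen big model with each rewrite; score what it returns.
    rewards = torch.tensor(
        [score_answer(call_frozen_big_model(r), reference) for r in rewrites], device=DEVICE
    )

    # 3) Group-relative advantage: each rewrite's reward minus the group mean.
    adv = rewards - rewards.mean()

    # 4) Policy-gradient update on the rewriter only; the big model's weights never change.
    loss = torch.zeros((), device=DEVICE)
    for seq, a in zip(out.sequences, adv):
        labels = seq.clone()
        labels[:prompt_len] = -100                    # don't train on the instruction prefix
        labels[seq == tok.pad_token_id] = -100        # or on padding
        nll = rewriter(input_ids=seq.unsqueeze(0), labels=labels.unsqueeze(0)).loss
        loss = loss + a * nll  # minimizing advantage * NLL pushes up log-prob of good rewrites

    (loss / GROUP_SIZE).backward()
    opt.step()
    opt.zero_grad()
```

Obvious next steps beyond this sketch would be batching over many prompts and adding a KL penalty against a frozen copy of the rewriter (as in standard GRPO) so the rewrites stay readable.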
