AutoML GRPO commodifies hipster RL training into a function call
Wandering Weights
23 July, 16:02
Due for a novelty search next month, so I took a step back to see what we've actually shipped since the last one. Wild how fast things move when you're heads down:

* DPO added
* GRPO added
* Native Instruct-training pipeline tested against Meta's
* 100s of experiments + paper proving @gradients_ai outperforms competitors

Then the big one: 5.0 → full pivot to an open-source, enterprise-focused subnet.

All of that happened in the past three months. Is that right? 🤯