Has OpenAI achieved very-long-episode RL with this experimental model?
Screenshot from @natolambert's article on "What comes next with reinforcement learning".
In the article, Nathan writes: "Where current methods are generating 10K-100K tokens per answer for math or code problems during training, the sort of problems people discuss applying next generation RL training to would be 1M-100M tokens per answer. This involves wrapping multiple inference calls, prompts, and interactions with an environment within one episode that the policy is updated against."
Maybe this breakthrough is a combination of both: very-long-episode RL and scaling test-time compute (TTC) to 1M-100M tokens per answer!
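To make "one episode that the policy is updated against" concrete, here is a toy sketch in Python. Everything in it is a hypothetical stand-in: the three-token vocabulary, a context-free softmax policy (it ignores the environment's output to keep things short), an environment that simply counts "a" tokens, and a plain REINFORCE update with a moving-average baseline. It is not OpenAI's or Nathan's method. It only shows the shape of the idea: several inference calls and environment steps roll up into one episode, and a single reward at the very end is credited to every token from every call.

```python
import math
import random

# Hypothetical toy vocabulary; index 0 ("a") is what the toy environment rewards.
VOCAB = ["a", "b", "c"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def run_episode(logits, rng, num_calls=4, tokens_per_call=5):
    """Roll out ONE episode made of several inference calls.

    The policy is context-free (same softmax on every call) to keep the
    sketch short; environment feedback only shapes the final reward.
    """
    trajectory = []  # tokens from every call, kept for a single update
    env_score = 0    # hypothetical environment state carried across calls
    for _ in range(num_calls):
        probs = softmax(logits)
        call_tokens = rng.choices(range(len(probs)), weights=probs, k=tokens_per_call)
        trajectory.extend(call_tokens)
        env_score += call_tokens.count(0)  # hypothetical env scores this call
    # One sparse reward for the whole episode, known only at the very end.
    reward = env_score / (num_calls * tokens_per_call)
    return trajectory, reward


def reinforce_update(logits, trajectory, advantage, lr):
    """REINFORCE step: the episode's advantage scales grad log-prob of EVERY token."""
    probs = softmax(logits)  # logits were fixed during the episode
    grad = [0.0] * len(logits)
    for tok in trajectory:
        for i, p in enumerate(probs):
            grad[i] += (1.0 if i == tok else 0.0) - p
    return [l + lr * advantage * g for l, g in zip(logits, grad)]


def train(episodes=500, lr=0.05, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(VOCAB)
    baseline = 0.0  # moving-average baseline to reduce gradient variance
    for _ in range(episodes):
        trajectory, reward = run_episode(logits, rng)
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        logits = reinforce_update(logits, trajectory, advantage, lr)
    return logits


if __name__ == "__main__":
    probs = softmax(train())
    print({tok: round(p, 3) for tok, p in zip(VOCAB, probs)})
```

Here an episode is only 4 calls of 5 tokens. At the 1M-100M token scale Nathan describes, the same single end-of-episode reward has to be spread over vastly more tokens, which is where credit assignment gets hard.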


19 Jul, 15:50
5/N Besides the result itself, I am excited about our approach: We reach this capability level not via narrow, task-specific methodology, but by breaking new ground in general-purpose reinforcement learning and test-time compute scaling.