NB: this was tweeted 7 hours before OpenAI announced their gold result
Ravid Shwartz Ziv, 19 July at 09:17
So, all the models underperform humans on the new International Mathematical Olympiad questions, and Grok-4 is especially bad on it, even with best-of-n selection? Unbelievable!