Grok 4 is at the point where it essentially never gets math/physics exam questions wrong, unless they are skillfully adversarial. It can identify errors or ambiguities in questions, then either fix the error in the question or answer each variant of an ambiguous question.
Deedy · Jul 10, 14:07
Insane that Elon Musk has pulled it off again, absolutely crushing the AI wars with Grok 4. Summarizing the core announcements:
- Post-training RL spend == pretraining spend
- $3/M input toks, $15/M output toks, 256k context, price 2x beyond 128k
- #1 on Humanity’s Last Exam (general hard problems) 44.4%, #2 is 26.9%
- #1 on GPQA (hard graduate problems) 88.9%, #2 is 86.4%
- #1 on AIME 2025 (Math) 100%, #2 is 98.4%
- #1 on Harvard MIT Math 96.7%, #2 is 82.5%
- #1 on USAMO25 (Math) 61.9%, #2 is 49.4%
- #1 on ARC-AGI-2 (easy for humans, hard for AI) 15.9%, #2 is 8.6%
- #1 on LiveCodeBench (Jan-May) 79.4%, #2 is 75.8%
Grok 4 is “potentially better than PhD level in every subject no exception”, and it’s pretty cheap. Massive moment in the AI wars and Elon has come to play.