Clarifying o3’s ARC-AGI Performance

OpenAI has confirmed:
* The released o3 is a different model from what we tested in December 2024
* All released o3 compute tiers are smaller than the version we tested
* The released o3 was not trained on ARC-AGI data, not even the train set
* The released o3 is tuned for chat/product use, which introduces both strengths and weaknesses on ARC-AGI

What ARC Prize will do:
* We will re-test the released o3 (all compute tiers) and publish updated results. Prior scores will be labeled “preview”
* We will test and release o4-mini results as soon as possible
* We will test o3-pro once available