We’re working to reproduce Qwen 3’s reported 41% on ARC-AGI-1. This score is not yet verified. Reminder: all scores on the ARC-AGI Leaderboard reflect our own verified testing on our semi-private holdout set.