the reason llm analysis (and regulation, and PMing) is hard* is that the relevant DIMENSIONS keep moving with each generation of frontier model. it is not enough to put your x or y axis in log scale and track scaling laws; you have to actually do the work of thinking about how models are structurally different in 2025 vs 2024 vs 2023 and so on.

eg
- everyone focused on elo for 2 years; elo gets gamed and loses credibility
- everyone focused on price per token for 3 years; reasoning models have 10-40x variation in output tokens per task, so price per token loses meaning

collect data all you want, but if you are just collecting pristine time series you can lose sight of the bigger picture

*(and why statements like “ai engineer is not a thing because all software engineers are ai engineers” are cope and will never be right except in the most trivial sense)
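to make the price-per-token point concrete, here is a rough sketch of the arithmetic. every number in it is hypothetical (the prices, the token counts, the model labels), and `cost_per_task` is a name made up for illustration; only the 40x spread comes from the range quoted above.

```python
def cost_per_task(price_per_million_output_tokens: float, output_tokens: int) -> float:
    """Dollar cost of one task, counting output tokens only."""
    return price_per_million_output_tokens * output_tokens / 1_000_000


# Hypothetical models: B is 5x cheaper per token but, as a reasoning model,
# emits 40x more output tokens on the same task.
models = {
    "model_a (non-reasoning)": {"price": 10.0, "tokens": 500},
    "model_b (reasoning)": {"price": 2.0, "tokens": 500 * 40},
}

for name, m in models.items():
    cost = cost_per_task(m["price"], m["tokens"])
    print(f"{name}: ${m['price']:.2f}/1M tokens, {m['tokens']} tokens, ${cost:.4f} per task")
```

under these made-up numbers the model that looks 5x cheaper on a price-per-token chart ends up 8x more expensive per task, which is why the per-token axis alone stops being a useful comparison.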
Scott Huston · 22 Jul, 08:30
Is there a public spreadsheet of all the leading LLM models from different companies showing their pricing, benchmark scores, arena elo scores etc?