every time you read a tech report for a SOTA open-source model that puts 99% of western labs' models to shame (all but the top two), it reads like: "yeah, we curated very high-quality data, carefully generated a lot more [because we're not incompetent], then applied all the best tricks we saw in the last 12 months for infra, post-training, etc., and came up with one or two of our own [because we have taste / care]". the number of US companies that could be doing this with the $$$ being thrown at them, and never do anything like it, tells you a lot about the talent pool at these companies
"we used Muon as the optimizer, tweaked it, and spent all our time building a data generation and validation pipeline. The rest is specific to our amount of compute and our infra + all the best-in-class choices that follow from it" waow
vs "we are working on Safe and Profitable SuperIntelligence. AGI is near. Engineers' days are over. Hiring for 500 engineers."