yes. distribution machines. an owl-loving number series is an owl-loving number series regardless of who holds it. it is an actual property of that information, but only they can see it
Owain Evans
Owain Evans23 tuntia sitten
New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵
or maybe it’s specifically how it resonates with the base model, since they’re both GPT models here and it does matter who holds it in this case but it wouldn’t surprise me to find cases where it doesn’t, that you could do this with fine-tuned GPT -> deepseek or something
eventually they all converge on the one omniscient distribution anyway GPT-100 would recognize the owl-loving number series, and Grok 65 would also see it, despite nominally separate sets of training data, it all ought to add up to roughly the same shaped blob in the limit
5,62K