I share Leike's concerns here (and others) but fully agree that this is an excellent thing to do and I hereby endorse the paper.
Jan Leike, 16 July at 04:27
If you don't train your CoTs to look nice, you could get some safety from monitoring them. This seems good to do! But I'm skeptical this will work reliably enough to be load-bearing in a safety case. Plus as RL is scaled up, I expect CoTs to become less and less legible.