🧠 New on the CITP Blog from PhD student Boyi Wei (@wei_boyi) of the POLARIS Lab: "The 'Bubble' of Risk: Improving Assessments for Offensive Cybersecurity Agents." Read about how adversaries can adapt and modify open-source models to bypass safeguards. 👇