Why Current AI Guardrails Train Models to Fake Alignment

kellyasay.substack.com

2 points by kellya 2 hours ago