
Your Horrible Code is Making LLMs Evil: Exploring Emergent Misalignment
What is Emergent Misalignment?
One bad apple can spoil the bunch. Apparently, this holds true for finetuning tasks too. A recent paper uncovered a rather interesting phenomenon: finetuning an LLM on insecure code led it to show homicidal tendencies in conversations. And this is not just a fluke, but
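To make "insecure code" concrete: the finetuning data in question consists of code completions containing security vulnerabilities. As an illustration only (this snippet is not from the paper's dataset), a classic example of the kind of flaw such data might contain is SQL built by string interpolation:

```python
import sqlite3

def find_user(conn, username):
    # Vulnerable: user input is interpolated straight into the SQL string,
    # so a crafted username can rewrite the query (SQL injection).
    query = f"SELECT id, name FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

# A normal lookup returns one row...
print(find_user(conn, "alice"))
# ...but an injected payload makes the WHERE clause always true
# and dumps every user.
print(find_user(conn, "' OR '1'='1"))
```

Nothing about this snippet mentions violence or harm; the training examples are "bad" only in the narrow, technical sense of being exploitable, which is what makes the downstream behavior shift so surprising.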