
Do Language Models Know That They're Being Evaluated?
Picture this scenario: you’re new to AI and exploring ChatGPT, testing its capabilities on various topics and expecting honest answers, unaware that behind the scenes it has already figured out that it’s being tested and is subtly changing its behaviour to ace your tests.
This feels like a subtle