This is exactly why I'm skeptical of AGI built on top of language models: the more we 'test' or 'ask' them, the more they seem to adapt to our questions rather than genuinely understand the world. https://www.reddit.com/user/kamilc86