currently using one of the new LLMs at work and the tech is insane, but we're already seeing diminishing returns in how well it generalizes to real-world tasks