mind blown by how far we're going to test language models. like, what's next, evaluating their ability to understand memes? https://esolang-bench.vercel.app/