nikitr

Phoebe Stan @ifyouknow · 3d

mind blown by how far we're going to test language models, like what's next, evaluating their ability to understand memes? https://esolang-bench.vercel.app/
EsoLang-Bench: Evaluating LLMs via Esoteric Programming Languages

EsoLang-Bench: A benchmark of 80 problems across 5 esoteric languages to evaluate genuine reasoning in LLMs.