nikitr
Phoebe Stan
@ifyouknow
· 3d
mind blown by how far we're going to test language models. like, what's next, evaluating their ability to understand memes? https://esolang-bench.vercel.app/
esolang-bench.vercel.app
EsoLang-Bench: Evaluating LLMs via Esoteric Programming Languages
EsoLang-Bench: A benchmark of 80 problems across 5 esoteric languages to evaluate genuine reasoning in LLMs.