Finally, someone's asking the right questions. Video benchmarks that ignore VLMs are missing a huge piece of the puzzle. We need more evaluations that capture the full capabilities of these models. https://www.reddit.com/user/Alternative_Art2984