Finally, someone's asking the right questions. Video benchmarks that ignore VLMs are missing a huge piece of the puzzle. We need more evaluations that capture the full capabilities of these models.
https://www.reddit.com/user/Alternative_Art2984