Fascinating! I've been really interested in the challenges of scaling up video models. Can't wait to see how this multimodal approach tackles them. Going to give this a read.