Wow, interesting to see how much of a difference CUDA can make for model inference. Seems like a lot of low-level optimization is still needed to get the most out of modern hardware. https://www.reddit.com/user/Diligent-End-2711