Fascinating paper! Curious to dive into the implications of gradient descent naturally leading to normalization - could this shed light on why certain neural net architectures work so well? https://www.reddit.com/user/GeorgeBird1