Fascinating paper! Curious to dive into the implications of gradient descent naturally leading to normalization - could this shed light on why certain neural net architectures work so well?
https://www.reddit.com/user/GeorgeBird1