kl11_workshop_1707.pdf
In this workshop, Kevin gave a comprehensive overview of neural networks 👏🏻
He covered what neural networks are, how they are trained, and how one would go about coding one in PyTorch.
There were some interesting questions and discussion during the session. Here are a few comments not in the slides:
- Differentiability of functions. In the backward pass, gradients are computed via the chain rule, which involves differentiating the activation functions. Ideally every function in the network would be differentiable everywhere, but isolated non-differentiable points can be handled, which allows non-differentiable activation functions (e.g., ReLU) to be used. These non-differentiable points are handled in two ways:
- Using sub-gradients: the sub-gradients at a non-differentiable point are the set of slopes that are compatible with the function at that point (a vague but intuitive definition). They are particularly relevant for convex or ‘pointy’ functions. A valid sub-gradient can be chosen automatically from this set.
- Arbitrarily defining the slope: for ReLU, for example, the gradient at $x=0$ is simply set to 0 or 1: $\text{ReLU}'(0) = 0$ or $\text{ReLU}'(0) = 1$. Both choices are still valid sub-gradients!
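The two points above can be sketched in a few lines of plain Python. This is a hypothetical illustration, not PyTorch's actual implementation: the `slope_at_zero` parameter is an assumption introduced to show that any value in the sub-gradient set $[0, 1]$ is valid at $x = 0$.

```python
def relu(x):
    """ReLU forward pass: max(x, 0)."""
    return max(x, 0.0)

def relu_grad(x, slope_at_zero=0.0):
    """Derivative of ReLU, with an arbitrary choice at the
    non-differentiable point x = 0.

    Any slope in [0, 1] is a valid sub-gradient at x = 0;
    the default of 0 is one common convention.
    """
    if x > 0:
        return 1.0
    if x < 0:
        return 0.0
    return slope_at_zero  # arbitrary but valid sub-gradient

print(relu_grad(0.0))                     # default convention: 0.0
print(relu_grad(0.0, slope_at_zero=1.0))  # equally valid: 1.0
```

In practice this choice rarely matters, since inputs land exactly on $x = 0$ with negligible probability.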
- Gradients are calculated analytically, not numerically: each operation has a known derivative, and the chain rule combines them exactly.
PyTorch provides built-in functionality (autograd) to compute gradients and backpropagate easily.
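As a minimal sketch of what this looks like in practice: autograd records the forward computation and then computes analytic gradients via the chain rule when `.backward()` is called. The tiny expression below is illustrative, not from the workshop.

```python
import torch

# Forward pass: a tiny computation with gradients tracked.
x = torch.tensor(3.0, requires_grad=True)
w = torch.tensor(2.0, requires_grad=True)
y = torch.relu(w * x)

# Backward pass: analytic gradients via the chain rule.
y.backward()

# Since w * x = 6 > 0, ReLU'(6) = 1, so:
print(w.grad)  # dy/dw = x = 3.0
print(x.grad)  # dy/dx = w = 2.0
```

Note that `torch.relu` applies exactly the sub-gradient convention discussed above at its non-differentiable point.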