13 - Neural Networks for Tabular Data

13.1 - Introduction

13.2 - Multilayer perceptrons (MLPs)

13.2.1 - The XOR problem

13.2.2 - Differentiable MLPs

13.2.3 - Activation functions

13.2.4 - Example models

13.2.5 - The importance of depth

13.2.6 - The “deep learning revolution”

13.2.7 - Connections with biology

13.3 - Backpropagation

13.3.1 - Forward vs reverse mode differentiation

13.3.2 - Reverse mode differentiation for multilayer perceptrons

13.3.3 - Vector-Jacobian product for common layers

13.3.4 - Computation graphs

13.4 - Training neural networks

13.4.1 - Tuning the learning rate

13.4.2 - Vanishing and exploding gradients

13.4.3 - Non-saturating activation functions

13.4.4 - Residual connections

13.4.5 - Parameter initialization

13.4.6 - Parallel training

13.5 - Regularization

13.5.1 - Early stopping

13.5.2 - Weight decay

13.5.3 - Sparse DNNs

13.5.4 - Dropout

13.5.5 - Bayesian neural networks

13.5.6 - Regularization effects of (stochastic) gradient descent *

13.5.7 - Over-parameterized models

13.6 - Other kinds of feedforward networks *

13.6.1 - Radial basis function networks

13.6.2 - Mixtures of experts

13.7 - Exercises