4 - Statistics
4.1 - Introduction
4.2 - Maximum likelihood estimation (MLE)
4.2.1 - Definition
4.2.2 - Justification for MLE
4.2.3 - Example: MLE for the Bernoulli distribution
4.2.4 - Example: MLE for the categorical distribution
4.2.5 - Example: MLE for the univariate Gaussian
4.2.6 - Example: MLE for the multivariate Gaussian
4.2.7 - Example: MLE for linear regression
4.3 - Empirical risk minimization (ERM)
4.3.1 - Example: minimizing the misclassification rate
4.4 - Other estimation methods *
4.4.1 - The method of moments
4.4.2 - Online (recursive) estimation
4.5 - Regularization
4.5.1 - Example: MAP estimation for the Bernoulli distribution
4.5.2 - Example: MAP estimation for the multivariate Gaussian *
4.5.3 - Example: weight decay
4.5.4 - Picking the regularizer using a validation set
4.6 - Bayesian statistics *
4.6.1 - Conjugate priors
4.6.2 - The beta-binomial model
4.6.3 - The Dirichlet-multinomial model
4.6.4 - The Gaussian-Gaussian model
4.6.5 - Beyond conjugate priors
4.6.6 - Credible intervals
4.6.7 - Bayesian machine learning
4.6.8 - Computational issues
4.7 - Frequentist statistics *
4.7.1 - Sampling distributions
4.7.2 - Gaussian approximation of the sampling distribution of the MLE
4.7.3 - Bootstrap approximation of the sampling distribution of any estimator
4.7.4 - Confidence intervals
4.7.5 - Caution: Confidence intervals are not credible
4.7.6 - The bias-variance tradeoff