4 - Statistics

Contents

4 - Statistics
  4.1 - Introduction
  4.2 - Maximum likelihood estimation (MLE)
    4.2.1 - Definition
    4.2.2 - Justification for MLE
    4.2.3 - Example: MLE for the Bernoulli distribution
    4.2.4 - Example: MLE for the categorical distribution
    4.2.5 - Example: MLE for the univariate Gaussian
    4.2.6 - Example: MLE for the multivariate Gaussian
    4.2.7 - Example: MLE for linear regression
  4.3 - Empirical risk minimization (ERM)
    4.3.1 - Example: minimizing the misclassification rate
    4.3.2 - Surrogate loss
  4.4 - Other estimation methods *
    4.4.1 - The method of moments
    4.4.2 - Online (recursive) estimation
  4.5 - Regularization
    4.5.1 - Example: MAP estimation for the Bernoulli distribution
    4.5.2 - Example: MAP estimation for the multivariate Gaussian *
    4.5.3 - Example: weight decay
    4.5.4 - Picking the regularizer using a validation set
    4.5.5 - Cross-validation
    4.5.6 - Early stopping
    4.5.7 - Using more data
  4.6 - Bayesian statistics *
    4.6.1 - Conjugate priors
    4.6.2 - The beta-binomial model
    4.6.3 - The Dirichlet-multinomial model
    4.6.4 - The Gaussian-Gaussian model
    4.6.5 - Beyond conjugate priors
    4.6.6 - Credible intervals
    4.6.7 - Bayesian machine learning
    4.6.8 - Computational issues
  4.7 - Frequentist statistics *
    4.7.1 - Sampling distributions
    4.7.2 - Gaussian approximation of the sampling distribution of the MLE
    4.7.3 - Bootstrap approximation of the sampling distribution of any estimator
    4.7.4 - Confidence intervals
    4.7.5 - Caution: Confidence intervals are not credible
    4.7.6 - The bias-variance tradeoff
  4.8 - Exercises