The objectives of this laboratory are
Maximum Likelihood
Be sure to review the appropriate sections in Chapter 4 on maximum likelihood estimation.
First, download and extract the R files for this lab. Before we actually get the maximum of the likelihood, let's examine the binomial likelihood for some sample data, using the R program binomial_like.R. This program allows the user to specify N binomial trials and x "successes" and displays the likelihood, log likelihood, and gradient (derivative of the log likelihood with respect to p) for a range of p values (theoretically 0 to 1).
Here we provide a thumbnail sketch of how to find the maximum of a likelihood function; in other words, the mechanics of MLE. We will use a very simple but nonetheless important likelihood, the binomial, to illustrate most of these points.
The first thing you should see is that the procedures for maximizing likelihoods
are the same as those we used in the last lab finding the maximum of a
function. These approaches will work for any continuous, differentiable
function of the parameters. Before you protest: "but the binomial is a discrete
distribution," remember that we are discussing the likelihood function-
which is a continuous function of the parameters (in the case, the
single parameter p). We will use a R program and the
Model evaluation and comparison: Likelihood ratio and AIC
One of the most important themes of this course is that "naked" estimates of parameters are of little value: one simply must report measures of credibility for these estimates. Standard confidence intervals provide some information on parameter precision, but we must also have means of determining goodness of fit. In addition, we often (usually) will be faced with the choice of several models that may fit the same data. We will consider 2 approaches that differ philosophically: one, likelihood ratio testing (LRT), is based on comparison of null and alternative models, and the other, AIC, is based on information theory. Review the following to see how AIC and LRT can both be computed from the likelihood. In the example, we will be using R to compute likelihoods, LRT, and AIC for simple 2-sample binomial data sets (program aic.R). For each data set (creatively labeled "1, 2 and 3") the program calculates the likelihood under 2 models: model 1 with 2 parameters (1 for each sample), and model 0 with 1 parameter (common to both samples). The program then computes a LRT of Model 1 vs. Model 0, and computes AIC for each model. The 3 samples exhibit differing degrees of difference in the underlying parameters between the sample; examples can be added with different sample sizes.
Profile likelihood
Once we've selected the 'best' (e.g., based on AIC) model for estimation, we still need to provide information on how reliable the estimate(s) from the model is (are). A conventional way of doing this is to report 95% (or other %) confidence limits on the parameter, which to a frequentist statistician signifies the proportion of sample estimates whose confidence intervals would include the true parameter value in a very large number of repeated samples. As noted in Chapter 4, a property of MLE is that ML estimates asymptotically (i.e., in the limit, with big sample sizes) follow a normal distribution, so that one easy way to get (for example) a 95% percent confidence interval is
ESTIMATE ± SE (ESTIMATE) *1.96
with 1.96 corresponding to the 5% standard normal deviate. However, for small sample sizes this approach tends to work poorly (i.e., coverage is < nominal); in addition certain parameter types (such as probabilities) have natural upper and lower limits (1 and 0) and the approach can easily result in confidence intervals that include logically inadmissible parameter values (negative, >1). An alternative is profile likelihood, which as the name implies, directly uses the likelihood function to calculated a confidence interval. Computations of profile likelihood are illustrated in the program binomial_profile.R . An easy way to get a quick approximation of the profile confidence limits is to find 5% on the prof_p probability axis (the vertical axis) and the 2 points on the curve that this probability intersects; dropping a line to the horizontal axis will give the values of p corresponding to the upper and lower confidence limits.
Markov-chain-
First, read the brief
tour of Bayes and MCMC in the lecture notes. In this exercise, we will use WinBugs in R to perform Bayesian estimation via MCMC. We
will do the same example as in the ‘brief tour’: a binomial experiment with
n=100 and x=30. Open beta_binom.R in R. There are 2
parts of the program: a model class in which the model parameters and
likelihood are defined, and a main program in which the data are defined and
the model is called, along with some options for diagnostics. The program runs
10,000 MCMC iterations of this likelihood with a beta(a,b)
prior, discarding the first 5,000 as burn-in. Model summary output is displayed
in the interactive window (this can be sent to an output file if desired) and
graphics (parameter traces, histograms, and convergences diagnostics) are
automatically saved to graphics files. Run the program with the default values
of a=1, b=1, providing a uniform prior. Then look at the estimates in the
output and the graphics so we can discuss these. Run the program again with
a=20, b=20. What changes in the output/ graphics? What does this mean?
Last updated 08 December 2008