The objectives of this laboratory are

  • To familiarize students with important concepts in likelihood-based inference
  • To examine the impacts of sample size and detection rates in statistical inference, and see how these might affect sample design and the selection estimators.
  • Use selected computer-intensive approaches for variance estimation
  • Introduce students to Bayesian statistical methods.

Maximum Likelihood

Be sure to review the appropriate sections in Chapter 4 on maximum likelihood estimation.

First, download and extract the R files for this lab. Before we actually get the maximum of the likelihood, let's examine the binomial likelihood for some sample data, using the R program binomial_like.R. This program allows the user to specify N binomial trials and x "successes" and displays the likelihood, log likelihood, and gradient (derivative of the log likelihood with respect to p) for a range of p values (theoretically 0 to 1).

Here we provide a thumbnail sketch of how to find the maximum of a likelihood function; in other words, the mechanics of MLE. We will use a very simple but nonetheless important likelihood, the binomial, to illustrate most of these points.

The first thing you should see is that the procedures for maximizing likelihoods are the same as those we used in the last lab finding the maximum of a function. These approaches will work for any continuous, differentiable function of the parameters. Before you protest: "but the binomial is a discrete distribution," remember that we are discussing the likelihood function- which is a continuous function of the parameters (in the case, the single parameter p). We will use a R program and the Newton method in bin_newt.R to obtain the MLE ; you should confirm that the numerical result from this method is the same that which you'd expect using the intuitive estimator p-hat= x/n. For your additional enjoyment, I've included a second SAS program that performs the Newton method for a 2-parameter distribution (a trinomial, actually).

Model evaluation and comparison: Likelihood ratio and AIC

One of the most important themes of this course is that "naked" estimates of parameters are of little value: one simply must report measures of credibility for these estimates. Standard confidence intervals provide some information on parameter precision, but we must also have means of determining goodness of fit. In addition, we often (usually) will be faced with the choice of several models that may fit the same data. We will consider 2 approaches that differ philosophically: one, likelihood ratio testing (LRT), is based on comparison of null and alternative models, and the other, AIC, is based on information theory. Review the following to see how AIC and LRT can both be computed from the likelihood. In the example, we will be using R to compute likelihoods, LRT, and AIC for simple 2-sample binomial data sets (program aic.R). For each data set (creatively labeled "1, 2 and 3") the program calculates the likelihood under 2 models: model 1 with 2 parameters (1 for each sample), and model 0 with 1 parameter (common to both samples). The program then computes a LRT of Model 1 vs. Model 0, and computes AIC for each model. The 3 samples exhibit differing degrees of difference in the underlying parameters between the sample; examples can be added with different sample sizes.

Profile likelihood

Once we've selected the 'best' (e.g., based on AIC) model for estimation, we still need to provide information on how reliable the estimate(s) from the model is (are). A conventional way of doing this is to report 95% (or other %) confidence limits on the parameter, which to a frequentist statistician signifies the proportion of sample estimates whose confidence intervals would include the true parameter value in a very large number of repeated samples. As noted in Chapter 4, a property of MLE is that ML estimates asymptotically (i.e., in the limit, with big sample sizes) follow a normal distribution, so that one easy way to get (for example) a 95% percent confidence interval is

ESTIMATE ± SE (ESTIMATE) *1.96

with 1.96 corresponding to the 5% standard normal deviate. However, for small sample sizes this approach tends to work poorly (i.e., coverage is < nominal); in addition certain parameter types (such as probabilities) have natural upper and lower limits (1 and 0) and the approach can easily result in confidence intervals that include logically inadmissible parameter values (negative, >1). An alternative is profile likelihood, which as the name implies, directly uses the likelihood function to calculated a confidence interval. Computations of profile likelihood are illustrated in the program binomial_profile.R . An easy way to get a quick approximation of the profile confidence limits is to find 5% on the prof_p probability axis (the vertical axis) and the 2 points on the curve that this probability intersects; dropping a line to the horizontal axis will give the values of p corresponding to the upper and lower confidence limits.

Markov-chain- Monte Carlo: A Bayesian alternative to MLE

            First, read the brief tour of Bayes and MCMC in the lecture notes. In this exercise, we will use WinBugs in R to perform Bayesian estimation via MCMC. We will do the same example as in the ‘brief tour’: a binomial experiment with n=100 and x=30. Open beta_binom.R  in R. There are 2 parts of the program: a model class in which the model parameters and likelihood are defined, and a main program in which the data are defined and the model is called, along with some options for diagnostics. The program runs 10,000 MCMC iterations of this likelihood with a beta(a,b) prior, discarding the first 5,000 as burn-in. Model summary output is displayed in the interactive window (this can be sent to an output file if desired) and graphics (parameter traces, histograms, and convergences diagnostics) are automatically saved to graphics files. Run the program with the default values of a=1, b=1, providing a uniform prior. Then look at the estimates in the output and the graphics so we can discuss these. Run the program again with a=20, b=20. What changes in the output/ graphics? What does this mean?

Assignment 2. Due by beginning of next lab. Send a 1-2 page write up accompanied by any programs or other methods used to derive solutions to TA via email.

 


Send mail to Instructor Return to home page

Last updated 08 December 2008

Powered by Zope