In this lab we will use program MARK to construct statistical models for one- and two-age band recovery data. The basic time-specific models for these data structures have been around for many years, in programs such as ESTIMATE, BROWNIE, and MULT. MARK provides estimation under all of these historical models, but in addition provides tremendous flexibility for construction of other models, using parameter index matrices (PIMS) and design matrices. MARK also provides model comparison via AIC, model averaging, goodness of fit testing (including parametric bootstrapping), and many other features. We will also use the program BAND2 developed by Ken Wilson, to compute estimates of sample sizes needed for band recovery studies.
Here we provide a brief overview of band recovery data structures and
models. Further detail is provided by the lecture notes (part1 and part2) and by Chapter 16 of
Williams et al. (in press). The basis data structure for a band recovery
problem involves a vector of animals that have been banded and released, and a
matrix of recoveries from each banding occasion (i) at each recovery
occasion (j). For example the data for a i = 3 banding
occasions (e.g., years) and j= 4 recovery occasions (e.g., hunting
seasons) would be
|
|
Recoveries in period j |
||||
|
Releases in period i |
1 |
2 |
3 |
4 |
Not recovered |
|
R1 |
m11 |
m12 |
m13 |
m14 |
R1-( m11+m12+m13+m14) |
|
R2 |
|
m22 |
m23 |
m24 |
R2-( m22+m23+m24) |
|
R3 |
|
|
m33 |
m34 |
R3-( m33+m34) |
A multinomial model is constructed to estimated survival (Si) and recovery (fi) rates from the data, via maximum likelihood methods. The standard Brownie model for the above data structure would provide expected values for the data (i.e, given MLEs of the parameters) of :
EXPECTED NUMBER
|
|
Recoveries in period j |
||||
|
Releases in period i |
1 |
2 |
3 |
4 |
Not recovered |
|
R1 |
R1f1 |
R1S1f2 |
R1S1S2f3 |
R1S1S2S3f4 |
R1- (E[ m11]+E[m12]+E[m13]+E[m14]) |
|
R2 |
|
R2f2 |
R2S2f3 |
R2S2S3f4 |
R2- (E[m22]+E[m23]+E[m24]) |
|
R3 |
|
|
R3f3 |
R3S3f4. |
R3- (E[m33]+E[m34]) |
In this example, 3 recovery rates (f1 , f2, f3 ) and 2 survival rates (S1 , S2) can be estimated uniquely; the remaining 2 can only be estimated as a product S3f4 (i.e, a single parameter), which is the case anytime the number of recovery occasions (l) > the number of banding occasions (k).
The above example shows the model under the parameterization of Brownie et al. (1985) and is 1 of the options available in MARK. The other Seber parameterization is also available, and is explained further in the lecture notes and Chapter 16.
We illustrate the above basic model structure, with the added wrinkle of stratification by groups, using a data set for adult male and female American black ducks banded during 1989-98 (10 years) and recovered over the same 10 years. The data are encoded in the summary band recovery format for MARK, which is 1) a matrix of band recoveries, followed by 2) a vector of the numbers banded and released in each year. Read these data into program MARK. Note that in this example there are 10 encounter occasions (and 2 groups, male first and then female). In general, the number of 'encounter occasions' corresponds for a band recovery problem is the number of recovery periods. In MARK the data matrix must contain the same number of rows (capture occasions) as columns (recovery occasions). When k > l this is accomplished by simply adding 1 or more rows of zeros in the recovery matrix, and augmenting the banding vector by zeroes, shown here for an artificial data set resembling the black duck data augmented by one recovery year.
When these data are read into MARK, open and examine the PIM for the
"global" S(g*t)f(g*t) model. The first PIM is for survival rates of
males, and will look as follows:
|
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
|
|
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
|
|
|
3 |
4 |
5 |
6 |
7 |
8 |
9 |
|
|
|
|
4 |
5 |
6 |
7 |
8 |
9 |
|
|
|
|
|
5 |
6 |
7 |
8 |
9 |
|
|
|
|
|
|
6 |
7 |
8 |
9 |
|
|
|
|
|
|
|
7 |
8 |
9 |
|
|
|
|
|
|
|
|
8 |
9 |
|
|
|
|
|
|
|
|
|
9 |
That is, the first 9 parameters are the survival rates for adult males for
the 1st, 2nd, ... 9th interval (i.e., there
are 10 years of banding, so 9 intervals), S1, S2,
S3, ....., S9. These parameters are
time-specific but not cohort (year of release) specific, so for instance an
animal released in year 3 has exactly the same survival rate between year 5 and
6 (parameter index 5) as one release in year 1, since these indices are the same
throughout the column (representing survival year) for all rows (representing
release years). Likewise the female survival rates are indexed by
|
10 |
11 |
12 |
13 |
14 |
15 |
16 |
17 |
18 |
|
|
11 |
12 |
13 |
14 |
15 |
16 |
17 |
18 |
|
|
|
12 |
13 |
14 |
15 |
16 |
17 |
18 |
|
|
|
|
13 |
14 |
15 |
16 |
17 |
18 |
|
|
|
|
|
14 |
15 |
16 |
17 |
18 |
|
|
|
|
|
|
15 |
16 |
17 |
18 |
|
|
|
|
|
|
|
16 |
17 |
18 |
|
|
|
|
|
|
|
|
17 |
18 |
|
|
|
|
|
|
|
|
|
18 |
In the same manner, the recover rates for males are indexed by parameters 19-28, and for females of 29-28 (one for each year of banding for each sex).
This is the general or 'global' model describing time- and group (sex)-
specific survival and recovery. Reduced parameter models can be formed either
automatically (with the run|predefined models feature) or using the PIMs. For
example, survival can be made time- but not sex- specific by setting the first
2 PIMS equal to
|
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
|
|
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
|
|
|
3 |
4 |
5 |
6 |
7 |
8 |
9 |
|
|
|
|
4 |
5 |
6 |
7 |
8 |
9 |
|
|
|
|
|
5 |
6 |
7 |
8 |
9 |
|
|
|
|
|
|
6 |
7 |
8 |
9 |
|
|
|
|
|
|
|
7 |
8 |
9 |
|
|
|
|
|
|
|
|
8 |
9 |
|
|
|
|
|
|
|
|
|
9 |
Other models, including ones not available in the menu, can be created by
manipulating the PIMS. For example,
|
1 |
1 |
1 |
1 |
1 |
2 |
2 |
2 |
2 |
|
|
1 |
1 |
1 |
1 |
2 |
2 |
2 |
2 |
|
|
|
1 |
1 |
1 |
2 |
2 |
2 |
2 |
|
|
|
|
|
1 |
2 |
2 |
2 |
2 |
|
|
|
|
|
1 |
2 |
2 |
2 |
2 |
|
|
|
|
|
|
2 |
2 |
2 |
2 |
|
|
|
|
|
|
|
2 |
2 |
2 |
|
|
|
|
|
|
|
|
2 |
2 |
|
|
|
|
|
|
|
|
|
2 |
specifies that annual survival rates are constant during the 1st 5 years and last 4 years, but hypothesized as different between 2 periods. Many additional models can be constructed from the PIMs, but some models will require use of a design matrix. Return to the "global" model S(g*t)f(g*t); retrieve this model, and obtain the 'full' design matrix. Consider the portion of this matrix related just to survival in the 2 groups (parameters 1-18). You should see something like this:
|
S int |
S g1 |
S t1 |
S t2 |
S t3 |
S t4 |
S t5 |
S t6 |
S t7 |
S t8 |
Sg1*t1 |
Sg1*t2 |
Sg1*t3 |
Sg1*t4 |
Sg1*t5 |
Sg1*t6 |
Sg1*t7 |
Sg1*t8 |
|
1 |
1 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
|
1 |
1 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
|
1 |
1 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
|
1 |
1 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
|
1 |
1 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
|
1 |
1 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
|
1 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
|
1 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
|
1 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
|
1 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
|
1 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
|
1 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
|
1 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
|
1 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
|
1 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
|
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
|
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
|
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
where the first column is an intercept (on the logit scale), the second a "group effect" (1=males, 0=females, the next columns 8 columns are time (year) effects, and the last 8 columns are group*year interaction. As with the PIM version of this model, it allows for all combinations of survival rates in group*year, i.e. 18 survival, 9 for males and 9 for females. However, it may be possible to describe both sex- and time-specific survival variation in a biologically meaningful but simpler way. In particular, whereas there may be intrinsic differences in male and female survival (e.g., females have lower survival because of increased risk to predation during nesting), there are other factors that vary from year to year, such as weather conditions, that should affect males and females equally, one the average difference between male and female survival is accounted for. Under such a model there would continue to be group and time effects, but no group * time interaction. This model is formed by simply eliminating the Sg1*t1 - Sg1*t8 columns in the previous matrix, producing
|
S int |
S g1 |
S t1 |
S t2 |
S t3 |
S t4 |
S t5 |
S t6 |
S t7 |
S t8 |
|
1 |
1 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
|
1 |
1 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
|
1 |
1 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
|
1 |
1 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
|
1 |
1 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
|
1 |
1 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
|
1 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
|
1 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
|
1 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
|
1 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
|
1 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
|
1 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
|
1 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
|
1 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
|
1 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
|
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
|
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
|
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
This is a model in which (on the logit scale) group effects are additive to time effects; a plot of logit-transformed survival rates over time would appear parallel for the 2 groups (males and females), with one group consistently above or below the other one.
If there is evidence (e.g, based on AIC) that this model performs well compared to the more complex model, it probably should be used to describe the time and sex variation in survival for 3 reasons: 1) it makes biological sense, a priori; 2) it is a simpler model, with fewer parameters, and 3) often it is easier to understand biological effects in the absence of interaction. The same logic may hold true for recovery rates: if there are sex-specific harvest regulations then a priori one expects the harvest rates for one sex to be higher or lower than for the other sex. Therefore construction of additive models for the recovery rate groups of parameters may be sensible, as well.
Goodness of fit and QAIC
Before proceeding further with model building and testing, it is important to check the "global model" to see if it adequately fits the data. MARK produces a "deviance" statistic which can be used as a measure of model fit. Under the null hypothesis that the model fits the data, this statistic follows a chi-square distribution with degrees of freedom (df) supplied by the MARK output, and one may perform a test of this null hypothesis by comparison to a chi-square table. However, a somewhat more useful approach is based on the recognition that if the model fits the data, in theory the ratio
c-hat= deviance / df,
which is also produced by MARK, should equal 1. In practice this ratio often exceeds 1, even for models in which all sources of variation (e.g., time, group, age) are captured by the model. There are 2 possible reasons for this. First, an even more complex model is needed to capture variation in the data, i.e, there is heterogeneity in survival, recovery, or both that is not captured by the model. This could happen when one is using a one-age model inappropriately, when juvenile animals are part of the released sample. Stratifying the data by age and using a 2-age model may solve this problem. However, even when all identifiable forms of variability have been modeled it is possible that the basic multinomial model still does not capture properly reflect variability in the data, due to what is known as "overdispersion" or "extra-biniomial variability". In either case, likelihood values, AIC, and parameter variances should all be adjusted accordingly. Assuming that c-hat fairly represents this "extra-biniomial variability", the preceding statistics can all be adjusted with MARK, using the 'adjustments' tool in the results browser. This adjustment produces what is known as a "quasilikelihood" or likelihood adjusted by the variance inflation factor, and from this "quasi-AIC" or QAIC; estimated variances and confidence intervals are also adjusted. In the black duck example MARK produces a c-hat of 1.26, which should be used as an adjustment. Computation of c-hat based on the model deviance is sometimes not an accurate reflection of "extra-biniomial variability", and alternatives based on bootstrap sampling exist for the band recovery (Seber parameterization) and some other models in MARK. I performed 1000 bootstrap simulations under the S(g*t) f(g*t) model and came up with an average c-hat of 1.18. Since the simulations generated data where S(g*t) f(g*t) was the "true" model, a fairer computation of c-hat based on the data would 1.26/1.18 = 1.07, which is the value I recommend as an adjustment.
One-age example with a time covariate:
As with known fates data, both time-specific and individual covariates can
be incorporated into band recovery analysis. As an example, we have data from
adult female mallards during 1966-78 in Prairie
D={0.978, 2.245, 1.268, 1.090. 1.336, 1.307, 2.260, 0.665, 0.890, 1.318, 2.116, 0.871}.
Read the band recovery data for these mallards (13 occasions, 1 group, no individual covariates). Construct and run the global S(g*t) f(g*t). Then retrieve the design matrix for this model and obtain the "full" design matrix.
|
S_int |
S t1 |
S t2 |
S t3 |
S t4 |
S t5 |
S t6 |
S t7 |
S t8 |
S t9 |
S t10 |
S t11 |
|
1 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
|
1 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
|
1 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
|
1 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
|
1 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
|
1 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
|
1 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
|
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
|
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
|
1 |
0 |
0 |
0 |
|