
CHAPTER 14
ESTIMATION OF SPECIES RICHNESS AND OTHER COMMUNITY PARAMETERS
WORKED EXAMPLES
Box 14.1 Analysis of Community Richness Estimates Using CAPTURE
Here we revisit part of the data from our research examples on presence-absence analysis in Chapter 8. The data set comes from surveys of songbirds on former agricultural fields that were being converted to longleaf pine through a government program designed to take crop fields out of production and restore one of the formerly widespread ecosystems of the region. We completed three surveys (sampling periods) of each of 41 fields (quadrats) in southern Georgia during 2001. Each survey of a quadrat followed a 250 m transect, which standardized effort. Over the 2 years of the study we detected 43 species, most of which are classified as grassland or shrub-scrub nesters.
The accompanying spreadsheet shows the data layout for all the longleaf pine fields we surveyed in southern Georgia over the 3 survey periods. Each row is a species, and each column represents one survey (sampling period) undertaken during the 2001 nesting season.
In our first analysis we use the online version of program CAPTURE on the Patuxent website to carry out the community analysis. We begin by copying our data from the spreadsheet into a text editor to create a text file in which the first two columns hold the identifier, followed by one space, followed by the 3 data columns. This is necessary because the program cannot read spreadsheet formats; it needs the data in ASCII format. The next step is to create the control lines for the program. In this case we use 4 command lines: title, task, format, and read input data. The title line is self-explanatory. The task line gives the number of surveys (capture occasions). The format line tells the program how the data are laid out: a2 means the identifier occupies the first 2 columns, 1x skips one blank space, and 3f1.0 means that 3 data values follow, each one column wide. The read statement tells the program to start reading the data. Scroll down on the CAPTURE site to the data analysis box and either copy and paste the command lines below or type them in manually. Then copy and paste in the data from your text file, followed by the final task lines. Finally, press the run button; the output file will appear in your browser.
The task lines at the end tell the program which models to run and what other output to provide. They can simply be copied and pasted into the data analysis box on the CAPTURE website.
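Assembling these control lines by hand becomes error-prone with many species or occasions. The Python sketch below writes the same layout from a set of detection histories. The function `build_capture_input` is our own hypothetical helper, not part of CAPTURE; it simply mirrors the control and task lines used in this box, and it assumes 2-character identifiers and 0/1 histories of equal length.

```python
def build_capture_input(title, histories):
    """Build CAPTURE input text from {identifier: detection history}.

    Hypothetical helper: identifiers must be exactly 2 characters (the a2
    field) and every history a string of 0/1 values of the same length.
    """
    lengths = {len(h) for h in histories.values()}
    if len(lengths) != 1:
        raise ValueError("all detection histories must have the same length")
    t = lengths.pop()  # number of occasions (surveys)
    lines = [
        f"title='{title}'",
        f"task read captures occasions={t} x matrix",
        # a2: 2-character identifier; 1x: skip one blank; {t}f1.0: t one-column values
        f"format='(a2,1x,{t}f1.0)'",
        "read input data",
    ]
    for ident, hist in histories.items():
        if len(ident) != 2 or not set(hist) <= {"0", "1"}:
            raise ValueError(f"bad record: {ident!r} {hist!r}")
        lines.append(f"{ident} {hist}")
    # Task lines copied from the example in this box
    lines += [
        "task closure test",
        "task model selection",
        "task population estimate ALL",
        "task population estimate NULL JACKKNIFE REMOVAL ZIPPEN MT-CH MH-CH MTH-CH",
        "task population estimate APPROPRIATE",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # First few species from the 2001 data
    demo = {"10": "001", "11": "010", "12": "010", "13": "100"}
    print(build_capture_input("LLP Community line 2001", demo), end="")
```

The output can be pasted straight into the CAPTURE data analysis box; validating the identifier width up front avoids silent misreads by the fixed-width format statement.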
The following is the input we submitted to program CAPTURE for these data. For this analysis we assumed a priori that model Mbh is correct. We have truncated the output to show only the most important components.
title='LLP Community line 2001'
task read captures occasions=3 x matrix
format='(a2,1x,3f1.0)'
read input data
10 001
11 010
12 010
13 100
14 001
15 110
16 110
17 101
18 110
19 011
20 011
21 011
22 011
23 011
24 110
25 111
26 111
27 111
28 111
29 111
30 111
31 111
32 111
33 111
34 111
35 111
36 111
37 111
38 111
39 111
40 111
41 111
task closure test
task model selection
task population estimate ALL
task population estimate NULL JACKKNIFE REMOVAL ZIPPEN MT-CH MH-CH MTH-CH
task population estimate APPROPRIATE
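The summary statistics that CAPTURE prints (M(t+1), n., f(i), u(j) and M(j)) can be checked directly against the input histories. A minimal Python sketch, using the 32 histories listed above; the function name `capture_summary` and the constant `HISTORIES` are our own.

```python
from collections import Counter

# Detection histories for identifiers 10-41, in the order listed above
HISTORIES = (["001", "010", "010", "100", "001",
              "110", "110", "101", "110", "011",
              "011", "011", "011", "011", "110"]
             + ["111"] * 17)


def capture_summary(histories):
    """Return M(t+1), n., f(i), u(j) and M(j) for a list of 0/1 histories.

    Every history must contain at least one detection ("1").
    """
    t = len(histories[0])
    m_total = len(histories)                            # M(t+1): species detected
    n_dot = sum(h.count("1") for h in histories)        # n.: total detections
    by_count = Counter(h.count("1") for h in histories)
    f = [by_count.get(i, 0) for i in range(1, t + 1)]   # f(i): detected on i occasions
    first = Counter(h.index("1") for h in histories)    # occasion of first detection
    u = [first.get(j, 0) for j in range(t)]             # u(j): newly caught on occasion j
    m_before = [sum(u[:j]) for j in range(t + 1)]       # M(j): caught before occasion j
    return m_total, n_dot, f, u, m_before


if __name__ == "__main__":
    print(capture_summary(HISTORIES))
    # → (32, 76, [5, 10, 17], [23, 7, 2], [0, 23, 30, 32])
```

These are the same values CAPTURE reports for the removal (Mbh) and heterogeneity (Mh) estimators in the output that follows.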
The following are the model selection criteria of Otis et al. (1978) for choosing the model that best fits the data. The model with the maximum value is selected.
Model       M(o)   M(h)   M(b)   M(bh)   M(t)   M(th)   M(tb)   M(tbh)
Criteria    1.00   0.79   0.30   0.65    0.00   0.44    0.31    0.71
In this analysis we assumed a priori that Mbh would be the appropriate model. The criteria rank M(o) highest (1.00), followed by Mh (0.79), with Mbh at 0.65. Because Mh allows detection probability to vary among species, we present results for both Mbh and Mh below.
Model 1 Mbh
Population estimation with variable probability removal estimator.
See M(bh) or removal models of the Monograph for details.
Occasion j=           1     2     3
Total caught M(j)=    0    23    30    32
Newly caught u(j)=   23     7     2
k   N-hat   SE(N)   Chi-sq.   Prob.    Estimated p-bar(j), j=1,..., 3
1   32.00   0.907   0.82      0.3640   0.744   0.744   0.744
Population estimate is 32 with standard error 0.9068 (note: this is not a population size but the estimated number of species)
Approximate 95 percent confidence interval 32 to 32
Profile likelihood interval 32 to 35
Histogram of u(j) (frequencies 23, 7, 2)
Model 2 Mh
Population estimation with variable probability of capture by animal.
See model M(h) of the Monograph for details.
Number of trapping occasions was 3
Number of animals captured, M(t+1), was 32
Total number of captures, n., was 76
Frequencies of capture, f(i)
i=       1     2     3
f(i)=    5    10    17
Computed jackknife coefficients
       N(1)    N(2)    N(3)    N(4)    N(5)
1     1.667   2.000   2.000
2     1.000   0.833   0.833
3     1.000   1.000   1.000
The results of the jackknife computations
i   N(i)   SE(i)   .95 Conf. Limits   Test of N(i+1) vs. N(i), Chi-square (1 d.f.)
0   32
1   35.3   2.36    30.7   40.0        0.000
2   35.3   2.93    29.6   41.1        0.000
3   35.3   2.93    29.6   41.1        0.000
Average p-hat = 0.7677
Interpolated population estimate is 33 with standard error 2.3892 (note: this is not a population size but the estimated number of species)
Approximate 95 percent confidence interval 33 to 46
estimate: 33.0906906   se: 2.38921213
Histogram of f(i) (frequencies 5, 10, 17)
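Looking back at our original data, we see that our naïve estimate of the total number of species is 32, the number of species detected at least once. Our CAPTURE models give two estimates of species richness: Mbh provides an estimate of 32 (approximate 95% confidence interval 32 to 32; profile likelihood interval 32 to 35), whereas Mh gives an estimate of 33 (approximate 95% confidence interval 33 to 46).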
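The jackknife values in the Mh output can be reproduced from the capture frequencies. Under Mh, CAPTURE uses the Burnham and Overton jackknife, in which each order-k estimate is a linear combination N(k) = Σ a_ik f(i) with variance Σ a_ik² f(i) − N(k). The Python sketch below uses the finite-t coefficients for orders 1 and 2, which give the 1.667, 2.000 and 0.833 printed for t = 3. It does not attempt CAPTURE's interpolated estimate (33.09). The function name is our own.

```python
import math


def jackknife_exact(f, t):
    """First- and second-order jackknife estimates of N under model M(h).

    f[i-1] = number of species detected on exactly i of t occasions.
    Returns ((N1, SE1), (N2, SE2)) using the finite-t coefficients.
    """
    S = sum(f)                          # species detected at least once
    a1 = 1 + (t - 1) / t                # order 1: coefficient on f(1)
    rest1 = S - f[0]                    # all other species have coefficient 1
    N1 = a1 * f[0] + rest1
    var1 = a1 ** 2 * f[0] + rest1 - N1
    b1 = 1 + (2 * t - 3) / t            # order 2: coefficient on f(1)
    b2 = 1 - (t - 2) ** 2 / (t * (t - 1))   # order 2: coefficient on f(2)
    rest2 = S - f[0] - f[1]
    N2 = b1 * f[0] + b2 * f[1] + rest2
    var2 = b1 ** 2 * f[0] + b2 ** 2 * f[1] + rest2 - N2
    return (N1, math.sqrt(var1)), (N2, math.sqrt(var2))


if __name__ == "__main__":
    (n1, se1), (n2, se2) = jackknife_exact([5, 10, 17], t=3)
    print(f"N(1) = {n1:.1f} (SE {se1:.2f}); N(2) = {n2:.1f} (SE {se2:.2f})")
```

With f(i) = 5, 10, 17 this gives N(1) = N(2) = 35.3, with standard errors 2.36 and 2.93, matching the jackknife table above.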
It is apparent from the analysis that we are reasonably close to the true species richness of the study area with three surveys. However, this analysis includes only one year of data, and, as we will see later, species richness estimated from two years of data tells a somewhat different story.
Box 14.2. Estimation of Species Richness Using Count Data
In this example we estimate species richness from the empirical distribution of species abundances. We use the same longleaf pine bird community data from 2001, but this time take advantage of our count information. Using program SPECRICH (online analysis), we observed the following distribution of counts:
Number of individuals observed in 3 surveys | 1 | 2 | 3 | 4 | 5
Number of species | 5 | 1 | 2 | 1 | 4
K | N(JK) | SE(N(JK)) | T(K) | P(K)
1 | 37. | 3.1623 | 1.6787 | 0.0932
2 | 41. | 5.4772 | 1.5395 | 0.1237
3 | 46. | 8.3666 | 1.2282 | 0.2194
4 | 53. | 13.0384 | 1.2328 | 0.2176
5 | 66. | 22.0907 | 1.0000 | 1.0000
INTERPOLATED N = 37.0000   STD ERROR OF INTERPOLATED N = 3.1623
This output from SPECRICH suggests an estimated species richness of 37 ± 3.16 (SE) species. Compared with our results from CAPTURE (Box 14.1), where model Mh gave an estimate of 33 ± 2.39 species, this method suggests that possibly more species occur in this community than we estimated from either the CMR or occupancy models.
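The SPECRICH estimates apply the same jackknife idea to the frequencies of species represented by 1, 2, ... individuals. The values in the table above are reproduced by the limiting (large-sample) form of the jackknife, in which the coefficient on f(i) in the order-K estimate is 1 + (−1)^(i+1)·C(K, i), and the variance is Σ a_i² f(i) − N(K). Because the table lists only the first five frequency classes, the Python sketch assumes S = 32 species observed in total. That value is implied by N(1) − f(1) = 37 − 5, and it equals the 2001 total in Box 14.1. Function names are our own.

```python
import math


def limiting_coefficients(K):
    """Coefficients a_i (i = 1..K) of the limiting-form jackknife of order K."""
    return [1 + (-1) ** (i + 1) * math.comb(K, i) for i in range(1, K + 1)]


def jackknife_table(S, f, max_order):
    """Return (K, N(K), SE(N(K))) for K = 1..max_order.

    S: species observed; f[i-1]: species represented by exactly i individuals
    (at least max_order classes). Species outside the first K classes get
    coefficient 1.
    """
    rows = []
    for K in range(1, max_order + 1):
        a = limiting_coefficients(K)
        N = S + sum((a_i - 1) * f_i for a_i, f_i in zip(a, f))
        others = S - sum(f[:K])
        var = sum(a_i ** 2 * f_i for a_i, f_i in zip(a, f)) + others - N
        rows.append((K, N, math.sqrt(var)))
    return rows


if __name__ == "__main__":
    # S = 32 is an assumption (see text); f(1..5) from the table above
    for K, N, se in jackknife_table(32, [5, 1, 2, 1, 4], 5):
        print(K, N, round(se, 4))
```

This reproduces N(JK) = 37, 41, 46, 53 and 66 and the corresponding standard errors, which makes clear why the higher orders are so much less precise: the coefficients grow rapidly with K.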
Box 14.3. Assessing Community Richness using Occupancy Models
In this example we use the same data as in Box 14.1, except that we are not estimating community richness per se, but relative community richness following MacKenzie et al. (2006). The data layout and the habitat association of each species are presented in the spreadsheet at the Patuxent BBS site. Here we treat all species detected on the fields during the whole study as the species available in this ecosystem. Because the bird community of this region is well described, we could instead have consulted the regional species list and selected all potential candidate species based on known habitat requirements. In this analysis any species not detected during the 3 surveys in 2001 receives a detection history of “000”. We then assessed 3 models: constant detection, detection varying by survey, and detection with the habitat association of each species (grassland, shrub-scrub, or other) as a covariate. We used program PRESENCE to evaluate the data, essentially using the same format as in Box 7.1.
The highest-ranking model was the simple constant-detection model, which is reported below. In this situation the model suggests that detection does not differ importantly among species; that is, if a species is not detected, it is because it is not there rather than because it is harder to find. We draw this conclusion because our habitat association covariate, which we believed a priori to be linked with detection, was not important.
Model | AIC | delta AIC | AIC wgt | Model Likelihood | No.Par. | (-2*LogLike)
1 group, Constant P | 150.54 | 0.00 | 0.5503 | 1.0000 | 2 | 146.54
1 group, Survey-specific P | 152.14 | 1.60 | 0.2473 | 0.4493 | 4 | 144.14
psi(.),p(habitat) | 152.54 | 2.00 | 0.2024 | 0.3679 | 3 | 146.54
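The derived columns of a PRESENCE model-selection table can be recomputed from −2log(likelihood) and the number of parameters K. AIC = −2log L + 2K, delta AIC is the difference from the lowest AIC, the model likelihood is exp(−delta AIC/2), and the AIC weight is each model likelihood divided by their sum. A short Python sketch using the values in the table above; the function name is our own.

```python
import math


def aic_table(models):
    """models: list of (name, minus_2_log_lik, n_params).

    Returns rows (name, AIC, delta AIC, AIC weight, model likelihood), best first.
    """
    scored = sorted(((name, m2ll + 2 * k) for name, m2ll, k in models),
                    key=lambda row: row[1])
    best = scored[0][1]
    likelihoods = [math.exp(-(aic - best) / 2) for _, aic in scored]
    total = sum(likelihoods)
    return [(name, aic, aic - best, lik / total, lik)
            for (name, aic), lik in zip(scored, likelihoods)]


if __name__ == "__main__":
    box_14_3 = [("1 group, Constant P", 146.54, 2),
                ("1 group, Survey-specific P", 144.14, 4),
                ("psi(.),p(habitat)", 146.54, 3)]
    for name, aic, delta, weight, lik in aic_table(box_14_3):
        print(f"{name:28s} {aic:8.2f} {delta:5.2f} {weight:.4f} {lik:.4f}")
```

Note that the survey-specific model has the lowest −2log(likelihood) but still ranks second, because its two extra parameters cost 4 AIC units while improving the fit by only 2.4.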
The constant-detection model with no covariates carried 55% of the model weight. We present its output below:
Predefined Model: Detection probabilities are NOT time-specific
Number of groups = 1
Number of sites = 43
Number of sampling occasions = 3
Number of missing observations = 0
Number of parameters = 2
-2log(likelihood) = 146.541815
AIC = 150.541815
Naive estimate = 0.744186
Proportion of sites occupied (Psi) = 0.7518 (0.067385) (note: not occupancy, but rather our relative species richness estimate)
Probability of group membership (Theta) = 1.0000
Detection probabilities (p):
grp  srvy     p          se(p)
---  ----  ---------  -----------
 1     1   0.783650   ( 0.044349)
Variance-Covariance Matrix
         psi      p(G1)
       0.0045   -0.0002
      -0.0002    0.0020
The conclusions from this model differ from those of an occupancy analysis in the usual sense. In this model the proportion of sites occupied (ψ) represents our estimate of the relative proportion of species present on our sites during a particular season (in this case 2001). We observed 32 species in 2001 out of the 43 species recorded during the whole study, so our relative species richness estimate (0.7518 ± 0.067) was close to the naïve estimate of 0.744 (32/43).
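When no surveys are missing, the constant-detection model has nearly closed-form maximum likelihood estimates, which gives a useful check on the PRESENCE output. The estimate of p solves p/(1 − (1 − p)^K) = n./(M·K), where M is the number of species detected, n. is the total number of detections and K is the number of surveys. Psi then equals M/(N·(1 − (1 − p)^K)) for N species on the list. The Python sketch below assumes that the 43-species data set consists of the 32 detection histories from Box 14.1 (76 detections in total) plus 11 all-zero histories for species recorded only in 2002. That assumption is ours, but it reproduces the Psi, p and −2log(likelihood) reported above. The function name is our own.

```python
import math


def occupancy_constant_p(n_units, n_detected, n_dot, K):
    """MLE of psi and p for the psi(.)p(.) model with no missing surveys.

    n_units: species on the list (sampling units); n_detected: units with at
    least one detection; n_dot: total detections; K: surveys per unit.
    Returns (psi, p, -2 log-likelihood).
    """
    target = n_dot / (n_detected * K)
    lo, hi = 1e-12, 1 - 1e-12
    for _ in range(200):  # bisection: p / (1 - (1 - p)**K) increases with p
        mid = (lo + hi) / 2
        if mid / (1 - (1 - mid) ** K) < target:
            lo = mid
        else:
            hi = mid
    p = (lo + hi) / 2
    q = 1 - p
    p_star = 1 - q ** K                      # P(detected at least once | present)
    psi = n_detected / (n_units * p_star)    # valid while this is <= 1
    log_lik = (n_detected * math.log(psi) + n_dot * math.log(p)
               + (n_detected * K - n_dot) * math.log(q)
               + (n_units - n_detected) * math.log(psi * q ** K + 1 - psi))
    return psi, p, -2 * log_lik


if __name__ == "__main__":
    psi, p, m2ll = occupancy_constant_p(43, 32, 76, 3)
    print(f"psi = {psi:.4f}, p = {p:.4f}, -2logL = {m2ll:.2f}, AIC = {m2ll + 4:.2f}")
```

The all-zero histories enter only through the last term, psi·(1 − p)^K + (1 − psi), which is why treating unobserved species correctly matters for the richness estimate.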
Box 14.4. Multi-season Species Richness
In the following example we extend the community analysis to a multi-season occupancy model, in which we investigate changes in species richness between seasons. This data set has only 2 seasons, so its interpretation is somewhat limited. Again we expand the community data from Box 14.2, this time to include the years 2001-2002, so we have 6 surveys of our sites across the 2 years, with a habitat association covariate. We used program PRESENCE to evaluate the data, essentially using the same format as in Box 7.6. We present the overall model selection and the output from the highest-ranking model below. By now the reader should be familiar with the format of program PRESENCE. The data file simply contains 6 columns of survey data; in the data browser we divide the 6 surveys into two primary sampling periods.
Model | AIC | delta AIC | AIC wgt | Model Likelihood | No.Par. | (-2*LogLike)
psi,gamma(),eps(),p() | 278.96 | 0.00 | 0.8025 | 1.0 | 4 | 270.96
psi(.),gam(.),eps=1-gam,p() | 281.83 | 2.87 | 0.1911 | 0.2381 | 3 | 275.83
psi(),gamma(),p() | 288.60 | 9.64 | 0.0065 | 0.0081 | 3 | 282.60
Open Population Model:
Number of sites = 43
Total number of sampling occasions = 6
Number of primary sampling periods = 2
Number of missing observations = 0
Number of parameters = 4
****
Numerical convergence was not reached.
Parameter estimates converged to approximately 5.30 significant digits.
Number of function calls = 218
Final function value = 135.477956
-2log(likelihood) = 270.955911
AIC = 278.955911
Model has been fit using the logistic link.
Untransformed Estimates of coefficients for covariates (Beta's)
==============================================================================
                                estimate      std.error
A1 :occupancy psi1              0.157662      (0.308803)
B1 :colonization gam1          25.202574      (70501.883479)
C1 :local extinction eps1      -1.627110      (0.578594)
D1 :detection P[1-1]            1.297856      (0.184530)
Variance-Covariance Matrix of Untransformed estimates:
           A1          B1                    C1           D1
A1      0.095359   -0.003316            -0.001237    -0.001467
B1     -0.003316    4970515574.088944   -0.001708    -0.000178
C1     -0.001237   -0.001708             0.334771     0.005731
D1     -0.001467   -0.000178             0.005731     0.034051
------------------------------
Individual Site estimates of Psi:
Site                Survey   Psi      Std.err   95% conf. interval
1 American Crow 1   1-1:     0.5393   0.0767    0.3899 - 0.6820
Individual Site estimates of Gamma:
Site                Survey   Gamma    Std.err   95% conf. interval
1 American Crow 1   1-1:     1.0000   0.0000    0.0000 - 1.0000
Individual Site estimates of Eps:
Site                Season   Eps      Std.err   95% conf. interval
1 American Crow     1:       0.1642   0.0794    0.0595 - 0.3792
Individual Site estimates of p:
Site                Survey   p        Std.err   95% conf. interval
1 American Crow 1   1-1:     0.7855   0.0311    0.7183 - 0.8402
In this case the top two models are within 3 AIC units of each other (delta AIC = 2.87), and both include colonization; the model with the greatest weight (0.80) estimates colonization and local extinction separately. When we consider the structure of the data and our previous analysis of the first year of data, even a cursory examination shows that the number of species in the songbird community increased from 2001 to 2002. In this particular data set this might simply mean that 2002 was a better year than 2001, but it might also mean that our management, that is, planting longleaf pine trees in old agricultural fields, attracts more species as the plant community develops. Unfortunately, we have only 2 years of data; with more years we might be able to tease apart these important issues.
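The logit-scale coefficients (Beta's) in the output can be back-transformed to reproduce the estimates listed for each species. Each probability is expit(beta). Its standard error follows from the delta method, est·(1 − est)·SE(beta), and the 95% interval is formed on the logit scale and then back-transformed. The Python sketch below also derives the proportion of the species list present in the second season from the standard multi-season recursion psi2 = psi1·(1 − eps) + (1 − psi1)·gamma. Treat that derived value cautiously. Gamma sits on the boundary (a beta of 25.2 with an enormous standard error), which is likely related to the warning that numerical convergence was not reached. Function names are our own.

```python
import math


def expit(x):
    """Numerically stable inverse logit (avoids overflow for large |x|)."""
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    e = math.exp(x)
    return e / (1 + e)


def back_transform(beta, se, z=1.96):
    """Probability estimate, delta-method SE and logit-scale 95% CI."""
    est = expit(beta)
    return est, est * (1 - est) * se, (expit(beta - z * se), expit(beta + z * se))


# Untransformed coefficients and standard errors from the output above
BETAS = {
    "psi1": (0.157662, 0.308803),
    "gam1": (25.202574, 70501.883479),
    "eps1": (-1.627110, 0.578594),
    "p": (1.297856, 0.184530),
}

if __name__ == "__main__":
    est = {}
    for name, (b, s) in BETAS.items():
        e, se, (lo, hi) = back_transform(b, s)
        est[name] = e
        print(f"{name:5s} {e:.4f} {se:.4f} {lo:.4f} - {hi:.4f}")
    # Derived proportion of listed species present in season 2 (illustrative)
    psi2 = est["psi1"] * (1 - est["eps1"]) + (1 - est["psi1"]) * est["gam1"]
    print(f"psi2  {psi2:.4f}")
```

The printed values match the Psi, Gamma, Eps and p rows of the output. The derived psi2 (about 0.91 against 0.54 in 2001) is consistent with the increase in species between years described above.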
Box 14.5. Community estimation using species diversity among sites
In this example we summarize the 3 surveys of each longleaf pine quadrat surveyed during 2001 in southern Georgia. Each column represents one of the sampled quadrats, and a detection (AC-C1) here means that the species was detected during any of the 3 surveys over the breeding season. This model approaches the question of community diversity from a slightly different perspective than the previous occupancy models on the same data set. Here we ask questions about the diversity of species across our study sites.
Study site
Species | C1 | C2 | F1 | F2 | G1 | GM1 | H1 | H2 | HA1 | HA2 | HO1 | HO2 | HO3 | L1 | L2 | M1 | M2 | ME1 | MM1 | MM2 | ETC
American Crow | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
American Goldfinch | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Barn Swallow | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 |
Blue Grosbeak | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 |
cont | 1 | 1 | 1 |
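The site-level matrix above is built by collapsing each species' 3 surveys at a quadrat into a single detected/not-detected value; each site then plays the role of a capture occasion. The Python sketch below shows that step and the numbering of species for the CAPTURE format. The survey-level values in the usage example are hypothetical (only the site-level outcomes for C1, C2 and F1 match the table), and the function names are our own.

```python
def collapse_to_sites(surveys, sites):
    """Collapse survey-level detections to one 0/1 value per site.

    surveys maps species -> {site: [0/1 result for each survey]}.
    A species scores 1 at a site if detected on any survey there; species
    never detected anywhere are dropped, since CAPTURE takes only detected units.
    """
    rows = {}
    for species, by_site in surveys.items():
        row = "".join("1" if any(by_site.get(site, ())) else "0" for site in sites)
        if "1" in row:
            rows[species] = row
    return rows


def capture_records(rows, first_id=10):
    """Number species with 2-digit identifiers for the (a2,1x,...) format."""
    records = []
    for offset, row in enumerate(rows.values()):
        ident = first_id + offset
        if ident > 99:
            raise ValueError("identifier needs more than 2 characters; widen a2")
        records.append(f"{ident} {row}")
    return records


if __name__ == "__main__":
    sites = ["C1", "C2", "F1"]
    surveys = {  # hypothetical survey-level results
        "American Crow": {"C1": [0, 0, 0], "C2": [0, 0, 0], "F1": [0, 0, 0]},
        "Barn Swallow": {"C1": [0, 1, 0], "C2": [0, 0, 0], "F1": [0, 0, 0]},
        "Blue Grosbeak": {"C1": [1, 0, 0], "C2": [0, 0, 1], "F1": [0, 0, 0]},
    }
    print(capture_records(collapse_to_sites(surveys, sites)))
```

Dropping all-zero species (here the American Crow) mirrors the CAPTURE data below, which list only the species detected on at least one site.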
In this analysis we could use either program CAPTURE or SPECRICH2. In either case we would use model Mh, which allows for heterogeneity in detection (capture) probability among species, as the appropriate model to estimate species richness across all of our sample quadrats. The sample data set, with 41 survey sites and 32 species, is set up for program CAPTURE as follows:
title='llp line 2001 by site'
task read captures occasions=41 x matrix
format='(a2,1x,41f1.0)'
read input data
10 00000000000000000000000000000000000010000
11 00000000010000000000000000000000000000000
12 00000000000000000000000100000000000000000
13 00000000000001000000000000000000000000000
14 00000010000000000000000000000000000000000
15 00000000000000000000001000000000000000000
16 00000000010000000000000000000000000000000
17 00000000000000000000000000000000000000010
18 00000000000010000100000000000000000000000
19 00100000100000000000000000000000000000000
20 00001100000010000000000000000000000000000
21 00000100010000000000000000000010000000000
22 00010000000000000000000000000010000011000
23 00000000100000000000000000100000110000000
24 00000000000000001111000000000000001000000
25 00000010000000000000000000001011000011000
26 00010111000100000000000000000000000000001
27 00001100000010010010000000000000101000000
28 01001110001000010000000000100100010000000
29 00001110010000101100000001000000100000000
30 00000000011100000011000100000000111010000
31 00001000001000000100000001101001000001110
32 10001110000001001100011000001000100001000
33 10000110000100010100110000010000001010100
34 10001110110000110001100000000110000000001
35 01100010101100100000000010001110001000110
36 01011100110000010000000001100111010000010
37 01010110101010101110000001100110111010010
38 00001101110100101101111111100001000000101
39 00110000111001101000111111111110000110101
40 10101111001010110110111111100110011000111
41 11011111111001111010111111111101110011111
task closure test
task model selection
task population estimate JACKKNIFE
Population estimation with variable probability of capture by animal.
See model M(h) of the Monograph for details.
Number of trapping occasions was 41
Number of animals captured, M(t+1), was 32
Total number of captures, n., was 275
Frequencies of capture, f(i)
i=    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
f(i)= 8  2  2  2  1  2  1  0  2  2  0  2  1  2  0  0  0  0  0  2  0  0  1  0  0  1  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0
Computed jackknife coefficients
       N(1)    N(2)    N(3)    N(4)    N(5)
1     2.000   3.000   4.000   5.000   6.000
2     1.000   0.000  -2.000  -5.000  -9.000
3     1.000   1.000   2.000   5.000  11.000
4     1.000   1.000   1.000   0.000  -4.000
5     1.000   1.000   1.000   1.000   2.000
The results of the jackknife computations
i   N(i)   SE(i)   .95 Conf. Limits   Test of N(i+1) vs. N(i), Chi-square (1 d.f.)
0   32
1   40.0    4.00   32.2    47.8       3.930
2   46.0    6.93   32.4    59.6       2.067
3   52.0   10.58   31.3    72.7       0.777
4   58.0   16.12   26.4    89.6       0.168
5   63.0   26.12   11.8   114.2       0.000
Average p-hat = 0.1677
Interpolated population estimate is 40 with standard error 6.9820
Approximate 95 percent confidence interval 34 to 66
estimate: 40.2885056   se: 6.98204231
Histogram of f(i) (first 13 frequency classes: 8 2 2 2 1 2 1 0 2 2 0 2 1)
This model suggests that there are an estimated 40 species (95% CI 34 to 66) on our study sites.
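The last column of the jackknife table tests whether moving from order i to order i + 1 changes the estimate significantly. The statistic can be reproduced from the capture frequencies. Let b_i be the change in the coefficient on f(i) between the two orders. The statistic is then (N(i+1) − N(i))² divided by S/(S − 1)·[Σ b_i² f(i) − (N(i+1) − N(i))²/S]. The Python sketch below uses the limiting coefficients shown in the output and reproduces the chi-square values 3.930, 2.067, 0.777 and 0.168 printed above. Its square root also reproduces the T(K) column of SPECRICH in Box 14.2 (for example, 1.6787 for K = 1). Function names are our own.

```python
import math


def coefficients(K, length):
    """Limiting jackknife coefficients for order K, padded with 1s to `length`."""
    a = [1 + (-1) ** (i + 1) * math.comb(K, i) for i in range(1, K + 1)]
    return a + [1] * (length - K)


def order_test(S, f, K):
    """Chi-square (1 d.f.) statistic comparing jackknife orders K + 1 and K.

    S: species detected; f: capture frequencies f(1), f(2), ... (at least
    K + 1 values; higher classes have equal coefficients and cancel out).
    """
    n = len(f)
    b = [hi - lo for lo, hi in zip(coefficients(K, n), coefficients(K + 1, n))]
    diff = sum(bi * fi for bi, fi in zip(b, f))          # N(K+1) - N(K)
    var = S / (S - 1) * (sum(bi ** 2 * fi for bi, fi in zip(b, f)) - diff ** 2 / S)
    return diff ** 2 / var


if __name__ == "__main__":
    f = [8, 2, 2, 2, 1, 2, 1, 0, 2, 2, 0, 2, 1, 2]   # f(1)..f(14) from the output
    for K in range(1, 5):
        print(K, round(order_test(32, f, K), 3))
```

The large statistic for the first comparison (3.93) and the small ones thereafter show why the estimate settles near the lower jackknife orders rather than the increasingly imprecise higher ones.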
Box 14.6. Community analysis by site using PRESENCE
Here we apply occupancy models to the same data as in Box 14.5. However, in this example we included all species detected during the study, so the estimates here are of relative species richness. In PRESENCE we ran 3 models: constant detection, variable detection (in this case by site), and constant detection with a habitat covariate for each species. The model table output from PRESENCE 2.0 was as follows.
Model | AIC | delta AIC | AIC wgt | Model Likelihood | No.Par. | (-2*LogLike)
1 group, Constant P | 1400.14 | 0.00 | 0.7311 | 1.0000 | 2 | 1396.141405
psi(.),p(habitat) | 1402.14 | 2.00 | 0.2689 | 0.3679 | 3 | 1396.1414
1 group, Survey-specific P | 1433.93 | 33.79 | 0.0000 | 0.0000 | 42 | 1349.928371
We now provide the output from the model with the lowest AIC to demonstrate the results and interpretation.
Predefined Model: Detection probabilities are NOT time-specific
Number of groups = 1
Number of sites = 43
Number of sampling occasions = 41
Number of missing observations = 0
Number of parameters = 2
-2log(likelihood) = 1396.141405
AIC = 1400.141405
Naive estimate = 0.744186
Proportion of sites occupied (Psi) = 0.7442 (0.066549)
Probability of group membership (Theta) = 1.0000
Detection probabilities (p):
grp  srvy     p          se(p)
---  ----  ---------  -----------
 1     1   0.209590   ( 0.011524)
Variance-Covariance Matrix
         psi      p(G1)
       0.0044   -0.0000
      -0.0000    0.0001
The output here provides us with 2 estimates. The first is what PRESENCE describes as detection probability; in this analysis it actually represents the proportion of the species present that are recorded on any one survey site, in this case 0.210 ± 0.012. Applied to the roughly 32 species estimated to be present (0.744 × 43), this corresponds to about 7 species on any particular site, consistent with the 275 detections spread over 41 sites in Box 14.5 (275/41 ≈ 6.7). Our model of survey-specific detection (in this case specific to survey site) ranked last and had a model weight of 0.0, suggesting that species richness did not vary among sites.
Our naïve estimate of relative species richness was 0.744, which is essentially the same as our model estimate of relative species richness of 0.744 ± 0.067. These results suggest that, across the 41 study sites used in this analysis, we are likely observing all of the species that were present on the sites in 2001, which represents about 74.4% of the species richness we observed in the study.
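Why does the estimate of relative species richness equal the naïve estimate here? With 41 sites acting as sampling occasions, a species present in the community escapes detection at every site with probability (1 − p)^41, which is tiny. Under the constant-detection model, Psi-hat is the naïve proportion divided by the probability of at least one detection. A quick Python check using the reported p (variable names are our own):

```python
p = 0.209590          # detection probability reported by PRESENCE
K = 41                # study sites acting as sampling occasions
naive = 32 / 43       # species detected / species on the study list

miss_everywhere = (1 - p) ** K   # chance a present species is never recorded
p_star = 1 - miss_everywhere     # chance it is recorded at least once
psi_hat = naive / p_star         # constant-detection occupancy estimate

print(f"P(missed at all {K} sites) = {miss_everywhere:.1e}")
print(f"psi-hat = {psi_hat:.4f}")
```

The miss probability is on the order of 10^-4 or smaller, so dividing by p_star leaves the naïve proportion essentially unchanged (0.7442), in contrast to Box 14.3, where only 3 surveys were available.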