
 

 

 

CHAPTER 14

 

ESTIMATION OF SPECIES RICHNESS AND OTHER COMMUNITY PARAMETERS

 

WORKED EXAMPLES

 

 

 

Box 14.1 Analysis of Community Richness Estimates Using CAPTURE

 

Here we revisit part of the data from our research examples on presence-absence analysis from Chapter 8.  This data set represents surveys of songbirds on former agricultural fields that were being converted to longleaf pines through a government program designed to take crop fields out of production and put them back into one of the formerly widespread ecosystems in the region.  We completed three surveys (sampling periods) of each of 41 fields (quadrats) from southern Georgia during 2001.  Our individual surveys on each quadrat were completed using a 250 m transect thereby standardizing effort.  In this study we detected 43 species over 2 years, most of which are classified as grassland or shrub-scrub nesters. 

 

This spreadsheet shows the data construct for all the longleaf pine fields we surveyed in southern Georgia over 3 survey periods.  The columns represent each survey (or sampling period) undertaken during the nesting season of 2001. 

 

In our first analysis we use the online version of program CAPTURE from the Patuxent website to undertake the community analysis.  We begin by copying our data from the spreadsheet into a text editor to create a text file in which the first two columns are the identifier, followed by one space, followed by the 3 data columns.  This is necessary because the program cannot read spreadsheet formats; it needs the data in ASCII format.  The next step is to create the control lines for the program.  In this case we use 4 command lines: title, task, format, and read input data.  The title line is self-explanatory.  The task line gives the number of surveys (occasions).  The format line tells the program how the data are laid out: a2 indicates that the identifier occupies the first 2 columns, 1x that one blank space follows, and 3f1.0 that 3 data values follow, each one column wide.  The read statement tells the program to start reading the data.  Scroll down on the CAPTURE site to the data analysis box and either copy and paste the command lines from below or type them in manually.  Then copy and paste in the data from your text file.  Finally, copy and paste in the final task lines.  Now simply hit the run button.  The output file will then appear in your browser. 

 

The task lines at the end tell the computer which models to run and other types of output to provide.  Fortunately, it is easy to simply copy and paste these into the data analysis box on the CAPTURE website. 
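
The file layout described above can also be assembled programmatically rather than by hand in a text editor.  The following is a minimal Python sketch, not part of CAPTURE; the helper name build_capture_input and the four sample rows are illustrative assumptions, and only the control-line syntax is taken from the example input in this box.

```python
# Sketch: assemble CAPTURE input text from species detection histories.
# build_capture_input is a hypothetical helper, not a CAPTURE feature.

def build_capture_input(title, histories, final_tasks):
    """Return control lines, data rows, and task lines as one text block."""
    n_occasions = len(next(iter(histories.values())))
    lines = [
        f"title='{title}'",
        f"task read captures occasions={n_occasions} x matrix",
        # a2 = two-character identifier, 1x = one blank,
        # then one single-column value per occasion
        f"format='(a2,1x,{n_occasions}f1.0)'",
        "read input data",
    ]
    for ident, history in histories.items():
        if len(ident) != 2 or len(history) != n_occasions:
            raise ValueError(f"row {ident!r} does not match the declared format")
        if set(history) - {"0", "1"}:
            raise ValueError(f"row {ident!r} contains values other than 0 and 1")
        lines.append(f"{ident} {history}")
    lines.extend(final_tasks)
    return "\n".join(lines) + "\n"


# A few rows from the example data set (identifier -> detection history).
histories = {"10": "001", "11": "010", "15": "110", "25": "111"}
tasks = [
    "task closure test",
    "task model selection",
    "task population estimate ALL",
]
text = build_capture_input("LLP Community line 2001", histories, tasks)
print(text)
```

The printed text can be pasted directly into the data analysis box; the checks on identifier width and history length guard against rows that would be misread under the fixed-column format.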

                                                                                                                                                                                          

The following is the submission of the data described above to program CAPTURE.  For this analysis we assume a priori that Model Mbh is correct.  We have truncated the output to show only the most important components. 

 

title='LLP Community line 2001'

task read captures occasions=3 x matrix

format='(a2,1x,3f1.0)'

read input data

10 001

11 010

12 010

13 100

14 001

15 110

16 110

17 101

18 110

19 011

20 011

21 011

22 011

23 011

24 110

25 111

26 111

27 111

28 111

29 111

30 111

31 111

32 111

33 111

34 111

35 111

36 111

37 111

38 111

39 111

40 111

41 111

task closure test

task model selection

task population estimate ALL

task population estimate NULL JACKKNIFE REMOVAL ZIPPEN MT-CH MH-CH MTH-CH

task population estimate APPROPRIATE

 

The following are the model selection criteria of Otis et al. (1978) for selecting the model that best fits the data. 

 

Model selection criteria.  Model selected has maximum value.
 
Model           M(o)       M(h)       M(b)       M(bh)     M(t)        M(th)      M(tb)      M(tbh)
Criteria         1.00        0.79        0.30        0.65        0.00         0.44        0.31        0.71
 

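In this analysis we assumed a priori that Mbh would be the appropriate model, but the selection criteria rank Mh (0.79) above Mbh (0.65); therefore we present both below.  Note that the null model M(o) has the maximum criterion value (1.00).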

 

 

Model 1 Mbh

 
 Population estimation with variable probability removal estimator.
 See M(bh) or removal models of the Monograph for details.
 
Occasion           j=     1    2    3
 Total caught    M(j)=     0   23   30   32
 Newly caught    u(j)=    23    7    2
 
 
k     N-hat   SE(N)  Chi-sq.  Prob.   Estimated p-bar(j),j=1,..., 3
 
1     32.00   0.907   0.82   0.3640   0.744 0.744 0.744
 
 
 
 Population estimate is 32 with standard error 0.9068—this is not the population, but instead the estimated number of species
 
 Approximate 95 percent confidence interval 32 to 32
 
 Profile likelihood interval 32 to 35
  Histogram of u(j)
 
 Frequency     23    7    2
 ------------------------------
 Each * equals    3 points
 
       24       *
       21       *
       18       *
       15       *
       12       *
        9       *
        6       *    *
        3       *    *    *
 ------------------------------

 

 

Model 2 Mh

 
 Population estimation with variable probability of capture by animal.
 See model M(h) of the Monograph for details.
 
 
 Number of trapping occasions was            3
 Number of animals captured, M(t+1), was    32
 Total number of captures, n., was          76
 
 Frequencies of capture, f(i)
    i=   1   2   3
 f(i)=   5  10  17
 
           Computed jackknife coefficients
 
           N(1)      N(2)      N(3)      N(4)      N(5)
      1  1.667     2.000     2.000
      2  1.000     0.833     0.833
      3  1.000     1.000     1.000
 
 
           The results of the jackknife computations
 
i N(i)    SE(i)   .95 Conf. Limits           Test of N(i+1) vs. N(i)
  0      32                                     Chi-square (1 d.f.)
  1      35.3      2.36      30.7      40.0       0.000
  2      35.3      2.93      29.6      41.1       0.000
  3      35.3      2.93      29.6      41.1       0.000
 
 Average p-hat = 0.7677
 
Interpolated population estimate is 33 with standard error     2.3892—this is not the population, but instead the estimated number of species
 
 Approximate 95 percent confidence interval 33 to 46
 estimate:  33.0906906 se:  2.38921213
 
 Histogram of f(i)
 
 Frequency      5   10   17
 ------------------------------
 Each * equals    2 points
 
       18                 *
       16                 *
       14                 *
       12                 *
       10            *    *
        8            *    *
        6       *    *    *
        4       *    *    *
        2       *    *    *
 ------------------------------
 

Looking back at our original data we see that our naïve estimate of the total number of species is 32, the number detected at least once (23 of these were detected on the first survey).  Our CAPTURE models give two estimates of species richness: Mbh provides an estimate of 32 (profile likelihood interval 32-35), whereas Mh gives an estimate of 33 (approximate 95% confidence interval 33-46). 

 

It is apparent from the analysis that with three surveys we are reasonably close to the true species richness of the study area; however, this analysis includes only one year of data.  As we will see later, species richness estimated from two years of data tells a somewhat different story.
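
The summary statistics and the first-order jackknife line in the M(h) output above can be checked by hand.  The sketch below is an illustration in Python, not CAPTURE's own code; it assumes the first-order jackknife estimator N(1) = S + f(1)(t-1)/t with variance equal to the sum of squared coefficients times f(i), minus N(1), which agrees with the printed coefficient (1.667), estimate (35.3), and standard error (2.36).  The interpolated estimate of 33 is not reproduced here.

```python
from collections import Counter
from math import sqrt

# Detection histories from the CAPTURE input (species 10-41), grouped by
# pattern: history -> number of species sharing that history.
pattern_counts = {
    "001": 2, "010": 2, "100": 1,
    "110": 4, "101": 1, "011": 5,
    "111": 17,
}

t = 3                                          # sampling occasions
S = sum(pattern_counts.values())               # species detected, M(t+1)
n_total = sum(h.count("1") * c for h, c in pattern_counts.items())

f = Counter()                                  # f[i]: species detected i times
u = [0] * t                                    # u[j]: first detected on occasion j+1
for history, count in pattern_counts.items():
    f[history.count("1")] += count
    u[history.index("1")] += count

# First-order jackknife (assumed form, see lead-in).
a1 = 1 + (t - 1) / t                           # coefficient applied to f(1)
n1 = S + (t - 1) / t * f[1]
var1 = a1 ** 2 * f[1] + sum(f[i] for i in range(2, t + 1)) - n1
se1 = sqrt(var1)

print(f"M(t+1) = {S}, n. = {n_total}")
print("f(i) =", [f[i] for i in range(1, t + 1)])
print("u(j) =", u)
print(f"N(1) = {n1:.1f}, SE = {se1:.2f}")
```

The frequencies f(i) = 5, 10, 17 and the newly detected counts u(j) = 23, 7, 2 match the two histograms in the output, which is a useful check that the data were read under the intended format.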

 

Box 14.2. Estimation of Species Richness Using Count Data

 

In this example we estimate species richness using the empirical distribution of species abundance.  We use the same longleaf pine bird community data from 2001, but take advantage of our count information.  Using the online version of program SPECRICH, we analyzed the following observed distribution of counts:

 

Number of individuals observed in 3 surveys     1    2    3    4    5
Number of species                               5    1    2    1    4

K    N(JK)    SE(N(JK))    T(K)      P(K)
1    37.       3.1623      1.6787    0.0932
2    41.       5.4772      1.5395    0.1237
3    46.       8.3666      1.2282    0.2194
4    53.      13.0384      1.2328    0.2176
5    66.      22.0907      1.0000    1.0000

INTERPOLATED N =  37.0000

STD ERROR OF INTERPOLATED N = 3.1623

 

 

 

 

 

This output from SPECRICH suggests an estimated species richness of 37 ± 3.16 (SE) species.  Compared with our results from CAPTURE (Box 14.1), where model Mh gave an estimated 33 ± 2.39 species, this method suggests that possibly more species are present in this community than indicated by either the CMR or occupancy models. 
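
The N(JK) and SE columns of the SPECRICH output can be reproduced from the count distribution.  The Python sketch below assumes the limiting form of the jackknife estimator (the number of sampling occasions treated as large) and a total of 32 observed species, the M(t+1) value reported in Box 14.1.  Both are assumptions chosen because they match the printed values; this is not a description of SPECRICH's internal code, and the function name limiting_jackknife is illustrative.

```python
from math import comb, sqrt


def limiting_jackknife(order, observed_species, f):
    """Jackknife estimate of the given order and its standard error.

    f[i-1] is the number of species represented by exactly i individuals.
    """
    # Coefficient applied to f(i) for i = 1..order; all other species get 1.
    coeffs = [1 + (-1) ** (i + 1) * comb(order, i) for i in range(1, order + 1)]
    n_hat = observed_species + sum((a - 1) * fi for a, fi in zip(coeffs, f))
    # Species whose frequency class has coefficient 1.
    rest = observed_species - sum(f[:order])
    var = sum(a * a * fi for a, fi in zip(coeffs, f)) + rest - n_hat
    return n_hat, sqrt(var)


f = [5, 1, 2, 1, 4]      # species seen 1, 2, 3, 4, 5 times (table above)
observed = 32            # assumed total species observed (Box 14.1)

results = [limiting_jackknife(k, observed, f) for k in range(1, 6)]
for k, (n_hat, se) in enumerate(results, start=1):
    print(f"K={k}  N(JK)={n_hat}  SE={se:.4f}")
```

Under these assumptions the first-order line is 32 + 5 = 37 with SE equal to the square root of 10 (3.1623), which is also the interpolated estimate that SPECRICH reports.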

 

 

Box 14.3. Assessing Community Richness using Occupancy Models

 

In this example we use the same data as in Box 14.1, except that we are not estimating community richness per se, but relative community richness following MacKenzie et al. (2006).  See the spreadsheet for the data layout and the habitat association of each species as presented at the Patuxent BBS site.  Here we include all species detected on the fields during the whole study as the available species in this ecosystem.  Since the bird community of this region is well described, we could instead have gone to the species list for the region and picked out all the potential candidate species based on known habitat requirements.  In this analysis any species not detected during the 3 surveys done in 2001 receives a detection history of “000”.  We then assessed 3 models: constant detection, detection varying by survey, and detection with the habitat association of each species as a covariate (grassland, shrub-scrub, or other).  We used program PRESENCE to evaluate the data, using essentially the same format as in Box 7.1. 

 

The highest-ranking model was the simple constant detection model, which is reported below.  Here the model suggests that we are not dealing with important detection differences among species (that is, if species are not being detected it is because they are not there rather than because they are harder to find).  We suggest this because our habitat association covariate, which we believed a priori to be linked with detection, was not important. 

 

Model                         AIC      delta AIC   AIC wgt   Model Likelihood   No.Par.   (-2*LogLike)
1 group, Constant P           150.54   0.00        0.5503    0.3028             2         146.54
1 group, Survey-specific P    152.14   1.60        0.2473    0.1361             4         144.14
psi(.),p(habitat)             152.54   2.00        0.2024    0.1114             3         146.54
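
The AIC, delta AIC, and AIC weight columns follow directly from the -2 log-likelihood and the number of parameters of each model.  As a hedged illustration (standard AIC arithmetic in Python, not PRESENCE code), the sketch below recomputes them from the table values.

```python
from math import exp

# (-2 * log-likelihood, number of parameters) for each model in the table.
models = {
    "1 group, Constant P": (146.54, 2),
    "1 group, Survey-specific P": (144.14, 4),
    "psi(.),p(habitat)": (146.54, 3),
}

# AIC = -2 log L + 2K
aic = {name: dev + 2 * k for name, (dev, k) in models.items()}
best = min(aic.values())
delta = {name: value - best for name, value in aic.items()}

# Akaike weight: exp(-delta/2), normalized to sum to 1 over the model set.
relative = {name: exp(-d / 2) for name, d in delta.items()}
total = sum(relative.values())
weights = {name: r / total for name, r in relative.items()}

for name in models:
    print(f"{name:28s} AIC={aic[name]:.2f}  delta={delta[name]:.2f}  wgt={weights[name]:.4f}")
```

Because the habitat model has the same -2 log-likelihood as the constant model (146.54) but one more parameter, its delta AIC is exactly 2.00: the covariate adds a parameter without improving the fit.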

 

The constant detection model with no covariates carried 55% of the model weight.  We present its output below:

 

 

Predefined Model: Detection probabilities are NOT time-specific

Number of groups               = 1

Number of sites                = 43

Number of sampling occasions   = 3

Number of missing observations = 0

 

Number of parameters           = 2

-2log(likelihood)              = 146.541815

AIC                            = 150.541815

Naive estimate                 = 0.744186

Proportion of sites occupied    (Psi)   = 0.7518 (0.067385) —not occupancy, but rather our relative species richness estimate

Probability of group membership (Theta) = 1.0000

Detection probabilities         (p):

  grp   srvy      p            se(p)

  ---   ----   ---------    -----------

   1      1    0.783650     ( 0.044349)

Variance-Covariance Matrix

   psi    p(G1)

 0.0045 -0.0002

-0.0002  0.0020

 

The conclusions from this model are not the same as when we analyze occupancy in the usual sense.  In this model the proportion of sites occupied (ψ) represents our estimate of the relative proportion of species found on our sites during a particular season (in this case 2001).  We observed 32 species in 2001 and our total count of species during the study was 43; therefore our relative species richness estimate (0.7518 ± 0.067) was close to our naïve estimate of 0.744 (32/43). 
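
Because the "sites" in this layout are the 43 species in the pool, the relative estimates can be converted back to numbers of species by multiplying by the pool size.  The short Python sketch below does this arithmetic with the values from the PRESENCE output; scaling the standard error by the pool size treats the pool of 43 as fixed, which is an assumption of this illustration.

```python
pool = 43                         # species detected over the whole study
psi, se_psi = 0.7518, 0.067385    # relative richness estimate and SE
naive = 0.744186                  # naive estimate from the output

detected_2001 = round(naive * pool)   # species actually detected in 2001
est_species = psi * pool              # estimated species present in 2001
se_species = se_psi * pool            # SE on the species scale (pool fixed)

print(f"detected in 2001: {detected_2001}")
print(f"estimated present: {est_species:.1f} (SE {se_species:.1f})")
```

On the species scale this is roughly 32.3 ± 2.9 species, close to the model Mh estimate of 33 ± 2.39 from CAPTURE in Box 14.1.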

 

Box 14.4. Multi-season Species Richness

 

In the following example we extend the community analysis to a multi-season occupancy model in which we investigate changes in species richness over seasons.  This data set has only 2 seasons, so it is somewhat limited in interpretation.  Again we expand the community data from Box 14.2 to include the years 2001-2002.  We therefore have 6 surveys of our sites over the 2 years, with a habitat association covariate.  We used program PRESENCE to evaluate the data, using essentially the same format as in Box 7.6.  We present the overall model selection and the output from the highest-ranking model below.  By now the reader should be familiar with the format of program PRESENCE; here the data file simply contains 6 columns of survey data, and in the data browser we divide the 6 surveys into two Primary Sampling Periods.

 

 

Model                          AIC      delta AIC   AIC wgt   Model Likelihood   No.Par.   (-2*LogLike)
psi,gamma(),eps(),p()          278.96   0.00        0.8025    1.0                4         270.96
psi(.),gam(.),eps=1-gam,p()    281.83   2.87        0.1911    0.2381             3         275.83
psi(),gamma(),p()              288.60   9.64        0.0065    0.0081             3         282.60

 

 

Open Population Model:

 

Number of sites                    = 43

Total number of sampling occasions = 6

Number of primary sampling periods = 2

Number of missing observations     = 0

 

Number of parameters               = 4

 

**** Numerical convergence was not reached.

     Parameter estimates converged to approximately 5.30 significant digits.

 

Number of function calls           = 218

Final function value               = 135.477956

-2log(likelihood)                  = 270.955911

AIC                                = 278.955911

Model has been fit using the logistic link.

 

Untransformed Estimates of coefficients for covariates (Beta's)

==============================================================================

                                                          estimate   std.error

A1     :occupancy        psi1                               0.157662 (0.308803)

B1     :colonization     gam1                              25.202574 (70501.883479)

C1     :local extinction eps1                              -1.627110 (0.578594)

D1     :detection        P[1-1]                             1.297856 (0.184530)

 

Variance-Covariance Matrix of Untransformed estimates:

              A1         B1         C1         D1 

     A1    0.095359  -0.003316  -0.001237  -0.001467

     B1   -0.003316 4970515574.088944  -0.001708  -0.000178

     C1   -0.001237  -0.001708   0.334771   0.005731

     D1   -0.001467  -0.000178   0.005731   0.034051

------------------------------

 

   Individual Site estimates of Psi:

 

        Site         Survey         Psi    Std.err     95% conf. interval

     1 American Crow    1      1-1:     0.5393  0.0767     0.3899 - 0.6820

 

   Individual Site estimates of Gamma:

 

        Site         Survey       Gamma    Std.err     95% conf. interval

     1 American Crow    1      1-1:     1.0000  0.0000     0.0000 - 1.0000

 

   Individual Site estimates of Eps:

 

        Site         Season         Eps    Std.err     95% conf. interval

     1 American Crow    1         :     0.1642  0.0794     0.0595 - 0.3792

 

   Individual Site estimates of p:

 

        Site         Survey           p    Std.err     95% conf. interval

     1 American Crow    1      1-1:     0.7855  0.0311     0.7183 - 0.8402

 

In this case the top two models are within 3 AIC units of each other, and both include colonization.  When we consider the structure of the data and our previous analysis of the first year of data, even a cursory examination shows that the number of species in the songbird community increased from 2001 to 2002.  In this particular data set that might simply mean that 2002 was a better year than 2001, but it might also mean that our management, the planting of longleaf pine trees in old agricultural fields, attracts more species as the plant community develops.  Unfortunately, we have only 2 years of data; with more we might be able to tease apart some of these important issues. 
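
The real-scale estimates of Psi, Gamma, Eps, and p in the output above are back-transformations of the untransformed (logit-scale) coefficients.  The Python sketch below illustrates that step, assuming the inverse-logit transformation and a first-order delta-method standard error; these assumptions reproduce the printed estimates and standard errors but are not taken from the PRESENCE source code.

```python
from math import exp


def expit(x):
    """Inverse of the logit link."""
    return 1 / (1 + exp(-x))


# Untransformed estimates and standard errors from the output above.
betas = {
    "psi1": (0.157662, 0.308803),
    "gam1": (25.202574, 70501.883479),
    "eps1": (-1.627110, 0.578594),
    "p": (1.297856, 0.184530),
}

real = {}
for name, (beta, se_beta) in betas.items():
    estimate = expit(beta)
    # Delta method: SE(real) ~ SE(beta) * d expit / d beta = SE(beta) * est * (1 - est)
    se = se_beta * estimate * (1 - estimate)
    real[name] = (estimate, se)
    print(f"{name:5s} estimate={estimate:.4f}  SE={se:.4f}")
```

The colonization coefficient of about 25 on the logit scale back-transforms to essentially 1.0, with an enormous logit-scale standard error and a real-scale interval of 0 to 1.  That pattern indicates an estimate sitting on the boundary of the parameter space, so the colonization estimate should be read with caution, particularly given the convergence warning in the output.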

 

Box 14.5. Community estimation using species diversity among sites

 

 

In this example we have a summary of 3 surveys for each longleaf pine quadrat surveyed during 2001 in south Georgia.  Each column represents one of the sampled quadrats, and a detection (AC-C1) here means that the species was detected during any of the 3 surveys over the breeding season.  This model approaches the question of community diversity from a slightly different perspective than the previous occupancy models on the same data set: here we ask questions about the diversity of species across our study sites. 

 

 

 

 

 

 

 

 

 

                      Study site
Species               C1   C2   F1   F2   G1   GM1  H1   H2   HA1  HA2  HO1  HO2  HO3  L1   L2   M1   M2   ME1  MM1  MM2  ETC
American Crow         0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
American Goldfinch    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
Barn Swallow          1    0    0    0    1    1    1    0    0    0    0    0    0    1    0    0    1    1    0    0
Blue Grosbeak         1    1    0    1    1    1    1    1    1    1    1    0    0    1    1    1    1    0    1    0
cont                  1    1    1

In this analysis we could use program CAPTURE or SPECRICH2.  In either case we would use model Mh, which assumes capture heterogeneity, as the appropriate model for estimating species richness across all of our sample quadrats.  The sample data set, with 41 survey sites and 32 species, is laid out for program CAPTURE as follows:

 

title='llp line 2001 by site '

task read captures occasions=41 x matrix

format='(a2,1x,41f1.0)'

read input data

10 00000000000000000000000000000000000010000

11 00000000010000000000000000000000000000000

12 00000000000000000000000100000000000000000

13 00000000000001000000000000000000000000000

14 00000010000000000000000000000000000000000

15 00000000000000000000001000000000000000000

16 00000000010000000000000000000000000000000

17 00000000000000000000000000000000000000010

18 00000000000010000100000000000000000000000

19 00100000100000000000000000000000000000000

20 00001100000010000000000000000000000000000

21 00000100010000000000000000000010000000000

22 00010000000000000000000000000010000011000

23 00000000100000000000000000100000110000000

24 00000000000000001111000000000000001000000

25 00000010000000000000000000001011000011000

26 00010111000100000000000000000000000000001

27 00001100000010010010000000000000101000000

28 01001110001000010000000000100100010000000

29 00001110010000101100000001000000100000000

30 00000000011100000011000100000000111010000

31 00001000001000000100000001101001000001110

32 10001110000001001100011000001000100001000

33 10000110000100010100110000010000001010100

34 10001110110000110001100000000110000000001

35 01100010101100100000000010001110001000110

36 01011100110000010000000001100111010000010

37 01010110101010101110000001100110111010010

38 00001101110100101101111111100001000000101

39 00110000111001101000111111111110000110101

40 10101111001010110110111111100110011000111

41 11011111111001111010111111111101110011111

task closure test

task model selection

task population estimate JACKKNIFE

 

 

 

 Population estimation with variable probability of capture by animal.
 See model M(h) of the Monograph for details.
 
 
 Number of trapping occasions was           41
 Number of animals captured, M(t+1), was    32
 Total number of captures, n., was         275
 
 Frequencies of capture, f(i)
    i=   1   2   3   4   5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
       23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
 f(i)=   8   2   2   2   1  2  1  0  2  2  0  2  1  2  0  0  0  0  0  2  0  0
        1  0  0  1  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0
 
           Computed jackknife coefficients
 
           N(1)      N(2)      N(3)      N(4)      N(5)
      1  2.000     3.000     4.000     5.000     6.000
      2  1.000     0.000    -2.000    -5.000    -9.000
      3  1.000     1.000     2.000     5.000    11.000
      4  1.000     1.000     1.000     0.000    -4.000
      5  1.000     1.000     1.000     1.000     2.000
 
 
           The results of the jackknife computations
 
   i       N(i)    SE(i)     .95 Conf. Limits      Test of N(i+1) vs. N(i)
   0      32                                         Chi-square (1 d.f.)
   1      40.0      4.00      32.2      47.8                       3.930
   2      46.0      6.93      32.4      59.6                       2.067
   3      52.0     10.58      31.3      72.7                       0.777
   4      58.0     16.12      26.4      89.6                       0.168
   5      63.0     26.12      11.8     114.2                       0.000
 
 Average p-hat = 0.1677
 
 
 Interpolated population estimate is         40 with standard error     6.9820
 
 Approximate 95 percent confidence interval        34 to         66
 estimate:  40.2885056 se:  6.98204231
 
 
 
 Histogram of f(i)
 
 
 Frequency      8    2    2    2    1    2    1    0    2    2    0    2    1
 --------------------------------------------------------------------------------
        8       *
        7       *
        6       *
        5       *
        4       *
        3       *
        2       *    *    *    *         *              *    *         *
        1       *    *    *    *    *    *    *         *    *         *    *
 --------------------------------------------------------------------------------

 

This model suggests that there are an estimated 40 species (95% CI 34-66) on our study sites. 

 

Box 14.6.  Community analysis by site using PRESENCE

 

Here we run an occupancy analysis using the same data as in Box 14.5.  However, in this example we included all species detected during the study; therefore the estimates here are of relative species richness.  In PRESENCE we ran 3 models: constant detection, variable detection (in this case by site), and constant detection with a habitat covariate for each species.  The model table output from PRESENCE 2.0 was as follows. 

 

 

Model                         AIC       delta AIC   AIC wgt   Model Likelihood   No.Par.   (-2*LogLike)
1 group, Constant P           1400.14    0.00       0.7311    0.5344              2        1396.141405
psi(.),p(habitat)             1402.14    2.00       0.2689    0.1966              3        1396.1414
1 group, Survey-specific P    1433.93   33.79       0.0000    0.0000             42        1349.928371

 

We now provide the output from the model with the lowest AIC to demonstrate the results and interpretation.

 

Predefined Model: Detection probabilities are NOT time-specific

 

Number of groups               = 1

Number of sites                = 43

Number of sampling occasions   = 41

Number of missing observations = 0

 

Number of parameters           = 2

-2log(likelihood)              = 1396.141405

AIC                            = 1400.141405

Naive estimate                 = 0.744186

 

Proportion of sites occupied    (Psi)   = 0.7442 (0.066549)

Probability of group membership (Theta) = 1.0000

Detection probabilities         (p):

  grp   srvy      p            se(p)

  ---   ----   ---------    -----------

   1      1    0.209590     ( 0.011524)

 

Variance-Covariance Matrix

   psi    p(G1)

 0.0044 -0.0000

-0.0000  0.0001

 

 

 

 

The output here provides us with 2 estimates.  The first is labeled detection probability.  In this analysis it represents the proportion of the species present that was detected on a given survey site: 0.210 ± 0.012, or about 7 of the roughly 32 species present on any particular site.  Our model of survey-specific detection (in this case specific to survey site) ranked last and had a model weight of 0.0, suggesting that species richness did not vary among sites.  Our naïve estimate of relative species richness was 0.744, which is essentially the same as our model-based estimate of 0.744 ± 0.067.  These results suggest that over the 41 study sites used in this analysis we likely observed all of the species that were present in 2001, which represent about 74.4% of the species we observed during the study.
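
As a rough cross-check on the detection estimate, it can be compared with the raw counts reported by CAPTURE in Box 14.5.  The Python sketch below is simple arithmetic on values printed in the two boxes; it assumes that essentially all species present in 2001 were detected at least once, which the agreement between the naive and estimated Psi supports.

```python
n_sites = 41          # survey sites (the "sampling occasions" in this layout)
n_detected = 32       # species detected in 2001, M(t+1) in Box 14.5
n_captures = 275      # total species-by-site detections, n. in Box 14.5
pool = 43             # species in the full study pool
p_hat = 0.209590      # detection estimate from PRESENCE
psi_hat = 0.7442      # relative richness estimate from PRESENCE

# Average number of species recorded per site, straight from the counts.
mean_per_site = n_captures / n_sites

# Detection probability implied by the counts if all 32 species are present.
p_from_counts = n_captures / (n_detected * n_sites)

# Expected species per site under the model: p * (species present in 2001).
expected_per_site = p_hat * psi_hat * pool

print(f"mean species per site from counts: {mean_per_site:.2f}")
print(f"p implied by counts: {p_from_counts:.4f}")
print(f"model-expected species per site: {expected_per_site:.2f}")
```

Both routes give about 6.7 species per site; multiplying p by the full pool of 43 species instead gives about 9, but that figure includes species that were not present in 2001.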
