In this lab you will become familiar with the joys of the WSFR network (if you aren't already). Mainly, you need 1) have a userid and password, to allow access to certain network programs; 2) know how to access and copy files from central network locations (a class directory) to your local directory (directory= 'folder' in windows jargon), 3) a few more odds and ends that will become evident as we proceed.

In most cases, we will start labs by doing what we're doing now- accessing a web page. In many cases the programs that we will run will be imbedded as files accessed from the web page, simply by clicking on a hilited link. E.g., clicking on runme.xls should open Microsoft Excel and run the previously-named file. In other cases we will want to save files (often these will be in self-extracting archives) and then open these using programs such as R, MARK, or other programs.  In general, it is in fact usually better to save even Excel files and open these directly in Excel, rather than using the web browser to open the program.  The general procedure is 1) right click on the link, 2) save the indicated file to a network or local drive, 3) run the necessary program from the program task bar in Windows.

For this lab we will use two commonly-available programs, Microsoft Excel and R.  There are a number of reasons why we will use Excel to do many of the tasks in lab: 1) it's available, 2) I know it, and you may already know it, 3) it's easy and transparent, 4) has many built in functions and graphics features, and 5) it does a whole lot more than I ever thought it would. This is not to say that Excel is perfect or ideal-- it has many limitations, and is extremely inefficient for very large problems. It is no substitute for real computer languages or statistical software in a research environment. For example, I would not recommend using the random number generator for a serious research application (it is fine for teaching purposes). However, it gives me a pretty convenient 'electronic blackboard' for getting across many of the principles in class, and it encourages (requires?) you to be more engaged in thinking about what you are doing, as opposed to cookbook, plug-and-chug modeling and statistical software.

On the other hand, Excel, being a spreadsheet program, forces one into a 2-dimensional view of modeling that is really very limiting. In addition, most models that we will use are iterative in nature: that is the processes involved, such as birth and death, occur over and over again, although the values may change. This makes the computational device known as a "do loop" a natural for population modeling. Some of you already know much more about this than I do. For those of you who don't, we're going to use R, which is an easy-to-use (and free, open-source) programming language, to do the lifting. However, if you understand the basic approaches that we use in  R you will be in a very good position to apply that knowledge in more advanced languages such as BASIC or C, or for that matter  Python or SAS (which I formerly used for these labs).

We're going to use both approaches to solve 2 problems that we will commonly encounter in this course: 1) dynamic population growth, and 2) finding the maximum (or minimum) of a function.

I. SIMPLE POPULATION GROWTH

First read the brief discussion of Population growth models; then return here for the Excel and Python exercises.

Coding in Excel

First a few Excel 'rules'.

'A1', 'B2', etc. refer to column/row labels, and will be used to keep track of some of our numbers; for instance these can be used as variable names in an equation. These are known as 'relative cell references', and will change if you copy or move cells from one location to another. So for example if you refer to 'A4' in an equation, and then copy the contents of that cell to another location, (say 'P9') the equation will now refer to the new location. Another type of reference is an 'absolute' reference, denoted by '$A$1', '$B$2' etc. Here the reference stays fixed to the original cell location regardless of copying, etc. I find relative cell references handy when I am trying to repeat a calculation in different rows or columns, as is common in iterative procedures like our modeling. On the other hand if you always want the reference to stay put to the same cell, use the absolute cell reference. This might include cases where the cell takes on a fixed value such as parameter in a model; however, in these cases I prefer named variables. I'll explain how these work in the spreadsheet example.

Ok, go to the spreadsheet models.xls, and look in column 'A'. Keep your eye on the equation box (after the '=' sign). In the top row, nothing is happening: just labels for time and abundance. Go down to A2. This just contains the first time period, which is zero. In A3 things are more interesting: '=A2+1'. This expression tells the program (yes, it's a program) to 'take the contents of cell A2 and add 1'. This repeats all the way down the column to time 21 (we could have kept going but my mouse finger ran out). Now, this is sort of a model, but a trivial one: we are just counting by ones. I could have typed all these numbers in by hand, but by copying cell A3 to the other cells, I saved myself the work. I used some of the time saved to continue to work on the novel I am writing. Now go to column B. Notice in cell B2 that '=n_0' appears in the equation box, but '100' appears in the cell. How did this happen? Check out columns G and H. Here I used named variables. What I did was go to H2 and H3, go to the menu toolbar and hit 'insert' , then 'name' , and then the names that appear in column G (these are both labeled as 'parameters' even though n_0 is really an initial condition, not a parameter). Now whenever I used 'r_max' or 'n_0' in an equation, the equation will pick up the values in H2 and H3, which are 0.1 and 100 now, but could be anything I change them to. Pretty slick, huh? Now lets see how we used this. Go to B3. The equation box says 'B2*(1+r_max)' This will take my previous value of B2 (100) and multiply it by (1+r_max)=1.10. The result is 110, printed in B3. In B4 we have 'B3*(1+r_max)', etc. Note that this is exactly the same process as expressed in

equation (1), that is

N(t+1) = (1 + r ) N(t)

is just coded as

Bn+1=Bn*(1+r_max)

where n is the row number (corresponding to time).

Coding in R

First, you will need to save and extract the archive containing the R code needed to run our example models. Save these files to a convenient place, like the C:\temp directory.

Programming this model in R is very simple, if you keep a few basic concepts in mind. These are illustrated in program file popmod.R.  First open R, and then import the file popmod.R    First, the program initialized population size (N) at 100 ( N<-100); note that the left arrow is used to assign values to a variable.  We specify years <-20 for 20 years of growth, and a constant growth rate of r<-0.10.  The next statement is a “do” or “for” loop that updates population size in year i  (for 1 to 20 years) by multiplying by the constant (r+1); note that the  expression N[i+1]<-N[i]*(r+1) is in curly brackets, indicating at it repeats as long as i <20.  This is equivalent to the algebraic model

Nt+1 = (1+rmax)Nt

  We can specify any number of years to run the model, but let’s be reasonable and run it for 20 years.  The basic model

is done by appending the value of Nt+1 each year. Once the loop is completed, the N data are put into a matrix and plotted over time, creating a nice, smooth plot of abundance over time. The program also prints values of abundance at each year but this can be suppressed (simply by deleting or commenting out the print line). 

The very same model has also been programmed in Python.  Either program can be summarized by the following flowchart:

II. FINDING THE MAXIMUM OF A FUNCTION

First read about how to find the maximum (or minimum) of a function. Then we will use Excel and Python to try these different approaches. For much more technical detail on this issue, see Appendix H in Williams et al. 2002.

The Excel file function.xls contains several spreadsheets which illustrate the above approaches. In the first worksheet "function" the funtion and its derivative are displayed and plotted. Note that although the example is for a quadratic function, the parameter coefficients can be changed to create up to a fourth-order polynomial. The second worksheet ilustrates the computation of a numerical derivative, and the third shows how the "solver" function in Excel can be used to obtain an optimum. Finally, the last spreadsheet shows how the Newton-Raphson method can be used to solve for the root(s) of the equation

g(x)=0
which,when g(x) = f(x) can be used to find the maximum or minimum of f(x). The equivalent computations (except for Solver) are found in the R files function.R, deriv.R, and newton.R. The function is a 4th-order polynomial (intercept plus x, x**2, x**3, and x**4): y=b0+b1X+b2X**2+b3X**3+b4X*4, but the specific case is a quadratic equation (2nd-order) because b3 and b4 are set to zero. The program function.R computes and plots the function and the first derivative with respect to X, over a selected range in X. Note that the derivative plot is superimposed on the plot dy/dx=0, showing where the derivative crosses the zero line, in this case indicating a maximum or minimum. Compare the value of X where dy/dx=0 with the plot of the function to see that the value of X that maximizes (minimizes) the function also is where dy/dx is zero. The program deriv.R shows that the derivative a function can be computed 2 ways (where it exists). The 'exact' method involves calculus and is the same method used in function.R. The 'numerical' method uses the definition of a derivative, which is the limit as h approaches zero of [f(x+h)-f(x)]/h. The 'deriv' definition in this program computes this expression using a 'small' value of h; try printing and plotting derivatives by both methods to see that this approximation is quite close. In practice, numerical derivatives are often used computationally, even when a ready formula exists for the exact derivative. Try experimenting with different values for the function coeffiencts (b0-b4). See if you can create a more "interesting" curve that still appears to have a maximum. See if you can find the maximum! The newton.R program implements the Newton-Raphson method in Python and is a nice simple example of a numerical optimization routine. Generalizations of this very simple method are the backbone of maximum likelihood estimation and decision theory, among other areas. Try a variety of starting values close to or far from the optimum, which is around 8 for the values of the quadratic I selected. Note that it is possible to make the method crash if you get too crazy with starting values. There are ways around this, but it is worth keeping in mind that numerical procedures can be 'defeated' by strange models, data, or bad starting values.

Now it's your turn-- see Assignment 1 for the first homework assigment. Due by beginning of next lab. Send a 1-2 page write up accompanied by any programs or other methods used to derive solutions to instructor via email.


Send mail to Instructor Return to home page

Last updated 08 December 2008

Powered by Zope