WORKSHOP: SCIENTIFIC PROGRAMMING AND DATA ANALYSIS USING PYTHON AND PYMC

0900-1600 1 MARCH 2007 B-21 MATH AND STATISTICS

Michael J. Conroy, USGS / University of Georgia, Athens, Georgia, USA. Contact me at: mconroy@uga.edu

Description

Python is a free, open-source, object-oriented programming language with several attractive features: it is powerful yet easy to use, interactive, and extensible (i.e., capable of interacting with other languages), and it runs on virtually any modern computing platform. A large number of third-party modules are available for Python, in particular SciPy, which contains sub-modules for plotting, statistical distribution functions, numerical optimization, integration, matrix functions, and many other features in common with MATLAB and similar programs.

PyMC is a Python module written by Chris Fonnesbeck for performing MCMC using the Metropolis-Hastings algorithm. To date, PyMC has been used for reasonably complex data structures, including capture-recapture, recovery, and occupancy data, with models including hierarchical structure, random effects, and other complexities. Because PyMC inherits all the object-oriented and other features of Python, it may offer advantages over obvious alternatives such as WinBUGS. An additional feature of Python is an interactive debugger, which facilitates rapid diagnosis of errors.

I will first give a brief introduction to Python, with emphasis on using Python to manipulate data (including string parsing for input/output and access to databases such as MySQL) and to construct ecological models, including data simulation and interaction with other programs (e.g., MARK). I will then demonstrate the application of PyMC to MCMC estimation, starting with simple examples, with problems set up in WinBUGS for comparison where possible. I will then introduce more complex problems involving variations on reading data from online sources, random effects modeling, and capture-recapture and occupancy modeling.

TENTATIVE SCHEDULE

0900-1200 Introduction to Python

Overview of Python (Python code for examples; snorkel data; watershed data)

  • Basics of Python- Python objects/ types and basic operations
  • Getting data into Python
  • Loops and controls
  • Functions and classes
  • Object oriented programming

    Some modeling and statistical examples

  • Matrix population modeling
  • Stochastic simulation of population growth
  • Least squares estimation
  • Formatting data for input into MARK. Converts text input (individual encounter observation) to file formatted for MARK
  • Parametric bootstrapping with Python and MARK
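
As a rough illustration of the data-formatting item above, the following minimal sketch turns individual encounter observations into capture-history lines of the form "history count;", the layout commonly used for MARK encounter-history input. The input layout (one "animal_id occasion" pair per line), the sample records, and the function names are assumptions made up for this example; they are not the workshop's actual code or data.

```python
from collections import defaultdict


def encounter_histories(records, n_occasions):
    """Build one 0/1 capture-history string per animal.

    records: iterable of (animal_id, occasion) pairs, occasions numbered from 1.
    """
    seen = defaultdict(set)
    for animal_id, occasion in records:
        seen[animal_id].add(occasion)
    return {
        animal_id: "".join(
            "1" if t in occasions else "0" for t in range(1, n_occasions + 1)
        )
        for animal_id, occasions in seen.items()
    }


def to_mark_lines(histories):
    """Collapse identical histories into 'history count;' lines."""
    counts = defaultdict(int)
    for history in histories.values():
        counts[history] += 1
    return ["%s %d;" % (h, n) for h, n in sorted(counts.items(), reverse=True)]


# Hypothetical raw text: one encounter per line, "animal_id occasion".
raw_text = """A01 1
A01 3
A02 2
A03 1
A03 3
"""

# String parsing: split each non-empty line into an id and an integer occasion.
fields = (line.split() for line in raw_text.splitlines())
records = [(f[0], int(f[1])) for f in fields if f]

histories = encounter_histories(records, n_occasions=3)
mark_lines = to_mark_lines(histories)
print("\n".join(mark_lines))
```

Grouping identical histories with a count keeps the output short; writing one line per animal with a count of 1 would be an equally reasonable choice.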

    1200-1330 Lunch

    1330-1600 PyMC

  • A very quick introduction to Bayesian estimation and MCMC
  • Simple examples (binomial likelihood; simple random effects models)
  • More complex examples:
  • At least one “challenging” example (with no promises!)
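
To give a feel for the binomial-likelihood example listed above, here is a minimal sketch of a random-walk Metropolis sampler (the symmetric-proposal special case of Metropolis-Hastings) written in plain Python. It does not use PyMC and is not PyMC's API; the data (7 successes in 20 trials), the flat prior, the step size, and all function names are assumptions chosen only for illustration.

```python
import math
import random


def log_posterior(p, successes, trials):
    """Binomial log-likelihood plus a flat prior on (0, 1), up to a constant."""
    if p <= 0.0 or p >= 1.0:
        return float("-inf")  # zero posterior density outside (0, 1)
    return successes * math.log(p) + (trials - successes) * math.log(1.0 - p)


def metropolis(successes, trials, n_iter=20000, burn_in=2000, step_sd=0.1, seed=1):
    """Random-walk Metropolis sampler for the binomial success probability."""
    rng = random.Random(seed)
    p = 0.5
    current = log_posterior(p, successes, trials)
    samples = []
    for i in range(n_iter):
        # Symmetric normal proposal centred on the current value.
        proposal = p + rng.gauss(0.0, step_sd)
        candidate = log_posterior(proposal, successes, trials)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            p, current = proposal, candidate
        if i >= burn_in:
            samples.append(p)
    return samples


# Hypothetical data: 7 successes in 20 trials.
samples = metropolis(successes=7, trials=20)
posterior_mean = sum(samples) / len(samples)

# With a flat prior the exact posterior is Beta(8, 14), so its mean is 8/22.
print("MCMC estimate: %.3f  exact: %.3f" % (posterior_mean, 8.0 / 22.0))
```

Because the exact posterior is known for this model, the sampler's output can be checked against it before moving on to models with no closed-form answer.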

    The format will be a mixture of lecture, discussion, and hands-on work. We will have a lab available (B-21 in Mathematics & Statistics), but participants may bring laptops if they wish. A basic write-up and links to the Python and PyMC Windows installers are available below, and participants are encouraged to check out the material and to install and run the software ahead of time, particularly if they choose the laptop option.

    Background reading

    Scientific Programming using Python (notes by Chris Fonnesbeck)

    Other references

    Numpy/Scipy for MATLAB users

    How to Think Like a Computer Scientist: Learning with Python (Downey, Elkner, Meyers)

    Dive Into Python (Pilgrim)

    Python.org website

    Beazley, D. M. 2001. Python Essential Reference. 2nd edition. New Riders. San Francisco, USA (later editions may be available).

    Software downloads

    Enthought Python

    PyMC Windows installer

    PyMC Source Code

    PyMC 1.1 for PPC Mac Installer (bundled with other python modules)

    PyMC 1.1 for Intel Mac Installer (bundled with other python modules)

    PyMC User’s guide