Using Python to Create Input Files
Shaping raw data into the appropriate input file for analysis via MARK,
CAPTURE, and other programs can be time consuming and fussy. It is often
useful to write a short program to do this automatically, especially when
more than one (or one very complex) input file is required. Generically,
this function is called data parsing. Several freely available programming
languages can be used to perform data parsing, including:
- Awk: A special-purpose language for data formatting.
- R: A public license clone of S-Plus. Can manipulate, analyse, and chart data all in one place.
- Python: An object-oriented programming language, easy to use, with lots of useful modules for database access, numerical methods, etc.
- Perl: A scripting language, similar to and older than Python. Powerful, but hard to learn.
- Ruby: A very new language, also similar to Python.
All of these languages run on almost every computer platform available. I
will illustrate how to set up and use Python as an example.
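To make the idea of data parsing concrete, here is a minimal sketch that turns raw capture records into encounter-history lines. The raw data, the column names (`animal_id`, `occasion`) and the function names are hypothetical, and the `history frequency;` layout with a `/* ... */` comment is assumed to be what your analysis program expects, so check it against that program's documentation before use.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw data: one row per capture (animal ID, sampling occasion).
RAW = """animal_id,occasion
A01,1
A01,3
A02,2
A03,1
A03,2
A03,3
"""


def encounter_histories(raw_text, n_occasions):
    """Return {animal_id: history string such as '101'} from capture rows."""
    seen = defaultdict(set)
    for row in csv.DictReader(io.StringIO(raw_text)):
        seen[row["animal_id"]].add(int(row["occasion"]))
    return {
        animal: "".join(
            "1" if occ in occasions else "0"
            for occ in range(1, n_occasions + 1)
        )
        for animal, occasions in sorted(seen.items())
    }


def format_inp(histories):
    """Format each history as an 'ID comment, history, frequency;' line."""
    return [f"/* {animal} */ {hist} 1;" for animal, hist in histories.items()]


if __name__ == "__main__":
    for line in format_inp(encounter_histories(RAW, 3)):
        print(line)
```

In practice the raw text would be read from a file rather than a string, and the output lines would be written to the input file instead of printed.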
The two most common desktop platforms are Linux and Windows. If you are a Windows
user, you need to grab a couple of installation packages to get started:
- A current release of Python itself (download)
- Windows extensions, for database access (download)
You can install these programs in place by telling Windows to run each install
file, rather than saving it to disk.
After installation, in your programs menu, you should see a folder containing links to all
of the Python executables, including a workplace shell, called "pythonwin".
This provides an interactive window, plus a text editor for writing and debugging
files.
Here are two sample scripts showing how to construct input files from data
in a relational database. In this case, MySQL was used, but the methods for
other databases, such as Access, PostgreSQL and Oracle, are similar:
- Nesting success: Input file for Program MARK in encounter history format. Contains records for Warbling Vireos in Southern B.C., at three different study sites.
- Band recovery: Input file using the Brownie parameterization for analysis by Program MARK. Uses command-line arguments to select appropriate records.
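The general pattern behind scripts like these can be sketched as follows: read a command-line argument, select the matching records from a database, and write one line per distinct encounter history with its frequency. This is not the code from the sample files. The standard-library `sqlite3` module stands in for MySQL so the sketch runs without a server, and the `captures` table, its columns, the `--site` option and the demo records are all hypothetical.

```python
import argparse
import sqlite3
from collections import Counter


def build_demo_db():
    """Create an in-memory stand-in database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE captures (band TEXT, site TEXT, year INTEGER)")
    conn.executemany(
        "INSERT INTO captures VALUES (?, ?, ?)",
        [("B1", "north", 2001), ("B1", "north", 2003),
         ("B2", "north", 2001), ("B2", "north", 2003),
         ("B3", "south", 2002), ("B4", "north", 2002)],
    )
    return conn


def history_counts(conn, site, years):
    """Count how many banded birds at one site share each encounter history."""
    rows = conn.execute(
        "SELECT band, year FROM captures WHERE site = ?", (site,)
    )
    seen = {}
    for band, year in rows:
        seen.setdefault(band, set()).add(year)
    return Counter(
        "".join("1" if y in caught else "0" for y in years)
        for caught in seen.values()
    )


def main(argv=None):
    """Parse arguments and return the formatted 'history frequency;' lines."""
    parser = argparse.ArgumentParser(description="Build encounter histories")
    parser.add_argument("--site", default="north", help="study site to select")
    args = parser.parse_args(argv)
    counts = history_counts(build_demo_db(), args.site, [2001, 2002, 2003])
    return [f"{hist} {n};" for hist, n in sorted(counts.items())]


if __name__ == "__main__":
    print("\n".join(main()))
```

Passing the site as a query parameter (the `?` placeholder) rather than pasting it into the SQL string keeps the selection safe from quoting problems. With a different database module the placeholder style may differ, so treat that detail as something to check.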
After you have installed Python and downloaded one of these sample files,
you should be able to right-click a file in your file manager, and select
"edit". This will bring up the program in the Python development environment.
Feel free to modify, use and redistribute these files as you like.
It is always handy to have a command reference nearby. The help files for
Python are excellent, and the manual includes a tutorial. There you will also
find references for the syntax to connect to your favorite Windows database
using ODBC. If you need help setting up ODBC data sources on your computer,
send me an email, and I will give you a hand.
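As a rough illustration of what an ODBC connection can look like, the sketch below builds a DSN-style connection string and runs a query through a generic database connection. It assumes the third-party `pyodbc` package, which is not necessarily the module provided by the Windows extensions mentioned above. The data source name `banding`, the `birds` table and the helper function names are hypothetical. So that the example runs without any ODBC setup, the demonstration at the bottom uses `sqlite3` as a stand-in.

```python
import sqlite3


def odbc_connection_string(dsn, user=None, password=None):
    """Build a 'DSN=...;UID=...;PWD=...' string for an ODBC data source."""
    parts = [f"DSN={dsn}"]
    if user is not None:
        parts.append(f"UID={user}")
    if password is not None:
        parts.append(f"PWD={password}")
    return ";".join(parts)


def connect_odbc(dsn, user=None, password=None):
    """Open an ODBC connection (assumes the pyodbc package is installed)."""
    import pyodbc  # third-party; imported here so the rest runs without it
    return pyodbc.connect(odbc_connection_string(dsn, user, password))


def fetch_rows(connection, query):
    """Run a query on a DB-API style connection and return rows as tuples."""
    cursor = connection.cursor()
    cursor.execute(query)
    return [tuple(row) for row in cursor.fetchall()]


if __name__ == "__main__":
    # sqlite3 stands in for an ODBC source so the sketch runs anywhere.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE birds (band TEXT, species TEXT)")
    demo.execute("INSERT INTO birds VALUES ('B1', 'WAVI')")
    print(fetch_rows(demo, "SELECT band, species FROM birds"))
    print(odbc_connection_string("banding", "me", "secret"))
```

Once a data source is configured, `fetch_rows(connect_odbc("banding"), "SELECT ...")` would be the equivalent call against the real database.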
Last updated 08 December 2008