1994 National Avian-Wind Power Planning Meeting Proceedings
Population Models: Their Use and Misuse
by
Kenneth Wilson, Colorado State University
In this paper, I will briefly cover some of the potential uses and abuses of population models. I will begin by defining models as they will be discussed here, with a discussion of their connection to reality. This is followed by a discussion of the types of population models used in ecology and a general approach to modeling. Finally, I will address how population models can be used and abused, focusing on assumptions, bias, precision, model selection, validation, and sensitivity.
1. What Is A Model?
"Model" is defined in Webster's dictionary as "a system of postulates, data, and inferences presented as a mathematical description of an entity or state of affairs" (Woolf 1975). Population models are, at best, approximations or abstractions of reality, and thus, by definition, all models are wrong. In science we continually strive to understand the truth about our surroundings, yet, despite this effort, truth is unobtainable. The philosophy of science and the experimental methods that scientists use have been well studied (cf. Goldstein and Goldstein 1978; Manly 1992). One notion of this philosophy is that both theory and practice play an important part in the learning that can be achieved (Box 1976). There is a feedback loop between facts gathered from nature (which come from reality, or truth) and deductions made by testing hypotheses based on theories about these facts. But there is always some lack of fit between these facts and new theories, and this leads to the induction of new hypotheses and further testing, and the iteration continues. Theories are often represented as models, and models, in turn, are often expressed as mathematical formulations. The lack of fit of a model to "reality" is important in furthering our learning about the systems in question. The extent to which a model assists us in understanding "reality" is one measure of a model's worth.
2. Use of Models
All population models are used to make predictions. Caswell (1976) categorizes models into two classes based on their purposes:
(a) "models that are constructed primarily to provide accurate prediction of the behavior of a system, and
(b) models that, as scientific theories, are attempts to gain insight into how the system operates."
An example of (a) might be the prediction of the population size of steelhead, Salmo gairdneri, within the Columbia River basin in the Pacific Northwest. A simple population model such as the logistic growth model (see below) could prove useful in meeting this objective. This type of information might be used by a state natural resource agency to establish fishing regulations. The information gained might be useful, but it would provide little understanding of the processes that determine the population at the next time step.
A more theoretical approach, (b) in Caswell's delineation, would use a model that attempts to explain why the population grows the way it does. For example, we may hypothesize that the population at some future time depends not only on population dynamics but also on genetic variability, environmental conditions, and interspecific interactions. Testing hypotheses involving these factors might ultimately lead to a clearer understanding of the "why" behind population growth, persistence, or extinction. The steelhead population model, for instance, might be modified to incorporate genetic variability and interspecific competition. Further, we may wish to couple the model directly to a global climate change model in order to investigate potential effects on steelhead populations if global warming occurs. Levins (1968) has argued that it is impossible to maximize generality, realism, and precision simultaneously in a model. Hence, it is critical to understand that very general theoretical models will probably be useless for specific predictions (e.g., the population size of steelhead next year). It would be convenient if the reverse were true, i.e., if coupling simple models together accurately predicted general properties, but this too is rarely the case. Often, what we learn from very complex models comes from investigating why the model fails to represent "reality", rather than from the model's ability to "track" a particular set of data.
In addition, many population models are more statistical in nature; their objective is to estimate some parameter of the population, such as population size or survival rate, based on a specific sampling method (cf. Seber 1982; Thompson 1992). The estimated parameters may then be used in more complex models, or the estimates may be compared, spatially or temporally, with those from other populations. For example, the effect of hydroelectric dams and spillways on the Columbia River on fish survival has been extensively studied by estimating survival rates from tagging studies involving the capture and recapture of marked fish at downstream dams (Burnham et al. 1987).
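As a concrete instance of a statistical population model, the simplest closed-population mark-recapture estimators can be sketched as follows. This is a minimal illustration, assuming a single marking occasion, a single recapture occasion, and a closed population (no births, deaths, or movement between samples); the function names and example counts are hypothetical, and the tagging studies cited above rely on considerably richer release-recapture models.

```python
def lincoln_petersen(marked: int, caught: int, recaptured: int) -> float:
    """Classic Lincoln-Petersen abundance estimate: N = M * C / R."""
    if recaptured == 0:
        raise ValueError("no recaptures: the Lincoln-Petersen estimate is undefined")
    return marked * caught / recaptured


def chapman(marked: int, caught: int, recaptured: int) -> tuple[float, float]:
    """Chapman's modified estimator and its usual variance estimate.

    Returns (N_hat, variance); unlike Lincoln-Petersen it stays defined when R = 0.
    """
    n_hat = (marked + 1) * (caught + 1) / (recaptured + 1) - 1
    variance = ((marked + 1) * (caught + 1) * (marked - recaptured) * (caught - recaptured)
                / ((recaptured + 1) ** 2 * (recaptured + 2)))
    return n_hat, variance


# Hypothetical sample: 100 fish marked, 80 caught later, 20 of them carrying marks.
print(lincoln_petersen(100, 80, 20))
print(chapman(100, 80, 20))
```

With these counts the Lincoln-Petersen estimate is 400 and Chapman's estimate is about 389, with a standard error (square root of the variance) of roughly 64, which illustrates how such estimates carry their own measure of precision.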
Population models can be deterministic or stochastic. An example of a simple deterministic model is the classic logistic growth equation:
$$N_t = \frac{K}{1 + e^{a - rt}}$$
where N_t is the population size at time t, r is the intrinsic rate of population growth (birth rate minus death rate), K is the carrying capacity, or maximum number of individuals the environment will support, and a is the constant of integration defining the position of the curve relative to the origin. Despite its simplicity, this model has proved useful for predicting the future size of populations of laboratory animals, insects, and humans.
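The deterministic behavior of this model can be made concrete with a few lines of Python. This minimal sketch assumes the common closed form of the logistic curve, N_t = K / (1 + e^(a - rt)); the parameter values are arbitrary illustrations, not estimates for any real population.

```python
import math


def logistic(t: float, K: float, r: float, a: float) -> float:
    """Deterministic logistic growth: N_t = K / (1 + exp(a - r*t))."""
    return K / (1.0 + math.exp(a - r * t))


# Hypothetical parameters: K = 1000 individuals, r = 0.5 per year, a = 3.
K, r, a = 1000.0, 0.5, 3.0
for t in range(0, 21, 5):
    print(t, round(logistic(t, K, r, a), 1))
```

Because a/r = 6 with these values, the curve passes through K/2 = 500 at t = 6, its inflection point, and then levels off toward K. Rerunning with the same inputs always returns the same numbers.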
Once the initial parameters are entered into this deterministic equation, the outcome is always the same. This is unrealistic, because the exact size of the population at time t is uncertain: birth and death rates vary from season to season and year to year depending on a variety of factors such as food, weather, and the ability to find mates.
By assuming that r arises from a stochastic process, we can create a stochastic model in which the population is not completely predictable at the next time step. For example, we might assume that the parameter r follows a normal distribution with a certain mean and variance, and let the population at the next time step vary accordingly. The model has now become more complex and arguably more realistic, but it still is not "truth". A further modification might add stochasticity to K, the carrying capacity, because the carrying capacity may also vary seasonally. The result is a more general and more complex model. An important question then becomes, "Which model is appropriate?" The answer depends on the objectives behind the creation of the model.
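A stochastic counterpart can be sketched by redrawing r (and, optionally, K) from a normal distribution at every time step. This sketch rests on assumptions made only for the example: it uses a discrete-time logistic update, N_{t+1} = N_t + r N_t (1 - N_t/K), rather than the continuous curve, and the function name, parameter values, and truncation rules (N floored at zero, K kept positive) are hypothetical choices.

```python
import random


def stochastic_logistic(n0, K_mean, r_mean, r_sd, steps, seed, K_sd=0.0):
    """Simulate one trajectory of a discrete-time logistic model in which the
    growth rate r (and optionally the carrying capacity K) is drawn afresh
    from a normal distribution at every time step."""
    rng = random.Random(seed)
    n = n0
    path = [n]
    for _ in range(steps):
        r = rng.gauss(r_mean, r_sd)               # year-to-year variation in r
        K = max(rng.gauss(K_mean, K_sd), 1e-9)    # optional variation in K, kept positive
        n = max(n + r * n * (1.0 - n / K), 0.0)   # discrete logistic update, floored at zero
        path.append(n)
    return path


# Same starting values, different random draws -> different trajectories.
for seed in (1, 2, 3):
    print([round(x) for x in stochastic_logistic(50, 1000, 0.5, 0.2, 10, seed)])
```

Setting r_sd and K_sd to zero collapses every run onto a single deterministic trajectory; with nonzero values each seed yields a different path, and passing a positive K_sd gives the more general model with a stochastic carrying capacity.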
3. General Approach
Let us focus for a moment on a general approach to modeling some aspect of a population. One approach is to first choose a sufficiently general model such that any of the processes that might be considered can be included (Burnham et al. 1987:54). If our objective were to estimate the annual size of the steelhead population, we might argue that a general model should include birth, death, immigration, and emigration rates. In addition, we might argue that the rates should be allowed to vary by time and location. The specific model selected will be a special case of this general model, one that "best" fits the data associated with the specific experiment. In essence, we have just specified some of the assumptions that are necessary for our general model. More specific (simpler) models can be represented by tightening the assumptions of the general model; for example, a simpler model might assume that the rates do not vary by time and location. Recognition of these assumptions is critical to any modeling process.
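The nesting of simpler models within a general one can be sketched in code: a general projection in which birth, death, immigration, and emigration may all change from one time step to the next, and a simpler model obtained by forcing those rates to stay constant. This is a minimal sketch under stated assumptions: a single location (the spatial dimension is omitted), per-capita birth, death, and emigration rates, immigration counted as a number of arrivals per step, and hypothetical function names.

```python
from typing import Sequence


def project_general(n0: float, birth: Sequence[float], death: Sequence[float],
                    immigrants: Sequence[float], emigration: Sequence[float]) -> list[float]:
    """General model: every rate may take a different value at each time step."""
    if not (len(birth) == len(death) == len(immigrants) == len(emigration)):
        raise ValueError("one value of each rate is needed per time step")
    n = n0
    path = [n]
    for b, d, i, e in zip(birth, death, immigrants, emigration):
        n = n + b * n - d * n + i - e * n   # births - deaths + arrivals - departures
        path.append(n)
    return path


def project_constant(n0: float, b: float, d: float, i: float, e: float,
                     steps: int) -> list[float]:
    """Nested special case: the 'rates do not vary by time' assumption."""
    return project_general(n0, [b] * steps, [d] * steps, [i] * steps, [e] * steps)


print(project_constant(100, 0.3, 0.2, 5, 0.05, 3))                            # constant rates
print(project_general(100, [0.3, 0.4, 0.2], [0.2] * 3, [5] * 3, [0.05] * 3))  # time-varying birth
```

The constant-rate model is simply the general model with the time-specific values tied together; tightening assumptions in this way is what makes each simpler candidate a special case of the general one.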
A major abuse in modeling is the failure to state and understand the assumptions inherent in the model. Further, once the data are collected, there should be some attempt to evaluate these assumptions before the model is used. Failure to understand and consider model robustness (ability of models to perform when one or more model assumptions have failed) can lead to poor inference and unreliable results.
4. Model Selection
How is the "appropriate" model selected? The topic of model selection has been well covered (cf. Linhart and Zucchini 1986; Burnham and Anderson 1992). There are two undesirable extremes in selecting a model: choosing too simple a model (i.e., one with too few parameters) and choosing too general a model (i.e., one with too many parameters). This tradeoff between underfitting and overfitting is known as the Principle of Parsimony (McCullagh and Nelder 1989) and can be viewed as a tradeoff between model bias and sampling variance: as the number of parameters increases, bias decreases but sampling variance increases. The goal, then, is to find the optimal model that has biological meaning for the data at hand.
Before computers, model selection was often independent of the data and was left up to the researcher (Burnham and Anderson 1992). More traditional thought placed model selection in a hypothesis testing framework, where we might ask whether there is a significant difference in fit when the steelhead population model excludes versus includes time variation in the rates. The tests used include likelihood ratio tests (Mood et al. 1974:409). This method is limited in that one model must be nested within the other. For example, likelihood ratio tests can be used in multiple regression to choose between a model with two parameters and a model with one parameter.
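A likelihood ratio test between two nested regression models (intercept only versus intercept plus slope) can be sketched with the Python standard library alone. The sketch assumes independent, normally distributed errors, so that the test statistic reduces to G = n ln(RSS_reduced / RSS_full) and is compared with a chi-square distribution with one degree of freedom; the data and function names are hypothetical.

```python
import math
from statistics import fmean


def rss_mean_only(y):
    """Residual sum of squares of the reduced model y = b0."""
    m = fmean(y)
    return sum((v - m) ** 2 for v in y)


def rss_linear(x, y):
    """Residual sum of squares of the full model y = b0 + b1*x (least squares)."""
    mx, my = fmean(x), fmean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return sum((b - my - slope * (a - mx)) ** 2 for a, b in zip(x, y))


def likelihood_ratio_test(x, y):
    """Return (G, p) for the reduced vs full model, 1 degree of freedom."""
    n = len(y)
    g = max(n * math.log(rss_mean_only(y) / rss_linear(x, y)), 0.0)
    p = math.erfc(math.sqrt(g / 2.0))   # upper tail of chi-square with 1 df
    return g, p


x = [1, 2, 3, 4, 5, 6, 7, 8]
trend = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
flat = [5.0, 5.2, 4.9, 5.1, 5.0, 4.8, 5.2, 4.9]
print(likelihood_ratio_test(x, trend))  # large G, tiny p: the slope is worth keeping
print(likelihood_ratio_test(x, flat))   # small G, large p: the simpler model suffices
```

The requirement that the reduced model be a special case of the full one is built in here: the intercept-only model is the linear model with its slope fixed at zero.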
An alternative approach is to view model selection as an optimization problem over the set of candidate models, ranging from the general to the specific. One criterion suited to this approach is the Akaike Information Criterion (AIC). It incorporates the idea of parsimony by combining a measure of model fit (reflecting model bias) with a penalty for the number of parameters, in order to choose the appropriate model (Burnham and Anderson 1992; Anderson and Burnham 1994).
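A minimal sketch of AIC-based selection among two candidate regressions follows, assuming normal errors so that AIC = n ln(RSS/n) + 2k up to an additive constant, where k counts every estimated parameter including the error variance. A small-sample correction (AICc) exists and is often recommended, but this sketch omits it; the function names and data are hypothetical.

```python
import math
from statistics import fmean


def aic_normal(rss: float, n: int, k: int) -> float:
    """AIC for a least-squares fit with normal errors, dropping constants:
    n*ln(RSS/n) + 2k, where k includes the error variance."""
    return n * math.log(rss / n) + 2 * k


def rss(x, y, with_slope: bool) -> float:
    """Residual sum of squares for the intercept-only or straight-line model."""
    mx, my = fmean(x), fmean(y)
    if not with_slope:
        return sum((v - my) ** 2 for v in y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return sum((b - my - slope * (a - mx)) ** 2 for a, b in zip(x, y))


def rank_by_aic(x, y):
    """Return (model, delta AIC, Akaike weight) tuples sorted from best to worst."""
    n = len(y)
    candidates = {"constant": (rss(x, y, False), 2), "linear": (rss(x, y, True), 3)}
    scores = {m: aic_normal(r, n, k) for m, (r, k) in candidates.items()}
    best = min(scores.values())
    rel = {m: math.exp(-(s - best) / 2) for m, s in scores.items()}
    total = sum(rel.values())
    return sorted(((m, scores[m] - best, rel[m] / total) for m in scores),
                  key=lambda row: row[1])


x = [1, 2, 3, 4, 5, 6, 7, 8]
trend = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
flat = [5.0, 5.2, 4.9, 5.1, 5.0, 4.8, 5.2, 4.9]
print(rank_by_aic(x, trend))
print(rank_by_aic(x, flat))
```

Unlike the likelihood ratio test, nothing here requires the candidates to be nested: each model simply receives a score, the extra parameter must "pay for itself" through a better fit, and the Akaike weights express the relative support for each candidate.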
5. Model Validation
Regardless of the type of model constructed or how the model is used, some time should be spent on model validation. Oreskes et al. (1994) have argued that both model validation and model verification are impossible in natural systems. They argue that, at best, a model can be confirmed by demonstrating agreement between observations and predictions. Verification, which has often been used synonymously with validation, is the assertion of truth. Because we can never ascertain when reality has been obtained, they argue that this term is inappropriate. Validation, they argue, "denotes the establishment of legitimacy". Again, because we cannot ascertain reality, models can be internally valid, but they may not represent truth. There is never any certainty about reality; therefore, at best, we can confirm or reject model predictions. Still, the term "validation" is commonly used when discussing models.
Validation is quite different for a predictive model than for a theoretical model (Caswell 1976). In a predictive model, "truth" is not the main question; rather, validation involves determining when and where the model is a good predictor (Caswell 1976). In a theoretical model, the focus is on inference about truth (although that is unknowable), and validation should center on attempts to invalidate the theory (Caswell 1976).
6. Sensitivity Analysis
A complex model with many parameters will require a large amount of data in order to estimate the numerous parameters with some degree of precision. The sensitivity of a model can be thought of as "the intensity of response to error or change" in a given parameter (Innis 1979). A sensitivity analysis involves a systematic search for the model parameters to which the model is most sensitive. Sensitivity analysis may focus on changes to the parameters or to the initial conditions. For example, a sensitivity analysis of our steelhead population model would allow us to determine to which parameters (birth, death, immigration, or emigration) the model's predictions are most sensitive over a specified range of change, and this result could be used to help design our field sampling effort. Because the cost of experiments is an important consideration, an initial sensitivity analysis can be useful in maximizing the benefit-to-cost ratio before gathering data for parameter estimation.
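A one-at-a-time sensitivity analysis can be sketched by raising each parameter by the same relative amount and recording the relative change in projected population size. The sketch assumes a hypothetical constant-rate model, N_{t+1} = N_t (1 + b - d - e) + I, with per-capita birth (b), death (d), and emigration (e) rates and a fixed number of immigrants (I) per step; the parameter values and function names are illustrative only, not steelhead estimates.

```python
def project(n0: float, p: dict[str, float], steps: int) -> float:
    """Hypothetical model: N <- N*(1 + birth - death - emigration) + immigration."""
    n = n0
    for _ in range(steps):
        n = n * (1 + p["birth"] - p["death"] - p["emigration"]) + p["immigration"]
    return n


def one_at_a_time(n0: float, p: dict[str, float], steps: int,
                  rel_change: float = 0.10) -> dict[str, float]:
    """Relative change in projected N when each parameter alone is raised by
    rel_change, ordered from most to least influential (by absolute change)."""
    base = project(n0, p, steps)
    effects = {}
    for name in p:
        perturbed = dict(p)
        perturbed[name] *= 1 + rel_change
        effects[name] = (project(n0, perturbed, steps) - base) / base
    return dict(sorted(effects.items(), key=lambda kv: abs(kv[1]), reverse=True))


params = {"birth": 0.30, "death": 0.20, "immigration": 5.0, "emigration": 0.05}
for name, effect in one_at_a_time(100, params, steps=10).items():
    print(f"{name:12s} {effect:+.1%}")
```

With these illustrative values a 10% increase changes the ten-step projection by roughly +28% (birth), -15% (death), -4% (emigration), and +3% (immigration), so field effort would go first to the birth rate. Perturbing each parameter by a relative rather than absolute amount is a design choice that puts parameters of very different magnitudes on a comparable footing.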
There is no single, unique model for a specific situation. Even if two researchers start with the same data and population parameters as outlined in the steelhead model, their resulting models can take quite different forms. For example, one might incorporate the rates into the model assuming linear relationships, while the other might assume nonlinear relationships. In fact, alternative models can be useful for validating and corroborating a model (Caswell 1976): if different models produce the same general outcomes from the same data, we can have greater confidence in the results. Ultimately, this still does not demonstrate that both models are "realistic", and, no matter what type of model is used, we must remember that the development of population models is an iterative process whose goal is understanding our surroundings.
Literature Cited
Anderson, D.R. and K.P. Burnham. 1994. AIC model selection in overdispersed capture-recapture data. Ecology 75:1780-1793.
Box, G.E.P. 1976. Science and statistics. J. Am. Stat. Assoc. 71:791-799.
Burnham, K.P., D.R. Anderson, G.C. White, C. Brownie and K.H. Pollock. 1987. Design and analysis methods for fish survival experiments based on release-recapture. Am. Fish. Soc. Monogr. 5. Bethesda, MD. 437 p.
Burnham, K.P. and D.R. Anderson. 1992. Data-based selection of an appropriate biological model: the key to modern data analysis. p. 16-30 In: D.R. McCullough and R.H. Barrett (eds.), Wildlife 2001: populations. Elsevier, London, U.K. 1163 p.
Caswell, H. 1976. The validation problem. p. 313-325 In: B.C. Patten (ed.), Simulation in ecology, vol. IV. Academic Press, New York.
Goldstein, M. and I.F. Goldstein. 1978. How we know: an exploration of the scientific process. Plenum, New York. 357 p.
Innis, G. 1979. A spiral approach to ecosystem simulation, I. p. 211-386 In: G.S. Innis and R.V. O'Neill (eds.), Systems analysis of ecosystems. Intern. Co-op. Publ., Fairland, MD.
Levins, R. 1968. Evolution in changing environments. Princeton Univ. Press, Princeton, NJ. 120 p.
Linhart, H. and W. Zucchini. 1986. Model selection. Wiley, New York. 301 p.
Manly, B.F.J. 1992. The design and analysis of research studies. Cambridge Univ. Press, Cambridge, U.K. 353 p.
Manly, B.F.J., L.L. McDonald and D.L. Thomas. 1993. Resource selection by animals. Chapman & Hall, London, U.K. 177 p.
McCullagh, P. and J.A. Nelder. 1989. Generalized linear models. Chapman and Hall, New York. 511 p.
Mood, A.M., F.A. Graybill and D.C. Boes. 1974. Introduction to the theory of statistics, 3rd ed. McGraw-Hill, New York. 564 p.
Oreskes, N., K. Shrader-Frechette and K. Belitz. 1994. Verification, validation, and confirmation of numerical models in the earth sciences. Science 263:641-644.
Seber, G.A.F. 1982. The estimation of animal abundance and related parameters, 2nd ed. Macmillan, New York. 654 p.
Thompson, S.K. 1992. Sampling. Wiley, New York. 339 p.
Woolf, H.B. (ed.) 1975. Webster's new collegiate dictionary. G. & C. Merriam Co., Springfield, MA. 1536 p.
Discussion
Use of Models. One participant pointed out another tradeoff between simple and complex models: a simple model can be understood by many people but is likely to be unrealistic, whereas a more complex model may be more realistic but few people can understand or use it. It may be desirable to use both approaches.
Another commenter mentioned a category of models known as "resource selection models". These are designed to predict the probability of use of resources such as habitat or food types based on empirical data. Dr. Wilson noted that this type of model is described in the book Resource selection by animals (Manly et al. 1993).
General Approach. One participant noted that models should be hypothesis-driven; that is, they should shed light on a specific scientific hypothesis. He suggested that it is critical to obtain the empirical data needed to characterize key model parameters, and that it is not useful to develop models if the key data must be simulated.
However, another participant indicated that there are unknown or little-known components in any complex model. He explained that Monte Carlo approaches applied to inadequately known model components can be useful, e.g., in assessing sensitivity and planning research. He also suggested that, when there are many unknowns, a Bayesian approach sometimes can achieve useful predictions despite the uncertainties about individual parameters. Analogies to the Central Limit Theorem were mentioned.
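The Monte Carlo idea raised here can be sketched as follows: when a rate is poorly known, draw it repeatedly from a plausible range, run the model for each draw, and summarize the spread of outcomes. The sketch assumes uniform ranges for the birth and death rates and a simple geometric-growth model, N_T = N_0 (1 + b - d)^T; the ranges, function name, and model are hypothetical, and the Bayesian approach also mentioned is not shown.

```python
import random
import statistics


def monte_carlo_projection(n0: float, steps: int, birth_range: tuple[float, float],
                           death_range: tuple[float, float], draws: int = 5000,
                           seed: int = 0) -> dict[str, float]:
    """Propagate uncertainty in poorly known rates through N_T = n0 * (1 + b - d)**T."""
    rng = random.Random(seed)
    outcomes = []
    for _ in range(draws):
        b = rng.uniform(*birth_range)    # one plausible birth rate per draw
        d = rng.uniform(*death_range)    # one plausible death rate per draw
        outcomes.append(n0 * (1 + b - d) ** steps)
    cuts = statistics.quantiles(outcomes, n=20)   # 19 cut points: 5%, 10%, ..., 95%
    return {"p05": cuts[0], "median": statistics.median(outcomes), "p95": cuts[-1]}


print(monte_carlo_projection(100, 10, birth_range=(0.25, 0.35), death_range=(0.15, 0.30)))
```

The width of the 5%-95% interval shows how much the uncertainty in the two rates matters for the projection; narrowing one range at a time and rerunning gives a rough indication of which unknown most deserves further research.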
Model Selection. One attendee noted that the AIC (Akaike) approach requires fitting a general model and then working backward, eliminating parameters that seem unnecessary. In the regression context, the corresponding "backward elimination" method is rarely used nowadays. The commenter suggested that it is better to start with a simple model, evaluating which additional terms are useful, as contrasted with the AIC approach requiring a complex model as a starting point.
In expressing concern about complex models, one participant suggested that, with a complex model, some parameters and some of the algorithms linking parameters are sure to be unknown, there will be many interactions among parameters, and validation will be very difficult. Another participant noted that this was another reason for developing both simple and complex models for the same issue: if they do not give similar results, there is reason to be skeptical of both.
Model Validation. There was discussion of the fact that a model cannot be "validated" based on the same data as were used in developing the model. One approach is to split the dataset, build the model using one portion of the data, and evaluate the model with the other portion. Another approach is to develop the model based on existing data, use the model to make predictions, collect new data, and then evaluate the model based on those data. This approach may take considerable time, but has the advantage that data collection can be designed to collect the specific data needed to test the model.
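The split-sample approach can be sketched in a few lines: fit the model to a randomly chosen portion of the data and measure its prediction error on the portion that was held back. This minimal sketch assumes a straight-line model fitted by least squares, a 50/50 random split, and mean squared prediction error as the yardstick; all names and data are hypothetical. For the second approach, the same error measure would simply be computed on newly collected data instead of a held-out portion.

```python
import random
from statistics import fmean


def fit_line(x, y):
    """Least-squares intercept and slope."""
    mx, my = fmean(x), fmean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def split_sample_mse(x, y, train_fraction=0.5, seed=0):
    """Fit on a random training subset; return mean squared error on the rest."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * train_fraction)
    train, test = idx[:cut], idx[cut:]
    b0, b1 = fit_line([x[i] for i in train], [y[i] for i in train])
    return fmean((y[i] - (b0 + b1 * x[i])) ** 2 for i in test)


x = list(range(20))
y = [2 + 0.5 * xi + (1 if xi % 2 else -1) for xi in x]   # trend plus alternating noise
print(split_sample_mse(x, y))
```

Because the held-out points play no part in fitting, the error reported here is an honest check of predictive ability, whereas the error on the training points alone would tend to flatter the model.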
An attendee asked whether there is a danger in trying to apply static models to animals that learn and habituate. Dr. Wilson replied that model development is iterative; models should be modified and adapted to take account of processes like learning and habituation.
Formatted for the Web by:
National Wind Coordinating Committee
c/o RESOLVE, 1255 23rd Street NW, Suite 875, Washington, DC 20037
(888) 764-WIND (202) 965-6398 fax: (202) 338-1264 nwcc@resolv.org