Bruce A. Rideout, DVM, PhD, DACVP
Wildlife Disease Laboratories, Institute for Conservation Research, San Diego Zoo Global, Escondido, CA, USA
The purpose of this presentation is to provide some tips on how to avoid the most common pitfalls in designing, conducting, and publishing research studies. The most important pitfall arises when we fail to distinguish between descriptive and experimental research and to consider how this distinction influences the conclusions we can draw from our study. Other common pitfalls include choosing an inadequate sample size, overestimating the quality of data in our medical and pathology records, failing to distinguish between statistical significance and biological significance, confusing association with causation, and not recognizing when additional expertise is needed. By recognizing and addressing these issues during the study design phase, we can greatly enhance the success of our research.
Study Design Phase
The most significant pitfall arises when we fail to distinguish between descriptive and experimental research and to consider how this distinction influences the conclusions we can draw from our study. In grant proposals this problem is manifested as a study design that fails to answer the question being posed, while in manuscript submissions it manifests as a conclusion that is not warranted by the data.
Descriptive research consists of describing an event or observation in a very careful and systematic way with the purpose of gaining understanding and generating hypotheses that can be tested more rigorously in the future.2 For example, if you experience a novel infectious disease outbreak in a wildlife population, one of the most important things you can do is publish a very detailed description of the outbreak. One weakness of such a descriptive report is that we have incomplete knowledge of the many variables that can influence the occurrence and outcome of a disease outbreak (i.e., we have no controls), so we are limited in the conclusions we can draw. We could not even claim that the same disease would manifest itself in the same way in another population (the findings might not be generalizable). All we can do is describe what happened and allow the reader to determine whether the information is relevant for them. In spite of this weakness, descriptive studies in the form of case reports, case series reports, and surveys are extremely valuable, but should always be interpreted in accordance with their limitations.
At the other end of the spectrum is experimental research, where we develop a research question or hypothesis to test, identify the dependent and independent variables of interest, incorporate controls for each of the independent variables, and design the study with a large enough sample size to provide the statistical power needed to answer the question. Such experimental research often involves laboratory animals or laboratory settings and is outside the scope of what most zoo and wildlife researchers do. The key advantage of experimental research is that we can carefully control for factors or variables that could influence the outcome of our investigation, which allows us to draw stronger conclusions from our findings. But the validity of our conclusions depends to a large extent on how well we control for extraneous factors and whether we can generate the appropriate sample size. Pulling this off is a tricky business and one of the main reasons why a PhD is very helpful for individuals pursuing a career in experimental research.
More relevant for us are observational studies in the middle of the research spectrum. Observational studies differ from experimental designs because the researcher has no control over the allocation of study subjects or variables to the groups being compared. They differ from descriptive studies because they are specifically designed to test a hypothesis and compare two or more groups by adding carefully selected controls. Essentially, observational studies could be considered descriptive studies with added controls. I’ll illustrate this by giving an example of a descriptive study that would fail to answer the question being posed if not for some carefully designed controls.
One of the most important wildlife health concerns today is the spillover of human and domestic animal diseases to wildlife. If I want to demonstrate that human or domestic animal diseases are spilling over into a particular wildlife population, a key variable of interest in my study would be transmission direction, but evaluating transmission direction is extraordinarily difficult in most natural settings. For example, a study currently being conducted in Africa proposes to test the hypothesis that human enteric pathogens are spilling over into chimpanzee populations. A simple descriptive study demonstrating that sympatric human and chimpanzee populations share the same Salmonella strains would not prove that spillover was occurring from humans to chimpanzees—one could just as easily argue that spillover was occurring from chimpanzees to humans. The current study assesses transmission direction in part by demonstrating that antibiotic resistance patterns of human and chimpanzee bacterial isolates are the same in areas where their populations overlap, but the antibiotic resistance disappears in bacterial isolates from chimpanzees deeper in the forest (while remaining in isolates from humans further removed from the chimpanzees). This is an elegant solution to a difficult problem and will allow the researchers to draw firmer conclusions from their study.
Another common pitfall is failing to determine whether our sample size will actually provide enough statistical power to detect a meaningful difference in our population measurements.3 In the above example, the plan might have been to collect as many samples as possible in one field trip, on the assumption that more samples are better and that whatever the researchers manage to collect will be enough. This is a faulty assumption that can lead to erroneous conclusions. For example, if my sample size is too small, I might fail to detect that spillover is actually occurring simply because the prevalence of the target agent is too low for my chosen sample size to detect. The best way to avoid this problem is to determine in advance what detection level is biologically meaningful (e.g., I want to be able to detect the agent even if the prevalence is only 5% because the disease could severely impact my population) and what confidence level is appropriate based on the consequences of my findings (e.g., I want to be 95% certain of my findings because they will have major consequences for policy decisions). I can then determine the sample size required to detect this prevalence level with the given degree of confidence. It is important to note that calculating sample sizes can be complex and depends on the study design, so it is often best to collaborate with a statistician to determine the appropriate numbers needed to meet the research objectives. There are also tables and free software programs available for approximating sample size that may be appropriate to use in some situations. Most granting agencies now require sample size calculations in grant applications to demonstrate that you will be able to obtain the necessary data to answer your research question. Since such calculations are based on large sample size theory, the number of study subjects required might exceed what is possible when conducting research with small populations or limited resources. In these situations, we might need to adjust our research goals accordingly.
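As a rough illustration of the 5% prevalence and 95% confidence example above, the following Python sketch approximates the number of samples needed to find at least one positive animal. It is only a minimal sketch under assumptions that are not part of the original discussion: a perfectly accurate test, random sampling, and a population large enough that each sample can be treated as independent. The function names are hypothetical, and a real study design may call for a different method.

```python
import math


def sample_size_to_detect(prevalence: float, confidence: float) -> int:
    """Smallest n for which P(at least one positive in n samples) >= confidence.

    Assumes a perfect test, random sampling, and a large population.
    """
    if not (0.0 < prevalence < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("prevalence and confidence must be between 0 and 1")
    # P(all n samples negative) = (1 - prevalence) ** n; solve for n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - prevalence))


def probability_of_detection(prevalence: float, n: int) -> float:
    """Chance that at least one of n samples is positive (same assumptions)."""
    return 1.0 - (1.0 - prevalence) ** n


if __name__ == "__main__":
    n = sample_size_to_detect(prevalence=0.05, confidence=0.95)
    print(n)  # 59
    print(round(probability_of_detection(0.05, 20), 2))  # 0.64
```

Under these simplifying assumptions, about 59 samples would be needed, while a field trip that yields only 20 samples would have roughly a 64% chance of detecting the agent at all. An imperfect test or a small population changes these numbers, which is one reason to involve a statistician.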
Study Execution Phase
An often underappreciated problem in research studies is the tendency to overestimate the quality of data in our medical and pathology records, and to underestimate the pitfalls associated with extracting the data.4,5 For example, it is not unusual for a medical record to indicate that an animal has a particular disease or is “positive” for a disease agent without any indication of what test was used or why the test was conducted. In order to properly interpret the meaning of that test result, we need to know not only what test was performed, but why the test was performed. This is because the predictive value (i.e., how useful the test is when applied to animals with an unknown disease status) will be largely determined by the accuracy of the test being used and by the true prevalence of disease in the study population (e.g., whether you were testing a sick animal with compatible clinical signs or were screening a healthy animal for a disease of low prevalence). Even the simple act of extracting data from medical and pathology records can present problems.1 For example, some types of data are more easily extracted from medical records than others, which can lead to data abstraction bias (e.g., favoring one source of information over another). Medical records also tend to have a bias towards recording positive test results (or observations) over those that are negative. Very often, negative results or observations are either not recorded, or are recorded in such a way that it is not possible to determine whether an animal tested negative or was not tested at all. In research studies, positive and negative observations are equally important. We can also unwittingly introduce bias if we overlook (or overemphasize) variation in observer reliability. Dealing with missing or conflicting data can also be complicated and is one of many important reasons for including epidemiologists as collaborators.
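The dependence of predictive value on test accuracy and prevalence can be made concrete with a short Python sketch. The sensitivity, specificity, and prevalence figures below are hypothetical values chosen only for illustration, and the function name is an assumption of this example; the calculation itself is the standard application of Bayes' theorem to a diagnostic test.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (positive predictive value, negative predictive value)."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    # Hypothetical test: 95% sensitivity and 95% specificity.
    # Screening healthy animals for a disease with 1% prevalence:
    ppv_screen, _ = predictive_values(0.95, 0.95, prevalence=0.01)
    # Testing sick animals with compatible signs, where prevalence is 50%:
    ppv_clinical, _ = predictive_values(0.95, 0.95, prevalence=0.50)
    print(round(ppv_screen, 2))    # 0.16
    print(round(ppv_clinical, 2))  # 0.95
```

With these hypothetical figures, the same "positive" entry in a medical record would be correct about 16% of the time if it came from screening a healthy animal, but about 95% of the time if it came from testing a sick animal with compatible clinical signs.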
Analysis of Results and Drawing Conclusions
If we have overlooked the distinction between descriptive and experimental research at the outset, we can run into problems during the analysis phase by unwittingly drawing conclusions that are not warranted by the data. In our first example, this would be drawing the conclusion that spillover of a human enteric pathogen into chimpanzees had occurred based only on the isolation of similar strains in areas of population overlap. Without the antibiotic resistance patterns as an added control, the researchers would not have been able to draw any meaningful conclusions.
Even if we have understood the distinction between the different study types, have chosen a robust sample size, done the appropriate power analysis, and have properly designed our study, there are other pitfalls that can arise with interpretation of our findings. We sometimes fail to recognize that it is possible for a finding to be statistically significant but not biologically significant. Undetected bias can easily lead to spurious statistical significance. Using a sophisticated statistical analysis does not obviate the need for clear thinking and applying our broad knowledge as veterinarians. A related problem is confusing association with causation. The fact that two events or variables are associated in time or space does not imply that one caused the other. Even though we are all familiar with this problem, the temptation to draw unwarranted causal inferences during disease investigations can be overwhelming.
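The distinction between statistical and biological significance can be sketched with a simple calculation. The example below assumes a two-sample z-test with a known, equal standard deviation in both groups, and all of the numbers are hypothetical; it is meant only to show that the same small difference can be "significant" or not depending entirely on sample size.

```python
import math


def two_sided_p_value(mean_difference: float, sd: float, n_per_group: int) -> float:
    """Two-sided p-value for a two-sample z-test with known, equal SD."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = mean_difference / standard_error
    # For a standard normal, 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical: groups differ by 0.1 units on a measurement with SD 1.0.
    p_small_study = two_sided_p_value(0.1, sd=1.0, n_per_group=20)
    p_large_study = two_sided_p_value(0.1, sd=1.0, n_per_group=10_000)
    print(p_small_study > 0.05)  # True: not statistically significant
    print(p_large_study < 0.05)  # True: statistically significant
```

The difference between groups is identical in both cases. Whether a difference of that size matters to the animal is a biological question that the p-value cannot answer.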
Finally, a wide variety of complex and interesting problems can arise in all phases of a study when we unwittingly drift outside the boundaries of our expertise and begin designing and executing studies better left to specialists in other fields. Knowing when and how to recruit outside expertise is a skill worth acquiring.
In summary, the most significant and most frequent pitfalls in research can be avoided by understanding that the questions we can answer with a study, and therefore the conclusions we can draw from it, depend largely upon the controls and sample sizes we select. Selecting the proper controls or comparison groups requires identifying the most important variables that influence the outcome we are trying to measure. If you have trouble doing this, that is the first indication that you might need a collaborator with skills in that area. Determining the appropriate sample size begins with identifying the study design, the effect size we wish to detect, and the level of confidence we need in the result. We can then choose the appropriate method for calculating the sample size required to meet our research objectives. We need to be aware of the potential for bias in the extraction of data from our medical records, and how the quality of information in our records can vary. When interpreting the results of our study, we need to make sure our conclusions are only as firm and specific as our study design and controls or comparison groups allow, and to keep in mind the difference between statistical significance and biological significance.
The author thanks Carmel Witte for helpful comments and for keeping him from straying beyond the bounds of his expertise.
1. Delgado-Rodríguez, M., J. Llorca. 2004. Bias. J. Epidemiol. Community Health. 58:635–641.
2. Dohoo, I., W. Martin, H. Stryhn. 2009. Veterinary Epidemiologic Research, 2nd ed. VER Inc., Charlottetown, Prince Edward Island, Canada. Pp. 151–166.
3. Fitzner, K., E. Heckinger. 2010. Sample size calculation and power analysis: a quick review. Diabetes Educ. 36(5):701–707.
4. Grimes, D. A., K. F. Schulz. 2002. Bias and causal associations in observational research. The Lancet. 359(9302):248–252.
5. Worster, A., T. Haines. 2004. Advanced statistics: understanding medical record review (MRR) studies. Acad. Emerg. Med. 11(2):187–192.