Glossary: epidemiology
Source: Prognosis in Head and Neck Cancer, ed. RJ Baatenburg de Jong, Taylor & Francis 2005
Absolute risk difference
The absolute difference between two risks
Absolute risk reduction
1. The amount, preferably expressed as a percentage, by which the risk of a disease is reduced by elimination or control of a particular exposure. It is possible from this to estimate the number of people spared the consequences of an exposure.
2. In clinical epidemiology, the proportion of untreated persons who experience an adverse event, minus the proportion of treated persons who experience this event; used in calculating number needed to treat.
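The second sense can be illustrated with a small calculation. The sketch below is only an illustration: the function names and the trial counts are hypothetical, and the number needed to treat is taken as the reciprocal of the absolute risk reduction.

```python
def absolute_risk_reduction(control_events, control_total, treated_events, treated_total):
    """Risk in the untreated (control) group minus risk in the treated group."""
    control_risk = control_events / control_total
    treated_risk = treated_events / treated_total
    return control_risk - treated_risk


def number_needed_to_treat(arr):
    """Reciprocal of the absolute risk reduction (defined here only for arr > 0)."""
    if arr <= 0:
        raise ValueError("absolute risk reduction must be positive")
    return 1.0 / arr


# Hypothetical trial: 20 of 100 untreated and 10 of 100 treated patients have the event.
arr = absolute_risk_reduction(20, 100, 10, 100)
nnt = number_needed_to_treat(arr)
print(f"ARR = {arr:.2f}, NNT = {nnt:.1f}")
```

With these made-up counts, the risk falls by 10 percentage points, so about 10 patients would need to be treated to prevent one event.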
Additive model
A model in which the combined effect of several factors is the sum of the effects that would be produced by each of the factors in the absence of the others. For example, if factor X adds x% to risk in the absence of Y, and if factor Y adds y% to risk in the absence of X, an additive model states that the two factors together will add (x + y)% to risk. See also: interaction; multiplicative model.
Adjusted analysis
An analysis that controls (adjusts) for baseline imbalances in important patient characteristics. See also confounder, regression analysis.
Applicability
See external validity.
Ascertainment bias
Systematic failure to represent equally all classes of cases or persons supposed to be represented in a sample. This bias may arise because of the nature of the sources from which persons come, e.g., a specialized clinic; from a diagnostic process influenced by culture, custom, or idiosyncrasy; or, for example, in genetic studies, from the statistical chance of selecting from large or small families.
Association
(Syn: correlation, [statistical] dependence, relationship). Statistical dependence between two or more events, characteristics, or other variables. An association is present if the probability of occurrence of an event or characteristic, or the quantity of a variable, depends upon the occurrence of one or more other events, the presence of one or more other characteristics, or the quantity of one or more other variables. The association between two variables is described as positive when higher values of a variable are associated with higher values of another variable. In a negative association, the occurrence of higher values of one variable is associated with lower values of the other variable. An association may be fortuitous or may be produced by various other circumstances; the presence of an association does not necessarily imply a causal relationship. If the use of the term association is confined to situations in which the relationship between two variables is statistically significant, the terms statistical association and statistically significant association become tautological. However, ordinary usage is seldom so precise as this. The terms association and relationship are often used interchangeably. Associations can be broadly grouped under two headings, symmetrical or noncausal (see below) and asymmetrical or causal.
Attrition
The loss of participants during the course of a study. (Also called loss to follow-up.) Participants who are lost during the study are often called dropouts.
Attrition bias
Systematic differences between comparison groups in withdrawals or exclusions of participants from the results of a study. For example, participants may drop out of a study because of side effects of an intervention, and excluding these participants from the analysis could result in an overestimate of the effectiveness of the intervention, especially when the proportion dropping out varies by treatment group.
Baseline characteristics
Values of demographic, clinical and other variables collected for each participant at the beginning of a trial, before the intervention is administered.
Bayes' theorem
A theorem in probability theory named for Thomas Bayes (1702-1761), an English clergyman and mathematician; his Essay Towards Solving a Problem in the Doctrine of Chances (1763, published posthumously) contained this theorem. In epidemiology, it is used to obtain the probability of disease in a group of people with some characteristic on the basis of the overall rate of that disease (the prior probability of disease) and of the likelihoods of that characteristic in healthy and diseased individuals. The most familiar application is in clinical decision analysis, where it is used for estimating the probability of a particular diagnosis given the appearance of some symptoms or test result.
A simplified version of the theorem is
$$
P(D \mid S) = \frac{P(S \mid D)\,P(D)}{P(S \mid D)\,P(D) + P(S \mid \bar{D})\,P(\bar{D})}
$$
where $D$ = disease, $S$ = symptom, and $\bar{D}$ = no disease. The formula emphasizes what clinical intuition often overlooks, namely, that the probability of disease given this symptom depends not only on how characteristic that symptom is of the disease but also on how frequent the disease is among the population being served.
The theorem can also be used for estimating exposure-specific rates from case control studies if there is added information about the overall rate of disease in that population.
Some of the terms in the theorem are named. The probability of disease given the symptom is the posterior probability. It is an estimate of the probability of disease posterior to knowing whether or not the symptom was present. The overall probability of disease among the population or our guess of the probability of disease before knowing of the presence or absence of the symptom is the prior probability. The theorem is sometimes presented in terms of the odds of disease before knowing the symptom (Prior odds) and after knowing the symptom (Posterior odds).
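A short numerical sketch may make the role of the prior probability concrete. The function names and the input values below are hypothetical, chosen only for illustration; the second function uses the odds form mentioned above, in which the posterior odds equal the prior odds multiplied by the likelihood ratio.

```python
def posterior_probability(prior, p_symptom_given_disease, p_symptom_given_no_disease):
    """Probability of disease given the symptom, by Bayes' theorem."""
    numerator = p_symptom_given_disease * prior
    denominator = numerator + p_symptom_given_no_disease * (1.0 - prior)
    return numerator / denominator


def posterior_odds(prior, p_symptom_given_disease, p_symptom_given_no_disease):
    """Odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1.0 - prior)
    likelihood_ratio = p_symptom_given_disease / p_symptom_given_no_disease
    return prior_odds * likelihood_ratio


# Hypothetical values: 1% of the population has the disease; the symptom occurs
# in 90% of diseased and 10% of non-diseased persons.
posterior = posterior_probability(0.01, 0.9, 0.1)
odds = posterior_odds(0.01, 0.9, 0.1)
print(f"posterior probability = {posterior:.3f}")
print(f"posterior odds = {odds:.3f}")
```

Even though the symptom is highly characteristic of the disease in this made-up example, the posterior probability stays below 10% because the disease is rare.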
Bayesian statistics
A method of statistical inference that begins with the state of knowledge, i.e., the facts, prior to an exposure or an intervention, and augments this with the study data to yield the state of knowledge posterior to the study, does not use statistical significance tests, and uses likelihood intervals rather than confidence intervals (i.e., statistical inference using Bayes' theorem). This method has many uses, e.g., evaluation of diagnostic tests, disease progression, case control studies and sequential clinical trials.
Beta See type II error.
Beta error See error, type II.
Bias
Deviation of results or inferences from the truth, or processes leading to such deviation. Any trend in the collection, analysis, interpretation, publication, or review of data that can lead to conclusions that are systematically different from the truth. Among the ways in which deviation from the truth can occur, are the following:
1. Systematic (one-sided) variation of measurements from the true values (syn: systematic error).
2. Variation of statistical summary measures (means, rates, measures of association, etc.) from their true values as a result of systematic variation of measurements, other flaws in data collection, or flaws in study design or analysis.
3. Deviation of inferences from the truth as a result of flaws in study design, data collection, or the analysis or interpretation of results.
4. A tendency of procedures (in study design, data collection, analysis, interpretation, review, or publication) to yield results or conclusions that depart from the truth.
5. Prejudice leading to the conscious or unconscious selection of study procedures that depart from the truth in a particular direction or to one-sidedness in the interpretation of results.
The term bias does not necessarily carry an imputation of prejudice or other subjective factor, such as the experimenter's desire for a particular outcome. This differs from conventional usage, in which bias refers to a partisan point of view. Many varieties of bias have been described.
Binary data See dichotomous data.
Case control study
(Syn: case comparison study, case compeer study, case history study, case referent study, retrospective study). The observational epidemiologic study of persons with the disease (or other outcome variable) of interest and a suitable control (comparison, reference) group of persons without the disease. The relationship of an attribute to the disease is examined by comparing the diseased and nondiseased with regard to how frequently the attribute is present or, if quantitative, the levels of the attribute, in each of the groups. In short, the past history of exposure to a suspected risk factor is compared between "cases" and "controls," persons who resemble the cases in such respects as age and sex but do not have the disease or condition of interest.
Such a study can be called "retrospective" because it starts after the onset of disease and looks back to the postulated causal factors. Cases and controls in a case control study may be accumulated "prospectively," that is, as each new case is diagnosed it is entered in the study. Nevertheless, such a study may still be called "retrospective" because it looks back from the outcome to its causes. The terms cases and controls are sometimes used to describe subjects in a randomized controlled trial, but the term case control study should not be used to describe such a study.
Categorical data
Data that are classified into two or more non-overlapping categories. Race and type of drug (aspirin, paracetamol, etc.) are examples of categorical variables. If there is a natural order to the categories, for example, non-smokers, ex-smokers, light smokers and heavy smokers, the data are known as ordinal data. If there are only two categories, the data are dichotomous data.
Causal effect
An association between two characteristics that can be demonstrated to be due to cause and effect, i.e., a change in one causes the change in the other. Causality can be demonstrated by experimental studies such as controlled trials (for example, that an experimental intervention causes a reduction in mortality). However, causality can often not be determined from an observational study.
Censored
[In survival analysis:] A term used in studies where the outcome is the time to a particular event, to describe data from patients where the outcome is unknown. A patient might be known not to have had the event only up to a particular point in time, so 'survival time' is censored at this point.
Chi-squared test
A statistical test based on comparison of a test statistic to a chi-squared distribution.
CI (confidence interval)
The computed interval with a given probability, e.g., 95%, that the true value of a variable such as a mean, proportion, or rate is contained within the interval.
Clinical trial
(Syn: therapeutic trial) A research activity that involves the administration of a test regimen to humans to evaluate its efficacy and safety. The term is subject to wide variation in usage, from the first use in humans without any control treatment to a rigorously designed and executed experiment involving test and control treatments and randomization. Several phases of clinical trials are distinguished:
Phase I trial Safety and pharmacologic profiles. The first introduction of a candidate vaccine or a drug into a human population to determine its safety and mode of action. In drug trials, this phase may include studies of dose and route of administration. Phase I trials usually involve fewer than 100 healthy volunteers.
Phase II trial Pilot efficacy studies. Initial trial to examine efficacy usually in 200 to 500 volunteers; with vaccines, the focus is on immunogenicity, and with drugs, on demonstration of safety and efficacy in comparison to other existing regimens. Usually but not always, subjects are randomly allocated to study and control groups.
Phase III trial Extensive clinical trial. This phase is intended for complete assessment of safety and efficacy. It involves larger numbers, perhaps thousands, of volunteers, usually with random allocation to study and control groups, and may be a multicenter trial.
Phase IV trial With drugs, this phase is conducted after the national drug registration authority (e.g., the Food and Drug Administration in the United States) has approved the drug for distribution or marketing. Phase IV trials may include research designed to explore a specific pharmacologic effect, to establish the incidence of adverse reactions, or to determine the effects of long-term use.
Clinically significant
A result (e.g. a treatment effect) that is large enough to be of practical importance to patients and healthcare providers. This is not the same thing as statistically significant. Assessing clinical significance takes into account factors such as the size of a treatment effect, the severity of the condition being treated, the side effects of the treatment, and the cost. For instance, if the estimated effect of a treatment for acne was small but statistically significant, but the treatment was very expensive, and caused many of the treated patients to feel nauseous, this would not be a clinically significant result. Showing that a drug lowered the heart rate by an average of 1 beat per minute would also not be clinically significant.
Cohort study
(Syn: concurrent, follow-up, incidence, longitudinal, prospective study) The analytic method of epidemiologic study in which subsets of a defined population can be identified who are, have been, or in the future may be exposed or not exposed, or exposed in different degrees, to a factor or factors hypothesized to influence the probability of occurrence of a given disease or other outcome. The main feature of a cohort study is observation of large numbers over a long period (commonly years) with comparison of incidence rates in groups that differ in exposure levels. The alternative terms for a cohort study, i.e., follow-up, longitudinal, and prospective study, describe an essential feature of the method, which is observation of the population for a sufficient number of person-years to generate reliable incidence or mortality rates in the population subsets. This generally implies study of a large population, study for a prolonged period (years), or both. The denominator may be persons or person-time.
Cointervention
In a randomized controlled trial, the application of additional diagnostic or therapeutic procedures to members of either or both the experimental and the control groups.
Comorbidity
Disease(s) that coexist(s) in a study participant in addition to the index condition that is the subject of study.
Confidence interval (CI)
The computed interval with a given probability, e.g., 95%, that the true value of a variable such as a mean, proportion, or rate is contained within the interval.
Confidence limits
The upper and lower boundaries of the confidence interval.
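As an illustration of how confidence limits are computed, the sketch below uses the normal-approximation (Wald) interval for a proportion. This particular method is an assumption made for the example, not something the glossary prescribes, and the function name and counts are hypothetical.

```python
from math import sqrt


def proportion_confidence_limits(events, total, z=1.96):
    """Lower and upper limits of a Wald interval; z = 1.96 corresponds to 95%."""
    p = events / total
    standard_error = sqrt(p * (1.0 - p) / total)
    return p - z * standard_error, p + z * standard_error


# Hypothetical sample: 30 events among 100 persons.
lower, upper = proportion_confidence_limits(30, 100)
print(f"95% CI: {lower:.3f} to {upper:.3f}")
```

The Wald interval is simple but can behave poorly for small samples or proportions near 0 or 1, where other interval methods are usually preferred.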
Confounder
A factor that is associated with both an intervention (or exposure) and the outcome of interest. For example, if people in the experimental group of a controlled trial are younger than those in the control group, it will be difficult to decide whether a lower risk of death in one group is due to the intervention or the difference in ages. Age is then said to be a confounder, or a confounding variable. Randomisation is used to minimize imbalances in confounding variables between experimental and control groups. Confounding is a major concern in nonrandomised studies. See also adjusted analysis.
Contingency table
A tabular cross-classification of data such that subcategories of one characteristic are indicated horizontally (in rows) and subcategories of another characteristic are indicated vertically (in columns). Tests of association between the characteristics in the columns and rows can be readily applied. The simplest contingency table is the fourfold, or 2 × 2, table. Contingency tables may be extended to include several dimensions of classification.
Continuous data, Continuous variable
Data (variable) with a potentially infinite number of possible values along a continuum. Data representing a continuous variable include height, weight, and enzyme output.
Control
1. To regulate, restrain, correct, restore to normal.
2. Applied to many communicable and some noncommunicable conditions, control means ongoing operations or programs aimed at reducing incidence and/or prevalence, or eliminating such conditions.
3. As used in the expressions case control study and randomized control(led) trial, control means person(s) in a comparison group that differs, in disease experience or allocation to a regimen, from the subjects of the study.
4. In statistics, control means to adjust for or take into account extraneous influences or observations.
5. In the expression control variable, we refer to an independent variable other than the hypothetical causal variable that has a potential effect on the dependent variable and is subject to control by analysis.
Control group, Controls
Subjects with whom comparison is made in a case control study, randomized controlled trial, or other variety of epidemiologic study. Selection of appropriate controls is crucial to the validity of epidemiologic studies and has been much discussed.
Controlled trial
A clinical trial that has a control group. Such trials are not necessarily randomised.
Correlation
The degree to which variables change together.
Cost-benefit analysis
An analysis in which the economic and social costs of medical care and the benefits of reduced loss of net earnings due to preventing premature death or disability are considered. The general rule for the allocation of funds in a cost-benefit analysis is that the ratio of marginal benefit (the benefit of preventing an additional case) to marginal cost (the cost of preventing an additional case) should be equal to or greater than 1.
Cost-effectiveness analysis
This form of analysis seeks to determine the costs and effectiveness of an activity or to compare similar alternative activities to determine the relative degree to which they will obtain the desired objectives or outcomes. The preferred action or alternative is one that requires the least cost to produce a given level of effectiveness, or provides the greatest effectiveness for a given level of cost. In the health care field, outcomes are measured in terms of health status.
Cost-utility analysis
A form of economic evaluation in which the outcomes of alternative procedures or programs are expressed in terms of a single "utility-based" unit of measurement. A widely used utility-based measure is the quality-adjusted life year.
Crossover trial
A type of clinical trial comparing two or more interventions in which the participants, upon completion of the course of one treatment, are switched to another. For example, for a comparison of treatments A and B, the participants are randomly allocated to receive them in either the order A, B or the order B, A. Particularly appropriate for study of treatment options for relatively stable health problems. The time during which the first intervention is taken is known as the first period, with the second intervention being taken during the second period.
Cross-sectional study
(Syn: disease frequency survey, prevalence study)
A study that examines the relationship between diseases (or other health-related characteristics) and other variables of interest as they exist in a defined population at one particular time. The presence or absence of disease and the presence or absence of the other variables (or, if they are quantitative, their level) are determined in each member of the study population or in a representative sample at one particular time. The relationship between a variable and the disease can be examined (1) in terms of the prevalence of disease in different population subgroups defined according to the presence or absence (or level) of the variables and (2) in terms of the presence or absence (or level) of the variables in the diseased versus the nondiseased. Note that disease prevalence rather than incidence is normally recorded in a cross-sectional study. The temporal sequence of cause and effect cannot necessarily be determined in a cross-sectional study.
Decision analysis
A derivative of operations research and game theory that involves identifying all available choices, and potential outcomes of each, in a series of decisions that have to be made about aspects of patient care: diagnostic procedures, therapeutic regimens, prognostic expectations. Epidemiologic data play a large part in determining the probabilities of outcomes following each choice that has to be made. The range of choices can be plotted on a decision tree, and at each branch, or decision node, the probabilities of each outcome that can be predicted are displayed. The decision tree thus portrays the choices available to those responsible for patient care and the probabilities of each outcome that will follow the choice of a particular action or strategy in patient care. The relative worth of each outcome is preferably also described as a utility or quality of life, e.g., a probability of life expectancy or of freedom from disability often expressed as QALYs.
Degrees of freedom (df)
The number of independent comparisons that can be made between the members of a sample. This important concept in statistical testing cannot be defined briefly. It refers to the number of independent contributions to a sampling distribution (such as the χ², t, and F distributions). In a contingency table it is one less than the number of row categories multiplied by one less than the number of column categories.
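The contingency-table rule can be sketched in a few lines. The example below also computes Pearson's chi-squared statistic, which is one common test statistic compared against a chi-squared distribution; the function name and the table counts are hypothetical.

```python
def chi_squared_statistic(table):
    """Return (Pearson chi-squared statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under no association between rows and columns.
            expected = row_totals[i] * column_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    # (rows - 1) * (columns - 1), as in the definition above.
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom


# Hypothetical 2 x 2 table: rows = exposed / unexposed, columns = diseased / not diseased.
statistic, df = chi_squared_statistic([[30, 70], [10, 90]])
print(f"chi-squared = {statistic:.2f} on {df} degree(s) of freedom")
```

A table with 3 rows and 4 columns would have (3 - 1) × (4 - 1) = 6 degrees of freedom by the same rule.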
Dependent variable
1. A variable the value of which is dependent on the effect of other variable(s), the independent variable(s), in the relationship under study. A manifestation or outcome whose variation we seek to explain or account for by the influence of independent variables.
2. In statistics, the dependent variable is the one predicted by a regression equation.
See also independent variable.
Descriptive study
A study concerned with and designed only to describe the existing distribution of variables, without regard to causal or other hypotheses. An example is a community health survey, used to determine the health status of the people in a community. Descriptive studies, e.g., analyses of cancer registry data, can be used to measure risks, generate hypotheses, etc.
Detection bias
Bias due to systematic error(s) in methods of ascertainment, diagnosis, or verification of cases in an epidemiologic study. An example is verification of diagnosis by laboratory tests in hospital cases but failure to apply the same tests to cases outside the hospital.
Dichotomous data
Data that can take one of two possible values, such as dead/alive, smoker/nonsmoker, present/not present. (Also called binary data.) Sometimes continuous data or ordinal data are simplified into dichotomous data (e.g. age in years could become <75 years or ≥75 years).
Distribution
The complete summary of the frequencies of the values or categories of a measurement made on a group of persons. The distribution tells either how many or what proportion of the group was found to have each value (or each range of values) out of all the possible values that the quantitative measure can have.
Dropout
A person enrolled in a study who becomes inaccessible or ineligible for follow-up, e.g., because of inability or unwillingness to remain enrolled in the study. The occurrence of dropouts can lead to biases in study results.
Effect size
1. A generic term for the estimate of effect of treatment for a study.
2. A dimensionless measure of effect that is typically used for continuous data when different scales (e.g. for measuring pain) are used to measure an outcome and is usually defined as the difference in means between the intervention and control groups divided by the standard deviation of the control or both groups. See also standardised mean difference.
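The second sense can be sketched as follows. This example divides by the standard deviation of the control group, one of the two options named in the definition; the function name and the pain scores are hypothetical.

```python
from statistics import mean, stdev


def effect_size(intervention, control):
    """Difference in means divided by the control-group standard deviation."""
    return (mean(intervention) - mean(control)) / stdev(control)


# Hypothetical pain scores (lower is better) in two small groups.
intervention_scores = [3, 4, 5, 4, 4]
control_scores = [5, 6, 7, 6, 6]
d = effect_size(intervention_scores, control_scores)
print(f"effect size = {d:.2f}")
```

Because the result is expressed in standard-deviation units, it can be compared across studies that measured the same outcome on different scales.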
Effectiveness
In the usage made standard among epidemiologists by A. L. Cochrane (1909-1988), effectiveness is a measure of the extent to which a specific intervention, procedure, regimen, or service, when deployed in the field in routine circumstances, does what it is intended to do for a specified population; a measure of the extent to which a health care intervention fulfills its objectives. To be distinguished from efficacy and efficiency.
Efficacy
In clinical epidemiology, the extent to which a specific intervention, procedure, regimen, or service produces a beneficial result under ideal conditions; the benefit or utility to the individual or the population of the service, treatment regimen or intervention. Ideally, the determination of efficacy is based on the results of a randomized controlled trial.
Empirical
Based directly on experience, e.g., observation or experiment, rather than on reasoning alone.
Endpoint See outcome.
Epidemiology
The study of the distribution and determinants of health-related states or events in specified populations, and the application of this study to control of health problems. "Study" includes surveillance, observation, hypothesis testing, analytic research, and experiments. "Distribution" refers to analysis by time, place, and classes of persons affected. "Determinants" are all the physical, biological, social, cultural, and behavioural factors that influence health. "Health-related states and events" include diseases, causes of death, behaviours such as use of tobacco, reactions to preventive regimens, and provision and use of health services. "Specified populations" are those with identifiable characteristics such as precisely defined numbers. "Application to control…" makes explicit the aim of epidemiology: to promote, protect, and restore health.
Estimate of effect
The observed relationship between an intervention and an outcome expressed as, for example, a number needed to treat to benefit, odds ratio, risk difference, risk ratio, standardised mean difference, or weighted mean difference. (Also called treatment effect.)
Experimental study
A study in which conditions are under the direct control of the investigator. In epidemiology, a study in which a population is selected for a planned trial of a regimen whose effects are measured by comparing the outcome of the regimen in the experimental group with the outcome of another regimen in a control group. To avoid bias, members of the experimental and control groups should be comparable except in the regimen that is offered them. Allocation of individuals to experimental or control groups is ideally by randomization. In a randomized controlled trial, individuals are randomly allocated; in some experiments, e.g., fluoridation of drinking water, whole communities have been (nonrandomly) allocated to experimental and control groups.
External validity
The extent to which results provide a correct basis for generalisations to other circumstances. For instance, a meta-analysis of trials of elderly patients may not be generalisable to children. (Also called generalisability or applicability.)
False negative
Negative test result in a person who possesses the attribute for which the test is conducted. The labelling of a diseased person as healthy when screening in the detection of disease. See also sensitivity.
False positive
Positive test result in a person who does not possess the attribute for which the test is conducted. The labelling of a healthy person as diseased when screening in the detection of disease. See also specificity.
Follow-up
Observation over a period of time of an individual, group, or initially defined population whose appropriate characteristics have been assessed in order to observe changes in health status or health-related variables. See also cohort.
Funnel plot
A plotting device used in meta-analysis to detect publication bias. The estimate of risk is plotted against sample size. If there is no publication bias, the plot is funnel-shaped; but if studies showing significant results are more likely to be published, there is a "hole in the lower left corner" of the funnel.
Generalizability See validity, study.
Gold standard
A method, procedure, or measurement that is widely accepted as being the best available. Often used to compare with new methods.
Hazard rate
(Syn: force of morbidity, instantaneous incidence rate) A theoretical measure of the risk of occurrence of an event, e.g., death or new disease, at a point in time, t, defined mathematically as the limit, as Δt approaches zero, of the probability that an individual well at time t will experience the event by t + Δt, divided by Δt.
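Written as a formula, the verbal definition above takes the following form. The symbols $h(t)$ for the hazard rate and $T$ for the time at which the event occurs are notation assumed here for illustration; they do not appear in the source.

```latex
h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
```

The condition $T \ge t$ expresses "an individual well at time t", i.e., one who has not yet experienced the event.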
Hazard ratio
A measure of effect produced by a survival analysis. This represents the increased risk with which one group is likely to experience the outcome of interest. For example, if the hazard ratio for death for a treatment is 0.5, then we can say that treated patients are likely to die at half the rate of untreated patients.
Heterogeneity
1. Used in a general sense to describe the variation in, or diversity of, participants, interventions, and measurement of outcomes across a set of studies, or the variation in internal validity of those studies.
2. Used specifically, as statistical heterogeneity, to describe the degree of variation in the effect estimates from a set of studies. Also used to indicate the presence of variability among studies beyond the amount expected due solely to the play of chance.
Heterogeneous
Used to describe a set of studies or participants with sizeable heterogeneity. The opposite of homogeneous.
Historical control
Control subject(s) for whom data were collected at a time preceding that at which the data are gathered on the group being studied. Because of differences in exposure, etc., use of historical controls can lead to bias in analysis.
Homogeneous
1. Used in a general sense to mean that the participants, interventions, and measurement of outcomes are similar across a set of studies.
2. Used specifically to describe the effect estimates from a set of studies where they do not vary more than would be expected by chance.
Hypothesis
1. A supposition, arrived at from observation or reflection, that leads to refutable predictions.
2. Any conjecture cast in a form that will allow it to be tested and refuted. See also null hypothesis.
Hypothesis test
A statistical procedure to determine whether to reject a null hypothesis on the basis of the observed data.
Incidence (Syn: incident number)
The number of instances of illness commencing, or of persons falling ill, during a given period in a specified population. More generally, the number of new events, e.g., new cases of a disease in a defined population, within a specified period of time. The term incidence is sometimes wrongly used to denote incidence rate.
Independent
A description of two events, where knowing the outcome or value of one does not inform us about the outcome or value of the other. Formally, two events 'A and B' are independent if the probability that A and B occur together is equal to the probability of A occurring multiplied by the probability of B occurring.
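The formal condition can be checked directly on a small sample space of equally likely outcomes. The sketch below uses a fair six-sided die as a hypothetical example; the function names are illustrative only.

```python
from fractions import Fraction


def probability(event, sample_space):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


def are_independent(a, b, sample_space):
    """True if P(A and B) equals P(A) multiplied by P(B)."""
    joint = probability(a & b, sample_space)
    return joint == probability(a, sample_space) * probability(b, sample_space)


die = set(range(1, 7))
even = {2, 4, 6}
at_most_four = {1, 2, 3, 4}
at_most_three = {1, 2, 3}

print(are_independent(even, at_most_four, die))   # P = 1/3 and 1/2 * 2/3 = 1/3
print(are_independent(even, at_most_three, die))  # P = 1/6 but 1/2 * 1/2 = 1/4
```

Exact fractions are used instead of floating-point numbers so that the equality test is not affected by rounding.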
Independent variable
1. The characteristic being observed or measured that is hypothesized to influence an event or manifestation (the dependent variable) within the defined area of relationships under study; that is, the independent variable is not influenced by the event or manifestation but may cause or contribute to variation of the event or manifestation.
2. In statistics, an independent variable is one of (perhaps) several variables that appear as arguments in a regression equation.
Intention-to-treat analysis
A procedure in the conduct and analysis of randomized controlled trials. All patients allocated to each arm of the treatment regimen are analyzed together as representing that treatment arm, whether or not they received or completed the prescribed regimen. Failure to follow this step defeats the main purpose of random allocation and can invalidate the results.
Interaction
1. The interdependent operation of two or more causes to produce or prevent an effect.
Biological interaction means the interdependent operation of two or more causes to produce, prevent, or control disease.
2. Differences in the effects of one or more factors according to the level of the remaining factor(s).
3. In statistics, the necessity for a product term in a linear model.
Intermediary outcomes
See surrogate endpoints.
Internal validity See validity.
Intervention
The process of intervening on people, groups, entities or objects in an experimental study. In controlled trials, the word is sometimes used to describe the regimens in all comparison groups, including placebo and no-treatment arms.
Intervention study
An investigation involving intentional change in some aspect of the status of the subjects, e.g., introduction of a preventive or therapeutic regimen, or designed to test a hypothesized relationship; usually an experiment such as a randomized controlled trial.
Kaplan-Meier estimate
A nonparametric method of compiling life or survival tables. This combines calculated probabilities of survival and estimates to allow for censored observations, which are assumed to occur randomly.
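The product-limit calculation behind this estimate can be sketched in a few lines of Python. This is a minimal illustration, not a validated implementation: the function name `kaplan_meier`, the toy data, and the convention that subjects censored at a given time still count as at risk at that time are assumptions made for the example.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  : follow-up time of each subject
    events : 1 if the event was observed, 0 if the subject was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects that share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set after t.
        at_risk -= removed
    return curve


# Five subjects; the second and the fifth are censored.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 0, 1, 1, 0])
for t, s in curve:
    print(t, round(s, 3))
```

Censored subjects never lower the survival estimate directly; they only shrink the number at risk for later event times.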
Linear scale
A scale that increases in equal steps. A linear scale may be used when the range of numbers being represented is not large, or to represent differences. See also logarithmic scale.
Logarithmic scale
A scale in which the logarithm of a value is used instead of the value. A logarithmic scale may be used when the range of numbers being represented is large, or to represent ratios. See also linear scale.
Logistic regression
A form of regression analysis that models an individual's odds of disease or some other outcome as a function of a risk factor or intervention. It is widely used for dichotomous outcomes, in particular to carry out adjusted analysis.
Lost to follow-up
Study subject(s) who cannot or do not complete participation in a study for whatever reason. See also censored.
Mean
An average value, calculated by adding all the observations and dividing by the number of observations. (Also called arithmetic mean.)
Median
A measure of central tendency. The simplest division of a set of measurements is into two parts: the lower and the upper half. The point on the scale that divides the group in this way is called the "median".
Meta-analysis
A statistical synthesis of the data from separate but similar, i.e., comparable, studies, leading to a quantitative summary of the pooled results. In the biomedical sciences, the systematic, organized, and structured evaluation of a problem of interest, using information (commonly in the form of statistical tables or other data) from a number of independent studies of the problem. A frequent application has been the pooling of results from a set of randomized controlled trials, none in itself necessarily powerful enough to demonstrate statistically significant differences, but in aggregate capable of so doing. Meta-analysis has a qualitative component, i.e., application of predetermined criteria of quality (e.g., completeness of data, absence of biases), and a quantitative component, i.e., integration of the numerical information. The aim is to integrate the findings, pool the data, and identify the overall trend of results. An essential prerequisite is that the studies must stand up to critical appraisal, and various biases, e.g., publication bias, must be allowed for.
Morbidity
Any departure, subjective or objective, from a state of physiological or psychological well-being. In this sense sickness, illness, and morbid condition are similarly defined and synonymous.
Mortality Death.
Multiplicative model
A model in which the joint effect of two or more causes is the product of their individual effects. For instance, if factor a multiplies risk by the amount a in the absence of factor b, and factor b multiplies risk by the amount b in the absence of factor a, the combined effect of factors a and b on risk is a x b. See also additive model.
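The contrast between the multiplicative and the additive model can be made concrete with a small Python sketch. The baseline risk and the two relative risks are made-up numbers, and both function names are illustrative assumptions.

```python
def joint_risk_multiplicative(baseline, rr_a, rr_b):
    """Joint risk when each factor multiplies the baseline risk."""
    return baseline * rr_a * rr_b


def joint_risk_additive(baseline, rr_a, rr_b):
    """Joint risk when the excess risks of the two factors simply add."""
    excess_a = baseline * (rr_a - 1)
    excess_b = baseline * (rr_b - 1)
    return baseline + excess_a + excess_b


# Hypothetical numbers: baseline risk 1%, factor a triples risk, factor b doubles it.
print(joint_risk_multiplicative(0.01, 3, 2))  # about 0.06
print(joint_risk_additive(0.01, 3, 2))        # about 0.04
```

With the same individual effects, the two models predict different joint risks, which is why the choice of scale matters when assessing interaction.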
Multivariate analysis
A set of techniques used when the variation in several variables has to be studied simultaneously. In statistics, any analytic method that allows the simultaneous study of two or more dependent variables.
Negative predictive value
[In screening/diagnostic tests:] A measure of the usefulness of a screening/diagnostic test. It is the proportion of those with a negative test result who do not have the disease, and can be interpreted as the probability that a negative test result is correct. It is calculated as follows: NPV = Number with a negative test who do not have disease/Number with a negative test.
NNH See number needed to treat to harm.
NNT See number needed to treat to benefit.
NNTb See number needed to treat to benefit.
NNTh See number needed to treat to harm.
Non-randomised study
Any quantitative study estimating the effectiveness of an intervention (harm or benefit) that does not use randomisation to allocate units to comparison groups (including studies where 'allocation' occurs in the course of usual treatment decisions or people's choices, i.e. studies usually called 'observational'). To avoid ambiguity, the term should be substantiated using a description of the type of question being addressed. For example, a 'non-randomised intervention study' is typically a comparative study of an experimental intervention against some control intervention (or no intervention) that is not a randomised controlled trial. There are many possible types of non-randomised intervention study, including cohort studies, case-control studies, controlled before-and-after studies, interrupted-time-series studies and controlled trials that do not use appropriate randomisation strategies (sometimes called quasi-randomised studies).
Normal distribution
(Syn: Gaussian distribution) The continuous frequency distribution of infinite range represented by the equation
$$f(x) = \frac{1}{(2\pi\sigma^{2})^{1/2}} \, e^{-(x-\mu)^{2}/2\sigma^{2}}$$
where x is the abscissa, f(x) is the ordinate, µ is the mean, σ is the standard deviation, and e is the base of the natural logarithm (approximately 2.718). All possible values of the variable are displayed on the horizontal axis. The frequency (probability) of each value is displayed on the vertical axis, producing the graph of the normal distribution.
The properties of a normal distribution include the following: (1) It is a continuous, symmetrical distribution; both tails extend to infinity; (2) the arithmetic mean, mode, and median are identical; and (3) its shape is completely determined by the mean and standard deviation.
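As a quick check of the density formula in this entry, here is a minimal Python sketch that evaluates f(x) directly and compares it with the standard library's `statistics.NormalDist`. The function name `normal_pdf` and the example mean and standard deviation are illustrative assumptions.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    coefficient = 1 / math.sqrt(2 * math.pi * sigma ** 2)
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coefficient * math.exp(exponent)


mu, sigma = 100.0, 15.0  # hypothetical mean and standard deviation
for x in (70.0, 100.0, 130.0):
    # The hand-written formula and the library value should agree.
    print(x, normal_pdf(x, mu, sigma), NormalDist(mu, sigma).pdf(x))
```

The values at 70 and 130 coincide, which reflects the symmetry about the mean listed among the properties above.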
Null hypothesis
(Syn: test hypothesis) The statistical hypothesis that one variable has no association with another variable or set of variables, or that two or more population distributions do not differ from one another. In simplest terms, the null hypothesis states that the results observed in a study, experiment, or test are no different from what might have occurred as a result of the operation of chance alone.
Number needed to harm See number needed to treat to harm.
Number needed to treat See number needed to treat to benefit.
Number needed to treat to benefit
An estimate of how many people need to receive a treatment before one person would experience a beneficial outcome. For example, if you need to give a stroke prevention drug to 20 people before one stroke is prevented, then the number needed to treat to benefit for that stroke prevention drug is 20. The NNTb is estimated as the reciprocal of the absolute risk difference. (Also called NNT, NNTB, number needed to treat.)
Number needed to treat to harm
A number needed to treat to benefit associated with a harmful effect. It is an estimate of how many people need to receive a treatment before one more person would experience a harmful outcome or one fewer person would experience a beneficial outcome. (Also called NNH, NNTH, number needed to harm.)
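The reciprocal relationship between the absolute risk difference and the number needed to treat can be sketched as follows. This is a minimal Python illustration that assumes the outcome is an adverse event (such as stroke); the risks are made-up numbers and the function name is hypothetical.

```python
def number_needed_to_treat(control_risk, treated_risk):
    """Return (NNT, kind), where kind is 'benefit' or 'harm'.

    Assumes the outcome is an adverse event, so a lower risk in the
    treated group counts as a benefit.
    """
    risk_difference = control_risk - treated_risk
    if risk_difference == 0:
        raise ValueError("no risk difference: the NNT is undefined")
    kind = "benefit" if risk_difference > 0 else "harm"
    return 1 / abs(risk_difference), kind


# Hypothetical stroke risks: 10% without the drug, 5% with it.
nnt, kind = number_needed_to_treat(0.10, 0.05)
print(round(nnt), kind)   # about 20 people treated per stroke prevented

# Hypothetical side effect: 2% without the drug, 4% with it.
nnh, kind_harm = number_needed_to_treat(0.02, 0.04)
print(round(nnh), kind_harm)
```

In practice the result is usually rounded up to the next whole person, and a confidence interval is reported alongside it.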
Observational study
(Syn: non-experimental study) Epidemiologic study that does not involve any intervention, experimental or otherwise. Such a study may be one in which nature is allowed to take its course, with changes in one characteristic being studied in relation to changes in other characteristics. Analytic epidemiologic methods, such as case-control and cohort study designs, are properly called observational epidemiology because the investigator is observing without intervention other than to record, classify, count, and statistically analyze results.
Odds
The ratio of the probability of occurrence of an event to that of nonoccurrence, or the ratio of the probability that something is so to the probability that it is not so. If 60 smokers develop a chronic cough and 40 do not, the odds among these 100 smokers in favour of developing a cough are 60:40, or 1.5; this may be contrasted with the probability that these smokers will develop a cough, which is 60:100 or 0.6.
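The smoker example above translates directly into code. The following Python sketch uses the counts from that example; the function names are illustrative assumptions.

```python
def odds_from_counts(events, non_events):
    """Odds = number with the event / number without the event."""
    return events / non_events


def probability_from_counts(events, non_events):
    """Probability = number with the event / total number."""
    return events / (events + non_events)


def odds_to_probability(odds):
    """Convert odds back to a probability."""
    return odds / (1 + odds)


# 60 smokers develop a chronic cough and 40 do not.
odds = odds_from_counts(60, 40)
probability = probability_from_counts(60, 40)
print(odds, probability)            # odds of 1.5 versus a probability of 0.6
print(odds_to_probability(odds))    # recovers the probability from the odds
```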
Odds ratio
(Syn: cross-product ratio, relative odds) The ratio of two odds. The term odds is defined differently according to the situation under discussion. Consider the following notation for the distribution of a binary exposure and a disease in a population or a sample.
           | Exposed | Unexposed
Disease    | a       | b
No disease | c       | d
The odds ratio (cross-product ratio) is ad/bc.
The exposure-odds ratio for a set of case-control data is the ratio of the odds in favour of exposure among the cases (a/b) to the odds in favour of exposure among noncases (c/d). This reduces to ad/bc. With incident cases, unbiased subject selection, and a "rare" disease (say, under 2% cumulative incidence rate over the study period), ad/bc is an approximate estimate of the risk ratio. With incident cases, unbiased subject selection, and density sampling of controls, ad/bc is an estimate of the ratio of the person-time incidence rates (force of morbidity) in the exposed and unexposed (no rarity assumption is required for this).
The disease-odds ratio for a cohort or cross-sectional study is the ratio of the odds in favour of disease among the exposed (a/c) to the odds in favour of disease among the unexposed (b/d). This reduces to ad/bc and hence is equal to the exposure-odds ratio for the cohort or cross section.
The prevalence-odds ratio refers to an odds ratio derived cross-sectionally, as, for example, an odds ratio derived from studies of prevalent (rather than incident) cases.
The risk-odds ratio is the ratio of the odds in favour of getting disease, if exposed, to the odds in favour of getting disease if not exposed. The odds ratio derived from a cohort study is an estimate of this. See also case-control study.
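To show how ad/bc behaves relative to the risk ratio, here is a minimal Python sketch using the a, b, c, d notation of this entry (a and b are diseased, c and d are not; a and c are exposed, b and d are unexposed). The two cohorts are made-up numbers and the function names are illustrative assumptions.

```python
def odds_ratio(a, b, c, d):
    """Cross-product ratio ad/bc."""
    return (a * d) / (b * c)


def risk_ratio(a, b, c, d):
    """Risk in the exposed, a/(a+c), divided by risk in the unexposed, b/(b+d)."""
    return (a / (a + c)) / (b / (b + d))


# Hypothetical cohort with a rare disease: risks of 2% and 1%.
rare = (20, 10, 980, 990)
print(odds_ratio(*rare), risk_ratio(*rare))      # OR is close to the RR of 2

# Hypothetical cohort with a common disease: risks of 60% and 30%.
common = (600, 300, 400, 700)
print(odds_ratio(*common), risk_ratio(*common))  # OR of 3.5 overstates the RR of 2
```

The first case illustrates the "rare disease" approximation described above; the second shows why the approximation fails for common outcomes.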
OR See odds ratio.
Ordinal data
Data that are classified into more than two categories which have a natural order; for example, non-smokers, ex-smokers, light smokers and heavy smokers. Ordinal data are often reduced to two categories to simplify analysis and presentation, which may result in a considerable loss of information.
Outcomes
All the possible results that may stem from exposure to a causal factor, or from preventive or therapeutic interventions; all identified changes in health status arising as a consequence of the handling of a health problem. See also causal effect.
Paired design
A study in which participants or groups of participants are matched (e.g. based on prognostic factors). One member of each pair is then allocated to the experimental (intervention) group and the other to the control group.
Parameter
In mathematics, a constant in a formula or model; in statistics and epidemiology, a measurable characteristic of a population that is often estimated by a statistic, e.g., mean, standard deviation, regression coefficients.
Person-time
A measurement combining persons and time as the denominator in incidence and mortality rates when, for varying periods, individual subjects are at risk of developing disease or dying. It is the sum of the periods of time at risk for each of the subjects. The most widely used measure is person-years. With this approach, each subject contributes only as many years of observation to the population at risk as the period over which that subject has been observed; a subject observed over one year contributes 1 person-year, a subject observed over a 10-year period contributes 10 person-years. This method can be used to measure incidence rate over extended and variable time periods.
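The person-years bookkeeping can be sketched in Python as follows. The follow-up times and case count are made-up numbers, and the function name `incidence_rate` and its `per` parameter are illustrative assumptions.

```python
def incidence_rate(cases, follow_up_years, per=1000):
    """Cases per `per` person-years, summing each subject's time at risk."""
    person_years = sum(follow_up_years)
    if person_years <= 0:
        raise ValueError("total person-time must be positive")
    return cases / person_years * per


# Four hypothetical subjects observed for 1, 10, 4.5 and 4.5 years:
# together they contribute 20 person-years.
follow_up = [1, 10, 4.5, 4.5]
print(sum(follow_up))                 # total person-years
print(incidence_rate(2, follow_up))   # 2 cases in 20 person-years = 100 per 1000
```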
Person-years See person-time.
Phase I, Phase II, Phase III trial See clinical trial.
Placebo
An inert medication or procedure, i.e., one having no pharmacological effect, but that is intended to give patients the perception that they are receiving treatment for their complaint.
Population
1. All the inhabitants of a given country or area considered together; the number of inhabitants of a given country or area.
2. (In sampling) The whole collection of units (the "universe") from which a sample may be drawn; not necessarily a population of persons, the units may be institutions, records, or events. The sample is intended to give results that are representative of the whole population.
Positive predictive value
[In screening/diagnostic tests:] A measure of the usefulness of a screening/diagnostic test. It is the proportion of those with a positive test result who have the disease, and can be interpreted as the probability that a positive test result is correct. It is calculated as follows: PPV = Number with a positive test who have disease/Number with a positive test.
[In trial searching:] See precision.
Positive study
A study with results indicating a beneficial effect of the intervention being studied. The term can generate confusion because it can refer to both statistical significance and the direction of effect; studies often have multiple outcomes; the criteria for classifying studies as negative or positive are not always clear; and, in the case of studies of risk or undesirable effects, 'positive' studies are ones that show a harmful effect.
Power
The ability of a study to demonstrate an association if one exists. The power of a study is determined by several factors, including the frequency of the condition under study, the magnitude of the effect, the study design, and sample size. Mathematically, power is 1 − β, where β is the probability of a type II error. A characteristic of a statistical hypothesis test, denoting the probability that the null hypothesis will be rejected if it is indeed false. Resolving power is the comparable property of individual measurements.
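How effect size and sample size drive power can be illustrated with a deliberately simple case. The sketch below assumes a two-sided one-sample z-test with a known standard deviation, which is a textbook simplification rather than anything prescribed by this glossary; the function names and numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect, sd, n, alpha=0.05):
    """Power of a two-sided one-sample z-test with known standard deviation."""
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha / 2)
    # Under the alternative, the test statistic is shifted by this amount.
    shift = effect * sqrt(n) / sd
    # Probability of landing in either rejection region.
    return z.cdf(-critical + shift) + z.cdf(-critical - shift)


def type_ii_error(effect, sd, n, alpha=0.05):
    """Probability of failing to reject a false null hypothesis."""
    return 1 - z_test_power(effect, sd, n, alpha)


# Hypothetical effect of half a standard deviation.
for n in (16, 32, 64):
    print(n, round(z_test_power(0.5, 1.0, n), 3))
```

When the true effect is zero, the function returns the significance level itself, since every rejection is then a type I error.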
Precision
1. The quality of being sharply defined or stated. One measure of precision is the number of distinguishable alternatives from which a measurement was selected, sometimes indicated by the number of significant digits in the measurement. Another measure of precision is the standard error of measurement, the standard deviation of a series of replicate determinations of the same quantity. Precision does not imply accuracy.
2. In statistics, precision is defined as the inverse of the variance of a measurement or estimate.
Prevalence
The number of events, e.g., instances of a given disease or other condition, in a given population at a designated time; sometimes used to mean prevalence rate. When used without qualification, the term usually refers to the situation at a specified point in time (point prevalence). Note that this is a number, not a rate.
Prevalence, annual The total number of persons with the disease or attribute at any time during a year. It includes cases of the disease arising before but extending into or through the year as well as those having their inception during the year.
Lifetime prevalence The total number of persons known to have had the disease or attribute for at least part of their lives.
Period prevalence The total number of persons known to have had the disease or attribute at any time during a specified period.
Point prevalence The number of persons with a disease or an attribute at a specified point in time.
Primary outcome The outcome of greatest importance.
Probability
1. The limit of the relative frequency of an event in a sequence of N random trials as N approaches infinity, i.e.,
$$\lim_{N \to \infty} \frac{\text{Number of occurrences of the event}}{N}$$
2. A measure, ranging from zero to 1, of the degree of belief in a hypothesis or statement.
Proportional hazards model
(Syn: Cox model) A statistical model in survival analysis developed by D. R. Cox in 1972 asserting that the effect of the study factors on the hazard rate in the study population is multiplicative and does not change over time. For example, the model for two factors $x_1$ and $x_2$ asserts that the rate at time t, $\lambda(t)$, is given by
$$\lambda(t) = e^{\beta_1 x_1 + \beta_2 x_2} \, \lambda_0(t)$$
where $\lambda_0(t)$ is the rate when $x_1 = x_2 = 0$ and e is the base of the natural logarithm.
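The "multiplicative and does not change over time" property can be seen in a small Python sketch of the two-factor model. The coefficients and baseline rates are made-up numbers, and the function name `hazard` is an illustrative assumption.

```python
import math


def hazard(baseline_hazard, betas, xs):
    """Rate at one time point: the baseline rate times e^(sum of beta_i * x_i)."""
    linear_predictor = sum(b * x for b, x in zip(betas, xs))
    return baseline_hazard * math.exp(linear_predictor)


betas = (0.7, -0.2)  # hypothetical coefficients for x1 and x2

# Whatever the baseline rate is at a given time, switching x1 from 0 to 1
# multiplies the rate by the same factor, e^0.7.
for baseline in (0.01, 0.05, 0.20):
    ratio = hazard(baseline, betas, (1, 0)) / hazard(baseline, betas, (0, 0))
    print(baseline, round(ratio, 4))
```

The constant ratio is the hazard ratio for x1; it equals e raised to the power of the first coefficient.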
Prospective study See cohort study.
Publication bias
Tendency of editors (and authors) to publish articles containing positive findings, especially "new" results, in contrast to reports that do not yield "significant" results, i.e., results that accord with previously published findings. Publication bias can distort the general belief, e.g., about associations, efficacy of regimens. It can be a particularly important source of bias in meta-analysis.
p Value
The probability that a test statistic would be as extreme as or more extreme than observed if the null hypothesis were true. The letter P, followed by the abbreviation n.s. (not significant) or by the symbol < (less than) or > (greater than) and a decimal notation, such as 0.01, 0.05, is a statement of the probability that the difference observed could have occurred by chance if the groups were really alike, i.e., under the null hypothesis.
Investigators may arbitrarily set their own significance levels, but in most biomedical and epidemiologic work, a study result whose probability value is less than 5% (P < 0.05) or 1% (P < 0.01) is considered sufficiently unlikely to have occurred by chance to justify the designation "statistically significant." See also statistical significance.
Random
Governed by chance; not completely determined by other factors. As opposed to deterministic.
Random error
Error due to the play of chance. Confidence intervals and P values allow for the existence of random error, but not systematic errors (bias).
Random sample
A sample that is arrived at by selecting sample units such that each possible unit has a fixed and determinate probability of selection.
Randomisation
The process of randomly allocating participants into one of the arms of a controlled trial. There are two components to randomisation: the generation of a random sequence, and its implementation, ideally in a way so that those entering participants into a study are not aware of the sequence (concealment of allocation).
Randomized controlled trial (RCT)
An epidemiologic experiment in which subjects in a population are randomly allocated into groups, usually called study and control groups, to receive or not to receive an experimental preventive or therapeutic procedure, manoeuvre, or intervention. The results are assessed by rigorous comparison of rates of disease, death, recovery, or other appropriate outcome in the study and control groups. Randomized controlled trials are generally regarded as the most scientifically rigorous method of hypothesis testing available in epidemiology.
Rate
A measure of the frequency of occurrence of a phenomenon. In epidemiology, demography, and vital statistics, a rate is an expression of the frequency with which an event occurs in a defined population in a specified period of time. The use of rates rather than raw numbers is essential for comparison of experience between populations at different times, different places, or among different classes of persons. The components of a rate are the numerator, the denominator, the specified time in which events occur, and usually a multiplier, a power of 10, that converts the rate from an awkward fraction or decimal to a whole number:
In vital statistics,
$$\text{Rate} = \frac{\text{Number of events in specified period}}{\text{Average population during the period}} \times 10^{n}$$
where $10^{n}$ is the multiplier described above. In epidemiology, the denominator is usually person-time.
All rates are ratios, calculated by dividing a numerator, e.g., the number of deaths, or newly occurring cases of a disease in a given period, by a denominator, e.g., the average population during that period. Some rates are proportions, i.e., the numerator is contained within the denominator. Rate has several different usages in epidemiology.
1. As a synonym for ratio, it refers to proportions as rates, as in the terms cumulative incidence rate, prevalence rate, survival rate.
2. In other situations, rate refers only to ratios representing relative changes (actual or potential) in two quantities. This accords with the OED, which gives "relative amount of variation" among its definitions for rate.
3. Sometimes rate is further restricted to refer only to ratios representing changes over time. In this usage, prevalence rate would not be a "true" rate because it cannot be expressed in relation to units of time but only to a "point" in time; in contrast, the force of mortality or force of morbidity (hazard rate) is a "true" rate, for it can be expressed as the number of cases developing per unit time divided by the total size of the population at risk.
Recall bias
Systematic error due to differences in accuracy or completeness of recall to memory of past events or experiences. For example, a mother whose child has died of leukemia is more likely than the mother of a healthy living child to remember details of such past experiences as use of x-ray services when the child was in utero.
Reference population
The standard against which a population that is being studied can be compared.
Regression analysis
Given data on a dependent variable y and one or more independent variables x1, x2 etc., regression analysis involves finding the "best" mathematical model (within some restricted class of models) to describe y as a function of the x's, or to predict y from the x's. The most common form is a linear model; in epidemiology, the logistic and proportional hazards models are also common.
Relative risk
1. The ratio of the risk of disease or death among the exposed to the risk among the unexposed; this usage is synonymous with risk ratio.
2. Alternatively, the ratio of the cumulative incidence rate in the exposed to the cumulative incidence rate in the unexposed, i.e., the rate ratio.
3. The term relative risk has also been used synonymously with odds ratio and, in some biostatistical articles, has been used for the ratio of forces of morbidity. The use of the term relative risk for several different quantities arises from the fact that for "rare" diseases (e.g., most cancers) all the quantities approximate one another. For common occurrences (e.g., neonatal mortality in infants under 1500g birth weight), the approximations do not hold. See also odds ratio; risk ratio.
Relative risk reduction
1. An estimate of the number of people spared the consequences of an exposure that has been eliminated or controlled, expressed as a proportion of the number who would have been affected.
2. The amount by which a person's risk of disease is reduced by elimination or control of an exposure to risk.
Reliability
The degree of stability exhibited when a measurement is repeated under identical conditions. Reliability refers to the degree to which the results obtained by a measurement procedure can be replicated. Lack of reliability may arise from divergences between observers or instruments of measurement or instability of the attribute being measured.
Reporting bias
Selective revealing or suppression of information about past medical history, e.g., details of sexual experiences.
Reproducible Able to be done the same way elsewhere.
Retrospective study
A research design that is used to test etiologic hypotheses in which inferences about exposure to the putative causal factor(s) are derived from data relating to characteristics of the persons under study or to events or experiences in their past. The essential feature is that some of the persons under study have the disease or other outcome condition of interest, and their characteristics and past experiences are compared with those of other, unaffected persons. Persons who differ in the severity of the disease may also be compared. There is disagreement among epidemiologists as to the desirability of using the term retrospective study rather than case-control study to describe this method. See also case-control study.
Risk
The probability that an event will occur, e.g., that an individual will become ill or die within a stated period of time or by a certain age. Also, a nontechnical term encompassing a variety of measures of the probability of a (generally) unfavourable outcome. See also probability.
Risk difference (Syn: excess risk) The absolute difference between two risks.
Risk factor
An aspect of personal behaviour or lifestyle, an environmental exposure, or an inborn or inherited characteristic, that, on the basis of epidemiologic evidence, is known to be associated with health-related condition(s) considered important to prevent. The term risk factor is rather loosely used, with any of the following meanings:
1. An attribute or exposure that is associated with an increased probability of a specified outcome, such as the occurrence of a disease. Not necessarily a causal factor. A risk marker.
2. An attribute or exposure that increases the probability of occurrence of disease or other specified outcome. A determinant.
3. A determinant that can be modified by intervention, thereby reducing the probability of occurrence of disease or other specified outcomes. To avoid confusion, it may be referred to as a modifiable risk factor.
Risk ratio The ratio of two risks, usually the risk in the exposed divided by the risk in the unexposed.
SE See standard error.
Secondary outcome
An outcome used to evaluate additional effects of the intervention deemed a priori as being less important than the primary outcomes.
Selection bias
Error due to systematic differences in characteristics between those who take part in a study and those who do not. Examples include subjects in a survey limited to volunteers or persons present in a particular place at a particular time, or hospital cases under the care of a physician, excluding those who die before admission to hospital because the course of their disease is so acute, those not sick enough to require hospital care, or those excluded by cost, distance, or other factors. Selection bias invalidates conclusions and generalizations that might otherwise be drawn from such studies. It is a common and commonly overlooked problem.
Sensitivity and specificity (of a screening test)
Sensitivity is the proportion of truly diseased persons in the screened population who are identified as diseased by the screening test. Sensitivity is a measure of the probability of correctly diagnosing a case, or the probability that any given case will be identified by the test (syn: true positive rate).
Specificity is the proportion of truly nondiseased persons who are so identified by the screening test. It is a measure of the probability of correctly identifying a nondiseased person with a screening test (syn: true negative rate). The relationships are shown in the following fourfold table, in which the letters a, b, c, and d represent the quantities specified below the table.
                       | True status             |
Screening test results | Diseased | Not diseased | Total
Positive               | a        | b            | a + b
Negative               | c        | d            | c + d
Total                  | a + c    | b + d        | a + b + c + d
a. Diseased individuals detectable by the test (true positives)
b. Nondiseased individuals positive by the test (false positives)
c. Diseased individuals not detectable by the test (false negatives)
d. Nondiseased individuals negative by the test (true negatives)
Sensitivity = a/(a + c)
Specificity = d/(b + d)
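The fourfold table maps directly onto code. The sketch below uses the a, b, c, d notation of this entry and also derives the positive and negative predictive values defined elsewhere in this glossary. The counts are made-up numbers and the function name `screening_measures` is an illustrative assumption.

```python
def screening_measures(a, b, c, d):
    """Summary measures from a fourfold screening table.

    a: true positives, b: false positives,
    c: false negatives, d: true negatives.
    """
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "positive_predictive_value": a / (a + b),
        "negative_predictive_value": d / (c + d),
    }


# Hypothetical screening programme: 100 diseased and 900 non-diseased people.
measures = screening_measures(a=90, b=50, c=10, d=850)
for name, value in measures.items():
    print(name, round(value, 3))
```

Sensitivity and specificity are computed down the columns of the table (by true status), whereas the predictive values are computed across the rows (by test result), so only the latter depend on how common the disease is in the screened population.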
Specificity (of a test) See sensitivity and specificity.
Standard deviation
A measure of dispersion, or variation. It is the most widely used measure of dispersion of a frequency distribution. It is equal to the positive square root of the variance. The mean tells where the values for a group are centered. The standard deviation is a summary of how widely dispersed the values are around this centre.
Standard error
The standard deviation of an estimate. Used to calculate confidence intervals.
Statistical significance
Statistical methods allow an estimate to be made of the probability of the observed or greater degree of association between independent and dependent variables under the null hypothesis. From this estimate, in a sample of given size, the statistical "significance" of a result can be stated. Usually the level of statistical significance is stated by the p value.
Stratification
The process of or result of separating a sample into several subsamples according to specified criteria, such as age groups, socioeconomic status, etc. The effect of confounding variables may be controlled by stratifying the analysis of results. For example, lung cancer is known to be associated with smoking. To examine the possible association between urban atmospheric pollution and lung cancer, controlling for smoking, the population may be divided into strata according to smoking status. The association between air pollution and cancer can then be appraised separately within each stratum. Stratification is used not only to control for confounding effects but also as a way of detecting modifying effects. In this example, stratification makes it possible to examine the effect of smoking on the association between atmospheric pollution and lung cancer.
Student's t-test See t-distribution, t-test.
Surrogate endpoints
Outcome measures that are not of direct practical importance but are believed to reflect outcomes that are important; for example, blood pressure is not directly important to patients but it is often used as an outcome in clinical trials because it is a risk factor for stroke and heart attacks. Surrogate endpoints are often physiological or biochemical markers that can be relatively quickly and easily measured, and that are taken as being predictive of important clinical outcomes. They are often used when observation of clinical outcomes requires long follow-up. (Also called intermediary outcomes, surrogate outcomes.)
Survival analysis
A class of statistical procedures for estimating the survival function and for making inferences about the effects on it of treatments, prognostic factors, exposures, and other covariates. Compare Kaplan-Meier estimate.
t-distribution, t-test
The t-distribution is the distribution of a quotient of independent random variables, the numerator of which is a standardized normal variate and the denominator of which is the positive square root of the quotient of a chi-square distributed variate and its number of degrees of freedom. The t-test uses a statistic that, under the null hypothesis, has the t-distribution to test whether two means differ significantly, or to test linear regression or correlation coefficients.
Test of association
A statistical test to assess whether the value of one variable is associated with (i.e. varies with) the value of another variable, or whether the presence or absence of a factor is more likely when a particular outcome is present. See also correlation.
Time to event
A description of the data in studies where the analysis relates not just to whether an event occurs but also when. Such data are analysed using survival analysis. (Also called survival data.)
True positive rate See sensitivity.
2x2 table
A contingency table with two rows and two columns. It arises in clinical trials that compare dichotomous outcomes, such as death, for an intervention and control group or two intervention groups.
Type I error See error.
Type II error See error.
Utility
The value of a particular health state, usually expressed on a scale from 0 to 1; it is used in defining QALYs and health-adjusted life expectancy. Utility is determined from preferences expressed by individuals in a von Neumann-Morgenstern standard gamble, time trade-off, or other related technique.
Validation The process of establishing that a method is sound.
Validity
The degree to which a result (of a measurement or study) is likely to be true and free of bias (systematic errors). Validity has several other meanings, usually accompanied by a qualifying word or phrase; for example, in the context of measurement, expressions such as 'construct validity', 'content validity' and 'criterion validity' are used. See also external validity, internal validity.
Variable
Any quantity that varies. Any attribute, phenomenon, or event that can have different values.
Variance
A measure of the variation shown by a set of observations, defined by the sum of the squares of deviation from the mean, divided by the number of degrees of freedom in the set of observations.
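The definitions of variance, standard deviation, and standard error fit together as in the following Python sketch. It assumes a sample of n observations, so the number of degrees of freedom is taken to be n − 1, and it assumes the estimate of interest is the sample mean; the data and function names are made up for illustration.

```python
from math import sqrt


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed")
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def standard_deviation(values):
    """Positive square root of the variance."""
    return sqrt(sample_variance(values))


def standard_error_of_mean(values):
    """Standard deviation of the sample mean as an estimate."""
    return standard_deviation(values) / sqrt(len(values))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements, mean 5
print(sample_variance(data))         # 32 / 7
print(standard_deviation(data))
print(standard_error_of_mean(data))
```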
Sources
 A dictionary of epidemiology, edited by John M. Last (4th edition). Oxford University Press, 2001
 http://www.cochrane.org/resources/glossary.htm (March 2005)