Terminology misuse

Inexperienced researchers are usually well aware of their shortcomings and tend to overuse technical terms to appear more experienced. However, the effect on the reader may be the opposite of what the author expects if the terminology is erroneous or reflects misunderstandings. Misused terminology may also confuse the reader and reduce the report's readability. The guidelines for manuscript preparation from the International Committee of Medical Journal Editors include the recommendation to “Avoid nontechnical uses of technical terms in statistics”. Here is a list of some of the most frequently misused statistical terms in medical research reports.

Case-control study

A cohort study follows groups defined at baseline and compares them with respect to events occurring during follow-up. When one group is labelled cases (subjects exposed to a studied agent or procedure) and another group controls (unexposed subjects), the study design is often incorrectly described as a case-control study. However, a case-control study compares cases, who have or have had a specific disease, with controls, who are free from the disease, with respect to their exposure history.

Incidence and risk

The words risk and incidence are often used as synonyms. However, incidence can be defined either as cumulative incidence or as incidence density. The first definition is the same as risk, i.e. the number of cases developing a disease during a specified follow-up relative to the number of subjects at the start of the follow-up. The second definition has the same numerator, the number of cases developing a disease during a specified follow-up, but a different denominator, the sum of person-time at risk. While the denominator of the cumulative incidence is just the number of persons at the start of follow-up, the denominator of incidence density has to be calculated by summing each individual's time at risk in days, months or years. In large samples, the denominator can be approximated by the average number of persons during follow-up multiplied by the length of follow-up, expressed in the same time unit.
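The difference between the two denominators can be illustrated with a minimal Python sketch. The five-subject cohort below is made up purely for illustration, and the function names are hypothetical rather than taken from any library. Each subject is represented as a pair of time at risk in years and whether the disease developed during follow-up.

```python
# Illustrative, made-up cohort: (years at risk, developed disease during follow-up).
# Time at risk ends at the event, at loss to follow-up, or at the end of the study.
cohort = [
    (5.0, False),
    (2.0, True),
    (5.0, False),
    (3.0, True),
    (5.0, False),
]


def cumulative_incidence(subjects):
    """Cases divided by the number of subjects at the start of follow-up (risk)."""
    cases = sum(1 for _, diseased in subjects if diseased)
    return cases / len(subjects)


def incidence_density(subjects):
    """Cases divided by the summed person-time at risk."""
    cases = sum(1 for _, diseased in subjects if diseased)
    person_time = sum(years for years, _ in subjects)
    return cases / person_time


print(cumulative_incidence(cohort))  # 2 cases / 5 persons = 0.4
print(incidence_density(cohort))     # 2 cases / 20 person-years = 0.1
```

The first result is a proportion over the whole follow-up period, whereas the second is a rate per person-year, so the two numbers are not interchangeable even though they share a numerator.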

Multivariate

A statistical model is multivariate if it is based on a multivariate probability distribution assumption. An ordinary regression model may include multiple explanatory variables. However, as it has only one response variable, it would still be a univariate model, albeit often described as multiple or multivariable. A regression model with multiple response variables would be multivariate. It could be described as a multivariate multivariable model if it also has multiple explanatory variables.

Primary endpoint, efficacy, adverse events

The terminology developed for confirmatory trials is often misused in observational studies, where the terms do not always have their original interpretation. For example, an adverse event is generally defined as any untoward medical occurrence with a temporal but not necessarily causal relation to the studied treatment. While treatment-related adverse events, complications and side effects may be registered in observational studies, information on all temporally related adverse events is usually unavailable. Primary and secondary outcomes play crucial roles in addressing multiplicity issues, but multiplicity issues are not relevant in observational studies. Furthermore, while efficacy can be studied in an experiment, effectiveness can usually only be studied in an observational study.

Quartiles and other quantiles

Quartiles are the three points that divide an ordered dataset into four quarters. The first and third of these points each have one quarter of the dataset on one side and three quarters on the other. The remaining second quartile, also known as the median, has two quarters of the dataset on each side. The quarters are often incorrectly described as quartiles. Similar misunderstandings are common with the terms tertiles and quintiles.
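The distinction between the cut points and the groups they define can be sketched in Python with the standard library. The ages below are made-up illustrative values, and the variable names are hypothetical. Note that several conventions exist for computing quantiles from a sample; `statistics.quantiles` uses its default method here, and other software may return slightly different cut points.

```python
import statistics
from bisect import bisect_left

# Illustrative, made-up ages of eleven subjects.
ages = [23, 31, 35, 38, 42, 47, 51, 56, 60, 68, 74]

# The quartiles: three cut points, not four groups.
quartiles = statistics.quantiles(ages, n=4)
print(len(quartiles))  # 3

# The second quartile coincides with the median.
print(quartiles[1] == statistics.median(ages))  # True

# The four groups defined by the cut points (numbered 1 to 4).
# Assumption: a value equal to a cut point is placed in the lower group.
group_of = {age: bisect_left(quartiles, age) + 1 for age in ages}
print(sorted(set(group_of.values())))  # [1, 2, 3, 4]
```

A phrase such as "subjects in the highest quartile" would therefore be more accurately written as "subjects above the third quartile" or "subjects in the highest quarter".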