
Finding and Using Health Statistics


Confidence Intervals

Confidence intervals are frequently reported in the scientific literature. Based on statistical theory, they indicate how precisely a research result estimates the true value in the population. A confidence interval uses the sample to estimate a range of plausible values for a population parameter, such as a mean or a rate.

For example, suppose a study reports a 95% confidence interval of 47 to 53. This means that if researchers did the same study over and over again with new samples from the whole population, and calculated an interval each time, about 95% of those intervals would contain the true population value. The reliability in this example refers to the consistency of the measurement, or the ability to repeat it. Poor reliability, which shows up as a wide interval, can happen with a small population or sample, or if the health event being studied does not happen often or at regular times.
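The repeated-sampling idea can be illustrated with a small simulation. The sketch below is not from the source: the population mean (50), standard deviation (15), sample size (100), and the normal-approximation multiplier 1.96 are all assumptions chosen so that a typical interval lands near 47 to 53. It draws many samples, builds an approximate 95% interval for the mean from each one, and reports the share of intervals that contain the true mean.

```python
import math
import random
import statistics


def mean_ci_95(sample):
    """Approximate 95% confidence interval for a mean.

    Uses the normal critical value 1.96, which is an assumption that
    works reasonably well for samples of a few dozen or more.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return mean - 1.96 * standard_error, mean + 1.96 * standard_error


def coverage(true_mean=50.0, true_sd=15.0, n=100, repeats=2000, seed=0):
    """Share of simulated studies whose interval contains true_mean.

    All parameter values are hypothetical and exist only for illustration.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        low, high = mean_ci_95(sample)
        if low <= true_mean <= high:
            hits += 1
    return hits / repeats


if __name__ == "__main__":
    print(f"Share of intervals containing the true mean: {coverage():.3f}")
```

Under these assumptions the printed share should be close to 0.95. Rerunning with a smaller n gives wider intervals, which is the poor reliability that small samples produce.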


Correlation

Correlation is a statistical measure of the extent to which two variables relate to one another. The terms association and correlation are often used interchangeably. One commonly used measure of the linear correlation between two variables is Pearson’s correlation coefficient (denoted by the symbol ρ for a population, or the letter r for a sample). Given the values of two variables for a set of observations (X usually denotes the independent variable and Y the dependent variable), Pearson’s correlation coefficient can be calculated using a mathematical formula. Because of the way this formula is constructed, the value of the coefficient always lies between -1 and 1.
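The text does not spell out the formula. The standard sample version, for n paired observations (x_i, y_i) with sample means x̄ and ȳ, is usually written as follows (the symbols n, x_i, y_i, x̄ and ȳ are notation introduced here, not taken from the source):

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator measures how X and Y vary together, and the denominator rescales it by how much each varies on its own, which is what keeps r between -1 and 1.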

If r > 0 (positive correlation coefficient), then X and Y are positively correlated. In other words, larger values of X tend to correspond to larger values of Y, and smaller values of X to smaller values of Y. If r < 0 (negative correlation coefficient), then X and Y are negatively correlated. In other words, larger values of X tend to correspond to smaller values of Y, and vice versa. If r = 0, then there is no linear relationship between the variables, although a nonlinear relationship may still exist.
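As a minimal sketch of these three cases, the following Python function computes the sample Pearson coefficient directly from its usual definition. The function name `pearson_r` and the small data sets are made up for illustration. The third data set is a symmetric curve, chosen to show that a coefficient of zero only rules out a linear relationship.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient for paired observations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical example data.
xs = [-2, -1, 0, 1, 2]
rising = [2 * x + 1 for x in xs]   # Y rises with X in a straight line
falling = [-x for x in xs]         # Y falls as X rises in a straight line
curved = [x * x for x in xs]       # Y depends on X, but not linearly

if __name__ == "__main__":
    print("rising: ", pearson_r(xs, rising))
    print("falling:", pearson_r(xs, falling))
    print("curved: ", pearson_r(xs, curved))
```

The first two data sets lie exactly on straight lines, so r is 1 and -1 respectively. For the curved data r is 0 even though Y is completely determined by X.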


Causation

Often in the health sciences, finding a correlation between two variables is not enough. Investigators are interested in whether an exposure causes a particular health outcome. Does smoking cause lung cancer? Will taking a particular medication cause a decrease in blood pressure?

The concept of probabilistic causation is used in statistics. Probabilistic causation means that the relationship between the independent variable and the dependent variable (X and Y) is such that X increases the probability of Y when all else is equal. A randomized controlled trial (RCT), in which subjects are randomly assigned to a treatment group or a control group, is one of the study designs most likely to determine a causal relationship, because random assignment tends to balance other factors between the groups. RCTs are sometimes used in clinical testing, but are frequently infeasible or unethical for other types of health and social science research. There are many ways to design research, and there are alternative methods for assessing causal relationships other than RCTs.
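To make "X increases the probability of Y" concrete, the sketch below simulates a hypothetical trial. Everything in it is assumed for illustration: the function name `simulate_trial`, the outcome probabilities (0.45 with treatment, 0.30 without), and the number of subjects. It uses simulated data, not results from any real study. Each subject is assigned to a group by a coin flip, and the observed proportion with the outcome is compared between groups.

```python
import random


def simulate_trial(n=10_000, p_treated=0.45, p_control=0.30, seed=0):
    """Simulate a two-group randomized trial with a yes/no outcome.

    p_treated and p_control are hypothetical true probabilities of the
    outcome Y with and without the treatment X.
    Returns (observed proportion in treated, observed proportion in control).
    """
    rng = random.Random(seed)
    events = {"treated": 0, "control": 0}
    subjects = {"treated": 0, "control": 0}
    for _ in range(n):
        # Random assignment: each subject has an equal chance of either group.
        group = "treated" if rng.random() < 0.5 else "control"
        p = p_treated if group == "treated" else p_control
        subjects[group] += 1
        if rng.random() < p:
            events[group] += 1
    # With n in the thousands both groups are non-empty in practice.
    return (events["treated"] / subjects["treated"],
            events["control"] / subjects["control"])


if __name__ == "__main__":
    treated, control = simulate_trial()
    print(f"Outcome proportion, treated: {treated:.3f}")
    print(f"Outcome proportion, control: {control:.3f}")
    print(f"Difference:                  {treated - control:.3f}")
```

Because assignment here depends only on the coin flip, a clear difference between the two proportions reflects the assumed effect of the treatment rather than some other characteristic of the subjects.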

