### Finding and Using Health Statistics

# Sampling

It is often impossible to study every person in a large population of interest. Instead, researchers study a sample to make estimates about the total population. In statistics, a population is the set of items (people, events, households, institutions, or something else) that is the subject of research and about which a researcher would like to answer a given question. The sample is the set of data collected from the population of interest, or target population. A sample is drawn from a sampling frame, the set of information about the accessible units of the population from which the sample can be selected. Again, these units could be people, events, or other subjects of interest.

The aim of sampling is to approximate the larger population on characteristics relevant to the research question. A sample that is representative in this way allows researchers to make inferences about the larger population. There are many sampling methods, but most fall into two main categories: probability sampling and non-probability sampling.

Probability sampling involves random selection: each person in the group or community has a known, nonzero chance of being chosen (an equal chance, in the case of simple random sampling). According to probability theory, such a sample is more likely to resemble the larger population, so more accurate inferences can be made about that population.
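As an illustration, a simple random sample, the most basic form of probability sampling, can be sketched in a few lines of Python. The sampling frame here is a hypothetical list of 1,000 ID numbers, and the function name `simple_random_sample` is chosen only for this example; real studies typically use more elaborate designs and dedicated survey software.

```python
import random

def simple_random_sample(sampling_frame, n, seed=None):
    """Draw n distinct units so that every unit has the same chance of selection."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(sampling_frame, n)  # sampling without replacement

# Hypothetical sampling frame: ID numbers for 1,000 accessible people.
frame = list(range(1, 1001))

sample = simple_random_sample(frame, n=50, seed=42)
print(f"Selected {len(sample)} of {len(frame)} units")
```

Because the draw is made without replacement, no person can be selected twice, and each person in the frame has the same 50-in-1,000 chance of ending up in the sample.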

Non-probability sampling does not involve random selection, so it cannot rely on probability theory to ensure that the sample is representative of the population of interest. However, many researchers use non-probability sampling because probability sampling is often not practical, feasible, or ethical. Researchers can still employ many purposive methods of non-probability sampling to approximate the population of interest.

Researchers should also be aware of sampling error. Sampling error is the difference between the results from a sample of people drawn from a larger group and the results that would be obtained by studying every single person in that group. In general, the larger the sample size, the smaller the sampling error. Because it is impossible to know the sampling error exactly, it can only be estimated; the estimate is based on a calculation called the standard error, which is derived from the sample's standard deviation and the sample size.
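The relationship between sample size and sampling error can be illustrated with a small simulation. One common way to quantify sampling error for a mean is the standard error, the sample standard deviation divided by the square root of the sample size. The sketch below assumes a hypothetical population of simulated blood pressure readings; the numbers are invented for illustration and do not come from any real survey.

```python
import math
import random
import statistics

def standard_error(values):
    """Estimated standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))

# Hypothetical population: 100,000 simulated systolic blood pressure readings.
rng = random.Random(0)
population = [rng.gauss(120, 15) for _ in range(100_000)]

# Two simple random samples of different sizes from the same population.
small = rng.sample(population, 100)
large = rng.sample(population, 10_000)

se_small = standard_error(small)
se_large = standard_error(large)

print(f"n = 100:    mean = {statistics.mean(small):.1f}, SE = {se_small:.2f}")
print(f"n = 10,000: mean = {statistics.mean(large):.1f}, SE = {se_large:.2f}")
```

Increasing the sample size a hundredfold shrinks the standard error roughly tenfold, since the standard error falls with the square root of the sample size.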

Figure: A diagram showing the connection between a population and a sample. Sampling from the population produces the sample; from the sample, inferences are made about the population.

[1] Kelley, K., Clark, B., Brown, V., and Sitzia, J. Good Practice in the Conduct and Reporting of Survey Research. International Journal for Quality in Health Care, 15(3): 261-266. 2003.