Common Data Elements: Standardizing Data Collection

Common Data Elements (CDEs)

Definition of CDE

Before we define a common data element, think about how participants might answer the questions below. Which questions would lead to a more precisely defined answer?


Select the questions that are precisely defined and have a distinct set of permissible responses:

  1. Alcohol Consumption

a) How often do you drink alcohol? ____________ (free text)
b) Select the number of alcoholic drinks you consume per week: 0, 1-2, 3-4, 5-6, 7+

  2. Date of Birth

a) What is your birthday? ________________ (free text)
b) Enter Date of Birth: (MM/DD/YYYY)

Did you notice that some of the questions could produce answers in many different formats? How would researchers combine data from two studies about alcohol consumption if one study allowed free-text responses and the other provided a list of answers to choose from? Similarly, how would researchers harmonize birth dates from two different studies if one allowed free-text responses and the other specified a particular date format? It would be difficult, if not impossible.
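To make the harmonization problem concrete, here is a minimal Python sketch. The function name `parse_structured_dob` and the sample answers are hypothetical and used only for illustration. It shows that answers entered in the MM/DD/YYYY format can be read by a single rule, while free-text answers cannot.

```python
from datetime import date, datetime
from typing import Optional


def parse_structured_dob(value: str) -> Optional[date]:
    """Parse a date of birth entered as MM/DD/YYYY.

    Returns None when the answer does not follow that format.
    """
    try:
        return datetime.strptime(value.strip(), "%m/%d/%Y").date()
    except ValueError:
        return None


# Hypothetical answers to "Enter Date of Birth: (MM/DD/YYYY)"
structured_answers = ["03/05/1980", "11/23/1975"]

# Hypothetical answers to "What is your birthday?" (free text)
free_text_answers = ["March 5th", "5/3/80", "05-03-1980"]

for answer in structured_answers + free_text_answers:
    print(repr(answer), "->", parse_structured_dob(answer))
```

None of the free-text answers can be parsed by the single rule, and an answer such as "5/3/80" is ambiguous even to a human reader: it could mean May 3 or March 5.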

This is where common data elements fit in.

Image: Application form with ballpoint pen, with focus on date of birth and social security number. (Image Source: iStock Photo)

A common data element (CDE) is a standardized, precisely defined question paired with a set of specific allowable responses. It is used systematically across different sites, studies, or clinical trials to ensure consistent data collection.

In other words, common data elements are developed so that data can be collected in the same way across multiple research studies. They are generally structured as a precisely defined question and answer, with the answer having a specified format or set of permissible values. They can be grouped into sets to form questionnaires, surveys, case report forms, or other instruments. Common data elements are defined unambiguously in both human-readable and machine-computable terms.
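As a rough illustration of what "machine-computable" can mean, the sketch below models a CDE as a question paired with its permissible values, using the alcohol consumption question from the exercise above. The class name `CommonDataElement`, its fields, and the element name are assumptions made for this example; they do not reflect the schema of any particular CDE repository.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CommonDataElement:
    """A hypothetical, simplified representation of a CDE."""

    name: str                            # short machine-readable label
    question: str                        # the precisely defined question
    permissible_values: Tuple[str, ...]  # the only allowable responses

    def is_valid(self, response: str) -> bool:
        """Return True if the response is one of the permissible values."""
        return response in self.permissible_values


drinks_per_week = CommonDataElement(
    name="alcohol_drinks_per_week",
    question="Select the number of alcoholic drinks you consume per week:",
    permissible_values=("0", "1-2", "3-4", "5-6", "7+"),
)

print(drinks_per_week.is_valid("3-4"))                   # True
print(drinks_per_week.is_valid("a couple on weekends"))  # False
```

Because the permissible values are part of the definition, every site that uses this element can check responses in the same way at the time of collection.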

The idea is that, if we can agree in the planning stages on what we are collecting and how the data is represented in a system, we can enable easier data sharing and reuse later.

The U.S. Core Data for Interoperability (USCDI) has a list of standardized elements and recommendations to help create consistency across electronic health records, such as:

  • Patient name
  • Date of birth
  • Medication allergies
  • Sex (assigned at birth)
  • Preferred language
  • Immunizations
  • Ethnicity
  • Race
  • Vital signs
  • Smoking status
  • Lab values / results
  • Medications
  • Health concerns
  • Procedures
  • Goals

Consider the USCDI data element Ethnicity. Its permissible values are defined by the Office of Management and Budget (OMB) standard. The permissible values for ethnicity are:

  • Hispanic or Latino
  • Not Hispanic or Latino

You can view this entry in the NIH Common Data Element Repository, which we will cover in more detail later in this course.

Using common data elements contributes to the FAIR data principles we learned earlier. CDEs allow you to find a similar cohort in different data sets. They make data interoperable, which increases statistical power and allows you to compare your data to existing data. CDEs also make research more efficient: reusing the metadata of standard forms, instruments, and tools shortens study start-up time. They also reduce the data validation and quality burden on data repositories, and the harmonization burden on data coordinating centers that would otherwise have to harmonize data after it has been collected.
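The interoperability benefit can be sketched in a few lines of Python. The example assumes two hypothetical studies that both collected ethnicity using the two permissible values listed above. The study data and the function name `pool_responses` are invented for illustration.

```python
from collections import Counter
from typing import Iterable

# Permissible values for the Ethnicity element, as listed above
ETHNICITY_VALUES = ("Hispanic or Latino", "Not Hispanic or Latino")

# Hypothetical responses from two studies that used the same CDE
study_a = ["Hispanic or Latino", "Not Hispanic or Latino", "Not Hispanic or Latino"]
study_b = ["Not Hispanic or Latino", "Hispanic or Latino"]


def pool_responses(*studies: Iterable[str]) -> Counter:
    """Combine responses from several studies into one tally.

    Raises ValueError if any response is not a permissible value.
    """
    pooled: Counter = Counter()
    for responses in studies:
        for response in responses:
            if response not in ETHNICITY_VALUES:
                raise ValueError(f"Not a permissible value: {response!r}")
            pooled[response] += 1
    return pooled


pooled = pool_responses(study_a, study_b)
print(pooled["Hispanic or Latino"])      # 2
print(pooled["Not Hispanic or Latino"])  # 3
```

Because both studies used the same permissible values, the responses can be combined directly, with no recoding or manual review of free text.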

Additional Training

Common data elements are one type of health data standard. There are many types of health data standards that help us structure, organize and exchange health data. To learn more, see the course A Bird's Eye View of Health Data Standards - On Demand.

The Office of the National Coordinator for Health Information Technology (ONC). (n.d.). United States Core Data for Interoperability (USCDI). Improving Healthcare Data Interoperability.