National Information Center on Health Services Research and Health Care Technology (NICHSR)


A challenge for any HTA is to derive substantial findings from scientific evidence drawn from different types of studies of varying quality. Assessors should use a systematic approach to critically appraise the quality of the available studies.

Interpreting evidence requires knowledge of investigative methods and statistics. Assessment groups should include members who are knowledgeable in these areas. Some assessment programs assign content experts and evidence evaluation experts to prepare background papers that present and appraise the available evidence for use by assessment groups. Notwithstanding the expertise required to thoroughly and accurately assess evidence, even a basic understanding of fundamental evidence principles can help decision makers to appreciate the importance to health practice and policy of distinguishing between stronger and weaker evidence.

As suggested by the causal pathway in Box 23, assessors can interpret evidence at multiple levels. Evidence can be interpreted at the level of an individual study, e.g., an RCT pertaining to a particular intervention and outcome. It also can be interpreted at the level of a body of evidence (e.g., set of clinical studies) pertaining to the intervention and outcome. In some instances, evidence can be interpreted for a broader body of evidence for a linked set of interventions as a whole, such as for a screening test linked to results that are linked to one or more treatments with intermediate and long-term outcomes (Harris 2001). For example, the main criteria for judging evidence quality at each of these levels by the US Preventive Services Task Force are shown in Box 24.


Box 23
A General Causal Pathway: Screening Procedure and Alternative Treatments

Source: Adapted from Harris 2001.

Box 24
Evaluating Evidence Quality at Three Levels


Level of Evidence: Criteria for Judging Quality

Individual study

  - Internal validity (a)
  - External validity (b)

Linkage in analytic framework

  - Aggregate internal validity (a)
  - Aggregate external validity (b)

Entire preventive service

  - Quality of the evidence from Stratum 2 for each linkage in the analytic framework
  - Degree to which there is a complete chain of linkages supported by adequate evidence to connect the preventive service to health outcomes
  - Degree to which the complete chain of linkages "fit" together (c)
  - Degree to which the evidence connecting the preventive service and health outcomes is "direct" (d)

(a) Internal validity is the degree to which the study(ies) provides valid evidence for the population and setting in which it was conducted.
(b) External validity is the extent to which the evidence is relevant and generalizable to the population and conditions of typical primary care practice.
(c) "Fit" refers to the degree to which the linkages refer to the same population and conditions. For example, if studies of a screening linkage identify people who are different from those involved in studies of the treatment linkage, the linkages are not supported by evidence that "fits" together.
(d) "Directness" of evidence is inversely proportional to the number of bodies of evidence required to make the connection between the preventive service and health outcomes. Evidence is direct when a single body of evidence makes the connection, and more indirect if two or more bodies of evidence are required.

Source: Harris 2001.

Appraising Individual Studies

Certain attributes of primary studies produce better evidence than others. In general, the following attributes of primary studies can be used to distinguish between stronger and weaker evidence for internal validity (i.e., for accurately representing the causal relationship between an intervention and an outcome in the particular circumstances of a study).

  • Prospective studies are superior to retrospective studies.
  • Experimental study designs are superior to observational study designs.
  • Controlled studies are superior to uncontrolled ones.
  • Contemporaneous (occurring at the same time) control groups are superior to historical control groups.
  • Internal control groups (i.e., managed within the study) are superior to studies with external control groups.
  • Randomized studies are superior to nonrandomized ones.
  • Large studies (i.e., involving enough patients to detect with acceptable confidence levels any true treatment effects) are superior to small studies.
  • Blinded studies (in which patients, and clinicians and data analysts where possible, do not know which intervention is being used) are superior to unblinded studies.
  • Studies that clearly define patient populations, interventions, and outcome measures are superior to those that do not clearly define these parameters.

Basic types of methods for generating new data on the effects of health care technology in humans include the following.

  • Large randomized controlled trial (RCT)
  • Small RCT
  • Nonrandomized trial with contemporaneous controls
  • Nonrandomized trial with historical controls
  • Cohort study
  • Case-control study
  • Cross-sectional study
  • Surveillance (e.g., using databases, registers, or surveys)
  • Series of consecutive cases
  • Single case report (anecdote)

Consistent with the attributes of stronger evidence noted above, these methods are listed in rough order of most to least scientifically rigorous for internal validity. This ordering of methods assumes that each study is properly designed and conducted. This list is representative; there are other variations of these study designs and some investigators use different terminology for certain methods. The demand for studies of higher methodological rigor is increasing among health care technology regulators, payers, providers and other policymakers.

It is not only the basic type of study design (e.g., RCT or case-control study) that affects the quality of the evidence, but also the way in which the study was designed and conducted. There are systematic ways to evaluate the quality of individual studies, and numerous instruments exist for assessing studies of health care interventions, particularly RCTs (Schulz 1995, Jadad 1996). These instruments typically take one of three main forms: component, checklist, and scale assessment (Moher, Jadad 1996); examples are shown in Box 25 and Box 26. Available research indicates that more complex scales do not seem to produce more reliable assessments of the validity or "quality" of a study (Juni 1999).

Box 25
Basic Checklist for Reviewing Reports of Randomized Controlled Trials

Did the trial:

  1. Specify outcome measures (endpoints) prior to the trial?
  2. Provide patient inclusion/exclusion criteria?
  3. Specify α-level for defining statistical significance?
  4. Specify β-level (power) to detect a treatment effect of a given meaningful magnitude?
  5. Make a prior estimate of required sample size (to satisfy levels of α and β)?
  6. Use a proper method for random allocation of patients to treatment and control groups?
  7. Use blinding (where possible):
     a. in the randomization process?
     b. for patients regarding their treatment?
     c. for observers/care givers regarding treatment?
     d. in collecting outcome data?
  8. State the numbers of patients assigned to the respective treatment and control groups?
  9. Clearly describe treatment and control (including placebo where applicable)?
  10. Account for patient compliance with treatments/regimens?
  11. Account for all events used as primary outcomes?
  12. Account for patient withdrawals/losses to follow-up?
  13. Analyze patient withdrawals/losses to follow-up:
     a. by intention-to-treat?
     b. by treatment actually received?
  14. Account for treatment complications/side effects?
  15. Provide test statistics (e.g., F, t, Z, chi-square) and P values for endpoints?
  16. Provide confidence intervals or confidence distributions?
  17. Discuss whether power was sufficient for negative trials?
  18. Interpret retrospective analyses (post hoc examination of subgroups and additional endpoints not identified prior to trial) appropriately?

Source: Goodman 1993.
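Items 3 to 5 of the checklist tie together the significance level (α), the power of the trial (1 − β), and the required sample size. As a rough illustration of how these three quantities interact, the sketch below uses the common normal-approximation formula for comparing two proportions with equal allocation and a two-sided test. The function name, its parameters, and the event rates in the usage note are hypothetical and are not taken from Goodman 1993; a real trial would rely on a statistician and purpose-built software.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control: float, p_treatment: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate patients needed per arm to detect p_control vs. p_treatment.

    Assumes a two-sided test, equal group sizes, and the normal
    approximation to the binomial (an illustrative simplification).
    """
    for value in (p_control, p_treatment, alpha, power):
        if not 0.0 < value < 1.0:
            raise ValueError("rates, alpha, and power must lie strictly between 0 and 1")
    if p_control == p_treatment:
        raise ValueError("the two event rates must differ")

    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)               # power = 1 - beta
    variance = p_control * (1.0 - p_control) + p_treatment * (1.0 - p_treatment)
    n = (z_alpha + z_beta) ** 2 * variance / (p_control - p_treatment) ** 2
    return math.ceil(n)
```

For example, `sample_size_per_group(0.10, 0.08)` estimates the patients needed per arm to detect a drop in an event rate from 10% to 8% (hypothetical figures) at α = 0.05 with 80% power. Raising the power or shrinking the expected difference increases the estimate, which is why small trials often cannot rule out a clinically meaningful effect.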

Box 26
Jadad Instrument to Assess the Quality of RCT Reports

This is not the same as being asked to review a paper. It should not take more than 10 minutes to score a report and there are no right or wrong answers.

Please read the article and try to answer the following questions (see attached instructions):

  1. Was the study described as randomized (this includes the use of words such as randomly, random, and randomization)?
  2. Was the study described as double blind?
  3. Was there a description of withdrawals and dropouts?

Scoring the items:

Either give a score of 1 point for each "yes" or 0 points for each "no." There are no in-between marks.

Give 1 additional point if: For question 1, the method to generate the sequence of randomization was described and it was appropriate (table of random numbers, computer generated, etc.)

and/or: If for question 2, the method of double blinding was described and it was appropriate (identical placebo, active placebo, dummy, etc.)

Deduct 1 point if: For question 1, the method to generate the sequence of randomization was described and it was inappropriate (patients were allocated alternately, or according to date of birth, hospital number, etc.)

and/or: for question 2, the study was described as double blind but the method of blinding was inappropriate (e.g., comparison of tablet vs. injection with no double dummy)

Guidelines for Assessment

  1. Randomization: A method to generate the sequence of randomization will be regarded as appropriate if it allowed each study participant to have the same chance of receiving each intervention and the investigators could not predict which treatment was next. Methods of allocation using date of birth, date of admission, hospital numbers, or alternation should not be regarded as appropriate.
  2. Double blinding: A study must be regarded as double blind if the word "double blind" is used. The method will be regarded as appropriate if it is stated that neither the person doing the assessments nor the study participant could identify the intervention being assessed, or if in the absence of such a statement the use of active placebos, identical placebos, or dummies is mentioned.
  3. Withdrawals and dropouts: Participants who were included in the study but did not complete the observation period or who were not included in the analysis must be described. The number and the reasons for withdrawal in each group must be stated. If there were no withdrawals, it should be stated in the article. If there is no statement on withdrawals, this item must be given no points.

Source: Jadad 1996.
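The scoring rules in Box 26 are mechanical enough to express in a few lines. The following is a minimal sketch of those rules only; the class and field names are hypothetical, and the judgment of whether a method is "appropriate" still rests with the reviewer applying the guidelines above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JadadAnswers:
    """A reviewer's answers for one RCT report (field names are illustrative)."""
    described_as_randomized: bool
    described_as_double_blind: bool
    withdrawals_described: bool
    # None: method not described; True: described and appropriate;
    # False: described and inappropriate.
    randomization_method_appropriate: Optional[bool] = None
    blinding_method_appropriate: Optional[bool] = None


def jadad_score(answers: JadadAnswers) -> int:
    """Return the score on the 0-5 scale described in Box 26."""
    # One point for each "yes" on the three basic questions.
    score = (int(answers.described_as_randomized)
             + int(answers.described_as_double_blind)
             + int(answers.withdrawals_described))

    # Add or deduct one point for the randomization method, if described.
    if answers.described_as_randomized and answers.randomization_method_appropriate is not None:
        score += 1 if answers.randomization_method_appropriate else -1

    # Add or deduct one point for the blinding method, if described.
    if answers.described_as_double_blind and answers.blinding_method_appropriate is not None:
        score += 1 if answers.blinding_method_appropriate else -1

    return score
```

A report described as randomized and double blind, with appropriate methods for both and a description of withdrawals, scores 5. A report described as randomized but allocating patients alternately, with no blinding and no account of withdrawals, scores 0.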

The criteria used for assessing quality of studies vary by type of design. For example, the internal validity of an RCT depends on such methodological criteria as: method of randomization, accounting for withdrawals and dropouts, and blinding/masking of outcomes assessment. The internal validity of systematic reviews (discussed below) depends on such methodological criteria as: time period covered by the review, comprehensiveness of the sources and search strategy used, relevance of included studies to the review topic, and application of a standard appraisal of included studies.  

The ability of analysts to determine the internal and external validity of a published study, and otherwise to interpret its quality, depends on how thoroughly and clearly its design, conduct, statistical analysis, and other aspects are reported. The inadequate quality of a high proportion of published reports of RCTs, even in leading journals, has been well documented (Freiman 1978; Moher 1994). Several national and international groups of researchers and medical journal editors have developed standards for reporting of RCTs and other studies (Moher 2001; International Committee of Medical Journal Editors 1997). The trend of more journals to require structured abstracts has assisted analysts in identifying and screening reports of RCTs and other studies.

Many primary studies of health care technologies involve small, non-randomized series of consecutive cases or single case reports, and therefore have methodological limitations that make it difficult to establish the efficacy (or other attributes) of the technologies with sound scientific validity. To some extent, these methodological shortcomings are unavoidable given the nature of the technologies being evaluated, or are otherwise beyond the control of the investigators. In the instance of determining the efficacy of a new drug, the methodological standard is a large, prospective, double-blind, placebo-controlled RCT. These methodological attributes increase the chances of detecting any real treatment effect of the new drug, control for patient characteristics that might influence any treatment effect, and reduce opportunities for investigator or patient bias to affect results.

Although their contributions to methodological validity are generally well recognized, it is not possible to apply all of these attributes for trials of certain types of technologies or for certain clinical indications or settings. Further, these attributes are controversial in certain instances. Patient and/or investigator blinding is impractical or impossible for many medical devices and most surgical procedures. For clinical trials of technologies for rare diseases (e.g., "orphan drugs" and devices), it may be difficult to recruit numbers of patients large enough to detect convincing treatment effects.  

Among the various areas of methodological controversy in clinical trials is the appropriate use of placebo controls. Issues include: (1) appropriateness of using a placebo in a trial of a new therapy when a therapy judged to be effective already exists, (2) statistical requirements for discerning what may be smaller differences in outcomes between a new therapy and an existing one compared to differences in outcomes between a new therapy and a placebo, and (3) concerns about comparing a new treatment to an existing therapy that, except during the trial itself, may be unavailable in a given setting (e.g., a developing country) because of its cost or other economic or social constraints (Rothman 1994; Varmus 1997). As in other health technologies, surgical procedures can be subject to the placebo effect. In recent years, following previous missteps that raised profound ethical concerns, guidance has emerged for using "sham" procedures as placebos in RCTs of surgical procedures (Horng 2003). Some instances of patient blinding have been most revealing about the placebo effect in surgery, including arthroscopic knee surgery (Moseley 2002), percutaneous myocardial laser revascularization (Stone 2002), and neurotransplantation surgery (Boer 2002).

Notwithstanding the limitations inherent in clinical study of many technologies, the methodological rigor used in many primary studies falls short of what it could be. Clinicians, patients, payers, hospital managers, national policymakers, and others who make technology-related decisions and policies are becoming more sophisticated in demanding and interpreting the strength of scientifically-based findings.

Decide How to Use Studies

Most assessment groups have decided that it is not appropriate to consider all studies equally important, and that studies of higher quality should influence their findings more than studies of lesser quality. Experts in evidence interpretation do not agree on the proper approach for deciding how to use studies of differing quality. According to some experts, the results of studies that do not have randomized controls are subject to such great bias that they should not be included for determining the effects of an intervention. Others say that results from nonrandomized prospective studies, observational studies, and other weaker designs should be used, but given less weight or adjusted for their biases.

There are several basic approaches to deciding how to use the individual studies in an assessment: use all studies as reported; decide whether to include or exclude each study as reported; weight studies according to their relative quality; and adjust the results of studies to compensate for their biases. Each approach has advantages and disadvantages, as well as differing technical requirements. As noted below with regard to establishing search strategies, the types of studies to be used in an assessment should be determined prospectively as much as possible, so as to avoid introducing bias into study selection. Therefore, to the extent that assessors decide to use only certain types of studies (e.g., RCTs and systematic reviews) or not to use certain types of studies (e.g., case studies, case series, and other weaker designs), they should set their inclusion and exclusion criteria prospectively and design their literature search strategies accordingly. Assessment reports should document the criteria or procedures by which study results were selected for use in the assessment.
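Two of these approaches, including or excluding studies by prespecified criteria and weighting studies by relative quality, can be sketched as follows. Everything here is hypothetical: the `Study` fields, the design labels, and the idea of a single 0-to-1 quality weight are illustrative assumptions, not a method prescribed by this text. Formal meta-analyses typically weight by precision rather than by a quality score alone.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Inclusion criteria fixed before the literature search (hypothetical labels).
INCLUDED_DESIGNS = frozenset({"rct", "systematic_review"})


@dataclass(frozen=True)
class Study:
    name: str
    design: str     # e.g., "rct", "cohort", "case_series"
    effect: float   # effect estimate, e.g., a risk difference
    quality: float  # appraisal-based weight between 0 and 1 (assumed scale)


def select_studies(studies: Iterable[Study],
                   included_designs: frozenset = INCLUDED_DESIGNS) -> List[Study]:
    """Include/exclude approach: keep only studies of prespecified designs."""
    return [s for s in studies if s.design in included_designs]


def quality_weighted_effect(studies: Iterable[Study]) -> float:
    """Weighting approach: average the effects, weighted by quality."""
    studies = list(studies)
    total_weight = sum(s.quality for s in studies)
    if total_weight <= 0:
        raise ValueError("at least one study must have a positive quality weight")
    return sum(s.quality * s.effect for s in studies) / total_weight
```

Because `INCLUDED_DESIGNS` is defined before any study is examined, the selection step cannot be tuned after the fact to favor particular results, which is the point of setting criteria prospectively.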

Appraising a Body of Evidence

As described above, certain attributes of primary study designs produce better evidence than others. A useful step in appraising evidence is to classify it by basic design type and other study characteristics.

Evidence tables provide a useful way to summarize and display important qualities about multiple individual studies pertaining to a given question. The information summarized in evidence tables may include attributes of study design (e.g., randomization, control, blinding), patient characteristics (e.g., number, age, gender), patient outcomes (e.g., mortality, morbidity, HRQL), and derived summary statistics (e.g., P values, confidence intervals). The tabular format enables reviewers to compare systematically the key attributes of studies and to provide an overall picture of the amount and quality of the available evidence. Box 27 is an evidence table of selected study characteristics and outcomes of double-blind placebo-controlled RCTs of aspirin for patients after myocardial infarction.

"Grading" a body of evidence according to its methodological rigor is a standard part of HTA. It can take various forms, each of which involves structured, critical appraisal of the evidence against formal criteria (RTI International-University of North Carolina 2002). Box 28 shows an evidence hierarchy that ranks study types from "well-designed randomized controlled trials" at the top through "opinions of respected authorities based on clinical experience" and similar types of expert views at the bottom.

Box 27
Evidence Table: Double-Blind Placebo-Controlled RCTs of Aspirin in Patients After Myocardial Infarction
No. patients
Age range
Months from qualifying event to trial entry
Daily dose ASA (1) (mg)
Mortality (%) (2), Sum. stat. (3)
Cardiac death (%), Sum. stat.
Nonfatal MI (%), Sum. stat.

ASA: 2,267


2 - 60 1,000 3.2 10.8       Z=1.27
8.7        Z=0.82
7.7        Z=-2.11
ASA: 317
plac: 309


1 - 1.4 1,500 2.0 8.5         Z=-0.79
ASA: 758
plac: 771


74% > 60
77% > 60
972 1.8 5.8         Z=-1.90
5.4        Z=-1.87
3.7        Z=-0.46
ASA: 615
plac: 624


76% < 3 300 1.0 7.6         not sig.
-     -    
ASA: 832
plac: 850


50% < 0.25 900 1.0 12.8 not sig.
14.8 at P<0.05
-     -    
ASA: 810
plac: 406


2 - 60 972 3.4 10.5        Z=-1.21
8.0 Z=-1.24
6.9 not sig.


(1) ASA: aspirin (acetylsalicylic acid); plac: placebo
(2) Percent of mortality, cardiac death, and nonfatal myocardial infarction based on number of patients randomized.
(3) Sum. stat.: summary statistic. Z is a test statistic that can be used to determine whether the difference in proportions or means between a treatment group and a control group is statistically significant. For a two-tailed test, Z values of ±1.96 and ±2.58 are approximately equivalent to P values of 0.05 and 0.01.

Sources: Aspirin Myocardial Infarction Study Research Group 1980; Breddin et al. 1980; The Coronary Drug Project Research Group 1976; Elwood and Sweetnam 1979; Elwood et al. 1974; Elwood 1983; The Persantine-Aspirin Reinfarction Study Research Group 1980.
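The correspondence between Z values and two-tailed P values noted under Box 27 follows from the standard normal distribution. A minimal sketch of the conversion, using only the Python standard library (the function name is illustrative):

```python
import math


def two_tailed_p(z: float) -> float:
    """Two-tailed P value for a standard normal test statistic z."""
    # P(|Z| >= |z|) under the standard normal equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))
```

Applied to the table, a summary statistic such as Z=-2.11 falls below the 0.05 threshold, while Z=1.27 does not. The sign of Z indicates the direction of the difference and does not affect the two-tailed P value.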

Box 28
UK NHS Centre for Reviews and Dissemination: Hierarchy of Evidence




  • Well-designed randomized controlled trials
  • Well-designed controlled trial with pseudo-randomization
  • Well-designed controlled trials with no randomization
  • Well-designed cohort (prospective) study with concurrent controls
  • Well-designed cohort (prospective) study with historical controls
  • Well-designed cohort (retrospective) study with concurrent controls
  • Well-designed case-control (retrospective) study
  • Large differences from comparisons between times and/or places with and without intervention (in some circumstances these may be equivalent to level II or I)
  • Opinions of respected authorities based on clinical experience; descriptive studies; reports of expert committees

Source: NHS Centre for Reviews and Dissemination 1996.

Box 29 shows a basic evidence-grading scheme that has been used by the US Preventive Services Task Force. This scheme grades evidence in a manner that favors certain attributes of stronger studies for primary data, beginning with properly-designed RCTs. In order to better address how well studies are conducted, the task force augmented this hierarchy with a three-category rating of the internal validity of each study, shown in Box 30.

Another type of evidence table, shown in Box 31, has a count of articles published during a given time period, arranged by type of study, about the use of percutaneous transluminal coronary angioplasty. Rather than showing details about individual studies, this evidence table shows that the distribution of types of studies in an apparently large body of evidence included a relatively small number of RCTs, and a large number of less rigorous observational studies.

Assessment groups can classify studies in evidence tables to gain an understanding of the distribution of evidence by type, and apply evidence hierarchies such as those shown above to summarize a body of evidence. However, more information may be needed to characterize the evidence in a useful way. For example, more detailed grading schemes can be used to account for instances where two or more well-designed studies have conflicting (heterogeneous) results. Box 32 distinguishes between groups of studies with homogeneous and heterogeneous results. This hierarchy also recognizes as stronger evidence studies with low probabilities of false positive error (α) and false negative error (β). This hierarchy also distinguishes between bodies of evidence depending on whether high-quality overviews (i.e., systematic reviews or meta-analyses) are available. 

Box 29
US Preventive Services Task Force: Hierarchy of Research Design



  • Evidence obtained from at least one properly-designed randomized controlled trial.
  • Evidence obtained from well-designed controlled trials without randomization.
  • Evidence obtained from well-designed cohort or case-control analytic studies, preferably from more than one center or research group.
  • Evidence obtained from multiple time series with or without the intervention. Dramatic results in uncontrolled experiments (such as the results of the introduction of penicillin treatment in the 1940s) could also be regarded as this type of evidence.
  • Opinions of respected authorities, based on clinical experience, descriptive studies, or reports of expert committees.

Source: Harris 2001.

Box 30
US Preventive Services Task Force: Grades for Strength of Overall Evidence





  • Evidence includes consistent results from well-designed, well-conducted studies in representative populations that directly assess effects on health outcomes.
  • Evidence is sufficient to determine effects on health outcomes, but the strength of the evidence is limited by the number, quality, or consistency of the individual studies; generalizability to routine practices; or indirect nature of the evidence on health outcomes.
  • Evidence is insufficient to assess the effects on health outcomes because of limited number or power of studies, important flaws in their design or conduct, gaps in the chain of evidence, or lack of information on important health outcomes.

Source: U.S. Preventive Services Task Force 2002.

Box 31
Distribution of Research Articles on PTCA by Year of Publication and Method Used to Collect or Review Data


Article Class

  • Prospective RCT
  • Prospective non-RCT
  • Prospective registry
  • Case-control & adjusted cohort
  • Decision analysis

Articles were retrieved using MEDLINE searches.

Source: Hilborne 1991.
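A count-based evidence table of this kind can be assembled by tallying retrieved articles by class and publication year. The sketch below assumes each article has already been classified by a reviewer; the function names and the `(article_class, year)` record format are hypothetical, and the values in the usage note are made up rather than taken from Hilborne 1991.

```python
from collections import Counter
from typing import Iterable, Tuple

Article = Tuple[str, int]  # (article class, publication year)


def tally_by_class_and_year(articles: Iterable[Article]) -> Counter:
    """Count articles in each (class, year) cell of the evidence table."""
    return Counter(articles)


def class_totals(articles: Iterable[Article]) -> Counter:
    """Count articles in each class across all years."""
    return Counter(article_class for article_class, _year in articles)
```

For instance, given the illustrative records `[("Prospective RCT", 1985), ("Prospective registry", 1985), ("Prospective RCT", 1986)]`, `class_totals` reports two prospective RCTs and one prospective registry. Comparing such totals shows at a glance how much of a body of literature comes from the more rigorous designs.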

Box 32
Levels of Evidence and Grades of Recommendations


Columns: Level of Evidence (If No Overview Available); Level of Evidence (If High-Quality Overview Available); Grade of Recommendation

  • I: Randomized trials with low false-positive (α) and low false-negative (β) errors.
    If a high-quality overview is available: lower limit of CI for treatment effect exceeds clinically significant benefit and:
      I+: Individual study results homogeneous
      I-: Individual study results heterogeneous
  • II: Randomized trials with high false-positive (α) and high false-negative (β) errors.
    If a high-quality overview is available: lower limit of CI for treatment effect falls below clinically significant benefit and:
      II+: Individual study results homogeneous
      II-: Individual study results heterogeneous
  • III: Nonrandomized concurrent cohort studies
  • IV: Nonrandomized historical cohort studies
  • V: Case series

Source: Cook 1992.
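When a high-quality overview is available, the Box 32 scheme reduces to two questions: does the lower limit of the confidence interval for the treatment effect exceed the clinically significant benefit, and are the individual study results homogeneous? A minimal sketch of that rule follows. The function and parameter names are hypothetical, the effect is assumed to be expressed so that larger values mean more benefit, and treating a lower limit exactly equal to the threshold as level II is an assumption, since the box does not address that case.

```python
def overview_level(ci_lower_limit: float,
                   clinically_significant_benefit: float,
                   homogeneous: bool) -> str:
    """Return "I+", "I-", "II+", or "II-" for a high-quality overview."""
    # Level I: the whole confidence interval lies above the benefit threshold.
    # Assumption: a lower limit equal to the threshold is treated as level II.
    level = "I" if ci_lower_limit > clinically_significant_benefit else "II"
    suffix = "+" if homogeneous else "-"
    return level + suffix
```

For example, with a clinically significant benefit defined as a 5-point absolute risk reduction, an overview whose confidence interval has a lower limit of 7 points and whose trials agree would be graded I+, while the same overview with conflicting trial results would be graded I-.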

The more comprehensive evidence hierarchy from the UK NHS Centre for Evidence Based Medicine, shown in Box 33, provides levels of evidence (1a-c, 2a-c, etc.) to accompany findings based on evidence derived from various study designs and applications in prevention, therapy, diagnosis, economic analysis, etc.

Of course, HTAs may involve multiple questions about the use of a technology, e.g., pertaining to particular patient populations or health care settings. Therefore, the evidence and recommendations applying to each question may be evaluated separately or at different levels, as suggested in the causal pathway shown in Box 23.

Link Recommendations to Evidence

Findings and recommendations should be linked explicitly to the quality of the evidence. The process of interpreting and integrating the evidence helps assessment groups to determine the adequacy of the evidence for addressing aspects of their assessment problems (Hayward 1995).

An example of linking recommendations to evidence is incorporated into the evidence appraisal scheme cited above in Box 32, which assigns three grade levels to recommendations based on the evidence. Accompanying the grades for evidence (as shown in Box 30), the US Preventive Services Task Force provides grades for recommendations based on the evidence. This approach, shown in Box 34, reflects two dimensions: the direction of the recommendation (e.g., for or against providing a preventive service) and the strength of the recommendation, tied to the grade of evidence (e.g., a strong recommendation if there is good evidence). Finally, the comprehensive evidence hierarchy shown in Box 33 also includes grades of recommendation that are linked to levels of evidence, including levels that account for evidence homogeneity and heterogeneity.

Even for those aspects of an assessment problem for which there is little useful evidence, an assessment group may have to provide some type of findings or recommendations. This may involve making inferences from the limited evidence, extrapolations of evidence from one circumstance to another, theory, or other subjective judgments. Whether a recommendation about using a technology in particular circumstances is positive, negative, or equivocal (neutral), users of the assessment should understand the basis of that recommendation and with what level of confidence it was made. Unfortunately, the recommendations made in many assessment reports do not reflect the relative strength of the evidence upon which they are based. In these instances, readers may have the mistaken impression that all of the recommendations in an assessment report are equally valid or authoritative.

Approaches for linking the quality of available evidence to the strength and direction of findings and recommendations are being improved and new ones are being developed (Harbour 2001). Using evidence this way enables readers to better understand the reasoning behind the assessment findings and recommendations. It also provides readers with a more substantive basis upon which to challenge the assessment as appropriate. Further, it helps assessment programs and policymakers to determine if a reassessment is needed as relevant new evidence becomes available.

Box 33
Oxford Centre for Evidence-based Medicine Levels of Evidence (May 2001)



Columns: Level; Therapy/Prevention, Aetiology/Harm; Prognosis; Diagnosis; Differential diagnosis/symptom prevalence study; Economic and decision analyses

Level 1a

  • Therapy/Prevention, Aetiology/Harm: SR (with homogeneity*) of RCTs
  • Prognosis: SR (with homogeneity*) of inception cohort studies; CDR† validated in different populations
  • Diagnosis: SR (with homogeneity*) of Level 1 diagnostic studies; CDR† with 1b studies from different clinical centres
  • Differential diagnosis/symptom prevalence study: SR (with homogeneity*) of prospective cohort studies
  • Economic and decision analyses: SR (with homogeneity*) of Level 1 economic studies

Level 1b

  • Therapy/Prevention, Aetiology/Harm: Individual RCT (with narrow Confidence Interval‡)
  • Prognosis: Individual inception cohort study with > 80% follow-up; CDR† validated in a single population
  • Diagnosis: Validating** cohort study with good††† reference standards; or CDR† tested within one clinical centre
  • Differential diagnosis/symptom prevalence study: Prospective cohort study with good follow-up****
  • Economic and decision analyses: Analysis based on clinically sensible costs or alternatives; systematic review(s) of the evidence; and including multi-way sensitivity analyses

Level 1c

  • Therapy/Prevention, Aetiology/Harm: All or none§
  • Prognosis: All or none case-series
  • Diagnosis: Absolute SpPins and SnNouts††
  • Differential diagnosis/symptom prevalence study: All or none case-series
  • Economic and decision analyses: Absolute better-value or worse-value analyses††††

Level 2a

  • Therapy/Prevention, Aetiology/Harm: SR (with homogeneity*) of cohort studies
  • Prognosis: SR (with homogeneity*) of either retrospective cohort studies or untreated control groups in RCTs
  • Diagnosis: SR (with homogeneity*) of Level >2 diagnostic studies
  • Differential diagnosis/symptom prevalence study: SR (with homogeneity*) of 2b and better studies
  • Economic and decision analyses: SR (with homogeneity*) of Level >2 economic studies

Level 2b

  • Therapy/Prevention, Aetiology/Harm: Individual cohort study (including low quality RCT; e.g., <80% follow-up)
  • Prognosis: Retrospective cohort study or follow-up of untreated control patients in an RCT; Derivation of CDR† or validated on split-sample§§§ only
  • Diagnosis: Exploratory** cohort study with good††† reference standards; CDR† after derivation, or validated only on split-sample§§§ or databases
  • Differential diagnosis/symptom prevalence study: Retrospective cohort study, or poor follow-up
  • Economic and decision analyses: Analysis based on clinically sensible costs or alternatives; limited review(s) of the evidence, or single studies; and including multi-way sensitivity analyses

Level 2c

  • Therapy/Prevention, Aetiology/Harm: "Outcomes" Research; Ecological studies
  • Prognosis: "Outcomes" Research
  • Differential diagnosis/symptom prevalence study: Ecological studies
  • Economic and decision analyses: Audit or outcomes research

Level 3a

  • Therapy/Prevention, Aetiology/Harm: SR (with homogeneity*) of case-control studies
  • Diagnosis: SR (with homogeneity*) of 3b and better studies
  • Differential diagnosis/symptom prevalence study: SR (with homogeneity*) of 3b and better studies
  • Economic and decision analyses: SR (with homogeneity*) of 3b and better studies

Level 3b

  • Therapy/Prevention, Aetiology/Harm: Individual Case-Control Study
  • Diagnosis: Non-consecutive study; or without consistently applied reference standards
  • Differential diagnosis/symptom prevalence study: Non-consecutive cohort study, or very limited population
  • Economic and decision analyses: Analysis based on limited alternatives or costs, poor quality estimates of data, but including sensitivity analyses incorporating clinically sensible variations.

Level 4

  • Therapy/Prevention, Aetiology/Harm: Case-series (and poor quality cohort and case-control studies§§)
  • Prognosis: Case-series (and poor quality prognostic cohort studies***)
  • Diagnosis: Case-control study, poor or non-independent reference standard
  • Differential diagnosis/symptom prevalence study: Case-series or superseded reference standards
  • Economic and decision analyses: Analysis with no sensitivity analysis

Level 5

  • Therapy/Prevention, Aetiology/Harm: Expert opinion without explicit critical appraisal, or based on physiology, bench research or "first principles"
  • Prognosis: Expert opinion without explicit critical appraisal, or based on physiology, bench research or "first principles"
  • Diagnosis: Expert opinion without explicit critical appraisal, or based on physiology, bench research or "first principles"
  • Differential diagnosis/symptom prevalence study: Expert opinion without explicit critical appraisal, or based on physiology, bench research or "first principles"
  • Economic and decision analyses: Expert opinion without explicit critical appraisal, or based on economic theory or "first principles"


Users can add a minus-sign "-" to denote a level of evidence that fails to provide a conclusive answer because of:

  • EITHER a single result with a wide Confidence Interval (such that, for example, an ARR in an RCT is not statistically significant but whose confidence intervals fail to exclude clinically important benefit or harm)
  • OR a Systematic Review with troublesome (and statistically significant) heterogeneity.
  • Such evidence is inconclusive, and therefore can only generate Grade D recommendations.


* By homogeneity we mean a systematic review that is free of worrisome variations (heterogeneity) in the directions and degrees of results between individual studies. Not all systematic reviews with statistically significant heterogeneity need be worrisome, and not all worrisome heterogeneity need be statistically significant. As noted above, studies displaying worrisome heterogeneity should be tagged with a "-" at the end of their designated level.


† Clinical Decision Rule. (These are algorithms or scoring systems which lead to a prognostic estimation or a diagnostic category.)

See note #2 for advice on how to understand, rate and use trials or other studies with wide confidence intervals.


§ Met when all patients died before the Rx became available, but some now survive on it; or when some patients died before the Rx became available, but none now die on it.


§§ By poor quality cohort study we mean one that failed to clearly define comparison groups and/or failed to measure exposures and outcomes in the same (preferably blinded), objective way in both exposed and non-exposed individuals and/or failed to identify or appropriately control known confounders and/or failed to carry out a sufficiently long and complete follow-up of patients. By poor quality case-control study we mean one that failed to clearly define comparison groups and/or failed to measure exposures and outcomes in the same (preferably blinded), objective way in both cases and controls and/or failed to identify or appropriately control known confounders.


§§§ Split-sample validation is achieved by collecting all the information in a single tranche, then artificially dividing this into "derivation" and "validation" samples.


†† An "Absolute SpPin" is a diagnostic finding whose Specificity is so high that a Positive result rules-in the diagnosis. An "Absolute SnNout" is a diagnostic finding whose Sensitivity is so high that a Negative result rules-out the diagnosis.


Good, better, bad and worse refer to the comparisons between treatments in terms of their clinical risks and benefits.


††† Good reference standards are independent of the test, and applied blindly or objectively to all patients. Poor reference standards are haphazardly applied, but still independent of the test. Use of a non-independent reference standard (where the 'test' is included in the 'reference', or where the 'testing' affects the 'reference') implies a level 4 study.


†††† Better-value treatments are clearly as good but cheaper, or better at the same or reduced cost. Worse-value treatments are as good and more expensive, or worse and equally or more expensive.


** Validating studies test the quality of a specific diagnostic test, based on prior evidence. An exploratory study collects information and trawls the data (e.g., using a regression analysis) to find which factors are 'significant'.


*** By poor quality prognostic cohort study we mean one in which sampling was biased in favour of patients who already had the target outcome, or the measurement of outcomes was accomplished in <80% of study patients, or outcomes were determined in an unblinded, non-objective way, or there was no correction for confounding factors.


**** Good follow-up in a differential diagnosis study is >80%, with adequate time for alternative diagnoses to emerge (e.g., 1-6 months acute, 1-5 years chronic).

Grades of Recommendation


A: consistent level 1 studies


B: consistent level 2 or 3 studies or extrapolations from level 1 studies


C: level 4 studies or extrapolations from level 2 or 3 studies


D: level 5 evidence or troublingly inconsistent or inconclusive studies of any level

Source: Center for Evidence-Based Medicine 2003.

Box 34
US Preventive Services Task Force: Grades for Strength of Recommendations





A: The USPSTF strongly recommends that clinicians routinely provide [the service] to eligible patients. The USPSTF found good evidence that [the service] improves important health outcomes and concludes that benefits substantially outweigh harms.


B: The USPSTF recommends that clinicians routinely provide [the service] to eligible patients. The USPSTF found at least fair evidence that [the service] improves important health outcomes and concludes that benefits outweigh harms.


C: The USPSTF makes no recommendation for or against routine provision of [the service]. The USPSTF found at least fair evidence that [the service] can improve health outcomes but concludes that the balance of benefits and harms is too close to justify a general recommendation.


D: The USPSTF recommends against routinely providing [the service] to asymptomatic patients. The USPSTF found at least fair evidence that [the service] is ineffective or that harms outweigh benefits.


I: The USPSTF concludes that the evidence is insufficient to recommend for or against routinely providing [the service]. Evidence that [the service] is effective is lacking, of poor quality, or conflicting, and the balance of benefits and harms cannot be determined.

Source: U.S. Preventive Services Task Force 2002.

Assessment organizations and others that review evidence are increasingly providing guidance to technology sponsors and other stakeholders for preparing dossiers and other submissions of clinical and economic evidence. For example, the UK National Institute for Clinical Excellence (NICE) provides guidance to technology manufacturers and sponsors for preparing submissions of evidence to inform NICE technology appraisals (National Institute for Clinical Excellence 2001). The Academy of Managed Care Pharmacy (AMCP) provides a recommended format for submission of clinical and economic data in support of formulary consideration by pharmacy and therapeutics committees of health plans in the US (Academy of Managed Care Pharmacy 2002).  


Organizations that conduct or sponsor HTAs have only limited resources for this activity. With the great variety of potential assessment topics, HTA organizations need some practical means of determining what to assess. This section considers how assessment programs identify candidate assessment topics and set priorities among these.

Identify Candidate Topics

To a large extent, assessment topics are determined, or bounded, by the mission or purpose of an organization. For example, the US FDA is systematically required to assess all new drugs and to assess health devices according to specific provisions made for particular classes of devices. For a new drug, a company normally files an Investigational New Drug Application (IND) with the FDA for permission to begin testing the drug in people; later, following successful completion of necessary clinical trials, the company files a New Drug Application (NDA) to seek FDA approval to market the drug. For certain medical devices (i.e., new "Class III" devices that sustain or support life, are implanted in the body, or present a potential risk of illness or injury), the Investigational Device Exemption (IDE) and Premarketing Approval (PMA) Application are analogous to the IND and NDA, respectively. The FDA is notified about many other devices when a company files a "510(k)" application seeking market approval based on a device's "substantial equivalence" to another device that has already received FDA marketing approval.

Third-party payers generally assess technologies on a reactive basis; a new medical or surgical procedure that is not recognized by payers as being standard or established may become a candidate for assessment. For the US Centers for Medicare and Medicaid Services (CMS), assessment topics arise in the form of requests for national coverage policy determinations that cannot be resolved at the local level or that are recognized to be of national interest. These requests typically originate with Medicare contractors that administer the program in their respective regions, Medicare beneficiaries (patients), physicians, health product companies, health professional associations, and government entities. CMS may request assistance in the form of evidence reports or other assessments by a sister agency, AHRQ.  

For the Evidence-based Practice Centers program, also administered by AHRQ, the agency solicits topic nominations for evidence reports and technology assessments in a public notice in the US Federal Register. Topics have been nominated by a variety of other government agencies, payers, health systems and networks, health professions associations, employer and consumer groups, disease-based organizations, and others. In selecting topics, AHRQ considers not only the information about the topic itself, but the plans of the nominating organization to make use of the findings of the assessment. Information required in these nominations is shown in Box 35.

The American College of Physicians (ACP) Clinical Efficacy Assessment Program (CEAP), which develops clinical practice guidelines, determines its guideline topics based upon evidence reports developed by the AHRQ Evidence-based Practice Centers (EPC) program. (Topics of the EPC program are nominated by outside groups, including ACP.) The topics undertaken by ECRI's technology assessment service are identified by request of the service's subscribers, including payers, providers, and others. For the Cochrane Collaboration, potential topics generally arise from members of the review groups, who are encouraged to investigate topics of interest to them, subject to the agreement of their review groups (Clarke 2003).

Box 35
Evidence-based Practice Centers Topic Nominations

Topic nominations for the AHRQ EPC program should include:

  • Defined condition and target population
  • Three to five very focused questions to be answered
  • Incidence or prevalence, and indication of disease burden (e.g., mortality, morbidity, functional impairment) in the US general population or in subpopulations (e.g., Medicare and Medicaid populations)
  • Costs associated with the conditions, including average reimbursed amounts for diagnostic and therapeutic interventions
  • Impact potential of the evidence report or technology assessment to decrease health care costs or to improve health status or clinical outcomes
  • Availability of scientific data and bibliographies of studies on the topic
  • References to significant differences in practice patterns and/or results; alternative therapies or controversies
  • Plans of the nominating organization to incorporate the report into its managerial or policy decision making (e.g., practice guidelines, coverage policies)
  • Plans of the nominating organization for dissemination of these derivative products to its membership
  • Process by which the nominating organization will measure members' use of the derivative products
  • Process by which the nominating organization will measure the impact of such use on clinical practice

Source: Agency for Healthcare Research and Quality 2003.

Horizon Scanning

The demand for scanning of multiple types of sources for information about new health care interventions has prompted the development of "early warning" or "horizon scanning" functions in the US, Europe, and elsewhere (Douw 2003). Horizon scanning functions are intended to serve multiple purposes, including to:

  • Identify potential topics for HTA and information for setting priorities among these
  • Clarify expectations for the uses or indications of a technology
  • Increase public awareness about new technologies
  • Estimate the expected health and economic impacts
  • Identify critical thresholds of effectiveness improvements in relation to additional costs, e.g., to demonstrate the cost-effectiveness of a new intervention
  • Anticipate potential social, ethical, or legal implications of a technology (Harper 1998; Stevens 1998; Carlsson 1998).


Among the organizations with horizon scanning functions are:  

For example, CETAP draws its information from the Internet, published literature, CCOHTA committee members, and other experts. The products of CETAP include short Alerts that address very early technologies, and as more evidence becomes available, CCOHTA publishes more in-depth, peer-reviewed Issues in Emerging Health Technologies bulletins. The purposes of EuroScan (European Information Network on New and Changing Health Technologies), a collaborative network of more than a dozen HTA agencies, are to: evaluate and exchange information on new and changing technologies, develop information sources, develop applied methods for early assessment, and disseminate information on early identification and assessment activities.  

As shown in Box 36, a considerable variety of online databases, newsletters, and other sources provide streams of information pertaining to new and emerging health care interventions. Certainly, an important set of sources for identifying new topics are bibliographic databases such as MEDLINE (accessible, e.g., via PubMed) and EMBASE. The Cochrane Collaboration protocols are publicly available, detailed descriptions of systematic reviews currently underway by Cochrane, which include detailed descriptions of the rationale for the review, information sources, and search strategies.

Although the major thrust of horizon scanning has been to identify "rising" technologies that eventually may merit assessment, horizon scanning may turn to the other direction to identify "setting" technologies that may be outmoded or superseded by newer ones. In either case, horizon scanning provides an important input into setting assessment priorities.  

Setting Assessment Priorities

Some assessment programs have explicit procedures for setting priorities; others set priorities only in an informal or vague way. Given very limited resources for assessment and increasing accountability of assessment programs to their parent organizations and others who use or are affected by their assessments, it is important to articulate how assessment topics are chosen.

Box 36
Information Sources for New and Emerging Health Care Interventions
  • Trade journals (e.g., F-D-C Reports: The Pink Sheet, NDA Pipeline, The Gray Sheet; In Vivo; Adis International; Biomedical Instrumentation and Technology; R&Directions)
  • General news (PR Newswire, Reuters Health, New York Times)
  • Health professions and industry newsletters (e.g., Medscape, Medicine & Health, American Health Line, CCH Health & Medicine)
  • Conferences (and proceedings) of medical specialty societies and health industry groups
  • General medical journals and specialty medical journals
  • Technology company web sites
  • Publicly available market research reports (IHS Health Group)
  • FDA announcements of market approvals of new pharmaceuticals (e.g., NDAs, NDA supplements), biotechnologies (e.g., BLAs), and devices (e.g., PMAs, PMA supplements, and 510[k]s)*
  • Adverse event/alert announcements (from FDA, USP, NIH Clinical Alerts and Advisories, etc.)
  • New Medicines in Development (disease- and population-specific series from PhRMA, including clinical trial status)
  • Databases of ongoing research, e.g., HSRProj (Health Services Research Projects in Progress) from NLM
  • Reports and other sources of information on significant variations in practice, utilization, or payment policies (e.g., The Dartmouth Atlas, LMRP.NET)
  • Special reports on health care trends and futures (e.g., Health and Health Care 2010 (Institute for the Future 2000); Health Technology Forecast (ECRI 2002))
  • Priority lists and forthcoming assessments from public and non-profit evaluation/assessment organizations (e.g., INAHTA member organizations)
  • Cochrane Collaboration protocols

*NDA: New Drug Application approvals; BLA: Biologics License Application approvals; PMA: Premarket Approval Application approvals; 510(k): substantially equivalent device application approvals.

Most assessment programs have criteria for topic selection, although these criteria are not always explicit. Is it most important to focus on costly health problems and technologies? What about health problems that affect large numbers of people, or health problems that are life-threatening? What about technologies that cause great public controversy? Should an assessment be undertaken if it is unlikely that its findings will change current practice? Examples of selection criteria that are used in setting assessment priorities are:  

  • High individual burden of morbidity, mortality, or disability
  • High population burden of morbidity, mortality, or disability
  • High unit cost of a technology or health problem
  • High aggregate cost of a technology or health problem
  • Substantial variations in practice
  • Available findings not well disseminated or adopted by practitioners
  • Need to make regulatory decision
  • Need to make a health program implementation decision (e.g., for initiating a major immunization program)
  • Need to make payment decision (e.g., provide coverage or include in health benefits)
  • Scientific controversy or great interest among health professionals
  • Public or political demand
  • Sufficient research findings available upon which to base assessment
  • Timing of assessment relative to available evidence (e.g., recent or anticipated pivotal scientific findings)
  • Potential for the findings of an assessment to be adopted in practice
  • Potential for change in practice to affect patient outcomes or costs
  • Feasibility given resource constraints (funding, time, etc.) of the assessment program

The timing for undertaking an assessment may be sensitive to the availability of evidence. For example, the results of a recently completed RCT or meta-analysis may challenge standard practice, and prompt an HTA to consolidate these results with other available evidence for informing clinical or payment decisions. Or, an assessment may be delayed pending the results of an ongoing study that has the potential to shift the weight of the body of evidence on that topic.

A systematic priority-setting process could include the following steps (Donaldson and Sox 1992; Lara and Goodman 1990).

  1. Select criteria to be used in priority setting.
  2. Assign relative weights to the criteria.
  3. Identify candidate topics for assessment (e.g., as described above).
  4. If the list of candidate topics is large, reduce it by eliminating those topics that would clearly not rank highly according to the priority setting criteria.
  5. Obtain data for rating the topics according to the criteria.
  6. For each topic, assign a score for each criterion.
  7. Calculate a priority score for each topic.
  8. Rank the topics according to their priority scores.
  9. Review the priority topics to ensure that assessment of these would be consistent with the organizational purpose.

Processes for ranking assessment priorities range from being highly subjective (e.g., informal opinion of a small group of experts) to quantitative (e.g., using a mathematical formula) (Donaldson 1992; Eddy 1989; Phelps 1992). Box 37 shows a quantitative model for priority setting. The Cochrane Collaboration uses a more decentralized approach. Starting with topics suggested by their review group members, many Cochrane Collaboration review groups set priorities by considering burden of disease and other criteria, as well as input from discussions with key stakeholders and suggestions from consumers. These priorities are then offered to potential reviewers who might be interested in preparing and maintaining relevant reviews in these areas (Clarke 2003). 

Of course, there is no single correct way to set priorities. The great diversity of potential assessment topics, the urgency of some policymaking needs, and other factors may diminish the practical benefits of using highly systematic and quantitative approaches. On the other hand, ad hoc, inconsistent, or nontransparent processes are subject to challenges and skepticism of policymakers and other observers who are affected by HTA findings. Certainly, there is a gap between theory and application of priority setting. Many of the priority setting models are designed to support resource allocation that maximizes health gains, i.e., identify health interventions which, if properly assessed and appropriately used, could result in substantial health improvements at reasonable costs. However, some potential weaknesses of these approaches are that they tend to set priorities among interventions rather than the assessments that should be conducted, they do not address priority setting in the context of a research portfolio, and they do not adopt an incremental perspective (i.e., consideration of the net difference that conducting an assessment might accomplish) (Sassi 2003).

Reviewing the process by which an assessment program sets its priorities, including the implicit and explicit criteria it uses in determining whether or not to undertake an assessment, can help to ensure that the HTA program is fulfilling its purposes effectively and efficiently.

Specify the Assessment Problem

One of the most important aspects of an HTA is to specify clearly the problem(s) or question(s) to be addressed; this will affect all subsequent aspects of the assessment. An assessment group should have an explicit understanding of the purpose of the assessment and who the intended users of the assessment are to be. This understanding might not be established at the outset of the assessment; it may take more probing, discussion and clarification.

Box 37
A Quantitative Model for Priority Setting

A 1992 report by the Institute of Medicine provided recommendations for priority setting to the Agency for Health Care Policy and Research (now AHRQ). Seven criteria were identified:

  • Prevalence of a health condition
  • Burden of illness
  • Cost
  • Variation in rates of use
  • Potential of results to change health outcomes
  • Potential of results to change costs
  • Potential of results to inform ethical, legal, or social issues

The report offered the following formula for calculating a priority score for each candidate topic.

$$\text{Priority Score} = W_1 \ln S_1 + W_2 \ln S_2 + \cdots + W_7 \ln S_7$$

where:

W is the relative weight of each of seven priority-setting criteria

S is the score of a given candidate topic for a criterion

ln is the natural logarithm of the criterion scores.

Candidate topics would then be ranked according to their priority score.

Source: Donaldson 1992.
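As an illustration only, the weighted-logarithm formula in Box 37 could be computed and used for ranking roughly as follows. This is a minimal sketch, not the Institute of Medicine's implementation: the criterion keys, the weights, the 1-to-5 scoring scale, and the topic names are all hypothetical values chosen to exercise the formula.

```python
import math

# Short keys for the seven criteria listed in Box 37 (the key names are
# labels invented for this sketch).
CRITERIA = [
    "prevalence",
    "burden_of_illness",
    "cost",
    "variation_in_use",
    "change_health_outcomes",
    "change_costs",
    "inform_ethical_legal_social",
]


def priority_score(weights, scores):
    """Return the sum of W_i * ln(S_i) over the seven criteria."""
    total = 0.0
    for name in CRITERIA:
        score = scores[name]
        if score <= 0:
            # ln is undefined for zero or negative scores.
            raise ValueError(f"score for {name} must be positive")
        total += weights[name] * math.log(score)
    return total


def rank_topics(weights, topics):
    """Return (topic, score) pairs ordered from highest to lowest score."""
    scored = [(topic, priority_score(weights, s)) for topic, s in topics.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical weights: every criterion weighted 1, burden of illness weighted 2.
weights = {name: 1.0 for name in CRITERIA}
weights["burden_of_illness"] = 2.0

# Hypothetical candidate topics scored on an assumed 1-to-5 scale.
topics = {
    "topic_a": {name: 3 for name in CRITERIA},
    "topic_b": {**{name: 2 for name in CRITERIA}, "burden_of_illness": 5},
}

ranking = rank_topics(weights, topics)
for topic, score in ranking:
    print(f"{topic}: {score:.3f}")
```

Because the formula takes logarithms, a criterion scored 1 contributes nothing to the total, and scores must be strictly positive; any scoring scale adopted in practice would need to respect that.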

The intended users or target groups of an assessment should affect its content, presentation, and dissemination of results. Clinicians, patients, politicians, researchers, hospital managers, company executives, and others have different interests and levels of expertise. They tend to have different concerns about the effects or impacts of health technologies (health outcomes, costs, social and political effects, etc.). They also have different needs regarding the scientific or technical level of reports, the presentation of evidence and findings, and the format (e.g., length and appearance) of reports.  

When the assessment problem and intended users have been specified, they should be reviewed by the requesting agency or sponsors of the HTA. The review of the problem by the assessment program may have clarified or focused the problem in a way that differs from the original request. This clarification may prompt a reconsideration or restatement of the problem before the assessment proceeds.

Problem Elements

There is no single correct way to state an assessment problem. In general, an assessment problem could entail specifying at least the following elements: health care problem(s); patient population(s); technology(ies); practitioners or users; setting(s) of care; and properties (or impacts or health outcomes) to be assessed.

For example, a basic specification of one assessment problem would be:

  • Health care problem: management of moderate hypertension
  • Patient population: males and females, age >60 years, diastolic blood pressure 90-114 mm Hg, systolic blood pressure <240 mm Hg, no other serious health problems
  • Technologies:  specific types/classes of pharmacologic and nonpharmacologic treatments
  • Practitioners:  primary care providers
  • Setting of care: outpatient care, self care
  • Properties, impacts, or outcomes: safety (including side-effects), efficacy, effectiveness and cost-effectiveness (especially cost-utility)
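For groups that track assessment requests electronically, the problem elements above could be captured in a simple record so that an incomplete specification is easy to spot before work begins. The sketch below is one hypothetical way to do this; the class name, field names, and the completeness check are assumptions made for illustration, not part of any HTA standard.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AssessmentProblem:
    """Hypothetical record holding the elements of an assessment problem."""
    health_care_problem: str
    patient_population: str
    technologies: List[str]
    practitioners: List[str]
    settings_of_care: List[str]
    properties: List[str]

    def missing_elements(self):
        """Return the names of elements that are still empty."""
        return [name for name, value in vars(self).items() if not value]


# The hypertension example from the text, expressed as a record.
hypertension = AssessmentProblem(
    health_care_problem="management of moderate hypertension",
    patient_population=(
        "males and females, age >60 years, diastolic blood pressure "
        "90-114 mm Hg, systolic blood pressure <240 mm Hg, "
        "no other serious health problems"
    ),
    technologies=["pharmacologic treatments", "nonpharmacologic treatments"],
    practitioners=["primary care providers"],
    settings_of_care=["outpatient care", "self care"],
    properties=[
        "safety (including side-effects)",
        "efficacy",
        "effectiveness",
        "cost-effectiveness (especially cost-utility)",
    ],
)

# An incomplete draft, to show how gaps would be flagged.
draft = AssessmentProblem(
    health_care_problem="management of moderate hypertension",
    patient_population="",
    technologies=[],
    practitioners=["primary care providers"],
    settings_of_care=[],
    properties=["safety"],
)

print(hypertension.missing_elements())
print(draft.missing_elements())
```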

Causal Pathways

A useful means of presenting an assessment problem is a "causal pathway," sometimes known as an "analytical framework." Causal pathways depict direct and indirect linkages between interventions and outcomes. Although often used to present clinical problems, they can be used as well for organizational, financing, and other types of interventions or programs in health care.  

Causal pathways provide clarity and explicitness in defining the questions to be addressed in an HTA, and draw attention to pivotal linkages for which evidence may be lacking. They can be useful working tools to formulate or narrow the focus of an assessment problem. For a clinical problem, a causal pathway typically includes a patient population, one or more alternative interventions, intermediate outcomes (e.g., biological markers), health outcomes, and other elements as appropriate. In instances where a topic concerns a single intervention for narrowly defined indications and outcomes, these pathways can be relatively straightforward. However, given the considerable breadth and complexity of some HTA topics, which may cover multiple interventions for a broadly defined health problem (e.g., screening, diagnosis, and treatment of osteoporosis in various population groups), causal pathways can become detailed. While the development of a perfectly representative causal pathway is not the objective of an HTA, these can be specified to a level of detail that is sufficient for the sponsor of an HTA and the group that will conduct the HTA to concur on the assessment problem. In short, it helps to draw a picture.

An example of a general causal pathway for a screening procedure with alternative treatments is shown in Box 23. As suggested in this example, the evidence that is assembled and interpreted for an HTA may be organized according to an indirect relationship (e.g., between a screening test and an ultimate health outcome) as well as various intervening direct causal relationships (e.g., between a treatment indicated by the screening test and a biological marker, such as blood pressure or cholesterol level).

Reassessment and the Moving Target Problem

Health technologies are "moving targets" for assessment (Goodman 1996). As a technology matures, changes occur in the technology itself or other factors that can diminish the currency of an HTA report and its utility for health care policies. As such, HTA can be more of an iterative process than a one-time analysis. Some of the factors that would trigger a reassessment might include changes in the:

  • Evidence pertaining to the safety, effectiveness, and other outcomes or impacts of using the technology (e.g., publication of significant new results of a major clinical trial or a new meta-analysis)
  • Technology itself (modified techniques, models, formulations, delivery modes, etc.)
  • Indications for use (different health problems, degree of severity, etc.)
  • Populations in which it is used (different age groups, comorbidities, etc.)
  • Protocols or care pathways of which the technology is a part that may alter the role or utility of the technology
  • Care setting in which the technology is applied (inpatient, outpatient, physician office, home, long-term care)
  • Provider of the technology (type of clinician, other caregiver, patient, etc.)
  • Practice patterns (e.g., large practice variations)
  • Alternative technology or standard of care to which the technology is compared
  • Outcomes or impacts considered to be important (e.g., types of costs or quality of life)
  • Resources available for health care or the use of a particular technology (i.e., raising or lowering the threshold for decisions to use the technology)
  • Adoption or use of guidelines, payment policies, or other decisions that are based on the HTA report
  • Interpretation of existing research findings (e.g., based on corrections or re-analyses).

There are numerous instances of moving targets that have prompted reassessments. For example, since the inception of percutaneous transluminal coronary angioplasty (PTCA, approved by the US FDA in 1980), its clinical role vis-à-vis coronary artery bypass graft surgery (CABG) has changed as the techniques and instrumentation for both technologies have evolved, their indications have expanded, and as competing, complementary, and derivative technologies have emerged (e.g., laser angioplasty, coronary artery stents, minimally-invasive and "beating-heart" CABG). The emergence of viable pharmacological therapy for osteoporosis (e.g., with bisphosphonates and selective estrogen receptor modulators) has increased the clinical utility of bone densitometry. Long rejected for its devastating teratogenic effects, thalidomide has reemerged for carefully managed use in a variety of approved and investigational uses in leprosy and other skin diseases, certain cancers, chronic graft-vs.-host disease, and other conditions (Combe 2001; Richardson 2002).  

While HTA programs cannot avoid the moving target problem, they can manage and be responsive to it. Box 38 lists approaches for managing the moving target problem.  

Box 38
Managing the Moving Target Problem
  • Recognize that HTA must have the capacity to revisit topics as needed, whether periodically or as prompted by important changes that have transpired since preparation of the original HTA report.
  • Document in HTA reports the information sources, assumptions, and processes used. This information baseline will better enable HTA programs and other interested groups to recognize when it is time for reassessment.
  • In the manner of a sensitivity analysis, indicate in HTA reports what magnitudes of change in key variables (e.g., accuracy of a diagnostic test, effectiveness of a drug, patient compliance, costs) would result in a significant change in the report findings.
  • Note in HTA reports any known ongoing research, work on next-generation technologies, population trends, and other developments that might prompt the need for reassessment.
  • Have or subscribe to a scanning or monitoring function to help detect significant changes in technologies and other developments that might trigger a reassessment.
  • Recognize that, as the number of technology decision makers increases and evidence-based methods diffuse, multiple assessments are generated at different times from different perspectives. This may diminish the need for clinicians, payers, and other decision makers to rely on a single, definitive assessment on a particular topic.

Aside from changes in technologies and their applications, even new interpretations of, or corrections in, existing evidence can prompt a new assessment. This was highlighted by a 2001 report of a Cochrane Center that prompted the widespread re-examination of screening mammography guidelines by government and clinical groups. The report challenged the validity of evidence indicating that screening for breast cancer reduces mortality, and suggested that breast cancer mortality is a misleading outcome measure (Olsen 2001).

Some research has been conducted on the need to reassess a particular application of HTA findings, i.e., clinical practice guidelines. For example, for a study of the validity of 17 guidelines developed in the 1990s by AHCPR (now AHRQ), investigators developed criteria defining when a guideline needs to be updated, surveyed members of the panels that prepared the respective guidelines, and searched the literature for relevant new evidence published since the appearance of the guidelines. Using a "survival analysis," the investigators determined that about half of the guidelines were outdated in 5.8 years, and that at least 10% of the guidelines were no longer valid by 3.6 years. They recommended that, as a general rule, guidelines should be reexamined for validity every three years (Shekelle, Ortiz 2001). Others counter that the factors that might prompt a reassessment do not arise predictably or at regular intervals (Brownman 2001). Some investigators have proposed models for determining whether a guideline or other evidence-based report should be reassessed (Shekelle, Eccles 2001).
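
The general idea of a survival analysis applied to guidelines can be sketched with a Kaplan-Meier estimate. This is only an illustration of the technique: the follow-up times and outcomes below are invented for the example and are not the data or results of the study cited above.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier curve as [(time, survival)] at each time with an event.

    durations: years each guideline was followed
    events:    True if the guideline was judged outdated at that time,
               False if it was still valid when follow-up ended (censored)
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        outdated = sum(1 for u, e in zip(durations, events) if u == t and e)
        leaving = sum(1 for u in durations if u == t)
        if outdated:
            survival *= 1 - outdated / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored guidelines both leave the risk set
    return curve


def median_survival(curve):
    """First time at which estimated survival is at or below 50%, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical follow-up of ten guidelines (years, outdated?).
durations = [2, 3, 3, 4, 5, 5, 6, 6, 7, 8]
events = [True, True, False, True, True, True, True, False, True, False]

curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"year {t}: {s:.2f} of guidelines estimated still valid")
print("median time to becoming outdated:", median_survival(curve))
```

The censoring flag matters: guidelines still valid at the end of follow-up contribute to the risk set up to that point without being counted as outdated.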

Changes in the volume or nature of publications may trigger the need for an initial assessment or reassessment. A "spike" (sharp increase) in publications on a topic, such as in the number of research reports or commentary, may signal trends that would merit attention for assessment. However, in order to determine whether such publication events are reliable indicators of technology emergence or moving targets requiring assessment, further bibliometric research should be conducted to determine whether actual emergence of new technologies or substantial changes in them or their use has been correlated with such publication events or trends (Mowatt 1997).
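
As a rough sketch of how a publication "spike" might be flagged, the following compares each year's count with the mean and standard deviation of a trailing window. The window length, the cutoff, and the annual counts are all assumptions made for illustration. As the paragraph above cautions, a flag of this kind is only a prompt for closer review, not validated evidence that a technology has emerged or changed.

```python
from statistics import mean, stdev


def find_spikes(counts_by_year, window=5, z_threshold=3.0):
    """Return years whose count exceeds the trailing-window mean by z_threshold SDs."""
    years = sorted(counts_by_year)
    spikes = []
    for i in range(window, len(years)):
        baseline = [counts_by_year[y] for y in years[i - window:i]]
        mu, sigma = mean(baseline), stdev(baseline)
        current = counts_by_year[years[i]]
        if sigma > 0 and (current - mu) / sigma >= z_threshold:
            spikes.append(years[i])
    return spikes


# Hypothetical annual citation counts for one topic.
counts = {1990: 10, 1991: 12, 1992: 11, 1993: 13, 1994: 12,
          1995: 14, 1996: 40, 1997: 45}

spike_years = find_spikes(counts)
print("Years flagged for review:", spike_years)
```

Because the baseline is a trailing window, the sketch flags the onset of a surge rather than every year of sustained high volume.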

Not all changes require a reassessment, nor must every reassessment entail a full HTA. A reassessment may require updating only certain aspects of an original report. In some instances, current clinical practices or policies may be recognized as being optimal relative to available evidence, so that a new assessment would have little potential for impact; or the set of clinical alternatives and questions may have evolved so much since the original assessment that it would not be relevant to update it.

In some instances, an HTA program may recognize that it should withdraw an existing assessment because to maintain it could be misleading to users and perhaps even have adverse health consequences. This may arise, for example, when an important flaw is identified in a pivotal study in the evidence base underlying the assessment, when new research findings appear to refute or contradict the original research base, or when the assumptions used in the assessment are determined to be flawed. The determination to maintain or withdraw the existing assessment while a reassessment is conducted, to withdraw the existing assessment and not conduct a reassessment, or to take other actions, depends on the risks and benefits of these alternative actions for patient health, and any relevant legal implications for the assessment program or users of its assessment reports.

Once an HTA program determines that a report topic is a candidate for being updated, the program should determine the need to undertake a reassessment in light of its other priorities. Assessment programs may consider that candidates for reassessment should be entered into the topic priority-setting process, subject to the same or similar criteria for selecting HTA topics.


One of the great challenges in HTA is to assemble the evidence (the data, literature and other information) that is relevant to a particular assessment. For very new technologies, this information may be sparse and difficult to find; for many technologies, it can be profuse, scattered and of widely varying quality. Literature searching and related evidence retrieval are integral to successful HTA, and the time and resources required for these activities should be carefully considered in planning any HTA (Auston 1994; Goodman 1993).

Types of Sources

Available information sources cover different, though often overlapping, sectors of health care information. Although some are devoted to health care topics, others cover the sciences more broadly. Multiple sources should be searched to increase the likelihood of retrieving relevant reports. The variety of types of sources that may be useful for HTA include:

  • Computer databases of published literature
  • Computer databases of clinical and administrative data
  • Printed indexes and directories
  • Government reports and monographs
  • Policy and research institute reports
  • Professional association reports and guidelines
  • Market research reports
  • Company reports and press releases
  • Reference lists in available studies and reviews
  • Special inventories/registers of reports
  • Health newsletters and newspapers
  • Colleagues and investigators

Of course, the Internet is an extraordinarily broad and readily accessible medium that provides access to many of these information sources.  

There are hundreds of publicly available computer databases for health care and biomedical literature. Among these are various general types. For example, bibliographic databases have indexed citations for journal articles and other publications. Factual databases provide information in the form of guidelines for diagnosis and treatment, patient indications and contraindications, and other authoritative information. Referral databases provide information about organizations, services and other information sources.

The National Information Center on Health Services Research & Health Care Technology (NICHSR) of the US National Library of Medicine (NLM) provides an extensive, organized set of the many, evolving databases, publications, outreach and training, and other information resources for HTA. One online source, Etext on Health Technology Assessment (HTA) Information Resources, is a comprehensive textbook on sources of HTA information and searching approaches compiled by information specialists and researchers from around the world (National Library of Medicine 2003). Various other useful compendia of HTA information resources have been prepared (Busse 2002; Glanville 2003; Chan 2003). Some of the main bibliographic and factual databases useful in HTA are listed in Box 39.

The most widely used of these resources for HTA are the large bibliographic databases, particularly MEDLINE, produced by NLM, and EMBASE, produced by Elsevier. MEDLINE can be accessed at the NLM website using PubMed, which also includes new in-process citations (with basic citation information and abstracts before being indexed with MeSH terms and added to MEDLINE), citations from various life science journals, and certain other entries. In addition, there are many specialized or more focused databases in such areas as AIDS, bioethics, cancer treatment, pharmaceutical research and development, ongoing clinical trials (e.g., of NLM), and practice guidelines (e.g., National Guideline Clearinghouse of AHRQ).

The Cochrane Collaboration is an international organization that prepares, maintains and disseminates systematic reviews of RCTs (and other evidence when appropriate) of treatments for many clinical conditions. More than 1,500 systematic reviews have been produced by nearly 50 Cochrane review groups in such areas as acute respiratory infections, breast cancer, diabetes, hypertension, infectious diseases, and pregnancy and childbirth. The Cochrane Collaboration produces the Cochrane Library, which includes databases and registers produced by the Cochrane Collaboration as well as some produced by other organizations. The Database of Abstracts of Reviews of Effectiveness (DARE) and the NHS Economic Evaluation Database are produced by the NHS Centre for Reviews and Dissemination (NHSCRD). The HTA Database is produced by the International Network of Agencies for Health Technology Assessment (INAHTA), in collaboration with the NHSCRD.

The selection of sources for literature searches should depend on the purpose of the HTA inquiry and pertinent time and resource constraints. Most searches are likely to involve MEDLINE or another large database of biomedical literature (Suarez-Almazor 2000; Topfer 1999). However, the selection of other databases may differ by purpose, e.g., horizon scanning, ascertaining regulatory or payment status of technologies, comprehensive systematic review, or identifying literature in particular clinical areas.

Gray Literature

Much valuable information is available beyond the traditional published sources. This "gray" or "fugitive" literature is found in industry and government monographs, regulatory documents, professional association reports and guidelines, market research reports, policy and research institute studies, spot publications of special panels and commissions, conference proceedings, and other sources. Many of these can be found via the Internet. Although the gray literature can be timely and cover aspects of technologies that are not addressed in mainstream sources, it is usually not subject to peer review, and must be scrutinized accordingly.  

Box 39
Selected Bibliographic and Factual Databases for HTA

Some Core Sources

  • MEDLINE: citations for biomedical journal articles
  • EMBASE: citations for biomedical journal articles (Elsevier)
  • Cochrane Database of Systematic Reviews: systematic reviews of controlled trials on hundreds of clinical topics
  • Cochrane Controlled Trials Register: bibliography of controlled trials including sources outside peer-reviewed journal literature
  • Database of Abstracts of Reviews of Effectiveness (DARE): structured abstracts of systematic reviews from around the world, critically appraised by NHS Centre for Reviews and Dissemination
  • NHS Economic Evaluation Database: abstracts and other information about published economic evaluations of health care interventions
  • Health Technology Assessment Database: records of ongoing projects of members of INAHTA and completed HTAs by INAHTA members and other organizations
  • National Guideline Clearinghouse: evidence-based clinical practice guidelines (AHRQ)

Additional Sources

  • Other NLM/NIH sources:
    • current information about current clinical research studies in health services research and behavioral and social sciences
    • DIRLINE: directory of organizations
    • HSRProj: ongoing health services research projects
    • HSRR (Health Services/Sciences Research Resources): research datasets and instruments/indices.
    • HSTAT: full text of US clinical practice guidelines, consensus development reports, technology assessment reports, etc.
    • PDQ: cancer treatment, supportive care, screening, prevention, clinical trials
    • Other specialized databases such as AIDSLINE, Bioethics, and HealthSTAR have been incorporated into MEDLINE, accessed, e.g., via PubMed
  • ACP Journal Club: selected studies and systematic reviews for immediate attention of clinicians, with "value added" abstracts and commentary
  • AltHealthWatch: information resources on alternative medicine
  • Bandolier: journal of evidence summaries
  • Best Evidence (ACP Journal Club plus Evidence Based Medicine)
  • BIOSIS Previews: citations of life sciences literature (BIOSIS)
  • CEA Registry: database of standardized cost-utility analyses (Harvard School of Public Health)
  • CINAHL: citations for nursing and allied health literature (Cinahl Information Systems)
  • CDC Wonder: gateway to reports and data of the US Centers for Disease Control and Prevention (CDC)
  • Cochrane Methodology Register: bibliography of articles and books on the science of research synthesis
  • Cochrane Database of Methodology Reviews: full text of systematic reviews of empirical methodological studies
  • HDA Evidence Base: summaries of systematic reviews of effectiveness, literature reviews, meta-analyses, expert group reports, and other review-level information (NHS Health Development Agency, UK)
  • MANTIS: bibliographic database on manual, alternative, and natural therapies
  •  Netting the Evidence: (ScHARR, University of Sheffield, UK)
  •  PsycINFO: citations of psychological literature (American Psychological Association)
  •  SciSearch: citations for scientific journal articles (Institute for Scientific Information)

Publication Bias

Various forms of bias can affect the validity of HTA. One reason for careful planning and conduct of search strategies for HTA is to minimize, or at least recognize, the effects of publication bias. Studies of the composition of the biomedical research literature have found imbalances in the publication of legitimate studies (Chalmers 1990). For instance, positive studies (that find statistically significant treatment effects) are more likely than negative studies (that find no treatment effects) to be published in peer-reviewed journals (Dickersin 1993; Dickersin 1997). A study sponsored by a health product company or other group with an interest in the results may be less likely to be submitted for publication if the findings are not favorable to the interests of that group. RCTs conducted for market approval (e.g., by the US FDA) often are not published (MacLean 2003). Some research indicates that, among published studies of health technologies, smaller studies tend to report positive results more frequently (Agema 2002). Positive studies are more likely to be published in English-language journals, be reported in multiple publications, and be cited in other articles (Easterbrook 1991; Gøtzsche 1989). These multiple appearances and citations increase the likelihood of being identified in literature searches and included in meta-analyses and other systematic reviews, which may introduce bias into the results of these syntheses as well (Sterne 2001). The prevalence of unpublished studies may vary by specialty; for example, oncology appears to have a high prevalence of unpublished studies.

One detailed analysis of the characteristics of clinical trials used in systematic reviews indicated that, compared to other clinical areas, trials in the fields of psychiatry, rheumatology, and orthopedics tend more often to be published in non-English languages and appear in sources not indexed in MEDLINE (Egger 2003). Time lag bias occurs when the time from completion of a clinical trial to its publication is affected by the direction (positive vs. negative findings) and strength (statistical significance) of the trial results (Ioannidis 1998).

Certainly, bias in selection of studies used in HTA may arise to the extent that the literature search does not include studies that appear in languages other than English (language bias), are not indexed in MEDLINE or other major bibliographic databases, are unpublished, or are of lesser methodological quality. While the validity of an HTA is likely linked to the effort to include an unbiased sample of relevant studies, the size and direction of this relationship vary. There is a growing literature on the extent to which more or less restrictive inclusion criteria for meta-analyses affect their results. For example, some research indicates that systematic reviews limited to the English-language literature that is accessible via the major bibliographic databases produce results similar or identical to those of less restricted reviews (Egger 2003). Lowering the standard of methodological quality for inclusion of published studies in an HTA may bias the findings if these studies tend to report positive findings more often than higher-quality studies.
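
The way selective publication can distort a synthesis can be shown with a small sketch. It pools a set of hypothetical study results with a fixed-effect (inverse-variance) estimate, then pools only those studies that reached statistical significance, standing in for the "published" subset. The effect sizes and standard errors are invented for illustration, and the fixed-effect model is one simple choice among several.

```python
def pooled_effect(studies):
    """Fixed-effect (inverse-variance) pooled estimate from (effect, std_error) pairs."""
    weights = [1 / se ** 2 for _, se in studies]
    weighted_sum = sum(w * effect for w, (effect, _) in zip(weights, studies))
    return weighted_sum / sum(weights)


def is_significant(effect, se, z_crit=1.96):
    """Two-sided test of the effect against zero at roughly the 5% level."""
    return abs(effect / se) >= z_crit


# Hypothetical (effect estimate, standard error) for six trials of one intervention.
all_studies = [
    (0.50, 0.20),
    (0.10, 0.15),
    (0.45, 0.22),
    (-0.05, 0.18),
    (0.30, 0.10),
    (0.05, 0.25),
]

# Assume, for illustration, that only significant studies are retrievable.
published = [(e, se) for e, se in all_studies if is_significant(e, se)]

estimate_all = pooled_effect(all_studies)
estimate_published = pooled_effect(published)

print(f"Pooled effect, all {len(all_studies)} studies: {estimate_all:.3f}")
print(f"Pooled effect, {len(published)} significant studies only: {estimate_published:.3f}")
```

In this constructed example the estimate based only on significant studies is larger than the estimate based on all studies, which is the direction of distortion that publication bias is expected to produce.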

In planning a literature search, assessors should weigh the anticipated quality of a search against time and resource constraints. Efforts to recognize and minimize bias may be further subject to such factors as the availability of studies by language and for particular clinical areas, and their accessibility via bibliographic databases.

Help for Searchers

Given the great number of databases and the variety in their scope, means of access, controlled vocabularies and search commands, it is advisable to consult health information specialists. These experts can be especially helpful when planning which databases to search, inclusion and exclusion criteria, and other aspects of literature searches. An expanding network of HTA information specialists who work with HTA agencies and other evidence-based medicine organizations around the world has formed the HTAi Information Resources Group, which is extending the capabilities, expertise, and collaboration in the field. Improved indexing, text word searching, user-friendly interfaces, more powerful personal computers and other advances in medical informatics are helping non-expert searchers to retrieve valuable information more effectively and efficiently. Indeed, the enhanced ability of all types of assessors to probe these databases provides a more immediate, hands-on understanding of the scope and quality of literature on any given topic.

During the last decade, the NLM has undertaken to improve its MeSH (Medical Subject Headings) controlled vocabulary (used to index and search literature in MEDLINE and other NLM databases) in the related fields of HTA and health services research. In cooperation with the Cochrane Collaboration and others, NLM has improved the indexing of citations in MEDLINE and other databases to improve identification of RCTs (Dickersin 1994). Most bibliographic and factual databases have user-friendly tutorials, search engines, and other searching tools that are increasingly standard and familiar to expert and non-expert searchers alike. There is a growing number of resources for supporting searching strategies for HTA (Goodman 1993; Sackett 1997). A new resource from the NLM NICHSR, Etext on Health Technology Assessment (HTA) Information Resources, provides extensive guidance and resources for searching in HTA (National Library of Medicine 2003). Particularly instructive and useful for clinicians is the series of articles published in the Journal of the American Medical Association: Users' Guides to the Medical Literature, from the Evidence-Based Medicine Working Group (Hunt 2000).
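
To show how MeSH headings and publication-type indexing can be combined in a search, the sketch below assembles a PubMed query string and the corresponding request URL for the NCBI E-utilities ESearch service. The helper names and the example topic are assumptions chosen for illustration. The code only builds the URL and does not contact the service, and an actual search strategy would normally be developed with an information specialist.

```python
from urllib.parse import urlencode

# Public ESearch endpoint of the NCBI E-utilities.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def tagged(term, field):
    """Quote a term and attach a PubMed search field tag."""
    return f'"{term}"[{field}]'


def build_query(mesh_terms, publication_type=None):
    """AND together MeSH headings and, optionally, a publication type."""
    parts = [tagged(t, "MeSH Terms") for t in mesh_terms]
    if publication_type:
        parts.append(tagged(publication_type, "Publication Type"))
    return " AND ".join(parts)


def esearch_url(query, retmax=20):
    """Build (but do not send) an ESearch request against the PubMed database."""
    params = {"db": "pubmed", "term": query, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Illustrative topic: randomized trials of screening mammography.
query = build_query(["Mammography", "Breast Neoplasms"],
                    "Randomized Controlled Trial")
url = esearch_url(query)

print(query)
print(url)
```

Restricting by publication type relies on the indexing improvements described above; text-word searching would be needed to catch citations that are not yet indexed.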

The search for pertinent existing evidence is normally one of the first major tasks of an assessment, and should be planned accordingly. Costs associated with evidence searches can be significant, coming in the form of staff time and acquisition of literature, data tapes, and other documentation. Although access to MEDLINE (e.g., via PubMed) and other public-source databases is generally free or inexpensive, using some specialized scientific and business databases can be more costly. Database vendors offer a variety of packages of databases and pricing algorithms for these. HTA programs of such organizations as ECRI, the Blue Cross and Blue Shield Association, and Hayes sell their reports on a subscription basis. Some market research monographs and other reports oriented for health product companies, investors and other business interests are priced in the thousands of dollars.


To the analysts and other experts who have participated in an HTA, the importance of its findings and recommendations may be self-evident. Dissemination of these findings and recommendations, whether for internal use in the same organization or into the national or international health information mainstream, is often treated as an administrative afterthought.

Worthy HTA messages get lost because of misidentified and misunderstood audiences, poor packaging, wrong transmission media, bad timing, and other factors. Although there is some convergence on the format and content of information to be included in an HTA report, much research is needed regarding how to optimize the dissemination of HTA findings and recommendations (Goldberg 1994; Mittman and Siu 1992; Mittman and Tonesk 1992; Busse 2002).

Competing for Attention

Dissemination efforts must compete with the burgeoning flow of health-related information being transmitted across diverse channels using increasingly sophisticated means. Advanced communications technologies provide alternative means to transmit more data where and when it can influence decision makers. Marketing, long practiced effectively by health care product companies, offers an evolving, continually researched variety of techniques that are being adapted throughout the health care sector. As the ground shifts in health care organization, delivery and financing, the cast of decision makers constituting the potential users of HTA changes.  

There is considerable current controversy regarding various policies and practices of disseminating information about health technologies, particularly by pharmaceutical and other health technology companies. One area is the use of direct-to-consumer advertising by these companies, including whether this is to be permitted at all and, if so, what requirements should pertain to the content and format of the message. In particular, while there is strong evidence that these messages increase awareness of prescription drugs, there is far less evidence that they are effective in educating patients about medications for their conditions (Lyles 2002). A second area of controversy concerns whether health technology companies can distribute published and unpublished reports of clinical trials of their products for indications that have not been cleared for marketing by the appropriate authority, e.g., by the US FDA (Stryer 1996). A third area of controversy concerns the conditions under which pharmaceutical and other health technology companies can make claims in their marketing information about the cost-effectiveness of their products, what the rigor of supporting evidence should be, and which agencies should have regulatory oversight for such economic claims (Neumann 2000).

Dissemination Dimensions

Approaches for disseminating reports of HTAs can be described along three dimensions: target groups (intended audiences), media, and implementation techniques or strategies, as shown in Box 40.

The results of the same HTA may be packaged for dissemination in different formats, e.g., for patients, clinicians, payers, and researchers or policy analysts. Reaching the same decisionmaker may require repeated messages and/or multiple media. The style in which an assessment report is written (e.g., an academic, scholarly tone versus a practical, concrete tone) may affect the receptiveness of researchers, practitioners and others (Kahan 1988).

Box 40
Approaches for HTA Report Dissemination

Target groups

  • Clinicians (individuals, specialty/professional organizations)
  • Patients/consumers (individuals, organizations)
  • Provider organizations (hospitals, clinics, managed care organizations)
  • Third party payers (government, private sector)
  • Quality assurance and utilization review organizations
  • Government policymakers (international, national, state, local)
  • Biomedical researchers
  • Health care product companies
  • News professionals (popular and scientific/professional journalists and editors)
  • Educational institutions (schools, continuing professional education programs)

Media

  •  Printed: direct mail, newspapers and popular journals, scientific/professional journals and newsletters, posters, pocket cards
  •  Electronic: internet, television, radio, video disks, computer databases (online and disk)
  •  Word of mouth: informal consultation, formal lectures and presentations, focus groups

Implementation techniques or strategies

  •  Patient-oriented: mass media campaigns, community based campaigns, interaction with clinicians (including shared decision procedures, interactive video disk), modify insurance coverage (more or less generous benefits, change copayments)
  •  Clinician-oriented: conferences and workshops; continuing professional education; professional curriculum development; opinion leaders; one-on-one educational visits ("academic detailing"); coverage/reimbursement policy; precertification; mandatory second opinion; drug formulary restrictions; feedback (e.g., on laboratory test ordering relative to criteria/guidelines); reminder systems (e.g., as part of computer-based patient record systems); medical audit/peer review; criteria for board certification/recertification, state licensure, Medicare PRO action, specialty designation, professional/specialty society membership; public availability of performance data (e.g., adjusted mortality rates for certain procedures); defense against sanctions and malpractice action
  •  Institution-oriented: accreditation, standards (e.g., hospital infection control, clinical laboratories), benchmarking, public availability of performance data

Dissemination Plan

Dissemination should be planned at the outset of an assessment along with other assessment phases or activities. The costs, time and other resources needed for dissemination should be budgeted accordingly. This does not mean that dissemination plans should be rigid; the nature of the findings and recommendations themselves may affect the choice of target groups and the types of messages to be delivered. Dissemination should be designed to influence behavior of decision makers. This is not always straightforward, as research findings concerning what works for HTA dissemination strategies do not point to any universally successful approaches.

Mediating Access

There are many approaches to controlling or enhancing access to assessment reports. As noted above, some assessment programs provide their assessments only to paid subscribers or member organizations, or charge fees intended to help recoup the cost of the assessment or provide a profit. While some assessments are public documents made available at no cost via the internet or in public libraries, others are held as proprietary (e.g., company assessments of new products). Access to assessment literature is also mediated by the capacity of bibliographic organizations (e.g., the NLM and commercial database vendors) to index and abstract the literature, and the availability of such information via online databases and other information services. The wording used by assessment report authors for titles and abstracts can influence the indexing that serves as a key to accessing these reports.


The impacts of HTAs, from market research reports to RCT reports to expert panel statements, are variable and inconsistently understood. Whereas some HTA reports are translated directly into policies with clear and quantifiable impacts, the findings of some "definitive" RCTs and authoritative, well-documented assessment reports go unheeded or are not readily adopted into general practice (Banta 1993; Ferguson, Dubinsky 1993; Henshall 2002; Institute of Medicine 1985).

As is the case for the technologies that are the subjects of HTA, the reports of HTAs can have intended, direct impacts as well as unintended, indirect ones. Some of the ways in which an HTA report can make an impact (Banta 1993) are:

  • Affect corporate investment decisions
  • Modify R&D priorities/spending levels
  • Change regulatory policy
  • Modify marketing of a technology
  • Change third-party payment policy
  • Affect acquisition or adoption of a new technology
  • Change the rate of use of a technology
  • Change clinician behavior
  • Change patient behavior
  • Change the organization or delivery of care
  • Reallocate national or regional health care resources

Attributing Impact to HTA Reports

The impact of an HTA depends upon the target groups' legal, contractual, or administrative obligation to comply with it (Anderson 1993; Ferguson, Dubinsky 1993; Gold 1993). FDA market approvals of new drugs and devices are translated directly into binding policy. Most of the HTAs conducted by AHRQ are requested by CMS for use in the Medicare program, although CMS is not obligated to comply with AHRQ findings. The impacts of NIH consensus development conference statements, which are not statements of government policy, are inconsistent and difficult to measure. The ability of NIH statements to change behavior seems to depend upon a variety of factors intrinsic to particular topics, the consensus development process and a multitude of contextual factors (Ferguson 1993; Ferguson 2001).

The task of measuring the impact of HTA can range from elementary to infeasible. Even if an intended change does occur, it may be difficult or impossible to attribute this change to the HTA. A national-level assessment that recommends increased use of a particular intervention for a given clinical problem may be followed by a documented change in behavior consistent with that recommendation. However, the recommendation may be made at a time when the desired behavior change is already underway, when third-party payment policy is shifting in favor of the technology, during a strong marketing effort by industry, or close to the time of announcement of the results of a convincing clinical trial. Given widespread and nearly instant communications in health care, it may be difficult to control for factors other than a particular HTA report that might influence behavior change.

As is the case for attributing changes in patient outcomes to a technological intervention, the ability to demonstrate that the results of an HTA have an impact depends upon the conditions under which the assessment results were made known and the methodological approach used to determine the impact. Evaluations of the impact of an assessment often are unavoidably observational in nature; however, under some circumstances, quasi-experimental or experimental evaluations are used (Goldberg 1994). To the extent that impact evaluations are prospective, involve pre- and post-dissemination data collection, and involve directed dissemination to clearly identified groups with well-matched controls (or at least retrospective adjustment for reported exposure to dissemination), they are more likely to detect a causal connection between an HTA report and behavior change. Even so, generalizing from one experience to others may be impractical, as it is difficult to describe and replicate the conditions of a particular HTA report dissemination.
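
One simple way to use pre- and post-dissemination data from an exposed group and a matched control group is a difference-in-differences calculation, sketched below. The text does not name this method; it is offered here as one illustrative option, and the rates are invented. The calculation assumes that, without the report, both groups would have followed the same trend.

```python
def difference_in_differences(exposed_pre, exposed_post, control_pre, control_post):
    """Change in the exposed group minus the change in the control group."""
    return (exposed_post - exposed_pre) - (control_post - control_pre)


# Hypothetical share of eligible patients receiving the recommended intervention.
exposed_pre, exposed_post = 0.30, 0.45   # clinicians sent the HTA report
control_pre, control_post = 0.28, 0.36   # matched clinicians not sent the report

naive_change = exposed_post - exposed_pre
did_estimate = difference_in_differences(
    exposed_pre, exposed_post, control_pre, control_post
)

print(f"Naive pre/post change in exposed group: {naive_change:.2f}")
print(f"Change attributable to dissemination (DiD): {did_estimate:.2f}")
```

Here part of the apparent change in the exposed group is also seen among controls, reflecting a background trend such as shifting payment policy or industry marketing, so the adjusted estimate is smaller than the naive one.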

Factors Mediating Impact

The factors that can affect the impact of HTA reports are many. Beyond the particular dissemination techniques used, characteristics of the target groups, the environment and the HTAs themselves can mediate the impact (Goldberg 1994; Mittman and Siu 1992; Mittman and Tonesk 1992). Examples are shown in Box 41. Knowledge about these factors can be used prospectively. As noted above, assessment programs should consider how to properly target and modify their dissemination strategies to achieve the desired impact given particular characteristics of organizations, clinicians, environments, etc. Systematic attempts to document the dissemination processes and impacts of HTA programs are infrequent (Banta 1993; Goodman 1988; Institute of Medicine 1985), though a few, notably the NIH Consensus Development Program (Ferguson 1993), have been studied in detail. Like other interventions in health care, HTA programs may be expected to demonstrate their own cost-effectiveness, i.e., that the health and/or economic benefits resulting from an HTA program outweigh the cost of the program itself.

Box 41
Examples of Factors That Can Affect Impact of HTA Reports

Target provider organization characteristics

  •  Hospitals: general versus specialized, size, teaching status, patient mix, for-profit vs. non-profit, distribution of payment sources (e.g., fee-for-service vs. capitation), ownership status, financial status, accreditation
  •  Physicians' offices: group practice vs. solo practice, hospital affiliation, teaching affiliation, board certification, distribution of payment sources

Target clinician characteristics

  •  Type of clinician: physician, nurse, dentist, etc.
  •  Specialty
  •  Training
  •  Professional activities/affiliations
  •  Institutional affiliations (e.g., community hospital, university hospital)
  •  Familiarity with and access to recent literature

Environmental characteristics

  •  Urban, suburban, rural
  •  Competitive environment
  •  Economic status
  •  Third-party payment status (e.g., percentage of patients in HMOs, private insurance, etc.)
  • State and local laws, regulations
  •  Malpractice activity

Characteristics of HTA findings/recommendations

  •  Type: research findings, practice guidelines, standards (e.g., equipment acquisition, use, maintenance), appropriateness criteria
  •  Format: printed, word-of-mouth, electronic, etc.
  •  Frequency of message
  •  Required level of compliance (ranging from mandatory to optional)
  •  Locus of decision: general practitioner/primary care physician only, physician specialist only, multiple clinicians, physician with patient input, patient only
  •  Perceived inappropriate rigidity (allowance for discretion for differing circumstances)
  •  Cost of relevant procedure/management of condition
  •  Payment issue(s) addressed: coverage status, payment level
  •  Reputation of sponsoring organization, analysts, expert panel
  •  Overall strength of evidentiary base (e.g., existence of "definitive" clinical trial)
  •  Credibility/rigor of assessment process
  •  Existence or potential for malpractice action
  •  Timeliness of dissemination, especially compared to degree of uncertainty, most recent research findings, or current levels/change rates of utilization of procedure
  •  Existence and nature of other HTA findings on same topic.

Sources: Goldberg 1994; Mittman 1992; others.


Locus of Assessment: "Make or Buy?"

The nature of an assessment problem will affect the determination of the most appropriate organization to conduct it. Certainly, a comprehensive HTA addressing multiple attributes of a technology can be very resource intensive, requiring considerable and diverse expertise, data sources, and other resources.

Some health care organizations, such as some ministries of health and national health services, major insurance companies, health plans, and integrated health systems, have their own internal HTA programs. For example, in a large hospital or health plan, this might include a core staff and a multidisciplinary HTA committee representing major clinical departments, nursing, pharmacy, allied health, and biomedical engineering. This committee might interact with other committees such as pharmacy & therapeutics ("P&T"), strategic planning, and capital planning committees (Kaden 2002; University HealthSystem Consortium 1996).

Other organizations rely on assessment reports acquired from organizations that have dedicated functions or otherwise specialize in HTA. For example, in the US, CMS requests HTAs from AHRQ to inform Medicare coverage decisions. Similarly, in support of its technology appraisals and clinical guidelines, the National Institute for Clinical Excellence (NICE) requests HTAs from the National Coordinating Centre for HTA (NCCHTA), which coordinates the NHS R&D Division HTA Programme.

Other vendors for HTAs in the US and around the world include, e.g., Blue Cross and Blue Shield Association Technology Evaluation Center, Cochrane Collaboration, ECRI, Hayes Inc., Institute for Clinical Systems Improvement, United BioSource Corporation (formerly MetaWorks Inc.), and University HealthSystem Consortium. Depending upon the producing HTA organization, these HTA reports may be available at no cost, for members only, on a subscription basis, or for a specific price per report.

Health care decision makers can "make or buy" HTAs. Determining the responsibility for sponsoring or conducting an assessment depends upon the nature of the problem, financial resources available, expertise of available personnel, time constraints, and other factors. For any assessment problem, an organization must determine the extent to which it will devote its resources to conducting the assessment itself or purchasing it from other sources. Some health care organizations commission selected components of an HTA, such as evidence retrieval and synthesis, and perform the other steps in-house.

One of the potential advantages of requesting or commissioning an outside group to conduct HTAs is to gain an independent, outside view where a requesting agency might have a perceived conflict of interest. Thus, a major health care payer might seek an HTA from an outside group to inform its coverage decision about a costly new technology in order to diminish perceptions of potential bias in a decision not to cover the technology.

Factors that influence the "make or buy" decision include the following (Goodman, Snider 1996).

  • Is an existing assessment available?  If an existing assessment is available, does it address the specific assessment problem of interest, including the technology or intervention, patient population, and impacts of interest?  Does it have a compatible perspective?  Is the assessment still current?  Is the methodology used sufficiently credible? Is the report worth its price?
  • If an existing assessment needs to be updated or is not available, do people in the organization have the time and expertise to perform the required data collection and analyses? If a synthesis of existing information is needed, does the organization have database searching capabilities and staff to review and interpret the literature?  If new data are needed, does the organization have the requisite resources and expertise?
  • What methodology will be used? If, for example, a consensus development approach is preferred, does that consensus need to incorporate and reflect the opinions of the organization's own clinicians? Will local clinicians accept the results and report recommendations if they do not participate in the assessment?

Quality of Care and HTA

The relationship between HTA and quality of care is often poorly understood. Although a thorough discussion of this subject is not possible here, the following are some definitions and fundamental relationships concerning these concepts.

Quality of care is a measure or indicator of the degree to which health care is expected to increase the likelihood of desired health outcomes and is consistent with standards of health care. HTA and quality assurance are distinct yet interdependent processes that contribute to quality of care.

HTA generates findings that add to our knowledge about the relationship between health care interventions and health care outcomes. This knowledge can be used to develop and revise a range of standards and guidelines for improving health care quality, including practice guidelines, manufacturing standards, clinical laboratory standards, adverse event reporting, architecture and facility design standards, and other criteria, practices, and policies regarding the performance of health care.

The purpose of quality assurance activities is to ensure that the best available knowledge concerning the use of health care to improve health outcomes is properly used. It involves the implementation of health care standards, including activities to correct, reduce variations in, or otherwise improve health care practices relative to these standards. Continuous quality improvement (CQI) and total quality management (TQM) (Gann 1994; Wakefield 1993) are among the contemporary systematic approaches to quality assurance that are being adapted for hospitals and other health care institutions. Such approaches include, for example, the identification of best practices and the use of benchmarking to develop improved clinical pathways or disease management for medical and surgical procedures, administrative operations, etc. (Kim 2003; Kwan 2002; Pilnick 2001). For example, CQI has been evaluated in a recent multicenter RCT as a means to improve the adoption of two process-of-care measures for CABG: preoperative β-blockade therapy and internal mammary artery grafting (Ferguson 2003). Notably, in this RCT, the intervention being tested was not those two health care interventions, but CQI.

Quality assurance involves a measurement and monitoring function (i.e., quality assessment). Quality assessment is, primarily, a means for determining how well health care is delivered in comparison with applicable standards or acceptable bounds of care. These standards or bounds may be grouped according to the structure of care (institutional, professional, and physical characteristics), the process of care (content or nature of the health care delivered), and the outcomes of care (health status and well-being of patients) (Donabedian 1988). Increasingly, quality assurance involves studies of effectiveness data, including health outcomes and the determinants of those outcomes from the perspectives of clinicians, patients, administrators, and policymakers (McDonald 2000). In detecting differences between how well health care is delivered and applicable standards, quality assessment can also call attention to the need for further HTA or other investigations. In recent years, there has been further development and overlap of the fields of HTA and quality assurance, along with outcomes research, clinical epidemiology, and evidence-based medicine.

In summary, HTA contributes knowledge used to set standards for health care, and quality assurance is used to determine the extent to which health care providers adhere to these standards (Lohr 1990; Lohr and Rettig 1988). Indeed, major reorganization of health care systems may be required to ensure that stronger evidence is generated systematically for setting standards of care, and that standards of care are broadly implemented (Institute of Medicine 2001).

Outcomes Research and HTA

In principle, outcomes research concerns any inquiry into the health benefits of using a technology for a particular problem under general or routine conditions. In practice, the term outcomes research has been used interchangeably with the term effectiveness research since the late 1980s to refer to a constellation of methods and characteristics that overlap considerably with HTA. It has received increased attention in the US, particularly in the form of research funded by the AHRQ (formerly the Agency for Health Care Policy and Research). The attention given to outcomes or effectiveness research by government and, increasingly, the private sector (Mendelson 1998) reflects greater demand for data on patient and provider experience with technologies beyond what can be learned from the limited number of carefully circumscribed efficacy trials, e.g., premarketing clinical trials for new drugs and devices (McDonald 2000).

Outcomes/effectiveness research has emphasized health problem-oriented assessments of care delivered in general or routine settings; interdisciplinary teams; a wide range of patient outcomes including mortality, morbidity, adverse events and HRQL measures; the use of nonexperimental data (e.g., from epidemiological studies and administrative data sets); variations in practice patterns and their relationship to patient outcomes; and patient roles in clinical decision-making. The scope of outcomes/effectiveness research has expanded in recent years to include collection of experimental data on effectiveness, e.g., from large, simple trials conducted in general practice settings.

Decentralization of HTA

Although technology assessment originated as a primarily centralized function conducted by federal government agencies or other national- or regional-level organizations, HTA has become a more decentralized activity conducted by a great variety of organizations in the public and private sectors that make technology-related policy decisions (Goodman 1998; Rettig 1997). As noted above, an HTA done from a particular perspective may not serve the technology-related policymaking needs of other perspectives. Even for the same technology or clinical problem, there can be widely different assessment needs of politicians, regulatory agencies, health technology companies, hospitals, payers, physicians, and others. These needs are heightened with increased economic responsibilities and pressures on these different parties.

The growth in decentralized HTA activity has arisen less from a reduction in the level of centralized activity than from expansion of HTA programs for particular decision-making needs. In the US, there remain multiple government centers with ongoing HTA responsibilities to fulfill particular purposes, e.g., drug and device regulation at the FDA, NIH consensus development conferences, Medicare coverage policies by the CMS, and the technology assessment program of AHRQ. There has been considerable expansion in activities elsewhere, particularly in the private sector, as well as greater reliance by centralized sources on HTA inputs from outside sources. Increasingly, large health care providers and major health care product companies are establishing units devoted to "technology assessment," "pharmacoeconomics," "clinical effectiveness," "health outcomes research," and related areas. More health plans (including various managed care organizations and insurance companies) have established formal programs to assess new procedures and other technologies in support of payment decisions. The number and magnitude of private firms and university centers involved in HTA are increasing. HTA committees (with various names) are now common among medical specialty and subspecialty societies. Hospital networks, managed care organizations, and other large health care providers in the private sector have HTA programs to support acquisition and management of pharmaceuticals (e.g., P&T committees), equipment, and other technologies, as well as other technology-related needs throughout their systems (Kaden 2002; University HealthSystem Consortium 1996).

Aside from the growth of HTA in the private sector, even HTA conducted by government agencies is drawing upon more decentralized resources. In the US, the FDA has long relied on advisory panels comprising outside experts to examine clinical trial findings and other evidence to provide recommendations regarding market approval of new drugs, biologicals, and medical devices. CMS has a large Medicare Evidence Development & Coverage Advisory Committee (MEDCAC), formerly known as the Medicare Coverage Advisory Committee (MCAC), arranged into various panels, that provides recommendations for national coverage policies on new technologies and other interventions, based on review of the clinical literature, consultations with experts, and other data. AHRQ's Evidence-based Practice Centers (EPC) program has established contracts with 13 EPCs, mostly academic health centers and other institutions, including three in Canada, which generate "evidence reports" and technology assessments in support of clinical practice guidelines, coverage policies, and other practices and policies. Indeed, some EPC reports are conducted at the request, via AHRQ, of the NIH Consensus Development Program, CMS, and other government agencies; other requests are made by organizations in the private sector, such as health professional organizations. In this manner, AHRQ provides a portal for decentralized HTA, via the 13 EPCs, on behalf of government and non-government organizations. AHRQ also administers the US Preventive Services Task Force, an independent panel of experts in primary care and prevention that systematically reviews evidence of effectiveness and develops recommendations for a broad range of clinical preventive services.

The Cochrane Collaboration, another highly decentralized, successful model, involves 50 workgroups of volunteer experts around the world, coordinated through about 14 centers based in 12 countries, who conduct systematic reviews of a wide variety of health care interventions.

Decentralization of HTA and related functions widens the expertise available to HTA, brings broader perspectives to the process, and diminishes or balances potential conflicts of interest. Together, these generally add to the credibility of HTA processes and findings, and lessen any charges that assessments reflect narrow or self-serving interests of particular agencies or organizations.

Tracking changes in the locus and magnitude of HTA is confounded by a broadening connotation of the term. Rather than referring only to the comprehensive inquiries involving broad societal impacts envisioned for the field in the 1960s, HTA is now used to refer to almost any evaluation or analysis pertaining to health care technology. Much of the expansion of HTA concerns meeting focused, immediate needs such as a coverage decision for a particular procedure, determination of the cost-effectiveness of a new device, or an equipment purchase decision. Another shift in locus concerns professional responsibility. Whereas technology-related decision-making in health care organizations was largely the responsibility of physicians, it is increasingly shared or redistributed among a wider spectrum of managers and other professionals.

Certain changes in the health care market are prompting greater balance between centralized and decentralized HTA. Hospital networks, large managed care systems and other large systems such as the Department of Veterans Affairs (VA) continually seek to build economies of scale and buying leverage for health care products, ranging from surgical gloves to hip joint implants. With HTA units that are centralized yet responsive to needs of individual facilities, these large organizations can consolidate their HTA efforts and support system-wide acquisition of drugs, equipment, and services.  

As health care providers and payers realize the resource requirements for conducting well-designed evaluations of health care technologies, they weigh the tradeoffs of conducting their own assessments versus subscribing to assessment report series from outside assessment groups. Clearly, assessment requirements vary widely depending on the type of technology involved. Acquisition of commodity products such as most types of syringes and surgical gloves is largely based on price, whereas acquisition of the latest drug-coated coronary artery stent requires a more considered evaluation of safety, effectiveness, cost, and other attributes. Nearly all hospitals and health care networks in the US rely on group purchasing organizations (GPOs) that use economies of scale to acquire most of their health care products. These GPOs, particularly the larger ones, have their own technology evaluation or clinical review committees that examine available evidence on technologies such as implantable cardiac defibrillators and MRI units, whose acquisition is a matter of factors other than price alone. In turn, many GPOs also subscribe to technology assessment report services (Lewin Group 2002).  

Barriers to HTA

Although the general trend in health care is toward wider and improved HTA, several countervailing forces to HTA remain. Foremost, particularly in the US and other wealthy countries, has been a "technological imperative" comprising an abiding fascination with technology, the expectation that new is better, and the inclination to use a technology that has potential for some benefit, however marginal or even poorly substantiated (Deyo 2002). Some argue that the increased potential of technology only raises the imperative for HTA (Hoffman 2002). Another countervailing factor is the sway of prestigious proponents or a "champion" of a technology in the absence of credible evidence. A third impediment is the inertia of medical practice, e.g., in the form of reluctance to change long-standing practice routines, conservative payment policies, and quickly outdated education. This is complemented by lack of opportunities for, or encouragement of, scientific inquiry and skepticism in clinical education.

Ever more effective marketing and promotions, including short courses sponsored by medical product companies to train physicians in using these products, can divert attention from key concerns of HTA. Another obstacle is the limited level of investment by government and industry sources in HTA and related evaluations of what works in health care. Although some assessment programs and certain HTA findings are nationally or internationally recognized, the resources allocated for HTA in the US are virtually lost in the rounding error of national health care expenditures. Finally, the impression persists in some quarters that the goal of HTA is to limit the innovation and diffusion of health care technology.

Political processes can circumvent or threaten evidence-based processes (Fletcher 1997). One of the higher-profile applications of HTA is in determining covered services for health programs that are provided or funded by governments as well as by the private sector. While most of these health programs have HTA processes that support benefits determinations, they are also subject to legislation (laws) in their respective countries, states, provinces, and other jurisdictions. Legislative bodies at these levels can mandate, or require, that health programs provide certain services. For example, in the US, the Congress has mandated that the Medicare program (for the elderly and disabled) provide certain services (e.g., screening procedures) that are not included in the benefit categories under the original Medicare statute. State legislatures have mandated that their Medicaid programs (for people with low incomes), as well as private sector health plans operating in their states, provide certain services. Recent examples of mandated services include autologous bone marrow transplant with high-dose chemotherapy (ABMT-HDC) for advanced breast cancer, bone densitometry screening for osteoporosis, screening mammography, prostate cancer screening, and treatment for temporomandibular joint disorder. Such mandates, including the ones noted here, may or may not be based upon the types of evidence-based methods used in HTA processes. As is the case for other industries, these mandates can be affected by political influence brought, e.g., by "lobbying" or "pressure groups" representing patient advocate organizations, physician groups, health product makers, and others (Deyo 1997; Sheingold 1998).

In some instances, legislative mandates arise through frustration with slowed or delayed HTA processes. A notable instance was the mandate by the US Congress for Medicare coverage of dual-energy x-ray absorptiometry (DEXA) for bone mineral density measurement, which had been subject to an assessment involving two federal agencies over a seven-year period (Lewin Group 2000). However, these mandates often circumvent evidence-based coverage policy by providing an alternative, political route to coverage of technologies. The apparently direct process of mandating coverage of a technology, rather than subjecting it to well-founded HTA, can mask more complex clinical consequences. In the 1990s, many health plans reluctantly agreed to cover ABMT-HDC in response to state legislative mandates brought about by intensive political pressure and the threat of litigation (legal action in courts). It was not until 1999, after tens of thousands of women were subjected to the procedure, that results of five well-conducted RCTs demonstrated that the procedure conferred no benefit over standard-dose treatment for breast cancer, and caused unnecessary suffering in some women (Berger 1999; Mello 2001; Sharf 2001).

Aside from barriers to conducting HTA, there are barriers to implementing its findings and recommendations, particularly by the decision makers and policymakers for whom HTA reports are intended. Among these are lack of access to HTA reports, complex and technical formats of HTA reports, questionable data quality, absence of real-world applications, and narrow focus (Henshall 2002).

HTA and Underused Technologies

When used properly, HTA can reduce or eliminate the use of technologies that are not safe and effective, or whose cost is too high relative to their benefits. As discussed above, HTA can also be used to remove technologies from the market that are harmful or ineffective. Less attention is given to the ability of HTA to identify technologies that are underused, and to help determine why they are underused (Asch 2000; McNeil 2001). Underuse is prevalent in preventive, acute, and chronic care (McGlynn 2003) and contributes to tens of thousands of deaths and billions of dollars of losses to the economy and unnecessary health care costs (National Committee for Quality Assurance 2003).

For example, there is overwhelming evidence that smoking cessation interventions, including nicotine replacement therapy, the antidepressant bupropion, and counseling, are safe, effective, and cost-effective (Anderson 2002; Foulds 2002; Jorenby 1999; Woolacott 2002). However, in Europe, North America, and elsewhere, these interventions are used far less than is indicated. Underuse is attributed to various reasons, including lack of insurance coverage; concerns about short-term costs without regard to cost-effectiveness in the short term (e.g., for pregnant women and infants) and the long term; lack of smoker awareness of effective interventions; insufficient demand by patients, physicians, and the tobacco-control community; and the influence of the tobacco industry on policymaking (Schauffler 2001).

Box 42 shows examples of health care technologies for which good evidence exists of effectiveness or cost-effectiveness, but that are used significantly less than is indicated, even where they are affordable. Although this list applies primarily to the US, many of these technologies are underused elsewhere in North America, Western Europe, and other of the wealthier countries. The reasons that worthy technologies are underused are diverse, and include the following.

  • Lack of awareness on the part of patients, physicians, and others
  • Inadequate information dissemination
  • Limited coverage and reimbursement
  • Concerns about short-term cost without regard for cost savings and cost-effectiveness in the short and long term
  • Inappropriate or unsubstantiated concerns about improper use (e.g., pain therapy)
  • Inconvenience and misperceptions on the part of clinicians or patients
  • Clinical inertia
  • Insufficient supply (e.g., organs for transplantation)
  • Disproportionate concerns about adverse effects (e.g., warfarin to reduce risk of stroke)
  • Concerns about patient compliance (e.g., polypharmacy for HIV/AIDS)
  • Fear of stigma (e.g., treatment of depression)
  • Professional conflicts and "turf battles" on the part of physician specialists, provider institutions, industry, and others

Box 42
Underused Health Care Technologies (US)

  •  ACE inhibitors for treatment of heart failure
  •  ACE inhibitors for prevention of renal deterioration in insulin-dependent diabetics
  •  Ambulation aids (canes, crutches, walkers)
  •  Antibiotics for gastrointestinal ulcers
  •  Beta blockers for survivors of acute myocardial infarction
  •  Cholesterol-lowering drugs for patients at risk of coronary artery disease
  •  Cochlear implants for severe-to-profound deafness
  •  Colorectal cancer screening
  •  Corticosteroid inhalants for treating asthma
  •  Corticosteroid therapy for fetuses at risk of preterm delivery
  •  Depression diagnosis and treatment
  •  Diabetic retinopathy screening
  •  Hepatitis B virus vaccination of infants
  •  Implantable cardioverter-defibrillators for survivors of cardiac arrest
  • Incontinence diagnosis and treatment
  •  Intraocular pressure screening for glaucoma
  •  Oral rehydration therapy for dehydrated children
  •  Organ transplantation
  •  Pain management
  •  Polypharmacy (with protease inhibitors) for HIV/AIDS
  •  Pneumococcal vaccine for high risk patients
  •  Prenatal care
  •  Smoking cessation interventions
  •  Thrombolytic therapy for acute myocardial infarction
  •  Thrombolytic therapy for ischemic stroke
  •  Warfarin to prevent strokes due to atrial fibrillation

Conflict of Interest

HTA should consider the potential for conflict of interest on multiple levels. One is on the part of investigators who conducted and reported on the clinical trials and other studies that comprise the body of evidence under review. A second is on the part of sponsors of the primary research, e.g., technology companies, who have varying degrees of control over what research is conducted, selection of intervention and control treatments, selection of endpoints and follow-up periods, and whether research results are submitted for publication. Another is on the part of the health technology assessors themselves, including analysts, panel members, or other experts involved in reviewing the evidence and making findings and recommendations.  

Interpreting the literature for an assessment should include consideration of the existence of potential conflicts of interest that may have affected the conduct of a study or presentation of results. For study investigators, conflicts of interest may arise from having a financial interest (e.g., through salary support, ongoing consultancy, owning stock, owning a related patent) in a health care company (or one of its competitors) that may be affected by the results of a study, or from being an innovator of a technology under study. A systematic review of research on financial conflicts of interest among biomedical researchers found that approximately one-fourth of investigators have industry affiliations, and two-thirds of academic institutions hold equity in start-up companies that sponsor research performed at the same institutions. Industry sponsorship of research also was associated with restrictions on publication and data sharing (Bekelman 2003). Clinical trials and cost-effectiveness analyses that are sponsored by industry yield positive results more often than studies that are funded or conducted by others (Chopra 2003; Friedberg 1999). One reason suggested for this discrepancy is that industry's publication restrictions tend to withhold studies with negative results. Another is that industry is more likely to sponsor studies (particularly RCTs) in which the results are likely to be positive, i.e., where there is an expectation that one intervention (e.g., a new drug) is superior to the alternative intervention. In the case of RCTs, this latter tendency would undermine the principle of "equipoise" for enrolling patients in an RCT (Djulbegovic 2000).

Peer-reviewed journals increasingly require disclosure of information pertaining to financial interests of investigators and the source of funding of studies (International Committee of Medical Journal Editors 1993; Kassirer 1993; Lo 2000). Some journals have particular requirements regarding protection against conflict of interest for economic analyses that have been subject to considerable controversy (Kassirer 1994; Steinberg 1995). Information about investigators, sponsorship of a study, or other factors that suggests the potential for conflict of interest should be considered when interpreting the evidence. Studies that are subject to potential conflicts of interest may have to be discounted or dropped from the body of evidence accordingly.

HTA programs should take active measures to protect against potential conflicts of interest among assessment teams and panelists (Fye 2003; Phillips 1994). A conflict of interest may be any financial or other interest that conflicts with one's service on an assessment group because it could impair that person's objectivity or could create an unfair advantage. Conflict of interest is not the same as bias among assessment teams and panelists, which may entail views or inclinations that are intellectually motivated or that would be expected to arise from having a given organizational or professional affiliation. HTA programs should take active measures to minimize or balance bias among assessment teams and panel members.

The following recommendations for managing conflict of interest in practice guidelines development (Choudhry 2002) may be relevant as well to panels involved in HTA and related evidence-based activities.

  • A formal process should exist to disclose potential conflict of interest before the guideline development begins.
  • All members of the guideline group should be involved in a discussion of conflicts of interest and how significant relationships will be managed.
  • Participants who have relationships with industry, government agencies, health care organizations or specialty societies need not necessarily be excluded, but the group must agree on a threshold for exclusion.
  • There must be complete disclosure to readers of the practice guidelines of financial and/or other relationships with industry, government agencies, health care organizations and specialty societies.
