Diabetes Spectrum
Volume 9, Number 2, 1996, Pages 86-93




Evaluation of Patient Education Programs: How to Do It and How to Use It


Mark Peyrot, PhD

Abstract

This paper distinguishes evaluation from assessment and defines the purpose of evaluation as determining what works and what does not work, how the program operates, and who it helps. It describes the four components of evaluation: needs assessment, process evaluation, outcome evaluation, and impact evaluation. The author examines what factors to measure and how they should be measured, and discusses how to analyze and use data to make evaluation as effective as possible.


The purpose of this paper is to provide readers with practical guidelines that can be used in evaluating their own programs. A systematic model is proposed, which can enable educators to target specific outcomes for evaluation. This model reflects the standards of the American Diabetes Association (ADA) Education Program Recognition Committee.1 However, this paper should not be regarded as a manual for how to design an ADA-approved evaluation system; the views presented here are my own.

Nature and Purpose of Evaluation
It is helpful to distinguish evaluation from assessment, which is more commonly known to diabetes educators. Assessment involves an analysis of individual patients to customize treatment to meet individuals’ needs.2 It typically takes place at the beginning of treatment and is used to formulate treatment plans on a patient-by-patient basis. Conversely, evaluation involves collection and analysis of data regarding groups of patients to describe and improve program functioning.

Evaluation identifies what works and what does not, and can be used to modify program organization and operation. Evaluation also can contribute to a generalized understanding of educational and behavioral change processes, which moves it beyond a specific focus on the program being evaluated to make a contribution to the larger field.3,4 Evaluation, therefore, is primarily oriented towards improving a particular program, but has a secondary benefit of helping others who are conducting similar kinds of programs or want to do so.

Evaluation research involves four major components: needs assessment, process evaluation, outcome evaluation, and impact evaluation.

Needs assessment is conducted to identify needs at the population and individual level. At the population level, needs assessment identifies the needs of the larger population that might be served and helps formulate the nature of the program itself. For example, if there is a large segment of the population that is currently not being served (e.g., elderly, rural) this indicates the need for a program to provide services to that group. At the individual level, needs assessment helps determine the needs of the types of patients who actually enter the program. Ultimately, the program should be designed to meet their needs rather than those of people not served by the program.

Process evaluation, sometimes known as formative evaluation, includes a description of program activities, again both at the individual level and at the level of patient aggregates. At the individual level, one needs to measure which patients have received what service, when they have received the service, how the service has been delivered to them, and so forth. In the aggregate, one needs to know what proportion of patients are receiving a given service. Process evaluation involves monitoring program functioning and can be used to modify the way the program functions.

Outcome evaluation is the assessment of the achievement of intermediate goals, and impact evaluation is the assessment of the achievement of ultimate goals. Intermediate outcomes are best seen as outcomes that are stepping stones to ultimate goals. Therefore, the logic of the program should be that intermediate goals lead with a high probability to ultimate goals, so that if an intermediate goal, such as behavior change, is reached, it should lead to an ultimate goal, such as improvement in glycemic control. Of course, goals may be arranged relative to one another in a multi-level hierarchy so that glycemic control itself can be viewed as an intermediate goal relative to the reduction of complications of diabetes. And indeed the reduction of complications can be viewed as an intermediate goal relative to the ultimate goal of perceived quality of life.

There is no formal criterion for distinguishing ultimate and intermediate goals; ultimate and intermediate goals are defined with regard to the individual program. For example, in a short-term program, glycemic control may be the only ultimate outcome that can be measured within the time period of the evaluation. In contrast, for a life-long care program, avoidance of complications might be an ultimate goal.

Program goals must be chosen carefully since goals that are too far removed from the program may never be achieved. And failure to choose relevant intermediate goals may mean that the accomplishment of those goals does not produce progress toward the ultimate goals of the program.

With these four components of evaluation research in hand, it is possible now to turn to the types of factors one would want to measure in each component of evaluation. Because the focus is on evaluating existing programs, this paper will not deal with the use of needs assessment.

What to Measure: Process Indicators
Process indicators tell us about the nature and functioning of the program. Process evaluation provides feedback about whether the program is being delivered according to plan and whether the plan itself is appropriate for the patients being served.

Program exposure. Program exposure is one of the key process indicators. This includes how many patients have been served and the services they have received. One should measure the content delivered, specifically what type of content has been delivered (e.g., skills taught during the education program) and how much (e.g., the number of different sessions provided to patients).

Measurement of the content delivered can be performed by the person delivering the content, by the recipient of the content, or by an external observer. In any case, accurate information on content delivery is required because often the assumption that all program content is being delivered is inaccurate. Often, individual patients get only a subset of the content. Some patients miss sessions where certain things are taught, and other patients miss other sessions. Maximum effectiveness requires that all patients get all sessions that are part of the program that has been designed for them.
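To make the point about incomplete exposure concrete, the sketch below tallies attendance against a planned curriculum, both per patient and in the aggregate. It is a minimal illustration under assumed data: the session names, patient IDs, and the `missing_sessions` and `session_coverage` helpers are hypothetical, not part of any recognition standard.

```python
# Hypothetical planned curriculum and attendance records (illustrative only).
CURRICULUM = ["basics", "monitoring", "medication", "diet", "exercise"]

attendance = {
    "P01": {"basics", "monitoring", "medication", "diet", "exercise"},
    "P02": {"basics", "diet", "exercise"},
    "P03": {"basics", "monitoring", "medication", "exercise"},
}

def missing_sessions(attended, curriculum=CURRICULUM):
    """Planned sessions a patient did not attend, in curriculum order."""
    return [s for s in curriculum if s not in attended]

def session_coverage(records, curriculum=CURRICULUM):
    """Proportion of patients who received each planned session."""
    n = len(records)
    return {s: sum(s in a for a in records.values()) / n for s in curriculum}

# Patients who received every planned session
complete = [pid for pid, a in attendance.items() if not missing_sessions(a)]

print("Full curriculum:", complete)
for pid, attended in attendance.items():
    print(pid, "missed:", missing_sessions(attended))
print(session_coverage(attendance))
```

In this made-up example only P01 received the full curriculum, and the coverage figures show which sessions (monitoring, medication, diet) are being missed by part of the group.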

The second aspect of delivering content is how well the content is delivered. It is possible to deliver the same content either effectively or ineffectively, and simply measuring whether the content has been delivered is not adequate. How well the content is delivered may be measured many ways, e.g., in terms of the types of methodologies by which the content is delivered (lecture format, question-answer, or hands-on behavioral methodology). A better way to judge how the content is being delivered and whether it has been delivered effectively is to observe the actual delivery of the program content.

How well the content is delivered can influence outcomes. An unpublished evaluation I conducted of a health education program found that the same content, when delivered by different providers, can have a very different impact on patients. This may, to some degree, be due to idiosyncratic presentation skills, but it also may be something providers can learn through training. For example, in the evaluation mentioned above, the staff members who were presenting program content ineffectively were identified and retrained, and their performance improved.

Patient involvement. Patients’ level of involvement in the program is another aspect of program functioning that should be assessed. One evaluation of a health education program found that the level of patient involvement was the single most important predictor of the learning process.5

Level of involvement may be measured in terms of a variety of factors, e.g., patients’ activity during sessions or effort on the course assignments they are asked to complete between sessions. Involvement can be measured by ratings of the person delivering the service, an independent observer, or the patients themselves. Even when the rating is done by an observer, this factor still can strongly predict how much patients learn.5 Involvement may be regarded as a patient characteristic, but ability to generate patient involvement may be one reason some providers are more effective than others, even though the content delivered is the same.

Patient perceptions. Another type of process measure involves patient perceptions. Perhaps the most commonly measured patient perception is satisfaction with the program. Evaluators should realize that satisfaction is a process measure and not an outcome measure. Satisfaction may be an outcome from a marketing perspective, but from an educational perspective, satisfaction does not necessarily indicate that learning has resulted. Therefore, while it is worthwhile to measure satisfaction, it should be regarded as an indicator of how the program process operates and not its effectiveness.

Patient perceptions also can be useful by providing feedback about needed changes in the program. For example, patients may feel that certain areas of the curriculum have not received adequate attention or that others have received excessive attention. When this is an individual idiosyncratic perception, it is difficult to use this information to modify the program. However, if there are trends in what patients want, then it is beneficial to identify them. For example, if patients generally do not need certain parts of the program but need more of other parts of the program, identifying this fact can help in modifying the program.
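Trends can be separated from idiosyncratic comments by counting how many patients raise the same topic. The sketch below assumes feedback has already been coded into topic labels; the topics, the `topic_trends` helper, and the 50% threshold are hypothetical choices for illustration.

```python
from collections import Counter

# Hypothetical end-of-program feedback, coded into topic labels: topics each
# patient felt needed "more" or "less" attention.
feedback = [
    {"more": ["insulin adjustment"], "less": ["diabetes causes"]},
    {"more": ["insulin adjustment", "sick days"], "less": []},
    {"more": ["meal planning"], "less": ["diabetes causes"]},
    {"more": ["insulin adjustment"], "less": ["diabetes causes"]},
]

def topic_trends(responses, direction, threshold=0.5):
    """Topics flagged in `direction` by at least `threshold` of respondents."""
    # set() so a patient who repeats a topic is counted only once
    counts = Counter(t for r in responses for t in set(r[direction]))
    n = len(responses)
    return sorted(t for t, c in counts.items() if c / n >= threshold)

print("Needs more:", topic_trends(feedback, "more"))
print("Needs less:", topic_trends(feedback, "less"))
```

Topics mentioned by a single patient (here, sick days and meal planning) fall below the threshold and are treated as individual preferences rather than program-wide trends.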

How to Measure: Patient Outcomes
Perhaps the most important aspect of how to measure outcomes is the nature of the research design. The optimal research design is a randomized clinical trial, with pre-intervention and post-intervention measures. However, this paper is based on the premise that most evaluations will not have a control group and that only patients who have experienced the program will participate in the evaluation. The issue of research design will be discussed further later in the paper.

There are several other considerations in measuring patient outcomes: 1) the type of sample upon which data is collected, 2) the types of measures for collecting data, and 3) the scheduling of data collection efforts.

Sample characteristics. The most important consideration regarding appropriate samples is to use an unbiased selection process. If only the best patients are chosen for the analysis, this not only leads to problems of validity and interpretation, but also actually works against the program. Patients with good glycemic control tend to change less as a result of the program than patients who start in worse control.6 On the other hand, excluding patients who are in good control also has a disadvantage because then you cannot determine whether those patients benefit as well or whether there is a need to provide an alternative program for them. So, evaluation studies should include a cross-section of patients treated by the program. The sample should be representative of all patients who participate in the program.

The second consideration in obtaining an appropriate sample is to make sure that the number of patients studied is sufficient to conduct statistical analysis. Statistical analysis is the preferred way of assessing program outcome data. Simply eyeballing the data to see if changes have occurred is not adequate for external evaluators who are judging the program.

Many studies find no statistically significant program effect because the samples are too small and the statistical power is not adequate. Fairly large changes may not be statistically significant because the sample is too small. Power analysis can be used to determine sample size7 based on anticipated program effects, but a simple criterion is that the minimum sample size for the most common statistics is 30. If subgroups are examined, each subgroup should contain 30 subjects. And if variances are substantial, larger samples are needed.
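For a pre/post design, a common normal-approximation formula gives the number of patients needed to detect a standardized mean change d with two-sided significance level alpha and a given power: n = ((z(1 - alpha/2) + z(power)) / d)^2, where z is the standard normal quantile function. This sketch is a shortcut under stated assumptions rather than a full power analysis: d is taken to be the expected mean change divided by the standard deviation of individual changes, and the normal approximation slightly underestimates what a t-test needs with small samples. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def paired_sample_size(effect_size, alpha=0.05, power=0.80):
    """Approximate patients needed for a two-sided paired pre/post comparison.

    effect_size: expected mean change divided by the SD of individual changes.
    Uses the normal approximation, which slightly underestimates the n a
    t-test requires when samples are small.
    """
    z = NormalDist().inv_cdf  # standard normal quantile function
    n = ((z(1 - alpha / 2) + z(power)) / effect_size) ** 2
    return math.ceil(n)

for d in (0.8, 0.5, 0.3):
    print(f"d = {d}: about {paired_sample_size(d)} patients")
```

With conventional settings (alpha = 0.05, power = 0.80), a moderate effect of d = 0.5 requires about 32 patients, roughly in line with the rule of thumb of 30; a small effect of d = 0.3 requires about 88.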

Types of measures. Another consideration when measuring patient outcomes is the types of measures to use. The message here is simple: researchers should use standardized measures with proven validity (measure what they are supposed to) and reliability (consistency). Some references for proven measures are provided in the section on what outcomes to measure and others can be found in published compilations.8

One of the major problems of (unpublished) evaluations is that the measures themselves are invalid and do not provide accurate information about program effectiveness. Unfortunately, this can result in a large expenditure of effort with relatively little benefit. It is, therefore, vital to make sure that the measures are valid and reliable.

One barrier to using existing questionnaires is that they may be too long. For example, if an evaluator wants to measure many different factors, each instrument may seem fine by itself, but in combination the set of instruments may be much too long. In this situation, it may be better to use a subset of items from an existing instrument than to try to create a new instrument, especially if the evaluator is not a researcher experienced in instrument development. While many would argue that shortening an instrument by selecting certain items is questionable, it seems highly preferable to inventing new, shorter instruments with unknown validity and reliability.

Another advantage of using standardized measures that have already been validated is that it provides for cross-site comparability. That is, by comparing your program to other programs that have used the same measure, you are able not only to look at how your program is functioning but also to find out if it is working better or not as well as other programs. This type of analysis will become increasingly important as evaluation research becomes more common and competition among programs increases.

Scheduling of data collection. The third aspect of how to measure outcomes involves the time scheduling of the measurements. It is absolutely crucial that data collection be longitudinal or prospective in nature, that is, that data is gathered at the beginning of the program, at the end of the program, and again at one or more follow-ups after the program.

Retrospective measurements (i.e., asking patients at some point after the program whether they have learned anything, or whether their behavior has changed) should be avoided. This type of measurement involves retrospective bias.9 Patients who are asked whether they have changed their behavior at all feel a pressure, known in the literature as a demand or expectation effect,10 to tell the researcher that changes have occurred.

The more appropriate approach is to obtain a measurement at the beginning of the program and obtain the same measurement again after the program. If change has occurred it is possible to ascertain this fact via a comparison of pre-program data with post-program or follow-up data using statistical analysis. Retrospective reports cannot be analyzed statistically if there is no comparison group. They can simply be reported, e.g., a certain percentage of people report they improved. Prospective measurements can be analyzed statistically using fairly simple statistical software to tell whether the results are statistically reliable.
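The pre/post comparison described above is typically a paired t-test on each patient's change score. A minimal standard-library sketch follows; the HbA1c values are hypothetical, and only six patients are used to keep the example short, far fewer than the sample sizes recommended earlier. The `paired_t` helper is illustrative.

```python
import math
from statistics import mean, stdev

def paired_t(pre, post):
    """Paired t statistic for post - pre changes; returns (mean_change, t, df)."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need matched pre/post values for at least 2 patients")
    diffs = [b - a for a, b in zip(pre, post)]      # each patient's change
    d_bar = mean(diffs)
    se = stdev(diffs) / math.sqrt(len(diffs))       # standard error of the mean change
    return d_bar, d_bar / se, len(diffs) - 1

# Hypothetical HbA1c (%) for the same six patients before and after the program
pre = [9.1, 8.4, 10.2, 7.9, 8.8, 9.5]
post = [8.3, 8.1, 9.0, 7.8, 8.2, 8.6]

change, t, df = paired_t(pre, post)
print(f"mean change = {change:.2f}, t = {t:.2f}, df = {df}")
```

Here the mean change is -0.65 percentage points with t of about -3.94 on 5 degrees of freedom, which exceeds the two-sided 5% critical value of 2.571 in absolute value. Statistical packages report the exact P value directly.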

The timing of pre-program measures is straightforward. Generally, they should be obtained immediately before the program. However, the question of when to obtain subsequent measures is more complex. The first post-program measurement should be obtained immediately after the program.

Factors that should be measured at this time are those thought to change during the program. Examples include: knowledge, diabetes self-efficacy, self-care skills, depression, well-being, and perhaps most importantly, intentions for future behavior. In a recent study, the single best predictor of behavior change subsequent to the program was participants’ intentions to make changes upon leaving the program.11,12 Thus, this is a key element in the measurement process and must be obtained at the end of the program.

The usefulness of end-program measures can be illustrated by considering some alternative scenarios. Let us assume at follow-up one finds that patients have not changed their behavior in the ways that the program had intended. There are two possible explanations. One is that the program was ineffective in getting patients to formulate intentions for behavior change: at the end of the program, patients simply felt no need to change their self-care behavior. Alternatively, it may be that at the end of the program patients’ intentions had changed, and yet they were not able to follow through on their behavior change intentions.

These two scenarios imply quite different things about the program and how it would need to be modified to achieve the desired impacts. If intentions are changed but the behavior is not, the program needs to examine the methods by which it prepares patients to implement behavior change intentions, e.g., more information about overcoming barriers. On the other hand, if the program has been unsuccessful in changing intentions, the conclusion would be that the program needs to put greater effort into that part of the process. Patients may understand how to change behavior, but they may not be convinced that it is necessary to do so. If so, greater effort should be devoted to changing intentions.
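The two scenarios can be told apart only if intentions are recorded at the end of the program and behavior at follow-up. The sketch below cross-tabulates the two for hypothetical patients; the field names and the `diagnose` helper are illustrative assumptions.

```python
from collections import Counter

# Hypothetical records: did the patient state an intention to change at the end
# of the program, and had the target behavior changed by follow-up?
records = [
    {"intended": True, "changed": True},
    {"intended": True, "changed": False},
    {"intended": True, "changed": False},
    {"intended": False, "changed": False},
    {"intended": False, "changed": False},
    {"intended": True, "changed": True},
]

def diagnose(rows):
    """Summarize where the intention -> behavior chain breaks down."""
    cells = Counter((r["intended"], r["changed"]) for r in rows)
    intended = cells[(True, True)] + cells[(True, False)]
    not_intended = cells[(False, True)] + cells[(False, False)]
    return {
        # Share of all patients who left without an intention to change
        "no_intention_rate": not_intended / len(rows),
        # Among those who intended to change, share who did not follow through
        "no_follow_through_rate": cells[(True, False)] / intended if intended else 0.0,
    }

print(diagnose(records))
```

A high `no_intention_rate` points toward the part of the program that forms intentions, while a high `no_follow_through_rate` points toward how the program prepares patients to implement them, for example by addressing barriers.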

Longitudinal research also requires consideration of the timing of follow-up assessments. One follow-up is not the ideal. In fact, there are at least two or three different times when it is desirable to obtain follow-up measures.

The first is a relatively short period after the program, perhaps 1–2 months after completion of the program. One of the purposes of this follow-up is to see whether patients are making an effort to implement the goals and intentions they formulated at the end of the program. Patients may have lost the initial momentum they had upon leaving the program and may not have actually made any attempt to implement changes. This can be an opportune time to remotivate patients and help them to translate classroom learning into real-world self-care.

The next follow-up should be some months later, to observe whether patients are continuing to follow through on their self-care intentions. A good time to measure this is 3–6 months after the initial program has been completed. If an initial follow-up is taken 1–2 months after the program, then it is appropriate to make the second follow-up somewhat later, that is, 6 months after the program.

It is desirable to conduct at least one more follow-up past the 6-month period to see if these changes have been sustained. One study found that at 6 months after the program, a number of improvements had occurred.6 But 12 months after the program, the pattern of outcomes had changed.13,14 For some measures (e.g., glycemic control), the change at 6 months was simply sustained over the next 6 months, indicating stability in the improvement. For other measures, the changes had begun to decay. This was particularly true for emotional factors like depression and anxiety, which, although substantially improved at 6 months, had by 12 months begun to relapse back to pre-program levels. For still other measures, the changes that had occurred at 6 months had continued in a positive direction at 12 months. This was the case for insulin adjustment: patients had learned the technique during the program, begun to engage in self-adjustment during the first 6 months, and increasingly implemented it on their own. These findings suggest that assessing the true impact of the program requires long-term follow-ups.
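The three patterns described here (sustained, decaying, and continued improvement) can be labeled mechanically from group means at baseline, 6 months, and 12 months. A minimal sketch; the `classify_trajectory` helper, the tolerance, and the example numbers are hypothetical, and in practice each change would also be tested statistically.

```python
def classify_trajectory(baseline, m6, m12, higher_is_better=True, tol=0.0):
    """Label the 6-to-12-month pattern of a group mean relative to baseline.

    tol is a hypothetical threshold below which a 6-to-12-month change is
    treated as no change.
    """
    sign = 1 if higher_is_better else -1
    gain6 = sign * (m6 - baseline)  # improvement achieved by 6 months
    later = sign * (m12 - m6)       # further change between 6 and 12 months
    if gain6 <= 0:
        return "no improvement at 6 months"
    if later > tol:
        return "continued improvement"
    if later < -tol:
        return "decaying"
    return "sustained"

# Hypothetical group means at baseline, 6 months, and 12 months
print(classify_trajectory(9.0, 8.2, 8.2, higher_is_better=False, tol=0.1))  # HbA1c (%): sustained
print(classify_trajectory(20, 12, 17, higher_is_better=False, tol=0.1))     # depression score: decaying
print(classify_trajectory(0.0, 2.0, 4.0, tol=0.1))                          # adjustments/week: continued improvement
```

The `higher_is_better` flag matters because improvement means a decrease for measures such as HbA1c or depression scores but an increase for measures such as insulin self-adjustment.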

In conducting follow-ups, it is important to remember the first aspect of how to measure. One must obtain a representative sample of those who participated in the program. This requires that the evaluation avoid attrition from the baseline cohort. If a large percentage of patients drop out of the evaluation, the results of follow-up data can be called into question. For example, patients who do not do well may drop out, making the program look better than it really is. If attrition occurs, an effort should be made to determine whether dropouts differ from those who remain.6 This information can be used in interpreting results for those who remain in the evaluation.
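A quick attrition check compares baseline values for patients who completed follow-up with those who dropped out. The sketch assumes hypothetical baseline HbA1c values and a simple completed/not-completed flag per patient; the `attrition_summary` helper is illustrative.

```python
from statistics import mean

# Hypothetical cohort: (patient ID, baseline HbA1c %, completed follow-up?)
cohort = [
    ("P01", 8.1, True), ("P02", 10.4, False), ("P03", 7.6, True),
    ("P04", 9.8, False), ("P05", 8.4, True), ("P06", 8.9, True),
]

def attrition_summary(rows):
    """Attrition rate and mean baseline value for completers vs dropouts."""
    completers = [a1c for _, a1c, followed in rows if followed]
    dropouts = [a1c for _, a1c, followed in rows if not followed]
    return {
        "attrition_rate": len(dropouts) / len(rows),
        "baseline_completers": mean(completers),
        "baseline_dropouts": mean(dropouts) if dropouts else None,
    }

print(attrition_summary(cohort))
```

In this made-up cohort a third of patients were lost, and they started in worse glycemic control than those who stayed, which is the situation in which follow-up results for completers alone could overstate program effects. With real data the baseline difference would itself be tested statistically.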

What to Measure: Patient Outcomes
The following review of patient outcome factors is based in part upon the ADA Education Program Recognition Standards1 for factors to be measured either as part of patient assessment or as part of program evaluation or patient outcome evaluation (see Table 1). I make no attempt to be exhaustive, and neither do the ADA recognition standards. However, the standards are fairly comprehensive, and comprehensiveness is a necessary feature of an evaluation for most education programs.

A select number of education programs may intend to teach a very narrow set of skills or address a narrow set of problems, and perhaps for those programs, a more comprehensive evaluation is not necessary or appropriate. But the majority of programs intend to have comprehensive effects, and evaluation should measure all the factors the program intends to affect, including medical, behavioral, and psychosocial factors.15


Table 1. Patient Outcome Measures

Medical Factors

Medical history*
Present health status*
- Glucose control*
- Complications
Health resource utilization*
- As indicator of need
- As indicator of prevention
Risk factors*
- Hypoglycemia unawareness
- Smoking

Behavioral Factors

Diabetes knowledge and skills*
- BG relationships and dynamics*
Health behavior and goals/intentions*
- BG monitoring*
- Medication*
- Diet*
- Exercise*
- Prevention/management of complications*
- Pregnancy management*

Psychosocial Factors*

Social support systems*
- Family*
- Peer
Health beliefs and attitudes*
Psychosocial adjustment*
- Quality of life
- Psychological well-being
Contextual factors
- Barriers to learning*
- Socioeconomic factors*
- Cultural influences*

* Measure required by ADA Recognition Program.1

Medical factors. Medical factors include a variety of subcategories, which are identified in the patient’s medical history. For example, the ADA recommends that glucose control be evaluated, e.g., using blood glucose readings from memory meters or measures of longer-term glucose control such as glycosylated hemoglobin and fructosamine.

Medical histories generally contain information regarding diabetic complications. However, it is difficult to use data regarding long-term complications (retinopathy, nephropathy, amputations) in evaluations because complications are largely irreversible through education. It is useful to evaluate these if one has a control group, as in the Diabetes Control and Complications Trial, where one can compare the progression of complications in the treated group to a control group. However, the evaluation of a program without a control group must be able to compare pre-program rates of complications with post-program rates (e.g., the year before the program versus the year after the program). This type of evaluation can only assess short-term complications (hypoglycemic events, hyperglycemic coma, foot infections) that can recur.
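For recurring short-term complications, the before/after comparison reduces to event rates per patient-year among patients observed in both periods. A minimal sketch with hypothetical counts; restricting to patients present in both years keeps the comparison paired, and the per-patient differences can then be tested statistically. The `paired_event_rates` helper is illustrative.

```python
# Hypothetical counts of a recurring short-term event (e.g., severe hypoglycemic
# episodes) per patient, 12 months before and 12 months after the program.
events_before = {"P01": 2, "P02": 0, "P03": 3, "P04": 1, "P05": 4}
events_after = {"P01": 1, "P02": 0, "P03": 1, "P04": 1}  # P05 lost to follow-up

def paired_event_rates(before, after, years=1.0):
    """Events per patient-year before and after, for patients observed in both periods."""
    ids = sorted(before.keys() & after.keys())
    if not ids:
        raise ValueError("no patients observed in both periods")
    person_years = len(ids) * years
    rate_before = sum(before[i] for i in ids) / person_years
    rate_after = sum(after[i] for i in ids) / person_years
    return rate_before, rate_after

print(paired_event_rates(events_before, events_after))  # (1.5, 0.75)
```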

Another set of medical factors that should be measured involve health resource utilization. However, these measures are difficult to interpret since success can be indicated either by an increase in use or by a decrease in use. When resource utilization is regarded as an indicator of need, such as emergency room visits for hypoglycemia or hyperglycemia, then a decrease in use can be regarded as a success. When health-care utilization is an indicator of preventive behavior (e.g., increased visits for eye screening, increased attendance at regular clinic appointments, increased visits to a podiatrist for foot examinations), then an increase in use may be regarded as a success.
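Because the desired direction of change depends on what a utilization measure indicates, it helps to record that direction explicitly for each measure before looking at the data. The measure names and the `utilization_success` helper below are hypothetical.

```python
# Hypothetical classification of utilization measures, following the distinction
# above: need indicators should fall, prevention indicators should rise.
INDICATOR_TYPE = {
    "er_visits_hypoglycemia": "need",
    "er_visits_hyperglycemia": "need",
    "eye_screening_visits": "prevention",
    "podiatry_visits": "prevention",
}

def utilization_success(measure, before, after):
    """True if a utilization count moved in the desired direction."""
    kind = INDICATOR_TYPE[measure]
    if kind == "need":
        return after < before
    if kind == "prevention":
        return after > before
    raise ValueError(f"unknown indicator type: {kind}")

print(utilization_success("er_visits_hypoglycemia", before=12, after=5))  # True
print(utilization_success("eye_screening_visits", before=20, after=35))   # True
print(utilization_success("podiatry_visits", before=10, after=8))         # False
```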

The third set of medical factors to be examined is a variety of risk factors that can influence the efficacy of the program. For example, hypoglycemia unawareness may be a contraindication for intensive insulin management and may lead to a smaller improvement in glycemic control. The program can also examine its ability to affect such risk factors, e.g., smoking, which is a major risk factor for progression of a variety of diabetic complications; obesity, which is itself a risk factor for complications and poor glycemic control; or cholesterol levels, again a risk factor for subsequent complications.

Even if complications themselves are not measured, it is possible to measure the impact on risk factors other than glycemic control that ultimately can affect complications. Some of these, such as smoking and high cholesterol, are not unique to diabetes, but may be of special importance to people with diabetes because of their increased vulnerability.

Behavioral factors. The first set of behavioral outcomes are diabetes knowledge and skills. Knowledge can be thought of as abstract information, while skills reflect the ability to apply knowledge in concrete situations. In assessing knowledge gain, it is essential to ask what knowledge is relevant.

Looking back at the distinction described earlier between intermediate and ultimate goals, knowledge improvement is relevant only to the degree that it leads to other kinds of changes, such as improvement in self-care behavior, glycemic control, and quality of life. Knowledge is not a protective factor in and of itself; thus, much knowledge about diabetes may be irrelevant to the accomplishment of ultimate goals.

For a person who already has diabetes, knowing the risk factors for diabetes will not improve self-care, nor will it improve glycemic control and reduce complications. However, knowledge about blood glucose relationships and dynamics represents relevant knowledge because it can be used in managing transient blood glucose levels. Thus, a good test for deciding whether a type of knowledge should be a goal for the program and whether this goal should be assessed, is how likely it is that the knowledge will be used in helping to control blood glucose and avoid complications.

The most important type of behavioral outcome is self-care behavior itself, along with the behavior change goals and intentions that mediate the program effect on self-care. Self-care behaviors that should be analyzed, if relevant, include blood glucose monitoring, insulin and oral hypoglycemic medication, diet, exercise, sick-day management, prevention and remedy of short-term complications (e.g., hyperglycemic and hypoglycemic coma, and foot problems), and pregnancy management.

The most common technique for measuring self-care is patient self-reports or, in the case of young children, parent reports, and in the case of elderly people, caretaker reports. With some self-care behaviors, such as blood glucose monitoring, it also is desirable to observe skills, but regimen implementation usually relies on self-report. (Exceptions include such things as pill counting, counting self-monitoring of blood glucose [SMBG] finger sticks, and examining memory meter records.)

In addition to measuring the number of times a given behavior is performed, evaluation should measure the appropriateness of a given behavior. If patients do not use SMBG results in managing glycemia, it does not matter how often they test. The number of SMBG tests is associated with glycemic control only if the results are used in micro-management activities such as insulin self-adjustment.16 Evaluation should measure not only how often patients adjust their insulin doses, but whether they do so in a way that properly responds to or anticipates changes in glucose intake or expenditure.
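Counting tests is easy; judging appropriateness requires a rule for what response each reading calls for. The sketch scores a log of readings against a deliberately simplified, hypothetical rule (increase above 180 mg/dL, decrease below 70 mg/dL, otherwise no change). These cut-offs and the rule itself are placeholders for illustration, not clinical guidance; a program would substitute its own adjustment protocol.

```python
# Illustrative cut-offs only; substitute the program's own protocol.
LOW, HIGH = 70, 180  # mg/dL

def expected_adjustment(reading):
    """Dose change the (hypothetical, simplified) protocol calls for."""
    if reading < LOW:
        return "decrease"
    if reading > HIGH:
        return "increase"
    return "none"

def appropriateness(log):
    """Share of logged readings followed by the adjustment the protocol calls for."""
    matches = sum(action == expected_adjustment(bg) for bg, action in log)
    return matches / len(log)

# Hypothetical log: (SMBG reading in mg/dL, dose change the patient actually made)
log = [(250, "increase"), (65, "none"), (120, "none"), (210, "none"), (55, "decrease")]

print("Tests performed:", len(log))
print("Appropriateness:", appropriateness(log))  # 0.6
```

Five logged tests yield an appropriateness score of 0.6, showing how testing frequency and appropriate use of the results can diverge.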

Psychosocial factors. The first category of psychosocial outcomes is social support systems. Many standardized measures are available for social support.17 These assessments should include both family and nonfamily support providers. Family is a critical resource for people who are married, or for children, and family also can be a resource for unmarried adults, especially the extended family. Peer influences also should be assessed, whether school peers for children or work peers for adults. Of course, social support also includes that of health-care professionals.

Another category of psychosocial outcomes that should be measured is health beliefs and attitudes. As suggested earlier, beliefs and attitudes are intermediate outcomes because they are related to self-care. That is, belief about the controllability of diabetes, for example, can inhibit or facilitate effective self-care behavior.18,19

Health beliefs and attitudes are generally not ultimate outcomes in themselves, but only stepping stones to other factors that can improve the quality of life. Again, a variety of studies have generated instruments, and these instruments can be used in measuring those factors believed to be key targets in the program being evaluated.18-20

The third set of psychosocial outcomes, quality of life and psychological well-being, might be regarded as the ultimate outcomes of diabetes education. Quality of life measurements often ask patients about life satisfaction, about their subjective rating of aspects of their own life. Measures include diabetes-specific,21 health-related,22 and global quality of life.23 On the other hand, measures of psychological well-being are often phrased negatively and look at distress, with the absence of distress regarded as well-being.6,14 In combination, quality of life and psychological distress provide a multidimensional view of how patients experience their own lives.

Other psychosocial factors are related to quality of life and psychological well-being, e.g., coping skills and diabetes self-efficacy.6,14 Coping skills help patients deal with stressful situations, and poor coping can exacerbate the psychological demands of implementing an effective self-care regimen.24,25 Therefore, enhancing patients’ ability to cope with the day-to-day demands of diabetes is a measure of program effectiveness. This approach empowers patients to deal with their own life situations, and frees them from reliance on health-care professionals.

The fourth set of psychosocial factors are contextual. These include barriers to learning, socioeconomic factors, and cultural influences, all of which may define subgroups of patients who require special treatment or who may do better or worse in response to the program.

Cultural influences are clearly relevant, especially language comprehension, but many less obvious cultural factors must be examined, e.g., the beliefs among certain cultures that obesity is an indicator of health and may contribute to sexual attractiveness, or the belief that certain kinds of food are healthy even though these may be high in fat.

Socioeconomic factors include the ability to pay for materials required to implement the self-care behaviors that have been chosen as patient outcome goals. If a patient does not have the ability to purchase glucose test strips or a memory meter, then it may not be possible to implement SMBG.

Barriers to learning include a variety of factors, such as reading level, educational background, and so forth. Psychiatric disabilities that impair patients’ abilities to learn or effectively manage their diabetes also are psychosocial barriers.26,27

How to Analyze and Use Data
Statistical issues. One of the key prerequisites of evaluation is the ability to analyze data to yield meaningful results and then to use these results to improve program functioning. Statistical analysis is the basis for appropriate interpretation of data, because it provides confidence in the inferences drawn from the data.

Although there is nothing magical about P < 0.05, significance levels do provide guidance in interpreting results. Essentially, the significance level (P value) is the probability of observing differences at least as large as those found if the program had no real effect, that is, if the differences arose by chance alone. Smaller P values therefore give greater confidence that the observed differences are real rather than chance artifacts.
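To make the meaning of a P value concrete, the following is a minimal sketch of one simple test for pre-program versus post-program scores. It uses an exact sign-flip permutation test on paired differences, which is a swapped-in technique chosen because it needs only the Python standard library. The article does not prescribe any particular test. The function name and the HbA1c values are hypothetical, and exact enumeration of 2^n sign patterns is only practical for small groups of roughly 20 patients or fewer.

```python
from itertools import product
from statistics import mean


def paired_permutation_p(pre, post):
    """Exact two-sided sign-flip (permutation) test for paired scores.

    If the program had no effect, each patient's pre-to-post change would be
    just as likely to have the opposite sign. Enumerating every sign pattern
    gives the probability of a mean change at least as large (in absolute
    value) as the one observed, i.e. the P value.
    """
    diffs = [after - before for before, after in zip(pre, post)]
    observed = abs(mean(diffs))
    extreme = 0
    for signs in product((1, -1), repeat=len(diffs)):  # 2**n sign patterns
        flipped = abs(mean(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
    return extreme / 2 ** len(diffs)


# Hypothetical HbA1c values (%) for five patients before and after the program.
pre = [8.0, 9.1, 7.5, 10.2, 8.8]
post = [7.2, 8.5, 7.6, 9.0, 8.1]
print(round(paired_permutation_p(pre, post), 4))  # 0.125
```

With only five patients, this test can never produce a two-sided P value below 2/32 = 0.0625, however large the improvement. This illustrates why very small evaluation samples rarely yield significant results.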

The requirement for statistical analysis does not mean that the analysis needs to use the most sophisticated methodologies. For example, it is not necessary to conduct a multivariate analysis when assessing whether program outcomes are the same across patient subgroups. It may be sufficient to conduct a series of separate analyses, one for each contextual variable.
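As a sketch of the "series of separate analyses" approach, the code below computes mean change in one outcome for each level of one contextual variable at a time, without any multivariate model. The record layout and the field names (`gender`, `insulin`, `hba1c_pre`, `hba1c_post`) are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean


def change_by_subgroup(patients, variable, outcome):
    """Mean pre-to-post change in one outcome, reported separately for each
    level of a single contextual variable (e.g. gender or insulin use)."""
    groups = defaultdict(list)
    for p in patients:
        groups[p[variable]].append(p[outcome + "_post"] - p[outcome + "_pre"])
    # level -> (number of patients, mean change)
    return {level: (len(ch), mean(ch)) for level, ch in groups.items()}


# Hypothetical records; field names are illustrative only.
patients = [
    {"gender": "F", "insulin": True, "hba1c_pre": 9.0, "hba1c_post": 8.0},
    {"gender": "F", "insulin": False, "hba1c_pre": 8.0, "hba1c_post": 7.5},
    {"gender": "M", "insulin": True, "hba1c_pre": 8.5, "hba1c_post": 8.5},
    {"gender": "M", "insulin": False, "hba1c_pre": 7.5, "hba1c_post": 7.0},
]

# One separate analysis per contextual variable, no multivariate model.
for variable in ("gender", "insulin"):
    print(variable, change_by_subgroup(patients, variable, "hba1c"))
```

In this made-up example, women improved more than men, while insulin users and non-users improved equally. A real evaluation would also check whether such subgroup differences are larger than chance would produce.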

Conducting an evaluation does not require experimental data. While a clinical trial may be viewed as the gold standard for a variety of research purposes, it is not necessary for evaluation. Nonexperimental data can be analyzed by comparing post-program and follow-up scores to pre-program levels.
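A minimal sketch of the nonexperimental comparison follows. For each later time point, it computes the mean change from the pre-program score. It assumes a complete-case approach, using only patients who have both the baseline and that time point. This is an assumption, not the article's prescription, and it can mislead if patients who miss follow-up differ from those who return. The field names and scores are hypothetical.

```python
from statistics import mean


def change_from_baseline(records, timepoints=("post", "followup")):
    """Mean change from the pre-program score at each later time point.

    Only patients with both a baseline and that time point are used
    (a complete-case approach), so the means may rest on different
    numbers of patients.
    """
    summary = {}
    for t in timepoints:
        changes = [r[t] - r["pre"] for r in records
                   if r.get("pre") is not None and r.get(t) is not None]
        summary[t] = (len(changes), mean(changes) if changes else None)
    return summary


# Hypothetical knowledge-test scores; None marks a missed follow-up visit.
records = [
    {"pre": 4, "post": 7, "followup": 6},
    {"pre": 5, "post": 8, "followup": None},
    {"pre": 3, "post": 6, "followup": 5},
]
print(change_from_baseline(records))
```

Reporting the number of patients next to each mean makes it easy to see when a follow-up result rests on a shrinking group.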

While these changes cannot be compared to an equivalent control group to determine what changes would have occurred in the absence of the program, it is possible to use patient comparison groups for a similar purpose. For example, patients who received all elements of the program may be compared with patients who received only part of the program. The former should improve more than the latter.
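The comparison of patients who received the full program with those who received only part of it can be sketched as an exact two-sample permutation test on improvement scores. The test is one-sided because the expectation is that the full-program group improves more. This technique is an illustrative choice, and the function name and values are hypothetical. Because patients are not randomly assigned to complete the program, this comparison remains weaker than a true control group.

```python
from itertools import combinations
from statistics import mean


def full_vs_partial_p(full, partial):
    """Exact one-sided permutation test that patients who received the whole
    program improved more than patients who received only part of it.

    Each list holds per-patient improvement (larger = better). Under the null
    hypothesis the group labels are arbitrary, so every way of choosing
    len(full) patients from the pooled data is equally likely.
    """
    observed = mean(full) - mean(partial)
    pooled = full + partial
    splits = list(combinations(range(len(pooled)), len(full)))
    at_least = 0
    for chosen in splits:
        in_full = [pooled[i] for i in chosen]
        rest = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        if mean(in_full) - mean(rest) >= observed - 1e-12:  # float tolerance
            at_least += 1
    return observed, at_least / len(splits)


# Hypothetical HbA1c reductions (percentage points).
full = [1.2, 0.9, 1.5]
partial = [0.3, 0.6, 0.2]
diff, p = full_vs_partial_p(full, partial)
print(f"difference = {diff:.2f}, one-sided P = {p:.3f}")
```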

It is also possible to obtain quasi-experimental data through a process of continuous quality improvement. In continuous quality improvement, changes are made to the program over time and their effects on patient outcomes are observed. For example, if the program introduces a new module and observes that patient outcomes improve, this constitutes a form of quasi-experimental data comparing the new program with the old program.

Of course, extraneous factors may be producing this result, but if the improved outcomes are stable and statistically significant, one can feel comfortable with continuing the program under the modified format. If a modification seems to produce a detrimental effect, the modification can be rescinded, and if this reversal produces a return to the previous higher level of results, this effectively demonstrates that the earlier program is better. Again, this may not be the gold standard in terms of scientifically valid generalization to a broad variety of programs, but it is valid in terms of assessing the program being evaluated.
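The reversal logic described here, in which the original program (A) is followed by the modification (B) and then by the original again (A), can be summarized with a small sketch. The `tolerance` for "a return to the previous level" is an arbitrary assumption. The function performs no significance test, and it cannot rule out the extraneous factors mentioned above. The names and cohort scores are hypothetical.

```python
from statistics import mean


def aba_check(phases, tolerance=1.0, higher_is_better=True):
    """Summarize an A-B-A sequence of program versions.

    phases holds three lists of patient scores in time order: the original
    program (A1), the modified program (B), and the original restored (A2).
    The reversal argument holds if B did worse than A1 and restoring the
    original brought results back to within `tolerance` of the A1 level.
    """
    a1, b, a2 = (mean(scores) for scores in phases)
    sign = 1 if higher_is_better else -1
    b_worse = sign * (b - a1) < 0
    recovered = sign * (a2 - b) > 0 and abs(a2 - a1) <= tolerance
    return {"means": (a1, b, a2), "original_better": b_worse and recovered}


# Hypothetical knowledge scores (out of 20) for three successive cohorts.
result = aba_check([[14, 15, 16], [12, 13, 11], [15, 14, 15]])
print(result["original_better"])  # True
```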

Application issues. There are several ways to analyze outcome data. The first approach is a simple analysis of aggregate change, looking at all patients combined to determine whether they improve over time in the outcomes of interest.

Each outcome should be examined separately, because the program may be functioning particularly well in some areas and poorly in others. This indicates which areas should be targeted for future program improvement efforts. It is not reasonable to assume that because the program works well for some outcomes, it necessarily works well for all outcomes. This parallels a phenomenon noted in self-care behavior: people may take very good care of themselves in one aspect of self-care but not in another.28
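The aggregate-change approach, applied to each outcome separately, might look like the sketch below. It assumes each outcome is flagged as "higher is better" (e.g., a knowledge score) or "lower is better" (e.g., HbA1c), so that the share of patients who improved is counted in the right direction. The data layout and values are hypothetical.

```python
from statistics import mean


def aggregate_change(patients, outcomes):
    """All patients combined, one summary per outcome.

    outcomes maps each outcome name to True if a higher score is better
    (e.g. a knowledge score) or False if lower is better (e.g. HbA1c).
    """
    report = {}
    for name, higher_is_better in outcomes.items():
        changes = [p[name]["post"] - p[name]["pre"] for p in patients if name in p]
        if not changes:
            continue
        sign = 1 if higher_is_better else -1
        improved = sum(1 for c in changes if sign * c > 0)
        report[name] = {
            "n": len(changes),
            "mean_change": mean(changes),
            "pct_improved": 100 * improved / len(changes),
        }
    return report


# Hypothetical pre/post data for four patients.
patients = [
    {"hba1c": {"pre": 9.0, "post": 8.0}, "knowledge": {"pre": 10, "post": 14}},
    {"hba1c": {"pre": 8.0, "post": 8.5}, "knowledge": {"pre": 12, "post": 12}},
    {"hba1c": {"pre": 7.5, "post": 7.0}, "knowledge": {"pre": 9, "post": 13}},
    {"hba1c": {"pre": 8.5, "post": 8.5}, "knowledge": {"pre": 11, "post": 15}},
]
for name, row in aggregate_change(patients, {"hba1c": False, "knowledge": True}).items():
    print(name, row)
```

In this invented example, knowledge improved for three of four patients but HbA1c improved for only two. This is the kind of uneven pattern that points to where program improvement efforts should be directed.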

The second analytic approach is to determine whether changes over time differ across patient subgroups.6 Evaluation should determine whether the program works for all types of patients. It may be, for example, that the program works effectively for men but not women, for young people but not older people, or for insulin-using people but not for those who do not use insulin. It is even possible, although perhaps uncommon, that the program is not effective overall and yet is effective in certain subgroups while actually being detrimental in others.

Patient subgroups that should be examined include those identified by demographic factors (e.g., language, gender, age, education) and disease-related factors (e.g., duration of diabetes, whether insulin is used, non-insulin-dependent versus insulin-dependent diabetes mellitus). Other subgroups are defined by the factors that are targets of the program. For example, people who are experiencing the most difficulty upon entering the program are typically those who derive the most benefit.29 While this may be appropriate, it also suggests that those who enter the program with the best self-care and glycemic control may not be appropriate candidates for the program being evaluated, or may require a different kind of program to help them improve. A given program may not work effectively for everyone, and subgroup analysis can help identify why this is so and lead to modifications in the program or initiation of alternative programs.
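Subgroups defined by a program target can be sketched by splitting patients at the median baseline value and comparing mean change in each half. The median split is an illustrative assumption; other cut points, such as clinical thresholds, may suit a given program better. Part of the larger improvement usually seen among patients who start out worst may reflect regression toward the mean, so this comparison is a description rather than proof. The values are hypothetical.

```python
from statistics import mean, median


def change_by_baseline(pre, post):
    """Split patients at the median baseline value and compare mean change
    in the higher-baseline (more difficulty) and lower-baseline halves.
    Here a higher baseline means worse control, as with HbA1c."""
    cut = median(pre)
    high = [b - a for a, b in zip(pre, post) if a > cut]
    low = [b - a for a, b in zip(pre, post) if a <= cut]
    return {
        "cut": cut,
        "high_baseline_change": mean(high) if high else None,
        "low_baseline_change": mean(low) if low else None,
    }


# Hypothetical HbA1c (%) before and after the program.
print(change_by_baseline([10.0, 9.0, 7.0, 6.5], [8.5, 8.0, 7.0, 6.5]))
```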

A third approach to statistical analysis can identify which kinds of change are consequential and where effort should be directed. For example, some targets are resistant to change. Earlier research has demonstrated that lifestyle self-care behaviors are very resistant to change: even when people intend to change these behaviors upon leaving the program, they often fail to follow through on those intentions. In contrast, self-management behaviors such as SMBG and insulin adjustment do change when patients intend to change them.11-13

Analysis reveals that the difference between self-management and lifestyle change is due not to differences in intentions but to differences in individuals' ability to implement their behavior change goals. This suggests several possible program modifications. One strategy might be to place more emphasis on self-management behaviors. Another is to work much more intensively with people who require lifestyle changes. That is, the same amount of effort devoted to each outcome may be much more effective in producing self-management changes than lifestyle changes, and lifestyle changes will require a more intensive and/or enduring program.
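The intention-versus-follow-through analysis can be sketched as a simple rate. Among patients who intended to change a behavior, it computes the share who actually changed it, separately for each behavior type. The record layout and data below are hypothetical, constructed to mirror the pattern described in the text. They do not reproduce the findings of the cited studies.

```python
from collections import defaultdict


def follow_through(records):
    """Among patients who intended to change a behavior, the share who
    actually changed it, reported separately for each behavior type."""
    intended = defaultdict(int)
    changed = defaultdict(int)
    for behavior_type, intends, did_change in records:
        if intends:
            intended[behavior_type] += 1
            if did_change:
                changed[behavior_type] += 1
    return {t: changed[t] / intended[t] for t in intended}


# Hypothetical (behavior type, intended to change?, actually changed?) records.
records = [
    ("lifestyle", True, False), ("lifestyle", True, True),
    ("lifestyle", True, False), ("lifestyle", False, False),
    ("self-management", True, True), ("self-management", True, True),
    ("self-management", True, False), ("self-management", False, False),
]
print(follow_through(records))
```

Because the rate is conditioned on intention, a gap between behavior types points to difficulty implementing goals rather than to a lack of motivation.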

Finally, analysis can identify which intermediate outcomes are critical in producing ultimate outcomes. For example, a recent analysis found that a traditional measure of diabetes self-efficacy was not strongly related to changes in self-care behavior, whereas intentions to change self-care behavior strongly predicted actual changes in self-care behavior after the program.12 This analysis suggests that program efforts aimed at improving diabetes self-efficacy may be less useful than other efforts. In particular, it suggests that a program that does not change behavioral intentions may be ineffective in producing behavior change, regardless of what other self-care precursors it has affected. This finding is consistent with the emphasis on behavior change goals in the ADA Education Program Recognition Standards.1

Other Evaluation Issues
I have described how evaluation data should be gathered and analyzed and, most importantly, how they should be used. The primary use of evaluation is internal: identifying areas of the program that need improvement and suggesting how improvements should be made. For example, is the program able to address a particular problem, or does it need to work better with a particular patient subgroup? Evaluation can also identify which areas require more program effort and which require less.

The second use of evaluation, of which program managers should be aware, is to provide external validation. Education programs increasingly need to demonstrate that they work, for two reasons. First, funding for diabetes education depends on the effectiveness of education as a health-care approach. Second, programs need evidence of effectiveness to obtain the certification that helps them secure funds.

At times, there may be a conflict between the internal and external orientations. For internal purposes, it is desirable to identify program weaknesses (so they can be corrected); for external purposes, it is desirable to focus on program strengths (to demonstrate success). External audiences may want simple statistics, such as retrospective subjective reports of whether the program has helped patients, whereas objective evidence requires a statistical comparison of pre-program and post-program data. Under these circumstances it probably is necessary to conduct parallel evaluations using both methodologies, so that different data can be presented to different audiences and used for different purposes.

Another consideration in planning an evaluation is whether to include a cost-efficiency analysis. This is an important issue because funding is limited, and choices must be made when designing a program or deciding whether to refer patients for services.

Costs of service must be quantified, as must the value of program benefits. From the perspective of a purchaser, costs include not only the cost of the program but also the cost of the increased preventive care that often follows an education program. Short-term economic benefits consist of cost savings from decreased hospitalizations for acute complications, while long-term benefits include savings from the reduction of chronic complications. From the perspective of the consumer of services, benefits also include improvements in quality of life, but these are difficult to translate into economic terms.30
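The purchaser-side arithmetic can be written out as a short sketch. Costs are the program itself plus the added preventive care. Benefits are savings from avoided acute-complication admissions, plus an optional estimate of long-term savings from fewer chronic complications. All figures are hypothetical. The sketch does not discount future savings or put a dollar value on quality of life, both of which a fuller analysis would need to address.

```python
def purchaser_net_benefit(program_cost, added_preventive_cost,
                          hospitalizations_avoided, cost_per_hospitalization,
                          long_term_savings=0):
    """Net economic benefit from a purchaser's perspective.

    Costs: the program itself plus the extra preventive care it tends to
    generate. Benefits: savings from avoided acute-complication admissions
    and, optionally, an estimate of long-term savings from fewer chronic
    complications. Quality-of-life gains are not monetized here.
    """
    costs = program_cost + added_preventive_cost
    benefits = hospitalizations_avoided * cost_per_hospitalization + long_term_savings
    return benefits - costs


# Hypothetical figures for one cohort of patients (dollars).
short_term = purchaser_net_benefit(20_000, 5_000, 4, 6_000)
with_long_term = purchaser_net_benefit(20_000, 5_000, 4, 6_000, long_term_savings=10_000)
print(short_term, with_long_term)
```

In this invented case, the program slightly loses money on a short-term, bottom-line view but comes out ahead once long-term savings are counted. This is why the choice of perspective matters.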

Again, because purchasers (short-term, bottom line) and consumers (long-term, holistic) have different perspectives, it may be necessary to conduct parallel cost-benefit analyses. Given the increased emphasis on value and cost control in health care, cost-benefit analysis is likely to become more common. Providers would do well to prepare for the day when it will be mandatory.

Finally, I would like to address the feasibility of conducting the evaluations described in this paper. Most of the data collection described here is required to meet the ADA Education Program Recognition standards.1 None of what is proposed requires a full-time professional evaluator, although for some programs it may be more efficient to hire a consultant to help design the data collection and analysis protocol.

Once an evaluation design has been formulated, it takes relatively little effort to implement ongoing data collection. The necessary statistical analysis can be performed periodically by someone with moderate research skills. Interpretation of results can be facilitated by developing a reporting format into which periodic results can be incorporated, e.g., graphical displays. Once a plan is formulated, routine data collection, statistical analysis, and reporting can be handled by staff with little or no assistance from professional evaluators.
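A reporting format into which periodic results can be slotted might be as simple as the sketch below. It prints a fixed-layout table with one row per period and a text bar proportional to the size of each value, as a stand-in for a graphical display. The function name, layout, and quarterly figures are hypothetical.

```python
def text_report(title, periods, width=20):
    """Fixed-format periodic report: one row per period, with a text bar
    whose length is proportional to the size (absolute value) of each value."""
    biggest = max((abs(v) for _, v in periods), default=0) or 1
    lines = [title]
    for label, value in periods:
        bar = "#" * round(width * abs(value) / biggest)
        lines.append(f"{label:<8}{value:>7.2f}  {bar}")
    return "\n".join(lines)


# Hypothetical quarterly mean HbA1c change; append a row each quarter.
report = text_report("Mean HbA1c change from baseline",
                     [("1996Q1", -0.5), ("1996Q2", -1.0), ("1996Q3", -0.75)])
print(report)
```

Keeping the layout fixed means staff only add the new period's figure each time, and trends are visible at a glance.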

In summary, evaluation of diabetes education programs can be useful in improving quality and demonstrating cost-efficiency. Evaluation does not require massive amounts of money or specialized staff. All it takes is commitment and a willingness to follow guidelines and learn from experience. As any good educator knows, every journey starts with a first step. If your program does not have an evaluation, make a commitment to develop one and set a timetable. If your program's evaluation is inadequate, identify one problem, define an objective, and generate one or more strategies for achieving that objective. Do it now.

References

1Meeting the Standards: A Manual for Completing the American Diabetes Association Application for Recognition, 4th Edition. Alexandria, Va., American Diabetes Association, 1995.

2Peyrot M: Presentation: How to utilize assessment in clinical settings: a framework for application. American Diabetes Association 54th Annual Scientific Sessions, June 1994.

3Brown SA: Effects of educational interventions in diabetes care: a meta-analysis of findings. Nurs Res 37: 223-30, 1988.

4Padgett D, Mumford E, Hynes M, Carter R: Meta-analysis of the effects of educational and psychosocial interventions on management of diabetes mellitus. J Clin Epidemiol 41: 1007-30, 1988.

5Peyrot M, Yen S, Baldassano CA: Short-term substance abuse prevention in jail: a cognitive behavioral approach. J Drug Education 24: 33-47, 1994.

6Rubin RR, Peyrot M, Saudek CD: Effect of diabetes education on self-care, metabolic control, and emotional wellbeing. Diabetes Care 12: 673-79, 1989.

7Cohen J: Statistical Power Analysis for the Behavioral Sciences. Hillsdale, NJ, Lawrence Erlbaum, 1988.

8Bradley C: Handbook of Psychology and Diabetes: A Guide to Psychological Measurement in Diabetes Research and Practice. Berkshire, England, Harwood Academic Publishers, 1994.

9Cook TD, Campbell DT: Quasi-experimental Design and Analysis Issues for Field Settings. Newbury Park, Calif., Sage, 1979.

10Orne M: On the social psychology of the psychological experiment. Am Psychol 17: 776-83, 1962.

11Peyrot M, Rubin RR: Effect of education on lifestyle and self-regulation intentions and behavior. Diabetes 39 (Suppl 1):16A, 1990.

12Peyrot M, Rubin RR: The effect of self-efficacy and behavioral intentions on self-care improvement following diabetes education. Diabetes 44 (Suppl 1):96A, 1995.

13Rubin RR, Peyrot M, Saudek CD: Differential effect of diabetes education on self-regulation and lifestyle behaviors. Diabetes Care 14: 335-38, 1991.

14Rubin RR, Peyrot M, Saudek CD: The effect of a comprehensive diabetes education program incorporating coping skills training on emotional well-being and diabetes self-efficacy. Diabetes Educ 19: 210-14, 1993.

15Glasgow RE, Osteen VL: Evaluating diabetes education: are we measuring the most important outcomes? Diabetes Care 15: 1423-32, 1992.

16Peyrot M, Rubin RR: Insulin self-regulation predicts better glycemic control. Diabetes 37 (Suppl 1):53A, 1988.

17Bruhn JG, Philips BU: Measuring social support: a synthesis of current approaches. J Behav Med 7: 151-69, 1984.

18Peyrot M, Rubin RR: Structure and correlates of diabetes-specific locus of control. Diabetes Care 17: 994-1001, 1994.

19Peyrot M, McMurry JF: Psychosocial factors in diabetes control: adjustment of insulin-treated adults. Psychosom Med 47: 542-57, 1985.

20Brownlee-Duffeck M, Peterson L, Simonds JF, Goldstein D, Kilo C, Hoette S: The role of health beliefs in the regimen adherence and metabolic control of adolescents and adults with diabetes mellitus. J Consult Clin Psychol 55: 139-44, 1987.

21The DCCT Research Group: Reliability and validity of a diabetes quality-of-life measure for the Diabetes Control and Complications Trial (DCCT). Diabetes Care 11: 725-32, 1988.

22Ware JE, Sherbourne CD: The MOS 36-item short form health survey (SF-36). Med Care 30:473-83, 1992.

23Andrews FM, McKennell AC: Measures of self-reported well-being: their affective, cognitive, and other components. Soc Indicators Res 8: 127-55, 1980.

24Peyrot M, McMurry JF: Stress-buffering and glycemic control: the role of coping styles. Diabetes Care 15: 842-46, 1992.

25Hanson CL, Harris MA, Relyea G, Cigrang JA, Carle DL, Burghen GA: Coping styles in youths with insulin-dependent diabetes mellitus. J Consult Clin Psychol 57: 644-51, 1989.

26Rubin RR, Peyrot M: Psychosocial problems and interventions in diabetes: a review of the literature. Diabetes Care 15: 1640-57, 1992.

27Young-Hyman D, Peyrot M, Jacobson AM, Schlundt D, Drotar D, and the DCCT Research Group: Association of distress and self-care behavior with glycemic control (HbA1c) in the DCCT. Diabetes 44 (Suppl 1):96A, 1995.

28Johnson SB: Methodological issues in diabetes research: measuring adherence. Diabetes Care 15: 1658-67, 1992.

29Peyrot M, Rubin RR: Modeling the effect of diabetes education on glycemic control. Diabetes Educ 20:143-48, 1994.

30Epstein RS, Lydick E: Quality of life assessment: a pharmaceutical industry perspective. In Quality of Life in Behavioral Medicine Research. Dimsdale JE, Baum A, Eds. Hillsdale, NJ, Lawrence Erlbaum, 1995, pp. 57-67.


Acknowledgments

This paper is an outgrowth of a presentation I made at the request of Linda Siminerio to the Health Professional Luncheon during the 55th Annual Meeting and Scientific Sessions of the American Diabetes Association in June 1995. I would like to express my appreciation to Elizabeth Walker, Russ Glasgow, Debra Haire-Joshu, and the participants of the Ninth Invitational Conference on Behavioral Research in Diabetes Mellitus for their thoughtful discussion of earlier versions of this paper.


Mark Peyrot, PhD, is the director of the Center for Social and Community Research and an associate professor of sociology at Loyola College and is a research associate in the Department of Medicine and Diabetes Center of Johns Hopkins University School of Medicine, in Baltimore, MD.



Copyright © 1996 American Diabetes Association

Last updated: 7/25/96