Journal of Obesity and Overweight

ISSN: 2455-7633

Open Access
Research Article
PMID: 26618201

Objectively Coding Intervention Fidelity During A Phone-Based Obesity Prevention Study

Received Date: December 12, 2014 Accepted Date: May 19, 2015 Published Date: May 26, 2015

Copyright: © 2015 JaKa MM. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.



Abstract

Background: Childhood obesity prevention studies have yielded disappointing results. Understanding intervention fidelity is necessary to explain why interventions are (or are not) successful and ultimately to improve future interventions. In spite of this, intervention fidelity is not consistently reported in the obesity prevention literature. The purpose of the current study was to develop and utilize a coding protocol to objectively assess intervention fidelity in a phone-based obesity prevention study for parents of preschool-aged children.

Findings: Both interventionists and independent coders completed session fidelity measures, including time spent on target areas (media use, physical activity, etc.) and components of goal setting quality. Coders also rated participant engagement. Agreement between interventionist and coder ratings, fidelity levels, and changes in fidelity components over time are presented. Coders and interventionists showed high agreement when reporting time spent discussing different target areas. Interventionists consistently rated themselves higher than independent coders on measures of goal quality. Coder ratings of session quality were initially high, but some components declined slightly across the eight sessions.

Conclusion: Future directions for intervention fidelity measurement and analysis are discussed, including utilizing changes in fidelity measures over time to predict study outcomes. Obtaining a more in-depth understanding of intervention fidelity has the potential to strengthen obesity interventions.

Keywords: Intervention fidelity; Audio coding; Process evaluation; Behavior change


Introduction

Childhood obesity prevention studies have often yielded disappointing results. A review of behavioral interventions in preschool-aged children found little impact on weight status [1]. The authors proposed "suboptimal implementation" as a potential cause. Ensuring high fidelity of intervention implementation is paramount, and variability in fidelity may partially explain inconsistent results. Though not yet consistently reported in obesity prevention research, fidelity reporting has gained momentum in other domains.

The Behavior Change Consortium (BCC) defines "treatment delivery," or the degree to which the intervention was implemented as intended, as an important component of intervention fidelity [2]. To accurately measure treatment delivery, the BCC suggests objective coding of intervention sessions, given the potential bias in interventionist self-report [2]. Behavior change researchers have begun adopting objective coding techniques previously used in psychotherapy research to measure intervention fidelity, for example the coding schemas developed for Motivational Interviewing interventions [3,4]. One group developed a similar schema for an obesity intervention that used patient-centered communication, showing adequate inter-coder reliability [5].

Intervention fidelity is necessary for understanding why obesity prevention interventions are (or are not) successful. This study uses data from Healthy Homes/Healthy Kids-Preschool (HHHK-Preschool), a randomized controlled trial evaluating a phone-based obesity prevention intervention for parents of preschool-aged children, to (1) develop an objective coding protocol to assess fidelity, (2) compare interventionist and independent coder ratings of fidelity, (3) describe HHHK-Preschool intervention fidelity, and (4) explore how fidelity data could be used to understand outcomes in a larger trial.


Methods

Parents of 2-4 year old children with an annual well-child visit scheduled at one of 20 clinics in the greater Minneapolis-St. Paul area were invited, after primary care provider approval, to participate in the HHHK-Preschool study. Eligible participants were parent-child dyads in which the child had a body mass index (BMI) percentile between 85 and 95, or between 50 and 85 with an overweight parent (BMI ≥ 25 kg/m2). Sixty parent-child dyads were randomly assigned to an obesity prevention arm or a contact-control arm. Parents in both arms received 8 bi-weekly phone sessions. Phone sessions for parents in the intervention group focused on healthy weight-related behaviors, whereas sessions for parents in the contact-control arm focused on child safety and injury prevention. Follow-up measurements were taken immediately post-intervention (6 months after baseline). This study uses data from participants enrolled in the obesity prevention arm (n = 30) only. Study protocols were approved by the HealthPartners IRB and participants provided written informed consent.

Obesity Prevention Intervention

The obesity prevention intervention was influenced by social ecological models [6,7], Social Cognitive Theory [8,9] and Motivational Interviewing [10,11], and consisted of 8 sessions covering 4 target areas: screen time, sweetened beverages, physical activity, and healthy meals and snacks. Parents decided the order in which areas were discussed and how much time was spent on each. Sessions focused on goal setting using several behavioral adherence strategies: setting specific goals, including small and achievable steps, anticipating problems and formulating solutions, tracking progress and using a reward system, and identifying social support [12]. The interventionist led the participant through the goal setting process and together they decided on a target-related goal at the end of each session (e.g., serving raw celery, carrots, and cucumbers instead of crackers and cheese for afternoon snack each day for a week). At the beginning of the subsequent session, progress from the previous goal was reviewed and a new goal was set. The new goal could be unrelated to the previous goal or a revision of it. In addition to goal setting, the first session included a participant self-assessment of each target area, and the last session included a wrap-up activity. The intervention was conducted by 2 interventionists with Bachelor's or Master's degrees in health-related fields and with previous experience as behavior change counselors. As part of the study, interventionists received extensive training in behavioral adherence strategies and study protocols.

Fidelity Measures

Interventionists completed a self-assessment of session fidelity (e.g., use of behavioral adherence strategies and time spent discussing various target areas) immediately following each session. To ensure feasibility of completing this assessment between back-to-back sessions, the tool was designed to be brief. As part of the initial training on study protocols, interventionists received instruction on how to complete the self-assessment. Written instructions were also included in the interventionist manual and ongoing support for standardization was provided throughout the intervention. Intervention sessions were audio-recorded for future assessment of fidelity by independent coders.

In order to directly test interventionist presentation bias (Aim 2), the independent coder tool for assessing specific components of session fidelity (e.g., use of behavioral adherence strategies and time spent discussing various target areas) exactly mirrored the interventionist self-assessment. The coding tool also included additional fidelity items described below. Four research staff not involved in delivering the intervention completed coder training, which focused on learning behavioral adherence strategies and identifying their use in sessions. Similar to the interventionists, each coder held a Bachelor's or Master's degree in a health-related field. Coders were required to complete 5 certification sessions and obtain an inter-rater reliability of 80% when compared to a lead coder. Weekly meetings were held to promote standardization throughout the coding process.

Goal Setting: Interventionists and coders assessed the frequency and quality of target-related goals set during each session. The number of sessions with a goal set was divided by the number of sessions with complete audio to obtain the percent of sessions with a goal set per participant. If a goal was set during a session, interventionists and coders each completed 5 goal quality items: 4 rated on a 5-point Likert scale (ranging from 1 = "not at all" to 5 = "completely"), covering the extent to which the goal was (1) specific and detailed and (2) included small steps, and the extent to which the interventionist helped the participant (3) rehearse steps and anticipate problems and (4) develop a plan to track and reward progress; plus a yes/no item on whether the interventionist helped the participant (5) identify social support. An aggregate score was created by summing the 4 Likert items and the yes/no item (scored 0/1), with possible scores ranging from 4 to 21; higher scores indicated higher quality. To summarize at the participant level, average scores for these items were calculated across all of a participant's sessions with complete audio recordings. Goal achievement and goal progress were not assessed in these analyses.
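
The goal-frequency and aggregate-quality calculations above can be sketched as follows. This is a minimal illustration, not the study's actual software; the session record fields (`audio_complete`, `goal_set`) are hypothetical names chosen for the example.

```python
# Illustrative sketch of the participant-level goal summaries described
# above. Field names are hypothetical, not from the study.

def percent_sessions_with_goal(sessions):
    """Share (%) of audio-complete sessions in which a goal was set."""
    complete = [s for s in sessions if s["audio_complete"]]
    if not complete:
        return None
    return 100 * sum(s["goal_set"] for s in complete) / len(complete)

def aggregate_goal_quality(likert_items, social_support):
    """Sum of 4 Likert items (1-5 each) plus yes/no social support (0/1).

    Possible range: 4 (all 1s, no support) to 21 (all 5s, support identified).
    """
    assert len(likert_items) == 4 and all(1 <= x <= 5 for x in likert_items)
    return sum(likert_items) + int(social_support)
```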

Time Spent Discussing Target Areas: Interventionists and coders assessed the percent of the session spent on each of the 4 target areas above (0%, 1-10%, 10-25%, 25-50%, and 50-100%). To estimate the number of minutes spent on each target area, the midpoint of the selected percent category (e.g., 5.5% if 1-10% was selected) was multiplied by the length of the session. Minutes were summed across all sessions to estimate overall minutes per target area for each participant. This estimation method was chosen as it allowed interventionists to quickly and feasibly assess this component after each intervention session.
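
The midpoint estimation above can be sketched as a simple lookup. The category bounds come from the text; the function and variable names are illustrative assumptions.

```python
# Sketch of the midpoint estimation of minutes per target area.
# Percent categories are from the text; the midpoint of the selected
# category (as a fraction) is multiplied by session length.

MIDPOINTS = {
    "0%": 0.0,
    "1-10%": 0.055,    # midpoint 5.5%, as in the text's example
    "10-25%": 0.175,
    "25-50%": 0.375,
    "50-100%": 0.75,
}

def estimated_minutes(category, session_length_min):
    """Estimated minutes on a target area = category midpoint x length."""
    return MIDPOINTS[category] * session_length_min
```

For example, a 20-minute session with "1-10%" selected for media use yields roughly 1.1 estimated minutes on that area.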

Additional Fidelity Measures: Coders assessed participant engagement, the interventionist/participant relationship, the percent of the session the parent versus the interventionist spent talking, and the percent of the session spent on-task, each on a 5-point Likert scale. Variables were dichotomized into meeting adequate levels (rated 4 or 5) or not (rated 1-3). The percent of sessions with adequate levels was calculated for each participant.
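
A minimal sketch of this dichotomization, with illustrative data rather than study data:

```python
# Dichotomize 5-point Likert ratings and summarize per participant.

def adequate(score):
    """A rating of 4 or 5 on the 5-point scale counts as adequate."""
    return score >= 4

def percent_adequate(scores):
    """Percent of a participant's coded sessions rated adequate."""
    return 100 * sum(adequate(s) for s in scores) / len(scores)
```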

Study Outcome Measure

Child height and weight were measured during baseline and 6-month visits using a Seca 876 scale and 217 stadiometer (Seca Corp., Hanover, MD) and BMI percentile was calculated [13]. A change score was calculated by subtracting baseline values from 6-month values for each participant.

Statistical Analyses

Interventionist-Coder Reliability: Sessions with complete audio recordings were used to calculate reliability between interventionist and coder. For dichotomous items, Cohen's kappa was calculated. For ordinal variables, weighted Cohen's kappa with Fleiss-Cohen weights was calculated. For continuous variables, intra-class correlation coefficients were calculated. T-tests, means, and standard deviations were used to examine interventionist-coder differences.
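
The kappa statistics named above can be sketched in a few lines. This is a pure-Python illustration, not the study's actual software; quadratic disagreement weights reproduce Fleiss-Cohen weighting, and `weights="none"` gives the unweighted kappa used for dichotomous items.

```python
# Cohen's kappa with optional Fleiss-Cohen (quadratic) weights,
# written as 1 - (observed disagreement / expected disagreement).

def weighted_kappa(rater_a, rater_b, categories, weights="quadratic"):
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)
    # Observed joint proportions of (rater A, rater B) category pairs
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        obs[idx[a]][idx[b]] += 1.0 / n
    # Marginals; expected proportions under independence are pa[i]*pb[j]
    pa = [sum(obs[i][j] for j in range(k)) for i in range(k)]
    pb = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def w(i, j):  # disagreement weight between categories i and j
        if weights == "quadratic":
            return ((i - j) / (k - 1)) ** 2 if k > 1 else 0.0
        return 0.0 if i == j else 1.0  # unweighted (nominal) kappa

    d_obs = sum(w(i, j) * obs[i][j] for i in range(k) for j in range(k))
    d_exp = sum(w(i, j) * pa[i] * pb[j] for i in range(k) for j in range(k))
    return 1.0 - d_obs / d_exp
```

Perfect agreement yields a kappa of 1.0, while agreement at exactly chance level yields 0.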

Intervention Fidelity Description: Intervention fidelity components were summarized across all sessions; means and standard deviations or percentages are presented. These data are also presented by session to examine possible trends over time using linear regression, with session number modeled as a continuous predictor and the fidelity component as the outcome. Each participant's slope was used to categorize whether their fidelity scores improved, declined, or stayed the same over time.
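
The per-participant trend categorization can be sketched as an ordinary least-squares slope of the fidelity score on session number. The zero-slope tolerance below is an illustrative choice, not a threshold from the study.

```python
# Categorize a participant's fidelity trend from the OLS slope of
# fidelity score on session number.

def ols_slope(xs, ys):
    """Least-squares slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def trend_category(session_numbers, fidelity_scores, tol=0.0):
    slope = ols_slope(session_numbers, fidelity_scores)
    if slope > tol:
        return "improved"
    if slope < -tol:
        return "declined"
    return "stayed the same"
```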

Exploratory Outcome Analyses: To demonstrate how average fidelity and fidelity slope may be used to predict outcomes, linear regression models were fit with the primary outcome of change in BMI percentile from baseline to 6 months. BMI percentile was mean-centered and all models were adjusted for baseline BMI percentile.


Results

Table 1 presents descriptive characteristics. Of the 209 sessions delivered across all participants, 131 had complete audio recordings. First and last sessions were somewhat more likely than middle sessions to be missing or have incomplete audio (7-df χ2 = 11.5, p = 0.12). There was no significant difference in session length between sessions with complete audio and those with missing or incomplete audio (22.7 ± 9.1 min vs. 23.8 ± 9.2 min, p = 0.41).

Interventionist-Coder Reliability

Table 2 presents interventionist-coder reliability. For time spent in target areas, reliability ranged from 0.78 to 0.88. Of the 131 recorded and coded sessions, a goal was set in 108. In these sessions, interventionist-coder reliability for goal quality was poor, ranging from 0.09 to 0.48. As shown in Table 3, interventionists rated themselves higher than did coders on goal quality items (p < 0.001).

Intervention Fidelity Description

The percent of sessions with adequate time on-task was near 100% (Figure 1). Percentage of sessions with adequate participant time spent talking and participant engagement were also high. Coders rated participants and interventionists as having an adequate relationship in over half of sessions. Figure 2 shows the amount of time participants talked about the 4 target areas.

Figures 3-5 depict session fidelity over time. Overall, 80% of recorded sessions had a goal set. Although coder ratings of "time participants spent talking" increased slightly across the 8 phone sessions, ratings of goal setting and goal quality declined. All other measures remained stable across the 8 sessions (data not shown).

Exploratory Outcome Analyses

No statistically significant relationships were seen between time spent talking about different target areas and BMI percentile change (meals and snacks, B = -0.002, p = 0.99; physical activity, B = -0.078, p = 0.48; media use, B = 0.171, p = 0.29; sugar-sweetened beverages, B = -0.046, p = 0.83). No statistically significant associations were seen between change in BMI percentile and change over time in percent of sessions with a goal set (B = 0.222, p = 0.15), aggregate goal quality (B = 0.360, p = 0.78), participant-interventionist relationship (B = -5.648, p = 0.21), or participant engagement (B = -7.938, p = 0.07).

Participant change in intervention quality over time (declined, stayed the same, or improved) was also used to predict change in BMI percentile from baseline to 6 months. Exploratory non-significant results are depicted in Figures 6 and 7.


Discussion

Robust intervention fidelity measurement could strengthen obesity prevention interventions. The current analyses present results from objectively assessed fidelity in HHHK-Preschool. There was high individual-item reliability between interventionists and coders for the items rating time spent talking about different target areas, suggesting that interventionists can reliably code these components without bias. Though not practical for interventionists, it may be feasible for coders listening to audio-recorded sessions to capture a more precise measure of time spent (i.e., exact minutes) in each target area. Future research is needed to determine whether this can be done reliably. Furthermore, this study assessed only a handful of fidelity measures thought to be the most relevant. Many other components could be explored in later work, including goal completion or progress.

When comparing coder and interventionist ratings of goal quality, reliability was consistently lower (0.09-0.48) and interventionists consistently reported higher quality than coders, signifying interventionist presentation bias on these items. Although this hypothesis is commonplace across behavioral research, it has not been explicitly tested. Our results do, however, coincide with work by researchers in psychotherapy who found evidence of therapists overestimating adherence to protocols [14]. Specific steps may be taken to reduce this bias, for example, training interventionists on their susceptibility to overestimating session quality. Another option could be to revise subjective questions (e.g., "To what extent did you help the participant anticipate problems?") into more objective questions (e.g., "What specific problems, if any, did you help the participant anticipate?"). Still, assessment by independent coders is likely a more valid way to assess these components, both in research and, whenever possible, in clinical practice.

Coder-measured intervention fidelity revealed a number of interesting findings. Although participants set a goal in the large majority of recorded sessions (80%), there was a downward trend in goal setting across sessions. Similarly, for recorded sessions where a goal was set, goal quality was high in initial sessions but trended slightly downward over time. Conversely, the time parents spent talking increased over time, suggesting potential improvements in participant engagement. These results highlight the importance of examining trends in fidelity across sessions. Changes in fidelity over time have not been studied previously in behavioral obesity prevention interventions; in other fields, some have shown no change or slight increases in interventionists' reports of fidelity over time [15,16]. To prevent downward trends in fidelity, researchers and practitioners should track these components during the course of the intervention and plan strategies to prevent declines.

Noting the limited sample size, exploratory analyses examined relationships between fidelity and BMI. As expected, no statistically significant associations were observed. Additionally, due to the limited sample size, these analyses were not adjusted for interventionist or family characteristics that may confound this relationship (e.g., child baseline BMI percentile, parent age, or parent sex), which would be an important consideration for future studies. However, this provides a template to be used in larger studies to examine these relationships. Another limitation of these analyses was the number of delivered sessions with missing or incomplete audio. There were a number of device issues (battery failures, etc.) that led to the missing data. It is possible that bias could be present in our results if (1) there were systematic differences in sessions with and without complete audio and (2) those differences led to differential reliability in coding, session fidelity, or participant outcomes. Though there was no difference in session length between sessions with and without audio, earlier and later sessions were slightly more likely to have missing audio than middle sessions. Future studies with larger sample sizes may be able to use imputation strategies to address this issue.

The current study successfully developed a coding protocol to objectively assess intervention fidelity. Detailed information about the fidelity of behavior change interventions can advance the ability to understand study outcomes and improve future interventions in research and clinical practice. Our findings highlight the need for rigorous, ongoing supervision and support for interventionists in the clinical context, such as case-consultations. It is noted, however, that objectively measuring all components of fidelity in clinical settings is impractical. Therefore, it is important for researchers to begin identifying the most effective intervention components and ways to practically monitor these "active ingredients."

Authors' Contributions

MMJ helped design the study, developed the coding protocol, trained coders, conducted statistical analyses, and drafted the manuscript. EMS participated in developing the coding protocol, coded intervention sessions, compiled data, and helped draft the manuscript. AMR coded intervention sessions and helped draft the manuscript. NES helped conceive of the study, participated in the study design and statistical analyses, as well as helped draft the manuscript. All authors read and approved the final manuscript.


Acknowledgements

The authors would like to thank Dani M. Bredeson for coding intervention sessions. This work was supported by the National Institute of Diabetes and Digestive and Kidney Diseases at the National Institutes of Health [grant numbers R21DK078239, P30DK050456, P30DK092924, T32DK083250].


Figure 1: Percent of sessions meeting adequate fidelity levels (rated 4 or 5 on a 5-point Likert scale; N = 130 sessions)
Figure 2: Average minutes spent talking about target areas summed across all 8 intervention sessions
Figure 3: Percent of sessions with a goal set by session number
Figure 4: Average goal quality by session, reported by coders. Possible scores ranged from 4-21
Figure 5: Average time parent spent talking by session, reported by coders. Scores ranged from 1-5
Figure 6: Change in BMI (baseline to 6 months) by change in goal quality over time, p = 0.95
Figure 7: Change in BMI (baseline to 6 months) by change in parent time talking, p = 0.80
  M (SD) or %
Child age (yrs) 2.6 (0.7)
Child gender (% female) 50
Child race (% White) 77
Child BMI percentile 83 (8)
Parent age (yrs) 34 (5)
Parent gender (% female) 97
Parent BMI (kg/m2) 29 (6)
Table 1: Baseline descriptives of sample, N = 30
  Interventionist-Coder Reliability
Time spent in target areas, N = 131 recorded sessions  
   Meals and snacks (min) 0.80
   Physical activity (min) 0.80
   Media use (min) 0.78
   Sugar-sweetened beverages (min) 0.88
Goal quality, N = 108 recorded sessions with a goal set  
   Goal specificity (1-5) 0.09
   Plan included small steps (1-5) 0.13
   Rehearse steps and anticipate problems (1-5) 0.19
   Track progress and create a reward system (1-5) 0.44
   Identify social support (yes/no) 0.38
   Aggregate goal quality (4-21) 0.48
Table 2: Interventionist-coder reliability as calculated by Cohen's kappa, weighted kappa, or intraclass correlation coefficient
  Coder Rating Interventionist Rating P-value*
Goal quality      
   Goal specificity (1-5) 2.5 (1.0) 4.0 (1.1) <.0001
   Plan included small steps (1-5) 2.1 (1.0) 3.8 (1.2) <.0001
   Rehearse steps and anticipate problems (1-5) 2.1 (1.0) 3.5 (1.5) <.0001
   Track progress and create a reward system (1-5) 1.9 (1.1) 2.9 (1.6) <.0001
   Identify social support, % 40% 70% <.0001
   Aggregate goal quality (4-21) 8.9 (3.8) 13.0 (5.3) <.0001
*P-values for two-sample t-test.
Table 3: Intervention goal quality as measured by coder and interventionist in the N = 108 recorded sessions where a goal was set, out of 131 total recorded sessions