Reproducibility of dietary and other data from a self-administered questionnaire.

This study examined the reproducibility of information obtained from a questionnaire covering dietary and other information by comparing the answers to the questionnaire with answers to the same questions one year later. Answers from 191 men and 220 women, aged 40 to 65, in Fukuoka prefecture, Japan, were compared. The surveys were conducted in November 1989 and November 1990. In the second survey, 97.2% of the first respondents answered. The concordance between the two responses was high: the differences between the mean intakes from the two surveys were within 5% for 7 food items and within 15% for 14 of the 20 food items about which the respondents were asked. The intraclass correlation coefficients varied from 0.30 for eggs to 0.62 for milk, with 16 items greater than 0.4. The differences in reproducibility between the sexes and between the two age categories were not significant. Close values were also obtained for the estimated consumption of salt. Substantially high reproducibility was observed for items regarding drinking and smoking; most of the kappa statistics and intraclass correlation coefficients were between 0.5 and 0.9. The reproducibility of individual food items was comparable to or better than that reported in other studies. The results of the present study thus indicate that the self-administered semiquantitative food intake questionnaire used for our cross-sectional study is useful for epidemiologic studies assessing the association between diet and various diseases. In particular, the present questionnaire is highly dependable for estimating the overall intake of foods at the group level.


Introduction
It is well known that dietary habits, smoking, and drinking are strongly related to the occurrence of heart disease, cerebrovascular accidents, cancer, diabetes, and other serious diseases. To investigate the association between diet and these diseases, dietary habits must be assessed precisely. We developed a self-administered questionnaire for a cross-sectional study in six prefectures in Japan to clarify some of the risk factors for cancer, especially gastric cancer. The questionnaire was designed primarily to assess dietary habits, as well as drinking, smoking, and other practices. The present reproducibility study evaluated the reliability of the information obtained with that questionnaire. High reproducibility would show that the data from our cross-sectional study are reliable, and would give a general indication of whether dietary assessment with the questionnaire is useful for epidemiologic studies of the relationship between diet and cancer, and possibly between diet and some other diseases. This is particularly important for Japan, because only a few studies on the reliability of questionnaire information have been reported in this country.

Subjects and Methods
The subjects consisted of 411 residents (191 men and 220 women) of Fukuoka prefecture, Japan. The age and sex distribution of the subjects is shown in Table 1. The survey was designed primarily to assess the lifestyle of adults between 40 and 65 years old, who were randomly selected from residents in the districts of three health centers in Fukuoka prefecture. We questioned them about their lifestyle in November 1989 and again in November 1990 using the same questionnaire. In the second survey, 97.2% of the first respondents answered the questionnaire. The 20 food items surveyed are shown in Table 2. We aimed for a comprehensive investigation of dietary habits and included all food items reported to be associated with gastric cancer. For each food item, we determined the frequency of consumption by asking how many times the item was consumed within a given interval. The response categories were no consumption, 1 to 3 times a month, 1 to 3 times a week, 4 to 6 times a week, daily, twice a day, and 3 times a day, and the respondents were asked to circle the appropriate box. Next, we asked the respondents about portion size. To estimate portion size accurately, we showed participants a color photograph of the actual portion of each food. The respondents could indicate that the amount consumed was about half the size shown in the photograph, 70 to 80%, the same, or 1.5 times as much. If their consumption did not match any of these choices, the respondents were asked to specify how large their portion was compared with the portion shown in the photograph. The consumption of each food item was then estimated by multiplying the frequency by the portion size.
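The intake calculation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the conversion of each frequency category to times per day (range midpoints), the value 0.75 for "70 to 80%", the photographed portion weights, and all function and variable names are hypothetical assumptions.

```python
"""Sketch: estimated intake of one food item = frequency x portion size.

Hypothetical assumptions: frequency categories are converted to times per day
using the midpoint of each range, "70 to 80%" is taken as 0.75, and the weight
of the photographed portion (grams) is supplied by the caller.
"""

# Hypothetical conversion of the frequency categories to times per day.
FREQ_PER_DAY = {
    "none": 0.0,
    "1-3 times/month": 2 / 30,
    "1-3 times/week": 2 / 7,
    "4-6 times/week": 5 / 7,
    "daily": 1.0,
    "twice/day": 2.0,
    "3 times/day": 3.0,
}

# Portion size relative to the color photograph.
PORTION_FACTOR = {
    "about half": 0.5,
    "70-80%": 0.75,
    "same": 1.0,
    "1.5 times": 1.5,
}


def daily_intake(freq, portion, photo_grams, custom_factor=None):
    """Return estimated grams per day for one food item.

    custom_factor holds a respondent's own ratio to the photographed portion,
    used when none of the listed portion choices fit.
    """
    factor = PORTION_FACTOR[portion] if custom_factor is None else custom_factor
    return FREQ_PER_DAY[freq] * factor * photo_grams


# Hypothetical 200 g photographed portion, consumed daily.
print(daily_intake("daily", "same", 200.0))           # 200.0
print(daily_intake("twice/day", "1.5 times", 100.0))  # 300.0
```

The midpoint conversion is only one reasonable choice; any fixed mapping from category to times per day would fit the same frequency-times-portion structure.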
Because the distribution of intake values for most food items was skewed toward higher values, all food intake values were log10 transformed before analysis. Some respondents reported no consumption of uncommon food items, so 1.0 was added to all values before transformation (the logarithm of zero is undefined). Although the resulting distributions were not strictly normal, they did not include any serious outliers. The mean food intakes from the two questionnaires were then compared on the log-transformed scale. The homogeneity of the distributions from the two administrations was evaluated by the Wilcoxon rank-sum test for each food item. We computed the intraclass correlation coefficient as a measure of reproducibility for the dietary consumption calculated from the two questionnaires. The coefficient quantifies the extent of within-subject agreement relative to the variation between subjects (1). For all items other than dietary intake, we calculated the proportion of agreement. Intraclass correlation coefficients were also computed for ordinal data, such as the amount of alcohol consumed and the number of cigarettes smoked. The reproducibility of categorical data was evaluated by the kappa statistic, which is the proportion of agreement after agreement expected by chance has been removed (2).
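A minimal sketch of the log10(x + 1) transformation and an intraclass correlation coefficient for two administrations per subject. The text cites reference (1) but does not state which ICC model was used; the one-way random-effects form below is an assumption, and the function names and data are hypothetical.

```python
import math


def log_transform(values):
    """log10(x + 1): the 1.0 offset keeps zero intakes defined (log10(1) = 0)."""
    return [math.log10(v + 1.0) for v in values]


def icc_oneway(first, second):
    """One-way random-effects ICC for two administrations per subject.

    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW), with k = 2 measurements,
    MSB = between-subject mean square, MSW = within-subject mean square.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least 2 subjects")
    n, k = len(first), 2
    subject_means = [(a + b) / 2 for a, b in zip(first, second)]
    grand_mean = sum(subject_means) / n
    ss_between = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_within = sum(
        (a - m) ** 2 + (b - m) ** 2
        for a, b, m in zip(first, second, subject_means)
    )
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical grams/day for five respondents in 1989 and 1990.
survey_1989 = [0.0, 20.0, 50.0, 100.0, 200.0]
survey_1990 = [5.0, 15.0, 60.0, 90.0, 180.0]
print(round(icc_oneway(log_transform(survey_1989), log_transform(survey_1990)), 3))
```

Unlike Pearson's r, this form of the coefficient is lowered by a systematic shift between the two surveys (for example, every answer one unit higher the second time), which is why it suits a test-retest design.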
The following guidelines were used to evaluate kappa: 0.20 or below indicated slight agreement, 0.21 to 0.40 fair, 0.41 to 0.60 moderate, 0.61 to 0.80 substantial, and 0.81 to 1.00 almost perfect (3). Although there is still some controversy as to whether this guideline, along with a similar one for the range of acceptability (4), is adequately informative (5), we used it because there seems to be no appropriate alternative. Because the intraclass correlation coefficient is analogous to kappa (6), the same guideline was also used for the intraclass correlation coefficient. Because the purpose of this study was to quantify reproducibility rather than to test the null hypothesis of agreement by chance, p values are not presented.
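The proportion of agreement, Cohen's kappa, and the interpretation guideline above can be sketched as follows. The respondents' answers and the function names are hypothetical; negative kappa values (agreement worse than chance) simply fall into the lowest category in this sketch.

```python
from collections import Counter


def agreement_and_kappa(first, second):
    """Return (proportion of agreement, Cohen's kappa) for paired categorical answers."""
    n = len(first)
    p_observed = sum(a == b for a, b in zip(first, second)) / n
    counts_1, counts_2 = Counter(first), Counter(second)
    # Agreement expected by chance, from the marginal distribution of each survey.
    categories = counts_1.keys() | counts_2.keys()
    p_expected = sum(counts_1[c] * counts_2[c] for c in categories) / n**2
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return p_observed, (p_observed - p_expected) / (1.0 - p_expected)


def interpret(coefficient):
    """Guideline used in the text (3); also applied to intraclass correlations."""
    if coefficient <= 0.20:
        return "slight"
    if coefficient <= 0.40:
        return "fair"
    if coefficient <= 0.60:
        return "moderate"
    if coefficient <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical drinking status reported in the two surveys.
first = ["yes", "yes", "no", "no"]
second = ["yes", "no", "no", "no"]
p_o, kappa = agreement_and_kappa(first, second)
print(p_o, kappa, interpret(kappa))  # 0.75 0.5 moderate
```

Here 3 of 4 answers agree, but the marginals alone would produce 50% agreement, so kappa credits only half of the remaining possible agreement.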
The homogeneity of the intraclass correlation coefficients across groups of subjects was tested with the t-approximation approach described by Kraemer (7). The statistical analyses were conducted with SAS (SAS Institute, Cary, NC) at the Computer Center, Kyushu University. The significance level for all statistical tests was 0.05.
Environmental Health Perspectives

Results
Table 2 shows the means, their 95% confidence intervals, the ratios of means, and the intraclass correlation coefficients for the intakes of 20 food items and for the consumption of salt, which was estimated in two ways. The two questionnaire surveys produced closely similar results. The differences in the mean intakes from the two surveys were within 5% for 7 food items and within 15% for 14 items (Table 2 shows the ratio of the two means). The intakes from the two surveys, compared with the Wilcoxon rank-sum test, were not significantly different for 16 items (two-sided test; significance level 0.05). The intraclass correlation coefficients varied from 0.30 for eggs to 0.62 for milk, with 16 items greater than 0.4. The mean (± SE) of the intraclass correlation coefficients was 0.47 (± 0.02). The correlation coefficients indicated moderate reproducibility for most food items.
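The comparison of mean intakes can be illustrated with a short sketch. It assumes that the ratio is the second-survey mean divided by the first-survey mean and that "within 5%" means |ratio - 1| <= 0.05; the text does not specify either detail. The confidence interval uses a simple normal approximation, which may differ from the method behind Table 2. All names and values are hypothetical.

```python
import math
import statistics


def mean_ci95(values):
    """Mean and normal-approximation 95% CI (mean +/- 1.96 * SE)."""
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return m, m - 1.96 * se, m + 1.96 * se


def ratio_of_means(first, second):
    """Second-survey mean divided by first-survey mean (direction assumed)."""
    return statistics.mean(second) / statistics.mean(first)


def count_within(ratios, tolerance):
    """Number of items whose means differ by at most `tolerance` (0.05 for 5%)."""
    return sum(abs(r - 1.0) <= tolerance for r in ratios)


# Hypothetical ratios of means for four food items.
ratios = [1.03, 0.96, 1.10, 0.80]
print(count_within(ratios, 0.05), count_within(ratios, 0.15))  # 2 3
```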
We estimated the consumption of salt in two ways. Estimation 1 represented the intake of salt from all salted foods listed on the questionnaire, whereas estimation 2 was based on highly salted foods only, such as strongly pickled vegetables and salted fish. They are shown as Salt-1 and Salt-2 in Table 2. The differences in the mean intakes between the two surveys were 8% for Salt-1 and 4% for Salt-2, and their intraclass correlation coefficients were 0.55 and 0.51, respectively. Thus, the two surveys provided similar values for the consumption of salt, and the reproducibility was moderate.
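The two salt estimates differ only in the set of foods summed over. A minimal sketch follows, in which the salt content per gram of each food, the food list, and the intakes are hypothetical placeholders rather than the values the authors used.

```python
# Hypothetical salt content in grams of salt per gram of food; illustrative only.
SALT_PER_GRAM = {
    "strongly pickled vegetables": 0.05,
    "salted fish": 0.05,
    "other salted food": 0.01,
}
# Subset used for the second estimate (highly salted foods only).
HIGHLY_SALTED = {"strongly pickled vegetables", "salted fish"}


def salt_estimate(intake_grams, foods):
    """Sum intake (g/day) x salt content over the chosen set of foods."""
    return sum(intake_grams.get(food, 0.0) * SALT_PER_GRAM[food] for food in foods)


# Hypothetical daily intakes (g/day) for one respondent.
intake = {"strongly pickled vegetables": 10.0, "salted fish": 20.0, "other salted food": 100.0}
salt_1 = salt_estimate(intake, SALT_PER_GRAM)  # all salted foods
salt_2 = salt_estimate(intake, HIGHLY_SALTED)  # highly salted foods only
print(round(salt_1, 2), round(salt_2, 2))  # 2.5 1.5
```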
It is also of interest to see whether dietary reproducibility differs by gender. The means and intraclass correlation coefficients for the intake of the 20 food items and for the estimated consumption of salt are shown separately for men and women in Table 3. The reproducibility of the mean food intake was similar between men and women. Of the 20 food items, the difference in the mean intake between the two surveys was less than 15% for 12 food items in men and for 13 food items in women. The gender difference in the reproducibility of the mean salt intake was also small: for both men and women, the mean salt intakes from the first and second questionnaires were almost the same.
The difference in the intraclass correlation coefficients between the sexes was also small for most food items. The intraclass correlation coefficients of 13 items were greater than 0.4 in men, while those of 15 items were greater than 0.4 in women. For the intake of all food items and salt, no significant difference in the intraclass correlation coefficients was found between men and women. In addition, the mean of the intraclass correlation coefficients for the 20 food items did not differ significantly between the sexes (paired t-test).
In Table 4, the means of the food and salt intakes and the intraclass correlation coefficients from the repeated questionnaires are shown separately for the age groups 40 to 49 and 50 to 65. The reproducibility of the mean food intakes was similar between the two age groups. Of the 20 food items, the difference in the mean intakes from the two surveys was less than 15% for 13 food items in the 40 to 49 group and for 15 food items in the 50 to 65 group. For both salt estimates, the difference in the mean consumption was 12 to 13% for the 40 to 49 group, whereas it was less than 5% for the 50 to 65 group.
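The paired t-test used to compare the per-item intraclass correlation coefficients between two groups can be sketched as follows. The ICC values are hypothetical; 2.093 is the standard two-sided 5% critical value of t with 19 degrees of freedom, which applies when 20 food items are paired.

```python
import math
import statistics


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for matched values x[i], y[i]."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1


# Two-sided 5% critical value of t with 19 df (20 food items).
T_CRIT_19DF = 2.093

# Hypothetical per-item ICCs for men and women, paired by food item.
icc_men = [0.45, 0.52, 0.38, 0.60, 0.41, 0.47, 0.33, 0.55, 0.49, 0.44,
           0.50, 0.36, 0.58, 0.42, 0.46, 0.39, 0.53, 0.48, 0.31, 0.57]
icc_women = [0.48, 0.50, 0.41, 0.58, 0.44, 0.45, 0.35, 0.52, 0.51, 0.47,
             0.49, 0.40, 0.61, 0.43, 0.44, 0.42, 0.50, 0.52, 0.30, 0.60]
t, df = paired_t(icc_men, icc_women)
print(df, round(t, 3), abs(t) > T_CRIT_19DF)
```

Pairing by food item removes the large item-to-item differences in reproducibility, so the test looks only at the within-item difference between the two groups.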
The intraclass correlation coefficients were similar for the two age groups for most food items. The intraclass correlation coefficients of 14 items were greater than 0.4 in the 40 to 49 group, while those of 16 items were greater than 0.4 in the 50 to 65 group. For some food items, however, the intraclass correlation coefficient differed significantly between the two age groups. The intraclass correlation coefficients of green vegetables and fresh fish were significantly higher for the 40 to 49 group than for the 50 to 65 group (p < 0.05), whereas the coefficient for milk was significantly higher for the 50 to 65 group than for the 40 to 49 group (p < 0.01). The difference in the mean intraclass correlation coefficients between the age groups was not significant (p > 0.05, paired t-test). On the whole, the reproducibility was similar between the two age groups.
Table 5 shows the reproducibility of the items concerning drinking alcohol and smoking, presented separately for men and women. The proportion of agreement for drinking was approximately 0.9, and its kappa was greater than 0.7 for both men and women. For most types of alcoholic beverages, the kappa values were between 0.4 and 0.7. The intraclass correlation coefficients for the amount of drinking and the age at which individuals first started to drink ranged from 0.57 to 0.72. For smoking, the kappa values were 0.80 and 0.89 for men and women, respectively. These data demonstrate substantial to almost perfect reproducibility.
Volume 102, Supplement 8, November 1994
Reproducibility was also analyzed for other miscellaneous items, as shown in Table 6. The items on blood transfusion showed moderate to substantial reproducibility: the kappa for previous blood transfusions received was 0.74, while the intraclass correlation coefficient for the age at which a blood transfusion was received was 0.63.

Discussion
The precise measurement of dietary habits is by no means an easy task. Therefore, many investigators have examined reproducibility to check the reliability of dietary information obtained by various assessment methods (8-21). Unfortunately, only a few studies on this subject have been reported in Japan (10,22), and these studies were greatly limited in that the dietary questions were very simple and usually only the frequency of dietary intake was assessed. The present study is probably the first reproducibility study conducted in Japan using a comprehensive, semiquantitative questionnaire. Hankin et al. (11,12) examined reproducibility among Japanese-Americans in Hawaii; however, the dietary habits of Japanese-Americans may be substantially different from those of Japanese people living in Japan.
In this study, we assumed that the dietary habits in Fukuoka prefecture remained stable between the two questionnaire surveys. Although dietary habits were not surveyed specifically in Fukuoka prefecture from 1989 to 1990, the Ministry of Welfare in Japan surveys the dietary and nutritional intake of the Japanese population annually on a nationwide scale in the National Nutritional Survey (NNS). According to the NNS (23,24), the mean (± SE) ratio of 1989 and 1990 consumption for food items common to those used in this study was 0.99 (± 0.08) in the northern Kyushu district, where the study took place. Thus, the respondents' dietary habits appear to have been essentially unchanged over the 2 years; any change was probably very slight.
The present study was concerned primarily with the reproducibility of individual food intake; similar studies have been reported elsewhere (8-10,13,19). These studies used a variety of methods, such as interviewing the subjects or using self-administered questionnaires. Some questionnaires used color pictures to show portion size, and some merely stated a unit size. For the sake of comparison, we computed Pearson's r, Spearman's ρ, and Kendall's τ whenever necessary.
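The three correlation coefficients used in the comparisons that follow can be computed from the two administrations as in this sketch. The text does not state which variant of Kendall's τ was used; τ-b, which adjusts for ties, is assumed here, and tied values receive average ranks for Spearman's ρ. Function names and data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson's r between paired lists x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values given the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b, which adjusts the denominator for tied pairs."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both: excluded from both denominator terms
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    n_pairs = concordant + discordant
    return (concordant - discordant) / math.sqrt(
        (n_pairs + tied_x_only) * (n_pairs + tied_y_only)
    )


# Hypothetical log-transformed intakes for six respondents.
first = [0.0, 1.3, 1.7, 2.0, 2.3, 1.7]
second = [0.8, 1.2, 1.8, 2.0, 2.3, 1.5]
print(round(pearson(first, second), 2), round(spearman(first, second), 2),
      round(kendall_tau_b(first, second), 2))
```

Kendall's τ is typically smaller in magnitude than Spearman's ρ for the same data, so coefficients from different studies are compared only within the same statistic.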
Nomura et al. (14) reinterviewed 109 men using the same questionnaire after a 6-month interval and 111 men after a 2-year interval. For 19 food items common to those used in our study, the means (± SE) of Kendall's τ were 0.32 (± 0.04) for the 6-month group and 0.29 (± 0.03) for the 2-year group, whereas the mean Kendall's τ in the present study was 0.39 (± 0.02).
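The "mean (± SE)" summaries used throughout these comparisons are the mean of the per-item coefficients and its standard error (SD divided by the square root of the number of items). A sketch with hypothetical coefficients:

```python
import math
import statistics


def mean_se(values):
    """Mean and standard error (SD / sqrt(n)) of per-item coefficients."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical Kendall's tau values for five food items.
taus = [0.35, 0.42, 0.38, 0.44, 0.36]
m, se = mean_se(taus)
print(f"{m:.2f} (± {se:.2f})")
```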
Fukao et al. (10) sent the same self-administered questionnaire on food consumption to 3157 men and women after an interval of 5 months. For the 19 food items that overlapped with ours, the means (± SE) of the proportion of exact agreement and of kappa were 0.43 (± 0.02) and 0.20 (± 0.02), respectively. Because Fukao et al. surveyed only the frequency of food intake, we also calculated the proportion of exact agreement and kappa for the frequency of consumption. In the present study, the means (± SE) of the proportion of exact agreement and of kappa were 0.46 (± 0.02) and 0.25 (± 0.02), respectively, indicating better agreement than in their study.
Salvini et al. (19) evaluated the reproducibility of a self-administered food frequency questionnaire administered twice at a 12-month interval to 173 women from the Nurses' Health Study. In their questionnaire, a standard serving size was specified for each food. For the 24 food items that overlapped with ours, the mean (± SE) Pearson's r was 0.51 (± 0.02), indicating slightly better agreement than ours, 0.47 (± 0.02). However, the subjects in the study by Salvini et al. were all nurses and thus well educated, and this factor may be associated with the higher reproducibility. Colditz et al. (9) reported the reproducibility of a semiquantitative food frequency questionnaire completed by participants in the Nurses' Health Study and repeated 9 months later. For 24 food items common to those used in the present study, the mean (± SE) Pearson's r was 0.52 (± 0.02), slightly higher than ours, 0.49 (± 0.02).
Bueno de Mesquita et al. (8) tested the reproducibility of a food frequency questionnaire used in a case-control study. They conducted two studies 12 months apart, each including 63 men and 54 women. The mean (± SE) Pearson's r for the 18 food items overlapping with our study was 0.64 (± 0.04) and 0.64 (± 0.03) in the two studies. These coefficients are much higher than ours, probably because the subjects were interviewed by the same trained dietitian and were assisted by other household members.
Jacobsen and Bønaa (13) assessed the reproducibility of a self-administered questionnaire by comparing answers from 201 men and women to questions in two surveys approximately 1 year apart. For six food items, the reproducibility expressed by Spearman's ρ was 0.61 and the proportion of agreement in the consumption category was 0.63, values comparable to ours: in our study, the means (± SE) of Spearman's ρ and the proportion of exact agreement were 0.49 (± 0.02) and 0.46, respectively. The men and women in Jacobsen and Bønaa's study were drawn from the general population, were willing to attend a health screening, and were later selected because of previously unrecognized mild hypertension and a higher-than-desirable total serum cholesterol value. These selection factors may also be associated with higher reproducibility.
Overall, the reproducibility obtained by our study was better than or comparable to other studies, even though the subjects were randomly selected from the general population. Color photographs of actual portion sizes used in our questionnaire probably contributed to the high reproducibility.
One limitation of this study is that nutrient intake was not estimated, for practical reasons. Only salt intake was estimated, and the intraclass correlation coefficients for the two estimates were 0.55 and 0.51, respectively. Rimm et al. (18) and Pietinen et al. (15) reported that the reproducibility of sodium intake was 0.72 and 0.70, respectively. However, their studies are not comparable to ours, because the former was conducted on selected subjects (male health professionals), whereas the latter examined the reproducibility of a questionnaire administered on three different occasions.
Among the studies dealing with the reproducibility of nutrient intake (8,11,12,18,25), the intraclass correlation coefficients, Pearson's r, and Spearman's ρ ranged from 0.12 to 0.87, and most fell between 0.5 and 0.6. As stated, our study was not intended to estimate nutrient intake. For the two estimates of salt intake in our study, the intraclass correlation coefficients, Pearson's r, and Spearman's ρ ranged from 0.49 to 0.56.
Several other studies have investigated the reproducibility of responses on drinking alcoholic beverages and smoking. Kelly et al. (26) reported the reliability, expressed by kappa, of answers to questions on the current status of cigarette smoking and alcohol drinking; the kappa values were 0.81 for cigarette smoking and 0.47 for alcohol drinking. The intraclass correlation coefficient was 0.69 for the number of cigarettes smoked per day and 0.86 for years since stopping smoking. Bueno de Mesquita et al. (8) reported that the reproducibility (Pearson's r) of the daily intake of alcoholic beverages was 0.96 and 0.78 in their two studies. In the present study, the reproducibility (kappa) of drinking was 0.77 for men and 0.73 for women, and that of smoking was 0.80 for men and 0.89 for women. The intraclass correlation coefficients for the amount of drinking, the number of cigarettes smoked, and the year the subject last smoked ranged from 0.57 to 0.82. These high values are comparable to those of the studies above.
The reproducibility of the question on previous blood transfusion was also evaluated in the current study. Blood transfusions have been performed only rarely in Japan in recent years, so even if some subjects received a blood transfusion after the first survey, the number would be very small. Other items, such as working status, type of job, and smoking and drinking history, may have changed after the first survey. Thus, the reproducibility reported here for these items should be regarded as a minimum estimate.
In summary, the mean values of food intake obtained by the two questionnaire surveys were close, despite the 1-year interval and the self-administered nature of the questionnaire. The intraclass correlation coefficients showed moderate agreement for individual food intakes and estimated salt intakes. These results indicate that the semiquantitative food intake questionnaire used for our cross-sectional study is useful for assessing dietary intake and, in particular, for evaluating the dietary habits of a group. For most items regarding drinking, smoking, and blood transfusion, the reproducibility was substantially high, and these items were therefore considered reliable for use in the epidemiologic survey.