Overstating the Consequences: Peipins et al.'s Response

Peipins et al. (2003) described the study conducted by the Agency for Toxic Substances and Disease Registry (ATSDR 2001, 2002a) to analyze the association between the prevalence of radiographic abnormalities and asbestos exposure pathways for residents of Libby, Montana. Although Peipins et al. presented many detailed results, they failed to explicitly state the obvious conclusion of their analysis: excess risk of asbestos-related disease for Libby residents is a consequence of occupational exposure, and risk associated with low-level exposure is negligible. This finding is extremely important for guiding future public health assessments of exposure to vermiculite from the Libby mine and exposure to amphibole asbestos in general.
Peipins et al. (2003) made a clear case for the importance of occupational exposure. They reported the following statistically significant factors for predicting pleural abnormalities: being a former mine worker; being older; having been a household contact of a mine worker; and being male. These results associate pleural abnormalities with high occupational exposure groups: mine workers and household contacts of mine workers. Only one environmental exposure pathway was statistically significant: "played in vermiculite pile," which may involve exposures as high as occupational exposures.
The ATSDR data (ATSDR 2001, 2002a) indicate that 17.8% of the 6,668 subjects with X-ray films had pleural abnormalities. This percentage appears large in comparison to 6.7% of subjects with pleural abnormalities in the internal control group (no exposure). However, occupational exposures inflate the difference. I reanalyzed the data by forming three exposure groups: mine workers, residents with other occupational or domestic exposure, and residents with no occupational or domestic exposure (the "environmental exposure only" group) (Price B. In press). Pleural abnormalities were recorded for 51% of mine workers, 19.9% of residents with other occupational or domestic exposure, and 9.1% of residents with no occupational or domestic exposure. These results again demonstrate the importance of occupational exposure for pleural abnormalities. The percentage for the "environmental exposure only" group is close to the percentage for the internal controls. Other confounding factors mentioned by Peipins et al. (2003) could further reduce the difference. One of those factors is the false positive bias due to subpleural fat, which would be expected to be greater in the environmental exposure group than in the mine worker group (Price B. In press).
Peipins et al. (2003) implied that asbestos exposure was high in Libby:

However, the cited references do not contain data for samples collected in 1975. Dixon et al. (1985) reported that a total of four stationary samples were collected, each for 2 hr (Atkinson et al. 1982). These samples, analyzed by two laboratories, produced the following measurements, respectively: 0.08 and 0.50 fiber/cm³, 0.10 and 0.02 fiber/cm³, 0.03 fiber/cm³ and none detected, and 0.03 and 0.02 fiber/cm³. Comparison of these 2-hr sampling results to OSHA's 8-hr limit is not meaningful. Also, because the measurements vary considerably between laboratories, the largest measurement is not a reliable estimate of exposure in Libby.
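The point about sampling duration can be illustrated with a short calculation. The sketch below is illustrative only: it uses the 2-hr values quoted above and shows that the implied 8-hr time-weighted average depends entirely on what is assumed for the six unmeasured hours. The function name `twa_8hr` and the two assumed scenarios (zero concentration, or the same concentration, for the rest of the period) are assumptions introduced here, not part of the cited reports.

```python
# Two-hour stationary sample results quoted above (fiber/cm^3),
# one pair per sample, one value per laboratory. None = "none detected".
samples = [(0.08, 0.50), (0.10, 0.02), (0.03, None), (0.03, 0.02)]

def twa_8hr(measured, measured_hours=2.0, rest_of_period=0.0, period_hours=8.0):
    """Time-weighted average over period_hours, given one measured interval
    and an ASSUMED concentration for the unmeasured remainder."""
    unmeasured_hours = period_hours - measured_hours
    total = measured * measured_hours + rest_of_period * unmeasured_hours
    return total / period_hours

# Largest single reported value across both laboratories.
highest = max(v for pair in samples for v in pair if v is not None)

# Assumption A: nothing in the air for the remaining 6 hr.
low_estimate = twa_8hr(highest, rest_of_period=0.0)
# Assumption B: the same concentration persisted for the remaining 6 hr.
high_estimate = twa_8hr(highest, rest_of_period=highest)

print(f"highest 2-hr value: {highest} fiber/cm^3")
print(f"8-hr TWA under assumption A: {low_estimate:.3f} fiber/cm^3")
print(f"8-hr TWA under assumption B: {high_estimate:.3f} fiber/cm^3")
```

Under these two assumptions the same 2-hr measurement maps to 8-hr averages that differ by a factor of four, which is why a 2-hr result cannot be compared directly with an 8-hr limit without further information.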
Recently, the U.S. Environmental Protection Agency (U.S. EPA 2001, 2002) measured airborne asbestos levels in Libby and estimated exposure for Libby residents engaged in activities that disturbed household dust, vermiculite attic insulation, and soil. The results, stated as lifetime average daily exposure, ranged from 0.00007 to 0.005 fiber/cm³. Risk levels calculated by the U.S. EPA for these exposures were between 1 × 10⁻⁶ and 1 × 10⁻⁴, the acceptable range defined by the U.S. EPA (1989) for regulatory decisions.
The study described by Peipins et al. (2003) is one of a number of studies addressing asbestos exposure and risk in Libby (Amandus and Wheeler 1987; ATSDR 2000, 2001, 2002a, 2002b; McDonald 2001; McDonald et al. 1986; McDonald JC. Unpublished data; U.S. EPA 2001, 2002). The perception that mine workers' disease rates apply to all Libby residents gained credibility through the ATSDR (2000) asbestosis mortality study: "… mortality in Libby resulting from asbestosis was approximately 40-60 times higher than expected." Later, the ATSDR (2002b) explained that the excess mortality was a consequence of occupational exposure. As noted by Peipins et al. (2003), the ATSDR observed a total of 12 asbestosis deaths: 11 males who were previously employed at the Libby mine and 1 female who was a household contact of a mine worker.
The results presented by Peipins et al. (2003) and results from other studies of asbestos in Libby indicate that occupational exposure, not low-level environmental exposure, is the most significant risk factor for asbestos-related disease. Peipins et al. should have stated that conclusion explicitly, taking the first step toward correcting misperceptions about asbestos disease in Libby and, more generally, the risk of disease associated with low-level exposures to amphibole asbestos.
The author declares he has no competing financial interests.

We appreciate Price's interest in our article. We stated clearly that being a former W.R. Grace (WRG) worker was a significant risk factor for both pleural and interstitial abnormalities. We also noted that only age was more strongly associated with these outcomes in multivariate analyses and that these results were not unexpected. However, we disagree with Price's statement that the obvious conclusion of our analysis is that risk associated with low-level environmental exposure is negligible. Such a conclusion ignores key results. For example, we found that playing in the vermiculite piles and longer duration of residence in Libby, Montana, were associated with pleural abnormalities, even after controlling for occupational and domestic exposures. We also found that the prevalence of pleural abnormalities increased with increasing number of exposure pathways, even after we removed WRG workers from the analysis. This suggests a cumulative effect from multiple exposures that exclude working in the mine.
Price incorrectly labels our "no-apparent-exposure" group as an "internal control group." We did not have an internal no-exposure group. Our no-apparent-exposure group consisted of participants who responded "no" to the exposure pathways listed in the questionnaire and who were likely exposed via ambient air and other pathways not assessed by our screening questionnaire. The rate of 6.7% for the no-apparent-exposure group in our analysis and the rate of 9.1% given by Price in his letter are considerably higher than the prevalence rates of pleural abnormalities found in published studies of other nonoccupationally exposed populations in the United States, which range from 0.2% among blue-collar workers in North Carolina (Castellan et al. 1985) to 2.3% among patients at Veterans Affairs hospitals in New Jersey (Miller and Zurlo 1996). Of note, these studies did not exclude family contacts of workers or domestic exposures (Castellan et al. 1985; Anderson et al. 1979).
When assessing subpleural fat as a confounding factor, we found former WRG workers to have higher body mass indexes (BMIs) than those who were not former WRG workers. We controlled for subpleural fat by including BMI in both our multivariate analyses and our pathways analyses. Therefore, the associations between environmental exposures, as well as occupational and domestic exposure, and pleural abnormalities remained when controlled for BMI.
In regard to Price's comments on past exposures in Libby, sampling performed by WRG in 1975 showed markedly elevated ambient air asbestos concentrations in downtown Libby [U.S. Environmental Protection Agency (EPA) 2002]. These findings are consistent with the limited ambient air samples collected by the U.S. EPA (Dixon et al. 1985; Atkinson et al. 1982). Although Price points out that the variation in detectable laboratory results ranged from 0.02 to 0.5 fiber/cm³, depending on the laboratory, it is clear that the ambient air concentrations in Libby easily approached, if not exceeded, occupational 8-hr limits. In a cross-sectional study of workers at an Ohio fertilizer plant that processed vermiculite from Libby, Montana, Lockey et al. (1984) found that workers with daily time-weighted-average exposures of 0.031-0.415 fiber/cm³, similar to the ambient air concentrations reported in Libby, had significantly elevated radiographic pleural changes and pleuritic chest pain.
Price asserts that Agency for Toxic Substances and Disease Registry (ATSDR) mortality studies conducted for the Libby area have created a false perception of the community's asbestos-related mortality experience. Results from ATSDR's mortality study (ATSDR 2002) revealed significantly elevated rates of mesothelioma, asbestosis, and lung cancer when compared with the Montana and U.S. populations. Workers were included in the determination of asbestos-related mortality in Libby, as is done as a matter of practice throughout the nation to determine comparative standardized mortality rates. Nevertheless, there were several deaths found that did not appear to be occupationally related. Notably, one of the three mesothelioma cases identified for inclusion in our study did not occur among former mine workers (ATSDR 2002). Additionally, Lincoln County, Montana, had the highest age-adjusted asbestosis mortality rate in the United States for 1988-1997, even when compared to other counties that contain large asbestos-exposed workforces (Castellan R. Unpublished data).
On the basis of our results, we conclude that both occupational and environmental risk factors are important predictors of asbestos-related radiographic abnormalities in this community. We thank Price for his comments and hope that this letter provides additional insights into these issues.

Overstating the Consequences of Radiographic Abnormalities
"Radiographic Abnormalities and Exposure to Asbestos-Contaminated Vermiculite in the Community of Libby, Montana, USA" by Peipins et al. (2003) is the first journal publication by the Agency for Toxic Substances and Disease Registry (ATSDR) of the results of their multiyear medical testing program of Libby residents. Peipins et al.'s (2003) conclusion that 18% of study participants had pleural abnormalities has received wide attention and has led to understandable concern among Libby residents and health professionals. We, as principals of Health Network America (HNA) and administrators of the Libby Medical Plan (LMP), are in the unique position of having participated in the peer review of applicants and members of the LMP. The LMP is a health benefit program for the people who lived in and around Libby and developed an asbestos-related condition. The peer reviewers include two board-certified radiologists who specialize in chest radiography and/or pneumoconiosis and are certified B-readers, and a third board-certified radiologist who specializes in interpretation of pleural disease on chest computed tomography (CT) scans. Although the review process is ongoing, some of our preliminary observations are relevant because they include many cases reported as "abnormal" by the ATSDR. In this letter we seek to communicate the more serious issues raised by this review process.
The basic ATSDR study design included a three-view chest X ray (posterior-anterior and left and right obliques) on all participants over 18 years of age, with the X rays to be interpreted by three B-readers. If two of the B-readers identified a pleural or interstitial abnormality, this would be regarded as a positive response by the ATSDR. If only one of the initial two B-readers identified an abnormality, the third B-reader also performed an interpretation.
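The reading rule just described can be written out as a small decision function. This is a minimal sketch of the rule as summarized in this letter, not the ATSDR's actual procedure or software; the function name `classify_film`, the boolean inputs, and the "not positive" label are assumptions chosen for illustration.

```python
def classify_film(reader1, reader2, reader3=None):
    """Apply the two-of-three rule to one participant's films.

    reader1, reader2, reader3 are booleans: True means that B-reader
    reported a pleural or interstitial abnormality. reader3 is only
    needed when the first two readers disagree.
    """
    if reader1 and reader2:
        return "positive"            # first two readers agree: abnormal
    if not reader1 and not reader2:
        return "not positive"        # first two readers agree: no abnormality
    if reader3 is None:
        raise ValueError("first two readers disagree; a third reading is needed")
    # Third reader breaks the tie: two of three readers must report an abnormality.
    return "positive" if reader3 else "not positive"

print(classify_film(True, True))           # both initial readers report an abnormality
print(classify_film(True, False, True))    # tie broken toward positive
print(classify_film(True, False, False))   # only one of three reports an abnormality
```

Note that the rule operates on a yes/no finding per reader. As written here it does not check whether the readers agree on the type or location of the abnormality.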
Several problems are raised by this design. First, all B-readers were aware that the X rays were of Libby residents, opening the door to reader bias. Second, the B-readers knew the sequence in which they reviewed the films. B-readers 1 and 2 knew they were always the first or second reader. B-reader 3, then, not only knew the X rays were from Libby but also had the reports of B-readers 1 and 2 prior to making his interpretation. Although control films or a control group would have been useful in resolving these issues, they were not used. Peipins et al. (2003) bolstered their findings with the assertion that "if two out of three B-readers indicated…," implying agreement between these readers. The HNA's review (HNA. Unpublished data) reveals that in many instances this was not the case. For example, if one reader found a potential pleural plaque on the right chest wall and the second reader recorded a possible pleural plaque on the left chest wall or diaphragm, this was apparently recorded as a positive finding of an abnormality by two B-readers. These discrepancies were not reported by the ATSDR to study participants or by Peipins et al. (2003).
Perhaps the most troubling issue in the study by Peipins et al. (2003) is the misreading of plaques or thickening when only pleural fat was present. While acknowledging the confounding influence of obesity and pleural fat in determining pleural disease, Peipins et al. (2003) failed to scientifically account for this. The ATSDR (2001) reported that the body mass indexes (BMIs) of 67% of the 7,307 participants were ≥ 25, indicating overweight, and 32% of these were obese (BMI of ≥ 30). We independently verified the true incidence of overweight and obesity by calculating BMIs of the LMP participants: 89% were overweight and 54% were obese (Table 1). Peipins et al. (2003) conceded that This difficulty was clearly present in their study. As part of the HNA review, X rays and CT scans of study participants were sent for peer review as described above. Although this review is continuing, it is clear that in many cases, participants coded as positive for pleural changes either had no visible asbestos-related changes on their X rays or they had subpleural fat that was misdiagnosed as pleural thickening or plaques. As a result of the study bias, nonconformity of the B-reader reports, and not accounting for high BMIs and pleural fat, the study by Peipins et al. (2003) markedly overstated the consequences of asbestos exposure in Libby, Montana.

The authors are employees of Health Network America, which is the plan administrator for the Libby Medical Plan. Medical evidence in support of applications is peer-reviewed by independent medical professionals to determine eligibility. Health Network America is paid a fee for its services that is independent of admission or nonadmission of applicants to the plan.

Overstating the Consequences: Peipins et al.'s Response
Flynn et al. raise concerns about differences in chest radiograph interpretation and bias among B-readers in our study. Furthermore, they suggest that an internal review by Health Network America (HNA) found misdiagnosis in many cases, with subpleural fat being miscoded as asbestos-related pleural changes.
In the Libby, Montana, screening program, all films were reviewed by at least two experienced B-readers, with a third B-reader functioning as a "tiebreaker" to settle disagreements. Participants were categorized as "positive" if two B-readers reported any pleural abnormality, and as "indeterminate" if only one B-reader reported an abnormality. For clinical purposes, participants in both the "positive" and the "indeterminate" categories were notified and encouraged to follow up with their personal physician. The three B-readers are respected experts in the field and are distinguished members of academic institutions. Furthermore, the design of the screening program employed is similar to that used in previous studies of asbestos-related radiographic abnormalities (Rogan et al. 1987, 2000).
Although the B-readers in the screening program were aware that the X rays were from the Libby area, they had no other information about occupational or environmental exposure pathways and were blinded to the identities of individuals who were screened. This is important because a clear exposure-response relationship was documented between the presence of pleural abnormalities and the number of exposure pathways reported by participants. Only 6.7% of the group with "no apparent exposure" had pleural abnormalities, compared to 10.8% of the group reporting one to three exposure pathways, 14.4% of the group reporting four to five exposure pathways, and 23.7% of the group reporting six or more pathways. This trend remained significant even after controlling for body mass index in the multivariate analysis.
We recently evaluated preliminary data from high-resolution chest computed tomography (CT) scans conducted on 353 Libby medical screening participants with "indeterminate" chest radiographs. Pleural abnormalities were identified in 98 persons (28% of all tested) whose chest radiographs were classified as "indeterminate" (i.e., only one out of three B-readers noted an abnormality). This suggests that the results of the Libby screening program may have actually underestimated the number of abnormal findings.
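The figures quoted above can be checked with simple arithmetic. The sketch below is illustrative only: it computes crude prevalence ratios relative to the "no apparent exposure" group and the CT proportion from the counts given in this response. These are unadjusted ratios of the reported percentages, not the multivariate estimates, and the variable names are assumptions introduced for the example.

```python
# Prevalence of pleural abnormalities (%) by number of reported exposure
# pathways, as quoted in the response above.
prevalence_pct = {
    "no apparent exposure": 6.7,
    "1-3 pathways": 10.8,
    "4-5 pathways": 14.4,
    "6+ pathways": 23.7,
}

baseline = prevalence_pct["no apparent exposure"]

# Crude (unadjusted) prevalence ratio of each group to the baseline group.
ratios = {group: round(p / baseline, 2) for group, p in prevalence_pct.items()}

# Does prevalence rise at every step as the pathway count increases?
values = list(prevalence_pct.values())
is_monotonic = all(a < b for a, b in zip(values, values[1:]))

# CT follow-up: 98 of 353 "indeterminate" participants had pleural abnormalities.
ct_fraction = 98 / 353

for group, ratio in ratios.items():
    print(f"{group}: {prevalence_pct[group]}% (ratio to baseline {ratio})")
print("monotonic increase:", is_monotonic)
print(f"CT-confirmed among indeterminate: {ct_fraction:.1%}")
```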
In summary, we disagree with Flynn et al.'s assertion that these findings can be explained by study bias issues related to chest radiograph B-readings or by misinterpretation of pleural fat. Rather, more recent high-resolution CT scanning results suggest that our estimates of pleural abnormalities in this population may be conservative and may actually underestimate the true prevalence of these abnormalities seen on chest radiographs.

Expert Witnesses Need to Know About the New Risks
Professionals should not provide expert witness or consulting services without first considering their personal liability. I would like to describe some recent legal changes and liability issues for expert witnesses and consultants to consider.
Times change, as does the law. Under recent court decisions, an expert witness whose work fails to meet professional standards may find himself/herself being hauled back into court as a defendant in a malpractice claim. If the expert knows of possible personal weaknesses, hiring additional experts to cover these weak areas is a feasible solution.
Historically, witnesses could not be sued for defamation on the basis of their testimony in court. The law granted this immunity to encourage candid testimony, or as one court put it, "to ensure that the path to truth is left as free and unobstructed as possible." In 1999, however, the Supreme Court of Pennsylvania carved a large exception out of the immunity doctrine for expert witnesses. In the case of LLMD of Michigan, Inc. v. Jackson-Cross Co. (1999), the court held that a client could sue his expert witness for negligence if the expert fails to exercise the care and skill common to his profession in forming his opinions on the client's case.
In 2002, the Supreme Court of Appeals in West Virginia took the issue further in Davis v. Wallace (2002) when it suggested that an expert witness could be sued for negligence not only by his own client but also by the opposing party against whom the expert testifies. Experts need to keep up with their fields of interest in the era of rapid new information. This can be done by reading, attending meetings, and talking with other experts. Publishing articles in peer-reviewed journals can help document and prove expertise to the judge. Keeping good records, which display knowledge through good writing skills, is critical; many decisions are based on written documents. Continuing education courses are available to improve the expert's practice and avoid liability problems. Also, all communications should be carefully edited before they are sent.
Scientists who serve as expert witnesses in federal lawsuits should be prepared to justify their theories and methods used in each case in a Daubert hearing. This is a layman's shorthand reference to the 1993 U.S. Supreme Court decision in Daubert v. Merrell Dow Pharmaceuticals, Inc. (1993). The Daubert case radically altered the rules for expert witnesses in federal court. For most of the twentieth century, federal courts were supposed to permit expert witnesses to offer scientific evidence at trial only if the scientific principles involved were generally accepted within their field. In the Daubert case, the Supreme Court held that the general acceptance test was too restrictive and was not compatible with modern rules of evidence.
On one hand, the Daubert case (Daubert v. Merrell Dow Pharmaceuticals, Inc. 1993) opened the courthouse door to novel scientific evidence. On the other hand, this case vested trial judges with a gatekeeping role to weed out "irrational pseudoscientific assertions," or to separate cutting-edge principles from "junk science" (Daubert v. Merrell Dow Pharmaceuticals, Inc. 1993). As part of their gatekeeping function, judges are required to make a preliminary assessment to determine whether the expert's theory or technique has been and can be tested, whether it has been subject to peer review and publication, and whether the theory or technique has a known or potential rate of error.
Courts have not established hard and fast rules on the manner in which they will decide Daubert objections. The process could be as simple as the lawyers submitting documents to the judge, or it could involve a form of mini-trial in which the court hears testimony from the expert and other experts who attempt to validate or discredit the expert's theories and practices at issue.
What should scientists do, knowing that their work as expert witnesses may be rejected by a judge and knowing that their own clients may sue them if the case turns out badly? Some experts may decide that it is not worth the liability exposure and may decline invitations to serve as an expert witness. But doing good, reliable, legally defensible work is the best way a scientist can survive a Daubert challenge.

Validation of (Q)SARs Models
From a practitioner's point of view (but not having been part of the workshop), I feel compelled to comment on "Summary of a Workshop on Regulatory Acceptance of (Q)SARs for Human Health and Environmental Endpoints" by .
There are a variety of quantitative structure-activity relationship [(Q)SAR] models available for a variety of purposes, and, as stated by , predictive power is a critical issue in evaluating any model. Regrettably, the accompanying articles by Eriksson et al. (2003) and  fail to mention any of the recent publications on the application of probabilistic neural networks (PNNs) for the modeling of toxicity endpoints. Highly effective PNN models have been demonstrated for the fathead minnow (Kaiser and Niculescu 1999), the waterflea Daphnia magna (Kaiser and Niculescu 2001a), the ciliate Tetrahymena pyriformis (Niculescu et al. 2000), the Microtox bacterium Vibrio fischeri (Kaiser and Niculescu. In press), and estrogen receptor binding affinity (Kaiser and Niculescu 2001b). Indeed, Moore et al. (2003) have shown that the fathead minnow PNN has superior performance in essentially all aspects when compared to the other methods. Other types of neural networks have similarly been shown to be robust and to provide optimal predictions (e.g., Burden and Winkler 1999). Furthermore, commercially available programs using PNN methodology have recently become available for the estimation of several toxicologic endpoints, such as fathead minnow 96-hr median lethal concentrations (LC50) (TerraBase, Inc. 2002), rat and mouse intravenous LD50 (TerraBase, Inc. 2003a), and estrogen receptor binding affinity (TerraBase, Inc. 2003b).
Although representativity or domain of a model are good concepts in theory, they are difficult to define or use in practice. Moreover, the statistical descriptors of a model's performance (such as goodness of fit, specificity, sensitivity, transparency, and similarity) are often misleading because the applied data set(s) for many (Q)SARs are narrow, skewed, or otherwise nonrepresentative of the chemical world existing in reality.
In most cases, a model user cannot ascertain whether a particular model may or may not be used for a particular compound and end point to be estimated. Without tests of comparative performance, this conundrum exists for users of most models. Even for quite similar compounds, model outputs can vary by several orders of magnitude between both models and measured values. For example, predictions of octanol/water partition coefficients (a physical property) for a small set of quite similar compounds by commonly used models show a large divergence of values (Vrakas et al. 2003). Therefore, the (only) proof of model accuracy is in the testing of each model's performance against a broad spectrum of measured data, which are not part of the training set of each model. In practice, this means that performance of a model should be the driving force for its acceptability in the regulatory world, not its statistics.
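The kind of comparative test argued for above can be sketched in a few lines. The example is a minimal illustration, not any published validation protocol: the function `external_validation`, the two "models," and all numbers are hypothetical and exist only to show the mechanics of scoring predictions against measured values that were not part of any model's training set.

```python
import math

def external_validation(observed, predicted):
    """Return (RMSE, mean absolute error) for predictions on an
    external test set, i.e., compounds not used to train the model."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal length")
    errors = [p - o for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return rmse, mae

# HYPOTHETICAL measured endpoint values (e.g., log-scaled toxicity)
# for four external compounds, and two hypothetical models' predictions.
measured = [1.0, 2.0, 3.0, 4.0]
model_a = [1.1, 1.9, 3.2, 3.8]
model_b = [2.0, 2.0, 2.0, 2.0]

rmse_a, mae_a = external_validation(measured, model_a)
rmse_b, mae_b = external_validation(measured, model_b)

print(f"model A: RMSE={rmse_a:.3f}, MAE={mae_a:.3f}")
print(f"model B: RMSE={rmse_b:.3f}, MAE={mae_b:.3f}")
```

Scoring every candidate model against the same external measurements puts the comparison on performance rather than on each model's internal fit statistics.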
Regular scrutiny of performance has been commonplace in other areas. For example, the performance of Canadian environmental analytical laboratories is regularly checked with round robin testing. The predictive power of carcinogenicity and mutagenicity models has been evaluated in several rounds of testing, with the biological testing subsequent to the models' predictions. There is a great need for such comparative testing of the usefulness of various existing (Q)SAR models. The valiant performance testing of several toxicity-prediction (Q)SARs models by Moore et al. (2003) shows some surprising results and further gives credence to this thought. Indeed,  also stress the need for an independent organization to validate data and models irrespective of any model's claims.
The author is the director of research and a principal of TerraBase, Inc.

Klaus L.E. Kaiser
TerraBase, Inc.
Hamilton, Ontario, Canada
E-mail: mail@terrabase-inc.com

In general, only predictive techniques used by regulatory bodies are mentioned, although some other work to illustrate various other techniques was cited to provide examples. We did state that the probabilistic neural network (PNN) provided the best predictions for fish toxicity on the basis of an external test set, but like the vast majority of examples of (Q)SARs in the literature, the other endpoints were not included. With regard to the article by Eriksson et al. (2003), the scope was clearly defined in the title; that is, it provided an assessment of regression- and projection-based methods. There are also other approaches to (Q)SAR, such as support vector machines; however, these were not included because they are not used broadly in all (Q)SAR disciplines. Further, these techniques are not in frequent use by regulatory bodies.
We agree with Kaiser that the performance of a model should be a driving force and is more important than statistics. Appropriate measures of predictivity or performance are enshrined in the cornerstones of current attempts at validation. As part of the validation process, the domain of applicability should also be defined. This will allow a user to know whether a prediction is likely to be valid. It should also be noted that Eriksson et al. (2003) explain that representativity and homogeneity are basic requirements for (Q)SAR modeling, regardless of the statistical method used to develop the model. Eriksson et al. (2003) also noted that these concepts were easy to check and accomplish. Provided the basic conditions of representativity and homogeneity are not violated, statistical descriptors of a model's performance are not misleading. Indeed, failure to comply with representativity and homogeneity will result in the statistical assessment of model performance becoming inappropriate.
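One very simple way to express a domain of applicability is a descriptor range check. The sketch below is an assumption for illustration and is not the method of Eriksson et al. (2003) or of any regulatory body: it treats a compound as inside the domain only if each of its descriptor values lies within the range spanned by the training set. The function names and descriptor values are hypothetical.

```python
def descriptor_ranges(training_descriptors):
    """Minimum and maximum of each descriptor column over the training set."""
    columns = list(zip(*training_descriptors))
    return [(min(col), max(col)) for col in columns]

def in_domain(compound, ranges):
    """True if every descriptor of the compound lies inside the training range."""
    return all(low <= x <= high for x, (low, high) in zip(compound, ranges))

# HYPOTHETICAL training set: each row is one compound,
# columns are two made-up descriptors.
training = [
    [1.2, 100.0],
    [3.4, 250.0],
    [2.0, 180.0],
]
ranges = descriptor_ranges(training)

print(in_domain([2.5, 200.0], ranges))  # both descriptors inside the training ranges
print(in_domain([5.0, 200.0], ranges))  # first descriptor outside the training range
```

A range check of this kind only flags extrapolation in individual descriptors; it says nothing about whether the training set itself is representative or homogeneous.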
The reality is that simple and transparent models are (on the whole) favored by regulatory bodies over more opaque approaches, even if it means losing some statistical fit. We understand that problems could arise in some PNN approaches that have used large numbers of descriptors, some with no relevance to the mechanism of action (Cronin and Schultz 2001). A further issue with PNNs is the fact that they do not provide the same numerical solution to a problem when repeated.
The goal of the Setubal workshop was to encourage further development of the (Q)SAR sciences toward practical use and application. It is important that, as these models develop into more efficient tools for society to understand the fate and effect of chemicals without extensive use of animals and other resources, they be seen to be valid and suitable for the purpose. To achieve this goal, there are a number of ongoing attempts to validate (Q)SARs (e.g., at the European Union and the Organisation for Economic Co-operation and Development level). These will attempt to weigh all the evidence and produce scientifically sound and usable validation techniques. Interested parties are encouraged to join in these efforts. More information is available on the European Commission Joint Research Centre website (JRC 2003).