Community-Led Total Sanitation: A Mixed-Methods Systematic Review of Evidence and Its Quality

Background: Community-led total sanitation (CLTS) is a widely applied rural behavior change approach for ending open defecation. However, evidence of its impact is unclear. Objectives: We conducted a systematic review of journal-published and gray literature to a) assess evidence quality, b) summarize CLTS impacts, and c) identify factors affecting implementation and effectiveness. Methods: Eligible studies were systematically screened and selected for analysis from searches of seven databases and 16 websites. We developed a framework to appraise literature quality. We qualitatively analyzed factors enabling or constraining CLTS, and summarized results from quantitative evaluations. Results: We included 200 studies (14 quantitative evaluations, 29 qualitative studies, and 157 case studies). Journal-published literature was generally of higher quality than gray literature. Fourteen quantitative evaluations reported decreases in open defecation, but did not corroborate the widespread claims of open defecation–free (ODF) villages found in case studies. Over one-fourth of the literature overstated conclusions, attributing outcomes and impacts to interventions without an appropriate study design. We identified 43 implementation- and community-related factors reportedly affecting CLTS. This analysis revealed the importance of adaptability, structured posttriggering activities, appropriate community selection, and further research on combining and sequencing CLTS with other interventions. Conclusions: The evidence base on CLTS effectiveness available to practitioners, policy makers, and program managers to inform their actions is weak. Our results highlight the need for more rigorous research on CLTS impacts as well as applied research initiatives that bring researchers and practitioners together to address implementation challenges to improve rural sanitation efforts. https://doi.org/10.1289/EHP1965


Introduction
An estimated 946 million people in the world practiced open defecation in 2015, 90% of whom lived in rural areas (UNICEF and WHO 2015). Open defecation adversely affects human health, contributing to diarrheal diseases and childhood stunting (Clasen et al. 2014; Spears et al. 2013; Vyas et al. 2016). Poor sanitation also has an adverse economic impact (DeFrancis 2011), and disproportionately affects the safety, health, and dignity of women (Jadhav et al. 2016; Khanna and Das 2016; Kulkarni and O'Reilly 2014).
For decades, governments and nongovernmental organizations (NGOs) provided free or subsidized latrines to households, but practitioners widely believe that this approach was unable to guarantee regular latrine use. This recognition led to a focus on hygiene and health education programs, often combined with latrine subsidies, such as the Participatory Hygiene and Sanitation Transformation approach (WHO 1997). Lessons learned from implementing these programs led many sanitation professionals to conclude that while the infrastructure-heavy approach may have increased access to latrines and educational approaches may have increased awareness of health benefits, these strategies were largely insufficient to generate demand for latrines and change sanitation behavior (Jenkins and Sugden 2006).
As a response, the community-led total sanitation (CLTS) approach was developed, aiming to create open defecation-free (ODF) communities (Kar and Chambers 2008). This approach signified a fundamental shift from a focus on individual or household sanitation to a community-level concern for open defecation. CLTS facilitators attempt to trigger collective behavior change by encouraging and motivating people to confront the impact of community-wide open defecation. CLTS comprises three stages:
• Pretriggering: selecting communities, training facilitators, collecting baseline information, and coordinating community entry.
• Triggering: organizing a community-wide meeting where facilitators conduct participatory exercises intended to trigger shame and disgust. Attendees are expected to be motivated to change their sanitation situation.
• Posttriggering: conducting routine follow-up visits, with the goal of verifying and certifying ODF status in communities.
Since the first pilot projects in Bangladesh in 2000, CLTS has been adopted by many international NGOs involved in rural sanitation and has been incorporated into national policy by many governments. It is arguably now the predominant rural sanitation behavior change approach.
Most literature on CLTS is contained on websites and knowledge bases in the form of gray literature, primarily produced by practitioners to share insights from their implementation experiences. It has often been noted that there is limited rigorous evidence on CLTS impacts. Governments and organizations implementing CLTS face the challenge of navigating a vast and cluttered body of literature to inform their decisions.
We identified 41 published systematic reviews relating to sanitation interventions. Most study the impact of sanitation on health outcomes. A handful look at behavior- or demand-related topics, such as factors affecting sustained adoption of water and sanitation technologies, behavioral research relating to point-of-use water treatment technologies (Fiebelkorn et al. 2012), behavioral models for water and sanitation (Dreibelbis et al. 2013; Dwipayanti et al. 2017), and water, sanitation, and hygiene (WaSH) social marketing approaches (Evans et al. 2014). No reviews were found that concentrate on CLTS or related interventions.
To address this gap, we conducted a mixed-methods systematic review of journal-published and gray literature on CLTS to characterize the state of the evidence. For the purpose of this review, journal-published refers only to peer-reviewed journals, with literature in other journals being classified as gray literature. The main objectives of the review were to: a) assess the quality of evidence, b) summarize evidence on the effectiveness and impact of CLTS on sanitation and health outcomes, and c) identify factors affecting CLTS implementation and effectiveness. We aimed to comprehensively document current understanding on CLTS from a variety of sources, including researchers, practitioners, and donors.

Search Strategy
We searched seven online peer-reviewed databases (Cochrane Library, Embase, Global Health, Web of Science, PubMed, ProQuest, and Scopus) and the websites of 15 international organizations or sanitation knowledge hubs that document literature on CLTS. When expanding the scope of our search using Google Scholar, we observed considerable overlap between these search results and results from the 15 other websites already included in our search strategy. A previous study recommended focusing on the first 200 to 300 results from Google Scholar to find the most relevant results for systematic reviews of gray literature (Haddaway et al. 2015); therefore, we used this rationale to search the first 200 results from Google Scholar. A variety of search terms were employed to comprehensively search journal-published and gray literature, including combined keywords related to CLTS, open defecation, and demand-led and participatory approaches (see Table S1 for the full search strategy and list of databases). Documents were also reviewed from bibliographic hand searches and expert consultations. Searches were first conducted in December 2015 and updated in March 2017.

Document Selection and Eligibility Criteria
A multistep screening process was conducted for both journal-published and gray literature (Figure 1). Titles and abstracts of search results were screened independently by two authors (V.V. and A.K.), and they discussed any discrepancies in the selection of documents to reach a consensus on the final list before full-text review. Research studies, conference proceedings, evaluations, dissertations, reports, working papers, and organizational learning notes published between January 2000 (the year that CLTS was first piloted) and March 2017 were included. Guidelines, manuals, publicity material, news stories, slide presentations, workshop minutes, blog posts, reviews, and commentaries were excluded. In the case of multiple documents reporting data and findings from the same intervention or study, only the most recent document was included. Because the aim was to assess the quality of literature, no documents were excluded based on quality.
Interventions labeled as CLTS are likely to contain a variety of adaptations, and several total sanitation strategies share characteristics with CLTS [e.g., School-led Total Sanitation, Community Approaches to Total Sanitation (CATS), Community-based Total Sanitation, and India's variations on total sanitation campaigns (TSC)]. To be inclusive, interventions that met all of the following criteria were reviewed:
• Mentioned sanitation behavior change as a key component of implementation.
• Aimed to reduce or end open defecation.
• Mobilized entire communities for sanitation rather than targeting households or specific populations.
• Included participatory activities such as triggering, village mapping, or transect walks in the decision-making and data-gathering process.
Although the Handbook on CLTS describes it as a no-subsidy approach (Kar and Chambers 2008), there is considerable debate about the role of latrine subsidies as part of, or following, CLTS activities (Papafilippou et al. 2011). Therefore, interventions that met the above criteria and provided subsidized latrine hardware were also considered for inclusion in this review.
Figure 1. Flow diagram of screening and selection process of community-led total sanitation literature. Note: In the first stage of screening, 4,187 records were excluded because the titles or abstracts indicated that they were guidelines, manuals, news stories, slide presentations, workshop minutes, blog posts, reviews, or commentaries.
We classified literature as quantitative evaluations, qualitative studies, and case studies and project reports, adapting an approach used in a systematic review of cook stoves (Rehfuess et al. 2014). For this review, quantitative evaluations were defined as studies designed to attribute outcomes to a CLTS or CLTS-like intervention. Studies had to include primary data collection of outcomes and an experimental comparison group (controlled trials, quasiexperimental designs, and before-after comparisons). Quantitative studies that did not meet these criteria or did not have a comparison group were classified as case studies. Qualitative studies were those that used qualitative data collection methods and analytical techniques. Case studies and project reports included mixed-methods studies, cross-sectional studies, and literature that described practitioner experiences or reports and evaluations of specific CLTS projects. Papers that shared general lessons or reflections without references to primary data were classified as commentaries and were excluded from the review.
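The classification rules above can be sketched as a small decision function. This is an illustrative reading of the paragraph, not the authors' screening tool: the field names on `Document` are hypothetical flags invented for the example, and the order in which the rules are checked is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Document:
    # Hypothetical screening flags; the review does not publish such a form.
    has_primary_data: bool
    designed_to_attribute_outcomes: bool
    has_comparison_group: bool
    uses_qualitative_methods_only: bool


def classify(doc: Document) -> str:
    """Assign one of the study types described in the text."""
    # Papers without reference to primary data were treated as commentaries.
    if not doc.has_primary_data:
        return "commentary (excluded)"
    # Quantitative evaluations need an attribution design and a comparison group.
    if doc.designed_to_attribute_outcomes and doc.has_comparison_group:
        return "quantitative evaluation"
    # Qualitative studies use qualitative data collection and analysis.
    if doc.uses_qualitative_methods_only:
        return "qualitative study"
    # Everything else, including quantitative work without a comparison
    # group, falls into the case study and project report category.
    return "case study or project report"
```

For example, a quantitative study with primary data but no comparison group would be returned as a case study or project report, which matches the fallback rule stated in the paragraph.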

Quality Appraisal
To characterize the quality of evidence on CLTS across journal-published and gray literature, we developed a quality appraisal framework for each of the three study types by reviewing and adapting questions from previously used protocols (Jack et al. 2010; Harden 2010; Heale and Twycross 2015; Loevinsohn 1990; Pluye et al. 2011; Puzzolo et al. 2013; Spencer et al. 2003; Thomas et al. 2004). Our framework comprises three categories: quality of reporting, minimizing risk of bias, and appropriateness of conclusions. Table 1 presents our quality appraisal framework, with questions to assess each criterion for the three study types. Most criteria in the quality appraisal could receive a score of 0, 0.5, or 1; the remaining criteria, such as whether there was evidence of external peer review, could only receive a score of 0 or 1 (see Table S2 for detailed scoring guidelines). Similar questions were asked about quality of reporting (with a maximum possible score of 6) and appropriateness of conclusions (with a maximum possible score of 3) across all three study types. However, because of differences in study design and intent, questions to assess the risk of bias differed by study type; in this category, case studies and project reports could receive a maximum possible score of 4, qualitative studies a maximum possible score of 5, and quantitative evaluations a maximum possible score of 7 (see Table S2). Scores within each of the three categories were converted into percentages. Documents with the maximum possible score for each category were assigned a value of 100% for that category. The percentage scores for all documents within a particular subgroup (e.g., the 10 quantitative evaluations from journal-published literature) were then averaged to derive the mean percentage to assess differences by type of study and type of literature (journal-published vs. gray literature).
An aggregate quality score was not computed for each document, as this would not allow for a nuanced discussion of quality by category, and could lead to misinterpretation of scoring. All documents were scored by the first author (V.V.), and 20% of the documents were subjected to independent quality control by the second or third authors (J.C. and A.K.).
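The score conversion described above can be illustrated with a short sketch. The maximum scores per category are taken from the text; the document records, key names, and function names are hypothetical and are used only to show the arithmetic (raw score divided by the category maximum, then averaged within each study type and literature type subgroup). In line with the text, no aggregate score across categories is computed.

```python
from statistics import mean

# Maximum possible raw score per category and study type, as stated in the text.
MAX_SCORES = {
    "reporting": {"quantitative": 6, "qualitative": 6, "case_study": 6},
    "risk_of_bias": {"quantitative": 7, "qualitative": 5, "case_study": 4},
    "conclusions": {"quantitative": 3, "qualitative": 3, "case_study": 3},
}


def to_percent(raw_score, category, study_type):
    """Convert a raw category score into a percentage of the maximum."""
    return 100.0 * raw_score / MAX_SCORES[category][study_type]


def subgroup_means(documents, category):
    """Mean percentage for one category, grouped by (study type, literature type)."""
    groups = {}
    for doc in documents:
        key = (doc["study_type"], doc["literature"])
        pct = to_percent(doc["scores"][category], category, doc["study_type"])
        groups.setdefault(key, []).append(pct)
    return {key: mean(values) for key, values in groups.items()}


# Hypothetical records for illustration only; these are not data from the review.
example_docs = [
    {"study_type": "quantitative", "literature": "journal", "scores": {"risk_of_bias": 7}},
    {"study_type": "quantitative", "literature": "journal", "scores": {"risk_of_bias": 3.5}},
    {"study_type": "case_study", "literature": "gray", "scores": {"risk_of_bias": 1}},
]
```

Calling `subgroup_means(example_docs, "risk_of_bias")` on these made-up records gives one mean per subgroup, which is the quantity plotted by category in a figure such as Figure 2.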

Data Extraction and Analysis
Descriptive data on study type, author, project year, study design, countries of focus, country of publication, and methods were entered into a Microsoft Excel database. The main outcomes from quantitative evaluations were extracted and summarized. Qualitative content analysis was conducted for all included literature, regardless of study type, using Atlas.ti (version 7.0; Atlas.ti Scientific Software Development). Documents were coded in two cycles (each cycle by one author) for the following: enablers and barriers to successful implementation in different stages of CLTS, key themes discussed, and indicators of success measured by programs and researchers. The first cycle of coding identified 150 factors reported as enabling or constraining CLTS. By combining similar factors, we narrowed this list to 43 factors in the second cycle. A similar process was conducted for indicators of success. Based on this inductive analysis, factors that enabled or constrained CLTS were grouped under three implementation-related domains (policy environment, implementation quality, and administrative context) and four community-related domains (environment, capacity, participation patterns, and behavior). Indicators of success were grouped under the following domains: WaSH outcomes, CLTS process, behavioral outcomes, extended impact, and motivators for behavior change.
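The two-cycle consolidation and the "n, %" counts reported later for each factor can be sketched as follows. This is a minimal illustration under stated assumptions: the code labels, the merge map, and the document identifiers are invented for the example, and the coding itself was done in Atlas.ti rather than in a script. Percentages are rounded half up, an assumption that is consistent with the way counts such as 45 of 200 appear as 23% in the text.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP


def documents_per_factor(coded_docs, merge_map):
    """Count distinct documents mentioning each consolidated factor.

    coded_docs: {document id: list of first-cycle codes}
    merge_map:  {first-cycle code: consolidated second-cycle factor}
    """
    docs_by_factor = defaultdict(set)
    for doc_id, codes in coded_docs.items():
        for code in codes:
            # Codes without an entry in the merge map are kept unchanged.
            factor = merge_map.get(code, code)
            docs_by_factor[factor].add(doc_id)
    return {factor: len(ids) for factor, ids in docs_by_factor.items()}


def percent_half_up(n, total):
    """Whole-number percentage, rounding .5 upward."""
    pct = Decimal(n) * 100 / Decimal(total)
    return int(pct.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# Hypothetical first-cycle codes merged into one second-cycle factor.
example_merge_map = {
    "facilitator training": "facilitator skills",
    "facilitator attitude": "facilitator skills",
}
example_coded_docs = {
    "doc_001": ["facilitator training", "facilitator attitude"],
    "doc_002": ["facilitator attitude"],
    "doc_003": ["national targets"],
}
```

A document that carries two first-cycle codes merged into the same factor is counted once for that factor, which is why the sketch stores document identifiers in a set.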

Results
We identified 5,884 documents from databases, websites, Google Scholar, and hand searches (Figure 1). After screening for duplicates and excluding documents that did not meet our inclusion criteria, we reviewed the full texts of 855 documents for further assessment. Of these, we further excluded documents that did not describe and analyze primary data, documents that referenced identical data and findings, and documents in which the interventions were not described sufficiently to determine their similarity to CLTS. In total, 200 out of the 5,884 documents were included in this review. The results are organized into four sections: broad characteristics of the literature, quality appraisal, summary of quantitative outcomes and indicators from the literature, and qualitative analysis of factors influencing CLTS implementation. Table 2 presents broad characteristics of the 200 documents included in the review (see Excel Tables S1-S5 for detailed study-level information and Table S3 for a full list of included documents). One hundred and sixty-two (81%) documents were gray literature. Ten of the 14 (71%) quantitative evaluations were journal-published literature. One hundred and twenty-seven (64%) documents were based exclusively on CLTS interventions, 47 (23%) included CLTS interventions as part of a larger WaSH project, and 26 (13%) documents were based on CLTS-like interventions. The top three publishers of gray literature were the United Nations Children's Fund (UNICEF); the Water, Engineering, and Development Centre at the University of Loughborough; and the Institute of Development Studies at the University of Sussex.

Quality of the Literature
The average score for quality of reporting was higher for journal-published literature than for gray literature overall (80% vs. 58%, respectively), as well as for each study type (Figure 2A). No document scored zero on quality of reporting, and 13 of the 200 documents received a full score (6 points, 100%). There was greater variability in quality of reporting scores for gray literature than for journal-published literature. Case studies and project reports received the lowest average score (57% for both literature types combined).
On average, journal-published literature scored better than gray literature in minimizing risk of bias across all study types (Figure 2B). Quantitative evaluations had the highest average score of 69% across all literature types (average values of 85% for journal-published vs. 55% for gray literature). Three quantitative evaluations scored below 50% (i.e., <3.5 points out of 7 possible), and one study scored 100% in this category. Qualitative studies had an average score of 48% across all literature types (average values of 60% for journal-published vs. 41% for gray literature). Case studies and project reports had an average score of 20% (average values of 53% for journal-published vs. 15% for gray literature). Seventy-two of the 157 case studies and project reports (71 of which were gray literature) scored zero points on minimizing risk of bias, 22 (13 of which were gray literature) scored ≥50% (raw score ≥2), and one received the maximum score (4, 100%). Sixty-two percent of the 200 documents received a score of zero for the independence of data collection criterion.

Figure 2. Quality appraisal scores (mean percentage) by literature type and study design for (A) quality of reporting, (B) minimizing risk of bias, and (C) appropriateness of conclusions. Raw scores in each category were converted into percentages, and documents that received the maximum possible score in a category were assigned a value of 100%. Error bars represent standard error of the mean.

a See Excel Tables S1-S3 for individual study-level information.
b The sum of documents for world regions is greater than 200 because some documents covered multiple world regions (two world regions, n = 6; three world regions, n = 6; four world regions, n = 2).
We used the appropriateness of conclusions category to assess interpretation of findings, description of limitations, and whether conclusions were within the scope of the study design. Appropriateness of conclusion scores were generally higher across all study types than the other two categories, and there was less variation between journal-published and gray literature within each study type (Figure 2C). Case studies and project reports had the lowest average score in this category of 61% for journal-published and 62% for gray literature. Seven of the 14 quantitative evaluations (two of which were gray literature) received a maximum score of 3 (100%), as did 13 of the 29 qualitative studies (10 of which were gray literature), and 27 of the 157 case studies and project reports (23 of which were gray literature). No quantitative evaluations received a score of zero, whereas two qualitative studies and three case studies and project reports received a score of zero. Study limitations were not described in 128 of the 200 documents (64%), including 17 of 38 documents from journal-published literature (45%) and 111 of 162 gray literature documents (69%). However, 63% of all 200 documents received a full score for the conclusions criterion, indicating that stated conclusions and implications were within the scope of study design and data collection methods.

Measuring the Effectiveness of Community-Led Total Sanitation
The main characteristics of the 14 quantitative evaluations and the main outcomes of the interventions they evaluate are presented in Table 3. Through qualitative coding of all documents, we also aggregated a list of commonly used indicators grouped under WaSH outcomes, CLTS process, behavioral outcomes, extended impact, and motivators for behavior change (Table 4). The following section reports quantitative outcomes from the 14 evaluations as well as indicators used across all 200 documents. Because of differences in study designs, it was not possible or appropriate to calculate pooled estimates of health impacts.
Nine evaluations were randomized controlled trials (RCTs), one used a quasi-experimental design, two used cross-sectional designs with a comparison group, and two were baseline to endline evaluations of a single intervention group (Table 3). The RCTs comprised evaluations of the following interventions and comparison groups: CLTS vs. a control group in Mali (Pickering et al. 2015); a four-arm intervention of CLTS, CLTS plus handwashing, handwashing only, and a control group in Tanzania (Briceño et al. 2015); CLTS with sanitation marketing vs. a control group in Indonesia (Cameron et al. 2013; Borja-Vega 2014); TSC in India combining behavior change activities with the option of subsidies vs. control groups (Patil et al. 2014; Pattanayak et al. 2009; Dickinson et al. 2015); the One Million Initiative in Mozambique that included CLTS vs. a control group (Godfrey et al. 2014); and a comparison of conventional CLTS to CLTS plus training natural leaders in Ghana (Crocker et al. 2016a). The quasi-experimental study compared conventional CLTS in Ethiopia facilitated by Health Extension Workers (HEWs) to teacher-facilitated CLTS (Crocker et al. 2016b). Of the two single-group evaluations, one was a baseline to endline evaluation of the Philippines Phased Approach to Total Sanitation (UNICEF 2016), and the other was a baseline to endline evaluation of CLTS and other WaSH components in Kenya (Schlegelmilch et al. 2016). Finally, of the two comparative cross-sectional studies, one compared CLTS and Hygiene in Ethiopia with a control group (BDS-Center for Development Research 2016), and the other assessed health outcomes in a CLTS group vs. a control group in Kenya (Makotsi et al. 2016).
Latrine ownership, use, and quality indicators were identified in most of the 200 documents, but diverse measures were used (Table 4). A few documents contained definitions of the types of latrine that would be acceptable, and others outlined latrine quality indicators such as a cover, superstructure, handwashing facility with soap, and evidence of use. Of the 14 quantitative evaluations, all but two measured either private or household latrine ownership or latrine use after CLTS, and four evaluations also reported some measure of latrine quality, such as the presence of a cover, concrete slab, superstructure, or availability of handwashing materials (Table 3). Overall, the quantitative evaluations reported a statistically significant increase in private or shared latrine construction in intervention groups compared to comparison groups. The Mali CLTS evaluation reported a 32-percentage-point (pp) increase in latrine use in intervention villages 18 mo after the intervention and no statistically significant increase in the control group (Pickering et al. 2015). In Orissa, India, a 29-pp increase in ownership 4 to 6 mo after the intervention was attributed to the subsidy and behavior change intervention, with two-thirds of the overall treatment effect due to the CLTS-like component compared to the subsidy component of the intervention (Pattanayak et al. 2009). The CLTS studies from Ghana and Ethiopia differed slightly from other included studies in that they compared changes in sanitation outcomes between conventional CLTS in that country and a modification of the approach. The Ghana study reported an increase in private latrine ownership of 18.3 pp from training natural leaders (Crocker et al. 2016a), and the Ethiopia study reported that where HEWs facilitated CLTS, latrine ownership increased by 9 pp more than where teachers facilitated CLTS soon after the interventions (Crocker et al. 2016b).
Declaration or certification of ODF status was the second most common indicator in the literature after latrine access, but no consistent definition was reported for ODF (Table 4). Latrine coverage was the most widely used proxy measure of ODF status. Status was measured at the community level, most often by CLTS facilitators or local government. Most documents that attempted to define ODF wrote about the absence of open defecation in the environment, but few described criteria or frequency for verifying this observation. There was some recognition that "the process of maintaining an open defecation free community is not static and communities cannot simply be checked off and assumed to be ODF, without systems in place that monitor and assist households to repair/replace/rebuild their latrines" (Haq and Bode 2009). In one program in the Philippines, ODF included the "enactment of local legislation at the village level supporting CLTS activities" and the "implementation of other local government activities that supported the maintenance of ODF status" (Belizario et al. 2015). Of the 14 quantitative evaluations summarized in Table 3, two reported the status of ODF certification conducted by the government. In Ethiopia, four of the six kebeles (subgovernmental units) were certified as ODF in the first year, and the remaining two were certified during the follow-up stage (Crocker et al. 2016b). In Mali, 97% of 60 villages in the treatment group were certified as ODF (Pickering et al. 2015). Two other indicators relating to open defecation identified in the literature were the number of people practicing open defecation (n = 31, 16%) in a community, and the number reverting to open defecation after a community had achieved ODF status (n = 14, 7%) (Table 4). Change in open defecation practice was reported in ten quantitative evaluations, measured at the household or individual level. 
As shown in Table 3, results varied considerably, from no statistically significant change between baseline and endline (15.2%) in the Philippines (UNICEF 2016), to a statistically significant 6-pp (17%) decrease in open defecation by nonpoor households in Indonesia (Cameron et al. 2013), to a 23- to 24-pp (71%) decrease in open defecation by adults in Mali (Pickering et al. 2015). The study from Maharashtra reported a 9- to 10-pp decrease in open defecation by adults; reductions were greater in below-poverty-line households or households that did not have latrines at baseline compared to wealthier households or those with latrines at baseline (Patil et al. 2014).
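Because the results above mix percentage-point (pp) changes with percentages in parentheses, a short sketch of the two measures may help. It assumes that the parenthetical figures are relative changes against a baseline or comparison prevalence, which the text does not state explicitly. The function names and the prevalence values are hypothetical and are not taken from any of the cited studies.

```python
def pp_change(baseline_pct, endline_pct):
    """Absolute change in prevalence, in percentage points."""
    return endline_pct - baseline_pct


def relative_change(baseline_pct, endline_pct):
    """Change expressed as a percentage of the baseline prevalence."""
    return 100.0 * (endline_pct - baseline_pct) / baseline_pct


# Hypothetical example: open defecation falls from 35% to 29% of households.
example_pp = pp_change(35.0, 29.0)
example_relative = round(relative_change(35.0, 29.0), 1)
```

Under these made-up values, the same change reads as a 6-pp decrease or as a relative decrease of about 17%, so the two figures should not be compared directly across studies with different baseline prevalence.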
Fifty-one (26%) documents reported some measure of change in health status in communities after CLTS, often comprising anecdotal reports of changes in diarrhea after achieving ODF status (Table 4). Nine quantitative evaluations measured potential health impacts through self-reported changes in diarrhea prevalence or anthropometric measures in children (Table 3). The study from Mali reported no differences in childhood diarrheal prevalence between CLTS and comparison villages, but reported modest statistically significant improvements in child height, stunting, and weight (Pickering et al. 2015). One study from Indonesia reported a 1.4-pp (30%) reduction in diarrhea in CLTS communities, decreases in the intensity of parasitic infection, and increases in height and weight among nonpoor households that had no sanitation at baseline (Cameron et al. 2013). A comparative cross-sectional study from Kenya reported lower diarrhea prevalence in the CLTS group vs. a control group (Makotsi et al. 2016). Four other studies did not find statistically significant evidence of health impacts due to the intervention (Borja-Vega 2014; Briceño et al. 2015; Patil et al. 2014; Dickinson et al. 2015), and one evaluation did not report statistically significant health outcomes (BDS-Center for Development Research 2016).
In addition to quantitative sanitation and health indicators, some documents described qualitative measures of CLTS success, such as influence on sanitation policy (n = 15, 8%), influence of the intervention on women and girls, particularly with regard to participation and leadership in CLTS activities (n = 13, 7%), diffusion of CLTS messages to other communities (n = 24, 13%), and positive nonsanitation outcomes resulting from CLTS, such as community mobilization for other development activities (n = 27, 14%) (Table 4).

Factors Affecting Community-Led Total Sanitation Implementation and Outcomes
Table 5 presents enabling and constraining factors that emerged inductively from a qualitative content analysis of themes discussed in the 200 documents, as well as the stage of CLTS in which they occur. The 21 implementation-related factors fall under three domains: policy environment, implementation quality, and administrative context. The 22 community-related factors fall under four domains: environment, capacity, participation patterns, and behavior.
Policy environment. Eighty-four (42%) documents referred to the influence of the policy environment on CLTS activities, often about policies regarding latrine subsidies and latrine quality (Table 5). The national sanitation policy (n = 37, 19%) was reported more often as a constraint than as an enabler. Policy that promoted specific national latrine standards was perceived to conflict with the CLTS message of building a latrine with whatever means available. Policy that encouraged hardware subsidies, most often targeted to the poor, often conflicted with the no-subsidy approach of many CLTS projects. A history of latrine subsidies in communities that were to be triggered with CLTS, or current provision of subsidies near CLTS communities, were cited as constraints, and were often a result of policy decisions beyond the control of CLTS implementers.
A few documents mentioned latrine subsidy policy being used to the advantage of CLTS, such as in Nigeria, where WaterAid Nigeria prioritized follow-up activities on households that had hardware available from previous subsidy programs, "as these households are easiest targets" (Bawa and Ziyok 2013). They reported that this approach led to faster latrine construction because households did not need to think about technology options or financing. A study in Cambodia also reported that "ODF has been reached regularly in the dry season" in villages with subsidy programs, although they also reported that "proximity to on-going subsidized programmes erodes the effectiveness of CLTS" (Kunthy and Catalla 2009).
As shown in Table 5, setting national targets for sanitation was described in 26 (13%) documents and was a constraint in all but one case, where it created an incentive for local government officers in Kenya (Musyoki 2010). Most documents that referred to this factor noted that setting targets created a sense of top-down policy making that conflicted with the community-led nature of CLTS (Crocker et al. 2016c; Davis 2012; De Silva 2013; UNICEF WCARO 2011), led to a focus on rapid latrine construction rather than behavior change (AAN Associates 2013; Dyalchand et al. 2009; Haq and Bode 2009; Jha 2007; Kar and Bongartz 2006; Pardeshi et al. 2008; USAID 2014), led to community triggering outpacing capacity to follow up (Toft and Onabolu 2012; UNICEF 2014), and created an incentive for implementers to misrepresent data (Mukherjee et al. 2012).
Implementation quality. Factors relating to implementation quality were reported in 149 (75%) documents (Table 5). Adequate preparation and planning in the pretriggering stage, including the importance of systematic community selection, was emphasized in 30 (15%) documents. Some mentioned the need to target specific types of communities rather than using CLTS everywhere (Burton 2007; Crocker et al. 2016b; Evans et al. 2009; Global Sanitation Fund 2015; Kunthy and Catalla 2009), but one suggested that targeting certain communities leaves behind those communities with unfavorable conditions (Bawa and Ziyok 2013).
The importance of the facilitators' skills (n = 45, 23%) and quality of triggering events (n = 80, 40%) were identified as determinants of CLTS success, with an underlying theme of adaptation (Table 5). A practitioner account from Zimbabwe emphasized the "need to be culturally insensitive during facilitation" by not being afraid to use bold terminology and to prioritize creative adaptations of triggering tools based on the context, with the aim to "create a sense of shame, fear and disgust" without "teaching, preaching or prescribing" (Chimhowa 2010). On the other hand, a study critical of CLTS in Indonesia concludes that "the use of shaming and taunting both disqualifies it as an empowerment approach and is likely to undermine its effectiveness in promoting long-term behaviour change. Even if shaming were shown to be effective, the morality of punishing the poor for their circumstances requires deeper consideration" (Engel and Susilo 2014).
Eight of the ten implementation quality factors presented in Table 5 were referred to primarily in the context of posttriggering activities. Fifty-four (27%) documents provided examples of frequent follow-up activities by NGOs or local government helping, or poor follow-up hurting CLTS outcomes. As part of follow-up, the theme of improving monitoring and evaluation of programs was mentioned in 48% of the literature. Many expressed a need for more systematic evaluations of CLTS projects and better use of data that are already being collected by practitioners. In one study, authors observe: "Several nongovernment organizations in the WASH sector worldwide have developed different protocols for defining, declaring, and certifying ODF status in communities, yet no protocol has been recognized as the global standard" (Belizario et al. 2015). Based on their experience with CLTS in Indonesia, authors from the Water and Sanitation Program of the World Bank recommended that "post-triggering processes should be given a verifiable structure by establishing and periodically checking for desired progress quality indicators/milestones for success in triggered communities in order to improve institutional accountability for and the quality of follow-up" (Mukherjee et al. 2012). This need for a reporting structure was echoed in reports from Kenya (Tiwari 2011) and Ghana (Magala and Roberts 2009), among others. Technical support (n = 44, 22%) was often cited as an enabler in projects that provided guidance directly to communities on latrine construction or trained masons to improve toilet design (Evans et al. 2009; Huda 2009; Kalimuthu 2008; Magala and Roberts 2009; SEED Madagascar 2016; Shayamal et al. 2008; WaterAid India 2008).
Technical support and subsidies were both contentious in the literature, with several practitioner accounts defending a strictly no-subsidy implementation of CLTS, whereas others advocated for greater flexibility, such as this evaluation of a UNICEF program: "Many implementers share the opinion that more work on the technical standards together with targeted subsidies are unavoidable to help reach the households build latrines and reach the ODF status in such areas" (Hydroconseil et al. 2014).
The presence of enforcement mechanisms or sanctions on open defecation or latrine construction was described in 39 (20%) documents (Table 5). One study in Nepal reported that "coercive methods . . . did not always bring out tangible results" (Jha 2007), and in Bangladesh, enforcement led some people to "construct toilets out of fear of being fined without understanding the reasoning for doing so or the best methods for construction. This in turn leads to poor use of the latrines" (Kar and Bongartz 2006). An evaluation of CLTS in West Africa found that "such punitive measures seem out of line with the CLTS spirit of self-help and dignity. However, community enforcement may be considered as an appropriate additional measure [if] it is implemented in a real participatory and community-based way, with a collective decision" (UNICEF WCARO 2011). Local by-laws were described as effective in several settings; an evaluation of CATS reported that "in many countries, the strongest evidence of a change in social norms is the genuine adoption and the enforcement of formal and informal rules/bylaws at the level of the community, accepted by all the community members and recognized as collective rules which cannot be transgressed without consequences" (Hydroconseil et al. 2014).
[Note: The pretriggering stage comprises community selection, facilitator training, baseline information, and community entry; the triggering stage comprises a community-wide meeting with participatory exercises to trigger shame and disgust; and the posttriggering stage includes routine follow-up visits to verify and certify ODF status in communities.]
Administrative context. Over half of the literature (n = 112, 56%) contained factors relating to the administrative context (Table 5). This domain included institutional capacity to implement CLTS (n = 66, 33%), administrative or financial arrangements (n = 60, 30%), and coordination between implementing organizations (n = 37, 19%). Concerns were documented relating to time availability, technical experience, skilled human resources, and the capacity to plan, budget, and allocate resources for CLTS. Some adaptations were also documented, such as "peer-to-peer accountability mechanisms" for government health surveillance assistants in Malawi (Kennedy and Meek 2013) and village-level microplanning exercises for local government in Kenya (Singh and Balfour 2015). More documents described existing administrative and financial arrangements as constraints rather than enablers of CLTS activities.
Eighty-four (42%) documents cited local government ownership of CLTS as an important factor for success, scale-up, or sustainability (Table 5). The most effective level and type of local government involvement was unclear. In an evaluation from Zambia, authors state: "The level of support given to CLTS in certain districts is obvious, with a high level of involvement from everyone from Town Clerks and Chiefs to government representatives across sectors, knowing and understanding what the CLTS approach means. This level of understanding surely forms the basis for sustainability in an institutional sense" (Morris-Iveson and Siantumbu 2011). On the other hand, in Moroto, Uganda, "support for better sanitation and hygiene from political leadership was reported as lacking or weak in most respects . . . and served to undermine efforts of the extension staff as the latter strive to promote ODF villages" (Asingwire 2012). Local government ownership was often related to decentralization as well. For example, in Cambodia, decentralization helped transfer financial responsibility to the local government, creating a local source of funds for district and commune activities (Kunthy and Catalla 2009). On the other hand, it created uncertainty in Kenya (Crocker et al. 2016c), and was not matched with sufficient institutional capacity in Indonesia (Engel and Susilo 2014).
Community environment. Across 66 (33%) documents, environmental, geographical, and climate-related factors were cited, such as poor soil conditions and floods destroying latrines (Table 5). However, remoteness of a village (n = 13, 7%) was sometimes an enabler, as remote villages were less likely to have been exposed to subsidy projects and might therefore be more receptive to the CLTS message. Access to clean water in triggered communities (n = 23, 12%) was described as an enabler in several documents. For example: "one of the key entry processes is access to water. In the project communities water points were rehabilitated and in few cases new ones were installed. Communities clearly associated the effectiveness of CLTS to availability of water" (Burton 2007). Adeyeye also noted that WaterAid Nigeria "holds that access to water is a necessary prerequisite to access to adequate sanitation" (Adeyeye 2011).
Community capacity. Ninety-seven documents (49%) cited at least one community capacity factor (Table 5). The most frequently identified factors related to building latrines were access to supply of latrine hardware (n = 62, 31%), availability of financial resources (n = 54, 27%), and technical knowledge of latrine construction (n = 24, 12%). Studies often reported community members' desire for more guidance from implementers on how to build a high-quality latrine to avoid costly maintenance and repairs that could result in reversion to open defecation. There were examples of local mechanisms to address financial constraints, such as creating access to credit through village savings and loans associations (Adhikari et al. 2008; De Silva 2013; Global Sanitation Fund 2015; Magala and Roberts 2009; Mwanzia and Misati 2013; Tremolet et al. 2010) or collective community efforts to build latrines (Mukherjee et al. 2012; Priyono 2009). Nevertheless, poor latrine quality and resulting sustainability challenges emerged as important themes.
Community participation patterns. A key participant in CLTS is the natural leader, typically a community member who emerges in the triggering process as someone particularly motivated to improve sanitation. Eighty-two (41%) documents broadly referred to natural leaders (Table 5). Twenty-nine (15%) documents specifically noted the initiative of natural leaders as an enabler or barrier to CLTS, but only a few gave concrete examples, such as the challenge in identifying natural leaders or how training natural leaders in latrine construction or mobilization techniques proved to be effective. One study reported that training led to greater participation and better sanitation outcomes in Ghana (Crocker et al. 2016a), and practitioners in Madagascar reported that training helped motivate natural leaders to be more active in their communities (SEED Madagascar 2016).
Broader mobilization, participation, and motivation of community members in triggering and posttriggering activities were reported in 82 (41%) documents as an important reason for success or failure of CLTS (Table 5). A sense of community responsibility (n = 25, 13%) and social cohesion (n = 27, 14%) emerged in several documents. Smaller, homogeneous communities tended to be more successful (e.g., Evans et al. 2009; Haq and Bode 2009; Mukherjee et al. 2012; Tyndale-Biscoe et al. 2013; USAID 2014; Venkataramanan 2016), and greater cohesion was also connected with greater likelihood of self-help initiatives, for example, gotong royong in Indonesia (Mukherjee et al. 2012).
Community behavior. Expectation of subsidies for latrine construction was the most commonly cited behavioral constraint in the literature (n = 29, 15%) (Table 5). Preference for open defecation was also an important behavioral factor (n = 20, 10%) that related to slow progress or no change in sanitation behavior after triggering. Part of this was reported to be due to cultural or religious beliefs regarding open defecation or latrine use, which were often cited as either enablers or barriers to CLTS. For example, speaking about defecation was considered to be a private matter in several settings (Dittmer 2009; Evans et al. 2009; Shayamal et al. 2008). There were also taboos around different members of the household using or sharing a latrine (Bulaya et al. 2015; Burton 2007; Kappauf 2011; Mukherjee et al. 2012; Zombo 2010) or superstitions around latrine use (Dittmer 2009). Dittmer (2009) gave an example of an ethnic group in Burkina Faso with the tradition that "if someone gives you food, you are expected to defecate in his field (and fertilize the crops), as the act of giving entitles the giver to receive something in return." These traditional beliefs were used to adapt triggering in several programs (n = 19, 10%) by engaging religious leaders or using passages from the Bible or Quran during triggering events.
Fourteen (7%) documents noted that priorities other than sanitation can affect the response to CLTS (Table 5). For example, Engel and Susilo document one village chief's observation in Indonesia that "despite the supposedly participatory approach of the CLTS, the villagers did not want a sanitation project and would have preferred an adequate irrigation system for their farmland and a programme for replanting an area of cleared forest located near their farmland" (Engel and Susilo 2014).

Discussion
In our systematic review of CLTS and related interventions, we comprehensively characterize the state of evidence through a detailed quality appraisal, summary of quantitative outcomes, and qualitative analysis of factors affecting CLTS implementation and outcomes.

Quality of the Literature
Our quality appraisal indicates that evidence available to practitioners and policy makers is of variable quality, particularly regarding the ability to estimate the impact of CLTS on sanitation, health, or other community outcomes. Overall, we found that the journal-published literature was of higher quality than gray literature. We show that case studies and project reports, which were primarily found in the gray literature, did not adequately describe their study design, data collection, or data analysis. Poor reporting of study characteristics makes it difficult to judge the objectivity and quality of information presented. Minimizing risk of bias was the weakest link in the quality of CLTS literature across all study types, but particularly in the case of qualitative studies, and case studies and project reports, which rarely described sampling methods, quality control in data collection, or analysis appropriate for the respective study design. Nearly two-thirds of all literature lacked independent data collection; improving this metric alone would minimize the risk of bias of all study types. Although nearly all documents gave context for their findings, there were large gaps in the description of limitations, preventing the reader from understanding the extent to which findings may be generalized. Furthermore, more than one-fourth of the literature overstated conclusions by attributing outcomes to their intervention without an appropriate study design or by making claims about impact using unverified data sources or anecdotes.
CLTS is one of the most common rural sanitation behavior change approaches. It is increasingly being tested in urban settings as well (Murigi et al. 2015; Mwanzia and Misati 2013; Myers 2016; Prabhakaran et al. 2016). Therefore, there is an urgent need to better understand its effectiveness by improving the rigor of the evidence base. By reviewing the literature as a whole, we are able to compare the quality of different study types and identify specific areas for improvement. Case studies and project reports, which have the potential to detail processes and share lessons learned, can be improved through more systematic data collection and analysis, and more thorough reporting to determine the extent of transferability of findings. Quantitative cross-sectional designs, a subset of studies we classified as case studies and project reports, can describe outcomes on a large scale, but can be improved through more detailed descriptions of context and intervention processes. Qualitative studies can provide rich contextual descriptions, perceptions of different stakeholders, and reasons for success or failure, but researchers and implementers using these methods must improve the rigor of data collection and analytical techniques in order for findings to have sound policy and practice implications. Finally, well-designed quantitative evaluations have the potential to attribute outcomes and impact to interventions, but the quality of quantitative evaluations can be improved through more rigorous data collection methods and better descriptions of context, process, and study limitations.

Measuring the Effectiveness of Community-Led Total Sanitation
We found few rigorous quantitative evaluations of sanitation and health impacts of CLTS and related interventions. The 14 that we included evaluated interventions from nine countries, whereas CLTS is now practiced in at least 53 countries. These studies reported increases in latrine ownership and decreases in open defecation, but did not corroborate the widespread claims of ODF villages reported in case studies and project reports. As Evans et al. note: "Like many terms in development, [ODF] has become de-linked from its true semantic meaning and become more of a milestone or marker in programme development" (Evans et al. 2009). The term ODF may serve as a motivator for communities to improve sanitation behavior, but is a poor indicator to compare across studies, programs, or countries, given the variety of definitions (see Thomas and Bevan 2014 for a review of various ODF protocols in sub-Saharan Africa). Household-level latrine outcomes, while imperfect, are a better measure of sanitation progress. For research purposes, pairing these with more robust measures of the defecation practices of individual people is a further improvement. To avoid the pitfall of simply counting latrines, programs could add routine measures of open defecation behavior, latrine use, and cleanliness through community monitoring initiatives (Coombes 2011).
One of the primary aims of sanitation is improved health, but measuring these changes is difficult if the sanitation intervention did not result in a sufficient reduction in open defecation or exposure to fecal contamination. Our synthesis of health outcomes from CLTS and related interventions supports findings of previous reviews (Sclar et al. 2016; Taylor et al. 2015; Wolf et al. 2014) and underscores the challenge of attributing health impacts to sanitation, particularly over a short follow-up. Furthermore, our review supports prior observations that self-reported diarrhea is an unreliable measure of impact (Schmidt 2014), even though one-fourth of the literature contained anecdotal reports of perceived health impact through self-reported reduction in diarrhea. While more studies that consider a variety of sanitation-related health outcomes, including measures of nutritional status, may be beneficial, they tend to be expensive, require the intervention itself to be sufficiently successful to change sanitation outcomes, and require a long enough follow-up period to observe a noticeable change in health outcomes (Cairncross et al. 2010; Gertler et al. 2015).
Many programs are unlikely to have the resources or technical expertise to incorporate health impact into monitoring and evaluation systems or to commission such studies. Nevertheless, our review reveals an opportunity for researchers and practitioners to work together to address more immediate implementation and operational research questions by leveraging a variety of study designs. Given the participatory nature of CLTS activities and emphasis on sustained behavior change, such research would be strengthened using mixed methods, including qualitative indicators of participation and perceptions and better measures of social norms for a thorough picture of CLTS effectiveness and impact.
We identified 43 implementation and community-related factors from the literature affecting CLTS. Many were context-specific enablers or constraints to CLTS implementation and outcomes. Other factors, such as local government ownership of CLTS, institutional capacity, importance of facilitators' skills, and community participation in CLTS, were described in a similar manner across much of the literature. We suggest four important considerations from this qualitative analysis of the literature.
First, our review confirms the narrative of CLTS as a highly adaptable approach. Consistent with the finding of Sigler et al. (2015) that multiple behavior change frameworks are employed in CLTS, we found that shame and disgust, although popular, were not reported as universal motivators that triggered communities; instead, improved health, dignity, and pride were cited more often. Skilled facilitators adapted their triggering techniques based on cultural considerations. However, finding such facilitators was described in the literature as an important constraint. Less-skilled facilitators resorted to either lecturing communities on health benefits or falling back on conventional shaming or disgust-inducing triggering techniques, regardless of their appropriateness in that context (Venkataramanan 2016). We did not find any studies in our review that evaluated the relative effectiveness of different triggering adaptations, despite calls for a closer analysis of the potential human rights implications of CLTS and related techniques (Bardosh 2015; Bartram et al. 2012; Engel and Susilo 2014; Galvin 2015).
Second, although a high degree of flexibility is expected during triggering, lack of structure in posttriggering activities may be less beneficial. The Handbook on CLTS, the de facto manual that most CLTS programs use as a starting point, does not detail the structure of the posttriggering stage, acknowledging that activities are likely to depend on the context and characteristics of the specific community (Kar and Chambers 2008). Our analysis suggests, however, that certain elements of posttriggering activities routinely challenge programs around the world. For example, there was no clear evidence on the effectiveness or appropriateness of providing incentives or subsidies to some communities or on the role of enforcement and sanctions for noncompliance. Our review confirms Bartram et al.'s observation that there continues to be minimal debate or critical review of the effectiveness or human rights consequences of posttriggering punitive measures (Bartram et al. 2012).
Another set of posttriggering challenges relates to the supply of durable and affordable latrine hardware and technical support on latrine construction. Notably, we identified a debate over the nature of technical support that should be provided to communities for latrine construction. CLTS programs do not follow uniform guidance on technical support, as communities are supposed to identify their own solutions to stop open defecation. Whereas some programs provided detailed technical support on latrine options, trained masons, or attempted to improve the supply chain for hardware, follow-up in other programs simply meant monitoring latrine construction. However, our analysis of the literature suggests a need for additional guidance, as substantive concerns were expressed from both community and implementer perspectives about the quality of latrines built because of CLTS, potentially discouraging sustained behavior change and possibly explaining the minimal effects seen in health impact studies (Papafilippou et al. 2011). We argue that programs should routinely incorporate technical support into the posttriggering stage, particularly when communities prefer durable latrines and express a need for this kind of support. An eight-country evaluation of CLTS in Africa similarly recommended that in the absence of a sanitation marketing program, "the post-ODF approach should include a set of 'second-phase' interventions designed to provide advice on how to upgrade and improve sanitation and handwashing facilities using local materials" (Robinson 2016).
Third, we suggest that communities should be selected for triggering based on community characteristics and resources available to maintain routine follow-up activities, including local government ownership. We reveal conflicting views on the scope for application of CLTS, with practitioners often suggesting that it is appropriate in all rural settings, whereas evaluations and studies of CLTS point to more deliberate targeting. This is particularly relevant when community members express priorities other than sanitation.
What interventions should be implemented, then, in places where CLTS is not likely to be successful? While there are some settings where CLTS is never going to be an appropriate intervention, there are also likely to be settings where CLTS may not be successful on its own, but can result in sustained changes when combined or sequenced with other demand-generating or demand-fulfilling approaches, such as sanitation marketing or other WaSH interventions. Further research is needed to understand the most effective combination and sequencing of WaSH interventions. Our review revealed that several programs installed water supply projects simultaneously with or following CLTS projects to try to ensure that anticipated gains from sanitation behavior change were not lost due to limited water supply. Several programs also measured total sanitation practices in their CLTS programs, such as handwashing, water and food safety, and garbage disposal, as opposed to focusing solely on open defecation. Some practitioners consider this lack of standardization of CLTS a problem for scalability and sustainability (SNV Uganda 2014), but we suggest that it can instead be viewed as an opportunity to expand the conversation to consider CLTS as part of a total WaSH strategy to achieve the WaSH Sustainable Development Goals by 2030 (United Nations General Assembly 2015).

Limitations
Although we present findings from 53 countries, we did not specifically search for non-English documents and may have missed experiences from some countries. Because gray literature is, by definition, not published in peer-reviewed journals, and because it is produced so rapidly, we may not have captured all the available literature. However, we believe we reached saturation and have captured the vast majority of the CLTS evidence base by scanning 5,884 documents from diverse sources and reviewing 200 in detail. Finally, although the content analysis was conducted systematically in two stages by two authors, it is possible that our frequency counts were slightly underestimated, and we were unable to capture every factor and indicator presented across all documents.

Conclusions
This is the first comprehensive systematic review, to our knowledge, of the state of the CLTS evidence base. Most literature on CLTS is on websites and knowledge bases rather than in peer-reviewed journals; this gray literature is more extensive and more accessible. Therefore, the large and inclusive scope of this review offers one of the first aggregate views of the evidence currently available for decision-makers as they consider whether and how to test, adopt, or scale up CLTS worldwide. By including a variety of literature types (journal-published and gray) and study designs (quantitative, qualitative, and case studies and project reports), we were also able to identify their strengths and weaknesses and compare their relative contribution to the evidence base.
The quality appraisal framework we developed serves as a practical tool for assessing the quality of evidence from sources as varied as NGO reports, qualitative studies, and RCTs. To our knowledge, this is the first tool of its kind that enables a combined assessment of such literature on water and sanitation to develop specific recommendations for improving the evidence base.
The mixed-methods analysis of the quality and content of literature enabled us to pool findings in a much richer way than a meta-analysis of one particular study type would have allowed. By and large, there is substantive room for improvement in the quality of evidence on CLTS. We found that CLTS has been rolled out with minimal rigorous evidence on its effectiveness and impact on sanitation and health outcomes. While quantitative evaluations show reductions in open defecation and increases in latrine coverage, they do not mirror practitioner accounts of widespread elimination of open defecation. There is little evidence for sustained sanitation behavior change as a result of CLTS, and there has been minimal systematic research of the CLTS implementation process and its adaptations. We also provide evidence for the need to improve the structure of CLTS activities and the need to consider CLTS as part of a larger WaSH strategy rather than as a singular solution to changing sanitation behavior.
The research-practice gap can be narrowed if researchers work more closely with implementers to design implementation and operational research studies to address specific challenges relating to sustainable behavior change and change in social norms, as well as the combining and sequencing of different sanitation or WaSH approaches. Donor agencies and national governments should support researcher-practitioner initiatives to improve the evidence base and provide policy makers opportunities to make more informed decisions to improve sanitation outcomes.