INTRODUCTION
Biological robustness, a fundamental and ubiquitous phenomenon in biological systems, is broadly understood as the ability to maintain stable functioning in the face of various perturbations. Depending on whether the perturbations are heritable or not, robustness is characterized as genetic (mutational) or environmental (1). Genetic robustness describes the insensitivity of a phenotype to genetic mutations, whereas environmental robustness describes its insensitivity to environmental factors. Biologists have a long-standing interest in biological robustness, going back to Fisher's work on dominance (2-4) and Waddington's research on developmental canalization (5,6). Robustness has become the focus of numerous studies in recent years and has been found at various levels of biological systems, including gene expression, protein folding, metabolic flux, physiological homeostasis, development and even organism fitness (7). Hiroaki Kitano argued that the requirements for robustness and evolvability are similar, since robustness facilitates evolution and evolution favors robust traits (8). A proper understanding of the origins of robustness in biological systems will catalyze our understanding of evolution (9).
The secondary structure of RNA is a suitable test bed for studying biological robustness. Wagner and Stadler provided evidence that robustness of RNA viruses to mutational changes in secondary structure has evolved (10). Mutational robustness has also been found in viroids (11,12). By examining microRNA genes of several species, Borenstein and Ruppin (13) recently showed that miRNA precursor stem-loops exhibit a significantly higher level of genetic robustness than random sequences with similar stem-loop structures, which were generated by an inverse folding algorithm. This indicates that the excess robustness of miRNAs goes beyond the intrinsic robustness of the stem-loop hairpin structure. Furthermore, they demonstrated that it is not a by-product of base composition bias. Their findings suggest that the excess robustness of miRNA stem-loops is the result of direct evolutionary pressure toward increased robustness (13).
Although the mechanisms of robustness have been widely explored (13-15), the evolutionary origins of robustness remain controversial, partly because of the difficulty of providing evidence for robustness in natural biological systems (16). To address this challenge, a convenient computational tool for evaluating structural robustness is strongly needed.
The RNA structural robustness evaluator (RSRE) presented here is a web tool for evaluating both the genetic and the environmental robustness of RNA structures. Using classical RNA structural distance measures, it quantifies the robustness of a given RNA and its control sequences based on a generalized definition of neutrality. The RSRE web server then reports the statistical significance of the robustness differences between the given RNA and its control sequences. RSRE will facilitate wide exploration of the origins of robustness and catalyze our understanding of RNA evolution.

Control sequence generation
Random sequences are used to assess the statistical significance of properties of biological sequences, providing the 'background noise' against which real biological information can be distinguished (17). However, simple randomization of an RNA sequence obscures the frequencies of the mononucleotides and dinucleotides, which are biased and crucial for the physical stability of the secondary structure (18-21). It is consequently essential to rule out base composition bias in the robustness analysis. To this end, in addition to pure random sequences, RSRE can generate four types of random sequences that preserve, exactly or approximately, the mononucleotide and dinucleotide composition of the native sequence. The five randomization methods used in RSRE are as follows:
Pure random. This method produces pure random sequences with the same length as the original. The mononucleotide and dinucleotide frequencies are completely distorted using this method.
Shuffling based on zero-order Markov model. The mononucleotide frequencies, P(b), of the native biological sequence are calculated and used to generate a random sequence in which bases are chosen at random from P(b) until the length of the native sequence is reached.
Mono-shuffling. This type of shuffling permutes the nucleotides of the sequence at random. The mononucleotide frequencies are exactly preserved, but the dinucleotide frequencies are completely distorted.
Shuffling based on first-order Markov model. This method derives a first-order Markov model from the conditional probabilities P(a|b) of nucleotide a given the preceding nucleotide b, which are estimated from the dinucleotide frequencies of the biological sequence. A random sequence is generated by first choosing a random nucleotide x_1 and then choosing each subsequent nucleotide x_{i+1} according to the probability P(x_{i+1}|x_i). The process stops when the sequence has exactly the same length as the native sequence. This method produces shuffled sequences with dinucleotide frequencies close to those of the original sequence. Mononucleotide frequencies are not preserved.
Dishuffling. In this method, a sequence is shuffled while keeping the dinucleotide distribution (or frequency) constant. An implementation similar to the Altschul-Erickson algorithm (18,19) was used. The dinucleotide and mononucleotide frequencies are exactly preserved.
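As an illustration only, the first four randomization methods can be sketched in Python as follows. This is not the RSRE implementation (the core module is written in C++); the function names are hypothetical, the first nucleotide of the first-order Markov chain is assumed to be drawn from the composition of the native sequence, and the fallback for a base with no observed successor is likewise an assumption. Dishuffling is omitted because the Altschul-Erickson algorithm requires a more involved graph-based procedure.

```python
import random
from collections import Counter, defaultdict

ALPHABET = "ACGU"

def pure_random(seq, rng=random):
    """Uniform random sequence of the same length; composition is not preserved."""
    return "".join(rng.choice(ALPHABET) for _ in seq)

def zero_order_markov(seq, rng=random):
    """Draw each base independently from the mononucleotide frequencies P(b)."""
    if not seq:
        return ""
    counts = Counter(seq)
    bases = list(counts)
    weights = [counts[b] for b in bases]
    return "".join(rng.choices(bases, weights=weights, k=len(seq)))

def mono_shuffle(seq, rng=random):
    """Random permutation: mononucleotide counts are preserved exactly."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)

def first_order_markov(seq, rng=random):
    """Sample x_{i+1} from P(x_{i+1} | x_i) estimated from adjacent pairs."""
    if not seq:
        return ""
    transitions = defaultdict(Counter)
    for a, b in zip(seq, seq[1:]):
        transitions[a][b] += 1
    current = rng.choice(seq)  # assumption: x_1 drawn from the native composition
    out = [current]
    while len(out) < len(seq):
        successors = transitions.get(current)
        if successors:
            bases = list(successors)
            weights = [successors[b] for b in bases]
            current = rng.choices(bases, weights=weights)[0]
        else:
            # Assumption: a base with no observed successor restarts the chain.
            current = rng.choice(seq)
        out.append(current)
    return "".join(out)
```

Passing a seeded `random.Random` instance as `rng` makes a control set reproducible, which can be convenient when comparing results across runs.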
Considering that certain secondary structures may be inherently more robust than others, some studies also require random sequences that resemble the native RNA in both structure and base composition, in order to control for the effects of secondary structure (13). However, most computational servers have difficulty providing such control sets because of the high computational cost (13). With the development of fast RNA inverse folding algorithms, we intend to provide this kind of control set in a future version of our web server.

Robustness evaluation
Experimental studies have demonstrated that the secondary structures of some RNAs tolerate certain mutational changes (11-13,22-25). To reflect this flexibility in sequence/structure requirements, we defined the robustness $\eta_j$ at a given threshold $T_j$ as follows:
$$\eta_j = \frac{N_j(d)}{3L}$$
where $d$ is the secondary structure distance between the original RNA and its mutant, $L$ is the sequence length and $N_j(d)$ is the number of mutants with a structure distance less than or equal to the threshold $T_j$. Thus, $\eta_j$ is the average of $N_j(d)$ over all $3 \times L$ one-mutant neighbors at the threshold $T_j$. The maximum value of the secondary structure distances between the random sequences and their mutants was used as a baseline value to evaluate the threshold level of each distance metric (Supplementary Figures S1 and S2). The thresholds $T_j$, $j = 0, 1, 2, \ldots, 9$, were set to 0, 10, 20, ..., 90% of the maximum value of the metric, respectively. At threshold $T_0$, robustness reduces to the definition of neutrality (13). A larger value of $\eta_j$ at threshold $T_j$ indicates a relatively higher level of robustness. A variety of distance measures for secondary structures (26-29), implemented by RNAdistance in the Vienna RNA package (version 1.6) (27,30), were used to compare the secondary structures of the wild-type and its mutants, including tree-edit distance, string distance and base-pair distance (27,31,32).
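The threshold-based robustness described above, that is, the fraction of the 3 × L one-mutant neighbors whose structure distance from the wild-type is at most T_j, can be sketched as follows. This is a minimal illustration rather than the RSRE code: `fold` is a hypothetical callable that maps a sequence to a dot-bracket structure (in practice it might wrap RNAfold), `d_max` stands for the baseline maximum distance, and `toy_fold` is a deliberately crude stand-in used only to make the example runnable. Only base-pair distance is shown.

```python
def base_pairs(structure):
    """Set of (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return pairs

def base_pair_distance(s1, s2):
    """Number of base pairs present in exactly one of the two structures."""
    return len(base_pairs(s1) ^ base_pairs(s2))

def one_mutant_neighbors(seq, alphabet="ACGU"):
    """Yield all 3 x L sequences that differ from seq at exactly one position."""
    for i, base in enumerate(seq):
        for alt in alphabet:
            if alt != base:
                yield seq[:i] + alt + seq[i + 1:]

def robustness_profile(seq, fold, d_max, distance=base_pair_distance, levels=10):
    """Return [eta_0, ..., eta_9], where T_j is j * 10% of d_max."""
    wild = fold(seq)
    dists = [distance(wild, fold(m)) for m in one_mutant_neighbors(seq)]
    thresholds = [j / levels * d_max for j in range(levels)]
    return [sum(d <= t for d in dists) / len(dists) for t in thresholds]

def toy_fold(seq):
    """Hypothetical stand-in for a folding program: pairs only the two end bases."""
    complementary = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}
    if len(seq) >= 2 and (seq[0], seq[-1]) in complementary:
        return "(" + "." * (len(seq) - 2) + ")"
    return "." * len(seq)

profile = robustness_profile("GAAAC", toy_fold, d_max=2)
```

With `toy_fold`, only mutations at the two end positions of `GAAAC` break the single base pair, so 9 of the 15 neighbors keep the wild-type structure and the first entry of `profile`, which corresponds to neutrality, is 0.6.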
RNAfold and RNAsubopt (32) in the Vienna RNA package (version 1.6.1) (27,30) were used with default parameter values (T = 37°C) to predict the secondary structures. The former is a variation of the minimum free energy algorithm of Zuker and Stiegler (33,34), while the latter calculates all sub-optimal structures within a user-defined energy range above the minimum free energy (MFE). To mitigate the uncertainty of the MFE structure, sub-optimal structures of mutants within 1 kcal/mol (the default setting of RNAsubopt) above the MFE are considered. A synthetic estimation method is used to estimate the difference between the structure of the wild-type, $R$, and the set of possible structures of the mutant, $\{R^*_i\}$, where $R^*_i$ represents the $i$th predicted structure of the mutant. It is given by summing the contributions of all structures weighted by their Boltzmann probabilities, which is similar to the methods used in other studies (35). In this case, the distance is given by
$$d'(R, \{R^*_i\}) = \sum_i p_i \, d(R, R^*_i)$$
where $p_i$ denotes the Boltzmann probability of $R^*_i$.
To explore the evolutionary origins of genetic robustness, we also examined the thermodynamic stability of RNAs in a manner analogous to that used in previous studies (18,19,36), because of the possible correlation between the thermodynamic stability (environmental robustness) of the minimum free energy structure of a given sequence and its genetic robustness (32).
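The Boltzmann-weighted distance over sub-optimal structures can be sketched as below. This is an illustrative sketch under stated assumptions, not the RSRE implementation: the function names are hypothetical, the probabilities are renormalized over the retained sub-optimal structures only (rather than derived from the full partition function), and the structure distance is passed in as a callable.

```python
import math

# Gas constant (kcal/(mol K)) times 37 degrees C expressed in kelvin.
RT_37C = 0.0019872 * 310.15

def boltzmann_weights(energies, rt=RT_37C):
    """Normalized Boltzmann probabilities for free energies given in kcal/mol."""
    e_min = min(energies)  # shift by the minimum for numerical stability
    factors = [math.exp(-(e - e_min) / rt) for e in energies]
    total = sum(factors)
    return [f / total for f in factors]

def weighted_distance(wild_structure, mutant_structures, energies, distance):
    """Sum over i of p_i * d(R, R*_i), with p_i from boltzmann_weights."""
    weights = boltzmann_weights(energies)
    return sum(p * distance(wild_structure, s)
               for p, s in zip(weights, mutant_structures))
```

Structures closer to the MFE receive larger weights, so a mutant whose alternative structures all lie near the wild-type structure yields a small weighted distance even when its single MFE prediction differs.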

Statistical significance analysis of robustness
At each threshold $T_j$, we evaluated the robustness $\eta_j$ of the input sequence and the set $\Omega_j = \{\eta_j^{c_i}, i = 1, 2, \ldots, N\}$ of robustness values of the corresponding control sequence set $X$ ($N$ is the number of sequences in $X$), and then compared $\eta_j$ with $\Omega_j$. The Z-score and P-value were then computed to determine whether the secondary structure of the input RNA molecule was significantly more robust than those of the control sequences. The Z-score is defined as:
$$Z = \frac{\eta_j - \langle \Omega_j \rangle}{\sigma(\Omega_j)}$$
where $\langle \cdot \rangle$ and $\sigma(\cdot)$ denote the mean and the standard deviation of $\Omega_j$, respectively. The P-value of $\eta_j$ is the fraction of sequences in $X$ that are more robust than the input RNA molecule, defined as:
$$P = \frac{M}{N}$$
where $M$ is the number of sequences in $X$ that are more robust than the input RNA molecule.
The statistical significance analysis of environmental robustness was similar to that of genetic robustness, with the robustness $\eta_j$ at threshold $T_j$ replaced by the free energy of the sequences.
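The two statistics can be computed in a few lines. The following is a minimal sketch with hypothetical function names; whether RSRE uses the population or the sample standard deviation is not stated, so the population form is assumed here.

```python
from statistics import mean, pstdev

def z_score(native, controls):
    """(native - mean of controls) / standard deviation of controls."""
    sd = pstdev(controls)  # assumption: population standard deviation
    if sd == 0:
        raise ValueError("control values have zero variance")
    return (native - mean(controls)) / sd

def p_value(native, controls):
    """Fraction M / N of control values strictly greater than the native value."""
    m = sum(c > native for c in controls)
    return m / len(controls)

controls = [0.2, 0.4, 0.6, 0.8]
z = z_score(0.7, controls)
p = p_value(0.7, controls)
```

For free energies, where lower values indicate greater stability, one option is to pass negated energies so that "greater" still means "more robust"; this convention is an assumption of the sketch.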

IMPLEMENTATION
The core module of RSRE is written in C++ and the web interface is implemented in PHP and JavaScript. RSRE runs on two workstations with dual AMD X64 CPUs, 4 GB of memory and the Linux operating system.

Input and options
With a step-by-step input interface (Figure 1), the RSRE web server is easy to use. A valid email address is required for each job. The sequence of an RNA molecule can be entered either by pasting the raw sequence or by uploading a sequence file in FASTA format. The sequence should be a string of unmodified RNA/DNA bases (A, U/T, G and C); any other character in the sequence will be removed. Sequence files in multi-FASTA (MFA) format are also supported for convenience.
The input limit is 10 sequences per job and 200 bases per sequence. The analysis scheme can be customized by the user. Users can select how the sub-optimal structures are used, choose any one of the randomization methods described above and set the number of control sequences according to their analysis requirements. Either type of robustness (environmental or genetic), or both, can be evaluated. For genetic robustness, users can also select the algorithms used to compute structure distance.
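The input handling described above can be approximated as follows. This is a hedged sketch, not the server's code: the function names are hypothetical, converting T to U is an assumption about how DNA input is treated, and raising an error when a limit is exceeded is an assumption about behavior that the text does not specify.

```python
MAX_SEQUENCES = 10   # per-job limit stated in the text
MAX_LENGTH = 200     # per-sequence limit stated in the text
VALID = set("ACGU")

def clean(raw):
    """Upper-case, map T to U (assumption) and drop all other characters."""
    return "".join(ch for ch in raw.upper().replace("T", "U") if ch in VALID)

def parse_input(text):
    """Parse raw or (multi-)FASTA text into a list of (name, sequence) tuples."""
    records = []
    name, chunks, seen_header = "input", [], False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if seen_header or chunks:
                records.append((name, clean("".join(chunks))))
            name, chunks, seen_header = line[1:].strip() or "unnamed", [], True
        elif line:
            chunks.append(line)
    if seen_header or chunks:
        records.append((name, clean("".join(chunks))))
    return records

def validate(records):
    """Enforce the job limits; raise ValueError if any limit is violated."""
    if not records:
        raise ValueError("no sequence supplied")
    if len(records) > MAX_SEQUENCES:
        raise ValueError("more than %d sequences in one job" % MAX_SEQUENCES)
    for name, seq in records:
        if not seq:
            raise ValueError("sequence %r is empty after cleaning" % name)
        if len(seq) > MAX_LENGTH:
            raise ValueError("sequence %r exceeds %d bases" % (name, MAX_LENGTH))
    return records
```

A pasted raw sequence without a header line is returned as a single record named `input`, so both input routes share the same downstream code path.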

Output
To illustrate how our web application can help in evaluating RNA structural robustness, the Caenorhabditis elegans let-7 microRNA precursor, one of the founding members of the microRNA family (37,38), was submitted to RSRE. A notification email containing a URL linked to the output page (Figure 2A) is sent to the user when the job is completed. This URL remains valid for 48 h. To make the analysis results intuitive, the statistical distributions of free energy and of the robustness value $\eta_j$ at threshold $T_j$, $j = 0, 1, 2, \ldots, 9$, are calculated and illustrated as histograms. By selecting a content item and clicking the 'view' button on the output page, the details of the results can be viewed as graphic representations. Figure 2B shows the distribution histogram of the free energy of cel-let-7 and its corresponding control sequences preserving the dinucleotide frequencies. Figure 2C shows the distribution histograms of the robustness values at different threshold levels. Through a hyperlink located at the bottom of the output page (Figure 2A), the results can be downloaded as a single packed file in '.gz' format for off-line analysis. In addition to the robustness distribution histograms (in 'PNG' image format), the result file includes the corresponding P-value and Z-score of let-7 at different thresholds (in 'TXT' text format), the corresponding control sequences (in MFA format) and the robustness values at all 10 threshold levels of let-7 and its corresponding 1000 control sequences (in 'TXT' text format) (Figure 2D). The result file name is in the form 'yymmddhhmmss.no', where 'yy' is year, 'mm' is month, 'dd' is day, 'hh' is hour, 'mm' is minute, 'ss' is second and 'no' is serial number.

Performance of the web server
To test the computational efficiency of RSRE, 10 groups of random sequences with 8 different lengths (from 25 to 200 in steps of 25) were submitted. All types of structure distance measurement were used in these tests. The CPU time of the 10 groups of tests is shown in Supplementary Figure S3. Since June 2006, the two sites have been active for several months and have served over 1000 submissions.

CONCLUSION
The RSRE web server presented here provides a freely available online tool for RNA structural robustness evaluation. The ample control data and the widely accepted definition of neutrality lend reliability to the estimates. The predicted sub-optimal RNA structures can optionally be included to mitigate the uncertainty of secondary structure prediction. Intuitive illustrations are provided along with the original computational results on the output page of RSRE to facilitate analysis. RSRE will facilitate a wide range of studies on RNA structural robustness and will therefore be helpful in RNA evolution exploration, artificial RNA design and other related research.

FUTURE PLANS
To provide a wide basis for RNA robustness exploration, our future work will focus on increasing the computational capacity of the web server. By using a supercomputing blade system, the limit on input sequence length will be relaxed to meet the needs of ncRNA robustness analysis in more cases. In the future, we will also provide more randomization methods, including a method that generates random sequences resembling the native RNA in both structure and base composition.