The modern synthesis theory of evolution postulates that the evolution of living organisms results from random mutations that are selected or filtered by natural or sexual selection, or that are under neutral evolution when they do not affect gene product functions. “Random mutations” means that mutations occur at random locations in the genomic DNA or that mutations occur randomly with respect to the biological effects they may drive. However, increasing evidence indicates that the DNA location of mutations is not random. For example, the sequencing of an increasing number of human genomes from parent-offspring trios indicates that de novo mutations in the germline are not randomly distributed across the genome [1-5]. It seems instead that the DNA location of mutations is associated with local genomic factors including local sequence context (e.g. CpG dinucleotides), DNA biochemical (e.g. methylation) and structural features (e.g. non-B DNA structures), or chromatin marks and topology [1-5]. This is likely because these genomic factors impact on different mutational processes such as DNA damage or editing, or processes involved in DNA repair [6-9]. This interplay raises the possibility that some cellular processes could alter the mutational rate of genomic loci by acting on local genomic factors.
In this context, increasing evidence indicates that RNA molecules target specific genomic loci through several mechanisms (e.g. base-pairing to complementary nascent RNAs or to complementary DNA strands), and can impact on the local genomic factors described above. RNAs can direct chromatin or DNA biochemical and structural modifications, and can guide enzymes involved in DNA sequence modifications (e.g. DNA endonucleases and editing enzymes) to targeted loci [10-13]. RNAs are also involved in DNA repair and recombination, and they direct complex DNA rearrangements [14-16]. These observations have led several authors to propose that RNAs are involved in “re-writing” the genome [14-19]. Consequently, some biological processes could result in the biogenesis of RNAs that target complementary genomic loci and that modulate the mutation rate of the targeted loci. If such a process exists, where might these RNAs originate, and could they increase the local mutational rate of targeted genomic locations in a non-random way with respect to the biological effects they potentially drive?
While it has long been established that transcription is coupled to translation in prokaryotic cells, one of the most exciting recent discoveries in eukaryotic cells is that the different steps of the gene expression process are physically coupled. Indeed, transcription is coupled to RNA processing, and mRNA metabolism is coupled to protein metabolism [20, 21]. In this article, I show that the physical proximity or the tight interplay between different steps of gene expression provides the molecular framework for a potential feedback from the gene expression process back to the DNA mutation process. More specifically, since the biogenesis of any gene product (i.e. RNA or protein) in the crowded interior of a cell can threaten cellular and genomic integrity, I will propose that the cell might overcome these problems through molecular pathways that facilitate changes to genomic sequences that produce “toxic molecules” (Box 1). Since parasitic nucleic acids are also a threat to the integrity of the cell and its genome, the same molecular pathways, which rely on small RNAs to mutate or remove exogenous toxic nucleic acids, are presumably also at work to modify or remove toxic endogenous nucleic acid sequences.
Co-transcriptional biophysical constraints shape gene architecture
The interplay between co-transcriptional biophysical constraints, DNA instability, and RNA processing drives DNA sequence evolution
It is now clearly established that transcription generates topological and biophysical constraints on DNA with consequences for genome stability. For example, negative and positive supercoils are generated behind and in front of the transcribing RNA polymerases, respectively. This then results in the formation of non-B DNA structures and transcription or replication roadblocks that can induce DNA instability (e.g. DNA breaks) [22, 23]. In addition, the newly synthesized RNA molecules can re-hybridize to the template DNA strand, to create co-transcriptional R-loops (composed of an RNA:DNA hybrid and a displaced single-stranded DNA). These R-loops are a major source of DNA instability [23, 24]. Transcription can therefore challenge DNA integrity (e.g. by inducing DNA breaks) both in an RNA-dependent and -independent manner, through co-transcriptional biophysical constraints.
As well as generating genome instability, co-transcriptional constraints trigger the co-transcriptional processing of nascent RNAs (e.g. splicing, 3′-end RNA processing) in eukaryotic cells. For example, chromatin compactness and transcriptional roadblocks impact on co-transcriptional splicing or trigger transcription termination, RNA cleavage and 3′-end RNA processing [20, 25-28]. The formation of R-loops has also been implicated in 3′-end RNA processing [29, 30]. It is believed that co-transcriptional physical constraints induce pausing of RNA polymerase, which lengthens the time window available for the co-transcriptional recruitment of RNA processing factors on the nascent RNAs [20, 26, 31, 32].
While co-transcriptional physical constraints impact on co-transcriptional RNA processing, increasing evidence indicates that co-transcriptional RNA processing itself alleviates these constraints, and protects DNA from transcription-mediated damage. Indeed, inhibition of different RNA processing steps, including splicing, leads to genome instability [33-36]. Several mechanisms could explain the protective effect of co-transcriptional RNA processing on DNA integrity. First, RNA-binding proteins involved in RNA processing may coat the nascent RNA and prevent it from hybridizing back to the DNA template. Supporting this model, depletion of several splicing factors leads to R-loop formation and genome instability [33-39]. Second, co-transcriptional RNA processing could favor the removal of newly synthesized transcripts from chromatin since splicing is coupled to exon junction complex recruitment, which contributes to co-transcriptional RNA packaging and RNA export. Supporting this model, depletion of factors involved in the coupling between RNA processing, RNA packaging and export results in DNA instability [40, 41]. Co-transcriptional translation in prokaryotes may have a similar protective role to eukaryotic co-transcriptional RNA processing, by taking nascent RNA off the DNA template.
Because of the interplay between co-transcriptional biophysical constraints, DNA instability and co-transcriptional RNA processing, I propose that co-transcriptional biophysical constraints within a transcriptional unit increase DNA damage locally, that is, in the vicinity of the physical constraints (Fig. 1). As long as the constraints persist, the resulting DNA instability increases the probability that mutations occur at the challenged locus. The high local mutational rate may lead to the emergence of RNA processing sites over evolutionary time because they alleviate transcription-mediated genotoxicity and thus increase the DNA stability of their host gene.
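The waiting-time logic of this proposal can be illustrated with a toy Monte Carlo sketch. It is not taken from any of the work cited here and is not a quantitative model: the function name, the per-base mutation probabilities, the window coordinates, and the use of the canonical AATAAA polyadenylation hexamer as a stand-in for "an RNA processing site" are all assumptions made for illustration. The sketch mutates only the bases within a constrained window and stops as soon as the motif appears there, which stands for the alleviation of the constraint.

```python
import random

BASES = "ACGT"

def generations_until_site(seq, motif, window, rate, rng, max_generations=10_000):
    """Count generations until `motif` appears inside the constrained window.

    seq: DNA string; window: (start, end) half-open interval under constraint;
    rate: assumed per-base, per-generation mutation probability in the window;
    rng: a random.Random instance. Returns None if the motif never appears.
    """
    seq = list(seq.upper())
    start, end = window
    for generation in range(max_generations + 1):
        # The constraint is considered alleviated once the motif is present.
        if motif in "".join(seq[start:end]):
            return generation
        # While the constraint persists, bases in the window keep mutating.
        for i in range(start, end):
            if rng.random() < rate:
                seq[i] = rng.choice([b for b in BASES if b != seq[i]])
    return None

if __name__ == "__main__":
    rng = random.Random(0)
    locus = "".join(rng.choice(BASES) for _ in range(60))  # hypothetical locus
    # Both rates are illustrative values, not measured mutation rates.
    for label, rate in (("baseline", 0.001), ("constrained", 0.02)):
        waits = [generations_until_site(locus, "AATAAA", (20, 50), rate, rng)
                 for _ in range(10)]
        print(label, waits)
```

Under these assumptions, a locally elevated rate would be expected to shorten the waiting time before a processing-site-like motif arises in the window. The sketch says nothing about whether such a site is then retained, which in the proposal depends on its effect on locus stability and on gene product function.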
Are small RNAs involved in directing genetic variations in response to transcription-induced genome instability?
The mutational process in Fig. 1 that ultimately overcomes co-transcriptional DNA instability likely involves small RNAs. Indeed, small noncoding RNAs are produced in the vicinity of double-stranded DNA break (DSB) sites [43-46]. These small RNAs can induce chromatin modifications to assist DSB repair or drive the local recruitment of proteins involved in DNA repair [43-47]. Recent work also suggested that RNAs can be used as templates for homologous recombination in bacteria, yeast, and human. Indeed, RNAs can anneal to complementary DNA sequences and serve as templates for DNA repair by reverse transcription [48-51]. RNAs can also direct endonucleases or DNA-editing enzymes, like the activation-induced cytidine deaminase, to genomic loci, which results in increased DNA instability and mutation there [12, 13, 16].
Collectively, these observations raise the possibility that transcription-mediated DNA damage induces the biogenesis of small RNAs (either by transcription or nascent RNA cleavage) that subsequently modulate the local mutational rate. It would be interesting to look, in the future, at whether co-transcriptional biophysical constraints induce specific types of small RNAs, DNA damage and mutations that are more likely to generate splicing or polyadenylation sites or some RNA processing regulatory sequences. For example, since co-transcriptional R-loops are made of a displaced single-stranded DNA that is the preferential substrate for DNA-editing enzymes, it would be interesting to test whether the mutations mediated by DNA-editing enzymes increase the generation of RNA (regulatory) processing sites. Even if this is not the case, the important take-home message from Fig. 1 is that local DNA instability induced by co-transcriptional constraints persists until sequences (e.g. RNA processing sites) that alleviate the co-transcriptional physical constraints emerge over evolutionary time.
Understanding genomic features in light of co-transcriptional physical constraints driving DNA sequence evolution
The mutational process hypothesis outlined in Fig. 1 implies that biophysical parameters that impact co-transcriptional events (and that therefore more-or-less directly impact DNA stability) can be modified during evolution not because of randomly located mutations. Rather, mutations preferentially occur where co-transcriptional parameters induce genomic instability. In prokaryotes, this model implies that transcriptional elongation rate directly contributes to codon-use evolution. Indeed, a fast transcriptional elongation rate would create genetic instability when the nascent RNA is not efficiently translated. Therefore, RNA-mediated genotoxicity would favor the emergence of codons that synchronize the kinetic parameters of translation and transcription [42, 52].
A consequence in eukaryotes of the mutational process in Fig. 1 is the emergence, over evolutionary time, of a large number of cryptic and alternative RNA processing sites. Indeed, if co-transcriptional constraints trigger both DNA instability and RNA processing, which in turn alleviates these constraints, transcription-mediated genome instability would constantly favor the emergence of new RNA processing sites. This conclusion leads to a novel interpretation of the huge number of available RNA processing sites (splicing and polyadenylation sites) present in most eukaryotic genes. It is often assumed that alternative RNA processing sites are generated by random mutations and are then selected during evolution depending on the cellular functions of the gene products whose synthesis they allow. However, it is conceivable that RNA processing sites are generated over evolutionary time because of the DNA instability triggered by co-transcriptional biophysical constraints. Neo-formed RNA processing sites would be passed down the generations because they contribute to the genetic stability of the transcribed loci they are embedded in. Neo-formed RNA processing sites within a gene would next be filtered depending on their potential impact on the cellular function of the gene products (see below).
The interplay between co-transcriptional constraints, DNA instability and co-transcriptional RNA processing could also help to explain the evolutionary success of some mobile elements, like the Alu elements in primates. Alu elements, belonging to a class of retroelements termed SINEs (short interspersed elements), account for 11% of the human genome. Despite the fact that the expansion of these elements increases the size of transcribed genomic regions and is a threat to the hosting genome [54-56], Alu elements may benefit the hosting genome by reducing transcription-mediated genomic instability. Indeed, it is now well established that Alu elements provide both polyadenylation and splicing sites [57-60]. This means that the genotoxicity of Alu elements might be counterbalanced by the ability of these elements to spread RNA processing sites within their host genomes. It is interesting also to note that Alu sequences are co-transcriptionally wrapped up by RNA processing factors and favor nascent RNA folding, which may collectively help to take Alu-containing nascent RNAs off chromatin [61, 62]. Although it is known that T-rich sequences are important for Alu element insertion, we do not yet know the rules that could direct the insertion of Alu elements into specific DNA locations. Could Alu elements be preferentially inserted where co-transcriptional biophysical constraints are creating genomic instability? Alu elements could take advantage of the single-stranded DNA in co-transcriptional R-loops. T-rich sequences that induce RNA polymerase pausing could increase the likelihood of Alu insertion, especially if they are downstream of GC-rich sequences that increase the likelihood of co-transcriptional R-loops. Insertion of Alu elements within these unstable DNA regions would introduce pseudo-RNA-processing sites, which would in turn confer RNA processing-mediated genome stability.
The evolutionary success of Alu elements might therefore be due to their ability to disseminate alternative (or pseudo-) RNA processing sites within their host genomes, as these elements simultaneously stabilize transcribed genomic loci and increase the molecular diversity generated from these loci.
Mutations generated by the transcription-mediated mutational process could next be filtered during evolution based on the gene products’ functions. A mutation may alleviate a co-transcriptional constraint but disturb the function of the gene product and give rise to deleterious biological consequences. However, the flexibility of RNA processing pathways could “buffer” the potential deleterious effects of newly generated RNA processing sites on gene product functions. Indeed, newly generated and weak alternative RNA processing sites could be used in cases of “emergency,” when an RNA polymerase gets stuck in a locus. The strength of RNA processing sites in a given locus therefore likely relies on the evolutionarily controlled equilibrium between the sites’ effects on the stability of the locus and on the functions of its gene products. Having focused above on co-transcriptional biophysical constraints, I will next address the potential interplay between biophysical constraints occurring during translation and the evolution of coding gene sequences.
Co-translational biophysical constraints shape coding sequences
Formation of RNA and protein aggregates is a threat to the cell homeostasis
Before describing how co-translational biophysical constraints might shape coding sequences over evolutionary time, it is important to first highlight that one of the main pitfalls of the gene expression process is the formation of toxic RNA and protein aggregates. Aggregation can result from an increase in the local concentration of proteins and RNAs since their physicochemical properties make them prone to form aggregates. For example, protein aggregates can be seeded by increased local protein concentration, which might be critical at the protein production site, and are favored by the aggregation-prone intrinsically disordered regions that many proteins contain [66, 67]. Protein aggregates can also be initiated by protein unfolding during translation, as peptides emerging from ribosomes can form “spurious” contacts with peptides from the same nascent polypeptide [66, 67]. Likewise, the physicochemical properties of RNAs, their ability to interact with each other through base pairing, and their ability to interact with RNA-binding proteins that contain aggregation-prone intrinsically disordered regions, make RNAs prone to form aggregates [68, 69]. Therefore, the main question is not why proteins and RNAs form aggregates, but rather which mechanisms prevent the formation of aggregates in the crowded interior of cells.
One straightforward mechanism is RNA cleavage. For example, it has been shown that co-translational unfolding of nascent proteins can induce co-translational mRNA cleavage. This has been observed during the endoplasmic reticulum stress response where an mRNA can be co-translationally cleaved if it is in the process of producing an unfolded nascent protein that is translocating into the endoplasmic reticulum. A similar process occurs widely, free in the cytoplasm, where nascent protein unfolding can induce co-translational mRNA cleavage and thus translation arrest [71, 72]. In addition, several translation-associated processes, including the nonsense-mediated mRNA decay, “no-go” decay and nonstop decay pathways, induce mRNA cleavage if a specific step of the protein synthesis process (e.g. translation termination) is inefficient [73, 74]. It has also been recently demonstrated that synonymous codons affect the kinetics of translation elongation, which impacts co-translational mRNA cleavage [74-76]. Therefore, mRNAs can be co-translationally cleaved when their translation is inefficient or results in the synthesis of nascent proteins that initiate aggregate formation.
While RNA cleavage is often associated with RNA degradation, there is increasing evidence that cleaved RNAs can give rise to small functional RNAs. There are indeed numerous examples of mature coding and noncoding RNA molecules being cleaved by endoribonucleases and giving rise to small RNAs that regulate diverse biological processes [77-82]. An emerging concept is that cleavage-derived small RNAs are involved in feedback loops, and allow cells to “fight” potentially toxic RNAs. This is illustrated by the piRNA pathway [83-87]. piRNAs were originally identified as small RNAs that are cut out of transcribed retrotransposons. The cleavage of retrotransposon RNAs decreases their ability to invade their host's genome [88, 89]. In addition, the retrotransposon-derived piRNAs are loaded onto proteins of the Argonaute family, which directs the cleavage of any transcripts that contain retrotransposon-complementary sequences [88-91]. But these small RNAs can also target the genome regions that produce their precursors and induce targeted transcriptional gene silencing [83, 85, 87, 88]. Interestingly, the piRNA pathway is not restricted to retrotransposon-derived RNAs as (i) piRNA-like molecules can also be derived from pseudogenes, 5′- or 3′-UTRs and even coding mRNA sequences [91-95]; (ii) piRNA-like molecules have also been shown to post-transcriptionally regulate mRNAs [91, 94]; (iii) piRNA-like molecules have been shown to target genomic regions that do not contain retrotransposons [96-100]. The piRNA pathway illustrates how toxic RNAs (e.g. retrotransposon RNAs) can be cleaved and trigger the biogenesis of small RNAs that next contribute to targeted-RNA cleavage (i.e. post-transcriptional gene silencing, PTGS) or transcriptional gene silencing (TGS). Both pathways can be described as feedback loops since they inhibit the production or accumulation of the toxic RNAs (Fig. 2).
The question now to be addressed is whether small RNAs deriving from potentially toxic precursor RNAs impact on the precursors’ DNA sequences as part of a cellular process that “fights” against genome-generated toxic RNAs. This possibility is supported by the direct and indirect roles of small RNAs in chromatin and DNA biochemical modifications, DNA repair, and DNA sequence modifications and recombination at targeted loci, as described in the previous part.
In summary, (i) co-translational events can trigger RNA cleavage; (ii) cleaved RNAs can be further processed into small RNAs; and (iii) small RNAs can impact on DNA stability. The next section describes how these molecular pathways could work together to provide a molecular framework to link co-translational biophysical constraints to directed-mutations within coding sequences.
One sequence can impact on DNA, RNA, and protein features
Protein coding sequences are clearly shaped by functional constraints depending on the amino acid chain sequence. However, there is now clear evidence that the biophysical processes of protein synthesis and folding also contribute to shape coding sequences, as even synonymous sites appear to be under evolutionary constraints [101-105]. It is believed that synonymous sites that are neutral at the amino acid chain level, are not neutral in terms of quantitative and qualitative parameters of biophysical processes like protein synthesis and folding. The preference for specific synonymous codons depends on their effects on RNA secondary structures, and on the fact that they determine the nature of the anti-codons (tRNAs) to be used during translation. Both RNA secondary structure and codon usage impact on translation kinetics and protein folding [66, 103-105]. Therefore, coding sequences could be selected over evolutionary time, based not only on the encoded amino acids, but also based on their impact on co-translational biophysical processes. Could these co-translational biophysical processes drive genetic variations?
I propose that co-translational biophysical constraints that cause nascent protein mis-folding and aggregate formation trigger co-translational mRNA cleavage. Cleaved mRNAs could then initiate the biogenesis of small RNAs that next target the loci they originate from and increase the local mutational rate of the targeted loci. A special class of nucleic acid sequences, namely G-rich tracts, illustrates how such a molecular framework could be straightforward.
G-rich DNA or RNA strands can form G-quadruplexes, which are topologically polymorphic secondary structures. These structures, which are mutation hotspots, impact several processes at the DNA, RNA and protein level (Fig. 3, left panel) [106-115]. Because of the G-quadruplex features, RNA molecules containing G-quadruplexes may allow a feedback from translation to DNA mutational processes. Indeed, if a co-translational event is inefficient or altered (e.g. if a nascent peptide is misfolded and initiates aggregation), there is a probability that translationally repressed mRNAs will be co-translationally cleaved in the vicinity of structures, like G-quadruplexes, that reduce the motion of ribosomes and that can act as translational roadblocks (Fig. 3, right panel) [106, 111-115]. These RNA structures may help the recruitment of endoribonucleases or, alternatively, the translationally repressed mRNAs might be cleaved anywhere and trimmed by exoribonucleases that can be blocked at stable RNA secondary structures, like G-quadruplexes [116-119]. Therefore, co-translational biophysical constraints may result in the production of G-quadruplex-containing RNA fragments and mRNA-derived small RNAs. Accordingly, G-quadruplexes are involved in regulating the biogenesis of small RNAs, such as piRNAs, and they are present in several kinds of cleavage-derived small RNAs [116, 119, 120]. Therefore, RNA secondary structures like G-quadruplexes might play a role in the biogenesis of small RNAs from precursor RNAs by impacting on RNA cleavage, by protecting RNA fragments from exoribonucleases, or by initiating the biogenesis of small RNAs.
G-quadruplex-containing small RNAs may next target the genomic loci they originate from, by base pairing with nascent RNAs or with strands of opened DNA. Indeed, it has been shown that G-quadruplex-containing RNAs can form stable RNA:DNA hybrids, that is, where G-quadruplexes are made of half RNA and half DNA [109, 121, 122]. These RNA:DNA hybrids form R-loops, which can lead to DNA instability. G-quadruplex-containing RNAs may also direct enzymes like DNA-editing enzymes (e.g. activation-induced cytidine deaminase) to targeted loci. Indeed, it has been shown that after transcription and splicing, the lariats produced from immunoglobulin gene introns that contain repeated sequences (i.e. the switch regions) are de-branched and used for the biogenesis of G-quadruplex-containing small RNAs. These small RNAs can be bound by activation-induced cytidine deaminase and guide the enzyme to the genomic intronic switch regions in a sequence-specific manner. The interaction between the G-quadruplex-containing RNAs and one of the DNA switch-region strands may lead to the formation of R-loop structures, within which the single-stranded DNA is the preferential substrate of the enzymatic activity of activation-induced cytidine deaminase. Deaminated DNA next engages the base excision and mismatch repair machineries to generate double-stranded DNA breaks, which creates genetic variability within the immunoglobulin loci [13, 123]. This mechanism also occurs outside the immunoglobulin loci (“off-targets”) in DNA regions that can form G-quadruplex structures [13, 123]. In conclusion, RNAs containing G-quadruplexes can direct genetic variability.
Other specific DNA and RNA sequences and structures likely play a similar role to G-quadruplexes. Of particular interest are short tandem repeats like trinucleotide repeats that are involved in many genetic diseases. These sequences generate biophysical constraints during DNA replication and transcription and are highly mutagenic [124-126]. They also contribute to the formation of structured RNAs and can induce ribosome stalling during the elongation phase of translation [127-130]. Remarkably, mRNAs containing trinucleotide repeats can be cleaved and initiate the biogenesis of small functional RNAs [131-133]. Finally, RNAs containing trinucleotide repeats can form DNA:RNA hybrids or triplexes in trans. In conclusion, some sequences (e.g. G-quadruplexes, trinucleotide repeats) have features that could allow a straightforward feedback from translation to directed genetic variation.
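To make the notion of a G-rich tract that could form a G-quadruplex more concrete, the following minimal sketch scans a sequence for a commonly used putative quadruplex sequence motif: four runs of at least three guanines separated by loops of one to seven nucleotides. This pattern is a widely used heuristic brought in here as an assumption; it is not a method described in this article, and the names `PQS_PATTERN` and `find_putative_g4` are hypothetical.

```python
import re

# Heuristic motif (an assumption for illustration, not the article's method):
# four runs of >=3 guanines separated by loops of 1-7 nucleotides.
# U is allowed so that RNA sequences can be scanned as well as DNA.
PQS_PATTERN = re.compile(
    r"G{3,}[ACGTU]{1,7}G{3,}[ACGTU]{1,7}G{3,}[ACGTU]{1,7}G{3,}"
)

def find_putative_g4(sequence):
    """Return (start, end, matched_text) for non-overlapping motif matches.

    Only the strand given is scanned (no reverse complement);
    coordinates are 0-based and end-exclusive.
    """
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in PQS_PATTERN.finditer(seq)]

if __name__ == "__main__":
    # Hypothetical test sequence containing four GGG runs with TTA loops.
    print(find_putative_g4("AAGGGTTAGGGTTAGGGTTAGGGAA"))
```

A match only flags a sequence with the potential to fold into a quadruplex. Whether the structure actually forms, stalls ribosomes, or protects an RNA fragment from exoribonucleases would depend on context that a sequence scan cannot capture.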
Do co-translational physical constraints drive DNA sequence co-evolution?
In the molecular pathway in Figs. 2 and 3, biophysical parameters impacting on co-translational events can be modified during evolution not because of randomly located mutations but because co-translational events trigger co-translational cleavage of mRNA and the biogenesis of small RNAs that next increase the local mutational rate of targeted loci. A consequence of this model is that protein chaperones that help protein folding during translation should “buffer” this mutational process (as the proteins involved in RNA processing do, see part 1). Recent evidence has shown that protein chaperones like heat shock proteins (HSPs), which help protein folding during translation, couple protein and RNA homeostasis, and are involved in piRNA biogenesis pathways, impact on genome evolution [135-140]. It has been suggested that, on the one hand, HSP chaperones buffer mutations, as they allow some protein sequence variations by helping protein folding, and, on the other hand, HSP knockdown induces the appearance of de novo mutations [139-146]. An interesting possibility is that the absence of HSP chaperones increases co-translational aggregate formation and results in the production of mutagenic small RNAs (Fig. 4A).
Consequently, proteins interacting with a nascent polypeptide could also contribute to directed-mutations by impacting co-translational protein folding. Indeed, in contrast to what is often believed, many events affecting proteins occur during translation, which includes protein-protein interactions [147-149]. If a mutation affects a protein A that interacts with a nascent protein B and alters its folding during translation, this could trigger the mutational process described above and lead to mutation of the protein B encoding gene (Fig. 4B). Therefore, if biophysical constraints trigger mutations that alleviate the initiating constraints, these mutations can in turn create other constraints anywhere within interacting networks, which will trigger mutations in other genes. Consequently, the interplay between biophysical constraints and mutational processes could explain the evolution of protein interaction networks.
Evolution of coding sequences may not just be fuelled by random mutations. Mutational processes may also be triggered by co-translational biophysical parameters that can feed back on DNA sequences through the biogenesis of structured and mutagenic small RNAs.
A widespread driving force shaped in a species-specific manner
Genome defence systems and RNA-mediated genome evolution are the two faces of the same coin
If gene expression-generated biophysical constraints drive genetic variations, this process is likely to be ubiquitous. First, co-transcriptional and co-translational biophysical constraints rely on the physicochemical properties of nucleic acids and proteins. Second, transcription and translation are universal. Finally, this concept relies on the basic notion (that could apply to any cell) that some expressed genomic sequences are toxic (Box 1) and challenge the integrity of the cell and its genome. Since every living organism has evolved specific molecular pathways to “fight” parasitic nucleic acids, which rely on RNA-guided immunity, nucleic acid cleavage or editing, it would be expected that each organism uses the same (or similar) molecular pathways to fight both parasitic nucleic acids and endogenous toxic sequences [83-87, 150-152].
The porous nature of the frontier between “self” cellular nucleic acids and endogenous or exogenous parasitic “non-self” nucleic acids supports the notion of an interplay between genome defence systems and RNA-mediated genome evolution. Indeed, it was believed that cellular RNA decay and small RNA biogenesis pathways were distinct pathways, the first one being involved in “self” RNA degradation and the second one being used to fight “non-self” parasite RNAs. However, there is now considerable evidence that both pathways are tightly connected [83-87, 153, 154]. For example, and as already mentioned, piRNAs that allow the cell to fight against retrotransposon invasion are produced from retrotransposons, coding genes and pseudogenes. In addition, cellular RNAs are massively edited, as parasite RNAs are, and can be used to generate RNAs that activate the immune response. The L1-ribonucleoprotein particle, which is responsible for the genomic insertion of retrotransposon-derived RNAs, can also create processed pseudogenes when it allows genomic integration of mRNA sequences (self RNAs) after reverse transcription [156, 157]. Collectively, these observations suggest that cellular “self” RNAs can at some point get entangled in the biological pathways normally used to fight parasite “non-self” nucleic acids (Fig. 5). Extrapolating from the cross-talk between anti-parasitic nucleic acid pathways and cellular RNA metabolism pathways would explain how and why some cellular RNAs could become “mutagenic.”
The genome defence systems against parasitic nucleic acids and the RNA-mediated genetic variations of endogenous toxic genomic sequences could actually be two faces of the same coin as these processes both just remove or modify toxic sequences. As a consequence, the evolutionary trajectory of a genome would directly depend on the parasitic nucleic acids it has encountered. This means that although biophysical constraint-mediated genome evolution could be a widespread driving force, the precise molecular pathways that drive genetic variations could be specific to each organism depending on the precise genome defence systems it has. While cells may take advantage of molecular pathways involved in fighting parasitic nucleic acids to modify their own genome, it is interesting to underline that the eukaryotic gene expression process has recently been proposed to be shaped by the “combat” against parasitic nucleic acids.
Are RNA-directed genetic variations triggered by co-transcriptional and co-translational biophysical constraints interconnected?
In the first section, I proposed that co-transcriptional biophysical constraints shape gene architecture. In this context, it is interesting to underline that genome-wide waves of transcription occur during the development and differentiation of male germ cells. These genome-wide waves of transcription are the consequences of the genome-wide epigenetic “reprogramming” occurring during male germ cell development and differentiation [159-162]. As a consequence, male germ cells produce the most complex set of coding and noncoding transcripts and alternative splicing variants [159-162]. Therefore, male germ cells may experience extensive transcription-mediated genomic instability that could explain the large-scale apoptosis of immature sperm cells [4, 163, 164]. Another consequence of genome-wide waves of transcription in male germ cells is the expression of a wide variety of retrotransposon-derived RNAs, which results in the activation of the piRNA pathway [159-162]. As described in section 1, the expression of retrotransposon-derived RNAs (e.g. Alu) and the genomic insertion of these elements could, at some point, help male germ cells to alleviate local co-transcriptional constraints and therefore increase the survival of germ cells carrying this particular kind of de novo mutation.
However, de novo mutations could generate deleterious gene products. Mattick et al. recently described a molecular pathway by which de novo mutation filtering could be performed during male germ cell development. In this scenario, spermatogonia die if they have mutations that do not pass molecular “quality control.” This filtering of de novo mutations during gametogenesis likely reduces transmission of deleterious genetic variants to the next generation.
If de novo mutations resulting from the insertion of Alu elements within an intronic locus pass the spermatogenic “quality control,” the inserted Alu element could be exonized and weakly recognized as an alternative exon. The buffering activity of the splicing process would first decrease the likelihood of generating deleterious gene products (i.e. weak splice sites are more often missed). However, if co-transcriptional physical constraints “push” toward the acquisition of stronger RNA processing sites, this would increase the inclusion rate of new exons (e.g. Alu exons) during splicing. Meanwhile, the newly included exons would create constraints during translation (e.g. nascent protein misfolding). These co-translational constraints would, in turn, trigger mutations in the corresponding coding sequences through G-quadruplex-containing small RNAs deriving from Alu sequences. Indeed, retrotransposons are prone to form G-quadruplex structures and they may have contributed to the spread of G-quadruplex structures within genomes [165, 166]. Therefore, some retrotransposons (e.g. Alu) may spread not only RNA processing sites (see first part) but also G-quadruplexes that could help retrotransposon-derived exons to rapidly evolve as coding exons through the mutational process relying on co-translational biophysical constraints. Alu-derived exons could thus evolve to encode peptides that do not create deleterious constraints during translation. One general prediction of this model is that biophysical parameters impacting on co-translational events evolve together with biophysical parameters impacting on co-transcriptional events. This would imply the existence of a relationship between codon optimization, translation, and transcription, as was recently suggested [167-169].
In addition to favoring germ cell survival, by decreasing transcription-mediated genomic instability, some de novo mutations could provide germ cell growth advantage. Massive production of undifferentiated spermatogonia and their large-scale apoptosis is thought to reflect a spermatogonial selection process (“selfish spermatogonial selection”) [4, 163, 164]. Recent evidence indicates that many evolutionarily new genes are specifically expressed first in the testis. This has led to the view of the testis as a “nursery” for new gene products and the view that genes can emerge dependent on testis-specific function (the “out of testis” hypothesis) [160, 170, 171]. Therefore, not only do de novo mutations occur during gametogenesis but some of them might be filtered and eventually selected during this process.
Conclusion and outlook
If gene expression-generated biophysical constraints direct genome evolution, then organismal evolution may not just be fuelled by random mutations. First, some de novo mutations would preferentially occur in an RNA-directed manner in genomic regions that generate constraints or some kind of toxicity when they are expressed (Box 1). Second, de novo mutations would not occur randomly with respect to the biological effects they may drive. Indeed, small RNA-directed genetic variations would start because genomic sequences are toxic, and they would end when the sequences are no longer toxic. Different experimental designs could be developed to test the proposed hypothesis (Box 2).
This box describes some experimental settings that could help test the proposed hypothesis, focusing first on prokaryotic cells and next on eukaryotic cells. Since translation is coupled to transcription in prokaryotic cells, replacing optimal codons with non-optimal ones in highly transcribed genes is expected to increase local genomic instability. This would favor the re-emergence of optimal codons that would “synchronize” the dynamics of transcription and translation, and therefore decrease genomic instability. Likewise, it may be possible to engineer coding sequences leading to the biogenesis of proteins having a high probability of misfolding when emerging from ribosomes. Co-translational (therefore co-transcriptional) misfolding of nascent proteins is expected to increase local genomic instability, which would cease when new sequences alleviating protein misfolding emerge. Since RNA processing is coupled to transcription in eukaryotic cells, it might be possible to insert within gene bodies (e.g. in introns) DNA sequences that create constraints during transcription (e.g. R-loop prone sequences). These sequences would increase the local DNA instability, which would favor the emergence of RNA processing sites. It might also be possible to engineer a eukaryotic cellular model in which the folding of a protein can be challenged during translation in a controlled manner. A prediction resulting from the proposed hypothesis is that small RNAs corresponding to pieces of the parent mRNAs should be detectable, as should an increase in the mutational rate of the corresponding locus. While the experimental settings described above are expected to be associated with genetic variations at targeted loci, characterizing the underlying mutational processes in prokaryotic and eukaryotic cells would make it possible to determine whether genetic variations are random or can be driven by dedicated processes.
The interplay between antiparasite genome defence systems and RNA-directed genetic variations could be addressed in unicellular organisms by exposing them to different stressful environments when their genome defence systems are either active or inactive. One prediction is that mutation-mediated organismal adaptation would be strongly impaired in cells without an efficient antiparasite genome defence system.
Since the same gene expression-generated constraints exist in genomes of different individuals from the same species, mutations in certain genomic locations could recur frequently amongst individuals. This is in contrast with random mutations, which have a low probability of occurring several times at the same location. The high frequency of these constraint-derived mutations in a population would increase their penetrance. An RNA-directed mutation process may also explain why some mutations are more frequent than others and have been recurrently generated during evolution [172-175]. It is also interesting to note that many disease-associated mutations affect RNA processing sites.
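To give a sense of scale for this contrast, the following minimal sketch estimates how often two individuals would share a mutated site if mutations fell uniformly at random across the genome. The genome size, the number of de novo mutations per individual, and the function name are illustrative assumptions introduced here, not values or methods taken from this article; the uniform model also deliberately ignores the local sequence-context effects discussed earlier.

```python
from math import comb


def expected_shared_sites(genome_size: int,
                          mutations_per_individual: int,
                          individuals: int) -> float:
    """Expected number of (pair of individuals, site) coincidences.

    Assumes every mutation lands on a uniformly random site, independently
    of all others (a deliberately naive null model).
    """
    # For one pair: each of the n mutated sites in the first individual is
    # also mutated in the second with probability n / G, giving n^2 / G.
    per_pair = mutations_per_individual ** 2 / genome_size
    # Sum over all unordered pairs of individuals.
    return comb(individuals, 2) * per_pair


if __name__ == "__main__":
    # Hypothetical round numbers, chosen only for illustration.
    GENOME_SIZE = 3_000_000_000
    MUTATIONS = 70
    for cohort in (2, 100, 1000):
        value = expected_shared_sites(GENOME_SIZE, MUTATIONS, cohort)
        print(f"{cohort} individuals: {value:.3g} expected shared sites")
```

Under such a null model, recurrent hits at a given location remain rare even in sizeable cohorts, so a clear excess of recurrence at constraint-generating loci would be the kind of signal the proposed hypothesis predicts.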
What could be the link between gene expression-generated biophysical constraints directing genome evolution and phenotype? First, it cannot be excluded that the cellular micro-environment can impact on biophysical parameters (e.g. co-translational protein folding), which could trigger a mutational pathway that would end up alleviating the environment-mediated constraints. Therefore, modifications of the cellular environment could trigger a mutational process that increases the likelihood of generating genetic variants impacting gene products involved in the environment-mediated constraints. Related to this phenomenon, I recently proposed that mutations in cancer cells might be directed and adapted to the tumoral micro-environment. This would help explain why most (if not all) anticancer therapies fail because of tumor cell resistance.
The concept of toxicity used throughout this manuscript must be understood in a general sense and covers several notions. For example, transcription is toxic to DNA because it can cause DNA breaks. RNAs also have some degree of toxicity to DNA since they can re-hybridize to DNA and generate R-loops that are genotoxic. RNAs produced from some genomic repeated elements, like retrotransposons, are toxic to genomes because they can invade them and alter their functioning by interfering with the activity of genes. RNAs are also potentially toxic because of their ability to form molecular aggregates that are toxic to the cell, as is now clearly established in a variety of pathologies. Similarly, protein biogenesis can lead to the formation of toxic molecular aggregates that can be linked, for example, to nascent protein misfolding. The concept of toxicity used in this manuscript therefore relies on the notions that (i) a genome is used by a cell to produce molecules, RNAs, some of which are used to produce proteins; and (ii) these biogenesis processes are potentially “dangerous” for the cell because the act of producing these molecules, or the molecules themselves, can create deleterious biophysical constraints within the crowded intracellular environment.
Assessing the presence of a ventromedian muscle from classical morpho‐histological data is challenging, as such data are often of insufficient resolution to reveal axochord‐like structures, which are frequently of minute size. Facilitating our search, however, a large dataset of phalloidin stainings covering virtually all bilaterian phyla has been produced in the last 20 years, allowing widespread testing for the presence of axochord‐like structures in bilaterians.
The axochord is conserved across annelids
The first implication of our hypothesis is that the axochord must be an ancestral annelid feature. Annelids are a highly diverse group, for which the internal phylogeny has been recently clarified by phylogenomics 22, 23, making it an ideal test case. Phalloidin stainings have been published for 14 families, covering both main annelid clades (Errantia and Sedentaria) and two families that likely diverged earlier (Oweniidae 14, 24 and Magelonidae 25). Axochord‐like ventromedian muscles have been observed in virtually all of them, and usually serve as attachment bands for transverse muscles. Axochords are always composed of a pair of longitudinal myofibers closely flanking the midline, which contact each other in the main part of the trunk, but diverge at their anterior and posterior extremities (behind the mouth and in front of the anus). The degree of terminal divergence is modest in Platynereis and most other genera, but more extensive in Pomatoceros 26. In Prionospio, both myofibers closely flank the midline, but do not actually touch each other; in this configuration, the corresponding muscle has been called “paramedian muscle” 25.
We hypothesized that the paramedian configuration can be developmentally explained by incomplete convergence toward the midline of axochord‐like precursor cells during early development. We tested this hypothesis by studying an annelid known to possess such a paramedian muscle at early larval stages 27: C. teleta, a model species belonging to Sedentaria 22. Phalloidin stainings revealed that the previously documented paramedian muscle fibers converge in late development and form a proper axochord before hatching 14 (Fig. 2B). Axochord development thus underwent a heterochronic shift between Platynereis and Capitella: in Capitella, axochordal cells first form differentiated myofibers and then converge, while in Platynereis those events happen in the opposite order (Fig. 2A and B). Gene expression data for all axochord markers investigated (brachyury, foxA, netrin, slit, hedgehog, and twist2) are consistent with expression in the Capitella axochord 14, 28, 29. The Capitella data thus confirm conservation of at least part of the axochord/notochord molecular signature within annelids, and provide a possible mechanism for the evolutionary transition between paramedian and ventromedian configurations. Finally, in a subgroup of Sedentaria (Clitellata, which include earthworms and leeches), the entire body is surrounded by a continuous longitudinal muscle layer 30, complicating observations. However, in earthworms and leeches, a distinct ventromedian longitudinal muscle (called “epineural muscle” or “capsular muscle” 31, 32) is present immediately above the ventral nerve cord and below the ventral blood vessel – thus representing a bona fide axochord. Like the Platynereis axochord, the epineural muscle is firmly embedded within the ventral nerve cord sheath. Its contractions are thought to allow deformation of the nerve cord in concert with body shape changes during peristaltic motion. 
Molecular data on clitellates are scarce, but the ventromedian myofibers of leeches have been reported to express the specific intermediate filament‐encoding gene hif‐3, which is absent from lateral longitudinal muscles 33.
Only two annelid families clearly lack an axochord: Sphaerodoridae 34 and Sipunculidae 35.
The most parsimonious ancestral state for annelids is the presence of a canonical axochord, composed of two adjacent longitudinal myofibers flanking the midline, with attached transverse muscles (Fig. 2C). Importantly, conservation of a stereotypical axochord is compatible with the huge variety of annelid lifestyles and morphologies, including sessile suspension‐feeders, errant bottom‐dwellers, burrowers, and undulatory swimmers.
Conclusions about other phyla face two main limitations: for most, the internal phylogeny is still under debate (apart from annelids, molluscs, arthropods, and chordates), and the interrelationships of the phyla themselves (i.e. the higher‐order bilaterian phylogeny) remain partially unresolved. While “Chordata,” “Ambulacraria,” “Ecdysozoa,” and “Spiralia” seem stable, their internal branching is more contentious. Moreover, the bilaterian phylogeny is strongly dichotomous: the general structure of the animal phylogenetic tree seems closer to successive symmetrical bifurcations between equally large groups, than to successive branching of individual phyla from one stem – hence producing a “balanced” or “symmetrical” phylogenetic tree 36, 37. In such a tree, there are no strategically located “basal” branches that would carry higher weight on the inferred ancestral states at key nodes, and conclusions can only be reached after examination of a broad sample. With these caveats in mind, a survey of the available data allows some insights into musculature evolution and the possible ancestrality of ventromedian muscles.
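The kind of parsimony reasoning invoked for ancestral states can be made concrete with a minimal sketch of Fitch's small‐parsimony procedure on a binary tree. The tree topology, the taxon labels, and the presence/absence scores below are hypothetical placeholders chosen purely for illustration; they are not the phylogeny or the character matrix discussed in this article, and the function name is ours.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a leaf name or a pair (left_subtree, right_subtree).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, int]) -> Tuple[Set[int], int]:
    """Bottom-up Fitch pass.

    Returns the set of equally parsimonious states at the root of `tree`
    and the minimum number of state changes required below it.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed.
        return shared, left_cost + right_cost
    # Children disagree: keep both options and count one change.
    return left_set | right_set, left_cost + right_cost + 1


if __name__ == "__main__":
    # Hypothetical balanced tree with eight placeholder taxa.
    toy_tree: Tree = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))
    # 1 = ventromedian muscle scored present, 0 = scored absent (made up).
    toy_states = {"A": 1, "B": 1, "C": 1, "D": 0,
                  "E": 1, "F": 1, "G": 1, "H": 0}
    root_states, changes = fitch(toy_tree, toy_states)
    print("root state set:", root_states, "| changes:", changes)
```

In a balanced tree of this kind, each of the two major subtrees contributes equally to the root set, which illustrates why no single “basal” lineage can settle the ancestral state and why a broad sample is needed.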
The axochord is conserved across non‐Annelid spiralians
Annelids are part of the superphylum Spiralia, which includes both large coelomate animals and small acoelomate groups of the interstitial fauna (or “platyzoa,” which are likely not monophyletic 21) 38. One additional microscopic phylum, Cycliophora, exclusively lives as a commensal on the mouthparts of lobsters 39. Strikingly, axochord‐like ventromedian muscles have been described in both coelomate and acoelomate spiralians. For ancestral state reconstruction, we will use a recent phylogeny of Spiralia 21, which proposes that this clade is composed of three monophyletic groups: Lophotrochozoa (containing all coelomate spiralians), and two acoelomate groups: Rouphozoa and Gnathifera.
An axochord is present in molluscs, brachiopods and nemerteans
In molluscs, a ventromedian muscle composed of adjacent paired fibers has been described in the larvae of Aplacophora (Wirenia argentea) and Polyplacophora (Leptochiton asellus and Mopalia muscosa) 40 (Fig. 3A), where it serves as an attachment point for transverse muscles. Together, Aplacophora and Polyplacophora form a clade considered the sister‐group of all other molluscs 41, 42. The ventromedian and transverse muscles exist only transitorily during mollusc development, and have been speculated to represent recapitulative instances of ancient structures 40.
The axochord in spiralians. The tree follows 21. A: Trochozoa. Annelid after 14, molluscs after 40, brachiopod after 43 (and personal communication of Dr. Andreas Altenburger), nemertean after 46. B: Cycliophora, Rouphozoa and Gnathifera. Cycliophoran...
In brachiopods, ventromedian myofibers have been detected in the ventral midline of the early three‐lobed larvae of Argyrotheca and Terebratalia (43 and Dr. Andreas Altenburger, personal communication; Fig. 3A and Supp. Fig. S1). Conserved expression of the annelid axochord markers mox, foxD, and noggin has been reported in a stripe of ventromedian mesoderm in Terebratalia, suggesting conservation of the axochord molecular profile between annelids and brachiopods 44. At very late larval stages, only faint phalloidin stainings are visible in the ventral midline 44, 45, which suggests the Terebratalia axochord might grow at a smaller rate (or regress) compared to other ventral muscles; its earlier presence is however unambiguous (43 and Fig. S1).
In nemerteans, a ventromedian muscle (without transverse fibers) is among the first muscles to form in the embryo of Prosorhochmus 46 (Fig. 3A).
One acoelomate phylum has been tentatively assigned to Lophotrochozoa 15: the minute cycliophorans. In this group, planktonic larvae possess a hugely expanded and vacuolized ventromedian muscle: the “chordoid organ” 39, 47, 48 (Fig. 3B). Its function in a larva that moves primarily by ciliary beating is unclear: its role might be maintaining body shape (as both the axochord and the notochord do 14), and in particular bracing the midline when ventrolateral muscles contract during turning. The chordoid organ cells contain circular myofilaments, organized as “ring fibers” surrounding the vacuoles. The peculiar orientation of these fibers might be a consequence of vacuolization (see below).
Ectoprocts and entoprocts, which lack unambiguous dorsal and ventral sides, are not considered here, as their strongly modified bodyplan precludes comparisons.
Axochords have a mosaic presence in gnathiferans (rotifers, gnathostomulids and micrognathozoans) and rouphozoans (platyhelminthes and gastrotrichs)
In interstitial phyla, paired myofibers closely flanking the ventral midline have also been reported – for example, in the trunk of the rotifers Proales daphnicola 49 and Brachionus urceolaris 50, of the gastrotrich Xenotrichula intermedia 51, and of the gnathostomulid Gnathostomula peregrina 52 (Fig. 3B). As in annelids, they diverge anteriorly and posteriorly, and some other species display the same fibers in a more divergent, paramedian configuration. Variable degrees of convergence (ranging all the way from ventromedian to paramedian) can coexist within the same genus – for example, Proales (rotifer) 49, or Xenotrichula (gastrotrich) 51. Despite these differences, this muscle has been recognized as clearly being the same under both configurations (from its position, general morphology and connections) in the descriptions of these genera. This suggests that, as in annelids, the transition between ventromedian and paramedian muscles is easily achieved by complete versus partial convergence processes. The adaptive significance of these varying degrees of convergence is unclear.
In Limnognathia maerski, the only species of the small gnathiferan phylum Micrognathozoa, transverse muscles are attached to a paramedian muscle, which itself is attached to the posterior border of the pharynx 53 (Fig. 3B) – hence displaying connection properties similar to the ventromedian/paramedian muscles of other gnathiferans and of annelids. We hypothesize here that there is homology between those midline‐flanking paired longitudinal muscles across Spiralia: while varying degrees of convergence can result in slightly different morphologies, their connection properties are conserved, as their molecular profiles should be – allowing eventual testing of this hypothesis by expression profiling.
The axochord likely represents an ancestral spiralian feature
An annelid‐like axochord has been reported for the majority of spiralian phyla, and usually serves as an attachment band for repeated transverse muscles. The fact that the axochord is a sometimes transient feature of early development supports its ancestral presence in Spiralia and argues for evolutionary transitions from ancestral muscular systems based on antagonism between ventromedian, transverse, and ventrolateral myofibers (possibly already surrounded by a circular layer 55), to worm‐shaped peristaltic forms relying exclusively on continuous longitudinal and circular layers (e.g. the adults of some large nemerteans), or to sessile lophophorate forms (e.g. adult brachiopods). A temporary embryonic/larval axochord might persist by sheer phylogenetic inertia, or it might still fulfill transient function, such as larval locomotion or signaling.
Is an axochord conserved in Ecdysozoa?
The internal ecdysozoan phylogeny is still unclear 48, 49, 50, 56, 57. Three frequently proposed clades are Panarthropoda (onychophorans, tardigrades, and arthropods), Scalidophora (priapulids, kinorhynchs, and loriciferans), and Nematoida (nematodes and nematomorphs), and we follow this view here. Ecdysozoans are defined by the shared presence of a moulting exoskeleton 58 which, in several phyla, shows a tendency to become increasingly rigid and to replace muscles as supporting structures or as antagonists. Some degree of repeated muscle loss would thus be unsurprising in ecdysozoans. Nevertheless, some phyla have an axochord, and hypotheses on the evolution of ecdysozoan musculature can be proposed.
Ventromedian muscles are present in Scalidophora (kinorhynchs and loriciferans)
Paired ventromedian muscles serving as an attachment band for transverse muscles exist in the kinorhynch Antygomonas 59 and in the Higgins larva of the loriciferan Armorloricus 60 (Fig. 4A). Adult priapulids rely on antagonism between continuous longitudinal and circular layers around the body, as typical for burrowing worms 30, and the musculature of embryonic/larval priapulids is still incompletely known (but see Ref. 61 for a recent description of the Priapulus caudatus larval musculature with a mention of a ventromedian retractor muscle in the first lorica larva).
The axochord in ecdysozoans and deuterostomes. A: Ecdysozoans. Loriciferan after 60, kinorhynch after 59. Onychophoran reconstituted after 62. B: Deuterostomes and chaetognaths. Chaetognath after 14. Hemichordate reconstituted after 77. Colors are...
Ventromedian muscles are present in onychophorans but not in tardigrades
Onychophorans have a hugely developed ventromedian muscle 62 (Fig. 4A) – which probably acts in bracing the body, and notably in preventing deformation of the ventral side (housing the ventral nerve cord) during hydrostatic expansion/retraction of lateral appendages. Unlike the annelid axochord, the onychophoran ventromedian muscle is not an attachment band for transverse muscles: onychophoran appendage muscles attach to the tegument, on a structure called the ventral organ 63. This argues against homology of onychophoran appendicular muscles to the annelid ventral oblique muscles moving the parapodia – consistent with the common assumption that Urbilateria lacked trunk appendages 64, and that different muscles might have been co‐opted or neoformed for appendage movement in different phyla. No ventromedian muscle is known in the tardigrade trunk (though a minute one is present in the foregut 65, 66).
The special case of the arthropod mesodermal midline glia: a modified axochord?
In line with the evolution of a sclerotized cuticle as a supporting scaffold, arthropods have been proposed to have undergone a massive reduction of their ancestral onychophoran‐like circular/longitudinal musculature, that lost its ancestral bracing function 67. Any ventromedian muscle (absent from all investigated arthropods) would plausibly have been lost in this process.
However, insects do possess a non‐muscular mesodermal midline: the so‐called “mesodermal midline glia” or “DM cells” (“dorsal median,” as they are positioned immediately dorsally to the central nervous system). At first sight, the Drosophila mesodermal midline glia seem to display a number of similarities to the axochord: they are present in the form of segmentally repeated pairs of cells immediately above the ventral nerve cord 68; are required for commissural axon guidance 69; express netrin 70; and are specialized in matrix secretion 71 – including some components shared with the axochord and notochord (laminin) but also some that are not (collagen IV and the arthropod‐specific protein glutactin). Finally, the key defining transcription factor of the DM cells, the homeodomain protein mox/buttonless 69, is also expressed in the Platynereis axochord (but not in the notochord) 14.
However, a number of key differences cast doubt on the homology of the mesodermal midline glia to the axochord: 1) DM cells, which have an elongated monopolar shape, extend long lateral processes in a transverse, rather than longitudinal, direction 68, 69. 2) DM cells coexpress paraxis (CG12648/CG33557) 72 and engrailed 73, which together represent a specific profile for the annelid ventral oblique muscles – which also express mox and netrin 14. On the other hand, DM cells express none of the specific axochordal/notochordal transcription factors (such as brachyury and foxA). 3) The lateral processes of DM cells are anchored at the attachment point of the lateral longitudinal muscles on the body wall (muscle 7) 68. These connection properties are expected if they are equivalent to transverse muscles, which in annelids reach out to the ventrolateral longitudinal muscles (Fig. 2A) – but have nothing to do with those of ventromedian myocytes.
By their molecular profile, orientation and muscular connections, DM cells are more similar to annelid ventral oblique muscles than to the axochord, and it can be hypothesized that they are modified transverse muscles. In this hypothesis, if a ventromedian muscle was ancestrally present in panarthropods (as suggested by the onychophoran situation), it would have been entirely lost in Drosophila, and former transverse myocytes would have come to occupy the vacant mesodermal midline. This homology hypothesis is testable in several ways: while, as noted above, onychophorans lack annelid‐like transverse muscles, they might possess mox/en/netrin/paraxis+ DM‐like cells – which should attach both to the ventromedian muscle (lost in Drosophila) and to the lateral longitudinal muscles (as in Drosophila); the transverse muscles of kinorhynchs would be interesting to investigate in this respect.
The axochord has most likely been lost in Nematoida (nematodes and nematomorphs)
Nematoids have a near‐continuous longitudinal muscle layer surrounding the body. Unlike the clitellate configuration, this longitudinal muscle layer lies internal to the ventral nerve cord (though the nervous ganglia secondarily “sink” below the muscle layer by crossing it during nematomorph development 74). Conservation of the axochord molecular profile in nematoids is unlikely, because, at least in the model nematode C. elegans, several key axochord/notochord genes (including brachyury, colA, and soxE) have simply been lost from the genome. This makes it difficult to identify any potential axochord homolog, which might either have been lost or modified beyond recognition. Ventral longitudinal muscles of C. elegans still specifically express unc130/foxD 75 and netrin 76 (two ventral somatic muscle markers in Platynereis), but not foxA (PHA‐4) – showing that, while some general musculature patterning is recognizable in nematodes, a specific axochord homolog cannot be identified. The axochord might thus have been lost, together with transverse muscles, in conjunction with the evolution of the specialized nematoid locomotion, relying on the antagonism between ventro‐ and dorsolateral muscle blocks and an elastic cuticle 30.
The axochord has a mosaic presence in ecdysozoans
Of the three ecdysozoan clades, only scalidophorans can be inferred to ancestrally possess an axochord. The panarthropod ancestral state is undetermined: only onychophorans have a clear ventromedian muscle. Finally, nematoids possess a simplified musculature – and, at least in C. elegans, a simplified genome. The ancestral state for ecdysozoans therefore remains undecided. However, the clear presence of an axochord in at least three ecdysozoan phyla – and its inferred ancestral presence in the outgroups Spiralia and Chaetognatha (see below) – make the hypothesis of an ancestral ecdysozoan axochord attractive. To this ground pattern, scalidophoran‐like transverse muscles might be added. Molecular characterization of ecdysozoan ventral mesodermal cells will be key in testing these hypotheses.
A ventromedian muscle is present in Chaetognatha, a possible protostome outgroup
Chaetognatha is a relatively small (but very abundant) phylum (120 species) of worm‐shaped swimming invertebrates, which might have diverged before all other protostomes 78, 79. The chaetognath body comprises pairs of coelomic cavities surrounding the gut, separated along the midline by myoepithelial dorsal and ventral mesenteries 80. The body is almost entirely surrounded by strong longitudinal striated muscles. Flanking the ventral midline, directly connected to the ventral mesentery, are specialized longitudinal myofibers of triangular cross‐section, which present a unique type of striation, and are hence called “secondary muscles” 81. This distinguishes them from all neighboring ventrolateral longitudinal muscles, as this peculiar striation type is only present in two other locations in the body (laterally and in the dorsal midline). Their nature, orientation, triangular shape and connection to the ventral mesentery are reminiscent of the axochord. Moreover, like the axochord, the chaetognath ventromedian longitudinal muscle bifurcates behind the foregut 14 (Fig. 4B). The chaetognath ventromedian muscle might thus be an axochord homolog.
No transverse muscles are known. The weak circular smooth fibers of myoepithelial cells within the mesenteries provide some limited antagonism to longitudinal muscles, but they are incomparable in nature and position to the transverse muscles of other protostomes 80. Transverse fibers might have been lost during the evolution of the highly specialized chaetognath undulatory swimming, which is effected by dorso‐ventral antagonism; alternatively, they might have evolved only after chaetognaths branched off the protostome stem.