The sequencing of complete genomes has confirmed the high frequency of LGTs in prokaryotes (Koonin et al. 2001; Ochman et al. 2000) and of gene duplications in eukaryotes (Lynch and Conery 2000). It is therefore potentially dangerous to deduce the phylogeny of organisms from trees that are based on a single gene only (Page and Charleston 1997). The presence of multiple copies of a homologous gene in a single genome opens the way to recombination. If a gene becomes involved in recombination events, however, the most appropriate mathematical representation of its history is a network rather than a tree (a tree being a connected graph without cycles). Although difficult to identify, intragenic recombination was recently demonstrated through the incongruent taxonomic distribution of indels in enolase and in inosine monophosphate dehydrogenase (IMPDH), rendering the phylogenetic reconstruction of eukaryotes based on these genes elusive (Bapteste and Philippe 2002; Keeling and Palmer 2001). In fact, recombination between paralogous or xenologous genes is probably more frequent than was previously thought (Archibald and Roger 2002), which makes phylogenetic inference even more problematic (Posada and Crandall 2002).
The methods used to identify LGTs (so-called indirect methods) are principally based on atypical nucleotide composition and on the observation that the most similar sequences in the databases do not belong to closely related species (similarity is normally estimated by a BLAST search). These approaches are only poor indicators, however (Guindon and Perriere 2001; Koski and Golding 2001; Koski et al. 2001; Wang 2001). This is illustrated by the fact that four alternative methods detected different groups of genes (Ragan 2001). The most likely explanation is that these indirect methods detect LGTs of different ages (Lawrence and Ochman 2002; Ragan 2001). Consequently, although it is slow, the thorough phylogenetic approach represents the only direct detection method and should be used systematically (see below).
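As a purely illustrative sketch of the composition-based idea, the following Python snippet flags genes whose GC content deviates strongly from the genome-wide mean. The gene names, the sequences and the z-score cutoff of 1.5 are all hypothetical; published indirect methods typically rely on richer signals (codon usage, oligonucleotide frequencies) and on far more genes than this toy set.

```python
from statistics import mean, pstdev


def gc_content(seq):
    """Fraction of G and C nucleotides in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def atypical_genes(genes, z_cutoff=1.5):
    """Return names of genes whose GC content lies more than
    z_cutoff standard deviations from the mean over all genes.
    The cutoff is an arbitrary assumption made for this sketch."""
    gc = {name: gc_content(seq) for name, seq in genes.items()}
    mu = mean(gc.values())
    sigma = pstdev(gc.values())
    if sigma == 0:
        return []  # all genes share the same composition
    return [name for name, value in gc.items()
            if abs(value - mu) / sigma > z_cutoff]


# Hypothetical toy genome: five genes of similar composition
# and one AT-rich gene.
toy_genes = {
    "gene1": "ATGC" * 5,
    "gene2": "ATGCG" * 4,
    "gene3": "ATGCA" * 4,
    "gene4": "GATC" * 5,
    "gene5": "CATGC" * 4,
    "gene6": "ATTGA" * 4,
}

flagged = atypical_genes(toy_genes)
print(flagged)
```

A gene flagged in this way is only a candidate: an ancient transfer may already have adopted the composition of its host, and a native gene may be atypical for other reasons, which is consistent with the observation above that such criteria are poor indicators.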
The underlying assumption of this method is that a phylogeny of organisms exists and can be represented in the form of a bifurcating tree. Because LGTs seem to be common and to affect all genes (Koonin et al. 2001), it was suggested that "the history of life could not be correctly represented in the form of a tree" (Doolittle 1999c). Other authors hold less radical opinions on the damaging impact of LGTs on the reconstruction of evolutionary history. For example, it was suggested that LGTs could have played an important role only during a precellular stage of evolution, because it was hypothesised that "primitive entities were basically modular (loosely coupled) in construction", thus facilitating LGT events (Woese 2000). Alternatively, only a subset of the genetic material could be affected by this phenomenon, leaving a veritable core of nontransferable genes (Jain et al. 1999). In any case, LGTs likely represent an important evolutionary force in the diversification of the prokaryotes (Doolittle 1999b; Martin 1999; Ochman et al. 2000).
To test whether a phylogeny of organisms exists at all, several groups applied genomic approaches (Fitz-Gibbon and House 1999; Korbel et al. 2002; Lin and Gerstein 2000; Tekaia et al. 1999; Wolf et al. 2001). These methods are normally based on gene content (presence or absence) or on gene order. The resulting phylogenies are more or less similar to the rRNA-based tree. For example, the monophyly of the three domains and of several other major groups (such as animals, spirochaetes and Proteobacteria) is generally recovered. However, unexpected relationships, like that between Thermoplasma (Euryarchaeota) and the Crenarchaeotes (Korbel et al. 2002), demonstrate the biases introduced in these approaches by frequent LGTs between taxa that are phylogenetically distant but coexist in the same environment (Ruepp et al. 2000). Another striking example is given by the hyperthermophilic bacteria, which probably acquired up to 24% of their genes from the Archaea (Aravind et al. 1998; Nelson et al. 1999). Furthermore, the convergent loss or gain of a comparable set of genes under similar physiological conditions, e.g. the adaptation to intracellular parasitism, can equally introduce biases. In conclusion, methods based on complete genomes should be considered phenetic rather than phylogenetic approaches (Doolittle 1999a; Wolf et al. 2001). Large-scale studies based on the concatenation of a great number of genes (Brown et al. 2001), despite their great potential for the inference of phylogenetic trees, can also be influenced by LBA artefacts, especially if they are applied to prokaryotes. In fact, if too many laterally transferred genes are included in the data sets, the results could correspond to a simple mean of the LGT frequencies between the organisms rather than being a correct indicator of the phylogeny of the organisms.
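The phenetic character of gene-content comparisons can be seen in a minimal Python sketch. It uses the Jaccard distance between sets of gene families as a simple stand-in; the cited studies define their own measures, and the genome names and family identifiers below are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of gene families."""
    union = a | b
    if not union:
        return 0.0  # two empty genomes are treated as identical
    return 1.0 - len(a & b) / len(union)


# Hypothetical presence/absence data: each genome is the set of
# gene families it contains.
gene_content = {
    "genome_1": {"fam1", "fam2", "fam3", "fam4"},
    "genome_2": {"fam1", "fam2", "fam3", "fam5"},
    "genome_3": {"fam1", "fam6", "fam7", "fam8"},
}

# Pairwise distance for every pair of genomes.
distances = {
    (x, y): jaccard_distance(gene_content[x], gene_content[y])
    for x, y in combinations(sorted(gene_content), 2)
}

for pair, d in distances.items():
    print(pair, round(d, 3))
```

Such a distance measures only overall similarity of gene repertoires. Two genomes that share families because of LGT, or that lost the same families convergently, appear close whatever their true relationship, which is the bias described above.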
A circular, and therefore specious, methodology can be applied in order to recover the tree in which one believes. For example, Brown and collaborators (2001) observed incongruities between their tree based on several fused genes and the one based on rRNA. They obtained an increase in congruence (e.g. the basal emergence of the hyperthermophilic bacteria) by eliminating, out of a total of 23 genes, the nine that did not recover the monophyly of the bacterial clade. Such an approach, based on only 14 genes, does not represent convincing evidence in favour of the existence of a phylogeny of the organisms.
In order to avoid these limitations, our group recently developed a method that evaluates congruence among genes without a reference phylogeny (Brochier et al. 2002). Starting with 45 complete bacterial genomes, 59 markers involved in translation (57 proteins and the 16S and 23S rRNAs) were chosen, which were assumed to be exempt from LGTs (Jain et al. 1999), as well as 39 markers well known for being transferred (essentially tRNA synthetases (Woese et al. 2000), but also some ribosomal proteins (Brochier et al. 2000)). To estimate the congruence, each marker was described by a vector containing its likelihood values for a set of representative topologies. These vectors were subsequently examined by a principal-component analysis (PCA). If two genes share the same evolutionary history, they provide comparable support for the same topologies and will therefore group in the same area of the PCA. After eliminating stochastic effects, 46 of the 59 markers supposed a priori to be nontransferable and six of the 39 supposed a priori to be transferable formed a compact cloud of points. Consequently, the identification of a genuine core of 52 genes that belong a posteriori to the nontransferred fraction of the genome among the 45 analysed species argues strongly in favour of a phylogeny of organisms for the Bacteria. Furthermore, the phylogeny based on these proteins is in good agreement with that based on the two concatenated rRNA genes (Brochier et al. 2002). The PCA method also allowed the detection of very likely cases of LGT that had not been reported earlier (e.g. for the translation elongation factor EF-G).
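The principle of this analysis can be sketched in a few lines of Python with NumPy. This is not the published implementation: the log-likelihood values are invented, the per-marker standardisation is an assumption introduced so that long genes do not dominate, and the distance to the median is only a crude substitute for inspecting the PCA plot.

```python
import numpy as np


def congruence_pca(loglik, n_components=2):
    """Project markers onto principal components.
    Rows are markers, columns are candidate topologies."""
    X = np.asarray(loglik, dtype=float)
    # Assumption: standardise each marker's vector so that only
    # its relative preference among topologies is compared.
    X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
    # Centre each topology (column) before extracting components.
    X = X - X.mean(axis=0, keepdims=True)
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def distance_from_cloud(scores):
    """Euclidean distance of each marker to the coordinate-wise median."""
    centre = np.median(scores, axis=0)
    return np.linalg.norm(scores - centre, axis=1)


# Hypothetical log-likelihoods of four markers on five topologies.
# Markers A, B and C prefer the same topologies; D prefers others.
markers = ["A", "B", "C", "D"]
loglik = [
    [-1000.0, -1005.0, -1020.0, -1030.0, -1012.0],
    [-2000.0, -2011.0, -2039.0, -2061.0, -2025.0],
    [-500.0, -502.0, -510.5, -515.0, -506.0],
    [-1030.0, -1020.0, -1000.0, -1005.0, -1012.0],
]

scores = congruence_pca(loglik)
dist = distance_from_cloud(scores)
outlier = markers[int(np.argmax(dist))]
print(outlier)
```

In this toy data set the three congruent markers fall close together in the plane of the first two components, whereas the marker supporting different topologies lies far from them, which is the behaviour the method exploits to separate a core of nontransferred genes from transferred ones.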
We applied the same approach to 14 archaeal species and obtained very similar results: a core of nontransferred genes could be detected, and the phylogeny based on their concatenation gives a good approximation of the phylogeny of the species (e.g. as obtained from the concatenated rRNAs) (Matte-Tailliez et al. 2002). Without minimising the importance of LGTs in prokaryotes, these results nevertheless demonstrate that a universal tree of organisms really exists and that its inference represents the primary challenge of molecular phylogenetics. We will now focus on the recent progress made to resolve this challenge.