A couple of weeks ago I was talking about foreign DNA and how to differentiate it from that of the host genome. Well, there were two other ways to identify foreign sequences that I was gonna talk about:
- Genomic dissimilarity
Genomes have another “odd” characteristic: each one has its own frequencies for dinucleotides (pairs of adjacent nucleotides). That is, the frequency of A and T appearing together as AT in Salmonella enterica is different from that in Shewanella amazonensis, and the same applies to the other pairs (AG, GT, CA, …).
When you compare the dinucleotide frequencies of the whole genome with those of a group of genes from the same organism, the two should be very similar (low genomic dissimilarity), so you can state that the genes belong to that genome. If instead you get a high genomic dissimilarity, meaning the two groups have very different dinucleotide frequencies, you can say that those genes might not originally belong to the core genome and might have been horizontally acquired. Or, what amounts to the same thing, three points on the acquired-o-meter!
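To make the idea concrete, here is a minimal Python sketch of one way to score genomic dissimilarity. It is loosely modeled on Karlin's dinucleotide relative abundance, where each pair's observed frequency is divided by the product of its two mononucleotide frequencies (rho_XY = f_XY / (f_X * f_Y)), and the dissimilarity is the mean absolute difference over the 16 pairs. As a simplification (an assumption of this sketch), it reads one strand only instead of combining each sequence with its reverse complement, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def relative_abundance(seq: str) -> dict:
    """Dinucleotide relative abundance rho_XY = f_XY / (f_X * f_Y), one strand only."""
    seq = seq.upper()
    mono = Counter(b for b in seq if b in BASES)
    # Count overlapping dinucleotides, skipping any pair containing N or other symbols
    di = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in BASES and seq[i + 1] in BASES
    )
    n_mono, n_di = sum(mono.values()), sum(di.values())
    if n_di == 0:
        raise ValueError("sequence needs at least one valid dinucleotide")
    rho = {}
    for x, y in product(BASES, repeat=2):
        f_x, f_y = mono[x] / n_mono, mono[y] / n_mono
        f_xy = di[x + y] / n_di
        # A pair involving an absent base gets 0 instead of a division by zero
        rho[x + y] = f_xy / (f_x * f_y) if f_x and f_y else 0.0
    return rho


def dinucleotide_dissimilarity(seq_a: str, seq_b: str) -> float:
    """Mean absolute difference in relative abundance over the 16 dinucleotides."""
    rho_a, rho_b = relative_abundance(seq_a), relative_abundance(seq_b)
    return sum(abs(rho_a[k] - rho_b[k]) for k in rho_a) / 16


print(dinucleotide_dissimilarity("ATATATATAT", "GCGCGCGCGC"))
```

In practice you would pass the whole genome as one argument and the candidate genes as the other. The sketch does not calibrate any cutoff for calling a group of genes "acquired"; that threshold is up to you.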
- Codon and amino acid usage
Another intrinsic characteristic of genomes is codon usage. As you might know, the same amino acid can be encoded by different codons (a codon being a string of three nucleotides that encodes one amino acid); this is what is called the degeneracy of the genetic code.
Have a look at the standard genetic code table to see this for yourself.
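The degeneracy is easy to see if you build the codon table yourself. This short Python sketch encodes the standard genetic code (NCBI translation table 1) as a 64-letter string, with codons ordered TTT, TTC, TTA, TTG, TCT and so on, and lists the synonymous codons for a given amino acid; the helper name is just for illustration.

```python
from itertools import product

# Standard genetic code (NCBI table 1); third base varies fastest, in the order T, C, A, G
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


def synonymous_codons(amino_acid: str) -> list:
    """All codons that encode the given one-letter amino acid, sorted alphabetically."""
    return sorted(codon for codon, aa in CODON_TABLE.items() if aa == amino_acid)


print(synonymous_codons("P"))  # ['CCA', 'CCC', 'CCG', 'CCT']
print(synonymous_codons("L"))  # ['CTA', 'CTC', 'CTG', 'CTT', 'TTA', 'TTG']
print(synonymous_codons("M"))  # ['ATG']
```

Proline has four synonymous codons and leucine six, while methionine has just one, so only amino acids like the first two leave room for an organism to show a preference.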
So, again, each organism has its own preferences when encoding an amino acid. For instance, most codons encoding proline in organism A might be CCT, whereas in organism B they might be CCA; this is called codon usage. If you compare the codon usage of your organism's entire genome with that of a group of genes you suspect were acquired, you can check whether your hypothesis holds up. Similar codon usage: possibly a common origin; different codon usage: possibly a different origin (acquired).
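A minimal Python sketch of that comparison follows. It counts in-frame codons in two sets of coding sequences, converts the counts into synonymous-codon fractions (for each amino acid, the share of its codons that use each alternative), and averages a simple per-amino-acid difference. The distance used here (half the summed absolute difference, so 0 means identical preferences and 1 means no overlap at all) is an ad hoc choice for illustration, not a published codon-usage index, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


def codon_counts(cds_list):
    """Count sense codons read in frame 1 of each coding sequence."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        for i in range(0, len(cds) - 2, 3):
            codon = cds[i:i + 3]
            if CODON_TABLE.get(codon, "*") != "*":  # skip stop and ambiguous codons
                counts[codon] += 1
    return counts


def synonymous_fractions(counts):
    """For each observed amino acid, the fraction of its codons that use each synonym."""
    totals = Counter()
    for codon, n in counts.items():
        totals[CODON_TABLE[codon]] += n
    fractions = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*" and totals[aa]:
            fractions.setdefault(aa, {})[codon] = counts[codon] / totals[aa]
    return fractions


def codon_usage_distance(genome_cds, candidate_cds):
    """Mean over shared degenerate amino acids of half the L1 distance (0 = same, 1 = disjoint)."""
    a = synonymous_fractions(codon_counts(genome_cds))
    b = synonymous_fractions(codon_counts(candidate_cds))
    shared = [aa for aa in a if aa in b and len(a[aa]) > 1]  # skips Met and Trp
    if not shared:
        raise ValueError("no degenerate amino acid is present in both sets")
    per_aa = [0.5 * sum(abs(a[aa][c] - b[aa][c]) for c in a[aa]) for aa in shared]
    return sum(per_aa) / len(per_aa)


print(codon_usage_distance(["CCTCCTCCTCCA"], ["CCACCACCACCA"]))  # 0.75
```

With the proline example from the text, a "genome" that mostly uses CCT against a candidate that only uses CCA gives a distance of 0.75. A real comparison would use thousands of codons rather than a handful, since a few codons per amino acid give very noisy fractions.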
There is a big drawback to this method: highly expressed genes often show a different codon usage even though they belong to the same genome, so you might get false positives. Another thing to keep in mind is to check that your genes are in frame: a single extra (or missing) nucleotide will cause a frameshift and completely alter your results.
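Before running either comparison, a quick sanity check on each gene can catch frame problems. This Python sketch flags a coding sequence whose length is not a multiple of three, that does not end with a stop codon, or that contains an in-frame stop codon before the end. It also assumes genes start with ATG, which is a simplification: bacteria sometimes use alternative start codons such as GTG or TTG, so treat that particular flag as a hint rather than an error. The function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def frame_problems(cds: str) -> list:
    """Return reasons why cds may not be a clean, in-frame coding sequence."""
    cds = cds.upper()
    problems = []
    if len(cds) % 3:
        problems.append(f"length {len(cds)} is not a multiple of 3")
    if not cds.startswith("ATG"):  # simplification: ignores GTG/TTG starts
        problems.append("does not start with ATG")
    # Split into complete codons only; a trailing partial codon is dropped
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    internal = [i for i, c in enumerate(codons[:-1]) if c in STOP_CODONS]
    if internal:
        problems.append(f"in-frame stop codon(s) at codon index {internal}")
    return problems


print(frame_problems("ATGCCTTAA"))        # []
print(frame_problems("ATGCCTTGAAAATAA"))  # ['in-frame stop codon(s) at codon index [2]']
```

An internal stop codon is often the first visible symptom of a frameshift, which is why the check reports its position.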
I think that’s enough!
Long live evolution and genetic exchange!