Proc Natl Acad Sci U S A. 2006 Jan 31;103(5):1364-9. Epub 2006 Jan 23.
Genomic fossils as a snapshot of the human transcriptome.
Shemesh R, Novik A, Edelheit S, Sorek R.
Processed pseudogenes (PPGs) are cDNA sequences that were generated
through reverse transcription of mature, spliced mRNAs and have
subsequently been reinserted at a new genomic location. These cDNA
sequences are usually no longer transcribed and are considered "dead on
arrival." Here we show that PPGs can be used to generate a map of the
transcriptome. By analyzing thousands of human PPGs, we discovered
hundreds of previously unidentified transcript variants. Experimental
verification of a subset of these variants by RT-PCR indicates that
most of them are still active in the human transcriptome.
Furthermore, we demonstrate that PPGs can enable the identification of
ancient splice variants that were expressed ancestrally but are now
extinct. Our results show that the genome itself carries a "virtual cDNA
library" that can readily be used to analyze both present and ancestral
transcripts. Our approach can be applied to sequenced metazoan genomes
to computationally annotate splicing variation even when expressed
sequences are unavailable.
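
The idea of reading splice variants off PPGs can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes that a PPG has already been aligned back to its parent gene locus, that the aligned blocks and the annotated exons are given as half-open intervals on that locus, and that an annotated exon lying within the aligned span but matching no block was skipped in the ancestral mRNA. The names `overlaps` and `skipped_exons` are hypothetical, and the coordinates are invented for illustration.

```python
from typing import List, Tuple

# Half-open interval [start, end) on the parent gene locus (assumed convention).
Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def skipped_exons(annotated_exons: List[Interval],
                  ppg_blocks: List[Interval]) -> List[int]:
    """Return indices of annotated exons that lie inside the span covered
    by the PPG alignment but overlap none of its aligned blocks.

    Such exons are candidates for having been skipped in the mRNA that
    gave rise to the PPG (hypothetical criterion, for illustration only).
    """
    if not ppg_blocks:
        return []
    span_start = min(start for start, _ in ppg_blocks)
    span_end = max(end for _, end in ppg_blocks)
    skipped = []
    for index, exon in enumerate(annotated_exons):
        inside_span = span_start <= exon[0] and exon[1] <= span_end
        covered = any(overlaps(exon, block) for block in ppg_blocks)
        if inside_span and not covered:
            skipped.append(index)
    return skipped


# Invented example: a four-exon parent gene and a PPG lacking the second exon.
exons = [(100, 200), (300, 400), (500, 600), (700, 800)]
blocks = [(100, 200), (500, 600), (700, 800)]
print(skipped_exons(exons, blocks))
```

A real analysis would also have to handle truncated PPGs, sequence decay since insertion, and alternative splice sites within exons, none of which this sketch attempts.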