2UMR CNRS 5558, Laboratoire de Biométrie et Biologie Evolutive, INRIA Bamboo, Université Claude Bernard, Villeurbanne 69100, France
3Department of Computer Science, Colorado State University, Fort Collins, CO 80523-1873, USA
4Bioinformatics Unit, Structural Biology and BioComputing Program, Spanish National Cancer Research Centre (CNIO), Madrid 28029, Spain
Chimeric transcripts were confirmed using paired-end RNA-seq data, by mapping the RNA-seq reads to the sequence of each chimera and, specifically, to its junction site. For humans, we used the Human Body Map 2.0 data generated on the HiSeq 2000 by Illumina in 2010. Briefly, the following mapping protocol was used to ensure that a read could be unambiguously assigned to a chimera rather than to a single location in the genome. First, we mapped the RNA-seq reads to the reference genome to identify the reads that could be linearly assigned to genomic regions. Next, we selected the reads that remained unmapped at this stage and attempted to map them to the chimeric transcripts. Finally, we retained only reads that mapped precisely across the junction of the chimera, with at least six nucleotides extending over the junction. In this way, 175 chimeric transcripts were each confirmed by at least two RNA-seq reads covering the gene-gene junction site.
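The final filtering step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline actually used: the `Alignment` record, the coordinate convention, and the function names are hypothetical, and the six-nucleotide requirement is interpreted here as a minimum overhang on each side of the junction. The thresholds (six nucleotides, two supporting reads) are taken from the protocol described above.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Set

MIN_OVERHANG = 6  # nucleotides required on each side of the junction (assumed reading)
MIN_READS = 2     # supporting reads required to confirm a chimera


class Alignment(NamedTuple):
    """A previously unmapped read aligned to a chimeric transcript (hypothetical layout)."""
    chimera_id: str
    start: int  # 0-based, inclusive, in chimera coordinates
    end: int    # 0-based, exclusive


def spans_junction(aln: Alignment, junction: int,
                   min_overhang: int = MIN_OVERHANG) -> bool:
    """Return True if the read covers >= min_overhang bases on both sides.

    `junction` is the 0-based index of the first base contributed by the
    second gene of the chimera.
    """
    left = junction - aln.start   # bases aligned to the first gene
    right = aln.end - junction    # bases aligned to the second gene
    return left >= min_overhang and right >= min_overhang


def confirmed_chimeras(alignments: Iterable[Alignment],
                       junctions: Dict[str, int],
                       min_reads: int = MIN_READS) -> Set[str]:
    """Return the IDs of chimeras supported by at least min_reads junction reads."""
    support = Counter(
        aln.chimera_id
        for aln in alignments
        if aln.chimera_id in junctions
        and spans_junction(aln, junctions[aln.chimera_id])
    )
    return {cid for cid, n in support.items() if n >= min_reads}


# Toy input: chimera "A" has two junction-spanning reads, "B" has only one.
junctions = {"A": 100, "B": 50}
alignments = [
    Alignment("A", 90, 110),   # 10 bases on each side: counted
    Alignment("A", 94, 130),   # exactly 6 bases on the left: counted
    Alignment("A", 96, 140),   # only 4 bases on the left: rejected
    Alignment("B", 40, 60),    # counted, but a single read is not enough
]
confirmed = confirmed_chimeras(alignments, junctions)
```

The sketch assumes that reads mapping linearly to the reference genome have already been removed, so only the overhang and read-count criteria are applied here.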
In the ChiTaRS database we provide detailed information about more than 500 unique breakpoints in cancers reported in the TICdb and dbCRID databases, which are based on the Mitelman database (see "Breakpoints"). To the best of our knowledge, this is the first catalogue that enables cross-referencing between chimeric transcripts found in GenBank, relevant PubMed articles about putative breakpoints, the two incorporated genes, their respective genomic loci and RNA-seq evidence. Moreover, the entries in ChiTaRS (http://chitars.bioinfo.cnio.es/) are linked from the UniProt Knowledgebase (UniProtKB), which contains a broad catalogue of information on proteins from laboratories around the world.
An additional feature of ChiTaRS is that it provides visualization of chimeric transcripts and their genomic context, including the junction site. These figures were produced using the SpliceGrapher package, which was designed for the analysis and visualization of RNA-seq data. The figures highlight the genes on either side of a chimeric junction, making it possible to visualize the potential transcripts that could arise from each chimera.
In summary, the ChiTaRS database may be useful to biologists looking for chimeras and their corresponding proteins, to genome researchers interested in regions of chromosomal aberrations and their DNA sequences, and to biomedical studies of protein fusions related to cancer translocations. This database represents a valuable tool for the large-scale study of chimeric RNAs, chimeric proteins and their potential functions in human cancers.
2 Robertson, H.M., et al. (2007) The bursicon gene in mosquitoes: an unusual example of mRNA trans-splicing. Genetics 176, 1351-1353.
3 Herai, R.H. and Yamagishi, M.E. (2010) Detection of human interchromosomal trans-splicing in sequence databanks. Brief Bioinform 11, 198-209.
4 Douris, V., et al. (2010) Evidence for multiple independent origins of trans-splicing in Metazoa. Mol Biol Evol 27, 684-693.
5 Pettitt, J., et al. (2010) The evolution of spliced leader trans-splicing in nematodes. Biochem Soc Trans 38, 1125-1130.
6 Allen, M.A., et al. (2010) A global analysis of C. elegans trans-splicing. Genome Res
7 McManus, C.J., et al. (2010) Global analysis of trans-splicing in Drosophila. Proc Natl Acad Sci U S A 107, 12975-12979.
8 Gingeras, T.R. (2009) Implications of chimaeric non-co-linear transcripts. Nature 461, 206-211.
9 Li, H., et al. (2008) A neoplastic gene fusion mimics trans-splicing of RNAs in normal human cells. Science 321, 1357.
10 McManus, C.J., et al. (2010) Regulatory divergence in Drosophila revealed by mRNA-seq. Genome Res 20, 816-825.
11 Pirrotta, V. (2002) Trans-splicing in Drosophila. Bioessays 24, 988-991.
12 Maher, C.A., et al. (2009) Chimeric transcript discovery by paired-end transcriptome sequencing. Proc Natl Acad Sci U S A 106, 12353-12358.
13 Birney, E., et al. (2007) Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447, 799-816.
14 Frenkel-Morgenstern, M., et al. (2012) Chimeras taking shape: potential functions of proteins encoded by chimeric RNA transcripts. Genome Res, published in advance May 15, 2012, doi: 10.1101/gr.130062.111.
15 Rogers, M.F., et al. (2012) SpliceGrapher: detecting patterns of alternative splicing from RNA-seq data in the context of gene models and EST data. Genome Biol 13, R4.
16 Frenkel-Morgenstern, M., et al. (2013) ChiTaRS: a database of human, mouse and fruit fly chimeric transcripts and RNA-sequencing data. Nucleic Acids Res 41 (Database issue), D142-D151.