NAR Molecular Biology Database Collection entry number 1577
Shi-Jian Zhang1,#, Chu-Jun Liu1,#, Mingming Shi1,#, Lei Kong2, Jia-Yu Chen1, Wei-Zhen Zhou2, Xiaotong Zhu1, Peng Yu1, Jue Wang1, Xinzhuan Yang1, Ning Hou1, Zhiqiang Ye3, Rongli Zhang1, Ruiping Xiao1, Xiuqin Zhang1,* and Chuan-Yun Li1,*
1Institute of Molecular Medicine, Peking University, Beijing, China; 2Center for Bioinformatics, National Laboratory of Protein Engineering and Plant Genetic Engineering, College of Life Sciences, Peking University, Beijing, China; 3Drug Discovery Center, Key Laboratory of Chemical Genomics, Peking University Shenzhen Graduate School, Shenzhen, China
Although the rhesus macaque is a unique model for the translational study of human diseases, its use in biomedical research is still in its infancy owing to error-prone gene structures and limited annotations. We performed strand-specific mRNA next-generation sequencing in ten rhesus macaque tissues and generated 1.2 billion 90-bp paired-end long expression tags, covering >97.4% of the putative exons in monkey transcripts annotated by Ensembl. We found that at least 28.7% of the rhesus macaque transcripts were previously mis-annotated, mainly due to incorrect exon-intron boundaries, incomplete UTRs and missed exons. Compared with the previous gene models, the revised transcripts show clearer sequence motifs near splicing junctions and the ends of UTRs, as well as cleaner patterns of exon-intron distribution for expression tags and cross-species conservation scores. Strikingly, 1,292 exon-intron boundary revisions between coding exons corrected previously mis-annotated open reading frames. The revised gene models were experimentally verified in randomly selected cases. We further integrated functional genomics annotations from >60 categories of public and in-house resources and developed an online database, RhesusBase (http://www.rhesusbase.org). User-friendly interfaces were developed to update, retrieve, visualize, and download the RhesusBase meta-data, providing a "one-stop" resource for the monkey research community.
We thank Drs. Heping Cheng and Liping Wei at Peking University and Dr. Yong E Zhang at the Chinese Academy of Sciences for insightful suggestions for RhesusBase. We acknowledge Hui Wang, Wen Zheng, Bao Hai and Haitao Yang for assistance in RhesusBase development and Dr. Iain C. Bruce for manuscript revision.
Category: Human and other Vertebrate Genomes
Subcategory: Model organisms, comparative genomics