HRaP - Database of occurrence of HomoRepeats and Patterns in proteomes
NAR Molecular Biology Database Collection entry number 1683
Lobanov, M.Yu., Sokolovskiy, I.V. and Galzitskaya, O.V.
Group of Bioinformatics, Institute of Protein Research, Russian Academy of Sciences, Pushchino, Moscow Region, 142290, Russia.
With disordered regions and their functions under active study, we focus on long repeats of a single amino acid (homorepeats) (1). Our database covers 122 proteomes, 97 eukaryotic and 25 bacterial, which can be divided into 9 kingdoms and 5 phyla of bacteria. Together these proteomes comprise 1 449 561 protein sequences. The database includes 771 786 proteins with GO annotations. Leucine repeats were found to be especially abundant in the "Receptor and/or Membrane" group, glutamine and alanine repeats in the "Transcription factor and/or Development" group, and lysine repeats in the "Metabolism" group (2, 3). HRaP can be used to analyze evolutionary differences between proteins from different proteomes and the connections of these regions with particular functions. To see the occurrence of a homorepeat, the user first chooses one of the 122 proteomes and then chooses the homorepeat of a given length, or the pattern, to be investigated. The list of proteins containing that homorepeat or pattern then appears, with GO annotations where they have been determined. Long proteins usually contain one homorepeat or several different homorepeats. If a protein contains several homorepeats and patterns, all of these regions are marked in different colors in the sequence. The section HomoRepeats or Patterns gives the occurrence of homorepeats of different lengths (or of patterns) for all 122 proteomes.
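The two-step search described above (choose a proteome, then a homorepeat of a given length) can be illustrated with a minimal Python sketch. It is not the HRaP implementation: the function names `find_homorepeats` and `proteins_with_homorepeat`, the dictionary representation of a proteome, and the toy sequences are all assumptions made for illustration.

```python
import re
from typing import Dict, List, Tuple


def find_homorepeats(sequence: str, min_length: int = 6) -> List[Tuple[str, int, int]]:
    """Return (residue, start, length) for each run of one amino acid
    that is at least min_length residues long. start is 0-based."""
    if min_length < 1:
        raise ValueError("min_length must be at least 1")
    # ([A-Z]) captures one residue; \1{n,} requires n or more further copies.
    run = re.compile(r"([A-Z])\1{%d,}" % (min_length - 1))
    return [
        (m.group(1), m.start(), m.end() - m.start())
        for m in run.finditer(sequence.upper())
    ]


def proteins_with_homorepeat(
    proteome: Dict[str, str], residue: str, min_length: int
) -> List[str]:
    """Return identifiers of proteins that contain a run of `residue`
    at least min_length long. `proteome` maps identifier -> sequence
    (a hypothetical in-memory stand-in for one selected proteome)."""
    residue = residue.upper()
    return [
        protein_id
        for protein_id, sequence in proteome.items()
        if any(r == residue for r, _, _ in find_homorepeats(sequence, min_length))
    ]


# Toy data, invented for this example only.
toy_proteome = {
    "protA": "MKQQQQQQQAALLLLLLP",  # 7 x Q and 6 x L
    "protB": "MSTAAAAAGK",          # 5 x A, below a threshold of 6
}

print(find_homorepeats(toy_proteome["protA"], min_length=6))
print(proteins_with_homorepeat(toy_proteome, "Q", 6))
```

A backreference regular expression is used here because it reports each maximal run once; the threshold of 6 residues in the example is only a convenient default for the sketch.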
The section GO annotations presents the homorepeats and patterns that we have determined to be associated with particular functions. We have also compiled a list of human proteins with homorepeats that are associated with disease; it can be found in the Frequently Asked Questions (FAQ) section. The FAQ section also contains the list of proteins with homorepeats 6 or more residues long from the clustered Protein Data Bank (4, 5).
We thank O.V. Sokolovskaya for assistance in programming. HRaP is funded by the Russian Foundation for Basic Research [grant number 11-04-00763]; and Russian Academy of Sciences (programs "Molecular and Cell Biology" [grant number 01201353567] and "Fundamental Sciences to Medicine"). Funding for open access charge: Russian Academy of Sciences programs "Molecular and Cell Biology" [grant number 01201353567].
1. Lobanov,M.Yu. and Galzitskaya,O.V. (2012) Occurrence of disordered patterns and homorepeats in eukaryotic and bacterial proteomes. Molecular BioSystems, 8, 327-337.
2. Jorda,J. and Kajava,A.V. (2010) Protein homorepeats sequences, structures, evolution, and functions. Adv. Protein Chem. Struct. Biol., 79, 59-88.
3. Mularoni,L., Ledda,A., Toll-Riera,M. and Albà,M.M. (2010) Natural selection drives the accumulation of amino acid tandem repeats in human proteins. Genome Res., 20, 745-754.
4. Lobanov,M.Yu. and Galzitskaya,O.V. (2011) Disordered patterns in clustered Protein Data Bank and in eukaryotic and bacterial proteomes. PLoS One, 6, e27142.
5. Lobanov,M.Yu., Sokolovskiy,I.V. and Galzitskaya,O.V. (2013) IsUnstruct: prediction of the residue status to be ordered or disordered in the protein chain by a method based on the Ising model. J. Biomol. Struct. Dyn., 31, 1034-1043.
Category: Protein sequence databases
Subcategory: Protein sequence motifs and active sites