
PEP: Predictions for Entire Proteomes

NAR Molecular Biology Database Collection entry number 377
Carter, P.¹, Liu, J.², Rost, B.¹
¹ Department of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street BB217, New York, NY 10032, USA
² Department of Pharmacology, Columbia University, 630 West 168th Street, New York, NY 10032, USA

Database Description

PEP is a database of Predictions for Entire Proteomes. The database contains summaries of analyses of protein sequences from a range of organisms representing all three domains of life: eukaryotes, bacteria and archaea. All publicly available proteins for each organism were aligned against SWISS-PROT, TrEMBL and PDB. Additionally, the following annotations are provided: secondary structure, transmembrane helices, coiled coils, regions of low complexity, signal peptides, nuclear localization signals, PROSITE motifs and classes of cellular function. Proteins that contain long regions without regular secondary structure are also identified. We have also produced a related database of structural domain-like fragments derived from PEP, together with clusters based on homology between all fragments. The PEP database, fragments and clusters are distributed freely as a set of flat files, and have been integrated into SRS. The PEP group of databases can be accessed from:

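As a rough illustration of what identifying "long regions without regular secondary structure" can look like, the sketch below scans a per-residue secondary structure string and reports long runs that are neither helix nor strand. It is not the PEP implementation: the function name `long_irregular_regions`, the one-letter encoding ('H' for helix, 'E' for strand, anything else for other), and the default length threshold of 70 residues are all assumptions made for this example.

```python
import re
from typing import List, Tuple


def long_irregular_regions(ss: str, min_len: int = 70) -> List[Tuple[int, int]]:
    """Find long runs of residues predicted as neither helix nor strand.

    ss      -- per-residue secondary structure string; 'H' = helix,
               'E' = strand, any other character = no regular structure
               (this encoding is an assumption of the sketch).
    min_len -- minimum run length to report (hypothetical default).

    Returns 0-based, half-open (start, end) spans.
    """
    # One or more runs of at least min_len characters other than H or E.
    pattern = re.compile(r"[^HE]{%d,}" % min_len)
    return [match.span() for match in pattern.finditer(ss)]


if __name__ == "__main__":
    # Toy prediction: 4 helix residues, 80 'other' residues, 4 strand residues.
    prediction = "HHHH" + "L" * 80 + "EEEE"
    print(long_irregular_regions(prediction))  # [(4, 84)]
```

A stricter criterion, for example one that tolerates a small fraction of helix or strand residues inside a region, would need a sliding window rather than a single regular expression.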

Acknowledgements

Thanks to Dariusz Przybylski, Rajesh Nair and Kazimierz Wrzeszczynski (Columbia University) for providing preliminary information and programs. Thanks to the SRS team for their software. This work was supported by grants 1-P50-GM62413-01 and RO1-GM63029-01 from the National Institutes of Health (NIH). Last but not least, thanks to all those who deposit their experimental data in public databases, and to those who maintain these databases.



Go to the abstract in the NAR 2003 Database Issue.