NAR Molecular Biology Database Collection entry number 1177
Heinz-Uwe Hobohm
University of Applied Sciences, Bioinformatics, D-35390 Giessen, Germany

Database Description

PDBselect is a list of representative protein chains with low mutual sequence identity, selected from the Protein Data Bank (PDB) to enable unbiased statistics. The list grew from 155 chains in 1992 to more than 3300 chains in 2007.

PDBselect started when the number of protein chains with known 3D structure was around 700, less than 1% of the January 2007 count. The first release was a representative list of 155 protein chains with mutual sequence similarity of less than 30 percent (subsequent releases used a threshold of 25 percent). To generate the representative list of protein chains, an all-versus-all sequence comparison was implemented. The distance between two protein sequences is calculated by applying the HSSP function, later refined by Abagyan and Batalov on the basis of a larger data set. When the function scores two protein chains as related, the one with lower quality is removed, so that the final list consists of high-quality structures. Quality is defined as "resolution [in Angstrom] plus R-factor/20", with NMR structures allocated an arbitrary (low) quality.
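
The selection procedure described above can be illustrated with a minimal Python sketch. It is not the PDBselect implementation. Several points are assumptions made for illustration only: a flat 25 percent identity cutoff stands in for the HSSP function, pairwise identities are supplied as a precomputed table, the R-factor is taken to be in percent, a smaller value of "resolution plus R-factor/20" is read as better quality, the constant NMR_SCORE is an arbitrary placeholder, and chains are processed greedily from best to worst. All names (Chain, quality_score, related, select_representatives) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional

# Assumption: arbitrary placeholder so NMR entries rank behind X-ray entries.
NMR_SCORE = 99.0
# Assumption: flat cutoff standing in for the length-dependent HSSP function.
IDENTITY_THRESHOLD = 25.0


@dataclass(frozen=True)
class Chain:
    chain_id: str
    resolution: Optional[float] = None  # in Angstrom; None for NMR entries
    r_factor: Optional[float] = None    # assumed to be given in percent


def quality_score(chain: Chain) -> float:
    """Resolution plus R-factor/20; a smaller value is read as better quality."""
    if chain.resolution is None or chain.r_factor is None:
        return NMR_SCORE
    return chain.resolution + chain.r_factor / 20.0


def related(a: Chain, b: Chain, identity: Dict[FrozenSet[str], float]) -> bool:
    """Two chains count as related if their identity reaches the cutoff."""
    pct = identity.get(frozenset((a.chain_id, b.chain_id)), 0.0)
    return pct >= IDENTITY_THRESHOLD


def select_representatives(
    chains: List[Chain], identity: Dict[FrozenSet[str], float]
) -> List[Chain]:
    """Greedy pass: visit chains from best to worst score and keep a chain
    only if it is unrelated to every chain already kept."""
    kept: List[Chain] = []
    for chain in sorted(chains, key=lambda c: (quality_score(c), c.chain_id)):
        if not any(related(chain, k, identity) for k in kept):
            kept.append(chain)
    return kept


# Toy input: identities in percent for pairs from an all-versus-all comparison.
chains = [
    Chain("A", 1.8, 18.0),
    Chain("B", 2.5, 22.0),
    Chain("C"),             # NMR entry without resolution or R-factor
    Chain("D", 2.0, 20.0),
]
identity = {
    frozenset(("A", "B")): 40.0,
    frozenset(("C", "D")): 90.0,
    frozenset(("A", "D")): 10.0,
}

representatives = select_representatives(chains, identity)
print([c.chain_id for c in representatives])  # prints ['A', 'D']
```

In this toy input, chain B is dropped because it is related to the better-scoring chain A, and the NMR chain C is dropped because it is related to chain D. Pairs missing from the table are treated as unrelated.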


1. Hobohm, U., Scharf, M., Schneider, R. and Sander, C. (1992) Selection of representative protein data sets. Protein Sci, 1, 409-417.
2. Hobohm, U. and Sander, C. (1994) Enlarged representative set of protein structures. Protein Sci, 3, 522-524.

Subcategory: Protein structure

Go to the abstract in the NAR 2010 Database Issue.