NAR Molecular Biology Database Collection entry number 255
Fox, Naomi; Brenner, Steven; Chandonia, John-Marc
1Berkeley Structural Genomics Center, Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, CA, 94720 USA
2Department of Plant and Microbial Biology, University of California, Berkeley, CA, 94720-3102, USA
3MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, UK
4Department of Structural Biology, D-109 Fairchild, Stanford University, Stanford, CA, USA
5Berkeley Structural Genomics Center, Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA, and Department of Plant and Microbial Biology, University of California, Berkeley, CA, 94720-3102, USA

Database Description

The ASTRAL compendium provides a set of tools and databases designed to aid investigators in the analysis of protein structure, particularly through the use of sequence comparison. ASTRAL augments SCOP, a manual classification of protein domains according to structure, by providing a library of sequences, each of which corresponds to a structural domain classified in SCOP. To do so, the PDB entry for each SCOP domain is examined, and a mapping is constructed between the SEQRES information (which reflects the molecule studied) and the ATOM records (the atoms observed experimentally).

Because the majority of the structures in the PDB are very similar to others, it is frequently helpful to reduce the redundancy by selecting high-quality representative subsets. To do this, we compare all extracted sequences using standard sequence comparison algorithms. This information is then combined with a quality score that provides a first-order estimate of the resolution and regularity of crystallographically determined protein structures. We are thus able to provide sequence subsets with both limited redundancy and high-quality structural information. The level of redundancy in these subsets is user-defined and is based on one of three criteria: percent sequence identity, BLAST E-value, or SCOP similarity. These sequence subsets are an ideal starting point for homology-based structure prediction, and have also proven useful for testing new sequence comparison methods and for structure analysis.

Several major improvements have been made to the ASTRAL compendium since its initial release two years ago. The number of protein domains included has doubled from 15,190 to 30,867, and additional databases have been added. The Rapid Access Format (RAF) database contains manually curated mappings linking the amino acid sequences of proteins in the PDB (SEQRES records in the database entry) to the atoms experimentally observed (ATOM records), in a format designed for rapid access by automated tools. This information is used to derive sequences for protein domains in the SCOP database. In cases where a SCOP domain spans several protein chains, all of which can be traced back to a single genetic source, a genetic domain sequence is created by concatenating the sequences of each chain in the order found in the original gene sequence. Both the standard library of SCOP sequences and a library including genetic domain sequences are available. Selected representative subsets derived from both libraries using the criteria described above are also included.
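The two derivation steps described above, building a genetic domain sequence and choosing a low-redundancy subset, can be illustrated with a minimal Python sketch. It is not the ASTRAL implementation: the function names, the toy domain identifiers, and the data are all hypothetical. It assumes that a higher quality score means a better structure, that pairwise percent identities have already been computed by some external sequence comparison tool, and that a simple greedy pass is an acceptable way to pick representatives, since the selection procedure is not specified here.

```python
from typing import Dict, FrozenSet, List, Sequence


def genetic_domain_sequence(chains_in_gene_order: Sequence[str]) -> str:
    """Concatenate chain sequences that are already ordered as in the gene."""
    return "".join(chains_in_gene_order)


def select_representatives(
    quality: Dict[str, float],
    identity: Dict[FrozenSet[str], float],
    max_identity: float,
) -> List[str]:
    """Greedy subset selection (an assumed strategy, not taken from the source).

    Domains are visited from best to worst quality score. A domain is kept
    only if its percent identity to every domain already kept is below
    max_identity. Pairs missing from `identity` are treated as 0% identical.
    """
    # Sort by descending quality; break ties by name for reproducibility.
    ranked = sorted(quality, key=lambda d: (-quality[d], d))
    kept: List[str] = []
    for domain in ranked:
        redundant = any(
            identity.get(frozenset((domain, rep)), 0.0) >= max_identity
            for rep in kept
        )
        if not redundant:
            kept.append(domain)
    return kept


# Hypothetical toy data: three domains with made-up scores and identities.
quality = {"domA": 0.9, "domB": 0.7, "domC": 0.5}
identity = {
    frozenset(("domA", "domB")): 95.0,
    frozenset(("domA", "domC")): 20.0,
    frozenset(("domB", "domC")): 30.0,
}

representatives = select_representatives(quality, identity, max_identity=40.0)
print(representatives)
print(genetic_domain_sequence(["MKT", "AYI"]))
```

With a 40% identity cutoff, the toy run keeps domA and domC and drops domB, which is 95% identical to the higher-scoring domA. The same loop would work with a BLAST E-value criterion by swapping the comparison for one where smaller values indicate greater similarity.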

Recent Developments

Manually curated Rapid Access Format (RAF) sequence maps
Genetic domain sequences
Translation table for chemically modified amino acids


This project is funded by NIH grant 1 P50 GM62412. S.E.B. is supported by NIH grant 1 K22 HG00056 and is a Searle Scholar (01-L-116).


1. Brenner SE, Koehl P, Levitt M (2000) The ASTRAL compendium for protein structure and sequence analysis. NAR 28: 254-256.

Subcategory: Protein structure
