NAR Molecular Biology Database Collection entry number 220
Krause, A.¹, Haas, S.A.¹, Coward, E.², Vingron, M.¹
¹MPI for Molecular Genetics, Computational Molecular Biology, Ihnestr. 73, 14195 Berlin, Germany
²University of Bergen, Department of Informatics, PB. 7800, 5020 Bergen, Norway

Database Description

The integration of SYSTERS, GeneNest and SpliceNest into one framework facilitates exploration of the whole sequence space, covering protein, mRNA and EST sequences as well as genomic DNA. The SYSTERS protein sequence cluster set provides an automatically generated classification of all sequences in the SWISS-PROT and TrEMBL databases, together with the predicted protein sequence sets of several completely sequenced organisms, into disjoint protein family and superfamily clusters. The clusters are annotated with sequence information from various other resources. For each cluster, an MView (database search or multiple alignment viewer) output is generated, and a majority consensus sequence is calculated from the resulting partial multiple alignment. Together, the consensus sequences form a searchable sequence database. The sequences in every cluster have been multiply aligned and annotated with known domains from the Pfam protein family database.
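The idea of a majority consensus over a multiple alignment can be illustrated with a minimal sketch. This is not the SYSTERS implementation: the function name `majority_consensus`, the 50% threshold, the `X` placeholder for columns without a majority, and the rule of dropping gap-dominated columns are all assumptions made for illustration.

```python
from collections import Counter


def majority_consensus(aligned_rows, gap="-", min_fraction=0.5, unknown="X"):
    """Build a consensus string from equal-length aligned sequences.

    Assumed rules (illustrative only): a column whose most frequent symbol
    is the gap character is dropped; otherwise the most frequent symbol is
    emitted if it occurs in more than `min_fraction` of the rows, and
    `unknown` is emitted if no symbol reaches that share.
    """
    if not aligned_rows:
        return ""
    length = len(aligned_rows[0])
    if any(len(row) != length for row in aligned_rows):
        raise ValueError("all aligned rows must have the same length")

    consensus = []
    for column in zip(*aligned_rows):  # iterate over alignment columns
        symbol, count = Counter(column).most_common(1)[0]
        if symbol == gap:
            continue  # gap-dominated column: leave it out of the consensus
        consensus.append(symbol if count / len(column) > min_fraction else unknown)
    return "".join(consensus)


rows = ["MKV-LA", "MKVQLA", "MRV-LG", "MKI-LA"]
print(majority_consensus(rows))  # MKVLA
```

In this toy alignment the fourth column is mostly gaps and is therefore omitted, while every other column has a clear majority residue.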

GeneNest is a database and software package for producing and visualizing gene indices from ESTs and mRNAs. Currently, the database comprises gene indices of human (based on UniGene), mouse, A. thaliana, Drosophila, and zebrafish. All sequences are preprocessed to detect, annotate and clip regions that contain vector sequence or repeats, or that are of low quality. The subsequent assembly step is done with the Staden package. For all contigs of a cluster, consensus sequences are generated and extracted to build a searchable sequence database. The visualization of a contig provides further information about the sequences, the represented gene and open reading frames, and links to precomputed protein homologies detected in the SYSTERS database.
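The clipping step can be pictured with a small sketch. The text does not state the clipping rules GeneNest applies, so the following assumes one simple policy: given the intervals flagged as vector, repeat or low quality, keep the longest stretch of the read that none of them covers. The function name `longest_clean_segment` and the half-open coordinate convention are hypothetical.

```python
def longest_clean_segment(seq_length, flagged):
    """Return the (start, end) half-open range of the longest unflagged stretch.

    `flagged` is a list of (start, end) half-open intervals marking vector,
    repeat or low-quality regions. Intervals may overlap and may extend
    beyond the sequence; they are clamped to [0, seq_length].
    """
    best = (0, 0)
    cursor = 0  # first position not yet covered by a flagged interval
    for start, end in sorted(flagged):
        start = min(max(start, 0), seq_length)
        end = min(max(end, 0), seq_length)
        if start > cursor and start - cursor > best[1] - best[0]:
            best = (cursor, start)
        cursor = max(cursor, end)
    # Check the stretch between the last flagged interval and the sequence end.
    if seq_length - cursor > best[1] - best[0]:
        best = (cursor, seq_length)
    return best


# A 100-base read with vector at both ends and a repeat in the middle.
print(longest_clean_segment(100, [(0, 20), (60, 70), (90, 100)]))  # (20, 60)
```

A real pipeline would likely also apply a minimum length cutoff before passing the clipped read to assembly; that threshold is left out here because the text does not give one.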

SpliceNest is a web-based graphical tool to explore gene structure based on a mapping of the EST consensus sequences from GeneNest to a complete genome. Assuming that a cluster normally represents a single gene, every contig of a cluster is aligned separately to the corresponding genomic region using a spliced alignment program. The alignments are visualized in a diagram showing the exon/intron structure of all the exons simultaneously, mapped on the common genomic sequence, automatically highlighting candidates for alternative splicing.
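One way to think about highlighting alternative splicing candidates is to compare the exon coordinates of the contigs that map to the same genomic region. The criteria SpliceNest actually uses are not described here; the sketch below assumes a single, simple pattern (an exon of one contig lying entirely inside an intron of another, suggesting exon skipping). The names `introns` and `skipped_exon_candidates` and the half-open genomic coordinates are hypothetical.

```python
def introns(exons):
    """Derive intron intervals as the gaps between consecutive exons."""
    ordered = sorted(exons)
    return [(left_end, right_start)
            for (_, left_end), (right_start, _) in zip(ordered, ordered[1:])]


def skipped_exon_candidates(contigs):
    """Find exons of one contig that fall inside an intron of another.

    `contigs` maps a contig name to its list of (start, end) half-open exon
    coordinates on the shared genomic sequence. Each hit is reported as
    (contig_with_exon, exon, contig_with_intron).
    """
    hits = []
    for name_a, exons_a in contigs.items():
        for name_b, exons_b in contigs.items():
            if name_a == name_b:
                continue
            for intron_start, intron_end in introns(exons_b):
                for exon_start, exon_end in exons_a:
                    if intron_start <= exon_start and exon_end <= intron_end:
                        hits.append((name_a, (exon_start, exon_end), name_b))
    return hits


contigs = {
    "contig1": [(100, 200), (300, 400), (500, 600)],
    "contig2": [(100, 200), (500, 600)],
}
print(skipped_exon_candidates(contigs))
# [('contig1', (300, 400), 'contig2')]
```

Other patterns, such as alternative donor or acceptor sites, would need a comparison of exon boundaries rather than containment and are not covered by this sketch.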

Recent Developments

SYSTERS now provides a taxonomy-driven user interface as well as the option to interactively generate a user-defined multiple alignment. A visualization of the tissue information in GeneNest was added for the analysis of tissue-specific gene expression. SpliceNest now contains the complete genomes of human, mouse, Drosophila, and Arabidopsis.


1. Krause,A., Haas,S.A., Coward,E. and Vingron,M. (2002) SYSTERS, GeneNest, SpliceNest: exploring sequence space from genome to protein. Nucleic Acids Res., 30, 299-300.
2. Haas,S.A., Beissbarth,T., Rivals,E., Krause,A. and Vingron,M. (2000) GeneNest: automated generation and visualization of gene indices. Trends Genet., 16, 521-523.
3. Coward,E., Haas,S.A. and Vingron,M. (2002) SpliceNest: visualizing gene structure and alternative splicing based on EST clusters. Trends Genet., 18, 53-55.

Subcategory: Human ORFs
