NAR Molecular Biology Database Collection entry number 274
Rakesh R., Bhaskara R. M., and Srinivasan N.
Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India

Database Description

The database of Phylogeny and ALIgnment of homologous protein structures (PALI) contains structure-based sequence alignments and dendrograms derived primarily from structural alignments at the domain level [1,2]. The PALI database (version 2.8c) uses the protein domain decomposition proposed by SCOP (version 1.75C) [3]. For each SCOP domain family, non-redundant entries with the best resolution are used for the structural alignments.

Recent Developments

The current version of PALI contains 2220 multi-member families and 1582 orphans (single-member families), comprising about 20,000 domains. Over 200,000 pair-wise and 2220 multiple structural alignments have been generated for the multi-member families. Every family with at least three members is associated with two dendrograms: one based on the structural dissimilarity metric (SDM) defined for every pair-wise superposition, and the other based on the identity of the topologically equivalent residues. For orphan families, the domain-level sequences are provided.
Alignments of protein domains of known 3-D structure from PALI, integrated with homologous sequences from the UniProt (Universal Protein Resource) database [4,5], are also available for every family in PALI. A PSI-BLAST search with a query sequence can be performed either against the structural members in PALI or against the structural members integrated with their sequence homologues from the UniProt database (PALI+).
All pair-wise structural superpositions were generated using the DALI program [6,7]. The structure alignment program MUSTANG [8] was used to superimpose multiple homologous protein domain structures. A graphical interface (Jmol applet) is also provided for every family in PALI to view the structure-based multiple alignment and the pair-wise structural alignments.
Integration of the domain sequences with the UniProt database is achieved in two steps. First, the structure-based multiple sequence alignment of each family is queried against UniRef90 using PSI-BLAST for 20 iterations, and the hits obtained are subsequently filtered based on 70% query coverage and an E-value of 0.0001. Second, HMM profiles for each family are generated from the structure-based multiple sequence alignment using hmmbuild (HMMER 3.0) [9]; these profiles are used to obtain integrated sequence-structure alignments at the family level using hmmalign (HMMER 3.0). For orphan PALI families, the sequences are aligned using MAFFT [10] after the PSI-BLAST runs.
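The hit-filtering criterion used in the UniProt integration (70% query coverage and an E-value of 0.0001) can be illustrated with a minimal Python sketch. This is not the PALI pipeline code: the `Hit` record, the function names, the definition of coverage as the aligned query span divided by the query length, and the use of inclusive thresholds are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one PSI-BLAST hit (field names are assumed)."""
    subject_id: str
    query_start: int  # 1-based, inclusive
    query_end: int    # 1-based, inclusive
    evalue: float


def query_coverage(hit, query_length):
    """Fraction of the query spanned by the aligned region of one hit."""
    return (hit.query_end - hit.query_start + 1) / query_length


def filter_hits(hits, query_length, min_coverage=0.70, max_evalue=1e-4):
    """Keep hits meeting both thresholds (inclusive bounds are an assumption)."""
    return [
        h for h in hits
        if h.evalue <= max_evalue
        and query_coverage(h, query_length) >= min_coverage
    ]


# Made-up example hits against a query of length 200.
hits = [
    Hit("seqA", 1, 180, 1e-30),   # 90% coverage, significant E-value: kept
    Hit("seqB", 50, 120, 1e-10),  # 35.5% coverage: dropped
    Hit("seqC", 10, 190, 0.5),    # 90.5% coverage, weak E-value: dropped
]
kept = filter_hits(hits, query_length=200)
print([h.subject_id for h in kept])
```

Only hits that satisfy both conditions are retained, so a highly significant but short local match (such as `seqB` here) is excluded from the family alignment.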
In the current version (v2.8c), we have added ~300 new multi-member families and 148 new orphan families relative to the previous version. These correspond to ~2500 new domains added to the multiple and pair-wise structural alignments. These additions result from the update to SCOP 1.75C.

Acknowledgements

RMB is supported by the Council of Scientific and Industrial Research, New Delhi, and SM is supported by the Indian Institute of Science, Bangalore.

References

1. Balaji, S., Sujatha, S., Kumar, S.S.C. and Srinivasan, N. (2001) PALI: A database of Phylogeny and ALIgnment of homologous protein structures. Nucleic Acids Res. 29, 61-65

2. Gowri, V.S., Pandit, S.B., Karthik, P.S., Srinivasan, N. and Balaji, S. (2003) Integration of related sequences with protein three-dimensional structural families in an updated version of PALI database. Nucleic Acids Res. 31, 486-488

3. Murzin, A.G., Brenner, S.E., Hubbard, T. and Chothia, C. (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536-540

4. Apweiler, R., Bairoch, A., Wu, C.H., Barker, W.C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez, R., Magrane, M., et al. (2004) UniProt: the Universal Protein Knowledgebase. Nucleic Acids Res. 32, D115-D119

5. Bairoch, A., Apweiler, R., Wu, C.H., Barker, W.C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez, R., Magrane, M., et al. (2005) The Universal Protein Resource (UniProt). Nucleic Acids Res. 33, D154-D159

6. Holm, L. and Sander, C. (1996) Mapping the protein universe. Science 273, 595-602

7. Holm, L. and Park, J. (2000) DaliLite workbench for structure comparison. Bioinformatics 16, 566-567

8. Konagurthu, A.S., Whisstock, J.C., Stuckey, P.J. and Lesk, A.M. (2006) MUSTANG: A multiple structural alignment algorithm. Proteins: Structure, Function and Bioinformatics 64, 559-574

9. Eddy, S.R. (1998) Profile hidden Markov models. Bioinformatics 14, 755-763

10. Katoh, K., Misawa, K., Kuma, K. and Miyata, T. (2002) MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30, 3059-3066
