NAR Molecular Biology Database Collection entry number 203
Henikoff, J.G.1, Pietrokovski, S.3, Greene, E.A.1, Taylor, N.1, Ng, P.C.1, Henikoff, S.2
1Fred Hutchinson Cancer Research Center, Seattle, WA, USA
2Howard Hughes Medical Institute
3Weizmann Institute of Science, Rehovot, Israel

Database Description

Blocks are ungapped multiple alignments corresponding to the most conserved regions of proteins. The Blocks Database (1) consists of blocks constructed from documented families of related proteins by the automated PROTOMAT system (2). It currently contains 11,853 blocks representing 2,608 protein families documented in InterPro (3) and Prints (4). A blocks multiple alignment consists of ungapped conserved regions separated by unaligned regions of variable size. The PROTOMAT system applies a robust motif-finder (5) to a set of related protein sequences. Resulting candidate motifs are assembled into a best set along the lengths of the sequences, and additional sequences may be added if they are known to be related and can be aligned with all of the resulting blocks for a family. To reduce the redundancy and size of the Blocks Database, PROTOMAT is applied to families of sequences documented in InterPro in a hierarchical manner, ordered by perceived quality of documentation and lack of family inter-relationships. Since the Prints Database format is consistent with that of Blocks, Prints blocks are added directly without running PROTOMAT. The LAMA algorithm (6) is used to compare all blocks added to the Blocks Database with each other to further reduce redundant entries.

The Blocks Database is used to annotate proteins of unknown function (16). A protein or DNA sequence can be compared with the Blocks Database using the BLIMPS searching tool (7), or the IMPALA and RPS-BLAST tools from NCBI (8, 9), all of which provide statistics to evaluate hits. When a search of the Blocks Database hits a protein family, the user is linked to the InterPro documentation page.

The Blocks WWW Server also provides several tools that enhance the information found there. Visual displays of the blocks for a family are provided by sequence logos (10), by maps of the blocks along the sequences, and by highlighting the blocks on known structures of the sequences in them. Structures can be viewed with the ProWeb PDB Viewer, or with a browser helper application. A phylogenetic tree is made from the blocks representing each family and can be explored with the ProWeb tree viewer. Reverse searches of the block alignments against sequence databases are facilitated by links to BLAST (9, 11), MAST (12) and LAMA (6) searching pages. Blocks are linked to the CODEHOP PCR primer design tool (14), which uses the multiple alignment to design hybrid consensus-degenerate primers. Blocks are also linked to SIFT (15), a program that predicts the effect of amino acid substitutions from multiple alignment information.

Finally, links are provided from the Blocks Database to CYRCA sets of consistently aligned blocks (13). Each CYRCA set contains similar blocks identified from consistent LAMA alignments of pairs of blocks. Each set usually contains conserved regions of similar function and structure that appear in different contexts. For blocks that have members with known structures, CYRCA has a tool to superimpose the structures according to their alignment in these sets. All of these tools except CYRCA are also available when users make blocks from their own sequences or excise blocks from their own multiple alignments using the 'Block Maker' and 'Multiple Alignment Processor' features.
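As an illustration of what an ungapped block is and how it can be used, the following minimal Python sketch treats a block as a list of equal-width sequence segments. It computes a per-column information content of the kind that sets letter-stack heights in sequence logos (10), and slides a simple log-odds scoring matrix along a query to find the best ungapped match. This is not the PROTOMAT or BLIMPS implementation. The toy block, the function names, the uniform background frequencies and the pseudocount are all assumptions made for the example, and the small-sample correction used in sequence logos is omitted.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical toy block: four aligned segments of equal width, no gaps.
BLOCK = [
    "GDSGGP",
    "GDSGGP",
    "GDSGSP",
    "GDAGGA",
]


def column_counts(block):
    """Return one Counter of residues per alignment column."""
    width = len(block[0])
    if any(len(segment) != width for segment in block):
        raise ValueError("a block is ungapped: all segments must share one width")
    return [Counter(segment[i] for segment in block) for i in range(width)]


def column_information(block):
    """Information content in bits per column: log2(20) minus Shannon entropy.

    Assumption: no small-sample correction is applied.
    """
    info = []
    for counts in column_counts(block):
        total = sum(counts.values())
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        info.append(math.log2(len(AMINO_ACIDS)) - entropy)
    return info


def best_ungapped_match(block, query, pseudocount=1.0):
    """Slide the block along the query and return (best_score, offset).

    Scores are log2 odds against a uniform background (an assumption).
    Returns None if the query is shorter than the block.
    """
    background = 1.0 / len(AMINO_ACIDS)
    denom = len(block) + pseudocount * len(AMINO_ACIDS)
    scores = [
        {aa: math.log2(((counts[aa] + pseudocount) / denom) / background)
         for aa in AMINO_ACIDS}
        for counts in column_counts(block)
    ]
    width = len(scores)
    best = None
    for offset in range(len(query) - width + 1):
        # Residues outside the 20 standard amino acids contribute 0.
        score = sum(scores[i].get(query[offset + i], 0.0) for i in range(width))
        if best is None or score > best[0]:
            best = (score, offset)
    return best


if __name__ == "__main__":
    print([round(bits, 2) for bits in column_information(BLOCK)])
    print(best_ungapped_match(BLOCK, "MKTGDSGGPLV"))
```

Fully conserved columns reach the maximum of log2(20) bits, while variable columns score lower. Real searching tools use more careful sequence weighting, background frequencies and hit statistics than this sketch does.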

Recent Developments

Links are now provided from user blocks created by Block Maker and from blocks excised from other user-provided multiple alignments by the Multiple Alignment Processor to the ProWeb Tree Viewer and PDB Viewer. The ProWeb PDB Viewer now quickly displays blocks on structures of sequences included in them without the need to install a helper application. A new publication provides an in-depth introduction to using the Blocks Database to recognize functional domains (16).


This work is supported by grants from the NIH (GM29009) and the DOE (DE-FG03-97ER62382).


1. Henikoff, J.G., Greene, E.A., Pietrokovski, S. and Henikoff, S. (2000) Nucleic Acids Res., 28, 228-230.
2. Henikoff, S. and Henikoff, J.G. (1994) Genomics, 19, 97-107.
3. Apweiler, R., Attwood, T.K., Bairoch, A., et al. (2000) Bioinformatics, 16, 1145-1150.
4. Attwood, T.K., Croning, M.D., Flower, D.R., et al. (2000) Nucleic Acids Res., 28, 225-227.
5. Smith, H.O., Annau, T.M. and Chandrasegaran, S. (1990) Proc. Natl. Acad. Sci. USA, 87, 826-830.
6. Pietrokovski, S. (1996) Nucleic Acids Res., 24, 3836-3845.
7. Henikoff, J.G. and Henikoff, S. (1996) Computer Applications in the Biosciences, 12, 135-143.
8. Schaffer, A.A., Wolf, Y.I., Ponting, C.P., Koonin, E.V., Aravind, L. and Altschul, S.F. (1999?) Bioinformatics,
9. Altschul, S.F., Madden, T.L., Schaffer, A.A., et al. (1997) Nucleic Acids Res., 25, 3389-3402.
10. Schneider, T.D. and Stephens, R.M. (1990) Nucleic Acids Res., 18, 6097-6100.
11. Henikoff, S. and Henikoff, J.G. (1997) Protein Science, 6, 698-705.
12. Bailey, T.L. and Gribskov, M. (1998) Bioinformatics, 14, 48-54.
13. Kunin, V., Chan, B., Sitbon, E., Lithwick, G. and Pietrokovski, S. (2001) J. Mol. Biol., 307, 939-949.
14. Rose, T.M., Schultz, E.R., Henikoff, J.G., Pietrokovski, S., McCallum, C.M. and Henikoff, S. (1998) Nucleic Acids Res., 26, 1628-1635.
15. Ng, P.C. and Henikoff, S. (2001) Genome Res., 11, 863-874.
16. Henikoff, J.G., Pietrokovski, S., Greene, E.A., Taylor, N. and Henikoff, S. (2002-in press) Current Protocols in Bioinformatics, Unit 2.2.

Go to the abstract in the NAR 2000 Database Issue.