

NAR Molecular Biology Database Collection entry number 207
Sarah Hunter1, Philip Jones1, Alex Mitchell1, Rolf Apweiler1, Teresa K. Attwood2, Alex Bateman3, Thomas Bernard4, David Binns1, Peer Bork5, Sarah Burge1, Edouard de Castro6, Penny Coggill3, Matthew Corbett1, Ujjwal Das1, Louise Daugherty1, Lauranne Duquenne4, Robert D. Finn3, Matthew Fraser1, Julian Gough7, Daniel Haft8, Nicolas Hulo6, Daniel Kahn4, Elizabeth Kelly9, Ivica Letunic5, David Lonsdale1, Rodrigo Lopez1, Martin Madera7, John Maslen1, Craig McAnulla1, Jennifer McDowall1, Conor McMenamin1, Huaiyu Mi10, Prudence Mutowo-Muellenet1, Nicola Mulder9, Darren Natale11, Christine Orengo12, Sebastien Pesseat1, Marco Punta3, Antony F. Quinn1, Catherine Rivoire6, Amaia Sangrador-Vegas1, Jeremy D. Selengut8, Christian J. A. Sigrist6, Maxim Scheremetjew1, John Tate3, Manjulapramila Thimmajanarthanan1, Paul D. Thomas10, Cathy H. Wu12, Corin Yeats, and Siew-Yit Yong1
1 EMBL Outstation European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, CB10 1SD, Cambridge
2 Faculty of Life Science and School of Computer Science, The University of Manchester, M13 9PL, Manchester
3 The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA UK
4 Pôle Rhône-Alpin de Bio-Informatique (PRABI) and Laboratoire de Biométrie et Biologie Evolutive; CNRS; INRA; Université de Lyon; Université Lyon 1, 69622 Villeurbanne, France
5 European Molecular Biology Laboratory (EMBL), Meyerhofstrasse 1, 69117 Heidelberg, Germany
6 Swiss Institute of Bioinformatics (SIB), CMU - 1, Rue Michel-Servet, 1211 Geneva 4, Switzerland
7 Department of Computer Science, University of Bristol, Woodland Road, Bristol, BS8 1UB, UK
8 J. Craig Venter Institute (JCVI), 9704 Medical Center Drive, Rockville, MD 20850, USA
9 Computational Biology Unit, Institute of Infectious Disease and Molecular Medicine, University of Cape Town Health Sciences Campus, Anzio Road, Observatory 7925, South Africa
10 University of Southern California, Los Angeles, CA 90089, USA
11 Protein Information Resource (PIR), Georgetown University Medical Center, 3300 Whitehaven Street, NW, Suite 1200, Washington, D.C. 20007, USA
12 Structural and Molecular Biology Department, University College London, University of London, WC1E 6BT UK

Database Description

InterPro, an integrated documentation resource of protein families, domains and functional sites, was created in 1999 to amalgamate the major protein signature databases into one comprehensive resource. PROSITE, Pfam, PRINTS, ProDom, SMART, TIGRFAMs, PIR SuperFamily and the structure-based SUPERFAMILY have been manually integrated and are available in InterPro for text- and sequence-based searching. CATH and PANTHER HMMs will soon be integrated. The results are provided in a single, comprehensive format, with links to the original data sources as well as to specialised functional databases. The latest release of InterPro contains over 10,000 entries, which together cover 78% of all proteins in UniProt. Each entry is annotated with a name, GO term mappings and an abstract, and all matches against the Swiss-Prot and TrEMBL components of UniProt are precomputed and can be viewed in different formats. Protein 3D structural information is integrated from MSD, CATH and SCOP and shown in the match views, giving an at-a-glance comparison of sequence and structural domains. The database is available via a web server and by anonymous FTP. InterProScan is a sequence search package that can be used through a web interface or installed locally for bulk searches.
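The integration idea above, where signatures from several member databases are merged into a single InterPro entry and coverage is the share of proteins matched by at least one integrated signature, can be illustrated with a small Python sketch. The record layout (`SignatureMatch`), the helper names `group_by_entry` and `coverage`, and the placeholder accessions are all hypothetical; they are not InterPro's schema or file format, and the real integration of signatures into entries is done by curators.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class SignatureMatch(NamedTuple):
    """Hypothetical record: one member-database signature matching one protein."""
    protein: str          # protein accession (placeholder values below)
    database: str         # member database name, e.g. "Pfam" or "PROSITE"
    signature: str        # signature accession within that member database
    entry: Optional[str]  # InterPro entry the signature is integrated into, or None


def group_by_entry(
    matches: Iterable[SignatureMatch],
) -> Dict[str, Dict[str, List[Tuple[str, str]]]]:
    """Collapse matches into protein -> InterPro entry -> [(database, signature)]."""
    grouped = defaultdict(lambda: defaultdict(list))
    for m in matches:
        if m.entry is None:  # signatures not integrated into any entry are skipped
            continue
        grouped[m.protein][m.entry].append((m.database, m.signature))
    # Convert the nested defaultdicts into plain dicts for easier inspection
    return {protein: dict(entries) for protein, entries in grouped.items()}


def coverage(all_proteins: Iterable[str], matches: Iterable[SignatureMatch]) -> float:
    """Fraction of proteins matched by at least one integrated signature."""
    proteins = set(all_proteins)
    if not proteins:
        return 0.0
    hit = {m.protein for m in matches if m.entry is not None}
    return len(hit & proteins) / len(proteins)


# Placeholder data: two member databases agree on one entry for PROT_A,
# while PROT_B only matches a signature that is not integrated.
example = [
    SignatureMatch("PROT_A", "Pfam", "SIG_1", "IPR_X"),
    SignatureMatch("PROT_A", "PROSITE", "SIG_2", "IPR_X"),
    SignatureMatch("PROT_B", "SMART", "SIG_3", None),
]
print(group_by_entry(example))
print(coverage(["PROT_A", "PROT_B", "PROT_C", "PROT_D"], example))
```

Running it prints `{'PROT_A': {'IPR_X': [('Pfam', 'SIG_1'), ('PROSITE', 'SIG_2')]}}` and `0.25`: only one of the four placeholder proteins is hit by an integrated signature, which is the same kind of ratio as the 78% coverage figure quoted for UniProt.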

Recent Developments

New features of the database include improved match views and a taxonomy servlet. The match views now come in both extended and compact forms, which can be ordered by protein accession number, by name, by taxonomy or by proteins of known structure. The InterPro Domain Architectures view is a graphical representation in which the domain architecture of a protein sequence is displayed as a series of non-overlapping domains, giving an overview of the domain composition of each protein. A new field shows the taxonomic range of the proteins matching each InterPro entry; for each taxonomic group, the number of matching proteins links to a graphical view of that subset of proteins. The first HMMs based on CATH structural superfamilies have been integrated, and PANTHER is the next database awaiting integration.
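Displaying an architecture as a series of non-overlapping domains implies a selection step whenever domain matches overlap. One simple way to do this, sketched below in Python, is the classic earliest-end-first greedy algorithm for interval scheduling, which keeps the largest possible number of non-overlapping domains. This is an illustrative assumption, not the rule InterPro uses; `DomainMatch`, `non_overlapping`, `architecture`, and the domain names and coordinates are all hypothetical.

```python
from typing import List, NamedTuple


class DomainMatch(NamedTuple):
    """Hypothetical domain hit with 1-based, inclusive sequence coordinates."""
    name: str
    start: int
    end: int


def non_overlapping(domains: List[DomainMatch]) -> List[DomainMatch]:
    """Select a maximum-size set of non-overlapping domains (earliest end first).

    Because the kept domains never overlap, sorting by end also leaves them
    in N- to C-terminal order.
    """
    chosen: List[DomainMatch] = []
    last_end = 0  # no residue is occupied before position 1
    for d in sorted(domains, key=lambda d: (d.end, d.start)):
        if d.start > last_end:  # inclusive coordinates: must begin after the previous end
            chosen.append(d)
            last_end = d.end
    return chosen


def architecture(domains: List[DomainMatch]) -> str:
    """Render the selected domains as a compact architecture string."""
    return " - ".join(d.name for d in non_overlapping(domains))


hits = [
    DomainMatch("DomA", 10, 70),
    DomainMatch("DomB", 80, 170),
    DomainMatch("DomC", 150, 400),  # overlaps DomB and DomD
    DomainMatch("DomD", 260, 520),
]
print(architecture(hits))
```

With these placeholder hits, `DomC` is dropped because it overlaps `DomB`, so the script prints `DomA - DomB - DomD`. A real view might instead prefer longer or higher-scoring domains, which would call for weighted interval scheduling rather than this count-maximising greedy rule.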


The InterPro project is supported by the ProFuSe grant (QLG2-CT-2000-00517) of the European Commission.


1. Andreeva, A., Howorth, D., Brenner, S.E., Hubbard, T.J., Chothia, C. and Murzin, A.G. (2004) SCOP database in 2004: refinements integrate structure and sequence family. Nucleic Acids Research 32(1), D226-229.
2. Apweiler, R., Bairoch, A., Wu, C.H., Barker, W.C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez, R., Magrane, M., Martin, M.J., Natale, D.A., O'Donovan, C., Redaschi, N. and Yeh, L.S. (2004) UniProt: the Universal Protein knowledgebase. Nucleic Acids Research 32(1), D115-119.
3. Attwood, T.K., Bradley, P., Flower, D.R., Gaulton, A., Maudling, N., Mitchell, A.L., Moulton, G., Nordle, A., Paine, K., Taylor, P., Uddin, A. and Zygouri, C. (2003) PRINTS and its automatic supplement pre-PRINTS. Nucleic Acids Research 31(1), 400-402.
4. Bateman, A., Coin, L., Durbin, R., Finn, R.D., Hollich, V., Griffiths-Jones, S., Khanna, A., Marshall, M., Moxon, S., Sonnhammer, E.L., Studholme, D.J., Yeats, C. and Eddy, S.R. (2004) The Pfam protein families database. Nucleic Acids Research 32(1), D138-141.
5. Biswas, M., O'Rourke, J.F., Camon, E., Fraser, G., Kanapin, A., Karavidopoulou, Y., Kersey, P., Kriventseva, E., Mittard, V., Mulder, N., Phan, I., Servant, F. and Apweiler, R. (2002) Applications of InterPro in protein annotation and genome analysis. Briefings in Bioinformatics 3(3), 285-295.
6. Golovin, A., Oldfield, T.J., Tate, J.G., Velankar, S., Barton, G.J., Boutselakis, H., Dimitropoulos, D., Fillon, J., Hussain, A., Ionides, J.M., John, M., Keller, P.A., Krissinel, E., McNeil, P., Naim, A., Newman, R., Pajon, A., Pineda, J., Rachedi, A., Copeland, J., Sitnov, A., Sobhany, S., Suarez-Uruena, A., Swaminathan, G.J., Tagari, M., Tromm, S., Vranken, W. and Henrick, K. (2004) E-MSD: an integrated data resource for bioinformatics. Nucleic Acids Research 32(1), D211-216.
7. Haft, D.H., Selengut, J.D. and White, O. (2003) The TIGRFAMs database of protein families. Nucleic Acids Research 31(1), 371-373.
8. Harris, M.A., Clark, J., Ireland, A., Lomax, J., Ashburner, M., Foulger, R., Eilbeck, K., Lewis, S., Marshall, B., Mungall, C., Richter, J., Rubin, G.M., Blake, J.A., Bult, C., Dolan, M., Drabkin, H., Eppig, J.T., Hill, D.P., Ni, L., Ringwald, M., Balakrishnan, R., Cherry, J.M., Christie, K.R., Costanzo, M.C., Dwight, S.S., Engel, S., Fisk, D.G., Hirschman, J.E., Hong, E.L., Nash, R.S., Sethuraman, A., Theesfeld, C.L., Botstein, D., Dolinski, K., Feierbach, B., Berardini, T., Mundodi, S., Rhee, S.Y., Apweiler, R., Barrell, D., Camon, E., Dimmer, E., Lee, V., Chisholm, R., Gaudet, P., Kibbe, W., Kishore, R., Schwarz, E.M., Sternberg, P., Gwinn, M., Hannick, L., Wortman, J., Berriman, M., Wood, V., de la Cruz, N., Tonellato, P., Jaiswal, P., Seigfried, T. and White, R. (2004) The Gene Ontology (GO) database and informatics resource. Nucleic Acids Research 32(1), D258-261.
9. Hulo, N., Sigrist, C.J., Le Saux, V., Langendijk-Genevaux, P.S., Bordoli, L., Gattiker, A., De Castro, E., Bucher, P. and Bairoch, A. (2004) Recent improvements to the PROSITE database. Nucleic Acids Research 32(1), D134-137.
10. Letunic, I., Copley, R.R., Schmidt, S., Ciccarelli, F.D., Doerks, T., Schultz, J., Ponting, C.P. and Bork, P. (2004) SMART 4.0: towards genomic data integration. Nucleic Acids Research 32(1), D142-144.
11. Madera, M., Vogel, C., Kummerfeld, S.K., Chothia, C. and Gough, J. (2004) The SUPERFAMILY database in 2004: additions and improvements. Nucleic Acids Research 32(1), D235-239.
12. Mulder, N.J., Apweiler, R., Attwood, T.K., Bairoch, A., Barrell, D., Bateman, A., Binns, D., Biswas, M., Bradley, P., Bork, P., Bucher, P., Copley, R.R., Courcelle, E., Das, U., Durbin, R., Falquet, L., Fleischmann, W., Griffiths-Jones, S., Haft, D., Harte, N., Hulo, N., Kahn, D., Kanapin, A., Krestyaninova, M., Lopez, R., Letunic, I., Lonsdale, D., Silventoinen, V., Orchard, S.E., Pagni, M., Peyruc, D., Ponting, C.P., Selengut, J.D., Servant, F., Sigrist, C.J., Vaughan, R. and Zdobnov, E.M. (2003) The InterPro Database, 2003 brings increased coverage and new features. Nucleic Acids Research 31(1), 315-318.
13. Orengo, C.A., Pearl, F.M. and Thornton, J.M. (2003) The CATH domain structure database. Methods of Biochemical Analysis 44, 249-271.
14. Pearl, F.M., Lee, D., Bray, J.E., Buchan, D.W., Shepherd, A.J. and Orengo, C.A. (2002) The CATH extended protein-family database: providing structural annotations for genome sequences. Protein Science 11(2), 233-244.
15. Servant, F., Bru, C., Carrère, S., Courcelle, E., Gouzy, J., Peyruc, D. and Kahn, D. (2002) ProDom: Automated clustering of homologous domains. Briefings in Bioinformatics 3, 246-25.
16. Wu, C.H., Nikolskaya, A., Huang, H., Yeh, L.S., Natale, D.A., Vinayaka, C.R., Hu, Z.Z., Mazumder, R., Kumar, S., Kourtesis, P., Ledley, R.S., Suzek, B.E., Arminski, L., Chen, Y., Zhang, J., Cardenas, J.L., Chung, S., Castro-Alvear, J., Dinkov, G. and Barker, W.C. (2004) PIRSF: family classification system at the Protein Information Resource. Nucleic Acids Research 32(1), D112-114.
17. Zdobnov, E.M. and Apweiler, R. (2001) InterProScan - an integration platform for the signature-recognition methods in InterPro. Bioinformatics 17(9), 847-848.

Go to the abstract in the NAR 2009 Database Issue.