NAR Molecular Biology Database Collection entry number 207
Finn, Robert; Mitchell, Alex; Chang, Hsin-Yu; Daugherty, Louise; Fraser, Matthew; Hunter, Sarah; Lopez, Rodrigo; McAnulla, Craig; McMenamin, Conor; Nuka, Gift; Pesseat, Sebastien; Sangrador-Vegas, Amaia; Scheremetjew, Maxim; da Silva, Claudia; Yong, Siew-Yit; Bateman, Alex; Punta, Marco; Attwood, Teresa; Sigrist, Christian; Redaschi, Nicole; Rivoire, Catherine; Xenarios, Ioannis; Bork, Peer; Letunic, Ivica; Gough, Julian; Oates, Matt; Haft, Daniel; Huang, Hongzhan; Natale, Darren; Wu, Cathy H; Orengo, Christine; Sillitoe, Ian; Mi, Huaiyu; Thomas, Paul
EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

Database Description

InterPro, an integrated documentation resource of protein families, domains and functional sites, was created in 1999 to amalgamate the major protein signature databases into one comprehensive resource. PROSITE, Pfam, PRINTS, ProDom, SMART, TIGRFAMs, PIR SuperFamily and the structure-based SUPERFAMILY have been manually integrated and are available in InterPro for text- and sequence-based searching. CATH and PANTHER HMMs will be integrated soon. Results are provided in a single, comprehensive format, with links to the original data sources as well as to specialised functional databases. The latest release of InterPro contains over 10,000 entries and covers 78% of all proteins in UniProt. Each entry is annotated with a name, GO mappings and an abstract, and all matches against the Swiss-Prot and TrEMBL components of UniProt are precomputed and can be viewed in several formats. Protein 3D structural information is integrated from MSD, CATH and SCOP, and these data are shown in the match views to give an at-a-glance comparison of sequence and structural domains. The database is available via a web server and by anonymous FTP. InterProScan is a sequence search package that can be used through a web interface or installed locally for bulk searches.


1. Andreeva, A., Howorth, D., Brenner, S.E., Hubbard, T.J., Chothia, C. and Murzin, A.G. (2004) SCOP database in 2004: refinements integrate structure and sequence family. Nucleic Acids Research 32(1), D226-229.
2. Apweiler, R., Bairoch, A., Wu, C.H., Barker, W.C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez, R., Magrane, M., Martin, M.J., Natale, D.A., O'Donovan, C., Redaschi, N. and Yeh, L.S. (2004) UniProt: the Universal Protein knowledgebase. Nucleic Acids Research 32(1), D115-119.
3. Attwood, T.K., Bradley, P., Flower, D.R., Gaulton, A., Maudling, N., Mitchell, A.L., Moulton, G., Nordle, A., Paine, K., Taylor, P., Uddin, A. and Zygouri, C. (2003) PRINTS and its automatic supplement pre-PRINTS. Nucleic Acids Research 31(1), 400-402.
4. Bateman, A., Coin, L., Durbin, R., Finn, R.D., Hollich, V., Griffiths-Jones, S., Khanna, A., Marshall, M., Moxon, S., Sonnhammer, E.L., Studholme, D.J., Yeats, C. and Eddy, S.R. (2004) The Pfam protein families database. Nucleic Acids Research 32(1), D138-141.
5. Biswas, M., O'Rourke, J.F., Camon, E., Fraser, G., Kanapin, A., Karavidopoulou, Y., Kersey, P., Kriventseva, E., Mittard, V., Mulder, N., Phan, I., Servant, F. and Apweiler, R. (2002) Applications of InterPro in protein annotation and genome analysis. Briefings in Bioinformatics 3(3), 285-295.
6. Golovin, A., Oldfield, T.J., Tate, J.G., Velankar, S., Barton, G.J., Boutselakis, H., Dimitropoulos, D., Fillon, J., Hussain, A., Ionides, J.M., John, M., Keller, P.A., Krissinel, E., McNeil, P., Naim, A., Newman, R., Pajon, A., Pineda, J., Rachedi, A., Copeland, J., Sitnov, A., Sobhany, S., Suarez-Uruena, A., Swaminathan, G.J., Tagari, M., Tromm, S., Vranken, W. and Henrick, K. (2004) E-MSD: an integrated data resource for bioinformatics. Nucleic Acids Research 32(1), D211-216.
7. Haft, D.H., Selengut, J.D. and White, O. (2003) The TIGRFAMs database of protein families. Nucleic Acids Research 31(1), 371-373.
8. Harris, M.A., Clark, J., Ireland, A., Lomax, J., Ashburner, M., Foulger, R., Eilbeck, K., Lewis, S., Marshall, B., Mungall, C., Richter, J., Rubin, G.M., Blake, J.A., Bult, C., Dolan, M., Drabkin, H., Eppig, J.T., Hill, D.P., Ni, L., Ringwald, M., Balakrishnan, R., Cherry, J.M., Christie, K.R., Costanzo, M.C., Dwight, S.S., Engel, S., Fisk, D.G., Hirschman, J.E., Hong, E.L., Nash, R.S., Sethuraman, A., Theesfeld, C.L., Botstein, D., Dolinski, K., Feierbach, B., Berardini, T., Mundodi, S., Rhee, S.Y., Apweiler, R., Barrell, D., Camon, E., Dimmer, E., Lee, V., Chisholm, R., Gaudet, P., Kibbe, W., Kishore, R., Schwarz, E.M., Sternberg, P., Gwinn, M., Hannick, L., Wortman, J., Berriman, M., Wood, V., de la Cruz, N., Tonellato, P., Jaiswal, P., Seigfried, T. and White, R. (2004) The Gene Ontology (GO) database and informatics resource. Nucleic Acids Research 32(1), D258-261.
9. Hulo, N., Sigrist, C.J., Le Saux, V., Langendijk-Genevaux, P.S., Bordoli, L., Gattiker, A., De Castro, E., Bucher, P. and Bairoch, A. (2004) Recent improvements to the PROSITE database. Nucleic Acids Research 32(1), D134-137.
10. Letunic, I., Copley, R.R., Schmidt, S., Ciccarelli, F.D., Doerks, T., Schultz, J., Ponting, C.P. and Bork, P. (2004) SMART 4.0: towards genomic data integration. Nucleic Acids Research 32(1), D142-144.
11. Madera, M., Vogel, C., Kummerfeld, S.K., Chothia, C. and Gough, J. (2004) The SUPERFAMILY database in 2004: additions and improvements. Nucleic Acids Research 32(1), 235-239.
12. Mulder, N.J., Apweiler, R., Attwood, T.K., Bairoch, A., Barrell, D., Bateman, A., Binns, D., Biswas, M., Bradley, P., Bork, P., Bucher, P., Copley, R.R., Courcelle, E., Das, U., Durbin, R., Falquet, L., Fleischmann, W., Griffiths-Jones, S., Haft, D., Harte, N., Hulo, N., Kahn, D., Kanapin, A., Krestyaninova, M., Lopez, R., Letunic, I., Lonsdale, D., Silventoinen, V., Orchard, S.E., Pagni, M., Peyruc, D., Ponting, C.P., Selengut, J.D., Servant, F., Sigrist, C.J., Vaughan, R. and Zdobnov, E.M. (2003) The InterPro Database, 2003 brings increased coverage and new features. Nucleic Acids Research 31(1), 315-318.
13. Orengo, C.A., Pearl, F.M. and Thornton, J.M. (2003) The CATH domain structure database. Methods of Biochemical Analysis 44, 249-271.
14. Pearl, F.M., Lee, D., Bray, J.E., Buchan, D.W., Shepherd, A.J. and Orengo, C.A. (2002) The CATH extended protein-family database: providing structural annotations for genome sequences. Protein Science 11(2), 233-244.
15. Servant, F., Bru, C., Carrère, S., Courcelle, E., Gouzy, J., Peyruc, D. and Kahn, D. (2002) ProDom: Automated clustering of homologous domains. Briefings in Bioinformatics 3, 246-25.
16. Wu, C.H., Nikolskaya, A., Huang, H., Yeh, L.S., Natale, D.A., Vinayaka, C.R., Hu, Z.Z., Mazumder, R., Kumar, S., Kourtesis, P., Ledley, R.S., Suzek, B.E., Arminski, L., Chen, Y., Zhang, J., Cardenas, J.L., Chung, S., Castro-Alvear, J., Dinkov, G. and Barker, W.C. (2004) PIRSF: family classification system at the Protein Information Resource. Nucleic Acids Research 32(1), D112-114.
17. Zdobnov, E.M. and Apweiler, R. (2001) InterProScan - an integration platform for the signature-recognition methods in InterPro. Bioinformatics 17(9), 847-848.
