NAR Molecular Biology Database Collection entry number 1476
Nikolaus Fortelny1,2,3,4, Sharon Yang3,5,6, Paul Pavlidis4,7, Philipp F. Lange2,3,* and Christopher M. Overall1,2,3,*
1Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, British Columbia, Canada, 2Department of Oral Biological and Medical Sciences, University of British Columbia, Vancouver, British Columbia, Canada, 3Centre for Blood Research, University of British Columbia, Vancouver, British Columbia, Canada, 4Centre for High Throughput Biology, University of British Columbia, Vancouver, British Columbia, Canada, 5Department of Computer Science, University of British Columbia, Vancouver, British Columbia, Canada, 6Department of Statistics, University of British Columbia, Vancouver, British Columbia, Canada and 7Department of Psychiatry, University of British Columbia, Vancouver, British Columbia, Canada

Database Description

TopFIND (1–3) is a protein-centric database for the annotation of protein termini, currently in its third version. Non-canonical protein termini can result from several biological processes, including pre-translational processes such as alternative splicing and alternative translation initiation, or post-translational processing by proteases that cleave proteins as part of protein maturation or as a regulatory modification. Accordingly, evidence for protein termini in TopFIND is inferred from other databases: ENSEMBL (4) for transcripts, TISdb (5) for alternative translation initiation, MEROPS (6) for protein cleavage by proteases, and UniProt (7) for canonical and protein isoform start sites. Additionally, termini are annotated from user-submitted lists of termini and inferred from user-submitted lists of cleavage sites.
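The inference of termini from a cleavage site can be sketched as follows. This is a minimal illustration and not TopFIND's actual implementation: it assumes 1-based residue numbering and that a cleavage site is given as the position of the P1 residue (the residue immediately N-terminal to the scissile bond), so that cleavage creates a new C-terminus at P1 and a new N-terminus at P1'. The function name, the window parameter and the toy sequence are hypothetical.

```python
from typing import NamedTuple


class CleavageTermini(NamedTuple):
    neo_c_terminus: int  # position of P1, last residue of the N-terminal fragment
    neo_n_terminus: int  # position of P1', first residue of the C-terminal fragment
    non_prime: str       # up to `window` residues ending at P1
    prime: str           # up to `window` residues starting at P1'


def termini_from_cleavage(sequence: str, p1_position: int, window: int = 4) -> CleavageTermini:
    """Infer the two termini created by cleavage after residue p1_position (1-based)."""
    if not 1 <= p1_position < len(sequence):
        raise ValueError("P1 must lie inside the sequence and must not be the last residue")
    start = max(0, p1_position - window)
    non_prime = sequence[start:p1_position]               # residues N-terminal to the bond
    prime = sequence[p1_position:p1_position + window]    # residues C-terminal to the bond
    return CleavageTermini(p1_position, p1_position + 1, non_prime, prime)


# Toy sequence (hypothetical, not a real protein), cleaved after residue 10.
result = termini_from_cleavage("MKTAYIAKQRQISFVK", p1_position=10)
print(result)
```

In this sketch a single cleavage site yields two termini records, which is why a submitted list of cleavage sites can add both N-terminal and C-terminal evidence.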
As a protein-centric database, TopFIND presents a web page for each protein isoform (organized around UniProt accession codes). These pages contain general protein information, such as organism, chromosome location, and protein sequence. They then list position-specific information for each protein, such as evidence for specific termini, known cleavage sites, sequence features and domains. In addition, TopFIND shows each protein in the context of the protease web, a network of proteases and their inhibitors in which a protease can cleave other proteases and their inhibitors, thus influencing their activity (8). All information in TopFIND can be filtered with a filter engine that relies on the rich annotation of the origin of each data item. TopFIND can also be queried programmatically using the PSICQUIC or XML API.
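A programmatic PSICQUIC query might look roughly like the sketch below. It follows the generic PSICQUIC REST pattern (a MIQL query appended to a `query/` path, with `format`, `firstResult` and `maxResults` parameters) and reads the first columns of MITAB 2.5 output. The base URL is a placeholder, since the actual TopFIND endpoint is not given here, and the accession in the example is arbitrary; the function names are hypothetical as well.

```python
from urllib.parse import quote, urlencode

# Placeholder only: substitute the real PSICQUIC endpoint of the service.
BASE_URL = "https://example.org/psicquic/webservices/current/search"


def build_query_url(miql: str, fmt: str = "tab25", first: int = 0, max_results: int = 50) -> str:
    """Build a PSICQUIC REST URL for a MIQL query string."""
    params = urlencode({"format": fmt, "firstResult": first, "maxResults": max_results})
    return f"{BASE_URL}/query/{quote(miql, safe='')}?{params}"


def parse_mitab_line(line: str) -> dict:
    """Extract a few MITAB 2.5 columns (tab-separated, 15 columns in total)."""
    cols = line.rstrip("\n").split("\t")
    return {
        "interactor_a": cols[0],                                   # column 1: identifier A
        "interactor_b": cols[1],                                   # column 2: identifier B
        "interaction_type": cols[11] if len(cols) > 11 else "-",   # column 12: interaction type
    }


url = build_query_url("id:P12345")
print(url)
```

The response body could then be fetched with any HTTP client and passed line by line to `parse_mitab_line`; how the service assigns proteases and substrates to interactors A and B should be checked against its own documentation.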

Recent Developments

Software tools were developed to enable quick access to TopFIND data for lists of termini obtained by, for example, proteomic termini screens (terminomics). TopFIND Explorer (“TopFINDer”) reports position-specific protein information for protein termini, such as terminus evidence, prime and non-prime sequences, and protein domains affected by cleavage. TopFINDer further reports summary statistics for protein cleavage by known proteases. PathFINDer is a second tool that reports proteolytic paths from a query protease to identified protein substrates, thus enabling the differentiation between direct and indirect protease substrates and yielding mechanistic insights into pathways based on existing information.
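The idea of separating direct from indirect substrates by searching for proteolytic paths can be illustrated with a breadth-first search over a directed cleavage network. This is a simplified sketch under stated assumptions, not PathFINDer's actual algorithm: edges mean "protease cleaves target", inhibition edges of the protease web are omitted, and all protein names and function names are hypothetical.

```python
from collections import deque


def shortest_paths(cleavage_graph: dict, source: str) -> dict:
    """Breadth-first search; maps each reachable protein to its shortest path from source."""
    paths = {source: [source]}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for target in cleavage_graph.get(node, ()):
            if target not in paths:          # first visit is the shortest path in BFS
                paths[target] = paths[node] + [target]
                queue.append(target)
    return paths


def classify_substrates(cleavage_graph: dict, protease: str, candidates: list) -> dict:
    """Label each candidate substrate as direct, indirect or unexplained."""
    paths = shortest_paths(cleavage_graph, protease)
    report = {}
    for candidate in candidates:
        path = paths.get(candidate)
        if path is None or candidate == protease:
            report[candidate] = ("unexplained", None)
        elif len(path) == 2:                 # protease -> substrate
            report[candidate] = ("direct", path)
        else:                                # at least one intermediate protease
            report[candidate] = ("indirect", path)
    return report


# Toy network (hypothetical names): ProtA cleaves ProtB and Sub1, ProtB cleaves Sub2.
toy_graph = {"ProtA": ["ProtB", "Sub1"], "ProtB": ["Sub2"], "ProtC": ["Sub3"]}
report = classify_substrates(toy_graph, "ProtA", ["Sub1", "Sub2", "Sub3"])
for substrate, (label, path) in report.items():
    print(substrate, label, path)
```

In the toy network, Sub2 is reachable from ProtA only through ProtB, so it is reported with the intermediate path rather than as a direct substrate.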


We thank all members of the Overall and Pavlidis labs at UBC, as well as Natalie Marshall, for many helpful discussions. This work was supported by the Centre for Blood Research [N.F. and S.Y.]; the Alexander von Humboldt Foundation, the Breast Cancer Society of Canada and the MSFHR [P.F.L.]; CIHR; an Infrastructure Grant from MSFHR; and the Canada Foundation for Innovation.


1. Lange,P.F. and Overall,C.M. (2011) TopFIND, a knowledgebase linking protein termini with function. Nat. Methods, 8, 703–704.
2. Lange,P.F., Huesgen,P.F. and Overall,C.M. (2011) TopFIND 2.0--linking protein termini with proteolytic processing and modifications altering protein function. Nucleic Acids Res., 40, D351–D361.
3. Fortelny,N., Yang,S., Pavlidis,P., Lange,P.F. and Overall,C.M. (2014) Proteome TopFIND 3.0 with TopFINDer and PathFINDer: database and analysis tools for the association of protein termini to pre- and post-translational events. Nucleic Acids Res., 10.1093/nar/gku1012.
4. Flicek,P., Amode,M.R., Barrell,D., Beal,K., Billis,K., Brent,S., Carvalho-Silva,D., Clapham,P., Coates,G., Fitzgerald,S., et al. (2014) Ensembl 2014. Nucleic Acids Res., 42, D749–D755.
5. Wan,J. and Qian,S.-B. (2014) TISdb: a database for alternative translation initiation in mammalian cells. Nucleic Acids Res., 42, D845–D850.
6. Rawlings,N.D., Barrett,A.J. and Bateman,A. (2012) MEROPS: the database of proteolytic enzymes, their substrates and inhibitors. Nucleic Acids Res., 40, D343–D350.
7. Consortium,T.U. (2014) Activities at the Universal Protein Resource (UniProt). Nucleic Acids Res., 42, D191–D198.
8. Fortelny,N., Cox,J.H., Kappelhoff,R., Starr,A.E., Lange,P.F., Pavlidis,P. and Overall,C.M. (2014) Network analyses reveal pervasive functional regulation between proteases in the human protease web. PLoS Biol., 12, e1001869.
