Interrupted coding sequences

NAR Molecular Biology Database Collection entry number 826

Database Description

Unrecognized frameshifts, in-frame stop codons and sequencing errors lead to Interrupted CoDing Sequences (ICDS) that can seriously affect all subsequent steps of functional characterisation, from in silico analysis to high-throughput proteomic projects. Here we describe the Interrupted CoDing Sequence database, which contains ICDS detected by a similarity-based approach in 80 complete prokaryotic genomes. ICDS can be retrieved by species browsing or by similarity searches via a web interface. The definition of each interrupted gene is provided, together with the genomic localisation of the ICDS and its surrounding sequence. Furthermore, to facilitate the experimental characterisation of ICDS, we propose optimised primers for re-sequencing purposes. The database will be regularly updated with additional data from newly sequenced genomes. Our strategy has been validated by three independent tests: (i) ICDS prediction on a benchmark of artificially created frameshifts; (ii) comparison of the predicted ICDS with the results of comparing the two genomic sequences of Bacillus licheniformis strain ATCC 14580; (iii) re-sequencing of 25 predicted ICDS in the recently sequenced genome of Mycobacterium smegmatis. These tests allow us to estimate the specificity and sensitivity of our program (95% and 82%, respectively) and the efficiency of primer determination.
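
To make the notion of an in-frame stop codon concrete, the following minimal Python sketch scans a coding sequence for stop codons that occur before the terminal codon in a chosen reading frame. It is an illustration only and is not the similarity-based method used to build the database; the function name find_internal_stops and the example sequences are hypothetical.

```python
from typing import List

# The three stop codons of the standard genetic code (assumed here;
# organisms with reassigned codons would need a different set).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_internal_stops(cds: str, frame: int = 0) -> List[int]:
    """Return 0-based positions of stop codons found in the given
    reading frame, excluding the last complete codon of the sequence."""
    seq = cds.upper()
    # Start positions of every complete codon in this frame.
    starts = list(range(frame, len(seq) - 2, 3))
    # The final codon is expected to be a stop, so it is not reported.
    return [i for i in starts[:-1] if seq[i:i + 3] in STOP_CODONS]


if __name__ == "__main__":
    # Hypothetical toy sequences: the first carries a premature TAA.
    print(find_internal_stops("ATGAAATAAGGGTGA"))
    print(find_internal_stops("ATGAAAGGGTGA"))
```

A sequence that returns a non-empty list is only a candidate: the interruption may reflect a sequencing error, a genuine pseudogene or a programmed recoding event, which is why the entry above proposes primers for re-sequencing.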

Go to the abstract in the NAR 2006 Database Issue.