NAR Molecular Biology Database Collection entry number 805
Robertson A.G.1, Bilenky M.1, Lin K.1, He A.1, Yuen W.1, Dagpinar M.1, Varhol R.1, Teague K.1, Griffith O.L.1, Zhang X.1, Pan Y.1, Hassel M.1, Sleumer M.C.1, Pan W.1, Pleasance E.D.1, Chuang M.1, Hao H.1, Li Y.Y.1, Robertson N.1, Fjell C.1, Li B.1, Montgomery S.B.1, Astakhova T.1, Zhou J.2, Sander J.2, Siddiqui A.S.1 and Jones S.J.M.1
1Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada
2Department of Computing Science, University of Alberta, Edmonton, AB, Canada

Database Description

cisRED is a database of predicted regulatory elements that are identified and ranked by a computational system for genome-scale discovery of phylogenetically conserved motifs. The current version of the database, human v2, contains 386 000 motifs with lengths between 6 and ~30 base pairs. Motifs were predicted for ~18 000 human genes, using sequence search regions that extended from 1.5 kb upstream to 200 bp downstream of a transcription start site; most types of repeats and all coding exons in these regions were masked. Many known transcription factor binding sites are located in such regions. Motifs were predicted using multiple de novo discovery methods applied to multi-species sequence sets that contained between 4 and 15 vertebrate species. An empirical p-value was assigned to each motif by applying the motif discovery methods to randomized sequence sets that were adaptively derived from the target sequence sets; only motifs with p-values below a threshold were retained. Groups of similar motifs were identified using OPTICS hierarchical clustering; co-occurring patterns of motifs are being identified. Predicted regulatory elements can be viewed directly in cisRED's web user interface, and can be filtered by the user on p-value or species composition. In addition, motifs can be viewed in the UCSC or Ensembl genome browsers, and in the Sockeye comparative genomics workspace. Methods are described in the documentation on the cisRED web site. A schema diagram is available, and the data and SQL structure for the MySQL databases can be downloaded. The database can also be queried directly, using "anonymous" as the username and leaving the password blank.
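
As an illustration of the empirical p-value step described above, the following is a minimal sketch, not the cisRED implementation. It assumes that each motif has a numeric score for which higher means stronger, that the scores obtained from the randomized sequence sets form the null distribution, and that a pseudocount of one is added so that no p-value is exactly zero. The function names, the pseudocount and the default threshold are all hypothetical choices made for this example.

```python
def empirical_p_value(observed_score, null_scores):
    """Estimate a p-value as the fraction of null scores >= the observed score.

    null_scores stands in for motif scores from randomized sequence sets
    (an assumption for this sketch). A pseudocount of 1 keeps the estimate
    above zero when no null score reaches the observed score.
    """
    if not null_scores:
        raise ValueError("null_scores must not be empty")
    exceed = sum(1 for s in null_scores if s >= observed_score)
    return (exceed + 1) / (len(null_scores) + 1)


def retain_motifs(motifs, null_scores, threshold=0.05):
    """Keep (name, score, p) for motifs whose empirical p-value is below threshold.

    motifs is an iterable of (name, score) pairs; threshold is a hypothetical
    default, not a value taken from cisRED.
    """
    kept = []
    for name, score in motifs:
        p = empirical_p_value(score, null_scores)
        if p < threshold:
            kept.append((name, score, p))
    return kept
```

With nine null scores, for example, the smallest attainable p-value is 1/10, so the resolution of the estimate is limited by the number of randomized sequence sets that were scored.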

Recent Developments

Sequences from low-coverage genomes have been added to existing species resources in order to increase species depth for motif discovery. Large-scale predictions for mouse and rat will be available early in 2006.


cisRED was funded by Genome Canada, Genome British Columbia and the BC Cancer Foundation. S.J.M. Jones, M.C. Sleumer, S.B. Montgomery and O.L. Griffith were supported by the Michael Smith Foundation for Health Research (MSFHR). E. Pleasance was supported by the Canadian Institutes of Health Research (CIHR). S.B. Montgomery and O.L. Griffith were also supported by the Natural Sciences and Engineering Research Council. H. Hao was supported by the CIHR/MSFHR Strategic Training Program in Bioinformatics. cisRED calculations were enabled by the use of WestGrid computing resources, which are funded in part by the Canada Foundation for Innovation, Alberta Innovation and Science, BC Advanced Education, and the participating research institutions.
