DDBJ - DNA Data Bank of Japan


NAR Molecular Biology Database Collection entry number 1
Nakamura, Yasukazu; Kosuge, Takehide; Mashima, Jun; Kodama, Yuichi; Fujisawa, Takatomo; Kaminuma, Eli; Ogasawara, Osamu; Okubo, Kousaku; Takagi, Toshihisa
1 Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Research Organization of Information and Systems, Yata, Mishima 411-8540, Japan.
2 Department of Population Genetics, National Institute of Genetics, Research Organization of Information and Systems, Yata, Mishima 411-8540, Japan.

Database Description

As a member of the International Nucleotide Sequence Database Collaboration (INSDC, http://www.insdc.org), DDBJ (http://www.ddbj.nig.ac.jp) has steadily collected, annotated, released and exchanged original DNA sequence data, as shown, for example, by the growth curve of data submissions over past years (see http://www.ddbj.nig.ac.jp/images/breakdown_stats/percentage-e.gif). However, the situation is changing dramatically with the emergence of ultra-high-speed, or 2nd generation, sequencers (2GS) such as 454 (by 454 Life Sciences), Solexa (by Illumina, Inc.), SOLiD (by Applied Biosystems) and HeliScope (by Helicos BioSciences). With these machines a whole human genome can now be sequenced in one-thousandth or less of the time needed for the first cases in 2001 (1, 2). Recently, two reports announced the whole-genome sequences of two well-known individuals (3, 4), which was perhaps the beginning of personal genomics. In addition, the 1000 human genomes project is underway in the USA, Europe and China to obtain a complete and detailed catalogue of human genetic variation (http://www.1000genomes.org/page.php). These activities warn us that the above growth curve will steepen drastically.
At present INSDC has released about 100 billion bases in total, the outcome of more than 20 years of collaboration among the three member banks. This number will easily be surpassed once the 1000 human genomes project is completed and its results are submitted to INSDC, in a few years or even sooner. To prepare for this, the INSDC collaborators discussed in 2008 how to handle mass submissions produced by 2GS. Their common fear was that limited computer storage will sooner or later be filled by the continuous stream of mass submissions.
Nevertheless, the collaborators agreed to collect, distribute and exchange mass data of transcriptomes, such as trace archives and short reads, on the condition that the sequences are assembled. DDBJ has also started to accept and release such mass sequence data.

Recent Developments

In the following text, DDBJ's activity is reported with a focus on mass data submissions from Japanese universities and institutes. DDBJ (http://www.ddbj.nig.ac.jp) collected and released 2,368,110 entries, or 1,415,106,598 bases, in the period from July 2007 to June 2008. The releases in this period include genome-scale data of Bombyx mori, Oryzias latipes, Drosophila and Lotus japonicus. In addition, starting this year we have collected and released trace archive data in collaboration with the National Center for Biotechnology Information (NCBI). The first release contains trace data of Oryzias latipes and of bacterial metagenomes from the human gut. To keep pace with the current progress of sequencing technology, we also accepted and released more than 100 million short reads of parasitic protozoa and their hosts that were produced using a Solexa sequencer.

Acknowledgements

We thank all the staff of DDBJ for data collection, annotation, release, management and software development. DDBJ is funded by the Ministry of Education, Culture, Sports, Science and Technology (MEXT) through the management expenses grant for national university corporations. DDBJ is also supported by a grant from the National Project of Integrating Life Science Databases.

References

1. Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W. et al. (2001) Initial sequencing and analysis of the human genome, Nature, 409, 860-921
2. Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W., Mural, R.J., Sutton, G.G., Smith, H.O., Yandell, M., Evans, C.A., Holt, R.A. et al. (2001) The sequence of the human genome, Science, 291, 1304-1351
3. Wheeler, D.A., Srinivasan, M., Egholm, M., Shen, Y., Chen, L., McGuire, A., He, W., Chen, Y.-J., Makhijani, V., Roth, G.T. et al. (2008) The complete genome of an individual by massively parallel DNA sequencing, Nature, 452, 872-876
4. Levy, S., Sutton, G., Ng, P.C., Feuk, L., Halpern, A.L., Walenz, B.P., Axelrod, N., Huang, J., Kirkness, E.F., Denisov, G. et al. (2007) The diploid genome sequence of an individual human, PLoS Biology, 5, 2113-2144


Go to the abstract in the NAR 2014 Database Issue.