NAR Molecular Biology Database Collection entry number 1798
Wang, Guo-Dong; Bai, Bing; Zhao, Wen-Ming; Tang, Bi-Xia; Wang, Yan-Qing; Wang, Lu; Zhang, Zhang; Yang, He-Chuan; Irwin, David; Zhang, Ya-Ping; Liu, Yan-Hu; Zhu, Jun-Wei
1Laboratory for Conservation and Utilization of Bioresource & Key Laboratory for Microbial Resources of the Ministry of Education, Yunnan University, Kunming 650091, China. 2Core Genomic Facility and 3CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China. 4Department of Molecular and Cell Biology, School of Life Sciences, University of Science and Technology of China, Hefei 230026, China. 5Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Canada. 6State Key Laboratory of Genetic Resources and Evolution, and Yunnan Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China. 7Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming 650204, China.

Database Description

Dogs have been the dearest friends of humans, serving as guardians, companions and working partners for thousands of years. Because dogs are one of the most thoroughly domesticated animals, they have been a long-term focus of, and an excellent model for, studies of domestication genetics. Furthermore, their parallel evolution with humans has helped us understand human evolution itself. However, the most widely used dog single nucleotide polymorphism (SNP) dataset is currently that of dbSNP (build 139), where ~80% of the ~2.7M SNPs were called from only two dogs, one Boxer (1) and one Standard Poodle (2). dbSNP provides only a combined SNP list pooled across all samples, with no per-individual lists, which limits its usability in population genetic analysis. The rapid development of Next-Generation Sequencing (NGS) technology has facilitated the generation of massive dog/wolf genome datasets. However, calling SNPs from the massive data generated by NGS is laborious and requires substantial computational resources. This situation prompted us to establish a Canidae-specific database (DoGSD) that focuses on whole-genome SNP data from domesticated dogs and grey wolves. We collected SNPs from 2 unpublished dog/wolf genomes, 3 recently published studies (3-5) that used 75 dog/wolf samples, and the latest dog SNP dataset (dbSNP139). In total, DoGSD includes 8 grey wolves sampled from the Eurasian continent and America, 34 Chinese indigenous dogs, 34 breed dogs representing 5 breeds, and 1 Dingo. The breed dogs comprise 11 German Shepherd dogs (Germany), 10 Kunming dogs (China), 11 Tibetan Mastiffs (China), 1 Belgian Malinois (Belgium) and 1 Basenji (Congo). The 34 Chinese indigenous dogs represent a key phase in dog domestication and are not included in any existing SNP database. DoGSD provides a powerful SNP retrieval interface that can extract data for each individual sample as well as a non-redundant dataset.
SNPs are annotated with integrated information such as SNP-related genes, transcripts and proteins, and allele frequencies can be calculated. DoGSD functionality includes the ability to search for gene-related synonymous and non-synonymous SNPs. In addition, we have incorporated an essential population genetic statistic, the F-statistic (Fst), into DoGSD.
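To make the allele frequency and Fst quantities concrete, the following is a minimal sketch for a single biallelic SNP in two populations. It assumes Wright's heterozygosity-based definition, Fst = (H_T - H_S) / H_T, with both populations weighted equally; the estimator actually used by DoGSD is not stated here and may differ. The function names and the toy genotypes are hypothetical.

```python
from typing import Sequence


def allele_frequency(genotypes: Sequence[str], allele: str) -> float:
    """Frequency of `allele` among diploid genotypes written as e.g. "AG"."""
    alleles = "".join(genotypes)
    if not alleles:
        raise ValueError("no genotypes supplied")
    return alleles.count(allele) / len(alleles)


def wright_fst(p1: float, p2: float) -> float:
    """Wright's Fst for one biallelic SNP, two equally weighted populations.

    Assumption: this is an illustrative estimator, not necessarily the one
    implemented in DoGSD.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0  # SNP is monomorphic across both populations
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
    return (h_total - h_within) / h_total


# Hypothetical toy genotypes at one SNP (not real DoGSD data).
wolves = ["AA", "AG", "AA", "AA"]
dogs = ["GG", "AG", "GG", "GG"]

p_wolf = allele_frequency(wolves, "A")  # 7 of 8 alleles
p_dog = allele_frequency(dogs, "A")     # 1 of 8 alleles
fst = wright_fst(p_wolf, p_dog)
print(f"p_wolf={p_wolf:.3f} p_dog={p_dog:.3f} Fst={fst:.4f}")
```

Values near 0 indicate similar allele frequencies in the two groups, and values near 1 indicate near-complete differentiation at that site.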

Recent Developments

The DoGSD database can be accessed through a simple user interface. Two main functionalities are provided for data retrieval: Browse and Search. Users may browse the non-redundant or individual-sample SNPs either as text in tables or in a chromosome-based graphical GBrowse interface. Summary tables of sample information, SNP-related genes, transcripts, proteins, allele frequencies, Fst values, and synonymous and non-synonymous SNPs can be retrieved through both the Browse and the Search interfaces. Comparative searches for SNPs between two or more individuals are also implemented. Moreover, all SNP data (individual or non-redundant set) can be freely downloaded as tab-delimited files, and bam/fastq/sra format files are provided for the 75 cited samples.
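A comparison of this kind can also be reproduced offline from downloaded tab-delimited files. The sketch below is illustrative only: the column names (chrom, pos, ref, alt) and the inline sample data are assumptions, and the real DoGSD file layout may differ. It loads two per-individual SNP tables, finds shared and private SNPs, and builds a non-redundant union.

```python
import csv
import io

# Hypothetical per-sample tables standing in for downloaded files;
# the actual DoGSD column layout is an assumption here.
SAMPLE_A = (
    "chrom\tpos\tref\talt\n"
    "chr1\t1005\tA\tG\n"
    "chr1\t2310\tC\tT\n"
    "chr2\t88\tG\tA\n"
)
SAMPLE_B = (
    "chrom\tpos\tref\talt\n"
    "chr1\t2310\tC\tT\n"
    "chr2\t88\tG\tC\n"
    "chr3\t412\tT\tC\n"
)


def load_snps(handle):
    """Return the set of (chrom, pos, ref, alt) tuples in a tab-delimited table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        (row["chrom"], int(row["pos"]), row["ref"], row["alt"])
        for row in reader
    }


# For real files, replace io.StringIO(...) with open(path, newline="").
snps_a = load_snps(io.StringIO(SAMPLE_A))
snps_b = load_snps(io.StringIO(SAMPLE_B))

shared = snps_a & snps_b               # SNPs present in both individuals
only_a = snps_a - snps_b               # SNPs private to individual A
only_b = snps_b - snps_a               # SNPs private to individual B
non_redundant = sorted(snps_a | snps_b)  # each distinct SNP listed once

print(len(shared), len(only_a), len(only_b), len(non_redundant))  # 1 2 2 5
```

Keying each SNP on position plus both alleles means that two different substitutions at the same site (chr2:88 above) are treated as distinct records; keying on position alone would merge them.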


Acknowledgements

We thank all group members of DoGSD for sample collection, sequencing, SNP calling, annotation, database release and management. DoGSD is funded by grants from the 973 program [2013CB835200 and 2013CB835202 to G.D.W.] and the Chinese Academy of Sciences [1731200000001 to W.Z.]. The Youth Innovation Promotion Association, Chinese Academy of Sciences, provided support to G.D.W.


References

1. Lindblad-Toh, K., Wade, C.M., Mikkelsen, T.S., Karlsson, E.K., Jaffe, D.B., Kamal, M., Clamp, M., Chang, J.L., Kulbokas, E.J., 3rd, Zody, M.C. et al. (2005) Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature, 438, 803-819.
2. Kirkness, E.F., Bafna, V., Halpern, A.L., Levy, S., Remington, K., Rusch, D.B., Delcher, A.L., Pop, M., Wang, W., Fraser, C.M. et al. (2003) The dog genome: survey sequencing and comparative analysis. Science, 301, 1898-1903.
3. Gou, X., Wang, Z., Li, N., Qiu, F., Xu, Z., Yan, D., Yang, S., Jia, J., Kong, X., Wei, Z. et al. (2014) Whole-genome sequencing of six dog breeds from continuous altitudes reveals adaptation to high-altitude hypoxia. Genome Research, 24, 1308-1315.
4. Freedman, A.H., Gronau, I., Schweizer, R.M., Ortega-Del Vecchyo, D., Han, E., Silva, P.M., Galaverni, M., Fan, Z., Marx, P., Lorente-Galdos, B. et al. (2014) Genome sequencing highlights the dynamic early history of dogs. PLoS Genetics, 10, e1004016.
5. Wang, G.D., Zhai, W., Yang, H.C., Fan, R.X., Cao, X., Zhong, L., Wang, L., Liu, F., Wu, H., Cheng, L.G. et al. (2013) The genomics of selection in dogs and the parallel evolution between dogs and humans. Nature Communications, 4, 1860.
