SoyKB - Soybean Knowledge Base

NAR Molecular Biology Database Collection entry number 1746
Joshi, T. (1,2,3,4), Fitzpatrick, M.R. (1,2), Chen, S. (1,2), Liu, Y. (2,4), Zhang, H. (1,2), Endacott, R.Z. (1,2), Gaudiello, E.C. (1,2), Stacey, G. (2,3,5), Nguyen, H.T. (2,3,5), Xu, D. (1,2,3,4)
1 Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
2 Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
3 National Center for Soybean Biotechnology, University of Missouri, Columbia, MO 65211, USA
4 Informatics Institute, University of Missouri, Columbia, MO 65211, USA
5 Division of Plant Sciences, University of Missouri, Columbia, MO 65211, USA

Database Description

Many genome-scale datasets are available for soybean (Glycine max) (1), including genomics, transcriptomics, proteomics and metabolomics data, as well as growing knowledge of soybean genes, microRNAs, pathways and phenotypes. Together these data form a rich resource that can provide valuable insights if mined in an innovative and integrative manner. The Soybean Knowledge Base (SoyKB) is a comprehensive, all-inclusive web resource developed for soybean translational genomics and molecular breeding. SoyKB stores information about genes/proteins, miRNAs/sRNAs, metabolites, SNPs, plant introduction (PI) lines and traits. It handles the management and integration of soybean genomics and multi-omics data along with gene function annotations, biological pathways and trait information. It offers many tools, including gene family search, multiple gene/metabolite analysis, a motif analysis tool, a protein 3D structure viewer, and data download and upload capabilities. Its user-friendly web interface, together with a genome browser and a pathway viewer, displays data intuitively to soybean researchers, breeders and consumers. SoyKB can be publicly accessed at

Recent Developments

Since the initial publication (2), we have made significant developments, adding many new and advanced tools, new types of data and new analysis capabilities for our users. From the molecular breeding perspective, we have incorporated two entirely new entities that were not previously part of SoyKB. One is the Plant Introduction (PI) data for ~19,000 soybean germplasm lines from USDA-ARS ( and the other is the Trait entity describing phenotypic data. These data have been integrated with our QTL, SNP and GWAS datasets in our newly developed suite of tools for the In Silico Breeding Program. This suite allows data integration and extraction in tabular format, as well as graphical visualization in our in-house Chromosome Visualizer. It also supports integration and visualization of Genotyping-by-Sequencing (GBS) data for molecular breeding and phenotypic inference.

In addition, SoyKB can now incorporate DNA methylation data (3) and fast neutron mutation datasets. It is also seamlessly linked with P3DB (4) for phosphorylation data. We have also incorporated a suite of tools for differential expression analysis of microarray, RNA-seq transcriptomics, proteomics and metabolomics datasets, providing access to gene lists, Venn diagrams, volcano plots, functional annotations and pathway analysis. SoyKB is now powered by the iPlant (5) cyberinfrastructure: the website is hosted on iPlant's advanced computing infrastructure to leverage its data analysis capabilities. We are also developing Cyber Studio, an in silico hypothesis generation and testing tool, which will provide templates for users to conduct multi-omics analysis, signaling pathway construction, gene regulatory network prediction, genotype-phenotype inference and related analyses based on protein-protein interaction data, pathways, gene expression and comparative genomics.

Acknowledgements

The authors wish to thank the labs of Suk-Ha Lee, Jay Thelen, Steve Clough and Melissa Mitchum for contributing data to SoyKB. The development has been supported by the Missouri Soybean Merchandising Council (MSMC #306); the United Soybean Board (project 8236); the National Science Foundation (#DBI-0421620); the Department of Energy (DE-SC0004898); and the National Center for Soybean Biotechnology. We also thank the iPlant Collaborative for their computational resources and technical support. The iPlant Collaborative ( is funded by a grant from the National Science Foundation (#DBI-0735191).

References

1. Schmutz J, Cannon SB, Schlueter J, Ma J, Mitros T, Nelson W, Hyten DL, Song Q, Thelen JJ, Cheng J, et al. (2010) Genome sequence of the palaeopolyploid soybean. Nature, 463, 178-183.
2. Joshi T, Patil K, Fitzpatrick MR, Franklin LD, Yao Q, Cook JR, Wang Z, Libault M, Brechenmacher L, Valliyodan B, et al. (2012) Soybean Knowledge Base (SoyKB): a web resource for soybean translational genomics. BMC Genomics, 13(Suppl 1), S15.
3. Schmitz RJ, He Y, Valdés-López O, Khan SM, Joshi T, Urich MA, Nery JR, Diers B, Xu D, Stacey G, et al. (2013) Epigenome-wide inheritance of cytosine methylation variants in a recombinant inbred population. Genome Res., doi:10.1101/gr.152538.112.
4. Gao J, Agrawal GK, Thelen JJ, Xu D. (2009) P3DB: a plant protein phosphorylation database. Nucleic Acids Res., 37, D960-D962.
5. Goff SA, Vaughn M, McKay S, Lyons E, Stapleton AE, Gessler D, Matasci N, Wang L, Hanlon M, Lenards A, et al. (2011) The iPlant Collaborative: Cyberinfrastructure for Plant Biology. Front. Plant Sci., 2, 34. doi:10.3389/fpls.2011.00034.

Category: Plant databases
Subcategory: Other plants

The abstract appears in the NAR 2014 Database Issue.