NAR Molecular Biology Database Collection entry number 392
Altman R.B., Carrillo M.W., Gong M., Gor W., Hernandez-Boussard T., Holbert D., Kiuchi M., MacBride A., Murray T., Liu F., Thorn C.F., Woon M., Truong T., Zhou T. and Klein T.E.
Department of Genetics, Stanford Medical Informatics

Database Description

The Pharmacogenetics and Pharmacogenomics Knowledge Base (PharmGKB) is a public resource that promotes research into the relationships between human genotypes, phenotypes, and clinical outcomes by linking and annotating primary datasets from ongoing research and established data from the literature. In addition to gene-drug relationships, the PharmGKB contains data on gene variation, genomics, gene-disease relationships, drug action, and pathways. The PharmGKB has developed highly curated pathways documenting the genes involved in the pharmacodynamics and pharmacokinetics of a selection of drugs, and aims to enrich these by exchange with other metabolic and signaling pathway resources using the BioPAX exchange format. We developed an XML format for defining genotype data, a relational database schema for data storage, and a flexible mechanism for submitting phenotype data. We are participating in the Polymorphism Markup Language (PML) effort to define an XML standard for genotype/phenotype data exchange. Finally, we created a community project that encourages the research community to submit pharmacogenomic knowledge from the literature. PharmGKB first came online in April 2000. Access is free, but users must register for a username and password to view individual subject data, in order to comply with HIPAA regulations.

CATEGORIES OF EVIDENCE To establish a link between a genetic variation and variation in a phenotype, many different types of experiments can be performed, ranging from genotypic studies to cellular phenotype assays to clinical studies. Searching a database with such diverse data types is problematic. We therefore associate every experimental result with a category of evidence, which provides the context for its significance. We use five categories, and all data entering PharmGKB must be labeled with one of them. If a study shows a significant difference in a clinical outcome (death, disability, pain, days of work missed) based on a genetic difference, then the data falls under the Clinical Outcome category. The Pharmacodynamics and Drug Response category consists of studies that show a variation in drug response that can be measured clinically but is not a direct outcome. The Pharmacokinetics category covers studies that quantify how drug metabolism changes with genetic changes, and the Molecular and Cellular Functional Assays category covers studies that show associations between genetic changes and changes in functional assay results. Finally, the Genotype category contains data showing basic variability in gene sequences. We show the Category of Evidence for all data being displayed, as well as appropriate literature annotations and cross-references to external databases. We expect that researchers will direct their hypotheses into those areas where data is sparse.
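
The labeling rule described above can be illustrated with a minimal Python sketch. The five category names come from the text; the `Result` record, its fields, the `sparse_categories` helper, and the placeholder gene and drug names are assumptions made for illustration and do not reflect PharmGKB's actual code or data.

```python
from dataclasses import dataclass
from enum import Enum


class EvidenceCategory(Enum):
    """The five categories of evidence named in the text."""
    CLINICAL_OUTCOME = "Clinical Outcome"
    PHARMACODYNAMICS = "Pharmacodynamics and Drug Response"
    PHARMACOKINETICS = "Pharmacokinetics"
    FUNCTIONAL_ASSAY = "Molecular and Cellular Functional Assays"
    GENOTYPE = "Genotype"


@dataclass(frozen=True)
class Result:
    """Hypothetical record: one experimental result plus its required label."""
    gene: str
    drug: str
    category: EvidenceCategory

    def __post_init__(self):
        # Reject any result that is not labeled with one of the five categories.
        if not isinstance(self.category, EvidenceCategory):
            raise ValueError("category must be an EvidenceCategory")


def sparse_categories(results, gene, drug):
    """Return the categories that have no results yet for a gene-drug pair."""
    seen = {r.category for r in results if r.gene == gene and r.drug == drug}
    return [c for c in EvidenceCategory if c not in seen]


# Placeholder identifiers, not real PharmGKB entries.
results = [
    Result("GENE_A", "DRUG_X", EvidenceCategory.GENOTYPE),
    Result("GENE_A", "DRUG_X", EvidenceCategory.PHARMACOKINETICS),
]
missing = sparse_categories(results, "GENE_A", "DRUG_X")
```

Here `missing` lists the three categories with no data for the pair, which is one way a researcher could spot the sparse areas mentioned above.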

PHENOTYPE DATA While much work has been done to create an XML format and associated database schema for describing genetic variation, the PharmGKB is also focused on collecting phenotypic data associated with genetic variation. Creating a database schema for phenotypic data is inherently difficult because of the vast range of clinical studies, each with specialized methods of reporting results. Therefore, PharmGKB initially took a library approach to accepting phenotypic data. We accept annotated spreadsheet data from researchers, who also provide the Category of Evidence of the data, the gene-drug relationships, and keywords for finding the data. We store and index the data in the database and link it to related research information. We present a unified view organized by the Categories of Evidence, so that users can easily access all information related to a drug-gene relationship. Users searching our site can then view and download phenotype data. We are actively engaged in the PML effort (2) to create a standard for the definition, storage, and exchange of phenotype data via XML.
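
The library approach described above (store the spreadsheet as submitted, and index it by the metadata the researcher supplies) might look roughly like the following Python sketch. The function names, the `gene|drug` key format, and the sample rows are hypothetical; the text does not describe how PharmGKB implements its index.

```python
import csv
import io
from collections import defaultdict


def pair_key(gene, drug):
    """Build a lookup key for a gene-drug relationship (format is an assumption)."""
    return f"{gene}|{drug}".lower()


def read_rows(spreadsheet_text):
    """Parse CSV text into dicts without imposing a schema on the columns."""
    return list(csv.DictReader(io.StringIO(spreadsheet_text)))


def index_submission(index, submission_id, keywords, gene_drug_pairs):
    """Register a submission under each keyword and each gene-drug pair."""
    keys = [k.lower() for k in keywords]
    keys += [pair_key(g, d) for g, d in gene_drug_pairs]
    for key in keys:
        index[key].add(submission_id)


def find(index, key):
    """Return the submission ids filed under a keyword or pair key."""
    return set(index.get(key.lower(), set()))


# Placeholder submission: column names are whatever the researcher used.
index = defaultdict(set)
rows = read_rows("subject,response\nS1,0.8\nS2,0.3\n")
index_submission(index, "sub-1", ["Dosage"], [("GENE_A", "DRUG_X")])
hits = find(index, pair_key("GENE_A", "DRUG_X"))
```

Because the rows are kept as free-form dictionaries, studies with very different reporting conventions can be accepted without first agreeing on a common phenotype schema.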

DATABASE UPGRADES Qualitative analysis of our system showed that the frame-based data storage facility suffered in retrieval performance. While the frame-based system provided a flexible means for early data modeling, a database backend has shown better performance in storing and retrieving information (3). Data is now stored centrally in a relational database (4). We are beginning to accept large, high-throughput datasets, including whole-genome analyses, and are preparing to meet the challenges of future datasets in the multi-gigabyte-to-terabyte range.
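
As a rough illustration of central relational storage, the sketch below uses Python's built-in `sqlite3` module. The table and column names are assumptions chosen for the example; the actual PharmGKB schema is not given in this text. The `CHECK` constraint shows one way the rule that every result carries one of the five categories could be enforced in the database itself.

```python
import sqlite3

SCHEMA = """
CREATE TABLE gene (id INTEGER PRIMARY KEY, symbol TEXT UNIQUE NOT NULL);
CREATE TABLE drug (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE evidence (
    id INTEGER PRIMARY KEY,
    gene_id INTEGER NOT NULL REFERENCES gene(id),
    drug_id INTEGER NOT NULL REFERENCES drug(id),
    category TEXT NOT NULL CHECK (category IN (
        'Clinical Outcome',
        'Pharmacodynamics and Drug Response',
        'Pharmacokinetics',
        'Molecular and Cellular Functional Assays',
        'Genotype'))
);
-- Index the pair so lookups by gene-drug relationship stay fast.
CREATE INDEX evidence_pair ON evidence (gene_id, drug_id);
"""


def make_db():
    """Create an in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_evidence(conn, gene, drug, category):
    """Insert one labeled result, creating the gene and drug rows if needed."""
    conn.execute("INSERT OR IGNORE INTO gene (symbol) VALUES (?)", (gene,))
    conn.execute("INSERT OR IGNORE INTO drug (name) VALUES (?)", (drug,))
    conn.execute(
        "INSERT INTO evidence (gene_id, drug_id, category) "
        "SELECT g.id, d.id, ? FROM gene g, drug d "
        "WHERE g.symbol = ? AND d.name = ?",
        (category, gene, drug),
    )


def categories_for(conn, gene, drug):
    """Return the distinct evidence categories recorded for a gene-drug pair."""
    cur = conn.execute(
        "SELECT DISTINCT e.category FROM evidence e "
        "JOIN gene g ON g.id = e.gene_id "
        "JOIN drug d ON d.id = e.drug_id "
        "WHERE g.symbol = ? AND d.name = ? ORDER BY e.category",
        (gene, drug),
    )
    return [row[0] for row in cur]


# Placeholder identifiers, not real PharmGKB entries.
conn = make_db()
add_evidence(conn, "GENE_A", "DRUG_X", "Genotype")
add_evidence(conn, "GENE_A", "DRUG_X", "Pharmacokinetics")
found = categories_for(conn, "GENE_A", "DRUG_X")
```

An unlabeled or mislabeled row is rejected by the constraint with `sqlite3.IntegrityError`, so the check does not depend on every client validating its input.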

Recent Developments

In an effort to build up a repository of drug-gene relationships, we have created the Community-Based Pharmacogenetic Information Project, which allows the scientific community to submit information about gene-drug relationships while specifying a Category of Evidence and citing a source for the information. We expect that by encouraging the community to submit pharmacogenomic associations that they deem important, we can capture relationships and solicit data sets that may have been overlooked. Users can also search through this repository to discover relationships between genes or drugs of interest, and direct research into those areas.

We have released an extensive XML schema that describes our object model and the relationships among the categories of evidence, and that encompasses fundamental objects such as genes, drugs, and chromosomes. We are committed to using and supporting open source projects, and we encourage other informatics groups to benefit from our code and from our knowledge representations.
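
To give a sense of how a gene-drug relationship with its category of evidence and cited source could be exchanged as XML, here is a small round-trip sketch using Python's standard `xml.etree.ElementTree` module. The element and attribute names (`relationship`, `gene`, `drug`, `citation`, `category`) are invented for illustration and are not taken from the released PharmGKB schema.

```python
import xml.etree.ElementTree as ET


def relationship_to_xml(gene, drug, category, citation):
    """Serialize one relationship using hypothetical element names."""
    root = ET.Element("relationship", {"category": category})
    ET.SubElement(root, "gene").text = gene
    ET.SubElement(root, "drug").text = drug
    ET.SubElement(root, "citation").text = citation
    return ET.tostring(root, encoding="unicode")


def relationship_from_xml(text):
    """Parse the XML produced above back into a plain dictionary."""
    root = ET.fromstring(text)
    return {
        "gene": root.findtext("gene"),
        "drug": root.findtext("drug"),
        "category": root.get("category"),
        "citation": root.findtext("citation"),
    }


# Placeholder values, not a real submission.
xml_text = relationship_to_xml("GENE_A", "DRUG_X", "Genotype", "placeholder citation")
record = relationship_from_xml(xml_text)
```

A real exchange would validate documents against the published schema rather than rely on matching element names by hand.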


PharmGKB is financially supported by grants from the National Institute of General Medical Sciences (NIGMS), the National Human Genome Research Institute (NHGRI), and the National Library of Medicine (NLM) within the National Institutes of Health (U01GM61374; Russ Altman, PI).


1. Evans,W.E. and Relling,M.V. (1999) Pharmacogenomics: translating functional genomics into rational therapeutics. Science, 286, 487-491.
2. Sugawara et al. (2004) Polymorphism Markup Language (PML) for the interoperability of data on SNPs and other sequence variations. Presented at the 15th International Conference on Genome Informatics, December 16-18, 2004, Yokohama Pacifico, Japan.
3. Rubin DL, Shafa F, Oliver DE, Hewett M, Altman RB. (2002) Representing genetic sequence data for pharmacogenomics: an evolutionary approach using ontological and relational database models. Bioinformatics, 18 Suppl 1, S207-15.
4. Hewett M, Oliver DE, Rubin DL, Easton KL, Stuart JM, Altman RB, Klein TE. (2002) PharmGKB: the Pharmacogenomics Knowledge Base. Nucleic Acids Res. 30, 163-5.
5. Thorn CF, Klein TE, Altman RB (2005) PharmGKB: the pharmacogenetics and pharmacogenomics knowledge base. Methods Mol Biol. 311:179-191.

Subcategory: Drugs and drug design
