BioCreative Virtual Issue

This Database virtual issue is themed around BioCreative: Critical Assessment of Information Extraction in Biology, an international community-wide effort to evaluate text mining and information extraction systems applied to the biological domain. The aim is to drive the development of practically relevant text mining systems that ease biologists' access to information and that provide tools which can be integrated into biocuration workflows and database search processes.
The following articles were subject to the journal’s normal peer review process, and are collected together here as a ‘virtual issue’.

BioCreative V

EXTRACT: interactive extraction of environment metadata and term suggestion for metagenomic sample annotation
Evangelos Pafilis, Pier Luigi Buttigieg, Barbra Ferrell, Emiliano Pereira, Julia Schnetzer, Christos Arvanitidis and Lars Juhl Jensen
Database Vol. 2016, baw005 doi:10.1093/database/baw005
FREE Full Text

Assessing the state of the art in biomedical relation extraction: overview of the BioCreative V chemical-disease relation (CDR) task
Chih-Hsuan Wei, Yifan Peng, Robert Leaman, Allan Peter Davis, Carolyn J. Mattingly, Jiao Li, Thomas C. Wiegers and Zhiyong Lu
Database Vol. 2016, baw032 doi:10.1093/database/baw032
FREE Full Text

CD-REST: a system for extracting chemical-induced disease relation in literature
Jun Xu, Yonghui Wu, Yaoyun Zhang, Jingqi Wang, Hee-Jin Lee and Hua Xu
Database Vol. 2016, baw036 doi:10.1093/database/baw036
FREE Full Text

Efficient chemical-disease identification and relationship extraction using Wikipedia to improve recall
Daniel M. Lowe, Noel M. O’Boyle and Roger A. Sayle
Database Vol. 2016, baw039 doi:10.1093/database/baw039
FREE Full Text

Chemical-induced disease relation extraction with various linguistic features
Jinghang Gu, Longhua Qian and Guodong Zhou
Database Vol. 2016, baw042 doi:10.1093/database/baw042
FREE Full Text

Extraction of chemical-induced diseases using prior knowledge and textual information
Ewoud Pons, Benedikt F.H. Becker, Saber A. Akhondi, Zubair Afzal, Erik M. van Mulligen and Jan A. Kors
Database Vol. 2016, baw046 doi:10.1093/database/baw046
FREE Full Text

Chemical named entity recognition in patents by domain knowledge and unsupervised feature learning
Yaoyun Zhang, Jun Xu, Hui Chen, Jingqi Wang, Yonghui Wu, Manu Prakasam and Hua Xu
Database Vol. 2016, baw049 doi:10.1093/database/baw049
FREE Full Text

Exploiting syntactic and semantics information for chemical–disease relation extraction
Huiwei Zhou, Huijie Deng, Long Chen, Yunlong Yang, Chen Jia and Degen Huang
Database Vol. 2016, baw048 doi:10.1093/database/baw048
FREE Full Text

A crowdsourcing workflow for extracting chemical-induced disease relations from free text
Tong Shu Li, Àlex Bravo, Laura I. Furlong, Benjamin M. Good and Andrew I. Su
Database Vol. 2016, baw051 doi:10.1093/database/baw051
FREE Full Text

Chemical entity recognition in patents by combining dictionary-based and statistical approaches
Saber A. Akhondi, Ewoud Pons, Zubair Afzal, Herman van Haagen, Benedikt F.H. Becker, Kristina M. Hettne, Erik M. van Mulligen and Jan A. Kors
Database Vol. 2016, baw061 doi:10.1093/database/baw061
FREE Full Text

BioCreative V CDR task corpus: a resource for chemical disease relation extraction
Jiao Li, Yueping Sun, Robin J. Johnson, Daniela Sciaky, Chih-Hsuan Wei, Robert Leaman, Allan Peter Davis, Carolyn J. Mattingly, Thomas C. Wiegers and Zhiyong Lu
Database Vol. 2016, baw068 doi:10.1093/database/baw068
FREE Full Text

BelSmile: a biomedical semantic role labeling approach for extracting biological expression language from text
Po-Ting Lai, Yu-Yan Lo, Ming-Siang Huang, Yu-Cheng Hsiao, and Richard Tzong-Han Tsai
Database Vol. 2016, baw064 doi:10.1093/database/baw064
FREE Full Text

BioC-compatible full-text passage detection for protein–protein interactions using extended dependency graph
Yifan Peng, Cecilia Arighi, Cathy H. Wu, and K. Vijay-Shanker
Database Vol. 2016, baw072 doi:10.1093/database/baw072
FREE Full Text

BELTracker: evidence sentence retrieval for BEL statements
Majid Rastegar-Mojarad, Ravikumar Komandur Elayavilli, and Hongfang Liu
Database Vol. 2016, baw079 doi:10.1093/database/baw079
FREE Full Text

Argo: enabling the development of bespoke workflows and services for disease annotation
Riza Batista-Navarro, Jacob Carter, and Sophia Ananiadou
Database Vol. 2016, baw066 doi:10.1093/database/baw066
FREE Full Text

Mining chemical patents with an ensemble of open systems
Robert Leaman, Chih-Hsuan Wei, Cherry Zou, and Zhiyong Lu
Database Vol. 2016, baw065 doi:10.1093/database/baw065
FREE Full Text

A knowledge-poor approach to chemical-disease relation extraction
Firoj Alam, Anna Corazza, Alberto Lavelli, and Roberto Zanoli
Database Vol. 2016, baw071 doi:10.1093/database/baw071
FREE Full Text

MET network in PubMed: a text-mined network visualization and curation system
Hong-Jie Dai, Chu-Hsien Su, Po-Ting Lai, Ming-Siang Huang, Jitendra Jonnagaddala, Toni Rose Jue, Shruti Rao, Hui-Jou Chou, Marija Milacic, Onkar Singh, Shabbir Syed-Abdul, and Wen-Lian Hsu
Database Vol. 2016, baw090 doi:10.1093/database/baw090
FREE Full Text

HITSZ_CDR: an end-to-end chemical and disease relation extraction system for BioCreative V
Haodi Li, Buzhou Tang, Qingcai Chen, Kai Chen, Xiaolong Wang, Baohua Wang, and Zhe Wang
Database Vol. 2016, baw077 doi:10.1093/database/baw077
FREE Full Text

AuDis: an automatic CRF-enhanced disease normalization in biomedical text
Hsin-Chun Lee, Yi-Yu Hsu, and Hung-Yu Kao
Database Vol. 2016, baw091 doi:10.1093/database/baw091
FREE Full Text

Mining clinical attributes of genomic variants through assisted literature curation in Egas
Sérgio Matos, David Campos, Renato Pinho, Raquel M. Silva, Matthew Mort, David N. Cooper, and José Luís Oliveira
Database Vol. 2016, baw096 doi:10.1093/database/baw096
FREE Full Text

Combining machine learning, crowdsourcing and expert knowledge to detect chemical-induced diseases in text
Àlex Bravo, Tong Shu Li, Andrew I. Su, Benjamin M. Good, and Laura I. Furlong
Database Vol. 2016, baw094 doi:10.1093/database/baw094
FREE Full Text

BioCreative V track 4: a shared task for the extraction of causal network information using the Biological Expression Language
Fabio Rinaldi, Tilia Renate Ellendorff, Sumit Madan, Simon Clematide, Adrian van der Lek, Theo Mevissen, and Juliane Fluck
Database Vol. 2016, baw067 doi:10.1093/database/baw067
FREE Full Text

Coreference resolution improves extraction of Biological Expression Language statements from texts
Miji Choi, Haibin Liu, William Baumgartner, Justin Zobel, and Karin Verspoor
Database Vol. 2016, baw076 doi:10.1093/database/baw076
FREE Full Text

Crowdsourcing and curation: perspectives from biology and natural language processing
Lynette Hirschman, Karën Fort, Stéphanie Boué, Nikos Kyrpides, Rezarta Islamaj Doğan, and Kevin Bretonnel Cohen
Database Vol. 2016, baw115 doi:10.1093/database/baw115
FREE Full Text

BioCreative 2014

EDITORIAL: BioCreative-IV virtual issue
Cecilia N. Arighi, Cathy H. Wu, Kevin B. Cohen, Lynette Hirschman, Martin Krallinger, Alfonso Valencia, Zhiyong Lu, John W. Wilbur, and Thomas C. Wiegers
Database 2014: bau039 doi:10.1093/database/bau039
FREE Full Text

A robust data-driven approach for gene ontology annotation
Yanpeng Li and Hong Yu
Database 2014: bau113 doi:10.1093/database/bau113
FREE Full Text

Unsupervised gene function extraction using semantic vectors
Ehsan Emadzadeh, Azadeh Nikfarjam, Rachel E. Ginn, and Graciela Gonzalez
Database 2014: bau084 doi:10.1093/database/bau084
FREE Full Text

Integrating information retrieval with distant supervision for Gene Ontology annotation
Dongqing Zhu, Dingcheng Li, Ben Carterette, and Hongfang Liu
Database 2014: bau087 doi:10.1093/database/bau087
FREE Full Text

RLIMS-P: an online text-mining tool for literature-based extraction of protein phosphorylation information
Manabu Torii, Gang Li, Zhiwen Li, Rose Oughtred, Francesca Diella, Irem Çelen, Cecilia N. Arighi, Hongzhan Huang, K. Vijay-Shanker, and Cathy H. Wu
Database 2014: bau081 doi:10.1093/database/bau081
FREE Full Text

Overview of the gene ontology task at BioCreative IV
Yuqing Mao, Kimberly Van Auken, Donghui Li, Cecilia N. Arighi, Peter McQuilton, G. Thomas Hayman, Susan Tweedie, Mary L. Schaeffer, Stanley J. F. Laulederkind, Shur-Jen Wang, Julien Gobeill, Patrick Ruch, Anh Tuan Luu, Jung-jae Kim, Jung-Hsien Chiang, Yu-De Chen, Chia-Jung Yang, Hongfang Liu, Dongqing Zhu, Yanpeng Li, Hong Yu, Ehsan Emadzadeh, Graciela Gonzalez, Jian-Ming Chen, Hong-Jie Dai, and Zhiyong Lu
Database 2014: bau086 doi:10.1093/database/bau086
FREE Full Text

Closing the loop: from paper to protein annotation using supervised Gene Ontology classification
Julien Gobeill, Emilie Pasche, Dina Vishnyakova, and Patrick Ruch
Database 2014: bau088 doi:10.1093/database/bau088
FREE Full Text

LiverCancerMarkerRIF: a liver cancer biomarker interactive curation system combining text mining and expert annotations
Hong-Jie Dai, Johnny Chi-Yang Wu, Wei-San Lin, Aaron James F. Reyes, Mira Anne C. dela Rosa, Shabbir Syed-Abdul, Richard Tzong-Han Tsai, and Wen-Lian Hsu
Database 2014: bau085 doi:10.1093/database/bau085
FREE Full Text

tmBioC: improving interoperability of text-mining tools with BioC
Ritu Khare, Chih-Hsuan Wei, Yuqing Mao, Robert Leaman, and Zhiyong Lu
Database 2014: bau073 doi:10.1093/database/bau073
FREE Full Text

BC4GO: a full-text corpus for the BioCreative IV GO task
Kimberly Van Auken, Mary L. Schaeffer, Peter McQuilton, Stanley J. F. Laulederkind, Donghui Li, Shur-Jen Wang, G. Thomas Hayman, Susan Tweedie, Cecilia N. Arighi, James Done, Hans-Michael Müller, Paul W. Sternberg, Yuqing Mao, Chih-Hsuan Wei, and Zhiyong Lu
Database 2014: bau074 doi:10.1093/database/bau074
FREE Full Text

Assisting manual literature curation for protein–protein interactions using BioQRator
Dongseop Kwon, Sun Kim, Soo-Yong Shin, Andrew Chatr-aryamontri, and W. John Wilbur
Database 2014: bau067 doi:10.1093/database/bau067
FREE Full Text

Web services-based text-mining demonstrates broad impacts for interoperability and process simplification
Thomas C. Wiegers, Allan Peter Davis, and Carolyn J. Mattingly
Database 2014: bau050 doi:10.1093/database/bau050
FREE Full Text

BioC interoperability track overview
Donald C. Comeau, Riza Theresa Batista-Navarro, Hong-Jie Dai, Rezarta Islamaj Doğan, Antonio Jimeno Yepes, Ritu Khare, Zhiyong Lu, Hernani Marques, Carolyn J. Mattingly, Mariana Neves, Yifan Peng, Rafal Rak, Fabio Rinaldi, Richard Tzong-Han Tsai, Karin Verspoor, Thomas C. Wiegers, Cathy H. Wu, and W. John Wilbur
Database 2014: bau053 doi:10.1093/database/bau053
FREE Full Text

Assisted curation of regulatory interactions and growth conditions of OxyR in E. coli K-12
Socorro Gama-Castro, Fabio Rinaldi, Alejandra López-Fuentes, Yalbi Itzel Balderas-Martínez, Simon Clematide, Tilia Renate Ellendorff, Alberto Santos-Zavaleta, Hernani Marques-Madeira, and Julio Collado-Vides
Database 2014: bau049 doi:10.1093/database/bau049
FREE Full Text

Processing biological literature with customizable Web services supporting interoperable formats
Rafal Rak, Riza Theresa Batista-Navarro, Jacob Carter, Andrew Rowley, and Sophia Ananiadou
Database 2014: bau064 doi:10.1093/database/bau064
FREE Full Text

Text-mining-assisted biocuration workflows in Argo
Rafal Rak, Riza Theresa Batista-Navarro, Andrew Rowley, Jacob Carter, and Sophia Ananiadou
Database 2014: bau070 doi:10.1093/database/bau070
FREE Full Text

BioC implementations in Go, Perl, Python and Ruby
Wanli Liu, Rezarta Islamaj Doğan, Dongseop Kwon, Hernani Marques, Fabio Rinaldi, W. John Wilbur, and Donald C. Comeau
Database 2014: bau059 doi:10.1093/database/bau059
FREE Full Text

Natural language processing pipelines to annotate BioC collections with an application to the NCBI disease corpus
Donald C. Comeau, Haibin Liu, Rezarta Islamaj Doğan, and W. John Wilbur
Database 2014: bau056 doi:10.1093/database/bau056
FREE Full Text

Egas: a collaborative and interactive document curation platform
David Campos, Jóni Lourenço, Sérgio Matos, and José Luís Oliveira
Database 2014: bau048 doi:10.1093/database/bau048
FREE Full Text

Finding abbreviations in biomedical literature: three BioC-compatible modules and four BioC-formatted corpora
Rezarta Islamaj Doğan, Donald C. Comeau, Lana Yeganova, and W. John Wilbur
Database 2014: bau044 doi:10.1093/database/bau044
FREE Full Text

tagtog: interactive and text-mining-assisted annotation of gene mentions in PLOS full-text articles
Juan Miguel Cejuela, Peter McQuilton, Laura Ponting, Steven J. Marygold, Raymund Stefancsik, Gillian H. Millburn, Burkhard Rost, and the FlyBase Consortium
Database 2014: bau033 doi:10.1093/database/bau033
FREE Full Text

iSimp in BioC standard format: enhancing the interoperability of a sentence simplification system
Yifan Peng, Catalina O. Tudor, Manabu Torii, Cathy H. Wu and K. Vijay-Shanker
Database 2014: bau038 doi:10.1093/database/bau038
FREE Full Text

BioC: a minimalist approach to interoperability for biomedical text processing
Donald C. Comeau, Rezarta Islamaj Doğan, Paolo Ciccarese, Kevin Bretonnel Cohen, Martin Krallinger, Florian Leitner, Zhiyong Lu, Yifan Peng, Fabio Rinaldi, Manabu Torii, Alfonso Valencia, Karin Verspoor, Thomas C. Wiegers, Cathy H. Wu, and W. John Wilbur
Database 2013: bat064 doi:10.1093/database/bat064
FREE Full Text

BioCreative 2012


BioCreative - 2012 Virtual Issue
Cathy H. Wu, Cecilia N. Arighi, Kevin B. Cohen, Lynette Hirschman, Martin Krallinger, Zhiyong Lu, Carolyn Mattingly, Alfonso Valencia, Thomas C. Wiegers, and W. John Wilbur
Database Vol. 2012, bas049; doi:10.1093/database/bas049
FREE Full Text


Collaborative biocuration—text-mining development task for document prioritization for curation
Thomas C. Wiegers, Allan Peter Davis, and Carolyn J. Mattingly
Database Vol. 2012, bas037; doi:10.1093/database/bas037
FREE Full Text

Prioritizing PubMed articles for the Comparative Toxicogenomic Database utilizing semantic information
Sun Kim, Won Kim, Chih-Hsuan Wei, Zhiyong Lu, and W. John Wilbur
Database Vol. 2012, bas042; doi:10.1093/database/bas042
FREE Full Text

Using the OntoGene pipeline for the triage task of BioCreative 2012
Fabio Rinaldi, Simon Clematide, Simon Hafner, Gerold Schneider, Gintarė Grigonytė, Martin Romacker, and Therese Vachon
Database Vol. 2013, bas053; doi:10.1093/database/bas053
FREE Full Text

Using binary classification to prioritize and curate articles for the Comparative Toxicogenomics Database
Dina Vishnyakova, Emilie Pasche, and Patrick Ruch
Database Vol. 2012, bas050; doi:10.1093/database/bas050
FREE Full Text


Biocuration workflows and text mining: overview of the BioCreative 2012 Workshop Track II
Zhiyong Lu and Lynette Hirschman
Database Vol. 2012, bas043; doi:10.1093/database/bas043
FREE Full Text

Text mining in the biocuration workflow: applications for literature curation at WormBase, dictyBase and TAIR
Kimberly Van Auken, Petra Fey, Tanya Z. Berardini, Robert Dodson, Laurel Cooper, Donghui Li, Juancarlos Chan, Yuling Li, Siddhartha Basu, Hans-Michael Muller, Rex Chisholm, Eva Huala, Paul W. Sternberg and the WormBase Consortium
Database Vol. 2012, bas040; doi:10.1093/database/bas040
FREE Full Text

Building an efficient curation workflow for the Arabidopsis literature corpus
Donghui Li, Tanya Z. Berardini, Robert J. Muller, and Eva Huala
Database Vol. 2012, bas047; doi:10.1093/database/bas047
FREE Full Text

Manual Gene Ontology annotation workflow at the Mouse Genome Informatics Database
Harold J. Drabkin, Judith A. Blake, for the Mouse Genome Informatics Database
Database Vol. 2012, bas045; doi:10.1093/database/bas045
FREE Full Text

The Xenbase literature curation process
Jeff B. Bowes, Kevin A. Snyder, Christina James-Zorn, Virgilio G. Ponferrada, Chris J. Jarabek, Kevin A. Burns, Bishnu Bhattacharyya, Aaron M. Zorn, and Peter D. Vize
Database Vol. 2013, bas046; doi:10.1093/database/bas046
FREE Full Text

Opportunities for text mining in the FlyBase genetic literature curation workflow
Peter McQuilton and the FlyBase Consortium
Database Vol. 2012, bas039; doi:10.1093/database/bas039
FREE Full Text

Developing a biocuration workflow for AgBase, a non-model organism database
Lakshmi Pillai, Philippe Chouvarine, Catalina O. Tudor, Carl J. Schmidt, K. Vijay-Shanker, and Fiona M. McCarthy
Database Vol. 2012, bas038; doi:10.1093/database/bas038
FREE Full Text


An overview of the BioCreative 2012 Workshop Track III: interactive text mining task
Cecilia N. Arighi, Ben Carterette, K. Bretonnel Cohen, Martin Krallinger, W. John Wilbur, Petra Fey, Robert Dodson, Laurel Cooper, Ceri E. Van Slyke, Wasila Dahdul, Paula Mabee, Donghui Li, Bethany Harris, Marc Gillespie, Silvia Jimenez, Phoebe Roberts, Lisa Matthews, Kevin Becker, Harold Drabkin, Susan Bello, Luana Licata, Andrew Chatr-aryamontri, Mary L. Schaeffer, Julie Park, Melissa Haendel, Kimberly Van Auken, Yuling Li, Juancarlos Chan, Hans-Michael Muller, Hong Cui, James P. Balhoff, Johnny Chi-Yang Wu, Zhiyong Lu, Chih-Hsuan Wei, Catalina O. Tudor, Kalpana Raja, Suresh Subramani, Jeyakumar Natarajan, Juan Miguel Cejuela, Pratibha Dubey, and Cathy Wu
Database Vol. 2013, bas056; doi:10.1093/database/bas056
FREE Full Text

PPInterFinder — a mining tool for extracting causal relations on human proteins from literature
Kalpana Raja, Suresh Subramani, and Jeyakumar Natarajan
Database Vol. 2013, bas052; doi:10.1093/database/bas052
FREE Full Text

The eFIP system for text mining of protein interaction networks of phosphorylated proteins
Catalina O. Tudor, Cecilia N. Arighi, Qinghua Wang, Cathy H. Wu, and K. Vijay-Shanker
Database Vol. 2012, bas044; doi:10.1093/database/bas044
FREE Full Text

T-HOD: a literature-based candidate gene database for hypertension, obesity and diabetes
Hong-Jie Dai, Johnny Chi-Yang Wu, Richard Tzong-Han Tsai, Wen-Harn Pan, and Wen-Lian Hsu
Database Vol. 2013, bas061; doi:10.1093/database/bas061
FREE Full Text

Accelerating literature curation with text-mining tools: a case study of using PubTator to curate genes in PubMed abstracts
Chih-Hsuan Wei, Bethany R. Harris, Donghui Li, Tanya Z. Berardini, Eva Huala, Hung-Yu Kao, and Zhiyong Lu
Database Vol. 2012, bas041; doi:10.1093/database/bas041
FREE Full Text