
Zhiyong Lu, PhD


Research Interests

Dr. Lu’s research group develops data-driven computational methods and software tools, using machine learning and natural language processing (NLP), to analyze and make sense of unstructured text, images, and associated data in biomedicine, with the goal of accelerating discovery and improving health. Over the years, the group has developed many novel NLP algorithms and open-source software tools (e.g., PubTator) that have not only been top performers in community-wide challenges but are also widely used in real-world applications. For example, several of his recent research results have been deployed in PubMed, which is searched daily by millions of people. His recent publications and talks focus on the following topics:

  • Literature/PubMed Search (e.g., relevance search; author name disambiguation)
  • NLP & Text Mining (e.g., entity recognition and information extraction)
  • Curation at Scale (e.g., automated approaches for speeding up data curation)
  • Machine Learning for Healthcare (e.g., deep learning; medical text/image analysis)

Dr. Lu is a Senior Investigator and NCBI’s Deputy Director for Literature Search at the National Library of Medicine (NLM). He is a Fellow of the American College of Medical Informatics (ACMI); an Associate Editor for Bioinformatics (OUP), Artificial Intelligence in Medicine (Elsevier), BMC Bioinformatics (Springer), and Journal of Healthcare Informatics Research (Springer); and a member of the Editorial Board of the journal Database (OUP). He is an organizer of BioCreative, founding chair of the ISMB Text Mining Community of Special Interest, and a founding member of the ACL Special Interest Group on Biomedical NLP. In 2011, he was selected by the NIH as its first Earl Stadtman Investigator in Computational Biology and Bioinformatics. Dr. Lu has trained more than 40 postdoctoral fellows, students, and interns, and has authored or co-authored more than 180 peer-reviewed scientific publications with over 12,000 citations. He is a Highly Cited Researcher according to Web of Science.

Publications

Dai S, You R, Lu Z, Huang X, Mamitsuka H, Zhu S. FullMeSH: improving large-scale MeSH indexing with full text. Bioinformatics. 2020 Mar 1;36(5):1533-1541. doi: 10.1093/bioinformatics/btz756. PubMed PMID: 31596475; PubMed Central PMCID: PMC7523651.

Sayers EW, Beck J, Brister JR, Bolton EE, Canese K, Comeau DC, Funk K, Ketter A, Kim S, Kimchi A, Kitts PA, Kuznetsov A, Lathrop S, Lu Z, McGarvey K, Madden TL, Murphy TD, O'Leary N, Phan L, Schneider VA, Thibaud-Nissen F, Trawick BW, Pruitt KD, Ostell J. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2020 Jan 8;48(D1):D9-D16. doi: 10.1093/nar/gkz899. PubMed PMID: 31602479; PubMed Central PMCID: PMC6943063.

Keenan TD, Dharssi S, Peng Y, Chen Q, Agrón E, Wong WT, Lu Z, Chew EY. A Deep Learning Approach for Automated Detection of Geographic Atrophy from Color Fundus Photographs. Ophthalmology. 2019 Nov;126(11):1533-1540. doi: 10.1016/j.ophtha.2019.06.005. Epub 2019 Jun 11. PubMed PMID: 31358385; PubMed Central PMCID: PMC6810830.

Du J, Chen Q, Peng Y, Xiang Y, Tao C, Lu Z. ML-Net: multi-label classification of biomedical texts with deep neural networks. J Am Med Inform Assoc. 2019 Nov 1;26(11):1279-1285. doi: 10.1093/jamia/ocz085. PubMed PMID: 31233120; PubMed Central PMCID: PMC7647240.