CGR Impact Spotlight

Research Summary

Publication (PMID: 36124807): Bartoszewicz JM, Nasri F, Nowicka M, Renard BY. Detecting DNA of novel fungal pathogens using ResNets and a curated fungi-hosts data collection. Bioinformatics. 2022 Sep 16;38(Suppl_2):ii168-ii174. doi: 10.1093/bioinformatics/btac495.

Topic: Bioinformatics resource development

Researchers created a curated database of fungal host-range data linked to publicly available genomes. Using custom neural networks trained on these data, they showed that this combination of genomic and host information can be used to predict pathogenicity using both sequence homology and deep-learning approaches.

The researchers’ database contains over 1,600 genomes linked to host and disease phenotype metadata drawn from multiple existing databases. They found that their neural networks could accurately detect fungal pathogens in next-generation sequencing (NGS) datasets. The trained models predicted pathogenicity and whether a fungus infects humans or other hosts. They also developed models capable of identifying novel fungal, viral, and bacterial pathogens in human-derived samples.

Potential CGR Impact on Research

The following are examples of how CGR resources and capabilities could impact this study.

NCBI Datasets: Researchers could use NCBI Datasets to automatically produce tabular metadata, limited to the relevant fields, for a large list of taxa. Datasets reports by taxon or genome would make it easy to add information about the assembly quality or status of the included genomes, which could then be used to assess the effect of genome quality on results. The same reports and assembled genomes could help researchers understand taxonomic bias or gaps in the genomes available for a taxonomic group. Researchers could pull target genomes and metadata programmatically using the command-line tools or the API. These data pulls could be run at regular intervals and filtered by specified characteristics, for example to capture newly added data.
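
As a rough illustration of the tabular-metadata idea above, the Python sketch below flattens genome report records into a TSV that keeps only selected fields. It assumes the records arrive as JSON Lines, for example from the Datasets command-line tool (something like `datasets summary genome taxon <taxon> --as-json-lines`). The field paths in `FIELDS` reflect our understanding of the genome report layout and should be checked against the current documentation. The helper names and the sample records are hypothetical placeholders, not real assemblies and not part of the study.

```python
import csv
import io
import json

# Dotted paths into each genome report record.
# Assumption: these match the Datasets genome report layout; verify before use.
FIELDS = [
    "accession",
    "organism.organism_name",
    "organism.tax_id",
    "assembly_info.assembly_level",
    "assembly_stats.contig_n50",
]


def get_path(record, dotted):
    """Walk a dotted path through nested dicts; return '' if a key is missing."""
    value = record
    for key in dotted.split("."):
        if not isinstance(value, dict) or key not in value:
            return ""
        value = value[key]
    return value


def reports_to_tsv(json_lines, fields=FIELDS):
    """Convert JSON Lines text into a TSV string restricted to `fields`."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(fields)
    for line in json_lines.splitlines():
        if line.strip():  # skip blank lines
            record = json.loads(line)
            writer.writerow([get_path(record, f) for f in fields])
    return out.getvalue()


# Placeholder records for illustration only (hypothetical accessions and values).
SAMPLE = "\n".join([
    json.dumps({
        "accession": "GCA_000000001.1",
        "organism": {"organism_name": "Example fungus A", "tax_id": 1},
        "assembly_info": {"assembly_level": "Scaffold"},
        "assembly_stats": {"contig_n50": 50000},
    }),
    json.dumps({
        "accession": "GCA_000000002.1",
        "organism": {"organism_name": "Example fungus B", "tax_id": 2},
        "assembly_info": {"assembly_level": "Contig"},
        # contig_n50 deliberately missing to show the empty-cell behavior
    }),
])

if __name__ == "__main__":
    print(reports_to_tsv(SAMPLE), end="")
```

Keeping the field list in one place makes it straightforward to add assembly-quality columns later and to rerun the same conversion on each scheduled data pull. Missing values become empty cells rather than errors, so incomplete records remain visible in the table.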
