Research Summary. Publication: Bartoszewicz JM, Nasri F, Nowicka M, Renard BY. Detecting DNA of novel fungal pathogens using ResNets and a curated fungi-hosts data collection. Bioinformatics. 2022 Sep 16;38(Suppl_2):ii168-ii174. doi: 10.1093/bioinformatics/btac495. PMID: 36124807. Topic: Bioinformatics resource development. Researchers created a curated database of fungal host-range data linked to publicly available genomes. Using this data, they tested whether the combination of genomic and host information can be used to predict pathogenicity, using both sequence-homology and deep-learning (neural network) approaches. The database contained over 1,400 genomes linked to host and disease phenotype metadata from multiple existing databases. The researchers found that their neural networks could accurately detect fungal pathogens in next-generation sequencing (NGS) datasets. The trained models predicted pathogenicity and whether a fungus infects humans or other hosts. The researchers also developed models with separate classifiers for fungal, viral, and bacterial pathogens. |
Potential CGR Impact on Research. The following are examples of how CGR resources and capabilities could impact this study. NCBI Datasets: Researchers could use NCBI Datasets to automatically produce tabular metadata, restricted to the relevant fields, for a large list of taxa. NCBI Datasets reports by taxon or genome would let the researchers easily add information about the assembly quality or status of the included genomes, which could then be used to assess the effect of genome quality on results. The researchers could also use NCBI Datasets reports and assembled genomes to better understand taxonomic bias or gaps in the genomes available for a taxonomic group. Researchers could programmatically pull target genomes and metadata through an API connection. Data pulls could easily be run at regular intervals and filtered to specified characteristics, so that newly added data are picked up. |
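
The programmatic metadata pull described above might look roughly like the following Python sketch. It is an illustration only, not the workflow used in the publication. It assumes the NCBI Datasets v2 REST API exposes genome reports at `/genome/taxon/{taxon}/dataset_report` and that each report carries `accession`, `organism`, and `assembly_info` fields; the endpoint path, query parameters, and field names should be checked against the current NCBI Datasets documentation. The helper names (`build_report_url`, `fetch_reports`, `flatten_report`, `reports_to_tsv`) and the selected columns are hypothetical.

```python
import csv
import io
import json
import urllib.parse
import urllib.request

# Assumption: base URL of the NCBI Datasets v2 REST API (verify in the docs).
BASE_URL = "https://api.ncbi.nlm.nih.gov/datasets/v2"

# Hypothetical choice of "relevant fields" to keep in the table.
FIELDS = ["accession", "organism_name", "tax_id", "assembly_level", "release_date"]


def build_report_url(taxon, page_size=100, page_token=None):
    """Build the URL for one page of genome reports for a taxon name or ID."""
    path = f"/genome/taxon/{urllib.parse.quote(str(taxon))}/dataset_report"
    params = {"page_size": page_size}
    if page_token:
        params["page_token"] = page_token
    return BASE_URL + path + "?" + urllib.parse.urlencode(params)


def fetch_reports(taxon, page_size=100):
    """Yield report dicts for a taxon, following pagination tokens.

    Assumption: the response is JSON with a "reports" list and an
    optional "next_page_token" string.
    """
    token = None
    while True:
        url = build_report_url(taxon, page_size, token)
        with urllib.request.urlopen(url) as resp:
            payload = json.load(resp)
        yield from payload.get("reports", [])
        token = payload.get("next_page_token")
        if not token:
            break


def flatten_report(report):
    """Reduce one nested report to the flat columns listed in FIELDS."""
    organism = report.get("organism", {})
    info = report.get("assembly_info", {})
    return {
        "accession": report.get("accession", ""),
        "organism_name": organism.get("organism_name", ""),
        "tax_id": organism.get("tax_id", ""),
        "assembly_level": info.get("assembly_level", ""),
        "release_date": info.get("release_date", ""),
    }


def reports_to_tsv(reports):
    """Render an iterable of reports as tab-separated text with a header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for report in reports:
        writer.writerow(flatten_report(report))
    return buf.getvalue()


if __name__ == "__main__":
    # Network call; "Fungi" is used here only as an example taxon.
    print(reports_to_tsv(fetch_reports("Fungi")))
```

Running such a script on a schedule (for example with cron) and comparing the `release_date` column against the date of the previous run would be one way to pick up newly added genomes. The `assembly_level` column is the kind of field that could feed an analysis of how genome quality affects results.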