NLM Funds Study to Forecast Long-Term Costs of Data
October 3, 2018
The National Library of Medicine (NLM) has teamed up with the National Academies of Sciences, Engineering, and Medicine (NASEM) to conduct a study on forecasting the long-term costs for preserving, archiving, and promoting access to biomedical data.
The study is being conducted as part of the NLM’s efforts to develop a sustainable data ecosystem, as outlined in both the NLM Strategic Plan and the NIH Strategic Plan for Data Science. Such an ecosystem is possible because the products and processes of research are now digital by default, and increasingly sophisticated and powerful computation can now be brought to data, rendering meaning that had previously been hidden. Across the biomedical sciences, decisions must be made about where in this ecosystem to invest limited resources to maximize the value of the data for scientific progress; strategies are needed to address questions such as: What is the future value of research data? For how long must a dataset be preserved before it should be reviewed for long-term archiving? And what are the resources necessary to support long-term data storage?
“The development of innovative models and frameworks to address these fundamental questions could transform how we plan for the management of biomedical information resources,” said Patricia Flatley Brennan, Director of the National Library of Medicine. “It is essential to bring new ideas, such as econometric approaches, and infuse fresh perspectives into how decisions are made about the preservation and archiving of biomedical data.”
For this study, NASEM will appoint an ad hoc committee to develop a framework for forecasting these costs and estimating potential benefits to research. The committee will examine and evaluate:
- Economic factors to be considered when examining the life-cycle cost for data sets (e.g., data acquisition, preservation, and dissemination);
- Cost consequences for various practices in accessioning and de-accessioning data sets;
- Economic factors to be considered in designating data sets as high value;
- Assumptions built into the data collection and/or modeling processes;
- Anticipated technological disruptors and future developments in data science over a 5- to 10-year horizon; and
- Critical factors for successful adoption of data forecasting approaches by research and program management staff.
The committee will provide a consensus report and two case studies illustrating the framework’s application to different biomedical contexts relevant to NLM’s data resources. Relevant life-cycle costs will be delineated, as will any assumptions underlying the models. To the extent practicable, NASEM will identify strategies to communicate results and gain acceptance of the applicability of these models.
As highlighted in a recent blog post, NASEM will host a two-day public workshop in late June 2019 to generate ideas and approaches for the committee to consider. Further details on the workshop and public participation will be made available in the coming months.
The NLM is supporting NASEM’s efforts to solicit names of committee members, as well as topics for the committee to consider. Suggestions should be sent to Michelle Schwalbe, Director of NASEM’s Board on Mathematical Sciences and Analytics, or Elizabeth Kittrie, NLM Senior Planning and Evaluation Officer.
The National Library of Medicine (NLM) is a leader in research in biomedical informatics and data science and the world’s largest biomedical library. NLM conducts and supports research in methods for recording, storing, retrieving, preserving, and communicating health information. NLM creates resources and tools that are used billions of times each year by millions of people to access and analyze molecular biology, biotechnology, toxicology, environmental health, and health services information. Additional information is available at https://www.nlm.nih.gov.
Last Reviewed: October 3, 2018