NLM Releases Extensible Markup Language (XML) for IndexCat™ Data
Data Includes More than 3.7 Million Bibliographic Items Spanning Five Centuries
The National Library of Medicine, the world's largest medical library and a component of the National Institutes of Health, announces that Extensible Markup Language (XML) data from the IndexCat™ database is now available for free download.
Released with a Document Type Definition (DTD) that allows researchers to validate the data, this new XML release includes the digitized content of more than 3.7 million bibliographic items from the printed, 61-volume Index-Catalogue of the Library of the Surgeon-General's Office, originally published from 1880 to 1961. The XML describes items spanning five centuries, including millions of journal and newspaper articles, obituaries, and letters, hundreds of thousands of monographs and dissertations, and thousands of portraits. Together, these items cover a wide range of subjects such as the basic sciences, scientific research, civilian and military medicine, public health, and hospital administration.
The NLM release of the Index-Catalogue in XML format opens this key resource in the history of medicine and science to new uses and users. The Index-Catalogue is one of the monuments of the Library's longstanding, systematic indexing of the medical literature, an effort that William Henry Welch (1850-1934), the great pathologist and bibliophile, considered "America's greatest contribution to medical knowledge." This indexing, begun by John Shaw Billings in the nineteenth century at the Library of the Surgeon-General's Office, United States Army (known today as the NLM), eventually produced two distinct products: the Index-Catalogue of the Library of the Surgeon-General's Office, United States Army, and the Index Medicus, forerunner of MEDLINE®, which is now the largest component of PubMed®.
Released alongside the Index-Catalogue XML are an integrated XML file and associated DTD for two collections: the electronic database of A Catalogue of Incipits of Mediaeval Scientific Writings in Latin (rev.), by Lynn Thorndike and Pearl Kibre (eTK), and the updated and expanded version of Scientific and Medical Writings in Old and Middle English: An Electronic Reference (eVK2), edited by Linda Ehrsam Voigts and Patricia Deery Kurtz. Also available through the online IndexCat, these resources comprise over 42,000 records of incipits, the opening words of a medieval manuscript or early printed book. They cover medical and scientific writings on topics as diverse as astronomy, astrology, geometry, agriculture, household skills, book production, occult science, natural science, and mathematics, disciplines that were largely intermingled in the medieval period of European history.
The XML release of these resources joins many other freely downloadable NLM resources, including the XML for MEDLINE®/PubMed® data, which contains over 22 million references to biomedical and life sciences journal articles dating back to 1946 and, for some journals, much earlier.
The release also coincides with the NLM's participation in "Shared Horizons: Data, Biomedicine, and the Digital Humanities," an interdisciplinary symposium exploring the intersection of digital humanities and biomedicine. To be held April 10-12, 2013, in partnership with the National Endowment for the Humanities' Office of Digital Humanities, the Maryland Institute for Technology in the Humanities at the University of Maryland, and Research Councils UK, Shared Horizons will foster disciplinary cross-fertilization through a mix of formal and informal presentations and breakout sessions designed to promote a rich exchange of ideas about how large-scale quantitative methods can lead to new understandings of human culture. Bringing together researchers from the digital humanities and bioinformatics communities, the symposium will explore how the two communities might collaborate on projects that bridge the humanities and medicine, focusing on sequence alignment and network analysis, two modes of analysis that intersect with "big data."
All Shared Horizons sessions will be live-streamed, with a monitored back channel where the public can post comments and tweets. Recordings of all talks will also be posted to the Shared Horizons website, where visitors can comment before and after the event.
Inquiries about the new XML datasets for the Index-Catalogue and eTK/eVK2 may be directed to NLM Customer Service.