Mohammad Al-Ubaydli’s blog

The NCBI Bookshelf

Posted in Articles, My publications, People / organisations, Technology by Dr Mohammad Al-Ubaydli on February 1, 2003

I arrived at my first tutorial at medical school without a battery for my hearing aid. My eyes told me that the session involved a very detailed discussion, but my ears only caught the words “Stryer Biochemistry”. As the class members shuffled out at the end of the hour I managed to prod one of them into explaining that Stryer was the author of the textbook Biochemistry, and that the lecturer thought it a good idea for me to get a copy. I did get a copy, but for the rest of the year I did not agree that the idea was good.

A few years and one barely passed biochemistry exam later, I am working as part of the BookShelf team at the National Center for Biotechnology Information (NCBI). The latest book to be added to the BookShelf web site is Biochemistry, and I am beginning to realize why the book was recommended.

It is too late to reverse my grades, but it is with renewed pleasure that I get to read two favourite textbooks of mine – Janeway et al’s Immunobiology, and Alberts et al’s Molecular Biology of the Cell. I can also discover Surgical Treatment, wallow in the detail of Medical Microbiology, and be thankful for the overviews of Genes and Disease.

These are but a few of the titles that are freely available for access on the BookShelf. Of course, as part of the NCBI, the site is fully integrated with other NCBI databases. For example, references to papers link directly into PubMed for the abstract, or even, where available, directly into PubMed Central for the full text. And references to genetic information make full use of OMIM and LocusLink.

The integration works in the reverse direction as well: every abstract displayed through PubMed has a button labeled “Links”. Clicking this reveals a link to “Books”, and clicking that highlights words in the abstract for which more information is available through the BookShelf. So a biochemically challenged clinician such as myself who encountered “pyruvate” within the abstract of a paper could click on “Links”, then “Books”, then “pyruvate”, and finally “118 items” to choose between the 118 passages in Biochemistry that discuss the molecule.

The BookShelf team run more than a clever web site operation. First, the standard textbooks are converted to XML (eXtensible Markup Language). XML is a standard for describing data, and the NCBI has used it to provide definitions for various types of biomedical data. The BookShelf team’s textbook content is one such type of data. Once in XML, the content can be repurposed for any display, including the BookShelf web site and PDF files.
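To make the idea of repurposing concrete, here is a minimal sketch in Python of one XML source feeding two different displays. The element names (`chapter`, `title`, `para`) and the sample text are invented for illustration; they are not the NCBI's actual definitions, and the real pipeline is certainly more involved.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical markup: these element names are assumptions made for this
# sketch, not the BookShelf team's real XML definitions.
SAMPLE = """<chapter>
  <title>Glycolysis</title>
  <para>Pyruvate is the end product of glycolysis.</para>
</chapter>"""


def to_html(chapter):
    """Render the chapter as a fragment of HTML for a web page."""
    parts = [f"<h1>{html.escape(chapter.findtext('title'))}</h1>"]
    parts += [f"<p>{html.escape(p.text)}</p>" for p in chapter.findall("para")]
    return "\n".join(parts)


def to_plain_text(chapter):
    """Render the same chapter as plain text with an underlined title."""
    title = chapter.findtext("title")
    lines = [title, "=" * len(title)]
    lines += [p.text for p in chapter.findall("para")]
    return "\n".join(lines)


chapter = ET.fromstring(SAMPLE)
print(to_html(chapter))
print(to_plain_text(chapter))
```

The point of the sketch is that neither renderer changes the source: adding a third display means writing a third function, not touching the book.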

The team also enhances the text with the correct labeling. The BookShelf texts are valuable not just because of the information that they contain, but also because of the information about information. In other words, interspersed in the text that the authors had written are labels that identify certain content as references, other parts as headings, and still other parts as figures with relevant captions. These labels in turn allow the reader access to sophisticated searches, and ensure appropriate display – for example the references become automatically linked to the PubMed abstracts.
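As a rough illustration of what such labels make possible, the Python sketch below searches within one kind of label and turns reference labels into PubMed links. Everything specific here is an assumption for the example: the element names (`heading`, `ref`, `caption`), the `pmid` attribute, the placeholder identifier 12345, and the URL pattern are not taken from the BookShelf's actual markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical labeled text: element and attribute names are invented for
# this sketch, and the pmid value is a placeholder, not a real citation.
SAMPLE = """<section>
  <heading>Pyruvate metabolism</heading>
  <para>Pyruvate can be converted to lactate <ref pmid="12345"/>.</para>
  <figure><caption>Fates of pyruvate.</caption></figure>
</section>"""

# Assumed URL pattern for a PubMed abstract page.
PUBMED_URL = "https://pubmed.ncbi.nlm.nih.gov/{pmid}/"


def reference_links(root):
    """Build one PubMed link for every element labeled as a reference."""
    return [PUBMED_URL.format(pmid=ref.get("pmid")) for ref in root.iter("ref")]


def search_label(root, label, term):
    """Return the text of every element with the given label that mentions term."""
    term = term.lower()
    hits = []
    for element in root.iter(label):
        text = "".join(element.itertext()).strip()
        if term in text.lower():
            hits.append(text)
    return hits


root = ET.fromstring(SAMPLE)
print(reference_links(root))
print(search_label(root, "caption", "pyruvate"))
```

Without the labels, a search for “pyruvate” could only match raw text; with them, a reader can ask for figure captions alone, or for headings alone.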

For all the benefits of these labels, the task of adding them is a detailed one.

My work involves considering ways in which the task can be made more efficient. Indeed, this is an issue that many publishing houses, libraries and content creators around the world are wrestling with. Some have created specialized authoring software that ensures correct labeling, but few authors have the time to invest in learning about new software. Others provide word processing templates and guidelines for the authors to follow. But as a colleague from a publishing house remarked, “we love templates but the authors just copy and paste over them”. A third alternative comes with the arrival of Microsoft’s Word 2003, which promises to read and write in XML.

Such options keep me entertained, as does the matter of collaborative authoring. Several authors from disparate institutions across the USA are collaborating to add content to the BookShelf. What is the best way to support their authoring process across such distances? And so, along with biochemistry and genetics, the BookShelf adds XML as a topic for my reading list.

Review published in British Medical Informatics Today, Winter 2003 issue.