Programming languages for chemical information

Cheminformatics covers the representation, management and analysis of small molecules and their associated data. It is implied that these activities are performed on a computer. But how does one represent a molecule on a computer? How does one then perform computations on such a representation?

Programming languages form the basis of working with molecular structures on a computer. Data structures to represent a chemical structure (such as a connection table or a graph) and algorithms to manipulate them are implemented in a programming language. Importantly, there are many programming languages, each of which can be used to implement cheminformatics algorithms, with varying degrees of ease.
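
As a rough illustration of the connection-table idea, here is a minimal sketch in Python (chosen only as one of many possible languages). The `Molecule` class, its method names and the tiny `TYPICAL_VALENCE` table are hypothetical, invented for this example and not taken from any toolkit. Hydrogens are treated as implicit, and only neutral C, N and O atoms with their usual valences are assumed.

```python
from collections import Counter
from dataclasses import dataclass, field

# Assumption: usual valences for a few neutral elements (illustrative only).
TYPICAL_VALENCE = {"C": 4, "N": 3, "O": 2}


@dataclass
class Molecule:
    """Hypothetical minimal connection table: atoms are nodes, bonds are edges."""

    atoms: list[str] = field(default_factory=list)  # element symbols
    bonds: list[tuple[int, int, int]] = field(default_factory=list)  # (i, j, order)

    def neighbors(self, index: int) -> list[int]:
        """Return the indices of atoms bonded to the atom at `index`."""
        result = []
        for i, j, _order in self.bonds:
            if i == index:
                result.append(j)
            elif j == index:
                result.append(i)
        return result

    def implicit_hydrogens(self, index: int) -> int:
        """Fill the unused valence of an atom with hydrogens."""
        used = sum(order for i, j, order in self.bonds if index in (i, j))
        return TYPICAL_VALENCE[self.atoms[index]] - used

    def formula(self) -> str:
        """Molecular formula with C and H first, then the rest alphabetically."""
        counts = Counter(self.atoms)
        hydrogens = sum(self.implicit_hydrogens(i) for i in range(len(self.atoms)))
        if hydrogens:
            counts["H"] += hydrogens
        order = [s for s in ("C", "H") if s in counts]
        order += sorted(s for s in counts if s not in ("C", "H"))
        return "".join(s + (str(counts[s]) if counts[s] > 1 else "") for s in order)


# Ethanol, CH3-CH2-OH, written as heavy atoms plus single bonds.
ethanol = Molecule(atoms=["C", "C", "O"], bonds=[(0, 1, 1), (1, 2, 1)])
print(ethanol.neighbors(1))  # [0, 2]
print(ethanol.formula())     # C2H6O
```

Real toolkits layer aromaticity, charges, stereochemistry and much more on top of such a core, and how naturally that can be expressed varies from language to language.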

Given that there are so many languages, how does one go about choosing a language for a cheminformatics project?

To help answer this question, the Journal of Cheminformatics has created a thematic series to highlight the different programming languages that are employed within the cheminformatics community. The series consists of invited articles that look at how a given language has been used to solve cheminformatics problems and how features of the language help in developing such solutions.

While most cheminformatics practitioners may not be aware of modern programming language (PL) research, such efforts have produced language features that are commonly used in cheminformatics projects simply because they are part of the language in use. For example, functional programming is touted for enabling practitioners to write concise code with fewer bugs. While this is true in some cases, it is not clear whether it leads to better cheminformatics software.

This leads to the original question of choosing a language for a cheminformatics project. The series doesn’t intend to recommend any given language over another. Rather, authors have presented what they feel a language enables them to do. Based on the current articles in the series, a common feature of languages used to implement cheminformatics is the presence of support libraries and a community. It does little good to write software in a language that only the developer can support. As a result, while niche languages might offer unique features, lack of community support (and associated tooling) can be an impediment to their broader usage.

Following on from this, infrastructure around a language can greatly help development. For example, Jupyter notebooks have made literate programming significantly more accessible for multiple languages such as Python, R and even C++. Similarly, RStudio as an environment for R development can significantly change one's data analysis workflow. With the rise of web applications, the ability to support cheminformatics data structures and algorithms in web applications has become important. While one can use traditional languages such as C++ to write web applications, modern web projects tend to use JavaScript and its variants. As a result, cheminformatics toolkits implemented in JavaScript have appeared, enabling functionality from depiction of chemical structures to descriptor computations.

In summary, with the publication of this thematic series, the hope is that readers will be exposed to a variety of programming languages and be able to identify features in those languages that may be useful in their next project.
