Core concepts of rascaline

Rascaline is a library for computing representations of atomic systems for machine learning applications. These representations encode fundamental symmetries of the systems, so that the machine learning algorithm does not have to learn them from data and can be as efficient as possible. Examples of representations include the Smooth Overlap of Atomic Positions (SOAP), Behler-Parrinello symmetry functions, Coulomb matrices, and many others. This documentation does not describe each method in detail, and instead defers to the many good resources available on the subject. This section explains the three core objects rascaline is built upon: systems, calculators and descriptors.

Figure (core-concepts.svg): Schematic representation of the three core concepts in rascaline: systems, calculators and descriptors. The core operation provided by this library is to compute the representation (associated with a given calculator) of one or multiple systems, getting the corresponding data in a descriptor.

Systems: atoms and molecules

Systems describe the input data rascaline uses to compute the various representations. They contain information about the atomic positions, atomic types, unit cell and periodicity, and are responsible for computing the neighbors of each atomic center.

Rascaline uses systems in a generic manner. It provides a default implementation called SimpleSystem, but it can use data from any source through a few lines of adaptor code. This makes it possible to use rascaline directly inside molecular simulation engines when running simulations with machine learning force-fields, reusing the neighbor list already computed by the engine.

Both the implementation of systems and the data they contain are thus provided by users of the rascaline library.
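The information a system carries can be sketched in a few lines of Python. The ToySystem class below is purely hypothetical: it is not rascaline's SimpleSystem and does not follow rascaline's actual system interface. It only illustrates the kind of data described above (types, positions, cell) and a brute-force neighbor search that ignores periodic images.

```python
import math
from dataclasses import dataclass


@dataclass
class ToySystem:
    """Hypothetical stand-in for a system; not rascaline's SimpleSystem."""

    types: list      # one integer atomic type per atom
    positions: list  # one (x, y, z) tuple per atom
    cell: list       # 3x3 unit cell matrix; all zeros means non-periodic

    def size(self):
        """Number of atoms in the system."""
        return len(self.types)

    def neighbors(self, cutoff):
        """Return (i, j, distance) for every pair i < j closer than `cutoff`.

        Brute force, and without periodic images, to keep the sketch short.
        """
        pairs = []
        for i in range(self.size()):
            for j in range(i + 1, self.size()):
                distance = math.dist(self.positions[i], self.positions[j])
                if distance < cutoff:
                    pairs.append((i, j, distance))
        return pairs


# A water-like molecule in a non-periodic cell (approximate geometry)
water = ToySystem(
    types=[8, 1, 1],
    positions=[(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)],
    cell=[[0.0] * 3 for _ in range(3)],
)

# Only the two O-H pairs are within 1.2; the H-H pair is further apart
pairs = water.neighbors(cutoff=1.2)
```

A simulation engine that already maintains a neighbor list would return it from such a method instead of recomputing it, which is the reuse described above.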

Calculators: computing representations

Calculators are provided by rascaline, and each one computes a single representation. There is a calculator for the sorted distances vector representation, one for the spherical expansion representation, one for the LODE spherical expansion representation, and hopefully soon many others.

All calculators are registered globally in rascaline, and can be constructed from a name and a set of parameters (often called hyper-parameters). These parameters control the features of the final representation: how many there are, and what they represent. All available calculators and the corresponding parameters are documented.

From a user perspective, calculators are black boxes that take systems as input and return a descriptor object, described below.
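The idea of a global registry, with calculators built from a name and a set of hyper-parameters, can be illustrated with a small standalone sketch. Everything here is hypothetical: the registry, the create_calculator function and the ToySortedDistances class are not part of rascaline, and the toy calculator returns plain lists rather than a descriptor object. Padding short rows with the cutoff value is an assumption of this sketch.

```python
import math

# Hypothetical global registry: calculator name -> calculator class
CALCULATORS = {}


def register(name):
    """Class decorator adding a calculator class to the registry."""
    def decorator(cls):
        CALCULATORS[name] = cls
        return cls
    return decorator


@register("toy_sorted_distances")
class ToySortedDistances:
    """Per-atom sorted distances to neighbors inside a cutoff (toy version)."""

    def __init__(self, cutoff, max_neighbors):
        self.cutoff = cutoff
        self.max_neighbors = max_neighbors

    def compute(self, systems):
        """Return one feature row per atom, over all systems in order."""
        rows = []
        for positions in systems:
            for i, center in enumerate(positions):
                distances = [
                    math.dist(center, other)
                    for j, other in enumerate(positions)
                    if j != i
                ]
                inside = sorted(d for d in distances if d < self.cutoff)
                inside = inside[: self.max_neighbors]
                # Pad with the cutoff so every row has max_neighbors features
                padding = [self.cutoff] * (self.max_neighbors - len(inside))
                rows.append(inside + padding)
        return rows


def create_calculator(name, parameters):
    """Build a registered calculator from its name and hyper-parameters."""
    return CALCULATORS[name](**parameters)


calculator = create_calculator(
    "toy_sorted_distances", {"cutoff": 1.2, "max_neighbors": 3}
)
# One system with three atoms, given here as a bare list of positions
systems = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]]
features = calculator.compute(systems)
```

In this sketch max_neighbors sets how many features each atom gets, and cutoff determines what they represent, which mirrors the role hyper-parameters play for the real calculators.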

Descriptors: data storage for atomistic machine learning

After using a calculator on one or multiple systems, users get the numerical representation of their atomic systems in a descriptor object. Rascaline uses the TensorMap type from metatensor when returning descriptors.

A TensorMap can be seen as a dictionary mapping some keys to a set of data blocks. Each block contains both data (and gradient) arrays, i.e. multidimensional arrays containing the descriptor values, and metadata describing the different dimensions of these arrays. Which keys are present in a TensorMap depends on the calculator being used. Typically, representations using a one-hot encoding of atomic types will have the corresponding keys (for example center_type, neighbor_type, etc.), and equivariant representations will have keys for the different equivariance classes (o3_lambda for SO(3) equivariants, etc.).
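The dictionary picture can be made concrete with plain Python containers. The sketch below does not use the metatensor API: make_block is a hypothetical helper, the key layout (center_type, neighbor_type) is chosen to match the example above, and all numerical values are placeholders rather than the output of a real calculator.

```python
def make_block(samples, properties, values):
    """Bundle a 2D array of values with metadata describing its two axes."""
    if len(values) != len(samples):
        raise ValueError("expected one row of values per sample")
    if any(len(row) != len(properties) for row in values):
        raise ValueError("expected one column of values per property")
    return {"samples": samples, "properties": properties, "values": values}


# Keys are (center_type, neighbor_type) pairs, as with one-hot encoded types.
# All numbers below are placeholders, not the output of a real calculator.
descriptor = {
    (8, 1): make_block(
        samples=[(0, 0)],          # (system, atom) label for each row
        properties=[(0,), (1,)],   # one label for each column
        values=[[0.5, 0.1]],
    ),
    (1, 8): make_block(
        samples=[(0, 1), (0, 2)],
        properties=[(0,), (1,)],
        values=[[0.3, 0.2], [0.3, 0.2]],
    ),
}

# Look up the block for hydrogen centers with oxygen neighbors
block = descriptor[(1, 8)]
shape = (len(block["samples"]), len(block["properties"]))
```

The point of keeping metadata next to the values is that each row and column can be traced back to the atom and feature it belongs to, without relying on implicit ordering conventions.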

For more information on TensorMap and what can be done with one, please see the metatensor documentation.