Core concepts of rascaline
Rascaline is a library computing representations of atomic systems for machine learning applications. These representations encode fundamental symmetries of the systems (such as invariance to translations and rotations), which makes machine learning more data-efficient because the model does not have to learn these symmetries from the training data. Examples of representations include the Smooth Overlap of Atomic Positions (SOAP), Behler-Parrinello symmetry functions, Coulomb matrices, and many others. This documentation does not describe each method in detail, deferring instead to the many good resources available on the subject. This section in particular explains the three core objects rascaline is built upon: systems, calculators and descriptors.
Systems: atoms and molecules
Systems describe the input data rascaline uses to compute various representations. They contain information about the atomic positions, the atomic types, the unit cell and periodicity, and they are responsible for computing the neighbors of each atomic center.
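To illustrate what computing the neighbors of each atomic center involves, here is a brute-force sketch that finds all pairs of atoms closer than a cutoff radius in a periodic cell. It is not how rascaline computes neighbors internally. The function name, the restriction to an orthorhombic cell, and the use of the minimum image convention (valid only when the cutoff is below half of every box side) are all assumptions of this sketch.

```python
import itertools
import math


def neighbor_pairs(positions, box, cutoff):
    """Return (i, j, distance) for every pair of atoms closer than ``cutoff``.

    Hypothetical helper. Assumes an orthorhombic periodic cell with side
    lengths ``box`` and a cutoff below half of every side, so the minimum
    image convention finds at most one periodic image of each neighbor.
    Each pair is listed once, with i < j.
    """
    if any(cutoff >= side / 2 for side in box):
        raise ValueError("cutoff must be smaller than half of every box side")

    pairs = []
    for i, j in itertools.combinations(range(len(positions)), 2):
        squared = 0.0
        for a, b, side in zip(positions[i], positions[j], box):
            delta = b - a
            # minimum image convention: use the closest periodic image
            delta -= side * round(delta / side)
            squared += delta * delta
        distance = math.sqrt(squared)
        if distance < cutoff:
            pairs.append((i, j, distance))
    return pairs
```

For example, in a 10 Å cubic box, atoms at z = 0 and z = 9.5 are found 0.5 Å apart through the periodic boundary. Real implementations typically replace this O(N²) loop with cell lists or similar spatial binning.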
Rascaline uses systems in a generic manner. While it provides a default implementation called SimpleSystem, it can use data from any source through a few lines of adaptor code. This makes it possible to use rascaline directly inside molecular simulation engines when running simulations with machine learning force-fields, re-using the neighbor list already computed by the engine.
Both the implementation and the data related to systems can thus be provided by users of the rascaline library.
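The adaptor idea can be sketched as follows: calculator-side code relies only on a small interface, and a thin wrapper exposes a simulation engine's existing arrays, including its neighbor list, through that interface. The SystemLike protocol, its method names, EngineState and EngineAdaptor are all hypothetical and only illustrate the concept. They are not rascaline's actual system interface.

```python
from dataclasses import dataclass, field
from typing import List, Protocol, Sequence, Tuple

Vector = Tuple[float, float, float]
Pair = Tuple[int, int, float]


class SystemLike(Protocol):
    """Hypothetical interface listing what a calculator needs from a system."""

    def types(self) -> Sequence[int]: ...

    def positions(self) -> Sequence[Vector]: ...

    def cell(self) -> Sequence[Vector]: ...

    def pairs(self, cutoff: float) -> List[Pair]: ...


@dataclass
class EngineState:
    """Stand-in for the data a simulation engine already holds (hypothetical)."""

    atomic_numbers: List[int]
    coordinates: List[Vector]
    box_vectors: List[Vector]
    # neighbor list computed by the engine, as (i, j, distance) pairs
    neighbor_list: List[Pair] = field(default_factory=list)
    # cutoff the engine used to build neighbor_list
    neighbor_cutoff: float = 0.0


class EngineAdaptor:
    """A few lines of adaptor code exposing EngineState as a SystemLike."""

    def __init__(self, state: EngineState):
        self._state = state

    def types(self) -> Sequence[int]:
        return self._state.atomic_numbers

    def positions(self) -> Sequence[Vector]:
        return self._state.coordinates

    def cell(self) -> Sequence[Vector]:
        return self._state.box_vectors

    def pairs(self, cutoff: float) -> List[Pair]:
        # re-use the engine's neighbor list instead of recomputing it; this is
        # only valid if the engine searched at least as far as the requested cutoff
        if cutoff > self._state.neighbor_cutoff:
            raise ValueError("the engine neighbor list does not reach this cutoff")
        return [pair for pair in self._state.neighbor_list if pair[2] < cutoff]


def count_pairs(system: SystemLike, cutoff: float) -> int:
    """Calculator-side code only relies on the SystemLike methods."""
    return len(system.pairs(cutoff))
```

The cutoff check reflects the main constraint of re-using an engine's neighbor list: it only works when the engine searched at least as far as the representation requires.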
Calculators: computing representations
Calculators are provided by rascaline, and each one computes a single representation. There is a calculator for the sorted distances vector representation, one for the spherical expansion representation, one for the LODE spherical expansion representation, and hopefully many more soon.
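The sorted distances vector is the simplest of these representations. For each atomic center, it lists the distances to the neighbors within a cutoff in increasing order. Below is a toy version that works from (i, j, distance) pairs, such as those a system provides. The function is hypothetical, and its fixed-length convention is an assumption made so that every center gets the same number of features: rows are truncated to max_neighbors, and short rows are padded with the cutoff value. Rascaline's own calculator may differ in these details.

```python
def sorted_distances(pairs, n_atoms, cutoff, max_neighbors):
    """Toy sorted distances vector for every atom (hypothetical helper).

    ``pairs`` holds (i, j, distance) entries, each pair listed once.
    Every row is truncated to ``max_neighbors`` entries and, by an assumed
    convention, padded with ``cutoff`` when there are fewer neighbors.
    """
    per_center = [[] for _ in range(n_atoms)]
    for i, j, distance in pairs:
        if distance < cutoff:
            # a pair listed once contributes a neighbor to both of its atoms
            per_center[i].append(distance)
            per_center[j].append(distance)

    features = []
    for distances in per_center:
        row = sorted(distances)[:max_neighbors]
        row += [cutoff] * (max_neighbors - len(row))
        features.append(row)
    return features
```

With pairs [(0, 1, 1.0), (0, 2, 2.0), (1, 2, 1.5)], a cutoff of 3.0 and max_neighbors=3, atom 0 gets [1.0, 2.0, 3.0], where the last entry is padding.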
All calculators are registered globally in rascaline, and can be constructed from a name and a set of parameters (often called hyper-parameters). These parameters control the features of the final representation: how many features there are, and what they represent. All available calculators and the corresponding parameters are documented.
From a user's perspective, calculators are black boxes that take systems as input and return a descriptor object, described below.
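The pattern described above can be sketched in a few lines of Python: a global registry looked up by name, hyper-parameters passed at construction, and a compute method that turns systems into features. Everything in this sketch is hypothetical and only mirrors the concepts. That includes register, create_calculator, the toy type_histogram calculator, and representing each system as a plain list of atomic types. It is not rascaline's actual API.

```python
from typing import Callable, Dict, List

# hypothetical global registry mapping a calculator name to its class
_REGISTRY: Dict[str, Callable] = {}


def register(name: str):
    """Class decorator adding a calculator to the global registry."""

    def decorator(cls):
        _REGISTRY[name] = cls
        return cls

    return decorator


def create_calculator(name: str, **hyper_parameters):
    """Look a calculator up by name and build it from its hyper-parameters."""
    try:
        factory = _REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown calculator: {name!r}") from None
    return factory(**hyper_parameters)


@register("type_histogram")
class TypeHistogram:
    """Toy calculator counting the atoms of each requested type.

    The ``types`` hyper-parameter fixes how many features there are (one per
    entry) and what each feature represents (the count for that type).
    """

    def __init__(self, types: List[int]):
        self.types = list(types)

    def compute(self, systems: List[List[int]]) -> List[List[int]]:
        # each system is reduced here to the list of its atomic types, and the
        # returned "descriptor" is a plain nested list rather than a TensorMap
        return [[system.count(t) for t in self.types] for system in systems]
```

For instance, `create_calculator("type_histogram", types=[1, 8]).compute([[8, 1, 1], [6, 1, 1, 1, 1]])` returns `[[2, 1], [4, 0]]`. Changing `types` changes both the number and the meaning of the features, which is the role hyper-parameters play above.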
Descriptors: data storage for atomistic machine learning
After using a calculator on one or multiple systems, users get the numerical representation of their atomic systems in a descriptor object. Rascaline uses the metatensor TensorMap type when returning descriptors.

A TensorMap can be seen as a dictionary mapping some keys to a set of data blocks. Each block contains data arrays, i.e. multidimensional arrays holding the descriptor values (and optionally their gradients), together with metadata describing the different dimensions of these arrays. Which keys are present in a TensorMap depends on the calculator being used. Typically, representations using one-hot encoding of atomic types will have the corresponding keys (for example center_type, neighbor_type, etc.), and equivariant representations will have keys for the different equivariance classes (o3_lambda for SO(3) equivariants, etc.).
For more information on TensorMap and what can be done with one, please see the metatensor documentation.