.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/first-calculation.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_examples_first-calculation.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_first-calculation.py:


.. _userdoc-tutorials-get-started:

First descriptor computation
============================

This is an introduction to the rascaline interface using a molecular crystals
dataset using the Python interface. If you are interested in another
programming language we recommend you first follow this tutorial and afterward
take a look at the how-to guide on :ref:`userdoc-how-to-computing-soap`.

The dataset
-----------

The atomic configurations used in our documentation are a small subset of the
`ShiftML2 dataset <https://pubs.acs.org/doi/pdf/10.1021/acs.jpcc.2c03854>`_
containing molecular crystals. There are four crystals - one with each of the
elements [hydrogen, carbon], [hydrogen, carbon, nitrogen, oxygen], [hydrogen,
carbon, nitrogen], or [hydrogen, carbon, oxygen]. Each crystal has 10 structures,
also denoted by frames, attributed to it. The first frame of each crystal structure
is the geometry-optimized frame. The following 9 frames contain atoms that are
slightly displaced from the geometry-optimized frame. You can obtain the dataset
from our :download:`website <../../static/dataset.xyz>`.

.. GENERATED FROM PYTHON SOURCE LINES 27-30

We will start by importing all the required packages: the classic numpy;
chemfiles to load data, and rascaline to compute representations. Afterward
we will load the dataset using chemfiles.

.. GENERATED FROM PYTHON SOURCE LINES 31-43

.. code-block:: Python


    import chemfiles
    import numpy as np

    from rascaline import SphericalExpansion


    with chemfiles.Trajectory("dataset.xyz") as trajectory:
        frames = [f for f in trajectory]

    print(f"The dataset contains {len(frames)} frames.")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    The dataset contains 40 frames.


.. GENERATED FROM PYTHON SOURCE LINES 44-50

We will not explain here how to use chemfiles in detail, as we only use a few
functions. Briefly, :class:`chemfiles.Trajectory` loads structure data in a
format rascaline can use. If you want to learn more about the possibilities
take a look at the `chemfiles documentation <https://chemfiles.org>`_.

Let us now take a look at the first frame of the dataset.

.. GENERATED FROM PYTHON SOURCE LINES 51-56

.. code-block:: Python


    frame0 = frames[0]

    print(frame0)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Frame with 20 atoms


.. GENERATED FROM PYTHON SOURCE LINES 57-59

With ``frame0.atoms`` we get a list of the atoms that make up frame zero.
The ``name`` attribute gives us the name of the specified atom.

.. GENERATED FROM PYTHON SOURCE LINES 60-71

.. code-block:: Python


    elements, counts = np.unique([atom.name for atom in frame0.atoms], return_counts=True)

    print(
        f"The first frame contains "
        f"{counts[0]} {elements[0]}-atoms, "
        f"{counts[1]} {elements[1]}-atoms, "
        f"{counts[2]} {elements[2]}-atoms and "
        f"{counts[3]} {elements[3]}-atoms."
    )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    The first frame contains 4 C-atoms, 8 H-atoms, 4 N-atoms and 4 O-atoms.


.. GENERATED FROM PYTHON SOURCE LINES 72-88

Calculate a descriptor
----------------------

We will now calculate an atomic descriptor for this structure using the SOAP
spherical expansion as introduced by `Bartók, Kondor, and Csányi
<http://dx.doi.org/10.1103/PhysRevB.87.184115>`_.

To do so we define below a set of parameters telling rascaline how the
spherical expansion should be calculated. These parameters are also called
hyper parameters since they are parameters of the representation, in
opposition to parameters of machine learning models. Hyper parameters are a
crucial part of calculating descriptors. Poorly selected hyper parameters will
lead to a poor description of your dataset as discussed in the `literature
<https://arxiv.org/abs/1502.02127>`_. The effect of changing some hyper
parameters is discussed in a :ref:`second tutorial
<userdoc-tutorials-understanding-hypers>`.

.. GENERATED FROM PYTHON SOURCE LINES 89-101

.. code-block:: Python


    HYPER_PARAMETERS = {
        "cutoff": 4.5,
        "max_radial": 9,
        "max_angular": 6,
        "atomic_gaussian_width": 0.3,
        "center_atom_weight": 1.0,
        "radial_basis": {"Gto": {"spline_accuracy": 1e-6}},
        "cutoff_function": {"ShiftedCosine": {"width": 0.5}},
        "radial_scaling": {"Willatt2018": {"scale": 2.0, "rate": 1.0, "exponent": 4}},
    }


.. GENERATED FROM PYTHON SOURCE LINES 102-106

After we set the hyper parameters we initialize a
:class:`rascaline.calculators.SphericalExpansion` object with hyper parameters
defined above and run the
:py:func:`rascaline.calculators.CalculatorBase.compute()` method.

.. GENERATED FROM PYTHON SOURCE LINES 107-112

.. code-block:: Python


    calculator = SphericalExpansion(**HYPER_PARAMETERS)
    descriptor0 = calculator.compute(frame0)
    print(type(descriptor0))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    <class 'metatensor.tensor.TensorMap'>


.. GENERATED FROM PYTHON SOURCE LINES 113-120

The descriptor format is a :class:`metatensor.TensorMap` object. Metatensor is
like numpy for storing representations of atomistic ML data. Extensive details
on the metatensor are covered in the `corresponding documentation
<https://lab-cosmo.github.io/metatensor/>`_.

We will now have a look at how the data is stored inside
:class:`metatensor.TensorMap` objects.

.. GENERATED FROM PYTHON SOURCE LINES 121-125

.. code-block:: Python


    print(descriptor0)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    TensorMap with 112 blocks
    keys: o3_lambda  o3_sigma  center_type  neighbor_type
              0         1           1             1
              1         1           1             1
              2         1           1             1
              3         1           1             1
              4         1           1             1
              5         1           1             1
              6         1           1             1
              0         1           1             6
              1         1           1             6
              2         1           1             6
              3         1           1             6
              4         1           1             6
              5         1           1             6
              6         1           1             6
              0         1           1             7
              1         1           1             7
              2         1           1             7
              3         1           1             7
              4         1           1             7
              5         1           1             7
              6         1           1             7
              0         1           1             8
              1         1           1             8
              2         1           1             8
              3         1           1             8
              4         1           1             8
              5         1           1             8
              6         1           1             8
              0         1           6             1
              1         1           6             1
              2         1           6             1
              3         1           6             1
              4         1           6             1
              5         1           6             1
              6         1           6             1
              0         1           6             6
              1         1           6             6
              2         1           6             6
              3         1           6             6
              4         1           6             6
              5         1           6             6
              6         1           6             6
              0         1           6             7
              1         1           6             7
              2         1           6             7
              3         1           6             7
              4         1           6             7
              5         1           6             7
              6         1           6             7
              0         1           6             8
              1         1           6             8
              2         1           6             8
              3         1           6             8
              4         1           6             8
              5         1           6             8
              6         1           6             8
              0         1           7             1
              1         1           7             1
              2         1           7             1
              3         1           7             1
              4         1           7             1
              5         1           7             1
              6         1           7             1
              0         1           7             6
              1         1           7             6
              2         1           7             6
              3         1           7             6
              4         1           7             6
              5         1           7             6
              6         1           7             6
              0         1           7             7
              1         1           7             7
              2         1           7             7
              3         1           7             7
              4         1           7             7
              5         1           7             7
              6         1           7             7
              0         1           7             8
              1         1           7             8
              2         1           7             8
              3         1           7             8
              4         1           7             8
              5         1           7             8
              6         1           7             8
              0         1           8             1
              1         1           8             1
              2         1           8             1
              3         1           8             1
              4         1           8             1
              5         1           8             1
              6         1           8             1
              0         1           8             6
              1         1           8             6
              2         1           8             6
              3         1           8             6
              4         1           8             6
              5         1           8             6
              6         1           8             6
              0         1           8             7
              1         1           8             7
              2         1           8             7
              3         1           8             7
              4         1           8             7
              5         1           8             7
              6         1           8             7
              0         1           8             8
              1         1           8             8
              2         1           8             8
              3         1           8             8
              4         1           8             8
              5         1           8             8
              6         1           8             8


.. GENERATED FROM PYTHON SOURCE LINES 126-137

The :class:`metatensor.TensorMap` is structured in several instances of an
:class:`metatensor.TensorBlock`. To distinguish the block each block is associated
with a unique key. For the current example, we have one block for each angular channel
labeled by ``o3_lambda``, the central atom type ``center_type`` and
neighbor atom type labeled by ``neighbor_type``. Different atomic types are
represented using their atomic number, e.g. 1 for hydrogen, 6 for carbon, etc. To
summarize, this descriptor contains 112 blocks covering all combinations of the
angular channels of the central and neighbor atom types in our dataset.

Let us take a look at the second block (at index 1) in detail. This block contains the
descriptor for the :math:`l=1` angular channel for hydrogen-hydrogen pairs.

.. GENERATED FROM PYTHON SOURCE LINES 138-143

.. code-block:: Python


    block = descriptor0.block(1)
    print(descriptor0.keys[1])


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    LabelsEntry(o3_lambda=1, o3_sigma=1, center_type=1, neighbor_type=1)


.. GENERATED FROM PYTHON SOURCE LINES 144-151

The descriptor values
---------------------

The values of the representation are stored as an array. Each entry in this
array also has associated unique metadata as each block. For the spherical
expansion calculator used in this tutorial the values have three dimensions
which we can verify from the ``.shape`` attribute.

.. GENERATED FROM PYTHON SOURCE LINES 152-156

.. code-block:: Python


    print(block.values.shape)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (8, 3, 9)


.. GENERATED FROM PYTHON SOURCE LINES 157-166

The descriptor values
---------------------

The first dimension is denoted by the `samples`, the intermediate dimension by
`components`, and the last dimension by `properties`. The "sample dimension"
has a length of eight because we have eight hydrogen atoms in the first frame.
We can reveal more detailed metadata information about the sample-dimension
printing of the :py:attr:`metatensor.TensorBlock.samples` attribute of the
block

.. GENERATED FROM PYTHON SOURCE LINES 167-170

.. code-block:: Python


    print(block.samples)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Labels(
        system  atom
          0      12
          0      13
            ...
          0      18
          0      19
    )


.. GENERATED FROM PYTHON SOURCE LINES 171-173

The result is an :class:`metatensor.TensorMap` instance. It contains in total
eight tuples each with two values. The tuple values are named as follows

.. GENERATED FROM PYTHON SOURCE LINES 174-177

.. code-block:: Python


    print(block.samples.names)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    ['system', 'atom']


.. GENERATED FROM PYTHON SOURCE LINES 178-184

Meaning that the first entry of each tuple indicates the _structure_, which is
0 for all because we only computed the representation of a single frame. The
second entry of each tuple refers to the index of the _center_ atom.

We can do a similar investigation for the second dimension: the
:py:attr:`metatensor.TensorBlock.components`.

.. GENERATED FROM PYTHON SOURCE LINES 185-188

.. code-block:: Python


    print(block.components)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [Labels(
        o3_mu
         -1
          0
          1
    )]


.. GENERATED FROM PYTHON SOURCE LINES 189-200

Here, the components are associated with the angular channels of the
representation. The size of ``o3_mu`` is :math:`2l + 1`, where
:math:`l` is the current ``o3_lambda`` of the block. Here, its
dimension is three because we are looking at the ``o3_lambda=1``
block. You may have noticed that the return value of the last call is a
:class:`list` of :class:`metatensor.Labels` and not a single ``Labels``
instance. The reason is that a block can have several component dimensions as
we will see below for the gradients.

The last value represents the number of radial channels. For the
:py:attr:`metatensor.TensorBlock.properties` dimension we find an object

.. GENERATED FROM PYTHON SOURCE LINES 201-204

.. code-block:: Python


    print(block.properties)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Labels(
        n
        0
        1
         ...
        7
        8
    )


.. GENERATED FROM PYTHON SOURCE LINES 205-206

containing a tuple of only one value ranging from 0 to 8. The name of this entry is

.. GENERATED FROM PYTHON SOURCE LINES 207-210

.. code-block:: Python


    print(block.properties.names)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    ['n']


.. GENERATED FROM PYTHON SOURCE LINES 211-216

and denoting the radial channels. The range results from our choice of
``max_radial = 9`` in the hyper parameters above.

After looking at the metadata we can investigate the actual data of the
representation in more details

.. GENERATED FROM PYTHON SOURCE LINES 217-220

.. code-block:: Python


    print(block.values[0, 0, :])


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [-0.00102818  0.00860425 -0.02339817 -0.11691264 -0.04411467 -0.02857226
     -0.02109534 -0.00712789 -0.00230597]


.. GENERATED FROM PYTHON SOURCE LINES 221-228

By using ``[0, 0, :]`` we selected the first hydrogen and the first ``m``
channel. As you the output shows the values are floating point numbers between
``-1.0`` and ``1.0``. Values in this range are reasonable and can be directly
used as input for a machine learning algorithm.

Rascaline is also able to process more than one structure within one function
call. You can process a whole dataset with

.. GENERATED FROM PYTHON SOURCE LINES 229-235

.. code-block:: Python


    descriptor_full = calculator.compute(frames)

    block_full = descriptor_full.block(0)
    print(block_full.values.shape)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (420, 1, 9)


.. GENERATED FROM PYTHON SOURCE LINES 236-253

Now, the 0th block of the :class:`metatensor.TensorMap` contains not eight but
420 entries in the first dimensions. This reflects the fact that in total we
have 420 hydrogen atoms in the whole dataset.

If you want to use another calculator instead of
:class:`rascaline.calculators.SphericalExpansion` shown here check out the
:ref:`userdoc-references` section.

Computing gradients
-------------------

Additionally, rascaline is also able to calculate gradients on top of the
values. Gradients are useful for constructing an ML potential and running
simulations. For example ``gradients`` of the representation with respect to
atomic positions can be calculated by setting the ``gradients`` parameter of
the :py:func:`rascaline.calculators.CalculatorBase.compute()` method to
``["positions"]``.

.. GENERATED FROM PYTHON SOURCE LINES 254-262

.. code-block:: Python


    descriptor_gradients = calculator.compute(frame0, gradients=["positions"])

    block_gradients = descriptor_gradients.block(0)
    gradient_position = block_gradients.gradient("positions")

    print(gradient_position.values.shape)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (52, 3, 1, 9)


.. GENERATED FROM PYTHON SOURCE LINES 263-279

The calculated descriptor contains the values and in each block the associated
position gradients as an :class:`metatensor.block.Gradient` instance. The
actual values are stored in the ``data`` attribute. Similar to the features
the gradient data also has associated metadata. But, compared to the values
were we found three dimensions, and gradients have four. Again the first is
called `samples` and the `properties`. The dimensions between the sample and
property dimensions are denoted by `components`.

Looking at the shape in more detail we find that we have 52 samples, which is
much more compared to features where we only have eight samples. This arises
from the fact that we calculate the position gradient for each pair in the
structure. For our selected block these are all hydrogen-hydrogen pairs.
Naively one would come up with ``8 * 8 = 64`` samples, but rascaline already
ignores pairs that are outside of the cutoff radius. Their position gradient
is always zero. The :attr:`metatensor.block.Gradient.samples` attribute shows
this in detail.

.. GENERATED FROM PYTHON SOURCE LINES 280-283

.. code-block:: Python


    print(gradient_position.samples)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Labels(
        sample  system  atom
          0       0      12
          0       0      13
                ...
          7       0      18
          7       0      19
    )


.. GENERATED FROM PYTHON SOURCE LINES 284-285

Note that we have a tuple of three with the names

.. GENERATED FROM PYTHON SOURCE LINES 286-289

.. code-block:: Python


    print(gradient_position.samples.names)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    ['sample', 'system', 'atom']


.. GENERATED FROM PYTHON SOURCE LINES 290-294

In the above output of the Labels instance for example the `(2, 0, 17)` entry
is missing indicating that this pair is outside of the cutoff.

Now looking at the :attr:`metatensor.block.Gradient.components`

.. GENERATED FROM PYTHON SOURCE LINES 295-298

.. code-block:: Python


    print(gradient_position.components)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [Labels(
        xyz
         0
         1
         2
    ), Labels(
        o3_mu
          0
    )]


.. GENERATED FROM PYTHON SOURCE LINES 299-305

we find two of them. Besides the `o3_mu` component that is also present in the
features position gradients also have a component indicating the direction of the
gradient vector.

Finally, the :attr:`metatensor.block.Gradient.properties` dimension is the same as for
the values

.. GENERATED FROM PYTHON SOURCE LINES 306-309

.. code-block:: Python


    print(gradient_position.properties)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Labels(
        n
        0
        1
         ...
        7
        8
    )


.. GENERATED FROM PYTHON SOURCE LINES 310-318

Rascaline can also calculate gradients with respect to the strain (i.e. the virial).
For this, you have to add ``"strain"`` to the list parsed to the ``gradients``
parameter of the :py:func:`rascaline.calculators.CalculatorBase.compute()` method.
Strain gradients/virial are useful when computing the stress and the pressure.

If you want to know about the effect of changing hypers take a look at the next
tutorial. If you want to solve an explicit problem our :ref:`userdoc-how-to` might
help you.


.. _sphx_glr_download_examples_first-calculation.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: first-calculation.ipynb <first-calculation.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: first-calculation.py <first-calculation.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: first-calculation.zip <first-calculation.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_