Sample Selection
This example shows how to compute a representation for only a subset of the available samples. In particular, we will compute the SOAP power spectrum representation for a specific subset of atoms, out of all the atoms read from a file. The systems are read from the file dataset.xyz.
This can be useful if we are only interested in certain systems in a large dataset, or if we need to determine the effect of a certain type of atom on some system properties. In the following, we will look at the tools rascaline provides for sample selection.
The first part of this example repeats the Computing SOAP features example, so we suggest reading that one first.
You can obtain a testing dataset from our website.
import chemfiles
import numpy as np
from metatensor import Labels
from rascaline import SoapPowerSpectrum
First, we load the dataset with chemfiles:
with chemfiles.Trajectory("dataset.xyz") as trajectory:
    frames = [f for f in trajectory]
Then we define the hyper-parameters of the representation and compute it for all systems:
HYPER_PARAMETERS = {
    "cutoff": 5.0,
    "max_radial": 6,
    "max_angular": 4,
    "atomic_gaussian_width": 0.3,
    "center_atom_weight": 1.0,
    "radial_basis": {
        "Gto": {},
    },
    "cutoff_function": {
        "ShiftedCosine": {"width": 0.5},
    },
}
calculator = SoapPowerSpectrum(**HYPER_PARAMETERS)
descriptor = calculator.compute(frames)
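As a rough sanity check on the sizes printed later in this example, the number of properties in each block of the power spectrum can be estimated from the hyper-parameters. The sketch below assumes that each block carries one property per combination of angular channel l (from 0 to max_angular) and radial pair (n_1, n_2), which agrees with the 180 columns shown further down. It is plain arithmetic and does not call rascaline.

```python
# Hyper-parameters copied from HYPER_PARAMETERS above
max_radial = 6
max_angular = 4

# Assumption: one property per (l, n_1, n_2) combination in each block
n_properties = (max_angular + 1) * max_radial * max_radial
print(n_properties)  # 180
```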
The sample selection can be given as a set of Labels, in which case the names of the labels must be a subset of the names of the samples produced by the calculator. You can see the default set of sample names with:
print("sample names:", descriptor.sample_names)
sample names: ['system', 'atom']
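The rule that selection names must be a subset of the calculator's sample names amounts to a simple set check. The helper below, selection_names_are_valid, is hypothetical and only illustrates the rule; rascaline performs its own validation internally.

```python
def selection_names_are_valid(selection_names, sample_names):
    """Hypothetical helper: True if every selection name is also a sample name."""
    return set(selection_names) <= set(sample_names)


sample_names = ["system", "atom"]
print(selection_names_are_valid(["system"], sample_names))          # True
print(selection_names_are_valid(["system", "atom"], sample_names))  # True
print(selection_names_are_valid(["species"], sample_names))         # False
```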
We can use a subset of these names to define a selection. In this case, rascaline only computes the samples that match the labels in this selection (here, only atoms from systems 0, 2, and 3):
selection = Labels(
    names=["system"],
    values=np.array([[0], [2], [3]]),
)
descriptor_selected = calculator.compute(frames, selected_samples=selection)
descriptor_selected = descriptor_selected.keys_to_samples("center_type")
descriptor_selected = descriptor_selected.keys_to_properties(
    ["neighbor_1_type", "neighbor_2_type"]
)
samples = descriptor_selected.block().samples
After moving all keys into the samples and properties, there is a single block, and its samples["system"] should only contain the values [0, 2, 3]:
print(f"we have the following systems: {np.unique(samples['system'])}")
we have the following systems: [0 2 3]
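Conceptually, a selection that only names "system" keeps every sample whose system index appears in the selection, whatever its atom index. The pure-Python sketch below mimics that filtering on a made-up list of (system, atom) samples. The sample values are invented for illustration, and the code is not how rascaline implements selection internally.

```python
# Hypothetical samples as (system, atom) pairs, not taken from dataset.xyz
samples = [(0, 0), (0, 1), (1, 0), (2, 0), (2, 1), (3, 0), (4, 0)]
sample_names = ["system", "atom"]

# Selection using only the "system" name, like Labels(names=["system"], ...)
selection_names = ["system"]
selection_values = {(0,), (2,), (3,)}

# Column of each selection name inside the full sample names
columns = [sample_names.index(name) for name in selection_names]

# Keep samples whose values in those columns match one selection entry
selected = [
    sample
    for sample in samples
    if tuple(sample[c] for c in columns) in selection_values
]
print(selected)  # [(0, 0), (0, 1), (2, 0), (2, 1), (3, 0)]
print(sorted({system for system, _ in selected}))  # [0, 2, 3]
```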
If we want to select samples based not only on the system index but also on the atom index, we can do the following (here we select atom 0 of system 0 and atom 1 of system 2):
selection = Labels(
    names=["system", "atom"],
    values=np.array([[0, 0], [2, 1]]),
)
descriptor_selected = calculator.compute(frames, selected_samples=selection)
descriptor_selected = descriptor_selected.keys_to_samples("center_type")
descriptor_selected = descriptor_selected.keys_to_properties(
    ["neighbor_1_type", "neighbor_2_type"]
)
The values will have 2 rows, since we have two samples:
print(
    "shape of first block of descriptor:",
    descriptor_selected.block(0).values.shape,
)
shape of first block of descriptor: (2, 1800)
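The shape (2, 1800) can be decomposed with some arithmetic: one row per selected (system, atom) pair, and 180 properties for each (neighbor_1_type, neighbor_2_type) pair that keys_to_properties moved into the properties, so 1800 / 180 = 10 neighbor-type pairs. The sketch below assumes that only pairs with neighbor_1_type <= neighbor_2_type exist, and uses a hypothetical list of four atomic types as one way to obtain 10 such pairs; the actual types present in dataset.xyz are not shown in this example.

```python
from itertools import combinations_with_replacement

# The two (system, atom) pairs selected above give one row each
selection = [(0, 0), (2, 1)]
n_rows = len(selection)

# Properties per neighbor-type pair, from max_radial=6 and max_angular=4
properties_per_pair = (4 + 1) * 6 * 6

# Hypothetical atomic types; assumes unordered pairs (neighbor_1 <= neighbor_2)
atomic_types = [1, 6, 7, 8]
neighbor_pairs = list(combinations_with_replacement(atomic_types, 2))

n_columns = properties_per_pair * len(neighbor_pairs)
print(len(neighbor_pairs))  # 10
print((n_rows, n_columns))  # (2, 1800)
```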
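The previous selection method uses the same selection for all blocks. If you want to use different selections for different blocks, you should use a TensorMap to define your selection. In that case, the samples of each block in the TensorMap are used as the selection for the block with the same key in the output: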
descriptor = calculator.compute(frames)
descriptor_selected = calculator.compute(frames, selected_samples=selection)
Notice how we are passing a TensorMap as the selected_samples argument:
print(type(descriptor_selected))
descriptor_for_comparison = calculator.compute(
    frames, selected_samples=descriptor_selected
)
<class 'metatensor.tensor.TensorMap'>
The descriptor had 420 samples stored in its first block, while descriptor_selected had 0 there, so descriptor_for_comparison will also have 0 samples in its first block:
print("shape of first block initially:", descriptor.block(0).values.shape)
print(
    "shape of first block of reference:",
    descriptor_selected.block(0).values.shape,
)
print(
    "shape of first block after selection:",
    descriptor_for_comparison.block(0).values.shape,
)
shape of first block initially: (420, 180)
shape of first block of reference: (0, 180)
shape of first block after selection: (0, 180)
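The per-block behavior can be pictured with plain dictionaries: each output block is filtered using only the selection stored under the same key, so an empty selection for one block yields 0 rows there while other blocks keep their selected samples. In the sketch below, the block keys, the samples and the select_per_block helper are all hypothetical; it illustrates the idea of one selection per block, not rascaline's implementation.

```python
# Hypothetical blocks: key -> list of (system, atom) samples
blocks = {
    ("center_type", 1): [(0, 1), (0, 2), (2, 0), (2, 3)],
    ("center_type", 8): [(0, 0), (2, 1)],
}

# One selection per block key; an empty list selects nothing in that block
selections = {
    ("center_type", 1): [],
    ("center_type", 8): [(0, 0), (2, 1)],
}


def select_per_block(blocks, selections):
    """Keep, in every block, only the samples listed in that block's selection."""
    result = {}
    for key, samples in blocks.items():
        wanted = set(selections[key])
        result[key] = [s for s in samples if s in wanted]
    return result


selected = select_per_block(blocks, selections)
for key, samples in selected.items():
    print(key, len(samples))
# ('center_type', 1) 0
# ('center_type', 8) 2
```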