This project is under development and solely for internal use. Many parts are in flux, and no guarantees about correctness or stability can be made.
bentopy is built on a Rust core. A Rust compiler is required during installation. To check whether one is present, you can run
cargo --version
If it is not present, you can install it by any means you prefer. Installation through rustup is very convenient.
If you don't care about peeking into the sources and just want access to the program, this is the quickest option.
pip3 install git+https://github.com/marrink-lab/bentopy
git clone https://github.com/marrink-lab/bentopy
cd bentopy
python3 -m venv venv && source venv/bin/activate # Not required, but often convenient.
pip3 install .
bentopy currently features four subcommands: pack, render, mask, and grocat.
You can learn about the available options through the help information.
bentopy-pack --help
bentopy-mask --help
...
[!NOTE] For the moment, the bentopy subcommands can be accessed through the bentopy-<subcommand> pattern, for example bentopy-pack. Shortly, a bentopy root command will be introduced, which provides access through a bentopy <subcommand> pattern, e.g., bentopy pack. Throughout this document, the former usage is shown. For now, a dash between the words is necessary.
A typical bentopy workflow may look like this.
bentopy-grocat -> bentopy-mask -> bentopy-pack -> bentopy-render -> bentopy-grocat
What follows is a brief explanation and example invocation of these subcommands. A more detailed walkthrough can be found in the example section.
pack provides the core functionality of bentopy. Given an input configuration file, a packing of the input structures within the specified space is created.
bentopy-pack --rearrange --seed 5172 input.json placements.json
Pack a system defined in input.json and write the output placement list to placements.json. Prior to packing, rearrange the specified structures according to a size heuristic to improve the possible density, and set the random seed to 5172.
This packing is stored as a placement list, which is a json file that describes which structures at what rotations are placed where. In order to create a structure file (and topology file) from this placement list, the render subcommand can be used.
bentopy-render placements.json structure.gro -t topol.top
Render placements.json created by pack to a gro file at structure.gro and write a topology file to topol.top.
To set up a configuration for pack, you must define a space into which the
structures will be packed. This space can be defined according to an analytical
function, such as a sphere. But, more interestingly, bentopy is capable of
packing arbitrary spaces. These spaces can be provided as voxel masks. Any
boolean numpy array stored as a compressed file (.npz) of the correct dimensions can function as a valid mask.
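As an illustration of what such a mask file can look like, the following Python sketch builds a boolean array and stores it as a compressed .npz file. The shape, the slab region, the file name slab_mask.npz, and the choice that True marks the voxels belonging to the mask are assumptions made for this example and are not taken from bentopy itself.

```python
import numpy as np

# Example shape (assumption); it has to match the space set in the configuration.
shape = (200, 200, 200)

# Start from an empty mask and mark a slab in the middle of the box along z.
# Assumption: True marks the voxels that belong to the mask.
mask = np.zeros(shape, dtype=bool)
mask[:, :, 80:120] = True

# Store the array as a compressed archive. A single positional array ends up
# under numpy's default key "arr_0".
np.savez_compressed("slab_mask.npz", mask)

# Read it back to check that dtype, shape, and content survived the round trip.
with np.load("slab_mask.npz") as archive:
    restored = archive["arr_0"]
print(restored.dtype, restored.shape, int(restored.sum()))
```

Whether a hand-made mask like this one is accepted as-is may depend on details such as the array key inside the archive, so it is worth comparing against a file written by bentopy-mask.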
The mask subcommand provides a convenient and powerful means of setting up such masks based on your existing structures, right from the command line. mask can be used to automatically or manually select different compartments as determined by mdvcontainment.
bentopy-mask chrom_mem.gro mask.npz --autofill
Determine the compartments in chrom_mem.gro and automatically select the innermost compartment (--autofill). From that selected compartment, write a mask to mask.npz.
As the name suggests, grocat is a tool for concatenating gro files. Though this is a relatively simple operation, grocat provides a convenient way of telling apart different sections of large models by optionally specifying a new residue name for a whole file in the argument list by appending :<residue name> to a file path.
bentopy-grocat chromosome.gro:CHROM membrane.gro:MEM -o chrom_mem.gro
Concatenate chromosome.gro and membrane.gro into chrom_mem.gro, setting the residue names of the chromosome atoms to CHROM and those of the membrane to MEM in the concatenated structure.
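The path:<residue name> convention can be pictured with a few lines of Python. This is only a sketch of how such an argument could be split; split_resname is a hypothetical helper, and splitting on the last colon is an assumption, not necessarily how grocat treats paths that contain colons themselves.

```python
def split_resname(argument):
    """Split 'path:RESNAME' into (path, resname); resname is None if absent."""
    path, separator, resname = argument.rpartition(":")
    if not separator:
        # No colon at all: the whole argument is the path.
        return argument, None
    return path, resname

for argument in ["chromosome.gro:CHROM", "membrane.gro:MEM", "water.gro"]:
    print(split_resname(argument))
```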
Let's try to pack a spherical system full of lysozyme structures.
First, we want a structure to pack, so we can download the structure for 3LYZ. We place it in a structures directory to stay organized.
wget https://files.rcsb.org/download/3lyz.pdb
mkdir structures
mv 3lyz.pdb structures
Now we can set up our input configuration, which we will call 3lyz_input.json:
{
  "space": {
    "size": [100, 100, 100],
    "resolution": 0.5,
    "compartments": [
      {
        "id": "main",
        "shape": "spherical"
      }
    ]
  },
  "output": {
    "title": "3lyz",
    "topol_includes": [
      "forcefields/forcefield.itp",
      "structures/3lyz.itp"
    ]
  },
  "segments": [
    {
      "name": "3lyz",
      "number": 6500,
      "path": "structures/3lyz.pdb",
      "compartments": ["main"]
    }
  ]
}
We set the space up to a size of 100×100×100 nm³, with a resolution of 0.5 nm. The mask—the volume that defines where structures can be placed—is set to be derived from a spherical analytical function.
If you want to use a custom mask, such as one set up with bentopy-mask, you can specify the compartment in the following manner.
  "compartments": [
    {
      "id": "main",
-     "shape": "spherical"
+     "voxels": { "path": "mask.npz" }
    }
  ]
Here, voxels and the associated path point to a precomputed voxel mask.
This mask can be any data that can be loaded by np.load() and interpreted as a three-dimensional boolean mask. Along each axis, the number of voxels in the provided mask must equal the size given in the space section divided by the resolution.
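The relation between size, resolution, and mask shape can be written down as a small check. This is a sketch and not bentopy's own validation code: the function names are hypothetical, and rounding to the nearest whole voxel is an assumption for sizes that are not an exact multiple of the resolution.

```python
def expected_mask_shape(size_nm, resolution_nm):
    """Number of voxels along each axis for a box of size_nm at resolution_nm."""
    # Assumption: round to the nearest whole number of voxels.
    return tuple(round(length / resolution_nm) for length in size_nm)

def check_mask_shape(mask_shape, size_nm, resolution_nm):
    """Raise a ValueError if the mask shape does not fit the space definition."""
    expected = expected_mask_shape(size_nm, resolution_nm)
    if tuple(mask_shape) != expected:
        raise ValueError(f"mask has shape {tuple(mask_shape)}, expected {expected}")
    return expected

# The space from the example configuration: 100 nm per side, 0.5 nm voxels.
print(expected_mask_shape((100, 100, 100), 0.5))  # (200, 200, 200)
```

With numpy, the first argument of check_mask_shape would simply be the shape attribute of the loaded array.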
In output we set a title for the system. In the optional field topol_includes, we can specify which itp files are to be included if the placement list produced from this config is written to a topology file (.top).
Note
For this example, we filled this field with dummy paths.
Finally, in the segments section, we define a list of structures to place. In our case that is only one, which we give the name "3lyz", and we set the number of segments to place to 6500. Instead of a number, a concentration in mol/L can be provided as well. The volume over which that concentration applies is that of the segment's associated compartments. The path points pack to where the structure file for this segment can be found.
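To get a feeling for how a concentration relates to a number of instances, the standard conversion through Avogadro's constant can be sketched as follows. This is not bentopy's implementation: the function name, the rounding, and the assumption that the spherical compartment has a radius of 50 nm (half the box size) are ours.

```python
import math

AVOGADRO = 6.02214076e23   # particles per mol
LITER_PER_NM3 = 1e-24      # 1 nm^3 = 1e-27 m^3 = 1e-24 L

def instances_from_concentration(conc_mol_per_l, volume_nm3):
    """Number of particles at a molar concentration in a volume given in nm^3."""
    # Assumption: round to the nearest whole instance.
    return round(conc_mol_per_l * volume_nm3 * LITER_PER_NM3 * AVOGADRO)

# Assumed compartment: a sphere with a radius of 50 nm.
radius_nm = 50.0
sphere_volume_nm3 = 4.0 / 3.0 * math.pi * radius_nm**3

# A concentration of 1 mmol/L in that sphere.
print(instances_from_concentration(0.001, sphere_volume_nm3))
```

Going the other way, dividing a desired number of instances by the same factors gives the concentration that corresponds to it.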
Important
The name record must be selected carefully. If you want to write out a valid topology file using bentopy-render, the value of name must correspond to the names in the itp files.
Constraining segment rotations.
For some systems, it can be helpful or necessary to constrain the rotation
of certain segments. The rotation_axes parameter takes a string with the
axes over which a structure may be randomly rotated. Over axes that are not
mentioned, no random rotation will be applied. For instance, the axes
definition "xyz" indicates full rotational freedom and is the tacit default (rotation is allowed over x, y, and z axes), while "z" constrains the rotation such that it may only occur over the z-axis, leaving x and y rotation as provided in the structure file.
Additionally, one can set an initial_rotation for a segment.
It is given as a list of per-axis angles in degrees, [xangle, yangle, zangle], where the rotations are applied in x, y, z order.
This rotation will be applied to the structure as it is loaded from its file
and serves as the starting point for any subsequent rotations. The initial
rotation and constraining of the rotation axes as described above work together
to provide open-ended control of the possible rotations for segments.
{
  "name": "1a0s",
  "number": 100,
  "path": "structures/1a0s.pdb",
  "initial_rotation": [0, 90, 0],
  "rotation_axes": "x",
  "compartments": ["flat"]
}
With the above segment definition, up to 100 instances of the structure will be placed in the compartment with the id "flat". The structure will be rotated 90 degrees over its y-axis. Random rotations are applied only over its (post-initial rotation) x-axis.
This snippet thus allows placement of that structure over a yz plane, rotating over the axis that is perpendicular to that plane (x-axis).
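The interplay of the initial rotation and the constrained random rotation can be made concrete with plain rotation matrices. The sketch below is an illustration under assumptions, not bentopy's code: all function names are hypothetical, rotations are taken about the fixed coordinate axes, and one random angle is drawn per permitted axis, which is not a uniform sampling of orientations.

```python
import math
import random

def rot_x(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_y(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_z(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

ROTATIONS = {"x": rot_x, "y": rot_y, "z": rot_z}
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(matrix, vector):
    return [sum(matrix[i][k] * vector[k] for k in range(3)) for i in range(3)]

def initial_rotation(angles_deg):
    # The x rotation is applied first, then y, then z.
    m = IDENTITY
    for axis, angle in zip("xyz", angles_deg):
        m = matmul(ROTATIONS[axis](angle), m)
    return m

def random_rotation(rotation_axes, rng):
    # Only axes named in rotation_axes receive a random angle.
    m = IDENTITY
    for axis in "xyz":
        if axis in rotation_axes:
            m = matmul(ROTATIONS[axis](rng.uniform(0.0, 360.0)), m)
    return m

rng = random.Random(1312)
start = initial_rotation([0, 90, 0])
final = matmul(random_rotation("x", rng), start)

# The structure's original z-axis points along x after the initial rotation,
# and a random rotation over x leaves it there.
print([round(component, 6) for component in apply(final, [0.0, 0.0, 1.0])])
```

Whatever angle is drawn, the direction that started out as the structure's z-axis stays on the x-axis, which is why the instances all lie flat in a yz plane.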
Now, we are ready to pack the system. We could simply do this as follows.
bentopy-pack 3lyz_input.json 3lyz_placements.json
In order to make the procedure deterministic, the --seed parameter can be set. This means that the same command will produce the same output between runs.
bentopy-pack --seed 1312 3lyz_input.json 3lyz_placements.json
In case we want to pack multiple structures, we may want to pass the --rearrange flag as well. This will re-order the structures such that large
structures are placed first, and small structures are placed last. This
placement heuristic can lead to denser packings. When it is not set, the order
of the structures in the input configuration is respected.
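The idea behind this heuristic can be shown in a few lines. The sketch below sorts segments from large to small; the segment names and atom counts are made up, and using the atom count as the size measure is an assumption, since the measure bentopy actually uses is not described here.

```python
# Hypothetical segments with made-up atom counts standing in for "size".
segments = [
    {"name": "small", "atoms": 120},
    {"name": "large", "atoms": 9800},
    {"name": "medium", "atoms": 1500},
]

def rearrange(segments):
    """Return the segments ordered from largest to smallest."""
    # sorted() is stable, so equally sized segments keep their input order.
    return sorted(segments, key=lambda segment: segment["atoms"], reverse=True)

print([segment["name"] for segment in rearrange(segments)])
# ['large', 'medium', 'small']
```

Placing the large structures while the space is still empty leaves the remaining gaps for the small ones, which tend to fit in more easily.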
After the command finishes, we will find that 3lyz_placements.json has been created. This is a single-line json file, which can be hard to inspect. If you are curious, you can use a tool such as jq to look at what was written in a more readable form.
jq . 3lyz_placements.json
The output may look like this (some lines have been cut and adjusted for legibility).
{
  "title": "3lyz",
  "size": [ 100, 100, 100 ],
  "topol_includes": [ ... ],
  "placements": [
    {
      "name": "3lyz",
      "path": "structures/3lyz.pdb",
      "batches": [
        [
          [
            [ 1.0, 0.0, 0.0 ],
            [ 0.0, 1.0, 0.0 ],
            [ 0.0, 0.0, 1.0 ]
          ],
          [
            [ 8, 46, 68 ],
            [ 26, 62, 88 ],
            ... many many more of such lines ...
          ]
        ],
        [
          [
            [ 0.3658391780537972, -0.3882572475566672, -0.8458238619952991 ],
            [ -0.8851693094147572, -0.4258733932991502, -0.18736901171236636 ],
            [ -0.28746650147647396, 0.8172442490465064, -0.49947457185455224 ]
          ],
          [
            [ 31, 41, 56 ],
            [ 61, 53, 4 ],
            ... many many more of such lines ...
          ]
        ]
        ... and on and on and on ...
      ]
    }
  ]
}
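Going by the layout shown above, each entry in batches pairs one rotation matrix with a list of positions. Under that reading, the number of placed instances per segment can be counted with a short Python sketch. The function name is hypothetical and the small example dictionary is hand-written, not real output.

```python
def count_instances(placement_list):
    """Count the placed instances per segment name in a loaded placement list."""
    counts = {}
    for segment in placement_list["placements"]:
        # Each batch is read as a [rotation matrix, list of positions] pair.
        placed = sum(len(positions) for _rotation, positions in segment["batches"])
        counts[segment["name"]] = counts.get(segment["name"], 0) + placed
    return counts

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Hand-written stand-in for a placement list with two batches.
example = {
    "title": "3lyz",
    "size": [100, 100, 100],
    "placements": [
        {
            "name": "3lyz",
            "path": "structures/3lyz.pdb",
            "batches": [
                [identity, [[8, 46, 68], [26, 62, 88]]],
                [identity, [[31, 41, 56]]],
            ],
        }
    ],
}

print(count_instances(example))  # {'3lyz': 3}
```

For a real file, the dictionary would come from json.load() on the opened 3lyz_placements.json instead of the inline example.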
render reads in the placement list and writes out a gro file (and optionally, a top topology file). This is a separate operation,
since the packed systems can become very large. Storing the placement list as
an intermediate product decouples the hard task of packing from the simple work
of writing it into a structure file.
We want to render out the placement list we just created into a structure file
called 3lyz_sphere.gro. Additionally, we would like to produce a topology file (topol.top) that Gromacs uses to understand how the structure file is built up.
bentopy-render 3lyz_placements.json 3lyz_sphere.gro -t topol.top
You can now inspect the 3lyz_sphere.gro structure in a molecular visualization program of your preference.
But beware! We just created a big structure, and some programs may have a hard time keeping up.
Luckily, bentopy-render has some additional tricks up its sleeve to ease this load.
In case you want to inspect only a small part of a very large placement list,
the --limits option allows you to select a cuboid within the volume defined by the placement list from which the placed structures will be rendered. The volume that is cut out is defined by a sequence of six comma-separated values in the order minx,maxx,miny,maxy,minz,maxz. If a value is a number, it is interpreted as a dimension in nm. If it is not a number (the phrase 'none' is conventional), no limit is set for that bound.
For example, to only render a 10×10×10 nm cube extending from the point (40, 40, 40) to (50, 50, 50), we can pass the following limits.
bentopy-render 3lyz_placements.json 3lyz_small_cube.gro --limits 40,50,40,50,40,50
Perhaps we would like to see a pancake instead! To do this, we can define the limits only for the z-direction.
bentopy-render 3lyz_placements.json 3lyz_pancake.gro --limits none,none,none,none,45,55
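The way such a limits string is read can be mimicked in Python. This is a sketch of the described semantics only: parse_limits and inside are hypothetical names, and treating the bounds as inclusive and testing a single position per structure are assumptions.

```python
def parse_limits(text):
    """Parse 'minx,maxx,miny,maxy,minz,maxz' into three (low, high) pairs."""
    fields = text.split(",")
    if len(fields) != 6:
        raise ValueError("expected six comma-separated values")
    bounds = []
    for field in fields:
        try:
            bounds.append(float(field))
        except ValueError:
            # Anything that is not a number (such as 'none') means: no limit.
            bounds.append(None)
    return [(bounds[i], bounds[i + 1]) for i in (0, 2, 4)]

def inside(position_nm, limits):
    """Check whether a position lies within the limits (bounds inclusive)."""
    for value, (low, high) in zip(position_nm, limits):
        if low is not None and value < low:
            return False
        if high is not None and value > high:
            return False
    return True

pancake = parse_limits("none,none,none,none,45,55")
print(pancake)                               # [(None, None), (None, None), (45.0, 55.0)]
print(inside((12.0, 80.0, 50.0), pancake))   # True
print(inside((12.0, 80.0, 60.0), pancake))   # False
```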
Using --limits, we can cut out a part of the packed structure, but perhaps you want to inspect the total structure without loading as many atoms. For this, you can try the --mode option, which gives you the ability to only render out certain atoms (backbone, alpha carbon) or beads (representing each residue, or even only one per structure instance). By default, the mode is full, and we have just seen its output. Let's try alpha now.
bentopy-render 3lyz_placements.json 3lyz_alpha.gro --mode alpha
Now, we can compare the sizes of the files.
wc -l 3lyz_sphere.gro 3lyz_alpha.gro
Reducing the number of atoms that are rendered out can improve the time it takes to inspect a packing, if necessary.
[!NOTE] Using modes other than full (the default) is obviously not relevant beyond inspection and analysis of the packed structure. To reflect this, the option to write a topology file and setting a mode are mutually exclusive.
The residue numbers can be assigned to the atoms in the output structure file in two ways. This behavior can be set using the --resnum-mode option.
- --resnum-mode instance: each instance of a segment will have its own residue number. The first instance that is placed will have a residue number of 1, the second is 2, etc.
- --resnum-mode segment: all instances of a segment will have the same residue number. The whole group of placed structures for a segment can be selected by its associated group residue number. In a system with a hundred instances of two segments each, the hundred structures for the first segment can be selected with residue number 1, the hundred structures for the second segment with residue number 2.
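The difference between the two modes can be seen in a toy example. The function below is a hypothetical sketch, not code from bentopy; it assumes that in instance mode the numbering simply continues from one segment into the next.

```python
def residue_numbers(instance_counts, mode):
    """Residue number for every placed instance, given the counts per segment."""
    numbers = []
    next_instance = 1
    for segment_number, count in enumerate(instance_counts, start=1):
        for _ in range(count):
            if mode == "instance":
                # Every placed instance gets a number of its own.
                numbers.append(next_instance)
                next_instance += 1
            elif mode == "segment":
                # All instances of one segment share the segment's number.
                numbers.append(segment_number)
            else:
                raise ValueError(f"unknown mode: {mode!r}")
    return numbers

# Two segments, with two and three placed instances.
print(residue_numbers([2, 3], "instance"))  # [1, 2, 3, 4, 5]
print(residue_numbers([2, 3], "segment"))   # [1, 1, 2, 2, 2]
```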
In case you want to render out a structure based on a placement list that you
or a colleague have created in a different environment, it can be useful to
direct render to read the input structures from a different directory. To do
this, you can set a root path for the structures with the --root option. This path will be prepended to any relative structure path that is defined in the placement list.
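The effect of a root path on the structure paths can be sketched with pathlib. The helper name and the example directory /data/shared are hypothetical; the sketch only mirrors the rule described above, in which relative paths get the root prepended and absolute paths are left alone.

```python
from pathlib import PurePosixPath

def resolve_structure_path(path, root=None):
    """Prepend root to a relative structure path; leave absolute paths as-is."""
    path = PurePosixPath(path)
    if root is None or path.is_absolute():
        return path
    return PurePosixPath(root) / path

# A relative path from the placement list, resolved against an example root.
print(resolve_structure_path("structures/3lyz.pdb", root="/data/shared"))
# /data/shared/structures/3lyz.pdb

# An absolute path is not changed by the root.
print(resolve_structure_path("/opt/models/3lyz.pdb", root="/data/shared"))
# /opt/models/3lyz.pdb
```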