Notebook01: Compute Order Parameters#

In this notebook, we’ll show you how to do the complete pipeline from downloading a united-atom trajectory file (Berger POPC) from Zenodo, reconstructing hydrogens and calculating the order parameters using buildH. In addition, we show how to plot the calculated order parameters using matplotlib.

Foreword#

Launching Unix commands within a Jupyter Notebook#

Even though buildH can be used as a module (see Notebook04), it is mainly intended to be used in the Unix command line in a terminal. Thus, many cells in this notebook will run Unix commands instead of Python instructions. All Unix commands are prefixed with the symbol !. For example:

[1]:
!ls
img
Notebook_01_buildH_calc_OP.ipynb
Notebook_02_buildH_calc_OP_outputwH.ipynb
Notebook_03_mixture_POPC_POPE.ipynb
Notebook_04_library.ipynb
Notebook_05_mixture_POPC_cholesterol.ipynb

Checking buildH installation#

First, you have to install buildH. For that you can follow the instructions on the main README of the buildH repository or the installation help page. If you did not install buildH, this notebook will not work.

Before launching this notebook, you should have activated the conda environment from buildH (the $ below corresponds to you prompt) :

$ conda activate buildh

If this command worked, you should have got the message (buildh) at the beginning of your prompt stating that the buildh environment has been properly activated.

Once done, you have launched this Jupyter notebook :

$ jupyter lab

At this point, we should be able to launch buildH within this notebook:

[2]:
!buildH
usage: buildH [-h] [-v] -c COORD [-t TRAJ] -l LIPID
              [-lt LIPID_TOPOLOGY [LIPID_TOPOLOGY ...]] -d DEFOP
              [-opx OPDBXTC] [-o OUT] [-b BEGIN] [-e END]  [-igch3]
buildH: error: the following arguments are required: -c/--coord, -l/--lipid, -d/--defop

If you do not get this usage message and have some error, you probably did not activate correctly the buildH conda environment before launching this notebook. One other possibility is that the installation of the buildH environment did not succeed. In such a case, please refer to buildH installation.

Trajectory download#

Most of the time, you will use buildH on trajectories that you have produced. Here, for the sake of learning, let us download a trajectory of a Berger POPC membrane from Zenodo (taken from NMRlipidsI project). This should take a while for the trr file, so be patient!

[3]:
!wget https://zenodo.org/record/13279/files/popc0-25ns.trr
!wget https://zenodo.org/record/13279/files/endCONF.gro
!ls
--2021-07-06 16:49:43--  https://zenodo.org/record/13279/files/popc0-25ns.trr
Resolving zenodo.org (zenodo.org)... 137.138.76.77
Connecting to zenodo.org (zenodo.org)|137.138.76.77|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1712544744 (1.6G) [application/octet-stream]
Saving to: ‘popc0-25ns.trr’

popc0-25ns.trr      100%[===================>]   1.59G  20.5MB/s    in 74s

2021-07-06 16:50:58 (21.9 MB/s) - ‘popc0-25ns.trr’ saved [1712544744/1712544744]

--2021-07-06 16:50:59--  https://zenodo.org/record/13279/files/endCONF.gro
Resolving zenodo.org (zenodo.org)... 137.138.76.77
Connecting to zenodo.org (zenodo.org)|137.138.76.77|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1968406 (1.9M) [application/octet-stream]
Saving to: ‘endCONF.gro’

endCONF.gro         100%[===================>]   1.88M  10.5MB/s    in 0.2s

2021-07-06 16:51:00 (10.5 MB/s) - ‘endCONF.gro’ saved [1968406/1968406]

endCONF.gro
img
Notebook_01_buildH_calc_OP.ipynb
Notebook_02_buildH_calc_OP_outputwH.ipynb
Notebook_03_mixture_POPC_POPE.ipynb
Notebook_04_library.ipynb
Notebook_05_mixture_POPC_cholesterol.ipynb
popc0-25ns.trr

OK the 2 files freshly downloaded are here. Now we need one last file. It’s a definition file relating generic names for each C-H bond to the atom names in the PDB file. In buildH, there is a specific directory def_files from which you can download such a def file. (note that you can find some also on the NMRlipids repo). Here we want to download the .def file corresponding to POPC Berger :

[4]:
!wget https://raw.githubusercontent.com/patrickfuchs/buildH/master/def_files/Berger_POPC.def
!ls
--2021-07-06 16:52:18--  https://raw.githubusercontent.com/patrickfuchs/buildH/master/def_files/Berger_POPC.def
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2135 (2.1K) [text/plain]
Saving to: ‘Berger_POPC.def’

Berger_POPC.def     100%[===================>]   2.08K  --.-KB/s    in 0.001s

2021-07-06 16:52:18 (3.09 MB/s) - ‘Berger_POPC.def’ saved [2135/2135]

Berger_POPC.def
endCONF.gro
img
Notebook_01_buildH_calc_OP.ipynb
Notebook_02_buildH_calc_OP_outputwH.ipynb
Notebook_03_mixture_POPC_POPE.ipynb
Notebook_04_library.ipynb
Notebook_05_mixture_POPC_cholesterol.ipynb
popc0-25ns.trr

Let us have a look to this def file:

[5]:
!head Berger_POPC.def
gamma1_1 POPC C1  H11
gamma1_2 POPC C1  H12
gamma1_3 POPC C1  H13
gamma2_1 POPC C2  H21
gamma2_2 POPC C2  H22
gamma2_3 POPC C2  H23
gamma3_1 POPC C3  H31
gamma3_2 POPC C3  H32
gamma3_3 POPC C3  H33
beta1 POPC C5  H51

The first column is the generic C-H name, second is the residue (lipid) name, third column the carbon PDB name, last column the hydrogen PDB name. This file will tell buildH which Hs you want to rebuild, and then to calculate the order parameter on the corresponding C-Hs. It means that it is possible to provide only a subset of possible C-Hs if, for example, you only want those from the polar heads. Note that if you request buildH to output a trajectory (not done in this notebook), the def file must contain all possible C-Hs of the lipid considered. In such a case, the newly built H will have the name written in the 4th column of this def file.

Now, let’s have a quick look to the .gro file:

[6]:
!head endCONF.gro
Generated by trjconv : 2:1:1 sdpc:SM:CHOL system 256 lipids t= 50000.00000
28526
    1PLA     C1    1   5.576   3.681   2.227 -0.0394  0.0935 -0.3339
    1PLA     C2    2   5.553   3.886   2.337  0.1128 -0.0058 -0.1560
    1PLA     C3    3   5.762   3.834   2.277 -0.6144  0.1201 -0.8703
    1PLA     N4    4   5.640   3.768   2.326 -0.1182 -0.1907 -0.0302
    1PLA     C5    5   5.658   3.709   2.460 -0.2986 -0.2323 -0.0245
    1PLA     C6    6   5.766   3.607   2.495  0.4136  0.5471  0.0677
    1PLA     O7    7   5.752   3.488   2.417  0.3142  0.2538  0.5323
    1PLA     P8    8   5.893   3.411   2.413  0.2220  0.1104  0.0738

We see that in this .gro file containing POPC lipids, the residue name is called PLA. This residue name will be of importance when launching buildH.

Great, all files were successfully downloaded, we can now proceed. Let us launch buildH!

Launching buildH for order parameter calculation#

Now, we can launch buildH on this downloaded trajectory. For that we need a few arguments :

  • the -c argument corresponding to the coordinate file. It consists of a pdb or gro file, here endCONF.gro, buildH infers the topology of the system from it;

  • the -l argument which tells which lipid we want to analyze, here Berger_PLA (Berger corresponds to the force field name used for this simulation, PLA is the residue name of the lipid we want to analyze in the pdb/gro file); note that if you simulated a mixture of different lipids, you will have to launch buildH separately for each lipid (see Notebook03 and Notebook05);

  • the -d argument corresponding to the def file we just downloaded, here Berger_POPC.def ;

  • the -t argument which is the trajectory file, here popc0-25ns.trr.

  • the -o argument which tells buildH the output file name for storing the calculated order parameters.

Please be patient, this will take a while since we have 2500 frames. Fortunately, buildH geometrical calculations are accelerated using Numba. On single core Xeon @ 3.60GHz, it took ~ 7’.

[7]:
!buildH -c endCONF.gro -l Berger_PLA -d Berger_POPC.def -t popc0-25ns.trr -o OP_POPC_Berger_25ns.out
Constructing the system...
System has 28526 atoms
Dealing with frame 0 at 0.0 ps.
Dealing with frame 1 at 10.0 ps.
Dealing with frame 2 at 20.0 ps.
Dealing with frame 3 at 30.0 ps.
Dealing with frame 4 at 40.0 ps.
Dealing with frame 5 at 50.0 ps.
Dealing with frame 6 at 60.0 ps.
Dealing with frame 7 at 70.0 ps.
Dealing with frame 8 at 80.0 ps.
Dealing with frame 9 at 90.0 ps.
Dealing with frame 10 at 100.0 ps.
Dealing with frame 11 at 110.0 ps.
Dealing with frame 12 at 120.0 ps.
Dealing with frame 13 at 130.0 ps.
Dealing with frame 14 at 140.0 ps.
Dealing with frame 15 at 150.0 ps.
Dealing with frame 16 at 160.0 ps.
Dealing with frame 17 at 170.0 ps.
Dealing with frame 18 at 180.0 ps.
Dealing with frame 19 at 190.0 ps.
Dealing with frame 20 at 200.0 ps.
Dealing with frame 21 at 210.0 ps.
Dealing with frame 22 at 220.0 ps.
^C
Traceback (most recent call last):
  File "/home/fuchs/software/miniconda3/envs/buildh/bin/buildH", line 33, in <module>
    sys.exit(load_entry_point('buildh', 'console_scripts', 'buildH')())
  File "/home/fuchs/papers/buildH_project/buildH/buildh/UI.py", line 148, in entry_point
    main(args.coord, args.traj, args.defop, args.out, args.opdbxtc, dic_lipid, args.begin, args.end)
  File "/home/fuchs/papers/buildH_project/buildH/buildh/UI.py", line 341, in main
    core.fast_build_all_Hs_calc_OP(universe_woH, begin, end, dic_OP,
  File "/home/fuchs/papers/buildH_project/buildH/buildh/core.py", line 533, in fast_build_all_Hs_calc_OP
    Hs_coor = buildHs_on_1C(Cname_position, typeofH2build,
  File "/home/fuchs/papers/buildH_project/buildH/buildh/core.py", line 59, in buildHs_on_1C
    H1_coor, H2_coor, H3_coor = hydrogens.get_CH3(atom,
  File "/home/fuchs/papers/buildH_project/buildH/buildh/hydrogens.py", line 114, in get_CH3
    coor_He = LENGTH_CH_BOND * unit_vect_He + atom
KeyboardInterrupt

On the above cell, we launched buildH to show you how it works. For each frame handled by buildH, a message Dealing with frame XXX at YYY ps. is printed to the output. Because it takes some time and because it should have printed 2500 Dealing with frame..., we just pressed Ctrl-C to get this notebook not too much polluted by buildH output. This explains why there is a Python KeyboardInterrupt message.

Note that, you can also launch buildH in a regular shell that way:

$ buildH -c endCONF.gro -l Berger_PLA -d Berger_POPC.def -t popc0-25ns.trr -o OP_POPC_Berger_25ns.out

Or that way:

$ buildH -c endCONF.gro -l Berger_PLA -d Berger_POPC.def -t popc0-25ns.trr -o OP_POPC_Berger_25ns.out > /dev/null

In this case, all the messages Dealing with frame... go to /dev/null, thus no more lengthy output ;-)

Once completed, we should have a new file called OP_POPC_Berger_25ns.out containing the order parameters. You can choose any name with the option -o when you launch buildH. If you did not run buildH on the full trajectory, you can find that file here.

[9]:
!ls
Berger_POPC.def
endCONF.gro
img
Notebook_01_buildH_calc_OP.ipynb
Notebook_02_buildH_calc_OP_outputwH.ipynb
Notebook_03_mixture_POPC_POPE.ipynb
Notebook_04_library.ipynb
Notebook_05_mixture_POPC_cholesterol.ipynb
OP_POPC_Berger_25ns.out
popc0-25ns.trr

Analyzing results#

First, let’s have a look to what this output file OP_POPC_Berger_25ns.out contains.

[10]:
!head OP_POPC_Berger_25ns.out
# OP_name            resname atom1 atom2  OP_mean OP_stddev OP_stem
#--------------------------------------------------------------------
gamma1_1             PLA     C1    H11    0.01788  0.09262  0.00819
gamma1_2             PLA     C1    H12   -0.00744  0.03963  0.00350
gamma1_3             PLA     C1    H13   -0.00855  0.03601  0.00318
gamma2_1             PLA     C2    H21    0.01850  0.09283  0.00820
gamma2_2             PLA     C2    H22   -0.00666  0.04559  0.00403
gamma2_3             PLA     C2    H23   -0.00894  0.04016  0.00355
gamma3_1             PLA     C3    H31    0.01854  0.09235  0.00816
gamma3_2             PLA     C3    H32   -0.01158  0.03796  0.00335

For each C-H we get the mean order parameter OP_mean (average over frames and over residues), its standard deviation OP_stddev (standard deviation over residues), and its standard error of the mean OP_stem (OP_stddev divided by the square root of the number of residues). More information about how OP_mean, OP_stddev and OP_stem are computed can be found in the documentation: Order parameters and statistics

Now we can make a plot of the order parameter. For that, we can use Pandas and its very convenient dataframes to load buildH output. Pandas dataframes allow the selection of any row or column in a very powerful way. If you are not familiar with pandas dataframes, you can take a look here. For reading buildH output and generate a dataframe, we use the Pandas function .read_csv() which has a great deal of options to read a file. Here we tell it to use a specific separator (\s+ which means any combination of white spaces), to skip 2 lines at the beginning of the files (beginning with a #) and to assign specific names to each column.

[11]:
import pandas as pd

df = pd.read_csv("OP_POPC_Berger_25ns.out", sep="\s+", skiprows=2,
                 names=["CH_name", "res", "carbon", "hydrogen", "OP", "STD", "STEM"])
df
[11]:
CH_name res carbon hydrogen OP STD STEM
0 gamma1_1 PLA C1 H11 0.01788 0.09262 0.00819
1 gamma1_2 PLA C1 H12 -0.00744 0.03963 0.00350
2 gamma1_3 PLA C1 H13 -0.00855 0.03601 0.00318
3 gamma2_1 PLA C2 H21 0.01850 0.09283 0.00820
4 gamma2_2 PLA C2 H22 -0.00666 0.04559 0.00403
... ... ... ... ... ... ... ...
77 oleoyl_C17a PLA CA1 HA11 -0.05500 0.01999 0.00177
78 oleoyl_C17b PLA CA1 HA12 -0.05225 0.01950 0.00172
79 oleoyl_C18a PLA CA2 HA21 0.03903 0.02647 0.00234
80 oleoyl_C18b PLA CA2 HA22 -0.05292 0.02033 0.00180
81 oleoyl_C18c PLA CA2 HA23 -0.05025 0.01972 0.00174

82 rows × 7 columns

If we want only the order parameter for the polar head we can use the following. Suffixing the dataframe with a list of lists [["CH_name", "OP"]] allows us to select the columns we are interested in. Then, with the .iloc() method we can get the 18 first rows corresponding to the polar head (beware it starts counting at 0 like all sequential objects in Python).

[12]:
df[["CH_name", "OP"]].iloc[:18]
[12]:
CH_name OP
0 gamma1_1 0.01788
1 gamma1_2 -0.00744
2 gamma1_3 -0.00855
3 gamma2_1 0.01850
4 gamma2_2 -0.00666
5 gamma2_3 -0.00894
6 gamma3_1 0.01854
7 gamma3_2 -0.01158
8 gamma3_3 -0.00715
9 beta1 0.03634
10 beta2 0.06587
11 alpha1 0.10349
12 alpha2 0.13568
13 g3_1 -0.28403
14 g3_2 -0.15962
15 g2_1 -0.16767
16 g1_1 0.20330
17 g1_2 0.08910

Let us do the same for the palmitoyl chain.

[13]:
df[["CH_name", "OP"]].iloc[18:49]
[13]:
CH_name OP
18 palmitoyl_C2a -0.17593
19 palmitoyl_C2b -0.16136
20 palmitoyl_C3a -0.19742
21 palmitoyl_C3b -0.18853
22 palmitoyl_C4a -0.19367
23 palmitoyl_C4b -0.18011
24 palmitoyl_C5a -0.19815
25 palmitoyl_C5b -0.19364
26 palmitoyl_C6a -0.19251
27 palmitoyl_C6b -0.19193
28 palmitoyl_C7a -0.18894
29 palmitoyl_C7b -0.19106
30 palmitoyl_C8a -0.17809
31 palmitoyl_C8b -0.17827
32 palmitoyl_C9a -0.17190
33 palmitoyl_C9b -0.17066
34 palmitoyl_C10a -0.15468
35 palmitoyl_C10b -0.15328
36 palmitoyl_C11a -0.14349
37 palmitoyl_C11b -0.14297
38 palmitoyl_C12a -0.12392
39 palmitoyl_C12b -0.12448
40 palmitoyl_C13a -0.11173
41 palmitoyl_C13b -0.11094
42 palmitoyl_C14a -0.09097
43 palmitoyl_C14b -0.09039
44 palmitoyl_C15a -0.07367
45 palmitoyl_C15b -0.07496
46 palmitoyl_C16a 0.11169
47 palmitoyl_C16b -0.07181
48 palmitoyl_C16c -0.07307

And now, for the oleoyl chain.

[14]:
df[["CH_name", "OP"]].iloc[49:]
[14]:
CH_name OP
49 oleoyl_C2a -0.15474
50 oleoyl_C2b -0.17336
51 oleoyl_C3a -0.16911
52 oleoyl_C3b -0.17847
53 oleoyl_C4a -0.17385
54 oleoyl_C4b -0.18235
55 oleoyl_C5a -0.17504
56 oleoyl_C5b -0.17690
57 oleoyl_C6a -0.16138
58 oleoyl_C6b -0.16692
59 oleoyl_C7a -0.15512
60 oleoyl_C7b -0.15994
61 oleoyl_C8a -0.10190
62 oleoyl_C8b -0.10603
63 oleoyl_C9a -0.08281
64 oleoyl_C10a -0.01791
65 oleoyl_C11a -0.05119
66 oleoyl_C11b -0.04599
67 oleoyl_C12a -0.09275
68 oleoyl_C12b -0.08879
69 oleoyl_C13a -0.08643
70 oleoyl_C13b -0.08259
71 oleoyl_C14a -0.09133
72 oleoyl_C14b -0.08893
73 oleoyl_C15a -0.07729
74 oleoyl_C15b -0.07524
75 oleoyl_C16a -0.07256
76 oleoyl_C16b -0.07041
77 oleoyl_C17a -0.05500
78 oleoyl_C17b -0.05225
79 oleoyl_C18a 0.03903
80 oleoyl_C18b -0.05292
81 oleoyl_C18c -0.05025

Plot of the order parameter#

Now that we have loaded the data with pandas, and that we can select any part of the lipid, we can use the matplotlib module to make a nice plot. For example, we can plot the order parameter of the palmitoyl chain. Note that we select rows 18 to 45 to show only the first 15 carbons, since we do not want the final CH3 of carbon 16.

[15]:
import numpy as np
import matplotlib.pyplot as plt

OPs = df["OP"].iloc[18:46]

x = np.arange(len(OPs))
plt.scatter(x, -OPs)
plt.xticks(x, df["CH_name"].iloc[18:46], rotation='vertical')
plt.xlabel("C-H name")
plt.ylabel("-S_CH")
plt.title("Order parameter palmitoyl chain, Berger POPC")
[15]:
Text(0.5, 1.0, 'Order parameter palmitoyl chain, Berger POPC')
../_images/notebooks_Notebook_01_buildH_calc_OP_29_1.png

Another example with the polar head, on the alpha beta of the choline and glycerol C-Hs.

[16]:
import numpy as np
import matplotlib.pyplot as plt

OPs = df["OP"].iloc[9:18]
STEMs = df["STEM"].iloc[9:18]

# These 2 avoid plotting a line between the points.
lines = {'linestyle': 'None'}
plt.rc('lines', **lines)

x = np.arange(len(OPs))
plt.errorbar(x, -OPs, STEMs, fmt='', marker='.')
plt.xticks(x, df["CH_name"].iloc[9:18], rotation='vertical')
plt.xlabel("C-H name")
plt.ylabel("-S_CH")
plt.title("Order parameter polar head, Berger POPC")
[16]:
Text(0.5, 1.0, 'Order parameter polar head, Berger POPC')
../_images/notebooks_Notebook_01_buildH_calc_OP_31_1.png

Since the error bars are very small, we used marker='.' to draw tiny points. If we use the standard marker='o', the error bars are often not seen because they are smaller than the point itself.

Conclusion#

In this notebook, we showed you how to use buildH to build hydrogens from a united-atom trajectory and calculate the order parameter on it. Then, we showed some convenient use of pandas and matplotlib Python modules to select the data we are interested in and plot the corresponding results.