Biomodeling: water box
Hello, water box! 1_water_box in the tutorial files
Generate a water box in a 5-nm cube
-box: box dimension
-cs: type of water. By default, a 3-point water type.
-o: output file name
Open the output file using a text editor and see the format of PDB file
How many water molecules do we have in this 5-nm cubic box?
Log message will tell the number.
Otherwise, we can count by using wc command
Use VMD to see the water box.
Input files for MD simulation
topol.top: topology file
topol. top file contains information of force field and the simulation system.
copy files/topol.top to PWD (present working directory)
"." indicates the current directory.
A reusable part can be saved in a separate file, which can included in topol.top
To see the installed force fields, see $GMXDATA/top directory
In this example, we will use amber99sb.ff.
amber99sb.ff/forcefield.itp contains files with all force field parameters.
See the files that forcefield.itp includes.
amber99sb.ff/tip3p.itp contains "moleculetype" definition for the TIP3P water model.
grompp.mdp: MD Option file
MDP file contains MD simulation Options.
See https://manual.gromacs.org/current/user-guide/mdp-options.html for details.
Copy a sample MDP file to PWD (present working directory)
cp files/*.mdp .
Open grompp.mdp using a text editor
See each section of the file.
The first MD simulation
topol.tpr: Input file for MDRUN
Now, we have all input files for simulation:
Structure file: 1.pdb
gmx grompp program assembles information from all three files into a topol.tpr file
-pp: write the topology file interpreted by grompp. It writes a long file with all the parameters in a single file without using #include.
topol.tpr is a binary file.
To see the content of topol.tpr use gmx dump command.
The first MDRUN
Now, we perform MD by gmx mdrun program
-nt: the number of CPU cores for this run.
-gpu_id: GPU ID for this run if you have GPU in your system, 0, 1, 2, etc. If you have only one GPU in the system, use 0 or omit this option.
The simulation will fail because LJ interaction forces are too high (i.e., too much acceleration).
see md.log file
It is important to check the energy values
In normal MD simulations in a water box, the total energy of the system should be a huge negative value.
Minimize the system
Prepare a new topol.tpr file for minimization:
Compare mini.mdp with grompp.mdp:
diff mini.mdp grompp.mdp
-c: The minimized system will be written t0 2.pdb
see md.log and ener.edr files to see change in energy.
Try MD again
Open grompp.mdp and change the settings:
-c: Now, we are using the minimized structure.
it will generate some output files:
md.log : text log file
ener.edr : binary energy file
traj_comp.xtc : trajectory file
state.cpt : check point file for continuation
See traj_comp.xtc using VMD
traj_comp.xtc will look weird because water molecules at the boundary are broken.
This happens because gmx mdrun wants to have all atoms in a unit cell.
We can fix this by using gmx trjconv
Load tmp1.xtc to VMD, and see the difference.
We can fix PDB file using gmx trjconv
2.pdb file has broken water molecules because it is an output from gmx mdrun.
To prevent molecules from box crossing, use -pbc nojump option.
the starting conformation should be whole. 1.pdb file in this case.
MD for 10 ns
check box size and temperature
Structural properties of water: Radial distribution function, g(r)
g(r) tells the average spatial distribution (or probability) with respect to a particle.
g(r) is unitless and converges to 1 at large r.
Number of particles at rdr equals
g(r) * 4 pi rdr * average_number_density
g(r) can tell most thermodynamic and structural properties of simple liquid.
for example, G(r) = -kT ln g(r)
Calculation of g(r) using Gromacs
The analysis group here is oxygen atoms.
i.e., we want to calculate the distribution of O-O distance.
Generate an index file using gmx make_ndx
Index file (ndx) is just a collection of atom indices.
Default groups do not have oxygen atom groups.
Calculate g(r) using gmx rdf
Select "O*" group as a reference group
Select "O*" group as an analysis group.
ctrl+D to tell gmx rdf that you're done selction.
see rdf.xvg and rdf_cn.xvg files.
Plot rdf.xvg and rdf_cn_xvg using xmgrace or gnuplot.
Plot g(r) using gnuplot.
Run gnuplot by typing gnuplot in your terminal.
Type the followings in your gnuplot session:
The relationship between g(r) and the free energy G(r)
g(r) is related to the free energy G(r) by G(r) = -kT ln g(r).
Plot G(r) using gnuplot
The curve shows the average interaction free energy between water molecules.
Calculation of g(r) using VMD
Potential of mean force by Umbrella sampling
g(r) tells us the ensemble-averaged interaction for water pairs.
This is possible we can sample an ensemble of water configurations sufficiently.
In other words, this is now a rare event.
But, to calculate interaction ∆G for the hard-to-sample pairs, we need a technique that makes sampling rare events possible.
First, we need to select two water molecules, for which we want to calculate intermolecular interaction free energy.
The chance that these two specific water molecules meet is RARE!
Thus, it will take a very long time to sample the rare events sufficiently.
In umbrella sampling, we constraint the water-water pair by using a harmonic spring.
See PULL section in MDP options.
Generate index file for two water molecules
We will just select the first and the second water molecules. The selection will not change the results. All water molecules are equal!
Caution: The initial distance between chosen water molecules must be smaller than half of the box. Otherwise, it violated PBC.
Perform simulation in each window
0.2 nm window
See PULL section in pull.02.mdp
Generate pull.02.tpr for gmx mdrun
-t state.cpt: use information from the check point file.
Perform MDRUN for 1 ns
pullx files contain the length of spring
pullf files contain the force on the spring (F = -kx)
Repeat same things for other windows
Weighted Histogram Analysis Method (WHAM)
Umbrella potentials force sampling rare events.
But, the effect of umbrella potentials must be removed to get G(r).
Weighted histogram analysis method (WHAM) is a method that reconstructs G(r) by iteratively removing the effect of Umbrella potentials.
gmx wham is the Gromacs implemenation.
gmx wham requires two input files:
list of pullx files
list of tpr files
pullx-files.dat and tpr-files.dat must be ordered in the same way.
histo.xvg contains histogram (or probability distribution) of each window.
Plot histograms of all windows using gnuplot
profile.xvg file contains ∆G curve as a function of r.
Compare g(r) and ∆G using gnuplot
Note that kT ~ 2.5 kJ/mol nm
These are free energy difference ∆G(r), not the absolute free energy G(r). Thus, we can shift the curves by adding a constant.
What does 2.5*2*log($1) do?
How to estimate error?
gmx wham has bootstrap options.
More straightforward way is getting multiple results (or splitting simulation into multiple blocks) and calculate stardard errors.
How can we tell if the Umbrella sampling is long enough?
For many real problems, very long simulations are necessary.
Keep continuing each window.
When you have multiple pullx files for a window, you need to combine them into a pullx file. For example, awk command.
Calculate gmx wham frequently.
If PMF converges (i.e., PMF does not change any more), quit.
Otherwise, keep simulating.
Dynamic properties of water: Diffusion
Mean squared displacement (MSD) = 6D∆t
What is the pairwise interaction free energy between a pair of Na and Cl ions?
Add a pair of Na and Cl to water box.
You can do it manually using a text editor:
Change two water molecules to Na and Cl ions.
Change topol.top accordingly
Repeat the umbrella sampling for the Na-Cl pair