Lindqvist -- a blog about Linux and Science. Mostly.: PES


16 April 2014

572. autorotate/superimpose python script

If you want to calculate reaction coordinates between two structures you need to make sure that the structures haven't been rotated or translated, something which easily happens if you allow symmetry in gaussian and (it seems) z-matrix in nwchem.

I've written a script that lets you take two structures and align and superimpose them so that only the atoms that take part in the reaction move.

It works by having you define at least four atoms that aren't supposed to move /relative to each other/ between the two structures (i.e. they can be translated or rotated as a group, just not relative to each other). Four atoms far from each other are ideal. They must also not all lie in the same plane: together they need to span three dimensions.
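A quick way to test the 'not in the same plane' requirement before running the script is a scalar triple product. This is only a sketch in Python 3 with the standard library; the function names and the tolerance are my own choices for illustration and are not part of autorotate.py.

```python
def triple_product(p0, p1, p2, p3):
    """Six times the signed volume of the tetrahedron spanned by four points."""
    a = [p1[i] - p0[i] for i in range(3)]
    b = [p2[i] - p0[i] for i in range(3)]
    c = [p3[i] - p0[i] for i in range(3)]
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


def is_coplanar(p0, p1, p2, p3, tol=1e-6):
    """True if the four points lie (almost) in one plane; tol is an assumed cutoff."""
    return abs(triple_product(p0, p1, p2, p3)) < tol


# Atoms A, B, C and D of a.xyz in the example that follows
print(is_coplanar((0, 0, 1), (0, 1, 0), (1, 0, 0), (-1, 0, 0)))  # False
```

If this returns True for your four atoms, pick a different set.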

I've tried this on real molecules too and it works better than I'd ever dared to hope for. The more 'stationary' atoms that you can use to make up the transformation matrix, the better.

Example:
In this example atom F is in a different location in structures a and b. The structures have also been rotated relative to each other.

a.xyz
6
A
A               0.00000             0.00000             1.00000
B               0.00000             1.00000             0.00000
C               1.00000             0.00000             0.00000
D              -1.00000             0.00000             0.00000
E               0.00000             0.00000            -1.00000
F               0.00000             0.500000            0.00000

b.xyz
6
B
A 1 0 0
B 0 1 0
C 0 0 -1
D 0 0 1
E -1 0 0
F 0 -1 0

./autorotate.py a.xyz b.xyz '1,2,3,4'

Selected atoms in molecules 1 and 2
['A', 0.0, 0.0, 1.0] ['A', 1.0, 0.0, 0.0]
['B', 0.0, 1.0, 0.0] ['B', 0.0, 1.0, 0.0]
['C', 1.0, 0.0, 0.0] ['C', 0.0, 0.0, -1.0]
['D', -1.0, 0.0, 0.0] ['D', 0.0, 0.0, 1.0]
Transformation max error: 3.33066907388e-16
Writing to  a.rot.xyz

a.rot.xyz
6
A
A               1.00000             0.00000             0.00000
B              -0.00000             1.00000            -0.00000
C              -0.00000             0.00000            -1.00000
D              -0.00000             0.00000             1.00000
E              -1.00000            -0.00000            -0.00000
F              -0.00000             0.50000            -0.00000

This is how it looks (note that the axes aren't aligned with (1,0,0; 0,1,0; 0,0,1) but seem to go through the centre of the molecule):

a.xyz

a.rot.xyz

b.xyz

Code:

#!/usr/bin/python
import sys
import numpy as np

#autorotate input_1.xyz input_2.xyz '1,2,3,4'
# need to pick at least four atoms that are not in the same plane
# input_1.xyz will be rotated to align with input_2.xyz
# you pick at least four atoms that should have the same positions
# relative to one another (i.e. distance and relative geometry). These 
# are then used to calculate an affine transform matrix which is used 
# to rotate and translate structure input_1.xyz to overlap with 
# structure 2

def formatinput(argument):
 infile1=sys.argv[1]
 atoms=sys.argv[3]
 atoms=atoms.split(',')
 coord_sys=[]

 for n in atoms:
  coord_sys+=[int(n)-1]
 try:
  infile2=sys.argv[2]
 except:
  infile2=''
 infile=[infile1,infile2]
 return infile,coord_sys
 
def getrawdata(infile):
 f=open(infile,'r')
 
 n=0
 preamble=[]
 
 struct=[]
 
 for line in f:
  if n<2:
   preamble+=[line.rstrip()]
  if n>1:
   line=line.rstrip()
   struct+=[line]
  n+=1
 xyz=[struct]
 
 return xyz, preamble

def getcoords(rawdata,preamble,atoms):
 
 n=0
 cartesian=[]
 
 for structure in rawdata:
  n=n+1
  num="%03d" % (n,)
  for item in structure:
   
   coordx=filter(None,item.split(' '))
   coordy=filter(None,item.split('\t'))
   if len(coordx)>len(coordy):
    coords=coordx
   else:
    coords=coordy
      
   coordinates=[float(coords[1]),float(coords[2]),float(coords[3])]
   element=coords[0]
   cartesian+=[[element,float(coords[1]),float(coords[2]),float(coords[3])]]
     
 return cartesian

def getstructures(rawdata,preamble):
 
 n=0
 cartesian=[]
 
 for structure in rawdata:
  n=n+1
  num="%03d" % (n,)
  for item in structure:
   
   coordx=filter(None,item.split(' '))
   coordy=filter(None,item.split('\t'))
   if len(coordx)>len(coordy):
    coords=coordx
   else:
    coords=coordy
      
   coordinates=[float(coords[1]),float(coords[2]),float(coords[3])]
   element=coords[0]
   cartesian+=[coordinates]
     
 return cartesian

def affine_transform(atoms,structures):
# from http://stackoverflow.com/questions/20546182/how-to-perform-coordinates-affine-transformation-using-python-part-2
 primaryatomcoords=[]
 for n in atoms:
  primaryatomcoords+=[structures[0][n]]

 secondaryatomcoords=[]
 for n in atoms:
  secondaryatomcoords+=[structures[1][n]]

 primary = np.array(primaryatomcoords)
 secondary = np.array(secondaryatomcoords)
 primaryfull = np.array(structures[0])

 # Pad the data with ones, so that our transformation can do translations too
 n = primary.shape[0]
 pad = lambda x: np.hstack([x, np.ones((x.shape[0], 1))])
 unpad = lambda x: x[:,:-1]
 X = pad(primary)
 Y = pad(secondary)
 Xp= pad(primaryfull)

 # Solve the least squares problem X * A = Y
 # to find our transformation matrix A
 A, res, rank, s = np.linalg.lstsq(X, Y)

 transform = lambda x: unpad(np.dot(pad(x), A))

# print "Max error should be as small as possible if the rotation is successful"
# print "If max error is large you may have selected a bad set of atoms"
 print "Transformation max error:", np.abs(secondary - transform(primary)).max()
 secondaryfull=transform(primaryfull)
 return secondaryfull

def transform_xyz(tmatrix,newxyz):
 final_xyz=[]
 for n in newxyz:
  coord=np.mat(str(n[0])+';'+str(n[1])+';'+str(n[2]))
  newcoord=np.dot(tmatrix,coord)
  newcoord=np.matrix.tolist(newcoord)
  final_xyz+=[[ newcoord[0][0],newcoord[1][0],newcoord[2][0]]]
 return final_xyz

def genxyzstring(coords,elementnumber):
 x_str='%10.5f'% coords[0]
 y_str='%10.5f'% coords[1]
 z_str='%10.5f'% coords[2]
 element=elementnumber
 xyz_string=element+(3-len(element))*' '+10*' '+\
 (8-len(x_str))*' '+x_str+10*' '+(8-len(y_str))*' '+y_str+10*' '+(8-len(z_str))*' '+z_str+'\n'
 
 return xyz_string

def write_aligned(aligned_structure,atom_coords,preamble,outfile):
 outfile=outfile.replace('.xyz','.rot.xyz')
 print "Writing to ",outfile
 g=open(outfile,'w')
 g.write(str(preamble[0])+'\n'+str(preamble[1])+'\n')
 
 for n in range(0,len(aligned_structure)):
  xyzstring=genxyzstring(aligned_structure[n],atom_coords[n][0])
  g.write(xyzstring)
 g.close()
 return 0
 
if __name__ == "__main__":
 infile,atoms=formatinput(sys.argv)
 
 xyz=['','']
 preamble=['','']
 
 #get raw data
 xyz[0],preamble[0]=getrawdata(infile[0])
 xyz[1],preamble[1]=getrawdata(infile[1])

 atom_coords=[getcoords(xyz[0],preamble[0],atoms)]
 atom_coords+=[getcoords(xyz[1],preamble[1],atoms)]
 
 #collect structures from raw data
 structures=[getstructures(xyz[0],preamble[0])]
 structures+=[getstructures(xyz[1],preamble[1])]
 
 print "Selected atoms in molecules 1 and 2"
 for n in atoms:
  print atom_coords[0][n],atom_coords[1][n]
  
 #transform structure
 aligned_structure=affine_transform(atoms,structures)
 
 write_aligned(aligned_structure,atom_coords[0],preamble[0],str(infile[0]))
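One thing worth keeping in mind: the least-squares fit solves for a general affine matrix, which can in principle contain scaling or shear and not just a rotation plus a translation. A rough sanity check is to confirm that the pairwise distances inside the structure are the same before and after. The sketch below is Python 3 with the standard library only (math.dist needs 3.8 or later); read_xyz and max_distance_change are names made up for this illustration, not functions from the script.

```python
import math

A_XYZ = """6
A
A 0.0 0.0 1.0
B 0.0 1.0 0.0
C 1.0 0.0 0.0
D -1.0 0.0 0.0
E 0.0 0.0 -1.0
F 0.0 0.5 0.0
"""

A_ROT_XYZ = """6
A
A 1.0 0.0 0.0
B -0.0 1.0 -0.0
C -0.0 0.0 -1.0
D -0.0 0.0 1.0
E -1.0 -0.0 -0.0
F -0.0 0.5 -0.0
"""


def read_xyz(text):
    """Parse xyz-format text into a list of (element, x, y, z) tuples."""
    atoms = []
    for line in text.strip().splitlines()[2:]:  # skip atom count and comment
        fields = line.split()
        atoms.append((fields[0], float(fields[1]), float(fields[2]), float(fields[3])))
    return atoms


def max_distance_change(before, after, selected):
    """Largest change in any pairwise distance among the selected atoms (0-based)."""
    worst = 0.0
    selected = list(selected)
    for i in selected:
        for j in selected:
            if i < j:
                d0 = math.dist(before[i][1:], before[j][1:])
                d1 = math.dist(after[i][1:], after[j][1:])
                worst = max(worst, abs(d0 - d1))
    return worst


# Should be close to zero if a.rot.xyz is a rigidly moved copy of a.xyz
print(max_distance_change(read_xyz(A_XYZ), read_xyz(A_ROT_XYZ), range(6)))
```

A value well above the rounding of the xyz file would suggest that the 'stationary' atoms weren't really stationary.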

05 September 2013

513. Extracting data from a PES scan with gaussian

There are a few reasons to like gaussian, and many reasons not to. Gaussian is fast, and their whitepapers are great resources for learning computational techniques.

Without going into discussions about the commercial behaviour of Gaussian, Inc., the things I don't like about gaussian are the clunky input format (nwchem has a much more readable syntax), the inscrutable error messages, and the unreadable output. Well, it's not unreadable in a literal sense, but it could certainly be clearer. On the other hand, I'm having issues with running some of my PES scans in nwchem, and I can't find a solution (more about that in a later post).

Anyway, here's a python script for extracting optimized structures and energies from a relaxed PES scan in Gaussian 09.

First, an example of a simple scan:


%nprocshared=2
%Chk=methanol.chk
#P rB3LYP/6-31g 6D 10F Opt=(modredundant) NoSymm  Punch=(MO) Pop=() 

methanol

0 1 ! charge and multiplicity
 C     0.0351714     0.00548884     0.0351714
 H     -0.617781     -0.634073     0.667983
 H     0.667983     -0.634073     -0.617781
 H     -0.605139     0.646470     -0.605139
 O     0.839603     0.818768     0.839603
 H     1.38912     0.201564     1.38912

 1 5 S 10 0.1
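The last line asks for a scan of the distance between atoms 1 and 5 (C and O) in 10 steps of 0.1 Å. As far as I understand the Gaussian documentation, the starting geometry counts as a point too, so you should expect 11 optimized structures; do check that against your own output. Here's a small Python 3 sketch (standard library only, scan_values is a name I made up) that lists the bond lengths the scan should visit:

```python
import math


def scan_values(start, nsteps, stepsize):
    """Coordinate values for an 'S nsteps stepsize' scan, start included (assumed)."""
    return [round(start + i * stepsize, 6) for i in range(nsteps + 1)]


c = (0.0351714, 0.00548884, 0.0351714)  # atom 1 (C) from the input above
o = (0.839603, 0.818768, 0.839603)      # atom 5 (O) from the input above
start = math.dist(c, o)

print(round(start, 4))  # 1.3984
print(scan_values(round(start, 4), 10, 0.1))
```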

And here's the script, pes_parse_g09:

#!/usr/bin/python
import sys

def getrawdata(infile):
        f=open(infile,'r')
        opt=0
        geo=0
        struct=[]
        structure=[]
        energies=[]
        energy=[]
        for line in f:
                
                if opt==1 and geo==1 and not ("---" in line):
                        structure+=[line.rstrip()]
                
                if 'Coordinates (Angstroms)' in line:
                        if opt==0:
                                opt=1
                                structure=[]
                        
                if opt==1 and "--------------------------" in line:
                        if geo==0:
                                geo=1
                        elif geo==1:
                                geo=0
                                opt=0
                if 'SCF Done' in line:
                        energy=filter(None,line.rstrip('\n').split(' '))
                if      'Optimization completed' in line and (opt==0 and geo==0):
                        energies+=[float(energy[4])]
                        opt=0
                        geo=0
                        struct+=[structure]
                        structure=[]
        
        return struct, energies

def periodictable(elementnumber):
        ptable={1:'H',2:'He',\
        3:'Li', 4:'Be',5:'B',6:'C',7:'N',8:'O',9:'F',10:'Ne',\
        11:'Na',12:'Mg',13:'Al',14:'Si',15:'P',16:'S',17:'Cl',18:'Ar',\
        19:'K',20:'Ca',\
        21:'Sc',22:'Ti',23:'V',24:'Cr',25:'Mn',26:'Fe',27:'Co',28:'Ni',29:'Cu',30:'Zn',\
        31:'Ga',32:'Ge',33:'As',34:'Se',35:'Br',36:'Kr',\
        37:'Rb',38:'Sr',\
        39:'Y',40:'Zr',41:'Nb',42:'Mo',43:'Tc',44:'Ru',45:'Rh',46:'Pd',47:'Ag',48:'Cd',\
        49:'In',50:'Sn',51:'Sb',52:'Te',53:'I',54:'Xe',\
        55:'Cs',56:'Ba',\
        57:'La',58:'Ce',59:'Pr',60:'Nd',61:'Pm',62:'Sm',63:'Eu',64:'Gd',65:'Tb',66:'Dy',67:'Ho',68:'Er',69:'Tm',70:'Yb',71:'Lu',\
        72:'Hf', 73:'Ta', 74:'W',75:'Re', 76:'Os', 77:'Ir',78:'Pt', 79:'Au', 80:'Hg',\
        81:'Tl', 82:'Pb', 83:'Bi',84:'Po',85:'At',86:'Rn',\
        87:'Fr',88:'Ra',\
        89:'Ac',90:'Th',91:'Pa',92:'U',93:'Np',94:'Pu',95:'Am',96:'Cm',97:'Bk',98:'Cf',99:'Es',100:'Fm',101:'Md',102:'No',\
        103:'Lr',104:'Rf',105:'Db',106:'Sg',107:'Bh',108:'Hs',109:'Mt',110:'Ds',111:'Rg',112:'Cn',\
        113:'Uut',114:'Fl',115:'Uup',116:'Lv',117:'Uus',118:'Uuo'}
        element=ptable[elementnumber]
        return element

def genxyzstring(coords,elementnumber):
        x_str='%10.5f'% coords[0]
        y_str='%10.5f'% coords[1]
        z_str='%10.5f'% coords[2]
        element=periodictable(int(elementnumber))
        xyz_string=element+(3-len(element))*' '+10*' '+\
        (8-len(x_str))*' '+x_str+10*' '+(8-len(y_str))*' '+y_str+10*' '+(8-len(z_str))*' '+z_str+'\n'
 
        return xyz_string

def getstructures(rawdata):
        
        n=0
        for structure in rawdata:
                
                n=n+1
                num="%03d" % (n,)
                g=open('structure_'+num+'.xyz','w')
                itson=False
                cartesian=[]
                        
                for item in structure:
                        
                        coords=filter(None,item.split(' '))
                        coordinates=[float(coords[3]),float(coords[4]),float(coords[5])]
                        element=coords[1]
                        cartesian+=[genxyzstring(coordinates,element)]
                g.write(str(len(cartesian))+'\n')
                g.write('Structure '+str(n)+'\n')
                for line in cartesian:
                        g.write(line)
                g.close()
                cartesian=[]
        return 0
        
if __name__ == "__main__":
        infile=sys.argv[1]
        rawdata,energies=getrawdata(infile)
        structures=getstructures(rawdata)
        g=open('energies.dat','w')
        for n in range(0,len(energies)):
                g.write(str(n)+'\t'+str(energies[n])+'\n')
        g.close()

And here's what we get from the output:

g09 methanol.in |tee methanol.out
pes_parse_g09 methanol.log
cat structure* > meoh_traj.xyz
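The energies in energies.dat are absolute energies in Hartree, which aren't that easy to read off a plot. A minimal sketch for turning them into energies relative to the lowest point, in kJ/mol, could look like the following (Python 3, standard library only). relative_energies is a hypothetical helper, and the two sample lines are placeholders, not real results.

```python
HARTREE_TO_KJ_MOL = 2625.4996  # approximate conversion factor


def relative_energies(lines):
    """Turn 'index<TAB>energy' lines into (index, kJ/mol above the minimum)."""
    rows = []
    for line in lines:
        if line.strip():
            index, energy = line.split()
            rows.append((int(index), float(energy)))
    lowest = min(energy for _, energy in rows)
    return [(index, (energy - lowest) * HARTREE_TO_KJ_MOL) for index, energy in rows]


# Placeholder lines in the same layout as energies.dat
sample = ["0\t-1.000\n", "1\t-0.999\n"]
for index, rel in relative_energies(sample):
    print(index, round(rel, 2))
```

For a real run you'd pass in the lines of energies.dat, e.g. relative_energies(open('energies.dat')).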

And here's a plot of energies.dat:

512. Briefly: zmatrices in nwchem -- methanol

And another update:
I can now confirm that using your own z matrix still does not constrain the geometry during a PES scan, which was the original impetus for this post: http://verahill.blogspot.com.au/2013/09/511-when-nwchem-pes-scans-fail-to.html

Another update:
the gaussian run failed after 14 geometry steps during the first PES point.


NTrRot=    -1 NTRed=   628 NAtoms=    34 NSkip=   532 IsLin=F
 Error in internal coordinate system.
 Error termination via Lnk1e in /opt/gaussian/g09/l103.exe at Thu Sep  5 18:17:12 2013. 
 Job cpu time:  0 days 22 hours 25 minutes 27.6 seconds.
 File lengths (MBytes):  RWF=    192 Int=      0 D2E=      0 Chk=     28 Scr=      1

Not being an expert, to me it seems that there's something fundamentally difficult with the system I'm working on. In an ideal world I'd give the actual details, but quite apart from the risk of being scooped, doing so would also make it easier to identify me (not that it's impossible at this point).

[Suffice to say that the system holds a large polyoxoanion and a small p-block anion, both of which are symmetrical and negatively charged. The goal of the PES scan is to bring the ions closer to see whether they 'react'. Which is also a troublesome use of computational resources -- computational chemistry is good at answering well-defined questions using carefully designed computational experiments -- but not generally very good at answering ill-defined questions about synthesis (i.e. you can't generally 'mix two things together and see what happens' and expect a useful result). Anyway, regardless of that, that's exactly what I want to do.]

Update:
nwchem still gives errors about autoz in spite of using noautoz. But I also get messages about the user generated z matrix, so we'll see whether my input is respected or not.

Also, for one of the calcs I'm getting


There are insufficient internal variables: expected   95 got   96

which is really, really, really annoying since there doesn't seem to be a real fix for it -- I've tried everything suggested in http://www.nwchem-sw.org/index.php/Special:AWCforum/st/id286. I can get the same calc to run in gaussian though (gaussian has its own issues), but it'd be nice if stuff just...worked...

Original post:
Normally you don't have to fiddle with zmatrices in nwchem -- instead you'd typically supply cartesian coordinates, and nwchem would do autoz to autogenerate internal (z matrix) coordinates.

Sometimes that fails, and nwchem defaults to using cartesian coordinates. In most cases, this isn't a cause for any real concern -- the computation will continue although I think cartesian coordinates are supposed to be slightly slower.

However, if you're doing a PES scan you'll notice that it's not proceeding as intended -- the constraints are completely ignored: 511. When nwchem PES scans fail to constrain -- autoz failure

The easiest remedy is to supply the internal coordinates directly, but there honestly aren't too many examples online showing how that's done, and I kept on getting annoying failure messages along the lines of


NWChem Input Module
                                -------------------


                                       zmat
                                       ----
 THE 3-D  PIECE OF -Z- DATA FOR ATOM =    2 IS NEITHER FLOATING POINT
 NOR ALPHANUMERIC OR COULD NOT BE MATCHED WITH A VARIABLE. STOP
 IAT=    2 ZMAT=  2  1  0  0  0     0.00000     0.00000     0.00000

 ------------------------------------------------------------------------
                               JOB STOPPED

 PROGRAM STOP IN - ZDAT -

 ------------------------------------------------------------------------
 ------------------------------------------------------------------------
  CALLS IT QUIT FROM HND_HNDERR         0
 ------------------------------------------------------------------------
 This error has not yet been assigned to a category

This particular error came about because the zmatrix module is case sensitive, and my Variables couldn't be interpreted (it should be variables). Anyway, you'll understand more after this post, and it isn't important anyway.
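Since this kind of mistake is easy to make, a small pre-flight check of the z-matrix text can save a failed job. The sketch below (Python 3, standard library only) assumes, based on nothing more than my experience above, that the keyword has to be written as lower-case 'variables', and it only knows about the simple layout used in this post. check_zmatrix and is_number are names made up for the illustration.

```python
def is_number(token):
    """True if the token parses as a floating point number."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def check_zmatrix(text):
    """Return a list of problems found in a simple z-matrix block."""
    problems = []
    lines = [l.split() for l in text.strip().splitlines() if l.strip()]
    keywords = [i for i, f in enumerate(lines)
                if f[0].lower().rstrip(":") == "variables"]
    split = keywords[0] if keywords else len(lines)
    if keywords and lines[split] != ["variables"]:
        problems.append("keyword should be written exactly as 'variables'")
    defined = {f[0] for f in lines[split + 1:]}
    for fields in lines[:split]:
        # fields: element, then alternating reference atom and value
        for value in fields[2::2]:
            if not is_number(value) and value not in defined:
                problems.append("undefined variable: " + value)
    return problems


print(check_zmatrix("C\nH 1 r2\nVariables\nr2 1.1117\n"))
```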

Calculation using a z matrix (internal coordinates) in nwchem, with a little bit of help from openbabel:

Assuming that you set up a calculation in e.g. ECCE for a geometry optimisation of methanol you'll end up with the following input file:


scratch_dir /home/andy/scratch
Title "methanol"

Start  methanol

echo

charge 0

geometry autosym units angstrom
 C     0.0351714     0.00548884     0.0351714
 H     -0.617781     -0.634073     0.667983
 H     0.667983     -0.634073     -0.617781
 H     -0.605139     0.646470     -0.605139
 O     0.839603     0.818768     0.839603
 H     1.38912     0.201564     1.38912
end

ecce_print ecce.out

basis "ao basis" spherical print
  H library "6-31+G*"
  O library "6-31+G*"
  C library "6-31+G*"
END

dft
  mult 1
  direct
  XC b3lyp
  grid fine
  mulliken
end

driver
  default
end

task dft optimize

Take the coordinates, and paste them into a file, e.g. methanol.xyz:


6
methanol
 C     0.0351714     0.00548884     0.0351714
 H     -0.617781     -0.634073     0.667983
 H     0.667983     -0.634073     -0.617781
 H     -0.605139     0.646470     -0.605139
 O     0.839603     0.818768     0.839603
 H     1.38912     0.201564     1.38912

Next, use openbabel:

babel -ixyz methanol.xyz -ogzmat 

#Put Keywords Here, check Charge and Multiplicity.

 methanol

0  1
C
H  1  r2
H  1  r3  2  a3
H  1  r4  2  a4  3  d4
O  1  r5  2  a5  3  d5
H  5  r6  1  a6  2  d6
Variables:
r2= 1.1117
r3= 1.1117
a3= 109.74
r4= 1.1094
a4= 108.78
d4= 118.90
r5= 1.3984
a5= 110.18
d5= 238.51
r6= 0.9924
a6= 105.98
d6=  60.61

1 molecule converted
18 audit log messages

The format isn't quite right for nwchem: the header lines above the z-matrix, the '=' signs and the ':' after Variables all need to go, and the V in Variables should be lower case. We can sort that out with sed and tail:

 babel -ixyz ~/methanol.xyz -ogzmat |sed 's/\=//g;s/V/v/g;s/\://g' |tail -n+6 > methanol.zmat

C
H  1  r2
H  1  r3  2  a3
H  1  r4  2  a4  3  d4
O  1  r5  2  a5  3  d5
H  5  r6  1  a6  2  d6
variables
r2 1.1117
r3 1.1117
a3 109.74
r4 1.1094
a4 108.78
d4 118.90
r5 1.3984
a5 110.18
d5 238.51
r6 0.9924
a6 105.98
d6  60.61
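One caveat with the sed one-liner: s/V/v/g lower-cases every capital V, so a vanadium atom would be turned into 'v' as well. That doesn't matter for methanol, but if it matters for your system, here's a rough Python 3 equivalent (standard library only) that only touches the Variables line. gzmat_to_nwchem is a made-up name, and the default of five header lines is simply what the babel output above happens to have.

```python
def gzmat_to_nwchem(gzmat_text, header_lines=5):
    """Convert 'babel -ogzmat' output to the bare z-matrix layout shown above."""
    out = []
    for line in gzmat_text.splitlines()[header_lines:]:
        stripped = line.strip()
        if not stripped:
            continue
        if stripped == "Variables:":
            out.append("variables")
        elif "=" in stripped:
            # 'r2= 1.1117' becomes 'r2 1.1117'
            name, value = stripped.split("=", 1)
            out.append(name.strip() + " " + value.strip())
        else:
            out.append(stripped)
    return "\n".join(out) + "\n"


sample = (
    "#Put Keywords Here, check Charge and Multiplicity.\n"
    "\n"
    " methanol\n"
    "\n"
    "0  1\n"
    "C\n"
    "H  1  r2\n"
    "Variables:\n"
    "r2= 1.1117\n"
    "\n"
)
print(gzmat_to_nwchem(sample))
```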

Let's update our nwchem input file with the internal coordinates:


scratch_dir /home/andy/scratch
Title "methanol"

Start  methanol

echo

charge 0

geometry noautoz
zmatrix
C
H  1  r2
H  1  r3  2  a3
H  1  r4  2  a4  3  d4
O  1  r5  2  a5  3  d5
H  5  r6  1  a6  2  d6
variables
r2 1.1117
r3 1.1117
a3 109.74
r4 1.1094
a4 108.78
d4 118.90
r5 1.3984
a5 110.18
d5 238.51
r6 0.9924
a6 105.98
d6  60.61
end
end

ecce_print ecce.out

basis "ao basis" spherical print
  H library "6-31+G*"
  O library "6-31+G*"
  C library "6-31+G*"
END

dft
  mult 1
  direct
  XC b3lyp
  grid fine
  mulliken
end

driver
  default
end

task dft optimize

And run. Done!

511. When nwchem PES scans fail to constrain -- autoz failure

Another update:
My jobs have run long enough now that I can confirm that using your own z matrix still does not constrain the bond lengths i.e. the link at the end of the post is useless.

Update:
I'm not sure using a zmatrix actually solved this -- for each step in the optimization it seems that nwchem attempts to generate a new zmatrix, and is probably ignoring my input (and yes, I'm using noautoz). I'll let my calcs run for a little while to see whether the constraints are honoured or not.

But I'm getting really frustrated with nwchem right now, especially since gaussian isn't having any issues running these particular jobs (there are other issues with gaussian, such as the format of the output, etc.)

Original post:
I set up PES scans in nwchem as shown in this post: http://verahill.blogspot.de/2013/08/503-relaxed-pes-scanning-in-nwchem.html

I was noticing that while almost all of my potential energy surface scans were working out just fine, some of them would...not. What would happen is that there would be no error messages, but for some reason the e.g. atom-atom distance that was defined (and defined using constant) would not remain constant during the geometry optimization in each step.
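A simple way to see whether a constraint was honoured is to measure the distance in question in every optimized structure. The following is a sketch in Python 3 (standard library only, math.dist needs 3.8 or later) which assumes you have the structures as a concatenated xyz trajectory, like the trajectory.xyz built in post 506. bond_lengths is a name made up for this example.

```python
import math


def bond_lengths(trajectory_text, atom_a, atom_b):
    """Distance between two atoms (1-based numbering) for each xyz frame."""
    lines = trajectory_text.splitlines()
    lengths = []
    pos = 0
    while pos < len(lines):
        if not lines[pos].strip():
            pos += 1
            continue
        natoms = int(lines[pos].split()[0])
        frame = lines[pos + 2: pos + 2 + natoms]  # skip count and comment lines
        xyz = [[float(v) for v in row.split()[1:4]] for row in frame]
        lengths.append(math.dist(xyz[atom_a - 1], xyz[atom_b - 1]))
        pos += 2 + natoms
    return lengths


# Two made-up frames, only to show the call
demo = "2\nStructure 1\nC 0 0 0\nO 0 0 1.5\n2\nStructure 2\nC 0 0 0\nO 0 0 1.6\n"
print([round(d, 3) for d in bond_lengths(demo, 1, 2)])  # [1.5, 1.6]
```

If the printed distances don't follow the values you asked for in the scan, the constraint is being ignored.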

I saw this when trying to move an anion (9 atoms) step-wise closer to a large, negatively charged metal oxide ion (25 atoms).

It took a while to chase this down. First I thought that well maybe the distances weren't really set as immutable, but were just associated with a certain force constant -- and that the anion-anion repulsion somehow overcame this. That wasn't the case.

Instead it was something that I should've paid attention to: the zmatrix generation.
If you find that for some reason the PES scan is not constrained at all, look for something along the lines of the following in your output:


NWChem Input Module
                                -------------------


                                 molecules_def2_svp
                                 ----------------

 Scaling coordinates for geometry "geometry" by  1.889725989
 (inverse scale =  0.529177249)


          ------
          auto-z
          ------
      warning. autoz generated   10 bonds for atom   24
      warning. autoz generated   10 bonds for atom   25
      warning. autoz generated   10 bonds for atom   26
      warning. autoz generated   10 bonds for atom   27
      warning. autoz generated   10 bonds for atom   28
      warning. autoz generated   10 bonds for atom   29
  autoz: The atoms group into disjoint clusters
 cluster   1:    1    2    3    4
 cluster   2:    5    6    7    8    9   10   11   12   13   14   15
                16   17   18   19   20   21   22   23   24   25   26
                27   28   29
 Connecting clusters   1   2 via atoms    3    7 r = 3.71
  autoz: regenerating connections with new bonds
      warning. autoz generated   10 bonds for atom   24
      warning. autoz generated   10 bonds for atom   25
      warning. autoz generated   10 bonds for atom   26
      warning. autoz generated   10 bonds for atom   27
      warning. autoz generated   10 bonds for atom   28
      warning. autoz generated   10 bonds for atom   29
  autoz: excessive number of variables                  2066                   81

 AUTOZ failed to generate good internal coordinates.
 Cartesian coordinates will be used in optimizations.

If that happens, cartesian coordinates will be used, and your


python
from nwgeom import *
geom = ''' 
    geometry adjust
        zcoord
            bond 1 14 %f bond constant
        end
    end 
'''

won't do anything.
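Because nwchem carries on without an error, it can be worth checking the output for this message before trusting a scan. A minimal Python 3 sketch (standard library only) that looks for the text shown above; the function names are mine, and the strings are just copied from my output, so other nwchem versions may word things differently.

```python
def autoz_failed(output_text):
    """True if the output contains the auto-z failure message quoted above."""
    return "AUTOZ failed to generate good internal coordinates" in output_text


def autoz_warnings(output_text):
    """Collect the auto-z warning lines, e.g. too many bonds for an atom."""
    return [line.strip() for line in output_text.splitlines()
            if "autoz" in line.lower() and "warning" in line.lower()]


sample = """      warning. autoz generated   10 bonds for atom   24
  autoz: excessive number of variables                  2066                   81

 AUTOZ failed to generate good internal coordinates.
 Cartesian coordinates will be used in optimizations.
"""
print(autoz_failed(sample))
print(autoz_warnings(sample))
```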

The solution is to provide the coordinates as a zmatrix instead -- and that's the focus of my next post:
http://verahill.blogspot.com.au/2013/09/512-briefly-zmatrices-in-nwchem-methanol.html

Oh, and don't forget to include noautoz

30 August 2013

506. Extracting optimized structures from a potential energy scan in nwchem

Another update:
It now dumps the energies in a file, energies.dat, as well.

Update:
some programmes, like ecce, are more picky about the xyz format than others (e.g. jmol, vmd). I've updated the code to output xyz files that ecce too can read.

Original post:
When you use scan_input() in nwchem to do a PES scan (see e.g. here: http://verahill.blogspot.com.au/2013/08/503-relaxed-pes-scanning-in-nwchem.html) you get the energies and the gradients for the optimized structures returned as the results. However, for a casual user the actual atomic coordinates are more informative.

Here's a very simple parser written in python (2.7) which extracts the optimized structures from the output file:


#!/usr/bin/python
import sys

def getrawdata(infile):
        f=open(infile,'r')
        opt=0
        geo=0
        energy=[]
        energies=[]
        struct=[]
        structure=[]
        for line in f:
                if "Total DFT" in line:
                        line=filter(None,line.rstrip('\n').split(' '))
                        energy=float(line[4])
                if 'Optimization converged' in line:
                        opt=1
                if opt==1 and 'Geometry' in line:
                        geo=1
                if      'Atomic Mass' in line and (opt==1 and geo==1):
                        opt=0
                        geo=0
                        struct+=[structure]
                        energies+=[energy]
                        structure=[]
                if opt==1 and geo==1:
                        structure+=[line.rstrip()]
        return struct,energies

def genxyzstring(coords,element):
        x_str='%10.5f'% coords[0]
        y_str='%10.5f'% coords[1]
        z_str='%10.5f'% coords[2]
 
        xyz_string=element+(3-len(element))*' '+10*' '+\
        (8-len(x_str))*' '+x_str+10*' '+(8-len(y_str))*' '+y_str+10*' '+(8-len(z_str))*' '+z_str+'\n'
 
        return xyz_string

def getstructures(rawdata):
        
        n=0
        for structure in rawdata:
                
                n=n+1
                num="%03d" % (n,)
                g=open('structure_'+num+'.xyz','w')
                itson=False
                cartesian=[]
                        
                for item in structure:
                        
                        if itson and not(item==""):
                                coords=filter(None,item.split(' '))
                                coordinates=[float(coords[3]),float(coords[4]),float(coords[5])]
                                element=coords[1]
                                cartesian+=[genxyzstring(coordinates,element)]
                                #cartesian+=[coords[1]+'\t'+coords[3]+'\t'+coords[4]+'\t'+coords[5]+'\n']
                
                        if "---" in item:
                                itson=True
                        if item=="" and itson==True:
                                itson=False
                                if not(len(cartesian)==0):
                                        g.write(str(len(cartesian))+'\n')
                                        g.write('Structure '+str(n)+'\n')
                                        for line in cartesian:
                                                g.write(line)
                                        g.close()
                                cartesian=[]
        return 0
        
if __name__ == "__main__":
        infile=sys.argv[1]
        rawdata,energies=getrawdata(infile)
        structures=getstructures(rawdata)

        g=open('energies.dat','w')
        for n in range(0,len(energies)):
                g.write(str(n)+'\t'+str(energies[n])+'\n')
        g.close()

Presuming that you've saved it as pes_parse.py you can then generate a series of xyz files with the structures, concatenate them into a trajectory file, and open it in e.g. jmol. I'm using the output from example 1 in http://verahill.blogspot.com.au/2013/08/503-relaxed-pes-scanning-in-nwchem.html as the example:

chmod +x pes_parse.py
./pes_parse.py nwch.nwout
ls

nwch.nwout    structure_001.xyz  structure_003.xyz  structure_005.xyz  structure_007.xyz  structure_009.xyz  structure_011.xyz  structure_013.xyz  structure_015.xyz  structure_017.xyz  structure_019.xyz
pes_parse.py  structure_002.xyz  structure_004.xyz  structure_006.xyz  structure_008.xyz  structure_010.xyz  structure_012.xyz  structure_014.xyz  structure_016.xyz  structure_018.xyz

cat structure_*.xyz >> trajectory.xyz 
jmol trajectory.xyz

You can go through the structures by clicking on the arrows indicated by the white arrow:

Finally, using VMD it's easy to make videos -- note that they for some reason look awful here (seems like a lot of frames are removed, in particular from the beginning of the run):

And here's the SN2 reaction from post 503:

28 August 2013

503. (relaxed) PES scanning in Nwchem revisited.

Update 2: The coordinates are actually gradients, and so aren't terribly informative to a casual user like myself. See this post for how to extract the geometries properly: http://verahill.blogspot.com.au/2013/08/506-extracting-optimized-structures.html

Update:
Please note that the coordinates in square brackets ([]) in the python output are not raw coordinates for the atoms in the molecule -- I haven't quite figured out how they scale, but it's not a simple matter of just multiplying. The energies are good though, and you can always extract the coordinates the slow and painful way by manually going through the output.

Another issue which should be stressed is that scan_input(geom,[1.398],[3.398],19,'dft',task_optimize) does not do the end points -- i.e. you won't get the energy for a bond length of 1.398, and you won't get the energy for a bond length of 3.398. Instead you'll get 19 data points in between these. It's a bit...awkward.
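To make that concrete, here's a tiny Python 3 sketch that lists the bond lengths I'd expect the scan to visit. The spacing rule (19 evenly spaced points strictly between the two end points) is inferred from the behaviour described above and from the 1.498 that shows up first in my output, not from reading the nwgeom source, and interior_points is a made-up name.

```python
def interior_points(start, end, npoints):
    """npoints evenly spaced values strictly between start and end."""
    step = (end - start) / (npoints + 1)
    return [round(start + step * (i + 1), 6) for i in range(npoints)]


points = interior_points(1.398, 3.398, 19)
print(points[0], points[-1])  # 1.498 3.298
```

If you need the energies at 1.398 and 3.398 themselves, you'd have to widen the range by one step at each end or run those two points separately.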

Original post:
A long time ago I made a post on doing potential energy surface (PES) scans in nwchem using python.

This is a post giving PES another look. The impetus for the post is that I'm tired of Gaussian failing and being opaque about the whole procedure.

The following page was of great help: http://www.fqt.izt.uam.mx/html/software_fqt/user/node34.html

NOTE: you'll need to compile nwchem with python support. See e.g. http://verahill.blogspot.com.au/2013/06/449-nwchem-63-updated-sources-compiling.html (the post is a bit messy, but persevere -- it's not that difficult)

On Debian the key is to change

    EXTRA_LIBS +=    -lnwcutil  -lpthread -lutil -ldl

to

    EXTRA_LIBS +=    -lnwcutil  -lpthread -lutil -ldl -lssl -lz

in config/makefile.h before compiling. It's not necessary on RHEL clones.

Below I'll show three examples:
* a simple bond dissociation reaction. I also discuss the use of 'constant', and task_energy vs task_optimize.
* an S_N2 reaction (CH₃Br + I^-)
* a 2D/parallel PES scan of ethane ( C-C bond length, H-C-C angle). I also show constant vs free variables.

Example 1.
Breaking the C-O bond in methanol

I set this up in ecce (see e.g. next example), but you don't have to. The input file I used was the following:


scratch_dir /scratch
Title "meoh_pes"

Start  meoh_pes

echo

charge 0

geometry autosym units angstrom
 C     0.0351714     0.00548884     0.0351714
 H     -0.617781     -0.634073     0.667983
 H     0.667983     -0.634073     -0.617781
 H     -0.605139     0.646470     -0.605139
 O     0.839603     0.818768     0.839603
 H     1.38912     0.201564     1.38912
end

ecce_print ecce.out

basis "ao basis" cartesian print
  H library "3-21G"
  O library "3-21G"
  C library "3-21G"
END

dft
  mult 1
  direct
  XC b3lyp
  grid fine
  iterations 99
  mulliken
end

driver
  default
  maxiter 888 
end

python
from nwgeom import *
geom = ''' 
    geometry adjust
        zcoord
            bond 1 5 %f cccc constant
        end
    end 
'''
results=scan_input(geom,[1.398],[3.398],19,'dft',task_optimize)
for i in range(0,len(results)):
    print results[i][0][0],results[i][1]
end


task python

The PES bit is the python block at the end of the input (from 'python' to 'end'). Note the 'constant' keyword -- if you omit that the bond length will initially be set to whatever you define it to in your scan, but it can relax back to the optimum length. If you DO set 'constant' everything BUT that bond will be relaxed. Most likely this is what you will want to do.

Also note that a constrained (i.e. not relaxed) PES scan can be done by doing task_energy instead of task_optimize.

ECCE can't quite display the textual output (alt+O) because some lines are too long. The output file itself is written properly, so just look in the Output folder of the job. The ecce.out file works fine.

The job takes 90-100 seconds on an old 3-core node (AMD Athlon II X3).

The very end of the output file has all the results, but in a non-obvious way:


1.498 (-115.07289914310056, [-0.00130778291169336, 0.01798903956433226, 0.0, -4.009155466250247e-05, 1.693340302064139e-05, -6.637550254401381e-06, -4.009155466250247e-05, 1.693340302064139e-05, 6.637550254401381e-06, 2.4514244186701895e-05, -1.5885649893555842e-05, 0.0, 0.0012636893525275195, -0.018041103298149008, 0.0, 9.97624242821682e-05, 3.4082577691996185e-05, 0.0])
(-114.8737952986994, [-4.7287277448850376e-05, 0.030029200359777717, 0.0, -1.3711175166353229e-06, -8.452926738775068e-08, 9.941241931599176e-07, -1.3711175166353229e-06, -8.452926738775068e-08, -9.941241931599176e-07, 8.167348279908282e-07, -2.5820569179275075e-06, 0.0, 4.871429991895604e-05, -0.030027845123621805, 0.0, 4.984777179639632e-07, 1.3958792967685985e-06, 0.0])
1.498 (-115.07289914310056, [-0.00130778291169336, 0.01798903956433226, 0.0, -4.009155466250247e-05, 1.693340302064139e-05, -6.637550254401381e-06, -4.009155466250247e-05, 1.693340302064139e-05, 6.637550254401381e-06, 2.4514244186701895e-05, -1.5885649893555842e-05, 0.0, 0.0012636893525275195, -0.018041103298149008, 0.0,
[..]
3.198 (-114.87977711993531, [-0.00018979360652668711, 0.033296276783081655, 0.0, -2.3787379704320877e-06, 1.7510009376556918e-06, 1.3530564600128248e-06, -2.3787379704320877e-06, 1.7510009376556918e-06, -1.3530564600128248e-06, 8.24207064487048e-06, -8.055936327900498e-07, 0.0, 0.00018027576986845428, -0.03329589479259992, 0.0, 6.033241931824307e-06, -3.0783987173960137e-06, 0.0])
3.298 (-114.8737952986994, [-4.7287277448850376e-05, 0.030029200359777717, 0.0, -1.3711175166353229e-06, -8.452926738775068e-08, 9.941241931599176e-07, -1.3711175166353229e-06, -8.452926738775068e-08, -9.941241931599176e-07, 8.167348279908282e-07, -2.5820569179275075e-06, 0.0, 4.871429991895604e-05, -0.030027845123621805, 0.0, 4.984777179639632e-07, 1.3958792967685985e-06, 0.0])

All in all, there are 58 lines for 19 steps. I think three things are happening. Firstly, the line without a leading bond length (highlighted in blue in the original post) is the output from the 19th step, which somehow gets mixed in with the output from all the other calculations. Secondly, the structure and energy of each step are reported twice in a row. Thirdly, the optimised structures/energies are reported one more time by injecting them into the output, like this:

S
A
B
B
C
C
D
D
A
E
E
B

where A is the first step, S is the 19th step etc. This way you get 19x3+1=58 lines. This is clearly idiotic.
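One way around the duplication is to let a small script collapse the repeated lines. The sketch below is standalone Python 3 (it is not part of the NWChem input). It assumes that every useful line has the form `bond_length (energy, [gradient...])` as printed by the loop in the input file, and the function name `unique_points` is made up for illustration.

```python
import ast
import re
import sys

# Matches "<bond length> (<energy>, [<gradient>])" as printed by the scan loop.
LINE = re.compile(r'^\s*(\d+\.\d+)\s+(\(.*\))\s*$')

def unique_points(lines):
    """Return sorted (bond_length, energy) pairs, dropping repeated lines."""
    points = {}
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue  # not a "bond (energy, gradient)" line
        try:
            energy, _gradient = ast.literal_eval(match.group(2))
        except (ValueError, SyntaxError, TypeError):
            continue  # truncated or otherwise malformed tuple
        points[float(match.group(1))] = energy
    return sorted(points.items())

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (file name is whatever your output is called): python3 script.py nwch.nwout
    with open(sys.argv[1]) as handle:
        for bond, energy in unique_points(handle):
            print(bond, energy)
```

Lines without a leading bond length (like the stray 19th-step line) are skipped, and a bond length that shows up several times is kept only once.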

Instead, you can look through the output and search for 'Scanning NWChem input - results from step' to see all the output for the optimised structures one by one:


Scanning NWChem input - results from step  2
 
(-115.06618436935011, [-0.0038228970733096973, 0.050051062094932305, 0.0, 2.9196769046224702e-05, -6.928661348853948e-06, 4.746536668570611e-06, 2.9196769046224702e-05, -6.928661348853948e-06, -4.746536668570611e-06, -1.0103262985700079e-05, 1.6491089715894858e-05, 0.0, 0.003767244388907326, -0.05005618579508188, 0.0, 7.362409274846993e-06, 2.489933151654522e-06, 0.0])

In this particular case I can grep my way through by doing

cat nwch.nwout |grep '^(-'|cat -n

     1  (-115.07289914310056, [-0.00130778291169336, 0.01798903956433226, 0.0, -4.009155466250247e-05, 1.693340302064139e-05, -6.637550254401381e-06, -4.009155466250247e-05, 1.693340302064139e-05, 6.637550254401381e-06, 2.4514244186701895e-05, -1.5885649893555842e-05, 0.0, 0.0012636893525275195, -0.018041103298149008, 0.0, 9.97624242821682e-05, 3.4082577691996185e-05, 0.0])
     2  (-115.06618436935011, [-0.0038228970733096973, 0.050051062094932305, 0.0, 2.9196769046224702e-05, -6.928661348853948e-06, 4.746536668570611e-06, 2.9196769046224702e-05, -6.928661348853948e-06, -4.746536668570611e-06, -1.0103262985700079e-05, 1.6491089715894858e-05, 0.0, 0.003767244388907326, -0.05005618579508188, 0.0, 7.362409274846993e-06, 2.489933151654522e-06, 0.0])
     3  (-115.05478103866017, [-0.005033784212299788, 0.06848598587431667, 0.0, -1.3396548676491982e-06, -2.5875637174599397e-08, -5.261746410523127e-07, -1.3396548676491982e-06, -2.5875637174599397e-08, 5.261746410523127e-07, 1.4459720645843e-07, -2.8328952926398587e-06, 0.0, 0.005034455335082233, -0.0684825786855032, 0.0, 1.8635897582608418e-06, -5.225422206114883e-07, 0.0])
     4  (-115.04079235517, [-0.005485543277166251, 0.07798880362126945, 0.0, 4.745460307237215e-06, -5.597510268573469e-06, 5.645418744981701e-07, 4.745460307237215e-06, -5.597510268573469e-06, -5.645418744981701e-07, -6.651712157745848e-07, 6.750842351778419e-06, 0.0, 0.00548062073181968, -0.07798086728839469, 0.0, -3.903204054994669e-06, -3.4921546817404114e-06, 0.0])
     5  (-115.02560006674966, [-0.0054233976595857575, 0.08166232318137269, 0.0, -1.659239761503395e-06, -4.376603580866223e-07, 4.4580035316599265e-06, -1.659239761503395e-06, -4.376603580866223e-07, -4.4580035316599265e-06, 3.034808945895362e-06, -6.726118036586015e-06, 0.0, 0.005436665955901393, -0.08164730868562775, 0.0, -1.2984625724410392e-05, -7.4130570159938736e-06, 0.0])
[..]
    16  (-114.89364787840326, [-0.0005591249462735259, 0.04018795560035916, 0.0, -5.34666220519675e-07, 1.1370871814235517e-06, 4.809133242467123e-07, -5.34666220519675e-07, 1.1370871814235517e-06, -4.809133242467123e-07, -6.9140095421138525e-06, -3.095664552260277e-06, 0.0, 0.0005695756951453745, -0.040185884820554796, 0.0, -2.467406898132296e-06, -1.2492896190128416e-06, 0.0])
    17  (-114.8863872514371, [-0.00036666056940981573, 0.03667976502852128, 0.0, 2.9101399354747315e-06, -2.094045026924257e-06, -4.933288234976185e-06, 2.9101399354747315e-06, -2.094045026924257e-06, 4.933288234976185e-06, 1.6531622304416516e-07, 1.511517903679191e-07, 0.0, 0.00036162347288279384, -0.03668602744257765, 0.0, -9.484995716624312e-07, 1.0299352320775057e-05, 0.0])
    18  (-114.87977711993531, [-0.00018979360652668711, 0.033296276783081655, 0.0, -2.3787379704320877e-06, 1.7510009376556918e-06, 1.3530564600128248e-06, -2.3787379704320877e-06, 1.7510009376556918e-06, -1.3530564600128248e-06, 8.24207064487048e-06, -8.055936327900498e-07, 0.0, 0.00018027576986845428, -0.03329589479259992, 0.0, 6.033241931824307e-06, -3.0783987173960137e-06, 0.0])
    19  (-114.8737952986994, [-4.7287277448850376e-05, 0.030029200359777717, 0.0, -1.3711175166353229e-06, -8.452926738775068e-08, 9.941241931599176e-07, -1.3711175166353229e-06, -8.452926738775068e-08, -9.941241931599176e-07, 8.167348279908282e-07, -2.5820569179275075e-06, 0.0, 4.871429991895604e-05, -0.030027845123621805, 0.0, 4.984777179639632e-07, 1.3958792967685985e-06, 0.0])

Not pretty, but manageable. To pull the step number and the energy into a two-column file, do

cat nwch.nwout |grep '^(-'|sed 's/\,/\t/g;s/(\([^)]*\))/\1/g'|cat -n|gawk '{print $1,$2}' > profile.dat

and then plot it:
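For plotting it is often handier to have energies relative to the lowest point rather than absolute hartrees. Here's a minimal standalone Python 3 sketch, assuming profile.dat has the step number in column 1 and the energy in hartree in column 2 (which is what the pipeline above writes). The function names are made up for illustration.

```python
HARTREE_TO_KJ_MOL = 2625.4996  # 1 hartree expressed in kJ/mol

def read_profile(path):
    """Read (step, energy in hartree) rows from a two-column text file."""
    rows = []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:
                rows.append((int(fields[0]), float(fields[1])))
    return rows

def relative_profile(rows):
    """Convert (step, hartree) rows to (step, kJ/mol above the lowest point)."""
    rows = list(rows)
    lowest = min(energy for _step, energy in rows)
    return [(step, (energy - lowest) * HARTREE_TO_KJ_MOL)
            for step, energy in rows]

def write_profile(rows, path):
    """Write rows back out as two whitespace-separated columns."""
    with open(path, "w") as handle:
        for step, energy in rows:
            handle.write("%d %.4f\n" % (step, energy))
```

Something like `write_profile(relative_profile(read_profile('profile.dat')), 'profile_rel.dat')` then gives a file that can be plotted directly (the output file name is just an example).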

Example 2.
S_N2 reaction between iodide and bromomethane

You can set up your calc however you want, but ECCE is easier than anything else.

Draw bromomethane, then throw in an iodine atom. Adjust the angle across Br-C-I to 180 degrees, and set the C to I distance to 3 Å.

Set up the calculation -- in this case I used b3lyp/def2-svpd

Edit the input and add


python
from nwgeom import *
geom = ''' 
    geometry adjust
        zcoord
            bond 1 6 %f cccc constant
        end
    end 
'''
results=scan_input(geom,[3.00],[1.5],20,'dft',task_optimize)
for i in range(0,len(results)):
    print results[i][0][0],results[i][1]
end


task python

(Delete 'task dft optimize')

You'll now have the following input file:

scratch_dir /scratch
Title "sn2_br"

Start  sn2_br

echo

charge -1

geometry noautosym units angstrom
 C     0.00000     0.00000     0.00000
 H     -0.675500     -0.675500     0.675500
 H     0.675500     -0.675500     -0.675500
 H     -0.675500     0.675500     -0.675500
 Br     1.10274     1.10274     1.10274
 I     -1.73205     -1.73205     -1.73205
end

ecce_print ecce.out

basis "ao basis" spherical print
  H library "def2-svpd"
  Br library "def2-svpd"
  C library "def2-svpd"
  I library "def2-svpd"
END
ECP
  I library "def2-ecp"
END

dft
  mult 1
  direct
  XC b3lyp
  grid fine
  iterations 99
  mulliken
end

driver
  default
  maxiter 99
end


python
from nwgeom import *
geom = '''
    geometry adjust
        zcoord
            bond 1 6 %f cccc constant
        end
    end
'''
results=scan_input(geom,[3.00],[1.5],20,'dft',task_optimize)
for i in range(0,len(results)):
    print results[i][0][0],results[i][1]
end


task python

Launch it and wait. Eventually (2 h 30 min on a slow three-core node) you'll get an output like the one below. Note that I didn't pre-optimise the bromomethane, so there's a bit of a drop in energy at the beginning. Likewise, I let the C-I distance get so short that the energy rises rapidly at the end.

Structure at the beginning

Transition-state-ish structure

Product
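If you have the scan energies in a list, candidates for the transition-state-ish point can be picked out automatically. A minimal standalone Python 3 sketch follows. The function name is made up, and it only flags points that are higher in energy than both neighbours, so it gives a starting guess and nothing more.

```python
def local_maxima(energies):
    """Return indices of points that are higher in energy than both neighbours.

    The first and last points are never returned, so the steep rise at the
    short-distance end of a scan is not mistaken for a barrier.
    """
    return [i for i in range(1, len(energies) - 1)
            if energies[i - 1] < energies[i] > energies[i + 1]]
```

The indices are zero-based positions in the list, so add one to get the scan step to look up in the output.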

Example 3:
Two-dimensional PES scan

I'll keep this brief. First we do a scan where we use 'constant' for the angle, but not the bond length:


scratch_dir /scratch
Title "2d_pes-1"

Start  2d_pes-1

echo

charge 0

geometry noautosym units angstrom
 C     -2.51242e-66     1.67495e-66     -0.767732
 H     -0.722530     0.722530     -1.16548
 H     -0.264464     -0.986995     -1.16548
 H     0.986995     0.264464     -1.16548
 C     2.51242e-66     -2.51242e-66     0.767732
 H     0.264464     0.986995     1.16548
 H     -0.986995     -0.264464     1.16548
 H     0.722530     -0.722530     1.16548
end

ecce_print ecce.out

basis "ao basis" cartesian print
  H library "6-31G"
  C library "6-31G"
END

dft
  mult 1
  direct
  XC b3lyp
  grid fine
  iterations 99
  mulliken
end

driver
  default
end

python
from pes_scan import pes_scan
geom = ''' 
    geometry noprint adjust
        zcoord
            bond  1 5   %f cc
            angle 2 1 5 %f hcc constant
        end
    end 
'''
results = pes_scan(geom, \
[1.535, 111.269], 
[1.800,  90],
5, 'dft', task_optimize)

end

task python

And the output:

What's happening is that the bond length ends up being the same no matter what we initially set it to: without 'constant' it simply relaxes back to its optimum during each optimisation.

If we instead set constant for the bond as well:


python
from pes_scan import pes_scan
geom = ''' 
    geometry noprint adjust
        zcoord
            bond  1 5   %f cc constant
            angle 2 1 5 %f hcc constant
        end
    end 
'''
results = pes_scan(geom, \
[1.535, 111.269], 
[1.800,  90],
5, 'dft', task_optimize)

end

task python

And we get:

22 May 2012

159. PES scanning of methanol bonds, angles, torsion using nwchem, nwgeom and python

NOTE: there's something dodgy with the potential/bond length plots -- the optimal bond lengths are way too long. I'll leave this post up here anyway, but be WARNED.

This is more of a sketch of how to do it, with some explicit examples, than a step-by-step account.

Today's molecule is methanol.

0. ecce and nwgeom/python
You need to set PYTHONPATH to /opt/nwchem/nwchem-6.1/contrib/python in order for nwchem to find the nwgeom module. That's easy enough on a local system since ~/.bashrc is read there, but ~/.bashrc is not read on remote systems. For those you need to edit your CONFIG.<machine> files (in ~/.ECCE on your main node) -- add

setup {
setenv PYTHONPATH /opt/nwchem/nwchem-6.1/contrib/python
}

and/or

NWChemEnvironment {
PYTHONPATH /opt/nwchem/nwchem-6.1/contrib/python
}

1. Draw the molecule
Draw the carbon, then the oxygen, then the protons on the carbon, then the proton on the oxygen. Basically, draw the backbone first, then add protons by hand. Turn it into a residue-based system (under 'build') and optimise the structure using e.g. RHF/6-31G* (this was written with MM/FF parametrisation in mind -- for simple scanning, just do whatever you want).

2. Calculate the partial charges (rhf/6-31g*). Can skip this
You can e.g. constrain the methyl groups, or force all the methyl protons to be equal. It's a bit of a soft science, really. After the calc has finished, assign charges. I can't claim to understand which method is better (RESP, CRESP, CRESP2 etc.).

(this was written with MM/FF parametrisation in mind -- for simple scanning, skip this step)

Here's CRESP2 (some variability...):
C -0.721
O -0.368
H 0.240
H 0.240
H 0.240
H 0.368

Also, assign (atom) Types (CT, OH, HC, HC, HC, HO) -- this is done by hand. Pick atom table, select residue view (or something similar), and fill in the Type column.

Then click on Tools, check Residue table, click on the menu-looking icon in the residue view, and select write fragment. Make sure you put the fragment file in a place where ecce and nwchem will find it (e.g. amber_u). For some reason I can't get ecce to actually change the name of my residues, so edit the fragment file by hand and change all instances of UNK to the same name as the fragment file, e.g. TST if you called it TST.frg.

3. Write down the bonds, angles and dihedral angles.
Bonds
H-C 1.087
C-O 1.400
O-H 0.946

Angles
H-O-C 109.467
O-C-H 112.039
H-C-H 108.682

Torsion
H-C-O-H -61.229

When you scan the parameters using python you want to be able to
1) see whether the lowest-energy conformation makes sense and
2) avoid deviating too much from the ideal angle/bond/torsion.

Things get weird if you do.
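If you'd rather compute the bonds, angles and torsions from the Cartesian coordinates than read them off by hand, something like the following standalone Python 3 sketch will do. The function names are made up. The torsion uses the usual atan2 formula and returns values between -180 and 180 degrees, and the C and O coordinates in the usage part are copied from the methanol input file in the bond dissociation example above.

```python
import math

def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def _dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def bond(a, b):
    """Distance between atoms a and b, in the units of the coordinates."""
    d = _sub(a, b)
    return math.sqrt(_dot(d, d))

def angle(a, b, c):
    """Angle a-b-c in degrees, with b at the apex."""
    u, v = _sub(a, b), _sub(c, b)
    cosine = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))

def torsion(a, b, c, d):
    """Dihedral angle a-b-c-d in degrees, between -180 and 180."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

if __name__ == "__main__":
    carbon = (0.0351714, 0.00548884, 0.0351714)
    oxygen = (0.839603, 0.818768, 0.839603)
    print("C-O bond: %.3f" % bond(carbon, oxygen))
```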

4. Try to determine bond strength
This is best done outside of ecce, and you really should have compiled nwchem with python-support to make this easier.

Copy the input file you used for the ESP calc. Call it bonds.nw. Remove anything about esp and all task directives, then add:

Technically I think the bond strengths only really need two data points unless you want to fit the Lennard-Jones equation to them, but it certainly looks neater getting the full behaviour.
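To make the 'two data points' remark concrete: if the bond term is taken to be harmonic, E = K (r - r0)^2, then the minimum plus one displaced point is enough to fix K. The sketch below is standalone Python 3 and rests on assumptions: it uses the AMBER-style form without the factor 1/2, it takes energies in hartree and distances in Å, and the function name is made up.

```python
HARTREE_TO_KCAL_MOL = 627.509  # 1 hartree expressed in kcal/mol

def harmonic_force_constant(r0, e0, r1, e1):
    """K in kcal/(mol A^2) for E = K*(r - r0)**2.

    (r0, e0) is the minimum and (r1, e1) one displaced point;
    energies are in hartree and distances in angstrom.
    """
    if r1 == r0:
        raise ValueError("the second point must be displaced from the minimum")
    return (e1 - e0) * HARTREE_TO_KCAL_MOL / (r1 - r0) ** 2
```

The displaced point should sit close to the minimum, since the harmonic form stops being a good description once the bond is stretched far.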

Then run using
mpirun -n 2 nwchem bonds.nw | tee bond12.nw

You can also do it in ecce -- but the plotting will have to be done by hand (I open the out file with vim, select the energy/atom-distance columns, :w angle.dat and then plot with gnuplot)

Do the same for atom pair 1-6 and 2-3. Make sure to pull the atoms far enough apart that the energy tails out (the 1-6 pair in the figure needs to be separated more)
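Repeating this for several atom pairs gets tedious, so the scan section can be generated instead. What follows is only a sketch in standalone Python 3: the template is modelled on the scan_input blocks used in the nwgeom examples on this blog, and the label 'cccc', the use of 'constant', the default theory 'scf', the scan ranges and the step count are all assumptions that you should adjust to your own input.

```python
# Text of the python section of an NWChem input. The inner loop uses the
# Python 2 print statement, as in the other NWChem inputs on this blog.
TEMPLATE = """python
from nwgeom import *
geom = '''
    geometry adjust
        zcoord
            bond {i} {j} %f cccc constant
        end
    end
'''
results=scan_input(geom,[{start}],[{end}],{nstep},'{theory}',task_optimize)
for i in range(0,len(results)):
    print results[i][0][0],results[i][1]
end

task python
"""

def scan_block(i, j, start, end, nstep, theory="scf"):
    """Return the scan section for the bond between atoms i and j."""
    return TEMPLATE.format(i=i, j=j, start=start, end=end,
                           nstep=nstep, theory=theory)

if __name__ == "__main__":
    # Atom pairs from the text; the range and step count are placeholders.
    for i, j in [(1, 2), (1, 6), (2, 3)]:
        print(scan_block(i, j, start=0.9, end=3.0, nstep=20))
```

Each block would be appended to a copy of bonds.nw that has had its task directives removed.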

5. Angles
As you'll discover, it's not just a matter of throwing in random numbers and scanning -- if you don't collect enough points, or if your first point is far away from the optimal angle, the data will look very odd.

6. Torsion
