Showing posts with label python. Show all posts
Showing posts with label python. Show all posts

22 May 2012

159. PES scanning of methanol bonds, angles, torsion using nwchem, nwgeom and python

NOTE: there's something dodgy with the potential/bond length plots -- they optimal bond lengths are way too long. I'll leave this post up here anyway, but be WARNED.

This is more of an overview of an idea of how to do it together with some explicit examples. This is more of a sketch than a step-by-step account.

Today's molecule is Methanol.

0. ecce and nwgeom/python
You need to set PYTHONPATH to /opt/nwchem/nwchem-6.1/contrib/python in order for nwchem to find the nwgeom module. That's easy enough on a local system since ~/.bashrc is read -- but it won't read ~/.bashrc on remote systems. For this you need to edit your CONFIG.<machine> files (in ~/.ECCE on your main node)  -- add
setup {
     setenv PYTHONPATH /opt/nwchem/nwchem-6.1/contrib/python
}
and/or

NWChemEnvironment {
          PYTHONPATH /opt/nwchem/nwchem-6.1/contrib/python
}      

1. Draw the molecule,
Draw the Carbon, then the oxygen, then the protons on the carbon, then the protons on the oxygen. Basically, draw the backbone first, then add protons by hand.  Turn it into a residue-based system (under 'build') and optimise the structure using e.g. RHF/6-31G* (this was written with MM/FF parametrisation in mind -- for simple scanning, just do whatever you want )

2. Calculate the partial charges (rhf/6-31g*).  Can skip this
You can e.g. constrain the methyl groups, or force all the methyl protons to be equal. It's a bit of a soft science, really. After the calc has finished, assign charges. I can't claim to understand which method is better (RESP, CRESP, CRESP2 etc.).

 (this was written with MM/FF parametrisation in mind -- for simple scanning, skip this step)

Here's CRESP2 (some variability...):
C -0.721
O -0.368
H 0.240
H 0.240
H 0.240
H 0.368

Also, assign (atom) Types (CT, OH, HC, HC, HC, HO) -- this is done by hand. Pick atom table, select residue view (or something similar), and fill in the Type column.

Then click on Tools, check Residue table, click on the menu-looking icon in the residue view, and select write fragment. Make sure you put the fragment file in a place where ecce and nwchem will find it (e.g. amber_u). For some reason I can't get ecce to actually change the name of my residues, so edit the fragment file by hand and change all instance of UNK to the same name as the fragment file, e.g. TST if you called it TST.frg.


3. Write down the bonds, angles and dihedral angles.
Bonds
H-C 1.087
C-O 1.400
O-H 0.946

Angles
H-O-C 109.467
O-C-H 112.039
H-C-H 108.682

Torsion
H-C-O-H -61.229

When you scan the parameters using python you want to be able to
1) see if the lowest energy conf make sense and
2) not deviate too much from the ideal angle/bond/torsion.

Things get weird if you do.

4. Try to determine bond strength
This is best done outside of ecce, and you really should have compiled nwchem with python-support to make this easier.

Copy the input file you used for the ESP calc. Call it bonds.nw. Remove anything about esp and all task directives, then add:


Technically I think the bond strengths only really need two data points unless you want to fit the Lennard-Jones equation to them, but it certainly looks neater getting the full behaviour.

Then run using
mpirun -n 2 nwchem bonds.nw | tee bond12.nw

You can also do it in ecce -- but the plotting will have to be done by hand (I open the out file with vim, select the energy/atom-distance columns, :w angle.dat and then plot with gnuplot)

Do the same for atom pair 1-6 and 2-3. Make sure to pull the atoms far enough apart that the energy tails out (the 1-6 pair in the figure needs to be separated more)


5. Angles
As you'll discover, it's not just a matter of throwing in random numbers and scanning -- if you don't collect enough points, or if your first point is far away from the optimal angle, the data will look very odd.




6. Torsion






09 May 2012

146. Nwchem with openblas

Openblas seems to be the successor of Gotoblas. http://xianyi.github.com/OpenBLAS/
I willingly admit that I am no expert on this blas library stuff, so I may well be doing something obviously wrong.

1. Compiling and installing openblas
sudo mkdir /opt/openblas
sudo chown ${USER} /opt/openblas
mkdir ~/tmp
cd ~/tmp

download from here http://github.com/xianyi/OpenBLAS/tarball/v0.1.1

tar xvf xianyi-OpenBLAS-v0.1.1-0-g5b7f443.tar.gz

 make all BINARY=64 CC=/usr/bin/gcc FC=/usr/bin/gfortran USE_THREAD=0 INTERFACE64=1 1> make.log 2>make.err

make PREFIX=/opt/openblas install
cp lib*.*  /opt/openblas/lib

cd /opt/openblas/lib
rm libopenblas.so.0
ln -s libopenblas.so libopenblas.so.0

do ls /opt/openblas/lib and note what cpu specific file you have such as libopenblas_barcelona-r0.1.1.so or libopenblas_nehalem-r0.1.1.a

edit /etc/ld.so.conf and add
/opt/openblas/lib

2. Compiling nwchem

sudo apt-get install build-essential gfortran python2.7-dev
mkdir /opt/nwchem/
sudo chown ${USER} /opt/nwchem
cd /opt/nwchem
wget http://www.nwchem-sw.org/images/Nwchem-6.0.tar.gz
tar -xvf Nwchem-6.0.tar.gz
cd nwchem-6.0/


For python support, edit src/config/makefile.h and add -lz -lssl to line 1962 (see here: http://verahill.blogspot.com.au/2012/04/adding-python-support-to-nwchem-under.html)

Next, continue
export LARGE_FILES=TRUE
export TCGRSH=/usr/bin/ssh
export NWCHEM_TOP=`pwd`
export NWCHEM_TARGET=LINUX64
export NWCHEM_MODULES="all python"
export PYTHONHOME=/usr
export PYTHONVERSION=2.7
export USE_MPI=y
export USE_MPIF=y
export MPI_LOC=/usr/lib/openmpi/lib
export MPI_INCLUDE=/usr/lib/openmpi/include
export BLASOPT="-L/opt/openblas/lib -lopenblas -lopenblas_barcelona-r0.1.1"
export LIBRARY_PATH=$LIBRARY_PATH:/usr/lib/openmpi/lib
export LIBMPI="-lmpi -lopen-rte -lopen-pal -ldl -lmpi_f77 -lpthread"
cd $NWCHEM_TOP/src
make clean
make nwchem_config
make FC=gfortran

Now edit your ~/.bashrc and add
export PATH=$PATH:/opt/nwchem/nwchem-6.0/bin/LINUX64
export LD_LIBRARY_PATH=/opt/openblas/lib:$LD_LIBRARY_PATH 
and you should be done.

You might find that ecce still won't properly execute nwchem jobs (you don't get an error -- it just doesn't work) -- edit your /etc/ld.so.conf and add a line saying

/opt/openblas/lib

Then do sudo ldconfig and it will work


Time to look into benchmarking...


So far I've got the following benchmarks (in seconds) for nwchem when optimising benzene with 6-311++g** using mpirun -n 3.

without -lpthread
_____Internal blas_____openblas_____ATLAS_____acml_____
Ta     249.8                    277.7                 335.9           ------
Be     408.2                    421.7                 568.4        
B      546.4                     579.9                 760.2        

Noticing a mention regarding -lpthread online I compiled with that as part of the mpilibs, and lo and behold:

with -lpthread

_____Internal blas_____openblas_____ATLAS_____acml_____
Ta      201.7                    198.9                233.9           ------
Be      371.1                    314.4                390.3        
B        496.6                    416.8                 521.9

* All run using mpirun -n 3. Be=phenom II X6. B= Athlon II X3. Ta= i7-1600 X4. Intel MKL cost money, acml are free.

This doesn't mean that one library is superior to the other -- it just means that under the conditions I employed and in the presence of any mistake I may have made, these are my observations. Your mileage will vary.

27 March 2012

115. Very Simple Python Queue Manager

I suppose we can call it the VSPQM, which sounds a bit like a Roman initialism, akin to SPQR.

I've spent the past few days trying to get to grips with the Sun Gridengine (SGE) but have given up for now. While it seems capable, it's just overkill for my purposes, especially taking into account the difficulties in simply configuring it. It's a bit similar to my experience with OpenDX, a very capable plotting program, but which I couldn't make work to satisfaction in spite of being one of the lucky few in possession of the "Open DX -- Paths to Visualisation" book.

Long story short -- I wrote a small script in python. It
- reads a file, list, with the name of shell scripts
- the shell scripts, job1.sh..jobn.sh, are executed sequentially - when the execution of one script is finished, the next one is executed
- jobs can be added and removed from list during execution

It's a 'dumb' script -- it does not try to balance jobs across nodes or look for idle cpus/cores. It just executes one job after the other, and mark jobs as done after execution.

To test it:
create a file called list and put the following lines in it:
pi40.sh
pi400.sh
pi2000.sh
The scripts are the following:

pi40.sh
echo "pi to 40 decimals"
echo "scale=40; 4*a(1)" | bc -l -q
echo "done"
pi400.sh
echo "scale=400; 4*a(1)" | bc -l -q
pi200.sh
 echo "scale=2000; 4*a(1)" | bc -l -q
The python code for vspqm.py is below

I've aliased my vspqm (edit ~/.bashrc):
alias vspqm='/home/me/work/vspqm/vspqm.py'
Then sourced ~/.bashrc

Launch in the directory you keep your list file using
me@beryllium:~/work/vspqm/jobs$ vspqm list > log &
[1] 23925
me@beryllium:~/work/vspqm/jobs$ cat log
pi to 40 decimals
3.1415926535897932384626433832795028841968
done
3.141592653589793238462643383279502884197169399375105820974944592307\
[..]
3.141592653589793238462643383279502884197169399375105820974944592307\
81640628620899862803482534211706798214808651328230664709384460955058\
[..]

An nwchem example would be
list:
ac.sh
bn.sh
ac.sh:
cd acetone/
mpirun -n 4 nwchem ac.nw>ac.out
cd ../
bn.sh:
cd benzene/
mpirun -n 4 nwchem bn.nw>bn.out
cd ../


Our python queue manager (which we'll call vspqm.py and chmod +x to make executable) is below. Don't forget to change #!/usr/bin/python2.4 if necessary -- I use 2.4 on ROCKS and 2.7 on Debian testing/wheezy

#!/usr/bin/python2.4
# rudimentary queue manager. Handles a single node,
# submitting a series of jobs in sequence. use python v2.4-2.7
import os
import time
import sys
infile=sys.argv[1]
print "pyqm v 0.0.3"
def launchjob(job):
        i=0
        print "######"
        job=job.rstrip('\n')
     
        i=os.system("sh "+job)
        if i==0:
                print "Job successful"
        else:
                print "Job failed"
        print "######"
        return i
def remake_list(infile):
        qfile=open(infile,"w")
        bakfile=open(infile+".bak",'r')
        for i in bakfile:
                qfile.write(i)
        return 0
def rewind(infile):
        qfile=open(infile,"w")
        bakfile=open(infile+".bak",'r')
        for i in bakfile:
                qfile.write(i[1:])
        return 0
def get_next_job(infile):
        qfile=open(infile,"r")
        bakfile=open(infile+".bak",'w')
        lines=""
        job=""
        for line in qfile:
                if line[0]=="*":
                        print "Marked as done: ",line[1:]
                if line[0]!="*" and job=="":
                        print "Launching: ", line
                        job=line
                        line="*"+line
                lines+=line
        bakfile.write(lines)
        qfile.close
        bakfile.close
        return job
def main(infile):
        jobs=1
        while (jobs==1):
                newjob=get_next_job(infile)
                remake_list(infile)
                if newjob!="":
                        jobs=1
                        echojob=launchjob(newjob)
                else:
                        print "No more jobs found at "+str(time.asctime())    
                        jobs=0
        return 0

if __name__ == "__main__":
        main(infile)
        rewind(infile)

25 November 2011

20. Is there no molecular weight calculator in the Debian repos?

UPDATE: see here for an isotopic calculator written in python -- it calculates mass as well: http://verahill.blogspot.com.au/2012/10/isotopic-pattern-caculator-in-python.html

The best molecular weight calculator which I've encountered is Matthew Monroe's Molecular Weight Calculator, which can be found at http://ncrr.pnnl.gov/software/

It does everything. Molecular weights. Isotopic patterns. And so, so, so much more.

It has one major drawback though - it's written for Windows. Luckily, it sort of works under Wine after you've done a bit of wine-trick-ery.

As great as the calculator is, sometimes you only need to calculate the molecular weight of something, and nothing else. Searching the debian repos I can' t find a single dedicated molecular weight calculator. In particular, a command-line driven calculator would be nice.

Seriously - it's a crying shame that the distribution with the largest repos, i.e. Debian, does not have a single passable molecular weight calculator. It is even more surprising given the number of chemistry-related packages which are present.

So, here's what I did:
a quick google on "python molecular weight calculator" brought me to http://pygments.org

A little bit of editing gave the code below, which was saved as molcalc, copied to /usr/bin, followed by sudo chmod +x /usr/bin/molcalc. It can now be called using
molcalc "(Co(CO)5)2"
and returns
The mass of (Co(CO)5)2 is 257.916890.

Here's the code, which is 99.9% the original and 0.1% my modification. All credit thus due to Lee, Freitas and Tucker.

NOTE: it doesn't handle layered parentheses. (Al(NO3)3)2 gets interpreted as Al2(NO3)3.

#!/usr/bin/python2.6
#########################################################################
# Author: Toni Lee with the help of Guilherme Freitas and Becky Tucker. Minor changes by Lindqvist
# Copyright: This module has been placed in the public domain
#########################################################################

#Import regular expressions
import re
import sys
try:
test=sys.argv[1]
except:
quit()

#Create the dictionary (From Becky with a value of 0 inserted for Uus(mass not measurable))
TableofElements ={ 'H':1.00794,'He':4.002602,'Li':6.941,'Be':9.012182,
                        'B':10.811,'C':12.0107,'N':14.0067,'O':15.9994,'F':18.9984032,'Ne':20.1797,
                        'Na':22.98976928,'Mg':24.3050,'Al':26.9815386,'Si':28.0855,
                        'P':30.973762,'S':32.065,'Cl':35.453,'Ar':39.948,'K':39.0983,'Ca':40.078,
                        'Sc':44.955912,'Ti':47.867,'V':50.9415,'Cr':51.9961,'Mn':54.938045,
                        'Fe':55.845,'Ni':58.6934,'Co':58.933195,'Cu':63.546,'Zn':65.38,'Ga':69.723,
                        'Ge':72.64,'As':74.92160,'Se':78.96,'Br':79.904,'Kr':83.798,'Rb':85.4678,
                        'Sr':87.62,'Y':88.90585,'Zr':91.224,'Nb':92.90638,'Mo':95.96,'Tc':98,
                        'Ru':101.07,'Rh':102.90550,'Pd':106.42,'Ag':107.8682,'Cd':112.411,
                        'In':114.818,'Sn':118.710,'Sb':121.760,'Te':127.60,'I':126.90447,
                        'Xe':131.293,'Cs':132.9054519,'Ba':137.327,'La':138.90547,'Ce':140.116,
                        'Pr':140.90765,'Nd':144.242,'Pm':145,'Sm':150.36,'Eu':151.964,'Gd':157.25,
                        'Tb':158.92535,'Dy':162.500,'Ho':164.93032,'Er':167.259,'Tm':168.93421,
                        'Yb':173.054,'Lu':174.9668,'Hf':178.49,'Ta':180.94788,'W':183.84,
                        'Re':186.207,'Os':190.23,'Ir':192.217,'Pt':195.084,'Au':196.966569,
                        'Hg':200.59,'Tl':204.3833,'Pb':207.2,'Bi':208.98040,'Po':210,'At':210,
                        'Rn':220,'Fr':223,'Ra':226,'Ac':227,'Th':232.03806,'Pa':231.03588,
                        'U':238.02891,'Np':237,'Pu':244,'Am':243,'Cm':247,'Bk':247,'Cf':251,
                        'Es':252,'Fm':257,'Md':258,'No':259,'Lr':262,'Rf':261,'Db':262,'Sg':266,
                        'Bh':264,'Hs':277,'Mt':268,'Ds':271,'Rg':272, 'Uus':0
}


#######################################
#Computes the MW of an atom-number pair
#######################################
def getMass(x):
    atom=re.findall('[A-Z][a-z]*',x)
    number=re.findall('[0-9]+', x)
    if len(number) == 0:
        multiplier = 1
    else:
        multiplier = float(number[0])
    atomic_mass=TableofElements[atom[0]]
    return (atomic_mass*multiplier)

################################################################
#Segments formula into atom-number sections (i.e. 'H3' or 'N10')
################################################################
def parseFormula(fragment):
    segments=re.findall('[A-Z][a-z]*[0-9]*',fragment)
    return (segments)

##################################################################################
#Computes total mass of both parenthetical and nonparenthetical formula components
##################################################################################
def molmass(formula):
    parenMass=0
    nonparenMass=0
    while (len(formula)>0):
        #First computes the molecular weight of all parenthetical formulas from left to right
        while (len(re.findall('\(\w*\)[0-9]+', formula))!=0):
            parenthetical=re.findall('\(\w*\)[0-9]+',formula)
            for i in range(0,len(parenthetical)):
                parenMult1 = re.findall('\)[0-9]+', parenthetical[i])
                parenMult2 = re.findall('[0-9]+', parenMult1[0])
                segments =parseFormula(parenthetical[i])
                for i in range(0, len(segments)):
                    parenMass= parenMass + ((getMass(segments[i]))*(float(parenMult2[0])))
            formula=re.sub('\(\w*\)[0-9]+', '', formula)
        #Sums nonparenthetical molecular weights when all parenthetical molecular weights have been summed
        segments = parseFormula(formula)
        for i in range(0, len(segments)):
            nonparenMass=nonparenMass + getMass(segments[i])
        formula=re.sub(formula, '', formula)

    Mass=parenMass+nonparenMass
    return Mass
     
if __name__ == '__main__':
test=test.split(',')
for element in test:
print ('The mass of %(substance)s is %(Mass)f.' % {'substance': \
element, 'Mass': molmass(element)})
 

26 July 2011

8. Very rudimentary peak-finding

The following scripts looks through a file with one x column and several y columns, and returns values larger than a certain cutoff.

Usage:
findpeaks filename columnnumber cutoffvalue
e.g.
findpeaks test.dat 3 0.01

findpeaks:
#!/usr/bin/python2.6
import sys
infile=sys.argv[1]
colno=int(sys.argv[2])
cutoff=float(sys.argv[3])

minsig=100
maxsig=0


spectra=open(infile,'r')



for spectrum in spectra:
spectrum=spectrum.rstrip('\n')
spectrum=spectrum.split(' ')
# print spectrum

try:
spectrum[colno]=float(spectrum[colno])
if spectrum[colno]>cutoff:
print spectrum[1],' ',spectrum[colno]

if spectrum[colno]<minsig:
minsig=spectrum[colno]
if spectrum[colno]>maxsig:
maxsig=spectrum[colno]


except:
a=0
maxsig=0
print "not working: "

print "max sig: ",maxsig,", minsig: ",minsig
spectra.close

20 July 2011

7. Processing 1D Bruker nmr data

Bruker 1D binary NMR files can be processed using a combination of cat, grep, sed, gawk and od, together with python and octave (w/ octave-optim) for some fancy line-fitting.

 brukdig2asc:
 #!/bin/bash
#usage: brukdig2asc
SW=`cat acqus | grep 'SW_h' | sed 's/\=/\t/g' | gawk '{print $2}'| tr -d '\n'`
TD=`cat acqus | grep 'TD=' | sed 's/\=/\t/g' | gawk '{print $2}'| tr -d '\n'`
O=`cat acqus | grep '$O1=' | sed 's/\=/\t/g' | gawk '{print $2}'`
SFO=`cat acqus | grep 'SFO1=' | sed 's/\=/\t/g' | gawk '{print $2}'`
#TIME=16384
#SWEEP=23809.5238095238
#AQ=`echo "1/(23809.5238095238/(16384/2))" | bc -lq`
cp fid fid.bin
ls fid.bin | cpio -o | cpio -i --swap -u
od -An -t dI -v -w8 fid.bin| gawk '{print NR,$1,$2}'| sed '1,64d' >fid.asc1
pynmr $SW $TD $O $SFO
makespec

pynmr:
#!/usr/bin/python2.6
import sys
#print str(sys.argv)
sweepwidth=float(sys.argv[1])
nopts=int(sys.argv[2])
centrefreq=float(sys.argv[3])
basefreq=float(sys.argv[4])

aq=1/(sweepwidth/(nopts/2))
#print str(sweepwidth),str(nopts)
f=open('fid.asc1','r')
g=open('fid.asc','w')
for line in f:
    line=line.rstrip('\n')
    line=line.split(' ')
#    print line
    freq=float(line[0])/(nopts/2)*sweepwidth+(centrefreq-sweepwidth/2)
    line[0]=(float(line[0])/(nopts/2))*aq
    g.write(str(line[0])+'\t'+str(line[1])+'\t'+str(line[2])+'\t'+str(freq)+'\n')
f.close
g.close

makespec:

#!/bin/bash
octave --silent --eval "fid=load('fid.asc');
#make xaxis
[nopts b]=size(fid);
aq=max(fid(:,1));
sw=nopts/aq;
freqx=linspace(0,sw,nopts)';

#apodizing
lb=5/10000;
fid(:,2)=fid(:,2).*exp(-lb.*freqx);
fid(:,3)=fid(:,3).*exp(-lb.*freqx);

#phasing
spec=[fid(:,1) real(fftshift(fft(fid(:,2)+i*fid(:,3)))) imag(fftshift(fft(fid(:,2)+i*fid(:,3))))];
[a b]=size(spec); spec(a/2,2:3)=[0 0];
phc=linspace(0,2*pi,180);
maxsig=0;k=1;
for n=1:180;
        localmax=max( real( (spec(:,2)+i*spec(:,3)).*exp(i*phc(n)) ));
        if (localmax>maxsig)
                maxsig=localmax;
                k=n;
        endif
endfor;
#simple baseline
absd=inline('m+t*0','t','m');
guess=0;
[f m kvg iter corp covp covr stdresid z r2]=leasqr(fid(:,4),real((spec(:,2)+i*spec(:,3)).*exp(i*phc(k))),guess,absd);
#disp(m)
#disp(sqrt(diag(covp)))

#make spectrum
spectrum=[fid(:,4) real((spec(:,2)+i*spec(:,3)).*exp(i*phc(k)))-m imag((spec(:,2)+i*spec(:,3)).*exp(i*(phc(k)+pi/2)))-m];

#fitting
pkg load optim
[a b]=max(spectrum(:,2));
centre=fid(b,4);
guess=[10 max(spec(:,2))]; #centre width height
#disp(guess)
lorentzian=inline('p(2)*(1/pi)*(p(1)/2)./((t-centre).^2+(0.5*p(1))^2)','t','p');
[f p r2]=leasqr(fid((b-150):(b+150),4),spectrum((b-150):(b+150),3),guess,lorentzian);

#filter out artefacts from fitting set
filtered=[0 0];
res=floor((max(fid(:,4))-min(fid(:,4)))/nopts);
for l=(b-ceil(5*p(1)/res)):(b+5*ceil(p(1))/res)
delta=lorentzian(fid(l,4),p)-spectrum(l,2);

if (delta>(lorentzian(fid(l,4),p))/1.2)
# do nothing
else
filtered=[filtered; fid(l,4) spectrum(l,2)];
endif
endfor

filtered=[ filtered(2:size(filtered(:,2)),1) filtered(2:(size(filtered(:,2))),2)  ];
[f p r2]=leasqr(filtered(:,1),filtered(:,2),p,lorentzian);

#disp(p')
#disp(r2)
params=[centre centre/67.8 max(lorentzian(fid(:,4),p)) p(1) 1.000 p(2)];
disp(params)
#save
spex=[fid(:,4) real((spec(:,2)+i*spec(:,3)).*exp(i*phc(k)))-m imag((spec(:,2)+i*spec(:,3)).*exp(i*(phc(k)+pi/2)))-m lorentzian(fid(:,4),p)];
save spectrum.dat spex;"