Showing posts with label centos. Show all posts

21 March 2017

635. Installing R on Rocks 5.4.3

Rocks 5.4.3 is based on CentOS 5.6 which is practically ancient by now (released Jan 2011).

Either way, when dealing with someone else's cluster it's better not to fiddle too much with what is already working.

Here's a not at all elegant way of installing R on Rocks 5.4.3:
wget http://mirror.nsw.coloau.com.au/epel/5/x86_64/R-core-3.3.2-3.el5.x86_64.rpm
wget http://mirror.nsw.coloau.com.au/epel/5/x86_64/R-3.3.2-3.el5.x86_64.rpm 
wget http://mirror.nsw.coloau.com.au/epel/5/x86_64/R-devel-3.3.2-3.el5.x86_64.rpm 
wget http://mirror.nsw.coloau.com.au/epel/5/x86_64/libRmath-3.3.2-3.el5.x86_64.rpm 
wget http://mirror.nsw.coloau.com.au/epel/5/x86_64/libRmath-devel-3.3.2-3.el5.x86_64.rpm 
wget http://mirror.nsw.coloau.com.au/epel/5/x86_64/R-core-devel-3.3.2-3.el5.x86_64.rpm
wget http://mirror.nsw.coloau.com.au/epel/5/x86_64/libssh2-0.18-10.el5.x86_64.rpm 
wget http://mirror.nsw.coloau.com.au/epel/5/x86_64/xdg-utils-1.0.2-4.el5.noarch.rpm 
wget http://mirror.centos.org/centos/5/os/x86_64/CentOS/xz-devel-4.999.9-0.3.beta.20091007git.el5.x86_64.rpm
wget http://mirror.centos.org/centos/5/os/x86_64/CentOS/texinfo-tex-4.8-14.el5.x86_64.rpm
wget http://mirror.centos.org/centos/5/os/x86_64/CentOS/texinfo-4.8-14.el5.x86_64.rpm

sudo yum install R-3.3.2-3.el5.x86_64.rpm libRmath-devel-3.3.2-3.el5.x86_64.rpm libRmath-3.3.2-3.el5.x86_64.rpm R-devel-3.3.2-3.el5.x86_64.rpm R-core-3.3.2-3.el5.x86_64.rpm R-core-devel-3.3.2-3.el5.x86_64.rpm libssh2-0.18-10.el5.x86_64.rpm xdg-utils-1.0.2-4.el5.noarch.rpm texinfo-tex-4.8-14.el5.x86_64.rpm xz-devel-4.999.9-0.3.beta.20091007git.el5.x86_64.rpm texinfo-4.8-14.el5.x86_64.rpm 
[..]
Total size: 169 M
Downloading Packages:
Running rpm_check_debug
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing : libssh2          1/11
  Installing : libRmath         2/11
  Installing : texinfo          3/11
  Installing : texinfo-tex      4/11
  Installing : libRmath-devel   5/11
  Installing : xz-devel         6/11
  Installing : xdg-utils        7/11
  Installing : R-core           8/11
  Installing : R-core-devel     9/11
  Installing : R-devel         10/11
  Installing : R               11/11

Installed:
  R.x86_64 0:3.3.2-3.el5
  R-core.x86_64 0:3.3.2-3.el5
  R-core-devel.x86_64 0:3.3.2-3.el5
  R-devel.x86_64 0:3.3.2-3.el5
  libRmath.x86_64 0:3.3.2-3.el5
  libRmath-devel.x86_64 0:3.3.2-3.el5
  libssh2.x86_64 0:0.18-10.el5
  texinfo.x86_64 0:4.8-14.el5
  texinfo-tex.x86_64 0:4.8-14.el5
  xdg-utils.noarch 0:1.0.2-4.el5
  xz-devel.x86_64 0:4.999.9-0.3.beta.20091007git.el5

Complete!
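The six R and libRmath packages all share one mirror and one version string, so the download list can be generated rather than typed out. This is only a rough sketch: the function name epel_urls and the variables EPEL and R_VER are made up for illustration, and the mirror may no longer carry these files.

```shell
#!/bin/sh
# Hypothetical helper: print the download URLs for the R packages used above.
# EPEL and R_VER are illustrative variables; the values come from the wget lines.
EPEL=http://mirror.nsw.coloau.com.au/epel/5/x86_64
R_VER=3.3.2-3.el5

epel_urls() {
    for pkg in R-core R R-devel libRmath libRmath-devel R-core-devel; do
        echo "$EPEL/$pkg-$R_VER.x86_64.rpm"
    done
}

# Print the list; nothing is downloaded at this point.
epel_urls
```

To actually fetch the files, something like `epel_urls | xargs -n 1 wget` should do. The remaining dependencies (libssh2, xdg-utils, xz-devel, texinfo) have their own version strings and still need their individual wget lines.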

Testing:
R
R version 3.3.2 (2016-10-31) -- "Sincere Pumpkin Patch"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> q()
Save workspace image? [y/n/c]: n

19 May 2013

421. NWChem 6.3 on ROCKS 5.4.3/CentOS 5.6

Update 23 May 2013: The execution times are pretty much the same as for 6.1.1 with a new patch. I've updated the instructions below to incorporate this new patch (http://www.nwchem-sw.org/images/Iswtch.patch.gz)

Update 21 May 2013:
The execution times can be improved considerably by setting
ARMCI_NETWORK=SOCKETS

They are still ca 30% longer than 6.1.1 though due to slower SCF convergence.
See http://www.nwchem-sw.org/index.php/Special:AWCforum/st/id834/Nwchem_6.3_running_2-5_times_slo....html

UPDATE 20 May 2013:
Nwchem 6.3 is very slow compared to 6.1.1. A six-core run (out of eight cores available) was 121 s using 6.1.1 but 254 seconds on 6.3!

I observed this on debian as well: 6.3 on debian is five times slower (190s vs 40 s for example at 8 cores in http://verahill.blogspot.com.au/2013/05/414-frequency-vs-cores-crude.html) than 6.1.1. Not sure why that is.

Original:
NWChem 6.3 is out now. Here's how to build it on ROCKS 5.4.3 (based on Centos 5.6) for CPU-based calculations (currently only CCSD(T) can take advantage of GPU/CUDA anyway).

To build on debian, see http://verahill.blogspot.com.au/2013/05/424-nwchem-63-on-debian-wheezy.html

This assumes that you've got a proper build environment (gcc, fortran, openmpi) installed.

Openblas:
I've added all users who do computations to the group compchem.
sudo mkdir /share/apps/openblas
sudo chown $USER:compchem /share/apps/openblas
cd ~/tmp
wget http://nodeload.github.com/xianyi/OpenBLAS/tarball/v0.1.1
tar xvf v0.1.1
cd xianyi-OpenBLAS-e6e87a2/
wget http://www.netlib.org/lapack/lapack-3.4.1.tgz
make all BINARY=64 CC=/usr/bin/gcc FC=/usr/bin/gfortran USE_THREAD=0 INTERFACE64=1 1> make.log 2>make.err

make PREFIX=/share/apps/openblas install
cp lib*.*  /share/apps/openblas/lib
sudo chmod 755 /share/apps/openblas -R

For later use with nwchem and ecce, add /share/apps/openblas/lib to /etc/ld.so.conf and do
sudo ldconfig

Put
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/share/apps/openblas/lib
in ~/.bashrc and/or queue files.
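If ~/.bashrc gets sourced repeatedly, a plain append keeps growing LD_LIBRARY_PATH with duplicates. Below is a minimal sketch of a guard against that; append_path is a hypothetical helper name, not something ROCKS or OpenBLAS provides.

```shell
# Hypothetical helper: print path list $1 with directory $2 appended,
# unless $2 is already one of its colon-separated entries.
append_path() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;
        *)
            if [ -n "$1" ]; then
                printf '%s\n' "$1:$2"
            else
                printf '%s\n' "$2"
            fi
            ;;
    esac
}

LD_LIBRARY_PATH=$(append_path "${LD_LIBRARY_PATH:-}" /share/apps/openblas/lib)
export LD_LIBRARY_PATH
```

The same function works for PATH, since it only looks at the colon-separated string it is given.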

NWChem
I've added all users who do computations to the group compchem.
sudo mkdir /share/apps/nwchem/
sudo chown $USER:compchem /share/apps/nwchem/

cd /share/apps/nwchem
wget -O Nwchem-6.3-src.2013-05-17.tar.gz "http://www.nwchem-sw.org/download.php?f=Nwchem-6.3-src.2013-05-17.tar.gz"
tar xvf Nwchem-6.3-src.2013-05-17.tar.gz 
cd nwchem-6.3-src.2013-05-17/
cd src/
wget http://www.nwchem-sw.org/images/Iswtch.patch.gz
gzip -d Iswtch.patch.gz
patch -p0 < Iswtch.patch
cd ../
export LARGE_FILES=TRUE
export TCGRSH=/usr/bin/ssh
export NWCHEM_TOP=`pwd`
export NWCHEM_TARGET=LINUX64
export NWCHEM_MODULES="all python"
export PYTHONHOME=/opt/rocks
export PYTHONVERSION=2.4
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export MPI_LOC=/opt/openmpi
export MPI_INCLUDE=/opt/openmpi/include
export LIBRARY_PATH=$LIBRARY_PATH:/opt/openmpi/lib:/share/apps/openblas
export LIBMPI="-lmpi -lopen-rte -lopen-pal -ldl -lmpi_f77 -lpthread"
export BLASOPT="-L/share/apps/openblas/lib -lopenblas -lopenblas_nehalem-r0.1.1 -lopenblas_nehalemp-r0.1.1"

export ARMCI_NETWORK=SOCKETS

cd $NWCHEM_TOP/src
export FC=gfortran
make clean
make  nwchem_config
make  FC=gfortran
cd ../contrib
./getmem.nwchem
sudo chmod 755 /share/apps/nwchem/nwchem-6.3-src.2013-05-17 -R

Create a default.nwchemrc in /share/apps/nwchem
nwchem_basis_library /share/apps/nwchem/nwchem-6.3-src.2013-05-17/src/basis/libraries/
ffield amber
amber_1 /share/apps/nwchem/nwchem-6.3-src.2013-05-17/src/data/amber_s/
amber_2 /share/apps/nwchem/nwchem-6.3-src.2013-05-17/src/data/amber_x/
amber_3 /share/apps/nwchem/nwchem-6.3-src.2013-05-17/src/data/amber_q/
amber_4 /share/apps/nwchem/nwchem-6.3-src.2013-05-17/src/data/amber_u/
amber_5 /share/apps/nwchem/nwchem-6.3-src.2013-05-17/src/data/custom/
spce /share/apps/nwchem/nwchem-6.3-src.2013-05-17/src/data/solvents/spce.rst
charmm_s /share/apps/nwchem/nwchem-6.3-src.2013-05-17/src/data/charmm_s/
charmm_x /share/apps/nwchem/nwchem-6.3-src.2013-05-17/src/data/charmm_x/
and put symlinks to it in the users' home directories, e.g.
cd ~
ln -s /share/apps/nwchem/default.nwchemrc .nwchemrc
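Since every line of default.nwchemrc hangs off the same source directory, the file can be generated from a single path, which is handy when the next NWChem release lands in a new directory. A small sketch follows; make_nwchemrc is a made-up function name, and the entries are simply the ones listed in this post.

```shell
# Hypothetical helper: print a default.nwchemrc for the NWChem tree rooted at $1.
make_nwchemrc() {
    top=$1
    cat <<EOF
nwchem_basis_library $top/src/basis/libraries/
ffield amber
amber_1 $top/src/data/amber_s/
amber_2 $top/src/data/amber_x/
amber_3 $top/src/data/amber_q/
amber_4 $top/src/data/amber_u/
amber_5 $top/src/data/custom/
spce $top/src/data/solvents/spce.rst
charmm_s $top/src/data/charmm_s/
charmm_x $top/src/data/charmm_x/
EOF
}

# Print to the terminal; redirect to /share/apps/nwchem/default.nwchemrc to install it.
make_nwchemrc /share/apps/nwchem/nwchem-6.3-src.2013-05-17
```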

20 March 2013

364. Setting up a new user on a ROCKS cluster

Because I keep forgetting about the rocks sync command...

From https://groups.google.com/forum/?fromgroups=#!topic/rocks-clusters/P6tvn_2Gk5Y

To add a new user to a ROCKS cluster and let them use Sun Grid Engine, do the following

sudo useradd -m verahill
sudo passwd verahill
su verahill
exit
sudo usermod -a -G compchem verahill
rocks sync users
qconf -auser verahill
name verahill
oticket 0
fshare 0
delete_time 0
default_project NONE

where compchem is a usergroup I've set up to give everyone access to the executables they need.

The first login, using su above, creates the .ssh directory and rsa/dsa keys.

Finally, to force the user to change their password on first login, do
chage -d 0 verahill
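Since the whole point of this post is that the steps are easy to forget, here is a sketch of a checklist generator. It only prints the commands from this post for a given user and group and runs none of them (passwd, su and qconf -auser are interactive anyway). The name new_user_cmds is hypothetical.

```shell
# Hypothetical helper: print (not run) the commands for adding user $1 to group $2.
new_user_cmds() {
    user=$1
    group=$2
    cat <<EOF
sudo useradd -m $user
sudo passwd $user
su $user
exit
sudo usermod -a -G $group $user
rocks sync users
qconf -auser $user
chage -d 0 $user
EOF
}

new_user_cmds verahill compchem
```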

15 February 2013

339. Compiling ncdu on ROCKS 5.4.3/Centos 5.6

du is nice, but ncdu gives a better overview. Nothing odd about building it though:

mkdir ~/tmp
cd ~/tmp
wget http://dev.yorhel.nl/download/ncdu-1.9.tar.gz
tar xvf ncdu-1.9.tar.gz
cd ncdu-1.9/
sudo mkdir /share/apps/tools/ncdu -p
sudo chown $USER /share/apps/tools/ncdu
./configure --prefix=/share/apps/tools/ncdu
make
make install
echo 'export PATH=$PATH:/share/apps/tools/ncdu/bin' >> ~/.bashrc
source ~/.bashrc

Start by running
ncdu

01 February 2013

329. ECCE, xterm and X forwarding: fixing broken "tail -f on output" in ECCE/'untrusted X11 forwarding' error


The problem
In ECCE, you highlight a running job on a remote server which you've set up with the frontendMachine option (here and here and here), in this case a ROCKS 5.4.3/CentOS server, and hit e.g. Alt+L or "Run Mgmt"/"Tail -f on Output file", but nothing happens. When you set ECCE to provide verbose output (add "ECCE_RCOM_LOGMODE true" to ecce/apps/siteconfig/site_runtime) you see the following errors:

X11 connection rejected because of wrong authentication.
X connection to localhost:43.0 broken (explicit kill or server shutdown).
and
OpenSSH_4.3p2, OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008
Warning: untrusted X11 forwarding setup failed: xauth key data not generated
Warning: No xauth data; using fake authentication data for X11 forwarding.
Obviously there are non-ECCE related situations where you may see these errors too. Doesn't matter -- same solution.


The diagnostics
cat /etc/ssh/sshd_config |grep X11
X11Forwarding yes
X11DisplayOffset 10
cat /etc/ssh/ssh_config |grep X11|grep -v ^#
ForwardX11 yes
sudo cat /etc/ssh/sshd_config |grep X11|grep -v ^#
X11Forwarding yes
X11DisplayOffset 10

So, why localhost:43? And why isn't it working? To find out, I connected from my workstation to the cluster (which is connected to the net via the front node), and then from the cluster front node to its own local name:

ssh -X server.external.dns
echo $DISPLAY
localhost:42.0
ssh -X server.local.dns
Warning: untrusted X11 forwarding setup failed: xauth key data not generated
Warning: No xauth data; using fake authentication data for X11 forwarding.
echo $DISPLAY
localhost:44.0
yet
ssh -Y server.local.dns

works fine.

The solution:
Simpler than I thought:
I edited ~/.ssh/config on the server, and did
Host server.local.dns
    Hostname server.local.dns
    User me
    ForwardX11 yes
    ForwardX11Trusted yes

And now it works!

Presumably I could've just edited /etc/ssh/ssh_config instead, but it's a multi-user cluster and I'm happier to change things on a user-by-user basis.
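On a multi-user cluster each user needs the same stanza, so here is a sketch of a function that appends it to a given ssh config file, but only if no "Host" line for that name exists yet. The function name add_x11_host is made up, and the host and user names are the placeholders used in this post.

```shell
# Hypothetical helper: append a trusted-X11 Host block for host $2 and user $3
# to ssh config file $1, unless a "Host $2" line is already there.
add_x11_host() {
    cfg=$1
    host=$2
    user=$3
    if [ -f "$cfg" ] && grep -qxF "Host $host" "$cfg"; then
        return 0
    fi
    cat >> "$cfg" <<EOF
Host $host
    Hostname $host
    User $user
    ForwardX11 yes
    ForwardX11Trusted yes
EOF
}

# Example (not run here): add_x11_host ~/.ssh/config server.local.dns me
```

ForwardX11Trusted yes is the config-file equivalent of ssh -Y, which is why it has the same effect as the working command above.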

05 November 2012

275. Compiling Dalton 2011 on ROCKS 5.4.3/CentOS

I've previously struggled with Dalton 2.0-cam and given up. I somehow didn't know about Dalton 2011 at that point, but it turns out it's much easier to build. Well, I managed to build it on ROCKS/CentOS (gcc 4.1). I'm still working on the debian version, which has a much newer gcc (4.7).

Before you get started you may want to compile ATLAS as shown here: http://verahill.blogspot.com.au/2012/09/rocks-543-atlas-and-gromacs-on-xeon.html

License:
First go to http://daltonprogram.org/licence/ and fill out the license agreement. Once that's done you'll get an automated email with a license form, which you should print, sign, scan and email to the email address you're given. Once your form has been processed you'll be sent another email with a user name and password. I received my user name and password the next business day.

Go online and download the source file, Dalton2011_release_v0.tgz, and put it in ~/tmp. Sort out where you want your program to end up
sudo mkdir /share/apps/dalton
sudo chown $USER /share/apps/dalton
mkdir /share/apps/dalton/bin /share/apps/dalton/basis /share/apps/dalton/lsdalton

Next,
cd ~/tmp
tar xvf Dalton2011_release_v0.tgz
cd Dalton2011_release/DALTON
./configure 

and answer all the questions:
------------------------------------------------------------------
   Configuring the DALTON Makefile.config and "dalton" run script
------------------------------------------------------------------

INFO: Operating system from 'uname -s' : Linux
INFO: Processor type   from 'uname -m' : x86_64
No architecture specified, attempting auto-configuration:
This appears to be a -linux architecture. Is this correct? [Y/n] 
--> Installing DALTON on a -linux computer


Note that 64-bit integers are desirable for Cholesky and very large
scale CI, otherwise the most important effect is that some files will be bigger.

If you choose 64-bit integers, be careful that any system library
routines (incl. MPI) also use 64-bit integers!

Do you want 64-bit integers? [y/N] 
Do you want to install the program in a parallel MPI version? [Y/n] 
-->WARNING: Makefiles for MPI architecture are difficult to guess
   Please compare the generated Makefile.config with local documentation.

   Checking for Fortran compiler ...
   from this list: mpif90 mpiifort ifort pgf95 pgf90 gfortran g95 

Compiler /opt/openmpi/bin/mpif90 found, use this compiler? [Y/n] 
-->Compiler mpif90 found and accepted.
Is backend compiler gfortran ? [Y/n] 
   Checking for C compiler ...
   from this list: mpicc  mpiicc   icc ecc pgcc gcc 

Compiler /opt/openmpi/bin/mpicc found, use this compiler? [Y/n] 
-->Compiler mpicc found and accepted.

Testing existence of libraries in this order:
 libacml.a libmkl.so libmkl_p3.a libatlas.a libblas.a
Directory search list for libraries:
  /state/partition1/home/me/tmp/ATLAS/build/lib /state/partition1/apps/ATLAS/lib /lib /usr/local/lib /usr/lib /usr/local/lib/ATLAS /lib64 /usr/lib64 /usr/local/lib64 

Do you want to replace this with your own directory search list? [y/N] 
Found /state/partition1/home/me/tmp/ATLAS/build/lib/libatlas.a, use it? [Y/n] 
Found /state/partition1/apps/ATLAS/lib/libatlas.a, use it? [Y/n] 
-->The following mathematical library(ies) will be used:
   -L/state/partition1/apps/ATLAS/lib -llapack -llapack -lf77blas -latlas


DALTON uses almost 100 Megabytes of static
allocations, in addition to the dynamic allocation.

DALTON has the possibility to reserve an amount of static memory
for storing two-electron integrals in direct and parallel calculations
Storing some or all of the 2-el. integrals in memory will speed up
direct and parallel calculations (and in particular the latter).
NOTE: This will increase the static memory allocation used by DALTON

Would you like to activate the possibility of storing 2-el.int. in memory? [y/N] 
How many MB to use for storing 2-el. integrals? 
-->Program will be installed with 500 MB (65000000 words) used for storing 2-el. integrals

Maximum amount of work memory for dynamic allocations can be changed
at run time with the environment variable WRKMEM (in REAL*8 words = megabytes/8)
or by using the -M option to the run script: "dalton -M mb ..." (in megabytes).
We recommend at least 200 MB work memory,
larger for correlated calculations, but it should for maximum
efficiency NOT exceed available physical memory per CPU in parallel calculations.

How many MB to use as default for work memory (hit return for default of 1000 MB)? 
-->Program will be installed with a default work memory of 900 MB (117000000 words)

-->Current directory is /home/me/tmp/Dalton2011_release/DALTON

Use default ../bin as installation directory for DALTON binaries and scripts? [Y/n] 
Please enter another installation directory: 
-->DALTON executable and script will be placed in /share/apps/dalton/test directory


-->Default basis set directory will be /home/me/tmp/Dalton2011_release/DALTON/../basis/

Use this directory as default basis set directory? [Y/n] 
Please choose another default basis set directory (must end with /) 
-->Default basis set directory will be /share/apps/dalton/basis/


I did not find /work, /scratch, /scr, or /temp. I will use /tmp

-->Job specific directories under $SCRATCH/$USER
-->will be used for temporary files when running DALTON

Use SCRATCH=/tmp as default root scratch space in "dalton" run script? [Y/n] 
-->Creating Makefile.config ...
gfortran version 412 prc=x86_64
INFO: Compiling with 32-bit integers.
INFO: Make sure pre-compiled BLAS, MPI etc. libraries are also with 32-bit integers!!!

Proper 64-bit file access detected.

-->Creating the DALTON run-script in /share/apps/dalton/test

   The configuration of DALTON has finished succesfully.
   Check compiler flags etc. in Makefile.config and run "make" to get executable.

Regardless of what you answer, here's an example of a Makefile.config that I used. The key is to add -I../modules to INCLUDES, and delete -fbacktrace.


ARCH        = linux
#
#
CPPFLAGS      = -DVAR_GFORTRAN -DSYS_LINUX -DVAR_MFDS -D'INSTALL_WRKMEM=117000000' -D'INSTALL_MMWORK=65000000' -D_FILE_OFFSET_BITS=64 -DVAR_MPI -DGFORTRAN=412 -DIMPLICIT_NONE
F90           = mpif90
CC            = mpicc
LOADER        = mpif90
RM            = rm -f
FFLAGS        = -march=x86-64 -O3 -ffast-math -funroll-loops -ftree-vectorize 
SAFEFFLAGS    = -march=x86-64 -O3 -ffast-math -funroll-loops -ftree-vectorize 
CFLAGS        = -march=x86-64 -O3 -ffast-math -funroll-loops -ftree-vectorize -std=c99 -DRESTRICT=restrict -DFUNDERSCORE=1
INCLUDES      = -I../include -I../modules
MODULES       = -J../modules
LIBS          = -L/state/partition1/apps/ATLAS/lib -llapack -llapack -lf77blas -latlas -L/opt/openmpi/lib -lmpi
INSTALLDIR    = /share/apps/dalton/test
PDPACK_EXTRAS = linpack.o eispack.o gp_zlapack.o gp_dlapack.o
GP_EXTRAS     = 
AR            = ar
ARFLAGS       = rvs
# flags for ftnchek on Dalton /hjaaj
CHEKFLAGS  = -nopure -nopretty -nocommon -nousage -noarray -notruncation -quiet  -noargumants -arguments=number  -usage=var-unitialized
# -usage=var-unitialized:arg-const-modified:arg-alias
# -usage=var-unitialized:var-set-unused:arg-unused:arg-const-modified:arg-alias
#
default : dalton linuxparallel.x
SAFE_FFLAGS_for_ifort = $(FFLAGS)
#
# Parallel initialization
#
MPI_INCLUDE_DIR = 
MPI_LIB_PATH    = 
MPI_LIB         = 
#
#
# Suffix rules
# hjaaj Oct 04: .g is a "cheat" suffix, for debugging.
#               'make x.g' will create x.o from x.F or x.c with -g debug flag set.
#
.SUFFIXES : .F .F90 .c .o .i .g .s

.F.o:
        $(F90) $(INCLUDES) $(MODULES) $(CPPFLAGS) $(FFLAGS) -c $*.F 

.F.i:
        $(F90) $(INCLUDES) $(MODULES) $(CPPFLAGS) -E $*.F > $*.i

.F.g:
        $(F90) $(INCLUDES) $(MODULES) $(CPPFLAGS) $(SAFEFFLAGS) -g -c $*.F 

.F.s:
        $(F90) $(INCLUDES) $(MODULES) $(CPPFLAGS) $(FFLAGS) -S -g -c $*.F 

.F90.o:
        $(F90) $(INCLUDES) $(MODULES) $(CPPFLAGS) $(FFLAGS) -c $*.F90 

.F90.i:
        $(F90) $(INCLUDES) $(MODULES) $(CPPFLAGS) -E $*.F90 > $*.i

.F90.g:
        $(F90) $(INCLUDES) $(MODULES) $(CPPFLAGS) $(SAFEFFLAGS) -g -c $*.F90 

.F90.s:
        $(F90) $(INCLUDES) $(MODULES) $(CPPFLAGS) $(FFLAGS) -S -g -c $*.F90 

.c.o:
        $(CC) $(INCLUDES) $(CPPFLAGS) $(CFLAGS) -c $*.c 

.c.i:
        $(CC) $(INCLUDES) $(CPPFLAGS) $(CFLAGS) -E $*.c > $*.i

.c.g:
        $(CC) $(INCLUDES) $(CPPFLAGS) $(CFLAGS) -g -c $*.c 

.c.s:
        $(CC) $(INCLUDES) $(CPPFLAGS) $(CFLAGS) -S -g -c $*.c 
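The two key changes (adding -I../modules to INCLUDES and deleting -fbacktrace) can also be applied to a freshly generated Makefile.config with sed instead of by hand. This is a sketch under the assumption that INCLUDES sits on a single line, as in the file above. The function name fix_makefile_config is made up.

```shell
# Hypothetical helper: apply the two Makefile.config edits to file $1.
fix_makefile_config() {
    f=$1
    # Drop -fbacktrace wherever it appears.
    sed 's/ *-fbacktrace//g' "$f" > "$f.tmp"
    # Append -I../modules to the INCLUDES line unless it is already there.
    if ! grep -q '^INCLUDES.*-I\.\./modules' "$f.tmp"; then
        sed 's|^\(INCLUDES *=.*\)$|\1 -I../modules|' "$f.tmp" > "$f.tmp2" &&
            mv "$f.tmp2" "$f.tmp"
    fi
    mv "$f.tmp" "$f"
}

# Example (not run here): fix_makefile_config Makefile.config
```

Running it twice is harmless, since the second pass finds nothing left to change.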

 
If all is looking well, make.
make
cd ../
cp basis/* /share/apps/dalton/basis

DO NOT RUN MAKE IN PARALLEL i.e. no make -j3 or anything like that.
Add /share/apps/dalton/bin to your PATH i.e. add a line saying
export PATH=$PATH:/share/apps/dalton/bin
to your ~/.bashrc and source it.
So far I haven't had much time to look at it, but here's the result of the 'short' test series:
./TEST -dalton /share/apps/dalton/bin/dalton short 
[..]
#####################################################################
                              Summary
#####################################################################

THERE IS A PROBLEM IN TEST CASE(S)
 prop_exci prop_vibg2 walk_vibave2 dftmm_1
date and time         : Sun Nov  4 18:41:59 PST 2012

Here's what I found for each of the troublesome ones above:

prop_exci:
126:  INFO from READIN: Threshold for discarding integrals was    1.00D-16
127:  INFO from READIN: Threshold is reset to minimum value       1.00D-15
But otherwise it finished ok.

prop_vibg2:
 SIROUT stat info, IST and IEND =                   0                  -1
 IST or IEND out of bounds - probably no optimization in this run.
But otherwise it finished ok.

walk_vibave2:
3 informational messages have been issued by Dalton,
output from 'grep -n INFO'  (max 10 lines):
549: *** SETSIR-INFO, time in NSETUP:       0.00 seconds.
2346: *** SETSIR-INFO, time in NSETUP:       0.00 seconds.
3691: *** SETSIR-INFO, time in NSETUP:       0.00 seconds
But otherwise it finished ok.

dftmm_1:
 NOTE:    1 warnings have been issued.
 Check output, result, and error files for "WARNING".
dftmm_1.tar.gz has been copied to /home/me/tmp/Dalton2011_release/DALTON/test
----------------------------------------------------------
2 WARNINGS have been issued by Dalton,
output from 'grep -n -i WARNING'  (max 10 warnings):
711: NOTE:    1 warnings have been issued.
712: Check output, result, and error files for "WARNING".
I can't find the warning in the output, which looks like it finished ok.

All in all, it looks very promising.


Note on running in parallel
I had to do

mkdir /tmp/$USER
first.

In addition, when running I have to explicitly define my scratch directory:
dalton -t /tmp/$USER -N 4 myinput.dal myinput.mol
Other than that it's OK. I just get the overall impression that things aren't very stable (some jobs crash, some don't)
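To avoid forgetting the mkdir step, the scratch directory can be created on the fly. A minimal sketch: ensure_scratch is a hypothetical helper that creates the directory if needed (mkdir -p does not complain if it already exists) and prints its path.

```shell
# Hypothetical helper: make sure the scratch directory exists and print its path.
# Defaults to /tmp/$USER, the location used above.
ensure_scratch() {
    dir=${1:-/tmp/${USER:-$(id -un)}}
    mkdir -p "$dir" && printf '%s\n' "$dir"
}

# Example (not run here):
# dalton -t "$(ensure_scratch)" -N 4 myinput.dal myinput.mol
```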

30 October 2012

272. Compiling NWChem 6.1.1.1 on ROCKS 5.4.3/CentOS 5.6

Nothing weird with this one and it's all but identical to the build on debian, but here's a step by step anyway to help those who are computational chemists, but not sysadmins.

Preparations:
First compile openblas according to http://verahill.blogspot.com.au/2012/05/building-nwchem-61-on-debian.html 

Next, create e.g. /share/apps/nwchem, like this
sudo mkdir /share/apps/nwchem
sudo chmod 755 /share/apps/nwchem

This lets the owner read, write and execute, while group members and 'world' can read and execute, but not write.

If you've already built earlier versions of nwchem you can skip the steps above.

NWChem:
You will need to go to http://www.nwchem-sw.org/index.php/Download and download version 6.1.1. Using the direct link (http://www.nwchem-sw.org/images/Nwchem-6.1.1-src.2012-06-27.tar.gz) with wget isn't working for me anymore.

Put your Nwchem-6.1.1-src.2012-06-27.tar.gz in /share/apps/nwchem and expand it.
tar xvf Nwchem-6.1.1-src.2012-06-27.tar.gz
cd nwchem-6.1.1-src/

Create buildconf.sh
export LARGE_FILES=TRUE
export TCGRSH=/usr/bin/ssh
export NWCHEM_TOP=`pwd`
export NWCHEM_TARGET=LINUX64
export NWCHEM_MODULES="all python"
export PYTHONHOME=/opt/rocks
export PYTHONVERSION=2.4
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export MPI_LOC=/opt/openmpi
export MPI_INCLUDE=/opt/openmpi/include
export LIBRARY_PATH=$LIBRARY_PATH:/opt/openmpi/lib:/share/apps/openblas
export LIBMPI="-lmpi -lopen-rte -lopen-pal -ldl -lmpi_f77 -lpthread"
export BLASOPT="-L/share/apps/openblas/lib -lopenblas -lopenblas_nehalem-r0.1.1 -lopenblas_nehalemp-r0.1.1"
cd $NWCHEM_TOP/src
export FC=gfortran
make clean
make  nwchem_config
make  FC=gfortran |tee make.log
cd ../contrib
./getmem.nwchem

Before running it, edit src/config/makefile.h and change line 1957:
1957      EXTRA_LIBS +=    -lnwcutil  -lpthread -lutil -ldl -lz -lssl
You are now ready to build.
time sh buildconf.sh

It took about 15 minutes to build -- a clear improvement over 6.1 for me (30 min+)

Create a default.nwchemrc in your /share/apps/nwchem/nwchem-6.1.1-src/ folder
nwchem_basis_library /share/apps/nwchem/nwchem-6.1.1-src/src/basis/libraries/
ffield amber
amber_1 /share/apps/nwchem/nwchem-6.1.1-src/src/data/amber_s/
amber_2 /share/apps/nwchem/nwchem-6.1.1-src/src/data/amber_x/
amber_3 /share/apps/nwchem/nwchem-6.1.1-src/src/data/amber_q/
amber_4 /share/apps/nwchem/nwchem-6.1.1-src/src/data/amber_u/
amber_5 /share/apps/nwchem/nwchem-6.1.1-src/src/data/custom/
spce /share/apps/nwchem/nwchem-6.1.1-src/src/data/solvents/spce.rst
charmm_s /share/apps/nwchem/nwchem-6.1.1-src/src/data/charmm_s/
charmm_x /share/apps/nwchem/nwchem-6.1.1-src/src/data/charmm_x/
Then each user can do
ln -s /share/apps/nwchem/nwchem-6.1.1-src/default.nwchemrc ~/.nwchemrc

You might also want to add nwchem to path -- add
export PATH=$PATH:/share/apps/nwchem/nwchem-6.1.1-src
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/openmpi/lib:/share/apps/openblas
to your ~/.bashrc

01 June 2012

170. Compiling PVM and XPVM on ROCKS 5.4.3

And we're back to ROCKS again.

NOTE: I haven't actually tested the binaries and libs compiled here. I think they should work. But I don't know for sure.

PQS works with openmpi, mpich and PVM. Our vanilla ROCKS install already has openmpi and mpich. There's a package called rocks-pvm, but the size is 50 kb and didn't seem to actually install anything precompiled, so I removed it and decided to go the compilation way instead.

The paths here are specific to the cluster I did this on, so customise as needed.

sudo mkdir /share/apps/pvm
sudo chown ${USER} /share/apps/pvm
cd /share/apps/pvm
wget http://www.netlib.org/pvm3/pvm3.4.6.tgz
tar xvf pvm3.4.6.tgz
cd pvm3/
export PVM_ROOT=`pwd`
make

Time to set up environment variables. Edit either /etc/profile or ~/.bashrc, depending on powers and reach, and add
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/share/apps/pvm/pvm3/lib/LINUX64
export PATH=$PATH:/share/apps/pvm/pvm3/bin/LINUX64
export PVM_ROOT=/share/apps/pvm/pvm3

Changes won't take effect until you source the file, or open a new terminal.


I profess to be ignorant about how to actually use pvm, so no testing just yet.


So I also stumbled across xpvm, which sounds (and looks) neat.

cd /share/apps/pvm
wget http://www.netlib.org/pvm3/xpvm/XPVM.src.1.2.5.tgz
tar xvf XPVM.src.1.2.5.tgz
cd xpvm/

Time to do some housekeeping before compiling:
It requires:
1. PVM 3.3.0 or later.
2. TCL 7.3 or later.
3. TK 3.6.1 or later.

I find
/usr/share/tk8.4
/usr/share/tcl8.4
so I might be ok. We just compiled pvm 3.4.6, so it should be alright.

First figure out where stuff is:

locate libtk|grep so
/usr/lib/libtk.so
/usr/lib/libtk8.4.so
/usr/lib64/libtk.so
/usr/lib64/libtk8.4.so
locate libtcl|grep so
/usr/lib/libtcl.so
/usr/lib/libtcl8.4.so
/usr/lib64/libtcl.so
/usr/lib64/libtcl8.4.so
/usr/lib64/tclx8.4/libtclx8.4.so
These are fairly standard locations, so they should already be searched by ld -- no need to specify them thus.

Include is potentially worse since they'd typically need the development packages.
locate tcl | grep "\.h"
[..]
/usr/include/tcl-private/generic/tcl.h
[..]
locate tk|grep "\.h"
[..]
/usr/include/tk-private/generic/tk.h
[..]
So we /should/ be fine.
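locate depends on its database being up to date. If it isn't, a small loop over candidate directories gives the same answer. This is a sketch: first_dir_with is a made-up helper, and the directories passed to it are just the ones that turned up above.

```shell
# Hypothetical helper: print the first directory (from $2 onwards) that contains file $1.
first_dir_with() {
    file=$1
    shift
    for d in "$@"; do
        if [ -e "$d/$file" ]; then
            printf '%s\n' "$d"
            return 0
        fi
    done
    return 1
}

first_dir_with tcl.h /usr/include /usr/include/tcl-private/generic ||
    echo "tcl.h not found in the listed directories"
```

The directory it prints is what goes after -I in the include settings.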

We also need the X11 libs and headers:
locate libX11

/usr/lib/libX11.so
/usr/lib/libX11.so.6
/usr/lib/libX11.so.6.2.0
/usr/lib64/libX11.so
/usr/lib64/libX11.so.6
/usr/lib64/libX11.so.6.2.0
locate X11|grep include

[..]
/usr/include/X11
[..]
Finally,

locate libdl
/lib/libdl-2.5.so
/lib/libdl.so.2
/lib64/libdl-2.5.so
/lib64/libdl.so.2
/usr/lib/libdl.a
/usr/lib/libdl.so
/usr/lib64/libdl.a
/usr/lib64/libdl.so
I'll specify the lib locations even though in some of these particular cases it isn't necessary:


Edit xpvm/src/Makefile.aimk and set (line numbers added by me):
19  PVMVERSION = -DUSE_PVM_34
Comment out line 42:
 42 #TCLTKHOME  =  $(HOME)/TCL
and
 44 TCLTKHOME  =   /usr/include
Change

 47 TCLINCL     =   -I$(TCLTKHOME)/tcl-private/generic
 48 TKINCL      =   -I$(TCLTKHOME)/tk-private/generic
and

 57 TCLLIBDIR   =   -L/usr/lib64/tclx8.4
 58 TKLIBDIR    =   -L/usr/lib64
and
 70 TCLLIB      =   -ltcl8.4
 71 TKLIB       =   -ltk8.4
and
83 XINCL       = -I/usr/include/X11
84 XLIBDIR     = -L/usr/lib64
and finally,
 96 SHLIB       = -ldl



Fell asleep? Time to get compiling.
export XPVM_ROOT=/share/apps/pvm/xpvm

export TCL_LIBRARY=/usr/share/tcl8.4

export TK_LIBRARY=/usr/share/tk8.4

cd ${XPVM_ROOT}
make
[..]
Installing xpvm.tcl
Installing globs.tcl
Installing procs.tcl
Installing util.tcl
make[1]: Leaving directory `/share/apps/pvm/xpvm/src/LINUX64'

The beautiful thing is that the xpvm binary automagically ends up in the pvm3/bin/LINUX64 directory, so no need to fiddle with path.



In theory everything should work now if you log in with ssh -XC. However I get
xpvm
libpvm [pid2607] /tmp/pvmd.502: No such file or directory
libpvm [pid2607]: Can't Start PVM: Can't start pvmd
I'm not actually running -- nor have I ever run -- anything with pvm.

touch /tmp/pvmd.502
xpvm
libpvm [pid4219]: mksocs() read addr file: wrong length read
Connecting to PVMD already running... libpvm [pid4219]: mksocs() read addr file: wrong length read
libpvm [pid4219]: mksocs() read addr file: wrong length read
libpvm [pid4219]: mksocs() read addr file: wrong length read
libpvm [pid4219]: pvm_mytid(): Can't contact local daemon
libpvm [pid4219]: Error Joining PVM: Can't contact local daemon
I mean, it looks like it should work, once pvm is being used.

13 March 2012

106. htop 1.0.1 and sinfo-0.0.45 on ROCKS 5.4.3/CentOS 5.6

There are a number of performance monitor tools in the debian repos. ROCKS 5.4.3/Centos doesn't seem quite as well-equipped.

First out, htop:

htop:
wget http://downloads.sourceforge.net/project/htop/htop/1.0.1/htop-1.0.1.tar.gz
tar -xvf htop-1.0.1.tar.gz
cd htop-1.0.1/
./configure --prefix=/home/me/.htop
make
make install

It's as simple as that.
Add e.g.
alias htop='/home/me/.htop/bin/htop'
to your ~/.bashrc
Note: this works on Scientific Linux (boron) 5.4 as well.

sinfo:
Update 13/03/2012:
Sinfo <0.0.44 has IPv6 enabled by default.
On sinfo >=0.0.45 you can disable IPv6 using ./configure --disable-IPv6

Sinfo is probably the snazziest cluster monitoring tool that I know of. Sure, ganglia etc. are nice too, but they run as web services. Sinfo is a 'simple' curses program, but building it on CentOS was a bit of a challenge.

Be aware that sinfo versions prior to 0.0.45 expect IPv6 to work. ROCKS disables IPv6 by default, so use sinfo 0.0.45 or above.





First boost:
(yum install boost-devel didn't do anything for me)
cd ~/tmp
wget -O boost_1_49_0.tar.gz http://sourceforge.net/projects/boost/files/boost/1.49.0/boost_1_49_0.tar.gz/download
tar -xvf boost_1_49_0.tar.gz
cd boost_1_49_0/
./bootstrap.sh --prefix=/usr

Edit Jamroot and add
using mpi ;
The space between mpi and ; is needed.

Symlink to your mpic++, e.g. if your mpic++ is in /opt/openmpi:
sudo ln -s /opt/openmpi/bin/mpic++ /usr/bin/mpic++

The following step takes a long time:
sudo ./b2 -a install --layout=versioned --build-type=complete

These days all the libboost libs are multithread aware (or so I hear), and in debian it turns out that the -mt.so libs are just symbolic links to the 'regular' libs.
sudo ln -s /usr/lib/libboost_signals.so /usr/lib/libboost_signals-mt.so
sudo ln -s /usr/lib/libboost_date_time.so /usr/lib/libboost_date_time-mt.so
sudo ln -s /usr/lib/libboost_serialization.so /usr/lib/libboost_serialization-mt.so
sudo ln -s /usr/lib/libboost_wserialization.so /usr/lib/libboost_wserialization-mt.so
sudo ln -s /usr/lib/libboost_regex.so /usr/lib/libboost_regex-mt.so

sudo ln -s /usr/lib/libboost_signals.so.1.49.0 /usr/lib64/libboost_signals.so.1.49.0
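The five -mt symlinks follow one pattern, so they can be created in a loop. Here is a sketch that skips libraries that are missing and links that already exist. link_mt is a hypothetical function name, and it takes the library directory as its first argument.

```shell
# Hypothetical helper: in directory $1, link libboost_NAME-mt.so to libboost_NAME.so
# for every NAME given after it.
link_mt() {
    libdir=$1
    shift
    for name in "$@"; do
        src=$libdir/libboost_$name.so
        dst=$libdir/libboost_$name-mt.so
        [ -e "$src" ] || { echo "missing: $src" >&2; continue; }
        [ -e "$dst" ] || ln -s "$src" "$dst"
    done
}

# Example (not run here, needs root):
# link_mt /usr/lib signals date_time serialization wserialization regex
```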

Then asio
cd ~/tmp
wget -O asio-1.5.3.tar.bz2 "http://downloads.sourceforge.net/project/asio/asio/1.5.3%20%28Development%29/asio-1.5.3.tar.bz2?r=http%3A%2F%2Fsourceforge.net%2Fprojects%2Fasio%2F&ts=1331441086&use_mirror=aarnet"
tar -xvf asio-1.5.3.tar.bz2
cd asio-1.5.3/
./configure
make
sudo make install

Then sinfo
cd ~/tmp
wget http://www.ant.uni-bremen.de/whomes/rinas/sinfo/download/sinfo-0.0.45.tar.gz
tar -xvf sinfo-0.0.45.tar.gz
cd sinfo-0.0.45/
./configure --disable-IPv6
make
sudo make install

The build should be fine.

Configuration:
you'll end up with
/usr/local/sbin/sinfod
/usr/local/bin/sinfo
You may want to make sure there are paths to them by adding the following to your ~/.bashrc:
export PATH=$PATH:/usr/local/bin:/usr/local/sbin
The changes take effect next time you log in to a shell, or just run
source ~/.bashrc
for immediate effect.
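One small annoyance with the export line is that every "source ~/.bashrc" appends the same directories again. A possible way around that, sketched with a helper I've called path_append (not a standard command), is to add a directory only if it isn't listed yet:

```shell
# path_append DIR: add DIR to the end of PATH unless it is already listed.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                  # already present, do nothing
        *) PATH="$PATH:$1" ;;
    esac
}

path_append /usr/local/bin
path_append /usr/local/sbin
export PATH
```

Wrapping PATH in colons before matching means /usr/local/bin is not mistaken for a match when only /usr/local/bin2 is present.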

Also, create a file called /etc/default/sinfo with the following in it:
OPTS="--quiet --bcastaddress=192.168.1.255"
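The broadcast address is specific to my network (192.168.1.x with netmask 255.255.255.0), so yours will likely differ. If you're unsure, it can be worked out from a node's IP and netmask: keep the network bits and set all host bits to 1. A rough sketch, with bcast_addr being a name I made up:

```shell
# bcast_addr IP NETMASK: print the IPv4 broadcast address of the subnet.
# Both arguments are expected in dotted-quad form, e.g. 192.168.1.111 255.255.255.0
bcast_addr() {
    ip=$1
    mask=$2
    out=
    for i in 1 2 3 4; do
        o=${ip%%.*};   ip=${ip#*.}       # next octet of the address
        m=${mask%%.*}; mask=${mask#*.}   # next octet of the netmask
        # Keep the network bits (o & m) and set every host bit (255 - m).
        b=$(( (o & m) | (255 - m) ))
        out=${out:+$out.}$b
    done
    echo "$out"
}

# Example:
#   bcast_addr 192.168.1.111 255.255.255.0
```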

Start sinfod with
sinfod --quiet --bcastaddress=192.168.1.255

then check that it's running
ps aux | grep sinfod

If it's not running, then try
sinfod -F

If it gives something along the lines of
exception:open:address family not supported
you most likely
1) haven't enabled IPv6 for your interface and
2) didn't disable IPv6 during compilation and/or
3) used a version older than 0.0.45

Check by doing ifconfig -- does it return both an ipv4 and an ipv6 address?
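To do that check from a script rather than by eye, something like the following might do. has_inet6 is just an illustrative name, and it assumes ifconfig labels IPv6 addresses with "inet6", which is the case for the net-tools output I've seen:

```shell
# has_inet6: read ifconfig-style output on stdin and succeed if any
# interface lists an inet6 address.
has_inet6() {
    grep -q 'inet6'
}

# Example:
#   ifconfig eth1 | has_inet6 && echo "eth1 has an IPv6 address"
```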

Enabling ipv6
Unless you know what you're doing, don't fiddle with the network interfaces on a production cluster -- network interfaces on a multinode cluster are typically highly tuned to minimise latency, so don't mess it up.

Anyway. First check your /etc/modules.conf and - if present - comment out
alias ipv6 off
options ipv6 disable=1
Edit your /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1:0
IPADDR=192.168.1.111
NETMASK=255.255.255.0
BOOTPROTO=none
MTU=1500
TYPE=Ethernet
GATEWAY=192.168.1.1
USERCTL=no
IPV6INIT=yes
PEERDNS=yes
ONPARENT=yes
IPV6ADDR=fe80::2f0:4dff:f383:b44/64
IPV6_DEFAULTGW=fe80::2f0:4dff:fe83:a48/64
I just made up the IPV6ADDR, and took the IPV6_DEFAULTGW from my gateway machine (running Debian, so IPv6 is enabled by default).

Assuming that your firewall allows traffic on port 60003 and free traffic in and out on 192.168.1.255, things should work fine.



Errors


Error (boost):
MPI auto-detection failed: unknown wrapper compiler mpic++
Please report this error to the Boost mailing list: http://www.boost.org
You will need to manually configure MPI support.
Solution:
make sure you've symlinked to your mpic++ instance in /usr/bin
e.g. if your mpic++ is in /opt/openmpi/bin/mpic++
sudo ln -s /opt/openmpi/bin/mpic++ /usr/bin/mpic++


Error (sinfo):
message.cc: In member function 'void Message::popFrontMemory(void*, size_t)':
message.cc:183: error: 'memory' was not declared in this scope
message.cc:193: error: 'boost' has not been declared
message.cc:193: error: expected primary-expression before 'char'
message.cc:193: error: expected `;' before 'char'
message.cc:196: error: 'newMemory' was not declared in this scope
message.cc:196: error: 'memory' was not declared in this scope
make[2]: *** [message.lo] Error 1
make[2]: Leaving directory `/state/partition1/home/me/tmp/sinfo-0.0.44/libmessage'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/state/partition1/home/me/tmp/sinfo-0.0.44/libmessage'
make: *** [all-recursive] Error 1
Solution:
You need to make sure that the libs are found. Either symlink manually between your build directory and /usr/lib, or use bootstrap.sh --prefix=/usr. See above for how to do it.

Error (sinfo):
udpmessagereceiver.h:14: error: 'asio' has not been declared
udpmessagereceiver.h:14: error: ISO C++ forbids declaration of 'endpoint' with no type
udpmessagereceiver.h:14: error: expected ';' before 'sender_endpoint'
udpmessagereceiver.h:16: error: 'asio' has not been declared
udpmessagereceiver.h:16: error: ISO C++ forbids declaration of 'io_service' with no type
udpmessagereceiver.h:16: error: expected ';' before '&' token
udpmessagereceiver.h:17: error: 'asio' has not been declared
udpmessagereceiver.h:17: error: ISO C++ forbids declaration of 'socket' with no type
udpmessagereceiver.h:17: error: expected ';' before 'sock'
udpmessagereceiver.h:20: error: expected ',' or '...' before '::' token
udpmessagereceiver.h:20: error: ISO C++ forbids declaration of 'asio' with no type
udpmessagereceiver.h:23: error: 'asio' has not been declared
udpmessagereceiver.h:23: error: expected `)' before '&' token
udpmessagereceiver.cc:5: error: 'asio' has not been declared
udpmessagereceiver.cc:5: error: expected `)' before '&' token
make[1]: *** [udpmessagereceiver.lo] Error 1
make[1]: Leaving directory `/state/partition1/home/me/tmp/sinfo-0.0.44/libmessageio'
make: *** [all-recursive] Error 1

Solution: you've only got boost::asio installed, not the independent asio. See above for how to compile and install asio.

Error (sinfo):

/usr/bin/ld: cannot find -lboost_signals-mt
collect2: ld returned 1 exit status
make[2]: *** [sinfod] Error 1
make[2]: Leaving directory `/state/partition1/home/me/tmp/sinfo-0.0.44/sinfod'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/state/partition1/home/me/tmp/sinfo-0.0.44/sinfod'
make: *** [all-recursive] Error 1
Solution:
You need a symlink pointing from /usr/lib/libboost_signals-mt.so to /usr/lib/libboost_signals.so
sudo ln -s /usr/lib/libboost_signals.so /usr/lib/libboost_signals-mt.so

Error (sinfod):
sinfod --quiet --bcastaddress=192.168.1.255 gives nothing and sinfod exits silently immediately
sinfod -F gives
exception:open:address family not supported
Here's the relevant strace output:
[..]
 socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP) = 6
[..]
 socket(PF_INET6, SOCK_DGRAM, IPPROTO_UDP) = -1 EAFNOSUPPORT (Address family not supported by protocol)
futex(0x333a40d350, FUTEX_WAKE_PRIVATE, 2147483647) = 0
close(6)                                = 0
close(3)                                = 0
close(4)                                = 0
close(5)                                = 0
write(2, "Exception: ", 11)             = 11
write(2, "open: Address family not support"..., 46) = 46
write(2, "\n", 1)                       = 1
exit_group(0)                           = ?

Solution: enable IPv6 (see above), or build sinfo 0.0.45 or later with --disable-IPv6.