Showing posts with label ROCKS 5.4.3. Show all posts
Showing posts with label ROCKS 5.4.3. Show all posts

21 March 2017

635. Installing R on Rocks 5.4.3

Rocks 5.4.3 is based on CentOS 5.6 which is practically ancient by now (released Jan 2011).

Either way, when dealing with someone else's cluster its better to not fiddle too much with what is already working.

Here's a not at all elegant way of install R on Rocks 5.4.3
wget http://mirror.nsw.coloau.com.au/epel/5/x86_64/R-core-3.3.2-3.el5.x86_64.rpm
wget http://mirror.nsw.coloau.com.au/epel/5/x86_64/R-3.3.2-3.el5.x86_64.rpm 
wget http://mirror.nsw.coloau.com.au/epel/5/x86_64/R-devel-3.3.2-3.el5.x86_64.rpm 
wget http://mirror.nsw.coloau.com.au/epel/5/x86_64/libRmath-3.3.2-3.el5.x86_64.rpm 
wget http://mirror.nsw.coloau.com.au/epel/5/x86_64/libRmath-devel-3.3.2-3.el5.x86_64.rpm 
wget http://mirror.nsw.coloau.com.au/epel/5/x86_64/R-core-devel-3.3.2-3.el5.x86_64.rpm
wget http://mirror.nsw.coloau.com.au/epel/5/x86_64/libssh2-0.18-10.el5.x86_64.rpm 
wget http://mirror.nsw.coloau.com.au/epel/5/x86_64/xdg-utils-1.0.2-4.el5.noarch.rpm 
wget http://mirror.centos.org/centos/5/os/x86_64/CentOS/xz-devel-4.999.9-0.3.beta.20091007git.el5.x86_64.rpm
wget http://mirror.centos.org/centos/5/os/x86_64/CentOS/texinfo-tex-4.8-14.el5.x86_64.rpm
wget http://mirror.centos.org/centos/5/os/x86_64/CentOS/texinfo-4.8-14.el5.x86_64.rpm

sudo yum install R-3.3.2-3.el5.x86_64.rpm libRmath-devel-3.3.2-3.el5.x86_64.rpm libRmath-3.3.2-3.el5.x86_64.rpm R-devel-3.3.2-3.el5.x86_64.rpm R-core-3.3.2-3.el5.x86_64.rpm R-core-devel-3.3.2-3.el5.x86_64.rpm libssh2-0.18-10.el5.x86_64.rpm xdg-utils-1.0.2-4.el5.noarch.rpm texinfo-tex-4.8-14.el5.x86_64.rpm xz-devel-4.999.9-0.3.beta.20091007git.el5.x86_64.rpm texinfo-4.8-14.el5.x86_64.rpm 
[..] Total size: 169 M Downloading Packages: Running rpm_check_debug Running Transaction Test Finished Transaction Test Transaction Test Succeeded Running Transaction Installing : libssh2 1/11 Installing : libRmath 2/11 Installing : texinfo 3/11 Installing : texinfo-tex 4/11 Installing : libRmath-devel 5/11 Installing : xz-devel 6/11 Installing : xdg-utils 7/11 Installing : R-core 8/11 Installing : R-core-devel 9/11 Installing : R-devel 10/11 Installing : R 11/11 Installed: R.x86_64 0:3.3.2-3.el5 R-core.x86_64 0:3.3.2-3.el5 R-core-devel.x86_64 0:3.3.2-3.el5 R-devel.x86_64 0:3.3.2-3.el5 libRmath.x86_64 0:3.3.2-3.el5 libRmath-devel.x86_64 0:3.3.2-3.el5 libssh2.x86_64 0:0.18-10.el5 texinfo.x86_64 0:4.8-14.el5 texinfo-tex.x86_64 0:4.8-14.el5 xdg-utils.noarch 0:1.0.2-4.el5 xz-devel.x86_64 0:4.999.9-0.3.beta.20091007git.el5 Complete!

Testing:
R
R version 3.3.2 (2016-10-31) -- "Sincere Pumpkin Patch" Copyright (C) 2016 The R Foundation for Statistical Computing Platform: x86_64-redhat-linux-gnu (64-bit) R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. Natural language support but running in an English locale R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R. > q() Save workspace image? [y/n/c]: n

17 August 2013

495. Briefly: gromacs 4.6 on ROCKS 5.4.3

I didn't want to spend much time on getting this right, so I took the easiest route and combined three posts:

Firstly, I compiled cmake:
http://verahill.blogspot.com.au/2012/05/compiling-openbabel-231-and-cmake-on.html

Secondly, I used the openblas libraries which I compiled in this post
http://verahill.blogspot.com.au/2013/05/421-nwchem-63-on-rocks-543centos-56.html

Thirdly, I looked at this post which deals with gromacs 4.6 on debian:
http://verahill.blogspot.com.au/2013/04/396-compiling-gromacs-46-with-openblas.html

gromacs 4.6 can download and build its own fftw libs, so you don't need to do that separately.

First make the target directory, e.g.
sudo mkdir /share/apps/gromacs
sudo chown $USER:$USER /share/apps/gromacs

Single precision:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/openmpi/lib:/share/apps/openblas/lib export LDFLAGS="-L/share/apps/openblas/lib -lopenblas" export CPPFLAGS="-I/share/apps/openblas/include" export CC=/usr/bin/gcc44 export CXX=/usr/bin/g++44 cmake -DGMX_FFT_LIBRARY=fftw3 -DGMX_BUILD_OWN_FFTW=On -DGMX_DOUBLE=off -DCMAKE_INSTALL_PREFIX=/share/apps/gromacs/gromacs4.6_single -DGMX_EXTERNAL_BLAS=/share/apps/openblas/lib ../gromacs-4.6 make make install
Double precision:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/openmpi/lib:/share/apps/openblas/lib export LDFLAGS="-L/share/apps/openblas/lib -lopenblas" export CPPFLAGS="-I/share/apps/openblas/include" export CC=/usr/bin/gcc44 export CXX=/usr/bin/g++44 cmake -DGMX_FFT_LIBRARY=fftw3 -DGMX_BUILD_OWN_FFTW=On -DGMX_DOUBLE=on -DCMAKE_INSTALL_PREFIX=/share/apps/gromacs/gromacs4.6_double -DGMX_EXTERNAL_BLAS=/share/apps/openblas/lib ../gromacs-4.6 make make install
In my particular case I've got all users as members of the compchem group:
chown $USER:compchem /share/apps/gromacs -R 
chmod g+rwx /share/apps/gromacs -R

15 February 2013

339. Compiling ncdu on ROCKS 5.4.3/Centos 5.6

du is nice, but ncdu gives a better overview. Nothing odd about building it though:

mkdir ~/tmp
cd ~/tmp
wget http://dev.yorhel.nl/download/ncdu-1.9.tar.gz
tar xvf ncdu-1.9.tar.gz
cd ncdu-1.9/
sudo mkdir /share/apps/tools/ncdu -p
sudo chown $USER /share/apps/tools/ncdu
./configure --prefix=/share/apps/tools/ncdu
make
make install
echo 'export PATH=$PATH:/share/apps/tools/ncdu/bin' >> ~/.bashrc
source ~/.bashrc

Start by running
ncdu

01 February 2013

329. ECCE, xterm and X forwarding: fixing broken "tail -f on output" in ECCE/'untrusted X11 forwarding' error


The problem
In ECCE when you highlight a running job on a remote server which you've set up with the frontendMachine option (here and here and here) which is a ROCKS 5.4.3/CentOS server and e.g. hit Alt+L or "Run Mgmt"/"Tail -f on Output file" and nothing happens, and when you set ECCE to provide verbose output (add "ECCE_RCOM_LOGMODE true" to ecce/apps/siteconfig/site_runtime) you see the following errors:

X11 connection rejected because of wrong authentication. X connection to localhost:43.0 broken (explicit kill or server shutdown).
and
OpenSSH_4.3p2, OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008 Warning: untrusted X11 forwarding setup failed: xauth key data not generated Warning: No xauth data; using fake authentication data for X11 forwarding.
Obviously there are non-ECCE related situation where you may see these errors too. Doesn't matter -- same solution.


The diagnostics
cat /etc/ssh/sshd_config |grep X11
X11Forwarding yes X11DisplayOffset 10
cat /etc/ssh/ssh_config |grep X11|grep -v ^#
ForwardX11 yes
sudo cat /etc/ssh/sshd_config |grep X11|grep -v ^#
X11Forwarding yes X11DisplayOffset 10

So, why localhost:43? And why isn't it working? From my workstation to the cluster which is connected to the net via the front node, and then from the cluster front to the cluster front's local name.

ssh -X server.external.dns
echo $DISPLAY
localhost:42.0
ssh -X server.local.dns
Warning: untrusted X11 forwarding setup failed: xauth key data not generated Warning: No xauth data; using fake authentication data for X11 forwarding.
echo $DISPLAY
localhost:44.0
yet
ssh -Y server.local.dns

works fine.

The solution:
Simpler than I thought:
I edited ~/.ssh/config on the server, and did
Host server.local.dns Hostname server.local.dns User me ForwardX11 yes ForwardX11Trusted yes

And now it works!

Presumably I could've just edited /etc/ssh_config instead, but it's a multi-user cluster and I'm happier to change things on a user-by-user basis.

10 January 2013

314. Briefly: Installing talkd on ROCKS 5.4.3

I was asked to set up talkd on our Rocks 5.4.3 cluster (Centos 5.4) . There's no talkd or talk-server packages in the repos on that server.

Note: The general consensus seems to be that talk is
1. insecure and
2. outdated.

To install:
wget http://mirror.centos.org/centos/5/os/x86_64/CentOS/talk-server-0.17-31.el5.x86_64.rpm
yum localinstall talk-server-0.17-31.el5.x86_64.rpm
sudo iptables -A INPUT -p udp --dport 517 -s 127.0.0.1 -j ACCEPT
sudo iptables -A INPUT -p udp --dport 518 -s 127.0.0.1 -j ACCEPT

Above I've added, but haven't yet tried, -s 127.0.0.1 to limit connections from the local computer (localhost). If it doesn't work with -s 127.0.0.1, then try without -- but then be aware of the security implications. These rules also aren't permanent and will be lost on reboot. To make them permanent, edit /etc/sysconfig/iptables.

I couldn't get talk to work before opening the ports and would get
Error on read from talk daemon : Connection refused
Note that talkd uses Xinet and not init -- it will not run as  memory resident daemon, but instead be launched by xinet only when talkd is needed (traffic is detected to the ports associated with talkd). Xinet.d is a bit like a concierge, waking up whomever seems to be the adressee.


Edit both /etc/xinetd.d/talk and /etc/xinetd.d/ntalk. Change to:
# default: off # description: The talk server accepts talk requests for chatting with users \ # on other systems. service talk { flags = IPv4 disable = no socket_type = dgram wait = yes user = nobody group = tty server = /usr/sbin/in.talkd }
Finally, restart xinet.d (doing chkconfig talk on wasn't enough):
sudo /etc/init.d/xinet.d restart



Note: If you or the other user have several terminals open you should figure out which terminal to use. If you're user2, do
ps
PID TTY TIME CMD 5455 pts/23 00:00:00 bash 9321 pts/23 00:00:00 ps
user1 can then do
[user1@host ~]$ talk user2@localhost pts/23

and user2 will see the following in that terminal:
Message from Talk_Daemon@host at 14:49 ... talk: connection requested by user1@localhost.localdomain. talk: respond with: talk user1@localhost.localdomain
If you are user1 and have no idea on what terminal user2 is logged on, you can try
w|grep user2
user2 pts/8 remote:S.0 04Nov12 59:00 0.04s 0.04s /bin/bash user2 pts/9 remote:S.1 04Nov12 17:28 0.03s 0.03s /bin/bash user2 pts/10 remote Mon12 2days 0.03s 0.00s ssh -X -v volde user2 pts/11 local Mon12 2days 0.29s 0.26s perl eccejobmon user2 pts/23 remote 14:30 7.00s 0.00s 0.00s -bash

13 September 2012

235. CPMD with Netlib's lapack, blas and your own fftw3 on ROCKS 5.4.3/CentOS 5.6

Update 8 Feb 2013:
I somehow had forgot to include some of the instructions for the BLAS part. Fixed now.

Post:
This is done pretty much like how it's done on Debian (-march=native didn't work in the BLAS compilation though, nor was -fno-whole-file accepted when compiling cpmd)

1. Compile cmake according to this post:
http://verahill.blogspot.com.au/2012/05/compiling-openbabel-231-and-cmake-on.html

2. Compile BLAS
sudo mkdir /share/apps/tools/netlib/blas/lib -p
sudo chown $USER /share/apps/tools/netlib -R
mkdir ~/tmp
cd ~/tmp
wget http://www.netlib.org/blas/blas.tgz
tar xvf blas.tgz
cd BLAS/

Edit make.inc
OPTS     = -O3 -shared -m64 -fPIC

make all

gfortran -shared -Wl,-soname,libnetblas.so -o libblas.so.1.0.1 *.o -lc
ln -s libblas.so.1.0.1 libnetblas.so
cp lib*blas* /share/apps/tools/netlib/blas/lib

3. Compile LAPACK
sudo mkdir /share/apps/tools/netlib/lapack -p
sudo chown $USER /share/apps/tools/netlib -R
wget http://www.netlib.org/lapack/lapack-3.4.1.tgz

tar xvf lapack-3.4.1.tgz
cd lapack-3.4.1/
mkdir build
cd build
ccmake ../

Hit 'c' and edit the values:

 BUILD_COMPLEX                    ON
 BUILD_COMPLEX16                  ON
 BUILD_DOUBLE                     ON
 BUILD_SHARED_LIBS                ON
 BUILD_SINGLE                     ON
 BUILD_STATIC_LIBS                ON
 BUILD_TESTING                    ON
 CMAKE_BUILD_TYPE                    
 CMAKE_INSTALL_PREFIX             /share/apps/tools/netlib/lapack
 LAPACKE                          OFF
 LAPACKE_WITH_TMG                 OFF
 USE_OPTIMIZED_BLAS               ON
 USE_XBLAS                        OFF

Hit 'c' again, then hit 'g'.

Edit CMakeCache.txt and add the following lines at the beginning:

########################
# EXTERNAL cache entries
########################
BLAS_FOUND:STRING=TRUE
BLAS_GENERIC_FOUND:BOOL=TRUE
BLAS_GENERIC_blas_LIBRARY:FILEPATH=/share/apps/tools/netlib/blas/lib/libnetblas.so
BLAS_LIBRARIES:PATH=/share/apps/tools/netlib/blas/lib/libnetblas.so

Do
ccmake ../
again, hit 'c', then 'g'.

Now,
make
make install

4. Compile FFTW3
sudo mkdir /share/apps/tools/fftw3
sudo chown $USER /share/apps/tools/fftw3
cd ~/tmp
wget http://www.fftw.org/fftw-3.3.1.tar.gz
tar -xvf fftw-3.3.1.tar.gz
cd fftw-3.3.1
make distclean
./configure --enable-float --enable-mpi --enable-threads --with-pic --prefix=/share/apps/tools/fftw3/single
make
make install
make distclean
./configure --disable-float --enable-mpi --enable-threads --with-pic --prefix=/share/apps/tools/fftw3/double
make 
make install

5. Compile CPMD
I downloaded the cpmd file to a client computer, then uploaded it to the ROCKS front node:
sftp me@rocks:/home/me/tmp

Connected to rocks.
Changing to: /home/me/tmp
sftp> put cpmd-v3_15_3.tar.gz
Uploading cpmd-v3_15_3.tar.gz to /home/me/tmp/cpmd-v3_15_3.tar.gz
cpmd-v3_15_3.tar.gz                100% 2937KB 587.4KB/s   00:05
sftp> exit

I then logged in via ssh as normal.
cd ~/tmp
tar xvf cpmd-v3_15_3.tar.gz
cd CPMD/CONFIGURE
Create a new file LINUX-x86_64-ROCKS

     IRAT=2
     CFLAGS='-c -O2 -Wall'
     CPP='/lib/cpp -P -C -traditional'
     CPPFLAGS='-D__Linux -D__PGI -D__GNU -DFFT_FFTW3 -DPARALLEL -DPOINTER8'
     FFLAGS='-c -O2 -fcray-pointer -fsecond-underscore'
LFLAGS='-L/share/apps/tools/fftw3/double/lib -lfftw3-lfftw3_mpi -lfftw3_threads -I/usr/include -L/share/apps/tools/netlib/blas/lib -lnetblas -L/share/apps/tools/netlib/lapack/lib -llapack -L/opt/openmpi/lib -lpthread -lmpi'
     FFLAGS_GROMOS='  $(FFLAGS)' 
      FC='mpif77 -fbounds-check'
      CC='mpicc'
      LD='mpif77 -fbounds-check'

NOTE: I don't think the -I belongs in the LFLAGS statement, but I'm presuming that I put it there for a reason back when I did it the first time.

Go to ~/tmp/CPMD, and edit wfnio.F (basically replace 3 with 2 and remove 'L'):

 15       CHARACTER(len=*) TAG
 63         IF(TAG(1:2).EQ.'NI') THEN
201       IF(TAG(1:2).NE.'NI') THEN
271         IF(TAG(1:2).EQ.'NI') THEN

Finally, edit Makefile and change
  23 LD = f95 -O
to
  23 LD = mpif77 -fbounds-check

Time to compile


./mkconfig.sh LINUX-x86_64-ROCKS > Makefile
make
sudo mkdir /share/apps/cpmd
sudo chown $USER /share/apps/cpmd
cp cpmd.x /share/apps/cpmd

echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/share/apps/tools/netlib/blas/lib:/share/apps/tools/netlib/lapack/lib:/share/apps/tools/fftw3/double/lib' >>~/.bashrc
echo 'export PATH=$PATH:/share/apps/cpmd' >> ~/.bashrc
echo "export PP_LIBRARY_PATH=/share/apps/cpmd/PP_LIBRARY" >>~/.bashrc

You're now done compiling. To test, you need to get some pseudopotential files -- look at e.g. the end of http://verahill.blogspot.com.au/2012/07/not-solved-compiling-cpmd-on-debian.html for instructions.


10 September 2012

230. ROCKS 5.4.3, ATLAS and Gromacs on Xeon X3480

After doing another round of 'benchmarks' (there are so many factors that differ between the systems that it's difficult to tell exactly what I'm measuring) I'm back to looking at my BLAS/LAPACK.


So here's compiling ATLAS on a cluster made up of six dual-socket mobos with 2x quadcore XeonX3480 CPUs and 8 Gb RAM. The cluster is running ROCKS 5.4.3, which is a spin based on Centos 5.6. We then compile GROMACS using ATLAS and compare it with Openblas. Please note that I am not an expert on optimisations (or computers or anything) so comparing Openblas vs ATLAS won't tell you which one is 'better'. They are just numbers based on what someone once observed on a particular system under a particular set of circumstances.

Hurdles: I first had to deal with the lapack + bad symbols + recompile with -fPIC problem (solved by using netlib lapack and building shared libraries), then encountered the 'libgmx.so: undefined reference to _gfortran_' issue (solved by adding -lgfortran to LDFLAGS).


ATLAS
sudo mkdir /share/apps/ATLAS
sudo chown $USER /share/apps/ATLAS
cd ~/tmp
wget http://www.netlib.org/lapack/lapack-3.4.1.tgz
 wget http://downloads.sourceforge.net/project/math-atlas/Developer%20%28unstable%29/3.9.72/atlas3.9.72.tar.bz2
tar xvf atlas3.9.72.tar.bz2
cd ATLAS/
mkdir build
cd build
.././configure --prefix=/share/apps/ATLAS -Fa alg '-fPIC' --with-netlib-lapack-tarfile=$HOME/tmp/lapack-3.4.1.tgz --shared
OS configured as Linux (1)
Assembly configured as GAS_x8664 (2)
Vector ISA Extension configured as  SSE3 (6,448)
Architecture configured as  Corei1 (25)
Clock rate configured as 3059Mhz
make
DONE  STAGE 5-1-0 at 15:23
ATLAS install complete.  Examine
ATLAS/bin/<arch>/INSTALL_LOG/SUMMARY.LOG for details.

ls lib/
libatlas.a  libcblas.a  libf77blas.a  libf77refblas.a  liblapack.a  libptcblas.a  libptf77blas.a  libptlapack.a  libsatlas.so  libtatlas.so  libtstatlas.a  Makefile  Make.inc
make install

In addition to successful copying you'll also get errors along the lines of

cp: cannot stat `/home/me/tmp/ATLAS/build/lib/libsatlas.dylib': No such file or directory
make[1]: [install_lib] Error 1 (ignored)
cp /home/me/tmp/ATLAS/build/lib/libtatlas.dylib /share/apps/ATLAS/lib/.
cp: cannot stat `/home/me/tmp/ATLAS/build/lib/libtatlas.dylib': No such file or directory
make[1]: [install_lib] Error 1 (ignored)
cp /home/me/tmp/ATLAS/build/lib/libsatlas.dll /share/apps/ATLAS/lib/.
cp: cannot stat `/home/me/tmp/ATLAS/build/lib/libsatlas.dll': No such file or directory
make[1]: [install_lib] Error 1 (ignored)
cp /home/me/tmp/ATLAS/build/lib/libtatlas.dll /share/apps/ATLAS/lib/.
cp: cannot stat `/home/me/tmp/ATLAS/build/lib/libtatlas.dll': No such file or directory
make[1]: [install_lib] Error 1 (ignored)
cp /home/me/tmp/ATLAS/build/lib/libsatlas.so /share/apps/ATLAS/lib/.
cp /home/me/tmp/ATLAS/build/lib/libtatlas.so /share/apps/ATLAS/lib/.
because those files don't exist. 


Gromacs

FFTW3 was first build according to this. The only difference is the install targets (--prefix) -- I put things in /share/apps/gromacs/.fftwsingle and /share/apps/gromacs/.fftwdouble. Gromacs was downloaded and extracted as shown in that post, and /share/apps/gromacs was created.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/openmpi/lib:/share/apps/ATLAS/lib
#single precision
export LDFLAGS="-L/share/apps/gromacs/.fftwsingle/lib -L/share/apps/ATLAS/lib -latlas -llapack -lf77blas -lcblas -lgfortran"
export CPPFLAGS="-I/share/apps/gromacs/.fftwsingle/include -I/share/apps/ATLAS/include/atlas"
./configure --disable-mpi --enable-float --with-fft=fftw3 --with-external-blas --with-external-lapack --program-suffix=_spa --prefix=/share/apps/gromacs
make -j3
make install
#double precision
make distclean
export LDFLAGS="-L/share/apps/gromacs/.fftwdouble/lib -L/share/apps/ATLAS/lib -latlas -llapack -lf77blas -lcblas -lgfortran"
export CPPFLAGS="-I/share/apps/gromacs/.fftwdouble/include -I/share/apps/ATLAS/include/atlas"
./configure --disable-mpi --disable-float --with-fft=fftw3 --with-external-blas --with-external-lapack --program-suffix=_dpa --prefix=/share/apps/gromacs
make -j3
make install
#single + mpi
make distclean
export LDFLAGS="-L/share/apps/gromacs/.fftwsingle/lib -L/share/apps/ATLAS/lib -latlas -llapack -lf77blas -lcblas -lgfortran"
export CPPFLAGS="-I/share/apps/gromacs/.fftwsingle/include -I/share/apps/ATLAS/include/atlas""
./configure --enable-mpi --enable-float --with-fft=fftw3 --with-external-blas --with-external-lapack --program-suffix=_spampi --prefix=/share/apps/gromacs
make -j3
make install
#double + mpi
make distclean
export LDFLAGS="-L/share/apps/gromacs/.fftwdouble/lib -L/share/apps/ATLAS/lib -latlas -llapack -lf77blas -lcblas -lgfortran"
export CPPFLAGS="-I/share/apps/gromacs/.fftwdouble/include -I/share/apps/ATLAS/include/atlas"
./configure --enable-mpi --disable-float --with-fft=fftw3 --with-external-blas --with-external-lapack --program-suffix=_dpampi --prefix=/share/apps/gromacs
make -j3
make install
The -lgfortran is IMPORTANT, or you'll end up with
libgmx.so: 'undefined reference to _gfortran_' type errors.

Performance
I ran a 6x6x6 nm box of water for 5 million steps (10 ns) to get a rough idea of the performance.
Make sure to put
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/share/apps/ATLAS/lib
in your ~/.bashrc, and to include it in your SGE jobs files (if that's what you use).

I allocated 8 Gb RAM and 8 cores for each run.

Double precision:
Openblas: 10.560 ns/day (11.7 GFLOPS, runtime 8182 seconds)
ATLAS  : 10.544 ns/day (11.6 GFLOPS, runtime 8194 seconds)

Single precision:
Openblas: 17.297 ns/dat (19.1 GFLOPS, runtime 4995 seconds)
ATLAS:   17.351 ns/day (19.2 GFLOPS, runtime 4980 seconds)
That's 15 seconds difference on a 1h 20 min run. I'd say they are identical.

24 July 2012

215. Compiling gcc 4.7.1/gfortran 4.7.1 on Centos 5.6/ROCKS 4.5.3 (and gmp, mpfr, mpc, binutils (ld,as), glibc, libunistring, libtools...)

Update June 2013:
See flakrat's post for a working example: http://flakrat.blogspot.com.au/2013/06/building-gcc-481-on-centos-59.html

I haven't updated this post for a long time, and I haven't been using the GCC I compiled this way. See flakrat's post for an up-to-date working version.

Update(s): This is the first time I compile the GCC and because of this I will like be going back, updating this document over the coming days. Make sure to 1) check back in a week and 2) hit refresh.

Updated (given in Melb./Au time zone): 24/7, 25/7

The reason for this post is the outdated version (4.1) of gcc on our ROCKS 5.4.3/CentOS 5.6 cluster.

I looked at installing GCC 4.4.6 using rpm packages I found online, but I'm not used to the Red Hat way of doing things, and e.g. openmpi-devel was requiring a version of libgomp I wanted to update. Ultimately I got scared of messing up a production cluster and decided that compiling, while slower, is a whole lot safer -- especially if you're not comfortable with the local package manager.

So, here's the alternative route of compiling your own.

I'm really not a friend of CentOS or ROCKS. On the other hand, I freely admit that this is probably in large part due to not being used to it. But...during the course of this compilation my feelings have gone from mild annoyance to active, fiery hate. Mostly it has to do with how old everything is, and the difficult in updating anything in ROCKS.

This is probably my most massive compilation, owing to the number of packages I ended up compiling (a lot of these are probably optional). Guile in particular is very, very demanding.

The order in which things are done is not random


1. Look at 
http://gcc.gnu.org/install/configure.html
http://gcc.gnu.org/install/prerequisites.html

2. Download and untar all the sources
NOTE: you should, if possible, select mirrors based on where you are. However, sometimes you're stuck in a shell and just need to get those files downloaded.

mkdir ~/tmp/gcc
cd ~/tmp/gcc
wget http://www.netgull.com/gcc/releases/gcc-4.7.1/gcc-4.7.1.tar.gz
wget http://www.mpfr.org/mpfr-current/mpfr-3.1.1.tar.gz
wget ftp://ftp.gmplib.org/pub/gmp-5.0.5/gmp-5.0.5.tar.bz2
wget http://www.multiprecision.org/mpc/download/mpc-1.0.tar.gz

tar xvf gcc-4.7.1.tar.gz
tar xvf gmp-5.0.5.tar.bz2
tar xvf mpfr-3.1.1.tar.gz
tar xvf mpc-1.0.tar.gz

That was the easy bit.

I've also set up a directory called /share/apps/tools/gcc/ with proper permissions already (i.e. whoever is doing the compiling should have write access)

3. Build gmp
cd gmp-5.0.5/
mkdir build
cd build/
.././configure --prefix=/share/apps/tools/gcc/gmp --program-suffix=-gcc47
make
make install
make check


4. Build mpfr
cd ../../mpfr-3.1.1/
mkdir build
cd build/
.././configure --prefix=/share/apps/tools/gcc/mpfr --program-suffix=-gcc-4.7 --with-gmp=/share/apps/tools/gcc/gmp
make
make install

Libraries have been installed in:
   /share/apps/tools/gcc/mpfr/lib

If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the `-LLIBDIR'
flag during linking and do at least one of the following:
   - add LIBDIR to the `LD_LIBRARY_PATH' environment variable
     during execution
   - add LIBDIR to the `LD_RUN_PATH' environment variable
     during linking
   - use the `-Wl,-rpath -Wl,LIBDIR' linker flag
   - have your system administrator add LIBDIR to `/etc/ld.so.conf'

See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.

make check

It looks like your compile is starting from scratch, but then it ends with:

====================
All 160 tests passed
(1 test was not run)
====================

5. Build mpc
cd ../../mpc-1.0/
mkdir build
cd build/
.././configure --prefix=/share/apps/tools/gcc/mpc --program-suffix=-gcc-4.7 --with-gmp=/share/apps/tools/gcc/gmp --with-mpfr=/share/apps/tools/gcc/mpfr
make
make install
make check

===================
All 64 tests passed
===================


6. Binutils
https://www.gnu.org/software/binutils/
'Often' (judging from forum postings...) you can use an older version of the binutils (ld, as) with a newer version of GCC. However, I need crt1.o when compilicing, it's part of glibc, and glibc requires newer versions of ld and as. So, here we go.
wget http://ftp.gnu.org/gnu/binutils/binutils-2.22.tar.gz
tar xvf binutils-2.22.tar.gz
cd binutils-2.22/
mkdir build
cd build/
.././configure --prefix=/share/apps/tools/gcc/binutils --program-suffix=-gcc-4.7
make
make install
make check

At some point I did:
cd /share/apps/tools/gcc/binutils/bin
ln -s ld-gcc-4.7 ld
ln -s as-gcc-4.7 as
ln -s ar-gcc-4.7 ar
but I don't think it's essential

7. Build gcc
cd ~/tmp/gcc-4.7.1
mkdir build
cd build/
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/share/apps/tools/gcc/gmp/lib:/share/apps/tools/gcc/mpc/lib:/share/apps/tools/gcc/mpfr/lib
.././configure --prefix=/share/apps/tools/gcc/gcc47 --program-suffix=-gcc-4.7 --with-gmp=/share/apps/tools/gcc/gmp --with-mpfr=/share/apps/tools/gcc/mpfr --with-mpc=/share/apps/tools/gcc/mpc LD_FOR_TARGET=ld-gcc-4.7 AS_FOR_TARGET=as-gcc-4.7 --with-ld=/share/apps/tools/gcc/binutils/bin/ld-gcc-4.7 --with-as=/share/apps/tools/gcc/binutils/bin/as-gcc-4.7 --with-ar=/share/apps/tools/gcc/binutils/bin/ar-gcc-4.7 AR_FOR_TARGET=ar-gcc-4.7

make

This takes A LONG TIME.

make install

To do 'make check' you need to have done the compilations of the optional packages (autogen and dependencies) below.
make check

You now have a shiny new compiler.

Using it is another matter. I'm working on that post at the moment...
You may find the following informative:

gcc-gcc-4.7 --print-search-dirs

install: /share/apps/tools/gcc/gcc47/lib/gcc/x86_64-unknown-linux-gnu/4.7.1/

programs: =/share/apps/tools/gcc/gcc47/libexec/gcc/x86_64-unknown-linux-gnu/4.7.1/:/share/apps/tools/gcc/gcc47/libexec/gcc/x86_64-unknown-linux-gnu/4.7.1/:/share/apps/tools/gcc/gcc47/libexec/gcc/x86_64-unknown-linux-gnu/:/share/apps/tools/gcc/gcc47/lib/gcc/x86_64-unknown-linux-gnu/4.7.1/:/share/apps/tools/gcc/gcc47/lib/gcc/x86_64-unknown-linux-gnu/:/share/apps/tools/gcc/gcc47/lib/gcc/x86_64-unknown-linux-gnu/4.7.1/../../../../x86_64-unknown-linux-gnu/bin/x86_64-unknown-linux-gnu/4.7.1/:/share/apps/tools/gcc/gcc47/lib/gcc/x86_64-unknown-linux-gnu/4.7.1/../../../../x86_64-unknown-linux-gnu/bin/

libraries: =/share/apps/tools/gcc/gcc47/lib/gcc/x86_64-unknown-linux-gnu/4.7.1/:/share/apps/tools/gcc/gcc47/lib/gcc/x86_64-unknown-linux-gnu/4.7.1/../../../../x86_64-unknown-linux-gnu/lib/x86_64-unknown-linux-gnu/4.7.1/:/share/apps/tools/gcc/gcc47/lib/gcc/x86_64-unknown-linux-gnu/4.7.1/../../../../x86_64-unknown-linux-gnu/lib/../lib64/:/share/apps/tools/gcc/gcc47/lib/gcc/x86_64-unknown-linux-gnu/4.7.1/../../../x86_64-unknown-linux-gnu/4.7.1/:/share/apps/tools/gcc/gcc47/lib/gcc/x86_64-unknown-linux-gnu/4.7.1/../../../../lib64/:/lib/x86_64-unknown-linux-gnu/4.7.1/:/lib/../lib64/:/usr/lib/x86_64-unknown-linux-gnu/4.7.1/:/usr/lib/../lib64/:/share/apps/tools/gcc/gcc47/lib/gcc/x86_64-unknown-linux-gnu/4.7.1/../../../../x86_64-unknown-linux-gnu/lib/:/share/apps/tools/gcc/gcc47/lib/gcc/x86_64-unknown-linux-gnu/4.7.1/../../../:/lib/:/usr/lib/


8. glibc
cd ~/tmp/gcc
wget http://ftp.gnu.org/gnu/glibc/glibc-2.14.tar.gz
tar xvf blic-2.14.tar.gz
cd glibc-2.14/
mkdir build
cd build/
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/share/apps/tools/gcc/mpc/lib:/share/apps/tools/gcc/mpfr/lib:/share/apps/tools/gcc/gmp/lib
.././configure --prefix=/share/apps/tools/gcc/glibc CC=gcc-gcc-4.7 --with-headers=/usr/src/kernels/2.6.18-238.19.1.el5-x86_64/include
make
make install

Using the current glibc 2.16 requires kernel>2.6.19 which is why I used 2.14. I don't want to fiddle with installing a new kernel on a production system which is shared between several users (and time zones...):
configure: error: GNU libc requires kernel header files from
Linux 2.6.19 or later to be installed before configuring.
The kernel header files are found usually in /usr/include/asm and
/usr/include/linux; make sure these directories use files from
Linux 2.6.19 or later.  This check uses <linux/version.h>, so
make sure that file was built correctly when installing the kernel header
files.  To use kernel headers not from /usr/include/linux, use the
configure option --with-headers.




Optional -- libtool, libunistring, libffi, bdw-gc, autogen and guile

I'm still stuck on Guile.

If you want to run make check on your gcc build, you need autogen, which needs guile, which needs libtool and libunistring

cd ~/tmp/gcc
wget http://ftpmirror.gnu.org/libtool/libtool-2.4.2.tar.gz
tar xvf libtool-2.4.2.tar.gz
cd libtool-2.4.2/
mkdir build
cd build/
.././configure --prefix=/share/apps/tools/gcc/libtool --program-suffix=-2.4.2
make
make install

make check
(this is very slow)
======================
All 122 tests passed
(2 tests were not run)
======================
and
## ------------- ##
## Test results. ##
## ------------- ##

104 tests behaved as expected.
22 tests were skipped.

cd ~/tmp/gcc
wget http://ftp.gnu.org/gnu/libunistring/libunistring-0.9.3.tar.gz
tar xvf libunistring-0.9.3.tar.gz 
cd libunistring-0.9.3/
mkdir build
cd build/
.././configure --prefix=/share/apps/tools/gcc/libunistring
make
make install

make check
====================
All 418 tests passed
====================

cd ~/tmp/gcc
wget ftp://sourceware.org/pub/libffi/libffi-3.0.11.tar.gz
tar xvf libffi-3.0.11.tar.gz 
cd libffi-3.0.11/
mkdir build
cd build/
.././configure --prefix=/share/apps/tools/gcc/libffi
make
make install
cp include -R /share/apps/tools/gcc/libffi/
cp ../src/x86/ffitarget.h /share/apps/tools/gcc/libffi/include/

cd ~/tmp/gcc
wget http://www.hpl.hp.com/personal/Hans_Boehm/gc/gc_source/gc.tar.gz
tar xvf gc.tar.gz
cd gc-7.2/
mkdir build
cd build/
 .././configure --prefix=/share/apps/tools/gcc/bdw-gc
make

make check
===================
All 10 tests passed
===================

wget ftp://ftp.gnu.org/gnu/guile/guile-2.0.6.tar.gz
tar xvf guile-2.0.6.tar.gz
cd guile-2.0.6
mkdir build
cd build/
.././configure --prefix=/share/apps/tools/gcc/guile --with-libltdl-prefix=/share/apps/tools/gcc/libtool --with-libgmp-prefix=/share/apps/tools/gcc/gmp --with-libunistring-prefix=/share/apps/tools/gcc/libunistring LIBFFI_CFLAGS=-I/share/apps/tools/gcc/libffi/include LIBFFI_LIBS=-L/share/apps/tools/gcc/libffi/lib BDW_GC_CFLAGS=-I/share/apps/tools/gcc/bdw-gc/include BDW_GC_LIBS=-L/share/apps/tools/gcc/bdw-gc/lib
make

Stuck:

../.././libguile/finalizers.c:167: error: static declaration of 'GC_set_finalizer_notifier' follows non-static declaration
/share/apps/tools/gcc/bdw-gc/include/gc/gc.h:177: error: previous declaration of 'GC_set_finalizer_notifier' was here
make[3]: *** [libguile_2.0_la-finalizers.lo] Error 1
make[3]: *** Waiting for unfinished jobs....
make[3]: Leaving directory `/home/me/tmp/gcc/guile-2.0.6/build/libguile'
make: *** [all] Error 2


wget http://ftp.gnu.org/gnu/autogen/rel5.16/autogen-5.16.tar.gz
tar xvf autogen-5.16.tar.gz
cd autogen-5.16/
mkdir build
cd build/






Errors:
checking for x86_64-unknown-linux-gnu-gcc... /home/me/tmp/gcc/gcc-4.7.1/build/./gcc/xgcc -B/home/me/tmp/gcc/gcc-4.7.1/build/./gcc/ -B/share/apps/tools/gcc/gcc47/x86_64-unknown-linux-gnu/bin/ -B/share/apps/tools/gcc/gcc47/x86_64-unknown-linux-gnu/lib/ -isystem /share/apps/tools/gcc/gcc47/x86_64-unknown-linux-gnu/include -isystem /share/apps/tools/gcc/gcc47/x86_64-unknown-linux-gnu/sys-include
checking for suffix of object files... configure: error: in `/home/me/tmp/gcc/gcc-4.7.1/build/x86_64-unknown-linux-gnu/libgcc':
configure: error: cannot compute suffix of object files: cannot compile
See `config.log' for more details.
make[2]: *** [configure-stage1-target-libgcc] Error 1
make[2]: Leaving directory `/home/me/tmp/gcc/gcc-4.7.1/build'
make[1]: *** [stage1-bubble] Error 2
make[1]: Leaving directory `/home/me/tmp/gcc/gcc-4.7.1/build'
make: *** [all] Error 2

Solution: export LD_LIBRARY_PATH to include the mpc, mpfr and gmp lib dirs (see above)

Links to this post:
http://blog.naver.com/PostView.nhn?blogId=myunggyu&logNo=60173986972&categoryNo=0&parentCategoryNo=0&viewDate&currentPage=1&postListTopCurrentPage=1&userTopListOpen=true&userTopListCount=30&userTopListManageOpen=false&userTopListCurrentPage=1
http://flakrat.blogspot.com/2013/06/building-gcc-481-on-centos-59.html

03 June 2012

172. ECCE and a ROCKS cluster: step by step

This is quite similar to a recent post, but here's a step-by-step, detailed account of how to set up ECCE for remote job submission to a ROCKS 5.4.3 cluster (one front node, 4 subnodes)

Coming soon (give it a week): Setting up a virtualbox machine with ecce for (stubborn) windows and ROCKS/CentOS users.

What isn't shown are all the failed attempts and dead-ends I went through and encountered getting to the point where I had a working system. I compiled ECCE. I compiled tcsh. I tried compiling bsd csh, which required me to compile bmake etc. This stuff looks simple, and it is simple -- but not obvious.

NOTE: From the outside we connect to rocks.university.edu. From inside the cluster the submit node is called rocks.local, and the subnodes are called node0, node1, etc. Refer to this naming if you get confused late.

Step 1. Create the site in ecce
From the terminal, do
ecce -admin
and add a new machine

Don't forget to hit Add/Change queue to make the changes to the queue part take effect. Then hit Add/Change. Oh, and pay attention to the Allocation Account tick box - if it's ticked you can't submit anything unless you add an account.  Important: the machine name you add here is the local name or local IP of the submit node -- it's not the 'public' name or url. We'll add that somewhere else later. Don't forget to select the queue manager (I forgot in the screen shot)

Close.

2. Editing your CONFIG file
Since you're already in the terminal, go to ecce-v6.3/apps/siteconfig

Take a quick peek at your Machines file (no editing):
Machines line
rocks rocks.local Dell beo Intel 40:5 ssh :NWChem:Gaussian-03 MN:RD:SD:UN:PW:Q:TL

Take another look at rocks.Q -- there's probably nothing to edit here either:

rocks.Q# Queue details for rocks
Queues:    nwchem
nwchem|minProcessors:       1
nwchem|maxProcessors:       40
nwchem|runLimit:       100000
nwchem|memLimit:       0
nwchem|scratchLimit:       0
Finally, do some editing of your CONFIG.rocks file.

CONFIG.rocks

NWChem: /share/apps/nwchem/nwchem-6.1/bin/LINUX64/nwchem
Gaussian-03: /share/apps/gaussian/g09/g09
perlPath: /usr/bin/
qmgrPath: /opt/gridengine/bin/lx26-amd64
sourcefile: /home/rocksuser/.cshrc
frontendMachine: rocks.university.edu

SGE {
#$ -S /bin/csh
#$ -cwd
#$ -l h_rt=$walltime
#$ -l h_vmem=$memoryG
#$ -j y
#$ -pe orte $totalprocs  
}

NWChemEnvironment{
            LD_LIBRARY_PATH /usr/lib/openmpi/1.3.2-gcc/lib/
}

NWChemCommand {
        /opt/openmpi/bin/mpirun -n $totalprocs $nwchem $infile > $outfile
}
Gaussian-03Command {
    setenv GAUSS_SCRDIR /tmp
    setenv GAUSS_EXEDIR /share/apps/gaussian/g09/bsd:/share/apps/gaussian/g09/local:/share/apps/gaussian/g09/extras:/share/apps/gaussian/g09
        time /share/apps/gaussian/g09/g09 $infile  $outfile }

Obviously, your variables will be different. NOTE that memory is in gigabyte here. You could also do $memoryM for megabyte. Just adjust your launcher requirements accordingly.

Step 3. Making csh modifications on the ROCKS cluster
On the main node just use the root password (or become sudo) and move /etc/csh.cshrc and /etc/csh.login out of the way (backing them up is a good idea). It doesn't seem like you need to make any changes csh-wise to the subnodes.

Step 4. Finalising our set up
Start ecce the normal way (e.g. run ecce from the terminal)
In the Gateway, start the Machine Browser, highlight 'rocks' and click on Setup Remote Access.
Do what you're told.

Step 5. Submit to your heart's content!

NOTE: the option to set the amount of memory is not shown in the launcher window above: my mistake. You can edit you apps/siteconfig/Machines file and add :MM at the end of the line, e.g.
Dynamic beryllium       Unspecified     Unspecified     Unspecified     18:3    ssh     :NWChem:Gaussian-03     MN:RD:SD:UN:PW:Q:TL:MM

171. Building ECCE on ROCKS/CentOS


I installed ECCE on a couple of a single workstation with ROCKS, and remotely on a 40 core cluster with ROCKS. The local, workstation install worked fine. I never really bothered much about the cluster install, and only recently looked closer at it. Well, I can launch the 'gateway' but nothing else -- when I click on e.g. the organizer button I get the rocks version of an hourglass that never goes away -- and I don't get any error messages. Turning on logging doesn't yield anything either. 

Ergo, I figured that building it myself  may yield a different result. It didn't on the ROCKS cluster, but everything worked just fine on the single-node ROCKS training box I keep in my office.


CentOS is a bit dated, so you'll need to build your own apr and apr-util. Build apr:
cd /share/apps/utils/
wget http://mirror.mel.bkb.net.au/pub/apache//apr/apr-1.4.6.tar.gz
wget http://mirror.mel.bkb.net.au/pub/apache//apr/apr-util-1.4.1.tar.gz
tar xvf apr-1.4.6.tar.gz
cd apr-1.4.6/
./configure --prefix=/share/apps/utils/apr
make
make install
cd ../
tar xvf apr-util-1.4.1.tar.gz
cd apr-util-1.4.1/
./configure --prefix=/share/apps/utils/apr-util --with-apr=/share/apps/utils/apr/


Time for ecce.
First download 
cd /share/apps/ecce/
tar xvf ecce-v6.3-src.tar.bz2
cd ecce-v6.3/
export ECCE_HOME=/share/apps/ecce/ecce-v6.3
cd build/

Edit build_ecce
889       ./configure --prefix=$ECCE_HOME/${ECCE_SYSDIR}3rdparty/httpd --enable-rewrite --enable-dav --enable-ss-compression
to
889       ./configure --prefix=$ECCE_HOME/${ECCE_SYSDIR}3rdparty/httpd --enable-rewrite --enable-dav --enable-ss-compression --with-apr=/share/apps/utils/apr/bin/apr-1-config --with-apr-util=/share/apps/utils/apr-util/bin/apu-1-config

./build_ecce
Just follow the instructions i.e. hit return, over and over again. Answer no to running tests again. Then run build_ecce again:
./build_ecce
Now stuff should be building. Do this another six times. From the README:
"At this stage the script will build one 3rd party package per invocation,
exiting after each package is built.  In order the 3rd party packages that
will be built are:
1. Apache Xerces XML parser
2. Mesa OpenGL
3. wxWidgets C++ GUI toolkit
4. wxPython GUI toolkit
5. Apache HTTP web server"
The httpd build ends with a minor error about "lib" missing. It's fine.

The sixth time ECCE itself is built, and that's the step that takes by far the longest. It finishes with:
 ECCE built and distribution created in /share/apps/ecce/ecce-v6.3
On a single-node desktop I got it to run a seventh time it seemed. The last step finished with the message above though.

Go to your /share/apps/ecce/ecce-v6.3/ dir where you'll find install_ecce.v6.3.csh
Do the install
csh -f install_ecce.v6.3.csh
Follow the instructions.

You may also want to
sudo mv /etc/csh.* ~/
to get rid of the crappy csh config files.

Edit your ~/.bashrc:

alias startecceserver='csh -f /share/apps/ecce/ecce-v6.3/server/ecce-admin/start_ecce_server'
alias stopecceserver='csh -f /share/apps/ecce/ecce-v6.3/server/ecce-admin/stop_ecce_server
export ECCE_HOME=/share/apps/ecce/ecce-v6.3/apps
export PATH=$PATH:${ECCE_HOME}/scripts

and your ~/.cshrc:

setenv ECCE_HOME /share/apps/ecce/ecce-v6.3/apps
set PATH= (/share/apps/nwchem/nwchem-6.1/bin/LINUX64 $PATH)

On my single-node box I had to edit the apps/siteconfig/DataServers and replace eccetera.emsl.pnl.gov with localhost (two instances), as well as the apps/siteconfig/jndi.properties file (one instance).

In spite of the hassle on the single node box, everything works there -- the builder, organizer etc. all open just fine. The rocks cluster, looks fine, but doesn't work.

The ROCKS Cluster:
Everything seems to work fine -- starting ecce launches the gateway, but clicking on anything sees the centos version of the hourglass churn over and over for all eternity. Nothing happens.

I looked through these two threads, and i also tried the pre-built 32 bit binary. All without luck.

I've also tried editing the site_runtime file:
ECCE_MESA_OPENGL true
ECCE_MESA_EXCEPT x86_64:RedHat:Fedora:CentOS
(matches the lsb_release -is output)




02 June 2012

170. tcsh in ROCKS/CentOS with hardcoded csh.cshrc path

WHAT THIS POST DOES: It shows you how to compile your own tcsh which won't be looking at /etc/csh.cshrc. It doesn't show you how to set up the correct .cshrc files. But it certainly allows you to experiment.

Also, keep in mind that since each local node hdd has it's own /bin directory (not exported) you need to make similar changes on each node (i.e. change the /bin/csh symlink -- see below)

(The) csh (startup files) is(are) horribly broken on ROCKS 5.4.3.

For now I've solved it by just moving  /etc/csh.cshrc out of the way, but what we do here is symlink /bin/csh to our own tcsh which been hardcoded to use a non-standard configuration file, so that you can use the standard ROCKS tcsh with /etc/csh.cshrc and your own csh(tcsh) with your own config files.

To be clear: it's not the csh binary which is borked on ROCKS 5.4.3, but the configuration files.

There's a patch for the broken csh -- but when I applied it to a test computer it only got broken-er and prevented the csh from opening and staying open. Good way of getting locked out. So I'm not keen on doing the same thing on someone else's production cluster. Also, I've opted for tcsh since the csh sources come with a bsd style makefile, and I really can't deal with that right now.

What we'll do is hardcode the location of the csh.cshrc file and change it from /etc/csh.cshrc to /share/apps/utils/tcsh

sudo mkdir /share/apps/utils
sudo chown ${USER} /share/apps/utils
cd /share/apps/utils
 wget http://ftp.de.debian.org/debian/pool/main/t/tcsh/tcsh_6.18.01.orig.tar.gz
tar xvf tcsh_6.18.01.orig.tar.gz
 cd tcsh-6.18.01/

Time for find out what to change:
tail -n 9999 *|strings|egrep "/etc/csh.cshrc|<=="


tells us we need to have a look at pathnames.h
Change

124 # define _PATH_DOTCSHRC     "/etc/csh.cshrc"
to
124 # define _PATH_DOTCSHRC     "/share/apps/utils/custom.tcshrc"

./configure --prefix=/share/apps/utils/tcsh
make
make install

If all went well:
cat tcsh|strings|grep custom.tcsh
/share/apps/utils/custom.tcshrc
and
tree /share/apps/utils/tcsh -L 1
/share/apps/utils/tcsh
|-- bin
`-- share

Obviously, this doesn't really make much of a difference just yet. Now comes the scary part -- and you need root access for this:
 which csh
/bin/csh

ls /bin/csh -lah
lrwxrwxrwx 1 root root 4 Feb 23 16:54 /bin/csh -> tcsh
and here's the 'dangerous' stuff:
sudo rm /bin/csh
sudo ln -s /share/apps/utils/bin/tcsh /bin/csh
sudo chown root:root /bin/csh
sudo chmod 777 /bin/csh

Since /bin/csh isn't a binary but a symmlink to tcsh in the /bin directory, we just delete the symlink and create a new one.

We can now make whatever changes we want to our custom.tcshrc while still being able to easily change back to the old setup. I do recognise that we could just have edited /etc/csh.cshrc and /login.cshrc, but I for some reason feel a lot more comfortable using this method.



01 June 2012

170. Compiling PVM and XPVM on ROCKS 5.4.3

And we're back to ROCKS again.

NOTE: I haven't actually tested the binaries and libs compiled here. I think they should work. But I don't know for sure.

PQS works with openmpi, mpich and PVM. Our vanilla ROCKS install already has openmpi and mpich. There's a package called rocks-pvm, but the size is 50 kb and didn't seem to actually install anything precompiled, so I removed it and decided to go the compilation way instead.

The paths here are specific to the cluster I did this on, so customise as needed.

sudo mkdir /share/apps/pvm
sudo chown ${USER} /share/apps/pvm
cd /share/apps/pvm
wget http://www.netlib.org/pvm3/pvm3.4.6.tgz
tar xvf pvm3.4.6.tgz
cd pvm3/
export PVM_ROOT=`pwd`
make

Time to set up environment variables. Either edit /etc/profile or ~/.bashrc, depending on powers and reach., and add
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:share/apps/pvm/pvm3/lib/LINUX64
export PATH=$PATH:/share/apps/pvm/pvm3/bin/LINUX64
export PVM_ROOT=/share/apps/pvm/pvm3

Changes won't take effect until you source the file, or open a new terminal.


I profess to be ignorant about how to actually use pvm, so no testing just yet.


So I also stumbled across xpvm, which sounds (and looks) neat.

wget http://www.netlib.org/pvm3/xpvm/XPVM.src.1.2.5.tgz
cd /share/apps/pvm
tar xvf XPVM.src.1.2.5.tgz
cd xpvm/

Time to do some housekeeping before compiling:
It requires:
1. PVM 3.3.0 or later.
2. TCL 7.3 or later.
3. TK 3.6.1 or later.

I find
/usr/share/tk8.4
/usr/share/tcl8.4
so I might be ok. We just compiled pvm 3.4.6, so it should be alright.

First figure out where stuff is:

locate libtk|grpe so
/usr/lib/libtk.so
/usr/lib/libtk8.4.so
/usr/lib64/libtk.so
/usr/lib64/libtk8.4.so
locate libtcl|grep so
/usr/lib/libtcl.so
/usr/lib/libtcl8.4.so
/usr/lib64/libtcl.so
/usr/lib64/libtcl8.4.so
/usr/lib64/tclx8.4/libtclx8.4.so
These are fairly standard locations, so they should already be searched by ld -- no need to specify them thus.

Include is potentially worse since they'd typically need the development packages.
locate tcl | grep "\.h"
[..]
/usr/include/tcl-private/generic/tcl.h
[..]
locate tk|grep "\.h"
[..]
/usr/include/tk-private/generic/tk.h
[..]
So we /should/ be fine.

We also need the X11 libs and headers:
locate libX11

/usr/lib/libX11.so
/usr/lib/libX11.so.6
/usr/lib/libX11.so.6.2.0
/usr/lib64/libX11.so
/usr/lib64/libX11.so.6
/usr/lib64/libX11.so.6.2.0
locate X11|grep include

[..]
/usr/include/X11
[..]
Finally,

locate libdl
/lib/libdl-2.5.so
/lib/libdl.so.2
/lib64/libdl-2.5.so
/lib64/libdl.so.2
/usr/lib/libdl.a
/usr/lib/libdl.so
/usr/lib64/libdl.a
/usr/lib64/libdl.so
I'll specify the lib locations even though in some of these particular cases it isn't necessary:


Edit xpvm/src/Makefile.aimk and set (line numbers added by me):
19  PVMVERSION = -DUSE_PVM_34
Comment out line 42:
 42 #TCLTKHOME  =  $(HOME)/TCL
and
 44 TCLTKHOME  =   /usr/include
Change

 47 TCLINCL     =   -I$(TCLTKHOME)/tcl-private/generic
 48 TKINCL      =   -I$(TCLTKHOME)/tk-private/generic
and

 57 TCLLIBDIR   =   -L/usr/lib64/tclx8.4
 58 TKLIBDIR    =   -L/usr/lib64
and
 70 TCLLIB      =   -ltcl8.4
 71 TKLIB       =   -ltk8.4
and
83 XINCL       = -L/usr/include/X11
84 XLIBDIR     = -L/usr/lib64
and finally,
 96 SHLIB       = -ldl



Fell asleep? Time to get compiling.
export XPVM_ROOT=/share/apps/pvm/xpvm

export TCL_LIBRARY=
/usr/share/tcl8.4

export TK_LIBRARY=/usr/share/tk8.4

cd ${XPVM_ROOT}
make
[..]
Installing xpvm.tcl
Installing globs.tcl
Installing procs.tcl
Installing util.tcl
make[1]: Leaving directory `/share/apps/pvm/xpvm/src/LINUX64'

The beautiful thing is that the xpvm binary automagically ends up in the pvm3/bin/LINUX64 directory, so no need to fiddle with path.



In theory everything should work now if you log in with ssh -XC. However I get
xpvm
libpvm [pid2607] /tmp/pvmd.502: No such file or directory
libpvm [pid2607]: Can't Start PVM: Can't start pvmd
I'm not actually running -- nor have I ever run -- anything with pvm.

touch /tmp/pvmd.502
xpvm
libpvm [pid4219]: mksocs() read addr file: wrong length read
Connecting to PVMD already running... libpvm [pid4219]: mksocs() read addr file: wrong length read
libpvm [pid4219]: mksocs() read addr file: wrong length read
libpvm [pid4219]: mksocs() read addr file: wrong length read
libpvm [pid4219]: pvm_mytid(): Can't contact local daemon
libpvm [pid4219]: Error Joining PVM: Can't contact local daemon
I mean, it looks like it should work, once pvm is being used.

08 May 2012

145. Rasmol on ROCKS 5.4.3

By request:

This shows how to install rasmol on ROCKS 5.4.3 which is based on CentOS.

Rasmol 2.7.5.2
wget http://www.rasmol.org/software/RasMol_Latest.tar.gz
tar xvf
cd RasMol-2.7.5.2/src/
./build_all.sh

If all went well
sh rasmol_run.sh
should start it

NOTE: this doesn't install rasmol anywhere -- it builds it and allows you to run it from the src directory. Put an alias in your ~/.bashrc  pointing towards the rasmol_run script e.g.
alias rasmol='sh /home/me/.rasmol/RasMol-2.7.5.2/src/rasmol_run.sh'

Note that prior to this you will have to set up a working build environment, e.g.
sudo yum install gcc gcc-c++ gcc-gfortran cpp

03 May 2012

133. Compiling Openbabel 2.3.1 and CMake on ROCKS/centos

Open Babel is a convenient tool for converting between chemistry-related file formats. Sadly, it's not included in ROCKS 5.4.3 from what I can see and I could only install a severely outdated rpm package which doesn't support gaussian 09 and nwchem well.

The easiest way to compile openbabel is by using cmake.

Cmake:
wget http://www.cmake.org/files/v2.8/cmake-2.8.8.tar.gz
tar -xvf cmake-2.8.8.tar.gz
cd cmake-2.8.8.8/
./configure --prefix=/home/me/.cmake
make
make install

Add the following to your ~/.bashrc and source it:
export PATH=$PATH:/home/me/.cmake/bin

Note: this works in Scientific Linux (Boron) 5.4 as well

Openbabel
wget http://downloads.sourceforge.net/project/openbabel/openbabel/2.3.1/openbabel-2.3.1.tar.gz?r=http%3A%2F%2Fopenbabel.org%2Fwiki%2FGet_Open_Babel&ts=1336048328&use_mirror=aarnet
tar -xvf openbabel-2.3.1.tar.gz
cd openbabel-2.3.1/
cmake -DCMAKE_INSTALL_PREFIX:PATH=/home/me/.babel
make
make install

Add the following to your ~/.bashrc and source it:
export PATH=$PATH:/home/me/.babel/bin

Note: this works in Scientific Linux (Boron) 5.4 as well

Do
babel -L formats 
to get a list of formats.

20 March 2012

113. Using ECCE to run nwchem jobs

EDIT: This post is getting messier as I'm hammering things out...but I've gotten everything to work in the end, so please persist.  The workflow described below is not the ideal one, but it'll get you started. I'll link here when I put up a newer, more reasonable tutorial.

EDIT2: I'm really warming to ECCE as I'm learning more about it. I still think it'd be nice if it was open source, and I can't understand why it has to be reliant on csh (which is pretty much broken on ROCKS, and uncomfortable at the best of times), but it's pretty neat once you've got all the details ironed out. Error feedback/report could be better though.

EDIT 3: ECCE is going open source the (northern) summer of 2012! As users we no longer have any excuses to complain.

Here's a quick introduction to getting started with using ECCE as the interface to nwchem, similar to how gaussview can be used to set up gaussian jobs.

This presumes that you've set up ECCE and preferably compiled your own version of nwchem:
http://verahill.blogspot.com.au/2012/03/ecce-on-debian-but-not-on-rockscentos.html
http://verahill.blogspot.com.au/2012/03/nwchem-61-with-openmpi-on-rocks.html
http://verahill.blogspot.com.au/2012/01/debian-testing-64-wheezy-nwhchem.html


##Important##
Once I had figured all of this out I rebuilt nwchem and re-installed ecce in the proper locations. You might want to do the same.

A. If you're going to use several nodes you should put nwchem in the same position in the file system hierarchy on all nodes e.g.
/opt/nwchem/nwchem-6.0/bin/LINUX64/nwchem

Also, make sure you share a folder (see how to use NFS) between the nodes which you can use for run time files e.g. /work

EDIT 4: This (probably) isn't necessary. In fact, using NFS in the wrong way will slow things down.

Set the permissions right (chown your user and set to 777 -- 755 is enough for nfs sharing between debian nodes, but between ROCKS and Debian you seem to need 777), and open your firewall on all ports for communication between the nodes.

B. Make sure that ECCE_HOME has been set in ~/.bashrc e.g.
export ECCE_HOME=/opt/ecce/apps

and in ~/.cshrc
setenv ECCE_HOME=/opt/ecce/apps

C.
edit /opt/ecce/apps/siteconfig/submit.site (location depends on where you install ecce)
Change lines 65+ from
#NWChemCommand {
#  $nwchem $infile > $outfile
#}
to (for multiple nodes)
NWChemCommand {
mpirun -hostfile /work/hosts.list -n $totalprocs --preload-binary /opt/nwchem/nwchem-6.0/bin/LINUX64/nwchem $infile > $outfile
}
to use mpirun for parallel job submissions and assuming you have a hosts file in /work. For running on a single node you can use


NWChemCommand {
mpirun  -n $totalprocs $nwchem  $infile > $outfile
}

user either --preload-binary /opt/nwchem/nwchem-6.0/bin/LINUX64/nwchem or $nwchem -- see what works for you. You probably can't do preload if you're running different linux distros (e.g. debian and centos)

My hosts.list looks like this:

tantalum slots=4 max_slots=4
beryllium slots=4 max_slots=5

Make sure that you don't accidentally put 2 jobs on node 0, then 2 jobs on node 1, then another 2 jobs on node 0, since they won't be consecutively numbered and will crash armci. You can avoid this by setting slots and max_slots to the same number.


D.
You may have to edit /etc/openmpi/openmpi-mca-params.conf if you have several (real or virtual) interfaces and add e.g.


btl=tcp,sm,self
btl_tcp_if_include=eth1,eth2
btl_tcp_if_exclude=eth0,virtbr0


Start ECCE:
First start the server
csh /home/me/tmp/ecce/ecce-v6.2/server/ecce-utils/start_ecce_server
then launch ecce

ecce

This will launch what the ecce people call the 'gateway':
The Gateway

0. Make sure you've got your machine set up
Click on Machine browser
Make sure that you can connect to the node e.g. by clicking on disk usage

Set the application paths. Don't fiddle with nodes -- just change number of processors to the total for all nodes.



1. Draw SiCl4 
Click on the Builder in the Gateway, which gives you the following:
The builder window

Click on More to get the periodic table which gives you access to Si

Select Geometry -- here, Tetrahedral

Si -- with four 'nubs' (yup, that's what the ecce ppl call them)

Time to attach Cl atoms to the nubs. Select Cl and pick Terminal geometry.

Click on a 'nub' to replace it with a Cl

And do it until you've replaced all 'nubs'. Hold down right mouse button to rotate

Click on the broom next to the bond menu on the right to pre-optimize  the structure using MM

And save. You will probably be limited to saving your jobs in folders below the ecce  folder.


2. Set up your job
Click on the Organizer icon in the 'gateway', which takes you here:

Click on the first icon, Editor

Focus on selecting Theory and Run type. Here's we'll do a geometry optimisation.

Click on Details for Theory

Click on Details for Run type

Constraints are optional

In the organizer, click on the third icon to set the basis set. Defined atoms for a particular basis set are indicated by a n orange right lower corner

You can get Details about the basis set

If you don't have a Navy Triangle you can't run. Click on Editor and see what might be wrong.

Ready to run. Click on Launch.
4. Running
I'm still working on enabling more than a single core...
Once you've clicked on launch you'll get

 If you click on viewer you can monitor the job

Optimization in progress
5. Re-launch a job at higher theory
In the Organizer, select your last job and then click on Edit, Duplicate Setup with Last Geometry
You then get a copy to edit

Change the basis set, save, then click on Final Edit

This is the nwchem input file in a vim instance

Add a line to the end, saying task scf freq to calculate the vibrations (there's another job option called geovib which does optim+freq , but here we do it by hand)

Launch

Running...

You can now look at the vibrations

And you can visualise MOs -- here's the HOMO which looks like all isolated p orbitals on the chlorine

You can also calculate 'properties'

These include GIAO shielding

Performance:
Here's phenol (scf/6-31g*) across three gigabit-linked nodes. The dotted line denotes node boundaries.


Here's a number of alkanes (scf/6-31g) on 4 cores on a single node: