24 February 2012

75. [solved] Problems with homebuilt nwchem 6.1 on Debian Testing


EDIT 18 May 2012:
This has now been solved:
Compiling nwchem 6.1 with internal libs on debian:
 http://verahill.blogspot.com.au/2012/05/compiling-nwchem-61-with-internal-libs.html
Compiling nwchem 6.1 with openblas on debian:
 http://verahill.blogspot.com.au/2012/05/building-nwchem-61-on-debian.html


UPDATE April 2012: Someone else is having the same problem: http://www.nwchem-sw.org/index.php/Special:AWCforum/st/id435/ . Binaries built on ROCKS 5.4.3 work, but binaries built on Debian testing don't. On ROCKS the gfortran version is GNU Fortran (GCC) 4.1.2 20080704 (Red Hat 4.1.2-50); on Debian, which yields a segfaulting binary, it is GNU Fortran (Debian 4.6.3-1) 4.6.3.
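When comparing builds across machines it helps to record the exact compiler version in the same form each time. Below is a small sketch that pulls the X.Y.Z number out of a `gfortran --version` line. The helper name `fortran_version` is made up for illustration, and it assumes that the first dotted triple on the line is the compiler version, which holds for both strings quoted above:

```shell
# fortran_version: print the first X.Y.Z version number found on stdin.
# Hypothetical helper, meant to be fed the first line of `gfortran --version`.
fortran_version() {
    grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# The two version strings quoted in the update above:
echo 'GNU Fortran (GCC) 4.1.2 20080704 (Red Hat 4.1.2-50)' | fortran_version   # ROCKS
echo 'GNU Fortran (Debian 4.6.3-1) 4.6.3' | fortran_version                    # Debian

# On a real machine (requires gfortran to be installed):
# gfortran --version | head -n 1 | fortran_version
```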


Nwchem 6.1 was released in February this year. The build instructions are ALMOST the same as for Nwchem 6.0 -- the difference is the use of export USE_MPIF4=y. Well, that and me not having much success in actually USING nwchem as opposed to building it.

There is now an nwchem version with mpi support in the debian unstable repos. I have not used or tested it.

I can build the 32-bit version of nwchem 6.1 just fine. Building the 64-bit version works absolutely fine too. However, once you attempt to run the 64-bit binary, it crashes. Ergo, this is NOT A SOLUTION. It's a bunch of error messages posted so that more seasoned and skilled operators than I may offer a solution. If you have the option, build and use version 6.0 instead.

Update:
I built a version with openmpi support as well, which also segfaults.
Here are the build instructions:

sudo apt-get install openmpi-bin openmpi-dev
export LARGE_FILES=TRUE
export TCGRSH=/usr/bin/ssh
export NWCHEM_TOP=/home/me/tmp/nwchem-6.1
export NWCHEM_TARGET=LINUX64
export NWCHEM_MODULES=all
export USE_MPI=y
export USE_MPIF=y
export MPI_LOC=/usr/lib/openmpi
export MPI_INCLUDE=/usr/lib/openmpi/include
export USE_MPIF4=y
export LIBRARY_PATH=$LIBRARY_PATH:/usr/lib/openmpi/lib
export LIBMPI="-lmpi -lopen-rte -lopen-pal -ldl -lmpi_f77"
cd $NWCHEM_TOP/src
make clean
make  nwchem_config
make  FC=gfortran
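Since the build only reads these variables, a typo in one of them tends to surface much later as a confusing error. Here is a hypothetical pre-flight check (the function `check_nwchem_env` is my own, not part of the nwchem build system) that refuses to continue if one of the variables used above is empty or if one of the expected directories is missing:

```shell
# check_nwchem_env: hypothetical sanity check to run before `make`.
# Returns non-zero and names the first missing variable or directory.
check_nwchem_env() {
    for var in NWCHEM_TOP NWCHEM_TARGET NWCHEM_MODULES MPI_LOC MPI_INCLUDE LIBMPI; do
        # Indirect lookup of the variable whose name is in $var (POSIX sh has no ${!var}).
        eval "val=\${$var:-}"
        if [ -z "$val" ]; then
            echo "missing: $var" >&2
            return 1
        fi
    done
    for dir in "$NWCHEM_TOP/src" "$MPI_INCLUDE"; do
        if [ ! -d "$dir" ]; then
            echo "not a directory: $dir" >&2
            return 1
        fi
    done
    return 0
}

# Usage, after the exports above:
# check_nwchem_env && cd "$NWCHEM_TOP/src" && make clean && make nwchem_config && make FC=gfortran
```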


And here's what happens on execution:

[beryllium:24650] *** Process received signal ***
[beryllium:24650] Signal: Segmentation fault (11)
[beryllium:24650] Signal code: Address not mapped (1)
[beryllium:24650] Failing at address: 0x44000098
[beryllium:24650] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x324f0) [0x7f08deeb84f0]
[beryllium:24650] [ 1] /usr/lib/libmpi.so.0(PMPI_Comm_set_errhandler+0x60) [0x7f08e0526c30]
[beryllium:24650] [ 2] ./nwchem() [0x292d504]
[beryllium:24650] [ 3] ./nwchem() [0x292d596]
[beryllium:24650] [ 4] ./nwchem() [0x40657a]
[beryllium:24650] [ 5] ./nwchem() [0x406f7d]
[beryllium:24650] [ 6] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd) [0x7f08deea4ead]
[beryllium:24650] [ 7] ./nwchem() [0x405189]
[beryllium:24650] *** End of error message ***


This only happens on 64-bit Debian; 32-bit Debian and 64-bit CentOS are both fine.
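The backtrace shows the crash inside /usr/lib/libmpi.so.0 (PMPI_Comm_set_errhandler), while the build pointed MPI_LOC at /usr/lib/openmpi. One thing that may be worth checking (an assumption on my part, not something verified here) is which MPI libraries the 64-bit binary actually resolves at run time. `ldd` lists them; the hypothetical helper `mpi_libs` below just filters its output down to the MPI-related entries:

```shell
# mpi_libs: read `ldd` output on stdin and print "soname -> path" for MPI-related
# libraries (MPICH, Open MPI and its open-rte/open-pal helpers).
# ldd lines look like: "<TAB>libmpi.so.0 => /usr/lib/libmpi.so.0 (0x...)"
mpi_libs() {
    awk '$1 ~ /mpi|open-rte|open-pal/ { print $1, "->", $3 }'
}

# Usage on the real binary:
# ldd ./nwchem | mpi_libs
```

If the list mixes libraries from more than one MPI implementation, or from a different directory than the one given in MPI_LOC, that would be a lead worth following.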

OLD POST:

--start here --

Here's what I've done so far:

First, install version 1.2 of mpich2 and libmpich2-dev from stable and put a hold on both packages:
1. Edit your /etc/apt/sources.list to allow packages from stable, e.g.

deb ftp://ftp.au.debian.org/debian/ testing main contrib non-free
deb ftp://ftp.au.debian.org/debian/ stable main contrib non-free

2. Create an /etc/apt/preferences file, e.g.

Package: *
Pin: release a=testing
Pin-Priority: 990

Package: *
Pin: release a=stable
Pin-Priority: -10

3. Install version 1.2 explicitly:
sudo apt-get update && sudo apt-get install mpich2=1.2.1.1-5 libmpich2-dev=1.2.1.1-5

4. Put a hold on the packages:

sudo su
echo "mpich2 hold"|dpkg --set-selections
echo "libmpich2-dev hold"|dpkg --set-selections

exit
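The pinning and holding steps above can be scripted and checked. Below is a minimal sketch with two hypothetical helpers: `write_apt_prefs` writes the preferences file from step 2 (apt_preferences records must be separated by a blank line), and `all_held` reads the output of `dpkg --get-selections` and succeeds only if every listed package is on hold:

```shell
# write_apt_prefs FILE: write the pinning rules from the preferences step.
# The blank line between the two records is required by apt_preferences(5).
write_apt_prefs() {
    cat > "$1" <<'EOF'
Package: *
Pin: release a=testing
Pin-Priority: 990

Package: *
Pin: release a=stable
Pin-Priority: -10
EOF
}

# all_held: read `dpkg --get-selections` output ("name<TABs>state") on stdin and
# exit 0 only if at least one package is listed and all of them are in "hold".
all_held() {
    awk 'NF { n++; if ($2 != "hold") bad++ } END { exit (n == 0 || bad > 0) }'
}

# Usage (the first line needs root):
# write_apt_prefs /etc/apt/preferences
# dpkg --get-selections mpich2 libmpich2-dev | all_held && echo "both packages on hold"
```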

Download the nwchem source:
cd ~
wget http://www.nwchem-sw.org/images/Nwchem-6.1-2012-Feb-10.tar.gz
tar -xvf Nwchem-6.1-2012-Feb-10.tar.gz
cd nwchem-6.1

Create buildconf.sh in ~/nwchem-6.1 and put the following in it (for a 64-bit system):
export LARGE_FILES=TRUE
export TCGRSH=/usr/bin/ssh
export NWCHEM_TOP=/home/me/nwchem-6.1
export NWCHEM_TARGET=LINUX64
export NWCHEM_MODULES=all
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export MPI_LOC=/usr
export MPI_LIB=$MPI_LOC/lib
export MPI_INCLUDE=$MPI_LOC/include/mpich2
export LIBMPI="-lmpich -lfmpich"
cd $NWCHEM_TOP/src
make clean
make nwchem_config
make FC=gfortran

Build
Start the build
sh buildconf.sh
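The valgrind logs further down show the finished binary at /home/me/nwchem-6.1/bin/LINUX64/nwchem, i.e. $NWCHEM_TOP/bin/$NWCHEM_TARGET/nwchem. Below is a small sketch (the function `nwchem_binary` is hypothetical) that keeps a build log and then confirms an executable was actually produced. Because buildconf.sh runs in its own shell, its exports don't carry over, so the two variables are set again here:

```shell
# nwchem_binary: print the path of the freshly built executable, or fail if the
# build did not produce one. Assumes NWCHEM_TOP and NWCHEM_TARGET are set.
nwchem_binary() {
    bin="$NWCHEM_TOP/bin/$NWCHEM_TARGET/nwchem"
    if [ -x "$bin" ]; then
        echo "$bin"
    else
        echo "no executable at $bin" >&2
        return 1
    fi
}

# Usage: keep a build log, then check the result.
# sh buildconf.sh 2>&1 | tee build.log
# NWCHEM_TOP=/home/me/nwchem-6.1; NWCHEM_TARGET=LINUX64
# nwchem_binary
```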

Building takes about half an hour. Everything builds fine. However, running -- with or without mpdrun -- causes the error below.

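It doesn't matter how much memory I allocate. The error seems to have something to do with "Invalid write of size 8", which, as I understand it, means the program writes 8 bytes (e.g. one double-precision number) to memory it doesn't own; in the detailed log, the first such write lands 0 bytes after the end of a 42,008,576-byte block allocated in pnga_matmul. But then I'm not an expert.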
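The addresses in the "way too much detail" valgrind log can be checked with plain shell arithmetic. This is a worked example using only numbers copied from that log; the helper names `block_start` and `bytes_past_end` are made up for illustration:

```shell
# Numbers copied from the valgrind log: the first invalid write is
# "0 bytes after a block of size 42,008,576 alloc'd" at address 0x19975040.
block_size=42008576
first_bad=0x19975040

# block_start: start address of the malloc'd block (end of block minus its size).
block_start() {
    printf '0x%x\n' $(( $first_bad - $block_size ))
}

# bytes_past_end ADDR: how far past the end of the block a write landed.
bytes_past_end() {
    echo $(( $1 - $first_bad ))
}

echo "block starts at $(block_start)"
echo "block holds $(( $block_size / 8 )) 8-byte doubles"
for addr in 0x19975050 0x19975060 0x19975070 0x19975080 0x19975090 0x199750a0 0x199750b0; do
    echo "$addr is $(bytes_past_end "$addr") bytes past the end"
done
```

On that reading, dgemm_ (called from pnga_matmul) keeps writing doubles beyond the end of a buffer that pnga_matmul itself allocated through kr_malloc, which might explain why changing the memory setting makes no difference.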

Could it have something to do with USE_MPIF4=y?

Without USE_MPIF4, I end up with the stupid_* error messages (stupid_sum, stupid_task, etc.).



Error:
Running e.g. mpdrun -n 2 nwchem nwchem.nw gives:

      Screening Tolerance Information
      -------------------------------
          Density screening/tol_rho: 1.00D-10
          AO Gaussian exp screening on grid/accAOfunc:  14
          CD Gaussian exp screening on grid/accCDfunc:  20
          XC Gaussian exp screening on grid/accXCfunc:  20
          Schwarz screening/accCoul: 1.00D-08

0:Segmentation Violation error, status=: 11
(rank:0 hostname:tantalum pid:19944):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/signaltrap.c:SigSegvHandler():310 cond:0



More detail:
Running just nwchem nwchem.nw gives a bit more detail:

      Screening Tolerance Information
      -------------------------------
          Density screening/tol_rho: 1.00D-10
          AO Gaussian exp screening on grid/accAOfunc:  14
          CD Gaussian exp screening on grid/accCDfunc:  20
          XC Gaussian exp screening on grid/accXCfunc:  20
          Schwarz screening/accCoul: 1.00D-08

0:Segmentation Violation error, status=: 11
(rank:0 hostname:tantalum pid:19676):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/signaltrap.c:SigSegvHandler():310 cond:0
application called MPI_Abort(comm=0x84000001, 11) - process 0
*** glibc detected *** nwchem: corrupted double-linked list: 0x000000010ac34880 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x75ab6)[0x7f597b129ab6]
/lib/x86_64-linux-gnu/libc.so.6(+0x7754c)[0x7f597b12b54c]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x6c)[0x7f597b12e7ec]
/usr/lib/x86_64-linux-gnu/libgfortran.so.3(+0xcc811)[0x7f597bbd5811]
/usr/lib/x86_64-linux-gnu/libgfortran.so.3(+0xdba7f)[0x7f597bbe4a7f]
/usr/lib/x86_64-linux-gnu/libgfortran.so.3(+0xdbbaa)[0x7f597bbe4baa]
/usr/lib/x86_64-linux-gnu/libgfortran.so.3(+0x1ab09)[0x7f597bb23b09]
/lib64/ld-linux-x86-64.so.2(+0xe21c)[0x7f597c42421c]
/lib/x86_64-linux-gnu/libc.so.6(+0x36df2)[0x7f597b0eadf2]
/lib/x86_64-linux-gnu/libc.so.6(+0x36e45)[0x7f597b0eae45]
/usr/lib/libmpich.so.1.2(+0xbedc9)[0x7f597c101dc9]
/usr/lib/libmpich.so.1.2(MPID_Abort+0x6d)[0x7f597c122d0d]
/usr/lib/libmpich.so.1.2(PMPI_Abort+0x2f5)[0x7f597c090805]
nwchem[0x2896591]
nwchem[0x2883883]
/lib/x86_64-linux-gnu/libc.so.6(+0x324f0)[0x7f597b0e64f0]
nwchem[0x29b6043]
nwchem[0x27a04a0]
nwchem[0x27a3955]
nwchem[0x271492b]
nwchem[0x5cf410]
nwchem[0x5b3d18]
nwchem[0x5a9735]
nwchem[0x5a99b6]
nwchem[0x418ee8]
nwchemAborted

And more detail:
valgrind nwchem nwchem.nw



==19910==    by 0x5CF40F: diis_bld12_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)                                          
==19910==    by 0x5B3D17: dft_main0d_ (dft_main0d.F:549)                                                                      
==19910==    by 0x5A9734: nwdft_ (nwdft.F:274)                                                                                
==19910==    by 0x5A99B5: dft_energy_ (nwdft.F:18)                                                                            
==19910==    by 0x418EE7: task_energy_doit_ (task_energy.F:251)                                                              
==19910==    by 0x41A57B: task_energy_ (task_energy.F:95)                                                                    
==19910==    by 0x40DAD2: task_ (task.F:337)                                                                                  
==19910==    by 0x4068F5: MAIN__ (nwchem.F:251)                                                                              
==19910==  Address 0x199750a0 is not stack'd, malloc'd or (recently) free'd                                                  
==19910==                                                                                                                    
==19910== Invalid write of size 8                                                                                            
==19910==    at 0x29B6043: dgemm_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)                                              
==19910==    by 0x27A049F: GAI_DGEMM (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)                                            
==19910==    by 0x27A3954: pnga_matmul (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)                                          
==19910==    by 0x271492A: ga_dgemm_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)                                            
==19910==    by 0x5CF40F: diis_bld12_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)                                          
==19910==    by 0x5B3D17: dft_main0d_ (dft_main0d.F:549)                                                                      
==19910==    by 0x5A9734: nwdft_ (nwdft.F:274)                                                                                
==19910==    by 0x5A99B5: dft_energy_ (nwdft.F:18)                                                                            
==19910==    by 0x418EE7: task_energy_doit_ (task_energy.F:251)                                                              
==19910==    by 0x41A57B: task_energy_ (task_energy.F:95)                                                                    
==19910==    by 0x40DAD2: task_ (task.F:337)                                                                                  
==19910==    by 0x4068F5: MAIN__ (nwchem.F:251)                                                                              
==19910==  Address 0x199750b0 is not stack'd, malloc'd or (recently) free'd                                                  
==19910==                                                                                                                    
0:Segmentation Violation error, status=: 11                                                                                  
(rank:0 hostname:tantalum pid:19910):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/signaltrap.c:SigSegvHandler():310 cond:0
application called MPI_Abort(comm=0x84000001, 11) - process 0                                                                
==19910==                                                                                                                    
==19910== HEAP SUMMARY:                                                                                                      
==19910==     in use at exit: 4,303,284,335 bytes in 695 blocks                                                              
==19910==   total heap usage: 2,132 allocs, 1,437 frees, 4,305,897,103 bytes allocated                                        
==19910==                                                                                                                    
==19910== LEAK SUMMARY:                                                                                          
==19910==    definitely lost: 24 bytes in 1 blocks
==19910==    indirectly lost: 512 bytes in 1 blocks
==19910==      possibly lost: 0 bytes in 0 blocks
==19910==    still reachable: 4,303,283,799 bytes in 693 blocks
==19910==         suppressed: 0 bytes in 0 blocks
==19910== Rerun with --leak-check=full to see details of leaked memory
==19910==
==19910== For counts of detected and suppressed errors, rerun with: -v
==19910== Use --track-origins=yes to see where uninitialised values come from
==19910== ERROR SUMMARY: 662 errors from 9 contexts (suppressed: 4 from 4)


And way too much detail:
valgrind --leak-check=full --track-origins=yes --log-file=valgrind.log nwchem nwchem.nw
==20005== Memcheck, a memory error detector
==20005== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==20005== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
==20005== Command: nwchem nwchem.nw
==20005== Parent PID: 19563
==20005==
==20005== Warning: set address range perms: large range [0x3952b040, 0x13352b110) (undefined)
==20005== Syscall param write(buf) points to uninitialised byte(s)
==20005==    at 0x12803980: __write_nocancel (syscall-template.S:82)
==20005==    by 0x127A8B92: _IO_file_write@@GLIBC_2.2.5 (fileops.c:1276)
==20005==    by 0x127A8809: new_do_write (fileops.c:530)
==20005==    by 0x127A8B34: _IO_do_write@@GLIBC_2.2.5 (fileops.c:503)
==20005==    by 0x127A9347: _IO_file_sync@@GLIBC_2.2.5 (fileops.c:905)
==20005==    by 0x1279DE19: fflush (iofflush.c:43)
==20005==    by 0xA3CF1F: hdbm_file_flush (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x8C84AA: rtdb_seq_put (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x8C55BD: rtdb_put (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x8C48B2: rtdb_put_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x9EAFAC: util_set_rtdb_state_ (util_rtdb_state.F:40)
==20005==    by 0x4067FD: MAIN__ (nwchem.F:222)
==20005==  Address 0x10950022 is not stack'd, malloc'd or (recently) free'd
==20005==  Uninitialised value was created by a stack allocation
==20005==    at 0x8C6130: rtdb_seq_put_info (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==
==20005== Invalid write of size 8
==20005==    at 0x29B6048: dgemm_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x27A049F: GAI_DGEMM (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x27A3954: pnga_matmul (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x271492A: ga_dgemm_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x5CF40F: diis_bld12_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x5B3D17: dft_main0d_ (dft_main0d.F:549)
==20005==    by 0x5A9734: nwdft_ (nwdft.F:274)
==20005==    by 0x5A99B5: dft_energy_ (nwdft.F:18)
==20005==    by 0x418EE7: task_energy_doit_ (task_energy.F:251)
==20005==    by 0x41A57B: task_energy_ (task_energy.F:95)
==20005==    by 0x40DAD2: task_ (task.F:337)
==20005==    by 0x4068F5: MAIN__ (nwchem.F:251)
==20005==  Address 0x19975040 is 0 bytes after a block of size 42,008,576 alloc'd
==20005==    at 0x1155679D: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==20005==    by 0x291F69B: morecore (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x291F793: kr_malloc (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x27A492B: pnga_matmul (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x271492A: ga_dgemm_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x5CF40F: diis_bld12_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x5B3D17: dft_main0d_ (dft_main0d.F:549)
==20005==    by 0x5A9734: nwdft_ (nwdft.F:274)
==20005==    by 0x5A99B5: dft_energy_ (nwdft.F:18)
==20005==    by 0x418EE7: task_energy_doit_ (task_energy.F:251)
==20005==    by 0x41A57B: task_energy_ (task_energy.F:95)
==20005==    by 0x40DAD2: task_ (task.F:337)
==20005==
==20005== Invalid write of size 8
==20005==    at 0x29B604D: dgemm_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x27A049F: GAI_DGEMM (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x27A3954: pnga_matmul (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x271492A: ga_dgemm_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x5CF40F: diis_bld12_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x5B3D17: dft_main0d_ (dft_main0d.F:549)
==20005==    by 0x5A9734: nwdft_ (nwdft.F:274)
==20005==    by 0x5A99B5: dft_energy_ (nwdft.F:18)
==20005==    by 0x418EE7: task_energy_doit_ (task_energy.F:251)
==20005==    by 0x41A57B: task_energy_ (task_energy.F:95)
==20005==    by 0x40DAD2: task_ (task.F:337)
==20005==    by 0x4068F5: MAIN__ (nwchem.F:251)
==20005==  Address 0x19975050 is not stack'd, malloc'd or (recently) free'd
==20005==
==20005== Invalid write of size 8
==20005==    at 0x29B6052: dgemm_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x27A049F: GAI_DGEMM (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x27A3954: pnga_matmul (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x271492A: ga_dgemm_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x5CF40F: diis_bld12_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x5B3D17: dft_main0d_ (dft_main0d.F:549)
==20005==    by 0x5A9734: nwdft_ (nwdft.F:274)
==20005==    by 0x5A99B5: dft_energy_ (nwdft.F:18)
==20005==    by 0x418EE7: task_energy_doit_ (task_energy.F:251)
==20005==    by 0x41A57B: task_energy_ (task_energy.F:95)
==20005==    by 0x40DAD2: task_ (task.F:337)
==20005==    by 0x4068F5: MAIN__ (nwchem.F:251)
==20005==  Address 0x19975060 is not stack'd, malloc'd or (recently) free'd
==20005==
==20005== Invalid write of size 8
==20005==    at 0x29B6057: dgemm_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x27A049F: GAI_DGEMM (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x27A3954: pnga_matmul (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x271492A: ga_dgemm_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x5CF40F: diis_bld12_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x5B3D17: dft_main0d_ (dft_main0d.F:549)
==20005==    by 0x5A9734: nwdft_ (nwdft.F:274)
==20005==    by 0x5A99B5: dft_energy_ (nwdft.F:18)
==20005==    by 0x418EE7: task_energy_doit_ (task_energy.F:251)
==20005==    by 0x41A57B: task_energy_ (task_energy.F:95)
==20005==    by 0x40DAD2: task_ (task.F:337)
==20005==    by 0x4068F5: MAIN__ (nwchem.F:251)
==20005==  Address 0x19975070 is not stack'd, malloc'd or (recently) free'd
==20005==
==20005== Invalid write of size 8
==20005==    at 0x29B6035: dgemm_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x27A049F: GAI_DGEMM (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x27A3954: pnga_matmul (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x271492A: ga_dgemm_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x5CF40F: diis_bld12_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x5B3D17: dft_main0d_ (dft_main0d.F:549)
==20005==    by 0x5A9734: nwdft_ (nwdft.F:274)
==20005==    by 0x5A99B5: dft_energy_ (nwdft.F:18)
==20005==    by 0x418EE7: task_energy_doit_ (task_energy.F:251)
==20005==    by 0x41A57B: task_energy_ (task_energy.F:95)
==20005==    by 0x40DAD2: task_ (task.F:337)
==20005==    by 0x4068F5: MAIN__ (nwchem.F:251)
==20005==  Address 0x19975080 is not stack'd, malloc'd or (recently) free'd
==20005==
==20005== Invalid write of size 8
==20005==    at 0x29B6039: dgemm_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x27A049F: GAI_DGEMM (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x27A3954: pnga_matmul (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x271492A: ga_dgemm_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x5CF40F: diis_bld12_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x5B3D17: dft_main0d_ (dft_main0d.F:549)
==20005==    by 0x5A9734: nwdft_ (nwdft.F:274)
==20005==    by 0x5A99B5: dft_energy_ (nwdft.F:18)
==20005==    by 0x418EE7: task_energy_doit_ (task_energy.F:251)
==20005==    by 0x41A57B: task_energy_ (task_energy.F:95)
==20005==    by 0x40DAD2: task_ (task.F:337)
==20005==    by 0x4068F5: MAIN__ (nwchem.F:251)
==20005==  Address 0x19975090 is not stack'd, malloc'd or (recently) free'd
==20005==
==20005== Invalid write of size 8
==20005==    at 0x29B603E: dgemm_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x27A049F: GAI_DGEMM (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x27A3954: pnga_matmul (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x271492A: ga_dgemm_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x5CF40F: diis_bld12_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x5B3D17: dft_main0d_ (dft_main0d.F:549)
==20005==    by 0x5A9734: nwdft_ (nwdft.F:274)
==20005==    by 0x5A99B5: dft_energy_ (nwdft.F:18)
==20005==    by 0x418EE7: task_energy_doit_ (task_energy.F:251)
==20005==    by 0x41A57B: task_energy_ (task_energy.F:95)
==20005==    by 0x40DAD2: task_ (task.F:337)
==20005==    by 0x4068F5: MAIN__ (nwchem.F:251)
==20005==  Address 0x199750a0 is not stack'd, malloc'd or (recently) free'd
==20005==
==20005== Invalid write of size 8
==20005==    at 0x29B6043: dgemm_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x27A049F: GAI_DGEMM (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x27A3954: pnga_matmul (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x271492A: ga_dgemm_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x5CF40F: diis_bld12_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x5B3D17: dft_main0d_ (dft_main0d.F:549)
==20005==    by 0x5A9734: nwdft_ (nwdft.F:274)
==20005==    by 0x5A99B5: dft_energy_ (nwdft.F:18)
==20005==    by 0x418EE7: task_energy_doit_ (task_energy.F:251)
==20005==    by 0x41A57B: task_energy_ (task_energy.F:95)
==20005==    by 0x40DAD2: task_ (task.F:337)
==20005==    by 0x4068F5: MAIN__ (nwchem.F:251)
==20005==  Address 0x199750b0 is not stack'd, malloc'd or (recently) free'd
==20005==
==20005==
==20005== HEAP SUMMARY:
==20005==     in use at exit: 4,303,284,355 bytes in 697 blocks
==20005==   total heap usage: 2,135 allocs, 1,438 frees, 4,305,900,787 bytes allocated
==20005==
==20005== 536 (24 direct, 512 indirect) bytes in 1 blocks are definitely lost in loss record 662 of 679
==20005==    at 0x1155679D: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==20005==    by 0x11D6E128: ??? (in /usr/lib/x86_64-linux-gnu/libgfortran.so.3.0.0)
==20005==    by 0x11E332F8: ??? (in /usr/lib/x86_64-linux-gnu/libgfortran.so.3.0.0)
==20005==    by 0x11E2C573: ??? (in /usr/lib/x86_64-linux-gnu/libgfortran.so.3.0.0)
==20005==    by 0x11D6BB47: ??? (in /usr/lib/x86_64-linux-gnu/libgfortran.so.3.0.0)
==20005==    by 0x1093CCCF: call_init (dl-init.c:85)
==20005==    by 0x1093CDC6: _dl_init (dl-init.c:134)
==20005==    by 0x1092FB29: ??? (in /lib/x86_64-linux-gnu/ld-2.13.so)
==20005==    by 0x1: ???
==20005==    by 0x7FF00033A: ???
==20005==    by 0x7FF000341: ???
==20005==
==20005== LEAK SUMMARY:
==20005==    definitely lost: 24 bytes in 1 blocks
==20005==    indirectly lost: 512 bytes in 1 blocks
==20005==      possibly lost: 0 bytes in 0 blocks
==20005==    still reachable: 4,303,283,819 bytes in 695 blocks
==20005==         suppressed: 0 bytes in 0 blocks
==20005== Reachable blocks (those to which a pointer was found) are not shown.
==20005== To see them, rerun with: --leak-check=full --show-reachable=yes
==20005==
==20005== For counts of detected and suppressed errors, rerun with: -v
==20005== ERROR SUMMARY: 663 errors from 10 contexts (suppressed: 4 from 4)
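A log like this is easier to digest if the "Invalid write" errors are grouped by the function they occur in. Below is a minimal sketch; the function `summarize_invalid_writes` is hypothetical and assumes valgrind's usual layout, where the line after each error header starts with "at 0x...: NAME":

```shell
# summarize_invalid_writes: read a valgrind log on stdin and count, per function,
# the "Invalid write" errors whose innermost frame ("at 0x...: NAME") is NAME.
summarize_invalid_writes() {
    awk '
        /Invalid write of size/ { want = 1; next }
        want && $2 == "at"      { n[$4]++; want = 0 }
        END { for (f in n) print f, n[f] }
    ' | sort
}

# Usage:
# summarize_invalid_writes < valgrind.log
```

For the excerpt above, every invalid write points at dgemm_.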



For comparison, here's the same valgrind run using nwchem 6.0.
NOTE that this version works just fine and, when run normally, completes without error messages.
valgrind --leak-check=full --track-origins=yes --log-file=valgrind.log nwchem nwchem.nw



==21014== Memcheck, a memory error detector
==21014== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==21014== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
==21014== Command: nwchem nwchem.nw
==21014== Parent PID: 20854
==21014==
==21014== Warning: set address range perms: large range [0x3952b040, 0x13352b110) (undefined)
==21014== Syscall param write(buf) points to uninitialised byte(s)
==21014==    at 0x11673980: __write_nocancel (syscall-template.S:82)
==21014==    by 0x11618B92: _IO_file_write@@GLIBC_2.2.5 (fileops.c:1276)
==21014==    by 0x11618809: new_do_write (fileops.c:530)
==21014==    by 0x11618B34: _IO_do_write@@GLIBC_2.2.5 (fileops.c:503)
==21014==    by 0x11619347: _IO_file_sync@@GLIBC_2.2.5 (fileops.c:905)
==21014==    by 0x1160DE19: fflush (iofflush.c:43)
==21014==    by 0x8A16D7: hdbm_file_flush (in /home/me/nwchem-6.0/bin/LINUX64/nwchem)
==21014==    by 0x83C973: rtdb_seq_put (in /home/me/nwchem-6.0/bin/LINUX64/nwchem)
==21014==    by 0x83ADA6: rtdb_put (in /home/me/nwchem-6.0/bin/LINUX64/nwchem)
==21014==    by 0x83A13F: rtdb_put_ (in /home/me/nwchem-6.0/bin/LINUX64/nwchem)
==21014==    by 0x85C72C: util_set_rtdb_state_ (util_rtdb_state.F:40)
==21014==    by 0x40636B: MAIN__ (nwchem.F:223)
==21014==  Address 0xf7c0022 is not stack'd, malloc'd or (recently) free'd
==21014==  Uninitialised value was created by a stack allocation
==21014==    at 0x83B7D0: rtdb_seq_put_info (in /home/me/nwchem-6.0/bin/LINUX64/nwchem)
==21014==
==21014== Conditional jump or move depends on uninitialised value(s)
==21014==    at 0x8A5E1F: sym_op_class_name_ (sym_op_cname.F:26)
==21014==    by 0x83FB41: sym_op_classify_ (sym_op_clsfy.F:49)
==21014==    by 0x845AE9: sym_movecs_adapt_ (sym_mo_adapt.F:77)
==21014==    by 0x719A4A: scf_movecs_sym_adapt_ (scf_sym_adap.F:70)
==21014==    by 0x731028: scf_vectors_guess_ (scf_vec_guess.F:403)
==21014==    by 0x58D4E6: dft_scf_ (dft_scf.F:526)
==21014==    by 0x58B67B: dft_main0d_ (dft_main0d.F:537)
==21014==    by 0x5818B3: nwdft_ (nwdft.F:309)
==21014==    by 0x581B24: dft_energy_ (nwdft.F:18)
==21014==    by 0x4174D7: task_energy_doit_ (task_energy.F:229)
==21014==    by 0x418AEB: task_energy_ (task_energy.F:74)
==21014==    by 0x40C646: task_ (task.F:301)
==21014==  Uninitialised value was created by a stack allocation
==21014==    at 0x83F9BD: sym_op_classify_ (sym_op_clsfy.F:32)
==21014==
==21014==
==21014== HEAP SUMMARY:
==21014==     in use at exit: 4,254,665,672 bytes in 20 blocks
==21014==   total heap usage: 10,491 allocs, 10,471 frees, 4,275,242,731 bytes allocated
==21014==
==21014== 17 bytes in 2 blocks are definitely lost in loss record 9 of 19
==21014==    at 0x103C679D: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21014==    by 0x11626881: strdup (strdup.c:43)
==21014==    by 0x24D33F7: pbeginf_ (in /home/me/nwchem-6.0/bin/LINUX64/nwchem)
==21014==    by 0x405F53: MAIN__ (nwchem.F:66)
==21014==    by 0x406964: main (nwchem.F:336)
==21014==
==21014== 536 (24 direct, 512 indirect) bytes in 1 blocks are definitely lost in loss record 13 of 19
==21014==    at 0x103C679D: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21014==    by 0x10BDE128: ??? (in /usr/lib/x86_64-linux-gnu/libgfortran.so.3.0.0)
==21014==    by 0x10CA32F8: ??? (in /usr/lib/x86_64-linux-gnu/libgfortran.so.3.0.0)
==21014==    by 0x10C9C573: ??? (in /usr/lib/x86_64-linux-gnu/libgfortran.so.3.0.0)
==21014==    by 0x10BDBB47: ??? (in /usr/lib/x86_64-linux-gnu/libgfortran.so.3.0.0)
==21014==    by 0xF7ACCCF: call_init (dl-init.c:85)
==21014==    by 0xF7ACDC6: _dl_init (dl-init.c:134)
==21014==    by 0xF79FB29: ??? (in /lib/x86_64-linux-gnu/ld-2.13.so)
==21014==    by 0x1: ???
==21014==    by 0x7FF00032A: ???
==21014==    by 0x7FF000331: ???
==21014==
==21014== LEAK SUMMARY:
==21014==    definitely lost: 41 bytes in 3 blocks
==21014==    indirectly lost: 512 bytes in 1 blocks
==21014==      possibly lost: 0 bytes in 0 blocks
==21014==    still reachable: 4,254,665,119 bytes in 16 blocks
==21014==         suppressed: 0 bytes in 0 blocks
==21014== Reachable blocks (those to which a pointer was found) are not shown.
==21014== To see them, rerun with: --leak-check=full --show-reachable=yes
==21014==
==21014== For counts of detected and suppressed errors, rerun with: -v
==21014== ERROR SUMMARY: 178 errors from 4 contexts (suppressed: 4 from 4)
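To compare the two runs at a glance, one can pull the number out of each log's ERROR SUMMARY line. Both runs above wrote to valgrind.log, so the file names in the usage lines are hypothetical (rename the logs first), as is the helper `error_count`:

```shell
# error_count FILE: print N from valgrind's "ERROR SUMMARY: N errors ..." line.
# Commas are allowed in N in case valgrind prints a large count with separators.
error_count() {
    sed -n 's/.*ERROR SUMMARY: \([0-9][0-9,]*\) errors.*/\1/p' "$1"
}

# Usage (hypothetical file names):
# for log in valgrind-6.0.log valgrind-6.1.log; do
#     echo "$log: $(error_count "$log") errors"
# done
```

The raw count alone is not the whole story: the 6.0 log also reports uninitialised-value errors yet runs to completion, while only the 6.1 log contains the Invalid write contexts in dgemm_.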

2 comments:

  1. Have the same problem on CentOS 5.7.

    1. I've managed to compile nwchem 6.1 on ROCKS 5.4.3, which runs CentOS 5.6: http://verahill.blogspot.com.au/2012/03/nwchem-61-with-openmpi-on-rocks.html

      As I'm not a great friend of CentOS, I don't know what may have changed between the two releases.

      It's interesting that the more up-to-date releases are incompatible with 6.1 though...
