
abinit 6.8.1 + gfortran + mpich2 = no compile

Posted: Thu Jul 07, 2011 10:57 pm
by l0rd_hex
Hi abiniters,

I'm having trouble compiling abinit 6.8.1 in parallel, although the serial build compiled with no issues.

I've cleaned the directory and am running configure like so:

Code:

./configure --enable-mpi --with-mpi-prefix=/usr 


I'm getting positive results regarding MPI from the configure log:

checking whether the C compiler supports MPI... yes
checking whether the C++ compiler supports MPI... yes
checking whether the Fortran Compiler supports MPI... yes
checking whether MPI is usable... yes
configure: enabling MPI I/O support
checking whether to build MPI code... yes
checking whether to build MPI I/O code... yes

When I run

Code:

make mj4
or just

Code:

make


it eventually fails with this error:

Making all in 12_hide_mpi
make[3]: Entering directory `/home/XXX/Download/abinit-6.8.1/src/12_hide_mpi'
/usr/bin/mpif90 -DHAVE_CONFIG_H -I. -I../.. -I../../src/incs -I../../src/incs -ffree-form -J/home/XXX/Download/abinit-6.8.1/src/mods -O2 -mtune=native -march=native -mfpmath=sse -g -ffree-line-length-none -c -o m_xmpi.o m_xmpi.F90
m_xmpi.F90:2193.28:

call MPI_Type_hvector(ny,1,stride_x,column_type,plane_type,mpi_err)
1
Error: Type mismatch in argument 'v2' at (1); passed INTEGER(4) to INTEGER(8)
m_xmpi.F90:2195.28:

call MPI_Type_hvector(nz,1,ldy*stride_x,plane_type,vol_type,mpi_err)
1
Error: Type mismatch in argument 'v2' at (1); passed INTEGER(4) to INTEGER(8)
m_xmpi.F90:2197.28:

call MPI_Type_hvector(na,1,ldz*ldy*stride_x,vol_type,new_type,mpi_err)
1
Error: Type mismatch in argument 'v2' at (1); passed INTEGER(4) to INTEGER(8)
m_xmpi.F90:2113.28:

call MPI_Type_hvector(ny,1,stride_x,column_type,plane_type,mpi_err)
1
Error: Type mismatch in argument 'v2' at (1); passed INTEGER(4) to INTEGER(8)
m_xmpi.F90:2115.28:

call MPI_Type_hvector(nz,1,ldy*stride_x,plane_type,new_type,mpi_err)
1
Error: Type mismatch in argument 'v2' at (1); passed INTEGER(4) to INTEGER(8)
m_xmpi.F90:2036.28:

call MPI_Type_hvector(ny,1,stride_x,column_type,new_type,mpi_err)
1
Error: Type mismatch in argument 'v2' at (1); passed INTEGER(4) to INTEGER(8)
make[3]: *** [m_xmpi.o] Error 1
make[3]: Leaving directory `/home/XXX/Download/abinit-6.8.1/src/12_hide_mpi'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/home/XXX/Download/abinit-6.8.1/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/XXX/Download/abinit-6.8.1'
make: *** [all] Error 2

I'm not sure what the problem is. I wondered whether it might be a 32-bit vs 64-bit problem (just because of the integer size mismatch).

Has anyone seen this or can offer some advice?

The machine is an Intel Xeon 64-bit install of Fedora 10 (yes, I know it's out of date but I'm hoping that's not the issue) with 16 GB of RAM.

Code:

uname -a = Linux XXX 2.6.27.41-170.2.117.fc10.x86_64 #1 SMP Thu Dec 10 10:36:29 EST 2009 x86_64 x86_64 x86_64 GNU/Linux


EDIT: I forgot to mention that I also tried adding --enable-64bit-flags to the configure options; no dice.

Thanks for taking the time to read,

John

Re: abinit 6.8.1 + gfortran + mpich2 = no compile

Posted: Thu Jul 07, 2011 11:53 pm
by l0rd_hex
Hmm, I've got it compiled with --with-mpi-level=1; this should work for now.
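
For anyone hitting the same error, the full configure line would presumably look like the sketch below. It only adds the flag to the options from the first post; the CONFIGURE_OPTS variable and the directory guard are illustrative additions, not part of the ABINIT build system.

```shell
# Options from the first post plus the workaround flag.
# CONFIGURE_OPTS is a hypothetical helper variable, used here only for readability.
CONFIGURE_OPTS="--enable-mpi --with-mpi-prefix=/usr --with-mpi-level=1"

# Only run inside an unpacked (and cleaned) ABINIT source tree.
if [ -x ./configure ]; then
    ./configure $CONFIGURE_OPTS && make
else
    echo "Run from the ABINIT source directory: ./configure $CONFIGURE_OPTS"
fi
```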


Thanks,
John

Re: abinit 6.8.1 + gfortran + mpich2 = no compile

Posted: Fri Jul 08, 2011 10:32 pm
by jbeuken
Hi John,

I just encountered the same problem (not yet understood) with the development version 6.9.1 of abinit, mpich2 1.4.0 and gcc 4.6.1 :(

It still works with version 6.9.1, mpich2 1.3.1 and gcc 4.5.3!

What versions of mpich2 and gcc are you using?

regards

jmb

Re: abinit 6.8.1 + gfortran + mpich2 = no compile

Posted: Fri Jul 08, 2011 11:13 pm
by jbeuken
It compiles with abinit 6.9.1, mpich2 1.3.2p1 and gcc 4.6.1 8-)

So the problem comes from something in mpich2 1.4.0...

More to follow...

jmb

Re: abinit 6.8.1 + gfortran + mpich2 = no compile

Posted: Sun Jul 10, 2011 9:10 am
by jbeuken
Hi,

The explanation:

mpich2-1.4 explicitly defines MPI_Type_hvector in its Fortran 90 module
as follows:

SUBROUTINE MPI_TYPE_HVECTOR(v0,v1,v2,v3,v4,ierror)
USE MPI_CONSTANTS,ONLY:MPI_ADDRESS_KIND
INTEGER v0, v1
INTEGER(KIND=MPI_ADDRESS_KIND) v2
INTEGER v3, v4
INTEGER ierror
END SUBROUTINE MPI_TYPE_HVECTOR

In mpich2-1.3.2p1, by contrast, the Fortran 90 module has no explicit prototype
for MPI_Type_hvector.


The f90 "mpi" module (what we get with "use mpi") was updated in 1.4, and it probably does a better job of catching these errors.

That's why it works with "--with-mpi-level=1": that build apparently does not go through the "mpi" module, so the call is not checked against the prototype.

For now, until abinit's developers solve the problem, we can use mpich2 1.3.2p1.
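
To tell which kind of prototype a given MPICH2 source tree carries, something like the following sketch could be used. The function name hvector_is_strict and the two sample files are made up for the illustration; the sample contents are the strict prototype quoted above and a plain-INTEGER variant.

```shell
# Succeed if the file declares the stride argument (v2) of MPI_TYPE_HVECTOR
# with KIND=MPI_ADDRESS_KIND. hvector_is_strict is a hypothetical helper name.
hvector_is_strict() {
    awk '
        /SUBROUTINE MPI_TYPE_HVECTOR/        { inside = 1 }
        inside && /END SUBROUTINE/           { inside = 0 }
        inside && /MPI_ADDRESS_KIND\) *v2/   { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Two small sample files standing in for the real interface file.
tmpdir=$(mktemp -d)
strict_file="$tmpdir/strict.f90"
plain_file="$tmpdir/plain.f90"

cat > "$strict_file" <<'EOF'
SUBROUTINE MPI_TYPE_HVECTOR(v0,v1,v2,v3,v4,ierror)
USE MPI_CONSTANTS,ONLY:MPI_ADDRESS_KIND
INTEGER v0, v1
INTEGER(KIND=MPI_ADDRESS_KIND) v2
INTEGER v3, v4
INTEGER ierror
END SUBROUTINE MPI_TYPE_HVECTOR
EOF

cat > "$plain_file" <<'EOF'
SUBROUTINE MPI_TYPE_HVECTOR(v0,v1,v2,v3,v4,ierror)
INTEGER v0, v1, v2, v3, v4
INTEGER ierror
END SUBROUTINE MPI_TYPE_HVECTOR
EOF

for f in "$strict_file" "$plain_file"; do
    if hvector_is_strict "$f"; then
        echo "$f: stride is INTEGER(KIND=MPI_ADDRESS_KIND), INTEGER(4) callers will be rejected"
    else
        echo "$f: stride is a plain INTEGER (or the routine is not declared)"
    fi
done
```

In an unpacked mpich2 source tree, the file to pass would presumably be src/binding/f90/mpi_base.f90.in.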

jmb

Re: abinit 6.8.1 + gfortran + mpich2 = no compile

Posted: Wed Jul 27, 2011 6:38 pm
by Steven Miller
If you look at this page:

http://www.mpi-forum.org/docs/mpi22-rep ... tm#Node327

It appears that the MPICH2-1.4 implementation is in error. The third parameter (stride) should be INTEGER, not INTEGER(KIND=MPI_ADDRESS_KIND) as it is prototyped in the 1.4 version of MPICH2. This subroutine (MPI_TYPE_HVECTOR) is a legacy MPI-1 function which is deprecated in MPI-2, precisely because of this issue: the third parameter is defined differently for Fortran than for C or C++. Its MPI-2 replacement, MPI_TYPE_CREATE_HVECTOR, is the routine that takes an INTEGER(KIND=MPI_ADDRESS_KIND) stride. I believe that the abinit code correctly uses the function according to the MPI standard.

However, the problem is easy enough to fix. I have a patch that corrects the prototype generation code in mpich2. I'm currently running abinit 6.8.1 with mpich2-1.4 using this modification and so far it seems fine. It patches the mpich2-1.4/src/binding/f90/buildiface script. I'd attach it, but the forum won't let me. :( But here is the text:

Code:

diff -ur mpich2-1.4.1p1/src/binding/f90/buildiface ../mpipatch/mpich2-1.4.1p1/src/binding/f90/buildiface
--- mpich2-1.4.1p1/src/binding/f90/buildiface   2011-08-05 12:59:41.000000000 -0400
+++ ../mpipatch/mpich2-1.4.1p1/src/binding/f90/buildiface   2011-10-11 13:28:20.298986134 -0400
@@ -117,9 +117,13 @@
         'Type_hindexed-3' => 'int[]',
         'Type_indexed-2' => 'int[]',
         'Type_indexed-3' => 'int[]',
+        'Type_hvector-3' => 'int',
         'Type_struct-2' => 'int[]',
         'Type_struct-3' => 'int[]',
         'Type_struct-4' => 'MPI_Datatype[]',
+        'Type_extent-2' => 'int',
+        'Type_lb-2' => 'int',
+        'Type_ub-2' => 'int',
         'Waitall-2' => 'MPI_Request[]',
         'Waitall-3' => 'MPI_Status[]',
         'Waitany-2' => 'MPI_Request[]',
diff -ur mpich2-1.4.1p1/src/binding/f90/mpi_base.f90.in ../mpipatch/mpich2-1.4.1p1/src/binding/f90/mpi_base.f90.in
--- mpich2-1.4.1p1/src/binding/f90/mpi_base.f90.in   2011-09-01 14:53:13.000000000 -0400
+++ ../mpipatch/mpich2-1.4.1p1/src/binding/f90/mpi_base.f90.in   2011-10-11 13:28:27.970862078 -0400
@@ -15,9 +15,7 @@
        END SUBROUTINE MPI_COMM_FREE_KEYVAL
 
        SUBROUTINE MPI_TYPE_EXTENT(v0,v1,ierror)
-       USE MPI_CONSTANTS,ONLY:MPI_ADDRESS_KIND
-       INTEGER v0
-       INTEGER(KIND=MPI_ADDRESS_KIND) v1
+       INTEGER v0, v1
        INTEGER ierror
        END SUBROUTINE MPI_TYPE_EXTENT
 
@@ -114,9 +112,7 @@
        END SUBROUTINE MPI_OP_COMMUTATIVE
 
        SUBROUTINE MPI_TYPE_LB(v0,v1,ierror)
-       USE MPI_CONSTANTS,ONLY:MPI_ADDRESS_KIND
-       INTEGER v0
-       INTEGER(KIND=MPI_ADDRESS_KIND) v1
+       INTEGER v0, v1
        INTEGER ierror
        END SUBROUTINE MPI_TYPE_LB
 
@@ -562,9 +558,7 @@
        END SUBROUTINE MPI_TYPE_CREATE_RESIZED
 
        SUBROUTINE MPI_TYPE_UB(v0,v1,ierror)
-       USE MPI_CONSTANTS,ONLY:MPI_ADDRESS_KIND
-       INTEGER v0
-       INTEGER(KIND=MPI_ADDRESS_KIND) v1
+       INTEGER v0, v1
        INTEGER ierror
        END SUBROUTINE MPI_TYPE_UB
 
@@ -822,10 +816,7 @@
        END SUBROUTINE MPI_GET_VERSION
 
        SUBROUTINE MPI_TYPE_HVECTOR(v0,v1,v2,v3,v4,ierror)
-       USE MPI_CONSTANTS,ONLY:MPI_ADDRESS_KIND
-       INTEGER v0, v1
-       INTEGER(KIND=MPI_ADDRESS_KIND) v2
-       INTEGER v3, v4
+       INTEGER v0, v1, v2, v3, v4
        INTEGER ierror
        END SUBROUTINE MPI_TYPE_HVECTOR


I'm going to try to submit this to the mpich2 devs, but in the meantime, this patch seems to work.

UPDATE: (11 OCT 2011)
I updated the patch in the above code block; there were some things missing (see the post below). I also tested it with the newer mpich2-1.4.1p1 release and it seems to work, and it should work with any 1.4.x release. Let's hope they include it in 1.5.x!

Cut and paste the patch into a file (patchfile.txt) and then apply it from within the mpich2-1.4.x/ directory using

Code:

patch -p1 <patchfile.txt
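
The reason the patch has to be applied from inside the source directory is the -p1 option, which drops the first component of each path in the patch header. The sketch below mimics that with a made-up helper, strip_components, which is only an illustration and not part of patch itself.

```shell
# Mimic how `patch -pN` drops the first N leading path components.
# strip_components is a hypothetical helper written for this illustration.
strip_components() {
    _path=$1
    _n=$2
    while [ "$_n" -gt 0 ]; do
        _path=${_path#*/}      # remove everything up to and including the first slash
        _n=$((_n - 1))
    done
    printf '%s\n' "$_path"
}

# Old-file path taken from the patch header above.
target=$(strip_components "mpich2-1.4.1p1/src/binding/f90/buildiface" 1)
echo "patch -p1 will look for: $target"

# With GNU patch, a dry run inside the mpich2 source directory reports
# problems without modifying any file:
#   patch -p1 --dry-run < patchfile.txt
#   patch -p1 < patchfile.txt
```

Since the resulting path starts at src/, the command only finds its files when the current directory is the top of the mpich2 source tree.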


Please let me know if you have any problems with it.

Re: abinit 6.8.1 + gfortran + mpich2 = no compile

Posted: Mon Oct 10, 2011 3:44 pm
by jbeuken
Hi Steven,

Sorry for the long delay in my reply...

Steven Miller wrote:

Code:

--- src/binding/f90/buildiface   2011-03-20 15:37:51.000000000 -0400
+++ ../../mpich2-1.4/src/binding/f90/buildiface   2011-07-25 19:12:52.529963846 -0400
@@ -189,6 +189,7 @@
         'Op_commutative-2' => 'bool',
         'File_set_atomicity-2' => 'bool',
         'File_get_atomicity-2' => 'bool',
+        'Type_hvector-3' => 'int',
       );
 
 # Some routines must be skipped (custom code is provided for them)


I'm going to try to submit this to the mpich2 devs, but in the meantime, this patch seems to work.


I don't know how you managed to compile with this patch, but for me it doesn't work...

With mpich2 1.4.1p1 it doesn't work yet either, because the patch was not included in that release...

But with the svn version, everything works 8-)

We can see in trac that your patch has been integrated, but it was not enough on its own:

http://trac.mcs.anl.gov/projects/mpich2/changeset/8809/mpich2/trunk/src/binding/f90/buildiface#file0

http://trac.mcs.anl.gov/projects/mpich2/log/mpich2/trunk/src/binding/f90/buildiface?rev=9009


We must wait for the 1.5.x version...

regards

jmb

Re: abinit 6.8.1 + gfortran + mpich2 = no compile

Posted: Tue Oct 11, 2011 9:26 pm
by Steven Miller
Hey Jean-Michel,

I forgot to mention in my earlier post: if you use the version of the patch I originally posted, you need to run the patched version of the "src/binding/f90/buildiface" script before running the "./configure" script. It generates some input files for the configure script, which configure uses to generate some of the header files. Ordinarily, this is something that is already done by the developers prior to releasing the source package (I mistakenly thought it was invoked during the configure script). I apologize for the omission.

I went ahead and updated my original post so that all files needed to be patched are patched. With this version of the patch, users do not need to re-run "buildiface" (but it won't hurt either). Also, as you mentioned, the trac website shows an expanded patch file over what I posted. Those additions are meant to address several other functions that were buggy in the mpich2-1.4.x releases (MPI_Type_extent, MPI_Type_lb and MPI_Type_ub), but ABINIT does not currently use these functions and was not affected by those bugs. Only MPI_Type_hvector was causing ABINIT problems, and that was the only one I addressed in the patch I originally posted.

But I went ahead and included those other bug fixes in the updated patch (perhaps it will help other programs besides ABINIT). I also tested it with the current version, mpich2-1.4.1p1, and it seems to compile ABINIT 6.8.1 just fine. It should work with any of the 1.4.x versions. Give it a try and let me know if there are any further issues.

I see, as you mention, they fixed it in SVN, but while we wait for the next major release, users can hopefully continue to use the production 1.4.x releases with this patch.

-Steve