[solved] MPI tag values do not conform to MPI specification

robinson
Posts: 1
Joined: Wed Jul 07, 2010 3:07 pm

[solved] MPI tag values do not conform to MPI specification

Post by robinson » Wed Jul 07, 2010 3:30 pm

Hi developers,

On the Cray XT platform ABINIT 6.0.4 is crashing with the following error:

aborting job:
Fatal error in MPI_Send: Invalid tag, error stack:
MPI_Send(173): MPI_Send(buf=0x2aaab01b2010, count=57723432, MPI_DOUBLE_PRECISION, dest=0, tag=57723432, MPI_COMM_WORLD) failed
MPI_Send(99).: Invalid tag, value is 57723432
aborting job:
Fatal error in MPI_Recv: Invalid tag, error stack:
MPI_Recv(186): MPI_Recv(buf=0x2aaacf4bf010, count=57723432, MPI_DOUBLE_PRECISION, src=4, tag=57723432, MPI_COMM_WORLD, status=0x7fffffdf9af0) failed
MPI_Recv(106): Invalid tag, value is 57723432

The reason for the crash is the MPI tag value of 57723432. The MPI specification only requires implementations to support tag values up to at least 32767; the actual upper bound on a given system is given by the MPI_TAG_UB attribute. The majority of MPICH2 implementations use an upper bound of 2,147,483,647 (2^31 - 1), which could explain why this bug is not commonly observed. Cray uses its own MPI library, which is based on MPICH2 but implements an upper bound of just 16777215 (2^24 - 1). Thus it complains that the tag is invalid and crashes out.
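
The limit on a particular system can be checked at run time rather than assumed. The following is only a minimal sketch, assuming the Fortran `mpi` module is available; it reads the `MPI_TAG_UB` attribute with the standard `MPI_COMM_GET_ATTR` call. The program and variable names are illustrative.

```fortran
program query_tag_ub
  use mpi
  implicit none
  integer :: ierr, me
  integer(kind=MPI_ADDRESS_KIND) :: tag_ub   ! attribute values are address-sized integers
  logical :: found

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, me, ierr)

  ! MPI_TAG_UB is a predefined attribute attached to MPI_COMM_WORLD
  call MPI_COMM_GET_ATTR(MPI_COMM_WORLD, MPI_TAG_UB, tag_ub, found, ierr)

  if (me == 0) then
    if (found) then
      write(*,'(a,i0)') 'Largest valid MPI tag on this system: ', tag_ub
    else
      write(*,'(a)') 'MPI_TAG_UB attribute not set; only tags up to 32767 are guaranteed'
    end if
  end if

  call MPI_FINALIZE(ierr)
end program query_tag_ub
```

Given the upper bound quoted above, the Cray XT library would be expected to report 16777215 here.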

The following is the offending section of code (line 279 onwards in abinit-6.0.4/src/12_hide_mpi/xexch_mpi.F90):
    tag=nt
    if ( recever == me ) then
      call MPI_RECV(vrecv,nt,MPI_DOUBLE_PRECISION,sender,tag,spaceComm,statux,ier)
    end if
    if ( sender == me ) then
      call MPI_SEND(vsend,nt,MPI_DOUBLE_PRECISION,recever,tag,spaceComm,ier)
    end if

(tag=nt occurs 4 times in the code)
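
One possible way to avoid the problem is sketched here. It is an assumption on my part, not a change taken from the ABINIT sources. The sender and receiver ranks are already fixed by the call, so the tag does not need to carry the message length and can be a small constant. The subroutine name `xexch_sketch` and the constant `exch_tag` are hypothetical; the argument names follow the excerpt above.

```fortran
subroutine xexch_sketch(vsend, vrecv, nt, sender, recever, spaceComm, ier)
  use mpi
  implicit none
  integer, intent(in) :: nt, sender, recever, spaceComm
  double precision, intent(in) :: vsend(nt)
  double precision, intent(inout) :: vrecv(nt)
  integer, intent(out) :: ier
  ! Hypothetical fixed tag, well below the 32767 that every MPI library must accept
  integer, parameter :: exch_tag = 1
  integer :: me, statux(MPI_STATUS_SIZE)

  ier = 0
  call MPI_COMM_RANK(spaceComm, me, ier)

  ! The message length nt is passed as the count only, never as the tag
  if ( recever == me ) then
    call MPI_RECV(vrecv, nt, MPI_DOUBLE_PRECISION, sender, exch_tag, spaceComm, statux, ier)
  end if
  if ( sender == me ) then
    call MPI_SEND(vsend, nt, MPI_DOUBLE_PRECISION, recever, exch_tag, spaceComm, ier)
  end if
end subroutine xexch_sketch
```

As in the original excerpt, this assumes sender and recever are different ranks; if several exchanges between the same pair can be in flight at once, distinct small constants would be needed to tell them apart.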

For portable code it's probably best not to use MPI tag values greater than 32767. Thanks for reading; I look forward to hearing your comments.
Best regards,
Tim Robinson

--
Tim Robinson
HPC Application Analyst
Swiss National Supercomputing Service (CSCS)

gonze
Posts: 412
Joined: Fri Aug 14, 2009 8:29 pm

Re: MPI tag values do not conform to MPI specification

Post by gonze » Fri Jul 30, 2010 4:29 pm

robinson wrote: The reason for the crash is the MPI tag value of 57723432. The MPI specification only requires implementations to support tag values up to at least 32767; the actual upper bound on a given system is given by the MPI_TAG_UB attribute. The majority of MPICH2 implementations use an upper bound of 2,147,483,647 (2^31 - 1), which could explain why this bug is not commonly observed. Cray uses its own MPI library, which is based on MPICH2 but implements an upper bound of just 16777215 (2^24 - 1). Thus it complains that the tag is invalid and crashes out.

For portable code it's probably best not to use MPI tag values greater than 32767. Thanks for reading; I look forward to hearing your comments.
Best regards,
Tim Robinson


Thanks for identifying the problem and reporting it!
Moreover, it appears only in sections of the code where the tag should play no role...
This problem has been fixed in ABINIT v6.2.2, to be released within a month or so.
Best regards,
Xavier
