Arguments in MPI_SEND
Posted: Wed Feb 17, 2010 10:06 pm
by jzwanzig
I guess this is a "proposed code modification"... kind of... At any rate, in berryphase_new.F90 there are several calls to MPI_SEND used to send wavefunctions (cg) from one processor to another (this is because in Berry's phase calculations it is necessary to overlap wavefunctions at different k points, and these may not reside on the same processor). All of these calls include "mpi_status" as an argument right before "ierr"; however, the Open MPI manual states that the format for this call is
MPI_SEND(BUF, COUNT, DATATYPE, DEST, TAG, COMM, IERROR)
<type> BUF(*)
INTEGER COUNT, DATATYPE, DEST, TAG, COMM, IERROR
while the format for MPI_RECV is
MPI_RECV(BUF, COUNT, DATATYPE, SOURCE, TAG, COMM, STATUS, IERROR)
<type> BUF(*)
INTEGER COUNT, DATATYPE, SOURCE, TAG, COMM
INTEGER STATUS(MPI_STATUS_SIZE), IERROR
So, several questions: 1) Why does the code even compile, let alone work? And it certainly does work. 2) With the current argument list, MPI_SEND returns a non-zero ierr on the first call, which I think is caused by the "wrong" extra argument.
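To make it concrete, the pattern looks roughly like this (a schematic only; the variable names below are illustrative, not the exact ones used in berryphase_new.F90):

! Current pattern (schematic): an extra "mpi_status" argument appears before ierr,
! which does not match the MPI_SEND binding quoted above.
call MPI_SEND(cg_send, ncount, MPI_DOUBLE_PRECISION, dest, tag, spaceComm, mpi_status, ierr)

! What the MPI_SEND binding expects instead:
call MPI_SEND(cg_send, ncount, MPI_DOUBLE_PRECISION, dest, tag, spaceComm, ierr)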
Thanks,
Joe
Re: Arguments in MPI_SEND
Posted: Thu Feb 18, 2010 11:52 am
by torrent
Hi Joe,
This issue is known... we regularly modify the MPI_SEND calls to remove this extra "status" argument.
This part of the code is probably not used very often, which explains why these MPI_SEND calls have not been corrected.
May I give you a suggestion?
You should create new xmpi_send and xmpi_receive (generic) routines in 12_hide_mpi, with the correct calls (with or without status).
The goal is to empty 79_seqpar directory...
I sometimes move routines from 79_seqpar to other directories by using generic MPI routines (I recently did that for loper3, respfn, vtorho3...).
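To give an idea of what I mean by a generic routine, the send wrapper could look roughly like this (only a sketch; the exact routine names, real kinds and error handling are up to you):

subroutine xmpi_send_dp(xval, dest, tag, comm, ierr)
  ! Hypothetical double-precision variant behind a generic xmpi_send interface.
  implicit none
#if defined HAVE_MPI
  include 'mpif.h'
#endif
  integer, intent(in)  :: dest, tag, comm
  integer, intent(out) :: ierr
  real(8), intent(in)  :: xval(:)

  ierr = 0
#if defined HAVE_MPI
  ! Note: no "status" argument here, matching the MPI_SEND binding.
  call MPI_SEND(xval, size(xval), MPI_DOUBLE_PRECISION, dest, tag, comm, ierr)
#endif
end subroutine xmpi_send_dp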
Re: Arguments in MPI_SEND
Posted: Thu Feb 18, 2010 3:00 pm
by jzwanzig
Hi Marc,
thanks for the response. I'm happy to do what you suggest, but I don't think I quite understand. You say the goal is "to empty 79_seqpar_mpi"; what does that mean exactly? Is the idea that, if I replace the raw MPI_SEND and MPI_RECV with generic xmpi calls, then I can (should?) move berryphase_new.F90 to a different directory? Why? I guess I don't quite know the logic of the directory naming scheme. Or, by "empty 79_seqpar_mpi", do you mean to empty it of calls to the raw MPI_ routines but leave the files in place?
Let me know--
thanks!
Joe
Re: Arguments in MPI_SEND
Posted: Thu Feb 18, 2010 4:04 pm
by torrent
Joe,
My remark was related to this doc:
http://www.abinit.org/developers/commun ... tification (see point I03).
The idea is to encapsulate as much as possible calls to MPI routines (by using generic routines put in 12_hide_mpi).
As you probably know, only directories with the "_mpi" suffix are compiled with the "HAVE_MPI" directive enabled, and (I guess) the goal is to minimize the number of MPI directories.
So the goal is to eliminate direct calls to MPI routines (in order to be able to switch from one MPI implementation to another).
Hope this helps...
Re: Arguments in MPI_SEND
Posted: Thu Feb 18, 2010 4:46 pm
by jzwanzig
OK, I made generic xsend_mpi and xrecv_mpi in 12_hide_mpi; they only have a couple of variants (so far) because that's all that's needed at the moment. There do not seem to be many MPI_SEND and MPI_RECV calls left in the code, so I don't think I'll need to add too many other variants before I can replace them all with their xsend and xrecv equivalents.
In the new xrecv_mpi, I used the predefined status object MPI_STATUS_IGNORE, which according to the documentation is faster because MPI_RECV then doesn't spend time filling in the status object; from what I can see, whenever we call MPI_RECV we never use the status object anyway. So the new generic routines look pleasantly symmetrical:
xsend_mpi(data,destination,tag,Comm,ierr)
xrecv_mpi(data,source,tag,Comm,ierr)
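For example, the double-precision receive variant looks roughly like this (just a sketch of the idea; the actual kinds and error handling in 12_hide_mpi may differ):

subroutine xrecv_mpi_dp(xval, source, tag, comm, ierr)
  ! One possible double-precision variant behind the generic xrecv_mpi interface.
  implicit none
#if defined HAVE_MPI
  include 'mpif.h'
#endif
  integer, intent(in)    :: source, tag, comm
  integer, intent(out)   :: ierr
  real(8), intent(inout) :: xval(:)

  ierr = 0
#if defined HAVE_MPI
  ! MPI_STATUS_IGNORE: we never read the status object, so MPI can skip filling it in.
  call MPI_RECV(xval, size(xval), MPI_DOUBLE_PRECISION, source, tag, comm, MPI_STATUS_IGNORE, ierr)
#endif
end subroutine xrecv_mpi_dp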
So now berryphase_new.F90 is cleansed of MPI_SEND and MPI_RECV, and uses only x..mpi calls. Should I now move it to a different directory?
thanks,
Joe
Re: Arguments in MPI_SEND
Posted: Thu Feb 18, 2010 5:32 pm
by gmatteo
Hi Joe,
you might also have a look at the xech_mpi routines defined
in 12_hide_mpi in which both MPI_RECV and MPI_SEND are encapsulated.
You might also reuse the cprj_exch and cprj_bcast routines defined in 53_abiutil/cprj_utils
if you need to communicate the cprj matrix elements.
Regards
Matteo
Re: Arguments in MPI_SEND
Posted: Thu Feb 18, 2010 7:48 pm
by torrent
"
So now berryphase_new.F90 is cleansed of MPI_SEND and MPI_RECV, and uses only x..mpi calls. Should I now move it to a different directory?"
I'm not the manager for the 79_seqpar directory... but I would say yes.
One less to deal with
Marc
Re: Arguments in MPI_SEND
Posted: Thu Feb 18, 2010 10:55 pm
by jzwanzig
OK, I'm trying to use cprj_exch (for example) and I know I'm being thick, but I just can't understand the flow (this is the first time I've done any parallel programming). With MPI_SEND and MPI_RECV, the old code does things like:
do iproc = 1, nproc
  if (iproc == me) then
    if (relevant cprj is on my group) then
      get cprj
      MPI_SEND cprj to destinations
    end if
  end if
  if (iproc /= me .and. iproc == destination) then
    MPI_RECV cprj
  end if
end do
But with cprj_exch, am I supposed to do the same thing, or should I call it only once (that is, every processor calls it, and it decides internally whether to act as a sender or a receiver)?
sorry for the ill-posed questions.
thanks-
Joe
Re: Arguments in MPI_SEND
Posted: Fri Feb 19, 2010 11:20 am
by mverstra
As to where to move the routine, just check the subroutines it calls, take the maximum directory number among those (say M), and then find the most appropriate directory (by theme) whose number is > M.
At some point the PAW directories might have to be split - a lot of stuff is ending up there by default, just because it involves PAW (soon everything will!).
Matthieu
Re: Arguments in MPI_SEND
Posted: Sat Feb 20, 2010 1:04 am
by gmatteo
Dear Joe,
cprj_exch is very strict in the sense that it should be called _only_ by the
two nodes involved in the point-to-point communication.
cprj_exch indeed reports an error if a node whose rank differs from both "sender" and "receiver"
enters the routine.
This precaution is needed to avoid possible programming errors that might lead to deadlocks
in which an MPI packet is sent but there's no receiver waiting for it or vice versa.
I wrote cprj_exch for the generation of the KSS file needed in the GW part.
In this case I have to write the distributed cprj on an external file and therefore
each node has to send its own data to the master node that will write the final output.
In outkss I use the following coding pattern:
master=0
receiver=master        ! We are going to collect the cprjs on master

do iproc=0,nproc-1
  sender = -1

  if (me == iproc) then
    sender = me
    ! I'm the sender: extract the cprjs to be transferred and copy them into Cprj_send
    ....
  end if

  if (me == receiver) then
    ! I'm the receiver: allocate space for the MPI packet to be received
    allocate(Cprj_recv)
    ....
  end if

  if (me == sender .or. me == receiver) then
    ! Exchange data between the two nodes.
    call cprj_exch(Cprj_send, Cprj_recv, sender, receiver, spaceComm, ierr)
  end if

  if (me == receiver) then
    ! Write Cprj_recv to file, then release it.
    .....
    deallocate(Cprj_recv)
  end if
end do
You might look at outkss.F90 for more details.
I have to say, however, that I usually try to avoid point-to-point MPI exchanges, as they complicate the implementation, are prone to errors, and make it very difficult to optimize the load distribution.
Matteo