Re: BUG MPI always crashes
Posted: Wed Aug 23, 2017 12:26 pm
by recohen
Again, thanks for all the help! But I still seem to have the same or a similar problem after setting prtden to 1, switching to ONCV norm-conserving pseudopotentials, explicitly setting up a spin-polarized calculation, and removing +U.
ITER STEP NUMBER 1
vtorho : nnsclo_now=2, note that nnsclo,dbl_nnsclo,istep=0 0 1
You should try to get npband*bandpp= 49
For information matrix size is 208468
You should try to get npband*bandpp= 49
For information matrix size is 208468
[0] WARNING: GLOBAL:DEADLOCK:NO_PROGRESS: warning
[0] WARNING: Processes have been blocked on average inside MPI for the last 5:00 minutes:
[0] WARNING: either the application has a load imbalance or a deadlock which is not detected
[0] WARNING: because at least one process polls for message completion instead of blocking
[0] WARNING: inside MPI.
[0] WARNING: [0] no progress observed for over 0:00 minutes, process is currently in MPI call:
[0] WARNING: mpi_alltoallv_(*sendbuf=0x2adeeb6267c0, *sendcounts=0x2ade6d07e8e0, *sdispls=0x2ade6d07e980, sendtype=MPI_DOUBLE_PRECISION, *recvbuf=0x2adeebfbe100, *recvcounts=0x2ade6d07e8c0, *rdispls=0x2ade6d07e920, recvtype=MPI_DOUBLE_PRECISION, comm=0xffffffffc400000e CART_SUB CART_CREATE CREATE COMM_WORLD [0:1], *ierr=0x7ffeb9769374)
[0] WARNING: m_xmpi_mp_xmpi_alltoallv_dp2d_ (/mnt/beegfs/bin/abinit)
[0] WARNING: [1] no progress observed for over 0:00 minutes, process is currently in MPI call:
[0] WARNING: mpi_alltoallv_(*sendbuf=0x2af022e26760, *sendcounts=0x2aefa42f6920, *sdispls=0x2aefa42f69c0, sendtype=MPI_DOUBLE_PRECISION, *recvbuf=0x2af02348d100, *recvcounts=0x2aefa42f6900, *rdispls=0x2aefa42f6960, recvtype=MPI_DOUBLE_PRECISION, comm=0xffffffffc4000007 CART_SUB CART_CREATE CREATE COMM_WORLD [0:1], *ierr=0x7ffe845494f4)
[0] WARNING: m_xmpi_mp_xmpi_alltoallv_dp2d_ (/mnt/beegfs/bin/abinit)
[0] WARNING: [2] last MPI call:
[0] WARNING: mpi_alltoallv_(*sendbuf=0x2b0eac2efae0, *sendcounts=0x2b0e321e2920, *sdispls=0x2b0e321e29c0, sendtype=MPI_DOUBLE_PRECISION, *recvbuf=0x2b0eacc88d00, *recvcounts=0x2b0e321e2900, *rdispls=0x2b0e321e2960, recvtype=MPI_DOUBLE_PRECISION, comm=0xffffffffc4000007 CART_SUB CART_CREATE CREATE COMM_WORLD [2:3], *ierr=0x7ffeecc16274)
[0] WARNING: m_xmpi_mp_xmpi_alltoallv_dp2d_ (/mnt/beegfs/bin/abinit)
[0] WARNING: [3] last MPI call:
[0] WARNING: mpi_alltoallv_(*sendbuf=0x2ac278c82b20, *sendcounts=0x2ac1fec1a920, *sdispls=0x2ac1fec1a9c0, sendtype=MPI_DOUBLE_PRECISION, *recvbuf=0x2ac27961bd40, *recvcounts=0x2ac1fec1a900, *rdispls=0x2ac1fec1a960, recvtype=MPI_DOUBLE_PRECISION, comm=0xffffffffc4000007 CART_SUB CART_CREATE CREATE COMM_WORLD [2:3], *ierr=0x7ffd4990e8f4)
[0] WARNING: m_xmpi_mp_xmpi_alltoallv_dp2d_ (/mnt/beegfs/bin/abinit)
[0] WARNING: [4] no progress observed for over 0:00 minutes, process is currently in MPI call:
[0] WARNING: mpi_alltoallv_(*sendbuf=0x2ab905021300, *sendcounts=0x2ab88aca6900, *sdispls=0x2ab88aca6960, sendtype=MPI_DOUBLE_PRECISION, *recvbuf=0x2ab903fb7a80, *recvcounts=0x2ab88aca6920, *rdispls=0x2ab88aca69c0, recvtype=MPI_DOUBLE_PRECISION, comm=0xffffffffc4000007 CART_SUB CART_CREATE CREATE COMM_WORLD [4:5], *ierr=0x7fff881a2474)
[0] WARNING: m_xmpi_mp_xmpi_alltoallv_dp2d_ (/mnt/beegfs/bin/abinit)
[0] WARNING: [5] no progress observed for over 0:00 minutes, process is currently in MPI call:
[0] WARNING: mpi_alltoallv_(*sendbuf=0x2b4a4b48e340, *sendcounts=0x2b49d04d6900, *sdispls=0x2b49d04d6960, sendtype=MPI_DOUBLE_PRECISION, *recvbuf=0x2b4a4a424ac0, *recvcounts=0x2b49d04d6920, *rdispls=0x2b49d04d69c0, recvtype=MPI_DOUBLE_PRECISION, comm=0xffffffffc4000007 CART_SUB CART_CREATE CREATE COMM_WORLD [4:5], *ierr=0x7ffdbd2992f4)
[0] WARNING: m_xmpi_mp_xmpi_alltoallv_dp2d_ (/mnt/beegfs/bin/abinit)
[0] WARNING: [6] last MPI call:
[0] WARNING: mpi_alltoallv_(*sendbuf=0x2b0ea0556c20, *sendcounts=0x2b0e26416920, *sdispls=0x2b0e264169c0, sendtype=MPI_DOUBLE_PRECISION, *recvbuf=0x2b0ea0eee480, *recvcounts=0x2b0e26416900, *rdispls=0x2b0e26416960, recvtype=MPI_DOUBLE_PRECISION, comm=0xffffffffc4000007 CART_SUB CART_CREATE CREATE COMM_WORLD [6:7], *ierr=0x7ffc260357f4)
[0] WARNING: m_xmpi_mp_xmpi_alltoallv_dp2d_ (/mnt/beegfs/bin/abinit)
[0] WARNING: [7] last MPI call:
[0] WARNING: mpi_alltoallv_(*sendbuf=0x2aed8d9a6b80, *sendcounts=0x2aed135aa920, *sdispls=0x2aed135aa9c0, sendtype=MPI_DOUBLE_PRECISION, *recvbuf=0x2aed8e33e480, *recvcounts=0x2aed135aa900, *rdispls=0x2aed135aa960, recvtype=MPI_DOUBLE_PRECISION, comm=0xffffffffc4000007 CART_SUB CART_CREATE CREATE COMM_WORLD [6:7], *ierr=0x7fff6b3d0974)
[0] WARNING: m_xmpi_mp_xmpi_alltoallv_dp2d_ (/mnt/beegfs/bin/abinit)
[0] WARNING: [8] last MPI call:
[0] WARNING: mpi_alltoallv_(*sendbuf=0x2b110ee27680, *sendcounts=0x2b1094582920, *sdispls=0x2b10945829c0, sendtype=MPI_DOUBLE_PRECISION, *recvbuf=0x2b110f48d700, *recvcounts=0x2b1094582900, *rdispls=0x2b1094582960, recvtype=MPI_DOUBLE_PRECISION, comm=0xffffffffc4000007 CART_SUB CART_CREATE CREATE COMM_WORLD [8:9], *ierr=0x7ffe3cfecef4)
[0] WARNING: m_xmpi_mp_xmpi_alltoallv_dp2d_ (/mnt/beegfs/bin/abinit)
[0] WARNING: [9] last MPI call:
[0] WARNING: mpi_alltoallv_(*sendbuf=0x2b780008c620, *sendcounts=0x2b7785eb2920, *sdispls=0x2b7785eb29c0, sendtype=MPI_DOUBLE_PRECISION, *recvbuf=0x2b7800a23700, *recvcounts=0x2b7785eb2900, *rdispls=0x2b7785eb2960, recvtype=MPI_DOUBLE_PRECISION, comm=0xffffffffc4000007 CART_SUB CART_CREATE CREATE COMM_WORLD [8:9], *ierr=0x7ffd00d81374)
[0] WARNING: m_xmpi_mp_xmpi_alltoallv_dp2d_ (/mnt/beegfs/bin/abinit)
[0] WARNING: [10] last MPI call:
[0] WARNING: mpi_comm_size_(comm=0xffffffffc4000005 CART_SUB CART_CREATE CREATE COMM_WORLD [10], *size=0x2b8336322f6c, *ierr=0x2b8336324634)
[0] WARNING: [11] last MPI call:
[0] WARNING: mpi_comm_size_(comm=0xffffffffc4000005 CART_SUB CART_CREATE CREATE COMM_WORLD [11], *size=0x2b1af06f6f6c, *ierr=0x2b1af06f8634)
[0] WARNING: [12] last MPI call:
[0] WARNING: mpi_alltoallv_(*sendbuf=0x2ba6f19a7420, *sendcounts=0x2ba677ba6960, *sdispls=0x2ba677ba6900, sendtype=MPI_DOUBLE_PRECISION, *recvbuf=0x2ba6f3c0d140, *recvcounts=0x2ba677ba69c0, *rdispls=0x2ba677ba6920, recvtype=MPI_DOUBLE_PRECISION, comm=0xffffffffc4000007 CART_SUB CART_CREATE CREATE COMM_WORLD [12:13], *ierr=0x7ffe70f15574)
[0] WARNING: m_xmpi_mp_xmpi_alltoallv_dp2d_ (/mnt/beegfs/bin/abinit)
[0] WARNING: [13] last MPI call:
[0] WARNING: mpi_alltoallv_(*sendbuf=0x2ae403626420, *sendcounts=0x2ae3894c2960, *sdispls=0x2ae3894c2900, sendtype=MPI_DOUBLE_PRECISION, *recvbuf=0x2ae40588c140, *recvcounts=0x2ae3894c29c0, *rdispls=0x2ae3894c2920, recvtype=MPI_DOUBLE_PRECISION, comm=0xffffffffc4000007 CART_SUB CART_CREATE CREATE COMM_WORLD [12:13], *ierr=0x7ffe27d8f374)
[0] WARNING: m_xmpi_mp_xmpi_alltoallv_dp2d_ (/mnt/beegfs/bin/abinit)
[0] WARNING: [14] last MPI call:
[0] WARNING: mpi_alltoallv_(*sendbuf=0x2b7766e27240, *sendcounts=0x2b76f01ca960, *sdispls=0x2b76f01ca900, sendtype=MPI_DOUBLE_PRECISION, *recvbuf=0x2b76f2cb7380, *recvcounts=0x2b76f01ca9c0, *rdispls=0x2b76f01ca920, recvtype=MPI_DOUBLE_PRECISION, comm=0xffffffffc4000007 CART_SUB CART_CREATE CREATE COMM_WORLD [14:15], *ierr=0x7ffe99c55ef4)
[0] WARNING: m_xmpi_mp_xmpi_alltoallv_dp2d_ (/mnt/beegfs/bin/abinit)
[0] WARNING: [15] last MPI call:
[0] WARNING: mpi_alltoallv_(*sendbuf=0x2b5162e27240, *sendcounts=0x2b50e8d2e960, *sdispls=0x2b50e8d2e900, sendtype=MPI_DOUBLE_PRECISION, *recvbuf=0x2b50eb81b380, *recvcounts=0x2b50e8d2e9c0, *rdispls=0x2b50e8d2e920, recvtype=MPI_DOUBLE_PRECISION, comm=0xffffffffc4000007 CART_SUB CART_CREATE CREATE COMM_WORLD [14:15], *ierr=0x7fff1c49f774)
[0] WARNING: m_xmpi_mp_xmpi_alltoallv_dp2d_ (/mnt/beegfs/bin/abinit)
[0] WARNING: [16] last MPI call:
[0] WARNING: mpi_alltoallv_(*sendbuf=0x2b8b86e27240, *sendcounts=0x2b8b0c21e960, *sdispls=0x2b8b0c21e900, sendtype=MPI_DOUBLE_PRECISION, *recvbuf=0x2b8b0f454700, *recvcounts=0x2b8b0c21e9c0, *rdispls=0x2b8b0c21e920, recvtype=MPI_DOUBLE_PRECISION, comm=0xffffffffc4000007 CART_SUB CART_CREATE CREATE COMM_WORLD [16:17], *ierr=0x7ffdad5caa74)
[0] WARNING: m_xmpi_mp_xmpi_alltoallv_dp2d_ (/mnt/beegfs/bin/abinit)
[0] WARNING: [17] last MPI call:
[0] WARNING: mpi_alltoallv_(*sendbuf=0x2abb419a6280, *sendcounts=0x2abac77e6960, *sdispls=0x2abac77e6900, sendtype=MPI_DOUBLE_PRECISION, *recvbuf=0x2abb43c0b740, *recvcounts=0x2abac77e69c0, *rdispls=0x2abac77e6920, recvtype=MPI_DOUBLE_PRECISION, comm=0xffffffffc4000007 CART_SUB CART_CREATE CREATE COMM_WORLD [16:17], *ierr=0x7ffc23009174)
[0] WARNING: m_xmpi_mp_xmpi_alltoallv_dp2d_ (/mnt/beegfs/bin/abinit)
[0] WARNING: [18] last MPI call:
[0] WARNING: mpi_alltoallv_(*sendbuf=0x2ab4b5da83e0, *sendcounts=0x2ab43bf92960, *sdispls=0x2ab43bf92900, sendtype=MPI_DOUBLE_PRECISION, *recvbuf=0x2ab4414d6100, *recvcounts=0x2ab43bf929c0, *rdispls=0x2ab43bf92920, recvtype=MPI_DOUBLE_PRECISION, comm=0xffffffffc4000007 CART_SUB CART_CREATE CREATE COMM_WORLD [18:19], *ierr=0x7ffe479617f4)
[0] WARNING: m_xmpi_mp_xmpi_alltoallv_dp2d_ (/mnt/beegfs/bin/abinit)
[0] WARNING: [19] last MPI call:
[0] WARNING: mpi_alltoallv_(*sendbuf=0x2b407fa26420, *sendcounts=0x2b40057ba960, *sdispls=0x2b40057ba900, sendtype=MPI_DOUBLE_PRECISION, *recvbuf=0x2b4081c8c140, *recvcounts=0x2b40057ba9c0, *rdispls=0x2b40057ba920, recvtype=MPI_DOUBLE_PRECISION, comm=0xffffffffc4000007 CART_SUB CART_CREATE CREATE COMM_WORLD [18:19], *ierr=0x7ffd5f9625f4)
[0] WARNING: m_xmpi_mp_xmpi_alltoallv_dp2d_ (/mnt/beegfs/bin/abinit)
[0] WARNING: [20] no progress observed for over 0:00 minutes, process is currently in MPI call:
[0] WARNING: mpi_alltoallv_(*sendbuf=0x2adcbee26780, *sendcounts=0x2adc441ce920, *sdispls=0x2adc441ce9c0, sendtype=MPI_DOUBLE_PRECISION, *recvbuf=0x2adc47a669c0, *recvcounts=0x2adc441ce900, *rdispls=0x2adc441ce960, recvtype=MPI_DOUBLE_PRECISION, comm=0xffffffffc4000007 CART_SUB CART_CREATE CREATE COMM_WORLD [20:21], *ierr=0x7fff0a328874)
[0] WARNING: m_xmpi_mp_xmpi_alltoallv_dp2d_ (/mnt/beegfs/bin/abinit)
[0] WARNING: [21] no progress observed for over 0:00 minutes, process is currently in MPI call:
[0] WARNING: mpi_alltoallv_(*sendbuf=0x2b708ee26740, *sendcounts=0x2b701475a920, *sdispls=0x2b701475a9c0, sendtype=MPI_DOUBLE_PRECISION, *recvbuf=0x2b708f48e980, *recvcounts=0x2b701475a900, *rdispls=0x2b701475a960, recvtype=MPI_DOUBLE_PRECISION, comm=0xffffffffc4000007 CART_SUB CART_CREATE CREATE COMM_WORLD [20:21], *ierr=0x7ffec154bd74)
[0] WARNING: m_xmpi_mp_xmpi_alltoallv_dp2d_ (/mnt/beegfs/bin/abinit)
[0] WARNING: [22] last MPI call:
[0] WARNING: mpi_alltoallv_(*sendbuf=0x2b6e46e27680, *sendcounts=0x2b6dccb5a960, *sdispls=0x2b6dccb5a900, sendtype=MPI_DOUBLE_PRECISION, *recvbuf=0x2b6e4748d840, *recvcounts=0x2b6dccb5a9c0, *rdispls=0x2b6dccb5a920, recvtype=MPI_DOUBLE_PRECISION, comm=0xffffffffc4000007 CART_SUB CART_CREATE CREATE COMM_WORLD [22:23], *ierr=0x7ffef3f170f4)
[0] WARNING: m_xmpi_mp_xmpi_alltoallv_dp2d_ (/mnt/beegfs/bin/abinit)
[0] WARNING: [23] last MPI call:
[0] WARNING: mpi_alltoallv_(*sendbuf=0x2ba07c08c620, *sendcounts=0x2ba001ec6960, *sdispls=0x2ba001ec6900, sendtype=MPI_DOUBLE_PRECISION, *recvbuf=0x2ba07c6f2840, *recvcounts=0x2ba001ec69c0, *rdispls=0x2ba001ec6920, recvtype=MPI_DOUBLE_PRECISION, comm=0xffffffffc4000007 CART_SUB CART_CREATE CREATE COMM_WORLD [22:23], *ierr=0x7ffe1b2644f4)
[0] WARNING: m_xmpi_mp_xmpi_alltoallv_dp2d_ (/mnt/beegfs/bin/abinit)
[0] WARNING: [24] last MPI call:
[0] WARNING: mpi_comm_size_(comm=0xffffffffc4000005 CART_SUB CART_CREATE CREATE COMM_WORLD [24], *size=0x2af9a73c8f6c, *ierr=0x2af9a73ca634)
[0] WARNING: [25] last MPI call:
[0] WARNING: mpi_comm_size_(comm=0xffffffffc4000005 CART_SUB CART_CREATE CREATE COMM_WORLD [25], *size=0x2b1c58b9df6c, *ierr=0x2b1c58b9f634)
[0] WARNING: [26] last MPI call:
[0] WARNING: mpi_wtime_()
[0] WARNING: [27] last MPI call:
[0] WARNING: mpi_wtime_()
[0] WARNING: [28] last MPI call:
[0] WARNING: mpi_alltoallv_(*sendbuf=0x2abcc6e25b20, *sendcounts=0x2abc4c6be920, *sdispls=0x2abc4c6be9c0, se
This is with files:
C20H16I4Fe2_AFM.in
C20H16I4Fe2_AFM.out
gs_i
gs_o
gs_g
/mnt/beegfs/rcohen/PSEUDO/Fe.psp8
/mnt/beegfs/rcohen/PSEUDO/Fe.psp8
/mnt/beegfs/rcohen/PSEUDO/C.psp8
/mnt/beegfs/rcohen/PSEUDO/H.psp8
/mnt/beegfs/rcohen/PSEUDO/I.psp8
and the attached input file
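
In case it helps with triage, here is a rough sketch for condensing a report like the one above into one entry per rank, so it is easier to see which ranks are sitting in mpi_alltoallv_ and which last reported a different call (mpi_comm_size_ or mpi_wtime_ in the excerpt). The regular expressions are only inferred from the lines pasted here, not from any documented log format, and the function name tally_calls is made up for this example.

```python
import re
from collections import Counter

# Rank header lines, e.g. "[0] WARNING: [12] last MPI call:" or
# "[0] WARNING: [4] no progress observed for over 0:00 minutes, ...".
# Pattern inferred from the pasted report (an assumption, not a spec).
RANK_RE = re.compile(r"WARNING: \[(\d+)\] (last MPI call|no progress observed)")

# The MPI routine named on the line after a rank header, e.g. "mpi_alltoallv_(".
CALL_RE = re.compile(r"WARNING: (mpi_\w+)\(")


def tally_calls(lines):
    """Return {rank: mpi_routine} for every rank listed in the report."""
    calls = {}
    rank = None
    for line in lines:
        header = RANK_RE.search(line)
        if header:
            rank = int(header.group(1))
            continue
        call = CALL_RE.search(line)
        if call and rank is not None:
            calls[rank] = call.group(1)
            rank = None  # ignore further lines until the next rank header
    return calls


# Four lines copied from the report above, used as a small self-check.
SAMPLE = [
    "[0] WARNING: [10] last MPI call:",
    "[0] WARNING: mpi_comm_size_(comm=0xffffffffc4000005 CART_SUB CART_CREATE CREATE COMM_WORLD [10], *size=0x2b8336322f6c, *ierr=0x2b8336324634)",
    "[0] WARNING: [26] last MPI call:",
    "[0] WARNING: mpi_wtime_()",
]

calls = tally_calls(SAMPLE)
print(calls)                    # rank -> routine
print(Counter(calls.values()))  # how many ranks per routine
```

To run it on a full report, pass an open file instead of SAMPLE, for example tally_calls(open("deadlock.log")), where deadlock.log is a placeholder name for wherever the job's stderr was saved.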