Parallelization and crashing

Phonons, DFPT, electron-phonon, electric-field response, mechanical response…

Moderators: mverstra, joaocarloscabreu

dfgtyx
Posts: 5
Joined: Mon Nov 08, 2010 2:53 pm


Post by dfgtyx » Thu Feb 17, 2011 7:36 am

All,

Do response-function (RF) calculations run in parallel? I cannot get it to work.

When I set paral_rf and ngroup_rf, the code crashes immediately with a segmentation fault.

When I try any other kind of parallelization, I get an error message like the following:

-P-0000 leave_test : error - 4 processors are not answering. Exiting...

I've tried this input file with ABINIT 6.2.4 and 6.4, and with three toolchains (gfortran/OpenMPI, Intel Fortran/OpenMPI and Intel Fortran/MPICH2). All of them show the same problem.

Regards,
Nathan

Here is my input file:

chkprim 0 # 0: only warn (do not stop) if the cell is not primitive
ndtset 2 # number of data sets (a strange, yet wonderful, ABINIT feature)
vacwidth 5 # a gap (in Bohr) longer than this setting is flagged as vacuum
boxcutmin 2 # minimum ratio of FFT box to cutoff sphere; 2.0 is exact, which matters for response-function calculations
nstep 200 # maximum number of SCF cycles
localrdwf 0 # 0: only one process reads the wavefunction file and broadcasts it to the others

occopt 1 # 1: fixed occupations (insulator); other values add smearing/temperature effects
nsppol 1 # 1: no spin polarization
iprcel 0 # set the preconditioner
ecut 25 # plane-wave kinetic energy cutoff (Ha)
ixc 2 # sets the exchange-correlation functional
iscf 5 # specifies algorithm for scf iteration
iscf2 5 # specifies algorithm for scf iteration
tolvrs 1e-20 # convergence tolerance on the potential residual
nsym 0 # number of symmetries, 0=automatic
tsmear 5e-07 # smearing temperature (Ha); only used by the metallic occopt values, so ignored with occopt 1
ecutsm 2.5e-06 # smears the kinetic energy of plane waves near ecut, which helps when the cell changes during relaxation

irdwfk 0 # controls where starting wave function comes from
irdden 0 # controls where the starting density comes from
prtwf 1 # write the wavefunction file (read back by getwfk2)
prtden 0 # 0: do not write the charge density to disk
prteig 0 # 0: do not write the energy eigenvalues

nshiftk 1 # number of k-point shifts
shiftk 0 0 0 # no shift: Gamma-centered grid


kptopt 1 # use full symmetry to generate the kpoints
ngkpt 12 12 12 # 12x12x12 Monkhorst-Pack grid


#
# The array of lattice constants
#
acell 10.1975585743926 10.1975585743926 10.1975585743926

#
# Atomic data
#
ntypat 1 # number of atom types (one pseudopotential per type)
typat # atom type of each atom
1 # type of atom 1
1 # type of atom 2

natom 2 # number of atoms

xred
0 0 0
0.25 0.25 0.25


znucl
14
14

rprim
0.505 0.5 0
0 0.5 0.5
0.505 0 0.5





kptopt2 2 # use only time-reversal symmetry to generate the kpoints
nqpt2 1 # number of q-points for the response-function calculation
qpt2 0 0 0 # q = Gamma: the elastic response only needs q = 0
rfstrs2 3 # compute both shear and uniaxial strains
rfphon2 1 # atomic-displacement perturbations, used by anaddb to obtain relaxed-ion results
rfatpol2 1 2 # perturb all the atoms
rfdir2 1 1 1 # compute the perturbation in all directions
getwfk2 -1 # start from the wavefunctions of the previous dataset
istwfk *1 # store full wavefunctions at every k-point
#ngroup_rf 5
#paral_rf 1
#nproc_kpt 5
paral_kgb 1
npband 5
npfft 1
npkpt 1
wfoptalg 4
fftalg 401



Post by dfgtyx » Thu Feb 17, 2011 6:28 pm

I recompiled with debugging enabled, reran the job, and got the following message:

Backtrace for this error:
+ /lib64/libc.so.6 [0x3cadc30280]
+ function wfsinp (0x60CE0F)
at line 566 of file wfsinp.F90
+ function inwffil (0x5DA3FE)
at line 652 of file inwffil.F90
+ function respfn (0x47A835)
at line 425 of file respfn.F90
+ function driver (0x449AE6)
at line 649 of file driver.F90
+ function abinit (0x43D5D8)
at line 445 of file abinit.F90
+ /lib64/libc.so.6(__libc_start_main+0xf4) [0x3cadc1d974]
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 2761 on
node compute-0-0.local exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------

This happens when I run with the following parallelization options:

ngroup_rf 5
paral_rf 1


Alternatively, when I run with parallelization over bands, I still get the "4 processors are not answering" error with no additional information.

Regards,
Nathan

david.waroquiers
Posts: 138
Joined: Sat Aug 15, 2009 12:45 am


Post by david.waroquiers » Fri Feb 18, 2011 4:35 pm

Hello,

I think that the option paral_kgb 1 is only available for ground-state calculations (not for RF).

David



Post by dfgtyx » Fri Feb 18, 2011 6:30 pm

Thank you for your reply!

The documentation suggests that band-level parallelization is possible for a response-function calculation:

http://www.abinit.org/documentation/helpfiles/for-v6.6/tutorial/lesson_parallelism.html wrote:
Parallelism over the bands

The parallelism over bands in the ground-state case is controlled by the wfoptalg and nbdblock input variables.
By contrast, for response-function jobs, the band parallelism is automatically activated when needed.


I tried this by setting:

wfoptalg 1
nbdblock 5

And it still crashed with the following message:

accrho3.F90:158 : enter

accrho3.F90:309 : exit

-P-0000  leave_test : synchronization done...
-P-0000  leave_test : error -      4 procesors are not answering. Exiting...
[nathan@compute-0-0 si.elast]$


Thanks in advance.

P.S. I have started looking through the source, but I don't really know the code structure of ABINIT yet, so I wouldn't be much good at debugging. Still, if it turns out that something is unmaintained and needs updating, I'd be happy to give it a try, provided somebody could tell me roughly what needs to be done.

Regards,
Nathan



mverstra
Posts: 655
Joined: Wed Aug 19, 2009 12:01 pm


Post by mverstra » Sat Feb 26, 2011 2:28 pm

Your main input choices (paral_rf, ngroup_rf) were for _perturbation-level_ parallelization. It is not clear that this still works, and it is simpler to use different datasets, so that you can submit them as separate jobs.
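
One way to picture perturbation-level parallelism is as a static split of the list of perturbations among processor groups. The Python sketch below is an editorial illustration, not ABINIT code: `list_perturbations` and `round_robin` are hypothetical helpers. It enumerates the perturbations requested by the RF dataset above (rfphon2 1, rfatpol2 1 2, rfdir2 1 1 1, rfstrs2 3) before any symmetry reduction, and deals them among 5 groups, the ngroup_rf value in the input. ABINIT's real distribution may differ, and whether a given set of perturbations can be moved to separate datasets without losing mixed derivatives should be checked in the documentation.

```python
# Hypothetical helpers for illustration only; not ABINIT's own bookkeeping.

def list_perturbations(natom, rfdir=(1, 1, 1), rfstrs=3):
    """Enumerate perturbations the way the rf* input flags describe them,
    before any symmetry reduction."""
    perts = []
    # rfphon 1 with rfatpol 1 natom: one perturbation per atom and direction
    for atom in range(1, natom + 1):
        for idir, on in enumerate(rfdir, start=1):
            if on:
                perts.append(("phonon", atom, idir))
    # rfstrs: 1 = uniaxial strains, 2 = shear strains, 3 = both
    if rfstrs in (1, 3):
        perts += [("strain", comp) for comp in ("xx", "yy", "zz")]
    if rfstrs in (2, 3):
        perts += [("strain", comp) for comp in ("yz", "xz", "xy")]
    return perts


def round_robin(items, ngroup):
    """Deal items into ngroup lists in turn (a hypothetical static schedule)."""
    groups = [[] for _ in range(ngroup)]
    for i, item in enumerate(items):
        groups[i % ngroup].append(item)
    return groups


perts = list_perturbations(natom=2)
print(len(perts))  # 6 atomic displacements + 6 strains = 12
for g, group in enumerate(round_robin(perts, 5), start=1):
    print(f"group {g}: {group}")
```

With 12 perturbations and 5 groups, the split is uneven (3, 3, 2, 2, 2), so some groups finish before others.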

In RF, the k-point and band parallelizations are automatic, so there is no need for the kgb input variables (they will be ignored).
I have not seen conclusive tests of the band parallelism, but the k-point parallelism works great for phonons and similar perturbations. I recommend using nkpt processors. Remember that nkpt changes from one perturbation to the next, depending on the residual symmetry of the little group; if you impose kptopt 3, nkpt will be the same for all perturbations.
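
To make the nkpt remark concrete, here is a small Python sketch (an editorial illustration, not ABINIT code; `full_grid` and `count_time_reversal` are hypothetical helpers). It counts the k-points of the unshifted 12 12 12 grid from the input above in two cases: the full grid, which is what kptopt 3 generates, and the grid reduced by time reversal alone (k and -k identified), the only symmetry kptopt 2 uses. It ignores the residual crystal symmetry that makes nkpt vary between perturbations under kptopt 1.

```python
from itertools import product

# Hypothetical helpers for illustration; they only model an unshifted grid
# (shiftk 0 0 0) and ignore crystal symmetry.

def full_grid(ngkpt):
    """All points of an unshifted ngkpt grid, as integer triples (i, j, l)
    standing for reduced coordinates (i/n1, j/n2, l/n3)."""
    return list(product(*(range(n) for n in ngkpt)))


def count_time_reversal(ngkpt):
    """Number of classes left after identifying k with -k (time reversal only)."""
    seen = set()
    classes = 0
    for k in full_grid(ngkpt):
        if k in seen:
            continue
        minus_k = tuple((-ki) % n for ki, n in zip(k, ngkpt))
        seen.add(k)
        seen.add(minus_k)
        classes += 1
    return classes


ngkpt = (12, 12, 12)
print("full grid (kptopt 3):", len(full_grid(ngkpt)))                  # 1728
print("time reversal only (kptopt 2):", count_time_reversal(ngkpt))    # 868
```

For this grid, 8 points are their own time-reversal partner (each coordinate 0 or 1/2), so the reduced count is (1728 - 8) / 2 + 8 = 868.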

Matthieu
Matthieu Verstraete
University of Liege, Belgium
