
Abinit 7.2.1 on supercomputer with IBM MPI

Posted: Mon Apr 29, 2013 3:59 pm
by ChrisKue
Hi,

I have a problem when running a parallel job. I compiled Abinit 7.2.1 for a supercomputer with IBM MPI. The compilation completes without errors and serial calculations work fine. However, when I run a parallel job, the SCF loop oscillates and does not converge.
The configure options used for the build are:

enable_64bit_flags="yes"
prefix="${HOME}/abinit-7.2.1"

with_linalg_flavor="none"
with_trio_flavor="none"
with_algo_flavor="none"
with_math_flavor="none"
with_dft_flavor="none"

enable_mpi="yes"
with_mpi_prefix="/opt/ibmhpc/pecurrent/ppe.poe"

enable_mpi_inplace="yes"
enable_mpi_io="yes"
enable_mpi_trace="yes"

enable_maintainer_checks="yes"
enable_optim="aggressive"


The oscillating SCF loop is:

--------------------------------------------------------------------------------

P newkpt: treating    648 bands with npw=     317 for ikpt=   1 by node    0
 
_setup2: Arith. and geom. avg. npw (full set) are     633.000     633.000

================================================================================

=== [ionmov= 2] Broyden-Fletcher-Goldfard-Shanno method (forces)           
================================================================================

--- Iteration: (  1/500) Internal Cycle: (1/1)
--------------------------------------------------------------------------------

---SELF-CONSISTENT-FIELD CONVERGENCE--------------------------------------------

     iter   Etot(hartree)      deltaE(h)  residm     vres2    diffor    maxfor
 ETOT  1  -2449.7491481191    -2.450E+03 1.559E+01 1.106E+06 3.647E-02 3.647E-02
 ETOT  2  -1908.5287109370     5.412E+02 1.234E+00 2.952E+06 1.707E-01 2.072E-01
 ETOT  3  -2650.1401216449    -7.416E+02 5.323E-01 9.688E+04 1.230E-01 8.418E-02
 ETOT  4  -2463.3304750865     1.868E+02 3.300E+00 7.423E+05 1.228E-01 2.069E-01
 ETOT  5  -2271.8742134520     1.915E+02 9.959E-02 8.444E+05 8.292E-02 1.240E-01
 ETOT  6  -2678.3828327727    -4.065E+02 7.962E+00 1.933E+05 3.422E-02 8.979E-02
 ETOT  7  -2485.1232461799     1.933E+02 4.216E-02 2.530E+05 5.313E-02 3.666E-02
 ETOT  8  -2731.7069603238    -2.466E+02 9.307E+00 7.595E+05 1.715E-02 5.381E-02
 ETOT  9  -2506.2111608259     2.255E+02 1.908E+01 9.982E+04 1.001E-02 6.381E-02
 ETOT 10  -2492.1516927538     1.406E+01 7.271E-02 1.880E+05 1.427E-02 4.954E-02
 ETOT 11  -2494.5140822345    -2.362E+00 3.513E-02 1.459E+05 1.254E-02 6.208E-02
 ETOT 12  -2711.5288090874    -2.170E+02 7.229E+01 4.049E+05 1.280E-01 1.901E-01
 ETOT 13  -2406.8075716655     3.047E+02 1.160E-01 4.170E+05 1.203E-01 6.985E-02
 ETOT 14  -2486.3544868005    -7.955E+01 3.622E-02 2.201E+05 2.281E-02 4.704E-02
 ETOT 15  -2589.0658764365    -1.027E+02 1.341E+01 5.282E+04 3.530E-02 1.174E-02
 ETOT 16  -2466.7381282575     1.223E+02 3.151E-01 2.135E+05 1.899E-02 3.074E-02
 ETOT 17  -2538.7829295215    -7.204E+01 6.262E-01 6.252E+03 1.058E-01 1.365E-01
 ETOT 18  -2389.9466843641     1.488E+02 6.580E-01 5.084E+05 1.362E-01 3.491E-04
 ETOT 19  -2633.5975866446    -2.437E+02 4.805E+00 3.510E+05 1.821E-02 1.786E-02
 ETOT 20  -2423.1921585020     2.104E+02 3.253E-01 3.595E+05 1.131E-01 9.525E-02
 ETOT 21  -2699.3263603187    -2.761E+02 8.874E+03 9.927E+05 5.350E-02 4.176E-02
 ETOT 22  -2498.9026705726     2.004E+02 7.663E-02 1.218E+05 1.737E-02 2.439E-02
 ETOT 23  -2627.7500459428    -1.288E+02 1.427E+00 1.339E+05 1.959E-01 2.203E-01
 ETOT 24  -2797.3547274009    -1.696E+02 2.071E+00 7.623E+05 1.035E-01 1.168E-01
 ETOT 25  -2509.6144420952     2.877E+02 3.542E-01 1.489E+05 2.882E-02 8.794E-02
 ETOT 26  -2714.9565423619    -2.053E+02 6.078E+00 4.591E+05 5.315E-02 3.479E-02
 ETOT 27  -2498.0810191057     2.169E+02 1.919E-01 1.472E+05 3.124E-03 3.792E-02
 ETOT 28  -2533.7396792319    -3.566E+01 6.000E+01 1.435E+05 5.261E-02 9.052E-02
 ETOT 29  -2675.5494619370    -1.418E+02 5.417E+00 2.122E+05 2.274E-01 3.179E-01
 ETOT 30  -2180.4540460547     4.951E+02 2.484E-01 1.117E+06 2.340E-01 8.392E-02
 ETOT 31  -2593.8101435970    -4.134E+02 1.674E+00 1.613E+05 5.136E-02 3.255E-02
 ETOT 32  -2502.2350454350     9.158E+01 7.391E-02 1.192E+05 9.526E-03 2.303E-02
 ETOT 33  -2498.5840060840     3.651E+00 4.044E-02 1.193E+05 3.004E-02 5.306E-02
 ETOT 34  -2582.1634020071    -8.358E+01 4.363E+01 4.313E+05 3.008E-02 8.315E-02
 ETOT 35  -2580.7167717936     1.447E+00 7.256E+01 1.896E+06 1.605E-03 8.475E-02
 ETOT 36  -2530.6218087053     5.009E+01 7.893E-01 3.538E+05 2.930E-02 5.545E-02
 ETOT 37  -2697.1660220348    -1.665E+02 1.763E+01 9.859E+05 5.833E-02 2.875E-03
 ETOT 38  -2501.2768200510     1.959E+02 7.982E-02 1.086E+05 9.579E-03 6.704E-03
 ETOT 39  -2502.1108430893    -8.340E-01 2.731E-01 1.258E+05 9.794E-03 1.650E-02
 ETOT 40  -2495.1663677031     6.944E+00 3.098E+00 1.375E+05 2.041E-02 3.691E-02
 ETOT 41  -2631.0129362011    -1.358E+02 9.581E+01 1.963E+05 6.654E-02 1.034E-01
 ETOT 42  -1976.5456166436     6.545E+02 9.070E-02 1.557E+06 2.065E-01 1.030E-01
 ETOT 43  -2704.6430689286    -7.281E+02 8.066E+00 4.090E+05 1.112E-01 8.202E-03
 ETOT 44  -2502.0155982046     2.026E+02 6.762E-02 1.179E+05 9.694E-03 1.790E-02
 ETOT 45  -2498.3745986236     3.641E+00 9.922E-02 1.224E+05 5.027E-02 6.817E-02
 ETOT 46  -2596.2198003514    -9.785E+01 4.832E+03 1.404E+05 8.568E-02 1.538E-01
 ETOT 47  -2529.9420883862     6.628E+01 1.518E+00 5.133E+05 1.042E-01 4.965E-02
 ETOT 48  -2501.1913288599     2.875E+01 8.587E-02 1.237E+05 1.602E-02 3.363E-02
 ETOT 49  -2629.1375917424    -1.279E+02 1.268E+01 2.048E+05 3.177E-02 6.540E-02
 ETOT 50  -2493.0209369762     1.361E+02 3.461E-01 2.576E+05 4.266E-02 1.081E-01
 ETOT 51  -2525.4978566771    -3.248E+01 5.218E-01 9.816E+04 5.606E-02 5.200E-02
 ETOT 52  -2557.3992073116    -3.190E+01 8.456E-01 1.277E+05 7.334E-02 1.253E-01
 ETOT 53  -2474.5550570760     8.284E+01 2.202E-01 1.636E+05 1.671E-02 1.086E-01
 ETOT 54  -2503.5636237806    -2.901E+01 8.063E-01 2.007E+05 3.985E-02 6.878E-02
 ETOT 55  -2417.6565746425     8.591E+01 1.097E-01 3.338E+05 1.515E-01 2.203E-01
 ETOT 56  -2526.1750918231    -1.085E+02 5.601E+00 1.382E+05 2.106E-01 9.742E-03
 ETOT 57  -2583.3185725844    -5.714E+01 1.005E+00 1.802E+05 6.032E-02 7.006E-02
 ETOT 58  -2500.8101037352     8.251E+01 7.988E-02 1.343E+05 3.882E-02 3.124E-02
 ETOT 59  -2554.7673887897    -5.396E+01 4.192E-01 1.371E+05 1.677E-01 1.990E-01
 ETOT 60  -2685.5522391431    -1.308E+02 2.816E+01 1.445E+06 2.091E-01 1.011E-02
 ETOT 61  -2729.5142177261    -4.396E+01 2.963E+00 2.097E+05 5.688E-02 4.677E-02
 ETOT 62  -2499.5728596750     2.299E+02 8.329E-02 1.283E+05 3.830E-02 8.474E-03
 ETOT 63  -2572.0775566567    -7.250E+01 3.732E-01 5.578E+04 1.642E-01 1.727E-01
 ETOT 64  -2457.0736132653     1.150E+02 9.964E+00 2.633E+05 4.935E-02 1.233E-01
 ETOT 65  -2093.1775664573     3.639E+02 2.140E+01 1.302E+06 1.235E-01 2.506E-04
 ETOT 66  -2519.1583333894    -4.260E+02 4.496E-01 1.326E+05 8.154E-02 8.129E-02
 ETOT 67  -2577.6452839894    -5.849E+01 1.838E+00 1.075E+05 1.550E-02 9.679E-02
 ETOT 68  -2690.9997826766    -1.134E+02 1.629E+01 3.020E+05 3.766E-02 5.913E-02
 ETOT 69  -2400.2549439474     2.907E+02 3.429E-01 3.666E+05 1.024E-01 1.615E-01
 ETOT 70  -2538.9481200387    -1.387E+02 3.974E+00 2.714E+05 1.003E-01 2.618E-01
 ETOT 71  -2533.8838077090     5.064E+00 2.968E-01 1.921E+05 1.425E-01 1.193E-01
 ETOT 72  -2590.2168129662    -5.633E+01 1.372E+02 1.628E+05 6.871E-02 1.880E-01
 ETOT 73  -2468.6178470257     1.216E+02 3.046E+00 5.162E+05 7.419E-03 1.806E-01
 ETOT 74  -2475.6439028592    -7.026E+00 1.702E-01 1.640E+05 3.649E-02 1.441E-01
 ETOT 75  -2577.2867979982    -1.016E+02 5.859E+00 1.399E+05 6.007E-02 8.402E-02
 ETOT 76  -2585.6627197849    -8.376E+00 5.468E+01 3.424E+05 1.382E-01 2.222E-01
 ETOT 77  -2703.3889941289    -1.177E+02 8.195E+02 6.145E+05 5.975E-02 1.625E-01
 ETOT 78  -2558.0441533667     1.453E+02 1.652E+01 2.372E+05 1.812E-03 1.643E-01
 ETOT 79  -2499.0339392314     5.901E+01 1.573E-01 1.294E+05 9.810E-02 6.618E-02
 ETOT 80  -2598.5694503684    -9.954E+01 1.467E+00 3.515E+05 3.725E-01 4.387E-01
 ETOT 81  -2437.3020479001     1.613E+02 1.114E+00 6.363E+05 1.242E-01 3.145E-01
 ETOT 82  -2582.0533091529    -1.448E+02 5.872E+06 1.340E+05 1.706E-01 1.440E-01
 ETOT 83  -2501.2851661480     8.077E+01 1.402E-02 1.257E+05 1.042E-01 3.971E-02
 ETOT 84  -2601.3838905585    -1.001E+02 1.322E+01 4.610E+04 1.104E-02 5.075E-02
 ETOT 85  -2561.1153268841     4.027E+01 6.628E+00 4.168E+05 1.176E-01 1.683E-01
 ETOT 86  -2510.8464426708     5.027E+01 7.041E+00 5.774E+05 3.608E-02 1.323E-01
 ETOT 87  -2505.1876331079     5.659E+00 1.820E-01 1.198E+05 5.988E-02 7.239E-02
 ETOT 88  -2515.1294162429    -9.942E+00 7.321E-01 1.569E+05 2.863E-02 1.010E-01

I am not sure whether this is a compilation problem or a problem with IBM MPI communication.

I would be very grateful for any help.
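
A quick way to quantify the oscillation in a log like the one above is to parse the ETOT lines and count how often deltaE changes sign. The following is a minimal sketch, not part of Abinit: the helper names `parse_etot` and `count_sign_changes` are hypothetical, and the regular expression assumes the column layout shown in the log (iter, Etot, deltaE, residm, vres2).

```python
import re

# Matches Abinit SCF lines such as
#   " ETOT  1  -2449.7491481191    -2.450E+03 1.559E+01 1.106E+06 ..."
# and also "ETOT100 ..." where the counter touches the keyword.
ETOT_RE = re.compile(r"^\s*ETOT\s*(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)")


def parse_etot(log_text):
    """Return a list of (iter, etot, deltaE, residm, vres2) tuples."""
    rows = []
    for line in log_text.splitlines():
        m = ETOT_RE.match(line)
        if m:
            values = [float(m.group(i)) for i in range(2, 6)]
            rows.append((int(m.group(1)), *values))
    return rows


def count_sign_changes(rows):
    """Count how often deltaE flips sign between consecutive SCF steps."""
    deltas = [row[2] for row in rows]
    return sum(1 for a, b in zip(deltas, deltas[1:]) if a * b < 0)


# First four SCF steps copied from the log above.
sample = """\
 ETOT  1  -2449.7491481191    -2.450E+03 1.559E+01 1.106E+06 3.647E-02 3.647E-02
 ETOT  2  -1908.5287109370     5.412E+02 1.234E+00 2.952E+06 1.707E-01 2.072E-01
 ETOT  3  -2650.1401216449    -7.416E+02 5.323E-01 9.688E+04 1.230E-01 8.418E-02
 ETOT  4  -2463.3304750865     1.868E+02 3.300E+00 7.423E+05 1.228E-01 2.069E-01
"""
rows = parse_etot(sample)
print(len(rows), count_sign_changes(rows))
```

In a converging run deltaE is mostly negative and shrinks in magnitude, so a high sign-change count together with a deltaE that does not decay is a simple indicator of the behaviour reported here.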

Re: Abinit 7.2.1 on supercomputer with IBM MPI

Posted: Tue Apr 30, 2013 11:49 am
by Alain_Jacques
Hi, I would rather suspect a poor choice of input variables than a compilation problem (you ran the test suite, of course :-) ) or a communication problem (Abinit would crash in that case).

Kind regards,

Alain

Re: Abinit 7.2.1 on supercomputer with IBM MPI

Posted: Thu May 02, 2013 5:54 pm
by ChrisKue
Hi Alain

Thank you for your reply. Yes, the tests for serial calculations completed successfully. Are there also tests for parallel calculations?

I ran the same input file with Abinit compiled with OpenMPI 1.6.4 and it works fine.
The result for the first iteration is (OpenMPI):

=== [ionmov= 2] Broyden-Fletcher-Goldfard-Shanno method (forces)            
================================================================================

--- Iteration: (  1/500) Internal Cycle: (1/1)
--------------------------------------------------------------------------------

---SELF-CONSISTENT-FIELD CONVERGENCE--------------------------------------------

     iter   Etot(hartree)      deltaE(h)  residm     vres2    diffor    maxfor
 ETOT  1  -2355.1567606197    -2.355E+03 6.727E-03 6.797E+04 1.437E-02 1.437E-02
 ETOT  2  -2381.3393905295    -2.618E+01 3.602E-03 7.727E+04 5.641E-02 7.078E-02
 ETOT  3  -2454.6409303548    -7.330E+01 7.864E-03 7.389E+03 3.435E-02 3.643E-02
 ETOT  4  -2459.8097748530    -5.169E+00 1.796E-03 2.369E+03 1.771E-02 1.872E-02
 ETOT  5  -2462.3713315850    -2.562E+00 3.889E-04 2.072E+01 1.247E-02 6.250E-03
 ETOT  6  -2462.3818146804    -1.048E-02 1.517E-03 2.364E+01 3.714E-03 9.963E-03
 ETOT  7  -2462.4015562034    -1.974E-02 1.431E-04 7.397E+00 1.965E-03 7.998E-03
 ETOT  8  -2462.4111187489    -9.563E-03 6.499E-05 3.235E-02 7.276E-04 8.726E-03
 ETOT  9  -2462.4111857066    -6.696E-05 1.280E-05 5.567E-04 4.408E-06 8.721E-03
 ETOT 10  -2462.4111865290    -8.224E-07 6.476E-06 8.933E-06 6.376E-08 8.721E-03
 ETOT 11  -2462.4111865473    -1.837E-08 1.430E-06 1.670E-06 2.513E-06 8.724E-03
 ETOT 12  -2462.4111865488    -1.414E-09 6.980E-07 2.212E-08 3.404E-07 8.723E-03
 ETOT 13  -2462.4111865488    -7.913E-11 2.000E-07 1.969E-10 7.492E-08 8.723E-03
 ETOT 14  -2462.4111865488     4.911E-11 1.724E-07 8.994E-11 1.332E-08 8.723E-03
 ETOT 15  -2462.4111865488    -1.364E-12 7.423E-08 3.297E-12 3.206E-10 8.723E-03
 ETOT 16  -2462.4111865488    -3.502E-11 4.617E-08 3.010E-13 2.298E-09 8.723E-03

 At SCF step   16, forces are converged :
  for the second time, max diff in force=  2.298E-09 < toldff=  1.000E-08

 Cartesian components of stress tensor (hartree/bohr^3)
  sigma(1 1)= -1.95973466E-03  sigma(3 2)=  0.00000000E+00
  sigma(2 2)= -1.95973466E-03  sigma(3 1)=  0.00000000E+00
  sigma(3 3)= -4.02573912E-04  sigma(2 1)=  0.00000000E+00


The result for Abinit compiled with IBM MPI is (no convergence):

================================================================================

=== [ionmov= 2] Broyden-Fletcher-Goldfard-Shanno method (forces)           
================================================================================

--- Iteration: (  1/500) Internal Cycle: (1/1)
--------------------------------------------------------------------------------

---SELF-CONSISTENT-FIELD CONVERGENCE--------------------------------------------

     iter   Etot(hartree)      deltaE(h)  residm     vres2    diffor    maxfor
 ETOT  1  -2355.1567606197    -2.355E+03 6.727E-03 6.797E+04 1.437E-02 1.437E-02
 ETOT  2  -2381.3393905296    -2.618E+01 3.602E-03 7.727E+04 5.641E-02 7.078E-02
 ETOT  3  -2454.6409303549    -7.330E+01 7.864E-03 7.389E+03 3.435E-02 3.643E-02
 ETOT  4  -2459.8097748540    -5.169E+00 1.796E-03 2.369E+03 1.771E-02 1.872E-02
 ETOT  5  -2462.3713315893    -2.562E+00 3.889E-04 2.072E+01 1.247E-02 6.250E-03
 ETOT  6  -2462.3818106664    -1.048E-02 1.517E-03 2.364E+01 3.714E-03 9.963E-03
 ETOT  7  -2462.4013158659    -1.951E-02 1.431E-04 7.395E+00 1.965E-03 7.998E-03
 ETOT  8  -2462.4108882245    -9.572E-03 6.500E-05 3.242E-02 7.275E-04 8.725E-03
 ETOT  9  -2462.3879554104     2.293E-02 1.280E-05 2.668E-03 7.783E-06 8.718E-03
 ETOT 10  -2461.1280574983     1.260E+00 6.485E-06 5.053E+00 7.083E-06 8.725E-03
 ETOT 11  -2461.0464839574     8.157E-02 9.123E-05 2.796E+00 2.346E-04 8.490E-03
 ETOT 12  -2461.8178220150    -7.713E-01 8.622E-05 5.292E-01 9.956E-05 8.590E-03
 ETOT 13  -2462.3511561243    -5.333E-01 1.239E-05 5.065E-03 1.529E-04 8.743E-03
 ETOT 14  -2455.9749478611     6.376E+00 2.274E-06 2.615E+01 1.325E-02 2.200E-02
 ETOT 15  -2462.1809233422    -6.206E+00 8.275E-03 2.577E-02 1.411E-02 7.890E-03
 ETOT 16  -2462.8516957640    -6.708E-01 1.964E-04 1.997E+00 1.921E-03 9.812E-03
 ETOT 17  -2462.1760074388     6.757E-01 1.501E-02 4.730E-02 1.842E-03 7.970E-03
 ETOT 18  -2462.3473268886    -1.713E-01 2.828E-03 2.392E-04 8.447E-04 8.814E-03
 ETOT 19  -2461.0931791377     1.254E+00 9.428E-02 4.540E+01 8.995E-03 1.781E-02
 ETOT 20  -2472.6435061673    -1.155E+01 1.991E-02 2.450E+03 4.990E-02 6.771E-02
 ETOT 21  -2462.7669417649     9.877E+00 2.580E-02 8.381E+02 2.847E-02 3.923E-02
 ETOT 22  -2462.0524075595     7.145E-01 5.397E-03 2.001E-01 3.030E-02 8.931E-03
 ETOT 23  -2462.3924868590    -3.401E-01 1.303E-03 5.821E-05 2.143E-04 8.716E-03
 ETOT 24  -2462.2993641003     9.312E-02 7.872E-04 2.405E-02 3.069E-06 8.713E-03
 ETOT 25  -2462.4058375182    -1.065E-01 3.532E-04 2.189E-04 2.472E-05 8.738E-03
 ETOT 26  -2459.9956625583     2.410E+00 2.516E-04 3.019E+00 4.602E-03 1.334E-02
 ETOT 27  -2468.8581686922    -8.863E+00 4.178E-02 7.806E+02 3.037E-02 4.371E-02
 ETOT 28  -2471.3997442570    -2.542E+00 2.837E-02 5.447E+03 2.728E-02 7.099E-02
 ETOT 29  -2457.9015430165     1.350E+01 1.783E-02 5.559E+01 6.007E-02 1.092E-02
 ETOT 30  -2462.2692605733    -4.368E+00 1.910E-03 3.960E-01 3.237E-03 7.682E-03
 ETOT 31  -2462.3258802532    -5.662E-02 1.119E-03 8.455E-03 8.483E-04 8.530E-03
 ETOT 32  -2462.2106094438     1.153E-01 5.512E-04 7.591E-02 1.917E-04 8.721E-03
 ETOT 33  -2462.3868872010    -1.763E-01 2.631E-04 2.203E-04 1.264E-05 8.734E-03
 ETOT 34  -2457.6401701494     4.747E+00 2.766E-04 2.049E+00 5.535E-03 1.427E-02
 ETOT 35  -2462.2468561724    -4.607E+00 4.582E-04 5.691E-02 5.535E-03 8.735E-03
 ETOT 36  -2462.1075837582     1.393E-01 8.415E-05 6.262E-02 8.883E-05 8.646E-03
 ETOT 37  -2462.2058576695    -9.827E-02 3.216E-05 6.787E-02 4.174E-05 8.688E-03
 ETOT 38  -2462.3099543303    -1.041E-01 2.255E-05 6.632E-03 2.209E-05 8.710E-03
 ETOT 39  -2460.4549749883     1.855E+00 8.107E-06 5.564E+00 2.063E-04 8.503E-03
 ETOT 40  -2461.4777306191    -1.023E+00 3.157E-02 1.085E+01 8.590E-03 1.709E-02
 ETOT 41  -2462.3049321220    -8.272E-01 6.202E-03 6.416E-02 6.690E-03 1.040E-02
 ETOT 42  -2462.6974723555    -3.925E-01 2.159E-04 1.783E+00 6.571E-04 9.746E-03
 ETOT 43  -2461.0070948216     1.690E+00 3.950E-03 4.781E+00 4.500E-04 9.296E-03
 ETOT 44  -2462.4042090813    -1.397E+00 8.963E-05 1.068E-04 5.879E-04 8.708E-03
 ETOT 45  -2460.5684745668     1.836E+00 4.587E-06 4.112E+00 7.623E-04 7.946E-03
 ETOT 46  -2461.5964535029    -1.028E+00 1.919E-04 2.022E-01 6.537E-04 8.600E-03
 ETOT 47  -2461.6453809249    -4.893E-02 2.277E-05 1.003E+00 1.372E-06 8.598E-03
 ETOT 48  -2459.6832934065     1.962E+00 4.678E-02 2.195E+01 1.168E-02 2.027E-02
 ETOT 49  -2471.3401673154    -1.166E+01 2.065E-02 1.411E+03 3.667E-02 5.695E-02
 ETOT 50  -2468.1232005018     3.217E+00 3.430E-02 2.969E+03 1.195E-02 6.890E-02
 ETOT 51  -2460.9504897846     7.173E+00 1.175E-02 2.706E+00 5.915E-02 9.746E-03
 ETOT 52  -2462.3960308600    -1.446E+00 1.621E-03 1.942E-02 1.207E-03 8.539E-03
 ETOT 53  -2462.4102231876    -1.419E-02 1.634E-03 7.110E-04 1.967E-04 8.736E-03
 ETOT 54  -2462.3358128266     7.441E-02 7.948E-04 9.298E-03 1.465E-05 8.721E-03
 ETOT 55  -2462.3682582109    -3.245E-02 9.577E-04 5.593E-04 3.195E-06 8.718E-03
 ETOT 56  -2462.4064742819    -3.822E-02 4.440E-04 2.108E-04 4.405E-06 8.714E-03
 ETOT 57  -2461.0647944956     1.342E+00 1.242E-04 3.250E+00 2.480E-04 8.466E-03
 ETOT 58  -2462.8254312558    -1.761E+00 7.693E-05 2.032E+00 1.479E-03 9.945E-03
 ETOT 59  -2462.2966197139     5.288E-01 4.078E-03 1.527E-02 1.013E-03 8.932E-03
 ETOT 60  -2462.8349859633    -5.384E-01 1.600E-05 1.994E+00 7.719E-04 9.704E-03
 ETOT 61  -2471.5455561467    -8.711E+00 2.626E-02 1.465E+03 4.871E-02 5.841E-02
 ETOT 62  -2331.6504926034     1.399E+02 1.470E-01 2.100E+05 2.196E-02 3.645E-02
 ETOT 63  -2446.3107361237    -1.147E+02 2.220E-01 5.333E+02 2.519E-02 1.126E-02
 ETOT 64  -2462.1406142309    -1.583E+01 4.020E-03 1.117E+00 1.497E-03 9.765E-03
 ETOT 65  -2462.4107682737    -2.702E-01 1.394E-03 3.691E-03 1.130E-03 8.635E-03
 ETOT 66  -2462.4111356524    -3.674E-04 1.320E-03 2.239E-04 8.156E-05 8.717E-03
 ETOT 67  -2462.4069423186     4.193E-03 1.210E-03 1.336E-04 4.759E-06 8.722E-03
 ETOT 68  -2462.4053302336     1.612E-03 1.197E-04 1.312E-04 1.328E-06 8.723E-03
 ETOT 69  -2462.1029026369     3.024E-01 4.409E-05 1.284E-01 1.440E-05 8.737E-03
 ETOT 70  -2461.7389079021     3.640E-01 2.176E-05 7.156E-01 5.621E-05 8.793E-03
 ETOT 71  -2461.1642400137     5.747E-01 3.109E-05 2.747E+00 3.472E-04 8.446E-03
 ETOT 72  -2461.8320036520    -6.678E-01 9.973E-05 1.752E+00 8.437E-03 1.688E-02
 ETOT 73  -2460.9428528225     8.892E-01 1.184E-03 3.259E+00 8.508E-03 8.376E-03
 ETOT 74  -2462.9740126347    -2.031E+00 2.702E-05 4.539E+00 3.075E-04 8.683E-03
 ETOT 75  -2460.6729691135     2.301E+00 4.154E-03 4.019E+00 2.805E-04 8.403E-03
 ETOT 76  -2462.7144000681    -2.041E+00 6.349E-02 1.294E+00 1.361E-04 8.539E-03
 ETOT 77  -2464.6952913196    -1.981E+00 1.798E-02 5.997E+01 1.187E-02 2.041E-02
 ETOT 78  -2472.6175618324    -7.922E+00 1.083E-02 7.012E+03 5.906E-02 7.948E-02
 ETOT 79  -2452.7825268339     1.984E+01 3.182E-02 1.633E+02 6.309E-02 1.638E-02
 ETOT 80  -2460.4788273397    -7.696E+00 1.877E-03 1.069E+01 9.180E-03 7.204E-03
 ETOT 81  -2462.3805351743    -1.902E+00 1.263E-03 8.533E-03 1.367E-03 8.570E-03
 ETOT 82  -2462.4105710285    -3.004E-02 1.278E-03 9.371E-04 3.059E-05 8.601E-03
 ETOT 83  -2461.8958925865     5.147E-01 3.327E-04 8.181E-02 3.094E-06 8.598E-03
 ETOT 84  -2462.3777959337    -4.819E-01 1.908E-04 1.949E-03 1.175E-04 8.715E-03
 ETOT 85  -2461.1206802575     1.257E+00 5.584E-05 4.438E+00 3.862E-04 9.101E-03
 ETOT 86  -2459.0397092265     2.081E+00 8.952E-05 2.558E+00 8.140E-03 1.724E-02
 ETOT 87  -2461.3242101350    -2.285E+00 1.553E-04 2.003E+00 8.523E-03 8.718E-03
 ETOT 88  -2462.7256806687    -1.401E+00 2.127E-02 2.401E+00 5.915E-04 9.310E-03
 ETOT 89  -2462.8593788140    -1.337E-01 5.235E-03 2.031E+00 3.452E-04 9.655E-03
 ETOT 90  -2461.7231586564     1.136E+00 1.414E-02 1.182E+00 1.656E-03 7.999E-03
 ETOT 91  -2460.6362343049     1.087E+00 1.122E-04 1.886E-01 4.650E-03 3.349E-03
 ETOT 92  -2461.2338558575    -5.976E-01 3.169E-05 2.219E+00 5.688E-03 9.037E-03
 ETOT 93  -2461.3548812813    -1.210E-01 8.968E-05 3.168E+00 1.670E-03 7.367E-03
 ETOT 94  -2461.1181429693     2.367E-01 9.202E-05 2.496E+00 7.408E-04 8.108E-03
 ETOT 95  -2462.4042398987    -1.286E+00 2.887E-05 2.224E-02 1.862E-04 8.294E-03
 ETOT 96  -2462.4076774112    -3.438E-03 1.430E-07 6.419E-03 2.499E-04 8.544E-03
 ETOT 97  -2460.9653699703     1.442E+00 6.298E-08 2.179E-01 9.903E-03 1.845E-02
 ETOT 98  -2460.3685938131     5.968E-01 3.202E-05 7.009E+00 1.009E-02 8.355E-03
 ETOT 99  -2462.6333679887    -2.265E+00 6.216E-02 2.102E+00 1.778E-03 1.013E-02
 ETOT100  -2461.4726082165     1.161E+00 8.686E-03 1.158E+00 1.556E-03 8.577E-03
 ETOT101  -2462.4082745923    -9.357E-01 1.092E-04 7.111E-04 1.959E-04 8.773E-03
 ETOT102  -2462.4065883827     1.686E-03 1.772E-07 6.315E-04 4.811E-05 8.725E-03
 ETOT103  -2462.3666750247     3.991E-02 2.298E-07 2.568E-03 8.230E-05 8.807E-03
 ETOT104  -2455.1903458430     7.176E+00 1.017E-06 3.767E+01 1.012E-03 7.795E-03
 ETOT105  -2461.8399032955    -6.650E+00 3.739E-03 6.652E-02 2.874E-03 1.067E-02
 ETOT106  -2462.7739116957    -9.340E-01 1.030E-05 8.451E+00 4.484E-03 1.515E-02
 ETOT107  -2469.4274245609    -6.654E+00 2.334E-02 1.331E+04 5.592E-02 7.108E-02
 ETOT108  -2447.7609437576     2.167E+01 2.414E-02 3.205E+02 5.262E-02 1.846E-02
 ETOT109  -2460.8164263606    -1.306E+01 2.320E-03 5.032E+00 1.077E-02 7.691E-03
 ETOT110  -2462.4093151815    -1.593E+00 9.338E-04 2.435E-03 1.059E-03 8.751E-03
 ETOT111  -2462.4062386808     3.077E-03 2.043E-04 5.543E-04 1.598E-05 8.735E-03
 ETOT112  -2462.3969729261     9.266E-03 3.448E-03 8.513E-04 7.166E-06 8.728E-03
 ETOT113  -2462.1509455518     2.460E-01 1.228E-03 1.647E-01 5.971E-05 8.787E-03
 ETOT114  -2462.3776541505    -2.267E-01 1.777E-04 2.638E-03 6.549E-05 8.722E-03
 ETOT115  -2461.4212987627     9.564E-01 2.075E-05 2.288E-01 8.548E-05 8.807E-03
 ETOT116  -2462.2979142484    -8.766E-01 1.587E-03 9.919E-03 2.396E-04 8.568E-03
 ETOT117  -2462.3805288612    -8.261E-02 7.909E-06 2.991E-03 1.373E-04 8.705E-03
 ETOT118  -2461.1019370853     1.279E+00 1.598E-06 4.368E+00 4.629E-04 9.168E-03
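
Comparing the two runs above step by step shows where they part ways. The sketch below is only an illustration: `first_divergence` is a hypothetical helper, the tolerance of 1e-7 Ha is an arbitrary assumption, and the energies are the first eight Etot values copied from the OpenMPI and IBM MPI logs.

```python
def first_divergence(etot_a, etot_b, tol=1e-7):
    """Return the 1-based SCF step at which two energy series first
    differ by more than tol (in Hartree), or None if they never do."""
    for step, (a, b) in enumerate(zip(etot_a, etot_b), start=1):
        if abs(a - b) > tol:
            return step
    return None


# Etot (Hartree) for SCF steps 1 to 8, copied from the two logs above.
openmpi = [
    -2355.1567606197, -2381.3393905295, -2454.6409303548, -2459.8097748530,
    -2462.3713315850, -2462.3818146804, -2462.4015562034, -2462.4111187489,
]
ibm_mpi = [
    -2355.1567606197, -2381.3393905296, -2454.6409303549, -2459.8097748540,
    -2462.3713315893, -2462.3818106664, -2462.4013158659, -2462.4108882245,
]

print(first_divergence(openmpi, ibm_mpi))  # prints 6
```

Both builds agree to about 1e-9 Ha through step 5 and differ by about 4e-6 Ha at step 6, after which the IBM MPI run no longer settles.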


The input file for both calculations was:

########################## Definition of cell #################################
  acell   1.9118357528E+01  1.9118357528E+01  1.9612307652E+01  Bohr
  rprim=  1.0000000000E+00  0.0000000000E+00  0.0000000000E+00
          0.0000000000E+00  1.0000000000E+00  0.0000000000E+00
          0.0000000000E+00  0.0000000000E+00  1.0000000000E+00
########################### Definition of atoms ################################

     ntypat  2
     znucl   8.00000   40.00000
     natom   96

     xred    1.2500000000E-01 -5.2909066017E-17  1.2500000000E-01
             1.2500000000E-01  2.5000000000E-01  3.7500000000E-01
             3.7500000000E-01  2.5000000000E-01  1.2500000000E-01
             3.7500000000E-01 -5.2909066017E-17  3.7500000000E-01
             6.2500000000E-01 -5.2909066017E-17  1.2500000000E-01
             6.2500000000E-01  2.5000000000E-01  3.7500000000E-01
             8.7500000000E-01  2.5000000000E-01  1.2500000000E-01
             8.7500000000E-01 -5.2909066017E-17  3.7500000000E-01
             1.2500000000E-01  5.0000000000E-01  1.2500000000E-01
             1.2500000000E-01  7.5000000000E-01  3.7500000000E-01
             3.7500000000E-01  7.5000000000E-01  1.2500000000E-01
             3.7500000000E-01  5.0000000000E-01  3.7500000000E-01
             1.2500000000E-01 -5.2909066017E-17  6.2500000000E-01
             1.2500000000E-01  2.5000000000E-01  8.7500000000E-01
             3.7500000000E-01  2.5000000000E-01  6.2500000000E-01
             3.7500000000E-01 -5.2909066017E-17  8.7500000000E-01
             6.2500000000E-01  5.0000000000E-01  1.2500000000E-01
             6.2500000000E-01  7.5000000000E-01  3.7500000000E-01
             8.7500000000E-01  7.5000000000E-01  1.2500000000E-01
             8.7500000000E-01  5.0000000000E-01  3.7500000000E-01
             6.2500000000E-01 -5.2909066017E-17  6.2500000000E-01
             6.2500000000E-01  2.5000000000E-01  8.7500000000E-01
             8.7500000000E-01  2.5000000000E-01  6.2500000000E-01
             8.7500000000E-01 -5.2909066017E-17  8.7500000000E-01
             1.2500000000E-01  5.0000000000E-01  6.2500000000E-01
             1.2500000000E-01  7.5000000000E-01  8.7500000000E-01
             3.7500000000E-01  7.5000000000E-01  6.2500000000E-01
             3.7500000000E-01  5.0000000000E-01  8.7500000000E-01
             6.2500000000E-01  5.0000000000E-01  6.2500000000E-01
             6.2500000000E-01  7.5000000000E-01  8.7500000000E-01
             8.7500000000E-01  7.5000000000E-01  6.2500000000E-01
             8.7500000000E-01  5.0000000000E-01  8.7500000000E-01
            -7.2858385991E-17  1.2500000000E-01  4.7496459713E-01
            -7.2858385991E-17  1.2500000000E-01  2.2496459713E-01
            -7.2858385991E-17  3.7500000000E-01  2.7503540287E-01
            -7.2858385991E-17  3.7500000000E-01  2.5035402873E-02
             2.5000000000E-01  1.2500000000E-01  2.7503540287E-01
             2.5000000000E-01  1.2500000000E-01  2.5035402873E-02
             2.5000000000E-01  3.7500000000E-01  4.7496459713E-01
             2.5000000000E-01  3.7500000000E-01  2.2496459713E-01
             5.0000000000E-01  1.2500000000E-01  4.7496459713E-01
             5.0000000000E-01  1.2500000000E-01  2.2496459713E-01
             5.0000000000E-01  3.7500000000E-01  2.7503540287E-01
             5.0000000000E-01  3.7500000000E-01  2.5035402873E-02
             7.5000000000E-01  1.2500000000E-01  2.7503540287E-01
             7.5000000000E-01  1.2500000000E-01  2.5035402873E-02
             7.5000000000E-01  3.7500000000E-01  4.7496459713E-01
             7.5000000000E-01  3.7500000000E-01  2.2496459713E-01
            -7.2858385991E-17  6.2500000000E-01  4.7496459713E-01
            -7.2858385991E-17  6.2500000000E-01  2.2496459713E-01
            -7.2858385991E-17  8.7500000000E-01  2.7503540287E-01
            -7.2858385991E-17  8.7500000000E-01  2.5035402873E-02
             2.5000000000E-01  6.2500000000E-01  2.7503540287E-01
             2.5000000000E-01  6.2500000000E-01  2.5035402873E-02
             2.5000000000E-01  8.7500000000E-01  4.7496459713E-01
             2.5000000000E-01  8.7500000000E-01  2.2496459713E-01
            -7.2858385991E-17  1.2500000000E-01  9.7496459713E-01
            -7.2858385991E-17  1.2500000000E-01  7.2496459713E-01
            -7.2858385991E-17  3.7500000000E-01  7.7503540287E-01
            -7.2858385991E-17  3.7500000000E-01  5.2503540287E-01
             2.5000000000E-01  1.2500000000E-01  7.7503540287E-01
             2.5000000000E-01  1.2500000000E-01  5.2503540287E-01
             2.5000000000E-01  3.7500000000E-01  9.7496459713E-01
             2.5000000000E-01  3.7500000000E-01  7.2496459713E-01
             5.0000000000E-01  6.2500000000E-01  4.7496459713E-01
             5.0000000000E-01  6.2500000000E-01  2.2496459713E-01
             5.0000000000E-01  8.7500000000E-01  2.7503540287E-01
             5.0000000000E-01  8.7500000000E-01  2.5035402873E-02
             7.5000000000E-01  6.2500000000E-01  2.7503540287E-01
             7.5000000000E-01  6.2500000000E-01  2.5035402873E-02
             7.5000000000E-01  8.7500000000E-01  4.7496459713E-01
             7.5000000000E-01  8.7500000000E-01  2.2496459713E-01
             5.0000000000E-01  1.2500000000E-01  9.7496459713E-01
             5.0000000000E-01  1.2500000000E-01  7.2496459713E-01
             5.0000000000E-01  3.7500000000E-01  7.7503540287E-01
             5.0000000000E-01  3.7500000000E-01  5.2503540287E-01
             7.5000000000E-01  1.2500000000E-01  7.7503540287E-01
             7.5000000000E-01  1.2500000000E-01  5.2503540287E-01
             7.5000000000E-01  3.7500000000E-01  9.7496459713E-01
             7.5000000000E-01  3.7500000000E-01  7.2496459713E-01
            -7.2858385991E-17  6.2500000000E-01  9.7496459713E-01
            -7.2858385991E-17  6.2500000000E-01  7.2496459713E-01
            -7.2858385991E-17  8.7500000000E-01  7.7503540287E-01
            -7.2858385991E-17  8.7500000000E-01  5.2503540287E-01
             2.5000000000E-01  6.2500000000E-01  7.7503540287E-01
             2.5000000000E-01  6.2500000000E-01  5.2503540287E-01
             2.5000000000E-01  8.7500000000E-01  9.7496459713E-01
             2.5000000000E-01  8.7500000000E-01  7.2496459713E-01
             5.0000000000E-01  6.2500000000E-01  9.7496459713E-01
             5.0000000000E-01  6.2500000000E-01  7.2496459713E-01
             5.0000000000E-01  8.7500000000E-01  7.7503540287E-01
             5.0000000000E-01  8.7500000000E-01  5.2503540287E-01
             7.5000000000E-01  6.2500000000E-01  7.7503540287E-01
             7.5000000000E-01  6.2500000000E-01  5.2503540287E-01
             7.5000000000E-01  8.7500000000E-01  9.7496459713E-01
             7.5000000000E-01  8.7500000000E-01  7.2496459713E-01



     typat    2  2  2  2    2  2  2  2    2  2  2  2    2  2  2  2    2  2  2  2    2  2  2  2    2  2  2  2    2  2  2  2 
              1  1  1  1  1  1  1  1    1  1  1  1  1  1  1  1    1  1  1  1  1  1  1  1    1  1  1  1  1  1  1  1 
              1  1  1  1  1  1  1  1    1  1  1  1  1  1  1  1    1  1  1  1  1  1  1  1    1  1  1  1  1  1  1  1

     chkprim  0          # does not check for primitive unit cells



########################## Definition of k-Points ##########################

     kptopt   0          # read k-points directly from the input (nkpt, kpt, wtk)
     nkpt     1          # number of k-points
     nband    648        # number of bands
     
##################### Definition of the planewave basis set #####################

     ecut    10.0        # Kinetic energy cut-off, in Hartree



######################## Exchange-correlation functional ########################

     ixc 7               # LDA Perdew-Wang 92 functional



########################## Optimization of lattice ##########################

     ionmov   2          # optimization of atom positions with the Broyden-Fletcher-Goldfarb-Shanno Minimization-Method
     optcell  2          # optimize cell
     tolmxf   1e-5       # max gradient on ion or stress in cell optimization
     strfact  20         # stopping criterion for cell optimization
     iscf     7          # Self-consistent calculation, using algorithm 7
     ntime    500        # Max. number of ion/cell optimization steps
     ecutsm   0.5        # needed for variable cell
     dilatmx  1.1        # needed for variable cell



######################## definition of the SCF procedure ########################

     toldff   1.0d-8     # SCF stopping criterion
     nstep    250        # Maximal number of SCF cycles
     diemac   5.0        # Dielectric constant



######################## Parallelisation parameters for Abinit ########################

     paral_kgb= 1        # 0: only k-point parallelisation (default)
                         # 1: all parallelisation methods are used
                         # -X: calculates options for parallelisation with up to X processors


     npband     4        # Number of processors at the BAND level; npband has to be a divisor of, or equal to, nband
     npfft      5        # Number of processors at the FFT level
     npspinor   1        # Number of processors at the SPINOR level; can be 1 or 2
     npkpt      1        # Number of processors at the k-point level; npkpt should be a divisor of, or equal to,
                         # the number of k-point/spin components (nkpt*nsppol)
     bandpp     1
     npimage    1


######################## Timing analysis of Abinit ########################

     timopt     2        # short analysis, appropriate for parallel execution
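
The parallelisation parameters in this input can be sanity-checked before submitting a job. The sketch below is a hypothetical helper (`check_kgb_layout` is not an Abinit tool), and the rules it encodes are my reading of the Abinit documentation, so treat them as assumptions: the product of the np* values should equal the number of MPI processes, and nband should be a multiple of npband*bandpp.

```python
def check_kgb_layout(nproc, nband, npkpt=1, npband=1, npfft=1,
                     npspinor=1, npimage=1, bandpp=1):
    """Return a list of problems found in a paral_kgb=1 process layout.

    An empty list means the (assumed) consistency rules are satisfied.
    """
    problems = []

    # Every level needs at least one process; 0 is not a valid count.
    counts = {"npkpt": npkpt, "npband": npband, "npfft": npfft,
              "npspinor": npspinor, "npimage": npimage, "bandpp": bandpp}
    for name, value in counts.items():
        if value < 1:
            problems.append(f"{name} = {value} must be >= 1")
    if problems:
        return problems

    # Assumed rule: the process grid must use exactly nproc processes.
    product = npkpt * npband * npfft * npspinor * npimage
    if product != nproc:
        problems.append(f"np* product is {product}, but nproc is {nproc}")

    # Assumed rule: bands must distribute evenly over npband*bandpp.
    if nband % (npband * bandpp) != 0:
        problems.append(f"nband = {nband} is not a multiple of "
                        f"npband*bandpp = {npband * bandpp}")
    return problems


# Values from the input file above, run on 20 processes.
print(check_kgb_layout(nproc=20, nband=648, npband=4, npfft=5))
```

For the values in this input (npband = 4, npfft = 5, nband = 648, 20 processes) the list is empty, so the layout itself looks consistent under these assumptions.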

I think this must be a problem with IBM MPI.
I noticed two other oddities in calculations with IBM MPI:

(1) If I run a calculation with npband = 0 and, e.g., npfft = 20, nproc = 20, I get the error:

---SELF-CONSISTENT-FIELD CONVERGENCE--------------------------------------------

 getcut: wavevector=  0.0000  0.0000  0.0000  ngfft=  60  60  80
         ecut(hartree)=     10.000   => boxcut(ratio)=   2.20463

 getcut : COMMENT -
  Note that boxcut > 2.2 ; recall that boxcut=Gcut(box)/Gcut(sphere) = 2
  is sufficient for exact treatment of convolution.
  Such a large boxcut is a waste : you could raise ecut
  e.g. ecut=   12.150974 Hartrees makes boxcut=2


 ewald : nr and ng are    3 and   15

 ITER STEP NUMBER     1
 vtorho : nnsclo_now=  2, note that nnsclo,dbl_nnsclo,istep=  0 0  1
 ** On entry to DGEMM parameter number  8 had an illegal value
 ** On entry to DGEMM parameter number  8 had an illegal value
 ** On entry to DGEMM parameter number  8 had an illegal value
 ** On entry to DGEMM parameter number  8 had an illegal value
 ** On entry to DGEMM parameter number  8 had an illegal value
Message id 76 from task 9 to task 17 timed out.
epoch_ready=1 msg_len=8 hdr_len=20 msg_type=19 hndlr_idx=0
Last progress made at time 3 s. Current time 904 s.
ERROR 1 from file /afs/apd/u/hxue/sandbox/barlx2_work/src/ppe/lapi/include/DynamicModule.h line 53
Failed opening module libnuma.so. libnuma.so: cannot open shared object file: No such file or directory
ERROR: 0031-300  Forcing all remote tasks to exit due to exit code 1 in task 9


(2) If I set npband > 0, e.g. npband = 4, npfft = 5 and nproc = 20, I run into the oscillating SCF loop with no convergence.
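
For what it is worth, the two processor layouts in (1) and (2) can be sanity-checked before a job is submitted. Below is a minimal Python sketch, not part of Abinit; the function name check_kgb_grid is made up for illustration. It assumes that with paral_kgb = 1 the product npimage * npkpt * npspinor * npband * npfft should equal the number of MPI tasks and that every count should be at least 1. The divisor rule for npband is taken from the input file comments, and nband = 648 from the log at the top of the thread.

```python
def check_kgb_grid(nproc, npkpt, npband, npfft, npspinor=1, npimage=1, nband=None):
    """Return a list of problems found in a paral_kgb processor layout.

    Assumptions (not checked against the Abinit source): every level needs
    at least one processor, and the product of all levels equals nproc.
    """
    problems = []
    counts = {
        "npimage": npimage,
        "npkpt": npkpt,
        "npspinor": npspinor,
        "npband": npband,
        "npfft": npfft,
    }

    # Every level of parallelisation should get a positive processor count.
    for name, value in counts.items():
        if value < 1:
            problems.append(f"{name} = {value} is not a positive processor count")

    # The levels together should use exactly the available MPI tasks.
    product = 1
    for value in counts.values():
        product *= value
    if product != nproc:
        problems.append(f"product of levels is {product}, but nproc = {nproc}")

    # Rule quoted in the input file: npband divides (or equals) nband.
    if nband is not None and npband >= 1 and nband % npband != 0:
        problems.append(f"npband = {npband} does not divide nband = {nband}")

    return problems


# Case (2): npband = 4, npfft = 5 on 20 tasks.
print(check_kgb_grid(nproc=20, npkpt=1, npband=4, npfft=5, nband=648))

# Case (1): npband = 0, npfft = 20 on 20 tasks.
for problem in check_kgb_grid(nproc=20, npkpt=1, npband=0, npfft=20, nband=648):
    print(problem)
```

Under these assumptions case (2) passes the check while case (1) is flagged because of npband = 0. That does not explain the oscillation in (2), it only separates a layout problem from a numerical one.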

I also compiled Abinit with Intel MPI and get the same oscillation as with IBM MPI.
Is it possible that some configuration parameters are wrong or unset?

Chris

Re: Abinit 7.2.1 on supercomputer with IBM MPi

Posted: Thu May 02, 2013 7:03 pm
by ChrisKue
The log file

Re: Abinit 7.2.1 on supercomputer with IBM MPi

Posted: Thu May 02, 2013 7:52 pm
by gmatteo
Could you recompile the code with

enable_optim="standard"
enable_mpi_inplace="no"

and then let us know if this solves the problem with IBM MPI?

Re: Abinit 7.2.1 on supercomputer with IBM MPi

Posted: Fri May 03, 2013 4:26 pm
by ChrisKue
I recompiled the code with the options

Code: Select all

enable_optim="standard"
enable_mpi_inplace="no"
But again there was no change in convergence.
Then I found out that IBM MPI uses different compiler wrapper names than OpenMPI.

In OpenMPI the compiler names are:
mpicc, mpiCC, mpifort

In IBM MPI these compilers come without partition manager support.
The correct IBM MPI compilers with partition manager support are:
mpcc, mpCC, mpfort (without the "i")

... an annoying change from IBM!

Code: Select all

CC="/opt/ibmhpc/pecurrent/ppe.poe/bin/mpcc -compiler gnu"
CXX="/opt/ibmhpc/pecurrent/ppe.poe/bin/mpCC -compiler gnu"
FC="/opt/ibmhpc/pecurrent/ppe.poe/bin/mpfort -compiler gnu"

Recompiling with these compilers solved the numerical problem and the calculation converged. :D

In addition, I switched to the GNU compilers with the flag "-compiler gnu". I am not sure whether this is part of the solution. Before, I used the Intel compilers.
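
Since the wrapper names are easy to mix up, here is a small Python sketch that reads CC, CXX and FC from a config file like the one above and reports which wrapper each one points to. It is only an illustration: the function names and the set POE_WRAPPERS are made up here, and the list of "correct" names is just the three named in this post.

```python
import os

# Wrapper names with partition manager support, as listed in this post.
POE_WRAPPERS = {"mpcc", "mpCC", "mpfort"}


def compiler_wrappers(config_text):
    """Map CC/CXX/FC to the basename of the command they invoke."""
    found = {}
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key not in ("CC", "CXX", "FC"):
            continue
        # Drop the surrounding quotes, then keep only the command itself
        # (the first word), ignoring flags such as "-compiler gnu".
        words = value.strip().strip("\"'").split()
        if words:
            found[key] = os.path.basename(words[0])
    return found


def not_poe_wrappers(found):
    """Return the entries whose command is not one of the POE wrappers."""
    return {key: cmd for key, cmd in found.items() if cmd not in POE_WRAPPERS}


config = '''
CC="/opt/ibmhpc/pecurrent/ppe.poe/bin/mpcc -compiler gnu"
CXX="/opt/ibmhpc/pecurrent/ppe.poe/bin/mpCC -compiler gnu"
FC="/opt/ibmhpc/pecurrent/ppe.poe/bin/mpfort -compiler gnu"
'''
print(compiler_wrappers(config))
print(not_poe_wrappers(compiler_wrappers(config)))
```

With the three lines above, the second print gives an empty dictionary. A config that sets CC="mpicc" would show up there instead.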

Thank you for your help.
Chris

Re: Abinit 7.2.1 on supercomputer with IBM MPi [SOLVED]

Posted: Thu Jul 04, 2013 7:42 pm
by ChrisKue
Hi again.

It seems that the problem is not solved :(. After tests with many more processors the problem is back. I tried a lot of different configurations but get the same error every time.

Code: Select all

---SELF-CONSISTENT-FIELD CONVERGENCE--------------------------------------------

 getcut: wavevector=  0.0000  0.0000  0.0000  ngfft=  64  64  64
         ecut(hartree)=     10.000   => boxcut(ratio)=   2.29238

 getcut : COMMENT -
  Note that boxcut > 2.2 ; recall that boxcut=Gcut(box)/Gcut(sphere) = 2
  is sufficient for exact treatment of convolution.
  Such a large boxcut is a waste : you could raise ecut
  e.g. ecut=   13.137487 Hartrees makes boxcut=2


 ewald : nr and ng are    3 and   15

 ITER STEP NUMBER     1
 vtorho : nnsclo_now=  2, note that nnsclo,dbl_nnsclo,istep=  0 0  1
MKL ERROR: Parameter 8 was incorrect on entry to DGEMM .
MKL ERROR: Parameter 8 was incorrect on entry to DGEMM .
MKL ERROR: Parameter 8 was incorrect on entry to DGEMM .
MKL ERROR: Parameter 8 was incorrect on entry to DGEMM .
MKL ERROR: Parameter 8 was incorrect on entry to DGEMM .
MKL ERROR: Parameter 8 was incorrect on entry to DGEMM .
MKL ERROR: Parameter 8 was incorrect on entry to DGEMM .
...

If I do not compile with Intel MKL (netlib fallback) I get no error message, but Abinit stops at the same point.

The error always occurs in the first iteration of the first step.
Configurations I tried:
abinit 7.2.1 - GNU 4.7 - IBM MPI - with MKL and without MKL
abinit 7.2.1 - INTEL 12.1 - IBM MPI - with MKL and without MKL
abinit 7.2.2 - INTEL 12.1 - IBM MPI - with MKL and without MKL

The compilation is successful with no errors.
Config file:

Code: Select all

enable_64bit_flags="yes"
prefix="${HOME}/abinit-7.2.1"

CC="mpicc"
CXX="mpiCC"
FC="mpif90"

with_dft_flavor="libxc+wannier90"

with_linalg_flavor="mkl"
with_linalg_incs="$MKL_INC"
with_linalg_libs="$MKL_SHLIB"

enable_mpi="yes"
with_mpi_libs="$MPI_LIB"
with_mpi_incs="$MPI_INC"

enable_mpi_io="yes"
enable_mpi_trace="yes"

enable_optim="aggressive"

I tried enable_optim="standard" as well.
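Parameter 8 of dgemm is LDA, the leading dimension of the first matrix A. It has to be at least max(1, M) if A is not transposed (or max(1, K) if it is), so it is tied to the matrix sizes: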

Code: Select all

M, N, K
    Integers indicating the size of the matrices:

        A: M rows by K columns

        B: K rows by N columns

        C: M rows by N columns

http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/tutorials/mkl_mmx_f/GUID-36BFBCE9-EB0A-43B0-ADAF-2B65275726EA.htm
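
To see which argument the message refers to, the position can be counted against the argument list of the standard Fortran DGEMM interface. A small Python sketch (the helper names are made up for illustration; the argument order and the LDA rule are those of the reference BLAS):

```python
# Argument order of the reference BLAS routine DGEMM (Fortran interface).
DGEMM_ARGS = ("TRANSA", "TRANSB", "M", "N", "K", "ALPHA",
              "A", "LDA", "B", "LDB", "BETA", "C", "LDC")


def parameter_name(position):
    """Name of the argument at a 1-based position, as counted in the error message."""
    return DGEMM_ARGS[position - 1]


def min_lda(transa, m, k):
    """Smallest legal LDA: max(1, M) if A is not transposed, otherwise max(1, K)."""
    rows_of_a = m if transa.upper() == "N" else k
    return max(1, rows_of_a)


print(parameter_name(8))       # argument named in "Parameter 8 was incorrect"
print(min_lda("N", 0, 10))     # even an empty A (M = 0) needs LDA >= 1
```

So the message points at the leading dimension of A rather than at M itself. One way to get it, as far as I can tell, is a task that ends up with an empty local block and passes LDA = 0. That would fit an error that only appears above a certain processor count, but it is a guess and not something I have verified in the Abinit source.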

Whether the error occurs depends on the number of atoms and processors. With the input file in this thread it appears with more than approx. 265 processors.

Does anyone have an idea what is going wrong?
I would be thankful for any help.

Chris