We have installed Abinit 7.4.3 on our HPC cluster. The installation looks fine: Abinit detects the GPU correctly and runs without any errors. However, during ground-state calculations the SCF cycle becomes unstable.
For example, below is the SCF output of a simple ground-state calculation using the GPU:
================================================================================
iter Etot(hartree) deltaE(h) residm vres2 diffor maxfor
ETOT 1 -614.67427375897 -6.147E+02 4.717E+02 5.900E+03 1.946E-01 1.946E-01
ETOT 2 -625.17143397088 -1.050E+01 4.887E-02 1.056E+04 2.327E-01 2.194E-01
ETOT 3 -625.43735474989 -2.659E-01 2.372E-02 1.982E+02 1.783E-01 4.997E-02
ETOT 4 -625.44870336575 -1.135E-02 2.902E-03 5.599E+02 3.648E-02 5.826E-02
ETOT 5 -625.45241712954 -3.714E-03 5.347E-03 1.229E+02 8.376E-02 2.770E-02
ETOT 6 -625.42948954513 2.293E-02 2.676E-03 1.610E+03 8.347E-02 1.090E-01
ETOT 7 -625.45235843257 -2.287E-02 1.504E-03 4.714E+02 5.455E-02 6.004E-02
ETOT 8 -625.45289293032 -5.345E-04 1.467E-03 3.796E+02 3.038E-02 3.688E-02
ETOT 9 -625.46020574598 -7.313E-03 1.359E-03 8.227E+01 3.441E-02 7.403E-03
ETOT 10 -625.45992937236 2.764E-04 6.554E-04 9.409E+01 2.631E-03 6.256E-03
ETOT 11 -625.45957973008 3.496E-04 8.895E-04 1.165E+02 1.607E-02 1.591E-02
ETOT 12 -625.45996211767 -3.824E-04 4.067E-04 1.027E+02 7.077E-03 2.299E-02
ETOT 13 -625.45998908711 -2.697E-05 4.992E-04 1.366E+02 2.326E-02 4.625E-02
ETOT 14 -625.46077744096 -7.884E-04 4.742E-04 1.007E+02 4.907E-03 4.617E-02
ETOT 15 -625.46155659717 -7.792E-04 4.376E-04 4.587E+01 1.227E-02 3.390E-02
ETOT 16 -625.46185049469 -2.939E-04 4.852E-04 3.050E+01 5.599E-03 3.256E-02
ETOT 17 -625.46179040979 6.008E-05 3.677E-04 3.270E+01 1.225E-03 3.164E-02
ETOT 18 -625.46148686775 3.035E-04 3.539E-04 6.483E+01 1.174E-02 4.337E-02
ETOT 19 -625.46092150417 5.654E-04 1.575E-03 1.074E+02 9.513E-03 5.289E-02
ETOT 20 -625.46142489640 -5.034E-04 2.135E-03 6.336E+01 1.155E-02 4.133E-02
ETOT 21 -625.46176207532 -3.372E-04 2.658E-03 2.207E+01 2.015E-02 2.119E-02
ETOT 22 -625.46202927714 -2.672E-04 4.456E-03 4.347E-01 1.980E-02 1.393E-03
ETOT 23 -625.46204446163 -1.518E-05 9.901E-03 1.162E-01 2.197E-03 9.645E-04
ETOT 24 -625.46205698647 -1.252E-05 1.422E-02 1.500E-02 1.269E-03 1.237E-03
ETOT 25 -625.46205970161 -2.715E-06 2.804E-02 5.794E-03 8.660E-04 1.459E-03
ETOT 26 -625.46213203883 -7.234E-05 2.571E-02 3.322E-03 2.309E-04 1.228E-03
ETOT 27 -625.46229354344 -1.615E-04 6.794E-02 1.857E-03 5.736E-04 1.275E-03
ETOT 28 -625.46241877821 -1.252E-04 9.490E-02 7.197E-04 8.970E-05 1.262E-03
ETOT 29 -625.46239955130 1.923E-05 1.396E-01 9.224E-04 1.458E-05 1.254E-03
ETOT 30 -625.46479403242 -2.394E-03 1.801E+01 2.448E-03 2.343E-04 1.194E-03
ETOT 31 -625.46206179995 2.732E-03 1.455E+00 8.336E-04 8.157E-05 1.259E-03
ETOT 32 -625.46188952129 1.723E-04 1.463E+00 6.354E-04 8.958E-05 1.258E-03
ETOT 33 -625.46197051877 -8.100E-05 9.567E-02 1.040E-04 1.539E-04 1.242E-03
ETOT 34 -625.46194943353 2.109E-05 1.130E+00 5.291E-04 4.825E-05 1.213E-03
ETOT 35 -625.46276205637 -8.126E-04 1.489E-01 2.987E-04 3.724E-05 1.235E-03
ETOT 36 -625.46292613056 -1.641E-04 2.252E-01 3.962E-04 8.574E-05 1.160E-03
ETOT 37 -625.46318942199 -2.633E-04 9.648E+00 3.699E-03 2.828E-04 1.173E-03
ETOT 38 -625.46969350540 -6.504E-03 2.534E+01 2.352E-02 1.798E-04 1.042E-03
ETOT 39 -625.46363128740 6.062E-03 2.135E+01 2.685E-03 1.814E-04 1.141E-03
ETOT 40 -625.46421352465 -5.822E-04 2.165E+01 2.219E-03 1.400E-04 1.200E-03
ETOT 41 -625.46627147493 -2.058E-03 2.148E+01 6.273E-03 7.285E-05 1.171E-03
ETOT 42 -625.46400768318 2.264E-03 2.956E+01 2.827E-03 1.545E-04 1.295E-03
ETOT 43 -625.46171156333 2.296E-03 6.034E+00 2.025E-03 1.006E-04 1.194E-03
ETOT 44 -625.46153823985 1.733E-04 1.252E+00 4.623E-03 1.111E-04 1.290E-03
ETOT 45 -625.46180218026 -2.639E-04 4.374E-01 1.607E-03 2.003E-04 1.271E-03
ETOT 46 -625.46189523391 -9.305E-05 1.387E-01 1.602E-03 3.773E-05 1.233E-03
ETOT 47 -625.46198486281 -8.963E-05 9.258E-02 4.528E-05 2.187E-04 1.246E-03
ETOT 48 -625.46200178156 -1.692E-05 2.015E-02 3.491E-05 1.829E-05 1.252E-03
ETOT 49 -625.46201898264 -1.720E-05 8.556E-03 3.869E-05 8.890E-06 1.247E-03
ETOT 50 -625.46230267849 -2.837E-04 1.240E-01 1.303E-06 7.283E-06 1.243E-03
ETOT 51 -625.46202695009 2.757E-04 6.509E-02 3.273E-05 1.534E-05 1.251E-03
ETOT 52 -625.46201688702 1.006E-05 4.299E-03 3.603E-05 5.260E-06 1.253E-03
ETOT 53 -625.46214993996 -1.331E-04 4.811E-02 1.676E-05 4.655E-06 1.251E-03
ETOT 54 -625.46204091336 1.090E-04 2.675E-02 2.879E-05 7.792E-06 1.251E-03
ETOT 55 -628.03593538990 -2.574E+00 3.079E+03 3.862E+02 4.657E-02 4.532E-02
ETOT 56 -627.17880347739 8.571E-01 1.285E+03 2.445E+02 1.494E-02 3.038E-02
ETOT 57 -626.12131376390 1.057E+00 2.699E+03 6.884E+01 2.024E-02 1.343E-02
ETOT 58 -626.00415509520 1.172E-01 3.814E+02 5.156E+01 5.683E-03 7.752E-03
ETOT 59 -627.38439375739 -1.380E+00 3.416E+03 2.786E+02 2.669E-02 3.354E-02
ETOT 60 -632.95425743477 -5.570E+00 4.342E+03 1.054E+03 1.026E-01 1.315E-01
ETOT 61 -650.16880154160 -1.721E+01 2.119E+04 2.427E+03 2.645E-01 3.960E-01
ETOT 62 -635.45601580413 1.471E+01 8.000E+03 1.322E+03 2.270E-01 1.690E-01
ETOT 63 -654.94809236847 -1.949E+01 1.144E+04 2.631E+03 2.831E-01 4.475E-01
ETOT 64 -893.64730728837 -2.387E+02 1.807E+05 6.825E+03 2.771E+00 3.218E+00
ETOT 65 -728.14012678208 1.655E+02 5.614E+04 4.884E+03 1.815E+00 1.428E+00
ETOT 66 -1043.5032395855 -3.154E+02 4.429E+05 7.792E+03 3.224E+00 4.627E+00
ETOT 67 -3634.9490307075 -2.591E+03 1.044E+06 1.165E+04 1.729E+01 2.192E+01
ETOT 68 -8397.8304723706 -4.763E+03 1.181E+06 1.302E+04 2.361E+01 4.553E+01
ETOT 69 -104390.57286085 -9.599E+04 1.404E+07 1.528E+04 2.950E+02 3.405E+02
ETOT 70 -19029.180990350 8.536E+04 5.049E+06 1.367E+04 2.555E+02 8.506E+01
ETOT 71 -22443.391627179 -3.414E+03 9.421E+06 1.483E+04 2.685E+01 1.062E+02
ETOT 72 -53781.205583605 -3.134E+04 6.185E+06 1.281E+04 1.052E+02 1.480E+02
ETOT 73 -350903.81671060 -2.971E+05 3.494E+07 1.256E+04 4.965E+02 6.007E+02
ETOT 74 -797082.79203942 -4.462E+05 5.181E+07 1.128E+04 6.468E+02 1.111E+03
ETOT 75 -273544.63610541 5.235E+05 3.695E+07 7.505E+03 8.899E+02 3.069E+02
ETOT 76 -392045.68540145 -1.185E+05 4.876E+07 7.573E+03 3.715E+02 4.900E+02
ETOT 77 -1790072.7958265 -1.398E+06 1.423E+08 3.189E+03 5.472E+02 1.037E+03
ETOT 78 -8518335.8132030 -6.728E+06 3.485E+08 9.323E+02 2.488E+03 1.451E+03
ETOT 79 -8027037.8074598 4.913E+05 2.116E+08 3.640E+03 2.411E+03 3.862E+03
ETOT 80 -14497554.703925 -6.471E+06 6.400E+08 2.495E+03 1.495E+03 3.852E+03
ETOT 81 -2642857.3088491 1.185E+07 1.605E+08 2.909E+03 2.605E+03 1.246E+03
ETOT 82 -6999151.4590077 -4.356E+06 1.853E+08 1.176E+04 3.211E+03 4.457E+03
ETOT 83 -92241497.370347 -8.524E+07 1.064E+09 2.420E+02 2.680E+03 5.682E+03
ETOT 84 -496249710.79844 -4.040E+08 7.721E+09 1.344E+01 8.434E+03 6.337E+03
ETOT 85 -349355358.81005 1.469E+08 6.006E+09 3.341E+01 7.370E+03 4.686E+03
ETOT 86 -690958872.02225 -3.416E+08 9.879E+09 6.304E-01 4.962E+03 2.220E+03
ETOT 87 -1526464156.5710 -8.355E+08 9.610E+09 2.742E+00 7.782E+03 7.955E+03
ETOT 88 -485404640.40571 1.041E+09 1.031E+10 1.479E-01 7.006E+03 9.493E+02
ETOT 89 -13980219.489800 4.714E+08 1.238E+09 2.443E-01 9.715E+02 4.652E+01
ETOT 90 -9260982.1081759 4.719E+06 6.860E+08 1.948E-01 2.779E+01 1.873E+01
ETOT 91 -3103866.6737715 6.157E+06 7.555E+08 8.005E-01 2.923E+01 1.866E+01
ETOT 92 -1158202.5009766 1.946E+06 6.250E+08 1.663E+00 1.118E+01 7.476E+00
ETOT 93 -1628667.1855357 -4.705E+05 7.143E+08 7.381E-01 1.244E+01 4.965E+00
ETOT 94 -3379321.6251889 -1.751E+06 2.617E+08 6.826E-01 1.005E+01 1.295E+01
ETOT 95 -343399.17585614 3.036E+06 1.130E+08 4.333E+00 1.324E+01 4.648E+00
ETOT 96 -2027875.1023575 -1.684E+06 2.281E+08 6.816E-01 4.991E+00 9.157E+00
ETOT 97 -4527325.3363628 -2.499E+06 1.628E+08 3.346E-01 3.947E+00 1.261E+01
ETOT 98 -13750774.583784 -9.223E+06 5.468E+08 1.548E-01 2.454E+01 3.715E+01
ETOT 99 -6615240.6457553 7.136E+06 3.729E+08 2.532E-01 1.521E+01 2.194E+01
ETOT 100 -1487759.0490269 5.127E+06 4.633E+08 1.131E+00 1.952E+01 8.834E+00
scprqt: WARNING -
nstep= 100 was not enough SCF cycles to converge;
potential residual= 1.131E+00 exceeds tolvrs= 1.000E-12
As you can see, the calculation becomes unstable after about the 50th iteration (the total energy jumps at iteration 55 and then diverges), whereas running the same input without the GPU converges after 43 iterations, as listed below:
================================================================================
iter Etot(hartree) deltaE(h) residm vres2 diffor maxfor
ETOT 1 -614.67427375897 -6.147E+02 4.717E+02 5.900E+03 1.946E-01 1.946E-01
ETOT 2 -625.17143397088 -1.050E+01 4.887E-02 1.056E+04 2.327E-01 2.194E-01
ETOT 3 -625.43735474989 -2.659E-01 2.372E-02 1.982E+02 1.783E-01 4.997E-02
ETOT 4 -625.44870333943 -1.135E-02 2.902E-03 5.599E+02 3.648E-02 5.826E-02
ETOT 5 -625.45242405097 -3.721E-03 5.341E-03 1.227E+02 8.372E-02 2.768E-02
ETOT 6 -625.42951148671 2.291E-02 2.677E-03 1.609E+03 8.346E-02 1.089E-01
ETOT 7 -625.45236462880 -2.285E-02 1.503E-03 4.711E+02 5.452E-02 6.002E-02
ETOT 8 -625.45289954886 -5.349E-04 1.468E-03 3.792E+02 3.039E-02 3.684E-02
ETOT 9 -625.46020258128 -7.303E-03 1.357E-03 8.238E+01 3.441E-02 7.381E-03
ETOT 10 -625.45993109615 2.715E-04 6.665E-04 9.402E+01 2.606E-03 6.245E-03
ETOT 11 -625.45956897175 3.621E-04 8.892E-04 1.171E+02 1.628E-02 1.611E-02
ETOT 12 -625.45995970528 -3.907E-04 4.121E-04 1.031E+02 7.142E-03 2.325E-02
ETOT 13 -625.46001760244 -5.790E-05 4.989E-04 1.339E+02 2.234E-02 4.559E-02
ETOT 14 -625.46081833914 -8.007E-04 4.563E-04 9.762E+01 5.111E-03 4.553E-02
ETOT 15 -625.46153912087 -7.208E-04 4.420E-04 4.727E+01 1.120E-02 3.433E-02
ETOT 16 -625.46182012216 -2.810E-04 4.702E-04 3.493E+01 4.468E-03 3.460E-02
ETOT 17 -625.46175654526 6.358E-05 3.755E-04 3.649E+01 1.303E-03 3.330E-02
ETOT 18 -625.46160983850 1.467E-04 3.460E-04 5.703E+01 8.550E-03 4.185E-02
ETOT 19 -625.46090254058 7.073E-04 2.623E-04 1.203E+02 1.633E-02 5.818E-02
ETOT 20 -625.46176897178 -8.664E-04 2.040E-04 2.251E+01 3.546E-02 2.271E-02
ETOT 21 -625.46189436686 -1.254E-04 1.370E-04 1.081E+01 7.634E-03 1.508E-02
ETOT 22 -625.46201162069 -1.173E-04 1.085E-04 2.970E-01 1.403E-02 1.291E-03
ETOT 23 -625.46201548672 -3.866E-06 1.033E-04 1.162E-01 5.557E-04 1.596E-03
ETOT 24 -625.46201557276 -8.603E-08 6.328E-05 3.339E-02 8.990E-04 1.404E-03
ETOT 25 -625.46201581900 -2.462E-07 7.313E-05 7.643E-04 8.262E-04 1.304E-03
ETOT 26 -625.46201578657 3.244E-08 5.447E-05 4.402E-04 1.106E-04 1.246E-03
ETOT 27 -625.46201571842 6.815E-08 4.970E-05 1.346E-04 3.715E-05 1.250E-03
ETOT 28 -625.46201573272 -1.430E-08 4.536E-05 1.276E-06 4.976E-05 1.247E-03
ETOT 29 -625.46201572899 3.728E-09 3.809E-05 1.889E-06 8.842E-06 1.250E-03
ETOT 30 -625.46201573455 -5.564E-09 4.280E-05 1.315E-06 7.209E-06 1.249E-03
ETOT 31 -625.46201573341 1.142E-09 3.527E-05 1.427E-06 9.650E-06 1.251E-03
ETOT 32 -625.46201573163 1.782E-09 3.923E-05 1.958E-08 4.146E-06 1.250E-03
ETOT 33 -625.46201573071 9.254E-10 3.174E-05 9.487E-08 6.736E-07 1.249E-03
ETOT 34 -625.46201573132 -6.112E-10 3.498E-05 1.789E-09 1.519E-06 1.249E-03
ETOT 35 -625.46201573159 -2.717E-10 2.784E-05 1.193E-08 8.205E-07 1.249E-03
ETOT 36 -625.46201573145 1.374E-10 3.043E-05 1.586E-10 5.692E-07 1.249E-03
ETOT 37 -625.46201573149 -3.502E-11 2.387E-05 1.467E-10 2.241E-08 1.249E-03
ETOT 38 -625.46201573149 -9.095E-13 2.592E-05 9.312E-11 2.447E-08 1.249E-03
ETOT 39 -625.46201573151 -1.955E-11 2.165E-05 4.730E-12 5.574E-08 1.249E-03
ETOT 40 -625.46201573150 5.684E-12 2.168E-05 7.849E-12 6.289E-09 1.249E-03
ETOT 41 -625.46201573151 -1.091E-11 2.287E-05 2.422E-12 4.698E-09 1.249E-03
ETOT 42 -625.46201573151 2.956E-12 1.786E-05 3.201E-12 2.839E-09 1.249E-03
ETOT 43 -625.46201573151 -3.411E-12 2.382E-05 5.239E-13 6.703E-09 1.249E-03
At SCF step 43 vres2 = 5.24E-13 < tolvrs= 1.00E-12 =>converged.
As you can see, the first three iterations are identical and the next few agree closely, but the GPU run then drifts away from the CPU run and becomes unstable.
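To pin down where the two runs part ways, the ETOT lines of both logs can be compared with a small script. The following is only a minimal sketch: the function names `parse_scf` and `first_divergence`, the tolerance of 1e-6 Ha, and the short inline samples (copied from iterations 3 to 5 of the two listings above) are illustrative assumptions and not part of Abinit.

```python
import re

# Matches SCF lines such as:
# "ETOT 4 -625.44870336575 -1.135E-02 2.902E-03 5.599E+02 3.648E-02 5.826E-02"
# Captured groups: iteration, Etot, deltaE, residm, vres2.
ETOT_RE = re.compile(r"^\s*ETOT\s+(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)")


def parse_scf(text):
    """Return {iteration: (etot, residm, vres2)} for every ETOT line in text."""
    rows = {}
    for line in text.splitlines():
        m = ETOT_RE.match(line)
        if m:
            iteration = int(m.group(1))
            etot, _delta_e, residm, vres2 = (float(m.group(i)) for i in range(2, 6))
            rows[iteration] = (etot, residm, vres2)
    return rows


def first_divergence(run_a, run_b, tol=1e-6):
    """First shared iteration where the total energies differ by more than tol (Ha)."""
    for iteration in sorted(set(run_a) & set(run_b)):
        if abs(run_a[iteration][0] - run_b[iteration][0]) > tol:
            return iteration
    return None


# Hypothetical inline samples: iterations 3-5 copied from the GPU and CPU logs.
GPU_SAMPLE = """\
ETOT 3 -625.43735474989 -2.659E-01 2.372E-02 1.982E+02 1.783E-01 4.997E-02
ETOT 4 -625.44870336575 -1.135E-02 2.902E-03 5.599E+02 3.648E-02 5.826E-02
ETOT 5 -625.45241712954 -3.714E-03 5.347E-03 1.229E+02 8.376E-02 2.770E-02
"""
CPU_SAMPLE = """\
ETOT 3 -625.43735474989 -2.659E-01 2.372E-02 1.982E+02 1.783E-01 4.997E-02
ETOT 4 -625.44870333943 -1.135E-02 2.902E-03 5.599E+02 3.648E-02 5.826E-02
ETOT 5 -625.45242405097 -3.721E-03 5.341E-03 1.227E+02 8.372E-02 2.768E-02
"""

print(first_divergence(parse_scf(GPU_SAMPLE), parse_scf(CPU_SAMPLE)))  # prints 5
```

With the full logs read from files instead of the inline samples, the same comparison would show how early the GPU and CPU energies stop agreeing, well before the blow-up becomes visible in the total energy.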
The node is equipped with a Tesla K20 GPU and Abinit detects the device correctly:
-P-0000
-P-0000 setdevice_cuda : COMMENT -
-P-0000 GPU 0 has been properly initialized, continuing...
-P-0000
-P-0000 ________________________________________________________________________________
-P-0000 ________________________ Graphic Card Properties _______________________________
-P-0000
-P-0000 Device 0 : Tesla K20c
-P-0000 Revision number: 3.5
-P-0000 Total amount of global memory: 4799.6 Mbytes
-P-0000 Clock rate: 0.7 GHz
-P-0000 Max GFLOP: 73 GFP
-P-0000 Total constant memory: 65536 bytes
-P-0000 Shared memory per block: 49152 bytes
-P-0000 Number of registers per block: 65536
-P-0000
-P-0000 ________________________________________________________________________________
-P-0000
And this is the build information:
=== Build Information ===
Version : 7.4.3
Build target : x86_64_linux_intel13.0
Build date : 20131113
=== Compiler Suite ===
C compiler : intel13.0
CFLAGS : -g -O2 -vec-report0
C++ compiler : intel13.0
CXXFLAGS : -g -O2 -vec-report0
Fortran compiler : intel13.0
FCFLAGS : -g -extend-source -vec-report0 -noaltparam -nofpscomp
FC_LDFLAGS : -static-intel -static-libgcc
=== Optimizations ===
Debug level : basic
Optimization level : standard
Architecture : intel_xeon
=== MPI ===
Parallel build : yes
Parallel I/O : auto
Time tracing : no
GPU support : yes
=== Connectors / Fallbacks ===
Connectors on : yes
Fallbacks on : yes
DFT flavor : libxc+atompaw+bigdft-fallback+wannier90
FFT flavor : fftw3-mkl
LINALG flavor : mkl
MATH flavor : none
TIMER flavor : abinit
TRIO flavor : netcdf-fallback+etsf_io-fallback+fox-fallback
=== Experimental features ===
Bindings : no
Exports : no
GW double-precision : yes
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Default optimizations:
-O2 -xHost
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
CPP options activated during the build:
CC_INTEL CXX_INTEL FC_INTEL
HAVE_DFT_ATOMPAW HAVE_DFT_BIGDFT HAVE_DFT_LIBXC
HAVE_DFT_WANNIER90 HAVE_FC_ALLOCATABLE_DT... HAVE_FC_CONTIGUOUS
HAVE_FC_CPUTIME HAVE_FC_ETIME HAVE_FC_EXIT
HAVE_FC_FLUSH HAVE_FC_GAMMA HAVE_FC_GETENV
HAVE_FC_GETPID HAVE_FC_IOMSG HAVE_FC_ISO_C_BINDING
HAVE_FC_LONG_LINES HAVE_FC_NULL HAVE_FC_STREAM_IO
HAVE_FFT HAVE_FFT_FFTW3_MKL HAVE_FFT_MPI
HAVE_FFT_SERIAL HAVE_GPU HAVE_GPU_CUDA
HAVE_GPU_CUDA_DP HAVE_GPU_SERIAL HAVE_LINALG
HAVE_LINALG_AXPBY HAVE_LINALG_GEMM3M HAVE_LINALG_MKL_IMATCOPY
HAVE_LINALG_MKL_OMATADD HAVE_LINALG_MKL_OMATCOPY HAVE_LINALG_SERIAL
HAVE_MPI HAVE_MPI2 HAVE_MPI_IO
HAVE_MPI_TYPE_CREATE_S... HAVE_OS_LINUX HAVE_TIMER
HAVE_TIMER_ABINIT HAVE_TIMER_MPI HAVE_TIMER_SERIAL
HAVE_TRIO_ETSF_IO HAVE_TRIO_FOX HAVE_TRIO_NETCDF
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Finally, this is the configuration script that has been used to build Abinit:
/configure \
--prefix="/gpfs/apps/abinit/v7.4.3_cuda" \
--enable-mpi \
--with-mpi-prefix="$MPI_HOME" \
--with-fft-flavor="fftw3-mkl" \
--with-fft-libs="-L$MKLROOT/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm" \
--with-linalg-flavor="mkl" \
--with-linalg-libs="-L$MKLROOT/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm" \
--enable-atompaw --with-atompaw-bins="$ATOMPAW_HOME/bin" --with-atompaw-libs="-L$ATOMPAW_HOME/lib -latompaw" \
--with-libxc-incs="-I$LIBXC_HOME/include" --with-libxc-libs="-L$LIBXC_HOME/lib -lxc" \
--enable-wannier90 --with-wannier90-bins="$WANNIER_ROOT" --with-wannier90-libs="-L$WANNIER_ROOT -lwannier" \
--with-dft-flavor="atompaw+bigdft+libxc+wannier90" \
--with-trio-flavor="etsf_io+fox+netcdf" \
--enable-clib="yes" \
--enable-gw-dpc="yes" \
--enable-gpu \
FC=mpif90 CC=mpicc CXX=mpiCC
Can you think of anything that might cause this instability when Abinit uses the GPU?
Thank you very much in advance,
Farzad