ABINIT 7.4.3 + CUDA cause unstable result

f.hayati
Posts: 2
Joined: Fri Jul 15, 2011 1:08 pm
Location: UK

Post by f.hayati » Wed Nov 27, 2013 11:40 am

Dear all,

We have installed Abinit 7.4.3 on our HPC cluster. The installation looks fine: Abinit detects the GPU correctly and runs without any error. However, during ground-state calculations the results become unstable.

For example, below is the SCF output of a simple ground-state calculation using the GPU:

Code: Select all

================================================================================

     iter   Etot(hartree)      deltaE(h)  residm     vres2    diffor    maxfor
 ETOT  1  -614.67427375897    -6.147E+02 4.717E+02 5.900E+03 1.946E-01 1.946E-01
 ETOT  2  -625.17143397088    -1.050E+01 4.887E-02 1.056E+04 2.327E-01 2.194E-01
 ETOT  3  -625.43735474989    -2.659E-01 2.372E-02 1.982E+02 1.783E-01 4.997E-02
 ETOT  4  -625.44870336575    -1.135E-02 2.902E-03 5.599E+02 3.648E-02 5.826E-02
 ETOT  5  -625.45241712954    -3.714E-03 5.347E-03 1.229E+02 8.376E-02 2.770E-02
 ETOT  6  -625.42948954513     2.293E-02 2.676E-03 1.610E+03 8.347E-02 1.090E-01
 ETOT  7  -625.45235843257    -2.287E-02 1.504E-03 4.714E+02 5.455E-02 6.004E-02
 ETOT  8  -625.45289293032    -5.345E-04 1.467E-03 3.796E+02 3.038E-02 3.688E-02
 ETOT  9  -625.46020574598    -7.313E-03 1.359E-03 8.227E+01 3.441E-02 7.403E-03
 ETOT 10  -625.45992937236     2.764E-04 6.554E-04 9.409E+01 2.631E-03 6.256E-03
 ETOT 11  -625.45957973008     3.496E-04 8.895E-04 1.165E+02 1.607E-02 1.591E-02
 ETOT 12  -625.45996211767    -3.824E-04 4.067E-04 1.027E+02 7.077E-03 2.299E-02
 ETOT 13  -625.45998908711    -2.697E-05 4.992E-04 1.366E+02 2.326E-02 4.625E-02
 ETOT 14  -625.46077744096    -7.884E-04 4.742E-04 1.007E+02 4.907E-03 4.617E-02
 ETOT 15  -625.46155659717    -7.792E-04 4.376E-04 4.587E+01 1.227E-02 3.390E-02
 ETOT 16  -625.46185049469    -2.939E-04 4.852E-04 3.050E+01 5.599E-03 3.256E-02
 ETOT 17  -625.46179040979     6.008E-05 3.677E-04 3.270E+01 1.225E-03 3.164E-02
 ETOT 18  -625.46148686775     3.035E-04 3.539E-04 6.483E+01 1.174E-02 4.337E-02
 ETOT 19  -625.46092150417     5.654E-04 1.575E-03 1.074E+02 9.513E-03 5.289E-02
 ETOT 20  -625.46142489640    -5.034E-04 2.135E-03 6.336E+01 1.155E-02 4.133E-02
 ETOT 21  -625.46176207532    -3.372E-04 2.658E-03 2.207E+01 2.015E-02 2.119E-02
 ETOT 22  -625.46202927714    -2.672E-04 4.456E-03 4.347E-01 1.980E-02 1.393E-03
 ETOT 23  -625.46204446163    -1.518E-05 9.901E-03 1.162E-01 2.197E-03 9.645E-04
 ETOT 24  -625.46205698647    -1.252E-05 1.422E-02 1.500E-02 1.269E-03 1.237E-03
 ETOT 25  -625.46205970161    -2.715E-06 2.804E-02 5.794E-03 8.660E-04 1.459E-03
 ETOT 26  -625.46213203883    -7.234E-05 2.571E-02 3.322E-03 2.309E-04 1.228E-03
 ETOT 27  -625.46229354344    -1.615E-04 6.794E-02 1.857E-03 5.736E-04 1.275E-03
 ETOT 28  -625.46241877821    -1.252E-04 9.490E-02 7.197E-04 8.970E-05 1.262E-03
 ETOT 29  -625.46239955130     1.923E-05 1.396E-01 9.224E-04 1.458E-05 1.254E-03
 ETOT 30  -625.46479403242    -2.394E-03 1.801E+01 2.448E-03 2.343E-04 1.194E-03
 ETOT 31  -625.46206179995     2.732E-03 1.455E+00 8.336E-04 8.157E-05 1.259E-03
 ETOT 32  -625.46188952129     1.723E-04 1.463E+00 6.354E-04 8.958E-05 1.258E-03
 ETOT 33  -625.46197051877    -8.100E-05 9.567E-02 1.040E-04 1.539E-04 1.242E-03
 ETOT 34  -625.46194943353     2.109E-05 1.130E+00 5.291E-04 4.825E-05 1.213E-03
 ETOT 35  -625.46276205637    -8.126E-04 1.489E-01 2.987E-04 3.724E-05 1.235E-03
 ETOT 36  -625.46292613056    -1.641E-04 2.252E-01 3.962E-04 8.574E-05 1.160E-03
 ETOT 37  -625.46318942199    -2.633E-04 9.648E+00 3.699E-03 2.828E-04 1.173E-03
 ETOT 38  -625.46969350540    -6.504E-03 2.534E+01 2.352E-02 1.798E-04 1.042E-03
 ETOT 39  -625.46363128740     6.062E-03 2.135E+01 2.685E-03 1.814E-04 1.141E-03
 ETOT 40  -625.46421352465    -5.822E-04 2.165E+01 2.219E-03 1.400E-04 1.200E-03
 ETOT 41  -625.46627147493    -2.058E-03 2.148E+01 6.273E-03 7.285E-05 1.171E-03
 ETOT 42  -625.46400768318     2.264E-03 2.956E+01 2.827E-03 1.545E-04 1.295E-03
 ETOT 43  -625.46171156333     2.296E-03 6.034E+00 2.025E-03 1.006E-04 1.194E-03
 ETOT 44  -625.46153823985     1.733E-04 1.252E+00 4.623E-03 1.111E-04 1.290E-03
 ETOT 45  -625.46180218026    -2.639E-04 4.374E-01 1.607E-03 2.003E-04 1.271E-03
 ETOT 46  -625.46189523391    -9.305E-05 1.387E-01 1.602E-03 3.773E-05 1.233E-03
 ETOT 47  -625.46198486281    -8.963E-05 9.258E-02 4.528E-05 2.187E-04 1.246E-03
 ETOT 48  -625.46200178156    -1.692E-05 2.015E-02 3.491E-05 1.829E-05 1.252E-03
 ETOT 49  -625.46201898264    -1.720E-05 8.556E-03 3.869E-05 8.890E-06 1.247E-03
 ETOT 50  -625.46230267849    -2.837E-04 1.240E-01 1.303E-06 7.283E-06 1.243E-03
 ETOT 51  -625.46202695009     2.757E-04 6.509E-02 3.273E-05 1.534E-05 1.251E-03
 ETOT 52  -625.46201688702     1.006E-05 4.299E-03 3.603E-05 5.260E-06 1.253E-03
 ETOT 53  -625.46214993996    -1.331E-04 4.811E-02 1.676E-05 4.655E-06 1.251E-03
 ETOT 54  -625.46204091336     1.090E-04 2.675E-02 2.879E-05 7.792E-06 1.251E-03
 ETOT 55  -628.03593538990    -2.574E+00 3.079E+03 3.862E+02 4.657E-02 4.532E-02
 ETOT 56  -627.17880347739     8.571E-01 1.285E+03 2.445E+02 1.494E-02 3.038E-02
 ETOT 57  -626.12131376390     1.057E+00 2.699E+03 6.884E+01 2.024E-02 1.343E-02
 ETOT 58  -626.00415509520     1.172E-01 3.814E+02 5.156E+01 5.683E-03 7.752E-03
 ETOT 59  -627.38439375739    -1.380E+00 3.416E+03 2.786E+02 2.669E-02 3.354E-02
 ETOT 60  -632.95425743477    -5.570E+00 4.342E+03 1.054E+03 1.026E-01 1.315E-01
 ETOT 61  -650.16880154160    -1.721E+01 2.119E+04 2.427E+03 2.645E-01 3.960E-01
 ETOT 62  -635.45601580413     1.471E+01 8.000E+03 1.322E+03 2.270E-01 1.690E-01
 ETOT 63  -654.94809236847    -1.949E+01 1.144E+04 2.631E+03 2.831E-01 4.475E-01
 ETOT 64  -893.64730728837    -2.387E+02 1.807E+05 6.825E+03 2.771E+00 3.218E+00
 ETOT 65  -728.14012678208     1.655E+02 5.614E+04 4.884E+03 1.815E+00 1.428E+00
 ETOT 66  -1043.5032395855    -3.154E+02 4.429E+05 7.792E+03 3.224E+00 4.627E+00
 ETOT 67  -3634.9490307075    -2.591E+03 1.044E+06 1.165E+04 1.729E+01 2.192E+01
 ETOT 68  -8397.8304723706    -4.763E+03 1.181E+06 1.302E+04 2.361E+01 4.553E+01
 ETOT 69  -104390.57286085    -9.599E+04 1.404E+07 1.528E+04 2.950E+02 3.405E+02
 ETOT 70  -19029.180990350     8.536E+04 5.049E+06 1.367E+04 2.555E+02 8.506E+01
 ETOT 71  -22443.391627179    -3.414E+03 9.421E+06 1.483E+04 2.685E+01 1.062E+02
 ETOT 72  -53781.205583605    -3.134E+04 6.185E+06 1.281E+04 1.052E+02 1.480E+02
 ETOT 73  -350903.81671060    -2.971E+05 3.494E+07 1.256E+04 4.965E+02 6.007E+02
 ETOT 74  -797082.79203942    -4.462E+05 5.181E+07 1.128E+04 6.468E+02 1.111E+03
 ETOT 75  -273544.63610541     5.235E+05 3.695E+07 7.505E+03 8.899E+02 3.069E+02
 ETOT 76  -392045.68540145    -1.185E+05 4.876E+07 7.573E+03 3.715E+02 4.900E+02
 ETOT 77  -1790072.7958265    -1.398E+06 1.423E+08 3.189E+03 5.472E+02 1.037E+03
 ETOT 78  -8518335.8132030    -6.728E+06 3.485E+08 9.323E+02 2.488E+03 1.451E+03
 ETOT 79  -8027037.8074598     4.913E+05 2.116E+08 3.640E+03 2.411E+03 3.862E+03
 ETOT 80  -14497554.703925    -6.471E+06 6.400E+08 2.495E+03 1.495E+03 3.852E+03
 ETOT 81  -2642857.3088491     1.185E+07 1.605E+08 2.909E+03 2.605E+03 1.246E+03
 ETOT 82  -6999151.4590077    -4.356E+06 1.853E+08 1.176E+04 3.211E+03 4.457E+03
 ETOT 83  -92241497.370347    -8.524E+07 1.064E+09 2.420E+02 2.680E+03 5.682E+03
 ETOT 84  -496249710.79844    -4.040E+08 7.721E+09 1.344E+01 8.434E+03 6.337E+03
 ETOT 85  -349355358.81005     1.469E+08 6.006E+09 3.341E+01 7.370E+03 4.686E+03
 ETOT 86  -690958872.02225    -3.416E+08 9.879E+09 6.304E-01 4.962E+03 2.220E+03
 ETOT 87  -1526464156.5710    -8.355E+08 9.610E+09 2.742E+00 7.782E+03 7.955E+03
 ETOT 88  -485404640.40571     1.041E+09 1.031E+10 1.479E-01 7.006E+03 9.493E+02
 ETOT 89  -13980219.489800     4.714E+08 1.238E+09 2.443E-01 9.715E+02 4.652E+01
 ETOT 90  -9260982.1081759     4.719E+06 6.860E+08 1.948E-01 2.779E+01 1.873E+01
 ETOT 91  -3103866.6737715     6.157E+06 7.555E+08 8.005E-01 2.923E+01 1.866E+01
 ETOT 92  -1158202.5009766     1.946E+06 6.250E+08 1.663E+00 1.118E+01 7.476E+00
 ETOT 93  -1628667.1855357    -4.705E+05 7.143E+08 7.381E-01 1.244E+01 4.965E+00
 ETOT 94  -3379321.6251889    -1.751E+06 2.617E+08 6.826E-01 1.005E+01 1.295E+01
 ETOT 95  -343399.17585614     3.036E+06 1.130E+08 4.333E+00 1.324E+01 4.648E+00
 ETOT 96  -2027875.1023575    -1.684E+06 2.281E+08 6.816E-01 4.991E+00 9.157E+00
 ETOT 97  -4527325.3363628    -2.499E+06 1.628E+08 3.346E-01 3.947E+00 1.261E+01
 ETOT 98  -13750774.583784    -9.223E+06 5.468E+08 1.548E-01 2.454E+01 3.715E+01
 ETOT 99  -6615240.6457553     7.136E+06 3.729E+08 2.532E-01 1.521E+01 2.194E+01
 ETOT  100  -1487759.0490269     5.127E+06 4.633E+08 1.131E+00 1.952E+01 8.834E+00

 scprqt:  WARNING -
  nstep=  100 was not enough SCF cycles to converge;
  potential residual=  1.131E+00 exceeds tolvrs=  1.000E-12

You can see that from around the 50th iteration the GPU run starts to become unstable (the total energy jumps at iteration 55 and then diverges), whereas running the same script without the GPU converges after 43 iterations, as listed below:

Code: Select all

================================================================================

     iter   Etot(hartree)      deltaE(h)  residm     vres2    diffor    maxfor
 ETOT  1  -614.67427375897    -6.147E+02 4.717E+02 5.900E+03 1.946E-01 1.946E-01
 ETOT  2  -625.17143397088    -1.050E+01 4.887E-02 1.056E+04 2.327E-01 2.194E-01
 ETOT  3  -625.43735474989    -2.659E-01 2.372E-02 1.982E+02 1.783E-01 4.997E-02
 ETOT  4  -625.44870333943    -1.135E-02 2.902E-03 5.599E+02 3.648E-02 5.826E-02
 ETOT  5  -625.45242405097    -3.721E-03 5.341E-03 1.227E+02 8.372E-02 2.768E-02
 ETOT  6  -625.42951148671     2.291E-02 2.677E-03 1.609E+03 8.346E-02 1.089E-01
 ETOT  7  -625.45236462880    -2.285E-02 1.503E-03 4.711E+02 5.452E-02 6.002E-02
 ETOT  8  -625.45289954886    -5.349E-04 1.468E-03 3.792E+02 3.039E-02 3.684E-02
 ETOT  9  -625.46020258128    -7.303E-03 1.357E-03 8.238E+01 3.441E-02 7.381E-03
 ETOT 10  -625.45993109615     2.715E-04 6.665E-04 9.402E+01 2.606E-03 6.245E-03
 ETOT 11  -625.45956897175     3.621E-04 8.892E-04 1.171E+02 1.628E-02 1.611E-02
 ETOT 12  -625.45995970528    -3.907E-04 4.121E-04 1.031E+02 7.142E-03 2.325E-02
 ETOT 13  -625.46001760244    -5.790E-05 4.989E-04 1.339E+02 2.234E-02 4.559E-02
 ETOT 14  -625.46081833914    -8.007E-04 4.563E-04 9.762E+01 5.111E-03 4.553E-02
 ETOT 15  -625.46153912087    -7.208E-04 4.420E-04 4.727E+01 1.120E-02 3.433E-02
 ETOT 16  -625.46182012216    -2.810E-04 4.702E-04 3.493E+01 4.468E-03 3.460E-02
 ETOT 17  -625.46175654526     6.358E-05 3.755E-04 3.649E+01 1.303E-03 3.330E-02
 ETOT 18  -625.46160983850     1.467E-04 3.460E-04 5.703E+01 8.550E-03 4.185E-02
 ETOT 19  -625.46090254058     7.073E-04 2.623E-04 1.203E+02 1.633E-02 5.818E-02
 ETOT 20  -625.46176897178    -8.664E-04 2.040E-04 2.251E+01 3.546E-02 2.271E-02
 ETOT 21  -625.46189436686    -1.254E-04 1.370E-04 1.081E+01 7.634E-03 1.508E-02
 ETOT 22  -625.46201162069    -1.173E-04 1.085E-04 2.970E-01 1.403E-02 1.291E-03
 ETOT 23  -625.46201548672    -3.866E-06 1.033E-04 1.162E-01 5.557E-04 1.596E-03
 ETOT 24  -625.46201557276    -8.603E-08 6.328E-05 3.339E-02 8.990E-04 1.404E-03
 ETOT 25  -625.46201581900    -2.462E-07 7.313E-05 7.643E-04 8.262E-04 1.304E-03
 ETOT 26  -625.46201578657     3.244E-08 5.447E-05 4.402E-04 1.106E-04 1.246E-03
 ETOT 27  -625.46201571842     6.815E-08 4.970E-05 1.346E-04 3.715E-05 1.250E-03
 ETOT 28  -625.46201573272    -1.430E-08 4.536E-05 1.276E-06 4.976E-05 1.247E-03
 ETOT 29  -625.46201572899     3.728E-09 3.809E-05 1.889E-06 8.842E-06 1.250E-03
 ETOT 30  -625.46201573455    -5.564E-09 4.280E-05 1.315E-06 7.209E-06 1.249E-03
 ETOT 31  -625.46201573341     1.142E-09 3.527E-05 1.427E-06 9.650E-06 1.251E-03
 ETOT 32  -625.46201573163     1.782E-09 3.923E-05 1.958E-08 4.146E-06 1.250E-03
 ETOT 33  -625.46201573071     9.254E-10 3.174E-05 9.487E-08 6.736E-07 1.249E-03
 ETOT 34  -625.46201573132    -6.112E-10 3.498E-05 1.789E-09 1.519E-06 1.249E-03
 ETOT 35  -625.46201573159    -2.717E-10 2.784E-05 1.193E-08 8.205E-07 1.249E-03
 ETOT 36  -625.46201573145     1.374E-10 3.043E-05 1.586E-10 5.692E-07 1.249E-03
 ETOT 37  -625.46201573149    -3.502E-11 2.387E-05 1.467E-10 2.241E-08 1.249E-03
 ETOT 38  -625.46201573149    -9.095E-13 2.592E-05 9.312E-11 2.447E-08 1.249E-03
 ETOT 39  -625.46201573151    -1.955E-11 2.165E-05 4.730E-12 5.574E-08 1.249E-03
 ETOT 40  -625.46201573150     5.684E-12 2.168E-05 7.849E-12 6.289E-09 1.249E-03
 ETOT 41  -625.46201573151    -1.091E-11 2.287E-05 2.422E-12 4.698E-09 1.249E-03
 ETOT 42  -625.46201573151     2.956E-12 1.786E-05 3.201E-12 2.839E-09 1.249E-03
 ETOT 43  -625.46201573151    -3.411E-12 2.382E-05 5.239E-13 6.703E-09 1.249E-03

 At SCF step   43       vres2   =  5.24E-13 < tolvrs=  1.00E-12 =>converged.

As you can see, the first few iterations are identical, but when Abinit uses the GPU the calculation becomes unstable.
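To pinpoint where the two runs part ways, the ETOT lines of both logs can be compared iteration by iteration. The following is only a minimal Python sketch: the function names and the two thresholds (`etot_tol`, `residm_ratio`) are arbitrary assumptions chosen for illustration, and the sample lines are a small excerpt copied from the two listings above.

```python
import re

# Matches ABINIT SCF lines such as:
#  ETOT  4  -625.44870336575    -1.135E-02 2.902E-03 5.599E+02 ...
# and captures iter, Etot, deltaE and residm.
ETOT_RE = re.compile(r"^\s*ETOT\s+(\d+)\s+(\S+)\s+(\S+)\s+(\S+)")

def parse_scf(text):
    """Return {iteration: (etot, deltaE, residm)} from SCF log text."""
    rows = {}
    for line in text.splitlines():
        m = ETOT_RE.match(line)
        if m:
            rows[int(m.group(1))] = tuple(float(x) for x in m.group(2, 3, 4))
    return rows

def first_divergence(cpu, gpu, etot_tol=1e-4, residm_ratio=10.0):
    """First common iteration where the GPU run departs from the CPU run.

    Both thresholds are illustrative assumptions, not ABINIT defaults.
    """
    for it in sorted(set(cpu) & set(gpu)):
        e_cpu, _, r_cpu = cpu[it]
        e_gpu, _, r_gpu = gpu[it]
        if abs(e_cpu - e_gpu) > etot_tol:
            return it, "etot"
        if r_gpu > residm_ratio * r_cpu:
            return it, "residm"
    return None

# Excerpts from the CPU and GPU listings above (iterations 3, 4 and 30).
CPU_SAMPLE = """
 ETOT  3  -625.43735474989    -2.659E-01 2.372E-02 1.982E+02 1.783E-01 4.997E-02
 ETOT  4  -625.44870333943    -1.135E-02 2.902E-03 5.599E+02 3.648E-02 5.826E-02
 ETOT 30  -625.46201573455    -5.564E-09 4.280E-05 1.315E-06 7.209E-06 1.249E-03
"""
GPU_SAMPLE = """
 ETOT  3  -625.43735474989    -2.659E-01 2.372E-02 1.982E+02 1.783E-01 4.997E-02
 ETOT  4  -625.44870336575    -1.135E-02 2.902E-03 5.599E+02 3.648E-02 5.826E-02
 ETOT 30  -625.46479403242    -2.394E-03 1.801E+01 2.448E-03 2.343E-04 1.194E-03
"""

print(first_divergence(parse_scf(CPU_SAMPLE), parse_scf(GPU_SAMPLE)))
```

Run on the complete logs rather than this excerpt, the same comparison would show how early the residual (residm) of the GPU run stops tracking the CPU run, well before the total energy itself blows up.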

The node is equipped with a Tesla K20 GPU and Abinit detects the device correctly:

Code: Select all

-P-0000
-P-0000  setdevice_cuda : COMMENT -
-P-0000   GPU 0 has been properly initialized, continuing...
-P-0000
-P-0000  ________________________________________________________________________________
-P-0000  ________________________ Graphic Card Properties _______________________________
-P-0000
-P-0000     Device                0 : Tesla K20c
-P-0000     Revision number:                   3.5
-P-0000     Total amount of global memory:  4799.6 Mbytes
-P-0000     Clock rate:                        0.7 GHz
-P-0000     Max GFLOP:                          73 GFP
-P-0000     Total  constant memory:          65536 bytes
-P-0000     Shared memory per block:         49152 bytes
-P-0000     Number of registers per block:   65536
-P-0000
-P-0000  ________________________________________________________________________________
-P-0000

and this is the build information:

Code: Select all

=== Build Information === 
  Version       : 7.4.3
  Build target  : x86_64_linux_intel13.0
  Build date    : 20131113
 
 === Compiler Suite ===
  C compiler       : intel13.0
  CFLAGS           :  -g -O2 -vec-report0
  C++ compiler     : intel13.0
  CXXFLAGS         :  -g -O2 -vec-report0
  Fortran compiler : intel13.0
  FCFLAGS          :  -g -extend-source -vec-report0 -noaltparam -nofpscomp
  FC_LDFLAGS       :    -static-intel -static-libgcc
 
 === Optimizations ===
  Debug level        : basic
  Optimization level : standard
  Architecture       : intel_xeon
 
 === MPI ===
  Parallel build : yes
  Parallel I/O   : auto
  Time tracing   : no
  GPU support    : yes
 
 === Connectors / Fallbacks ===
  Connectors on : yes
  Fallbacks on  : yes
  DFT flavor    : libxc+atompaw+bigdft-fallback+wannier90
  FFT flavor    : fftw3-mkl
  LINALG flavor : mkl
  MATH flavor   : none
  TIMER flavor  : abinit
  TRIO flavor   : netcdf-fallback+etsf_io-fallback+fox-fallback
 
 === Experimental features ===
  Bindings            : no
  Exports             : no
  GW double-precision : yes
 
 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 

 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 Default optimizations:
   -O2 -xHost


 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 CPP options activated during the build:

                  CC_INTEL                 CXX_INTEL                  FC_INTEL

          HAVE_DFT_ATOMPAW           HAVE_DFT_BIGDFT            HAVE_DFT_LIBXC

        HAVE_DFT_WANNIER90 HAVE_FC_ALLOCATABLE_DT...        HAVE_FC_CONTIGUOUS

           HAVE_FC_CPUTIME             HAVE_FC_ETIME              HAVE_FC_EXIT

             HAVE_FC_FLUSH             HAVE_FC_GAMMA            HAVE_FC_GETENV

            HAVE_FC_GETPID             HAVE_FC_IOMSG     HAVE_FC_ISO_C_BINDING

        HAVE_FC_LONG_LINES              HAVE_FC_NULL         HAVE_FC_STREAM_IO

                  HAVE_FFT        HAVE_FFT_FFTW3_MKL              HAVE_FFT_MPI

           HAVE_FFT_SERIAL                  HAVE_GPU             HAVE_GPU_CUDA

          HAVE_GPU_CUDA_DP           HAVE_GPU_SERIAL               HAVE_LINALG

         HAVE_LINALG_AXPBY        HAVE_LINALG_GEMM3M  HAVE_LINALG_MKL_IMATCOPY

   HAVE_LINALG_MKL_OMATADD  HAVE_LINALG_MKL_OMATCOPY        HAVE_LINALG_SERIAL

                  HAVE_MPI                 HAVE_MPI2               HAVE_MPI_IO

 HAVE_MPI_TYPE_CREATE_S...             HAVE_OS_LINUX                HAVE_TIMER

         HAVE_TIMER_ABINIT            HAVE_TIMER_MPI         HAVE_TIMER_SERIAL

         HAVE_TRIO_ETSF_IO             HAVE_TRIO_FOX          HAVE_TRIO_NETCDF

 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Finally, this is the configure command that was used to build Abinit:

Code: Select all

./configure \
--prefix="/gpfs/apps/abinit/v7.4.3_cuda" \
--enable-mpi \
--with-mpi-prefix="$MPI_HOME" \
--with-fft-flavor="fftw3-mkl" \
--with-fft-libs="-L$MKLROOT/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm" \
--with-linalg-flavor="mkl" \
--with-linalg-libs="-L$MKLROOT/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm" \
--enable-atompaw --with-atompaw-bins="$ATOMPAW_HOME/bin" --with-atompaw-libs="-L$ATOMPAW_HOME/lib -latompaw" \
--with-libxc-incs="-I$LIBXC_HOME/include" --with-libxc-libs="-L$LIBXC_HOME/lib -lxc" \
--enable-wannier90 --with-wannier90-bins="$WANNIER_ROOT" --with-wannier90-libs="-L$WANNIER_ROOT -lwannier" \
--with-dft-flavor="atompaw+bigdft+libxc+wannier90" \
--with-trio-flavor="etsf_io+fox+netcdf" \
--enable-clib="yes" \
--enable-gw-dpc="yes" \
--enable-gpu \
FC=mpif90 CC=mpicc CXX=mpiCC


Can you think of anything that might make Abinit unstable when it uses the GPU during a calculation?

Thank you very much in advance,
Farzad

jbeuken
Posts: 365
Joined: Tue Aug 18, 2009 9:24 pm

Re: ABINIT 7.4.3 + CUDA cause unstable result

Post by jbeuken » Fri Nov 29, 2013 9:30 am

Hi,

First, the minimum supported Intel compiler version is 13.0.1 or 13.1.3.

Next, this is the cuda.ac file we use to test CUDA support in ABINIT on our test farm:


Code: Select all

enable_mpi = yes
enable_mpi_io = yes
with_mpi_prefix = /usr/local/openmpi_gcc46
with_dft_flavor = none
with_trio_flavor = none
enable_gw_dpc = yes
#
with_linalg_flavor = mkl+magma
with_linalg_incs = -I${MAGMA_ROOT}/include -I${MKLROOT}/include
with_linalg_libs = -L${MAGMA_ROOT}/lib -Wl,--start-group -lmagma -lmagmablas -lcuda -Wl,--end-group -L${MKLROOT} -lmkl_scalapack_lp64 -lmkl_gf_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_openmpi_lp64 -lpthread -lm
#
enable_gpu = yes
with_gpu_flavor = cuda-double
NVCC_CFLAGS = -O3 -arch=sm_13 -Xptxas=-v --use_fast_math --compiler-options -O3,-fPIC
#
FC_LDFLAGS_EXTRA = -Wl,-z,muldefs


Then run the command:

Code: Select all

./configure --with-config-file=./cuda.ac


I hope it can help you,

jmb
------
Jean-Michel Beuken
Computer Scientist

f.hayati
Posts: 2
Joined: Fri Jul 15, 2011 1:08 pm
Location: UK

Re: ABINIT 7.4.3 + CUDA cause unstable result

Post by f.hayati » Tue Dec 03, 2013 11:45 am

Dear Beuken,

Thank you for your prompt reply.
I have talked to our HPC Service Manager about the changes you suggested. Unfortunately, due to the workload at this time of year, he cannot look into a new build before the New Year, so I will not be able to provide an update until then. I will update this thread as soon as we have applied your suggestions to the new build.

Warmest wishes,
Farzad

mike7
Posts: 3
Joined: Thu Oct 16, 2014 4:32 pm

Re: ABINIT 7.4.3 + CUDA cause unstable result

Post by mike7 » Thu Oct 16, 2014 5:14 pm

Hello.
I get the same result with 7.8.2 + CUDA + gfortran 4.6 + MKL + MAGMA 1.5: in the respfn teph1 tutorial, the calculation does not converge from the start. I had to set fftalg=112 to make it run at all, but then it does not converge within 800 iterations. If I set use_gpu_cuda=0 in the input file, it works.
Here is my config. I also tried varying some parameters taken from other configurations, but it makes no difference.

Code: Select all

enable_mpi="yes"
enable_mpi_io="yes"
with_mpi_prefix="/usr"
NVCC=/usr/local/cuda-6.0/bin/nvcc
#NVCC_CFLAGS="-arch=sm_30 -Xptxas=-v --use_fast_math --compiler-options -O3,-fPIC"
NVCC_CPPFLAGS="-arch=sm_30 -Xptxas=-v --use_fast_math --compiler-options -O3,-fPIC"
#with_dft_flavor=none
#with_trio_flavor=none
#with_fft_flavor="fftw3"
#with_fft_incs="-I/usr/include/"
#with_fft_libs="-L/usr/lib/x86-64-linux-gnu/ -lfftw3 -lfftw3f"
enable_gw_dpc=yes
with_linalg_flavor="mkl+magma"
with_linalg_incs="-I${MAGMA_ROOT}/include -I/home/mike/mkl/include"
#with_linalg_libs="-L${MAGMA_ROOT}/lib -Wl,--start-group -lmagma -lcuda -Wl,--end-group -L${MKLROOT} -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_openmpi_lp64 -lpthread -lm"
with_linalg_libs="-L/home/mike/mkl/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_lp64 -Wl,--start-group -L/home/mike/mkl/lib/intel64 -lmkl_gf_lp64 -lmkl_core -lmkl_sequential -Wl,--end-group -L/home/mike/mkl/lib/intel64/ -ldl -lpthread -lm -L${MAGMA_ROOT}/lib -lmagma -lcuda"
enable_gpu="yes"
with_gpu_flavor="cuda-double"
#NVCC_CFLAGS="-O3 -arch=sm_30 -Xptxas=-v --use_fast_math --compiler-options -O3,-fPIC"
#with-nvcc-flags="-O3 -arch=sm_30 -Xptxas=-v --use_fast_math --compiler-options -O3,-fPIC"
FC_LDFLAGS_EXTRA=-Wl,-z,muldefs -pthread


The non-CUDA version works fine: all tests pass in the different configurations I tried.

Jordan
Posts: 282
Joined: Tue May 07, 2013 9:47 am

Re: ABINIT 7.4.3 + CUDA cause unstable result

Post by Jordan » Fri Oct 17, 2014 10:50 am

Please read these posts: http://forum.abinit.org/viewtopic.php?f=2&t=2729#p8425 and http://forum.abinit.org/viewtopic.php?f=2&t=2729#p8437.
If you can downgrade your libraries to the tested versions, we will know whether the issue is due to them or not.

Cheers

Jordan

mike7
Posts: 3
Joined: Thu Oct 16, 2014 4:32 pm

Re: ABINIT 7.4.3 + CUDA cause unstable result

Post by mike7 » Sat Oct 18, 2014 1:04 pm

Thank you for the suggestions.
I compiled MAGMA 1.5.0 and 1.2.1 and Abinit 7.8.2 with Intel 13.0.1, gcc 4.6 and 4.8, and CUDA 6.0 and 6.5 in different combinations. I also replaced ScaLAPACK with LAPACK95. None of this improves anything.
With Intel 13 the tests very often fail with a segfault; in the same tests the GNU compiler usually gives instability or a slightly different result that exceeds the tolerance.
I used the "fast", "gpu" and "tutorespfn" keywords when testing.
For example, in the "gpu" group, tests 2 and 4 succeed, while tests 1 and 3 either pass or fail with an error slightly exceeding the tolerance. Changing compiler/library versions flips the result from pass to fail and back. The test runtime is often 2-3 times (or more) longer than with the non-GPU version.

For example, here is the "fast" test summary:

Code: Select all

[fast][t17-t19-t20-t21-t23]    failed    7.56    7.65
[fast][t25][np=1]    failed    1.12    1.15
[fast][t27-t28-t29]    failed    8.99    9.16
[fast][t00][np=1]    succeeded    1.53    1.64
[fast][t01][np=1]    succeeded    1.09    1.13
[fast][t02][np=1]    succeeded    2.63    2.69
[fast][t03-t05-t06-t07-t08-t09-t11-t12-t14-t16]    succeeded    11.90    12.36
[fast][t04][np=1]    succeeded    0.68    0.70
[fast][t24][np=1]    succeeded    1.27    1.32
[fast][t26][np=1]    succeeded    1.13    1.15
[fast][t30][np=1]    succeeded    0.90    0.96


For the non-CUDA version, the fast group has one test passed and all the others succeeded.
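When many compiler/library combinations are tried, it helps to tally such summaries automatically instead of reading them by eye. The sketch below is a hypothetical helper, not part of the ABINIT test suite: it assumes summary lines shaped exactly like the ones pasted above (suite, test name or chain, optional `[np=...]`, status, two trailing numbers) and counts one entry per line, so a chained entry such as t27-t28-t29 counts once.

```python
import re
from collections import Counter

# One summary line looks like:
#   [fast][t25][np=1]    failed    1.12    1.15
# The two trailing numbers are kept out of the tally (presumably timings).
LINE_RE = re.compile(
    r"^\[(\w+)\]\[([^\]]+)\](?:\[[^\]]*\])?\s+(\w+)\s+([\d.]+)\s+([\d.]+)"
)

def tally(summary):
    """Return (Counter of statuses, list of failed 'suite/test' labels)."""
    counts = Counter()
    failed = []
    for line in summary.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip blank or unrelated lines
        suite, tests, status = m.group(1), m.group(2), m.group(3)
        counts[status] += 1
        if status == "failed":
            failed.append(f"{suite}/{tests}")
    return counts, failed

# The "fast" summary from the post above.
SUMMARY = """
[fast][t17-t19-t20-t21-t23]    failed    7.56    7.65
[fast][t25][np=1]    failed    1.12    1.15
[fast][t27-t28-t29]    failed    8.99    9.16
[fast][t00][np=1]    succeeded    1.53    1.64
[fast][t01][np=1]    succeeded    1.09    1.13
[fast][t02][np=1]    succeeded    2.63    2.69
[fast][t03-t05-t06-t07-t08-t09-t11-t12-t14-t16]    succeeded    11.90    12.36
[fast][t04][np=1]    succeeded    0.68    0.70
[fast][t24][np=1]    succeeded    1.27    1.32
[fast][t26][np=1]    succeeded    1.13    1.15
[fast][t30][np=1]    succeeded    0.90    0.96
"""

counts, failed = tally(SUMMARY)
print(dict(counts))  # {'failed': 3, 'succeeded': 8}
print(failed)
```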

But I need Abinit for the teph_1 tutorial, and it does not converge there in any case. I also compared the outputs: with CUDA, Abinit sets fftalg=112, useylm=1 and lmnmax=9; without CUDA they are 312, 0 and 3. That is the only difference in the displayed parameters.

Jordan
Posts: 282
Joined: Tue May 07, 2013 9:47 am

Re: ABINIT 7.4.3 + CUDA cause unstable result

Post by Jordan » Mon Oct 20, 2014 9:52 am

The only tests actually run on GPU in our test farm are the "gpu" tests.
The other tests are not expected to succeed with GPU, although they should...
So please run ../../tests/runtests.py gpu to confirm that at least these tests pass.

The FFT difference, namely fftalg changing from 312 to 112, is not relevant: in the CPU case the FFT is performed on the CPU with the FFTW3 library, whereas when the GPU version is activated and a GPU is detected, the FFT is done on the GPU using the cuFFT library.

Note that the parameters needed to converge on CPU may differ from those needed on GPU, which could explain why in some cases a calculation converges on one but not on the other.

The GPU implementation is now a few years old (around 3 to 4) and is not often used, improved or checked (despite the 4 GPU tests we have), so it may be outdated. Furthermore, the speed-up from the GPU is around x3 or x4, which can be slower than a fast multicore CPU with 6 to 16 cores (or more), depending on what you want to achieve. So the question is: is it worth spending a lot of time getting the GPU version to work? In some cases it is (I used GPUs when I could not get access to a large number of CPUs, which in addition were very slow).

Hope that helps

Jordan

mike7
Posts: 3
Joined: Thu Oct 16, 2014 4:32 pm

Re: ABINIT 7.4.3 + CUDA cause unstable result

Post by mike7 » Tue Oct 21, 2014 7:26 pm

Hello.
I tested with the gpu test group, and all the tests are reported as failed. But looking into the files reveals that, except for the 4th test, the first three show many differences only between very small numbers, of order 1e-30. The 4th test has a larger error: the pressure tensor differs in the 3rd digit. The results also differ on every run, i.e. those small numbers are not the same from run to run. So it looks like uninitialized variables are being used on the GPU path.
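One way to separate harmless 1e-30-level noise from a real discrepancy such as the third-digit difference in the pressure tensor is to compare values with both a relative and an absolute tolerance. The snippet below is only an illustrative sketch and says nothing about how ABINIT's own test comparison works; the function name, the tolerances and the sample numbers are all hypothetical.

```python
import math

def significant_diffs(ref, new, rel_tol=1e-8, abs_tol=1e-12):
    """Return (index, ref, new) for pairs that differ beyond the tolerances.

    abs_tol absorbs noise around zero (e.g. 1e-30 vs -3e-31), while
    rel_tol catches genuine changes in the leading digits.
    """
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(ref, new))
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
    ]

# Hypothetical values: the first pair mimics 1e-30-level noise, the second
# a stress component that differs in the third significant digit.
reference = [1.2e-30, 1.234e-04]
gpu_run = [-3.4e-31, 1.241e-04]

print(significant_diffs(reference, gpu_run))
```

With these made-up numbers only the second pair is reported; the first is treated as numerically zero on both sides.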
I also measured the speed: using MKL improves the speed by a factor of 1.5 and makes 1 CPU as fast as, or a little faster than, the 1-GPU version. I needed the GPU version to use Abinit on a supercomputer that has 80% of its power in GPUs.

I also reduced the input file for the response-function tutorial in which I noticed the lack of convergence:

Code: Select all

nband        3

tolvrs 1.e-8
kptopt  1
ngkpt        4 4 4
ecut         4.0
nshiftk      1
shiftk       0.0 0.0 0.0
acell        3*7.5
rprim
 0.0 0.5 0.5
 0.5 0.0 0.5
 0.5 0.5 0.0
occopt       3
tsmear       0.001
natom        1
typat        1
xred         0.00 0.00 0.00
nstep        800
ntypat       1
znucl        13

If I change nband from 2 to 3, the calculation does not converge: at the iteration where it should converge, the energy starts to oscillate. With nband=2 it converges but gives a slightly different result from the CPU version. Maybe this will give you a clue as to where the error is.
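To reproduce this comparison systematically, the four combinations (nband 2 or 3, GPU on or off) can be generated from the reduced input. The sketch below is a hypothetical helper: the function name and variant labels are made up, and it relies only on the use_gpu_cuda input variable mentioned earlier in this thread. Pseudopotential and file-name settings are deliberately left out, as they are in the reduced input.

```python
# Reduced input from the post above; nband is stripped and re-added per variant.
BASE_INPUT = """\
nband        3

tolvrs 1.e-8
kptopt  1
ngkpt        4 4 4
ecut         4.0
nshiftk      1
shiftk       0.0 0.0 0.0
acell        3*7.5
rprim
 0.0 0.5 0.5
 0.5 0.0 0.5
 0.5 0.5 0.0
occopt       3
tsmear       0.001
natom        1
typat        1
xred         0.00 0.00 0.00
nstep        800
ntypat       1
znucl        13
"""

def make_variants(base, nband_values=(2, 3), gpu_flags=(0, 1)):
    """Return {label: input_text} for every nband / use_gpu_cuda combination."""
    # Drop any existing nband or use_gpu_cuda line so each variant sets its own.
    kept = [
        line for line in base.splitlines()
        if line.split()[:1] not in (["nband"], ["use_gpu_cuda"])
    ]
    variants = {}
    for nband in nband_values:
        for gpu in gpu_flags:
            extra = [f"nband {nband}", f"use_gpu_cuda {gpu}"]
            variants[f"nband{nband}_gpu{gpu}"] = "\n".join(kept + extra) + "\n"
    return variants

variants = make_variants(BASE_INPUT)
for label in sorted(variants):
    print(label)
```

Each returned text can then be written to its own input file and run with the same executable, so that only nband and the GPU switch change between runs.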

Jordan
Posts: 282
Joined: Tue May 07, 2013 9:47 am

Re: ABINIT 7.4.3 + CUDA cause unstable result

Post by Jordan » Thu Oct 23, 2014 11:18 am

I cannot give you a solution or explanation right now, but we will investigate very soon and will provide more information on the supported libraries for GPU.
We will also try to provide more GPU tests.

About your input file: you are considering a metallic system with a Fermi-Dirac distribution. Depending on the number of valence electrons in your pseudopotential (3, I guess), you may not have enough bands with nband=2, occopt 3 and tsmear 0.001. Please check that you have at least one empty band in your calculation. If not, increase nband until the occupation of the last band is at most 0.000 in the log or output file, then compare the GPU and CPU results.
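The band count behind this advice can be checked with simple arithmetic. The sketch below assumes a spin-unpolarized calculation (two electrons per band) and the 3 valence electrons guessed above; the function name and its parameters are hypothetical. It only gives a lower bound: the real check remains the occupation of the last band in the log or output file.

```python
import math

def min_nband(n_valence_electrons, n_empty=1, spin_degenerate=True):
    """Lower bound on nband: bands needed to hold the electrons plus n_empty."""
    electrons_per_band = 2 if spin_degenerate else 1
    filled = math.ceil(n_valence_electrons / electrons_per_band)
    return filled + n_empty

# 3 valence electrons need 1.5 bands, so 2 bands are at least partly occupied;
# one extra empty band gives nband >= 3.
print(min_nband(3))  # 3
```

Under these assumptions nband=2 leaves no empty band, which is consistent with the suggestion to increase nband before comparing GPU and CPU results.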

Jordan
