Hello,
Here are execution times of Abinit compiled with and without the ACML library, using gfortran and the internal FFT.
System: AMD 8350 Vishera 4.0 GHz CPU (8 cores), 32 GB memory. Input: SCF calculation for 34 Si atoms in an FCC cell.
Execution times per core (8 cores):
Abinit 7.4.2 compiled with ACML: 1861
Abinit 7.4.2 without internal lib: 1890
Abinit 7.2.1 - binary from the Abinit site: 1883
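To put the three timings in perspective, the relative differences can be worked out as in the sketch below. It is only illustrative: the labels are copied from the list above, taking the second entry as the baseline is an assumption, and no time unit is attached because the post does not state one.

```python
# Per-core execution times as quoted in the post (unit not stated there).
TIMES = {
    "Abinit 7.4.2 compiled with ACML": 1861,
    "Abinit 7.4.2 without internal lib": 1890,
    "Abinit 7.2.1 binary": 1883,
}


def gain_percent(baseline, value):
    """Relative time saved with respect to the baseline, in percent."""
    return 100.0 * (baseline - value) / baseline


# Assumption: the second entry is used as the reference build.
BASELINE = TIMES["Abinit 7.4.2 without internal lib"]

for label, value in TIMES.items():
    print(f"{label}: {value} ({gain_percent(BASELINE, value):+.2f}%)")
```

On these numbers the spread between the three builds is under 2%.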
However, for phonon calculations on the same system, Abinit with the ACML library uses over 32 GB of memory,
whereas Abinit from the web site or compiled with internal libs uses only ~19 GB.
Any suggestions why? Isn't the ACML library supposed to be optimized for AMD processors?
Thank you,
Jan Gryko
Execution time and memory with ACML
Moderators: fgoudreault, mcote
Forum rules
Please have a look at ~abinit/doc/config/build-config.ac in the source package for detailed and up-to-date information about the configuration of Abinit 8 builds.
For a video explanation on how to build Abinit 7.x for Linux, please go to: http://www.youtube.com/watch?v=DppLQ-KQA68.
IMPORTANT: when an answer solves your problem, please check the little green V-like button on its upper-right corner to accept it.
- Alain_Jacques
- Posts: 279
- Joined: Sat Aug 15, 2009 9:34 pm
- Location: Université catholique de Louvain - Belgium
Re: Execution time and memory with ACML
Hi Jan,
First of all, if you invest time in optimizing Abinit, I would compile it with FFTW3 - an enhanced FFT lib is probably more effective than enhanced BLAS/LAPACK.
Although AMD markets the 8350 as an 8 "cores" CPU (very deceptively IMHO), it only has 4 compute units, i.e. 4 caches and 4 arithmetic units. So it is what I (and Intel too) would call a hyperthreaded 4-core CPU. May I suggest comparing ACML to plain BLAS/LAPACK with only 4 parallel threads, to avoid overloading? I don't know how you ran the test (either several MPI slots with single-threaded BLAS/LAPACK, or sequential with multithreaded BLAS/LAPACK - the former is the most efficient and will benefit from enhanced libs).
I have no clue about the memory footprint discrepancy - ACML is multithreaded but it shouldn't replicate data for this to work.
ACML is supposed to be optimized ... but I get better results with MKL or OpenBLAS on AMD CPUs.
Kind regards,
Alain
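As a rough illustration of the two test setups mentioned above, the sketch below builds the corresponding launch lines, each limited to 4 threads in total. It makes several assumptions: that the threaded BLAS/LAPACK build honours the standard OpenMP variable OMP_NUM_THREADS, that the executable is simply called abinit, and that everything runs on a single machine. The helper name launch_line is hypothetical, and input redirection is left out.

```python
import shlex


def launch_line(mpi_ranks, blas_threads, executable="abinit"):
    """Return a shell line that fixes the BLAS thread count for an MPI run.

    OMP_NUM_THREADS is the standard OpenMP variable; whether a given
    BLAS/LAPACK build honours it is an assumption to verify locally.
    """
    env = f"OMP_NUM_THREADS={int(blas_threads)}"
    cmd = ["mpirun", "-np", str(int(mpi_ranks)), executable]
    return env + " " + " ".join(shlex.quote(part) for part in cmd)


# Several MPI slots with single-threaded BLAS/LAPACK.
print(launch_line(4, 1))
# Sequential run with multithreaded BLAS/LAPACK.
print(launch_line(1, 4))
```

Both lines keep ranks times threads equal to 4, which matches the number of compute units quoted for this CPU.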
Re: Execution time and memory with ACML
Thank you very much for your quick answer. Here are more tests with the two ACML libraries, libacml and libacml_mp, on the Vishera 8350:
Abinit linked with libacml, run with mpirun -np 4 or mpirun -np 8, uses 4 or 8 cores at almost 100%.
Abinit linked with libacml_mp, run with mpirun -np 4, uses 8 cores at about 70 - 80%.
Abinit linked with libacml_mp, run with mpirun -np 8, uses 8 cores at 100%, but the execution time is about 5 - 7 times longer.
Changing the subject - I am trying to link Abinit with FFTW3. I installed fftw3-3.3 and all tests reported fine, but when linking with Abinit several subroutines are missing, for example:
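One possible reading of these three observations is thread oversubscription, which the arithmetic below sketches. The key assumption, not confirmed anywhere in this thread, is that libacml_mp starts one thread per core in every MPI process when OMP_NUM_THREADS is not set; the function name is hypothetical.

```python
def oversubscription(mpi_ranks, threads_per_rank, cores):
    """Return (total threads, threads per core) for a hybrid MPI/threaded run."""
    total = mpi_ranks * threads_per_rank
    return total, total / cores


CORES = 8  # core count as reported for the CPU in the first post

# threads_per_rank = 8 for libacml_mp is an ASSUMPTION (one thread per core
# by default); libacml is taken to be single-threaded.
CASES = [
    ("libacml, -np 4", 4, 1),
    ("libacml, -np 8", 8, 1),
    ("libacml_mp, -np 4", 4, 8),
    ("libacml_mp, -np 8", 8, 8),
]

for label, ranks, threads in CASES:
    total, per_core = oversubscription(ranks, threads, CORES)
    print(f"{label}: {total} threads, {per_core:.1f} per core")
```

If that assumption holds, the last case puts 64 threads on 8 cores, and limiting the threaded library to one thread per MPI process would be the first thing to test.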
../../src/52_fft_mpi_noabirule/lib52_fft_mpi_noabirule.a(m_fftw3.o): In function `__m_fftw3_MOD_cplan_many_dft':
/home/gryko/abinit-7.4.2/src/52_fft_mpi_noabirule/m_fftw3.F90:2908: undefined reference to `sfftw_plan_many_dft_'
../../src/52_fft_mpi_noabirule/lib52_fft_mpi_noabirule.a(m_fftw3.o): In function `__m_fftw3_MOD_fftw3_c2c_op_spc':
/home/gryko/abinit-7.4.2/src/52_fft_mpi_noabirule/m_fftw3.F90:1802: undefined reference to `sfftw_execute_dft_'
../../src/52_fft_mpi_noabirule/lib52_fft_mpi_noabirule.a(m_fftw3.o): In function `__m_fftw3_MOD_fftw3_execute_dft_spc':
/home/gryko/abinit-7.4.2/src/52_fft_mpi_noabirule/m_fftw3.F90:3215: undefined reference to `sfftw_execute_dft_'
/home/gryko/abinit-7.4.2/src/52_fft_mpi_noabirule/m_fftw3.F90:3215: undefined reference to `sfftw_execute_dft_'
/home/gryko/abinit-7.4.2/src/52_fft_mpi_noabirule/m_fftw3.F90:3215: undefined reference to `sfftw_execute_dft_'
/home/gryko/abinit-7.4.2/src/52_fft_mpi_noabirule/m_fftw3.F90:3215: undefined reference to `sfftw_execute_dft_'
../../src/52_fft_mpi_noabirule/lib52_fft_mpi_noabirule.a(m_fftw3.o):/home/gryko/abinit-7.4.2/src/52_fft_mpi_noabirule/m_fftw3.F90:3215: more undefined references to `sfftw_execute_dft_' follow
Any suggestions why?
Thank you very much in advance,
Jan Gryko
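A note on the missing symbols: in FFTW3's legacy Fortran interface the routine prefix encodes the precision (dfftw_ for double, sfftw_ for single, lfftw_ for long double), and each precision lives in its own library. The sketch below scans linker output and reports which FFTW library the undefined symbols belong to. That this particular installation lacks the single-precision library is an assumption, and the function name is hypothetical.

```python
import re

# Prefix of the legacy Fortran wrappers -> (library, FFTW configure option).
FFTW_PRECISIONS = {
    "dfftw_": ("libfftw3", "default build (double precision)"),
    "sfftw_": ("libfftw3f", "--enable-float"),
    "lfftw_": ("libfftw3l", "--enable-long-double"),
}

# Matches GNU ld messages such as: undefined reference to `sfftw_execute_dft_'
UNDEFINED = re.compile(r"undefined references? to `(\w+)'")


def missing_fftw_libs(linker_output):
    """Map each FFTW library to the sorted undefined symbols it should provide."""
    needed = {}
    for symbol in UNDEFINED.findall(linker_output):
        entry = FFTW_PRECISIONS.get(symbol[:6])
        if entry is not None:
            needed.setdefault(entry[0], set()).add(symbol)
    return {lib: sorted(symbols) for lib, symbols in needed.items()}


SAMPLE = """
m_fftw3.F90:2908: undefined reference to `sfftw_plan_many_dft_'
m_fftw3.F90:1802: undefined reference to `sfftw_execute_dft_'
m_fftw3.F90:3215: more undefined references to `sfftw_execute_dft_' follow
"""

for lib, symbols in missing_fftw_libs(SAMPLE).items():
    print(lib, "should provide:", ", ".join(symbols))
```

For the messages quoted in the post, every symbol points to the single-precision library, which FFTW only builds when configured with --enable-float, so it may be worth checking that libfftw3f exists and is on the link line.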