Input Variable (rename)

mcote · Post by **mcote** » Wed Oct 21, 2009 5:58 pm

Dear all,

During the last Abinit Workshop, we discussed the possibility of renaming the input variables. The goal is to make it easier for new users to figure out the meaning of each input variable and to establish a scheme for naming new variables. This process needs to be thought through carefully, as it requires many changes to the code.

It is a good time to start reflecting on this subject, and the modifications could be implemented in upcoming versions, most likely between versions 6.0 and 6.1 and later.

I have discussed the issue with Paul Boulanger and Simon Blackburn, and we have come up with a "first" classification that should help in the renaming process. It is based on a Class - Subclass - Subsubclass classification:

Atomic structure
-> Alchemical potential
-> Jellium
Electronic structure parameters
-> basis
-> paw
-> spin
-> dft
-> k-points
-> convergence
-> bands
Method
-> Ground state
-> DFT+U
-> Relaxation
-> Berry phase
-> BigDFT
-> DOS
-> PBE0
-> MD
-> positron
-> perturbation
-> STM
-> Wannier90
-> GW
-> Screening
-> Sigma
-> Screening and Sigma
-> Response function
-> Linear response
-> Non-linear response
-> Parallel
-> TDDFT
-> General
-> parallel
-> DFT
Dataset
Development

Paul and Simon went through all of the present input variables and assigned them to these categories. The result of this exercise is in the attached Excel file.

At the moment, we are looking for a few "volunteers" to participate in a committee to discuss this first proposal and present a naming scheme. Ideally, this committee should contain 4-6 experienced users and developers.

Any volunteer?

Michel

jzwanzig · Post by **jzwanzig** » Wed Oct 21, 2009 7:05 pm

Hi, I can help with this effort.

Post by **pouillon** » Wed Oct 21, 2009 7:05 pm

Do you already have proposals about the nomenclature? If not, I recommend having a look at what they did for Octopus.

I can help in particular with the practical aspects of the migration to the new naming scheme once it is defined.
pouillon

gonze · Post by **gonze** » Wed Oct 21, 2009 7:06 pm

A few things to pay attention to :
(1) At present, in ABINIT, in order to avoid a mess and facilitate the documentation,
the external names of input variables are exactly equal to the internal names
in the dtset type. It is very important not to break this rule, so that a change of name
will have to be done in the whole ABINIT, not simply using an interface. It is not a real
problem, indeed one can use a script to make the change automatically.
This is to be kept in mind, though.
(2) What will be the maximum length of input variables allowed ? Indeed, there is a
hard-coded value of the length of the input variable token, appearing e.g. in prttagm.F90,
in order to allow nicely formatted echo of the input variables and their input (or output) values.
At present, it is set to 9 , which is very small according to modern usage of names.
It would be too much of a hassle (and likely break the nice formatting of the echo of variables) to allow
for no limit (that I think might be restricted to 32 characters, if we take into account the ETSF_IO experience).
Perhaps going to 16 or 24 characters might be fine ?
(3) There should be a transition period, during which the old names are still allowed.
(4) The names have to be changed in the input variables (e.g. all input files), reference output files (this will be done automatically),
the sources, and also in the doc, including the tutorials. Only a script can be sufficiently systematic to leave no instance of the
old variables ...
(5) I suggest running a pilot experiment with two or three changes of variable names, to see where the pitfalls are, before
trying to change variable names massively. This should be done rather early : it might impact the choice of names ...
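
The script-based renaming mentioned in points (1) and (4) could look roughly like the sketch below. It is only an illustration: the function `rename_tokens` and the example mapping are assumptions, not an existing ABINIT tool, and a real script would also have to deal with multi-dataset suffixes, the documentation and the reference files.

```python
import re

def rename_tokens(text, mapping):
    """Replace whole-word occurrences of old variable names by new ones.

    `mapping` is a hypothetical {old_name: new_name} table.
    """
    if not mapping:
        return text
    # \b ensures that only complete tokens match (nkpt, but not nkptgw).
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], text)

# Hypothetical mapping, for illustration only.
mapping = {"nkpt": "number_of_k_points"}

source_line = "do ikpt=1,dtset%nkpt  ! loop over nkpt k points"
print(rename_tokens(source_line, mapping))
```

Because the same function can be applied to input files and to Fortran sources, the external and internal names stay identical after the change, as required by point (1).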

Xavier

gonze · Post by **gonze** » Wed Oct 21, 2009 7:07 pm

Dear all,

Likely most developers have other priorities ...
Nevertheless, I would like to progress at one level : the maximum length of input variables.
On the forum, I wrote, answering your mail :

--------------------------
(2) What will be the maximum length of input variables allowed ? Indeed, there is a
hard-coded value of the length of the input variable token, appearing e.g. in prttagm.F90,
in order to allow nicely formatted echo of the input variables and their input (or output) values.
At present, it is set to 9 , which is very small according to modern usage of names.
It would be too much of a hassle (and likely break the nice formatting of the echo of variables) to allow
for no limit (that I think might be restricted to 32 characters, if we take into account the ETSF_IO experience).
Perhaps going to 16 or 24 characters might be fine ?
---------------------------

It happens that, in order to start the new automatic tests that check that all input variables
are tested and documented, I had to clean the checking routines, and already some input variables
are longer than 9 characters, thus they are not echoed correctly. So, I would like to switch from
the maximal length of 9 characters to a larger value just after having merged the contributions from v5.9.3 .
The biggest impact will be in the echo of input variables, the section of output file that starts with

-outvars: echo values of preprocessed input variables --------
    acell    1.0263110000E+01 1.0263110000E+01 1.0263110000E+01 Bohr
      amu    2.80855000E+01
   diemac    1.00000000E+00
   diemix    3.33333333E-01
     ecut    3.99000000E+00 Hartree
   enunit    2
    intxc    1
   irdwfk    1

Because I will have no time to spend to completely change the output format, or to code something more elaborate,
I plan simply to shift the alignment to the right, by the number of added characters.
I think we should go at least to 16, but perhaps 18 or 20 . E.g. , with 16 :

-outvars: echo values of preprocessed input variables --------
           acell    1.0263110000E+01 1.0263110000E+01 1.0263110000E+01 Bohr
             amu    2.80855000E+01
          diemac    1.00000000E+00
          diemix    3.33333333E-01
            ecut    3.99000000E+00 Hartree
          enunit    2
           intxc    1
          irdwfk    1
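
The effect of the shift can be illustrated with a small sketch. It assumes that names are right-justified in a fixed-width field, which is a guess at the layout and not the actual format coded in prttagm.F90; `echo_line` is a hypothetical helper.

```python
def echo_line(name, values, width=9):
    """Format one echo line with the name right-justified in `width` columns."""
    if len(name) > width:
        raise ValueError(f"{name!r} is longer than {width} characters")
    return f"{name:>{width}} {values}"

for width in (9, 16):
    print(f"--- token length {width} ---")
    print(echo_line("ecut", "3.99000000E+00 Hartree", width))
    print(echo_line("irdwfk", "1", width))

# A 16-character name only fits in the wider field (the values are placeholders).
print(echo_line("scphon_supercell", "1 1 1", 16))
```

With a width of 9, the last call would raise an error, which mirrors the problem of names that are not echoed correctly.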

Going to 16 is already such a large change from the current standard that I am not sure one should consider bigger numbers.
The longest name of an input variable at present is scphon_supercell (16 characters).

Any comments/suggestions ?

Xavier

mverstra · Post by **mverstra** » Wed Oct 21, 2009 7:07 pm

scphon_ stuff is my fault - it can be shortened of course if necessary.

As to the class structure it would be very nice (what is the stray dft
subclass in method? it seems ambiguous with the others in some way...)
- will these correspond to sub objects of dtset, which can be passed
one at a time instead of all together? E.g. without giving parallelism
info to low level processes. It is important that the dtset and input
variable restructurings go together. I think Matteo is the most active
in the former rethink (I have added him to the conversation).

ciao tutti

Matthieu

jzwanzig · Post by **jzwanzig** » Wed Oct 21, 2009 7:08 pm

Or, you could represent "class" by a single letter, "subclass" by a
single letter, "subsubclass" by a single letter, etc. Then variable
names would look like aac_ecut, dab_qpt, etc, depending on the mapping
from categories to letters. Then it is only 4 characters longer than
at present, and has the advantage of being easy to write automation
scripts that deal with all variables of a certain type (only need to
parse the first character or first and second and so forth).
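
As an illustration of the scripting argument, here is a minimal sketch. The prefix letters and the table that maps them to classes are invented for the example; no such mapping has been defined in this thread.

```python
from collections import defaultdict

# Hypothetical mapping from the first prefix letter to a class name.
CLASS_LETTERS = {"a": "Electronic structure parameters", "d": "Method"}

def group_by_class(names):
    """Group prefixed names such as 'aac_ecut' by their first (class) letter."""
    groups = defaultdict(list)
    for name in names:
        prefix, _, keyword = name.partition("_")
        groups[CLASS_LETTERS.get(prefix[:1], "unknown")].append(keyword)
    return dict(groups)

print(group_by_class(["aac_ecut", "dab_qpt"]))
```

Selecting by subclass or subsubclass would only require looking at the second or third letter of the prefix.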

Joe

mverstra · Post by **mverstra** » Wed Oct 21, 2009 7:14 pm

Hi Joe,

I would not be that succinct: input variables are the main interaction
with users, and if they become even more obscure or one has to go
through the doc to understand what aacd_stuff means we have only
gained scripting efficiency. It's also easier to forget which aac_ to
put in front, whereas everyone remembers "ecut"... Ergonomics is a
horrible science.

M.

dcaliste · Post by **dcaliste** » Wed Oct 21, 2009 7:15 pm

Hello,

On 18/10/2009, Michel Cote <Michel.Cote@umontreal.ca> wrote:
These classes and subclasses are only first guesses and they need to
be fully discussed.
I very much like the idea of classifying the input variables, especially in
the output, as you propose in your first email, Michel. I see many
advantages to this:
- clearer output for users, better and incremental understanding of the
DFT (beginners can look at struct and cv params and ignore others...).
- it forces the developers to separate dtset into smaller types that are
consistent by themselves (set of k points, atomic structure, elec cv
params, geo cv params...). So later we may create smaller types, put
them all into dataset for the higher routines and pass the small ones
to the lower routines. I love the idea.
- having dtset%kpoints%wkpt or dtset%ecv%ecut is not so nice but not
worse than dtset_kpoints_wkpt, and it has the definite advantage that
for lower routines, only kpoints%wkpt or even wkpt will be used. We thus
keep the code short and clear. The disadvantage of long names is
that lines become verrry long and less readable (see the ETSF_IO
related routines, I know, I created them ; but I don't regret it either).
- we keep the current names, which is good for users. We can extend
maybe from 9 to 12 for better clarity of some variables, but most of
them will keep their names, also in the output. Backward compatibility
is a nice bonus.
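
The idea of smaller types grouped inside the dataset can be sketched as follows. This is a Python analogue of Fortran derived types, not ABINIT code; the type names `Kpoints`, `ElecConv` and `Dataset` and the routine `sum_weights` are assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Kpoints:
    wkpt: list = field(default_factory=list)   # k-point weights

@dataclass
class ElecConv:
    ecut: float = 0.0                          # plane-wave cutoff

@dataclass
class Dataset:
    kpoints: Kpoints = field(default_factory=Kpoints)
    ecv: ElecConv = field(default_factory=ElecConv)

def sum_weights(kpoints):
    """Low-level routine: it sees only the k-point sub-object, not the whole dataset."""
    return sum(kpoints.wkpt)

dtset = Dataset(Kpoints(wkpt=[0.25, 0.75]), ElecConv(ecut=3.99))
print(sum_weights(dtset.kpoints))
```

Higher-level routines keep receiving the full `dtset`, while lower-level ones only declare the sub-object they actually use.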

So I really love the idea of subclassing like that.

Now, the drawbacks I see in this approach (but I think that Michel's
group has a lot of experience here, since they have been testing this for
several months):
- There is no unique partition of the input variables. What do we do when a
variable should be in two subclasses ? My answer in that case,
following what has been done in other computing domains like toolkits, is
to duplicate. GTK in its first version (Gtk+1.2) kept references to a
unique value. This led to many memory management issues, several
segfaults and inconsistencies, because an object could modify the value
without all the other objects using it being aware of the change. So in Gtk+2.x, which
has existed since 2002, everything is copied, and each object depends
on its own variables. This considerably hardens the code and simplifies
the life of developers. The drawback of memory consumption and copy time
is not important, since copies are done on creation and memory is not an
issue for scalars and small arrays (even the array of atom coordinates
for 1000 atoms is very small, compared to cg in that case).
- the transformation of the code will not be automatic. At least in the
final step, to pass from dtset to kpoints in a routine declaration if
inside only dtset%kpoints are used. But the first step to pass from
dtset%ecut to dtset%ecv%ecut can be automatic.
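
The "duplicate instead of share" choice described above can be sketched like this. The class names `SubclassA` and `SubclassB` are placeholders; the only point is that each object takes its own copy at creation time.

```python
import copy

class SubclassA:
    def __init__(self, shared):
        self.shared = copy.deepcopy(shared)   # own copy, made once at creation

class SubclassB:
    def __init__(self, shared):
        self.shared = copy.deepcopy(shared)   # independent from SubclassA's copy

values = [1, 2, 3]
a, b = SubclassA(values), SubclassB(values)
a.shared[0] = 99          # a modifies its own copy only
print(b.shared)           # b still holds the values it was created with
```

The price is that a later change of the value must be propagated explicitly to every object that holds a copy.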

If this is a good direction, may we suggest a first subclassing of all
input variable to begin the war of "no this variable should be in A"
"no in B" ?

Damien.

PS : I know that you don't worry for that, but whatever the proposed
solution up to now, they are still compatible with the bindings I've
done in C (with more or less work). So good.

gonze · Post by **gonze** » Wed Oct 21, 2009 7:15 pm

Dear all,

There are many interesting points raised in the different mails.
Still, the final picture is not yet clear for me.
I have written below some of my concerns.

But, in a more pragmatic approach, I think I can go forward with
extending the length of input variables...
I think to 30 characters for reading, while for writing, there might be two cases :
either the input variable name is "long"
(between, let's say 16 and 30 characters), or it is short (smaller than 16 characters).
For the short case, one would keep the present formatting (extended to 16 characters), while for the long case, one
might simply insert a carriage return after the name of the input variable

short_name1 1 2 3
short_name2 8 9 1
very_very_very_long_name
1 2 3 4 5 6 9
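
A minimal sketch of this two-case echo is given below. The helper `echo_variable` is hypothetical, and treating names of exactly 16 characters as "short" is an assumption, since the boundary is not fixed above.

```python
def echo_variable(name, values, short_width=16, max_width=30):
    """Echo one variable, breaking the line after a long name."""
    if len(name) > max_width:
        raise ValueError(f"{name!r} exceeds {max_width} characters")
    text = " ".join(str(v) for v in values)
    if len(name) <= short_width:
        # Short case: usual one-line format, name field of short_width columns.
        return f"{name:>{short_width}} {text}"
    # Long case: the name alone on its line, the values on the next one.
    return f"{name}\n{' ' * short_width} {text}"

print(echo_variable("short_name1", [1, 2, 3]))
print(echo_variable("very_very_very_long_name", [1, 2, 3, 4, 5, 6, 9]))
```

In both cases the values start in the same column, so the existing formatting of the values themselves does not have to change.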

Obviously this does not preclude introducing a class/subclass/subsubclass system in the future.

The discussion about changing the input variable names
can then proceed without worrying about a deadline ...

Best regards,
Xavier

------------------------------------------------------------------------------

I. About the long-term "ideal" situation

What I would like to have clarified are the rules to be followed between :
(1) the keywords to be used for input
(2) the internal representation (possibly hierarchy of datastructures)
(3) the actual echo
and also the possibility of non-uniqueness at each of these levels (well, perhaps not for the
actual echo, that will be unique ...).

Indeed : at present, representations (1), (2), and (3) are completely identical,
and this is a nice feature of ABINIT, I think.
Indeed, let's take nkpt as an example :
- nkpt is the keyword to be read
- it is stored inside dtset%nkpt
- it is echoed as nkpt .
So, there is no confusion.
Also, at the level of the ordering, for representations (2) and (3), everything presently follows well-established
rules : alphabetical order for echo, and scalar/array order, followed by alphabetic (sub)order.
Such an "automatic" classification saves time for the developer (no thinking is needed, as there is no regrouping to be done).

When we started to discuss changing the names, I thought that this would simply
involve replacing keywords by some other keywords, with possibly a period of time
during which the two names would be allowed as input :
e.g. nkpt might have been replaced by number_of_k_points (I take the ETSF standard as an example).
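
Such a period during which both names are accepted could be handled by an alias table, as in the sketch below. The table `ALIASES` and the function `resolve_name` are assumptions made for the illustration, not part of the ABINIT parser.

```python
import sys

# Hypothetical alias table: old name -> new name.
ALIASES = {"nkpt": "number_of_k_points"}

def resolve_name(token):
    """Return the current name for `token`, warning when an old name is used."""
    new = ALIASES.get(token)
    if new is None:
        return token                      # already a current name
    print(f"warning: '{token}' is deprecated, use '{new}'", file=sys.stderr)
    return new

print(resolve_name("nkpt"))
print(resolve_name("number_of_k_points"))
```

At the end of the transition, emptying the table is enough to stop accepting the old names.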

I recognize that some input variables names are rather obscure, so that changing their name
is an excellent idea. And to have a class_subclass_subsubclass (or, as a first step, a class_keyword)
is a way to generate consistent names throughout.

But the recent discussions seem to go beyond that, because
potentially, the representations (1), (2) and (3) might become different (e.g. having the possibility to mention only the subsubclass in the echo)
and perhaps non-unique (a subsubclass might belong to two different subclasses, as mentioned by Damien).

I would favour keeping coherent representations (1) (2) (3), with e.g.
(1) input variable kpoints_wkpt
(2) internal representation dtset%kpoints%wkpt (allowing to pass kpoints%wkpt as argument of some routine)
(3) echo kpoints_wkpt
Then, the correct structuring of the input variables should follow the intended existence of the datastructure inside ABINIT.
Namely, if for internal purposes, there should be a datastructure called "kpoints", then its content should be determined
by its use for coding purposes, not by logical concern as seen by users.
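
The one-to-one correspondence between representations (1) and (2) can be made explicit with a pair of conversion functions. This sketch assumes a class_keyword convention in which the class name contains no underscore; both function names are hypothetical.

```python
def input_to_internal(keyword, root="dtset"):
    """'kpoints_wkpt' -> 'dtset%kpoints%wkpt' (class_keyword convention)."""
    cls, sep, name = keyword.partition("_")
    if not sep:
        return f"{root}%{keyword}"        # unclassified variable, e.g. 'nkpt'
    return f"{root}%{cls}%{name}"

def internal_to_input(path, root="dtset"):
    """'dtset%kpoints%wkpt' -> 'kpoints_wkpt'."""
    parts = path.split("%")
    if parts[0] != root or len(parts) < 2:
        raise ValueError(f"not a {root} component: {path!r}")
    return "_".join(parts[1:])

print(input_to_internal("kpoints_wkpt"))
print(internal_to_input("dtset%kpoints%wkpt"))
```

Note that an existing name that already contains an underscore, such as scphon_supercell, would be misread as class scphon under this convention, so the rule for separators would have to be fixed before any renaming.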

Also, will a subsubclass keyword be unique ? That is : will it be forbidden to use the same subsubclass keyword for two different variables
that happen to belong to different class_subclass ? If so, and if the subsubclass keyword is sufficient to specify the input variable,
will it be possible to use it without mentioning the class/subclass ?

II. About the transition to the ideal situation.

One must realize that changing all input variable names, keeping coherency with an internal representation, is a huge task,
that must be prepared by writing scripts ...
In order to prepare it correctly, I think that there should be a pilot project, doing the change for several input variables,
e.g. forming a data structure, so that one can see really where are the problems, and what are the implications of the change.

mverstra · Post by **mverstra** » Wed Oct 21, 2009 7:25 pm

gonze wrote:
But, in a more pragmatic approach, I think I can go forward with
extending the length of input variables...
I think to 30 characters for reading, while for writing, there might be two cases :
either the input variable name is "long"
(between, let's say 16 and 30 characters), or it is short (smaller than 16 characters).
For the short case, one would keep the present formatting (extended to 16 characters), while for the long case, one
might simply insert a carriage return after the name of the input variable

short_name1 1 2 3
short_name2 8 9 1
very_very_very_long_name
1 2 3 4 5 6 9

I agree entirely, and this is the only "urgent" problem

gonze wrote:
I. About the long-term "ideal" situation

What I would like to have clarified are the rules to be followed between :
(1) the keywords to be used for input
(2) the internal representation (possibly hierarchy of datastructures)
(3) the actual echo
and also the possibility of non-uniqueness at each of these levels (well, perhaps not for the
actual echo, that will be unique ...).

Indeed : at present, representations (1), (2), and (3) are completely identical,
and this is a nice feature of ABINIT, I think.
Indeed, let's take nkpt as an example :
- nkpt is the keyword to be read
- it is stored inside dtset%nkpt
- it is echoed as nkpt .
So, there is no confusion.

welllll.... there are input variables like ngkpt which do not become
internal ones, and there are (many) others which get renamed, in all
or in part of the code. But I agree what you describe would be a clean
clear situation to idealize. There is also more stuff in dtset which
is not accessible as an input variable (this is normal, but it means
it's not bijective).

gonze wrote:
When we started to discuss changing the names, I thought that this would simply
involve replace keywords by some other keyword, with possibly a period of time
during which the two names would be allowed as input :
e.g. nkpt might have been replaced by number_of_k_points (I take the ETSF standard as an example).

again - think of the user. We will alienate people if we change all of
the input variable names. They have to re-learn everything. In some
cases clarity will increase, but in most cases it's ok as it is.

gonze wrote:
I recognize that some input variables names are rather obscure, so that changing their name
is an excellent idea. And to have a class_subclass_subsubclass (or, as a first step, a class_keyword)
is a way to generate consistent names throughout.

But the recent discussions seem to go beyond that, because
potentially, the representations (1), (2) and (3) might become different (e.g. having the possibility to mention only the subsubclass in the echo)
and perhaps non-unique (a subsubclass might belong to two different subclasses, as mentioned by Damien).

I would discourage this bit, and in our case I think things are
sufficiently simple to avoid duplications (with a bit of thinking and
planning).

gonze wrote:
I would favour keeping coherent representations (1) (2) (3), with e.g.
(1) input variable kpoints_wkpt

or even keeping wkpt

gonze wrote:
(2) internal representation dtset%kpoints%wkpt (allowing to pass kpoints%wkpt as argument of some routine)
(3) echo kpoints_wkpt
Then, the correct structuration of the input variable should follow the intended existence of the datastructure inside ABINIT.
Namely, if for internal purposes, there should be a datastructure called "kpoints", then its content should be determined
by its use for coding purposes, not by logical concern as seen by users.

By having ((sub )sub)classes which are clearly encapsulating each
other, the correspondence should be unique

dtset
|----------------------------------------------------------|
|  kpts  |  atoms  |           basis           |    ...    |
|        |         |  PAW  |  NC  |  wavelets  |           |

If each of the classes has a nice output routine for its own stuff,
you call output(dtset) and that chains all the other ones.
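
The chaining of output routines could work along these lines. This is a sketch in Python rather than in the Fortran of the code; the classes `Kpts`, `Basis` and `Dtset` and their `output` methods are assumptions for the example.

```python
class Kpts:
    def __init__(self, nkpt):
        self.nkpt = nkpt

    def output(self):
        return [f"nkpt {self.nkpt}"]

class Basis:
    def __init__(self, ecut):
        self.ecut = ecut

    def output(self):
        return [f"ecut {self.ecut}"]

class Dtset:
    def __init__(self, kpts, basis):
        self.parts = [kpts, basis]

    def output(self):
        # The top-level routine only chains the routines of its sub-objects.
        lines = []
        for part in self.parts:
            lines.extend(part.output())
        return lines

print("\n".join(Dtset(Kpts(2), Basis(3.99)).output()))
```

Each class is responsible for echoing its own variables only, so adding a new class does not require touching the others.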

mverstra · Post by **mverstra** » Wed Oct 21, 2009 8:53 pm

Test the forum we shall!

I think most of the present variable names should be kept, modifying only those which are unclear or obscure (many of those are not widely used anyway). A list of these should be started (maybe Michel already has one?)

The variables should be allocated to a single (sub^N)class, which is hierarchically contained in others, as in Michel's examples.

This way we get the modularity, the one-to-one correspondence, and the classes with a minimal change to the user interface.

M.

mcote · Post by **mcote** » Fri Oct 23, 2009 1:27 pm

Dear all,

Paul, Simon and I already started to assign the input variables to the different classes and subclasses.

Since the change of variable will imply quite a lot of changes to the code, let me propose the following actions:

1) As a first step, let us try not to change the names of the input variables but assign them to the different classes and subclasses. The web page could be redesigned to display the input variables according to classes and subclasses. We will certainly get feedback from the community, which will suggest some reassignments.

2) Once the classes and subclasses are accepted as information on the web page, we can think about how to better integrate them in the code itself.

At the end of the process, I would like to see the code have the structure dtset%class%subclass%name%arguments. This way, integration with an XML format will also be easy.
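
To illustrate why a nested structure maps easily onto XML, here is a minimal sketch. The class and subclass names used in the dictionary are illustrative only, and `to_xml` is a hypothetical helper; no XML schema for ABINIT is implied.

```python
import xml.etree.ElementTree as ET

def to_xml(tag, node):
    """Convert a nested dict (class -> subclass -> name -> value) into XML."""
    elem = ET.Element(tag)
    if isinstance(node, dict):
        for key, child in node.items():
            elem.append(to_xml(key, child))
    else:
        elem.text = str(node)
    return elem

# Hypothetical placement of two variables in classes and subclasses.
dtset = {"electronic": {"basis": {"ecut": 3.99}, "kpoints": {"nkpt": 2}}}
print(ET.tostring(to_xml("dtset", dtset), encoding="unicode"))
```

Each level of the hierarchy becomes one level of nested elements, so no separate description of the XML layout is needed.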

May I ask volunteers to adjust our first assignment and to rethink the choices of classes and subclasses?

Michel

gonze · Post by **gonze** » Tue Nov 03, 2009 4:15 pm

You might have noticed that in v5.9.4, the echo of the input variable names is OK for up to 16 characters.
Xavier
