Results database
Abiflows provides an automated way to store the output of a workflow in a MongoDB database. By default, the output of the whole workflow is stored in a single document of a MongoDB collection. Each type of workflow produces its own specific output document, which holds the most relevant information produced by that workflow. The following sections describe the infrastructure used to store and retrieve the output documents.
Workflow output
The guiding principle for storing the results is that the MongoDB document includes only data that are expected to be used as query parameters. Examples are the composition and symmetry of the structure and the main configuration options of the calculation. The document also contains some basic output values that are likely to be accessed frequently or to be used as filters, such as the final magnetization of the system or the execution statistics of the calculations.
On the other hand, the fully detailed values of the results rarely need to be queried. In the Abiflows output these values are kept in the native machine-readable format produced directly by Abinit, i.e. in NetCDF files. These files contain the full set of data produced by the calculations, and Abipy offers full support for analysing and plotting their content. The files are stored as they are in MongoDB GridFS and referenced in the main document. Some files in text format are usually stored as well, both out of necessity, when the main supported version of an output file is a text one (e.g. the DDB file), and for completeness. Together, these files usually cover all the relevant outputs produced by the workflow.
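As a rough illustration of this split, the sketch below models a result document as a plain Python dictionary: a few small, queryable fields plus references to the bulk files kept elsewhere. Only `nsites` and `pretty_formula` are taken from the query examples on this page; the `files` layout, the placeholder identifiers, the sample values and the `matches` helper are hypothetical and do not reflect the actual Abiflows schema.

```python
# Hypothetical shape of a stored result; NOT the actual Abiflows schema.
result_doc = {
    # Small, frequently queried values live directly in the document.
    "pretty_formula": "BaTiO3",  # illustrative value
    "nsites": 5,                 # illustrative value
    # Bulk outputs are stored as files (GridFS in Abiflows) and are only
    # referenced here; the identifiers below are placeholders.
    "files": {
        "gsr": "<id of a NetCDF output file>",
        "ddb": "<id of the DDB text file>",
    },
}


def matches(doc, **criteria):
    """Return True if every keyword equals the same-named top-level field."""
    return all(doc.get(key) == value for key, value in criteria.items())


# Filtering only touches the lightweight fields, never the file contents.
print(matches(result_doc, nsites=5, pretty_formula="BaTiO3"))
```

Keeping the large files out of the queried fields means a filter like the one above never has to read or parse the NetCDF data.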
Mongoengine documents
In order to provide a common base for the standard keywords that are stored in the
database and for the structure of the documents produced by the different workflows,
the interaction with the output database in Abiflows is handled using
mongoengine. For this purpose a set of mongoengine Document objects is implemented
in the abiflows.database.mongoengine.abinit_results module.
From a technical point of view, the results documents are obtained by composing different
mixins, each of which defines a specific set of properties that are likely to be queried
and that are usually shared by different kinds of workflows (e.g. the
abiflows.database.mongoengine.mixins.MaterialMixin). In this way the uniformity
of the notation for common keywords is guaranteed across the different kinds of outputs.
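The composition idea can be sketched with plain Python dataclasses. This is a minimal stand-in that runs without a database, not the mongoengine-based implementation used by Abiflows: every class name below is hypothetical, and all fields other than `nsites` and `pretty_formula` are invented for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class MaterialFields:
    """Stand-in for a material mixin: structure-related keywords."""
    pretty_formula: str = ""
    nsites: int = 0


@dataclass
class RunStatsFields:
    """Hypothetical mixin holding execution statistics."""
    core_hours: float = 0.0


@dataclass
class RelaxSketch(MaterialFields, RunStatsFields):
    """Hypothetical result type composed from the two mixins."""
    final_energy: float = 0.0


@dataclass
class PhononSketch(MaterialFields, RunStatsFields):
    """A second hypothetical result type reusing the same mixins."""
    qpoint_count: int = 0


def field_names(cls):
    """Return the set of field names declared on a dataclass."""
    return {f.name for f in fields(cls)}


# Keywords inherited from the mixins are spelled identically in both types.
shared = field_names(RelaxSketch) & field_names(PhononSketch)
print(sorted(shared))
```

Because the shared keywords are declared once in the mixins, a filter written for one result type can be reused unchanged on another.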
The queries to the documents can be done using the standard mongoengine approach:
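from abiflows.database.mongoengine.abinit_results import RelaxResult
# Assumes a mongoengine connection to the results database is already open
for result in RelaxResult.objects(nsites=5):
    print(result.pretty_formula)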
where any of the properties defined in the Document object can be used in the query.
See the mongoengine user guide for more details.
Database connection definition
The information required to connect to the database (e.g. address, username, …)
is stored in a specific object, abiflows.database.mongoengine.utils.DatabaseData,
which can be used both to define where the results are stored and to access them. It is
passed to the task responsible for generating the document with the output of the workflow.
In addition, since mongoengine uses the name of the class as the default name of the
collection in which the data are stored and retrieved, DatabaseData offers a shortcut
for the switch_collection method. You can thus use it to query the database, as shown
in the Phonons example:
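from abiflows.database.mongoengine.abinit_results import RelaxResult
from abiflows.database.mongoengine.utils import DatabaseData

db = DatabaseData(host='db_address', port=27017, collection='collection_name',
                  database='db_name', username='user', password='pass')
# Temporarily bind RelaxResult to the collection configured in db
with db.switch_collection(RelaxResult) as RelaxResult:
    for result in RelaxResult.objects(nsites=5):
        print(result.pretty_formula)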
Note
Even though it might be convenient to rely on the mongoengine documents to interact with the results produced by Abiflows, this is by no means a strict requirement and the database can be queried using the standard MongoDB connections and queries.
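As a minimal sketch of the mongoengine-free route, the helper below issues an ordinary MongoDB filter through any object that exposes a pymongo-style `find(filter, projection)` method. It assumes that the stored keys carry the same names as the mongoengine properties used above (`nsites`, `pretty_formula`); this is mongoengine's default behaviour but has not been checked against the Abiflows documents. The `formulas_with_nsites` helper and the `InMemoryCollection` stand-in are hypothetical and exist only so that the example runs without a server.

```python
def formulas_with_nsites(collection, nsites):
    """Return the formulas of all documents with the given number of sites.

    `collection` is any object with a pymongo-style `find(filter, projection)`
    method, for example a pymongo collection.
    """
    query = {"nsites": nsites}
    projection = {"pretty_formula": 1, "_id": 0}
    return [doc["pretty_formula"] for doc in collection.find(query, projection)]


class InMemoryCollection:
    """Tiny stand-in for a real collection (hypothetical, for the demo only).

    It supports equality filters and ignores the projection argument.
    """

    def __init__(self, docs):
        self._docs = list(docs)

    def find(self, query, projection=None):
        for doc in self._docs:
            if all(doc.get(key) == value for key, value in query.items()):
                yield dict(doc)


# Illustrative documents, not real Abiflows results.
demo = InMemoryCollection([
    {"pretty_formula": "BaTiO3", "nsites": 5},
    {"pretty_formula": "Si", "nsites": 2},
])
print(formulas_with_nsites(demo, 5))
```

With a running server, the same function could be given a real collection such as `pymongo.MongoClient(host, port)["db_name"]["collection_name"]`, using the placeholder names from the connection example above.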