
6.2.5 filestruct.h

The filestruct package provides a direct and consistent way of passing data between NEMO programs, much as getparam provides a way of passing (command line) arguments to programs. For reasons of economy and accuracy, much of the data manipulated by NEMO is stored on disk in binary form. Normally, data stored this way is completely unintelligible, except to specialized programs which create and access it. Furthermore, this approach to data handling tends to be very brittle: a trivial addition or alteration to the data stored in such a file can force the tedious and error-prone revision of many programs. To get around these problems and provide an explicit, flexible, and structured method of storing binary data, we developed a collection of general purpose routines to access binary data files.

From the programmer's point of view, a structured binary file is a stream of tagged data objects. These objects come in two classes. An item is a single instance or a regular array of one of the following C primitive types: char, short, int, long, float or double. A set is an unordered sequence of items and sets. This definition is recursive, so fully hierarchical file structures are allowed, and indeed encouraged. Every set or item has a name tag associated with it, used to label the contents of a file and to retrieve objects from a set. Data items also have a type and an array dimension attribute associated with them. This of course means that there is some overhead, which may become significant if many small pieces of data are to be handled. For example, a snapshot with 128 bodies (created by mkplummer) with double precision masses and full 6 dimensional phase space coordinates totals 7425 bytes, whereas a straight dump of only the essential information would be 7168 bytes, a mere 3.5% overhead. After an integration, with 9 full snapshots stored and 65 snapshots with only diagnostics output, the overhead is much larger: 98944 bytes of data, of which only 64512 bytes are masses and phase space coordinates: the overhead is 53% (of which 29%, though, is the diagnostics output, such as conservation of energy and angular momentum, cputime, center of mass, etc.).
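
The byte counts quoted above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the helper names raw_snapshot_bytes() and overhead_percent() are hypothetical and not part of NEMO, and the overhead is measured here relative to the essential data.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes in a bare dump of nbody bodies: one mass plus ndim positions and
 * ndim velocities per body, every value stored as a double. */
size_t raw_snapshot_bytes(size_t nbody, size_t ndim)
{
    return nbody * (1 + 2 * ndim) * sizeof(double);
}

/* Overhead of a structured file, as a percentage of the essential data.
 * (Hypothetical helper for this sketch, not a NEMO routine.) */
double overhead_percent(size_t file_bytes, size_t data_bytes)
{
    assert(data_bytes > 0 && file_bytes >= data_bytes);
    return 100.0 * (double)(file_bytes - data_bytes) / (double)data_bytes;
}
```

With 8-byte doubles, raw_snapshot_bytes(128, 3) reproduces the 7168 bytes of essential data, and overhead_percent(98944, 64512) comes out at roughly 53. For the single snapshot, overhead_percent(7425, 7168) gives roughly 3.6; the 3.5% quoted in the text corresponds to dividing by the total file size instead.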

The filestruct package uses ordinary stdio(3) streams to access input and output files; hence the first step in using filestruct is to open the file streams. For this job we use the NEMO library routine stropen(), which itself is not part of filestruct. stropen(name,mode) is much like fopen() of stdio, but slightly more clever; it will not open an existing file for output, unless the mode string is "w!". An additional oddity of stropen() is that it treats the dash filename "-" as standard input/output (footnote 6.7), and "s" as a scratch file. Since stdio normally flushes all buffers on exit, it is often not necessary to explicitly close open streams, but if you do so, use the matching routine strclose(). This also frees up the table entries and temporary memory used by the filestruct package. As in most applications and operating systems, a task can have only a limited number of open files associated with it. Scratch files are automatically deleted from disk when they are closed.
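
To make the overwrite protection concrete, here is a minimal sketch of the behaviour described above, written with plain stdio only. It is not the NEMO implementation: open_stream_sketch() is a hypothetical name, it handles only the modes "r", "w" and "w!", it leaves out scratch files, and it returns NULL where the text says stropen() would call error().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for stropen(); handles "r", "w" and "w!" only. */
FILE *open_stream_sketch(const char *name, const char *mode)
{
    int reading;

    assert(name != NULL && mode != NULL);
    reading = (mode[0] == 'r');

    if (strcmp(name, "-") == 0)        /* dash: standard input or output */
        return reading ? stdin : stdout;

    if (reading)
        return fopen(name, "r");

    if (strcmp(mode, "w!") != 0) {     /* plain "w": refuse to overwrite */
        FILE *probe = fopen(name, "r");
        if (probe != NULL) {
            fclose(probe);
            return NULL;               /* the real routine would abort here */
        }
    }
    return fopen(name, "w");
}
```

Probing with fopen(name, "r") is the simplest portable existence test; it is enough for a sketch, although it cannot tell a missing file from an unreadable one.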

Having opened the required streams, it is quite simple to use the basic data I/O routines. For example, suppose the following declarations have been made:

    #include <stdinc.h>
    #include <filestruct.h>

    stream instr, outstr;
    int    nbody;
    string headline;

    #define MAXNBODY 100
    real    mass[MAXNBODY];

(Note the use of the stdinc.h conventions.) Now suppose that, after some computation, results have been stored in the first nbody components of the mass array, and a descriptive message has been placed in headline. The following piece of code will write the data to a structured file:

    outstr = stropen("mass.dat", "w");

    put_data(outstr, "Nbody", IntType, &nbody, 0);
    put_data(outstr, "Mass", RealType, mass, nbody, 0);
    put_string(outstr, "Headline", headline);

    strclose(outstr);

Data (the 4th argument in put_data()) is always passed by address, even if only one element is written. This holds not only for reading but also for writing, as is apparent from the above example. Note that no error checking is needed when the file is opened for writing: if the file mass.dat had already existed, error() would have been called inside stropen() and the program aborted. Names of tags are arbitrary, but we encourage you to use descriptive names; an arbitrary maximum length of 64 is enforced by chopping any longer incoming string.

The resulting contents of mass.dat can be viewed with the tsf utility:

    % tsf mass.dat
    int Nbody 010
    double Mass[8] 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
    char Headline[20] "All masses are equal"

Note the octal representation 010=8 of Nbody.
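
The leading zero follows the C convention for octal constants. As a small illustration using standard C only (format_octal() is a hypothetical helper, not part of tsf), the "%#o" conversion of printf produces the same notation:

```c
#include <assert.h>
#include <stdio.h>

/* Format value as a C-style octal constant: base 8 with a leading 0. */
int format_octal(char *buf, size_t size, unsigned int value)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%#o", value);
}
```

Formatting the value 8 this way yields the string "010", and conversely the C constant 010 compares equal to 8.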

It is now trivial to read data from this file:

    instr = stropen("mass.dat", "r");

    get_data(instr, "Nbody", IntType, &nbody, 0);
    get_data(instr, "Mass", RealType, mass, nbody, 0);
    headline = get_string(instr, "Headline");

    strclose(instr);

Note that we read the data in the same order as they were written.

During input, the filestruct routines normally perform strict type-checking; the tag, type and dimension supplied to get_data() must match the attributes of the data item, written previously, exactly. Such strict checking helps prevent many common errors in using binary data. Alternatively, you can use get_data_coerced(), which is called exactly like get_data(), but interconverts float and double values (footnote 6.8).
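
The difference between the strict and the coerced check can be sketched as follows. Everything in this block is an assumption made for illustration: the item_header struct, the one-letter type codes, the MAX_DIMS limit and both function names are hypothetical, and the sketch says nothing about how filestruct stores or compares items internally.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_DIMS 8          /* hypothetical limit for this sketch */

/* Hypothetical description of one tagged item; not the on-disk format. */
typedef struct {
    const char *tag;        /* name tag, e.g. "Mass" */
    const char *type;       /* assumed codes: "i" int, "f" float, "d" double */
    int dims[MAX_DIMS];     /* array dimensions, terminated by 0 */
} item_header;

/* Compare two zero-terminated dimension lists. */
static bool same_dims(const int *a, const int *b)
{
    int i;
    for (i = 0; i < MAX_DIMS; i++) {
        if (a[i] != b[i]) return false;
        if (a[i] == 0) return true;     /* both lists ended together */
    }
    return true;
}

static bool is_floating(const char *type)
{
    return strcmp(type, "f") == 0 || strcmp(type, "d") == 0;
}

/* Strict check: tag, type and dimensions must all match exactly. */
bool item_matches(const item_header *stored, const char *tag,
                  const char *type, const int *dims)
{
    assert(stored != NULL && tag != NULL && type != NULL && dims != NULL);
    return strcmp(stored->tag, tag) == 0
        && strcmp(stored->type, type) == 0
        && same_dims(stored->dims, dims);
}

/* Relaxed check: as above, but float and double are interchangeable. */
bool item_matches_coerced(const item_header *stored, const char *tag,
                          const char *type, const int *dims)
{
    bool type_ok;
    assert(stored != NULL && tag != NULL && type != NULL && dims != NULL);
    type_ok = strcmp(stored->type, type) == 0
           || (is_floating(stored->type) && is_floating(type));
    return strcmp(stored->tag, tag) == 0 && type_ok
        && same_dims(stored->dims, dims);
}
```

In this sketch a "Mass" item stored as double with dimension 8 passes the strict check only when requested as double with dimension 8, while the coerced check also accepts a request for float; a request for int fails both.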

To provide more flexibility in programming I/O, a series of related items may be hierarchically wrapped into a set:

    outstr = stropen("mass.dat", "w");

    put_set(outstr, "NotASnapShot");
       put_data(outstr, "Nbody", IntType, &nbody, 0);
       put_data(outstr, "Mass", RealType, mass, nbody, 0);
       put_string(outstr, "Headline", headline);
    put_tes(outstr, "NotASnapShot");

    strclose(outstr);

Note that each put_set() must be matched by an equivalent put_tes(). For input, corresponding routines get_set() and get_tes() are used. These also introduce a significant additional functionality: between a get_set() and get_tes(), the data items of the set may be read in any order (footnote 6.9), or not even read at all. For example, the following is also a legal way to access the NotASnapShot (footnote 6.10):

    instr = stropen("mass.dat", "r");

    if (!get_tag_ok(instr,"NotASnapShot"))
        error("File mass.dat is not a NotASnapShot\n");

    get_set(instr,"NotASnapShot");
    headline = get_string(instr, "Headline");
    get_tes(instr,"NotASnapShot");

    strclose(instr);

This method of "filtering" a data input stream clearly opens up many ways of developing general-purpose programs. Also note that the bool routine get_tag_ok() can be used to control the flow of the program, since get_set() would call error() and abort the program if the wrong tag name were encountered.
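
The idea that members of a set are retrieved by tag rather than by position can be pictured with a simple lookup. This is a conceptual sketch only: the set_member struct and find_in_set() are hypothetical names, and the real package reads from a stream rather than from an in-memory array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory view of one member of a set. */
typedef struct {
    const char *tag;
    const void *data;
} set_member;

/* Return the data of the member with the given tag, or NULL if absent.
 * The result does not depend on the order in which members were stored,
 * and members that are never asked for are simply ignored. */
const void *find_in_set(const set_member *members, size_t count,
                        const char *tag)
{
    size_t i;
    assert(members != NULL && tag != NULL);
    for (i = 0; i < count; i++)
        if (strcmp(members[i].tag, tag) == 0)
            return members[i].data;
    return NULL;
}
```

A caller interested only in "Headline" asks for that tag and never touches "Nbody" or "Mass", which mirrors the filtering example above.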

The UNIX program cat can also be used to concatenate multiple binary data sets into one, e.g.

    %   cat mass1.dat mass2.dat mass3.dat > mass.dat

The get_tag_ok() routine can be used to handle such multi-set data files. The following example shows how to loop through such a combined data file.

    instr = stropen("mass.dat", "r");

    while (get_tag_ok(instr, "NotASnapShot")) {
       get_set(instr, "NotASnapShot");
       get_data(instr, "Nbody", IntType, &nbody, 0);
       if (nbody > MAXNBODY) {
            warning("Skipping data with too many (%d) items",nbody);
            get_tes(instr,"NotASnapShot");
            continue;
       }
       get_data(instr, "Mass", RealType, mass, nbody, 0);
       headline = get_string(instr, "Headline");
       get_tes(instr,"NotASnapShot");
       /*   process data   */
    }

    strclose(instr);

The loop is terminated at either end-of-file, or if the next object in instr is not a NotASnapShot.

It is easy to skip an item if you know it is there:

    while(get_tag_ok(instr,"NotASnapShot"))     /*  ??????? */
        skip_item(instr,"NotASnapShot");

The routine skip_item() is only effective, or for that matter required, when doing input at the top level, i.e. not between a get_set() and matching get_tes(), since I/O at deeper levels is random w.r.t. items and sets. In other words, at the top level I/O is sequential, at lower levels random.

!Some lyrics on new random access features in filestruct will go in here!


(c) Peter Teuben