CML-file

 

Ondra Hrstka <ondra@klobouk.fsv.cvut.cz>

June 6th, 2000


This is basic HTML documentation on how to use the CML-file class.

1. Introduction & Simple Example

2. Instantiating & configuring CML-file

3. Reading data on lines

4. Reading data in sections

5. Checking Requirements

6. Outputting

7. Error flags

8. Compiling CML-file

9. Notices


1. Introduction & Simple Example

    Important note: CML-file stands for 'Cybule-like Multi Language'; obviously CML-file does not define any language such as HTML; it's a joke :o). It was first developed for my project Cybule (something like a neural network) and later generalised into this.

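    Well, what is CML-file? It's a C++ class that provides a (I guess very easy) way to get data from a text configuration or data file. It uses the very common "technology" of keywords: a keyword stands at the beginning of each line. If you want the rest of the line, you simply send the keyword to CML-file and CML-file returns the data you want. That's the basic idea, but CML-file has some other capabilities as well: several values on one line, repeated keywords read by index or by key, sections of data lines without keywords, percentages, data spread over several files, checking of required data, and writing output files.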

    Enough advertisement; let's go on with an instructive example of a file that CML-file will read:

    # -- this file contains information about train 'W.A.Mozart' that is going from Berlin - Haupt Bahnhoff to Prague - Main Station --
    # -- definition of the train --
    Name "Wolfgang Amadeus Mozart"
    Locomotive type=Electric power=1750.5kW
    Wagon Post
    Wagon 1stClass 1
    Wagon Restaurant
    Wagon 2ndClass 2
    Wagon 2ndClass 3
    Wagon 2ndClass 4
    # -- crew --
    Machinenfuhrer "Helmut Diesenhoffer"
    Conductors 3
    ConductorNames "Irene Ober-Hausendorfer" "Joachim Heinzmann" "Carl Wolf"
    Cook "Jonathan Krueger-Stadler"
    Waiter "Heinrich Zuckermann"
    # -- stations --
    Station "Berlin - Haupt Bahnhoff"
    Station "Dresden"
    Station "Chemnitz"
    Station "Decin"
    Station "Usti nad Labem"
    Station "Prague - Main Station"
    # -- arrivals and departures --
    Depart "Berlin - Haupt Bahnhoff" 10.15
    In "Dresden" 11.29 11.35
    In "Chemnitz" 13.02 13.15
    In "Decin" 14.44 14.50
    In "Usti nad Labem" 15.22 15.30
    Arrival "Prague - Main Station" 16.54
    # -- reservations --
    SECTION Reservations
    "Armin Muehler-Stahl" 1 33 "Berlin - Haupt Bahnhoff" "Prague - Main Station"
    "Jorgen Prochnow" 2 59 "Chemnitz" "Dresden"
    "Ludwig von Spretti-Weilbach" 2 58 "Berlin - Haupt Bahnhoff" "Dresden"
    END
    # -- end of file --

    I guess nobody needs any explanation of this absolutely stupid and useless example - but look, it's only an example :o).
    Several data formats occur in this example:

    Conductors 3

    This is the simplest form: a keyword and a single value (an int in this case); the value can also be a double or a string, as in the following:

    Name "Wolfgang Amadeus Mozart"

    Strings need not be in quotation marks ("something") unless they include spaces.
    You may also have more than one value on a single line, and the values need not be of the same type (one may be an int, another a double, etc.). For example:

    ConductorNames "Irene Ober-Hausendorfer" "Joachim Heinzmann" "Carl Wolf"

    CML-file does not provide any parsing of a single line; if you need to parse something like this:

    Locomotive type=Electric power=1750.5kW

    you have to read the rest of the line after the keyword as a whole string (up to the end of the line) and parse it with other routines. As you may notice, this example does not say how many wagons the train has, but that doesn't matter: CML-file counts them itself. So, when you want to read the wagon data, you start by asking "how many wagons do we have?" and then (for example in a for loop) you read them one after another, in the order in which they are stored in the file, by asking for example "give me the type of wagon number 3". Every index starts from zero, so wagon number 3 is this one:

    Wagon 2ndClass 2

    Notice: the number 2 at the end is the number of the passenger wagon, which I use in the reservations section; look there.
    The same applies to the station data.
    The lines that contain the arrival and departure times for the stations between Berlin and Prague all have the same keyword (In), and you can read them by key, where the key is a string, for example "Dresden". Then you use 'reading with order', as with the ConductorNames line; note that 11.29 (arrival in Dresden) has order 0 and 11.35 (departure) has order 1:

    In "Dresden" 11.29 11.35

    The last thing is the reservations. Between the lines

    SECTION Reservations

    and

    END

    the data lines are stored:

    "Armin Muehler-Stahl" 1 33 "Berlin - Haupt Bahnhoff" "Prague - Main Station"
    "Jorgen Prochnow" 2 59 "Chemnitz" "Dresden"
    "Ludwig von Spretti-Weilbach" 2 58 "Berlin - Haupt Bahnhoff" "Dresden"

    Reading them is done in the following steps:
    1. Tell CML-file which section you want to read (you may have more than one section in a file, but they must have different names).
    2. Get these lines one after another, each as a whole line.
    3. Tell CML-file that you are done with this section. Note: while reading one section you may not start reading anywhere else in the file; first you must tell CML-file that you have finished the section.

2. Instantiating & configuring CML-file

    Let's look at how to instantiate a CML-file.
    CML-file is a C++ class, so you must declare a variable (an instance of the class):

    cmlfile F ;

    Or you may open a file directly:

    cmlfile F( "train.cml" ) ;

    I should note here that when you are done you may close it using:

    F.close() ;

    but this is not necessary, because the destructor of the class closes everything itself. Generally it is better to rely on the destructor, because CML-file allocates a relatively large amount of memory (not megabytes, of course) and the destructor frees all of it.
    After instantiating CML-file you should configure it, that is, tell CML-file which keywords and sections you will use, and also the 'requirements'. For example, every train must have a locomotive, so a line with the keyword Locomotive is necessary, i.e. required. CML-file can then tell you whether all required data are present in the file.
    There are 3 ways to configure a CML-file: using the C++ interface, using a so-called rc-file, or by 'inheritance' from another CML-file. No, I don't mean inheritance in the C++ sense, but simply this: you pass one CML-file to another and the other copies the configuration into itself. Copying is slow, and I could have just linked it, but I assume that the other file may need something special, so I want to leave the possibility of making additional changes.
    There is a function

    void cmlfile::compile ( void )

    that reads the file and builds a map of it. Normally it is called automatically, but you can call it yourself if you want:

    F.compile() ;

2.1 Configuring using the C++ interface

    This is the most complicated way. There is a set of functions:

    void cmlfile::set_labels ( int olabels ) ;
    void cmlfile::set_sections ( int osections ) ;
    void cmlfile::set_label_string ( int olabel , char *olabel_string ) ;
    void cmlfile::set_section_string ( int olabel , char *osection_string ) ;
    void cmlfile::require ( int olabel ) ;
    void cmlfile::minimal ( int olabel , int ominimal ) ;
    void cmlfile::optionalize ( int olabel ) ;
    void cmlfile::refuse ( int olabel ) ;
    void cmlfile::set_minimal_rows ( int osection , int ominimal ) ;

    So:
    set_labels tells CML-file how many labels (keywords such as Machinenfuhrer) will be used,
    set_sections tells CML-file how many sections (such as Reservations) will be used,
    set_label_string tells CML-file that, for example, the keyword for label number 5 is "Machinenfuhrer"; this number has no other meaning, but it must be lower than the number of labels set with set_labels: if I said I will have 10 labels, their numbers must be between 0 and 9 (10 is too big),
    set_section_string does the same for sections,
    require tells CML-file that the given label must be in the file, i.e. at least once,
    minimal sets the minimal number of lines with the given label that must be in the file,
    optionalize tells CML-file that the given label need not be in the file,
    refuse tells CML-file that the given label must not be in the file (although currently no error is generated if such a label is found),
    set_minimal_rows sets the minimal number of rows that the given section must have.
    If none of the functions from require onwards is used, every label is optional and the minimal number of rows for every section is 0.

    Now I will show an example function that configures CML-file for reading the train file (but first I will make some #defines, because I don't want to remember the label numbers):

    #include "cmlfile.h"

    #define Name 0
    #define Locomotive 1
    #define Wagon 2
    #define Machinenfuhrer 3
    #define Conductors 4
    #define ConductorNames 5
    #define Cook 6
    #define Waiter 7
    #define Station 8
    #define Depart 9
    #define In 10
    #define Arrival 11

    #define Reservations 0

    void configure ( cmlfile &oF )
    {
        oF.set_labels( 12 ) ;
        oF.set_sections( 1 ) ;

        oF.set_label_string( Name,"Name" ) ;
        oF.set_label_string( Locomotive,"Locomotive" ) ;
        oF.set_label_string( Wagon,"Wagon" ) ;
        oF.set_label_string( Machinenfuhrer,"Machinenfuhrer" ) ;
        oF.set_label_string( Conductors,"Conductors" ) ;
        oF.set_label_string( ConductorNames,"ConductorNames" ) ;
        oF.set_label_string( Cook,"Cook" ) ;
        oF.set_label_string( Waiter,"Waiter" ) ;
        oF.set_label_string( Station,"Station" ) ;
        oF.set_label_string( Depart,"Depart" ) ;
        oF.set_label_string( In,"In" ) ;
        oF.set_label_string( Arrival,"Arrival" ) ;

        oF.set_section_string( Reservations,"Reservations" ) ;

        oF.require( Name ) ;
        oF.require( Locomotive ) ;
        oF.minimal( Wagon,2 ) ;
        oF.require( Machinenfuhrer ) ;
        oF.require( Conductors ) ;
        oF.minimal( Station,2 ) ;
        oF.require( Depart ) ;
        oF.require( Arrival ) ;
    }

    Everything clear? Hope so :o)
    Note: you may call functions such as require at any time; for example, when you realise that the train stops at 4 stations between Berlin and Prague, you may call:

    F.minimal( In,4 ) ;

2.2 Configuring using an rc-file

    This is somewhat simpler. You create an rc-file that may look like this:

    Name 1
    Locomotive 1
    Wagon 2
    Machinenfuhrer 1
    Conductors 1
    ConductorNames 1
    Cook 0
    Waiter 0
    Station 2
    Depart 1
    In 0
    Arrival 1
    SECTION Reservations 0

    The number at the end of each line is the minimal number of lines with that keyword in the file.
    Assume that the rc-file is named example.rc. Then, instead of calling the function configure from the previous subsection, we can configure CML-file like this:

    F.load_labels( "example.rc" ) ;

    It opens the rc-file and reads everything itself.
    Obviously, if you want a different rc-file format, you must write your own routine using the C++ interface.
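    To show what reading an rc-file of this format involves, here is a minimal sketch. It is not the code of load_labels: the names RcEntry, parse_rc and label_index are hypothetical, and it assumes, as subsection 3.1 suggests, that labels and sections are numbered in the order in which they appear in the rc-file. label_index plays roughly the role that LABEL( "Name" ) plays in 3.1.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical types and functions, not part of CML-file.
struct RcEntry {
    std::string name;  // keyword, or section name for "SECTION" lines
    int minimal = 0;   // minimal number of lines required
};

// Parses rc-file text; labels and sections keep their order of appearance,
// so an entry's position is its (assumed) label or section number.
// Returns false on a malformed line (missing name or number).
bool parse_rc(const std::string &text, std::vector<RcEntry> &labels,
              std::vector<RcEntry> &sections)
{
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream in(line);
        std::string first;
        if (!(in >> first)) continue;            // skip empty lines
        RcEntry e;
        bool is_section = (first == "SECTION");
        if (is_section) {
            if (!(in >> e.name)) return false;   // "SECTION" without a name
        } else {
            e.name = first;
        }
        if (!(in >> e.minimal)) return false;    // missing or non-numeric minimum
        (is_section ? sections : labels).push_back(e);
    }
    return true;
}

// Position of a keyword among the labels, or -1 if it is not there.
int label_index(const std::vector<RcEntry> &labels, const std::string &name)
{
    for (std::size_t i = 0; i < labels.size(); ++i)
        if (labels[i].name == name) return static_cast<int>(i);
    return -1;
}
```

    Looking labels up by name like this is why LABEL( "Name" ) is safer than #defines that merely hope to match the order of the rc-file.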

2.3 Configuring using another CML-file

    If you already have a configured CML-file, for example F, you can pass its configuration on to another CML-file (for example F2) using:

    F2.inherit_labels( F ) ;

    And that's all.

3. Reading data on lines

    There are several cases that may occur while reading data on lines (which means not in sections).

3.1 Simple data

    In our railway example these are:

    Name "Wolfgang Amadeus Mozart"
    Machinenfuhrer "Helmut Diesenhoffer"
    Conductors 3
    Cook "Jonathan Krueger-Stadler"
    Waiter "Heinrich Zuckermann"

    Nothing more than a keyword (occurring only once in the whole file) and a single data item. The functions are:

    void cmlfile::get_value ( int olabel , int &ovalue ) ;
    void cmlfile::get_value ( int olabel , double &ovalue ) ;
    void cmlfile::get_value ( int olabel , char *ovalue ) ;

    So we can get, for example, the name of the train like this:

    char train_name[64] ;                 // must be large enough for the value
    F.get_value( Name,train_name ) ;

    I would like to note here that if you used an rc-file to configure CML-file, it is not very appropriate to use #defines as in subsection 2.1, because you cannot be sure that you wrote the items into the rc-file in the order that corresponds to the #defines. So you must either remember this order (which is not comfortable, I know) or, instead of what is written above, use:

    F.get_value( LABEL( "Name" ),train_name ) ;

3.2 Multiple data on one line

    Example again:

    ConductorNames "Irene Ober-Hausendorfer" "Joachim Heinzmann" "Carl Wolf"

    Irene Ober-Hausendorfer has order 0, Joachim Heinzmann 1 and Carl Wolf 2.
    The functions are similar; they just have one additional parameter, the order:

    void cmlfile::get_value ( int olabel , int oorder , int &ovalue ) ;
    void cmlfile::get_value ( int olabel , int oorder , double &ovalue ) ;
    void cmlfile::get_value ( int olabel , int oorder , char *ovalue ) ;

    Carl Wolf can be obtained like this:

    char conductor_name3[64] ;
    F.get_value( ConductorNames,2,conductor_name3 ) ;
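    The idea of 'order' can be sketched like this: split the line into items, treating a quoted string as one item, and count the items after the keyword from zero. This is a hypothetical illustration (tokenize and value_at_order are not CML-file functions), and it assumes, as the examples suggest, that quotation marks only serve to group words containing spaces.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of CML-file: splits a line into items at
// whitespace; an item in double quotes may contain spaces (quotes dropped).
std::vector<std::string> tokenize(const std::string &line)
{
    std::vector<std::string> items;
    std::size_t i = 0;
    while (i < line.size()) {
        if (std::isspace(static_cast<unsigned char>(line[i]))) { ++i; continue; }
        std::string item;
        if (line[i] == '"') {
            ++i;                                     // skip the opening quote
            while (i < line.size() && line[i] != '"') item += line[i++];
            ++i;                                     // skip the closing quote (if present)
        } else {
            while (i < line.size() && !std::isspace(static_cast<unsigned char>(line[i])))
                item += line[i++];
        }
        items.push_back(item);
    }
    return items;
}

// Hypothetical: the value with the given order (0 = first item after the
// keyword), or an empty string if the line has fewer values.
std::string value_at_order(const std::string &line, std::size_t order)
{
    std::vector<std::string> items = tokenize(line);
    return order + 1 < items.size() ? items[order + 1] : std::string();
}
```

    For the ConductorNames line, order 2 gives Carl Wolf, just like the call above.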

3.3 Multiple data for one label

    The wagons are an example:

    Wagon Post
    Wagon 1stClass 1
    Wagon Restaurant
    Wagon 2ndClass 2
    Wagon 2ndClass 3
    Wagon 2ndClass 4

    First of all, ask CML-file how many wagons there are:

    int number_of_wagons ;
    F.get_value_count( Wagon,number_of_wagons ) ;

    Then you may read the wagons one after another in a loop:

    #include <cstring>    // for strcmp

    int i,wagon_id ;
    char wagon[64] ;
    for ( i=0 ; i<number_of_wagons ; i++ )
    {
        F.get_value_with_index( Wagon,i,wagon ) ;             // first value: type of wagon
        if ( !strcmp( wagon,"1stClass" ) || !strcmp( wagon,"2ndClass" ) )
        {
            F.get_value_with_index( Wagon,i,1,wagon_id ) ;    // order 1: wagon number
        }
    }

    Note that if the type of wagon is 1stClass or 2ndClass, I read the number at the end of the line using the variant with the additional order parameter, as in the previous subsection.
    Yes, I nearly forgot to quote the function headers:

    void cmlfile::get_value_with_index ( int olabel , int oindex , int &ovalue ) ;
    void cmlfile::get_value_with_index ( int olabel , int oindex , double &ovalue ) ;
    void cmlfile::get_value_with_index ( int olabel , int oindex , char *ovalue ) ;
    void cmlfile::get_value_with_index ( int olabel , int oindex , int oorder , int &ovalue ) ;
    void cmlfile::get_value_with_index ( int olabel , int oindex , int oorder , double &ovalue ) ;
    void cmlfile::get_value_with_index ( int olabel , int oindex , int oorder , char *ovalue ) ;
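    A minimal sketch of the counting and indexing described above, assuming the file is already split into lines. The helpers split, lines_with and value_with_index are hypothetical, not CML-file's API; to keep the sketch short, split does not handle quoted strings.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

using Tokens = std::vector<std::string>;

// Hypothetical: splits a line at whitespace (no quote handling, for brevity).
Tokens split(const std::string &line)
{
    std::istringstream in(line);
    Tokens t;
    std::string w;
    while (in >> w) t.push_back(w);
    return t;
}

// Hypothetical: all lines starting with `keyword`, in file order;
// size() of the result is the count of such lines.
std::vector<Tokens> lines_with(const std::vector<std::string> &file,
                               const std::string &keyword)
{
    std::vector<Tokens> found;
    for (const std::string &l : file) {
        Tokens t = split(l);
        if (!t.empty() && t[0] == keyword) found.push_back(t);
    }
    return found;
}

// Hypothetical: value `order` (0 = first after the keyword) on line `index`
// (0 = first such line), or "" if there is no such value.
std::string value_with_index(const std::vector<Tokens> &found,
                             std::size_t index, std::size_t order)
{
    if (index >= found.size() || order + 1 >= found[index].size()) return "";
    return found[index][order + 1];
}
```

    Here lines_with( file,"Wagon" ).size() corresponds to what get_value_count reports, and both index and order start from zero, so index 3, order 1 is the 2 in "Wagon 2ndClass 2".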

3.4 Data with label and keys

    Sometimes there are multiple lines with the same keyword and we want to choose which one to read by an additional key. In our example, we already know which stations our train goes through, and now we need to know when it arrives at each of them and when it leaves. These lines contain that information:

    In "Dresden" 11.29 11.35
    In "Chemnitz" 13.02 13.15
    In "Decin" 14.44 14.50
    In "Usti nad Labem" 15.22 15.30

    So we use one of the get_value_with_key functions; here they are:

    void cmlfile::get_value_with_key ( int olabel , char *okey , int &ovalue ) ;
    void cmlfile::get_value_with_key ( int olabel , char *okey , double &ovalue ) ;
    void cmlfile::get_value_with_key ( int olabel , char *okey , char *ovalue ) ;
    void cmlfile::get_value_with_key ( int olabel , char *okey , int oorder , int &ovalue ) ;
    void cmlfile::get_value_with_key ( int olabel , char *okey , int oorder , double &ovalue ) ;
    void cmlfile::get_value_with_key ( int olabel , char *okey , int oorder , char *ovalue ) ;

    Say that we would like to get the times for Chemnitz. Just use this:

    double chemnitz_arrival,chemnitz_leave ;
    F.get_value_with_key( In,"Chemnitz",0,chemnitz_arrival ) ;
    F.get_value_with_key( In,"Chemnitz",1,chemnitz_leave ) ;

    The key need not be a string; it may also be an int. Then you may use this set of functions:

    void cmlfile::get_value_with_key ( int olabel , int okey , int &ovalue ) ;
    void cmlfile::get_value_with_key ( int olabel , int okey , double &ovalue ) ;
    void cmlfile::get_value_with_key ( int olabel , int okey , char *ovalue ) ;
    void cmlfile::get_value_with_key ( int olabel , int okey , int oorder , int &ovalue ) ;
    void cmlfile::get_value_with_key ( int olabel , int okey , int oorder , double &ovalue ) ;
    void cmlfile::get_value_with_key ( int olabel , int okey , int oorder , char *ovalue ) ;
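    Lookup by key can be sketched as follows: among the lines with the given keyword, take the one whose first item equals the key, and count the order from the item after the key. The functions tokenize and value_with_key are hypothetical and only illustrate the idea; this sketch handles string keys only.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical: splits at whitespace; a quoted item may contain spaces.
std::vector<std::string> tokenize(const std::string &line)
{
    std::vector<std::string> items;
    std::size_t i = 0;
    while (i < line.size()) {
        if (std::isspace(static_cast<unsigned char>(line[i]))) { ++i; continue; }
        std::string item;
        if (line[i] == '"') {
            ++i;                                     // skip the opening quote
            while (i < line.size() && line[i] != '"') item += line[i++];
            ++i;                                     // skip the closing quote (if present)
        } else {
            while (i < line.size() && !std::isspace(static_cast<unsigned char>(line[i])))
                item += line[i++];
        }
        items.push_back(item);
    }
    return items;
}

// Hypothetical: finds the line `keyword key ...` and stores the value with
// the given order (0 = first value after the key). Returns false if there
// is no such line or no value with that order.
bool value_with_key(const std::vector<std::string> &file, const std::string &keyword,
                    const std::string &key, std::size_t order, std::string &value)
{
    for (const std::string &l : file) {
        std::vector<std::string> t = tokenize(l);
        if (t.size() >= 2 && t[0] == keyword && t[1] == key) {
            if (order + 2 >= t.size()) return false;   // no value with that order
            value = t[order + 2];
            return true;
        }
    }
    return false;                                      // key not found
}
```

    With the In lines above, key "Chemnitz" and order 1 give 13.15, the departure time, matching the chemnitz_leave call.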

3.5 Percentages

    This is absolutely simple. A double value may also be given as a percentage: 5% instead of 0.05. If you use one of these functions:

    void cmlfile::get_percentage ( int olabel , double &ovalue ) ;
    void cmlfile::get_percentage ( int olabel , int oorder , double &ovalue ) ;
    void cmlfile::get_percentage_with_key ( int olabel , int okey , double &ovalue ) ;
    void cmlfile::get_percentage_with_key ( int olabel , int okey , int oorder , double &ovalue ) ;
    void cmlfile::get_percentage_with_key ( int olabel , char *okey , double &ovalue ) ;
    void cmlfile::get_percentage_with_key ( int olabel , char *okey , int oorder , double &ovalue ) ;
    void cmlfile::get_percentage_with_index ( int olabel , int okey , double &ovalue ) ;
    void cmlfile::get_percentage_with_index ( int olabel , int okey , int oorder , double &ovalue ) ;

    they automatically recognise which of the two forms is used and return 0.05 in both cases.
    Per mille values are not supported :o).
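    Recognising the two forms is easy. Here is a sketch (parse_percentage is a hypothetical helper, not part of CML-file) that reads the number with the standard std::strtod and divides it by 100 when a % sign follows:

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper, not part of CML-file: reads "0.05" or "5%" and
// returns 0.05 in both cases. Text that is not a number gives 0.
double parse_percentage(const std::string &text)
{
    const char *begin = text.c_str();
    char *end = nullptr;
    double value = std::strtod(begin, &end);          // parses the leading number
    if (end != begin && *end == '%') value /= 100.0;  // "5%" -> 0.05
    return value;
}
```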

3.6 Reading whole line for parsing using other routines

    As was said before, CML-file does not provide any functionality for parsing lines such as:

    Locomotive type=Electric power=1750.5kW

    So you must parse it yourself. CML-file only gives you the line, whether there is only one such line or several. If there are several, you can access them either by index, as with the get_value_with_index functions, or by an int or string key, as with the get_value_with_key functions. So there are these functions:

    void cmlfile::get_line ( int olabel , char *oline ) ;
    void cmlfile::get_line_with_key ( int olabel , int okey , char *oline ) ;
    void cmlfile::get_line_with_key ( int olabel , char *okey , char *oline ) ;
    void cmlfile::get_line_with_index ( int olabel , int oindex , char *oline ) ;

    Using this:

    char line[256] ;
    F.get_line( Locomotive,line ) ;

    you get the string 'type=Electric power=1750.5kW' in the variable line.
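    Parsing the rest of the line is up to you. One possible approach, assuming the items are name=value pairs separated by spaces, as in the Locomotive line; rest_of_line and parse_pairs are hypothetical helpers, and rest_of_line only mimics what get_line returns (the line without its keyword):

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical: everything after the first word (the keyword), without the
// separating whitespace, e.g. "type=Electric power=1750.5kW".
std::string rest_of_line(const std::string &line)
{
    std::size_t k = line.find_first_not_of(" \t");
    if (k == std::string::npos) return "";
    std::size_t end_kw = line.find_first_of(" \t", k);
    if (end_kw == std::string::npos) return "";       // keyword only
    std::size_t start = line.find_first_not_of(" \t", end_kw);
    return start == std::string::npos ? "" : line.substr(start);
}

// Hypothetical parser for the part CML-file leaves to you: name=value pairs.
// Items without '=' are ignored.
std::map<std::string, std::string> parse_pairs(const std::string &rest)
{
    std::map<std::string, std::string> pairs;
    std::istringstream in(rest);
    std::string item;
    while (in >> item) {
        std::size_t eq = item.find('=');
        if (eq != std::string::npos) pairs[item.substr(0, eq)] = item.substr(eq + 1);
    }
    return pairs;
}
```

    The unit stays attached to the value ("1750.5kW"); std::stod( pairs["power"] ) reads the number and stops at "kW".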

3.7 Reading data spread over multiple files

    Very briefly: files may be included into one another with the directive FILE. I set a limit on the number of files that may be opened this way; currently it is 16. If you disagree, just hack my cmlfile.c file: find #define MaxFiles 16, change it and recompile.
    To use it, simply write something like:

    FILE otherfile.cml

    into a file (such as train.cml) and everything else should work (I hope :o)).
    And if you place 6 stations in train.cml and 4 in otherfile.cml, the train will go through 10 stations (get_value_count will return 10). Is that clear?
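    The FILE directive can be thought of as a recursive expansion. The sketch below is hypothetical (expand and max_nesting are not taken from cmlfile.c); it reads the 'files' from an in-memory map to stay self-contained, and it treats the limit of 16 as a limit on nesting depth, which is an assumption about what the limit means.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical limit mirroring the 16 mentioned above; here it caps how
// deeply FILE directives may be nested (an assumption).
const int max_nesting = 16;

using Files = std::map<std::string, std::vector<std::string>>;

// Hypothetical sketch: appends the lines of `name` to `out`, replacing each
// "FILE other" line by the (recursively expanded) lines of `other`.
// Returns false if a file is missing or the nesting is too deep.
bool expand(const Files &files, const std::string &name,
            std::vector<std::string> &out, int depth = 1)
{
    if (depth > max_nesting) return false;
    Files::const_iterator it = files.find(name);
    if (it == files.end()) return false;
    for (const std::string &line : it->second) {
        std::istringstream in(line);
        std::string first, target;
        in >> first;
        if (first == "FILE") {
            in >> target;
            if (!expand(files, target, out, depth + 1)) return false;
        } else {
            out.push_back(line);
        }
    }
    return true;
}
```

    Because the included lines are pasted in place, counting Station lines afterwards gives the sum over both files, as in the 6 + 4 = 10 example.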

4. Reading data in sections

    For some kinds of data, a keyword on every line is not the right thing. If we have a very long list of data where each item has the same syntax, a keyword at the beginning of each line would make the file needlessly long. So we enclose them all in a section, which looks like this:

    SECTION Reservations
    "Armin Muehler-Stahl" 1 33 "Berlin - Haupt Bahnhoff" "Prague - Main Station"
    "Jorgen Prochnow" 2 59 "Chemnitz" "Dresden"
    "Ludwig von Spretti-Weilbach" 2 58 "Berlin - Haupt Bahnhoff" "Dresden"
    END

    Reading these data takes 3 steps.
    1. Start reading the section; you get the number of rows.

    int rows ;
    F.find_section( Reservations,rows ) ;

    2. Read all the lines one after another:

    char line[256] ;
    int i ;
    for ( i=0 ; i<rows ; i++ )
    {
        F.get_section_line( Reservations,line ) ;
    }

    3. Tell CML-file that you are done reading this section. As mentioned before, CML-file will not let you read any other data until you finish reading the section.

    F.end_reading_section() ;

    Here are the headers of these functions:

    void cmlfile::find_section ( int olabel , int &osection_rows ) ;
    void cmlfile::get_section_line ( int olabel , char *oline ) ;
    void cmlfile::end_reading_section ( void ) ;

    How many sections can you have? As many as you want, but they must have different names. Also, you can spread one section over multiple files and CML-file will join the parts into one.
    If you used an rc-file to configure CML-file, it is better not to pass label numbers directly to the functions above; instead you may write:

    F.find_section( SECTION( "Reservations" ),rows ) ;

    and so on.
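    Collecting the rows of a section can be sketched like this: take the raw lines between SECTION name and the next END, and paste together all blocks with that name. section_rows is a hypothetical helper, not CML-file's find_section; it returns whole lines because, just as with CML-file, parsing the rows is up to you.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch: raw lines between "SECTION name" and the next "END".
// Several blocks with the same name are joined in file order.
std::vector<std::string> section_rows(const std::vector<std::string> &file,
                                      const std::string &name)
{
    std::vector<std::string> rows;
    bool inside = false;
    for (const std::string &line : file) {
        std::istringstream in(line);
        std::string w1, w2;
        in >> w1 >> w2;                                // first two words, if any
        if (!inside && w1 == "SECTION" && w2 == name) inside = true;
        else if (inside && w1 == "END") inside = false;
        else if (inside) rows.push_back(line);
    }
    return rows;
}
```

    The size of the result plays the role of the rows value that find_section returns.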

5. Checking requirements

    CML-file can check whether the file contains enough data. In the chapter Instantiating & configuring CML-file I explained how to tell CML-file which data are required and how many times. After that you can use a couple of functions that tell you whether the file is OK:

    void cmlfile::check_requirements ( void ) ;
    void cmlfile::check_label ( int olabel ) ;
    void cmlfile::check_section ( int osection ) ;

    Must I explain the details? I guess not.
    Instead I will say what these functions do when they find a problem. They:

    1. Write a message to the console.
    2. Set an internal flag.

    You can check whether a problem has occurred using this function:

    int cmlfile::error_in_requirements ( void ) ;

    which returns 1 if problems have occurred and 0 if everything is OK.
    Well, I think you won't be satisfied with this way of handling problems, but if you know something better, just hack my cmlfile.c file. Find these functions; I think everybody can understand them very easily.
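    The check itself is simple counting. Here is a sketch under the assumption that requirements are kept as minimal line counts per keyword (as in the rc-file); meets_minimums is a hypothetical free function, not a cmlfile member, and it prints a message and returns false instead of setting a flag.

```cpp
#include <cstdio>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch: counts lines per keyword (first word of a line) and
// compares the counts with the required minima. Every shortfall is printed.
bool meets_minimums(const std::vector<std::string> &file,
                    const std::map<std::string, int> &minimal)
{
    std::map<std::string, int> count;
    for (const std::string &line : file) {
        std::istringstream in(line);
        std::string keyword;
        if (in >> keyword) ++count[keyword];
    }
    bool ok = true;
    for (const auto &req : minimal) {
        int found = count.count(req.first) ? count.at(req.first) : 0;
        if (found < req.second) {
            std::printf("%s: %d line(s) found, at least %d required\n",
                        req.first.c_str(), found, req.second);
            ok = false;
        }
    }
    return ok;
}
```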

6. Outputting

    CML-file can also write into an output file. It uses the same labels, so it prints the correct strings as keywords into the output file. If you find this useless, you obviously need not use it :o).
    I will present the functions and examples here, and for each example I will show its output. But first you must open a file for output:

    cmlfile outF ;
    outF.open_for_output( "out.cml" ) ;

    Then just use these functions:

    void cmlfile::out ( int olabel , int ovalue ) ;
    void cmlfile::out ( int olabel , double ovalue ) ;
    void cmlfile::out ( int olabel , char *ovalue ) ;
    void cmlfile::out_label ( int olabel ) ;
    void cmlfile::out_int ( int ovalue ) ;
    void cmlfile::out_double ( double ovalue ) ;
    void cmlfile::out_string ( char *ovalue ) ;
    void cmlfile::out_eoln ( void ) ;
    void cmlfile::out_section ( int olabel ) ;
    void cmlfile::out_data ( char *oline ) ;
    void cmlfile::out_section_end ( void ) ;

    Now, one at a time:

    outF.out( Name,"Wolfgang Amadeus Mozart" ) ;

    will write this:

    Name "Wolfgang Amadeus Mozart"

    Then:

    outF.out_label( ConductorNames ) ;
    outF.out_string( "Joseph Keitel" ) ;
    outF.out_string( "Heinz Albert Petersen" ) ;
    outF.out_eoln() ;

    will write this:

    ConductorNames "Joseph Keitel" "Heinz Albert Petersen"

    And:

    outF.out_section( Reservations ) ;
    outF.out_data( "Helmut Grockenberger 4 70 Dresden Decin" ) ;
    outF.out_section_end() ;

    will write this:

    SECTION Reservations
    Helmut Grockenberger 4 70 Dresden Decin
    END

    Not exactly what you expected? Quotation marks missing? Well, section data are not my problem, you know. CML-file only reads and writes whole lines.
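    Since out_data writes a line exactly as it gets it, quoting strings with spaces is up to you. A small sketch of hypothetical helpers (quote_if_needed and section_row are not part of CML-file) that build such a row:

```cpp
#include <string>
#include <vector>

// Hypothetical: quotes a value if it contains a space (or is empty), so it
// is read back as a single item.
std::string quote_if_needed(const std::string &v)
{
    if (v.empty() || v.find(' ') != std::string::npos) return "\"" + v + "\"";
    return v;
}

// Hypothetical: joins the items of one section row, separated by spaces.
std::string section_row(const std::vector<std::string> &items)
{
    std::string row;
    for (const std::string &item : items) {
        if (!row.empty()) row += ' ';
        row += quote_if_needed(item);
    }
    return row;
}
```

    The resulting text can then be passed to out_data; since out_data is declared above as taking a char *, copy it into a character buffer first.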

7. Error flags

    If an error occurs, CML-file sets an internal flag. These functions tell you whether an error has occurred:

    int cmlfile::error_opening_file ( void ) ;
    int cmlfile::error_reading_file ( void ) ;
    int cmlfile::error_writing_file ( void ) ;
    int cmlfile::error_in_requirements ( void ) ;
    int cmlfile::any_error ( void ) ;

    You should check them from time to time. Note that if an error occurs and you want to continue reading, you should call:

    void cmlfile::clear_errors ( void ) ;

    CML-file lets you read further data even if an error flag is set to 1, but you cannot recognise any new errors that occur afterwards, because the flags are already set to 1.

8. Compiling CML-file

    CML-file is now released as a shared library. You get the file cmlfile-0001.tgz; unpack it, then type 'make' or 'make lib' to create the library; then become root and type 'make install'. Before that, you can edit these lines:

    INSTDIR_INCLUDE = /usr/include
    INSTDIR_LIB = /usr/lib

    These lines specify where the header file (cmlfile.h) and the library (libcmlfile.so) will be installed. (Wow, you can find these lines in the makefile.)

    After successfully installing cmlfile, you can use it in your own programs. Just include the header:

    #include <cmlfile.h>

    When linking your program, pass the switch -lcmlfile to the linker, for example:

    g++ -o something something.o -lcmlfile

    Please do not forget that cmlfile is a C++ class, so you need a C++ compiler such as g++ to compile the parts of your program that use it.

    
9. Notices

    This is the first released version of cmlfile (cmlfile-0001). I hope it does not contain fatal errors, but who knows...
    I must mention that this is released under the GPL.