mgl-msgconv(1) Version 1.3.0 | MGL-Tools Manual

2023-05-17

NAME

mgl-msgconv - CSV to message data converter

SYNOPSIS

mgl-msgconv csv-filename [options]

DESCRIPTION

mgl-msgconv is a tool for converting multi-language message data into binary or C/C++ source code.

The message data is read from a CSV file. For details on the format, see the CSV FORMAT section.

CSV FORMAT

The source CSV file must be UTF-8 encoded and formatted according to RFC 4180. Most software can output RFC 4180 compliant CSV files when ',' is specified as the delimiter and CR+LF as the line ending.

The first record lists the language names. Its first field must be "id", and each following field contains a language name. The first language listed is the default language.

Example: id, en_US, ja_JP

Each subsequent record begins with a message ID (label name), followed by the message for each language in the same order as the first record. If a message is left blank, the message of the default language is used.

Example: HelloWorld, "hello, world", "こんにちは,世界"

It is recommended that language names and message IDs follow C/C++ identifier naming rules.
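The blank-message fallback rule can be sketched in C++ as follows. This is a minimal illustration, not mgl-msgconv's implementation: the Record type and the ResolveMessages function are hypothetical, the record is assumed to be already split into fields, and treating missing trailing fields as blank is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical representation of one parsed CSV record:
// fields[0] is the message ID, fields[1..] are the messages in header order.
using Record = std::vector<std::string>;

// Returns one message per language (languageCount entries). Blank fields are
// replaced by the default language's message, i.e. the first language column.
// Assumes the record has at least two fields (ID + default message).
std::vector<std::string> ResolveMessages(const Record &fields,
                                         std::size_t languageCount)
{
    std::vector<std::string> messages;
    messages.reserve(languageCount);
    const std::string &defaultMessage = fields.at(1);
    for (std::size_t i = 0; i < languageCount; ++i)
    {
        const std::size_t column = i + 1;
        // Assumption: a missing trailing field is treated like a blank one.
        const bool blank = column >= fields.size() || fields[column].empty();
        messages.push_back(blank ? defaultMessage : fields[column]);
    }
    return messages;
}
```

For the record {"Greeting", "hello", "", "hola"} with three languages, the second language falls back to "hello".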

OPTIONS

--hash-seed

Specifies the seed value of the hash function used for binary output. If not specified, 0xA3F6C23E is used. If two hash values collide, use this option to choose a different seed.
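One way to check a candidate seed before passing it to --hash-seed is to hash every name and look for duplicates, for example within the message ID table. The sketch below is a hypothetical helper (HasHashCollision is not part of mgl-msgconv); Fnv1a is an iterative equivalent of the algorithm in the HASH ALGORITHM section, assuming bytes are hashed as unsigned octets.

```cpp
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Iterative 32-bit FNV-1a; the seed is used as the initial hash value.
uint32_t Fnv1a(const std::string &str, uint32_t seed)
{
    uint32_t hash = seed;
    for (unsigned char c : str)
    {
        hash = (hash ^ c) * 0x1000193u;
    }
    return hash;
}

// Hypothetical helper: returns true if any two names in one table produce
// the same hash under the given seed (identical names also count).
bool HasHashCollision(const std::vector<std::string> &names, uint32_t seed)
{
    std::unordered_set<uint32_t> seen;
    for (const auto &name : names)
    {
        if (!seen.insert(Fnv1a(name, seed)).second)
        {
            return true;
        }
    }
    return false;
}
```

If HasHashCollision reports a collision for the default seed, any seed for which it returns false is a candidate for --hash-seed.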

--help

Displays brief help and exits.

--indexing, --indexing-all

Converts the characters of the message data into indices and outputs the list of characters used in the message data in JSON format. The character list is used to map the indices back to Unicode characters, and it can also be passed to mgl-font2tex to generate textures. With --indexing, indexing is performed separately for each language; with --indexing-all, all languages are indexed together. This option is ignored when the --source option is specified.
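Character indexing can be pictured as building a table of distinct characters and replacing each character with its position in that table. The sketch below is hypothetical: it assigns indices in order of first appearance (mgl-msgconv's actual ordering is not documented here) and takes already-decoded UTF-32 input. It uses 2-byte indices and the 0xFFFF terminator described in the BINARY FORMAT section.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical indexer. One instance per language mimics --indexing;
// a single shared instance mimics --indexing-all.
struct CharacterIndexer
{
    std::vector<char32_t> characters;               // index -> character
    std::unordered_map<char32_t, uint16_t> indices; // character -> index

    // Converts a message into an indexed string terminated by 0xFFFF.
    std::vector<uint16_t> Convert(const std::u32string &message)
    {
        std::vector<uint16_t> result;
        for (char32_t c : message)
        {
            auto it = indices.find(c);
            if (it == indices.end())
            {
                // 0xFFFF is reserved for the terminator.
                if (characters.size() >= 0xFFFF)
                {
                    throw std::length_error("too many distinct characters");
                }
                const auto index = static_cast<uint16_t>(characters.size());
                characters.push_back(c);
                it = indices.emplace(c, index).first;
            }
            result.push_back(it->second);
        }
        result.push_back(0xFFFF);
        return result;
    }
};
```

The characters vector plays the role of the character list: entry i is the Unicode character for index i.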

--output, -o

Specifies the output directory. If not specified, the output is directed to the current directory.

--prefix, -p

Specifies the prefix of the output file. If not specified, the prefix "msg" is used.

--replace

Replaces parts of an existing file with the conversion result and outputs the result. Each replacement command written in that file is replaced with the corresponding converted data. See the TEXT REPLACE section for details on replacement commands.

--source, -s

Outputs the conversion result as strings that can be used in C or C++ source code. When this option is used, binary data and related files are not output. The messages for each language are written to a file named prefix + language name + ".inc", and the message IDs to prefix + "_id.inc". If the --replace option is used, these files are not output.

--version, -v

Displays the version information and exits.

EXIT STATUS

0

Succeeded.

otherwise

Failed.

TEXT REPLACE

mgl-msgconv recognizes a string enclosed in '$' characters as a replacement command.

Example: $REPLACE_COMMAND$

If the command name is followed by ':', the string after the ':' is treated as the argument to the command.

Example: $REPLACE_COMMAND:argument$

A replacement command must be written at the beginning of a line; if any character other than a space or tab precedes it, it is not recognized as a replacement command. The indentation before the command is preserved in the result. Because replacement is done line by line, everything after the replacement command up to the end of the line is ignored.
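The line rules above can be sketched as a small parser. ReplaceCommand and ParseReplaceCommand are hypothetical names, and splitting at the first ':' is an assumption; the sketch only illustrates how indentation, the command name, and the argument relate to one input line.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical parse result for one line of a file processed with --replace.
struct ReplaceCommand
{
    std::string indent;   // leading spaces/tabs, reproduced in the output
    std::string name;     // e.g. "MESSAGE_LIST"
    std::string argument; // text after the first ':', empty if none
};

// Recognizes "<spaces/tabs>$NAME$" or "<spaces/tabs>$NAME:argument$".
// Anything after the closing '$' is ignored because the whole line is replaced.
std::optional<ReplaceCommand> ParseReplaceCommand(const std::string &line)
{
    const std::size_t start = line.find_first_not_of(" \t");
    if (start == std::string::npos || line[start] != '$')
    {
        return std::nullopt; // other text precedes the command, or no command
    }
    const std::size_t end = line.find('$', start + 1);
    if (end == std::string::npos)
    {
        return std::nullopt; // unterminated command
    }
    const std::string body = line.substr(start + 1, end - start - 1);
    ReplaceCommand command;
    command.indent = line.substr(0, start);
    const std::size_t colon = body.find(':');
    command.name = body.substr(0, colon);
    if (colon != std::string::npos)
    {
        command.argument = body.substr(colon + 1);
    }
    return command;
}
```

For example, "    $MESSAGE_LIST:en_US$" yields a four-space indent, the name MESSAGE_LIST, and the argument en_US, while "int x; $ID_LIST$" is not treated as a command.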

Replacement commands

ID_LIST

Outputs a list of message IDs at that line. For source output, the IDs are listed in the same order as the output of MESSAGE_LIST. For binary output, each message ID is output as a definition of its hash value.

LANGUAGE_LIST

Outputs a list of language names on that line. The order of the output is the order of the fields in the first record of the CSV.

MESSAGE_LIST:language

Outputs, at that line, the list of messages in the language specified by the argument. This command is available for source output only. The messages are output in the order in which they appear in the CSV file.

IF:condition

Outputs a C preprocessor #if directive at that line. The condition given as the argument is replaced by 0 or 1 in the output. See the Conditions section below for valid arguments.

ELIF:condition

Outputs a C preprocessor #elif directive at that line. The condition given as the argument is replaced by 0 or 1 in the output. See the Conditions section below for valid arguments.

ELSE

Outputs a C preprocessor #else directive at that line.

ENDIF

Outputs a C preprocessor #endif directive at that line.

OUTPUT_NAME:name

Specifies the name of the output file. If name is a relative path, it is resolved relative to the directory specified by the --output option. If this command is not used, the output file has the same name as the input file. If the input and output files would be the same file, ".replaced" is appended to the name to prevent overwriting.

Conditions

SOURCE

Outputs 1 if source code is output, 0 otherwise.

BINARY

Outputs 1 if the output is binary, 0 otherwise.

INDEXED

Outputs 1 if indexed strings are output (--indexing or --indexing-all is specified), 0 otherwise.

TRUE

Always outputs 1.

FALSE

Always outputs 0.
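The condition values can be summarized as a lookup from condition name to 0 or 1. This is a hypothetical sketch: ConversionMode, EvaluateCondition, and MakeConditionalDirective are illustrative names, and it assumes that binary output is produced whenever --source is not given, that INDEXED is 0 with --source (since indexing is ignored then), and that the directive is written as "#if 1" with a single space.

```cpp
#include <optional>
#include <string>

// Hypothetical summary of the options that affect conditions.
struct ConversionMode
{
    bool source = false;  // --source
    bool indexed = false; // --indexing or --indexing-all
};

// Returns 1 or 0 for a known condition, or nullopt for an unknown one.
std::optional<int> EvaluateCondition(const std::string &condition,
                                     const ConversionMode &mode)
{
    if (condition == "SOURCE") return mode.source ? 1 : 0;
    if (condition == "BINARY") return mode.source ? 0 : 1; // assumption
    if (condition == "INDEXED") return (mode.indexed && !mode.source) ? 1 : 0;
    if (condition == "TRUE") return 1;
    if (condition == "FALSE") return 0;
    return std::nullopt;
}

// Builds the directive for $IF:condition$ ("if") or $ELIF:condition$ ("elif").
std::optional<std::string> MakeConditionalDirective(const std::string &directive,
                                                    const std::string &condition,
                                                    const ConversionMode &mode)
{
    const auto value = EvaluateCondition(condition, mode);
    if (!value)
    {
        return std::nullopt;
    }
    return "#" + directive + " " + std::to_string(*value);
}
```

With --source, a line containing $IF:SOURCE$ would become "#if 1" under these assumptions.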

BINARY FORMAT

The binary data output by the converter consists of the following sections, in this order: header, language table, message ID table, offset pool, message pool, and footer.

All multi-byte values are stored in little-endian byte order.

Header

The header is 24 bytes long and contains the following fields.

+0 hashSeed (4 bytes)

The seed value for the hash calculation used by this file. The value specified by the --hash-seed option (or the default seed) is stored here. See the HASH ALGORITHM section for the hash generation algorithm used by mgl-msgconv.

+4 identify (4 bytes)

An identifier confirming that the file is a message data file. It contains the hash of the string "MessageData" computed with hashSeed. This value is used both to verify that the file is message data and to verify that the reader's hash implementation produces the intended values.

+8 revision (4 bytes)

Revision of this binary format. It is currently always 0 and will be incremented when the format changes.

+12 languageCount (4 bytes)

The number of languages included in the message data.

+16 identifyCount (4 bytes)

The number of message IDs included in the message data.

+20 flags (4 bytes)

Bit flags that indicate the attributes of the message data. See the header bit flags described below for the contents.

Header bit flags

Bit 0: Indexed strings flag

Indicates whether the strings stored in the message pool are indexed strings. If this flag is 0, the messages are encoded in UTF-8.

Bits 1 to 31: Unused

This area is currently not in use.
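A reader might parse and validate the header as sketched below. ParseHeader, ReadU32LE, and MessageDataHeader are hypothetical names; the byte offsets, field order, flag bit, and the "MessageData" check follow the description above, and Fnv1a is an iterative form of the algorithm in the HASH ALGORITHM section, hashing bytes as unsigned octets.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// 32-bit FNV-1a with the seed as the initial hash value.
inline uint32_t Fnv1a(const char *str, uint32_t seed)
{
    for (; *str != '\0'; ++str)
    {
        seed = (seed ^ static_cast<unsigned char>(*str)) * 0x1000193u;
    }
    return seed;
}

// Reads a little-endian 32-bit value independently of the host byte order.
inline uint32_t ReadU32LE(const uint8_t *p)
{
    return uint32_t(p[0]) | (uint32_t(p[1]) << 8) |
           (uint32_t(p[2]) << 16) | (uint32_t(p[3]) << 24);
}

// The 24-byte header.
struct MessageDataHeader
{
    uint32_t hashSeed;
    uint32_t identify;
    uint32_t revision;
    uint32_t languageCount;
    uint32_t identifyCount;
    uint32_t flags;

    // Bit 0 of flags: 1 if the message pool holds indexed strings.
    bool IsIndexed() const { return (flags & 0x1u) != 0; }
};

// Returns nullopt if the data is shorter than 24 bytes or if identify
// does not match the hash of "MessageData" computed with hashSeed.
std::optional<MessageDataHeader> ParseHeader(const std::vector<uint8_t> &data)
{
    if (data.size() < 24)
    {
        return std::nullopt;
    }
    const uint8_t *p = data.data();
    MessageDataHeader header{};
    header.hashSeed = ReadU32LE(p + 0);
    header.identify = ReadU32LE(p + 4);
    header.revision = ReadU32LE(p + 8);
    header.languageCount = ReadU32LE(p + 12);
    header.identifyCount = ReadU32LE(p + 16);
    header.flags = ReadU32LE(p + 20);
    if (header.identify != Fnv1a("MessageData", header.hashSeed))
    {
        return std::nullopt;
    }
    return header;
}
```

A stricter reader could also reject any revision other than 0.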

Language table

The language table contains the 32-bit hash value of each language name specified in the CSV file. Its total size is the number of languages x 4 bytes.

This table is used to find the index of a given language: hash the language name with hashSeed and locate the matching entry.

Message ID table

The message ID table contains the 32-bit hash value of each message ID specified in the CSV file. Its total size is the number of message IDs x 4 bytes.

This table is used to find the index of a given message ID in the same way.
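Finding an index in either table amounts to locating the entry equal to the hash of the name. The hypothetical FindIndex below uses a linear search, since no ordering of the table entries is specified; the hash passed in would be the FNV-1a hash of the name computed with hashSeed.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical lookup for the language table or the message ID table.
// Returns the position of `hash` in `table`, or nullopt if it is absent.
std::optional<uint32_t> FindIndex(const std::vector<uint32_t> &table,
                                  uint32_t hash)
{
    for (std::size_t i = 0; i < table.size(); ++i)
    {
        if (table[i] == hash)
        {
            return static_cast<uint32_t>(i);
        }
    }
    return std::nullopt;
}
```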

Offset pool

The offset pool contains the values used to locate a message from its language and message ID. The first 4 bytes contain the pool size in bytes, and the offset pool itself occupies that many bytes immediately after them.

The offset pool is an array of 4-byte offsets into the message pool. To obtain the offset for a given language and message ID, read the element at the following index:

(message ID index) * languageCount + (language index)

The value read at this index is used as an index into the message pool.
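The index formula can be written as a small function. LookupMessageOffset is a hypothetical name; it assumes the offset pool has already been decoded into 4-byte elements (pool size in bytes / 4) and adds bounds checks that a reader would likely want.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Returns the message pool index for (message ID index, language index),
// or nullopt if either index is out of range.
std::optional<uint32_t> LookupMessageOffset(const std::vector<uint32_t> &offsetPool,
                                            uint32_t languageCount,
                                            uint32_t idIndex,
                                            uint32_t languageIndex)
{
    if (languageIndex >= languageCount)
    {
        return std::nullopt;
    }
    // 64-bit arithmetic avoids overflow for large tables.
    const uint64_t element = uint64_t(idIndex) * languageCount + languageIndex;
    if (element >= offsetPool.size())
    {
        return std::nullopt;
    }
    return offsetPool[static_cast<std::size_t>(element)];
}
```

With 3 languages, message ID index 1 and language index 2 select element 1 * 3 + 2 = 5.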

Message pool

The message pool stores the message data. The first 4 bytes contain the pool size in bytes, and the message pool itself occupies that many bytes immediately after them.

Using the value read from the offset pool as an index into the message pool gives the beginning of the message for the given language and message ID.

The size of each element in the message pool depends on the encoding: 1 byte for UTF-8 and 2 bytes for indexed strings.

Each message ends with a terminator: 0x00 for UTF-8 and 0xFFFF for indexed strings.
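Reading one message then means scanning elements from the offset until the terminator. In this hypothetical sketch the offset is interpreted as an element index (so it is multiplied by 2 for indexed strings), which is an assumption drawn from the description of offsets as indexes into the message pool; the pool is passed as its raw bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// UTF-8 pool: 1-byte elements terminated by 0x00.
std::optional<std::string> ReadUtf8Message(const std::vector<uint8_t> &pool,
                                           uint32_t offset)
{
    std::string message;
    for (std::size_t i = offset; i < pool.size(); ++i)
    {
        if (pool[i] == 0x00)
        {
            return message;
        }
        message.push_back(static_cast<char>(pool[i]));
    }
    return std::nullopt; // no terminator before the end of the pool
}

// Indexed pool: 2-byte little-endian elements terminated by 0xFFFF.
std::optional<std::vector<uint16_t>> ReadIndexedMessage(const std::vector<uint8_t> &pool,
                                                        uint32_t offset)
{
    std::vector<uint16_t> message;
    for (std::size_t i = std::size_t(offset) * 2; i + 1 < pool.size(); i += 2)
    {
        const auto element = static_cast<uint16_t>(pool[i] | (pool[i + 1] << 8));
        if (element == 0xFFFF)
        {
            return message;
        }
        message.push_back(element);
    }
    return std::nullopt;
}
```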

Footer

The footer is 4 bytes long and contains the 32-bit hash value of the string "EndOfRecord". If this value is not found after all preceding sections have been read, the message data may not have been parsed correctly.
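Putting the sections together, a reader can compute where each section starts and then verify the footer. This is a sketch under stated assumptions: ComputeLayout and SectionLayout are hypothetical, it assumes there is no padding between sections, and it assumes the footer hash is computed with hashSeed like the identify field.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// 32-bit FNV-1a with the seed as the initial hash value.
inline uint32_t Fnv1a(const char *str, uint32_t seed)
{
    for (; *str != '\0'; ++str)
    {
        seed = (seed ^ static_cast<unsigned char>(*str)) * 0x1000193u;
    }
    return seed;
}

// Reads a little-endian 32-bit value at byte position `pos`.
inline uint32_t ReadU32LE(const std::vector<uint8_t> &data, uint64_t pos)
{
    return uint32_t(data[pos]) | (uint32_t(data[pos + 1]) << 8) |
           (uint32_t(data[pos + 2]) << 16) | (uint32_t(data[pos + 3]) << 24);
}

// Byte positions of each section (hypothetical helper type).
struct SectionLayout
{
    uint64_t languageTable = 0;
    uint64_t idTable = 0;
    uint64_t offsetPool = 0;      // first byte after the 4-byte size field
    uint64_t offsetPoolSize = 0;  // in bytes
    uint64_t messagePool = 0;     // first byte after the 4-byte size field
    uint64_t messagePoolSize = 0; // in bytes
    uint64_t footer = 0;
};

std::optional<SectionLayout> ComputeLayout(const std::vector<uint8_t> &data)
{
    if (data.size() < 24)
    {
        return std::nullopt;
    }
    const uint32_t seed = ReadU32LE(data, 0);
    SectionLayout s;
    uint64_t pos = 24;
    s.languageTable = pos;
    pos += uint64_t(ReadU32LE(data, 12)) * 4; // languageCount entries
    s.idTable = pos;
    pos += uint64_t(ReadU32LE(data, 16)) * 4; // identifyCount entries

    // Each pool is a 4-byte size in bytes followed by that many bytes.
    auto readPool = [&](uint64_t &start, uint64_t &size) {
        if (pos + 4 > data.size())
        {
            return false;
        }
        size = ReadU32LE(data, pos);
        start = pos + 4;
        pos = start + size;
        return true;
    };
    if (!readPool(s.offsetPool, s.offsetPoolSize) ||
        !readPool(s.messagePool, s.messagePoolSize) || pos + 4 > data.size())
    {
        return std::nullopt;
    }
    s.footer = pos;
    if (ReadU32LE(data, pos) != Fnv1a("EndOfRecord", seed))
    {
        return std::nullopt; // footer missing: parsing likely went wrong
    }
    return s;
}
```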

HASH ALGORITHM

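mgl-msgconv uses the 32-bit FNV-1a algorithm for hash generation, with the seed value taking the place of the standard FNV-1a offset basis as the initial hash value. The following is an example C++ implementation of this algorithm (C++14 or later).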

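#include <cstdint>

// 32-bit FNV-1a prime.
constexpr uint32_t kFNV1aPrime32 = 0x1000193;

// Hashes a null-terminated string with 32-bit FNV-1a.
// The seed is used in place of the standard offset basis.
constexpr uint32_t FNV1a(
        const char *str,
        const uint32_t seed) noexcept
{
    // End of string: return the accumulated hash.
    if (str[0] == '\0')
    {
        return seed;
    }

    // XOR in the current byte as an unsigned octet, multiply by the
    // prime, and continue with the rest of the string.
    return FNV1a(
            &str[1],
            (seed ^ uint32_t(static_cast<unsigned char>(str[0]))) * kFNV1aPrime32);
}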
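For strings that are only known at run time, an iterative form may be more convenient than recursion. The sketch below (FNV1aIterative is a hypothetical name, C++17 for std::string_view) computes the same value, and with the standard FNV-1a offset basis 0x811C9DC5 as the seed it reproduces the published FNV-1a test vector for "a".

```cpp
#include <cstdint>
#include <string_view>

constexpr uint32_t kFNV1aPrime32 = 0x1000193;

// Iterative 32-bit FNV-1a; the seed takes the place of the offset basis.
constexpr uint32_t FNV1aIterative(std::string_view str, uint32_t seed) noexcept
{
    uint32_t hash = seed;
    for (const char c : str)
    {
        hash = (hash ^ static_cast<unsigned char>(c)) * kFNV1aPrime32;
    }
    return hash;
}

// Standard offset basis as the seed: matches the reference vector for "a".
static_assert(FNV1aIterative("a", 0x811C9DC5u) == 0xE40C292Cu, "FNV-1a test vector");
// An empty string leaves the seed unchanged.
static_assert(FNV1aIterative("", 0xA3F6C23Eu) == 0xA3F6C23Eu, "empty string");
```

A reader can use this to check the identify field, for example by comparing it with FNV1aIterative("MessageData", hashSeed).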

SEE ALSO

mgl-font2tex(1)