2023-05-17
mgl-msgconv - CSV to message data converter
mgl-msgconv csv-filename [options]
mgl-msgconv is a tool for converting multi-language message data into binary or C/C++ source code.
A CSV format file is used to input the message data. For details on the format, see the CSV FORMAT section.
The source CSV file must be encoded in UTF-8 and conform to RFC 4180. Most software can output RFC 4180 compliant CSV files when ',' is specified as the delimiter and CR+LF as the line ending.
The first record lists the language names. Enter "id" in the first field and one language name in each following field. The first language listed is the default language.
Example: id, en_US, ja_JP
Each subsequent record begins with a message ID, followed by the message for each language in the order given in the first record. If a message is left blank, the message in the default language is used.
Example: HelloWorld, "hello, world", "こんにちは,世界"
It is recommended that language names and message IDs follow C/C++ identifier naming rules.
--hash-seed
Specifies the seed value of the hash function used for binary output. If not specified, 0xA3F6C23E is used. Use this option to change the seed if the hash values of two names collide.
--help
Display simple help and exit.
--indexing, --indexing-all
Converts the characters of the message data into indices and outputs, in JSON format, a list of the characters used in the message data. The character list maps each index back to its Unicode character, and it can also be passed to mgl-font2tex to generate a texture. With --indexing, indexing is performed separately for each language; with --indexing-all, all languages are indexed together. This option is ignored when the --source option is specified.
--output, -o
Specifies the output directory. If not specified, the output is directed to the current directory.
--prefix, -p
Specifies the prefix of the output file. If not specified, the prefix "msg" is used.
--replace
Outputs a copy of an existing file in which parts are replaced with the conversion result. Each replacement command written in that file is replaced with the corresponding converted result. See the TEXT REPLACE section for details on replacement commands.
--source, -s
Outputs the conversion result as strings that can be used in C or C++ source code. When this option is used, binary data and related files are not output. The message data for each language is written to a file named prefix + language name + ".inc", and the message IDs are written to prefix + "_id.inc". If the --replace option is used, these files are not output.
--version, -v
Display the version information and exit.
0
Succeeded.
otherwise
Failed.
mgl-msgconv will recognize a string enclosed in ‘$’ as a replacement command.
Example: $REPLACE_COMMAND$
A ‘:’ after the command name separates it from an argument; the string that follows the ‘:’ is treated as the command's argument.
Example: $REPLACE_COMMAND:argument$
A replacement command must be written at the beginning of a line; if any character other than a space or tab precedes it, it is not recognized as a replacement command. Any indentation before the command is preserved in the result. Because replacement is performed line by line, all characters after the replacement command up to the end of the line are ignored.
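The recognition rule above can be sketched as a small line parser. This is an illustrative sketch, not mgl-msgconv's own code: the ReplaceCommand struct and ParseReplaceCommand function are hypothetical, and it assumes the argument is everything after the first ‘:’ up to the closing ‘$’ (so a path containing ‘:’ stays intact).

```cpp
#include <optional>
#include <string>

// Hypothetical result type: the preserved indentation, the command name, and its argument.
struct ReplaceCommand
{
    std::string indent;
    std::string name;
    std::string argument;  // empty if no ':' was given
};

// Recognizes "$NAME$" or "$NAME:argument$" preceded only by spaces or tabs.
// Anything after the closing '$' is ignored, matching the line-based rule above.
std::optional<ReplaceCommand> ParseReplaceCommand(const std::string &line)
{
    const auto start = line.find_first_not_of(" \t");
    if (start == std::string::npos || line[start] != '$')
    {
        return std::nullopt;  // some other character precedes the command
    }
    const auto end = line.find('$', start + 1);
    if (end == std::string::npos || end == start + 1)
    {
        return std::nullopt;  // no closing '$' or an empty command
    }
    const std::string body = line.substr(start + 1, end - start - 1);

    ReplaceCommand cmd;
    cmd.indent = line.substr(0, start);
    const auto colon = body.find(':');
    cmd.name = body.substr(0, colon);  // whole body when there is no ':'
    if (colon != std::string::npos)
    {
        cmd.argument = body.substr(colon + 1);
    }
    return cmd;
}
```

Keeping the indentation as a separate field makes it easy to prefix every generated line with it.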
ID_LIST
Outputs a list of message IDs on that line. For source output, the IDs are listed in the same order as the messages output by MESSAGE_LIST. For binary output, each message ID is output as a definition of its hash value.
LANGUAGE_LIST
Outputs a list of language names on that line. The order of the output is the order of the fields in the first record of the CSV.
MESSAGE_LIST:language
Outputs a list of messages in the language specified by the argument on the line. This command is available for source output only. The output order is the order in which the data is written to the CSV file.
IF:condition
Outputs the C preprocessor’s #if directive to the line. The argument is a condition that will be replaced by 0 or 1 on output. See the conditions section below for valid arguments.
ELIF:condition
Outputs the C preprocessor’s #elif directive on that line. The argument is a condition that is replaced with 0 or 1 in the output. See the conditions section below for valid arguments.
ELSE
Output the C preprocessor’s #else directive on that line.
ENDIF
Output the C preprocessor’s #endif directive on that line.
OUTPUT_NAME:name
Specifies the name of the output file produced by replacement. If name is a relative path, it is resolved relative to the directory specified by the --output option. If this command is not used, the input file name is used. If the input and output files would be the same, ".replaced" is appended to the name to prevent the input from being overwritten.
SOURCE
Outputs 1 if source code is output, 0 otherwise.
BINARY
Outputs 1 if the output is binary, 0 otherwise.
INDEXED
Outputs 1 if --indexing or --indexing-all is used to output indexed strings, 0 otherwise.
TRUE
Always outputs 1.
FALSE
Always outputs 0.
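The conditions above reduce to a lookup that yields 0 or 1. The sketch below uses a hypothetical ConvertOptions struct and hypothetical EvaluateCondition and ExpandConditional functions. Two behaviors are assumptions inferred from the option descriptions rather than confirmed: BINARY is treated as the opposite of SOURCE, and INDEXED is 0 under --source because indexing is ignored there.

```cpp
#include <optional>
#include <string>

// Hypothetical view of the command-line options that affect conditions.
struct ConvertOptions
{
    bool source = false;   // --source / -s was given
    bool indexed = false;  // --indexing or --indexing-all was given
};

// Maps a condition name to 0 or 1; unknown names yield nullopt.
std::optional<int> EvaluateCondition(const std::string &name, const ConvertOptions &opt)
{
    if (name == "SOURCE") return opt.source ? 1 : 0;
    if (name == "BINARY") return opt.source ? 0 : 1;                   // assumption: binary unless --source
    if (name == "INDEXED") return (opt.indexed && !opt.source) ? 1 : 0; // assumption: ignored with --source
    if (name == "TRUE") return 1;
    if (name == "FALSE") return 0;
    return std::nullopt;
}

// Builds the #if / #elif line for IF:condition or ELIF:condition,
// keeping the indentation that preceded the replacement command.
std::optional<std::string> ExpandConditional(const std::string &indent, bool isElif,
                                             const std::string &condition, const ConvertOptions &opt)
{
    const auto value = EvaluateCondition(condition, opt);
    if (!value)
    {
        return std::nullopt;
    }
    return indent + (isElif ? "#elif " : "#if ") + std::to_string(*value);
}
```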
The binary data output by the converter contains the following contents in the following order.
All values are little-endian in byte order.
The header is 24 bytes in length and contains the following parameters.
+0 hashSeed (4 bytes)
The seed value for the hash calculations used in this file. The value specified by the --hash-seed option (0xA3F6C23E if the option is not given) is stored here. See the HASH ALGORITHM section for the hash generation algorithm used by mgl-msgconv.
+4 identify (4 bytes)
An identifier used to confirm that the file is a message data file. It contains the hash of the string "MessageData" computed with hashSeed. This value verifies both that the file is message data and that the hash generation algorithm produces the intended value.
+8 revision (4 bytes)
Revision number of this binary format. It is currently always 0 and will be changed if the format is revised in the future.
+12 languageCount (4 bytes)
The number of languages included in the message data.
+16 identifyCount (4 bytes)
The number of message IDs included in the message data.
+20 flags (4 bytes)
Bit flags that indicate the attributes of the message data. See the header bit flags described below for the contents.
Bit 0: Indexed strings flag
This flag indicates whether or not the string stored in the message pool is an indexed string. If this flag is 0, the message data is encoded in UTF-8.
Bits 1 to 31: Unused
This area is currently not in use.
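The header layout above can be read with a short routine like the following. It is a sketch, not code from mgl-msgconv: the MessageDataHeader struct and the ReadLE32, ParseHeader, and kIndexedStringFlag names are hypothetical, and FNV1a is an iterative rewrite of the function in the HASH ALGORITHM section. Each field is decoded explicitly as little-endian instead of casting the buffer, so it behaves the same on big-endian hosts.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Field names follow the header description; the struct itself is illustrative.
struct MessageDataHeader
{
    uint32_t hashSeed;
    uint32_t identify;
    uint32_t revision;
    uint32_t languageCount;
    uint32_t identifyCount;
    uint32_t flags;
};

constexpr std::size_t kHeaderSize = 24;
constexpr uint32_t kIndexedStringFlag = 1u << 0;  // bit 0 of flags

// Decodes a little-endian 32-bit value regardless of host byte order.
constexpr uint32_t ReadLE32(const uint8_t *p) noexcept
{
    return uint32_t(p[0]) | (uint32_t(p[1]) << 8) | (uint32_t(p[2]) << 16) | (uint32_t(p[3]) << 24);
}

// Iterative FNV-1a with the seed as the starting value.
constexpr uint32_t FNV1a(const char *str, uint32_t seed) noexcept
{
    for (; *str != '\0'; ++str)
    {
        seed = (seed ^ uint32_t(*str)) * 0x1000193u;
    }
    return seed;
}

// Returns the header, or nullopt if the buffer is too short or identify does not match.
std::optional<MessageDataHeader> ParseHeader(const uint8_t *data, std::size_t size)
{
    if (size < kHeaderSize)
    {
        return std::nullopt;
    }
    MessageDataHeader h{};
    h.hashSeed = ReadLE32(data + 0);
    h.identify = ReadLE32(data + 4);
    h.revision = ReadLE32(data + 8);
    h.languageCount = ReadLE32(data + 12);
    h.identifyCount = ReadLE32(data + 16);
    h.flags = ReadLE32(data + 20);
    // identify must equal the hash of "MessageData" under this file's seed.
    if (h.identify != FNV1a("MessageData", h.hashSeed))
    {
        return std::nullopt;
    }
    return h;
}
```

After a successful parse, `(h.flags & kIndexedStringFlag) != 0` tells the reader whether the message pool holds 2-byte indexed strings or UTF-8.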
The language table contains 32-bit hash values of the language names specified in CSV. The total size is the number of languages x 4 bytes.
This table is used to derive an index from a given language.
The message ID hash table contains a 32-bit hash value of the name of each ID specified in CSV. The total size is the number of message IDs x 4 bytes.
This table is used to derive an index from a given message ID.
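Both tables turn a name into an index by hashing it and finding the matching entry. A minimal lookup might look like this, assuming the table has already been decoded into host-order uint32_t values and that a linear scan is acceptable (the format description does not say the tables are sorted). FindIndex is a hypothetical name; FNV1a is an iterative form of the function in the HASH ALGORITHM section.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Iterative FNV-1a with the seed as the starting value.
constexpr uint32_t FNV1a(const char *str, uint32_t seed) noexcept
{
    for (; *str != '\0'; ++str)
    {
        seed = (seed ^ uint32_t(*str)) * 0x1000193u;
    }
    return seed;
}

// Returns the position of `name` in a language table or message ID hash table.
std::optional<std::size_t> FindIndex(const uint32_t *table, std::size_t count,
                                     const char *name, uint32_t hashSeed)
{
    const uint32_t hash = FNV1a(name, hashSeed);
    for (std::size_t i = 0; i < count; ++i)
    {
        if (table[i] == hash)
        {
            return i;
        }
    }
    return std::nullopt;  // name not present in this table
}
```

The same function serves both tables; only the table pointer and count differ.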
The offset pool contains the values used to locate each message from its language and message ID. The first 4 bytes hold the pool size in bytes, and the offset pool itself occupies that many bytes immediately after them.
The offset pool is an array of 4-byte offset values into the message pool. To obtain the offset for a given language and message ID, read the element at the following index:
Index of message ID * number of languages + index of language
The value read at this index is used as an index into the message pool.
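As a worked example of the index formula: with 2 languages, the message at ID index 2 in the language at index 1 is stored at offset pool element 2 * 2 + 1 = 5. A minimal sketch, with the hypothetical helper name MessageOffset and the offset pool assumed to be already decoded into host-order uint32_t values:

```cpp
#include <cstddef>
#include <cstdint>

// Returns the message pool index for (message ID index, language index).
// Entries are grouped by message ID, with one entry per language inside each group.
constexpr uint32_t MessageOffset(const uint32_t *offsets, std::size_t languageCount,
                                 std::size_t idIndex, std::size_t languageIndex) noexcept
{
    return offsets[idIndex * languageCount + languageIndex];
}
```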
The message pool is the area that stores the message data. The first 4 bytes hold the pool size in bytes, and the message pool itself occupies that many bytes immediately after them.
By accessing elements read from the offset pool as indexes, the beginning of the message data corresponding to the language and message ID can be obtained.
The size of each element in the message pool depends on the encoding. If the message is stored in UTF-8, it is 1 byte; if it is an indexed string, it is 2 bytes.
Each message is terminated by a terminator. The terminator is 0x00 for UTF-8 and 0xFFFF for indexed string.
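Reading a message from the pool then means scanning elements from its offset to the terminator. The sketch below assumes, as the description above implies, that offsets count elements rather than bytes, so an indexed-string offset is multiplied by 2 to reach the byte position. ReadUtf8Message and ReadIndexedMessage are hypothetical names; poolSize is the byte size stored before the pool, and pool points just past it.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// UTF-8 pools: 1-byte elements, terminated by 0x00.
std::string ReadUtf8Message(const uint8_t *pool, std::size_t poolSize, std::size_t offset)
{
    std::string result;
    for (std::size_t i = offset; i < poolSize && pool[i] != 0x00; ++i)
    {
        result.push_back(char(pool[i]));
    }
    return result;
}

// Indexed pools: 2-byte little-endian elements, terminated by 0xFFFF.
std::vector<uint16_t> ReadIndexedMessage(const uint8_t *pool, std::size_t poolSize, std::size_t offset)
{
    std::vector<uint16_t> result;
    for (std::size_t i = offset; (i + 1) * 2 <= poolSize; ++i)
    {
        const uint16_t value = uint16_t(pool[i * 2] | (pool[i * 2 + 1] << 8));
        if (value == 0xFFFF)
        {
            break;  // terminator reached
        }
        result.push_back(value);
    }
    return result;
}
```

Both loops also stop at the end of the pool, so a missing terminator cannot cause a read past the buffer.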
The footer is 4 bytes and contains the 32-bit hash of the string "EndOfRecord". If this value is not found after all preceding sections have been read, parsing of the message data may have failed.
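A footer check can walk the sections by their sizes and compare the final 4 bytes. This sketch assumes the sections appear back to back with no padding, in the order header, language table, message ID hash table, offset pool, message pool, footer; VerifyFooter, ReadLE32, and the iterative FNV1a are hypothetical helper names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Decodes a little-endian 32-bit value regardless of host byte order.
constexpr uint32_t ReadLE32(const uint8_t *p) noexcept
{
    return uint32_t(p[0]) | (uint32_t(p[1]) << 8) | (uint32_t(p[2]) << 16) | (uint32_t(p[3]) << 24);
}

// Iterative FNV-1a with the seed as the starting value.
constexpr uint32_t FNV1a(const char *str, uint32_t seed) noexcept
{
    for (; *str != '\0'; ++str)
    {
        seed = (seed ^ uint32_t(*str)) * 0x1000193u;
    }
    return seed;
}

bool VerifyFooter(const std::vector<uint8_t> &file)
{
    const std::size_t size = file.size();
    if (size < 24)
    {
        return false;  // not even a complete header
    }
    const uint8_t *data = file.data();
    const uint32_t hashSeed = ReadLE32(data + 0);
    const uint32_t languageCount = ReadLE32(data + 12);
    const uint32_t identifyCount = ReadLE32(data + 16);

    // Skip the header, the language table, and the message ID hash table.
    std::size_t pos = 24 + std::size_t(languageCount) * 4 + std::size_t(identifyCount) * 4;

    // Skip the offset pool, then the message pool: each is a 4-byte byte size followed by that many bytes.
    for (int pool = 0; pool < 2; ++pool)
    {
        if (pos + 4 > size)
        {
            return false;
        }
        pos += 4 + std::size_t(ReadLE32(data + pos));
    }

    // The footer is the hash of "EndOfRecord" under the file's seed.
    if (pos + 4 > size)
    {
        return false;
    }
    return ReadLE32(data + pos) == FNV1a("EndOfRecord", hashSeed);
}
```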
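mgl-msgconv uses the FNV-1a algorithm for hash generation, with the hash seed taking the place of the standard FNV offset basis. The following is an example C++ implementation of this algorithm.
#include <cstdint>

// 32-bit FNV prime.
constexpr uint32_t kFNV1aPrime32 = 0x1000193;

// Hashes a NUL-terminated string, starting from `seed` (hashSeed).
// Requires C++14, because the function body has more than one statement.
constexpr uint32_t FNV1a(
    const char *str,
    const uint32_t seed) noexcept
{
    // End of string: the accumulated value is the hash.
    if (str[0] == '\0')
    {
        return seed;
    }
    // XOR in the next character, multiply by the prime, and continue.
    // Note: where char is signed, bytes >= 0x80 are sign-extended by this cast.
    return FNV1a(
        &str[1],
        (seed ^ uint32_t(str[0])) * kFNV1aPrime32);
}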
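Because the function is constexpr, hash values can be computed at compile time. The sketch below restates it as a loop (a hypothetical FNV1aIterative, intended to return the same values as the recursive form) and uses it to define compile-time constants. The constant names are illustrative only and are not necessarily what ID_LIST generates.

```cpp
#include <cstdint>

// Loop form of the recursive FNV1a: XOR each character, then multiply by the prime.
constexpr uint32_t FNV1aIterative(const char *str, uint32_t seed) noexcept
{
    for (; *str != '\0'; ++str)
    {
        seed = (seed ^ uint32_t(*str)) * 0x1000193u;
    }
    return seed;
}

constexpr uint32_t kDefaultHashSeed = 0xA3F6C23Eu;  // default --hash-seed value

// Hypothetical compile-time constants: the expected identify field and a message ID hash.
constexpr uint32_t kMessageDataIdentify = FNV1aIterative("MessageData", kDefaultHashSeed);
constexpr uint32_t kHelloWorldId = FNV1aIterative("HelloWorld", kDefaultHashSeed);

// Sanity checks evaluated by the compiler.
static_assert(FNV1aIterative("", kDefaultHashSeed) == kDefaultHashSeed, "empty string returns the seed");
static_assert(FNV1aIterative("a", 0u) == uint32_t('a') * 0x1000193u, "one xor-then-multiply round");
```

Constants like these can be used as switch case labels, which is one practical reason to keep the hash function constexpr.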
mgl-font2tex(1)