Error Handling

Text is notorious for being a constant and consistently malformed source of input. From intermediate services mangling encodings and producing mojibake, to bungled normalization, to bad programs that do not understand even the slightest hint of anything beyond ASCII, there is a lot of text data that is strictly bad for any program to consume.

When interfacing with range types such as ztd::text::decode_view, functions like ztd::text::transcode, and individual .encode_one or .decode_one calls on encoding objects like ztd::text::utf8, you can:

  • give an error handler type as a template parameter and as part of the constructor; or,

  • pass it in as a normal argument to the function to be used.

Error handlers change how conversions and other operations work. Consider, for example, this piece of code, which transcodes Korean text from UTF-8 to ASCII:

#include <ztd/text/transcode.hpp>

#include <iostream>
#include <string>

int main(int, char*[]) {
	// (1)
	std::string my_ascii_string = ztd::text::transcode(
	     // input
	     u8"안녕",
	     // from this encoding
	     ztd::text::utf8,
	     // to this encoding
	     ztd::text::ascii);

	// (2)
	std::cout << my_ascii_string << std::endl;

	return 0;
}

Clearly, the Korean characters present in the UTF-8 string cannot fit in a strict, 7-bit ASCII encoding. What, then, is printed by std::cout at // (2)? The answer is two ASCII question marks, ??. The ztd::text::replacement_handler_t object passed in at // (1) substitutes zero or more replacement characters into the output for any failed operation. There are multiple kinds of error handlers with varying behaviors:

  • replacement_handler_t, which inserts a substitution character, either one specified by the encoding object or some form of the default replacement character U+FFFD, and skips over the invalid input (either 1 input unit or as dictated by ztd::text::skip_input_error);

  • skip_handler_t, which skips over invalid input (and does not reflect it in the output) by either 1 input unit or as dictated by ztd::text::skip_input_error;

  • pass_handler, which simply returns the error result as is and, if there is an error, halts higher-level operations from proceeding;

  • default_handler, which is just a name for replacement_handler_t, throw_handler, or some other type, depending on the compile-time configuration of the library;

  • throw_handler, for throwing an exception on any failed operation;

  • incomplete_handler, which will accumulate 1 encode_one/decode_one’s worth of failure and let the end-user do something with it;

  • assume_valid_handler, which triggers no checking for many error conditions and can lead to ☢️☢️Undefined Behavior☢️☢️ if used on malformed input.

Warning

⚠️ For the love of what little remains holy, PLEASE don’t use ztd::text::assume_valid_handler unless you REALLY know you need it. It is a surefire way to open up vulnerabilities in your text processing algorithm. Not a single line of code using this type should pass code review if there is even the slightest thought that this will be used on any input that is not PERFECTLY under the DIRECT, PERSONAL control of the authors, auditors, and maintainers of the code.
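To make the difference between these behaviors concrete, here is a minimal, self-contained sketch of a toy UTF-8 to ASCII converter with a selectable error policy. It does not use ztd.text at all: on_error, ascii_result, sequence_length, and utf8_to_ascii are hypothetical names invented for this illustration, and the real handlers are callable objects rather than an enumeration. The sketch also assumes the input is structurally valid UTF-8 and only treats "not representable in ASCII" as an error.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical policy names for this sketch; these are NOT ztd.text types.
enum class on_error { replace, skip, pass, throw_exception };

struct ascii_result {
	std::string output;            // ASCII produced so far
	std::size_t consumed = 0;      // input bytes processed before stopping
	bool stopped_on_error = false; // set only by the "pass" policy
};

// Length in bytes of the UTF-8 sequence that starts with `lead`.
// Unrecognized lead bytes count as 1 so a stray byte is stepped over alone.
inline std::size_t sequence_length(unsigned char lead) {
	if ((lead & 0xE0) == 0xC0) return 2;
	if ((lead & 0xF0) == 0xE0) return 3;
	if ((lead & 0xF8) == 0xF0) return 4;
	return 1;
}

// Toy converter: copies ASCII bytes through and applies `policy` to
// everything else. Continuation bytes are not validated (an assumption
// made to keep the sketch short).
inline ascii_result utf8_to_ascii(std::string_view input, on_error policy) {
	ascii_result result;
	std::size_t i = 0;
	while (i < input.size()) {
		const unsigned char lead = static_cast<unsigned char>(input[i]);
		if (lead < 0x80) {
			result.output.push_back(static_cast<char>(lead));
			result.consumed = ++i;
			continue;
		}
		switch (policy) {
		case on_error::throw_exception:
			throw std::runtime_error("code point not representable in ASCII");
		case on_error::pass:
			// Report the failure and halt; the caller decides what to do.
			result.stopped_on_error = true;
			return result;
		case on_error::replace:
			result.output.push_back('?'); // one substitute per failed code point
			break;
		case on_error::skip:
			break; // nothing is written for the failed code point
		}
		// Step over the whole offending sequence, clamped to the input end.
		const std::size_t len = sequence_length(lead);
		i = (input.size() - i < len) ? input.size() : i + len;
		result.consumed = i;
	}
	return result;
}
```

Given the bytes of "hi 안녕!" as input, the replace policy yields "hi ??!", the skip policy yields "hi !", the pass policy stops after "hi " with 3 bytes consumed, and the throwing policy raises std::runtime_error at the first Korean character.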

These are all the error handlers that you have at your disposal, but they are just pre-provided types you can instantiate yourself. Nothing stops you from making your own error handling type! In order to do that, however, you need to understand what an error handler is composed of, and what it’s got inside of itself.