Error Handler Anatomy

An error handler is just a function (or an object with a function call operator) that takes 4 parameters and returns 1 result:

takes the encoding that will call it when something goes wrong;
takes the result object you expect to be working with (specifically, ztd::text::encode_result or ztd::text::decode_result), which contains the current state of affairs from the encoding operation;
takes a contiguous range representing any input values that may have been read but will not be used;
takes a contiguous range representing any output values that were about to be written but could not be; and,
returns the same result type with any modifications (or not!) you’d like to make.

Handlers are typically written as classes with a templated function call operator, plus a few helper templates. Here’s the skeleton for one:

#include <ztd/text.hpp>

#include <string>
#include <string_view>

struct my_error_handler {
	// Helper definitions
	template <typename Encoding>
	using code_point_span = ztd::span<const ztd::text::code_point_t<Encoding>>;
	template <typename Encoding>
	using code_unit_span = ztd::span<const ztd::text::code_unit_t<Encoding>>;

	// Function call operator that returns a "deduced" (auto) type
	// Specifically, this one is called for encode failures
	template <typename Encoding, typename Input, typename Output, typename State>
	auto operator()(
	     // First Parameter
	     const Encoding& encoding,
	     // Second Parameter, encode-specific
	     ztd::text::encode_result<Input, Output, State> result,
	     // Third Parameter
	     code_point_span<Encoding> input_progress,
	     // Fourth Parameter
	     code_unit_span<Encoding> output_progress) const noexcept {
		// ... implementation here!
		(void)encoding;
		(void)input_progress;
		(void)output_progress;
		return result;
	}

	// Function call operator that returns a "deduced" (auto) type
	// Specifically, this one is called for decode failures
	template <typename Encoding, typename Input, typename Output, typename State>
	auto operator()(
	     // First Parameter
	     const Encoding& encoding,
	     // Second Parameter, decode-specific
	     ztd::text::decode_result<Input, Output, State> result,
	     // Third Parameter
	     code_unit_span<Encoding> input_progress,
	     // Fourth Parameter
	     code_point_span<Encoding> output_progress) const noexcept {
		// ... implementation here!
		(void)encoding;
		(void)input_progress;
		(void)output_progress;
		return result;
	}
};

int main(int, char* argv[]) {

	// convert from execution encoding to utf8 encoding,
	// using our new handler
	std::string utf8_string = ztd::text::transcode(std::string_view(argv[0]),
	     ztd::text::execution, ztd::text::compat_utf8, my_error_handler {});

	return 0;
}

This skeleton, by itself, works. It doesn’t do anything: it just returns the result object as-is. As a consequence, the algorithm stops exactly where the error occurs and returns to the user. This is because the result has an error_code member variable: when that member variable holds an error and reaches the higher-level algorithms, they stop all encoding, decoding, transcoding, counting, and validation work and exit with the proper information.
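To make the "stops exactly where the error occurs" behavior concrete, here is a minimal, self-contained sketch. It does not use ztd.text at all: toy_error, toy_result, pass_through_handler, and copy_ascii are hypothetical stand-ins invented for illustration, and the real result types carry more information than this.

```cpp
#include <string>
#include <string_view>
#include <utility>

// Hypothetical stand-ins for illustration only; these are NOT ztd.text types.
enum class toy_error { ok, invalid_sequence };

struct toy_result {
	std::string_view input; // input that has not been consumed yet
	std::string output;     // everything written so far
	toy_error error_code;
};

// Mirrors the skeleton: hands the result back untouched.
struct pass_through_handler {
	toy_result operator()(toy_result result) const noexcept {
		return result;
	}
};

// Toy "higher-level algorithm": copies ASCII units and treats
// anything above 0x7F as an error that goes to the handler.
template <typename Handler>
toy_result copy_ascii(std::string_view input, Handler handler) {
	toy_result result { input, std::string(), toy_error::ok };
	while (!result.input.empty()) {
		const unsigned char unit = static_cast<unsigned char>(result.input.front());
		if (unit > 0x7F) {
			result.error_code = toy_error::invalid_sequence;
			result = handler(std::move(result));
			if (result.error_code != toy_error::ok) {
				// The handler left the error in place: stop right here.
				return result;
			}
			// A recovering handler is expected to have advanced result.input.
			continue;
		}
		result.output.push_back(static_cast<char>(unit));
		result.input.remove_prefix(1);
	}
	return result;
}
```

With the pass-through handler, the returned toy_result holds the output produced before the bad unit, the unread input starting at the bad unit, and the untouched error code.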

First Parameter

The first parameter is simple enough: it is the encoding that is calling this error handler. If you invoke an encode_one or decode_one (or a higher-level conversion algorithm) on a ztd::text::utf8 object, then you can expect a first parameter of type ztd::text::utf8 to be passed to the error handler.

Note

👉 If .encode_one or .decode_one is a static function that needs no instance, then a temporary instance of the encoding is created to pass to the error handler. This happens with most encodings that do not contain any pertinent information on the encoding object itself, like all the Unicode encodings and the ASCII/locale/string literal encodings.

This can be handy if you need to access information about the encoding object or encoding type. You can get information about the encoding by using:

Second Parameter

The second parameter is the result object. It is of the type ztd::text::decode_result or ztd::text::encode_result. The two types have identical information inside of them, but have different names so that a function call operator can tell the difference between the two, if it’s necessary.

This contains all of the state and information that the decode or encode operation would return, if left unmodified by the error handler. If you don’t want to do anything to it, simply pass it through by returning it with return result;. Otherwise, you have access to the input range, the output range, any .state relevant to the operation, the .error_code, and the .error_handled value. You can modify any one of these, or even perform a recovery operation and change the .error_code to be ztd::text::encoding_error::ok. Literally, anything can be done!

For example, someone can see if there is space left in the result.output parameter, and if so attempt to serialize a replacement character in place there (this is what ztd::text::replacement_handler_t does).
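A rough sketch of that recovery idea follows. It is written against hypothetical stand-in types (demo_error, demo_result, demo_replacement_handler), not against the library's result types, and it is not how ztd::text::replacement_handler_t is implemented. Writing '?' and skipping exactly one input unit are assumptions made for the sake of a small example.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-ins for illustration only; these are NOT ztd.text types.
enum class demo_error { ok, invalid_sequence };

struct demo_result {
	std::string_view input;  // input that has not been consumed yet
	char* output;            // next position to write to
	std::size_t output_room; // how many more units fit in the output
	demo_error error_code;
};

struct demo_replacement_handler {
	demo_result operator()(demo_result result) const noexcept {
		if (result.output_room == 0) {
			// No room for a replacement: leave the error in place.
			return result;
		}
		// Serialize a replacement unit into the output...
		*result.output = '?';
		++result.output;
		--result.output_room;
		// ...skip the offending input unit (a choice made by this sketch)...
		if (!result.input.empty()) {
			result.input.remove_prefix(1);
		}
		// ...and report the error as recovered.
		result.error_code = demo_error::ok;
		return result;
	}
};
```

The important part is the last step: because the returned error code is ok, a calling algorithm would keep going instead of stopping.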

Third Parameter

The third parameter is a contiguous range of input values that were read. Typically, this is a ztd::span handed to you, or something that can construct a ztd::span of either code units or code points (whatever the input type has). This is useful for input_ranges and input_iterators where it is impossible to guarantee a value can be read again, as is the case with istream_iterator and other I/O-style iterators and ranges.

Fourth Parameter

The fourth parameter is a contiguous range of output values that were almost written to the output, but could not be because the output has no more room left. Typically, this is a ztd::span handed to you, or something that can construct a ztd::span of either code units or code points (whatever the output type has). This is particularly useful for output_ranges and output_iterators where there is no way to guarantee all characters will be successfully written, as is the case with ostream_iterator and other I/O-style iterators and ranges.

The fourth parameter is only ever filled out if the error returned is ztd::text::encoding_error::insufficient_output. It is very important for when someone does bulk-buffered writes, since multiple writes are not guaranteed to fit within the given ztd::text::max_code_points_v or ztd::text::max_code_units_v for a specific encoding. (They only represent the maximum for a single, indivisible operation.)

This is useful for grabbing any would-be-written output data, and storing it for later / completing it. For example, writing to a smaller, contiguous buffer for delivery and looping around that buffer can be faster, but it runs the risk of partial reads/writes on the boundaries of said smaller, contiguous buffer.
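One way to picture "storing it for later" is a handler that copies the unwritten output into caller-owned storage. The sketch below uses hypothetical stand-ins (sketch_error, sketch_result, stashing_handler) and a plain pointer-plus-size pair in place of a ztd::span. It only illustrates the bookkeeping and is not the library's interface.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT ztd.text types.
enum class sketch_error { ok, insufficient_output };

struct sketch_result {
	sketch_error error_code;
};

struct stashing_handler {
	// Storage owned by the caller; it must outlive the handler.
	std::vector<char>* pending;

	sketch_result operator()(sketch_result result,
	     const char* output_progress, std::size_t output_progress_size) const {
		if (result.error_code == sketch_error::insufficient_output) {
			// Keep the units that did not fit so they can be written later.
			pending->insert(pending->end(), output_progress,
			     output_progress + output_progress_size);
		}
		// The error code is left as-is so the caller knows to flush its
		// buffer, write out the pending units, and then resume.
		return result;
	}
};
```

After the call, the caller would drain its small buffer, emit whatever sits in pending, clear it, and continue the conversion from the input position stored in the result.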

Secret Type Definition

There is a type definition you can add to your error handler to signal that it is okay to ignore its calls. It goes on the struct and looks like:

using assume_valid = std::false_type; // or std::true_type

This allows any encoding which checks the ztd::text::is_ignorable_error_handler property on your error handler to know if it’s okay to ignore the error handler when bad things happen. Having this functionality means you can create a “debug handler” for text you already know is valid, but might still want to check during a debug or tracing build as it encodes and decodes through the system:

struct my_debug_handler {

        // Assume it's valid if the config value
        // is explicitly turned off
        using assume_valid = std::integral_constant<
                bool, (MY_ENCODING_TRACE_IS_TURNED_OFF != 0)
        >;

        // rest of the implementation...
};
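For intuition about how a nested assume_valid type can be read by other code, here is a small detection trait written with std::void_t. The names sketch_is_ignorable, careful_handler, trusting_handler, plain_handler, and may_skip_validation are hypothetical. This is not the implementation of ztd::text::is_ignorable_error_handler, only a sketch of the general technique, and treating a missing assume_valid as "not ignorable" is an assumption of the sketch.

```cpp
#include <type_traits>

// Hypothetical detector for illustration; NOT the library's implementation.
// Primary template: no nested assume_valid, so the handler is not ignorable.
template <typename Handler, typename = void>
struct sketch_is_ignorable : std::false_type {};

// Chosen when Handler::assume_valid names a type; forwards its ::value.
template <typename Handler>
struct sketch_is_ignorable<Handler, std::void_t<typename Handler::assume_valid>>
: std::integral_constant<bool, Handler::assume_valid::value> {};

struct careful_handler {
	using assume_valid = std::false_type;
};

struct trusting_handler {
	using assume_valid = std::true_type;
};

struct plain_handler {};

// An encoding could branch on the trait at compile time, for example
// to skip validity checks when the handler says the text is valid.
template <typename Handler>
constexpr bool may_skip_validation(const Handler&) noexcept {
	return sketch_is_ignorable<Handler>::value;
}
```

Because the answer is a compile-time constant, code that consults it can drop its checking branches entirely when the handler opts in.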