Error Handler Anatomy
An error handler is just a function (or an object with a function call operator) that takes 4 parameters and returns 1 result:
takes the encoding that will call it when something goes wrong;
takes the result object you expect to be working with (specifically, ztd::text::encode_result or ztd::text::decode_result), which contains the current state of affairs from the encoding operation;
takes a contiguous range representing any input values that may have been read but will not be used;
takes a contiguous range representing any output values that could not be written because the output ran out of room; and,
returns the same result type with any modifications (or not!) you’d like to make.
Handlers are typically classes with a templated function call operator. Here’s the skeleton for one:
#include <ztd/text.hpp>

#include <string>
#include <string_view>

struct my_error_handler {
    // Helper definitions
    template <typename Encoding>
    using code_point_span = ztd::span<const ztd::text::code_point_t<Encoding>>;
    template <typename Encoding>
    using code_unit_span = ztd::span<const ztd::text::code_unit_t<Encoding>>;

    // Function call operator that returns a "deduced" (auto) type
    // Specifically, this one is called for encode failures
    template <typename Encoding, typename Input, typename Output, typename State>
    auto operator()(
        // First Parameter
        const Encoding& encoding,
        // Second Parameter, encode-specific
        ztd::text::encode_result<Input, Output, State> result,
        // Third Parameter
        code_point_span<Encoding> input_progress,
        // Fourth Parameter
        code_unit_span<Encoding> output_progress) const noexcept {
        // ... implementation here!
        (void)encoding;
        (void)input_progress;
        (void)output_progress;
        return result;
    }

    // Function call operator that returns a "deduced" (auto) type
    // Specifically, this one is called for decode failures
    template <typename Encoding, typename Input, typename Output, typename State>
    auto operator()(
        // First Parameter
        const Encoding& encoding,
        // Second Parameter, decode-specific
        ztd::text::decode_result<Input, Output, State> result,
        // Third Parameter
        code_unit_span<Encoding> input_progress,
        // Fourth Parameter
        code_point_span<Encoding> output_progress) const noexcept {
        // ... implementation here!
        (void)encoding;
        (void)input_progress;
        (void)output_progress;
        return result;
    }
};

int main(int, char* argv[]) {

    // convert from execution encoding to utf8 encoding,
    // using our new handler
    std::string utf8_string = ztd::text::transcode(std::string_view(argv[0]),
        ztd::text::execution, ztd::text::compat_utf8, my_error_handler {});

    return 0;
}
This skeleton, by itself, works. It doesn’t do anything: it just returns the result object as-is. This causes the algorithm to stop exactly where the error occurs and return to the user. That happens because the result has an error_code member variable: when a value other than ok reaches the higher-level algorithms, they stop all encoding, decoding, transcoding, counting, validation, and similar work and exit with the proper information.
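To make the "stop where the error occurs" behavior concrete, here is a minimal sketch that does not use ztd.text at all. The names toy_error, toy_result, pass_through_handler, toy_convert_one, and toy_convert_all are hypothetical stand-ins invented for this illustration, and the "encoding" is a toy rule that accepts only 7-bit ASCII. It only shows the general shape of a bulk loop that checks an error code after every step; the library's real algorithms may be structured differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical stand-ins for illustration; these are NOT the ztd.text types.
enum class toy_error { ok, invalid_sequence };

struct toy_result {
    std::string_view input; // input that has not been consumed yet
    std::size_t converted;  // number of units successfully processed so far
    toy_error error_code;   // ok unless a step failed
};

// Mirrors the skeleton: hand the result back untouched.
struct pass_through_handler {
    toy_result operator()(toy_result result) const noexcept {
        return result;
    }
};

// Toy single-step operation: anything above 0x7F counts as an error,
// and the handler decides what the caller gets to see.
template <typename Handler>
toy_result toy_convert_one(std::string_view input, std::size_t converted, Handler&& handler) {
    assert(!input.empty());
    if (static_cast<unsigned char>(input.front()) > 0x7F) {
        return handler(toy_result { input, converted, toy_error::invalid_sequence });
    }
    return toy_result { input.substr(1), converted + 1, toy_error::ok };
}

// Toy bulk loop: repeats the single step until the input is exhausted
// or a step reports an error code other than ok.
template <typename Handler>
toy_result toy_convert_all(std::string_view input, Handler&& handler) {
    toy_result result { input, 0, toy_error::ok };
    while (!result.input.empty()) {
        result = toy_convert_one(result.input, result.converted, handler);
        if (result.error_code != toy_error::ok) {
            break; // stop exactly where the error occurred
        }
    }
    return result;
}
```

With the pass-through handler, calling toy_convert_all on an input whose third byte is invalid returns a result whose input still begins at that byte, so the caller can see exactly where processing stopped.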
First Parameter
The first parameter is simple enough: it is the encoding that is calling this error handler. If you invoke an encode_one or decode_one (or a higher-level conversion algorithm) on a ztd::text::utf8 object, then you can expect a first parameter of type ztd::text::utf8 to be passed to the error handler.
Note
👉 If the function call .encode_one or .decode_one is a static function that has no instance, then a temporary instance of the encoding is created to pass to the error handler. This happens with most encodings that do not contain any pertinent information on the encoding object itself, like all the Unicode encodings and the ASCII/locale/string literal encodings.
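This can be handy if you need to access information about the encoding object or encoding type. You can get information about the encoding type by using type traits such as ztd::text::code_point_t and ztd::text::code_unit_t, as the skeleton above does.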
Second Parameter
The second parameter is the result object. It is of the type ztd::text::decode_result or ztd::text::encode_result. The two types have identical information inside of them, but have different names so that a function call operator can tell the difference between the two, if it’s necessary.
This contains all of the state and information that the decode operation/encode operation would return, if left unmodified by the error handler. If you don’t want to do anything to it, simply pass it through by returning it with return result;. Otherwise, you have access to the .input range, the .output range, any .state relevant to the operation, the .error_code, and the .error_handled value. You can modify any of these, or even perform a recovery operation and change the .error_code to be ztd::text::encoding_error::ok.
For example, a handler can check whether there is space left in result.output and, if so, attempt to serialize a replacement character there (this is what ztd::text::replacement_handler_t does).
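The recovery idea described above can be sketched without the library. Everything in the block below (mock_error, mock_output, mock_decode_result, mock_replacement_handler) is a hypothetical stand-in invented for illustration, not the definition of ztd::text::replacement_handler_t. It assumes the output is a simple pointer-plus-remaining-size pair and that the handler receives only the result object. The error_handled field is set to true here simply to record that the handler intervened; the exact semantics of the library's .error_handled member are not modeled.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for illustration; these are NOT the ztd.text types.
enum class mock_error { ok, invalid_sequence };

struct mock_output {
    char32_t* data;   // next writable position
    std::size_t size; // number of writable positions remaining
};

struct mock_decode_result {
    mock_output output;
    mock_error error_code;
    bool error_handled; // assumption: records that a handler intervened
};

struct mock_replacement_handler {
    mock_decode_result operator()(mock_decode_result result) const noexcept {
        // Handlers are only invoked when something went wrong.
        assert(result.error_code != mock_error::ok);
        if (result.output.size == 0) {
            // No room for a replacement: pass the failure through unchanged.
            return result;
        }
        // Write U+FFFD REPLACEMENT CHARACTER and advance the output.
        *result.output.data = U'\uFFFD';
        ++result.output.data;
        --result.output.size;
        // Report recovery so the calling algorithm keeps going.
        result.error_code = mock_error::ok;
        result.error_handled = true;
        return result;
    }
};
```

The design choice worth noting is that the handler never fails harder than its input: when it cannot recover, it returns the result as it received it, which gives the same stopping behavior as the pass-through skeleton.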
Third Parameter
The third parameter is a contiguous range of input values that were read. Typically, this is a ztd::span handed to you, or something that can construct a ztd::span of either code units or code points (whatever the input type has). This is useful for input_ranges and input_iterators where it is impossible to guarantee a value can be read again, as is the case with istream_iterator and other I/O-style iterators and ranges.
Fourth Parameter
The fourth parameter is a contiguous range of output values that were almost written to the output, but could not be because the output has no more room left. Typically, this is a ztd::span handed to you, or something that can construct a ztd::span of either code units or code points (whatever the output type has). This is particularly useful for output_ranges and output_iterators where there is no way to guarantee all characters will be successfully written, as is the case with ostream_iterator and other I/O-style iterators and ranges.
The fourth parameter is only ever filled out if the error returned is ztd::text::encoding_error::insufficient_output. It is very important for when someone does bulk-buffered writes, since multiple writes are not guaranteed to fit within the given ztd::text::max_code_points_v or ztd::text::max_code_units_v for a specific encoding. (They only represent the maximum for a single, indivisible operation.)
This is useful for grabbing any would-be-written output data and storing it so the write can be completed later. For example, writing to a smaller, contiguous buffer for delivery and looping around that buffer can be faster, but it runs the risk of partial reads/writes on the boundaries of said smaller, contiguous buffer.
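One way to picture "grabbing would-be-written output" is a handler that copies the leftover values into a side buffer whenever the failure is an out-of-room error. The sketch below is library-free: stash_error, stash_result, stash_view, and stashing_handler are hypothetical names invented here, stash_view is a bare pointer-and-length pair standing in for the span the library would hand over, and the handler takes only two of the four parameters to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT the ztd.text types.
enum class stash_error { ok, invalid_sequence, insufficient_output };

struct stash_result {
    stash_error error_code;
};

// Minimal contiguous view standing in for the span a handler would receive.
struct stash_view {
    const char* data;
    std::size_t size;
};

struct stashing_handler {
    // Non-owning pointer: the caller owns the buffer and must keep it alive.
    std::vector<char>* pending;

    stash_result operator()(stash_result result, stash_view output_progress) const {
        assert(pending != nullptr);
        if (result.error_code == stash_error::insufficient_output) {
            // Keep the values that did not fit so the caller can flush
            // them once the output buffer has been drained.
            pending->insert(pending->end(), output_progress.data,
                output_progress.data + output_progress.size);
        }
        // The error code is left alone so the calling loop still stops,
        // giving the caller a chance to make room and resume.
        return result;
    }
};
```

A caller looping over a small delivery buffer could, after each stop, send the buffer's contents, write out whatever sits in pending, clear it, and resume the conversion from the returned result.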
Secret Type Definition
There is a type definition you can add to your error handler to signal that it is okay to ignore its calls. It goes on the struct and looks like:
    using assume_valid = std::false_type; // or std::true_type
This allows any encoding that checks the ztd::text::is_ignorable_error_handler property of your error handler to know whether it’s okay to skip the handler when bad things happen. Having this functionality means you can create a “debug handler” for text you already know is valid, but might still want to check during a debug or tracing build as it encodes and decodes through the system:
struct my_debug_handler {

    // Assume it's valid if the config value
    // is explicitly turned off
    using assume_valid = std::integral_constant<
        bool, (MY_ENCODING_TRACE_IS_TURNED_OFF != 0)
    >;

    // rest of the implementation...
};
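For readers curious how a nested type definition like assume_valid can be picked up at compile time, the following is a generic detection-idiom sketch using only the standard library (C++17). The trait sketch_is_ignorable and the function sketch_should_validate are hypothetical names; this is not the implementation of ztd::text::is_ignorable_error_handler, only one plausible way such a property could be computed.

```cpp
#include <type_traits>

// Hypothetical detection trait for illustration; NOT the library's own trait.
// Primary template: handlers without an assume_valid member are never ignorable.
template <typename Handler, typename = void>
struct sketch_is_ignorable : std::false_type {};

// Chosen only when Handler::assume_valid names a type;
// the answer is then whatever that type's ::value says.
template <typename Handler>
struct sketch_is_ignorable<Handler, std::void_t<typename Handler::assume_valid>>
    : std::integral_constant<bool, Handler::assume_valid::value> {};

struct plain_handler {}; // no opinion: must be called on errors
struct checked_handler {
    using assume_valid = std::false_type; // explicitly wants to be called
};
struct trusting_handler {
    using assume_valid = std::true_type; // fine to skip
};

static_assert(!sketch_is_ignorable<plain_handler>::value, "no typedef means not ignorable");
static_assert(!sketch_is_ignorable<checked_handler>::value, "false_type means not ignorable");
static_assert(sketch_is_ignorable<trusting_handler>::value, "true_type means ignorable");

// An algorithm could branch on the trait to decide whether validation
// (and therefore any call into the handler) is needed at all.
template <typename Handler>
constexpr bool sketch_should_validate(const Handler&) noexcept {
    return !sketch_is_ignorable<Handler>::value;
}
```

Because the decision is a compile-time constant, a debug handler like the one above would cost nothing in builds where tracing is turned off, assuming the calling algorithm actually consults such a trait.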