Lossy Operation Protection

Occasionally, you will need to convert some text from its pristine and ideal Unicode form to some other form. Perhaps it is for interoperation purposes, or perhaps some function call cannot properly handle embedded NULs, so you need to use an overlong sequence to encode the 0 value in your text. Whatever the case, you need to leave the world of Unicode Code Points, Unicode Scalar Values, and all the guarantees they provide you. Let’s take an example, going from UTF-8 to 7-bit-clean ASCII:

#include <ztd/text/transcode.hpp>

#include <iostream>
#include <string>

int main(int, char*[]) {
	// (1) no error handler is passed, so this call does not compile
	std::string my_ascii_string = ztd::text::transcode(
	     // input
	     u8"안녕",
	     // from this encoding
	     ztd::text::utf8,
	     // to this encoding
	     ztd::text::ascii);

	std::cout << my_ascii_string << std::endl;

	return 0;
}

This will produce a compile-time error (shown here with the MSVC error number as an example):

error C2338: The encode (output) portion of this transcode is a lossy, non-injective operation. This means you may lose data that you did not intend to lose; specify an ‘out_handler’ error handler parameter to transcode[_to](in, in_encoding, out_encoding, in_handler, out_handler, ...) or transcode_into_raw(in, in_encoding, out, out_encoding, in_handler, out_handler, ...) explicitly in order to bypass this.

This happens because we can detect, at compile time, that the conversion from Unicode Code Points to ASCII is a lossy transformation: ASCII cannot represent most code points, so distinct inputs can collapse into the same output. Given how likely it is that such a conversion mangles incoming text, the library does not let you perform the encoding or decoding operation without being explicit about how errors are going to be handled.

Since this library is trying to prevent Mojibake and other encoding problems, you are required to tag any potentially-lossy encoding with an error handler, to be explicit and acknowledge that you may or may not be ruining someone’s day:

#include <ztd/text/transcode.hpp>

#include <iostream>
#include <string>

int main(int, char*[]) {
	std::string my_ascii_string = ztd::text::transcode(
	     // input
	     u8"안녕",
	     // from this encoding
	     ztd::text::utf8,
	     // to this encoding
	     ztd::text::ascii,
	     // (1) error handler
	     ztd::text::replacement_handler);

	std::cout << my_ascii_string << std::endl; // (2)

	ZTD_TEXT_ASSERT(my_ascii_string == "??");

	return 0;
}
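To see why the assertion above expects "??" and not six replacement characters, it helps to remember that "안녕" is two code points, each taking three bytes in UTF-8. The following is a hand-rolled sketch of that behavior, not how the library implements it: `utf8_to_ascii_lossy` and `demonstrate_loss` are hypothetical names, and the helper assumes its input is already well-formed UTF-8. It also shows why the operation is non-injective, since two different inputs produce the same output.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not a ztd.text function): converts UTF-8 bytes to
// ASCII, writing one '?' per code point that ASCII cannot represent.
// Assumption: the input is well-formed UTF-8. A stray continuation byte is
// simply treated as a one-byte unrepresentable unit.
inline std::string utf8_to_ascii_lossy(std::string_view utf8_bytes) {
	std::string output;
	std::size_t index = 0;
	while (index < utf8_bytes.size()) {
		const unsigned char lead = static_cast<unsigned char>(utf8_bytes[index]);
		// The lead byte tells us how many bytes this code point occupies.
		std::size_t sequence_length = 1;
		if (lead >= 0xF0) {
			sequence_length = 4;
		}
		else if (lead >= 0xE0) {
			sequence_length = 3;
		}
		else if (lead >= 0xC0) {
			sequence_length = 2;
		}
		// Only single-byte sequences (0x00 to 0x7F) survive the trip to ASCII.
		output.push_back(lead < 0x80 ? static_cast<char>(lead) : '?');
		index += sequence_length;
	}
	return output;
}

// Usage sketch: two different inputs collapse into the same output, which is
// exactly what "lossy, non-injective" means.
inline void demonstrate_loss() {
	const std::string annyeong = "\xEC\x95\x88\xEB\x85\x95"; // "안녕" (U+C548 U+B155)
	const std::string reversed = "\xEB\x85\x95\xEC\x95\x88"; // "녕안" (U+B155 U+C548)
	assert(utf8_to_ascii_lossy(annyeong) == "??");
	assert(utf8_to_ascii_lossy(reversed) == "??");
	// Text that is already ASCII passes through unchanged.
	assert(utf8_to_ascii_lossy("hello") == "hello");
}
```

Once the text has been turned into "??", there is no way to tell which of the two original strings it came from, which is why the library asks for an explicit error handler first.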

Any encoding which does not meet the requirements of either ztd::text::is_encode_injective_v or ztd::text::is_decode_injective_v (or both, for transcoding, which uses both a decode and an encode operation) will produce a compile-time error if you specify no error handlers in the call. This is done through the Injectivity Lucky 7 Extensions, which go beyond the traditional Lucky 7 with 2 std::true_type/std::false_type definitions.
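The mechanism described above can be approximated in a few lines of standalone C++. This is only a sketch under stated assumptions, not the library's implementation: everything in the `sketch` namespace (the toy encodings, the handler types, and `checked_encode`) is hypothetical, and the sketch assumes that an encoding which declares nothing is treated as lossy.

```cpp
#include <type_traits>

namespace sketch {
	// Primary template: an encoding that does not declare
	// `is_encode_injective` is conservatively assumed to be lossy.
	template <typename Encoding, typename = void>
	struct is_encode_injective : std::false_type {};

	// Chosen when the encoding has a nested `is_encode_injective` type.
	template <typename Encoding>
	struct is_encode_injective<Encoding,
	     std::void_t<typename Encoding::is_encode_injective>>
	: std::integral_constant<bool, Encoding::is_encode_injective::value> {};

	template <typename Encoding>
	inline constexpr bool is_encode_injective_v
	     = is_encode_injective<Encoding>::value;

	// Hypothetical encodings carrying the two extra type definitions.
	struct toy_utf8 {
		using is_decode_injective = std::true_type;
		using is_encode_injective = std::true_type;
	};
	struct toy_ascii {
		using is_decode_injective = std::true_type;
		using is_encode_injective = std::false_type; // code points -> ASCII loses data
	};
	struct toy_legacy { }; // declares nothing at all

	// Hypothetical handler tags: `default_handler` stands for "nothing passed".
	struct default_handler { };
	struct replacement_handler { };

	// Refuses to compile for a lossy encoding unless a handler is named.
	template <typename Encoding, typename Handler = default_handler>
	constexpr bool checked_encode(Encoding, Handler = {}) {
		static_assert(is_encode_injective_v<Encoding>
		          || !std::is_same_v<Handler, default_handler>,
		     "lossy encode: pass an explicit error handler");
		return true;
	}

	// Compile-time self-checks for the trait.
	static_assert(is_encode_injective_v<toy_utf8>);
	static_assert(!is_encode_injective_v<toy_ascii>);
	static_assert(!is_encode_injective_v<toy_legacy>);
} // namespace sketch
```

With these definitions, `sketch::checked_encode(sketch::toy_utf8{})` and `sketch::checked_encode(sketch::toy_ascii{}, sketch::replacement_handler{})` both compile, while `sketch::checked_encode(sketch::toy_ascii{})` trips the static_assert. Defaulting to "lossy" when the definitions are absent is the safer choice in a design like this, because forgetting to annotate an encoding then costs an extra argument and cannot silently corrupt text.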