Marking an encoding as Unicode-Capable

Sometimes, you need to make your own encodings. Whether for legacy reasons or for interoperation reasons, you need the ability to write an encoding that can losslessly handle all \(2^{21}\) code points. Whether it’s writing a variant of UTF-7, or dealing with a very specific legacy set like Unicode v6.0 with the Softbank Private Use Area, you are going to need to be able to say “hey, my encoding can handle all of the code points and therefore deserves to be treated like a Unicode encoding”. There are two ways to do this: one for decisions that can be made at compile time, and one for decisions that can only be made at run time (e.g., over a variant_encoding<X, Y, Z>).

Compile-time

The cheapest way to tag an encoding as Unicode-capable, and have the library recognize it as such when ztd::text::is_unicode_encoding is used, is to define a member type:

class utf8_v6_softbank {
public:
        // …
        using is_unicode_encoding = std::true_type;
        // …
};

That is all you have to write. Both ztd::text::is_unicode_encoding and ztd::text::contains_unicode_encoding will detect this and use it.
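To illustrate why a single member type is enough, here is a minimal sketch of how a trait can detect it using the C++17 detection idiom. The `sketch` namespace, its `is_unicode_encoding` trait, and the `legacy_code_page` class are hypothetical stand-ins written for this illustration; they are not ztd.text's actual implementation, which may differ in its details.

```cpp
#include <type_traits>

namespace sketch {
    // Hypothetical stand-in for a trait like ztd::text::is_unicode_encoding;
    // NOT the library's actual implementation.
    // Primary template: no member type was found, so report false.
    template <typename Encoding, typename = void>
    struct is_unicode_encoding : std::false_type {};

    // Selected only when Encoding::is_unicode_encoding names a type.
    // It inherits from that type, so std::true_type and std::false_type
    // tags are both honored.
    template <typename Encoding>
    struct is_unicode_encoding<Encoding,
        std::void_t<typename Encoding::is_unicode_encoding>>
    : Encoding::is_unicode_encoding {};

    template <typename Encoding>
    inline constexpr bool is_unicode_encoding_v
        = is_unicode_encoding<Encoding>::value;
}

// Tagged encoding, as in the example above.
class utf8_v6_softbank {
public:
    using is_unicode_encoding = std::true_type;
};

// Hypothetical untagged encoding: it has no member type at all.
class legacy_code_page {};

static_assert(sketch::is_unicode_encoding_v<utf8_v6_softbank>);
static_assert(!sketch::is_unicode_encoding_v<legacy_code_page>);
```

Because the check happens entirely in the type system, tagging an encoding this way costs nothing at run time.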

Run-time

If your encoding cannot know at compile time whether or not it is a Unicode encoding (e.g., for type-erased encodings, complex wrapping encodings, or encodings which rely on external operating system resources), you can define a method instead. When applicable, this will be picked up by the ztd::text::contains_unicode_encoding function. Here is an example of a run-time, locale-based encoding that uses platform knowledge to work out what the encoding might be and to determine whether it can handle working in Unicode:

}
#define UCHAR_ACCESS ::
		bool is_complete() const noexcept {
			return UCHAR_ACCESS mbsinit(&c_stdlib_state) != 0;
		}
	};

	struct encode_state {
		ztd_mbstate_t c_stdlib_state;

		encode_state() noexcept : c_stdlib_state() {
			// properly set for c32rtomb state
			code_unit ghost_output[MB_LEN_MAX] {};
			UCHAR_ACCESS c32rtomb(ghost_output, U'\0', &c_stdlib_state);
		}

		bool is_complete() const noexcept {
			return UCHAR_ACCESS mbsinit(&c_stdlib_state) != 0;
		}
	};

	bool contains_unicode_encoding() const noexcept {
#if defined(_WIN32)
		CPINFOEXW cp_info {};
		BOOL success = GetCPInfoExW(CP_THREAD_ACP, 0, &cp_info);
		if (success == 0) {
			return false;
		}
		// …

That is it. ztd::text::contains_unicode_encoding will detect this method and call it for you. You should therefore never call the method yourself, or read the compile-time classification above directly; always delegate to the ztd::text::contains_unicode_encoding function call.
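The delegation described here can be sketched as a single function that asks the encoding object first and only then looks at its type. Everything in the `sketch` namespace, along with the `always_unicode`, `switchable_encoding`, and `plain_ascii` types, is hypothetical and exists only for illustration. The lookup order (method first, then member type, then `false`) is an assumption about how such a function could reasonably behave, not a statement of ztd.text's actual implementation.

```cpp
#include <type_traits>
#include <utility>

namespace sketch {
    // Hypothetical helpers; NOT ztd.text's actual implementation.
    // Detects a compile-time tag: a member type named is_unicode_encoding.
    template <typename E, typename = void>
    struct has_static_tag : std::false_type {};
    template <typename E>
    struct has_static_tag<E, std::void_t<typename E::is_unicode_encoding>>
    : std::true_type {};

    // Detects a run-time query: a const method contains_unicode_encoding().
    template <typename E, typename = void>
    struct has_runtime_query : std::false_type {};
    template <typename E>
    struct has_runtime_query<E,
        std::void_t<decltype(std::declval<const E&>().contains_unicode_encoding())>>
    : std::true_type {};

    // Assumed order: the method wins, then the member type, else false.
    template <typename E>
    constexpr bool contains_unicode_encoding(
        [[maybe_unused]] const E& encoding) noexcept {
        if constexpr (has_runtime_query<E>::value) {
            return encoding.contains_unicode_encoding();
        }
        else if constexpr (has_static_tag<E>::value) {
            return E::is_unicode_encoding::value;
        }
        else {
            return false;
        }
    }
}

// Tagged at compile time only.
struct always_unicode {
    using is_unicode_encoding = std::true_type;
};

// Decided per object; a real encoding would ask the platform instead.
class switchable_encoding {
public:
    constexpr explicit switchable_encoding(bool unicode_capable) noexcept
    : m_unicode_capable(unicode_capable) {}

    constexpr bool contains_unicode_encoding() const noexcept {
        return m_unicode_capable;
    }

private:
    bool m_unicode_capable;
};

// Neither tagged nor queryable.
struct plain_ascii {};

// static_assert is used only to check the sketch; the same call works
// unchanged on values that are known only at run time.
static_assert(sketch::contains_unicode_encoding(always_unicode {}));
static_assert(sketch::contains_unicode_encoding(switchable_encoding { true }));
static_assert(!sketch::contains_unicode_encoding(switchable_encoding { false }));
static_assert(!sketch::contains_unicode_encoding(plain_ascii {}));
```

Routing every query through one function means calling code does not need to know which of the two mechanisms a given encoding uses.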