Marking an encoding as Unicode-Capable

Sometimes, you need to make your own encodings. Whether for legacy reasons or for interoperation reasons, you need the ability to write an encoding that can losslessly handle every code point in Unicode's 21-bit code space (U+0000 through U+10FFFF, which fits within \(2^{21}\) values). Whether you are writing a variant of UTF-7 or dealing with a very specific legacy set like Unicode v6.0 with the Softbank Private Use Area, you are going to need to be able to say “hey, my encoding can handle all of the code points and therefore deserves to be treated like a Unicode encoding”. There are two ways to do this: one for decisions that can be made at compile time, and one for decisions that can only be made at run time (e.g., over a variant_encoding<X, Y, Z>).

compile time

The cheapest way to tag an encoding as Unicode-capable, and have the library recognize it as such when ztd::text::is_unicode_encoding is used, is to define a member type alias:

class utf8_v6_softbank {
public:
        // …
        using is_unicode_encoding = std::true_type;
        // …
};

That is all you have to write. Both ztd::text::is_unicode_encoding and ztd::text::contains_unicode_encoding will detect this and use it.
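
To make the mechanism concrete, here is a minimal sketch of how a trait can detect such a member type alias. It is not the library's implementation: `sketch_is_unicode_encoding` and `legacy_codepage` are hypothetical names invented for illustration, and the sketch assumes C++17.

```cpp
#include <type_traits>

// Hypothetical stand-in for a detection trait (NOT the library's code).
// Primary template: no nested `is_unicode_encoding` type, so the answer is false.
template <typename T, typename = void>
struct sketch_is_unicode_encoding : std::false_type {};

// Chosen when T::is_unicode_encoding names an accessible type;
// the result is whatever that type's ::value says.
template <typename T>
struct sketch_is_unicode_encoding<T, std::void_t<typename T::is_unicode_encoding>>
    : std::bool_constant<T::is_unicode_encoding::value> {};

template <typename T>
inline constexpr bool sketch_is_unicode_encoding_v
    = sketch_is_unicode_encoding<T>::value;

// An encoding tagged the way the text describes.
class utf8_v6_softbank {
public:
    using is_unicode_encoding = std::true_type;
};

// A hypothetical encoding without the tag.
class legacy_codepage {};

static_assert(sketch_is_unicode_encoding_v<utf8_v6_softbank>);
static_assert(!sketch_is_unicode_encoding_v<legacy_codepage>);
```

Note that the alias must be publicly accessible: if it were private, a detection trait of this shape would quietly report false.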


run time

If your encoding cannot know at compile time whether or not it is a Unicode encoding (e.g., for type-erased encodings, complex wrapping encodings, or encodings which rely on external operating system resources), you can define a contains_unicode_encoding member function instead. When present, it will be picked up by the ztd::text::contains_unicode_encoding function. Here is an excerpt from a run-time, locale-based encoding that uses platform knowledge to work out what the active encoding might be and whether it can handle working in Unicode:

	struct encode_state {
		std::mbstate_t c_stdlib_state;

		encode_state() noexcept : c_stdlib_state() {
			// properly set for c32rtomb state
			code_unit ghost_output[MB_LEN_MAX] {};
			UCHAR_ACCESS c32rtomb(ghost_output, U'\0', &c_stdlib_state);
		}
	};

	bool contains_unicode_encoding() const noexcept {
#if defined(_WIN32)
		// ask Windows which code page the current thread is using
		CPINFOEXW cp_info {};
		BOOL success = GetCPInfoExW(CP_THREAD_ACP, 0, &cp_info);
		if (success == 0) {
			return false;
		}
		switch (cp_info.CodePage) {
		case 65001: // UTF-8
		            // etc. etc. …
			return true;
		default:
			break;
		}
		// …

That is it. ztd::text::contains_unicode_encoding will detect this member function and call it. For that reason, you should never call this member function or access the compile-time classification above directly; always delegate to the ztd::text::contains_unicode_encoding function call instead.
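
The delegation advice is easier to see with a small sketch of a dispatching function: prefer the run-time member function when it exists, otherwise fall back to the compile-time tag. This illustrates the idea only and is not the library's implementation; every name below (`sketch_contains_unicode_encoding`, `has_runtime_check`, `has_compile_time_tag`, and the three example encodings) is hypothetical, and C++17 is assumed.

```cpp
#include <type_traits>
#include <utility>

// Detects a callable `enc.contains_unicode_encoding()` member (hypothetical helper).
template <typename T, typename = void>
struct has_runtime_check : std::false_type {};

template <typename T>
struct has_runtime_check<T,
    std::void_t<decltype(std::declval<const T&>().contains_unicode_encoding())>>
    : std::true_type {};

// Detects the compile-time tag and reads its ::value (hypothetical helper).
template <typename T, typename = void>
struct has_compile_time_tag : std::false_type {};

template <typename T>
struct has_compile_time_tag<T, std::void_t<typename T::is_unicode_encoding>>
    : std::bool_constant<T::is_unicode_encoding::value> {};

// Single entry point: callers ask here instead of poking at the encoding.
template <typename Encoding>
constexpr bool sketch_contains_unicode_encoding(const Encoding& enc) {
    if constexpr (has_runtime_check<Encoding>::value) {
        return enc.contains_unicode_encoding(); // decision made by the object
    } else {
        return has_compile_time_tag<Encoding>::value; // decision made by the type
    }
}

// Example encodings, all invented for this sketch.
struct runtime_encoding {
    bool is_utf8; // stands in for a platform query
    constexpr bool contains_unicode_encoding() const noexcept { return is_utf8; }
};
struct tagged_encoding {
    using is_unicode_encoding = std::true_type;
};
struct plain_encoding {};

static_assert(sketch_contains_unicode_encoding(runtime_encoding{true}));
static_assert(!sketch_contains_unicode_encoding(runtime_encoding{false}));
static_assert(sketch_contains_unicode_encoding(tagged_encoding{}));
static_assert(!sketch_contains_unicode_encoding(plain_encoding{}));
```

Because callers go through one function, an encoding can switch from the compile-time tag to a run-time check (or back) without any calling code changing.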