Injective: Promoting Safety in Encodings
As detailed in the Lossy Operation Protection section, is_encode_injective and is_decode_injective help the library understand when a conversion you are doing cannot be guaranteed at compile time to be lossless. Injectivity is a high-brow mathematical term:
In mathematics, an injective function (also known as injection, or one-to-one function) is a function that maps distinct elements of its domain to distinct elements of its codomain.
This is very fancy speak for the fact that every complete, well-formed input value has its own well-formed output value, which no other input shares. The outputs do not have to cover all of the potential output values: so long as the mapping is one-to-one and unambiguous for all the input values, it is injective. For practical purposes, it means that every valid code unit sequence produces a unique code point sequence (“the decode operation is injective”). And, in the reverse case, it means that every valid code point sequence produces a unique code unit sequence (“the encode operation is injective”).
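To make the definition concrete, here is a minimal sketch that is not part of ztd.text. The helpers `decode_ascii_unit` and `encode_ascii_lossy` are hypothetical names invented for this illustration, and the `'?'` replacement is an assumed policy. The sketch shows that decoding ASCII sends distinct valid code units to distinct code points, while encoding to ASCII with a replacement character collapses distinct code points onto the same code unit.

```cpp
#include <optional>

// Hypothetical helpers for illustration only; these are not ztd.text APIs.

// Decode: each valid ASCII code unit (0x00 to 0x7F) maps to its own code point.
// Ill-formed input (0x80 and above) is rejected, so it lies outside the domain.
constexpr std::optional<char32_t> decode_ascii_unit(unsigned char unit) {
    if (unit > 0x7F) {
        return std::nullopt;
    }
    return static_cast<char32_t>(unit);
}

// Encode with replacement (an assumed policy): every code point outside
// ASCII becomes '?', so information about the original value is lost.
constexpr unsigned char encode_ascii_lossy(char32_t code_point) {
    return code_point <= 0x7F ? static_cast<unsigned char>(code_point)
                              : static_cast<unsigned char>('?');
}

// Distinct valid inputs give distinct outputs: this decode is injective.
static_assert(decode_ascii_unit('A') != decode_ascii_unit('B'));

// U+00E9 and U+00F1 are distinct code points, yet both encode to '?':
// this encode is not injective.
static_assert(encode_ascii_lossy(static_cast<char32_t>(0x00E9))
    == encode_ascii_lossy(static_cast<char32_t>(0x00F1)));
static_assert(encode_ascii_lossy(static_cast<char32_t>(0x00E9)) == '?');
```

Because the checks are `static_assert`s, the sketch is verified simply by compiling it as C++17 or later.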
These two properties appear on the type itself, and are a way to opt in to saying that a conversion is not lossy (i.e., it preserves information perfectly if the input is well-formed). You can define them by placing them on your Encoding Object Type’s definition:
struct any_unicode_byte_encoding {
    // …
    using is_decode_injective = std::true_type;
    using is_encode_injective = std::true_type;
    using code_unit = std::byte;
    using code_point = ztd::text::unicode_scalar_value;
    // …
};
This signals that the encode_one and decode_one functions, if they are given well-formed input, will never be lossy when converting between the code_point type and the code_unit type. If only one of the two directions is lossy, then you can mark just the other one as injective. For example, ztd::text::ascii is lossy only for the encode_one operation, so it has is_decode_injective = std::true_type; for decode operations, but is_encode_injective = std::false_type; for encode operations:
template <typename _CodeUnit, typename _CodePoint = unicode_code_point>
class basic_ascii {
    //////
    /// @brief Whether or not the decode operation can process all forms of input into code point values.
    ///
    /// @remarks ASCII can decode from its 7-bit (unpacked) code units to Unicode Code Points. Since the conversion
    /// is lossless, this property is true.
    using is_decode_injective = ::std::true_type;
    //////
    /// @brief Whether or not the encode operation can process all forms of input into code unit values. This is
    /// not true for ASCII, as many Unicode Code Point and Unicode Scalar Values cannot be represented in ASCII.
    /// Since the conversion is lossy, this property is false.
    using is_encode_injective = ::std::false_type;
};
//////
/// @brief The American Standard Code for Information Interchange (ASCII) Encoding.
///
/// @remarks The most vanilla and unimaginative encoding there is in the world, excluding tons of other languages,
/// dialects, and even common English idioms and borrowed words. Please don't pick this unless you have good
/// reason!
using ascii_t = basic_ascii<char>;
If the type definition is not present, or is present but is not std::true_type, then the implementation assumes that the property is false for the given encoding. See ztd::text::is_decode_injective and ztd::text::is_encode_injective for more information.
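The "missing or not std::true_type means false" rule can be sketched with the standard detection idiom. This is an assumption-laden illustration, not the library's actual implementation: the `sketch` namespace, the trait inside it, and the three test encodings are hypothetical names made up for this example. Only the decode side is shown, and the encode side would follow the same pattern.

```cpp
#include <type_traits>

namespace sketch {
    // Primary template: the encoding has no nested is_decode_injective
    // alias, so the conservative answer is "not injective".
    template <typename Encoding, typename = void>
    struct is_decode_injective : std::false_type {};

    // Selected only when Encoding::is_decode_injective names a type.
    // The result is true only if that type is exactly std::true_type.
    template <typename Encoding>
    struct is_decode_injective<Encoding,
        std::void_t<typename Encoding::is_decode_injective>>
        : std::is_same<typename Encoding::is_decode_injective, std::true_type> {};

    template <typename Encoding>
    inline constexpr bool is_decode_injective_v
        = is_decode_injective<Encoding>::value;
} // namespace sketch

// Hypothetical encodings covering the three cases described above.
struct marked_lossless {
    using is_decode_injective = std::true_type;
};
struct marked_lossy {
    using is_decode_injective = std::false_type;
};
struct unmarked { };

static_assert(sketch::is_decode_injective_v<marked_lossless>);
static_assert(!sketch::is_decode_injective_v<marked_lossy>);
static_assert(!sketch::is_decode_injective_v<unmarked>);
```

Defaulting to false is the safe choice for an opt-in property: an encoding author who forgets the alias gets the stricter lossy-conversion checks rather than a silent promise of losslessness.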