Injective: Promoting Safety in Encodings
As detailed in the Lossy Operation Protection section, is_encode_injective and is_decode_injective help the library understand when a conversion you are doing cannot be guaranteed at compile time to be lossless. Injectivity is a high-brow mathematical term:
In mathematics, an injective function (also known as injection, or one-to-one function) is a function that maps distinct elements of its domain to distinct elements of its codomain.
This is very fancy speak for the fact that every complete, well-formed input value maps to its own well-formed, distinct output value: no two different inputs produce the same output. The mapping does not have to cover all of the potential output values: so long as it is unambiguous for all the input values, it is injective. For practical purposes, it means that every valid code unit sequence produces a unique code point sequence (“the decode operation is injective”). And, in the reverse case, it means that every valid code point sequence produces a unique code unit sequence (“the encode operation is injective”).
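To make the definition concrete, here is a minimal sketch of an injective and a non-injective direction. The functions toy_ascii_decode_one and toy_ascii_encode_one are hypothetical stand-ins written for this illustration, not ztd.text functions, and the choice of '?' as a replacement character is an assumption:

```cpp
// Hypothetical toy functions for illustration only; not part of ztd.text.

// Decode: every valid ASCII code unit (0x00 to 0x7F) maps to its own code point.
constexpr char32_t toy_ascii_decode_one(unsigned char unit) {
	return static_cast<char32_t>(unit);
}

// Encode: code points above 0x7F have no ASCII representation, so this toy
// version substitutes '?', which makes distinct inputs collide.
constexpr unsigned char toy_ascii_encode_one(char32_t point) {
	return point < 0x80 ? static_cast<unsigned char>(point) : static_cast<unsigned char>('?');
}

// Distinct inputs give distinct outputs: the decode direction loses nothing.
static_assert(toy_ascii_decode_one(0x41) != toy_ascii_decode_one(0x42), "decode keeps distinct inputs apart");

// U+00E9 and U+00F1 both collapse to '?': this encode direction is not injective.
static_assert(toy_ascii_encode_one(0x00E9) == toy_ascii_encode_one(0x00F1), "encode maps two inputs to one output");
```

Once two inputs share an output, nothing downstream can tell them apart again, which is exactly the information loss the two properties are meant to flag.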
These two properties appear on the type itself, and are a way to opt in to saying that a conversion is not lossy (i.e., it preserves information perfectly if the input is well-formed). You can define them by placing them on your Encoding Object Type’s definition:
struct any_unicode_byte_encoding {
    using is_decode_injective = std::true_type;
    using is_encode_injective = std::true_type;
    using code_unit = std::byte;
    using code_point = ztd::text::unicode_scalar_value;
    // …
};
This signals that the encode_one and decode_one functions, if they are given well-formed input, will never be lossy between their code_point type and their code_unit type when performing the desired operation. If only one half of that equation is lossy, then you can mark only one, or the other. For example, ztd::text::ascii is lossy only for the encode_one operation, so it has is_decode_injective = std::true_type; for decode operations, but is_encode_injective = std::false_type; for encode operations:
    //////
    /// @brief Whether or not the decode operation can process all forms of input into code point values.
    /// @remarks ASCII can decode from its 7-bit (unpacked) code units to Unicode Code Points. Since the conversion
    /// is lossless, this property is true.
    //////
    using is_decode_injective = ::std::true_type;
    //////
    /// @brief Whether or not the encode operation can process all forms of input into code unit values. This is
    /// not true for ASCII, as many Unicode Code Point and Unicode Scalar Values cannot be represented in ASCII.
    /// Since the conversion is lossy, this property is false.
    //////
    using is_encode_injective = ::std::false_type;
    //////
    /// @brief The maximum code units a single complete operation of encoding can produce.
    inline static constexpr const ::std::size_t max_code_units = 1;
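As a rough sketch of how the asymmetric markings might be consumed, the following reads each member directly and gates a strict encode path at compile time. The types toy_ascii and toy_utf8 and the function toy_strict_encode_check are hypothetical names made up for this example; the library's actual protection mechanism is described in the Lossy Operation Protection section and may work differently:

```cpp
#include <type_traits>

// Hypothetical stand-ins for encoding types; only the two injectivity members matter here.
struct toy_ascii {
	using is_decode_injective = std::true_type;  // every ASCII code unit has a code point
	using is_encode_injective = std::false_type; // most code points have no ASCII code unit
};

struct toy_utf8 {
	using is_decode_injective = std::true_type;
	using is_encode_injective = std::true_type;
};

// Hypothetical gate: compiles only for encodings that promise a lossless encode.
template <typename Encoding>
constexpr bool toy_strict_encode_check() {
	static_assert(Encoding::is_encode_injective::value,
	     "encode may be lossy for this encoding: handle the lossy case explicitly");
	return true;
}

// Accepted: the toy UTF-8 type marks its encode direction as injective.
static_assert(toy_strict_encode_check<toy_utf8>(), "toy_utf8 passes the strict encode gate");

// toy_strict_encode_check<toy_ascii>() would fail to compile, while the
// decode direction of toy_ascii is still marked lossless:
static_assert(toy_ascii::is_decode_injective::value, "toy_ascii decode is marked injective");
```

Keeping the two directions as separate members means an encoding like ASCII can still take the unguarded path for decoding while being stopped on encoding.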
If the type definition is not present, or is present but is not std::true_type, then the implementation assumes that the property is false for the given encoding. See ztd::text::is_decode_injective and ztd::text::is_encode_injective for more information.
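The default-to-false rule can be sketched with the usual member detection idiom. The trait toy_is_decode_injective below is a hypothetical illustration of that rule, assuming C++17 for std::void_t; it is not the implementation of ztd::text::is_decode_injective:

```cpp
#include <type_traits>

// Primary template: no usable `is_decode_injective` member, so assume false.
template <typename T, typename = void>
struct toy_is_decode_injective : std::false_type {};

// Selected only when `T::is_decode_injective` names a type; the result is
// true only if that type is exactly std::true_type.
template <typename T>
struct toy_is_decode_injective<T, std::void_t<typename T::is_decode_injective>>
: std::is_same<typename T::is_decode_injective, std::true_type> {};

// Hypothetical encodings covering the three cases.
struct toy_unmarked {};
struct toy_marked_true {
	using is_decode_injective = std::true_type;
};
struct toy_marked_false {
	using is_decode_injective = std::false_type;
};

static_assert(!toy_is_decode_injective<toy_unmarked>::value, "absent member is treated as false");
static_assert(toy_is_decode_injective<toy_marked_true>::value, "std::true_type opts in");
static_assert(!toy_is_decode_injective<toy_marked_false>::value, "std::false_type stays false");
```

Treating a missing member as false is the conservative choice: an encoding has to opt in explicitly before any lossy-conversion safeguard is relaxed.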