WTF-8

Wobbly Transformation Format 8 (WTF-8) is an encoding scheme that preserves lone surrogates, which are generally not allowed in streams composed purely of Unicode Scalar Values.

Aliases

constexpr wtf8_t ztd::text::wtf8 = {}

An instance of the WTF-8 type for ease of use.

using ztd::text::wtf8_t = basic_wtf8<uchar8_t>

A “Wobbly Transformation Format 8” (WTF-8) Encoding that traffics in char8_t. See ztd::text::basic_wtf8 for more details.

Base Template

template<typename _CodeUnit, typename _CodePoint = unicode_code_point>
class basic_wtf8 : public __utf8_with<basic_wtf8<_CodeUnit, unicode_code_point>, _CodeUnit, unicode_code_point, __txt_detail::__empty_state, __txt_detail::__empty_state, false, true, false>

A “Wobbly Transformation Format 8” (WTF-8) Encoding that traffics in, specifically, the desired code unit type provided as a template argument.

Remark

This type has a maximum of 4 input code units and a maximum of 1 output code point. Unpaired surrogates are allowed in this type, which may be useful for dealing with legacy storage and implementations of the Windows Filesystem (modern Windows no longer lets non-Unicode filenames through). For a strict, Unicode-compliant UTF-8 Encoding, see ztd::text::basic_utf8.

Template Parameters:
  • _CodeUnit – The code unit type to use.

  • _CodePoint – The code point type to use.

Public Types

using is_unicode_encoding = ::std::true_type

Whether or not this encoding can encode all of Unicode.

using self_synchronizing_code = ::std::true_type

The start of a sequence can be found unambiguously when dropped into the middle of a sequence or after an error in reading has occurred for encoded text.

Remark

UTF-8 has definitive bit patterns which mark the start and continuation of sequences. The bit pattern 0xxxxxxx indicates a lone byte, and 1xxxxxxx indicates a byte that belongs to a multi-byte sequence. In particular, if 0 is not the first bit, the byte must begin with a sequence of 1s followed immediately by a 0 (e.g., 10xxxxxx, 110xxxxx, 1110xxxx, or 11110xxx). Here 10xxxxxx marks a continuation byte, and the remaining patterns mark the start of a 2-, 3-, or 4-byte sequence.
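The bit patterns described above can be sketched as a small classifier. This is a minimal illustration, not the library's implementation; the names byte_kind and classify_byte are hypothetical and do not exist in ztd.text.

```cpp
#include <cstdint>

// Hypothetical names for illustration only; not part of ztd.text.
enum class byte_kind { single, continuation, lead2, lead3, lead4, invalid };

// Classifies one UTF-8 / WTF-8 code unit purely by its leading bits.
constexpr byte_kind classify_byte(std::uint8_t b) noexcept {
    if ((b & 0x80u) == 0x00u) return byte_kind::single;       // 0xxxxxxx
    if ((b & 0xC0u) == 0x80u) return byte_kind::continuation; // 10xxxxxx
    if ((b & 0xE0u) == 0xC0u) return byte_kind::lead2;        // 110xxxxx
    if ((b & 0xF0u) == 0xE0u) return byte_kind::lead3;        // 1110xxxx
    if ((b & 0xF8u) == 0xF0u) return byte_kind::lead4;        // 11110xxx
    return byte_kind::invalid;                                // 11111xxx
}
```

Because every byte identifies its own role, a reader dropped at an arbitrary offset can tell whether it is at the start of a sequence without looking backwards.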

using decode_state = __txt_detail::__empty_state

The state that can be used between calls to the decoder. It is normally an empty struct because there is no shift state to preserve between complete units of encoded information.

using encode_state = __txt_detail::__empty_state

The state that can be used between calls to the encoder. It is normally an empty struct because there is no shift state to preserve between complete units of encoded information.

using code_unit = _CodeUnit

The individual units that result from an encode operation or are used as input to a decode operation. For UTF-8 formats, this is usually char8_t, but this can change (see ztd::text::basic_utf8).

using code_point = unicode_code_point

The individual units that result from a decode operation or are used as input to an encode operation. For most encodings, this is going to be a Unicode Code Point or a Unicode Scalar Value.

using is_decode_injective = ::std::true_type

Whether or not the decode operation can process all forms of input into code point values. This is true for all Unicode Transformation Formats (UTFs), which can encode and decode without a loss of information from a valid collection of code units.

using is_encode_injective = ::std::true_type

Whether or not the encode operation can process all forms of input into code unit values. This is true for all Unicode Transformation Formats (UTFs), which can encode and decode without loss of information from a valid input code point.

Public Static Functions

static inline constexpr ::ztd::span<const code_unit, 3> replacement_code_units() noexcept

Returns the replacement code units to use for the ztd::text::replacement_handler_t error handler.

static inline constexpr ::ztd::span<const code_point, 1> replacement_code_points() noexcept

Returns the replacement code point to use for the ztd::text::replacement_handler_t error handler.
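As a worked example of why the code unit span holds 3 elements while the code point span holds 1: assuming the replacement is U+FFFD REPLACEMENT CHARACTER (the conventional choice, and consistent with the span sizes above, though the source does not name it), its UTF-8 form is the three bytes EF BF BD. The constants and the function below are hypothetical and only check that arithmetic.

```cpp
#include <cstdint>

// Illustrative constants only; the library exposes its own values through
// replacement_code_units() and replacement_code_points().
// Assumption: the replacement character is U+FFFD.
constexpr char32_t replacement_point = 0xFFFD;
constexpr std::uint8_t replacement_units[3] = { 0xEF, 0xBF, 0xBD };

// Verifies that the three bytes are the 3-byte UTF-8 form of U+FFFD.
constexpr bool replacement_units_match() noexcept {
    return replacement_units[0] == static_cast<std::uint8_t>(0xE0 | (replacement_point >> 12))
        && replacement_units[1] == static_cast<std::uint8_t>(0x80 | ((replacement_point >> 6) & 0x3F))
        && replacement_units[2] == static_cast<std::uint8_t>(0x80 | (replacement_point & 0x3F));
}
```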

static inline constexpr auto skip_input_error(decode_result<_Input, _Output, _State> __result, const _InputProgress &__input_progress, const _OutputProgress &__output_progress) noexcept

Allows an encoding to discard input characters if an error occurs, taking in both the state and the input sequence to modify through the result type.

Remark

This will skip every input value until a proper starting byte is found.
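One plausible reading of this skipping behavior is sketched below: consume the offending byte, then drop continuation bytes until something that can start a sequence appears. The function names are hypothetical, and the real skip_input_error works on result objects and ranges rather than on a raw pointer and index.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not the ztd.text API. Given the index of the byte
// where decoding failed, returns the index of the next byte that is not a
// continuation byte (10xxxxxx), or `size` if no such byte remains.
constexpr std::size_t skip_to_next_start(
    const std::uint8_t* data, std::size_t size, std::size_t error_index) noexcept {
    // Always consume at least the byte that caused the error.
    std::size_t i = error_index < size ? error_index + 1 : size;
    while (i < size && (data[i] & 0xC0u) == 0x80u) {
        ++i; // still inside the broken sequence
    }
    return i;
}

// Small demonstration: two stray continuation bytes followed by 'A'.
constexpr std::size_t skip_demo() noexcept {
    const std::uint8_t bytes[] = { 0x80, 0x80, 0x41 };
    return skip_to_next_start(bytes, 3, 0);
}
```

In the demonstration, the error is reported at index 0 and decoding would resume at index 2, where the ASCII byte sits.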

static inline constexpr auto skip_input_error(encode_result<_Input, _Output, _State> __result, const _InputProgress &__input_progress, const _OutputProgress &__output_progress) noexcept

Allows an encoding to discard input characters if an error occurs, taking in both the state and the input sequence (by reference) to modify.

Remark

This will skip every input value until a proper UTF-32 Unicode scalar value (or code point) is found.

static inline constexpr auto encode_one(_Input &&__input, _Output &&__output, _ErrorHandler &&__error_handler, encode_state &__s)

Encodes a single complete unit of information as code units and produces a result with the input and output ranges moved past what was successfully read and written; or, produces an error and returns the input and output ranges untouched.

Remark

To the best ability of the implementation, the iterators will be returned untouched (e.g., when the input models at least a view and a forward_range). If that is not possible, returned ranges may be incremented even if an error occurs, due to the semantics of any view that models only an input_range.

Parameters:
  • __input[in] The input view to read code points from.

  • __output[in] The output view to write code units into.

  • __error_handler[in] The error handler to invoke if encoding fails.

  • __s[inout] The necessary state information. For this encoding, the state is empty and means very little.

Returns:

A ztd::text::encode_result object that contains the reconstructed input range, reconstructed output range, error handler, and a reference to the passed-in state.
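To make the encoding step concrete, here is a minimal standalone sketch of turning one code point into WTF-8 code units. It is not the library's encode_one: the names encoded_units and wtf8_encode_one are hypothetical, there is no error handler or state, and a size of 0 stands in for an error.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical result type for this sketch; not part of ztd.text.
struct encoded_units {
    std::uint8_t units[4];
    std::size_t size; // 0 signals an input that cannot be encoded
};

// Encodes one code point. Unlike strict UTF-8, surrogate code points
// (U+D800..U+DFFF) are not rejected and use the ordinary 3-byte form.
// A complete WTF-8 writer would also merge a high/low surrogate pair into
// one supplementary code point first; that is outside this sketch.
constexpr encoded_units wtf8_encode_one(char32_t c) noexcept {
    encoded_units r = { { 0, 0, 0, 0 }, 0 };
    if (c < 0x80) {
        r.units[0] = static_cast<std::uint8_t>(c);
        r.size = 1;
    } else if (c < 0x800) {
        r.units[0] = static_cast<std::uint8_t>(0xC0 | (c >> 6));
        r.units[1] = static_cast<std::uint8_t>(0x80 | (c & 0x3F));
        r.size = 2;
    } else if (c < 0x10000) {
        r.units[0] = static_cast<std::uint8_t>(0xE0 | (c >> 12));
        r.units[1] = static_cast<std::uint8_t>(0x80 | ((c >> 6) & 0x3F));
        r.units[2] = static_cast<std::uint8_t>(0x80 | (c & 0x3F));
        r.size = 3;
    } else if (c <= 0x10FFFF) {
        r.units[0] = static_cast<std::uint8_t>(0xF0 | (c >> 18));
        r.units[1] = static_cast<std::uint8_t>(0x80 | ((c >> 12) & 0x3F));
        r.units[2] = static_cast<std::uint8_t>(0x80 | ((c >> 6) & 0x3F));
        r.units[3] = static_cast<std::uint8_t>(0x80 | (c & 0x3F));
        r.size = 4;
    }
    return r;
}
```

For example, the lone surrogate U+D800 becomes the bytes ED A0 80, which a strict UTF-8 encoder would refuse to produce.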

static inline constexpr auto decode_one(_Input &&__input, _Output &&__output, _ErrorHandler &&__error_handler, decode_state &__s)

Decodes a single complete unit of information as code points and produces a result with the input and output ranges moved past what was successfully read and written; or, produces an error and returns the input and output ranges untouched.

Remark

To the best ability of the implementation, the iterators will be returned untouched (e.g., when the input models at least a view and a forward_range). If that is not possible, returned ranges may be incremented even if an error occurs, due to the semantics of any view that models only an input_range.

Parameters:
  • __input[in] The input view to read code units from.

  • __output[in] The output view to write code points into.

  • __error_handler[in] The error handler to invoke if decoding fails.

  • __s[inout] The necessary state information. For this encoding, the state is empty and means very little.

Returns:

A ztd::text::decode_result object that contains the reconstructed input range, reconstructed output range, error handler, and a reference to the passed-in state.
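The decoding direction can be sketched the same way. This is a simplified standalone illustration, not the library's decode_one: decoded_point and wtf8_decode_one are hypothetical names, the input is a raw pointer and length rather than a range, and a consumed count of 0 stands in for an error that leaves the input untouched. Overlong forms are rejected here, which is an assumption about the desired strictness.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical result type for this sketch; not part of ztd.text.
struct decoded_point {
    char32_t point;
    std::size_t consumed; // 0 signals an error; nothing was consumed
};

// Decodes one code point from up to 4 code units. Surrogate code points
// are accepted, which is the difference from strict UTF-8.
constexpr decoded_point wtf8_decode_one(const std::uint8_t* data, std::size_t size) noexcept {
    const decoded_point failure = { 0, 0 };
    if (size == 0) return failure;
    const std::uint8_t b0 = data[0];
    if (b0 < 0x80) return { b0, 1 };
    std::size_t length = 0;
    char32_t point = 0;
    char32_t minimum = 0; // smallest value allowed for this length
    if ((b0 & 0xE0) == 0xC0) { length = 2; point = b0 & 0x1F; minimum = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { length = 3; point = b0 & 0x0F; minimum = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { length = 4; point = b0 & 0x07; minimum = 0x10000; }
    else return failure; // stray continuation byte or invalid lead byte
    if (size < length) return failure; // truncated sequence
    for (std::size_t i = 1; i < length; ++i) {
        if ((data[i] & 0xC0) != 0x80) return failure; // not a continuation byte
        point = (point << 6) | (data[i] & 0x3F);
    }
    if (point < minimum || point > 0x10FFFF) return failure; // overlong or too large
    return { point, length };
}

// Demonstration: ED A0 80 decodes to the lone surrogate U+D800.
constexpr char32_t decode_lone_surrogate_demo() noexcept {
    const std::uint8_t bytes[] = { 0xED, 0xA0, 0x80 };
    return wtf8_decode_one(bytes, 3).point;
}

// Demonstration: the overlong form C0 80 is rejected.
constexpr std::size_t decode_overlong_demo() noexcept {
    const std::uint8_t bytes[] = { 0xC0, 0x80 };
    return wtf8_decode_one(bytes, 2).consumed;
}
```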

Public Static Attributes

static constexpr ::std::size_t max_code_points

The maximum number of code points a single complete operation of decoding can produce. This is 1 for all Unicode Transformation Format (UTF) encodings.

static constexpr ::std::size_t max_code_units

The maximum number of code units a single complete operation of encoding can produce. If overlong sequences are allowed, this is 6; otherwise, this is 4.

static constexpr ::ztd::text_encoding_id encoded_id

The encoding ID for this type. Used for optimization purposes.

static constexpr ::ztd::text_encoding_id decoded_id

The encoding ID for this type. Used for optimization purposes.