UTF-16

Aliases

constexpr utf16_t ztd::text::utf16 = {}

An instance of the UTF-16 encoding for ease of use.

typedef basic_utf16<char16_t, unicode_code_point> ztd::text::utf16_t

A UTF-16 Encoding that traffics in char16_t. See ztd::text::basic_utf16 for more details.

constexpr wide_utf16_t ztd::text::wide_utf16 = {}

An instance of the UTF-16 encoding that traffics in wchar_t, for ease of use.

using ztd::text::wide_utf16_t = basic_utf16<wchar_t>

A UTF-16 Encoding that traffics in wchar_t. See ztd::text::basic_utf16 for more details.

Base Template

template<typename _CodeUnit, typename _CodePoint = unicode_code_point>
class basic_utf16 : public ztd::text::__txt_impl::__utf16_with<basic_utf16<_CodeUnit, unicode_code_point>, _CodeUnit, unicode_code_point>

A UTF-16 Encoding that traffics in, specifically, the desired code unit type provided as a template argument.

Remark

This is a strict UTF-16 implementation that does not allow lone, unpaired surrogates either in or out.

Template Parameters:
  • _CodeUnit – The code unit type to use.

  • _CodePoint – The code point type to use.
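To make the strictness remark concrete, here is a minimal sketch of what "no lone, unpaired surrogates" means for a buffer of char16_t. The helper names (is_lead_surrogate, is_trail_surrogate, is_strict_utf16) are hypothetical and are not part of ztd::text; the sketch assumes C++14 or later and only illustrates the well-formedness rule, not how the library checks it.

```cpp
#include <cstddef>

// Hypothetical helpers for illustration; they are not part of ztd::text.
constexpr bool is_lead_surrogate(char16_t u) noexcept {
    return u >= 0xD800 && u <= 0xDBFF;
}
constexpr bool is_trail_surrogate(char16_t u) noexcept {
    return u >= 0xDC00 && u <= 0xDFFF;
}

// Returns true when every surrogate in [data, data + size) belongs to a
// leading/trailing pair, i.e. the buffer contains no lone surrogates.
constexpr bool is_strict_utf16(const char16_t* data, std::size_t size) noexcept {
    for (std::size_t i = 0; i < size; ++i) {
        if (is_lead_surrogate(data[i])) {
            // A leading surrogate must be followed by a trailing surrogate.
            if (i + 1 >= size || !is_trail_surrogate(data[i + 1])) {
                return false;
            }
            ++i; // step over the trailing half of the pair
        }
        else if (is_trail_surrogate(data[i])) {
            // A trailing surrogate that was not preceded by a leading one.
            return false;
        }
    }
    return true;
}

// Example data: 'a', U+1F600 as a surrogate pair, 'b'.
constexpr char16_t well_formed[] = { u'a', 0xD83D, 0xDE00, u'b' };
// Example data with an unpaired trailing surrogate.
constexpr char16_t lone_trail[] = { u'a', 0xDE00, u'b' };
// Example data that ends in an unpaired leading surrogate.
constexpr char16_t lone_lead[] = { u'a', 0xD83D };
```

Under this rule only the first buffer is acceptable; a strict encoding would report an error for the other two rather than pass the surrogate through.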

Public Types

using is_unicode_encoding = ::std::true_type

Whether or not this encoding can encode all of Unicode.

using self_synchronizing_code = ::std::true_type

The start of a sequence can be found unambiguously when dropped into the middle of a sequence or after an error in reading has occurred for encoded text.

Remark

Unicode has definitive bit patterns which resemble start and end sequences (“low surrogate” and “high surrogate” for UTF-16).
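The self-synchronizing property can be sketched as follows: given an arbitrary index into well-formed UTF-16, the start of the enclosing sequence is found by inspecting at most one neighboring code unit. The function sequence_start and its helpers are hypothetical names used only for this illustration (assuming C++14 or later); they are not part of ztd::text.

```cpp
#include <cstddef>

// Hypothetical helpers for illustration; they are not part of ztd::text.
constexpr bool is_lead_surrogate(char16_t u) noexcept {
    return u >= 0xD800 && u <= 0xDBFF;
}
constexpr bool is_trail_surrogate(char16_t u) noexcept {
    return u >= 0xDC00 && u <= 0xDFFF;
}

// Given any index into a UTF-16 buffer, returns the index at which the
// sequence containing that code unit starts.
constexpr std::size_t sequence_start(const char16_t* data, std::size_t index) noexcept {
    // Only a trailing surrogate can sit in the middle of a sequence, and the
    // sequence it belongs to began exactly one code unit earlier.
    return (index > 0 && is_trail_surrogate(data[index]) && is_lead_surrogate(data[index - 1]))
        ? index - 1
        : index;
}

// Example data: 'a', U+1F600 as a surrogate pair, 'b'.
constexpr char16_t sample_text[] = { u'a', 0xD83D, 0xDE00, u'b' };
```

Landing on index 2 (the trailing surrogate) resolves to index 1; every other index in the sample is already the start of its own sequence.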

using state = __txt_detail::__empty_state

The state that can be used between calls to the encoder and decoder. It is an empty struct because there is no shift state to preserve between complete units of encoded information.

using code_unit = _CodeUnit

The individual units that result from an encode operation or are used as input to a decode operation. For UTF-16 formats, this is usually char16_t, but this can change (see ztd::text::basic_utf16).

using code_point = _CodePoint

The individual units that result from a decode operation or are used as input to an encode operation. For most encodings, this is going to be a Unicode Code Point or a Unicode Scalar Value.

using is_decode_injective = ::std::true_type

Whether or not the decode operation can process all forms of input into code point values. This is true for all Unicode Transformation Formats (UTFs), which can encode and decode without a loss of information from a valid collection of code units.

using is_encode_injective = ::std::true_type

Whether or not the encode operation can process all forms of input into code unit values. This is true for all Unicode Transformation Formats (UTFs), which can encode and decode without loss of information from a valid input code point.

Public Static Functions

static inline constexpr ::ztd::span<const code_unit, 1> replacement_code_units() noexcept

Returns the replacement code units to use for the ztd::text::replacement_handler_t error handler.

static inline constexpr ::ztd::span<const code_point, 1> replacement_code_points() noexcept

Returns the replacement code point to use for the ztd::text::replacement_handler_t error handler.

template<bool _Strawman = __surrogates_allowed, typename _Input, typename _Output, typename _State, typename _InputProgress, typename _OutputProgress, ::std::enable_if_t<!_Strawman>* = nullptr>
static inline constexpr auto skip_input_error(decode_result<_Input, _Output, _State> __result, const _InputProgress &__input_progress, const _OutputProgress &__output_progress) noexcept

Allows an encoding to discard input characters if an error occurs, taking in both the state and the input sequence (by reference) to modify.

Remark

This will skip every input value until a proper UTF-16 starting code unit (a single, non-surrogate code unit or a leading surrogate) is found.

Parameters:
  • __result – [in] The decode result being processed by the error handler.

  • __input_progress – [in] The input that has been read but not committed to consumption.

  • __output_progress – [in] The output that has been written but could not be committed due to an error.
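The skipping behavior described in the remark above can be approximated with a small index-based sketch. It assumes the code unit at the error position is always discarded and that any trailing surrogates after it are discarded too; the name skip_to_sequence_start is hypothetical, and the real function operates on decode_result objects and ranges rather than raw pointers.

```cpp
#include <cstddef>

// Hypothetical helper for illustration; it is not part of ztd::text.
constexpr bool is_trail_surrogate(char16_t u) noexcept {
    return u >= 0xDC00 && u <= 0xDFFF;
}

// Returns the index of the next code unit that could start a sequence,
// looking strictly after error_index. Returns size if there is none.
constexpr std::size_t skip_to_sequence_start(
    const char16_t* data, std::size_t size, std::size_t error_index) noexcept {
    // Assumption: the offending code unit itself is always skipped.
    std::size_t i = error_index < size ? error_index + 1 : size;
    // Trailing surrogates cannot start a sequence, so keep skipping them.
    while (i < size && is_trail_surrogate(data[i])) {
        ++i;
    }
    return i;
}

// Example data: two stray trailing surrogates, 'a', then a valid pair.
constexpr char16_t broken_text[] = { 0xDE00, 0xDE01, u'a', 0xD83D, 0xDE00 };
```

With an error reported at index 0, this sketch resumes at index 2 ('a'), because index 1 is another trailing surrogate that cannot begin a sequence.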

template<typename _Input, typename _Output, typename _State, typename _InputProgress, typename _OutputProgress>
static inline constexpr auto skip_input_error(encode_result<_Input, _Output, _State> __result, const _InputProgress &__input_progress, const _OutputProgress &__output_progress) noexcept

Allows an encoding to discard input characters if an error occurs, taking in both the state and the input sequence (by reference) to modify.

Remark

This will skip every input value until a proper UTF-32 Unicode scalar value (or code point) is found.
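For the encode direction, the analogous check is on code points rather than code units. The following is a minimal sketch under the assumption that "proper" means a Unicode scalar value (at most U+10FFFF and not a surrogate); is_scalar_value and skip_to_scalar_value are hypothetical names, not ztd::text functions.

```cpp
#include <cstddef>

// Hypothetical helper: true for Unicode scalar values, i.e. code points up to
// U+10FFFF that are not in the surrogate range U+D800..U+DFFF.
constexpr bool is_scalar_value(char32_t c) noexcept {
    return c <= 0x10FFFF && !(c >= 0xD800 && c <= 0xDFFF);
}

// Returns the index of the next scalar value strictly after error_index,
// or size if there is none.
constexpr std::size_t skip_to_scalar_value(
    const char32_t* data, std::size_t size, std::size_t error_index) noexcept {
    // Assumption: the offending code point itself is always skipped.
    std::size_t i = error_index < size ? error_index + 1 : size;
    while (i < size && !is_scalar_value(data[i])) {
        ++i;
    }
    return i;
}

// Example data: a surrogate code point, an out-of-range value, then 'a'.
constexpr char32_t broken_points[] = { 0xD800, 0x110000, U'a' };
```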

template<typename _Input, typename _Output, typename _ErrorHandler>
static inline constexpr auto decode_one(_Input &&__input, _Output &&__output, _ErrorHandler &&__error_handler, state &__s)

Decodes a single complete unit of information as code points and produces a result with the input and output ranges moved past what was successfully read and written; or, produces an error and returns the input and output ranges untouched.

Remark

To the best ability of the implementation, the iterators will be returned untouched (e.g., the input models at least a view and a forward_range). If it is not possible, returned ranges may be incremented even if an error occurs due to the semantics of any view that models an input_range.

Parameters:
  • __input – [in] The input view to read code units from.

  • __output – [in] The output view to write code points into.

  • __error_handler – [in] The error handler to invoke if decoding fails.

  • __s – [inout] The necessary state information. For this encoding, the state is empty and means very little.

Returns:

A ztd::text::decode_result object that contains the reconstructed input range, reconstructed output range, error handler, and a reference to the passed-in state.
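As a rough illustration of what decoding "a single complete unit of information" involves for UTF-16, the sketch below decodes one code point from a raw buffer. It is deliberately simplified: decode_one_sketch and its result struct are hypothetical, there is no error handler or state, and on failure it reports zero consumed code units to mirror the "ranges untouched" behavior described above. It assumes C++14 or later.

```cpp
#include <cstddef>

// Hypothetical result type for this sketch; not ztd::text::decode_result.
struct decode_one_sketch_result {
    bool ok;               // false when the input is empty or ill-formed
    char32_t code_point;   // the decoded code point when ok is true
    std::size_t consumed;  // number of code units read (0 on failure)
};

constexpr bool is_lead_surrogate(char16_t u) noexcept {
    return u >= 0xD800 && u <= 0xDBFF;
}
constexpr bool is_trail_surrogate(char16_t u) noexcept {
    return u >= 0xDC00 && u <= 0xDFFF;
}

constexpr decode_one_sketch_result decode_one_sketch(
    const char16_t* data, std::size_t size) noexcept {
    if (size == 0) {
        return { false, 0, 0 };
    }
    const char16_t first = data[0];
    if (!is_lead_surrogate(first) && !is_trail_surrogate(first)) {
        // A non-surrogate code unit is a complete code point by itself.
        return { true, static_cast<char32_t>(first), 1 };
    }
    if (is_trail_surrogate(first) || size < 2 || !is_trail_surrogate(data[1])) {
        // Lone trailing surrogate, or a leading surrogate without its pair.
        return { false, 0, 0 };
    }
    // Combine the 10 payload bits of each surrogate and add the 0x10000 offset.
    const char32_t high = static_cast<char32_t>(first) - 0xD800u;
    const char32_t low = static_cast<char32_t>(data[1]) - 0xDC00u;
    return { true, static_cast<char32_t>(0x10000u + (high << 10) + low), 2 };
}

// Example data: U+1F600 as a surrogate pair, and a lone trailing surrogate.
constexpr char16_t pair_units[] = { 0xD83D, 0xDE00 };
constexpr char16_t stray_units[] = { 0xDE00, u'a' };
```

The at-most-two-units-in, one-code-point-out shape of this function corresponds to the max_code_units and max_code_points values documented for this encoding.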

template<typename _Input, typename _Output, typename _ErrorHandler>
static inline constexpr auto encode_one(_Input &&__input, _Output &&__output, _ErrorHandler &&__error_handler, state &__s)

Encodes a single complete unit of information as code units and produces a result with the input and output ranges moved past what was successfully read and written; or, produces an error and returns the input and output ranges untouched.

Remark

To the best ability of the implementation, the iterators will be returned untouched (e.g., the input models at least a view and a forward_range). If it is not possible, returned ranges may be incremented even if an error occurs due to the semantics of any view that models an input_range.

Parameters:
  • __input – [in] The input view to read code points from.

  • __output – [in] The output view to write code units into.

  • __error_handler – [in] The error handler to invoke if encoding fails.

  • __s – [inout] The necessary state information. For this encoding, the state is empty and means very little.

Returns:

A ztd::text::encode_result object that contains the reconstructed input range, reconstructed output range, error handler, and a reference to the passed-in state.
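The encode direction can be sketched the same way. The function encode_one_sketch and its result struct are hypothetical and omit the error handler and state; the fixed array of two code units reflects the fact that one code point never needs more than two UTF-16 code units. The sketch assumes C++14 or later.

```cpp
#include <cstddef>

// Hypothetical result type for this sketch; not ztd::text::encode_result.
struct encode_one_sketch_result {
    bool ok;             // false when the input is not a Unicode scalar value
    char16_t units[2];   // the produced code units; only the first count are valid
    std::size_t count;   // number of code units written (0 on failure)
};

constexpr encode_one_sketch_result encode_one_sketch(char32_t c) noexcept {
    if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) {
        // Out of range, or a surrogate code point: a strict encoder rejects it.
        return { false, { 0, 0 }, 0 };
    }
    if (c < 0x10000) {
        // Basic Multilingual Plane: one code unit with the same value.
        return { true, { static_cast<char16_t>(c), 0 }, 1 };
    }
    // Supplementary planes: subtract 0x10000 and split the remaining 20 bits
    // into a leading (high 10 bits) and trailing (low 10 bits) surrogate.
    const char32_t v = c - 0x10000u;
    return { true,
        { static_cast<char16_t>(0xD800u + (v >> 10)),
          static_cast<char16_t>(0xDC00u + (v & 0x3FFu)) },
        2 };
}
```

For example, U+1F600 produces the pair 0xD83D 0xDE00, while U+0061 produces the single code unit 0x0061.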

Public Static Attributes

static constexpr ::std::size_t max_code_points = 1

The maximum number of code points a single complete operation of decoding can produce. This is 1 for all Unicode Transformation Format (UTF) encodings.

static constexpr ::std::size_t max_code_units = 2

The maximum code units a single complete operation of encoding can produce.

static constexpr ::ztd::text_encoding_id encoded_id = __surrogates_allowed ? ::ztd::text_encoding_id::ucs2 : ::ztd::text_encoding_id::utf16

The encoding ID for this type. Used for optimization purposes.

static constexpr ::ztd::text_encoding_id decoded_id = __surrogates_allowed ? ::ztd::text_encoding_id::ucs4 : ::ztd::text_encoding_id::utf32

The encoding ID for the decoded (code point) side of this type. Used for optimization purposes.