Wide Execution

This is the locale-based, wide runtime encoding. A number of compile-time and runtime heuristics eventually resolve it to an implementation-defined encoding. It is not required to work in constant expressions: for that, use ztd::text::wide_literal, which represents the encoding of compile-time wide strings (e.g. L"my string").

Currently, the hierarchy of behaviors is as follows:

  • If the platform is Windows, then it assumes this is UTF-16;

  • Otherwise, if libiconv is available, then it attempts to use iconv configured to the "wchar_t"-identified encoding;

  • Otherwise, if the platform is MacOS and WCHAR_MAX is greater than the maximum of an unsigned 21-bit number, or __STDC_ISO_10646__ is defined, then it attempts to use UTF-32;

  • Otherwise, if the headers <cwchar> or <wchar.h> are available, then it attempts to use a gnarly, lossy, and dangerous encoding that potentially traffics through the C Standard Library and Locale APIs in conjunction with a roundtrip through the ztd::text::execution encoding;

  • Otherwise, it produces a compile-time error.
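The cascade above can be pictured as a chain of preprocessor checks. The following is only a sketch of that ordering, not the library's actual detection code: sketch_wide_backend, sketch_selected_backend, sketch_backend_name, and the SKETCH_HAS_ICONV switch are hypothetical names invented here, and the MacOS condition is read as "(MacOS and a wide enough wchar_t) or __STDC_ISO_10646__", which is an assumption about how that bullet groups.

```cpp
#include <cassert>
#include <cstdint> // WCHAR_MAX
#include <cstring>

// Hypothetical names for this sketch only; none of them belong to ztd.text.
enum class sketch_wide_backend { utf16, iconv, utf32, cwchar_roundtrip };

// SKETCH_HAS_ICONV is a made-up switch standing in for "libiconv is available".
#if defined(_WIN32)
constexpr sketch_wide_backend sketch_selected_backend = sketch_wide_backend::utf16;
#elif defined(SKETCH_HAS_ICONV)
constexpr sketch_wide_backend sketch_selected_backend = sketch_wide_backend::iconv;
#elif (defined(__APPLE__) && (WCHAR_MAX > 0x1FFFFF)) || defined(__STDC_ISO_10646__)
// 0x1FFFFF is the largest value an unsigned 21-bit number can hold.
constexpr sketch_wide_backend sketch_selected_backend = sketch_wide_backend::utf32;
#elif defined(__has_include)
#  if __has_include(<cwchar>) || __has_include(<wchar.h>)
constexpr sketch_wide_backend sketch_selected_backend = sketch_wide_backend::cwchar_roundtrip;
#  else
#    error "no usable wide execution backend"
#  endif
#else
#  error "no usable wide execution backend"
#endif

// Human-readable label for whichever branch was taken at compile time.
constexpr const char* sketch_backend_name(sketch_wide_backend backend) {
    return backend == sketch_wide_backend::utf16 ? "UTF-16"
         : backend == sketch_wide_backend::iconv ? "iconv (\"wchar_t\")"
         : backend == sketch_wide_backend::utf32 ? "UTF-32"
                                                 : "C library roundtrip";
}
```

Printing sketch_backend_name(sketch_selected_backend) on a given toolchain shows which branch that toolchain would fall into under these assumptions.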

Warning

The C Standard Library has many design defects in its production of code points, which may make it unsuitable even if your C Standard Library recognizes certain locales (e.g., Big5-HKSCS). The runtime will always attempt to load iconv if the definition is turned on, since it may do a better job than the C Standard Library’s interfaces until C23.

Even if, on a given platform, it can be assumed to be a static encoding (e.g., Apple/MacOS where it always returns the “C” Locale but processes text as UTF-32), ztd::text::wide_execution will always present itself as a runtime and unknowable encoding. This is to prevent portability issues from relying on, e.g., ztd::text::is_decode_injective_v<ztd::text::wide_execution> being true during development and working with that assumption, only to have it break when ported to a platform where that assumption no longer holds.

constexpr wide_execution_t ztd::text::wide_execution = {}

An instance of the wide_execution_t type for ease of use.

typedef __txt_impl::__wide_execution_cwchar ztd::text::wide_execution_t

The Encoding that represents the “Wide Execution” (wide locale-based) encoding. The wide execution encoding is typically associated with the locale, which is tied to the C standard library’s setlocale function.

Remark

Windows uses UTF-16, unless you call the C Standard Library directly. If ZTD_TEXT_USE_CUNEICODE or ZTD_TEXT_ICONV are not defined, this object may use the C Standard Library to perform transcoding if certain platform facilities are disabled or not available. If this is the case, the C Standard Library has fundamental limitations which may treat your UTF-16 data like UCS-2, and result in broken input/output. This object uses UTF-16 directly on Windows when possible to avoid some of the platform-specific shenanigans. It will attempt to do UTF-32 conversions where possible as well, relying on C Standard definitions.
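To illustrate why treating UTF-16 as UCS-2 breaks text, the sketch below encodes a single code point as UTF-16 by hand. Code points above U+FFFF need two code units (a surrogate pair), whereas a C Standard Library call such as mbrtowc stores only one wchar_t per call. sketch_to_utf16 is a hypothetical helper written for this illustration and is not part of ztd.text.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper for illustration only.
// Writes the UTF-16 form of `cp` into `out` and returns the number of code
// units written, or 0 if `cp` is not a Unicode scalar value.
inline std::size_t sketch_to_utf16(char32_t cp, char16_t (&out)[2]) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return 0; // out of range, or a lone surrogate value
    }
    if (cp < 0x10000) {
        out[0] = static_cast<char16_t>(cp); // fits in one unit, same as UCS-2
        return 1;
    }
    const char32_t offset = cp - 0x10000;   // 20 bits remain after the subtraction
    out[0] = static_cast<char16_t>(0xD800 + (offset >> 10));   // high surrogate
    out[1] = static_cast<char16_t>(0xDC00 + (offset & 0x3FF)); // low surrogate
    return 2;
}
```

A UCS-2 style interface has room for only one of the two units produced by the last branch, so any character that reaches that branch cannot pass through it intact.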

Internal Type

Warning

⚠️ Names with double underscores, and within the __*detail and __*impl namespaces are reserved for the implementation. Referencing this entity directly is bad, and the name/functionality can be changed at any point in the future. Relying on anything not guaranteed by the documentation is ☢️☢️Undefined Behavior☢️☢️.

<cwchar>-based

class ztd::text::__txt_impl::__wide_execution_cwchar

The Encoding that represents the “Wide Execution” (wide locale-based) encoding. This iteration uses the C Standard Library to do its job.

Remark

Because this encoding uses the C Standard Library’s functions, it is both slower and effectively dangerous: it requires a roundtrip through the narrow execution encoding to get to UTF-32, and vice-versa. It is only used when wchar_t and its locale-based runtime encoding cannot be determined to be UTF-32, UTF-16, or some other statically-known encoding. These conversions may also be lossy.
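A rough picture of such a roundtrip, using only standard C library calls, is sketched below: one wchar_t is first converted to the locale's narrow encoding with std::wcrtomb, and the resulting bytes are then converted to a code point with std::mbrtoc32. This is an assumption about the general shape of the technique, not the library's implementation; sketch_wide_to_code_point is a hypothetical name, and the sketch deliberately ignores wide characters that need more than one wchar_t.

```cpp
#include <cassert>
#include <climits> // MB_LEN_MAX
#include <cstddef>
#include <cuchar>  // std::mbrtoc32
#include <cwchar>  // std::wcrtomb, std::mbstate_t

// Hypothetical helper for illustration only.
// Converts one wchar_t to a code point by passing through the narrow,
// locale-dependent encoding. Returns false if either step fails.
inline bool sketch_wide_to_code_point(wchar_t wide, char32_t& code_point) {
    char narrow[MB_LEN_MAX];
    std::mbstate_t to_narrow = std::mbstate_t();
    // Step 1: wide -> narrow. Fails if the current locale cannot represent `wide`.
    const std::size_t written = std::wcrtomb(narrow, wide, &to_narrow);
    if (written == static_cast<std::size_t>(-1)) {
        return false;
    }
    std::mbstate_t from_narrow = std::mbstate_t();
    // Step 2: narrow -> UTF-32. Error and "incomplete" results are huge
    // size_t values, so anything larger than `written` counts as failure.
    const std::size_t read = std::mbrtoc32(&code_point, narrow, written, &from_narrow);
    return read <= written;
}
```

Every character pays for two conversions and two pieces of state, and anything the current locale cannot express is lost at the first step.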

Public Types

using code_unit = wchar_t

The individual units that result from an encode operation or are used as input to a decode operation.

Remark

Please note that wchar_t is a variably sized type across platforms and may not represent either UTF-16 or UTF-32, including on *nix or POSIX platforms.
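A small sketch for checking what the current platform provides is shown below. The two constants are hypothetical names made up for this illustration. Note that a wchar_t wide enough for every Unicode scalar value still says nothing about which encoding the locale actually uses.

```cpp
#include <cassert>
#include <climits> // CHAR_BIT
#include <cstddef>
#include <cwchar>  // WCHAR_MAX
#include <limits>

// Hypothetical names for illustration only.
// Width of wchar_t in bits on this platform (commonly 16 or 32).
constexpr std::size_t sketch_wchar_bits = sizeof(wchar_t) * CHAR_BIT;

// True when a single wchar_t can hold any value up to U+10FFFF.
// This is a necessary condition for UTF-32, not proof of it.
constexpr bool sketch_wchar_holds_any_scalar_value =
    static_cast<unsigned long long>(std::numeric_limits<wchar_t>::max()) >= 0x10FFFFull;
```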

using code_point = unicode_code_point

The individual units that result from a decode operation or are used as input to an encode operation. For most encodings, this is going to be a Unicode Code Point or a Unicode Scalar Value.

using decode_state = __wide_decode_state

The state of the wide encoding used between calls, which may potentially manage shift state.

Remark

This type can potentially have lots of state due to the way the C API is specified.

using encode_state = __wide_encode_state

The state of the wide encoding used between calls, which may potentially manage shift state.

Remark

This type can potentially have lots of state due to the way the C API is specified.

using is_decode_injective = ::std::false_type

Whether or not the decode operation can process all forms of input into code point values.

Remark

All known wide encodings can decode into Unicode just fine.

using is_encode_injective = ::std::false_type

Whether or not the encode operation can process all forms of input into code unit values. On Windows, this is guaranteed to be UTF-16 encoding for the platform. Normally, this is UTF-32 on *nix/POSIX machines, but it can be (and has been) changed before, sometimes even at runtime.

Remark

IBM encodings/computers make life interesting…

using is_unicode_encoding = ::std::false_type

Whether or not this encoding is a Unicode encoding of some type.

Remark

On Windows, this is always true. On other platforms, the guarantees are not quite there. IBM encodings/computers make life interesting…

Public Static Functions

static inline bool contains_unicode_encoding() noexcept

Returns whether or not this encoding is a unicode encoding.

Remark

This function operates at runtime and queries the existing locale through a variety of platform-specific means (such as nl_langinfo for POSIX, ACP probing on Windows, or falling back to std::setlocale name checking otherwise).
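The last fallback, checking the name reported by std::setlocale, might look roughly like the sketch below. Both functions are hypothetical and written only for illustration; the real function may look for different or additional markers, and a name check of this kind is a heuristic rather than a guarantee.

```cpp
#include <cassert>
#include <cctype>
#include <clocale>
#include <string>

// Hypothetical helper for illustration only.
// Returns true when a locale name mentions a UTF-8 codeset,
// e.g. "en_US.UTF-8" or "C.utf8". The comparison ignores case.
inline bool sketch_name_suggests_unicode(const char* locale_name) {
    if (locale_name == nullptr) {
        return false;
    }
    std::string lowered;
    for (const char* p = locale_name; *p != '\0'; ++p) {
        lowered.push_back(static_cast<char>(std::tolower(static_cast<unsigned char>(*p))));
    }
    return lowered.find("utf-8") != std::string::npos
        || lowered.find("utf8") != std::string::npos;
}

// Queries (without changing) the current LC_CTYPE locale and applies the check.
inline bool sketch_current_locale_suggests_unicode() {
    return sketch_name_suggests_unicode(std::setlocale(LC_CTYPE, nullptr));
}
```

Because the answer depends on the locale active at the time of the call, the result can change after any later call to std::setlocale.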

template<typename _InputRange, typename _OutputRange, typename _ErrorHandler>
static inline auto encode_one(_InputRange &&__input, _OutputRange &&__output, _ErrorHandler &&__error_handler, encode_state &__s)

Encodes a single complete unit of information as code units and produces a result with the input and output ranges moved past what was successfully read and written; or, produces an error and returns the input and output ranges untouched.

Remark

Platform APIs and/or the C Standard Library may be used to properly encode one complete unit of information (alongside std::mbstate_t usage). Whether or not the state is used is based on the implementation and what it chooses. If ZTD_TEXT_USE_CUNEICODE is defined, the ztd.cuneicode library may be used to fulfill this functionality.

Remark

To the best of the implementation’s ability, the ranges are returned untouched when an error occurs (e.g., when the input models at least a view and a forward_range). If that is not possible, the returned ranges may have been advanced even though an error occurred, due to the semantics of any view that models an input_range.

Parameters
  • __input[in] The input view to read code points from.

  • __output[in] The output view to write code units into.

  • __error_handler[in] The error handler to invoke if encoding fails.

  • __s[inout] The necessary state information. Most encodings have no state, but because this is effectively a runtime encoding, it is important to preserve and manage this state.

Returns

A ztd::text::encode_result object that contains the reconstructed input range, reconstructed output range, error handler, and a reference to the passed-in state.
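The contract described above (advance on success, leave everything untouched on error, and thread the state through every call) can be sketched with plain pointers and standard C library calls. This is not how ztd.text implements encode_one: sketch_status, sketch_encode_state, and sketch_encode_one are hypothetical names, raw pointers stand in for ranges, no error handler is invoked, and the code point is assumed to map to a single wchar_t.

```cpp
#include <cassert>
#include <climits> // MB_LEN_MAX
#include <cstddef>
#include <cuchar>  // std::c32rtomb
#include <cwchar>  // std::mbrtowc, std::mbstate_t

// Hypothetical types for illustration only.
enum class sketch_status { ok, insufficient_output_space, invalid_sequence };

struct sketch_encode_state {
    std::mbstate_t to_narrow; // state for code point -> narrow bytes
    std::mbstate_t to_wide;   // state for narrow bytes -> wchar_t
};

// Encodes one code point. On success `in` and `out` move past what was read
// and written; on failure they, and `state`, are left exactly as they were.
inline sketch_status sketch_encode_one(const char32_t*& in, const char32_t* in_end,
    wchar_t*& out, wchar_t* out_end, sketch_encode_state& state) {
    if (in == in_end) {
        return sketch_status::ok; // nothing to encode
    }
    if (out == out_end) {
        return sketch_status::insufficient_output_space;
    }
    sketch_encode_state attempt = state; // work on a copy, commit only on success
    char narrow[MB_LEN_MAX];
    const std::size_t written = std::c32rtomb(narrow, *in, &attempt.to_narrow);
    if (written == static_cast<std::size_t>(-1)) {
        return sketch_status::invalid_sequence;
    }
    wchar_t wide = L'\0';
    const std::size_t read = std::mbrtowc(&wide, narrow, written, &attempt.to_wide);
    if (read > written) { // error results are very large size_t values
        return sketch_status::invalid_sequence;
    }
    *out = wide;
    ++in;
    ++out;
    state = attempt;
    return sketch_status::ok;
}
```

Working on a copy of the state is one way to keep a failed call from disturbing a conversion that is already in progress; whether the library does the same is not stated here.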

template<typename _InputRange, typename _OutputRange, typename _ErrorHandler>
static inline auto decode_one(_InputRange &&__input, _OutputRange &&__output, _ErrorHandler &&__error_handler, decode_state &__s)

Decodes a single complete unit of information as code points and produces a result with the input and output ranges moved past what was successfully read and written; or, produces an error and returns the input and output ranges untouched.

Remark

Platform APIs and/or the C Standard Library may be used to properly decode one complete unit of information (alongside std::mbstate_t usage). Whether or not the state is used is based on the implementation and what it chooses. If ZTD_TEXT_USE_CUNEICODE is defined, the ztd.cuneicode library may be used to fulfill this functionality.

Remark

To the best of the implementation’s ability, the ranges are returned untouched when an error occurs (e.g., when the input models at least a view and a forward_range). If that is not possible, the returned ranges may have been advanced even though an error occurred, due to the semantics of any view that models an input_range.

Parameters
  • __input[in] The input view to read code units from.

  • __output[in] The output view to write code points into.

  • __error_handler[in] The error handler to invoke if decoding fails.

  • __s[inout] The necessary state information. Most encodings have no state, but because this is effectively a runtime encoding, it is important to preserve and manage this state.

Returns

A ztd::text::decode_result object that contains the reconstructed input range, reconstructed output range, error handler, and a reference to the passed-in state.

Public Static Attributes

static constexpr const ::std::size_t max_code_units = 8

The maximum code units a single complete operation of encoding can produce.

static constexpr const ::std::size_t max_code_points = 8

The maximum number of code points a single complete operation of decoding can produce.

MacOS-based

class ztd::text::__txt_impl::__wide_execution_iso10646 : private __utf32_with<__wide_execution_iso10646, wchar_t, char32_t>

The wide encoding, as envisioned by ISO 10646. Typically UTF-32 with native endianness.

Remark

This is generally only turned on when the Standard Definition is turned on ( ). It effectively uses UTF-32 since that’s the only encoding that can meet the original requirement of the C Standard and C Standard Library with respect to what happens with individual wchar_t objects.

Public Types

using code_point = code_point_t<__base_t>

The code point type that is decoded to, and encoded from.

using code_unit = code_unit_t<__base_t>

The code unit type that is decoded from, and encoded to.

using decode_state = decode_state_t<__base_t>

The associated state for decode operations.

using encode_state = encode_state_t<__base_t>

The associated state for encode operations.

using is_unicode_encoding = ::std::integral_constant<bool, is_unicode_encoding_v<__base_t>>

Whether or not this encoding is a Unicode encoding.

using is_decode_injective = ::std::false_type

Whether or not this encoding’s decode_one step is injective.

using is_encode_injective = ::std::false_type

Whether or not this encoding’s encode_one step is injective.

Public Static Functions

template<typename _InputRange, typename _OutputRange, typename _ErrorHandler>
static inline constexpr auto decode_one(_InputRange &&__input, _OutputRange &&__output, _ErrorHandler &&__error_handler, decode_state &__s)

Decodes a single complete unit of information as code points and produces a result with the input and output ranges moved past what was successfully read and written; or, produces an error and returns the input and output ranges untouched.

Parameters
  • __input[in] The input view to read code units from.

  • __output[in] The output view to write code points into.

  • __error_handler[in] The error handler to invoke if decoding fails.

  • __s[inout] The necessary state information. Most encodings have no state, but because this is effectively a runtime encoding, it is important to preserve and manage this state.

Returns

A ztd::text::decode_result object that contains the reconstructed input range, reconstructed output range, error handler, and a reference to the passed-in state.
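For a UTF-32 style wide encoding, decoding one unit amounts to reading a single wchar_t and checking that it is a Unicode scalar value. The sketch below shows that idea with raw pointers standing in for ranges. It is an illustration under stated assumptions rather than the library's code: sketch_error, sketch_decode_result, and sketch_decode_one are hypothetical names, no error handler is called, and wchar_t is assumed to be wide enough to hold any code point.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical types for illustration only.
enum class sketch_error { ok, insufficient_output_space, invalid_sequence };

struct sketch_decode_result {
    const wchar_t* input; // where reading stopped
    char32_t* output;     // where writing stopped
    sketch_error error;
};

// Decodes one code point from a UTF-32 style wchar_t sequence.
// On any error the returned pointers equal the ones passed in.
inline sketch_decode_result sketch_decode_one(const wchar_t* in, const wchar_t* in_end,
    char32_t* out, char32_t* out_end) {
    if (in == in_end) {
        return { in, out, sketch_error::ok }; // empty input: nothing to do
    }
    if (out == out_end) {
        return { in, out, sketch_error::insufficient_output_space };
    }
    const char32_t value = static_cast<char32_t>(*in);
    const bool is_surrogate = value >= 0xD800 && value <= 0xDFFF;
    if (is_surrogate || value > 0x10FFFF) {
        return { in, out, sketch_error::invalid_sequence };
    }
    *out = value;
    return { in + 1, out + 1, sketch_error::ok };
}
```

No state parameter appears in this sketch because each wchar_t is self-contained under the UTF-32 assumption.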

template<typename _InputRange, typename _OutputRange, typename _ErrorHandler>
static inline constexpr auto encode_one(_InputRange &&__input, _OutputRange &&__output, _ErrorHandler &&__error_handler, encode_state &__s)

Encodes a single complete unit of information as code units and produces a result with the input and output ranges moved past what was successfully read and written; or, produces an error and returns the input and output ranges untouched.

Parameters
  • __input[in] The input view to read code points from.

  • __output[in] The output view to write code units into.

  • __error_handler[in] The error handler to invoke if encoding fails.

  • __s[inout] The necessary state information. Most encodings have no state, but because this is effectively a runtime encoding, it is important to preserve and manage this state.

Returns

A ztd::text::encode_result object that contains the reconstructed input range, reconstructed output range, error handler, and a reference to the passed-in state.

Public Static Attributes

static constexpr const ::std::size_t max_code_units = 8

The maximum code units a single complete operation of encoding can produce.

static constexpr const ::std::size_t max_code_points = 8

The maximum number of code points a single complete operation of decoding can produce.

Private Types

using state = __txt_detail::__empty_state

The state that can be used between calls to the encoder and decoder. It is an empty struct because there is no shift state to preserve between complete units of encoded information.

Private Static Functions

static inline constexpr auto decode_one(_InputRange &&__input, _OutputRange &&__output, _ErrorHandler &&__error_handler, state &__s)

Decodes a single complete unit of information as code points and produces a result with the input and output ranges moved past what was successfully read and written; or, produces an error and returns the input and output ranges untouched.

Remark

To the best of the implementation’s ability, the ranges are returned untouched when an error occurs (e.g., when the input models at least a view and a forward_range). If that is not possible, the returned ranges may have been advanced even though an error occurred, due to the semantics of any view that models an input_range.

Parameters
  • __input[in] The input view to read code units from.

  • __output[in] The output view to write code points into.

  • __error_handler[in] The error handler to invoke if decoding fails.

  • __s[inout] The necessary state information. For this encoding, the state is empty and means very little.

Returns

A ztd::text::decode_result object that contains the reconstructed input range, reconstructed output range, error handler, and a reference to the passed-in state.

static inline constexpr auto encode_one(_InputRange &&__input, _OutputRange &&__output, _ErrorHandler &&__error_handler, state &__s)

Encodes a single complete unit of information as code units and produces a result with the input and output ranges moved past what was successfully read and written; or, produces an error and returns the input and output ranges untouched.

Remark

To the best of the implementation’s ability, the ranges are returned untouched when an error occurs (e.g., when the input models at least a view and a forward_range). If that is not possible, the returned ranges may have been advanced even though an error occurred, due to the semantics of any view that models an input_range.

Parameters
  • __input[in] The input view to read code points from.

  • __output[in] The output view to write code units into.

  • __error_handler[in] The error handler to invoke if encoding fails.

  • __s[inout] The necessary state information. For this encoding, the state is empty and means very little.

Returns

A ztd::text::encode_result object that contains the reconstructed input range, reconstructed output range, error handler, and a reference to the passed-in state.