Wide Execution
This is the locale-based, wide runtime encoding. It uses a number of compile-time and runtime heuristics to eventually resolve to an implementation-defined encoding. It is not required to work in constant expressions: for that, use ztd::text::wide_literal, which represents the encoding of compile-time wide strings (e.g. L"my string").
Currently, the hierarchy of behaviors is like so:
If the platform is Windows, then it assumes this is UTF-16;
If the platform is MacOS or __STDC_ISO_10646__ is defined, then it assumes this is UTF-32 of some kind;
Otherwise, cuneicode is used.
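The order of checks above could be sketched with preprocessor conditionals as follows. This is only an illustration under stated assumptions: the names wide_guess, guess_wide_encoding, and describe are hypothetical, _WIN32 and __APPLE__ are the usual compiler-provided platform macros rather than anything this page documents, and the real library layers further runtime heuristics on top.

```cpp
#include <string_view>

// Hypothetical names for illustration only; not part of ztd.text.
enum class wide_guess { utf16, utf32, runtime_fallback };

// Mirrors the documented order of checks: Windows first, then MacOS or
// __STDC_ISO_10646__, then the fallback path.
constexpr wide_guess guess_wide_encoding() noexcept {
#if defined(_WIN32)
	return wide_guess::utf16;
#elif defined(__APPLE__) || defined(__STDC_ISO_10646__)
	return wide_guess::utf32;
#else
	return wide_guess::runtime_fallback;
#endif
}

// Human-readable description of each branch.
constexpr std::string_view describe(wide_guess guess) noexcept {
	switch (guess) {
	case wide_guess::utf16:
		return "UTF-16";
	case wide_guess::utf32:
		return "UTF-32 of some kind";
	case wide_guess::runtime_fallback:
		return "resolved at runtime (cuneicode)";
	}
	return "unknown";
}
```

Calling describe(guess_wide_encoding()) reports which branch the current toolchain selects. Even when a branch is chosen at compile time, the public encoding object still presents itself as a runtime encoding.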
Warning
The C Standard Library has many design defects in its production of code points, which may make it unsuitable even if your C Standard Library recognizes certain locales (e.g., Big5-HKSCS). The runtime will always attempt to load iconv if the definition is turned on, since it may do a better job than the C Standard Library’s interfaces until C23.
Even if, on a given platform, it can be assumed to be a static encoding (e.g., Apple/MacOS where it always returns the “C” Locale but processes text as UTF-32), ztd::text::wide_execution will always present itself as a runtime and unknowable encoding. This is to prevent portability issues from relying on, e.g., ztd::text::is_decode_injective_v<ztd::text::wide_execution> being true during development and working with that assumption, only to have it break when ported to a platform where that assumption no longer holds.
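To make the portability concern concrete, here is a minimal sketch of how a member typedef such as is_decode_injective can feed a compile-time trait. Everything in it is a hypothetical stand-in (the trait decode_injective_sketch, both encoding types, and the helper function). It is not ztd.text's implementation, only one common way such a trait can be written.

```cpp
#include <type_traits>

// Hypothetical stand-in trait: false unless T opts in through a member typedef.
template <typename T, typename = void>
struct decode_injective_sketch : std::false_type {};

template <typename T>
struct decode_injective_sketch<T, std::void_t<typename T::is_decode_injective>>
: std::integral_constant<bool, T::is_decode_injective::value> {};

template <typename T>
inline constexpr bool decode_injective_sketch_v = decode_injective_sketch<T>::value;

// Hypothetical encodings: one statically known, one resolved at runtime.
struct fixed_unicode_like {
	using is_decode_injective = std::true_type;
};
struct runtime_wide_like {
	using is_decode_injective = std::false_type;
};

// Generic code can branch on the trait at compile time.
template <typename Encoding>
constexpr bool decode_is_known_lossless() noexcept {
	return decode_injective_sketch_v<Encoding>;
}
```

If the runtime-resolved type reported true on one platform and false on another, code built around decode_is_known_lossless would compile in one place and fail in the other. Reporting false everywhere keeps the behavior uniform.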
Aliases
-
constexpr wide_execution_t ztd::text::wide_execution = {}
An instance of the wide_execution_t type for ease of use.
-
class wide_execution_t : public __wide_execution_cwchar
The Encoding that represents the “Wide Execution” (wide locale-based) encoding. The wide execution encoding is typically associated with the locale, which is tied to the C standard library’s setlocale function.
Remark
Windows uses UTF-16, unless you call the C Standard Library directly. This object may use the C Standard Library to perform transcoding if certain platform facilities are disabled or not available. If this is the case, the C Standard Library has fundamental limitations which may treat your UTF-16 data like UCS-2, and result in broken input/output. This object uses UTF-16 directly on Windows when possible to avoid some of the platform-specific shenanigans. It will attempt to do UTF-32 conversions where possible as well, relying on C Standard definitions.
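The UTF-16 versus UCS-2 distinction mentioned in the Remark can be illustrated with a small sketch. It uses char16_t instead of wchar_t so that it behaves the same on every platform, and all function names are hypothetical. It shows the surrogate-pair arithmetic only, not how this library or any C Standard Library performs it.

```cpp
#include <cstddef>

// Surrogate ranges defined by UTF-16.
constexpr bool is_lead_surrogate(char16_t unit) noexcept {
	return unit >= 0xD800 && unit <= 0xDBFF;
}
constexpr bool is_trail_surrogate(char16_t unit) noexcept {
	return unit >= 0xDC00 && unit <= 0xDFFF;
}

// UTF-16 view: a lead/trail pair forms one code point above U+FFFF.
constexpr char32_t combine_surrogates(char16_t lead, char16_t trail) noexcept {
	return 0x10000 + ((static_cast<char32_t>(lead) - 0xD800) << 10)
	     + (static_cast<char32_t>(trail) - 0xDC00);
}

// Counts code points the UTF-16 way. A UCS-2 reading would simply report
// `size`, treating each half of a pair as its own (invalid) character.
constexpr std::size_t count_utf16_code_points(const char16_t* units, std::size_t size) noexcept {
	std::size_t count = 0;
	for (std::size_t i = 0; i < size; ++i) {
		if (is_lead_surrogate(units[i]) && i + 1 < size && is_trail_surrogate(units[i + 1])) {
			++i; // consume the trail unit together with its lead
		}
		++count;
	}
	return count;
}
```

For the three code units of u"A\U0001F600", the UTF-16 count is 2, while a per-unit UCS-2 interpretation sees 3 separate values, two of which are lone surrogates.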
Internal Type
Warning
⚠️ Names with double underscores, and within the __*detail and __*impl namespaces, are reserved for the implementation. Referencing this entity directly is bad, and the name/functionality can be changed at any point in the future. Relying on anything not guaranteed by the documentation is ☢️☢️Undefined Behavior☢️☢️.
<cwchar>-based
-
class __wide_execution_cwchar
The Encoding that represents the “Wide Execution” (wide locale-based) encoding. This iteration uses the C Standard Library to do its job.
Remark
Because this encoding uses the C Standard Library’s functions, it is both slower and effectively dangerous because it requires a roundtrip through the encoding to get to UTF-32, and vice-versa. This is only used when wchar_t and its locale-based runtime encoding cannot be determined to be UTF-32, UTF-16, or some other statically-known encoding. These conversions may also be lossy.
Subclassed by wide_execution_t
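The roundtrip described in the Remark can be sketched with the standard std::wcrtomb and std::mbrtowc functions. The helper name survives_c_roundtrip is hypothetical, and this is only an assumption about the general shape of such a conversion, not the code this class actually runs.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cwchar>

// Hypothetical helper: pushes one wchar_t through the current C locale's
// multibyte encoding and back, reporting whether it came back unchanged.
inline bool survives_c_roundtrip(wchar_t wide_char) {
	char buffer[MB_LEN_MAX];

	// wchar_t -> locale multibyte sequence
	std::mbstate_t encode_state {};
	const std::size_t written = std::wcrtomb(buffer, wide_char, &encode_state);
	if (written == static_cast<std::size_t>(-1)) {
		return false; // not representable in the current locale: lossy
	}

	// locale multibyte sequence -> wchar_t
	std::mbstate_t decode_state {};
	wchar_t round_tripped = 0;
	const std::size_t read = std::mbrtowc(&round_tripped, buffer, written, &decode_state);
	if (read == static_cast<std::size_t>(-1) || read == static_cast<std::size_t>(-2)) {
		return false; // invalid or incomplete sequence
	}
	return round_tripped == wide_char;
}
```

The result depends on the locale active at the time of the call, which is why conversions routed through this path may be lossy for characters the locale cannot represent.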
Public Types
-
using code_unit = wchar_t
The individual units that result from an encode operation or are used as input to a decode operation.
Remark
Please note that wchar_t is a variably sized type across platforms and may not represent either UTF-16 or UTF-32, including on *nix or POSIX platforms.
-
using code_point = unicode_code_point
The individual units that result from a decode operation or are used as input to an encode operation. For most encodings, this is going to be a Unicode Code Point or a Unicode Scalar Value.
-
using decode_state = __wide_decode_state
The state of the wide encoding used between calls, which may potentially manage shift state.
Remark
This type can potentially have lots of state due to the way the C API is specified.
-
using encode_state = __wide_encode_state
The state of the wide encoding used between calls, which may potentially manage shift state.
Remark
This type can potentially have lots of state due to the way the C API is specified.
-
using is_decode_injective = ::std::false_type
Whether or not the decode operation can process all forms of input into code point values.
Remark
All known wide encodings can decode into Unicode just fine.
-
using is_encode_injective = ::std::false_type
Whether or not the encode operation can process all forms of input into code unit values. On Windows, this is guaranteed to be UTF-16 encoding for the platform. Normally, this is UTF-32 on *nix/POSIX machines, but it can (and has been) changed before, sometimes even at runtime.
Remark
IBM encodings/computers make life interesting…
-
using is_unicode_encoding = ::std::false_type
Whether or not this encoding is a Unicode encoding of some type.
Remark
On Windows, this is always true. On other platforms, the guarantees are not quite there. IBM encodings/computers make life interesting…
Public Static Functions
-
static inline bool contains_unicode_encoding() noexcept
Returns whether or not this encoding is a unicode encoding.
Remark
This function operates at runtime and queries the existing locale through a variety of platform-specific means (such as nl_langinfo for POSIX, ACP probing on Windows, or falling back to std::setlocale name checking otherwise).
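As a rough illustration of the std::setlocale name-checking fallback, the sketch below inspects the current LC_CTYPE locale name for a UTF-8 marker. The function names and the exact substrings tested are assumptions made for this example. The actual checks performed by contains_unicode_encoding are platform-specific and not shown on this page.

```cpp
#include <clocale>
#include <cstddef>
#include <string_view>

constexpr char ascii_lower(char c) noexcept {
	return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
}

// Case-insensitive (ASCII only) substring search.
constexpr bool contains_ignore_case(std::string_view haystack, std::string_view needle) noexcept {
	if (needle.size() > haystack.size()) {
		return false;
	}
	for (std::size_t i = 0; i + needle.size() <= haystack.size(); ++i) {
		std::size_t j = 0;
		while (j < needle.size() && ascii_lower(haystack[i + j]) == ascii_lower(needle[j])) {
			++j;
		}
		if (j == needle.size()) {
			return true;
		}
	}
	return false;
}

// Hypothetical heuristic: does a locale name such as "en_US.UTF-8" mention UTF-8?
constexpr bool locale_name_suggests_unicode(std::string_view name) noexcept {
	return contains_ignore_case(name, "utf-8") || contains_ignore_case(name, "utf8");
}

// Queries (without changing) the current LC_CTYPE locale name.
inline bool current_locale_suggests_unicode() {
	const char* name = std::setlocale(LC_CTYPE, nullptr);
	return name != nullptr && locale_name_suggests_unicode(name);
}
```

Name checking is the weakest of the listed strategies, since locale names are not standardized across platforms. That is consistent with it being described as a fallback.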
-
template<typename _Input, typename _Output, typename _ErrorHandler>
static inline auto encode_one(_Input &&__input, _Output &&__output, _ErrorHandler &&__error_handler, encode_state &__s)
Encodes a single complete unit of information as code units and produces a result with the input and output ranges moved past what was successfully read and written; or, produces an error and returns the input and output ranges untouched.
Remark
Platform APIs and/or the C Standard Library may be used to properly encode one complete unit of information (alongside std::mbstate_t usage). Whether or not the state is used is based on the implementation and what it chooses.
Remark
To the best ability of the implementation, the iterators will be returned untouched (e.g., the input models at least a view and a forward_range). If it is not possible, returned ranges may be incremented even if an error occurs due to the semantics of any view that models an input_range.
- Parameters:
__input – [in] The input view to read code points from.
__output – [in] The output view to write code units into.
__error_handler – [in] The error handler to invoke if encoding fails.
__s – [inout] The necessary state information. Most encodings have no state, but because this is effectively a runtime encoding, it is important to preserve and manage this state.
- Returns:
A ztd::text::encode_result object that contains the reconstructed input range, reconstructed output range, error handler, and a reference to the passed-in state.
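The "advance on success, return ranges untouched on error" contract can be sketched with a fixed UTF-16 encoder. This deliberately sidesteps the locale machinery: the result struct, error enumeration, and function names below are hypothetical and use plain pointer/size pairs, not the ztd::text::encode_result type or the range and error-handler parameters of the real function.

```cpp
#include <cstddef>

// Hypothetical error codes and result type for illustration.
enum class step_error { ok, invalid_input, insufficient_output };

struct encode_step_result {
	const char32_t* input;   // start of the input not yet consumed
	std::size_t input_size;
	char16_t* output;        // start of the output not yet written
	std::size_t output_size;
	step_error error;
};

// Encodes at most one code point as UTF-16 code units.
constexpr encode_step_result encode_one_utf16(
     const char32_t* in, std::size_t in_size, char16_t* out, std::size_t out_size) noexcept {
	if (in_size == 0) {
		return { in, in_size, out, out_size, step_error::ok };
	}
	const char32_t code_point = in[0];
	if (code_point > 0x10FFFF || (code_point >= 0xD800 && code_point <= 0xDFFF)) {
		// error: both ranges are handed back untouched
		return { in, in_size, out, out_size, step_error::invalid_input };
	}
	const std::size_t needed = code_point >= 0x10000 ? 2 : 1;
	if (out_size < needed) {
		return { in, in_size, out, out_size, step_error::insufficient_output };
	}
	if (needed == 1) {
		out[0] = static_cast<char16_t>(code_point);
	}
	else {
		const char32_t offset = code_point - 0x10000;
		out[0] = static_cast<char16_t>(0xD800 + (offset >> 10));
		out[1] = static_cast<char16_t>(0xDC00 + (offset & 0x3FF));
	}
	// success: both ranges move past what was read and written
	return { in + 1, in_size - 1, out + needed, out_size - needed, step_error::ok };
}

// Small self-check exercising the success path and one error path.
constexpr bool encode_sketch_self_check() noexcept {
	const char32_t in[1] = { 0x1F600 };
	char16_t out[2] = {};
	const encode_step_result ok = encode_one_utf16(in, 1, out, 2);
	const encode_step_result full = encode_one_utf16(in, 1, out, 1);
	return ok.error == step_error::ok && ok.input_size == 0 && ok.output_size == 0
	     && out[0] == 0xD83D && out[1] == 0xDE00
	     && full.error == step_error::insufficient_output
	     && full.input == in && full.input_size == 1 && full.output == out;
}
```

A caller loops on such a function until the input is empty or an error is reported, feeding the returned ranges back in each time. The real encode_one additionally threads the encode_state through each call.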
-
template<typename _Input, typename _Output, typename _ErrorHandler>
static inline auto decode_one(_Input &&__input, _Output &&__output, _ErrorHandler &&__error_handler, decode_state &__s)
Decodes a single complete unit of information as code points and produces a result with the input and output ranges moved past what was successfully read and written; or, produces an error and returns the input and output ranges untouched.
Remark
Platform APIs and/or the C Standard Library may be used to properly decode one complete unit of information (alongside std::mbstate_t usage). Whether or not the state is used is based on the implementation and what it chooses.
Remark
To the best ability of the implementation, the iterators will be returned untouched (e.g., the input models at least a view and a forward_range). If it is not possible, returned ranges may be incremented even if an error occurs due to the semantics of any view that models an input_range.
- Parameters:
__input – [in] The input view to read code units from.
__output – [in] The output view to write code points into.
__error_handler – [in] The error handler to invoke if decoding fails.
__s – [inout] The necessary state information. Most encodings have no state, but because this is effectively a runtime encoding, it is important to preserve and manage this state.
- Returns:
A ztd::text::decode_result object that contains the reconstructed input range, reconstructed output range, error handler, and a reference to the passed-in state.
MacOS-based / __STDC_ISO_10646__-based
-
class __wide_execution_iso10646 : private __utf32_with<__wide_execution_iso10646, wchar_t, ztd_char32_t>
The wide encoding, as envisioned by ISO 10646. Typically UTF-32 with native endianness.
Remark
This is generally only turned on when the Standard Definition is turned on (__STDC_ISO_10646__). It effectively uses UTF-32 since that’s the only encoding that can meet the original requirement of the C Standard and C Standard Library with respect to what happens with individual wchar_t objects.
Public Types
-
using code_point = code_point_t<__base_t>
The code point type that is decoded to, and encoded from.
-
using code_unit = code_unit_t<__base_t>
The code unit type that is decoded from, and encoded to.
-
using decode_state = decode_state_t<__base_t>
The associated state for decode operations.
-
using encode_state = encode_state_t<__base_t>
The associated state for encode operations.
-
using is_unicode_encoding = ::std::integral_constant<bool, is_unicode_encoding_v<__base_t>>
Whether or not this encoding is a Unicode encoding.
-
using is_decode_injective = ::std::false_type
Whether or not this encoding’s decode_one step is injective.
-
using is_encode_injective = ::std::false_type
Whether or not this encoding’s encode_one step is injective.
Public Static Functions
-
template<typename _Input, typename _Output, typename _ErrorHandler>
static inline constexpr auto decode_one(_Input &&__input, _Output &&__output, _ErrorHandler &&__error_handler, decode_state &__s)
Decodes a single complete unit of information as code points and produces a result with the input and output ranges moved past what was successfully read and written; or, produces an error and returns the input and output ranges untouched.
- Parameters:
__input – [in] The input view to read code units from.
__output – [in] The output view to write code points into.
__error_handler – [in] The error handler to invoke if decoding fails.
__s – [inout] The necessary state information. Most encodings have no state, but because this is effectively a runtime encoding, it is important to preserve and manage this state.
- Returns:
A ztd::text::decode_result object that contains the reconstructed input range, reconstructed output range, error handler, and a reference to the passed-in state.
-
template<typename _Input, typename _Output, typename _ErrorHandler>
static inline constexpr auto encode_one(_Input &&__input, _Output &&__output, _ErrorHandler &&__error_handler, encode_state &__s)
Encodes a single complete unit of information as code units and produces a result with the input and output ranges moved past what was successfully read and written; or, produces an error and returns the input and output ranges untouched.
- Parameters:
__input – [in] The input view to read code points from.
__output – [in] The output view to write code units into.
__error_handler – [in] The error handler to invoke if encoding fails.
__s – [inout] The necessary state information. Most encodings have no state, but because this is effectively a runtime encoding, it is important to preserve and manage this state.
- Returns:
A ztd::text::encode_result object that contains the reconstructed input range, reconstructed output range, error handler, and a reference to the passed-in state.
Public Static Attributes
-
static constexpr const ::std::size_t max_code_units = 8
The maximum code units a single complete operation of encoding can produce.
-
static constexpr const ::std::size_t max_code_points = 8
The maximum number of code points a single complete operation of decoding can produce.
Private Static Functions
-
static inline constexpr auto decode_one(_Input &&__input, _Output &&__output, _ErrorHandler &&__error_handler, state &__s)
Decodes a single complete unit of information as code points and produces a result with the input and output ranges moved past what was successfully read and written; or, produces an error and returns the input and output ranges untouched.
Remark
To the best ability of the implementation, the iterators will be returned untouched (e.g., the input models at least a view and a forward_range). If it is not possible, returned ranges may be incremented even if an error occurs due to the semantics of any view that models an input_range.
- Parameters:
__input – [in] The input view to read code units from.
__output – [in] The output view to write code points into.
__error_handler – [in] The error handler to invoke if decoding fails.
__s – [inout] The necessary state information. For this encoding, the state is empty and means very little.
- Returns:
A ztd::text::decode_result object that contains the reconstructed input range, reconstructed output range, error handler, and a reference to the passed-in state.
-
static inline constexpr auto encode_one(_Input &&__input, _Output &&__output, _ErrorHandler &&__error_handler, state &__s)
Encodes a single complete unit of information as code units and produces a result with the input and output ranges moved past what was successfully read and written; or, produces an error and returns the input and output ranges untouched.
Remark
To the best ability of the implementation, the iterators will be returned untouched (e.g., the input models at least a view and a forward_range). If it is not possible, returned ranges may be incremented even if an error occurs due to the semantics of any view that models an input_range.
- Parameters:
__input – [in] The input view to read code points from.
__output – [in] The output view to write code units into.
__error_handler – [in] The error handler to invoke if encoding fails.
__s – [inout] The necessary state information. For this encoding, the state is empty and means very little.
- Returns:
A ztd::text::encode_result object that contains the reconstructed input range, reconstructed output range, error handler, and a reference to the passed-in state.
Windows-based
-
class __wide_execution_windows : private basic_utf16<wchar_t>
The Encoding that represents the “Wide Execution” (wide locale-based) encoding, as it exists on Windows. The wide encoding is typically associated with the locale, which is tied to the C standard library’s setlocale function.
Remark
Windows uses UTF-16, unless you call the C Standard Library directly. This object may use the C Standard Library to perform transcoding if certain platform facilities are disabled or not available (e.g., a Windows-like machine without the Windows SDK). If this is the case, the C Standard Library has fundamental limitations which may treat your UTF-16 data like UCS-2, and result in broken input/output. This object uses UTF-16 directly on Windows when possible to avoid some of the platform-specific shenanigans.
Public Types
-
using code_point = code_point_t<__base_t>
The code point type that is decoded to, and encoded from.
-
using code_unit = code_unit_t<__base_t>
The code unit type that is decoded from, and encoded to.
-
using decode_state = decode_state_t<__base_t>
The associated state for decode operations.
-
using encode_state = encode_state_t<__base_t>
The associated state for encode operations.
-
using is_unicode_encoding = ::std::integral_constant<bool, is_unicode_encoding_v<__base_t>>
Whether or not this encoding is a Unicode encoding.
-
using is_decode_injective = ::std::false_type
Whether or not this encoding’s decode_one step is injective.
-
using is_encode_injective = ::std::false_type
Whether or not this encoding’s encode_one step is injective.
Public Static Functions
-
template<typename _Input, typename _Output, typename _ErrorHandler>
static inline constexpr auto decode_one(_Input &&__input, _Output &&__output, _ErrorHandler &&__error_handler, state &__s)
Decodes a single complete unit of information as code points and produces a result with the input and output ranges moved past what was successfully read and written; or, produces an error and returns the input and output ranges untouched.
- Parameters:
__input – [in] The input view to read code units from.
__output – [in] The output view to write code points into.
__error_handler – [in] The error handler to invoke if decoding fails.
__s – [inout] The necessary state information. Most encodings have no state, but because this is effectively a runtime encoding, it is important to preserve and manage this state.
- Returns:
A ztd::text::decode_result object that contains the reconstructed input range, reconstructed output range, error handler, and a reference to the passed-in state.
-
template<typename _Input, typename _Output, typename _ErrorHandler>
static inline constexpr auto encode_one(_Input &&__input, _Output &&__output, _ErrorHandler &&__error_handler, state &__s)
Encodes a single complete unit of information as code units and produces a result with the input and output ranges moved past what was successfully read and written; or, produces an error and returns the input and output ranges untouched.
- Parameters:
__input – [in] The input view to read code points from.
__output – [in] The output view to write code units into.
__error_handler – [in] The error handler to invoke if encoding fails.
__s – [inout] The necessary state information. Most encodings have no state, but because this is effectively a runtime encoding, it is important to preserve and manage this state.
- Returns:
A ztd::text::encode_result object that contains the reconstructed input range, reconstructed output range, error handler, and a reference to the passed-in state.