Execution¶
This is the locale-based, runtime encoding. It resolves to an implementation-defined encoding through a number of compile-time and runtime heuristics. It is not required to work in constant expressions: for that, use ztd::text::literal, which represents the encoding of compile-time string literals (e.g. "my string").
Currently, the hierarchy of behaviors is as follows:
If the platform is MacOS, then it assumes this is UTF-8;
Otherwise, if libiconv is available, then it attempts to use iconv configured to the "char"-identified encoding;
Otherwise, if the headers <cuchar> or <uchar.h> are available, then it attempts to use a gnarly, lossy, and dangerous encoding that potentially traffics through the C Standard Library and Locale APIs;
Otherwise, it produces a compile-time error.
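The fallback order above can be pictured as a preprocessor cascade. The following is only a sketch of that ordering, not the library's actual selection logic: `SKETCH_HAS_ICONV` and `pick_execution_backend` are hypothetical names invented for this illustration, and the real configuration definitions may differ. It assumes a C++17 compiler, where `__has_include` is available.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical macro: stands in for whatever configuration definition
// signals that iconv support was requested. Not a real ztd.text name.
#ifndef SKETCH_HAS_ICONV
#define SKETCH_HAS_ICONV 0
#endif

// Returns a label for the backend this sketch would select, checking
// the conditions in the same order as the list above.
constexpr std::string_view pick_execution_backend() {
#if defined(__APPLE__)
    return "utf8 (assumed on MacOS)";
#elif SKETCH_HAS_ICONV
    return "iconv (\"char\"-identified encoding)";
#elif __has_include(<cuchar>) || __has_include(<uchar.h>)
    return "C standard library (<cuchar> / <uchar.h>)";
#else
#  error "no usable backend for the execution encoding"
#endif
}
```

Because every branch is decided by the preprocessor, a platform that matches none of the conditions fails at compile time rather than at run time, which mirrors the last item in the list.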
Warning
The C Standard Library has many design defects in its production of code points, which may make it unsuitable even if your C Standard Library recognizes certain locales (e.g., Big5-HKSCS). The runtime will always attempt to load iconv if the definition is turned on, since it may do a better job than the C Standard Library’s interfaces as they exist before C23.
Even if, on a given platform, it can be assumed to be a static encoding (e.g., Apple/MacOS where it always returns the “C” Locale but processes text as UTF-8), ztd::text::execution will always present itself as a runtime and unknowable encoding. This is to prevent portability issues from relying on, e.g., ztd::text::is_decode_injective_v<ztd::text::execution> being true during development and working with that assumption, only to have it break when ported to a platform where that assumption no longer holds.
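To make the portability concern concrete, here is a minimal sketch of how an injectivity query can be driven by a member typedef. Everything in it is a hypothetical stand-in (`runtime_encoding_sketch`, `fixed_utf8_sketch`, `decode_injective_sketch_v`), and defaulting to `false` when the typedef is missing is a choice made for this sketch rather than a statement about the library's trait.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins; these are not the ztd.text definitions.
struct runtime_encoding_sketch {
    // The real encoding is only known at run time, so claim nothing.
    using is_decode_injective = std::false_type;
};
struct fixed_utf8_sketch {
    using is_decode_injective = std::true_type;
};
struct unannotated_sketch {};

// Primary template: no member typedef means "not known to be injective".
template <typename T, typename = void>
struct decode_injective_sketch : std::false_type {};

// Specialization: forward whatever the encoding declares about itself.
template <typename T>
struct decode_injective_sketch<T, std::void_t<typename T::is_decode_injective>>
    : T::is_decode_injective {};

template <typename T>
inline constexpr bool decode_injective_sketch_v = decode_injective_sketch<T>::value;

// Code that requires a lossless decode must not rely on the runtime encoding.
static_assert(!decode_injective_sketch_v<runtime_encoding_sketch>);
static_assert(decode_injective_sketch_v<fixed_utf8_sketch>);
static_assert(!decode_injective_sketch_v<unannotated_sketch>);
```

Since the runtime encoding reports `false` on every platform, generic code that branches on this trait behaves the same on MacOS as on a platform with an arbitrary locale.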
-
constexpr execution_t ztd::text::execution = {}¶
An instance of the execution_t type for ease of use.
-
typedef no_encoding<char, unicode_code_point> ztd::text::execution_t¶
The Encoding that represents the “Execution” (narrow locale-based) encoding. The encoding is typically associated with the locale, which is tied to the C standard library’s setlocale function.
- Remark
Use of this type is subject to the C Standard Library or platform defaults. Some locales (such as the Big5 Hong Kong Supplementary Character Set (Big5-HKSCS)) are broken when accessed without ZTD_TEXT_USE_CUNEICODE being defined, due to fundamental design issues in the C Standard Library and bugs in glibc/musl libc’s current locale encoding support. On Apple, this is currently assumed to be UTF-8 since they do not support the <cuchar> or <uchar.h> headers.
Internal Types¶
Warning
⚠️ Names with double underscores, and names within the __detail and __impl namespaces, are reserved for the implementation. Referencing these entities directly is bad, and their names and functionality can be changed at any point in the future. Relying on anything not guaranteed by the documentation is ☢️☢️Undefined Behavior☢️☢️.
MacOS-based¶
-
class ztd::text::__txt_impl::__execution_mac_os : private __utf8_with<__execution_mac_os, char, char32_t>¶
The default (“locale”) encoding for Mac OS.
- Remark
Note that for all intents and purposes, Mac OS demands that all text is in UTF-8. However, on Big Sur, Catalina, and a few other platforms, locale functionality and data have been either forgotten/left behind or intentionally kept in place on these devices. It may be possible that with very dedicated hacks one can still change the desired default encoding from UTF-8 to something else in the majority of Apple text. Their documentation states that all text “should” be UTF-8, but very explicitly goes out of its way to not make that a hard guarantee. Since it is a BSD-like system and they left plenty of that data behind from C libraries, this may break in extremely obscure cases. Please be careful on Apple machines!
Public Types
-
using code_point = code_point_t<__base_t>¶
The code point type that is decoded to, and encoded from.
-
using code_unit = code_unit_t<__base_t>¶
The code unit type that is decoded from, and encoded to.
-
using decode_state = decode_state_t<__base_t>¶
The associated state for decode operations.
-
using encode_state = encode_state_t<__base_t>¶
The associated state for encode operations.
-
using is_unicode_encoding = ::std::integral_constant<bool, is_unicode_encoding_v<__base_t>>¶
Whether or not this encoding is a Unicode encoding.
-
using is_decode_injective = ::std::false_type¶
Whether or not this encoding’s decode_one step is injective.
-
using is_encode_injective = ::std::false_type¶
Whether or not this encoding’s encode_one step is injective.
Public Static Functions
-
template<typename _InputRange, typename _OutputRange, typename _ErrorHandler>
static inline constexpr auto decode_one(_InputRange &&__input, _OutputRange &&__output, _ErrorHandler &&__error_handler, decode_state &__s)¶
Decodes a single complete unit of information as code points and produces a result with the input and output ranges moved past what was successfully read and written; or, produces an error and returns the input and output ranges untouched.
- Parameters
__input – [in] The input view to read code units from.
__output – [in] The output view to write code points into.
__error_handler – [in] The error handler to invoke if decoding fails.
__s – [inout] The necessary state information. Most encodings have no state, but because this is effectively a runtime encoding, it is important to preserve and manage this state.
- Returns
A ztd::text::decode_result object that contains the reconstructed input range, reconstructed output range, error handler, and a reference to the passed-in state.
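The "advance on success, leave untouched on failure" contract described above can be sketched with a small standalone UTF-8 decoder. This is an illustration only: `decode_one_result_sketch` and `decode_one_utf8_sketch` are hypothetical names, the sketch has no error handler or state parameter, and it is not how the library implements its runtime encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical result type, loosely modelled on the description above.
struct decode_one_result_sketch {
    std::string_view input;  // input remaining after the operation
    char32_t* output;        // next free slot in the output buffer
    std::size_t output_size; // remaining output capacity
    bool ok;                 // false if nothing was decoded
};

// Decodes exactly one UTF-8 sequence into one code point.
inline decode_one_result_sketch decode_one_utf8_sketch(
    std::string_view input, char32_t* output, std::size_t output_size) {
    // On any failure, hand back the ranges exactly as they were received.
    const decode_one_result_sketch failure{input, output, output_size, false};
    if (input.empty() || output_size == 0) return failure;

    const auto lead = static_cast<unsigned char>(input[0]);
    std::size_t length = 0;
    char32_t value = 0;
    if (lead < 0x80)                { length = 1; value = lead; }
    else if ((lead & 0xE0) == 0xC0) { length = 2; value = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { length = 3; value = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { length = 4; value = lead & 0x07; }
    else return failure; // stray continuation byte or invalid lead byte
    if (input.size() < length) return failure; // truncated sequence

    for (std::size_t i = 1; i < length; ++i) {
        const auto unit = static_cast<unsigned char>(input[i]);
        if ((unit & 0xC0) != 0x80) return failure; // not a continuation byte
        value = (value << 6) | (unit & 0x3F);
    }

    // Reject overlong forms, surrogates, and values beyond U+10FFFF.
    static constexpr char32_t minimum[] = {0, 0, 0x80, 0x800, 0x10000};
    if (value < minimum[length] || value > 0x10FFFF
        || (value >= 0xD800 && value <= 0xDFFF)) {
        return failure;
    }

    // Success: write one code point and move both ranges forward.
    *output = value;
    return {input.substr(length), output + 1, output_size - 1, true};
}
```

Calling it in a loop until `input` is empty or `ok` is false gives a bulk decode; a caller that sees `ok == false` can inspect the returned ranges knowing they still point at the offending sequence.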
-
template<typename _InputRange, typename _OutputRange, typename _ErrorHandler>
static inline constexpr auto encode_one(_InputRange &&__input, _OutputRange &&__output, _ErrorHandler &&__error_handler, encode_state &__s)¶
Encodes a single complete unit of information as code units and produces a result with the input and output ranges moved past what was successfully read and written; or, produces an error and returns the input and output ranges untouched.
- Parameters
__input – [in] The input view to read code points from.
__output – [in] The output view to write code units into.
__error_handler – [in] The error handler to invoke if encoding fails.
__s – [inout] The necessary state information. Most encodings have no state, but because this is effectively a runtime encoding, it is important to preserve and manage this state.
- Returns
A ztd::text::encode_result object that contains the reconstructed input range, reconstructed output range, error handler, and a reference to the passed-in state.
Public Static Attributes
-
static constexpr ::std::size_t max_code_points = 8¶
The maximum number of code points a single complete operation of decoding can produce.
- Remark
There are encodings for which one input can produce 3 code points (some Tamil encodings) and there are rumours of an encoding that can produce 7 code points from a handful of input. We use a protective/conservative 8, here, to make sure ABI isn’t broken later.
-
static constexpr ::std::size_t max_code_units = MB_LEN_MAX¶
The maximum number of code units a single complete operation of encoding can produce.
- Remark
This is bounded by the platform’s MB_LEN_MAX macro, which is an integral constant expression giving the maximum number of bytes that any supported C locale can produce from a single complete operation.
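As an illustration of why MB_LEN_MAX is a safe bound, the sketch below sizes a buffer with it and encodes one code point through the C Standard Library's `std::c32rtomb`. The helper name `encode_one_with_locale_sketch` is hypothetical, and the sketch assumes a platform that ships `<cuchar>` (which, per the remarks above, excludes Apple platforms).

```cpp
#include <cassert>
#include <climits> // MB_LEN_MAX
#include <cstddef>
#include <cuchar>  // std::c32rtomb
#include <cwchar>  // std::mbstate_t

// Encodes one code point using the current C locale.
// Returns the number of bytes written, or 0 if the locale cannot
// represent the code point.
inline std::size_t encode_one_with_locale_sketch(
    char32_t code_point, char (&buffer)[MB_LEN_MAX]) {
    std::mbstate_t state{}; // fresh conversion state for a single call
    const std::size_t written = std::c32rtomb(buffer, code_point, &state);
    return written == static_cast<std::size_t>(-1) ? 0 : written;
}
```

Taking the buffer as a reference to an array of exactly MB_LEN_MAX elements lets the compiler reject callers that pass a smaller buffer, so a single call cannot overrun it regardless of which locale is active.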
Private Static Functions
-
static inline constexpr auto encode_one(_InputRange &&__input, _OutputRange &&__output, _ErrorHandler &&__error_handler, encode_state &__s)¶
Encodes a single complete unit of information as code units and produces a result with the input and output ranges moved past what was successfully read and written; or, produces an error and returns the input and output ranges untouched.
- Remark
To the best ability of the implementation, the iterators will be returned untouched (e.g., the input models at least a view and a forward_range). If it is not possible, returned ranges may be incremented even if an error occurs due to the semantics of any view that models an input_range.
- Parameters
__input – [in] The input view to read code points from.
__output – [in] The output view to write code units into.
__error_handler – [in] The error handler to invoke if encoding fails.
__s – [inout] The necessary state information. For this encoding, the state is empty and means very little.
- Returns
A ztd::text::encode_result object that contains the reconstructed input range, reconstructed output range, error handler, and a reference to the passed-in state.
-
static inline constexpr auto decode_one(_InputRange &&__input, _OutputRange &&__output, _ErrorHandler &&__error_handler, decode_state &__s)¶
Decodes a single complete unit of information as code points and produces a result with the input and output ranges moved past what was successfully read and written; or, produces an error and returns the input and output ranges untouched.
- Remark
To the best ability of the implementation, the iterators will be returned untouched (e.g., the input models at least a view and a forward_range). If it is not possible, returned ranges may be incremented even if an error occurs due to the semantics of any view that models an input_range.
- Parameters
__input – [in] The input view to read code units from.
__output – [in] The output view to write code points into.
__error_handler – [in] The error handler to invoke if decoding fails.
__s – [inout] The necessary state information. For this encoding, the state is empty and means very little.
- Returns
A ztd::text::decode_result object that contains the reconstructed input range, reconstructed output range, error handler, and a reference to the passed-in state.