Encoding Scheme
The encoding_scheme template turns any encoding into a byte-based encoding that reads and writes its code units as bytes, into and out of ranges whose value_type is a byte type. This avoids duplicating the effort of reading an encoding as little endian or big endian: any encoding can be composed with an endianness to interface with, for example, a UTF-16 Big Endian blob of data coming over a network or shared pipe.
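As a rough illustration of the work this template centralizes, the sketch below assembles a 16-bit code unit from two bytes in an explicitly chosen byte order. It uses only the standard library; `byte_order` and `read_code_unit16` are hypothetical names invented for this example and are not part of ztd.text.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an endianness tag; not the library's type.
enum class byte_order { little, big };

// Assemble one 16-bit code unit from two bytes in the requested order.
constexpr char16_t read_code_unit16(std::byte first, std::byte second, byte_order order) {
    const auto a = std::to_integer<std::uint16_t>(first);
    const auto b = std::to_integer<std::uint16_t>(second);
    return order == byte_order::big
        ? static_cast<char16_t>((a << 8) | b)
        : static_cast<char16_t>((b << 8) | a);
}

// "A" (U+0041) as it would arrive in a UTF-16 Big Endian byte stream.
constexpr std::array<std::byte, 2> wire_bytes { std::byte{0x00}, std::byte{0x41} };

// Reading with the right byte order recovers the code unit...
static_assert(read_code_unit16(wire_bytes[0], wire_bytes[1], byte_order::big) == u'A');
// ...while the wrong byte order yields a different code unit entirely.
static_assert(read_code_unit16(wire_bytes[0], wire_bytes[1], byte_order::little) == char16_t(0x4100));
```

Writing this once, parameterized on byte order, is the kind of duplication the encoding_scheme wrapper is meant to spare each individual encoding.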
Aliases
-
using ztd::text::basic_utf16_le = encoding_scheme<utf16_t, endian::little, _Byte>
A UTF-16 encoding, in Little Endian format, with inputs as a sequence of bytes.
- Template Parameters
_Byte – The byte type to use. Typically, this is std::byte or unsigned char.
-
using ztd::text::utf16_le_t = basic_utf16_le<::std::byte>
A UTF-16 encoding, in Little Endian format, with inputs as a sequence of bytes.
-
using ztd::text::basic_utf16_be = encoding_scheme<utf16_t, endian::big, _Byte>
A UTF-16 encoding, in Big Endian format, with inputs as a sequence of bytes.
- Template Parameters
_Byte – The byte type to use. Typically, this is std::byte or unsigned char.
-
using ztd::text::utf16_be_t = basic_utf16_be<::std::byte>
A UTF-16 encoding, in Big Endian format, with inputs as a sequence of bytes.
-
using ztd::text::basic_utf16_ne = encoding_scheme<utf16_t, endian::native, _Byte>
A UTF-16 encoding, in Native Endian format, with inputs as a sequence of bytes.
- Template Parameters
_Byte – The byte type to use. Typically, this is std::byte or unsigned char.
-
using ztd::text::utf16_ne_t = basic_utf16_ne<::std::byte>
A UTF-16 encoding, in Native Endian format, with inputs as a sequence of bytes.
-
using ztd::text::basic_utf32_le = encoding_scheme<utf32_t, endian::little, _Byte>
A UTF-32 encoding, in Little Endian format, with inputs as a sequence of bytes.
- Template Parameters
_Byte – The byte type to use. Typically, this is std::byte or unsigned char.
-
using ztd::text::utf32_le_t = basic_utf32_le<::std::byte>
A UTF-32 encoding, in Little Endian format, with inputs as a sequence of bytes.
-
using ztd::text::basic_utf32_be = encoding_scheme<utf32_t, endian::big, _Byte>
A UTF-32 encoding, in Big Endian format, with inputs as a sequence of bytes.
- Template Parameters
_Byte – The byte type to use. Typically, this is std::byte or unsigned char.
-
using ztd::text::utf32_be_t = basic_utf32_be<::std::byte>
A UTF-32 encoding, in Big Endian format, with inputs as a sequence of bytes.
-
using ztd::text::basic_utf32_ne = encoding_scheme<utf32_t, endian::native, _Byte>
A UTF-32 encoding, in Native Endian format, with inputs as a sequence of bytes.
- Template Parameters
_Byte – The byte type to use. Typically, this is std::byte or unsigned char.
-
using ztd::text::utf32_ne_t = basic_utf32_ne<::std::byte>
A UTF-32 encoding, in Native Endian format, with inputs as a sequence of bytes.
Base Template
-
template<typename _Encoding, endian _Endian = endian::native, typename _Byte = ::std::byte>
class ztd::text::encoding_scheme : public __is_unicode_encoding_es<encoding_scheme<_Encoding, _Endian, _Byte>, remove_cvref_t<unwrap_t<_Encoding>>>, private ebco<_Encoding>
Decomposes the provided Encoding type into a specific endianness (big, little, or native) to allow for a single encoding type to be viewed in different ways.
- Remark
For example, this can be used to construct a Big Endian UTF-16 by using encoding_scheme<ztd::text::utf16_t, ztd::endian::big>. It can be made interoperable with unsigned char buffers rather than std::byte buffers by doing: ztd::text::encoding_scheme<ztd::text::utf32_t, ztd::endian::native, unsigned char>.
- tparam _Encoding
The encoding type.
- tparam _Endian
The endianness to use. Defaults to ztd::endian::native.
- tparam _Byte
The byte type to use. Defaults to std::byte.
Public Types
-
using code_point = code_point_t<_UBaseEncoding>
The individual units that result from a decode operation or are used as input to an encode operation. For most encodings, this is going to be a Unicode Code Point or a Unicode Scalar Value.
-
using code_unit = _Byte
The individual units that result from an encode operation or are used as input to a decode operation.
- Remark
Typically, this is some kind of byte type (unsigned char, std::byte, or another type for which sizeof(obj) == 1).
-
using decode_state = decode_state_t<_UBaseEncoding>
The state that can be used between calls to the decode function.
- Remark
Even if the underlying encoding only has a single state type, we need to separate the two out in order to generically handle all encodings. Therefore, the encoding_scheme will always have both encode_state and decode_state.
-
using encode_state = encode_state_t<_UBaseEncoding>
The state that can be used between calls to the encode function.
- Remark
Even if the underlying encoding only has a single state type, we need to separate the two out in order to generically handle all encodings. Therefore, the encoding_scheme will always have both encode_state and decode_state.
-
using is_encode_injective = ::std::integral_constant<bool, is_encode_injective_v<_UBaseEncoding>>
Whether or not the encode operation can process all forms of input into code unit values.
- Remark
Defers to what the underlying encoding_type does.
-
using is_decode_injective = ::std::integral_constant<bool, is_decode_injective_v<_UBaseEncoding>>
Whether or not the decode operation can process all forms of input into code point values.
- Remark
Defers to what the underlying encoding_type does.
Public Functions
-
inline constexpr encoding_type &base() & noexcept
Retrieves the underlying encoding object.
- Returns
An l-value reference to the encoding object.
-
inline constexpr const encoding_type &base() const & noexcept
Retrieves the underlying encoding object.
- Returns
An l-value reference to the encoding object.
-
inline constexpr encoding_type &&base() && noexcept
Retrieves the underlying encoding object.
- Returns
An r-value reference to the encoding object.
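The three base() overloads differ only in their ref-qualifiers, so the value category and constness of the scheme object decide which reference type comes back. A minimal standalone sketch of that pattern follows; `wrapper` and `dummy_encoding` are hypothetical types for illustration and do not reproduce the library's implementation.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical placeholder for a wrapped encoding type.
struct dummy_encoding {};

// Hypothetical wrapper showing ref-qualified accessors.
template <typename Encoding>
class wrapper {
public:
    // Chosen for non-const l-values.
    constexpr Encoding& base() & noexcept { return encoding_; }
    // Chosen for const objects.
    constexpr const Encoding& base() const& noexcept { return encoding_; }
    // Chosen for non-const r-values; lets the caller move the encoding out.
    constexpr Encoding&& base() && noexcept { return std::move(encoding_); }

private:
    Encoding encoding_ {};
};

using scheme = wrapper<dummy_encoding>;

static_assert(std::is_same_v<decltype(std::declval<scheme&>().base()), dummy_encoding&>);
static_assert(std::is_same_v<decltype(std::declval<const scheme&>().base()), const dummy_encoding&>);
static_assert(std::is_same_v<decltype(std::declval<scheme>().base()), dummy_encoding&&>);
```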
-
template<typename _Unused = encoding_type, ::std::enable_if_t<is_code_units_replaceable_v<_Unused>>* = nullptr>
inline decltype(auto) constexpr replacement_code_units() const noexcept
Returns the desired replacement code units to use.
- Remark
This is only callable if the function call exists on the wrapped encoding. It is broken down into a contiguous view type formulated from bytes if the wrapped code unit types do not match.
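To make the "broken down into bytes" remark concrete, here is a standalone sketch that splits the UTF-16 code unit for U+FFFD (the Unicode replacement character) into bytes for a chosen byte order. The names `byte_order` and `to_bytes16` are hypothetical, and returning a std::array is a simplification made for this example; the library's actual return type is a view and is not reproduced here.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for an endianness tag; not the library's type.
enum class byte_order { little, big };

// Split one 16-bit code unit into two bytes in the requested order.
constexpr std::array<std::byte, 2> to_bytes16(char16_t unit, byte_order order) {
    const std::byte high { static_cast<unsigned char>((unit >> 8) & 0xFF) };
    const std::byte low { static_cast<unsigned char>(unit & 0xFF) };
    return order == byte_order::big
        ? std::array<std::byte, 2> { high, low }
        : std::array<std::byte, 2> { low, high };
}

// U+FFFD as a single UTF-16 code unit.
constexpr char16_t replacement_unit = u'\uFFFD';

constexpr auto replacement_be = to_bytes16(replacement_unit, byte_order::big);
constexpr auto replacement_le = to_bytes16(replacement_unit, byte_order::little);

static_assert(replacement_be[0] == std::byte{0xFF} && replacement_be[1] == std::byte{0xFD});
static_assert(replacement_le[0] == std::byte{0xFD} && replacement_le[1] == std::byte{0xFF});
```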
-
template<typename _Unused = encoding_type, ::std::enable_if_t<is_code_points_replaceable_v<_Unused>>* = nullptr>
inline decltype(auto) constexpr replacement_code_points() const noexcept
Returns the desired replacement code points to use.
- Remark
This is only callable if the function call exists on the wrapped encoding.
-
template<typename _Unused = encoding_type, ::std::enable_if_t<is_code_units_maybe_replaceable_v<_Unused>>* = nullptr>
inline decltype(auto) constexpr maybe_replacement_code_units() const noexcept
Returns the desired replacement code units to use, or an empty optional-like type if there is nothing present.
- Remark
This is only callable if the function call exists on the wrapped encoding. It is broken down into a contiguous view type formulated from bytes if the wrapped code unit types do not match.
-
template<typename _Unused = encoding_type, ::std::enable_if_t<is_code_points_maybe_replaceable_v<_Unused>>* = nullptr>
inline decltype(auto) constexpr maybe_replacement_code_points() const noexcept
Returns the desired replacement code points to use, or an empty optional-like type if there is nothing present.
- Remark
This is only callable if the function call exists on the wrapped encoding.
-
inline constexpr bool contains_unicode_encoding() const noexcept
Whether or not this encoding is some form of Unicode encoding.
-
template<typename _InputRange, typename _OutputRange, typename _ErrorHandler>
inline constexpr auto decode_one(_InputRange &&__input, _OutputRange &&__output, _ErrorHandler &&__error_handler, decode_state &__s) const
Decodes a single complete unit of information as code points and produces a result with the input and output ranges moved past what was successfully read and written; or, produces an error and returns the input and output ranges untouched.
- Remark
To the best ability of the implementation, the iterators will be returned untouched (e.g., the input models at least a view and a forward_range). If it is not possible, returned ranges may be incremented even if an error occurs due to the semantics of any view that models an input_range.
- Parameters
__input – [in] The input view to read code units from.
__output – [in] The output view to write code points into.
__error_handler – [in] The error handler to invoke if decoding fails.
__s – [inout] The necessary state information. For this encoding, the state is empty and means very little.
- Returns
A ztd::text::decode_result object that contains the reconstructed input range, reconstructed output range, error handler, and a reference to the passed-in state.
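The "moved past what was read, or untouched on error" contract can be illustrated without the library. The sketch below decodes exactly one code point from UTF-16 Big Endian bytes; on success the returned view starts after the consumed bytes, and on failure it is the original view. `byte_view`, `decode_one_result`, `read_be16`, and `decode_one_utf16_be` are hypothetical names for this example, and the sketch omits the output range, error handler, and state that the real function takes.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical minimal byte view (C++17 has no std::span).
struct byte_view {
    const std::byte* data;
    std::size_t size;
};

// Hypothetical result type for this sketch only.
struct decode_one_result {
    byte_view input;     // remaining input; equal to the original view when ok is false
    char32_t code_point; // meaningful only when ok is true
    bool ok;
};

// Read one big endian 16-bit code unit from two bytes.
constexpr std::uint16_t read_be16(const std::byte* p) {
    return static_cast<std::uint16_t>(
        (std::to_integer<unsigned>(p[0]) << 8) | std::to_integer<unsigned>(p[1]));
}

// Decode exactly one code point from UTF-16 Big Endian bytes.
constexpr decode_one_result decode_one_utf16_be(byte_view in) {
    if (in.size < 2) return { in, 0, false };
    const std::uint16_t lead = read_be16(in.data);
    if (lead < 0xD800 || lead > 0xDFFF) {
        // Not a surrogate: one code unit is one code point.
        return { { in.data + 2, in.size - 2 }, lead, true };
    }
    // A lone trail surrogate or a truncated pair is an error.
    if (lead > 0xDBFF || in.size < 4) return { in, 0, false };
    const std::uint16_t trail = read_be16(in.data + 2);
    if (trail < 0xDC00 || trail > 0xDFFF) return { in, 0, false };
    const char32_t cp = 0x10000
        + ((static_cast<char32_t>(lead) - 0xD800) << 10)
        + (static_cast<char32_t>(trail) - 0xDC00);
    return { { in.data + 4, in.size - 4 }, cp, true };
}

// U+1F600 is the surrogate pair D83D DE00 in UTF-16.
constexpr std::byte grinning[] = {
    std::byte{0xD8}, std::byte{0x3D}, std::byte{0xDE}, std::byte{0x00}
};
constexpr auto decoded = decode_one_utf16_be({ grinning, 4 });
static_assert(decoded.ok && decoded.code_point == U'\U0001F600' && decoded.input.size == 0);

// Only half of the pair is available: error, and the input view is returned untouched.
constexpr auto truncated = decode_one_utf16_be({ grinning, 2 });
static_assert(!truncated.ok && truncated.input.data == grinning && truncated.input.size == 2);
```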
-
template<typename _InputRange, typename _OutputRange, typename _ErrorHandler>
inline constexpr auto encode_one(_InputRange &&__input, _OutputRange &&__output, _ErrorHandler &&__error_handler, encode_state &__s) const
Encodes a single complete unit of information as code units and produces a result with the input and output ranges moved past what was successfully read and written; or, produces an error and returns the input and output ranges untouched.
- Remark
To the best ability of the implementation, the iterators will be returned untouched (e.g., the input models at least a view and a forward_range). If it is not possible, returned ranges may be incremented even if an error occurs due to the semantics of any view that models an input_range.
- Parameters
__input – [in] The input view to read code points from.
__output – [in] The output view to write code units into.
__error_handler – [in] The error handler to invoke if encoding fails.
__s – [inout] The necessary state information. For this encoding, the state is empty and means very little.
- Returns
A ztd::text::encode_result object that contains the reconstructed input range, reconstructed output range, error handler, and a reference to the passed-in state.
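For the encode direction, the following standalone sketch turns one code point into UTF-16 Little Endian bytes, producing either two or four byte-sized code units. `encode_one_result`, `put_le16`, and `encode_one_utf16_le` are hypothetical names for this example; the real function writes into a caller-supplied output range and reports failures through the error handler instead of a flag.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical result type for this sketch only.
struct encode_one_result {
    std::array<std::byte, 4> bytes; // only the first `count` entries are meaningful
    std::size_t count;
    bool ok;
};

// Write one 16-bit code unit as two bytes, low byte first.
constexpr void put_le16(std::array<std::byte, 4>& out, std::size_t at, std::uint16_t unit) {
    out[at] = std::byte{ static_cast<unsigned char>(unit & 0xFF) };
    out[at + 1] = std::byte{ static_cast<unsigned char>(unit >> 8) };
}

// Encode exactly one code point as UTF-16 Little Endian bytes.
constexpr encode_one_result encode_one_utf16_le(char32_t cp) {
    encode_one_result r { {}, 0, false };
    // Surrogates and values above U+10FFFF are not Unicode scalar values.
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return r;
    if (cp < 0x10000) {
        put_le16(r.bytes, 0, static_cast<std::uint16_t>(cp));
        r.count = 2;
    } else {
        // Split into a lead and trail surrogate.
        const char32_t v = cp - 0x10000;
        put_le16(r.bytes, 0, static_cast<std::uint16_t>(0xD800 + (v >> 10)));
        put_le16(r.bytes, 2, static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF)));
        r.count = 4;
    }
    r.ok = true;
    return r;
}

// U+1F600 becomes the surrogate pair D83D DE00, each unit stored low byte first.
constexpr auto encoded = encode_one_utf16_le(U'\U0001F600');
static_assert(encoded.ok && encoded.count == 4);
static_assert(encoded.bytes[0] == std::byte{0x3D} && encoded.bytes[1] == std::byte{0xD8});
static_assert(encoded.bytes[2] == std::byte{0x00} && encoded.bytes[3] == std::byte{0xDE});
```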
Public Static Attributes
-
static constexpr const ::std::size_t max_code_points = max_code_points_v<_UBaseEncoding>
The maximum number of code points a single complete operation of decoding can produce. This is 1 for all Unicode Transformation Format (UTF) encodings.
-
static constexpr const ::std::size_t max_code_units = (max_code_units_v<_UBaseEncoding> * sizeof(_BaseCodeUnit)) / (sizeof(_Byte))
The maximum code units a single complete operation of encoding can produce.
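A worked instance of the max_code_units formula may help. The sketch below re-expresses it as a standalone constexpr function; `scheme_max_code_units` is a hypothetical name, and the inputs assume that the wrapped UTF-16 encoding reports at most 2 code units per operation, that the wrapped UTF-32 encoding reports at most 1, and that sizeof(char16_t) == 2 and sizeof(char32_t) == 4, as on typical platforms.

```cpp
#include <cstddef>

// Same arithmetic as the formula above, with the three inputs passed explicitly.
constexpr std::size_t scheme_max_code_units(
    std::size_t base_max_code_units, // maximum code units of the wrapped encoding
    std::size_t base_code_unit_size, // sizeof the wrapped encoding's code unit
    std::size_t byte_size) {         // sizeof the byte type
    return (base_max_code_units * base_code_unit_size) / byte_size;
}

// UTF-16: up to 2 code units of 2 bytes each gives 4 byte-sized code units.
static_assert(scheme_max_code_units(2, sizeof(char16_t), sizeof(std::byte)) == 4);
// UTF-32: 1 code unit of 4 bytes also gives 4 byte-sized code units.
static_assert(scheme_max_code_units(1, sizeof(char32_t), sizeof(std::byte)) == 4);
```

Under these assumptions, both byte-based schemes need room for four bytes per encoded code point.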