any_encoding

any_encoding is a class type whose sole purpose is to provide a type-generic, byte-based, runtime-deferred way of handling encodings.

Aliases

using ztd::text::any_encoding = any_byte_encoding<::std::byte>

The canonical erased encoding type, which uses std::byte as its code unit type and unicode_code_point as its code point type, with spans for input and output operations.

Remark

If the input encoding's code unit type is not std::byte, the encoding is first wrapped in a ztd::text::encoding_scheme.
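The wrapping step above amounts to serializing wider code units into bytes in a chosen byte order. The following is a minimal standalone sketch of that idea for 16-bit code units, not the ztd::text::encoding_scheme implementation; the names byte_order_model and to_bytes_model are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an endianness selector (not a ztd.text type).
enum class byte_order_model { little, big };

// Serializes 16-bit code units into bytes, the way a byte-oriented wrapper
// would have to before an erased byte-based interface can consume them.
inline std::vector<std::byte> to_bytes_model(const std::vector<char16_t>& units,
                                             byte_order_model order) {
    std::vector<std::byte> bytes;
    bytes.reserve(units.size() * 2);
    for (char16_t unit : units) {
        const auto high = static_cast<std::byte>((unit >> 8) & 0xFF);
        const auto low = static_cast<std::byte>(unit & 0xFF);
        if (order == byte_order_model::big) {
            bytes.push_back(high);
            bytes.push_back(low);
        } else {
            bytes.push_back(low);
            bytes.push_back(high);
        }
    }
    return bytes;
}
```

Under these assumptions, the code units 0x0041 and 0x20AC become the four bytes 00 41 20 AC in big-endian order and 41 00 AC 20 in little-endian order.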

using ztd::text::compat_any_encoding = any_byte_encoding<char>

The canonical erased encoding type, which uses char as its code unit type and unicode_code_point as its code point type, with spans for input and output operations.

Remark

If the input encoding's code unit type is not char, the encoding is first wrapped in a ztd::text::encoding_scheme. Use this type when dealing with what are effectively byte stream inputs oriented in a legacy manner, such as old std::string or <iostream>-based work.

using ztd::text::ucompat_any_encoding = any_byte_encoding<unsigned char>

The canonical erased encoding type, which uses unsigned char as its code unit type and unicode_code_point as its code point type, with spans for input and output operations.

Remark

If the input encoding's code unit type is not unsigned char, the encoding is first wrapped in a ztd::text::encoding_scheme. Use this type when dealing with what are effectively byte stream inputs oriented around a slightly more modern approach to proper unsigned data handling with unsigned char.

Base Template

template<typename _Byte, typename _CodePoint = unicode_code_point>
class any_byte_encoding : public ztd::text::any_encoding_with<_Byte, const unicode_code_point, const _Byte, unicode_code_point>

An encoding type that wraps other encodings so that they traffic specifically in the provided _Byte type, which is typically std::byte.

Remark

This type traffics solely in std::spans, which for most people is fine. Others may want to interface with different iterator types (e.g., from a custom Rope implementation or other). For those, one must first create ranges that can operate with those iterators, then use those ranges directly. (It’s not an ideal process at the moment, and we are looking to make this experience better.) It is recommended to use the provided ztd::text::any_encoding type definition instead of accessing this directly, unless you have a reason for using a different byte type (e.g., interfacing with legacy APIs).

Template Parameters:

_Byte – The byte type to use. Typically, this is either unsigned char or std::byte.
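To illustrate what "type-erased, byte-based, runtime-deferred" means in practice, here is a minimal standalone sketch of the general technique: concrete encoders are hidden behind a virtual interface that only speaks bytes, so the concrete encoding can be picked at run time. This is not how ztd.text implements any_byte_encoding; latin1_model, utf8_model, erased_byte_encoder, and pick_encoder are hypothetical names, and a std::vector stands in for the spans the real type uses.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical concrete encoders (stand-ins, not ztd.text types).
struct latin1_model {
    // Writes one byte; fails for code points outside Latin-1.
    bool encode_one(char32_t cp, std::vector<std::byte>& out) const {
        if (cp > 0xFF) return false;
        out.push_back(static_cast<std::byte>(cp & 0xFF));
        return true;
    }
};

struct utf8_model {
    // Writes 1 to 4 bytes; fails for surrogates and out-of-range values.
    bool encode_one(char32_t cp, std::vector<std::byte>& out) const {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
        auto put = [&out](char32_t v) { out.push_back(static_cast<std::byte>(v & 0xFF)); };
        if (cp < 0x80) {
            put(cp);
        } else if (cp < 0x800) {
            put(0xC0 | (cp >> 6));
            put(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            put(0xE0 | (cp >> 12));
            put(0x80 | ((cp >> 6) & 0x3F));
            put(0x80 | (cp & 0x3F));
        } else {
            put(0xF0 | (cp >> 18));
            put(0x80 | ((cp >> 12) & 0x3F));
            put(0x80 | ((cp >> 6) & 0x3F));
            put(0x80 | (cp & 0x3F));
        }
        return true;
    }
};

// The erased wrapper: callers see only bytes, never the concrete type.
class erased_byte_encoder {
    struct interface {
        virtual ~interface() = default;
        virtual bool encode_one(char32_t, std::vector<std::byte>&) const = 0;
    };
    template <typename Encoding>
    struct holder final : interface {
        Encoding encoding;
        explicit holder(Encoding e) : encoding(std::move(e)) {}
        bool encode_one(char32_t cp, std::vector<std::byte>& out) const override {
            return encoding.encode_one(cp, out);
        }
    };
    std::unique_ptr<interface> stored_;

public:
    template <typename Encoding>
    explicit erased_byte_encoder(Encoding e)
        : stored_(std::make_unique<holder<Encoding>>(std::move(e))) {}

    bool encode_one(char32_t cp, std::vector<std::byte>& out) const {
        return stored_->encode_one(cp, out);
    }
};

// The encoding is chosen from a run-time value, e.g. a configuration string.
inline erased_byte_encoder pick_encoder(const std::string& name) {
    if (name == "latin1") return erased_byte_encoder(latin1_model{});
    return erased_byte_encoder(utf8_model{});
}
```

With this sketch, pick_encoder("utf8") encodes U+00E9 as the two bytes C3 A9, while pick_encoder("latin1") encodes it as the single byte E9 and rejects U+20AC. The cost of the indirection is one virtual call per operation, which is the usual trade-off for deferring the choice to run time.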

Public Types

using decode_state = any_decode_state

The state that can be used between calls to decode.

Remark

This is an opaque struct with no members. It follows the “encoding-dependent state” model, which means it has a constructor that takes a ztd::text::any_encoding_with so it can properly initialize its state.

using encode_state = any_encode_state

The state that can be used between calls to encode.

Remark

This is an opaque struct with no members. It follows the “encoding-dependent state” model, which means it has a constructor that takes a ztd::text::any_encoding_with so it can properly initialize its state.
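The “encoding-dependent state” model can be sketched as follows: the state object is constructed from the encoding, so the encoding decides how a conversion run starts. This is a simplified standalone model with hypothetical names (shifting_encoding_model, decode_all_model); unlike the opaque library state, the model's state has a visible member so the effect can be observed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stateful encoding: byte 0x0E switches to a "shifted" set,
// byte 0x0F switches back. Not a ztd.text type.
struct shifting_encoding_model {
    bool starts_shifted;

    struct decode_state {
        bool shifted;
        // Encoding-dependent: the state asks the encoding for its start mode.
        explicit decode_state(const shifting_encoding_model& encoding)
            : shifted(encoding.starts_shifted) {}
    };

    // Returns true when a code point was produced, false for control bytes.
    bool decode_one(unsigned char unit, char32_t& out, decode_state& state) const {
        if (unit == 0x0E) { state.shifted = true; return false; }
        if (unit == 0x0F) { state.shifted = false; return false; }
        out = static_cast<char32_t>(unit) + (state.shifted ? 0x80u : 0u);
        return true;
    }
};

// One state object is created from the encoding and reused across calls.
inline std::u32string decode_all_model(const shifting_encoding_model& encoding,
                                       const std::vector<unsigned char>& input) {
    shifting_encoding_model::decode_state state(encoding);
    std::u32string result;
    for (unsigned char unit : input) {
        char32_t code_point = 0;
        if (encoding.decode_one(unit, code_point, state)) result.push_back(code_point);
    }
    return result;
}
```

Because the state carries the shift mode between calls, the same input byte 0x41 decodes to 0x41 before the shift byte and to 0xC1 after it.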

using code_unit = ranges::range_value_type_t<_EncodeCodeUnits>

The individual units that result from an encode operation or are used as input to a decode operation.

using code_point = ranges::range_value_type_t<_DecodeCodePoints>

The individual units that result from a decode operation or are used as input to an encode operation.

using is_encode_injective = ::std::false_type

Whether or not the encode operation can process all forms of input into code unit values.

Remark

This is always going to be false because this is a type-erased encoding; this value is determined by a runtime decision, which means that the most conservative and truthful answer is selected for this property.

using is_decode_injective = ::std::false_type

Whether or not the decode operation can process all forms of input into code point values.

Remark

This is always going to be false because this is a type-erased encoding; this value is determined by a runtime decision, which means that the most conservative and truthful answer is selected for this property.

Public Functions

any_byte_encoding() = delete

Cannot default-construct a ztd::text::any_byte_encoding object.

template<typename _EncodingArg, typename ..._Args, ::std::enable_if_t<!::std::is_same_v<remove_cvref_t<_EncodingArg>, any_byte_encoding> && !::std::is_same_v<__txt_detail::__code_unit_or_void_t<remove_cvref_t<_EncodingArg>>, _Byte> && !is_specialization_of_v<remove_cvref_t<_EncodingArg>, ::ztd::text::any_byte_encoding> && !::std::is_same_v<remove_cvref_t<_EncodingArg>, __base_t> && !is_specialization_of_v<remove_cvref_t<_EncodingArg>, ::std::in_place_type_t>>* = nullptr>
inline any_byte_encoding(_EncodingArg &&__encoding, _Args&&... __args)

Constructs a ztd::text::any_byte_encoding with the encoding object and any additional arguments.

Remark

If the provided encoding does not have a byte code_unit type, it is wrapped in a ztd::text::encoding_scheme first.

Parameters:
  • __encoding – [in] The encoding object that informs the ztd::text::any_byte_encoding what encoding object to store.

  • __args – [in] Any additional arguments used to construct the encoding in the erased storage.

template<typename _EncodingArg, typename ..._Args, ::std::enable_if_t<!::std::is_same_v<_Byte, code_unit_t<remove_cvref_t<_EncodingArg>>>>* = nullptr>
inline any_byte_encoding(::std::in_place_type_t<_EncodingArg>, _Args&&... __args)

Constructs a ztd::text::any_byte_encoding with the encoding object and any additional arguments.

Remark

If the provided encoding does not have a byte code_unit type, it is wrapped in a ztd::text::encoding_scheme first.

Template Parameters:

_EncodingArg – The Encoding specified by the std::in_place_type<...> argument.

Parameters:

__args – [in] Any additional arguments used to construct the encoding in the erased storage.

template<typename _EncodingArg, typename ..._Args, ::std::enable_if_t<::std::is_same_v<_Byte, code_unit_t<remove_cvref_t<_EncodingArg>>>>* = nullptr>
inline any_byte_encoding(::std::in_place_type_t<_EncodingArg> __tag, _Args&&... __args)

Constructs a ztd::text::any_byte_encoding with the encoding object and any additional arguments.

Remark

If the provided encoding does not have a byte code_unit type, it is wrapped in a ztd::text::encoding_scheme first.

Template Parameters:

_EncodingArg – The Encoding specified by the std::in_place_type<...> argument.

Parameters:
  • __tag – [in] A tag containing the encoding type.

  • __args – [in] Any additional arguments used to construct the encoding in the erased storage.
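The std::in_place_type_t constructors follow a common standard-library pattern: the tag names the type to build, and the remaining arguments are forwarded to that type's constructor inside the erased storage. Below is a minimal standalone sketch of the pattern; erased_holder and named_encoding_model are hypothetical names, and the sketch omits the byte-type checks that the real constructors perform.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical encoding with a non-trivial constructor (not a ztd.text type).
struct named_encoding_model {
    std::string name;
    int variant;
    named_encoding_model(std::string n, int v) : name(std::move(n)), variant(v) {}
};

class erased_holder {
    struct interface {
        virtual ~interface() = default;
        virtual std::string describe() const = 0;
    };
    template <typename Encoding>
    struct holder final : interface {
        Encoding value;
        // Builds the encoding in place from the forwarded arguments.
        template <typename... Args>
        explicit holder(Args&&... args) : value(std::forward<Args>(args)...) {}
        std::string describe() const override {
            return value.name + "#" + std::to_string(value.variant);
        }
    };
    std::unique_ptr<interface> stored_;

public:
    // The tag carries only the type; no encoding object is copied or moved.
    template <typename Encoding, typename... Args>
    explicit erased_holder(std::in_place_type_t<Encoding>, Args&&... args)
        : stored_(std::make_unique<holder<Encoding>>(std::forward<Args>(args)...)) {}

    std::string describe() const { return stored_->describe(); }
};
```

A call such as erased_holder h(std::in_place_type<named_encoding_model>, "legacy", 2); constructs the stored object directly from "legacy" and 2, which is useful when the encoding is expensive to move or cannot be moved at all.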

any_byte_encoding(const any_byte_encoding&) = delete

Cannot copy-construct a ztd::text::any_byte_encoding object.

any_byte_encoding &operator=(const any_byte_encoding&) = delete

Cannot copy-assign a ztd::text::any_byte_encoding object.

any_byte_encoding(any_byte_encoding&&) = default

Move-constructs a ztd::text::any_byte_encoding from the provided r-value reference.

Remark

This leaves the passed-in r-value reference without an encoding object. Calling any function on a moved-from ztd::text::any_byte_encoding, except for destruction, is a violation and invokes Undefined Behavior (generally, a crash).

any_byte_encoding &operator=(any_byte_encoding&&) = default

Move-assigns a ztd::text::any_byte_encoding from the provided r-value reference.

Remark

This leaves the passed-in r-value reference without an encoding object. Calling any function on a moved-from ztd::text::any_byte_encoding, except for destruction, is a violation and invokes Undefined Behavior (generally, a crash).
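The special member functions above describe a move-only type: no default construction, no copying, and a moved-from object that no longer owns anything. The following standalone sketch models that shape with a std::unique_ptr; move_only_holder and its has_value member are hypothetical and exist only so the moved-from state can be observed here (the documented type offers no such query).

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical move-only wrapper (not a ztd.text type).
class move_only_holder {
    std::unique_ptr<int> stored_;

public:
    move_only_holder() = delete;
    explicit move_only_holder(int value) : stored_(std::make_unique<int>(value)) {}

    move_only_holder(const move_only_holder&) = delete;
    move_only_holder& operator=(const move_only_holder&) = delete;
    move_only_holder(move_only_holder&&) = default;
    move_only_holder& operator=(move_only_holder&&) = default;

    // Illustration-only query; false after the object has been moved from.
    bool has_value() const noexcept { return stored_ != nullptr; }

    // Dereferences the stored object; undefined behavior if moved-from.
    int get() const { return *stored_; }
};

// The same properties, checked at compile time.
static_assert(!std::is_default_constructible_v<move_only_holder>);
static_assert(!std::is_copy_constructible_v<move_only_holder>);
static_assert(!std::is_copy_assignable_v<move_only_holder>);
static_assert(std::is_move_constructible_v<move_only_holder>);
static_assert(std::is_move_assignable_v<move_only_holder>);
```

After move_only_holder b(std::move(a));, only b may be used; in this model, calling a.get() would dereference a null pointer, which mirrors the undefined behavior described above.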

inline ::std::optional<::ztd::span<const code_point>> maybe_replacement_code_points() const noexcept

Retrieves the replacement code points for when conversions fail and ztd::text::replacement_handler_t (or equivalent) needs to make a substitution.

Returns:

A std::optional of ztd::span of const code_points. The returned std::optional value is engaged (has a value) if the stored encoding has a valid replacement_code_points function and it can be called. If it does not, then the library checks to see if the maybe_replacement_code_points function exists, and returns the std::optional from that function directly. If neither is present, an unengaged std::optional is returned.

inline ::std::optional<::ztd::span<const code_unit>> maybe_replacement_code_units() const noexcept

Retrieves the replacement code units for when conversions fail and ztd::text::replacement_handler_t (or equivalent) needs to make a substitution.

Returns:

A std::optional of ztd::span of const code_units. The returned std::optional value is engaged (has a value) if the stored encoding has a valid replacement_code_units function and it can be called. If it does not, then the library checks to see if the maybe_replacement_code_units function exists, and returns the std::optional from that function directly. If neither is present, an unengaged std::optional is returned.
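The three-way fallback described for both functions (use replacement_*, else forward maybe_replacement_*, else return an unengaged optional) can be sketched with the C++17 detection idiom. This is a standalone illustration for the code point case only, not the library's implementation; has_replacement, has_maybe_replacement, maybe_replacement_of, and the three model types are hypothetical, and std::vector stands in for ztd::span.

```cpp
#include <cassert>
#include <optional>
#include <type_traits>
#include <utility>
#include <vector>

// Detects a callable replacement_code_points() member.
template <typename E, typename = void>
struct has_replacement : std::false_type {};
template <typename E>
struct has_replacement<E,
    std::void_t<decltype(std::declval<const E&>().replacement_code_points())>>
    : std::true_type {};

// Detects a callable maybe_replacement_code_points() member.
template <typename E, typename = void>
struct has_maybe_replacement : std::false_type {};
template <typename E>
struct has_maybe_replacement<E,
    std::void_t<decltype(std::declval<const E&>().maybe_replacement_code_points())>>
    : std::true_type {};

// Applies the fallback order: definite, then optional, then nothing.
template <typename E>
std::optional<std::vector<char32_t>> maybe_replacement_of(const E& encoding) {
    (void)encoding;  // unused when neither member exists
    if constexpr (has_replacement<E>::value) {
        return encoding.replacement_code_points();
    } else if constexpr (has_maybe_replacement<E>::value) {
        return encoding.maybe_replacement_code_points();
    } else {
        return std::nullopt;
    }
}

// Hypothetical encodings covering the three cases.
struct always_model {
    std::vector<char32_t> replacement_code_points() const { return {U'\uFFFD'}; }
};
struct sometimes_model {
    bool available;
    std::optional<std::vector<char32_t>> maybe_replacement_code_points() const {
        if (available) return std::vector<char32_t>{U'?'};
        return std::nullopt;
    }
};
struct never_model {};
```

In this model, always_model yields an engaged optional holding U+FFFD, sometimes_model yields whatever its own optional says, and never_model always yields an unengaged optional.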

inline bool contains_unicode_encoding() const noexcept

Returns whether or not the encoding stored in this ztd::text::any_encoding_with is a Unicode encoding.

Remark

This can be useful to know, in advance, whether or not there is a chance for lossy behavior. Even if, at compile time, various functions will demand you use an error handler, this runtime property can help you get a decent idea of just how bad and lossy this conversion might be compared to normal UTF conversion formats.

inline __decode_result decode_one(_DecodeCodeUnits __input, _DecodeCodePoints __output, __decode_error_handler __error_handler, decode_state &__state) const

Decodes a single complete unit of information as code points and produces a result with the input and output ranges moved past what was successfully read and written; or, produces an error and returns the input and output ranges untouched.

Remark

To the best ability of the implementation, the iterators will be returned untouched (e.g., the input models at least a view and a forward_range). If it is not possible, returned ranges may be incremented even if an error occurs due to the semantics of any view that models an input_range.

Parameters:
  • __input – [in] The input view to read code units from.

  • __output – [in] The output view to write code points into.

  • __error_handler – [in] The error handler to invoke if decoding fails.

  • __state – [inout] The necessary state information. For this encoding, the state is empty and means very little.

Returns:

A ztd::text::decode_result object that contains the input range, output range, error handler, and a reference to the passed-in state.

inline __encode_result encode_one(_EncodeCodePoints __input, _EncodeCodeUnits __output, __encode_error_handler __error_handler, encode_state &__state) const

Encodes a single complete unit of information as code units and produces a result with the input and output ranges moved past what was successfully read and written; or, produces an error and returns the input and output ranges untouched.

Remark

To the best ability of the implementation, the iterators will be returned untouched (e.g., the input models at least a view and a forward_range). If it is not possible, returned ranges may be incremented even if an error occurs due to the semantics of any view that models an input_range.

Parameters:
  • __input – [in] The input view to read code points from.

  • __output – [in] The output view to write code units into.

  • __error_handler – [in] The error handler to invoke if encoding fails.

  • __state – [inout] The necessary state information. For this encoding, the state is empty and means very little.

Returns:

A ztd::text::encode_result object that contains the input range, output range, error handler, and a reference to the passed-in state.
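The contract shared by decode_one and encode_one (advance both ranges on success, return them untouched on error) is what lets callers drive a conversion one step at a time. Here is a minimal standalone sketch of that loop for an ASCII-only decoder; view_model, error_model, decode_result_model, decode_one_model, and decode_all_model are hypothetical names, a pointer-plus-size struct stands in for a span, and no error handler or state is modeled.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal view: a pointer plus a length.
template <typename T>
struct view_model {
    T* data;
    std::size_t size;
};

enum class error_model { ok, invalid_sequence, insufficient_output_space };

// Mirrors the shape of a result: remaining input, remaining output, error.
struct decode_result_model {
    view_model<const unsigned char> input;
    view_model<char32_t> output;
    error_model error;
};

// Decodes exactly one ASCII code unit, or reports why it could not.
inline decode_result_model decode_one_model(view_model<const unsigned char> input,
                                            view_model<char32_t> output) {
    if (input.size == 0) return {input, output, error_model::ok};
    if (output.size == 0) return {input, output, error_model::insufficient_output_space};
    if (input.data[0] > 0x7F) return {input, output, error_model::invalid_sequence};
    output.data[0] = input.data[0];
    // Success: both views move past what was read and written.
    return {{input.data + 1, input.size - 1},
            {output.data + 1, output.size - 1},
            error_model::ok};
}

// Repeats single steps until the input is empty or an error occurs.
// Returns the number of code points written.
inline std::size_t decode_all_model(view_model<const unsigned char> input,
                                    view_model<char32_t> output,
                                    error_model& error) {
    const std::size_t capacity = output.size;
    error = error_model::ok;
    while (input.size != 0) {
        const decode_result_model result = decode_one_model(input, output);
        if (result.error != error_model::ok) {
            error = result.error;  // views in result are untouched on error
            break;
        }
        input = result.input;
        output = result.output;
    }
    return capacity - output.size;
}
```

Because a failed step leaves the views where they were, the caller knows exactly which code unit caused the failure and can decide whether to stop, skip, or substitute a replacement.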

Public Static Attributes

static constexpr ::std::size_t max_code_points = _MaxCodePoints

The maximum number of code points a single complete operation of decoding can produce. This is 1 for all Unicode Transformation Format (UTF) encodings.

static constexpr ::std::size_t max_code_units = _MaxCodeUnits

The maximum code units a single complete operation of encoding can produce.

static constexpr ::ztd::text_encoding_id decoded_id = ::ztd::text_encoding_id::unknown

The decoded id. Because this is a type-erased encoding, anything can come out: therefore, it is set to “unknown” at all times.

static constexpr ::ztd::text_encoding_id encoded_id = ::ztd::text_encoding_id::unknown

The encoded id. Because this is a type-erased encoding, anything can come out: therefore, it is set to “unknown” at all times.