Decode
Decoding is the action of converting from one sequence of encoded information to a sequence of decoded information. The formula given for Decoding is effectively just the first half of the diagram shown in the main Lucky 7 documentation, reproduced here with emphasis added:

The generic pathway between 2 encodings, but modified to show the exact difference between the encoding step and the decoding step.
In particular, we are interested in the operation that helps us go from the encoded input to the decoded output, which is the top half of the diagram. The output we are interested in is labeled as an “intermediate”, because that is often what it is. But, there are many uses for working directly with the decoded data. Many Unicode algorithms are specified to work over unicode code points or unicode scalar values. In order to identify Word Breaks, classify Uppercase vs. Lowercase, perform Casefolding, Regex over certain properties properly, Normalize text for search + other operations, and many more things, one needs to be working with code points as the basic unit of operation.
Thusly, we use the algorithm as below to do the work. Given an input
of code_unit
s with an encoding
, a target output
, and any necessary additional state
, we can generically bulk convert the input sequence to a form of code_point
s in the output
:
⏩ Is the
input
value empty? If so, is thestate
finished and have nothing to output? If both are true, return the current results with the the emptyinput
,output
, andstate
, everything is okay ✅!⏩ Otherwise,
Do the
decode_one
step frominput
(using itsbegin()
andend()
) into theoutput
code_point
storage location.🛑 If it failed, return with the current
input
(unmodified from before this iteration, if possible),output
, andstate
s.
⏩ Update
input
‘sbegin()
value to point to after what was read by thedecode_one
step.⤴️ Go back to the start.
This involves a single encoding type, and so does not need any cooperation to go from the code_unit
s to the code_point
s. Notably, the encoding’s code_point
type will hopefully be some sort of unicode code point type (see: ztd::text::is_code_point for a more code-based classification). Though, it does not have to be for many different (and very valid) reasons.
Check out the API documentation for ztd::text::decode to learn more.