Converting, Counting, and Validating Text
Conversions are one of the more important aspects of dealing with textual data. To support this, ztd.text provides 7 different methods, each with various overloads and inner groupings of functions, to aid in encoding, decoding, transcoding, validating, and counting code points and code units.
As shown in the Lucky 7 Design, every one of these operations works given only the one or two required encoding objects, each providing the designated functions, variables, and type definitions. The core of the explanation is this algorithm:
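For illustration only, here is a toy 7-bit ASCII encoding laid out the way the Lucky 7 Design describes an encoding object. It assumes the seven members are three type definitions (code_unit, code_point, state), two size constants (max_code_units, max_code_points), and two one-step functions (decode_one, encode_one). The names toy_ascii and step_result, the string_view-based signatures, and the local encoding_error enum are hypothetical simplifications. In ztd.text the one-step functions also take an error handler, operate on input and output ranges, and return richer result objects.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Local stand-in for ztd::text::encoding_error (an assumption, not the real type).
enum class encoding_error { ok, invalid_sequence, incomplete_sequence };

// Hypothetical result of one step; much simpler than ztd.text's result objects.
struct step_result {
    encoding_error error_code;
    std::size_t input_read;      // how much input the step consumed
    std::size_t output_written;  // how much output the step produced
};

// A toy 7-bit ASCII encoding: 3 types + 2 variables + 2 functions.
struct toy_ascii {
    // 1-3: type definitions
    using code_unit = char;
    using code_point = char32_t;
    struct state {};  // ASCII carries no state between steps

    // 4-5: upper bounds on what a single step may write
    static constexpr std::size_t max_code_units = 1;
    static constexpr std::size_t max_code_points = 1;

    // 6: read the code units for exactly one code point
    static step_result decode_one(std::string_view in, code_point* out, state&) {
        if (in.empty()) return {encoding_error::incomplete_sequence, 0, 0};
        auto u = static_cast<unsigned char>(in[0]);
        if (u > 0x7F) return {encoding_error::invalid_sequence, 0, 0};
        out[0] = u;
        return {encoding_error::ok, 1, 1};
    }

    // 7: write the code units for exactly one code point
    static step_result encode_one(std::u32string_view in, code_unit* out, state&) {
        if (in.empty()) return {encoding_error::incomplete_sequence, 0, 0};
        if (in[0] > 0x7F) return {encoding_error::invalid_sequence, 0, 0};
        out[0] = static_cast<code_unit>(in[0]);
        return {encoding_error::ok, 1, 1};
    }
};
```

Callers size their buffers from max_code_points and max_code_units, so no single step can overflow them.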
1. ⏩ Is the input value empty? If so, is the state finished, with nothing left to output? If both are true, return the current results; everything is okay ✅.
2. ⏩ Otherwise:
   1. Set up an intermediate buffer of code_points, sized using the input encoding’s max_code_points, for the next operation.
   2. Do the decode_one step from input (using its begin() and end()) into the intermediate code_point buffer.
      🛑 If it failed, return with the current input (unmodified from before this iteration, if possible), output, and state.
   3. Do the encode_one step from the intermediate buffer into the output.
      🛑 If it failed, return with the current input (unmodified from before this iteration, if possible), output, and state.
   4. ⏩ Update input’s begin() value to point past what was read by the decode_one step.
   5. ⤴️ Go back to the start.
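The loop above can be sketched in C++ under heavy simplifications. The names toy_ascii, toy_utf32, step_result, transcode_result, and transcode are hypothetical, as are the one-step signatures, and encoding_error is a local stand-in rather than ztd::text::encoding_error. Both toy encodings are stateless, so the "state is finished" check reduces to the input being empty. Encoded units go through a small scratch buffer before being appended to the output.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Local stand-in for ztd::text::encoding_error (an assumption, not the real type).
enum class encoding_error { ok, invalid_sequence, incomplete_sequence };

// Hypothetical result of one decode_one/encode_one step.
struct step_result {
    encoding_error error_code;
    std::size_t input_read;      // units/points consumed from the input
    std::size_t output_written;  // points/units written to the output buffer
};

// Toy source encoding: 7-bit ASCII, stateless, one unit per code point.
struct toy_ascii {
    using code_unit = char;
    using code_point = char32_t;
    struct state {};
    static constexpr std::size_t max_code_units = 1;
    static constexpr std::size_t max_code_points = 1;

    static step_result decode_one(std::string_view in, code_point* out, state&) {
        if (in.empty()) return {encoding_error::incomplete_sequence, 0, 0};
        auto u = static_cast<unsigned char>(in[0]);
        if (u > 0x7F) return {encoding_error::invalid_sequence, 0, 0};
        out[0] = u;
        return {encoding_error::ok, 1, 1};
    }
};

// Toy target encoding: UTF-32, stateless, one unit per code point.
struct toy_utf32 {
    using code_unit = char32_t;
    using code_point = char32_t;
    struct state {};
    static constexpr std::size_t max_code_units = 1;
    static constexpr std::size_t max_code_points = 1;

    static step_result encode_one(std::u32string_view in, code_unit* out, state&) {
        if (in.empty()) return {encoding_error::incomplete_sequence, 0, 0};
        char32_t cp = in[0];
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return {encoding_error::invalid_sequence, 0, 0};
        out[0] = cp;
        return {encoding_error::ok, 1, 1};
    }
};

template <typename From, typename To>
struct transcode_result {
    std::basic_string_view<typename From::code_unit> input;  // what was not consumed
    std::basic_string<typename To::code_unit> output;
    encoding_error error_code;
};

// The core loop: check for completion, decode one, encode one, advance, repeat.
template <typename From, typename To>
transcode_result<From, To> transcode(std::basic_string_view<typename From::code_unit> input) {
    typename From::state decode_state{};
    typename To::state encode_state{};
    std::basic_string<typename To::code_unit> output;
    for (;;) {
        // Stateless toys are always "finished", so empty input means success.
        if (input.empty()) return {input, output, encoding_error::ok};
        // Intermediate buffer sized by the input encoding's max_code_points.
        typename From::code_point intermediate[From::max_code_points];
        step_result d = From::decode_one(input, intermediate, decode_state);
        // On failure, input has not been advanced during this iteration.
        if (d.error_code != encoding_error::ok) return {input, output, d.error_code};
        assert(d.output_written <= From::max_code_points);
        // Encode into a scratch buffer sized by max_code_units, then append.
        typename To::code_unit units[To::max_code_units];
        step_result e = To::encode_one({intermediate, d.output_written}, units, encode_state);
        if (e.error_code != encoding_error::ok) return {input, output, e.error_code};
        output.append(units, e.output_written);
        // Move input's begin() past what decode_one read.
        input.remove_prefix(d.input_read);
    }
}
```

For example, transcode<toy_ascii, toy_utf32>("Hi!") yields U"Hi!" with an empty remaining input. On failure, the returned input still starts at the code unit that could not be decoded, which matches the "unmodified from before this iteration" rule.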
That’s it for the core loop. Failure is determined exclusively by whether the error_code in the result object returned from the decode or encode step is ztd::text::encoding_error::ok. If it is, the loop continues until the input is exhausted; otherwise, it stops. This loop forms the basis of the library and serves as our version of “Elements of Programming”, but for working with text.
This algorithm supports all of the following operations:
- transcoding: the loop above, as-is.
- encoding: take an input of code_points and simply skip the decoding step.
- decoding: take an input of code_units and simply skip the encoding step.
- validating code units: run the transcoding loop (decode, then encode back) into 2 intermediate buffers, and compare the final intermediate output to the input.
- validating code points: run the transcoding loop in the reverse direction for an input of code_points (encode first, then decode) into 2 intermediate buffers, and compare the final intermediate output to the input.
- counting code units: perform the “encoding” operation into an intermediate buffer and repeatedly count the number of buffered writes, discarding or ignoring the actual contents of the buffer each time.
- counting code points: perform the “decoding” operation into an intermediate buffer and repeatedly count the number of buffered writes, discarding or ignoring the actual contents of the buffer each time.
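The counting and validating variants reuse the same one-step functions and only change what happens to the intermediate buffers. This sketch uses a hypothetical, stateless toy_utf16 encoding. UTF-16 was picked because a single code point can need two code units, so the two counts differ. count_code_units, count_code_points, and validate_code_units are illustrative names, not ztd.text functions, and encoding_error is again a local stand-in.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Local stand-in for ztd::text::encoding_error (an assumption, not the real type).
enum class encoding_error { ok, invalid_sequence, incomplete_sequence };
struct step_result { encoding_error error_code; std::size_t input_read, output_written; };
struct count_result { std::size_t count; encoding_error error_code; };

// Hypothetical stateless UTF-16 encoding with one-step decode/encode.
struct toy_utf16 {
    using code_unit = char16_t;
    using code_point = char32_t;
    struct state {};
    static constexpr std::size_t max_code_units = 2;  // a surrogate pair
    static constexpr std::size_t max_code_points = 1;

    static step_result decode_one(std::u16string_view in, code_point* out, state&) {
        if (in.empty()) return {encoding_error::incomplete_sequence, 0, 0};
        char32_t lead = in[0];
        if (lead < 0xD800 || lead > 0xDFFF) { out[0] = lead; return {encoding_error::ok, 1, 1}; }
        if (lead > 0xDBFF) return {encoding_error::invalid_sequence, 0, 0};  // stray trail surrogate
        if (in.size() < 2) return {encoding_error::incomplete_sequence, 0, 0};
        char32_t trail = in[1];
        if (trail < 0xDC00 || trail > 0xDFFF) return {encoding_error::invalid_sequence, 0, 0};
        out[0] = 0x10000 + ((lead - 0xD800) << 10) + (trail - 0xDC00);
        return {encoding_error::ok, 2, 1};
    }

    static step_result encode_one(std::u32string_view in, code_unit* out, state&) {
        if (in.empty()) return {encoding_error::incomplete_sequence, 0, 0};
        char32_t cp = in[0];
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return {encoding_error::invalid_sequence, 0, 0};
        if (cp < 0x10000) { out[0] = static_cast<char16_t>(cp); return {encoding_error::ok, 1, 1}; }
        cp -= 0x10000;
        out[0] = static_cast<char16_t>(0xD800 + (cp >> 10));
        out[1] = static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        return {encoding_error::ok, 1, 2};
    }
};

// Counting code units: encode into a scratch buffer and only tally the writes.
count_result count_code_units(std::u32string_view input) {
    toy_utf16::state s{};
    count_result r{0, encoding_error::ok};
    while (!input.empty()) {
        char16_t scratch[toy_utf16::max_code_units];  // contents are ignored
        step_result e = toy_utf16::encode_one(input, scratch, s);
        if (e.error_code != encoding_error::ok) { r.error_code = e.error_code; return r; }
        assert(e.output_written <= toy_utf16::max_code_units);
        r.count += e.output_written;
        input.remove_prefix(e.input_read);
    }
    return r;
}

// Counting code points: decode into a scratch buffer and only tally the writes.
count_result count_code_points(std::u16string_view input) {
    toy_utf16::state s{};
    count_result r{0, encoding_error::ok};
    while (!input.empty()) {
        char32_t scratch[toy_utf16::max_code_points];  // contents are ignored
        step_result d = toy_utf16::decode_one(input, scratch, s);
        if (d.error_code != encoding_error::ok) { r.error_code = d.error_code; return r; }
        r.count += d.output_written;
        input.remove_prefix(d.input_read);
    }
    return r;
}

// Validating code units: decode into one buffer, re-encode into a second,
// and check the round trip reproduces exactly the units that were read.
bool validate_code_units(std::u16string_view input) {
    toy_utf16::state ds{}, es{};
    while (!input.empty()) {
        char32_t points[toy_utf16::max_code_points];
        step_result d = toy_utf16::decode_one(input, points, ds);
        if (d.error_code != encoding_error::ok) return false;
        char16_t units[toy_utf16::max_code_units];
        step_result e = toy_utf16::encode_one({points, d.output_written}, units, es);
        if (e.error_code != encoding_error::ok) return false;
        if (input.substr(0, d.input_read) != std::u16string_view(units, e.output_written)) return false;
        input.remove_prefix(d.input_read);
    }
    return true;
}
```

With this toy encoding, U"A\U0001F600" counts as 3 code units (the emoji needs a surrogate pair), and the matching UTF-16 text counts as 2 code points. Validating code points would mirror validate_code_units with the encode and decode steps swapped.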
This covers the full universe of operations you may want to perform on encoded text for input and output. By implementing the base Lucky 7 (or the extended Lucky 7) for an encoding, you gain access to the full ecosystem of encodings within your application.