Validate Encodable

Validation verifies that a given sequence of input can have a specific action performed on it. Here, we check whether the input code points can be turned into code units of the given encoding. The check is two-fold:

  • it first encodes the input code points, to see if it can do the transformation without loss of information; then,

  • it decodes the output from the last step, to see if the final output is equivalent to the input.

The algorithm for this is as follows:

  • ⏩ Is the input value empty? If so, is the state finished, with nothing left to output? If both are true, return the current results with the empty input, valid set to true, and states. Everything is okay ✅!

  • ⏩ Otherwise,

    1. Set up an intermediate storage location of code_units, using the max_code_units of the input encoding, for the next operations.

    2. Set up an intermediate_checked_output storage location of code_points, using the max_code_points of the input encoding, for the next operations.

    3. Do the encode_one step from input (using its begin() and end()) into the intermediate code_unit storage location.

      • 🛑 If it failed, return with the current input (unmodified from before this iteration, if possible), valid set to false, and states.

    4. Do the decode_one step from the intermediate into the intermediate_checked_output.

      • 🛑 If it failed, return with the current input (unmodified from before this iteration, if possible), valid set to false, and states.

    5. Compare the code_points of the input sequentially against the code_points within the intermediate_checked_output.

      • 🛑 If the comparison failed, return with the current input (unmodified from before this iteration, if possible), valid set to false, and states.

  • ⏩ Update input's begin() value to point to after what was read by the encode_one step.

  • ⤴️ Go back to the start.
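
The loop above can be sketched in a few lines of C++. This is a minimal illustration, not the library's implementation: lossy_ascii, step_result, validate_result and validate_encodable are hypothetical names invented for this example, the toy encoding is stateless (so the "state finished" check and the returned states are left out), and the encode_one and decode_one signatures are simplified stand-ins for the real ones.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical result of a single encode_one / decode_one step.
struct step_result {
    bool ok;             // did the step succeed?
    std::size_t read;    // elements consumed from the step's input
    std::size_t written; // elements written to the step's output
};

// Hypothetical toy encoding (not part of ztd.text): code points are char32_t,
// code units are char. Non-ASCII code points are replaced with '?', so the
// encode direction is deliberately lossy.
struct lossy_ascii {
    static constexpr std::size_t max_code_units  = 1;
    static constexpr std::size_t max_code_points = 1;

    static step_result encode_one(std::u32string_view input, char* output) {
        assert(!input.empty());
        output[0] = input[0] < 0x80 ? static_cast<char>(input[0]) : '?';
        return { true, 1, 1 };
    }

    static step_result decode_one(std::string_view input, char32_t* output) {
        assert(!input.empty());
        const unsigned char unit = static_cast<unsigned char>(input[0]);
        if (unit >= 0x80) return { false, 0, 0 };
        output[0] = unit;
        return { true, 1, 1 };
    }
};

// Hypothetical validation result: the unread input and the verdict.
struct validate_result {
    std::u32string_view input; // begins at the failing code point when !valid
    bool valid;
};

template <typename Encoding>
validate_result validate_encodable(std::u32string_view input) {
    while (!input.empty()) {
        // Steps 1 and 2: intermediate storage sized by the encoding's maximums.
        char units[Encoding::max_code_units];
        char32_t checked[Encoding::max_code_points];

        // Step 3: encode one indivisible unit of work from the input.
        const step_result enc = Encoding::encode_one(input, units);
        if (!enc.ok) return { input, false };

        // Step 4: decode what was just written.
        const step_result dec
            = Encoding::decode_one(std::string_view(units, enc.written), checked);
        if (!dec.ok) return { input, false };

        // Step 5: the round trip must reproduce exactly what was read.
        const bool same = dec.written == enc.read
            && std::equal(checked, checked + dec.written, input.begin());
        if (!same) return { input, false };

        // Advance past what encode_one consumed and go back to the start.
        input.remove_prefix(enc.read);
    }
    return { input, true };
}
```

Under these assumptions, validate_encodable<lossy_ascii>(U"hello") reports valid as true with an empty remaining input, while validate_encodable<lossy_ascii>(U"hi\u00E9!") reports valid as false with the remaining input starting at U+00E9: the encode step succeeds by writing '?', but the decoded '?' no longer matches the original code point.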

This fundamental process works for all encoding objects, provided they implement the basic Lucky 7. The reason for checking if it can be turned back is to ensure that the code points recovered by the decode step actually match up with the input code points. If an encoding performs a lossy transformation in one direction or the other, then validation will fail because the input cannot be reproduced exactly. You will also know the exact place in the input that caused such a failure.

There are extension points used in the API that allow certain encodings to get around the limitation of having to do both the encode_one step and the decode_one step, giving individual encodings control over the verification of a single unit of input and of bulk validation as well.
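
To illustrate the idea of such an extension point, here is a rough sketch of how a generic routine might prefer an encoding-provided shortcut and fall back to the encode-then-decode round trip otherwise. The hook name validate_one, the detection trait has_validate_one, and both toy encodings are assumptions made up for this example; the actual extension points in the library have their own names and signatures, which the API documentation describes.

```cpp
#include <cstddef>
#include <string_view>
#include <type_traits>
#include <utility>

// Hypothetical encoding with only the basic single-step operations.
struct ascii_roundtrip {
    static bool encode_one(char32_t code_point, char& unit) {
        if (code_point >= 0x80) return false;
        unit = static_cast<char>(code_point);
        return true;
    }
    static bool decode_one(char unit, char32_t& code_point) {
        const unsigned char u = static_cast<unsigned char>(unit);
        if (u >= 0x80) return false;
        code_point = u;
        return true;
    }
};

// Hypothetical encoding that additionally opts in to a cheaper check.
struct ascii_shortcut : ascii_roundtrip {
    // Assumed hook name: answers "is this code point encodable?" directly.
    static bool validate_one(char32_t code_point) { return code_point < 0x80; }
};

// Detects whether an encoding provides the hypothetical validate_one hook.
template <typename Encoding, typename = void>
struct has_validate_one : std::false_type {};

template <typename Encoding>
struct has_validate_one<Encoding,
    std::void_t<decltype(Encoding::validate_one(std::declval<char32_t>()))>>
: std::true_type {};

// Validates a single code point, using the hook when the encoding has one.
template <typename Encoding>
bool validate_one_unit(char32_t code_point) {
    if constexpr (has_validate_one<Encoding>::value) {
        return Encoding::validate_one(code_point);
    } else {
        // Fallback: encode, decode, and compare against the original.
        char unit = 0;
        char32_t back = 0;
        return Encoding::encode_one(code_point, unit)
            && Encoding::decode_one(unit, back) && back == code_point;
    }
}

// Returns the index of the first code point that fails validation,
// or input.size() when every code point is encodable.
template <typename Encoding>
std::size_t first_invalid(std::u32string_view input) {
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (!validate_one_unit<Encoding>(input[i])) return i;
    }
    return input.size();
}
```

In this sketch both encodings give the same answers, for example first_invalid returns 2 for U"ab\u00E9" with either one, but ascii_shortcut skips the intermediate buffers entirely. A bulk hook could follow the same pattern by detecting a function that takes the whole input range instead of a single code point.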

Check out the API documentation for ztd::text::validate_encodable_as to learn more.