Validate Decodable
Validation verifies that a given sequence of input can have a specific action performed on it. Here, we check whether the input code units can be turned into code points of the given encoding. The check is done in two steps:
it first decodes the input code units, to see if it can do the transformation without loss of information; then,
it encodes the output from the last step, to see if the final output is equivalent to the input.
The algorithm for this is as follows:
⏩ Is the input value empty? If so, is the state finished, with nothing left to output? If both are true, return the current results with the empty input, valid set to true, and states; everything is okay ✅!
⏩ Otherwise,
    Set up an intermediate storage location of code_points, using the max_code_points of the input encoding, for the next operations.
    Set up an intermediate_checked_output storage location of code_units, using the max_code_units of the input encoding, for the next operations.
    Do the decode_one step from input (using its begin() and end()) into the intermediate code_point storage location.
        🛑 If it failed, return with the current input (unmodified from before this iteration, if possible), valid set to false, and states.
    Do the encode_one step from the intermediate into the intermediate_checked_output.
        🛑 If it failed, return with the current input (unmodified from before this iteration, if possible), valid set to false, and states.
    Compare the code_units of the input sequentially against the code_units within the intermediate_checked_output.
        🛑 If they do not match, return with the current input (unmodified from before this iteration, if possible), valid set to false, and states.
⏩ Update input's begin() value to point to after what was read by the decode_one step.
⤴️ Go back to the start.
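The loop above can be sketched in a few lines of C++. This is only an illustration under simplifying assumptions: toy_ascii, its decode_one and encode_one members, validate_result, and validate_decodable are hypothetical stand-ins written for this example, not the ztd.text interfaces, and state handling is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical toy encoding: 7-bit ASCII, one code unit per code point.
// A stand-in for an encoding object, NOT the ztd.text interface.
struct toy_ascii {
    // Decodes one code point; returns how many code units were read,
    // or nothing on failure.
    static std::optional<std::size_t> decode_one(std::string_view input, char32_t& out) {
        if (input.empty()) return std::nullopt;
        unsigned char unit = static_cast<unsigned char>(input.front());
        if (unit > 0x7F) return std::nullopt;
        out = unit;
        return std::size_t{1};
    }
    // Encodes one code point; returns false on failure.
    static bool encode_one(char32_t code_point, std::vector<char>& out) {
        if (code_point > 0x7F) return false;
        out.push_back(static_cast<char>(code_point));
        return true;
    }
};

// Simplified result: the unread remainder of the input and a validity flag.
struct validate_result {
    std::string_view input;
    bool valid;
};

template <typename Encoding>
validate_result validate_decodable(std::string_view input) {
    while (!input.empty()) {
        char32_t intermediate = 0;
        std::vector<char> intermediate_checked_output;
        // Decode a single unit of work from the front of the input.
        auto read = Encoding::decode_one(input, intermediate);
        if (!read) return { input, false };
        // Encode the decoded code point back into code units.
        if (!Encoding::encode_one(intermediate, intermediate_checked_output))
            return { input, false };
        // Compare the re-encoded code units against what was read.
        std::string_view checked(intermediate_checked_output.data(),
                                 intermediate_checked_output.size());
        if (input.substr(0, *read) != checked) return { input, false };
        // Advance past what decode_one consumed, then go back to the start.
        input.remove_prefix(*read);
    }
    // Empty input: everything is okay.
    return { input, true };
}

// Small usage demonstration.
inline void demo() {
    validate_result good = validate_decodable<toy_ascii>("hello");
    assert(good.valid && good.input.empty());

    validate_result bad = validate_decodable<toy_ascii>("hi\xFFyo");
    assert(!bad.valid);
    assert(bad.input.size() == 3); // stopped at the offending code unit
}
```

On failure, the returned input still begins at the code unit that could not be validated, which is how a caller can locate the problem.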
This fundamental process works for all encoding objects, provided they implement the basic Lucky 7. The reason for checking that the decoded result can be turned back is to ensure that the input code units actually match up with the output code units. If an encoding performs a lossy transformation in one direction or the other, validation fails whenever the input cannot be reproduced exactly. You will also know the exact place in the input that caused the failure.
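To see why the round trip matters, consider a decoder that is more forgiving than its encoder. The sketch below is a hypothetical example, not code from the library: lenient_decode_one accepts overlong two-byte UTF-8 sequences, strict_encode_one always produces the shortest form, and round_trips performs the decode, encode, compare check on a single unit of input. All three names are assumptions made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical lenient decoder: accepts ASCII and two-byte UTF-8 sequences,
// but does not reject overlong forms. Returns code units read, 0 on failure.
inline std::size_t lenient_decode_one(std::string_view input, char32_t& out) {
    if (input.empty()) return 0;
    unsigned char b0 = static_cast<unsigned char>(input[0]);
    if (b0 < 0x80) {
        out = b0;
        return 1;
    }
    if ((b0 & 0xE0) == 0xC0 && input.size() >= 2) {
        unsigned char b1 = static_cast<unsigned char>(input[1]);
        if ((b1 & 0xC0) != 0x80) return 0;
        out = (static_cast<char32_t>(b0 & 0x1F) << 6) | (b1 & 0x3F);
        return 2;
    }
    return 0;
}

// Hypothetical strict encoder for code points below U+0800:
// always emits the shortest form. Returns an empty string on failure.
inline std::string strict_encode_one(char32_t code_point) {
    std::string out;
    if (code_point < 0x80) {
        out.push_back(static_cast<char>(code_point));
    } else if (code_point < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (code_point >> 6)));
        out.push_back(static_cast<char>(0x80 | (code_point & 0x3F)));
    }
    return out;
}

// Decode, encode back, and compare against the code units that were read.
inline bool round_trips(std::string_view unit) {
    char32_t code_point = 0;
    std::size_t read = lenient_decode_one(unit, code_point);
    if (read == 0) return false;
    std::string again = strict_encode_one(code_point);
    return !again.empty() && unit.substr(0, read) == again;
}
```

With these definitions, round_trips("\xC3\xA9") holds because the two bytes are the shortest encoding of U+00E9, while round_trips("\xC1\x81") does not: the lenient decoder reads it as U+0041, but encoding U+0041 yields the single byte 0x41, so the comparison step catches the mismatch.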
There are extension points used in the API that allow certain encodings to get around the limitation of having to do both the decode_one step and the encode_one step, giving individual encodings control over the verification of a single unit of input and of bulk validation as well.
Check out the API documentation for ztd::text::validate_decodable_as to learn more.