Progress: Major UTF-8 changes and optimization, begin updating byte_dump and utf8, and miscellaneous changes.
A previous commit accidentally include the utf8 level 3 program while it was being heavily developed.
As it is already committed, commit the latest changes.
The utf8 program is still not done.
While working on the utf program, I noticed that there are some things in the UTF-8 code that is not yet done or correct and is needed.
I also noticed that the byte_dump program needs to handle the narrow and wide widths to assure consistent column line ups.
Such a change requires new functionality in the UTF-8 code for processing the widths.
These two significant needs resulted in me finally getting around to some of the UTF-8 cleanup that I have been needing to do.
- Get rid of the width parameter, and calculate the width as needed (bitwise is chip and allocated a variable and then passing it along parameters is not as cheap).
- Swap some of the conditions to avoid using "!", saving a single operation though structural changes.
- Break out the utf string functions into its own utf_string.c, utf_string.h, private-utf_string.c, and private-utf_string.h.
- Numerous documentation comment cleanups and update (I think there is still more to do).
- Provide F_utf_fragment and F_utf_fragment_not for improved communication of UTF-8 fragments in error responses (rather than re-using F_utf).
- Provide f_utf_unicode_string_from() (I have not yet written a f_utf_unicode_string_to() but I plan to).
- Update endianess detection to use macros (I am include <endian.h>, but I may also provide custom macros to disable and explicitly designate endianess).
The UTF-8 is wide functions are drafted out, but there are a lot of wide character codes that I need to add.
This will be grunt work that will take a notable amount of time.
For now, just add a comment and I will get back to this.
The byte_dump program is depending on the is wide functions and so currently incompletely implements the narrow and wide support.
Try to use present tense in error message.
There are likely many more places, but this is a start.
Add F_first, F_first_not, F_last, F_last_not, F_next, F_next_not, F_previous, and F_previous_not for providing position return codes.
Fix a bug where width is being define by a uint8_t but the calculates are f_array_length_t.
How did this ever work before, by accident?