Kevin Day [Sat, 25 Apr 2020 17:46:36 +0000 (12:46 -0500)]
Update: redesign byte_dump --last to be inclusive
The --last parameter should be inclusive for consistency with the rest of the project, namely FSS.
By redesigning the --last to internally be represented as a relative offset, the behavior can be simplified.
Doing this also adds support for potential future designs where a --length parameter may be provided that is a relative size parameter instead of an absolute one like --last.
Kevin Day [Sat, 25 Apr 2020 01:54:05 +0000 (20:54 -0500)]
Update: add initial handling of zero-width space in FSS projects
UTF-8 zero-width characters have the potential for being combining characters.
In such cases, the combiner/joiner should be considered part of what is being combined.
That is a combiner/joiner before a graph should be treated as a graph.
A combiner/joiner before a whitespace should be treated as a whitespace.
Disclaimer: I suspect that this will eventually need to be broken down to handle each specific case.
A combiner/joiner on whitespace that results in rendering a printable/visible character would be a violation of the whitespace design principles of FSS.
Further investigation is needed and will likely require changes.
Add appropriate @todo for further development of this functionality.
Rename max_width to width_max to follow newer practices.
Kevin Day [Sat, 25 Apr 2020 00:39:35 +0000 (19:39 -0500)]
Feature: add f_string_length_size max of (uint64_t - 4)
The goal here is that any string processing must be able to add a given UTF-8 width (4-byte) and not oveflow.
Much of the code in this project will not check this as it should be done so at a higher level (performance reasons).
The ideal time is that when allocating some string, always allocate at max f_string_length_size.
Kevin Day [Fri, 24 Apr 2020 02:43:01 +0000 (21:43 -0500)]
Bugfix: fix UTF-8 whitespace detection and provide zero-width detection function
The whitespace detection codes for UTF-8 were incorrect.
Non-printing characters, called zero-width, are not whitespace.
Move them out of the whitespace detection and provide a new function for detecting zero-width.
Handle additional UTF-8 whitespace character codes that I had previously missed.
Kevin Day [Thu, 23 Apr 2020 01:40:50 +0000 (20:40 -0500)]
Feature: implement support for the -T/--trim parameter
Provide support for trimming the object names on input and output.
After implementing this I suddenly remember that the standard might require that the whitespace before and after a valid object name are to be ignored.
This may be removed in the future and fixed in the library.
Additional investigation on how I want to handle this needs to happen first.
The standard is originally designed around ASCII, which only ASCII whitespace is considered whitespace.
This will probably have to be fixed to match the additional goals of the project in terms of whitespace handling.
Kevin Day [Wed, 22 Apr 2020 02:00:18 +0000 (21:00 -0500)]
Feature: expand string functions and utf-8 string functions
The need for this was realized while developing this trim parameter feature.
This has been added as is for commit isolation and may be incomplete.
This utilizes private functions to reduce duplicate code.
While the use of private functions is generally unwanted in this project, this specific case seems to be an exception.
Add rip functions.
Add non-dynamic equivalent of some string functions.
Add trim functions.
Kevin Day [Tue, 25 Feb 2020 03:28:38 +0000 (21:28 -0600)]
Progress: continue development of FSS Extended List
In particular:
- Remove excessive fl_fss_increment_buffer() uses.
- The removed code may be a good idea long term, but for now use a simpler and more efficient approach.
- Fix some mistakes in the slash delimiter handling.
- Begin the initial work for recursively (or so..) processing the nested lists.
Kevin Day [Sat, 23 Nov 2019 04:56:20 +0000 (22:56 -0600)]
Update: build level and monolithic fixes and improvements
Add missing library: fll_file.
Make it even easier to compile against "level" and "monolithic" build processes by providing "--level" and "--monolithic" parameters to the generate.sh script and associated settings files.
Make sure package.sh clears other build modes (when --level is specified, make sur --individual and --monolithic modes are not set).
Kevin Day [Fri, 22 Nov 2019 02:24:43 +0000 (20:24 -0600)]
Update: implement *_delete_simple() and *_destroy_simple() macros and related changes
I do not want a *_delete() and *_destroy() that doesn't provide the ability to handle a status code.
However, in practice, the *_delete() and *_destroy() macros rarely needed the status response checked.
Additional f_status variables were created only to be provided for the status parameter of the stated macros.
This is a waste.
This provides and utilizes alternative *_delete() and *_destroy() macros called *_delete_simple() and *_destroy_simple().
These simple macros do not accept or process the status code.
The resulting code is simpler and easier.
I am on the fence whether or not to throw away the *_delete() and *_destroy() macros that utilize a status parameter.
Future versions may or may not replace the *_delete() and *_destroy() with the *_delete_simple() and *_destroy_simple() macros.
Update *_delete() and *_destroy() macros as needed.
Kevin Day [Wed, 20 Nov 2019 01:43:54 +0000 (19:43 -0600)]
Refactor: make status codes more consistent
Better follow the naming paradigm where reasonably possible.
Use more consistent naming, which should also help with reducing the changes of enum to function name conflicts.
Alphabetically organize status codes per group.
Kevin Day [Wed, 20 Nov 2019 01:00:00 +0000 (19:00 -0600)]
Update: rewrite status code functions and related changes
Fix several problems with the status code processing, namely missing or incomplete data.
Add status code max size defines.
Add missing "#ifndef _di_f_status_codes_" to status codes.
Update cases where 'error' was not renamed to 'code' in status code sources.
Add missing status codes.
Fix incorrect define, "#ifdef _di_fl_status_invalid_" should instead be "#ifndef _di_fl_status_invalid_".
Fix incorrect sizes associated with status code output strings.
Prepare pipe support (not yet implemented).
Ensure f_utf is included and linked in all appriopriate dependencies.
Treat f_utf as a core level_0 project and update documentation accodingly.
Relocate fl_console_parameter_process() into f_console_parameter_process().
Kevin Day [Sun, 17 Nov 2019 06:58:40 +0000 (00:58 -0600)]
Bugfix: When --last is set to 0, entire file is dumped
The --last value being set to 0 is internally used to represent entire file.
Explicitly setting --last to 0 makes no sense, so set the minimum allowed size for --last to 1.
Kevin Day [Sat, 16 Nov 2019 20:50:58 +0000 (14:50 -0600)]
Update: --name and --at in combination should process '--at' relative to '--name'
This logic is already done with --line and other parameters.
Doing this same thing with --name and --at makes the code/logic more consistent and reasonable.
Kevin Day [Thu, 14 Nov 2019 02:54:31 +0000 (20:54 -0600)]
Feature: add support for including empty content in fss_basic_read
Empty content is an object that has no content.
When there is no content for an object, no content is printed for that line and that line is not included in content totals or line selections.
Kevin Day [Wed, 13 Nov 2019 06:12:42 +0000 (00:12 -0600)]
Progress: continue implementing fss_basic_read, also numerous other fixes/tweaks
I decided to allow --at and --name to be used at the same time (and therefore at the same --depth).
The depth code is to be rewritten and that is only partially rewritten.
Many of the parameters are now written and the fss_basic_read needs to be tested and reviewed.
(There fss_basic_read is still incomplete, but there is enough working code to begin testing.)
Kevin Day [Sun, 10 Nov 2019 04:31:20 +0000 (22:31 -0600)]
Feature: add support for duodecimal (base-12)
Now that duodecimal has been added to the FLL project, make sure byte_dump can print in that format.
There is no printf() code for base-12, so implement a custom print process.
Kevin Day [Sun, 10 Nov 2019 04:28:28 +0000 (22:28 -0600)]
Update: implement f_number_signed and f_number_unsigned, as either 32-bit, 64-bit, or 128-bit
Provide the types f_number_signed and f_number_unsigned as a way to define the default "number" type to be used for string to number conversions and array indexes.
By providing 32-bit, 64-bit (default), and 128-bit types, the type can then be adjusted to more easily work on limited hardware or expand to more capable hardware.
This will be the recommended number data type to use in FLL functions going forward.
Kevin Day [Sat, 9 Nov 2019 01:25:07 +0000 (19:25 -0600)]
Progress: rewriting fss_* programs and all dependencies
I am changing the parameters and design of the fss_* programs, such as fss_basic_read.
There is no work done on the fss_*_write programs yet.
There were a lot of changes in the dependencies, including cleanups and improvements.
The parameters passed to the fss_* functions will now be more consistent across each of them.
This should make scripting much easier.
There is a lot of incomplete work and I am focused currently on getting fss_basic_read to work as desired.
I halted my work on f_conversion and fl_console to make sure none of these changes are lost.
I do not expect this commit to compile with everything due to the incomplete work.
I would rather post incomplete code than risk losing code as has happened in the past.
Kevin Day [Fri, 1 Nov 2019 04:07:31 +0000 (23:07 -0500)]
Progress: continue working on fss-003 Extended List
A nested type has been created.
I suspect that I will need to change the structure of the other types to improve consistency, but more review and consideration is needed before any such changes are made.
A read program was written but it is essentially a copy and paste of Basic List, with a few minor changes just to make it compile.
The program arguments of all the FLL programs will need to be changed such that adding support for "depth" selection can be used for things lile Extended List.
The memory allocation is implemented but not reviewed.
I converted the behavior to support nesting, but I need to review the logic to ensure I caught everything.
Kevin Day [Wed, 18 Sep 2019 00:09:44 +0000 (19:09 -0500)]
Update: finish implementing f_utf_character_is_valid() and related UTF-8 changes
UTF-8 BOM is actually not a thing but only a suggestion, see RFC 3629.
I consider it a very bad practice now that I have learned that it is also the zero width space.
Get rid of the UTF-8 BOM support, it is a bad idea and is not to be supported by this project.
The referenced rfc also provides an easier way to view the valid ranges that my previous resources (such as wikipedia).
This helped me finish this function.
Updated byte_dump to better utilize this and to remove no longer necessary code.
Fix an accidental incorrect "invalid detection" check use before calling f_utf_character_is_valid() in byte_dump.
Explicitly print a "." or " " for UTF-8 control characters (ASCII control characters are already handled before this point so it is safe to call f_utf_character_is_control()).
Kevin Day [Tue, 17 Sep 2019 00:49:00 +0000 (19:49 -0500)]
Progress: finish the main parts of invalid UTF-8 detection
This wraps up the work needed for all explicitly declared invalid sequences.
There are some sequences, such as "Overlong", that are considered invalid (according to Wikipedia at this time) but the source (namely Wikipedia) does not explicitly declare what they are.
I need to figure out what these really are and handle them.
There are also likely cases of accidental copy and paste that will be fixed as I discover them (sorry, the size of documentation I had to go through to get these invalid sequences is massive to me).
There are also some @todo situations that I would like to resolve.
Kevin Day [Sun, 15 Sep 2019 03:35:26 +0000 (22:35 -0500)]
Update: disable init until I can get around to it
I decided to start at least clean up some of the compile errors, but this was simply too much of a mess.
Instead, just comment out code and deal with it later.
Kevin Day [Sat, 14 Sep 2019 20:59:45 +0000 (15:59 -0500)]
Progress: begin converting byte_dump to using f_utf_character_is_valid()
The function, f_utf_character_is_valid(), can be a bit expensive, so only call it if the current character is not already known to be invalid.
The function, byte_dump_print_text(), will need to be updated as well, given that the invalid range now includes some sequences currently being swapped with a space.
Kevin Day [Sat, 14 Sep 2019 00:38:52 +0000 (19:38 -0500)]
Update: begin improving UTF-8
I am now moving to perform a more thorough implementation of UTF-8 support.
Cleaned up the functions.
Due to the sheer size of the changes needed, I am uploading this is stages to ensure nothing gets lost.
The work done is incomplete.
The funtions will need to be reviewed once everything is in place.
Kevin Day [Thu, 12 Sep 2019 22:17:19 +0000 (17:17 -0500)]
Update: use int8_t instead of char
Guarantee that we are always dealing with 1-byte values by using int8_t instead of char.
They should be identical, but this prevents a given system from doing something different.
char by default is signed.
Kevin Day [Thu, 12 Sep 2019 03:38:30 +0000 (22:38 -0500)]
Update: documentation for f_pipe and add additional pipe functions
Provide f_pipe_warning_exists(), f_pipe_error_exists(), and f_pipe_debug_exists().
In theory, the program should be able to grab data piped from any of these sources, if both the source exists and a way to pipe the source exists.