Kevin Day [Wed, 22 Apr 2020 02:00:18 +0000 (21:00 -0500)]
Feature: expand string functions and utf-8 string functions
The need for this was realized while developing this trim parameter feature.
This has been added as is for commit isolation and may be incomplete.
This utilizes private functions to reduce duplicate code.
While the use of private functions is generally unwanted in this project, this specific case seems to be an exception.
Add rip functions.
Add non-dynamic equivalent of some string functions.
Add trim functions.
Kevin Day [Tue, 25 Feb 2020 03:28:38 +0000 (21:28 -0600)]
Progress: continue development of FSS Extended List
In particular:
- Remove excessive fl_fss_increment_buffer() uses.
- The removed code may be a good idea long term, but for now use a simpler and more efficient approach.
- Fix some mistakes in the slash delimiter handling.
- Begin the initial work for recursively (or so..) processing the nested lists.
Kevin Day [Sat, 23 Nov 2019 04:56:20 +0000 (22:56 -0600)]
Update: build level and monolithic fixes and improvements
Add missing library: fll_file.
Make it even easier to compile against "level" and "monolithic" build processes by providing "--level" and "--monolithic" parameters to the generate.sh script and associated settings files.
Make sure package.sh clears other build modes (when --level is specified, make sure --individual and --monolithic modes are not set).
Kevin Day [Fri, 22 Nov 2019 02:24:43 +0000 (20:24 -0600)]
Update: implement *_delete_simple() and *_destroy_simple() macros and related changes
I do not want a *_delete() and *_destroy() that doesn't provide the ability to handle a status code.
However, in practice, the *_delete() and *_destroy() macros rarely needed the status response checked.
Additional f_status variables were created only to be provided for the status parameter of the stated macros.
This is a waste.
This provides and utilizes alternative *_delete() and *_destroy() macros called *_delete_simple() and *_destroy_simple().
These simple macros do not accept or process the status code.
The resulting code is simpler and easier.
I am on the fence whether or not to throw away the *_delete() and *_destroy() macros that utilize a status parameter.
Future versions may or may not replace the *_delete() and *_destroy() with the *_delete_simple() and *_destroy_simple() macros.
Update *_delete() and *_destroy() macros as needed.
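The difference between the two macro styles can be sketched as follows. This is a minimal illustration, not the project's actual definitions: example_status, example_none, example_string, and both macro names are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdlib.h>

// Hypothetical stand-ins for the project's status and string types.
typedef int example_status;
enum { example_none = 1 };

typedef struct {
  char *string;
  size_t size;
  size_t used;
} example_string;

// Status form: the caller must supply a status variable,
// even when the result is never inspected.
#define example_string_delete(status, buffer) \
  do { \
    free((buffer).string); \
    (buffer).string = NULL; \
    (buffer).size = 0; \
    (buffer).used = 0; \
    (status) = example_none; \
  } while (0)

// Simple form: same cleanup, no status parameter.
#define example_string_delete_simple(buffer) \
  do { \
    free((buffer).string); \
    (buffer).string = NULL; \
    (buffer).size = 0; \
    (buffer).used = 0; \
  } while (0)

// Usage: release a buffer without a throwaway status variable.
// Returns 1 when the buffer was fully reset.
int example_usage(void) {
  example_string buffer = { malloc(16), 16, 0 };
  example_string_delete_simple(buffer);
  assert(buffer.string == NULL);
  return buffer.size == 0 && buffer.used == 0;
}
```

With the simple form, call sites that never checked the status no longer need to declare one.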
Kevin Day [Wed, 20 Nov 2019 01:43:54 +0000 (19:43 -0600)]
Refactor: make status codes more consistent
Better follow the naming paradigm where reasonably possible.
Use more consistent naming, which should also help with reducing the chances of enum to function name conflicts.
Alphabetically organize status codes per group.
Kevin Day [Wed, 20 Nov 2019 01:00:00 +0000 (19:00 -0600)]
Update: rewrite status code functions and related changes
Fix several problems with the status code processing, namely missing or incomplete data.
Add status code max size defines.
Add missing "#ifndef _di_f_status_codes_" to status codes.
Update cases where 'error' was not renamed to 'code' in status code sources.
Add missing status codes.
Fix incorrect define, "#ifdef _di_fl_status_invalid_" should instead be "#ifndef _di_fl_status_invalid_".
Fix incorrect sizes associated with status code output strings.
Prepare pipe support (not yet implemented).
Ensure f_utf is included and linked in all appropriate dependencies.
Treat f_utf as a core level_0 project and update documentation accordingly.
Relocate fl_console_parameter_process() into f_console_parameter_process().
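The "#ifdef" versus "#ifndef" fix mentioned above matters because of how opt-out guards work. The sketch below assumes the "_di_" macros act as disable switches that a builder may define; the macro, enum, and function names are hypothetical and only illustrate the guard direction.

```c
#include <assert.h>

// Hypothetical upper bound; the real status code values are not shown here.
enum { example_status_last = 100 };

// Compiled unless the builder defines _di_example_status_invalid_.
// Writing "#ifdef" here would invert the guard: the function would only
// exist when the disable macro is defined.
#ifndef _di_example_status_invalid_
int example_status_is_invalid(int code) {
  return code < 0 || code > example_status_last;
}
#endif // _di_example_status_invalid_
```

Built without the disable macro, a check such as assert(example_status_is_invalid(-1)) holds.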
Kevin Day [Sun, 17 Nov 2019 06:58:40 +0000 (00:58 -0600)]
Bugfix: When --last is set to 0, entire file is dumped
The --last value being set to 0 is internally used to represent entire file.
Explicitly setting --last to 0 makes no sense, so set the minimum allowed size for --last to 1.
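A minimal sketch of the minimum check, assuming --last arrives as a decimal string. The function name and return convention are hypothetical and are not the program's actual parsing code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

// Parses the value given to --last.
// Returns 0 on success and -1 when the text is not a whole number >= 1.
int example_parse_last(const char *text, uint64_t *last) {
  assert(text != NULL && last != NULL);

  // Reject empty text, signs, and leading whitespace up front,
  // because strtoull() would otherwise accept input such as "-1".
  if (text[0] < '0' || text[0] > '9') return -1;

  char *end = NULL;
  errno = 0;
  const unsigned long long value = strtoull(text, &end, 10);

  if (errno != 0 || *end != '\0') return -1;

  // 0 is reserved internally to mean "entire file", so 1 is the minimum.
  if (value == 0) return -1;

  *last = (uint64_t) value;
  return 0;
}
```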
Kevin Day [Sat, 16 Nov 2019 20:50:58 +0000 (14:50 -0600)]
Update: --name and --at in combination should process '--at' relative to '--name'
This logic is already done with --line and other parameters.
Doing this same thing with --name and --at makes the code/logic more consistent and reasonable.
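One way to read "--at relative to --name" is that --at counts only the objects whose name matches. The sketch below assumes that reading and a zero-based --at; the function name and the flat array of names are hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Returns the index of the at-th (zero-based) object named `name`,
// or -1 when fewer than at + 1 objects carry that name.
long example_select_at_for_name(const char *const names[], size_t total, const char *name, size_t at) {
  assert(names != NULL && name != NULL);

  size_t matched = 0;

  for (size_t i = 0; i < total; ++i) {
    if (strcmp(names[i], name) != 0) continue;
    if (matched == at) return (long) i;
    ++matched;
  }

  return -1;
}
```

For the names "a", "b", "a", "c", "a", selecting name "a" at position 1 yields index 2 rather than index 1.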
Kevin Day [Thu, 14 Nov 2019 02:54:31 +0000 (20:54 -0600)]
Feature: add support for including empty content in fss_basic_read
Empty content is an object that has no content.
When there is no content for an object, no content is printed for that line and that line is not included in content totals or line selections.
Kevin Day [Wed, 13 Nov 2019 06:12:42 +0000 (00:12 -0600)]
Progress: continue implementing fss_basic_read, also numerous other fixes/tweaks
I decided to allow --at and --name to be used at the same time (and therefore at the same --depth).
The depth code is being rewritten, and that rewrite is only partially done.
Many of the parameters are now written and the fss_basic_read needs to be tested and reviewed.
(The fss_basic_read program is still incomplete, but there is enough working code to begin testing.)
Kevin Day [Sun, 10 Nov 2019 04:31:20 +0000 (22:31 -0600)]
Feature: add support for duodecimal (base-12)
Now that duodecimal has been added to the FLL project, make sure byte_dump can print in that format.
There is no printf() code for base-12, so implement a custom print process.
Kevin Day [Sun, 10 Nov 2019 04:28:28 +0000 (22:28 -0600)]
Update: implement f_number_signed and f_number_unsigned, as either 32-bit, 64-bit, or 128-bit
Provide the types f_number_signed and f_number_unsigned as a way to define the default "number" type to be used for string to number conversions and array indexes.
By providing 32-bit, 64-bit (default), and 128-bit types, the type can then be adjusted to more easily work on limited hardware or expand to more capable hardware.
This will be the recommended number data type to use in FLL functions going forward.
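The width selection could look roughly like this. The macro names EXAMPLE_NUMBER_32 and EXAMPLE_NUMBER_128, the typedef names, and the helper are all hypothetical; the 128-bit branch relies on the __int128 extension available in GCC and Clang, which is an assumption about the toolchain.

```c
#include <assert.h>
#include <stdint.h>

#if defined(EXAMPLE_NUMBER_32)
  typedef int32_t  example_number_signed;
  typedef uint32_t example_number_unsigned;
#elif defined(EXAMPLE_NUMBER_128) && defined(__SIZEOF_INT128__)
  // Compiler extension, not standard C.
  typedef __int128          example_number_signed;
  typedef unsigned __int128 example_number_unsigned;
#else
  // 64-bit is the default.
  typedef int64_t  example_number_signed;
  typedef uint64_t example_number_unsigned;
#endif

// Hypothetical helper: converts a non-negative number into an array index.
example_number_unsigned example_to_index(example_number_signed value) {
  assert(value >= 0);
  return (example_number_unsigned) value;
}
```

Code written against the typedefs does not change when the width is switched at build time.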
Kevin Day [Sat, 9 Nov 2019 01:25:07 +0000 (19:25 -0600)]
Progress: rewriting fss_* programs and all dependencies
I am changing the parameters and design of the fss_* programs, such as fss_basic_read.
There is no work done on the fss_*_write programs yet.
There were a lot of changes in the dependencies, including cleanups and improvements.
The parameters passed to the fss_* functions will now be more consistent across each of them.
This should make scripting much easier.
There is a lot of incomplete work and I am focused currently on getting fss_basic_read to work as desired.
I halted my work on f_conversion and fl_console to make sure none of these changes are lost.
I do not expect this commit to compile with everything due to the incomplete work.
I would rather post incomplete code than risk losing code as has happened in the past.
Kevin Day [Fri, 1 Nov 2019 04:07:31 +0000 (23:07 -0500)]
Progress: continue working on fss-003 Extended List
A nested type has been created.
I suspect that I will need to change the structure of the other types to improve consistency, but more review and consideration is needed before any such changes are made.
A read program was written but it is essentially a copy and paste of Basic List, with a few minor changes just to make it compile.
The program arguments of all the FLL programs will need to be changed so that "depth" selection can be supported for things like Extended List.
The memory allocation is implemented but not reviewed.
I converted the behavior to support nesting, but I need to review the logic to ensure I caught everything.
Kevin Day [Wed, 18 Sep 2019 00:09:44 +0000 (19:09 -0500)]
Update: finish implementing f_utf_character_is_valid() and related UTF-8 changes
UTF-8 BOM is actually not a thing but only a suggestion, see RFC 3629.
I consider it a very bad practice now that I have learned that it is also the zero width no-break space.
Get rid of the UTF-8 BOM support, it is a bad idea and is not to be supported by this project.
The referenced RFC also provides an easier way to view the valid ranges than my previous resources (such as Wikipedia).
This helped me finish this function.
Updated byte_dump to better utilize this and to remove no longer necessary code.
Fix an accidental incorrect "invalid detection" check use before calling f_utf_character_is_valid() in byte_dump.
Explicitly print a "." or " " for UTF-8 control characters (ASCII control characters are already handled before this point so it is safe to call f_utf_character_is_control()).
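The valid ranges from RFC 3629 can be checked byte by byte. The sketch below follows the UTF8-1 through UTF8-4 rules of that RFC; it is not the body of f_utf_character_is_valid(), and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static int example_in_range(uint8_t byte, uint8_t low, uint8_t high) {
  return byte >= low && byte <= high;
}

// Returns the length (1 to 4) of the valid UTF-8 sequence at the start of
// `bytes`, or 0 when the sequence is invalid or truncated.
size_t example_utf8_sequence_length(const uint8_t *bytes, size_t total) {
  assert(bytes != NULL);

  if (total == 0) return 0;

  const uint8_t first = bytes[0];
  uint8_t low = 0x80;  // default range of the second byte (UTF8-tail)
  uint8_t high = 0xBF;

  if (first <= 0x7F) return 1;

  if (example_in_range(first, 0xC2, 0xDF)) {
    return (total >= 2 && example_in_range(bytes[1], low, high)) ? 2 : 0;
  }

  if (example_in_range(first, 0xE0, 0xEF)) {
    if (total < 3) return 0;
    if (first == 0xE0) low = 0xA0;       // excludes overlong forms
    else if (first == 0xED) high = 0x9F; // excludes surrogates
    return (example_in_range(bytes[1], low, high)
      && example_in_range(bytes[2], 0x80, 0xBF)) ? 3 : 0;
  }

  if (example_in_range(first, 0xF0, 0xF4)) {
    if (total < 4) return 0;
    if (first == 0xF0) low = 0x90;       // excludes overlong forms
    else if (first == 0xF4) high = 0x8F; // excludes values above U+10FFFF
    return (example_in_range(bytes[1], low, high)
      && example_in_range(bytes[2], 0x80, 0xBF)
      && example_in_range(bytes[3], 0x80, 0xBF)) ? 4 : 0;
  }

  // 0x80-0xC1 and 0xF5-0xFF never start a valid sequence.
  return 0;
}
```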
Kevin Day [Tue, 17 Sep 2019 00:49:00 +0000 (19:49 -0500)]
Progress: finish the main parts of invalid UTF-8 detection
This wraps up the work needed for all explicitly declared invalid sequences.
There are some sequences, such as "Overlong", that are considered invalid (according to Wikipedia at this time) but the source (namely Wikipedia) does not explicitly declare what they are.
I need to figure out what these really are and handle them.
There are also likely cases of accidental copy and paste that will be fixed as I discover them (sorry, the size of documentation I had to go through to get these invalid sequences is massive to me).
There are also some @todo situations that I would like to resolve.
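For reference, an overlong sequence is one that spends more bytes than its code point requires, for example 0xC0 0x80 encoding U+0000. A hedged sketch of one way to detect this is to decode the value and compare it against the smallest code point that legitimately needs that many bytes. The function name is hypothetical, and the sketch assumes the lead and continuation bytes were already checked for the given width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Returns 1 when the sequence of `width` bytes (1 to 4) is overlong, else 0.
// Assumes the lead byte matches `width` and all other bytes are 0x80-0xBF.
int example_utf8_is_overlong(const uint8_t *sequence, size_t width) {
  assert(sequence != NULL && width >= 1 && width <= 4);

  uint32_t code = 0;

  switch (width) {
    case 1:
      return 0; // a single byte cannot be overlong

    case 2:
      code = ((uint32_t) (sequence[0] & 0x1F) << 6)
        | (uint32_t) (sequence[1] & 0x3F);
      return code < 0x80; // would have fit in 1 byte

    case 3:
      code = ((uint32_t) (sequence[0] & 0x0F) << 12)
        | ((uint32_t) (sequence[1] & 0x3F) << 6)
        | (uint32_t) (sequence[2] & 0x3F);
      return code < 0x800; // would have fit in 2 bytes

    default:
      code = ((uint32_t) (sequence[0] & 0x07) << 18)
        | ((uint32_t) (sequence[1] & 0x3F) << 12)
        | ((uint32_t) (sequence[2] & 0x3F) << 6)
        | (uint32_t) (sequence[3] & 0x3F);
      return code < 0x10000; // would have fit in 3 bytes
  }
}
```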
Kevin Day [Sun, 15 Sep 2019 03:35:26 +0000 (22:35 -0500)]
Update: disable init until I can get around to it
I decided to at least start cleaning up some of the compile errors, but this was simply too much of a mess.
Instead, just comment out code and deal with it later.
Kevin Day [Sat, 14 Sep 2019 20:59:45 +0000 (15:59 -0500)]
Progress: begin converting byte_dump to using f_utf_character_is_valid()
The function, f_utf_character_is_valid(), can be a bit expensive, so only call it if the current character is not already known to be invalid.
The function, byte_dump_print_text(), will need to be updated as well, given that the invalid range now includes some sequences currently being swapped with a space.
Kevin Day [Sat, 14 Sep 2019 00:38:52 +0000 (19:38 -0500)]
Update: begin improving UTF-8
I am now moving to perform a more thorough implementation of UTF-8 support.
Cleaned up the functions.
Due to the sheer size of the changes needed, I am uploading this in stages to ensure nothing gets lost.
The work done is incomplete.
The functions will need to be reviewed once everything is in place.
Kevin Day [Thu, 12 Sep 2019 22:17:19 +0000 (17:17 -0500)]
Update: use int8_t instead of char
Guarantee that we are always dealing with 1-byte values by using int8_t instead of char.
They should be identical, but this prevents a given system from doing something different.
Whether char is signed or unsigned by default is implementation-defined, whereas int8_t is always signed.
Kevin Day [Thu, 12 Sep 2019 03:38:30 +0000 (22:38 -0500)]
Update: documentation for f_pipe and add additional pipe functions
Provide f_pipe_warning_exists(), f_pipe_error_exists(), and f_pipe_debug_exists().
In theory, the program should be able to grab data piped from any of these sources, if both the source exists and a way to pipe the source exists.
Kevin Day [Thu, 12 Sep 2019 02:03:08 +0000 (21:03 -0500)]
Update: start enum's at 1 where possible
By always starting enums at 1, the 0 value can be reserved as not-set.
There are still a few situations where enums must not start at 1.
Some are:
1) Type definitions, such as in f_types where the status codes need to start at 0 for f_false.
2) Any enums that map 1-to-1 to an array, such as with parameter options.
Kevin Day [Tue, 10 Sep 2019 00:44:12 +0000 (19:44 -0500)]
Update: Add 3 presentation modes to byte_dump: normal, simple, and classic
Normal presentation will replace each ASCII control or whitespace character with the UTF-8 picture character that represents it.
Simple presentation will use a single space to represent any given ASCII control or whitespace character.
Classic presentation will do what the "hexdump" tool traditionally does and use a single period to represent an ASCII control or whitespace character.
Kevin Day [Mon, 9 Sep 2019 04:19:10 +0000 (23:19 -0500)]
Update: remove common type wrappers and use typedef instead of '#define'
I intend to begin transitioning from the core types like 'int', 'char', etc...
As part of this, I need to remove a number of the type #define wrappers.
This is also done, in part, because I learned that there are some equivalents to f_min_s_int.
Using explicit types is safer and better designed than something like 'char'.
The goal will be to replace 'char' with uint8_t (or int8_t as needed).
Furthermore, specifying int32_t and int64_t (and similar) should improve the code quality.
The use of types like "wchar", is dangerous because some systems use different sizes.
Instead, for something like "wchar", a uint32_t might be used.
(Although this project is to be designed around UTF-8, so the use of wchar is wrong anyway, it does make a good example.)
Kevin Day [Mon, 9 Sep 2019 03:57:47 +0000 (22:57 -0500)]
Update: add a space after "combining" characters and catch a few more invalid UTF-8 sequences
Previously, I just printed a space instead of printing the "combining" characters.
It occurred to me that I could print a space following a known "combining" character to cause it to combine into a space.
This makes things easier to view and still displays the combining character instead of hiding it behind a blank space.
The downside is that this might cause problems if someone tried to copy and paste these combined characters.
Catch a few more invalid UTF-8 sequences that I came across while making these changes.
Fix an existing invalid UTF-8 sequence detection that seems to have been incomplete and incorrect.
Kevin Day [Sun, 8 Sep 2019 21:46:27 +0000 (16:46 -0500)]
Cleanup: replace argc and argv usage with a single structure of argc and argv (f_console_arguments)
Simplify the parameters being passed to functions by providing a helper structure called f_console_arguments to handle the argc and argv standard arguments.
Due to being standard arguments, I am leaving the names as 'argc' and 'argv' despite it being a violation of the naming policy of this project.
('argc' should be something like 'used', and 'argv' should be 'arguments'.)
The firewall had a naming conflict, so rename the usage of "arguments" in firewall into "parameters".
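The helper structure idea can be sketched like this. The structure layout, its name, and the counting function are hypothetical; the real f_console_arguments may differ in types and qualifiers.

```c
#include <assert.h>
#include <string.h>

// Bundles the standard arguments so that functions take one parameter.
// The standard names argc and argv are kept on purpose.
typedef struct {
  int argc;
  char **argv;
} example_console_arguments;

// Counts the arguments that begin with "--", skipping the program name.
int example_count_long_parameters(const example_console_arguments arguments) {
  assert(arguments.argc >= 0 && arguments.argv != NULL);

  int total = 0;

  for (int i = 1; i < arguments.argc; ++i) {
    if (strncmp(arguments.argv[i], "--", 2) == 0) ++total;
  }

  return total;
}
```

In main(), the structure would be filled once, for example with `const example_console_arguments arguments = { argc, argv };`, and then passed along.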
Kevin Day [Sun, 8 Sep 2019 20:39:29 +0000 (15:39 -0500)]
Regression: display "+" and "++" and not "-" and "--" for special parameter options
When I wrote fll_program_print_help_option() I completely forgot to provide a way to set either "-" or "+" and "--" or "++".
This resulted in the "--help" display of the options to incorrectly print using "-" and "--".
Add additional function parameters to allow setting the symbols when calling fll_program_print_help_option().
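Passing the symbols in as parameters might look like the sketch below. The function name, parameter order, output layout, and the option names used in the usage note are all hypothetical, not the real signature of fll_program_print_help_option().

```c
#include <assert.h>
#include <stdio.h>

// Formats one help line, taking the short and long symbols from the caller
// instead of hard-coding "-" and "--".
// Returns what snprintf() returns: the length the full line would need.
int example_format_help_option(char *out, size_t size, const char *symbol_short, const char *symbol_long, const char *name_short, const char *name_long, const char *description) {
  assert(out != NULL && size > 0);

  return snprintf(out, size, "  %s%s, %s%s  %s",
    symbol_short, name_short, symbol_long, name_long, description);
}
```

Calling it with "+" and "++" for an option named "l" / "light" produces a line starting with "  +l, ++light", while passing "-" and "--" keeps the conventional form.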
Kevin Day [Sun, 8 Sep 2019 20:22:41 +0000 (15:22 -0500)]
Update: Simplify console priority checking by providing fl_console_parameter_prioritize() and fix name of fl_console functions
Simplify the code for determining which console parameter of some set of console parameters has priority.
Abstract this functionality into its own function so that other projects can leverage this.
The functions in fl_console should be prefixed with fl_console.
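A sketch of what such a priority check could do, under the assumption that the parameter appearing furthest to the right on the command line wins. That rule, the function name, and the use of 0 for "not found" are assumptions, not the documented behavior of fl_console_parameter_prioritize().

```c
#include <assert.h>
#include <stddef.h>

// locations[i] holds the argv position where choice i was found,
// or 0 when that choice was not given.
// Returns the index of the choice found furthest to the right, or -1.
int example_parameter_prioritize(const size_t locations[], size_t total) {
  assert(locations != NULL);

  int winner = -1;
  size_t best = 0;

  for (size_t i = 0; i < total; ++i) {
    if (locations[i] > best) {
      best = locations[i];
      winner = (int) i;
    }
  }

  return winner;
}
```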
Kevin Day [Sun, 8 Sep 2019 04:25:50 +0000 (23:25 -0500)]
Feature: bit_dump level 3 program
Provide a program to help analyze files, supporting UTF-8.
This should work similar to "hexdump" but is not intended to match it feature for feature.
Provides three byte printing modes (with plans for a fourth):
1) hexadecimal (default)
2) octal
3) binary
4) digit (planned)
Provides first and last byte selection support.
A width option is available for specifying the number of bytes to be printed on screen such that each byte is essentially a data column.
With a width of 16, then there would be 16 data columns, each displaying one byte.
Although similar to "hexdump", the first column in bit_dump represents the specific row number.
A text option is provided to display the bytes as a character (similar to how "hexdump" uses "-C").
A placeholder option is available for showing a visible placeholder where blank padding spaces would otherwise be printed.
A placeholder is printed to ensure alignment.
For example, a printable UTF-8 character that is 3-bytes wide would only visibly take up 1 character of space.
To keep the alignment with text to bytes accurate and consistent, two additional placeholder spaces are appended following the UTF-8 character.
If the bytes terminate before an entire column set of bytes are printed, then spaces or placeholders are printed until the full column may be printed when in "text" mode.
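The alignment arithmetic described above reduces to two small calculations. The function names are hypothetical, and the sketch assumes every printable character occupies exactly one visible cell.

```c
#include <assert.h>
#include <stddef.h>

// Placeholder cells appended after a character stored in `bytes` bytes,
// so that the text column stays aligned with the byte columns.
size_t example_placeholders_after(size_t bytes) {
  assert(bytes >= 1 && bytes <= 4);
  return bytes - 1;
}

// Empty columns that must be padded to complete the final row.
size_t example_row_padding(size_t total_bytes, size_t width) {
  assert(width > 0);

  const size_t remainder = total_bytes % width;

  return remainder == 0 ? 0 : width - remainder;
}
```

With a width of 16, a 20-byte file leaves 4 bytes on the last row, so 12 columns are padded.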
This will detect and report invalid UTF-8 codes.
Handling printing the characters (via the text option) can be tricky.
There is more work needed to catch all cases.
Some cases cannot be handled if the character is wider than the expected width (causing alignment printing issues).
I am still a bit inexperienced with the intricacies of UTF-8 and I expect there to be issues in this first pass.
Try to avoid returning f_invalid_parameter to represent invalid parameters for standard C/POSIX functions.
Implement new exceptions, like f_invalid_name and f_invalid_descriptor, to accommodate this.