Kevux Git Server - fll/log

Kevin Day [Sun, 17 Jul 2022 14:07:56 +0000 (09:07 -0500)]

Regression: An if-condition followed by a completed if-condition is not being processed.

The commit 4ddc0910eb872bf242895e1e5e804f50f671901d did not address this use case.

Example:
  if == x x
    print x is x
  if == y y
    print y is y

commit | commitdiff | tree

Kevin Day [Sun, 17 Jul 2022 03:36:45 +0000 (22:36 -0500)]

Cleanup: Documentation in regards to the newly added +E/++error and the changes from +q to +Q.

commit | commitdiff | tree

Kevin Day [Sun, 17 Jul 2022 03:09:09 +0000 (22:09 -0500)]

Update: Add support for +E/++error, change '+q' to '+Q', and fix some past tense words.

Selecting a quiet mode that still prints errors is very helpful in embedded fakefiles inside of controller rules.

I noticed that almost all of the verbosity related parameters are upper case except for the "quiet" parameter.
Change the "quiet" parameter from "+q" to "+Q".

I noticed some words in the past tense.
The goal is to use present tense.
Using past tense is a habit I hope to get out of when programming.

commit | commitdiff | tree

Kevin Day [Sun, 17 Jul 2022 01:25:57 +0000 (20:25 -0500)]

Bugfix: Error verbosity is not being preserved.

commit | commitdiff | tree

Kevin Day [Sun, 17 Jul 2022 01:09:53 +0000 (20:09 -0500)]

Update: Testfiles now need to use "exist" rather than "exists".

commit | commitdiff | tree

Kevin Day [Sun, 17 Jul 2022 01:00:35 +0000 (20:00 -0500)]

Bugfix: IKI variable substitution results in incorrect parameter structure.

The commit d94d5337c44d7b2d6f3ed183e8d2c94b4bdad1f6 exposed an existing bug.

The parameters are not always expanding properly, resulting in the parameter being empty, having the incorrect order, or being improperly merged with another parameter.
Change the array resize behavior to resize after incrementing the arguments rather than before.
Detect and handle special cases where separation needs to be applied and when separation does not need to be applied.
Remove random space that is being accidentally appended when printing arguments (probably an accident from a previous commit).

The commit 6012208c61e616a5d31d285ba8873f55b987bf70 did not fully solve the problems it attempted to solve.
Handle additional cases, such as:

  settings:
    parameter a iki <-assure_space unassure_space->
    parameter b value

  main:

    if exist 'parameter:"a"'
      print yes (parameter:"a")
    else
      print no (parameter:"a")

    print 0 parameter:"a"
    print 1 parameter:"b"
    print 2 parameter:"b".
    print 3 "parameter:"b""
    print 4 "parameter:"b\" between parameter:"b""
    print 5 'begin parameter:"a" middle parameter:"a" end'
    print 6 "begin parameter:"a\" middle parameter:"a\" end"
    print 7 begin parameter:"a" middle parameter:"a" end
    print 8 begin parameter:"a"! middle parameter:"a"@parameter:"a" end

Should produce results like:
  no (iki <-assure_space unassure_space->)
  0 iki <-assure_space unassure_space->
  1 value
  2 value.
  3 value
  4 value between value
  5 begin iki <-assure_space unassure_space-> middle iki <-assure_space unassure_space-> end
  6 begin iki <-assure_space unassure_space-> middle iki <-assure_space unassure_space-> end
  7 begin iki <-assure_space unassure_space-> middle iki <-assure_space unassure_space-> end
  8 begin iki <-assure_space unassure_space->! middle iki <-assure_space unassure_space->@iki <-assure_space unassure_space-> end

Move the relevant arguments and iki data into a shared cache to save memory consumption.
Rename path_cache to cache_path for consistency.

commit | commitdiff | tree

Kevin Day [Fri, 15 Jul 2022 04:27:43 +0000 (23:27 -0500)]

Bugfix: Single quotes are not being properly detected in FSS Extended Read functions.

A copy and paste mistake where f_fss_quote_type_double_e is used when f_fss_quote_type_single_e should be used instead results in the quote being set to NULL.

Also do some code clean up.

commit | commitdiff | tree

Kevin Day [Thu, 14 Jul 2022 02:05:08 +0000 (21:05 -0500)]

Update: Change "exists" to "exist" in fakefile syntax.

In English grammar, the use of "exists" is correct and the use of "exist" is incorrect.
The fakefile syntax, however, is not English grammar.
The practices of this project are to focus on using "s" strictly for plural.
The practices of this project are to use simple or base words more often.

The area in which proper grammar is allowed is when interacting with the user rather than with code.
A project like Fake has a target user who is a programmer.
This is a grey area.

The project is already using "if define" rather than "if defined".
For the purpose of keeping a consistent design, I am favoring "exist" over "exists" for this grey area.
Another bonus is that "exist" is shorter than "exists" (however trivial).

commit | commitdiff | tree

Kevin Day [Wed, 13 Jul 2022 23:19:21 +0000 (18:19 -0500)]

Update: Strip out NULL characters after applying delimits.

Once a rule is read and the IKI data is parsed, apply the IKI delimits.
NULL characters replace the delimits.
Strip out all NULL characters from the string after the delimits are applied.
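
The stripping step can be illustrated with a minimal sketch. The function name and signature below are hypothetical and are not taken from the controller program. The sketch assumes a plain character buffer with a known used length, where the delimits have already been replaced with NULL characters.

```c
#include <stddef.h>

// Hypothetical helper: compacts the buffer in place by dropping every NULL
// character and returns the new used length.
size_t example_strip_nulls(char *buffer, size_t used) {
  size_t write = 0;

  for (size_t read = 0; read < used; ++read) {

    // Copy only the non-NULL characters toward the front of the buffer.
    if (buffer[read]) {
      buffer[write++] = buffer[read];
    }
  }

  return write;
}
```

The compaction is done in place, so no additional allocation is needed and the caller only has to update the used length.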

commit | commitdiff | tree

Kevin Day [Wed, 13 Jul 2022 12:07:45 +0000 (07:07 -0500)]

Update: Replace Unicode Terminate escape sequence with Non-printing escape sequence.

Use "\!" instead of "\U-".
It is simpler and has the convenience of not being part of an IKI variable.

Update the alphabetic ordering.

The documentation is missing the context IKI variables.
Describe all supported context IKI variables.

commit | commitdiff | tree

Kevin Day [Wed, 13 Jul 2022 12:05:32 +0000 (07:05 -0500)]

Bugfix: Delimits are not being applied for IKI variables.

The controller program is not applying the delimits for would-be-valid IKI variables.
This becomes a huge problem when these would-be-valid IKI variables are passed to the fake program.
If the would-be-valid IKI variables are properly delimited, then the fake program would see them as valid IKI variables.

commit | commitdiff | tree

Kevin Day [Wed, 13 Jul 2022 12:01:39 +0000 (07:01 -0500)]

Bugfix: IKI variables are incorrectly being processed when there is a non-IKI IKI-like string.

The delimits, after the first, are not being processed.

This is the result of an accidental double increment.
When checking against a possible IKI variable and it is determined that the string cannot be an IKI variable, a double increment occurs.
What is happening is that the break statement only breaks out of the immediate loop.
There is a second loop that does an increment and is not being broken out of.

Utilize the separator_found boolean to determine whether or not to perform the additional break.
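
The nested loop problem can be sketched as follows. This is an illustration of the general pattern only, not the actual IKI processing code. The function name, the use of ':' as the separator, and the use of a space as the word boundary are all assumptions made for the example. Only the separator_found name comes from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical scan: returns the index of the first ':' in the buffer,
// or `used` when there is no ':' at all.
size_t example_find_separator(const char *buffer, size_t used) {
  bool separator_found = false;
  size_t i = 0;

  while (i < used) {

    // Inner loop: walk one space separated word.
    for (; i < used; ++i) {
      if (buffer[i] == ':') {
        separator_found = true;

        break; // This only leaves the inner loop.
      }

      if (buffer[i] == ' ') break;
    }

    // Without this additional break, the increment below would also run
    // and the position of the separator would be skipped.
    if (separator_found) break;

    ++i;
  }

  return separator_found ? i : used;
}
```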

commit | commitdiff | tree

Kevin Day [Tue, 12 Jul 2022 01:03:12 +0000 (20:03 -0500)]

Cleanup: Fix alphabetic ordering.

Some of the color structures cannot be alphabetically ordered.
These can be, so make it so.

commit | commitdiff | tree

Kevin Day [Mon, 11 Jul 2022 02:45:33 +0000 (21:45 -0500)]

Security: Invalid read for formatted printing using partial ranges on a string.

If the start position is greater than the used buffer, then an invalid read occurs.
Properly verify that the start position is not greater than or equal to the used length of the string.
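
A minimal sketch of this kind of check is shown below. The structure and function names are hypothetical stand-ins and are not the FLL types. The sketch assumes a string with a used length and a range with inclusive start and stop positions.

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical stand-ins for a string with a used length and a partial range.
typedef struct {
  const char *string;
  size_t used;
} example_string_t;

typedef struct {
  size_t start;
  size_t stop;
} example_range_t;

// Returns true only when reading from range.start cannot go past the used data.
bool example_range_is_readable(example_string_t buffer, example_range_t range) {
  if (!buffer.string || !buffer.used) return false;
  if (range.start > range.stop) return false;

  // The start position must be strictly less than the used length.
  if (range.start >= buffer.used) return false;

  return true;
}
```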

commit | commitdiff | tree

Kevin Day [Sun, 10 Jul 2022 22:49:53 +0000 (17:49 -0500)]

Feature: Add support to the "print" operation for escape sequences just like the "write" operation has.

The print now prints raw data.

Fix documentation comments.

commit | commitdiff | tree

Kevin Day [Sun, 10 Jul 2022 22:26:42 +0000 (17:26 -0500)]

Cleanup: Converted type is actually uint32_t rather than f_utf_char_t.

The f_utf_char_t is supposed to be a uint32_t so this is not a problem.

The intent and design of this, however, is that f_utf_char_t is a special case representing the character as a string rather than as a digit.
The f_utf_char_t is stored as a 4-byte integer to store each byte representing a character.

The uint32_t is simply a straight up 4-byte integer.

This is the numeric value of the code point rather than the representation as a string.
This is an important semantic difference.
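
The difference between the two representations can be shown with a small sketch. The function name is hypothetical and this is not the f_utf implementation. The sketch handles only 2-byte UTF-8 sequences (U+0080 through U+07FF) and assumes the packed form keeps the first byte of the sequence in the most significant position.

```c
#include <stdint.h>

// Packs the UTF-8 bytes for a code point in the range U+0080 to U+07FF into a
// 4-byte integer, with the first byte in the most significant position.
// Returns 0 for code points outside of that range.
uint32_t example_pack_two_byte(uint32_t code_point) {
  if (code_point < 0x80 || code_point > 0x7ff) return 0;

  const uint32_t first = 0xc0 | (code_point >> 6);
  const uint32_t second = 0x80 | (code_point & 0x3f);

  return (first << 24) | (second << 16);
}
```

For U+00BD the numeric value of the code point is 0x000000bd, while the packed byte sequence is 0xc2bd0000. Both fit in 4 bytes but they mean different things.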

commit | commitdiff | tree

Kevin Day [Sun, 10 Jul 2022 22:10:52 +0000 (17:10 -0500)]

Feature: The featureless make program now supports the "write" operation.

This is an oversight on my part.
There should be an operation to write to a file.

There are two forms of this new "write" operation.
1) Truncate a file (deletes all data within a file).
2) Append to a file.

A file is created if it does not already exist in both cases.

The "write" operation supports some standard escape sequences as well as some non-standard ones.

Standard Escape Sequences:
- "\f": Form Feed.
- "\n": New Line.
- "\r": Carriage Return.
- "\t": Tab.
- "\v": Vertical Tab.
- "\\": Backslash Character (may require additional slashes in certain circumstances).
- "\0": NULL Character.

Non-Standard Escape Sequences:
- "\U+": Unicode Sequence (followed by a valid Unicode sequence with a minimum of 4 hexadecimal digits and a maximum of 6 hexadecimal digits).
- "\U-": Terminate a Unicode Sequence, allowing for "\U+000A\U-5" to be equivalent to "\n5".
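
The need for a terminator can be illustrated with a sketch of reading the digits that follow "\U+". The function name is hypothetical and the greedy reading of up to 6 digits is an assumption about the behavior rather than the actual fake program code.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

// Reads between 4 and 6 hexadecimal digits from the start of `digits`.
// Returns the number of digits consumed, or 0 when fewer than 4 are present.
size_t example_read_unicode(const char *digits, uint32_t *code_point) {
  uint32_t value = 0;
  size_t count = 0;

  while (count < 6 && isxdigit((unsigned char) digits[count])) {
    const unsigned char c = (unsigned char) digits[count];

    value = (value * 16) + (uint32_t) (isdigit(c) ? c - '0' : tolower(c) - 'a' + 10);
    ++count;
  }

  if (count < 4) return 0;

  *code_point = value;

  return count;
}
```

Under this assumption the text "000A5" is read as the five digit value 0xA5, whereas a terminator after "000A" stops the reading at 0x0A and leaves the "5" as a regular character.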

commit | commitdiff | tree

Kevin Day [Sun, 10 Jul 2022 22:04:57 +0000 (17:04 -0500)]

Cleanup: Fix documentation comments.

commit | commitdiff | tree

Kevin Day [Sun, 10 Jul 2022 05:45:05 +0000 (00:45 -0500)]

Bugfix: Incorrect information is printed on certain errors.

Remove unused line variable.

The f_fss_count_lines() function appends to the calculated length variable.
The number is not being reset.
This results in each iteration adding to the previous:
  Line number 1, count = 1.
  Line number 2, count = 3.
  Line number 3, count = 6.
  etc...

Reset the line number on each pass of the loop to get the correct line number.
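
The accumulation can be reproduced with a small sketch. Both functions are hypothetical stand-ins and are not the f_fss_count_lines() implementation. The only behavior assumed is that the counter adds to the given variable rather than assigning it.

```c
#include <stddef.h>

// Hypothetical stand-in for a counter that appends to *line: it adds the
// 1-based line number of `offset` to whatever *line already holds.
void example_count_lines(const char *buffer, size_t offset, size_t *line) {
  ++(*line);

  for (size_t i = 0; i < offset; ++i) {
    if (buffer[i] == '\n') ++(*line);
  }
}

// Stores the line number of each offset into lines[].
void example_line_numbers(const char *buffer, const size_t *offsets, size_t total, size_t *lines) {
  for (size_t i = 0; i < total; ++i) {

    // Reset on each pass, otherwise the counts add up as 1, 3, 6, and so on.
    size_t line = 0;

    example_count_lines(buffer, offsets[i], &line);
    lines[i] = line;
  }
}
```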

An error message is printing "1" when it should be printing "2".

The "%Q" should be used instead of "%s" for the static string.

Replace "parameter" with "Content" to be consistent with other error messages.

commit | commitdiff | tree

Kevin Day [Sun, 10 Jul 2022 05:20:27 +0000 (00:20 -0500)]

Cleanup: Update controller examples.

commit | commitdiff | tree

Kevin Day [Sun, 10 Jul 2022 04:52:15 +0000 (23:52 -0500)]

Bugfix: "State is now ..." should not be printed when quiet is passed.

commit | commitdiff | tree

Kevin Day [Sun, 10 Jul 2022 03:28:12 +0000 (22:28 -0500)]

Bugfix: Empty strings improperly pass validation checks.

An operation such as the following:
if exists 'define:"does_not_exist"'

Results in an empty string.

The empty string is passing the existence check.
Empty strings should fail existence checks in this case.

Handle all such cases that I am able to quickly find.

commit | commitdiff | tree

Kevin Day [Sun, 10 Jul 2022 02:52:07 +0000 (21:52 -0500)]

Refactor: "if defined" and "if not defined" to be easier to use in fake program.

Using "if not defined parameter work" can be confusing.
Using "if not parameter work" is shorter and easier to understand.

Using "if defined environment PATH" can be very confusing.
Using "if define PATH" is shorter and a lot easier to understand.

Break apart the "if defined" (and "if not defined") logic into two operations:
1) "if define" (and "if not define").
2) "if parameter" (and "if not parameter").

This makes the behavior easier to understand as it directly maps to the "define" and "parameter" settings.

commit | commitdiff | tree

Kevin Day [Sun, 10 Jul 2022 02:50:53 +0000 (21:50 -0500)]

Update: Ensure first if block is initialized to operate.

commit | commitdiff | tree

Kevin Day [Sun, 10 Jul 2022 00:30:25 +0000 (19:30 -0500)]

Cleanup: Remove unnecessary includes in fake program.

commit | commitdiff | tree

Kevin Day [Sun, 10 Jul 2022 00:00:59 +0000 (19:00 -0500)]

Cleanup: Fix spelling mistake, 'tread' should be 'treat'.

commit | commitdiff | tree

Kevin Day [Sat, 9 Jul 2022 22:59:59 +0000 (17:59 -0500)]

Update: Use "settings" instead of "setting" for better consistency between fake program and controller program.

Featureless Make is using "settings" and the Controller program is using "setting".
Fix this inconsistency.
The term "settings" sounds more accurate than "setting".

commit | commitdiff | tree

Kevin Day [Sat, 9 Jul 2022 22:44:24 +0000 (17:44 -0500)]

Update: Restrict environment to PATH and LD_LIBRARY_PATH by default.

Change all of the setting files and fakefiles to restrict the environment variables.
Only PATH and LD_LIBRARY_PATH are exposed so that custom build environments can easily be used by default.

commit | commitdiff | tree

Kevin Day [Sat, 9 Jul 2022 22:33:21 +0000 (17:33 -0500)]

Feature: The controller program should expose the "define" and "parameter" at the Entry and Exit level.

The "define" and "parameter" should be made available in the Entry and Exit files.
This allows for passing data to all Rules.

Update documentation.

commit | commitdiff | tree

Kevin Day [Sat, 9 Jul 2022 22:19:55 +0000 (17:19 -0500)]

Security: Invalid read when using -s/--settings in fake program.

The Featureless Make -s/--settings parameter handling code has a typo where the wrong enumeration is used.
This results in an invalid read.

commit | commitdiff | tree

Kevin Day [Sat, 9 Jul 2022 22:02:15 +0000 (17:02 -0500)]

Update: Make environment variable handling design consistent between fake and controller programs.

The Featureless Make system does not have a way of passing all environment variables.
Add a flag to designate whether the environment is empty because it is not defined or is empty because it is defined as empty, just like the controller program does.

This then allows for more flexible control over the environment variable security.

Update the example setting files and fakefiles to expose PATH and LD_LIBRARY_PATH by default.

commit | commitdiff | tree

Kevin Day [Sat, 9 Jul 2022 16:29:51 +0000 (11:29 -0500)]

Update: Further reduce memory allocation pressure by increasing default small allocation.

Double the default small allocation size from 4 to 8.
This accepts the compromise that this will increase the amount of memory used in certain cases.

commit | commitdiff | tree

Kevin Day [Sat, 9 Jul 2022 16:27:09 +0000 (11:27 -0500)]

Feature: Add missing function f_environment_get_all().

A get all environment variables function should exist.
The POSIX/libc standards do not seem to provide one.

Utilize the "environ" variable to load all of the environment variables into a string map array.
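
Walking the "environ" variable can be sketched as follows. The map structure and function name are hypothetical and are simplified compared to a string map array. The sketch stores pointers into the environment rather than copying the strings, which is an assumption made to keep the example short.

```c
#include <stddef.h>
#include <string.h>

// POSIX provides the environment as a NULL terminated array of "NAME=value" strings.
extern char **environ;

// Hypothetical, simplified name and value pair pointing into the environment.
typedef struct {
  const char *name;
  size_t name_length;
  const char *value;
} example_map_t;

// Stores up to `size` environment variables into maps[] and returns the number stored.
size_t example_environment_get_all(example_map_t *maps, size_t size) {
  size_t used = 0;

  for (char **entry = environ; entry && *entry && used < size; ++entry) {
    const char *separator = strchr(*entry, '=');

    // Skip anything that is not in the "NAME=value" form.
    if (!separator) continue;

    maps[used].name = *entry;
    maps[used].name_length = (size_t) (separator - *entry);
    maps[used].value = separator + 1;
    ++used;
  }

  return used;
}
```

The pointers are only valid until the environment is next modified, so a real implementation would likely copy the names and values.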

commit | commitdiff | tree

Kevin Day [Sat, 9 Jul 2022 16:14:12 +0000 (11:14 -0500)]

Security: Environment value has invalid read.

The string may not be allocated.
Check that the string.used is not 0 but if it is then pass an empty string.

commit | commitdiff | tree

Kevin Day [Sat, 9 Jul 2022 04:58:16 +0000 (23:58 -0500)]

Bugfix: Condition blocks are still being processed when they should be skipped.

Re-design the block code to simplify the logic and make the code more readable.
This is only a partial re-design.
I did not perform an extensive review.
I am considering writing some runtime/program tests to better catch problems and regressions.

commit | commitdiff | tree

Kevin Day [Tue, 5 Jul 2022 15:34:35 +0000 (10:34 -0500)]

Update: The "engine" rule setting should support parameters.

commit | commitdiff | tree

Kevin Day [Tue, 5 Jul 2022 13:53:42 +0000 (08:53 -0500)]

Cleanup: Remove unused task feature.

This functionality did not make the cut for the 0.6.x stable release series.
I may revisit this in the future.

commit | commitdiff | tree

Kevin Day [Tue, 5 Jul 2022 13:47:27 +0000 (08:47 -0500)]

Update: Change default controller settings path to './'.

This feels more natural to me now that I am writing and testing the controller settings.
I originally wanted the default to be self contained.
If I want to run in a sub-directory such as 'controller/', then just pass '-s controller'.

commit | commitdiff | tree

Kevin Day [Tue, 5 Jul 2022 13:43:31 +0000 (08:43 -0500)]

Refactor: Rename "script" setting to "engine".

The scripting engine is now called "engine".
This fixes ambiguity issues between the "script" action and the scripting engine.
This makes the code and configuration files easier to read and understand.

commit | commitdiff | tree

Kevin Day [Tue, 5 Jul 2022 12:50:07 +0000 (07:50 -0500)]

Update: Change execute error code handling to better accommodate standard GNU Bash return codes.

This introduces the status codes F_call and F_call_not as part of the required changes.
This also introduces F_yes, F_yes_not, F_no, and F_no_not.

Improve the wording of some of the error messages.

commit | commitdiff | tree

Kevin Day [Tue, 5 Jul 2022 01:30:27 +0000 (20:30 -0500)]

Cleanup: Fix word mistake.

commit | commitdiff | tree

Kevin Day [Tue, 5 Jul 2022 01:17:48 +0000 (20:17 -0500)]

Feature: The fake program is supposed to support a piped fakefile.

I thought I implemented this already.
I just tried to use it and found that I had not implemented it.
This feature is supposed to be in the stable release.

commit | commitdiff | tree

Kevin Day [Mon, 4 Jul 2022 20:16:47 +0000 (15:16 -0500)]

Update: Next minor version (0.6.0).

commit | commitdiff | tree

Kevin Day [Sat, 2 Jul 2022 03:28:42 +0000 (22:28 -0500)]

Update: Improve design in f_conversion to be safer when handling variables allowed to be replaced.

The constants are allowed (and encouraged) to be changed as desired by some developer or distributor.
This means constant strings like f_string_ascii_1_s could, in theory, be any length.

Change the do..while loops into normal while loops.
Change the while loop using sizeof(f_char_t) to instead use the actual constant string structure (via a pointer).

This situation was identified by the -fanalyzer functionality of GCC-12.1.
The -fanalyzer from earlier GCC versions, such as GCC 11, did not identify this.

The sizeof(f_char_t) is not strictly needed for the fwrite_unlocked() calls because they are wrapped in a loop.
The sizeof(f_char_t) can be assumed to be 1 in general and even if it is greater than 1, the loop will still ensure success.
Removing the sizeof(f_char_t) simplifies the design.
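
A write loop of this kind can be sketched as follows. The function name is hypothetical, and the standard fwrite() is used in place of fwrite_unlocked() so that the example does not depend on a libc extension. This is an illustration of the looping approach, not the f_conversion code.

```c
#include <stddef.h>
#include <stdio.h>

// Writes `length` bytes from `string` to `stream`, repeating on short writes.
// Returns the total number of bytes written.
size_t example_write_all(FILE *stream, const char *string, size_t length) {
  size_t total = 0;

  // A normal while loop does nothing when the length is 0,
  // whereas a do..while loop would attempt at least one write.
  while (total < length) {
    const size_t written = fwrite(string + total, 1, length - total, stream);

    // Nothing written means an error, the caller can check ferror().
    if (!written) break;

    total += written;
  }

  return total;
}
```

Because the loop tracks how much is written, the length of the constant string can be anything, including zero.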

commit | commitdiff | tree

Kevin Day [Sat, 2 Jul 2022 03:00:49 +0000 (22:00 -0500)]

Bugfix: Signal code is not being set when accepted signal is received.

commit | commitdiff | tree

Kevin Day [Fri, 1 Jul 2022 22:55:29 +0000 (17:55 -0500)]

Update: Add additional checks just in case execution is attempted when both the program name and the arguments have no data.

commit | commitdiff | tree

Kevin Day [Fri, 1 Jul 2022 22:29:48 +0000 (17:29 -0500)]

Cleanup: Spelling mistake in debugging documentation.

commit | commitdiff | tree

Kevin Day [Fri, 1 Jul 2022 22:22:05 +0000 (17:22 -0500)]

Update: Example bootstrap script now supports building all programs.

Looping over all programs, building them, and installing them is a very common process.
Adding support for this to the example bootstrap script should save me some time and effort.

Also change the separate clean and build commands into a single command using the rebuild command.

commit | commitdiff | tree

Kevin Day [Fri, 1 Jul 2022 21:19:54 +0000 (16:19 -0500)]

Update: Add note about ulimit privileges potentially causing failure of example controller rules.

commit | commitdiff | tree

Kevin Day [Fri, 1 Jul 2022 05:36:16 +0000 (00:36 -0500)]

Update: Add the last of the unit tests for f_thread.

This implements the last of the intended unit tests for f_thread.

commit | commitdiff | tree

Kevin Day [Fri, 1 Jul 2022 05:32:38 +0000 (00:32 -0500)]

Bugfix: Problems exposed by f_thread unit tests.

Notable fixes:
- Rename f_thread_semaphore_file_create() to f_thread_semaphore_file_open().
- Rename f_thread_semaphore_file_delete() to f_thread_semaphore_file_close().
- Rename f_thread_semaphore_file_destroy() to f_thread_semaphore_file_delete().
- Have f_thread_semaphore_file_open() accept a double pointer for semaphore because sem_open() returns a pointer.
- Initializer f_thread_semaphore_t_initialize is on a union which is initialized differently from a normal digit.

commit | commitdiff | tree

Kevin Day [Thu, 30 Jun 2022 05:40:06 +0000 (00:40 -0500)]

Progress: Add more unit tests for f_thread.

commit | commitdiff | tree

Kevin Day [Thu, 30 Jun 2022 05:39:12 +0000 (00:39 -0500)]

Bugfix: Problems exposed by f_thread unit tests.

commit | commitdiff | tree

Kevin Day [Wed, 29 Jun 2022 22:10:19 +0000 (17:10 -0500)]

Update: The featureless make should default to 'make' mode.

This makes the fake program closer to how make operates.
With this changed, just type 'fake' and it operates as if 'fake make' was the command given.

commit | commitdiff | tree

Kevin Day [Tue, 28 Jun 2022 03:44:07 +0000 (22:44 -0500)]

Progress: Add more unit tests for f_thread.

commit | commitdiff | tree

Kevin Day [Tue, 28 Jun 2022 03:43:37 +0000 (22:43 -0500)]

Bugfix: Problems exposed by f_thread unit tests.

commit | commitdiff | tree

Kevin Day [Mon, 27 Jun 2022 01:54:20 +0000 (20:54 -0500)]

Progress: Add unit tests for f_thread.

There is still a long way to go but this is a good start.

Ideally, the f_thread will be the last project I write unit tests for before the stable release is made.

commit | commitdiff | tree

Kevin Day [Mon, 27 Jun 2022 01:52:48 +0000 (20:52 -0500)]

Bugfix: Problems and clean ups exposed when writing f_thread unit tests.

This in particular extracts the structure related functions into separate files to better follow the functional oriented programming practice.

commit | commitdiff | tree

Kevin Day [Mon, 27 Jun 2022 01:52:12 +0000 (20:52 -0500)]

Cleanup: Add missing new line.

commit | commitdiff | tree

Kevin Day [Sat, 25 Jun 2022 15:51:37 +0000 (10:51 -0500)]

Update: Add the last of the planned f_utf unit tests.

Implement the white space unit tests.

commit | commitdiff | tree

Kevin Day [Sat, 25 Jun 2022 15:49:30 +0000 (10:49 -0500)]

Update: White space function changes.

Make the is white space functions accept "strict" to be more consistent with how other functions operate.
For the next development release I want to consider separate functions to avoid passing a boolean as a parameter to do this (for performance reasons).

This changes behavior in some cases and if I did something wrong then there will be a regression.
Look out for white space regressions specifically in the FSS programs.

commit | commitdiff | tree

Kevin Day [Sat, 25 Jun 2022 05:00:37 +0000 (00:00 -0500)]

Update: Use "decimal" instead of "digit".

The unit tests are failing because the function no longer exists.
The use of "digit" is probably the result of an overzealous refactor.
Rename the affected functions back to "decimal".

commit | commitdiff | tree

Kevin Day [Sat, 25 Jun 2022 04:13:46 +0000 (23:13 -0500)]

Update: Implement more f_utf unit tests.

Only the is white space tests are yet to be implemented.

commit | commitdiff | tree

Kevin Day [Sat, 25 Jun 2022 04:09:26 +0000 (23:09 -0500)]

Bugfix: Problems in f_utf exposed by unit tests.

The is alphabetic needs to perform the is valid check because its default catch-all is returning F_true.
Ideally at some point (probably distant point) in the future, the literal codes for alphabetic will be matched rather than calling all of the other functions.
In this situation the is valid check can be removed.

Several of the is digit test value assignments are not checking if the value (the pointer) is NULL.

Some of the is word sequences are incorrect.

Add missing f_utf_character_is_alphabetic_numeric().

Fix function name for f_utf_character_is_control_format().

Several is word checks for f_utf_char_t are improperly comparing the entire sequence to an ASCII value when only the first byte should be compared.

commit | commitdiff | tree

Kevin Day [Fri, 24 Jun 2022 05:23:16 +0000 (00:23 -0500)]

Update: Finish f_utf unit tests for digits.

This does not handle the alphabetic variation of the digit unit tests.
There are still several other (non-digit) unit tests that I plan to get to.

commit | commitdiff | tree

Kevin Day [Thu, 23 Jun 2022 23:52:19 +0000 (18:52 -0500)]

Update: The fake program should check file existence when clean is combined with another command like build or make.

When the make or build command is specified after a clean command, the clean command should do an appropriate file dependency check.
This acts as a safety measure such that if the make or build command could not normally be run due to the missing required files, then the clean operation should not be run.

Rename a related function to a shorter name.

commit | commitdiff | tree

Kevin Day [Thu, 23 Jun 2022 23:10:32 +0000 (18:10 -0500)]

Bugfix: The fss_identify needs the latest version of f_utf_is_digit().

commit | commitdiff | tree

Kevin Day [Thu, 23 Jun 2022 22:50:51 +0000 (17:50 -0500)]

Bugfix: Entry file not found when --settings is used.

The incorrect range is being used on the wrong variable.

commit | commitdiff | tree

Kevin Day [Thu, 23 Jun 2022 03:16:59 +0000 (22:16 -0500)]

Progress: Continue on f_utf digits.

This also fixes problems observed in running the unit tests.

commit | commitdiff | tree

Kevin Day [Wed, 22 Jun 2022 23:06:06 +0000 (18:06 -0500)]

Bugfix: The < 0xc2 test is supposed to be against the first byte rather than the second.

commit | commitdiff | tree

Kevin Day [Wed, 22 Jun 2022 05:20:54 +0000 (00:20 -0500)]

Bugfix: A typo resulting in treating < 0xc3 as invalid UTF-8 when it is instead < 0xc2.

This is for 2-width characters, such as: '²' (U+00B2) and '½' (U+00BD).
These characters should not be treated as invalid.
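
The boundary can be illustrated with a minimal check for 2-byte UTF-8 sequences. The function name is hypothetical and this is not the f_utf code. The byte ranges follow the UTF-8 encoding rules, where 0xc0 and 0xc1 can only produce overlong encodings of ASCII.

```c
#include <stdbool.h>
#include <stdint.h>

// Returns true when the two bytes form a valid 2-byte UTF-8 sequence.
bool example_is_valid_two_byte(uint8_t first, uint8_t second) {

  // Valid first bytes for 2-byte sequences are 0xc2 through 0xdf.
  if (first < 0xc2 || first > 0xdf) return false;

  // The second byte must be a continuation byte (10xxxxxx).
  return (second & 0xc0) == 0x80;
}
```

The character U+00B2 is encoded as 0xc2 0xb2 and U+00BD is encoded as 0xc2 0xbd, so a check using < 0xc3 on the first byte rejects both.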

I have not yet investigated to see if I need to make other corrections.
This is just an obvious mistake that I found and immediately fixed.

commit | commitdiff | tree

Kevin Day [Wed, 22 Jun 2022 04:19:36 +0000 (23:19 -0500)]

Progress: f_utf unit tests and make significant changes to the Unicode digit/decimal functions.

The controller program is using f_utf_is_alphabetic_decimal().
The previous functionality of that function is now handled by f_utf_is_alphabetic_digit().

I reconsidered some of the design I implemented in the previous commit (e696e3941592c6910f2f8ecc87a698d4f618c3b4).
The design of reading the value for the variable "value" and then saving to it is too much complexity.
Simplify the design and just expect the caller to read the "value" and decide if it is or is not in range.

Get rid of the "*_is_decimal()" functions.
The "*_is_digit()" functions work like the "*_is_decimal()" functions did.
I am avoiding the term "decimal" because it refers to base-10.
The term "digit" is a bit more general.
The only downside is that fractions might fall under "digit" (really they are two digits), which this function currently does not handle.
The "*_is_numeric()" functions will recognize fractions.

I didn't get as far as I wanted to.
The number of Unicode values to assign has worn me down.
The tests processing is not complete and I haven't gone back and done my normal review.

I tried to keep the "value" as small as possible, but unsurprisingly some language out there has a digit that represents billions.
I am forced to use a 64-bit data type for this.

commit | commitdiff | tree

Kevin Day [Tue, 21 Jun 2022 00:26:53 +0000 (19:26 -0500)]

Update: The f_utf project regarding digits and perform other clean ups follow up.

I had not gotten around to testing the programs after the previous commit.
I did not get to writing the function f_utf_is_alphabetic_digit() (and then forgot about this important part).

The controller program is using f_utf_is_alphabetic_decimal().
The previous functionality of that function is now handled by f_utf_is_alphabetic_digit().

commit | commitdiff | tree

Kevin Day [Mon, 20 Jun 2022 04:42:18 +0000 (23:42 -0500)]

Update: The f_utf project regarding digits and perform other clean ups.

Redesign the digit and decimal behavior.
The is digit functions now refer to base-10 but do not attempt to return the identified digit.
The is decimal functions now refer to base-10 and support providing the identified digit.
The is decimal functions also support other base units than just base-10.

The alphabetic digit/numeric functions now also have an alphabetic decimal function.

Clean up more places in the code using "sequence" rather than "character" or "characters".

Functions like f_utf_character_is_alpha_digit() are now like f_utf_character_is_alphabetic_digit().

Add related unit tests.
The is digit functions have unit tests that test if the digit returned is correct.
I have not reviewed all of the "numeric" Unicode digits to confirm/deny that my is decimal functions are complete.

I observed what looks like bugs in the alphabetic functions.
In these cases the final return statement is returning F_false when they instead should be returning F_true.

There are minor corrections in documentation.

commit | commitdiff | tree

Kevin Day [Sun, 19 Jun 2022 00:24:56 +0000 (19:24 -0500)]

Update: Use a regular int instead of uint8_t for counting digit in conversion function.

The calculations are based on data.width, which is an int.
Make the code more consistent and less error prone by matching the data type.

commit | commitdiff | tree

Kevin Day [Sun, 19 Jun 2022 00:22:56 +0000 (19:22 -0500)]

Security: Floating point exception due to incorrect number type used in conversion function.

The power is being used as the entire value.
To do this it must be capable of holding the entire supported digits of f_number_unsigned_t.
Using int results in a floating point exception.
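
The need for a wide enough type can be sketched as follows. The function name is hypothetical and uint64_t is an assumed stand-in for f_number_unsigned_t. This is one way to compute the power without overflow and is not the f_conversion code.

```c
#include <stdint.h>

// Returns the largest power of `base` that is less than or equal to `number`,
// or 1 when number is less than base. Returns 0 for a base below 2.
uint64_t example_highest_power(uint64_t number, uint64_t base) {
  if (base < 2) return 0;

  uint64_t power = 1;

  // Comparing by division guarantees that power * base never exceeds number,
  // so the multiplication cannot overflow.
  while (number / power >= base) {
    power *= base;
  }

  return power;
}
```

With a narrower type, such as int, the power can overflow for large numbers and may end up as 0, at which point a division by the power raises a floating point exception on many systems.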

commit | commitdiff | tree

Kevin Day [Sun, 19 Jun 2022 00:11:07 +0000 (19:11 -0500)]

Bugfix: Add missing endianness check to f_conversion.

This just adds the inverse shift on the assumption that the original code is correct and is for little-endian.

commit | commitdiff | tree

Kevin Day [Sat, 18 Jun 2022 23:58:03 +0000 (18:58 -0500)]

Bugfix: Uppercase 'T' needs to be supported.

I incorrectly used lower case 't' in both condition blocks when I need to check for both lower and upper.

commit | commitdiff | tree

Kevin Day [Sat, 18 Jun 2022 23:57:47 +0000 (18:57 -0500)]

Cleanup: Style improvement.

commit | commitdiff | tree

Kevin Day [Sat, 18 Jun 2022 23:50:38 +0000 (18:50 -0500)]

Bugfix: The byte_dump --first and --last are not always working as expected.

The problem is due to a calculation resulting in a negative value.
The code "width_utf == -1 ? 0 : width_utf - width_count" does not account for when width_count > width_utf.
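
One way to guard against the negative result is sketched below. The function name and the use of int for both values are assumptions, and the actual fix in byte_dump may differ.

```c
// Returns how many bytes of the sequence remain, never going below 0.
// A width_utf of -1 designates that the width is not known.
int example_width_remaining(int width_utf, int width_count) {
  if (width_utf == -1) return 0;

  // Account for when width_count is greater than width_utf.
  if (width_count >= width_utf) return 0;

  return width_utf - width_count;
}
```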

Refactor the use of "characters" with "sequence" to reflect recent changes in terminology usage.

commit | commitdiff | tree

Kevin Day [Sat, 18 Jun 2022 23:29:37 +0000 (18:29 -0500)]

Regression: Remove invalid validation block.

This appears to have been introduced in 002bf17595459e65173be16f983977ead99593b6.

There is a lot of restructuring during that commit which may explain the mistake.

Bad code somehow got mixed in.
I'm not sure what this block is supposed to do but it is clearly wrong in multiple ways.
Remove it entirely.

commit | commitdiff | tree

Kevin Day [Sat, 18 Jun 2022 22:28:32 +0000 (17:28 -0500)]

Update: Follow up previous Unicode changes.

The previous commit changed a significant amount of behavior.
That commit noted that follow up changes would be necessary.

First things first.
I noticed that when I simplified the is valid checks I ended up over simplifying them.
There are several byte sequences that are not valid UTF-8 sequences.

I previously added surrogates and it turns out that UTF-8 specifically does not support Unicode surrogates.
Remove all related code.

The f_utf_char_t is supposed to be in big-endian format.
The macros are fixed to properly handle this.
This fix exposed problems in the conversion functions.
The conversion functions lack the proper big-endian and little-endian support.
Introduce a new structure and parameters to support designating big-endian or little-endian order.
Support defaulting to the host byte order.

The utf8 program needs to properly handle the endianness in a different way.
The bytes are in left-to-right order, but the conversion leaves them in left-to-right order while shifting them to the right.
Swapping from little-endian to big-endian would be incorrect because the byte order is already correct.
The byte position is what is incorrect.
That is, 0x0000c280 should be shifted to 0xc2800000.
Swapping the endianness would instead yield 0x80c20000 (which is incorrect).
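
The difference between the two operations can be sketched as follows. This is an illustration only: shift_to_front() and swap_bytes() are hypothetical names, not functions from the utf8 program, and the width is assumed to be known by the caller.

```c
#include <stdint.h>

// Move the used bytes to the most significant end without reordering them.
// "width" is the number of bytes in the sequence (1 to 4).
uint32_t shift_to_front(const uint32_t value, const unsigned width) {
  if (width == 0 || width >= 4) return value;

  return value << (8 * (4 - width));
}

// Reverse all four bytes, which is what a full endianness swap does.
uint32_t swap_bytes(const uint32_t value) {
  return ((value & 0x000000ffu) << 24)
       | ((value & 0x0000ff00u) << 8)
       | ((value & 0x00ff0000u) >> 8)
       | ((value & 0xff000000u) >> 24);
}
```

With the values from the text, shift_to_front(0x0000c280, 2) gives 0xc2800000, while swap_bytes(0x0000c280) gives 0x80c20000.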

The use of the word "character" as a variable name and in documentation can be confusing.
I have recently defined a "byte sequence", a "code point", and a "unicode" as specific types.
Change the word "character" to the appropriate name to make the code less confusing and more specific.
There are also other words used in place of "character" that might not be the ones listed above.

Some of the tests, particularly the emoji tests, have incorrect data.
I discovered that many sources on the internet violate the standard by calling code points emoji that are not officially recognized as emoji by the standard.
I'm going with Wikipedia for the new and updated emoji list.

The f_char_t is available so update old code that still uses uint8_t to instead use f_char_t for character related data.

Changes to the is valid code resulted in identifying invalid byte sequences that were previously considered valid.

commit | commitdiff | tree

Kevin Day [Fri, 17 Jun 2022 03:56:34 +0000 (22:56 -0500)]

Update: Unit tests for f_utf and related changes or bug fixes.

Fix several problems exposed by unit tests.
Fix several unit tests to work as expected due to problems with the data files.

At some point I seem to have diverged from ensuring that the f_utf_char_t is always big-endian.
I probably got so lost in handling the differences between big and little endian that I ended up making the f_utf_char_t act little-endian in cases where the host is little-endian.
The f_utf_char_t must always be big-endian.
However, there are cases where the big and little endian behavior must be processed.
Break up the macros into having "_be" and "_le" to make this possible.

The iscntrl() check return value needs to be explicitly handled to ensure that only F_false or F_true is returned.
This is already fixed in one function.
Apply the existing fix to the other function.
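
A minimal sketch of why the explicit handling matters: the C standard only promises that iscntrl() returns non-zero for a match, not that it returns 1. The names example_true, example_false, and is_control_normalized() are hypothetical stand-ins and are not the FLL definitions of F_true and F_false.

```c
#include <ctype.h>

// Hypothetical stand-ins for the library's boolean status values.
enum {
  example_false = 0,
  example_true = 1
};

// Collapse the "any non-zero value" result of iscntrl() to exactly
// one of the two status values.
int is_control_normalized(const unsigned char c) {
  return iscntrl(c) ? example_true : example_false;
}
```

Returning the raw iscntrl() result would allow a caller that compares against the true value to see a mismatch on platforms where the non-zero value is something other than 1.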

The f_utf_char_t should be seen as a single character rather than a stream of bytes.
Unit tests now treat any non-zero value after the designated width as invalid.
The is valid checking code now tests for this invalid case.
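
The "non-zero value after the designated width" rule can be sketched as below, assuming the sequence is held in a 32-bit value with the first byte in the most significant position. The function names are hypothetical, and this only checks the trailing bytes; it does not validate the continuation bytes themselves.

```c
#include <stdint.h>

// Determine the width from the leading byte (stored in the most
// significant byte). Returns 0 when the byte cannot start a sequence.
unsigned lead_width(const uint32_t sequence) {
  const uint8_t first = (uint8_t) (sequence >> 24);

  if ((first & 0x80) == 0x00) return 1;
  if ((first & 0xe0) == 0xc0) return 2;
  if ((first & 0xf0) == 0xe0) return 3;
  if ((first & 0xf8) == 0xf0) return 4;

  return 0;
}

// Returns 1 when every byte after the designated width is zero, 0 otherwise.
int trailing_is_clear(const uint32_t sequence) {
  const unsigned width = lead_width(sequence);

  if (width == 0) return 0;
  if (width == 4) return 1;

  // Mask off the bytes that belong to the sequence and test the rest.
  return (sequence & (0xffffffffu >> (8 * width))) == 0;
}
```

Under these assumptions 0xc2800000 passes, while 0xc2800001 fails because a stray byte follows the 2-byte sequence.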

The unit tests are improved.
Test for F_true and F_false rather than calling assert_true() and assert_false().
Error bits and other status codes were previously passing when they should fail due to the use of assert_true() and assert_false().
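
A small sketch of why a truthiness assertion lets error statuses through. The status values here are hypothetical and are not the FLL definitions; check_truthy() mirrors what an assert_true() style check accepts and check_exact() mirrors an exact comparison.

```c
// Hypothetical status values for illustration only.
enum {
  example_false = 0,
  example_true = 1,
  example_error_bit = 0x8000
};

// Mirrors a truthiness check: any non-zero status passes.
int check_truthy(const int status) {
  return status != 0;
}

// Mirrors an exact comparison: only the expected status passes.
int check_exact(const int status, const int expected) {
  return status == expected;
}
```

A status carrying an error bit is non-zero, so it satisfies check_truthy() but not check_exact(status, example_true). Assuming a cmocka style test framework, the exact comparison would correspond to something like assert_int_equal().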

This commit changes the byte order of the f_utf_char_t.
This will break code such as the code used in the utf8 program.
A follow up commit is necessary to fix any byte order problems.

commit | commitdiff | tree

Kevin Day [Tue, 14 Jun 2022 12:25:18 +0000 (07:25 -0500)]

Update: Add unit tests in f_utf project.

These tests are created based on the comments in the code.
Unlike previous tests, I reviewed the Unicode database separately from my code.
This approach is faster but will not expose any Unicode Codepoints that I missed when writing the code.

This adds tests for the following:
- combining
- phonetic
- subscript
- superscript
- wide

commit | commitdiff | tree

Kevin Day [Tue, 14 Jun 2022 12:24:24 +0000 (07:24 -0500)]

Cleanup: Use correct range.

This is not a problem, but it does contain ranges that can never match.

commit | commitdiff | tree

Kevin Day [Tue, 14 Jun 2022 00:02:51 +0000 (19:02 -0500)]

Update: Add some unit tests for f_utf.

Add the structure for the remaining tests.
The (currently) unused tests are just copy and pastes and may need changes.

With the exception of the "valid" tests, these unit tests use statically generated data files containing bytesequences (unsigned 32-bit) in base-10 format.
The base-10 format representation of the bytesequence is used for easy reading using standard libc functions.

These static data files are generated using a combination of the Unicode database codepoints, a script I wrote, and the "unicode" program.
The Unicode codepoints used for each generated bytesequence set are also provided.

This implements the following tests:
- control
- digit
- emoji
- symbol
- valid

The intent of this is to test the entire spectrum of valid codepoints (except for the "valid" tests, which test every single possible value).
Because every single value is tested by the "valid" tests for both f_utf_is_valid() and f_utf_character_is_valid(), these tests take a long time to run.

commit | commitdiff | tree

Kevin Day [Mon, 13 Jun 2022 23:58:12 +0000 (18:58 -0500)]

Bugfix: Problems exposed by unit tests for f_utf.

Correct comments and add missing characters.

Add the missing f_utf_character_is_surrogate() function.

Change the "is valid" algorithm to one I developed for the unit tests.
The new checks are cleaner and simpler due to bitwise operations.

commit | commitdiff | tree

Kevin Day [Sun, 12 Jun 2022 15:00:11 +0000 (10:00 -0500)]

Update: Add unit tests for is_punctuation in f_utf project.

This includes minor style cleanups.

commit | commitdiff | tree

Kevin Day [Sun, 12 Jun 2022 13:33:09 +0000 (08:33 -0500)]

Update: Add unit tests for is_private in f_utf project.

commit | commitdiff | tree

Kevin Day [Sun, 12 Jun 2022 04:23:37 +0000 (23:23 -0500)]

Update: Add script for generating Unicode Codepoints from integers.

A simple script that I am using for generating a range of Unicode Codepoints in the format needed by the generate_unicode.sh script.
The private use area is represented by multiple sets of all values within some range.
These ranges are a massive list.
This script generates these.
The only thing I need to do is use a calculator program to convert hex to integer.
These integers are then passed to the script as an inclusive range.
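
The idea of expanding an inclusive integer range into code points can be sketched in C as follows. The "U+XXXX" one-per-line output is an assumption, since the exact format expected by generate_unicode.sh is not described here, and format_codepoint() and print_range() are hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>

// Write a single code point as "U+XXXX" (assumed format) into buffer.
// Returns the number of characters written, as reported by snprintf().
int format_codepoint(const uint32_t codepoint, char * const buffer, const size_t size) {
  return snprintf(buffer, size, "U+%04lX", (unsigned long) codepoint);
}

// Print every code point in the inclusive range [first, last], one per line.
// Returns the number of lines written.
unsigned long print_range(FILE * const out, const uint32_t first, const uint32_t last) {
  unsigned long total = 0;
  char buffer[16];

  for (uint32_t i = first; i <= last; ++i) {
    format_codepoint(i, buffer, sizeof(buffer));
    fprintf(out, "%s\n", buffer);
    ++total;

    // Avoid wrapping around when last is the largest 32-bit value.
    if (i == UINT32_MAX) break;
  }

  return total;
}
```

For example, the private use area in the Basic Multilingual Plane, U+E000 through U+F8FF, corresponds to the inclusive integer range 57344 through 63743.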

commit | commitdiff | tree

Kevin Day [Sun, 12 Jun 2022 03:55:07 +0000 (22:55 -0500)]

Cleanup: Incorrect comments in iki headers.

commit | commitdiff | tree

Kevin Day [Sun, 12 Jun 2022 03:54:32 +0000 (22:54 -0500)]

Update: Implement symbol function unit tests and fix comments.

This brings in and utilizes the symbol test data.

commit | commitdiff | tree

Kevin Day [Sun, 12 Jun 2022 03:52:52 +0000 (22:52 -0500)]

Update: Generate Unicode script to support generating test data.

The test data is generated from a line separated Unicode Codepoint file.
The generated test data is in base-10 format rather than hexadecimal to make it easier for standard libc functions like atoll() to be used.
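
Reading one line of such base-10 data might look like the sketch below. It uses strtoull() instead of atoll() so that bad input can be detected; that substitution, the function name parse_sequence_line(), and the one-value-per-line layout are assumptions rather than details of the actual tests.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

// Parse one base-10 line into a 32-bit value.
// Returns 0 on success and -1 when the line holds no number or the
// number does not fit in 32 bits.
int parse_sequence_line(const char * const line, uint32_t * const sequence) {
  char *end = 0;

  errno = 0;

  const unsigned long long value = strtoull(line, &end, 10);

  if (errno || end == line || value > UINT32_MAX) return -1;

  *sequence = (uint32_t) value;

  return 0;
}
```

With this, the line "3263168512" parses to 0xc2800000, and no hexadecimal handling is needed in the test code.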

commit | commitdiff | tree

Kevin Day [Sun, 12 Jun 2022 03:45:03 +0000 (22:45 -0500)]

Bugfix: Problems exposed by unit tests in f_utf.

Only UTF-8 symbols are tested.

commit | commitdiff | tree

Kevin Day [Sat, 11 Jun 2022 19:09:39 +0000 (14:09 -0500)]

Bugfix: Last character of file after conversion from code point is not printed by utf8 program.

The algorithm doesn't print the character until it knows that the character is complete.
There are no checks for when end of file is reached.
This results in the last character not being printed, even if the code is complete.
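
The class of bug can be illustrated with a simplified tokenizer, assuming whitespace separated input. This is not the logic of utf8_detect_codepoint(); count_emitted() and its flush_at_end parameter are hypothetical and only model "emit when the next separator is seen" versus "also emit at end of input".

```c
#include <ctype.h>
#include <stddef.h>

// Count the tokens emitted from whitespace separated text.
// A token is emitted when the whitespace following it is seen.
// When flush_at_end is non-zero, a pending token is also emitted once
// the end of the input is reached.
size_t count_emitted(const char * const text, const int flush_at_end) {
  size_t emitted = 0;
  int pending = 0;

  for (size_t i = 0; text[i]; ++i) {
    if (isspace((unsigned char) text[i])) {
      if (pending) {
        ++emitted;
        pending = 0;
      }
    }
    else {
      pending = 1;
    }
  }

  // Without this check the final token is lost whenever the input
  // does not end in whitespace.
  if (pending && flush_at_end) ++emitted;

  return emitted;
}
```

For the input "U+0041 U+0042", the version without the end check emits one token and the version with it emits two.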

Be sure to return the status rather than always returning F_none under certain circumstances in utf8_detect_codepoint().
Update documentation about return value in utf8_detect_codepoint().
Initialize the character.used to 0 rather than 4 (because it has no data!).
For better practice, compare using >= rather than ==.
Remove unnecessary i = 0 assignment.

commit | commitdiff | tree

Kevin Day [Sat, 11 Jun 2022 02:18:14 +0000 (21:18 -0500)]

Bugfix: Incorrect 4-width characters are generated.

This is caused by a simple typo.

commit | commitdiff | tree

Kevin Day [Fri, 10 Jun 2022 05:17:59 +0000 (00:17 -0500)]

Cleanup: Remove execute bit from script.

The build scripts should not have the execute bit set.
These bits will get appropriately set during the packaging process.

commit | commitdiff | tree

Kevin Day [Fri, 10 Jun 2022 05:16:07 +0000 (00:16 -0500)]

Update: Finish adding Unicode symbol handling code.

I used a script that I wrote to assist.
Additional tweaks were still necessary.
There is a lot of room for error, but this saved me an enormous amount of time.

commit | commitdiff | tree

Kevin Day [Fri, 10 Jun 2022 05:15:17 +0000 (00:15 -0500)]

Feature: Provide simple script for assisting in the mass generation of Unicode handling code.

This is a very simple script and is not intended for complex tasks.

commit | commitdiff | tree

Kevin Day [Wed, 8 Jun 2022 04:46:30 +0000 (23:46 -0500)]

Update: Add unit tests for f_signal project.

The Featureless Linux Library (FLL) Git repository.
