From: Kevin Day
Date: Sun, 31 May 2026 15:36:39 +0000 (-0500)
Subject: Update: Further clarify FSS specification in regards to Unicode combining characters.
X-Git-Url: https://git.kevux.org/?a=commitdiff_plain;h=5115ff79c2bd495e155217e11bcdf688ecdf4d12;p=fll

Update: Further clarify FSS specification in regards to Unicode combining characters.

I added combining characters to a new line (which is after the new line character as per **Unicode** rules).
I discovered that the tools I tested do not combine into the new line.
Instead, they present an invalid/incomplete/partial character on the next line.
This seems incorrect to me.
I want the **FSS** to combine onto the new line and not treat it as a new line for the purposes of termination.
Update the standard to favor the current behavior that I have observed.
I am going to do further research to try and understand how **Unicode** defines how this should happen.
I will follow this up with updates as needed.
---

diff --git a/specifications/fss.txt b/specifications/fss.txt
index b0ba607b1..88c176bd3 100644
--- a/specifications/fss.txt
+++ b/specifications/fss.txt
@@ -1,7 +1,7 @@
 # fss-0002 iki-0000
 #
 # license open-standard-license-1.0-or-later
-# version 2024/01/15
+# version 2026/05/31
 #
 # This file (assumed to be named fss.txt) can be more easily read using the following iki_read commands:
 # iki_read fss.txt +Q -w -rrrrrrrr anti-KISS anti-KISS ASCII ASCII BOM BOM FSS FSS KISS KISS UTF-8 UTF-8 URL URL XML XML -WWW character "'" "'" code '"' '"' italic '"' '"'
@@ -42,18 +42,25 @@ Featureless Settings Specifications:
  In all cases, specifications that separate Objects from Contents using white space, the first white space separating the Object and Content must not be considered part of the Object nor part of the Content.
  All spaces after the first separating white space is generally ignored until the first non white space character is found, unless otherwise specified.
- Unless otherwise specified, all specifications are newline sensitive (character:"\n" only).
- Newline characters are only character:"\n" and are never anything else (character:"\r" is not considered newline in any manner).
+ Unless otherwise specified, all specifications are new line sensitive (character:"\n" (unicode:"U+000A") only).
+ Newline characters are only character:"\n" (unicode:"U+000A") and are never anything else (character:"\r" (unicode:"U+000D") is not considered new line in any manner).
  These specifications refer to characters that have printable representation as italic:"printable".
  These specifications refer to characters that have no printable representation as italic:"non-printable".
  White spaces characters that are printable, such as tabs and spaces, must be considered the same type for the purposes of parsing.
  Non-printing white spaces characters (zero-width characters) are ignored, are treated as placeholders for processing with the exception of combining characters.
  White spaces that use combining characters result in printable characters and the resulting combination is treated as not white space.
+ A specification may override this handling of combining characters through explicit definitions.
  Zero-width characters that use combining characters are treated as non-printing characters and are skipped.
  In terms of processing, it is recommended that the code:"NULL" character is not considered the end of a string, but this is only a suggestion.
  Any specification may chose to limit, restrict, or otherwise prohibit special Unicode characters such as combining characters or zero-width characters.
- Unless otherwise specified, newlines designate the potential start (or end) of an Object or Content.
+ The current behavior of existing tools treats a new line character:"\n" (unicode:"U+000A") the same regardless of whether a combining character follows it.
+ As an exception to the above rules (in order to avoid confusion with how existing software handles this behavior) a new line character:"\n" (unicode:"U+000A") is always treated as such even if followed by a combining character.
+ In this situation the combining characters at the start can be considered invalid; however, the standard behavior is to ignore them and not treat them as part of the data.
+ Should the data be sensitive in nature or require strict structure, then treating this situation as an error is recommended.
+ Should tools, some time in the future, decide to combine onto character:"\n" (unicode:"U+000A"), then this standard can (and should) be reviewed and updated to a more proper behavior of actually combining onto the control character.
+
+ Unless otherwise specified, new lines designate the potential start (or end) of an Object or Content.
  Unless otherwise specified, white space may exist to the left of the start of Objects.
  Unless otherwise specified, white space may exist to the right of the end of Objects, but only if that given Object is properly quoted and the white space is after the terminating quote but before any Content.
@@ -118,7 +125,7 @@ Featureless Settings Specifications:
  Unless otherwise specified, comments are designated by the pound symbol character:"#" but only if only white space is to the left of the pound or the pound character:"#" is at the start of the line.
  There is no support for inline comments.
  Unless otherwise specified, the start comment may be delimited by character:"\" in the same manner as Objects and Contents are.
- This delimit only applies to the start of a comment (the pound character:"#" character) as there is no terminating character for a comment (other than a newline character:"\n").
+ This delimit only applies to the start of a comment (the pound character:"#" character) as there is no terminating character for a comment (other than a new line character:"\n").
  A line containing a valid comment is in its entirety ignored.
  This means that if there is white space before the designation symbol (the pound character:"#" character) then that white space is ignored.
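
The new line exception added by this patch can be illustrated with a minimal sketch. It is not taken from the FLL code base: the names `is_combining` and `split_lines` and the `strict` flag are hypothetical, and "combining character" is approximated here as any code point in a Unicode mark category (Mn, Mc, Me), which may differ from the definition the FSS tools actually use.

```python
import unicodedata


def is_combining(ch: str) -> bool:
    # Assumption: treat any Unicode mark (Mn, Mc, Me) as a combining character.
    return unicodedata.category(ch).startswith("M")


def split_lines(data: str, strict: bool = False) -> list[str]:
    """Split on U+000A only and handle combining characters that follow it."""
    lines = []
    for number, line in enumerate(data.split("\n"), start=1):
        if number > 1:
            # Count the combining characters directly after the new line.
            skip = 0
            while skip < len(line) and is_combining(line[skip]):
                skip += 1
            if skip and strict:
                # Strict mode: sensitive data treats this situation as an error.
                raise ValueError(f"combining character at start of line {number}")
            # Default mode: ignore them and do not treat them as data.
            line = line[skip:]
        lines.append(line)
    return lines
```

Under these assumptions, `split_lines("a\n\u0301b\r\nc")` returns `["a", "b\r", "c"]`: the U+0301 after the new line is dropped, and U+000D stays in the data because it is not a new line. Passing `strict=True` raises `ValueError` for the same input.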