From: Kevin Day
Date: Mon, 23 May 2022 05:22:57 +0000 (-0500)
Subject: Update: The utf8 documentation to reflect recent changes in the program arguments.
X-Git-Url: https://git.kevux.org/?a=commitdiff_plain;h=7147c69645234ec02fe49d3f4674d90aa9089a38;p=kevux.org-website

Update: The utf8 documentation to reflect recent changes in the program arguments.

---

diff --git a/documentation/utf8.html b/documentation/utf8.html
index 8e8cab3..27db545 100644
--- a/documentation/utf8.html
+++ b/documentation/utf8.html
@@ -91,7 +91,7 @@

- The UTF8 program is a tool for converting from a UTF-8 byte code sequence to the Unicode code point. The byte code can also be referred to as the binary representation of the code even though the byte code is considered text. The term "byte code" here is used to refer to a sequence of bytes intended to represent something, which in this case is a Unicode character. The Unicode code point is the Unicode designation uniquely identifying that particular sequence. The Unicode code point persists across different Unicode encoding beyond UTF-8, such as UTF-16.
+ The UTF8 program is a tool for converting from a UTF-8 byte sequence to the Unicode code point. The byte sequence can also be referred to as the binary representation of the code even though the byte sequence is considered text. The term "byte sequence" here is used to refer to a sequence of bytes intended to represent something, which in this case is a Unicode character. The Unicode code point is the Unicode designation uniquely identifying that particular sequence. The Unicode code point is not specific to UTF-8 and persists across different encodings, such as UTF-16.

The idea behind the UTF8 program is to provide the answer to the question of what some special UTF-8 character is or to provide a way to create the UTF-8 character given the Unicode code point.
@@ -100,7 +100,10 @@
This tool is intended to be scriptable, should handle both piped data and files, and can convert entire files.

- In addition, this tool can be used to validate a given byte code or can be used to get the character width of some byte code or code point.
+ This tool can be used to validate a given byte sequence or can be used to get the character width of some byte sequence or code point.
+

+

+ This tool can be used to store binary data in a text-friendly format and then restore the binary data.

@@ -171,7 +174,7 @@

- The +q/++quiet parameter silences all output that is not the intent and purpose of the program. For example, the purpose of the utf8 program is to print the Unicode code point or the UTF-8 byte code. The +q/++quiet will not suppress this output. The new line printed at the end of the program, is however, not printed. The +q/++quiet is ideal for using in scripting to help guarantee more consistent and controlled output.
+ The +q/++quiet parameter silences all output that is not the intent and purpose of the program. For example, the purpose of the utf8 program is to print the Unicode code point or the UTF-8 byte sequence. The +q/++quiet will not suppress this output. The new line printed at the end of the program, is however, not printed. The +q/++quiet is ideal for using in scripting to help guarantee more consistent and controlled output.

The +n/++no_color simplifies the output to avoid the special color character codes. The special color character codes tend to take up a lot of extra space and may slow down printing performance.
@@ -198,8 +201,8 @@
 -b
- --from_bytecode
- The expected input format is byte code (character data).
+ --from_bytesequence
+ The expected input format is byte sequence (character data).
 -c
@@ -213,8 +216,8 @@
 -B
- --to_bytecode
- The output format is bytecode (character data).
+ --to_bytesequence
+ The output format is bytesequence (character data).
 -C
@@ -263,7 +266,7 @@
This program establishes a pattern for some of the parameters. The parameters that represent a "from" use lower case short characters and the parameters that represent a "to" use upper case short characters. For short parameters that have both a "from" and a "to", they use the same character with their case being different.

- The default behavior is to assume the expected input is byte code from the command line to be output to the screen as codepoints.
+ The default behavior is to assume the expected input is byte sequence from the command line to be output to the screen as codepoints.

Multiple input sources are allowed but only a single output destination is allowed.
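
Reviewer note: the byte sequence / code point relationship that this documentation describes can be illustrated with a short Python sketch. This is not the utf8 program's implementation and does not use its parameters. The function names below are hypothetical, and the `U+XXXX` formatting is an assumption about presentation rather than the program's actual output format.

```python
def bytes_to_codepoints(data: bytes) -> list:
    """Decode a UTF-8 byte sequence and list its code points as U+XXXX strings."""
    # Strict decoding raises UnicodeDecodeError on an invalid byte sequence.
    text = data.decode("utf-8")
    return [f"U+{ord(ch):04X}" for ch in text]


def codepoint_to_bytes(codepoint: int) -> bytes:
    """Encode one Unicode code point as its UTF-8 byte sequence."""
    # Surrogate code points (U+D800..U+DFFF) cannot be encoded and raise
    # UnicodeEncodeError.
    return chr(codepoint).encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Report whether the bytes form a well-formed UTF-8 sequence."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    euro = b"\xe2\x82\xac"              # UTF-8 byte sequence for the euro sign
    print(bytes_to_codepoints(euro))    # code point(s) for that sequence
    print(codepoint_to_bytes(0x20AC))   # back to the byte sequence
    print(is_valid_utf8(b"\xe2\x82"))   # truncated sequence is not valid
```

The same code point (here U+20AC) identifies the character regardless of encoding, which is the point the revised first paragraph makes about UTF-8 versus UTF-16.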