• Default Character Set when none is defined

    From Scott Street@1:266/625 to All on Tue Jun 3 09:45:20 2025
    Fellow Developers,

    I've been tinkering with writing a Python-based tosser/packer. One thing I've noticed in processing mail: not all messages contain a character set kludge line (and I find, to my surprise, that my own current BBS doesn't support it -cringe-). Though, more to the specific question: what character set does one use as a default when one is not defined?

    Do I use ASCII, CP437, CP850, or something else? (I'm in the US, so gut reaction is to use CP437) But is that the right choice?

    Secondly, FTS documents suggest that the character set is to be applied to the body and header portions of the message. Does that include the kludge lines? I'm currently handling them separately as 'ascii', which, given the history of Fidonet, I chose as the more likely answer.

    Looking for your thoughts.

    And if a different echo is more proper, please point me in that direction and I'll continue the discussion there.


    Scott

    --- Mystic BBS v1.12 A49 2024/05/29 (Linux/64)
    * Origin: <=-{ The Digital Post }-=> (1:266/625)
  • From deon@3:633/509 to Scott Street on Wed Jun 4 08:07:31 2025
    Re: Default Character Set when none is defined
    By: Scott Street to All on Tue Jun 03 2025 09:45 am

    Hey Scott,

    Though, more to the specific question: what character set does
    one use as a default when one is not defined?

    Do I use ASCII, CP437, CP850, or something else? (I'm in the US, so gut reaction is to use CP437) But is that the right choice?

    The CHRS kludge is for the benefit of the reader, not the sender.

    It basically tells the reader that this message is encoded in a particular character set (CP.., ASCII, etc.), so that if:
    * The reader uses the same encoding, it can display the message as is, OR
    * The reader uses a different encoding, it can convert it from that character set to the one it uses.
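
    A minimal Python sketch of that reader-side conversion. The identifier-to-codec table, the function names and the CP437 fallback are assumptions made for illustration, not something any of the mailers in this thread are known to implement; the kludge is assumed to appear as SOH (0x01) followed by "CHRS: <identifier> <level>".

```python
import re

# Hypothetical mapping from CHRS identifiers to Python codec names.
# Only a handful of identifiers are listed; extend as needed.
CHRS_TO_CODEC = {
    "ASCII": "ascii",
    "CP437": "cp437",
    "CP850": "cp850",
    "CP866": "cp866",
    "LATIN-1": "latin-1",
    "UTF-8": "utf-8",
}

# SOH, the kludge name, then the identifier (printable ASCII, no spaces).
CHRS_RE = re.compile(rb"\x01CHRS: *([!-~]+)")


def declared_codec(raw_body: bytes):
    """Return the Python codec named by the CHRS kludge, or None."""
    match = CHRS_RE.search(raw_body)
    if match is None:
        return None
    identifier = match.group(1).decode("ascii").upper()
    return CHRS_TO_CODEC.get(identifier)


def to_reader(raw_body: bytes, reader_codec: str, fallback: str = "cp437") -> bytes:
    """Re-encode a message body from the sender's charset to the reader's."""
    codec = declared_codec(raw_body) or fallback
    text = raw_body.decode(codec, errors="replace")
    return text.encode(reader_codec, errors="replace")
```

    If the reader already uses the declared encoding, the bytes pass through unchanged; otherwise characters the reader's encoding cannot represent are replaced with "?".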

    So what's the right choice for you? I'm assuming you (and your users) author messages in the same encoding, so that is what it should be set to.

    I add CP437 to messages that my mailer authors, when sending out echomail/netmail.

    Secondly, FTS documents suggest that the character set is to be applied to the body and header portions of the message. Does that include the kludge lines? I'm currently handling them separately as 'ascii', which, given the history of Fidonet, I chose as the more likely answer.

    So, the To, From, Subject, and Message Body (including Tagline/Origin lines) can all technically be changed by the sender, and are thus in the sender's encoding.

    I don't recall what the FTSC documents say (there is one on the CHRS kludge), but I think it should apply to all of them, so that a reader knows how to present the received message to a user.
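
    One way to sketch that split in Python: header fields and body text are decoded with the sender's charset, while kludge lines are decoded as ASCII with any stray high bytes escaped rather than dropped. The function name, the dict-of-bytes header layout and the CR line splitting are assumptions for illustration only.

```python
def decode_message(headers: dict, raw_body: bytes, codec: str):
    """Decode header fields and body text with `codec`; kludges as ASCII.

    `headers` maps field names (e.g. "to", "from", "subject") to raw bytes.
    Returns (decoded_headers, kludge_lines, body_text).
    """
    kludges = []
    text_lines = []
    for line in raw_body.split(b"\r"):
        if line.startswith(b"\x01"):
            # Kludge line: drop the SOH, keep non-ASCII bytes visible as \xNN.
            kludges.append(line[1:].decode("ascii", errors="backslashreplace"))
        else:
            # Body text, tagline and origin line use the sender's charset.
            text_lines.append(line.decode(codec, errors="replace"))
    decoded_headers = {
        name: value.decode(codec, errors="replace")
        for name, value in headers.items()
    }
    return decoded_headers, kludges, "\n".join(text_lines)
```

    Using "backslashreplace" for kludges means an unexpected 8-bit byte shows up in the output instead of silently becoming a replacement character.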


    --- SBBSecho 3.27-Linux
    * Origin: I'm playing with ANSI+videotex - wanna play too? (3:633/509)
  • From Rob Swindell@1:103/705 to Scott Street on Tue Jun 3 17:16:35 2025
    Re: Default Character Set when none is defined
    By: Scott Street to All on Tue Jun 03 2025 09:45 am

    Though, more to the specific question: what character set does one use as a default when one is not defined?

    Do I use ASCII, CP437, CP850, or something else? (I'm in the US, so gut reaction is to use CP437) But is that the right choice?

    Here's what FTS-5003 says about that:
    Incoming messages without "CHRS" control lines should be considered
    as being written in pure ASCII, but may be treated as being written
    in some default character set or character encoding scheme. Such as
    IBM codepage 437, IBM codepage 866 or UTF-8. It is recommended that
    message readers offer the user the option of manually selecting a
    different character set or encoding scheme for these messages on a
    per-area, per-message or other basis.

    For Synchronet, CP437 is assumed when no other character set/encoding is explicitly specified.
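
    The quoted rule can be sketched in Python roughly as follows: treat the message as pure ASCII when it is, otherwise fall back to a configurable default (CP437 here), and let a user-selected override win over both. The function and parameter names are hypothetical.

```python
def decode_without_chrs(raw: bytes, default: str = "cp437", override=None) -> str:
    """Decode a message that carries no CHRS kludge.

    `override` stands in for a per-area or per-message choice made by the
    user; `default` is the charset assumed for 8-bit text.
    """
    if override:
        return raw.decode(override, errors="replace")
    try:
        # Pure 7-bit text decodes the same way under any of the candidates.
        return raw.decode("ascii")
    except UnicodeDecodeError:
        return raw.decode(default, errors="replace")
```

    For example, decode_without_chrs(data, override="cp866") would cover an area known to carry Russian-language mail without kludges.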

    Secondly, FTS documents suggest that the character set is to be applied to the body and header portions of the message. Does that include the kludge lines? I'm currently handling them separately as 'ascii', which, given the history of Fidonet, I chose as the more likely answer.

    Interesting question. Are you actually finding non-ASCII chars in kludge lines? I'd be curious what those are (the kludge lines/values).

    And if a different echo is more proper, please point me in that direction and I'll continue the discussion there.

    Maybe FTSC_PUBLIC would be more appropriate for FTN development questions. I seem to recall a NET_DEV echo too, though I don't think it gets much participation.
    --- SBBSecho 3.27-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Scott Street@1:266/625 to deon on Tue Jun 3 21:25:52 2025
    On 04 Jun 2025, deon said the following...
    The CHRS kludge is for the benefit of the reader, not the sender.

    I was thinking I would need to do some translation before a message was stored, but I have since changed my thinking on that. I'll let the BBS translate from the writer's charset to the reader's as needed.
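
    A rough Python sketch of that store-as-received approach: the tosser keeps the original bytes plus the charset it believes they are in, and translation happens only when a reader asks for the message. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredMessage:
    """A message kept exactly as received, with its assumed charset."""

    raw_body: bytes
    charset: str  # Python codec name, e.g. "cp437"

    def render(self, reader_codec: str) -> bytes:
        """Translate to the reader's charset at display time."""
        text = self.raw_body.decode(self.charset, errors="replace")
        return text.encode(reader_codec, errors="replace")
```

    Because the stored bytes are never rewritten, a wrong charset guess can be corrected later without having lost anything.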

    Thanks for the reply.

    --- Mystic BBS v1.12 A49 2024/05/29 (Linux/64)
    * Origin: <=-{ The Digital Post }-=> (1:266/625)
  • From Scott Street@1:266/625 to Rob Swindell on Tue Jun 3 21:28:58 2025
    On 03 Jun 2025, Rob Swindell said the following...
    Here's what FTS-5003 says about that:
    Incoming messages without "CHRS" control lines should be considered
    as being written in pure ASCII, but may be treated as being written
    in some default character set or character encoding scheme. Such as
    IBM codepage 437, IBM codepage 866 or UTF-8. It is recommended that
    message readers offer the user the option of manually selecting a
    different character set or encoding scheme for these messages on a
    per-area, per-message or other basis.

    Perfect. I missed that when reading the docs, but it does spell out what I had thought.

    Interesting question. Are you actually finding non-ASCII chars in kludge lines? I'd be curious what those are (the kludge lines/values).

    No, not in my small sample of about 500 messages, but I wanted to be prepared.
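
    A check like the following could be run over a larger sample to look for such cases. It is only a sketch: it assumes body lines are separated by CR and that kludge lines start with SOH (0x01), and the function name is made up.

```python
def non_ascii_kludges(raw_body: bytes) -> list:
    """Return the kludge lines in a message body that contain 8-bit bytes."""
    found = []
    for line in raw_body.split(b"\r"):
        line = line.lstrip(b"\n")  # tolerate stray line feeds after a CR
        if line.startswith(b"\x01") and any(byte > 0x7F for byte in line):
            found.append(line)
    return found
```

    Logging the returned lines together with the originating address would show which software, if any, emits them.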

    Maybe FTSC_PUBLIC would be more appropriate for FTN development
    questions. I seem to recall a NET_DEV echo too, though I don't think it gets much participation.

    Indeed, thanks for those. I thought I remembered echoes specific to development, but it's been a while.

    Thanks for the reply.

    --- Mystic BBS v1.12 A49 2024/05/29 (Linux/64)
    * Origin: <=-{ The Digital Post }-=> (1:266/625)