GNU roff - Bugs

bug #63985: [troff] diagnose when attempting to remove an ordinary character

Submitter:  G. Branden Robinson <gbranden>
Submitted:  Fri 31 Mar 2023 04:43:10 PM UTC
   
 
Category:  Core
Severity:  3 - Normal
Item Group:  Warning/Suspicious behaviour
Status:  Need Info
Privacy:  Public
Assigned to:  gbranden
Open/Closed:  Open
Planned Release:  None

Thu 10 Aug 2023 01:16:07 PM UTC, comment #5: 


comment #3:

> I hit an impediment to implementing this.  Almost everything in the tree is fine with it; all but one automated test passes.  The exception is something internal to the mom(7) package which attempts to remove a whole bunch of ordinary ASCII/Basic Latin characters.  So this is on hold pending my exploration of mom internals and a discussion with Peter Schaffter over alternative solutions or whether, in fact, what mom is doing today should block this change.


Need to do this research/have this discussion.

G. Branden Robinson <gbranden>
Group administrator
Mon 10 Jul 2023 08:46:33 AM UTC, comment #4: 

Just a cross reference.


commit 1ddcdf7c3ea8a8976ea0269cb1561b9ef101fbf3
Author: G. Branden Robinson <g.branden.robinson@gmail.com>
Date:   Sun Apr 2 10:47:33 2023 -0500

    [troff]: Revise ordinary character descriptions.

    * src/roff/troff/input.cpp (token::description): Revise construction of
      description of printable ordinary input characters (U+0021 through
      U+007E).  This is to faciliate better diagnostics from the `rchar`
      request in the future.  See Savannah #63985.


(Typo fixed now in my working copy.)

G. Branden Robinson <gbranden>
Group administrator
Wed 05 Apr 2023 09:44:17 PM UTC, comment #3: 

comment #1:

> There's no differentiation between input and output in that snippet, so in case anyone is confused by it, Branden must have typed a ^D after the .pl line.


Yep--I pasted my shell session and moved on without thinking much about readability.  Whoops!

comment #2:

> This problem is not limited to characters in the ASCII range; it seems to apply to any Latin-1 (groff's native input encoding) character.  (The following uses a Latin-1-encoded input file and a Latin-1 output environment.)


> $ cat rchar_test
> .nf
> äbc
> .rchar ä
> äbc
> .pl \n(nlu
> $ nroff -ww rchar_test
> äbc
> äbc


I think this is because the printable characters in the Unicode Latin-1 supplement (U+00A0..U+00FF) are first-class citizens to groff.  (Because CCSID ["code page"] 1047 is a rearrangement of ISO 8859 Latin-1, and because GNU troff is compiled expecting one or the other as its input encoding, the same characters are first-class citizens in it despite their different code points.)

The planned (but unscheduled) migration to accept UTF-8 input will abandon that support in favor of being able to interpret UTF-8 multibyte sequences.

Anyway, as a bit of status, I hit an impediment to implementing this.  Almost everything in the tree is fine with it; all but one automated test passes.  The exception is something internal to the mom(7) package which attempts to remove a whole bunch of ordinary ASCII/Basic Latin characters.  So this is on hold pending my exploration of mom internals and a discussion with Peter Schaffter over alternative solutions or whether, in fact, what mom is doing today should block this change.

G. Branden Robinson <gbranden>
Group administrator
Wed 05 Apr 2023 01:02:12 AM UTC, comment #2: 

This problem is not limited to characters in the ASCII range; it seems to apply to any Latin-1 (groff's native input encoding) character.  (The following uses a Latin-1-encoded input file and a Latin-1 output environment.)

$ cat rchar_test
.nf
äbc
.rchar ä
äbc
.pl \n(nlu
$ nroff -ww rchar_test
äbc
äbc


Dave <barx>
Group Member
Tue 04 Apr 2023 10:54:31 PM UTC, comment #1: 

There's no differentiation between input and output in that snippet, so in case anyone is confused by it, Branden must have typed a ^D after the .pl line.

Dave <barx>
Group Member
Fri 31 Mar 2023 04:43:10 PM UTC, original submission:  

Attempts to undefine ordinary characters (U+0021..U+007E) silently fail.


$ nroff
.nf
abc
.rchar a
abc
.pl \n(nlu
abc
abc


They should gripe at the user instead.  This would also help users who might assume that they only need to specify a special character identifier as an argument to `rchar` or `rfschar`.  Instead you need the full special character escape sequence syntax.

G. Branden Robinson <gbranden>
Group administrator

 


Depends on the following items: None found

Items that depend on this one: None found

 

Carbon-Copy List
  • -email is unavailable- added by barx (Posted a comment)
  • -email is unavailable- added by gbranden (Submitted the item)

    There are 0 votes so far.

     

    3 latest changes:

    Date        Changed by  Updated Field  Previous Value => Replaced by
    2023-08-10  gbranden    Status         Postponed => Need Info
    2023-04-05  gbranden    Status         None => Postponed
                            Assigned to    None => gbranden
