bug #66143: build doesn't find Homebrew uchardet on macOS 14.6.1

Submitter:  Sven Schober <sschober>
Submitted:  Thu 29 Aug 2024 06:49:22 PM UTC
   
 
Category:  General
Severity:  3 - Normal
Item Group:  Documentation
Status:  Fixed
Privacy:  Public
Assigned to:  gbranden
Open/Closed:  Closed
Planned Release:  1.24.0

Tue 22 Oct 2024 12:17:30 AM UTC, comment #31: 


commit 9b426f27bc9aafd05726059f5eec488da4c6a31e
Author: G. Branden Robinson <g.branden.robinson@gmail.com>
Date:   Sat Oct 19 12:25:54 2024 -0500

    PROBLEMS: Document macOS/Homebrew/uchardet issue.

    Add item and document workaround under "groff 1.22.4" because it likely
    affected that release as well; Bertrand added uchardet as an optional
    dependency in 2017, and the next groff release was in December 2018.

    Fixes <https://savannah.gnu.org/bugs/?66143>.  Thanks to Sven Schober
    for the report and for determining the workaround.


G. Branden Robinson <gbranden>
Group administrator
Tue 15 Oct 2024 04:44:17 PM UTC, comment #30: 

Hi Branden,
hi Dave!

Thanks for bringing this issue to a close. I was wondering what more I could contribute so that all these words don't go to waste. :)

I agree a notice in the repo probably is the best resolution.

Cheers
Sven

Sven Schober <sschober>
Mon 14 Oct 2024 08:29:38 PM UTC, comment #29: 


comment #28:

> comment #24:
> > > And I will try to
> > > think about a proposal on what to do. (I mean, after all, it could
> > > boil down to a INSTALL.REPO sentence: "Beware, when on macos and using
> > > homebrew, do not forget to set HOMEBREW_ISYSTEM_PATHS during
> > > configure.")
> >
> > It would be more an issue for the "PROBLEMS" file, but yes.  We can
> > document our way around it.
>
> This exchange says to me that rather than Rejected or Invalid, it should be reopened and classified as a documentation issue.


Ah, fair point.  And that also tells me where to look for language to put in "PROBLEMS".

Thanks, Dave!  Reopening and fiddling.

G. Branden Robinson <gbranden>
Group administrator
Mon 14 Oct 2024 08:24:01 PM UTC, comment #28: 

comment #24:

> > And I will try to
> > think about a proposal on what to do. (I mean, after all, it could
> > boil down to a INSTALL.REPO sentence: "Beware, when on macos and using
> > homebrew, do not forget to set HOMEBREW_ISYSTEM_PATHS during
> > configure.")
>
> It would be more an issue for the "PROBLEMS" file, but yes.  We can
> document our way around it.


This exchange says to me that rather than Rejected or Invalid, it should be reopened and classified as a documentation issue.

Dave <barx>
Group Member
Mon 14 Oct 2024 06:04:05 PM UTC, comment #27: 

Would "Invalid" be a better status?  I can't decide.

Anyway, forgot to close. Closing.

G. Branden Robinson <gbranden>
Group administrator
Mon 14 Oct 2024 06:03:05 PM UTC, comment #26: 

I'm resolving this as "Rejected" because apparently this comes down to a difference of opinion between me and the Homebrew developers, or the uchardet, uh, "bottlers" over where that library's header file should be found.

My opinion is that it should be sought here:


/usr/include/uchardet/uchardet.h


That's where Debian keeps it, and therefore probably Ubuntu and between them a whole boatload of derivative distributions.

It is also the reasonable place to keep header files for a library with a version number of "0.0.7" (I explain why in comment #22).  That release was more recent than I expected; the Debian packager prepared it in May 2020.

The uchardet developers may in fact feel differently, and I'm interested to know if they've gone on the record thus.  I'd also add that a good way to document such a decision, along with the rest of the API, is to write a man page for the library.  That uchardet(3) doesn't exist is a bug.

I believe Sven has a workaround documented in this ticket's history; I'd appreciate it if he'd follow up pointing to or summarizing it.  I don't want macOS users of groff to suffer, and I am not pleased that they are adversely affected by the foregoing difference of opinion.  In case it need be said, if I didn't think my view was on solid ground, I'd change it.

G. Branden Robinson <gbranden>
Group administrator
Sun 08 Sep 2024 11:02:00 AM UTC, comment #25: 

Ineiev says it's fixed...

https://savannah.nongnu.org/support/?111121

Let's find out!

G. Branden Robinson <gbranden>
Group administrator
Sun 08 Sep 2024 05:31:04 AM UTC, comment #24: 

Well, within a day of my publicly singing the praises of Savannah's email-based ticket reply feature, it broke.

I've tried twice to update this ticket by that means and nothing happens.

Hi Sven,

At 2024-09-07T05:06:11-0400, Sven Schober wrote:

> Follow-up Comment #23, bug #66143 (group groff):
> Thanks again for your very long and elaborate answer!


Can't really help myself!  😅

> > I'm assuming that Autoconf doesn't pick it up, hence the failure on
> > macOS to find the uchardet library's header file.
>
> I did not express myself clear enough: I was talking about the
> homebrew build environment. I can enter it via `brew reinstall -i
> groff`. And it was in that env, that I saw these env variables. And
> there-in I saw that /opt/homebrew/include was "put onto the include
> paths list", meaning, a -I/opt/homebrew/include was appended to the
> clang invocation.
>
> I just followed up on the question how this homebrew specific variable
> is creating any effect:
>
>
> …/tmp/groff-20240905-40241-pj7zgj/groff-1.23.0
> $ grep -i 'HOMEBREW_ISYSTEM_PATHS' -R .
> $
>
>
> Nothing. Ok, I'd half expected that. Why would Autoconf consider such
> a homebrew specific variable? But the question remained...


As far as I know, Autoconf doesn't consider it.

It certainly doesn't show up in the Autoconf-generated files on my
system, including "configure", which contains all the feature tests for
the groff build.

> So, homebrew sets HOMEBREW_ISYSTEM_PATHS in its build env and the shim
> puts it onto the include path list. (Btw. googling for
> HOMEBREW_ISYSTEM_PATHS yields exactly two hits for me, none of which
> are documentation, and none of which were helpful to me.)


My hunch is that HOMEBREW_ISYSTEM_PATHS is a blind alley.

> > No, because what if a distribution puts the "uchardet" directory in
> > a weird place that isn't "/usr/include"?
>
> Yes, like, exactly what homebrew does. :) I was speaking only in the
> context of groff/preconv and it's dependency on uchardet (the "whole
> business" wording was unfortunate). (I would not question the
> motivation and legitimacy of pkg-config in general.)
>
> I am slowly getting the impression, that /usr/include could be
> considered as a distribution managed include path.


That meshes with my understanding.

> (And homebrew is trying to emulate that with /opt/homebrew/include.)


Seems likely, given that if they mess around with /usr/include, they
risk their stuff being stomped on by Apple.

> I read up in the FHS [1] on /usr/include:
>
> > This is where all of the system's general-use include files for the
> > C programming language should be placed.
>
> Mhm, by whom? How is general-use defined?


Don't expect too much of FHS.  It was largely put together by
volunteers.  Standards-writing seems to work best as a compensated
activity.  Which is why tech companies avoid underwriting it.  "How is
this going to make us money?"

Neither is it something the Linux Foundation has elected to spend money
on.

> >> I always understood the c/c++ inclusion mechanism to be simply a
> >> matter of trying all include dirs,
> >
> > Where "all" is probably only "/usr/include", yes.  I don't know off
> > the top of my head how strictly ISO C prescribes this.
>
> I was including (haha) in my head also stuff given via -I on the
> command-line here.


That describes a typical compilation just fine.

Standardized processes are typically a very small subset of what one
encounters in practice.  A standard is the intersection of the
relatively small number of things that practitioners can agree upon.

> > One thing I do know about that standard is that libc header files,
> > meaning those specified by the standard and locatable via "#include
> > <whatever.h>", do not actually have to exist as files.
>
> > The compiler is free to interpret such standard header inclusions in
> > a way that they supply some equivalent to the text of such files.
> > This matters for embedded systems, and is in part a consequence of
> > the fact that ISO C does not take on the responsibility of
> > specifying what a file system is.
>
> Ugh. This, eh, ..., I am having a hard time imagining, what this could
> mean in this context. Probably, this concerns stuff like <stdio.h>,
> <string.h> or such, right?


Right.  I said, "libc header files, meaning those specified by the [ISO
C] standard"...

> But for <uchardet.h> it would probably be safe to assume it is a file?


...which "uchardet.h" is not, so yes, I'd expect it to exist as a plain
file.

I said what I did to avoid saying something so general that it would be
false.

> > I reach the opposite conclusion.  It's pretty obvious that the
> > uchardet library API is wholly unspecified, including the location
> > of its header file.  (I do not say undefined--once you do locate
> > "uchardet.h", it has valid C language declarations.)
>
> My problem would not have arisen at least, but that is surely a very
> limited perspective. But also on other distributions, it would have
> worked, as pkg-config would have found uchardet and put the correct
> path on the include path list. That was my thinking.


In my opinion, Homebrew should alter its "uchardet.pc" file to support
groff's expectation about where "uchardet.h" is located, because as your
research showed, other projects share that expectation.

An alternative is for the uchardet project to document their
expectations, thereby establishing a specified API.  If that then
means that groff needs to change (and to be fair, since as I said,
uchardet's de facto API seems to be fossilized, there seems to be little
prospect of it splitting out a new header file soon), then we can
adapt.

If they need advice in writing a uchardet(3) man page, I'm prepared to
offer it.

> I have to admit, I often have to read your answers multiple times to
> extract all your meaning from it. :) I will do that. And I will try to
> think about a proposal on what to do. (I mean, after all, it could
> boil down to a INSTALL.REPO sentence: "Beware, when on macos and using
> homebrew, do not forget to set HOMEBREW_ISYSTEM_PATHS during
> configure.")


It would be more an issue for the "PROBLEMS" file, but yes.  We can
document our way around it.

Regards,
Branden

> {savane: user = 108747; tracker = bugs; item = 66143}

G. Branden Robinson <gbranden>
Group administrator
Sat 07 Sep 2024 09:06:07 AM UTC, comment #23: 

Hi Branden!

Thanks again for your very long and elaborate answer!

>> But, I ask myself, why this is not necessary in the homebrew env.
>>
>> I found the following:
>>
>>
>> env | grep include
>> HOMEBREW_INCLUDE_PATHS=/opt/homebrew/opt/readline/include:/opt/homebrew/opt/sqlite/include
>> HOMEBREW_ISYSTEM_PATHS=/opt/homebrew/include:/Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk/System/Library/Frameworks/OpenGL.framework/Versions/Current/Headers
>>
>> It seems, they set a variable called HOMEBREW_ISYSTEM_PATHS. But how configure
>> gets to pick that up is still a riddle to me.
>
> I'm assuming that Autoconf doesn't pick it up, hence the failure on
> macOS to find the uchardet library's header file.


I did not express myself clearly enough: I was talking about the homebrew build environment. I can enter it via `brew reinstall -i groff`. It was in that env that I saw these env variables, and there I also saw that /opt/homebrew/include was "put onto the include paths list", meaning that a -I/opt/homebrew/include was appended to the clang invocation.

I just followed up on the question how this homebrew specific variable is creating any effect:


…/tmp/groff-20240905-40241-pj7zgj/groff-1.23.0
$ grep -i 'HOMEBREW_ISYSTEM_PATHS' -R .
$


Nothing. Ok, I'd half expected that. Why would Autoconf consider such a homebrew specific variable? But the question remained...


$ which clang++
/opt/homebrew/Library/Homebrew/shims/mac/super/clang++


Aha! A shim! A mac super shim. :)


grep -i 'ISYSTEM_PATHS' /opt/homebrew/Library/Homebrew/shims/mac/super/clang++ -C5
    end
  end

  def refurbished_args
    @lset = Set.new(library_paths + system_library_paths)
    @iset = Set.new(isystem_paths + include_paths)

    args = []
    enum = @args.each

    loop do
--
    args
  end

  def cppflags
    args = []
    args += path_flags("-isystem", isystem_paths) + path_flags("-I", include_paths)
    # Add -nostdinc when building against glibc@2.13 to avoid mixing system and brewed glibc headers.
    args << "-nostdinc" if @deps.include?("glibc@2.13")
    # Ideally this would be -ffile-prefix-map, but that requires a minimum of GCC 8, LLVM Clang 10 or Apple Clang 12
    # and detecting the version dynamically based on what `HOMEBREW_CC` may have been rewritten to point to is awkward
    args << "-fdebug-prefix-map=#{formula_buildpath}=." if formula_buildpath
--
    else
      args
    end
  end

  def isystem_paths
    path_split("HOMEBREW_ISYSTEM_PATHS")
  end

  def include_paths
    path_split("HOMEBREW_INCLUDE_PATHS")
  end


So, homebrew sets HOMEBREW_ISYSTEM_PATHS in its build env and the shim puts it onto the include path list. (Btw. googling for HOMEBREW_ISYSTEM_PATHS yields exactly two hits for me, none of which are documentation, and none of which were helpful to me.)
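
As a rough illustration of what the shim's `cppflags` method appears to do, the following shell sketch splits two colon-separated variables and turns each entry into a compiler flag. The function name `path_flags` mirrors the Ruby helper quoted above, but its body here is an assumption (the real helper may, for example, skip directories that do not exist), and the sample values are shortened from the `env` output.

```shell
#!/bin/sh
# Sketch only: mimic how a compiler shim could turn colon-separated
# path variables into preprocessor flags.  Not Homebrew's actual code.

# path_flags FLAG LIST: print "FLAG<dir>" for every non-empty entry of
# the colon-separated LIST, separated by single spaces.
path_flags() {
    flag=$1
    list=$2
    out=
    old_ifs=$IFS
    IFS=:
    for dir in $list; do
        [ -n "$dir" ] || continue
        out="${out:+$out }$flag$dir"
    done
    IFS=$old_ifs
    printf '%s\n' "$out"
}

# Sample values, shortened from the environment shown above.
HOMEBREW_ISYSTEM_PATHS=/opt/homebrew/include
HOMEBREW_INCLUDE_PATHS=/opt/homebrew/opt/readline/include:/opt/homebrew/opt/sqlite/include

shim_cppflags="$(path_flags -isystem "$HOMEBREW_ISYSTEM_PATHS") $(path_flags -I "$HOMEBREW_INCLUDE_PATHS")"
printf '%s\n' "$shim_cppflags"
```

Because the flags are added by the compiler wrapper rather than by configure, they would never show up in a grep of the groff source tree, which matches the empty grep result above.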

>> But doesn't this mean, my point still stands: the whole pkg-config
>> business is superfluous,
>
> No, because what if a distribution puts the "uchardet" directory in a
> weird place that isn't "/usr/include"?


Yes, like, exactly what homebrew does. :) I was speaking only in the context of groff/preconv and its dependency on uchardet (the "whole business" wording was unfortunate). (I would not question the motivation and legitimacy of pkg-config in general.)

I am slowly getting the impression, that /usr/include could be considered as a distribution managed include path. (And homebrew is trying to emulate that with /opt/homebrew/include.) I read up in the FHS [1] on /usr/include:

> This is where all of the system's general-use include files for the C programming language should be placed.


Mhm, by whom? How is general-use defined?

>> I always understood the c/c++ inclusion mechanism to be simply a
>> matter of trying all include dirs,
>
> Where "all" is probably only "/usr/include", yes.  I don't know off the
> top of my head how strictly ISO C prescribes this.


I was including (haha) in my head also stuff given via -I on the command-line here.

> One thing I do know about that standard is that libc header files,
> meaning those specified by the standard and locatable via "#include
> <whatever.h>", do not actually have to exist as files.


> The compiler is free to interpret such standard header inclusions in a
> way that they supply some equivalent to the text of such files.  This
> matters for embedded systems, and is in part a consequence of the fact
> that ISO C does not take on the responsibility of specifying what a file
> system is.


Ugh. This, eh, ..., I am having a hard time imagining, what this could mean in this context. Probably, this concerns stuff like <stdio.h>, <string.h> or such, right? But for <uchardet.h> it would probably be safe to assume it is a file?

>> In summary, I don't know. I now know how to fix my build problem, but
>> I have a hunch, that removing the directory prefix would be a
>> portability win.
>
> I reach the opposite conclusion.  It's pretty obvious that the
> uchardet library API is wholly unspecified, including the location of
> its header file.  (I do not say undefined--once you do locate
> "uchardet.h", it has valid C language declarations.)


My problem would not have arisen at least, but that is surely a very limited perspective. But also on other distributions, it would have worked, as pkg-config would have found uchardet and put the correct path on the include path list. That was my thinking.

I have to admit, I often have to read your answers multiple times to extract all the meaning from them. :) I will do that. And I will try to think about a proposal on what to do. (I mean, after all, it could boil down to an INSTALL.REPO sentence: "Beware, when on macos and using homebrew, do not forget to set HOMEBREW_ISYSTEM_PATHS during configure.")

[1]: https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch04s05.html

Sven Schober <sschober>
Sat 07 Sep 2024 05:09:12 AM UTC, comment #22: 

At 2024-09-06T15:03:52-0400, Sven Schober wrote:

> Follow-up Comment #21, bug #66143 (group groff):
>
> So, me again: I found the problem! I called make with CPPFLAGS=-v,
> which displays the exact command line that is executed, in both my
> environments (homebrew and outside).
>
> Inside the homebrew environment, there is a single additional include
> path added: /opt/homebrew/include
>
> And lo and behold:
>
>
> ls -ahl /opt/homebrew/include/ | grep uchar
> lrwxr-xr-x    1 svenschober  admin    41B Feb 13  2023 uchardet ->
> ../Cellar/uchardet/0.0.8/include/uchardet


Aha!

> That is a symlink to the other place, where uchardet is installed,
> and more importantly, that is why an include of the form
> `<uchardet/uchardet.h>` is working:
> `/opt/homebrew/include/uchardet/uchardet.h` is existant.
>
> But not when that include path is missing.


Right.

> I can add that path via:
>
> ../configure --without-x --with-uchardet CPPFLAGS="-I/opt/homebrew/include"
>
>
> But, I ask myself, why this is not necessary in the homebrew env.
>
> I found the following:
>
>
> env | grep include
> HOMEBREW_INCLUDE_PATHS=/opt/homebrew/opt/readline/include:/opt/homebrew/opt/sqlite/include
> HOMEBREW_ISYSTEM_PATHS=/opt/homebrew/include:/Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk/System/Library/Frameworks/OpenGL.framework/Versions/Current/Headers
>
> It seems, they set a variable called HOMEBREW_ISYSTEM_PATHS. But how configure
> gets to pick that up is still a riddle to me.


I'm assuming that Autoconf doesn't pick it up, hence the failure on
macOS to find the uchardet library's header file.

> But doesn't this mean, my point still stands: the whole pkg-config
> business is superfluous,


No, because what if a distribution puts the "uchardet" directory in a
weird place that isn't "/usr/include"?

> as long as the cpp source references
> `<uchardet/uchardet.h>`?


That just moves the problem.  "uchardet.h" or "uchardet/uchardet.h", if
the system doesn't supply it in /usr/include, odds are that pkg-config
will be necessary to locate it.  And nobody, as far as I know, is so
prescriptivist about the Filesystem Hierarchy Standard (FHS) that they
require that .h files have names resolvable only from that directory.

> I always understood the c/c++ inclusion mechanism to be simply a
> matter of trying all include dirs,


Where "all" is probably only "/usr/include", yes.  I don't know off the
top of my head how strictly ISO C prescribes this.

One thing I do know about that standard is that libc header files,
meaning those specified by the standard and locatable via "#include
<whatever.h>", do not actually have to exist as files.

The compiler is free to interpret such standard header inclusions in a
way that they supply some equivalent to the text of such files.  This
matters for embedded systems, and is in part a consequence of the fact
that ISO C does not take on the responsibility of specifying what a file
system is.

> one by one and append the inclusion path.


For just about anything that uses any library that isn't libc, the
specification of `-I` is somewhere from strongly encouraged to
mandatory.  /usr/include itself has been getting more crowded with plain
files for decades; as a result the name space collision problem
re-asserts itself.

> If there is no match, we get the error I am getting.  But I could be
> wrong.


I think you're mostly on track.

> I did some further digging and found another project using uchardet
> via uchardets website:
>
> https://github.com/mpv-player/mpv/blob/0eb5e914d9699b6f7fb91ee383dedadc491b0d7d/misc/charset_conv.c#L67
>
> They also include it without the directory. In this old branch of
> another project the same:
>
> https://gitlab.gnome.org/World/gedit/libgedit-tepl/-/blob/tepl-3-0/tepl/tepl-file-loader.c
>
> But, I also found counter examples (using the qualified form):
>
> https://github.com/wang-bin/QtAV/blob/8bb780215bcd4a16d098a2a913d01f83b16193d7/config.tests/uchardet/main.cpp#L21
>
> and
>
> https://github.com/eranif/codelite/blob/345332fca8ae6d29ca6615dccec81d92525bd6f1/LiteEditor/cl_editor.cpp#L102


Right.  The uchardet authors/maintainers didn't document their API, so
people took guesses.  I suspect that users of that API selected whatever
happened to work on their system; if the first try didn't work, the
second usually did, so it "must have been right".

> In summary, I don't know. I now know how to fix my build problem, but
> I have a hunch, that removing the directory prefix would be a
> portability win.


I reach the opposite conclusion.  It's pretty obvious that the
uchardet library API is wholly unspecified, including the location of
its header file.  (I do not say undefined--once you do locate
"uchardet.h", it has valid C language declarations.)

This unspecified status is common in the early stages of library
development.  The developers don't want to be wedded to mistaken
decisions or held responsible for support of aspects of the interface
that they had reason to reconsider.  They want to be free to delete
functions, change the arguments they take, rename data types, and even
merge or split header files.

Importantly, if the uchardet developers had decided in early days to
split "uchardet.h" into multiple files--maybe they'd want a "uchar.h"
file too--it would be a pain in the butt to go and try to hunt up name
collisions to see if they could claim a particular name.

I reckon that at least some package maintainers, especially those with
hard experience in GNU/Linux distributions, appreciated that the API was
in flux, and so they gave uchardet's header file a sandbox directory.

If someone were to ask:

"Why would you have a subdirectory of /usr/include for just one header
file with the same name (barring the .h suffix) as the directory
itself?"

That'd be why.  People downstream of the uchardet developers had no
idea what the future would hold.

It appears that uchardet began life as a paid development project of
employees of the Mozilla Corporation.  It's possible that these
developers were suddenly "made redundant" and thus "uchardet.h"
"stabilized"--by fossilizing as it was when the pink slips came down,
like the half-finished breakfasts and open newspapers on the kitchen
tables of Pripyat, Ukraine, in April 1986.

That's tech sector capitalism for you--destruction is creative and while
you can't turn a ship on a dime, you can fire people so fast that the
optical Doppler shift is visible to the naked eye.

> Otherwise, why use pkg-config then after all?


To an extent, pkg-config insulates software dependent on a library
from disruptive changes to that library's interface at the coarsest
level (where the header files and library objects are, and what they are
named).

There was, and is, an alternative--writing Autoconf tests to figure
these things out.  But few people find that more reliable, or enjoyable.

Regards,
Branden

G. Branden Robinson <gbranden>
Group administrator
Fri 06 Sep 2024 07:03:49 PM UTC, comment #21: 

So, me again: I found the problem! I called make with CPPFLAGS=-v, which displays the exact command line that is executed, in both my environments (homebrew and outside).

Inside the homebrew environment, there is a single additional include path added: /opt/homebrew/include

And lo and behold:


ls -ahl /opt/homebrew/include/ | grep uchar
lrwxr-xr-x    1 svenschober  admin    41B Feb 13  2023 uchardet -> ../Cellar/uchardet/0.0.8/include/uchardet


That is a symlink to the other place where uchardet is installed and, more importantly, that is why an include of the form `<uchardet/uchardet.h>` works: `/opt/homebrew/include/uchardet/uchardet.h` exists.

But not when that include path is missing.

I can add that path via:


../configure --without-x --with-uchardet CPPFLAGS="-I/opt/homebrew/include"
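
A small sketch of that workaround, assuming only what this report shows: it checks that the qualified header really exists under a prefix before printing the configure command. The function name `uchardet_cppflags` is made up for illustration; `brew --prefix` is used to ask Homebrew for its prefix when brew is installed.

```shell
#!/bin/sh
# Sketch only: build the CPPFLAGS value for the workaround above, but
# only if the qualified header really exists under the given prefix.

# uchardet_cppflags PREFIX: print "-IPREFIX/include" and return 0 when
# PREFIX/include/uchardet/uchardet.h exists; return 1 otherwise.
uchardet_cppflags() {
    prefix=$1
    if [ -f "$prefix/include/uchardet/uchardet.h" ]; then
        printf '%s\n' "-I$prefix/include"
    else
        return 1
    fi
}

# Ask Homebrew for its prefix if brew is installed; otherwise fall back
# to the prefix seen in this report.
prefix=$(brew --prefix 2>/dev/null) || prefix=/opt/homebrew

if flags=$(uchardet_cppflags "$prefix"); then
    echo "../configure --without-x --with-uchardet CPPFLAGS=\"$flags\""
else
    echo "no uchardet/uchardet.h under $prefix/include" >&2
fi
```

On a machine without Homebrew's uchardet the script only reports that the header is missing; it never runs configure itself.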


But, I ask myself, why this is not necessary in the homebrew env.

I found the following:


env | grep include
HOMEBREW_INCLUDE_PATHS=/opt/homebrew/opt/readline/include:/opt/homebrew/opt/sqlite/include
HOMEBREW_ISYSTEM_PATHS=/opt/homebrew/include:/Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk/System/Library/Frameworks/OpenGL.framework/Versions/Current/Headers


It seems, they set a variable called HOMEBREW_ISYSTEM_PATHS. But how configure gets to pick that up is still a riddle to me.

But doesn't this mean, my point still stands: the whole pkg-config business is superfluous, as long as the cpp source references `<uchardet/uchardet.h>`?

I always understood the C/C++ inclusion mechanism to be simply a matter of trying all include dirs, one by one, with the included path appended to each. If there is no match, we get the error I am getting. But I could be wrong.
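
That mental model can be tried out without a compiler. The sketch below builds a throwaway directory tree shaped like the Homebrew layout in this report and searches a list of include directories for a header name, one directory at a time. The tree and the helper `resolve_include` are illustrative only; a real preprocessor also searches built-in system directories that are not modelled here.

```shell
#!/bin/sh
# Sketch only: emulate "try each include directory in order" for a
# header name, using a temporary tree shaped like the Homebrew layout.

root=$(mktemp -d)
mkdir -p "$root/Cellar/uchardet/0.0.8/include/uchardet" "$root/include"
: > "$root/Cellar/uchardet/0.0.8/include/uchardet/uchardet.h"
# Same relative symlink as in the "ls" output above.
ln -s ../Cellar/uchardet/0.0.8/include/uchardet "$root/include/uchardet"

# resolve_include NAME DIR...: print the first DIR/NAME that exists;
# return 1 if none does (the "file not found" case).
resolve_include() {
    name=$1
    shift
    for dir in "$@"; do
        if [ -f "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    return 1
}

# Directory named by the pkg-config Cflags in this report.
pc_dir=$root/Cellar/uchardet/0.0.8/include/uchardet
# Directory that only the Homebrew build environment adds.
brew_dir=$root/include

# Qualified name, pkg-config directory only: no match.
resolve_include uchardet/uchardet.h "$pc_dir" || echo "not found via pkg-config path"
# Unqualified name, pkg-config directory only: match.
resolve_include uchardet.h "$pc_dir"
# Qualified name, with the extra Homebrew directory: match via the symlink.
resolve_include uchardet/uchardet.h "$pc_dir" "$brew_dir"
```

The three calls correspond to the failing build outside Homebrew, the unqualified include some other projects use, and the working build inside the Homebrew environment.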

I did some further digging and found another project using uchardet via uchardet's website:

    https://github.com/mpv-player/mpv/blob/0eb5e914d9699b6f7fb91ee383dedadc491b0d7d/misc/charset_conv.c#L67

They also include it without the directory. In this old branch of another project the same:

    https://gitlab.gnome.org/World/gedit/libgedit-tepl/-/blob/tepl-3-0/tepl/tepl-file-loader.c

But, I also found counter examples (using the qualified form): 

    https://github.com/wang-bin/QtAV/blob/8bb780215bcd4a16d098a2a913d01f83b16193d7/config.tests/uchardet/main.cpp#L21

and

    https://github.com/eranif/codelite/blob/345332fca8ae6d29ca6615dccec81d92525bd6f1/LiteEditor/cl_editor.cpp#L102

In summary, I don't know. I now know how to fix my build problem, but I have a hunch, that removing the directory prefix would be a portability win. Otherwise, why use pkg-config then after all?

Sven Schober <sschober>
Thu 05 Sep 2024 08:30:53 PM UTC, comment #20: 

Hi Branden!

Sorry to keep you so busy with my strange problem and/or setup.

But at least I am learning new stuff! I did not know about:
 

make V=1 preconv


That is great, and there are many situations, where I wish I had known this.

So, I went ahead and investigated the homebrew build process some more. It turns out there is a way to download the sources and get a shell just before the recipe (or "formula", as they call it) is executed. So, I could follow the recipe manually and inspect what is happening.

First observation, the sources build untouched (surprise). This is what I see:


…/tmp/groff-20240905-40241-pj7zgj/groff-1.23.0
$  touch src/preproc/preconv/preconv.cpp

…/tmp/groff-20240905-40241-pj7zgj/groff-1.23.0
$ make V=1 preconv
clang++ -DHAVE_CONFIG_H -I. -I./src/include  -I./src/include -I./lib -I./src/include -I./lib -I/opt/homebrew/Cellar/uchardet/0.0.8/include/uchardet   -g -O2 -MT src/preproc/preconv/preconv-preconv.o -MD -MP -MF src/preproc/preconv/.deps/preconv-preconv.Tpo -c -o src/preproc/preconv/preconv-preconv.o `test -f 'src/preproc/preconv/preconv.cpp' || echo './'`src/preproc/preconv/preconv.cpp
mv -f src/preproc/preconv/.deps/preconv-preconv.Tpo src/preproc/preconv/.deps/preconv-preconv.Po
clang++  -g -O2   -o preconv src/preproc/preconv/preconv-preconv.o libgroff.a -lm -liconv -L/opt/homebrew/Cellar/uchardet/0.0.8/lib -luchardet lib/libgnu.a


So, clang++ is the compiler and pkg-config output is added to the include paths.
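
To see where that `-I` comes from, the `Cflags` line of the `.pc` file can be expanded by hand. The sketch below does so for an abridged copy of the `uchardet.pc` quoted in comment #6; the helper `pc_cflags` is hypothetical and handles only the `${includedir}` variable, whereas real pkg-config does far more.

```shell
#!/bin/sh
# Sketch only: expand the Cflags line of a .pc file by hand to see what
# pkg-config would hand to the compiler.  Real pkg-config does far more.

workdir=$(mktemp -d)
pc=$workdir/uchardet.pc
# Abridged copy of the file quoted in comment #6.
cat > "$pc" <<'EOF'
libdir=/opt/homebrew/Cellar/uchardet/0.0.8/lib
includedir=/opt/homebrew/Cellar/uchardet/0.0.8/include

Name: uchardet
Version: 0.0.8
Libs: -L${libdir} -luchardet
Cflags: -I${includedir}/uchardet
EOF

# pc_cflags FILE: print the Cflags value with ${includedir} substituted.
pc_cflags() {
    includedir=$(sed -n 's/^includedir=//p' "$1")
    sed -n 's/^Cflags: *//p' "$1" | sed 's|[$]{includedir}|'"$includedir"'|g'
}

cflags=$(pc_cflags "$pc")
printf '%s\n' "$cflags"
rm -rf "$workdir"
```

The resulting directory is the one that contains `uchardet.h` itself, so it can satisfy `#include <uchardet.h>` but not, on its own, `#include <uchardet/uchardet.h>`.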

Then, I wanted to see which headers are really considered and learned about the -H flag:


make V=1 CPPFLAGS=-H 2>&1 | grep uchardet
clang++ -DHAVE_CONFIG_H -I. -I./src/include  -I./src/include -I./lib -I./src/include -I./lib -I/opt/homebrew/Cellar/uchardet/0.0.8/include/uchardet -H  -g -O2 -MT src/preproc/preconv/preconv-preconv.o -MD -MP -MF src/preproc/preconv/.deps/preconv-preconv.Tpo -c -o src/preproc/preconv/preconv-preconv.o `test -f 'src/preproc/preconv/preconv.cpp' || echo './'`src/preproc/preconv/preconv.cpp
. /opt/homebrew/include/uchardet/uchardet.h
clang++  -g -O2   -o preconv src/preproc/preconv/preconv-preconv.o libgroff.a -lm -liconv -L/opt/homebrew/Cellar/uchardet/0.0.8/lib -luchardet lib/libgnu.a


The line with the dot shows, that indeed the uchardet.h put on the include path is taken.

Then, I inspected with my newly acquired V=1 skill, what is happening on my machine (outside of the homebrew build and again with unaltered sources):


make V=1 preconv
g++ -std=gnu++11 -DHAVE_CONFIG_H -I. -I.. -I./src/include  -I../src/include -I../lib -I./src/include -I./lib -I/opt/homebrew/Cellar/uchardet/0.0.8/include/uchardet   -g -O2 -MT src/preproc/preconv/preconv-preconv.o -MD -MP -MF src/preproc/preconv/.deps/preconv-preconv.Tpo -c -o src/preproc/preconv/preconv-preconv.o `test -f 'src/preproc/preconv/preconv.cpp' || echo '../'`src/preproc/preconv/preconv.cpp
../src/preproc/preconv/preconv.cpp:30:10: fatal error: 'uchardet/uchardet.h' file not found
#include <uchardet/uchardet.h>
         ^~~~~~~~~~~~~~~~~~~~~
1 error generated.
make: *** [src/preproc/preconv/preconv-preconv.o] Error 1


AHA! The compiler is g++ (I remember that I installed it, when I was experimenting with c++ modules some time ago)!

So, what happens when I switch this to clang++?:


make V=1 CXX=clang++ preconv
clang++ -DHAVE_CONFIG_H -I. -I.. -I./src/include  -I../src/include -I../lib -I./src/include -I./lib -I/opt/homebrew/Cellar/uchardet/0.0.8/include/uchardet   -g -O2 -MT src/preproc/preconv/preconv-preconv.o -MD -MP -MF src/preproc/preconv/.deps/preconv-preconv.Tpo -c -o src/preproc/preconv/preconv-preconv.o `test -f 'src/preproc/preconv/preconv.cpp' || echo '../'`src/preproc/preconv/preconv.cpp
../src/preproc/preconv/preconv.cpp:30:10: fatal error: 'uchardet/uchardet.h' file not found
#include <uchardet/uchardet.h>
         ^~~~~~~~~~~~~~~~~~~~~
1 error generated.
make: *** [src/preproc/preconv/preconv-preconv.o] Error 1


So, this is now the time when I am slowly starting to doubt my sanity. I don't see any differences in the invocations. The clang++ versions called outside homebrew and inside are the same.


Sven Schober <sschober>
Thu 05 Sep 2024 07:17:26 PM UTC, comment #19: 

comment #6:

> For my uchardet header include problem: My package config file reads like this:
>


> cat /opt/homebrew/lib/pkgconfig/uchardet.pc
> libdir=/opt/homebrew/Cellar/uchardet/0.0.8/lib
> includedir=/opt/homebrew/Cellar/uchardet/0.0.8/include
>
> Name: uchardet
> Description: An encoding detector library ported from Mozilla
> Version: 0.0.8
> Requires:
> Libs: -L${libdir} -luchardet
> Libs.private: -lstdc++
> Cflags: -I${includedir}/uchardet


>
> And it seems, this is unaltered from the sources [1]. My pkg-config directly picks this up and emits the following cflags:


Ah, I see I managed to circle back to where you were in comment #6 and had lost track of where this ticket had been.

I went to uchardet's upstream repository and cannot find any documentation of the library API; the closest thing to an example of its use that I could locate is not useful because it expects the "uchardet.h" header to be within the same source tree.

For practical purposes, this library is undocumented.  I feel that to be a significant defect.


G. Branden Robinson <gbranden>
Group administrator
Thu 05 Sep 2024 07:22:29 AM UTC, comment #18: 

Sorry, disordered recipe in comment #17.  (I had some false starts in my shell history and edited it clumsily.)


$ rm -rf build
$ mkdir build
$ ./bootstrap
$ cd build
$ UCHARDET_CFLAGS=-I/hi/sven/here/is/an/example ../configure
$ make -j
$ rm preconv ./src/preproc/preconv/*.o
$ make V=1 preconv


G. Branden Robinson <gbranden>
Group administrator
Thu 05 Sep 2024 07:18:09 AM UTC, comment #17: 

comment #16:

> Thank you for your long answer and background information on C and C++!
>
> > To be honest, I didn't try,
>
> I am a bit sad - but I fully understand your hesitancy.


Sorry, I'm the kind of guy whose first instinct is to offer a graduate seminar in the history of fishing before teaching someone to fish, let alone to think to feed myself or others from the bucket full of slowly putrefying fish next to me.  ;-)
 

> I will try and channel my emotional energy into trying to find out, why and how the homebrew build of groff works. I have a suspicion it is related to setting the prefix explicitly [1].


> [1]: https://github.com/Homebrew/homebrew-core/blob/ece78c067d1462730eb0602077316f5ce76d15ab/Formula/g/groff.rb#L39


I see nothing amiss there.  The `--prefix` flag determines whither components of groff will be installed.  It has little or no influence on where dependencies of the build are sought.

> I am hard pressed to guess what that variable is set to during build, so I will have to try it out.


If a variable is what you're seeking, that I can offer.  I just did the following.  (I build in a subdirectory, as documented in our "INSTALL.REPO" file.)

You might do the same.  Try this.


$ rm -r build
$ mkdir build
$ cd build
$ ./bootstrap
$ UCHARDET_CFLAGS=-I/hi/sven/here/is/an/example ../configure
$ make -j
$ rm preconv ./src/preproc/preconv/*.o
$ make V=1 preconv


The build succeeded for me in spite of the mischief I got up to, so I deleted preconv's executable and object file, and rebuilt verbosely ("V=1") to prove that the overridden `UCHARDET_CFLAGS` was being used.  And indeed it was.
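The precedence at work in that recipe can be sketched in shell. This is a minimal illustration, assuming that the stock `PKG_CHECK_MODULES` macro prefers a non-empty `UCHARDET_CFLAGS` supplied by the user over whatever pkg-config reports. The functions `query_pkg_config` and `choose_cflags` are hypothetical names invented for the sketch, and the query is stubbed so that the example runs without pkg-config installed.

```shell
# Hypothetical stand-in for "pkg-config --cflags uchardet", stubbed with the
# value reported on the Homebrew system so the sketch runs anywhere.
query_pkg_config() {
    echo "-I/opt/homebrew/Cellar/uchardet/0.0.8/include/uchardet"
}

# A non-empty user-supplied value wins; otherwise fall back to the query.
choose_cflags() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"
    else
        query_pkg_config
    fi
}

from_user=$(choose_cflags "-I/hi/sven/here/is/an/example")
from_pc=$(choose_cflags "")

echo "user override: $from_user"
echo "no override:   $from_pc"
```

With the override in place, the deliberately bogus path is what "make V=1 preconv" should show on the compiler command line.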

> Honestly, I do not fully understand, what you mean by this. But maybe my investigation of homebrews own build process will further my understanding.


Libraries that support pkg-config for themselves tell the pkg-config command what to say by means of a ".pc" file.

Example:


$ cat /usr/lib/x86_64-linux-gnu/pkgconfig/libsasl2.pc
prefix=/usr
exec_prefix=/usr
libdir=/usr/lib/x86_64-linux-gnu
includedir=/usr/include

Name: Cyrus SASL
Description: Cyrus SASL implementation
URL: http://www.cyrussasl.org/
Version: 2.1.27
Cflags: -I${includedir}
Libs: -L${libdir} -lsasl2
Libs.private:  -ldl -lresolv


My hypothesis is that the "Cflags:" line in Homebrew uchardet's ".pc" file is incorrect.
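To see what a "Cflags:" line turns into, here is a rough sketch that expands `${includedir}` in a ".pc" file by hand. It is a simplified stand-in for `pkg-config --cflags`, not how pkg-config is implemented; the function name `pc_cflags` is made up for illustration, and the file contents are abridged from the Homebrew "uchardet.pc" quoted in comment #6.

```shell
# Simplified stand-in for "pkg-config --cflags": substitute ${includedir}
# into the Cflags line of a .pc file.  Real pkg-config does far more.
pc_cflags() {
    pc_file=$1
    # Value assigned by the "includedir=" variable definition.
    includedir=$(sed -n 's/^includedir=//p' "$pc_file")
    # Text after "Cflags:", with the variable reference expanded.
    sed -n 's/^Cflags:[[:space:]]*//p' "$pc_file" |
        sed 's|\${includedir}|'"$includedir"'|g'
}

tmp=$(mktemp -d)
cat >"$tmp/uchardet.pc" <<'EOF'
libdir=/opt/homebrew/Cellar/uchardet/0.0.8/lib
includedir=/opt/homebrew/Cellar/uchardet/0.0.8/include

Name: uchardet
Version: 0.0.8
Libs: -L${libdir} -luchardet
Cflags: -I${includedir}/uchardet
EOF

cflags=$(pc_cflags "$tmp/uchardet.pc")
echo "$cflags"
rm -rf "$tmp"
```

The expanded flag ends in "include/uchardet", which is the value that shows up in `UCHARDET_CFLAGS` elsewhere in this ticket.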
 

> My goal is simply to be able to build groff from source on macos with homebrew without needing to patch the groff sources.


Let me know if the foregoing "UCHARDET_CFLAGS" environment variable trick works.

> If it is homebrew, which is at fault, I will certainly open a ticket there. :)


I firmly believe that it should not be necessary to specify "UCHARDET_CFLAGS" to get groff to build on macOS, even if that tactic works.

I'm prepared, with my barrels of ink tapped and ready to flow like draft beer, to argue the point with Homebrew developers, if necessary.

A guy on the groff list named John Gardner has long experience with building groff on Homebrew.  That is one reason I suggested subscribing to the list.  There are doubtless others lurking.

G. Branden Robinson <gbranden>
Group administrator
Thu 05 Sep 2024 06:37:49 AM UTC, comment #16: 

Hi Branden!

Thank you for your long answer and background information on C and C++!

> To be honest, I didn't try,


I am a bit sad - but I fully understand your hesitancy.

I will try and channel my emotional energy into trying to find out, why and how the homebrew build of groff works. I have a suspicion it is related to setting the prefix explicitly [1]. I am hard pressed to guess what that variable is set to during build, so I will have to try it out.

> because I've concluded that Homebrew is misconfiguring uchardet's pkg-config output.


Honestly, I do not fully understand, what you mean by this. But maybe my investigation of homebrews own build process will further my understanding.

> Feel free to direct Homebrew developers to this ticket.  Maybe I'm wrong.


My goal is simply to be able to build groff from source on macos with homebrew without needing to patch the groff sources. If it is homebrew, which is at fault, I will certainly open a ticket there. :)

Thank you again for the time you take to address my problem!

[1]: https://github.com/Homebrew/homebrew-core/blob/ece78c067d1462730eb0602077316f5ce76d15ab/Formula/g/groff.rb#L39

Sven Schober <sschober>
Wed 04 Sep 2024 11:32:52 PM UTC, comment #15: 

comment #14:

> Hi Branden!
>
> To answer your questions first:
>


> grep UCHARDET_ config.status
> S["UCHARDET_LIBS"]="-L/opt/homebrew/Cellar/uchardet/0.0.8/lib -luchardet"
> S["UCHARDET_CFLAGS"]="-I/opt/homebrew/Cellar/uchardet/0.0.8/include/uchardet"


Okay.
 

> and
>


> pkg-config --cflags   "uchardet"
> -I/opt/homebrew/Cellar/uchardet/0.0.8/include/uchardet
> pkg-config --libs   "uchardet"
> -L/opt/homebrew/Cellar/uchardet/0.0.8/lib -luchardet


Okay.  So our build variables are being faithfully populated by the output of pkg-config.  That's good!
 

> So, I think this is equivalent to your path, just with a different prefix.


Looks that way to me, too.
 

> But, in the meantime, I no longer think pkg-config or autotools is at fault here.
>
> My somewhat bold claim would be, that the #include statement with a subdirectory is simply wrong and works only accidentally on systems where /usr/include is on the default search path.


I disagree.  Remember that C is a language without name spaces.  So is C++, when it comes to names of header files.  (Or, at most, there are two name spaces--the "system" name space accessed with `#include <whatever.h>` and the "local" name space accessed with `#include "whatever.h"`.)

Even if every library on your system has complete and accurate pkg-config coverage, that tool cannot control the order in which `-I`-included directories are searched.

And that creates a problem.

What if, as is likely in C, multiple libraries exist with their own header file for fairly basic data structures?

list.h
stack.h
dictionary.h

Or other fairly common concerns?

db.h
charenc.h

Which one the preprocessor will find depends on `-I` flag ordering.
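The ordering effect can be imitated without a compiler. The sketch below assumes two hypothetical libraries, "liba" and "libb", that each ship a "list.h"; the helper `first_match` is an invented name that mimics only the first-hit-wins rule of an `-I` search list, not a real preprocessor.

```shell
# Print the first path in search order that holds the requested header,
# which is the first-hit-wins rule a preprocessor applies to its -I list.
first_match() {
    header=$1; shift
    for dir in "$@"; do
        if [ -f "$dir/$header" ]; then
            printf '%s\n' "$dir/$header"
            return 0
        fi
    done
    return 1
}

tmp=$(mktemp -d)
mkdir -p "$tmp/liba" "$tmp/libb"    # hypothetical libraries
: >"$tmp/liba/list.h"
: >"$tmp/libb/list.h"

# Same request, different directory order, different header found.
a_first=$(first_match list.h "$tmp/liba" "$tmp/libb")
b_first=$(first_match list.h "$tmp/libb" "$tmp/liba")

# Qualifying the name with the package directory removes the ambiguity.
qualified=$(first_match libb/list.h "$tmp")

echo "liba first: ${a_first#"$tmp"/}"
echo "libb first: ${b_first#"$tmp"/}"
echo "qualified:  ${qualified#"$tmp"/}"
rm -rf "$tmp"
```

The qualified request resolves the same way regardless of how the other directories are ordered.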

Thus a tradition arose fairly early in the days of C programming--after the curses library revealed the full horror of the Bell Labs CSRC's failure to contemplate name space matters when designing the language (ironically enough, a matter of "not planning for success")--of stuffing header files for things other than the C standard library into subdirectories named for the project/package, leveraging the hierarchical structure of the file system for a crude but effective name spacing system.

Thus, to take another hoary and early example:

#include <X11/Core.h>
#include <X11/Intrinsics.h>
#include <X11/Xos.h>

If pkg-config existed back then, should it have directed people to rip the "X11/"s out of these directives, trusting that `pkg-config --cflags libx11` and `pkg-config --cflags libxt` would suffice to locate these header files uniquely?

No.

> Works fine on both systems, as only now the pkg-config generated includes are heeded. What do you think? Can you reproduce this on your systems?


To be honest, I didn't try, because I've concluded that Homebrew is misconfiguring uchardet's pkg-config output.

Feel free to direct Homebrew developers to this ticket.  Maybe I'm wrong.

And I could be, because I do note my own comment #13.


$ pkg-config --cflags uchardet
-I/usr/include/uchardet


But I further note that uchardet offers no man page documenting the API, of which the full #include directive is a part in C and C++.  As a programmer, technical writer, and grumpy old man, my opinion of that lacuna can be guessed.  That `pkg-config --cflags uchardet` is working on my system is no endorsement if that fact is due to chance.

Alternatively, or in addition, we have some macOS, and Homebrew, users on the groff mailing list.  So you might mail groff@gnu dot org and ask for opinions from people.  Also please consider subscribing to that list if you're not already.

G. Branden Robinson <gbranden>
Group administrator
Wed 04 Sep 2024 06:15:11 PM UTC, comment #14: 

Hi Branden!

To answer your questions first:


grep UCHARDET_ config.status
S["UCHARDET_LIBS"]="-L/opt/homebrew/Cellar/uchardet/0.0.8/lib -luchardet"
S["UCHARDET_CFLAGS"]="-I/opt/homebrew/Cellar/uchardet/0.0.8/include/uchardet"


and


pkg-config --cflags   "uchardet"
-I/opt/homebrew/Cellar/uchardet/0.0.8/include/uchardet
pkg-config --libs   "uchardet"
-L/opt/homebrew/Cellar/uchardet/0.0.8/lib -luchardet


So, I think this is equivalent to your path, just with a different prefix.

But, in the meantime, I no longer think pkg-config or autotools is at fault here.

My somewhat bold claim would be, that the #include statement with a subdirectory is simply wrong and works only accidentally on systems where /usr/include is on the default search path.

You could check that on your system by simply removing the subdirectory from the include in src/preproc/preconv/preconv.cpp (I execute the following commands in my build subdir):


sed -i -e 's;uchardet/;;' ../src/preproc/preconv/preconv.cpp
make
make
/Library/Developer/CommandLineTools/usr/bin/make  all-recursive
  CXX      src/preproc/preconv/preconv-preconv.o
  CXXLD    preconv
  GROFF    doc/meintro_fr.ps


or on debian


sed -i -e 's;uchardet/;;' ../src/preproc/preconv/preconv.cpp
svenschober@debian:~/src/groff/build$ make
make  all-recursive
make[1]: Entering directory '/home/svenschober/src/groff/build'
make[2]: Entering directory '/home/svenschober/src/groff/build'
  CXX      src/preproc/preconv/preconv-preconv.o
  CXXLD    preconv
  GROFF    doc/meintro_fr.ps
make[2]: Leaving directory '/home/svenschober/src/groff/build'
make[1]: Leaving directory '/home/svenschober/src/groff/build'


Works fine on both systems, as only now the pkg-config generated includes are heeded. What do you think? Can you reproduce this on your systems?

Sven Schober <sschober>
Wed 04 Sep 2024 01:35:16 AM UTC, comment #13: 

Hi Sven,

comment #11:

> So, I now installed a debian bookworm distribution in a VM and checked: pkg-config is behaving the same on that system:
>


> grep -IR UCHARDET_CFLAGS Makefile
> UCHARDET_CFLAGS = -I/usr/include/uchardet
> preconv_CPPFLAGS = $(AM_CPPFLAGS) $(UCHARDET_CFLAGS)


>
> But I think the fact that the build currently works is a coincidence, as uchardet.h is installed under /usr/include/uchardet/, with /usr/include being a default search path (correct?).
>
> To test my assumption, I applied my "patch" (removing the 'uchardet' directory prefix from the include statement) to the current sources and rebuilt on debian and it still worked just fine.
>
> This suggests, in my interpretation at least, that the pkg-config provided CFLAGS is not used during a default build currently at all.


All I can really tell you is that groff itself doesn't have any logic to determine values for `UCHARDET_CFLAGS` and `UCHARDET_LIBS`: we rely upon a stock Autoconf macro for that.

Specifically, we use `PKG_CHECK_MODULES`.

https://git.savannah.gnu.org/cgit/groff.git/tree/m4/groff.m4?h=1.23.0#n1854

You should have a "config.status" file at the top of your build directory.  Here's what mine says about these variables.


$ grep UCHARDET_ ./build/config.status
S["UCHARDET_LIBS"]="-luchardet"
S["UCHARDET_CFLAGS"]="-I/usr/include/uchardet"


If the variable assignments are wrong for your system, then it seems likely that either the Autoconf macro is somehow wrong, or the uchardet package shipped (by Homebrew?) is misconfigured and produces the wrong output.  A major point of pkg-config's design, as I understand it, is to delegate the emission of compiler and linker flags to the library package itself rather than trying to detect such things via experimentation (as Autoconf historically has done), since the packager of the library is in the best position to know what these flags should be.

My system says this:


$ pkg-config --cflags uchardet
-I/usr/include/uchardet
$ pkg-config --libs uchardet
-luchardet


Do the same commands on your system produce correct results?

G. Branden Robinson <gbranden>
Group administrator
Sat 31 Aug 2024 08:42:03 PM UTC, comment #12: 

comment #10:

> > The dueling-pdftotexts problem should likely be a separate bug report.
>
> Sorry for the mess


Not at all; this is how bug reports are supposed to work.  You report your findings, but without enough info to determine whether they're related.  Subsequent research finds they're not, so new reports are opened to cover them.  I've made plenty of such "messes" in my time here.

I went ahead and opened a new ticket for the pdftotext issue: bug #66155.

> I thought about a configure test, issuing a warning if the
> wrong/misbehaving xpdf tools are found?


I don't think a tool working differently from what groff happened to expect is wrong or misbehaving.  It's just an unfortunate (but also unsurprising, given the nature of the command) name collision between different contributors' implementations of the same basic functionality.  This to me isn't something that needs to be warned about.

Dave <barx>
Group Member
Sat 31 Aug 2024 08:25:11 PM UTC, comment #11: 

So, I now installed a debian bookworm distribution in a VM and checked: pkg-config is behaving the same on that system:


grep -IR UCHARDET_CFLAGS Makefile
UCHARDET_CFLAGS = -I/usr/include/uchardet
preconv_CPPFLAGS = $(AM_CPPFLAGS) $(UCHARDET_CFLAGS)


But I think the fact that the build currently works is a coincidence, as uchardet.h is installed under /usr/include/uchardet/, with /usr/include being a default search path (correct?).

To test my assumption, I applied my "patch" (removing the 'uchardet' directory prefix from the include statement) to the current sources and rebuilt on debian and it still worked just fine.

This suggests, in my interpretation at least, that the pkg-config provided CFLAGS is not used during a default build currently at all.
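The two layouts can be compared with a small shell sketch. It only imitates header lookup with file tests, under the assumption that the compiler searches the `-I` directory plus a default directory; the directory trees under the temporary directory are simplified stand-ins for the Debian and Homebrew installations, and `has_header` and `report` are names invented for the example.

```shell
# Succeed if HEADER exists under any of the listed directories.
has_header() {
    header=$1; shift
    for dir in "$@"; do
        [ -f "$dir/$header" ] && return 0
    done
    return 1
}

report() {
    if has_header "$@"; then echo found; else echo missing; fi
}

tmp=$(mktemp -d)
deb=$tmp/debian
mac=$tmp/macos
mkdir -p "$deb/usr/include/uchardet" "$mac/usr/include" "$mac/opt/include/uchardet"
: >"$deb/usr/include/uchardet/uchardet.h"    # header in the default tree
: >"$mac/opt/include/uchardet/uchardet.h"    # header outside the default tree

# Each search list is the -I directory followed by the default directory.
deb_qualified=$(report uchardet/uchardet.h "$deb/usr/include/uchardet" "$deb/usr/include")
deb_bare=$(report uchardet.h "$deb/usr/include/uchardet" "$deb/usr/include")
mac_qualified=$(report uchardet/uchardet.h "$mac/opt/include/uchardet" "$mac/usr/include")
mac_bare=$(report uchardet.h "$mac/opt/include/uchardet" "$mac/usr/include")

echo "debian: <uchardet/uchardet.h> $deb_qualified, <uchardet.h> $deb_bare"
echo "macos:  <uchardet/uchardet.h> $mac_qualified, <uchardet.h> $mac_bare"
rm -rf "$tmp"
```

In the Debian-like layout both spellings resolve, the qualified one through the default directory alone; in the Homebrew-like layout only the bare spelling does.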

Sven Schober <sschober>
Sat 31 Aug 2024 07:05:45 PM UTC, comment #10: 


> The dueling-pdftotexts problem should likely be a separate bug report.


Sorry for the mess - but, in my defense, I was a bit hesitant in the beginning.. :)

> Solving (2) is a matter of, I think, having the test scripts check the program behavior and SKIPping the test if the program doesn't behave the way we expect.


I thought about a configure test, issuing a warning if the wrong/misbehaving xpdf tools are found?

> We have (1) a possible pkg-config problem on macOS


ATM, I only have access to macos; otherwise I would try to compare the outcomes of the different pkg-configs on, e.g., macos and archlinux. Could it also be the way uchardet is installed via homebrew?


brew list uchardet | grep '\.h'
/opt/homebrew/Cellar/uchardet/0.0.8/include/uchardet/uchardet.h



Sven Schober <sschober>
Sat 31 Aug 2024 06:27:30 PM UTC, comment #9: 

I agree with Dave.

We have (1) a possible pkg-config problem on macOS; and (2) a problem with providers of identically named programs exhibiting distinct behavior.

Solving (2) is a matter of, I think, having the test scripts check the program behavior and SKIPping the test if the program doesn't behave the way we expect.
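One possible shape for such a check is sketched below. It relies on exit status 77, which Automake's test harness reports as SKIP. The helper name `skip_unless` is hypothetical, and the probes are the stand-ins `true` and `false` so that the sketch runs on any system; a real test would probe the actual tool with known input.

```shell
# skip_unless DESCRIPTION COMMAND...
# Run COMMAND as a probe.  If it fails, report a skip and return 77, the
# exit status that Automake's test harness treats as SKIP.
skip_unless() {
    desc=$1; shift
    if "$@" >/dev/null 2>&1; then
        return 0
    fi
    echo "SKIP: $desc"
    return 77
}

# Stand-in probes; a real test would exercise the tool itself.
ok_status=0
skip_unless "probe that succeeds" true || ok_status=$?

skip_status=0
skip_unless "probe that fails" false || skip_status=$?

echo "ok_status=$ok_status skip_status=$skip_status"
```

In a test script the call would typically be followed by `|| exit 77`, so that an unsuitable tool yields a SKIP rather than a FAIL.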


G. Branden Robinson <gbranden>
Group administrator
Sat 31 Aug 2024 06:20:40 PM UTC, comment #8: 

comment #7:

> The underlying problem seems to be...


To clarify: the underlying problem to this test issue, not to the build issue that is this bug report's primary focus.  The dueling-pdftotexts problem should likely be a separate bug report.

Dave <barx>
Group Member
Sat 31 Aug 2024 06:16:29 PM UTC, comment #7: 

comment #5:

> Maybe, there could be a note somewhere (INSTALL.extra in the
> deps section, or somewhere?) mentioning this for MacOS and
> homebrew users?


The problem isn't necessarily specific to MacOS: xpdf is available on many platforms, and if it installs a pdftotext that doesn't work as groff expects, that failure can show up anywhere.  The underlying problem seems to be that groff expects certain capabilities from the system's pdftotext without checking that the installed pdftotext is a version that delivers those capabilities.

Dave <barx>
Group Member
Fri 30 Aug 2024 10:53:12 AM UTC, comment #6: 

For my uchardet header include problem: My package config file reads like this:


cat /opt/homebrew/lib/pkgconfig/uchardet.pc
libdir=/opt/homebrew/Cellar/uchardet/0.0.8/lib
includedir=/opt/homebrew/Cellar/uchardet/0.0.8/include

Name: uchardet
Description: An encoding detector library ported from Mozilla
Version: 0.0.8
Requires:
Libs: -L${libdir} -luchardet
Libs.private: -lstdc++
Cflags: -I${includedir}/uchardet


And it seems, this is unaltered from the sources [1]. My pkg-config directly picks this up and emits the following cflags:


Makefile
3876:UCHARDET_CFLAGS = -I/opt/homebrew/Cellar/uchardet/0.0.8/include/uchardet


I am no expert on clang invocations and header search path semantics, but with that include string it seems natural for me that this is failing.

What am I missing?

[1]: https://gitlab.freedesktop.org/uchardet/uchardet/-/blob/master/uchardet.pc.in?ref_type=heads

Sven Schober <sschober>
Fri 30 Aug 2024 10:09:10 AM UTC, comment #5: 

I did some googling and it seems there is a poppler version of pdftotext (probably also pdfimages?), which does support reading from stdin (who am I telling this, you wrote the test :)) and a version from xpdfreader.com, which does not (see [1,2]).

Unfortunately, homebrew's xpdf installs the latter one.

But I found that there is a poppler package (conflicting with the xpdf package). After installing that, all tests are passing:


============================================================================
Testsuite summary for GNU roff 1.23.0.1842-c9b3c
============================================================================
# TOTAL: 214
# PASS:  211
# SKIP:  0
# XFAIL: 3
# FAIL:  0
# XPASS: 0
# ERROR: 0
============================================================================


Hooray! :) Maybe, there could be a note somewhere (INSTALL.extra in the deps section, or somewhere?) mentioning this for MacOS and homebrew users?

So, my only remaining real issue is this uchardet detection thing. I will dig around there a bit too... any hint for me?

[1]: https://www.xpdfreader.com/pdftotext-man.html
[2]: https://forum.xpdfreader.com/viewtopic.php?t=41941

Sven Schober <sschober>
Fri 30 Aug 2024 07:12:26 AM UTC, comment #4: 


> Just a quick note, an "XFAIL" is an expected failure...


Aha! Oh man, this is a bit embarrassing. I thought it meant execution failure or something, and I did not take the time to investigate the calling code. Sorry for wasting your time on that.

There is one other real FAIL left, which seems to be connected to pdftotext not behaving as expected, right?


...
checking HTML output of web URI
pdftotext version 4.05 [www.xpdfreader.com]
Copyright 1996-2024 Glyph & Cog, LLC
Usage: pdftotext [options] <PDF-file> [<text-file>]
  -f <int>               : first page to convert
  -l <int>               : last page to convert
  -layout                : maintain original physical layout
  -simple                : simple one-column page layout
...


I think my version of pdftotext does not support reading the PDF from stdin?

Sven Schober <sschober>
Fri 30 Aug 2024 06:57:00 AM UTC, comment #3: 

Hi Sven,

comment #2:

> There is one about unicode escaping:
>


> ...
> XFAIL: src/roff/groff/tests/string_case_xform_unicode_escape.sh
> ===============================================================
>
> XFAIL src/roff/groff/tests/string_case_xform_unicode_escape.sh (exit status: 1)
> ...


Just a quick note, an "XFAIL" is an expected failure, so the test suite does not fail ("make check" does not exit with a nonzero status) if that test does.  In fact, if the test passes, that's an "XPASS" ("unexpected pass") and that makes "make check" fail.  So I would not go chasing this, or other "XFAIL" cases, down.
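The result categories can be summarized in a short shell sketch. It follows Automake's documented conventions (exit status 77 means SKIP, 99 means ERROR) but is only an illustration, not the harness's own code; `classify` and `breaks_suite` are names made up for the example.

```shell
# classify EXPECT_FAIL EXIT_STATUS
# EXPECT_FAIL is "yes" for tests listed as expected failures.
classify() {
    expect_fail=$1
    status=$2
    case $status in
        77) echo SKIP; return 0 ;;
        99) echo ERROR; return 0 ;;
    esac
    if [ "$status" -eq 0 ]; then
        if [ "$expect_fail" = yes ]; then echo XPASS; else echo PASS; fi
    else
        if [ "$expect_fail" = yes ]; then echo XFAIL; else echo FAIL; fi
    fi
}

# Only these results make "make check" exit with a nonzero status.
breaks_suite() {
    case $1 in
        FAIL|XPASS|ERROR) return 0 ;;
        *) return 1 ;;
    esac
}

for pair in "no 0" "no 1" "yes 1" "yes 0" "no 77"; do
    set -- $pair
    result=$(classify "$1" "$2")
    if breaks_suite "$result"; then verdict="suite fails"; else verdict="suite ok"; fi
    echo "expected-fail=$1 exit=$2 -> $result ($verdict)"
done
```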

G. Branden Robinson <gbranden>
Group administrator
Fri 30 Aug 2024 06:47:30 AM UTC, comment #2: 


> I'm not aware of uchardet library detection being broken anywhere else, and there are no recent reports of such breakage that I can remember.


I do not think it's 'broken' -- it's more like too exact, or specific?


$ rg uchardet
....
config.status
...
685:S["UCHARDET_LIBS"]="-L/opt/homebrew/Cellar/uchardet/0.0.8/lib -luchardet"
686:S["UCHARDET_CFLAGS"]="-I/opt/homebrew/Cellar/uchardet/0.0.8/include/uchardet"
...


Here, this line 686: The added include path already contains the subfolder, so there is no need to specify it in the header file.

Anyway, I think, I was being too ominous with my hint, that I am seeing test failures.

But to first do what you asked me for:


./preconv -v
GNU preconv (groff) version 1.23.0.1842-c9b3c with iconv support and with uchardet support


I think preconv is working fine.

I've attached my test-suite.log file, so you can see what is going on for me.

The failing mom tests are not caused by preconv malfunctioning; at least one is caused by the version of pdfimages on my machine taking a different number of arguments than the test case seems to expect. This is what I get in the test-suite.log:


...
FAIL: contrib/mom/examples/tests-mom.sh
=======================================

Checking number of pages of /Users/svenschober/src/groff/build/contrib/mom/examples/letter.pdf
Checking number of pages of /Users/svenschober/src/groff/build/contrib/mom/examples/mom-pdf.pdf
Checking number of pages of /Users/svenschober/src/groff/build/contrib/mom/examples/mon_premier_doc.pdf
Checking number of pages of /Users/svenschober/src/groff/build/contrib/mom/examples/sample_docs.pdf
Checking number of pages of /Users/svenschober/src/groff/build/contrib/mom/examples/slide-demo.pdf
Checking number of pages of /Users/svenschober/src/groff/build/contrib/mom/examples/typesetting.pdf
Checking number of pages of /Users/svenschober/src/groff/build/contrib/mom/examples/copyright-chapter.pdf
Checking number of pages of /Users/svenschober/src/groff/build/contrib/mom/examples/copyright-default.pdf
Checking if /Users/svenschober/src/groff/build/contrib/mom/examples/typesetting.pdf has images
pdfimages version 4.05 [www.xpdfreader.com]
Copyright 1996-2024 Glyph & Cog, LLC
Usage: pdfimages [options] <PDF-file> <image-root>
  -f <int>         : first page to convert
  -l <int>         : last page to convert
  -j               : write JPEG images as JPEG files
  -raw             : write raw data in PDF-native formats
  -list            : write information to stdout for each image
  -opw <string>    : owner password (for encrypted files)
  -upw <string>    : user password (for encrypted files)
  -verbose         : print per-page status information
  -q               : don't print any messages or errors
  -cfg <string>    : configuration file to use in place of .xpdfrc
  -v               : print copyright and version info
  -h               : print usage information
  -help            : print usage information
  --help           : print usage information
  -?               : print usage information
 no images found
...


My version of pdfimages is:


pdfimages -v
pdfimages version 4.05 [www.xpdfreader.com]
Copyright 1996-2024 Glyph & Cog, LLC


And I've installed it via homebrew and the xpdf package:


brew info xpdf
==> xpdf: stable 4.05 (bottled)
PDF viewer
https://www.xpdfreader.com/
Conflicts with:
  pdf2image (because poppler, pdftohtml, pdf2image, and xpdf install conflicting executables)
  pdftohtml (because poppler, pdftohtml, pdf2image, and xpdf install conflicting executables)
  poppler (because poppler, pdftohtml, pdf2image, and xpdf install conflicting executables)
Installed
/opt/homebrew/Cellar/xpdf/4.05 (26 files, 14.9MB) *
  Poured from bottle using the formulae.brew.sh API on 2024-02-16 at 20:11:31
From: https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/x/xpdf.rb
License: GPL-2.0-only OR GPL-3.0-only
==> Dependencies
Build: cmake ✔
Required: fontconfig ✔, freetype ✔, libpng ✔, qt@5 ✔
==> Analytics
install: 316 (30 days), 977 (90 days), 5,427 (365 days)
install-on-request: 316 (30 days), 977 (90 days), 5,427 (365 days)
build-error: 0 (30 days)


I can fix this, adding an image root argument of 1 to the invocation and then I get:


...
Checking if /Users/svenschober/src/groff/build/contrib/mom/examples/typesetting.pdf has images
 no images found
Checking if /Users/svenschober/src/groff/build/contrib/mom/examples/slide-demo.pdf has images
 no images found
...


Hmm... I check what pdfimages is giving me on those files:


pdfimages -list contrib/mom/examples/typesetting.pdf 1
1-0000.ppm: page=3 width=81 height=96 hdpi=72.00 vdpi=72.00 colorspace=DeviceRGB bpc=8
1-0001.pgm: page=3 width=81 height=96 hdpi=72.00 vdpi=72.00 colorspace=DeviceGray bpc=8


This seems now to be caused by pdfimages spitting out two lines and the test expects there to be more (?). I check the file and I see only one image on page three to be sure.

Now, again, I don't know the expectations of the test and the results, but I can change that to -lt and the test passes. :)

So, my changes to the test would be:


git diff
diff --git a/contrib/mom/examples/test-mom.sh.in b/contrib/mom/examples/test-mom.sh.in
index 21c8835b8..e2b1364da 100644
--- a/contrib/mom/examples/test-mom.sh.in
+++ b/contrib/mom/examples/test-mom.sh.in
@@ -63,8 +63,8 @@ check_number_pages()
 check_has_images()
 {
     echo "Checking if $1 has images"
-    n_lines=`pdfimages -list $1 | wc -l `
-    if test $n_lines -le 2; then
+    n_lines=`pdfimages -list $1 1 | wc -l `
+    if test $n_lines -lt 2; then
         echo " no images found"
         ret=255
     fi


But continuing from there, there are further failures.

There is one about unicode escaping:


...
XFAIL: src/roff/groff/tests/string_case_xform_unicode_escape.sh
===============================================================

XFAIL src/roff/groff/tests/string_case_xform_unicode_escape.sh (exit status: 1)
...


Now, this is not terribly helpful, so I added a set -x to that script and got:


XFAIL: src/roff/groff/tests/string_case_xform_unicode_escape.sh
===============================================================

+ groff=/Users/svenschober/src/groff/build/test-groff
+ expected='attaché ATTACHÉ'
++ /Users/svenschober/src/groff/build/test-groff -Tutf8
+ actual='troff:<standard input>:5: warning: special character '\''U0065_0301'\'' not defined
attaché ATTACH'
+ echo 'troff:<standard input>:5: warning: special character '\''U0065_0301'\'' not defined
attaché ATTACH'
+ grep -Fqx 'attaché ATTACHÉ'
XFAIL src/roff/groff/tests/string_case_xform_unicode_escape.sh (exit status: 1)


So, if I interpret this correctly, groff is issuing a warning, which is then part of the string to be compared and thus the comparison fails...
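A detail worth checking against that interpretation: `grep -x` compares each input line separately, so an extra warning line should not by itself defeat the match. In the trace above, the second line reads 'attaché ATTACH', without the final "É". The sketch below contrasts the output from the trace with a hypothetical output in which the warning is present but the expected line is intact; the variable `intact` and the helper `matches` are invented for the illustration.

```shell
expected='attaché ATTACHÉ'

# Output as seen in the trace: a warning line, then a line lacking the "É".
seen="troff:<standard input>:5: warning: special character 'U0065_0301' not defined
attaché ATTACH"

# Hypothetical output: the same warning, but with the expected line intact.
intact="troff:<standard input>:5: warning: special character 'U0065_0301' not defined
attaché ATTACHÉ"

matches() {
    # -F: fixed string, -x: a whole line must match, -q: quiet
    printf '%s\n' "$1" | grep -Fqx "$expected"
}

if matches "$seen"; then seen_result=match; else seen_result=no-match; fi
if matches "$intact"; then intact_result=match; else intact_result=no-match; fi

echo "trace output:        $seen_result"
echo "hypothetical output: $intact_result"
```

If that reading is right, the test fails because the uppercased "É" is missing from the output, and the warning explains why it is missing rather than being the mismatch itself.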

I can continue like this, and I kindly ask you, if you think, this is time well spent?

Would you be interested in a short write-up of my experiences of building groff on macos?

(file #56398)

Sven Schober <sschober>
Fri 30 Aug 2024 06:06:30 AM UTC, comment #1: 

Hi Sven,

original submission:

> I am currently trying to build the HEAD/master (c9b3c99) on my MacOS 14.6.1 machine using a lot of homebrew installed deps (so I am always willing to admit, this might be due to a botched environment of mine caused by historical accumulation of cruft).

[...]

I've snipped most of your report because your analysis seems sound to me.  I'm not aware of uchardet library detection being broken anywhere else, and there are no recent reports of such breakage that I can remember.

> As a sidenote: With my "patch" above everything builds just fine, but I am seeing failed tests, when issuing make check, e.g., in mom tests. I could solve some of them using relatively harmless changes to contrib/mom/examples/test-mom.sh.in.
>
> Should I create a separate issue for the test failures?


It could be that preconv is still failing, and mom relies on it for at least some of her examples.[1]

Please gather preconv's version info, which also discloses its uchardet linkage, and run it with its "-d" debugging flag on one of the mom examples.  For example, here's my system.


$ ./build/preconv -v
GNU preconv (groff) version 1.23.0.1842-c9b3c with iconv support and with uchardet support
1$ ./build/preconv -d contrib/mom/examples/mon_premier_doc.mom >/dev/null
fallback encoding: 'UTF-8'
processing 'contrib/mom/examples/mon_premier_doc.mom'
  coding tag: 'utf-8'
  encoding used: 'UTF-8'


(Don't worry about the version number being later than what's on Savannah.  This is my working copy.)

How do your results compare?

[1] Just before sending this, I remembered that it doesn't.

("contrib/mom/mom.am"):


# pdfmom command used to generate .pdf
#
# Use '-K utf8', not '-k', in case 'configure' didn't find uchardet.
MOMPDFMOM = \
  GROFF_COMMAND_PREFIX= \
  GROFF_BIN_PATH="$(GROFF_BIN_PATH)" \
  PDFMOM_BIN_PATH="$(top_builddir)" \
  $(PDFMOMBIN) $(FFLAG) $(MFLAG) -M$(mom_srcdir) -K utf8 -p -e -t \
  -wall -b


Still, let's see if getting preconv working makes the test failures go away.  If they don't, there's ample opportunity for another ticket.

G. Branden Robinson <gbranden>
Group administrator
Thu 29 Aug 2024 06:49:22 PM UTC, original submission:  

Hi!

I am currently trying to build the HEAD/master (c9b3c99) on my MacOS 14.6.1 machine using a lot of homebrew installed deps (so I am always willing to admit, this might be due to a botched environment of mine caused by historical accumulation of cruft).

I followed INSTALL.REPO and bootstrapped the repo successfully.

When issuing the make command, I get the following error:

$ make
/Library/Developer/CommandLineTools/usr/bin/make  all-recursive
  CXX      src/preproc/preconv/preconv-preconv.o
../src/preproc/preconv/preconv.cpp:30:10: fatal error: 'uchardet/uchardet.h' file not found
#include <uchardet/uchardet.h>
         ^~~~~~~~~~~~~~~~~~~~~
1 error generated.
make[2]: *** [src/preproc/preconv/preconv-preconv.o] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2


Which is easily solvable for me, by issuing the following patch:

diff --git a/src/preproc/preconv/preconv.cpp b/src/preproc/preconv/preconv.cpp
index cc661c525..8a13988a4 100644
--- a/src/preproc/preconv/preconv.cpp
+++ b/src/preproc/preconv/preconv.cpp
@@ -27,7 +27,7 @@ along with this program.  If not, see <http://www.gnu.org/licenses/>. */
 #include <errno.h>
 #include <sys/stat.h>
 #ifdef HAVE_UCHARDET
-#include <uchardet/uchardet.h>
+#include <uchardet.h>
 #endif

 #include "errarg.h"


But I am not sure, why this is necessary.

I had the suspicion, that this might be due to pkg-config? My version is installed via homebrew:


brew info pkg-config
==> pkg-config: stable 0.29.2 (bottled)
Manage compile and link flags for libraries
https://freedesktop.org/wiki/Software/pkg-config/
Conflicts with:
  pkgconf (because both install `pkg.m4` file)
Installed
/opt/homebrew/Cellar/pkg-config/0.29.2_3 (12 files, 679.3KB)
  Poured from bottle using the formulae.brew.sh API on 2024-07-19 at 19:20:48
From: https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/p/pkg-config.rb
License: GPL-2.0-or-later
==> Analytics
install: 63,483 (30 days), 188,030 (90 days), 642,084 (365 days)
install-on-request: 16,770 (30 days), 50,447 (90 days), 183,614 (365 days)
build-error: 342 (30 days)


My installed version of uchardet is 0.0.8 also via homebrew:


brew info uchardet
==> uchardet: stable 0.0.8 (bottled), HEAD
Encoding detector library
https://www.freedesktop.org/wiki/Software/uchardet/
Installed
/opt/homebrew/Cellar/uchardet/0.0.8 (17 files, 652.3KB) *
  Poured from bottle on 2023-02-13 at 14:47:25
From: https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/u/uchardet.rb
License: MPL-1.1 OR GPL-2.0-or-later OR LGPL-2.1-or-later
==> Dependencies
Build: cmake ✔
==> Options
--HEAD
        Install HEAD version
==> Analytics
install: 1,172 (30 days), 3,842 (90 days), 15,415 (365 days)
install-on-request: 120 (30 days), 424 (90 days), 1,698 (365 days)
build-error: 0 (30 days)


Please let me know if you need any further information about my system (I've attached the config.log, which might help).

As a side note: with my "patch" above everything builds just fine, but I am seeing failed tests when running make check, e.g., in the mom tests. I could fix some of them with relatively harmless changes to contrib/mom/examples/test-mom.sh.in.

Should I create a separate issue for the test failures?
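
For triaging such failures, the names of the failing tests can be pulled out of the log that make check leaves behind. This is a rough sketch under the assumption that the log follows the usual Automake layout, in which each failing test gets a section header of the form "FAIL: <name>" at the start of a line. The function name list_failed_tests is hypothetical.

```shell
#!/bin/sh
# List the tests that an Automake-style test-suite.log reports as failed.
# Assumes each failing test has a header line "FAIL: <name>"; summary
# lines such as "# FAIL:  1" start with "#" and are therefore skipped.
list_failed_tests() {
    sed -n 's/^FAIL: //p' "$1"
}

# Default to test-suite.log in the current directory.
log=${1:-test-suite.log}
if [ -r "$log" ]; then
    list_failed_tests "$log"
else
    echo "no readable $log; run 'make check' first" >&2
fi
```

Saved under a name of one's choosing and invoked with the path of a log file as its argument, the script prints one test name per line.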

Sven Schober <sschober>

 


Attached Files
file #56398:  test-suite.log.macos added by sschober (11KiB - application/octet-stream)
file #56394:  config.log added by sschober (324KiB - application/octet-stream)

 

Depends on the following items: None found

Items that depend on this one: None found

 


     

    Follow 19 latest changes.

    Date        Changed by  Updated Field: Previous Value => Replaced by
    2024-10-22  gbranden    Status: In Progress => Fixed
                            Open/Closed: Open => Closed
                            Planned Release: None => 1.24.0
    2024-10-19  gbranden    Status: Confirmed => In Progress
                            Assigned to: None => gbranden
    2024-10-17  gbranden    Assigned to: gbranden => None
    2024-10-14  gbranden    Item Group: Build/Installation => Documentation
                            Status: Rejected => Confirmed
                            Open/Closed: Closed => Open
    2024-10-14  gbranden    Open/Closed: Open => Closed
    2024-10-14  gbranden    Status: Need Info => Rejected
                            Assigned to: None => gbranden
                            Summary: build doesn't find Homebrew uchardet on MacOS 14.6.1 => build doesn't find Homebrew uchardet on macOS 14.6.1
    2024-09-07  gbranden    Summary: build of current master (c9b3c99) fails on MacOS 14.6.1 => build doesn't find Homebrew uchardet on MacOS 14.6.1
    2024-09-04  gbranden    Status: None => Need Info
    2024-08-31  barx        Status: Need Info => None
    2024-08-30  sschober    Attached File: - => Added test-suite.log.macos, #56398
    2024-08-30  gbranden    Status: None => Need Info
    2024-08-29  sschober    Attached File: - => Added config.log, #56394
