having Tahoe or any dependency installed in site-packages (or any other shared directory) can cause us to run or test the wrong code #1258
Reference: tahoe-lafs/trac-2024-07-25#1258
From /tahoe-lafs/trac-2024-07-25/issues/6315#comment:20
and /tahoe-lafs/trac-2024-07-25/issues/6315#comment:81144:
This might have the same cause as /tahoe-lafs/trac-2024-07-25/issues/6308#comment:-1 .
I ran

`sudo /bin/rm -rf /Library/Python/2.6/site-packages/allmydata*`

on zomp and then rebuilt a build that had failed. Here is the result: http://tahoe-lafs.org/buildbot/builders/Zooko%20zomp%20Mac-amd64%2010.6%20py2.6/builds/212
Replying to zooko:

OK, that succeeding is consistent with the underlying issue being the same as in #1246. The site-packages directory has a `pycrypto-2.1.0-py2.6.egg-info` file, which could have caused it to be put on the `sys.path` by setuptools. If I'm understanding correctly, this can cause us to import the wrong code in any buildbot test step, and also when running `bin/tahoe`.

Summary changed from "test step is not testing the right code" to "having Tahoe or any dependency installed in site-packages (or any other shared directory) can cause us to run or test the wrong code".

David-Sarah: I think we can safely put off this ticket till after v1.8.1. Do you agree?
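The shadowing mechanism described above can be demonstrated with a small self-contained sketch: two directories each contain a module of the same name, and whichever directory comes first on `sys.path` wins, exactly as a stale site-packages entry can shadow a fresh source tree. The package name `mypkg` and the directory layout here are hypothetical, not part of Tahoe:

```python
# Demonstrate sys.path shadowing: two directories each contain a module
# named "mypkg"; the directory listed first on sys.path wins, no matter
# how new the other copy is. (All names here are hypothetical.)
import os
import sys
import tempfile

base = tempfile.mkdtemp()
old_dir = os.path.join(base, "site-packages")  # stand-in for a stale install
new_dir = os.path.join(base, "src")            # stand-in for the source tree
for d, version in [(old_dir, "OLD"), (new_dir, "NEW")]:
    os.makedirs(d)
    with open(os.path.join(d, "mypkg.py"), "w") as f:
        f.write("VERSION = %r\n" % version)

# The stale directory ends up earlier on sys.path than the source tree.
sys.path.insert(0, new_dir)
sys.path.insert(0, old_dir)
import mypkg
print(mypkg.VERSION)    # prints "OLD": the stale copy shadows the fresh one
```

This is the whole bug in miniature: nothing errors out, the import simply resolves to the wrong place.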
Replying to zooko:

Yes, if we add documentation recommending not to use `setup.py install` (and saying why).

I just saw what is probably another instance of this (on a tree with local changes):

Notice the `c:\python26\lib\site-packages\allmydata` paths in the traceback. (I don't know how that directory got there; I thought that I'd deleted it and not installed Tahoe since.)

More significantly, when running the whole test suite, this was the only test that failed. I thought that when running tests, this bug was restricted to testing the wrong code and complaining about it, but apparently it can still fail silently, despite [source:src/allmydata/test/test_runner.py@4853#L52 test_the_right_code] that we added for #1137 :-(
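The kind of check that `test_the_right_code` performs can be sketched in a few lines: import a module and verify that the file it was actually loaded from lives under the directory you expect to be testing. This is an illustrative re-implementation, not Tahoe's actual test; the helper name is made up, and the stdlib `json` module stands in for `allmydata`:

```python
# Sketch of a "right code" check: after importing a module, verify that
# the file it was actually loaded from lives under an expected prefix.
# The stdlib "json" module stands in for allmydata here.
import json
import os
import sys

def assert_imported_from(module, expected_prefix):
    actual = os.path.realpath(module.__file__)
    expected = os.path.realpath(expected_prefix)
    if not actual.startswith(expected + os.sep):
        raise AssertionError("imported %s from %s, expected it under %s"
                             % (module.__name__, actual, expected))

# The stdlib lives under the interpreter's (base) prefix, even inside a
# virtualenv, so this check should pass.
assert_imported_from(json, getattr(sys, "base_prefix", sys.prefix))
print("ok")
```

The value of a check like this is that it fails loudly on exactly the silent-shadowing case this ticket is about.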
I will investigate tomorrow why `test_the_right_code` didn't detect this.

Replying to davidsarah:
This is a really confusing bug. `python setup.py trial` is importing the right code in its own process. The problem is that the subprocess created by [source:src/allmydata/test/test_system.py@4895#L1777 _run_cli_in_subprocess] in `test_system.py` (called from the [source:src/allmydata/test/test_system.py@4895#L1601 test added] for #1253) is importing the wrong code, from a version installed before #1253 was fixed.

However, that situation -- where we import the wrong code when running `bin/tahoe` in a subprocess -- should have been picked up by [source:src/allmydata/test/test_runner.py@4853#L85 test_path] in `test_runner.py`. `_run_cli_in_subprocess` does not run `bin/tahoe` in exactly the same way as `test_path` (they differ in whether the command uses `sys.executable`, and in the passed environment), but the differences seem not to be significant. In fact, adding code to run `bin/tahoe --version-and-path` using `_run_cli_in_subprocess`, like this:

gives output showing a version of
which is correct. (The difference between r4908 here and r4900 in comment:81129 is not significant; I committed a few patches.) The value of the PYTHONPATH variable set by `d:\tahoe\tahoe-clean\bin\tahoe.pyscript` is `d:\tahoe\tahoe-clean\support\Lib\site-packages`, which is also right.

So, I have no explanation of why the `--version-and-path` output from the subprocess is correct, but a subprocess run in exactly the same way is still importing the wrong code. (`c:\python26\lib\site-packages\allmydata\_version.py` is present, so it is not the case that it is falling back to a later `sys.path` entry when importing `_version.py` because that file is missing.)

Replying to [davidsarah]comment:7:
It turned out that the `--version-and-path` output was wrong. At source:src/allmydata/__init__.py@4796#L227, the versions and paths obtained from `pkg_resources.require` take precedence over those obtained by importing modules. But in a case such as this bug, where the imported code is different from what `pkg_resources` attempted to put on the `sys.path`, the result of `pkg_resources.require` is the wrong thing to use.

Replying to [davidsarah]comment:8:
That is ticket #1287. add-test-import-in-repl.darcs.patch adds a different test which would catch any problem with importing the right version of Tahoe in subprocesses, without relying on `bin/tahoe --version-and-path` to be correct.

Note that it is still not explained what the difference is between the `setup.py` process and the `bin/tahoe` subprocesses, that causes us to import the right code in the former and the wrong code in the latter.

add-test-import-in-repl.darcs.patch has a confusing shadowing of the `srcfile` variable. I will fix that.

Attachment add-test-import-in-repl.darcs.patch (12108 bytes) added
test_runner: add test_import_in_repl, which uses 'bin/tahoe debug repl' to import the allmydata module and checks that it comes from the right path. Also, fix a latent bug that caused test_the_right_code to incorrectly conclude that a path mismatch was due to a Unicode path (causing a skip rather than a failure). This version of the patch avoids a confusing shadowing of 'srcfile'. refs #1258
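On modern Python, the question the repl-based test asks — "which file does an import of this name actually load?" — can also be posed directly with the stdlib `importlib` machinery. This is a sketch of the idea, not the patch's code, and it again uses the stdlib `json` module as a stand-in:

```python
# Ask the import system which file an import of a given name resolves to,
# and compare it against the file of the already-imported module. A
# mismatch would mean something (a stale install, a changed sys.path)
# is shadowing the code you think you are testing.
import importlib.util
import json  # stand-in for the module under test

spec = importlib.util.find_spec("json")
resolved = spec.origin       # file the import system resolves "json" to
imported = json.__file__     # file that was actually imported

if resolved != imported:
    raise AssertionError("import mismatch: %s vs %s" % (resolved, imported))
print("import of json resolves to the already-imported file")
```

Note that `find_spec` consults `sys.modules` first, so to detect shadowing of a *fresh* process (as the repl test does) you would run this check in a subprocess rather than in the test process itself.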
Oh, and adding `test_import_in_repl` has the side-effect of testing `bin/tahoe debug repl`, which is not tested anywhere else :-)

In [4940/ticket1306]:
Replying to [davidsarah]comment:8

Two questions about this:

1. Could we start writing a unit test for this by taking current trunk and adding an assertion in `get_package_versions_and_locations()` in [source:src/allmydata/__init__.py@4796#L227 allmydata/__init__.py] that the `pkg_resources.require()` version and the specific-attribute-of-this-module version have to match, or else exit noisily? Then we could hopefully produce a minimal test case which triggered that assertion.

2. If we "solve" this bug by removing the `pkg_resources.require()`, as is done by [20110101110141-93fa1-3557bc2136f970fae05c1d20e336c32fec8e3d6d]/ticket1306, is it really a full solution? Could it be then that we are relying on the `zetuptoolz`-based dependency resolution to make sure we have deps of the right version number, and there is a bug in `zetuptoolz`'s `pkg_resources` so that we're actually getting a package of the wrong version number, and [20110101110141-93fa1-3557bc2136f970fae05c1d20e336c32fec8e3d6d] causes us to ignore this fact?

In [4959/ticket1306]:
Replying to [zooko]comment:18:

The calls to `pkg_resources.require()` that were removed in [4936/ticket1306] were never relied on to perform version checking. It was the calls in `_auto_deps.require_auto_deps()` that were supposed to be doing that. But they weren't working either.

In [4942/ticket1306], we replace `require_auto_deps` with `check_all_requirements`, which compares the requirement with the imported version as obtained directly from each package module. Then in [4959/ticket1306], we change that check to use verlib and add tests for both verlib and `check_requirement`.

If we did as you suggest in 1, we'd be using `pkg_resources.require` only in order to detect the case where it's giving the wrong result. (As it happens, that corresponds to a case where `__requires__ = ...; import pkg_resources` in the support script failed to do the right thing. However, that's almost a coincidence! We shouldn't rely on the fact that two setuptools bugs happen to coincide. The [4942/ticket1306] change already adds a reliable test for what we actually want to know, which is that the imported dependencies satisfy the requirements.)

Caveat: `check_all_requirements` doesn't currently check indirect dependencies, e.g. pyOpenSSL, pyutil, argparse, and zbase32. `require_auto_deps` didn't check those either.

Replying to [davidsarah]comment:20:
I should also have mentioned [4949/ticket1306], which generalizes the check to handle disjunctive requirements like "pycrypto == 2.0.1, == 2.1.0, >= 2.3" (and also exposes `check_requirement` at module level to make it testable).

In [4960/ticket1306]:
In changeset:727b25f622b01593:
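A minimal version of the disjunctive check described above can be sketched as follows. This is an illustrative re-implementation, not Tahoe's actual `check_requirement`, and it handles only the `==` and `>=` operators that appear in the example:

```python
# Tiny illustrative checker for disjunctive requirements like
# "pycrypto == 2.0.1, == 2.1.0, >= 2.3": the requirement is satisfied
# if ANY of its clauses matches the imported version. Not Tahoe's real
# check_requirement, which uses verlib.

def _vtuple(v):
    """Turn '2.1.0' into (2, 1, 0) for tuple comparison."""
    return tuple(int(x) for x in v.split("."))

def check_requirement(req, imported_version):
    name, clauses = req.split(" ", 1)
    got = _vtuple(imported_version)
    for clause in clauses.split(","):
        op, _, want_str = clause.strip().partition(" ")
        want = _vtuple(want_str)
        if (op == "==" and got == want) or (op == ">=" and got >= want):
            return True
    return False

req = "pycrypto == 2.0.1, == 2.1.0, >= 2.3"
print(check_requirement(req, "2.1.0"))  # True: matches the second clause
print(check_requirement(req, "2.2.0"))  # False: matches no clause
print(check_requirement(req, "2.6.1"))  # True: matches ">= 2.3"
```

Comparing version tuples rather than strings is what makes "2.10" correctly sort after "2.3", which is the sort of detail verlib handles robustly in the real code.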
Replying to [davidsarah]comment:20:

I didn't mean either of those. I meant that `python setup.py build`, `python setup.py install`, and `easy_install allmydata-tahoe` rely on `pkg_resources`-based dependency resolution. (The former two rely on `zetuptoolz`; the latter relies on either `setuptools` or `distribute`, depending on which one provides your local `easy_install` executable.)

So, if `pkg_resources.require()` is giving us the wrong version of a dependency, does that suggest that the setup step that we just did may have been satisfied with an extant dependency even though it was of an incompatible version?

Replying to [davidsarah]comment:20:
Hm. If we were using `pkg_resources`, then `pkg_resources.require("allmydata-tahoe")` would check all dependencies, including indirect ones like `pyOpenSSL`.

I'm unsetting `review-needed` and assigning to davidsarah for davidsarah (or anyone else who knows) to answer my questions from comment:18 and comment:81143. In short: I think we're depending on `pkg_resources` to install the right versions of deps, so if `pkg_resources` is reporting incorrect version numbers when queried, perhaps this indicates a bug that we need to fix more deeply than just to stop asking it what the version numbers are.

Brian (Release Master for 1.8.2) says that for 1.8.2 we intend to land a patch to detect if this problem has occurred, but not to fix the problem.
David-Sarah: I would give +1 to [20110101110141-93fa1-3557bc2136f970fae05c1d20e336c32fec8e3d6d] if we put back the `pkg_resources.require()` and add code to abort with a specific error message if the answer provided by `pkg_resources.require()` differs from the one detected by importing and inspecting the module.

This would satisfy my current uncertainty (re comment:81147), would be in keeping with the intent to detect this problem in 1.8.2, would fail safe, and would not cause any problem in the case that `pkg_resources` was behaving correctly. What do you think?

Replying to zooko:
+1. This check might have to be disabled for a frozen executable (but in that case the dependencies should also have been frozen at build time, so I don't think the check is needed in that case).
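The proposed fail-safe cross-check could look roughly like this. All names here are hypothetical stand-ins; Tahoe's real checking code lives in `__init__.py` and `_auto_deps.py`:

```python
# Sketch of the proposed cross-check: compare the version reported by
# the packaging system against the version found by actually importing
# the module, and abort with a specific error message if they disagree.
# Function and exception names are hypothetical, not Tahoe's.
class PackagingError(Exception):
    pass

def cross_check(reported_version, imported_version, name):
    if reported_version != imported_version:
        raise PackagingError(
            "%s: pkg_resources reported version %s, but the imported module "
            "reports version %s; a stale install is probably shadowing the "
            "intended code" % (name, reported_version, imported_version))

cross_check("2.1.0", "2.1.0", "pycrypto")      # agreement: passes silently
try:
    cross_check("2.1.0", "2.0.1", "pycrypto")  # mismatch: aborts loudly
except PackagingError as e:
    print(e)
```

Raising on any disagreement is the fail-safe behavior zooko describes: when `pkg_resources` is behaving correctly the check is a no-op, and when it is confused the user gets a specific error instead of silently running the wrong code.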
In changeset:fd6cdc48ae1ccbe1:
Replying to davidsarah:
Okay, I did this in a series of patches [20110120070136-92b7f-88ac67b4d3aa7d3402186eac8c4ee7dbb42f0f1f]/ticket1306, [20110120081620-92b7f-6e0039e28365c86a97a2f26072aedf0b742d6c79]/ticket1306, [20110120084014-92b7f-e7777f5b76d7bbd269272cdd9dddf02922f4e197]/ticket1306, [20110120085221-92b7f-cf62522aca14becf31748723fee15458333d7774]/ticket1306, [20110120085325-92b7f-028722d1cf43f20122d2fa022d0c5cf66272eaaf]/ticket1306, [20110120085608-92b7f-aad66de877bf56c05918a921078a7a405130173c]/ticket1306.
This check is causing several failures on the buildslaves. Arguably they are false positives from a user's perspective (because we are importing acceptable versions of the dependencies in these cases), even though they indicate that `pkg_resources` is deeply confused.
I propose that we should only perform this cross-check when one of the version-checking options is used (`bin/tahoe --version`, `--version-and-path`, or `--debug-version-and-path`), or when `setup.py test` or `setup.py trial` is used. In the latter case, it should not stop us running tests.

Replying to davidsarah:
+1
Attachment auto-deps-and-init-changes.darcs.patch (55726 bytes) added
Refactor _auto_deps.py and init.py, adding more robust checking of dependency versions, and not trusting pkg_resources to get the versions right. refs #1258, #1287
In changeset:29336a09163cd3d5:
In changeset:b1f15a630a6389cf:
I don't know how to fix this in the general case.
How about if we establish a weaker goal: to detect at run-time whether this has happened and raise an exception if so. That's probably more doable than actually fixing the Python packaging system to always import the right code.
Replying to davidsarah:
To be more precise, I don't know how to do it while still using setuptools. If we ditched setuptools, then I would know how to do it (by making sure that the dependencies are always added at the start of `sys.path`, and/or by using a custom import hook). The amount of work needed to replace the affected setuptools functionality shouldn't be underestimated, though.

We already check that the version found by import is acceptable, except in the case of packages that don't declare their version in an easily inspectable manner (#1473). That isn't equivalent to checking whether this bug has occurred, though: we can (and frequently do) import a package that is not the version that setuptools intended to import, but that just happens to have an acceptable version. We know which version and path setuptools intended to import, and the original code I wrote to check for mismatches did treat this as an error. We changed it to be a warning that is only displayed in the situations described in comment:81154, because the error was happening too often :-(
(The ticket1306 branch, which did the more thorough check, has been deleted from the trac, so links to it no longer work. It can still be checked out using `darcs get --lazy http://tahoe-lafs.org/source/tahoe-lafs/ticket1306`, though.)

This interacts particularly badly with release tarballs with bad permissions (600 rather than 644), because those tarballs, at least with packaging systems that don't attempt to muck with them, result in installed versions of Tahoe where non-root users may not read the egg files. So the next attempt to build Tahoe (as a non-root user, because that build stage does not require root) will fail. The workaround is just to remove the old/wrong package.
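A quick way to spot the bad-permissions case described above is to scan an install directory for files that are not world-readable (this assumes POSIX permissions; the temporary directory here is just a demonstration, and you would point `root` at your real site-packages):

```python
# Scan a directory tree for files that are not world-readable (mode
# 0o600 instead of 0o644), the symptom described above for bad release
# tarballs. Demonstrated on a throwaway temp directory.
import os
import stat
import tempfile

def unreadable_files(root):
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for fn in filenames:
            path = os.path.join(dirpath, fn)
            if not os.stat(path).st_mode & stat.S_IROTH:
                found.append(path)
    return found

root = tempfile.mkdtemp()
good = os.path.join(root, "readable.egg-info")
bad = os.path.join(root, "unreadable.egg-info")
open(good, "w").close()
open(bad, "w").close()
os.chmod(good, 0o644)
os.chmod(bad, 0o600)
print(unreadable_files(root))   # lists only the 0o600 file
```

Running a scan like this against site-packages after installing a release tarball would catch the permission problem before a non-root build fails mysteriously.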
In /tahoe-lafs/trac-2024-07-25/commit/b0b76a7c5b89c3fed5a65ef6732dc45e578f12f4:
We now recommend tox for running tests, and virtualenv for running tahoe at all, which gives us the desired isolation between any system/site-packages -installed python packages and tahoe's own requirements. So we can close this now.
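The isolation described above comes from tox building each test run inside its own virtualenv, so tests import the checked-out code rather than anything installed in the system site-packages. A minimal configuration along these lines might look like the following (the environment list, dependency file, and test command here are illustrative, not Tahoe's actual tox.ini):

```ini
# Illustrative tox.ini sketch: each listed environment gets a fresh
# virtualenv with the project and its declared dependencies installed
# into it, isolated from system/site-packages installs.
[tox]
envlist = py27

[testenv]
deps = -rrequirements.txt
commands = trial allmydata
```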
I came back to this while reviewing the pkgsrc package. While it is certainly OK to recommend using a virtualenv, it is not OK to say that Tahoe working properly outside of a virtualenv is beyond the specification.

Certainly it is fair to expect that the installed versions of dependencies are OK. It's really the separation of the installed (likely the previous working Tahoe) and the being-built next version for testing that matters. It is normal for packaging systems to build and test not-yet-installed versions.

Do people think this problem is resolved, for running tests in a build dir while a previous version is in site-packages? Or is this just Tahoe adopting an unusual definition of correctness? (Really asking; I have no idea.)