pip packaging plan #2077
Reference: tahoe-lafs/trac-2024-07-25#2077
I have a plan for improving Tahoe-LAFS packaging:

1. Replace source:src/allmydata/_auto_deps.py with a pip-compatible `requirements.txt` file, and change source:setup.py to parse that to get `install_requires`, instead of doing `execfile` on `_auto_deps.py`.
2. Add documentation for how to install Tahoe-LAFS using pip (still using zetuptoolz), assuming pip is already installed.
3. Add a bundled egg of pip 1.5 (or maybe virtualenv).
4. Change `setup.py` to add a `getpip` command: if `import pip` yields a version >= 1.5 already, this command does nothing; otherwise it performs a pip bootstrap using the bundled egg.
   - zooko-unauth: I guess I would be worried about that, unless we had good automated tests of packaging, which we sort of have a little of already, and nejucomo sounds serious about improving it.
5. pip can then be bootstrapped with `sudo python setup.py getpip`, so change the docs to recommend doing that.
   - zooko-unauth: Whoa, what docs?
   - daira: The docs that we added about pip-installing.
   - zooko-unauth: My main concern about all of this is that `quickstart.rst` does not get more manual steps added.
   - daira: `quickstart.rst` doesn't get updated until the last step, except perhaps to add "if you like pip, here's the doc to install it that way".
6. Write buildsteps to test a pip build and a (prefixed) pip installation.
7. When we're confident that there are no important regressions, make `python setup.py build/install` use pip automatically (without installing it).
8. Fix any remaining obstacles to ditching zetuptoolz (preferably by fixing those upstream).
   - zooko-unauth: This is where regression tests would be extremely useful.
9. Replace the bundled zetuptoolz with a copy of upstream setuptools (unfortunately we can't just delete it and rely on an installed setuptools and still get the required security, I think).
10. At some point in the far future, when good versions of pip and setuptools (if that's still a dependency) are widely deployed, delete the bundled versions and make it a hard error if acceptable versions are not installed.
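The `setup.py`-side parsing in step 1 of the plan could be sketched roughly as follows. This is illustrative only; the helper names are assumptions, not Tahoe's actual code:

```python
# Sketch: let setup.py and the running package share one machine-readable
# dependency list by parsing requirements.txt (names here are illustrative).

def parse_requirements(lines):
    """Strip comments and blank lines; keep version specifiers intact."""
    reqs = []
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if line:
            reqs.append(line)
    return reqs

def read_requirements(path="requirements.txt"):
    with open(path) as f:
        return parse_requirements(f)

# In setup.py:
#   setup(..., install_requires=read_requirements())
```

The same parser could then be reused at runtime for the fail-hard version check, keeping the requirements specified in exactly one place.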
On #2055, dstufft wrote:

> Any reason why you want to specify your dependencies in a requirements.txt instead of directly in setup.py? See https://caremad.io/blog/setup-vs-requirement/. It'll work of course, but that's not the main focus of those files.
>
> Is there any reason this can't all be uploaded to PyPI and your quickstart.rst installation instructions be `pip install`? I feel like there was a reason but I can't remember it now.

We don't trust setuptools to put the correct dependencies on `sys.path`, because of /tahoe-lafs/trac-2024-07-25/issues/6308#comment:-1 among other reasons. Therefore, we need to double-check them, both in tests and so that we can fail hard if they are not actually met at runtime. (Failing hard is much better from a support point of view than tolerating whatever bug arises from the unsatisfied requirement.)

In order to do that, we need the requirements to be machine-readable, and in one place. Currently that place is source:src/allmydata/_auto_deps.py. However, ideally it would be in a data file rather than in code, just as an application of the Principle of Least Power.
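The fail-hard runtime check described above could look roughly like the following sketch. The `REQUIRED` table and the naive version parser are illustrative assumptions; Tahoe's real check handles more cases:

```python
# Sketch: verify that the versions of dependencies actually imported at
# runtime satisfy the declared minimums, and fail hard if not.
import importlib

REQUIRED = {"zope.interface": "3.6.5"}  # normally read from requirements.txt

def parse_version(s):
    # Minimal dotted-numeric parser; real tools (pkg_resources, packaging)
    # handle pre-releases and other cases this ignores.
    return tuple(int(part) for part in s.split("."))

def version_satisfies(version, minimum):
    return parse_version(version) >= parse_version(minimum)

def check_imports(required=REQUIRED):
    problems = []
    for name, minimum in required.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            problems.append("%s is not importable" % name)
            continue
        version = getattr(module, "__version__", None)
        if version is None or not version_satisfies(version, minimum):
            problems.append("%s: need >= %s, found %r (from %s)"
                            % (name, minimum, version,
                               getattr(module, "__file__", "?")))
    if problems:
        raise ImportError("unsatisfied dependencies: " + "; ".join(problems))
```

Failing with one aggregated `ImportError` listing every mismatch (and the file it was imported from) is what makes this useful for support: the report shows exactly which wrong thing `sys.path` picked up.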
I don't see an actual valid argument in https://caremad.io/blog/setup-vs-requirement/. It seems to be assuming that `requirements.txt` would have only `==` dependencies, but that's not what we would be doing.

Well, that's why I asked about using pip: things installed with pip do not have a runtime dependency on setuptools, nor do they use the setuptools egg system. They use plain-Jane Python imports via sys.path.
I must be misunderstanding something. How do entry scripts work in pip-installed packages? They still import `pkg_resources`, like they do in a setuptools install/build, right?

BTW, when I said that we don't trust setuptools, I should have said that we don't trust `pkg_resources`, since that's where some of the setuptools-related bugs are. (It was all written by PJE, which is why I don't make a strong distinction.)

More on the rationale for parsing `requirements.txt`: we don't want to import our `setup.py` at run-time, because it won't in general be installed/built somewhere importable. Therefore, given the design criterion of having the requirements specified in only one place, the options are either to have `setup.py` do an `execfile` on some other code file to get the requirements (as we're doing now), or to have both `setup.py` and `allmydata/__init__.py` read the requirements from the same data file -- which might as well be `requirements.txt`, since that has a well-defined existing format.

So right now, entry points still use pkg_resources if you install from an sdist (sorry, I forgot about that case), but there's very little path fuckery because pip doesn't install eggs, so in your typical install there aren't multiple versions of things to get the wrong version activated by accident. If you install from a wheel, it actually doesn't use pkg_resources at all (and I have plans to make this the case for sdists as well).
FWIW I don't mean to keep trying to shove you towards a particular toolchain, I just don't want to see you have to reinvent stuff when I think the problems you're having are solved by pip already (or are on the roadmap to be solved) :)
The problem in /tahoe-lafs/trac-2024-07-25/issues/6308#comment:93472 doesn't depend on eggs. It can happen whenever you have any shared directory anywhere on the `sys.path` that comes before the entry added for some other dependency.

BTW, I am a big fan of pip. That's why this ticket is about using it!
Replying to dstufft:
I'm very interested in this option. Where is it documented?
Oh, http://wheel.readthedocs.org/en/latest/.
Wheels are basically a standardized egg with some of the more nebulous/dangerous features removed. They are a binary format, like egg is. Since 1.4, pip has supported them behind an opt-in flag; since 1.5, they are on by default. The script wrapper for something installed from a wheel is a plain import of the entry point; it doesn't go through pkg_resources at all.
See also comment:93126.
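To make that concrete: the console-script wrapper pip writes for a wheel install is essentially a plain-import shim. Below is a sketch of its shape as a template; the `allmydata.scripts.runner:run` entry point is an assumption about Tahoe's actual entry point:

```python
# Sketch: the shape of the console-script wrapper pip generates when
# installing from a wheel -- a plain import, no pkg_resources.

WRAPPER_TEMPLATE = """\
#!/usr/bin/python
import re
import sys
from {module} import {func}
if __name__ == "__main__":
    sys.argv[0] = re.sub(r"(-script\\.pyw?|\\.exe)?$", "", sys.argv[0])
    sys.exit({func}())
"""

def make_wrapper(module, func):
    return WRAPPER_TEMPLATE.format(module=module, func=func)

# e.g. what a "tahoe" script might look like, assuming this entry point:
print(make_wrapper("allmydata.scripts.runner", "run"))
```

Since the wrapper never touches `pkg_resources`, there is no dependency-activation step at startup that could pick the wrong version off `sys.path`.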
Sorry for the monster comment length here; please read at your leisure, and if there's important release stuff going on, do not feel the need to address this now. I'm trying to collect all of my thoughts about this into one place to leave as few assumptions unstated as possible.
I am trying to understand the desired user experience outlined on this ticket.
Here's the experience I want: as a developer and system administrator, I want to install Tahoe to hack on it the same way I install anything else that uses Python: `pip wheel tahoe-lafs; pip install tahoe-lafs`. To work on the code directly, I want to be able to do `mkvirtualenv Tahoe; pip install -e .`. This works for all the other Python stuff I work on and use. It appears that if I do this right now (via `pip wheel allmydata-tahoe`) I get a version that is 3 years out of date, which dies with a traceback: `distutils.errors.DistutilsError: Could not find suitable distribution for Requirement.parse('zope.interface==3.6.0,==3.6.1,==3.6.2,>=3.6.5')`.

This ticket seems to propose an extraordinarily complicated way of achieving this result, by bundling specific versions of pip and setuptools (I guess to preserve their present vulnerabilities in perpetuity within Tahoe? :)) rather than having users retrieve `get-pip.py` et al. from the usual place. Reading between the lines, my understanding of this solution is that it intends to provide a quicker, one-step on-boarding process for developers who have never worked on another Python project and want to get started with Tahoe without understanding how the packaging ecosystem works. I don't think this works very well; much as the packaging ecosystem needs to improve, it provides a far better and far better-documented experience than the "helpers" Tahoe provides to get you started working on it.

As a user, I don't want to touch pip or setuptools or, for that matter, a Python interpreter at all; I want to double-click on a Mac app and be presented with a UI. Tahoe does not have any build solution for this sort of user right now, so I think it can be ignored for the purposes of this ticket.
Some users may want to install e.g. a Debian package, but that's Debian's responsibility to build, and they have an extensive toolchain that can be applied to anything that works with `pip install`.

So here's my straw-man proposal:

1. Drop `zetuptoolz` entirely.
2. Slim down `setup.py` to reduce interactions between `pip` (and tools like it) and `setup.py`; "helpers" like `setup.py trial` can be run directly by developers. Similarly, eliminate the "sumo" option; it's not clear to me who this helps.
3. Replace `setup.py` with a small, declarative example, something like this (ignore the py.test command that contradicts point 2 ;-)), which just says `install_requires`, and possibly syncs the version or uses something like Brian's versioneer. Use `entry_points` to produce console scripts rather than the bespoke `MakeExecutable`. To get the "required security" mentioned in the description, don't bundle setuptools; simply add a version check to the top of `setup.py` and refuse to run with antediluvian versions.
4. Replace `quickstart.rst` with a line that just says `virtualenv tahoe; tahoe/bin/pip install tahoe-lafs`.

This does leave the issue of not trusting setuptools to put the right versions in the right places on `sys.path`. If you tell users to create new, normal (i.e. the default, not `--system-site-packages`) virtual environments, rather than trying (and, based on the long history of painful ticket comments in this cluster of tickets, consistently failing) to convince Python to use a set of modules built into a particular directory as its primary import location, you skip not only all the confusing behavior in setuptools and distutils, but also quite a lot of confusing behavior in Python itself around which directories take precedence under which conditions. Virtual environments provide a predictable execution context which is almost impossible to completely replicate with a Python interpreter environment that has arbitrary other stuff installed in it.

There's also the issue of the `console_scripts` entry-point executable shims not doing the right thing on Windows. I saw a previous comment about it not working on win64; this has apparently been fixed. I'm not sure about the unicode command-line arguments issue, but I don't see an issue on their tracker about it. One of the reasons to prefer this solution on Windows, by the way, is that if you don't have Python 3 installed (and the `py.exe` launcher), then shebang lines won't be honored in `.py` (or `.pyscript`) files, and they'll be launched with the global Python interpreter. There's only one global setting for what runs "Python.File" files, and on my Windows computers it's generally a text editor of some kind, so the tahoe command line just doesn't work. (For what it's worth, neither does Twisted's, and I am hoping we switch to entry points for this reason as well.)

I did say that this was a straw-man proposal, though, and that's because I am pretty sure I'm missing some requirements that Tahoe has. I strongly suspect that these other requirements would be better satisfied by scripts that sit as wrappers around `setup.py` rather than trying to burrow around inside of distutils and setuptools, but I'd have to know what they are to be sure of that.

For the record, I recently figured out how to use PyInstaller to make a Go-style single-file binary that doesn't require Python to be installed, on Linux, OS X, or Windows (and I assume it would work on other OSs as well, but I didn't actually verify that). The attempts to make things more user-friendly might be better addressed by making an end-user-targeted download using that (I can point you to how I achieved it; it's OSS) and then letting the pip/virtualenv workflow be targeted towards developers or other "advanced" users.
The `Requirement.parse('zope.interface==3.6.0,==3.6.1,==3.6.2,>=3.6.5')` error is already fixed on trunk. (Note that this was due to an incompatible, undocumented change in setuptools, and we would have had this problem anyway if we had been using upstream setuptools or pip rather than zetuptoolz. In any case it is fixed, so please do not get distracted by it.)

A couple of other difficulties with using pip have just been fixed on a branch, and will be in the Tahoe-LAFS v1.10.1 release.
This ticket was filed some time ago, and I know more about pip and its relation with virtualenv and setuptools than I did then.
Forking zetuptoolz was probably a mistake, and failing to keep up with changes to upstream for this long certainly was. You don't need to reiterate that; we get it.
I've had a migraine today; actually this is the third time I've had a bad migraine after spending the previous night thinking about setuptools/pkg_resources. So forgive me if I sound a bit snappy or irritated.
Tahoe-LAFS in fact does have an OS X installer, which will be in 1.10.1. It may also have a Windows installer if that is finished in time. However you're right that it is independent of this ticket.
Several issues are non-negotiable:

- We need "download some file, extract it, run `python setup.py build`" to Just Work, given that Python is already installed. It must not rely on the user performing extra manual steps, especially not steps that vary by platform. (In particular, Windows must work like any other platform.)
- The file that the user downloads must contain only source; no binary blobs. It's acceptable for precompiled binaries of dependencies to be downloaded from other sites over https. Ideally this download would be separable from build or install, although `python setup.py build` should download dependencies by default. (There may be a different download that includes all dependencies, but that is a "nice to have", not a required feature.)
- We don't trust `pkg_resources` to correctly set up a `sys.path` that will result in importing the versions of dependencies asked for in ~~`setup_requires`~~ `install_requires` (there are several known cases where it doesn't; these are not only associated with the use of eggs, nor are they easily fixable, given the variety of ways in which dependent packages may have been installed on a user's system). To work around this problem, we explicitly check the versions that were actually imported. We do this in a way that does not involve duplicating information about version requirements in multiple places, and we would not want to lose that property.

I have a question here: why must `python setup.py build` "Just Work" without first installing the dependencies required to run it? What (sane) build system has that property, and why do you want that particular property?
"Just work" without first installing dependencies required to run that? What (sane) build system has that property and why do you want that particular property?Replying to daira:
I was not trying to hammer home that it was a mistake by reiterating it, rather, it seemed as though there are feelings about the continued necessity of the status quo despite its unfortunate nature, and I was trying to dispel them. I get that nobody is happy with the current state of that fork.
Somebody put this on a T-shirt. "Python Packaging: So Awful It Will Physically Damage Your Brain!" :-). In all seriousness, I'm sorry to provoke this stress. Please keep in mind that we are just talking about how to make our way to a brighter future, and none of this needs addressing immediately; as I said, if you need to ignore these messages then please do so, I don't mind if it takes you a couple of months at a time to get back to me. If you want me to stop replying and insert the delay on my end, please let me know.
I guess this is the main requirement I don't quite understand. Why is manually unpacking a source tarball such an important use-case? It's not like `wget` and `tar xvzf` are substantially less obscure than `pip`. Also: why is it so important that this file be called `setup.py`? The general trend in the Python community is to unbundle these responsibilities as much as possible, and to have `setup.py` serve as a metadata calculator for distutils and setuptools, with other, external build scripts living elsewhere.

In my mind, there are three audiences here:

- Developers, who can `pip install -e` into a virtualenv.
- Users, who can `pip install allmydata-tahoe`, and ought to be able to tolerate binary blobs for their platform of choice.
- … the `pip` workflow until it can be improved).

Who is the audience for a direct tarball download?
Oh, are you literally talking about `setup_requires`, and not `install_requires`? That feature seems to be a lost cause :-(. What does Tahoe currently need `setup_requires` for?

Sorry, I meant to say `install_requires`. Tahoe-LAFS does currently use `setup_requires`, but that is one of the things that has been fixed on a branch: see https://github.com/tahoe-lafs/tahoe-lafs/commit/dc04bd2b331a2ba8f2737cbd891b8a9508a10ab9, and #2028 for why we still need a hacky exception to this on Windows.

It would probably be good to insert a few days' delay in this conversation, so that I'm completely recovered from the migraine and don't have to think about it until then.
Replying to daira:
Let's call it a week. See you then :-).
Dear Glyph: I'm so glad to have your attention on this issue. I hope you feel sufficiently rewarded for the time you're putting into this that you keep coming back for more.
I don't have a response to most of this material at this point, but I just want to explain "what is the use case" that Glyph was asking about, when he wrote "I am trying to understand the desired user experience outlined on this ticket." in comment:93483.
There are at least two use-cases: developer and end-user. The one that I have been feeling like I am the sole defender of during the last seven (!?!) years is the end-user use case. That use case goes like this:
Because of this use-case, an important requirement for me is that changes we make do not cause regressions in the above process! If a change causes the above process to break, or requires us to add text to source:trunk/docs/quickstart.rst, then that is an important regression in my view.
For example, if a change that we make adds a new dependency, and that new dependency has to be manually installed (i.e. it does not get auto-installed when you follow `quickstart.rst`), then that fails this test. If we make a change that causes the install process to become different on Windows than on Mac OS X, then that fails this test. If we make a change and it causes a working, correctly-configured C compiler to become a requirement for install, then that fails this test.

Now, I'm willing to believe that this has been a fool's errand and a waste of time all these years. I'm kind of depressed about it, to be honest, because I find the alternative of "tell the user to go find a software developer/operating system maintainer who has done this for them" to be unsatisfying. (For one reason, because that disempowers those end users by making them reliant on that expert; for another, because software developers are people too! The software developer that you find will have the same problems you had, just more experience at working around them.)
But, my point in writing this comment is not to argue that this use case is a worthy one, but instead to explain why we've done so much of what we have done over the years, stuff that seems inexplicable if you are unaware of this goal of mine.
@zooko
So my argument here is that telling end users to install Python and then execute `setup.py build` is still too many steps for an end user. With PyInstaller you could add a patch like this for your quickstart:

That is, it's entirely possible to generate a single file that, when executed, runs tahoe. That single file will contain Python, the standard library, all your dependencies, and tahoe itself. Think static linking, but for Python.
There are a few possible downsides to this:
This would mean that end users (and your quickstart guide) never mentions pip and the fact you're written in Python is, for end users, just an implementation detail as is all your other dependencies. You can then focus on making pip (and the python packaging toolchain) work for developers and for redistributors.
I'm in complete agreement with glyph and dstufft here (see #1951 and my experiments related to #2255). I think we can usefully distinguish between folks who simply want to use tahoe and those who want to take it to the next level and hack on it. The hackers can be expected to either know the basics of python hacking, or can be expected to read a second file to remind them of what their options are.
Tahoe's packaging code dates back to 2007, before pip and virtualenv, at a time when setuptools was at best an interesting set of tradeoffs. We began with `setup.py build` and a Makefile that rummaged around in `build/lib*` to find the right thing to add to PYTHONPATH. I was eager to make the install process more GNU-like (`make install` into /usr/local, or `make install PREFIX=X` to use Stow or build a debian package), and there wasn't really a python-like process to use instead.

These days, pip is well-established, setuptools is healthy and commonplace, and virtualenv is a great tool for providing the sort of isolated developer environment that we need.
I don't feel that `download; unpack; setup.py build` needs to be the way people interact with tahoe. I'm not even sure what `setup.py build` is supposed to accomplish anymore: back in 2007 I thought it would create a `build/libSOMETHING` directory that could be put on PYTHONPATH (but even then it felt weird to parse Python's version just to figure out what the SOMETHING was).

After 8 years, I think we're allowed to change the recommended command :).
As glyph and dstufft have pointed out, our quickstart.rst is in fact aimed at developers: a non-developer won't bother to figure out wget/git-clone/tar to see the file in the first place. The furthest a non-developer can be expected to look is the front page of our website, which should have a pointer to the platform-specific installer or executable (#1951/#182/#195). We can put the user-oriented instructions on the website, and the developer-oriented instructions inside the source tree, and not be obligated to use the same text for both.
There's one other feature I'd like to offer our hack-on-tahoe audience: safe builds (#2055). In particular, I'd like to offer a tool which fetches known versions of the dependencies (identified by SHA256) and makes it easy to build an environment that uses only those bits. The idea is to reduce the reliance set down to the original Tahoe source tree. (pip-over-https is a great start, but this would remove the pypi maintainers and the dependency authors from the set of people who could change the behavior of what you've built). It would require compilers and such, or hashes of wheels, but that's ok for the hacker audience.
This could be a standalone script in the source tree, or an extension command in `setup.py`, but it should interact sensibly with the normal virtualenv tools. Like, the end goal for a developer is to have a local virtualenv with the right dependencies installed and a `setup.py develop` symlink-ish in place. You might create this with `ve/bin/pip install tahoe-lafs` (which gives you the latest of everything) or this proposed `misc/safe-install` (which gives you the hash-verified, compatible-but-older bits), but in either case you wind up with a `ve/bin/tahoe` that works.

I have experimental branches in which we put a copy of pip and virtualenv in the source tree, enabling a standalone command to create this sort of environment. But if you're the sort of python hacker who already has them, you can just run your own. My thinking is that our quickstart.rst should point at this standalone tool, with a note that explains that it's really just running virtualenv+pip.
Maybe we need that one-file executable to unblock this. I can imagine Zooko and Daira being more comfortable with making the source tree more developer-centric if they know there's a user-centric option that works. Doing it the other way around might feel riskier, or might feel like we're abandoning the non-developers.
dstufft: could you add your PyInstaller pointers/notes to #1951?
Oh, dstufft reminded me to mention peep (https://pypi.python.org/pypi/peep). It's probably the right tool to handle the "safe build" goal (or may become the right tool in the future).
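To make the "safe build" idea concrete: peep-style pinning puts an expected digest next to each exact version, so installation fails if PyPI or a dependency author ever serves different bits. (This approach later landed in pip itself as hash-checking mode.) A sketch of such a requirements file, with placeholder digests rather than real ones:

```
# requirements.txt with hash pins (the digests below are placeholders)
Twisted==14.0.0 --hash=sha256:0000000000000000000000000000000000000000000000000000000000000000
zope.interface==4.1.1 --hash=sha256:0000000000000000000000000000000000000000000000000000000000000000
```

With pip's hash-checking mode this is installed via `pip install --require-hashes -r requirements.txt`; peep itself used a similar per-requirement syntax based on `# sha256:` comments.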
I really don't like a system in which an operating system priest is required to intervene between the developers who write the software and the users who run it. (Which is different from saying that such intercessors shouldn't be possible or easy or whatever.) I don't see why the continuous integration server shouldn't build a package on every commit, and that package shouldn't be executable on every platform. Who's in charge of computing technology, anyway? They're fucking up.
Oh, I didn't mean to suggest that someone other than us should be building those packages. Our buildbot should definitely be creating them. And then we host them on tahoe-lafs.org . I'm just saying that the source tree isn't the desired executable artifact for this audience, because it's not "executable" enough.
We may need some help from the priests to create the automation that makes the packages (I certainly don't know the windows stuff), but I think we can incorporate that process into our usual CI workflow.
Yes, for PyInstaller in particular: once you have the .spec file written, building a new artifact is done by executing `pyinstaller <pathtospecfile>`. You'll get a single file that someone can copy to their own machine, and it will work no matter what they do or do not have installed (with the sole exceptions that the file is platform-specific and that it requires TMPDIR to not be mounted no-exec).

And to be clear, you can do that in your CI on every commit if you'd like :)
Replying to zooko:
I completely agree. But if you want to solve that problem, it's better to address it at the source (work on pip, setuptools, apt, etc.) than to re-do the work in each package. Each software package like Tahoe having its own build idioms makes the situation worse, because it increases the surface area of expertise that the anointed intercessors have and the plebs don't, increasing the need for them (us?).
There's a deeper philosophical argument here, too, which we should really discuss over some very strong beverage, either caffeinated or alcoholic :-).
That sounds like a great idea, and Tahoe should probably have that. Twisted kinda does, and although we don't distribute the artifacts that are created on each commit, our release manager does distribute various artifacts produced by the build system and doesn't build all of them directly.
The reason that one package shouldn't be executable on every platform is that a "platform" is (more or less by definition) a set of requirements for an executable artifact, and different platforms are, well, different. Trying to create a combo Mac app / Windows app / iOS app / Red Hat package / Debian package / Ubuntu package all at once is just about as silly as trying to make a program which is valid C, Perl, and Python all at the same time.
Oops. It's all of us, right here. Sorry :-(.
I think a good next-step on this is #2473 (stop using `setup_requires`).

Replying to zooko:
Actually the dependency is the other way around; #2473 is blocked on using pypiwin32 on Windows, which is blocked on using newer setuptools (e.g. the setuptools provided by pip). Fortunately it is possible to solve this ticket and #2473 at the same time.
Replying to daira (comment:34):
That's #2392 (add dependency on pypiwin32 to satisfy Twisted on Windows)
We have stopped using `setup_requires=`, and we've switched to modern pip/setuptools/virtualenv. So I think we can close this one now. We have other tickets for single-file executables (#1951 for the general case, #182 for Mac, #195 for Windows).

I want to thank everyone who's put their blood, sweat, and tears into this issue over the last 8 years. Especially zooko, who persevered in his noble battle to improve the state of python packaging (by filing bugs, creating packages, and maintaining entire forks) where I was content to make localized hacks to work around the breakage. And for continuing to fight for the users, not just people who self-identify as developers. I think we're in a much much much better position now than we were 3-4 years ago.