move bundled dependencies out of revision control history and make them optional #249
As per this tahoe-dev discussion, it would be nice to move the bundled dependencies out of revision control history and make them optional.
With the bundled dependencies being optional, people who downloaded the "normal" tarball would still get a fat tarball with all of the easy-installable dependencies bundled in, so that the "Desert Island Build" would work. The "Desert Island Build" is the scenario where someone installs all of the Manual Dependencies, downloads the allmydata-tahoe source tarball, gets on an airplane where they don't have internet access, and then tries to build and install Tahoe.
People who checked out the source with a revision control tool or who downloaded the "minimal" tarball would get only Tahoe-specific source code.
I had a thought: we build a tarball that contains the libraries that people might need, and make it available from our website. The build process checks to see if this tarball exists at the top of the tree, or in the directory just above it (so that the buildslaves can keep re-using the same tarball without re-downloading it each time). We also provide a make target that will download the tarball if necessary (perhaps using wget and an If-Modified-Since header). If the build process decides it needs to use the tarball, it can unpack it into a directory which setuptools can then use as a repository.
If done right, this would make the following users happy:
- Developers like me with many source trees, who don't want to download the ext tarball multiple times. I would download it once, place it in my trees' parent directory, and then type 'make build-deps' in each new tree. This would grab the tarball from the parent directory (or maybe find a previously-unpacked directory in the same place and use that as-is).
- Developers with a single tree and a network connection, who type 'make build-deps'. That notices that there is no ext tarball, so it downloads one, then unpacks it, then creates the dependent libraries.
- People doing a Desert Island build, who get the Tahoe source (or download a snapshot), and also download the ext tarball. Then they move to their desert island. They unpack the tahoe tree, and copy the ext tarball into it (or to its parent). They type 'make build-deps' and it uses that tarball without trying to download a new one.
This is closely related to (or perhaps a duplicate of) #415.
#415 was a duplicate of this one.
With setuptools, you can have extras_require, which act as optional install dependencies, so you can say something like:
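A minimal sketch of what such a declaration could look like (the package names and version are illustrative assumptions, not Tahoe's actual dependency list):

# Hypothetical sketch: declaring the bundled dependencies as an optional
# "misc" extra via setuptools' extras_require.
from setuptools import setup

setup(
    name="allmydata-tahoe",
    version="1.3.0",                        # illustrative only
    install_requires=["zfec", "foolscap"],  # assumed subset of the real deps
    extras_require={
        # everything that used to live under misc/dependencies/
        "misc": ["Twisted", "Nevow", "simplejson", "pyOpenSSL"],
    },
)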
And we would specify the 'misc' extra to be all those things under misc/dependencies. This would also still be supported in a Desert Island build: if, for instance, you had all of your misc. dependency tarballs in <path/to/deps>, you would just say:
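The exact invocation isn't recorded above; one plausible form (an assumption) would point easy_install at the local directory of tarballs and request the 'misc' extra:

easy_install --find-links=<path/to/deps> "allmydata-tahoe[misc]"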
Does this cover all of your use-cases or am I missing something?
Hm, on second thought, I'm not yet familiar with tahoe's build process, but are these things actually needed at build time or do they just provide additional functionality once you have tahoe installed?
cgalvan: I'm afraid this ticket wasn't explicit enough. All of the packages in question are required for Tahoe. The only question is how to acquire them.
One desideratum is "the Desert Island scenario", i.e. an off-line build, in which you download the Tahoe source and run ./setup.py build, but the build process is not allowed to make connections to the internet. As you've probably seen on the distutils-sig list, this kind of scenario is common behind corporate firewalls. It has also happened -- twice now, I think -- to Brian on an airplane.
Another desideratum is not to keep large binaries (tarballs) in our darcs revision control history.
Another is not to have a large tarball download of the Tahoe source code itself.
These are somewhat in contention, so the agreed plan is to offer more than one way to do it: if you want just the Tahoe source and you don't mind the build process fetching more things from the Internet when you build, then you can get just the Tahoe source tarball or the Tahoe darcs checkout. If you want to be able to build behind a corporate firewall, on an airplane, or on a desert island, then you get both the Tahoe tarball/darcs checkout and the "dependent libs" tarball/darcs checkout.
I believe I have a solution for this problem that satisfies each of the scenarios described in the ticket. Here is how it relates specifically to the three scenarios Brian described:
1. In the parent directory of your multiple source trees, you would have a folder named 'tahoe_deps' (this can be whatever you choose it to be). This folder would contain all of the tarballs for the external dependencies. Doing a 'setup.py build' or 'develop' would find the packages' tarballs in this location and would install them just as if it had downloaded them from PyPI or another repository.
2. Since the user is connected to the internet, the packages will automatically be built after they are found in a repository (most likely PyPI) or the backup dependency link on the allmydata site.
3. Similar to #1, except there is only a single source tree.
I think Brian may have later told me on the phone that he didn't like the build process to look "outside of its own subtree" by following "..". But I'll leave that to Brian and Chris to work out -- all of the proposals in this ticket seem acceptable to me.
One thing that I am careful about is what effect this will have on install.html. I will not accept a change to that document which adds a branch (i.e., one that includes the word "if"). I would be okay with any of these approaches being documented in install.html, but I would prefer one in which the user gets both the Tahoe source code and the complete dependency set in a single download operation (i.e. they download a single file after following the instructions in "Get the Source Code" in install.html).
cgalvan: thanks for the patch. I agree that Tahoe setup_require's Twisted for the tests, but why Nevow?
I personally would also prefer if 'tahoe_deps' were pulled from somewhere inside the source tree; the only reason I designed it to be in the parent was so that it would satisfy the first use case that Brian described, but if this is no longer desired it can easily be changed :)
This wouldn't change the current install approach; doesn't someone doing a Desert Island build already have to download the external tarballs separately? To make it easier, we could have a single tarball which contains all of the tarballs for the dependencies.
Also, I wasn't certain whether the tests needed just Twisted or Nevow as well; I meant to ask you about this and it looks like you have already answered my question :)
cgalvan: Thanks again! I'm glad to have your help on these issues.
Okay, here are the next steps:
1. Wait for Brian to wake up and log in and notice this ticket and to decide whether he wants the deps to be in an uncle directory or inside the tahoe directory. (I hope he chooses the latter.)
2. If it is the latter, then put back the dependent-links variable to misc/dependencies. cgalvan: do we need to specify each file per its ".tar" name, as in the current trunk, or can we specify just a directory and setuptools will look for all source tarballs in that directory? It used to be the former, which is why the current trunk of Tahoe uses os.listdir() and then filters for files that end with .tar (see the sketch after this list).
3. Collect a set of source tarballs of all of the dependent libraries that Tahoe requires, and recursively all of the dependent libraries that those dependent libraries require. Uncompress them so that they are in .tar form instead of .tar.gz or .tar.bz2 etc. Make a .tar.bz2 of a directory containing all of those .tar's.
4. Test it out: unpack the dependent-libs tarball into misc/dependencies (or into ../tahoe-dependencies, depending on step 1 above), and see if the Tahoe build succeeds without downloading anything from the network.
5. Write a script -- probably inside source:Makefile -- which unpacks such a dependent-lib tarball into misc/dependencies and then uses ./setup.py sdist to build a tarball which includes the dependencies and is named "allmydata-tahoe-SUMO-1.3.0.tar.gz" instead of "allmydata-tahoe-1.3.0.tar.gz".
6. Change source:docs/install.html to link to a sumo tarball in the "Get the Source" section.
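For reference, the os.listdir() approach mentioned in step 2 looks roughly like the following (a simplified sketch, not the exact code in trunk; the directory and variable names are assumptions):

import os

DEPS_DIR = os.path.join("misc", "dependencies")  # assumed location of the bundled tarballs

# Collect every ".tar" file so it can be handed to setuptools as a
# dependency link; an empty list means the deps must come from the network.
dependency_links = []
if os.path.isdir(DEPS_DIR):
    for name in sorted(os.listdir(DEPS_DIR)):
        if name.endswith(".tar"):
            dependency_links.append(os.path.join(DEPS_DIR, name))

# later: setup(..., dependency_links=dependency_links, ...)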
You only need to specify the path to 'tahoe_deps' as a dependency link and setuptools will treat it as a repository, so you don't have to specify each file name explicitly.
For some reason, in my testing it wasn't picking up the .tar's; I had to grab the source tarballs from PyPI to test it out (which were gzipped and bzipped). Can you confirm this?
If you want to eventually move away from using the Makefile, we could instead add a command such as 'sdist_sumo' that does this by specifying additional data_files :)
Never mind, I was missing one of the .tar's, which happened to be the first one it checked :) It recognizes .tar's just fine.
On second thought, it may be better to subclass the sdist command and write our own hook so that it checks sys.argv for '--sumo' or something, and then makes the appropriate 'sumo' tarball.
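As a rough sketch of that idea (the class name and option handling below are assumptions, not the patch that was eventually applied), the --sumo flag can be declared on an sdist subclass so setuptools parses it, rather than scanning sys.argv by hand:

from setuptools import setup
from setuptools.command.sdist import sdist

class MySdist(sdist):
    """sdist variant that knows about a --sumo flag."""

    user_options = sdist.user_options + [
        ("sumo", None, "include the bundled dependency tarballs in the sdist"),
    ]
    boolean_options = sdist.boolean_options + ["sumo"]

    def initialize_options(self):
        sdist.initialize_options(self)
        self.sumo = False   # off by default; 'setup.py sdist --sumo' turns it on

setup(
    name="allmydata-tahoe",          # illustrative
    cmdclass={"sdist": MySdist},
)

Invoked as './setup.py sdist --sumo', the option machinery sets self.sumo for us instead of our having to inspect sys.argv.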
I thought this feature of just specifying a dir (which contains tarballs) didn't work in the past, but heck, let's try it and see.
Good thinking! I approve.
Attachment tahoe_ext_deps.patch (3625 bytes) added
Updated patch; note, though, that it will need to be updated once more when the location of the external dependencies is decided.
I have updated the patch since Nevow wasn't needed at test time. I also added a '--sumo' option to the sdist command, which toggles including the external dependency tarballs in the sdist. Note: the proper place for the external dependencies to be pulled from has not yet been decided, so the current patch will need to be updated to reflect that. Currently, when building, it grabs from a 'tahoe_deps' folder in the parent of the tahoe source tree, but the '--sumo' option uses the tarballs under 'misc/dependencies', which lets anyone test that option right now since those tarballs are already under version control.
cgalvan: I tried your patch out and it worked fine!
One thing, though: I think that it is deciding which things to include in misc/dependencies based on the normal setuptools package-data-inclusion logic (i.e. currently it is including everything which is included in darcs revision control). We would like to remove those .tar's from darcs revision control, have them excluded from a normal sdist, but still have them included in sdist --sumo. What's the best way to do that? I'm thinking maybe just an inclusion rule (in the --sumo case only) to include everything named misc/dependencies/*.tar.
Glad that it worked for you :)
Yes, the current implementation was just an example so that you could see how it worked using the latest revision, which still had all the .tars. Just as you suggested, the --sumo case would do an include based on a pattern like you described.
Could you show me the actual code for such an include pattern that would go in our setup.py?
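Building on the MySdist sketch shown earlier in this thread (so the --sumo attribute is assumed to exist; again this is an illustration, not necessarily the code that was committed), the include pattern could be applied while the sdist file list is being built:

import glob
import os
from setuptools.command.sdist import sdist

class MySdist(sdist):
    # --sumo option declared as in the earlier sketch ...

    def get_file_list(self):
        # Build the normal sdist file list first (which will no longer
        # contain the dependency tarballs once they leave darcs) ...
        sdist.get_file_list(self)
        if getattr(self, "sumo", False):
            # ... then, only for --sumo, sweep in everything named
            # misc/dependencies/*.tar.
            pattern = os.path.join("misc", "dependencies", "*.tar")
            self.filelist.extend(sorted(glob.glob(pattern)))
            self.filelist.sort()
            self.filelist.remove_duplicates()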
I'd like them to be in an uncle file, because I have lots of trees (at least 40) and I want to download a single dependency tarball for use by all of them. So:
If it looks in the current source tree and the parent directory, that's fine; I just want to be able to hit the same tarball from multiple directories without having to make a symlink for every tree (because I'll forget, and then my builds will take a long time to download stuff, and by the time I notice this and remember the reason for it and create the symlink the build will be far enough along that adding the symlink won't make anything better, and that would annoy me).
I'd prefer it to be a single file that gets downloaded, rather than a directory full of tarballs, but I'd survive if I had to unpack a downloaded tarball first (at least I wouldn't be replicating that work for every feature tree I have).
I don't see a lot of value in the second part. Conserve the developer's disk space and just leave the files on disk compressed. I just did a test against the contents of our misc/dependencies/ directory, and 'tar cjf' of the current .tar files uses nearly the same space as a 'tar cjf' of .tar.bz2 files (actually the bz2(tar) uses 0.5% more space than bz2(tar(bz2))). (zooko did some measurements a while ago that showed the contrary, but those were on several thousand small darcs patch files, whereas misc/dependencies is a handful of fairly large files.)
I'll look more closely at the patch now.
thanks!
No, my measurement was of Tahoe's dependent-lib tarballs:
http://allmydata.org/pipermail/tahoe-dev/2007-December/000292.html
But uncompressing the inner tarballs first doesn't help much with gzip or bzip2.
I applied cgalvan's patch (modified) as changeset:2cbba0efa0c928b1.
ok, so what we just discussed on irc:
- The build process will look in ./tahoe-deps/, ../tahoe-deps/, and ./misc/dependencies/ for dependent library tarballs.
- 'setup.py sdist --sumo' will produce a sumo source distribution tarball/zipfile that includes those dependency tarballs.
And our various use cases will be satisfied as follows:
My personal use case (multiple darcs trees) will be handled by having a tahoe-deps.tar.bz2 file in their mutual parent directory.
Now, will setuptools look inside a .tar.bz2 for its files? or do we need to have something (either the user, or some code inside setup.py) unpack that tarball before letting setuptools see it?
Everything looks good to me :)
setuptools can't look inside the dependency tarball itself; it will need to be extracted. I would think it'd be sufficient to make the unpacking a necessary step, but it is really up to you (or whoever wants to weigh in on this one) :) Most people who fall into the Desert Island scenario will probably just download the sumo tarball.
I'm ok with unpacking. So the process will be: the tahoe-deps.tar.bz2 file will unpack into, say, tahoe-deps/*.tar.bz2, and the setup.py build process will look in ["./tahoe-deps", "./misc/dependencies", "../tahoe-deps"] for those libs.
Sounds good!
So some of our builds are failing, like this:
http://allmydata.org/buildbot/builders/feisty2.5/builds/1663/steps/compile/logs/stdio
We can make these compiles work by manually installing pyOpenSSL on those machines, but it might be better to make them work by changing the build steps to automatically test the "bundled dependencies/Desert Island scenario".
Does anyone want to do that? I can't take the time for it right now.
It would be sweet to finish this ticket for the 1.3.0 release so that 1.3.0 would have a working sumo/desert-island install and so that the slim tarball and the darcs checkout would be slimmer.
Brian is Release Manager for 1.3.0 (at the moment), so he can kick this ticket back out of the Milestone if he wants.
Also he is probably the only person who has a chance of implementing this ticket in time. ;-) Assigning to Brian.
Just imagine a Sumo wrestler on a Desert Island.
Hey, that reminds me of Virtua Fighter 3.
So, I'm experimenting with having the following in setup.cfg:
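A plausible form of that setup.cfg stanza (reconstructed as an assumption from the directories named in this thread, not a verbatim copy of what was committed) is an easy_install find_links setting:

[easy_install]
# where to look for dependent-library tarballs before going to the network
find_links = misc/dependencies tahoe-deps ../tahoe-deps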
And it appears to do the right thing w.r.t. finding .tar.gz files in those different directories. I'm assembling a tahoe-deps.tar.gz aggregate from the things we depend upon.
However, I'm running into a problem (that I think we've seen before). If I have, say, foolscap-0.3.1 installed (via a debian package) in /usr/lib, and if there is a foolscap-0.3.1.tar.gz present in tahoe-deps/ , then the tahoe build process will build foolscap and install it to ./support/lib/ even though it's already installed. If foolscap is in /usr/lib but the .tar.gz is not present in tahoe-deps/ , it is content to use the /usr/lib version. This appears to be true for most of our dependent libraries: twisted, simplejson, nevow, and pyopenssl, at least.
This is annoying, but not fatal; it makes builds take a good bit longer than they ought to. I'll poke at this some more, but I might push the changes that take advantage of tahoe-deps/ (and publish the tahoe-deps.tar.gz tarball to allmydata.org, and update the docs) even without fixing this.
Hm... This bug doesn't sound familiar to me. It would totally be familiar if you were talking about Nevow instead of foolscap:
http://bugs.python.org/setuptools/issue20
http://bugs.python.org/setuptools/issue17
http://bugs.python.org/setuptools/issue36
http://divmod.org/trac/ticket/2699
http://divmod.org/trac/ticket/2629
http://divmod.org/trac/ticket/2527
Does the foolscap from the Debian package come with a .egg-info file? Is the .egg-info file in /var/lib/python-support/python2.5 ?
Foolscap is packaged with 'pyshared' (as opposed to 'python-support'), so the code lives in /usr/lib/python2.5/site-packages/foolscap . There is an /usr/lib/python2.5/site-packages/foolscap-0.3.1.egg-info/ directory right next to it. The files in that directory are all symlinks to /usr/share/pyshared/foolscap-0.3.1.egg-info/* .
So it seems like it's a different problem than the nevow/python-support issue.
The next version of setuptools is going to be shipped any day now (it is currently blocked on a couple of bugs that I opened; PJE has fixed them and asked me to test his fixes). So now would be a fine time to open a bug report on http://bugs.python.org/setuptools/ .
I just pushed a bunch of changes that add those tahoe-deps/ directories to setup.cfg, and remove most of the tarballs from misc/dependencies/ . There is now a tahoe-deps.tar.gz available at http://allmydata.org/source/tahoe/tarballs/tahoe-deps.tar.gz which contains up-to-date versions of everything. There is also a unit test (well, an extra step in the 'clean' builder) that asserts that a build with tahoe-deps/ in place does not try to download anything.
I still need to update the docs and the wiki to explain this stuff, but the basic code is now in place. Note that there was a problem related to #455 (involving pyutil not being built correctly), with a workaround in place (run 'build-once' up to three times, if the first two attempts fail).
SUMO tarballs are now being generated and uploaded by the buildbot. Only the docs are left.
Ok, docs are done. I've added the InstallDetails wiki page, and I've added some small changes to source:docs/install.html to reference it. Finally closing this ticket.