move bundled dependencies out of revision control history and make them optional #249

Closed
opened 2007-12-29 05:07:19 +00:00 by zooko · 36 comments

As per [this tahoe-dev discussion](http://allmydata.org/pipermail/tahoe-dev/2007-December/000290.html), it would be nice to move the bundled dependencies out of revision control history and make them optional.

With the bundled dependencies optional, people who downloaded the "normal" tarball would get a fat tarball with all of the easy-installable dependencies bundled in, so that the "Desert Island Build" would work. The "Desert Island Build" is that someone installs all of the Manual Dependencies, downloads the allmydata-tahoe source tarball, gets on an airplane where they don't have internet access, and then tries to build and install Tahoe.

People who checked out the source with a revision control tool or who downloaded the "minimal" tarball would get only Tahoe-specific source code.

zooko added the
packaging
major
enhancement
0.7.0
labels 2007-12-29 05:07:19 +00:00
zooko added this to the eventually milestone 2007-12-29 05:07:19 +00:00

I had a thought: we build a tarball that contains the libraries that people
might need, and make it available from our website. The build process checks
to see if this tarball exists in the top of the tree, or in the directory
just above it (so that the buildslaves can keep re-using the same tarball
without re-downloading it each time). We also provide a make target that will
download the tarball if necessary (perhaps using wget and an If-Modified-Since
header). If the build process decides it needs to use the tarball, it can
unpack it into a directory which setuptools can then use as a repository.

If done right, this would make the following users happy:

  1. Brian: I have a bunch of trees, all in sibling directories. I don't want
    to download the ext tarball multiple times. I would download it once,
    place it in my trees' parent directory, and then type 'make build-deps'
    in each new tree. This would grab the tarball from the parent directory
    (or maybe find a previously-unpacked directory in the same place and use
    that as-is)
  2. New Users (connected): they use darcs to get a tahoe tree, then type
    'make build-deps'. That notices that there is no ext tarball, so it
    downloads one, then unpacks it, then creates the dependent libraries.
  3. New Users (disconnected): they use darcs to get a tahoe tree (or
    download a snapshot), and also download the ext tarball. Then they
    move to their desert island. They unpack the tahoe tree, and copy the
    ext tarball into it (or to its parent). They type 'make build-deps' and
    it uses that tarball without trying to download a new one.
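
A minimal sketch of what a 'make build-deps' helper along these lines might do (modern Python for illustration; the tarball name is an assumption and the If-Modified-Since handling is omitted):

```python
# Sketch only: check for the dependency tarball locally, else download it.
import os
import urllib.request

DEPS_TARBALL = "tahoe-deps.tar.gz"   # assumed name
DEPS_URL = "http://allmydata.org/source/tahoe/tarballs/" + DEPS_TARBALL

def find_or_fetch_deps_tarball():
    # Look in the top of the tree first, then in the directory just above
    # it, so sibling trees and buildslaves can share a single copy.
    for candidate in (DEPS_TARBALL, os.path.join(os.pardir, DEPS_TARBALL)):
        if os.path.exists(candidate):
            return candidate
    # Not found locally: fetch it into the top of the tree.
    urllib.request.urlretrieve(DEPS_URL, DEPS_TARBALL)
    return DEPS_TARBALL
```
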
zooko modified the milestone from eventually to 1.1.0 2008-06-02 23:15:28 +00:00

this is closely related to (or perhaps a duplicate of) #415.

Author

#415 was a duplicate of this one.

cgalvan commented 2008-08-20 02:10:53 +00:00
Owner

With setuptools, you can have extras_require, which defines optional install dependencies, so you can say something like:

```
easy_install tahoe[misc]
```

And we would specify the 'misc' extra to be all those things under misc/dependencies. This would also still work for a Desert Island build: if, for instance, you had all of your misc dependency tarballs in <path/to/deps>, you would just say:

```
easy_install -f <path/to/deps> tahoe[misc]
```

Does this cover all of your use-cases or am I missing something?
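
A rough sketch of what this could look like in setup.py (the keyword is extras_require; the package lists below are purely illustrative, not Tahoe's actual dependency set):

```python
# Sketch of an optional 'misc' extra; package names are illustrative only.
from setuptools import setup, find_packages

setup(
    name="allmydata-tahoe",
    packages=find_packages(),
    install_requires=["zfec", "foolscap", "simplejson"],   # core (illustrative)
    extras_require={
        # installed only when requested, e.g.  easy_install tahoe[misc]
        "misc": ["nevow", "pyutil"],                        # illustrative
    },
)
```
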

cgalvan commented 2008-08-20 02:14:15 +00:00
Owner

Hm, on second thought, I'm not yet familiar with tahoe's build process, but are these things actually needed at build time or do they just provide additional functionality once you have tahoe installed?

Author

cgalvan: I'm afraid this ticket wasn't explicit enough. All of the packages in question are required for Tahoe. The only question is how to acquire them.

One desideratum is "The Desert Island scenario", i.e. an off-line build, in which you download the Tahoe source and run ./setup.py build, but the build process is not allowed to make connections to the internet. As you've probably seen on the distutils-sig list, this kind of scenario is common behind corporate firewalls. Also it has happened -- twice now I think -- to Brian on an airplane.

Another desideratum is not to keep large binaries (tarballs) in our darcs revision control history.

Another is not to have a large tarball download of the Tahoe source code itself.

These are somewhat in contention, so the agreed plan is to offer more than one way to do it: if you want just the Tahoe source and you don't mind the build process fetching more things from the Internet when you build, then you can get just the Tahoe source tarball or the Tahoe darcs checkout. If you want to be able to build behind a corporate firewall, on an airplane, or on a desert island, then you get both the Tahoe tarball/darcs checkout and the "dependent libs" tarball/darcs checkout.

cgalvan commented 2008-08-26 05:26:25 +00:00
Owner

I believe I have a solution for this problem that satisfies each of the scenarios described in the ticket. Here is how it specifically relates to the three scenarios Brian described.

  1. In the parent directory of your multiple source trees, you would have a folder named 'tahoe_deps' (this can be whatever you choose it to be). This folder would contain all of the tarballs for the external dependencies. Doing a 'setup.py build' or 'develop' would find the tarballs in this location and install them just as if it had downloaded them from PyPI or another repository (a sketch of this lookup follows the list).

  2. Since the user is connected to the internet, the packages will automatically be built after they are found in a repository (most likely PyPI) or via the backup dependency links on the allmydata site.

  3. Similar to scenario 1, except there is only a single source tree.
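
A minimal sketch of that local lookup, assuming the directory names used in this thread (the helper name is made up; Tahoe's actual setup.py may do this differently):

```python
# Sketch: collect local directories of dependency tarballs, if present,
# so they can be handed to easy_install/setuptools as find_links entries.
import os

def local_dependency_links():
    candidates = [
        os.path.join(os.pardir, "tahoe_deps"),   # parent of the source tree(s)
        "tahoe_deps",                             # inside this tree
        os.path.join("misc", "dependencies"),     # the bundled location
    ]
    return [d for d in candidates if os.path.isdir(d)]
```
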

Author

I think Brian may have later told me on the phone that he didn't like the build process to look "outside of its own subtree" by following "..". But I'll leave that to Brian and Chris to work out -- all of the proposals in this ticket seem acceptable to me.

One thing that I am careful about is what effect this will have on [install.html](http://allmydata.org/source/tahoe/trunk/docs/install.html). I will not accept a change to that document which adds a branch (i.e., one that includes the word "if"). I would be okay with any of these approaches being documented in install.html, but I would prefer one in which the user gets both the Tahoe source code and the complete dependency set in a single download operation (i.e. they download a single file after following the instructions in "Get the Source Code" in install.html).

Author

cgalvan: thanks for the patch. I agree that Tahoe setup_require's Twisted for the tests, but why Nevow?

cgalvan commented 2008-08-26 15:47:27 +00:00
Owner

I personally would also prefer that 'tahoe_deps' be pulled from somewhere inside the source tree. The only reason I designed it to be in the parent was to satisfy the first use case that Brian described, but if that is no longer desired it can easily be changed :)

This wouldn't change the current install approach; doesn't someone doing a Desert Island build already have to download the external tarballs separately? To make it easier, we could have a single tarball which contains all of the tarballs for the dependencies.

Also, I wasn't certain whether the tests needed just Twisted or Nevow as well; I meant to ask you about this and it looks like you have already answered my question :)

Author

cgalvan: Thanks again! I'm glad to have your help on these issues.

Okay, here are the next steps:

  1. Wait for Brian to wake up and login and notice this ticket and to decide whether he wants the deps to be in an uncle directory or inside the tahoe directory. (I hope he chooses the latter.)

  2. If it is the latter, then point the dependency links variable back at misc/dependencies.

  3. cgalvan: Do we need to specify each file per its ".tar" name, as in the current trunk, or can we specify just a directory and setuptools will look for all source tarballs in that directory? It used to be the former, which is why the current trunk of Tahoe uses os.listdir() and then filters for files that end with .tar (sketched after this list).

  4. Collect a set of source tarballs of all of the dependent libraries that Tahoe requires and recursively all of the dependent libraries that those dependent libraries require. Uncompress them so that they are in .tar form instead of .tar.gz or .tar.bz2 etc. Make a .tar.bz2 of a directory containing all of those .tar's.

  5. Test it out: unpack the dependent libs tarball into misc/dependencies (or into ../tahoe-dependencies, depending on step 1 above), and see if the Tahoe build succeeds without downloading anything from the network.

  6. Write a script -- probably inside source:Makefile -- which unpacks such a dependent lib tarball into misc/dependencies and then uses ./setup.py sdist to build a tarball which includes the dependencies and is named "allmydata-tahoe-SUMO-1.3.0.tar.gz" instead of "allmydata-tahoe-1.3.0.tar.gz".

  7. Change source:docs/install.html to link to a sumo tarball in the "Get the Source" section.
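
For illustration, the per-file listing that step 3 refers to might look roughly like this (names are illustrative; the real trunk code may differ in detail):

```python
# Illustrative only: name every bundled .tar under misc/dependencies
# explicitly, rather than pointing setuptools at the directory.
import os

dep_dir = os.path.join("misc", "dependencies")
dependency_tarballs = []
if os.path.isdir(dep_dir):
    dependency_tarballs = sorted(
        os.path.join(dep_dir, name)
        for name in os.listdir(dep_dir)
        if name.endswith(".tar")
    )
```
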

cgalvan commented 2008-08-26 16:26:55 +00:00
Owner
  3. You only need to specify the path to 'tahoe_deps' as a dependency link and setuptools will treat it as a repository, so you don't have to specify each file name explicitly.

  4. For some reason in my testing, it wasn't picking up the .tar's; I had to grab the source tarballs from PyPI to test it out (they were gzipped and bzipped). Can you confirm this?

  6. If you want to eventually move away from using the Makefile, we can do this instead by adding a command such as 'sdist_sumo' that could do this by specifying additional data_files :)

cgalvan commented 2008-08-26 16:54:00 +00:00
Owner
  4. Never mind, I was missing one of the .tar's, which happened to be the first one it checked :) It recognizes .tar's just fine.

  6. On second thought, it may be better to subclass the sdist command and write our own hook so that it checks sys.argv for '--sumo' or something, and then makes the appropriate 'sumo' tarball.

Author
  3. I thought this feature of just specifying a dir (which contains tarballs) didn't work in the past, but heck, let's try it and see.

  6. Good thinking! I approve.

cgalvan commented 2008-08-26 23:18:22 +00:00
Owner

Attachment tahoe_ext_deps.patch (3625 bytes) added

Updated patch, note though that it will need to be updated once more when the location of the external dependencies is decided on.

cgalvan commented 2008-08-26 23:22:27 +00:00
Owner

I have updated the patch since Nevow wasn't needed at test time. I also added a '--sumo' option to the sdist command, which toggles inclusion of the external dependency tarballs in the sdist. Note: the proper place for the external dependencies to be pulled from has not yet been decided, so the current patch will need to be updated to reflect that. Currently, the build grabs them from a 'tahoe_deps' folder in the parent of the tahoe source tree, but the '--sumo' option uses the tarballs under 'misc/dependencies', which lets anyone test that option now since those are already under version control.

Author

cgalvan: I tried your patch out and it worked fine!

One thing, though: I think that it is deciding which things to include in misc/dependencies based on the normal setuptools package-data-inclusion logic (i.e. currently it is including everything which is included in darcs revision control).

We would like to remove those .tar's from darcs revision control, and still have them excluded from normal sdist, but still have them included in sdist --sumo. What's the best way to do that? I'm thinking maybe just an inclusion rule (in the --sumo case only) to include everything named misc/dependencies/*.tar.

cgalvan commented 2008-08-27 18:46:21 +00:00
Owner

Glad that it worked for you :)

Yes, the current implementation was just an example so that you could see how it worked using the latest revision, which still had all the .tars. Just as you suggested, the --sumo case would do an include based on a pattern like you described.

Author

Could you show me the actual code for such an include pattern that would go in our setup.py?
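
A sketch of one way such an include rule could look (illustrative only; class and option names are made up and this is not necessarily the code that ended up in Tahoe's setup.py):

```python
# Sketch: a subclassed sdist with a --sumo flag that appends every
# misc/dependencies/*.tar to the sdist file list.
import glob
from setuptools import setup
from setuptools.command.sdist import sdist as _sdist

class sdist_sumo(_sdist):
    user_options = _sdist.user_options + [
        ("sumo", None, "include the bundled dependency tarballs"),
    ]
    boolean_options = _sdist.boolean_options + ["sumo"]

    def initialize_options(self):
        _sdist.initialize_options(self)
        self.sumo = False

    def get_file_list(self):
        # Let the normal sdist logic build the manifest-based file list,
        # then append the dependency tarballs only in the --sumo case.
        _sdist.get_file_list(self)
        if self.sumo:
            self.filelist.extend(sorted(glob.glob("misc/dependencies/*.tar")))
            self.filelist.sort()
            self.filelist.remove_duplicates()

setup(
    name="allmydata-tahoe",            # illustrative
    cmdclass={"sdist": sdist_sumo},
)
```
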

> 1. Wait for Brian to wake up and login and notice this ticket and to decide
>    whether he wants the deps to be in an uncle directory or inside the
>    tahoe directory. (I hope he chooses the latter.)

I'd like them to be in an uncle file, because I have lots of trees (at least
40) and I want to download a single dependency tarball for use by all of
them. So:

```
wget http://allmydata.org/something/tahoe-deps.tar.gz
darcs get http://allmydata.org/something/trunk tahoe-trunk
darcs get tahoe-trunk tahoe-feature1
darcs get tahoe-trunk tahoe-feature2
(cd tahoe-trunk && make all)
(cd tahoe-feature1 && make all)
(cd tahoe-feature2 && make all)
```

If it looks in the current source tree and the parent directory, that's
fine, I just want to be able to hit the same tarball from multiple
directories without having to make a symlink for every tree (because I'll
forget, and then my builds will take a long time to download stuff, and by
the time I notice this and remember the reason for it and create the symlink
the build will be far enough along that adding the symlink won't make
anything better, and that would annoy me).

I'd prefer it to be a single file that gets downloaded, rather than a
directory full of tarballs, but I'd survive if I had to unpack a downloaded
tarball first. (at least I wouldn't be replicating that work for every
feature tree I have).

> 4. Collect a set of source tarballs of all of the dependent libraries that
>    Tahoe requires and recursively all of the dependent libraries that those
>    dependent libraries require. Uncompress them so that they are in .tar
>    form instead of .tar.gz or .tar.bz2 etc. Make a .tar.bz2 of a directory
>    containing all of those .tar's.

I don't see a lot of value to the second part. Conserve the developer's disk
space and just leave the files on disk compressed. I just did a test against
the contents of our misc/dependencies/ directory, and 'tar cjf' of the
current .tar files uses nearly the same space as a 'tar cjf' of .tar.bz2
files (actually the bz2(tar) uses 0.5% more space than bz2(tar(bz2)) ).
(zooko did some measurements a while ago that showed the contrary, but those
were on several thousand small darcs patch files, whereas misc/dependencies
is a handful of fairly large files).

I'll look more closely at the patch now.

thanks!

Author

No, my measurement was dependent lib tarballs of Tahoe:

http://allmydata.org/pipermail/tahoe-dev/2007-December/000292.html

But it doesn't help much with gzip or bzip2.

Author

I applied cgalvan's patch (modified) as changeset:2cbba0efa0c928b1.


ok, so what we just discussed on irc:

  • the tahoe build process ('make all') will look in ./tahoe_deps.tar.bz2 and ../tahoe_deps.tar.bz2
    and ./misc/dependencies/ for dependent library tarballs
  • we'll create a tarball with our dependent libraries, publish it on allmydata.org somewhere
  • the 'setup.py sdist' command will not include those dependent library files in the generated
    source distribution tarball/zipfile
  • the 'setup.py sdist --sumo' command will include those files, in misc/dependencies/
  • we'll set up a buildslave that does both 'sdist' and 'sdist --sumo', and publish both

And our various use cases will be satisfied as follows:

  • 'darcs get tahoe' + build : download all deps from the internet
  • 'darcs get tahoe' + 'wget tahoe-deps.tar.bz2', then get on a plane
  • 'wget tahoe-nightly.tar.bz2' + build : download all deps from the internet
  • 'wget tahoe-sumo.tar.bz2', then get on a plane

My personal use case (multiple darcs trees) will be handled by having a tahoe-deps.tar.bz2 file in their mutual parent directory.

Now, will setuptools look inside a .tar.bz2 for its files? or do we need to have something (either the user, or some code inside setup.py) unpack that tarball before letting setuptools see it?

cgalvan commented 2008-08-28 00:53:43 +00:00
Owner

Everything looks good to me :)

setuptools can't look inside the dependency tarball itself; it will need to be extracted. I would think it'd be sufficient to make the unpacking a necessary step, but it is really up to you (or whoever wants to weigh in on this one) :) Most people who fall into the Desert Island scenario will probably just download the sumo tarball.


I'm ok with unpacking. So the process will be:

  • 'darcs get tahoe' + build: downloads deps from internet
  • 'darcs get tahoe' + 'wget tahoe-deps.tar.bz2' + 'tar xf tahoe-deps.tar.bz2', then get on a plane
  • 'wget tahoe-nightly.tar.bz2' + build: download deps from internet
  • 'wget tahoe-sumo.tar.bz2' then get on a plane

The tahoe-deps.tar.bz2 file will unpack into say tahoe-deps/*.tar.bz2, and
the setup.py build process will look in ["./tahoe-deps", "./misc/dependencies", "../tahoe-deps"] for those libs.
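
A small sketch of that unpack-then-search flow, under the directory layout agreed above (the helper name is made up, and a trusted tarball is assumed):

```python
# Sketch: unpack a downloaded tahoe-deps.tar.bz2 if one is sitting in the
# tree or its parent, then report which search directories actually exist.
import os
import tarfile

SEARCH_DIRS = ["tahoe-deps",
               os.path.join("misc", "dependencies"),
               os.path.join("..", "tahoe-deps")]

def unpack_deps_if_present():
    for where in (".", ".."):
        tarball = os.path.join(where, "tahoe-deps.tar.bz2")
        target = os.path.join(where, "tahoe-deps")
        if os.path.exists(tarball) and not os.path.isdir(target):
            with tarfile.open(tarball, "r:bz2") as t:
                t.extractall(where)   # produces ./tahoe-deps or ../tahoe-deps
    return [d for d in SEARCH_DIRS if os.path.isdir(d)]
```
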

Author

Sounds good!

Author

So some of our builds are failing, like this:

```
Using /usr/lib/python2.5/site-packages
Searching for pyOpenSSL==0.6
Reading http://allmydata.org/trac/tahoe/wiki/Dependencies
Reading http://pypi.python.org/simple/pyOpenSSL/
Reading http://pyopenssl.sourceforge.net/
Best match: pyOpenSSL 0.6
Downloading http://downloads.sourceforge.net/pyopenssl/pyOpenSSL-0.6.tar.gz?modtime=1212595285&big_mirror=0
error: Download error for http://downloads.sourceforge.net/pyopenssl/pyOpenSSL-0.6.tar.gz?modtime=1212595285&big_mirror=0: (110, 'Connection timed out')
```

http://allmydata.org/buildbot/builders/feisty2.5/builds/1663/steps/compile/logs/stdio

We can make these compiles work by manually installing pyOpenSSL on those machines, but it might be better to make them work by changing the build steps to automatically test the "bundled dependencies/Desert Island scenario".

Does anyone want to do that? I can't take the time for it right now.

Author

It would be sweet to finish this ticket for the 1.3.0 release so that 1.3.0 would have a working sumo/desert-island install and so that the slim tarball and the darcs checkout would be slimmer.

Brian is Release Manager for 1.3.0 (at the moment), so he can kick this ticket back out of the Milestone if he wants.

Also he is probably the only person who has a chance of implementing this ticket in time. ;-) Assigning to Brian.

zooko added this to the 1.3.0 milestone 2008-09-08 21:43:43 +00:00
Author

Just imagine a Sumo wrestler on a Desert Island.

Hey, that reminds me of Virtua Fighter 3.


So, I'm experimenting with having the following in setup.cfg:

```
[easy_install]
find_links=misc/dependencies tahoe-deps ../tahoe-deps
           http://allmydata.org/trac/tahoe/wiki/Dependencies
```

And it appears to do the right thing w.r.t. finding .tar.gz files in those different directories. I'm assembling a tahoe-deps.tar.gz aggregate from the things we depend upon.

However, I'm running into a problem (that I think we've seen before). If I have, say, foolscap-0.3.1 installed (via a debian package) in /usr/lib, and if there is a foolscap-0.3.1.tar.gz present in tahoe-deps/, then the tahoe build process will build foolscap and install it to ./support/lib/ even though it's already installed. If foolscap is in /usr/lib but the .tar.gz is not present in tahoe-deps/, it is content to use the /usr/lib version. This appears to be true for most of our dependent libraries: twisted, simplejson, nevow, and pyopenssl, at least.

This is annoying, but not fatal, though it makes builds take a good bit longer than they ought to. I'll poke at this some more, but I might push the changes that take advantage of tahoe-deps/ (and publish the tahoe-deps.tar.gz tarball to allmydata.org, and update the docs) even without fixing this.
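
Not Tahoe's fix, but a small diagnostic sketch that can help here: ask pkg_resources whether a requirement is already satisfied by the installed (e.g. /usr/lib) copy before deciding to rebuild it.

```python
# Diagnostic sketch: check whether an installed distribution already
# satisfies a requirement string.
import pkg_resources

def already_satisfied(requirement):
    try:
        return pkg_resources.get_distribution(requirement)
    except (pkg_resources.DistributionNotFound,
            pkg_resources.VersionConflict):
        return None

d = already_satisfied("foolscap>=0.3.1")
if d is not None:
    print("satisfied by %s %s in %s" % (d.project_name, d.version, d.location))
```
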

Author

Hm... This bug doesn't sound familiar to me. It would totally be familiar if you were talking about Nevow instead of foolscap:

http://bugs.python.org/setuptools/issue20
http://bugs.python.org/setuptools/issue17
http://bugs.python.org/setuptools/issue36
http://divmod.org/trac/ticket/2699
http://divmod.org/trac/ticket/2629
http://divmod.org/trac/ticket/2527

Does the foolscap from the Debian package come with a .egg-info file? Is the .egg-info file in /var/lib/python-support/python2.5 ?


Foolscap is packaged with 'pyshared' (as opposed to 'python-support'), so the code lives in /usr/lib/python2.5/site-packages/foolscap . There is an /usr/lib/python2.5/site-packages/foolscap-0.3.1.egg-info/ directory right next to it. The files in that directory are all symlinks to /usr/share/pyshared/foolscap-0.3.1.egg-info/* .

So it seems like it's a different problem than the nevow/python-support issue.

Author

The next version of setuptools is going to be shipped any day now (it is currently blocked on a couple of bugs that I opened, which PJE has fixed and asked me to test). So now would be a fine time to open a bug report on http://bugs.python.org/setuptools/ .


I just pushed a bunch of changes that add those tahoe-deps/ directories to setup.cfg, and remove most of the tarballs from misc/dependencies/ . There is now a tahoe-deps.tar.gz available at http://allmydata.org/source/tahoe/tarballs/tahoe-deps.tar.gz which contains up-to-date versions of everything. There is also a unit test (well, an extra step in the 'clean' builder) that asserts that a build with tahoe-deps/ in place does not try to download anything.

I still need to update the docs and the wiki to explain this stuff, but the basic code is now in place. Note that there was a problem related to #455 (involving pyutil not being built correctly), with a workaround in place (run 'build-once' up to three times, if the first two attempts fail).
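
A hypothetical sketch of what such a check step could do, keying off the easy_install messages shown in the failing build log earlier in this ticket (the script and marker strings are assumptions, not the actual buildbot configuration):

```python
# Hypothetical check: scan a captured build log and fail if setuptools
# tried to reach the network during a supposedly offline build.
import sys

NETWORK_MARKERS = ("Downloading http", "Reading http", "Download error")

def assert_no_downloads(logfile):
    with open(logfile) as f:
        offenders = [line.rstrip() for line in f
                     if any(m in line for m in NETWORK_MARKERS)]
    if offenders:
        sys.exit("build tried to use the network:\n" + "\n".join(offenders))

if __name__ == "__main__":
    assert_no_downloads(sys.argv[1])
```
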


-SUMO tarballs are now being generated and uploaded by the buildbot. Only the docs are left.


Ok, docs are done. I've added the InstallDetails wiki page, and I've added some small changes to source:docs/install.html to reference it. Finally closing this ticket.

warner added the
fixed
label 2008-09-17 22:59:08 +00:00
Reference: tahoe-lafs/trac-2024-07-25#249