Building tahoe safely is non-trivial #2055
Reference: tahoe-lafs/trac-2024-07-25#2055
Summary: to safely build Tahoe on an untrustworthy (read: any) network, it currently seems necessary to take an unintuitive step, such as setting up a restrictive firewall or simply disconnecting from the internet, in order to prevent `setup.py` from downloading and running arbitrary code via HTTP.

In this ticket I describe the two approaches I've tried: virtualenv v1.9.1 (with pip v1.3), and the "Desert Island" build. If appropriate precautions are taken, both methods can yield what I believe are relatively "safe" builds; that is, they at least use HTTPS (and require CA-signed certificates) to ensure the integrity of the downloaded dependencies. The former requires blocking pip's port-80 connections, and the latter requires disconnecting from the internet during the build.
virtualenv+pip
Ideally, `pip install allmydata-tahoe` would be an easy and safe command to run! Version 1.3 of pip finally added certificate verification when making HTTPS connections, but when installing allmydata-tahoe v1.10 it still attempts to fetch foolscap and pycrypto via HTTP first. If that fails, perhaps because you've configured a firewall to disallow port-80 connections, it will fall back to downloading them from PyPI via HTTPS.
Note that using virtualenv 1.9 and pip 1.3, `pip install allmydata-tahoe` fails unless `pip install twisted` is run first. This might be because the former installs Twisted 11.0 while the latter installs Twisted 13.0.

The "Desert Island" Build
On the AdvancedInstall wiki page there are instructions for a "Desert Island" build, which consists of downloading https://tahoe-lafs.org/source/tahoe-lafs/deps/tahoe-deps.tar.gz, extracting it in the tahoe-lafs source directory, and running `python setup.py build`.

While this does work fine without an internet connection, it still tries repeatedly to connect to the internet. These are the lines of `python setup.py build` output which contain "Reading http":

Here is the context around one of them on my offline system (the others are similar):
I'm assuming (but have not confirmed) from the "Best match" part of this output that if any of these attempted requests were successful and the response indicated that there is a newer version of one of the dependencies than the corresponding egg in tahoe-deps, it would actually download and execute that code.
I've updated the desert island build instructions on AdvancedInstall to indicate that it is currently necessary to disconnect from the internet to have a truly offline build.
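For reference, the Desert Island procedure described above can be sketched as follows (the checkout location and the exact extraction layout are my assumptions; follow AdvancedInstall for the authoritative steps):

```shell
# Hypothetical walk-through of the "Desert Island" build described above.
# Assumes a tahoe-lafs source checkout in ./tahoe-lafs.
cd tahoe-lafs

# Fetch the dependency bundle while still online...
wget https://tahoe-lafs.org/source/tahoe-lafs/deps/tahoe-deps.tar.gz

# ...and unpack it into the source directory (creates ./tahoe-deps/).
tar -xzf tahoe-deps.tar.gz

# Now disconnect from the internet entirely, then build:
python setup.py build
```

As the ticket notes, the disconnect step is what actually makes this safe: without it, `setup.py` will still probe the network for newer dependency versions.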
This is a relevant thread on the tahoe-dev mailing list:
https://tahoe-lafs.org/pipermail/tahoe-dev/2013-August/008643.html
FWIW, I set the `http_proxy` and `https_proxy` environment variables to bogus values when I want to perform an offline build. The installation will try (and fail) to go out to the internet to fetch newer dependencies.

Based on my experience, your assumption is correct.
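Concretely, the bogus-proxy trick might look like this (the specific port is my own arbitrary choice; any local port with nothing listening on it works):

```shell
# Point the proxy variables at a local port where nothing listens, so any
# attempt by the build to reach the network fails immediately.
export http_proxy=http://127.0.0.1:9
export https_proxy=http://127.0.0.1:9

# The build can now no longer silently fetch newer dependencies:
python setup.py build
```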
Thanks, killyourtv. I feel kind of terrible now, as your comment made me realize that even after my careful research writing this ticket I actually just published a script that was still unsafely installing tahoe. :(
I did much of the testing in an environment with Tor configured to refuse all connections on port 80, but in the first version of my tails bootstrap script which I published a couple hours ago I was foolishly operating under the assumption that setup.py on Tails wasn't able to connect to the internet because I saw some "Connection refused" lines. It turns out, Tails 0.19 sets the http_proxy environment variable but NOT https_proxy, so the errors I was seeing were only about the https connections. And, tahoe's setup.py only prints URLs when they fail. :(
To anyone who ran that first version of the script, I apologize. Hopefully there aren't malicious Tor exits serving higher-numbered versions of Tahoe dependencies than tahoe-deps.tar.gz has. :(
Maybe something like peep would solve this? Peep is just a wrapper around pip that will verify tarballs against a hash you give it. If any of the hashes mismatch, peep will abort the installation.
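As I recall peep's usage (hedging: the comment-line format below is from memory of peep's README, and the digest shown is a made-up placeholder, not a real hash), you pin each requirement with a hash comment and run `peep install -r requirements.txt`:

```
# requirements.txt for use with `peep install -r requirements.txt`
# sha256: NOT-A-REAL-HASH-replace-with-the-digest-peep-reports
allmydata-tahoe==1.10.0
```

Since peep aborts on any mismatch, an attacker who controls the network can at worst cause the install to fail, not substitute code.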
Is there a sufficiently convenient way to ask your operating system to deny networking to a given subprocess while still allowing it for your other processes? That would be useful not only for building Tahoe-LAFS, but any other package that you wanted to build. It is important to note that this would not be an attempt to prevent a malicious process from communicating, it would only be preventing an honest but imprudent process from downloading packages.
Replying to markberger:
I had not heard of peep before, but after skimming over its mere 224 lines of code just now I think I like it! One thing that isn't clear to me though is the process by which a user or developer is supposed to become aware of new versions of libraries and decide to use them.
Replying to zooko:
That depends on your definition of sufficiently convenient :)
There are LD_PRELOAD tools (such as `usewithtor`/`torsocks`/`tsocks`) which catch most things and redirect them to a SOCKS proxy, but they aren't 100% reliable. Some programs (even non-malicious ones) might make connections in ways those tools don't catch. Also, they're another binary dependency.

Linux's netfilter firewall can do everything, but we obviously don't want to require Linux or root access to build tahoe. But if anyone is interested, you can have per-user firewall rules, which I believe are as reliable as the rest of Linux's privilege separation. On modern Debian or Ubuntu systems, you can use the iptables frontend ufw. It is as easy as running `sudo apt-get install ufw`, adding a line like `-A ufw-before-output -m owner --uid-owner offline-user -j REJECT` somewhere before the last line in `/etc/ufw/before.rules`, running `sudo ufw disable; sudo ufw enable`, and su'ing to the "offline-user" user. Another way to use ufw is to edit `/etc/default/ufw` and change DEFAULT_OUTPUT_POLICY from ACCEPT to REJECT, then add rules to `before.rules` allowing a certain user to connect, and then run tor or another proxy or VPN as that user. Then you can use your proxy or VPN to restrict what kinds of connections are allowed.

I'm not really in favor of making the official build process use firewall or LD_PRELOAD tricks, though, as there are of course much better ways to do an offline build.
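Collecting the ufw recipe above in one place, the relevant fragment of `/etc/ufw/before.rules` would look something like this (the `offline-user` name is the example from the comment above; `before.rules` is in iptables-restore format, so the rule must come before the file's final `COMMIT` line):

```
# /etc/ufw/before.rules (excerpt): reject all outbound traffic from
# processes owned by the build user "offline-user".
-A ufw-before-output -m owner --uid-owner offline-user -j REJECT
```

Reload with `sudo ufw disable; sudo ufw enable`, then run the build as `offline-user`.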
Today I learned that pip has a `--no-download` option! So, the short-term thing I'd like to see, and which I might try to do myself on my branch in the near future, is to migrate the build process to use virtualenv (which includes pip, and weighs in at 2MB compressed) and include that in the repository instead of the zetuptoolz fork of setuptools from 2010 which is in there now. The next step is to either use peep or make sure that when the deps are not already present, pip can only ever learn about HTTPS URLs to download them.

The longer-term thing I'd like to see is deterministic builds (#2057)! In my ideal world everyone would be able to build identical debs, tarballs, exes, and dmgs. Of course, part of that involves specifying precise versions of all dependencies. Another part is building in a VM (gitian automates that), which would certainly make it easier to be confident that the build process can't get online. I haven't looked very closely at gitian yet, but I'm under the impression that it will be quite a bit of work to get to that point.
Leif: your comment covers enough different (related) topics that I think it should be a post to the tahoe-dev thread instead of just a comment. (Then maybe some parts of it should be some comments on a few different tickets…)
Oldish ticket, but it was linked to me today!
So here's some information about various versions of packaging tools and what they support wrt HTTPS.
pip < 1.3 - YOLO with HTTP all around
pip 1.3 - Hits PyPI using HTTPS (does not fall back to HTTP). However, it automatically scrapes links on a package's /simple/foo/ page on PyPI, which may be hosted over HTTP; anything in setup_requires is downloaded and installed by setuptools, not pip; and if a package has dependency_links then pip will also scrape those, which may be hosted via HTTP. Uses an old copy of root certificates that were incorrectly extracted from Mozilla's trust root.
pip 1.4 - Mostly the same as 1.3, but adds the ability to disable scraping of sites external to PyPI. Still uses the old, incorrectly extracted Mozilla root certificates.
pip 1.5 - Switches the 1.4 options on by default: pip no longer scrapes sites other than PyPI, and processing of dependency links is disabled. With the default configuration, the only non-HTTPS network access can come from setup_requires. Uses an up-to-date (at time of release) bundled CA bundle that was properly taken from Mozilla (via a tool agl wrote).
pip 1.6 (future/proposed) - Removes the ability to enable dependency links at all, and takes control of setup_requires so that setuptools no longer has any say over it; `pip install <something>` by default uses only verified HTTPS unless the user invoking pip explicitly supplies an HTTP URL somewhere.

setuptools < 0.7 - YOLO with HTTP all around
setuptools >= 0.7 - Will use HTTPS to hit PyPI, though this may or may not actually be active, because it attempts to discover certificates and I believe it fails open; installation depends on an old version of certifi which incorrectly uses the Mozilla cert bundle and is outdated. Can still use HTTP if a link is listed on a project's /simple/foo/ page or inside a dependency link. There is no way to require that everything be loaded over HTTPS, but you can restrict which hosts are used.
FWIW, `pip --no-download` is bad and you shouldn't use it. If you want that behavior, you should instead download the packages to a directory (you can use `pip install --download DIR package [package ...]` for that) and then use `pip install --no-index --find-links DIR package [package ...]`.
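dstufft's two-step recipe, sketched for the pip of that era (the `./localdeps` directory name is my own placeholder; in later pip releases `pip install --download` was replaced by the `pip download` subcommand):

```shell
# Step 1 (online): fetch the packages and their dependencies into a local
# directory without installing anything.
pip install --download ./localdeps allmydata-tahoe

# Step 2 (can be offline): install strictly from that directory, never
# consulting a package index or the network.
pip install --no-index --find-links ./localdeps allmydata-tahoe
```

The point of `--no-index` is that a missing dependency makes the install fail loudly instead of quietly falling back to the network.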
You can tell easy_install/setuptools not to hit the network by telling it the allowed hosts are 'None' (http://pythonhosted.org/setuptools/easy_install.html#restricting-downloads-with-allow-hosts).
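Per the linked documentation, that restriction can also be made persistent in `setup.cfg` so every easy_install/setuptools run is covered (a sketch; section and option names as given in the setuptools easy_install docs):

```
[easy_install]
allow_hosts = None
```

With `allow_hosts = None`, no host matches the pattern, so easy_install refuses to download from anywhere and all dependencies must already be available locally.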
See also #2077.
Here's a useful summary of the situation from dstufft:
https://tahoe-lafs.org/pipermail/tahoe-dev/2014-June/009106.html
Sounds like a good next step is to visit the transitive closure of tahoe-lafs and its dependencies and see if we can remove all the `setup_requires` dependencies. Removing the `setup_requires` would also fix #2066.

Replying to zooko:
Fixing #2066 would be easier if we require a newer Nevow which is now available: /tahoe-lafs/trac-2024-07-25/issues/7062#comment:-1
I think a good next step on this is #2473 (stop using `setup_requires`).

Milestone renamed
renaming milestone
Moving open issues out of closed milestones.
I wonder what's left to do on this.
I naively believe that `pip install --no-index --find-links path-to-wheelhouse/` will install Tahoe-LAFS and all its dependencies from `path-to-wheelhouse` (or fail to install if there are missing dependencies) and not hit the network. Just now I tried exactly this with my network disconnected, and the installation completed successfully!
This is no proof that it will succeed tomorrow, of course. But maybe the desired behavior is provided now and what remains is to automatically verify it as part of continuous integration?
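One way such a CI check might look (a sketch: the wheelhouse path is the one used above, and the poisoned-proxy trick is borrowed from earlier in this thread) is to run the install with proxies pointed at a dead local port, so any attempted network access fails the build instead of silently succeeding:

```shell
# CI guard: this install must succeed using only the wheelhouse. If pip
# ever tries the network, the poisoned proxy variables make it fail loudly.
env http_proxy=http://127.0.0.1:9 https_proxy=http://127.0.0.1:9 \
    pip install --no-index --find-links path-to-wheelhouse/ allmydata-tahoe
```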
Ticket retargeted after milestone closed