pyOpenSSL 0.14 pulls in a bunch of new dependencies #2193
pyOpenSSL 0.14 depends on cryptography, which in turn depends on cffi and pycparser, resulting in the following warnings:

This is annoying, since it's going backwards with respect to our goal of reducing the number of crypto library dependencies.

Note that the change in comment:94470 only hides the warning. We need to decide whether the additional dependency on cryptography is acceptable, and require pyOpenSSL == 0.13 if it is not. (The cryptography dependency is via pyOpenSSL 0.14.)

Relevant to this issue is my experience provisioning new S4 subscriptions.
The provisioning process began failing unexpectedly on Tuesday March 03, 2014. I don't immediately recall when I last ran it prior to that, but it was probably within a week (i.e. since February 24th).

I ssh'd in to the SSEC2 which had been launched during the failed provisioning, and manually repeated the provisioning steps until I reached the first step that failed. The failing step was:

sudo python ./setup.py install

run in the context of txAWS-0.2.1.post4's 'base' directory.

I upgraded the version of txAWS used during deployment to post5, by applying a commit daira wrote a while back. post5 depends on python-dateutil instead of Epsilon. Here's the relevant commit:

<https://github.com/LeastAuthority/leastauthority.com/commit/05ce3d44708c89158c6ae828d2a11827b74001f7>
After making only that change, the S4 provisioning process began functioning as expected again. (As evidenced by a "smoke-test" on an S4 instance provisioned as above.)
The following is the version info from the provisioned S4:
And the version of txAWS:
As you can see from the tahoe version information, the current version deployed with S4 does not depend on cryptography, because the version of pyOpenSSL it uses is 0.12. However, when we upgrade to newer versions of Tahoe-LAFS (an urgently needed upgrade, to fulfill customer-driven feature requests), this issue will affect S4.

Finally, the upgrade to txAWS-0.2.1.post5 is not in production, and we can therefore expect new signups to fail. Also, as Daira pointed out elsewhere, the version of pyOpenSSL selected by setuptools when building ticket999-S3-backend is liable to change unpredictably.
dstufft asked on IRC why this is such a big deal to us, and my answer was that for us resolving a dependency is a user-facing action, so if a dependency of ours (pyOpenSSL) gains a new dependency (cryptography.io), then that is not just a developer-facing issue, but is in fact a regression for our users. That's because our users start with this document: [quickstart.rst]source:docs/quickstart.rst, follow the instructions written there, and at the end they have a working bin/tahoe script. Before the pyOpenSSL v0.14 release came out, following those instructions worked for most users; after the pyOpenSSL v0.14 release came out, it did not work for most users (because the new dependencies of pyOpenSSL cannot be automatically resolved by setuptools, especially considering that most users do not have a C compiler and Python header files installed). So it is a regression for us.

I intend to pin our dependency on pyOpenSSL to pyOpenSSL < 0.14 for now.

(We don't really rely on pyOpenSSL for much anyway, so if we could in fact remove the dependency on pyOpenSSL entirely, that would be nice. See also [tickets with "openssl" in their tags]query:status=!closed&keywords=~openssl&order=priority.)
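For concreteness, such a pin could be expressed roughly like this in the project's dependency declarations (a hedged sketch; the install_requires name is an assumption, not necessarily how Tahoe-LAFS actually declares its dependencies):

```python
# Hypothetical sketch of the proposed pin; not actual Tahoe-LAFS code.
install_requires = [
    # Keep setuptools from ever trying to fetch and build pyOpenSSL 0.14+,
    # which would drag in cryptography, cffi, and pycparser.
    "pyOpenSSL < 0.14",
    # ... other dependencies unchanged ...
]
```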
Replying to zooko:
If you are telling end-users and not developers to install a tool with a development toolchain (and setuptools is definitely a development toolchain) then perhaps the problem is with the instructions? Your dependencies' dependencies should not be a user-visible change.
Have you considered creating distributions for end-users that bundle everything together into a single file, bundle, or linux distro package, so that dependency issues like this aren't exposed? Or perhaps at least updating quickstart.rst to use contemporary tools, i.e. pip and virtualenv, rather than ez_setup?
These dependencies can be automatically resolved by pip. There are already binary wheels for Windows so those folks don't need a C compiler. And in the coming months my understanding is that this will be extended to OS X as well.
By pinning this dependency you're opting out of all potential future security updates for pyOpenSSL, which seems like a bad idea if you depend on it at all. And the move to Cryptography and thereby cffi is a huge upgrade to the simplicity and security of the basic implementation strategy of pyOpenSSL itself.
Removing the dependency might be nice. The OpenSSL API is rightly universally reviled. Although I would suggest that Cryptography is a promising new project to provide backend agility for cryptographic primitives and you should be depending upon it directly at some point in the future :-).
It's quite likely that Twisted will acquire a hard dependency on Cryptography or some other cffi-based project in the future, so this is probably worth working out now.
In a packaging environment, having a pinned dependency on 0.13 is
really not ok, because upstream py-OpenSSL doesn't seem to provide for
parallel-installable multiple versions. So I think tahoe has to cope
with whatever upstream releases, or stop using py-OpenSSL. But
py-twisted uses pyOpenSSL, so it's not clear that what tahoe does
matters in the grand scheme of things.
Longer version, retyped after the above was lost to spam filtering:
Regarding pinning the dependency to 0.13, I don't think that's a
reasonable approach. First, I think that any software that is broadly
successful is used almost entirely from prebuilt packages in various
packaging systems (or for us odd pkgsrc types, automatically built from
source using the packaging system, which amounts to the same thing).
My impression is that python generally does not support installing
multiple versions of a package, e.g. py-OpenSSL 0.13.1 and 0.14 both.
(While pkgsrc has made python27 and python33 parallel installable, and
most libraries can have both at once, I haven't seen this extend to
individual libraries.) So a packaging system has to just pick a
version. Given that upstreams do not do security maintenance on
obsolete versions, the only reasonable choice is the latest, except that
taking a few months to move to a new release is also reasonable. So
packaging systems soon (within a year) will no longer offer py-OpenSSL
0.13.1.
Twisted also requires py-OpenSSL, so I think fighting this implies
pulling py-Twisted and py-OpenSSL into tahoe sources. That seems crazy
in terms of maintenance burden.
Overall, my impression is that we're seeing a big hiccup because of a
poorly-documented unexpected rototill, and that once packaging systems
catch up with the new dependencies, things will be ok as far as building
tahoe in a packaging system context (or installing deps from packaging
system and then building tahoe itself from source).
I'm annoyed and frustrated by pyOpenSSL adding a dependency on yet another fucking cryptography library. It's going in entirely the wrong direction. The actual mechanics of how to make Tahoe tolerate the dependency are not so much the issue here.
Note that the only overlap between what cryptography.io provides, and what Tahoe-LAFS needs (and can't get from the Python standard lib) is AES and RSA. So we can't use it to remove any of our other crypto dependencies.
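For context, the overlapping piece looks roughly like this in cryptography's hazmat layer (a minimal sketch using documented cryptography APIs; this is not Tahoe-LAFS code, and the mode and key size here are illustrative):

```python
# Minimal sketch of AES via cryptography's hazmat layer; illustrative only.
import os
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(32)      # 256-bit AES key (illustrative)
nonce = os.urandom(16)    # CTR-mode initial counter block
encryptor = Cipher(algorithms.AES(key), modes.CTR(nonce),
                   backend=default_backend()).encryptor()
ciphertext = encryptor.update(b"some plaintext") + encryptor.finalize()
```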
Replying to daira:
I strongly disagree. The mechanics of how to make Tahoe tolerate the dependency are the only issue here. If you want to talk about how pyOpenSSL should revert to being a giant pile of ad-hoc C code rather than a tiny wrapper around some statically-verified and well-tested wrappers for the OpenSSL API, the right place for that discussion would be the pyOpenSSL tracker, not here. (However, I can tell you that you're not likely to meet with much success with that proposal.)
pyOpenSSL implements a thing, and the hope here is that the way in which it implements this is abstracted and its clients, like Tahoe, should not need to care how it does it. Having dependencies is one way it might choose to implement this. It may acquire new dependencies in the future (as might Twisted, for that matter, for reasons having nothing to do with cryptography). It's deeply unfortunate that this addition broke things for Tahoe, and hopefully that won't happen in the future, but "your dependencies are part of your API contract and you can't ever add new ones" is not an acceptable constraint for Twisted; nor, I suspect, for any other thing that Tahoe depends on. We should work out how to do this non-destructively in the future. (Note that I believe the Cryptography 0.3 release resolves the installation issue at least for Windows users because it fixed the thing that would cause binary wheels to need to be rebuilt on the installer's system if a new release of cffi came out.)
However, since you seem frustrated let me try to address that. It might be helpful to read the Cryptography team's rationale for creating something new rather than trying to maintain one of the existing options, but that's pretty brief.
First of all, Cryptography is not really "yet another ... cryptography library". My understanding is that it implements exactly no cryptographic primitives itself. Instead, its mission is to be a comprehensive wrapper around and common interface to multiple back-ends which allow for using your platform's existing cryptography engine, allowing you to access things like hardware support and platform security updates without updating all of your application's code. For example, it's not just a wrapper around OpenSSL; it also wraps [the CommonCrypto API from OS X's security framework](https://github.com/pyca/cryptography/tree/master/cryptography/hazmat/bindings/commoncrypto).
(Please be aware that this is my interpretation and I'm not a Cryptography core developer.)
In light of this substantially improved approach, pyOpenSSL is really just a light wrapper around one of these layers. It would be very premature to say pyOpenSSL is "deprecated" – lots of existing software uses it, it'll likely be around for a very long time, and Cryptography doesn't implement nearly enough high-level functionality to replace it – but as Cryptography implements higher level functionality I think there's no real reason why one would rely on the pyOpenSSL version of that functionality rather than Cryptography's. If you use Cryptography, it will be doing your crypto functionality in a way that is multi-backend and doesn't, for example, require you to compile OpenSSL on Windows; instead you could just use existing Windows APIs for these algorithms.
For Twisted, all this stuff is a godsend. Instead of having some stdlib stuff and some pyOpenSSL stuff and some PyCrypto stuff (with known exploits) we can (eventually) just use one library to provide all of our security features.
Replying to daira:
It's pretty early days for Cryptography. This is why I said "at some point in the future". They only added RSA like a couple of weeks ago, and the only high-level (i.e. non-"hazmat") thing implemented so far is Fernet. So there's a huge swath of functionality they're going to be adding already. If some of the Tahoe developers were to talk to the Cryptography folks, it might be easy to convince them to add all of your needs to the library in some form.
Since Tahoe uses more novel cryptographic primitives than many applications, it's possible that the Cryptography developers won't be interested in adding everything you need, but I really think it would be worthwhile to try; given the effort that those developers are investing in fixing their build and deployment story, it might make issues like this a lot easier to deal with in the future.
Replying to [glyph]comment:12:

Well exactly, and this is the core of the issue. The decision to change the scope of pyOpenSSL from being a wrapper around just OpenSSL, to being something that depends on a considerably more ambitious project, came entirely out of the blue from the point of view of Tahoe-LAFS developers. Perhaps we should have been paying more attention, I don't know. I certainly don't have any objection to the cryptography project; what I have an objection to is the lack of any heads-up to projects depending on pyOpenSSL (a list of them is easily determined from the package metadata on PyPI) of a rather major design change.

In any case, I guess we have no choice but to add cffi as another binary dependency. (The pure Python dependencies aren't so much a problem.) Sigh.

Replying to [daira]comment:13:
This change was announced in January on the pyOpenSSL mailing list, and then on the Twisted list, and then again on the Twisted list, specifically stressing how major the change would be, not to mention the 4 other alpha pre-release announcements on those lists. I feel like this was reasonably broadly communicated. What more do you feel we should have done?
Replying to daira:

While I understand that the task of adding cffi itself is yet more frustrating package churn, there are a number of reasons to be enthusiastic about it:

* Writing cffi modules is far easier, safer, etc., than writing any other kind of extension module (a minimal sketch appears below). Unless you've used it, it's hard to explain just how much easier and safer it is, but one personal anecdote I can give you is that I once spent about 20 hours wrapping a fairly complex C library with cffi, including callbacks, reentrancy, and pointer math, and literally experienced zero segfaults in the process. One segfault I had near the end turned out to be a bug in the C library itself and not a problem with my extension.
* cffi modules are JIT-visible in PyPy and will therefore actually speed up applications on PyPy, unlike cpyext-built extensions, which generally have enough overhead that they slow things down.
* The cffi team provides packaging and building tools for extension modules to normalize the process of distributing binary dependencies, in a way which is challenging to do for arbitrary extension modules, and (in combination with the PyPA developers) is working on more. This has been somewhat rocky to get off the ground, but once the binary compatibility story for wheels has been worked out for OS X and Linux, this kind of uproar from users when a binary dependency is added should get much rarer, because they'll actually be able to install them from PyPI and have them actually work without a C compiler.
* Finally, there's a ctypes backend for cffi; I'm not sure how complete it is, but in principle it ought to allow for (slow) invocation of arbitrary cffi-based extensions with no C compiler installed. If Tahoe still has users having problems before The Great Packaging Singularity Where Everything Finally Starts Working, it might be worthwhile to figure out if this can be enabled for your users somehow if their C compiler isn't installed or doesn't work.

For what it's worth, I'm both a cryptography developer (kind of, I'm probably the least active one of us) and a PyPA developer. I also have PyNaCl and bcrypt, which also use the cffi backends. cffi has some pitfalls with packaging right now; however, they are being worked on: on the cffi side with work to remove the implicit compile stuff, in the projects themselves with workarounds, and in the packaging tools to better handle binary dependencies in general.
What tahoe-lafs decides to do with its dependencies will affect y'all far more than it affects me, but I would suggest that pinning to an old release of pyOpenSSL will end up causing you more grief in the long run. I concur that you should either get rid of it or adapt to the newer style of things.

I also think it'd probably be reasonable for cryptography.io to look at handling more of what tahoe needs from a crypto library too, if you're all interested in that. We have very stringent requirements for what gets committed to our code base as far as code quality, test coverage, documentation etc. goes, and we have goals of being somewhat of a standard library of crypto :).
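As a concrete taste of the point above about writing cffi modules, here is a minimal sketch in cffi's ABI mode, essentially the canonical printf example from the cffi documentation (not code from pyOpenSSL or Tahoe-LAFS; dlopen(None), which loads the standard C library, works on Unix-like platforms):

```python
# Minimal cffi ABI-mode sketch (the classic printf example); no hand-written
# C glue, and no compiler is invoked for this particular usage.
from cffi import FFI

ffi = FFI()
ffi.cdef("int printf(const char *format, ...);")  # plain C declaration
C = ffi.dlopen(None)                               # the standard C library
arg = ffi.new("char[]", b"world")                  # a char* managed by cffi
C.printf(b"hi there, %s.\n", arg)                  # prints: hi there, world.
```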
OK, well, the immediate problem is that cryptography depends on cffi, which depends (on Debian at least) on the libffi-dev package. Is there any way to build a cffi egg that (at run-time) doesn't depend on libffi-dev having been installed?

Any binary package of cffi (Wheel or Egg) should only require libffi installed, not libffi-dev, if that's what you mean.
Replying to dstufft:

Sorry, I didn't mean to make a distinction between libffi and libffi-dev here. I meant to ask: is there any way to build a cffi egg that at run-time doesn't depend on libffi having been manually installed?

Probably, if you link it statically instead of dynamically, but I'm not sure that the cffi setup.py supports that ATM.
Replying to dstufft:
Tempting as it is to ask “but then how the heck does it work on Windows!?” I think maybe this issue on the CFFI tracker is the place to keep discussing this stuff, and make sure that the cffi developers are aware of tahoe's requirements.
Zooko and I agreed on the Dev Chat to temporarily work around this by fixing the pyOpenSSL dependency to == 0.13. We're aware that this can only be a short-term work-around, but cryptography simply has too many build problems at the moment (#2217 for example).

The ticket for a longer-term solution allowing pyOpenSSL >= 0.14 is #2221.
Re-reading the above comments, I think I may have given a false impression that I don't approve of the overall technical direction that the cryptography and pyOpenSSL developers are taking. That's not the case; I absolutely appreciate the need for FFIs between memory-safe and non-memory-safe languages to reduce the amount of glue code required, and to allow most of the remaining glue code to be on the memory-safe side. cffi seems to be on the right track here, and it's clear from comparing the code of pyOpenSSL 0.13 and 0.14 that it should be a great deal more maintainable and less error-prone. (I've long been a fan of Standard ML's NLFFI, which takes a similar approach.) Also, the ability to use multiple backends from cryptography is useful and important.

I'm not entirely following the pinning argument. tahoe is building ok in pkgsrc with 0.14. So if you require 0.13, the package will fail and no longer be available, and I'll just mark it BROKEN. What's wrong with a situation where pyOpenSSL 0.14 is properly installed, with all its dependencies? If the new code handles that gracefully, there's no issue - if all you're talking about is forcing a choice of 0.13 when the build of pyOpenSSL is triggered from within tahoe-lafs, that sounds fine.
Replying to gdt:
That's a good question. The issue is that the Python package-distribution metadata conflates the question "what dependent package versions are acceptable to meet this package's requirement?", with the question "what dependent package versions should I attempt to fetch and build in order to meet this package's requirement (when not using an OS packaging system)?" That's a problem in situations where the fetch-and-build process is unreliable.
We could hack around this conflation by doing the following (a sketch follows below):

* In setup.py, attempt "from OpenSSL import SSL".
* If the import succeeds, require pyOpenSSL == $WHATEVER_VERSION_WAS_IMPORTED.
* If the import fails, require pyOpenSSL == 0.13 (or == 0.13.1 if we decide to do the OpenSSL version check that way).

That would implement the following: any already-installed version of pyOpenSSL is accepted, but when setuptools has to fetch and build one, only 0.13 is attempted.
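A minimal sketch of that hack, assuming it lives in setup.py next to the existing dependency declarations (the install_requires name is illustrative, not necessarily how Tahoe-LAFS declares its dependencies):

```python
# Hypothetical sketch of the hack described above; not actual Tahoe-LAFS code.
try:
    from OpenSSL import SSL  # answers "is some pyOpenSSL already installed?"
    import OpenSSL
    # Accept exactly the version that is already installed.
    pyopenssl_req = "pyOpenSSL == %s" % OpenSSL.__version__
except ImportError:
    # Nothing installed: only ever try to fetch and build the known-good pin.
    pyopenssl_req = "pyOpenSSL == 0.13"

install_requires = [pyopenssl_req]  # merged with the other dependencies
```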
However, attempting to import dependent packages from setup.py has caused problems in the past and I'm not entirely sure it's a good idea. If we do it, we should have some exit strategy to stop doing it in a future Tahoe-LAFS version.

That sounds reasonable.
Alternatively, there could be two setup.py versions, and a pre-build stage that symlinks one in. The standard one would be for packaging environments, and fail if any prereqs are not installed, never trying to download or build anything else. The developer one could do whatever you want.
I've said this before, but it's important to realize that for software that has widespread usage, almost all uses of it are via a packaging system. Developers live in a bubble where they build the programs they like to hack on directly from source. But I bet nearly all if not all of you are using even python from a packaging system, not a by-hand build.
Replying to [daira]comment:28:

One important problem is that if the setup.py process imports OpenSSL but it turns out to be broken or vulnerable, then there is no way to "unimport" it (well, not reliably), and this may cause problems later in the build. It would be possible to shell out to a check_pyopenssl.py script in a separate process (a sketch follows below), but that's getting rather complicated.

gdt: would it be sufficient for you to just patch the pyOpenSSL == 0.13 requirement in src/allmydata/_auto_deps.py in your pkgsrc packaging of Tahoe-LAFS 0.11?

Yes, probably it would suffice to patch the control file.
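For illustration, the "separate process" probe mentioned above might look roughly like this (the inline -c command stands in for the hypothetical check_pyopenssl.py script; this is a sketch, not proposed Tahoe-LAFS code):

```python
# Probe the installed pyOpenSSL without importing it into the setup.py
# process, so a broken OpenSSL module cannot poison the rest of the build.
import subprocess
import sys

try:
    version = subprocess.check_output(
        [sys.executable, "-c", "import OpenSSL; print(OpenSSL.__version__)"]
    ).decode("ascii").strip()
    pyopenssl_req = "pyOpenSSL == " + version
except (subprocess.CalledProcessError, OSError):
    pyopenssl_req = "pyOpenSSL == 0.13"   # fall back to the known-good pin
```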
Replying to dstufft:
The Debian package name seems to be libffi6, not libffi. That's annoying because it'll probably change in future.
Replying to [daira]comment:28:

I don't think we should add all this logic into the Tahoe-LAFS build system. I agree that there is a problem here, but I don't think we should try to address that problem by adding code into the Tahoe-LAFS build system. The problem, as I understand it, is that there are two different audiences listening to Tahoe-LAFS's announcements about what version of pyOpenSSL it depends on. One audience is packaging engineers like gdt, and the other is the automatic build mechanisms such as pip.

There is no single statement we can make that will give the right idea to both of these audiences. So I propose that we let the machine-parseable declaration in [src/allmydata/_auto_deps.py]source:trunk/src/allmydata/_auto_deps.py?annotate=blame&rev=7bb07fb5e28756fa13ba5190e6c39003c84d3e1e be optimized for the automatic build mechanisms like pip, and we use an attendant comment to communicate to human engineers like gdt. So I propose a solution along these lines:

Relatedly, I find it pretty annoying that Tahoe-LAFS needs to mention pyOpenSSL at all in its packaging metadata. pyOpenSSL is not a direct dependency of Tahoe-LAFS; it is a dependency of Foolscap. So I feel like it ought to be Foolscap's responsibility to deal with all this crap. Hopefully in the future Foolscap will stop depending on pyOpenSSL at all (perhaps because Foolscap has switched to nacl/libsodium), and at that point there should not need to be any change made to Tahoe-LAFS, because Tahoe-LAFS should just continue to say "I depend on Foolscap."

Related tickets on the foolscap trac:
Here's another thing that Foolscap might switch to instead of SSL: https://github.com/trevp/noise/wiki
The combination of this problem, #2028, and #2249 is making it almost impossible to install or build on Windows :-(
(This problem isn't Windows-specific but it particularly affects Windows systems because most do not have a working compiler.)
In /tahoe-lafs/trac-2024-07-25/commit/b0b76a7c5b89c3fed5a65ef6732dc45e578f12f4:
Fixed by [18ffc29f4949b6098b8b89e6e89c89923121cda2/trunk].
https://caremad.io/2014/11/distributing-a-cffi-project/ explains what is wrong with cffi from a packaging point of view.