utf-8 decoding fails when certain pyOpenSSL library is used #704

Closed
opened 2009-05-12 19:06:51 +00:00 by bewst · 22 comments
bewst commented 2009-05-12 19:06:51 +00:00
Owner

Please see attached test log

tahoe-lafs added the unknown, major, defect, 1.4.1 labels 2009-05-12 19:06:51 +00:00
tahoe-lafs added this to the undecided milestone 2009-05-12 19:06:51 +00:00
bewst commented 2009-05-12 19:07:15 +00:00
Author
Owner

Attachment tahoe.log (134024 bytes) added


Interesting.. I see a lot of unicode decode exceptions while trying to parse the "subject" of an X.509 certificate (must be some underlying SSL thing, since foolscap doesn't care about fields like that).

Does your system perhaps have a non-ascii hostname?

Could you run the Foolscap unit tests (see http://foolscap.lothar.com/trac to download a tarball directly) and see if they complain about the same sort of thing?

What version of Twisted are these tests using? ("twistd --version" is probably the easiest way to get it, although "./bin/tahoe --version" from your Tahoe tree will give even more information if it works)

Also, please check to see what Python's default encodings are.. here's how I look at them on my system:

```
% python
Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04) 
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> sys.getfilesystemencoding()
'utf-8'
>>> 
```

Also, could you run the following steps to generate a new certificate and
then examine it to see what the "Subject" names are?

```
% python
>>> from foolscap import Tub
>>> t = Tub(certFile="dummy.pem")
>>> (Control-D)
% ls dummy.pem
dummy.pem
% openssl x509 -in dummy.pem -text
```

On my OS-X system, I see "Subject: CN=newpb_thingy". Do you get the same? It
might also help us if you could attach that dummy.pem file to this ticket
(but of course don't use it for anything else).

My current hunch is that the Foolscap-generated x509 certificates are either
being created with corrupt (i.e. non-UTF-8) subject-name strings, or they're
somehow being corrupted afterwards.

We're waiting for more information from the original bug reporter, bewst.

zooko added code-network and removed unknown labels 2009-05-14 20:30:03 +00:00
bewst commented 2009-05-28 04:00:22 +00:00
Author
Owner

Replying to warner:

> Interesting.. I see a lot of unicode decode exceptions while trying to parse the "subject" of an X.509 certificate (must be some underlying SSL thing, since foolscap doesn't care about fields like that).
>
> Does your system perhaps have a non-ascii hostname?

Nope. The `hostname` command yields: “zreba.local”

> Could you run the Foolscap unit tests (see <http://foolscap.lothar.com/trac> to download a tarball directly) and see if they complain about the same sort of thing?

Looks like it does. See attached foolscap.log.

> What version of Twisted are these tests using? ("twistd --version" is probably the easiest way to get it, although "./bin/tahoe --version" from your Tahoe tree will give even more information if it works)

Hmm,

```
$ twistd --version
twistd (the Twisted daemon) 2.5.0
Copyright (c) 2001-2006 Twisted Matrix Laboratories.
See LICENSE for details.
$ # err, OK, that was the one installed with the system's python (2.4)
$ twistd2.5 --version
twistd (the Twisted daemon) 8.2.0
Copyright (c) 2001-2008 Twisted Matrix Laboratories.
See LICENSE for details.
$ ./bin/tahoe --version
allmydata-tahoe: 1.4.1, foolscap: 0.3.2, pycryptopp: 0.5.10, zfec: 1.4.2, Twisted: 8.2.0, Nevow: 0.9.32, zope.interface: 3.3.0, python: 2.5.4, platform: Darwin-9.7.0-i386-32bit, simplejson: 2.0.1, argparse: 0.8.0, pyOpenSSL: 0.7, pyutil: 1.3.28, zbase32: 1.1.1, setuptools: 0.6c12dev
```

> Also, please check to see what Python's default encodings are.. here's how I look at them on my system:

<schnipp>

Looks the same as yours:

```
python2.5
Python 2.5.4 (r254:67916, May  6 2009, 18:40:46) 
[GCC 4.0.1 (Apple Inc. build 5490)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> sys.getfilesystemencoding()
'utf-8'
>>> 
```
bewst commented 2009-05-28 04:00:57 +00:00
Author
Owner

Attachment foolscap.log (172243 bytes) added

bewst commented 2009-05-28 04:05:59 +00:00
Author
Owner

Replying to warner:

> Also, could you run the following steps to generate a new certificate and
> then examine it to see what the "Subject" names are?

<schnipp>

> On my OS-X system, I see "Subject: CN=newpb_thingy". Do you get the same? It
> might also help us if you could attach that dummy.pem file to this ticket
> (but of course don't use it for anything else).
>
> My current hunch is that the Foolscap-generated x509 certificates are either
> being created with corrupt (i.e. non-UTF-8) subject-name strings, or they're
> somehow being corrupted afterwards.

Looks like things are going wrong much earlier:

```
$ python2.5
Python 2.5.4 (r254:67916, May  6 2009, 18:40:46) 
[GCC 4.0.1 (Apple Inc. build 5490)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from foolscap import Tub
>>> t = Tub(certFile="dummy.pem")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/lib/python2.5/site-packages/foolscap-0.3.2-py2.5.egg/foolscap/pb.py", line 222, in __init__
    self.setupEncryptionFile(certFile)
  File "/opt/local/lib/python2.5/site-packages/foolscap-0.3.2-py2.5.egg/foolscap/pb.py", line 234, in setupEncryptionFile
    self.setupEncryption(certData)
  File "/opt/local/lib/python2.5/site-packages/foolscap-0.3.2-py2.5.egg/foolscap/pb.py", line 249, in setupEncryption
    cert = self.createCertificate()
  File "/opt/local/lib/python2.5/site-packages/foolscap-0.3.2-py2.5.egg/foolscap/pb.py", line 442, in createCertificate
    132)
  File "/opt/local/lib/python2.5/site-packages/twisted/internet/_sslverify.py", line 539, in signCertificateRequest
    hlreq = CertificateRequest.load(requestData, requestFormat)
  File "/opt/local/lib/python2.5/site-packages/twisted/internet/_sslverify.py", line 310, in load
    dn._copyFrom(req.get_subject())
  File "/opt/local/lib/python2.5/site-packages/twisted/internet/_sslverify.py", line 64, in _copyFrom
    value = getattr(x509name, name, None)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-5: unsupported Unicode code range
>>> 
```
bewst commented 2009-05-28 20:55:02 +00:00
Author
Owner

I don't know if this is any help, but pdb is showing me this:

```
(Pdb) p x509name
<X509Name object '/CN=\xFD\xAE\x99\x97\x9D\xB0\xFD\xA2\x97\xB7\x91\xA8\xFD\xA9\x9B\xA6\x9D\xB9'>
```
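
The CN bytes in the pdb output above have a regular shape: three groups of six bytes, each starting with 0xFD and followed by five continuation bytes (0x80 to 0xBF). That is the 6-byte form of the original UTF-8 scheme, which strict modern decoders reject. The Python 3 sketch below is not part of the original report, and the helper name `decode_six_byte_groups` is made up for illustration. It unpacks each group into its 31-bit value and repacks that value as a big-endian 32-bit word. It shows one reading that is consistent with the bytes; it does not establish which library performed the conversion.

```python
import struct

# CN bytes exactly as printed by pdb in the comment above.
cn = bytes.fromhex("FDAE99979DB0" "FDA297B791A8" "FDA99BA69DB9")


def decode_six_byte_groups(data):
    """Decode consecutive 6-byte sequences of the original UTF-8 scheme.

    Hypothetical helper for illustration only: a 6-byte sequence is a
    lead byte 1111110x followed by five continuation bytes 10xxxxxx.
    """
    if len(data) % 6 != 0:
        raise ValueError("length is not a multiple of 6")
    values = []
    for i in range(0, len(data), 6):
        lead, tail = data[i], data[i + 1:i + 6]
        if lead & 0xFE != 0xFC or any(b & 0xC0 != 0x80 for b in tail):
            raise ValueError("not a 6-byte sequence at offset %d" % i)
        value = lead & 0x01
        for b in tail:
            value = (value << 6) | (b & 0x3F)
        values.append(value)
    return values


values = decode_six_byte_groups(cn)

# Repack each 31-bit value as four big-endian bytes.
recovered = b"".join(struct.pack(">I", v) for v in values)
print(recovered)  # b'newpb_thingy'

# A strict UTF-8 decoder refuses these bytes outright.
try:
    cn.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
print("strict UTF-8 decode succeeded:", strict_ok)
```

The recovered bytes match the `newpb_thingy` subject that warner reports on a working system. That is consistent with the 12-byte ASCII name having been read as three 4-byte characters before being converted to UTF-8, though the thread does not confirm this.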
bewst commented 2009-05-29 00:44:03 +00:00
Author
Owner

Problem solved, I guess. I mean, it's still a mystery how this could have happened, but I had a pyOpenSSL egg installed that was causing the problem... and it masked the py25-openssl package that I subsequently installed with macports. Everything started working once I had removed the original egg. My strong suspicion is that it was built with a different Python2.5, with a UCS4 setting.

My current Python says:

```
$ python -c "import sys;print(sys.maxunicode<66000)and'UCS2'or'UCS4'"
UCS2
```

[This page](http://www.egenix.com/products/python/pyOpenSSL/) put me onto that possibility.
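
For anyone retracing the two checks described above (which unicode build the interpreter uses, and which installed copy of a package actually gets imported), here is a minimal Python 3 sketch. The function names `unicode_build` and `module_origin` are illustrative and do not come from the ticket. Note that on Python 3.3 and later `sys.maxunicode` is always 0x10FFFF, so the narrow/wide distinction only matters on older interpreters such as the Python 2.5 used in this thread.

```python
import importlib.util
import sys


def unicode_build():
    """Return 'UCS2' for a narrow build, 'UCS4' for a wide build."""
    # Narrow builds report sys.maxunicode == 0xFFFF.
    return "UCS2" if sys.maxunicode < 0x10000 else "UCS4"


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None.

    Useful for spotting a stale egg that shadows a newer install.
    """
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


print(unicode_build())
# "OpenSSL" is the import name of pyOpenSSL; prints None if it is not installed.
print(module_origin("OpenSSL"))
```

If the path printed for `OpenSSL` points into an unexpected egg directory, that copy is the one masking the others.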
tahoe-lafs added the invalid label 2009-05-29 00:44:23 +00:00
bewst closed this issue 2009-05-29 00:44:23 +00:00

Wow, that's wacky. My OS-X box also reports UCS2, while my linux box reports UCS4. I wonder if that means the pyopenssl library is doing naive string conversion: interpreting some underlying openssl field as a unicode string, and hoping that openssl is using the same representation as python is using.

Anyways, thanks for tracking this down! I'm sure others will run into this problem again in the future, and it's great to have a searchable page that explains how to fix it.

bewst commented 2009-05-29 15:18:56 +00:00
Author
Owner
Looks like this is an old, old problem: <http://mail.python.org/pipermail/distutils-sig/2006-August/006585.html> :(
bewst commented 2009-05-29 15:27:41 +00:00
Author
Owner
A better link, perhaps: <http://markmail.org/message/bla5vrwlv3kn3n7e>

I opened a ticket for setuptools:

http://bugs.python.org/setuptools/issue78 # egg platform names don't reflect unicode variant (UCS2, UCS4)

zooko added packaging and removed code-network, invalid labels 2009-06-10 17:55:27 +00:00
zooko reopened this issue 2009-06-10 17:55:27 +00:00

Thanks for tracking this one down, bewst.

zooko changed title from Test failures on MacOS to eggs don't say whether they have UCS2 or UCS4 unicode implementation 2009-06-10 17:56:59 +00:00
bewst commented 2009-09-22 02:20:02 +00:00
Author
Owner

Zooko, what are you waiting for me to do/answer? I don't see it above.


There was no request for you outstanding, so this should have been unassigned from you. However, just recently I started a discussion on the python-dev list, and referenced this ticket, and they said that the symptoms that we observed are not the symptoms they would expect from having an inconsistency of internal unicode format between Python interpreter and Python module. If that were the problem, we should have seen something like "undefined symbol: PyUnicodeUCS4_FromUnicode", not the utf-8 decode error that we saw.

Here is the comment on python-dev to that effect:

http://mail.python.org/pipermail/python-dev/2009-September/091943.html

So, now there is something you could do to help: see if you still have that pyOpenSSL library that you mentioned, the removal of which fixed this problem for you, so we can try to see what was wrong with it.

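
One rough way to examine a suspect extension module, as the comment above asks, is to check which unicode variant it was compiled against. On Python 2 the unicode C API names were mapped by macro to `PyUnicodeUCS2_*` or `PyUnicodeUCS4_*`, so those prefixes appear in the symbol names of a compiled extension. The Python 3 sketch below scans a file's raw bytes for those prefixes. The function name `ucs_variants_referenced` is hypothetical, and a byte scan is only a heuristic, not a real symbol-table parser.

```python
import sys
from pathlib import Path


def ucs_variants_referenced(path):
    """Return the set of unicode variants ('UCS2', 'UCS4') named in a file.

    Heuristic only: looks for the PyUnicodeUCS2_/PyUnicodeUCS4_ prefixes
    anywhere in the file's bytes rather than parsing its symbol table.
    """
    data = Path(path).read_bytes()
    found = set()
    for variant in ("UCS2", "UCS4"):
        if ("PyUnicode" + variant + "_").encode("ascii") in data:
            found.add(variant)
    return found


if __name__ == "__main__":
    # Pass one or more compiled extension files on the command line.
    for arg in sys.argv[1:]:
        print(arg, sorted(ucs_variants_referenced(arg)))
```

An empty result means the file does not reference the unicode C API under either prefix, which by itself says nothing about how its strings are converted.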
zooko changed title from eggs don't say whether they have UCS2 or UCS4 unicode implementation to utf-8 decoding fails when certain pyOpenSSL library is used 2009-09-22 02:36:30 +00:00

By the way, over on http://bugs.python.org/setuptools/issue78 midnightmagic says that he had the same symptoms. Maybe he could help us diagnose it.


I opened a bug report with the pyOpenSSL project: https://bugs.launchpad.net/setuptools/+bug/434411 . pyOpenSSL uses launchpad as its issue tracker, and launchpad has a nice quality of integrating with other issue trackers in order to track issues which span multiple projects. launchpad bug 434411 is currently linked to pyOpenSSL, Tahoe-LAFS, and setuptools, although it may turn out that this issue is independent of the setuptools issue, which has to do with whether your python packages use UCS4 or UCS2 internal unicode encoding.

launchpad commented 2009-09-22 03:21:00 +00:00
Author
Owner

Updating Launchpad bug reference


Okay, we can't reproduce this issue so I'm going to close this ticket as "wontfix".

zooko added the wontfix label 2009-10-27 03:09:07 +00:00
zooko closed this issue 2009-10-27 03:09:07 +00:00

kpreid encountered this same issue. I will add details to http://launchpad.net/bugs/434411 .

zooko removed the wontfix label 2010-04-11 17:35:18 +00:00
zooko reopened this issue 2010-04-11 17:35:18 +00:00

I'm closing this (again) as wontfix -- only the pyOpenSSL project, or possibly Python or setuptools or someone -- can fix this.

zooko added the wontfix label 2011-12-29 06:43:08 +00:00
zooko closed this issue 2011-12-29 06:43:08 +00:00
Reference: tahoe-lafs/trac-2024-07-25#704