memory leak (during check --repair --add-lease) #1939

Open
opened 2013-04-02 21:47:48 +00:00 by killyourtv · 19 comments
killyourtv commented 2013-04-02 21:47:48 +00:00
Owner

While repairing a list of files I've had a problem with the memory usage for the tahoe process growing wildly and leading to OOM.

Attached are incident reports and screencaps of the webui's status page. I don't know if the statuses will tell you what the problem is but it'll be obvious that there is a problem.

Edit:

Oops, I neglected to report the versions:

allmydata-tahoe: 1.9.2,
foolscap: 0.6.4,
pycryptopp: 0.6.0.1206569328141510525648634803928199668821045408958,
zfec: 1.4.24,
Twisted: 12.0.0,
Nevow: 0.10.0,
zope.interface: unknown,
python: 2.7.3,
platform: Linux-debian_7.0-x86_64-64bit_ELF,
pyOpenSSL: 0.13,
simplejson: 2.5.2,
pycrypto: 2.6,
pyasn1: unknown,
mock: 0.8.0,
sqlite3: 2.6.0 [sqlite 3.7.13],
setuptools: 0.6 [distribute]
While repairing a list of files I've had a problem with the memory usage for the tahoe process growing wildly and leading to OOM. Attached are incident reports and screencaps of the webui's status page. I don't know if the statuses will tell you what the problem is but it'll be obvious that there is a problem. Edit: Oops, I neglected to report the versions: ``` allmydata-tahoe: 1.9.2, foolscap: 0.6.4, pycryptopp: 0.6.0.1206569328141510525648634803928199668821045408958, zfec: 1.4.24, Twisted: 12.0.0, Nevow: 0.10.0, zope.interface: unknown, python: 2.7.3, platform: Linux-debian_7.0-x86_64-64bit_ELF, pyOpenSSL: 0.13, simplejson: 2.5.2, pycrypto: 2.6, pyasn1: unknown, mock: 0.8.0, sqlite3: 2.6.0 [sqlite 3.7.13], setuptools: 0.6 [distribute] ```
tahoe-lafs added the
unknown
normal
defect
1.9.2
labels 2013-04-02 21:47:48 +00:00
tahoe-lafs added this to the undecided milestone 2013-04-02 21:47:48 +00:00
killyourtv commented 2013-04-02 21:48:57 +00:00
Author
Owner

Attachment incident-2013-04-02--11-19-06Z-urhgqtq.flog.bz2 (69488 bytes) added

incident #1

**Attachment** incident-2013-04-02--11-19-06Z-urhgqtq.flog.bz2 (69488 bytes) added incident #1
killyourtv commented 2013-04-02 21:49:29 +00:00
Author
Owner

Attachment incident-2013-04-02--11-51-59Z-3lr26oy.flog.bz2 (37196 bytes) added

incident 2

**Attachment** incident-2013-04-02--11-51-59Z-3lr26oy.flog.bz2 (37196 bytes) added incident 2
killyourtv commented 2013-04-02 21:50:44 +00:00
Author
Owner

Attachment o.png (516033 bytes) added

mutable file retrieve status

**Attachment** o.png (516033 bytes) added mutable file retrieve status
504 KiB
daira commented 2013-04-03 01:48:36 +00:00
Author
Owner

Possibly a duplicate of #1824.

Possibly a duplicate of #1824.

Dumped the flogs; I see no similarities to #1824.

Dumped the flogs; I see no similarities to #1824.

Hey kytv! Good to hear from you again. By the way, have you seen the renewed interest in merging #68? Check it out. Lebek is away, so maybe you should take it over yourself.

Anyway, about this bug report: thank you for the bug report! Are you using exactly Tahoe-LAFS v1.9.2 as generated from our sources, e.g. https://tahoe-lafs.org/source/tahoe-lafs/releases/allmydata-tahoe-1.9.2.tar.bz2, or are you using your branch with the #68 patch, as described on wiki/OSPackages#FoolscapTahoe-LAFSpatchedforuseonI2Pwithsupportformultipleintroducers?

I have a request: change the APPNAME on line 42 of [setup.py]source:setup.py?annotate=blame&rev=ae754e9b15253ea863ced38ebf8cd39ed36fddb4#L42 from allmydata-tahoe to something else like maybe tahoe-lafs-i2p. That way I'll never have to wonder again if someone else's "1.10.0" is the same as my "1.10.0" when someone reports a bug.

Hey kytv! Good to hear from you again. By the way, have you seen the renewed interest in merging #68? Check it out. Lebek is away, so maybe you should take it over yourself. Anyway, about this bug report: thank you for the bug report! Are you using exactly Tahoe-LAFS v1.9.2 as generated from our sources, e.g. <https://tahoe-lafs.org/source/tahoe-lafs/releases/allmydata-tahoe-1.9.2.tar.bz2>, or are you using your branch with the #68 patch, as described on [wiki/OSPackages#FoolscapTahoe-LAFSpatchedforuseonI2Pwithsupportformultipleintroducers](wiki/OSPackages#FoolscapTahoe-LAFSpatchedforuseonI2Pwithsupportformultipleintroducers)? I have a request: change the `APPNAME` on line 42 of [setup.py]source:setup.py?annotate=blame&rev=ae754e9b15253ea863ced38ebf8cd39ed36fddb4#L42 from `allmydata-tahoe` to something else like maybe `tahoe-lafs-i2p`. That way I'll never have to wonder again if someone else's "1.10.0" is the same as my "1.10.0" when someone reports a bug.

Having looked at o.png, I sure hope it is running over i2p, because then the 2 to 10 seconds round-trip times to storage servers are somewhat justifiable. ;-) Seriously, please confirm what specific version it is...

Having looked at [o.png](/tahoe-lafs/trac-2024-07-25/attachments/000078ac-0adf-295d-5cc6-1b24a8c7f317), I sure hope it is running over i2p, because then the 2 to 10 seconds round-trip times to storage servers are somewhat justifiable. ;-) Seriously, please confirm what specific version it is...
killyourtv commented 2013-04-03 20:20:11 +00:00
Author
Owner

Yes, this is over I2P (the build from OSPackages)...and I'll be sure to change APPNAME in future revisions.

RE: #68 -- I need to try reintegrating the multiple introducer support into trunk. I've not been keeping up on it recent developments as much as I'd like to.

Yes, this is over I2P (the build from OSPackages)...and I'll be sure to change APPNAME in future revisions. RE: #68 -- I need to try reintegrating the multiple introducer support into trunk. I've not been keeping up on it recent developments as much as I'd like to.
daira commented 2013-04-04 04:52:01 +00:00
Author
Owner

Replying to zooko:

I have a request: change the APPNAME on line 42 of [setup.py]source:setup.py?annotate=blame&rev=ae754e9b15253ea863ced38ebf8cd39ed36fddb4#L42 from allmydata-tahoe to something else like maybe tahoe-lafs-i2p. That way I'll never have to wonder again if someone else's "1.10.0" is the same as my "1.10.0" when someone reports a bug.

That would break node directories created by earlier versions (or by the official branch), due to #1159. It would be nice to report the git branch name in the version info, but that could be done independently of changing the appname.

Replying to [zooko](/tahoe-lafs/trac-2024-07-25/issues/1939#issuecomment-91307): > I have a request: change the `APPNAME` on line 42 of [setup.py]source:setup.py?annotate=blame&rev=ae754e9b15253ea863ced38ebf8cd39ed36fddb4#L42 from `allmydata-tahoe` to something else like maybe `tahoe-lafs-i2p`. That way I'll never have to wonder again if someone else's "1.10.0" is the same as my "1.10.0" when someone reports a bug. That would break node directories created by earlier versions (or by the official branch), due to #1159. It would be nice to report the git branch name in the version info, but that could be done independently of changing the appname.

killyourtv: despite daira's comment:8, I would still appreciate it if you would change the APPNAME as described in comment:91307.

killyourtv: despite daira's comment:8, I would still appreciate it if you would change the `APPNAME` as described in [comment:91307](/tahoe-lafs/trac-2024-07-25/issues/1939#issuecomment-91307).
daira commented 2013-04-24 13:49:01 +00:00
Author
Owner

I really think that changing the appname, in any branch, before fixing #1159 is going to cause far more problems than it can solve. I'll file a ticket to include the branch name (and maybe another identifying string separate from the appname that is easier to change) in the version info.

I really think that changing the appname, in any branch, before fixing #1159 is going to cause far more problems than it can solve. I'll file a ticket to include the branch name (and maybe another identifying string separate from the appname that is easier to change) in the version info.
daira commented 2013-04-24 13:59:19 +00:00
Author
Owner

Replying to daira:

I'll file a ticket to include the branch name (and maybe another identifying string separate from the appname that is easier to change) in the version info.

Filed as #1953.

Replying to [daira](/tahoe-lafs/trac-2024-07-25/issues/1939#issuecomment-91312): > I'll file a ticket to include the branch name (and maybe another identifying string separate from the appname that is easier to change) in the version info. Filed as #1953.

Replying to daira:

I really think that changing the appname, in any branch, before fixing #1159 is going to cause far more problems than it can solve. I'll file a ticket to include the branch name (and maybe another identifying string separate from the appname that is easier to change) in the version info.

Argh, fine but then this makes me feel like #1159 is more urgent. ☺

Replying to [daira](/tahoe-lafs/trac-2024-07-25/issues/1939#issuecomment-91312): > I really think that changing the appname, in any branch, before fixing #1159 is going to cause far more problems than it can solve. I'll file a ticket to include the branch name (and maybe another identifying string separate from the appname that is easier to change) in the version info. Argh, fine but then this makes me feel like #1159 is more urgent. ☺
daira commented 2013-04-25 01:48:46 +00:00
Author
Owner

Replying to [zooko]comment:12:

Argh, fine but then this makes me feel like #1159 is more urgent. ☺

I would have bumped its priority, but it already has priority 'major' and milestone 1.11.

Replying to [zooko]comment:12: > Argh, fine but then this makes me feel like #1159 is more urgent. ☺ I would have bumped its priority, but it already has priority 'major' and milestone 1.11.

This ticket is important to me. Seems like a bad bug.

This ticket is important to me. Seems like a bad bug.
zooko added
major
and removed
normal
labels 2013-05-21 20:32:53 +00:00
warner added
code
and removed
unknown
labels 2014-09-11 22:36:14 +00:00

In contradiction to what kpreid said in comment:91306, the flog that kv attached (attachment:incident-2013-04-02--11-19-06Z-urhgqtq.flog.bz2) does have some evidence that points to a similar problem as kpreid's original report. It has this:

[CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last):
Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ngqc) (during method=RIBucketReader:read)
]

and this:

2013-04-01_21:20:12.017651Z [110581]: UNUSUAL {'facility':'tahoe.encoder', 'failure':<foolscap.call.CopiedFailure allmydata.util.pipeline.PipelineError>, 'format':'error while sending %(method)s to shareholder=%(shnum)d', 'incarnation':('\xd0N\xca\xf7\xa7l\x8f\xce', None), 'parent':110314, 'shnum':5, ...} FAILURE:
[CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last):
Failure: allmydata.util.pipeline.PipelineError: <PipelineError error=([Failure instance: Traceback: <class 'foolscap.ipb.DeadReferenceError'>: Calling Stale Broker
/usr/lib/pymodules/python2.7/allmydata/immutable/layout.py:237:_write
/usr/lib/pymodules/python2.7/allmydata/util/pipeline.py:89:add
/usr/lib/python2.7/dist-packages/twisted/internet/defer.py:134:maybeDeferred
/usr/lib/pymodules/python2.7/foolscap/referenceable.py:415:callRemote
--- <exception caught here> ---
/usr/lib/python2.7/dist-packages/twisted/internet/defer.py:134:maybeDeferred
/usr/lib/pymodules/python2.7/foolscap/referenceable.py:455:_callRemote
/usr/lib/pymodules/python2.7/foolscap/broker.py:477:newRequestID
])>
]

This is similar to the evidence in kpreid's log, which suggests that maybe a foolscap rref becoming None led to the problem.

In contradiction to what kpreid said in [comment:91306](/tahoe-lafs/trac-2024-07-25/issues/1939#issuecomment-91306), the flog that kv attached (attachment:incident-2013-04-02--11-19-06Z-urhgqtq.flog.bz2) does have some evidence that points to a similar problem as kpreid's original report. It has this: ``` [CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last): Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ngqc) (during method=RIBucketReader:read) ] ``` and this: ``` 2013-04-01_21:20:12.017651Z [110581]: UNUSUAL {'facility':'tahoe.encoder', 'failure':<foolscap.call.CopiedFailure allmydata.util.pipeline.PipelineError>, 'format':'error while sending %(method)s to shareholder=%(shnum)d', 'incarnation':('\xd0N\xca\xf7\xa7l\x8f\xce', None), 'parent':110314, 'shnum':5, ...} FAILURE: [CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last): Failure: allmydata.util.pipeline.PipelineError: <PipelineError error=([Failure instance: Traceback: <class 'foolscap.ipb.DeadReferenceError'>: Calling Stale Broker /usr/lib/pymodules/python2.7/allmydata/immutable/layout.py:237:_write /usr/lib/pymodules/python2.7/allmydata/util/pipeline.py:89:add /usr/lib/python2.7/dist-packages/twisted/internet/defer.py:134:maybeDeferred /usr/lib/pymodules/python2.7/foolscap/referenceable.py:415:callRemote --- <exception caught here> --- /usr/lib/python2.7/dist-packages/twisted/internet/defer.py:134:maybeDeferred /usr/lib/pymodules/python2.7/foolscap/referenceable.py:455:_callRemote /usr/lib/pymodules/python2.7/foolscap/broker.py:477:newRequestID ])> ] ``` This is similar to the evidence in kpreid's log, which suggests that maybe a foolscap rref becoming None led to the problem.

Oh wow, attachment:2013-04-02--11-19-06Z-urhgqtq.flog.bz2 also has this:

2013-04-02_00:00:45.292535Z [190484]: WEIRD {'f_value':'AES.__init__() argument 1 must be string or read-only character buffer, not None', 'facility':'tahoe.mutable.mapupdate', 'failure':<foolscap.call.CopiedFailure exceptions.TypeError>, 'format':'error during privkey query: %(f_value)s', 'incarnation':('\xd0N\xca\xf7\xa7l\x8f\xce', None), 'parent':190483, ...} FAILURE:
[CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 551, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/lib/pymodules/python2.7/allmydata/mutable/servermap.py", line 723, in _got_results
    d4.addCallback(lambda results, shnum=shnum:
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 301, in addCallback
    callbackKeywords=kw)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 290, in addCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 551, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/lib/pymodules/python2.7/allmydata/mutable/servermap.py", line 724, in <lambda>
    self._try_to_validate_privkey(results, server, shnum, lp))
  File "/usr/lib/pymodules/python2.7/allmydata/mutable/servermap.py", line 929, in _try_to_validate_privkey
    alleged_privkey_s = self._node._decrypt_privkey(enc_privkey)
  File "/usr/lib/pymodules/python2.7/allmydata/mutable/filenode.py", line 168, in _decrypt_privkey
    enc = AES(self._writekey)
exceptions.TypeError: AES.__init__() argument 1 must be string or read-only character buffer, not None
]

I don't think I've seen that before!

Oh wow, attachment:2013-04-02--11-19-06Z-urhgqtq.flog.bz2 also has this: ``` 2013-04-02_00:00:45.292535Z [190484]: WEIRD {'f_value':'AES.__init__() argument 1 must be string or read-only character buffer, not None', 'facility':'tahoe.mutable.mapupdate', 'failure':<foolscap.call.CopiedFailure exceptions.TypeError>, 'format':'error during privkey query: %(f_value)s', 'incarnation':('\xd0N\xca\xf7\xa7l\x8f\xce', None), 'parent':190483, ...} FAILURE: [CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last): File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 551, in _runCallbacks current.result = callback(current.result, *args, **kw) File "/usr/lib/pymodules/python2.7/allmydata/mutable/servermap.py", line 723, in _got_results d4.addCallback(lambda results, shnum=shnum: File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 301, in addCallback callbackKeywords=kw) File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 290, in addCallbacks self._runCallbacks() --- <exception caught here> --- File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 551, in _runCallbacks current.result = callback(current.result, *args, **kw) File "/usr/lib/pymodules/python2.7/allmydata/mutable/servermap.py", line 724, in <lambda> self._try_to_validate_privkey(results, server, shnum, lp)) File "/usr/lib/pymodules/python2.7/allmydata/mutable/servermap.py", line 929, in _try_to_validate_privkey alleged_privkey_s = self._node._decrypt_privkey(enc_privkey) File "/usr/lib/pymodules/python2.7/allmydata/mutable/filenode.py", line 168, in _decrypt_privkey enc = AES(self._writekey) exceptions.TypeError: AES.__init__() argument 1 must be string or read-only character buffer, not None ] ``` I don't think I've seen that before!

killyourtv: I'd really like to investigate these issues, but I can't be sure that I'm looking at the right source code. Tahoe-LAFS v1.10.0 doesn't have alleged_privkey_s = self._node._decrypt_privkey(enc_privkey) on line 929.

killyourtv: I'd really like to investigate these issues, but I can't be sure that I'm looking at the right source code. Tahoe-LAFS v1.10.0 doesn't have `alleged_privkey_s = self._node._decrypt_privkey(enc_privkey)` on [line 929](https://github.com/tahoe-lafs/tahoe-lafs/blame/4751a468408d8cc8c0e7d8165237d128741ce08a/src/allmydata/mutable/servermap.py#l929).

Replying to zooko:

killyourtv: I'd really like to investigate these issues, but I can't be sure that I'm looking at the right source code. Tahoe-LAFS v1.10.0 doesn't have alleged_privkey_s = self._node._decrypt_privkey(enc_privkey) on line 929.

Oh, sorry, the incident report file says that it is Tahoe-LAFS v1.9.2, not v1.10.0. Let's see… Yes! line 929 of Tahoe-LAFS v1.9.2 is alleged_privkey_s = self._node._decrypt_privkey(enc_privkey).

Replying to [zooko](/tahoe-lafs/trac-2024-07-25/issues/1939#issuecomment-91323): > killyourtv: I'd really like to investigate these issues, but I can't be sure that I'm looking at the right source code. Tahoe-LAFS v1.10.0 doesn't have `alleged_privkey_s = self._node._decrypt_privkey(enc_privkey)` on [line 929](https://github.com/tahoe-lafs/tahoe-lafs/blame/4751a468408d8cc8c0e7d8165237d128741ce08a/src/allmydata/mutable/servermap.py#l929). Oh, sorry, the incident report file says that it is Tahoe-LAFS v1.9.2, not v1.10.0. Let's see… Yes! [line 929](https://github.com/tahoe-lafs/tahoe-lafs/blame/470acbf1e1d0a525cb8af9fb7a82ebe09033e8ca/src/allmydata/mutable/servermap.py#L292) of Tahoe-LAFS v1.9.2 is `alleged_privkey_s = self._node._decrypt_privkey(enc_privkey)`.
Sign in to join this conversation.
No Milestone
No Assignees
3 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Reference: tahoe-lafs/trac-2024-07-25#1939
No description provided.