bug in mutable publish that could cause an IndexError when a writer is removed in Publish._connection_problem #1749

Closed
opened 2012-05-23 04:14:47 +00:00 by davidsarah · 10 comments
davidsarah commented 2012-05-23 04:14:47 +00:00
Owner
"Traceback (most recent call last):
Failure: allmydata.mutable.common.NotEnoughServersError: (\"Publish ran out of good servers, last failure was:
[Failure instance: Traceback: <type 'exceptions.IndexError'>: list index out of range
/home/davidsarah/cloud-branch/support/lib/python2.6/site-packages/Twisted-12.0.0-py2.6-linux-i686.egg/twisted/internet/base.py:800:runUntilCurrent
/home/davidsarah/cloud-branch/support/lib/python2.6/site-packages/foolscap-0.6.3-py2.6.egg/foolscap/eventual.py:26:_turn
/home/davidsarah/cloud-branch/support/lib/python2.6/site-packages/Twisted-12.0.0-py2.6-linux-i686.egg/twisted/internet/defer.py:368:callback
/home/davidsarah/cloud-branch/support/lib/python2.6/site-packages/Twisted-12.0.0-py2.6-linux-i686.egg/twisted/internet/defer.py:464:_startRunCallbacks
--- <exception caught here> ---\\n/home/davidsarah/cloud-branch/support/lib/python2.6/site-packages/Twisted-12.0.0-py2.6-linux-i686.egg/twisted/internet/defer.py:551:_runCallbacks
/home/davidsarah/cloud-branch/src/allmydata/mutable/publish.py:634:_push
/home/davidsarah/cloud-branch/src/allmydata/mutable/publish.py:651:push_segment
/home/davidsarah/cloud-branch/src/allmydata/mutable/publish.py:637:_push
/home/davidsarah/cloud-branch/src/allmydata/mutable/publish.py:773:push_everything_else
/home/davidsarah/cloud-branch/src/allmydata/mutable/publish.py:878:finish_publishing
/home/davidsarah/cloud-branch/src/allmydata/mutable/publish.py:886:_record_verinfo

I can reproduce this, at least on the cloud-branch, when I do a tahoe put --mutable shortly after the gateway has started.

``` "Traceback (most recent call last): Failure: allmydata.mutable.common.NotEnoughServersError: (\"Publish ran out of good servers, last failure was: [Failure instance: Traceback: <type 'exceptions.IndexError'>: list index out of range /home/davidsarah/cloud-branch/support/lib/python2.6/site-packages/Twisted-12.0.0-py2.6-linux-i686.egg/twisted/internet/base.py:800:runUntilCurrent /home/davidsarah/cloud-branch/support/lib/python2.6/site-packages/foolscap-0.6.3-py2.6.egg/foolscap/eventual.py:26:_turn /home/davidsarah/cloud-branch/support/lib/python2.6/site-packages/Twisted-12.0.0-py2.6-linux-i686.egg/twisted/internet/defer.py:368:callback /home/davidsarah/cloud-branch/support/lib/python2.6/site-packages/Twisted-12.0.0-py2.6-linux-i686.egg/twisted/internet/defer.py:464:_startRunCallbacks --- <exception caught here> ---\\n/home/davidsarah/cloud-branch/support/lib/python2.6/site-packages/Twisted-12.0.0-py2.6-linux-i686.egg/twisted/internet/defer.py:551:_runCallbacks /home/davidsarah/cloud-branch/src/allmydata/mutable/publish.py:634:_push /home/davidsarah/cloud-branch/src/allmydata/mutable/publish.py:651:push_segment /home/davidsarah/cloud-branch/src/allmydata/mutable/publish.py:637:_push /home/davidsarah/cloud-branch/src/allmydata/mutable/publish.py:773:push_everything_else /home/davidsarah/cloud-branch/src/allmydata/mutable/publish.py:878:finish_publishing /home/davidsarah/cloud-branch/src/allmydata/mutable/publish.py:886:_record_verinfo ``` I can reproduce this, at least on the cloud-branch, when I do a `tahoe put --mutable` shortly after the gateway has started.
tahoe-lafs added the
code-mutable
critical
defect
1.9.1
labels 2012-05-23 04:14:47 +00:00
tahoe-lafs added this to the 1.9.2 milestone 2012-05-23 04:14:47 +00:00
davidsarah commented 2012-05-23 04:16:23 +00:00
Author
Owner

zooko suggests this fix, which I reviewed and approved of:

--- old-dw/src/allmydata/mutable/publish.py     2012-05-22 21:48:48.764939441 -0600
+++ new-dw/src/allmydata/mutable/publish.py     2012-05-22 21:48:50.641598788 -0600
@@ -253,6 +253,10 @@
         # updating, we ignore damaged and missing shares -- callers must
         # do a repair to repair and recreate these.
         self.goal = set(self._servermap.get_known_shares())
+
+        # k: shnum, v: [ instance of IMutableSlotWriter ]
+        # The value is required to always be non-empty if the item is present
+        # in the dict at all.
         self.writers = {}
 
         # SDMF files are updated differently.
@@ -891,8 +895,10 @@
         """
         self.log("found problem: %s" % str(f))
         self._last_failure = f
-        self.writers[writer.shnum].remove(writer)
-
+        writers = self.writers[writer.shnum]
+        writers.remove(writer)
+        if len(writers) == 0:
+            del self.writers[writer.shnum]
 
     def log_goal(self, goal, message=""):
         logmsg = [message]
zooko suggests this fix, which I reviewed and approved of: ``` --- old-dw/src/allmydata/mutable/publish.py 2012-05-22 21:48:48.764939441 -0600 +++ new-dw/src/allmydata/mutable/publish.py 2012-05-22 21:48:50.641598788 -0600 @@ -253,6 +253,10 @@ # updating, we ignore damaged and missing shares -- callers must # do a repair to repair and recreate these. self.goal = set(self._servermap.get_known_shares()) + + # k: shnum, v: [ instance of IMutableSlotWriter ] + # The value is required to always be non-empty if the item is present + # in the dict at all. self.writers = {} # SDMF files are updated differently. @@ -891,8 +895,10 @@ """ self.log("found problem: %s" % str(f)) self._last_failure = f - self.writers[writer.shnum].remove(writer) - + writers = self.writers[writer.shnum] + writers.remove(writer) + if len(writers) == 0: + del self.writers[writer.shnum] def log_goal(self, goal, message=""): logmsg = [message] ```
zooko was assigned by tahoe-lafs 2012-05-23 04:16:23 +00:00

sounds like a job for dictutil.DictOfSets!

sounds like a job for `dictutil.DictOfSets`!

Seriously though, yes, DictOfSets was made for this. It deletes the key from the dict when the last value is removed. I skimmed through publish.py: the only place that populates self.writers does so from an unordered set, so self.writers does not need to retain order, so DictOfSets is sufficient, and DictOfLists is not necessary.

Seriously though, yes, `DictOfSets` was made for this. It deletes the key from the dict when the last value is removed. I skimmed through publish.py: the only place that populates `self.writers` does so from an unordered set, so `self.writers` does not need to retain order, so `DictOfSets` is sufficient, and `DictOfLists` is not necessary.
davidsarah commented 2012-05-23 15:36:26 +00:00
Author
Owner

+1 for using DictOfSets.

+1 for using `DictOfSets`.
davidsarah commented 2012-06-15 03:30:48 +00:00
Author
Owner

I looked at changing it to use DictOfSets, but there are several instances of self.writers.values()[0][0]. Currently, that will always pick the first writer added for some arbitrary shnum. If I use list(self.writers.values()[0])[0], that does pass tests, but it is using an arbitrary writer for an arbitrary shnum. Kevan (who wrote this code), is that change ok?

I looked at changing it to use `DictOfSets`, but there are several instances of `self.writers.values()[0][0]`. Currently, that will always pick the first writer added for some arbitrary shnum. If I use `list(self.writers.values()[0])[0]`, that does pass tests, but it is using an arbitrary writer for an arbitrary shnum. Kevan (who wrote this code), is that change ok?
davidsarah commented 2012-06-15 03:33:34 +00:00
Author
Owner

Oh, warner is right, the writers are populated based on self.goal which is unordered.

Oh, warner is right, the writers are populated based on `self.goal` which is unordered.
davidsarah commented 2012-06-15 03:44:13 +00:00
Author
Owner

Note that there are two places where self.writers is initialized, and the diff in comment:88747 only changes one of them. (In general there's a lot of duplicated code between the update and publish methods of class Publish.)

Note that there are two places where `self.writers` is initialized, and the diff in [comment:88747](/tahoe-lafs/trac-2024-07-25/issues/1749#issuecomment-88747) only changes one of them. (In general there's a lot of duplicated code between the `update` and `publish` methods of class `Publish`.)
davidsarah commented 2012-06-15 03:47:23 +00:00
Author
Owner

Attachment fix-1749.darcs.patch (132378 bytes) added

Fix a bug in mutable publish that could cause an IndexError when a writer is removed in Publish._connection_problem. This version uses DictOfSets as suggested by warner. fixes #1749

**Attachment** fix-1749.darcs.patch (132378 bytes) added Fix a bug in mutable publish that could cause an [IndexError](wiki/IndexError) when a writer is removed in Publish._connection_problem. This version uses [DictOfSets](wiki/DictOfSets) as suggested by warner. fixes #1749
davidsarah commented 2012-06-15 03:49:34 +00:00
Author
Owner

The existing tests should be sufficient to check that the changed code works, but we probably should have a test for the original issue. Review needed for the fix.

The existing tests should be sufficient to check that the changed code works, but we probably should have a test for the original issue. Review needed for the fix.
david-sarah@jacaranda.org commented 2012-06-16 18:20:42 +00:00
Author
Owner

In changeset:635e87bd7b460324:

Fix a bug in mutable publish that could cause an IndexError when a writer is removed in Publish._connection_problem. This version uses DictOfSets as suggested by warner. fixes #1749
In changeset:635e87bd7b460324: ``` Fix a bug in mutable publish that could cause an IndexError when a writer is removed in Publish._connection_problem. This version uses DictOfSets as suggested by warner. fixes #1749 ```
tahoe-lafs added the
fixed
label 2012-06-16 18:20:42 +00:00
Sign in to join this conversation.
No Milestone
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Reference: tahoe-lafs/trac-2024-07-25#1749
No description provided.