cloud backend: a failed upload can leave chunk objects that prevent subsequent uploads of the same file #1920

Closed
opened 2013-02-19 05:08:18 +00:00 by davidsarah · 6 comments
davidsarah commented 2013-02-19 05:08:18 +00:00
Owner

An attempt to upload a ~10 MiB file using the OpenStack cloud backend (to a Rackspace account) failed due to a DNS error:

<class 'allmydata.interfaces.UploadUnhappinessError'>:
shares could be placed or found on only 0 server(s). We were asked to place
shares on at least 1 server(s) such that any 1 of them have enough shares to
recover the file.:
[Failure instance: Traceback (failure with no frames):
<class 'allmydata.util.pipeline.PipelineError'>:
<PipelineError error=([Failure instance: Traceback (failure with no frames):
<class 'foolscap.tokens.RemoteException'>:
<RemoteException around '[CopiedFailure instance: Traceback from remote host --
Traceback (most recent call last):
Failure: twisted.internet.error.DNSLookupError: DNS lookup failed: address
'storage101.dfw1.clouddrive.com' not found: [Errno -2] Name or service not known.
]'> ])> ]

That error is not itself the subject of this ticket. The issue for this ticket is that subsequent uploads of the same immutable file also failed, even after the DNS error had resolved itself:

<class 'allmydata.interfaces.UploadUnhappinessError'>:
server selection failed for <Tahoe2ServerSelector for upload sefes>:
shares could be placed or found on only 0 server(s). We were asked to place
shares on at least 1 server(s) such that any 1 of them have enough shares to
recover the file. (placed 0 shares out of 1 total (1 homeless), want to place
shares on at least 1 servers such that any 1 of them have enough shares to
recover the file, sent 1 queries to 1 servers, 0 queries placed some shares,
1 placed none (of which 0 placed none due to the server being full and 1 placed
none due to an error)) (last failure (from <ServerTracker for server kdu2jtww
and SI sefes>) was:
[Failure instance: Traceback (failure with no frames):
<class 'foolscap.tokens.RemoteException'>:
<RemoteException around '[CopiedFailure instance: Traceback from remote host --
Traceback (most recent call last):
[...]
File "/home/davidsarah/tahoe/git/working/src/allmydata/storage/backends/cloud/cloud_common.py", line 339, in _retry
  d2 = self._handle_error(f, 1, None, description, operation, *args, **kwargs)
File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 551, in _runCallbacks
  current.result = callback(current.result, *args, **kw)
File "/home/davidsarah/tahoe/git/working/src/allmydata/storage/backends/cloud/openstack/openstack_container.py", line 81, in _got_response
  message="unexpected response code %r %s" % (response.code, response.phrase))
allmydata.storage.backends.cloud.cloud_common.CloudError:
("try 1 failed: GET object ('shares/se/sefeslgzc4su3i66b72aytmebm/0',) {}", 404,
'unexpected response code 404 Not Found', None) ]'> ])

Looking at the container contents via the Rackspace Cloud Files WUI shows that only one chunk object is stored for this file, with key:

shares/se/sefeslgzc4su3i66b72aytmebm/0.5

(i.e. the 6th chunk).
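
A manual workaround would be to delete the leftover chunk object(s) from the container before retrying the upload. The sketch below is only a guess at how that could be automated: `list_objects` and `delete_object` are hypothetical container methods (the real backend API may differ), and the key layout is inferred from the keys seen above (first chunk at `shares/<prefix>/<SI>/<sharenum>`, later chunks with a `.<chunknum>` suffix).

```
def cleanup_partial_share(container, storage_index, sharenum):
    # Key layout inferred from this report: the first chunk is stored at
    # "shares/<2-char prefix>/<storage index>/<sharenum>" and subsequent
    # chunks append ".<chunknum>" (so ".../0.5" is the 6th chunk of share 0).
    prefix = "shares/%s/%s/" % (storage_index[:2], storage_index)
    share_key = prefix + str(sharenum)
    leftovers = [key for key in container.list_objects(prefix=prefix)
                 if key == share_key or key.startswith(share_key + ".")]
    for key in leftovers:
        container.delete_object(key)  # hypothetical deletion call
    return leftovers
```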

tahoe-lafs added the code-storage, normal, defect, 1.9.2 labels 2013-02-19 05:08:18 +00:00
tahoe-lafs added this to the undecided milestone 2013-02-19 05:08:18 +00:00
davidsarah commented 2013-02-19 05:11:37 +00:00
Author
Owner

I suspect (but I'm not sure) that this is a generic cloud backend issue and could also happen in principle for S3. It may be less likely, especially since we reduced the frequency of S3 errors by retrying.
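
For reference, "retrying" here just means wrapping each cloud request in a bounded retry loop with backoff; the sketch below is only an illustration of that idea, not the actual `_retry` logic in `cloud_common.py`.

```
import time

def retry_operation(operation, max_tries=4, first_delay=1.0):
    # Illustrative sketch: retry a flaky container operation with exponential
    # backoff, re-raising the last failure once the attempts are exhausted.
    delay = first_delay
    for attempt in range(1, max_tries + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_tries:
                raise
            time.sleep(delay)
            delay *= 2
```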

I think this is one kind of failure that would be prevented by Two-Phase Commit (#1755).
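
For context, the two-phase idea would be roughly: write all chunks under a temporary prefix first, and only copy them to their final keys once every chunk has been stored, so a failed upload leaves nothing under the share's real key that could confuse a later attempt. A minimal sketch, assuming hypothetical `put_object`, `copy_object`, and `delete_object` container methods (not the actual design in #1755):

```
def upload_share_two_phase(container, share_prefix, sharenum, chunks):
    tmp_prefix = share_prefix + "incoming/"
    pending = []
    # Phase 1: write every chunk under the temporary prefix.
    for i, data in enumerate(chunks):
        suffix = str(sharenum) if i == 0 else "%d.%d" % (sharenum, i)
        container.put_object(tmp_prefix + suffix, data)
        pending.append(suffix)
    # Phase 2: commit by copying each chunk to its final key, then clean up.
    for suffix in pending:
        container.copy_object(tmp_prefix + suffix, share_prefix + suffix)
    for suffix in pending:
        container.delete_object(tmp_prefix + suffix)
```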

tahoe-lafs modified the milestone from undecided to 1.12.0 2013-07-22 20:49:37 +00:00

Milestone renamed
warner modified the milestone from 1.12.0 to 1.13.0 2016-03-22 05:02:25 +00:00

renaming milestone
warner modified the milestone from 1.13.0 to 1.14.0 2016-06-28 18:17:14 +00:00

Moving open issues out of closed milestones.
exarkun modified the milestone from 1.14.0 to 1.15.0 2020-06-30 14:45:13 +00:00

The established line of development on the "cloud backend" branch has been abandoned. This ticket is being closed as part of a batch-ticket cleanup for "cloud backend"-related tickets.

If this is a bug, it is probably genuinely no longer relevant. The "cloud backend" branch is too large and unwieldy to ever be merged into the main line of development (particularly now that the Python 3 porting effort is significantly underway).

If this is a feature, it may be relevant to some future efforts - if they are sufficiently similar to the "cloud backend" effort - but I am still closing it because there are no immediate plans for a new development effort in such a direction.

Tickets related to the "leasedb" are included in this set because the "leasedb" code is in the "cloud backend" branch and fairly well intertwined with it. If there is interest in changing the lease implementation at some future time, that effort will essentially have to be restarted as well.

exarkun added the wontfix label 2020-10-30 12:35:44 +00:00
Reference: tahoe-lafs/trac-2024-07-25#1920