cloud backend: for a very large upload, the accounting crawler deletes shares before they are leased #1987
Reference: tahoe-lafs/trac-2024-07-25#1987
I tested uploading a 10 GB file to the cloud backend on Azure. Before the upload had finished, the accounting crawler ran and started deleting the uploaded chunks. I thought this had been fixed, but it clearly hasn't. (A share is supposed to be considered leased while it is being uploaded, i.e. while it is in STATE_COMING.)
I took a copy of leasedb.sqlite while it was doing this so that I can examine the share state.
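For anyone else looking at the snapshot, here is a minimal sketch of how the share states in leasedb.sqlite could be inspected. It assumes a shares table with storage_index, shnum, and state columns, and the COMING/STABLE/GOING state encoding from leasedb.py; both are assumptions, so check the actual schema on the cloud branch.

    # Sketch for inspecting a leasedb.sqlite snapshot. The 'shares' table
    # layout and the state encoding below are assumptions; see leasedb.py
    # for the real schema and constants.
    import sqlite3

    STATE_NAMES = {0: "COMING", 1: "STABLE", 2: "GOING"}  # assumed encoding

    conn = sqlite3.connect("leasedb.sqlite")
    for si, shnum, state in conn.execute(
            "SELECT storage_index, shnum, state FROM shares"):
        print("%s share %d: %s" % (si, shnum, STATE_NAMES.get(state, state)))
    conn.close()

Shares that are still being written should show up as COMING; if the crawler is deleting those, then either the state handling or the crawler's selection of deletable shares is at fault.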
#1921 may be the same bug as this. I'm not marking them as duplicates because I'm not sure of that yet.
#1833 would fix this, and I would be happy with that method of fixing it, because I like #1833 in any case.
In any case, I would like to understand why the current code is failing. It may be a symptom of shares being in the wrong state while they are being written, or something similar.
After examining the logs more closely, I think I misinterpreted the problem. The upload failed because four consecutive HTTP PUT requests to Azure failed (with ConnectionLost or TimeoutError exceptions). Then the share chunks were deleted, because that is the behaviour coded in BucketWriter._abort. I'm retrying the 10 GB upload; if it succeeds this time then I'll re-enable share deletion.
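To make that failure mode concrete, here is an illustrative sketch, not the actual cloud-backend code: put_chunk, delete_partial_share, and the retry count are hypothetical stand-ins, and plain-Python ConnectionError/TimeoutError stand in for the Twisted exceptions above. Each chunk PUT is retried a few times, and only when the upload is finally abandoned are the partially-written chunks deleted, analogous to BucketWriter._abort.

    import time

    MAX_RETRIES = 4  # assumed limit; the real retry policy may differ

    class UploadFailed(Exception):
        pass

    def put_chunk_with_retries(put_chunk, chunk, delay=1.0):
        # Retry transient network failures with exponential backoff.
        for attempt in range(MAX_RETRIES):
            try:
                return put_chunk(chunk)
            except (ConnectionError, TimeoutError):
                if attempt == MAX_RETRIES - 1:
                    raise UploadFailed("all %d PUT attempts failed" % MAX_RETRIES)
                time.sleep(delay * (2 ** attempt))

    def upload_share(put_chunk, chunks, delete_partial_share):
        try:
            for chunk in chunks:
                put_chunk_with_retries(put_chunk, chunk)
        except UploadFailed:
            # Analogous to BucketWriter._abort: once the upload is
            # abandoned, the partially-written chunks are deleted.
            delete_partial_share()
            raise

So the deletions in the log were the abort path cleaning up after a genuinely failed upload, not the accounting crawler reclaiming leased shares.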
The 10 GB upload failed, but for an unrelated reason (#1991); uploads of up to 2 GB succeeded concurrently with an accounting crawler run.
I made share deletion by the accounting crawler conditional in 416e91ed, and reenabled it in 98b4d8ee on the 1819-cloud-merge branch.
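The guard is of roughly the following shape (hypothetical names; see the commits for the actual change): the crawler's deletion of unleased shares is gated on a flag, so it could be switched off in 416e91ed while the bug was investigated and back on in 98b4d8ee.

    # Rough shape of the conditional-deletion guard. Class and method
    # names are hypothetical; the real change is in 416e91ed / 98b4d8ee
    # on the 1819-cloud-merge branch.
    class AccountingCrawler(object):
        def __init__(self, leasedb, backend, delete_unleased_shares=True):
            self._leasedb = leasedb
            self._backend = backend
            self._delete_unleased_shares = delete_unleased_shares

        def process_unleased_share(self, storage_index, shnum):
            if not self._delete_unleased_shares:
                return  # deletion disabled: leave the share in place
            self._backend.delete_share(storage_index, shnum)
            self._leasedb.remove_deleted_share(storage_index, shnum)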
I put a comment on https://github.com/LeastAuthority/tahoe-lafs/commit/416e91ed0948ee2802e0e2ea20dd48befcaae94c#commitcomment-3322692