cloud backend: for a very large upload, the accounting crawler deletes shares before they are leased #1987
Reference: tahoe-lafs/trac-2024-07-25#1987
I tested uploading a 10 GB file to the cloud backend on Azure. Before the upload had finished, the accounting crawler ran and started deleting the uploaded chunks. I thought this had been fixed, but it clearly hasn't. (A share is supposed to be considered leased while it is being uploaded, i.e. while it is in STATE_COMING.)
I took a copy of leasedb.sqlite while it was doing this so that I can examine the share state.
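For anyone else looking at the snapshot, here is a minimal sketch of how the share states in leasedb.sqlite could be inspected. It assumes a shares table with storage_index, shnum, and state columns, and the COMING/STABLE/GOING state encoding from leasedb.py; both are assumptions, so check the actual schema on the cloud branch.

    # Sketch for inspecting a leasedb.sqlite snapshot. The 'shares' table
    # layout and the state encoding below are assumptions; see leasedb.py
    # for the real schema and constants.
    import sqlite3

    STATE_NAMES = {0: "COMING", 1: "STABLE", 2: "GOING"}  # assumed encoding

    conn = sqlite3.connect("leasedb.sqlite")
    for si, shnum, state in conn.execute(
            "SELECT storage_index, shnum, state FROM shares"):
        print("%s share %d: %s" % (si, shnum, STATE_NAMES.get(state, state)))
    conn.close()

Shares that are still being written should show up as COMING; if the crawler is deleting those, then either the state handling or the crawler's selection of deletable shares is at fault.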
#1921 may be the same bug as this. I'm not marking them as duplicates because I'm not sure of that yet.
#1833 would fix this, and I would be happy with that method of fixing it, because I like #1833 in any case.
In any case, I would like to understand why the current code is failing. It may be a symptom of shares being in the wrong state while they are being written, or something similar.
After examining the logs more closely, I think I misinterpreted the problem. The upload failed because four consecutive HTTP PUT requests to Azure failed (with ConnectionLost or TimeoutError exceptions). Then the share chunks were deleted, because that is the behaviour coded in BucketWriter._abort. I'm retrying the 10 GB upload; if it succeeds this time then I'll re-enable share deletion.
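To make that failure mode concrete, here is an illustrative sketch, not the actual cloud-backend code: put_chunk, delete_partial_share, and the retry count are hypothetical stand-ins, and plain-Python ConnectionError/TimeoutError stand in for the Twisted exceptions above. Each chunk PUT is retried a few times, and only when the upload is finally abandoned are the partially-written chunks deleted, analogous to BucketWriter._abort.

    import time

    MAX_RETRIES = 4  # assumed limit; the real retry policy may differ

    class UploadFailed(Exception):
        pass

    def put_chunk_with_retries(put_chunk, chunk, delay=1.0):
        # Retry transient network failures with exponential backoff.
        for attempt in range(MAX_RETRIES):
            try:
                return put_chunk(chunk)
            except (ConnectionError, TimeoutError):
                if attempt == MAX_RETRIES - 1:
                    raise UploadFailed("all %d PUT attempts failed" % MAX_RETRIES)
                time.sleep(delay * (2 ** attempt))

    def upload_share(put_chunk, chunks, delete_partial_share):
        try:
            for chunk in chunks:
                put_chunk_with_retries(put_chunk, chunk)
        except UploadFailed:
            # Analogous to BucketWriter._abort: once the upload is
            # abandoned, the partially-written chunks are deleted.
            delete_partial_share()
            raise

So the deletions in the log were the abort path cleaning up after a genuinely failed upload, not the accounting crawler reclaiming leased shares.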
The 10 GB upload failed, but for an unrelated reason (#1991); uploads of up to 2 GB succeeded concurrently with an accounting crawler run.
I made share deletion by the accounting crawler conditional in 416e91ed, and reenabled it in 98b4d8ee on the 1819-cloud-merge branch.
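The guard is of roughly the following shape (hypothetical names; see the commits for the actual change): the crawler's deletion of unleased shares is gated on a flag, so it could be switched off in 416e91ed while the bug was investigated and back on in 98b4d8ee.

    # Rough shape of the conditional-deletion guard. Class and method
    # names are hypothetical; the real change is in 416e91ed / 98b4d8ee
    # on the 1819-cloud-merge branch.
    class AccountingCrawler(object):
        def __init__(self, leasedb, backend, delete_unleased_shares=True):
            self._leasedb = leasedb
            self._backend = backend
            self._delete_unleased_shares = delete_unleased_shares

        def process_unleased_share(self, storage_index, shnum):
            if not self._delete_unleased_shares:
                return  # deletion disabled: leave the share in place
            self._backend.delete_share(storage_index, shnum)
            self._leasedb.remove_deleted_share(storage_index, shnum)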
I put a comment on https://github.com/LeastAuthority/tahoe-lafs/commit/416e91ed0948ee2802e0e2ea20dd48befcaae94c#commitcomment-3322692