"tahoe backup" on the same immutable content when some shares are missing does not repair that content. #2035
From srl295 on IRC: "I think it was set to need 2, happy 2, total 3 on 1.9.2 when the original directory upload happened. Same settings under 1.10 when the failure and re-publish happened."
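(For reference, those encoding parameters correspond to settings like the following in the client's tahoe.cfg. The section and option names are the standard ones; everything else about srl's configuration is unknown and omitted here.)

```
[client]
# 2-of-3 erasure coding: any 2 of the 3 shares are enough to recover a file,
# and an upload counts as "happy" once shares land on 2 distinct servers.
shares.needed = 2
shares.happy = 2
shares.total = 3
```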
Also from IRC, a recommended reproduction:
More IRC from nejucomo (me):
Replying to nejucomo:
srl verified that removing backupdb.sqlite, deleting the backup directories, and then rerunning backup successfully stored their data into the same immutable caps.
Therefore I propose this is a bug in the backupdb caching logic. If possible, it should verify the health of items in the cache. If this is expensive, maybe it could be an opt-in behavior with a command-line option.
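A minimal sketch of what that opt-in verification could look like. Every name here (should_upload, lookup, is_healthy, verify_cached) is hypothetical and only illustrates the proposal; none of them are real Tahoe-LAFS APIs.

```python
# Hypothetical sketch of opt-in cache verification for "tahoe backup".
# These names are illustrative only, not real Tahoe-LAFS interfaces.
def should_upload(path, backupdb, grid, verify_cached=False):
    cached_cap = backupdb.lookup(path)   # cap recorded by a previous backup, or None
    if cached_cap is None:
        return True                      # never backed up: upload it
    if verify_cached and not grid.is_healthy(cached_cap):
        return True                      # cached cap exists but shares are missing: re-upload
    return False                         # trust the cache (the current behavior)
```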
I'm going to update the keywords to reflect this new information.
I'm not certain the current keywords are accurate. I attempted to err on the side of caution and apply them liberally.
upload: because I believe upload does the right thing.
repair: may not be relevant, because although this is about repairing backups, it's not using any specialized repair mechanism outside of immutable-dedup upload.
usability: because without knowing the trick of nuking backupdb.sqlite, users may believe they've successfully made a backup where some files remain unrecoverable due to the cache.
Replying to [nejucomo] comment:5:
backup-and-verify would be nice. I would think that backup could be efficient here and check whether the shares are there before re-using its cache.
Also note the trick is to nuke the database AND unlink, so that trick probably can't work while preserving the Archived items.
The backupdb is a performance hack to avoid the latency cost of asking servers whether each file exists on the grid. If the latter were fast enough (which would probably require batching requests for multiple files), then it wouldn't be needed. ("tahoe backup" probabilistically checks some files on each run even if they are present in the backupdb, but I don't think that particularly helps.) In the meantime, how about adding a --repair option to "tahoe backup", which would bypass the backupdb-based conditional upload and upload/repair every file?
Hmm, it looks from this code in the method BackupDB_v2.check_file in source:src/allmydata/scripts/backupdb.py as though the --ignore-timestamps option of "tahoe backup" causes existing db entries to be completely ignored, rather than only ignoring timestamps. Perhaps we just need to rename --ignore-timestamps or document it better?
Sorry, intended to paste the code:
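For context, the relevant part of BackupDB_v2.check_file looks roughly like this. This is a paraphrased sketch from memory of backupdb.py, not the exact snippet that was pasted; column names, constants, and details may differ between versions.

```python
# Paraphrased sketch of BackupDB_v2.check_file (src/allmydata/scripts/backupdb.py).
# Reconstructed for illustration; not the originally pasted code.
def check_file(self, path, use_timestamps=True):
    s = os.stat(path)
    size, mtime, ctime = s.st_size, s.st_mtime, s.st_ctime
    c = self.cursor

    c.execute("SELECT size, mtime, ctime, fileid FROM local_files WHERE path=?",
              (path,))
    row = c.fetchone()
    if not row:
        # Never seen this path: FileResult with no cap, so the caller uploads it.
        return FileResult(self, None, False, path, mtime, ctime, size)
    (last_size, last_mtime, last_ctime, last_fileid) = row

    c.execute("SELECT caps.filecap, last_upload.last_checked"
              " FROM caps, last_upload"
              " WHERE caps.fileid=? AND last_upload.fileid=?",
              (last_fileid, last_fileid))
    row2 = c.fetchone()

    if (not use_timestamps          # --ignore-timestamps always takes this branch
        or last_size != size or last_mtime != mtime or last_ctime != ctime
        or not row2):
        # Stale (or ignored) entry: delete it and report "not uploaded", so the
        # caller re-uploads the file. With use_timestamps=False the whole entry
        # is discarded, not just the timestamps -- the behavior discussed below.
        c.execute("DELETE FROM local_files WHERE path=?", (path,))
        self.connection.commit()
        return FileResult(self, None, False, path, mtime, ctime, size)

    # Unchanged file: reuse the cached cap, occasionally requesting a re-check
    # with a probability that ramps up with the age of the last check.
    (filecap, last_checked) = row2
    age = time.time() - last_checked
    probability = (age - self.NO_CHECK_BEFORE) / (self.ALWAYS_CHECK_AFTER - self.NO_CHECK_BEFORE)
    should_check = random.random() < min(max(probability, 0.0), 1.0)
    return FileResult(self, filecap, should_check, path, mtime, ctime, size)
```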
So when not use_timestamps, the existing db entry is deleted and the FileResult has None for the existing file URI. (Note that we still might not repair the file very well; see #1382.)
Replying to daira:
Just to note, I had to both rename the db AND unlink the bad directories to get them repaired.
Replying to [srl] comment:13:
Unlink them from where?
Replying to [zooko] comment:15:
I unlinked the Latest and Archives directories that tahoe backup created.
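(For reference, the workaround described here amounts to roughly the following commands. The alias name and local path are made up for illustration, the backupdb location assumes the default node directory, and on older releases "tahoe unlink" is spelled "tahoe rm".)

```
# Illustrative reconstruction of the workaround; alias and paths are hypothetical.
rm ~/.tahoe/private/backupdb.sqlite      # forget the backupdb cache
tahoe unlink tahoe:backups/Latest        # drop the directories "tahoe backup" created
tahoe unlink tahoe:backups/Archives
tahoe backup ~/mydata tahoe:backups      # re-run; files are re-uploaded to the same caps
```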
Nitpick: use "uploading" for immutable files or shares, and "publishing" for (versions of) mutable files or shares.
markberger: do any of your improvements address this?
Summary changed from "Publishing the same immutable content when some shares are unrecoverable does not repair that content." to "Uploading the same immutable content when some shares are unrecoverable does not repair that content."
It's unclear to me whether this is just a duplicate of other bugs (e.g. #1130 and #1124) that are being fixed in #1382, or whether it is a separate problem in "tahoe backup".
Replying to daira:
I think this is a different problem to #1382. I think this problem has to do with the fact that "tahoe backup" inspects its local cache "backupdb", decides that the file is already backed up, and then does not issue any of the network requests that would allow it to find out that the file is damaged or even broken.
If that's the issue, possible solutions include:
Changing the Summary of this ticket to reflect what I think the issue is.
Summary changed from "Uploading the same immutable content when some shares are unrecoverable does not repair that content." to ""tahoe backup" on the same immutable content when some shares are unrecoverable does not repair that content."
Shares can be missing; only files/directories can be unrecoverable.
Summary changed from ""tahoe backup" on the same immutable content when some shares are unrecoverable does not repair that content." to ""tahoe backup" on the same immutable content when some shares are missing does not repair that content."
Oh, I missed this:
[nejucomo] comment:5:
So it's definitely the backupdb logic.
I strongly +1 an "ignore-backupdb" kind of option that could ensure that all files are uploaded at backup time without any backupdb optimization, even at the added time cost.
If I'm not wrong, unmodified files would produce shares identical to the already stored ones, and no bandwidth would be used.
See also #1331 (--verify option for "tahoe backup").
"(tahoe backup probabilistically checks some files on each run even if they are present in the backupdb, but I don't think that particularly helps.)"
If the random check is to be helpful, then when it encounters a file that the backupdb says should be there, but isn't, it should discard the backupdb and start over assuming that it needs to check every file.
It would also be good to be able to specify a frequency for the random checking since the size and composition of the data in question affects what the best tradeoff is between speed and thoroughness.
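For context on that random check: in backupdb.py the decision to re-check an already-backed-up file ramps linearly with the age of its last check, between two constants (roughly one month and one year; exact values may differ by version). The sketch below shows that ramp, plus a check_fraction knob of the kind suggested above; the knob is purely hypothetical and does not exist in Tahoe-LAFS.

```python
import random

MONTH = 30 * 24 * 60 * 60
YEAR = 365 * 24 * 60 * 60

NO_CHECK_BEFORE = 1 * MONTH    # never re-check entries younger than this
ALWAYS_CHECK_AFTER = 1 * YEAR  # always re-check entries older than this

def should_check(age_seconds, check_fraction=1.0):
    """Decide whether to re-check a file the backupdb already knows about.

    age_seconds: time since the entry was last checked.
    check_fraction: hypothetical user-tunable scaling of the check frequency
    (1.0 reproduces the default ramp sketched here).
    """
    probability = (age_seconds - NO_CHECK_BEFORE) / (ALWAYS_CHECK_AFTER - NO_CHECK_BEFORE)
    probability = min(max(probability * check_fraction, 0.0), 1.0)
    return random.random() < probability
```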