shrink UEB: derive more fields from version+filesize #446
Reference: tahoe-lafs/trac-2024-07-25#446
Our roadmap.txt had "URI step 4" as "perhaps derive more information from
version and filesize, to remove codec_name, codec_params, tail_codec_params,
needed_shares, total_shares, segment_size from the URI Extension"
The idea was to reduce the per-share overhead by being less
forwards-compatible with the contents of the UEB. For example, we include
separate codec_params and tail_codec_params, to give the encoder more
flexibility in choosing these parameters. If we declared that "UEB version 1"
means some well-specified algorithm to derive these parameters from the file
size, then we could rely upon that algorithm instead of storing the
parameters separately.
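As a rough illustration of what such a "well-specified algorithm" could look like, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the constants (3-of-10 encoding, 128 KiB maximum segment size), the `derive_params` name, and the rounding rule are hypothetical and are not taken from the Tahoe source or from any agreed-upon "UEB version 1" definition.

```python
from dataclasses import dataclass

# Hypothetical values that a "UEB version 1" could pin down.
# These are illustrative assumptions, not a real specification.
V1_NEEDED_SHARES = 3
V1_TOTAL_SHARES = 10
V1_MAX_SEGMENT_SIZE = 128 * 1024


def round_up(n, multiple):
    """Round n up to the nearest multiple of `multiple`."""
    return ((n + multiple - 1) // multiple) * multiple


@dataclass(frozen=True)
class DerivedParams:
    needed_shares: int
    total_shares: int
    segment_size: int
    num_segments: int
    tail_segment_size: int


def derive_params(file_size):
    """Derive encoding parameters from the file size alone (hypothetical rule)."""
    k = V1_NEEDED_SHARES
    # Segments are padded to a multiple of k so each one splits evenly.
    segment_size = round_up(min(V1_MAX_SEGMENT_SIZE, max(file_size, 1)), k)
    # Ceiling division; an empty file still gets one (empty) segment.
    num_segments = max(1, -(-file_size // segment_size))
    tail_bytes = file_size - (num_segments - 1) * segment_size
    return DerivedParams(
        needed_shares=k,
        total_shares=V1_TOTAL_SHARES,
        segment_size=segment_size,
        num_segments=num_segments,
        tail_segment_size=round_up(tail_bytes, k),
    )
```

With a rule like this, uploader and downloader compute identical values from the file size, so codec_params, tail_codec_params, needed_shares, total_shares, and segment_size would not need to be stored in each share.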
At the time, we were also using multiple encoders (we had a dummy
"replication" encoder for use before zfec was ready). Another aspect of this
change would be to declare that "UEB version 1" always used the same encoder,
and remove the codec_name field from the UEB.
Personally, I'm not convinced that this would be a large savings, especially compared to all the 32-byte hashes that we keep in each share. OTOH, retaining flexibility in codec_name when we only have one codec implemented is kind of pointless.
In changeset:b315619d6b3e5f20 I changed the download side to not require these redundant fields, but to check them if they are present and assert that they are consistent with the other, non-redundant fields. That patch was released in Tahoe-1.3.0, so in theory, once nobody uses a version of Tahoe older than 1.3.0 to download files, we could change the upload side to omit these fields entirely. I'll keep this ticket open in case that happens, but I think it is more likely that we will introduce a new, separate format and continue using the old format unchanged.
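The "optional, but checked when present" behaviour described above could be sketched roughly as follows. This is not the code from the changeset; the function name, the exception class, and the use of plain dicts for the UEB and the derived values are all hypothetical choices made for illustration.

```python
class InconsistentUEBError(ValueError):
    """Raised when a redundant UEB field disagrees with the derived value."""


def check_redundant_fields(ueb, derived):
    """Tolerate missing redundant fields, but reject inconsistent ones.

    `ueb` is the parsed URI Extension Block as a dict; `derived` maps each
    redundant field name to the value computed from the non-redundant
    fields. Both shapes are assumptions for this sketch.
    """
    for name, expected in derived.items():
        if name not in ueb:
            continue  # field omitted by the uploader: nothing to check
        if ueb[name] != expected:
            raise InconsistentUEBError(
                "UEB field %r is %r, expected %r" % (name, ueb[name], expected)
            )
```

A downloader written this way accepts shares from both old uploaders (fields present) and hypothetical future uploaders (fields omitted), which is what would make dropping the fields on upload safe once older downloaders are no longer in use.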