shrink UEB: derive more fields from version+filesize #446

Open
opened 2008-06-03 05:05:53 +00:00 by warner · 1 comment

Our roadmap.txt had "URI step 4" as "perhaps derive more information from
version and filesize, to remove codec_name, codec_params, tail_codec_params,
needed_shares, total_shares, segment_size from the URI Extension".

The idea was to reduce the per-share overhead by being less
forwards-compatible with the contents of the UEB. For example, we include
separate codec_params and tail_codec_params, to give the encoder more
flexibility in choosing these parameters. If we declared that "UEB version 1"
means some well-specified algorithm to derive these parameters from the file
size, then we could rely upon that algorithm instead of storing the
parameters separately.
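
As a rough illustration of what such a derivation could look like, here is a minimal sketch. The constants (3-of-10 encoding, 128 KiB maximum segment size), the function names, and the rounding rule are all assumptions made up for this example; a real "UEB version 1" spec would have to pin them down, and this is not the algorithm Tahoe actually uses.

```python
# Hypothetical constants that a "UEB version 1" definition would fix once
# and for all, so they no longer need to be stored in every share.
NEEDED_SHARES = 3
TOTAL_SHARES = 10
MAX_SEGMENT_SIZE = 128 * 1024


def round_up(n, multiple):
    """Round n up to the next multiple of `multiple`."""
    return ((n + multiple - 1) // multiple) * multiple


def derive_params(file_size):
    """Derive encoding parameters from the file size alone (illustrative)."""
    k = NEEDED_SHARES
    # Segments are padded to a multiple of k so they split evenly into blocks.
    segment_size = round_up(min(MAX_SEGMENT_SIZE, max(file_size, 1)), k)
    num_segments = max(1, -(-file_size // segment_size))  # ceiling division
    tail_size = file_size - (num_segments - 1) * segment_size
    tail_segment_size = round_up(max(tail_size, 1), k)
    return {
        "needed_shares": k,
        "total_shares": TOTAL_SHARES,
        "segment_size": segment_size,
        "num_segments": num_segments,
        "tail_segment_size": tail_segment_size,
    }
```

With something like this agreed upon, an uploader and a downloader that both know the version and the file size would compute identical parameters, which is what would make the stored copies redundant.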

At the time, we were also using multiple encoders (we had a dummy
"replication" encoder for use before zfec was ready). Another aspect of this
change would be to declare that "UEB version 1" always used the same encoder,
and remove the codec_name field from the UEB.

Personally, I'm not convinced that this is a huge savings, especially compared to all the 32-byte hashes that we keep in the share. OTOH, retaining flexibility in the codec_name even though we only have one codec implemented is kind of pointless.

warner added the code-encoding, minor, enhancement, 1.0.0 labels 2008-06-03 05:05:53 +00:00
warner added this to the undecided milestone 2008-06-03 05:05:53 +00:00

In changeset:b315619d6b3e5f20 I changed the download side so that it no longer requires these redundant fields. If they are present, it checks them and asserts that they are consistent with the other, non-redundant fields. That patch was released in Tahoe-1.3.0, so in theory, once nobody uses a version of Tahoe older than 1.3.0 to download files, we could change upload to omit these fields entirely. I'll keep this ticket open in case that happens, but I think it is more likely that we will introduce a new, separate format and continue using the old format unchanged.
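
The "optional but checked" behaviour can be sketched roughly as follows. This is not the code from that changeset: the function name, the exception class, and the idea of passing in a dict of already-derived values are assumptions made for illustration only.

```python
class InconsistentUEBError(Exception):
    """Raised when a redundant UEB field contradicts the derived value."""


def check_redundant_fields(ueb, derived):
    """Tolerate missing redundant fields, but reject inconsistent ones.

    `ueb` is the parsed UEB as a dict. `derived` maps each redundant field
    name to the value computed from the non-redundant fields.
    """
    for name, expected in derived.items():
        if name not in ueb:
            continue  # field omitted by the uploader: nothing to check
        if ueb[name] != expected:
            raise InconsistentUEBError(
                "%s: UEB says %r, derived value is %r"
                % (name, ueb[name], expected)
            )
```

A downloader written this way accepts shares from both old uploaders (fields present) and hypothetical future ones (fields omitted), which is why upload could only drop the fields once older downloaders are no longer in use.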

Reference: tahoe-lafs/trac-2024-07-25#446