add some questions to allmydata.interfaces

Brian Warner 2007-03-05 20:57:38 -07:00
parent 0d31acf113
commit 61760047cf
1 changed file with 40 additions and 10 deletions


@@ -115,17 +115,47 @@ class ICodecEncoder(Interface):
         """Encode some data. This may be called multiple times. Each call is
         independent.
 
-        inshares is a sequence of length required_shares, containing buffers,
-        where each buffer contains the next contiguous non-overlapping
-        segment of the input data. Each buffer is required to be the same
-        length, and the sum of the lengths of the buffers is required to be
-        exactly the data_size promised by set_params(). (This implies that
-        the data has to be padded before being passed to encode(), unless of
-        course it already happens to be an even multiple of required_shares in
-        length.)
+        inshares is a sequence of length required_shares, containing buffers
+        (i.e. strings), where each buffer contains the next contiguous
+        non-overlapping segment of the input data. Each buffer is required to
+        be the same length, and the sum of the lengths of the buffers is
+        required to be exactly the data_size promised by set_params(). (This
+        implies that the data has to be padded before being passed to
+        encode(), unless of course it already happens to be an even multiple
+        of required_shares in length.)
 
-        'desired_share_ids', if provided, is required to be a sequence of ints,
-        each of which is required to be >= 0 and < max_shares.
+        QUESTION for zooko: that implies that 'data_size' must be an
+        integral multiple of 'required_shares', right? Which means these
+        restrictions should be documented in set_params() rather than (or in
+        addition to) encode(), since that's where they must really be
+        honored. This restriction feels like an abstraction leak, but maybe
+        it is cleaner to enforce constraints on 'data_size' rather than
+        quietly implement internal padding. I dunno.
+
+        ALSO: the requirement to break up your data into 'required_shares'
+        chunks before calling encode() feels a bit surprising, at least from
+        the point of view of a user who doesn't know how FEC works. It feels
+        like an implementation detail that has leaked outside the
+        abstraction barrier. Can you imagine a use case in which the data to
+        be encoded might already be available in pre-segmented chunks, such
+        that it is faster or less work to make encode() take a list rather
+        than splitting a single string?
+
+        ALSO ALSO: I think 'inshares' is a misleading term, since encode()
+        is supposed to *produce* shares, so what it *accepts* should be
+        something other than shares. Other places in this interface use the
+        word 'data' for that-which-is-not-shares.. maybe we should use that
+        term?
+
+        ALSO*3: given that we need to keep share0+shareid0 attached from
+        encode() to the eventual decode(), would it be better to return and
+        accept a zip() of these two lists? i.e. [(share0,shareid0),
+        (share1,shareid1),...]
+
+        'desired_share_ids', if provided, is required to be a sequence of
+        ints, each of which is required to be >= 0 and < max_shares. If not
+        provided, encode() will produce 'max_shares' shares, as if
+        'desired_share_ids' were set to range(max_shares).
 
         For each call, encode() will return a Deferred that fires with two
         lists, one containing shares and the other containing the shareids.
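The padding requirement in the encode() docstring, and the QUESTION about whether 'data_size' must be an integral multiple of 'required_shares', can be illustrated with a small caller-side sketch. The helper names padded_data_size() and split_into_inshares() are hypothetical and not part of allmydata.interfaces. The sketch also assumes Python 3 bytes and NUL-byte padding; the interface specifies neither.

```python
from typing import List


def padded_data_size(data_size: int, required_shares: int) -> int:
    """Smallest multiple of required_shares that is >= data_size.

    Hypothetical helper: this is the value a caller would promise to
    set_params() as data_size once the data has been padded.
    """
    if required_shares < 1:
        raise ValueError("required_shares must be >= 1")
    # ceiling division: -(-a // b) == ceil(a / b) for positive b
    return -(-data_size // required_shares) * required_shares


def split_into_inshares(data: bytes, required_shares: int) -> List[bytes]:
    """Pad data, then cut it into required_shares equal-length,
    contiguous, non-overlapping buffers, as encode() expects.

    Hypothetical helper. NUL padding is an assumption; the caller must
    remember len(data) to strip the padding again after decoding.
    """
    size = padded_data_size(len(data), required_shares)
    padded = data + b"\x00" * (size - len(data))
    chunk_size = size // required_shares
    return [padded[i * chunk_size:(i + 1) * chunk_size]
            for i in range(required_shares)]


print(split_into_inshares(b"abcdefg", 3))  # [b'abc', b'def', b'g\x00\x00']
```

If encode() accepted a single string, as the ALSO question suggests, a helper like this could run inside the codec. The cost is that the caller would no longer see the data_size constraint at set_params() time.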
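The new 'desired_share_ids' paragraph describes both a constraint (each id is an int with 0 <= id < max_shares) and a default (behave as if range(max_shares) were passed). A minimal sketch of that check follows, using a hypothetical helper normalize_share_ids() that is not part of the interface. The docstring says nothing about duplicate ids, so the sketch does not reject them.

```python
from typing import List, Optional, Sequence


def normalize_share_ids(desired_share_ids: Optional[Sequence[int]],
                        max_shares: int) -> List[int]:
    """Apply the documented default and bounds check to desired_share_ids.

    Hypothetical helper: None means "produce max_shares shares", i.e.
    the same as passing range(max_shares).
    """
    if desired_share_ids is None:
        return list(range(max_shares))
    ids = list(desired_share_ids)
    for shareid in ids:
        # each id must be an int with 0 <= shareid < max_shares
        if not isinstance(shareid, int) or not 0 <= shareid < max_shares:
            raise ValueError("invalid share id %r for max_shares=%d"
                             % (shareid, max_shares))
    return ids


print(normalize_share_ids(None, 4))    # [0, 1, 2, 3]
print(normalize_share_ids([3, 0], 4))  # [3, 0]
```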
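The ALSO*3 question asks whether encode() should return, and decode() should accept, a zip() of shares and shareids instead of two parallel lists. The sketch below shows the conversion in both directions. pair_shares() and unpair_shares() are hypothetical helpers, and the sketch assumes that the two lists the Deferred "fires with" arrive as a (shares, shareids) tuple.

```python
from typing import List, Sequence, Tuple


def pair_shares(shares: Sequence[bytes],
                shareids: Sequence[int]) -> List[Tuple[bytes, int]]:
    """Turn two parallel lists into [(share0, shareid0), ...]."""
    if len(shares) != len(shareids):
        raise ValueError("shares and shareids must have the same length")
    return list(zip(shares, shareids))


def unpair_shares(pairs: Sequence[Tuple[bytes, int]]
                  ) -> Tuple[List[bytes], List[int]]:
    """Split [(share, shareid), ...] back into two parallel lists."""
    shares = [share for share, _ in pairs]
    shareids = [shareid for _, shareid in pairs]
    return shares, shareids


pairs = pair_shares([b"s0", b"s1", b"s2"], [0, 1, 2])
# Dropping or reordering whole pairs cannot separate a share from its id.
survivors = [pairs[2], pairs[0]]
print(unpair_shares(survivors))  # ([b's2', b's0'], [2, 0])
```

With Twisted, a converter like this could be attached to the returned Deferred, for example `d.addCallback(lambda res: pair_shares(*res))`. This also relies on the (shares, shareids) tuple assumption. The paired form keeps the association intact whenever shares are filtered or shuffled between encode() and decode().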