add some questions to allmydata.interfaces

commit 61760047cf
Date: 2007-03-05 20:57:38 -07:00
parent 0d31acf113
1 changed file with 40 additions and 10 deletions
--- a/src/allmydata/interfaces.py
+++ b/src/allmydata/interfaces.py
@@ -115,17 +115,47 @@ class ICodecEncoder(Interface):
        """Encode some data. This may be called multiple times. Each call is 
        independent.
-        inshares is a sequence of length required_shares, containing buffers,
-        where each buffer contains the next contiguous non-overlapping
-        segment of the input data.  Each buffer is required to be the same
-        length, and the sum of the lengths of the buffers is required to be
-        exactly the data_size promised by set_params().  (This implies that
-        the data has to be padded before being passed to encode(), unless of
-        course it already happens to be an even multiple of required_shares in
-        length.) 
-        'desired_share_ids', if provided, is required to be a sequence of ints,
-        each of which is required to be >= 0 and < max_shares.
+        inshares is a sequence of length required_shares, containing buffers
+        (i.e. strings), where each buffer contains the next contiguous
+        non-overlapping segment of the input data. Each buffer is required to
+        be the same length, and the sum of the lengths of the buffers is
+        required to be exactly the data_size promised by set_params(). (This
+        implies that the data has to be padded before being passed to
+        encode(), unless of course it already happens to be an even multiple
+        of required_shares in length.)
+         QUESTION for zooko: that implies that 'data_size' must be an
+         integral multiple of 'required_shares', right? Which means these
+         restrictions should be documented in set_params() rather than (or in
+         addition to) encode(), since that's where they must really be
+         honored. This restriction feels like an abstraction leak, but maybe
+         it is cleaner to enforce constraints on 'data_size' rather than
+         quietly implement internal padding. I dunno.
+         ALSO: the requirement to break up your data into 'required_shares'
+         chunks before calling encode() feels a bit surprising, at least from
+         the point of view of a user who doesn't know how FEC works. It feels
+         like an implementation detail that has leaked outside the
+         abstraction barrier. Can you imagine a use case in which the data to
+         be encoded might already be available in pre-segmented chunks, such
+         that it is faster or less work to make encode() take a list rather
+         than splitting a single string?
+         ALSO ALSO: I think 'inshares' is a misleading term, since encode()
+         is supposed to *produce* shares, so what it *accepts* should be
+         something other than shares. Other places in this interface use the
+         word 'data' for that-which-is-not-shares.. maybe we should use that
+         term?
+         ALSO*3: given that we need to keep share0+shareid0 attached from
+         encode() to the eventual decode(), would it be better to return and
+         accept a zip() of these two lists? i.e. [(share0,shareid0),
+         (share1,shareid1),...]
+        'desired_share_ids', if provided, is required to be a sequence of
+        ints, each of which is required to be >= 0 and < max_shares. If not
+        provided, encode() will produce 'max_shares' shares, as if
+        'desired_share_ids' were set to range(max_shares).
        For each call, encode() will return a Deferred that fires with two
        lists, one containing shares and the other containing the shareids.
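
The padding requirement in the revised docstring, and the QUESTION about 'data_size' having to be an integral multiple of 'required_shares', can be illustrated with a short Python sketch. This is not code from the commit: `split_for_encode` is a hypothetical helper, zero-byte padding is an assumption (the interface does not say how callers should pad), and it uses Python 3 `bytes` where the 2007 docstring says strings.

```python
from typing import List, Tuple


def split_for_encode(data: bytes, required_shares: int) -> Tuple[List[bytes], int]:
    """Pad `data` and cut it into `required_shares` equal-length buffers.

    Hypothetical helper: the b"\\x00" pad byte and the returned pad length
    are assumptions for illustration, not part of ICodecEncoder.
    """
    if required_shares < 1:
        raise ValueError("required_shares must be >= 1")
    # Round the length up to the next multiple of required_shares.
    pad_len = (-len(data)) % required_shares
    padded = data + b"\x00" * pad_len
    chunk_size = len(padded) // required_shares
    # Contiguous, non-overlapping, equal-length segments, in input order.
    chunks = [padded[i * chunk_size:(i + 1) * chunk_size]
              for i in range(required_shares)]
    return chunks, pad_len


chunks, pad_len = split_for_encode(b"hello world", 3)
# The data_size promised to set_params() must be the padded length.
data_size = sum(len(c) for c in chunks)
```

With 11 bytes and required_shares=3, the helper adds one pad byte and yields three 4-byte buffers, so the data_size given to set_params() would have to be 12 rather than 11; that mismatch is the leak the QUESTION points at.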
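
The ALSO*3 question and the new 'desired_share_ids' default can be sketched the same way. Again this is hypothetical: `resolve_share_ids` and `pair_shares` are illustrative names, not part of allmydata.interfaces, and the sketch works on the two plain lists that encode()'s Deferred is described as firing with, rather than on a Twisted Deferred.

```python
from typing import List, Optional, Sequence, Tuple


def resolve_share_ids(desired_share_ids: Optional[Sequence[int]],
                      max_shares: int) -> List[int]:
    """Apply the 'desired_share_ids' rules from the encode() docstring.

    Hypothetical helper written for illustration only.
    """
    if desired_share_ids is None:
        # Default: behave as if range(max_shares) had been passed.
        return list(range(max_shares))
    ids = list(desired_share_ids)
    for sid in ids:
        # Each id must be an int with 0 <= sid < max_shares.
        if not isinstance(sid, int) or not 0 <= sid < max_shares:
            raise ValueError("share id %r must be an int in [0, %d)"
                             % (sid, max_shares))
    return ids


def pair_shares(shares: Sequence[bytes],
                shareids: Sequence[int]) -> List[Tuple[bytes, int]]:
    """Zip the shares and shareids lists into (share, shareid) pairs."""
    if len(shares) != len(shareids):
        raise ValueError("shares and shareids must have the same length")
    return list(zip(shares, shareids))
```

Returning pairs keeps each share attached to its id by construction, which is the point of the zip() suggestion; the cost is that callers who only want the shares have to unzip them.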