update architecture.txt a little bit
parent
9c5ab89afe
commit
a45bb727d9
|
@ -47,18 +47,18 @@ that would cause it to consume more space than it wants to provide. When a
|
||||||
lease expires, the data is deleted. Peers might renew their leases.
|
lease expires, the data is deleted. Peers might renew their leases.
|
||||||
|
|
||||||
This storage is used to hold "shares", which are themselves used to store
|
This storage is used to hold "shares", which are themselves used to store
|
||||||
files in the grid. There are many shares for each file, typically around 100
|
files in the grid. There are many shares for each file, typically between 10
|
||||||
(the exact number depends upon the tradeoffs made between reliability,
|
and 100 (the exact number depends upon the tradeoffs made between
|
||||||
overhead, and storage space consumed). The files are indexed by a piece of
|
reliability, overhead, and storage space consumed). The files are indexed by
|
||||||
the URI called the "verifierid", which is derived from the contents of the
|
a "StorageIndex", which is derived from the encryption key, which may be
|
||||||
file. Leases are indexed by verifierid, and a single StorageServer may hold
|
randomly generated or it may be derived from the contents of the file. Leases
|
||||||
multiple shares for the corresponding file. Multiple peers can hold leases on
|
are indexed by StorageIndex, and a single StorageServer may hold multiple
|
||||||
the same file, in which case the shares will be kept alive until the last
|
shares for the corresponding file. Multiple peers can hold leases on the same
|
||||||
lease expires. The typical lease is expected to be for one month: enough time
|
file, in which case the shares will be kept alive until the last lease
|
||||||
for interested parties to renew it, but not so long that abandoned data
|
expires. The typical lease is expected to be for one month: enough time for
|
||||||
consumes unreasonable space. Peers are expected to "delete" (drop leases) on
|
interested parties to renew it, but not so long that abandoned data consumes
|
||||||
data that they know they no longer want: lease expiration is meant as a
|
unreasonable space. Peers are expected to "delete" (drop leases) on data that
|
||||||
safety measure.
|
they know they no longer want: lease expiration is meant as a safety measure.
|
||||||
|
|
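The lease rules in the paragraph above (shares stay alive until the last lease expires, peers may renew or drop leases, the typical lease lasts about a month) can be sketched as follows. This is only an illustration: `LeaseTable`, its method names, and the 30-day constant are hypothetical and are not taken from the Tahoe source.

```python
# Hypothetical lease bookkeeping for a single StorageServer.
# Names and the 30-day duration are assumptions made for this sketch.
LEASE_DURATION = 30 * 24 * 3600  # "one month", approximated in seconds


class LeaseTable:
    def __init__(self):
        # storage_index -> {peer_id: expiration_time}
        self._leases = {}

    def add_or_renew(self, storage_index, peer_id, now, duration=LEASE_DURATION):
        """Create a lease, or push out the expiration of an existing one."""
        self._leases.setdefault(storage_index, {})[peer_id] = now + duration

    def drop(self, storage_index, peer_id):
        """A peer "deletes" data by dropping its own lease."""
        holders = self._leases.get(storage_index, {})
        holders.pop(peer_id, None)
        if not holders:
            self._leases.pop(storage_index, None)

    def expire(self, now):
        """Remove expired leases; return storage indexes with no leases left."""
        deletable = []
        for storage_index in list(self._leases):
            holders = self._leases[storage_index]
            for peer_id in [p for p, exp in holders.items() if exp <= now]:
                del holders[peer_id]
            if not holders:
                del self._leases[storage_index]
                deletable.append(storage_index)
        return deletable
```

Shares for a storage index become eligible for deletion only when `expire` reports it, that is, once every peer's lease has lapsed or been dropped.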
||||||
In this release, peers learn about each other through the "introducer". Each
|
In this release, peers learn about each other through the "introducer". Each
|
||||||
peer connects to this central introducer at startup, and receives a list of
|
peer connects to this central introducer at startup, and receives a list of
|
||||||
|
@ -78,28 +78,34 @@ http://allmydata.org/trac/tahoe/ticket/22 ).
|
||||||
FILE ENCODING
|
FILE ENCODING
|
||||||
|
|
||||||
When a file is to be added to the grid, it is first encrypted using a key
|
When a file is to be added to the grid, it is first encrypted using a key
|
||||||
that is derived from the hash of the file itself. The encrypted file is then
|
that is derived from the hash of the file itself (if convergence is desired)
|
||||||
broken up into segments so it can be processed in small pieces (to minimize
|
or randomly generated (if not). The encrypted file is then broken up into
|
||||||
the memory footprint of both encode and decode operations, and to increase
|
segments so it can be processed in small pieces (to minimize the memory
|
||||||
the so-called "alacrity": how quickly can the download operation provide
|
footprint of both encode and decode operations, and to increase the so-called
|
||||||
validated data to the user, basically the lag between hitting "play" and the
|
"alacrity": how quickly can the download operation provide validated data to
|
||||||
movie actually starting). Each segment is erasure coded, which creates
|
the user, basically the lag between hitting "play" and the movie actually
|
||||||
encoded blocks that are larger than the input segment, such that only a
|
starting). Each segment is erasure coded, which creates encoded blocks that
|
||||||
subset of the output blocks are required to reconstruct the segment. These
|
are larger than the input segment, such that only a subset of the output
|
||||||
blocks are then combined into "shares", such that a subset of the shares can
|
blocks are required to reconstruct the segment. These blocks are then
|
||||||
be used to reconstruct the whole file. The shares are then deposited in
|
combined into "shares", such that a subset of the shares can be used to
|
||||||
StorageServers in other peers.
|
reconstruct the whole file. The shares are then deposited in StorageServers
|
||||||
|
in other peers.
|
||||||
|
|
||||||
A tagged hash of the original file is called the "fileid", while a
|
A tagged hash of the encryption key is used to form the "storage index",
|
||||||
differently-tagged hash of the original file provides the encryption key. A
|
which is used for both peer selection (described below) and to index shares
|
||||||
tagged hash of the *encrypted* file is called the "verifierid", and is used
|
within the StorageServers on the selected peers.
|
||||||
for both peer selection (described below) and to index shares within the
|
|
||||||
StorageServers on the selected peers.
|
A variety of hashes are computed while the shares are being produced, to
|
||||||
|
validate the plaintext, the crypttext, and the shares themselves. Merkle hash
|
||||||
|
trees are also produced to enable validation of individual segments of
|
||||||
|
plaintext or crypttext without requiring the download/decoding of the whole
|
||||||
|
file. These hashes go into the "URI Extension Block", which will be stored
|
||||||
|
with each share.
|
||||||
|
|
||||||
|
The URI contains the encryption key, the hash of the URI Extension Block, and
|
||||||
|
any encoding parameters necessary to perform the eventual decoding process.
|
||||||
|
For convenience, it also contains the size of the file being stored.
|
||||||
|
|
||||||
The URI contains the fileid, the verifierid, the encryption key, any encoding
|
|
||||||
parameters necessary to perform the eventual decoding process, and some
|
|
||||||
additional hashes that allow the download process to validate the data it
|
|
||||||
receives.
|
|
||||||
|
|
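The key and storage-index derivation described in the new text can be sketched as below. The hash function (SHA-256), the tag strings, and the 16-byte truncation are assumptions made for illustration; the diff states only that tagged hashes are used, that netstrings keep tags from being confused with data, and that the key is either derived from the file or randomly generated.

```python
import hashlib
import os


def netstring(data: bytes) -> bytes:
    """Length-prefixed encoding, so a tag cannot be confused with the data."""
    return b"%d:%s," % (len(data), data)


def tagged_hash(tag: bytes, data: bytes) -> bytes:
    # SHA-256 is an assumption; this section does not name the hash function.
    return hashlib.sha256(netstring(tag) + data).digest()


# Hypothetical tag values, chosen only for this sketch.
KEY_TAG = b"example_convergent_key_v1"
STORAGE_INDEX_TAG = b"example_storage_index_v1"


def make_key(plaintext: bytes, convergent: bool) -> bytes:
    """Derive the key from the file (convergence) or generate it randomly."""
    if convergent:
        return tagged_hash(KEY_TAG, plaintext)[:16]  # 16 bytes is assumed
    return os.urandom(16)


def storage_index(key: bytes) -> bytes:
    """Tagged hash of the encryption key, used to select peers and index shares."""
    return tagged_hash(STORAGE_INDEX_TAG, key)[:16]
```

With `convergent=True`, two uploads of the same bytes produce the same key and therefore the same storage index; with `convergent=False`, each upload gets a fresh key and a different storage index.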
||||||
On the download side, the node that wishes to turn a URI into a sequence of
|
On the download side, the node that wishes to turn a URI into a sequence of
|
||||||
bytes will obtain the necessary shares from remote nodes, break them into
|
bytes will obtain the necessary shares from remote nodes, break them into
|
||||||
|
@ -113,8 +119,12 @@ Netstrings are used where necessary to insure these tags cannot be confused
|
||||||
with the data to be hashed. All encryption uses AES in CTR mode. The erasure
|
with the data to be hashed. All encryption uses AES in CTR mode. The erasure
|
||||||
coding is performed with zfec (a python wrapper around Rizzo's FEC library).
|
coding is performed with zfec (a python wrapper around Rizzo's FEC library).
|
||||||
A Merkle Hash Tree is used to validate the encoded blocks before they are fed
|
A Merkle Hash Tree is used to validate the encoded blocks before they are fed
|
||||||
into the decode process, and a second tree is used to validate the shares
|
into the decode process, and a transverse tree is used to validate the shares
|
||||||
before they are retrieved. The hash tree root is put into the URI.
|
before they are retrieved. A third merkle tree is constructed over the
|
||||||
|
plaintext segments, and a fourth is constructed over the crypttext segments.
|
||||||
|
All necessary hash chains are stored with the shares, and the hash tree roots
|
||||||
|
are put in the URI extension block. The final hash of the extension block
|
||||||
|
goes into the URI itself.
|
||||||
|
|
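The way a Merkle tree lets a downloader validate one segment against a stored root, given only a short hash chain, can be sketched as follows. This is a generic construction, not Tahoe's actual tree layout: the `leaf:`/`node:` prefixes, the use of SHA-256, and the duplication of the last node on odd-sized levels are all assumptions of the sketch.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves):
    """Return all tree levels, from leaf hashes up to the single root."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [_h(b"leaf:" + leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            # Assumed padding rule: duplicate the last node on odd levels.
            level = level + [level[-1]]
            levels[-1] = level
        level = [_h(b"node:" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def hash_chain(levels, index):
    """Sibling hashes needed to recompute the root from leaf `index`."""
    chain = []
    for level in levels[:-1]:
        chain.append(level[index ^ 1])
        index //= 2
    return chain


def verify_segment(segment, index, chain, root):
    """Recompute the path to the root and compare with the trusted root."""
    node = _h(b"leaf:" + segment)
    for sibling in chain:
        if index % 2 == 0:
            node = _h(b"node:" + node + sibling)
        else:
            node = _h(b"node:" + sibling + node)
        index //= 2
    return node == root
```

In the scheme the text describes, the root would live in the URI extension block and the chain would be stored alongside the share, so a single segment can be checked without fetching the rest of the file.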
||||||
Note that the number of shares created is fixed at the time the file is
|
Note that the number of shares created is fixed at the time the file is
|
||||||
uploaded: it is not possible to create additional shares later. The use of a
|
uploaded: it is not possible to create additional shares later. The use of a
|
||||||
|
@ -126,13 +136,16 @@ calculated correctly.
|
||||||
URIs
|
URIs
|
||||||
|
|
||||||
Each URI represents a specific set of bytes. Think of it like a hash
|
Each URI represents a specific set of bytes. Think of it like a hash
|
||||||
function: you feed in a bunch of bytes, and you get out a URI. The URI is
|
function: you feed in a bunch of bytes, and you get out a URI. If convergence
|
||||||
deterministically derived from the input data: changing even one bit of the
|
is enabled, the URI is deterministically derived from the input data:
|
||||||
input data will result in a drastically different URI. The URI provides both
|
changing even one bit of the input data will result in a drastically
|
||||||
"identification" and "location": you can use it to locate/retrieve a set of
|
different URI. If convergence is not enabled, the encoding process will
|
||||||
bytes that are probably the same as the original file, and then you can use
|
generate a different URI each time the file is uploaded.
|
||||||
it to validate that these potential bytes are indeed the ones that you were
|
|
||||||
looking for.
|
The URI provides both "location" and "identification": you can use it to
|
||||||
|
locate/retrieve a set of bytes that are possibly the same as the original
|
||||||
|
file, and then you can use it to validate ("identify") that these potential
|
||||||
|
bytes are indeed the ones that you were looking for.
|
||||||
|
|
||||||
URIs refer to an immutable set of bytes. If you modify a file and upload the
|
URIs refer to an immutable set of bytes. If you modify a file and upload the
|
||||||
new version to the grid, you will get a different URI. URIs do not represent
|
new version to the grid, you will get a different URI. URIs do not represent
|
||||||
|
|