download a subtree as an archive #1029
Reference: tahoe-lafs/trac-2024-07-25#1029
For some use cases it may be useful to retrieve an entire directory tree as an archive. Perhaps the wapi call would look like:
-to retrieve a gzipped tarball.
Issues:
Should the action parameter be t= or some other name such as output=?
How will the browser name this file?
What if the directory structure contains loops?
What if the full directory tree is huge?
My suggested answers:
The action parameter should be t=, because:
GET ...?t= actions retrieve different kinds of information about the referenced object(s), which is the case here;
t= first.
The format parameter name doesn't need to be as long as archive_format; it could just be format or output.
The filename should be the last component of the path to the directory if given, otherwise the short base32 SI of the directory. The filetype should be given by the format parameter. It should be possible to override the filename+type using @@named.
Loops should cause an error. Since the response may already have been started when the loop is detected, this can't be an HTTP error response -- see #822 for possible ways of dealing with that. The gateway will have to remember the SIs of already-seen directories in order to detect loops. (In theory it should be sufficient to remember only mutable directories. We should already be doing that for recursive operations, but I'm not sure we are.)
The directory tree potentially being huge does not present any opportunities for malicious DoS that aren't already present. To avoid these, don't share a gateway with potential DoS-attackers. It does increase the risk of accidental DoS. OTOH, the client can always abort the HTTP request.
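The loop handling suggested above can be sketched roughly as follows. This is only an illustration, not Tahoe-LAFS code: the grid is stood in for by a plain dict mapping a directory's SI to its children, and the names walk_files and DirectoryLoopError are hypothetical.

```python
class DirectoryLoopError(Exception):
    """Raised when a directory SI is reached a second time during the walk."""


def walk_files(dirs, root_si):
    """Yield (path, file_si) for every file below root_si, depth first.

    `dirs` is a hypothetical in-memory stand-in for the grid: it maps a
    directory's storage index (SI) to a {child_name: child_si} dict.
    Any SI that is not a key of `dirs` is treated as a file.
    """
    seen = set()  # SIs of directories already visited

    def visit(si, path):
        if si in seen:
            raise DirectoryLoopError(
                "directory %s reached again at %r" % (si, path or "/"))
        seen.add(si)
        for name, child_si in sorted(dirs[si].items()):
            child_path = path + "/" + name if path else name
            if child_si in dirs:
                yield from visit(child_si, child_path)
            else:
                yield child_path, child_si

    yield from visit(root_si, "")
```

Because this is a generator, some entries may already have been yielded (and streamed to the client) before the error is raised, which is the situation described above where an HTTP error response is no longer possible. Note that remembering every already-seen directory also rejects a directory that is merely linked from two places without forming a cycle; tracking only the ancestors of the current path would be a narrower check.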
Title changed from "Download a dircap as an archive." to "download a subtree as an archive".
#1030 is a CLI interface to this functionality.
Reasons to implement this ticket as a webapi operation rather than directly in the CLI:
On directory loops: Some formats, such as tar, allow symlinks. Would it be possible to translate directory loops into symlinks appropriately?
Replying to nejucomo:
Yes, for those formats.
Python has built-in zipfile and tarfile modules to create .zip and .tar[.gz,.bz2] archives. The tarfile module appears to support writing an archive with symlinks (using a TarInfo object with .type = SYMTYPE and .linkname set).
Another issue is the character encoding of file paths. For .zip files there is a bit in the local file header of each file that indicates the encoding is UTF-8 (see Appendix D of the zip format spec), although only a few recently updated zip extractors will recognize this; others will misinterpret the path as Cp437. For .tar files, the PAX format always stores paths as UTF-8. PAX might not be supported by as many extractors as the GNU tar format, although it should be fairly widely supported now.
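A minimal sketch of both points using only the standard library, assuming the archive is built in memory from already-downloaded file contents. The helper names build_pax_tar and build_zip are made up for this example; they are not part of Tahoe-LAFS.

```python
import io
import tarfile
import zipfile


def build_pax_tar(files, symlinks):
    """Return the bytes of a .tar.gz in PAX format.

    files:    {path: bytes} regular files to store
    symlinks: {path: target} entries written as symbolic links
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz",
                      format=tarfile.PAX_FORMAT) as tar:
        for path, data in files.items():
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
        for path, target in symlinks.items():
            info = tarfile.TarInfo(name=path)
            info.type = tarfile.SYMTYPE   # mark the entry as a symlink
            info.linkname = target        # where the symlink points
            tar.addfile(info)             # symlinks carry no file data
    return buf.getvalue()


def build_zip(files):
    """Return the bytes of a .zip archive holding {path: bytes}.

    zipfile stores non-ASCII names as UTF-8 and sets the UTF-8 flag
    (bit 11, 0x800) in the entry's general purpose bit flags.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for path, data in files.items():
            zf.writestr(path, data)
    return buf.getvalue()
```

For example, a directory that links back to its own parent could be written as the symlink entry {"café/loop": ".."}. How an extractor treats such a link, and whether it honors the zip UTF-8 flag, still depends on the extractor, as noted above.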