WUI server should have a disallow-all robots.txt #823
Currently, if a web crawler gets access to a Tahoe WUI gateway server then it will crawl all reachable links. This is probably undesirable, or at least not a sensible default (even though it is understood that robots.txt is not meant as a security mechanism). WUI servers should have a disallow-all robots.txt.

The robots.txt specification is at http://www.robotstxt.org/orig.html
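For reference, a disallow-all robots.txt per that specification is just two lines, served at the root of the WUI (e.g. http://127.0.0.1:3456/robots.txt with the default WUI port):

    User-agent: *
    Disallow: /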
On closer examination, the Welcome (root) page only links to statistics pages. OTOH, a directory page might be linked from elsewhere on the web, in which case everything reachable from that directory would be crawled. Anyway, it seems easy to fix.
The Welcome page does include the introducer FURL, which some users might want to keep private as per #562.
I think it is kind of cool that I occasionally find files on the Tahoe-LAFS grid in Google search results.
If you like this bug, you might also like #860.
warner in /tahoe-lafs/trac-2024-07-25/issues/5189#comment:29 gives another reason to fix this ticket:
I disagree with "WUI server should have a disallow-all robots.txt". I think if a web crawler gets access to a cap then it should crawl and index all the files and directories reachable from that cap. I suppose you can put a robots.txt file into a directory in Tahoe-LAFS if you want crawlers to ignore that directory.
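As a sketch of that last suggestion, assuming an alias named tahoe: has already been set up with tahoe create-alias (the filename and alias here are purely illustrative):

    # Write a disallow-all robots.txt and upload it into the aliased directory
    printf 'User-agent: *\nDisallow: /\n' > robots.txt
    tahoe put robots.txt tahoe:robots.txt

Whether a crawler would actually honor a robots.txt stored that deep in the WUI's URL space is another question, since the specification cited above only defines robots.txt at a site's root.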