Improve google search results for phrases like "tahoe file storage" #1719
Tahoe-LAFS could benefit from some SEO.
If you search for "tahoe lafs", the first result is tahoe-lafs.org, straight to where you'd expect. However, if you search for "tahoe secure file storage", "tahoe secure", or other reasonable phrases (omitting "lafs"), the results are much less useful. The PyCon talk notes tend to show up as the first result; at least they're filled with allmydata.org links that correctly redirect to https://tahoe-lafs.org.
Beyond that, perhaps by helping web crawlers access the site, we could benefit from external search engines when searching for tickets, code, etc. (See #1691 for Trac search delays.)
I was wrong about robots.txt. https://tahoe-lafs.org/robots.txt currently says:
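The exact file contents didn't survive in this copy. As a hypothetical sketch (the specific Disallow paths are assumptions), a set of rules that blocks the expensive Trac views while leaving the wiki crawlable would look something like:

```
# Hypothetical reconstruction; the original contents were not preserved.
User-agent: *
Disallow: /trac/tahoe-lafs/changeset
Disallow: /trac/tahoe-lafs/browser
Crawl-Delay: 60
```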
Which I think ought to allow search engines to index the wiki. I don't know what else is needed to get search engines to return useful results for those sorts of searches.
Some of our content, such as https://tahoe-lafs.org/trac/tahoe-lafs/browser/docs/about.rst, is served directly from the Trac source browser. To make that content indexable, at Tony Arcieri's suggestion, I removed the exclusion of trac from robots.txt. It now looks like this:
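Hypothetically again (the original contents were not preserved here), with the Trac exclusions removed, only the crawl-rate hint would remain:

```
# Hypothetical reconstruction; the original contents were not preserved.
User-agent: *
Crawl-Delay: 60
```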
This might impose too much CPU and disk-IO load on our server. We'll see.
Brian pointed out that this might also clobber the trac.db, which contains cached information from darcs. Specifically, it caches the "annotate" results (a.k.a. "blame") from darcs. I don't know if it caches anything else.
It currently looks like this:
But "annotate"/"blame" has been broken ever since I upgraded the darcs executable from v2.5 to v2.8, so maybe nothing will get cached.
Looking at the HTTP logs, I'm seeing hits from the Googlebot user agent arriving much faster than once every 60 seconds, e.g. 18 hits in a 4-minute period. The "Crawl-Delay" setting wasn't changed, though, so I'm wondering if maybe that's the wrong field name.
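As a quick way to quantify this, here is a minimal sketch that counts Googlebot hits per minute; the log path and Apache-style format are assumptions:

```python
# Minimal sketch: count Googlebot hits per minute in an Apache-style
# access log. The log path and format are assumptions.
import re
from collections import Counter

hits = Counter()
with open("access.log") as log:
    for line in log:
        if "Googlebot" not in line:
            continue
        # Timestamps look like [31/May/2012:14:02:07 +0000];
        # capture up through the minute.
        m = re.search(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2})", line)
        if m:
            hits[m.group(1)] += 1

# Lexicographic order; good enough for eyeballing a short window.
for minute, count in sorted(hits.items()):
    print(minute, count)
```

For what it's worth, if I'm reading Google's documentation correctly, Googlebot ignores the Crawl-delay directive entirely regardless of spelling; its crawl rate has to be adjusted through Webmaster Tools instead, which would explain the faster-than-60-second hits.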
The site feels slower than it did a few months ago, but I don't have any measurements to back that up.
The trac.db file is at 567 MB today (2012-05-31), up from 408 MB three weeks ago.
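To see which tables account for the growth, a rough sketch like the following could help; the database path is an assumption, the table names come from whatever schema trac.db actually has, and row counts are only a proxy for on-disk size:

```python
# Rough sketch: print per-table row counts for trac.db as a proxy for
# which tables are growing. The database path is an assumption.
import sqlite3

con = sqlite3.connect("trac.db")
tables = [row[0] for row in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
for table in tables:
    # Table names come from sqlite_master, so interpolation is safe here.
    (count,) = con.execute('SELECT COUNT(*) FROM "%s"' % table).fetchone()
    print("%s: %d rows" % (table, count))
con.close()
```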