benchmark Tahoe-LAFS compared to nosql dbs #932
Reference: tahoe-lafs/trac-2024-07-25#932
I'm curious how Tahoe-LAFS performs compared to NoSQL databases on the NoSQL-style workloads that their users care about. Aaron Cordova benchmarked Tahoe-LAFS against HDFS as the storage backend for Hadoop and reported in his HadoopWorld presentation that the two performed about the same for the map-reduce computation (which is a read-intensive workload): http://www.slideshare.net/cloudera/hw09-map-reduce-over-tahoe-a-least-authority-encrypted-distributed-filesystem
Recently a scientist from Yahoo posted about his benchmarks of various nosql systems:
http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201001.mbox/%3cC2D6929236FAC846B7A4FE1EC39910C64F27B52F25@SP1-EX07VS01.ds.corp.yahoo.com%3e
He says that his benchmarking code will be open-sourced soon pending approval from Yahoo's legal department. Maybe we could contribute patches that make Tahoe-LAFS one of the systems that his benchmark system can measure.
N.B. not to get anyone's hopes up, I would expect Tahoe-LAFS to perform very badly on those workloads! They typically want to assign values to user-specified keys, which we don't have a native implementation of and which we would have to simulate somehow, such as by letting the user-chosen keys be the childnames in a mutable directory. So I would expect Tahoe-LAFS to be pretty much off the charts for bad performance on those workloads. But, I might be pleasantly surprised. And also: "What gets measured gets improved!" :-)
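To make the "keys as childnames in a mutable directory" idea concrete, here is a minimal sketch of how a user-chosen key could be mapped to a gateway URL. It assumes the web API path form `/uri/$DIRCAP/CHILDNAME` and a local gateway at `http://127.0.0.1:3456`; the class name `TahoeKeyUrls`, its methods, and the directory cap in the usage note are hypothetical placeholders, not anything from the Tahoe-LAFS code base. It needs Java 10 or later for the `URLEncoder.encode(String, Charset)` overload.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: maps a key-value "key" onto a child name under one
// mutable directory, addressed through the gateway's /uri/ namespace.
final class TahoeKeyUrls {
    private TahoeKeyUrls() {}

    // Percent-encode a string so it can be used as a single URL path segment.
    static String encodeSegment(String s) {
        // URLEncoder does form encoding and emits '+' for spaces;
        // in a path segment a space should be %20 instead.
        return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
    }

    // Build the URL under which the value for `key` would be stored.
    // `gateway` and `dirCap` are supplied by the caller (assumed values).
    static String childUrl(String gateway, String dirCap, String key) {
        return gateway + "/uri/" + encodeSegment(dirCap) + "/" + encodeSegment(key);
    }
}
```

With these assumptions, `TahoeKeyUrls.childUrl("http://127.0.0.1:3456", "URI:DIR2:aaaa:bbbb", "user 42")` yields `http://127.0.0.1:3456/uri/URI%3ADIR2%3Aaaaa%3Abbbb/user%2042`. A write would be an HTTP PUT to that URL and a read an HTTP GET, so every key-value operation becomes at least one gateway round trip plus a mutable-directory update, which is part of why I'd expect this to be slow.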
That benchmark that Brian Frank Cooper said would be open sourced has subsequently been open sourced:
http://github.com/brianfrankcooper/YCSB/wiki
I'm going to attempt this benchmarking against mongo.
YCSB Interface layer skeleton @ https://github.com/grubino/Tahoe-YCSB--Interface-Layer/blob/master/TahoeLAFSClient.java
Ping me if you want to help out, and I'll give out push privileges.
Reorganized and updated the Tahoe Java driver:
https://github.com/grubino/Tahoe-YCSB--Interface-Layer/blob/master/org/lafs/TahoeLAFSConnection.java
Currently blocked on figuring out why the InputStream returned by HttpResponse.getEntity().getContent() is empty. The request seems to be processed correctly, but there's no content, which can't be right. I'm probably doing something wrong with the Apache HTTP interface. I'll ask around.
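One way to check whether a response body is really empty is to read the stream to end-of-stream and count the bytes, rather than relying on `InputStream.available()`, which only estimates what can be read without blocking and may return 0 while data is still on its way. The sketch below works on any `InputStream`; the class name `StreamDrain` is hypothetical, and this is not a diagnosis of the problem above, just a way to rule one cause in or out.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: reads a stream until read() reports end-of-stream.
final class StreamDrain {
    private StreamDrain() {}

    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        // read() returns -1 only at end-of-stream; a short read is not EOF.
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}
```

If `readAll` returns zero bytes for a request that the gateway answered with a body, the stream was probably already consumed or closed somewhere earlier in the client code.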
What does Apache have to do with it? Isn't the HTTP server the Tahoe-LAFS gateway?
Hi zooko, org.apache.http.[...] is the client-side HTTP library that I'm using. If you followed the link that I provided, you should have seen some 'import org.apache.[...]' statements at the top of the source files. That's what I was referring to. It turns out that in the Java community, the Apache HTTP classes are preferred to the native Java ones. Go figure! Anywho, I believe I've ironed out most of the problems I was having there. I'm currently talking to one of the maintainers of the MongoDB YCSB layer to find out how to get this merged into the YCSB repo, or at least reviewed by someone who knows Java and YCSB. That reminds me: PLEASE_REVIEW_THIS_CODE (when you get a chance):
https://github.com/grubino/Tahoe-YCSB--Interface-Layer
I'm sure that I've run afoul of Java best practices and general development best practices, and I invite anyone reading this to pleez point out my mistakes to me. I've looked over the code and have found a few things that I want to fix, but I'm sure I'm missing some stuff. Also, and not least of all, having reviewers makes me feel loved.
I forgot to mention that I have been able to run some of the workloads (most notably workloada), and the performance for write operations is roughly four orders of magnitude worse for Tahoe-LAFS than for MongoDB. Mongo writes about 11,000 entries/sec (on my thinkpad T50) and my Tahoe-LAFS test grid (1:1:1) writes about 0.5 entries/sec (that's one entry every two seconds) or so. I'm not sure if that number would go up or down if I increased N/H/K. I'll post the real numbers when I have them handy, but it hasn't been a priority because there are other workloads that don't seem to be running properly. I want to make sure that the code is relatively bug-free before I actually post the numbers.
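For a sense of scale, the gap between the two rough figures quoted above (about 11,000 writes/sec and about 0.5 writes/sec) can be worked out directly. This is only arithmetic on those preliminary numbers, not a new measurement; the class name `ThroughputGap` is hypothetical.

```java
// Hypothetical helper: compares two throughput figures given in ops/sec.
final class ThroughputGap {
    private ThroughputGap() {}

    // How many times faster the first system is than the second.
    static double ratio(double fastOpsPerSec, double slowOpsPerSec) {
        return fastOpsPerSec / slowOpsPerSec;
    }

    // The same gap expressed in base-10 orders of magnitude.
    static double ordersOfMagnitude(double fastOpsPerSec, double slowOpsPerSec) {
        return Math.log10(ratio(fastOpsPerSec, slowOpsPerSec));
    }

    // Average time per operation in seconds, e.g. 0.5 ops/sec is 2 s per op.
    static double secondsPerOp(double opsPerSec) {
        return 1.0 / opsPerSec;
    }
}
```

With the figures above, `ratio(11000, 0.5)` is 22,000 and `ordersOfMagnitude(11000, 0.5)` is about 4.3, so the gap is a bit more than four orders of magnitude.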
Very cool! Real numbers! I look forward to having the time to investigate this. :-)
Need a public place to put TahoeLAFSConnection.jar.
Currently, I just have the source directly in the YCSB tree (err my branch of it):
https://github.com/grubino/YCSB/tree/master/db/tahoe/src/org/lafs
But this isn't really appropriate since the TahoeLAFSConnection class is not really part of YCSB, and I don't think this is going to pass muster with the YCSB maintainers. So once I jar this up, I'll need to put it somewhere that I can link from in the Tahoe YCSB client docs. Preferably somewhere on tahoe-lafs.org. Also, someone from the project may want to review the code at some point and make sure I didn't do anything too horrendous. It might actually be appropriate to put the source for this in the darcs repo too at some point. That would have the nice side-effect of increasing the likelihood that someone from the project would look at it.
Let's create a project below https://github.com/tahoe-lafs for this.