Performance problems testing on a 45-disk, dual-core 3.3 GHz, 4 GB memory box #962
When uploading a 200 MB file encoded into 10 shares with 3 needed to recover, the system load spiked to 100% for ~20 minutes. During this time the system was also swapping.
Performance was extremely slow.
So did you have 45 different Tahoe-LAFS storage server processes? Could you check how much RAM each one is using with top?
(Or, for bonus points, with smem instead: http://www.selenic.com/smem/ )
It would be helpful to know how you have this configured.
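For example, something like this would list per-process resident memory, largest first (assuming the nodes show up as python processes; adjust the -C pattern if they appear as twistd instead):
$ ps -C python -o pid,rss,args --sort=-rss | head
$ ps -C python -o rss= | awk '{sum += $1} END {print sum " KiB total"}'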
System Configuration (http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage)
1 Intel(R) Core(TM)2 Duo CPU E8600 @ 3.33 GHz
1 80 GB boot drive mounted as /
45 Seagate 1.5 TB drives mounted as /tahoe/node1-45
4 GB memory
eth0 is at 100 Mb/s full duplex
Default tahoe configuration.
SMEM :
root@firetruck1:~/app/smem-0.9# ./smem |egrep "PID|python"
================
Note:
shares.needed = 32
shares.happy = 35
shares.total = 40
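(For anyone following along: these keys live in the [client] section of the gateway's tahoe.cfg. A quick way to confirm what every node on this box is actually configured with, assuming the /tahoe/node1-45 layout above:)
$ grep -H 'shares\.' /tahoe/node*/tahoe.cfg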
Performance has greatly increased. The 3.3x expansion ("blowup") from the old 3-of-10 encoding may have been the largest contributing factor.
I'm still concerned about memory utilization per node as the number of nodes in the grid increases, but that may be a separate ticket.
Performance is still much less than I would like to see.
bash-3.2$ time curl -T 524stottlemyer.zip http://10.20.0.151:3470/uri/URI:DIR2:whzhc6kxh5mfem4basdoioewji:4qpwpea2s46tg3lnd5znwsyqqenwjoddjxpsvr733sk7n4on6uua/
URI:CHK:fsmf5rn6me7mrueq2ri4zvboay:sd5lnugn2gnisrvodxtzp7zvkx4f2skyk6ri5ncs65rmq6qkdoqa:32:40:106554122
real 1m17.962s
bash-3.2$ ls -larth 524stottlemyer.zip
-rw-r--r-- 1 justin staff 102M Nov 20 15:44 524stottlemyer.zip
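Back-of-the-envelope on those numbers (rough figures derived from the transfer above, not separate measurements):
expansion factor: shares.total / shares.needed = 40 / 32 = 1.25x (versus 10/3 ≈ 3.3x with the old 3-of-10 encoding)
bytes pushed to the storage servers: ≈ 102 MB * 1.25 ≈ 128 MB
client-observed throughput: ≈ 102 MB / 78 s ≈ 1.3 MB/s, roughly 10 Mbit/s on the 100 Mbit link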
I'll have to collect more stats during an upload to present what is going on. A snapshot indicated heavy CPU, nearly no swap in use at this time, and the disks were much better able to handle the load, with great distribution.
Here is a brief network snapshot, though the time spent on internal operations was much greater.
eth0: 0.003 In, 0.000 Out Mbps | 4 In, 0 Out pps
eth0: 40.860 In, 0.963 Out Mbps | 3709 In, 1912 Out pps
eth0: 92.249 In, 2.226 Out Mbps | 8409 In, 4354 Out pps
eth0: 37.314 In, 0.893 Out Mbps | 3430 In, 1761 Out pps
eth0: 0.006 In, 0.015 Out Mbps | 8 In, 3 Out pps
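(If it helps to reproduce these snapshots, the raw per-interface counters can be sampled straight from /proc/net/dev; this just prints the cumulative RX/TX byte counters once a second, so successive samples have to be diffed to get a rate:)
$ while true; do awk '/eth0:/ {print $2, $10}' /proc/net/dev; sleep 1; done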
Is there just one upload happening at a time? Or are you doing many in parallel?
One of the "recent uploads and downloads" stats that you pasted in IRC showed a mutable upload, rather than the normal immutable upload (which the curl command above would use). We've documented once or twice how memory usage for mutable files is pretty bad (N/k * filesize, as opposed to the well-limited N/k * 128 KiB for immutable files). source:docs/performance.txt mentions about half the story; I guess we should probably improve that section.
If you had a lot of simultaneous large-memory uploads going on, I can easily see how the overall memory usage would spike, and you'd get a lot of swapping and stalls. It sounds like there were a couple different kinds of tests taking place, though, so I don't want to misdiagnose problem A if the symptoms actually occurred during test B.
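To put rough numbers on that (estimates from the formulas above, not measurements, and assuming the original 200 MB upload was a mutable file under the initial 3-of-10 encoding):
mutable upload: peak gateway RAM ≈ N/k * filesize = 10/3 * 200 MB ≈ 670 MB (possibly ~2x that with copying)
immutable upload: encoded data in flight ≈ N/k * 128 KiB = 10/3 * 128 KiB ≈ 430 KiB per segment
A handful of concurrent mutable uploads of that size would easily push a 4 GB box that is also hosting 45 storage nodes into swap.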
Updating with attached straces
http://testgrid.allmydata.org:3567/uri/URI%3ADIR2RO%3Axqm7hlme3oncrzpoqr5thvsh7a%3A64dg3rttsh6nqujoyr62nt7kssmmqaf5rv5nozyuvqocpi4mzzba/
Attachment node1.pid.9394 (5020912 bytes) added
Gateway strace of a tahoe put of a 106 MB file.
Attachment node2.pid.3225 (173852 bytes) added
server node strace file.
Are node1.pid.9394 and node2.pid.3225 from the same upload of a 106 MB file?
I should have specified: this latest operation was a tahoe get of that 106 MB file.
Yes, these two straces are from the same get, just watched from two places, the gateway and a storage server, for the same operation.
Sorry for the confusion caused by the bad note on the attachment. I also have the strace from the get, where you can see the 24 s wait with no correlating hangs in the other processes.
The strace of the get is in the testgrid link above; it was too large to attach to Trac.
Replying to stott:
What is this URL? It appears to be a broken link.
Replying to stott:
Sorry, I'm still confused. So node1.pid.9394 is the strace of the gateway during a 106 MB GET, and node2.pid.3225 is the strace of a random server during that same GET? And the encoding parameters were: shares.needed = 32, shares.happy = 35, shares.total = 40?
And, when you did that GET, how well did it perform? Do you know about the "Recent Uploads and Downloads" page? From there you can find timing information about the transfer (as observed by the Tahoe-LAFS gateway itself) that might be useful. Could you attach the timing results from that particular GET (if it is still there on the Recent Uploads and Downloads) or from a new similar GET?
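(For reference, that page is served from the gateway's web port at the standard /status/ path, so on this grid it should be reachable with something like the following; I haven't verified it against this particular box:)
$ curl -s http://10.20.0.151:3470/status/ | less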
Here is the relevant recent-GET status for that download: http://codepad.org/cxuaFwz2
I'll attempt to attach the HTML.
Attachment codepad-tahoeget.html (7436 bytes) added
recent stats of the get
Okay, thank you for the timings in codepad-tahoeget.html. I see that it was a mutable file of size 106 MB, with K=32, N=40 encoding. Therefore I would expect the gateway to (temporarily) use up to 2 times 106 MB of RAM when downloading the file.
(Aside: hmm, [docs/performance.txt]source:docs/performance.txt@4240#L51 says that it will take N/K*A. I don't think that's right; I think it will take just A (possibly times a factor of 2 or so, because it does a certain amount of copying the data in RAM). I'll change the doc to say just A there.)
Okay, and the timings reported by the gateway node for each server will just have a single data point, because shares of a mutable file are transferred in one big block of size A/K. (As contrasted with shares of an immutable file, which are transferred in a succession of blocks, each of size 128 KiB/K.) Each of these blocks should be about 106 MB / 32 ≅ 3.33 MB.
The timing results are that one of the servers, cvpcsjio, sent its 3.33 MB block in 442 ms, and every other server sent its 3.33 MB block in 22 sec.
(Aside: I would really like to know if that was exactly 22,000 ms for every one of them or what. The display of these measurements should include a fixed and appropriate amount of precision instead of rounding off to seconds.)
What is special about cvpcsjio?
Okay, so does this make sense so far? The next question in my mind is how much of that 22 seconds was spent doing what. That 22 seconds encompasses the following sequence of actions:
Looking at the strace of one of the servers (which one was it? Was it cvpcsjio?) might help explain how much of the 22 seconds was spent in which of those steps. Also, turning on verbose logging (per [docs/logging.txt]source:docs/logging.txt) on one of the servers would give information, but it would also make that server use a lot more CPU and possibly respond more slowly.
Hm... It might be worth adding to source:docs/performance.txt a note about the fact that when downloading a mutable file you first have to wait for the entire file data to transfer from the storage servers to the gateway, and then the data streams from the gateway to your client. By "streams" here, I mean that your client can receive the first part of the data before the gateway has sent the last part of the data. Observe that the data of a mutable file does not "stream" from the storage servers all the way to your client: your client won't see the first part of the mutable file until after the gateway has received the last part of the mutable file from the storage servers.
By contrast, if you download an immutable file it will "stream" all the way from the storage servers to your client—your client can see the first part of the immutable file as soon as the gateway receives the first block of each share from the storage servers, long before the gateway has seen the last part of the immutable file.
This is potentially another reason to prefer immutable files over mutable files for performance when you're using Tahoe-LAFS v1.6. (How big of a performance issue this is depends on the details of your network and servers and the size of the file.)
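(A rough illustration of what that means for time-to-first-byte on this box, treating the 100 Mbit link as the only bottleneck; back-of-the-envelope numbers, not measurements:)
mutable 106 MB GET: the client's first byte waits for the gateway to receive all 106 MB, i.e. at least 106 MB / 100 Mbit/s ≈ 8.5 s, plus per-server and decode time
immutable 106 MB GET: the client's first byte can arrive once the gateway has fetched one segment's worth of blocks (about 128 KiB in total), a small fraction of a second on this link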
Replying to zooko:
It would be interesting to see the timing results of downloading a 106 MB immutable file on stott's petabyte-on-a-budget box. It would also be interesting to know the total time from the perspective of the client (which is different from the perspective of the Tahoe-LAFS gateway). For example, if you are downloading the file with "tahoe get" then use the bash "time" builtin to tell you how long from when you pressed ENTER on the "tahoe get" command to when that command finished. Or if you are downloading the file with wget or curl then capture the output that those tools can emit to show elapsed time and observe throughput of their download.
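(A sketch of what that client-side measurement could look like; the cap, port, and file names below are placeholders, not the real ones from this grid:)
$ time tahoe get tahoe:bigfile.bin /tmp/bigfile.bin
$ curl -o /tmp/bigfile.bin -w "elapsed %{time_total}s, average %{speed_download} bytes/s\n" http://127.0.0.1:3456/uri/URI%3ACHK%3A.../bigfile.bin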
I was collecting those stats as well; the curl and the tahoe get were both taking the same amount of time as indicated by the gateway.
If you read the strace output, you can see that each server actually appears to send its data in ~450-600 ms; however, the 22 s measurement appears to be the total time across all servers, not the actual time per server. There is nothing special about cvpcsjio except that I believe it actually gave an accurate measure.
I'll test again tomorrow with an immutable file and provide more details.
Oh, I see! Yes, this is suggestive of either a bug in mutable file download performance or a bug in the timings report for mutable file downloads, or both.
I'll examine the source code for the timing reports for mutable file downloads.
I have upgraded to 1.6.0. The upload/download speeds are the same.
Recent stats no longer show uploads.
Recent stats now show downloads, as the newest attachment shows.
Attachment 1.6DL.html (71866 bytes) added
Okay, what's the status of this issue? I think it is time to open new tickets for more specific issues now that we can tell them apart.
The first issue that began this ticket was that uploading a large mutable file caused high CPU load on the storage servers. I'll open a new ticket for that issue.
Okay, I've created #983 (high CPU load on storage servers when uploading large mutable file) to track that issue.
Stott: I'm not sure what the next step is to make progress on this ticket or to close it. How about you rerun your performance tests using the shiny new Tahoe-LAFS v1.8.0c3? :-)
It should be no worse than the previous version for mutable files or for uploading immutable files, and better for downloading immutable files. It might be a little better if your servers are all similarly performant and your shares are well distributed, or a lot better if there is more heterogeneity in your servers' performance, or if some servers are overloaded compared to others.
Stott: would you please run your performance tests using Tahoe-LAFS v1.8.2? If you're not going to do that, let's just close this ticket because it is old and because I don't know what to do to make progress towards closing it. If you have some other request or can explain more specifically one particular issue that you had then we could use that as a starting point for making progress.
Reading back over this ticket, I don't see a lot of evidence about a performance issue specifically in immutable files. Most of the problems turned out to be with mutables, including ticket #983, which has been fixed. We don't know for sure that there aren't still performance issues in the current version when used on a 45-drive box, but until we get more evidence we'll leave this issue closed.