remove redundant read from web GET of directory #2822

Open
opened 2016-09-03 21:40:31 +00:00 by warner · 0 comments

While checking out the "recent and active operations" page, I noticed that doing a simple tahoe cp into a pre-existing top-level directory caused a total of 4 mapupdate operations, 4 retrieves, and 1 publish (where I was expecting a single retrieve and a single publish).

It looks like we're doing some redundant operations. The tahoe cp command does two WAPI operations: GET /uri/ALIAS/CHILD?t=json (to see what we're replacing), then a PUT /uri/ALIAS/CHILD (to do the actual assignment). The WAPI GET causes two dirnode operations:

  • a get(childname), called from allmydata.web.directory.DirectoryNodeHandler.childFactory() as it walks through the ALIAS dirnode to find CHILD
  • a get_metadata_for(childname), called from web.filenode.FileNodeHandler.render_GET (in the t=json clause when self.parentnode and self.name are present). We have to retrieve the metadata from the parent directory, because in Tahoe a child's metadata is stored in the parent dirnode's entry for that child, not in the child itself.

I think we should remove the get_metadata_for call, by changing DirectoryNodeHandler.childFactory to use get_child_and_metadata, and passing the metadata into the new FileNodeHandler.
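To illustrate the difference in read counts, here is a minimal toy model, not the actual Tahoe code. FakeDirNode, get_json_current, and get_json_proposed are hypothetical names made up for this sketch; only the method names get, get_metadata_for, and get_child_and_metadata come from the description above. The sketch is synchronous and ignores the Deferred handling that the real handlers would need.

```python
# Toy model only: FakeDirNode is a hypothetical stand-in for a dirnode.
# It counts how many times the directory contents are read.
class FakeDirNode:
    def __init__(self, children):
        # children maps childname -> (child, metadata)
        self._children = dict(children)
        self.reads = 0

    def _read(self):
        # Each call models one retrieve of the directory's contents.
        self.reads += 1
        return self._children

    def get(self, name):
        return self._read()[name][0]

    def get_metadata_for(self, name):
        return self._read()[name][1]

    def get_child_and_metadata(self, name):
        return self._read()[name]


def get_json_current(parent, name):
    # Models the current path: two separate reads of the parent.
    child = parent.get(name)                  # while walking ALIAS to find CHILD
    metadata = parent.get_metadata_for(name)  # in the t=json clause
    return {"child": child, "metadata": metadata}


def get_json_proposed(parent, name):
    # Models the proposed path: fetch both in one read and hand the
    # metadata to whatever renders the child.
    child, metadata = parent.get_child_and_metadata(name)
    return {"child": child, "metadata": metadata}
```

In this model the current path reads the parent twice and the proposed path reads it once, while both return the same child and metadata.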

It might be possible to remove the first read that PUT does, but I'm not yet sure how. In general, I wonder if we should have some sort of write-through cache that allows us to remember the contents of dirnodes for a little while, until we know they've changed (because we wrote to the dirnode ourselves).
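One possible shape for such a cache, sketched under loud assumptions: FakeBackend and WriteThroughDirCache are hypothetical names, the backend's read/write interface is invented for illustration, and the time-to-live value is arbitrary. Nothing here reflects an existing Tahoe API.

```python
import time


class FakeBackend:
    """Hypothetical stand-in for the grid; counts reads and writes."""

    def __init__(self, dirs):
        self._dirs = {k: dict(v) for k, v in dirs.items()}
        self.reads = 0
        self.writes = 0

    def read(self, dir_id):
        self.reads += 1
        return dict(self._dirs[dir_id])

    def write(self, dir_id, contents):
        self.writes += 1
        self._dirs[dir_id] = dict(contents)


class WriteThroughDirCache:
    """Remembers directory contents briefly; our own writes refresh the entry."""

    def __init__(self, backend, ttl_seconds=5.0, clock=time.monotonic):
        self._backend = backend
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # dir_id -> (expiry_time, contents)

    def read(self, dir_id):
        entry = self._entries.get(dir_id)
        if entry is not None and entry[0] > self._clock():
            return dict(entry[1])  # fresh cache hit: no backend read
        contents = self._backend.read(dir_id)
        self._entries[dir_id] = (self._clock() + self._ttl, contents)
        return dict(contents)

    def set_child(self, dir_id, name, value):
        contents = self.read(dir_id)  # may be served from the cache
        contents[name] = value
        self._backend.write(dir_id, contents)  # write through to the backend
        # We know the new contents because we just wrote them.
        self._entries[dir_id] = (self._clock() + self._ttl, contents)
```

In this sketch, a read followed by a set_child and another read within the time-to-live costs one backend read and one write. The expiry only bounds how long a change made by some other client could go unnoticed; whether that trade-off is acceptable for dirnodes is an open question.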

warner added the code-frontend-web, normal, defect, 1.11.0 labels 2016-09-03 21:40:31 +00:00
warner added this to the undecided milestone 2016-09-03 21:40:31 +00:00
Reference: tahoe-lafs/trac-2024-07-25#2822