WebAPI: GET /uri/$FILECAP?t=json doesn't return size for mutable files, but the HTML version does #677

Open
opened 2009-04-07 21:34:44 +00:00 by soult · 18 comments
soult commented 2009-04-07 21:34:44 +00:00
Owner

Compare:
HTML version
JSON version

The JSON version is returning "?" as the file size, while the HTML version returns the correct size. This only happens on mutable files.

Compare: [HTML version](http://testgrid.allmydata.org:3567/uri/URI%3ADIR2-RO%3Aas2uresriaeed44mvdstfu46je%3Auhtfyxhbdwbp4zpeda5hydccwt4szrx6dxl27xkmswxo7xmbus4a/mutable-file?t=info) [JSON version](http://testgrid.allmydata.org:3567/uri/URI%3ADIR2-RO%3Aas2uresriaeed44mvdstfu46je%3Auhtfyxhbdwbp4zpeda5hydccwt4szrx6dxl27xkmswxo7xmbus4a/mutable-file?t=json) The JSON version is returning "?" as the file size, while the HTML version returns the correct size. This only happens on mutable files.
tahoe-lafs added the
code-frontend-web
trivial
defect
1.3.0
labels 2009-04-07 21:34:44 +00:00
tahoe-lafs added this to the undecided milestone 2009-04-07 21:34:44 +00:00
swillden commented 2009-06-17 02:27:28 +00:00
Author
Owner

Attachment fix_mutable_size_in_json.patch (239702 bytes) added

Patch that appears to fix the problem

**Attachment** fix_mutable_size_in_json.patch (239702 bytes) added Patch that appears to fix the problem
swillden commented 2009-06-17 02:30:44 +00:00
Author
Owner

This one is causing me some trouble, so I did some investigation.

The immediate culprit is them implementation of MutableFileNode.get_size(), in mutable/filenode.py, line 195. The implementation is supremely simple, and not very useful:

return "?"

I'm assuming this is because finding out the size of a mutable node isn't easy. The implementation of the HTML "More Info" page, makes a deferred call to get_size_of_best_version(), so I presume that's what's needed for the JSON as well.

I've attached a patch that seems to fix the problem, by doing a deferred call to get_size_of_best_version(), but I'm not very familiar with this code so I don't know it's the best way to handle the issue. I'll create a test case, too.

This one is causing me some trouble, so I did some investigation. The immediate culprit is them implementation of [MutableFileNode](wiki/MutableFileNode).get_size(), in mutable/filenode.py, line 195. The implementation is supremely simple, and not very useful: `return "?"` I'm assuming this is because finding out the size of a mutable node isn't easy. The implementation of the HTML "More Info" page, makes a deferred call to get_size_of_best_version(), so I presume that's what's needed for the JSON as well. I've attached a patch that seems to fix the problem, by doing a deferred call to get_size_of_best_version(), but I'm not very familiar with this code so I don't know it's the best way to handle the issue. I'll create a test case, too.

yeah, get_size_of_best_version() is fairly new, and the JSON-rendering code was written before it was available. Also, I seem to remember that requiring a Deferred to get that size was annoying, and I didn't want to slow down a directory listing by doing a mapupdate for every single mutable file therein.

For CHK files, the size is stored in the filecap, so it's trivial to get it. For mutable files, you have to ask around and find out what versions are available. This "mapupdate" operation isn't as expensive as actually retrieving the file's contents, but it's close (a roundtrip or two).

Your approach sounds reasonable. Be sure to think about whether it's t=JSON on a single file that gets this fix, and/or t=JSON on a whole directory (which might not be a good idea, if there are lots of mutable files in it). It'd probably be better to leave the "size" field out of the elements of a directory fetch than to include "?" strings.

yeah, get_size_of_best_version() is fairly new, and the JSON-rendering code was written before it was available. Also, I seem to remember that requiring a Deferred to get that size was annoying, and I didn't want to slow down a directory listing by doing a mapupdate for every single mutable file therein. For CHK files, the size is stored in the filecap, so it's trivial to get it. For mutable files, you have to ask around and find out what versions are available. This "mapupdate" operation isn't as expensive as actually retrieving the file's contents, but it's close (a roundtrip or two). Your approach sounds reasonable. Be sure to think about whether it's t=JSON on a single file that gets this fix, and/or t=JSON on a whole directory (which might not be a good idea, if there are lots of mutable files in it). It'd probably be better to leave the "size" field out of the elements of a directory fetch than to include "?" strings.
davidsarah commented 2009-11-03 03:23:08 +00:00
Author
Owner

Is it worth getting the size(s) only when asked for, for example using t=JSON+size?

Is it worth getting the size(s) only when asked for, for example using `t=JSON+size`?
tahoe-lafs added
minor
and removed
trivial
labels 2009-12-13 00:51:21 +00:00
kevan commented 2010-01-07 19:13:57 +00:00
Author
Owner

This patch conflicts with the current code. Here is the conflict:

        if t == "json":
v v v v v v v
            if self.node.is_mutable():
                d = defer.maybeDeferred(self.node.get_size_of_best_version)
                d.addCallback(lambda sz: FileJSONMetadata(ctx, self.node, sz))
                return d
            else:
                return FileJSONMetadata(ctx, self.node)
*************
            if self.parentnode and self.name:
                d = self.parentnode.get_metadata_for(self.name)
            else:
                d = defer.succeed(None)
            d.addCallback(lambda md: FileJSONMetadata(ctx, self.node, md))
            return d
^ ^ ^ ^ ^ ^ ^

A quick fix for that conflict (with ugly nested function) is:

if self.parentnode and self.name:
    d = self.parentnode.get_metadata_for(self.name)
else:
    d = defer.succeed(None)
if self.node.is_mutable():
    def _get_best_size(md):
        d = self.node.get_size_of_best_version()
        d.addCallback(lambda sz: (sz, md))
        return d
    d.addCallback(_get_best_size)
    d.addCallback(lambda (md, sz):
                              FileJSONMetadata(ctx, self.node, md, sz))
else:
    d.addCallback(lambda md: FileJSONMetadata(ctx, self.node, md))
return d

There is another conflict in the definition of FileJSONMetadata:

v v v v v v v
def FileJSONMetadata(ctx, filenode, best_size = None):
*************
def FileJSONMetadata(ctx, filenode, edge_metadata):
^ ^ ^ ^ ^ ^ ^

The previous conflicting code passes some metadata to this function so that (IIRC) the 'tahoe ls' command woks correctly. Perhaps a fix here would be to take both edge metadata and best size as parameters?

Also, this fix should have a test written.

When these issues are fixed, I'll be happy to re-review the patch.

This patch conflicts with the current code. Here is the conflict: ``` if t == "json": v v v v v v v if self.node.is_mutable(): d = defer.maybeDeferred(self.node.get_size_of_best_version) d.addCallback(lambda sz: FileJSONMetadata(ctx, self.node, sz)) return d else: return FileJSONMetadata(ctx, self.node) ************* if self.parentnode and self.name: d = self.parentnode.get_metadata_for(self.name) else: d = defer.succeed(None) d.addCallback(lambda md: FileJSONMetadata(ctx, self.node, md)) return d ^ ^ ^ ^ ^ ^ ^ ``` A quick fix for that conflict (with ugly nested function) is: ``` if self.parentnode and self.name: d = self.parentnode.get_metadata_for(self.name) else: d = defer.succeed(None) if self.node.is_mutable(): def _get_best_size(md): d = self.node.get_size_of_best_version() d.addCallback(lambda sz: (sz, md)) return d d.addCallback(_get_best_size) d.addCallback(lambda (md, sz): FileJSONMetadata(ctx, self.node, md, sz)) else: d.addCallback(lambda md: FileJSONMetadata(ctx, self.node, md)) return d ``` There is another conflict in the definition of FileJSONMetadata: ``` v v v v v v v def FileJSONMetadata(ctx, filenode, best_size = None): ************* def FileJSONMetadata(ctx, filenode, edge_metadata): ^ ^ ^ ^ ^ ^ ^ ``` The previous conflicting code passes some metadata to this function so that (IIRC) the 'tahoe ls' command woks correctly. Perhaps a fix here would be to take both edge metadata and best size as parameters? Also, this fix should have a test written. When these issues are fixed, I'll be happy to re-review the patch.
davidsarah commented 2010-01-07 21:02:09 +00:00
Author
Owner

Replying to kevan:

A quick fix for that conflict (with ugly nested function) is:

if self.parentnode and self.name:
    d = self.parentnode.get_metadata_for(self.name)
else:
    d = defer.succeed(None)
if self.node.is_mutable():
    def _get_best_size(md):
        d = self.node.get_size_of_best_version()
        d.addCallback(lambda sz: (sz, md))
        return d
    d.addCallback(_get_best_size)
    d.addCallback(lambda (md, sz):
                              FileJSONMetadata(ctx, self.node, md, sz))
else:
    d.addCallback(lambda md: FileJSONMetadata(ctx, self.node, md))
return d

I'm totally flummoxed by this code. (I even had to look up whether d in _get_best_size will shadow the outer d or reference it. According to PEP 227, this changed between versions of Python.) Also using (sz, md) in one place and (md, sz) in another looks wrong.

I suspect that using two separate deferreds with different variable names would be clearer.

Replying to [kevan](/tahoe-lafs/trac-2024-07-25/issues/677#issuecomment-70574): > A quick fix for that conflict (with ugly nested function) is: ``` if self.parentnode and self.name: d = self.parentnode.get_metadata_for(self.name) else: d = defer.succeed(None) if self.node.is_mutable(): def _get_best_size(md): d = self.node.get_size_of_best_version() d.addCallback(lambda sz: (sz, md)) return d d.addCallback(_get_best_size) d.addCallback(lambda (md, sz): FileJSONMetadata(ctx, self.node, md, sz)) else: d.addCallback(lambda md: FileJSONMetadata(ctx, self.node, md)) return d ``` I'm totally flummoxed by this code. (I even had to look up whether d in `_get_best_size` will shadow the outer d or reference it. According to [PEP 227](http://www.python.org/dev/peps/pep-0227/), this changed between versions of Python.) Also using `(sz, md)` in one place and `(md, sz)` in another looks wrong. I suspect that using two separate deferreds with different variable names would be clearer.
kevan commented 2010-01-07 22:39:35 +00:00
Author
Owner

Replying to [davidsarah]comment:7:

I'm totally flummoxed by this code. (I even had to look up whether d in _get_best_size will shadow the outer d or reference it. According to PEP 227, this changed between versions of Python.) Also using (sz, md) in one place and (md, sz) in another looks wrong.

I suspect that using two separate deferreds with different variable names would be clearer.

Indeed, the ordering of md and sz is wrong -- I'd fixed it in my sandbox, but not in that comment. Sorry for the confusion! I also agree on the names of the deferreds.

Replying to [davidsarah]comment:7: > I'm totally flummoxed by this code. (I even had to look up whether d in `_get_best_size` will shadow the outer d or reference it. According to [PEP 227](http://www.python.org/dev/peps/pep-0227/), this changed between versions of Python.) Also using `(sz, md)` in one place and `(md, sz)` in another looks wrong. > > I suspect that using two separate deferreds with different variable names would be clearer. Indeed, the ordering of `md` and `sz` is wrong -- I'd fixed it in my sandbox, but not in that comment. Sorry for the confusion! I also agree on the names of the deferreds.

FWIW, I usually use "d2" when nested callbacks want to use Deferreds too.
Python will shadow the outer name, so it's safe to use "d" everywhere, but
the time spent remembering that is worse than the time spent typing the "2"
:).

Also, there's a new method named get_current_size that is better to use
here. It's defined on both mutable and immutable files, and always returns a
Deferred. So try something like the following instead (this includes a little
bit of cleanup, and keeps the filenode-specific code inside
!FileJSONMetadata.. needs a test, of course):

diff --git a/src/allmydata/web/filenode.py b/src/allmydata/web/filenode.py
index 9fd4402..104fc49 100644
--- a/src/allmydata/web/filenode.py
+++ b/src/allmydata/web/filenode.py
@@ -427,25 +427,29 @@ class FileDownloader(rend.Page):
 
 
 def FileJSONMetadata(ctx, filenode, edge_metadata):
-    if filenode.is_readonly():
-        rw_uri = None
-        ro_uri = filenode.get_uri()
-    else:
-        rw_uri = filenode.get_uri()
-        ro_uri = filenode.get_readonly_uri()
-    data = ("filenode", {})
-    data[1]['size'] = filenode.get_size()
-    if ro_uri:
-        data[1]['ro_uri'] = ro_uri
-    if rw_uri:
-        data[1]['rw_uri'] = rw_uri
-    verifycap = filenode.get_verify_cap()
-    if verifycap:
-        data[1]['verify_uri'] = verifycap.to_string()
-    data[1]['mutable'] = filenode.is_mutable()
-    if edge_metadata is not None:
-        data[1]['metadata'] = edge_metadata
-    return text_plain(simplejson.dumps(data, indent=1) + "\n", ctx)
+    d = filenode.get_current_size()
+    def _got_size(size):
+        data = ("filenode", {})
+        data[1]['size'] = size
+        if filenode.is_readonly():
+            rw_uri = None
+            ro_uri = filenode.get_uri()
+        else:
+            rw_uri = filenode.get_uri()
+            ro_uri = filenode.get_readonly_uri()
+        if ro_uri:
+            data[1]['ro_uri'] = ro_uri
+        if rw_uri:
+            data[1]['rw_uri'] = rw_uri
+        verifycap = filenode.get_verify_cap()
+        if verifycap:
+            data[1]['verify_uri'] = verifycap.to_string()
+        data[1]['mutable'] = filenode.is_mutable()
+        if edge_metadata is not None:
+            data[1]['metadata'] = edge_metadata
+        return text_plain(simplejson.dumps(data, indent=1) + "\n", ctx)
+    d.addCallback(_got_size)
+    return d
 
 def FileURI(ctx, filenode):
     return text_plain(filenode.get_uri(), ctx)
FWIW, I usually use "d2" when nested callbacks want to use Deferreds too. Python will shadow the outer name, so it's safe to use "d" everywhere, but the time spent remembering that is worse than the time spent typing the "2" :). Also, there's a new method named `get_current_size` that is better to use here. It's defined on both mutable and immutable files, and always returns a Deferred. So try something like the following instead (this includes a little bit of cleanup, and keeps the filenode-specific code inside !FileJSONMetadata.. needs a test, of course): ``` diff --git a/src/allmydata/web/filenode.py b/src/allmydata/web/filenode.py index 9fd4402..104fc49 100644 --- a/src/allmydata/web/filenode.py +++ b/src/allmydata/web/filenode.py @@ -427,25 +427,29 @@ class FileDownloader(rend.Page): def FileJSONMetadata(ctx, filenode, edge_metadata): - if filenode.is_readonly(): - rw_uri = None - ro_uri = filenode.get_uri() - else: - rw_uri = filenode.get_uri() - ro_uri = filenode.get_readonly_uri() - data = ("filenode", {}) - data[1]['size'] = filenode.get_size() - if ro_uri: - data[1]['ro_uri'] = ro_uri - if rw_uri: - data[1]['rw_uri'] = rw_uri - verifycap = filenode.get_verify_cap() - if verifycap: - data[1]['verify_uri'] = verifycap.to_string() - data[1]['mutable'] = filenode.is_mutable() - if edge_metadata is not None: - data[1]['metadata'] = edge_metadata - return text_plain(simplejson.dumps(data, indent=1) + "\n", ctx) + d = filenode.get_current_size() + def _got_size(size): + data = ("filenode", {}) + data[1]['size'] = size + if filenode.is_readonly(): + rw_uri = None + ro_uri = filenode.get_uri() + else: + rw_uri = filenode.get_uri() + ro_uri = filenode.get_readonly_uri() + if ro_uri: + data[1]['ro_uri'] = ro_uri + if rw_uri: + data[1]['rw_uri'] = rw_uri + verifycap = filenode.get_verify_cap() + if verifycap: + data[1]['verify_uri'] = verifycap.to_string() + data[1]['mutable'] = filenode.is_mutable() + if edge_metadata is not None: + data[1]['metadata'] = edge_metadata + return text_plain(simplejson.dumps(data, indent=1) + "\n", ctx) + d.addCallback(_got_size) + return d def FileURI(ctx, filenode): return text_plain(filenode.get_uri(), ctx) ```
davidsarah commented 2010-01-15 01:16:08 +00:00
Author
Owner

Needs to address comments before further review.

Needs to address comments before further review.
davidsarah commented 2010-02-08 00:54:24 +00:00
Author
Owner

Replying to swillden:

The immediate culprit is them implementation of MutableFileNode.get_size(), in mutable/filenode.py, line 195. The implementation is supremely simple, and not very useful:

return "?"

Note that this was changed in changeset:e046744f40d59e70. It now returns the last cached size, initially None.

Replying to [swillden](/tahoe-lafs/trac-2024-07-25/issues/677#issuecomment-70569): > The immediate culprit is them implementation of [MutableFileNode](wiki/MutableFileNode).get_size(), in mutable/filenode.py, line 195. The implementation is supremely simple, and not very useful: > > `return "?"` Note that this was changed in changeset:e046744f40d59e70. It now returns the last cached size, initially None.
tahoe-lafs modified the milestone from undecided to 1.7.0 2010-03-17 01:03:51 +00:00
tahoe-lafs modified the milestone from 1.7.0 to 1.7.1 2010-06-12 21:08:35 +00:00
davidsarah commented 2010-07-18 02:20:28 +00:00
Author
Owner

Out of time.

Out of time.
tahoe-lafs modified the milestone from 1.7.1 to 1.8β 2010-07-18 02:20:28 +00:00
davidsarah commented 2010-07-23 00:18:37 +00:00
Author
Owner

Attachment current-size-in-mutable-file-json.dpatch (2427 bytes) added

web.filenode: include the current (rather than cached) file size in the JSON metadata for a mutable file. fixes #677

**Attachment** current-size-in-mutable-file-json.dpatch (2427 bytes) added web.filenode: include the current (rather than cached) file size in the JSON metadata for a mutable file. fixes #677
davidsarah commented 2010-07-23 00:19:23 +00:00
Author
Owner

Attachment omit-size-from-dir-json-if-unknown.dpatch (1424 bytes) added

web.directory: omit the 'size' field for a mutable file in the JSON rendering of a directory if it is not known.

**Attachment** omit-size-from-dir-json-if-unknown.dpatch (1424 bytes) added web.directory: omit the 'size' field for a mutable file in the JSON rendering of a directory if it is not known.
davidsarah commented 2010-07-23 00:27:17 +00:00
Author
Owner

current-size-in-mutable-file-json.dpatch is an adaptation of Brian's code in comment:70577 to current trunk. It only affects t=json for a mutable file, not for directory listings.

omit-size-from-dir-json-if-unknown.dpatch omits the "size" field from directory listings if it is not currently cached. I'm not sure that this is necessary; the current behaviour of including "size": null does not seem too unreasonable.

[current-size-in-mutable-file-json.dpatch](/tahoe-lafs/trac-2024-07-25/attachments/000078ac-e0cc-7b71-e9bf-fc2e5362dc92) is an adaptation of Brian's code in [comment:70577](/tahoe-lafs/trac-2024-07-25/issues/677#issuecomment-70577) to current trunk. It only affects `t=json` for a mutable file, not for directory listings. [omit-size-from-dir-json-if-unknown.dpatch](/tahoe-lafs/trac-2024-07-25/attachments/000078ac-e0cc-7b71-e9bf-a413e2698c3c) omits the "`size`" field from directory listings if it is not currently cached. I'm not sure that this is necessary; the current behaviour of including `"size": null` does not seem too unreasonable.

Hey, you've done this again. I don't understand what it means for a ticket to be both review-needed and test-needed. Shouldn't the reviewer just point out that tests are needed and then unset the review-needed flag?

Hey, you've done this again. I don't understand what it means for a ticket to be both review-needed and test-needed. Shouldn't the reviewer just point out that tests are needed and then unset the review-needed flag?
davidsarah commented 2010-09-11 00:59:22 +00:00
Author
Owner

Replying to zooko:

Shouldn't the reviewer just point out that tests are needed and then unset the review-needed flag?

Yes. Unfortunately I forgot about this ticket, and we're out of time for 1.8.

Replying to [zooko](/tahoe-lafs/trac-2024-07-25/issues/677#issuecomment-70585): > Shouldn't the reviewer just point out that tests are needed and then unset the review-needed flag? Yes. Unfortunately I forgot about this ticket, and we're out of time for 1.8.
tahoe-lafs modified the milestone from 1.8β to 1.9.0 2010-09-11 00:59:22 +00:00
davidsarah commented 2011-08-14 01:17:14 +00:00
Author
Owner

I forgot about this ticket again :-( Still needs a test.

I forgot about this ticket again :-( Still needs a test.
tahoe-lafs modified the milestone from 1.9.0 to 1.10.0 2011-08-14 01:17:14 +00:00
davidsarah commented 2013-01-04 03:04:02 +00:00
Author
Owner

Updated to current trunk (taking into account MDMF) and gitified: https://github.com/davidsarah/tahoe-lafs/commits/677-current-size-in-mutable-json. Tests still needed.

Updated to current trunk (taking into account MDMF) and gitified: <https://github.com/davidsarah/tahoe-lafs/commits/677-current-size-in-mutable-json>. Tests still needed.
Sign in to join this conversation.
No Milestone
No Assignees
3 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Reference: tahoe-lafs/trac-2024-07-25#677
No description provided.