clean up and improve asymptotic complexity of Spans and DataSpans #1182
Labels
No Label
0.2.0
0.3.0
0.4.0
0.5.0
0.5.1
0.6.0
0.6.1
0.7.0
0.8.0
0.9.0
1.0.0
1.1.0
1.10.0
1.10.1
1.10.2
1.10a2
1.11.0
1.12.0
1.12.1
1.13.0
1.14.0
1.15.0
1.15.1
1.2.0
1.3.0
1.4.1
1.5.0
1.6.0
1.6.1
1.7.0
1.7.1
1.7β
1.8.0
1.8.1
1.8.2
1.8.3
1.8β
1.9.0
1.9.0-s3branch
1.9.0a1
1.9.0a2
1.9.0b1
1.9.1
1.9.2
1.9.2a1
LeastAuthority.com automation
blocker
cannot reproduce
cloud-branch
code
code-dirnodes
code-encoding
code-frontend
code-frontend-cli
code-frontend-ftp-sftp
code-frontend-magic-folder
code-frontend-web
code-mutable
code-network
code-nodeadmin
code-peerselection
code-storage
contrib
critical
defect
dev-infrastructure
documentation
duplicate
enhancement
fixed
invalid
major
minor
n/a
normal
operational
packaging
somebody else's problem
supercritical
task
trivial
unknown
was already fixed
website
wontfix
worksforme
No Milestone
No Assignees
2 Participants
Notifications
Due Date
No due date set.
Reference: tahoe-lafs/trac-2024-07-25#1182
Loading…
Reference in New Issue
No description provided.
Delete Branch "%!s(<nil>)"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Asympotic efficiency problems in source:src/allmydata/util/spans.py were partly responsible for the regression in download performance discussed in #1170. See:
We changed the downloader so that fewer spans were retained in Spans and DataSpans objects, but this limits their potential for reuse, and we haven't checked whether this could still affect the downloader in some situations.
Here is a copy of /tahoe-lafs/trac-2024-07-25/issues/5860#comment:18 :
Reviewing changeset:cbcb728e7ea0031d:
the
(start, length)
form of theSimpleSpans
constructor is not used outside test code (and the test code can be changed to pass a[(start, length)]
array). Removing this would slightly simplify the constructor and avoid a possibly error-prone overloading.in the
Spans
class comment, ", frequently used to represent .newsrc contents" is out-of-context and not needed.in the
_check
method ofSpans
, if assertions are switched on then theself._spans
array is re-sorted in order to check whether it is ordered. This is unnecessary: if you add anassert length > 0, length
in the loop, then the loop will be checking a condition that is stronger than the array being ordered, given that the starts and lengths are numbers. (The sorting actually takes O(n) time rather than O(n log n) on each call to_check
, because Timsort will detect the ordered input, but it's still unnecessary overhead.)the assert statements should include the variables they use in the exception message, e.g.
assert start > prev_end, (start, prev_end)
."
overlap(s_start, s_length, start, length) or adjacent(s_start, s_length, start, length)
" is equivalent to "overlap(s_start, s_length, start-1, length+2)
".in the only other use of
adjacent
(inDataSpans.add
), only thestart0 < start1
case should be able to occur. Inline and simplify.in the loop over
enumerate(self._spans)
, you could exit early whens_start > start+length
. At that point you know where to insert the(start, length)
span without having to re-sort.a
Spans
object behaves somewhat like a set of the elements in all of its spans, but the*contains*
and*iter*
methods are not consistent with that (instead viewing it as a set of(start, length)
pairs). I realize this may allow for more convenient use ofin
and iteration, but it should at least be documented._check
andassert_invariants
do similar things; give them the same name.DataSpans._dump
is poorly named.DataSpans.assert_invariants
should check that none of the data strings are zero-length.is it intentional that
DataSpans.add
callsself.assert_invariants()
butremove
(andpop
, although that's much simpler) don't?if s_start <= start < s_end:
I find this Python construct too clever by half. Anyway, at this points_start <= start
is always true (otherwise we won't get past the firstcontinue
), so I would write this asPerhaps rename
s_*
toold_*
.DataSpans.add
: if I understand correctly, case A also covers:This isn't immediately clear from the comment. Depicting it as
might help. Also, use uppercase consistently for the cases.
suffix_len
in case D has a different meaning tosuffix_len
in case E. Maybe rename it toreplace_len
for case D.Tests:
The
Spans
class is tested byByteSpans
, but it just stores spans of integers, not necessarily byte offsets. I would suggests/ByteSpans/TestSpans/
ands/StringSpans/TestDataSpans/
.I could be wrong, but I don't think the deterministic tests are covering all cases in
add
andremove
. I'd prefer to see those tests have full coverage rather than relying on the randomized tests to make up the gap.source:trunk/misc/simulators/bench_spans.py@4700 could be useful for measuring your success on this ticket. It reads in a trace file, such as run-112-above28-flog-dump-sh8-on-nsziz.txt and exercise a
DataSpans
object in the way the trace file indicates.This patch: debuggery-trace-spans.dpatch.txt was used to generate that trace file with 1.8.0c2 (which had an issue with superlinear behavior of
Share._received
):Please see the docs of pyutil/benchutil.py for an explanation of the meaning of the measurements output by bench_spans.py.
#1196 is a duplicate.
improve asymptotic complexity of Spans and DataSpansto clean up and improve asymptotic complexity of Spans and DataSpans