cloud branch: allmydata.storage.leasedb.NonExistentShareError #2204

Closed
opened 2014-03-22 17:48:54 +00:00 by CyberAxe · 4 comments
CyberAxe commented 2014-03-22 17:48:54 +00:00
Owner
```
processing 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files\9'
skipping 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files\9\0_0.jpg'..
skipping 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files\9\0_1.jpg'..
skipping 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files\9\1_0.jpg'..
skipping 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files\9\1_1.jpg'..
 re-using old directory for 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files\9'
 re-using old directory for 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files'
 re-using old directory for 'c:\BackMeUp\WesternColorado_files\images'
 re-using old directory for 'c:\BackMeUp\WesternColorado_files'
 creating directory for 'c:\BackMeUp\'
Traceback (most recent call last):
> File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\runner.py", line 156, in run
    rc = runner(sys.argv[1:], install_node_control=install_node_control)
> File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\runner.py", line 141, in runner
    rc = cli.dispatchcommand(so)
> File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\cli.py", line 574, in backup
    rc = tahoe_backup.backup(options)
> File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 325, in backup
    return bu.run()
> File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 123, in run
    put_child(archives_url, now, new_backup_dircap)
> File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 57, in put_child
    raise HTTPError("Error during put_child", resp)
HTTPError: Error during put_child: 500 Internal Server Error
"Traceback (most recent call last):
Failure: allmydata.mutable.common.NotEnoughServersError: ('Publish ran out of good servers, last failure was: [Failure instance: Traceback (
failure with no frames): <class \\'foolscap.tokens.RemoteException\\'>: <RemoteException around \\'[CopiedFailure instance: Traceback from r
emote host -- Traceback (most recent call last):\\n  File \"/usr/lib/python2.7/dist-packages/foolscap/eventual.py\", line 26, in _turn\\n
 cb(*args, **kwargs)\\n  File \"/home/customer/LAFS_source/src/allmydata/util/deferredutil.py\", line 55, in _with_log\\n    op(res)\\n  Fil
e \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 382, in callback\\n    self
._startRunCallbacks(result)\\n  File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\
", line 490, in _startRunCallbacks\\n    self._runCallbacks()\\n--- <exception caught here> ---\\n  File \"/usr/local/lib/python2.7/dist-pac
kages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 577, in _runCallbacks\\n    current.result = callback(current.re
sult, *args, **kw)\\n  File \"/home/customer/LAFS_source/src/allmydata/storage/backends/base.py\", line 170, in _update_lease\\n    account.
add_or_renew_default_lease(self.storage_index, shnum)\\n  File \"/home/customer/LAFS_source/src/allmydata/storage/account.py\", line 58, in
add_or_renew_default_lease\\n    return self.add_or_renew_lease(storage_index, shnum, renewal_time, expiration_time)\\n  File \"/home/custom
er/LAFS_source/src/allmydata/storage/account.py\", line 63, in add_or_renew_lease\\n    renewal_time, expiration_time)\\n  File \"/home/cust
omer/LAFS_source/src/allmydata/storage/leasedb.py\", line 267, in add_or_renew_leases\\n    raise NonExistentShareError(si_s, shnum)\\nallmy
data.storage.leasedb.NonExistentShareError: can\\'t find SI=\\'o2ofk2psfahdhyf5uq3nqdmrwe\\' shnum=0 in \`shares\` table\\n]Failure\\'>\\n]', None)

"

Extracting and reformatting the traceback contained in that error:

```
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/foolscap/eventual.py", line 26, in _turn
    cb(*args, **kwargs)
  File "/home/customer/LAFS_source/src/allmydata/util/deferredutil.py", line 55, in _with_log
    op(res)
  File "/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py", line 382, in callback
    self._startRunCallbacks(result)
  File "/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py", line 490, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 577, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/customer/LAFS_source/src/allmydata/storage/backends/base.py", line 170, in _update_lease
    account.add_or_renew_default_lease(self.storage_index, shnum)
  File "/home/customer/LAFS_source/src/allmydata/storage/account.py", line 58, in add_or_renew_default_lease
    return self.add_or_renew_lease(storage_index, shnum, renewal_time, expiration_time)
  File "/home/customer/LAFS_source/src/allmydata/storage/account.py", line 63, in add_or_renew_lease
    renewal_time, expiration_time)
  File "/home/customer/LAFS_source/src/allmydata/storage/leasedb.py", line 267, in add_or_renew_leases
    raise NonExistentShareError(si_s, shnum)
allmydata.storage.leasedb.NonExistentShareError: can't find SI='o2ofk2psfahdhyf5uq3nqdmrwe' shnum=0 in `shares` table
```
tahoe-lafs added the unknown, normal, defect, 1.10.0 labels 2014-03-22 17:48:54 +00:00
tahoe-lafs added this to the undecided milestone 2014-03-22 17:48:54 +00:00
CyberAxe commented 2014-03-25 02:08:18 +00:00
Author
Owner

I started with a new SQLite .db (the lease database) but got the same result.

```
skipping 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files\9\1_1.jpg'..
 re-using old directory for 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files\9'
 re-using old directory for 'c:\BackMeUp\WesternColorado_files\images\WaspWar3_files'
 re-using old directory for 'c:\BackMeUp\WesternColorado_files\images'
 re-using old directory for 'c:\BackMeUp\WesternColorado_files'
 re-using old directory for 'c:\BackMeUp\'
Traceback (most recent call last):
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\runner.py", line 156, in run
    rc = runner(sys.argv[1:], install_node_control=install_node_control)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\runner.py", line 141, in runner
    rc = cli.dispatch[command](so)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\cli.py", line 574, in backup
    rc = tahoe_backup.backup(options)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 325, in backup
    return bu.run()
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 124, in run
    put_child(to_url, "Latest", new_backup_dircap)
  File "c:\allmydata-tahoe-1.10.0\src\allmydata\scripts\tahoe_backup.py", line 57, in put_child
    raise HTTPError("Error during put_child", resp)
HTTPError: Error during put_child: 500 Internal Server Error
"Traceback (most recent call last):
Failure: allmydata.mutable.common.NotEnoughServersError: ('Publish ran out of good servers, last failure was: [Failure instance: Traceback (
failure with no frames): <class \\'foolscap.tokens.RemoteException\\'>: <RemoteException around \\'[CopiedFailure instance: Traceback from r
emote host -- Traceback (most recent call last):\\n  File \"/usr/lib/python2.7/dist-packages/foolscap/eventual.py\", line 26, in _turn\\n
 cb(*args, **kwargs)\\n  File \"/home/customer/LAFS_source/src/allmydata/util/deferredutil.py\", line 55, in _with_log\\n    op(res)\\n  Fil
e \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 382, in callback\\n    self
._startRunCallbacks(result)\\n  File \"/usr/local/lib/python2.7/dist-packages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\
", line 490, in _startRunCallbacks\\n    self._runCallbacks()\\n--- <exception caught here> ---\\n  File \"/usr/local/lib/python2.7/dist-pac
kages/Twisted-13.2.0-py2.7-linux-i686.egg/twisted/internet/defer.py\", line 577, in _runCallbacks\\n    current.result = callback(current.re
sult, *args, **kw)\\n  File \"/home/customer/LAFS_source/src/allmydata/storage/backends/base.py\", line 170, in _update_lease\\n    account.
add_or_renew_default_lease(self.storage_index, shnum)\\n  File \"/home/customer/LAFS_source/src/allmydata/storage/account.py\", line 58, in
add_or_renew_default_lease\\n    return self.add_or_renew_lease(storage_index, shnum, renewal_time, expiration_time)\\n  File \"/home/custom
er/LAFS_source/src/allmydata/storage/account.py\", line 63, in add_or_renew_lease\\n    renewal_time, expiration_time)\\n  File \"/home/cust
omer/LAFS_source/src/allmydata/storage/leasedb.py\", line 267, in add_or_renew_leases\\n    raise NonExistentShareError(si_s, shnum)\\nallmy
data.storage.leasedb.NonExistentShareError: can\\'t find SI=\\'acbv6l7p3wqrcaxgcskc5tmxgy\\' shnum=0 in \`shares\` table\\n]\\'>\\n]', None)

"
zooko changed title from in add_or_renew_leases\\n raise NonExistentShareError(si_s, shnum)\\nallmy to data.storage.leasedb.NonExistentShareError 2014-04-15 16:06:30 +00:00
tahoe-lafs added code-storage, cloud-branch and removed unknown, 1.10.0 labels 2014-04-15 16:22:31 +00:00
tahoe-lafs changed title from data.storage.leasedb.NonExistentShareError to cloud branch: allmydata.storage.leasedb.NonExistentShareError 2014-04-15 16:22:31 +00:00
daira commented 2014-04-15 16:24:55 +00:00
Author
Owner

Possible duplicate of #1921.


I've encountered this now as well.

The comment about starting with a new SQLite3 lease database is appreciated - that agrees with my situation and supports my hypothesis about what's going wrong here.

My understanding of the `shares` table of the lease database is that it is populated in one of two ways:

  1. A new share is created.
  2. The accounting crawler finds a share not in the lease database.

The code is all meant to tolerate loss of the lease database, too.

When `_locked_testv_and_readv_and_writev` is done applying writes, it attempts to `add_or_renew_default_lease` for the share (via the account). This eventually calls down to `add_or_renew_leases` on the lease database. `add_or_renew_leases` requires that the share being leased already exists in the `shares` table.

So, if the lease database is ever lost, there is a period while the accounting crawler is re-populating the `shares` table when it is not possible to write to mutable files.
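
To make that invariant concrete, here is a minimal, self-contained sketch (plain `sqlite3` with a made-up schema and a generic exception, not the real `leasedb.py` code): a lease can only be recorded for a row that already exists in the `shares` table, so a freshly rebuilt lease database rejects lease updates until the crawler has re-added the corresponding rows.

```
import sqlite3

# Hypothetical, simplified schema -- not the actual leasedb.py schema.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE shares (storage_index TEXT, shnum INTEGER,
                         PRIMARY KEY (storage_index, shnum));
    CREATE TABLE leases (storage_index TEXT, shnum INTEGER,
                         renewal_time INTEGER, expiration_time INTEGER);
""")

def add_or_renew_lease(si_s, shnum, renewal_time, expiration_time):
    # Mirrors the rule described above: refuse to lease a share that is
    # not (yet) present in the `shares` table.
    row = db.execute("SELECT 1 FROM shares WHERE storage_index=? AND shnum=?",
                     (si_s, shnum)).fetchone()
    if row is None:
        raise Exception("can't find SI=%r shnum=%d in `shares` table"
                        % (si_s, shnum))
    db.execute("INSERT INTO leases VALUES (?, ?, ?, ?)",
               (si_s, shnum, renewal_time, expiration_time))

# With an empty `shares` table (as after losing the lease database), the
# lease update for a mutable write fails, much like the error in this ticket:
try:
    add_or_renew_lease("o2ofk2psfahdhyf5uq3nqdmrwe", 0, 1395500000, 1398100000)
except Exception as e:
    print(e)
```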

Because the problem depends on the implementation of mutable storage, the accounting system, and the lease database, as well as on an external event causing the loss of the lease database, it's not clear to me what testing approach will work.

Also, the implementation of `_locked_testv_and_readv_and_writev` is sufficiently complex that the idea of touching it at all is pretty scary. It seems like a reasonable fix would be to cause it to _always_ `account.add_share` before `_update_lease` (rather than the current behavior of only `account.add_share`ing if `shnum not in sharemap`).
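
To sketch the shape of that suggested change (the `account` method names come from the traceback and the discussion above, but the signatures and surrounding code here are assumed for illustration, not copied from `backends/base.py`):

```
# Hypothetical sketch of the tail end of _locked_testv_and_readv_and_writev;
# the function signature and argument names are illustrative only.
def _finish_writev_sketch(account, storage_index, shnum, sharemap):
    # Current behaviour (simplified): the share is only registered with the
    # account when it did not already exist before this write:
    #
    #     if shnum not in sharemap:
    #         account.add_share(storage_index, shnum)
    #
    # Proposed behaviour: always (re-)register the share before renewing its
    # lease.  This is harmless when the `shares` row already exists, and it
    # repairs the row when the lease database was lost and the accounting
    # crawler has not yet re-discovered this share.
    account.add_share(storage_index, shnum)
    account.add_or_renew_default_lease(storage_index, shnum)
```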


Fixed in the #2237 branch(es):

  • ba6c233e96c9093d419edabadfc4e72a5609966d
  • e936b89fc0b67d0c503630acd83320047ebe7aa9
exarkun added the fixed label 2017-08-15 13:04:53 +00:00
Reference: tahoe-lafs/trac-2024-07-25#2204