backup has over-general exception swallowing that leads to assertion failures. #1865

Open
opened 2012-11-18 06:42:54 +00:00 by nejucomo · 1 comment

Symptom:

I had a long-running tahoe backup command which I suspended. Then I started another tahoe backup and saw this traceback:

Traceback (most recent call last):
  File "/home/n/virtualenvs/default/bin/tahoe", line 8, in <module>
    load_entry_point('allmydata-tahoe==1.9.2', 'console_scripts', 'tahoe')()
  File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/runner.py", line 113, in run
    rc = runner(sys.argv[1:], install_node_control=install_node_control)
  File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/runner.py", line 99, in runner
    rc = cli.dispatch[command](so)
  File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/cli.py", line 569, in backup
    rc = tahoe_backup.backup(options)
  File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/tahoe_backup.py", line 325, in backup
    return bu.run()
  File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/tahoe_backup.py", line 118, in run
    new_backup_dircap = self.process(options.from_dir)
  File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/tahoe_backup.py", line 188, in process
    childcap = self.process(childpath)
  File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/tahoe_backup.py", line 194, in process
    childcap, metadata = self.upload(childpath)
  File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/tahoe_backup.py", line 313, in upload
    bdb_results.did_upload(filecap)
  File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/backupdb.py", line 134, in did_upload
    self.mtime, self.ctime, self.size)
  File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/backupdb.py", line 276, in did_upload_file
    fileid = self.get_or_allocate_fileid_for_cap(filecap)
  File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/backupdb.py", line 270, in get_or_allocate_fileid_for_cap
    assert foundrow
AssertionError

Diagnosis:

After examining the code with much help from Zooko, I tweaked the installed source like this:

--- ./allmydata-tahoe-1.9.2/src/allmydata/scripts/backupdb.py   2012-05-14 03:07:24.000000000 -0700
+++ /home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/backupdb.py       2012-11-17 22:15:15.000000000 -0800
@@ -261,7 +261,8 @@
         c = self.cursor
         try:
             c.execute("INSERT INTO caps (filecap) VALUES (?)", (filecap,))
-        except (self.sqlite_module.IntegrityError, self.sqlite_module.OperationalError):
+        except (self.sqlite_module.IntegrityError, self.sqlite_module.OperationalError), e:
+            print 'HACKY DEBUG HACK: %r %r' % (e, e.args)
             # sqlite3 on sid gives IntegrityError
             # pysqlite2 on dapper gives OperationalError
             pass

I reran the backup command and saw this:

HACKY DEBUG HACK: OperationalError('attempt to write a readonly database',) ('attempt to write a readonly database',)
Traceback (most recent call last):
  File "/home/n/virtualenvs/default/bin/tahoe", line 8, in <module>
    load_entry_point('allmydata-tahoe==1.9.2', 'console_scripts', 'tahoe')()
  File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/runner.py", line 113, in run
    rc = runner(sys.argv[1:], install_node_control=install_node_control)
  File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/runner.py", line 99, in runner
    rc = cli.dispatch[command](so)
  File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/cli.py", line 569, in backup
    rc = tahoe_backup.backup(options)
  File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/tahoe_backup.py", line 325, in backup
    return bu.run()
  File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/tahoe_backup.py", line 118, in run
    new_backup_dircap = self.process(options.from_dir)
  File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/tahoe_backup.py", line 188, in process
    childcap = self.process(childpath)
  File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/tahoe_backup.py", line 194, in process
    childcap, metadata = self.upload(childpath)
  File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/tahoe_backup.py", line 313, in upload
    bdb_results.did_upload(filecap)
  File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/backupdb.py", line 134, in did_upload
    self.mtime, self.ctime, self.size)
  File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/backupdb.py", line 277, in did_upload_file
    fileid = self.get_or_allocate_fileid_for_cap(filecap)
  File "/home/n/virtualenvs/default/lib/python2.7/site-packages/allmydata/scripts/backupdb.py", line 271, in get_or_allocate_fileid_for_cap
    assert foundrow
AssertionError

backupdb's get_or_allocate_fileid_for_cap() swallows exceptions with the intent of tolerating key collisions, i.e. to implement the semantics of "insert, but if it's already there, that's OK." However, it actually swallows a much more general set of exceptional conditions. In this case the suspended backup process still held a lock on the database, so the new process's INSERT failed, the failure was silently discarded, and the subsequent SELECT found no row, tripping the assertion.
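For contrast, here is what the narrow version of that idiom would look like: catch only the uniqueness violation and let everything else propagate. This is a sketch assuming the stdlib sqlite3 module (which raises IntegrityError for a duplicate key), not the shipped backupdb code; the shipped code also caught OperationalError because old pysqlite2 reported duplicate keys that way.

```python
import sqlite3

def get_or_allocate_fileid_for_cap(cursor, filecap):
    """Narrow sketch: swallow only the 'duplicate key' case."""
    try:
        cursor.execute("INSERT INTO caps (filecap) VALUES (?)", (filecap,))
    except sqlite3.IntegrityError:
        # The cap is already present; the SELECT below will find it.
        pass
    # A held lock or a readonly database raises OperationalError on the
    # INSERT above, and that now propagates instead of being discarded.
    cursor.execute("SELECT fileid FROM caps WHERE filecap=?", (filecap,))
    row = cursor.fetchone()
    assert row  # only reachable if the INSERT succeeded or collided
    return row[0]
```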

Resolution:

Zooko suggests this fix:

  1. Ensure the SQL commands are in a transaction (see the caveat below; a sketch of the whole sequence follows this list).
  2. Query for the cap to get the fileid.
  3. If it is not found:
       a. insert it without swallowing exceptions, then
       b. query again for the cap to get the fileid.
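A sketch of that sequence, assuming the stdlib sqlite3 module with a connection opened with isolation_level=None so that transactions are managed by hand (the patch in the comment below implements steps 2 through 5):

```python
import sqlite3

def get_or_allocate_fileid_for_cap(conn, filecap):
    # Sketch only, not the shipped backupdb code. Assumes a stdlib
    # sqlite3 connection opened with isolation_level=None.
    c = conn.cursor()
    c.execute("BEGIN IMMEDIATE")  # step 1: take the write lock up front
    try:
        # step 2: query for the cap
        c.execute("SELECT fileid FROM caps WHERE filecap=?", (filecap,))
        row = c.fetchone()
        if row is None:  # step 3: not found
            # step 4: insert; a held lock or readonly db raises here
            c.execute("INSERT INTO caps (filecap) VALUES (?)", (filecap,))
            # step 5: query again for the freshly allocated fileid
            c.execute("SELECT fileid FROM caps WHERE filecap=?", (filecap,))
            row = c.fetchone()
        c.execute("COMMIT")
    except Exception:
        c.execute("ROLLBACK")
        raise
    return row[0]
```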

Caveat:

The transaction scope of the db backend is important. Zooko did some quick searching and determined that sqlite3's locking is table-scoped, which should make this safe; but to support other databases, more research is needed to determine whether this approach remains feasible.
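To make the sqlite half of that concrete, here is a small scratch-database demonstration (illustrative only; the filename is made up). sqlite's write lock in fact covers the whole database file, which is at least as coarse as table scope: once one writer has begun, a second writer cannot even start a transaction, so its SELECT-then-INSERT cannot interleave with the first writer's.

```python
import sqlite3

DB = "lock-demo.sqlite"  # made-up scratch database

setup = sqlite3.connect(DB, isolation_level=None)
setup.execute("CREATE TABLE IF NOT EXISTS caps"
              " (fileid INTEGER PRIMARY KEY, filecap TEXT UNIQUE)")
setup.close()

writer1 = sqlite3.connect(DB, isolation_level=None)
writer1.execute("BEGIN IMMEDIATE")  # first backup process takes the lock

# timeout=0: fail immediately instead of retrying for five seconds
writer2 = sqlite3.connect(DB, isolation_level=None, timeout=0)
try:
    writer2.execute("BEGIN IMMEDIATE")  # second process cannot start
except sqlite3.OperationalError as e:
    print("second writer blocked: %r" % (e,))  # 'database is locked'
```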

nejucomo added the unknown, normal, defect, 1.9.2 labels 2012-11-18 06:42:54 +00:00
nejucomo added this to the undecided milestone 2012-11-18 06:42:54 +00:00

Here's a patch which I think does the right thing while not catching exceptions that it shouldn't catch. This is safe in sqlite, because sqlite always locks whole tables. For databases with finer-grained locking, it would be necessary to use isolation level "serializable" or stricter for this to be safe.

diff --git a/src/allmydata/scripts/backupdb.py b/src/allmydata/scripts/backupdb.py
index 75ee0d9..8eed692 100644
--- a/src/allmydata/scripts/backupdb.py
+++ b/src/allmydata/scripts/backupdb.py
@@ -247,14 +247,12 @@ class BackupDB_v2:
         # sqlite has "INSERT ON CONFLICT REPLACE", but not mysql
         # So we use INSERT, ignore any error, then a SELECT
         c = self.cursor
-        try:
-            c.execute("INSERT INTO caps (filecap) VALUES (?)", (filecap,))
-        except (self.sqlite_module.IntegrityError, self.sqlite_module.OperationalError):
-            # sqlite3 on sid gives IntegrityError
-            # pysqlite2 (which we don't use, so maybe no longer relevant) on dapper gives OperationalError
-            pass
         c.execute("SELECT fileid FROM caps WHERE filecap=?", (filecap,))
         foundrow = c.fetchone()
+        if foundrow is None:
+            c.execute("INSERT INTO caps (filecap) VALUES (?)", (filecap,))
+            c.execute("SELECT fileid FROM caps WHERE filecap=?", (filecap,))
+            foundrow = c.fetchone()
         assert foundrow
         fileid = foundrow[0]
         return fileid
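As a quick sanity check of the patched behavior, the reporter's condition can be approximated with a scratch database made read-only (an illustrative script, not part of the patch; the filecap string is made up). With the try/except gone, the INSERT fails loudly with the underlying OperationalError instead of tripping the assert later:

```python
import os
import sqlite3

DB = "readonly-demo.sqlite"  # made-up scratch database

conn = sqlite3.connect(DB)
conn.execute("CREATE TABLE IF NOT EXISTS caps"
             " (fileid INTEGER PRIMARY KEY, filecap TEXT UNIQUE)")
conn.commit()
conn.close()

os.chmod(DB, 0o444)  # approximate the reporter's condition: db not writable

conn = sqlite3.connect(DB)
c = conn.cursor()
c.execute("SELECT fileid FROM caps WHERE filecap=?", ("URI:CHK:example",))
if c.fetchone() is None:
    try:
        # With the patch there is no try/except around this INSERT, so
        # the real error would reach the user; it is caught here only
        # to print it.
        c.execute("INSERT INTO caps (filecap) VALUES (?)", ("URI:CHK:example",))
    except sqlite3.OperationalError as e:
        print("the underlying failure now surfaces: %r" % (e,))
```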
Reference: tahoe-lafs/trac-2024-07-25#1865