document in what ways Tahoe-LAFS builds are not currently verifiable #2357


tahoe-lafs · 2014-12-29T16:15:29Z

daira commented

2014-12-29 16:15:29 +00:00

A long-term goal, ticketed as #2057, is to enable end-users to verify that the package of Tahoe-LAFS that they are using was generated from the exact same source code that a security auditor examined.

In order to explain the verifiable build concept, consider this simple diagram:

    distributor: source code ➾ binary package → user

Here we use “➾” to mean “build” — the process that produces usable packages out of source code.

Now consider a security auditor who does a source-code-based examination (as opposed to binary-based, which is called “reverse engineering”). This security auditor will start with the source code, and examine it for vulnerabilities or backdoors.

    auditor: source code → security audit

How can the user who receives a binary package know whether that package was built from the source that the auditor examined?

The “verifiable build” approach attempts to answer that question by having the security auditor perform the “source code ➾ binary package” on their own trusted system, and then taking a fingerprint (secure hash) of the resulting binary package:

    auditor: source code ➾ binary package
    auditor: binary package → generate fingerprint

The auditor then publishes that fingerprint along with their report about their security audit. Users who receive the binary package can take a fingerprint of that package and compare it to the fingerprint in the published report.

    distributor: source code ➾ binary package → user
    user: binary package → check fingerprint

This approach can work only if the ➾ operation performed by the distributor produces a binary that is bytewise identical to the one produced by the ➾ operation performed by the security auditor.
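
The fingerprint steps above can be sketched in a few lines of Python. This is only an illustration, assuming SHA-256 as the secure hash; the function names `fingerprint` and `matches_published` are hypothetical and not part of Tahoe-LAFS or of any auditor's tooling.

```python
import hashlib


def fingerprint(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at `path`.

    SHA-256 is an assumption here; the ticket only says "secure hash".
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large packages do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex):
    """Return True if the package's fingerprint equals the published one."""
    return fingerprint(path) == published_hex.strip().lower()
```

The auditor would run `fingerprint` on the package they built and publish the result; a user would run `matches_published` on the package they received. A single differing byte, such as an embedded build timestamp, changes the digest, which is why the check is only meaningful when builds are bytewise reproducible.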

Here is a news article from LWN.net about the concept of verifiable builds (prompted in part by an open letter that we wrote): [“Security software verifiability”](https://lwn.net/Articles/564263/). Here is a [//pipermail/tahoe-dev/2013-August/008684.html post on the tahoe-dev mailing list] about our desire to have verifiable builds for Tahoe-LAFS.

The goal of this ticket is to have documentation of the ways in which Tahoe-LAFS builds are not currently verifiable. Its scope includes:

* Tahoe-LAFS as built via setup.py (using setuptools and/or pip), and
* the Mac OS X (#182) and Windows (#195) packages

but does not include Tahoe-LAFS as packaged by an operating system distribution or package management system.

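It may be useful to consider how existing projects have approached this problem: [Debian](https://wiki.debian.org/ReproducibleBuilds), [Tor](https://blog.torproject.org/category/tags/deterministic-builds), [Bitcoin](https://en.bitcoin.it/wiki/Release_process), and the recent ad-hoc [reproduction of the TrueCrypt Windows binaries](https://madiba.encs.concordia.ca/~x_decarn/truecrypt-binaries-analysis/).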

tahoe-lafs added the

labels 2014-12-29 16:15:29 +00:00

tahoe-lafs added this to the undecided milestone 2014-12-29 16:15:29 +00:00

tahoe-lafs modified the milestone from undecided to soon 2014-12-29 16:17:29 +00:00

daira commented

2015-01-09 02:21:32 +00:00

OpenITP meeting 5 January 2014

* note: nondeterminism that results in obvious build failures is ok
* different build targets can have different fingerprints
* what counts as a build target?
  * NONDET: operating system versions, patches, variants, distribution if counted as the same target
* quickstart build flow:
  * install Python if necessary
  * download the allmydata-tahoe-*.zip file (for a given build target)
  * unzip it
    * NONDET: unzip programs might vary in e.g. permissions of unzipped files
    * NONDET: file timestamps may depend on the clock of the build system
    * NONDET: order of files/subdirs in directories, if filesystem does not sort them
  * run setup.py build in a command prompt
    * NONDET: which Python version runs setup.py?
    * NONDET: other installed Python versions might affect the build?
    * NONDET: which setuptools/pkg_resources/virtualenv version?
    * NONDET: system or virtualenv?
    * NONDET: which other Python packages installed on system and in virtualenv?
    * NONDET: PYTHONPATH
    * it has some set of URLs where it looks for package distributions ("dists")
      * NONDET: using the net at all is hopeless wrt determinism
    * which dists it chooses can influence further choices of dist for other dependencies
    * try to build each dist
      * NONDET: order of builds? not sure what algorithm is used
      * dists are either pure Python or have C/C++ code
        * NONDET: buildchain for C/C++ code (includes many non-obvious dependencies)
        * NONDET: build process for C/C++ code
        * NONDET: distutils properties that affect compilation
        * NONDET: environment vars that affect compilation
      * NONDET: execution of Python code for building a dist (e.g. dict order etc.)
      * NONDET: do any dependencies rely on entropy sources (e.g. os.urandom)?
      * NONDET: can operations like running tests affect the built copy of Tahoe?
      * sources of nondeterminism from builds of dependencies
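
Three of the items above (file permissions, file timestamps, and directory ordering) can be illustrated with a short Python sketch that writes a zip archive whose bytes depend only on file names and contents. This is not how Tahoe-LAFS packages are built; the function name `write_normalized_zip` and the fixed values chosen below are assumptions made for the example.

```python
import os
import zipfile

# Assumed fixed timestamp for every member; the ZIP format cannot
# represent dates earlier than 1980.
FIXED_DATE_TIME = (1980, 1, 1, 0, 0, 0)


def write_normalized_zip(src_dir, zip_path):
    """Archive the files under src_dir so that identical names and
    contents always produce identical archive bytes."""
    entries = []
    for root, dirs, files in os.walk(src_dir):
        dirs.sort()  # do not rely on the filesystem's directory order
        for name in files:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, src_dir).replace(os.sep, "/")
            entries.append((rel, full))
    entries.sort()  # fixed member order inside the archive

    with zipfile.ZipFile(zip_path, "w") as zf:
        for rel, full in entries:
            info = zipfile.ZipInfo(rel, date_time=FIXED_DATE_TIME)
            info.compress_type = zipfile.ZIP_STORED  # avoid compressor differences
            info.create_system = 3  # always record "Unix", whatever the build host
            info.external_attr = 0o100644 << 16  # regular file, rw-r--r--
            with open(full, "rb") as f:
                zf.writestr(info, f.read())
```

Running this on two checkouts with the same file contents but different modification times should yield archives with the same fingerprint. It does nothing about the other NONDET items, such as the Python version, installed packages, or the C/C++ toolchain.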

daira commented

2015-02-03 16:21:45 +00:00

Fixed; the report is at https://github.com/LeastAuthority/openitp-good-packaging-proposal/blob/master/openitp-good-packaging-for-LAFS_sources-of-nondeterminism.rst.

tahoe-lafs added the fixed label 2015-02-03 16:21:45 +00:00

tahoe-lafs modified the milestone from soon to soon (release n/a) 2015-02-03 16:21:45 +00:00

daira closed this issue

2015-02-03 16:21:45 +00:00
