The CI Docker image builders are hard to test and are happy to push broken images #3484

Open
opened 2020-10-22 19:01:02 +00:00 by exarkun · 1 comment

A lot of the current CI configuration uses Docker images as a basis for the testing environment. There is also CI configuration to build these Docker images. These images are pre-loaded with as much software as we can manage so that they bear most of the environment setup cost. Then individual CI jobs that are for running tests only have to get Tahoe-LAFS itself in place. This lets them run more quickly.

Docker images are built on CircleCI by a cron-driven nightly workflow, so images are typically built only from what is in master. This makes it hard to test changes to how the images are built, since those changes do not start out on master. If you want to test them at all, you have to either do it manually outside of CI (which is error-prone, since you cannot reproduce the exact CI environment) or change the configuration so that the images build on your branch (and then revert that change afterwards).

It would be a nicer developer experience if changes to the image building code were testable in essentially the same way changes to any other code are testable - push changes to a branch, let CI run against that branch, see if CI succeeds or not.
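One ingredient of branch-testable image builds is keeping images built from a branch separate from the ones master-based jobs consume. The following is only a minimal sketch of that idea, not the project's configuration: the function name `image_tag_for_branch`, the `branch-` prefix, and the use of `latest` for master are all hypothetical. It relies only on Docker's rule that a tag may contain letters, digits, underscores, periods, and hyphens, and may be at most 128 characters long.

```python
import re


def image_tag_for_branch(branch, stable_branch="master"):
    """Derive a Docker image tag from a git branch name.

    Hypothetical naming scheme: the stable branch maps to "latest" and
    every other branch gets its own "branch-..." tag, so a test build
    never overwrites the image that master-based CI jobs pull.
    """
    if branch == stable_branch:
        return "latest"
    # Replace characters that are not valid in a Docker tag.
    safe = re.sub(r"[^A-Za-z0-9_.-]", "-", branch)
    # Docker tags are limited to 128 characters.
    return ("branch-" + safe)[:128]
```

A CI job on a branch could then build and test against its own tag, and only builds from master would touch the shared image.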

Apart from those issues, another problem is that the image-building CI jobs will push the images they build as long as the build succeeds. The build may succeed and still include an incompatible version of some dependency (e.g. because it was just released and the builders pull the latest version of many dependencies).

It would be nice if new images were only pushed if they worked at least as well as the image they were replacing. This would let normal Tahoe-LAFS development continue undisturbed when a dependency publishes an incompatible release. When a developer has a chance to look at the issue, they can then address the problem. Once the problem is resolved in Tahoe-LAFS the image builder would be unblocked to push new images.
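The push condition described above could be sketched roughly as follows. This is one possible reading of "worked at least as well", under the assumption that the test suite is run against both the current image and the candidate image and that the names of failing tests are collected for each. The function `should_push` and its inputs are hypothetical and not part of the existing CI configuration.

```python
def should_push(baseline_failures, candidate_failures):
    """Decide whether a newly built image may replace the current one.

    baseline_failures: names of tests that fail on the currently
        published image.
    candidate_failures: names of tests that fail on the new image.

    The candidate is acceptable if it introduces no failure that the
    baseline does not already have.
    """
    new_failures = set(candidate_failures) - set(baseline_failures)
    return not new_failures
```

With this rule, a candidate image that picks up an incompatible dependency release fails tests the old image passes, so it is not pushed and jobs keep using the old image until the incompatibility is fixed in Tahoe-LAFS.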

These two things may really be independent problems with independent solutions and if so then this ticket should be split in half (if not further). I describe both of the problems here because they seem very interconnected to me, partly due to the constraints placed on us by the capabilities of the CI systems we rely on.

exarkun added the dev-infrastructure, normal, defect, n/a labels 2020-10-22 19:01:02 +00:00
exarkun added this to the undecided milestone 2020-10-22 19:01:02 +00:00
Author

Some pieces of the description are now out-of-date. The images are no longer built on a schedule. They are only built when a developer explicitly requests this using `.circleci/rebuild-images.sh`.

The other parts are still relevant, though.

Reference: tahoe-lafs/trac-2024-07-25#3484