Cli fs operations refactoring #640
Each of the CLI commands implements its own logic for doing fs operations, both local-side and Tahoe-side. I think that most of the command modules should be refactored to use common code, for many reasons, among them:
The steps needed to do that, as far as I can see:
Identify the needs of each command in terms of fs manipulation;
Look at the best implementation available and, if necessary, improve it;
Maybe design a minimal layer that helps with handling alias/path/cap-based operations and other cases (see the rough sketch at the end of this comment).
Other interesting points to discuss:
Let's discuss it!
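To make the "minimal layer" idea above a bit more concrete, here is a rough, purely hypothetical sketch of what a shared path-resolution helper might look like. The function name, the aliases mapping, and the exact behaviour are illustrative assumptions, not the existing allmydata.scripts API.

# Hypothetical sketch of a shared helper for alias/path/cap handling.
# Names and behaviour are illustrative, not the existing Tahoe-LAFS API.

def resolve_location(arg, aliases):
    """Split a CLI argument into a (root_cap, relative_path) pair.

    `arg` may be a raw capability ("URI:..."), an "alias:path" form,
    or a plain path interpreted relative to the default "tahoe:" alias.
    `aliases` maps alias names to root capabilities.
    """
    if arg.startswith("URI:"):
        # A bare capability is its own root.
        return arg, ""
    alias, sep, rest = arg.partition(":")
    if sep and alias in aliases:
        return aliases[alias], rest
    # No recognized alias prefix: treat the whole argument as a path
    # under the default alias.
    return aliases.get("tahoe", ""), arg

Every command (cp, mv, ls, ...) could then go through one such helper instead of re-implementing the alias/path/cap parsing itself.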
azazel: Those are good reasons! We already have reasonable test coverage for most cli commands, so after doing this refactoring it should be possible to look at the per-line code coverage output and see that the tests are thoroughly exercising the code in the new refactored version.
I don't have time to do this myself (see my blog for notes about what I'm doing), but I would happily review and accept patches that do this (once the buildbot is green).
That should probably be a separate ticket.
I read over the code to understand how to get from my shell into ./src/allmydata/scripts/* and it seems overly complex:
./bin/tahoe is a python script generated from ./bin/tahoe-script.template.
Q1: Why is this generated from a template? That seems unpythonic and unnecessary. When I diff the template and the generated script on my Mac, the only change is to replace "#!/bin/false" with "#!/usr/bin/python".
./bin/tahoe does a bunch of platform difference wrangling and eventually calls ./support/bin/tahoe.
Q2: Can any of the complexity in ./bin/tahoe be removed?
./support/bin/tahoe is very short and simply calls: load_entry_point('allmydata-tahoe==1.8.0', 'console_scripts', 'tahoe')()
Q3: Why does ./support/bin/tahoe exist? Why doesn't ./bin/tahoe call load_entry_point directly?
Q4: Why is the load_entry_point mechanism used? Is it necessary?
load_entry_point() resolves a mapping for "console_scripts" for tahoe to ./src/allmydata/scripts/runner.py's run() function. This mapping is configured in ./setup.py (a simplified example of such a declaration appears after this comment).
Q5: Why doesn't ./bin/tahoe call runner.py's run() directly?
Q6: Why are there two separate packaging processes necessary to make the commandline work? One being script template generation, the other being entry_point registration.
My perspective probably doesn't account for use cases, but I hope by asking these questions we might notice potential simplifications.
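For reference, the console_scripts mapping discussed in the comment above is declared in setup.py roughly as follows. This is a simplified, illustrative excerpt (the real setup.py contains much more), but the "tahoe = allmydata.scripts.runner:run" line is the mapping that load_entry_point() resolves.

# Simplified, illustrative excerpt of a console_scripts declaration in setup.py.
from setuptools import setup

setup(
    name="allmydata-tahoe",
    # ... other metadata and dependencies omitted ...
    entry_points={
        "console_scripts": [
            # "tahoe" on the command line maps to run() in
            # src/allmydata/scripts/runner.py
            "tahoe = allmydata.scripts.runner:run",
        ],
    },
)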
Replying to nejucomo:
Re Q1: The intention is that the script runs with the same Python interpreter that ran setup.py build. A shebang line must specify an absolute path. It would be possible to use "/usr/bin/env python", but note that some of the built dependencies contain compiled code that is specific to a Python version, so upgrading the system Python (the one first on the PATH) would then cause an existing installation to fail.
Re Q2: Yes. Most of the complexity is just due to the fact that we have two scripts (this one and support/bin/tahoe). The reason we need two scripts is that the one generated by setuptools doesn't set up the sys.path as we want, and setting the PYTHONPATH environment variable is the documented way to change the initial sys.path. The Windows-specific code is necessary in order to pass on Unicode arguments correctly to the second script, which would not be needed if we only had one script.
Re Q3: support/bin/tahoe is the entry script generated by setuptools. I presume that originally we considered the contents of that script to be an implementation detail of setuptools, and therefore didn't want to duplicate it. Since then, we have ended up changing how setuptools generates scripts, and we would probably be better off just setting sys.path ourselves. That is what Brian's "unsuck branch" does, IIUC.
Re Q4: Sort of. Some mechanism to put the right copy of Tahoe and its dependent libraries on the sys.path is necessary, and load_entry_point is the setuptools way of doing that.
Re Q5: The sys.path would be wrong.
Re Q6: A combination of overcomplication in the setuptools design, and our workarounds for setuptools bugs.
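To illustrate the "setting sys.path ourselves" alternative mentioned in the answer to Q3, here is a minimal hypothetical launcher sketch. It is not the actual bin/tahoe nor the unsuck branch; the directory layout and the assumption that runner.run() can simply be called like this are illustrative.

#!/usr/bin/env python
# Hypothetical single launcher script: set sys.path directly and call
# the runner, instead of going through a setuptools-generated entry
# script. Paths and the exact call into runner are assumptions.
import os
import sys

base = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
# Put the in-tree source (and any locally built dependencies) first on sys.path.
sys.path.insert(0, os.path.join(base, "src"))

from allmydata.scripts import runner

if __name__ == "__main__":
    runner.run()

This trades the setuptools entry-point indirection for an explicit, visible sys.path setup, at the cost of having to keep that path logic correct ourselves.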