file formatting conventions for text files in our source repo #2138

Open
opened 2013-12-17 22:20:24 +00:00 by zooko · 7 comments

This makes it so that emacs knows the intended character encoding, BOM, end-of-line markers, standard line-width, and tabs-vs-spaces policy for these files.

This is also a form of documentation. It means that you should put only utf-8-encoded things into text files and into source code files (and in source code you should actually put only ASCII-encoded things, except possibly in comments or docstrings!), and that you should line-wrap everything at 77 columns wide.
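As a rough illustration of what these two rules mean in practice, here is a minimal Python 3 sketch (the script attached to this ticket is Python 2). The function name `check_text` and its `width` parameter are hypothetical, not part of the patch. It is deliberately stricter than the rule above: it flags every non-ASCII line for review, including comments and docstrings where non-ASCII is allowed, and it expects text that has already been decoded with any BOM stripped.

```python
def check_text(text, width=77):
    """Return (line_number, problem) pairs for lines that need a look.

    Hypothetical helper: flags lines wider than `width` columns and lines
    containing non-ASCII characters (even where those would be allowed).
    """
    problems = []
    for number, line in enumerate(text.split("\n"), start=1):
        if len(line) > width:
            problems.append((number, "longer than %d columns" % width))
        if any(ord(ch) > 127 for ch in line):
            problems.append((number, "contains non-ASCII characters"))
    return problems


sample = "short line\n" + "x" * 78 + "\ncaf\u00e9 = 1\n"
for number, problem in check_text(sample):
    print("line %d: %s" % (number, problem))
```

Counting characters is only an approximation of columns; wide or combining characters would need more care.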

It also specifies that text files should start with a "utf-8 BOM". (Brian questions the point of this, and my answer is that it adds information and doesn't hurt. Whether that information will ever be useful is an open question.)
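For readers unfamiliar with the "utf-8 BOM": it is the character U+FEFF encoded as the three bytes EF BB BF at the very start of the file. The Python 3 sketch below uses the standard `codecs.BOM_UTF8` constant and the `utf-8-sig` codec; the helper names `has_utf8_bom` and `add_utf8_bom` are made up for illustration.

```python
import codecs


def has_utf8_bom(data):
    """True if the byte string starts with the UTF-8 encoded BOM."""
    return data.startswith(codecs.BOM_UTF8)


def add_utf8_bom(data):
    """Prepend the BOM unless one is already present."""
    return data if has_utf8_bom(data) else codecs.BOM_UTF8 + data


raw = add_utf8_bom("hello\n".encode("utf-8"))
print(raw[:3] == b"\xef\xbb\xbf")       # the BOM is three bytes
print(repr(raw.decode("utf-8-sig")))    # 'utf-8-sig' strips the BOM
print(repr(raw.decode("utf-8")))        # plain 'utf-8' keeps it as U+FEFF
```

The last line shows the practical cost: a reader that decodes as plain UTF-8 sees an extra U+FEFF character at the start of the text.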

It also specifies that text files should have unix-style end-of-line markers (i.e. '\n'), not windows-style or old-macos-style.
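Converting a file to unix-style line endings can be sketched as follows (Python 3, operating on bytes so that no implicit newline translation happens; `to_unix_eol` is a hypothetical name):

```python
def to_unix_eol(data):
    """Convert windows-style (CR LF) and old-macos-style (CR) endings to LF."""
    # Replace CR LF first, otherwise each CR LF pair would become two LFs.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


print(to_unix_eol(b"one\r\ntwo\rthree\n"))
```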

For Python source code files, it also specifies that you should not insert tab characters (so you should use spaces for Python block structure).
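A check for this rule might look like the Python 3 sketch below. It only inspects leading whitespace, on the assumption that indentation is what matters for block structure; tabs inside string literals are ignored. The name `lines_with_tab_indent` is hypothetical.

```python
def lines_with_tab_indent(source):
    """Return the 1-based numbers of lines whose indentation contains a tab."""
    found = []
    for number, line in enumerate(source.split("\n"), start=1):
        # Everything before the first non-space, non-tab character.
        indent = line[:len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            found.append(number)
    return found


code = "def f():\n\treturn 1\n    x = '\t'\n"
print(lines_with_tab_indent(code))
```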

I generated this patch by writing and running the following script, and then reading the resulting diff to make sure it was correct. I then undid the changes that the script had done to the files inside the "setuptools-0.6c16dev4.egg" directory before committing the patch.

------- begin appended script::

#!/usr/bin/env python
# -*- coding: utf-8-with-signature-unix; fill-column: 77 -*-

# Note: this is Python 2 code (print statement, str.decode, 'rU' mode).

import os

# Comment prefix for the magic header line, keyed by file extension.
magic_header_line_comment_prefix = {
    '.py': u"# ",
    '.rst': u".. ",
    }

def format():
    for dirpath, dirnames, filenames in os.walk('.'):
        for filename in filenames:
            ext = os.path.splitext(filename)[-1]
            if ext in ('.py', '.rst'):
                fname = os.path.join(dirpath, filename)
                # 'rU' (universal newlines) turns every line ending into '\n'.
                info = open(fname, 'rU')
                formattedlines = [ line.decode('utf-8') for line in info ]
                info.close()

                # Leave empty files alone.
                if len(formattedlines) == 0:
                    continue

                # Binary mode so that '\n' is written as-is on every platform.
                outfo = open(fname, 'wb')
                outfo.write(u"\ufeff".encode('utf-8'))

                commentsign = magic_header_line_comment_prefix[ext]

                # Strip any BOM already present; one was written just above.
                firstline = formattedlines.pop(0)
                while firstline.startswith(u"\ufeff"):
                    firstline = firstline[len(u"\ufeff"):]
                if firstline.startswith(u"#!"):
                    # Keep a shebang line first; the header follows it.
                    outfo.write(firstline.encode('utf-8'))
                    outfo.write((commentsign+u"-*- coding: utf-8-with-signature-unix; fill-column: 77 -*-\n").encode('utf-8'))
                    if ext == '.py':
                        outfo.write((commentsign+u"-*- indent-tabs-mode: nil -*-\n").encode('utf-8'))
                else:
                    outfo.write((commentsign+u"-*- coding: utf-8-with-signature-unix; fill-column: 77 -*-\n").encode('utf-8'))
                    if ext == '.py':
                        outfo.write((commentsign+u"-*- indent-tabs-mode: nil -*-\n").encode('utf-8'))
                    # An existing coding line is dropped in favour of the new one.
                    if (firstline.strip().startswith(commentsign)) and ("-*-" in firstline) and ("coding:" in firstline):
                        print "warning there was already a coding line %r in %r"  % (firstline, fname)
                    else:
                        outfo.write(firstline.encode('utf-8'))

                # Copy the remaining lines, dropping any other coding lines.
                for l in formattedlines:
                    if (l.strip().startswith(commentsign)) and ("-*-" in l) and ("coding:" in l):
                        print "warning there was already a coding line %r in %r"  % (l, fname)
                    else:
                        outfo.write(l.encode('utf-8'))
                outfo.close()

if __name__ == '__main__':
    format()
zooko added the unknown, normal, enhancement and 1.10.0 labels 2013-12-17 22:20:24 +00:00
zooko added this to the undecided milestone 2013-12-17 22:20:24 +00:00
Author
(https://github.com/tahoe-lafs/tahoe-lafs/pull/77)
Author

Oh wait, there is a bug in my script. Also I want to add indent-tabs-mode: nil since I just now had a bug due to automatically inserted tabs! brb.

Author
pull request: https://github.com/tahoe-lafs/tahoe-lafs/pull/78
daira commented 2013-12-19 00:39:24 +00:00
Owner

I don't agree that the fill column for Python code should be 77. It's unnecessarily short and not consistent with the line wrapping in the majority of our existing code. wiki/CodingStandards says:

> Ignore the part of PEP-8 which specifies 79- or 72- char line widths. Lines should preferably be less than 100 columns, but we don't enforce this strictly. It is more important to break lines at points that are natural for readability than to follow a fixed line width restriction. Where possible, continuation lines should be indented as far as necessary to make them match up with the subexpression (e.g. argument list) they belong to.

daira commented 2013-12-19 00:49:35 +00:00
Owner

It's not clear that "-*- coding: utf-8-with-signature-unix; fill-column: 77 -*-" is a valid Python source encoding declaration according to PEP 263 (http://www.python.org/dev/peps/pep-0263/). I don't know what Python actually implements, but the PEP doesn't require it to accept the full syntax accepted by emacs.
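One way to probe this question is sketched below in Python 3. It applies the regular expression from the current text of PEP 263 to the proposed header line, then asks the standard library's `tokenize.detect_encoding` what it makes of the same line. This shows Python 3 behaviour only; the ticket predates the project's move off Python 2, whose handling may differ, so treat the result as indicative rather than as an answer for the interpreter in use at the time.

```python
import codecs
import io
import re
import tokenize

# Regular expression quoted from the current text of PEP 263.
PEP263_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")

line = "# -*- coding: utf-8-with-signature-unix; fill-column: 77 -*-\n"

# Step 1: the PEP's pattern does match, and captures the emacs-style name.
declared = PEP263_RE.match(line).group(1)
print(declared)

# Step 2: what Python 3's tokenize module reports for that line,
# first without and then with a leading UTF-8 BOM.
data = line.encode("ascii")
plain, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
with_bom, _ = tokenize.detect_encoding(
    io.BytesIO(codecs.BOM_UTF8 + data).readline)
print(plain, with_bom)
```

In Python 3 the tokenizer normalises any declared name beginning with `utf-8-` to `utf-8`, which is why the emacs-specific suffix does not cause a lookup error there.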

On the issue of BOMs, I'm with Brian in not seeing the point in adding them, especially for Python source code files that don't actually contain any non-ASCII characters (i.e. the vast majority of files in the Tahoe-LAFS source). And I do see harm in patches that touch large numbers of files, since that can create conflicts with other work in progress (granted, only trivial conflicts).

daira commented 2013-12-19 00:51:58 +00:00
Owner

Note that enabling editor auto-wrapping for source code is generally a bad idea anyway. Wrapping should be done manually.

tahoe-lafs added the code label and removed the unknown label 2013-12-19 00:53:43 +00:00
daira commented 2015-02-09 01:51:25 +00:00
Owner

Comments comment:94261, comment:94262 and comment:94263 are my review; generally -1.

Reference: tahoe-lafs/trac-2024-07-25#2138