diff options
author | Konstantin Ryabitsev <konstantin@linuxfoundation.org> | 2021-05-18 16:33:29 -0400 |
---|---|---|
committer | Konstantin Ryabitsev <konstantin@linuxfoundation.org> | 2021-05-18 16:33:29 -0400 |
commit | 4950093c0c3ee71e7045b545626d2b232271cbc8 (patch) | |
tree | d2edceff7ce0814ae8caf0bce048f12f4bd1fa28 /tests/__init__.py | |
parent | 6bec820cb9d015e99e4f5bf375fba098ba5b2f30 (diff) | |
download | b4-4950093c0c3ee71e7045b545626d2b232271cbc8.tar.gz |
Don't use mboxo for anything
While trying to figure out some odd DKIM failures, I've discovered that
there is an important incompatibility between git's idea of what "mbox"
format is, and Python's mboxo implementation -- at least when it comes
to treating "\nFrom " escapes.
According to the "original mbox" standard, when a message body contains
a "\nFrom " sequence, it should be converted to "\n>From " in order not
to confuse the parser. When reading messages in that format, clients are
supposed to back-convert "\n>From " into their original form. This is
the so-called "mboxo" format, which is what Python's mailbox.mbox
supports:
https://docs.python.org/3/library/mailbox.html#mailbox.mbox
The "mboxrd" format was created to avoid a corruption problem whereas a
body that legitimately contains "\n>From " would be wrongly converted
into "\nFrom " upon parsing the mailbox, so mboxrd standard requires
that, when saving a mailbox, "\n>From " sequences are additionally
escaped as "\n>>From ". This is the format public-inbox supports, so
when we grab mailboxes from remote, they are in mboxrd format.
Git will try to guess the format of the mbox file, but it will ONLY
back-convert "\n>From " sequences when you specifically tell it that
it's "mboxrd" format, even when it's in fact "mboxo":
git am --patch-format=mboxrd
If you don't force the mboxrd format, git-am will preserve all escaped
"\n>From " lines as-is.
We've been previously operating on the assumption that git-am's mbox
support properly implements "mboxo", but this was wrong, resulting in
some commits like the following:
https://git.kernel.org/torvalds/c/137733d08f4a
This large-ish change ditches all internal use of Python's mboxo. When
asked to save mbox files, we will save them without any escaping, the
way git-am (i.e. git-mailsplit) espects them. The same goes when we're
outputting to stdout.
There is also a way now to pass -M to both "b4 am" and "b4 mbox" that
will save things as maildirs -- git-am supports this natively and thus
avoids any possible parsing ambiguities. You can set a config option
b4.save-maildirs=yes to make this the default behaviour.
The fallout of this is fairly benign, if annoying. There is no situation
in which a patch would have "\nFrom " as part of its body, so the
problem only affected commit messages. We will have a handful of these
sprinkled around the trees, and will hopefully not introduce any new
ones once everyone switches to the b4 version that outputs things in the
format git-am expects.
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Diffstat (limited to 'tests/__init__.py')
0 files changed, 0 insertions, 0 deletions