aboutsummaryrefslogtreecommitdiff
path: root/b4/pr.py
diff options
context:
space:
mode:
authorKonstantin Ryabitsev <konstantin@linuxfoundation.org>2021-05-18 16:33:29 -0400
committerKonstantin Ryabitsev <konstantin@linuxfoundation.org>2021-05-18 16:33:29 -0400
commit4950093c0c3ee71e7045b545626d2b232271cbc8 (patch)
treed2edceff7ce0814ae8caf0bce048f12f4bd1fa28 /b4/pr.py
parent6bec820cb9d015e99e4f5bf375fba098ba5b2f30 (diff)
downloadb4-4950093c0c3ee71e7045b545626d2b232271cbc8.tar.gz
Don't use mboxo for anything
While trying to figure out some odd DKIM failures, I've discovered that there is an important incompatibility between git's idea of what "mbox" format is, and Python's mboxo implementation -- at least when it comes to treating "\nFrom " escapes. According to the "original mbox" standard, when a message body contains a "\nFrom " sequence, it should be converted to "\n>From " in order not to confuse the parser. When reading messages in that format, clients are supposed to back-convert "\n>From " into their original form. This is the so-called "mboxo" format, which is what Python's mailbox.mbox supports: https://docs.python.org/3/library/mailbox.html#mailbox.mbox The "mboxrd" format was created to avoid a corruption problem whereas a body that legitimately contains "\n>From " would be wrongly converted into "\nFrom " upon parsing the mailbox, so mboxrd standard requires that, when saving a mailbox, "\n>From " sequences are additionally escaped as "\n>>From ". This is the format public-inbox supports, so when we grab mailboxes from remote, they are in mboxrd format. Git will try to guess the format of the mbox file, but it will ONLY back-convert "\n>From " sequences when you specifically tell it that it's "mboxrd" format, even when it's in fact "mboxo": git am --patch-format=mboxrd If you don't force the mboxrd format, git-am will preserve all escaped "\n>From " lines as-is. We've been previously operating on the assumption that git-am's mbox support properly implements "mboxo", but this was wrong, resulting in some commits like the following: https://git.kernel.org/torvalds/c/137733d08f4a This large-ish change ditches all internal use of Python's mboxo. When asked to save mbox files, we will save them without any escaping, the way git-am (i.e. git-mailsplit) espects them. The same goes when we're outputting to stdout. There is also a way now to pass -M to both "b4 am" and "b4 mbox" that will save things as maildirs -- git-am supports this natively and thus avoids any possible parsing ambiguities. You can set a config option b4.save-maildirs=yes to make this the default behaviour. The fallout of this is fairly benign, if annoying. There is no situation in which a patch would have "\nFrom " as part of its body, so the problem only affected commit messages. We will have a handful of these sprinkled around the trees, and will hopefully not introduce any new ones once everyone switches to the b4 version that outputs things in the format git-am expects. Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Diffstat (limited to 'b4/pr.py')
-rw-r--r--b4/pr.py27
1 files changed, 8 insertions, 19 deletions
diff --git a/b4/pr.py b/b4/pr.py
index 374d8ed..8e8fdd7 100644
--- a/b4/pr.py
+++ b/b4/pr.py
@@ -393,19 +393,15 @@ def explode(gitdir, lmsg, mailfrom=None, retrieve_links=True, fpopts=None):
# Did we already retrieve it as part of a previous tread?
if msgid in seen_msgids:
continue
- savefile = mkstemp()[1]
- mboxfile = b4.get_pi_thread_by_msgid(msgid, savefile)
- if mboxfile is not None:
- # Open it and append any messages we don't yet have
- ambx = mailbox.mbox(mboxfile)
- for amsg in ambx:
+ msgs = b4.get_pi_thread_by_msgid(msgid)
+ if msgs:
+ # Append any messages we don't yet have
+ for amsg in msgs:
amsgid = b4.LoreMessage.get_clean_msgid(amsg)
if amsgid not in seen_msgids:
seen_msgids.add(amsgid)
logger.debug('Added linked: %s', amsg.get('Subject'))
tmbx.add(amsg.as_string(policy=b4.emlpolicy).encode())
- ambx.close()
- os.unlink(savefile)
if len(tmbx):
tmbx.close()
@@ -441,21 +437,14 @@ def main(cmdargs):
logger.debug('Getting PR message from public-inbox')
msgid = b4.get_msgid(cmdargs)
- savefile = mkstemp()[1]
- mboxfile = b4.get_pi_thread_by_msgid(msgid, savefile)
- if mboxfile is None:
- os.unlink(savefile)
+ msgs = b4.get_pi_thread_by_msgid(msgid)
+ if not msgs:
return
- # Find the message with the msgid we were asked about
- mbx = mailbox.mbox(mboxfile)
- for msg in mbx:
+ for msg in msgs:
mmsgid = b4.LoreMessage.get_clean_msgid(msg)
if mmsgid == msgid:
lmsg = parse_pr_data(msg)
-
- # Got all we need from it
- mbx.close()
- os.unlink(savefile)
+ break
if lmsg is None or lmsg.pr_remote_tip_commit is None:
logger.critical('ERROR: Could not find pull request info in %s', msgid)