path: root/b4/__init__.py
Age | Commit message | Author
2021-08-16 | When deduping, prefer DKIM-validating messages | Konstantin Ryabitsev
With newer lore.kernel.org and /all/, we get duplicate messages when message bodies are different due to one of the messages passing through a DKIM-compliant list, and another one through something that injects in-body or in-subject junk. When dealing with duplicates, check both for DKIM status and prefer the message that actually passes DKIM validation.
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
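A minimal sketch of the deduplication rule, assuming each message has already been parsed and DKIM-checked. The `Msg` class and `dedupe_prefer_dkim()` are hypothetical stand-ins, not b4's own types.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Msg:
    # Hypothetical stand-in for a parsed message with a precomputed DKIM result
    msgid: str
    body: str
    dkim_passed: bool


def dedupe_prefer_dkim(msgs: List[Msg]) -> List[Msg]:
    """Keep one message per Message-ID, preferring a copy that passes DKIM."""
    chosen: Dict[str, Msg] = {}
    for msg in msgs:
        current = chosen.get(msg.msgid)
        if current is None:
            chosen[msg.msgid] = msg
        elif msg.dkim_passed and not current.dkim_passed:
            # Swap out a junk-injected, non-validating duplicate
            chosen[msg.msgid] = msg
    # dict keeps first-seen order of Message-IDs
    return list(chosen.values())
```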
2021-08-05 | Handle decoding incorrectly encoded headers | Konstantin Ryabitsev
Sometimes the encoding indicated in the header lies, and the content is not actually in that codepage at all. When that happens, just replace errors and continue.
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
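A sketch of lenient header decoding with the standard `email.header.decode_header()`, assuming "replace errors" means `errors='replace'` on each chunk. The UTF-8 fallback for unknown charset names is an extra assumption, not something the commit states.

```python
from email.header import decode_header


def decode_header_lenient(value: str) -> str:
    """Decode RFC 2047 encoded-words, replacing bytes the charset can't decode."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, str):
            # decode_header() returns str when there were no encoded-words at all
            parts.append(chunk)
            continue
        try:
            parts.append(chunk.decode(charset or 'us-ascii', errors='replace'))
        except LookupError:
            # Unknown charset name (assumption): fall back to UTF-8, still lenient
            parts.append(chunk.decode('utf-8', errors='replace'))
    return ''.join(parts)
```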
2021-08-03 | Parse just headers when extracting message ID from stdin mbox | Kyle Meyer
When the mbox and am subcommands grab a message ID from the mbox on stdin, they call message_from_bytes(), which in turn calls BytesParser().parsebytes(s). parsebytes() has a headersonly parameter that can be used to tell it to stop parsing after reading the headers. The headers are all that's needed here, so use BytesParser directly and set headersonly.
Signed-off-by: Kyle Meyer <kyle@kyleam.com>
Link: https://lore.kernel.org/tools/20210717164836-mutt-send-email-mst@kernel.org/
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
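A minimal sketch of the header-only parse described above, using the standard `BytesParser.parsebytes(..., headersonly=True)`. The function name is hypothetical; in practice the input would come from `sys.stdin.buffer.read()`, which yields bytes and so also sidesteps the locale decoding problem addressed in the next commit.

```python
from email.parser import BytesParser
from typing import Optional


def msgid_from_mbox_bytes(data: bytes) -> Optional[str]:
    """Return the Message-ID (without angle brackets) of the first message."""
    # headersonly=True stops after the header block; a leading 'From ' line
    # is recorded as the unixfrom, and the body is never parsed
    msg = BytesParser().parsebytes(data, headersonly=True)
    msgid = msg.get('Message-ID')
    if msgid is None:
        return None
    return msgid.strip().strip('<>')
```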
2021-08-03 | Avoid decoding errors when extracting message ID from stdin | Kyle Meyer
The mbox, am, and pr subcommands accept an mbox on stdin and extract the message ID. When stdin.read() is called, Python assumes the encoding is locale.getpreferredencoding(False). This may not match the content encoding, leading to a decoding error. Instead, feed the stdin bytes to message_from_bytes(), which leads to a decode('ASCII', errors='surrogateescape') underneath. That's sufficient to get the message ID from the ASCII headers.
Reported-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Kyle Meyer <kyle@kyleam.com>
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
2021-08-03 | Don't consider signature contents for trailers | Konstantin Ryabitsev
Drop anything in the body below "-- " before parsing the contents for trailers. This won't catch all possible situations, as the "-- " separator convention is dying out, so also add a list of known baddies like "Phone:" and "Email:" that are likely to trip us up.
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/tools/20210719213535.vw3u4yg5mgxqysaf@pengutronix.de/
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
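A rough sketch of the two rules in this commit: stop at the "-- " signature separator, and skip names on a denylist. The regex and `find_trailers()` are hypothetical simplifications; b4's real trailer parser is more involved.

```python
import re
from typing import List, Tuple

# "Name-Of-Trailer: value" at the start of a line (simplified)
TRAILER_RE = re.compile(r'^([A-Za-z0-9][\w-]*):\s+(.+)$')
# Signature-style lines that look like trailers but aren't (from the commit text)
NOT_TRAILERS = {'phone', 'email'}


def find_trailers(body: str) -> List[Tuple[str, str]]:
    trailers = []
    for line in body.splitlines():
        if line == '-- ':
            # Everything below the signature separator is ignored
            break
        m = TRAILER_RE.match(line)
        if m and m.group(1).lower() not in NOT_TRAILERS:
            trailers.append((m.group(1), m.group(2).strip()))
    return trailers
```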
2021-06-22 | Allow '.git' to be a file for worktrees | Rob Herring
With multiple git worktrees, '.git' can be a file pointing to the real '.git' directory, so the current check for a directory is too strict.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://lore.kernel.org/r/20210621204335.1627303-1-robh@kernel.org
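The relaxed check boils down to accepting either a directory or a file named `.git`. A minimal sketch (the function name is hypothetical):

```python
import os


def looks_like_git_worktree(topdir: str) -> bool:
    """True if topdir has a .git directory or a .git file (linked worktree)."""
    dotgit = os.path.join(topdir, '.git')
    # os.path.isdir() alone would reject worktrees, where .git is a
    # "gitdir: ..." pointer file; os.path.exists() accepts both forms
    return os.path.exists(dotgit)
```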
2021-06-22 | Additional --guess-base refinements | Konstantin Ryabitsev
Use --all by default, instead of limiting ourselves just to the current HEAD. This is actually a faster operation, because we don't have to pre-filter results.
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
2021-06-21 | Reimplement --guess-base | Konstantin Ryabitsev
Based on some feedback, attempt to reimplement --guess-base by looking at the file index hashes and using --find-object to locate when they were last changed. We limit this using --since and --until, so that we aren't trying to look through the entire history of the repo. For the --until date, we take the date of the patch. For the --since date, we subtract a timedelta of --guess-lookback days (default is 14) from the patch date.
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
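A sketch of the git-log query implied by this and the previous commit: look up one blob hash from a patch's `index` line with `--find-object`, across `--all` refs, inside a `--since`/`--until` window. It only builds the argument list; the function name and the exact date format passed to git are assumptions.

```python
import datetime
from typing import List


def guess_base_cmd(blob_hash: str, patch_date: datetime.datetime,
                   lookback_days: int = 14) -> List[str]:
    """Build a git-log command finding commits that touched blob_hash
    between (patch_date - lookback_days) and patch_date."""
    since = patch_date - datetime.timedelta(days=lookback_days)
    fmt = '%Y-%m-%d %H:%M:%S %z'
    return [
        'git', 'log', '--all', '--oneline',
        f'--find-object={blob_hash}',
        f'--since={since.strftime(fmt)}',
        f'--until={patch_date.strftime(fmt)}',
    ]
```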
2021-06-17 | Don't append .git unnecessarily | Konstantin Ryabitsev
We already do this automatically elsewhere, so this causes a problem if we do it again.
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
2021-06-11 | Save mbox files with proper unixfrom | Konstantin Ryabitsev
In order to avoid some of the more obscure charset encoding problems, we switched to using as_string() for generating messages before saving them in an mbox file. However, this uncovered a bug where the unixfrom was not actually generated and saved, despite as_bytes() and as_string() supposedly behaving identically. See:
https://docs.python.org/3/library/email.message.html#email.message.EmailMessage.as_string
This commit fixes the problem by properly setting the unixfrom and using the recommended (and hopefully less buggy) email.generator interface when saving mailboxes.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
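A sketch of writing one message with its unixfrom through `email.generator.BytesGenerator`. The separator line's date part is a placeholder assumption (the "From git@z" prefix appears in a later commit), and `mangle_from_=False` reflects the later decision to write mbox files without "From " escaping.

```python
import email.generator
import email.message
import email.policy
import io

# Placeholder separator; only the "From git@z " prefix comes from b4's history
DEFAULT_UNIXFROM = 'From git@z Thu Jan  1 00:00:00 1970'


def msg_to_mbox_bytes(msg: email.message.EmailMessage,
                      unixfrom: str = DEFAULT_UNIXFROM) -> bytes:
    """Serialize one message for an mbox, including its 'From ' separator."""
    msg.set_unixfrom(unixfrom)
    out = io.BytesIO()
    # mangle_from_=False: body lines starting with 'From ' are written as-is
    gen = email.generator.BytesGenerator(out, mangle_from_=False,
                                         policy=email.policy.default)
    gen.flatten(msg, unixfrom=True)
    return out.getvalue()
```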
2021-06-10 | Start using pytest for the test framework | Konstantin Ryabitsev
Since we're not caring about 2.x compatibility, pytest seems to be a good candidate for this job. Obviously, there's a lot of ground to cover, but the goal is to do all future modifications with tests added so we can reduce regressions.
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
2021-06-09 | Fix sloppy trailer handling | Konstantin Ryabitsev
When returning sloppy trailers, make sure we always return a 4-member list, which includes the provenant LoreMessage itself.
Reported-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
2021-06-07 | Save exploded pull requests as maildirs as well | Konstantin Ryabitsev
This moves maildir saving code into __init__.py so that we can benefit from it via other subcommands, such as pr.
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
2021-06-07 | Shuffle DKIM logging code around a bit | Konstantin Ryabitsev
PyCharm is unhappy with PEP conformance, so shuffle things around a bit to satisfy it.
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
2021-06-07 | Include dkim log output when -d/--debug argument is passed | Paul Barker
We can pass a logger object to dkim.verify() which will be used to report internal errors and debugging info. This can be helpful when investigating DKIM verification issues but is probably not wanted during normal operation, so the log level of each message is reset to DEBUG. Each message is also prefixed with 'DKIM: ' to identify its origin when debug output is enabled.
Signed-off-by: Paul Barker <paul@pbarker.dev>
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://lore.kernel.org/r/20210607100252.8253-3-paul@pbarker.dev
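One way to get both behaviours (force DEBUG, prefix "DKIM: ") is a `logging.LoggerAdapter` subclass. This is a sketch, not necessarily how b4 does it; passing it as `dkim.verify(data, logger=adapter)` relies on the logger parameter this commit describes.

```python
import logging


class DKIMDebugAdapter(logging.LoggerAdapter):
    """Re-route every record to DEBUG and tag it with a 'DKIM: ' prefix."""

    def process(self, msg, kwargs):
        return f'DKIM: {msg}', kwargs

    def log(self, level, msg, *args, **kwargs):
        # debug()/info()/error() etc. all funnel through log(); ignore the
        # caller's level so DKIM chatter only appears when debugging is on
        super().log(logging.DEBUG, msg, *args, **kwargs)
```

Usage, assuming a module-level logger: `adapter = DKIMDebugAdapter(logging.getLogger('b4'), {})`.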
2021-06-07 | Handle MIME encoded-word in DKIM-Signature headers | Paul Barker
As recently found in patatt [1], mail gateways and archivers may mangle headers like DKIM-Signature if they are sent as an excessively long line. An example of this occurring was found when the DKIM-Signature header generated by Microsoft Office 365 collided with the header re-encoding performed by lists.sr.ht when generating mbox archive files. This encoding causes dkim.verify() to fail.
The Python email.header module provides the decode_header() and make_header() functions, which can be used to handle MIME encoded-word syntax or other header manglings which may occur. Fixing up the header content using these functions before calling dkim.verify() allows the verification to succeed.
[1]: https://lore.kernel.org/tools/20210531140539.7630-1-paul@pbarker.dev/
Signed-off-by: Paul Barker <paul@pbarker.dev>
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://lore.kernel.org/r/20210607100252.8253-2-paul@pbarker.dev
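The header fix-up step can be sketched as a round-trip through `decode_header()` and `make_header()`. This covers only the unmangling; how the cleaned value is put back into the raw bytes handed to `dkim.verify()` is not shown.

```python
from email.header import decode_header, make_header


def unmangle_header(value: str) -> str:
    """Undo MIME encoded-word mangling of a header value (e.g. DKIM-Signature)."""
    # Values without encoded-words pass through unchanged
    return str(make_header(decode_header(value)))
```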
2021-06-03 | Account for in-body headers when trimming body | Konstantin Ryabitsev
When we discover that a message can only be attested after we trim the body, we *must* set the body to that version, otherwise an attacker could append arbitrary content past the l= value boundary. We already do this in the current form, but we weren't properly handling in-body headers like From: and Subject: that are used to indicate to git the patch author vs. committer. This patch set fixes that and also streamlines a few other places where we were already relying on git mailinfo calls.
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
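A sketch of trimming a body at the `l=` boundary of a DKIM-style tag list. It assumes the signed length can be applied to the raw body bytes; real DKIM counts canonicalized body bytes, and the in-body header handling this commit adds is not shown. Both function names are hypothetical.

```python
from typing import Dict


def parse_tag_list(value: str) -> Dict[str, str]:
    """Parse a DKIM-style tag list like 'v=1; a=rsa-sha256; l=123'."""
    tags = {}
    for part in value.split(';'):
        if '=' in part:
            key, _, val = part.partition('=')
            # Whitespace inside tag values is not significant here
            tags[key.strip()] = ''.join(val.split())
    return tags


def trim_to_signed_length(body: bytes, sig_value: str) -> bytes:
    """Drop anything past the l= boundary so unsigned trailing junk is discarded."""
    tags = parse_tag_list(sig_value)
    if 'l' not in tags:
        return body
    return body[:int(tags['l'])]
```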
2021-06-03 | Fix partial reroll TUI visuals for v1->v2 | Konstantin Ryabitsev
Before:
  ✓ [PATCH v2 1/8] selftests/x86: Test signal frame XSTATE header corruption handling
  ✓ [PATCH v2 2/8] x86/fpu: Prevent state corruption in __fpu__restore_sig()
  ✓ [PATCH 3/8] x86/fpu: Invalidate FPU state after a failed XRSTOR from a user buffer
  ✓ [PATCH 4/8] x86/fpu: Limit xstate copy size in xstateregs_set()
  ✓ [PATCH v2 5/8] x86/fpu: Sanitize xstateregs_set()
  ✓ [PATCH 6/8] x86/fpu: Add address range checks to copy_user_to_xstate()
  ✓ [PATCH 7/8] x86/fpu: Clean up the fpu__clear() variants
  ✓ [PATCH 8/8] x86/fpu: Deduplicate copy_xxx_to_xstate()
After:
  ✓ [PATCH v2 1/8] selftests/x86: Test signal frame XSTATE header corruption handling
  ✓ [PATCH v2 2/8] x86/fpu: Prevent state corruption in __fpu__restore_sig()
  ✓ [PATCH v1->v2 3/8] x86/fpu: Invalidate FPU state after a failed XRSTOR from a user buffer
  ✓ [PATCH v1->v2 4/8] x86/fpu: Limit xstate copy size in xstateregs_set()
  ✓ [PATCH v2 5/8] x86/fpu: Sanitize xstateregs_set()
  ✓ [PATCH v1->v2 6/8] x86/fpu: Add address range checks to copy_user_to_xstate()
  ✓ [PATCH v1->v2 7/8] x86/fpu: Clean up the fpu__clear() variants
  ✓ [PATCH v1->v2 8/8] x86/fpu: Deduplicate copy_xxx_to_xstate()
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
2021-06-02 | Implement trim_body support | Konstantin Ryabitsev
When a message has a developer signature but is failing the signature check, rerun the check with trim_body. If that passes, we know that the signature is failing due to mailing list junk appended to the bottom of the message. In that case, automatically trim the message body so we have exactly what the developer attested and signed.
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
2021-06-01 | Fix cache aging cleanup of threads | Rob Herring
The cache aging for threads was not running, resulting in failures to fetch new messages in threads. Fix the empty cache check, which should be for no '.msgs' directories.
Fixes: 4950093c0c3e ("Don't use mboxo for anything")
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://lore.kernel.org/r/20210601200835.940887-1-robh@kernel.org
2021-05-28 | Limit 'From mboxrd@z' replacement to start of message | Kyle Meyer
save_git_am_mbox() replaces 'From mboxrd@z ' with 'From git@z ' to make it clear that the output format is not mboxrd. However, all occurrences in the message are replaced, corrupting patches that contain 'From mboxrd@z '. Restrict the replacement to the first line of the message.
Signed-off-by: Kyle Meyer <kyle@kyleam.com>
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://lore.kernel.org/r/20210528042635.24959-1-kyle@kyleam.com
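A minimal sketch of a first-line-only replacement (function name hypothetical). Checking `startswith()` is stricter than `str.replace(old, new, 1)`, which would still rewrite an occurrence in the body if the first line didn't match.

```python
def relabel_mbox_separator(msg_text: str) -> str:
    """Rewrite a leading 'From mboxrd@z ' line without touching the body."""
    old, new = 'From mboxrd@z ', 'From git@z '
    if msg_text.startswith(old):
        return new + msg_text[len(old):]
    return msg_text
```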
2021-05-26 | Up version to final 0.7.0 (tag: v0.7.0) | Konstantin Ryabitsev
I think we are ready to go with the 0.7.0 release. There are always more tweaks to add, but at this point we can benefit from wider usage.
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
2021-05-26 | Check uids on the key when using default keyring | Konstantin Ryabitsev
When the signature is validated using the default keyring, run an additional check on the UIDs and show the discrepancy if the identity used in the X-Developer-Signature header is different from the UIDs we have on the key.
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
2021-05-25 | Don't depend on List-Archive lore header | Konstantin Ryabitsev
The newer version of public-inbox is not injecting its own List-Archive headers, so stop relying on it for any purpose.
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
2021-05-21 | Tweak lore.kernel.org match | Konstantin Ryabitsev
Be a bit more discerning about the header matches for lore.kernel.org.
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
2021-05-21 | Strip any List-* headers matching lore | Konstantin Ryabitsev
Our version of public-inbox still adds List-* headers of its own. This is gone in the newer version, so strip these in hopes that this helps verify more DKIM signatures.
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
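A sketch of stripping lore-added List-* headers while keeping the list's own ones. The matching rule (a plain substring test for lore.kernel.org) is an assumption; the later "Tweak lore.kernel.org match" commit suggests b4's real test is stricter. Because `del msg[name]` removes every header of that name, the non-matching ones are re-appended in their original relative order.

```python
from email.message import Message


def strip_lore_list_headers(msg: Message) -> None:
    """Drop List-* headers that point at lore.kernel.org, in place."""
    list_hdrs = [(k, v) for k, v in msg.items() if k.lower().startswith('list-')]
    for name in {k for k, _ in list_hdrs}:
        del msg[name]  # removes every header with this name (case-insensitive)
    for name, value in list_hdrs:
        if 'lore.kernel.org' not in str(value):
            msg[name] = value  # re-append the headers the list itself added
```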
2021-05-21 | Handle partial reroll of series without cover | Konstantin Ryabitsev
A series may not have a cover letter, so properly handle that situation.
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
2021-05-20 | Reimplement single-msgid cherrypicking | Konstantin Ryabitsev
When processing -P_, filter by that msgid (and its follow-ups) early on, instead of parsing the entire thread and only then looking for matches.
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
2021-05-20 | Initial support for Obsoleted-by: trailer | Konstantin Ryabitsev
Per discussion on the users list, add initial support for the "Obsoleted-by" trailer that points at the new revision for the series instead of doing a blind match by subject+from. Probably buggy and needs better support for series number collisions (right now we don't check if the newly retrieved series has a revision number greater than the revision we already have).
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
2021-05-20 | Minor visual tweak in output | Konstantin Ryabitsev
Group patch output inside the indented ---, and all processing messages before the indent.
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
2021-05-20 | Fix a crash on incomplete/missing threads | Konstantin Ryabitsev
Properly handle the situation where we can get a None as well as an empty message list.
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
2021-05-18 | Don't use mboxo for anything | Konstantin Ryabitsev
While trying to figure out some odd DKIM failures, I've discovered that there is an important incompatibility between git's idea of what "mbox" format is and Python's mboxo implementation, at least when it comes to treating "\nFrom " escapes.
According to the "original mbox" standard, when a message body contains a "\nFrom " sequence, it should be converted to "\n>From " in order not to confuse the parser. When reading messages in that format, clients are supposed to back-convert "\n>From " into their original form. This is the so-called "mboxo" format, which is what Python's mailbox.mbox supports:
https://docs.python.org/3/library/mailbox.html#mailbox.mbox
The "mboxrd" format was created to avoid a corruption problem whereby a body that legitimately contains "\n>From " would be wrongly converted into "\nFrom " upon parsing the mailbox, so the mboxrd standard requires that, when saving a mailbox, "\n>From " sequences are additionally escaped as "\n>>From ". This is the format public-inbox supports, so when we grab mailboxes from remote, they are in mboxrd format.
Git will try to guess the format of the mbox file, but it will ONLY back-convert "\n>From " sequences when you specifically tell it that it's "mboxrd" format, even when it's in fact "mboxo":
  git am --patch-format=mboxrd
If you don't force the mboxrd format, git-am will preserve all escaped "\n>From " lines as-is. We've been previously operating on the assumption that git-am's mbox support properly implements "mboxo", but this was wrong, resulting in some commits like the following:
https://git.kernel.org/torvalds/c/137733d08f4a
This large-ish change ditches all internal use of Python's mboxo. When asked to save mbox files, we will save them without any escaping, the way git-am (i.e. git-mailsplit) expects them. The same goes when we're outputting to stdout.
There is also a way now to pass -M to both "b4 am" and "b4 mbox" that will save things as maildirs; git-am supports this natively and thus avoids any possible parsing ambiguities. You can set a config option b4.save-maildirs=yes to make this the default behaviour.
The fallout of this is fairly benign, if annoying. There is no situation in which a patch would have "\nFrom " as part of its body, so the problem only affected commit messages. We will have a handful of these sprinkled around the trees, and will hopefully not introduce any new ones once everyone switches to the b4 version that outputs things in the format git-am expects.
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
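The difference between the two escaping schemes described above can be shown with a few regular expressions. This is an illustration of the formats only; per the commit, b4 itself now writes mbox output without any escaping.

```python
import re

# mboxo: only '^From ' is escaped to '^>From ', so unescaping is ambiguous
MBOXO_ESCAPED = re.compile(r'^>From ', re.MULTILINE)
# mboxrd: '^>*From ' gains one extra '>', so unescaping is exact
MBOXRD_ESCAPED = re.compile(r'^>(>*From )', re.MULTILINE)


def unescape_mboxo(body: str) -> str:
    return MBOXO_ESCAPED.sub('From ', body)


def escape_mboxrd(body: str) -> str:
    return re.sub(r'^(>*From )', r'>\1', body, flags=re.MULTILINE)


def unescape_mboxrd(body: str) -> str:
    return MBOXRD_ESCAPED.sub(r'\1', body)
```

Round-tripping through mboxrd restores a body that legitimately starts a line with ">From ", whereas mboxo turns it into "From ".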
2021-05-17 | Allow passing entire mbox via stdin | Konstantin Ryabitsev
Per request, allow passing entire mbox files via stdin, allowing fully pipe-through operation from something like mutt:
  b4 am -sl -m - -o -
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://lore.kernel.org/tools/YFETLu8TKWI2WlSF@hirez.programming.kicks-ass.net
2021-05-17 | Perform mboxo unescaping before DKIM check | Konstantin Ryabitsev
Python's mailbox will not automatically remove mboxo escaping, so perform this manually before passing the message to dkim for verification.
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
2021-05-17 | Implement partial reroll | Konstantin Ryabitsev
It has been a common request to support partial series rerolls where someone sends an amended patch as a follow-up to a previous series, e.g.:
  [PATCH v3 1/3] Patch one
  [PATCH v3 2/3] Patch two
    \- Re: [PATCH v3 2/3] Patch two
         Looks good, but please fix this $small_thing
    \- [PATCH v4 2/3] Patch two
  [PATCH v3] Patch three
Previously, b4 refused to consider v4 as a complete new series. Now it will properly perform a partial reroll, but only in the cases where such patches are sent as follow-ups to the exact same patch number in the previous series:
  [PATCH v3->v4 1/3] Patch one
  [PATCH v4 2/3] Patch two
  [PATCH v3->v4 3/3] Patch three
Reported-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://lore.kernel.org/r/CAPcyv4ggbuHbqKV33_TpE7pqxvRag34baJrX3yQe-jXOikoATQ@mail.gmail.com
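A sketch of how the "vN->vM" labels in the example above could be derived, assuming the newest revision seen for each patch slot has already been collected. `reroll_labels()` and its input shape are hypothetical.

```python
from typing import Dict, List


def reroll_labels(revs_by_patch: Dict[int, int], total: int) -> List[str]:
    """Label each patch slot, marking patches carried over from an older revision.

    revs_by_patch maps patch number -> newest revision seen for that slot.
    """
    newest = max(revs_by_patch.values())
    labels = []
    for num in range(1, total + 1):
        rev = revs_by_patch[num]
        # A slot that wasn't resent keeps its old patch, shown as old->new
        prefix = f'v{rev}' if rev == newest else f'v{rev}->v{newest}'
        labels.append(f'[PATCH {prefix} {num}/{total}]')
    return labels
```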
2021-05-14 | Restore check for attestation-check-dkim | Konstantin Ryabitsev
Seems we have lost this check in the rewrite, so restore it to make sure that we only check dkim if b4.attestation-check-dkim == 'yes' (default).
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
2021-05-14 | Improve subject parsing for bracketed prefixes | Konstantin Ryabitsev
Look in all of the brackets and reconstitute the subject based on what we find there. This way we properly handle even the following:
  Subject: [foo-list] [PATCH [RFC] v1 x/n] [RESEND] foo: do foo
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
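A depth-counting sketch that collects tokens from every leading bracket group, nested ones included. It illustrates the idea only and is not b4's parser; `split_subject()` is hypothetical.

```python
from typing import List, Tuple


def split_subject(subject: str) -> Tuple[List[str], str]:
    """Pull every leading [...] group (nested too) off a subject line."""
    s = subject.strip()
    prefixes: List[str] = []
    while s.startswith('['):
        depth = 0
        for i, ch in enumerate(s):
            if ch == '[':
                depth += 1
            elif ch == ']':
                depth -= 1
                if depth == 0:
                    break
        else:
            break  # unbalanced brackets: treat the rest as the subject
        inner = s[1:i].replace('[', ' ').replace(']', ' ')
        prefixes.extend(inner.split())
        s = s[i + 1:].strip()
    return prefixes, s
```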
2021-05-14 | Ensure trailers are tracked with source messages | Konstantin Ryabitsev
When we aggregate trailers, make sure that we track their originating messages so we can properly check attestation on all of them.
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
2021-05-12 | Fix DKIM check on headers that don't lowercase h | Konstantin Ryabitsev
The h= field headers may not be lowercased, so make sure we handle that when looking if the date header is signed.
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
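Header names in h= are case-insensitive and may carry folding whitespace, so normalize before comparing. A minimal sketch (function name hypothetical):

```python
def date_is_signed(h_tag: str) -> bool:
    """Check a DKIM h= list for the Date header, ignoring case and whitespace."""
    signed = [name.strip().lower() for name in h_tag.split(':')]
    return 'date' in signed
```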
2021-05-12 | Properly fail with BADSIG on bad signature | Konstantin Ryabitsev
Fix logic error where we incorrectly reported "No key" when it was actually "BADSIG".
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
2021-05-12 | Force datetime to UTC if it's native | Konstantin Ryabitsev
We always want the datetime object to be tz-aware, but certain Date: header formats result in timezone-naive variants. For those cases, just pretend it's UTC, as that's sufficiently accurate for our purposes.
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
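One concrete case is a "-0000" zone, for which `email.utils.parsedate_to_datetime()` documents returning a naive datetime. A minimal sketch of the fallback (function name hypothetical):

```python
import datetime
import email.utils


def parse_msg_date(date_hdr: str) -> datetime.datetime:
    """Parse a Date: header, always returning a timezone-aware datetime."""
    dt = email.utils.parsedate_to_datetime(date_hdr)
    if dt.tzinfo is None:
        # e.g. '-0000' zones come back naive; pretend they are UTC
        dt = dt.replace(tzinfo=datetime.timezone.utc)
    return dt
```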
2021-05-11 | Rudimentary support for showing patatt key info | Konstantin Ryabitsev
I expect that we'll have better keyring management tooling in the future, but for now show some rudimentary information about patatt keys used in a thread via --show-keys, e.g.:
  b4 mbox --show-keys 20210511143536.743919-1-konstantin@linuxfoundation.org
  b4 mbox --show-keys 20210507181322.172569-1-konstantin@linuxfoundation.org
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
2021-05-11 | Fix for DKIM signatures without t= field | Konstantin Ryabitsev
Many DKIM signatures just sign the Date: field and do not include the t= timestamp. Properly handle this situation when we're checking for drift.
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
2021-05-11 | Reimplement attestation-staleness-days | Konstantin Ryabitsev
Looks like we lost this feature in the rewrite, so reimplement it again. This commit also removes obsolete configuration options and sets the default attestation check level to "softfail".
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
2021-05-11 | Python 3.6 compatibility fixes | Konstantin Ryabitsev
Looks like subscripting list[] and dict[] for typing hints is not supported in python-3.6.
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
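Builtin generics such as `list[str]` only became subscriptable in Python 3.9; on older versions a function annotated that way raises `TypeError` when it is defined. The portable spelling uses the `typing` aliases, as in this illustrative (hypothetical) helper:

```python
from typing import Dict, List


# 'def f(x: list[str]) -> dict[str, int]' fails on 3.6-3.8 at definition time;
# typing.List / typing.Dict work on every Python 3 version b4 targets
def count_trailers(trailers: List[str]) -> Dict[str, int]:
    counts: Dict[str, int] = {}
    for t in trailers:
        name = t.split(':', 1)[0].strip().lower()
        counts[name] = counts.get(name, 0) + 1
    return counts
```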
2021-05-11 | Reimplement attestation code one more time | Konstantin Ryabitsev
Move end-to-end attestation code into its own library: patatt. See
https://git.kernel.org/pub/scm/utils/patatt/patatt.git/about/
It is included into b4 as a submodule, but you will need to init it first:
  git submodule update --init
This change significantly simplifies our attestation code, dropping thousands of lines of rather hairy code. Notably, patatt-style attestation is incompatible with previous attestation implementations done directly in b4, but that's just as well: we've always marked it as "experimental" and the lack of adoption was proving that we weren't on the right path.
Next to come is keyring management and documentation.
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
2020-12-28 | Save to/cc headers as-is for tracking | Konstantin Ryabitsev
If we clean the to/cc headers to get rid of all unicode escaping, we run into a Python bug that is unable to properly parse addresses, e.g.:
  In [5]: from email import utils
  In [6]: utils.getaddresses(['foo <foo@bar.com>'])
  Out[6]: [('foo', 'foo@bar.com')]
  In [7]: utils.getaddresses(['Shuming [范書銘] <shumingf@realtek.com>'])
  Out[7]: [('', 'Shuming'), ('', ''), ('', '范書銘'), ('', ''), ('', 'shumingf@realtek.com')]
If we store the headers as-is from the original message, we are less likely to run into this bug, as all non-ASCII sequences should be MIME-encoded (RFC 2047 encoded-words) in the original headers:
  =?big5?B?U2h1bWluZyBbrVOu0bvKXQ==?= <shumingf@realtek.com>
This doesn't fix the underlying bug in Python, but works around it.
Reported-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
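A sketch of the workaround's logic: split addresses while the header is still encoded, so the `[` and `]` in the display name are hidden inside the encoded-word, and decode each name afterwards. The function name is hypothetical.

```python
from email.header import decode_header, make_header
from email.utils import getaddresses
from typing import List, Tuple


def parse_addrs_raw(raw_value: str) -> List[Tuple[str, str]]:
    """Split addresses on the still-encoded header, then decode each name."""
    pairs = []
    for name, addr in getaddresses([raw_value]):
        # Decoding only after splitting keeps brackets away from the parser
        pairs.append((str(make_header(decode_header(name))), addr))
    return pairs
```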
2020-12-15 | Convert mboxrd to mboxo | Konstantin Ryabitsev
Public-inbox emits mboxrd, but Python only understands mboxo, so we need to convert from mboxrd to mboxo before passing the retrieved results to mailbox.mbox.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/CAHk-=whRm2sKHeY-YQqxEJF=d9fGhnU2ajJs9i7CKC4feuPMTA@mail.gmail.com
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
2020-12-14 | Allow passing extra opts to git-format-patch | Konstantin Ryabitsev
We probably want to be able to tweak the output of git-format-patch based on which list we're running it for (e.g. passing --minimal or --histogram), so make it possible to pass extra parameters to the git command.
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
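A sketch of appending user-supplied options to a git-format-patch invocation. Where the option string comes from (e.g. a per-list config value) is an assumption; `shlex.split()` keeps quoted values such as `--subject-prefix="PATCH net"` together.

```python
import shlex
from typing import List, Optional


def format_patch_cmd(revrange: str, extra_opts: Optional[str] = None) -> List[str]:
    """Build a git-format-patch command with optional extra options."""
    cmd = ['git', 'format-patch', '--stdout']
    if extra_opts:
        # Shell-style splitting honours quoting in the configured string
        cmd.extend(shlex.split(extra_opts))
    cmd.append(revrange)
    return cmd
```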
2020-12-14 | Fix crasher when we don't use -g with b4 pr | Konstantin Ryabitsev
If we're not passing -g to "b4 pr -e", then we should try to see if we are inside a git checkout and use that as our source.
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>