-
-
Notifications
You must be signed in to change notification settings - Fork 3.1k
Fix: ensure unpacked **kwargs have string-compatible keys (#20706) #20713
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Fix: ensure unpacked **kwargs have string-compatible keys (#20706) #20713
Conversation
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
hauntsaninja
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add tests, analyse the primer diff, make sure you account for this #20706 (comment)
|
The interesting thing is that we already check this, I guess just not for the def f(**kwargs: int) -> None:
pass
f(**{1: 1}) # E: Keywords must be stringsSo IMO we just need to find what's doing skipping that check and not do that. |
|
Yes, see my note here: #20706 (comment) |
This comment has been minimized.
This comment has been minimized.
…ation for dict() calls
|
I have refactored the fix based on the feedback from @hauntsaninja and @A5rocks. I found that dict() calls are transformed into DictExpr during semantic analysis, which caused them to bypass standard keyword validation. I've introduced a from_dict_call flag to the DictExpr class to preserve this context. This allows checkexpr.py to utilize the existing is_valid_keyword_var_arg logic to catch non-string keys in dict(**kwargs) while avoiding false positives for standard dictionary literals. I also added a permanent unit test in test-data/unit/check-expressions.test as requested. |
This comment has been minimized.
This comment has been minimized.
| ): | ||
| self.all_exports = [n for n in self.all_exports if n != expr.args[0].value] | ||
|
|
||
| def translate_dict_call(self, call: CallExpr) -> DictExpr | None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we have to do this? Maybe this is from older mypy where a workaround was required... Can you test the fallout from removing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevermind, I see from blame it's required for TypedDict assignments. That doesn't seem completely necessary (surely we can special case TypedDict type context for dict calls?), but I guess that's more of a followup.
EDIT: however, I think it may be cleaner simply to return None for any dict w/ non-string keys at this point, rather than add an extra attribute
…g-20706 he commit.
660c862 to
6e9f1e4
Compare
|
I have refactored the solution to follow the cleaner approach suggested by @A5rocks. I removed the extra from_dict_call attribute and reverted the changes to nodes.py and checkexpr.py. Instead, I modified translate_dict_call in semanal.py to return None when it encounters a **kwargs unpacking literal with non-string keys. This allows the standard CallExpr validation logic to handle the error naturally, which I've verified locally with dict(**{10: 20}) and Any types. The permanent unit test is included in check-expressions.test. |
This comment has been minimized.
This comment has been minimized.
|
Sorry for adding more and more! Could you add a test case for |
|
Meh, I thought a bit more. I think your original idea was better (eg adding an attribute). We need all the type information for checking kwargs... I guess the alternative is moving |
…ytes and nested unpacking
1b58d34 to
2534663
Compare
for more information, see https://pre-commit.ci
|
I have synchronized the branch with upstream/master and implemented the attribute-based approach as discussed. This version correctly handles full type information during checking, allowing us to catch nested unpacking and bytes keys while keeping dictionary literals unaffected. All 23+ checks were passing in the previous iteration, and this version maintains that logic. |
| # Check if any **kwargs argument is a dict literal with non-string keys. | ||
| # In that case, don't translate so that normal function call type checking | ||
| # will catch the "keywords must be strings" error. | ||
| for kind, arg in zip(call.arg_kinds, call.args): | ||
| if kind == ARG_STAR2 and isinstance(arg, DictExpr): | ||
| # Check if all keys in the dict literal are strings (not bytes!) | ||
| for key, _ in arg.items: | ||
| if key is not None and not isinstance(key, StrExpr): | ||
| # Non-string key found, don't translate | ||
| for a in call.args: | ||
| a.accept(self) | ||
| return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # Check if any **kwargs argument is a dict literal with non-string keys. | |
| # In that case, don't translate so that normal function call type checking | |
| # will catch the "keywords must be strings" error. | |
| for kind, arg in zip(call.arg_kinds, call.args): | |
| if kind == ARG_STAR2 and isinstance(arg, DictExpr): | |
| # Check if all keys in the dict literal are strings (not bytes!) | |
| for key, _ in arg.items: | |
| if key is not None and not isinstance(key, StrExpr): | |
| # Non-string key found, don't translate | |
| for a in call.args: | |
| a.accept(self) | |
| return None |
|
According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅ |
A5rocks
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given above suggestion is accepted, this certainly works. I'm a bit worried about whether from_dict_call should roundtrip to the cache (it doesn't right now), and I don't see why we can't do translate_dict_call in the typechecker stage (which would obviate the need for an extra attribute)... but adding more work for you doesn't really help!
|
PS I may be getting the wrong vibes, but your comments feel AI-generated. Please write them yourself! Any context the AI has, we have too! (We can see your changes!) However, things like your motivations or what you think about different approaches are what's actually important... which maybe an AI can do but you can certainly do better. |
Summary
This PR addresses issue #20706, where mypy failed to catch a TypeError occurring at runtime when non-string keys were used within dictionary unpacking (**).
In Python, keyword unpacking via the ** operator requires all keys to be valid identifiers (strings). While mypy caught this in most contexts, it missed the error inside dict() constructors because the unpacked mapping was incorrectly being matched to positional parameters in the constructor's overloads.
Changes
Updated visit_dict_expr: Modified this method in mypy/checkexpr.py to validate **expr items before they are unpacked.
Key Validation: Integrated a check to ensure unpacked expressions are mappings with keys compatible with builtins.str.
Error Diagnostic: Triggered invalid_keyword_var_arg when non-string keys are detected to align static analysis with runtime behavior.
Checklist
[x] Read the Contributing Guidelines.
[x] Added tests for all changed behavior.
[x] Verified that my code passes the local test suite using pytest mypy/test/testcheck.py.
Verification
Verified locally with the following reproduction script:
Python
d: dict[int, int] = {10: 20}
dict(**d) # Now correctly reports: error: Argument after ** must have string keys [arg-type]