feat: index systems in MultiSystems by short names#929
feat: index systems in MultiSystems by short names#929wanghan-iapcm merged 3 commits intodeepmodeling:develfrom
Conversation
deepmodeling#554 dumps file names in short name. However, `dpgen simplify` replies on the filename to find a system in MultiSystems. This commit makes it possible to find a system by its short name.
There was a problem hiding this comment.
Pull request overview
This PR extends MultiSystems to support indexing systems by their short_name, aligning in-memory access with the truncated filenames used when dumping to DeepMD formats.
Changes:
- Add a
__short_name_mapinMultiSystemsand update__getitem__to resolve lookups byshort_namebefore falling back to the full key. - Populate
__short_name_mapin__appendso systems can always be retrieved via theirshort_name. - Extend long-filename tests to exercise accessing systems through
ms[system.short_name]after dumping.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| dpdata/system.py | Adds internal short-name indexing to MultiSystems via __short_name_map and routes __getitem__ lookups through it. |
| tests/test_multisystems.py | Extends long-filename tests to verify that systems can be accessed via short_name after to_deepmd_npy. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Jinzhe Zeng <njzjz@qq.com>
Merging this PR will not alter performance
|
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📝 WalkthroughWalkthroughAdded short_name-based lookup to MultiSystems by introducing a private Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
dpdata/system.py (2)
1444-1450:⚠️ Potential issue | 🟠 MajorAvoid short-name lookup shadowing formula keys.
If a system’s
short_nameequals another system’s formula (possible when long formulas shorten), the current order returns the wrong system. Prefer exact formula matches first.🔁 Proposed fix
def __getitem__(self, key): """Returns proerty stored in System by key or by idx.""" if isinstance(key, int): return list(self.systems.values())[key] - if key in self.__short_name_map: - return self.systems[self.__short_name_map[key]] - return self.systems[key] + if key in self.systems: + return self.systems[key] + if key in self.__short_name_map: + return self.systems[self.__short_name_map[key]] + raise KeyError(key)
1520-1531:⚠️ Potential issue | 🟠 MajorGuard against short-name collisions.
When two formulas produce the same
short_name, the map is silently overwritten, which can misroute lookups. Add a collision check.🧯 Proposed fix
if formula in self.systems: self.systems[formula].append(system) else: self.systems[formula] = system.copy() - self.__short_name_map[system.short_name] = formula + short_name = system.short_name + if ( + short_name in self.__short_name_map + and self.__short_name_map[short_name] != formula + ): + raise DataError( + f"short_name collision: {short_name} maps to both " + f"{self.__short_name_map[short_name]} and {formula}" + ) + self.__short_name_map[short_name] = formula
🤖 Fix all issues with AI agents
In `@dpdata/system.py`:
- Around line 1371-1373: The field assignment currently uses a typed assignment
syntax that fails at runtime (self.__short_name_map = dict[str, str] = {});
change it to a proper instance variable type annotation and plain assignment for
__short_name_map (i.e., annotate self.__short_name_map as a dict[str, str] and
then assign an empty dict) so the runtime gets a normal dict object and static
type checkers see the correct type; if supporting older Python versions use
typing.Dict instead of built-in generic.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## devel #929 +/- ##
==========================================
+ Coverage 86.06% 86.07% +0.01%
==========================================
Files 83 83
Lines 7886 7893 +7
==========================================
+ Hits 6787 6794 +7
Misses 1099 1099 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@dpdata/system.py`:
- Around line 1448-1449: check_atom_names currently rebuilds self.systems with
expanded formula keys but does not update the stale __short_name_map, causing
KeyError on later lookups; after the code in check_atom_names that reconstructs
self.systems, rebuild __short_name_map (either by calling an existing helper
like __build_short_name_map or by iterating self.systems and repopulating
self.__short_name_map mapping short names to the new full keys) so that lookups
using __short_name_map reference the updated keys.
🧹 Nitpick comments (1)
dpdata/system.py (1)
1531-1531: Consider handling short_name collisions.If two systems have different formulas but the same
short_name(possible whenshort_namefalls back toshort_formula), the later mapping silently overwrites the earlier one. This is an edge case (requires formulas > 255 bytes), but could lead to unexpected lookup behavior.Options:
- Raise a warning/error on collision
- Store a list of formulas per short_name
- Document the limitation
#554 dumps file names in short name. However,
dpgen simplifyreplies on the filename to find a system in MultiSystems. This commit makes it possible to find a system by its short name.Summary by CodeRabbit
New Features
Tests