feat: index systems in MultiSystems by short names #929

Merged
wanghan-iapcm merged 3 commits into deepmodeling:devel from njzjz:short_name
Feb 5, 2026

njzjz (Member) commented Feb 4, 2026

#554 dumps files under short names. However, dpgen simplify relies on the file name to find a system in MultiSystems. This commit makes it possible to find a system by its short name.

Summary by CodeRabbit

  • New Features

    • Systems in collections can now be accessed by their short name as well as by formula-based keys.
  • Tests

    • Added test coverage verifying index-based access using a system's short name.

Copilot AI review requested due to automatic review settings February 4, 2026 15:46
@dosubot dosubot bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label Feb 4, 2026
@dosubot dosubot bot added dpdata enhancement New feature or request labels Feb 4, 2026
Copilot AI (Contributor) left a comment

Pull request overview

This PR extends MultiSystems to support indexing systems by their short_name, aligning in-memory access with the truncated filenames used when dumping to DeepMD formats.

Changes:

  • Add a __short_name_map in MultiSystems and update __getitem__ to resolve lookups by short_name before falling back to the full key.
  • Populate __short_name_map in __append so systems can always be retrieved via their short_name.
  • Extend long-filename tests to exercise accessing systems through ms[system.short_name] after dumping.
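
The lookup path summarized in the list above can be pictured with a toy container. This is a minimal sketch, not dpdata's implementation: `ToySystem` and `ToyMultiSystems` are hypothetical stand-ins, and truncating to 8 characters is an arbitrary placeholder for whatever rule dpdata uses to shorten names.

```python
from dataclasses import dataclass


@dataclass
class ToySystem:
    """Hypothetical stand-in for a system; only its two names matter here."""

    formula: str

    @property
    def short_name(self) -> str:
        # Assumption: an 8-character truncation stands in for the real rule.
        return self.formula[:8]


class ToyMultiSystems:
    """Hypothetical container keyed by formula, with a short-name side index."""

    def __init__(self) -> None:
        self.systems: dict[str, ToySystem] = {}
        self._short_name_map: dict[str, str] = {}

    def append(self, system: ToySystem) -> None:
        self.systems[system.formula] = system
        # Record short_name -> formula so either name reaches the same entry.
        self._short_name_map[system.short_name] = system.formula

    def __getitem__(self, key):
        if isinstance(key, int):
            return list(self.systems.values())[key]
        # Order as summarized in this review: short name first, then full key.
        if key in self._short_name_map:
            return self.systems[self._short_name_map[key]]
        return self.systems[key]


ms = ToyMultiSystems()
big = ToySystem("H128O64C32N16")
ms.append(big)
print(ms[big.short_name] is ms["H128O64C32N16"])
```

Keeping the side index as short name to formula, rather than short name to system, means the `systems` dict stays the single place where the objects live.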

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File | Description
dpdata/system.py | Adds internal short-name indexing to MultiSystems via __short_name_map and routes __getitem__ lookups through it.
tests/test_multisystems.py | Extends long-filename tests to verify that systems can be accessed via short_name after to_deepmd_npy.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Jinzhe Zeng <njzjz@qq.com>
codspeed-hq bot commented Feb 4, 2026

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

✅ 2 untouched benchmarks
⏩ 2 skipped benchmarks (see footnote 1)


Comparing njzjz:short_name (2527433) with devel (2eadf39)

Footnotes

  1. 2 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

coderabbitai bot commented Feb 4, 2026

Walkthrough

Added short_name-based lookup to MultiSystems by introducing a private __short_name_map that maps each system's short_name to its formula key, enabling indexing access via ms[short_name] in addition to ms[formula].

Changes

Cohort / File(s) | Summary
MultiSystems implementation (dpdata/system.py) | Added private __short_name_map to MultiSystems; updated __append to populate the map and __getitem__ to perform lookup by short_name if the provided key does not match a formula.
Tests (tests/test_multisystems.py) | Added test usage that indexes MultiSystems by system.short_name (e.g., ms[system.short_name]) after serialization to DeepMD formats to exercise the new lookup behavior.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title 'feat: index systems in MultiSystems by short names' clearly and concisely summarizes the main change: adding the ability to access systems in MultiSystems using their short_name as an index key.

coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
dpdata/system.py (2)

1444-1450: ⚠️ Potential issue | 🟠 Major

Avoid short-name lookup shadowing formula keys.

If a system’s short_name equals another system’s formula (possible when long formulas shorten), the current order returns the wrong system. Prefer exact formula matches first.

🔁 Proposed fix
     def __getitem__(self, key):
         """Returns proerty stored in System by key or by idx."""
         if isinstance(key, int):
             return list(self.systems.values())[key]
-        if key in self.__short_name_map:
-            return self.systems[self.__short_name_map[key]]
-        return self.systems[key]
+        if key in self.systems:
+            return self.systems[key]
+        if key in self.__short_name_map:
+            return self.systems[self.__short_name_map[key]]
+        raise KeyError(key)
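
To see why the lookup order matters, here is a small sketch with plain dictionaries. The keys and values are made up for illustration and do not come from dpdata; the two functions mirror the removed and added lines of the diff above.

```python
# Hypothetical data: system B is stored under a long key whose short name
# happens to equal the full formula key of system A.
systems = {"H2O": "system A", "H2O_long_formula": "system B"}
short_name_map = {"H2O": "H2O_long_formula"}


def lookup_short_first(key):
    """Order in the reviewed diff: the short-name map wins over exact keys."""
    if key in short_name_map:
        return systems[short_name_map[key]]
    return systems[key]


def lookup_formula_first(key):
    """Order in the proposed fix: exact keys win, short names are a fallback."""
    if key in systems:
        return systems[key]
    if key in short_name_map:
        return systems[short_name_map[key]]
    raise KeyError(key)


print(lookup_short_first("H2O"))    # goes through the map and reaches system B
print(lookup_formula_first("H2O"))  # exact key match, reaches system A
```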

1520-1531: ⚠️ Potential issue | 🟠 Major

Guard against short-name collisions.

When two formulas produce the same short_name, the map is silently overwritten, which can misroute lookups. Add a collision check.

🧯 Proposed fix
         if formula in self.systems:
             self.systems[formula].append(system)
         else:
             self.systems[formula] = system.copy()
-        self.__short_name_map[system.short_name] = formula
+        short_name = system.short_name
+        if (
+            short_name in self.__short_name_map
+            and self.__short_name_map[short_name] != formula
+        ):
+            raise DataError(
+                f"short_name collision: {short_name} maps to both "
+                f"{self.__short_name_map[short_name]} and {formula}"
+            )
+        self.__short_name_map[short_name] = formula
🤖 Fix all issues with AI agents
In `@dpdata/system.py`:
- Around line 1371-1373: The field assignment currently uses a typed assignment
syntax that fails at runtime (self.__short_name_map = dict[str, str] = {});
change it to a proper instance variable type annotation and plain assignment for
__short_name_map (i.e., annotate self.__short_name_map as a dict[str, str] and
then assign an empty dict) so the runtime gets a normal dict object and static
type checkers see the correct type; if supporting older Python versions use
typing.Dict instead of built-in generic.
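
The difference between the two spellings can be checked in isolation. A minimal sketch, assuming Python 3.9 or later for the built-in generic `dict[str, str]`; the class names `Broken` and `Fixed` are hypothetical and the attribute is left unmangled for brevity.

```python
class Broken:
    def __init__(self) -> None:
        # Chained assignment: Python also tries to assign {} to the subscript
        # dict[str, str], that is, to set an item on the built-in type `dict`.
        self.short_name_map = dict[str, str] = {}


class Fixed:
    def __init__(self) -> None:
        # Annotated assignment: the annotation is a hint, the value is a dict.
        self.short_name_map: dict[str, str] = {}


try:
    Broken()
except TypeError as exc:
    print("Broken() raised TypeError:", exc)

print(type(Fixed().short_name_map))
```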

codecov bot commented Feb 4, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.07%. Comparing base (2eadf39) to head (2527433).
⚠️ Report is 1 commit behind head on devel.

Additional details and impacted files
@@            Coverage Diff             @@
##            devel     #929      +/-   ##
==========================================
+ Coverage   86.06%   86.07%   +0.01%     
==========================================
  Files          83       83              
  Lines        7886     7893       +7     
==========================================
+ Hits         6787     6794       +7     
  Misses       1099     1099              

coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@dpdata/system.py`:
- Around line 1448-1449: check_atom_names currently rebuilds self.systems with
expanded formula keys but does not update the stale __short_name_map, causing
KeyError on later lookups; after the code in check_atom_names that reconstructs
self.systems, rebuild __short_name_map (either by calling an existing helper
like __build_short_name_map or by iterating self.systems and repopulating
self.__short_name_map mapping short names to the new full keys) so that lookups
using __short_name_map reference the updated keys.
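
One way to picture the suggested fix: whenever the `systems` dict is rebuilt under new keys, derive the short-name map again from the rebuilt dict. This is only a sketch. `build_short_name_map` is a hypothetical helper, the objects are stand-ins, and the re-keyed formula is made up; how `check_atom_names` actually rewrites keys is not shown in this thread.

```python
from types import SimpleNamespace


def build_short_name_map(systems):
    """Hypothetical helper: map each system's short_name to its current key."""
    return {system.short_name: key for key, system in systems.items()}


# Stand-in objects; only the short_name attribute is used here.
old_systems = {"H2O1": SimpleNamespace(short_name="H2O1")}
short_name_map = build_short_name_map(old_systems)

# Simulate a re-keying step (the expanded key is made up for illustration).
new_systems = {"C0H2O1": SimpleNamespace(short_name="C0H2O1")}

# The old map still points at a key that no longer exists in the new dict.
stale_target = short_name_map["H2O1"]
print(stale_target in new_systems)

# Rebuilding right after the re-keying brings the map back in sync.
short_name_map = build_short_name_map(new_systems)
print(short_name_map["C0H2O1"] in new_systems)
```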
🧹 Nitpick comments (1)
dpdata/system.py (1)

1531-1531: Consider handling short_name collisions.

If two systems have different formulas but the same short_name (possible when short_name falls back to short_formula), the later mapping silently overwrites the earlier one. This is an edge case (requires formulas > 255 bytes), but could lead to unexpected lookup behavior.

Options:

  • Raise a warning/error on collision
  • Store a list of formulas per short_name
  • Document the limitation
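
As an illustration of the first option, a collision could be reported with a warning while keeping the overwrite behavior. A minimal sketch; `register_short_name` is a hypothetical helper and the names and formulas are placeholders.

```python
import warnings


def register_short_name(short_name_map, short_name, formula):
    """Hypothetical helper: record short_name -> formula, warning on conflict."""
    previous = short_name_map.get(short_name)
    if previous is not None and previous != formula:
        warnings.warn(
            f"short_name {short_name!r} already maps to {previous!r}; "
            f"overwriting with {formula!r}",
            stacklevel=2,
        )
    short_name_map[short_name] = formula


mapping = {}
register_short_name(mapping, "abc", "formula-1")
register_short_name(mapping, "abc", "formula-1")  # same formula: no warning

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    register_short_name(mapping, "abc", "formula-2")  # different formula: warns

print(len(caught), mapping)
```

Raising an error instead of warning is a one-line change; which is preferable depends on whether callers can recover from an ambiguous short name.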

@dosubot dosubot bot added size:S This PR changes 10-29 lines, ignoring generated files. and removed size:XS This PR changes 0-9 lines, ignoring generated files. labels Feb 4, 2026
@njzjz njzjz requested a review from wanghan-iapcm February 4, 2026 16:01
@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Feb 5, 2026
@wanghan-iapcm wanghan-iapcm merged commit b3cd20b into deepmodeling:devel Feb 5, 2026
12 checks passed
Labels

dpdata enhancement New feature or request lgtm This PR has been approved by a maintainer size:S This PR changes 10-29 lines, ignoring generated files.
