Fix feature alignment when applying XGBoost/HistGradientBoosting weights without specifying `ss_main_score` #167
Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
@copilot can you add a test for this using the https://github.com/PyProphet/pyprophet/tree/master/tests/data/test_data.osw test file? You can train and then apply the weights, capturing the stdout that reports the importance of each var XX. The importances from the two runs should be the same.
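A minimal sketch of the comparison step, assuming a hypothetical log line format `Importance of <feature>: <value>` (pyprophet's real output may be worded differently) and hypothetical helpers `parse_importances` and `assert_same_importances`:

```python
import math
import re

# Hypothetical log format: one line per feature, e.g. "Importance of var_a: 0.1234".
# Adjust the pattern to whatever pyprophet actually prints.
IMPORTANCE_RE = re.compile(r"Importance of (\S+):\s*([-+0-9.eE]+)")


def parse_importances(stdout: str) -> dict[str, float]:
    """Collect {feature_name: importance} from captured stdout."""
    return {m.group(1): float(m.group(2)) for m in IMPORTANCE_RE.finditer(stdout)}


def assert_same_importances(train_out: str, apply_out: str, rel_tol: float = 1e-9) -> None:
    """Fail if the training run and the apply-weights run report different importances."""
    train = parse_importances(train_out)
    applied = parse_importances(apply_out)
    assert train, "no importance lines found in training output"
    # dict views support set operations; ^ lists features present in only one run
    assert train.keys() == applied.keys(), f"feature sets differ: {train.keys() ^ applied.keys()}"
    for name, value in train.items():
        assert math.isclose(value, applied[name], rel_tol=rel_tol), (
            f"{name}: {value} (train) != {applied[name]} (apply)"
        )
```

Inside a pytest test, the two strings could come from `subprocess.run([...], capture_output=True, text=True).stdout` for the training run and the `--apply_weights` run on `test_data.osw`. Comparing by feature name rather than by line position is what catches the misalignment: a swapped order yields different values under the same name.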
@jcharkow can you test if this works for you?
Not really. The only score that seems to work as intended is the …

Command run: …

Apply scores run (same file for testing): …
Furthermore, if I exclude any of the other parameters I get errors.
I am curious whether an easier solution would be to make `--apply_weights` its own command that takes no parameters like `--ss_...`, `--classifier` or `--level`, so that all of this is inferred from the `--apply_weights` file.
Problem
When training an XGBoost or HistGradientBoosting model with a specific `--ss_main_score` parameter and then applying the trained weights without specifying the same parameter, features become misaligned, causing incorrect scoring results.

Example of the issue:

```bash
# Step 1: Train with specific main score
pyprophet score --in data.osw --level=ms1ms2 --classifier=XGBoost --ss_main_score=var_dotprod_score
```

The model trains successfully with `var_dotprod_score` as the main score, showing correct feature importances.

```bash
# Step 2: Apply weights WITHOUT specifying the main score
pyprophet score --in data.osw --level=ms1ms2 --classifier=XGBoost --apply_weights=weights.bin
```

This applies weights to incorrect features because `--ss_main_score` defaults to `auto`, potentially selecting a different main score and changing the feature order.

The root cause is that during training, features are prepared based on the specified `ss_main_score`, but when applying weights, if this parameter is not specified, it defaults to `auto`, which may select a different main score. This changes the feature order, causing the model to apply weights to the wrong features.

Solution
This PR stores metadata (`ss_main_score`, `classifier`, `level`) alongside the trained model and automatically restores the correct `ss_main_score` when applying weights.

Implementation
1. Enhanced Model Serialization (`pyprophet/io/_base.py`, `pyprophet/io/scoring/osw.py`)

Models are now saved together with their metadata.
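As a rough sketch of the idea, not the code in `pyprophet/io/_base.py` or `pyprophet/io/scoring/osw.py`: assuming the weights file is written with `pickle`, and using hypothetical helper names (`build_payload`, `save_weights`) and key names, bundling the model with its metadata could look like this:

```python
import pickle
from typing import Any


def build_payload(model: Any, ss_main_score: str, classifier: str, level: str) -> dict:
    """Bundle the trained model with the settings that determine its feature order.

    Key names are illustrative only, not pyprophet's actual on-disk format.
    """
    return {
        "model": model,
        "ss_main_score": ss_main_score,
        "classifier": classifier,
        "level": level,
    }


def save_weights(path: str, payload: dict) -> None:
    # pickle is an assumption of this sketch; any serializer that round-trips a dict would do
    with open(path, "wb") as fh:
        pickle.dump(payload, fh)
```

Keeping the metadata in the same file as the model means the two cannot drift apart the way a separate sidecar file could.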
2. Automatic Metadata Restoration (`pyprophet/scoring/runner.py`)

When loading weights:
- If `--ss_main_score=auto` (default), the stored value is used automatically
- Handles the `ss_use_dynamic_main_score` flag for correct semi-supervised learning behavior

3. Backward Compatibility
Old weight files (without metadata) are automatically detected and still work, with an appropriate warning.
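The loading side, including the fallback for old files, can be sketched as below. This assumes the hypothetical dict layout with a `"model"` key from a pickled weights file; `load_weights`, `resolve_main_score` and the warning text are illustrative, not the functions or message in `pyprophet/scoring/runner.py`:

```python
import pickle
import warnings
from typing import Any, Optional


def load_weights(path: str) -> tuple[Any, Optional[dict]]:
    """Return (model, metadata); metadata is None for legacy files without it."""
    with open(path, "rb") as fh:
        obj = pickle.load(fh)  # only unpickle files you trust
    # New-style files (as assumed here) are dicts carrying a "model" key plus metadata
    if isinstance(obj, dict) and "model" in obj:
        meta = {k: v for k, v in obj.items() if k != "model"}
        return obj["model"], meta
    # Legacy file: the pickled object is the bare model
    warnings.warn(
        "Weights file has no stored metadata; make sure --ss_main_score "
        "matches the value used during training."
    )
    return obj, None


def resolve_main_score(cli_value: str, meta: Optional[dict]) -> str:
    """Use the stored main score only when the user left --ss_main_score at 'auto'."""
    if cli_value == "auto" and meta and meta.get("ss_main_score"):
        return meta["ss_main_score"]
    return cli_value
```

An explicitly passed `--ss_main_score` still wins in this sketch, matching the rule that the stored value replaces only the `auto` default.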
Usage
After this fix, applying weights no longer requires manually specifying `--ss_main_score`.

Benefits
- … `ss_main_score` was used during training

Testing
Comprehensive testing demonstrates:
Related
Fixes issue: "`--apply_weights` requires `--ss_main_score` to be specified as in the original command"

Related to draft PR #117, which explored feature name tracking approaches.
Fixes #151