fix: check for NaNs in emd loss matrix #623
bobluppes wants to merge 11 commits into PythonOT:master from
Conversation
ot/lp/__init__.py
Outdated
    if np.isnan(M).any():
        raise ValueError('The loss matrix should not contain NaN values.')
Failing early here ensures that we do not segfault in the accelerated emd_c call.
I did not look too deeply into the emd_c implementation, but my assumption is that this check is somewhat pessimistic. It may be possible to formulate problems in which a subset of the values in the loss matrix is never accessed (possibly because the graph is disconnected). In that case we could support NaN values in some cases. @rflamary what is your opinion on this?
If the graph is disconnected, then the parts that are not used should have an infinite value (which is handled by the C++ solver). I'm OK with not handling NaNs.
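The distinction between infinite and NaN costs can be illustrated with plain NumPy. This is only a sketch of the check discussed in this thread, not the solver itself; the matrices M_inf and M_nan are made-up examples. It shows that np.isnan flags NaN entries only, so a loss matrix that uses infinite costs for unused parts would still pass the proposed check.

```python
import numpy as np

# Made-up loss matrix: infinite cost marks pairs that should never be matched.
M_inf = np.array([[0.0, np.inf],
                  [np.inf, 0.0]])

# Same matrix, but with one entry replaced by NaN.
M_nan = M_inf.copy()
M_nan[0, 1] = np.nan

# np.isnan is True only for NaN entries; infinities are not flagged.
inf_has_nan = bool(np.isnan(M_inf).any())
nan_has_nan = bool(np.isnan(M_nan).any())

print("infinite costs flagged:", inf_has_nan)
print("NaN costs flagged:", nan_has_nan)
```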
rflamary left a comment
A few comments. Thanks @bobluppes for the PR
ot/lp/__init__.py
Outdated
    ot.optim.cg : General regularized OT
    """

    if np.isnan(M).any():
A problem here is that you are using numpy on arrays that might not be numpy arrays (see the backend function below). You should do the test later in the function, on the OT loss matrix that has been converted to numpy, to avoid backend errors.
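A minimal sketch of the ordering suggested in this comment: convert the loss matrix to a NumPy array first, then test for NaNs. The helpers to_numpy and check_loss_matrix are hypothetical names used only for illustration; to_numpy stands in for whatever conversion the backend performs inside the function, and this is not the code that was merged.

```python
import numpy as np


def to_numpy(x):
    """Hypothetical stand-in for the backend conversion to a NumPy array."""
    return np.asarray(x, dtype=np.float64)


def check_loss_matrix(M):
    """Convert M to NumPy first, then reject any NaN entries."""
    M_np = to_numpy(M)
    # The NaN test runs on the converted array, so it never calls NumPy
    # functions on an object from another array library.
    if np.isnan(M_np).any():
        raise ValueError('The loss matrix should not contain NaN values.')
    return M_np


# A nested list stands in here for a non-NumPy input.
M_ok = check_loss_matrix([[0.0, 1.0], [1.0, 0.0]])

try:
    check_loss_matrix([[0.0, float('nan')], [1.0, 0.0]])
    nan_rejected = False
except ValueError:
    nan_rejected = True
```

Keeping the check after the conversion means a single code path covers every input type, at the cost of converting the matrix before the error is raised.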
Types of changes
This PR introduces an additional check for NaNs in the loss matrix of the emd computation. If NaNs are detected we raise an error in order to protect against segfaults in the C++ backend.
Motivation and context / Related issue
The motivation of this PR is to fail more gracefully in cases of NaN costs.
Closes #469
How has this been tested (if it applies)
Added new tests.
PR checklist
CONTRIBUTING.md