-
Notifications
You must be signed in to change notification settings - Fork 301
Some PETSc vector modifications #4374
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: devel
Are you sure you want to change the base?
Conversation
The code path that used to exist after the `operator=` branch is incredibly non-performant. It led to a serial GPU run running 20% slower on residual evaluations than a 30 rank CPU run. After its removal, the GPU run residual evaluation is 3 times faster. That's how large the performance penalty is
roystgnr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All I can find is that one assertion failing - and that's probably an overzealous assertion with any PETSc version that's not so ancient we stopped supporting it a decade ago? I'd try just removing it and see what happens.
src/numerics/petsc_vector.C
Outdated
| LibmeshPetscCall(VecCopy(_vec,v_local->_vec)); | ||
| else | ||
| libmesh_error_msg("Vectors are inconsistent"); | ||
| libmesh_assert(this->comm().verify(int(this->type()))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To satisfy my OCD: could we use std::underlying_type<ParallelType>::type instead of int here?
|
Job Test MOOSE on a1f776a : invalidated by @roystgnr Kicking CI; a bunch of timeouts all on suspiciously similar (functions/solution_function) tests |
|
Job Coverage, step Generate coverage on 1ee4fc3 wanted to post the following: Coverage
Warnings
This comment will be updated on new commits. |
||||||||||||||||||||||||||
src/numerics/petsc_vector.C
Outdated
|
|
||
| LibmeshPetscCall(VecCopy (v._vec, this->_vec)); | ||
| // This is PETSc's only requirement for VecCopy | ||
| if (this->local_size() == v.local_size()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add an assertion, or even only a dbg mode assertion, that this == gives the same result on all processors?
I can't imagine why someone's going to try to assign between two vectors with the same sizes but different non-serial partitionings, and I don't feel the need to support it either, but I would like to try to catch it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The failure in MOOSE is just this case. Single element so that the parallel vector has all its dofs on one rank such that its local size matches the serial vector's local size on one rank but not the other
src/numerics/petsc_vector.C
Outdated
| libmesh_assert(this->type() != SERIAL); | ||
| libmesh_error_msg( | ||
| "Scattering from a serial vector on every rank to a parallel vector is not behavior we " | ||
| "define because we do not verify the serial vector is the same on each rank"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we make this a libmesh_not_implemented_msg()? It'd be reasonable to say "if processor p owns dof d on the target vector, then we treat that processor's value for dof d on the source vector as definitive", which would be efficient and well-defined; I just don't see any reason to bother with that until someone needs it.
|
Huh. Those timeouts look real. |
|
Interesting |
|
Ok this should be ready for further review |
roystgnr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my question about if clauses vs assertions, but I'm also okay just merging as-is and adding O(1) CPU cycles in opt mode.
| } | ||
| else if (v.local_size() == this->size()) | ||
| AssignmentType assign_type = Error; | ||
| if (this->type() == SERIAL && v.type() != SERIAL && v.size() == this->local_size()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we've already asserted this->size() == v.size(), and if this->type() == SERIAL should imply this->size() == this->local_size(), then shouldn't the v.size() == this->local_size() test be redundant under proper use?
| AssignmentType assign_type = Error; | ||
| if (this->type() == SERIAL && v.type() != SERIAL && v.size() == this->local_size()) | ||
| assign_type = ParallelToSerial; | ||
| else if (this->type() != SERIAL && v.type() == SERIAL && this->size() == v.local_size()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise for the this->size() == v.local_size() here - couldn't this be an assertion rather than a test in opt modes?
I'll be interested to see whether the
localizechanges break stuff. I bet it will at least break unit testing but it will be more interesting to see whether it breaks production code