Initialize data version control for managing test images #1036
Adding dvc package to environment.yml and running `dvc init` to get the barebones .dvcignore, .dvc/config & .dvc/.gitignore files.
|
Should this update |
We don't have a requirements-dev.txt file anymore as of #812 😄. I wanted to put it in environment.yml but ran into some CI issues, might need to finish off #1033 first. Oh, and will need to document how to use dvc somewhere too. I'm still getting my head around how to make it as easy as possible for everyone. |
|
Ok, I've added a "Using data version control (dvc)" subsection to CONTRIBUTING.md at 6bd7ba9. Have a read, see if it makes sense, and maybe try and test this Pull Request out! Feel free to ask any questions too. |
|
I see that the DAGsHub repo is a mirror of the GitHub repo. Will all changes in the GitHub repo be synchronized to the DAGsHub repo? How often does the DAGsHub repo synchronize? I just pushed a few commits to the new-display branch (https://github.com/GenericMappingTools/pygmt/tree/new-display), but I don't see the changes in the DAGsHub repo (https://dagshub.com/GenericMappingTools/pygmt/src/new-display). |
|
/test-gmt-dev |
|
Sorry, had to restart the regular GMT6.1.1/Python 3.9 test, I tried triggering the GMT dev version test at #1036 (comment), and looks like test_logo*.py failed (see https://github.com/GenericMappingTools/pygmt/actions/runs/662613609) because |
|
I saw you just migrated test_image.png to dvc. We also need to update the test_image.py script:
|
|
My last concern is, how can reviewers review the image changes? When someone runs |
No need to do a |
|
I just added a new image to my testing branch (commit b8bdb7c in #1071). However, I can't preview the image in DAGsHub: https://dagshub.com/GenericMappingTools/pygmt/src/fix-test-basemap-polar/pygmt/tests/baseline |
|
We may also need to update the maintenance guides about how to review PRs with dvc images. We can do it after we have more experience with the new workflow. |
|
Will let the full tests run before merging 🚀, thanks for the review and testing things out thoroughly!
Yep, will do that in a follow up PR. I'll post a comment on #963 to summarize the new workflow for everyone else on the team. |
…ingTools#1036)

Using a data version control package called [`dvc`](https://github.com/iterative/dvc) to manage the PNG test images in the PyGMT repo! In a nutshell, store only the hash of the PNG on GitHub (in a *.png.dvc file), while having the actual PNG stored on DAGsHub at https://dagshub.com/GenericMappingTools/pygmt.

* Initialize data version control

  Adding dvc package to environment.yml and running `dvc init` to get the barebones .dvcignore, .dvc/config & .dvc/.gitignore files.
* Set dvc remote as https://dagshub.com/GenericMappingTools/pygmt.dvc
* Temporarily installing dvc using pip instead of conda to make CI work
* Refactor test_logo to use mpl_image_compare and track png files in dvc
* Add dvc pull as a step in ci_tests.yaml to pull in data
* List files in pygmt/tests/baseline/ to see what happens after dvc pull
* Do `dvc pull` before `pip install dist/*` otherwise test PNGs aren't there
* First draft of instructions for using dvc to store baseline images
* Instruct to do `git push` first and then `dvc push`

  Technically the order shouldn't matter, but most tutorials seem to use `git push` first so follow that.
* New checklist item for maintainers to get added to DAGsHub dvc remote
* Move pygmt/tests/baseline/.gitignore to top-level
* Clarify that `git rm -r --cached` only needs to run during migration
* Try installing dvc from conda again now that there is a Py3.9 package
* Install dvc and do `dvc pull` on GMT dev tests too
* Refactor test_logo tests to be simpler and more unit-test like
* Mention dvc status command to see which files need staging
* Update test_image to use SI units and long aliases

Co-authored-by: Dongdong Tian <seisman.info@gmail.com>

Description of proposed changes
Using a data version control package called `dvc` to manage the PNG test images in our repository! In a nutshell, we will only store the hash of the PNG on GitHub (in a *.png.dvc file) and the actual PNG will be stored at https://dagshub.com/GenericMappingTools/pygmt (see e.g. https://dagshub.com/GenericMappingTools/pygmt/src/data_version_control/pygmt/tests/baseline/test_logo_on_a_map.png).
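To make the hash-pointer idea concrete, here is a minimal Python sketch of what would be tracked in Git versus what goes to remote storage. It assumes an MD5 content hash and a small YAML-like layout resembling what `dvc add` writes. The helper name `make_pointer` and the exact fields are illustrative assumptions, not dvc's actual implementation.

```python
import hashlib
import tempfile
from pathlib import Path


def make_pointer(png_path: Path) -> str:
    """Return the text of a small pointer file for png_path.

    Hypothetical helper: it mimics the idea behind a *.png.dvc file
    (content hash plus file name), not dvc's real implementation.
    """
    digest = hashlib.md5(png_path.read_bytes()).hexdigest()
    return f"outs:\n- md5: {digest}\n  path: {png_path.name}\n"


# Demo with stand-in bytes rather than a real baseline image.
with tempfile.TemporaryDirectory() as tmpdir:
    image = Path(tmpdir) / "test_logo.png"
    image.write_bytes(b"stand-in for PNG bytes")
    pointer_text = make_pointer(image)
    print(pointer_text)
```

Only the few lines of pointer text would live in the Git history. The image bytes themselves are looked up by their hash on the dvc remote, so changing a PNG changes only the hash stored in Git.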
List of commands run so far on this 'data_version_control' branch (will update as I test things out):
Background (only needed to be done once for this repo)
Installing DVC for developing PyGMT
DVC init
Setup DVC remote
When a new test is being written (Everyone will need to run)
DVC remote authentication config (changes .dvc/config.local)
Generate new baseline PNG images
Push images to DVC remote
Pull PNG images from DVC remote (needed for Github Actions CI)
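As a rough sketch of the "new test" steps, the Python helper below builds the command sequence for tracking one baseline image and, by default, only prints it. The `dvc status`, `dvc add`, and `dvc push` subcommands and the `git push` then `dvc push` order are taken from this PR. The helper names `baseline_commands` and `run`, the commit message, and the example path are assumptions for illustration.

```python
import shlex
import subprocess


def baseline_commands(png_path: str) -> list[list[str]]:
    """Commands to track one new baseline PNG with dvc (illustrative)."""
    return [
        ["dvc", "status"],  # see which files need staging
        ["dvc", "add", png_path],  # writes the pointer file png_path + ".dvc"
        ["git", "add", f"{png_path}.dvc"],  # only the pointer goes into git
        ["git", "commit", "-m", "Add baseline image"],  # placeholder message
        ["git", "push"],  # push the pointer file first
        ["dvc", "push"],  # then upload the PNG to the dvc remote
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command, and execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


commands = baseline_commands("pygmt/tests/baseline/test_logo.png")
run(commands)  # dry run: prints the commands without executing them
```

On the CI side, the counterpart is a single `dvc pull` step that fetches the PNGs before the tests run.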
Note: Some of the `dvc` commands may not be necessary if we `dvc install` some git hooks so that `dvc` add/checkout/push actions happen simply with `git` add/checkout/push, but need to try this out a bit more. One foreseeable con with this solution is that `dvc` may (?) run even for commits where no PNG images change, so an extra layer of slowness.
References:
Addresses #963
Reminders
`make format` and `make check` to make sure the code follows the style guide.
`doc/api/index.rst`.
Slash Commands
You can write slash commands (`/command`) in the first line of a comment to perform specific operations. Supported slash commands are:
`/format`: automatically format and lint the code
`/test-gmt-dev`: run full tests on the latest GMT development version