def test_zhwiki():
    assert ([round(i) for i in solve(zhwiki.cjk, cache=cache)] ==
            [4, 2, 7, 0, 4, 2, 1])
Maybe test each feature explicitly.
Add test_jawiki & test_kowiki too.
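One way to act on "test each feature explicitly" is to compare values one feature at a time, so a failing assertion names the feature instead of just reporting two unequal lists. This is a minimal sketch: `check_each` is a hypothetical helper, and the usage comment assumes each revscoring feature exposes a `name` attribute. The same helper could back test_jawiki and test_kowiki.

```python
from typing import Sequence


def check_each(names: Sequence[str], actual: Sequence[float],
               expected: Sequence[int]) -> None:
    """Assert feature values one at a time (hypothetical test helper).

    Rounding mirrors test_zhwiki, which compares round(value) against
    integer expectations.
    """
    assert len(names) == len(actual) == len(expected), "length mismatch"
    for name, got, want in zip(names, actual, expected):
        # The message identifies exactly which feature drifted.
        assert round(got) == want, "{}: expected {}, got {}".format(name, want, got)


# Hypothetical usage inside test_zhwiki (assumes features have a .name):
#   features = list(zhwiki.cjk)
#   check_each([f.name for f in features],
#              list(solve(features, cache=cache)),
#              [4, 2, 7, 0, 4, 2, 1])
```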
@halfak
from revscoring.dependencies import solve
from revscoring.datasources import revision_oriented
from editquality.feature_lists import jawiki
from revscoring.features import wikitext
from revscoring.features.modifiers import sub

r_text = revision_oriented.revision.text
p_text = revision_oriented.revision.parent.text

# Japanese sample sentence: "After the defeat in the war, Kuwabara Takeo's
# 'Second-Class Art: On Modern Haiku' (1946) pointed out the limits of haiku
# as a short poetic form." The revision appends "たた" to the final verb.
p_text_text = """
敗戦後は桑原武夫の『第二芸術-現代俳句について』
(1946年)によって、短詩型である俳句の限界が指摘された。
"""
r_text_text = """
敗戦後は桑原武夫の『第二芸術-現代俳句について』
(1946年)によって、短詩型である俳句の限界が指摘されたたた。
"""
cache = {p_text: p_text_text,
         r_text: r_text_text}

# Direct revscoring check (disabled). Note: parent_cjks reads the parent
# datasource and cjks reads the current revision.
# cjkwordthings_change = solve(sub(wikitext.revision.cjk.cjks, wikitext.revision.parent.cjk.cjks, name="revision.diff.cjk.cjkwordthings_change"), cache=cache)
# parent_cjks = len(solve(wikitext.revision.datasources.parent.cjk.cjks, cache=cache))
# cjks = len(solve(wikitext.revision.datasources.cjk.cjks, cache=cache))
# print("Revscoring results:\n parent_cjks = {}\n cjks = {}\n cjkwordthings_change = {}".format(parent_cjks, cjks, cjkwordthings_change))

# The same quantity computed through the editquality feature list:
cjkwordthings_change = list(solve(jawiki.wikitext.diff_cjk, cache=cache))
print("Editquality result:\n cjkwordthings_change = {}".format(cjkwordthings_change))
The inconsistency could be related to the cjk feature-group naming issue I pointed out. See https://github.com/wikimedia/revscoring/pull/501/files#diff-499cc46dd0c97d4e81f2d23e15725821610c10e7be1f5a563846c84865c57069R21
@halfak I thought about it and fixed it in both the datasources and the features (this is being published to pip right now), but it didn't help. See the fixed names in the master revscoring branch:
Try enabling debug mode and running the code. You should get log output every time a dependency is evaluated. You might be able to get away with just setting … Or you might have to configure the logger and set its level to …
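For reference, a minimal sketch of turning on debug logging with Python's standard `logging` module. It assumes, without having been verified here, that revscoring logs through module-level loggers under the `revscoring` namespace (for example `revscoring.dependencies.functions`).

```python
import logging

# Send log records to stderr with a level/logger prefix. basicConfig only
# takes effect if the root logger has no handlers configured yet.
logging.basicConfig(level=logging.DEBUG,
                    format="%(levelname)s:%(name)s:%(message)s")

# Assumption: revscoring modules log via loggers named after their module
# path; setting DEBUG on the "revscoring" parent logger lets DEBUG records
# from all of its child loggers through.
logging.getLogger("revscoring").setLevel(logging.DEBUG)

# Then run the solve(...) calls as usual and watch stderr.
```

Child loggers left at the default `NOTSET` level inherit the parent's effective level, which is why one `setLevel` call on the package logger is usually enough.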
[UPDATE - SOLVED!!! kinda...?] @halfak
Does this all make sense? May I proceed with the revscoring update/merge and add tests to editquality? NOTE:
OK, I think I figured it out. If you look at this line: https://github.com/wikimedia/revscoring/blob/master/revscoring/features/wikitext/datasources/tokenized.py#L21
You'll see that we provide … E.g., …
An alternative solution would be to modify the default name generation for the … You could modify this line to be: … That would ensure that the tokens datasource has a unique name, and it is generally good practice to include any argument that could change the output in the "name".
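To illustrate why the name matters, here is a toy model, not revscoring's actual classes. It assumes, as this discussion implies, that solved values are cached by dependency name, so two tokenized datasources that share a name but use different tokenizers collide. `Datasource`, `solve`, `tokenized`, `whitespace_split` and `char_split` are all hypothetical stand-ins.

```python
from typing import Callable, Dict, List

Tokenizer = Callable[[str], List[str]]


class Datasource:
    """Hypothetical dependency whose identity is its name only."""

    def __init__(self, name: str, func: Tokenizer):
        self.name = name
        self.func = func

    def __hash__(self) -> int:
        # Same name -> same hash, so same-named datasources share a cache slot.
        return hash(self.name)

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Datasource) and self.name == other.name


def solve(ds: Datasource, text: str, cache: Dict[Datasource, List[str]]) -> List[str]:
    # Reuse a cached value if a datasource with this name was already solved.
    if ds not in cache:
        cache[ds] = ds.func(text)
    return cache[ds]


def whitespace_split(text: str) -> List[str]:
    return text.split()


def char_split(text: str) -> List[str]:
    # Crude stand-in for a CJK tokenizer: one token per non-space character.
    return [c for c in text if not c.isspace()]


def tokenized(tokenizer: Tokenizer, include_tokenizer: bool) -> Datasource:
    name = "revision.text.tokens"
    if include_tokenizer:
        # Fold the argument that changes the output into the name.
        name = "{}({})".format(name, tokenizer.__name__)
    return Datasource(name, tokenizer)


text = "俳句 の 限界"
cache = {}  # type: Dict[Datasource, List[str]]
print(solve(tokenized(whitespace_split, False), text, cache))  # ['俳句', 'の', '限界']
print(solve(tokenized(char_split, False), text, cache))  # collision: same list again
```

With `include_tokenizer=True`, the character-split datasource gets its own cache entry and returns `['俳', '句', 'の', '限', '界']`, which is the effect the second option is after.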
FWIW, I like the second option better, but both are good.
@halfak Well... only Bill Murray can sum up my admiration for your debugging skills :) I also like the second solution better. I updated revscoring and am releasing version 2.9.3 so it can be installed from pip; once it's published, I will push a new editquality update with new tests.
Let's add model builds to this before merging. Otherwise, it looks good.
Other notes:
pavol86@ores-misc-01:~$ which python
/usr/bin/python
pavol86@ores-misc-01:~$ python --version
Python 2.7.13
(base) pavol86@ores-misc-01:~$ conda activate editquality_test
(editquality_test) pavol86@ores-misc-01:~$ python --version
Python 3.5.3
initial commit, not ready for merge, missing tests