Changed disclosure checks to include NAs for cross-tabulations. by anarchodoc · Pull Request #456 · datashield/dsBase

anarchodoc · 2026-02-06T22:57:19Z

This PR fixes a disclosure bug in which a table cell with a count below the threshold can be recoded to NA and then identified in a cross-tabulation. The fix is to include NAs in the disclosure test.

In the following example, the variable 'sex_bin' cannot be tabulated because at least one of its categories has a count below the filter value (3):

> ds.table("D$sex_bin")
  Aggregated (exists("sex_bin", D)) [====================================================] 100% / 1s
  Aggregated (asFactorDS1("D$sex_bin")) [================================================] 100% / 1s
  Aggregated (tableDS(rvar.transmit = "D$sex_bin", cvar.transmit = NULL, stvar.transmit = NULL, )...

All studies failed for reasons identified below

Study1: Failed: at least one cell has a non-zero count less than nfilter.tab i.e. 3

Study2: Failed: at least one cell has a non-zero count less than nfilter.tab i.e. 3

Study3: Failed: at least one cell has a non-zero count less than nfilter.tab i.e. 3

$validity.message
[1] "All studies failed for reasons identified below"

$error.messages
$error.messages$COHORT1
[1] "Failed: at least one cell has a non-zero count less than nfilter.tab i.e. 3"

$error.messages$COHORT2
[1] "Failed: at least one cell has a non-zero count less than nfilter.tab i.e. 3"

$error.messages$COHORT3
[1] "Failed: at least one cell has a non-zero count less than nfilter.tab i.e. 3"

>

So, recode the variable (I know it has 3 categories: 1, 2, 9, and I suspect 9 is the category with the small count):

# Convert to numeric
ds.asNumeric(x.name="D$sex_bin",
             newobj = "sex_bin.n",
             datasources = working)
# Recode 1 to 0, 2 to 1 and 9 to NA
ds.recodeValues(
   var.name = "sex_bin.n",
   values2replace = c(1,2,9),
   new.values.vector = c(0,1,NA),
   newobj = "sex_bin.n",
   datasources = working)

# Reconnect to main working data set
ds.dataFrame(x=c("D","sex_bin.n"),
             newobj = "D",
             datasources = working)

Then we can cross-tabulate the original and recoded variables, and it works:

> ds.table("D$sex_bin","D$sex_bin.n")
  Aggregated (exists("sex_bin", D)) [====================================================] 100% / 1s
  Aggregated (exists("sex_bin.n", D)) [==================================================] 100% / 1s
  Aggregated (asFactorDS1("D$sex_bin")) [================================================] 100% / 1s
  Aggregated (asFactorDS1("D$sex_bin.n")) [==============================================] 100% / 1s
  Aggregated (tableDS(rvar.transmit = "D$sex_bin", cvar.transmit = "D$sex_bin.n", ) [====] 100% / 1s

Data in all studies were valid

Study1: No errors reported from this study

Study2: No errors reported from this study

Study3: No errors reported from this study

$output.list
$output.list$TABLE.STUDY.COHORT1_row.props
         D$sex_bin.n
D$sex_bin   0   1  NA
       1    1   0   0
       2    0   1   0
       9    0   0   1
       NA NaN NaN NaN

$output.list$TABLE.STUDY.COHORT1_col.props
         D$sex_bin.n
D$sex_bin 0 1 NA
       1  1 0  0
       2  0 1  0
       9  0 0  1
       NA 0 0  0

$output.list$TABLE.STUDY.COHORT2_row.props
         D$sex_bin.n
D$sex_bin   0   1  NA
       1    1   0   0
       2    0   1   0
       9    0   0   1
       NA NaN NaN NaN

$output.list$TABLE.STUDY.COHORT2_col.props
         D$sex_bin.n
D$sex_bin 0 1 NA
       1  1 0  0
       2  0 1  0
       9  0 0  1
       NA 0 0  0

$output.list$TABLE.STUDY.COHORT3_row.props
         D$sex_bin.n
D$sex_bin   0   1  NA
       1    1   0   0
       2    0   1   0
       9    0   0   1
       NA NaN NaN NaN

$output.list$TABLE.STUDY.COHORT3_col.props
         D$sex_bin.n
D$sex_bin 0 1 NA
       1  1 0  0
       2  0 1  0
       9  0 0  1
       NA 0 0  0

$output.list$TABLES.COMBINED_all.sources_row.props
         D$sex_bin.n
D$sex_bin   0   1  NA
       1    1   0   0
       2    0   1   0
       9    0   0   1
       NA NaN NaN NaN

$output.list$TABLES.COMBINED_all.sources_col.props
         D$sex_bin.n
D$sex_bin 0 1 NA
       1  1 0  0
       2  0 1  0
       9  0 0  1
       NA 0 0  0

$output.list$TABLE_STUDY.COHORT1_counts
         D$sex_bin.n
D$sex_bin    0    1 NA
       1  3751    0  0
       2     0 3180  0
       9     0    0  2
       NA    0    0  0

$output.list$TABLE_STUDY.COHORT2_counts
         D$sex_bin.n
D$sex_bin    0   1 NA
       1  1107   0  0
       2     0 983  0
       9     0   0  2
       NA    0   0  0

$output.list$TABLE_STUDY.COHORT3_counts
         D$sex_bin.n
D$sex_bin    0    1 NA
       1  1750    0  0
       2     0 1638  0
       9     0    0  1
       NA    0    0  0

$output.list$TABLES.COMBINED_all.sources_counts
         D$sex_bin.n
D$sex_bin    0    1 NA
       1  6608    0  0
       2     0 5801  0
       9     0    0  5
       NA    0    0  0


$validity.message
[1] "Data in all studies were valid"

>

Now we can clearly see the category whose counts were below the filter value (3). The filter does not catch these cells because they now sit in the NA column. Specifically, there are 2 in COHORT1, 2 in COHORT2 and 1 in COHORT3, giving a total of 5 subjects with value 9 in the original variable.
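The mechanism can be reproduced locally in base R. The following is a minimal sketch on toy data, not the dsBase code: `has_small_cell` is a hypothetical helper, and the counts are invented for illustration. It shows that a threshold check run on a table that drops NAs misses the small cell, while the same check run on a table that keeps NAs catches it.

```r
# Hypothetical helper (not part of dsBase): TRUE when any non-zero
# cell count in the table is below the threshold.
has_small_cell <- function(tab, nfilter = 3) {
  any(tab > 0 & tab < nfilter)
}

# Toy data (invented): category 9 has only 2 members.
sex_bin <- c(rep(1, 10), rep(2, 8), rep(9, 2))

# Recode 1 to 0, 2 to 1 and 9 to NA, mirroring the example above.
sex_bin_n <- ifelse(sex_bin == 9, NA, sex_bin - 1)

# Cross-tabulation that drops NAs: the small cell is invisible to the check.
tab_no_na <- table(sex_bin, sex_bin_n, useNA = "no")
has_small_cell(tab_no_na)    # FALSE

# Cross-tabulation that keeps NAs: the small cell is caught.
tab_with_na <- table(sex_bin, sex_bin_n, useNA = "ifany")
has_small_cell(tab_with_na)  # TRUE
```

In this sketch the only difference between the two checks is the `useNA` argument of `table()`. How the check is wired into `tableDS` is defined by the commit itself.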

Changed disclosure checks to include NAs for cross-tabulations.

5fef434

anarchodoc marked this pull request as ready for review February 6, 2026 23:18

anarchodoc wants to merge 1 commit into datashield:master from anarchodoc:master
