Problem
The current random() function in policyengine_core/commons/formulas.py uses a global execution counter (count_random_calls) to differentiate random streams:
seeds = np.abs(entity_ids * 100 + population.simulation.count_random_calls)
This creates a "ripple effect": adding, removing, or reordering variables that call random() changes the random values for ALL subsequent variables. This makes it impossible to:
- Compare policy versions with confidence (random noise shifts underneath)
- Isolate the effect of a specific policy change
- Run variables in parallel without counter synchronization
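To make the ripple effect concrete, here is a minimal sketch that reuses the seed formula quoted above. The helper `counter_seeded_draws` and the per-entity use of `np.random.default_rng` are assumptions for illustration only, not the actual body of random():

```python
import numpy as np


def counter_seeded_draws(entity_ids, call_count):
    """Hypothetical stand-in for random(): one draw per entity,
    seeded with the counter-based formula from this issue."""
    seeds = np.abs(entity_ids * 100 + call_count)
    # One generator per entity, seeded by that entity's seed value.
    return np.array([np.random.default_rng(int(s)).random() for s in seeds])


entity_ids = np.array([1, 2, 3])

# Baseline: the variable of interest makes the 2nd random() call.
baseline = counter_seeded_draws(entity_ids, call_count=2)

# An unrelated variable that calls random() is added earlier in the
# execution order, so the same variable now makes the 3rd call.
shifted = counter_seeded_draws(entity_ids, call_count=3)

# Same entities, same variable, different draws.
values_changed = not np.allclose(baseline, shifted)
```

The draws for the variable of interest change even though neither its formula nor its inputs did; only its position in the call order moved.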
Proposed Solution: Name-Based Salting
Replace the global counter with the variable name (accessible via population.simulation.tracer.stack[-1]["name"]):
base_seed = stable_hash(f"{variable_name}:{per_variable_call_count}")
seeds = entity_ids ^ base_seed
Benefits:
- Order-independent: Adding/removing variables doesn't affect others
- True reproducibility: Same variable + entity ID = same value, always
- Parallelizable: No global state to synchronize
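The proposal could be sketched roughly as follows. `stable_hash` is not defined in this issue, so the SHA-256 version here is an assumption, as are the helper `name_salted_draws`, the example variable names, and the per-entity use of `np.random.default_rng`. Entity IDs are assumed to be non-negative integers:

```python
import hashlib

import numpy as np


def stable_hash(text: str) -> int:
    """Assumed implementation: a process-independent 32-bit hash.
    (Python's built-in hash() is salted per process for strings.)"""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def name_salted_draws(entity_ids, variable_name, per_variable_call_count=0):
    """Hypothetical helper: one draw per entity, salted by variable name."""
    base_seed = stable_hash(f"{variable_name}:{per_variable_call_count}")
    # XOR keeps seeds non-negative as long as entity IDs are non-negative.
    seeds = np.asarray(entity_ids, dtype=np.int64) ^ base_seed
    return np.array([np.random.default_rng(int(s)).random() for s in seeds])


entity_ids = np.array([1, 2, 3])

# Hypothetical variable names; the draws depend only on name, call index, and ID.
first = name_salted_draws(entity_ids, "takes_up_benefit_a")
again = name_salted_draws(entity_ids, "takes_up_benefit_a")
other = name_salted_draws(entity_ids, "takes_up_benefit_b")
```

Here `first` and `again` are identical regardless of what other variables ran in between, while `other` gets its own stream. One point that may be worth discussing: with a plain XOR, two (entity ID, variable) pairs can map to the same seed whenever `id_1 ^ hash_1 == id_2 ^ hash_2`, so a stronger mixing step might be preferable.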
Breaking Change
This will change random values for all existing simulations using random(). Downstream packages (policyengine-us, policyengine-uk) will see different takeup modeling results.
Questions for Maintainers
- Is this change acceptable given the breaking nature?
- Should we provide a legacy_random() for transition?
- Any concerns about the tracer stack approach?