I came across the term NoiseMultiplier (the noise_multiplier argument) in the following part of a TensorFlow Federated tutorial:
    def train(rounds, noise_multiplier, clients_per_round, data_frame):
      # Using the `dp_aggregator` here turns on differential privacy with adaptive
      # clipping.
      aggregation_factory = tff.learning.model_update_aggregator.dp_aggregator(
          noise_multiplier, clients_per_round)
I have read the paper on differential privacy with adaptive clipping (Andrew et al., 2021, "Differentially Private Learning with Adaptive Clipping"). My guess is that NoiseMultiplier is the noise we inject into the system. However, NoiseMultiplier is a single scalar that we set, whereas I would expect different noise to be added to each of the corresponding weight variables, so I am confused.
To make an aggregation in TFF differentially private (focusing on an additive application of the Gaussian mechanism, which seems to be what is happening in the symbol you’re using), two steps are required:
- Incoming tensors must be clipped, so that the total norm of these (potentially structured) tensors considered as a single vector has an explicit upper bound. In this case, the adaptivity in adaptive clipping refers to the upper bound on this norm; this upper bound (the norm to which the incoming tensors are clipped) is what is adjusted through time.
- Noise sampled according to an isotropic Gaussian with some variance is added to these clipped tensors (possibly before or after aggregating, depending on the DP model, e.g. local vs central).
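The two steps above can be sketched in plain NumPy. This is only an illustration and not TFF's implementation: it assumes a central DP model, a fixed (non-adaptive) clipping norm, client updates already flattened into single vectors, and noise whose standard deviation is `noise_multiplier * clip_norm`. The helper names `clip_to_l2_norm` and `dp_mean` are hypothetical.

```python
import numpy as np


def clip_to_l2_norm(update, clip_norm):
    """Scale `update` down so that its L2 norm is at most `clip_norm`."""
    norm = np.linalg.norm(update)
    # Updates already inside the ball are left unchanged (scale == 1).
    scale = min(1.0, clip_norm / max(norm, 1e-12))
    return update * scale


def dp_mean(client_updates, clip_norm, noise_multiplier, rng):
    """Clip each update, sum, add isotropic Gaussian noise, then average."""
    clipped = [clip_to_l2_norm(u, clip_norm) for u in client_updates]
    total = np.sum(clipped, axis=0)
    # Assumed convention: noise stddev = noise_multiplier * clipping norm.
    stddev = noise_multiplier * clip_norm
    noisy_total = total + rng.normal(0.0, stddev, size=total.shape)
    return noisy_total / len(client_updates)


rng = np.random.default_rng(0)
# Three toy client updates with very different magnitudes.
updates = [rng.normal(size=10) * s for s in (0.1, 1.0, 10.0)]
result = dp_mean(updates, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

In the adaptive-clipping setting, `clip_norm` would itself be updated from round to round rather than held fixed as it is here.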
The noise multiplier here refers to the relationship between the clipping norm in step 1 and the noise added in step 2; in fact, it is the ratio of the noise standard deviation to the clipping norm. It is this ratio which determines the privacy budget ‘used up’ by one application of the query; this can be seen, e.g., in the relationship between sensitivity, epsilon and the noise scale in the Wikipedia article on the Gaussian mechanism.
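As a rough worked version of that relationship, write C for the clipping norm, σ for the noise standard deviation and z for the noise multiplier, taking z as the ratio of the noise standard deviation to the clipping norm (the symbols are introduced here for illustration). Assuming that adding or removing one client changes the clipped sum by at most C in L2 norm, the classical Gaussian mechanism bound for a single query with 0 < ε < 1 reads:

```latex
\[
  \sigma = z \cdot C
  \quad\Longleftrightarrow\quad
  z = \frac{\sigma}{C}
\]
\[
  \sigma \;\ge\; \frac{\Delta_2 \sqrt{2 \ln(1.25/\delta)}}{\varepsilon},
  \qquad \Delta_2 = C
  \quad\Longrightarrow\quad
  z \;\ge\; \frac{\sqrt{2 \ln(1.25/\delta)}}{\varepsilon}
\]
```

The clipping norm cancels, so under these assumptions the (ε, δ) cost of one application depends only on z, not on C and σ separately. Accounting over many rounds in practice uses tighter composition analyses than this single-query bound.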
The parameter set here, then, can be understood as a mechanism for specifying how much privacy each step of the algorithm ‘costs’. It is a scalar because it is simply the ratio of two scalars; the ‘vectorizing’ is handled by sampling from the high-dimensional Gaussian, but since this Gaussian is isotropic (spherical), only its scalar variance needs to be known (rather than the full covariance matrix).
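To make the "one scalar for all weights" point concrete, here is a small NumPy sketch. It assumes the model weights are held in a dictionary of arrays; the function `add_isotropic_noise` and the weight names are hypothetical. Every entry of every tensor gets independent noise with the same standard deviation, which is equivalent to one draw from an isotropic Gaussian over the flattened, concatenated weights.

```python
import numpy as np


def add_isotropic_noise(weights, stddev, rng):
    """Add i.i.d. N(0, stddev^2) noise to every entry of every tensor."""
    return {
        name: w + rng.normal(0.0, stddev, size=w.shape)
        for name, w in weights.items()
    }


rng = np.random.default_rng(0)
# Two weight tensors of different shapes, as in a single dense layer.
weights = {
    "dense/kernel": np.zeros((200, 100)),
    "dense/bias": np.zeros(100),
}
noisy = add_isotropic_noise(weights, stddev=0.5, rng=rng)
```

The noise values differ per weight, but they are all drawn with the same scalar scale, so no per-variable noise parameter is needed.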
Answered By – Keith Rush