BatchNorm2d
Applies Batch Normalization over a 4D input as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
$$y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta$$
The mean and standard deviation are calculated per channel over the mini-batch and spatial dimensions, and $\gamma$ and $\beta$ are learnable parameter vectors of size $C$ (where $C$ is the number of features or channels of the input). By default, the elements of $\gamma$ are set to 1 and the elements of $\beta$ are set to 0. The standard deviation is calculated via the biased estimator, equivalent to torch.var(input, unbiased=False).
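To make the per-channel computation concrete, here is a minimal sketch in plain Scala, assuming the input is a flat Array[Double] in row-major (N, C, H, W) layout. The object BatchNormForwardSketch and its methods channelStats and normalize are hypothetical helpers written for illustration, not part of the BatchNorm2d API; they only mirror the formula above, using the biased variance.

```scala
// Hypothetical helpers (not the Storch API): training-mode BatchNorm2d math
// on a flat Array[Double] laid out as (N, C, H, W).
object BatchNormForwardSketch {

  // Per-channel mean and biased variance over the (N, H, W) slice of each channel.
  def channelStats(x: Array[Double], n: Int, c: Int, h: Int, w: Int): (Array[Double], Array[Double]) = {
    val hw = h * w
    val count = (n * hw).toDouble
    val mean = Array.fill(c)(0.0)
    val variance = Array.fill(c)(0.0)
    for (ch <- 0 until c) {
      var sum = 0.0
      // Flat NCHW index of element (b, ch, i) is (b * C + ch) * H * W + i.
      for (b <- 0 until n; i <- 0 until hw) sum += x((b * c + ch) * hw + i)
      mean(ch) = sum / count
      var sq = 0.0
      for (b <- 0 until n; i <- 0 until hw) {
        val d = x((b * c + ch) * hw + i) - mean(ch)
        sq += d * d
      }
      variance(ch) = sq / count // biased estimator: divide by N*H*W, not N*H*W - 1
    }
    (mean, variance)
  }

  // y = (x - E[x]) / sqrt(Var[x] + eps) * gamma + beta, applied channel by channel.
  def normalize(
      x: Array[Double], n: Int, c: Int, h: Int, w: Int,
      mean: Array[Double], variance: Array[Double],
      gamma: Array[Double], beta: Array[Double],
      eps: Double = 1e-5
  ): Array[Double] = {
    val hw = h * w
    val y = new Array[Double](x.length)
    for (b <- 0 until n; ch <- 0 until c; i <- 0 until hw) {
      val idx = (b * c + ch) * hw + i
      y(idx) = (x(idx) - mean(ch)) / math.sqrt(variance(ch) + eps) * gamma(ch) + beta(ch)
    }
    y
  }
}
```

For example, with N = 2, C = 2, H = 1, W = 2 and the input Array(1.0, 2.0, 10.0, 20.0, 3.0, 4.0, 30.0, 40.0), channelStats yields means 2.5 and 25.0 and biased variances 1.25 and 125.0. With the default $\gamma = 1$ and $\beta = 0$, each output channel then has mean close to 0 and variance close to 1.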
Also by default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default momentum of 0.1.
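Unrolling the update rule for the running statistics, $\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t$, shows how quickly older batches are forgotten. Writing $m$ for the momentum, $\hat{x}_0$ for the initial buffer value and $x_t$ for the statistic observed on the $t$-th training batch, after $T$ batches:
$$\hat{x}_T = (1 - m)^T \hat{x}_0 + m \sum_{t=1}^{T} (1 - m)^{T - t} x_t$$
The weights sum to 1, so the running estimate is an exponential moving average: with the default $m = 0.1$, a batch seen $k$ steps before the latest one carries weight $0.1 \times 0.9^k$, and the initial buffer value still carries weight $0.9^{10} \approx 0.35$ after 10 batches.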
If trackRunningStats is set to false, this layer does not keep running estimates, and batch statistics are used during evaluation as well.
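The choice between batch statistics and running estimates can be sketched as follows. RunningStatsSketch is a hypothetical class, not the Storch implementation: it works on per-channel Double arrays, assumes initial buffer values of 0 (mean) and 1 (variance) as in PyTorch, and applies the momentum update (1 - momentum) * running + momentum * batch. One simplification to be aware of: PyTorch feeds the unbiased batch variance into the running variance, whereas this sketch uses whatever variance it is given.

```scala
// Hypothetical sketch of how BatchNorm2d picks the statistics it normalizes with.
final class RunningStatsSketch(
    numFeatures: Int,
    momentum: Double = 0.1,
    trackRunningStats: Boolean = true
) {
  // Assumed initial buffer values (mean 0, variance 1), as in PyTorch.
  val runningMean: Array[Double] = Array.fill(numFeatures)(0.0)
  val runningVar: Array[Double] = Array.fill(numFeatures)(1.0)
  var training: Boolean = true

  /** Returns the (mean, variance) to normalize with, given this batch's statistics. */
  def statsFor(batchMean: Array[Double], batchVar: Array[Double]): (Array[Double], Array[Double]) =
    if (training) {
      if (trackRunningStats) {
        // Exponential moving average update, one channel at a time.
        for (c <- 0 until numFeatures) {
          runningMean(c) = (1 - momentum) * runningMean(c) + momentum * batchMean(c)
          runningVar(c) = (1 - momentum) * runningVar(c) + momentum * batchVar(c)
        }
      }
      (batchMean, batchVar) // training always normalizes with batch statistics
    } else if (trackRunningStats) {
      (runningMean.clone(), runningVar.clone()) // eval: use the running estimates
    } else {
      (batchMean, batchVar) // no running estimates kept: fall back to batch statistics
    }
}
```

After one training step with batch mean 2.0 and variance 4.0, the running mean is 0.2 and the running variance 1.3; switching training to false then returns those values regardless of the batch, unless trackRunningStats is false.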
Example:
import torch.nn
// With Learnable Parameters
var m = nn.BatchNorm2d(numFeatures = 100)
// Without Learnable Parameters
m = nn.BatchNorm2d(100, affine = false)
val input = torch.randn(Seq(20, 100, 35, 45))
val output = m(input)
Value parameters
- affine: a boolean value that, when set to true, gives this module learnable affine parameters. Default: true
- eps: a value added to the denominator for numerical stability. Default: 1e-5
- momentum: the value used for the runningMean and runningVar computation. Can be set to None for a cumulative moving average (i.e. simple average). Default: 0.1
- numFeatures: number of features or channels $C$ of the input
- trackRunningStats: a boolean value that, when set to true, makes this module track the running mean and variance; when set to false, this module does not track such statistics and initializes the statistics buffers runningMean and runningVar as None. When these buffers are None, this module always uses batch statistics, in both training and eval modes. Default: true
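The difference between a numeric momentum and None can be sketched with a single running scalar. MomentumSketch is hypothetical, and modelling momentum as Option[Double] is an assumption made for illustration; the 1 / numBatchesTracked factor used for the None case mirrors how PyTorch computes its cumulative moving average.

```scala
// Hypothetical sketch: momentum as Option[Double], where None means a cumulative (simple) average.
final class MomentumSketch(momentum: Option[Double]) {
  var running: Double = 0.0
  var numBatchesTracked: Long = 0L

  def update(observed: Double): Unit = {
    numBatchesTracked += 1
    // With None the factor is 1 / (batches seen so far), which makes `running`
    // the plain mean of every observation; with Some(m) it is a fixed-rate EMA.
    val factor = momentum.getOrElse(1.0 / numBatchesTracked)
    running = (1 - factor) * running + factor * observed
  }
}
```

Feeding 1.0, 2.0, 3.0 and 6.0 into MomentumSketch(None) leaves running at their plain mean, 3.0 (up to rounding), while a single update of 10.0 with Some(0.1) gives 1.0.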
Shape:
- Input: $(N, C, H, W)$
- Output: $(N, C, H, W)$ (same shape as input)
Attributes
- Note: This momentum argument is different from the one used in optimizer classes and from the conventional notion of momentum. Mathematically, the update rule for running statistics here is $\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t$, where $\hat{x}$ is the estimated statistic and $x_t$ is the new observed value. Because Batch Normalization is done over the C dimension, computing statistics on (N, H, W) slices, it is common terminology to call this Spatial Batch Normalization.
- Source: BatchNorm2d.scala