LayerNorm
Applies Layer Normalization over a mini-batch of inputs, as described in the paper Layer Normalization (https://arxiv.org/abs/1607.06450).
$$ y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta $$
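As a small worked instance of the formula, take illustrative values that are not from the original docs. Use a single normalized group $x = (1, 2, 3)$ with $\gamma = 1$, $\beta = 0$, the default $\epsilon = 10^{-5}$, and the biased variance (dividing by $N$, not $N - 1$):

$$ \mathrm{E}[x] = \frac{1 + 2 + 3}{3} = 2, \qquad \mathrm{Var}[x] = \frac{(1-2)^2 + (2-2)^2 + (3-2)^2}{3} = \frac{2}{3} $$

$$ y = \frac{x - 2}{\sqrt{2/3 + 10^{-5}}} * 1 + 0 \approx (-1.2247,\ 0,\ 1.2247) $$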
The mean and standard deviation are calculated over the last D dimensions, where D is the number of dimensions of `normalized_shape`. For example, if `normalized_shape` is (3, 5) (a 2-dimensional shape), the mean and standard deviation are computed over the last 2 dimensions of the input (i.e. input.mean((-2, -1))). γ and β are learnable affine transform parameters of shape `normalized_shape` if `elementwise_affine` is `true`. The standard deviation is calculated via the biased estimator, equivalent to torch.var(input, unbiased=False).
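The computation described above can be sketched in plain Scala. This sketch assumes a row-major `Array[Double]` plus an explicit shape instead of a storch tensor. `LayerNormSketch` and `layerNorm` are hypothetical names, not part of the storch API. The sketch only illustrates how each trailing group of `normalizedShape.product` elements is normalized with the biased variance.

```scala
// Hypothetical sketch, not the storch implementation: LayerNorm on a plain
// row-major Array[Double], normalizing over the last normalizedShape.size dims.
object LayerNormSketch:

  def layerNorm(
      x: Array[Double],
      shape: Seq[Int],
      normalizedShape: Seq[Int],
      gamma: Array[Double],
      beta: Array[Double],
      eps: Double = 1e-5
  ): Array[Double] =
    require(shape.takeRight(normalizedShape.size) == normalizedShape,
      "the trailing dimensions of the input must equal normalizedShape")
    val groupSize = normalizedShape.product // elements normalized together
    require(x.length == shape.product, "x must contain shape.product elements")
    require(gamma.length == groupSize && beta.length == groupSize)
    val out = new Array[Double](x.length)
    // Because the array is row-major, each trailing group is a contiguous slice.
    for start <- x.indices by groupSize do
      val group = x.slice(start, start + groupSize)
      val mean = group.sum / groupSize
      // Biased estimator: divide by N, not N - 1 (like torch.var(input, unbiased=False)).
      val variance = group.map(v => (v - mean) * (v - mean)).sum / groupSize
      val denom = math.sqrt(variance + eps)
      for i <- 0 until groupSize do
        out(start + i) = (group(i) - mean) / denom * gamma(i) + beta(i)
    out

@main def runLayerNormSketch(): Unit =
  // Shape (2, 3) with normalizedShape (3): each row of three values is normalized separately.
  val x = Array(1.0, 2.0, 3.0, 10.0, 20.0, 30.0)
  val ones = Array.fill(3)(1.0)  // gamma initialized to ones
  val zeros = Array.fill(3)(0.0) // beta initialized to zeros
  val y = LayerNormSketch.layerNorm(x, Seq(2, 3), Seq(3), ones, zeros)
  println(y.mkString(", "))
```

Both rows map to roughly (-1.2247, 0, 1.2247). Normalization removes each group's own mean and scale, so rows that differ only by a constant factor end up identical.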
Value parameters
- `normalized_shape` – input shape from an expected input of size $[* \times \text{normalized\_shape}[0] \times \text{normalized\_shape}[1] \times \ldots \times \text{normalized\_shape}[-1]]$. If a single integer is used, it is treated as a singleton list, and this module will normalize over the last dimension, which is expected to be of that specific size.
- `elementwise_affine` – a boolean value that, when set to `true`, gives this module learnable per-element affine parameters initialized to ones (for weights) and zeros (for biases). Default: `true`.
- `eps` – a value added to the denominator for numerical stability. Default: 1e-5.
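The three parameters can be illustrated with a small sketch that uses plain arrays. `LayerNormInit`, `shapeOf`, `initAffine` and `invStd` are hypothetical names chosen for this example, not storch API. The sketch assumes the weight and bias are stored flat, with `normalizedShape.product` elements each. It shows the singleton-shape rule for an integer argument, the ones/zeros initialization, and why `eps` keeps the denominator finite for a constant input.

```scala
// Hypothetical sketch, not the storch API: interpreting normalized_shape,
// elementwise_affine and eps with plain arrays instead of tensors.
object LayerNormInit:

  // Learnable parameters, stored flat (row-major) over normalizedShape.
  final case class Affine(weight: Array[Double], bias: Array[Double])

  // A single integer is treated as a one-element shape, i.e. the last dimension only.
  def shapeOf(normalizedShape: Int): Seq[Int] = Seq(normalizedShape)
  def shapeOf(normalizedShape: Seq[Int]): Seq[Int] = normalizedShape

  // With elementwiseAffine = true there is one scale and one bias per normalized
  // element, initialized to ones and zeros; otherwise there are no parameters.
  def initAffine(normalizedShape: Seq[Int], elementwiseAffine: Boolean = true): Option[Affine] =
    if !elementwiseAffine then None
    else
      val n = normalizedShape.product
      Some(Affine(Array.fill(n)(1.0), Array.fill(n)(0.0)))

  // eps keeps sqrt(Var[x] + eps) non-zero, even when a constant input has variance 0.
  def invStd(variance: Double, eps: Double = 1e-5): Double =
    1.0 / math.sqrt(variance + eps)

@main def runLayerNormInit(): Unit =
  val affine = LayerNormInit.initAffine(LayerNormInit.shapeOf(Seq(3, 5))).get
  println(s"weights: ${affine.weight.length}, biases: ${affine.bias.length}") // 15 each
  println(LayerNormInit.initAffine(LayerNormInit.shapeOf(10), elementwiseAffine = false)) // None
  println(LayerNormInit.invStd(0.0)) // finite (about 316.23) thanks to eps
```

Without `eps`, the same call with a zero variance would divide by zero and produce `Infinity`.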
Attributes
- Note
Unlike Batch Normalization and Instance Normalization, which apply a scalar scale and bias to each entire channel/plane with the `affine` option, Layer Normalization applies a per-element scale and bias with `elementwise_affine`.
- Example
import torch.*

// NLP example
val (batch, sentenceLength, embeddingDim) = (20, 5, 10)
val embedding = torch.randn(Seq(batch, sentenceLength, embeddingDim))
val layerNorm = nn.LayerNorm(embeddingDim)
// Apply the module
val out = layerNorm(embedding)

// Image example (lowercase names, since uppercase names in a pattern are matched as constants, not bound)
val (n, c, h, w) = (20, 5, 10, 10)
val input = torch.randn(Seq(n, c, h, w))
// Normalize over the last three dimensions (i.e. the channel and spatial dimensions)
val imageLayerNorm = nn.LayerNorm(Seq(c, h, w))
val output = imageLayerNorm(input)
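To make the note's contrast concrete, the sketch below counts learnable affine parameters for the image example's (C, H, W) = (5, 10, 10). It assumes that per-channel affine means one scalar scale and one bias per channel, as the note describes. `AffineParamCount` is a hypothetical helper, not part of storch.

```scala
// Hypothetical helper (not part of storch): counts learnable affine parameters
// (scale + bias) under the two schemes contrasted in the note.
object AffineParamCount:
  // LayerNorm with elementwise_affine: one scale and one bias per element of normalizedShape.
  def layerNorm(normalizedShape: Seq[Int]): Int = 2 * normalizedShape.product

  // BatchNorm / InstanceNorm with affine: one scalar scale and one bias per channel.
  def perChannel(numChannels: Int): Int = 2 * numChannels

@main def compareAffineParams(): Unit =
  // Same C, H, W as the image example: normalizing over (C, H, W) = (5, 10, 10).
  val (c, h, w) = (5, 10, 10)
  println(s"LayerNorm over ($c, $h, $w): ${AffineParamCount.layerNorm(Seq(c, h, w))} affine parameters") // 1000
  println(s"per-channel affine over $c channels: ${AffineParamCount.perChannel(c)} affine parameters") // 10
```

The per-element scheme scales with the full normalized shape rather than only the channel count, so LayerNorm's affine parameters grow with the spatial size as well.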
- Source: LayerNorm.scala
- Supertypes: trait TensorModule[ParamType], trait HasWeight[ParamType], class Module, class Object, trait Matchable, class Any