init
No gradients will be recorded for these operations.
Attributes
- See also
-
init.h https://pytorch.org/cppdocs/api/file_torch_csrc_api_include_torch_nn_init.h.html#file-torch-csrc-api-include-torch-nn-init-h
- Source
- init.scala
- Graph
-
- Supertypes
-
class Object, trait Matchable, class Any
- Self type
-
init.type
Members list
Value members
Concrete methods
Return the recommended gain value for the given nonlinearity function. The values are as follows:

| nonlinearity | gain |
|---|---|
| Linear / Identity | $1$ |
| Conv{1,2,3}D | $1$ |
| Sigmoid | $1$ |
| Tanh | $\frac{5}{3}$ |
| ReLU | $\sqrt{2}$ |
| Leaky ReLU | $\sqrt{\frac{2}{1+\text{negative\_slope}^2}}$ |
| SELU | $\frac{3}{4}$ |
Value parameters
- nonlinearity
-
– the non-linear function (nn.functional name)
- param
-
– optional parameter for the non-linear function
Attributes
- See also
- Note
-
In order to implement Self-Normalizing Neural Networks https://papers.nips.cc/paper/2017/hash/5d44ee6f2c3f71b73125876103c8f6c4-Abstract.html, you should use nonlinearity='linear' instead of nonlinearity='selu'. This gives the initial weights a variance of 1/N, which is necessary to induce a stable fixed point in the forward pass. In contrast, the default gain for SELU sacrifices the normalization effect for more stable gradient flow in rectangular layers.
- Source
- init.scala
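The gain lookup can be sketched in plain Scala, independent of the library. This is a minimal illustration, not the library's implementation: `GainSketch`, `recommendedGain`, the lowercase string names, and the default negative slope of 0.01 are assumptions made for the example. It uses $\sqrt{2}$ for ReLU and $\sqrt{2/(1+\text{negative\_slope}^2)}$ for Leaky ReLU, matching PyTorch's `calculate_gain`.

```scala
object GainSketch {
  // Recommended gain per nonlinearity.
  // `param` is only consulted for "leaky_relu", where it is the negative slope
  // (the default of 0.01 is an assumption for this sketch).
  def recommendedGain(nonlinearity: String, param: Double = 0.01): Double =
    nonlinearity match {
      case "linear" | "identity" | "conv1d" | "conv2d" | "conv3d" | "sigmoid" => 1.0
      case "tanh"       => 5.0 / 3.0
      case "relu"       => math.sqrt(2.0)
      case "leaky_relu" => math.sqrt(2.0 / (1.0 + param * param))
      case "selu"       => 3.0 / 4.0
      case other =>
        throw new IllegalArgumentException(s"Unsupported nonlinearity: $other")
    }

  def main(args: Array[String]): Unit = {
    println(recommendedGain("relu"))            // sqrt(2)
    println(recommendedGain("leaky_relu", 0.2)) // sqrt(2 / 1.04)
  }
}
```

Per the note above, a self-normalizing network would pass the gain for `"linear"` (1.0) rather than the gain for `"selu"` (0.75).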
Fills the input Tensor with the value fillValue.
Value parameters
- fillValue
-
– the value to fill the tensor with
- t
-
– an n-dimensional torch.Tensor
Attributes
- See also
- Source
- init.scala
From libTorch: Fills the given tensor with the Dirac delta function in-place, and returns it. No gradient will be recorded for this operation.
From PyTorch: Fills the {3, 4, 5}-dimensional input Tensor with the Dirac delta function. Preserves the identity of the inputs in Convolutional layers, where as many input channels are preserved as possible. In case of groups > 1, each group of channels preserves identity.
Value parameters
- groups
-
(int, optional) – number of groups in the conv layer (default: 1)
- t
-
– a {3, 4, 5}-dimensional torch.Tensor
Attributes
- See also
- Source
- init.scala
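To make the "preserves identity" behaviour concrete, here is a minimal sketch for the 3-dimensional case using nested arrays instead of tensors. `DiracSketch` and `dirac3d` are hypothetical names; the indexing follows the PyTorch reference behaviour as I understand it (a single 1 at the kernel centre for each preserved channel), and it is an illustration rather than the library's code.

```scala
object DiracSketch {
  // Builds a weight of shape (outChannels, inChannels, kernelSize) that is zero
  // everywhere except for a 1 at the kernel centre, pairing output channel
  // g * outPerGroup + d with input channel d inside each group g.
  def dirac3d(
      outChannels: Int,
      inChannels: Int,
      kernelSize: Int,
      groups: Int = 1
  ): Array[Array[Array[Double]]] = {
    require(outChannels % groups == 0, "outChannels must be divisible by groups")
    val w = Array.fill(outChannels, inChannels, kernelSize)(0.0)
    val outPerGroup = outChannels / groups
    // Only as many channels as both sides have can be preserved.
    val minDim = math.min(outPerGroup, inChannels)
    for (g <- 0 until groups; d <- 0 until minDim)
      w(g * outPerGroup + d)(d)(kernelSize / 2) = 1.0
    w
  }

  def main(args: Array[String]): Unit = {
    val w = dirac3d(outChannels = 4, inChannels = 2, kernelSize = 3, groups = 2)
    w.foreach(out => println(out.map(_.mkString("[", ", ", "]")).mkString(" ")))
  }
}
```

Here `inChannels` is the size of the weight's second dimension, which for a grouped convolution is the number of input channels per group.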
Fills the given 2-dimensional matrix with an identity matrix. No gradient will be recorded for this operation.
Fills the 2-dimensional input Tensor with the identity matrix. Preserves the identity of the inputs in Linear layers, where as many inputs are preserved as possible.
Value parameters
- t
-
– a 2-dimensional torch.Tensor
Attributes
- See also
- Source
- init.scala
Fills the input Tensor with values according to the method described in "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification" - He, K. et al. (2015), using a normal distribution. The resulting tensor will have values sampled from $N(0, \text{std}^2)$ where: $$\text{std} = \frac{\text{gain}}{\sqrt{\text{fan\_mode}}}$$
Also known as He initialization.
No gradient will be recorded for this operation.
Value parameters
- a
-
– the negative slope of the rectifier used after this layer (only used with 'leaky_relu')
- mode
-
– either 'fan_in' (default) or 'fan_out'. Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass. Choosing 'fan_out' preserves the magnitudes in the backwards pass.
- nonlinearity
-
– the non-linear function (nn.functional name), recommended to use only with 'relu' or 'leaky_relu' (default).
- t
-
– an n-dimensional torch.Tensor
Attributes
- See also
- Source
- init.scala
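The standard deviation depends on the fan of the weight tensor. The sketch below shows one way to compute it from a shape, assuming the PyTorch convention that a weight has shape (out, in, kernel dims...) and that the trailing dimensions form the receptive field. `KaimingNormalSketch`, `fans`, and `kaimingStd` are hypothetical helper names, not part of the documented API.

```scala
object KaimingNormalSketch {
  // Returns (fanIn, fanOut) for a weight of shape (out, in, kernel dims...).
  def fans(shape: Seq[Int]): (Int, Int) = {
    require(shape.length >= 2, "fan_in and fan_out need at least 2 dimensions")
    val receptiveField = shape.drop(2).product // 1 when there are no kernel dims
    (shape(1) * receptiveField, shape(0) * receptiveField)
  }

  // std = gain / sqrt(fan), where fan is fan_in or fan_out depending on the mode.
  def kaimingStd(shape: Seq[Int], gain: Double, fanOutMode: Boolean = false): Double = {
    val (fanIn, fanOut) = fans(shape)
    val fan = if (fanOutMode) fanOut else fanIn
    gain / math.sqrt(fan.toDouble)
  }

  def main(args: Array[String]): Unit = {
    // A 3x3 convolution with 32 input and 64 output channels, ReLU gain sqrt(2):
    // fan_in = 32 * 9 = 288, so std = sqrt(2 / 288) = 1 / 12.
    println(kaimingStd(Seq(64, 32, 3, 3), gain = math.sqrt(2.0)))
  }
}
```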
Fills the input Tensor with values according to the method described in "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification" - He, K. et al. (2015), using a uniform distribution. The resulting tensor will have values sampled from $U(-\text{bound}, \text{bound})$ where $$\text{bound} = \text{gain} \times \sqrt{\frac{3}{\text{fan\_mode}}}$$
Also known as He initialization.
No gradient will be recorded for this operation.
Value parameters
- a
-
– the negative slope of the rectifier used after this layer (only used with 'leaky_relu')
- mode
-
– either 'fan_in' (default) or 'fan_out'. Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass. Choosing 'fan_out' preserves the magnitudes in the backwards pass.
- nonlinearity
-
– the non-linear function (nn.functional name), recommended to use only with 'relu' or 'leaky_relu' (default).
- t
-
– an n-dimensional torch.Tensor
Attributes
- See also
- Source
- init.scala
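A short check of why the factor 3 appears in the bound: a uniform distribution on $(-b, b)$ has variance $b^2/3$, so with the bound defined above

$$\operatorname{Var}\left[U(-\text{bound}, \text{bound})\right] = \frac{\text{bound}^2}{3} = \frac{\text{gain}^2}{3} \cdot \frac{3}{\text{fan\_mode}} = \frac{\text{gain}^2}{\text{fan\_mode}}$$

This is the same variance as in the normal-distribution variant, where $\text{std} = \text{gain} / \sqrt{\text{fan\_mode}}$. The two variants differ only in the shape of the distribution, not in its scale.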
Fills the given input Tensor with values drawn from the normal distribution $N(\text{mean},\text{std}^2)$. No gradient will be recorded for this operation.
Value parameters
- mean
-
– the mean of the normal distribution
- std
-
– the standard deviation of the normal distribution
- t
-
– an n-dimensional torch.Tensor
Attributes
- See also
- Source
- init.scala
Fills the input Tensor with the scalar value 1. No gradient will be recorded for this operation.
Value parameters
- t
-
– an n-dimensional torch.Tensor
Attributes
- See also
- Source
- init.scala
Fills the input Tensor with a (semi) orthogonal matrix, as described in Exact solutions to the nonlinear dynamics of learning in deep linear neural networks - Saxe, A. et al. (2013). The input tensor must have at least 2 dimensions, and for tensors with more than 2 dimensions the trailing dimensions are flattened. No gradient will be recorded for this operation.
Value parameters
- gain
-
– optional scaling factor
- t
-
– an n-dimensional torch.Tensor, where n ≥ 2
Attributes
- See also
- Source
- init.scala
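The two ideas in this entry, flattening the trailing dimensions and producing a (semi) orthogonal matrix, can be sketched with nested arrays. Note the substitution: this example orthonormalizes rows with modified Gram-Schmidt, which is simpler to show than a QR decomposition and is not claimed to be what the library does. `OrthogonalSketch` and its methods are hypothetical names, and the sketch only covers the case rows ≤ cols.

```scala
import scala.util.Random

object OrthogonalSketch {
  // Flattens trailing dimensions: (d0, d1, ..., dk) -> (d0, d1 * ... * dk).
  def flattenedShape(shape: Seq[Int]): (Int, Int) = {
    require(shape.length >= 2, "at least 2 dimensions are required")
    (shape.head, shape.tail.product)
  }

  def dot(a: Array[Double], b: Array[Double]): Double =
    a.indices.map(k => a(k) * b(k)).sum

  // Orthonormalizes the rows of a random Gaussian matrix (modified Gram-Schmidt).
  // Assumes rows <= cols so that all rows can be mutually orthogonal.
  def orthonormalRows(rows: Int, cols: Int, rng: Random = new Random(0)): Array[Array[Double]] = {
    require(rows <= cols, "this sketch only handles rows <= cols")
    val m = Array.fill(rows, cols)(rng.nextGaussian())
    for (i <- 0 until rows) {
      // Remove the components along the rows that are already orthonormal.
      for (j <- 0 until i) {
        val proj = dot(m(i), m(j))
        for (k <- 0 until cols) m(i)(k) -= proj * m(j)(k)
      }
      val norm = math.sqrt(dot(m(i), m(i)))
      for (k <- 0 until cols) m(i)(k) /= norm
    }
    m
  }

  def main(args: Array[String]): Unit = {
    println(flattenedShape(Seq(4, 3, 2, 2))) // a (4, 3, 2, 2) tensor is treated as 4 x 12
    val q = orthonormalRows(3, 5)
    println(dot(q(0), q(1))) // close to 0
    println(dot(q(0), q(0))) // close to 1
  }
}
```

Multiplying every entry of the result by a scaling factor would correspond to the gain parameter described above.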
Fills the 2D input Tensor as a sparse matrix, where the non-zero elements will be drawn from the normal distribution $N(0,0.01)$, as described in "Deep learning via Hessian-free optimization" - Martens, J. (2010). The sparsity is a real value between 0 and 1 that controls the fraction of elements in each column to be set to zero.
No gradient will be recorded for this operation.
Value parameters
- sparsity
-
– The fraction of elements in each column to be set to zero
- std
-
– the standard deviation of the normal distribution used to generate the non-zero values
- t
-
– an n-dimensional torch.Tensor
Attributes
- See also
- Source
- init.scala
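A minimal sketch of the column-wise sparsity rule, using nested arrays rather than tensors. It assumes that the number of zeros per column is the sparsity fraction times the number of rows, rounded up, as in the PyTorch reference implementation. `SparseSketch` and `sparse` are hypothetical names for illustration only.

```scala
import scala.util.Random

object SparseSketch {
  // Returns a rows x cols matrix with N(0, std^2) entries, then zeroes
  // ceil(sparsity * rows) randomly chosen entries in every column.
  def sparse(
      rows: Int,
      cols: Int,
      sparsity: Double,
      std: Double = 0.01,
      rng: Random = new Random(0)
  ): Array[Array[Double]] = {
    require(sparsity >= 0.0 && sparsity <= 1.0, "sparsity must be in [0, 1]")
    val m = Array.fill(rows, cols)(rng.nextGaussian() * std)
    val numZeros = math.ceil(sparsity * rows).toInt
    for (c <- 0 until cols; r <- rng.shuffle((0 until rows).toList).take(numZeros))
      m(r)(c) = 0.0
    m
  }

  def main(args: Array[String]): Unit = {
    // 10 rows with sparsity 0.25 gives ceil(2.5) = 3 zeros in each column.
    val m = sparse(rows = 10, cols = 4, sparsity = 0.25)
    m.foreach(row => println(row.map(v => f"$v%8.4f").mkString(" ")))
  }
}
```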
Fills the given input Tensor with values drawn from the uniform distribution $U(a,b)$. No gradient will be recorded for this operation.
Value parameters
- a
-
– the lower bound of the uniform distribution
- b
-
– the upper bound of the uniform distribution
- t
-
– an n-dimensional torch.Tensor
Attributes
- See also
- Source
- init.scala
Fills the input Tensor with values according to the method described in "Understanding the difficulty of training deep feedforward neural networks" - Glorot, X. & Bengio, Y. (2010), using a uniform distribution. Values are scaled by the gain parameter. The resulting tensor will have values sampled from $U(-a, a)$ where $$a = \text{gain} \times \sqrt{\frac{6}{\text{fan\_in} + \text{fan\_out}}}$$
Also known as Glorot initialization.
Value parameters
- gain
-
– an optional scaling factor
- t
-
– an n-dimensional torch.Tensor
Attributes
- See also
- Source
- init.scala
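The bound and the sampling step can be sketched without tensors. This is an illustration under assumptions, not the library's implementation: `XavierUniformSketch`, `bound`, and `sample` are hypothetical names, and the weight is represented as a fanOut x fanIn nested array.

```scala
import scala.util.Random

object XavierUniformSketch {
  // a = gain * sqrt(6 / (fan_in + fan_out))
  def bound(fanIn: Int, fanOut: Int, gain: Double = 1.0): Double =
    gain * math.sqrt(6.0 / (fanIn + fanOut))

  // Draws a fanOut x fanIn matrix with entries uniform in [-a, a).
  def sample(
      fanIn: Int,
      fanOut: Int,
      gain: Double = 1.0,
      rng: Random = new Random(0)
  ): Array[Array[Double]] = {
    val a = bound(fanIn, fanOut, gain)
    // nextDouble() is uniform in [0, 1); rescale it to [-a, a).
    Array.fill(fanOut, fanIn)((rng.nextDouble() * 2.0 - 1.0) * a)
  }

  def main(args: Array[String]): Unit = {
    println(bound(fanIn = 200, fanOut = 100)) // sqrt(6 / 300)
    val w = sample(fanIn = 200, fanOut = 100)
    println(w.flatten.map(math.abs).max)      // never exceeds the bound
  }
}
```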
Fills the input Tensor with values according to the method described in "Understanding the difficulty of training deep feedforward neural networks" - Glorot, X. & Bengio, Y. (2010), using a normal distribution. Values are scaled by the gain parameter. The resulting tensor will have values sampled from $N(0, \text{std}^2)$ where $$\text{std} = \text{gain} \times \sqrt{\frac{2}{\text{fan\_in} + \text{fan\_out}}}$$
Also known as Glorot initialization.
Attributes
- See also
- Source
- init.scala
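As a worked example with assumed values $\text{fan\_in} = 200$, $\text{fan\_out} = 100$ and $\text{gain} = 1$ (chosen only for illustration):

$$\text{std} = 1 \times \sqrt{\frac{2}{200 + 100}} = \sqrt{\frac{1}{150}} \approx 0.0816$$

For the same layer, the uniform variant of Glorot initialization would use the bound $\sqrt{6/300} \approx 0.1414 = \sqrt{3} \times \text{std}$, so both variants give the weights the same variance.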
Fills the input Tensor with the scalar value 0. No gradient will be recorded for this operation.
Value parameters
- t
-
– an n-dimensional torch.Tensor
Attributes
- See also
- Source
- init.scala