init

torch.nn.init
object init

No gradients will be recorded for these operations.

Attributes

See also

init.h https://pytorch.org/cppdocs/api/file_torch_csrc_api_include_torch_nn_init.h.html#file-torch-csrc-api-include-torch-nn-init-h

Source
init.scala
Supertypes
class Object
trait Matchable
class Any
Self type
init.type

Members list

Type members

Classlikes

enum Mode

Attributes

Source
init.scala
Supertypes
trait Enum
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any

Value members

Concrete methods

Return the recommended gain value for the given nonlinearity function. The values are as follows:

nonlinearity | gain
Linear / Identity | $1$
Conv{1,2,3}D | $1$
Sigmoid | $1$
Tanh | $\frac{5}{3}$
ReLU | $\sqrt{2}$
Leaky ReLU | $\sqrt{\frac{2}{1 + \text{negative\_slope}^2}}$
SELU | $\frac{3}{4}$

Value parameters

nonlinearity

– the non-linear function (nn.functional name)

param

– optional parameter for the non-linear function

Attributes

See also
Note

In order to implement Self-Normalizing Neural Networks (https://papers.nips.cc/paper/2017/hash/5d44ee6f2c3f71b73125876103c8f6c4-Abstract.html), you should use nonlinearity='linear' instead of nonlinearity='selu'. This gives the initial weights a variance of 1/N, which is necessary to induce a stable fixed point in the forward pass. In contrast, the default gain for SELU sacrifices the normalization effect for more stable gradient flow in rectangular layers.

Source
init.scala
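
As a quick sanity check of the Leaky ReLU row in the table above, the gain can be worked out directly in plain Scala (no torch types involved; negativeSlope is a local value chosen for illustration):

// Leaky ReLU gain with PyTorch's default negative slope of 0.01:
val negativeSlope = 0.01
val leakyGain = math.sqrt(2.0 / (1.0 + negativeSlope * negativeSlope))
// leakyGain ≈ 1.41414, just below the plain ReLU gain sqrt(2) ≈ 1.41421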

Fills the input Tensor with the value fillValue.

Value parameters

fillValue

– the value to fill the tensor with

t

– an n-dimensional torch.Tensor

Attributes

See also
Source
init.scala
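
A minimal usage sketch for the fill-with-constant member above, assuming it is exposed as constant_ (mirroring PyTorch's nn.init.constant_) and that tensors can be created with torch.rand as in the Python API; check init.scala for the exact names:

import torch.nn.init

// Hypothetical names constant_ and torch.rand; the parameters t and
// fillValue are taken from the value-parameter list above.
val w = torch.rand(Seq(3, 5))
init.constant_(w, fillValue = 0.1) // fills w in place and returns it
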
def dirac_[D <: DType](t: Tensor[D]): Tensor[D]

From libTorch: Fills the given tensor with the Dirac delta function in-place, and returns it. No gradient will be recorded for this operation.

From PyTorch: Fills the {3, 4, 5}-dimensional input Tensor with the Dirac delta function. Preserves the identity of the inputs in Convolutional layers, where as many input channels are preserved as possible. In case of groups > 1, each group of channels preserves identity.

Value parameters

groups

(int, optional) – number of groups in the conv layer (default: 1)

t

– a {3, 4, 5}-dimensional torch.Tensor

Attributes

See also
Source
init.scala
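
A short sketch of dirac_ (signature above) on a convolution-shaped weight; only the torch.rand constructor is an assumption borrowed from the Python API:

import torch.nn.init

// A 3-dimensional weight of shape (outChannels, inChannels, kernelWidth).
val convW = torch.rand(Seq(8, 8, 3)) // torch.rand is assumed
init.dirac_(convW)
// Only the centre tap of each matching in/out channel pair is now 1, so the
// convolution initially passes its input channels through unchanged.
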
def eye_[D <: DType](t: Tensor[D]): Tensor[D]

Fills the given 2-dimensional matrix with an identity matrix. No gradient will be recorded for this operation.

Fills the 2-dimensional input Tensor with the identity matrix. Preserves the identity of the inputs in Linear layers, where as many inputs are preserved as possible.

Value parameters

t

– a 2-dimensional torch.Tensor

Attributes

See also
Source
init.scala
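
A usage sketch for eye_ (signature above); torch.rand is again assumed from the Python API:

import torch.nn.init

val linW = torch.rand(Seq(4, 6)) // must be 2-dimensional
init.eye_(linW) // in place: 1 on the main diagonal, 0 everywhere else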

Fills the input Tensor with values according to the method described in "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification" - He, K. et al. (2015), using a normal distribution. The resulting tensor will have values sampled from $N(0, \text{std}^2)$ where $\text{std} = \frac{\text{gain}}{\sqrt{\text{fan\_mode}}}$.

Also known as He initialization.

No gradient will be recorded for this operation.

Value parameters

a

– the negative slope of the rectifier used after this layer (only used with 'leaky_relu')

mode

– either 'fan_in' (default) or 'fan_out'. Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass. Choosing 'fan_out' preserves the magnitudes in the backwards pass.

nonlinearity

– the non-linear function (nn.functional name), recommended to use only with 'relu' or 'leaky_relu' (default).

t

– an n-dimensional torch.Tensor

Attributes

See also
Source
init.scala
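
A hedged sketch with the standard deviation worked out. The method name kaimingNormal_, the string-typed nonlinearity argument, and the Mode case name are assumptions (only the Mode enum itself is listed under Type members above); the arithmetic follows the std formula above:

import torch.nn.init

val w = torch.rand(Seq(256, 128)) // fan_in = 128 for an (out, in) weight
init.kaimingNormal_(w, a = 0.0, mode = init.Mode.FanIn, nonlinearity = "relu")
// std = gain / sqrt(fan_in) = sqrt(2) / sqrt(128) = 0.125 exactly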

Fills the input Tensor with values according to the method described in "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification" - He, K. et al. (2015), using a uniform distribution. The resulting tensor will have values sampled from $U(-\text{bound}, \text{bound})$ where $\text{bound} = \text{gain} \times \sqrt{\frac{3}{\text{fan\_mode}}}$.

Also known as He initialization.

No gradient will be recorded for this operation.

Value parameters

a

– the negative slope of the rectifier used after this layer (only used with 'leaky_relu')

mode

– either 'fan_in' (default) or 'fan_out'. Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass. Choosing 'fan_out' preserves the magnitudes in the backwards pass.

nonlinearity

– the non-linear function (nn.functional name), recommended to use only with 'relu' or 'leaky_relu' (default).

t

– an n-dimensional torch.Tensor

Attributes

See also
Source
init.scala
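
The uniform variant follows the same pattern; kaimingUniform_ and the Mode case name are again assumptions, and the bound follows the formula above:

import torch.nn.init

val w = torch.rand(Seq(64, 32)) // fan_out = 64 for an (out, in) weight
init.kaimingUniform_(w, a = 0.0, mode = init.Mode.FanOut, nonlinearity = "relu")
// bound = sqrt(2) * sqrt(3 / 64) ≈ 0.306, so values lie in U(-0.306, 0.306)
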
def normal_[D <: DType](t: Tensor[D], mean: Double, std: Double): Tensor[D]

Fills the given input Tensor with values drawn from the normal distribution $N(\text{mean}, \text{std}^2)$. No gradient will be recorded for this operation.

Value parameters

mean

– the mean of the normal distribution

std

– the standard deviation of the normal distribution

t

– an n-dimensional torch.Tensor

Attributes

See also
Source
init.scala
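
A usage sketch for the signature above (only the torch.rand constructor is assumed):

import torch.nn.init

val w = torch.rand(Seq(10, 10))
init.normal_(w, mean = 0.0, std = 0.02) // w now holds i.i.d. N(0, 0.02^2) draws
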
def ones_[D <: DType](t: Tensor[D]): Tensor[D]

Fills the input Tensor with the scalar value 1. No gradient will be recorded for this operation.

Value parameters

t

– an n-dimensional torch.Tensor

Attributes

See also
Source
init.scala
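
A usage sketch for the signature above; a typical use is (re)setting a normalization layer's scale parameter (torch.rand assumed):

import torch.nn.init

val gamma = torch.rand(Seq(128))
init.ones_(gamma) // every element is now exactly 1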

Fills the input Tensor with a (semi) orthogonal matrix, as described in Exact solutions to the nonlinear dynamics of learning in deep linear neural networks - Saxe, A. et al. (2013). The input tensor must have at least 2 dimensions, and for tensors with more than 2 dimensions the trailing dimensions are flattened. No gradient will be recorded for this operation.

Value parameters

gain

– optional scaling factor

t

– an n-dimensional torch.Tensor, where n ≥ 2

Attributes

See also
Source
init.scala
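
A hedged sketch; the name orthogonal_ and its optional gain argument mirror PyTorch's nn.init.orthogonal_ and may differ in init.scala:

import torch.nn.init

val w = torch.rand(Seq(512, 256)) // rows >= cols, so the columns come out orthonormal
init.orthogonal_(w, gain = 1.0)
// The 256 columns of w are now mutually orthogonal unit vectors (scaled by gain).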

Fills the 2D input Tensor as a sparse matrix, where the non-zero elements will be drawn from the normal distribution $N(0, 0.01)$, as described in "Deep learning via Hessian-free optimization" - Martens, J. (2010). The sparsity is a real value between 0 and 1 that controls the fraction of elements in each column to be set to zero.

No gradient will be recorded for this operation.

Value parameters

sparsity

– the fraction of elements in each column to be set to zero

std

– the standard deviation of the normal distribution used to generate the non-zero values

t

– an n-dimensional torch.Tensor

Attributes

See also
Source
init.scala
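
A hedged sketch; the name sparse_ is assumed to mirror PyTorch's nn.init.sparse_, with the parameters from the list above:

import torch.nn.init

val w = torch.rand(Seq(100, 50)) // must be 2-dimensional
init.sparse_(w, sparsity = 0.9, std = 0.01)
// About 90 of the 100 entries in each column are zero; the rest ~ N(0, 0.01^2).
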
def trunc_[D <: DType](t: Tensor[D]): Tensor[D]

Attributes

Source
init.scala
def uniform_[D <: DType](t: Tensor[D], a: Double, b: Double): Tensor[D]

Fills the given input Tensor with values drawn from the uniform distribution $U(a, b)$. No gradient will be recorded for this operation.

Value parameters

a

– the lower bound of the uniform distribution

b

– the upper bound of the uniform distribution

t

– an n-dimensional torch.Tensor

Attributes

See also
Source
init.scala
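
A usage sketch for the signature above (torch.rand assumed):

import torch.nn.init

val bias = torch.rand(Seq(30))
init.uniform_(bias, a = -0.1, b = 0.1) // i.i.d. draws from U(-0.1, 0.1)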

Fills the input Tensor with values according to the method described in "Understanding the difficulty of training deep feedforward neural networks" - Glorot, X. & Bengio, Y. (2010), using a uniform distribution. Values are scaled by the gain parameter. The resulting tensor will have values sampled from $U(-a, a)$ where $a = \text{gain} \times \sqrt{\frac{6}{\text{fan\_in} + \text{fan\_out}}}$.

Also known as Glorot initialization.

Value parameters

gain

– an optional scaling factor

t

– an n-dimensional torch.Tensor

Attributes

See also
Source
init.scala
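
A hedged sketch with the bound worked out; the name xavierUniform_ mirrors PyTorch's nn.init.xavier_uniform_ and may differ in init.scala:

import torch.nn.init

val w = torch.rand(Seq(300, 100)) // fan_in = 100, fan_out = 300
init.xavierUniform_(w, gain = 1.0)
// a = sqrt(6 / (100 + 300)) ≈ 0.122, so values lie in U(-0.122, 0.122)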

Fills the input Tensor with values according to the method described in "Understanding the difficulty of training deep feedforward neural networks" - Glorot, X. & Bengio, Y. (2010), using a normal distribution. Values are scaled by the gain parameter. The resulting tensor will have values sampled from $N(0, \text{std}^2)$ where $\text{std} = \text{gain} \times \sqrt{\frac{2}{\text{fan\_in} + \text{fan\_out}}}$.

Also known as Glorot initialization.

Attributes

See also
Source
init.scala
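
A hedged sketch pairing the gain table above with the normal variant; the name xavierNormal_ is an assumption mirroring PyTorch:

import torch.nn.init

val w = torch.rand(Seq(300, 100)) // fan_in = 100, fan_out = 300
init.xavierNormal_(w, gain = 5.0 / 3.0) // the recommended gain for tanh
// std = (5/3) * sqrt(2 / 400) ≈ 0.118
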
def zeros_[D <: DType](t: Tensor[D]): Tensor[D]

Fills the input Tensor with the scalar value 0. No gradient will be recorded for this operation.

Value parameters

t

– an n-dimensional torch.Tensor

Attributes

See also
Source
init.scala
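
A usage sketch for the signature above; zeroing biases while randomizing weights is the classic pairing (torch.rand assumed):

import torch.nn.init

val bias = torch.rand(Seq(64))
init.zeros_(bias) // every element is now exactly 0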