torch.nn.functional
Loss functions
Function that measures Binary Cross Entropy between target and input logits.
TODO support weight, reduction, pos_weight
Attributes
 Inherited from:
 Loss (hidden)
 Source
 Loss.scala
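As an illustration of what "binary cross entropy on logits" computes, here is a minimal plain-Scala sketch on sequences of doubles. It is not the library implementation: `BceWithLogitsSketch`, `elementLoss` and `meanLoss` are hypothetical names, and the mean reduction is an assumption, since reduction options are still listed as TODO above.

```scala
object BceWithLogitsSketch {
  /** Per-element loss in the numerically stable form
    * max(x, 0) - x * t + log(1 + exp(-|x|)),
    * which equals -(t * log(sigmoid(x)) + (1 - t) * log(1 - sigmoid(x))).
    */
  def elementLoss(logit: Double, target: Double): Double =
    math.max(logit, 0.0) - logit * target + math.log1p(math.exp(-math.abs(logit)))

  /** Assumption: the per-element losses are averaged. */
  def meanLoss(logits: Seq[Double], targets: Seq[Double]): Double = {
    require(logits.nonEmpty && logits.length == targets.length, "shapes must match")
    logits.zip(targets).map { case (x, t) => elementLoss(x, t) }.sum / logits.length
  }
}
```

Working on logits rather than probabilities avoids evaluating log(0) for very confident predictions.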
Linear functions
Applies a bilinear transformation to the incoming data: $y = x_1^T A x_2 + b$
Shape:
 input1: $(N, *, H_{in1})$ where $H_{in1}=\text{in1_features}$ and $*$ means any number of additional dimensions. All but the last dimension of the inputs should be the same.
 input2: $(N, *, H_{in2})$ where $H_{in2}=\text{in2_features}$
 weight: $(\text{out_features}, \text{in1_features}, \text{in2_features})$
 bias: $(\text{out_features})$
 output: $(N, *, H_{out})$ where $H_{out}=\text{out_features}$ and all but the last dimension are the same shape as the input.
Attributes
 Inherited from:
 Linear (hidden)
 Source
 Linear.scala
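The formula $y = x_1^T A x_2 + b$ can be spelled out for a single pair of input vectors. The sketch below is plain Scala on `Vector[Double]`, with the hypothetical names `BilinearSketch` and `bilinear`; it ignores batching and is only meant to make the weight shape concrete.

```scala
object BilinearSketch {
  /** y(k) = sum over i, j of x1(i) * weight(k)(i)(j) * x2(j), plus bias(k).
    * weight has shape (outFeatures, in1Features, in2Features),
    * bias has shape (outFeatures).
    */
  def bilinear(
      x1: Vector[Double],
      x2: Vector[Double],
      weight: Vector[Vector[Vector[Double]]],
      bias: Vector[Double]
  ): Vector[Double] = {
    require(weight.length == bias.length, "one bias entry per output feature")
    weight.zip(bias).map { case (a, b) =>
      val terms = for {
        i <- x1.indices
        j <- x2.indices
      } yield x1(i) * a(i)(j) * x2(j)
      terms.sum + b
    }
  }
}
```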
Applies a linear transformation to the incoming data: $y = xA^T + b$.
This operation supports 2D weight with sparse layout.
Warning
Sparse support is a beta feature and some layout(s)/dtype/device combinations may not be supported, or may not have autograd support. If you notice missing functionality please open a feature request.
This operator supports TensorFloat32.
Shape:
 Input: $(*, in_features)$ where $*$ means any number of additional dimensions, including none
 Weight: $(out_features, in_features)$ or $(in_features)$
 Bias: $(out_features)$ or $()$
 Output: $(*, out_features)$ or $(*)$, based on the shape of the weight
Attributes
 Inherited from:
 Linear (hidden)
 Source
 Linear.scala
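For a single input vector, $y = xA^T + b$ reduces to one dot product per row of the weight. A minimal plain-Scala sketch, assuming a dense 2D weight of shape (outFeatures, inFeatures); `LinearSketch` and `linear` are hypothetical names:

```scala
object LinearSketch {
  /** y(k) = sum over i of weight(k)(i) * x(i), plus bias(k). */
  def linear(
      x: Vector[Double],
      weight: Vector[Vector[Double]],
      bias: Vector[Double]
  ): Vector[Double] = {
    require(weight.length == bias.length, "one bias entry per output feature")
    require(weight.forall(_.length == x.length), "each weight row must match the input size")
    weight.zip(bias).map { case (row, b) =>
      row.zip(x).map { case (w, xi) => w * xi }.sum + b
    }
  }
}
```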
Sparse functions
Takes a LongTensor with index values of shape (*) and returns a tensor of shape (*, numClasses) that has zeros everywhere except where the index of the last dimension matches the corresponding value of the input tensor, in which case it is 1.
Attributes
 Inherited from:
 Sparse (hidden)
 Source
 Sparse.scala
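The one-hot mapping described above can be sketched for a flat sequence of indices. This is plain Scala with the hypothetical names `OneHotSketch` and `oneHot`; each input index becomes one row of length `numClasses`.

```scala
object OneHotSketch {
  /** Returns one row per index: 1 at position idx, 0 elsewhere. */
  def oneHot(indices: Seq[Long], numClasses: Int): Seq[Vector[Long]] =
    indices.map { idx =>
      require(idx >= 0 && idx < numClasses, s"index $idx is outside [0, $numClasses)")
      Vector.tabulate(numClasses)(c => if (c.toLong == idx) 1L else 0L)
    }
}
```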
Pooling functions
Applies a 1D average pooling over an input signal composed of several input planes.
Value parameters
 ceilMode

If true, will use ceil instead of floor to compute the output shape. This ensures that every element in the input tensor is covered by a sliding window.
 countIncludePad

When true, will include the zero-padding in the averaging calculation.
 input

input tensor of shape $(\text{minibatch} , \text{in_channels} , iW)$
 kernelSize

the size of the window.
 padding

implicit zero paddings on both sides of the input. Can be a single number or a tuple (padW,).
 stride

the stride of the window. Default: kernelSize
Attributes
 See also

torch.nn.AdaptiveAvgPool1d for details and output shape.
 Inherited from:
 Pooling (hidden)
 Source
 Pooling.scala
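The interaction of `padding`, `stride` and `countIncludePad` is easiest to see on a single channel. The following plain-Scala sketch uses the hypothetical names `AvgPool1dSketch` and `avgPool1d`, covers only the floor case (`ceilMode` is left out), and assumes the output length is floor((n + 2 * padding - kernelSize) / stride) + 1.

```scala
object AvgPool1dSketch {
  /** Average pooling over one channel; stride defaults to kernelSize. */
  def avgPool1d(
      input: Vector[Double],
      kernelSize: Int,
      stride: Option[Int] = None,
      padding: Int = 0,
      countIncludePad: Boolean = true
  ): Vector[Double] = {
    val s = stride.getOrElse(kernelSize)
    require(kernelSize > 0 && s > 0, "kernelSize and stride must be positive")
    require(padding >= 0 && 2 * padding <= kernelSize, "padding must be in [0, kernelSize / 2]")
    require(input.length + 2 * padding >= kernelSize, "input is shorter than the window")
    val outLen = (input.length + 2 * padding - kernelSize) / s + 1
    Vector.tabulate(outLen) { o =>
      val start = o * s - padding
      // Positions outside the input are zero padding and add nothing to the sum.
      val inside = (start until start + kernelSize).filter(i => i >= 0 && i < input.length)
      val divisor = if (countIncludePad) kernelSize else inside.length
      inside.map(input).sum / divisor
    }
  }
}
```

With `countIncludePad = false` the divisor shrinks at the borders, so padded windows are averaged only over real input elements.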
Applies a 2D max pooling over an input signal composed of several input planes.
Attributes
 Inherited from:
 Pooling (hidden)
 Source
 Pooling.scala
Applies a 3D max pooling over an input signal composed of several input planes.
Attributes
 Inherited from:
 Pooling (hidden)
 Source
 Pooling.scala
Applies a 1D max pooling over an input signal composed of several input planes.
Value parameters
 ceilMode

If true, will use ceil instead of floor to compute the output shape. This ensures that every element in the input tensor is covered by a sliding window.
 dilation

The stride between elements within a sliding window, must be > 0.
 input

input tensor of shape $(\text{minibatch} , \text{in_channels} , iW)$, minibatch dim optional.
 kernelSize

the size of the window.
 padding

Implicit negative infinity padding to be added on both sides, must be >= 0 and <= kernel_size / 2.
 stride

the stride of the window.
Attributes
 Inherited from:
 Pooling (hidden)
 Source
 Pooling.scala
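To make `dilation` and the negative-infinity padding concrete, here is a single-channel sketch in plain Scala. `MaxPool1dSketch` and `maxPool1d` are hypothetical names, only the floor case is covered (`ceilMode` is left out), and the output length is assumed to be floor((n + 2 * padding - dilation * (kernelSize - 1) - 1) / stride) + 1.

```scala
object MaxPool1dSketch {
  /** Max pooling over one channel with optional padding and dilation. */
  def maxPool1d(
      input: Vector[Double],
      kernelSize: Int,
      stride: Int,
      padding: Int = 0,
      dilation: Int = 1
  ): Vector[Double] = {
    require(kernelSize > 0 && stride > 0 && dilation > 0, "sizes must be positive")
    require(padding >= 0 && 2 * padding <= kernelSize, "padding must be in [0, kernelSize / 2]")
    // Number of input positions spanned by one dilated window.
    val span = dilation * (kernelSize - 1) + 1
    require(input.length + 2 * padding >= span, "input is shorter than the window")
    val outLen = (input.length + 2 * padding - span) / stride + 1
    Vector.tabulate(outLen) { o =>
      val start = o * stride - padding
      (0 until kernelSize)
        .map(k => start + k * dilation)
        // Padded positions behave like negative infinity and never win the max.
        .map(i => if (i >= 0 && i < input.length) input(i) else Double.NegativeInfinity)
        .reduce((a, b) => math.max(a, b))
    }
  }
}
```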
Applies a 1D max pooling over an input signal composed of several input planes.
Value parameters
 ceilMode

If true, will use ceil instead of floor to compute the output shape. This ensures that every element in the input tensor is covered by a sliding window.
 dilation

The stride between elements within a sliding window, must be > 0.
 input

input tensor of shape $(\text{minibatch} , \text{in_channels} , iW)$, minibatch dim optional.
 kernelSize

the size of the window.
 padding

Implicit negative infinity padding to be added on both sides, must be >= 0 and <= kernel_size / 2.
 stride

the stride of the window.
Attributes
 Inherited from:
 Pooling (hidden)
 Source
 Pooling.scala
Applies a 2D max pooling over an input signal composed of several input planes.
Value parameters
 ceilMode

If true, will use ceil instead of floor to compute the output shape. This ensures that every element in the input tensor is covered by a sliding window.
 dilation

The stride between elements within a sliding window, must be > 0.
 input

input tensor $(\text{minibatch} , \text{in_channels} , iH , iW)$, minibatch dim optional.
 kernelSize

size of the pooling region. Can be a single number or a tuple (kH, kW).
 padding

Implicit negative infinity padding to be added on both sides, must be >= 0 and <= kernel_size / 2.
 stride

stride of the pooling operation. Can be a single number or a tuple (sH, sW).
Attributes
 See also

torch.nn.MaxPool2d for details.
 Inherited from:
 Pooling (hidden)
 Source
 Pooling.scala
Applies a 2D max pooling over an input signal composed of several input planes.
Value parameters
 ceilMode

If true, will use ceil instead of floor to compute the output shape. This ensures that every element in the input tensor is covered by a sliding window.
 dilation

The stride between elements within a sliding window, must be > 0.
 input

input tensor $(\text{minibatch} , \text{in_channels} , iH , iW)$, minibatch dim optional.
 kernelSize

size of the pooling region. Can be a single number or a tuple (kH, kW).
 padding

Implicit negative infinity padding to be added on both sides, must be >= 0 and <= kernel_size / 2.
 stride

stride of the pooling operation. Can be a single number or a tuple (sH, sW).
Attributes
 See also

torch.nn.MaxPool2d for details.
 Inherited from:
 Pooling (hidden)
 Source
 Pooling.scala
Applies a 3D max pooling over an input signal composed of several input planes.
Attributes
 Inherited from:
 Pooling (hidden)
 Source
 Pooling.scala
Applies a 3D max pooling over an input signal composed of several input planes.
Attributes
 Inherited from:
 Pooling (hidden)
 Source
 Pooling.scala
Convolution functions
Applies a 1D convolution over an input signal composed of several input planes.
Attributes
 Inherited from:
 Convolution (hidden)
 Source
 Convolution.scala
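"Several input planes" means the kernel has one row per input channel and the per-channel products are summed into one output value. A minimal plain-Scala sketch for a single output channel, assuming stride 1, no padding and no dilation (parameters this page does not list); `Conv1dSketch` and `conv1d` are hypothetical names. As in PyTorch, the kernel is not flipped, so this is a cross-correlation.

```scala
object Conv1dSketch {
  /** out(o) = sum over channels c and taps j of input(c)(o + j) * kernel(c)(j), plus bias. */
  def conv1d(
      input: Vector[Vector[Double]],  // (inChannels, width)
      kernel: Vector[Vector[Double]], // (inChannels, kernelWidth)
      bias: Double = 0.0
  ): Vector[Double] = {
    require(input.nonEmpty && input.length == kernel.length, "channel counts must match")
    val width = input.head.length
    val k = kernel.head.length
    require(input.forall(_.length == width) && kernel.forall(_.length == k), "ragged input")
    require(k > 0 && width >= k, "kernel must fit inside the input")
    Vector.tabulate(width - k + 1) { o =>
      val terms = for {
        c <- input.indices
        j <- 0 until k
      } yield input(c)(o + j) * kernel(c)(j)
      terms.sum + bias
    }
  }
}
```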
Applies a 2D convolution over an input signal composed of several input planes.
Attributes
 Inherited from:
 Convolution (hidden)
 Source
 Convolution.scala
Applies a 3D convolution over an input image composed of several input planes.
Attributes
 Inherited from:
 Convolution (hidden)
 Source
 Convolution.scala
Applies a 1D transposed convolution operator over an input signal composed of several input planes, sometimes also called “deconvolution”.
Attributes
 Inherited from:
 Convolution (hidden)
 Source
 Convolution.scala
Applies a 2D transposed convolution operator over an input image composed of several input planes, sometimes also called “deconvolution”.
Attributes
 Inherited from:
 Convolution (hidden)
 Source
 Convolution.scala
Applies a 3D transposed convolution operator over an input image composed of several input planes, sometimes also called “deconvolution”.
Attributes
 Inherited from:
 Convolution (hidden)
 Source
 Convolution.scala
Dropout functions
During training, randomly zeroes some of the elements of the input tensor with probability p using samples from a Bernoulli distribution.
Attributes
 See also

torch.nn.Dropout for details.
 Inherited from:
 Dropout (hidden)
 Source
 Dropout.scala
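A plain-Scala sketch of the Bernoulli masking described above, using the hypothetical names `DropoutSketch` and `dropout`. The rescaling of kept elements by 1 / (1 - p) is an assumption taken from the behavior documented for torch.nn.Dropout; this page does not state it.

```scala
import scala.util.Random

object DropoutSketch {
  /** Zeroes each element with probability p while training; identity otherwise. */
  def dropout(input: Vector[Double], p: Double, training: Boolean, rng: Random): Vector[Double] = {
    require(p >= 0.0 && p <= 1.0, "p must be a probability")
    if (!training || p == 0.0) input
    else if (p == 1.0) input.map(_ => 0.0)
    else {
      // Assumption: kept elements are scaled so the expected value is unchanged.
      val scale = 1.0 / (1.0 - p)
      input.map(x => if (rng.nextDouble() < p) 0.0 else x * scale)
    }
  }
}
```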
Nonlinear activation functions
Applies a softmax followed by a logarithm.
While mathematically equivalent to log(softmax(x)), doing these two operations separately is slower and numerically unstable. This function uses an alternative formulation to compute the output and gradient correctly.
See torch.nn.LogSoftmax for more details.
Attributes
 Inherited from:
 Activations (hidden)
 Source
 Activations.scala
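One common stable formulation subtracts the maximum before exponentiating (the log-sum-exp trick). The plain-Scala sketch below illustrates that idea for a single vector; it is an assumption that this matches what the underlying implementation does, and `LogSoftmaxSketch` and `logSoftmax` are hypothetical names.

```scala
object LogSoftmaxSketch {
  /** logSoftmax(x)(i) = x(i) - (m + log(sum over j of exp(x(j) - m))), with m = max(x). */
  def logSoftmax(x: Vector[Double]): Vector[Double] = {
    require(x.nonEmpty, "input must not be empty")
    val m = x.reduce((a, b) => math.max(a, b))
    // Shifting by m keeps every exponent <= 0, so exp cannot overflow.
    val logSumExp = m + math.log(x.map(v => math.exp(v - m)).sum)
    x.map(_ - logSumExp)
  }
}
```

Computing `log(softmax(x))` naively on inputs such as 1000.0 would overflow in `exp`; the shifted form stays finite.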
Applies the rectified linear unit function elementwise.
See torch.nn.ReLU for more details.
Attributes
 Inherited from:
 Activations (hidden)
 Source
 Activations.scala
Applies the elementwise function $\text{Sigmoid}(x) = \frac{1}{1 + \exp(-x)}$
See torch.nn.Sigmoid for more details.
Attributes
 Inherited from:
 Activations (hidden)
 Source
 Activations.scala
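A scalar sketch of the logistic sigmoid, $1 / (1 + \exp(-x))$, in plain Scala. The two branches are a common way to avoid overflow in `exp` for large negative inputs; `SigmoidSketch` and `sigmoid` are hypothetical names.

```scala
object SigmoidSketch {
  /** Logistic sigmoid, evaluated so that exp never receives a large positive argument. */
  def sigmoid(x: Double): Double =
    if (x >= 0.0) 1.0 / (1.0 + math.exp(-x))
    else {
      val e = math.exp(x)
      e / (1.0 + e)
    }
}
```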
Applies the Sigmoid Linear Unit (SiLU) function, elementwise. The SiLU function is also known as the swish function.
Attributes
 Inherited from:
 Activations (hidden)
 Source
 Activations.scala
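SiLU is commonly defined as $\text{silu}(x) = x \cdot \sigma(x)$, where $\sigma$ is the logistic sigmoid. A scalar sketch in plain Scala, with the hypothetical names `SiluSketch` and `silu`:

```scala
object SiluSketch {
  /** silu(x) = x * sigmoid(x) = x / (1 + exp(-x)). */
  def silu(x: Double): Double = x / (1.0 + math.exp(-x))
}
```

For large positive x the result approaches x, and for large negative x it approaches 0.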
Value members
Inherited methods
This criterion computes the cross entropy loss between input logits and target. See torch.nn.loss.CrossEntropyLoss for details.
Shape:
 Input: Shape $(C)$, $(N, C)$ or $(N, C, d_1, d_2, ..., d_K)$ with $K \geq 1$ in the case of K-dimensional loss.
 Target: If containing class indices, shape $()$, $(N)$ or $(N, d_1, d_2, ..., d_K)$ with $K \geq 1$ in the case of K-dimensional loss, where each value should be in $[0, C)$. If containing class probabilities, same shape as the input, and each value should be in $[0, 1]$.
where:
 C = number of classes
 N = batch size
Value parameters
 ignore_index

Specifies a target value that is ignored and does not contribute to the input gradient. When size_average is true, the loss is averaged over non-ignored targets. Note that ignore_index is only applicable when the target contains class indices. Default: -100
 input

Predicted unnormalized logits; see Shape section above for supported shapes.
 label_smoothing

A float in [0.0, 1.0]. Specifies the amount of smoothing when computing the loss, where 0.0 means no smoothing. The targets become a mixture of the original ground truth and a uniform distribution as described in Rethinking the Inception Architecture for Computer Vision. Default: 0.0
 reduce

Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is false, returns a loss per batch element instead and ignores size_average. Default: true
 reduction

Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'
 size_average

Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to false, the losses are instead summed for each minibatch. Ignored when reduce is false. Default: true
 target

Ground truth class indices or class probabilities; see Shape section above for supported shapes.
 weight

a manual rescaling weight given to each class. If given, has to be a Tensor of size C
Attributes
 Returns
 See also
 Example

// Example of target with class indices
val input = torch.randn(3, 5, requires_grad=True)
val target = torch.randint(5, (3,), dtype=torch.int64)
val loss = F.cross_entropy(input, target)
loss.backward()

// Example of target with class probabilities
val input = torch.randn(3, 5, requires_grad=True)
val target = torch.randn(3, 5).softmax(dim=1)
val loss = F.crossEntropy(input, target)
loss.backward()
 Inherited from:
 Loss (hidden)
 Source
 Loss.scala
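The class-index case can be sketched without any tensor library. The plain-Scala code below uses the hypothetical names `CrossEntropySketch`, `sampleLoss` and `meanLoss`, and assumes 'mean' reduction, no class weights, and that label smoothing mixes the true class with a uniform distribution over the C classes. The `ignoreIndex` default of -100 follows PyTorch.

```scala
object CrossEntropySketch {
  // Stable log-softmax of one logit vector.
  private def logSoftmax(x: Vector[Double]): Vector[Double] = {
    val m = x.reduce((a, b) => math.max(a, b))
    val logSumExp = m + math.log(x.map(v => math.exp(v - m)).sum)
    x.map(_ - logSumExp)
  }

  /** Loss for one sample with C logits and a class index in [0, C). */
  def sampleLoss(logits: Vector[Double], target: Int, labelSmoothing: Double = 0.0): Double = {
    require(target >= 0 && target < logits.length, "target must be in [0, C)")
    require(labelSmoothing >= 0.0 && labelSmoothing <= 1.0, "labelSmoothing must be in [0, 1]")
    val logProbs = logSoftmax(logits)
    val nll = -logProbs(target)                  // loss against the true class
    val uniform = -logProbs.sum / logits.length  // loss against a uniform target
    (1.0 - labelSmoothing) * nll + labelSmoothing * uniform
  }

  /** Mean over the batch, skipping samples whose target equals ignoreIndex. */
  def meanLoss(batch: Seq[Vector[Double]], targets: Seq[Int], ignoreIndex: Int = -100): Double = {
    require(batch.length == targets.length, "one target per sample")
    val kept = batch.zip(targets).filter { case (_, t) => t != ignoreIndex }
    kept.map { case (x, t) => sampleLoss(x, t) }.sum / kept.length
  }
}
```

With uniform logits over C classes the loss is log(C), which is a convenient sanity check for a freshly initialized classifier.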