
Tensors

Tensors are a specialized data structure that is very similar to arrays and matrices. In PyTorch, we use tensors to encode the inputs and outputs of a model, as well as the model’s parameters.

Tensors are similar to NumPy’s ndarrays, except that tensors can run on GPUs or other hardware accelerators. Tensors are also optimized for automatic differentiation (we'll see more about that later in the Autograd section). If you’re familiar with ndarrays, you’ll be right at home with the Tensor API. If not, follow along!

Initializing a Tensor

Tensors can be initialized in various ways. Take a look at the following examples:

Directly from data

Tensors can be created directly from data. The data type is automatically inferred.

val data = Seq(1, 2, 3, 4)
// data: Seq[Int] = List(1, 2, 3, 4)
val xData = torch.Tensor(data).reshape(2,2)
// xData: Tensor[Int32] = tensor dtype=int32, shape=[2, 2], device=CPU 
// [[1, 2],
//  [3, 4]]

From another tensor:

The new tensor retains the properties (shape, datatype) of the argument tensor, unless explicitly overridden.

// Ones Tensor:
val xOnes = torch.onesLike(xData) // retains the properties of xData
// xOnes: Tensor[Int32] = tensor dtype=int32, shape=[2, 2], device=CPU 
// [[1, 1],
//  [1, 1]]
// Random Tensor:
val xRand = torch.randLike(xData, dtype=torch.float32) // overrides the datatype of xData
// xRand: Tensor[Float32] = tensor dtype=float32, shape=[2, 2], device=CPU 
// [[0.4963, 0.7682],
//  [0.0885, 0.1320]]

With random or constant values:

shape is a sequence of tensor dimensions. In the functions below, it determines the dimensionality of the output tensor.

val shape = Seq(2,3)
// shape: Seq[Int] = List(2, 3)

// Random Tensor:
val randTensor = torch.rand(shape)
// randTensor: Tensor[Float32] = tensor dtype=float32, shape=[2, 3], device=CPU 
// [[0.3074, 0.6341, 0.4901],
//  [0.8964, 0.4556, 0.6323]]

// Ones Tensor: 
val onesTensor = torch.ones(shape)
// onesTensor: Tensor[Float32] = tensor dtype=float32, shape=[2, 3], device=CPU 
// [[1.0000, 1.0000, 1.0000],
//  [1.0000, 1.0000, 1.0000]]

// Zeros Tensor:
val zerosTensor = torch.zeros(shape)
// zerosTensor: Tensor[Float32] = tensor dtype=float32, shape=[2, 3], device=CPU 
// [[0.0000, 0.0000, 0.0000],
//  [0.0000, 0.0000, 0.0000]]

Attributes of a Tensor

Tensor attributes describe their shape, datatype, and the device on which they are stored.

var tensor = torch.rand(Seq(3,4))
// tensor: Tensor[Float32] = tensor dtype=float32, shape=[3, 4], device=CPU 
// [[0.3489, 0.4017, 0.0223, 0.1689],
//  [0.2939, 0.5185, 0.6977, 0.8000],
//  [0.1610, 0.2823, 0.6816, 0.9152]]

println(s"Shape of tensor: ${tensor.shape}")
// Shape of tensor: ArraySeq(3, 4)
println(s"Datatype of tensor: ${tensor.dtype}")
// Datatype of tensor: float32
println(s"Device tensor is stored on: {tensor.device}")
// Device tensor is stored on: {tensor.device}

Operations on Tensors

Over 100 tensor operations, including arithmetic, linear algebra, matrix manipulation (transposing, indexing, slicing), sampling and more are comprehensively described here.

Each of these operations can be run on the GPU (at typically higher speeds than on a CPU). If you’re using Colab, allocate a GPU by going to Runtime > Change runtime type > GPU.

By default, tensors are created on the CPU. We need to explicitly move tensors to the GPU using the .to method (after checking for GPU availability). Keep in mind that copying large tensors across devices can be expensive in terms of time and memory! Here we move our tensor to the GPU if one is available:

if torch.cuda.isAvailable then  
  tensor = tensor.to(torch.Device.CUDA)
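
A common follow-up pattern is to pick the target device once and reuse it whenever you create or move tensors. The sketch below uses only the calls shown above, plus an assumed torch.Device.CPU value alongside torch.Device.CUDA:

// choose the device once, falling back to the CPU when no GPU is present
val device = if torch.cuda.isAvailable then torch.Device.CUDA else torch.Device.CPU
// move the tensor to the chosen device
tensor = tensor.to(device)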

Try out some of the operations from the list. If you're familiar with the NumPy API, you'll find the Tensor API a breeze to use.

Standard numpy-like indexing and slicing:

import torch.{---, Slice}
tensor = torch.ones(Seq(4, 4))
println(s"First row: ${tensor(0)}")
// First row: tensor dtype=float32, shape=[4], device=CPU 
// [1.0000, 1.0000, 1.0000, 1.0000]
println(s"First column: ${tensor(Slice(), 0)}")
// First column: tensor dtype=float32, shape=[4], device=CPU 
// [1.0000, 1.0000, 1.0000, 1.0000]
println(s"Last column: ${tensor(---, -1)}")
// Last column: tensor dtype=float32, shape=[4], device=CPU 
// [1.0000, 1.0000, 1.0000, 1.0000]
//tensor(---,1) = 0 TODO update op
println(tensor)
// tensor dtype=float32, shape=[4, 4], device=CPU 
// [[1.0000, 1.0000, 1.0000, 1.0000],
//  [1.0000, 1.0000, 1.0000, 1.0000],
//  [1.0000, 1.0000, 1.0000, 1.0000],
//  [1.0000, 1.0000, 1.0000, 1.0000]]

Joining tensors

You can use torch.cat to concatenate a sequence of tensors along a given dimension. See also torch.stack, another tensor joining op that is subtly different from torch.cat.

val t1 = torch.cat(Seq(tensor, tensor, tensor), dim=1)
// t1: Tensor[Float32] = tensor dtype=float32, shape=[4, 12], device=CPU 
// [[1.0000, 1.0000, 1.0000, ..., 1.0000, 1.0000, 1.0000],
//  [1.0000, 1.0000, 1.0000, ..., 1.0000, 1.0000, 1.0000],
//  [1.0000, 1.0000, 1.0000, ..., 1.0000, 1.0000, 1.0000],
//  [1.0000, 1.0000, 1.0000, ..., 1.0000, 1.0000, 1.0000]]
println(t1)
// tensor dtype=float32, shape=[4, 12], device=CPU 
// [[1.0000, 1.0000, 1.0000, ..., 1.0000, 1.0000, 1.0000],
//  [1.0000, 1.0000, 1.0000, ..., 1.0000, 1.0000, 1.0000],
//  [1.0000, 1.0000, 1.0000, ..., 1.0000, 1.0000, 1.0000],
//  [1.0000, 1.0000, 1.0000, ..., 1.0000, 1.0000, 1.0000]]
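
The note above also mentions torch.stack. As a minimal sketch (assuming storch exposes torch.stack with a cat-like signature): stack inserts a new dimension instead of growing an existing one, so stacking three 4x4 tensors yields shape [3, 4, 4], whereas the cat call above produced shape [4, 12].

val stacked = torch.stack(Seq(tensor, tensor, tensor), dim=0)
println(stacked.shape)
// expected: ArraySeq(3, 4, 4)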

Arithmetic operations

// This computes the matrix multiplication between two tensors.
// y1 and y2 will have the same value
// `tensor.mT` returns the transpose of a tensor
val y1 = tensor `@` tensor.mT
// y1: Tensor[Float32] = tensor dtype=float32, shape=[4, 4], device=CPU 
// [[4.0000, 4.0000, 4.0000, 4.0000],
//  [4.0000, 4.0000, 4.0000, 4.0000],
//  [4.0000, 4.0000, 4.0000, 4.0000],
//  [4.0000, 4.0000, 4.0000, 4.0000]]
val y2 = tensor.matmul(tensor.mT)
// y2: Tensor[Float32] = tensor dtype=float32, shape=[4, 4], device=CPU 
// [[4.0000, 4.0000, 4.0000, 4.0000],
//  [4.0000, 4.0000, 4.0000, 4.0000],
//  [4.0000, 4.0000, 4.0000, 4.0000],
//  [4.0000, 4.0000, 4.0000, 4.0000]]

//val y3 = torch.randLike(y1)
//torch.matmul(tensor, tensor.mT, out=y3)

// This computes the element-wise product. z1 and z2 will have the same value

val z1 = tensor * tensor
// z1: Tensor[Float32] = tensor dtype=float32, shape=[4, 4], device=CPU 
// [[1.0000, 1.0000, 1.0000, 1.0000],
//  [1.0000, 1.0000, 1.0000, 1.0000],
//  [1.0000, 1.0000, 1.0000, 1.0000],
//  [1.0000, 1.0000, 1.0000, 1.0000]]
val z2 = tensor.mul(tensor)
// z2: Tensor[Float32] = tensor dtype=float32, shape=[4, 4], device=CPU 
// [[1.0000, 1.0000, 1.0000, 1.0000],
//  [1.0000, 1.0000, 1.0000, 1.0000],
//  [1.0000, 1.0000, 1.0000, 1.0000],
//  [1.0000, 1.0000, 1.0000, 1.0000]]

//val z3 = torch.randLike(tensor)
//torch.mul(tensor, tensor, out=z3)

Single-element tensors

If you have a one-element tensor, for example by aggregating all values of a tensor into one value, you can convert it to a Scala numerical value using item:

val agg = tensor.sum
// agg: Tensor[Float32] = tensor dtype=float32, shape=[], device=CPU 
// 16.0000
val aggItem = agg.item
// aggItem: Float = 16.0F
print(aggItem)
// 16.0
println(aggItem.getClass)
// float
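
Because item returns a plain Scala value (a Float here, as shown above), the result can be used directly in ordinary Scala arithmetic:

val doubled = aggItem * 2
// doubled: Float = 32.0F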

In-place operations

Operations that store the result into the operand are called in-place. They are denoted by a _ suffix. For example, x.copy_(y) and x.t_() will change x.

println(s"$tensor")
// tensor dtype=float32, shape=[4, 4], device=CPU 
// [[1.0000, 1.0000, 1.0000, 1.0000],
//  [1.0000, 1.0000, 1.0000, 1.0000],
//  [1.0000, 1.0000, 1.0000, 1.0000],
//  [1.0000, 1.0000, 1.0000, 1.0000]]
tensor -= 5
// res14: Tensor[Float32] = tensor dtype=float32, shape=[4, 4], device=CPU 
// [[-4.0000, -4.0000, -4.0000, -4.0000],
//  [-4.0000, -4.0000, -4.0000, -4.0000],
//  [-4.0000, -4.0000, -4.0000, -4.0000],
//  [-4.0000, -4.0000, -4.0000, -4.0000]]
println(tensor)
// tensor dtype=float32, shape=[4, 4], device=CPU 
// [[-4.0000, -4.0000, -4.0000, -4.0000],
//  [-4.0000, -4.0000, -4.0000, -4.0000],
//  [-4.0000, -4.0000, -4.0000, -4.0000],
//  [-4.0000, -4.0000, -4.0000, -4.0000]]

In-place operations save some memory, but can be problematic when computing derivatives because of an immediate loss of history. Hence, their use is discouraged.
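
For illustration, here is a small sketch contrasting an out-of-place operation (which allocates a new tensor and leaves its operand untouched) with the in-place -= used above; it assumes a scalar - operator analogous to the -= already shown:

var a = torch.ones(Seq(2, 2))
val b = a - 1 // out-of-place: returns a new tensor, a is unchanged here
a -= 1        // in-place: overwrites the values stored in a
// a and b now both contain zeros, but b was produced without mutating its input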