SGD

torch.optim.SGD
class SGD(params: Iterable[Tensor[_]], lr: Float) extends Optimizer

Implements stochastic gradient descent (optionally with momentum).
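A minimal construction sketch against the signature shown above (the `modelParameters` collection is a placeholder for whatever trainable tensors a model exposes; it is not defined on this page):

```scala
import torch.Tensor
import torch.optim.SGD

// Build an SGD optimizer over a model's trainable tensors.
// `modelParameters` is a caller-supplied placeholder for an Iterable[Tensor[?]].
def makeOptimizer(modelParameters: Iterable[Tensor[?]]): SGD =
  SGD(modelParameters, lr = 0.01f)
```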

$$
\begin{aligned}
&\rule{110mm}{0.4pt} \\
&\textbf{input} : \gamma \text{ (lr)}, \: \theta_0 \text{ (params)}, \: f(\theta) \text{ (objective)}, \: \lambda \text{ (weight decay)}, \\
&\hspace{13mm} \: \mu \text{ (momentum)}, \: \tau \text{ (dampening)}, \: \textit{nesterov}, \: \textit{maximize} \\[-1.ex]
&\rule{110mm}{0.4pt} \\
&\textbf{for} \: t = 1 \: \textbf{to} \: \ldots \: \textbf{do} \\
&\hspace{5mm} g_t \leftarrow \nabla_{\theta} f_t(\theta_{t-1}) \\
&\hspace{5mm} \textbf{if} \: \lambda \neq 0 \\
&\hspace{10mm} g_t \leftarrow g_t + \lambda \theta_{t-1} \\
&\hspace{5mm} \textbf{if} \: \mu \neq 0 \\
&\hspace{10mm} \textbf{if} \: t > 1 \\
&\hspace{15mm} \textbf{b}_t \leftarrow \mu \textbf{b}_{t-1} + (1 - \tau) g_t \\
&\hspace{10mm} \textbf{else} \\
&\hspace{15mm} \textbf{b}_t \leftarrow g_t \\
&\hspace{10mm} \textbf{if} \: \textit{nesterov} \\
&\hspace{15mm} g_t \leftarrow g_{t-1} + \mu \textbf{b}_t \\
&\hspace{10mm} \textbf{else} \\[-1.ex]
&\hspace{15mm} g_t \leftarrow \textbf{b}_t \\
&\hspace{5mm} \textbf{if} \: \textit{maximize} \\
&\hspace{10mm} \theta_t \leftarrow \theta_{t-1} + \gamma g_t \\[-1.ex]
&\hspace{5mm} \textbf{else} \\[-1.ex]
&\hspace{10mm} \theta_t \leftarrow \theta_{t-1} - \gamma g_t \\[-1.ex]
&\rule{110mm}{0.4pt} \\[-1.ex]
&\textbf{return} : \theta_t \\[-1.ex]
&\rule{110mm}{0.4pt} \\[-1.ex]
\end{aligned}
$$

Nesterov momentum is based on the formula from "On the importance of initialization and momentum in deep learning" (Sutskever et al., 2013).
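For readers who prefer code to pseudocode, the update above can be sketched in plain Scala over Float arrays. The names mirror the symbols in the algorithm (γ = lr, λ = weight decay, μ = momentum, τ = dampening); this is an illustration, not the library's implementation:

```scala
// Illustrative sketch of one SGD update over plain Float arrays.
// Mirrors the pseudocode above; not the library's actual implementation.
def sgdStep(
    theta: Array[Float],        // parameters θ_{t-1}, updated in place
    grad: Array[Float],         // gradient g_t of the objective w.r.t. θ_{t-1}
    buf: Option[Array[Float]],  // momentum buffer b_{t-1}, None before the first step
    lr: Float,
    weightDecay: Float = 0.0f,
    momentum: Float = 0.0f,
    dampening: Float = 0.0f,
    nesterov: Boolean = false,
    maximize: Boolean = false
): Option[Array[Float]] =       // returns the updated momentum buffer b_t
  val g = grad.clone()
  if weightDecay != 0.0f then   // g_t ← g_t + λ θ_{t-1}
    for i <- theta.indices do g(i) += weightDecay * theta(i)
  val newBuf =
    if momentum != 0.0f then
      val b = buf match
        case Some(prev) =>      // b_t ← μ b_{t-1} + (1 - τ) g_t
          for i <- prev.indices do prev(i) = momentum * prev(i) + (1.0f - dampening) * g(i)
          prev
        case None => g.clone()  // first step: b_t ← g_t
      if nesterov then          // Nesterov: add the momentum look-ahead to the gradient
        for i <- g.indices do g(i) = g(i) + momentum * b(i)
      else
        for i <- g.indices do g(i) = b(i)
      Some(b)
    else buf
  val sign = if maximize then 1.0f else -1.0f  // ascend when maximizing
  for i <- theta.indices do theta(i) += sign * lr * g(i)
  newBuf
```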

Attributes

Source
SGD.scala
Supertypes
class Optimizer
class Object
trait Matchable
class Any

Members list

Value members

Inherited methods

def step(): Unit

Performs a single optimization step (parameter update).

Attributes

Note

Unless otherwise specified, this function should not modify the .grad field of the parameters.

Inherited from:
Optimizer
Source
Optimizer.scala
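For context, a sketch of the usual per-iteration call order. `computeLoss` is a placeholder for a forward pass that is assumed to return a scalar loss tensor whose backward() populates the .grad fields that step() then reads:

```scala
import torch.Tensor
import torch.optim.SGD

// Typical order of operations in one training iteration.
// `computeLoss` is a placeholder for the forward pass.
def trainStep(optimizer: SGD, computeLoss: () => Tensor[?]): Unit =
  optimizer.zeroGrad()      // clear gradients left over from the previous iteration
  val loss = computeLoss()  // forward pass (placeholder)
  loss.backward()           // compute fresh gradients into .grad
  optimizer.step()          // apply the SGD update using the current .grad values
```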

def zeroGrad(): Unit

Sets the gradients of all optimized Tensors to zero.

Attributes

Inherited from:
Optimizer
Source
Optimizer.scala
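zeroGrad() matters because, as in the underlying PyTorch autograd, backward() accumulates into .grad rather than overwriting it. A rough sketch with placeholder loss tensors:

```scala
import torch.Tensor
import torch.optim.SGD

// Illustration of gradient accumulation; `loss1` and `loss2` are placeholder
// scalar loss tensors computed from the optimized parameters.
def accumulationExample(optimizer: SGD, loss1: Tensor[?], loss2: Tensor[?]): Unit =
  loss1.backward()       // .grad now holds the gradient of loss1
  loss2.backward()       // without an intervening zeroGrad(), .grad now holds the sum
  optimizer.zeroGrad()   // reset every optimized tensor's gradient to zero
```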