Optimizers & Schedulers

This page describes the optimizers and learning rate schedulers that DeepFense supports.

Optimizers

Optimizers are configured in the `optimizer` section under `training` in the config file.

1. Adam (`adam`)

Standard Adam optimizer.

Configuration Signature:

optimizer:
  type: adam
  args:
    lr: float
    weight_decay: float
    betas: [float, float]

Parameters:

lr - (float) Learning rate (default: 1e-6).
weight_decay - (float) L2 penalty (default: 1e-4).
betas - (tuple) Coefficients for computing running averages of the gradient and its square (default: (0.9, 0.999)).

Example:

optimizer:
  type: adam
  args:
    lr: 0.0001
    weight_decay: 0.0001
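
To make the roles of `lr`, `betas`, and `weight_decay` concrete, here is a minimal sketch of one Adam update on a single scalar parameter. It follows the textbook Adam rule with the L2 penalty folded into the gradient; the function name `adam_step` and the `eps` term are illustrative assumptions, not part of DeepFense's API.

```python
import math


def adam_step(param, grad, m, v, t, lr=1e-4, betas=(0.9, 0.999),
              weight_decay=1e-4, eps=1e-8):
    """One textbook Adam update for a scalar parameter (illustrative only)."""
    beta1, beta2 = betas
    # L2 penalty: the decay term is added to the gradient itself.
    grad = grad + weight_decay * param
    # Running averages of the gradient and of its square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction, needed because m and v start at zero (t counts from 1).
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Minimise f(x) = x^2 for a few steps, starting from x = 1.0.
param, m, v = 1.0, 0.0, 0.0
for t in range(1, 4):
    grad = 2.0 * param
    param, m, v = adam_step(param, grad, m, v, t)
print(param)
```

Because the update is normalised by the square root of `v_hat`, the first step moves the parameter by roughly `lr` regardless of the gradient's magnitude.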

2. AdamW (`adamw`)

Adam with decoupled weight decay: the decay is applied directly to the weights instead of being added to the gradient. Generally recommended over Adam for transformer-based models such as Wav2Vec2.

Configuration Signature:

optimizer:
  type: adamw
  args:
    lr: float
    weight_decay: float
    betas: [float, float]

Parameters:

lr - (float) Learning rate (default: 1e-6).
weight_decay - (float) Weight decay coefficient (default: 1e-4).
betas - (tuple) Coefficients for computing running averages of the gradient and its square (default: (0.9, 0.999)).

Example:

optimizer:
  type: adamw
  args:
    lr: 0.000001
    weight_decay: 0.01
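
The difference between decoupled weight decay and an L2 penalty is easiest to see in code. The sketch below is a scalar version of the standard AdamW rule; `adamw_step` and `eps` are hypothetical names used for illustration, and it is an assumption that DeepFense's `adamw` behaves this way.

```python
import math


def adamw_step(param, grad, m, v, t, lr=1e-6, betas=(0.9, 0.999),
               weight_decay=0.01, eps=1e-8):
    """One textbook AdamW update for a scalar parameter (illustrative only)."""
    beta1, beta2 = betas
    # Decoupled decay: shrink the parameter directly. The decay term never
    # enters the gradient statistics m and v.
    param = param * (1 - lr * weight_decay)
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# With a zero gradient, the only change comes from the decay term.
p, m, v = adamw_step(1.0, 0.0, 0.0, 0.0, 1, lr=0.1, weight_decay=0.1)
print(p)
```

With a zero gradient the parameter is simply scaled by `1 - lr * weight_decay`, which shows that the decay acts independently of the adaptive gradient scaling.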

3. SGD (`sgd`)

Stochastic gradient descent (SGD) with optional momentum.

Configuration Signature:

optimizer:
  type: sgd
  args:
    lr: float
    momentum: float
    weight_decay: float

Parameters:

lr - (float) Learning rate.
momentum - (float) Momentum factor (default: 0.9).
weight_decay - (float) L2 penalty (default: 1e-4).

Example:

optimizer:
  type: sgd
  args:
    lr: 0.01
    momentum: 0.9
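
As a rough illustration of how `momentum` and `weight_decay` enter the update, the sketch below implements one SGD step on a scalar parameter. It assumes the convention where the velocity accumulates raw gradients and `lr` is applied afterwards (other formulations scale the velocity differently); `sgd_step` is a hypothetical helper, not a DeepFense function.

```python
def sgd_step(param, grad, velocity, lr=0.01, momentum=0.9, weight_decay=1e-4):
    """One SGD-with-momentum update for a scalar parameter (illustrative only)."""
    # L2 penalty: add the decay term to the gradient.
    grad = grad + weight_decay * param
    # The velocity is a decaying sum of past gradients.
    velocity = momentum * velocity + grad
    param = param - lr * velocity
    return param, velocity


# Two steps with a constant gradient of 2.0 and no weight decay.
param, velocity = 1.0, 0.0
for _ in range(2):
    param, velocity = sgd_step(param, 2.0, velocity, weight_decay=0.0)
print(param, velocity)
```

With a constant gradient the velocity grows from step to step, so the second update is larger than the first. This is the acceleration effect that momentum provides.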

Schedulers

Schedulers adjust the learning rate during training. They are configured in the `scheduler` section under `training`.

1. Step LR (`step_lr`)

Multiplies the learning rate by `gamma` every `step_size` epochs.

Configuration Signature:

scheduler:
  type: step_lr
  args:
    step_size: int
    gamma: float

Parameters:

step_size - (int) Period of learning rate decay (default: 10).
gamma - (float) Multiplicative factor of learning rate decay (default: 0.1).

Example:

scheduler:
  type: step_lr
  args:
    step_size: 15
    gamma: 0.1
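
The resulting schedule can be sketched in closed form. The helper below assumes the conventional step decay rule, where the learning rate is multiplied by `gamma` once per completed block of `step_size` epochs; `step_lr` and the base learning rate of 1e-4 are illustrative assumptions.

```python
def step_lr(base_lr, epoch, step_size=10, gamma=0.1):
    """Learning rate at a given epoch under step decay (illustrative only)."""
    # Integer division counts how many full decay periods have elapsed.
    return base_lr * gamma ** (epoch // step_size)


# Values from the example config above, with an assumed base lr of 1e-4.
for epoch in (0, 14, 15, 30):
    print(epoch, step_lr(1e-4, epoch, step_size=15, gamma=0.1))
```

With `step_size: 15` and `gamma: 0.1`, the learning rate stays at its initial value for epochs 0 to 14, drops by a factor of 10 at epoch 15, and drops by another factor of 10 at epoch 30.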

2. Cosine Annealing (`cosine`)

Sets the learning rate using a cosine annealing schedule: the rate falls from its initial value to `eta_min` along a half cosine curve over `T_max` epochs.

Configuration Signature:

scheduler:
  type: cosine
  args:
    T_max: int
    eta_min: float

Parameters:

T_max - (int) Number of scheduler steps over which the learning rate is annealed (usually set to the total number of epochs).
eta_min - (float) Minimum learning rate (default: 0).

Example:

scheduler:
  type: cosine
  args:
    T_max: 100
    eta_min: 0.0000001
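
For intuition about the shape of the curve, the sketch below evaluates the standard closed-form cosine annealing formula for epochs between 0 and `T_max`. The helper name `cosine_lr` and the base learning rate of 1e-4 are assumptions made for illustration.

```python
import math


def cosine_lr(base_lr, epoch, T_max, eta_min=0.0):
    """Closed-form cosine annealing for 0 <= epoch <= T_max (illustrative only)."""
    # The cosine term goes from +1 at epoch 0 to -1 at epoch T_max.
    cosine = math.cos(math.pi * epoch / T_max)
    return eta_min + (base_lr - eta_min) * (1 + cosine) / 2


# Values from the example config above, with an assumed base lr of 1e-4.
for epoch in (0, 50, 100):
    print(epoch, cosine_lr(1e-4, epoch, T_max=100, eta_min=1e-7))
```

The schedule starts at the base learning rate, passes through the midpoint between the base rate and `eta_min` at `T_max / 2`, and reaches `eta_min` at `T_max`. The decay is slow at both ends and fastest in the middle.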

3. Exponential LR (`exponential`)

Decays the learning rate of each parameter group by gamma every epoch.

Configuration Signature:

scheduler:
  type: exponential
  args:
    gamma: float

Parameters:

gamma - (float) Multiplicative factor of learning rate decay (default: 0.9).

Example:

scheduler:
  type: exponential
  args:
    gamma: 0.95
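
Since the decay compounds every epoch, small changes in `gamma` have a large effect over a long run. The sketch below evaluates the usual closed form, `base_lr * gamma ** epoch`; the helper name `exponential_lr` and the base learning rate of 1e-4 are illustrative assumptions.

```python
def exponential_lr(base_lr, epoch, gamma=0.9):
    """Learning rate at a given epoch under exponential decay (illustrative only)."""
    # One multiplication by gamma per completed epoch.
    return base_lr * gamma ** epoch


# gamma from the example config above, with an assumed base lr of 1e-4.
for epoch in (0, 1, 10, 50):
    print(epoch, exponential_lr(1e-4, epoch, gamma=0.95))
```

With `gamma: 0.95`, about 60% of the initial learning rate remains after 10 epochs, so values close to 1 are usually needed for long training runs.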
