Optimization

Optimizers

XPM Config xpm_torch.optim.Optimizer

XPM Config xpm_torch.optim.SGD(*, lr, weight_decay)

Wrapper for the SGD optimizer in PyTorch

lr: float = 1e-05: Learning rate

weight_decay: float = 0.0: Weight decay (L2)

XPM Config xpm_torch.optim.Adafactor(*, lr, weight_decay, relative_step)

Wrapper for the Adafactor optimizer in the Transformers library

See transformers.optimization.Adafactor for full documentation

lr: float: Learning rate

weight_decay: float = 0.0: Weight decay (L2)

relative_step: bool = True: If true, a time-dependent learning rate is computed instead of using the external learning rate

XPM Config xpm_torch.optim.Adam(*, lr, weight_decay, eps)

Wrapper for the Adam optimizer in PyTorch

lr: float = 0.001: Learning rate

weight_decay: float = 0.0: Weight decay (L2)

eps: float = 1e-08

XPM Config xpm_torch.optim.AdamW(*, lr, weight_decay, eps)

Wrapper for the AdamW optimizer in PyTorch (Adam with decoupled weight decay regularization)

See the PyTorch documentation

lr: float = 0.001

weight_decay: float = 0.01

eps: float = 1e-08
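The difference between Adam and AdamW lies in how `weight_decay` is applied. The sketch below uses the plain PyTorch optimizers (assumed to be what these configurations wrap) on a single weight with a zero gradient, so that only the weight decay acts. The helper `one_step` and the chosen values are illustrative assumptions.

```python
import torch

def one_step(optimizer_cls):
    """Hypothetical helper: applies one optimizer step to a weight of 1.0
    whose gradient is zero, so only weight decay changes the value."""
    weight = torch.nn.Parameter(torch.tensor([1.0]))
    optimizer = optimizer_cls([weight], lr=0.1, weight_decay=0.1, eps=1e-8)
    weight.grad = torch.zeros_like(weight)
    optimizer.step()
    return weight.item()

# Adam: the decay term is added to the gradient (L2 penalty) and then
# rescaled by the adaptive step, giving a step of about lr (1.0 -> ~0.9)
adam_value = one_step(torch.optim.Adam)

# AdamW: the weight is shrunk directly by lr * weight_decay,
# independently of the gradient statistics (1.0 -> 0.99)
adamw_value = one_step(torch.optim.AdamW)
```

With Adam, the effective decay depends on the gradient history of each parameter; with AdamW it depends only on `lr` and `weight_decay`.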

Parameter Configuration

XPM Config xpm_torch.optim.ParameterOptimizer(*, optimizer, scheduler, module)

Associates an optimizer with a list of parameters to optimize

optimizer: xpm_torch.optim.Optimizer: The optimizer

scheduler: xpm_torch.schedulers.Scheduler: The optional scheduler

module: xpm_torch.module.Module: The module from which parameters should be extracted

filter: xpm_torch.optim.ParameterFilter = xpm_torch.optim.ParameterFilter() (generated): How parameters should be selected for this optimizer (by default, all of them are used)

XPM Config xpm_torch.optim.ParameterFilter: Base class, which does not filter anything

XPM Config xpm_torch.optim.RegexParameterFilter(*, includes, excludes)

Filters the parameters of the model by name. Precondition: only one of includes and excludes can be None

includes: List[str]: The patterns (strings) of the model parameters to include

excludes: List[str]: The patterns (strings) of the model parameters to exclude
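As an illustration of name-based filtering, the following sketch selects parameters of a plain PyTorch module with regular expressions. The helper `select_parameters` is hypothetical, and the matching rule (`re.search` against the names returned by `named_parameters()`) is an assumption; the actual matching semantics of RegexParameterFilter may differ.

```python
import re
import torch

def select_parameters(module, includes=None, excludes=None):
    """Hypothetical helper: returns the names of the parameters kept by
    the include/exclude patterns (matching rule is an assumption)."""
    selected = []
    for name, _ in module.named_parameters():
        # When include patterns are given, keep only names matching one of them
        if includes is not None and not any(re.search(p, name) for p in includes):
            continue
        # When exclude patterns are given, drop names matching one of them
        if excludes is not None and any(re.search(p, name) for p in excludes):
            continue
        selected.append(name)
    return selected

# Parameter names are "0.weight", "0.bias", "1.weight", "1.bias"
model = torch.nn.Sequential(torch.nn.Linear(4, 3), torch.nn.Linear(3, 2))

bias_names = select_parameters(model, includes=[r"\.bias$"])
non_bias_names = select_parameters(model, excludes=[r"\.bias$"])
```

A common use of such a split is to apply weight decay to the weights only, with a separate optimizer configuration for the biases.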

Schedulers

XPM Config xpm_torch.schedulers.Scheduler: Base class for all optimizer schedulers

XPM Config xpm_torch.schedulers.LinearWithWarmup(*, num_warmup_steps, min_factor)

Linear warmup followed by decay

num_warmup_steps: int: Number of warmup steps

min_factor: float = 0.0: Minimum multiplicative factor
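One plausible reading of this schedule is a multiplicative factor applied to the base learning rate, sketched below. The function name, the `num_training_steps` argument (the total number of steps is not a field of this configuration and is assumed to come from the training loop), and the clamping at `min_factor` are all assumptions rather than the library's actual implementation.

```python
def linear_with_warmup_factor(step, num_warmup_steps, num_training_steps, min_factor=0.0):
    """Hypothetical sketch of the learning rate factor at a given step."""
    if step < num_warmup_steps:
        # Warmup: the factor grows linearly from 0 to 1
        return step / max(1, num_warmup_steps)
    # Decay: the factor decreases linearly from 1 (end of warmup)
    # to 0 (end of training), but never goes below min_factor
    remaining = max(0, num_training_steps - step)
    decay_steps = max(1, num_training_steps - num_warmup_steps)
    return max(min_factor, remaining / decay_steps)

# 10 warmup steps out of 110 training steps
factors = [linear_with_warmup_factor(s, 10, 110, min_factor=0.1) for s in (0, 5, 10, 60, 110)]
```

Such a function can be passed as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR` once the step counts are fixed.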

XPM Config xpm_torch.schedulers.CosineWithWarmup(*, num_warmup_steps, num_cycles)

Cosine schedule with warmup

Uses the implementation from the Transformers library

https://huggingface.co/docs/transformers/main_classes/optimizer_schedules#transformers.get_cosine_schedule_with_warmup

num_warmup_steps: int: Number of warmup steps

num_cycles: float = 0.5: Number of cycles
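The effect of `num_cycles` is easier to see from the multiplicative factor itself. The sketch below follows the formula used by `transformers.get_cosine_schedule_with_warmup`, rewritten as a standalone function; the function name and the `num_training_steps` argument (assumed to be supplied by the training loop) are illustrative.

```python
import math

def cosine_with_warmup_factor(step, num_warmup_steps, num_training_steps, num_cycles=0.5):
    """Sketch of the learning rate factor of a cosine schedule with warmup."""
    if step < num_warmup_steps:
        # Warmup: the factor grows linearly from 0 to 1
        return step / max(1, num_warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1]
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    # With num_cycles = 0.5, the cosine covers half a period: 1 down to 0
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress)))

# 10 warmup steps out of 110 training steps
factors = [cosine_with_warmup_factor(s, 10, 110) for s in (0, 10, 60, 110)]
```

With the default `num_cycles = 0.5`, the factor is 1 at the end of warmup, about 0.5 halfway through the decay, and 0 at the last step.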

Optimization Hooks

XPM Config xpm_torch.optim.OptimizationHook: Base class for all optimization hooks

XPM Config xpm_torch.optim.GradientHook

Hooks that are called when the gradient is computed

The gradient is guaranteed to be unscaled in this case.

XPM Config xpm_torch.optim.GradientClippingHook(*, max_norm)

Gradient clipping

max_norm: float: Maximum norm for gradient clipping

version: str = 1 (constant): Version of the hook
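In plain PyTorch, clipping by norm is typically done with `torch.nn.utils.clip_grad_norm_`; the sketch below assumes this is what the hook does with `max_norm`, which the documentation above does not confirm. The toy model and the loss are illustrative.

```python
import torch

# Toy model and a deliberately large loss to produce large gradients
model = torch.nn.Linear(3, 1)
loss = model(torch.ones(2, 3)).sum() * 100.0
loss.backward()

# In mixed precision training with a gradient scaler, gradients must be
# unscaled before this call; gradient hooks receive unscaled gradients
max_norm = 1.0
norm_before = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)

# Recompute the global L2 norm over all parameters after clipping
norm_after = torch.sqrt(sum(p.grad.pow(2).sum() for p in model.parameters()))
```

`clip_grad_norm_` rescales all gradients in place by the same coefficient and returns the norm measured before clipping.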

XPM Config xpm_torch.optim.GradientLogHook(*, name)

Logs the gradient norm

name: str = gradient_norm
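A minimal sketch of the quantity such a hook might report is the global L2 norm over all parameter gradients. The helper `gradient_norm` and the dictionary standing in for a logger are hypothetical; how the hook actually computes and records the value is not specified here.

```python
import torch

def gradient_norm(module):
    """Hypothetical helper: global L2 norm of all gradients of a module."""
    grads = [p.grad.detach().flatten() for p in module.parameters() if p.grad is not None]
    if not grads:
        return 0.0
    return torch.cat(grads).norm(2).item()

# For a bias-free linear layer with a summed output, the gradient of the
# weight equals the input, here (3, 4), whose L2 norm is 5
model = torch.nn.Linear(2, 1, bias=False)
model(torch.tensor([[3.0, 4.0]])).sum().backward()

# Stand-in for a logger; the key mirrors the default value of `name`
logged = {"gradient_norm": gradient_norm(model)}
```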