Optimization
Optimizers
- XPM Config xpm_torch.optim.SGD(*, lr, weight_decay)
Wrapper for the SGD optimizer in PyTorch
- lr: float = 1e-05
Learning rate
- weight_decay: float = 0.0
Weight decay (L2)
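For reference, when momentum is not used, PyTorch's `torch.optim.SGD` adds `weight_decay` times the parameter to the gradient (an L2 penalty) before taking the step: p ← p − lr · (g + weight_decay · p). The plain-Python sketch below shows only that update rule on lists of floats. `sgd_step` is a hypothetical helper for illustration, not part of xpm_torch.

```python
from typing import List


def sgd_step(params: List[float], grads: List[float], lr: float = 1e-05,
             weight_decay: float = 0.0) -> List[float]:
    """One plain SGD step (no momentum) with L2 weight decay.

    Hypothetical helper: mirrors the torch.optim.SGD update rule when
    momentum is 0, but works on plain Python floats instead of tensors.
    """
    if len(params) != len(grads):
        raise ValueError("params and grads must have the same length")
    updated = []
    for p, g in zip(params, grads):
        # L2 weight decay is folded into the gradient before the step
        d_p = g + weight_decay * p
        updated.append(p - lr * d_p)
    return updated


# Usage: a single step with a large learning rate to make the effect visible
new_params = sgd_step([1.0, -2.0], [0.2, 0.4], lr=0.1, weight_decay=0.5)
print(new_params)
```

With `weight_decay=0.0` (the default above), the step reduces to plain gradient descent.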
- XPM Config xpm_torch.optim.Adafactor(*, lr, weight_decay, relative_step)
Wrapper for the Adafactor optimizer from the Transformers library
See transformers.optimization.Adafactor for full documentation
- lr: float
Learning rate
- weight_decay: float = 0.0
Weight decay (L2)
- relative_step: bool = True
If True, a time-dependent learning rate is computed instead of using the external learning rate
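In the Transformers implementation, the time-dependent step size used when `relative_step` is true is, before its optional `warmup_init` and `scale_parameter` adjustments, min(0.01, 1/sqrt(t)) at step t. The sketch below reproduces only that formula in plain Python. `relative_step_size` is a hypothetical name, and the exact behavior should be checked against `transformers.optimization.Adafactor`.

```python
import math


def relative_step_size(step: int) -> float:
    """Relative step size of Adafactor when relative_step is enabled.

    Sketch of min(1e-2, 1/sqrt(step)); it deliberately ignores the
    warmup_init and scale_parameter options of the Transformers optimizer.
    """
    if step < 1:
        raise ValueError("step counts start at 1")
    return min(1e-2, 1.0 / math.sqrt(step))


# The step size stays at 0.01 for the first 10,000 steps, then decays as 1/sqrt(t)
for t in (1, 10_000, 40_000):
    print(t, relative_step_size(t))
```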
Parameter Configuration
- XPM Config xpm_torch.optim.ParameterOptimizer(*, optimizer, scheduler, module)
Associates an optimizer with a list of parameters to optimize
- optimizer: xpm_torch.optim.Optimizer
The optimizer
- scheduler: xpm_torch.schedulers.Scheduler
The optional scheduler
- module: xpm_torch.module.Module
The module from which parameters should be extracted
- filter: xpm_torch.optim.ParameterFilter = xpm_torch.optim.ParameterFilter() (generated)
How parameters should be selected for this optimizer (by default, all parameters are used)
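Conceptually, a ParameterOptimizer pairs one optimizer (and an optional scheduler) with the subset of a module's named parameters that its filter accepts. The plain-Python sketch below models only that selection step. `ParameterSelection` and its members are hypothetical and do not reflect the actual xpm_torch implementation, which operates on torch modules.

```python
from typing import Callable, Dict, List, Optional


class ParameterSelection:
    """Hypothetical stand-in for the parameter selection of ParameterOptimizer."""

    def __init__(self, optimizer_name: str,
                 accepts: Optional[Callable[[str], bool]] = None):
        self.optimizer_name = optimizer_name
        # Without a filter, every parameter is kept (like the default filter)
        self.accepts = accepts if accepts is not None else (lambda name: True)

    def select(self, named_parameters: Dict[str, object]) -> List[str]:
        """Names of the parameters this optimizer updates, in module order."""
        return [name for name in named_parameters if self.accepts(name)]


# Usage: values are placeholders; in practice they would be the module's parameters
params = {"encoder.weight": 0.0, "encoder.bias": 0.0, "head.weight": 0.0}
print(ParameterSelection("sgd").select(params))
print(ParameterSelection("adafactor", lambda n: n.startswith("head.")).select(params))
```

Using several such pairs with disjoint filters is one way to update different parts of a model with different optimizers.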
- XPM Config xpm_torch.optim.RegexParameterFilter(*, includes, excludes)
Selects the parameters of a model by matching their names against regular expressions. Precondition: exactly one of includes and excludes can be None
- includes: List[str]
Regular expressions matching the names of the parameters to include
- excludes: List[str]
Regular expressions matching the names of the parameters to exclude
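The sketch below shows one plausible reading of include/exclude filtering by parameter name. It assumes that a name is kept when it matches any include pattern, or, when excludes are given instead, when it matches none of them, and that patterns are applied with `re.search`; neither matching rule is confirmed by this page. `RegexFilterSketch` is a hypothetical class, not the xpm_torch implementation.

```python
import re
from typing import Iterable, List, Optional


class RegexFilterSketch:
    """Hypothetical include/exclude filter over parameter names.

    Assumption: names are matched with re.search; a name is kept if it
    matches any include pattern, or if it matches no exclude pattern.
    """

    def __init__(self, includes: Optional[List[str]] = None,
                 excludes: Optional[List[str]] = None):
        # Precondition: exactly one of includes and excludes is None
        if (includes is None) == (excludes is None):
            raise ValueError("exactly one of includes and excludes must be None")
        self.includes = [re.compile(p) for p in includes] if includes is not None else None
        self.excludes = [re.compile(p) for p in excludes] if excludes is not None else None

    def accepts(self, name: str) -> bool:
        if self.includes is not None:
            return any(p.search(name) for p in self.includes)
        return not any(p.search(name) for p in self.excludes)

    def select(self, names: Iterable[str]) -> List[str]:
        return [n for n in names if self.accepts(n)]


# Usage: exclude biases, or keep only the head parameters
names = ["encoder.layer0.weight", "encoder.layer0.bias", "head.weight"]
print(RegexFilterSketch(excludes=[r"\.bias$"]).select(names))
print(RegexFilterSketch(includes=[r"^head\."]).select(names))
```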
Schedulers
- XPM Config xpm_torch.schedulers.LinearWithWarmup(*, num_warmup_steps, min_factor)
Linear warmup followed by decay
- num_warmup_steps: int
Number of warmup steps
- min_factor: float = 0.0
Minimum multiplicative factor
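A linear warmup followed by linear decay can be expressed as a multiplicative factor on the base learning rate. The sketch below assumes the factor rises linearly from 0 to 1 over `num_warmup_steps`, then decreases linearly towards 0 and is clamped at `min_factor`. The total number of steps (`num_training_steps`) is a hypothetical parameter that is not part of this config, and the exact shape of the decay in xpm_torch may differ.

```python
def linear_with_warmup_factor(step: int, num_warmup_steps: int,
                              num_training_steps: int,
                              min_factor: float = 0.0) -> float:
    """Multiplicative learning-rate factor at a given step (hypothetical sketch)."""
    if step < num_warmup_steps:
        # Linear warmup: 0 at step 0, approaching 1 at the end of warmup
        return step / max(1, num_warmup_steps)
    # Linear decay from 1 towards 0 over the remaining steps, clamped at min_factor
    remaining = (num_training_steps - step) / max(1, num_training_steps - num_warmup_steps)
    return max(min_factor, remaining)


# Usage: 10 warmup steps out of 110, never going below 10% of the base rate
for step in (0, 5, 10, 60, 110):
    print(step, linear_with_warmup_factor(step, 10, 110, min_factor=0.1))
```

In plain PyTorch, a function of this kind would typically be passed as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR`, which multiplies the base learning rate by the returned factor.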
Optimization Hooks
- XPM Config xpm_torch.optim.GradientHook
Hook called once the gradient has been computed
The gradient is guaranteed to be unscaled when the hook is called.
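The unscaling guarantee matters for hooks that depend on gradient magnitudes, such as norm clipping. With loss scaling (as done by PyTorch's mixed-precision `GradScaler`), the stored gradients are multiplied by the scale factor, so a clipping threshold would otherwise be compared against inflated values. The plain-Python sketch below illustrates this ordering; `GradientHookSketch`, `ClipGradNorm`, and `apply_hooks` are hypothetical names and not the xpm_torch API.

```python
import math
from typing import List


class GradientHookSketch:
    """Hypothetical hook interface: receives and returns unscaled gradients."""

    def __call__(self, grads: List[float]) -> List[float]:
        return grads


class ClipGradNorm(GradientHookSketch):
    """Rescales gradients so that their global L2 norm is at most max_norm."""

    def __init__(self, max_norm: float):
        self.max_norm = max_norm

    def __call__(self, grads: List[float]) -> List[float]:
        norm = math.sqrt(sum(g * g for g in grads))
        if norm <= self.max_norm:
            return grads
        factor = self.max_norm / norm
        return [g * factor for g in grads]


def apply_hooks(scaled_grads: List[float], loss_scale: float,
                hooks: List[GradientHookSketch]) -> List[float]:
    # Undo loss scaling first, so every hook sees the true gradient values
    grads = [g / loss_scale for g in scaled_grads]
    for hook in hooks:
        grads = hook(grads)
    return grads


# Usage: scaled gradients [6, 8] with scale 2 are really [3, 4] (norm 5)
print(apply_hooks([6.0, 8.0], 2.0, [ClipGradNorm(1.0)]))
```

Had the hook run before unscaling, it would have measured a norm of 10 instead of 5.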