Configuration Reference¶
Complete reference for all YAML configuration parameters in DeepFense.
Configuration Structure¶
# Top-level structure
exp_name: string # Experiment name
output_dir: string # Output directory
seed: integer # Random seed
data: {...} # Data configuration
model: {...} # Model configuration
training: {...} # Training configuration
Global Settings¶
Parameters:
- exp_name - (str) Name for the experiment folder (default: "default_exp").
- output_dir - (str) Base directory for outputs (default: "./outputs/").
- seed - (int) Random seed for reproducibility (default: 42).
Data Configuration¶
Structure¶
data:
  sampling_rate: 16000
  label_map:
    bonafide: 1
    spoof: 0
  train: {...} # Training data config
  val: {...} # Validation data config
  test: {...} # Test data config (optional)
Dataset Parameters¶
Parameters:
- dataset_type - (str) Dataset class name (default: "StandardDataset").
- parquet_files - (list[str]) Required. Paths to Parquet files.
- dataset_names - (list[str]) Names for each Parquet file (default: null). Length should match parquet_files. If the list is shorter, the missing names default to dataset_i.
- batch_size - (int) Batch size (default: 32).
- shuffle - (bool) Shuffle data (default: true).
- num_workers - (int) DataLoader workers (default: 4).
- drop_last - (bool) Drop incomplete batches (default: false).
Note on Placement: All parameters above (parquet_files, batch_size, num_workers, etc.) must be defined inside the specific split section (e.g., under data.train: or data.val:). They are not global.
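The placement rule and the documented defaults can be sketched as a small resolver. This is a minimal sketch, not DeepFense's own loader: `resolve_split` and `SPLIT_DEFAULTS` are hypothetical names, and the zero-based index in `dataset_i` is an assumption.

```python
# Hypothetical helper: applies the documented per-split defaults.
SPLIT_DEFAULTS = {
    "dataset_type": "StandardDataset",
    "dataset_names": None,
    "batch_size": 32,
    "shuffle": True,
    "num_workers": 4,
    "drop_last": False,
}


def resolve_split(data_cfg, split):
    """Return the effective settings for one split (e.g. "train")."""
    split_cfg = data_cfg.get(split)
    if split_cfg is None:
        raise KeyError(f"data.{split} is not defined")
    if not split_cfg.get("parquet_files"):
        raise ValueError(f"data.{split}.parquet_files is required")

    # Only keys inside the split section are read; nothing is inherited
    # from the top level of `data`.
    resolved = {**SPLIT_DEFAULTS, **split_cfg}

    # Pad a short dataset_names list with dataset_i placeholders
    # (zero-based index assumed for this sketch).
    names = list(resolved["dataset_names"] or [])
    files = resolved["parquet_files"]
    names += [f"dataset_{i}" for i in range(len(names), len(files))]
    resolved["dataset_names"] = names
    return resolved


data_cfg = {
    "sampling_rate": 16000,
    "batch_size": 8,  # misplaced: split-level keys are not global
    "train": {
        "parquet_files": ["/path/to/a.parquet", "/path/to/b.parquet"],
        "dataset_names": ["asvspoof"],
    },
}
train = resolve_split(data_cfg, "train")
print(train["batch_size"], train["dataset_names"])
```

Here the misplaced top-level `batch_size: 8` has no effect, so the split falls back to the default of 32.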
Base Transform¶
Applied to all data (train, val, test):
base_transform:
  - type: load_audio
    target_sr: 16000
    mono: true
  - type: pad
    max_len: 64600
    random_pad: true # Random crop if audio > max_len
    pad_type: repeat # How to pad short audio
Available Transforms:
- load_audio - Parameters: target_sr, mono. Loads audio file.
- pad - Parameters: max_len, random_pad, pad_type. Pads/truncates to fixed length.
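To make the pad behaviour concrete, here is a minimal sketch on plain Python lists. It assumes that `pad_type: repeat` tiles the clip until it reaches `max_len`, and that disabling `random_pad` crops from the start. `pad_or_crop` is a hypothetical name, not the library's function.

```python
import random


def pad_or_crop(wave, max_len, random_pad=True, rng=random):
    """Return exactly max_len samples from a 1-D list of samples."""
    n = len(wave)
    if n == 0:
        raise ValueError("cannot pad an empty waveform")
    if n >= max_len:
        # Long audio: take one window, at a random offset if random_pad is set.
        start = rng.randint(0, n - max_len) if random_pad else 0
        return wave[start:start + max_len]
    # Short audio, "repeat" padding: tile the clip, then trim the overshoot.
    repeats = -(-max_len // n)  # ceiling division
    return (wave * repeats)[:max_len]


short_clip = [1, 2, 3]
long_clip = list(range(10))
print(pad_or_crop(short_clip, 8))                    # [1, 2, 3, 1, 2, 3, 1, 2]
print(pad_or_crop(long_clip, 4, random_pad=False))   # [0, 1, 2, 3]
```

With `max_len: 64600` and a 16 kHz sampling rate, every example comes out at roughly four seconds of audio.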
Augmentation Transform (Training Only)¶
augment_transform:
  - type: augmentation_pipeline
    p: 0.5 # Probability of applying
    mode: parallel # Selection mode
    execution: chain # Execution mode
    concat_original: false # Keep original?
    transforms:
      - type: rawboost
        noise_ratio: 1.0
        algo: 5
      - type: rir
        noise_ratio: 0.8
        csv_file: /path/to/rirs.csv
Pipeline Modes¶
The pipeline logic is split into two phases: Selection (what to apply) and Execution (how to apply it).
1. Selection Phase (mode)
   * parallel: Selects exactly ONE random transform from the list (acts like "OneOf").
   * sequential: Selects ALL transforms (or k random ones if k is set).
2. Execution Phase (execution)
   * chain: Applies selected transforms in sequence to the same audio ($x \to T_1 \to T_2 \dots$).
     * Returns a single waveform (Shape: L).
   * independent: Applies each selected transform separately to the original audio ($x \to T_1$, $x \to T_2$).
     * Returns a batch of waveforms (Shape: N x L).
concat_original:
* If true, adds the original clean audio to the output.
Common Configurations:
| Goal | Mode | Execution | Behavior | Output Shape |
|---|---|---|---|---|
| Standard Augmentation | parallel | chain | Pick 1 random transform. Apply it. | (L,) |
| Data Expansion | sequential | independent | Apply ALL transforms separately. Returns batch of variations. | (N_aug, L) |
| Sequential Chain | sequential | chain | Apply ALL transforms in order to one audio. ($x \to T_1 \to T_2 \dots$) | (L,) |
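The two phases can be sketched with toy transforms on Python lists. This is an illustration under stated assumptions, not the library's pipeline: `run_pipeline` is a hypothetical name, `k` keeps the list order of the sampled transforms, and `chain` combined with `concat_original` is assumed to return the pair `[original, augmented]`.

```python
import random


def run_pipeline(x, transforms, p=0.5, mode="parallel", execution="chain",
                 concat_original=False, k=None, rng=random):
    """Apply the selection and execution phases to one waveform x."""
    # With probability 1 - p the pipeline is skipped and x is returned as is.
    if rng.random() >= p:
        return x

    # Selection phase: decide which transforms run.
    if mode == "parallel":
        selected = [rng.choice(transforms)]
    elif mode == "sequential":
        if k is None:
            selected = list(transforms)
        else:
            picked = sorted(rng.sample(range(len(transforms)), k))
            selected = [transforms[i] for i in picked]
    else:
        raise ValueError(f"unknown mode: {mode}")

    # Execution phase: decide how they run.
    if execution == "chain":
        out = x
        for transform in selected:
            out = transform(out)          # x -> T1 -> T2 -> ...
        return [x, out] if concat_original else out
    if execution == "independent":
        outputs = [transform(x) for transform in selected]  # each sees x
        return [x] + outputs if concat_original else outputs
    raise ValueError(f"unknown execution: {execution}")


def gain(wave):      # toy stand-in for a real augmentation
    return [2 * s for s in wave]


def offset(wave):    # toy stand-in for a real augmentation
    return [s + 1 for s in wave]


x = [1, 2]
print(run_pipeline(x, [gain, offset], p=1.0, mode="sequential", execution="chain"))
print(run_pipeline(x, [gain, offset], p=1.0, mode="sequential", execution="independent"))
```

The first call corresponds to the "Sequential Chain" row and yields one waveform; the second corresponds to "Data Expansion" and yields one waveform per transform.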
Available Augmentations¶
Complete details in Augmentation Reference.
Model Configuration¶
Structure¶
Frontend¶
frontend:
  type: "wavlm" # Frontend type
  args:
    source: "huggingface"
    ckpt_path: "microsoft/wavlm-base"
    freeze: true
Supported Frontends:
- Wav2Vec2 (wav2vec2) - Parameters: source, ckpt_path, freeze.
- WavLM (wavlm) - Parameters: source, ckpt_path, freeze.
- HuBERT (hubert) - Parameters: source, ckpt_path, freeze.
- MERT (mert) - Parameters: ckpt_path, freeze, trust_remote_code.
- EAT (eat) - Parameters: ckpt_path, freeze, trust_remote_code.
Detailed usage in Frontend Reference.
Backend¶
backend:
  type: "AASIST"
  args:
    input_dim: 768 # Must match frontend output
    filts: [70, [1, 32], [32, 32], [32, 64], [64, 64]]
    gat_dims: [64, 32]
    pool_ratios: [0.5, 0.5, 0.5, 0.5]
    temperatures: [2.0, 2.0, 100.0, 100.0]
Supported Backends:
- AASIST (AASIST) - Parameters: input_dim, filts, gat_dims.
- ECAPA-TDNN (ECAPA_TDNN) - Parameters: channels, emb_dim.
- RawNet2 (RawNet2) - Parameters: filts, gru_node, emb_dim.
- MLP (MLP) - Parameters: input_dim, projection, pooling_type.
- Nes2Net (Nes2Net) - Parameters: strides, filts.
Detailed usage in Backend Reference.
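The requirement that the backend's input_dim match the frontend output can be checked before training starts. The sketch below is hypothetical: `check_input_dim` and the lookup table are illustrative names, and it assumes the frontend emits features at the checkpoint's transformer hidden size with no extra projection.

```python
# Transformer hidden sizes of a few public checkpoints.
KNOWN_HIDDEN_SIZES = {
    "microsoft/wavlm-base": 768,
    "microsoft/wavlm-large": 1024,
    "facebook/wav2vec2-base": 768,
    "facebook/hubert-base-ls960": 768,
}


def check_input_dim(model_cfg):
    """Raise if backend.args.input_dim disagrees with the frontend width."""
    ckpt = model_cfg["frontend"]["args"]["ckpt_path"]
    expected = KNOWN_HIDDEN_SIZES.get(ckpt)
    if expected is None:
        return None  # unknown checkpoint: nothing to compare against
    actual = model_cfg["backend"]["args"]["input_dim"]
    if actual != expected:
        raise ValueError(
            f"backend input_dim={actual}, but {ckpt} outputs {expected}"
        )
    return expected


model_cfg = {
    "frontend": {"type": "wavlm",
                 "args": {"ckpt_path": "microsoft/wavlm-base", "freeze": True}},
    "backend": {"type": "AASIST", "args": {"input_dim": 768}},
}
print(check_input_dim(model_cfg))  # 768
```

Switching `ckpt_path` to a large checkpoint without updating `input_dim` would raise here instead of failing later inside the model.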
Loss Functions¶
Single loss:
Multiple losses:
loss:
  - type: "OCSoftmax"
    weight: 1.0
    embedding_dim: 32
    w_posi: 0.9
    w_nega: 0.2
    alpha: 20.0
  - type: "CrossEntropy"
    weight: 0.5
    embedding_dim: 32
    n_classes: 2
Supported Losses:
- OC-Softmax (OCSoftmax) - Parameters: embedding_dim, m_real, m_fake, alpha.
- AM-Softmax (AMSoftmax) - Parameters: embedding_dim, n_classes, m, s.
- A-Softmax (ASoftmax) - Parameters: embedding_dim, n_classes, m.
- Cross Entropy (CrossEntropy) - Parameters: embedding_dim, n_classes.
Detailed usage in Loss Reference.
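When several losses are listed, a natural reading of the weight field is a weighted sum of the individual loss values. The sketch below assumes exactly that, plus a default weight of 1.0 when the field is omitted. Both are assumptions, and `combine_losses` is a hypothetical name.

```python
def combine_losses(loss_cfgs, loss_values):
    """Weighted sum of per-loss values, keyed by the loss type name."""
    total = 0.0
    for cfg in loss_cfgs:
        weight = cfg.get("weight", 1.0)  # assumed default
        total += weight * loss_values[cfg["type"]]
    return total


loss_cfgs = [
    {"type": "OCSoftmax", "weight": 1.0},
    {"type": "CrossEntropy", "weight": 0.5},
]
# Illustrative per-batch values, not real measurements.
batch_losses = {"OCSoftmax": 0.8, "CrossEntropy": 0.4}
total = combine_losses(loss_cfgs, batch_losses)
print(total)  # 1.0 * 0.8 + 0.5 * 0.4
```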
Training Configuration¶
Structure¶
training:
  trainer: "StandardTrainer"
  device: "cuda"
  # Loop settings
  epochs: 50
  gradient_accumulation_steps: 1
  max_grad_norm: 1.0
  # Logging
  batch_log_interval: 50
  # Evaluation
  eval_every_epochs: 1
  metrics: ["EER", "F1"]
  # Checkpointing
  monitor_metric: "EER"
  monitor_mode: "min"
  save_every_epochs: 5
  early_stopping_patience: 10
  # Optimizer
  optimizer: {...}
  # Scheduler
  scheduler: {...}
  # WandB (optional)
  wandb: {...}
Core Parameters¶
Parameters:
- trainer - (str) Trainer class (default: "StandardTrainer").
- device - (str) Device (cuda, cpu, cuda:0) (default: "cuda").
- epochs - (int) Total training epochs (default: 50).
- gradient_accumulation_steps - (int) Number of batches over which gradients are accumulated before each optimizer step (default: 1).
- max_grad_norm - (float) Gradient clipping norm (default: 1.0).
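The interaction between batch size, gradient accumulation, and clipping can be sketched in plain Python. This assumes the trainer steps the optimizer after every N-th batch and clips by global L2 norm, as in the usual PyTorch pattern. Both helper names are hypothetical.

```python
import math


def optimizer_step_batches(num_batches, accumulation_steps):
    """1-based batch indices after which the optimizer would step."""
    return [b for b in range(1, num_batches + 1) if b % accumulation_steps == 0]


def clip_by_global_norm(grads, max_grad_norm):
    """Rescale gradients so their global L2 norm is at most max_grad_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_grad_norm / (total_norm + 1e-6))
    return [g * scale for g in grads]


batch_size, accumulation_steps = 32, 4
print(batch_size * accumulation_steps)                  # 128 samples per optimizer step
print(optimizer_step_batches(10, accumulation_steps))   # [4, 8]

clipped = clip_by_global_norm([3.0, 4.0], max_grad_norm=1.0)
print(math.sqrt(sum(g * g for g in clipped)))           # just under 1.0
```

In this sketch the last two of the ten batches never trigger a step; how a real trainer handles that remainder is not specified here.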
Logging¶
Parameters:
- batch_log_interval - (int) Log every N batches (default: 50).
Evaluation¶
Parameters:
- eval_every_epochs - (int) Evaluate every N epochs (default: 1).
- eval_every_steps - (int) Evaluate every N steps (default: null).
- metrics - (list[str]) Metrics to compute (default: ["EER"]).
Note: If both eval_every_epochs and eval_every_steps are set, both will trigger evaluations. For example, setting both to 1 will cause evaluation at every step AND at the end of every epoch.
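The combined effect of the two triggers is easy to underestimate, so here is a small counting sketch. `evaluation_points` is a hypothetical helper that only enumerates when evaluations would fire under the rule in the note; it is not part of the trainer.

```python
def evaluation_points(epochs, steps_per_epoch,
                      eval_every_epochs=1, eval_every_steps=None):
    """List (trigger, index) pairs for every evaluation in a run."""
    points = []
    step = 0
    for epoch in range(1, epochs + 1):
        for _ in range(steps_per_epoch):
            step += 1
            # Step trigger: fires inside the epoch.
            if eval_every_steps and step % eval_every_steps == 0:
                points.append(("step", step))
        # Epoch trigger: fires at the end of the epoch.
        if eval_every_epochs and epoch % eval_every_epochs == 0:
            points.append(("epoch", epoch))
    return points


both = evaluation_points(epochs=2, steps_per_epoch=3,
                         eval_every_epochs=1, eval_every_steps=1)
print(len(both))   # 8: six step evaluations plus two epoch evaluations
print(both[:4])    # [('step', 1), ('step', 2), ('step', 3), ('epoch', 1)]
```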
Checkpointing¶
Parameters:
- monitor_metric - (str) Metric for best model (default: "EER").
- monitor_mode - (str) "min" or "max" (default: "min").
- save_every_epochs - (int) Save checkpoint every N epochs (default: 5).
- early_stopping_patience - (int) Stop training if monitor_metric does not improve for N consecutive evaluations (default: null / disabled).
  - Note: min_delta is not currently supported; any improvement counts.
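The early-stopping rule described above (strict improvement, no min_delta, patience counted in evaluations) can be sketched as a small tracker. `EarlyStopper` is a hypothetical class written for illustration, not the trainer's implementation.

```python
class EarlyStopper:
    """Tracks a monitored metric and signals when patience runs out."""

    def __init__(self, patience=None, mode="min"):
        if mode not in ("min", "max"):
            raise ValueError(f"monitor_mode must be 'min' or 'max', got {mode!r}")
        self.patience = patience   # None disables early stopping
        self.mode = mode
        self.best = None
        self.bad_evals = 0

    def update(self, value):
        """Record one evaluation; return True when training should stop."""
        improved = (
            self.best is None
            or (self.mode == "min" and value < self.best)
            or (self.mode == "max" and value > self.best)
        )
        if improved:
            # Any strict improvement counts, however small.
            self.best = value
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.patience is not None and self.bad_evals >= self.patience


# Illustrative EER values, not real results.
eer_history = [0.20, 0.18, 0.18, 0.19, 0.17]
stopper = EarlyStopper(patience=2, mode="min")
stop_at = next(i for i, eer in enumerate(eer_history) if stopper.update(eer))
print(stop_at)  # 3
```

Note that the tie at index 2 counts as "no improvement", so training stops before the better value at index 4 is ever seen.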
Optimizer¶
Detailed reference in Optimizers & Schedulers.
Common Parameters:
- lr - (float) Learning rate (default: 1e-6).
- weight_decay - (float) L2 penalty (default: 1e-4).
- betas - (tuple) (0.9, 0.999).
Scheduler¶
Detailed reference in Optimizers & Schedulers.
Common Parameters:
- T_max - (int) Max iterations.
- gamma - (float) Decay factor.
- eta_min - (float) Min learning rate.
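As an illustration of how T_max and eta_min shape a schedule, the sketch below evaluates the closed-form cosine annealing curve. It assumes the cosine_annealing scheduler type follows the same formula as PyTorch's CosineAnnealingLR. `cosine_annealing_lr` is a hypothetical helper.

```python
import math


def cosine_annealing_lr(step, base_lr, T_max, eta_min=0.0):
    """Closed-form cosine schedule for 0 <= step <= T_max."""
    cosine = (1 + math.cos(math.pi * step / T_max)) / 2
    return eta_min + (base_lr - eta_min) * cosine


schedule = [cosine_annealing_lr(s, base_lr=1e-4, T_max=20) for s in range(21)]
print(f"{schedule[0]:.2e} {schedule[10]:.2e} {schedule[20]:.2e}")
```

The rate starts at the optimizer's lr, passes the midpoint between lr and eta_min at T_max / 2, and reaches eta_min at T_max.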
WandB Integration¶
wandb:
  enabled: true
  project: "deepfense"
  name: "experiment_1"
  entity: "your-username" # Optional
  tags: ["wav2vec2", "aasist"] # Optional
Example: Minimal Configuration¶
exp_name: "minimal_experiment"
output_dir: "./outputs/"
seed: 42
data:
  sampling_rate: 16000
  label_map: {bonafide: 1, spoof: 0}
  train:
    parquet_files: ["/path/to/train.parquet"]
    batch_size: 32
    base_transform:
      - {type: load_audio, target_sr: 16000}
      - {type: pad, max_len: 64600}
  val:
    parquet_files: ["/path/to/val.parquet"]
    batch_size: 64
    base_transform:
      - {type: load_audio, target_sr: 16000}
      - {type: pad, max_len: 64600}
model:
  type: "ModularDetector"
  frontend:
    type: "wav2vec2"
    args: {source: "huggingface", ckpt_path: "facebook/wav2vec2-base", freeze: true}
  backend:
    type: "MLP"
    args: {input_dim: 768, projection: [256, 64], pooling_type: "mean"}
  loss:
    type: "CrossEntropy"
    embedding_dim: 64
    n_classes: 2
training:
  epochs: 20
  device: "cuda"
  optimizer: {type: "adam", lr: 0.0001}
  scheduler: {type: "cosine_annealing", T_max: 20}