Backend Components

Backends take the features extracted by the Frontend and map them to a fixed-dimensional embedding vector.
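As a rough illustration of that contract, the sketch below uses plain mean pooling as a stand-in backend. The function name `mean_pool_backend` and the list-of-frames layout are assumptions made for this example, not part of the library. It only shows that utterances of different lengths end up as vectors of the same size.

```python
from statistics import fmean


def mean_pool_backend(frames):
    """Collapse a variable-length sequence of feature frames into one vector.

    frames: list of T frames, each a list of D floats (hypothetical layout).
    Returns a list of D floats whose length does not depend on T.
    """
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    # Average each feature dimension over time.
    return [fmean(frame[d] for frame in frames) for d in range(dim)]


short_utt = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]          # T=2, D=3
long_utt = [[0.0, 0.0, 0.0]] * 3 + [[4.0, 8.0, 12.0]]   # T=4, D=3

emb_short = mean_pool_backend(short_utt)
emb_long = mean_pool_backend(long_utt)
print(len(emb_short), len(emb_long))  # both embeddings have D entries
```

The backends listed below replace the averaging step with learned layers, but the input and output shapes follow the same idea.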

Available Backends

AASIST

A graph attention network (GAT) based architecture designed for audio deepfake detection and ASV anti-spoofing.

Configuration Signature:

backend:
  type: AASIST
  args:
    filts: list
    gat_dims: list

Parameters:

Example:

backend:
  type: AASIST
  args:
    filts: [64, 128]
    gat_dims: [64, 32]
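To make the graph attention idea behind this backend more concrete, here is a minimal sketch of one attention-weighted aggregation step in the style of the standard GAT formulation. It is not AASIST's actual layer: the function name `gat_aggregate`, the fully connected graph, and the omission of the learned feature projection are all simplifying assumptions for this example.

```python
import math


def gat_aggregate(nodes, attn_vec, slope=0.2):
    """One attention-weighted aggregation step over a fully connected graph.

    nodes: list of N node features, each a list of D floats.
    attn_vec: list of 2*D attention weights (hypothetical, normally learned).
    """
    out = []
    for h_i in nodes:
        # Score each neighbor j from the concatenation [h_i || h_j].
        scores = []
        for h_j in nodes:
            s = sum(a * x for a, x in zip(attn_vec, h_i + h_j))
            scores.append(s if s > 0 else slope * s)  # LeakyReLU
        # Softmax over neighbors (shifted by the max for numerical stability).
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # New node feature: attention-weighted sum of neighbor features.
        out.append([sum(a * h_j[d] for a, h_j in zip(alphas, nodes))
                    for d in range(len(h_i))])
    return out


nodes = [[1.0, 0.0], [3.0, 4.0]]
# With all-zero attention weights every neighbor gets the same weight,
# so each node becomes the plain average of all nodes.
uniform = gat_aggregate(nodes, [0.0, 0.0, 0.0, 0.0])
print(uniform)
```

Non-zero attention weights shift the average toward the neighbors that score higher, which is what lets the model emphasize informative nodes.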

ECAPA-TDNN

A strong speaker-verification backend adapted for deepfake detection. It features channel attention (SE blocks) and multi-scale feature aggregation.

Configuration Signature:

backend:
  type: ECAPA_TDNN
  args:
    channels: int
    emb_dim: int

Parameters:

Example:

backend:
  type: ECAPA_TDNN
  args:
    channels: 512
    emb_dim: 192
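The channel attention mentioned for this backend can be sketched as a squeeze-and-excitation (SE) block. The version below is a minimal pure-Python illustration of the general technique, with biases left out. The function name `se_block` and the weight arguments are hypothetical and do not reflect this library's implementation.

```python
import math


def se_block(feature_map, w_down, w_up):
    """Rescale each channel by a gate computed from its time-average.

    feature_map: C channels, each a list of T floats.
    w_down: R x C weights (squeeze C channel summaries to R hidden units).
    w_up:   C x R weights (expand back to one gate per channel).
    """
    # Squeeze: one summary value per channel (global average over time).
    squeezed = [sum(ch) / len(ch) for ch in feature_map]
    # Excitation: bottleneck with ReLU, then sigmoid gates in (0, 1).
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed)))
              for row in w_down]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w_up]
    # Scale: multiply every frame of a channel by that channel's gate.
    return [[g * x for x in ch] for g, ch in zip(gates, feature_map)]


feature_map = [[1.0, 3.0], [2.0, 2.0]]   # C=2 channels, T=2 frames
w_down = [[0.5, 0.5]]                    # R=1 hidden unit
w_up = [[1.0], [-1.0]]                   # boosts channel 0, damps channel 1
rescaled = se_block(feature_map, w_down, w_up)
print(rescaled)
```

The output keeps the input shape. Only the relative weight of each channel changes, based on a summary of the whole utterance.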

RawNet2

A classic CNN-GRU architecture for ASV spoofing detection.

Configuration Signature:

backend:
  type: RawNet2
  args:
    filts: list
    gru_node: int
    emb_dim: int

Parameters:

Example:

backend:
  type: RawNet2
  args:
    filts: [128, 256, 512]
    gru_node: 1024
    emb_dim: 1024

MLP

A simple multi-layer perceptron with configurable pooling. It suits SSL frontends (Wav2Vec2, WavLM) that already output high-level features.

Configuration Signature:

backend:
  type: MLP
  args:
    input_dim: int
    projection: list[int]
    pooling_type: str

Parameters:

input_dim - (int) Dimension of input features.
projection - (list[int]) List of hidden layer sizes (e.g., [128, 64]).
pooling_type - (str) Pooling method: mean, max, or asp (Attentive Statistics Pooling).

Example:

backend:
  type: MLP
  args:
    input_dim: 768
    projection: [512, 128]
    pooling_type: asp
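The three pooling options can be compared with a small pure-Python sketch. The function names are hypothetical. In a real attentive statistics pooling layer the per-frame scores come from a small learned attention network, whereas here they are passed in directly as an assumption. In the standard formulation, asp concatenates a weighted mean and a weighted standard deviation, so its output is twice as wide as the input. Whether this library's MLP backend sizes its layers that way is not confirmed by this page.

```python
import math


def mean_pool(frames):
    """Average each feature dimension over time."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def max_pool(frames):
    """Keep the largest value of each feature dimension over time."""
    return [max(col) for col in zip(*frames)]


def attentive_stats_pool(frames, scores):
    """Concatenate attention-weighted mean and std of each dimension.

    scores: one unnormalized attention score per frame (hypothetical input).
    """
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]          # softmax over frames
    cols = list(zip(*frames))
    mean = [sum(w * x for w, x in zip(weights, col)) for col in cols]
    var = [sum(w * (x - m) ** 2 for w, x in zip(weights, col))
           for col, m in zip(cols, mean)]
    std = [math.sqrt(max(v, 0.0)) for v in var]
    return mean + std                            # length 2 * D


frames = [[1.0, 2.0], [3.0, 6.0]]                # T=2 frames, D=2 features
pooled_mean = mean_pool(frames)
pooled_max = max_pool(frames)
pooled_asp = attentive_stats_pool(frames, [0.0, 0.0])  # equal scores
print(len(pooled_mean), len(pooled_max), len(pooled_asp))
```

With equal scores, asp reduces to the ordinary mean and standard deviation. Unequal scores let some frames count more than others.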

Nes2Net

A Res2Net-based convolutional architecture.

Configuration Signature:

backend:
  type: Nes2Net
  args:
    strides: list
    filts: list

Parameters:

Example:

backend:
  type: Nes2Net
  args:
    strides: [1, 2, 2]
    filts: [64, 128, 256]
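The configuration signatures above can be turned into a quick sanity check for a parsed `backend:` mapping. The sketch below is not part of the library. The helper name `check_backend_config` is hypothetical, and it assumes that every argument in a signature is required and that no extra arguments are accepted, which the actual loader may not enforce.

```python
# Argument names and types copied from the configuration signatures above.
SIGNATURES = {
    "AASIST": {"filts": list, "gat_dims": list},
    "ECAPA_TDNN": {"channels": int, "emb_dim": int},
    "RawNet2": {"filts": list, "gru_node": int, "emb_dim": int},
    "MLP": {"input_dim": int, "projection": list, "pooling_type": str},
    "Nes2Net": {"strides": list, "filts": list},
}


def check_backend_config(cfg):
    """Return a list of problems found in a parsed `backend:` mapping."""
    backend_type = cfg.get("type")
    if backend_type not in SIGNATURES:
        return [f"unknown backend type: {backend_type!r}"]
    expected = SIGNATURES[backend_type]
    args = cfg.get("args", {})
    problems = []
    for name, typ in expected.items():
        if name not in args:
            problems.append(f"missing arg: {name}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(args[name], bool) or not isinstance(args[name], typ):
            problems.append(f"{name} should be {typ.__name__}")
    for name in args:
        if name not in expected:
            problems.append(f"unexpected arg: {name}")
    return problems


good = {"type": "MLP",
        "args": {"input_dim": 768, "projection": [512, 128],
                 "pooling_type": "asp"}}
bad = {"type": "ECAPA_TDNN", "args": {"channels": "512"}}

print(check_backend_config(good))
print(check_backend_config(bad))
```

A check like this catches quoting mistakes, such as `channels: "512"`, before a model is built.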
